How much do we know about pitcher value?

I want to start with an exercise. Let’s look at the pitching staffs of two teams. Both teams finished well over .500, so we would expect them to have good staffs.

Team one had six pitchers who (more or less) filled the five rotation spots over the course of the year. These six pitchers generated 16.1 WAR, which is good for about 3.2 wins per rotation slot. Not bad at all.

Team two also had six pitchers filling the five rotation slots, but they didn’t fare as well. They generated only 12.3 WAR, good for just about 2.5 WAR per rotation slot. That’s still a solid staff, but it doesn’t look nearly as good as the first staff.

What about the bullpens? Again, team one looks awfully good with 7.4 WAR from relievers who pitched at least 20 innings for the big club. Team two, still falling short. That club got only 1.9 WAR from relievers who threw at least 20 innings. Pretty pathetic, actually.

So this must be an article about how a team can win with either pitching or offense, right? I mean, team one obviously has a far superior pitching staff. The starters generated 31 percent more value while the relievers were an incredible 289 percent more valuable, nearly four times the value.

But this isn’t an article about winning different ways. These teams both won the same way. The exact same way. They are the same team. Ladies and gentlemen, your 2013 Cincinnati Reds.

There was a big fuss this season over the two different kinds of WAR (those generated separately by Baseball-Reference and FanGraphs). There was a big conference and an agreement was reached and everyone was happy. But really, we aren’t talking about two different kinds of WAR, we’re talking about four different kinds because they each figure WAR separately for hitting and pitching.

With offense, you’ll find a fair bit of agreement. There will occasionally be an outlier, but in general, the numbers are pretty close and the differences that exist cause you to look a little more closely at the players.

But with pitching, the error bars are huge. Look at the table below to see how differently the two systems measured the Reds pitchers discussed above:

Player             ERA       bWAR      fWAR      IP
Mat Latos          3.16       3.8       4.4     210.2
Homer Bailey       3.49       3.2       3.7     209
Mike Leake         3.37       3.0       1.6     192.1
Bronson Arroyo     3.79       2.5       0.8     202.1
Johnny Cueto       2.82       1.4       0.6      60.2
Tony Cingrani      2.77       2.2       1.3     104.2
Bullpen            3.29       7.4       1.9

As you can see, there are only two pitchers here about whom there is some agreement. Mat Latos and Homer Bailey have reasonably similar values from both FanGraphs and Baseball-Reference. Interestingly, they are very similar kinds of pitchers. They strike out a fair number of batters and they don’t walk a ton of guys. They are good pitchers, and it seems the two systems know pretty well how to handle them.

But there is radical disagreement everywhere else. Was Mike Leake well-above average or a serviceable fourth or fifth starter? Was Bronson Arroyo a solid number three or a bad start or two away from a replacement player? Did Johnny Cueto and Tony Cingrani provide an All-Star caliber season between them or were they merely adequate as a pair? And don’t even get me started on the bullpen.
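For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the gaps from the table above. The variable and function names are illustrative assumptions, and the only inputs are the rounded values printed in the table. One caveat: the rounded fWAR figures for the six starters sum to 12.4, a tenth off the 12.3 quoted earlier, presumably because the earlier total was built from unrounded values.

```python
# Values copied from the table above (2013 Reds); names are illustrative.
# Each starter maps to a (bWAR, fWAR) pair.
starters = {
    "Mat Latos": (3.8, 4.4),
    "Homer Bailey": (3.2, 3.7),
    "Mike Leake": (3.0, 1.6),
    "Bronson Arroyo": (2.5, 0.8),
    "Johnny Cueto": (1.4, 0.6),
    "Tony Cingrani": (2.2, 1.3),
}
bullpen = (7.4, 1.9)  # (bWAR, fWAR) for relievers with at least 20 innings
ROTATION_SLOTS = 5


def gap(bwar, fwar):
    """Signed disagreement; positive means Baseball-Reference is higher."""
    return round(bwar - fwar, 1)


starter_gaps = {name: gap(b, f) for name, (b, f) in starters.items()}

# Round the sums to one decimal to hide floating-point noise.
total_bwar = round(sum(b for b, _ in starters.values()), 1)
total_fwar = round(sum(f for _, f in starters.values()), 1)

# Whole-staff disagreement: starters plus bullpen under each system.
staff_gap = round((total_bwar + bullpen[0]) - (total_fwar + bullpen[1]), 1)

# How many times more valuable the bullpen looks under bWAR than fWAR.
bullpen_ratio = round(bullpen[0] / bullpen[1], 2)

# Print starters from largest to smallest absolute disagreement.
for name, g in sorted(starter_gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:<16}{g:+.1f}")

print(f"Starters per slot: bWAR {total_bwar / ROTATION_SLOTS:.2f}, "
      f"fWAR {total_fwar / ROTATION_SLOTS:.2f}")
print(f"Bullpen ratio (bWAR / fWAR): {bullpen_ratio}")
print(f"Whole-staff gap: {staff_gap} WAR")
```

Sorting by absolute gap puts Arroyo and Leake at the top and Latos and Bailey at the bottom, which matches the pattern described above: the two systems agree on the strikeout pitchers and split on everyone else.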

And I’m not just cherry-picking the Reds here. I noticed the prominent differences when writing a season wrap-up about the Reds, but these kinds of disparities are pretty common.

There have been lots of discussions about how WAR should be used, but let’s face it, it’s become for many what ERA or W-L used to be for pitchers, and it isn’t doing a very good job. WAR is supposed to be a value statistic that tells us within a reasonable margin of error how many more wins a player is worth than freely available minor league talent. But even if we assume the truth lies in the exact middle of these two versions, that’s still too much margin for error.

Or, to put it bluntly, according to both versions of WAR, you can fit Clayton Kershaw between the FanGraphs and Baseball-Reference views of the Reds’ staff and still have some room to spare.

There are two things that really concern me about this. One is that, given the information available, I can make entirely different arguments about the relative quality of both individual pitchers and entire staffs. Bronson Arroyo can’t be both above and below average, yet WAR, in its various incarnations, tells us he is. As a commenter noted on the Reds piece I wrote, it makes it impossible to take the stat seriously. The other concern is that where a person looks for WAR will dramatically skew his/her analysis. At this point, I don’t see how an analytical piece that uses WAR can be said to be intellectually honest unless it presents both versions.

There is a lot of information exchange in the sabermetric community, but right now, WAR, which is the flagship stat, isn’t getting it done for pitchers. All it’s telling us is that we don’t really know very much about how to measure pitcher value, and until we do, we should stop using it as a catch-all stat to illustrate pitcher quality.


Jason teaches high school English, writes fiction, runs a small writing program and writes about education and literature. He also writes for Redleg Nation and both writes and edits for The Hardball Times. Follow him on Twitter @JasonLinden, visit his website or email him here.
15 Comments
dave
10 years ago

By now I’ve learned that they are two different stats, and I’m glad we measure both.  Referencing WAR is incorrect, akin to asking “Is Molina a good hitter?”  You have to ask “which one?”.  As I understand it WAR isn’t even a stat, it’s just a concept.  rWAR, fWAR, WARP – those are stats.

I think I know more about pitcher value from having both stats than I would if there was just one.  Constricting WAR to one “right” version to attain mainstream acceptance or avoid criticism seems intellectually dishonest, like saying there’s only one way to make a pancake. 

For the record, dPancakes have shredded sweet potatoes and blueberries in them.  and are served with bacon.

Jason Linden
10 years ago

First, will you make me some pancakes?

Second, I mostly agree, except that lots of analytical articles use just one version to represent the value of a player and don’t mention the wide discrepancies in how different systems see that player.

David
10 years ago

So is the splitting of value between pitching and fielding a big part of the discrepancy?

Ryan
10 years ago

They are supposed to be different. I’d be more concerned if they weren’t. The stats being different tells you two things:

1. The Reds had a good season from their pitchers
2. They should not go into next season expecting the same performance from those pitchers, especially the pen.

I think I agree with the general point though, we should find some way of defining this so that one stat is “WAR” and the other is whatever it is. It would seem to me that baseball-reference is an accurate reflection of what happened after the season, but that fangraphs is the more useful overall.

Jason Linden
10 years ago

Ryan,

I tend to agree with you, but then there is a pitcher like Arroyo who (except for the year he had mono) is always better than FanGraphs says he should be. Additionally, Dave Cameron wrote an article where he said that Johnny Cueto is so good at controlling the running game that his FIP is skewed higher than it should be.

So, even looking at the future, FanGraphs has its limitations.

Paul G.
10 years ago

Hey, if you want more fun, look at 1884.  Old Hoss Radbourn wins 59 games.  He leads the league in Wins (duh), Winning Percentage, Innings Pitched, Strikeouts (though not the rate stat), ERA, and ERA+.

bWAR: 19.3, fWAR 7.9. 

For added fun, bWAR has him as the best player in the league but not the best pitcher, with Pud Galvin beating him out 20.5-19.1 for pitching only. By fWAR his 7.9 (7.3 pitching only) ranks third best behind two more pitchers in the NL: Galvin (9.4 overall, 8.9 as a pitcher) and Charlie Buffinton (8.7, 8.8 as pitcher). Keep in mind that Radbourn’s ERA+ is 205 versus 155 (Galvin) and 133 (Buffinton). It boggles.

Greg Simons
10 years ago

Jason, nice job of highlighting a gap in our knowledge.  Yes, the goal of WAR is admirable, but the various implementations show it’s far from an ideal catch-all stat.

dave
10 years ago

Jason,

Yeah, as a fan of both types of WAR, Arroyo and Glavine (and Buehrle, Cain . . .) are fascinating. Do they just happen to be outliers or are they outliers by design? I know some work has been done but I’m not convinced either way.

Pancakes coming off the griddle in 15 minutes, how fast can you get to northern VT?

db
10 years ago

Plain and simple, fWAR is calculated stupidly as it looks at FIP and not RA. Results matter and we don’t calculate WAR for hitters by looking at batted ball profiles – a double is a double whether it is a bloop down the line or a laser off the wall. They both have the same value. It makes no sense to ignore the results from pitchers in looking at a retrospective measure of value.

Mike Ford
10 years ago

<start RANT>

I have to disagree with the idea that having different stats as WAR tells us anything useful. As a concept WAR is a measure of how many wins a player adds to his team. A player will add x wins to his team. If WAR is really measuring this it will produce a value of x. ALWAYS! So if two different schemes produce answers that do not agree we have to assume that one or the other (or even both) do not reflect the true value that player has to a team.

If WAR-1 produces a value of y wins and WAR-2 produces a value of z wins, we do not have a meaningful way of parsing out any other interpretation than wins. Wins are wins. Neither scheme changes the value of the win. Would losing the player mean a loss of y or z wins to the team?
I don’t know.

This is where the concept of WAR is flawed. Each measured event needs to be weighted. The weight of these events often reflects the biases of the researchers modeling a baseball season. It becomes hard to argue about these because it is an easy-to-understand number with a hard-to-understand formula.

We can see the bias in the old-fashioned won-loss record. Of course so and so won a lot, he got lots of run support. Or such and such is better than his record, they don’t hit behind him. But why DOES one WAR favor one pitcher while the other hates him? (SHRUG)

<end RANT>

MGL
10 years ago

I disagree with this premise. WAR is in fact a concept and not a statistic. fWAR and bWAR are two different statistics using the same concept and represent two similar but different things.

It is not the fault of the concept (or of the two stats) if people are equating WAR the concept with one or the other statistic; it is the fault of the person doing the analysis or making the assertion.

Granted, it is a little bit confusing, but so are lots of other “technical” things.

So while both are a measure of theoretical “wins” which one you use depends on what you are trying to measure, present, analyze, etc. One simply tells you how many runs a pitcher has allowed compared to a replacement pitcher throwing the same number of innings, translated to “wins” while the other does the same thing with a little bit of the luck and defense “removed.”

Whichever one you are interested in you can use.

One is more for retrospective “analysis” like awards, and the other is more for evaluating talent and projecting performance.

The only confusing part is that they both have the same last 3 letters in caps. They are two different stats representing two different, but equally good and useful things. They happen to share the same concept which is why they have similar names.

Again, any confusion of one with the other or attempt to suggest that they represent the same thing, or any attempt to call one or the other WAR, without explaining which one you are using and why, is the fault of the person and not the concept of the two stats.

obsessivegiantscompulsive
10 years ago

Thank you Jason, for that article, this has bugged me for a long time too. 

Add on top of that, defensive measures are all over the place too, and thus position players’ WAR values are all over the place too.

I’m coming to the opinion that WAR is even less useful than Wins for a pitcher. 

Say, if you got time, maybe you can update the table with Bill James Winshares number for the pitchers (where 3 winshares equals one win, if I remember right), and see where his numbers go.

Frank
10 years ago

I love that there are other smart baseball fans out there on the internet, seeing as they are definitely the rarer breed. WAR isn’t evil, but it’s blindly believed by the casual saber fan, and arrogantly defended by those who calculate it, because they don’t want to admit how speculative their best work is.

Gideon Clarke
10 years ago

MGL wrote, regarding bWAR vs. fWAR:

“One is more for retrospective “analysis” like awards, and the other is more for evaluating talent and projecting performance.”

Half of that sentence is just silly, though. It’s silly to use WAR to decide who should get awards. Not that I think we should go back to using batting averages and win-loss records, but whatever stats we use, they should be stats that start from a sensible basis for comparison. When trying to figure out which players in the league are the best, we should be comparing them to other good players. WAR compares MVP candidates to Brock Holt or Dana Eveland to decide who was the best player. I think the premise here is faulty.

It’s worse when people use WAR to talk about who should be in the Hall of Fame. Guys can add a bunch of WAR to their numbers at the front and back of their careers while actually being below-average players. If WAR ever becomes the go-to stat for HOF voters, it is conceivable that a situation could arise where a borderline player gets pushed over the hump by playing badly for a couple of extra years at the end of his career, because it is a stat that rewards below-average play with numbers that look positive once you add them all up at the end of a career. As lame as batting average and pitcher win-loss are, at least they leave you with a number that goes down when a player stops playing well. I think that the intelligent equivalents of those stats–maybe OPS+ & wOBA for batters and WHIP & ERA+ for pitchers–should be the defaults in discussions of which players had the best seasons or the best career.

WAR was never meant to be used that way, though. WAR was designed for making roster decisions in the context of a small-market team that needs to get something for nothing whenever it can if it wants to win. WAR is for deciding who to protect on the 40-man roster, who to take in the Rule 5 draft and who to invite to spring training. It’s not for putting together an All-Star team or voting players into the Hall of Fame.

Michael
9 years ago

I thought WAR was the title of a great U2 album... Let’s let them play the games. At the end of the day, all is speculation, or the fantasy baseball gods across the internet spectrum would be rich beyond their wildest dreams.