I want to start with an exercise. Let’s look at the pitching staffs of two teams. Both teams finished well over .500, so we would expect them to have good staffs.
Team one had six pitchers who (more or less) filled the five rotation spots over the course of the year. These six pitchers generated 16.1 WAR, which is good for about 3.2 wins per rotation slot. Not bad at all.
Team two also had six pitchers filling the five rotation slots, but they didn’t fare as well. They generated only 12.3 WAR, good for just about 2.5 WAR per rotation slot. That’s still a solid staff, but it doesn’t look nearly as good as the first staff.
What about the bullpens? Again, team one looks awfully good with 7.4 WAR from relievers who pitched at least 20 innings for the big club. Team two, still falling short. That club got only 1.9 WAR from relievers who threw at least 20 innings. Pretty pathetic, actually.
So this must be an article about how a team can win with either pitching or offense, right? I mean, team one obviously has a far superior pitching staff. The starters generated 31 percent more value while the relievers were an incredible 289 percent more valuable, nearly four times the value.
But this isn’t an article about winning different ways. These teams both won the same way. The exact same way. They are the same team. Ladies and gentlemen, your 2013 Cincinnati Reds.
There was a big fuss this season over the two different kinds of WAR (those generated separately by Baseball-Reference and FanGraphs). There was a big conference and an agreement was reached and everyone was happy. But really, we aren’t talking about two different kinds of WAR, we’re talking about four, because each site figures WAR separately for hitting and pitching.
With offense, you’ll find a fair bit of agreement. There will occasionally be an outlier, but in general, the numbers are pretty close and the differences that exist cause you to look a little more closely at the players.
But with pitching, the error bars are huge. Look at the chart below to see the differences in how the Reds pitchers discussed above were measured:
Player            ERA   bWAR  fWAR  IP
Mat Latos         3.16  3.8   4.4   210.2
Homer Bailey      3.49  3.2   3.7   209
Mike Leake        3.37  3.0   1.6   192.1
Bronson Arroyo    3.79  2.5   0.8   202.1
Johnny Cueto      2.82  1.4   0.6   60.2
Tony Cingrani     2.77  2.2   1.3   104.2
Bullpen           3.29  7.4   1.9
As you can see, there are only two pitchers here about whom there is some agreement. Mat Latos and Homer Bailey have reasonably similar values from both FanGraphs and Baseball-Reference. Interestingly, they are very similar kinds of pitchers. They strike out a fair number of batters, and they don’t walk a ton of guys. They are good pitchers, and it seems the two systems pretty well know how to handle them.
But there is radical disagreement everywhere else. Was Mike Leake well above average or a serviceable fourth or fifth starter? Was Bronson Arroyo a solid number three or a bad start or two away from being a replacement-level player? Did Johnny Cueto and Tony Cingrani provide an All-Star-caliber season between them or were they merely adequate as a pair? And don’t even get me started on the bullpen.
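To put numbers on that disagreement, here is a minimal Python sketch that subtracts fWAR from bWAR for each starter in the chart. The figures are copied straight from the chart; the names `starters` and `war_gaps` are just illustrative labels, not anything either site publishes.

```python
# (name, bWAR, fWAR) for each starter, copied from the chart above.
starters = [
    ("Mat Latos", 3.8, 4.4),
    ("Homer Bailey", 3.2, 3.7),
    ("Mike Leake", 3.0, 1.6),
    ("Bronson Arroyo", 2.5, 0.8),
    ("Johnny Cueto", 1.4, 0.6),
    ("Tony Cingrani", 2.2, 1.3),
]


def war_gaps(rows):
    """Return (name, bWAR - fWAR) pairs, largest absolute gap first."""
    gaps = [(name, round(bwar - fwar, 1)) for name, bwar, fwar in rows]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


for name, gap in war_gaps(starters):
    # A positive gap means Baseball-Reference rates the pitcher higher.
    print(f"{name:16s} {gap:+.1f}")
```

Sorted this way, Arroyo (+1.7) and Leake (+1.4) sit at the top, while Latos and Bailey are the only two starters FanGraphs likes better, and only by about half a win each.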
And I’m not just cherry-picking the Reds here. I noticed the prominent differences when writing a season wrap-up about the Reds, but these kinds of disparities are pretty common.
There have been lots of discussions about how WAR should be used, but let’s face it, it’s become for many what ERA or W-L used to be for pitchers, and it isn’t doing a very good job. WAR is supposed to be a value statistic that tells us within a reasonable margin of error how many more wins a player is worth than freely available minor league talent. But even if we assume the truth lies in the exact middle of these two versions, that’s still too much margin for error.
Or, to put it bluntly, by either version of WAR, you can fit Clayton Kershaw’s season between the FanGraphs and Baseball-Reference views of the Reds’ staff and still have some room to spare.
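The size of that gap is simple arithmetic on the staff totals quoted earlier. A quick sketch in Python, with variable names that are purely illustrative:

```python
# Staff totals as quoted in this article: starters plus relievers
# who threw at least 20 innings.
baseball_reference = {"starters": 16.1, "bullpen": 7.4}
fangraphs = {"starters": 12.3, "bullpen": 1.9}


def staff_total(parts):
    """Sum the WAR for every part of a pitching staff."""
    return round(sum(parts.values()), 1)


# Difference between the two sites' views of the same staff, in wins.
staff_gap = round(staff_total(baseball_reference) - staff_total(fangraphs), 1)

print(f"Baseball-Reference: {staff_total(baseball_reference)}")
print(f"FanGraphs:          {staff_total(fangraphs)}")
print(f"Gap:                {staff_gap}")
```

That works out to 23.5 wins on one site and 14.2 on the other, a difference of 9.3 wins for the same group of pitchers.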
There are two things that really concern me about this. One is that, given the information available, I can make entirely different arguments about the relative quality of both individual pitchers and entire staffs. Bronson Arroyo can’t be both above and below average, yet WAR, in its various incarnations, tells us he is. As a commenter noted on the Reds piece I wrote, it makes it impossible to take the stat seriously. The other concern is that where a person looks for WAR will dramatically skew his/her analysis. At this point, I don’t see how an analytical piece that uses WAR can be said to be intellectually honest unless it presents both versions.
There is a lot of information exchange in the sabermetric community, but right now, WAR, which is the flagship stat, isn’t getting it done for pitchers. All it’s telling us is that we don’t really know very much about how to measure pitcher value, and until we do, we should stop using it as a catch-all stat to illustrate pitcher quality.