FIP, Game Score, and evaluating starting pitching
by Matt Hunter
April 04, 2013
If you read my first piece at THT a couple weeks ago, you are aware of my recent interest in questions as they pertain to baseball, and my not-so-recent interest in questions as they pertain to life, philosophy, and all that jazz. I'm a strong believer that in order to discover new things, and improve upon the things that we have already discovered, we need to know how to ask the right questions.
Last time I looked at what sorts of questions the most common offensive statistics answer, or are at least trying to answer. Naturally, my first thought for this piece was to turn to pitching statistics. But instead of looking at three or more stats like last time, I'm going to be more specific and expand on a couple of stats that I find most interesting.
Which stats? Well, though there is much to be said about saves, I find the way in which we measure starting pitcher performance much more fascinating, maybe just because it's so much more important. Traditionally, pitcher wins and ERA are the primary factors in determining a starting pitcher's performance, as well as the quality of an individual start. Of course, we all know that ERA and especially wins aren't the best indicators of pitcher quality, especially at an individual game level.
But just because they aren't perfect doesn't mean they aren't useful. Why are they useful? Because they give us interesting questions to answer, of course!
Wins:
On the surface: "In how many games did the starting pitcher's team win the game, given that he pitched five innings, left with a lead, and that lead held for the remainder of the game?"
Digging deeper: "In how many wins was the pitcher the main contributor (of all pitchers) to said win?"
The fundamental question: "How many games did the pitcher help his team win?"
ERA:
On the surface: "At what rate did the pitcher prevent runs from scoring, not counting runs resulting from plays deemed to be errors by the official scorer?"
Digging deeper: "At what rate did the pitcher, of his own accord, prevent runs from scoring?"
The fundamental question: "How well did the pitcher pitch?"
In the end, wins and ERA are getting at very similar questions. The former wants to measure pitcher performance as a counting statistic—that is, how many wins can we credit to the pitcher—while the latter wants to measure pitching performance as a rate statistic—that is, simply, how good is the pitcher?
Here's the thing: for a starting pitcher, there is less use than one might think in separating the counting statistic and the rate statistic. After all, though we divide a starter's performance into a number of units, such as batters faced and innings pitched, in the end the most important unit for a starting pitcher is games started.
Before you object, I don't mean that starts should be the denominator when we're measuring strikeout rate or walk rate or what have you. I mean that games started are the only units that are, for the most part, independent of each other. In other words, each batter faced is not independent—what happens in one plate appearance has an effect on the next plate appearance, whether it be a change in approach, pitching out of the stretch, or fatigue for the pitcher. The same goes for innings pitched; each inning is not independent of the previous one.
On the other hand, each start is basically a new event for the pitcher. Sure, there might be some fatigue or changed approach carried over, but for the most part, the end of one start marks the end of that pitching performance. This is what pitcher wins gets right—in theory, it counts the starts that helped the team win and doesn't count the starts that didn't do this.
Getting back to rate stats vs. counting stats, the idea that games started is the most important unit for a starting pitcher means that as far as measuring pitching performance goes, we want to base our judgments on what a pitcher does per game, not per inning or per batter faced. There are certainly times (in fact, many or most times) in which those are better units of measurement, but when talking about the quality of starting pitching performance, we fundamentally care about what the pitcher does per game.
I'll admit, this train of thought was prompted not by the above questions, but by a quick look at the FanGraphs leaderboards for starting pitchers after the first couple days of the season. At first, the list isn't all that surprising: Yu Darvish, Jeff Samardzija, and Clayton Kershaw, each of whom pitched gems, lead the pack in fWAR.
However, as you move down the list, there are a few questionable entries. One pair stood out in particular:
Matt Harrison: 5.2 IP, 6 R, 5 ER, 9 K, 3 BB, 0 HR, 0.2 WAR
Stephen Strasburg: 7 IP, 0 R, 0 ER, 3 K, 0 BB, 0 HR, 0.2 WAR
At least in my view, Stephen Strasburg pretty clearly pitched a better game than Matt Harrison. He pitched over an inning more than Harrison while giving up six fewer runs (five fewer earned runs) and allowing six fewer baserunners. So why is Harrison (barely) ahead in WAR? Well, as most of you likely know, FanGraphs uses FIP to calculate WAR, and FIP counts only strikeouts, walks, and home runs.
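To see how the two lines above can come out so differently, here is a minimal sketch that computes ERA and FIP for each start. It uses the commonly published FIP formula, which also includes hit batters; the lines above don't list any, so the sketch assumes zero. The constant of 3.10 is only a placeholder, since the real value changes by season, and the function names are made up for illustration.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ("5.2" means 5 and 2/3) to outs recorded."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def era(earned_runs: int, ip: str) -> float:
    """Earned runs allowed per nine innings."""
    return 9 * earned_runs / (innings_to_outs(ip) / 3)


def fip(hr: int, bb: int, k: int, ip: str, hbp: int = 0, constant: float = 3.10) -> float:
    """Standard FIP formula; `constant` is an assumed placeholder value."""
    innings = innings_to_outs(ip) / 3
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Lines quoted above (hit batters assumed to be zero)
harrison = {"ip": "5.2", "er": 5, "k": 9, "bb": 3, "hr": 0}
strasburg = {"ip": "7", "er": 0, "k": 3, "bb": 0, "hr": 0}

for name, line in (("Harrison", harrison), ("Strasburg", strasburg)):
    line_era = era(line["er"], line["ip"])
    line_fip = fip(line["hr"], line["bb"], line["k"], line["ip"])
    print(f"{name}: ERA {line_era:.2f}, FIP {line_fip:.2f}")
# Harrison: ERA 7.94, FIP 1.51
# Strasburg: ERA 0.00, FIP 2.24
```

Under these assumptions, Harrison's nine strikeouts give him the lower single-game FIP even though he allowed five earned runs, which is consistent with the two starts landing so close together in FIP-based WAR.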
Now I'm not here to bash FIP—I think it's a great metric and over a season it's a great stat to use. However, it seems to miss the mark on an individual game level. Not only do we, as fans, care about non-HR hits and runs, but it seems ridiculous to discount them when evaluating a single pitching performance. If a pitcher gives up a bunch of line drive doubles and six runs, yet gives up no home runs and gets eight strikeouts and no walks, do we really want to say that he pitched a good game? I don't think so.
Here's the problem: if FIP doesn't work on an individual game level, then it can't really work on a larger scale. The reason it looks like it works is that those weird games where FIP overvalues or undervalues a performance tend to balance out in the end. For the most part, the flaws that we see on an individual basis don't appear when we aggregate many games together.
What do we conclude from this? Well, if we assume that each game started is basically independent of the previous one, all we need to do is figure out a way to evaluate each start on an individual basis, and then we can simply add up the scores for each start to determine the pitcher's value.
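The "score each start, then add them up" idea can be sketched in a few lines. The per-start scores below are invented, the scale is left unspecified, and nothing is adjusted for park or league, so treat this as an illustration of the bookkeeping and not as a finished metric.

```python
from statistics import mean


def season_summary(start_scores):
    """Return (total, average) for a list of per-start scores."""
    if not start_scores:
        return 0.0, 0.0
    return float(sum(start_scores)), float(mean(start_scores))


# Hypothetical per-start scores for two pitchers
six_starts = [60, 55, 48, 62, 57, 51]
two_starts = [57, 54]

print(season_summary(six_starts))  # (333.0, 55.5)
print(season_summary(two_starts))  # (111.0, 55.5)
```

Both made-up pitchers average the same score per start, but the total separates them, which is the counting-stat side of the question: how much did the pitcher contribute over all of his starts?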
That's where Game Score comes in. A metric developed by Bill James, Game Score does exactly what I just described: evaluates each start individually in order to rank its quality. While the original Game Score formula may be a bit imperfect, the idea is sound. If we can figure out a good way to value each start individually, we can use that to value a season's worth of pitching.
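For reference, here is a minimal sketch of the original Bill James formula: start at 50, add a point per out, two points per inning completed after the fourth, and a point per strikeout, then subtract two per hit, four per earned run, two per unearned run, and one per walk. The two starts scored below are hypothetical (the first mirrors the "line drive doubles" scenario from earlier), and the function name is just for illustration.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Original Bill James Game Score for a single start."""
    innings_completed = outs // 3
    score = 50
    score += outs                               # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)  # +2 per inning completed after the 4th
    score += strikeouts                         # +1 per strikeout
    score -= 2 * hits                           # -2 per hit allowed
    score -= 4 * earned_runs                    # -4 per earned run
    score -= 2 * unearned_runs                  # -2 per unearned run
    score -= walks                              # -1 per walk
    return score


# Hypothetical start: 6 IP, 10 H, 6 ER, 0 BB, 8 K
print(game_score(outs=18, hits=10, earned_runs=6, unearned_runs=0, walks=0, strikeouts=8))  # 36

# Hypothetical gem: 7 IP, 3 H, 0 R, 0 BB, 3 K
print(game_score(outs=21, hits=3, earned_runs=0, unearned_runs=0, walks=0, strikeouts=3))  # 74
```

Because hits and runs enter the formula directly, the eight-strikeout, no-walk start that gives up six runs lands well below 50, while the quieter scoreless outing scores far higher.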
How can we improve Game Score? Well, that's a great question, and one that I don't know how to answer right now. First of all, I'd encourage you to check out a post by Tom Tango at FanGraphs a few years ago, in which he introduced some possible versions of Game Score (and inspired great discussion in the comments). Tom also reintroduced the post yesterday on his site (which I was unaware of until halfway through this article), so you should check that out and provide some input.
I can't think of a good reason why we can't use Game Score to evaluate a starting pitcher's season-wide value or contribution. There may need to be some extra steps to account for park and league and scale the number to wins, but as a concept, this could be an interesting alternative to our current evaluation tools.
Matt writes for FanGraphs, Beyond the Box Score, and the Hardball Times. You can contact him via Twitter @MRHBaseball or email.