About a year ago, I pleaded for baseball analysts (particularly fantasy analysts) to stop using FIP in forward-looking analysis (and then expanded upon my point a month later). In the year since, FIP has largely (and rightfully) been replaced by xFIP.
Despite the strides that have been made in the use of FIP and xFIP, you will still occasionally see them misunderstood or used incorrectly. I don’t pen an article every time this happens, because I think that I (and others) have covered it pretty fully in the past. A couple of readers, though, pointed out an article published earlier this week at ESPN Fantasy, and since it has received some attention, I figured I would comment on it (Tom Tango also responded to it here).
In the article, A.J. Mass is critical of FIP and tries to show how it may not be all that it’s cracked up to be, but his methods are flawed:
“Proponents of FIP would have us believe that if a pitcher’s ERA is far lower than his FIP, we should expect a regression the following season. Similarly, if a pitcher has a higher ERA than FIP, then he was probably more unlucky than anything else, and due for a bounce-back campaign. So how does that play so far in 2010? Let’s go to the leaderboard and see:
[2010 ERA Leaderboard graphic]
“Certainly the season is not over yet, but even though every single one of these current ERA leaders who pitched in the majors last year has a lower ERA in 2010, only four were ‘predicted’ to do so.”
This analysis is riddled with selection bias. By looking only at the league leaders in ERA, you’re guaranteeing that the vast majority will have an ERA lower than their FIP. Why? Because they’re overperforming! They’re statistical outliers in a small sample. Are we really expecting Jaime Garcia to post a 1.32 ERA or Ubaldo Jimenez to post an ERA under 1.00? I certainly hope not.
Nobody’s skills are that good, so how could we possibly expect FIP to say that they are? By no reasonable standard could we have predicted any of these pitchers to have an ERA as low as the one he currently has. Reject fielding independent pitching stats if you want (you’re welcome to, though you’ll still be wrong), but don’t do so on the grounds that they can’t predict Livan Hernandez posting a 2.22 ERA through June 10, unless you can show me a method that can.
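To make the selection-bias point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption for illustration: the number of pitchers, the spread of skill, and the size of the early-season noise are invented round numbers, not fitted to real data, and FIP is treated as a perfect stand-in for skill, which it is not. The only thing it is meant to show is the mechanism: sort by ERA, and the top of the list fills up with pitchers whose ERA is below their FIP.

```python
import random


def simulate_leaderboard(n_pitchers=150, top_n=10, seed=0):
    """Return (share of ERA leaders with ERA < FIP, share of all pitchers with ERA < FIP).

    All distribution parameters below are illustrative assumptions.
    """
    rng = random.Random(seed)
    pitchers = []
    for _ in range(n_pitchers):
        fip = rng.gauss(4.20, 0.60)       # assumed spread of underlying skill
        era = fip + rng.gauss(0.0, 1.00)  # assumed early-season luck/defense noise
        pitchers.append((era, fip))

    # Sorting tuples sorts by ERA first, so the first top_n are the ERA leaders.
    leaders = sorted(pitchers)[:top_n]
    leader_share = sum(1 for era, fip in leaders if era < fip) / top_n
    overall_share = sum(1 for era, fip in pitchers if era < fip) / n_pitchers
    return leader_share, overall_share


if __name__ == "__main__":
    leader_share, overall_share = simulate_leaderboard()
    print(f"ERA leaders beating their FIP: {leader_share:.0%}")
    print(f"All pitchers beating their FIP: {overall_share:.0%}")
```

Across the whole simulated league, about half the pitchers beat their FIP, as you would expect when the noise is symmetric. Among the ERA leaders, nearly all of them do, even though nothing about FIP is "broken" in this toy world.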
Also, as a minor point, we should note that FIP isn’t a projection. It’s not necessarily a predictive stat, though it is often treated as such because it is more predictive than ERA.
“I suppose one could argue that Livan Hernandez could well finish this year with an ERA of 4.00 and satisfy both the current prediction that he’s due for a regression this season, as well as the prediction that he would better his 5.44 ERA from 2009, and use that as ‘proof’ that FIP works. It seems to me, however, that this particular use of FIP is misguided.
“Let’s say a pitcher’s FIP does indicate that his ERA ‘should’ be lower than it is, because his defense has let him down. Well, if his defense isn’t changing, meaning he’s going to continue to have the same basic starting lineup behind him for the rest of the season, then why should we expect a change in its impact on his ERA?”
Because, like anything else, we’re looking at a finite sample. In two months of a season, we can’t say with absolute certainty how good (or bad) a particular defense is, and it certainly won’t manifest itself in 75 innings for a particular pitcher. Considering how unstable BABIP is (the primary way in which defense manifests itself in a pitcher’s line), expecting a pitcher’s BABIP through June 10 to match his BABIP at the end of the season is misguided.
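For a rough sense of how noisy BABIP is over that kind of sample, here is a back-of-the-envelope sketch in Python. It assumes each ball in play is an independent coin flip with a fixed hit probability (a simplification), that the pitcher's "true" BABIP is .300, and that 75 innings works out to roughly 225 balls in play (about three per inning). Those numbers are my own round assumptions, not measured values.

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Binomial standard error of observed BABIP, assuming independent balls in play."""
    return math.sqrt(true_babip * (1.0 - true_babip) / balls_in_play)


if __name__ == "__main__":
    # Assumed inputs: .300 true BABIP, ~225 balls in play in 75 innings.
    se = babip_standard_error(0.300, 225)
    print(f"One standard deviation of BABIP noise: {se:.3f}")
```

Under those assumptions, one standard deviation is about 30 points of BABIP, so a pitcher could sit well above or below his defense's "true" level through June on chance alone.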
This argument becomes even more absurd when you realize that pitchers on the same team rarely post identical BABIPs. By this article’s logic, though, they should, since they have the same defense behind them. That, however, is simply not the case. For example, Wandy Rodriguez currently has a .354 BABIP while teammate Roy Oswalt is rocking out to a .278 figure.
Also, this statement fails to realize that FIP does more than just strip out the effects of defense. It also helps to eliminate luck, meaning simple random variation. There is simply too much that happens when a cylindrical bat meets a spherical ball that will land somewhere in a 100,000-square-foot playing area for BABIP to perfectly encapsulate the concept of “defense.” Fielding independent stats remove both the portion of BABIP that is defense and the portion that is luck.
Continuing on …
“The following table [2010 FIP Trailers] shows us the pitchers who have the least control of their own fates. As such, they fall victim to bad breaks and balls hit just out of the reach of a diving outfielder far more than the previous list [2010 FIP Leaders]. Some of these guys may indeed simply be bad. Others, like Ian Kennedy and his .229 batting average against, might be in for a rude awakening as the summer drags on.”
More to the point than having “the least control of their own fates,” these pitchers are simply bad. They’re prone to “bad breaks” to the same extent that good pitchers are; it’s just that they allow more balls in play that they can get lucky or unlucky on. They don’t “fall victim to … balls hit just out of the reach of a diving outfielder” more than the pitchers with good FIPs. Once a ball is put in play, there is minimal difference between one allowed by a good pitcher and one allowed by a bad pitcher (as judged by FIP). It’s left up to the fielder and to chance in both cases.
Finally, because Mass uses FIP instead of xFIP, he draws an incorrect conclusion about Ian Kennedy: that he is pitching poorly. Sure, his 3.17 ERA is unsustainably low, but his 4.30 xFIP is a half-run better than his 4.80 FIP and still plenty valuable in deep mixed and NL-only leagues.
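For newer readers who haven't seen the formulas, here is a short Python sketch of FIP and xFIP in their commonly published form. The additive constant (3.20 here) changes from season to season, and the league home-run-per-fly-ball rate (a round 10 percent here) varies too, so treat both defaults as assumptions. The stat line in the example is invented for illustration and is not Kennedy's actual line.

```python
def fip(hr, bb, hbp, k, ip, constant=3.20):
    """FIP in its commonly published form; `constant` is an assumed league constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, league_hr_per_fb=0.10, constant=3.20):
    """xFIP: same as FIP, but actual home runs are replaced by fly balls times
    an assumed league-average HR/FB rate."""
    expected_hr = fb * league_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


if __name__ == "__main__":
    # Hypothetical pitcher: 80 IP, 70 K, 30 BB, 2 HBP, 12 HR allowed on 90 fly balls.
    print(f"FIP:  {fip(hr=12, bb=30, hbp=2, k=70, ip=80):.2f}")
    print(f"xFIP: {xfip(fb=90, bb=30, hbp=2, k=70, ip=80):.2f}")
```

With this invented line, FIP comes out to 4.60 and xFIP to about 4.11. The entire gap comes from the home run term: the pitcher has allowed 12 homers where a league-average rate on his fly balls would predict 9. Note also that neither formula has any input for hits on balls in play.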
That’s all for this week. I’m sure this is review for many of you, but I know that THT Fantasy has welcomed a lot of new readers since I last discussed FIP a year ago, so I thought it’d be a good idea to go over some of the misconceptions and misapplications of it. If you have any questions, as always, feel free to let me know.