Ever since Voros McCracken wrote his treatise on DIPS (Defense Independent Pitching Statistics), baseball statisticians have been racing to find chinks in the theory’s armor. DIPS theory, which states that pitchers have little control over whether a ball put into play becomes a hit, has been shown to work pretty well overall, but it has also become clear that it does not apply consistently to every pitcher.
For example, in his second version of DIPS, McCracken adjusted for the fact that knuckleballers tend to allow lower batting averages on balls in play (BABIP) than the average pitcher, and that lefties tend to allow more hits on balls in play. Tom Tippett showed that good pitchers allow fewer hits on balls in play than their teammates, and JC Bradbury has done a lot of work with this as well.
Over the past couple of weeks, comments by esteemed baseball statisticians Mike Emeigh and Tangotiger have inspired me to look further into cases in which DIPS theory breaks down.
Does a High BABIP Indicate Something More than Bad Luck?
“What I am suggesting is this:
1. DIPS works well for identifying flukish “good” seasons based on a low BABIP (Joe Mays, 2001).
2. You need more information to determine whether a “bad” season based on a high BABIP is flukish (Randy Johnson, 2003) or legitimately bad. Without such information, you’re usually better off assuming that it was legitimately bad …”
—Mike Emeigh (June 19, 2006)
The question here is pretty simple. Essentially, what Mike is stating is that pitchers who allow a large number of hits on balls in play are more likely to maintain a high BABIP than pitchers who allow few hits on balls in play are to maintain a low BABIP.
There are a few possible explanations here: (1) The pitcher is injured, and easier to hit because of it; (2) The pitcher is done and his stuff is more easily hittable; (3) The pitcher is simply not cut out to be a major leaguer. As Clay Davenport has demonstrated, guys with high BABIPs in the minors generally don’t make it to the major leagues, so if one slips through the cracks, he will probably have a high BABIP in the major leagues as well, and get sent down soon enough.
But is this true? Do pitchers who allow a lot of hits on balls in play in fact perform worse the next season than they might be expected to?
To examine this question, I looked at all pitchers who faced at least 200 batters in both 2004 and 2005. I split my sample of 153 pitchers into three groups: The pitchers with the 25 best BABIPs in 2004, those with the 25 worst, and the rest.
The pitchers who were best at preventing hits on balls in play in 2004 had a composite BABIP of .252 that year, versus an average of .291 for the whole sample. In 2005, they regressed to a BABIP of .282, versus an average of .288, so this group saw an advantage of only six points of BABIP in the next year.
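To make the grouping concrete, here is a minimal Python sketch of the split and of a composite BABIP. It assumes that "composite" means pooled (total hits on balls in play divided by total balls in play), which is not spelled out above. The PitcherSeason class, the function names, and the five pitchers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    hits_in_play: int   # hits allowed on balls in play (home runs excluded)
    balls_in_play: int  # batted balls that stayed in the park

    @property
    def babip(self) -> float:
        return self.hits_in_play / self.balls_in_play


def composite_babip(pitchers):
    """Pooled BABIP: total hits on balls in play over total balls in play."""
    hits = sum(p.hits_in_play for p in pitchers)
    bip = sum(p.balls_in_play for p in pitchers)
    return hits / bip


def split_by_babip(pitchers, n=25):
    """Return (lowest-n, middle, highest-n) groups ranked by BABIP."""
    ranked = sorted(pitchers, key=lambda p: p.babip)
    return ranked[:n], ranked[n:len(ranked) - n], ranked[len(ranked) - n:]


# Hypothetical five-pitcher sample, split with n=1 instead of n=25.
sample = [
    PitcherSeason("A", 100, 400),
    PitcherSeason("B", 120, 410),
    PitcherSeason("C", 125, 420),
    PitcherSeason("D", 90, 300),
    PitcherSeason("E", 140, 420),
]
low, middle, high = split_by_babip(sample, n=1)
print(low[0].name, high[0].name, round(composite_babip(middle), 3))
```

Pooling weights each pitcher by his balls in play, so a 600-BIP starter counts for more than a 150-BIP reliever. A simple average of individual BABIPs would be the other obvious choice.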
This seems like as good a point as any to make an interjection. What would the expected BABIP for this group of pitchers be in 2005, given their BABIPs in 2004? To answer that question, we need something called regression to the mean. Essentially, regression to the mean tells us how far to move a player’s numbers toward the average in any given statistic.
Regression to the mean is determined by looking at correlation (sometimes known as “r”). For example, the year-to-year correlation for BABIP in our sample is .143, which means we need to regress the average pitcher 86% of the way to the mean. The average pitcher in this sample allowed 411 balls in play, so if we solve for the constant that makes r equal .143 at 411 balls in play, we can construct our regression formula, which is:
r = BIP/(BIP + 2,462)
Regression to the mean is equivalent to 1-r.
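The arithmetic behind that constant can be sketched in a few lines of Python. The function names are mine, not part of the study. Note that plugging in the rounded inputs (.143 and 411) gives a constant within a point or so of 2,462, and the small gap presumably comes from rounding.

```python
def regression_constant(mean_bip, r):
    """Solve mean_bip / (mean_bip + k) = r for the constant k."""
    return mean_bip * (1 - r) / r


def reliability(bip, k):
    """The article's r = BIP / (BIP + k) for a pitcher with `bip` balls in play."""
    return bip / (bip + k)


# Constant implied by the rounded figures quoted in the text.
k = regression_constant(411, 0.143)
print(f"implied constant: {k:.0f}")

# How much weight a pitcher's own BABIP gets at different workloads,
# using the article's constant of 2,462.
for bip in (200, 411, 800):
    r = reliability(bip, 2462)
    print(f"BIP={bip}: r={r:.3f}, regress {1 - r:.0%} toward the mean")
```

The more balls in play a pitcher has allowed, the larger r gets and the less his BABIP is pulled toward the average.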
So if we regress each pitcher’s BABIP separately, we find that the guys with the 25 lowest BABIPs in 2004 were expected to have a BABIP of .284 in 2005, versus an actual figure of .282. Pretty close.
Now what about the high-BABIP pitchers? In 2004, they combined to post a .324 BABIP, which translates into an expected BABIP of .294 in 2005. The actual figure was .297, which is again pretty close.
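Regressing each pitcher separately and then combining the results might look like the sketch below. Two things here are assumptions rather than details from the study: which league mean each pitcher is regressed toward (it is passed in as a parameter), and the use of each pitcher's balls in play as the weight when averaging the group. The two pitchers are invented.

```python
K = 2462  # regression constant from the formula r = BIP / (BIP + 2,462)


def expected_babip(babip, bip, league_mean, k=K):
    """Shrink one pitcher's BABIP toward league_mean by (1 - r)."""
    r = bip / (bip + k)
    return league_mean + r * (babip - league_mean)


def group_expected_babip(pitchers, league_mean, k=K):
    """BIP-weighted average of individually regressed BABIPs.

    `pitchers` is a list of (babip, balls_in_play) pairs.
    """
    total_bip = sum(bip for _, bip in pitchers)
    weighted = sum(
        expected_babip(babip, bip, league_mean, k) * bip
        for babip, bip in pitchers
    )
    return weighted / total_bip


# Hypothetical two-pitcher "high BABIP" group.
high_group = [(0.330, 500), (0.320, 300)]
print(round(group_expected_babip(high_group, league_mean=0.291), 3))
```

Because r differs by workload, regressing pitcher by pitcher does not give quite the same answer as regressing the group's composite BABIP once.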
From this we can conclude that Emeigh is wrong, and that high-BABIP pitchers tend to regress to the mean as much as the average pitcher. However, as some readers have probably noticed, there is a possible selection bias in this study. Let me explain.
By taking only pitchers from 2005 who faced at least 200 batters, I’m limiting my sample to pitchers who were trusted enough by management to pitch to 200 batters in 2005. Now let’s say that Emeigh is right, and there is indeed generally something wrong with pitchers with extraordinarily high BABIPs. In that case, they would probably continue to struggle in spring training and in the next year, and would likely not be allowed to rack up that many innings. So perhaps by looking only at pitchers that did face 200 or more batters in 2005, we’re eliminating the subset that Mike is talking about.
So what happens if we include all pitchers from 2005, while still employing the 200 BFP minimum for 2004? Our sample goes up to 208, and our year-to-year correlation also rises to .169. Our new regression formula is BIP/(BIP + 1,764).
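As a rough check on the new constant, the same formula can be inverted to see what average workload it implies. This is only a sketch using the rounded figures quoted above, and the function name is mine.

```python
def implied_mean_bip(r, k):
    """Solve bip / (bip + k) = r for bip."""
    return r * k / (1 - r)


# Average balls in play implied by r = .169 and a constant of 1,764.
mean_bip = implied_mean_bip(0.169, 1764)
print(f"implied average BIP: {mean_bip:.0f}")

# Weight given to a 411-BIP pitcher's own BABIP under each formula.
old_r = 411 / (411 + 2462)
new_r = 411 / (411 + 1764)
print(f"old r: {old_r:.3f}, new r: {new_r:.3f}")
```

The implied average is lower than the 411 balls in play of the first sample, which makes sense once pitchers with light 2005 workloads are included. The smaller constant also means every pitcher is regressed somewhat less than before.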
Among this group, the 25 pitchers with the best BABIPs in 2004 posted a collective BABIP of .283 in 2005, versus an expected BABIP of .284. One pitcher, Paul Abbott, did not pitch at all in ’05. In 2004, by the way, this group combined for a BABIP of .246.
What about the high-BABIP pitchers? In 2004, they had a collective BABIP of .332, which works out to an expected BABIP of .297 in 2005. They actually posted a .301 BABIP in ’05, only four points above expectation, which indicates that high-BABIP pitchers do not do appreciably worse than expected in the next season.
There is, however, some evidence that Mike is right. First of all, two of the high-BABIP pitchers were out of the league in 2005. More importantly, of the 23 remaining pitchers, seven posted BABIPs at least 30 points higher than expected in 2005. Only three of the 24 pitchers in the low-BABIP group posted a BABIP at least 30 points lower than expected in ’05. This suggests that a pitcher is much more likely to sustain a high BABIP than a low one, as Mike posited.
Nevertheless, the evidence is at best inconclusive.
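One way to gauge how inconclusive the seven-of-23 versus three-of-24 split is would be a one-sided Fisher exact test. This is not part of the original study, just a sketch. It treats each pitcher as an independent trial and asks how often a split at least this lopsided would turn up by chance if both groups were equally likely to stay extreme.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) for the 2x2 table [[a, b], [c, d]], margins fixed."""
    row1 = a + b          # size of the first group
    col1 = a + c          # total "stayed extreme" pitchers
    n = a + b + c + d     # all pitchers
    upper = min(row1, col1)
    favorable = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(n, row1)


# High-BABIP group: 7 of 23 stayed at least 30 points above expectation.
# Low-BABIP group: 3 of 24 stayed at least 30 points below expectation.
p = fisher_one_sided(7, 16, 3, 21)
print(f"one-sided p-value: {p:.3f}")
```

With these counts the p-value comes out a little above 0.1, so a gap this size is not especially surprising with only 47 pitchers.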
Does DIPS Apply Differently to Starters and Relievers?
“Is the DIPS phenomenon really about starters and relievers?”
—Tangotiger (June 10, 2006)
This certainly is a good question. Let’s take all pitchers from 2004 and 2005 with at least 200 BFP in both seasons (153 in all), and split them into four groups: starters, who started in at least 80% of their appearances in 2004; swingmen, who started in at least 20% (but less than 80%) of their appearances in 2004; closers, who started in less than 20% of their appearances in 2004 and also had at least 20 saves; and middle relievers, who started in less than 20% of their appearances in 2004 and had fewer than 20 saves.
Does each group demonstrate the same consistency in maintaining its BABIP from year to year? Surprisingly, no.
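Those four definitions translate directly into a small classifier. The function and argument names below are mine, and the inputs are a pitcher's 2004 appearances, starts, and saves.

```python
def classify_role(games, starts, saves):
    """Assign a role from one season's appearances, starts, and saves."""
    if games <= 0:
        raise ValueError("pitcher must have at least one appearance")
    start_share = starts / games
    if start_share >= 0.8:
        return "starter"
    if start_share >= 0.2:
        return "swingman"
    # Fewer than 20% of appearances were starts: split on saves.
    return "closer" if saves >= 20 else "middle reliever"


# Hypothetical stat lines: (games, starts, saves)
for line in [(33, 33, 0), (40, 12, 1), (65, 0, 38), (70, 0, 3)]:
    print(line, "->", classify_role(*line))
```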
The starters in our sample showed a year-to-year correlation of .17, and closers were a little bit stronger at .21. On the other hand, middle relievers and swingmen showed no consistency, displaying year-to-year correlations of .04 and -.07, respectively.
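For readers who want to run this kind of comparison themselves, a per-group year-to-year correlation can be sketched as follows. The rows are invented, the function names are mine, and the calculation is an ordinary unweighted Pearson correlation, which I am assuming is what "correlation" means here.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


def correlation_by_role(rows):
    """rows: (role, babip_year1, babip_year2) tuples -> {role: r}."""
    grouped = defaultdict(lambda: ([], []))
    for role, year1, year2 in rows:
        grouped[role][0].append(year1)
        grouped[role][1].append(year2)
    return {role: pearson_r(xs, ys) for role, (xs, ys) in grouped.items()}


# Hypothetical data, three pitchers per role.
rows = [
    ("starter", 0.280, 0.285), ("starter", 0.300, 0.295), ("starter", 0.310, 0.305),
    ("middle reliever", 0.270, 0.300), ("middle reliever", 0.320, 0.290),
    ("middle reliever", 0.295, 0.310),
]
for role, r in correlation_by_role(rows).items():
    print(role, round(r, 2))
```

With groups as small as the closers in a single pair of seasons, correlations like these carry wide error bars, so modest differences between groups should be read cautiously.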
Furthermore, if we take JC Bradbury’s thesis that we can predict a pitcher’s BABIP by using fielding-independent statistics, and apply it to these four groups, we find that while closers and starters (to a very, very small degree) are somewhat predictable, there is no correlation between the fielding-independent numbers of swingmen and middle relievers and their BABIP the next year.
What does this all mean? It seems that DIPS theory does not really apply to closers, and starting pitchers may also be somewhat out of its bounds. On the other hand, middle relievers and swingmen have very unpredictable BABIPs—in large part because of small sample sizes, but also probably because they simply have less control over balls in play.
What seems to be the conclusion, for now, is that good pitchers control the outcomes of their balls in play much more than mediocre ones.