In my last article, I introduced a concept called “contact quality,” which may indicate whether a pitcher can influence whether batted balls are easier or harder to field than expected. Contact Quality Average (CQA) is calculated by subtracting the pitcher’s expected batting average on balls in play (BABIP) from the BABIP predicted by David Pinto’s Probabilistic Model of Range (PMR). Since both approaches remove the effect of the defense by looking at league average conversion rates, CQA should also be defense independent.
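As a quick illustration of the definition above, here is a minimal sketch of the subtraction. The function name and the sample BABIP figures are hypothetical, chosen only to show the arithmetic; under this reading, a positive CQA would mean PMR expects more hits on the pitcher's balls in play than the expected BABIP does.

```python
def contact_quality_average(pmr_babip: float, expected_babip: float) -> float:
    """Return CQA: the PMR-predicted BABIP minus the pitcher's expected BABIP."""
    return pmr_babip - expected_babip


# Hypothetical pitcher: PMR predicts a .305 BABIP, the expected BABIP is .298.
example_cqa = contact_quality_average(0.305, 0.298)
print(f"CQA = {example_cqa:+.3f}")
```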
But merely being defense independent is not enough to imbue CQA with any importance. It needs to demonstrate that it improves the way we measure performance—either looking backward at value, or looking forward as a predictive tool.
When considering a pitcher’s value independent of defense, most people look at Defense Independent Pitching Statistics (DIPS), first formulated by Voros McCracken. The work has gone through many iterations since then, including a version known as DIPS 3.0 that David Gassko unveiled in 2005.
Actually, Gassko released DIPS 3.0 twice, using different methods each time. The first approach used a regression model to identify the relative contribution of different batted ball types to the current year’s ERA. His second attempt translated a pitcher’s actual performance into expected performance based on which batted ball types a pitcher can control and league average conversion rates. The latter approach seems to be more theoretically sound, but both do a fairly good job of predicting a pitcher’s ERA. According to Gassko, the regression method had an “r” value of 0.8, signifying a strong correlation with actual ERA. He doesn’t provide a value for the translation approach, but one can assume it achieves results at least as good as his first attempt.
Since CQA captures information that cannot be inferred simply from batted ball types, it seems like a good candidate to add to the DIPS 3.0 mix to see whether the predictive quality can be improved. Only the regression approach to figuring DIPS 3.0 allows for the addition of another variable, so that was the path I followed with CQA.
When Gassko first attempted his linear regression in 2005, he used data from 2004, which included fewer batted ball types than are recorded today. Because of changes in the scorekeeping methodology, it’s not clear whether something that was classified as a line drive in 2004 would, in 2008, be classified as a line drive, a fly ball, or something called a fliner.
So although DIPS 3.0 produced a formula to estimate ERA based on batted ball types, it seemed unlikely that the results would hold up with 2008 data.
Rather than use Gassko’s formula, I repeated his approach and regressed 2008 ERA on batted ball types, strikeouts, walks and hit by pitches. Since my batted ball counts for figuring CQA left out home runs, I added those back in for the regression. My sample was the 140 pitchers who had the requisite 300 balls in play to qualify for PMR. I found that only fly balls, ground balls, line drives, strikeouts and walks were significant predictors of ERA (at either the 95 percent or 99 percent confidence level), so I re-ran the regression with only those variables.
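For readers who want to try a fit of this kind, the following is a rough sketch rather than the procedure actually used here. It assumes the target is earned runs and that there is no intercept, which is what the shape of the resulting formula suggests but is not stated outright. The data is synthetic, generated from known weights, so the only thing it demonstrates is that the fit recovers those weights. All function names are illustrative.

```python
import random


def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_no_intercept(rows, targets):
    """Least-squares weights with no intercept, via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic sample of 140 pitchers: columns are GB, FB, LD, SO, BB counts.
rng = random.Random(0)
true_weights = [0.075, 0.239, 0.452, -0.130, 0.401]
rows = [
    [rng.randint(150, 350), rng.randint(120, 300), rng.randint(60, 160),
     rng.randint(60, 250), rng.randint(20, 100)]
    for _ in range(140)
]
# Noise-free "earned runs" built from the known weights (an assumption).
targets = [sum(w * x for w, x in zip(true_weights, row)) for row in rows]

fitted = fit_no_intercept(rows, targets)
print([round(w, 3) for w in fitted])
```

With real data the targets would carry noise, and a statistics package would also be needed to report the significance levels mentioned above.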
The regression found the following coefficients for each of the independent variables:
Ground ball: 0.075
Fly ball: 0.239
Line drive: 0.452
Strikeout: -0.130
Walk: 0.401
These results are better expressed in the following formula:
9 * (0.075*GB + 0.239*FB + 0.452*LD - 0.130*SO + 0.401*BB) / IP
This equation achieved an “r” value of 0.84, which is slightly better than the 0.8 Gassko found in his study, and slightly better than the result of applying the 2004 formula to the 2008 data (also 0.8).
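To make the batted ball formula concrete, here is a small sketch that applies the five coefficients to one stat line. The function name and the sample pitcher's totals are invented for illustration. Fly balls here include home runs, as in the regression.

```python
def estimate_era(gb: int, fb: int, ld: int, so: int, bb: int, ip: float) -> float:
    """Estimate ERA from batted ball types, strikeouts and walks."""
    weighted_events = (
        0.075 * gb      # ground balls
        + 0.239 * fb    # fly balls, home runs included
        + 0.452 * ld    # line drives
        - 0.130 * so    # strikeouts lower the estimate
        + 0.401 * bb    # walks
    )
    # Scale the weighted total to a nine-inning rate.
    return 9 * weighted_events / ip


# Hypothetical pitcher: 280 GB, 220 FB, 120 LD, 180 SO, 60 BB in 200 innings.
print(round(estimate_era(280, 220, 120, 180, 60, 200.0), 2))
```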
The batted ball approach is quite effective in predicting the current year’s ERA. Can it be made any better by including CQA?
Following the same process as before, but including CQA this time, produced the following coefficients, where CQ Hits (CQH) is CQA multiplied by balls in play, which converts the rate into a number of hits:
Ground ball: 0.071
Fly ball: 0.225
Line drive: 0.458
Strikeout: -0.127
Walk: 0.380
CQ Hits: 0.049
The resulting formula is:
9 * (0.071*GB + 0.225*FB + 0.458*LD - 0.127*SO + 0.380*BB + 0.049*CQH) / IP
This formula did do a little better in predicting the current year’s ERA, but the improvement was minuscule, raising the “r” value by less than 0.01.
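The CQA-adjusted formula can be sketched the same way. The function names and the sample inputs below are hypothetical. Balls in play is passed separately because the CQA counts leave out home runs, while the fly ball total in the regression includes them.

```python
def contact_quality_hits(cqa: float, balls_in_play: int) -> float:
    """Convert CQA (a rate) into CQ Hits by multiplying by balls in play."""
    return cqa * balls_in_play


def estimate_era_with_cqa(gb: int, fb: int, ld: int, so: int, bb: int,
                          cqa: float, balls_in_play: int, ip: float) -> float:
    """Estimate ERA from batted ball types, strikeouts, walks and CQ Hits."""
    cqh = contact_quality_hits(cqa, balls_in_play)
    weighted_events = (
        0.071 * gb
        + 0.225 * fb
        + 0.458 * ld
        - 0.127 * so
        + 0.380 * bb
        + 0.049 * cqh   # small weight relative to the other terms
    )
    return 9 * weighted_events / ip


# Hypothetical pitcher: same stat line as before, CQA of +.010 on 600 balls in play.
print(round(estimate_era_with_cqa(280, 220, 120, 180, 60, 0.010, 600, 200.0), 2))
```

For this made-up line, the six CQ Hits move the estimate by roughly a hundredth of a run, which is in keeping with the tiny gain in correlation.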
So it’s fair to say that there’s no reason to consider CQA above and beyond normal batted ball types when using the initial conception of DIPS 3.0 to measure pitcher value. It also doesn’t appear to add much additional information in predicting the next season’s ERA, although I wasn’t able to get enough data to do a full comparison.
Although CQA doesn’t seem to be a helpful way of identifying which pitchers were helped by batted balls that were easier to field, I still believe there’s work to be done in this area.
Perhaps more careful analysis of pitch speed, which explains nearly 70 percent of the variation in CQA, might lead to some breakthroughs. An even more fruitful set of data becomes available with the advent of HITf/x in 2009, which will provide batted ball speed and trajectory upon contact.
With that information and more detail on fielder positioning and other data items that are just beginning to be captured, we’re working toward finally solving the riddle of DIPS—exactly how much influence do pitchers really have over balls in play?