Predictive ability of swing area
by Josh Weinstock
November 30, 2011
The internet baseball community loves to use plate discipline statistics. Measures like swing rate, in-zone rate, first-pitch strike rate, and many others are common in sabermetric analyses. A month ago, I introduced another metric called swing area, which you can read about here and here.
We like these metrics because they give us a glimpse into the actual batter-pitcher match-up. With most baseball statistics, we are stuck with the result of a plate appearance. Strikeouts, walks, home runs: these tell us nothing about what happened during the plate appearance. And while results are extremely important information, they feel somewhat detached from the actual baseball experience.
But what can be lost in our frequent usage of plate discipline metrics is how much useful information they actually tell us.
In the previous two swing area articles, I looked at what swing area tells us about batters and pitchers in the same year. This time I will look at what swing area, and other metrics, can tell us about the future. To test this, I first calculated swing area for 2010 pitchers with the same restriction as before: I only looked at pitchers who had thrown at least 1000 pitches.
To confirm previous results, I looked at the relationship between 2010 swing area and 2010 strikeout rate. This time, after again ignoring outliers, I found that 2010 swing area explains 18.5 percent of the variation in 2010 strikeout rate. With 2011 data, I found that swing area explained 14.5 percent. As we found before, O-swing explained significantly less of the variation in strikeout rate than did swing area.
If we look at the relationship between plate discipline metrics in year N-1 and the strikeout rate in year N, we find some pretty interesting results. I should also note here that I only looked at pitchers who threw at least 1000 pitches in both 2010 and 2011, so that I could calculate swing area for both seasons. I have also not controlled for aging or run environment.
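As a rough illustration of what "explains 18.5 percent of the variation" means, the sketch below computes R-squared for a one-variable least-squares fit. The function name and all of the numbers are made up for illustration; they are not the data behind the figures above.

```python
def r_squared(x, y):
    """R-squared of a simple least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation the fitted line does not capture.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation of y around its own mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical swing areas and strikeout rates (K/PA), not real pitchers.
swing_area = [2.1, 2.8, 3.0, 3.6, 4.2, 4.9]
k_rate = [0.15, 0.21, 0.16, 0.22, 0.19, 0.24]
print(round(r_squared(swing_area, k_rate), 3))
```

An R-squared of 0.185 would mean the fitted line removes 18.5 percent of the total squared variation in strikeout rate.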
Unsurprisingly, swing area does worse at predicting next year's strikeout rate than the same year's. However, it still does pretty well relative to similar metrics. The coefficient of swing area is .019. This means that for every one-unit increase in 2010 swing area, we can expect 2011 strikeout rate to rise by a little less than two percentage points. This is equivalent to the coefficient I found for 2011 swing area and 2011 strikeout rate.
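To make the coefficient concrete, here is a small worked calculation. It assumes strikeout rate is expressed as a fraction of plate appearances (K/PA), so that a slope of .019 corresponds to 1.9 percentage points per unit of swing area. The function name is hypothetical.

```python
COEF_AREA = 0.019  # slope of 2011 K/PA on 2010 swing area, as reported above

def expected_k_rate_change(delta_area, coef=COEF_AREA):
    """Expected change in next-year K/PA for a given change in swing area."""
    return coef * delta_area

change = expected_k_rate_change(1.0)
print(f"{change * 100:.1f} percentage points")  # prints "1.9 percentage points"
```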
O-swing is entirely worthless at predicting next year's strikeout rate; a regression of 2011 strikeout rate on 2010 O-swing yields an R-squared of zero, and O-swing does not even approach significance (p-value = 0.4 for a one-sided test, 0.8 for a two-sided test).
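The two p-values quoted above are linked: for a symmetric test statistic, the two-sided p-value is twice the one-sided value when the estimate points in the hypothesized direction. The sketch below shows this with a standard normal approximation. That approximation is an assumption made here for simplicity; a regression would normally use a t distribution, and the function name is illustrative.

```python
from statistics import NormalDist

def p_values(z):
    """One-sided (upper-tail) and two-sided p-values for a z statistic."""
    std_normal = NormalDist()
    one_sided = 1 - std_normal.cdf(z)
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    return one_sided, two_sided

# Pick the z statistic whose one-sided p-value is 0.4, matching the text.
z = NormalDist().inv_cdf(0.6)
one_sided, two_sided = p_values(z)
print(round(one_sided, 3), round(two_sided, 3))
```

Either way, a statistic this close to zero is far from any conventional significance threshold.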
But if we have plate discipline metrics from 2010, we also know strikeout rate from 2010. Do these metrics give us any information that the previous year's strikeout rate does not?
If I run a regression of 2011 strikeout rate on 2010 strikeout rate and 2010 swing area, I find that swing area is no longer statistically significant. I find the same result for O-swing. In other words, these plate discipline metrics are not useful in predicting the next year's strikeout rate if we already know the previous year's strikeout rate.
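A minimal sketch of this kind of two-predictor regression follows, using the normal equations in plain Python. The data are invented so that next-year strikeout rate depends only on the previous year's strikeout rate. With that setup the fitted swing area coefficient comes out at essentially zero, which mirrors the pattern described above without reproducing the actual analysis. The helper names are hypothetical, and a real significance test would also need standard errors, which this sketch omits.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: 2011 K/PA is built from 2010 K/PA alone.
k_2010 = [0.14, 0.17, 0.19, 0.22, 0.25, 0.28]
area_2010 = [2.0, 3.1, 2.6, 3.9, 3.4, 4.8]
k_2011 = [0.05 + 0.7 * k for k in k_2010]

intercept, coef_k, coef_area = ols([k_2010, area_2010], k_2011)
print(round(coef_k, 3), abs(round(coef_area, 3)))
```

Swing area is correlated with strikeout rate in this toy data, so it would look predictive on its own; once the previous year's strikeout rate is in the model, it has nothing left to add.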
K/PA = strikeouts per plate appearance
whiff = whiffs / pitch
contact = 1 - (whiffs / swings)
swing = swing rate
zone = rate at which pitches are thrown in the strike zone. I have used two strike zones here for left-handed and right-handed batters, based on Mike Fast's research.
BB/PA = walks per plate appearance
fip = fielding independent pitching
oswing = percentage of pitches outside the strike zone that the batter swings at
rv100 = linear-weights-based metric that multiplies a pitcher's average run value per pitch by 100
babip = batting average on balls in play
area = swing area
Year-to-year correlation, 2010 to 2011:
K/PA      .73
whiff     .73
contact   .73
swing     .68
zone      .68
BB/PA     .64
fip       .48
oswing    .45
rv100     .44
babip     .31
area      .28

I should first restate some limitations. Again, these are only for pitchers who threw at least 1000 pitches in both 2010 and 2011. The data also include both relievers and starters, which is likely artificially increasing the correlation of more than a few of these metrics.
Unsurprisingly, strikeout rate is very stable from year to year. Disappointingly though, swing area is not very stable from year to year, and is less stable than O-swing.
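For readers who want to check stability themselves, a year-to-year correlation is simply the Pearson correlation between each pitcher's value in one season and the same pitcher's value in the next. The sketch below assumes that definition; the function name and the five pitcher values are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical K/PA for the same five pitchers in consecutive seasons.
k_2010 = [0.15, 0.18, 0.21, 0.24, 0.27]
k_2011 = [0.17, 0.16, 0.22, 0.23, 0.26]
print(round(pearson(k_2010, k_2011), 2))
```

The pairing matters: each position in the two lists must refer to the same pitcher.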
This can lead us to infer that swing area is subject to more noise than O-swing. Does this mean that it's less of a skill? Probably. Of course some of the low year-to-year correlation for swing area can likely be attributed to kinks in the calculation method, which I'm sure can be improved so that it does not exaggerate the swing areas of pitchers with data problems and outliers.
Also surprising is the year-to-year correlation for BABIP, which is higher than I expected. I'm sure the correlation is significantly inflated by the fact that I have both relievers and starters in the sample, and relievers typically demonstrate lower BABIPs than starters by about 17 percent.
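The inflation from mixing pitcher roles can be shown with a toy example. In the sketch below, BABIP has no year-to-year correlation within either group, yet the pooled sample shows a clear positive correlation purely because the two groups sit at different levels. All values, including the size of the gap between groups, are invented for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)

# Deviations chosen so year 1 and year 2 are uncorrelated within a group.
dev_year1 = [-0.01, 0.01, -0.01, 0.01]
dev_year2 = [-0.01, -0.01, 0.01, 0.01]

# Hypothetical group baselines: starters higher than relievers.
starters_y1 = [0.300 + d for d in dev_year1]
starters_y2 = [0.300 + d for d in dev_year2]
relievers_y1 = [0.270 + d for d in dev_year1]
relievers_y2 = [0.270 + d for d in dev_year2]

within = pearson(starters_y1, starters_y2)
pooled = pearson(starters_y1 + relievers_y1, starters_y2 + relievers_y2)
print(abs(round(within, 3)), round(pooled, 3))
```

Computing the correlations separately for starters and relievers is one way to see how much of a pooled figure reflects role rather than individual skill.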
Still so much to explore. Why is swing area so much more useful for pitchers than hitters? Why is swing area much more useful than O-swing in predicting strikeout rates in both year N and year N-1, but less stable? And in more general terms, are we placing too much importance on plate discipline stats? These metrics seem to have use if we want to create a narrative, but they are not very helpful when we want to make predictions.
References and Resources
PITCHf/x data from MLBAM via Darrel Zimmerman's pbp2 database. Thanks to Lucas Apostoleris for coming up with the idea behind swing area.
You can read more of Josh's work at FanGraphs, Beyond the Box Score, and itsaboutthemoney.net. Josh welcomes discussion through email and twitter. You can reach him at josh82093 at gmail dot com and on Twitter @J__Stock (two underscores).