A few weeks ago, fellow THTer Lucas Apostoleris approached me with an interesting idea for evaluating plate discipline. Using O-swing (the percentage of pitches outside the strike zone at which a batter swings), we know how often the batter is chasing. But what we don’t know, and what may be a very useful piece of information, is how far the batter is willing to chase outside the zone.
This is because O-swing is binary. Whether the batter is chasing a pitch two inches off the corner or one that bounces before home plate is ignored.
The implicit assumption that all chases are equivalent helps to create a simple and powerful metric, but one that is flawed. Of course, we can try different variations of O-swing, such as a model with three outcomes instead of two, but we are still bound by the same flaws.
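To make the binary nature of O-swing concrete, here is a minimal Python sketch. The rectangular zone boundaries and the pitch data are made up for illustration; they are not the zone definitions or data used in this article.

```python
# Hypothetical rectangular strike zone, in feet, from the catcher's view.
# These boundaries are illustrative assumptions only.
ZONE = {"x_min": -0.83, "x_max": 0.83, "z_min": 1.5, "z_max": 3.5}


def in_zone(x, z, zone=ZONE):
    """Return True if the pitch location (x, z) falls inside the zone box."""
    return zone["x_min"] <= x <= zone["x_max"] and zone["z_min"] <= z <= zone["z_max"]


def o_swing(pitches, zone=ZONE):
    """Share of out-of-zone pitches that drew a swing.

    pitches: iterable of (x, z, swung) tuples, with locations in feet.
    """
    outside = [swung for x, z, swung in pitches if not in_zone(x, z, zone)]
    if not outside:
        return float("nan")
    return sum(outside) / len(outside)


# Made-up pitches: (horizontal location, height, swung?)
# Batter A chases a pitch about two inches off the edge of the plate.
near_miss = [(1.0, 2.5, True), (1.0, 2.5, False), (0.0, 2.5, True)]
# Batter B chases a pitch in the dirt.
in_dirt = [(1.0, 0.0, True), (1.0, 0.0, False), (0.0, 2.5, True)]

print(o_swing(near_miss), o_swing(in_dirt))
```

Both hypothetical batters receive the same O-swing, even though one chase is far more extreme than the other.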
An important restriction is that these metrics live and die with the accuracy of their definitions of the strike zone, the arbitrary box that defines strikes and balls. Of course, right now we are pretty confident in our knowledge of the called strike zone, but we are not 100 percent confident. A metric that measures plate discipline* without relying on a strike zone definition may provide new information.
*Of course, we can’t actually measure plate discipline, because that would involve knowing what pitches the batter actually wants to hit. This is analogous to our inability to measure command; we don’t know where the pitcher wants to throw the ball, we can only infer based on his pitch locations. However, we can use various metrics like walk rate and O-swing as reasonable proxies for plate discipline.
After discussion with Lucas, I calculated the swing area for each batter who saw at least 1,000 pitches in 2011. I did this by finding the area of the 22.4 percent swing contour for each batter. I derived the predicted swing rates by fitting a logistic regression for each batter, with the presence of a swing as the dependent variable and pitch location as the independent variable (with smoothing).
Why 22.4 percent? That’s half the average swing rate, which is admittedly somewhat arbitrary. The 22.4 percent swing contour is actually pretty expansive, representing all pitch locations where the batter is expected to swing at least 22.4 percent of the time. Here is the league average 22.4 percent swing contour for right-handed batters:
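Once a model of swing probability by location is available, the area inside a swing contour can be approximated on a grid. The sketch below is one way this might be done, not a reproduction of the actual calculation: the function `swing_prob` stands in for a fitted model, its coefficients are invented, and a plain quadratic logit replaces whatever smoothing was used in the real fit.

```python
import math


def swing_prob(x, z):
    """Hypothetical fitted model: swing probability at location (x, z), in feet.

    The coefficients are made up for illustration, not estimated from data.
    """
    logit = 1.5 - 1.0 * x ** 2 - 1.0 * (z - 2.5) ** 2
    return 1.0 / (1.0 + math.exp(-logit))


def contour_area(prob_fn, threshold=0.224,
                 x_range=(-3.0, 3.0), z_range=(-1.0, 6.0), step=0.02):
    """Approximate the area (square feet) where prob_fn >= threshold.

    Counts grid cells whose center meets the threshold and multiplies
    the count by the area of one cell.
    """
    nx = round((x_range[1] - x_range[0]) / step)
    nz = round((z_range[1] - z_range[0]) / step)
    cells = 0
    for i in range(nx):
        x = x_range[0] + (i + 0.5) * step
        for j in range(nz):
            z = z_range[0] + (j + 0.5) * step
            if prob_fn(x, z) >= threshold:
                cells += 1
    return cells * step * step


area = contour_area(swing_prob)

# For this toy model the contour is a circle, so the grid estimate can be
# checked against the closed-form area pi * (1.5 - logit(0.224)).
expected = math.pi * (1.5 - math.log(0.224 / 0.776))
print(round(area, 2), round(expected, 2))
```

A finer `step` gives a closer approximation at the cost of more evaluations; the grid bounds only need to be wide enough to contain the whole contour.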
Data are from the catcher’s perspective, and the shading indicates the swing rate, where white indicates no swings and black indicates a high rate of swings. The dotted box represents the strike zone, and the black line that encircles it is the swing contour.
As shown in the graph, the average swing area (at the 22.4 percent swing contour) is about 8 square feet. For context, the area of the strike zone is a little less than 4 square feet. The standard deviation of swing areas for batters who saw at least 1,000 pitches in 2011 is 1.8 square feet.
Here are the five batters with the smallest swing areas:
Jack Cust: 4.88 square feet
Sam Fuld: 4.88 square feet
Jorge Posada: 5.04 square feet
Josh Willingham: 5.12 square feet
Nick Swisher: 5.12 square feet
And here are the five batters with the largest swing areas:
Mark Trumbo: 15.08 square feet
Alex Gonzalez: 13.56 square feet
Pablo Sandoval: 13.16 square feet
A.J. Pierzynski: 12.68 square feet
Miguel Olivo: 12.64 square feet
Variance of swing rates
I also measured the variance of a batter’s swing rates. Theoretically, variance should give us information about the batter’s strategy. For example, if the player has no variance in his swing rates, then he is swinging at the same rate at every pitch location.
Intuitively, a uniform distribution of swing rates would be a poor strategy, because there are different rewards (or penalties) based on pitch location. A batter with a high level of variance in his swing rates might swing very often at pitches down the middle, but very rarely at all other pitches.
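As a rough illustration of the idea, the sketch below assumes that swing variance is the variance of predicted swing rates across a grid of pitch locations. That definition is an assumption made for this example, and the swing rates are invented.

```python
from statistics import pvariance


def swing_variance(rates):
    """Population variance of swing rates across pitch locations."""
    return pvariance(rates)


# Made-up swing rates on a 3x3 grid of locations, listed row by row.
# A batter who swings at the same rate everywhere:
uniform = [0.45] * 9
# A batter who swings often down the middle and rarely elsewhere:
selective = [0.10, 0.10, 0.10,
             0.10, 0.90, 0.10,
             0.10, 0.10, 0.10]

print(swing_variance(uniform), swing_variance(selective))
```

The uniform batter has zero variance, while the selective batter's variance is clearly positive.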
Here are the five batters with the greatest variance in swing rates, in decreasing order:
I should also note that there is a strong relationship between variance of swings and overall swing rate; the correlation between the two is .81.
And here are the five batters with the lowest variance in swing rates, in ascending order:
Comparing to O-swing
To find out if swing area has any value as a metric, I compared it to O-swing. I calculated O-swing using PITCHf/x data and the strike zone definitions found by Mike Fast here. In the interest of transparency, I found the average O-swing to be 28.2 percent and the average zone rate—the percentage of pitches in the strike zone—to be 48.4 percent.
First of all, I find that O-swing and swing area have a very strong relationship:
The strength of this relationship implies that the two metrics are measuring the same skill, or a very similar set of skills. But which metric is more useful?
To test this, I looked at the relationships between these metrics and overall walk rate. I found that O-swing explains 40 percent of the variance in walk rate, while swing area explains 36 percent. If I run a regression with both swing area and O-swing, I find that the predictive ability of the model is only marginally improved, which is further evidence that O-swing and swing area measure the same skill.
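The reason two strongly related predictors add so little to each other can be seen from the standard formula for the R-squared of a two-predictor least-squares regression, written in terms of pairwise correlations. In the sketch below, the 40 and 36 percent figures come from the text; the correlation of 0.90 between O-swing and swing area is an assumed value (the text only calls the relationship very strong), and both metrics are assumed to be negatively related to walk rate.

```python
import math


def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared of an OLS regression of y on two predictors.

    r_y1, r_y2: correlations of each predictor with y.
    r_12: correlation between the two predictors.
    """
    if abs(r_12) >= 1.0:
        raise ValueError("predictors are perfectly collinear")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


# Single-predictor R-squared values from the text, converted to correlations.
# The negative signs are an assumption: more chasing, fewer walks.
r_walk_oswing = -math.sqrt(0.40)
r_walk_area = -math.sqrt(0.36)

# Assumed correlation between O-swing and swing area (not given in the text).
r_oswing_area = 0.90

combined = r2_two_predictors(r_walk_oswing, r_walk_area, r_oswing_area)
print(round(combined, 3))
```

Under that assumed correlation, the combined R-squared comes out only slightly above the 40 percent that O-swing achieves alone; a weaker assumed correlation between the two predictors would produce a larger gain.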
This result suggests that, in terms of predicting walk rates, swing area adds little value when O-swing is already available. For comparison, overall swing rate explains 38 percent of the variance in walk rates.
I also looked at the relationship between some of these metrics and a measure of overall batter success, run value per 100 pitches (rv100), a linear weights-based statistic that calculates the average run value a batter produces in 100 pitches.
Out of swing area, O-swing, and swing variance, swing variance had the strongest relationship with rv100. As noted earlier, swing variance has a strong relationship with overall swing rate. Despite the strength of this relationship, swing variance has a positive relationship with rv100 and swing rate has a negative relationship.
Perhaps most interestingly, if I run a multiple linear regression with rv100 as the dependent variable and swing rate and swing variance as the explanatory variables, I find an R-squared of 22 percent. This is notable because if I run a regression with rv100 as the dependent variable and walk rate as the (only) explanatory variable, I find that the R-squared is nearly identical at 21.4 percent.
This finding suggests that swing variance and swing rate combined tell us as much about a batter’s overall batting ability as walk rate, which is very surprising. Swing variance also does not explain walk rate very well, so this may suggest that swing variance plays a role in power (ISO) or BABIP ability, but that’s something to research at a later time.
While swing area may not have yielded any additional information about plate discipline, it does give us a way to measure the expansiveness of a batter’s swing area in familiar terms. Additionally, swing variance appeared much more significant than I expected it to be and may give cause for further research as to the relationship between swing variance and isolated power or BABIP. I also plan to extend this analysis to pitchers.
In terms of limitations, swing area doesn’t actually tell us anything about which pitches the batter is swinging at or where those pitches are located, which is a serious impediment. Additionally, I was not able to account for the context of these swings. By this I mean that batters do not swing at the same rate in each count, so we are implicitly including the effect of different count distributions. Of course, this is also a problem with O-swing, but nonetheless, it’s a significant bias and should be noted.
References & Resources
*PITCHf/x data from MLBAM via Darrel Zimmerman’s pbp2 database and scripts by Joseph Adler/Mike Fast/Darrel Zimmerman