People like trends. I like trends, you like trends, heck, even your mom likes trends. Why? Because we are human, and viewing the world in trends is intuitive and makes the big bad world just a little less scary. While tempting, this impulse does not always produce the best analysis.
Consider hot streaks. All research indicates that the majority of hot streaks, especially in baseball, are not predictive. In fact, hot streaks have been referred to as a “cognitive illusion” by Thomas Gilovich, professor of psychology at Cornell University. But they certainly feel tangible, predictive, real.
A great deal of sabermetrics controverts our itch for trends. A chief example is defense independent pitching statistics, or DIPS. Perhaps the ultimate sobering sabermetric doctrine, DIPS stresses the lack of control that pitchers have over outcomes on balls in play.
Years after Voros McCracken’s breakthrough research, DIPS has been intensely scrutinized and has come out relatively unscathed. Despite the support for DIPS, I will admit that at times I still struggle with the concept. I am human after all, and it is reasonable, if not wise, to be skeptical of a theory that so strongly goes against common sense.
It is often said that pitching is about disrupting the timing of the batter. I do not know who first said this, but it sounds reasonable. [Ed.: It was Warren Spahn.] And what better types of pitchers personify this advice than pitchers who throw lots of changeups?
A changeup is really only effective when paired with a fastball. In terms of movement, a changeup is very similar to a two-seam fastball. The only difference is velocity, usually a separation of around eight mph. It does not seem outrageous, then, that changeup-heavy pitchers might be a little better at preventing hits on balls in play than other pitchers.
There is certainly anecdotal evidence. Johan Santana, Clay Buchholz, Shaun Marcum, and Cole Hamels are just a few changeup-reliant pitchers who have sustained significantly better BABIPs than average over considerable sample sizes. But this kind of support does not mean very much, so I turned to PITCH-f/x data for help.
First, I found all pitchers who had thrown at least 1350 pitches in 2011 through Sept. 19. I used only 2011 data because this season’s run environment, and consequently the league-average BABIP, is lower than in previous years. I used the 1350-pitch threshold to eliminate relievers from the sample, because relievers are known to be able to sustain lower BABIPs than starters. This gave me a sample of 157 starters, which is not huge, but sufficient.
Here is a graph of the relationship between changeup usage and BABIP:
This is for all 157 pitchers in the sample, and the gray bands indicate the confidence interval around the fitted curve. As you can see, BABIP fluctuates a little until changeup usage reaches around .17 (17 percent), at which point BABIP starts to fall. For purposes of transparency, I have the league-average BABIP for the 157 starters as .287. This is slightly lower than the league average found elsewhere because I did not handle sacrifices separately; they are included as outs.
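To see why counting sacrifices as outs nudges BABIP down, here is a minimal sketch. It assumes that "sacrifices" mostly means sacrifice bunts (SH), which the conventional formula leaves out of the denominator. The function names and the season line are made up for illustration and are not taken from the study.

```python
def babip_conventional(h, hr, ab, k, sf):
    """Conventional BABIP: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def babip_sacrifices_as_outs(h, hr, ab, k, sf, sh):
    """Same numerator, but sacrifice bunts (SH) also count as outs in play.

    Assumption: this is one plausible reading of "included as outs".
    """
    return (h - hr) / (ab - k - hr + sf + sh)


# Made-up season line, for illustration only
line = dict(h=180, hr=20, ab=750, k=150, sf=5)

conventional = babip_conventional(**line)
with_sh = babip_sacrifices_as_outs(**line, sh=10)

print(f"conventional: {conventional:.3f}, sacrifices as outs: {with_sh:.3f}")
```

The extra outs only enlarge the denominator, so the second figure can never be higher than the first.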
After seeing this 17 percent threshold, I split the starters into two groups: those who threw changeups at least 17 percent of the time, and those who did not. I will refer to the pitchers who threw at least 17 percent changeups as the “changeup heavy” group. I will refer to the pitchers who threw fewer than 17 percent changeups as the “changeup light” group.
The changeup-heavy group (n=45) had a mean BABIP of .279, and the changeup-light group (n=112) had a mean BABIP of .290. A two-sided t-test finds the difference in means to be statistically significant at a 98 percent level.
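For readers who want to try this kind of comparison themselves, here is a rough sketch of the split and the test. It makes two assumptions: it uses Welch's unequal-variance t-test (the variant used in the study is not stated), and it approximates the p-value with a normal distribution rather than a t distribution. The pitcher data is hypothetical, not the study's.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance test of mean(a) - mean(b)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (an assumption;
    it is only reasonable when both groups are fairly large)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical (changeup_rate, BABIP) pairs, NOT the study's data
pitchers = [(0.22, 0.270), (0.19, 0.281), (0.25, 0.275), (0.18, 0.268),
            (0.05, 0.295), (0.10, 0.288), (0.12, 0.301), (0.08, 0.292),
            (0.15, 0.285)]

THRESHOLD = 0.17
heavy = [babip for rate, babip in pitchers if rate >= THRESHOLD]
light = [babip for rate, babip in pitchers if rate < THRESHOLD]

t, df = welch_t(heavy, light)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, df = {df:.1f}, approximate two-sided p = {p:.4f}")
```

With groups as small as the toy ones here, the normal approximation is too generous. If SciPy is installed, `scipy.stats.ttest_ind(heavy, light, equal_var=False)` gives the Welch p-value from the t distribution instead.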
As suggested by Dan Turkenkopf, I looked at the mean velocities of these two groups. For pitches classified as four-seam fastballs by Gameday, the changeup-heavy group averaged 90.05 mph, and the changeup-light group averaged 91.0 mph. A one-mph difference was also observed with two-seam fastballs (including pitches classified as sinkers). For pitches classified as changeups, the changeup-heavy group averaged 81.97 mph and the changeup-light group averaged 83.38 mph.
In terms of repertoires, here is how the two groups differed:
This graph shows difference in pitch usage for four-seams (FF), two-seams (FT), sliders (SL), curveballs (CU), cutters (FC), and of course changeups (CH). As you can see, the only major difference is with changeups (13.2 percent).
Batted Ball Differences
We know that batted ball profiles can help us predict BABIP. Pitchers at the extremes—tons of flyballs or tons of groundballs—are known to be able to sustain lower-than-average BABIPs. Therefore, if changeup heavy pitchers fall into one of these groups, then we have not really found anything interesting.
I am calculating these batted ball profiles using the data available from MLBAM stringers. There are four possible types of batted balls: fly balls, ground balls, line drives, and pop-ups. This is different from FanGraphs, which includes pop-ups within fly balls.
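The difference between the two classification schemes is easy to see in a small sketch. The function name, the dictionary keys, and the counts below are hypothetical, chosen only to show how folding pop-ups into fly balls changes the fly-ball rate while leaving the other rates alone.

```python
def batted_ball_rates(counts, merge_popups=False):
    """Return each batted-ball type's share of all batted balls.

    counts: dict with keys 'GB', 'FB', 'LD', 'PU' (a four-way split).
    merge_popups: if True, fold pop-ups into fly balls (a three-way split).
    """
    counts = dict(counts)  # copy so the caller's dict is not modified
    if merge_popups:
        counts["FB"] = counts.get("FB", 0) + counts.pop("PU", 0)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}


# Made-up batted-ball counts for one pitcher, for illustration only
sample = {"GB": 220, "FB": 150, "LD": 95, "PU": 35}

four_way = batted_ball_rates(sample)
three_way = batted_ball_rates(sample, merge_popups=True)

print(four_way)
print(three_way)
```

Fly-ball rates from the two schemes are therefore not directly comparable, even for the same pitcher.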
In terms of groundball rates, the changeup-heavy group averaged 42.5 percent, and the changeup-light group averaged 43.5 percent. All other types of batted ball rates were within two percent for each group, meaning the batted ball profiles were essentially the same! Assuming these batted ball classifications are reliable, we can reason that batted ball differences are not the reason for the difference in BABIP skill.
So which batted balls are going for hits less often for changeup heavy pitchers? Both ground balls and fly balls go for hits less often for changeup heavy pitchers, though line drives become hits at a marginally higher rate. I have not tested these individual BABIP differences for statistical significance.
As stated earlier, all classifications used were from MLBAM. This creates some uncertainty about the actual changeup rate for some of these pitchers. For example, if a pitcher throws both a splitter and a changeup, Gameday usually has a lot of trouble distinguishing the two (see: Ubaldo Jimenez, Freddy Garcia).
In addition, I have not adjusted for the counts in which these pitches were thrown. Ideally, each pitcher in the sample would have an identical distribution of counts. This is because the average BABIP is not the same for each count; BABIP is going to be [presumably] higher in 2-0 counts than in 0-2 counts.
Therefore, if pitchers that are in the heavy changeup usage group are better at getting ahead in the count, then we are just implicitly measuring the effect of count distribution on BABIP. Other cautions include that I have not adjusted for team fielding, ballpark, league, or opposition. Also important to note is the inaccuracy of MLBAM batted ball stringers. However, if these errors are randomly distributed, or at least distributed in a manner that does not systematically favor the BABIP of one group vs. another, then these limitations are not huge concerns.
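One way such a count adjustment could be done is direct standardization: compute each pitcher's BABIP within each count, then weight those figures by a common reference distribution of counts. This is a sketch of that idea only, not something done in this study. The function name, the reference weights, and the two pitchers are all hypothetical.

```python
def standardized_babip(by_count, reference_weights):
    """Weight per-count BABIPs by a shared count distribution.

    by_count: {count: (hits_in_play, balls_in_play)} for one pitcher.
    reference_weights: {count: share of balls in play}, same for everyone.
    Counts with no balls in play are skipped and the weights renormalized.
    """
    weighted_sum = 0.0
    weight_used = 0.0
    for count, weight in reference_weights.items():
        hits, bip = by_count.get(count, (0, 0))
        if bip == 0:
            continue
        weighted_sum += weight * hits / bip
        weight_used += weight
    return weighted_sum / weight_used


def raw_babip(by_count):
    """Plain BABIP: total hits in play over total balls in play."""
    hits = sum(h for h, _ in by_count.values())
    bip = sum(b for _, b in by_count.values())
    return hits / bip


# Two made-up pitchers with identical per-count BABIPs (.200 and .300)
# but different count mixes
pitcher_a = {"0-2": (20, 100), "2-0": (30, 100)}
pitcher_b = {"0-2": (40, 200), "2-0": (15, 50)}
reference = {"0-2": 0.5, "2-0": 0.5}  # hypothetical league-wide mix

print(raw_babip(pitcher_a), raw_babip(pitcher_b))
print(standardized_babip(pitcher_a, reference),
      standardized_babip(pitcher_b, reference))
```

In this toy case the raw BABIPs differ only because pitcher B works in 0-2 counts more often. After standardization the two are identical, which is the confounding effect described above.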
Another limitation, as pointed out by Josh Smolow, is that of handedness. Indeed, we do find that changeup-heavy pitchers are more likely to be lefties than changeup-light pitchers:
But does this really matter? For the pitchers in the dataset, lefties have an average BABIP of .291, and righties have an average BABIP of .285, a difference which is not statistically significant. If anything, this means that we may be underestimating the effect of having a changeup that you can throw 17 percent of the time.
Here is a graph of the relationship of BABIP by changeup usage, split up by pitcher handedness:
It may appear that lefties and righties are displaying very different relationships here to changeup usage, but part of that is because of the smoothing method used. If we present the data using a linear regression instead, the two groups look much more similar:
However, I have not looked into splitting up the data by the handedness of the batter. It is also important to note that the 17 percent changeup usage threshold used to create the two groups is arbitrary. At lower thresholds, the difference in means is not significant. I re-ran the t-test using different thresholds, and the difference in means is significant at a level of at least 92.5 percent for every threshold from 17 to 24 percent (17 percent, 18 percent, 19 percent, etc.).
Keep in mind this is also a two-sided t-test, despite the fact that our alternative hypothesis is really one-sided, meaning that we can halve the p-value. This means that for a one-sided t-test, these results would be significant at above a 95 percent level (the standard level in the social sciences) for every threshold in the 17-to-24 percent range.
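The arithmetic behind that conversion is short enough to sketch. The helper name is made up. The halving only applies when the observed difference points in the hypothesized direction, which is the case here because the changeup-heavy group has the lower BABIP.

```python
def one_sided_p(two_sided_p, effect_in_predicted_direction=True):
    """Convert a two-sided p-value from a symmetric test to a one-sided one."""
    if effect_in_predicted_direction:
        return two_sided_p / 2
    return 1 - two_sided_p / 2


# Weakest case quoted above: significant at a 92.5 percent level, two-sided
two_sided = 1 - 0.925            # p of about 0.075
one_sided = one_sided_p(two_sided)
level = 1 - one_sided            # about 0.9625, above the 95 percent bar

print(f"two-sided p = {two_sided:.4f}, one-sided p = {one_sided:.4f}, "
      f"level = {level:.4f}")
```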
One bias to watch out for
As stated earlier, I split the starters into two groups, those who throw changeups at least 17 percent of the time, and those who don’t. So perhaps all we are measuring is the effect of having a pitch good enough to throw 17 percent of the time.
Well, were this to be true, we would see a similar BABIP split with other pitches that are thrown at least 17 percent of the time. But we don’t. Slider-, curveball-, and cutter-heavy pitchers (usage greater than 17 percent) do not display a statistically significant BABIP advantage.
Are changeup-heavy pitchers allowing weaker contact, or are these findings just the result of a confluence of luck and limited data? Unless we obtain a lot of HIT-f/x data, we won’t know the answer to this question. But based on the above information, I would feel comfortable saying that there is evidence supporting a BABIP-suppressing skill for changeup-heavy pitchers.
If we do accept that changeup-heavy pitchers have BABIP-suppressing skill, we also need to accept that this skill is not very large. In a way, that makes this just another win for DIPS.
References & Resources
PITCH-f/x data from MLBAM via Darrel Zimmerman’s pbp2 database and scripts by Joseph Adler/Mike Fast/Darrel Zimmerman; this study about hot streaks in baseball; and FanGraphs.