In 2008, Josh Kalk wrote a series of posts examining each of the prominent pitches thrown in Major League Baseball and what makes them effective. He covered curveballs, sliders, change-ups, and the league-average pitcher. His analyses were thorough, and he was eventually hired by the Rays. Since then, however, our knowledge of PITCHf/x has advanced considerably, so it’s time to update our pitch profiles.
Benchmarks, benchmarks, benchmarks
As with all statistics, context matters, and PITCHf/x data are no different, so this section provides benchmark information for four-seam fastballs. Below is a table with the league average values for all 2011 four-seams, as classified by MLBAM. I have excluded all pitches with a type confidence value below .25, because many pitches below that threshold have data errors.
pitcher = pitcher handedness
batter = batter handedness
MPH = equivalent to start_speed value. Velocity of pitch when it is 50 feet from the back end of home plate.
pfx_x = horizontal displacement due to spin in inches. pfx_x_SD gives the standard deviation.
pfx_z = vertical displacement due to spin in inches.
Vzf = Final vertical velocity component. See this article for explanation of Vzf/Vxf.
Vxf = Final horizontal velocity component
When the pitcher field is “L” or “R” and the batter field is “N,” the fields are not split up by batter handedness. When both fields are null (“N”), then the row contains information that is not split up by either pitcher handedness or batter handedness. In other words, the final row has information about all four-seams thrown in 2011. For result based information, please see Harry Pavlidis’ benchmarks.
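As a rough illustration of how a benchmark table with “N” roll-up rows can be built, here is a minimal Python sketch. The sample rows, the tuple layout and the function name benchmark_rows are hypothetical, and using the population standard deviation for pfx_x_SD is an assumption. Only the “FF” four-seam code and the .25 type confidence cutoff come from the description above.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical rows, NOT real data:
# (pitcher_hand, batter_hand, pitch_type, type_confidence, mph, pfx_x)
pitches = [
    ("R", "R", "FF", 0.90, 92.1, -4.8),
    ("R", "L", "FF", 0.85, 91.7, -5.3),
    ("L", "R", "FF", 0.95, 90.4, 5.9),
    ("L", "L", "FF", 0.10, 89.8, 6.2),  # dropped: type confidence below .25
    ("R", "R", "SL", 0.99, 84.0, 2.1),  # dropped: not a four-seam
    ("L", "L", "FF", 0.70, 90.0, 6.0),
]


def benchmark_rows(rows, min_conf=0.25):
    """Average four-seam values by handedness, with "N" meaning "not split"."""
    groups = defaultdict(list)
    for p_hand, b_hand, pitch_type, conf, mph, pfx_x in rows:
        if pitch_type != "FF" or conf < min_conf:
            continue
        # Each pitch feeds its fully split row, its pitcher-only row,
        # and the overall ("N", "N") row.
        for key in ((p_hand, b_hand), (p_hand, "N"), ("N", "N")):
            groups[key].append((mph, pfx_x))

    table = {}
    for key, values in groups.items():
        speeds = [v[0] for v in values]
        breaks = [v[1] for v in values]
        table[key] = {
            "n": len(values),
            "MPH": mean(speeds),
            "pfx_x": mean(breaks),
            "pfx_x_SD": pstdev(breaks),  # assumption: population SD
        }
    return table


table = benchmark_rows(pitches)
for key in sorted(table):
    print(key, table[key])
```

The ("N", "N") entry corresponds to the final row of the table, which pools every qualifying four-seam regardless of handedness.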
In the past year or two, “heat maps” have become very popular, so it seems about time that we have some “heat map” benchmarks. Below you can see some benchmarks for location.
These graphs are from the catcher’s perspective, with the dotted box indicating the strike zone. Red represents the locations where four-seams were thrown most often, blue where they were thrown least often. The color/density (z) axis is scaled to be the same on all graphs, and ranges from 0 to .26. Of interest here is the LH pitcher to RH batter graph, where the distribution is more spread out than in the other three types of matchups.
In Kalk’s anatomy of a curveball, he looked at how different variables correlated with rv100, a linear-weights based measurement of the quality of the result of a pitch. Here I will do the same with the four-seam. Before we begin, however, it should be noted that these correlations only speak to linear relationships. Kalk also performed his analysis on the aggregate data of various pitchers, whereas I will be doing this analysis on a pitch-by-pitch level, because that gives us a lot more data to work with and avoids the issue of having to choose a population of pitchers to include in a sample.
If we run a linear regression using run values as our dependent variable and velocity as our independent variable, the results are underwhelming. For all four-seam fastballs with type confidence of at least .25 in the 2011 season, we get a coefficient of -0.00056 and a p-value of 0.0182. This tells us that velocity does have a statistically significant relationship with run value, but that the effect is very small. The coefficient is negative because a negative run value is favorable to the pitcher.
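For readers who want to see the mechanics, here is a minimal sketch of a one-variable regression of run value on velocity in plain Python. The data are synthetic stand-ins, not the 2011 PITCHf/x sample, so the printed numbers will not match the coefficient above. The function name simple_ols is hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
import random
from statistics import NormalDist, mean


def simple_ols(x, y):
    """Fit y = intercept + slope * x and return (slope, intercept, p_value)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual variance with n - 2 degrees of freedom, then the
    # standard error of the slope.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)

    # Two-sided p-value for "slope = 0", using a normal approximation
    # (an assumption that is fine for thousands of pitches, not for tiny n).
    z = slope / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, intercept, p_value


# Synthetic data, NOT real PITCHf/x: velocity in mph and a noisy run value.
rng = random.Random(2011)
velocity = [rng.gauss(92.0, 2.5) for _ in range(5000)]
run_value = [-0.0005 * (v - 92.0) + rng.gauss(0.0, 0.05) for v in velocity]

slope, intercept, p_value = simple_ols(velocity, run_value)
print(f"slope={slope:.5f}  p={p_value:.4f}")
```

A negative slope here would read the same way as in the text: each extra mile per hour is associated with a slightly lower, and therefore more pitcher-friendly, run value.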
However, this does not account for the situation in which the pitch was thrown. If pitchers are throwing harder in 0-2 counts where it’s easier to pitch, we’re not really isolating the effect of velocity very well. We can deal with this by taking what’s called a stratified random sample, where we treat each of the 12 different counts as a different population and take a random sample of the same size from each group. If we do this, the coefficient becomes -0.00024 and the p-value skyrockets to .636, meaning that there is no statistically significant relationship between velocity and run value.
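The stratified sampling step can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used for the numbers above: the pitch records are synthetic, the field names balls, strikes and mph and the function name stratified_sample are hypothetical, and the per-count sample size is simply set to the size of the smallest count group.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(pitches, per_count, seed=0):
    """Draw the same number of pitches, without replacement, from each count."""
    rng = random.Random(seed)
    by_count = defaultdict(list)
    for pitch in pitches:
        by_count[(pitch["balls"], pitch["strikes"])].append(pitch)

    sample = []
    for count in sorted(by_count):
        group = by_count[count]
        if len(group) < per_count:
            raise ValueError(f"count {count} has only {len(group)} pitches")
        sample.extend(rng.sample(group, per_count))
    return sample


# Synthetic data, NOT real PITCHf/x: early counts are made more common
# than deep counts, mimicking an unbalanced population.
data_rng = random.Random(1)
pitches = []
for balls in range(4):
    for strikes in range(3):
        n = 200 - 40 * balls - 20 * strikes
        for _ in range(n):
            pitches.append(
                {"balls": balls, "strikes": strikes, "mph": data_rng.gauss(92.0, 2.5)}
            )

# Assumption: use the smallest stratum as the common sample size.
sizes = Counter((p["balls"], p["strikes"]) for p in pitches)
per_count = min(sizes.values())
sample = stratified_sample(pitches, per_count)
print(len(sizes), "counts,", per_count, "pitches each,", len(sample), "total")
```

Because every count contributes the same number of pitches, any tendency to throw harder in particular counts can no longer tilt the regression. The cost is a much smaller sample, since the rarest count sets the size for all twelve.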
If we add vertical displacement due to spin (pfx_z) to the linear regression, we get a coefficient of -0.00088 and a p-value of 0.0262. This tells us that there is a significant relationship (at a 95 percent level) between vertical displacement and run value. And because the coefficient is negative, the regression suggests that the more vertical “rise,” the better. Remember, this is just for four-seam fastballs. If we replace vertical displacement with vertical velocity (see Vzf in the benchmarks section), we get a coefficient of -0.0013, with significance at a 95 percent level. This is the largest effect size we have seen so far. In virtually every situation, horizontal displacement due to spin (pfx_x) did not have a significant relationship with run value.
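Adding a second variable to the regression can be sketched in plain Python as well. This assumes that pfx_z is entered alongside velocity, which is one reading of the paragraph above. The function name two_predictor_ols and the six data points are made up for illustration, and the sketch returns coefficients only, with no p-values.

```python
from statistics import mean


def two_predictor_ols(x1, x2, y):
    """Fit y = b0 + b1 * x1 + b2 * x2 by least squares; return (b0, b1, b2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)

    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up values, NOT real PITCHf/x: velocity (mph), pfx_z (inches), run value.
velocity = [90.1, 91.5, 92.3, 93.0, 94.2, 95.1]
pfx_z = [8.2, 10.1, 7.5, 9.9, 11.0, 8.8]
run_value = [0.012, -0.004, 0.009, -0.006, -0.015, 0.001]

b0, b_velocity, b_pfx_z = two_predictor_ols(velocity, pfx_z, run_value)
print(f"velocity coef={b_velocity:.5f}  pfx_z coef={b_pfx_z:.5f}")
```

Swapping the pfx_z list for Vzf values would give the vertical velocity version of the same fit.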
If we look at whether pitch location matters, this is what we find:
These graphs are from the catcher’s perspective, with the dotted box indicating the strike zone. Red represents the locations where four-seams have the highest run value, blue where four-seams have the lowest run value. The color/density (z) axis is scaled to be the same on all graphs, and ranges from -.15 to .15. This uses the data that have the same number of pitches for each type of count, so the sample gets a little small in the lefty to lefty graph at n=2296. Of interest here is that when handedness is the same for the pitcher and batter, low and away is the best location. When handedness is different for the pitcher and batter, pitching up in the zone nets the best results for pitchers. Remember that this is only for four-seams.
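One simple way to turn pitch locations into a run value map is to bin pitches into a grid and average the run value in each cell. The sketch below does plain binning, not whatever smoothing produced the graphs above. The half-foot cell size, the minimum cell count, the function name run_value_grid and the sample pitches are all assumptions. Locations are taken to be px and pz in feet, from the catcher’s perspective.

```python
import math
from collections import defaultdict


def run_value_grid(pitches, cell=0.5, min_n=2):
    """Average run value per (horizontal, vertical) grid cell.

    pitches is an iterable of (px, pz, run_value), with px and pz in feet.
    Cells with fewer than min_n pitches are left out as too noisy.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for px, pz, run_value in pitches:
        key = (math.floor(px / cell), math.floor(pz / cell))
        sums[key] += run_value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in counts if counts[key] >= min_n}


# Made-up pitches, NOT real PITCHf/x: (px, pz, run value).
sample_pitches = [
    (0.10, 2.10, -0.10),
    (0.20, 2.30, 0.30),
    (-0.60, 3.40, 0.05),   # alone in its cell, so it is dropped
    (0.70, 1.60, -0.02),
    (0.90, 1.80, -0.06),
]

grid = run_value_grid(sample_pitches)
for (col, row), value in sorted(grid.items()):
    # Report the lower-left corner of each cell in feet.
    print(f"px>={col * 0.5:+.1f}  pz>={row * 0.5:.1f}  mean run value={value:+.3f}")
```

Negative cells are the pitcher-friendly ones, matching the blue regions in the graphs. Replacing the average with a simple count per cell gives the kind of density map shown in the benchmarks section.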
In recent years there have also been some interesting advances in the study of platoon splits. Make sure to check out this piece by Max Marchi, who finds that four-seam fastball type pitches do have a platoon split, but that it’s much less pronounced than that of two-seam fastballs.
References & Resources
*PITCHf/x data through MLBAM via Darrel Zimmerman’s pbp2 database.