Statistics basically tell us what happened. If a pitcher gives up 20 earned runs in 50 innings, his ERA is 3.60. Given a league average of around 4.40 for starters, a 3.60 ERA means that the pitcher did a good job when he was on the mound—in terms of results.
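ERA is earned runs scaled to a nine-inning game. Here is a minimal Python sketch of that arithmetic. It assumes innings are given as true decimals, so 50⅔ innings is 50.667, not the box-score shorthand "50.2".

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * earned_runs / innings_pitched


# The example from the text: 20 earned runs in 50 innings.
print(round(era(20, 50), 2))  # 3.6
```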
But those results don’t necessarily tell us how well that pitcher actually pitched. He could have had a team behind him with an infield of Albert Pujols, Chase Utley, Brendan Ryan (or Elvis Andrus, depending on your taste) and Evan Longoria, and an outfield with three Franklin Gutierrezes. In that case, it’s likely that he didn’t pitch as well as his ERA suggested, and his stats would suffer in front of a less awesome defensive team.
With the advent of fielding independent pitching statistics, we can start to get a better feel for how well a pitcher actually pitched by looking only at things over which a pitcher has a reasonable amount of control (namely walks, strikeouts, home runs and batted ball types). However, even those fielding independent stats measure only the results of what happened. A pitcher could throw a 12-6 curveball down and away, and Alfonso Soriano could shoelace it into the stands. Or a pitcher could throw a fastball in the strike zone on a 3-2 count and have it called a ball.
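The most widely used fielding independent stat is FIP (fielding independent pitching). It weights home runs, walks, hit batters and strikeouts, then adds a league constant so the result reads like an ERA. The sketch below uses the commonly cited weights. The constant of 3.10 is a placeholder assumption, because in practice it is recalculated each season so that league-average FIP matches league-average ERA. Plain FIP leaves out batted ball types; variants such as xFIP bring fly balls back in.

```python
def fip(hr: int, bb: int, hbp: int, so: int, ip: float, constant: float = 3.10) -> float:
    """Fielding independent pitching on an ERA-like scale.

    `constant` is a placeholder; the real value is set per season so that
    league FIP equals league ERA.
    """
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Hypothetical line: 5 HR, 15 BB, 2 HBP, 45 K in 50 innings.
print(round(fip(5, 15, 2, 45, 50), 2))  # 3.62
```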
The point of this long-winded introduction is that even the best stats tell us the results of what happened, not the process. Over a large sample size, the results start to conform to how well a pitcher actually pitched. However, when you look at the highs and lows that all pitchers experience, it’s likely that the results of each start don’t actually tell you much about how a pitcher pitched that day.
In fact, I believe that they tell you almost nothing at all!
Okay, hotshot, prove it
I can’t do that. I can, however, look at an example to see if my theory holds water. I’ve chosen A.J. Burnett as my case study for today. Why Burnett? Well, over the years, despite being a very effective pitcher, he has developed a reputation for being quite streaky. In 2009, that reputation was justified, at least judging by his monthly splits. Since I am expressly trying to look at the highs and lows of a pitcher’s season, it makes sense to pick one who had a lot of highs and lows.
I took a gander at Burnett’s game logs this year and sorted them by Game Score—a metric devised by Bill James to rate the effectiveness of a pitcher in a single start. I then chose his 10 best and 10 worst starts of the season to make my comparison.
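For reference, here is a sketch of Bill James's original Game Score formula along with the best-10/worst-10 split. The `Start` record and its field names are hypothetical, and the start in the usage line is made up.

```python
from dataclasses import dataclass


@dataclass
class Start:
    outs: int  # outs recorded (3 per inning)
    so: int    # strikeouts
    h: int     # hits allowed
    er: int    # earned runs
    uer: int   # unearned runs
    bb: int    # walks


def game_score(s: Start) -> int:
    """Bill James's original Game Score."""
    innings_completed = s.outs // 3
    score = 50
    score += s.outs                                 # +1 per out recorded
    score += 2 * max(0, innings_completed - 4)      # +2 per inning completed after the 4th
    score += s.so                                   # +1 per strikeout
    score -= 2 * s.h + 4 * s.er + 2 * s.uer + s.bb  # hits, runs and walks count against
    return score


def best_and_worst(starts, n=10):
    """Return the n highest-scoring and n lowest-scoring starts."""
    ranked = sorted(starts, key=game_score, reverse=True)
    return ranked[:n], ranked[-n:]


# A made-up complete-game shutout: 27 outs, 10 K, 3 H, 1 BB.
print(game_score(Start(outs=27, so=10, h=3, er=0, uer=0, bb=1)))  # 90
```

With fewer than 2n starts the two lists would overlap, so the split only makes sense for a full season's worth of starts.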
In his 10 best starts, he averaged a Game Score of 70.9. In his 10 worst starts, he averaged a Game Score of 31.9. More intuitively, his ERA was 1.06 in his good starts compared to 9.13 in his bad starts… quite the difference.
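An ERA for a group of starts is normally pooled: total earned runs over total innings. Averaging the per-start ERAs instead would let one short, ugly outing dominate the result. Here is a small sketch with made-up starts, each recorded as (earned runs, outs recorded) to avoid fractional innings.

```python
def pooled_era(starts):
    """ERA over several starts: 9 * total ER / total IP, i.e. 27 * ER / outs."""
    total_er = sum(er for er, _ in starts)
    total_outs = sum(outs for _, outs in starts)
    return 27.0 * total_er / total_outs


def mean_of_start_eras(starts):
    """The naive alternative, shown only for contrast."""
    return sum(27.0 * er / outs for er, outs in starts) / len(starts)


# Hypothetical starts: (earned runs, outs recorded).
starts = [(0, 21), (1, 24), (7, 9)]
print(pooled_era(starts))          # 4.0
print(mean_of_start_eras(starts))  # 7.375
```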
I then grabbed all of the PITCHf/x information on those two groups of starts. In case you are unfamiliar with it, PITCHf/x is a ball-tracking technology powered by SportVision, which measures certain key characteristics of each pitched ball, including speed, spin deflection (movement) and location. After manually classifying Burnett’s pitches game by game (yes, this was a pain), I was ready to look at the data.
My agenda was simple. I wanted to see, using the intrinsic qualities of each pitch, exactly how differently he pitched in his best and worst starts of the season. I looked at three variables: stuff, location and approach.
For stuff, I looked at the three main aspects of a pitch: horizontal and vertical spin deflection, and velocity. Spin deflection is the technical term for movement that you may often hear with regard to PITCHf/x studies. The numbers indicate how many inches the ball moved compared to a theoretical pitch thrown without spin. They are from the catcher’s point of view, so negative means that the pitch moves toward the third base side and positive means it moves toward the first base side.
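The sign convention is easy to flip by accident, so a tiny helper can make it explicit. This sketch assumes horizontal and vertical deflection are given in inches, as in the PITCHf/x `pfx_x` and `pfx_z` fields. It also assumes the usual convention that positive vertical deflection means the pitch dropped less than a spinless pitch would.

```python
def describe_deflection(pfx_x: float, pfx_z: float) -> str:
    """Describe spin deflection (inches) from the catcher's point of view."""
    if pfx_x < 0:
        side = f"{-pfx_x:.1f} in toward the third base side"
    elif pfx_x > 0:
        side = f"{pfx_x:.1f} in toward the first base side"
    else:
        side = "no horizontal deflection"
    # Positive vertical deflection: less drop than a pitch thrown without spin.
    if pfx_z > 0:
        vert = f"{pfx_z:.1f} in less drop than a spinless pitch"
    elif pfx_z < 0:
        vert = f"{-pfx_z:.1f} in more drop than a spinless pitch"
    else:
        vert = "same drop as a spinless pitch"
    return f"{side}; {vert}"


print(describe_deflection(-9.5, 4.0))
# 9.5 in toward the third base side; 4.0 in less drop than a spinless pitch
```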
First, some notes on his pitch classifications. Burnett is an interesting pitcher in terms of stuff. His two-seam fastball (FT) is the exact same speed as his four-seam fastball, but it has a lot more horizontal movement and more drop. Since 2007, there have been only 5,001 pitches thrown at least 94 mph with less than -10 inches of horizontal spin deflection (that is, more than 10 inches toward the third base side) and less than six inches of vertical spin deflection. Of those, 216 belong to Burnett. His change-up is also in the mid-to-high 80s, which is awesome.
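A query like that boils down to a three-condition filter. In the sketch below, the `Pitch` record and the sample rows are invented. It assumes the horizontal cutoff is a signed value of -10 inches, meaning more than 10 inches of movement toward the third base side. If a different reading was intended, adjust the comparison accordingly.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    pitcher: str
    speed: float  # release speed, mph
    pfx_x: float  # horizontal spin deflection, inches (catcher's view)
    pfx_z: float  # vertical spin deflection, inches


def matches(p: Pitch, min_speed=94.0, max_pfx_x=-10.0, max_pfx_z=6.0) -> bool:
    """Hard, heavy-running fastball test (signed thresholds are an assumption)."""
    return p.speed >= min_speed and p.pfx_x < max_pfx_x and p.pfx_z < max_pfx_z


# Invented sample rows.
pitches = [
    Pitch("Burnett", 95.1, -11.2, 4.0),
    Pitch("Burnett", 94.5, -6.0, 9.5),
    Pitch("Other", 96.0, -12.0, 5.5),
    Pitch("Other", 92.0, -12.0, 3.0),
]
hits = [p for p in pitches if matches(p)]
print(len(hits), sum(p.pitcher == "Burnett" for p in hits))  # 2 1
```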
I couldn’t tell his slider and curveball apart, even at the game-by-game level. I identified a few pitches that I thought were in fact sliders (they were about 5 mph faster than his curves that day, with a little bit less drop), but there were so few pitches that actually looked like sliders that you probably could just lump them with the curveballs.
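The slider/curveball split above was made by eye, game by game. A rough, hypothetical automation of the same idea would flag a breaking ball as a slider when it is several mph faster than that game's typical breaking ball and also has less drop. The 4 mph gap and the use of per-game medians are assumptions, not the method used here. Medians keep a handful of sliders from dragging the baseline around.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class BreakingBall:
    game_id: str
    speed: float  # mph
    pfx_z: float  # vertical spin deflection, inches


def label_breaking_balls(pitches, gap_mph=4.0):
    """Label each pitch 'SL' or 'CU' relative to its own game's medians."""
    speeds, drops = defaultdict(list), defaultdict(list)
    for p in pitches:
        speeds[p.game_id].append(p.speed)
        drops[p.game_id].append(p.pfx_z)
    med_speed = {g: median(v) for g, v in speeds.items()}
    med_z = {g: median(v) for g, v in drops.items()}
    return [
        "SL"
        if p.speed - med_speed[p.game_id] >= gap_mph and p.pfx_z > med_z[p.game_id]
        else "CU"
        for p in pitches
    ]


# Invented pitches from one game: three curves and one harder, flatter pitch.
game = [
    BreakingBall("g1", 80.0, -6.0),
    BreakingBall("g1", 80.5, -6.5),
    BreakingBall("g1", 81.0, -5.5),
    BreakingBall("g1", 85.5, -2.0),
]
print(label_breaking_balls(game))  # ['CU', 'CU', 'CU', 'SL']
```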
Anyway, back to the table above. Look real hard at it and tell me if you can see a meaningful difference between the two groups of starts. His fastball velocity was actually better in his bad starts, and the average spin deflection and velocity on the rest of his pitches were nearly identical. If you want to see the spin deflection charts for each of the two groups, look here.
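The comparison behind that table amounts to averaging speed and spin deflection by group of starts and by pitch type. Here is a minimal sketch with invented rows; the tuple layout is an assumption.

```python
from collections import defaultdict
from statistics import mean


def pitch_averages(rows):
    """rows: (group, pitch_type, speed_mph, pfx_x_in, pfx_z_in).

    Returns {(group, pitch_type): (avg_speed, avg_pfx_x, avg_pfx_z)}.
    """
    buckets = defaultdict(list)
    for group, ptype, speed, pfx_x, pfx_z in rows:
        buckets[(group, ptype)].append((speed, pfx_x, pfx_z))
    return {
        key: tuple(mean(column) for column in zip(*values))
        for key, values in buckets.items()
    }


rows = [
    ("good", "FF", 95.0, -6.0, 10.0),
    ("good", "FF", 97.0, -8.0, 12.0),
    ("bad", "FF", 96.5, -7.0, 11.0),
]
avgs = pitch_averages(rows)
print(avgs[("good", "FF")])  # (96.0, -7.0, 11.0)
```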
So I think we can say, for all intents and purposes, that there was no practical difference in the quality of Burnett’s stuff between his best and worst starts of the season. That’s not too surprising, as stuff is primarily a matter of mechanics. Location, on the other hand, also depends on release timing and intent, two things that you would expect to add a lot more variance from start to start.
First, let’s check out some basic location stats. “Outside” is the percentage of pitches thrown more than 0.5 feet outside the strike zone in any direction; “Border out” is pitches outside the zone but within 0.5 feet of it; “Border in” is pitches inside the zone and within 0.5 feet of its edge; and “Middle” is pitches in the middle square foot of the zone. Because left-handed and right-handed hitters have different strike zones, I calculated each stat separately for each, then summed the results.
[Table: Outside, Border out, Border in and Middle percentages for the good starts and the bad starts]
In his good starts, he threw more pitches on the inner borders of the strike zone and fewer pitches down the middle. Those are good things. He also threw fewer pitches in the strike zone overall, and more pitches well outside the strike zone. Those are bad things.
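The four location stats can be computed directly from the PITCHf/x `px` (horizontal position in feet from the center of the plate) and `pz` (height in feet) values. A few choices in the sketch below are assumptions:

- The zone boundaries for each batter hand are placeholders.
- Distance off the zone is measured along whichever axis is farther out.
- Each percentage is computed independently, because "Border in" and "Middle" can overlap depending on the zone's size.

Counts for righties and lefties are pooled before dividing, mirroring the summing described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    left: float    # feet from the center of the plate (catcher's view)
    right: float
    bottom: float  # feet above the ground
    top: float


# Placeholder zones; real called zones differ by batter hand.
ZONES = {"R": Zone(-0.83, 0.83, 1.5, 3.5), "L": Zone(-1.0, 0.75, 1.5, 3.5)}
STATS = ("Outside", "Border out", "Border in", "Middle")


def location_flags(px, pz, zone, band=0.5):
    """Which of the four location stats a single pitch counts toward."""
    dx = max(zone.left - px, 0.0, px - zone.right)
    dz = max(zone.bottom - pz, 0.0, pz - zone.top)
    off = max(dx, dz)  # distance outside the zone along the worse axis
    if off > 0:
        return {"Outside": off > band, "Border out": off <= band,
                "Border in": False, "Middle": False}
    edge = min(px - zone.left, zone.right - px, pz - zone.bottom, zone.top - pz)
    cx = (zone.left + zone.right) / 2
    cz = (zone.bottom + zone.top) / 2
    return {"Outside": False, "Border out": False,
            "Border in": edge <= band,
            "Middle": abs(px - cx) <= 0.5 and abs(pz - cz) <= 0.5}


def location_percentages(pitches, zones=ZONES):
    """pitches: (stand, px, pz). Counts are pooled across hands, then divided."""
    counts = dict.fromkeys(STATS, 0)
    for stand, px, pz in pitches:
        for stat, hit in location_flags(px, pz, zones[stand]).items():
            counts[stat] += hit
    return {stat: 100.0 * n / len(pitches) for stat, n in counts.items()}


sample = [("R", 0.0, 2.5), ("R", 1.5, 2.5), ("R", 1.0, 2.5), ("R", 0.7, 2.5)]
print(location_percentages(sample))
# {'Outside': 25.0, 'Border out': 25.0, 'Border in': 25.0, 'Middle': 25.0}
```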
Of course, those are just some very general stats and don’t really tell you much either way. Fortunately, PITCHf/x records show, to the half inch, the location of each pitch thrown by Burnett (and all pitchers).
First, let’s check out his four-seam fastball location:
In both groups, he threw predominantly to the outer half of the plate against both righties and lefties. In his bad starts, the pitches appear to follow a diagonal pattern, while they are a little more scattered in his good starts; however, there doesn’t appear to be much of a difference in his four-seam fastball location.
Let’s check out his curveballs now:
Again, the pitch distributions look almost identical. If you check out his two-seam fastball location, it’s no different. So it appears that any difference in location is going to be too subtle to pick up in a scatterplot. It looks like we need a better shovel.
So I divided the strike zone into 19 parts, like so, and, for each of his two groups of starts, measured the percentage of Burnett’s total pitches of each type that he threw in each of those zones, to righties and lefties separately. That gave me 190 bins of pitches. I then queried my PITCHf/x database for all pitches similar to Burnett’s (you can read more about that in the References & Resources section below) and figured out the average run value per 100 pitches (rv100) to righties and lefties separately in each of the 19 zones, which gave me another 190 bins.
Then I multiplied each pair of matching bins together and summed the results. That gave me an expected rv100 based on location for each of the two groups of starts. In his good starts, the expected rv100 was -.19. In his bad starts, it was -.36. Remember that these run values are counted against the pitcher, so from his point of view, negative is better. It turns out that his location was actually better in his bad starts!
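The expected rv100 is a weighted sum: each bin's share of total pitches times the rv100 of similar pitches in that bin. In the sketch below, bins are keyed by a hypothetical (pitch type, batter hand, zone number) tuple, and the run values are invented.

```python
from collections import Counter


def bin_shares(pitches):
    """Fraction of all pitches falling in each (pitch_type, stand, zone) bin."""
    counts = Counter(pitches)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


def expected_rv100(shares, rv100_table):
    """Sum of share * rv100 over bins; a bin missing from the table raises KeyError."""
    return sum(share * rv100_table[key] for key, share in shares.items())


# Invented data: two bins, half the pitches in each.
pitches = [("FF", "R", 1), ("FF", "R", 1), ("CU", "L", 7), ("CU", "L", 7)]
rv100_table = {("FF", "R", 1): -1.0, ("CU", "L", 7): 0.6}
print(round(expected_rv100(bin_shares(pitches), rv100_table), 2))  # -0.2
```

Because the shares sum to 1 across all bins, the result stays on a per-100-pitches scale.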
By looking at both his stuff and location in those two groups of starts, we have found practically no difference between his best and worst starts of the season. In fact, there is some evidence to suggest that he was actually better in his bad starts. However, by using aggregate data like I did, you take all context out of the equation. The count, score, batter, and base/out situation all dictate how a pitcher goes about his business. And how successful a pitcher is at pitching to the situation obviously has some bearing on how effective he is.
Now, there is no single stat for measuring how well a pitcher pitches to context (that would pretty much solve baseball), so comparing his approach in the two groups of starts is a tough proposition. However, we can look at some basic stats from which we can infer something about the quality of his approach.
First, I looked at his F-Zone, which is the percentage of batters whose first pitch was thrown in the strike zone. Given that getting ahead of batters is touted as a good approach, it makes sense to look at how often Burnett was able to do that. In his good starts, his F-Zone was 56.4 percent. In his bad starts, it was 55.6 percent. Applied to all of the 1,000 or so pitches he threw in each of the two groupings of starts, that gap would amount to about eight pitches; since F-Zone counts only first pitches, the real difference is smaller still. Significant? Probably. But it doesn’t come close to explaining the difference in results between those two groupings of starts.
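A sketch of the F-Zone calculation is below. It also includes a standard pooled two-proportion z statistic, which is one conventional way to put a number on "significant." The figure of 250 batters faced per group is hypothetical. It was picked only so that 56.4 and 55.6 percent come out to whole numbers, since the source gives only pitch totals.

```python
from math import sqrt


def f_zone(first_pitch_in_zone):
    """Percentage of batters whose first pitch was in the strike zone."""
    return 100.0 * sum(first_pitch_in_zone) / len(first_pitch_in_zone)


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for x1/n1 vs. x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical: 250 batters faced in each group of starts.
good = [True] * 141 + [False] * 109  # 56.4 percent
bad = [True] * 139 + [False] * 111   # 55.6 percent
print(f_zone(good), f_zone(bad))
z = two_proportion_z(sum(good), len(good), sum(bad), len(bad))
print(round(z, 2))
```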
We can also look at his pitch selection by count. To do this, I broke the counts into three categories (hitter, neutral and pitcher) based on the average linear weights found in this article by John Walsh. The count distinctions are wholly arbitrary, but it’s a lot more productive to compare three categories than 12. Here is what I got:
I’ve heard a picture is worth a thousand tables of data. In this case, that holds true. It’s really hard to tell his pitch selection apart. The only major difference is in pitcher’s counts, in which he threw slightly more four-seam fastballs and change-ups and fewer two-seamers. Overall, the average absolute difference was 1.9 percentage points, which seems very small.
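Here is a sketch of pitch selection by count category and of an average absolute difference between two pitch mixes. It uses a simple stand-in rule (whoever is ahead in the count) rather than Walsh's linear-weights split, which is not reproduced here. It averages the difference, in percentage points, over every category-and-pitch-type cell. How the 1.9 figure was actually averaged is an assumption.

```python
from collections import Counter, defaultdict


def count_category(balls: int, strikes: int) -> str:
    """Stand-in rule: ahead in balls = hitter, ahead in strikes = pitcher."""
    if balls > strikes:
        return "hitter"
    if strikes > balls:
        return "pitcher"
    return "neutral"


def pitch_mix(pitches):
    """pitches: (balls, strikes, pitch_type) -> {category: {type: percent}}."""
    counts = defaultdict(Counter)
    for balls, strikes, ptype in pitches:
        counts[count_category(balls, strikes)][ptype] += 1
    return {
        cat: {ptype: 100.0 * n / sum(c.values()) for ptype, n in c.items()}
        for cat, c in counts.items()
    }


def mean_abs_diff(mix_a, mix_b):
    """Average |difference| in percentage points over all (category, type) cells."""
    keys = {(cat, t) for mix in (mix_a, mix_b) for cat, types in mix.items() for t in types}
    diffs = [
        abs(mix_a.get(cat, {}).get(t, 0.0) - mix_b.get(cat, {}).get(t, 0.0))
        for cat, t in keys
    ]
    return sum(diffs) / len(diffs)


# Invented pitches: (balls, strikes, pitch type).
good = [(0, 0, "FF"), (0, 0, "CU"), (2, 0, "FF"), (0, 2, "CH")]
bad = [(0, 0, "FF"), (0, 0, "FF"), (2, 0, "FF"), (0, 2, "CH")]
print(mean_abs_diff(pitch_mix(good), pitch_mix(bad)))  # 25.0
```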
There are more things I could look at with regard to approach, like pitch location by count, but the sample size would get too small to draw any meaningful conclusions. Plus, I already looked at aggregate pitch location and aggregate pitch selection by count and found no noticeable differences, so I doubt we’d see anything by breaking it down further.
So what did we learn today from this obnoxiously long article? Well, I took a pitcher’s 10 best and 10 worst starts of the year, between which, you’ll remember, there was an ERA difference of about 8, and found no meaningful differences in what he threw, the velocity and movement of his pitches, where he threw them or when he threw them. I think I’ve established that there was practically no difference in how he pitched in his good starts compared to his bad starts.
Does this show that all peaks and valleys of performance over a long season are simply due to luck? Of course not. Burnett is only one pitcher. However, I believe that this is a strong piece of evidence to support that notion to some extent. I hope someone smarter than me will develop a way to quantify the expected production of a pitcher using PITCHf/x data. Then we could apply it to the population to see if the phenomenon I found today holds true for most pitchers.
References & Resources
PITCHf/x data provided by MLBAM and Gameday.
Regarding the location-by-run-value analysis in the “Location” section, I feel I should offer some caveats. The zones I used are obviously not very precise, so the final run value tally is subject to measurement error. I used those zones because breaking it down any smaller resulted in massive sample size issues for the run values, making the results practically unworkable. To find pitches similar to those thrown by Burnett, I used all pitches within 2 mph of velocity on either side and within three inches of movement. I used data from 2007-2009.
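Here is a sketch of that similarity query. It assumes the three-inch window applies separately to horizontal and vertical deflection; a single combined distance would be another reading. The `Pitch` record and the rows are invented. Per-zone rv100 values would come from running this for each pitch type and batter hand, then grouping the matches by zone.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pitch:
    speed: float      # mph
    pfx_x: float      # inches
    pfx_z: float      # inches
    run_value: float  # runs (negative favors the pitcher)


def similar(target: Pitch, pool, speed_tol=2.0, move_tol=3.0):
    """Pitches within speed_tol mph and move_tol inches on each movement axis."""
    return [
        p for p in pool
        if abs(p.speed - target.speed) <= speed_tol
        and abs(p.pfx_x - target.pfx_x) <= move_tol
        and abs(p.pfx_z - target.pfx_z) <= move_tol
    ]


def rv100(pitches):
    """Average run value per 100 pitches."""
    return 100.0 * mean(p.run_value for p in pitches)


target = Pitch(95.0, -8.0, 9.0, 0.0)  # run_value unused for the target
pool = [
    Pitch(94.0, -7.0, 10.0, -0.02),
    Pitch(97.5, -8.0, 9.0, 0.10),   # too fast
    Pitch(95.0, -12.0, 9.0, 0.05),  # too much run
    Pitch(96.0, -5.5, 11.0, 0.04),
]
matches = similar(target, pool)
print(len(matches), round(rv100(matches), 2))  # 2 1.0
```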