The Compassionate Umpire
by John Walsh
April 07, 2010
The strike zone, as defined by the rule book, is supposed to be a constant of nature, like the speed of light or the boiling temperature of water. Well, you know what? Don't you believe it! The size of the strike zone is changing continuously: it gets bigger or smaller after nearly every pitch. Several questions come to mind: How? Why? Are you drunk?
Let's not get into the last of those, but I can tell you why: because major league umpires are a compassionate bunch of guys. They can't help pulling for the underdog. I'm convinced that they don't do it on purpose, but they do it; they can't escape their sympathetic nature. They seemingly cannot avoid giving the batter a little helping hand when he finds himself down in the count, 0-2. But our boys in blue are not biased against pitchers, oh no. They are more than willing to come to the aid of a pitcher who has just thrown three straight balls. Ever see a Little League game where an 8-year-old pitcher is having a hell of a time throwing a strike? Does the ump call like 20 balls in a row? Of course not: anything remotely close is going to be called a strike. Well, it's the same in the big leagues, to a lesser degree, of course.
Measuring the strike zone
So, in order to see this vacillating, capricious, fickle strike zone, we need to have some way to measure its size. A long time ago, I did some strike zone analysis, but those were the heady days of the advent of Pitch f/x: There wasn't very much data to analyze. I did the best I could, but now we can do a lot better. Here's how I measure the size of the strike zone.*
*I realize that some (most?) of you couldn't care less how I measure the strike zone. Feel free to skip down to the next section. I won't take it personally. Well, maybe a little.
First, I divide the strike zone and the surrounding area into a matrix of bins, or buckets. A bin is nothing more than a little box that defines a pitch location. Then, using the Pitch f/x data, I gather location information on a large number of called pitches, by which I mean either balls or called strikes. I put each called pitch in its box. Next I select all the called strikes and put them in their respective boxes. Finally, within each box, I divide the number of called strikes by the total number of called pitches, giving the fraction of called strikes, or the called strike percentage for each bin. Got all that? Doesn't matter, here's a picture:
Here we see the area surrounding the strike zone, as seen from the catcher's perspective. I used 200,000 pitches to right-handed* batters to make this plot. The color of each bin tells you the called strike percentage for that location—brown down the middle (strike percentage 100 percent) and deep blue well outside the zone (strike percentage near zero percent).
*There is a good reason why it's important to work only with right-handed or left-handed batters, but not both together. But I will have to discuss it another time, since it has nothing to do with today's topic.
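For the curious, the binning procedure described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code behind the plot: the function name, the argument names (px and pz for horizontal and vertical pitch location in feet, is_strike for the umpire's call) and the toy bin edges are all hypothetical.

```python
import numpy as np

def called_strike_fraction(px, pz, is_strike, x_edges, z_edges):
    """Fraction of called pitches in each location bin that were called strikes.

    All names here are illustrative assumptions, not fields from the article.
    """
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    is_strike = np.asarray(is_strike, dtype=bool)
    edges = [np.asarray(x_edges, dtype=float), np.asarray(z_edges, dtype=float)]

    # Put every called pitch (balls and called strikes) in its bin.
    total, _, _ = np.histogram2d(px, pz, bins=edges)
    # Put only the called strikes in the same bins.
    strikes, _, _ = np.histogram2d(px[is_strike], pz[is_strike], bins=edges)

    # Divide strikes by total; bins that saw no pitches stay NaN.
    frac = np.full(total.shape, np.nan)
    np.divide(strikes, total, out=frac, where=total > 0)
    return frac

# Toy data: five called pitches spread over a 2 x 2 grid of one-foot bins.
px = [-0.5, -0.5, 0.5, 0.5, 0.5]
pz = [2.5, 2.5, 2.5, 2.5, 3.5]
is_strike = [True, False, True, True, False]
frac = called_strike_fraction(px, pz, is_strike, [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0])
print(frac)
```

In the toy grid, the lower-left bin holds one strike and one ball, so its called strike fraction is 0.5, while the upper-left bin saw no pitches and is left undefined. A real analysis would use much smaller bins and far more pitches.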
It's clear from the graph that pitches in certain areas within the rule book zone (shown by the white box), especially in the lower and upper portions of the zone, are being called balls, while pitches in other areas that are outside the rule book zone are being called mostly strikes. Everybody complains about the umps not calling the high strike, but they don't call the low strike either.
I want to measure the size of the strike zone as it's actually called, but first I have to define that. It's obvious from the graph that there isn't a sharp cut-off where pitches go from being strikes to being balls. There is more of a gradual shift from strikes to balls. In some bins, the 50 percent bins, a pitch is equally likely to be called a ball or a strike. If you identify the 50 percent bins and draw a line connecting them, you get the black contour line shown in the graph. That's the true strike zone, as called by the umps. The size of the strike zone, then, is just the area inside the contour, in this case 3.09 square feet.
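A rough way to turn a grid of called strike fractions into an area is sketched below. Note that this is a simplification I am substituting for illustration: instead of tracing a smooth 50 percent contour, it just adds up the area of every bin whose called strike fraction is at least 50 percent. The function name and the toy grid are hypothetical, and the result will only approach a contour-based area as the bins get small.

```python
import numpy as np

def zone_area(frac, x_edges, z_edges, threshold=0.5):
    """Approximate the called strike zone area (in square feet) by summing
    the area of all bins whose called strike fraction meets the threshold.

    This is a bin-counting stand-in for measuring the area inside a contour.
    """
    frac = np.asarray(frac, dtype=float)
    widths = np.diff(np.asarray(x_edges, dtype=float))
    heights = np.diff(np.asarray(z_edges, dtype=float))
    # Area of each bin, laid out on the same grid as frac.
    bin_area = np.outer(widths, heights)
    # Empty bins (NaN) are treated as outside the zone.
    inside = np.nan_to_num(frac, nan=0.0) >= threshold
    return float(bin_area[inside].sum())

# Toy 2 x 2 grid of one-foot bins: two bins are at or above 50 percent.
toy_frac = [[0.5, np.nan],
            [1.0, 0.0]]
area = zone_area(toy_frac, [-1.0, 0.0, 1.0], [2.0, 3.0, 4.0])
print(area)  # 2.0
```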
Effect of the ball-strike count
OK, now that I've got this new strike zone measuring toy, I thought I'd have a look at whether the size of the zone depends on the count. I think most of us have the intuition that on a 3-0 pitch, umpires will tend to call "anything close" a strike. Conversely, I have the notion (which turns out to be true) that when the hitter is in a deep hole, an 0-2 count, the umpires get picky on calling that third strike. Below you see the strike zone graphic separately for pitches thrown at 3-0 and 0-2, respectively.
These graphs are not as pretty as the previous one because far fewer pitches go into them. I had to enlarge the bin size and there are more statistical fluctuations in each bin. Nevertheless, we can still see the large difference in the two strike zones. Here are the numbers:
Count   Strike zone size (sq. ft.)
-----------------------------------
All     3.09
3-0     3.52
0-2     2.42
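To put those three areas in relative terms, here is a quick bit of arithmetic. The numbers are the ones in the table; the variable names are just for illustration.

```python
# Strike zone areas in square feet, copied from the table above.
areas = {"All": 3.09, "3-0": 3.52, "0-2": 2.42}

# How much larger the 3-0 zone is than the 0-2 zone, in percent.
pct_larger = (areas["3-0"] / areas["0-2"] - 1) * 100
print(f"3-0 zone vs. 0-2 zone: {pct_larger:.1f} percent larger")  # 45.5

# Each extreme count compared with the all-counts zone, in percent.
for count in ("3-0", "0-2"):
    change = (areas[count] / areas["All"] - 1) * 100
    print(f"{count} zone vs. all counts: {change:+.1f} percent")
```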
Wow, the 3-0 zone is about 45 percent larger than the 0-2 zone. It's even more striking if you overlay the two zones, like this:
It's as clear as day: These umpires are a bunch of softies. They see a pitcher struggling to put the ball over and they go all Gandhi on us, giving the pitcher an extra chunk of strike zone to work with when the count reaches 3-0.
And when the batter becomes the underdog, when the count goes to 0-2? Why, the hearts of our merciful arbiters simply turn to mush: They can't help pulling for the poor batter as he chokes up on the bat, hoping to make some kind of contact. Who knew the umps were such empathetic characters?
I have always assumed that the umps call a different strike zone based on count only on the extreme pitchers' and hitters' counts. Sure, we see a big difference between 3-0 and 0-2, but the strike zone is more or less constant for the other counts, yes? Well, no.
I'm going to show a plot of strike zone area vs. ball-strike count, but before I do that (no looking ahead!) I need to quantify how much each count favors the hitter or pitcher. Actually, it turns out that you can assign a run value to each count, so a 3-0 count is worth .22 runs to the batter, while an 0-2 count is worth about -.11 runs. I worked out these values in a THT article on pitch values. See that article for how these run values are determined, or have a look in the Resources section below if you want to see the actual values.
Knowing the value of each count is useful, because now I can plot the size of the strike zone vs. the run value. For low run values, where the hitter is at a disadvantage, we might expect the umps to shrink the strike zone. Conversely, at high run values, where the pitcher is struggling, we expect the strike zone to grow in size. Indeed, this is what we saw in the two extreme cases. The plot, which shows the size of the strike zone for all counts, is shown here on the right.
Each point represents a different ball-strike count (some of them are labeled). On the horizontal axis the run value of the count is shown, e.g. you can see that 0-2 is worth a little less than -0.1 runs, as I mentioned above. The vertical axis shows the area of the strike zone, using the 50 percent contour as described above.
Interestingly, we see a very strong correlation between strike zone size and the run value of the count.* So, umps are (sub-consciously, to be sure) making small adjustments to the size of the strike zone depending on the count. And they are doing it in such a way as to help the underdog of the moment in the batter/pitcher matchup.
*You'll notice that the 3-2 count and, to a lesser degree, the 3-1 count do not follow the overall trend so well. I don't have a good explanation for that. Maybe the ump, after being generous with a called strike on 3-0, is less inclined to help out on 3-1? Possibly, but maybe it's just statistical scatter in the strike zone measurement.
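For a rough sense of scale, the two extreme counts are the only ones whose areas are quoted in this article, so the sketch below draws a straight line through just those two points. Treat it as back-of-the-envelope arithmetic under that assumption, not as the fit through all twelve counts in the plot.

```python
# Zone areas (sq. ft.) and run values quoted in this article for the two
# extreme counts. A two-point line is an illustration, not a regression.
area = {"3-0": 3.52, "0-2": 2.42}
run_value = {"3-0": 0.220, "0-2": -0.106}

# Change in zone area per run of count value between 0-2 and 3-0.
slope = (area["3-0"] - area["0-2"]) / (run_value["3-0"] - run_value["0-2"])
print(f"{slope:.2f} square feet of strike zone per run of count value")
```

Between those two counts, the zone grows by roughly 3.4 square feet per run of count value, or about a third of a square foot for every tenth of a run.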
Here's a secret: All this is not exactly ground-breaking research—we already knew that the size of the strike zone varies with count. THT contributor Jon Hale, among others, has looked at variations in the strike zone due to ball-strike count.
But I'd like to give a special shout-out to Dave Allen, who writes for Baseball Analysts and FanGraphs. Dave has been doing all kinds of great work using the Pitch f/x data, including work on the strike zone*. I've also been very envious of his graphics for some time and I finally got around to learning how to produce these "heat map" graphics, with help from a tutorial that Dave gave at the Pitch-FX Summit last year. So, thanks for that, Dave.
*If you read Dave's article on the strike zone (and you definitely should), you'll notice that his values for the area come out somewhat different than mine. We both find the same general trends, but Dave's numbers come out a little smaller. I suspect we are defining our strike zone slightly differently.
References and Resources
Here is a table of the run values of ball-strike counts, taken directly from my previous article, "Searching for the game's best pitch".
+-------+-------------+
| Count | Run value   |
+-------+-------------+
| 0-0   |  0.000      |
| 1-0   |  0.038      |
| 2-0   |  0.104      |
| 3-0   |  0.220      |
| 0-1   | -0.044      |
| 1-1   | -0.015      |
| 2-1   |  0.037      |
| 3-1   |  0.142      |
| 0-2   | -0.106      |
| 1-2   | -0.082      |
| 2-2   | -0.039      |
| 3-2   |  0.059      |
+-------+-------------+
John Walsh dabbles in baseball analysis in his spare time. He welcomes questions and comments via e-mail.