Strike Zone: Fact vs. Fiction
by John Walsh
July 11, 2007
It's been called the War Zone, the place where batters and pitchers square off, the territory that each side has to conquer if victory is to be assured: the strike zone. This is how the strike zone is defined in the rulebook:
The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the knee cap. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball.
Straightforward I guess, although certainly not easy for an umpire. How are they supposed to determine that mid-point between belt and top of shoulder, not when the batter takes his stance, but when he is "prepared to swing at a pitched ball?" But the difficulty of calling balls and strikes is not exactly what I want to write about today. What I would like to address is this: Are umpires accurately calling the rulebook strike zone or do they have their own, unwritten, strike zone? We have all heard anecdotal accounts of umpires calling strikes six inches off the plate and calling belt-high pitches balls.
In 2001, Sandy Alderson, then executive vice president of operations for MLB, initiated a campaign to get the umpires to call the rulebook strike zone. He was especially keen to see the return of the high strike, since virtually all pitches from the belt up were being called balls. The implication of Alderson's crusade is that umpires were not calling the rulebook strike zone and needed to be re-educated.
Have they improved? Are they now calling the strike zone according to the rulebook? We can try to answer this question by measuring the strike zone as called by major league umpires and comparing it to the rulebook strike zone. This is of more than purely academic interest: one of the reasons given for the huge increase in offensive production in the 1990s is the ever-shrinking strike zone. We can't now go back and see if that claim is accurate or not, but we can measure the strike zone today and in the future and see how it changes and how it affects offense.
So, let's roll up our sleeves and get to work.
Strike zone: actual vs. rulebook
We can measure the dimensions of the strike zone using MLB's pitchF/x data, which gives detailed location information on every pitch thrown in the ballparks where the hardware is installed (about 80,000 pitches at the time I'm writing this). We know for each pitch the (x,y) position (horizontal and vertical) of where it crossed the plate and we know whether the umpire called it a ball or strike. We also know, thanks to the operator of the pitchF/x system, the lower and upper limits of the strike zone for each individual batter.
I decided to split up the job into two pieces: first I'm going to see how wide the strike zone is, then I'll tackle the vertical size. The first step is selecting pitches, either balls or called strikes, that are safely within the vertical strike zone, i.e. they are clearly not low or high. The picture on the right shows what I mean. Each pitch is represented by a black circle, but there are so many of them they appear as the black band in the plot.
The red rectangle represents the strike zone. I've mapped the vertical location of each pitch to an average strike zone that goes from 1.6 feet to 3.56 feet. This removes the batter-to-batter variation and allows me to show a single strike zone in the plots. Home plate is 17 inches wide, but the horizontal dimension of the strike zone is actually just a hair under 20 inches (1.66 feet), because a strike is called if any part of the ball crosses over any part of the plate.
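As a rough illustration of the two numbers above, here is a minimal Python sketch. The ball diameter of about 2.9 inches and the linear rescaling of each batter's zone onto the average zone are assumptions on my part (the text does not spell out the exact mapping), and the function names are hypothetical.

```python
PLATE_WIDTH_IN = 17.0     # width of home plate, from the text
BALL_DIAMETER_IN = 2.9    # ASSUMPTION: approximate diameter of a baseball

AVG_ZONE_BOTTOM_FT = 1.6  # average strike zone limits, from the text
AVG_ZONE_TOP_FT = 3.56


def rulebook_width_in(plate=PLATE_WIDTH_IN, ball=BALL_DIAMETER_IN):
    """Width of the zone measured at the ball's center.

    The ball's center may sit up to one radius beyond either edge of the
    plate and still clip it, so the width is plate + 2 * radius.
    """
    return plate + ball


def map_to_average_zone(pz_ft, zone_bottom_ft, zone_top_ft):
    """ASSUMED linear mapping of a pitch height onto the average zone.

    A pitch at the batter's own lower limit lands on 1.6 ft, one at his
    upper limit lands on 3.56 ft, and everything else scales in between.
    """
    frac = (pz_ft - zone_bottom_ft) / (zone_top_ft - zone_bottom_ft)
    return AVG_ZONE_BOTTOM_FT + frac * (AVG_ZONE_TOP_FT - AVG_ZONE_BOTTOM_FT)
```

With these assumptions, rulebook_width_in() comes out to roughly 19.9 inches, which matches the "hair under 20 inches" figure quoted above.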
As you can see, these pitches are all well within the vertical strike zone, so if they are called a ball, we can be fairly sure that they are considered either inside or outside by the umpire. Note that I've avoided the upper part of the strike zone, just in case the umps are calling that pitch high.
Next, I divide the data into bins of horizontal position and see how many balls and strikes are called at each position. As is often the case, it's easier to show the results on a graph than to describe it in words.
The bottom plot shows the fraction of called balls as a function of the horizontal position. If the umpires and the pitch location data were perfect, we'd expect the ball fraction to be equal to zero within the rulebook strike zone (depicted by the vertical red lines) and equal to one outside the red lines. We see this general behavior with a few key differences.
For one, there are some balls called even when the data say the ball is right down the middle, both vertically and horizontally. These are either totally blown calls or faulty data or a mixture of both (more on this point later). Secondly, we don't see an abrupt change where the ball fraction changes from zero to one, but rather a continuous curve. That's to be expected, of course, since neither umpires nor pitch-measuring systems are infinitely accurate.
We can still define the width of the strike zone, however, even though there is no sharp transition. We somewhat arbitrarily, but reasonably, say that the edge of the measured strike zone corresponds to the position where the ball fraction is one-half. On the graph this is shown by the vertical green lines, whose positions are defined by where the horizontal green line crosses the blue curve (or, equivalently, where the blue and red curves cross in the upper plot).
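The binning and the one-half crossing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin width, the horizontal range, and the use of linear interpolation between neighboring bins are my choices rather than anything taken from the analysis itself, and the function names are hypothetical.

```python
def ball_fraction_by_bin(pitches, bin_width=0.1, lo=-2.0, hi=2.0):
    """Bin pitches by horizontal position and compute the ball fraction.

    pitches: iterable of (x_feet, is_ball) for balls and called strikes.
    Returns a list of (bin_center, ball_fraction), skipping empty bins.
    """
    n_bins = int(round((hi - lo) / bin_width))
    balls = [0] * n_bins
    total = [0] * n_bins
    for x, is_ball in pitches:
        if lo <= x < hi:
            i = min(int((x - lo) / bin_width), n_bins - 1)
            total[i] += 1
            if is_ball:
                balls[i] += 1
    return [(lo + (i + 0.5) * bin_width, balls[i] / total[i])
            for i in range(n_bins) if total[i] > 0]


def half_crossings(curve, level=0.5):
    """Positions where the ball-fraction curve crosses `level`.

    Uses linear interpolation between adjacent bin centers (an assumption).
    """
    xs = []
    for (x0, f0), (x1, f1) in zip(curve, curve[1:]):
        if f0 == level:
            xs.append(x0)
        elif (f0 - level) * (f1 - level) < 0:
            xs.append(x0 + (level - f0) * (x1 - x0) / (f1 - f0))
    if curve and curve[-1][1] == level:
        xs.append(curve[-1][0])
    return xs
```

For a well-behaved curve, half_crossings returns two positions, one per edge, and their difference is the measured width of the zone. A noisy curve can cross one-half more than twice, so in practice some smoothing or a fitted curve would probably be needed.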
The horizontal widths of the measured and rulebook strike zones are shown by the green and red arrows. It appears from these data that the umpires' strike zone is about two inches too wide on each side, compared to the rulebook strike zone. Oh, all this is for right-handed batters.
For left-handed batters, things are a bit different, as you can see in this plot:
I've used the same procedure to measure the vertical size of the strike zone. First, balls and called strikes that are horizontally centered on the plate are chosen, then I look at the fraction of called balls as a function of the vertical location of the pitch. Here are the results for right- and left-handed batters:
Now we see that the umpires are calling a smaller vertical strike zone than the rulebook calls for. For right-handed batters, the upper edge of the zone is called correctly, while umps are not calling the lowest strikes. That doesn't surprise me — whenever I see a replay of a low pitch from the side of the plate, i.e. so you can judge the height relative to the batter's knees, I'm often surprised that what I thought was a very low pitch is actually about knee-high.
Umps have the same problem calling low strikes for left-handed batters and they also call fewer high strikes against lefties.
Another thing to note about these plots of the vertical strike zone: there is less accuracy in calling the vertical strike zone compared to the horizontal one. This is seen by noting how rounded the blue curves are for the vertical dimension: the transition from strike to ball is much less sharp than for the horizontal case. This should not be surprising since judging the vertical strike zone, where the umpire has to estimate "the mid-point between the top of the pants and the top of the shoulder," is more difficult.
We can summarize the findings on the actual strike zone with the following plot and table:
Actual vs. Rulebook Strike Zone Dimensions (inches)

            Left    Right   Lower*   Upper   Total Area+
RHB        -12.0    12.1    21.6     42.7    509
LHB        -14.6     9.9    21.2     41.0    485
Rulebook    -9.9     9.9    19.2     42.7    465

* vertical strike zone mapped to average
+ total area in square inches
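The total areas in the table follow directly from the four edges: width times height. Here is a small Python check that reproduces them from the table values. The dictionary and function names are only illustrative.

```python
# Edges in inches, copied from the table: (left, right, lower, upper)
ZONES_IN = {
    "RHB": (-12.0, 12.1, 21.6, 42.7),
    "LHB": (-14.6, 9.9, 21.2, 41.0),
    "Rulebook": (-9.9, 9.9, 19.2, 42.7),
}


def zone_area(left, right, lower, upper):
    """Area of a rectangular zone in square inches."""
    return (right - left) * (upper - lower)


for name, edges in ZONES_IN.items():
    print(name, round(zone_area(*edges)))
```

Rounded to the nearest square inch, this gives 509, 485 and 465, so the called zone is roughly 9% larger than the rulebook zone for right-handed batters and about 4% larger for left-handed batters.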
Fly in the ointment?
I've already mentioned that the ball fraction for pitches right down the middle of the plate is not zero; in fact, it's about 5-6%. Can umpires be missing these easy calls so frequently? It seems hard to believe. The alternative explanation is that there is some problem with the data.
I believe we can settle this issue by carefully studying the data, but a detailed investigation will have to wait for another day. As a first look, I have watched video of a half-dozen of these pitches that were measured to be right down the pipe, but were called balls.
What I found was this: a few of these were clearly outside the strike zone, a couple were borderline, and none were right down the middle. And if you doubt my pitch-calling ability using mlb.tv on my little computer screen, all I can say is that one pitch whose recorded location was right in the heart of the strike zone was actually an intentional ball that was thrown two feet off the plate!
I am currently working on understanding this issue in more depth. Right now it appears that 1) this is an issue with the quality of the data and 2) it will not materially affect the strike zone measurements I've presented here. I hope to have more information on this next time.
John Walsh dabbles in baseball analysis in his spare time. He welcomes questions and comments via e-mail.