Mike Fast at Baseball Prospectus has done a good job, in two articles (here and here), of looking at the nuances of the umpires’ strike zones and pondering why umpires get things wrong. Even the PITCHf/x data are almost a hindrance to discovery, because of how they are collected and what is missing.
One big missing piece is where, exactly, the umpire is standing when the pitch is delivered. Umpires are taught to stand behind the catcher and position themselves to look high and inside to the batter. If the umpire is positioning himself to get his best view of high and inside, this would lead, based on how the eyes work and process the flight of a ball, to a low and outside pitch being one of the hard ones to follow. It makes perfect sense that those types of pitches cause the most questionable calls.
It also leads to the impression that if a ball is difficult to follow but it appears to go directly where it should (it hits the catcher’s mitt), it must be a good pitch. Some people would say that it’s a “gut feeling” that the pitch is right when the eyes are confused.
Also, if we knew the location of the umpire, we could understand whether he is making a prediction about where the pitch might go. Of course, if the umpire is attempting to predict the location of the pitch, it would be interesting to see how he calls the pitch when he guesses wrong.
It is this lack of information that makes me shake my head when people talk about the need for more gut feelings in baseball. That type of decision-making can be affected by many things, including whether or not you have recently eaten.
Note: Always schedule your parole hearing right after a food break. It also might be a good idea to make sure the umpires are fed well between innings.
So, not only do you have to worry about the statistical side of an umpire’s strike zone, but behavioral economics might be involved.
For a little bit of perspective, I wanted to see how called strikes, swinging strikes, balls, and the rest (hits, foul balls, etc.) were affected by the height of the pitch. I reasoned that the boundaries of the strike zone are biased toward the batter’s ability to hit the ball. It is just a simple bias about basic human limitation: if it is a borderline pitch that can’t be hit, it is a ball. (This assumption does not have to be true; it just has to be a common assumption people make.)
I grabbed about 75,000 pitches from this year that were over the middle 12 inches of home plate. Then I split them between right- and left-handed batters and by the previously mentioned categories. This is an approach Mike discussed in his second article.
Between 0.6 feet and 3.8 feet, more than 1,000 pitches are present for each 0.2 feet in height. It should be noted that there is ONE 4.8 foot swinging strike. I suspect that is either a switching pitch-out or a misreported pitch. Outside that range, small sample sizes limit how accurate the information can be. I had an average strike zone bottom of 1.6 feet and an average strike zone top of 3.4 feet.
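For readers who want to try the same kind of breakdown, here is a minimal sketch of the filtering and binning described above. It is not the code behind the numbers in this article. It assumes each pitch is a dictionary with PITCHf/x-style fields px (horizontal location in feet from the center of the plate), pz (height in feet) and stand (batter side), plus a hypothetical outcome label that you would have to derive yourself from the pitch description.

```python
import math
from collections import Counter, defaultdict

BIN_WIDTH = 0.2   # feet of height per bin, as in the article
HALF_WIDTH = 0.5  # middle 12 inches of the plate = 6 inches either side of center


def height_bin(pz, width=BIN_WIDTH):
    """Return the lower edge (in feet) of the height bin that contains pz."""
    # The small epsilon keeps values such as 0.6 from falling into the bin below
    # because of floating-point rounding.
    return round(math.floor(pz / width + 1e-9) * width, 1)


def tabulate(pitches):
    """Count outcomes per (batter side, height bin) for pitches over the middle of the plate."""
    counts = defaultdict(Counter)
    for p in pitches:
        if abs(p["px"]) > HALF_WIDTH:
            continue  # outside the middle 12 inches, so skip it
        counts[(p["stand"], height_bin(p["pz"]))][p["outcome"]] += 1
    return counts


def shares(counter):
    """Turn the outcome counts for one bin into fractions of that bin's pitches."""
    total = sum(counter.values())
    return {outcome: n / total for outcome, n in counter.items()}


# Made-up sample rows, only to show the shape of the input.
sample = [
    {"px": 0.1, "pz": 2.05, "stand": "R", "outcome": "called_strike"},
    {"px": -0.3, "pz": 2.15, "stand": "R", "outcome": "in_play"},
    {"px": 0.9, "pz": 2.10, "stand": "R", "outcome": "ball"},  # too far off center
    {"px": 0.0, "pz": 0.65, "stand": "L", "outcome": "ball"},
]
table = tabulate(sample)
```

Calling shares(table[("R", 2.0)]) gives the mix of outcomes for right-handed batters on pitches between 2.0 and 2.2 feet. Bins with only a handful of pitches should be treated with the same caution as the lone 4.8 foot swinging strike.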
A lot of things make sense.
At first look, the height of the pitch affects things very much as one would expect. Throwing to the middle of the plate in or around the strike zone is more likely to get you a ball in play or a strike, but generally a called strike. There is still a percentage of balls called, but upon some investigation those balls are due much more to varying heights of batters than to questionable calls by the umpire.*
* I did some minor corrections, normalizing batter strike zones and giving some variance for park corrections, and only about 1 percent of the ball calls between the top and bottom of the strike zone could be considered questionable. But it was a very quick and dirty method, so non-statistical that a statistician would rather remove his left foot with a dull butter knife than review it.
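One common way to account for batters of different heights, sketched here as an illustration and not as the quick and dirty method mentioned in the footnote, is to rescale each pitch height against that batter's own zone. The sketch assumes PITCHf/x-style sz_bot and sz_top values (the bottom and top of the batter's zone, in feet); the function names are hypothetical.

```python
def normalized_height(pz, sz_bot, sz_top):
    """Rescale pitch height so 0.0 is the batter's zone bottom and 1.0 is the zone top."""
    if sz_top <= sz_bot:
        raise ValueError("zone top must be above zone bottom")
    return (pz - sz_bot) / (sz_top - sz_bot)


def within_zone_height(pz, sz_bot, sz_top):
    """True when the pitch height falls between this batter's zone bottom and top."""
    return 0.0 <= normalized_height(pz, sz_bot, sz_top) <= 1.0


# The same 3.3 foot pitch is inside the average zone (1.6 to 3.4 feet)
# but above the zone of a shorter batter whose zone runs 1.5 to 3.2 feet.
average_batter = within_zone_height(3.3, 1.6, 3.4)
shorter_batter = within_zone_height(3.3, 1.5, 3.2)
```

This is why a ball called at a height inside the average strike zone is not automatically a questionable call: for a shorter batter, that pitch can sit above the top of his zone.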
Umpires are calling a rather defined strike zone.
The real question is: does the graph above represent a real strike zone defined by the players, as Mike suggested, or does it reflect the size of the strike zone as defined by the rules of baseball? Granted, the current size of the strike zone is smaller than what is written in the rule book, but if we changed the rule book strike zone, would we change the size of the reflected strike zone?
Looking back at my graph, just around the average strike zone top and bottom, swinging strikes increase and balls in play decrease. This is exactly what would happen if a batter swings at pitches near the edge of the strike zone. We also see balls increasing and called strikes decreasing (well, called strikes increase a bit, then tail off). This is what follows when a batter holds off on pitches near the edge of the strike zone.
Right around the strike zone there are two fairly clearly defined patterns: swinging or not swinging at pitches near the edge of the strike zone.
I say that the strike zone is defined by the batter results. If batters are swinging and missing at more pitches in a location, that location will be considered a ball. If batters are hitting the ball there and swinging and missing less, that location will be considered a strike. Even if the size of the strike zone changes, the boundaries enforced by the umpire will be defined by that simple bias.
If a player has trouble putting the bat on the ball, it is a ball. If he swings and hits it, it is a strike. The strike zone is defined by the rule book, largely refined by the players and enforced by the umpires.