A very non-statistical look at umpires and strike zones

by Mat Kovach
June 2, 2011

Mike Fast at Baseball Prospectus has done a good job, in two articles (here and here), looking at the nuances of the umpires’ strike zones and pondering why umpires get things wrong. Even the PITCHf/x data are almost a hindrance to discovery, given how they are collected and what is missing.

One big missing piece is where, exactly, the umpire is standing when the pitch is delivered. Umpires are taught to stand behind the catcher and position themselves to look high and inside to the batter. If the umpire is positioning himself to get his best view of high and inside, this would lead, based on how the eyes work and process the flight of a ball, to a low and outside pitch being one of the hard ones to follow. It makes perfect sense that those types of pitches cause the most questionable calls.

It also leads to the impression that if a ball is difficult to follow but appears to go directly where it should (it hits the catcher’s mitt), it must be a good pitch. Some people would say that it’s a “gut feeling” that the pitch is right when the eyes are confused.

Also, if we knew the location of the umpire, we could tell whether he is making a prediction about where the pitch might go. Of course, if the umpire is attempting to predict the location of the pitch, it would be interesting to see how he calls the pitch when he guesses wrong.

It is this lack of information that makes me shake my head when people talk about the need for more gut feelings in baseball. That type of decision-making can be affected by many things, including whether or not you have recently eaten.

Note: Always schedule your parole hearing right after a food break. It also might be a good idea to make sure the umpires are fed well between innings.

So, not only do you have to worry about the statistical side of an umpire’s strike zone, but behavioral economics might be involved.

For a little bit of perspective, I wanted to see how called strikes, swinging strikes, balls, and the rest (hits, foul balls, etc.) were affected by the height of the pitch. I reasoned that the boundaries of the strike zone are biased toward the batter’s ability to hit the ball. It is just a simple bias about basic human limitation: if it is a borderline pitch that can’t be hit, it is a ball. (This assumption does not have to be true; it is just a common one people make.)

I grabbed about 75,000 pitches from this year that were over the middle 12 inches of home plate. Then I split them between right- and left-handed batters and by the previously mentioned categories. This is an approach Mike discussed in his second article.

Between 0.6 feet and 3.8 feet, more than 1,000 pitches are present for each 0.2 feet in height. It should be noted that there is ONE 4.8 foot swinging strike. I suspect that is either a switching pitch-out or a misreported pitch. Sample sizes restrict accurate information from other height ranges. I had an average strike zone bottom of 1.6 feet and an average strike zone top of 3.4 feet.
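For readers who want to try the same slicing, here is a minimal sketch in Python. It is not the script used for this article: the record layout (px, pz, stand, result), the category labels, and the sample pitches are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

HALF_WIDTH_FT = 0.5  # middle 12 inches of the plate: 6 inches either side of center


def height_bin(pz, bin_hundredths=20):
    """Lower edge (in feet) of the 0.2-foot bin that contains height pz.

    Works in hundredths of a foot so bin edges such as 0.6 do not
    slip into the bin below because of floating-point error.
    """
    hundredths = math.floor(pz * 100 + 1e-6)
    return (hundredths // bin_hundredths) * bin_hundredths / 100


def tally(pitches):
    """Count results per (batter side, height bin) for pitches over the middle."""
    counts = defaultdict(Counter)
    for px, pz, stand, result in pitches:
        if abs(px) > HALF_WIDTH_FT:
            continue  # not over the middle 12 inches, so skip it
        counts[(stand, height_bin(pz))][result] += 1
    return counts


def shares(counter):
    """Turn one bin's raw counts into fractions of that bin's total."""
    total = sum(counter.values())
    return {result: n / total for result, n in counter.items()}


# Invented sample rows: (px in feet, pz in feet, batter side, result)
sample = [
    (0.1, 2.5, "R", "other"),
    (-0.3, 2.45, "R", "called_strike"),
    (0.9, 2.5, "R", "ball"),  # too far outside, dropped by the filter
    (0.0, 3.65, "L", "ball"),
    (0.2, 0.6, "L", "swinging_strike"),
]

for key, counter in sorted(tally(sample).items()):
    print(key, shares(counter))
```

Computing the shares inside each batter-side and height bin, rather than against all pitches, keeps a thinly populated height from looking like an unusual umpire tendency.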

A lot of things make sense.

Eliminating the outside part of the plate gives a look only at pitches over the meat of the plate. The number of called strikes is low until one gets above or below the strike zone.

As the pitches start to get out of the strike zone, hits and called strikes go down, while swinging strikes and balls increase.

Hits, called strikes, and balls are clustered together for both right-handers and left-handers, with putting balls in play being the most likely outcome for a ball over the middle of the plate and in the strike zone.

At first look, the height of the pitch affects things very much as one would expect. Throwing to the middle of the plate in or around the strike zone is more likely to get you a ball in play or a strike, but generally a called strike. There is still a percentage of balls called, but upon some investigation those balls are due much more to varying heights of batters than to questionable calls by the umpire.*

* I did some minor corrections, normalizing batter strike zones and giving some variance for park corrections, and only about 1 percent of the ball calls between the top and bottom of the strike zone could be considered questionable. But it was a very quick and dirty method, so non-statistical that a statistician would rather remove his left foot with a dull butter knife than review it.
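One common way to put batters of different heights on the same scale is sketched below in Python. This is an assumption about how such a normalization could be done, not the quick and dirty method described above: it expresses each pitch height as a fraction of that batter's own zone and then maps it onto the average zone (1.6 to 3.4 feet) reported earlier. The function names are hypothetical, and sz_bot and sz_top are assumed to be the per-pitch zone bottom and top in feet. It does not attempt any park correction.

```python
AVG_SZ_BOT_FT = 1.6  # average strike zone bottom reported in the article
AVG_SZ_TOP_FT = 3.4  # average strike zone top reported in the article


def zone_fraction(pz, sz_bot, sz_top):
    """Pitch height as a fraction of the batter's zone: 0 at the bottom, 1 at the top."""
    if sz_top <= sz_bot:
        raise ValueError("zone top must be above zone bottom")
    return (pz - sz_bot) / (sz_top - sz_bot)


def normalized_height(pz, sz_bot, sz_top):
    """Map the pitch height onto the average zone so batters can be compared."""
    frac = zone_fraction(pz, sz_bot, sz_top)
    return AVG_SZ_BOT_FT + frac * (AVG_SZ_TOP_FT - AVG_SZ_BOT_FT)


# A pitch at the very top of a tall batter's zone (invented numbers)
# lands at the top of the average zone after normalization.
print(normalized_height(3.6, 1.8, 3.6))
```

A called ball whose normalized height still falls between 1.6 and 3.4 feet would then be a candidate for a questionable call, subject to measurement error in the zone limits themselves.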

Umpires are calling a rather defined strike zone.

The real question is: does the graph above represent a real strike zone defined by the players, as Mike suggested, or does it reflect the size of the strike zone as defined by the rules of baseball? Granted, the current strike zone is smaller than the one written in the rule book, but if we changed the rule book strike zone, would we change the size of the reflected strike zone?

Looking back at my graph, just around the average strike zone top and bottom, swinging strikes increase and balls in play decrease. This is exactly what would happen if a batter swings at pitches near the edge of the strike zone. We also see balls increasing and called strikes decreasing (well, called strikes increase a bit, then tail off). This is what happens when a batter holds off on pitches near the edge of the strike zone.

Right around the strike zone there are two fairly clearly defined patterns: swinging or not swinging at pitches near the edge of the strike zone.

I say that the strike zone is defined by the batter results. If batters are swinging and missing at more pitches, the locations will be considered a ball. If batters are hitting the balls, swinging and missing less, the location will be considered a strike. Even if the size of the strike zone changes, the boundaries enforced by the umpire will be defined by that simple bias.

If a player has trouble putting the bat on the ball, it is a ball. If he swings and hits it, it is a strike. The strike zone is defined by the rule book, largely refined by the players and enforced by the umpires.

3 Comments

Peter Jensen

12 years ago

Your graph would make much more sense if the X axis percentages were not the percentage of all pitches thrown, but the percentages of the pitches thrown to left-handed and right-handed batters for the pitch results for those batters.

The way it appears now is that the umps are not calling all pitches over 5 feet high balls. They actually are but since you are showing percentage of all pitches the x values for 5 feet are 39% and 61%.

Or better yet, make two graphs, one for left-handed hitters and one for right.

Mat Kovach

@Dave …

As you said, HS umps are slightly different. I thought mentioning that might confuse things a bit since I was focusing on MLB umpiring. Thanks for the input. I like the outside corner hanging from a string comment because a) not the first time I have heard that and b) it would make a great title for another article.

@Peter

I only found ~75,000 pitches, and there just were not a bunch of pitches above ~4.5 feet.

HT:TOTAL
3.8:1329
4.0:861
4.2:533
4.4:344
4.6:191
4.8:113
5.0:77
5.2:32
5.4:25
5.6:15
5.8:8
6.0+:9

So, there is not a large number of pitches, over the middle 12” of the plate, that are above about 4.5’. It is not a lack of the umps making calls.

Separating between right and left will probably be done when I expand the number of pitches.

Dave Estobar

Nice article. One point I can tell you from a high school umpire’s perspective: we are not “taught to stand behind the catcher”. We are taught to stand in “the slot”. The slot is the area on the inside corner between the catcher and the batter. This is mainly for safety reasons due to the number of foul balls that come straight back over the middle of the plate and this reduces the risk of concussions. But, this reinforces your point. The inside corner pitch is right in front of us. The outside corner is a learned call. I have a ball hanging from a string in my garage over the outside corner of a plate on the floor so I can set up in the slot and get a feel for where that outside corner is. Good article, thanks.
