Admonished for reasonable mistakes and seldom credited for accuracy, umpires unquestionably have one of the hardest jobs in baseball. They must maintain a high level of concentration for *every single play*.

But perhaps the most difficult assignment for an umpire is working behind the plate. With modern technology, home plate umpires are under an absurd level of scrutiny. Borderline call? Replay it with K-zone or Pitchtrax. If a mistake is egregious enough, it will be shown on ESPN or MLB Network the next day. Umpires are also evaluated with Questec to standardize the strike zone.

But this is not to trivialize putting umpires under the microscope. Indeed, home plate umpires have a significant effect on run scoring. How the strike zone is interpreted can affect not only ball-strike counts, but pitchers’ approaches and batters’ approaches. Small strike zone? A pitcher will have to modify his location so that he can throw strikes, an advantage for the hitter.

Based on observational evidence, umpires seem to be pretty consistent across the league. But for a more exact measure, it is possible to calculate the size of an umpire’s strike zone. To do this, I modified the method I used to find swing area. This time, I found the area in which umpires call a strike at least 50 percent of the time. I arbitrarily chose the 50 percent mark, but it seemed to be reasonable from a common sense standpoint.
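
To make the idea concrete, here is a minimal Python sketch of one way to estimate such an area. It is not the method behind the numbers in this article: it simply bins called pitches into square cells and adds up the cells where the called-strike rate meets the threshold. The function name `zone_area`, the `(px, pz, is_strike)` tuple layout, the 0.25-foot cell size and the synthetic data are all assumptions made for illustration.

```python
from collections import defaultdict
import math


def zone_area(pitches, threshold=0.5, cell=0.25, min_calls=1):
    """Estimate the area (sq ft) where the called-strike rate >= threshold.

    pitches: iterable of (px, pz, is_strike) for called pitches only,
             with px and pz in feet (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # cell index -> [strikes, calls]
    for px, pz, is_strike in pitches:
        key = (math.floor(px / cell), math.floor(pz / cell))
        counts[key][0] += 1 if is_strike else 0
        counts[key][1] += 1
    qualifying = sum(
        1
        for strikes, calls in counts.values()
        if calls >= min_calls and strikes / calls >= threshold
    )
    return qualifying * cell * cell


# Synthetic data: one called pitch at the center of every 3-inch cell,
# called a strike only inside a 2 ft x 2 ft box.
demo = []
for i in range(16):      # px centers from -1.875 to 1.875
    for j in range(20):  # pz centers from 0.125 to 4.875
        px = -2 + 0.125 + 0.25 * i
        pz = 0.125 + 0.25 * j
        demo.append((px, pz, abs(px) < 1 and 1.5 < pz < 3.5))

print(zone_area(demo))  # 4.0
```

Raising `threshold` to, say, 0.75 would give the area of a stricter contour. With real data, cells containing only a few calls would be noisy, which is what the `min_calls` parameter is meant to guard against.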

I eliminated all pitches on which the umpire did not make a decision, that is, pitches at which the batter swung. I was ambivalent about including pitches thrown during intentional walks, but I ended up keeping them; they comprise only a small number of pitches, anyway. I also looked only at umpires who saw at least 3,000 pitches in 2011, to make sure outliers weren’t having as large an influence as they did when I looked at swing area.
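
A filter along these lines could be sketched as follows. The dictionary keys and the pitch-description labels are hypothetical stand-ins rather than the actual PITCHf/x field values, and counting all pitches (rather than only called ones) toward the 3,000-pitch minimum is an assumption.

```python
from collections import Counter

# Hypothetical description labels for pitches the umpire actually called.
CALLED = {"Ball", "Called Strike"}


def called_pitches(pitches):
    """Keep only pitches where the umpire made a ball/strike decision."""
    return [p for p in pitches if p["description"] in CALLED]


def qualified_umpires(pitches, min_pitches=3000):
    """Return the umpires who saw at least min_pitches pitches.

    Assumption: every pitch counts toward the minimum, swings included.
    """
    seen = Counter(p["umpire"] for p in pitches)
    return {ump for ump, n in seen.items() if n >= min_pitches}
```

The two steps would then be combined by keeping only the called pitches that belong to a qualified umpire.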

The umpires with the five smallest strike zones are:

| Umpire | Strike zone area (sq ft) |
| --- | --- |
| Tim Tschida | 2.85 |
| Tim McClelland | 2.99 |
| Paul Schrieber | 3.01 |
| Ed Hickox | 3.07 |
| Chad Fairchild | 3.07 |

The umpires with the five largest strike zones are:

| Umpire | Strike zone area (sq ft) |
| --- | --- |
| Phil Cuzzi | 3.60 |
| Ron Kulpa | 3.60 |
| Bill Miller | 3.63 |
| Ted Barrett | 3.63 |
| Doug Eddings | 3.65 |

The mean strike zone area is 3.32 square feet. The overall distribution looks like this:

The standard deviation is 0.16 square feet, but I’m not sure how well it describes the data in this case, so take that figure with a grain of salt. By that measure, Tim Tschida is nearly three standard deviations below the mean, the largest distance from the center of any umpire. I can’t say I have much of a memory of Tschida’s umpiring, so it would be interesting for people to report their observations of it in the comments.
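
The arithmetic behind that statement can be checked with a few lines of Python, using only the mean, the standard deviation and the areas reported in this article; the helper name `z_score` is just for illustration.

```python
MEAN_AREA = 3.32  # mean strike zone area, sq ft (from the article)
SD_AREA = 0.16    # standard deviation, sq ft (from the article)


def z_score(area):
    """Distance from the mean, in standard deviations."""
    return (area - MEAN_AREA) / SD_AREA


for name, area in {"Tim Tschida": 2.85, "Doug Eddings": 3.65}.items():
    print(f"{name}: {z_score(area):+.2f} SD")
# Tim Tschida: -2.94 SD
# Doug Eddings: +2.06 SD
```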

Because I limited the sample to umpires who had seen 3000 pitches in 2011, we only have 74 umpires. This is why the histogram looks a little ragged, even though strike zone area is probably normally distributed.

But does strike zone area actually *mean anything*? To check, I calculated the FIP for each umpire. The relationship was highly significant and in the expected direction: a larger strike zone means a lower FIP.

The coefficient on strike zone area was -0.58, meaning that for every one-square-foot increase in strike zone area, we can expect FIP to decrease by about 0.58. The relationship had limited explanatory value, though, with an R-squared of 0.13. This means that strike zone area explains 13 percent of the variance in umpire FIP. You can see this relationship below:
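
As a rough illustration of what a slope and an R-squared of this kind represent, here is a minimal least-squares sketch in Python. The `ols` helper is hypothetical and is not the code that produced the regression reported here; the last lines simply apply the reported slope to the gap between the largest and smallest zones listed earlier.

```python
def ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                       # slope
    a = my - b * mx                     # intercept
    r_squared = (sxy * sxy) / (sxx * syy)
    return a, b, r_squared


# Reported slope applied to the largest-vs-smallest zone gap (3.65 vs. 2.85 sq ft).
slope = -0.58
gap = 3.65 - 2.85
print(round(slope * gap, 2))  # -0.46
```

With an R-squared of only 0.13, that figure of roughly half a run of FIP is best read as an average tendency across umpires, not as a prediction for any single game.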

However, we may be underestimating the value of the metric. FIP includes home runs as a major component. And while strike zone area probably does have a relationship with home run rate, ballpark effects and randomness probably play a much larger role. A metric that ignores home runs is kwERA, an ERA estimator based on only strikeouts and walks. You can read more about the metric here.
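
For reference, kwERA is commonly written as a constant minus a multiple of the strikeout-minus-walk rate per plate appearance. The sketch below uses the frequently cited form with a constant of 5.40 and a multiplier of 12; treat both values, along with the function name and the sample stat line, as assumptions, since the exact version used for this article is not stated.

```python
def kw_era(strikeouts, walks, plate_appearances, constant=5.40, scale=12.0):
    """kwERA in its commonly cited form: constant - scale * (K - BB) / PA."""
    return constant - scale * (strikeouts - walks) / plate_appearances


# Made-up stat line: 200 strikeouts, 50 walks, 800 plate appearances.
print(round(kw_era(200, 50, 800), 2))  # 3.15
```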

Strike zone area has a much stronger relationship with kwERA, yielding an R-squared of 0.39. You can see this relationship below:

I was also interested in the relationship between strike zone area and swing rate. My thinking was that if an umpire had a smaller strike zone, batters would have better pitches to hit and swing more. In other words, I guessed that strike zone area would have an inverse relationship with swing rate.

It turns out the opposite is true: strike zone area has a positive relationship with swing rate, significant at the 95 percent level. Although this seems to controvert common sense (or at least my intuition), upon reflection it makes sense. In pitchers’ counts, batters swing more than average because they are trying to protect the plate.

So if a larger strike zone means more pitchers’ counts, then it might also mean a higher swing rate. I should also note that there is a very tight distribution of swing rates among umpires, ranging from 44 percent to 47 percent.

**Limitations**

There are many more factors at play here than just umpires. I have not adjusted the metric for batter identity, pitcher identity, league or ballpark. The largest variable that I have not accounted for is batter handedness; because left-handed and right-handed batters have different called strike zones, this probably decreases the accuracy of the metric.

However, if we assume that each home plate umpire had a similar distribution of batter handedness, then the imprecision is distributed in a way that it doesn’t make much difference overall.

**Finishing thoughts**

While the data can be interpreted in many ways, it seems to me that umpires are pretty consistent in terms of their strike zone sizes. However, the differences that do exist have a significant effect on run scoring, which we can see in the relationship between strike zone area and kwERA. Strike zone area should complement the research that is already out there, such as Brian Mills’s umpire call database.

**References & Resources**

PITCHf/x data via MLBAM through Darrel Zimmerman’s pbp2 database and scripts by Joseph Adler/Mike Fast/Darrel Zimmerman

Millsy said...

Regarding LHB and RHB, the strike zones on the whole are quite similar from what I’ve seen. The big difference is that it is a mirror image (the zone tilts slightly downward toward the outside corner) and shifted outside for LHB.

One thing that I’ve noticed is that the younger guys tend to have a bit larger strike zone, while the older guys kind of do their own thing (this is just observational). My guess is that the younger guys came through the system with significant technological coaching and adhere closer to the rulebook zone than the older guys.

Good work, and thanks for the link back

Millsy said...

Any chance you are going to share the contour area calculations? I’ve got a method myself that I found, but I’m always curious about better ways to do so.

MikeEl said...

Very nice work. Any chance you can break the data up by team? I’d be interested to see if there is much variance in the strike zones some teams receive over others.

Millsy said...

MikeEI,

I think there may be issues with breaking things up by team, since the size of the zone varies so much by count and pitch type (among a plethora of other variables). It might seem that the Yankees get a bigger zone, but this could just mean they’re in more 3-0 counts, where the odds of a strike call given location increase a LOT.

Controlling for these factors (say, in a regression) makes it much more difficult to directly compare the area of the zones. It’s possible, but you do severely reduce your sample size when you break it down into Yankees only vs. Red Sox only, in 0-0 Counts only, and fastballs only.

But as I remember, Josh does do some Bayesian mean regression in his work so I agree this would be interesting.

Josh Weinstock said...

Millsy,

I’m actually going to have the full code for this later on my blog.

Millsy said...

Cool. I suspect we’re using nearly the exact same code, but it’s always nice to see it from another person’s perspective.

Josh Weinstock said...

The R code to conduct this analysis can be found here: http://pitchrx.blogspot.com/2012/01/calculate-umpire-strikezone-sizes.html

Detroit Michael said...

As the first comment hints, I think the Questec system was mothballed a few years ago. It became redundant once the Pitch F/X technology was installed. Umpires are evaluated using Pitch F/X data.

SCIENCE! said...

Great info. Could this be spun into fantasy data… look at teams that frequently get home umpires with larger zones, maybe they have an upper hand. Or if you have to pick up a spot starter, grab the one who has a wider zone for that given day. Thoughts?

Millsy said...

SCIENCE!

Who is behind the plate in the first game of the series is not released, but you can usually impute after that.

To the chagrin of my fantasy buddies, I will be trying to do just that this year. The advantages, I think, would be pretty marginal (especially in a shallow league without much waiver wire, or in a weekly lineup league).

Ciprian said...

Questec? That’s so 2002

James M. said...

How much different would your results be if you chose your cutoff at, say, 75% instead of 50%? Everybody’s zone would be smaller, of course. But I’m guessing it won’t make much of a difference for most umpires. Those where it does matter have a consistency problem which could bias your results.

Also, does your data enable you to distinguish between high- and low-strike umpires?

Josh Weinstock said...

James M.,

I would guess that nearly all umpires would have a very similar 75 percent cutoff, so we wouldn’t learn very much. The 50 percent cutoff helps us best distinguish between umpires.

And theoretically I could use this to find out who are high and low strike umpires, but I’m not entirely sure what that would look like or if such a thing exists.

Millsy said...

Josh,

I think you could use the method to draw a line at the “midpoint” of the rule book zone and calculate the area above and below and compare the two at whatever contour you’d like. You can do this for inside/outside, too, I would imagine.

I think James M.’s point is about the consistency problem, and this was, I think, what I asked about regarding the “dropoff” rate a while back with the swing areas.

Theoretically, one umpire could go from calling 100% strikes to 0% strikes very quickly. If quick enough, his 50% contour is essentially at the point where this happens.

On the other hand, you could have an umpire that has a tiny 95% contour, but a strike zone that dissipates slowly such that the 50% contour is humongous.

In this respect, it is possible to have misleading comparisons between umpires based just on the area of a single contour. Possibly reporting the areas of multiple contours (say, 10%, 25%, 50%, 75%, and 90%) in some tabular form would be interesting.

Jim Detry said...

I wonder if advance knowledge of the ump behind the plate would be useful in betting the over/under in Las Vegas.

James M. said...

Yes, that was my point about consistency. I don’t really care about the size of the umpire’s zone that much as long as it isn’t very small or very large. What I do care about is consistency. If an umpire always calls the high strike the pitcher and batter can adjust to that. But if he calls it 50% of the time anything you do is likely to be wrong.

Josh Weinstock said...

Ahh, I understand the question now. I agree that looking at the rate of change in contours would be interesting. Theoretically, a steeper decline would be better than a gradual one, I would think.

Lotto said...

I started keeping some simple notes on a recipe card with my opinions of whether or not certain umpires were pitcher friendly or hitter friendly. I do this for fantasy purposes, as I commonly pick up starting pitchers when they’re facing weak hitting teams, and with pitcher friendly umpires behind the plate being a nice incentive as well. At the top of my list for pitcher friendly is Doug Eddings, supported by your research. At the bottom, the most hitter friendly were McClelland and Schreiber, also supported by your research. I’ll now add your in depth analysis of the others to my recipe card. Thanks!