MLB Umpires: 2016 Review, World Series Preview

John Hirschbeck is the crew chief in his final World Series as an umpire. (via Keith Allison)

As part of the MLB-watching general public, one fault I believe we have is forgetting how exceptionally talented the people who grace our screens on a daily basis are. We are quick to write off the hitter mired in a 1-for-27 slump, or the reliever who gives up runs in four straight appearances. “Why is this guy on the team?” we hear ourselves asking. We forget how many levels of baseball this player has worked his way through, and how many thousands of other players, from Little League all the way through Triple-A, he has beaten out just to make it onto your screen.

We notice these relative failures because we keep track of these players closely. Their stats can be found on any number of websites, broken down in almost any manner you’d ever like to see. We buy jerseys with their names on the back. Kids emulate them at the local park.

While the players are the people on the screen who are supposed to get all the attention, there are other people on that screen who also have risen through the ranks to make it to the highest possible level in their profession – the MLB level. These people are the umpires.

Umpires have a very, very difficult job. Even if we restrict the entirety of their jobs to only calling balls and strikes, the role is very challenging. Batters are different heights. Pitchers throw different types of pitches from all kinds of angles. Pitchers throw harder than ever. Catchers are angling to receive pitches in a manner that makes borderline pitches look more like strikes.

Home plate umpires have to make roughly 150 judgments per game in real time on pitches the general viewing audience gets to see painted on the screen overlaying a supposed strike zone grid. The mere fact that every strike zone grid I’ve seen on any broadcast is rectangular in shape tells me it does not represent a strike zone that any umpire in the league would call.

Much like the slumping players, we are quick to grow frustrated with umpires who appear to be failing. We all know the difference between a 1-1 pitch on the edge being given as a strike instead of a ball, putting the pitcher in the driver’s seat or letting the hitter sit on a fastball while ahead in the count.

While I think we all could learn to remember how difficult the jobs of these people on our screens are, there are a couple of differences when it comes to complaining about an umpire compared to a player. The first is that we believe umpires are supposed to be “invisible,” to do their jobs without drawing any attention. Some believe we could or even should utilize robots to replace their judgment. The second is that, generally speaking, we don’t follow umpires on a day-to-day basis. We don’t know all of their names, we don’t look at their stats, and we don’t collect their baseball cards. You probably don’t remember who was umpiring behind home plate at the last game you went to see, but you probably do remember who the starting pitcher was for your favorite team that day.

All of this said, since the introduction of PITCHf/x in stadiums around the league a decade ago, every pitch location as it crosses home plate is tracked, and every umpire ball or strike decision is recorded. I have been monitoring and measuring the MLB called strike zone for a number of years now, so I know on aggregate how the strike zone is called in the majors. What I can do then is investigate how each individual umpire calls the strike zone as compared to the aggregate zone for the league.

The method of measuring individual umpires I used is taken from a suggestion by Tom Tango on his website last year. The idea is that many called pitches in a game don’t tell us much about an umpire’s strike zone. Pitches taken in the heart of the plate and pitches in the dirt always are called strikes and balls, respectively, by all umpires around the league. Where things get more interesting is where home plate umpires call pitches in areas where there is no consensus. Basically, around the edges of the strike zone.

Since I am using the aggregate MLB strike zone as the standard for this metric, I calculated an MLB-wide called-strike percentage over the entire regular season for each square inch above the front plane of home plate for both left-handed hitters and right-handed hitters. Umpire calls are given positive value for a call if it agrees with the majority based on the pitch location. The magnitude of that value depends on how likely a pitch in that location is called a strike. If on average, a pitch in a particular location is called a strike 60 percent of the time, then it is of course called a ball the other 40 percent of the time. If an umpire calls a pitch in that location a strike, he is awarded a positive score based on agreeing with the way that pitch was usually called around the league in that season.

In the example above, an umpire would have (0.60 strike% – 0.40 ball%) * 0.40 ball% = 0.08 added to his expected call score. Had he called it a ball, he would have been docked (0.60 strike% – 0.40 ball%) * 0.60 strike% = 0.12 from his expected call score. This scheme weights calls in a reasonable manner based on how “easy” the call should have been, including attributing no value to calls made on pitches that are always called strikes or balls.
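The league-wide called-strike rates that feed this scoring could be built roughly as follows. This is a minimal sketch, not the code behind the article: the record layout (batter side, horizontal and vertical location in inches, and a strike flag) and the function name `strike_rate_grid` are assumptions made for illustration.

```python
from collections import defaultdict
import math


def strike_rate_grid(pitches):
    """Return {(stand, x_bin, z_bin): called-strike fraction} for 1-inch bins.

    `pitches` is an iterable of (stand, x_in, z_in, is_strike) tuples, where
    stand is "L" or "R", x_in and z_in are the pitch location in inches at the
    front plane of home plate, and is_strike is True for a called strike.
    This layout is hypothetical, not the PITCHf/x schema.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [called strikes, called pitches]
    for stand, x_in, z_in, is_strike in pitches:
        key = (stand, math.floor(x_in), math.floor(z_in))
        counts[key][0] += int(is_strike)
        counts[key][1] += 1
    return {key: strikes / total for key, (strikes, total) in counts.items()}


# Made-up called pitches: five to right-handed hitters in one bin, one to a lefty.
sample = [
    ("R", 9.2, 30.5, True),
    ("R", 9.7, 30.1, True),
    ("R", 9.4, 30.9, True),
    ("R", 9.1, 30.3, False),
    ("R", 9.8, 30.6, False),
    ("L", 9.5, 30.5, False),
]
grid = strike_rate_grid(sample)
```

With this made-up sample, the bin for right-handed hitters ends up with the same 60 percent called-strike rate used in the example above, while the same location for left-handed hitters is tracked separately.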

As Bryan Cole pointed out in the comments of this post, the formula works out to:

(2p – 1) * (c – p)

where p is the probability of the pitch being called a strike in the aggregate, and

c is the home plate umpire’s call (1 for strike, 0 for ball)
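As a quick check, here is that formula applied to the 60 percent example. The function name `call_score` is my own label for illustration; only the formula itself comes from the text.

```python
def call_score(p, called_strike):
    """Score one call as (2p - 1) * (c - p), with c = 1 for a strike, 0 for a ball."""
    c = 1.0 if called_strike else 0.0
    return (2.0 * p - 1.0) * (c - p)


agree = call_score(0.60, True)       # strike where strikes are the league majority
disagree = call_score(0.60, False)   # ball where strikes are the league majority
always = call_score(1.00, True)      # pitch that every umpire calls a strike
coin_flip = call_score(0.50, True)   # location with no league majority either way
print(round(agree, 2), round(disagree, 2), always, coin_flip)
```

The agreeing call is worth about +0.08 and the disagreeing call about -0.12, matching the worked example. A pitch that is always called the same way contributes nothing, and under this formula neither does a call at an exact 50/50 location.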

Using this method, I calculated separately the sum of each umpire’s expected score (agreeing calls) and his unexpected score (disagreeing calls). Once I had the expected and unexpected scores, I calculated a ratio of expected to unexpected scores and then converted each ratio to a “plus” stat, Expected+, by dividing by the league average ratio. The ratio puts umpires with different numbers of opportunities on the same scale, and then the “plus” stat puts scores on a scale where 100 is league average. Every point above or below 100 means the umpire’s calls behind the plate are one percent more or less expected than those of the average MLB umpire.
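The ratio and “plus” scaling might look like the sketch below. The per-call scores and the league ratio of 2.0 are made-up numbers, the function names are hypothetical, and the article does not spell out exactly how the league average ratio is computed, so that value is simply passed in here.

```python
def expected_ratio(scores):
    """Ratio of summed agreeing (positive) scores to summed disagreeing (negative) scores."""
    expected = sum(s for s in scores if s > 0)
    unexpected = -sum(s for s in scores if s < 0)
    # Assumes at least one disagreeing call; otherwise this divides by zero.
    return expected / unexpected


def plus_stat(ratio, league_ratio):
    """Scale a ratio so that the league average ratio maps to 100."""
    return 100.0 * ratio / league_ratio


# Hypothetical per-call scores for one umpire: four agreeing calls, two disagreeing.
ump_scores = [0.08, 0.08, 0.12, 0.24, -0.12, -0.08]
ratio = expected_ratio(ump_scores)                   # 0.52 / 0.20
expected_plus = plus_stat(ratio, league_ratio=2.0)   # league ratio is an assumed value
```

In this toy case the umpire’s ratio of 2.6 against an assumed league ratio of 2.0 gives an Expected+ of about 130, or roughly 30 percent more expected than average.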

It is important to understand what this metric is measuring. Note that I use the word expected. It is ascribing positive value to calls that agree with the league majority for a given pitch location and negative value otherwise. This does not necessarily mean the call is correct, based on the rule book strike zone, which means this does not necessarily identify the best home plate umpire. An umpire may have a unique strike zone he calls quite consistently, and there is certainly an argument to be made that being consistent in any one zone is fine. However, this metric rates umpires on how well their calls agree with what is being called around the league.

In my opinion, given that all home plate umpires are evaluated and provided with feedback after each game, over the course of an entire season the aggregate of all called pitches is a good proxy for the strike zone the league wishes to have called. I also believe there is value in understanding how much an umpire tends to differ from the league average as far as the calls he is making, in terms of how expected his calls are and how often his unexpected calls are strikes on pitches that are more commonly balls and vice versa.

I calculated these numbers for all umpires in both the 2015 and 2016 seasons. Taking the seventy umpires who worked the most behind home plate over both seasons, there was a correlation of 0.66 in the Expected+ scores between the seasons. This suggests calling balls and strikes per the typically called league zone has a sizable degree of skill that is repeatable between seasons for umpires.
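For readers who want to run this kind of season-to-season check themselves, here is a small sketch. It assumes the figure reported above is an ordinary Pearson correlation, which the article does not state, and the five pairs of scores are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical Expected+ scores for the same five umpires in two seasons.
scores_2015 = [120, 105, 98, 90, 84]
scores_2016 = [125, 100, 101, 88, 86]
r = pearson(scores_2015, scores_2016)
```

A value near 1 means umpires kept roughly the same ordering from one season to the next, and a value near 0 would mean the scores were mostly noise.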

Here are the results of the 2016 season for Expected+ for all umpires.

Umpire Name 2016 Expected+
Jim Joyce 139
Mark Ripperger 135
Pat Hoberg 132
James Hoye 128
Chad Fairchild 127
Toby Basner 125
Mark Carlson 125
Ben May 124
Alan Porter 123
Greg Gibson 122
Bill Welke 121
Roberto Ortiz 118
Quinn Wolcott 116
D.J. Reyburn 115
Eric Cooper 114
Todd Tichenor 112
Adam Hamari 111
Mike Muchlinski 111
Sam Holbrook 111
Phil Cuzzi 110
Gabe Morales 110
Stu Scheurwater 110
David Rackley 110
Tony Randazzo 109
Marvin Hudson 108
Brian Knight 108
Chris Guccione 106
Jerry Meals 106
Jim Reynolds 106
Manny Gonzalez 105
Will Little 105
Mark Wegner 105
Alfonso Marquez 105
Cory Blaser 105
Jeff Kellogg 104
Sean Barber 104
John Tumpane 104
Brian Gorman 104
Paul Emmel 103
Mike Estabrook 102
Fieldin Culbreth 100
Brian O’Nora 98
Bill Miller 98
Ramon De Jesus 97
Chris Conroy 97
Doug Eddings 97
Joe West 97
Mike DiMuro 97
Ryan Blakney 97
Tripp Gibson 96
Tim Timmons 96
Scott Barry 95
Paul Nauert 95
Mike Everitt 94
Dan Bellino 94
Dan Iassogna 94
Jim Wolf 93
Chris Segal 93
Marty Foster 93
Dana DeMuth 92
Ted Barrett 92
Vic Carapazza 92
Gerry Davis 91
Chad Whitson 91
Gary Cederstrom 90
Adrian Johnson 90
Clint Fagan 89
Laz Diaz 89
Jeff Nelson 89
Jerry Layne 88
Mike Winters 88
Carlos Torres 88
Larry Vanover 87
CB Bucknor 86
Rob Drake 85
Jordan Baker 84
Lance Barksdale 84
Tom Woodring 84
Nic Lentz 83
Andy Fletcher 83
Ron Kulpa 83
Hunter Wendelstedt 82
Lance Barrett 82
Angel Hernandez 82
Kerwin Danley 82
Tom Hallion 81
Ed Hickox 81
John Hirschbeck 78
Bob Davidson 77
Dale Scott 72

Aside from this metric, I also drilled down into the unexpected calls made by each umpire to see the ratio of scores from pitches they called strikes when the league typically called the pitch a ball, and called balls when the league majority was a strike. This acts as somewhat of a proxy for strike zone size, as home plate umpires who call more unexpected strikes relative to balls than normal would tend to have what we perceive as a larger strike zone, and a smaller-than-average ratio would tend to indicate a smaller strike zone.
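A rough sketch of that ratio is shown here, reusing the per-call scoring formula from earlier. The function name and the input layout, a list of (league strike probability, umpire called strike) pairs, are assumptions for illustration.

```python
def unexpected_sb_ratio(calls):
    """Unexpected-strike score divided by unexpected-ball score.

    `calls` is an iterable of (p, called_strike) pairs, where p is the league
    called-strike rate at the pitch location (hypothetical input layout).
    """
    strike_score = 0.0
    ball_score = 0.0
    for p, called_strike in calls:
        c = 1.0 if called_strike else 0.0
        score = (2.0 * p - 1.0) * (c - p)
        if score < 0:  # only calls that go against the league majority count
            if called_strike:
                strike_score += -score
            else:
                ball_score += -score
    # Assumes at least one unexpected ball; otherwise this divides by zero.
    return strike_score / ball_score


calls = [
    (0.40, True),   # strike on a pitch usually called a ball
    (0.30, True),   # strike on a pitch usually called a ball
    (0.60, False),  # ball on a pitch usually called a strike
    (0.90, True),   # agrees with the league majority, so it is ignored
]
ratio = unexpected_sb_ratio(calls)
```

Here the unexpected strikes sum to 0.40 and the unexpected balls to 0.12, so the ratio is a little over 3, which would point toward a larger-than-typical zone.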

Once again, I adjusted the ratios to a “plus” stat. Here are the unexpected strike-to-ball ratio scores, or Unexpected S:B+, for the 2016 season:

Umpire Name 2016 Unexpected S:B+
Bill Miller 231
Jim Wolf 214
Bob Davidson 184
Brian Gorman 175
Roberto Ortiz 172
Doug Eddings 166
Stu Scheurwater 166
Lance Barrett 164
Mike Estabrook 162
Chris Segal 157
Eric Cooper 148
Hunter Wendelstedt 146
CB Bucknor 137
Will Little 137
Ben May 135
Kerwin Danley 133
Mike Everitt 130
Tripp Gibson 128
Andy Fletcher 126
Ed Hickox 125
Ted Barrett 123
Nic Lentz 120
Ron Kulpa 119
Jeff Nelson 116
Fieldin Culbreth 115
Mike DiMuro 115
Dan Iassogna 114
Jerry Layne 113
Jeff Kellogg 111
John Hirschbeck 111
Quinn Wolcott 109
Cory Blaser 109
Tim Timmons 108
Phil Cuzzi 106
Marty Foster 106
Vic Carapazza 105
Toby Basner 105
Marvin Hudson 102
Jim Reynolds 101
Brian Knight  98
Carlos Torres  98
Adam Hamari  98
Adrian Johnson  98
Mark Ripperger  97
Tony Randazzo  95
Angel Hernandez  94
Brian O’Nora  92
Mike Winters  90
Rob Drake  86
Lance Barksdale  86
Paul Emmel  86
Laz Diaz  86
Chris Guccione  85
Gabe Morales  84
David Rackley  83
Gary Cederstrom  83
Jim Joyce  81
Dan Bellino  81
Jordan Baker  81
Chad Fairchild  81
Chris Conroy  80
John Tumpane  80
Dana DeMuth  78
Mike Muchlinski  75
Sean Barber  75
Dale Scott  72
Sam Holbrook  70
James Hoye  70
Ramon De Jesus  69
Clint Fagan  68
Alan Porter  67
Chad Whitson  66
Mark Wegner  66
Pat Hoberg  64
Ryan Blakney  64
Bill Welke  63
Jerry Meals  63
D.J. Reyburn  61
Todd Tichenor  61
Joe West  60
Paul Nauert  59
Tom Hallion  58
Greg Gibson  57
Gerry Davis  56
Scott Barry  54
Manny Gonzalez  54
Larry Vanover  53
Alfonso Marquez  52
Mark Carlson  49
Tom Woodring  29

Note that Tom Woodring did not work many games behind home plate, so his extremely low score here reflects a small sample size. Do not take this to mean Bill Miller’s strike zone is 131 percent larger than the league average! Obviously, this could not be the case. This means that on calls Miller makes that are counter to the league norm, he is much more likely to be calling strikes when most umpires call balls than balls when most umpires call strikes.

The correlation between 2015 and 2016 scores for the busiest seventy umpires was 0.69, meaning once again this is an aspect of game calling that umpires do seem to carry significantly from season-to-season.

There was a correlation of -0.30 between Expected+ and Unexpected S:B+ in 2016, meaning there was value in having a slightly smaller zone this year in trying to conform to the league majority. This correlation was only -0.11 in 2015. Umpires tend to make more unexpected calls by calling pitches that are typically called balls as strikes than the other way around, so umpires that are less susceptible to this pattern tend to have marginally higher Expected+ scores under this system.

The most interesting home plate umpire to me after undertaking this exercise is Mark Ripperger. His Expected+ score in 2015 of 156 was by far the highest of any umpire in either of the last two seasons, with a difference between his score and the second-place score greater than the difference between second place and thirty-third place that season. Ripperger followed that up with the second-highest Expected+ score in 2016. He seems to have an excellent grasp on the strike zone being called in the league right now.

Another fun exercise I tried was looking at the most expected called game of the 2016 regular season. The game that had the best pitch calling with respect to matching the league aggregate was almost a perfect game from Brian Knight! There was only one “unexpected” pitch call, which was a called ball on a pitch location that was called a strike 51 percent of the time over the course of the season.

2016 World Series

Here is a game-by-game view of the home plate umpires assigned for the World Series games this season based on the perspective offered from these metrics. You’ll notice these umpires do not all call the most expected strike zones out of the set of umpires working in 2016. As I mentioned earlier, umpires’ jobs are very difficult, and in this article we have only been examining them through the lens of pitch calling. They are also responsible for game management, calling plays at bases, fair/foul judgments, and much more. I would expect the league to base the selection process for umpires on their entire body of work and to include seniority and undoubtedly other factors when considering postseason assignments.

(Editor’s note: This article was written before the World Series began. We thought readers would be interested in knowing how all the scheduled home plate umpires rated.)

Game One: Larry Vanover
Expected+: 73rd (out of 90)
Unexpected S:B+: 87th (out of 90)
Vanover is at the extreme end among MLB umpires with respect to calling pitches typically called strikes as balls. His small zone tends to favor the hitter, and thus may be a challenge to navigate for Corey Kluber and Jon Lester. According to Baseball Prospectus, the strikeout-to-walk ratio in games he worked this season was 79th out of 90. Aside from a small zone, his calls did not line up as well with the expected MLB zone as most home plate umpires this season. Vanover worked Game Three of the NLDS between the Cubs and Giants.

Game Two: Chris Guccione
Expected+: 27th (out of 90)
Unexpected S:B+: 53rd (out of 90)
Guccione has a slightly smaller strike zone than average, as well, but he called a somewhat better-than-average expected zone. His results from 2015 were almost identical, so his calling pattern appears to be consistent. Guccione worked Game Two of the NLDS between the Dodgers and the Nationals.

Game Three: John Hirschbeck
Expected+: 88th (out of 90)
Unexpected S:B+: 30th (out of 90)
Set to retire after the season, Hirschbeck is the crew chief for the World Series. He seems to call one of the more unique zones, as his calls rated as one of the most unexpected in the game this season. He still calls one of the larger zones in the game, although it was less extreme this season than in 2015. Baseball Prospectus has his strikeout-to-walk ratio as 21st highest out of 90. This is his fifth World Series assignment and a nice way to complete his final season.

Game Four: Marvin Hudson
Expected+: 25th (out of 90)
Unexpected S:B+: 38th (out of 90)
Hudson rated as higher than average as far as his pitch-calling matched with the typical league zone this season. His metrics are also very similar to the previous season, so he seems to have settled into a relatively consistent pattern of calling pitches. Hudson was behind the plate for Game Four of the NLDS when the Cubs knocked out the Giants, so John Lackey already has pitched to his zone in this postseason.

Game Five: Tony Randazzo
Expected+: 24th (out of 90)
Unexpected S:B+: 45th (out of 90)
Randazzo also called a zone that matched the aggregate zone quite closely. His unexpected calls tend to be extra strikes and extra balls near the league average ratio. Randazzo was the home plate umpire for Game Three of the ALDS between Cleveland and Boston, the Red Sox’s final game of the 2016 season, when Josh Tomlin made the start.

Game Six: Joe West
Expected+: 47th (out of 90)
Unexpected S:B+: 80th (out of 90)
West called more pitches unexpectedly as balls this season than most umpires, meaning his strike zone was smaller than most. Baseball Prospectus noted his strikeout-to-walk ratio as 71st out of 90. This is his sixth World Series assignment, joining John Hirschbeck as the most experienced World Series umpires working this year.

Game Seven: Sam Holbrook
Expected+: 19th (out of 90)
Unexpected S:B+: 67th (out of 90)
Holbrook starts the World Series as the replay umpire before moving onto the field for Game Three. His pitch calling rated well as far as calling to the typical MLB zone this season. Holbrook worked Game Three of the ALDS when the Blue Jays finished the sweep of the Rangers.

Enjoy the World Series everyone!

UPDATE: Here are the results for the first two games.

Game One:
Expected+: Top 68% of games from 2016
Unexpected S:B+: Top 58% of games from 2016

Game Two:
Expected+: Top 9% of games from 2016
Unexpected S:B+: Top 4% of games from 2016

References and Resources

All data from Fangraphs unless otherwise noted.


Jon Roegele is a baseball analyst and writer for The Hardball Times. He was nominated for a SABR Analytics Conference Research Award in 2014 and 2015. Follow him on Twitter @MLBPlayerAnalys.
15 Comments
Alex
7 years ago

Thanks Jon, really interesting stuff.

Is there anything to the notion that umpires can be (subconsciously?) biased towards the home team?

So could you split the stats between calling for home team and away team to see which umpire sees the biggest change?

Cheers.

Jim Anderson
7 years ago
Reply to  Alex

The home plate ump tonight was clearly trying to cheat the Indians, he made 17 wrong calls on balls and strikes against the Indians. It was pathetic. I live in Chicago, at least I am honest.

Jarod Garza
7 years ago
Reply to  Jim Anderson

Yes dude, he was trying to “cheat” the Indians…in a World Series game. Because that’s what these guys do…they go out there with complete disregard for their jobs in the grandest setting in baseball and intentionally “cheat” teams. In a given game they see close to 200 of the fastest and dirtiest moving pitches from best pitchers in the game and have to make judgement calls where a half inch is the difference between a ball and a strike. Had an off night, maybe, but he certainly was not trying to cheat anyone.

John Riley
7 years ago
Reply to  Jarod Garza

Jarod, you should stick to coaching t-ball as you obviously are not a baseball guy, dude.

Todd Huff
7 years ago
Reply to  Jarod Garza

1/2 inch off the plate? Wow, you must have pretty bad eyesight. Those pitches clearly all went against the Indians and was so one sided it was pathetic.

Dennis Bedard
7 years ago

The TV strike zone is a one dimensional picture frame that I believe is super imposed in front of the plate. However, the strike zone properly defined is three dimensional and any ball that enters the “box” is a strike. Thus, a breaking ball could look to be outside as it traverses the strike zone but then break into the zone behind the imaginary box. I always wondered if a pitcher could develop a pitch that was thrown high in the air and then dropped down into the strike zone at a perfect vertical angle. It would enter the strike zone at the top and then hit the center of home plate. This pitch would be impossible to hit. I remember Steven Talbot, a Yankee pitcher in the Horace Clarke era, throwing a blooper pitch that got a lot of laughs but obviously the pitch (and him) never made it.

Dubslow
7 years ago

Do I interpret the Game Results right by saying that Game 1 was average or a touch better than average on both metrics, while Game 2 was a pretty bad outlier having a consistently smaller than typical zone?

Barbie
7 years ago
Reply to  Dubslow

The Sutton & Barto book is indeed the earliest mention I have found of the hashing-trick. Well spotted! Don’t we need to choose the size of the hashing range (i.e. bit mask) as an a priori model complexity parameter?

Gary Growe
7 years ago

Aren’t we talking about Steve Hamilton’s “Folly Floater” in the exchange re the 3-D strike zone?

Dennis Bedard
7 years ago

Ah yes! How could I confuse Hamilton for Talbot. It was Fred and not Steve Talbot. But Steve Hamilton was definitely the player who threw what purported to be a vertical strike.

Guy
7 years ago

Jon:
Great work, as usual. Is it possible to give us a sense of how large the difference is between high and low Unexpected S:B+ umpires, in terms of their average impact on a game? For example, could you tell us how many more/fewer strikes are called per game, and how R/G compare, between the top 30 and bottom 30 umps (or whatever grouping you think is appropriate)?

Kincaid
7 years ago

This is really interesting. Good work.

One thing to note is that the negative correlation between Expected+ and Unexpected S:B+ is probably expected and not necessarily a sign that a smaller zone correlates with conformity to the league-wide zone. That’s because Unexpected S:B+ uses a ratio with called balls as the denominator, which means it won’t give symmetrical scores for high-strike-call umps and high-ball-call umps.

For example, say the average ratio is 1:1 and you have two umpires, one of whom calls 100 strikes and 50 balls, and the other 50 strikes and 100 balls. The former will have a ratio of 2:1 for an Unexpected S:B+ of 200, while the latter will have a ratio of 1:2 for an Unexpected S:B+ of 50, which is closer to the average Unexpected S:B+ score than the first ump. Because both are equally extreme in their ball-strike tendencies but the high-strike-call ump has a more extreme Unexpected S:B+, you’ll probably get a negative correlation even in the absence of any relationship between strike zone size and Expected+.
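The arithmetic in this example can be checked with a few lines. The function name `sb_plus` is hypothetical, and the log comparison at the end is an added illustration of the asymmetry, not something proposed in the comment.

```python
import math


def sb_plus(strikes, balls, league_ratio=1.0):
    """Plus-scaled strike-to-ball ratio, with the league ratio mapped to 100."""
    return 100.0 * (strikes / balls) / league_ratio


high_strike = sb_plus(100, 50)   # 2:1 ratio against a 1:1 league average
high_ball = sb_plus(50, 100)     # 1:2 ratio against a 1:1 league average

gap_above = high_strike - 100.0  # distance from average for the high-strike ump
gap_below = 100.0 - high_ball    # distance from average for the high-ball ump

# On a log scale the two umpires are equally far from average.
log_above = math.log(high_strike / 100.0)
log_below = math.log(high_ball / 100.0)
```

The two scores come out to 200 and 50, so the high-strike umpire sits 100 points from average while the equally extreme high-ball umpire sits only 50 points away. On a log scale both are the same distance from average.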