That was a strike?
by Josh Kalk
January 20, 2009
|Angel Hernandez getting ready to work a game behind the plate. (Icon/SMI)|
The strike zone is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the knee cap. The strike zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball.
In theory it is a simple rule. In practice it is much, much harder. Umpires get into more arguments over the strike zone than over any other call, by a huge margin. The sheer volume of calls an umpire has to make behind the plate means that, over the course of a game, a batter or a pitcher is more than likely going to see some pitch differently than the umpire does. This makes calling balls and strikes the most important job of any umpire.
Because of this, in recent years automated systems have been put in place to evaluate how umpires are doing behind the plate. In 2001, the QuesTec camera system was installed in 11 stadiums to track umpire calls. In 2008, a PITCHf/x camera system was installed in every major league park and gathered data on almost every pitch thrown (around 700,000 pitches). Major League Baseball Advanced Media (MLBAM) uses these data for its Gameday feature, and they are accurate to about half an inch near home plate. The data are posted online for anyone who wants to study them.
It didn't take long for people to start using these and other data to study umpires and how good a job they really do calling balls and strikes. Here at THT, John Walsh helped pioneer this research by defining the actual strike zone for left- and right-handed batters, and many others have followed suit with excellent studies of their own. Another study, by Hamermesh et al., looked at the effect of the pitcher's and umpire's race on the percentage of called strikes and found that umpires were indeed slightly racially biased.
That study prompted a lot of reaction and further examination, the best of which came from Phil Birnbaum on his blog. He noted that because the number of African American and Latino umpires was small, most or all of the significance came from two umpires, Laz Diaz and Angel Hernandez. This in turn sparked a debate at Tom Tango's site, where everyone seemed to agree that Hernandez was a bad umpire.
In fact, there's even an anti-fan page devoted to Hernandez, and he has certainly seen his share of controversy in his time as an umpire, much of it revolving around his high strike percentage.
So if everyone seems to think that Hernandez is a poor umpire, and we now have data on almost every pitch he has called, we should be able to statistically verify or refute that belief.
Welcome to the world of PITCHf/x
The PITCHf/x camera system takes about 25 pictures of the ball in flight between the mound and home plate. Sportvision then runs a best-fit algorithm and reports the flight of the ball as a parabola in each of the three dimensions. From this path, various quantities can be calculated and studied, including two variables that report the position of the ball when it crosses the front of home plate. These variables, reported in the data as x (horizontal) and z (vertical), can then be plotted along with the umpire's call on each pitch. Here is an example of straight-out-of-the-box data for Angel Hernandez, shown from his perspective.
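As a rough illustration of where x and z at the front of the plate come from, here is a minimal sketch that evaluates a constant-acceleration (parabolic) fit at a chosen y. It assumes the fit is described by nine parameters in the style of the PITCHf/x data: an initial position (x0, y0, z0), velocity (vx0, vy0, vz0) and acceleration (ax, ay, az), with y decreasing toward the plate. The function name and the FRONT_OF_PLATE value (which assumes y = 0 at the back tip of home plate) are illustrative assumptions, not part of any official tool.

```python
import math

# Assumed convention: y = 0 at the back tip of home plate, so the front edge
# of the plate sits 17 inches (17/12 ft) toward the pitcher.
FRONT_OF_PLATE = 17 / 12


def position_at_y(x0, y0, z0, vx0, vy0, vz0, ax, ay, az, y_target=FRONT_OF_PLATE):
    """Return (t, x, z) when a constant-acceleration pitch reaches y = y_target.

    Positions are in feet, velocities in ft/s and accelerations in ft/s^2,
    measured from the moment the ball is at (x0, y0, z0). y decreases toward
    the plate, so vy0 is negative.
    """
    # Solve y0 + vy0*t + 0.5*ay*t^2 = y_target for the earliest positive t.
    a, b, c = 0.5 * ay, vy0, y0 - y_target
    if a == 0:
        t = -c / b
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("trajectory never reaches y_target")
        root = math.sqrt(disc)
        candidates = [(-b - root) / (2 * a), (-b + root) / (2 * a)]
        positive = [s for s in candidates if s > 0]
        if not positive:
            raise ValueError("y_target was reached before the fit starts")
        t = min(positive)
    # Each horizontal and vertical coordinate follows its own parabola in time.
    x = x0 + vx0 * t + 0.5 * ax * t * t
    z = z0 + vz0 * t + 0.5 * az * t * t
    return t, x, z
```

Solving for time first and then plugging it into the x and z parabolas keeps the three dimensions consistent; the same function can be pointed at any other y to trace the ball across the depth of the plate.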
Again, this view is from behind home plate, so a right-handed batter is standing at around -18 inches. Only pitches that were called balls or strikes are plotted, and intentional balls have been removed. The strike zone is inflated by the radius of the ball, so that any part of the ball crossing any part of the plate counts as a strike. I've superimposed the MLB-defined strike zone for a player of average height in black and the best-fit strike zone for Hernandez in blue. I obtained the latter by finding the rectangle with the fewest errors (called strikes outside the box plus called balls inside it). Straight out of the box, things aren't looking good for Hernandez: he appears to have a wide zone, especially on the outside, to both left- and right-handed batters, and he hardly ever calls strikes on pitches at the knees. In addition, there is a smattering of called strikes high out of the zone and many called balls inside the upper portion of the strike zone.
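The best-fit rectangle can be found by brute force. This is a hypothetical sketch of the "fewest errors" criterion, not the exact procedure behind the plots: it assumes each call is a tuple (x, z, is_strike) in a common unit and tries every pair of candidate edges on a caller-supplied grid. Its cost grows with the fourth power of the grid size times the number of pitches, so a coarse grid is advisable for a full season of data.

```python
from itertools import combinations


def count_errors(calls, left, right, bottom, top):
    """Called strikes outside the box plus called balls inside it."""
    errors = 0
    for x, z, is_strike in calls:
        inside = left <= x <= right and bottom <= z <= top
        if inside != is_strike:
            errors += 1
    return errors


def best_fit_zone(calls, x_grid, z_grid):
    """Try every rectangle whose edges lie on the grids; keep the one with the fewest errors.

    Returns (errors, (left, right, bottom, top)).
    """
    best = None
    for left, right in combinations(sorted(x_grid), 2):
        for bottom, top in combinations(sorted(z_grid), 2):
            errors = count_errors(calls, left, right, bottom, top)
            if best is None or errors < best[0]:
                best = (errors, (left, right, bottom, top))
    return best
```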
|Identifying the top and bottom of Ryan Howard's strike zone is easier said than done. (Icon/SMI)|
You may have noticed a couple of problems with the out-of-the-box data, however. The most glaring is that batters come in different sizes, so what is a strike to one may be a ball to another. To help with this, when a batter steps to the plate, a person working for Sportvision (called a stringer) identifies where the top and bottom of the strike zone are and reports them in the data. This would be perfect if the stringers reported the same numbers each time a batter stepped to the plate, but sadly these numbers can really jump around. To minimize this random error, the best method is to average a hitter's values over all of his at-bats and use that average for every one of them. When you do that, you can see how big a difference this can make.
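The averaging step amounts to a group-by. A minimal sketch, assuming each record carries a batter identifier plus the stringer's reported top and bottom (named here after the sz_top/sz_bot fields in the PITCHf/x data); the batter IDs in the usage line are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_zones(at_bats):
    """Average the stringer-reported zone for each batter.

    at_bats: iterable of (batter_id, sz_top, sz_bot), one entry per at-bat.
    Returns {batter_id: (mean_top, mean_bot)}.
    """
    tops = defaultdict(list)
    bots = defaultdict(list)
    for batter, top, bot in at_bats:
        tops[batter].append(top)
        bots[batter].append(bot)
    return {batter: (mean(tops[batter]), mean(bots[batter])) for batter in tops}


# Hypothetical stringer readings (feet) for two batters.
zones = average_zones([("batter_a", 3.6, 1.7), ("batter_a", 3.2, 1.5), ("batter_b", 3.1, 1.4)])
```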
Once we have a fixed strike zone for each hitter, we can normalize all the data to a league-average height, fixing any problems with the top and bottom of the strike zone. As far as I know, John Walsh was the first to do this, in the articles linked above.
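The normalization formula isn't spelled out here, so the following is an assumption: a simple linear rescaling that maps each batter's averaged zone onto a league-average one. A pitch at the batter's bottom lands at the league bottom, one at his top lands at the league top, and everything else (including pitches outside the zone) is scaled in proportion.

```python
def normalize_height(z, batter_bot, batter_top, league_bot, league_top):
    """Linearly map a pitch height from one batter's zone onto a league-average zone.

    All heights in the same unit (e.g. feet). fraction is 0 at the batter's
    bottom and 1 at his top; values outside [0, 1] stay outside the zone.
    """
    fraction = (z - batter_bot) / (batter_top - batter_bot)
    return league_bot + fraction * (league_top - league_bot)


# Hypothetical example: the same relative spot in two different zones
# maps to the same normalized height.
short = normalize_height(2.0, 1.4, 3.4, 1.6, 3.4)
tall = normalize_height(2.2, 1.6, 3.6, 1.6, 3.4)
```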
Now we are getting somewhere, but there is still another problem with the data. The strike zone isn't just a simple plane like a window. Because home plate has depth, the zone is a complicated, three-dimensional object (kind of like a house that has fallen on its side). A pitch that isn't over the plate at the front can still pass over the plate near the back. Here is a great example of such a pitch, viewed from the top down: Andy Sonnanstine pitching to Jorge Posada, with Angel Hernandez behind the plate.
Here the plate hasn't been enlarged, and the width of the curve is the actual diameter of a baseball. The y position, which is on the horizontal axis here, has values near 55 feet because around 55 feet from home plate is where the camera system begins to report the ball (I didn't come up with this naming/numbering scheme; I am just using it). The ball is not over the plate at the front but narrowly scrapes home plate near the back. Angel Hernandez correctly called this pitch a strike.
You might be thinking this is a small effect that can't possibly matter. Well, at the front of the plate the middle of the ball is over two inches away from the edge of the plate. If you look back at the two plots of Hernandez's calls to left- and right-handed batters, you can see that Hernandez appears to be generous on the corners by a little more than two inches. Clearly, if you want to look at umpires in detail, looking only at the front of the plate isn't going to be enough.
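To make the front-edge problem concrete, here is a top-down sketch comparing two ways of calling the same pitch: checking only where the ball crosses the front edge, and checking whether the ball overlaps the plate anywhere along its path. The plate outline uses the rulebook's 17-inch front edge, 8.5-inch sides and roughly 12-inch back edges. The 1.45-inch ball radius and the pitch itself (2.1 inches outside the edge at the front, drifting inward 0.1 inch per inch of travel) are hypothetical values chosen for illustration, not the Sonnanstine pitch's actual numbers.

```python
import math

# Top-down model of home plate in inches: back tip at the origin, y increasing
# toward the pitcher, front edge at y = 17. Vertices are in counterclockwise order.
PLATE = [(0.0, 0.0), (8.5, 8.5), (8.5, 17.0), (-8.5, 17.0), (-8.5, 8.5)]
HALF_WIDTH = 8.5
BALL_RADIUS = 1.45  # inches, approximate (assumed)


def seg_dist(px, py, ax, ay, bx, by):
    """Distance from point P to segment AB."""
    ex, ey = bx - ax, by - ay
    t = max(0.0, min(1.0, ((px - ax) * ex + (py - ay) * ey) / (ex * ex + ey * ey)))
    return math.hypot(px - ax - t * ex, py - ay - t * ey)


def dist_outside_plate(x, y):
    """0 if (x, y) is over the plate, else the distance to the nearest plate edge."""
    n = len(PLATE)
    # A point is inside a convex counterclockwise polygon if it is left of every edge.
    inside = all(
        (PLATE[(i + 1) % n][0] - PLATE[i][0]) * (y - PLATE[i][1])
        - (PLATE[(i + 1) % n][1] - PLATE[i][1]) * (x - PLATE[i][0]) >= 0
        for i in range(n)
    )
    if inside:
        return 0.0
    return min(seg_dist(x, y, *PLATE[i], *PLATE[(i + 1) % n]) for i in range(n))


def front_edge_call(x_front):
    """Strike if the ball overlaps the plate where it crosses the front edge."""
    return abs(x_front) - HALF_WIDTH <= BALL_RADIUS


def whole_plate_call(path):
    """Strike if the ball overlaps the plate anywhere along its top-down path."""
    return any(dist_outside_plate(x, y) <= BALL_RADIUS for x, y in path)


# Hypothetical pitch sampled every quarter inch from the front edge past the back tip.
ys = [17.0 - 0.25 * i for i in range(81)]
path = [(10.6 - 0.1 * (17.0 - y), y) for y in ys]
print(front_edge_call(path[0][0]), whole_plate_call(path))  # False True
```

The front-edge test sees the ball's center 2.1 inches outside and calls a ball, while the whole-plate test finds the ball catching the side of the plate near the back of its straight section.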
Method for corrections
Any part of the ball that crosses any part of the plate.*
* From a Little League umpire whose husband e-mailed me discussing the accuracy of a strike zone tracker used by a local broadcast team.
Asking even a very accurate tracking system like PITCHf/x to determine whether any part of a sphere crossed any part of a crazy seven-sided, three-dimensional object that changes from batter to batter is not an easy task. The best way to attack the problem is to enlarge the strike zone by the radius of the ball. The ball can then be treated as a single point (its center), and its flight becomes a curve. Asking where that curve crosses a plane of the strike zone is an easier task, but we would like to do better than that. Ideally, we would like to present the data in a nice graphical format like the left- and right-handed plots above. We can reduce the problem back to two dimensions by reporting the point where the ball came closest to the strike zone if it didn't actually enter the zone, or the point of deepest penetration for balls that did enter it. This is a three-dimensional minimization/maximization problem that would still make a Calculus III student wince, but it can be done on a computer in a reasonable amount of time.
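One way to sketch this minimization is with a signed distance function: negative inside the zone (more negative the deeper the point), positive outside. In this assumed formulation the zone is the home-plate pentagon extruded between a bottom and top height, the ball's flight is supplied as sampled center positions in inches, and subtracting the ball radius plays the role of inflating the zone. It samples the path rather than solving the minimization analytically, and the names, units and 1.45-inch radius are hypothetical.

```python
import math

# Home plate from above, in inches: back tip at the origin, y toward the pitcher.
PLATE = [(0.0, 0.0), (8.5, 8.5), (8.5, 17.0), (-8.5, 17.0), (-8.5, 8.5)]
BALL_RADIUS = 1.45  # inches, approximate (assumed)


def segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point P to the segment AB."""
    ex, ey = bx - ax, by - ay
    t = max(0.0, min(1.0, ((px - ax) * ex + (py - ay) * ey) / (ex * ex + ey * ey)))
    return math.hypot(px - (ax + t * ex), py - (ay + t * ey))


def plate_sdf(x, y):
    """Signed distance to the plate outline: negative over the plate, positive off it."""
    inside, best, n = True, math.inf, len(PLATE)
    for i in range(n):
        ax, ay = PLATE[i]
        bx, by = PLATE[(i + 1) % n]
        best = min(best, segment_distance(x, y, ax, ay, bx, by))
        # Counterclockwise convex polygon: inside means left of every edge.
        if (bx - ax) * (y - ay) - (by - ay) * (x - ax) < 0:
            inside = False
    return -best if inside else best


def zone_sdf(x, y, z, bottom, top):
    """Signed distance to the plate extruded vertically between bottom and top."""
    d_plate = plate_sdf(x, y)
    d_z = abs(z - (top + bottom) / 2) - (top - bottom) / 2
    outside = math.hypot(max(d_plate, 0.0), max(d_z, 0.0))
    return min(max(d_plate, d_z), 0.0) + outside


def closest_approach(path, bottom, top, radius=BALL_RADIUS):
    """Return (clearance, point) for the sampled ball center nearest the zone.

    clearance is the signed distance minus the ball radius: a value <= 0 means
    some part of the ball touched the zone, and the returned point is then the
    deepest penetration; otherwise it is the closest miss.
    """
    best = min(path, key=lambda p: zone_sdf(p[0], p[1], p[2], bottom, top))
    return zone_sdf(*best, bottom, top) - radius, best


def straight_path(x, z, y_start=24.0, y_end=-6.0, n=301):
    """Hypothetical helper: a pitch moving straight along y at fixed x and z."""
    return [(x, y_start + (y_end - y_start) * i / (n - 1), z) for i in range(n)]
```

The x and z of the returned point are what would be plotted; a clearance at or below zero marks a pitch that, by rule, should be a strike.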
Using these minimums/maximums and normalizing for the changing height of the strike zone, we come up with these two plots for Hernandez.
Again, the MLB-defined strike zone is in black and the best fit strike zone for Hernandez is in blue.
After these adjustments, things look a lot better for Hernandez. The top of the strike zone as defined by MLB and Hernandez's top are in almost complete agreement. The inside corner (negative for RHB, positive for LHB) is also almost perfect. While Hernandez is still calling strikes a few inches off the outside corner, this too has been reduced. You will notice that many of these called strikes slightly off the plate now appear low in the zone.
The reason is that pitches are almost always falling as they reach home plate. If a pitch is slightly off the plate horizontally but moving toward it when it crosses the front, the point where the ball is closest to the zone will be near the back of the plate, after the ball has dropped some. If, however, the ball is already at the knees, that drop moves it farther away from the zone even as it gets closer horizontally. So a ball at the knees when it crosses the front of the plate is probably already near its closest point, while a ball at the letters probably won't reach its closest point until the back of the plate.
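The argument can be checked numerically with a deliberately simplified model: only the straight-sided front half of the plate (8.5 to 17 inches from the back tip), widened by a ball radius, with hypothetical knee and letter heights of 1.5 and 3.5 feet, and the ball moving at constant velocity over that short stretch. Both hypothetical pitches start about 0.05 foot outside the widened edge, drifting inward at 6 ft/s and falling at 8 ft/s; only their heights differ.

```python
import math

# Simplified, hypothetical zone in feet: the straight-sided section of the plate
# (8.5 to 17 inches from the back tip), widened by an assumed 1.45-inch ball radius.
X_HALF = (8.5 + 1.45) / 12
Y_BACK, Y_FRONT = 8.5 / 12, 17 / 12
Z_BOT, Z_TOP = 1.5, 3.5  # assumed knee and letter heights


def dist_to_box(x, y, z):
    """Euclidean distance from a point to the box (0 if inside)."""
    dx = max(-X_HALF - x, 0.0, x - X_HALF)
    dy = max(Y_BACK - y, 0.0, y - Y_FRONT)
    dz = max(Z_BOT - z, 0.0, z - Z_TOP)
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def closest_y(x, z, vx, vz, vy=-130.0, steps=1000):
    """Move the ball at constant velocity from the front to the back of the section
    and return the y (feet from the back tip) where it comes closest to the box."""
    t_end = (Y_FRONT - Y_BACK) / -vy
    best_d, best_y = math.inf, Y_FRONT
    for i in range(steps + 1):
        t = t_end * i / steps
        px, py, pz = x + vx * t, Y_FRONT + vy * t, z + vz * t
        d = dist_to_box(px, py, pz)
        if d < best_d:
            best_d, best_y = d, py
    return best_y


y_letters = closest_y(0.88, 3.0, -6.0, -8.0)   # comfortably inside vertically
y_knees = closest_y(0.88, 1.48, -6.0, -8.0)    # just under the bottom edge
```

With these numbers the letters-high pitch is still closing in when it reaches the back of the section, while the knee-high pitch reaches its closest point a couple of inches behind the front edge, because its drop soon outweighs its inward drift.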
Because the adjustment at the knees is small, the gap between the MLB strike zone and Hernandez's strike zone remains. This isn't too surprising, since it is exactly what John Walsh found in his original study. I do want to point out that we are relying on the Sportvision stringers to determine the top and bottom of the zone, and while we have already corrected for the random errors they might make, they might be systematically reporting a lower value than the rules suggest. In any case, the gap is about two inches for left-handed batters and about three inches for right-handed batters.
Analyzing the results
If you do this for all umpires, then you can start studying them as a group. You can ask how often they correctly identify a pitch compared to the MLB rule definition (accuracy), how often they correctly identify a pitch compared with their personal strike zone (precision), and how large these strike zones are compared to the MLB strike zones (size) for both left- and right-handed batters. When you do this, the first thing that jumps out at you is just how accurate and precise major league umpires are as a group. While they tend to give the outside corner a bit too much and are a bit stingy near the top and bottom, for the most part a pitch over the plate is called a strike and almost always a pitch that is thrown within a specific umpire's zone is called a strike. Of the 82 umpires who were behind the plate when the PITCHf/x cameras were working last year, even the least consistent was very consistent.
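The three measures can be sketched as follows. This treats both the rulebook zone and the umpire's personal zone as rectangles (left, right, bottom, top) in normalized coordinates, defines accuracy and precision as the share of calls that agree with each zone, and defines size as the ratio of the personal zone's area to the rulebook zone's. These exact definitions are assumptions, since the metrics are described here only in words.

```python
def in_box(x, z, box):
    """True if (x, z) lies inside box = (left, right, bottom, top)."""
    left, right, bottom, top = box
    return left <= x <= right and bottom <= z <= top


def box_area(box):
    left, right, bottom, top = box
    return (right - left) * (top - bottom)


def umpire_metrics(calls, rule_box, personal_box):
    """Return (accuracy, precision, size) for one umpire.

    calls: list of (x, z, is_strike) in normalized coordinates.
    accuracy:  share of calls that match the rulebook zone.
    precision: share of calls that match the umpire's own best-fit zone.
    size:      personal zone area divided by rulebook zone area.
    """
    n = len(calls)
    accuracy = sum(in_box(x, z, rule_box) == s for x, z, s in calls) / n
    precision = sum(in_box(x, z, personal_box) == s for x, z, s in calls) / n
    size = box_area(personal_box) / box_area(rule_box)
    return accuracy, precision, size
```

Separating accuracy from precision is what lets an umpire with a slightly wide but very steady zone, as described above, score well on consistency even when he disagrees with the rulebook on the corners.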
When you focus in on Angel Hernandez specifically, you find that he grades out as an above-average umpire in each metric of even-handedness. No matter how you slice it, looking at the 2008 PITCHf/x data, Hernandez is a quality umpire among an already amazing group. While he doesn't get every call right, he gets the vast majority correct and his strike zone hardly varies from game to game.