October 10, 2008
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Are umpires racist?
by John Beamer
August 27, 2007

Accusing someone of being racist is a bold and emotive claim. But that is exactly what Time does in writing up a study from Daniel Hamermesh, a respected professor at the University of Texas at Austin, who released an academic paper on the subject. To be fair to Daniel, he doesn't use the word racist anywhere in the study. In economist lingo the term is racial bias: umpires are more generous in their calls to pitchers who are of the same race; in other words, they exhibit racial bias. Perhaps the difference is that racial bias is a subconscious act, whereas racism is an overt choice, but either way this has the potential to be an inflammatory issue.

Before I go on it is worth acknowledging the slew of blogs and individuals that have contributed to the debate and on which I draw a lot for this column. First, Phil Birnbaum at Sabermetric Research has a couple of extremely lucid posts on the subject. Phil edits SABR's By the Numbers journal, and his blog is one of the most accessibly written on statistical matters. A lot of the studies I refer to originated from Phil. MGL of The Book Blog has also tried to replicate the Hamermesh result through different techniques, and I'll refer to his work during this column too. Finally, I'll draw on the expertise of some of the great minds of sabermetrics; people like Guy, Tango and Pizza Cutter have all thrown in their tuppence worth and added significantly to the debate.

We're done with the bluster; let's investigate the study in a little more detail.

The study

In summary, Hamermesh et al. look at pitch-by-pitch data from ESPN for every MLB game between 2004 and 2006, a total of 2.1 million pitches. For each pitch they recorded the outcome (e.g., called strike, swinging strike, hit by pitch, ball in play) as well as other game information such as the pitcher (including his line), umpire crew, team standings, attendance, mascot name etc. You get the idea.
The pitchers and umpires are classified according to race as White, Black, Asian or Hispanic. A combination of databases and image searching is used for this classification, which seems a reasonable method.

The authors then did a couple of things to try to detect racial bias. First they tabulated the data by umpire and pitcher race to see if any clear differences popped out. Then they ran some complicated regression models to try to tease out hidden relationships, pin down firm results (statistical significance, in the trade) and identify the implications for the game (baseball significance).

So, what did the study find? First, let me present the simple data tabulation:

Summary of Umpires’ Calls by Umpire-Pitcher Racial/Ethnic Match
Umpire race  Measure              Pitcher race
                                  White      Hispanic  Black   Asian   TOTAL
White        Pitches              1,388,318  445,107   47,797  56,866
             Called strikes (%)   32.06      31.47     30.61   31.97   31.89
Hispanic     Pitches              45,603     13,737    1,552   1,406
             Called strikes (%)   31.91      31.80     30.77   30.43   31.81
Black        Pitches              87,170     26,054    3,377   3,179
             Called strikes (%)   31.93      30.87     30.76   30.19   31.62
TOTAL        Called strikes (%)   32.05      31.45     30.62   31.84   31.87

The authors use called strikes to detect possible racial bias by the umpire. A called strike is a subjective ruling, so the theory is that if a white ump favors a white pitcher he is more likely to call pitches on the margin of the zone a strike rather than a ball.

The table is a bit busy, but we can infer a few things from the data. First, there are no Asian umpires in major league baseball (hence the lack of data). Second, white umpires and white pitchers are in the majority, accounting for roughly 90% and 70% of the pitches in their respective populations. That is a lot and, as you will see, it plays an important role when interpreting some of the authors' conclusions. Third, whatever the race of the umpire, white pitchers have a greater ability to throw strikes than do non-white pitchers. That's an important finding. White hurlers throw more strikes. Look at the data. Black umps call 31.93% strikes when a white pitcher is on the mound compared to 30.76% when the pitcher is black!

Accepting the raw percentages, the most discriminated-against pitchers are Asians facing black umpires, although that particular combo accounts for only about 3,000 of the roughly 2,000,000 pitches recorded. At the other end of the spectrum, white umps call strikes for white pitchers 32.06% of the time.

As well as looking at the raw numbers the authors ran a few regressions, as is their wont. The benefit of a regression is that you can control for a bunch of different factors. The factors controlled for here are pitch count, inning, home field and game score. Interestingly, when considering the race of pitchers and umpires separately (e.g., white on white, white on black etc.), the authors find no racial bias. This is attributed to sample size issues, particularly among non-white pitchers and umpires. Only when clumping all the data together is any effect observed.
The most important finding is that when the pitcher and umpire are the same race a pitch is 0.34% more likely to be called a strike, which the authors claim is equivalent to just less than one pitch per game. In fact it is closer to one strike every four games (there are approximately 70 called pitches a game, and 0.34% of 70 is about 0.24 of a pitch), but that is by the by. Think about that for a second. Is that evidence of racial bias? Is it significant in a baseball sense? We'll come back to those questions later.

The authors then look at a number of other factors. The first is whether the umpire was in a Questec park. Questec is strike zone recognition software that can grade an ump's performance by assessing how accurately he called the strike zone. The theory is that in Questec parks we should see any evidence of racial bias disappear because the monitoring system makes umpires more vigilant. Eyeballing some of the data, it appears that different race umpires actually call more strikes than same race umpires in Questec parks; in other words, the umpires are over-compensating for the race effect!

A regression is used to try to add some statistical validity to this finding. First the authors find the "Questec effect": called strikes are less likely in those parks with Questec installed (0.66% per pitch). Then the authors looked at same race umpires and found that they called about 1% fewer strikes in Questec parks, which again points to racial bias in non-Questec parks.

I can sense you're running out of gas so let me wrap up by sharing a couple of the remaining conclusions. The authors also found a link to attendance (the more watched the game, the less the racial bias) and to terminal pitches (where there are two strikes and/or three balls on the board). Again, in terminal counts there was less racial bias. The inference is that the more scrutinized the situation, be it by Questec, the public or the media, the more likely the umps were to adjust their inherent tendencies.
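The per-game arithmetic behind the 0.34% figure can be sketched in a few lines. This is only a back-of-the-envelope check using the two numbers quoted in the column (the 0.34% effect and roughly 70 called pitches a game); the function and variable names are illustrative, not anything from the study.

```python
# Back-of-the-envelope check of the "strikes per game" claim.
# Both inputs are the approximate figures quoted in the column.
SAME_RACE_BOOST = 0.0034        # 0.34% higher chance of a called strike
CALLED_PITCHES_PER_GAME = 70    # approximate called pitches in a game


def extra_strikes_per_game(boost: float, called_pitches: float) -> float:
    """Expected number of additional called strikes in one game."""
    return boost * called_pitches


extra = extra_strikes_per_game(SAME_RACE_BOOST, CALLED_PITCHES_PER_GAME)
games_per_extra_strike = 1 / extra

print(f"Extra called strikes per game: {extra:.3f}")
print(f"Games needed for one extra strike: {games_per_extra_strike:.1f}")
```

On those inputs the effect is about a quarter of a pitch per game, well short of the "just less than one pitch per game" the authors describe.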
Do umpires show racial bias?

The million dollar question! The Hamermesh study shouts a yes, but it is unclear. In my opinion the benefit of the doubt probably sits with the umpires unless we can categorically prove the contrary. Also the effect, if it exists at all, is so small as to not make any practical difference.

Phil Birnbaum's analysis is probably the best refutation of the racial bias argument. Consider the called strike lines in the table above. This gives us:

Umpire     Pitcher
           White   Hispanic  Black   Asian   TOTAL
White      32.06   31.47     30.61   31.97   31.89
Hispanic   31.91   31.80     30.77   30.43   31.81
Black      31.93   30.87     30.76   30.19   31.62
TOTAL      32.05   31.45     30.62   31.84   31.87

Phil's argument, which is correct, is that if racial bias exists then we should be able to see it in this table. He hypothesizes that an unbiased table could have the same called strike percentage for each group of pitchers no matter who is umpiring:

Umpire     Pitcher
           White   Hispanic  Black   Asian   TOTAL
White      32.05   31.45     30.62   31.84   31.87
Hispanic   32.05   31.45     30.62   31.84   31.87
Black      32.05   31.45     30.62   31.84   31.87

In this table white pitchers still hit the strike zone more often than pitchers of a different race do, but the umps call the same strike percentage for a given group of pitchers irrespective of their own ethnicity.

Phil calculates the number of pitches that need to be called differently to translate the top table to the bottom. The answer? 228. That's out of 2 million actual pitches and 700,000 called pitches. Yes, just 228. It doesn't sound like a big difference and, as one standard deviation in the data is about 500 pitches, it assuredly isn't.

A quick way to verify this result is to run a chi-squared test. A chi-squared test compares an observed distribution to an expected distribution and reports whether they are similar or different. In this instance the test confirms there is no significant difference between the two matrices.
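For readers who want to try the chi-squared check themselves, here is a minimal sketch. It is not Phil's calculation or the one used for this column. The table reports total pitches per cell, not called pitches, so the sketch assumes that about one third of pitches in every cell were called (700,000 of 2.1 million); that fraction, the dictionary names and the function are all assumptions made for illustration. The "expected" rate for each pitcher group is the pooled rate across umpires, which is what the no-bias table amounts to.

```python
# Chi-squared sketch: does the called-strike rate for a given pitcher race
# depend on the umpire's race? Cell values come from the tabulation above.
PITCHES = {  # total pitches, keyed by umpire race then pitcher race
    "White": {"White": 1_388_318, "Hispanic": 445_107, "Black": 47_797, "Asian": 56_866},
    "Hispanic": {"White": 45_603, "Hispanic": 13_737, "Black": 1_552, "Asian": 1_406},
    "Black": {"White": 87_170, "Hispanic": 26_054, "Black": 3_377, "Asian": 3_179},
}
STRIKE_PCT = {  # called strikes as a percentage of called pitches
    "White": {"White": 32.06, "Hispanic": 31.47, "Black": 30.61, "Asian": 31.97},
    "Hispanic": {"White": 31.91, "Hispanic": 31.80, "Black": 30.77, "Asian": 30.43},
    "Black": {"White": 31.93, "Hispanic": 30.87, "Black": 30.76, "Asian": 30.19},
}
CALLED_FRACTION = 1 / 3      # ASSUMPTION: share of pitches that were called
CRITICAL_5PCT_DF8 = 15.507   # chi-squared critical value, 5% level, 8 d.o.f.


def chi_squared(pitches, strike_pct, called_fraction):
    """Sum of (observed - expected)^2 / expected over strike and ball counts."""
    stat = 0.0
    umpires = list(pitches)
    for pitcher in pitches[umpires[0]]:
        called = {u: pitches[u][pitcher] * called_fraction for u in umpires}
        strikes = {u: called[u] * strike_pct[u][pitcher] / 100 for u in umpires}
        # Pooled strike rate for this pitcher group: the no-bias expectation.
        pooled = sum(strikes.values()) / sum(called.values())
        for u in umpires:
            expected_strikes = called[u] * pooled
            expected_balls = called[u] * (1 - pooled)
            observed_balls = called[u] - strikes[u]
            stat += (strikes[u] - expected_strikes) ** 2 / expected_strikes
            stat += (observed_balls - expected_balls) ** 2 / expected_balls
    return stat


statistic = chi_squared(PITCHES, STRIKE_PCT, CALLED_FRACTION)
# Four pitcher groups, each a 3x2 table: 4 * (3 - 1) = 8 degrees of freedom.
print(f"chi-squared = {statistic:.2f} (5% critical value {CRITICAL_5PCT_DF8})")
print("significant" if statistic > CRITICAL_5PCT_DF8 else "not significant")
```

Under these assumptions the statistic lands well below the 5% critical value, in line with the conclusion that the two matrices are not significantly different.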
Of course there are other unbiased tables; for example, black umpires could consistently call a small strike zone across all pitchers. Phil shows that these tables aren't statistically different from the actual data.

Although we can't find any statistically valid result using this approach, the table still hints at some racial bias (look at Asian pitchers vs. black umps). It could be that controlling for other factors, as the regression does, allows us to peer behind the superficial numbers and detect racial bias; or it could be that there is something funny going on with the authors' analysis.

So how did the authors manage to show that there was a significant effect? There are some possible clues in the data. If you look closely, the only racial bias appears to be among Hispanic and Asian pitchers, particularly when a black ump is behind the plate. This could be skewing the data to show racial bias as a general conclusion when it only really exists in patches, and these pockets of difference are based on relatively small sample sizes. Also there appears to be little difference in called strike distribution among the black and white pitchers, save that white pitchers generally have a higher called strike percentage overall (look at the table above).

Another clue is that when looking at pitcher and umpire race individually no effect can be found; only when the data are combined into same race and different race buckets is the effect revealed. Yes, the aggregation appears to be turning the regression significant. If you calculate a weighted average for same race umpires the called strike percentage is 32.05% versus 31.5% for different race umps, a difference of roughly 0.6 percentage points. We know that white pitchers have more called strikes, so if we look at white hurlers against the rest the called strike percentages are 32.05% and 31.4% respectively, a difference of roughly 0.7 percentage points. That's it.
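The weighted averages quoted above can be reproduced from the tabulation with a short sketch. One caveat: the table gives total pitches per cell, not called pitches, so the code weights each cell by total pitches as a stand-in, and the results can differ from the column's figures in the last digit. The last step is an extra illustration, not something taken from the study: it recomputes the same race gap on the no-bias table, where every umpire row equals the TOTAL row. All names are illustrative.

```python
# Same race vs. different race called-strike rates, weighted by pitches.
PITCHES = {  # total pitches, keyed by umpire race then pitcher race
    "White": {"White": 1_388_318, "Hispanic": 445_107, "Black": 47_797, "Asian": 56_866},
    "Hispanic": {"White": 45_603, "Hispanic": 13_737, "Black": 1_552, "Asian": 1_406},
    "Black": {"White": 87_170, "Hispanic": 26_054, "Black": 3_377, "Asian": 3_179},
}
STRIKE_PCT = {  # observed called-strike percentages
    "White": {"White": 32.06, "Hispanic": 31.47, "Black": 30.61, "Asian": 31.97},
    "Hispanic": {"White": 31.91, "Hispanic": 31.80, "Black": 30.77, "Asian": 30.43},
    "Black": {"White": 31.93, "Hispanic": 30.87, "Black": 30.76, "Asian": 30.19},
}
# No-bias table: every umpire group calls the TOTAL-row rate for each pitcher group.
TOTAL_ROW = {"White": 32.05, "Hispanic": 31.45, "Black": 30.62, "Asian": 31.84}
UNBIASED_PCT = {umpire: dict(TOTAL_ROW) for umpire in PITCHES}


def weighted_pct(cells, pct_table):
    """Pitch-weighted average called-strike percentage over (umpire, pitcher) cells."""
    total = sum(PITCHES[u][p] for u, p in cells)
    return sum(PITCHES[u][p] * pct_table[u][p] for u, p in cells) / total


all_cells = [(u, p) for u in PITCHES for p in PITCHES[u]]
same_race = [(u, p) for u, p in all_cells if u == p]
diff_race = [(u, p) for u, p in all_cells if u != p]
white_pitchers = [(u, p) for u, p in all_cells if p == "White"]
other_pitchers = [(u, p) for u, p in all_cells if p != "White"]

observed_gap = weighted_pct(same_race, STRIKE_PCT) - weighted_pct(diff_race, STRIKE_PCT)
white_gap = weighted_pct(white_pitchers, STRIKE_PCT) - weighted_pct(other_pitchers, STRIKE_PCT)
no_bias_gap = weighted_pct(same_race, UNBIASED_PCT) - weighted_pct(diff_race, UNBIASED_PCT)

print(f"Same race vs. different race gap: {observed_gap:.2f} points")
print(f"White pitchers vs. the rest gap:  {white_gap:.2f} points")
print(f"Same race gap with no bias at all: {no_bias_gap:.2f} points")
```

With pitch weighting the same race gap comes out at roughly 0.55 points, and about 0.50 of it is still there in the no-bias table. In other words, nearly all of the raw gap is a by-product of who pitches to whom.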
Most of the difference is explained by the fact that white pitchers have more called strikes and white umpires take charge behind the plate more often! You'd expect the regression to capture the result, so why doesn't it? That is still a little unclear. There are a couple of likely explanations. One, by aggregating the data the sample size becomes large enough for a minuscule effect to be teased out. Or two, the regression is somehow failing to control for race properly. So, which is it?

My hunch is that while there may be a tiny effect among some umpire groups, the authors' regression hasn't properly controlled for the race of the pitcher. Ninety-five percent-plus of the same race umpire/pitcher pairs are white on white, so this variable is a proxy for white pitchers rather than for same race umpire/pitcher combinations. To test this the authors could run the regression with pitcher = white as a variable, and if the above theory is true they should get the same results. I suspect that controlling for pitcher race would make the claimed effects small to negligible. For the main regression result (the 0.34%) there is no individual pitcher or pitcher race control (although, strangely, this is controlled for in later regressions).

Another issue is that there are very few non-white umpires in the bigs. The authors have identified three Hispanic umpires and five black umpires compared to 85 white umpires. It could easily be that one anomaly among the non-white umpires skews the data. Drawing conclusions about racial bias from such a skewed sample is a little dangerous.

What about the other findings, for instance, that playing in Questec parks reduces the likelihood of racial bias among same race umpires? I suspect that the same issues highlighted above are in play. Questec is only in a small number of parks (11 out of 30), and teams and umpires will not be distributed randomly among them (home teams will be over-represented).
The fact that we see over-compensation in the graphical data is also strange. Both these facts have to put the Questec conclusions in some doubt.

The attendance finding is also spurious. Why would a better attended game result in a less biased strike zone? If you're in the crowd behind the plate can you really call the strike zone? No. And every game is televised, so there is more scrutiny from that than there is from a few fans behind home plate whose view is obscured by the umpire.