Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Tuesday, April 03, 2012

Umpire statistics

Posted by Dan Brooks
We (Harry and I) have released PITCHf/x-based statistics for every umpire who has called a PITCHf/x-enabled game, provided that they called enough pitches to accurately represent their called strike zone. We sincerely appreciate you taking the time to read this post before using our umpiring data.

Being an umpire is hard. It might be one of the hardest skills in baseball, and it sure doesn’t pay $20 million a year.

Umpires are also damn good at what they do.

But at least in the public domain, there has been little systematic study of umpiring. There are several reasons. First, without proper instruction, people would naturally use the data in ways that create unnecessary controversy. No one wants to be at the center of a media scandal involving umpiring. And so, consider this your proper instruction: If you use these data to rip umpires, consider yourself an idiot.

It’s true, there are good umpires and there are better umpires, but we’re aiming to show you what umpiring really looks like, not what umpiring fails to do. We want to paint a picture of each umpire's strengths and weaknesses, of their proclivities for calling particular pitches in particular ways. This is not an argument for computerized or mechanized strike zones; nothing could be further from the truth. Use these data wisely.

Second, umpiring is a difficult skill set to properly describe. To really do it well, you’ve got to have access to a nice database, have things properly classified, and have the right mathematical models to present the data. Here, we would like to extend a warm thanks to Dave Allen, who some five years ago showed us how to apply heat maps to PITCHf/x data in a presentation at Sportvision’s Summit using LOESS (Locally Weighted Scatterplot Smoothing) Regression.

Third, defining the strike zone is notoriously difficult. A single average strike zone will not fit well, because batters vary in height. However, the “easy” solution, which is to use the sz_top and sz_bot parameters from the Gameday data, isn’t really a solution at all, because those parameters (as Mike Fast has convincingly shown) vary too wildly between games to give a good estimate of batter height on a per-pitch basis. Here, we’ve chosen to use an equation that looks at the average sz_top scores and weights those by a player’s height.

The strike zone is also technically a three-dimensional volume, and we’ve chosen to define it as a two-dimensional plane at the front of home plate (as we have elsewhere on the site). We realize this introduces error, but the alternative is simply too difficult to represent graphically. So we hope you forgive us here, and understand that this may slightly bias results.
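To make the two-dimensional simplification concrete, here is a minimal sketch of a zone test at the front of the plate. It assumes px and pz are the horizontal and vertical pitch locations in feet (as in the Gameday data) and that zone_top and zone_bot come from whatever height-based estimate is in use. The function name, the parameters, and the choice to pad the zone by one ball radius are illustrative assumptions, not the equation behind the umpire cards.

```python
PLATE_HALF_WIDTH_FT = 17.0 / 12.0 / 2.0  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12.0             # assumption: roughly a 2.9 inch ball


def in_zone(px, pz, zone_top, zone_bot, pad=BALL_RADIUS_FT):
    """Return True if a pitch crosses the 2D zone at the front of the plate.

    px, pz    : pitch location in feet (px = 0 is the middle of the plate)
    zone_top  : top of the zone in feet (hypothetical height-based estimate)
    zone_bot  : bottom of the zone in feet (hypothetical height-based estimate)
    pad       : extra margin so a ball that clips the edge still counts
    """
    half_width = PLATE_HALF_WIDTH_FT + pad
    inside_horizontally = abs(px) <= half_width
    inside_vertically = (zone_bot - pad) <= pz <= (zone_top + pad)
    return inside_horizontally and inside_vertically
```

Setting pad=0 gives the strict rectangle over the plate. Either way, a pitch that enters the three-dimensional zone behind the front edge is missed by this test, which is the error acknowledged above.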

We’re choosing to first present the data in two ways. The first is in a tabular form that reports not only hits (a pitch in the strike zone called a strike), misses (a strike called a ball), correct rejections (CRs, a ball called a ball) and false alarms (FAs, a ball called a strike), but also some psychometric measures of detection: d’ and c. Here’s the example from Angel Hernandez’s card:

[Image: table from Angel Hernandez’s umpire card]

While these last two require some explanation, the easiest way to think about them is that d’ represents discriminability (how well an umpire performs; larger is better) and that c represents how biased the umpire was in favor of hitters or pitchers (c<0 = pitcher friendly, c>0 = hitter friendly) on any particular pitch. These measures have not yet been re-normalized, but they will be, so that you get an idea of how friendly a particular umpire was relative to other umpires.
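For readers who want to see where numbers like these come from, here is a small sketch using the textbook signal detection formulas: d’ is the difference between the z-scores of the hit rate and the false alarm rate, and c is minus half their sum. We are assuming the standard definitions here; the function name is made up for illustration, and the cards may apply corrections or normalization that this sketch does not.

```python
from statistics import NormalDist


def detection_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) from the four call counts.

    Standard signal detection formulas (assumed, not confirmed as the
    exact computation used on the umpire cards):
        d' = z(hit rate) - z(false alarm rate)
        c  = -(z(hit rate) + z(false alarm rate)) / 2
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c
```

With these formulas, an umpire who calls extra strikes on pitches outside the zone gets a negative c, which matches the pitcher-friendly reading above. A rate of exactly 0 or 1 has no finite z-score, so inv_cdf raises an error in that case; with very small samples the counts would need some adjustment first.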

The second is a LOESS Heat Map for each batter handedness split by pitch type. This will give you the ability to tab through and see the differences in the strike zones called by each umpire in a more graphical way. Here’s Angel Hernandez’s strike zone:

[Image: LOESS heat map of Angel Hernandez’s called strike zone]
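As a rough illustration of the idea behind a heat map like this, here is a much-simplified smoother. It estimates the called-strike rate at a point as a tricube-weighted average of nearby calls. That is a local-constant fit with a fixed bandwidth, not full LOESS (which fits local polynomials over a nearest-neighbor span), and it is not the code behind the cards. The function names, the (px, pz, called_strike) tuple layout, and the 0.5 foot bandwidth are all assumptions made for the example.

```python
import math


def tricube(u):
    """Tricube weight: 1 at distance 0, falling smoothly to 0 at u = 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def smoothed_strike_rate(x0, z0, pitches, bandwidth=0.5):
    """Weighted called-strike rate near (x0, z0).

    pitches   : iterable of (px, pz, called_strike), called_strike is 0 or 1
    bandwidth : radius in feet beyond which a pitch gets zero weight
    Returns None when no pitch falls within the bandwidth.
    """
    num = 0.0
    den = 0.0
    for px, pz, called_strike in pitches:
        dist = math.hypot(px - x0, pz - z0)
        w = tricube(dist / bandwidth)
        num += w * called_strike
        den += w
    return num / den if den > 0 else None


def heat_map(pitches, xs, zs, bandwidth=0.5):
    """Evaluate the smoother on a grid: one row per z value, one column per x."""
    return [[smoothed_strike_rate(x, z, pitches, bandwidth) for x in xs]
            for z in zs]
```

Filtering the pitch list by batter handedness and pitch type before calling heat_map would give one grid per split, along the lines of the tabs described above.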

We hope you enjoy these statistics and use them responsibly. Be an educated, informed fan who recognizes that ripping an umpire because he missed a call is often a selfish act with little justification. Often, looking at the bigger pattern of data can give you more answers than a single pitch.

Please feel free to direct any feedback to Dan Brooks (@brooksbaseball) or Harry Pavlidis (@harrypav) on Twitter, or by commenting below.



Dan Brooks is a neuroscientist at Brown University. He operates BrooksBaseball.net and eats fried chicken during every Red Sox game, especially in September. Come follow him @brooksbaseball.


Comments

mp said...

Dan, hope you can add a single table with every umpire and their basic stats.  Great work.  Thanks.

Posted 04/03  at  12:59 PM