May 18, 2013

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Monday, August 24, 2009

What’s a “hit?”...and thoughts on advanced baseball analysis

Posted by Pat Andriola
Andrew Gelman is a professor of statistics and political science at Columbia University, and may be best known as a writer at Nate Silver's outstanding politics blog, FiveThirtyEight. In a post at his own blog, Gelman answered a question on sabermetrics, and in his reply wrote the following:

There's often a fuzzy line between (a) making inferences, and (b) simply "measuring what happened." I mean, what's a "save"? For that matter, what's a "hit"? Etc. These definitions are constructed to be relevant for inferential questions about players' abilities and contributions to the team.

One way things are changing is that there's a ton of raw, raw data--locations of where every ball landed on the field, things like that. In that case, the steps going from raw data to inference are going to be more apparent. With old-fashioned statistics such as batting and fielding averages, it can be easier to fool yourself into thinking of them as pure measurement.


It's easy to discredit the save as a flawed statistic that doesn't really tell you anything. The save is an easy target: it was invented in 1959 by the late sportswriter Jerome Holtzman to give some tangible statistical credit to guys who were closing games, and despite its popularity, it has clear fundamental flaws that make it an entirely useless metric. But what about hits? When we say that Derek Jeter got a "hit," aren't we doing exactly what Holtzman and others did: giving credit to a player by using a concise, easy-to-use word to sum up what he has done on the field?

A hit "is credited to a batter when the batter safely reaches first base after hitting the ball into fair territory, without the benefit of an error or a fielder's choice" (per Wikipedia). However, given what we know about DIPS theory and the amount of luck involved in baseball, a "hit" doesn't mean much at all. That's why we've begun to use batted ball data to help us understand how a player put a ball into play. We've already begun to divide the batted ball data into sub-categories as well: line drives, ground balls, and fly balls (and recently we've seen the use of "fliners").

However, while looking at how many line drives a player has is nice, it is only a step up from a "hit." It still doesn't tell us everything we need to know about the ball in play. Just as not all hits are created equal, neither are all line drives (or ground balls, or fly balls, etc.). In fact, even the stats that we use and love aren't perfect. The great MGL, in a post at The Book blog, criticizes Joe Posnanski for overstating the accuracy of UZR, saying:

He is really overstating the precision with which the data is recorded and I think he knows that, or at least should. There is no way that they can differentiate between a ball hit 6 inches from the base line and 3 feet from the base line. And there is NO category that I am aware of that is a “high chopping ground ball just over the pitcher’s glove.” Come on! Which is one reason why there is so much measurement error in these metrics in the short run. A high chopping ground ball over the pitchers mound (that could easily be fielded by either the SS or 2B) could just as easily have been a ground skinner up the middle that no one could possibly have fielded. They could easily fall into the same bucket, in which case, the fielder who catches the first one will be over compensated on that ball and both the SS and 2B will be overly penalized on the second one.


This isn't to say that UZR is inaccurate; on the contrary, along with John Dewan's Plus/Minus system, it's helping to revolutionize how we rate fielders, and it is far and away better than the older metrics. But while UZR is one of the best defensive stats we have, it still works the way the term "hit" does, only on a much less egregious scale. A ball hit six inches from third base is placed into the same bucket as one hit three feet away, even though the two are clearly different grounders. Likewise, a ten-foot roller down the third-base line is placed into the same category ("single") as a scorching liner played on one hop by the center fielder.
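
To make the bucketing problem concrete, here is a minimal Python sketch. The zone width and the `zone_for` helper are hypothetical and chosen only for illustration; they are not the actual zones UZR or Plus/Minus use.

```python
# Hypothetical illustration: coarse zones hide real differences between grounders.
ZONE_WIDTH_FT = 5.0  # assumed bucket width, NOT the real UZR zone size


def zone_for(distance_from_line_ft: float) -> int:
    """Return the index of the zone a grounder falls into, counting out from the foul line."""
    return int(distance_from_line_ft // ZONE_WIDTH_FT)


near_line = 0.5  # a ball hit six inches from the line
off_line = 3.0   # a ball hit three feet from the line

# Both balls land in zone 0, so a zone-based metric treats them as the same play.
same_bucket = zone_for(near_line) == zone_for(off_line)
print(same_bucket)
```

Under these assumed zones, both grounders are scored as the same chance, which is the short-run measurement error MGL describes.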

So what more can we do? Well, thanks to the wonderful advancement of technology, we can look at literal, objective facts (assuming little recording error or bias) about balls put in play: the force with which they were hit, their velocity, their vector, etc. In fact, Alan Schwarz did a good job at The New York Times detailing the future of this analysis, saying:

A new camera and software system in its final testing phases will record the exact speed and location of the ball and every player on the field, allowing the most digitized of sports to be overrun anew by hundreds of innovative statistics that will rate players more accurately, almost certainly affect their compensation and perhaps alter how the game itself is played...In San Francisco, four high-resolution cameras sit on light towers 162 feet up, capturing everything that happens on the field in three dimensions and wiring it to a control room below. Software tools determine which movements are the ball, which are fielders and runners, and which are passing seagulls. More than two million meaningful location points are recorded per game.


This is the future of baseball analysis. We can then use regression to determine just how valuable a ball hit along vector 12 at 95 mph is, further enhancing our ability to evaluate players. Some may see this as a system that takes the fun out of the game. "When my grandfather sat me down to talk about Bobby Thomson's home run, he wasn't talking about a ball hit at x mph in vector 17." However, passion and respect for the game and its history are not mutually exclusive with in-depth analysis of what happens on the field. In fact, one could argue that advanced analysis strengthens our love and understanding of baseball.
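
For readers who want to see what the regression idea might look like, here is a minimal sketch of a one-variable least-squares fit in Python. Everything in it is an assumption for illustration: the `fit_line` helper is hypothetical, the speeds and run values are made-up numbers (not real Hit f/x data), and a real model would also account for the vector, the ballpark, and much more.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up numbers for illustration only: balls hit along a single vector,
# with speed off the bat (mph) and an invented run value for each.
speeds = [75.0, 85.0, 95.0, 105.0]
values = [-0.10, 0.00, 0.10, 0.20]

slope, intercept = fit_line(speeds, values)

# Estimated value of a ball hit at 95 mph along this vector, under the toy model.
predicted_95 = slope * 95.0 + intercept
print(round(predicted_95, 3))
```

The point is only the shape of the calculation: once every batted ball has a measured speed and vector, its value can be estimated from those measurements instead of from whether it happened to fall in for a hit.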

The ultra-precise stats aren't here quite yet; however, they are right around the corner, and they will soon become part of everyday advanced analysis. That doesn't mean we have to stop talking about Ichiro getting 200 hits every season, or ignore milestone moments like 300 wins or 500 saves. It just means that when it comes to evaluating the performance of players, we'll have cutting-edge technology to help us, and what can be so bad about that?




Pat Andriola is a JD/MBA student at NYU. He likes the Mets a lot.


Comments

Nick Steiner said...

That was MGL who criticized Poz, BTW.

Nice post though, this has always crossed my mind whenever I hear people diss defensive stats.  IMO, the only reason that they correlate worse than offensive ones is that they come in a smaller sample size.

The one good thing we have with hitters is BABIP, which allows us to somewhat gauge how lucky/unlucky hitters are.  But yeah, Hit f/x is gonna help a lot.

Posted 08/25  at  02:22 AM
Pat Andriola said...

Thanks for the pointout, Nick. And yes, Hit/fx should certainly be awesome.

Posted 08/25  at  02:26 AM
Paul said...

I wonder what exactly the future holds. Perhaps a classification of hits that could be deemed luck-driven flukes? Or maybe a classification system for unlucky outs? Just trying to think of ways to disentangle true talent from observed talent.

Hit/fx also tracks outs right? Or would we need an Out/fx too?

Posted 08/25  at  01:15 PM
Pat Andriola said...

Paul, hit/fx would/should track all balls hit into play, regardless of whether they are outs or not.

Posted 08/25  at  02:26 PM