February 9, 2010

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

Evaluating defense using HITf/x

by Colin Wyers
July 09, 2009

This is a look at what's possible, not a serious attempt at a defensive evaluation metric. We'll get there someday (and hopefully by someday I mean "some day this month"), just not today.

Our own Harry Pavlidis has the best look I've seen so far at the sheer depth of data available from the preview HITf/x data we've been given courtesy of Sportvision. It's the most data that I've seen made available to the public about what happens to a batted ball after it leaves the bat. But how do we get from there to an evaluation of defense?

What we knew about batted balls before HITf/x

The answer is, not very much. Typically data providers put a batted ball into one of four buckets:

  • Ground ball
  • Line drive
  • Fly ball
  • Pop-up

This is simply not very descriptive for our purposes, as I've stated before. And if that's not bad enough, different data providers often don't agree on the difference between a fly ball and a line drive. For example, is a Texas Leaguer over the infield into shallow right a fly ball or a line drive? What if the outfielder is able to race up and snag it? As best we can tell, the former is more likely to be called a line drive and the latter a fly ball, even if they follow the exact same flight path.

What we want to know about a batted ball

That's simple. To evaluate the play of an outfielder, we would preferably know the following about batted balls hit to the outfield:
  • What direction the ball is hit.
  • How far the ball is hit.
  • How fast it gets there.

Can we get there from what we have available to us via HITf/x?

Right now, the answer is: sorta. I took a look earlier in the week, and what we have right now is the angle (horizontal and vertical) as well as the speed off the bat of batted balls. What we don't have is spin. How important is the spin? Here's an example of the path of a batted ball, launched at 35 degrees with an initial velocity of 95 mph:

[Figure: flight path of a batted ball launched at 35 degrees and 95 mph, with and without backspin]

The blue line is the path the ball would take if there was no spin; the red line is the path the ball would take if there was 2000 rpm of backspin. With spin, the ball travels almost 50 additional feet, and stays in the air about a second and a half longer. That's a significant difference.

(This of course only takes into account the spin of the ball along the flightpath, ignoring any spin to the sides. Sidespin is of course very important to the path of a batted ball—picture a long, deep drive that you just know would be a home run, if it wasn't slicing into the stands and ending up as a much less exciting foul ball.)
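For readers who want to experiment with the backspin comparison, here is a minimal sketch of a two-dimensional trajectory with drag and backspin lift. It is not the spreadsheet used for the graph. The drag coefficient, the lift coefficient used as a stand-in for roughly 2000 rpm of backspin, the air density, and the 1 m contact height are all assumptions picked for illustration, so the exact distances it prints should not be read as matching the figures quoted above.

```python
import math

# Every constant here is an illustrative assumption, not a fitted value.
MASS = 0.145                    # kg, baseball mass
RADIUS = 0.0366                 # m, baseball radius
AREA = math.pi * RADIUS ** 2    # m^2, cross-sectional area
AIR_DENSITY = 1.2               # kg/m^3, roughly sea level
DRAG_COEFF = 0.35               # assumed constant drag coefficient
G = 9.81                        # m/s^2
MPH_TO_MS = 0.44704
M_TO_FT = 3.28084


def simulate(speed_mph, launch_deg, lift_coeff, dt=0.001):
    """Return (distance_ft, hang_time_s) for a ball hit from 1 m above the ground.

    lift_coeff = 0.0 models no spin; a value near 0.2 is an assumed
    stand-in for about 2000 rpm of backspin.
    """
    speed = speed_mph * MPH_TO_MS
    angle = math.radians(launch_deg)
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    x, y, t = 0.0, 1.0, 0.0
    k = 0.5 * AIR_DENSITY * AREA / MASS
    while y > 0.0:
        v = math.hypot(vx, vy)
        # Drag points against the velocity; backspin lift is perpendicular
        # to the velocity, rotated 90 degrees counterclockwise from it.
        ax = -k * v * (DRAG_COEFF * vx + lift_coeff * vy)
        ay = -G - k * v * (DRAG_COEFF * vy - lift_coeff * vx)
        vx += ax * dt
        vy += ay * dt
        x += vx * dt
        y += vy * dt
        t += dt
    return x * M_TO_FT, t


if __name__ == "__main__":
    for label, cl in (("no spin", 0.0), ("backspin", 0.2)):
        dist, hang = simulate(95, 35, cl)
        print(f"{label}: {dist:.0f} ft, {hang:.2f} s")
```

Sidespin is left out entirely, which is the same simplification noted in the parenthetical above.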

Can we estimate spin? There has been some helpful progress made in this regard, almost entirely by people who aren't me. (I hope to learn more in this regard at this weekend's PITCHf/x Summit.) Until then, we're left with an imperfect picture of the flight of a batted ball.

What we can tell from an imperfect picture

First, we'll look at the effect of flight time on DER. (This differs from the chart earlier in the week in that 2000 rpm of backspin were included in the estimates.)

Time (s)    DER
0.0         0.839
0.5         0.674
1.0         0.527
1.5         0.545
2.0         0.466
2.5         0.196
3.0         0.109
3.5         0.412
4.0         0.640
4.5         0.718
5.0         0.881
5.5+        0.964
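A table like this can be built by grouping balls in play into half-second buckets and dividing outs by total balls in each bucket. The sketch below assumes each bucket is labeled by its lower edge and that everything at 5.5 seconds or more shares one bucket; the function names and the sample data are hypothetical.

```python
import math
from collections import defaultdict


def bucket_label(hang_time, width=0.5, cap=5.5):
    """Label a hang time by the lower edge of its bucket (assumed convention)."""
    if hang_time >= cap:
        return f"{cap:.1f}+"
    return f"{math.floor(hang_time / width) * width:.1f}"


def der_by_bucket(balls_in_play):
    """balls_in_play: iterable of (hang_time_seconds, was_out) pairs.

    Returns a dict mapping bucket label to outs / balls in play.
    """
    outs = defaultdict(int)
    total = defaultdict(int)
    for hang_time, was_out in balls_in_play:
        label = bucket_label(hang_time)
        total[label] += 1
        outs[label] += int(was_out)
    return {label: outs[label] / total[label] for label in total}


if __name__ == "__main__":
    # Made-up balls in play, purely to show the shape of the input.
    sample = [(0.2, True), (0.4, False), (2.7, False),
              (2.9, False), (2.6, True), (6.1, True)]
    for label, der in sorted(der_by_bucket(sample).items()):
        print(label, round(der, 3))
```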

And from another point of view, we'll look at distance in feet travelled:

Distance (ft)    DER
0                0.851
50               0.761
100              0.704
150              0.640
200              0.467
250              0.609
300              0.695
350              0.601
400              0.353

Obviously there is substantial overlap between the two; the correlation between time and distance is very robust.
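To put a number on that overlap, one option is a Pearson correlation between estimated flight time and estimated distance. The sketch below is a plain implementation of the standard formula; the sample values are invented and are not taken from the HITf/x data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical flight times (s) and distances (ft) for a few batted balls.
    times = [0.8, 1.9, 3.1, 4.2, 5.0]
    distances = [60, 150, 240, 310, 355]
    print(round(pearson_r(times, distances), 3))
```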

What we still need

So how do we get from here to a defensive metric? The first thing we need is the direction the ball is hit laterally, which HITf/x helpfully provides. The next thing we need is an idea of who was on the field when each batted ball is struck. This can presumably be parsed from the Gameday XML data that is freely available.

Probably the biggest thing we are missing is simply more HITf/x data. We need it to establish a baseline to compare a fielder against: the more data we have, the more finely we can slice it and the more precise that baseline becomes.
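As a rough idea of what comparing a fielder to a baseline could look like, the sketch below sums the baseline out rate for each ball a fielder saw and subtracts that from the outs he actually recorded. This is one common shape for such a comparison, not a finished metric. Using the overall flight-time DER values from the table above as the baseline is an assumption made only for illustration (a real baseline would also account for direction and position), and the chances listed are hypothetical.

```python
def plays_above_baseline(chances, baseline_der):
    """Outs made minus outs expected from the baseline.

    chances: list of (bucket_label, was_out) pairs for one fielder.
    baseline_der: dict mapping bucket_label to the baseline out rate.
    """
    expected = sum(baseline_der[bucket] for bucket, _ in chances)
    actual = sum(1 for _, was_out in chances if was_out)
    return actual - expected


if __name__ == "__main__":
    # Baseline rates borrowed from the flight-time table, for illustration only.
    baseline = {"2.5": 0.196, "4.0": 0.640, "5.0": 0.881}
    # Hypothetical chances for a single outfielder.
    chances = [("2.5", True), ("4.0", True), ("5.0", False)]
    print(round(plays_above_baseline(chances, baseline), 3))
```

With only a handful of balls per bucket the baseline rates themselves are noisy, which is why more data matters so much here.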

And of course, as mentioned above, our estimates of the flight path of a batted ball can improve. But we now are a lot closer to having that sort of information than we ever were before.

Why is this a big deal?

One of the most vexing problems in sabermetrics is how to split responsibility for hits and runs between a pitcher and his defense. Our understanding has advanced only slightly, in fits and starts, since McCracken opened up the whole can of worms to begin with.

Simply having better data won't solve this problem by itself, but it will give us a powerful new set of tools in at least finding the right questions to ask. I'm very, very excited about this, and I hope you are, too.

I am still learning an awful lot about this myself, and I plan to learn a lot more this weekend. Hopefully I'll be recovered enough by this time next week to pass on what I've learned. We may not have a defensive metric that uses HITf/x yet, but we're very close, and I'm confident we will soon.



References and Resources
Probably the greatest resource for anyone looking to study baseball using physics is Professor Alan Nathan's website. Pay special attention to his course notes on the subject - you get PowerPoint slides, Excel spreadsheets and more.

The graph in the article of the flight path was generated using one of the spreadsheets available at that site.

Also invaluable is Robert Adair's book, The Physics of Baseball.

As an aside that only a few of you will care about - I do believe I've figured out how to parse the required data about who was playing what position from the Gameday XML data provided. I am not, however, certain of this. As of this writing, the final query to put the data all together has been running for a solid hour and likely will not be done for a while longer. If I am correct and the data checks out I will be more than happy to share it with interested parties.

Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.


Peter Jensen said...

Colin - I am glad you are going to make it to the Summit this year.  I look forward to meeting you.  Last year I worked with the Sportvision raw footage for 2 games in an attempt to manually calculate the Hit f/x parameters and also to try and estimate spin and hit ball landing location from those parameters.  I was successful in showing that Hit f/x parameters could be calculated from the existing footage, a result that I presented at last year’s Summit.  I was unsuccessful at the latter task, my conclusion being that there was too much variation in the spin characteristics and not enough input information to be able to make an accurate estimate.  Dr. Nathan and Sportvision have come to the same conclusion.  But everyone is still trying so there may be some further news this weekend.

You are on the right track with your effort to establish the limits that different spin rates might reasonably put on a hit ball.  However, last year when I calculated spin rates on hit balls by matching their input parameters and the best guess of their landing location using an average of MLB Gameday, STATS, my observations from video, and Greg Rybarczyk’s observations from video, I found spin rates that were over 3000 RPM, which extends the possible landing points of your graph considerably.  I concluded that until we get full path tracking of hit balls from Sportvision’s proposed future Fielding f/x, we would not be able to pinpoint a landing location from the Hit f/x parameters that is any more accurate than that given by Gameday’s hit locations.

That does not mean that the Hit f/x parameters cannot be used to improve existing fielding metrics.  My talk for the Summit will be specifically on that subject and I will publish a follow up Skill Based Fielding Metric to my Skill Based Batting and Pitching Metrics that will include the improvements that I will discuss on Saturday.

Posted 07/09  at  10:15 AM
jedlovec3 said...

“What if the outfielder is able to race up and snag it? As best we can tell, the former is more likely to be called a line drive and the latter a fly ball, even if they follow the exact same flight path.”

Colin, I think you’re correct in assuming that this is a POTENTIAL source of bias, and you’re absolutely correct that type (liner/fliner/fly) and velocity (hard/medium/soft) are far from perfect descriptions of flight path.

But I think you’re a little TOO presumptive.  Scorers at BIS, for example, are carefully trained to minimize (or eliminate?) this bias.  That doesn’t mean it doesn’t exist, but I wonder if it’s a big problem like you make it sound.  Perhaps most important is internal consistency.  If one BIS scorer would call it a fliner and another would call it a fly, that’s a problem.  BIS, for one, works very hard to ensure internal consistency.

I’m looking forward to meeting everyone in San Francisco this weekend.

Posted 07/09  at  11:57 AM
DaninPhilly said...

Just curious, why can’t you just use a metric of where the ball lands and how long it takes to get there?  Wouldn’t that take all the guesswork out of the whole classification of fly ball, liners, etc?

It seems to me that with the advanced tracking systems we have we can plot every point on a given ballpark, and also calculate pretty accurately how long a ball takes to land on that area, and from that we should be able to determine how often such a ball becomes a single, double, out, etc.  From there it can easily be determined the average value of a ball hit with such a placing/timing ratio and therefore the expected value to the hitter/pitcher/fielder depending on the actual result.

Why worry about spin?  Why not just break it down to the basics and go from there?

Posted 07/09  at  04:37 PM
Alan Nathan said...

Actually, I’ll have quite a bit to say about spin at the summit on Saturday.  By combining hitf/x and hittracker data, I can back out the two spin components (backspin and sidespin).  I am finding backspin values for home runs looking more or less like a normal distribution, centered at about 2000 rpm and with an rms of about 600 rpm.  I was not successful in finding a simple relationship between the backspin and the initial velocity magnitude or direction.  The backspin more or less increases with launch angle, but there is a lot of scatter about the general trend.  I found some very interesting results for the sidespin.  The sidespin distribution is very dependent on the spray angle, as you might expect and there is much less scatter about the general trend than I find for the backspin.  More about all this on Saturday.

Posted 07/09  at  07:55 PM
Alan Nathan said...

I’ll have quite a bit to say about spin at the summit on Saturday.

Posted 07/09  at  07:57 PM
Alan Nathan said...

Re:  DaninPhilly—From my perspective, the issue is how well we can predict hang time and landing point from hitf/x data alone.  As Colin pointed out, to do that requires some knowledge of the spin of the batted ball.  Of course, if we have the full trajectory or even just the landing point and hang time, the issue of spin is a moot one from the point of view of baseball analysis (although still an interesting issue from the point of view of baseball physics).

Posted 07/09  at  08:01 PM
Peter Jensen said...

DaninPhilly - No one is currently mapping where a ball lands. MGL has a project to determine hang time but the information will not be publicly available. Eventually we will have access to both pieces of information, perhaps in the next year or two, and, as you suggest, that will be all we need to know.  What people are talking about here is how to do the best we can with what information we have now.

Posted 07/09  at  08:11 PM
Colin Wyers said...

jedlovec3 - I can’t speak to the BIS data specifically. When it comes to hit location data, all I’ve ever had direct access to is the Retrosheet and Gameday data. That said, I may have mis/overstated my claims there, even as regards that data.

To all that will be at the summit, look forward to meeting you all.

Posted 07/10  at  03:48 AM
DaninPhilly said...

Peter, for some reason, I remember reading an article lately which I thought had said that obstacle had been overcome and some company was actually tracking that information.  If I come across it, I’ll post it here.

Posted 07/10  at  12:23 PM
DaninPhilly said...

I guess this is the article I thought about.  It looks to be a year or two away from the common people.
http://www.nytimes.com/2009/07/10/sports/baseball/10cameras.html?_r=2&hp;

Posted 07/10  at  02:31 PM
Scot Gould said...

I have suspected that the Rays (and possibly some other teams) have used vectorizing of the flight of the ball, which the game tracking information provides, and not hit charts to realign their outfield defenses. By measuring both the eventual landing site of a hit and time of flight, from multiple years of data at the Trop, they have calculated the position for the outfielders which maximizes the probability of catching any flyball weighted on how much value the hit would have should it drop in. This has allowed them to bring their outfielders in to catch the more frequently occurring, quickly falling single and thus spend less time defending against the less frequently occurring long fly balls. [It also helps that air resistance is quite significant at the sea-level sitting Trop.] While this is fine-and-dandy, the down side is what happened earlier in the year where some very hard line drives were hit over Upton’s head during key moments near the end of the game. Of course the team was crucified in the papers because “that’s not the way we do things in baseball.” Kind of like not having a closer or why NFL coaches are too conservative on 4th down.

Based on watching a large number of Rays games, nearly every game there is at least one very long fly ball.  It looks like it is significantly over Upton’s head and will fall behind him, but the time of flight is long enough that he tracks it down. It helps to have a fast outfielder. :)

I’m pleased to see baseball start using physics solutions to an applied physics problem.

Posted 07/13  at  06:37 PM