# The Hardball Times

## Evaluating defense using HITf/x

by Colin Wyers
July 09, 2009

This is a look at what's possible, not a serious attempt at a defensive evaluation metric. We'll get there someday (and hopefully by someday I mean "some day this month"), just not today.

Our own Harry Pavlidis has the best look I've seen so far at the sheer depth of the preview HITf/x data we've been given courtesy of Sportvision. It's the most data I've seen made available to the public about what happens to a batted ball after it leaves the bat. But how do we get from there to an evaluation of defense?

### What we knew about batted balls before HITf/x

The answer is: not very much. Typically, data providers put a batted ball into one of four buckets:

• Ground ball
• Line drive
• Fly ball
• Pop-up

This is simply not very descriptive for our purposes, as I've stated before. And if that's not bad enough, different data providers often don't agree on the difference between a fly ball and a line drive. For example, is a Texas Leaguer over the infield into shallow right a fly ball or a line drive? What if the outfielder is able to race up and snag it? As best we can tell, the former is more likely to be called a line drive and the latter a fly ball, even if they follow the exact same flight path.

### What we want to know about a batted ball

That's simple. To evaluate the play of an outfielder, we would ideally know the following about batted balls hit to the outfield:
• What direction the ball is hit.
• How far the ball is hit.
• How fast it gets there.

### Can we get there from what we have available to us via HITf/x?

Right now, the answer is: sorta. I took a look earlier in the week, and what we have right now is the angle (horizontal and vertical) and the speed off the bat of each batted ball. What we don't have is spin. How important is the spin? Here's an example of the path of a batted ball launched at 35 degrees with an initial velocity of 95 mph:

The blue line is the path the ball would take if there were no spin; the red line is the path the ball would take with 2000 rpm of backspin. With spin, the ball travels almost 50 additional feet and stays in the air about a second and a half longer. That's a significant difference.
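To make the role of backspin concrete, here is a minimal sketch of the kind of calculation behind a chart like this. It is not Professor Nathan's spreadsheet: the ball mass and radius, air density, a constant drag coefficient of 0.35, the piecewise lift model, the 0.9 m contact height, the absence of spin decay and the simple Euler integration are all assumptions chosen for illustration. It works in two dimensions only, so sidespin is ignored.

```python
import math
from dataclasses import dataclass

# All constants are illustrative assumptions (SI units), not values from the article.
MASS_KG = 0.145        # approximate baseball mass
RADIUS_M = 0.0366      # approximate baseball radius
AIR_DENSITY = 1.2      # kg/m^3, roughly sea level
DRAG_COEFF = 0.35      # held constant; the real drag coefficient varies with speed
GRAVITY = 9.81         # m/s^2
MPH_TO_MPS = 0.44704
M_TO_FT = 3.28084


@dataclass
class Flight:
    distance_ft: float   # horizontal distance where the ball comes back down to y = 0
    hang_time_s: float   # seconds from contact to landing


def lift_coefficient(spin_factor: float) -> float:
    """Assumed piecewise-linear lift model in terms of S = r * omega / v."""
    if spin_factor < 0.1:
        return 1.5 * spin_factor
    return 0.09 + 0.6 * spin_factor


def simulate(speed_mph: float, launch_deg: float, backspin_rpm: float,
             contact_height_m: float = 0.9, dt: float = 0.001) -> Flight:
    """Euler-integrate a 2-D flight (x toward the outfield, y up) with drag and backspin lift."""
    speed = speed_mph * MPH_TO_MPS
    vx = speed * math.cos(math.radians(launch_deg))
    vy = speed * math.sin(math.radians(launch_deg))
    omega = backspin_rpm * 2 * math.pi / 60   # rad/s; spin decay is ignored
    k = 0.5 * AIR_DENSITY * math.pi * RADIUS_M ** 2 / MASS_KG
    x, y, t = 0.0, contact_height_m, 0.0
    while True:
        v = math.hypot(vx, vy)
        cl = lift_coefficient(RADIUS_M * omega / v)
        # Drag points opposite the velocity; backspin lift points along (-vy, vx) / v.
        ax = -k * v * (DRAG_COEFF * vx + cl * vy)
        ay = -GRAVITY + k * v * (cl * vx - DRAG_COEFF * vy)
        vx += ax * dt
        vy += ay * dt
        new_x, new_y = x + vx * dt, y + vy * dt
        if new_y <= 0.0:
            # Interpolate within the last step to find where the ball reaches y = 0.
            frac = y / (y - new_y)
            return Flight((x + frac * (new_x - x)) * M_TO_FT, t + frac * dt)
        x, y, t = new_x, new_y, t + dt


no_spin = simulate(95, 35, 0)
backspin = simulate(95, 35, 2000)
print(f"no spin:  {no_spin.distance_ft:.0f} ft, {no_spin.hang_time_s:.2f} s")
print(f"backspin: {backspin.distance_ft:.0f} ft, {backspin.hang_time_s:.2f} s")
```

The exact numbers will not match the chart, since the drag and lift models here are assumptions, but the direction of the effect is the point: the same launch speed and angle produce a longer, higher flight once backspin is added.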

(This only takes into account backspin, the spin that lifts the ball along its flight path, and ignores any spin to the sides. Sidespin is also very important to the path of a batted ball: picture a long, deep drive that you just know would be a home run if it weren't slicing into the stands and ending up as a much less exciting foul ball.)

Can we estimate spin? Some helpful progress has been made here, almost entirely by people who aren't me. (I hope to learn more at this weekend's PITCHf/x Summit.) Until then, we're left with an imperfect picture of the flight of a batted ball.

### What we can tell from an imperfect picture

First, we'll look at the effect of flight time on DER, the share of balls in play that the defense turns into outs. (This differs from the chart earlier in the week in that 2000 rpm of backspin was included in the estimates.)

| Flight time (s) | DER |
|---|---|
| 0.0 | 0.839 |
| 0.5 | 0.674 |
| 1.0 | 0.527 |
| 1.5 | 0.545 |
| 2.0 | 0.466 |
| 2.5 | 0.196 |
| 3.0 | 0.109 |
| 3.5 | 0.412 |
| 4.0 | 0.640 |
| 4.5 | 0.718 |
| 5.0 | 0.881 |
| 5.5+ | 0.964 |
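A table like this comes from bucketing balls in play and computing the out rate per bucket. The sketch below assumes each ball is a hypothetical `(value, was_out)` pair and that each row is labeled by the lower edge of its bin, with everything at or above `cap` pooled into an open-ended top bin (the "5.5+" row).

```python
import math
from collections import defaultdict


def der_by_bin(balls, width, cap):
    """Out rate per bin of `width`, keyed by the bin's lower edge; values >= cap share one bin."""
    outs = defaultdict(int)
    total = defaultdict(int)
    for value, was_out in balls:
        edge = min(math.floor(value / width) * width, cap)
        total[edge] += 1
        outs[edge] += int(was_out)
    return {edge: outs[edge] / total[edge] for edge in sorted(total)}


# Toy records (hang time in seconds, converted to an out?), made up for illustration.
balls = [(0.3, True), (0.4, False), (1.2, True), (6.1, True), (5.7, False)]
print(der_by_bin(balls, 0.5, 5.5))  # -> {0.0: 0.5, 1.0: 1.0, 5.5: 0.5}
```

The same function covers the distance table that follows: pass distance in feet with `width=50` and `cap=400`, under the assumption that the 400-foot row is likewise open-ended.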

And from another point of view, we'll look at distance in feet travelled:

| Distance (ft) | DER |
|---|---|
| 0 | 0.851 |
| 50 | 0.761 |
| 100 | 0.704 |
| 150 | 0.640 |
| 200 | 0.467 |
| 250 | 0.609 |
| 300 | 0.695 |
| 350 | 0.601 |
| 400 | 0.353 |

Obviously there is substantial overlap between the two; the correlation between flight time and distance is very strong.
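That correlation is measured ball by ball rather than bin by bin. A minimal Pearson correlation sketch, applied to paired hang times and distances, looks like this; the sample values are made up for illustration and are not HITf/x data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a sequence has no variance")
    return cov / (sd_x * sd_y)


# Illustrative pairs only: hang time (s) and distance (ft) for five hypothetical balls.
hang_times = [1.0, 2.5, 3.8, 4.6, 5.2]
distances = [90, 210, 290, 340, 380]
print(round(pearson_r(hang_times, distances), 3))
```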

### What we still need

So how do we get from here to a defensive metric? The first thing we need is the direction the ball is hit laterally, which HITf/x helpfully provides. The next thing we need is to know who was on the field when each batted ball was struck. This can presumably be parsed from the freely available Gameday XML data.
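Combining those two pieces might look like the sketch below: the lateral angle picks an outfield zone, and the defensive alignment at the time of the pitch says who played that zone. The angle convention (0 degrees straight to center, negative toward left field), the fixed 15-degree zone boundaries and the `alignment` dictionary are all hypothetical simplifications, not anything HITf/x or Gameday defines.

```python
def responsible_fielder(spray_deg: float, alignment: dict, boundary_deg: float = 15.0) -> str:
    """Hypothetical zone split: 0 deg = dead center, negative = toward left field."""
    if spray_deg < -boundary_deg:
        position = "LF"
    elif spray_deg > boundary_deg:
        position = "RF"
    else:
        position = "CF"
    return alignment[position]


# Hypothetical alignment: position -> player identifier for this plate appearance.
alignment = {"LF": "player_a", "CF": "player_b", "RF": "player_c"}
print(responsible_fielder(-30.0, alignment))  # -> player_a
```

Fixed boundaries are the crudest possible choice; real fielders shade and overlap, which is one more reason the baseline needs a lot of data.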

Probably the biggest thing we are missing is simply more HITf/x data. That's necessary to establish a baseline to compare a fielder to: the more data we have, the more finely we can slice it while still getting precise estimates.
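One simple way to use such a baseline is to compare the outs a fielder actually recorded with the outs the baseline expects on the same balls. The sketch below borrows the league-wide distance table from earlier in this article as a stand-in baseline; a real version would also slice by direction and flight time, which is exactly where the extra data is needed. The function name and the open-ended 400-foot bin are assumptions.

```python
# Baseline DER by 50-ft distance bin, taken from the distance table earlier in this
# article; treating the 400-ft row as 400+ is an assumption.
BASELINE_DER = {0: 0.851, 50: 0.761, 100: 0.704, 150: 0.640, 200: 0.467,
                250: 0.609, 300: 0.695, 350: 0.601, 400: 0.353}


def outs_above_baseline(balls, baseline=BASELINE_DER, width=50, cap=400):
    """Actual outs minus the outs the baseline expects for the same balls in play."""
    expected = 0.0
    actual = 0
    for distance_ft, was_out in balls:
        edge = min(int(distance_ft // width) * width, cap)
        expected += baseline[edge]
        actual += int(was_out)
    return actual - expected


# Hypothetical balls handled by one fielder: (distance in feet, converted to an out?).
fielder_balls = [(320, True), (120, False), (410, True)]
print(outs_above_baseline(fielder_balls))  # about 0.248 outs above the baseline
```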

And of course, as mentioned above, our estimates of the flight path of a batted ball can improve. But we now are a lot closer to having that sort of information than we ever were before.

### Why is this a big deal?

One of the most vexing problems in sabermetrics is how to split responsibility for hits and runs between a pitcher and his defense. Our understanding has advanced only slightly, in fits and starts, since Voros McCracken opened up the whole can of worms to begin with.

Simply having better data won't solve this problem by itself, but it will give us a powerful new set of tools in at least finding the right questions to ask. I'm very, very excited about this, and I hope you are, too.

I am still learning an awful lot about this myself, and I plan to learn a lot more this weekend. Hopefully I'll be recovered enough by this time next week to pass on what I've learned. We may not have a defensive metric that uses HITf/x yet, but we're very close, and I'm confident we will soon.

### References and Resources
Probably the greatest resource for anyone looking to study baseball using physics is Professor Alan Nathan's website. Pay special attention to his course notes on the subject: you get PowerPoint slides, Excel spreadsheets and more.

The graph in the article of the flight path was generated using one of the spreadsheets available at that site.

Also invaluable is Robert Adair's book, The Physics of Baseball.

As an aside that only a few of you will care about: I believe I've figured out how to parse the required data about who was playing what position from the Gameday XML data provided. I am not, however, certain of this. As of this writing, the final query to put the data together has been running for a solid hour and likely will not be done for a while longer. If I am correct and the data checks out, I will be more than happy to share it with interested parties.
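For readers curious about the mechanics, extracting a position-to-player map from XML takes only a few lines with Python's standard library. The `<lineup>`/`<player id=... position=...>` layout below is a hypothetical placeholder, not the actual Gameday schema, and a real parser would also have to track in-game substitutions so the alignment is correct for each batted ball.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for illustration; the real Gameday files are structured differently.
SAMPLE = """<lineup>
  <player id="111" position="LF"/>
  <player id="222" position="CF"/>
  <player id="333" position="RF"/>
</lineup>"""


def positions_from_xml(xml_text: str) -> dict:
    """Map each position attribute to the id of the player listed there."""
    root = ET.fromstring(xml_text)
    return {p.get("position"): p.get("id") for p in root.iter("player")}


print(positions_from_xml(SAMPLE))
```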

Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.