May 25, 2013

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

When is a fly ball a line drive?

by Colin Wyers
December 04, 2009

Let's talk about parallax.

Simply put, parallax is a phenomenon in which the apparent position of an object can change based upon the location of the observer. Let's consider a very modest change in location—moving from the driver's seat to the passenger's seat of a car. (Do not attempt this while the car is in motion, please.) Observe how the gas gauge looks in each position:

image

In this instance, the passenger would tend to believe that there was less gas in the car than the driver would.

Baseball fans will be most familiar with the effects of this phenomenon when it comes to calling balls and strikes off the television feed; most broadcasts use an offset camera angle that distorts our view of the strike zone. But what else could it affect?

Most of what we know about batted balls comes from stringers who score games. Typically those stringers are given a spot in the press box. Now, because of the way different stadiums are constructed, the view from the press box shifts around. This, for instance, is the view from the Wrigley Field press box:

image

Compare to Citi Field:

image

It's a bit more of a dramatic difference than simply moving over a seat in a car, isn't it?

So let's test a theory—that the placement of the observer has an effect on how that observer determines the trajectory of a batted ball. Let's focus on air balls—fly balls, line drives and pop-ups. Based upon what we know, we should expect that the higher the observer, the flatter a batted ball looks and the more likely it is to be scored a line drive.
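To make the geometry behind that expectation concrete, here is a minimal sketch in Python. It is a toy model, not anything from the data: it treats the ball as a single point at its apex and asks how far above the observer's horizontal line of sight that point appears. The function name and all of the distances are hypothetical, chosen only for illustration.

```python
import math

def apparent_elevation_deg(ball_height_ft, observer_height_ft, horizontal_distance_ft):
    """Angle of the ball above (positive) or below (negative) the
    observer's horizontal line of sight, in degrees."""
    rise = ball_height_ft - observer_height_ft
    return math.degrees(math.atan2(rise, horizontal_distance_ft))

# Hypothetical batted ball: apex 60 feet above the field,
# 250 feet (horizontally) from the press box.
for observer_height in (30, 50, 70):
    angle = apparent_elevation_deg(60, observer_height, 250)
    print(f"observer at {observer_height} ft: {angle:+.1f} degrees")
```

In this simplified picture, the same ball climbs well above eye level for the low observer and never rises above eye level for the high one, which is one way a higher seat could make a trajectory look flatter.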

Grading the parks

Figuring out press box heights is not a simple task. I did the best I could given the tools I had. But the heights I collected are at best estimates. This is especially true for stadiums where press boxes have multiple levels. And for some parks I gave up on trying to get a decent estimate at all. I collected data on 27 parks in use from 2005-2009. The entire list, including the estimated heights, is available here. Parks are coded with Retrosheet park codes. I excluded one park from consideration, Coors Field; its inflated line drive rate caused by the high elevation makes it unsuitable for this study.

Then I calculated line drives per total air balls (flies, liners and pop-ups) as per the batted ball data available from Retrosheet from 2005-2009. Those data are based on the observations of MLB Gameday stringers. To avoid having a league bias, I removed all at-bats from pitchers. And because a team often uses the same hitters over a period of years, I looked only at the visiting team batting. Yes, there may be some persistence of pitcher line drive rates across seasons, but it's a minor effect compared to the persistence of hitter line-drive rates.
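The filtering described above can be sketched roughly as follows. This is an illustration in Python, not the code used for the study; the record layout, the field names, the park labels and the one-letter ball type codes are all assumptions made for the example.

```python
from collections import Counter

# Assumed one-letter codes: F = fly ball, L = line drive, P = pop-up, G = grounder.
AIR_BALL_TYPES = {"F", "L", "P"}

def line_drive_rates(batted_balls):
    """Line drives per air ball for each park, counting only
    visiting-team batters who are not pitchers."""
    liners = Counter()
    air_balls = Counter()
    for bb in batted_balls:
        if bb["batter_is_pitcher"] or bb["batting_team_is_home"]:
            continue  # drop pitchers and the home lineup
        if bb["ball_type"] not in AIR_BALL_TYPES:
            continue  # ground balls are not part of the denominator
        air_balls[bb["park"]] += 1
        if bb["ball_type"] == "L":
            liners[bb["park"]] += 1
    return {park: liners[park] / total for park, total in air_balls.items()}

def record(park, ball_type, home=False, pitcher=False):
    """Build one hypothetical batted-ball record."""
    return {"park": park, "ball_type": ball_type,
            "batting_team_is_home": home, "batter_is_pitcher": pitcher}

# Made-up sample, for illustration only.
sample = [
    record("PARK_A", "L"), record("PARK_A", "F"), record("PARK_A", "P"),
    record("PARK_A", "G"),                # grounder: ignored
    record("PARK_A", "L", home=True),     # home batter: ignored
    record("PARK_A", "L", pitcher=True),  # pitcher batting: ignored
    record("PARK_B", "L"), record("PARK_B", "F"),
]
rates = line_drive_rates(sample)
print(rates)
```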

We do in fact see a slight correlation between press box height and line drive rate, about .16, after weighting for the number of years a park was in use during the sample:

image

(Each park gets its own data point, but the correlation—and the linear trend line—are based on the weighted data.)
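For readers unfamiliar with weighted correlation, here is a minimal Python sketch of one common definition, in which each park's contribution to the means, variances and covariance is scaled by its weight. The article does not spell out the exact weighting formula, so treat this as an assumption; the heights, rates and year counts below are invented for the example.

```python
import math

def weighted_correlation(xs, ys, weights):
    """Weighted Pearson correlation between xs and ys."""
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys)) / total
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs)) / total
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys)) / total
    return cov / math.sqrt(var_x * var_y)

# Hypothetical parks: press box height (feet), line drive rate,
# and years in use during the sample.
heights = [45.0, 50.0, 60.0, 65.0]
ld_rates = [0.32, 0.34, 0.33, 0.36]
years = [5, 5, 4, 1]

r = weighted_correlation(heights, ld_rates, years)
print(round(r, 3))
```

With whole-number weights, this gives the same answer as listing each park once per year of use and computing an ordinary correlation.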

If that's all there was to it, we could probably table the matter as perhaps real but not especially significant. But let's focus on the extreme parks a minute—those 40 feet or lower and 70 feet or higher:

image

The blue-coded points are parks that are either extremely low (the Oakland Coliseum and Shea Stadium) or extremely high (Fenway Park, Turner Field, PNC Park and Citizens Bank Park). They don't seem to follow the trend line at all; they actually seem pretty centered on the median. My hunch is that in those parks, scorers aren't relying on their view from the press box. Instead, they are looking at the TV feed. If we look at the trend with those parks excluded, the relationship becomes much stronger, with a correlation of .38:

image

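Running a regression analysis, we see that each additional foot of observer height is worth about .0016 in line drive rate, or roughly 0.16 percentage points. Across the 30-foot spread between a 40-foot and a 70-foot press box, that adds up to nearly five percentage points. That's a significant effect, for my money.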

The implications

So far we've looked at a theory about how batted balls are observed and provided some evidence to support the claim. What are the implications, if this theory is true?

The professional data providers, BIS and STATS, certainly should provide better data than what the Gameday stringers provide; they take more care with the data, provide cross-checking, etc. But they are still unable to provide a consistent point of observation in every ballpark. (STATS uses a primary scorer in the press box; BIS has no in-park scorers but relies on video feeds.)

The implication of this is that we could see an effect where fielders are over- or underrated by defensive metrics based upon that scoring data, even over a period of years, because of an error introduced by a persistent bias. What I can't tell you, at least not without a lot more study, is which players are affected or how large the potential effect is.

This isn't a repudiation of current defensive metrics, mind you. People get the impression that those metrics come from a cold, calculating computer, but every current means we have of measuring defensive impact is based on human observation. We don't have a perfect means of evaluating anything: hitting, defense or pitching. It doesn't mean we don't strive for perfection, though.



References and Resources

For those who are interested, the full regression is:

LD_RATE = 0.253415 + PB_HEIGHT * 0.00157926

The standard error of the coefficient is 0.000403918, with a p-value of 0.0002. This indicates that the results are, at least, statistically significant. All graphs and regressions were done with gretl.
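As a quick worked example of what those numbers imply, the short Python sketch below plugs two press box heights into the fitted line and divides the coefficient by its standard error. The coefficients are the ones reported above; the choice of 40 and 70 feet as inputs, and the variable names, are mine. (This is plain arithmetic for illustration, not a reproduction of the gretl run.)

```python
# Coefficients as reported in the regression above.
INTERCEPT = 0.253415
SLOPE = 0.00157926        # change in LD_RATE per foot of press box height
SLOPE_SE = 0.000403918    # standard error of the slope

def predicted_ld_rate(pb_height_ft):
    """Fitted line drive rate (per air ball) for a given press box height."""
    return INTERCEPT + SLOPE * pb_height_ft

low = predicted_ld_rate(40)    # about 0.317
high = predicted_ld_rate(70)   # about 0.364
t_stat = SLOPE / SLOPE_SE      # about 3.9

print(f"40 ft: {low:.4f}")
print(f"70 ft: {high:.4f}")
print(f"difference: {high - low:.4f}")
print(f"t-statistic: {t_stat:.2f}")
```

On this fit, the gap between a 40-foot and a 70-foot press box comes to a bit under five percentage points of line drive rate, and the slope is nearly four standard errors away from zero.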

This article would not have been possible without the help of Greg Rybarczyk of HitTracker. He spent a lot of time helping me figure out the park measurements necessary to calculate the press box heights, and even provided some himself. I owe him a great debt.

Special thanks also to Chris Dial, Larry Mahnken, Cory Schwartz and Ben Jedlovec for their assistance in researching this article.

Also essential to the project were Google Earth and Panoramio. Admittedly it feels a bit like swatting a fly with a hand grenade: millions in taxpayer dollars, an entire space program supporting a constellation of satellites, and I'm using it to figure out scorer bias in line drives.

Photo of Wrigley Field pressbox courtesy of pedalfreak and released under a Creative Commons license. Photo of Citi Field press box courtesy of kenyee and released under a Creative Commons license.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org."

For those curious—the pictures of the gas gauge come from my car, an '07 VW Rabbit.



Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.
