When is a fly ball a line drive?
by Colin Wyers
December 04, 2009
Let's talk about parallax.
Simply put, parallax is a phenomenon in which the apparent position of an object can change based upon the location of the observer. Let's consider a very modest change in location—moving from the driver's seat to the passenger's seat of a car. (Do not attempt this while the car is in motion, please.) Observe how the gas gauge looks in each position:
In this instance, the passenger would tend to believe that there was less gas in the car than the driver would.
Baseball fans will be most familiar with the effects of this phenomenon when it comes to calling balls and strikes off the television feed; most broadcasts use an offset camera angle that distorts our view of the strike zone. But what else could it affect?
Most of what we know about batted balls comes from stringers who score games. Typically those stringers are given a spot in the press box. Now, because of the way different stadiums are constructed, the view from the press box shifts around. This, for instance, is the view from the Wrigley Field press box:
Compare to Citi Field:
It's a bit more of a dramatic difference than simply moving over a seat in a car, isn't it?
So let's test a theory—that the placement of the observer has an effect on how that observer determines the trajectory of a batted ball. Let's focus on air balls—fly balls, line drives and pop-ups. Based upon what we know, we should expect that the higher the observer, the flatter a batted ball looks and the more likely it is to be scored a line drive.
Grading the parks
Figuring out press box heights is not a simple task. I did the best I could given the tools I had. But the heights I collected are at best estimates. This is especially true for stadiums where press boxes have multiple levels. And for some parks I gave up on trying to get a decent estimate at all. I collected data on 27 parks in use from 2005-2009. The entire list, including the estimated heights, is available here. Parks are coded with Retrosheet park codes. I excluded one park from consideration, Coors Field; its inflated line drive rate caused by the high elevation makes it unsuitable for this study.
Then I calculated line drives per total air balls (flies, liners and pop-ups) as per the batted ball data available from Retrosheet from 2005-2009. Those data are based on the observations of MLB Gameday stringers. To avoid having a league bias, I removed all at-bats from pitchers. And because a team often uses the same hitters over a period of years, I looked only at the visiting team batting. Yes, there may be some persistence of pitcher line drive rates across seasons, but it's a minor effect compared to the persistence of hitter line-drive rates.
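For readers who want to try a calculation like this themselves, here is a minimal Python sketch of the filtering and the rate described above. The record layout (the keys `park`, `batter_is_pitcher`, `batter_is_home` and `trajectory`, and the codes `F`, `L`, `P` and `G`) is assumed for illustration and is not the actual Retrosheet event-file format. The park names and records are made up.

```python
from collections import defaultdict

# Assumed trajectory codes: fly ball, line drive, pop-up.
AIR_BALLS = {"F", "L", "P"}

def line_drive_rates(events):
    """Return {park: line drives / air balls} for visiting non-pitcher batters."""
    liners = defaultdict(int)
    air = defaultdict(int)
    for ev in events:
        if ev["batter_is_pitcher"] or ev["batter_is_home"]:
            continue  # drop pitchers batting and home-team hitters
        if ev["trajectory"] not in AIR_BALLS:
            continue  # ground balls are not air balls
        air[ev["park"]] += 1
        if ev["trajectory"] == "L":
            liners[ev["park"]] += 1
    return {park: liners[park] / air[park] for park in air}

def record(park, trajectory, pitcher=False, home=False):
    """Build one hypothetical batted-ball record."""
    return {"park": park, "trajectory": trajectory,
            "batter_is_pitcher": pitcher, "batter_is_home": home}

# Made-up records, for illustration only.
sample = [
    record("PARK_A", "L"), record("PARK_A", "F"),
    record("PARK_A", "P"), record("PARK_A", "L"),
    record("PARK_A", "L", home=True),     # excluded: home hitter
    record("PARK_A", "F", pitcher=True),  # excluded: pitcher batting
    record("PARK_A", "G"),                # excluded: ground ball
    record("PARK_B", "F"), record("PARK_B", "F"),
    record("PARK_B", "L"), record("PARK_B", "P"),
]

rates = line_drive_rates(sample)
print(rates)
```

The denominator counts only air balls, so a park's ground ball tendencies do not move its rate.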
We do in fact see a slight correlation between press box height and line drive rate, about .16, after weighting for the number of years a park was in use during the sample:
(Each park gets its own data point, but the correlation—and the linear trend line—are based on the weighted data.)
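A weighted correlation of this kind can be sketched in a few lines of Python. This uses one common definition of a weighted Pearson correlation, in which weighting a park by its number of seasons is the same as counting that park once per season. The function name `weighted_correlation` and the sample numbers are assumptions for illustration, not the actual park data or necessarily the exact procedure used for the chart.

```python
import math

def weighted_correlation(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov / math.sqrt(var_x * var_y)

# Made-up values: press box heights in feet, line drive rates, seasons in sample.
heights = [45, 50, 55, 60, 65]
ld_rates = [0.31, 0.34, 0.32, 0.36, 0.35]
seasons = [5, 5, 4, 1, 5]

print(weighted_correlation(heights, ld_rates, seasons))
```

A park with one season in the sample pulls on the result one-fifth as hard as a park with five.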
If that's all there was to it, we could probably table the matter as perhaps real but not especially significant. But let's focus on the extreme parks a minute—those 40 feet or lower and 70 feet or higher:
The blue-coded points are parks that are either extremely low (the Oakland Coliseum and Shea Stadium) or extremely high (Fenway Park, Turner Field, PNC Park and Citizens Bank Park). They don't seem to follow the trend line at all; they actually seem pretty centered on the median. My hunch is that in those parks, scorers aren't relying on their view from the press box. Instead, they are looking at the TV feed. If we look at the trend with those parks excluded, the relationship becomes much stronger, with a correlation of .38:
Running a regression analysis, we see that a change in observer height of one foot is worth about .0016 points of line drive rate, or roughly 0.16 percentage points. That's a significant effect, for my money.
So far we've looked at a theory about how batted balls are observed and provided some evidence to support the claim. What are the implications, if this theory is true?
The professional data providers, BIS and STATS, certainly should provide better data than what the Gameday stringers provide; they take more care with the data, provide cross-checking, etc. But they are still unable to provide a consistent point of observation in every ballpark. (STATS uses a primary scorer in the press box; BIS has no in-park scorers but relies on video feeds.)
The implication of this is that we could see an effect where fielders are over- or underrated by defensive metrics based upon that scoring data, even over a period of years, because of an error introduced by a persistent bias. What I can't tell you, at least not without a lot more study, is which players are affected, by how much, or even the overall magnitude of the potential effect.
This isn't a repudiation of current defensive metrics, mind you. People get the impression that they are based on a cold, calculating computer, but all current means we have of measuring defensive impact are based on human observation. We don't have a perfect means of evaluating anything: hitting, defense or pitching. It doesn't mean we don't strive for perfection, though.
References and Resources
For those who are interested, the full regression is:
LD_RATE = 0.253415 + PB_HEIGHT * 0.00157926
The standard error of the coefficient is 0.000403918, with a p-value of 0.0002. This indicates that the results are, at least, statistically significant. All graphs and regressions were done with gretl.
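To make those numbers concrete, here is a small Python sketch that plugs the reported intercept, slope and standard error into the equation above. The two press box heights (50 and 60 feet) are hypothetical, chosen only to show the size of the effect, and the function name `predicted_ld_rate` is made up for this example.

```python
# Values reported in the regression above.
INTERCEPT = 0.253415
SLOPE = 0.00157926           # line drive rate per foot of press box height
SLOPE_STD_ERR = 0.000403918

def predicted_ld_rate(press_box_height_ft):
    """Fitted line drive rate (per air ball) at a given press box height."""
    return INTERCEPT + SLOPE * press_box_height_ft

# Hypothetical heights, for illustration only.
low, high = 50, 60
gap = predicted_ld_rate(high) - predicted_ld_rate(low)

# The t-statistic is the coefficient divided by its standard error.
t_stat = SLOPE / SLOPE_STD_ERR

print(f"Rate at {low} ft: {predicted_ld_rate(low):.4f}")
print(f"Rate at {high} ft: {predicted_ld_rate(high):.4f}")
print(f"Gap over {high - low} ft: {gap:.4f}")
print(f"t-statistic: {t_stat:.2f}")
```

Ten feet of extra height works out to a bit under 1.6 percentage points of line drive rate, and the slope is about 3.9 standard errors away from zero.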
This article would not have been possible without the help of Greg Rybarczyk of HitTracker. He spent a lot of time helping me figure out the park measurements necessary to calculate the press box heights, and even provided some himself. I owe him a great debt.
Special thanks also to Chris Dial, Larry Mahnken, Cory Schwartz and Ben Jedlovec for their assistance in researching this article.
Also essential to the project were Google Earth and Panoramio. Admittedly it feels a bit like swatting a fly with a hand grenade: millions in taxpayer dollars, an entire space program supporting a constellation of satellites, and I'm using it to figure out scorer bias in line drives.
Some other helpful resources:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org."
For those curious—the pictures of the gas gauge come from my car, an '07 VW Rabbit.
Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.