Park effects and batted ball types
by Harry Pavlidis
September 01, 2009
One of the best things about Gameday is the ability to download the data and do your own research and analysis. The possibilities are almost limitless, and Hardball Times readers will be familiar with the many uses, including pitching, hitting and fielding analysis. One of my favorite uses is to create a fielding-independent measure of pitch quality.
By now, you've probably seen Graham MacAree's tRA at Fangraphs or StatCorner. It's based on batted ball types, from ground ball to fly ball. It's a great tool to use, among many other quality measures of pitching, including xFIP.
I like to use something similar to tRA, which I've been calling rv100E. That refers to "expected" run value per 100 pitches thrown, as in runs allowed, or saved, above average. Linear weights are used, based on hit type (single to home run) or out, as well as balls and strikes. The run expectancies are count adjusted, allowing for measurement of a particular pitch, or even a particular location—or both.
Wait a second. I said rv100E is based on batted ball type, but the linear weights are based on hit type. Hit types are probabilistic, and are distributed based on batted ball type.
A table will explain it better. Each batted ball type has its own range of outcomes, each with its own frequency. Outs are counted, too: a play can produce zero, one or more of them, even when the batter reaches safely. Here are the 2009 batted ball type to event type rates, along with the linear weight (LW) associated with each hit type. Values are for 2009 only.
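The calculation described here is a probability-weighted sum: for a given batted ball type, multiply the rate of each outcome by that outcome's linear weight and add them up. Below is a minimal sketch of that idea. Every number in it is a made-up placeholder, not a 2009 value, and the names `LINEAR_WEIGHTS`, `OUTCOME_RATES` and `expected_run_value` are hypothetical. It also simplifies by treating each batted ball as exactly one outcome, so it ignores extra outs recorded on the same play.

```python
# Placeholder linear weights (runs per event). Illustrative values only,
# NOT the 2009 weights referred to in the article.
LINEAR_WEIGHTS = {
    "out": -0.28,
    "single": 0.47,
    "double": 0.77,
    "triple": 1.05,
    "home_run": 1.40,
}

# Placeholder outcome rates per batted ball type. Each row sums to 1.
# Again, these are invented for illustration, not measured rates.
OUTCOME_RATES = {
    "ground_ball": {"out": 0.76, "single": 0.22, "double": 0.02,
                    "triple": 0.00, "home_run": 0.00},
    "line_drive": {"out": 0.28, "single": 0.50, "double": 0.18,
                   "triple": 0.02, "home_run": 0.02},
    "fly_ball": {"out": 0.74, "single": 0.06, "double": 0.07,
                 "triple": 0.01, "home_run": 0.12},
}


def expected_run_value(batted_ball_type, outcome_rates=OUTCOME_RATES,
                       weights=LINEAR_WEIGHTS):
    """Return the probability-weighted run value of one batted ball type."""
    rates = outcome_rates[batted_ball_type]
    # Guard against rows that do not form a probability distribution.
    if abs(sum(rates.values()) - 1.0) > 1e-9:
        raise ValueError(f"rates for {batted_ball_type!r} do not sum to 1")
    return sum(rate * weights[outcome] for outcome, rate in rates.items())


if __name__ == "__main__":
    for bb_type in OUTCOME_RATES:
        print(bb_type, round(expected_run_value(bb_type), 4))
```

Swapping in the real rates and weights would give the park-neutral value assigned to each batted ball type. The count-adjusted ball and strike values would be handled separately.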
This is a nice start, but the data provided by Gameday, as far as batted ball type, are entered by the stringers in the press box—free-lancers hired for this purpose. And not all of them classify hits the same way.
Dealing with park effects in layers
Consider this problem. A line drive in Petco Park may be worth more than one in Wrigley Field. Or, more likely, it may have a different range of outcomes. I can envision more line drive home runs in Wrigley, but fewer triples. At the same time, I'd expect a different range of outcomes with fly balls. More outs and doubles in Petco, more home runs in Wrigley. In that event, I need to apply different weights based on the park.
Or do I? Do I really care if a pitcher gave up a line drive in Citi Field rather than Safeco? Either that pitch was hit hard, or it wasn't, and I want a park neutral value assigned. So, super duper, I don't need to worry about park effects. I treat all fly balls as equals, assign the league average home run rate (indirectly shown above) and move on.
Or do I? Don't forget about the stringers. There's a "park effect" I care about. Let me show you what I mean. Here are the ratios for fly balls to line drives, by park, in 2009.
So, do the Mariners hit a lot of line drives, or do the stringers like to tag hits as line drives? Or should we blame their pitchers?
One way to tease out the team itself from the stringers is to apply the park correction methodology and find the "park effect" on batted ball classification.
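The article does not spell out the park correction formula, so the following is only a sketch under one common assumption: compare how often a label is used in a team's home games with how often it is used in that same team's road games, counting batted balls by both teams in each case. Because the same hitters and pitchers appear in both samples, what remains is mostly the park and its stringer. The function names and sample counts are hypothetical.

```python
from collections import Counter


def classification_share(labels, label):
    """Share of batted balls in `labels` that were tagged as `label`."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no batted balls supplied")
    return counts[label] / total


def classification_park_factor(home_game_labels, road_game_labels, label):
    """Ratio of a label's share in home games to its share in road games.

    Both inputs should hold the labels for ALL batted balls (both teams)
    in the team's home games and road games respectively. A result below 1
    suggests the home stringer uses the label less often than the mix of
    stringers the team sees on the road.
    """
    home_share = classification_share(home_game_labels, label)
    road_share = classification_share(road_game_labels, label)
    if road_share == 0:
        raise ValueError(f"label {label!r} never appears in road games")
    return home_share / road_share


if __name__ == "__main__":
    # Invented counts for illustration only.
    home = ["line_drive"] * 24 + ["fly_ball"] * 36 + ["ground_ball"] * 40
    road = ["line_drive"] * 20 + ["fly_ball"] * 35 + ["ground_ball"] * 45
    print(round(classification_park_factor(home, road, "line_drive"), 3))
```

This simple ratio leaves out refinements such as regressing small samples toward one or adjusting for the fact that road games are spread unevenly across other parks.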
In reality, it isn't just line drives and flies we have to worry about. On a base hit, when does a liner become a grounder? How likely is a home run to be a line drive or a fly ball?
This table keeps line drives and fly balls separate from their home run counterparts, while the above did not.
Dizzy yet? I am. If a number above is less than one, it appears the stringer in that park is less likely than average to classify a batted ball that way. Home runs come with a caveat: there's a real park effect mixed in.
Now that I've crunched some numbers, there are a few things left to do. First, expose this to public scrutiny to flush out issues with my methodology. At the same time, crowdsource the application of this information. Based on a given park, and a stringer's classification, how would you distribute the linear weights for hits and outs? In other words, how should rv100E work?
References and Resources
Gameday data from MLBAM
Linear weights calculated using Tom Tango's tool
All math errors and other brain cramps by the author, but he'll blame the editor
Harry Pavlidis admits he has a baseball problem. He is the founder of Pitch Info LLC. His pitch classifications power the player cards at Brooksbaseball.net. Feedback, questions and comments are appreciated - Email firstname.lastname@example.org and Twitter @harrypav