Tuesday, June 29, 2004
Data Erratum Redux
Posted by Dave Studeman
In yesterday's article, Data Erratum Et Cetera, I noted the difference between the two leagues in BABIP and LD%, and wondered what might have caused it.
A number of readers and commentators mentioned that I overlooked the obvious -- pitchers don't bat in the AL. Doh! That is obvious. So I went back and ran my analysis a little differently.
This time, I only included batters with at least 40 plate appearances in either league (which I probably should have done in the first place). That excludes almost all pitchers at this time, but still represents 93% of all plate appearances in the NL, 96% in the AL.
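The cutoff described above can be sketched in a few lines of Python. This is only an illustration: the player rows are made-up placeholders rather than real 2004 data, and the `batters` layout and the `pa_coverage` helper are hypothetical names, not the method actually used for the article.

```python
from collections import defaultdict

MIN_PA = 40  # the plate-appearance cutoff mentioned in the article

# Hypothetical rows: (player, league, plate_appearances).
# These numbers are invented purely to exercise the code.
batters = [
    ("Regular A", "NL", 310),
    ("Regular B", "NL", 280),
    ("Pitcher C", "NL", 35),
    ("Regular D", "AL", 300),
    ("Bench E", "AL", 20),
]

def pa_coverage(rows, min_pa=MIN_PA):
    """Return {league: share of plate appearances kept after the cutoff}."""
    total = defaultdict(int)
    kept = defaultdict(int)
    for _name, league, pa in rows:
        total[league] += pa
        if pa >= min_pa:  # "at least 40", so the cutoff is inclusive
            kept[league] += pa
    return {lg: kept[lg] / total[lg] for lg in total}

coverage = pa_coverage(batters)
for league, share in sorted(coverage.items()):
    print(f"{league}: {share:.0%} of plate appearances kept")
```

Run against real batting lines, the same calculation would show how much of each league's playing time survives the filter, which is the 93% and 96% check quoted above.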
Now, there is only a 10-point difference between the two leagues in the gap between BABIP and LD%.
NL: LD% .183, BABIP .292, Diff .110
AL: LD% .176, BABIP .297, Diff .120

A couple of points:
- Taking out batters with fewer than 40 PAs has very little impact on LD% (one point down in the AL, one point up in the NL). That's a bit surprising, and probably important in some way.
- It has no impact on BABIP in the AL, but brings down BABIP ten points in the NL. That's the pitcher effect.
The remaining 10-point difference could easily result from a slight difference in fielders or ballparks, as well as sample size issues or sheer luck.
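For anyone who wants to retrace the arithmetic, here is a small Python sketch that recomputes each league's BABIP-minus-LD% gap from the published figures. It works in "points" (thousandths) to sidestep floating-point noise. Note that the rounded figures give gaps of 109 and 121 rather than the reported .110 and .120; presumably the reported Diff values came from unrounded rates, though that is an assumption on my part.

```python
# Published figures from the article, expressed in points (thousandths).
leagues = {
    "NL": {"ld": 183, "babip": 292},
    "AL": {"ld": 176, "babip": 297},
}

def gap(stats):
    """BABIP minus line-drive rate, in points."""
    return stats["babip"] - stats["ld"]

gaps = {league: gap(stats) for league, stats in leagues.items()}

# How much larger the AL gap is than the NL gap, in points.
league_gap = gaps["AL"] - gaps["NL"]

for league in sorted(gaps):
    print(f"{league}: BABIP - LD% = {gaps[league]} points")
print(f"AL gap minus NL gap = {league_gap} points")
```

Either way, the two leagues end up within roughly 10 points of each other once low-PA batters are excluded.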
Dave was called a "national treasure" by Rob Neyer. Seriously. Comments about this article can be sent to him through the miracle of e-mail.