More on BABIP for batters

by Derek Carty
November 21, 2007

In case you missed it, I talked a lot about Batting Average on Balls in Play (BABIP) a few weeks ago, going over certain things that do and don’t affect it, and to what extent.

What we found, though, really wasn’t very conclusive (especially when you consider the mistake I made, which is explained in the “Errata” section of this post). To satisfy my curiosity, I took a complete look at the batted ball components of BABIP, and I thought I’d share the results.

Year-to-Year

Let’s first see which of these batted ball components can predict themselves: which ones are consistent, and which ones players have a good deal of control over. To do this, we’ll run some simple correlations to check how well the frequency of each batted ball type, and its corresponding hit rate, correlates from year to year.
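To make the components concrete, here is a minimal Python sketch that turns one batter-season of batted ball counts into the four batted ball percentages and the four per-type BABIP figures. The field names are hypothetical. Two definitions are assumptions rather than the article's own. First, each percentage is taken as a share of all four batted ball types combined; some sources express infield fly balls as a share of fly balls instead. Second, home runs are assumed to be left out of the outfield fly ball counts, as is usual for BABIP.

```python
from dataclasses import dataclass


@dataclass
class BattedBalls:
    """Batted ball counts for one batter-season (hypothetical field names)."""
    gb: int          # ground balls
    ld: int          # line drives
    of_fb: int       # outfield fly balls (assumption: home runs excluded)
    if_fb: int       # infield fly balls
    gb_hits: int     # hits on ground balls
    ld_hits: int     # hits on line drives
    of_fb_hits: int  # hits on outfield fly balls
    if_fb_hits: int  # hits on infield fly balls


def _ratio(hits, balls):
    # A batter with no balls of a given type has no defined rate for it.
    return hits / balls if balls else float("nan")


def batted_ball_profile(b: BattedBalls) -> dict:
    """Batted ball percentages (share of all four types) and per-type BABIP."""
    total = b.gb + b.ld + b.of_fb + b.if_fb
    if total == 0:
        raise ValueError("no batted balls recorded")
    return {
        "GB%": b.gb / total,
        "LD%": b.ld / total,
        "OF FB%": b.of_fb / total,
        "IF FB%": b.if_fb / total,
        "GB BABIP": _ratio(b.gb_hits, b.gb),
        "LD BABIP": _ratio(b.ld_hits, b.ld),
        "OF FB BABIP": _ratio(b.of_fb_hits, b.of_fb),
        "IF FB BABIP": _ratio(b.if_fb_hits, b.if_fb),
    }
```

Each per-type BABIP is simply hits on that type divided by balls of that type. Infield fly balls almost always become outs, so their BABIP sits near zero for nearly every hitter.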

For each correlation, a batter needs to have at least 250 plate appearances in both the first and second year in order to be eligible. Data from 2004-2007 was used.
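The year-to-year check can be sketched as follows. Each qualifying season is paired with the same batter's next season, and a Pearson correlation is computed between the two columns. This is a minimal sketch with two assumptions, since the article does not spell out either detail. It assumes the data sit in a dictionary keyed by (player, year), which is a hypothetical layout. It also assumes all consecutive-season pairs from 2004-2007 are pooled into one sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def year_to_year_pairs(seasons, min_pa=250):
    """seasons maps (player, year) -> (plate_appearances, stat_value).

    Returns matched (year N, year N+1) values for every batter with at least
    min_pa plate appearances in both seasons of the pair.
    """
    xs, ys = [], []
    for (player, year), (pa, value) in sorted(seasons.items()):
        following = seasons.get((player, year + 1))
        if following is None:
            continue  # no next season on record for this batter
        pa_next, value_next = following
        if pa >= min_pa and pa_next >= min_pa:
            xs.append(value)
            ys.append(value_next)
    return xs, ys
```

Any stat from the tables, such as GB% or GB BABIP, can be plugged in as `stat_value`. A batter with three qualifying pairs contributes three points to the sample.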

Batted Ball Percentage	Correlation Coefficient
GB%	0.75
OF FB%	0.70
LD%	0.24
IF FB%	0.62

Batted Ball BABIP	Correlation Coefficient
GB BABIP	0.24
OF FB BABIP	0.14
LD BABIP	0.10
IF FB BABIP	-0.03

So, it looks as though hitters have a lot of control over how often they hit ground balls and fly balls (both outfield and infield). They have little control over line drive rate, or over how often any type of batted ball falls for a hit, at least on a year-to-year basis. Let’s look a little deeper to see if we can find some better results.

3-Year correlations

Let’s see whether expanding the sample size reveals more hitter control over some of these components.

For each correlation, a batter needs to have at least 650 combined plate appearances across the first three seasons and at least 250 in the fourth. The three-year figure comes from 2004-2006 stats, and year four is 2007. The weighted tables use a 5/4/3 weighting of the three seasons.
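A minimal sketch of the three-year setup follows, under stated assumptions. It assumes the 5/4/3 weights run from most recent to oldest (2006, 2005, 2004). It also assumes they are applied to the underlying counts before dividing, as in Marcel-style projections. Weighting the seasonal rates directly would be another reasonable reading. Setting `weighted=False` gives the unweighted version. The data layout and all function names are hypothetical.

```python
# Hypothetical layout: (player, year) -> (plate_appearances, hits_of_type, balls_of_type),
# e.g. for GB BABIP the last two fields are ground-ball hits and ground balls.
WEIGHTS = {2006: 5, 2005: 4, 2004: 3}  # assumption: most recent season weighted heaviest
TARGET_YEAR = 2007


def eligible(seasons, player, min_base_pa=650, min_target_pa=250):
    """Require 650+ combined PA over 2004-2006 and 250+ PA in 2007."""
    base_pa = sum(seasons.get((player, year), (0, 0, 0))[0] for year in WEIGHTS)
    target_pa = seasons.get((player, TARGET_YEAR), (0, 0, 0))[0]
    return base_pa >= min_base_pa and target_pa >= min_target_pa


def three_year_rate(seasons, player, weighted=True):
    """Pool 2004-2006 counts, optionally weighting each season 5/4/3, then divide."""
    num = den = 0.0
    for year, w in WEIGHTS.items():
        record = seasons.get((player, year))
        if record is None:
            continue  # a missing season contributes nothing
        _, hits, balls = record
        weight = w if weighted else 1
        num += weight * hits
        den += weight * balls
    return num / den if den else None


def three_year_pairs(seasons, weighted=True):
    """Return (three-year rate, 2007 rate) pairs for eligible batters."""
    xs, ys = [], []
    for player in sorted({p for p, _ in seasons}):
        if not eligible(seasons, player):
            continue
        base = three_year_rate(seasons, player, weighted)
        _, hits4, balls4 = seasons[(player, TARGET_YEAR)]
        if base is None or balls4 == 0:
            continue
        xs.append(base)
        ys.append(hits4 / balls4)
    return xs, ys
```

The resulting pairs can then be fed to any Pearson correlation routine, such as `statistics.correlation` in Python 3.10 and later.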

Unweighted

Batted Ball Percentage	Correlation Coefficient
GB%	0.76
OF FB%	0.79
LD%	0.39
IF FB%	0.74

Batted Ball BABIP	Correlation Coefficient
GB BABIP	0.34
OF FB BABIP	0.19
LD BABIP	0.24
IF FB BABIP	-0.06

Weighted

Batted Ball Percentage	Correlation Coefficient
GB%	0.77
OF FB%	0.80
LD%	0.39
IF FB%	0.73

Batted Ball BABIP	Correlation Coefficient
GB BABIP	0.36
OF FB BABIP	0.19
LD BABIP	0.23
IF FB BABIP	-0.06

When we do this, we get some really nice results. Ground ball and fly ball rates become even more predictable with three years of data, especially fly balls (in the unweighted tables, outfield fly balls rise from 0.70 to 0.79 and infield fly balls from 0.62 to 0.74). There is some definite potential there. We also see a nice jump in the correlation for line drive percentage, from 0.24 to 0.39. It isn’t as high as we’d like to see it, but we might be able to do something there, depending on how other options work out. Ground ball BABIP also produces a decent result (0.34 unweighted, 0.36 weighted). Perhaps ground balls are the key to BABIP?

Concluding thoughts

I won’t draw any conclusions today. The purpose of this article was simply to present this data to you and let you know where I’m thinking of going with this stuff. There is still so much to learn about BABIP, and hopefully we’ll be able to uncover some new things here.

Next time, we’ll do some regression analysis to see whether any of these components is a legitimate option for predicting BABIP.

Errata

In my previous post on BABIP, I made some mistakes. I had incorrectly calculated BABIP2. This had little effect on most of the correlation coefficients, but a few changed significantly. The corrected coefficients are listed below, using the item numbers from that post.

2) Walk rate correlation with BABIP2 — 0.05
3) (Called Strikes + Balls)/(Total Pitches) with BABIP2 — 0.03
4) Walks/Strikeouts (BB/K) correlation with BABIP2 — -0.02
5) Line drive rate correlation with BABIP2 — 0.45
6) Outfield fly ball BABIP correlation with BABIP2 — 0.52
9) 3 year, unweighted BABIP2 correlation with Year 4 BABIP2 — 0.39
10) 2 year, unweighted BABIP2 correlation with Year 3 BABIP2 — 0.37
11) 3 year, weighted BABIP2 correlation with Year 4 BABIP2 — 0.38

Outfield fly ball BABIP gets a big boost, enough to become the top predictor of BABIP2 that we looked at. Unfortunately, as we saw above, it isn’t very stable from year to year. Line drive rate also ticked higher, and, as we discussed, it is somewhat predictable using a three-year figure.

Items 9, 10, and 11 are significantly lower than where we had them before. They are still decent, but not great. More work certainly needs to be done in this area.
