2010 xBABIP Splits And Adjustments

Determining the true talent level of hitters is no simple task. While some metrics tend to gain “statistical significance” over the course of a full season, many do not, and even for the ones that do, many hitters do not reach the requisite “statistical significance threshold” from which we can start drawing conclusions. One such metric, BABIP, requires more than 650 plate appearances to reach a predictive or indicative level. Given the volatility of BABIP luck, projecting AVG/OBP/SLG/OPS, each a function of BABIP, becomes a fickle task.

Thanks to a few good men, however, talent spotting is a less exacting task than it once was.

A couple of years ago, Chris Dutton and Peter Bendix did some research on batted-ball data and created a metric called xBABIP (“expected BABIP”). xBABIP dispelled the myth that BABIP was primarily a function of “LD% + .120.” Rather, as Dutton and Bendix found, BABIP was better explained as a function of all batted-ball types and ratios, with speed/power/strikeout considerations.
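For reference, the two quantities being contrasted here can be written out. The sketch below uses the standard BABIP definition and the old LD% + .120 rule of thumb. The function names and sample numbers are made up for illustration, and this is not the Dutton/Bendix regression itself, whose coefficients are not reproduced in this article.

```python
def babip(h, hr, ab, k, sf):
    """Standard BABIP: non-homer hits divided by balls put in play."""
    return (h - hr) / (ab - k - hr + sf)


def ld_rule_of_thumb(ld_pct):
    """Old shorthand estimate: line-drive rate (as a decimal) plus .120."""
    return ld_pct + 0.120


# Hypothetical hitter: 150 H, 20 HR, 500 AB, 100 K, 5 SF, 19% line drives.
actual = babip(h=150, hr=20, ab=500, k=100, sf=5)
estimate = ld_rule_of_thumb(0.19)
gap = actual - estimate  # positive means BABIP ran ahead of the shorthand
```

The shorthand only looks at line drives, which is why a model that also weighs the other batted-ball types, speed, power and strikeouts can explain more of the gap.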

Last year, Derek Carty and Chris Dutton debuted the simple xBABIP calculator on THT. This tool has empowered users to determine a player’s xBABIP and compare it to their actual BABIP. Therefrom, one could forecast a hitter’s expected batting line, assuming all the input ratios were to remain constant. Over the course of 500+ PA, these ratios tend to be significant, though conclusions can still be drawn at the 300 PA threshold (we’d really only be waiting on IFFB% stabilization).

For all 270 hitters who accrued 300 or more plate appearances this season, I applied the xBABIP formula (by park) to determine each hitter’s expected batting line. In short, what I have created is a spreadsheet of “what you can expect as a baseline for production in 2011, assuming all else remains constant.” In other words, this is how these hitters should have hit in 2010.

My methodology was tedious but simple: once you calculate xBABIP, you reverse engineer expected hits (xHits). Here, a judgment call is necessary: do you keep a hitter’s power ratios and assume they represent his true talent line, or do you assume all hits gained or lost are of a single type? I use the latter strategy, as I do not have a means by which to accurately calculate the proportion of singles/doubles/triples/home runs saved/gained by bad/good luck. Conceding that any apportionment of hit types would be guesswork, I simply assume that all hits gained/subtracted would have been exclusively of the singles variety. It should be noted that, as a result, the expected ISO forecast is not to be relied upon. This methodology is most effective at finding a hitter’s expected batting average and on-base percentage. Nonetheless, BABIP considerations have an impact on a hitter’s slugging percentage, so I have accordingly included an xSLG forecast in my chart.
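The reverse-engineering step described above can be sketched roughly as follows. This is a minimal illustration, not the spreadsheet's actual formulas: it assumes the standard definitions of BABIP, AVG, OBP and SLG, and the names BattingLine, balls_in_play and expected_line, along with the sample stat line, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int   # at-bats
    h: int    # hits
    hr: int   # home runs
    k: int    # strikeouts
    bb: int   # walks
    hbp: int  # hit by pitch
    sf: int   # sacrifice flies
    tb: int   # total bases


def balls_in_play(line):
    # Denominator of the standard BABIP formula.
    return line.ab - line.k - line.hr + line.sf


def expected_line(line, xbabip):
    # Invert BABIP = (H - HR) / BIP to recover expected hits.
    x_hits = xbabip * balls_in_play(line) + line.hr
    # Hits gained (positive) or lost (negative) relative to actual.
    delta = x_hits - line.h
    x_avg = x_hits / line.ab
    x_obp = (x_hits + line.bb + line.hbp) / (
        line.ab + line.bb + line.hbp + line.sf
    )
    # Singles-only assumption: each gained or lost hit moves
    # total bases by exactly one.
    x_slg = (line.tb + delta) / line.ab
    return {"xHits": x_hits, "xAVG": x_avg, "xOBP": x_obp, "xSLG": x_slg}


# Made-up stat line, not a real player's 2010 numbers.
example = BattingLine(ab=500, h=150, hr=20, k=100, bb=50, hbp=5, sf=5, tb=250)
adjusted = expected_line(example, xbabip=0.300)
```

Because every adjusted hit is treated as a single, xSLG moves in lockstep with xAVG, so the implied ISO (xSLG minus xAVG) simply equals the hitter's actual ISO. That is one way to see why the ISO forecast carries no new information.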

At this point, before I reveal the file and discuss my xBABIP adjustments further, it is essential that I note that the park factor constants utilized in calculating expected BABIP are from 2009. Since there was less than three months’ worth of park data available at the time the xBABIP tool was released (assuming the new parks were used at all in determining xBABIP park factors), the numbers on Yankees and Mets hitters may not be completely accurate. Take their numbers with a grain of salt. (I’ve italicized Yankees/Mets hitters in the spreadsheet for reference.)

It should also be noted that hitters with high HR/AB rates should be treated with caution. Relatively low Ball-In-Play (BIP) hitters are subject to more volatility in their underlying numbers.

Now that you understand how I calculated the numbers, let’s take a look at them. You can access the full xBABIP spreadsheet by clicking here. The password to manipulate the spreadsheet is “soto18.” Again, please note that these numbers assume persistence. Adjust mentally as you expect a hitter’s batted ball distro to change (more line drives and groundballs mean more AVG; more flyballs, especially infield flies, mean less AVG). If a player’s numbers and xBABIP do not settle well with you, look at his three-year or career BABIP numbers. If there’s a large disparity, his 2010 xBABIP may be the byproduct of fluke rather than true talent.

Here are the 35 hitters who experienced the most BABIP luck in 2010 (click to embiggen):

Here you find my eternal nemesis of irrational hatred, Justin Morneau, topping the BABIP-luck chart. He’s really a mid-high .270’s AVG hitter in my mind, and xBABIP confirms that there is no reason to expect even a .300 AVG in 2011. Obviously Josh Hamilton will likely come down to earth some next season (though still hit for a plus-AVG, as he did in 2007 and 2008), along with Austin Jackson (unless he decides to taunt my calls for regression with more virginal sacrifices to the BABIP gods), Colby Rasmus, Nick Swisher and Jayson Werth (buyer beware?).

Here are the 35 hitters who the BABIP gods disfavored most in 2010 (click to embiggen):

Amongst guys who should see their AVGs grow in 2011, we find Carlos Peña, who hit below .200 this season, topping the chart of “guys you can expect more from next year.” Also on this list are Aaron Hill (.273 AVG), my boy Matt LaPorta (.262 AVG), Jose Bautista (though that’s likely a byproduct of his super home run binge…I’d expect a .270 average tops), Zorilla (.278 AVG) and Mike Napoli (.272 AVG, though lots of squatting may depress this figure). Juan Rivera’s on this list too, but that, to me and my gut, feels like a flukey outlier at his age.

For your viewing pleasure, here are also the top 35 expected average (xAVG) hitters (click to embiggen):

And the bottom 35 (click to embiggen):

And here are the top 35 expected on-base (xOBP) guys for you guys in OBP leagues (click to embiggen):

And the bottom 35 (click to embiggen):

Post your love/hate in the comments.


Jeffrey Gross is an attorney who periodically moonlights as a (fantasy) baseball analyst. He also responsibly enjoys tasty adult beverages. You can read about those adventures at his blog and/or follow him on Twitter @saBEERmetrics.
13 Comments
brian
13 years ago

I’m curious about some of the guys that show up high on the xBABIP-BABIP list despite routinely having low BABIPs. See Jones, Andruw (only one season with > .250 BABIP since 2005) and Encarnacion, Edwin (no better than .270 BABIP since 2007).  How do they have xBABIPs as high as they are?

Andrew
13 years ago

Great stuff, Jeffrey.

Jeffrey Gross
13 years ago

@brian,

all the data is from 2010 batted ball profiles. Both Jones and Pierre caught my eye as “fluke” BABIP profiles, actually.

With respect to Jones, I speculate it is because of the low IFFB% on the season.

Nick
13 years ago

Great article but something jumped out at me that doesn’t make sense.  178 players had an xAVG that exceeded their actual AVG while only 87 had the reverse split (more than 2 to 1).  Similarly, 91 had an xAVG at least .020 higher than their actual AVG while only 21 had the reverse split (more than 4 to 1).  I would think that these numbers should be closer to equal.  Is it possible that there’s something incorrect in the formula or is there a better explanation that I’m missing?  Thanks.

David
13 years ago

tremendous like always Jeffrey

Jeffrey Gross
13 years ago

@Nick my unscientific guess is the year of the pitcher and defense affected this

Jeffrey Gross
13 years ago

@david: thanks!

Jeffrey Gross
13 years ago

@nick,

To elaborate further (now that I’m not on my ipod touch), I think the influx of plus-defense this season had some noticeable impact on the expected outs by BIP type rates. If teams settle into a new defensive posture compared to the years from which the data was gleaned, then the xBABIP formula would need some updating, but let’s see where the multi-year data takes us. For now, let’s just say it is a byproduct of the year of the pitcher.

Jeffrey Gross
13 years ago

Quintero:

I’ve emailed them to you based on the email you posted on here.

Enjoy

Jason B
13 years ago

Good work. Solid data. EXCELLENT usage of “embiggen”.  =)

Jeffrey Gross
13 years ago

Thanks Jason. I always try to be cromulent with my word choice.

Quintero
13 years ago

Would love to see those charts in a different internet-storage space. Blogspot has been blocked in China. Appreciate it.

Jason B
13 years ago

Why that’s…that’s not even a word! I’m banishing you from this place – you, and your children, and your children’s children!

For three months!

(Meh…tis funnier when spoken by Donald Sutherland to a 7-year-old girl.)