**Don’t forget to follow THTFantasy on Twitter**. Special thanks to Yizhe Shen for helping me compile the data for players on multiple teams this year.

Each of the past two seasons, I have made it a habit to use The Hardball Times’ expected BABIP (xBABIP) formula in an attempt to take a somewhat luck-neutral look at batting lines from the previous year to help better forecast relative value for the (ages away) upcoming season. Not to break habit, what follows is a breakdown of 2011 batting lines.

Before I present the data, which can be accessed and sorted by **clicking here**, let me explain my methodology and the crucial-to-understand underlying assumptions. If you have not yet read Chris Dutton and Peter Bendix’s article on their xBABIP formula, I suggest doing so before proceeding, because I use their formula.

Step one is calculating each player’s xBABIP. This can be done through a variety of methods, but as I have indicated above, I use Chris Dutton and Peter Bendix’s xBABIP formula. It is worth noting that other xBABIP formulas do exist, such as the one posted by slash12 a couple of years ago on Beyond The Boxscore. xBABIP is a theoretical model, and each formula has its own pros and cons.

I prefer to use The Hardball Times’ version because 1) I’m a company man and 2) it accounts for park (though admittedly, the park factor data are a few years old now, and for a few teams, namely the Yankees, Twins, Mets and, starting next year, the Marlins, the park factors are entirely obsolete). Feel free to use the following methodology of determining batting line with whatever formulation of xBABIP you choose.

Once you have calculated each player’s xBABIP (a feat easier said than done, especially if you have to account for partial seasons and league/park factors), you will need to apply it using fancy algebra to determine a player’s expected, luck-neutralized batting average (xAVG), on base percentage (xOBP), and slugging percentage (xSLG).

To calculate expected batting average, you begin by calculating the expected hits differential between a player’s actual BABIP and his expected BABIP. To calculate a player’s expected hits total, simply rearrange the BABIP formula using xBABIP in place of actual BABIP.

In other words, a player’s expected hits are equal to that player’s actual home run total plus his xBABIP times the following: at-bats minus strikeouts minus home runs plus sacrifice flies. As a formula, xH = HR + xBABIP * (AB - K - HR + SF). Take this expected hits total and divide by at-bats to get xAVG.
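As a rough illustration, the expected-hits and xAVG arithmetic can be sketched in Python. The stat line below is made up purely for the example (it is not any player from the spreadsheet), and the function names are my own.

```python
def expected_hits(ab, k, hr, sf, xbabip):
    """xH = HR + xBABIP * (AB - K - HR + SF), as described in the text."""
    balls_in_play = ab - k - hr + sf  # BABIP denominator
    return hr + xbabip * balls_in_play


def expected_avg(ab, k, hr, sf, xbabip):
    """xAVG is expected hits divided by at-bats."""
    return expected_hits(ab, k, hr, sf, xbabip) / ab


# Hypothetical batter (illustrative numbers only, not real data)
xh = expected_hits(ab=500, k=100, hr=20, sf=5, xbabip=0.310)
xavg = expected_avg(ab=500, k=100, hr=20, sf=5, xbabip=0.310)
print(f"xH = {xh:.2f}, xAVG = {xavg:.3f}")
```

Home runs sit outside the xBABIP term because they are not balls in play, so the luck adjustment only touches the remaining hits.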

Next, you will need to calculate xOBP. This is done by adding expected hits, walks and hit-by-pitches, and dividing that sum by the sum of at-bats, walks, hit-by-pitches and sacrifice flies. Not too complicated.
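A minimal sketch of the xOBP step, assuming expected hits (xH) have already been computed from xBABIP. The inputs are hypothetical numbers chosen for illustration, not a real player's line.

```python
def expected_obp(xh, bb, hbp, ab, sf):
    """xOBP = (xH + BB + HBP) / (AB + BB + HBP + SF)."""
    return (xh + bb + hbp) / (ab + bb + hbp + sf)


# Hypothetical batter: 139.35 expected hits in 500 at-bats (illustrative only)
xobp = expected_obp(xh=139.35, bb=50, hbp=5, ab=500, sf=5)
print(f"xOBP = {xobp:.3f}")
```

The only change from the ordinary OBP formula is that expected hits replace actual hits in the numerator; walks and hit-by-pitches are left untouched.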

Calculating xSLG is at least as easy as calculating xOBP, but how you calculate it largely depends on how you perceive xBABIP to affect hits. If you think that a player’s power rate would remain constant irrespective of BABIP luck, then you simply calculate a player’s actual ISO (slugging percentage minus batting average) and add that value to his expected batting average.

If you pessimistically/optimistically believe that all hits gained/lost to BABIP luck were singles, then you calculate xSLG by adding the difference between expected hits and actual hits to a player’s singles total, and then dividing the sum of singles plus two times doubles plus three times triples plus four times home runs by at-bats.
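Here is a hedged Python sketch of the two xSLG calculations just described, using a made-up stat line. One thing it makes visible: as literally written, adding actual ISO to xAVG and crediting every gained or lost hit as a single reduce to the same arithmetic, (TB + xH - H) / AB. The third function is an assumption of this sketch, not something taken from the article or the spreadsheet: it keeps total bases per hit constant, so that added or subtracted hits carry the player's overall power rate.

```python
def xslg_constant_iso(avg, slg, xavg):
    """Method 1 as described: actual ISO (SLG - AVG) added to xAVG."""
    return xavg + (slg - avg)


def xslg_all_singles(singles, doubles, triples, hr, ab, hits, xh):
    """Method 2: every hit gained or lost to BABIP luck is a single."""
    adjusted_singles = singles + (xh - hits)
    total_bases = adjusted_singles + 2 * doubles + 3 * triples + 4 * hr
    return total_bases / ab


def xslg_scaled_power(avg, slg, xavg):
    """Assumed variant (not in the article): total bases per hit held constant."""
    return xavg * (slg / avg)


# Hypothetical batter (illustrative only): 80 1B, 27 2B, 3 3B, 20 HR in 500 AB
ab, hits, xh = 500, 130, 139.35
avg, slg, xavg = hits / ab, 223 / ab, xh / ab

m1 = xslg_constant_iso(avg, slg, xavg)
m2 = xslg_all_singles(80, 27, 3, 20, ab, hits, xh)
m3 = xslg_scaled_power(avg, slg, xavg)
print(f"constant ISO: {m1:.3f}  all singles: {m2:.3f}  scaled power: {m3:.3f}")
```

When hits are added, the scaled variant lands above the other two because each extra hit is credited with the player's average bases per hit, home runs included.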

As may be obvious, both methods have their own issues with calculating the expected power of the hits gained/saved through BABIP luck.

The first xSLG method holds power constant, which seems nice in theory. However, given that home run totals are generally not affected by BABIP luck, using ISO overestimates or underestimates power depending on whether xBABIP adds hits to or subtracts hits from a player’s final line.

With the hits-added method, a player would be adding non-home run hits at an ISO pace that includes home runs. Alternatively, if hits are subtracted, it is subtracting some home run power value.

The “be overly pessimistic/optimistic” approach of course greatly oversimplifies this error, but it does so with a degree of skepticism. For hits added, we see what life would be like if all hits were singles, and think that there’s power upside to be had in the projection.

Alternatively, for hits subtracted, we get some dose of reality with the understanding that there’s a little more risk than the downward adjustment the numbers indicate. You might think of a hits-subtracted situation assuming all singles as the “upside” of luck-adjustment.

So pick your method of xSLG; each has its own vices. I prefer to use the first method (constant ISO adjustment), so that is what you will find in my spreadsheet of numbers below.

The methodology laid out, there are a few crucial points that must be addressed before the data are presented.

First is the set of players included in my data set. My data address only players who accumulated 300 or more plate appearances. With the exception of infield flyballs, pretty much all of the rest of the relevant xBABIP data stabilize by a half season’s worth of plate appearances.

However, I added several players of interest who fell fewer than 15 plate appearances short of the threshold (Desmond Jennings, Justin Morneau, Grady Sizemore, Chris Coghlan and John Mayberry) to the sample out of personal interest.

Second, you are probably wondering how to use a different xBABIP formula (particularly slash12’s) to get all the relevant numbers without having to do any additional, unnecessary work on your own. As a guy with a background in economics, I understand that desire to do the least amount of additional work necessary to capture the benefit sought, and accordingly, making an xBABIP formula adjustment is very easy with my spreadsheet.

All you need to do is change the formula in the xBABIP cell for the first player to reflect your favored xBABIP formula. Then, drag that cell down vertically to the bottom of the data set. Voila! All of the resulting changes and math will be done for you.
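The same swap-the-formula idea can be sketched outside a spreadsheet. In this rough Python version, the xBABIP formula is passed in as a function and everything downstream is rebuilt from it. The player row is hypothetical, and `flat_placeholder` is only a stand-in for whatever formula you prefer; it is not slash12’s or Dutton and Bendix’s model.

```python
def recompute(players, xbabip_fn):
    """Apply one xBABIP formula to every row and rebuild xH and xAVG from it."""
    results = []
    for p in players:
        xbabip = xbabip_fn(p)
        xh = p["HR"] + xbabip * (p["AB"] - p["K"] - p["HR"] + p["SF"])
        results.append({"name": p["name"], "xBABIP": xbabip, "xAVG": xh / p["AB"]})
    return results


def stored_value(player):
    """Default: use the xBABIP already stored with the row."""
    return player["xBABIP"]


def flat_placeholder(player):
    """Stand-in for an alternative formula (hypothetical, not a real model)."""
    return 0.300


# Hypothetical row (illustrative numbers only)
players = [
    {"name": "Hypothetical Batter", "AB": 500, "K": 100, "HR": 20, "SF": 5, "xBABIP": 0.310},
]

default_rows = recompute(players, stored_value)
swapped_rows = recompute(players, flat_placeholder)
for before, after in zip(default_rows, swapped_rows):
    print(before["name"], f"{before['xAVG']:.3f} -> {after['xAVG']:.3f}")
```

Only the function handed to `recompute` changes; the expected-hits and xAVG steps stay the same, which mirrors editing one cell and dragging it down the column.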

Finally, it is worth reminding you that the default xBABIP method used in my spreadsheet has slightly obsolete data (it’s multi-year data from a couple of years ago) that is totally obsolete with respect to a few teams: The Mets, Yankees and Twins. With these three teams, you will need to mentally adjust the numbers to reflect the differential between these teams’ old parks and their new ones.

Beyond just the limits of my particular data set, there is also an important assumption that underlies xBABIP that is critical to note. This assumption—which will be true of any xBABIP formula (well, unless that formula regresses a player’s numbers towards some skill-based mean, which in and of itself would raise its own issues)—is that a player’s xBABIP from year N will remain constant in year N+1. This is a bold assumption, and highly unlikely to be true in any single case.

xBABIP analyzes past luck based on past results, but it does not forecast the underlying elements that go into figuring out the difference between skill and luck-based reality for future situations. To the extent a player’s expected future walk rate, strikeout rate, groundball rate, flyball rate, infield flyball rate, line drive rate and home run rate, to name a few areas, could/will deviate next year from this year, xBABIP will not reflect those deviations.

Hence, if you think a player’s line drive rate will increase in 2012 compared to 2011, then you should assume that his real expected future BABIP will be higher than his xBABIP. Let’s call the formula’s output from last year’s inputs nominal xBABIP, and the forward-looking figure real xBABIP.

You should be particularly wary of players who had abnormally high/low home run rates last year. To the extent that home runs will increase or decrease in 2012, that will be a major factor that will impact the player’s real versus nominal xBABIP figure. My spreadsheet calculates nominal xBABIP and makes adjustments accordingly. You will need to calculate or mentally adjust real xBABIP on your own.

That said, let’s look at the data. In case you have not already, you can download the spreadsheet by **clicking here**. If the column header has an “x” in front of the stat, it is xBABIP adjusted. If there is no “x,” then that stat is the player’s actual 2011 stat. For example, “AVG” is the player’s 2011 batting average, whereas “xAVG” is his expected batting average based on xBABIP.

If the column header has a “d” in front of the stat, then it is a differential. For example “dBABIP” is the difference between a player’s xBABIP and actual BABIP.

Looking through the 275-player spreadsheet, only 61 players (22 percent) have xBABIPs below their actual BABIPs, a testament to another year of excellent pitching and defense. The average actual batting average of the player sample is .267, while the average expected batting average is .281.

Clearly the data are a bit skewed on the high end. I tested the data set with slash12’s xBABIP formula, and it also had an average expected batting average that was more than 10 points (.010) above the actual league batting average. Fewer than 30 qualified players had a batting average of .300 or better this year; xBABIP believes that that number should have been 42.

Turning to the data, let’s first look at the “unluckiest” batters of 2011—those who are most likely to see the sharpest batting average improvements in 2012 (dBABIP greater than .050):

| Player | Team | BABIP | xBABIP | dBABIP |
|---|---|---|---|---|
| Chone Figgins | Mariners | 0.215 | 0.314 | 0.100 |
| Vernon Wells | Angels | 0.214 | 0.298 | 0.084 |
| Rafael Furcal | MULTIPLE | 0.240 | 0.320 | 0.080 |
| Chris Coghlan | Marlins | 0.263 | 0.331 | 0.068 |
| Ian Kinsler | Rangers | 0.243 | 0.310 | 0.068 |
| Russell Martin | Yankees | 0.252 | 0.318 | 0.066 |
| Logan Morrison | Marlins | 0.265 | 0.328 | 0.064 |
| Casey McGehee | Brewers | 0.249 | 0.313 | 0.064 |
| Jonathan Herrera | Rockies | 0.273 | 0.337 | 0.063 |
| Evan Longoria | Rays | 0.239 | 0.302 | 0.063 |
| Alex Rios | White Sox | 0.237 | 0.299 | 0.062 |
| Hanley Ramirez | Marlins | 0.275 | 0.337 | 0.062 |
| Dan Uggla | Braves | 0.253 | 0.314 | 0.061 |
| Ben Revere | Twins | 0.293 | 0.354 | 0.061 |
| Ty Wigginton | Rockies | 0.271 | 0.330 | 0.059 |
| Orlando Cabrera | MULTIPLE | 0.259 | 0.318 | 0.059 |
| Adam Dunn | White Sox | 0.240 | 0.299 | 0.059 |
| Jason Heyward | Braves | 0.260 | 0.318 | 0.058 |
| Mark Teixeira | Yankees | 0.239 | 0.296 | 0.057 |
| Jorge Posada | Yankees | 0.262 | 0.317 | 0.055 |
| Miguel Tejada | Giants | 0.254 | 0.308 | 0.054 |
| Juan Uribe | Dodgers | 0.245 | 0.299 | 0.053 |
| Kelly Johnson | MULTIPLE | 0.277 | 0.330 | 0.053 |
| Adam Lind | Blue Jays | 0.265 | 0.317 | 0.052 |
| Wilson Valdez | Phillies | 0.288 | 0.338 | 0.051 |
| Coco Crisp | Athletics | 0.284 | 0.335 | 0.051 |

As you might expect, a lot of the guys with some of the lowest batting averages in baseball populate this list. Those players, though mostly terrible, were not nearly as terrible as their batting lines from last year indicate. For example, Alex Rios was likely more a .260-.270 than a .227 hitter, and Adam Dunn should have hit closer to .200 than .159.

Mingled in with the bad players with bad luck last year, however, are a few really interesting names. The one that most stands out is Ian Kinsler, who I already explained could be a first-round caliber player next season. In addition to Kinsler are Evan Longoria and Hanley Ramirez. Long-time fans of the pair can breathe a cautious sigh of relief if they were worried about spending a third-round pick on either. Mark Teixeira is on this list, but I am more skeptical than I am with Ramirez and Longoria that he can bounce back to previous batting average form.

The most shocking name on this list might be Chone Figgins, who seems to be at the end of his career after a .302 wOBA (88 wRC+) last season and a putrid .218 wOBA (34 wRC+) this season. xBABIP thinks Figgins should have hit .273/.321/.332 (.653 OPS) this year, which would have been about league average by wOBA standards once park factors are considered.

Figgins’ bat is pretty hollow in real life, but as a perennial base-stealing threat when he gets on, it is encouraging to see that Figgins still has the potential to get on base 33 percent of the time. Figgins’ walk rate this season plummeted to a career-low 6.7 percent after four seasons of a walk rate above 10 percent, so some bounceback could be imminent just from regression. This noted, Figgins could be a sleeper source of stolen bases next year.

Next, the 26 “luckiest” batters of 2011 (dBABIP less than -.015), who are most likely to see the sharpest batting average declines in 2012:

| Player | Team | BABIP | xBABIP | dBABIP |
|---|---|---|---|---|
| Wilson Betemit | MULTIPLE | 0.391 | 0.323 | -0.068 |
| Adrian Gonzalez | Red Sox | 0.380 | 0.333 | -0.047 |
| Nick Hundley | Padres | 0.362 | 0.317 | -0.044 |
| Alex Avila | Tigers | 0.366 | 0.326 | -0.041 |
| Miguel Cabrera | Tigers | 0.365 | 0.324 | -0.041 |
| Hunter Pence | MULTIPLE | 0.361 | 0.322 | -0.039 |
| Chase Headley | Padres | 0.368 | 0.329 | -0.039 |
| Jose Reyes | Mets | 0.353 | 0.319 | -0.034 |
| Matt Kemp | Dodgers | 0.380 | 0.345 | -0.034 |
| Daniel Murphy | Mets | 0.345 | 0.311 | -0.034 |
| Victor Martinez | Tigers | 0.343 | 0.309 | -0.034 |
| Nyjer Morgan | Brewers | 0.362 | 0.329 | -0.032 |
| Jemile Weeks | Athletics | 0.350 | 0.320 | -0.030 |
| Michael Young | Rangers | 0.367 | 0.337 | -0.030 |
| Lucas Duda | Mets | 0.326 | 0.297 | -0.029 |
| Alex Gordon | Royals | 0.358 | 0.331 | -0.027 |
| Jhonny Peralta | Tigers | 0.325 | 0.300 | -0.025 |
| Dustin Ackley | Mariners | 0.339 | 0.316 | -0.023 |
| Andre Ethier | Dodgers | 0.348 | 0.326 | -0.023 |
| Carlos Beltran | MULTIPLE | 0.324 | 0.302 | -0.021 |
| Mike Napoli | Rangers | 0.344 | 0.323 | -0.021 |
| Joey Votto | Reds | 0.349 | 0.329 | -0.020 |
| Ryan Raburn | Tigers | 0.324 | 0.305 | -0.020 |
| Casey Kotchman | Rays | 0.335 | 0.318 | -0.017 |
| Michael Morse | Nationals | 0.344 | 0.328 | -0.016 |
| Ryan Braun | Brewers | 0.350 | 0.334 | -0.016 |
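Both lists come from the same differential (xBABIP minus actual BABIP) and the two cutoffs quoted above. A small Python sketch, using a handful of rows copied from the two lists, shows the classification. The function and label names are my own. Note that the listed dBABIP values appear to come from unrounded inputs, so recomputing from the rounded columns can differ by .001 (Figgins, for example, works out to .099 rather than the listed .100).

```python
UNLUCKY_CUTOFF = 0.050   # article's threshold: dBABIP greater than .050
LUCKY_CUTOFF = -0.015    # article's threshold: dBABIP less than -.015


def dbabip(babip, xbabip):
    """Differential as used in the lists: expected minus actual BABIP."""
    return xbabip - babip


def luck_label(babip, xbabip):
    """Classify a batter by the two cutoffs used for the lists."""
    diff = dbabip(babip, xbabip)
    if diff > UNLUCKY_CUTOFF:
        return "unlucky"
    if diff < LUCKY_CUTOFF:
        return "lucky"
    return "neither"


# (name, BABIP, xBABIP) rows taken from the article's two lists
rows = [
    ("Ian Kinsler", 0.243, 0.310),
    ("Evan Longoria", 0.239, 0.302),
    ("Adrian Gonzalez", 0.380, 0.333),
    ("Alex Gordon", 0.358, 0.331),
]

labels = {name: luck_label(babip, xbabip) for name, babip, xbabip in rows}
for name, babip, xbabip in rows:
    print(f"{name}: dBABIP {dbabip(babip, xbabip):+.3f} ({labels[name]})")
```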

As mentioned above, only 22 percent of the players in the sample overperformed their expected BABIP in 2011. This is likely due to the returned recognition of value provided by athleticism and defense in the post-*Moneyball* era, along with better pitching league-wide.

Unsurprisingly, the “luckiest” batters tend to be the guys who competed for the batting title, and in this regard we find the names Matt Kemp, Adrian Gonzalez, Victor Martinez, Miguel Cabrera, Jose Reyes, and Ryan Braun mingled into the list.

This does not mean that these players are per se guys to avoid next year; they are still great. Their inclusion on this list simply means that their value will be inflated above their luck-neutral talent line. An inflated batting average through BABIP luck tends to lead to extra runs and RBIs, as well as stolen bases, by virtue of the law of opportunity.

Some of the interesting non-elite names on the luck list are second basemen Jemile Weeks and Dustin Ackley. Second base was surprisingly deep this year. Per Yahoo’s end of season player rankings, four of the top 26 players were second-base eligible, while seven of the top 100 players were second basemen. With both second base rookies poised to see their averages drop precipitously next season, it is quite possible second base might not be as bountiful next year.

Alex Avila also resides on this list. While his .295 batting average may not be for real, his 15-20 home run power is. The same can be said about Mike Napoli, who is really a .260 hitter with 20-30 home run power depending on playing time.

Of all the names on the list, however, I think Alex Gordon might end up being the most overrated for 2012. As a long-time Gordon supporter and well-rewarded 2011 owner, it pains me to call the guy overrated after years of him not getting a proper chance, but Gordon is not a .300/20/20 player.

Rather, he is more a .275-.280 hitter capable of a low .800s OPS with 20 home run capability and double-digit stolen base potential. A .280/20/13 campaign may be in the cards, but you’ll likely be paying a premium over that level to acquire him next year in non-keeper formats. It is also worth noting that Gordon loses his third-base eligibility next year, which will also negatively affect his fantasy value.

So which names on the BABIP luck list shocked you most? Who do you think is least likely to match his expected batting average?

As always, leave the love/hate in the comments below.

Tim said...

Awesome piece.

Jeffrey Gross said...

Thanks Tim!

johnnycuff said...

what about age? is BABIP a “young player’s skill?”

looking at the ages in your data sets, the “lucky” xBABIP players had an average age about two years younger (28.2 vs. 30) than the “unlucky” set.

the standard deviations tell an interesting story as well. the “lucky” set (stdev 2.7) is clustered tighter around the generally accepted MLB peak age range of 27-29 than the “unlucky” set (stdev 4.3). 16 of the 26 “lucky” players (61.5%) are inside of that age range.

i’m a pretty poor statistician, so i’m going to leave it to you to determine if any decent conclusions can be drawn from this. has anybody looked at BABIP vs. age before?

johnnycuff said...

also i would like to raise the startling result that russell martin’s full name is actually russell nathan coltrane jeanson martin

jeffrey gross said...

johnnycuff,

That would make sense, as bat and foot speed is an element of BABIP skill. Younger players tend to have better speed in both regards.

Jeffrey Gross said...

I highly encourage everyone to read these two articles on age and skill by Rany:

1. http://www.baseballprospectus.com/article.php?articleid=15295

2. http://www.baseballprospectus.com/article.php?articleid=15306

The Wizard said...

Hi Jeffrey,

Thanks for the calculations.

Is there a discrepancy on the xBABIP column between the spreadsheet and the article? Rios’ xBABIP in the article is .299 but the spreadsheet says .322. Dunn’s xBABIP in the article is .299 but the spreadsheet says .307.

My apologies if I’m reading the spreadsheet wrongly. I see the columns in the spreadsheet as LastName, FirstName, Team, P/PA, BABIP, xBABIP, dBABIP…

Again, thanks for your work

Andrew07 said...

Excellent work, Jeffrey.

Bobby A said...

Great work! Now, I just need to remember to come back to this page before my draft in March.

Jeffrey gross said...

Bobby a,

Bookmark us!

Jeffrey gross said...

Wizard, I’ll take a second look when I get back in town tonight

jeffrey gross said...

@The Wizard

Where are you seeing the discrepancy? I downloaded the provided sheet, and it shows Dunn’s xBABIP at .299. Rios also shows up correctly.

jeffrey gross said...

Thanks Andrew

jeffrey gross said...

I got a few questions regarding the high batting average. Keep in mind that teams generally do not give struggling players more at bats, so the mean batting average of players who accrued 300+ PA should inherently be higher than the total mean in the baseball universe.

The Wizard said...

Alex Rios is on Row 69. His BABIP is listed as .237, his xBABIP as .322, and his dBABIP as .0095.

After seeing your reply I started looking around. The spreadsheet has several sheets: ‘Overview’, ‘xBABIP 2012.xls’, ‘THT_Mult_Team’, ‘THT_one_team’, ‘Sheet7’, ‘Team C’, ‘Sheet1’, and ‘xBABIP 2012.csv’.

When I first open the spreadsheet I get the ‘Overview’ sheet. That’s where the values I gave in the 1st sentence come from. However, if I click on the tab at the bottom for the ‘xBABIP 2012.xls’ sheet (Alex Rios is on Row 12), the sheet that opens has the correct values! Did you use 2 sheets or am I opening this somehow wrongly? What sheet do you see when you first open the file?

Thanks Jeffrey

jeffrey gross said...

The Wizard,

I see what you are talking about. I accidentally shared the wrong file. The data on the “OVERVIEW” page sorted incorrectly when I tried to prearrange it. I will upload a new file later tonight.

The “xbabip2012.xls” sheet is the one you should use

Jeffrey Gross said...

I have uploaded a corrected spreadsheet

Jeffrey Gross said...

(just redownload from the above links)

The Wizard said...

Looks good!

Thanks Jeffrey.

Ben Pritchett said...

Good stuff Jeffrey.

Zach said...

I think a .323 xbabip makes Napoli a .280 hitter, not a .260 hitter. With 30 home runs in 432 abs and the probability he plays full time next year, he’s also a 30-40 hr hitter, not a 20-30 hr hitter. His k rate was down and his walk rate was up. These, and others, were real gains. I’m guessing you weren’t bullish on Bautista going into 2011 either?

Jeffrey gross said...

I was actually very bullish on Bautista heading into the year. The thing about Mike Napoli is that, despite the limited playing time he’s gotten in the past, he pretty much is what he is: a high-strikeout, high-home run, pretty good walk hitter. His approach points towards a sub-.270 avg. I’m on a phone without a computer this week, but this is certainly something I will address in my upcoming catchers ranking article.

S&P said...

Awesome article…I found myself wondering, though—all the guys with strong positive and negative differentials were guys whose base BABIP were unusually high or low to start with. There is no one in the sample, eg, with a BABIP of .275, and xBABIP of .220. So why not go the simple route, and just look straight at BABIP, make a mental adjustment for really fast/slow guys or dead pull hitters, and bypass the additional work?

Jeffrey gross said...

I’m glad you enjoyed the article. The reason I use this method is that it gives some baseline of what level of improvement or regression should be expected next year, rather than merely whether a player should improve or regress in the coming season. This way it is easier to weigh the relative batting average values of the different players in the league, whereas the other method would be just throwing darts and saying a player will improve or regress next year.

SaberFan said...

So much baseball. I love it.

Alex K said...

It seems odd that so many “lucky” players were on the Tigers (5 in total). Is that a coincidence or could it be that their stadium has some sort of effect on this stat?