All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Monday, January 26, 2009

What’s the best BABIP estimator?

Posted by Derek Carty at 3:17am

BABIP is a stat that lots of people like to throw around but many don't fully understand (even some who profess to be statistically inclined).

Background info

BABIP stands for Batting Average on Balls in Play. It measures the rate at which balls in play fall in for hits. Essentially, any ball that the batter makes contact with and puts into fair territory, and that does not become a home run, falls into the domain of BABIP. It is calculated as (H-HR)/(AB-K-HR).

We use BABIP to evaluate both pitchers and hitters, but the way in which we use it differs greatly between the two. Most pitchers regress toward the league average BABIP of around .300 or .305. Very few pitchers can repeatedly do better or worse than this, so we say that pitchers have very little control over BABIP.

Hitters, on the other hand, can have a substantial amount of control over BABIP. Ichiro Suzuki, for example, has a .356 career BABIP. Hitters do not regress toward league average; rather, they each regress toward their own unique number. The big question these days seems to be, what is that number? Today, I'd like to look at several ways of determining it and see which is best.

The test

This is something I've been curious about for a while, so I took as many BABIP estimators as I could think of and put them up against each other to see which does the best job of predicting the following year's BABIP.

The combatants
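As a reference point for two of the estimators in this comparison, here is a minimal Python sketch. The `babip` function follows the formula given in the background section exactly (no sacrifice flies). The `ld_babip` function assumes that ldBABIP refers to the "LD% + .120" rule mentioned in the comments below; that reading, the function names, and the sample season line are all illustrative assumptions, not data from the study.

```python
def babip(h, hr, ab, k):
    """BABIP as defined in this article: (H - HR) / (AB - K - HR)."""
    balls_in_play = ab - k - hr
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def ld_babip(ld_rate, constant=0.120):
    """Line-drive estimate (assumed form): LD% as a fraction plus a constant."""
    return ld_rate + constant


# Hypothetical season line, made up for illustration only.
season = {"h": 150, "hr": 20, "ab": 550, "k": 100}
actual = babip(**season)
estimate = ld_babip(0.19)  # hypothetical 19% line drive rate
print(f"BABIP: {actual:.3f}, LD-based estimate: {estimate:.3f}")
```

Other sources add sacrifice flies to the denominator; the sketch deliberately sticks to the version stated in the article.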
The process

I used data from 2004 to 2008, matching players from one year to the next. As xBABIP was the reason for doing the study, I had to work around its limitations a little bit. xBABIP wasn't calculated for anyone with fewer than 300 plate appearances, so I made that the cut-off for both year one and year two. There are some biases with using cut-offs, but there's no way around it in this instance. From there, I adjusted each stat for differences in league average and ran a couple of tests. You can see the results below.

The results

+---------------+-------+--------+---------+---------+-------------+-----------+--------+
| TEST          | BABIP | xBABIP | qxBABIP | ldBABIP | studesBABIP | xBA BABIP | mBABIP |
+---------------+-------+--------+---------+---------+-------------+-----------+--------+
| Correlation   | 0.38  | 0.50   | 0.45    | 0.20    | 0.32        | 0.40      | 0.46   |
| R-Squared     | 0.14  | 0.25   | 0.20    | 0.04    | 0.10        | 0.16      | 0.21   |
| Average Error | 0.028 | 0.021  | 0.022   | 0.029   | 0.024       | 0.022     | 0.022  |
+---------------+-------+--------+---------+---------+-------------+-----------+--------+

As you can see, there's a pretty clear pecking order in these results:

+------+-------------+
| RANK | ESTIMATOR   |
+------+-------------+
| 1    | xBABIP      |
+------+-------------+
| 2    | mBABIP      |
| 3    | qxBABIP     |
+------+-------------+
| 4    | xBA BABIP   |
| 5    | BABIP       |
+------+-------------+
| 6    | studesBABIP |
+------+-------------+
| 7    | ldBABIP     |
+------+-------------+

I've also broken things down by tiers. Dutton and Bendix's xBABIP seems to be the best, and I can only imagine what looking at multiple years of it would do. Just one year of data explains 25 percent of the variance in the following year's BABIP, a very big number for a stat with such wide variability. That it beats three years' worth of Marcels data (plus regression to the mean and age adjustments) is excellent as well.
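The three tests in the results table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not define "Average Error", so the sketch assumes mean absolute error, and it takes R-Squared as the square of the correlation (which matches the table, e.g. 0.50 squared is 0.25). The function names and the sample numbers are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between year-one estimates and year-two BABIP."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_abs_error(xs, ys):
    """Average absolute gap between estimate and outcome (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def evaluate(estimates, next_year):
    """Return the three test statistics for one estimator."""
    r = pearson_r(estimates, next_year)
    return {
        "correlation": r,
        "r_squared": r * r,
        "average_error": mean_abs_error(estimates, next_year),
    }


# Made-up numbers for five hypothetical hitters, for illustration only.
estimates = [0.310, 0.295, 0.330, 0.280, 0.305]
next_year = [0.300, 0.290, 0.345, 0.300, 0.310]
for name, value in evaluate(estimates, next_year).items():
    print(f"{name}: {value:.3f}")
```

Running `evaluate` once per estimator against the same set of next-year BABIPs would produce one column of the results table each.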
After that come Marcels (which I've been using until now) and the quick version of xBABIP. I should note that the quick version tested here doesn't include a not-hard-to-apply team adjustment; I left it out for logistical reasons, but it would likely improve the accuracy a bit. It's very nice to see the quick version grade out so well, since it will be easy to calculate in-season (although thanks to Sal Baxamusa, Marcels isn't very difficult either).

Then come Baseball HQ (which Average Error thinks belongs in tier two) and actual BABIP, followed by Dave's more complex BABIP estimator (which was derived back at the beginning of 2005, when we were first starting to work with batted ball data).

Finally, line drive BABIP, arguably the most popular measure on this list, comes in dead last, well below everyone else and significantly worse than simply using actual BABIP. I've long said that I dislike this way of estimating BABIP, and it's very nice to see the tests confirm it.

Going forward

Going forward, I'll be using xBABIP in place of Marcels BABIP in my True Batting Average calculations and when discussing a player's BABIP in general. I'm committed to giving you guys the best there is, and Chris and Peter's model is tops among the BABIP estimators that I know of. If you missed the original article, I'd definitely recommend you go back and read it.

Some notes from Chris Dutton

Chris worked a lot with me on this, and I really appreciate his receptiveness and helpfulness. Here are some things he wanted me to pass along. First, he has changed the model a bit since the original article.
Here are the exact changes and his explanation of them:

Old formula: Hitter eye, Pitches per extra-base hit, LD%, FB/GB, Speed score, Contact rate, Spray, Pitches per AB

New formula: HR/FB, IF/FB, LD%, FB/GB, Speed score, Lefty*(FB/GB%), Contact rate, Spray

The differences are basically that I used hr/fb as a measure of power rather than pitches per extra base hit, added popups/FB to measure poorly hit balls, and included an interaction variable of lefty*(fb/gb%) to adjust for the fact that lefty ground ball hitters tend to hit balls to the right side of the field (which rarely become hits). I also removed pitches_per_AB, which seemed to be potentially correlated with other variables, and removed hitter_eye since contact rate seemed to be capturing a very similar effect.

Chris also says that he isn't done improving the model. He is constantly looking for ways to improve it even further, and is specifically hoping to incorporate some PITCHf/x data as soon as possible.

Finally, Chris is developing a tool that would allow readers to easily calculate the quick version of xBABIP. This would be useful in-season, when we constantly need to update our evaluations of hitters. While constantly calculating things like Spray would be time-consuming and difficult, the quick version uses stats that are all readily available and, as the tests show, is still effective. The tool also has some other cool features: interactive graphs, projected stat lines, and some other things you might find useful.

References and resources

Expected BABIP and Quick Expected BABIP data were provided to me by Chris Dutton. A big thanks to him for his help and also for helping to create such an excellent stat. Marcels BABIP was taken from Tango's site. The rest of the stats I calculated myself.

Derek Carty is a 22-year-old fantasy baseball analyst residing in New Jersey.
In addition to THTF, his work has appeared at Rotoworld (NBC), Sports Illustrated, FOX Sports, and Heater Magazine. In his two years competing in expert leagues, he has won two titles with four top-three finishes, including a LABR NL title in 2009, making him the youngest person ever to win a major expert league title. Derek is a proud graduate of the MLB Scouting Bureau's Scout Development Program and is a firm believer in the importance of combining stats and scouting. He welcomes questions via e-mail.

Comments
MattS said...
I wrote an article on BABIP a couple of weeks ago that discusses breaking down BABIP on line drives, groundballs, and flyballs. I model each one individually and predict the following year’s BABIP from the previous three. I’m working on just a simple plug-in regression model to compute expected BABIP, but the variables that I use actually have a higher correlation with the following year’s BABIP than a model using the available variables that xBABIP uses, at least with my dataset. Have a look. I’m currently working on a newer article to come out soon on this topic. Here’s the previous article discussing what I did. http://www.thegoodphight.com/2009/1/16/726379/babip-projection-and-new-s Posted 01/26 at 09:26 AM
Dave Studeman said...
Very cool, Derek. I’m glad folks like you and Chris have followed up on this, because my calculations were only retrospective in nature, and not meant to be predictive. I was also curious because I know Chris and Peter used stats from Baseball Prospectus for their line drives, and I wondered how the formula would work with the BIS stats. Looks like it works pretty well. Posted 01/26 at 10:48 AM
TangoTiger said...
Derek, good job. I would be interested in seeing the correlation results of seasons 2005-2007 onto 2008 for all the non-Marcel systems, and using the Marcel 2008 forecast onto the 2008 season. The reason is that Marcel uses 3 years of data, so it has an unfair leg up on all the others. Finally, how about a correlation of ALL the estimators onto the 2008 BABIP? There’s no reason for an “either/or” choice. Well, something like ldBABIP can be discarded, since it’s a subset of a few of the other models you have. (But the regression will hopefully pick that out and give it a weight of zero.) Posted 01/26 at 11:06 AM
Barca said...
“xBABIP wasn’t calculated for anyone with fewer than 300 plate appearances, so I made that the cut-off for both year one and year two” Mike Napoli was the first one that I would like to see calculations for, but it appears that he just missed the cut-off. Posted 01/26 at 11:11 AM
nilodnayr said...
It would be interesting to see this broken down by batter type. While one model might be best for all players, it might have deficiencies with certain types of players (power vs single, FB vs GB, high BA vs low BA, etc), which either might point us to use different models for different types of players or perhaps a continued tinkering of the xBABIP model. Posted 01/26 at 11:12 AM
Barca said...
“xBABIP wasn’t calculated for anyone with fewer than 300 plate appearances, so I made that the cut-off for both year one and year two” Mike Napoli was the first person that I wanted to see the calculations for, but it looks like he misses your cut-off. Posted 01/26 at 11:12 AM
centris said...
Do you think that CHONE or PECOTA projected BABIP would beat xBABIP? Posted 01/26 at 12:30 PM
Red Sox Talk said...
Hi, I have also done a study looking to improve on the LD% + .120 method. I looked at 449 hitters from 2008 with at least 300 PA: Posted 01/26 at 05:55 PM
philosofool said...
I find it ironic that the old formula .12+LD% is actually less reliable a predictor of future BABIP than BABIP itself. Posted 01/26 at 11:39 PM
BobbyRoberto said...
I noticed Derek’s BABIP calculation does not include sacrifice flies, while the one linked by Red Sox Talk did include sac flies. Which is the right, or best, way to do it? Shouldn’t everyone figure it the same way? Posted 01/27 at 12:23 AM
youngid said...
That’s an interesting approach, Red Sox Talk, if we ever get HitFX data I’m sure a variation of your formula will be a great BABIP predictor. Posted 01/27 at 01:28 AM
Brian Cartwright said...
Good work, Derek. Confirms what I wrote a couple of weeks ago at FanGraphs: that Marcel is better than LD%. Posted 01/27 at 02:49 AM
Derek Carty said...
Dave, Tango, Also, you’re right that Marcels has a leg up on all of the systems since it uses three years of data. Looking back, it seems like I only glanced over that point. I’ll run a three-year average of the other systems tomorrow and see how they stack up. Posted 01/27 at 09:26 PM
Derek Carty said...
Sorry, Barca. I didn’t put the stats together, but it looks like Napoli did miss the cut-off. nilodnayr, centris, Interesting, Red Sox Talk. If I do another study like this, I’ll include your formula. philosofool, BobbyRoberto, Posted 01/27 at 09:34 PM
Derek Carty said...
mulkowsky, MattS, Posted 01/27 at 09:37 PM
Great, cutting edge stuff! (And Marcel continues to amaze.)
In the original article you guys posted a link to download Dutton and Bendix’s xBABIP results. Could you do that again for their updated model?