February 9, 2010
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Batters and BABIP
by Chris Dutton and Peter Bendix
December 02, 2008

What we did

Batting average on balls in play (BABIP), the rate at which batted balls other than home runs become hits, is commonly used as a measure of pitching performance. However, precious little work has been done to explore BABIP from the hitter's perspective. While luck is bound to play a large role in determining whether a ball in play will become a hit or an out, there are certainly some quantifiable aspects of hitting ability that give a batter at least some control of the outcome.

Some people like to add .120 to a batter's Line Drive Percentage to predict his BABIP (a guideline originally suggested by Dave Studeman). But one would expect that BABIP depends on more than just the ability to hit line drives. Speed, for instance, clearly seems to play a significant role. And what about the ability to control the strike zone, make consistent contact and hit the ball to all fields?

For example, if Jacoby Ellsbury hits a ground ball in the hole between short and third, he has a higher chance of getting a hit than if Bengie Molina hits the exact same ball in the exact same place. Anecdotally, this is how Ichiro manages to get so many hits every year. And fans of the Red Sox, Yankees and Rays can tell you that David Ortiz, Jason Giambi and Carlos Pena have been robbed of many a base hit because of the extreme defensive shifts used against them, whereas Dustin Pedroia, Derek Jeter and B.J. Upton have gotten more hits because of their batting eye and their ability to use the whole field. Surely, these factors contribute to whether or not a batted ball becomes a hit.

We endeavored to take a more scientific look at batted ball data to develop a better method of finding a hitter's expected BABIP. Using Baseball Prospectus data from 2002-2008, we calculated a range of variables that we considered to be the primary factors in determining BABIP. The ones discussed in the results that follow are hitter's eye, line drive percentage, speed score, pitches per plate appearance, pitches per extra-base hit, fly ball/ground ball ratio, spray and contact rate.
Using this dataset, we designed a regression model to determine the relationship between each factor and a hitter's BABIP. Essentially, the model takes seven years' worth of data and compresses it into a single formula that takes these variables as inputs and spits out a predicted BABIP. Using this, we can compare players' actual and predicted BABIP to identify instances in which a player significantly outperformed or underperformed his expectations. Furthermore, we can use the model to strip luck from the equation and calculate a "luck-neutral" measure of BABIP.

Our regression model yields an R-squared value of .348, and all non-vector explanatory variables are significant at the 1 percent level. In other words, the factors included jointly explain roughly 35 percent of the variance in a hitter's BABIP. Equivalently, the correlation between actual and predicted BABIP for all players in our sample is a robust .59 (the square root of .348). Given the tremendous uncertainty regarding the outcome of balls in play, these results are extremely promising. By contrast, commonly used models based on line drive percentage alone explain only about 3 percent of the variance in BABIP when applied to the same dataset, and yield a correlation of just .18 between predicted and actual values.

As mentioned above, all of our key independent variables are statistically significant at the 1 percent level. That is to say, effects this strong would be very unlikely to show up by random chance alone if these variables had no real relationship to BABIP. Our regression results show positive effects for hitter's eye, line drive percentage, speed score and pitches per plate appearance, all of which conform to common sense. On the other hand, we find negative coefficients on pitches per extra-base hit, fly ball/ground ball ratio, spray and contact rate. One might expect a higher contact rate to lead to a higher BABIP, but the opposite actually seems to be the case.
This is likely caused by the correlation between strikeouts and power, since players who swing hard tend to either miss entirely or crush the ball for hits. If this theory is reflected in our data, it makes sense that a player with a lower contact rate would generate a higher predicted BABIP. This is consistent with Studeman's follow-up work on BABIP.

What does it mean?

Okay, now you know what we did. Let's discuss what it means. We've developed a new and better way of finding a batter's expected BABIP. We will call our model's predicted BABIP "xBABIP," in contrast to the old way of estimating expected BABIP, which was LD% + .120. We will refer to that old estimate as "old-xBABIP."

The idea is to separate skill from variance. We've isolated a batter's skill at getting hits on balls in play; therefore, we can assume that most deviation of actual BABIP from our model's predicted BABIP is due to random fluctuation, and therefore unlikely to be repeated.

We can actually test this theory by looking to the past. Let's examine the players whose actual BABIPs differed most from their xBABIPs (the expected BABIP as predicted by our model) in 2007, and then look at what happened in 2008. Our hypothesis is that these players shouldn't consistently under- or over-perform their xBABIP. Let's start with players who were "unlucky" in 2007.
YEAR  NAME              BABIP  xBABIP   Diff    YEAR  NAME              BABIP  xBABIP   Diff
2007  Ramon Vazquez     .258   .322    -.063    2008  Ramon Vazquez     .342   .322     .020
2007  John Buck         .233   .283    -.049    2008  John Buck         .269   .282    -.013
2007  Bobby Crosby      .253   .303    -.050    2008  Bobby Crosby      .275   .276    -.001
2007  Julio Lugo        .258   .309    -.051    2008  Julio Lugo        .312   .284     .029
2007  Ray Durham        .231   .276    -.044    2008  Ray Durham        .342   .313     .029
2007  Lyle Overbay      .270   .321    -.051    2008  Lyle Overbay      .311   .310     .001
2007  Rickie Weeks      .270   .321    -.050    2008  Rickie Weeks      .266   .294    -.028
2007  Dioner Navarro    .243   .286    -.043    2008  Dioner Navarro    .313   .303     .010
2007  Brad Wilkerson    .269   .316    -.046    2008  Brad Wilkerson    .267   .279    -.011
2007  Jay Payton        .261   .299    -.038    2008  Jay Payton        .266   .304    -.038
2007  Adam Lind         .265   .303    -.038    2008  Adam Lind         .313   .302     .011
2007  Ian Kinsler       .267   .305    -.038    2008  Ian Kinsler       .325   .295     .030
2007  Nick Punto        .251   .285    -.034    2008  Nick Punto        .331   .304     .027
2007  Dan Uggla         .268   .304    -.036    2008  Dan Uggla         .313   .294     .018

Wow, that's pretty compelling evidence for the model. We didn't cherry-pick these, either: these were the "unluckiest" hitters of 2007 who also had enough plate appearances to qualify for our model in 2008. Only Rickie Weeks and Jay Payton saw their actual BABIP remain well below their xBABIP in 2008, while everyone else had a 2008 BABIP that was either very close to their xBABIP (within .013) or above it. Had we seen these numbers after 2007, we might have been able to predict the rise of Vazquez, Navarro, Lind, Kinsler and Uggla, all of whom seemingly "came out of nowhere" in 2008.

And what about hitters who were particularly lucky in 2007?
YEAR  NAME              BABIP  xBABIP   Diff    YEAR  NAME              BABIP  xBABIP   Diff
2007  Matt Kemp         .411   .301     .110    2008  Matt Kemp         .359   .312     .047
2007  Ichiro Suzuki     .384   .317     .067    2008  Ichiro Suzuki     .330   .307     .023
2007  Willy Taveras     .355   .293     .062    2008  Willy Taveras     .282   .292    -.010
2007  Magglio Ordonez   .379   .315     .064    2008  Magglio Ordonez   .331   .303     .028
2007  Howie Kendrick    .374   .314     .060    2008  Howie Kendrick    .351   .316     .035
2007  Jayson Werth      .380   .322     .058    2008  Jayson Werth      .319   .314     .005
2007  Mark Reynolds     .368   .313     .055    2008  Mark Reynolds     .319   .304     .014
2007  Edgar Renteria    .373   .319     .053    2008  Edgar Renteria    .289   .301    -.012
2007  Mike Lowell       .335   .288     .047    2008  Mike Lowell       .278   .282    -.003
2007  Ryan Braun        .353   .304     .050    2008  Ryan Braun        .301   .287     .014
2007  David Ortiz       .352   .306     .046    2008  David Ortiz       .269   .302    -.033
2007  Jose Vidro        .333   .290     .043    2008  Jose Vidro        .242   .290    -.048
2007  B.J. Upton        .387   .338     .048    2008  B.J. Upton        .340   .340     .000
2007  Luis Castillo     .318   .284     .034    2008  Luis Castillo     .258   .245     .013

Again good results, although more mixed. Kemp, Ichiro and Kendrick again significantly beat their xBABIP in 2008. Interestingly, Ichiro and Kendrick are both known to be unique hitters. Does Matt Kemp do anything differently than most other hitters?

But nearly all of the "lucky" players in 2007 regressed in 2008. The model predicted the downfall of Renteria, Taveras, Vidro (although he was also quite unlucky in '08) and Castillo. It correctly predicted a return to earth for Upton, Reynolds, Ortiz, Braun and Lowell.

Next, let's look at hitters for whom xBABIP disagreed strongly with old-xBABIP.
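To make that comparison concrete, here is a minimal Python sketch. It assumes the old guideline exactly as stated earlier (LD% + .120) and uses three rows copied from the 2008 tables in this article. The function and variable names are illustrative only, and the regression model behind xBABIP is not reproduced here.

```python
OLD_RULE_CONSTANT = 0.120  # the "+ .120" from the line-drive guideline


def old_xbabip(ld_pct):
    """Old guideline: expected BABIP = line drive percentage + .120."""
    return ld_pct + OLD_RULE_CONSTANT


def implied_ld_pct(old_estimate):
    """Invert the guideline to recover the line drive rate it implies."""
    return old_estimate - OLD_RULE_CONSTANT


# (name, actual BABIP, model xBABIP, old-xBABIP), copied from the 2008 tables
rows = [
    ("Brian Schneider", 0.275, 0.289, 0.355),
    ("Hunter Pence", 0.298, 0.290, 0.236),
    ("Howie Kendrick", 0.351, 0.316, 0.294),
]


def estimator_gap(row):
    """Positive gap: the old rule expects more hits than the model does."""
    _, _, xbabip, old = row
    return round(old - xbabip, 3)


# Largest positive gap first: the players the old rule rates highest
# relative to the model.
for row in sorted(rows, key=estimator_gap, reverse=True):
    name, babip, xbabip, old = row
    print(f"{name:16s} BABIP {babip:.3f}  xBABIP {xbabip:.3f}  "
          f"old-xBABIP {old:.3f}  gap {estimator_gap(row):+.3f}")
```

A positive gap means the old rule is the more optimistic of the two estimates, and a negative gap means the model is.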
Here are the top cases where old-xBABIP overrated players in 2008:

YEAR  NAME               BABIP  xBABIP  old-xBABIP
2008  Brian Schneider    .275   .289    .355
2008  Ryan Ludwick       .333   .325    .404
2008  Kevin Millar       .244   .257    .311
2008  Jesus Flores       .309   .293    .353
2008  Omar Infante       .323   .294    .356
2008  Joey Gathright     .278   .156    .215
2008  Jose Lopez         .302   .286    .346
2008  Khalil Greene      .251   .276    .336
2008  Cesar Izturis      .271   .286    .344
2008  Todd Helton        .296   .312    .369
2008  John Bowker        .296   .308    .364
2008  Damion Easley      .278   .262    .317
2008  Paul Konerko       .243   .280    .330
2008  Clint Barmes       .322   .308    .360
2008  Freddy Sanchez     .282   .304    .356
2008  Jack Wilson        .284   .299    .348
2008  Omar Vizquel       .239   .277    .326
2008  Dioner Navarro     .313   .303    .352
2008  Xavier Nady        .327   .321    .370

For these players, the old guideline would lead you to believe that the players had been rather unlucky this season. However, our new model shows that these players were far less unlucky than previously thought. In other words, simply using line-drive percentage to predict BABIP overrated these players.

And the players that were most underrated by the old model:

YEAR  NAME               BABIP  xBABIP  old-xBABIP
2008  Gary Matthews Jr.  .289   .307    .252
2008  Hunter Pence       .298   .290    .236
2008  Jeff Mathis        .231   .269    .217
2008  Alexi Casilla      .288   .281    .235
2008  Fred Lewis         .365   .336    .293
2008  Carlos Gomez       .324   .301    .260
2008  Delmon Young       .334   .306    .268
2008  Nick Punto         .331   .304    .267
2008  Jacoby Ellsbury    .305   .326    .290
2008  Lance Berkman      .336   .309    .273
2008  Rickie Weeks       .266   .294    .260
2008  Denard Span        .328   .338    .306
2008  Michael Bourn      .283   .277    .246
2008  Yunel Escobar      .303   .296    .265
2008  Erick Aybar        .297   .304    .274
2008  Brendan Harris     .312   .297    .273
2008  Jason Varitek      .270   .295    .272
2008  Coco Crisp         .308   .321    .298
2008  Howie Kendrick     .351   .316    .294

For the most part, our model believes these players' actual BABIP are closer in line with expectations than the old model's xBABIP.
In other words, old-xBABIP may think that Alexi Casilla got lucky, but our model suggests he hit in line with expectations. Simply using line-drive percentage to predict BABIP underrated these players.

Finally, let's take a look at the players who were the luckiest and unluckiest this season. We'd expect many of these players to regress in 2009. Not all of them will, as some are simply going to get lucky or unlucky again. However, we can be confident that most of these players will experience regression in '09. Let's start with 2008's luckiest hitters:

YEAR  NAME               BABIP  xBABIP   Diff
2008  Joey Gathright     .278   .156     .122
2008  Chipper Jones      .382   .325     .058
2008  Matt Kemp          .359   .312     .047
2008  Ryan Theriot       .335   .291     .044
2008  Felipe Lopez       .324   .287     .037
2008  Milton Bradley     .375   .334     .041
2008  Aaron Miles        .337   .301     .037
2008  Yadier Molina      .307   .274     .033
2008  Shin-Soo Choo      .359   .320     .039
2008  Geovany Soto       .331   .295     .036
2008  Mike Aviles        .355   .317     .038
2008  Reed Johnson       .338   .302     .036
2008  Jason Bay          .318   .285     .033
2008  Chone Figgins      .329   .295     .034
2008  Chase Headley      .356   .319     .036
2008  Howie Kendrick     .351   .316     .035
2008  Edgar V Gonzalez   .335   .302     .033
2008  Ryan Doumit        .328   .297     .031
2008  Manny Ramirez      .360   .326     .034
2008  Aaron Rowand       .318   .288     .029

Unsurprisingly, this list includes a lot of 2008's surprises: Bradley, Miles, Aviles, Doumit, Choo, Lopez. Interestingly, Gathright's xBABIP of .156 was nearly 90 points lower than the next-lowest player's (and remember, we do take speed into account in the model). Maybe Geovany Soto isn't quite this good. Perhaps Manny Ramirez and Milton Bradley will disappoint whoever signs them. The Cardinals' middle infielders aren't as good as they seemed.
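The Diff column in these tables is actual BABIP minus xBABIP. As a minimal sketch of how one might flag regression candidates from it, the Python below uses four rows copied from the luckiest-hitters table above. The .040 cutoff and the function names are illustrative assumptions, not part of the model.

```python
def luck_diff(babip, xbabip):
    """Actual minus predicted BABIP; positive means more hits than expected."""
    return round(babip - xbabip, 3)


def regression_candidates(rows, cutoff=0.040):
    """Return (name, diff) pairs with |diff| >= cutoff, largest gap first.

    The cutoff is a hypothetical threshold chosen for illustration only.
    """
    diffs = [(name, luck_diff(babip, xbabip)) for name, babip, xbabip in rows]
    flagged = [(name, d) for name, d in diffs if abs(d) >= cutoff]
    return sorted(flagged, key=lambda pair: abs(pair[1]), reverse=True)


# (name, actual BABIP, model xBABIP), copied from the 2008 luckiest table
rows_2008 = [
    ("Chipper Jones", 0.382, 0.325),
    ("Matt Kemp", 0.359, 0.312),
    ("Ryan Theriot", 0.335, 0.291),
    ("Aaron Rowand", 0.318, 0.288),
]

for name, diff in regression_candidates(rows_2008):
    print(f"{name:14s} {diff:+.3f}")
```

Diffs recomputed from the rounded table values can differ from the published column by .001 (Chipper Jones comes out at .057 here rather than .058), presumably because the published figures were computed before rounding.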
And 2008's unluckiest hitters:

YEAR  NAME               BABIP  xBABIP   Diff
2008  Brandon Inge       .229   .292    -.063
2008  Corey Patterson    .210   .262    -.051
2008  Carlos Ruiz        .230   .282    -.052
2008  Willy Aybar        .261   .314    -.054
2008  Jason Giambi       .234   .282    -.048
2008  Nick Swisher       .245   .294    -.049
2008  Jose Vidro         .242   .290    -.048
2008  Kenji Johjima      .226   .266    -.040
2008  Austin Kearns      .242   .284    -.042
2008  Jeff Mathis        .231   .269    -.038
2008  Omar Vizquel       .239   .277    -.038
2008  Adrian Beltre      .275   .319    -.044
2008  Mike Jacobs        .259   .300    -.040
2008  Paul Konerko       .243   .280    -.038
2008  Brandon Boggs      .296   .342    -.046
2008  Jim Edmonds        .246   .283    -.037
2008  Eric Hinske        .267   .306    -.038
2008  Willie Harris      .268   .306    -.038
2008  Jay Payton         .266   .304    -.038
2008  Gabe Gross         .272   .308    -.036

Some team is going to get a steal in Jason Giambi. Willy Aybar is deserving of full-time action. Nick Swisher and Austin Kearns are a lot better than they showed in '08. Would you believe that Brandon Boggs had the highest xBABIP in 2008 of any player in our database? Jim Edmonds may not be done quite yet. Adrian Beltre is very underrated.

While our model cannot explain all of the variation in BABIP, we believe that it is an improvement over current explanations of BABIP, as it takes into account many factors that influence a hitter's BABIP. By finding players who over- and under-performed their expected BABIP, we can further isolate skill from luck, and infer that players such as Mike Aviles are likely to regress and players such as Nick Swisher are likely to improve.

Here's a download of our results in an Excel file.

References and Resources

We owe a tremendous amount of thanks to Leanne, Dave, Jeremy, Steven and Kevin, who actively conducted the research with us as part of the Research Committee of Baseball Analysis at Tufts (BAT). The Committee, headed by Dutton, met once a week throughout the 2007/2008 academic year to discuss and conduct research, as well as analyze the results.
BAT, founded by Bendix and Matt Gallagher in 2005, is the first baseball analysis club on a college campus. It has hosted such speakers as Bill James, Alan Schwarz, Keith Law, sportswriters from the Boston Globe and more. It continues to host various speakers and events, as well as provide a forum for intelligent baseball discussion and research on the Tufts campus. For more information, please contact Peter Bendix or Chris Dutton.

Chris Dutton and Peter Bendix established a sabermetrics fan club and research committee as Tufts University students in 2006. Bendix became co-founder and President of Baseball Analysis at Tufts (BAT) while Dutton founded and directed the research team. As a group, BAT has conducted a variety of research projects using economic analysis and statistical tools. Additional work by the authors can be found at Beyond the Boxscore, FanGraphs, and Bleacher Report.