November 7, 2009
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
How well can we predict ERA?

by Colin Wyers

June 18, 2009

In case you haven't noticed, The Hardball Times is becoming all component ERA, all the time. And how could I miss out on all the fun? So let's do some testing. How accurate are these things, anyway?

Ability vs. Value

I have written at length before on the basic principles of ability (or true-talent level) versus value. There's just one point I want to come back to and reemphasize.

People tend to lean upon defense-independent estimates of pitching performance because they better predict future performance. (And, strictly speaking, they do.) This leads to a lot of fantastic confusion about the issue, with the argument being that if we want to look at past performance, we should ignore defense-independent measures and look at actual results.

This is wrong for the same reason that we look at a pitcher's ERA instead of his win-loss record. A team does not consistently score the same number of runs every game; thus it is possible for different pitchers, even different pitchers on the same team, to have vastly different amounts of run support. This is not a function of pitching, and the credit or blame for this should not rightly be assigned to the pitcher.

It is the same with defensive support. Two pitchers, even two pitchers on the same team, cannot be presumed to have the same quality of support from their defense. Defense-independent pitching statistics seek to give us a way to compare pitchers with different defensive support fairly.

But for a value measure, we do not care if a result came from luck or skill. We attribute defensive performance to the defense, not because the pitcher has no control over it, but because someone else does have control over it. Home runs, on the other hand, are not under the purview of the defense (except for a few, very rare cases). Thus, for a value metric, it is appropriate to credit a pitcher for the precise number of home runs allowed, and not an estimate thereof.
A look at the contestants

This is not meant to be an exhaustive survey of the entrants. I have picked three stats that are readily available to the public, that are relatively easy to compute, and for which the means of doing so have been made public. All are linear estimates, and their accuracy could be improved by creating dynamic versions of them.
The test

This is an ability test, not a value test. (Which is why we can include xFIP in the judging.) It's similar to split-half reliability, but instead of using correlation I'm using root mean square error. That tells us the typical size of the error between the two samples, in the same units as ERA.

In other words, I split each pitcher season from 2003 to 2008 into two sets of games: those pitched on even-numbered days, and those on odd-numbered days. I then tested to see how well performance on even-numbered days predicted performance on odd-numbered days. The results:
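Before turning to the numbers, the procedure itself can be sketched roughly as follows. This is a minimal illustration, not the code behind the figures in this article. The `Game` record, the reading of "even-numbered days" as day of the month, and the choice to weight each pitcher-season by innings in the odd-day half are all assumptions made for the example; the article says only that a weighted average was used. The sketch uses ERA itself as the predictor; any of the component estimators could be swapped in for the `era(even)` call.

```python
import math
from collections import namedtuple
from datetime import date

# Hypothetical record for one appearance; innings are plain floats here.
Game = namedtuple("Game", ["played_on", "innings", "earned_runs"])


def era(games):
    """Earned runs per nine innings, or None when there are no innings."""
    innings = sum(g.innings for g in games)
    if innings == 0:
        return None
    return 9.0 * sum(g.earned_runs for g in games) / innings


def split_by_day_parity(games):
    """Split one pitcher-season by day of month (an assumed reading of 'even days')."""
    even = [g for g in games if g.played_on.day % 2 == 0]
    odd = [g for g in games if g.played_on.day % 2 == 1]
    return even, odd


def weighted_rmse(seasons):
    """RMSE of even-half ERA as a predictor of odd-half ERA.

    Each pitcher-season is weighted by its odd-half innings (an assumption).
    """
    squared_error_sum = 0.0
    weight_sum = 0.0
    for games in seasons:
        even, odd = split_by_day_parity(games)
        predicted, observed = era(even), era(odd)
        if predicted is None or observed is None:
            continue  # skip seasons with an empty half
        weight = sum(g.innings for g in odd)
        squared_error_sum += weight * (predicted - observed) ** 2
        weight_sum += weight
    return math.sqrt(squared_error_sum / weight_sum) if weight_sum else float("nan")


# Two invented pitcher-seasons: one misses by 2.00 runs, the other by 0.00.
seasons = [
    [Game(date(2008, 4, 2), 9.0, 4), Game(date(2008, 4, 7), 9.0, 6)],
    [Game(date(2008, 4, 4), 9.0, 3), Game(date(2008, 4, 9), 9.0, 3)],
]
print(round(weighted_rmse(seasons), 2))  # 1.41
```

Because the errors are squared before averaging, one large miss counts for more than several small ones, which is part of why short samples produce such large RMSEs.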
ERA predicts future ERA rather poorly, with a staggering RMSE of 2.32. In other words, a pitcher with an ERA of 4.00 in the even-numbered sample typically had an ERA in the odd-numbered sample ranging anywhere from 1.68 to 6.32. That tells us hardly anything at all. Our best estimator, xFIP, gives us a range of 2.22 to 5.78, much better but still not particularly helpful.

One reason none of these estimates comes very close is that the pitchers average only 30 innings per split half. As the number of innings pitched goes up, our RMSEs go down:
Please do not attend to any one number too closely, because when we slice the data up like this we tend to introduce minute errors due to sample size. But note this: even with the smallest RMSE, a pitcher with 110 innings in each split half typically has a range from 3.26 to 4.74, assuming an ERA of 4.00 in the first split half. It's intensely difficult to tell who the best and worst pitchers are, even with a seemingly large number of innings pitched.

The answer is to use more innings. Use as many innings as you can. And regress to the mean. This is what we have projection systems for, and at some point of complexity in our ERA estimators it's better to admit what our purpose is and turn to a projection system instead.

And as the number of innings increases, the difference between our entrants shrinks rapidly. It seems that we very quickly run up against a point of diminishing returns from incorporating batted ball data into our estimates.

Baby out with the bathwater

One thing to note is that all of these are component ERAs, which means that they calculate an estimated ERA (or RA) based upon a pitcher's components. This, for one thing, strips out a lot of "luck," if we want to define luck as timing. In other words, an inning that looks like:

Walk, Strikeout, Groundout, Home Run, Strikeout

has a very different result than

Home Run, Strikeout, Walk, Strikeout, Groundout

even though the pitcher had a roughly equivalent performance. That's typically "luck" or "noise," and over time it cancels out.

But does it always? Perhaps not. There are talents, such as the ability to pitch well out of the stretch, or the ability to induce more grounders in double play situations, that may not be accounted for here. In those cases, we're simply throwing the baby out with the bathwater: discarding talent along with noise.
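The two innings above can be replayed with a toy scoring model to show how much the ordering alone matters. This is only a sketch under simplifying assumptions that are not from the article: a groundout is a single out with no runner advancement and no double play, a walk only forces runners when the bases are full, and the event codes (`BB`, `K`, `GO`, `HR`) and the function name `runs_in_inning` are invented for the example.

```python
def runs_in_inning(events):
    """Count runs scored in one inning under a deliberately crude model.

    Only walks, strikeouts, groundouts and home runs are handled, so tracking
    the number of baserunners is enough; their exact bases never matter.
    """
    outs = 0
    runs = 0
    runners = 0
    for event in events:
        if outs == 3:
            break  # the inning is over; ignore anything after the third out
        if event in ("K", "GO"):
            outs += 1  # assumed: no advancement and no double play on a groundout
        elif event == "BB":
            if runners == 3:
                runs += 1  # bases-loaded walk forces in a run
            else:
                runners += 1
        elif event == "HR":
            runs += runners + 1  # everyone on base scores, plus the batter
            runners = 0
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


first = ["BB", "K", "GO", "HR", "K"]
second = ["HR", "K", "BB", "K", "GO"]
print(runs_in_inning(first), runs_in_inning(second))  # 2 1
```

Both innings contain the same five events, so a component ERA scores them identically, yet under this model the first inning costs two runs and the second only one.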
And for a value metric, we don't much care to wipe out all the noise that comes with a pitcher's performance, simply to neutralize the effect of a pitcher's defense on his performance. Is there a way for us to account for a pitcher's defense, without resorting to component ERA?

I think there is. Simple Zone Rating should suffice nicely, don't you think? See you next week.

References and Resources

RMSE was computed using a weighted average.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at www.retrosheet.org.

I alluded earlier to using a dynamic instead of linear model for FIP. I have constructed one in the past. LIPS does much the same, but using batted ball data akin to what tRA does.

Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.