February 9, 2010
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
The great run estimator shootout (part 1)
by Colin Wyers
April 09, 2009

How should you measure a position player's offense? You have a lot of options, of course. Most baseball websites will give you one (maybe more!) of an alphabet soup of offensive measures. You're left to pick and choose between them as you please. How are you supposed to know which is the best?

So let's put ten different methods of measuring a player's offensive production through the wringer, and see which one comes out on top.

I am going to start from the presumption that you believe in something a little more modern than RBIs and batting average to evaluate how good a hitter is. I am also not going to spend a lot of time on the finer points of run estimators; this is essentially a spin-off of my work for this year's THT Annual, and so anyone looking for a lot of background will be best served looking there.

So, for this week, let's take a gander at the contestants and take them for a quick spin through some typical accuracy tests.

First, let's talk about the types of run estimators we are testing this week.
We'll look at some popular ones, some oldies but goodies, and some of my personal favorites (as well as one or two I've created myself). Then everything goes through the wringer. Unless otherwise noted, the categories under consideration are:
In a player value metric, it may be useful to remove intentional walks; for an accuracy study I thought it was prudent to include them.

The period under study is 1993-2008, the "modern era" of baseball offense. Limiting the scope accomplishes two things: it lets us evaluate run estimators as they pertain to evaluating players right now, and it lets me get away with using less computer processing power to run some of these tests.

Dynamic Run Estimators

This is not meant to be a comprehensive, or even fair, assessment of dynamic run estimators; mostly we're interested in linear run estimators when it comes to player evaluation (with one key exception).

Runs Created

Probably the oldest run estimator still in use, Basic Runs Created follows this simple formula:

OBP*SLG*AB

And that's the version we'll test here. There are increasingly complex versions of RC available; I don't care for any of them (again, see the Annual for more detail on this point).

So why test RC (yet again)? And why the most simplistic version available? The answer is VORP. The core of VORP is Marginal Lineup Value, which was a groundbreaking way of applying a dynamic run estimator to a player's performance. But it's quite possibly the most popular use of a run estimator in existence, and so in the test it goes.

BaseRuns

The absolute king of battle when it comes to dynamic run estimators. Created by Dave Smyth, BaseRuns follows this simple concept:

Runs scored = Baserunners * % of runners who score + Home Runs

where the percentage of runners who score is estimated from the hitting performance in question. That's all I really have to say about BaseRuns right now; it's included mostly because I like it a lot and am trying to spread the word.

Rate stats

These are things that were, for the most part, not conceived of expressly as run estimators.
In fact, most of them don't expressly measure anything at all; they correlate with run scoring, but on their own do not directly measure anything, be it runs, bases or otherwise. All of them require additional effort to put them on the scale of runs scored. This requires careful attention; doing it incorrectly can make the rate appear less accurate than it really is.

OPS

OPS, as you are probably aware, stands for On-Base Plus Slugging. The formula, as you can probably surmise from the name, is:

OBP+SLG

It is very easy to calculate on your own, and ubiquitous enough that you don't have to. And it does seem to correlate pretty well with team run scoring, as you can see in this graph:

The important feature of this graph to note is the slope of the line: the graph tends to increase more on the horizontal axis than the vertical axis. The relationship between runs scored and OPS relative to average is about 2:1, and this is why a lot of run estimator studies underrate OPS; they don't take that into account when translating OPS into runs.

OPS+

A popular derivation of OPS is OPS+, popularized by Baseball Reference. Many people mistakenly believe that OPS+ is OPS divided by the league average OPS. The actual formula is:

OBP/LgOBP + SLG/LgSLG - 1

where LgOBP and LgSLG stand for the league average OBP and SLG. This bothers a great many people, although it really shouldn't; it gives OPS+ a more intuitive 1:1 scale with runs scored, and does a better job of capturing the relative importance of OBP and SLG in run scoring.

Gross Production Average

Another close relative of OPS is GPA, created by THT's own Aaron Gleeman. The formula:

(1.8*OBP+SLG)/4

This also corrects for the relative importance of OBP and SLG in run scoring; dividing by four puts it on a scale similar to batting average, which may be more intuitive for some. Like OPS, and unlike OPS+, its relationship to runs scored is close to 2:1.
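As a rough illustration of the three OPS-family formulas above, here is a minimal Python sketch. The function names and the sample batting line (a .400 OBP and .500 SLG hitter in a .330/.420 league) are hypothetical, chosen only to show the arithmetic; OPS+ is left on the scale the formula in the text produces, where 1.0 is league average.

```python
def ops(obp: float, slg: float) -> float:
    """On-Base Plus Slugging: OBP + SLG."""
    return obp + slg


def ops_plus(obp: float, slg: float, lg_obp: float, lg_slg: float) -> float:
    """OPS+ as given in the text: OBP/LgOBP + SLG/LgSLG - 1 (1.0 = league average)."""
    return obp / lg_obp + slg / lg_slg - 1


def gpa(obp: float, slg: float) -> float:
    """Gross Production Average: (1.8*OBP + SLG) / 4."""
    return (1.8 * obp + slg) / 4


if __name__ == "__main__":
    # Hypothetical hitter and league averages, for illustration only.
    obp, slg = 0.400, 0.500
    lg_obp, lg_slg = 0.330, 0.420

    print(f"OPS:  {ops(obp, slg):.3f}")
    print(f"OPS+: {ops_plus(obp, slg, lg_obp, lg_slg):.3f}")
    print(f"GPA:  {gpa(obp, slg):.3f}")
    # OPS relative to league-average OPS, for comparison with OPS+.
    print(f"OPS / LgOPS: {ops(obp, slg) / ops(lg_obp, lg_slg):.3f}")
```

For this made-up hitter, OPS relative to league comes out around 1.20 while OPS+ comes out around 1.40: the gap above average roughly doubles, which is the 2:1 versus 1:1 scaling difference described above.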
Equivalent Runs

Nominally a run estimator, it shares a lot of similarity with the OPS-based measures above. (Clay Davenport even says so.) The basic formula is:

(Hits + Total Bases + 1.5*(BB + HBP) + SB)/(AB + BB + HBP + CS + SB/3)

The numbers produced by this formula tend to look a lot like OPS numbers; this raw measure even has the same basic relationship to runs scored that OPS has. This is then further translated into Equivalent Runs and Equivalent Average.

Total Average

Total Average is the most popular of the innumerable bases-per-out measures. Created by Washington Post scribe Tom Boswell, the basic formula is:

(TB + BB + HBP + SB)/(AB - H + CS)

It comes pretty close to a 1:1 ratio with run scoring. The main reason I bring it up is the same reason that you studied World War II in high school: Those who cannot remember the past are condemned to repeat it. Somewhere, right now, on a message board or website, there is a young man who is proposing this as the Next Big Thing that will change the way we look at baseball players. Please, do not be that young man.

Linear weights

The difference between these formulas and the rates above is that direct attention is paid to the value of each event in runs, rather than valuing events in relation to each other and then assigning a run value to the result. There are countless linear weights formulas, and I have no real desire to drag them all into this. I have chosen three formulas to stand in for the group as a whole.

wOBA

Developed by Tom Tango, wOBA is an ingenious recasting of a linear weights formula as a rate, on the scale of OBP. It has gained even more popularity with its use on Fangraphs.
A sample wOBA formula:

(0.72*BB + 0.75*HBP + 0.90*1B + 1.24*2B + 1.56*3B + 1.95*HR) / PA

To convert this figure to runs, use:

((wOBA - lgwOBA)/1.15 + 0.18)*PA

The beauty of any linear weights framework is that you can change the weights to reflect the particular run environment in question; Fangraphs uses separate weights for each season.

Regression

Some people like to estimate linear weights using a process called multiple linear regression. I have done this here, using team runs scored from 1993 to 2008. The formula looks like this:

0.55*1B + 0.73*2B + 1.2*3B + 1.46*HR + 0.32*(BB+HBP+IBB) - 0.11*(AB-H) + 0.17*SB - 0.08*CS

I have no reason to think this is a very good linear weights formula. I do not recommend its use at all. It is in this evaluation as a stand-in for regression-based LWTS in general. Please do not use this formula. Thank you.

House

These are my own linear weights, developed primarily for my personal use. I use an approach similar to one used by Tom Ruane, which looks at the change in base/out state after an event. As with wOBA, these weights are tuned by season; an average set of these weights looks like:

0.47*1B + 0.76*2B + 1.04*3B + 1.40*HR + 0.32*BB + 0.35*HBP + 0.20*IBB - 0.09*(AB-H) + 0.17*SB - 0.46*CS

Why do I call them my "house" weights? Because the house always wins. And in every run estimation study I've read by the creator of a run estimation formula, the creator's own formula always wins. There are generally two reasons for this:
So for the purposes of this study, these are my house weights. I'm trying to paint myself into a bit of a corner by publishing this before I actually run all of my run estimation tests. So no publication bias here, hopefully.

Qualifying trials

There are three traditional tests that are staples of the genre. We may as well take a look at them here.
So here's how we're testing these. With the exception of the dynamic run estimators (which don't need the help) and the regression-based weights, all of these run estimators are being tuned to the specific year these stats are coming from. Again, we're looking at the years 1993-2008. But unlike most tests, which look at runs scored by team, these results are by half-inning:
Looking only at the linear run estimators, there isn't a lot to differentiate any one from the other. Once you tune a linear offensive measure to the particular run environment (which is true for all of these measures), there is very little to differentiate them from one another in these tests.

This would be fine, if we could count on these measures to agree at the individual player level. But we can't. What we need are better tests. That'll be Part 2.

References and Resources

A great resource for learning more about run estimation is Patriot's site. Also helpful is this series of posts on Patriot's blog.

You can also look at the entire set of "house" weights, if you like.

The wOBA values are taken from my adaptation of Tom Tango's work, and should be very close to the values used at Fangraphs.

Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.