Aaron (YYZ): Doesn’t the AL ROY almost have to be Elvis Andrus? He’s the 8th most valuable SS (by WAR) in all of baseball (and the 9th best SS offensively by wOBA)
Christina Kahrl: Why use wOBA when EqA’s testably more accurate?
BP has tested the accuracy of EqA before; however, it’s notable that wOBA was not included in the comparison. So I thought I would run a quick test to see if EqA really is more accurate. I used this as my formula for calculating Equivalent Runs and this to calculate runs as per wOBA.
FanGraphs figures wOBA as far back as 1974, and if we extend that far back, looking at RMSE, we discover that in fact EqA bests wOBA by a slight amount in our typical tests, looking at team runs per season:
Metric    EqR     wOBA_R
Correl.   0.97    0.97
RMSE      27.3    27.6
MAE       21.4    21.9
I would go ahead and call that essentially a dead heat, but it wouldn’t be unreasonable to declare victory for EqA.
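For anyone who wants to rerun this kind of comparison, here is a minimal sketch of how the three figures in the table can be computed from paired team-season totals. The function name and the four sample seasons are made up for illustration; they are not the data behind the table.

```python
import math

def accuracy_metrics(estimated, actual):
    """Return (Pearson correlation, RMSE, MAE) for paired team-season run totals."""
    n = len(actual)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    # Per-team error: estimated runs minus actual runs scored.
    errors = [e - a for e, a in zip(estimated, actual)]
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mae = sum(abs(err) for err in errors) / n

    # Pearson correlation between the estimates and the actual totals.
    mean_e = sum(estimated) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(estimated, actual))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    correl = cov / math.sqrt(var_e * var_a)
    return correl, rmse, mae

# Hypothetical team seasons, invented purely to exercise the function.
actual_runs = [700, 750, 800, 650]
estimated_runs = [710, 740, 830, 650]
correl, rmse, mae = accuracy_metrics(estimated_runs, actual_runs)
print(f"Correl. {correl:.2f}  RMSE {rmse:.1f}  MAE {mae:.1f}")
```

RMSE punishes a few large misses more heavily than MAE does, which is why the two can rank estimators slightly differently.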
But of course the question was about Elvis Andrus in 2009, and from 1980 onwards the advantage shifts to wOBA. Narrowing it down to 1993-2008, the modern offensive era:
Metric    EqR     wOBA_R
Correl.   0.97    0.97
RMSE      28.3    27.5
MAE       21.9    21.7
Looking at RMSE, wOBA is better by about a run per season. I’d consider that more significant overall, and far more relevant to Elvis Andrus.
Adam Guttridge said...
The problem is, aggregate accuracy is absolutely not the litmus test of a run estimator.
Teams represent a very narrow profile of hitter. The vast majority will have an OBA of .300-.350, a SLG of .390-.470, a BA of .250-.280, and 140-210 HR. Thus, if the constant for a HR were ‘off’ by 20%, it could be balanced out by an offsettingly (hehe… nice word) bad singles constant…. since there is such little diversity in team batting, it would rarely show a meaningful error.
All systems (wOBA, RC, LWTS, BaseRuns, XR, ETC) are going to do 99.5% as well as the next in terms of estimating team runs (literally).
Put another way…. what if you found that you could make EqA .0002% more accurate by switching the stolen base constant to 2.3 and the hit by pitch constant to 3.1? I’m sure you could find a way to do such a thing. Does that make EqA ‘better’? Well, it makes it more accurate… but those are just numbers pulled out of lala land, with no basis in things that happen on a baseball field. They will succeed in producing a more accurate (on average) prediction of team runs scored. But when you apply that same formula to any player who falls outside the very narrow profile that teams represent as hitters, all bets are completely off.
Thus, a system’s logical basis is king. Ergo, LWTS (or maybe BaseRuns) is king.
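The offsetting-constants argument can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two-event "estimator", the weights, the three team lines, and the slugger's line are invented, and the "true" weights are true only by definition within the toy. The skewed set undervalues the HR by 20% and pads the singles weight just enough that an average team comes out even.

```python
# Toy weights in runs per event; illustrative only, not any published system's values.
TRUE_W = {"1B": 0.47, "HR": 1.40}

# Assumed "average" team used to calibrate the offsetting error.
AVG_TEAM = {"1B": 1000, "HR": 170}

# HR weight 20% too low; singles weight padded so AVG_TEAM's total is unchanged.
hr_shortfall = 0.20 * TRUE_W["HR"] * AVG_TEAM["HR"]
SKEWED_W = {
    "1B": TRUE_W["1B"] + hr_shortfall / AVG_TEAM["1B"],
    "HR": 0.80 * TRUE_W["HR"],
}

def runs(weights, line):
    """Linear estimate: sum of weight times event count."""
    return sum(weights[event] * count for event, count in line.items())

def relative_error(line):
    """Absolute error of the skewed weights as a share of the 'true' estimate."""
    true_runs = runs(TRUE_W, line)
    return abs(runs(SKEWED_W, line) - true_runs) / true_runs

# Hypothetical teams spanning a narrow range, plus one extreme hitter.
teams = [{"1B": 950, "HR": 140}, AVG_TEAM, {"1B": 1050, "HR": 210}]
slugger = {"1B": 60, "HR": 40}

max_team_error = max(relative_error(team) for team in teams)
slugger_error = relative_error(slugger)
print(f"worst team: {max_team_error:.1%}  slugger: {slugger_error:.1%}")
```

In this toy, the skewed weights stay within roughly a percent or so for every team but miss the slugger by about ten percent, which is the shape of the objection: a team-level test would barely notice the bad constants.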
Colin Wyers said...
Adam, wOBA is simply a system of linear weights. In fact, what it really is is a method of expressing LWTS as a rate.
Talking about wOBA’s accuracy is something of a red herring – you could do wOBA with Estimated Runs Produced or with my House weights if you wanted to, and the accuracy would be driven by the underlying weights.
But wOBA the framework has become associated with a particular set of linear weights values produced by Tom Tango and used by Fangraphs, so 99% of the time when someone talks about the accuracy of wOBA that’s what they mean and that’s what I tested.
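To make the "framework versus weights" distinction concrete, here is a rough sketch. The two weight sets, the league wOBA, the scale factor, and the batting line are round placeholder values invented for illustration; they are not Tango's or FanGraphs' actual numbers, and the plain-PA denominator is a simplification.

```python
def woba(weights, line):
    """Weighted events per plate appearance for any OBP-scaled linear-weights set."""
    return sum(weights[event] * line[event] for event in weights) / line["PA"]

def runs_above_average(woba_value, league_woba, scale, pa):
    """Convert the rate back to runs: (wOBA - league wOBA) / scale * PA."""
    return (woba_value - league_woba) / scale * pa

# Two hypothetical weight sets plugged into the same framework.
weights_a = {"BB": 0.70, "1B": 0.90, "2B": 1.25, "3B": 1.60, "HR": 2.00}
weights_b = {"BB": 0.70, "1B": 0.89, "2B": 1.25, "3B": 1.60, "HR": 2.10}

# Hypothetical batting line.
line = {"PA": 600, "BB": 60, "1B": 100, "2B": 30, "3B": 5, "HR": 20}

LEAGUE_WOBA = 0.330  # assumed league average
SCALE = 1.15         # assumed factor that puts the weights on the OBP scale

woba_a = woba(weights_a, line)
woba_b = woba(weights_b, line)
raa_a = runs_above_average(woba_a, LEAGUE_WOBA, SCALE, line["PA"])
raa_b = runs_above_average(woba_b, LEAGUE_WOBA, SCALE, line["PA"])
print(f"set A: {woba_a:.4f} wOBA, {raa_a:.1f} runs; set B: {woba_b:.4f} wOBA, {raa_b:.1f} runs")
```

The two functions never change; only the weights dictionary does, so any difference in accuracy between the two runs comes from the weights rather than from wOBA as a framework.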
If we want to discuss better tests of run estimator accuracy, well, I love discussing that. (Check last year’s Annual for an example of a more accurate test that I’ve done, at the half-inning level.) But for the purposes of a quick blog post, team-level RMSE is enough to get the basic point across. (I also looked at wOBA versus EqA at the game level for this test, and wOBA still “won” in the ’93-’08 band.)
Adam Guttridge said...
Right… I understand the calculation of wOBA, and what you were trying to get across.
Allow me to rephrase; I think my broader point is that ‘tests’ in general only tell you so much about a run estimator in terms of accuracy, because they can only measure it in the macro. We can’t say “RC says Carlos Pena produced 46 runs, but he ‘really’ produced 43”…. there’s just no way to do that. So we are forced to look at teams. But that’s a very narrow constraint, so that standard is irrelevant if we were to apply it to many (or most?) players in the league.
So to say run estimator x is better than run estimator y…. unless one is just comically bad (and none of the popular methods are), an aggregate accuracy test isn’t evidence of anything. If we had an omnipotent way of accounting for individual run contributions, it may turn out the system with the lowest aggregate (team) accuracy (you know…. 95.7% instead of 96.3%, or something) ends up the most accurate overall.
What I’m advocating is evaluating the logic. Linear Weights ‘works’ because the constants are derived from actual baseball games, and the formula is set up to capture how events actually lead to runs. EqA, on the other hand, is just an asinine collection of numbers. So when one tells you Seth Smith is worth 23 runs, and the other tells you he’s worth 19, which should you believe?