Over at FanGraphs, Dave Cameron took a look at how well WAR and actual wins match up this year:
For 2009, the correlation between a team’s projected record based on their WAR total and their actual record was .83. This is a robust number, especially considering that WAR is almost completely context independent and currently includes some notable omissions – base running (besides SB/CS, which are included in wOBA) and catcher defense are both ignored in the calculations. We also don’t have an adjustment for differences in leagues, so we’re not accounting for the fact that the AL is better than the NL.
Despite these imperfections, WAR still performs extremely well. One standard deviation of the difference between WAR and actual record is 6.4 wins, and every single team is within two standard deviations. Only four teams were more than 10 wins away from their projected total by WAR, with Tampa Bay ending up the furthest away from our expectation (96.6 projected wins, 84 actual wins), and 18 of the 30 teams were within six wins of their projected WAR total.
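The check Dave describes — correlation between projected and actual records, plus the standard deviation of the residuals — is straightforward to compute. A minimal sketch (the two three-team lists in the usage note are hypothetical, not real 2009 data):

```python
import math

def residual_summary(projected, actual):
    """Return (correlation, residual standard deviation) for a set of
    projected win totals versus actual win totals."""
    n = len(projected)
    mp = sum(projected) / n
    ma = sum(actual) / n
    # Pearson correlation between projected and actual wins
    cov = sum((p - mp) * (a - ma) for p, a in zip(projected, actual)) / n
    sp = math.sqrt(sum((p - mp) ** 2 for p in projected) / n)
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual) / n)
    corr = cov / (sp * sa)
    # Spread of the misses: actual minus projected
    resid = [a - p for p, a in zip(projected, actual)]
    mr = sum(resid) / n
    sd = math.sqrt(sum((r - mr) ** 2 for r in resid) / n)
    return corr, sd
```

With the full 30-team table of WAR-projected and actual records, this would reproduce the .83 correlation and 6.4-win standard deviation Dave reports.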
If I’m reading this correctly, Dave is essentially saying that because a team’s win total and its projected record via WAR are generally very close, it shows that WAR “works.” The commenters at FanGraphs seem to agree with him as well.
Maybe it’s just me, but I don’t understand why a high correlation between WAR and wins would verify the accuracy of the stat. The whole point of WAR is to attempt to separate luck from controllable skills. With a team’s win total being so heavily influenced by things like bad luck on balls in play and timing, you wouldn’t expect WAR to have a high correlation with wins. In fact, as Dave later shows in the article, Pythagorean record has a higher correlation with win totals than WAR does. Does that mean Pythag works better than WAR? Of course not; it just strips out less of the variance in actual wins than WAR does, so it will naturally correlate better.
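For anyone unfamiliar, the Pythagorean record mentioned above is Bill James’s run-based estimate of winning percentage. A minimal sketch, using the classic exponent of 2 (modern variants use an exponent closer to 1.83, or one that varies with run environment):

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean expectation: estimated winning percentage
    from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)
```

For example, a team that scores 800 runs and allows 700 projects to a .566 winning percentage, roughly 92 wins over 162 games. Because Pythag is built directly from actual runs scored and allowed, it only removes timing-of-runs luck, not balls-in-play luck, which is why it retains more of the variance in actual wins than WAR does.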
Let me be clear: I’m not saying WAR doesn’t work. It absolutely does. FanGraphs’ implementation of WAR, if I’m not mistaken, uses the average linear-weights run values of each event for the year. That separates it from stats like OPS or RC, which don’t necessarily have any empirical meaning, because it will literally match up with runs and wins in the aggregate. If you want a metric that shows how well your team would have played that year if timing were taken out of the equation, WAR is your guy.
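If I understand the construction, the team-level projection works roughly like this: linear-weights runs above replacement are converted to wins, summed across the roster, and added to a replacement-level baseline. A sketch only — the ~10 runs per win and the .294 replacement winning percentage are round rule-of-thumb numbers I’m assuming here, not necessarily FanGraphs’ exact 2009 values:

```python
RUNS_PER_WIN = 10.0          # rough rule of thumb; the real conversion varies with run environment
REPLACEMENT_WIN_PCT = 0.294  # assumed replacement-level baseline, not FanGraphs' exact figure

def runs_to_war(runs_above_replacement):
    """Convert linear-weights runs above replacement into wins (WAR)."""
    return runs_above_replacement / RUNS_PER_WIN

def projected_wins(team_war, games=162):
    """Project a team's record: replacement-level baseline plus summed WAR."""
    return REPLACEMENT_WIN_PCT * games + team_war
```

Under these assumptions, a team whose players total 40 WAR projects to about 47.6 + 40 ≈ 88 wins. The point is that every step is denominated in actual runs and wins, which is what gives WAR its empirical grounding.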
I’m sure Dave knows all this, which is why it’s confusing to me to see WAR compared to win totals to show its accuracy. Indeed, you would almost rather have WAR correlate poorly with win totals, since it strives to strip away all of the luck associated with them. WAR and wins measure two different things, and saying that the former works because it correlates with the latter makes no sense. A better test of WAR, in my opinion, would be to show how well it projects future wins, because you have no reason to expect a team’s good or bad luck to continue.