Wednesday, June 15, 2011
Regressing or reverting?
Posted by Jonathan Halket at 5:27am
Here’s the punchline: “Regressing to the mean” is different than “Mean reversion”. Lots of experts say “regression” when they should say “reversion”. Does it matter? Well, lots of experts aren’t very clear about the numbers either, so, yeah, I’d worry that they’re messing up the whole thing.
“Regressing to the mean” is a handy thing for accounting for small sample sizes and uncertainty. If you’re just listening and not doing your own projections, you hardly need to bother with this concept (but I’ll get back to it at the end).
“Mean reversion” is an absolutely essential idea to grasp. One way to think about it is “Luck fades”.
Consider these questions about Michael Morse, whose current batting average is .300:
A) Suppose you know his true talent indicates that he’s a .280 hitter. What would you expect him to bat for the rest of the year?
Answer: You expect him to bat .280. Morse has gotten lucky with his .300 average. By definition, luck is unpredictable (for those statisticians out there: higher order statistics are predictable). Therefore what you expect, .280, is what you expect.
B) Suppose you know his true talent indicates that he’s a .280 hitter. What average would you expect him to have at the end of the year?
Answer: Morse should probably finish with about a .290 average. Assuming the season is roughly half over, that’s half a season at .300 and half a season at .280.
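To make the arithmetic in (B) concrete, here is a minimal sketch in Python. It assumes the season is roughly half over and that both halves have about the same number of at-bats. The function name `season_average` and the `fraction_played` parameter are made up for illustration.

```python
def season_average(avg_so_far, avg_rest, fraction_played=0.5):
    """Blend the average to date with the expected rest-of-season average.

    fraction_played is an assumed share of the season's at-bats already
    in the books (0.5 means "about half the season is done").
    """
    return fraction_played * avg_so_far + (1 - fraction_played) * avg_rest


# Morse has hit .300 so far and is expected to hit .280 from here on.
print(f"{season_average(0.300, 0.280):.3f}")  # prints 0.290
```

If less of the season were done, the final number would sit closer to .280; if more were done, closer to .300.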
What gets my goat is that I often hear experts say something like “There’s no way Morse is a .300 hitter. He’s getting lucky with his BABIP. He’s gonna regress to the mean and be batting .280 at the end of the season.”
The meaningful problem with this statement is that I have no idea what this expert thinks Morse is going to bat from this point forward. Does he think Morse will bat .280 from today until the end of the season? Or does he think Morse will bat about .260 (so that his average at the end of the season will be .280)? If you’re looking for expert opinions, there’s a big difference here.
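The .260 figure comes from running the same weighted average backwards. Here is a rough sketch, again assuming the season is about half over. The name `implied_rest_average` is hypothetical.

```python
def implied_rest_average(avg_so_far, final_avg, fraction_played=0.5):
    """Rest-of-season average needed to land on final_avg at season's end.

    Solves final_avg = f * avg_so_far + (1 - f) * rest for rest, where f is
    the assumed share of the season already played.
    """
    if not 0 <= fraction_played < 1:
        raise ValueError("fraction_played must be in [0, 1)")
    return (final_avg - fraction_played * avg_so_far) / (1 - fraction_played)


# To finish at .280 after hitting .300 for half a season:
print(f"{implied_rest_average(0.300, 0.280):.3f}")  # prints 0.260
```

So “batting .280 at the end of the season” and “batting .280 the rest of the way” are 20 points of batting average apart.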
The semantic mistake, using “regression to the mean” when he should be using “mean reversion”, isn’t substantive. But combined with the expert’s statistical ambiguity, it should raise serious red flags.
Lastly consider this question:
C) You have no idea what Michael Morse’s true talent is. But in about 1,000 plate appearances, he’s batted just above .290. What is a good guess for his true talent?
Answer: Now you’d use “regression to the mean,” which discounts Morse’s personal performance according to how much we’ve seen him play. With few plate appearances, our best guess would be that he’s something like “league average” and we’d care very little about the numbers he has personally put up. With lots of plate appearances, we’d have a lot of faith that his personal numbers were more indicative of his talent than a league average player’s numbers. If he’s somewhere in between on appearances, we’d weigh the two (his versus the league average) by regressing to the mean.
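One common way to do that weighing is a simple shrinkage formula: treat the league average as a block of “phantom” plate appearances and mix it with the player’s own. The sketch below is just that, not a full projection system. The league average of .260 and the 1,000 phantom plate appearances are made-up values for illustration, not real estimates, and the names `regress_to_mean` and `stabilizer` are hypothetical.

```python
def regress_to_mean(observed_avg, plate_appearances, league_avg, stabilizer):
    """Shrink a player's observed average toward the league average.

    stabilizer is an assumed number of league-average plate appearances
    mixed in with the player's own; a bigger value means more regression.
    """
    weight = plate_appearances / (plate_appearances + stabilizer)
    return weight * observed_avg + (1 - weight) * league_avg


# Illustrative values only: a .260 league and a stabilizer of 1,000.
for pa in (0, 100, 1000, 10000):
    estimate = regress_to_mean(0.290, pa, league_avg=0.260, stabilizer=1000)
    print(pa, f"{estimate:.3f}")
```

With no plate appearances the guess is just the league average, and as the plate appearances pile up the guess moves toward what the player has actually done.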