Here’s the punchline: “Regressing to the mean” is different than “Mean reversion”. Lots of experts say “regression” when they should say “reversion”. Does it matter? Well, lots of experts aren’t very clear about the numbers either, so, yeah, I’d worry that they’re messing up the whole thing.

“Regressing to the mean” is a handy thing for accounting for small sample sizes and uncertainty. If you’re just listening and not doing your own projections, you hardly need to bother with this concept (but I’ll get back to it at the end).

“Mean reversion” is an absolutely essential idea to grasp. One way to think about it is “Luck fades”.

Consider these questions about Michael Morse, who’s current batting average is .300:

A) Suppose you know his true talent indicates that he’s a .280 hitter. What would you expect him to bat for the rest of the year?

Answer: You expect him to bat .280. Morse has gotten lucky with his .300 average. By definition, luck is unpredictable (for those statisticians out there: higher order statistics are predictable). Therefore what you expect, .280, is what you expect.

B) Suppose you know his true talent indicates that he’s a .280 hitter. What average would you expect him to have at the end of the year?

Answer: Assuming he’s about halfway through the season, Morse should probably finish with about a .290 average: half a season at .300 and half a season at .280.
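If you want to check that arithmetic yourself, here is a minimal Python sketch. The 300 at-bats in each half are made-up numbers chosen only to represent “half a season”; the function name is mine, not anything standard.

```python
def final_average(avg_so_far, ab_so_far, avg_rest, ab_rest):
    """Season-ending average: total hits divided by total at-bats."""
    hits = avg_so_far * ab_so_far + avg_rest * ab_rest
    return hits / (ab_so_far + ab_rest)


# Hypothetical split: 300 at-bats already played, 300 still to come.
print(f"{final_average(0.300, 300, 0.280, 300):.3f}")
```

With equal halves this lands on .290. If more of the season is already behind him, the final number sits closer to .300; if less, closer to .280.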

What get’s my goat is that I often hear experts say something like “There’s no way Morse is a .300 hitter. He’s getting lucky with his BABIP. He’s gonna regress to the mean and be batting .280 at the end of the season.”

The meaningful problem with this statement is that I have no idea what this expert thinks Morse is going to bat from this point forward. Does he think Morse will bat .280 from today until the end of the season? Or does he think Morse will bat about .260 (so that his average at the end of the season will be .280)? If you’re looking for expert opinions, there’s a big difference here.
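Where does the .260 come from? Here is a rough Python sketch that works backward from a claimed end-of-season average to the rest-of-season average it implies. Again, the 300 at-bats on each side of today are hypothetical, and the function name is just for illustration.

```python
def implied_rest_of_season(avg_so_far, ab_so_far, final_avg, ab_rest):
    """Average a hitter must post from here on to finish at final_avg."""
    total_hits_needed = final_avg * (ab_so_far + ab_rest)
    hits_so_far = avg_so_far * ab_so_far
    return (total_hits_needed - hits_so_far) / ab_rest


# Hypothetical: a .300 hitter through 300 at-bats, 300 at-bats remaining,
# and an expert who says he will be "batting .280 at the end of the season".
print(f"{implied_rest_of_season(0.300, 300, 0.280, 300):.3f}")
```

So “he’ll be at .280 at the end of the season” quietly claims he will hit .260 the rest of the way, which is a much stronger statement than “he’s a .280 hitter.”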

The semantic mistake, using “regression to the mean” when he should be using “mean reversion,” isn’t substantive. But combined with the expert’s statistical ambiguity, it should raise serious red flags.

Lastly consider this question:

C) You have no idea what Michael Morse’s true talent is. But in about 1,000 plate appearances, he’s batted just above .290. What is a good guess for his true talent?

Answer: Now you’d use “regression to the mean,” which discounts Morse’s personal performance according to how much we’ve seen him play. With few plate appearances, our best guess would be that he’s something like “league average” and we’d care very little about the numbers he has personally put up. With lots of plate appearances, we’d have a lot of faith that his personal numbers were more indicative of his talent than a league average player’s numbers. If he’s somewhere in between on appearances, we’d weigh the two (his versus the league average) by regressing to the mean.
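One common way to do that weighing is to treat the league average as if it were a fixed number of extra plate appearances added to the player’s record. The Python sketch below is only an illustration of that idea: the .260 league average and the 500 “prior” plate appearances are placeholder values I made up, not empirical constants, and a real projection system would estimate them from data.

```python
LEAGUE_AVG = 0.260  # placeholder league average (assumption)
PRIOR_PA = 500      # placeholder weight on the league average (assumption)


def regressed_estimate(observed_avg, pa, league_avg=LEAGUE_AVG, prior_pa=PRIOR_PA):
    """Weighted blend of a player's own average and the league average.

    The player's number gets weight pa; the league number gets weight prior_pa.
    """
    return (observed_avg * pa + league_avg * prior_pa) / (pa + prior_pa)


# The more plate appearances we have seen, the less we lean on the league.
for pa in (0, 100, 1000, 10000):
    print(pa, f"{regressed_estimate(0.290, pa):.3f}")
```

With no plate appearances the guess is simply the league average. As the sample grows, the guess moves toward the player’s own .290. The 1,000-appearance case falls in between, and exactly where depends on the placeholder weight you choose.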

chuck said...

so, in other words, you just don’t really know…

Chicago Mark said...

My head is spinning! Most “experts” would predict a .280 average ROS. I really don’t read anywhere that would expect him to be at .280 year end. And I agree with Chuck above. You and me and most others have little idea. Otherwise we’d all have predicted .300 correctly. I sure hope he regresses/reverts as I don’t have him in any league.

Dave Studeman said...

Thank you, Jonathan. Every time I hear someone say that a player will “regress to the mean” I grind my teeth a bit. You’ve expressed exactly why.

Regression is something you do in your analysis. Performance reverts to expected levels. Good distinction.

Andrew said...

An important distinction, but you made a couple grammatical mistakes which detracted from your point. Specifically, you should have used ‘whose’ instead of ‘who’s’ and ‘gets’ rather than ‘get’s.’