Today I’d like to take a second cut (in a second-best fashion) at John Burnson’s nice article from last week on near-sighted fantasy players. In his article, Burnson used two forecasting systems, Marcels and Near-sighted Marcels (NSM, for short), to find baseball players who are likely to be overvalued next year by fantasy players who focus mainly on this year’s performance. After reading his article, I was interested in answering two questions: How much worse off are near-sighted fantasy players? And is near-sightedness actually helpful in forecasting certain types of players?

As a reminder, Tom Tango’s Marcels forecasting system is intentionally simple (so simple that Marcel the Monkey could do it). It takes a weighted average of a player’s last three seasons and regresses it to the mean somewhat. All NSM does is put more weight on a player’s more recent performance than Marcels does. Here’s the percent weight that each system puts on past performance:

System | ’08 | ’07 | ’06 |

Marcels | 41.7 | 33.3 | 25.0 |

NSM | 80 | 15 | 5 |
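To make the weighting step concrete, here is a minimal Python sketch of the two weighted averages. It is an illustration only: the function name and the example player are made up, the Marcels weights are written as the 5/4/3 ratio that the percentages correspond to, and the sketch deliberately skips the regression-to-the-mean step that both systems apply.

```python
# Weights ordered (2008, 2007, 2006). Marcels uses a 5/4/3 ratio;
# NSM uses the 80/15/5 percentages. Both are normalized inside the function.
MARCELS_WEIGHTS = (5, 4, 3)
NSM_WEIGHTS = (80, 15, 5)


def weighted_ops(ops_by_year, weights):
    """Weighted average of three seasons of OPS, most recent season first.

    Simplification: no regression to the mean is applied here, so this is
    only the first step of either forecasting system.
    """
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, ops_by_year)) / total


# Hypothetical player who broke out in 2008: OPS of .900, .780, .760.
breakout = (0.900, 0.780, 0.760)
marcels = weighted_ops(breakout, MARCELS_WEIGHTS)
nsm = weighted_ops(breakout, NSM_WEIGHTS)
print(f"Marcels-style: {marcels:.3f}  NSM-style: {nsm:.3f}")
```

For a player whose best season is the most recent one, the NSM-style average sits closer to that season than the Marcels-style average does, which is exactly the near-sightedness at issue.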

How much better or worse is NSM for fantasy? To answer this, I have to make a few adjustments, some of which are harmless while others are necessary but unfortunate. First, I need a metric to measure performance. I’m going to use mean absolute deviation (MAD), also known as mean absolute forecast error. Another possible metric is root mean squared error (RMSE), but that’s for another day. I’m going to compare OPS, since that’s what John did in his article and it also saves me a few computational steps. Instead of forecasting 2010 as John did, I’m going to use 2006-2008 to forecast 2009 and compare the forecasts with the actual data we have so far (since OPS is a rate, it doesn’t really matter that the season isn’t complete yet).
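For reference, the two metrics mentioned above can be sketched in a few lines of Python. The OPS values below are invented purely to show how the two metrics treat one large miss; they are not taken from the data used in this article.

```python
import math


def mad(forecasts, actuals):
    """Mean absolute deviation between forecasts and outcomes."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return sum(abs(e) for e in errors) / len(errors)


def rmse(forecasts, actuals):
    """Root mean squared error; squaring penalizes large misses more heavily."""
    errors = [f - a for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up OPS values for three players; the third forecast misses badly.
forecast_ops = [0.800, 0.750, 0.900]
actual_ops = [0.780, 0.760, 0.750]

print(f"MAD:  {mad(forecast_ops, actual_ops):.3f}")
print(f"RMSE: {rmse(forecast_ops, actual_ops):.3f}")
```

Because RMSE squares each error before averaging, a single big miss moves it more than it moves MAD, so the two metrics can rank forecasting systems differently.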

Next I have to “fantasize” the OPS forecasts and the sample. Fantasizing the sample means dropping players who are projected to have too few plate appearances or to perform too poorly to be fantasy relevant in most leagues. For today, I dropped any player with fewer than 200 projected plate appearances or a projected OPS below .700 (for these projections I used Marcels only, so that I would have a consistent sample across forecasting systems; nothing really changes if you use NSM for this step instead). Alas, this introduces the possibility of sample selection bias. Just as John did, I also excluded players without three years of usable data.
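The sample filter described above can be sketched as follows. The record layout and field names are assumptions made for this illustration, and the four players are invented.

```python
MIN_PROJECTED_PA = 200      # drop players projected for fewer plate appearances
MIN_PROJECTED_OPS = 0.700   # drop players projected below this OPS
REQUIRED_SEASONS = 3        # need three years of usable data

# Hypothetical records; the field names are assumptions for this sketch.
players = [
    {"name": "A", "proj_pa": 550, "proj_ops": 0.820, "seasons": 3},
    {"name": "B", "proj_pa": 150, "proj_ops": 0.790, "seasons": 3},
    {"name": "C", "proj_pa": 480, "proj_ops": 0.680, "seasons": 3},
    {"name": "D", "proj_pa": 600, "proj_ops": 0.850, "seasons": 2},
]


def fantasy_relevant(player):
    """True if the player passes all three sample filters."""
    return (player["proj_pa"] >= MIN_PROJECTED_PA
            and player["proj_ops"] >= MIN_PROJECTED_OPS
            and player["seasons"] >= REQUIRED_SEASONS)


sample = [p for p in players if fantasy_relevant(p)]
print([p["name"] for p in sample])
```

Note that the filter uses projected values, not actual 2009 results, which is what opens the door to the selection bias mentioned above.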

Fantasizing the forecasts means subtracting the mean from each system’s forecasts. Player values are based on performance relative to league average, so that is how we should measure the value of our forecast systems (though in this case the average is a sample average, not a league average). Lastly, I’m going to weight each player by the number of plate appearances he has actually had in 2009 (though this doesn’t make a qualitative difference for any result).
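Here is one way the de-meaning and weighting could be sketched in Python. Two details are assumptions on my part for the sake of the example: the actual results are de-meaned the same way as the forecasts, and the means themselves are simple (unweighted) sample averages. The numbers are invented.

```python
def demean(values):
    """Express each value relative to the sample average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def weighted_mad(forecasts, actuals, weights):
    """MAD of de-meaned forecasts vs. de-meaned outcomes, weighted per player.

    Assumption: both series are de-meaned with their own unweighted
    sample mean before the errors are computed.
    """
    f_rel = demean(forecasts)
    a_rel = demean(actuals)
    total = sum(weights)
    return sum(w * abs(f - a) for f, a, w in zip(f_rel, a_rel, weights)) / total


# Made-up example: three players, weighted by actual 2009 plate appearances.
forecast_ops = [0.850, 0.800, 0.750]
actual_ops = [0.800, 0.820, 0.690]
pa_2009 = [500, 300, 200]

result = weighted_mad(forecast_ops, actual_ops, pa_2009)
print(f"Weighted MAD: {result:.3f}")
```

De-meaning matters because a system that is uniformly too optimistic about everyone still ranks players correctly, and rankings are what drive fantasy value.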

So what’s the MAD?

System | MAD(OPS) |

Marcels | 81.8 |

NSM | 83.2 |

Indeed, NSM has a higher MAD and thus performs worse than Marcels. But the difference is numerically negligible and not statistically significant. It is not significant because, frankly, reweighting doesn’t do all that much compared to the overall forecast errors inherent in either system. Both NSM and Marcels “regress to the mean” of league performance identically, a step that probably accounts for a large part of either system’s success.

I will follow Burnson in calling the difference between the two forecasts for a player (NSM minus Marcels) that player’s “sentiment”. A positive sentiment means NSM is more optimistic about the stat than Marcels is. One thing I wondered was how sentiment varies by age and whether forecast performance varies by age as well.

Age | Sentiment | MAD(Marcels) | MAD(NSM) |

24 | 1.4 | 110 | 110 |

25 | 5.8 | 79 | 76 |

26 | 7.4 | 81 | 88 |

27 | 11.0 | 74 | 68 |

28 | -1.7 | 76 | 78 |

29 | 0.0 | 68 | 61 |

30 | 3.8 | 102 | 98 |

31 | -1.7 | 71 | 81 |

32 | -3.7 | 100 | 114 |

33 | -2.5 | 75 | 74 |

34 | -5.0 | 60 | 55 |

35 | -4.7 | 100 | 102 |

36 | -0.4 | 118 | 121 |

37 | 4.4 | 87 | 88 |

38 | -3.1 | 98 | 118 |

We can see a slight age-based pattern here: NSM is sentimental about younger players, whereas Marcels is more “nostalgic” about older players. Interestingly enough, this sentimentality and nostalgia are somewhat appropriate: NSM does a little better at forecasting the younger players, while Marcels does better with the older ones. In neither case is the difference statistically significant, though.
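A by-age breakdown like the one in the table can be sketched as below. The rows are invented, the sketch skips the de-meaning and plate-appearance weighting described earlier, and it assumes the table’s figures are in thousandths of OPS, so treat it as an outline of the bookkeeping and nothing more.

```python
from collections import defaultdict

# Hypothetical rows: (age, marcels_forecast, nsm_forecast, actual_ops).
rows = [
    (25, 0.780, 0.800, 0.810),
    (25, 0.760, 0.770, 0.740),
    (34, 0.820, 0.800, 0.830),
    (34, 0.790, 0.780, 0.800),
]

# Collect per-player (sentiment, Marcels error, NSM error) by age.
by_age = defaultdict(list)
for age, marcels, nsm, actual in rows:
    sentiment = nsm - marcels  # positive means NSM is the more optimistic one
    by_age[age].append((sentiment, abs(marcels - actual), abs(nsm - actual)))

# Average each column within an age group, in thousandths of OPS (assumed unit).
summary = {}
for age, values in sorted(by_age.items()):
    count = len(values)
    summary[age] = tuple(1000 * sum(column) / count for column in zip(*values))
    sentiment, mad_marcels, mad_nsm = summary[age]
    print(f"{age} | {sentiment:.1f} | {mad_marcels:.0f} | {mad_nsm:.0f}")
```

With only a handful of players at each age, the per-age averages are noisy, which is consistent with the lack of statistical significance noted above.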

What about variation by player performance? Is NSM more sentimental about better players? As we can see in the scatterplot with the fitted line, the answer is a qualified yes (again there’s too much noise in the data for statistical significance). This isn’t terribly surprising. For NSM to be sentimental about a player, that player must have done particularly well in 2008 relative to his 2007 and 2006 campaigns. Some of this is luck perhaps, but to the extent that any of the improved performance is persistent, this luck/skill will carry over into 2009 as well.

To conclude: Being sentimental can hurt you, but being a sentimental monkey (that is, using Marcels with different weights) doesn’t hurt you that much and may actually help you with young players. That said, sentimental monkeys are pretty smart, since they regress to the mean. Tango made his monkey fairly simple-minded, and that includes his 5/4/3 weighting system. In his explanation of Marcels, Tango does not explain how he came up with the particular weights. Probably, in order to keep it simple, they are just a rule of thumb. So it would be interesting to see what the optimal weighting system would be (i.e., the one that provides the best forecasting performance under a given metric). It might not be all that far from NSM.
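As a rough idea of what such a search might look like, here is a brute-force sketch in Python that tries every weight triple on a 5% grid and keeps the one with the lowest MAD. Everything in it is hypothetical: the function names, the grid search itself (a real study might use a proper optimizer), and the toy data, which are built from known weights with no noise so that the search has something to recover. Regression to the mean is left out.

```python
def forecast(history, weights):
    """Weighted average of (2008, 2007, 2006) OPS; weights sum to 1."""
    return sum(w * x for w, x in zip(weights, history))


def mad(weights, histories, actuals):
    """Mean absolute deviation of the weighted forecasts from the outcomes."""
    errors = [abs(forecast(h, weights) - a) for h, a in zip(histories, actuals)]
    return sum(errors) / len(errors)


def best_weights(histories, actuals, step=5):
    """Try every weight triple on a grid of `step` percent; keep the lowest MAD."""
    best_score, best = None, None
    for w1 in range(0, 101, step):
        for w2 in range(0, 101 - w1, step):
            weights = (w1 / 100, w2 / 100, (100 - w1 - w2) / 100)
            score = mad(weights, histories, actuals)
            if best_score is None or score < best_score:
                best_score, best = score, weights
    return best, best_score


# Toy three-year OPS histories, most recent season first.
histories = [(0.900, 0.800, 0.700), (0.700, 0.800, 0.900),
             (0.800, 0.700, 0.900), (0.750, 0.850, 0.650)]
# Toy outcomes generated from known weights, with no noise added.
true_weights = (0.6, 0.3, 0.1)
actuals = [forecast(h, true_weights) for h in histories]

weights, score = best_weights(histories, actuals)
print(weights, score)
```

On real data the outcomes are noisy, so weights tuned on one season would need to be checked against a different season before anyone trusted them.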

Jeremy B. said...

Nice article, Jonathan,

I was thinking while reading this that different positions, particularly “Pitchers” versus “Batters”, using fantasy lingo, might have different optimal weights. Catchers might also have different optimal weights from the other “Batters”.

I don’t really have a hypothesis about what the different optimal weights would be for pitchers vs. batters, because looking at it one way, pitchers might supply around 1.25 days per week of data while hitters supply around 5 days per week of data, but looking at it another way, they both pitch/bat approximately the same number of at bats per week… except for catchers…

Toffer Peak said...

Jeremy B. – Not sure where I’ve read it but I’m pretty sure that Marcel already weights pitchers by 3/2/1 rather than 5/4/3 like he does for hitters.

I also agree with your observation that different positions (or at least catchers) should probably be weighted slightly differently. Catchers just seem to break down much more quickly. Same with large, oafish DHs/1Bs. I’m sure he won’t change it though in order to keep it as simple as possible. Maybe Zips, CHONE, etc. do consider these though.