Thursday, January 29, 2009
Forecasting
Posted by Jonathan Halket at 2:23am
There are lots of fantasy baseball writers offering advice out there. Is it useful to listen to more than one? Some websites charge money for access and all of them use up some of your limited time, so you definitely want to pick your advisors carefully. Advice generally falls into one of two categories: strategy (example: “Don’t pay for saves”) or forecasts (example: “Albert Pujols’ injury concerns are overblown”). In this article, I will write a bit about forecasting and whether it is better to listen to the best, to the stupid or to the many.
Forecasting means predicting the future, a process that harbors two complications: life is random, so forecasts will never be exact (God does play dice with the universe), and we do not know the exact nature of this randomness (we don’t know what kind of dice God uses). Science is the search for truth: scientists want to know why things are the way they are, so they spend their time trying to figure out which dice God is playing with. Forecasters mostly care about their prediction being as close to the eventual realization as possible; they don’t care about why. (If they are going to make many more predictions, they may want to learn why so that they can do better in the future. But for just a one-shot prediction, forecasters don’t care about science.)
To see why this distinction matters, consider the following two baseball experts:
Enlightened Stathead has a huge statistical model, years and years’ worth of data, and a supercomputer on loan from NASA, which he uses to figure out which variables are statistically significantly correlated with other variables. However, Enlightened Stathead’s model is mis-specified: he’s missing a variable that adjusts for the fact that power has dropped in the league in the post-steroid era. This isn’t a mortal sin; all models are mis-specified in one way or another. If Enlightened Stathead forecasts using his NASA model, he will be wrong on average (in statistics, this is called being biased), but, depending on the type of model mis-specification, he won’t be “very wrong” on average (he has a low forecast standard error).
Opinionated Idiot has a much simpler way of forecasting. He watches each player play once in Spring Training and forecasts based only on that data. If a pitcher strikes out the side against the Marlins B-squad, he’s projected to be a superstar for the rest of the year. Note that Opinionated Idiot will likely have a very high forecast standard error but very little bias.
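To make the difference between the two experts concrete, here is a minimal simulation sketch. The numbers are made up for illustration (errors measured in home runs per season, a Stathead who overshoots by 3 on average, an Idiot whose misses have a spread of 12), and the helper names `simulate_errors` and `summarize` are hypothetical, not anything either expert actually uses.

```python
import random
import statistics

def simulate_errors(bias, noise_sd, n_forecasts, rng):
    """Return forecast errors (forecast minus outcome) for one forecaster."""
    return [bias + rng.gauss(0.0, noise_sd) for _ in range(n_forecasts)]

def summarize(errors):
    """Return (bias, standard error, root-mean-square error) of the errors."""
    bias = statistics.fmean(errors)
    std_error = statistics.pstdev(errors)
    rmse = statistics.fmean([e * e for e in errors]) ** 0.5
    return bias, std_error, rmse

rng = random.Random(2009)  # fixed seed so the run is repeatable

# Hypothetical numbers, in home runs per season:
# the Stathead is off by 3 on average but rarely strays far from that;
# the Idiot is right on average but swings wildly from player to player.
stathead = summarize(simulate_errors(bias=3.0, noise_sd=2.0, n_forecasts=10_000, rng=rng))
idiot = summarize(simulate_errors(bias=0.0, noise_sd=12.0, n_forecasts=10_000, rng=rng))

for name, (bias, std_error, rmse) in [("Stathead", stathead), ("Idiot", idiot)]:
    print(f"{name:9s} bias={bias:6.2f}  std error={std_error:6.2f}  RMSE={rmse:6.2f}")
```

The last column is the typical size of a miss. It combines both problems, since the squared RMSE equals the squared bias plus the squared standard error. Under these assumed numbers the biased Stathead still misses by far less than the unbiased Idiot.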
What’s the tension? The Stathead uses lots and lots of data; indeed, he needs a lot of data in order to figure out what’s statistically important. This data is by definition historical; maybe he uses a century’s worth of data. The Idiot uses very little data and it is mostly of a recent vintage. With so little data, he can’t figure out much about the world he’s living in. But suppose the world he’s living in changes from time to time (God picks up a different set of dice), making old data potentially much less useful in understanding a new world. Like the Stathead, the Idiot doesn’t know that God has switched dice, but since he doesn’t use old data, he doesn’t care much anyway. In baseball terms, the Idiot might adjust much faster to changes in league-wide steroid use, or at the individual level, his forecasts might adjust much faster to the fact that a pitcher has added a splitter to his arsenal than the Stathead’s would.
So whom should you listen to? Depends on the type of baseball gods we have (and I don’t know). If the baseball gods are the type to pick up new sets of dice very often, then the Idiot may be better. If not, the Stathead will be better.
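Here is a rough sketch of that comparison under some strong assumptions of my own. The "Stathead" forecasts next season with the average of all past seasons, and the "Idiot" forecasts with only the most recent season. A new set of dice is modeled as a random jump in the true level that happens with probability `switch_prob` each season. The function name and every parameter are hypothetical choices for illustration.

```python
import random

def compare_forecasters(switch_prob, seasons=200, trials=200,
                        noise_sd=1.0, shift_sd=5.0, seed=0):
    """Return (stathead_rmse, idiot_rmse) for a given chance of new dice."""
    rng = random.Random(seed)
    stathead_sq = 0.0
    idiot_sq = 0.0
    count = 0
    for _trial in range(trials):
        level = 0.0                       # the true, unobserved talent level
        last = level + rng.gauss(0.0, noise_sd)
        total, n_seen = last, 1           # running sum of all past seasons
        for _season in range(seasons):
            stathead_forecast = total / n_seen   # uses all history
            idiot_forecast = last                # uses only the latest season
            if rng.random() < switch_prob:       # God picks up new dice
                level += rng.gauss(0.0, shift_sd)
            outcome = level + rng.gauss(0.0, noise_sd)
            stathead_sq += (stathead_forecast - outcome) ** 2
            idiot_sq += (idiot_forecast - outcome) ** 2
            count += 1
            last = outcome
            total += outcome
            n_seen += 1
    return (stathead_sq / count) ** 0.5, (idiot_sq / count) ** 0.5

stable_world = compare_forecasters(switch_prob=0.0)
changing_world = compare_forecasters(switch_prob=0.3)
print("no new dice:     stathead=%.2f idiot=%.2f" % stable_world)
print("frequent change: stathead=%.2f idiot=%.2f" % changing_world)
```

In this toy setup the Stathead has the smaller typical miss when the dice never change, and the Idiot has the smaller miss when they change often. Where the crossover falls depends entirely on the assumed jump size and noise, which is exactly the part we don't know.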
And how many of them should you listen to? This is an incredibly complex question. Here’s a rule of thumb:
If we live in a Stathead’s world (not many new dice), then it will usually be better to listen to lots of different experts and form some sort of prediction based on the average of their opinions. This is the case even if you know that some of the Statheads are better (lower forecast standard error) than others—it is wise to include all of them in your average.
If we live in an Idiot’s world, then you’re kinda up the creek without a paddle.
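The Stathead's-world half of that rule of thumb can be sketched with a small simulation. It assumes three hypothetical experts whose errors are unbiased and independent of one another, with spreads of 2.0, 2.5, and 3.0. Those numbers and the independence assumption are mine, and real experts who share data and methods will be more correlated than this.

```python
import random
import statistics

def rmse(errors):
    """Root-mean-square error: the typical size of a miss."""
    return statistics.fmean([e * e for e in errors]) ** 0.5

rng = random.Random(7)           # fixed seed so the run is repeatable
expert_sds = [2.0, 2.5, 3.0]     # hypothetical spreads; the first expert is the best
n_players = 20_000

best_alone = []   # errors if you listen only to the best expert
consensus = []    # errors if you average all three experts equally

for _ in range(n_players):
    errors = [rng.gauss(0.0, sd) for sd in expert_sds]
    best_alone.append(errors[0])
    consensus.append(statistics.fmean(errors))

print(f"best expert alone: {rmse(best_alone):.2f}")
print(f"simple average:    {rmse(consensus):.2f}")
```

Under these assumptions the plain average beats even the best single expert, because independent misses partly cancel. Two caveats apply. Averaging does nothing about a bias that all the experts share, and an equal-weight average can lose to the best expert if one of the others is bad enough, in which case weighting the better experts more heavily is the safer choice.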