Friday, February 20, 2009
Building a statistical model for Rotohog
Posted by Alex Zelvin at 1:01am
Rotohog Baseball is a fantasy baseball game with free entry, large prizes and a unique 'stock exchange' trading mechanism. Thousands of players compete in a global contest to see who can accumulate the most points. Like some "salary cap" baseball games, Rotohog gives you the opportunity to turn over your entire roster every day, greatly increasing the importance of taking into account factors such as opponent and park when determining your lineup.
Rotohog Baseball is scheduled to launch on Feb. 23. If the launch happens as scheduled, this will be the last column I write before we know the rules (and the prizes) for 2009. What can we work on in the meantime, without knowing the specifics? One productive use of our time is to work on whatever statistical models we’re going to use during the season.
No matter where your statistical model stands (including "yet to begin") there is always something you can do to improve it. Despite the fact that I won Rotohog Baseball last year, the gaps and mistakes that I’m aware of in my statistical model are so extensive and so obvious that I would be embarrassed if anyone saw some of them. I have a long list of improvements to make before the 2009 season begins, and I have to confess that I haven’t gotten to any of them yet. I assume that most other competitors’ statistical models are in even worse shape.
If you haven’t started building a statistical model for Rotohog yet, where should you begin? I’m going to describe a few steps to get you started. These aren’t going to make you one of the top teams overnight, but they’ll give you a good starting point onto which you can continue to make additions and improvements. I should also warn you that this is going to take some time and effort. That’s why now is the time to get everything prepared. Once the season starts, you’ll probably find that you just don’t have the time to devote to it.
Choose a set of projections. This is the easiest step. It’s also probably the least important to spend much time on. There have been a number of studies of the accuracy of various projection systems, and the findings have generally agreed on one thing: Most of the top systems are fairly close in accuracy.
For hitters last year I used ZiPS, which has usually ranked near the bottom of the group of leading projection systems. I’m not sure how much I’d gain by using a more accurate system. Most of the real differences between the systems come on younger players, many of whom won’t be good enough to figure into your Rotohog plans anyway. I did my own projections for pitchers, but I wonder how much that really contributed to my success. In any case, once you’ve settled on a set of projections, you’ll need to get the data into Excel (or some equivalent spreadsheet software).
Link schedules to player data. For your statistical models to have much value, you’re going to need to make adjustments on a daily basis. That means you’re going to need a source of MLB schedules that you can get into a spreadsheet relatively easily. I found that the format that works best for me is the one that is available on mlb.com. But even that one requires me to manually use the "text to columns" command in Excel to get the data into a usable format.
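The "text to columns" step can also be done outside the spreadsheet. Here is a minimal sketch in Python, assuming the schedule has been pasted as plain text with one game per line in a hypothetical "AWAY @ HOME" layout. The real mlb.com layout may well differ, and the function name, team codes and sample lines are made up for illustration.

```python
def parse_schedule(text):
    """Split pasted schedule text into one record per game.

    Assumes a hypothetical layout of 'AWAY @ HOME' per line; adjust the
    separator to match whatever your schedule source actually produces.
    """
    games = []
    for line in text.splitlines():
        line = line.strip()
        if not line or "@" not in line:
            continue  # skip blank lines and anything that is not a game
        away, home = (part.strip() for part in line.split("@", 1))
        games.append({"away": away, "home": home})
    return games


# Made-up sample input, not a real schedule.
sample = """
BOS @ NYY
COL @ SD
"""
games = parse_schedule(sample)
```

Each game comes out as a small record with an away team and a home team, which is the shape the later lookup steps need.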
To make contextual adjustments to your player projections for today’s game, you’re going to need the row for each player to reflect the schedule: who today’s opponent is, whether the team is home or away, and what park the team is playing in. I make heavy use of Excel’s VLOOKUP function for this. VLOOKUP is supposed to accept a range on a different worksheet if you prefix the range with the sheet name (for example, Schedule!A:D for a sheet named Schedule), but I’ve never gotten that to work, so I may be doing something wrong. If you run into the same trouble, copy the schedule (and projections) onto the same worksheet where you’ll be doing the calculations.
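For readers who would rather script this step than do it in a spreadsheet, here is a rough Python equivalent of the lookup. It is only a sketch: the function names, field names and team codes are invented for illustration, and it assumes one game per team per day and that the park can be identified by the home team.

```python
def build_team_context(games):
    """Map each team to today's opponent, home/away flag, and park.

    Plays the role of the lookup table that VLOOKUP would search.
    Assumes one game per team per day and that the park is identified
    by the home team.
    """
    context = {}
    for game in games:
        home, away = game["home"], game["away"]
        context[home] = {"opponent": away, "is_home": True, "park": home}
        context[away] = {"opponent": home, "is_home": False, "park": home}
    return context


def attach_schedule(players, games):
    """Return a copy of each player row with today's game context added."""
    context = build_team_context(games)
    rows = []
    for player in players:
        game = context.get(player["team"])  # None if the team is off today
        if game is None:
            continue
        rows.append({**player, **game})
    return rows


# Hypothetical data for illustration only.
games = [{"away": "BOS", "home": "NYY"}]
players = [
    {"name": "Player A", "team": "BOS"},
    {"name": "Player B", "team": "NYY"},
    {"name": "Player C", "team": "SEA"},  # no game today, so dropped
]
rows = attach_schedule(players, games)
```

Players whose team has no game simply drop out of the result, which is usually what you want when setting a daily lineup.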
Adjust component statistic projections. I make all my adjustments at the component level. So I figure out what impact I think each factor (park, opponent, etc.) will have on each statistic and adjust the projections to reflect that. In some cases I think I gain a lot by doing it this way; in other cases I’m probably wasting my time. If you’re a little less ambitious, you can just estimate how much each factor impacts Rotohog scoring as a whole and use that as a multiplier for your projected daily score for each player.
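To make the component-level idea concrete, here is a small Python sketch. It assumes the factors combine by simple multiplication, which is one reasonable choice and not necessarily the best one. Every number in it is an invented placeholder, not a measured park or opponent factor.

```python
def adjust_components(projection, factor_tables):
    """Scale each projected component stat by every applicable factor.

    projection: per-game projected stats, e.g. {"HR": 0.20, "BB": 0.50}
    factor_tables: list of dicts mapping stat name -> multiplier; a stat
    missing from a table is left unchanged (multiplier of 1.0).
    """
    adjusted = {}
    for stat, value in projection.items():
        for factors in factor_tables:
            value *= factors.get(stat, 1.0)
        adjusted[stat] = value
    return adjusted


# All numbers below are invented placeholders, not measured factors.
projection = {"HR": 0.20, "BB": 0.50}
park_factors = {"HR": 1.10}                  # hypothetical hitter-friendly park
opponent_factors = {"HR": 0.90, "BB": 1.20}  # hypothetical opposing pitcher

adjusted = adjust_components(projection, [park_factors, opponent_factors])
```

Keeping each factor in its own table makes it easy to add another one (weather, platoon splits, and so on) without touching the rest of the calculation.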
Calculate expected points. Use your adjusted projections for the day to calculate an estimated score for each player for that day. I do this as my last step, but if you’re not making component-level adjustments, then you’ll probably do this first, and then make adjustments directly to this estimate. Note that if you’re playing fantasy baseball formats other than Rotohog that also use daily transactions, you can calculate an expected daily score for each player in each of the scoring systems you’re interested in.
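Here is a sketch of that calculation in Python. Since the 2009 Rotohog rules aren't known yet, the scoring weights below are purely hypothetical, as are the game names and the projection itself. Substitute the real point values for each game you play.

```python
def expected_points(stats, weights):
    """Weighted sum of projected component stats under one scoring system."""
    return sum(weights.get(stat, 0.0) * value for stat, value in stats.items())


# Hypothetical scoring systems; substitute the real rules for each game.
scoring_systems = {
    "game_a": {"HR": 4.0, "BB": 1.0, "SO": -1.0},
    "game_b": {"HR": 10.0, "BB": 2.0},
}

# Made-up per-game projection for one hitter.
daily_projection = {"HR": 0.2, "BB": 0.5, "SO": 1.0}

scores = {
    name: expected_points(daily_projection, weights)
    for name, weights in scoring_systems.items()
}

# Simpler alternative: skip component-level adjustments and scale the
# estimate by a single overall multiplier (also an invented number).
overall_multiplier = 1.05
quick_estimate = scores["game_a"] * overall_multiplier
```

The same projection feeds every scoring system, so supporting another game is just one more entry in the table of weights.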
Compete against Alex and other players in one-day fantasy baseball contests at FanDuel, or visit his site, Daily Baseball Data, which has daily hour-by-hour weather forecasts for all games on one screen and batter vs. pitcher matchup data for the full day's schedule.