For a few years now, I’ve been running a set of baseball player forecasts that are affectionately called “The Marcels”, or unaffectionately called the “Marcel The Monkey Forecasting System.” The idea behind The Marcels is that anyone can look at the back of baseball cards and come up with a decent estimate for the upcoming season.
For every player, Marcel follows a three-step process:
1. Look at the performance of the player over the last three years, giving more weight to the most recent seasons.
2. Regress the player’s performance toward the league mean, based on the number of plate appearances or innings pitched. The more data, the less you regress.
3. Apply an age adjustment.
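The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5/4/3 season weights, the 1200 plate appearances of league-average padding, the peak age of 29 and the per-year age factors are values picked for the example, since the text above does not specify them, and `marcel_rate` is a hypothetical helper name.

```python
def marcel_rate(seasons, league_rate, age,
                weights=(5, 4, 3), regress_pa=1200, peak_age=29):
    """Forecast a per-plate-appearance rate (e.g. HR per PA).

    seasons: list of (events, plate_appearances), most recent season first.
    Every default parameter here is an illustrative assumption.
    """
    # Step 1: weight the last three seasons, most recent heaviest.
    weighted_events = sum(w * ev for w, (ev, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (ev, pa) in zip(weights, seasons))

    # Step 2: regress toward the league mean by mixing in a fixed amount
    # of league-average playing time. The more real data a player has,
    # the less this padding moves his rate.
    rate = (weighted_events + league_rate * regress_pa) / (weighted_pa + regress_pa)

    # Step 3: age adjustment (assumed shape: small boost before the peak
    # age, smaller penalty after it).
    if age < peak_age:
        adjustment = 1 + 0.006 * (peak_age - age)
    else:
        adjustment = 1 - 0.003 * (age - peak_age)
    return rate * adjustment


# Made-up player: 30, 25 and 20 home runs in 600 PA each, age 29.
example = marcel_rate([(30, 600), (25, 600), (20, 600)], league_rate=0.03, age=29)
print(f"forecast HR per PA: {example:.4f}")
```

With these assumed settings, a player with no major-league history at all comes out at exactly the league rate, which is the behavior step 2 describes: the less data, the more you regress.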
There are many, many problems with this. That’s why I call it a monkey system. I developed the worst forecasting system that you could possibly accept. And it takes minutes to run. Any other forecasting system out there, one that takes countless hours to design, develop and execute, had better be able to beat this system.
Interlude – One Year Ago
Tangotiger – Thursday, January 27 2005 @ 10:43 PM EST
Look at all the guys forecasted for 28 to 30 home runs: Adrian Beltre, Gary Sheffield, Carlos Delgado, Mark Teixeira, Andruw Jones, Alfonso Soriano, Miguel Tejada, Todd Helton, Lance Berkman, Paul Konerko, Rafael Palmeiro, Jeromy Burnitz, Carlos Beltran. Half of those guys will hit more than 29 home runs, and half will hit less.
But you, me, and everyone else have no idea which of them will hit 30 or 35 home runs. Bad luck, good luck, injuries, whatever … everything plays a role in this. Marcel’s best guess is that those 13 hitters will average 29 home runs. If you wanted Marcel to forecast the number of home runs without attaching names to it, that’d be a lot easier, and the range would be wider. Think of these forecasts as over/unders.
Interlude – One Month Ago
TangoTiger – Wednesday, January 18 2006 @ 04:49 PM EST
It’s one year later. OK, so I decided to find out what those 13 guys I listed did. Here they are:
Player              HR
Andruw Jones        51
Mark Teixeira       43
Paul Konerko        40
Alfonso Soriano     36
Gary Sheffield      34
Carlos Delgado      33
Miguel Tejada       26
Lance Berkman       24
Jeromy Burnitz      24
Todd Helton         20
Adrian Beltre       19
Rafael Palmeiro     18
Carlos Beltran      16
6 guys with more than 30, and 7 less than 28. Average? 29.5.
I think this illustrates the power and foolishness of forecasting. At the group level, forecasting systems are very powerful. Here we have results of 13 hitters that are as far apart as possible. And yet the mean of their actual results was virtually a match for the mean of the expectation. Did I just get lucky here? That first post last year was done because someone brought up Teixeira, who was forecast with 29 home runs.
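To make the group-versus-individual point concrete, here is a small check using the 13 actual home run totals listed above. One simplification is an assumption on my part: every hitter is treated as if he had been forecast for exactly 29, the group figure, when the individual forecasts actually ranged from 28 to 30.

```python
from statistics import mean

# Actual 2005 home runs for the 13 hitters listed above.
actual_hr = {
    "Andruw Jones": 51, "Mark Teixeira": 43, "Paul Konerko": 40,
    "Alfonso Soriano": 36, "Gary Sheffield": 34, "Carlos Delgado": 33,
    "Miguel Tejada": 26, "Lance Berkman": 24, "Jeromy Burnitz": 24,
    "Todd Helton": 20, "Adrian Beltre": 19, "Rafael Palmeiro": 18,
    "Carlos Beltran": 16,
}
forecast = 29  # assumption: the group figure stands in for each forecast

values = list(actual_hr.values())
group_mean = mean(values)
group_error = abs(group_mean - forecast)                    # miss on the group
individual_error = mean(abs(hr - forecast) for hr in values)  # typical miss per player
over = sum(hr > 30 for hr in values)
under = sum(hr < 28 for hr in values)

print(f"group mean:            {group_mean:.1f}")
print(f"group miss:            {group_error:.1f}")
print(f"average player miss:   {individual_error:.1f}")
print(f"more than 30 / less than 28: {over} / {under}")
```

Under that simplification, the group average misses by about half a home run, while the typical individual forecast misses by about nine.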
So, let’s try a few more, shall we? Let me throw a number out there: 90 RBIs. I swear I’m doing this as I’m writing this. There were 13 guys forecast for between 88 and 92 RBIs, and this is how they did in 2005:
Player              RBI
Mark Teixeira       144
Andruw Jones        128
Hideki Matsui       116
Carlos Lee          114
Jeff Kent           105
Paul Konerko        100
Aramis Ramirez       92
Hank Blalock         92
Jim Edmonds          89
Adrian Beltre        87
Todd Helton          79
Carlos Beltran       78
Barry Bonds          10
Average? 95. Not bad. Of course, some of you are thinking that the overall average was brought down by Bonds, but others could say the overall average was brought up by a few guys. The median was 92.
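If you want to see how much any one player pulls the average, a quick sketch with the RBI totals above does it. Dropping Bonds and dropping Teixeira are just two arbitrary what-ifs chosen for illustration, not part of the forecast itself.

```python
from statistics import mean, median

# Actual 2005 RBI for the 13 hitters listed above.
rbi = {
    "Mark Teixeira": 144, "Andruw Jones": 128, "Hideki Matsui": 116,
    "Carlos Lee": 114, "Jeff Kent": 105, "Paul Konerko": 100,
    "Aramis Ramirez": 92, "Hank Blalock": 92, "Jim Edmonds": 89,
    "Adrian Beltre": 87, "Todd Helton": 79, "Carlos Beltran": 78,
    "Barry Bonds": 10,
}

all_values = list(rbi.values())
without_bonds = [v for name, v in rbi.items() if name != "Barry Bonds"]
without_teixeira = [v for name, v in rbi.items() if name != "Mark Teixeira"]

print(f"mean, all 13:           {mean(all_values):.1f}")
print(f"mean, without Bonds:    {mean(without_bonds):.1f}")
print(f"mean, without Teixeira: {mean(without_teixeira):.1f}")
print(f"median, all 13:         {median(all_values)}")
```

Removing Bonds pushes the average up to 102, removing Teixeira pulls it down to about 91, and the median stays at 92 either way of looking at it, which is why it is the steadier number to quote for a group this small.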
The highest forecasted RBIs were 112 (Tejada), 110 (Pujols), and 108 (Ortiz). What is this, the 1980s? If you had wanted me to only forecast RBIs, and not tell you who would do it, I would have said 150. Why would I give a number like that? Because from 2001 to 2004, the four highest RBI totals were 160, 150, 146, 145. It would therefore be reasonable to think that the league leader will be around 150. The league leader in 2005 had 148 RBI. So, I would have been pretty close, as an over/under.
But, how sure could I have been that it would be Ortiz? You could come up with a reasonable list of 15 or 20 players who might lead the league in RBI. But, that’s not what we are trying to figure out. We are trying to come up with reasonable over/unders: numbers for which you could find equally good reasons for the player to over-perform or under-perform. Injuries, as we know with Bonds, can devastate any forecast.
Let’s look at pitchers. Marcel had 8 pitchers with 14 wins. Now, before I run this while you watch me, I have to say that one of the weakest parts of the forecast is wins. They’re heavily team-dependent, and Marcel doesn’t even look at a pitcher’s ERA to make its forecast. Marcel only looks at the win totals of the prior years. Anyway, let’s see what happens.
Player              Wins
Bartolo Colon       21
Mark Buehrle        16
Johan Santana       16
Pedro Martinez      15
Roger Clemens       13
Jason Schmidt       12
Curt Schilling       8
Russ Ortiz           5
Average? 13. Median? 14. You just never know when injuries hit.
OK, how about strikeouts? How about 150? There were 14 pitchers forecast for between 140 and 160 strikeouts. Let’s go to the tape.
Player              Strikeouts
Jake Peavy          216
Carlos Zambrano     202
Javier Vazquez      192
Mark Prior          188
Brandon Webb        172
Barry Zito          171
Josh Beckett        166
Livan Hernandez     147
Matt Clement        146
Freddy Garcia       146
Rich Harden         121
Ted Lilly            96
Kerry Wood           77
Kelvim Escobar       63
Average forecast? 151. Actual? 150.
Look, I’m getting as bored as you are. Let me try one last one, probably the hardest of them all: saves. It’s team-dependent and manager-dependent. Eleven pitchers were forecast for 20 to 29 saves.
Player              Saves
Trevor Hoffman      43
Jason Isringhausen  39
Billy Wagner        38
Francisco Cordero   37
Eddie Guardado      36
Jose Mesa           27
Armando Benitez     19
Keith Foulke        15
Danny Kolb          11
Troy Percival        8
Jorge Julio          0
Average forecast? 23. Actual saves? 25.
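For anyone who wants to redo the pitcher arithmetic, the three groups above can be checked in one pass. The actual totals come straight from the tables and the group forecast averages are the ones quoted in the text (14 wins, 151 strikeouts, 23 saves); the `groups` and `summary` names are just scaffolding for this sketch.

```python
from statistics import mean, median

# stat -> (average forecast quoted in the text, actual 2005 totals from the tables)
groups = {
    "wins": (14, [21, 16, 16, 15, 13, 12, 8, 5]),
    "strikeouts": (151, [216, 202, 192, 188, 172, 171, 166,
                         147, 146, 146, 121, 96, 77, 63]),
    "saves": (23, [43, 39, 38, 37, 36, 27, 19, 15, 11, 8, 0]),
}

summary = {}
for stat, (forecast_avg, actual) in groups.items():
    summary[stat] = {
        "forecast": forecast_avg,
        "actual_mean": mean(actual),
        "actual_median": median(actual),
        "low": min(actual),
        "high": max(actual),
    }
    row = summary[stat]
    print(f"{stat:<11} forecast {row['forecast']:>4}  "
          f"actual mean {row['actual_mean']:6.1f}  "
          f"median {row['actual_median']:6.1f}  "
          f"range {row['low']}-{row['high']}")
```

The range column is the part to keep in mind: each group average lands within a couple of units of its forecast even though the individual results run from 5 to 21 wins, 63 to 216 strikeouts, and 0 to 43 saves.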
So, what to do? Trust the forecast for a group of players, but don’t go betting on any one forecast. There’s not a single person in the whole world who can help you there. There’s no book, there’s no program, there’s nothing to help you with any single forecast. That’s why we play the darn game, and that’s why we love its drama.
While you keep your nose out of your spreadsheet, here’s the Marcel 2006 forecast for you to download.