In The Hardball Times Annual this year (pre-order here), I have an article on predicting the MVP and Cy Young awards. I built a model in the article (actually, a few separate models), which tends to do a pretty good job, getting the winner right around two-thirds of the time (a little worse for the MVP and a little better for the Cy Young). Just to give you an idea: Last year, the model correctly predicted Cliff Lee’s Cy Young and Albert Pujols’ MVP; saw the National League Cy Young race as a dead heat between Brandon Webb and Tim Lincecum; and in the American League, it had Dustin Pedroia first among hitters, though it actually thought the award would go to Lee.
I thought it might be interesting to look at what the model has to say for this year’s races, and we can of course check its predictions once the awards are announced in November. Still, there’s nothing like jumping the gun and doing a little educated speculating…
The MVP awards are probably a little easier to project this year than the Cy Young awards, though it is usually the other way around. In the American League, the favorite appears to be Joe Mauer, though the model would have been horribly confounded had the Twins not made the postseason:
Rank  Name             MVP points
1.    Joe Mauer        677
2.    Derek Jeter      297
3.    Mark Teixeira    224
4.    Robinson Cano    153
5.    Kendry Morales   137
6.    Felix Hernandez  115
7.    Miguel Cabrera   110
8.    Jason Kubel      89
9.    Alex Rodriguez   84
10.   Jason Bay        82
To be clear, the model is based on a 1,000-point scale, where 1,000 points means a player got all the first place votes. You can see that Mauer comfortably leads the rest of the field, but had the Tigers won the division, he actually would have dropped into second place, with 275 points, while Miguel Cabrera would have shot up to third with 270. Because the model is exponential (which magnifies small differences), that would have meant that the top four candidates were basically indistinguishable. Luckily, with the Twins in the playoffs, the model picks a clear winner.
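For readers who want to see the 1,000-point scale concretely, here is a minimal sketch in Python. It assumes the standard BBWAA MVP ballot, on which a first-place vote is worth 14 points, and a hypothetical 28-voter electorate. The function name `to_thousand_scale` is made up for illustration and is not taken from the model itself.

```python
def to_thousand_scale(raw_points, n_voters, first_place_value=14):
    """Rescale a raw ballot total so a unanimous first-place sweep equals 1,000.

    Assumption: every voter can award at most `first_place_value` points
    to a single player (14 on an MVP ballot).
    """
    max_points = n_voters * first_place_value
    return 1000 * raw_points / max_points


# A player named first on all 28 ballots collects 28 * 14 = 392 raw points.
print(to_thousand_scale(392, 28))
# A player with half of the maximum raw total lands at the midpoint of the scale.
print(to_thousand_scale(196, 28))
```

Putting every race on the same scale makes totals comparable across leagues and years, even when the number of voters differs.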
The rest of the top 10 on this list is pretty reasonable, though one name in particular came as a surprise to me: Jason Kubel. In fact, Kubel had a very good year, hitting .300 with 28 home runs and 103 RBI, with a sabermetric-friendly .907 OPS. I don’t know if he’ll finish eighth in MVP voting, but he’s had himself quite a season.
Let’s move on to the National League MVP, which should turn out about the same way it did last year, at least at the very top:
Rank  Name             MVP points
1.    Albert Pujols    1,405
2.    Ryan Howard      240
3.    Prince Fielder   182
4.    Hanley Ramirez   175
5.    Ryan Braun       119
6.    Chase Utley      115
7.    Matt Kemp        95
8.    Derrek Lee       74
9.    Jayson Werth     65
10.   Adam Wainwright  61
If Albert Pujols doesn’t win, it will be a bigger upset than this year’s Nobel Peace Prize. The scale is supposed to top out at 1,000 points, the total for a player who gets every first-place vote, but really great performances sometimes push the prediction past that. Really great pretty much sums up Pujols’ 2009 season: a .327 batting average, 47 home runs, 135 RBI, 124 runs, and a 1.101 OPS. By any measure, Pujols was the best player in the National League this season.
By the way, Ryan Howard’s second-place finish underscores the fact that the model attempts to represent actual voter behavior, as opposed to what should be. Howard might not have even been one of the 10 most valuable players in the National League this year (he was definitely close), but his numbers (.279 average, 45 home runs, 141 RBI) are what MVP voters like to see, and his team made the playoffs. That should propel his MVP placement higher than he might deserve. Last year, the model predicted the same top two, and indeed that’s what we got.
Just like in the American League, one name on this list comes as a surprise to me: Jayson Werth. Though Werth has long been a valuable player, I had no idea that he was seventh in the National League in home runs (a category the voters seem to place a ton of weight on).
Predicting the AL Cy Young is where it gets a little hairy. Here’s how the model sees things:
Rank  Name              Cy Young points
1.    Felix Hernandez   603
2.    Zack Greinke      322
3.    Roy Halladay      239
4.    Justin Verlander  192
5.    CC Sabathia       180
Because the model is exponential, there’s a certain sweet spot where a player’s predicted total can just take off; Hernandez is not actually that far ahead of Greinke (or the other candidates for that matter), but in the final weeks of the season, he hit that sweet spot, and the model sees him as a pretty clear winner now.
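To see why an exponential model has a “sweet spot,” here is a toy sketch in Python. The function `toy_points` and the input scores are hypothetical, chosen only to show the shape of the curve; they are not the fitted model or its coefficients.

```python
import math


def toy_points(score):
    """Hypothetical exponential link: predicted points grow as e**score."""
    return math.exp(score)


# Four candidates whose underlying scores are evenly spaced, one unit apart.
scores = [3.0, 4.0, 5.0, 6.0]
points = [toy_points(s) for s in scores]

# Gap in predicted points between each candidate and the next one up.
gaps = [later - earlier for earlier, later in zip(points, points[1:])]

for s, p in zip(scores, points):
    print(f"score {s:.1f} -> {p:7.1f} points")
print("gaps between neighbors:", [round(g, 1) for g in gaps])
```

Each step in the underlying score is the same size, yet every gap in points is larger than the one before it. That is how a pitcher who is only slightly ahead on the inputs can look like a runaway winner on the output.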
I’m not so sure. Even though Hernandez went 19-5 with a 2.49 ERA, I think there will be a lot of support for Greinke, who was 16-8 with a 2.16 ERA. Greinke got a lot of press by starting off hot, and I believe that will help him, though it’s something the model has no way of knowing. If Hernandez had won just one more game, he’d definitely be the Cy Young winner, though.
Let’s look at the National League, which is also experiencing a muddled Cy Young race:
Rank  Name              Cy Young points
1.    Adam Wainwright   290
2.    Chris Carpenter   236
3.    Tim Lincecum      193
4.    Jonathan Broxton  80
5.    Josh Johnson      77
Adam Wainwright appears to be the favorite here, though it seems to me that Carpenter and Lincecum have a very good chance of winning as well (to be fair, the model likes all of them; it just likes Wainwright the best). Again, we see a pitcher who would be the odds-on favorite had he won another game, but without a 20-game winner the Cy Young race becomes much harder to predict.
We’ll check to see how the model did when the awards are announced a month from now. If history is any guide, it will probably get one or more of the races wrong, though it’s not unfathomable that it would hit on all four (in the past decade, it’s been four-for-four in 2002, 2005 and 2007).
Still, I’m glad to have gotten a peek at what the numbers have to say a month before the writers announce their decisions. If you are, too (and since you read to the end of this article, I suspect you are!), please consider buying The Hardball Times Annual 2010. You can read Dave Studeman’s sales pitch here, but I’ll summarize: The Annual is chock full of articles, graphics, and statistics that every baseball fan will enjoy (including a piece by yours truly describing the models used in this article), and the money we make from it plays a pivotal role in allowing us to continue publishing so much content every day here at The Hardball Times. If you do buy the Annual, please order it through our publisher as we get a lot more money that way (and the Annual will arrive at your doorstep sooner!).
(And just as a small extra incentive, if you send me a copy of your receipt, I’ll send you the full list of leaders for both 2008 and 2009. Just e-mail me, and I’ll respond promptly.)