Who’s going to win the MVP?

In The Hardball Times Annual this year (pre-order here), I have an article on predicting the MVP and Cy Young awards. I built a model in the article (actually, a few separate models), which tends to do a pretty good job, getting the winner right around two-thirds of the time (a little worse for the MVP and a little better for the Cy Young). Just to give you an idea: Last year, the model correctly predicted Cliff Lee’s Cy Young and Albert Pujols’ MVP; saw the National League Cy Young race as a dead heat between Brandon Webb and Tim Lincecum; and in the American League, it had Dustin Pedroia first among hitters, though it actually thought the award would go to Lee.

I thought it might be interesting to look at what the model has to say for this year’s races, and we can of course check its predictions once the awards are announced in November. Still, there’s nothing like jumping the gun and doing a little educated speculating…

The MVP awards are probably a little easier to project this year than the Cy Young awards, though it is usually the other way around. In the American League, the favorite appears to be Joe Mauer, though the model would have been horribly confounded had the Twins not made the postseason:

Rank  Name              MVP points
   1. Joe Mauer         677
   2. Derek Jeter       297
   3. Mark Teixeira     224
   4. Robinson Cano     153
   5. Kendry Morales    137
   6. Felix Hernandez   115
   7. Miguel Cabrera    110
   8. Jason Kubel        89
   9. Alex Rodriguez     84
  10. Jason Bay          82

To be clear, the model is based on a 1,000-point scale, where 1,000 points means a player got all the first place votes. You can see that Mauer comfortably leads the rest of the field, but had the Tigers won the division, he actually would have dropped into second place, with 275 points, while Miguel Cabrera would have shot up to third with 270. Because the model is exponential (which magnifies small differences), that would have meant that the top four candidates were basically indistinguishable. Luckily, with the Twins in the playoffs, the model picks a clear winner.
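To see why an exponential model magnifies small differences, here is a minimal Python sketch. It is not the model from the Annual: the function `predicted_points`, the `rate` value, and the player scores are all made up for illustration. The only thing it assumes is that points grow exponentially with some underlying score.

```python
import math


def predicted_points(score, rate=1.5):
    """Hypothetical exponential link: points = exp(rate * score).

    Both the functional form and the rate are assumptions for illustration,
    not the coefficients of the actual model.
    """
    return math.exp(rate * score)


# Made-up underlying scores, evenly spaced 0.5 apart.
scores = {"Player A": 4.0, "Player B": 3.5, "Player C": 3.0}

# Convert each underlying score into predicted points.
points = {name: predicted_points(score) for name, score in scores.items()}

for name, value in points.items():
    print(f"{name}: score {scores[name]:.1f} -> {value:.0f} points")
```

With evenly spaced scores, each player ends up with a fixed multiple of the next player's points, so the same small edge in the underlying score is worth far more points near the top of the list than further down.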

The rest of the top-10 on this list is pretty reasonable, though one name in particular came as a surprise to me: Jason Kubel. In fact, Kubel had a very good season, hitting .300 with 28 home runs and 103 RBI, with a sabermetric-friendly .907 OPS. I don’t know if he’ll finish eighth in MVP voting, but he’s had himself quite a season.

Let’s move on to the National League MVP, which should turn out about the same way it did last year, at least at the very top:

Rank  Name              MVP points
   1. Albert Pujols     1,405
   2. Ryan Howard         240
   3. Prince Fielder      182
   4. Hanley Ramirez      175
   5. Ryan Braun          119
   6. Chase Utley         115
   7. Matt Kemp            95
   8. Derrek Lee           74
   9. Jayson Werth         65
  10. Adam Wainwright      61

If Albert Pujols doesn’t win, it will be a bigger upset than this year’s Nobel Peace Prize. As I said, the predicted values are technically supposed to top out at 1,000 points, but really great performances sometimes go over. Really great pretty much sums up Pujols’ 2009 season: a .327 batting average, 47 home runs, 135 RBI, 124 runs, and a 1.101 OPS. By any measure, Pujols was the best player in the National League this season.

By the way, Ryan Howard’s second-place finish underscores the fact that the model attempts to represent actual voter behavior, as opposed to what should be. Though Howard might not have even been one of the 10 most valuable players in the National League this year (though he was definitely close), his numbers—.279 average, 45 home runs, 141 RBI—are what MVP voters like to see (plus his team made the playoffs), and that should propel his MVP placement higher than Howard might deserve. Last year, the model predicted the same top two, and indeed that’s what we got.

Just like in the American League, one name on this list comes as a surprise to me: Jayson Werth. Though Werth has long been a valuable player, I had no idea that he was seventh in the National League in home runs (a category the voters seem to place a ton of weight on).

Predicting the AL Cy Young is where it gets a little hairy. Here’s how the model sees things:

Rank  Name              Cy Young points
   1. Felix Hernandez   603
   2. Zack Greinke      322
   3. Roy Halladay      239
   4. Justin Verlander  192
   5. CC Sabathia       180

Because the model is exponential, there’s a certain sweet spot where a player’s predicted total can just take off; Hernandez is not actually that far ahead of Greinke (or the other candidates for that matter), but in the final weeks of the season, he hit that sweet spot, and the model sees him as a pretty clear winner now.

I’m not so sure. Even though Hernandez went 19-5 with a 2.49 ERA, I think there will be a lot of support for Greinke, who was 16-8 with a 2.16 ERA. Greinke got a lot of press by starting off hot, and I believe that will help him, though it’s something the model has no way of knowing. If Hernandez had won just one more game, he’d definitely be the Cy Young winner, though.

Let’s look at the National League, which is also experiencing a muddled Cy Young race:

Rank  Name              Cy Young points
   1. Adam Wainwright   290
   2. Chris Carpenter   236
   3. Tim Lincecum      193
   4. Jonathan Broxton   80
   5. Josh Johnson       77

Adam Wainwright appears to be the favorite here, though it seems to me that Carpenter and Lincecum have a very good chance of winning as well (to be fair, the model likes all of them; it just likes Wainwright the best). Again, we see a pitcher who would be the odds-on favorite had he won another game, but without a 20-game winner the Cy Young race becomes much harder to predict.

We’ll check to see how the model did when the awards are announced a month from now. If history is any guide, it will probably get one or more of the races wrong, though it’s not unfathomable that it would hit on all four (in the past decade, it’s been four-for-four in 2002, 2005 and 2007).
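As a rough back-of-the-envelope check on those odds, the sketch below assumes the four races are independent and that each one is called correctly two-thirds of the time. Both are simplifications (the hit rate is a little lower for the MVP and a little higher for the Cy Young), and the variable names are mine.

```python
from fractions import Fraction

# Assumed per-race hit rate, taken from the "around two-thirds" figure.
hit_rate = Fraction(2, 3)

# Two MVP races plus two Cy Young races.
races = 4

# Assuming independence, the chance of a sweep is the product of the rates.
p_all_four = hit_rate ** races
p_at_least_one_miss = 1 - p_all_four

print(f"All four right: {p_all_four} (about {float(p_all_four):.0%})")
print(f"At least one miss: {p_at_least_one_miss} (about {float(p_at_least_one_miss):.0%})")
```

Under those assumptions, a clean sweep comes up about one year in five, so missing at least one race is the likelier outcome in any given season.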

Still, I’m glad to have gotten a peek at what the numbers have to say a month before the writers announce their decisions. If you are, too (and since you read to the end of this article, I suspect you are!), please consider buying The Hardball Times Annual 2010. You can read Dave Studeman’s sales pitch here, but I’ll summarize: The Annual is chock full of articles, graphics, and statistics that every baseball fan will enjoy (including a piece by yours truly describing the models used in this article), and the money we make from it plays a pivotal role in allowing us to continue publishing so much content every day here at The Hardball Times. If you do buy the Annual, please order it through our publisher as we get a lot more money that way (and the Annual will arrive at your doorstep sooner!).

(And just as a small extra incentive, if you send me a copy of your receipt, I’ll send you the full list of leaders for both 2008 and 2009. Just e-mail me, and I’ll respond promptly.)


Comments

  1. Paul Moehringer said...

    If the AL MVP is anyone but Joe Mauer, the award will become almost completely meaningless to me.

    To lead the league in batting average, on base percentage, slugging percentage and OPS as a catcher in addition to being a gold glove winner is simply incredible.

    I don’t know what more you could possibly ask for from a catcher than what Joe Mauer has been doing this year.

  2. Tim said...

    Does the model have a “Jeter” factor in it?  Anyone with the last name Jeter automatically gets a 200 point boost.

  3. David Gassko said...

    Hey Tim,

    That’s not actually the case at all. The model thought Jeter would win in 2006 (albeit slightly), and he didn’t. If anything, I would say the Jeter factor would be negative rather than positive.

  4. B Mills said...

    I think this article might get me to buy the THT Annual now.  This is something I’ve wanted to do using an approach I’ve used for Hall of Fame voting, but haven’t had the time to collect the data.  I’m curious what kind of model you used (though I bet you’re going to tell me to go buy the Annual and see aren’t you).

  5. Chris said...

    Cough, Ben Zobrist, cough, not on your list for AL MVP. Take a look at the fangraphs WAR leader board. He had, arguably, a better season than Pujols.

  6. David Gassko said...

    Maybe, Chris, but the model is based on what MVP voters look at, and WAR is not one of those things. Should Zobrist be on the list? Probably. But will he? Probably not.

  7. Alex Poterack said...

    Keep in mind, WRT to Zobrist not being there, the model is designed to figure out who the writers will vote for, not who’s the most deserving.  Zobrist definitely had a fantastic season, but just not in the sorts of categories the writers pay attention to.  It doesn’t devalue his accomplishments at all, just gives you an idea what writers look at.

  8. Drew said...

    Paul Moehringer – agreed, and he wasn’t just throwing stats into a hole, he drove his team to the playoffs.

    As for the “Nobel” comment, all awards are relative; a weak field can produce an unexpected recipient.

  9. Alex Poterack said...

    BNPP would actually be a fairly competitive award this year; I think Utley, Hanley Ramirez, and Prince Fielder could all lay similar claims to it

  10. lexomatic said...

    I’m surprised Kubel showed up instead of Lind.. I’m guessing it’s the playoff team effect? Lind did have higher counting and rate stats although they basically had identical years.

  11. ecp said...

    I’m not so sure about 20 wins being magical for the Cy Young any more; it didn’t do much for Josh Beckett in 2007.  Beckett had 20 wins, Sabathia 19, but Sabathia won the Cy.  All their other numbers were virtually identical.

    It’s interesting that all the predictive models say Hernandez will win in the AL, but the designers of said models don’t agree with their creations.  Everybody just feels that it’s going to be Greinke.  Is it time to redesign the models to value wins & losses a little differently, or is this just an aberrational year?

  12. Adam W said...

    Ben Zobrist really makes for an interesting discussion this year – how often are we forced to weight utility vs. catcher defense as the deciding factor?

  13. CH said...

    I agree that the debate (Zobrist vs. Mauer) would have seemed highly unlikely at the beginning of the year, and still seems almost laughable to anyone who doesn’t understand/trust UZR and WAR.

    Zobrist’s WAR is driven by his insane UZR results at both 2B and RF.  Personally, I take his UZR (and therefore WAR) for this year with a grain of salt.  He had never played more than 8 games at 2B in any previous year, and he had never played more than 2 games in RF in any previous year.

    To me, 61 career games in RF and 99 career games at 2B aren’t enough of a sample size to say that his UZR deserves to be as high as it is. 

    Zobrist certainly deserves to be in the top 10, and probably even the top 3.  But if we look at JUST batting runs, Fangraphs has Mauer at 56.0 whereas Zobrist is at 40.2.  That seems closer to how the vote will actually turn out.  Zobrist will still get shafted by the writers regardless, by being placed below the several Yankees, but I don’t think anyone can say Mauer “stole the award from him” or anything like that.

    Just my opinion.
