I’m way too interested in MLB awards. I’m well aware that they’re completely meaningless, as the voters don’t exactly look at the most advanced metrics, but they’re always the center of lively debate, which I enjoy thoroughly.
Because of my silly obsession, I’ve always found the Neyer/James Cy Young Predictor fascinating, and I find myself checking it regularly (its money is on C.C. Sabathia and Jake Peavy this year, by the way). I always wondered why no one had created something similar to predict who will win the MVP, and eventually I decided to take up the project myself.
I found out very quickly why no such thing existed. The MVP voting is much more complex than the Cy Young voting, which essentially comes down to innings, wins, ERA, and strikeouts. What makes the MVP trickier to predict is the voters’ additional bias toward contending teams, which is hard to quantify. What follows is my attempt to do so by regressing recent MVP tallies against various statistics, a system I’m going to go ahead and call MVP Tracker. To break down the MVP voting, I looked at players from 2002 to 2006 who finished a season with a top-10 OPS, a top-five finish in home runs, RBIs, or batting average, or a top-three finish in runs, which gave me 148 players in total.
Before I delve into this, please note that Barry Bonds was not included in this study. Bonds, with his .609 OBP and psychotic 22.2 RC/27, throws off the whole thing. When he’s included, the formula looks much different—it thinks the writers weigh OBP and RC/27 heavily. But when you take Bonds out, OBP and RC/27 are not even close to being relevant factors in the regression, which I think is a more accurate depiction of reality. If another guy comes along and draws 232 walks in a season, I’ll have to reevaluate this. I’m not going to hold my breath.
As one might expect, each of the Triple Crown stats is a main factor. My original system simply included batting average, but in looking at this year’s race, it seemed like the guys who had missed significant chunks of playing time (Chase Utley and Ryan Howard) were overrated by that formula. Since average is the only rate stat in the formula, I figured it should be the one I adjust for games played. I ended up with the following: ((GP/162)^2)*BA, which penalizes guys who missed time.
This year this hurts Utley the most, as his .332 BA over 131 games is worth only as much as a .299 BA over the full 162. The coefficient for this number is 642.9; the difference in MVP Tracker points between Matt Holliday (.340 BA in 157 games) and Ryan Howard (.268 BA in 144 games) in this category is 54.4 points.
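Concretely, the playing-time adjustment can be sketched in a couple of lines of Python; the function name is mine, but the coefficient and formula are the ones stated above:

```python
def adjusted_ba_points(ba, games, coef=642.9, season=162):
    # MVP Tracker's batting average term as stated: coef * ((GP/162)^2) * BA.
    # A player who appears in all 162 games keeps his full average;
    # missed games shrink the credit quadratically.
    return coef * (games / season) ** 2 * ba
```

A full season leaves the average untouched, so the gap between two full-time players is just the coefficient times their BA difference.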
Home runs are no surprise, and need no adjustment: simply 1.77 points for each home run. Prince Fielder and Howard make up some of the ground here that they lose in BA.
RBIs are broken up into two categories: one if your team makes the playoffs, one if it doesn’t. This makes both statistically significant, and it also illustrates the voters’ mindset: a run producer on a playoff team is more valuable than one on a cellar dweller. If your team makes the playoffs, each RBI is worth 0.84 points; if not, only 0.55.
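The split fits in a one-line helper (the name is hypothetical; the coefficients are the ones just stated):

```python
def rbi_points(rbi, made_playoffs):
    # Each RBI is worth more to the voters when the player's team reaches October.
    return (0.84 if made_playoffs else 0.55) * rbi
```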
For a long time, I didn’t include runs—every regression I ran showed them to be irrelevant. But I began to notice that my system consistently underestimated the voters’ interest in guys at the top of the order. So I ran the regression counting runs only for “table setters,” and it turns out that’s very significant.
My definition of “table setter” is pretty arbitrary: I honestly just went down the list of the 144 seasons I looked at, and did it from memory. Guys who fit into this category in past seasons included Alfonso Soriano, Jimmy Rollins, Derek Jeter, Ichiro Suzuki, and Johnny Damon. This year the relevant guys in this group are Hanley Ramirez and Jimmy Rollins. They get 0.16 points for each run they’ve scored, which helps them close the gap with the power hitters.
These last three factors are things that are hard to quantify, and not necessarily things I would have guessed were significant factors. But after seeing what my regression model spit out, I think they make a good deal of sense.
Included in the regression were nine seasons from Rockies players: four from Todd Helton, and one each from Garret Atkins, Matt Holliday, Larry Walker, Preston Wilson, and Vinny Castilla. I just added a dummy variable of “1” for those guys and “0” for everyone else. The coefficient is –26.3, so this year Holliday is penalized that amount for playing in Colorado.
I know, I know, I know: the voters don’t look at UZR. But what’s the theory, that a stat like this should agree with conventional wisdom 80% of the time? The voters do consider fielding, though not much. I used the UZRs from the three years prior to each season (e.g., for this year I used ’04–’06), as I figured that’s when players established their defensive reputations, which is quite obviously what the voters go by. This year the numbers range from Holliday’s +35 to Hanley’s –35. The coefficient is 0.2, so that’s a 14-point difference between the two.
Again, position is not something we think of as prevalent in the voters’ minds, but it does prove to factor in slightly. For the adjustment, I used the average WPA/PA for each position from 2004 to 2006 (which meshes nicely with the conventional wisdom on positional values); below are how many points each position gains or loses.
So those are the factors. If you’re interested, I uploaded the regression results here. The R-squared is 0.64. If you’d like to mess around with the Excel file I used to calculate all of this, feel free to e-mail me and I’ll be happy to send it over. I’m pretty happy with what I ended up with, but seeing as I was tinkering with it up until the last minute, there’s likely room for improvement. For example, I’d like to incorporate stolen bases and an extra penalty for designated hitters, but I haven’t been able to get statistically significant results for either.
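Putting all of the factors together, the whole system fits in one small function. This is just a sketch under my reading of the coefficients above: the function and parameter names are mine, and the positional adjustment is passed in directly since it’s a per-position lookup.

```python
def mvp_tracker(ba, games, hr, rbi, runs, made_playoffs=False,
                table_setter=False, plays_in_colorado=False,
                uzr_prior3=0.0, pos_adj=0.0):
    """Sum the regression terms described above into one MVP Tracker score.

    All names here are mine; the coefficients are the article's.
    """
    pts = 642.9 * (games / 162) ** 2 * ba            # playing-time-adjusted average
    pts += 1.77 * hr                                 # home runs, no adjustment
    pts += (0.84 if made_playoffs else 0.55) * rbi   # RBI, split by playoff status
    if table_setter:
        pts += 0.16 * runs                           # runs only count for table setters
    if plays_in_colorado:
        pts -= 26.3                                  # Rockies dummy variable
    pts += 0.2 * uzr_prior3                          # prior three years of UZR
    return pts + pos_adj                             # positional (WPA/PA) adjustment
```

As a sanity check, a .300 hitter over all 162 games with 30 homers and 100 RBIs on a playoff team comes out to just under 330 points before the smaller adjustments.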
Here’s how this year’s race shakes out according to MVP Tracker:
Not much to say there. Luckily, the NL race is very interesting, or else this would be a very anticlimactic article. The only real suspense left is whether Alex Rodriguez will win unanimously (and whether that can be considered suspense is entirely debatable); I don’t see any reason for him not to, especially since Magglio’s Tigers will be sitting at home in October.
Ortiz probably jumps out as being a bit high, and I agree—I doubt he’ll get more votes than Ordonez or Guerrero. Unfortunately, MVP Tracker isn’t aware that the media thinks Ortiz has a down year if he doesn’t hit five walk-off homers, even if he rakes at a clip of .332/.445/.621, with 52 doubles and 35 homers. Looking at the VORP leaders, A-Rod is actually third in MLVr, slightly behind both Ordonez and Ortiz. Although that obviously has nothing to do with the actual MVP voting.
The following table is not the final MVP Tracker projection for the NL. Rather, it’s what each player’s final number would have been if they had made the playoffs. Since the NL race was so tight, there was no way MVP Tracker was taking a guy whose team missed out on October, so this was the thing to look at.
As you can see, this race was decided in the season’s 163rd game. If Jimmy Rollins doesn’t end up winning the award, he has Trevor Hoffman to thank. Because of the Rockies’ dramatic win last night, MVP Tracker has Matt Holliday winning the award easily, even with the 26.3-point deduction for playing in Coors. If Colorado had lost on Monday, Holliday’s score would have dropped all the way down to 92.6, since he wouldn’t have received nearly as much credit for his 135 RBIs.
Here are the final NL Standings:
I think MVP Tracker will nail the top two here. After that, Howard will probably be hurt by his 199 strikeouts, and the fact that he was so much better last year, allowing Fielder to finish third.