Last week I provided a points scoring system and asked readers to list the five outfielders who they expected to score the most points on Friday, April 10. Of the readers who replied, the top five picks were Grady Sizemore (six picks), Alfonso Soriano (six picks), Curtis Granderson (six picks), Matt Holliday (four picks), and Manny Ramirez (four picks). The spreadsheet that I use to evaluate hitters based on their matchups, park, and other contextual factors came up with a very different list, only agreeing with the readers on Sizemore. The other outfielders it identified in the top five were Carlos Quentin (two reader picks), Jermaine Dye (one pick), Shane Victorino (one pick), and Vladimir Guerrero (zero picks).
So what accounted for the difference?
The players on the readers’ list tend to be stars and, with the exception of Ramirez, very well-rounded. The computer put more of a focus on matchups and contextual factors. It gave Quentin and Dye a lot of credit for playing at home, in a hitters’ park, against a really bad opposing starting pitcher (Dickey). Victorino, too, benefited from playing in an excellent hitters’ park against a bad pitcher (Marquis). Marquis is also very easy to steal bases against, which is a substantial advantage for a fast player like Victorino. I assume that was overlooked by most (if not all) readers; I certainly wasn’t aware of it until I looked over the results of the spreadsheet calculations. In Guerrero’s case, I suspect the readers may actually have done a better job than the spreadsheet. The calculations treated Wakefield as a really awful pitcher based on his K/9, BB/9, and GB% rates. What they don’t capture is that he’s a knuckleball pitcher, and knuckleballers tend to do better than their component statistics would suggest.
Among the other popular readers’ picks, I suspect that Granderson and Soriano may be worth less in this scoring system than many readers gave them credit for, due to their lack of walks. That’s actually one of the things that has really surprised me since I started maintaining the spreadsheet and using it for multiple game formats: there’s a drastic difference between the values of players in different games, and in some cases a player may rank very high in one game and far lower in another. Just knowing that a player is “good” isn’t enough; you need to know what he’s worth in the scoring system that your league or contest uses.
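To make the point concrete, here is a tiny illustration of how the same stat lines can rank differently under different scoring systems. The scoring weights, system names, and player stat lines below are all invented for the example; they are not the actual system or data discussed in this column.

```python
# Hypothetical illustration: identical stat lines scored under two
# made-up daily-game scoring systems. All weights and stats here are
# invented, not taken from the column's actual scoring system.

def score(stats, weights):
    """Total points for a stat line under a given scoring system."""
    return sum(weights.get(stat, 0) * count for stat, count in stats.items())

# System A rewards walks; System B ignores them entirely.
system_a = {"1B": 1, "2B": 2, "HR": 4, "BB": 1, "SB": 2}
system_b = {"1B": 1, "2B": 2, "HR": 4, "BB": 0, "SB": 2}

# Two invented players: a patient hitter and a free swinger.
patient = {"1B": 1, "BB": 2}
swinger = {"1B": 1, "HR": 1}

print(score(patient, system_a), score(swinger, system_a))  # 3 5
print(score(patient, system_b), score(swinger, system_b))  # 1 5
```

Under System A the patient hitter closes most of the gap on the free swinger; under System B his walks are worthless. A low-walk slugger looks much better in the second game than in the first, even though he is the same player.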
Comparing the readers’ picks with the spreadsheet’s picks highlights the advantages and disadvantages of relying on calculations rather than intuition in daily contests. While the spreadsheet can often do a better job of measuring and balancing the many factors related to the players and the context of the game, it can only account for factors that have been programmed into the statistical model. If you haven’t gotten around to including them, it won’t know about things like knuckleballs, rain, injuries, defensive replacements, and other factors that occur infrequently or are hard to quantify. On balance, I think a good statistical model can outperform the intuition of any expert, but it will do best when monitored by a knowledgeable person.
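The blind spot described above can be sketched in code. This is a minimal toy version assuming a simple multiplicative model; the factor names and multiplier values are invented for illustration and are not the actual spreadsheet’s logic.

```python
# Minimal sketch of a matchup-adjusted projection, assuming a simple
# multiplicative model. Factor names and values are invented for
# illustration; the column's actual spreadsheet logic is not shown.

def adjusted_projection(base_points, factors):
    """Scale a player's baseline points by contextual multipliers
    (park, opposing pitcher, home/away, etc.)."""
    proj = base_points
    for multiplier in factors.values():
        proj *= multiplier
    return round(proj, 2)

# Invented context: hitters' park, weak opposing starter, home game.
context = {"park": 1.10, "pitcher": 1.25, "home": 1.05}
print(adjusted_projection(8.0, context))  # 11.55

# The model only sees the factors present in `context`. A fact it was
# never given (say, that the "weak" pitcher is a knuckleballer whose
# component stats understate him) cannot affect the result.
```

The design makes the limitation obvious: the projection is exactly the product of whatever factors a person bothered to program in, which is why the model still needs a knowledgeable person watching over it.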
While we all know that one game is far too small a sample size to really answer the question of who was “right,” it’s still fun to take a look at how the players in question performed. Here are the point totals for each of the nine players listed above:
Almost a tie, as the spreadsheet outperformed the people 14 to 13. A side benefit of the spreadsheet’s focus on matchups is that it will often identify players who are available more cheaply than the stars that most people prefer.