
Wednesday, April 15, 2009

Are you smarter than a sabermetric spreadsheet? (Part 2)

Posted by Alex Zelvin at 2:35am

Last week I provided a points scoring system and asked readers to list the five outfielders they expected to score the most points on Friday, April 10. Among the readers who replied, the five most popular picks were Grady Sizemore (six picks), Alfonso Soriano (six picks), Curtis Granderson (six picks), Matt Holliday (four picks), and Manny Ramirez (four picks). The spreadsheet that I use to evaluate hitters based on their matchups, park, and other contextual factors came up with a very different list, agreeing with the readers only on Sizemore. The other outfielders it placed in its top five were Carlos Quentin (two reader picks), Jermaine Dye (one pick), Shane Victorino (one pick), and Vladimir Guerrero (zero picks).

So what accounted for the difference?

It looks like the readers favored players who are really good overall and, other than Ramirez, very well rounded. The computer put more of a focus on matchups and contextual factors. It gave Quentin and Dye a lot of credit for playing at home, in a hitters' park, against a really bad opposing starting pitcher (Dickey). Victorino, too, benefited from playing in an excellent hitters' park against a bad pitcher (Marquis). Marquis is also very easy to steal bases against, which is a substantial advantage for a fast player like Victorino. I assume that was overlooked by most (if not all) readers. I certainly wasn’t aware of it until I looked over the results of the spreadsheet calculations. In Guerrero’s case, I suspect that the readers may have actually done a better job in their evaluation than the spreadsheet. The calculations treated Wakefield as a really awful pitcher based on his K/9, BB/9 and GB% rates. What the calculations don't account for is that he’s a knuckleball pitcher, and knuckleballers tend to do better than their component statistics would suggest.

Among the other popular reader picks, I suspect that Granderson and Soriano may be worse players in this scoring system than many readers gave them credit for, due to their lack of walks. That’s actually one of the things that has really surprised me since I started maintaining the spreadsheet and using it for multiple game formats. There's a drastic difference between the values of players in different games, and in some cases a player may rank very high in one game and far lower in another. Just knowing that a player is "good" isn't enough. You need to know what they're worth in the scoring system that your league or contest uses.

Comparing the readers' picks with the spreadsheet picks really points out the advantages and disadvantages of relying on calculations rather than intuition for your picks in daily contests. While the spreadsheet can often do a better job measuring and balancing a number of different factors related to the players and the context of the game, it can only take into account factors that have been programmed into the statistical model. If you haven’t gotten around to including them, it won’t know about things like knuckleballs, rain, injuries, defensive replacements, and other factors that may occur infrequently or be hard to quantify. On balance, I think a good statistical model can outperform the intuition of any expert, but the model will do best when monitored by a knowledgeable person.

While we all know that one game is far too small a sample size to really answer the question of who was "right," it’s still fun to take a look at how the players in question performed. Here are the point totals for each of the nine players listed above:

Readers' picks:
Sizemore: 1
Soriano: -2
Granderson: 6
Holliday: 4
Ramirez: 4

Spreadsheet picks:
Sizemore: 1
Quentin: 8
Dye: -1
Victorino: 2
Guerrero: 4

It was almost a tie, as the spreadsheet's picks outscored the readers' picks 14 to 13. A side benefit of the spreadsheet’s focus on matchups is that it will often identify players who are available more cheaply than the stars that most people prefer.
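For anyone who wants to check the 14-to-13 tally, here is a minimal Python sketch that sums the point totals listed above. The dictionary and function names are made up for illustration; only the point values come from this post, and this is not the spreadsheet itself.

```python
# Point totals for Friday, April 10, copied from the lists in this post.
# The variable and function names are illustrative assumptions.
reader_picks = {
    "Sizemore": 1,
    "Soriano": -2,
    "Granderson": 6,
    "Holliday": 4,
    "Ramirez": 4,
}
spreadsheet_picks = {
    "Sizemore": 1,
    "Quentin": 8,
    "Dye": -1,
    "Victorino": 2,
    "Guerrero": 4,
}


def total_points(picks):
    """Add up the points scored by every player in a set of picks."""
    return sum(picks.values())


reader_total = total_points(reader_picks)
spreadsheet_total = total_points(spreadsheet_picks)

# Players who appear on both lists contribute equally to both totals.
shared = sorted(set(reader_picks) & set(spreadsheet_picks))

print("Readers:", reader_total)
print("Spreadsheet:", spreadsheet_total)
print("On both lists:", shared)
```

Since Sizemore is on both lists, he adds the same point to each side, so the one-point margin comes entirely from the other four players on each list.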

Compete against Alex and other players in one-day fantasy baseball contests at FanDuel or visit his site, Daily Baseball Data, which has daily hour-by-hour weather forecasts for all games on one screen and batter vs. pitcher matchup data for the full day's schedule.



Comments

Mike Clay said...

Woo! I beat it with 17. Thank you Carlos Lee.

Posted 04/15  at  09:07 AM
Al said...

Yay, I got 18 with 4, with Bruce deciding not to play on Friday.

Posted 04/15  at  11:44 AM
Galen said...

Any chance of linking your spreadsheet for general consumption?

Posted 04/15  at  02:01 PM
CD Hoyt said...

Is your spread sheet available to the rest of us mere mortals???

Posted 04/16  at  07:12 PM