All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
THT's Fantasy Archives
Tuesday, February 09, 2010

The stats we target

For someone who writes about fantasy baseball, ADP (Average Draft Position) is a fun statistic. For instance, something as simple as graphing ADP against itself can help visualize some aspects of what occurs during a draft. These ADP data, by the way, are from Yahoo drafts for the 2008 season, meaning the drafts occurred before the season began.

The interesting part of this graph is not where the dots are located, but their distance from each other. They are relatively bunched at the edges and less dense in the middle, which reinforces my sentiment in this article: drafting in the middle rounds is the most difficult. Fantasy baseballers cannot agree where to take players in these rounds, and therefore few players end up with an average draft position in the 100s. Because the end of a draft is more a question of "who" to take than "where," you end up with the clustering you see after the 200 ADP mark.

Ostensibly, people drafted these players where they did because of the stats the players accumulated in the previous year. Comparing a player's 2007 numbers with his 2008 ADP can provide some insight into which of the fantasy stats we target the most in drafts. Before we get buried in numbers, though, let's first look at some graphs, starting with home runs, since I figure they will be an important determinant.

Graphs

This graph shows us that it is not imperative to hit a ton of home runs to be taken early, as depicted by the dots toward the lower left of the graph. Also, hitting around 25 home runs seems to be the magic number to get a hitter out of the 200+ ADP cluster, and from there a nicely defined linear slope brings us to Alex Rodriguez's 54 home runs in 2007 and his corresponding 1.2 ADP in 2008.

Next we will look at stolen bases, which might present a graph that looks radically different from the plateau-shaped home run graph.
This graph actually looks somewhat similar to the home run graph; it features the same basic shape, except with more players at the left extreme and fewer at the right. Simply looking at the graph, though, the dispersion appears more random, whereas the home run graph had a more visible downward slope.

Even more random than the stolen bases graph is the one comparing batting average to ADP. Since batting average is a rate stat, I increased the at-bat threshold to 400 to eliminate possible fluky batting averages attained over a couple of hundred at-bats.

Despite that, a player's batting average appears to have only a small effect on where he is drafted. Intuition tells me there must be some degree of correlation, but compared to home runs and stolen bases it appears to be small.

Last we will look at the graph of runs, which appear to correlate well with next year's ADP, although later we will find out that may not be the case.

As you can see, there is a well-defined, generally downward slope to the right, suggesting a correlation. Sometimes with graphs looks can be deceiving, as the next section will show.

Regression

Looking at pretty graphs is nice, but let's not get distracted from the purpose of the data. What the data can tell us is which of the five main fantasy stats have the largest impact on where a player gets drafted in the following year. For this I used a multivariate regression, or actually two multivariate regressions: one using the stats as counting stats, with average converted to hits, and a second with them as rate stats, so that, for example, home runs became home runs per at-bat. The results of the regressions are summarized in the following tables.
In the coefficients column, a lower (more negative) coefficient means the stat has a larger effect on ADP. So in counting form, home runs edge out stolen bases as the most influential stat, with runs and hits the least important. The "P-value" column shows the statistical significance of each coefficient, with anything under .05 counting as statistically significant, meaning home runs, RBI, and especially stolen bases pass the significance test. As I hinted before, runs were extraordinarily insignificant compared to the other stats.
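For readers curious what such a regression involves, here is a minimal sketch of an ordinary least squares fit in plain Python. It is not the regression actually run for this article: the stat lines are synthetic (randomly generated, not real players), the target ADPs are produced from the counting-stat coefficients quoted in the Todd Helton example with no noise added, and the helper names `solve_linear_system` and `fit_least_squares` are made up for illustration. It also skips p-values and R-squared, which a statistics package would report.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    # The leading 1.0 in each row lets the fit estimate an intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic stat lines (runs, HR, RBI, SB, hits); NOT real player data.
rng = random.Random(0)
stat_lines = [
    (rng.randint(40, 120), rng.randint(0, 50), rng.randint(30, 130),
     rng.randint(0, 60), rng.randint(90, 210))
    for _ in range(60)
]

# Intercept first, then one (negative) coefficient per stat, taken from the
# rounded values quoted in the article's worked example.
true_coefs = [370.6, -0.3829, -2.25, -1.1258, -2.1, -0.3875]
adps = [
    true_coefs[0] + sum(c * s for c, s in zip(true_coefs[1:], line))
    for line in stat_lines
]

fitted = fit_least_squares(stat_lines, adps)
print([round(c, 4) for c in fitted])
```

Because the synthetic ADPs contain no noise, the fit simply recovers the coefficients it was fed. Real draft data would be noisy, which is where the p-values discussed above come in.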
Once again home runs and stolen bases jump out as the big players, with batting average, not surprisingly, rising in importance since this is its home court, so to speak. And once again runs display their general lack of relevance.

The one part of these charts I have not mentioned yet is the coefficient of the intercept. The fun thing you can do with it is create a rough estimate of where a player will be drafted given his stat line for a season. Multiplying a player's stat in each category by that category's coefficient, adding those numbers up, and then subtracting the total from the intercept coefficient will generate a rough estimate of that player's ADP. For example, if you took Todd Helton's 2007 line of 86 runs, 17 homers, 91 RBI, no stolen bases, and 178 hits and plugged it in:

Estimated ADP = 370.6 - (86 * .3829) - (17 * 2.25) - (91 * 1.1258) - (0 * 2.1) - (178 * .3875) = 128.5

Helton's estimated ADP of 128.5 is remarkably close to his actual ADP that year of 135.4, given the crudeness of the model (using only one year of data from one website) and the fact that it does not take into account any positional adjustment.

This model worked well for this set of data, with an R-squared of .8, but that is not overly surprising considering the model was created from the 2007 season-2008 ADP data. At this point the ADP model probably will not work tremendously well for the 2009 season stats, but with a few more years of data added it could become an interesting tool for leagues that draft early in the offseason, or for some historical context on a player's ADP.

Concluding thoughts

I know this article does more to confirm what we might have already suspected (that home runs and steals are the most significant stats when it comes to determining ADP) than to provide new information, but there are still lessons to be taken away.

First, the insignificance of runs in the regressions points to a possible inefficiency in the fantasy marketplace.
People most likely assume runs are a byproduct of other skills and ignore them when ranking players. A system that would take into account position in the batting order, team runs per game, and of course the player's skill level could more accurately predict expected run totals and make rankings more accurate.

The xADP model I debuted is something that could become a powerful fantasy tool given a few more years of ADP data, and hopefully you saw a glimpse of that.

I'll end with a confession and display of gratitude to colleague Nick Steiner, who ran the multivariate regressions that spewed out the coefficient values that were instrumental to this article. I am more statistically illiterate than you might assume and do not have the savvy to run such regressions. I owe a big thanks to him for his time and effort.
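For anyone who wants to play with the estimated-ADP formula from the regression section, here is a minimal Python sketch of it. The function name `estimate_adp` and the dictionary keys are made up for illustration; the intercept and coefficients are the rounded counting-stat values quoted in the Todd Helton example. With those rounded values the arithmetic comes out to roughly 128.0 rather than the 128.5 quoted above, a small gap that presumably reflects rounding in the published coefficients.

```python
# Rounded counting-stat values as quoted in the Todd Helton example.
INTERCEPT = 370.6
COEFFICIENTS = {
    "runs": 0.3829,
    "home_runs": 2.25,
    "rbi": 1.1258,
    "stolen_bases": 2.1,
    "hits": 0.3875,
}


def estimate_adp(stat_line):
    """Rough ADP estimate: intercept minus the sum of coefficient * stat."""
    weighted_total = sum(
        COEFFICIENTS[stat] * stat_line[stat] for stat in COEFFICIENTS
    )
    return INTERCEPT - weighted_total


# Todd Helton's 2007 line, as given in the article.
helton_2007 = {
    "runs": 86,
    "home_runs": 17,
    "rbi": 91,
    "stolen_bases": 0,
    "hits": 178,
}

print(round(estimate_adp(helton_2007), 1))  # about 128.0 with these rounded coefficients
```

As the article notes, this ignores position and is built on a single year of data from one site, so treat any number it produces as a rough guide only.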