Tuesday, February 09, 2010
The stats we target
Posted by Paul Singman at 8:29am
For someone who writes about fantasy baseball, ADP (Average Draft Position) is a fun statistic. For instance, something as simple as graphing ADP against itself can reveal aspects of what happens during a draft. These ADP data, by the way, come from Yahoo drafts for the 2008 season, meaning the drafts took place before that season began.
The interesting part of this graph is not where the dots are located but how far apart they are. The way they bunch together at the edges and thin out in the middle reinforces the point I made in this article: drafting in the middle rounds is the most difficult.
Fantasy baseballers cannot agree on where to take players in these rounds, so few players end up with an average draft position in the 100s. At the end of a draft the question is more "who" to take than "where," which produces the cluster you see past the 200 ADP mark.
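One rough way to put a number on "bunched at the edges, thin in the middle" is to count how many players land in each 50-pick range of ADP. The sketch below does that; the function name `adp_density`, the 50-pick bucket width, and the sample values are all made up for illustration and are not the Yahoo 2008 data.

```python
from collections import Counter


def adp_density(adps, width=50):
    """Count how many players have an ADP in each `width`-pick range.

    Keys are the start of each range (0, 50, 100, ...). Small counts in the
    middle ranges correspond to the sparse middle of the ADP graph.
    """
    counts = Counter(int(adp // width) * width for adp in adps)
    return dict(sorted(counts.items()))


# Hypothetical ADP values, for illustration only.
sample = [1.2, 2.0, 3.1, 4.5, 5.2, 8.0, 60.0, 75.0, 130.0, 170.0,
          205.0, 206.5, 207.0, 208.2, 230.0]
print(adp_density(sample))  # {0: 6, 50: 2, 100: 1, 150: 1, 200: 5}
```

With real data, the same count over the full player pool would show whether the middle ranges really are thinner than the ends.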
Ostensibly, people drafted these players where they did because of the stats the players accumulated the previous year. Comparing a player's 2007 numbers with his 2008 ADP can therefore give us some insight into which fantasy stats we target most in drafts. Before we get buried in numbers, though, let's look at some graphs, starting with home runs, since I figure they will be an important determinant.
This graph shows that a player does not need to hit a ton of home runs to be taken early, as the dots toward the lower left of the graph indicate. Also, hitting around 25 home runs seems to be the magic number for getting a hitter out of the 200+ ADP cluster, and from there a nicely defined linear slope leads to Alex Rodriguez's 54 home runs in 2007 and his corresponding 1.2 ADP in 2008.
Next we will look at stolen bases, which might present a graph that looks radically different from the plateau-shaped home run graph.
This graph actually looks somewhat similar to the home run graph: it has the same basic shape, except with more players at the left extreme and fewer at the right. To the eye, though, the dispersion looks more random, whereas the home run graph had a more visible downward slope.
Even more random than the stolen bases graph is the one comparing batting average to ADP.
Since batting average is a rate stat, I raised the at-bat threshold to 400 to weed out fluky batting averages compiled over only a couple of hundred at-bats. Even so, a player's batting average appears to have little effect on where he is drafted. Intuition tells me there must be some degree of correlation, but compared to that of home runs and stolen bases it appears small.
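The at-bat filter and the "some degree of correlation" intuition can be checked directly with a Pearson correlation between a prior-year stat and next-year ADP. This is a minimal sketch, not necessarily how the graphs here were produced; the player records, their field names (`ab`, `hr`, `adp`), and the helper names are hypothetical. A negative value means more of the stat goes with an earlier pick.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def stat_vs_adp(players, stat, min_ab=400):
    """Correlate a prior-year stat with next-year ADP, skipping players under min_ab."""
    qualified = [p for p in players if p["ab"] >= min_ab]
    return pearson([p[stat] for p in qualified], [p["adp"] for p in qualified])


# Hypothetical records; the 150-AB part-timer is excluded by the threshold.
players = [
    {"ab": 550, "hr": 40, "adp": 10.0},
    {"ab": 500, "hr": 30, "adp": 60.0},
    {"ab": 450, "hr": 20, "adp": 110.0},
    {"ab": 600, "hr": 10, "adp": 160.0},
    {"ab": 150, "hr": 12, "adp": 5.0},
]
print(round(stat_vs_adp(players, "hr"), 3))  # -1.0
```

Lowering `min_ab` would let the fluky part-timer back in and pull the correlation away from -1, which is exactly the kind of noise the 400 at-bat cutoff is meant to remove.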
Last, we will look at runs, which appear to correlate well with the next year's ADP, although we will later find that may not be the case.
As you can see there is a well-defined, generally downward slope to the right, suggesting a correlation. Sometimes with graphs looks can be deceiving, as the next section will show.
Looking at pretty graphs is nice, but let's not lose sight of the purpose of the data: identifying which of the five main fantasy stats has the largest impact on where a player gets drafted the following year. For this I used multivariate regression, two regressions actually: one treating the stats as counting stats, with batting average converted to hits, and one treating them as rate stats, so that, for example, home runs became home runs per at-bat. The results are summarized in the following tables.
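For readers who want to try this themselves, here is a minimal sketch of such a regression, assuming an ordinary least-squares fit with an intercept (the exact software and settings used for this article are not stated). The helper names `fit_ols` and `to_rates` are hypothetical. Plain NumPy gives the coefficients and R-squared; if you also want the P-values shown in the tables, a package such as statsmodels (whose `OLS` results expose a `pvalues` attribute) reports them.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept and
    the rest line up with the columns of X (e.g. R, HR, RBI, SB, H).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return coef, 1.0 - ss_res / ss_tot


def to_rates(counts, at_bats):
    """Divide each row of counting stats by that player's at-bats.

    For a hits column this yields batting average, matching the rate model.
    """
    counts = np.asarray(counts, dtype=float)
    return counts / np.asarray(at_bats, dtype=float)[:, None]
```

Calling `fit_ols(counts, adp)` and `fit_ols(to_rates(counts, at_bats), adp)` on the same players would produce the two versions, counting and rate, side by side.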
+-----------------------------------+
|            ~ COUNTING ~           |
+------+--------------+-------------+
| Stat | Coefficients | P-value     |
+------+--------------+-------------+
| Int. | 370.6356     | 1.6768 E-31 |
| R    | -0.3829      | 0.34664     |
| HR   | -2.2503      | 0.0093      |
| RBI  | -1.1258      | 0.0030      |
| SB   | -2.1020      | 8.6056 E-07 |
| Hits | -0.3875      | 0.1504      |
+------+--------------+-------------+
In the coefficients column, a more negative coefficient means the stat has a larger per-unit effect: holding the other stats constant, each additional unit moves a player's ADP earlier by roughly that many picks. So in counting form, home runs edge out stolen bases as the most influential, with runs and hits the least. The "P-value" column shows how statistically significant each coefficient is, with anything under .05 considered significant, meaning home runs, RBI, and especially stolen bases pass the test. As I hinted before, runs were extraordinarily insignificant compared to the other stats.
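The two readings of the counting table, per-unit effect from the coefficient and significance from the P-value, can be made explicit in a few lines. The values below are copied from the counting-stats table; the helper names `significant` and `by_effect` are hypothetical. Keep in mind that the coefficients are in different units (one home run is not one hit), so ranking them by size is only a rough guide.

```python
# (coefficient, P-value) pairs copied from the counting-stats table.
COUNTING = {
    "R":    (-0.3829, 0.34664),
    "HR":   (-2.2503, 0.0093),
    "RBI":  (-1.1258, 0.0030),
    "SB":   (-2.1020, 8.6056e-07),
    "Hits": (-0.3875, 0.1504),
}


def significant(table, alpha=0.05):
    """Stats whose P-value is below alpha, smallest P-value first."""
    passing = [(p, stat) for stat, (_, p) in table.items() if p < alpha]
    return [stat for _, stat in sorted(passing)]


def by_effect(table):
    """Stats ordered from most negative coefficient (earliest pick per unit)."""
    return sorted(table, key=lambda stat: table[stat][0])


print(significant(COUNTING))  # ['SB', 'RBI', 'HR']
print(by_effect(COUNTING))    # ['HR', 'SB', 'RBI', 'Hits', 'R']
```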
+-------------------------------------+
|              ~ RATE ~               |
+--------+--------------+-------------+
| Stat   | Coefficients | P-value     |
+--------+--------------+-------------+
| Int.   | 550.6223     | 4.2287 E-24 |
| R/AB   | -97.9406     | 0.6615      |
| HR/AB  | -1523.8494   | 0.0019      |
| RBI/AB | -608.0605    | 0.0045      |
| SB/AB  | -1578.2072   | 7.5421 E-10 |
| AVG    | -833.7461    | 2.9494 E-05 |
+--------+--------------+-------------+
Once again home runs and stolen bases jump out as the big players, with batting average, not surprisingly, rising in importance since the rate version is its home court, so to speak. And once again runs display their general lack of relevance.
The one part of these tables I have not yet mentioned is the intercept. A fun use for these numbers is creating a rough estimate of where a player will be drafted given his stat line for a season. Multiplying each of a player's stats by the size of its coefficient, adding those products, and subtracting the total from the intercept generates a rough estimate of that player's ADP. (Because the coefficients are negative, this is the same as adding each stat times its coefficient to the intercept.) For example, take Todd Helton's 2007 line of 86 runs, 17 homers, 91 RBI, no stolen bases, and 178 hits and plug it in:
Estimated ADP = 370.6 - (86 * .3829) - (17 * 2.25) - (91 * 1.1258) - (0 * 2.1) - (178 * .3875) = 128.0
Helton's estimated ADP of 128.0 is remarkably close to his actual ADP that year of 135.4, given the crudeness of the model (it uses only one year of data from one website) and the fact that it makes no positional adjustment. The model fit this data set well, with an R-squared of .8, but that is not overly surprising considering it was built from the same 2007 stats and 2008 ADP data it is being checked against. As it stands, the model probably will not work tremendously well for predicting ADP from 2009 season stats, but with a few more years of data added it could become an interesting tool for leagues that draft early in the offseason, or for historical context on a player's ADP.
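The Helton calculation can be written as a small function. This sketch simply hard-codes the intercept and coefficients from the counting-stats table; the names `INTERCEPT`, `COEFFICIENTS`, and `estimated_adp` are hypothetical. Using the coefficients exactly as printed in the table, Helton's line comes out to about 128.0.

```python
# Intercept and coefficients from the counting-stats regression
# (2007 stats vs. 2008 Yahoo ADP).
INTERCEPT = 370.6356
COEFFICIENTS = {"R": -0.3829, "HR": -2.2503, "RBI": -1.1258, "SB": -2.1020, "H": -0.3875}


def estimated_adp(line):
    """Intercept plus each stat times its (negative) coefficient.

    Adding a negative product is the same as subtracting the coefficient's
    magnitude times the stat, as in the hand calculation for Helton.
    """
    return INTERCEPT + sum(COEFFICIENTS[stat] * line[stat] for stat in COEFFICIENTS)


helton_2007 = {"R": 86, "HR": 17, "RBI": 91, "SB": 0, "H": 178}
print(round(estimated_adp(helton_2007), 1))  # 128.0
```

Because the model ignores position and was fit on a single season, treat the output as a ballpark figure rather than a prediction.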
I know this article does more to confirm what we might already have suspected (that home runs and steals matter most in determining ADP) than to provide new information, but there are still lessons to take away.
First, the insignificance of runs in the regressions points to a possible inefficiency in the fantasy marketplace. People most likely assume runs are a byproduct of other skills and ignore them when ranking players. A system that accounts for position in the batting order, team runs per game, and of course the player's own skill level could predict run totals more accurately and produce better rankings.
Second, the xADP model I debuted here could become a powerful fantasy tool given a few more years of ADP data, and hopefully you saw a glimpse of that.
I'll end with a confession and display of gratitude to colleague Nick Steiner, who ran the multivariate regressions that spewed out the coefficient values that were instrumental to this article. I am more statistically illiterate than you might assume and do not have the savvy to run such regressions. I owe a big thanks to him for his time and effort.
Paul has been managing fantasy baseball teams for many seasons and writing for THT Fantasy for the past three years. He is currently a student at UPenn and welcomes readers' thoughts by email or in the comments below.