Wednesday, June 09, 2010
Confessions of a conspiracy theorist: who’s watching the watchers?
Posted by Derek Ambrosino at 4:17am
One of the fundamental elements of the great Cardrunners debate from earlier this year - and one of the few that we have not beaten into the ground - was that of whether the fantasy community has accurately determined the value of a player's production, even when we have the luxury of pricing it retrospectively. I think most of us just assume that when we look at our league provider's player rankings pages, the player who is ranked first overall has outproduced the player ranked second, and so forth. However, in reality there is some sort of back-end formula at work that makes certain assumptions and chooses what to value and how heavily. The formula is not gospel, though it remains largely unquestioned by the fantasy-playing universe.
Returning to the quants' point from the Cardrunners debate, perhaps they are right. If I removed the rankings from the following two stat lines, which line would you determine to have more absolute value?
Player A: .314 100 R 31 HR 112 RBI 7 SB over 620 ABs
Player B: .323 109 R 15 HR 79 RBI 32 SB over 640 ABs
Player A enjoys a substantial advantage in the power categories to the tune of 16 homers and 33 RBI. Meanwhile, Player B has a slim nine-run advantage, a slight nine-point advantage in AVG (sustained across 20 more ABs), and a large advantage in SBs to the tune of 25 steals. I would guess that Player B would come out ahead in the player rater, but I can’t be certain.
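The margins above can be tabulated with a short sketch. The dictionary layout and the `category_edges` helper are my own illustration, not anything a league provider publishes. Note that the output only restates who leads where; deciding which line is worth more still requires a weight for each category, and that weighting is exactly what we cannot see.

```python
# Made-up stat lines from the comparison above.
CATEGORIES = ("AVG", "R", "HR", "RBI", "SB", "AB")

player_a = {"AVG": 0.314, "R": 100, "HR": 31, "RBI": 112, "SB": 7, "AB": 620}
player_b = {"AVG": 0.323, "R": 109, "HR": 15, "RBI": 79, "SB": 32, "AB": 640}


def category_edges(a, b):
    """Return a's margin over b in each category (negative means b leads)."""
    return {cat: round(a[cat] - b[cat], 3) for cat in CATEGORIES}


edges = category_edges(player_a, player_b)
for cat, margin in edges.items():
    leader = "A" if margin > 0 else "B" if margin < 0 else "tie"
    print(f"{cat:>3}: {margin:+g} (leader: {leader})")
```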
These are totally made up lines, so there’s no looking up the real counterparts to these lines and checking their respective rankings, and even if we could, that is not the point.
I assume Yahoo, ESPN or CBS could provide a "correct" answer or at least one justifiable by some objective standard. But just because the league providers do give us an answer, how are we to know that their conclusions are correct? Have they ever divulged their methodology and subjected it to the scrutiny of the Derek Cartys of the world?
To be clear, I'm not speaking from the "who would be more valuable to my team" perspective - we can't expect Yahoo to know that. But, even in terms of value in a vacuum, how are we to know Yahoo's answers are right?
I'm not going to attempt to derive my own formula; I'll leave that to those with more statistical chops than I. But, I do want to ask some largely rhetorical questions about how Yahoo thinks by looking at the stat lines of some of its top-ranked players.
Miguel Cabrera and Robinson Cano are the two top-ranked batters and they share similar statistical profiles. As of my writing of this article, they boast the following lines:
(1) Cabrera: .351 (73/208) 40 R 17 HR 52 RBI 2 SB
(2) Cano: .363 (82/226) 41 R 12 HR 45 RBI 2 SB
Right off the bat, we can pretty confidently answer one question some may have. Yahoo does not appear to consider positional scarcity/eligibility in its rankings; they appear to be based on absolute value. I draw this conclusion because Cano's production from a 2B would seem more desirable to a team than Cabrera's production from a 1B, yet Cabrera ranks ahead of him. This is a nice and clean example because neither player can boast a specific type of production that the other is deficient in.
Here are some actual comparisons that Yahoo is forced to rule on.
(6) Alex Rios: .318 (62/195) 38 R 12 HR 29 RBI 17 SB
(8) Evan Longoria: .312 (68/218) 37 R 11 HR 44 RBI 10 SB
So, here we see seven steals (and a tiny advantage in AVG along with one run and one homer) trump a 15 RBI advantage. That makes sense on the surface, given the relative value of an RBI versus a steal. However, Longoria is tied for fifth in all of baseball in RBIs, while 75 players have driven in more than Rios. Surely, Rios is getting credit for his across-the-board productivity here, but in this respect Longoria is actually the better balanced player.
Here’s another one.
(26) Troy Tulowitzki: .303 (64/211) 42 R 8 HR 29 RBI 6 SB
(27) Magglio Ordonez: .312 (63/200) 37 R 8 HR 41 RBI 1 SB
Here we see five steals and five runs overtake a small batting average advantage and a substantial RBI advantage. It seems the player rater is relying on a value system similar to the one it used above.
So far, I’ve just focused on close calls, and I presume all of these are defensible. But, combing the top-ranked players list, you do find some curious rankings, relative to one another. For example, how do we explain the chasm between Brett Gardner and Elvis Andrus?
(20) Gardner: .311 (59/190) 41 R 3 HR 18 RBI 20 SB
(65) Andrus: .304 (63/207) 39 R 0 HR 16 RBI 18 SB
So, Gardner does have a small advantage in AVG, but Andrus’ comes over more at-bats and is therefore weightier (I assume Yahoo considers this to some degree or else Buster Posey would be considered a better AVG asset than Ichiro). Gardner also boasts two more runs, three more homers, two more RBIs and two more steals. Now, a combined five HR/SB advantage is a legitimate relative advantage, as one homer or steal is a lot more meaningful than a single run or RBI. But is this advantage really large enough that 45 spots in the rankings separate them? How many David Wrights is Yahoo fitting on the head of this pin?
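One common way to make an average "weightier" with playing time is to convert it into hits above what a baseline hitter would collect in the same at-bats. The sketch below assumes a .270 baseline, which is my own placeholder and not a figure Yahoo has disclosed; the `hits_above_baseline` helper and the two small-sample lines at the bottom are likewise made up for illustration.

```python
BASELINE_AVG = 0.270  # assumed baseline, not a published Yahoo parameter


def hits_above_baseline(hits, at_bats, baseline_avg=BASELINE_AVG):
    """Hits collected beyond what a baseline hitter would get in the same ABs."""
    return hits - baseline_avg * at_bats


# Real lines quoted above.
gardner = hits_above_baseline(59, 190)
andrus = hits_above_baseline(63, 207)
print(f"Gardner: {gardner:+.2f}  Andrus: {andrus:+.2f}")

# Hypothetical lines: a .400 hitter in 25 ABs vs. a .350 hitter in 200 ABs.
small_sample = hits_above_baseline(10, 25)
full_timer = hits_above_baseline(70, 200)
print(f"small sample: {small_sample:+.2f}  full-timer: {full_timer:+.2f}")
```

Under that assumed baseline the two shortstop-speedster lines land close together (roughly 7.7 for Gardner and 7.1 for Andrus), while the hypothetical pair shows why a lower average over many more at-bats can be the bigger asset. A different baseline would shift these numbers.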
One of the most curious comparisons and perhaps the one that gives me the strongest urge to question the system is Alex Rodriguez and Casey McGehee.
(43) Alex Rodríguez: .294 (63/214) 33 R 8 HR 43 RBI 2 SB
(63) Casey McGehee: .291 (62/213) 29 R 9 HR 43 RBI 1 SB
In this case, the two players have almost identical statistical profiles, including hits and at-bats. A-Rod has one more hit (in one more AB), four more runs and one more steal. McGehee has an extra homer. Yet, somehow Adrian Gonzalez, Ryan Zimmerman, Andrew McCutchen, Marlon Byrd, Michael Young, Josh Willingham, Kelly Johnson, Troy Glaus, and Adrian Beltre rank between them? Really?
How does Andrew McCutchen’s totally distinct profile of .314/34/7/17/13 compare to a more slugger-oriented stat line? Well, apparently it is more valuable than McGehee’s .291/29/9/43/1, but less valuable than A-Rod’s .294/33/8/43/2.
This isn’t even considering the fact that there are 10 pitchers grouped between them.
While I have no independent mechanism by which to rank McGehee and A-Rod, I feel I’m being reasonable and justified when I say that this triggers some reaction when put to the sniff test.
To be clear, the point of this piece is not to throw unsubstantiated mud at Yahoo's player ranker, but just to raise questions in a way that could benefit our readers. If it's possible that these rankers are wrong, that's a gaping opportunity to be exploited, considering just about the whole fantasy universe uses these systems to attribute relative value to known player performance.
You could either investigate the bias and rework the system, which might allow you to capitalize the same way one might when a league’s categories are customized while the pre-ranks remain tied to the default categories.
Or, you can scour the boards to find what seem to be bargains. Now, it's not news to find that McGehee can be had cheaper than A-Rod, but it is interesting to know that a system concerned only with the past and not interested in predicting the future can indicate such a sizeable gap in value between two nearly identical products.
This means that if I were equally confident that this performance represents both McGehee’s and A-Rod’s true talents, I should target McGehee because the mechanism we are using as the price guide seems to be off... unless of course it is right on McGehee, but off on A-Rod, which is why we should probably derive our own formula so we can determine if and where the disparity is occurring.
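For anyone who wants a starting point, here is a minimal sketch of one familiar approach: sum each player's z-scores across the five standard categories, with batting average converted to hits above the pool's average so that at-bats carry weight. This is an assumption on my part and not Yahoo's method, and the pool is just the ten hitters quoted in this piece, which is far too small and too top-heavy to stand in for a real league. The names `LINES`, `z_scores` and `rate` are hypothetical.

```python
from statistics import mean, pstdev

# (H, AB, R, HR, RBI, SB) as quoted in this article.
LINES = {
    "Cabrera": (73, 208, 40, 17, 52, 2),
    "Cano": (82, 226, 41, 12, 45, 2),
    "Rios": (62, 195, 38, 12, 29, 17),
    "Longoria": (68, 218, 37, 11, 44, 10),
    "Tulowitzki": (64, 211, 42, 8, 29, 6),
    "Ordonez": (63, 200, 37, 8, 41, 1),
    "Gardner": (59, 190, 41, 3, 18, 20),
    "Andrus": (63, 207, 39, 0, 16, 18),
    "Rodriguez": (63, 214, 33, 8, 43, 2),
    "McGehee": (62, 213, 29, 9, 43, 1),
}


def z_scores(values):
    """Standardize a column against the pool's mean and population std dev."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def rate(lines):
    """Sum of per-category z-scores for every player in the pool."""
    names = list(lines)
    pool_avg = sum(s[0] for s in lines.values()) / sum(s[1] for s in lines.values())
    # AVG category: hits above what the pool average would yield in the same ABs.
    columns = [[lines[n][0] - pool_avg * lines[n][1] for n in names]]
    # Counting categories: R, HR, RBI, SB.
    columns += [[lines[n][i] for n in names] for i in range(2, 6)]
    totals = [0.0] * len(names)
    for column in columns:
        for i, z in enumerate(z_scores(column)):
            totals[i] += z
    return dict(zip(names, totals))


scores = rate(LINES)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{score:+.2f}")
```

Within this ten-player pool, Rodriguez comes out ahead of McGehee, mostly on the four-run gap. Equal category weights and the choice of pool are both judgment calls, and changing either will move the results, which is rather the point.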
As the resident conspiracy theorist, I’ve done my job. OK, actually intelligent guys – get on that!
Derek Ambrosino aspires to one day, like Dan Quisenberry, find a delivery in his flaw. You can send him questions, comments, or suggestions at digglahhh AT yahoo DOT com.