Tuesday, April 20, 2010
Most debates really boil down to semantics
Posted by Derek Ambrosino at 5:24am
So, the fantasy universe, or at least a small subset thereof, has recently been somewhat abuzz over a debate between Rotowire/RotoSynthesis’ Chris Liss and some of the poker pros/options trader folks in the CardRunners league. Derek Carty has already graciously offered a series of links that allow us rubberneckers to brief ourselves on the debate. Always up for sticking my nose where it doesn’t belong, I’d like to offer my take on things and fan the flames like Suge Knight at the Source Awards.
Essentially the debate comes down to this: Bill Phipps and his ilk think that the fantasy gaming community has not optimized pricing of stat lines. Chris Liss feels that even the best projection models are so inaccurate at predicting such stat lines that the ability to translate those stat lines into dollar values with perfect accuracy is either minimally advantageous, or the source of illusory confidence, which only serves to reinforce the cognitive dissonance that bad projections can somehow be corrected by superlative valuation. Liss feels there are too many inputs, the game is too organic, and the fantasy draft too dynamic for a model to be ideal. Instead he prefers to trust his ability to “thin slice” on the fly after copious preliminary cramming sessions. Phipps (and Carty) fail to understand how optimizing an additional tool couldn’t help. Of course, Phipps and crew have not revealed their model, so any assertions about what inputs and variables it does and does not attempt to address are presumptuous.
I actually think that this discussion is not as complicated as it seems to be. But more importantly, I also think it is a bit premature, as we lack the tools to really test either approach with the rigor of the scientific method or the weight of tens of thousands of repetitions. Finally, I think the question of who is right may actually depend on how one defines success. There’s also some cultural baggage surrounding the “expert” archetype that is probably at play here too.
So, let me get to work at jumping into a gun fight among strangers with a pocket knife. I think I’ll make my points in the form of questions. (It seemed to intimidate Jeopardy contestants to the extent that the format is no longer even followed!)
Does it confer an advantage to have a tighter model for converting stat lines into dollars?
This one seems self-evident—of course it does.
Liss argues that the advantage is marginal because the inputs are flawed in the first place and, perhaps even more importantly, the environment in which you use them is dynamic. The available player pool (and ergo the supply of different stats from different positions) changes.
To me, neither of these counterarguments actually refutes the premise that it is ideal to have as accurate a translation tool as possible. Liss argues that he doesn’t need a translational tool because the whole process is fluid, like speaking a language in rapid, unscripted conversation. Well, I don’t fully buy that.
I have a fairly robust and unconsciously competent command of the English language, but that doesn’t mean that I never have trouble expressing myself as accurately and articulately as I wish, especially within the context of rapid, unscripted dialogue. And I wouldn’t dream of sitting down to write a dissertation without the full arsenal of reference books (translation: aids) at my disposal.
A tool that can accurately reflect the value of a composite stat line is valuable, even if that stat line has a component of uncertainty and that value is set in a vacuum that doesn’t fully reflect the evolving dynamic of the draft room. The point to remember here is that this is just a starting point. Liss’ point about the dynamism of the draft process is important, but all it means is that if you want to optimize your odds when playing blackjack, you should know the raw odds of your hand beating the dealer’s (partially hidden) hand, and you should be counting cards, too.
Sure, fine, so how does having an accurate pricing model in a vacuum preclude Phipps from counting cards as well, and modifying the model to reflect the ever-changing supply? It doesn’t. Owner A just took somewhere between 55 and 75 stolen bases out of the pool by buying Ellsbury; all remaining lines therefore get adjusted. You could either do this in your head or through a computer. I fail to understand why the human brain has any intrinsic advantage at modifying the value of remaining players on the fly.
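To make that concrete, here’s a minimal sketch of the “card counting” adjustment, assuming the simplest possible model where a category’s dollar value is just the money still chasing it divided by the projected units still available. All the numbers and the function name are hypothetical illustrations, not anyone’s actual model:

```python
# Hypothetical sketch: re-price a category after a player leaves the pool.
# Assumes a naive "dollars per marginal stat" model; real models are richer.

def price_per_steal(remaining_sb_dollars, remaining_projected_sb):
    """Dollars the market should pay per projected steal still available."""
    return remaining_sb_dollars / remaining_projected_sb

# Before the Ellsbury buy: say $400 league-wide is chasing 1,000 projected SB.
before = price_per_steal(400, 1000)

# Ellsbury goes for $35 and removes ~65 projected steals from the pool.
after = price_per_steal(400 - 35, 1000 - 65)

print(f"per-SB value before: ${before:.3f}, after: ${after:.3f}")
```

The point isn’t the arithmetic, which is trivial; it’s that the adjustment is purely mechanical, so there’s no obvious reason a human doing it intuitively should beat a machine doing it exactly.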
The real question here is whether the owner using the computer model is aware of how reliable (or unreliable) the projections are and has sufficiently corrected for that in his model.
How much advantage does a better conversion/translation tool confer during the draft?
Well, this is the question of whether Phipps is a good card counter or not. Or, to add another analogy here, having the best feel of what raw materials should cost will not necessarily make you the most efficient builder.
So, in addition to the elastic supply and demand in a fantasy draft/auction setting, one must assemble value in the correct way. The efficiency of correct pricing can quickly evaporate if you make errors in estimating how much of each material you need to build your structure. In a fantasy league, margin of victory is meaningless, so the holy grail is to get maximum value distributed with maximum efficiency, which is to win every category by a single unit. However, more likely is that you wind up with surplus bricks but are woefully short on mortar; at that point it doesn’t matter if your whole lot of materials is appraised for more than anybody else’s lot, or if bricks are more valuable per unit than mortar. The surplus bricks are worthless to you, and unless you can turn them into something else, you lose.
The point here, and I don’t think either of the parties would dispute it, is that even with better translational skills, other knowledge gaps can drastically mitigate the (marginal in the first place, according to Liss) benefits derived from optimized pricing. To say this another way, to retain the advantage of the optimized pricing model, either the model must have the capacity to process this dynamism built in, or its user must be able to thin slice these developments as well as Chris.
To be fair to Liss’ argument, it bears emphasis to repeat explicitly that he does not fully deny the advantage of optimized pricing, though he is skeptical. He just thinks that such an advantage is small in relation to the other knowledge areas from which one could derive an advantage. And that’s why the poker pros aren’t yet ready to compete with the fantasy pros; they’re scraping the margins of the math game, while the “genius” drafters are honing their ability to predict sea changes in specific commodities. And, frankly, in an important sense, I agree.
How do you beat an expert gamer in a math game?
Let’s start with one basic premise of game theory and one fact about computer programming, neither of which is a field in which I’d consider myself an expert. Game theory dictates that you never want to alter your play in a manner that will cause a poorly playing opponent to react in a way that will probabilistically improve his play. Computer programmers note that one of the most difficult things for a computer to do is generate truly random numbers; what appears to be a random string is really just a short window of a much longer, deterministic sequence in which a pattern exists.
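That determinism is easy to demonstrate. A classic pseudo-random generator is just arithmetic, so the same seed always replays the same “random” string, and with deliberately tiny toy parameters (chosen here purely for illustration) you can watch the hidden pattern cycle back on itself:

```python
# A tiny linear congruential generator (LCG): looks random, is pure pattern.
from itertools import islice

def lcg(seed, a=5, c=3, m=16):
    """Yield an endless pseudo-random stream; with modulus 16 it must repeat."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

stream = list(islice(lcg(seed=7), 20))
print(stream)
# With these parameters the generator has a full period of 16, so the
# 17th output is identical to the 1st and the pattern repeats forever.
```

Real generators use enormous moduli so the cycle is astronomically long, but the principle is the same: the “randomness” is a pattern you simply haven’t seen enough of.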
And, while we’re at it, let’s throw in the good ole Voltaire quote: “The perfect is the enemy of the good.”
Now, I will offer my theory about how I would go about playing one of these poker pros, in poker, were I given the opportunity. I’m a decent card player. I don’t play at casinos much, but I have a fairly good grasp of math, probabilities, and can do fairly complicated computational math in my head … even while drinking scotch—alcohol tolerance is an unheralded skill for the informal poker night warrior. Basically, I’m just good enough to know how badly everybody else at the table is playing and complain about it. Yeah, tons of fun, I am!
But I would not attempt to play like this were I to play out a discrete trial with a table of experts. By playing my best, all I would be doing is ensuring that I am playing an inferior version of the game that they are playing. Instead, what I would do is try to make myself seem like a bigger wild card while still retaining a semblance of objectively correct strategy. Sure, I’m mitigating my own capabilities, but I’m mitigating my opponents’ even more, because they have more to lose when I take away things like the ability to predict my hand from my betting patterns.
To continue the strained analogy and bad metaphor theme, what I’m doing is attempting to turn this basketball game into a 3-point shooting contest, which is the way Cinderellas most commonly knock off high seeds. I may think I’m actually a better post player (the percentage play) than I am a 3-point shooter, but the gap between my post skills and my opponent’s is greater than the gap in our 3-point shooting skills.
In a way, what Chris is saying is don’t leave me open and uncontested from behind the arc, because I will knock ‘em down all day if you don’t put a hand in my face. I’ll take the vig (the lower percentage shot) for the trade-off that I get to pick my shot and get open (you don’t go the extra dollar), and if I’m shooting as well as I normally do, you’re going to have to make a whole lot of turn-around seven-footers to outscore me. Further, you’ll actually miss some of your twos as well, because I’m defending the paint while you are not even bothering to defend the line.
Liss’ argument is that the fantasy pros’ inside game is good enough that the quants don’t have room for a huge advantage, while Liss and crew will easily outshoot the poker pros from 3. And, as tempted as I might be to put my money on a single quant versus any single genius, when you break them into fields, I think the odds are that one of the geniuses wins more often than not. More on this in a minute.
How do we judge who performs the best in a fantasy league, and what is the goal when developing your team?
Now, we get to my real question.
Two of the analogies mentioned throughout this debate were chess and the stock market: chess as a domain where processing power was the linchpin in figuring things out, the stock market as one where the inputs were just so numerous and diverse that a model was nearly impossible to build at all. There’s one other difference between chess and the stock market, though, and fantasy baseball actually marries the two sides of it, resulting in a difficult-to-resolve question.
In chess, you are playing a one-on-one game, in which you must simply beat your single opponent; there’s one winner and one loser. To succeed in chess you must win way more often than you lose. In the stock market, you are really competing with the field to call yourself successful. You don’t ever need to have the biggest day, or week, or month of any other trader, you just need to win more than you lose and be consistently profitable over time. In fantasy baseball, you play against the field, but there is only one winner. This dynamic affects the appetite and rational tolerance for risk.
I don’t think a model-based approach will necessarily be risk-averse. In fact, I think a good model will aim to be risk appropriate. But, as long as a quant is competing against a field of “genius” sharps, it seems plausible that nearly every season several of the geniuses will take on what objectively derived and rational models will deem too much risk and one or more will hit on a bunch of those picks and win by outperforming the market. My biggest fear about the quant approach is that it’s a path to being a perennial runner-up.
I have no doubt that the quants will “get their money in good” with high frequency even right off the bat. They could do this just on the strength of math even if they didn’t know much about baseball. But that doesn’t mean they will win the league outright with any consistency. If you are playing against an opponent who sees a second-place finish and a last-place finish as the same exact thing, how do you consider that in a model?
I question whether the genius drafter and the quant are actually competing strategies, or discrete paradigms, one being a road to perennial contention but smaller margins for over- or underperformance, the other being more volatile in terms of range of outcomes, but more anecdotally successful. And this question raises the elephant-in-the-room meta-point: how do we judge success in the fantasy baseball arena?
If Liss and Phipps were to play out 20 seasons (pretending that is a statistically significant sample size), and Liss has five championships with an average finish of 3.8 while Phipps has only two championships but an average finish of 3.2, who is the better player? Phipps has the higher batting average, Liss the better slugging percentage.
Until we answer that question, I’m not sure we can form viable opinions on the relative merits of the genius and quant strategies. Perhaps some insight lies in what the market wants out of its experts. From whom would you rather take advice: the guy who consistently exploits the market inefficiencies and beats it on the margins, or the guy who swings for the fences and connects more often than the other sluggers but still makes more wrong decisions than the consistent margins guy? Rationally, I think we want the quant. But, culturally, I think we romanticize the genius.
Back to the topic at hand for a second, I think one of the intriguing questions here is whether the quants remain fully agnostic as they nurture their genius tendencies. The poker analogy is kind of like reading players: “You know, I think that guy is bluffing and I can tell not by his betting patterns, but by his body language.”
Certainly, card-playing quants are open to integrating that form of insight, so why wouldn’t they be open to saying, “You know what, Justin Upton has shortened his stride this preseason and it is really helping him handle those pitches on the outer half, and since the statistical projection model doesn’t factor that input, I think his baseline is actually his 70th percentile season based on their projections, so I’m going to bump the price I’m willing to pay for him.” (Totally made up scouting evaluation by the way; I know nothing about Upton’s stride length.)
Anyway, my biggest question in this overall debate is: how are we to know who is right? Certainly, who wins this single league—one trial that takes seven months to complete—is not really telling of anything. How many times would quants and geniuses have to play out a single season before the results have meaning? So, until we have simulators that can simulate the minds of five guys like Phipps and five guys like Liss, play out thousands of drafts before a single season, and mimic their in-season managerial styles, how do we separate luck from skill? Even further, we’d then have to do the same year after year to determine whether the trends in the first set of trials were due to variance, and if so, whether there’s any trend within the variance that either player may be consciously or unconsciously exploiting.
I believe it was Mike Podhorzer, who popped in on Derek Carty’s article, who once wrote a piece back at Fantasy Generals about what constitutes an “expert,” and performance was not one of the metrics he used, quite correctly I think. Sure, an expert will outperform the mathematical probability of winning his league over the long haul, but that bar is low, and the trials one completes, even in a lifetime of fantasy gaming, are relatively few. If I play a 12-team league for 24 years and win three times, do I pass that bar? Was that my skill or luck? Instead, Mike focused on factors like intellectual independence, internal consistency of reasoning, etc. as criteria. So, while the genius vs. quant debate is fascinating, let’s remember that experts exist in both camps and that a few seasons’ worth of anecdotal performance will provide very little insight into the relative merits of the approaches.
There is one tool I wish would be developed that would help advance our ability to test some of our theories, or even just add perspective. I wish Yahoo, ESPN, CBS Sportsline and the other main fantasy sports providers would adopt a census option of sorts that would feed and build a database. When you set up your league, you could choose whether you want its data tabulated as part of the census; that feature would record your league’s settings and bank its results alongside leagues with identical settings. You would thereby develop a database of, say, mixed 14-team leagues with this exact roster structure. Users could then search that database to find things like what categorical benchmarks you’d have to target to aim for 11s across the board.
At one point in the quant vs. genius debate, the question came up about the stratification tendencies of categories. Do home runs tend to cluster relatively tighter than steals? I don’t know; I can only look back at my past leagues’ standings and guess. But, if I have access to the aggregate data of hundreds of thousands of leagues played with the same settings, it’s much more likely that the trends that emerge are meaningful.
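If that aggregate data ever existed, the clustering question would reduce to a one-liner. Here’s a sketch of the comparison using the coefficient of variation (stdev divided by mean), which lets you compare spread across categories on different scales; the standings totals below are invented placeholders, not real league data:

```python
# Hypothetical: do HR totals cluster tighter than SB totals across a league?
# Coefficient of variation (stdev / mean) normalizes spread across scales.
from statistics import mean, stdev

def coeff_var(totals):
    return stdev(totals) / mean(totals)

# Invented season-end category totals for a 12-team league, for illustration.
hr_totals = [230, 245, 250, 255, 260, 262, 268, 270, 275, 280, 290, 305]
sb_totals = [80, 95, 110, 120, 130, 140, 150, 165, 180, 200, 220, 260]

print(f"HR CV: {coeff_var(hr_totals):.3f}")
print(f"SB CV: {coeff_var(sb_totals):.3f}")
```

Run over hundreds of thousands of real leagues with identical settings instead of one made-up one, a persistent gap between those two numbers would tell you something meaningful about how to price a marginal steal versus a marginal homer.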
Derek Ambrosino aspires to one day, like Dan Quisenberry, find a delivery in his flaw. You can send him questions, comments, or suggestions at digglahhh AT yahoo DOT com.