# The Hardball Times Fantasy

## Fantasy semantics

by Jonathan Halket
May 06, 2010

If you've never taken a course in econometrics, I encourage you to do so if you have the chance. Actually, any course that teaches statistical methodology will do. Even if you never want to "crunch numbers," it will teach you how to think and read "probabilistically." Since nature, life and fantasy baseball are inherently random, understanding the semantics of probability is essential, doubly so if you're purporting to offer advice.

Things basic econometrics has taught me:

### 1. We can still say something about coin flips even though the outcome can be either heads or tails.

This may seem obvious, but I've seen educated people argue that because we can't be sure about the outcome (or because "the outcome could be anything"), it is useless to talk about forecasts or numbers or statistics. Obviously not. We can still talk about which outcomes are more likely than others. We can still talk about the probability of outcomes. We can still say that the probability of heads is 50 percent and that the outcome of a fair die roll is more likely to be between two and five (inclusive) than it is to be either one or six.
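The die claim is easy to check by counting faces. Here is a minimal sketch; the variable names are mine and the only assumption is that the die is fair:

```python
from fractions import Fraction

# A fair six-sided die: every face has probability 1/6.
faces = range(1, 7)
p_face = Fraction(1, 6)

# Probability the roll lands between two and five, inclusive.
p_middle = sum(p_face for f in faces if 2 <= f <= 5)

# Probability the roll is either one or six.
p_ends = sum(p_face for f in faces if f in (1, 6))

print(p_middle, p_ends)  # prints "2/3 1/3"
```

We can't say which face will come up, but we can say the middle four faces are twice as likely as the two ends.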

### 2. In an ideal world, tell me everything—give me the probability of everything.

Let's say you meet God. God turns out "to play dice with the universe." He doesn't know whether Chipper Jones is going to retire midyear, but he does know the probability that it might happen. He knows his own dice.

Why ask God only how many home runs he "expects" Chipper to hit (that is, the average number)? Why not get more information from him? What's the probability that Jones hits fewer than five (tantamount to asking the probability that Jones gets injured)? What's the probability that he hits 10, 15, etc.? With this information, you'd have a better idea about how risky Jones is.
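To see why the average alone isn't enough, here is a small sketch with two made-up players. The probabilities below are invented purely for illustration and are not a forecast for Jones or anyone else; the function names are also mine. Both players have the same expected home run total, but very different chances of a lost season:

```python
# Hypothetical distributions: home run total -> probability.
# All numbers are invented for illustration only.
steady = {15: 0.25, 20: 0.50, 25: 0.25}
risky = {0: 0.25, 20: 0.25, 30: 0.50}

def expected_total(dist):
    """Average home run total implied by the distribution."""
    return sum(total * prob for total, prob in dist.items())

def prob_fewer_than(dist, threshold):
    """Probability that the home run total falls below the threshold."""
    return sum(prob for total, prob in dist.items() if total < threshold)

for name, dist in (("steady", steady), ("risky", risky)):
    print(name, expected_total(dist), prob_fewer_than(dist, 5))
```

Both players "expect" 20 home runs, yet only the risky one carries a one-in-four chance of hitting fewer than five. That is the extra information you'd want from God.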

### 3. It can be very tempting to take shortcuts when writing.

Actually, I learned this writing about baseball. I like to keep my columns as simple as possible while still making my point. I try to avoid adverbs when possible (though I am rarely successful). Writing "probabilistically" without adverbs is difficult—words like "usually, probably, likely" are useful. I have the same problem with numerical information. Yes, in a perfect world, I would just give you my probability of everything when I talk about my forecasts for Chipper, but parsimony and limited attention spans demand that I give you only as much as I deem relevant and interesting.

On the relationship between "experts" and readers:

The key is trust and establishing consistency. It is possible for one expert (say, Ron Shandler) to use mostly intervals and another (say, Derek Carty) to provide mostly point estimates. Intervals are kind of nice, but they require more disclosure. It is OK for Shandler to prefer to say (paraphrasing) "Miguel Cabrera is likely to have a home run total in the 30s" as long as we know what he means by "likely"—40 percent? 90 percent? Similarly, it is OK for Carty to say "Cabrera is projected to hit 37 home runs." If Carty gives us some interval around it, too ("standard error bands" in statistics speak), then Carty's statement is very similar to Shandler's even though they've used different words. (In fact, I am just sort of rephrasing what Carty wrote about on Tuesday. My problem with Shandler's recent writing is that he forgot a version of Lesson One above.)
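Here is a rough sketch of how a point estimate with an error band can be translated into a "likely in the 30s" statement. The 37 comes from the example above; the standard error of 4 and the bell-curve shape are my assumptions for illustration, not anything Carty or Shandler has published:

```python
from statistics import NormalDist

point_estimate = 37    # the projection quoted in the example above
standard_error = 4.0   # hypothetical value, chosen only for illustration

# Assumption: forecast errors are roughly normal (a bell curve).
forecast = NormalDist(mu=point_estimate, sigma=standard_error)

# A 95 percent band around the point estimate.
low = forecast.inv_cdf(0.025)
high = forecast.inv_cdf(0.975)

# Approximate probability that the total lands in the 30s,
# treating home runs as continuous for simplicity.
p_in_30s = forecast.cdf(40) - forecast.cdf(30)

print(f"95 percent band: {low:.1f} to {high:.1f} home runs")
print(f"Chance of a total in the 30s: {p_in_30s:.0%}")
```

Under these assumptions, "37 home runs, give or take" and "likely in the 30s" describe the same forecast. The word "likely" just needs a number attached to it.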

Readers and writers need to come to a sort of tacit understanding about language. More often than not, writers are not going to give numbers for everything. If a Shandler-esque writer wants to say "Cabrera is likely to hit around 35 home runs" instead of giving lots more numbers, then he should be consistent about what his words mean. Approximately what does "likely" mean?

On arguments within the expert community:

What goes for communication between adviser and advisee goes doubly for these blogged exchanges between experts. It is very hard to champion your cause against another "expert" in a venue designed to still be accessible to the layman reader. Actually, it is very hard to do it in any venue.

I'll have more on the quants-versus-quaints debate (in case you can't tell which side I'm on) in my next article. The tenor of that debate has actually been very good, I think. Many expert exchanges are not nearly as interesting, in part because one expert will say something semi-informative but mostly substantive like "It is good to use statistics to forecast how many home runs Cabrera will hit." And then the opposing expert will say something like "Statistical forecasts are always wrong. I prefer to go with my gut."

My problem with the second statement is that it is absolutely true but totally practically false. Forecasts are always wrong, but they are still incredibly useful. Most experts, even those who haven't taken econometrics, know this to be true. The more literally accurate statement, "I project that Cabrera will hit between 35 and 45 home runs with 95 percent certainty" would be more bulletproof to these kinds of flatulent responses, but all of those numbers are superfluous to the argument. At some point it would be better if some details could be taken for granted.

If you have a question for the Roster Doctor, email here. Emails in simple text with players' full names properly spelled are much more likely to get responses. Also be sure to include your league's player pool (mixed, AL-only, NL-only), number of teams, scoring format (roto, head-to-head, points, etc.), categories, whether or not it's a keeper league, and any other pertinent information.