Ifs and buts

by Jonathan Halket
November 5, 2009

“Hmm. If I’m healthy next year, I could be your fourth-round pick.” (Icon/SMI)

Warning, this is a rant—a rant with a purpose, but nevertheless one full of unnamed conspirators and false friends. I can’t even say for sure that I haven’t been guilty of these crimes. The point here isn’t to name and blame, but rather to enjoy a bit of therapy. It is healthy for me to let it all out every now and then, and if it helps you—well then we’re both smiling. Today’s exorcism focuses on two ways that experts (and others) often give non-answer answers.

The ifs

How would you feel about the following hypothetical question and answer?

Desperate fantasy player:
“Dear Guru,
My league does our 2010 draft during this year’s World Series. Who would you draft first: Johnny Damon or Raul Ibanez?”

Roster Quack:
“It all depends on who stays healthy. If Ibanez can stay on the field, then he’ll have the greater value. If he’s not 100 percent healthy, though, and if Damon re-signs with the Yankees, then Damon might be a better pick.”

What’s wrong with this diagnosis? Well, the Quack doesn’t actually answer the question. Instead he has provided a bunch of conditional statements. Linguistically, conditional statements often look like: if event X happens, then the value is Y.

There’s nothing wrong with a conditional statement. A bunch of them can often provide more information than a single unconditional statement (which would be, e.g., “Ibanez is worth more right now than Damon”). They’re useful for explaining one’s reasoning: “Ibanez is worth more than Damon because if Ibanez stays healthy he’ll outproduce Damon AND I don’t think he’s a big injury risk.” But just as often, gurus use them to avoid (intentionally or otherwise) giving an answer to the hard part of the question.

For instance, how helpful is a statement like: “If Curtis Granderson could hit lefties as well as he hits righties, he’d be a second-round pick”? Well, if you didn’t know that Granderson had terrible splits, then it would be useful. But if you were wondering about his value for next year, you’d be left a little short.

Sometimes substituting a conditional statement in place of an unconditional one is helpful as long as it is accompanied by a little honesty. For instance: “I’m terrible at projecting injuries, so you should use your own expectations concerning injuries. But, if Ibanez is healthy, he is worth more than Damon.” Here, the guru is telling you that he could give you an unconditional statement like “My projections are that Damon is worth more than Ibanez,” but that it might be based on some unreliable injury forecasts. So, instead he provides you with the part of his forecast that he feels is more reliable, while at least being upfront about his unfamiliarity with the repercussions of Ibanez’s current ailments.
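To put rough numbers on that idea: an unconditional value is just the conditional values weighted by how likely each condition is. Here is a small sketch in Python. Everything in it is an assumption made up for illustration (the health probabilities, the value units, the function name expected_value); none of it is a projection of either player.

```python
def expected_value(p_healthy, value_if_healthy, value_if_hurt):
    """Blend two conditional values into one unconditional value."""
    return p_healthy * value_if_healthy + (1 - p_healthy) * value_if_hurt


# Hypothetical inputs, in arbitrary value units. Swap in your own.
ibanez = expected_value(p_healthy=0.6, value_if_healthy=20, value_if_hurt=5)
damon = expected_value(p_healthy=0.9, value_if_healthy=16, value_if_hurt=6)

print(f"Ibanez: {ibanez:.1f}  Damon: {damon:.1f}")
```

With these invented inputs, a healthy Ibanez beats a healthy Damon, yet Damon comes out ahead overall because he is assumed to be the safer bet. Raise Ibanez's health probability and the ranking can flip, which is exactly why the guru's honesty about his injury forecast matters.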

The buts

“But for Ibanez’s hot September (when he hit seven home runs), he was terrible after the All-Star break. You can’t expect that kind of September again, so I think Ibanez is due for a regression.”

Nearly everything about this guru’s prognosis is correct. Ibanez did have a great September and had an equally desultory July and August. Let’s ignore that he also hit seven or more home runs in April and May and grant that such months are rare events. Still, the logic above is almost certainly incorrect.

Suppose I tell you I have a die numbered 1 through X. It could be a 100-sided die (1-100), a standard six-sided die (1-6), etc. You don’t know what X is—that is, you don’t know how many sides it has. But I do tell you that the die rolls are each independent of each other—the result of one die roll does not affect the outcome of the next (just like any normal die). Ask yourself if there is any difference in the following pieces of information:

1) I tell you that I rolled the die 200 times and only the first four rolls came up with the number 1.
2) I tell you that I rolled the number 1 four times in 200 rolls.

The first outcome, rolling 1 four times in a row to start (and then never again), is the far rarer outcome. Imagine a 100-sided die: the probability of opening with four straight 1s is just one in 100 million, and that is before requiring that no 1 comes up again! If we were strato-mating a baseball season with that die, probably we would never get that outcome again. Nevertheless, both statements are equally informative about how many sides the die has (that is, what X is). Statistically, this is due to the independence assumption, which means that the order in which events occur is uninformative. The two probabilities differ only by the number of ways to place four 1s among 200 rolls, and that number is the same no matter what X is.
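For the skeptical, here is a quick check in Python. It is a sketch under my own assumptions: the candidate die sizes are arbitrary, each is treated as equally likely beforehand, and the function names are invented for this example. It computes how strongly each piece of information points to each die size.

```python
from math import comb


def likelihood_ordered(sides, rolls=200, ones=4):
    """Chance that the first `ones` rolls are 1 and every later roll is not."""
    p = 1 / sides
    return p**ones * (1 - p) ** (rolls - ones)


def likelihood_count(sides, rolls=200, ones=4):
    """Chance of exactly `ones` 1s somewhere in `rolls` rolls, in any order."""
    return comb(rolls, ones) * likelihood_ordered(sides, rolls, ones)


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


candidates = [6, 20, 50, 100]  # hypothetical die sizes, equally likely a priori
belief_ordered = normalize([likelihood_ordered(x) for x in candidates])
belief_count = normalize([likelihood_count(x) for x in candidates])

for x, a, b in zip(candidates, belief_ordered, belief_count):
    print(f"{x:>3} sides: statement 1 -> {a:.4f}   statement 2 -> {b:.4f}")
```

The two columns should match for every die size. The ordered outcome is far less likely in absolute terms, but the extra factor (the number of ways to arrange four 1s in 200 rolls) is the same for every X, so it washes out once you compare die sizes against each other.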

Now replace “rolls a 1” with “hits a home run.” As long as the independence assumption holds in baseball, it makes no difference whether Ibanez hits all his home runs in September or spreads them across the season. Only the sheer number of home runs is informative, not when they occurred.

There are many discussions of the independence assumption, for instance, whether there is such a thing as a hot or cold player. Most research points against streaks and for independence. It doesn’t really matter here, unless you believe that streaks can carry over through the offseason and into the next season. Off the top of my head, many gurus use “buts” to remove rare events from consideration. Few of them, I would venture, believe in multi-season streaks.
