I have been acquainted for some time with the work of one Jonah Lehrer, who has written books and articles on various aspects of psychology and neuroscience. I have not always agreed with Lehrer’s interpretations and conclusions, but I have appreciated his efforts to relate recent research to literature (e.g., in his book Proust Was a Neuroscientist) and to understanding modern life (e.g., in his book How We Decide).
On Monday, I became acquainted with another Jonah Lehrer, a seemingly different being who evidently inhabits the same body. Jonah Lehrer 2.0 finds fault with sabermetrics and the kinds of analyses presented on sites like The Hardball Times. His complaint is that sports teams…
…are seeking out the safety of math, trying to make extremely complicated personnel decisions by fixating on statistics … they are pretending that the numbers explain everything.
…This is largely the fault of sabermetrics. … The underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league.
…But sabermetrics comes with an important drawback. Because it translates sports into a list of statistics, the tool can also lead coaches and executives to neglect those variables that can’t be quantified.
Lehrer 2.0’s concern over whether variables can or cannot be quantified seems odd, especially in light of Lehrer 1.0’s obvious familiarity with psychology. Proust, the subject of one of Lehrer 1.0’s books, famously wrote about memory. His work remains well known to this day because he captured so elegantly many aspects of memory that are not easily articulated, much less quantified.
But today’s psychologists who are interested in memory quantify everything they study, and other psychologists invent ways to quantify depression or extraversion or risk-taking or anything else they seek to study scientifically. One of the first things psychology majors learn is that concepts and variables relating to the unobservable inner workings of the mind must be “operationally defined,” that is, described in such a way that they can be measured.
So “memory” might be defined in one study as how many words a person can recall from an earlier-studied list, “extraversion” might be a score derived from a personality test, and so on. No one who studies memory believes that any one operationalization captures everything there is to be known about memory, or even that the sum total of all the operationalizations ever used gives us a complete picture. Some aspects of memory no doubt remain unquantified. But modern psychology depends heavily on quantification. Lehrer 1.0 must know that.
So he should tell Lehrer 2.0 that sabermetrics is nothing more than a way to operationalize variables. That is entirely appropriate: athletic performance is a form of human behavior, like any other that psychologists might quantify and study. Sabermetrics is founded on basically the same principle of operationalization that so much of Lehrer 1.0’s work has depended on, and sabermetric measures are used in similar ways.
For instance, psychologists sometimes disagree about which operational definition is most appropriate to a given research question. In the same way, when sabermetricians introduce a variable like OPS (on-base percentage plus slugging percentage), they are more or less just objecting to the idea that batting average is an adequate operationalization of hitting ability. I have not seen anyone write that OPS is a perfect measure of hitting ability, just that it is a better one. And just as the results of a psychological study might be misinterpreted by one psychologist or another, the results of any sabermetric analysis might be misapplied or misunderstood. That possibility does not support the concerns Lehrer 2.0 has raised.
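To make the operationalization point concrete, here is a short Python sketch of batting average and OPS, using the standard definitions: on-base percentage is (H + BB + HBP) / (AB + BB + HBP + SF), slugging percentage is total bases per at-bat, and OPS is their sum. The two hitters and their season totals are invented purely for illustration. They have identical batting averages but very different OPS figures, which is the kind of difference sabermetricians argued batting average misses.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """Hypothetical season totals for one hitter (invented for illustration)."""
    ab: int       # at-bats
    singles: int
    doubles: int
    triples: int
    hr: int       # home runs
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    def avg(self) -> float:
        # Batting average: hits per at-bat.
        return self.hits / self.ab

    def obp(self) -> float:
        # On-base percentage: times on base per (AB + BB + HBP + SF).
        return (self.hits + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        # Slugging percentage: total bases per at-bat.
        total_bases = self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab

    def ops(self) -> float:
        # OPS is simply OBP plus SLG.
        return self.obp() + self.slg()


# Two made-up hitters with identical batting averages.
slap_hitter = SeasonLine(ab=500, singles=150, doubles=0, triples=0, hr=0, bb=20, hbp=0, sf=5)
power_hitter = SeasonLine(ab=500, singles=100, doubles=25, triples=5, hr=20, bb=70, hbp=5, sf=5)

for name, p in [("slap hitter", slap_hitter), ("power hitter", power_hitter)]:
    print(f"{name}: AVG {p.avg():.3f}, OBP {p.obp():.3f}, SLG {p.slg():.3f}, OPS {p.ops():.3f}")
```

Both players hit .300, yet one gets on base and hits for power far more often. Neither number captures everything about hitting; the claim is only that one captures more.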
Lehrer 2.0 similarly misunderstands sabermetric results produced by statistical methods that Lehrer 1.0 should find familiar, since modern psychology, brain imaging studies included, relies heavily on statistical modeling of various kinds. In sabermetrics, such models explicitly recognize the impossibility of making perfect predictions and thus are not “pretending that the numbers explain everything,” as Lehrer 2.0 says.
Baseball Prospectus’ PECOTA forecasting system, for example, produces percentile forecasts that take uncertainty into account by estimating the probability that a player will reach some statistical standard (such as a 50 percent chance of posting an OPS of .800 or higher), rather than yielding a single number in an attempt to “explain everything.”
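The idea of a percentile forecast can be sketched in a few lines of Python. This is not PECOTA’s actual method, which is not reproduced here. It simply assumes, hypothetically, that a player’s next-season OPS follows a normal distribution with a made-up mean of .800 and standard deviation of .060, then reads off the probability of reaching .800 along with a few percentiles of the forecast.

```python
from statistics import NormalDist

# Hypothetical forecast distribution for next-season OPS.
# The mean and spread are invented for illustration only; this is NOT how
# PECOTA builds its forecasts.
forecast = NormalDist(mu=0.800, sigma=0.060)


def prob_at_least(dist: NormalDist, threshold: float) -> float:
    """Probability that the forecast outcome meets or exceeds the threshold."""
    return 1.0 - dist.cdf(threshold)


def percentile_table(dist: NormalDist, percentiles: list[int]) -> dict[int, float]:
    """Map each percentile (0-100) to the OPS value at that point of the distribution."""
    return {p: dist.inv_cdf(p / 100) for p in percentiles}


print(f"P(OPS >= .800) = {prob_at_least(forecast, 0.800):.2f}")
for p, value in percentile_table(forecast, [10, 50, 90]).items():
    print(f"{p}th percentile OPS: {value:.3f}")
```

The output is a range with probabilities attached, not a single number claiming to explain everything. A wider assumed spread would simply widen the gap between the 10th and 90th percentiles.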
Users of PECOTA know that it is not true, as Lehrer 2.0 says, that the “underlying assumption is that a team is just the sum of its players, and that the real world works a lot like a fantasy league.” The underlying assumption is that there is a range of possible outcomes and that the real world works according to a very large number of quantified and unquantified factors.
It is also unclear how Lehrer 2.0 would have sports executives deal with those variables that he thinks cannot be quantified. PECOTA handles them by considering the range of possible outcomes and estimating how likely each is. Lehrer 2.0’s prescription for handling the unquantifiable is confusing:
If we were smarter creatures, of course, we wouldn’t get seduced by the numbers. We’d remember that not everything that matters can be measured, and that success in sports … is shaped by a long list of intangibles. In fact, we’d use the successes of sabermetrics to focus even more on what can’t be quantified, since our new statistical tools take care of the stats for us.
So what does Lehrer 2.0 mean by “focus even more on what can’t be quantified”? If we are not to measure those variables, what do we do instead? The alternative most often offered to using statistics to make roster decisions is to rely on the judgments of scouts.
But that notion, at least insofar as it claims to offer an alternative to quantifying things, rests on a misunderstanding: the scouts’ judgments are themselves measurements of a sort. Scouts rank player A ahead of player B, who is ahead of player C, and so on. By assigning ranks, they have put numbers on the players just as surely as if we had calculated their WARP. And what do we do if the scouts disagree, as they so often do? Average their rankings? More statistics!
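Here is what averaging scouts’ rankings looks like as a minimal Python sketch. The three scouts, three players, and their rankings are all hypothetical; the point is only that combining subjective judgments this way is itself a statistical calculation.

```python
from statistics import mean

# Hypothetical rankings from three scouts (1 = best). All names and ranks are invented.
scout_rankings = {
    "Scout 1": {"A": 1, "B": 2, "C": 3},
    "Scout 2": {"A": 2, "B": 1, "C": 3},
    "Scout 3": {"A": 1, "B": 3, "C": 2},
}


def consensus_ranking(rankings: dict[str, dict[str, int]]) -> list[tuple[str, float]]:
    """Average each player's rank across scouts, sorted from best (lowest) to worst."""
    players = next(iter(rankings.values())).keys()
    avg_rank = {p: mean(r[p] for r in rankings.values()) for p in players}
    return sorted(avg_rank.items(), key=lambda item: item[1])


for player, avg in consensus_ranking(scout_rankings):
    print(f"Player {player}: average rank {avg:.2f}")
```

Even this simple consensus embeds statistical choices, such as using the mean rather than the median, and weighting every scout equally.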
Lehrer 1.0 knows from writing his book How We Decide that human decision processes are fraught with systematic biases and errors, to which scouts must be as subject as the rest of us. Lehrer 2.0’s recommendation that we rely more on those processes and less on sabermetrics seems especially odd in that light. (No, I am not saying we should simply ignore the scouts, but we should not pretend they are not human, either.) I think Lehrer 1.0 is closer to the right way to think about this. I guess I’m not yet ready to upgrade to Lehrer version 2.0.