Ah, the Saber-sphere is all abuzz with talk of regression to the mean. Regression to the mean is a fairly simple concept. If, over the past four years, you have a player who has had HR/PA rates of 2.8%, 1.9%, 2.3%, and 2.4%, then suddenly, his rate goes to 7.3%, what should you expect in the next year? (The correct answer is 2.6%, at least that’s what Brady Anderson did in 1997.)
Why not expect 7% again? Baseball fans (and a few front office folk) are remarkably good at coming up with justifications for why one should expect 7%. They might say, “That year, Brady developed a new swing/changed his routine/changed his diet/began dating Madonna. That must be the reason for his sudden power outburst!” (The more cynical among you might suggest more nefarious reasons*.) How about another explanation? Brady Anderson got insanely lucky in 1996. It’s not often that fate smiles that kindly on one man for such a short period of time, but… how to explain this without referring to Kevin Federline… let’s just say it doesn’t happen very often.
After a few years’ worth of data points from 1992-1995, we have a decent idea that in reality Brady Anderson is the kind of guy who hits a home run once every 40 trips to the plate (2.5%). In other words, we can be pretty sure that’s Brady’s true talent level. When he outshot that true talent level in 1996, it made sense that he was due to come back down to earth the next year (which he did). Or in fancy statistical terms, he regressed to his own mean. His performance regressed (got worse) because, deep down, he was playing over his head the year before, and the next year, he went back to doing what he usually does.
Exactly how to incorporate regression to the mean is the great knuckleball of Sabermetrics. There are as many theories on how to do so as there are Sabermetricians who have looked at the question. This is because what folks are really talking about is not “how do I regress to the mean mathematically?” That’s actually really easy. The real question is “How do we estimate a player’s true talent level?” In other words, what do I regress back to? What is this player really capable of?
Colin Wyers wrote a bit on true score theory in a recent THT article. In the piece, he said that a player’s performance is a function of his true talent level, random error (aka luck), and bias in measurement. He made me happy by including measurement bias in his conceptualization (although he then politely dismissed it). I still think there’s one extra missing piece that he hadn’t considered. Colin began to hint at that missing piece when he talked about Ichiro, who gets a hit in roughly 30% of his at-bats.
“Moreover, based on all those factors–and of course many others–a player’s true talent level changes from moment-to-moment. Ichiro may have a 30 percent chance of getting a hit in one at-bat, but if his jock strap starts to itch, perhaps that goes down to 29 percent the next. On the other hand, if someone in the dugout makes a funny joke (auth note: in Japanese? – P.C.) that puts Ichiro in a good mood, his true talent could go up to 31 percent so long as that good mood lasts.”
Add that missing piece, contextual factors, and the actual equation should look like: Observed performance = true talent + measurement bias + contextual factors + luck/random error.
If there is a great sin of Sabermetrics, it’s that we (and I happily include myself in that pronoun) have treated players as though they were Strat-o-matic cards. That is to say that they don’t respond in the least to what’s going on around them, which doesn’t make common sense (although common sense is not a proof of anything…). We act as if it’s just a matter of finding the right algorithm based on last year’s stats plus this year’s stats times prime rate minus the square of blah blah blah… After that, we know what a player has the probability to do. And he’ll do it no matter what situation he is in.
Or will he? Colin correctly points out that we won’t be able to know everything. (I frankly don’t want to know if Ichiro’s jock strap starts to itch.) But there are some things that we can know, and know them rather easily, that might make a big difference. Let’s take a truism in life. It’s a lot easier to do your job when you are in a good mood than when you’re in a bad mood, and overall, you’re probably better at the job in a good mood. Does it apply in baseball? Let’s take the simplest rough proxy for a good mood that there is: is my team winning?
Warning: This is the nerdy part.
I took the 2008 season, and found all plate appearances in which a batter who had at least 250 PA squared off against a pitcher who faced at least 250 batters, and in which the score was not tied. (It left about 78,000 plate appearances.) I classified whether the plate appearance ended in an on-base event (I included ROE), or not. To control for batter and pitcher matchup, I took the batter’s seasonal OBP (including ROE) and the pitcher’s OBP against. This is nice because we can use hindsight to get an idea of what each player’s overall talent level was during the 2008 season. Because OBP is stated as a probability (a .350 OBP = a 35% chance of reaching base), we can convert the percentages into odds ratios with the formula OR = p / (1 – p).
Once we have that, we can figure out what the expected outcome is of this matchup with the formula: (batter OR / league OR) * (pitcher OR / league OR) = (expected OR / league OR). Solve for the expected OR (it works out to batter OR * pitcher OR / league OR) and take the natural log of that number (more on this step in a moment).
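For anyone who wants to play along at home, here is a minimal Python sketch of the two steps above: turning an OBP into an odds ratio, and combining batter, pitcher, and league into an expected log-odds value. The function names are mine, and the .380 and .310 players are made up for illustration; only the .339 league figure appears in this piece.

```python
import math

def odds(p):
    """Convert a probability (e.g. an OBP of .350) into odds: p / (1 - p)."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back into a probability."""
    return o / (1.0 + o)

def expected_log_odds(batter_obp, pitcher_obp, league_obp):
    """Natural log of the expected odds for a batter-pitcher matchup.

    Rearranges (batter OR / league OR) * (pitcher OR / league OR)
    = (expected OR / league OR) to solve for the expected OR.
    """
    league = odds(league_obp)
    expected = odds(batter_obp) * odds(pitcher_obp) / league
    return math.log(expected)

def expected_obp(batter_obp, pitcher_obp, league_obp):
    """Same matchup, expressed as a probability instead of log-odds."""
    return prob(math.exp(expected_log_odds(batter_obp, pitcher_obp, league_obp)))

# Hypothetical matchup: a .380 OBP batter against a pitcher with a .310 OBP against.
print(round(expected_obp(0.380, 0.310, 0.339), 3))
```

A handy sanity check: a league-average batter against a league-average pitcher comes out at exactly the league OBP, and any batter against a league-average pitcher comes out at his own OBP.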
I then put the logged-odds-ratio values into a binary-logit regression equation. Binary logit deals with outcomes that are binary (yes/no). Either the batter was safe or he was out. Binary logit models the probability that the answer will be yes or no based on whatever factors are entered in. It does this by modeling the probability as a… wait for it… natural log of the expected odds ratio.
Slip the natural log of the odds ratio based on the expected outcome from the batter and pitcher into the model, and you’ve controlled for the batter-pitcher matchup. (The coefficient on that factor should be very very very near 1.00 when you get your output). Now, enter any other predictor you want… including the dummy variable of whether or not the batting team is winning. Because we have 78,000 cases, we’ve got plenty of statistical power to check for significance above and beyond the effects of the batter and pitcher matchup. (Want more power? Add additional years!)
The regression equation that’s produced will give you a predicted log-of-the-odds ratio given all the input factors. I did just that. (To make sure it wasn’t an artifact of the 2008 season, I re-ran the analyses for 2007 and 2006 and got the same basic results.)
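To make the setup concrete, here is a rough sketch of that kind of model in plain Python. Big caveats: the data below are synthetic (randomly generated, not the 2008 plate appearances), the "true" winning effect of 0.03 in log-odds is an assumption I picked for illustration, and the hand-rolled Newton-Raphson fitter stands in for whatever stats package you prefer. It only shows the shape of the analysis: intercept, matchup log-odds, and a winning dummy.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logit(rows, y, iterations=25):
    """Fit a binary logit by Newton-Raphson; each row starts with a 1 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

# Synthetic plate appearances (assumption: NOT real data).
random.seed(0)
TRUE_WINNING_EFFECT = 0.03  # assumed log-odds bump when the batting team is ahead
rows, y = [], []
for _ in range(20000):
    matchup_logit = random.gauss(-0.67, 0.25)  # made-up expected log-odds
    winning = random.randint(0, 1)             # 1 = batting team ahead
    p = sigmoid(matchup_logit + TRUE_WINNING_EFFECT * winning)
    rows.append([1.0, matchup_logit, float(winning)])
    y.append(1 if random.random() < p else 0)

intercept, matchup_coef, winning_coef = fit_logit(rows, y)
print("matchup coefficient:", round(matchup_coef, 3))
print("winning coefficient:", round(winning_coef, 3))
```

With only 20,000 fake plate appearances the winning coefficient is estimated pretty noisily, which is exactly why sample size (and adding more years) matters for an effect this small.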
OK, you can open your eyes now.
What’s the value of the batter’s team winning vs. the batter’s team losing? Let’s say a league average batter faces a league average pitcher (an OBP of .339, including ROE). The generated equation says that if the batter’s team is winning, the expected OBP for that situation is .341. If the batter’s team is losing, it’s .334. That’s a 7-point swing, based entirely on what’s on the scoreboard. Seven points is not huge, but it’s not exactly trivial either.
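If you want to see what that swing looks like on the log-odds scale the regression actually works in, a few lines of Python will do it. The three OBP values are the ones quoted above; the log-odds gap is just my arithmetic on those rounded numbers, not a coefficient pulled from the regression output.

```python
import math

def logit(p):
    """Natural log of the odds for a probability p."""
    return math.log(p / (1.0 - p))

OBP_BASELINE = 0.339  # league-average batter vs. league-average pitcher
OBP_WINNING = 0.341   # expected OBP when the batter's team is ahead
OBP_LOSING = 0.334    # expected OBP when the batter's team is behind

# The swing in OBP "points" (thousandths).
swing_points = round((OBP_WINNING - OBP_LOSING) * 1000)

# The same gap expressed in log-odds.
log_odds_gap = logit(OBP_WINNING) - logit(OBP_LOSING)

print("swing in OBP points:", swing_points)
print("gap in log-odds:", round(log_odds_gap, 4))
```

The log-odds gap comes out to roughly 0.03, which gives a feel for how small a coefficient has to be detected to find a seven-point effect.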
So, a player’s “true” talent can vary based on whether or not he’s winning. This is very interesting when watching (and analyzing) baseball on a short-term level, and even has some more macro-level applications. Suppose that a player is traded from a team which isn’t very good (and as such, is losing a lot) to a team that is good (and is winning a lot). Or the other way around. Should we not adjust our estimates of his true talent to compensate?
Whether the batting team is winning is just one variable for consideration. I can think of a dozen more. I may not be able to read minds, but it’s not hard to figure out that a player is probably frustrated when in a slump (or when his teammates are in a slump?), angry when a bad call is made by an umpire, or a little more lethargic when it’s cold. All of these variables might be found with a little bit of engineering derring-do. But the point is that it’s time that we started looking a little harder at how the situation affects the men playing the game. Context matters.