Hair of the Dog: The Hangover Effect Revisited

When a game goes this long, sometimes a team needs two position players to pitch (via Dave Herholz).

Does it count as name-dropping if nobody actually says your name?

The pity is, I should have seen it live. I had been taking the MLB game-streaming subscription for an extended test-drive and, being a Yankees fan, I could have had their home game against the Rays running. Instead, I was watching the Cardinals and Cubs. I’m going out to Chicago later this season and will get to take in my first game at Wrigley Field during this, its centennial season. I had therefore lately been following the Cubs, getting a tutoring in the state of masochism that famously defines the team’s fans.

(This isn’t hyperbole. Less than two weeks before, I’d watched them play in Federal League regalia to commemorate the park’s 1914 grand opening as home for what would be known as the Chicago Whales. They were up 5-2 on the Kansas City Packers (the day’s alias for the Arizona D-Backs) going into the ninth. They gave up five runs that inning in truly gruesome fashion—especially for right fielder Justin Ruggiano, who injured himself chasing what would end up as the game-winning triple and missed more than a month. The sight of that one kid’s reactions in the stands … welcome to Cubs fandom, son.)

I was midway through what was still a scoreless Cards-Cubs game when I got an e-mail from friend (of myself and of THT) Paul Golba. Its compact contents were quite the shock: “They just mentioned your THT article about long games and the impact on later games on YES. Alas did not mention your name, just the site.” Not long after, I got another e-mail, this one from THT’s Dave Studeman, saying much the same thing with an appended “Way cool!”

Why, yes. Yes it was.

With Paul’s mildly blurred recollection guiding me, I used MLB.tv’s game-review function to hunt down the moment. It was the bottom of the third inning, the Rays up 2-0 on the Yankees, with Yangervis Solarte working a long at-bat. The YES Network’s announcing team needed something to fill the time, which lead announcer Michael Kay as much as admitted in introducing the material.

“Our stat-guy, James, is amazing. I don’t know how he finds this stuff. 2012, Hardball Times had a ‘Hangover Effect’ study. They studied games of 18 or more innings and found that the losing teams vastly underperformed their season record during the week after the marathon game.”

I have several observations to make here. Firstly, Mr. Kay shouldn’t be all that amazed. It’s not like The Hardball Times is hiding its light under a bushel. Anyone’s welcome to read us, even noted television personalities who might admittedly have a lot on their plates.

Second, I have no problem swallowing my ego long enough to say that James, whoever he may be, probably did well by crediting THT collectively. Giving my last name to somebody to read cold on live television would have been a disaster waiting to happen. A hilarious disaster, granted: less The Poseidon Adventure than Airplane! Still, best avoided.

Third, what I anticipated would happen did not, and I’m glad it didn’t. The Rays and Yankees had played a 14-inning game the previous night, which obviously was the spur for stat-guy James to dig into the aftereffects of long games. Before finding the clip, I was worried that the announcers were going to take the things I had written about games at least 18 innings long, and apply them out of context to the preceding, significantly shorter contest.

As the quotation I gave shows, they didn’t. They kept my finding (not the only one I made, of course) within its proper narrow focus. This is entirely the doing of the mononymous James, whose behind-the-scenes hunt for broadcast-filling baseball facts did not tempt him to stretch what he uncovered.

So consider this your kudos, James—and consider it an invitation not to be a stranger ’round these parts. We’ve got a comments section: come and say hi. In fact, with the job you have, I’d be interested in much more than saying hi. If you ever want to talk about what it’s like being a stat-hound for a marquee baseball broadcast, you’ll have a receptive audience here. You can go through me; you can go through Dave Studeman; you can do it yourself. Something can be arranged.

Fourth, and most important for what’s to follow, there was a little addendum to the trivia bite. Michael Kay’s reaction to what he had just delivered was, “Gotta be pitching related.” Paul O’Neill, former Yankee and one of Kay’s broadcast partners, dismissed it with “Coincidence!” Ken Singleton, third man in the booth, said something not too distinct, but which sounded like he was agreeing with Kay.

Michael, thank you for asking the next question, even in the form of trying to answer it yourself. Paul, thank you for disputing the point, emphasizing that Michael’s isn’t the last word. Because you two got me wondering the same thing.

When a team’s performance falls off after a marathon game—it happens for both teams, and for rather longer than a week, though the instance James gleaned was the most dramatic—what is the greater cause? It seems natural that a sorely-used pitching staff would take longer to recover than the position players—or does it? Would the need to keep running out most of the position players game after game, while at least some relievers would get a breather for a couple days, make it the reverse?

I will strive to explain the results from my earlier article whenever today’s findings relate to them, but those wanting a full grounding may look back here to that article, either to refresh their memories or see for the first time all that I had to say then.

Offensive and Defensive Effects

My 2012 study looked at games of at least 18 innings that happened between 1990 and the 2011 All-Star Game. In this offense/defense breakdown, I have added a 19-inning game the Atlanta Braves played on July 26, 2011. The reason I’m not adding their opponent, the Pittsburgh Pirates, is that it was the collapses of the Pirates after extra-long games in both 2011 and 2012 that got me studying this phenomenon. Including them in the patterns of wins and losses after a marathon game would have biased the results toward the conclusion I was hypothesizing: that super-long games hurt you in the long term, win or lose.

The data I got without the Pirates ended up supporting this theory. Winning percentages were suppressed after a marathon, compared to teams’ records for the entire season, and the adverse effect lasted a full 30 days after the game. (I didn’t track beyond 30 days, thinking any “hangover effect” that did exist would run out by then.)

I have taken those 47 teams, the original 46 plus the throw-in Braves, and gathered both batting and pitching data for them for the periods after their marathon games. This generally went 30 days, but not always. I cut off games played after any substantial break that would allow teams to recuperate from fatigue. This affected six teams (in three games) playing before the All-Star break, and two teams that played before the week’s suspension of baseball after 9/11. This leaves 1,131 total games for the 47 teams.

This first table shows some offensive stats of marathon-playing teams, during both the full season and the 30-day “hangover” period after their marathon games (or as many games as they played before one of the breaks mentioned above). The apparent effect was rather surprising, but I may have a reason for this.

Offensive Changes for “Hangover” Teams
Batting Stats    “Hangover”            Full Season            Difference
H/9              9.06                  9.052                  0.008
R/9              4.648                 4.623                  0.025
BA/OBP/SLG       .2644/.334/.414       .2625/.3308/.4125      .0019/.0032/.0015
OPS              0.748                 0.7433                 0.0047
BB/9             3.385                 3.339                  0.046
SO/9             6.41                  6.475                  -0.065
HR/9             0.998                 1.009                  -0.011
HBP/9            0.332                 0.32                   0.012

Yes, those numbers show teams’ offensive performance improving, if slightly, after the exhaustion of an 18-inning (or more) game. Every category shown is in favor of the hangover teams except for home runs. The margins may be slim—one-fortieth of a run per game would be four runs over a whole season—but it’s the wrong direction when performance is supposed to be degrading.

What we may be seeing is the effect of summer. The warmer months bring a rise in offense, an effect widely understood for some time. As I observed in the original article, none of the marathon games in the study period happened in September (or October). This means none of the hangover periods would have run to the end of the season, when temperatures and thus offenses are cooling again. Even if some had, the cutoff brought by the end of the season would have mitigated the influence of autumn’s approach.

So an actual drop in offensive performance could have been masked by a rise in the overall scoring environment. Or the rising temperatures might have just augmented an actual, if tiny, uptick. But what we’re actually looking for is the comparison between offensive and defensive changes, so we can shelve that question for the time being. How do the hung-over teams look on the other side of the ledger?

Defensive Changes for “Hangover” Teams
Pitching Stats (allowed)    “Hangover”    Full Season    Difference
H/9                         9.297         9.025          0.272
R/9                         4.839         4.589          0.25
ERA                         4.381         4.193          0.188
UERA                        0.458         0.396          0.062
HR/9                        1             0.985          0.015
BB/9                        3.372         3.334          0.038
HBP/9                       0.32          0.327          -0.007
SO/9                        6.542         6.681          -0.139
FIP*                        4.321         4.258          0.063

*The fielding-independent pitching numbers use a constant of 3.10, roughly the average for the 1990-2011 period under study.
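To make the FIP row easier to check, here is a minimal Python sketch that computes FIP from per-nine rates. It assumes the standard weights (13 for home runs, 3 for walks plus hit batters, -2 for strikeouts) along with the 3.10 constant; with those assumed weights it reproduces the table’s 4.321 and 4.258. The function name `fip_from_per9` is illustrative only.

```python
def fip_from_per9(hr9: float, bb9: float, hbp9: float, so9: float,
                  constant: float = 3.10) -> float:
    """FIP computed from rates already expressed per nine innings.

    Standard form (assumed here): FIP = (13*HR + 3*(BB + HBP) - 2*SO) / IP + constant.
    Because every rate is per 9 IP, the weighted sum is divided by 9.
    """
    weighted = 13 * hr9 + 3 * (bb9 + hbp9) - 2 * so9
    return weighted / 9 + constant


# Per-nine rates allowed, taken from the defensive table.
hangover = fip_from_per9(hr9=1.000, bb9=3.372, hbp9=0.320, so9=6.542)
season = fip_from_per9(hr9=0.985, bb9=3.334, hbp9=0.327, so9=6.681)

print(f"Hangover FIP:    {hangover:.3f}")
print(f"Full-season FIP: {season:.3f}")
print(f"Difference:      {hangover - season:+.3f}")
```

Working from per-nine rates avoids needing raw innings totals, since the division by nine plays the role of dividing by innings pitched.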

This is more what we expected. Every statistic worsens, except for the not-so-important HBP numbers. There are still a couple of interesting comparisons to make with the offensive stats.

It’s interesting that the walk numbers both go up, a shade more for offense than defense. It could be that a thousand-plus games isn’t that reliable a sample (and you’ll see later that this isn’t my only cause for thinking so). Otherwise, what could raise the walk rates for “hung-over” teams? Does fatigue make you more passive at the plate? If so, it doesn’t cost you in called third strikes, as the offensive strikeout numbers dip. It might, per the HBP figures, make you a bit slower getting out of the way.

The home run figures move in the expected directions, but added together are virtually a wash. Given that I’m positing that the hangover periods tend to coincide with summer months, and that summer’s heightened offense comes largely from balls carrying farther and ending up homers more often, this is a bit incongruous.

The original question, you may recall, was whether the post-marathon slump in teams’ fortunes was attributable to the pitching side of the ledger. Despite the possible confounding factor of summer, it looks safe to say that it is. Teams gave up an extra quarter-run per game during their hangover periods, as opposed to adding just one-fortieth of a run per game on offense. Michael Kay’s instincts were on the money.

In truth, though, this doesn’t quite account for the weaker records they produce. I calculated in 2012 that hangover teams won about one game less over the following 30 days than their overall records would indicate. The change in run differential shown by the tables is 0.225 runs per game (a quarter-run more allowed, minus one-fortieth of a run more scored). In 30 days, a team can expect to play an average of 27 games, which would add up to 6.075 runs. Using the standard equivalence of 10 runs to one win (and given that the study centers on the Steroid Era, this holds up despite the recent offensive cool-down), those runs account for about six-tenths of a win, so we are missing around two-fifths of a win.
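The arithmetic in that paragraph, as a short Python sketch. It takes the R/9 figures from the two tables, the 27-game estimate, and the 10-runs-per-win rule of thumb; the one-win observed shortfall is the approximate figure from the 2012 study quoted above, not a new measurement.

```python
RUNS_PER_WIN = 10.0     # rule-of-thumb conversion used in the text
GAMES_IN_30_DAYS = 27   # average games played in a 30-day span

# R/9 for hung-over teams minus R/9 for the full season.
offense_change = 4.648 - 4.623   # runs scored (higher is better)
defense_change = 4.839 - 4.589   # runs allowed (higher is worse)

# Net change in run differential per game; negative means the team got worse.
run_diff_change = offense_change - defense_change

runs_lost = -run_diff_change * GAMES_IN_30_DAYS
wins_explained = runs_lost / RUNS_PER_WIN

# Approximate shortfall in wins versus season records, from the 2012 study.
observed_shortfall = 1.0
unexplained = observed_shortfall - wins_explained

print(f"Run differential change per game: {run_diff_change:+.3f}")
print(f"Runs lost over {GAMES_IN_30_DAYS} games: {runs_lost:.3f}")
print(f"Wins accounted for by runs: {wins_explained:.4f}")
print(f"Wins left unexplained: {unexplained:.4f}")
```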

The 10-to-one ratio is shorthand, of course. The “hung-over” teams simply undershot the records we’d expect from the change in run differential. That itself, though, is a problem. It means the expected hangover effect was less than the one the records showed, unless we argue that teams somehow sequence their runs in a way causing them to lose more tight games (or win more routs) in the aftermath of marathon games.

The former is perhaps arguable. If you’ve blown out your bullpen playing 18 innings, subsequent close games are liable to get rocky. Still, if we give credence to the underlying numbers, the hangover effect isn’t as strong as I originally stated it to be.

Fortunately, I was already looking to test my results further. I had gathered data from years both before and after my original period, going back to 1982 and forward to the marathon-heavy season of 2013, when six games of at least 18 innings were played. Following my previous example, I excluded the 19-inning game played by the 2012 Pirates, since their post-marathon travails had drawn my interest and would bias my result toward my preconceived conclusion. Their opponent that day, the Cardinals, I did include.

This expanded the original pool from 46 to 98 teams, or would have, had a new complication not cropped up. On June 11, 1985, the Atlanta Braves lost an 18-inning game against the San Francisco Giants. Starting 23 days later, on the 4th (and 5th) of July, they had a 19-inning game against the New York Mets, an absolutely bonkers contest that THT’s Chris Jaffe wrote up as his nominee for the greatest game ever played.

This is the sole instance in the data set of a team playing two games of 18 innings or more within the space of one month. It adds a level of fatigue to the Braves’ endeavors, at least after the second marathon, that I cannot compare properly with the other teams. I thus removed the second game entirely, and cut off the hangover period of the first game at the point where they went 19 rounds with the Mets.

There were also more instances of the All-Star break intervening, as well as September marathons meaning the season sometimes ended before the full 30 days elapsed. I made the usual cutoffs after those. I took out games against a team’s marathon opponent, as each would necessarily add one win and one loss to the pooled record and thus tell us nothing, and for the same reason trimmed out games against other teams currently in a post-marathon hangover.
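For readers who want to replicate the selection rules, here is one way they might be coded. This is a hypothetical sketch, not the code behind this study: the `Game` record layout, the `hangover_games` and `win_pct` helpers, and the toy games at the bottom are all invented for illustration. It keeps a team’s games within 30 days of its marathon, stops at the first qualifying break (the All-Star break, the post-9/11 suspension, or a second marathon), and drops games against the marathon opponent or against any team inside its own hangover window.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Game:
    """One team's side of one game (hypothetical record layout)."""
    day: date
    team: str
    opponent: str
    won: bool


def hangover_games(games, team, marathon_day, marathon_opponent,
                   cutoffs=(), other_windows=None, window_days=30):
    """Return the games that count toward a team's post-marathon window.

    cutoffs: dates that end the window early, e.g. the first day of the
        All-Star break, the post-9/11 suspension, or a second marathon.
        Games on or after the earliest such date are dropped.
    other_windows: {team: (first_day, last_day)} for other teams currently
        in their own hangover windows.
    """
    other_windows = other_windows or {}
    last_day = marathon_day + timedelta(days=window_days)
    later = [c for c in cutoffs if c > marathon_day]
    stop = min(later) if later else None

    selected = []
    for g in games:
        if g.team != team or not (marathon_day < g.day <= last_day):
            continue  # wrong team, or outside the 30-day window
        if stop is not None and g.day >= stop:
            continue  # a break or second marathon has ended the window
        if g.opponent == marathon_opponent:
            continue  # each such game adds one W and one L to the pooled record
        window = other_windows.get(g.opponent)
        if window is not None and window[0] <= g.day <= window[1]:
            continue  # opponent is hung over too
        selected.append(g)
    return selected


def win_pct(games):
    """Winning percentage of a list of Game rows (NaN if empty)."""
    return sum(g.won for g in games) / len(games) if games else float("nan")


# Toy example with made-up teams and dates.
marathon = date(2000, 6, 1)
toy = [
    Game(date(2000, 6, 2), "A", "B", False),   # marathon opponent: dropped
    Game(date(2000, 6, 3), "A", "C", True),    # kept
    Game(date(2000, 6, 5), "A", "D", False),   # D is in its own hangover: dropped
    Game(date(2000, 6, 10), "A", "E", True),   # kept
    Game(date(2000, 6, 20), "A", "C", False),  # after the break: dropped
    Game(date(2000, 6, 3), "C", "A", False),   # other team's row: ignored
]
kept = hangover_games(toy, "A", marathon, "B",
                      cutoffs=[date(2000, 6, 15)],
                      other_windows={"D": (date(2000, 6, 1), date(2000, 7, 1))})
print([g.day.isoformat() for g in kept], win_pct(kept))
```

Treating each break as a single cutoff date keeps the rule simple: everything from the first break onward is discarded, which matches how the text describes trimming windows rather than resuming them after a layoff.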

The Longer View

That done, I got on with the business of seeing how the expanded group coped after the exhaustion of a double-length (or more) game. And ended up almost wishing I hadn’t.

Expanding the sample wipes out most of the evidence that a game of 18 innings or more has a deleterious effect on a team’s performance in following games. The overall trend remains downward, but by a margin that is no longer statistically significant. If one counts only the new games I added to the list, one shockingly finds the opposite effect happening.

Win-Loss Records for “Hangover” Teams
Years of Sample       Season W%    “Hangover” W%    “Hangover” W-L
1990-2011             0.5013       0.4613           471-550
1982-2013             0.4997       0.487            1085-1143
1982-89, 2011-13      0.4982       0.5087           614-593

In the original sample, hangover teams lost .0400 from their winning percentages in the 30 days after their marathons. This shrinks to .0127 for the expanded sample, and turns into a .0105 increase for the newly added teams. Worse still, the added games form a larger sample than the original one, and thus should be the more reliable indicator.
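How much weight can these shifts bear? One rough check, offered as an illustrative assumption and not necessarily the test behind the figures here, treats each hangover game as an independent trial at the team’s season winning percentage and computes a normal-approximation z-score. It ignores opponent quality and home field, so read it as a sketch. Under that simplification the original sample clears the usual 1.96 threshold, while the expanded and added samples do not.

```python
import math

# (season W%, hangover wins, hangover losses), from the win-loss table
samples = {
    "1990-2011": (0.5013, 471, 550),
    "1982-2013": (0.4997, 1085, 1143),
    "added years": (0.4982, 614, 593),
}


def hangover_z(season_pct, wins, losses):
    """Return (change in W%, z-score) for a hangover record vs. season W%.

    Assumes each game is an independent trial won with probability
    season_pct; this ignores schedule strength and home field.
    """
    n = wins + losses
    change = wins / n - season_pct
    std_err = math.sqrt(season_pct * (1 - season_pct) / n)
    return change, change / std_err


for label, (pct, w, l) in samples.items():
    change, z = hangover_z(pct, w, l)
    verdict = "significant" if abs(z) >= 1.96 else "not significant"
    print(f"{label:<12} change {change:+.4f}  z {z:+.2f}  ({verdict} at 95%)")
```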

This is surprising if one looks at the years. There were 23 marathon games in the 21 and a half seasons from 1990 to mid-2011. In the 10 and a half seasons forming the “bookend” years, there were 27 games that went 18 or beyond. My original sample had marathons taking place at less than half the rate of the other years, a result that holds up when taking both expansion and strike seasons into account.
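The per-season rates behind that comparison, as a quick Python check using the counts just given. It is a raw rate and makes no adjustment for expansion or strike-shortened seasons.

```python
# (marathon games, seasons covered), using the counts given in the text
periods = {
    "1990 to mid-2011": (23, 21.5),
    "1982-89 and mid-2011 to 2013": (27, 10.5),
}

rates = {name: games / seasons for name, (games, seasons) in periods.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} marathons per season")

# Raw ratio only: no adjustment for expansion or strike-shortened seasons.
ratio = rates["1990 to mid-2011"] / rates["1982-89 and mid-2011 to 2013"]
print(f"Original-period rate relative to the other years: {ratio:.0%}")
```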

I can, even in my flabbergasted condition, imagine something that might explain the huge discrepancy, but I’ll save that for a bit. In hopes of salvaging something from my original conclusions, I will break down the results for teams that won the marathon and those that lost.

I originally found that winners got a bump in the first week after, while losers crashed and burned. Over the full month, the cumulative gap narrowed a great deal, but still left the marathon winners with better performance than the losers, compared to season marks for both. The table below shows the relevant numbers.

Records for Winners & Losers, Original Sample
Type       Games in Sample (Week/Month)    Season W%    Week After    Month After
Winners    139/556                         0.5027       0.5324        0.473
Losers     138/551                         0.4999       0.3841        0.4537

The winners had their records over the next month drop by .030 against their full-season marks, while the losers saw a fall-off of .046.

With the expanded sample, the season marks of the winners and losers change. The winning teams in marathon games averaged a .5063 winning percentage; the losers averaged .4929. Not big changes, and of course they are acting as baselines for performance in the hangover period. We expect the numbers to drop, and they do … just in a different pattern than before.

Records for Winners & Losers, Expanded Sample
Type       Games in Sample (Week/Month)    Season W%    Week After    Month After
Winners    239/1073                        0.5063       0.5546        0.4893
Losers     229/1037                        0.4929       0.4498        0.4831

Winners do better than before, both for the following week and the following month. Losers, however, do much better than before, strongly closing the gap in both periods. In fact, for the full hangover month, those losers now lose less off their seasonal winning percentage than the winners do. The week after a drawn-out defeat is still rough, but by these numbers, the month after ends up a shade better. (Except for having lost the marathon. The loser gains only around a fifth of a win back in the next month: not worth the cost.)
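The “fifth of a win” can be traced with a few lines of Python. This sketch takes the season and month-after winning percentages from the two winners-and-losers tables, measures each group’s drop, and converts the losers’ advantage over the winners into wins using the 27-game month estimate from earlier. The helper name is illustrative; a negative result means the marathon winners held up better.

```python
GAMES_IN_MONTH = 27  # average games in a 30-day span, as estimated earlier

# (season W%, month-after W%) from the two winners-and-losers tables
tables = {
    "original": {"winners": (0.5027, 0.4730), "losers": (0.4999, 0.4537)},
    "expanded": {"winners": (0.5063, 0.4893), "losers": (0.4929, 0.4831)},
}


def losers_edge_in_wins(rows, games=GAMES_IN_MONTH):
    """Wins by which the losers' month-after drop is smaller than the winners'.

    A negative value means the marathon winners held up better.
    """
    winners_drop = rows["winners"][0] - rows["winners"][1]
    losers_drop = rows["losers"][0] - rows["losers"][1]
    return (winners_drop - losers_drop) * games


for sample, rows in tables.items():
    print(f"{sample}: losers' edge over the month = "
          f"{losers_edge_in_wins(rows):+.2f} wins")
```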

This means, of course, that once again the added sample turned out to be vastly different from the original one. How different?

[Chart: winners’ and losers’ hangover records, original sample vs. added games]

The samples are not very large—in the 100-game range for the first week and the 500-game range for the full month—which explains why swings this big aren’t quite as shocking as they may appear. Okay, the swing for losing marathoners in the first week is still shocking.

What it adds up to remains the same. There are still indications that winning a game that goes 18 innings or longer will give you a week-long boost, and losing it will depress your performance for the same period. The sample sizes, though, are too small to make a definitive call (not that I’m in a mood any longer to trust my definitive calls). For the full hangover month, I won’t even try to guess what, if anything, is happening there.

The Upshot

I’m really beginning to regret this conscientious streak of mine.

I started off by trying to confirm Michael Kay’s impression that a team’s slump after playing (and losing) a marathon game stems from the pitching staff. I found decent evidence for that, but after following up my old study, I’m now a lot less sure that the hangover effect itself is real. The overall numbers still point in that direction, but with nowhere near the confidence we had before. So the deteriorating pitching explains a phenomenon that, it turns out, may not exist after all. (Also, the uptick in offense after a marathon game could now be explained by that marathon not worsening the teams playing it.)

It’s a Pyrrhic victory at best for Mr. Kay, while Paul O’Neill, who dismissed it as coincidence, may have been right all along. Ken Singleton, with his indistinct grunt, probably comes out of the debacle best.

Why did the numbers turn sour this way? It may have been bad luck, choosing a sample that happened to be misleading. The sample extended across more than 20 years, but it also encompassed just 23 games. Contests that go 18 innings are, despite the bounty of a half-dozen last season, extremely rare, and they were rarer still during the 1990-2011 time period.

There is an established reason why this timeframe would have fewer marathons. In another previous article of mine, “Beyond the Ninth Inning,” I found that higher run environments lowered both the chance of a game going to extra innings and the chance of a game in extras reaching the next inning. The 1990-2011 period includes the entirety of the high-offense environment we now know as the “Steroid Era.” It’s natural that a sample centered on this era would run a deficit of 18-inning games.
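Why would a modest change in run environment matter so much at 18 innings? A toy model makes the compounding visible. It assumes, purely for illustration, that a game is tied after nine with some probability and that each extra inning independently leaves it tied with another; reaching the 18th inning then requires eight straight tied extra innings (the 10th through the 17th). The probabilities below are hypothetical, not figures from “Beyond the Ninth Inning,” and `p_reach_inning` is an illustrative name.

```python
def p_reach_inning(p_tied_after_9, q_stays_tied, inning=18):
    """Chance a game reaches `inning` under a toy independence model.

    Reaching inning N (N > 9) requires a tie after nine and then a tie at
    the end of every extra inning from the 10th through the (N-1)th.
    """
    extra_innings_tied = inning - 10
    return p_tied_after_9 * q_stays_tied ** extra_innings_tied


# Hypothetical inputs: the higher-offense case trims each probability by 10%.
lower_offense = p_reach_inning(p_tied_after_9=0.10, q_stays_tied=0.50)
higher_offense = p_reach_inning(p_tied_after_9=0.09, q_stays_tied=0.45)

print(f"Lower offense:  {lower_offense:.5f}")
print(f"Higher offense: {higher_offense:.5f}")
print(f"Ratio: {higher_offense / lower_offense:.2f}")
```

With these invented inputs, trimming each probability by a tenth leaves the higher-offense game only about 39 percent as likely to reach the 18th inning, because the per-inning factor is raised to the eighth power.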

Is it possible that there’s a further effect? Could the steroid-fueled offensive runaway have produced more severe detriments to teams that managed to play a game that lasted 18 innings or more?

I’d love there to be a connection, if only to explain away this U-turn I’m forced to make, but I can’t really see the mechanism that would make it happen. Was it so exhausting for relievers to put up strings of zeroes against the sluggers of the age that their arms would be limp noodles for weeks afterward? Were batters, used to domination, so demoralized by being unable to break the deadlock inning after inning that they fell into slumps? The Steroid Era may be responsible for a whole lot (you can hold that argument among yourselves), but I can’t hang this around its neck.

More likely, I got burned by the numbers. Even a result significant at the 95 percent level carries a 5 percent chance of turning up when no real effect exists. It might even be that the second set of data was the screwy one, and the original conclusion was right after all. To test that, I’d need to enlarge my sample yet again, going back into the 1970s and likely beyond. I won’t do that here, because I’m concerned that the game was different enough half a century ago that I might no longer be measuring the same thing when testing post-marathon performances.

Also, it’s a lot of work just to set myself up for another probable gut-punch to my ego, and I’m not up for it just this moment.

I was not looking to withdraw my findings from the fall of 2012, and if I had just buried this follow-up and done something else to post here, there’s a good chance nobody would ever have been the wiser. That’s not how baseball analytics, or anything presuming to follow the method we call “science,” works. Besides, heaven forbid somebody should actually base some in-game decision on an obscure bit of baseball knowledge that turns out instead to be an obscure goof-up.

And I guess I owe an apology to stat-guy James of the YES Network. Sorry the numbers weren’t all I thought that they were, James. No hard feelings, I hope.


A writer for The Hardball Times, Shane has been writing about baseball and science fiction since 1997. His stories have been translated into French, Russian and Japanese, and he was nominated for the 2002 Hugo Award.
5 Comments
Vslyke
9 years ago

“In this offense/defense breakdown, I have added a 19-inning game the Atlanta Braves played on July 26, 1991.”

That should be 2011. Thanks for the post!

James
9 years ago

Hey, I’m James!

Shane Tourtellotte
9 years ago

Vslyke: Thanks for the catch. It’s been corrected.

James: If that is truly you, we’ve got plenty to talk about. Do you want to start here, or shall I slip you my e-mail address?

AC of DC
9 years ago

Wouldn’t dismiss the initial findings outright just yet. By extending the sample further into the past, you may have added more games, but you’ve also introduced games from a period with a different run environment and different bullpen usage from today, and thus examined effects that may no longer be relevant. It could be that the more recent data is accurate for today’s game, but just lacks confidence.

The thing to do would likely be to broaden to include shorter extra-inning games, perhaps define game length not by innings but by PA or NP even, and see if there’s a trend as the game goes longer.

Incidentally, how many of these “hungover” teams met the same opponent the next day, or again in the following week or month? Who were their opponents in that span? While I recognize that once a game has entered extras the outcome trends closer to coin-flip, might any of the performance woes be attributable to playing a superior team or teams, against whom one’s numbers are going to drop below the season’s mean? Wait, so maybe I am saying to dismiss the initial findings. Dang.

Paul G.
9 years ago

Eh, that’s what I get for watching baseball on television. Michael Kay, bane of my existence!

I tend to agree with AC of DC that the original sample may be more representative of the modern era than the extended sample. The modern bullpen of 12-13 pitchers was a response to the success of Tony LaRussa in Oakland in the late 1980s. Prior to then it was expected that a team would have a long reliever or two that could eat lots of innings. Teams would go into the playoffs with 9 or 10 pitchers on the roster. Complete games were much more prevalent (NL CG team average: 19 in 1986, 4 in 2013). These days a starter that can throw over 7 innings consistently and a reliever that can pitch two innings are luxuries, which is why teams have to carry 7-8 relievers. The pitching game was a lot different in the 1970s than today, but that was true in the bulk of the 1980s as well. If I were to bet, it takes a lot less to tire out the bullpen today than in 1983. To go the anecdote route, what happens these days when a team has to throw a lot of innings, either because of a long game or a doubleheader? Usually one or two pitchers are recalled from the minors the very next day. Heck, these days they allow you to carry an extra pitcher for a doubleheader just to mitigate bullpen exhaustion.

Since pitching appears to be the likely culprit, what may be interesting to investigate is to look at the actual box scores of the games after the marathon. How much of the bullpen was used in the marathon? Who relieved the next day and how did they do? When did the tired arms pitch again and how did they do? Did the starters manage to go longer and “save” the bullpen? A marathon game that used 3 pitchers and the next day’s starter goes 9 is going to have much less future impact than a game when the manager uses up all his bullets and the starter ekes out 5 innings.

And I think I speak for all of us that we respect and appreciate your integrity to further investigate this matter, even when the results were not what you wanted. That is how science is supposed to work.