In high school world history class, I read a book called Cod, which managed to summarize 1,000 or more years of Western history and geopolitics by focusing on cod and the fishing industry. There’s a whole subgenre of these books that summarize the world (or a part of the world) by focusing on one mundane thing; two others use salt (same author as Cod!) and beverages to frame their histories.
A history of baseball focusing on playing games at night could do something similar, as night games make a good lens for looking at things like the Negro leagues, World War II, cross-country expansion, and the influence of television on the game. You could even tuck in events that were memorably shaped by lights (or the lack thereof), like the Homer in the Gloamin’ or the 33-inning marathon played by Rochester and Pawtucket in 1981.
However, it’s not clear that the sabermetric chapter of that book would be very long, as there’s not much canonical analysis of day/night splits. Probably the most substantive piece (at least among those easily found) is Russell Carleton’s Baseball Prospectus article from several years ago, in which he essentially concluded that day/night splits aren’t to be trusted.
What if there are effects more subtle than those tested for by Dr. Carleton, though? What if we need to use PITCHf/x to see if something’s going on?
I first became interested in this while doing an entirely unrelated analysis last year, which looked at whether players who wear their socks up get low strikes called differently from players who wear them down. (The answer is “probably not.”) As day/night splits are the same as high/low sock splits for David Wright, I wound up looking at these splits in PITCHf/x, and after a certain amount of puttering and pondering, some interesting results jumped out.
While trying to understand those results, it made sense to also consider whether games were played outdoors or indoors. Together, the day/night and indoor/outdoor splits define four “lighting” states. The extra split is in there because any effect due to the light presumably would differ for games played outdoors. I initially didn’t split indoor-night from indoor-day, but the two ways of bucketing yield noticeably different results.
What are these interesting relationships between lighting and the PITCHf/x data? We can start with something simple. For instance, how fast are pitches thrown?
As you can see, the velocity distributions are slightly, but noticeably, different. The spread in means is small (the extremes are that pitches thrown indoors during the day average about 0.5 mph slower than those thrown indoors at night), but given the number of pitches we are looking at, the difference isn’t simply chance. (For the technically inclined, an ANOVA using indicator variables for night and indoors, plus their interaction, yields statistically significant results for the night variable, the dome variable, and the interaction variable.)
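As a concrete illustration of that ANOVA setup, here is a minimal sketch on simulated velocities. The group means below are invented, chosen only to echo the roughly half-mph indoor-day/indoor-night spread described above. Fitting ordinary least squares with indicator columns for night, indoors, and their interaction recovers the four cell means; a full ANOVA would then run F-tests on those same terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean velocities (mph) for the four lighting states, keyed by
# (night, indoor). These numbers are invented for illustration; the indoor
# day/night gap mirrors the ~0.5 mph extreme described in the text.
means = {(0, 0): 91.8, (1, 0): 92.0, (0, 1): 91.6, (1, 1): 92.1}

rows = []
for (night, indoor), mu in means.items():
    n = 5000
    velo = rng.normal(mu, 2.5, size=n)  # simulated pitch velocities
    rows.append(np.column_stack([np.ones(n), np.full(n, night),
                                 np.full(n, indoor),
                                 np.full(n, night * indoor), velo]))
data = np.vstack(rows)
X, y = data[:, :4], data[:, 4]

# OLS with intercept, night, indoor, and interaction indicator columns.
# With the interaction included, the coefficients reproduce the cell means.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["intercept", "night", "indoor", "night_x_indoor"],
               beta.round(2))))
```

The night coefficient is the outdoor day/night gap, the indoor coefficient is the day indoor/outdoor gap, and the interaction captures the extra shift for indoor night games, which is the structure the ANOVA in the text tests for significance.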
A brief, necessary aside: The data used for this article are called pitches (taken balls and called strikes) from 2011 to 2013. Games are tagged as outdoor or indoor using Retrosheet game logs and are identified as night games if they had a listed starting time of 7 p.m. or later.
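That tagging rule can be sketched in a few lines. The helper name and the assumption that start times arrive as 24-hour "HH:MM" strings are mine; the actual Retrosheet game log field needs a little more parsing than this.

```python
def is_night_game(start_time):
    """Tag a game as a night game if its listed local start time is
    7 p.m. (19:00) or later. Assumes a 24-hour "HH:MM" string, which is
    a simplification of the raw Retrosheet game log format."""
    hour = int(start_time.split(":")[0])
    return hour >= 19

print(is_night_game("19:05"), is_night_game("13:10"))
```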
Now, the fact that the difference isn’t chance doesn’t mean we can take it at face value; the small effect could be explained by small selection biases, like different pitcher usage. To attempt to correct for that, I included only pitches thrown by pitchers who threw at least 100 pitches to both lefties and righties in all four lighting states. Next, I computed the average velocity for each lighting state for each pitcher (restricted to fastballs). I then averaged the averages across pitchers, effectively weighting each pitcher the same, and found a comparable spread—about half a mile an hour, though with some noise. (Done more conventionally, an ANOVA using pitcher and the lighting states still found the lighting states to be highly significant.)
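The average-of-averages step above can be sketched like this, using a handful of invented pitch records. Each pitcher’s per-state mean is computed first, so a pitcher who threw 500 pitches in a state counts the same as one who threw 100.

```python
from collections import defaultdict

# Hypothetical (pitcher, lighting_state, velocity) records for illustration.
pitches = [
    ("A", "outdoor-day", 92.0), ("A", "outdoor-day", 94.0),
    ("A", "outdoor-night", 93.0),
    ("B", "outdoor-day", 88.0),
    ("B", "outdoor-night", 89.0), ("B", "outdoor-night", 91.0),
]

# Step 1: mean velocity for each pitcher in each lighting state.
sums = defaultdict(lambda: [0.0, 0])
for pitcher, state, velo in pitches:
    cell = sums[(pitcher, state)]
    cell[0] += velo
    cell[1] += 1
per_pitcher = {key: total / count for key, (total, count) in sums.items()}

# Step 2: average those averages across pitchers, weighting each pitcher
# equally regardless of how many pitches he threw.
by_state = defaultdict(list)
for (pitcher, state), avg in per_pitcher.items():
    by_state[state].append(avg)
avg_of_avgs = {state: sum(v) / len(v) for state, v in by_state.items()}
print(avg_of_avgs)
```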
Obviously, this isn’t earth-shattering by any means. In fact, if we do similar tests for other PITCHf/x variables, other differences emerge in the break and location of pitches. I’m not a physicist, but my sense is that those differences easily can be attributed to variation in temperature, air pressure, etc. But what if we look beyond strictly measurements and get into results? More specifically, what if we look at how the strike zone changes depending on the lighting conditions?
Thankfully, Brian Mills and Carson Sievert have laid out some pretty straightforward ways to compare two different strike zones. (At the very least, they’re as straightforward as anything involving generalized additive models can be. If those sound familiar, they’re the basis for Baseball Prospectus’s framing numbers.) Applying their methods and controlling for pitch location and which side of the plate the batter hits from (though ignoring catcher, umpire, count, year, and many other factors known to influence the zone), we can generate maps showing how the strike zone differs with respect to lighting conditions. For instance, consider the chart below:
This shows the difference between the day strike zone and night strike zone for right-handers in games played outdoors. Blue colors show where pitches are more likely to be called strikes during the day; red colors show where pitches are more likely to be called balls during the day; the scale shows how large a difference there is between the two zones. As you can see, the basic finding is that the strike zone moves down at night (though the effect is certainly not uniform), and for some pitches the change in probability is as large as five percentage points. The next two plots show the differences for all of the different lighting and handedness states.
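A difference map like that one can be produced by evaluating a fitted zone model on a grid of locations under both conditions and subtracting. The sketch below uses a made-up stand-in model (`zone_probability`, with an assumed downward shift of the zone at night) in place of a real GAM fit; only the grid-and-subtract mechanics carry over.

```python
import numpy as np

def zone_probability(px, pz, night):
    """Hypothetical stand-in for a fitted strike-zone model: probability a
    taken pitch at horizontal location px and height pz (feet) is called a
    strike. The zone center drops slightly at night -- an assumption for
    illustration; a real version would come from a GAM fit to the data."""
    center_z = 2.6 - 0.15 * night
    logit = 4 - 3 * (px ** 2 + (pz - center_z) ** 2)
    return 1 / (1 + np.exp(-logit))

# Difference map: P(strike | day) - P(strike | night) over a location grid.
px = np.linspace(-1.5, 1.5, 61)
pz = np.linspace(0.5, 4.5, 81)
PX, PZ = np.meshgrid(px, pz)
diff = zone_probability(PX, PZ, night=0) - zone_probability(PX, PZ, night=1)
# diff > 0: more likely a strike during the day (blue in the chart above);
# diff < 0: more likely a ball during the day (red).
```

With a downward night shift, the map is positive at the top of the zone and negative at the bottom, matching the pattern described in the chart.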
The rest of the eight plots are pretty similar to the first one, though the magnitudes are even larger; the day-night difference is quite noticeable for indoor games, while the indoor-outdoor difference is more noticeable for day games (and shifts the strike zone in the opposite direction).
Now, the fact that these differences are large doesn’t necessarily mean anything when it comes to the truth. There’s always the possibility, even with a large sample size, that the model is capturing noise and portraying it as something meaningful. To assess whether the added complexity of the model is actually adding anything, it makes sense to do some out-of-sample testing. Specifically, I randomly divided the data set in half, used one half to recompute the strike zone model both with and without the lighting terms, then assessed which version of the model predicted the other half of the data better. This is sort of like comparing a projection system to Marcel—if the strike zone prediction can’t noticeably outperform an extraordinarily simple model, it shouldn’t carry any weight.
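Here is a minimal sketch of that out-of-sample test on synthetic data, using plain logistic regression with polynomial location terms as a stand-in for the GAM. The “true” zone below is invented and shifts down at night, so the lighting terms have something real to find; on real data, nothing guarantees they would.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic called pitches: height pz and a night indicator. The true
# called-strike logit is a bump in pz whose center drops at night.
n = 4000
pz = rng.uniform(1.0, 4.0, n)
night = rng.integers(0, 2, n)
center = 2.5 - 0.3 * night                      # assumed night shift
true_logit = 5 - 4 * (pz - center) ** 2
strike = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

def fit_logistic(X, y, iters=25, ridge=1e-3):
    """Logistic regression via Newton's method (stand-in for the GAM fit)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-np.clip(X @ w, -30, 30)))
        H = X.T @ (X * (p * (1 - p))[:, None]) + ridge * np.eye(X.shape[1])
        w += np.linalg.solve(H, X.T @ (y - p))
    return w

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Baseline model: location terms only. Lighting model: adds night terms.
base = np.column_stack([np.ones(n), pz, pz ** 2])
full = np.column_stack([base, night, night * pz])

# Random half-split: fit both models on one half, score the held-out half.
train = rng.random(n) < 0.5
test = ~train
w_base = fit_logistic(base[train], strike[train])
w_full = fit_logistic(full[train], strike[train])
p_base = 1 / (1 + np.exp(-base[test] @ w_base))
p_full = 1 / (1 + np.exp(-full[test] @ w_full))
print(log_loss(strike[test], p_base), log_loss(strike[test], p_full))
```

Because the simulated zone really does move at night, the lighting model scores a lower held-out log loss than the baseline; the article’s test asks exactly this question of the real data.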
When you include all pitches, the models perform almost identically. (The metric I’m using for goodness of fit is the log loss, which is one way of assessing the performance of binary predictions, like balls and strikes. It’s the same metric used by Kaggle to pick the winner of its NCAA tournament contest, for instance.) This isn’t surprising, since a huge number of pitches (those well within the zone or well outside it) are predicted very similarly by the two models. If you start looking at pitches where there is a difference of opinion between the baseline model and the lighting model, though, you see an interesting trend, which is illustrated in the chart below:
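For reference, log loss is just the average negative log-likelihood of the called outcomes; a minimal implementation, with the usual clipping so a confident-but-wrong prediction doesn’t blow up to infinity:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better).
    y_true holds 0/1 labels (ball/strike); p_pred holds predicted strike
    probabilities, clipped away from exactly 0 or 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(log_loss([1, 0], [0.9, 0.1]))  # two well-calibrated predictions
```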
As the gap between the predictions of the two models grows, the lighting model initially performs a bit worse than the baseline. However, as the gap gets even larger (corresponding to a focus on the colored areas on the charts above), the performance of the lighting model becomes substantially better than the baseline model. Though that initial dip suggests there is some overfit in the model, overall it seems to be the case that the lighting terms capture something real about how the strike zone has moved.
Differences around (or even larger than!) 10 percentage points are much larger than what I was expecting, and they’re a bit difficult to wrap my head around. It’s an abstract enough concept that it’s not easy to interpret intuitively. Unlike with framing models, where we talk about the influence a catcher or umpire has on the game, this somehow suggests that (batter behavior being equal, which it almost certainly isn’t) pitches might vary substantially in value based solely on when they’re thrown.
What makes this even more difficult to understand, and makes me skeptical of the results, is that there’s not an obvious reason this should be true. It’s likely these effects are overstated by my numbers, and some of them might wash away if we control for the players in a given at-bat and different park factors, but even assuming a smaller effect size, what might be the cause? I can think of a few possible reasons, any combination of which could account for what’s going on:
- Umpires and/or batters perceive the pitches slightly differently in different lighting conditions. That is, they think pitches are higher or lower than they truly are.
- The PITCHf/x system is misreporting the height of these pitches (presumably because of the lighting conditions), but the umpires see them correctly, so the strike zone only appears to change.
- Weather correlates with the lighting states and influences speed, pitch type, and spin, in turn affecting the strike zone.
- There’s a time-of-day effect at least partially unrelated to lighting. Umpires will be more or less tired or hungry during games played at different times, which affects their mood and also could account for some of what’s going on. (That might sound far-fetched, but given that at least one study has found judges make different decisions before and after lunch, it doesn’t necessarily seem implausible to me.)
I really don’t have a good guess for what’s going on here, which is fairly disconcerting. I don’t know enough about the subtleties of my possible explanations to assess how likely they are, so I’m very open to other suggestions about what’s happening here.
Still, if these effects are real and anywhere near as large as I’ve estimated here, they could have substantial implications. Even if they are only measurement issues, the magnitude of difference they imply suggests they should be included in pitch-framing and other strike zone calculations. If they aren’t just measurement issues, these findings suggest there’s an advantage to be gained by pairing pitchers with the environments that play best to their strengths, depending on where they throw most of their pitches.
A team that plays most of its games indoors and at night might get an advantage from targeting low-ball pitchers, whereas a pitcher who works up in the zone might be better off pitching outdoors during the day. In this way, one can develop a theory for why we might expect some players to develop day/night splits and how large those splits would be. There might yet be something subtle enough that it would escape the method used by Carleton in the article I referenced at the beginning of this piece.
Of course, that is just one possibility, and probably not a very likely one. Assessing what exactly is causing these apparent differences—whether selection effects, weather, lighting, or something psychological—is going to be tricky, and so will coming up with estimates of the effects (if they exist at all). At some point soon I intend to look at whether swing and contact rates also change with the lighting and what that might mean for the different possible causes. Perhaps a different sort of model (like the mixed models Jonathan Judge advocates) would shed some light (pun only partially intended) on what’s happening. In the meantime, I’m curious about the possible explanations I’ve failed to think of. Maybe there’s another chapter to be written in that history of baseball at night.
Author’s note: This research is an expanded version of findings presented at Saberseminar 2014. Comments and questions received there were helpful in shaping the piece. All errors are my own.