Pujols and the home run drought question
by Kevin Lai
May 25, 2011
One of the more intriguing remarks I sometimes hear during a televised game, or read in various news outlets, is that Player X has not hit a home run in some historically large number of at-bats. Are they suggesting to their audiences that he is bound to pop one any minute now, so stay tuned? Or maybe they are really worried about the player's production, and whether it signifies an underlying health or mechanical issue.
The most recent player to be given this type of media treatment is Albert Pujols. Before sending a long ball out of Petco Park this past Monday (May 23) Pujols' last homer came 122 plate appearances earlier, on April 23 against Travis Wood. As you've probably heard, that is the longest power drought Pujols has endured in his entire career.
I wonder, though. Besides having topical story value, is there any real information behind home run waiting periods? After all, it's just another way to chop up data in an attempt to learn more about the natural distribution of player performance. I took a look at Pujols' career home runs from 2001 to 2010 (including playoff appearances) and found the number of plate appearances in between each home run. Its density plot is below in black.
As you can see, the waiting periods are heavily skewed, with most of the periods in the zero to twenty-five plate appearance range. His average period between home runs is about 15 PA. This skewed density plot leads me to speculate that these waiting periods follow the exponential distribution (outlined in red) closely.
Without getting too messy, the exponential distribution approximates the waiting time between rare events. In this case, the rare events are home runs, and time is measured in plate appearances. Other examples of rare events arriving this way (a pattern known as a Poisson process) would be telephone calls arriving at a switchboard, or rainfall in a distinct area.
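To make the idea concrete, here is a rough simulation sketch, not the analysis behind the plot. It assumes every plate appearance carries the same 1-in-15 chance of a home run (taken from the rounded "about 15 PA" average), and the seed and sample size are arbitrary choices of mine. It then checks how often a gap of 30 PA or more shows up against what the exponential curve predicts.

```python
import math
import random

random.seed(2011)      # fixed seed so the run is repeatable
HR_PROB = 1 / 15       # assumed per-PA home run chance (from "about 15 PA")
N_PA = 300_000         # simulated plate appearances (arbitrary choice)

gaps = []              # plate appearances from one home run to the next
since_last = 0
for _ in range(N_PA):
    since_last += 1
    if random.random() < HR_PROB:
        gaps.append(since_last)
        since_last = 0

mean_gap = sum(gaps) / len(gaps)

# Share of simulated gaps lasting 30 PA or more, next to the
# exponential model's tail probability exp(-30 / 15).
observed = sum(g >= 30 for g in gaps) / len(gaps)
exponential = math.exp(-30 * HR_PROB)

print(f"mean gap: {mean_gap:.2f} PA")
print(f"gaps of 30+ PA: simulated {observed:.3f}, exponential {exponential:.3f}")
```

The two tail figures should land close together, which is the sense in which the exponential curve is an approximation to plate-appearance-by-plate-appearance waiting times.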
Modeling waiting periods between home runs like this can confirm what we already know, or have deduced, from this Pujols story: given his historical home run numbers, such a long waiting period was highly unlikely. Assuming this exponential distribution with his career average of about 15 PA between home runs, the probability of Pujols having a drought of 122 PA or longer comes out to about 0.03 percent.
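For anyone who wants to check the arithmetic, the drought probability is the upper tail of the exponential distribution, exp(-x / mean). The snippet below is a minimal sketch that assumes the rounded 15 PA average quoted above; the function name is mine.

```python
import math

MEAN_PA_BETWEEN_HR = 15.0   # rounded career average from the article
DROUGHT_PA = 122            # length of the 2011 drought

def drought_probability(drought_pa, mean_gap):
    """P(gap >= drought_pa) under an exponential model with mean mean_gap."""
    return math.exp(-drought_pa / mean_gap)

p = drought_probability(DROUGHT_PA, MEAN_PA_BETWEEN_HR)
print(f"{p * 100:.3f} percent")
```

With the rounded mean this comes out to roughly three hundredths of one percent, in line with the figure quoted above.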
Signs of something wrong
As I mentioned earlier, maybe this unprecedented drought is a sign of a bigger issue. Rob Neyer suggested the other day that Pujols' power outage coincides with a "mild" hamstring injury that occurred the day after his previous home run.
Pre-injury through April 24: .250/.306/.500
Post-injury since April 24: .286/.372/.327
Pujols went from being Pujols to having power like Ichiro. Neyer (as well as a majority of those voting on his article) believes it is the hamstring injury that is preventing him from hitting the ball with authority.
If you agree with the exponential distribution model, it's telling us that, if Pujols were performing like career-Pujols, it's highly unlikely we would see this result. If we suggest a lower home run rate for 2011-Pujols (due to hamstring issues or free agency hiccups), we're going to see the probability of this 122 PA waiting period increase.
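A quick sketch shows how sensitive the drought probability is to the assumed rate. Only the 15 PA figure comes from his career numbers; the slower rates of 20, 25 and 30 PA per home run are hypothetical values I picked for illustration, not estimates of his 2011 talent.

```python
import math

DROUGHT_PA = 122

def drought_probability(drought_pa, mean_gap):
    """P(gap >= drought_pa) under an exponential model with mean mean_gap."""
    return math.exp(-drought_pa / mean_gap)

# 15 PA is the career average; 20, 25 and 30 are hypothetical slower rates.
mean_gaps = [15, 20, 25, 30]
probabilities = [drought_probability(DROUGHT_PA, m) for m in mean_gaps]

for mean_gap, prob in zip(mean_gaps, probabilities):
    print(f"one HR per {mean_gap} PA -> {prob * 100:.2f} percent")
```

The probability climbs quickly as the assumed rate drops, from a small fraction of a percent at the career rate to more than one percent at one home run per 30 PA.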
Projecting the rest of the season
Preseason-2011 ZiPS projections had Pujols plugged at 37 home runs. Now that almost two months have passed (and we've only seen eight dingers from him), the projection has decreased to 32.
Another way to compile home run total projections is to use the Poisson process model. An assumption I am making is that the number of home runs going forward will be independent of what we've already seen (also known as the memoryless property). This seems intuitive and expected; while historical performance can be indicative of true talent, each plate appearance is a random independent event, like flipping a coin 10,000 times.
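The memoryless property can be checked directly from the exponential tail formula. In this small sketch (assuming the rounded 15 PA mean, with a 20 PA look-ahead window chosen arbitrarily), the chance of going another 20 PA without a home run is the same whether or not the hitter has already gone 122 PA without one.

```python
import math

def survival(t, mean_gap=15.0):
    """P(gap > t) under an exponential model; 15 PA is the rounded career mean."""
    return math.exp(-t / mean_gap)

already_waited = 122   # PA already elapsed without a home run
look_ahead = 20        # arbitrary window for illustration

# P(gap > 122 + 20 | gap > 122) versus P(gap > 20)
conditional = survival(already_waited + look_ahead) / survival(already_waited)
unconditional = survival(look_ahead)

print(f"conditional: {conditional:.4f}, unconditional: {unconditional:.4f}")
```

Under the model, a long drought says nothing about when the next home run arrives, which is why the rest-of-season projection can ignore what has already happened.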
With this in mind, if we assume Pujols will have about 463 plate appearances left in the season (his average PA per season stands at 678, and he's had 215 so far), we can use a density plot to see how likely each rest-of-season home run total is. Using this Poisson model, we can expect 31 more home runs (rounded from 30.7) from Pujols, a substantially higher projection than ZiPS' 24 (its updated total of 32 minus the eight he has already hit). It seems this model is bullish on Pujols compared to most sophisticated projection systems out there. Despite the higher home run total from this model, its full-season projection has decreased since the season began (the preseason projection would have been 46 home runs using this model, while now it is 39).
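Here is a minimal sketch of the rest-of-season calculation, using only the standard library. It assumes the rounded 15 PA average, so its expected total lands slightly above the 30.7 quoted above (which presumably comes from the unrounded career rate). The helper name and the choice to measure the chance of 24 or fewer home runs are mine.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

AVG_SEASON_PA = 678         # career average PA per season (from the article)
PA_SO_FAR = 215             # PA to date in 2011 (from the article)
MEAN_PA_BETWEEN_HR = 15.0   # rounded career average; an approximation

remaining_pa = AVG_SEASON_PA - PA_SO_FAR
lam = remaining_pa / MEAN_PA_BETWEEN_HR   # expected rest-of-season home runs

# Chance the model gives to a rest-of-season total at or below ZiPS' 24.
prob_at_most_24 = sum(poisson_pmf(k, lam) for k in range(25))

print(f"remaining PA: {remaining_pa}")
print(f"expected rest-of-season HR: {lam:.1f}")
print(f"P(24 or fewer): {prob_at_most_24:.3f}")
```

The last figure puts a number on how bullish the model is: it is the probability the Poisson model assigns to Pujols finishing at or below the ZiPS rest-of-season total.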
More or less, modeling plate appearances between home runs can help confirm our judgments on a player's performance drought. Is something bothering the player, have his talents regressed, or is it some sort of highly unlikely, random event? This idea can be applied to other baseball events, such as the time between stolen bases (although the time increment would be a little tough to define; maybe by base appearance?).
At this point for Pujols, there are only eight data points that I have for this season's home run waiting periods, which is not enough to come up with any conclusive evidence that his rate of home runs per PA has effectively changed. But given his career rate, what we can infer is the absolute extremity of this power drought.
References and Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.