The long and the short of plate appearances
by Sal Baxamusa
August 06, 2007
As you may have guessed by now, my primary interests in baseball analysis have to do with sequences and distributions. Sabermetrics is usually interested in averages, and that interest has served baseball fans very well. But if the next great advances in analysis will bring about a convergence of traditional sabermetrics and traditional scouting, then we'll have to look beyond the averages and instead examine the sequence and distribution of events that underlie those averages.
That's why I love the nascent research using MLB's enhanced gameday, which allows us to see how pitchers distribute the location of their pitches. That's why I like to look at run distribution, performance distribution, and pitch sequences. And that's why this week we're going to look at plate appearances through the lens of distributions.
A few words of caution: despite the above paean to distributions and sequences, this article is strictly for fun. I don't know that the following is anything but interesting, and I certainly won't pretend that it is a profound analytical result.
Let's start with the averages and then take a deeper look.
Tangotiger has done work with estimating pitch counts, and has a basic pitch count estimator equation:
Pitches = 3.3 * PA + 1.5 * SO + 2.2 * BB
If we look at pitch-by-pitch data for 2006 (courtesy Retrosheet), the data match the estimator quite well. Last year, the average walk required 5.7 pitches (Tango's equation says 3.3 + 2.2 = 5.5; quite close), the average strikeout required 4.8 pitches (Tango: 3.3 + 1.5 = 4.8; spot on), and all other plate appearances lasted 3.3 pitches (again, spot on with Tango's equation). The average plate appearance lasted 3.8 pitches; the estimator says 3.75.
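For readers who want to play with the estimator, here is a minimal Python sketch of it. The function name `estimate_pitches` and its argument names are my own choices for illustration; only the three coefficients come from Tango's equation above.

```python
def estimate_pitches(pa, so, bb):
    """Tangotiger's basic estimator: 3.3 pitches per plate appearance,
    plus 1.5 extra per strikeout and 2.2 extra per walk."""
    return 3.3 * pa + 1.5 * so + 2.2 * bb


# Per-event estimates for a single plate appearance.
walk = estimate_pitches(pa=1, so=0, bb=1)
strikeout = estimate_pitches(pa=1, so=1, bb=0)
other = estimate_pitches(pa=1, so=0, bb=0)

print(f"walk: {walk:.1f}, strikeout: {strikeout:.1f}, other: {other:.1f}")
# prints "walk: 5.5, strikeout: 4.8, other: 3.3"
```

Passing a pitcher's season totals for plate appearances, strikeouts, and unintentional walks gives an estimated season pitch count in the same way.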
Does the number of pitches depend on the result of the ball in play? On average, it turns out that the answer is no. Balls in play that resulted in outs took 3.3 pitches, and those that resulted in hits also took 3.3 pitches. It doesn't depend on type of hit, either. Singles took 3.3 pitches; doubles, 3.3 pitches; triples, 3.5 pitches (in a smaller sample); home runs, 3.3 pitches.
Now let's take a look at the distribution of pitches per plate appearance. There are a few ways to look at this. The first graph shows how frequent plate appearances are by their pitch count.
Four pitches is the most common pitch count, followed closely by three and then two. Plate appearances lasting two, three, and four pitches make up around half of all plate appearances.
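A frequency distribution like this one is simple to tabulate once you have the pitch count of every plate appearance. The sketch below assumes the counts have already been extracted into a plain list; the function name `pitch_count_shares` and the sample numbers are made up for illustration and are not the 2006 data.

```python
from collections import Counter


def pitch_count_shares(pitch_counts):
    """Return {pitch_count: share of plate appearances}, ordered by pitch count."""
    counts = Counter(pitch_counts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {n: counts[n] / total for n in sorted(counts)}


# Made-up sample, NOT real data: one entry per plate appearance.
sample = [1, 2, 2, 3, 3, 4, 4, 4, 5, 6]
shares = pitch_count_shares(sample)

for n, share in shares.items():
    print(f"{n} pitches: {share:.0%}")
```

Summing the shares for two, three, and four pitches is how you would check a claim like "around half of all plate appearances" against real data.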
The next plot shows the distribution of the length of plate appearances broken down by result.
Naturally, for the first two pitches, the result is either a hit or an out. At three pitches, strikeouts first start to appear, and they peak at the fourth and fifth pitch. One thing that distributions tell us is how common a particular occurrence is. For example, the dominant "good morning, good evening, good night" three-pitch strikeout accounts for 17.5% of all strikeouts. So even if a game has only six strikeouts, you would still expect to see about one batter go down on only three pitches.
What about walks? Walks, of course, first appear at four pitches, and peak at five. Four-pitch walks (not including the intentional variety, which I excluded from the data set) account for about 1 in 5 of all free passes. Even if a game features only five walks, you would expect one of those to include a pitcher missing the zone on four straight.
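The "you would expect one" arithmetic for three-pitch strikeouts and four-pitch walks is just the number of events times the share. Here is a minimal sketch, with function names of my own invention. The second function adds a rough "at least one" probability, which rests on an assumption that is mine and not tested here: that each strikeout or walk independently has the same chance of being the short variety.

```python
def expected_count(n_events, share):
    """Expected number of events of a given type among n_events."""
    return n_events * share


def prob_at_least_one(n_events, share):
    """Chance of seeing at least one, ASSUMING events are independent
    and each has the same probability `share`."""
    return 1 - (1 - share) ** n_events


# Shares taken from the text: 17.5% of strikeouts take three pitches,
# and about 1 in 5 unintentional walks take four pitches.
k_expected = expected_count(6, 0.175)
k_any = prob_at_least_one(6, 0.175)
bb_expected = expected_count(5, 0.20)
bb_any = prob_at_least_one(5, 0.20)

print(f"three-pitch strikeouts in 6 K: expect {k_expected:.2f}, P(any) = {k_any:.2f}")
print(f"four-pitch walks in 5 BB: expect {bb_expected:.2f}, P(any) = {bb_any:.2f}")
```

Under that independence assumption, "expect one" means an average of about one per such game, with roughly a two-in-three chance of seeing at least one in any particular game.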
We already saw that the average number of pitches doesn't depend on the result of the ball in play. It turns out that the distribution doesn't either:
The shapes of the distributions of the various types of hits as well as outs on balls in play are essentially the same. About 15% of home runs occur on the fourth pitch of the plate appearance, but the same is true for singles, doubles, triples, and outs on balls in play.
You can see this even more clearly on what is called a stacked bar chart, where the contributions of each type of ball in play are shown as a function of the number of pitches:
When the ball is put in play on the first pitch, singles account for about 20% of balls in play and outs for about 70%. Doubles (about 6.5%) and home runs (about 3.5%) basically account for the rest. Those numbers are more or less the same when the ball is put in play on the second pitch, or third pitch, or...well, you get the idea. This is true until around the eighth or ninth pitch, when the data start to look a little noisy. There may be something we can tease out there, but we won't attempt that today. Our general observation is that as the plate appearance wears on, a ball in play is no more or less likely to be a home run as opposed to, say, an out.
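The numbers behind a stacked bar chart of this kind are the shares of each result among balls in play at a given pitch count. The sketch below shows one way to compute them, assuming each ball in play has been reduced to a (pitch count, result) pair. The function name, the result labels, and the toy events are all hypothetical and are not drawn from the Retrosheet data.

```python
from collections import Counter, defaultdict


def result_shares_by_pitch(events):
    """events: iterable of (pitch_count, result) pairs for balls in play.
    Returns {pitch_count: {result: share of balls in play at that count}}."""
    tallies = defaultdict(Counter)
    for pitch_count, result in events:
        tallies[pitch_count][result] += 1

    shares = {}
    for pitch_count in sorted(tallies):
        total = sum(tallies[pitch_count].values())
        shares[pitch_count] = {
            result: count / total
            for result, count in tallies[pitch_count].items()
        }
    return shares


# Toy events, NOT real data.
toy_events = [
    (1, "out"), (1, "out"), (1, "single"), (1, "out"),
    (2, "out"), (2, "double"), (2, "out"), (2, "single"),
]
toy_shares = result_shares_by_pitch(toy_events)

for pitch_count, by_result in toy_shares.items():
    print(pitch_count, by_result)
```

If the result of a ball in play really does not depend on the length of the plate appearance, the inner dictionaries should look about the same from one pitch count to the next, up to the noise at higher counts.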
I hope this gave you a flavor of what distributions can tell us. Events that may seem infrequent, such as four-pitch walks or three-pitch punchouts, are more common than I would have guessed. Even when the distribution doesn't tell us anything, it tells us something: that the result of a ball in play is not a function of the length of a plate appearance.
Sal Baxamusa is a graduate student in chemical engineering. He can be reached here.