Can’t find the strike zone?
by Sal Baxamusa
July 02, 2007
In computer parlance, one might consider the plate appearance to be the "bit" by which we store and communicate information about baseball. Data parsed by plate appearance is at the core of WPA and sophisticated defensive metrics such as PMR or UZR. It gives us much more information than gross averages, and there's no reason not to use these data now that they are digitally available at sites such as Baseball-Reference.com or Retrosheet.
Changing bit by bit
Recently, however, the availability of information is changing the sabermetric bit from plate appearances to pitches. And why not? Pitch-by-pitch data are available through MLB.com's Enhanced Gameday, and researchers like Dan Fox, John Walsh, John Beamer, and Joe P. Sheehan are taking advantage by bringing us tons of interesting analyses. As Tangotiger wrote on his blog, "[This] is the cutting edge of sabermetrics, the point where performance and scouting converge." Tangotiger was talking about tracking pitch movement using MLB Enhanced Gameday, but I believe there is a set of parallel analyses that merit exploration as well.
Using data available from Retrosheet, I started futzing around with pitch sequences to see if I could find "momentum" in plate appearances. We all know that hitters tend to do better when swinging at a 2-0 pitch than when swinging at an 0-2 pitch. But it turns out that there is some directionality involved as well. I found that under certain conditions, the outcome of a plate appearance is a function not only of the count but also of how the plate appearance arrived at that count. In particular, I found that the outcome of a ball in play on a 1-1 pitch is affected by whether the sequence of pitches was strike-ball or ball-strike. I looked at full counts through the same lens as well.
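To make "directionality" concrete, here is a minimal Python sketch (not the author's actual query) that labels how a plate appearance reached a 1-1 count. It assumes each pitch sequence has already been reduced to single-letter codes: B for a ball, C for a called strike, S for a swinging strike, F for a foul and X for a ball put in play. The function names are hypothetical.

```python
# Hypothetical helpers; assume cleaned pitch strings using
# B = ball, C = called strike, S = swinging strike, F = foul, X = in play.
STRIKES = set("CSF")  # with fewer than two strikes, a foul counts as a strike


def path_to_one_one(seq):
    """Return 'strike-ball' or 'ball-strike' if the first two pitches
    produced a 1-1 count, otherwise None."""
    if len(seq) < 2:
        return None
    first, second = seq[0], seq[1]
    if first in STRIKES and second == "B":
        return "strike-ball"
    if first == "B" and second in STRIKES:
        return "ball-strike"
    return None


def in_play_on_one_one(seq):
    """True if the third pitch (the 1-1 pitch) was put in play."""
    return path_to_one_one(seq) is not None and len(seq) >= 3 and seq[2] == "X"


print(path_to_one_one("CBX"), path_to_one_one("BFX"))  # strike-ball ball-strike
print(in_play_on_one_one("SBX"), in_play_on_one_one("BBX"))  # True False
```

Comparing balls in play on 1-1 then comes down to tallying outcomes separately for each label.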
My analyses were limited and time-consuming because of my nonexistent databasing chops, and the most common comment I received was, "Learn to use Access/MySQL."
Well, I finally got around to it, and pitch sequence analysis has been greatly facilitated. Today I want to spend some time dissecting 3-0 counts. What becomes of plate appearances in which the pitcher is unable to find the strike zone?
According to Retrosheet's data from 2006, 8,049 major league plate appearances began with the pitcher throwing three straight balls (not including intentional walks, of course). The list of hitters who ran 3-0 counts most frequently is, with a few exceptions, predictably filled with hitters known for a good plate approach.
Hitters who saw the most 3-0 counts
Hitter            3-0 counts
Brian Giles       53
David Ortiz       51
Adam Dunn         48
Miguel Cabrera    46
Carlos Beltran    44
Joe Mauer         44
Kevin Youkilis    44
Bobby Abreu       43
Andruw Jones      43
Felipe Lopez      43
Mark Teixeira     43
A good number of these 3-0 counts were no doubt part of "unintentional intentional walks," while others were simply the result of patient hitters with good strike zone judgement or pitchers with poor control.
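A filter like the one behind the 3-0 counts above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's database query: the set of pitch codes reflects our reading of Retrosheet's pitch-sequence conventions (non-pitch markers such as pickoff throws and runner symbols are discarded), only a plain B counts as a ball, and any plate appearance containing an intentional ball (I) is dropped. The input format, a list of (batter, raw pitch string) pairs, is hypothetical.

```python
from collections import Counter

# Assumed set of single-letter pitch codes; anything else in the raw string
# (pickoff throws, runner markers, play markers) is treated as a non-pitch.
PITCH_CODES = set("BCFHIKLMOPQRSTUVXY")


def clean_sequence(raw):
    """Keep only pitch letters from a raw Retrosheet-style pitch string."""
    return "".join(ch for ch in raw if ch in PITCH_CODES)


def starts_three_oh(raw):
    """True if the first three pitches were balls and the plate appearance
    contained no intentional ball ('I')."""
    seq = clean_sequence(raw)
    return seq.startswith("BBB") and "I" not in seq


def most_three_oh(pas, n=10):
    """pas: iterable of (batter, raw_pitch_string) pairs.
    Returns the n batters with the most plate appearances that began 3-0."""
    counts = Counter(batter for batter, raw in pas if starts_three_oh(raw))
    return counts.most_common(n)


toy = [("A", "BBBCX"), ("A", "B1BB>B"), ("B", "IIII"), ("B", "CBBBX"), ("C", "BB.BB")]
print(most_three_oh(toy))  # [('A', 2), ('C', 1)]
```

The toy data are made up purely to show the mechanics; the real input would be every 2006 plate appearance.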
What happened on the fourth pitch?
Result of fourth pitch for plate appearances that start 3-0
Result            Share
Called strike     59.6%
Ball four         33.8%
In play            3.1%
Foul ball          2.5%
Swinging strike    0.8%
Unsurprisingly, hitters rarely offered at the 3-0 pitch. The reason is simple: one time out of three, the fourth pitch was a ball and the hitter jogged to first. Among those who swung at the fourth pitch, roughly half put the ball in play and the others either fouled it off or missed altogether.
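Both the fourth-pitch table and the "roughly half" figure can be derived from cleaned pitch strings. The sketch below assumes single-letter codes (B ball, C called strike, S swinging strike, F foul, X in play; any other code simply appears under its own letter) and uses hypothetical function names. The last lines plug in the published percentages.

```python
from collections import Counter

# Labels for the codes that appear in the fourth-pitch table.
LABELS = {"C": "Called strike", "B": "Ball four", "X": "In play",
          "F": "Foul ball", "S": "Swinging strike"}


def fourth_pitch_shares(sequences):
    """sequences: cleaned pitch strings for plate appearances that began BBB.
    Returns {label: share of plate appearances} for the fourth pitch."""
    fourth = [s[3] for s in sequences if s.startswith("BBB") and len(s) >= 4]
    counts = Counter(fourth)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {LABELS.get(code, code): n / total for code, n in counts.items()}


def in_play_share_of_swings(shares):
    """Fraction of swings (in play, foul, or swing and miss) put in play."""
    swings = shares["In play"] + shares["Foul ball"] + shares["Swinging strike"]
    return shares["In play"] / swings


published = {"In play": 3.1, "Foul ball": 2.5, "Swinging strike": 0.8}
print(round(in_play_share_of_swings(published), 3))  # 0.484
```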
One often-deceiving statistic is a hitter's batting line when putting the first pitch in play: it usually shows a ridiculously high average (Derek Jeter, for example, hits .385 when he swings at the first pitch). Here is a similarly deceiving statistic: on the rare occasions that hitters put the 3-0 pitch into play, their AVG/SLG was .373/.578. Goodness gracious, why doesn't everybody swing at the 3-0 pitch?
The explanation, of course, is part of Baseball 101. Simply by doing nothing, the batter can expect to draw a base on balls about one-third of the time. The worst that can happen is a called strike, which isn't so terrible considering that the count is still in the hitter's favor. While swinging away leads to some success, the risk of making an out jumps from zero to somewhere north of 60%. Hitters ought to (and occasionally do) game pitchers by hacking at the 3-0 offering, but by and large hitters do, and ought to, keep their bats on their shoulders. The following hitters were the most trigger-happy on 3-0 pitches.
Hitters who swung most often at 3-0 pitches
Hitter            Results
Brian Giles       1-6, 1B, GIDP
Brad Ausmus       0-5, GIDP
Morgan Ensberg    3-5, 3B, HR
Raul Ibanez       0-4, SF
Victor Martinez   2-5, HR
Lance Berkman also put four 3-0 pitches into play; I wonder if this is something preached by Astros hitting coach Sean Berry.
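The take-versus-swing trade-off described above reduces to simple arithmetic on the article's own figures. This sketch conditions on the hitter taking the 3-0 pitch (so only ball four and called strike count) and, as a simplification, treats every non-hit on a ball in play as an out, ignoring errors and sacrifice flies.

```python
# Fourth-pitch shares for plate appearances that started 3-0 (percent).
ball_four, called_strike = 33.8, 59.6

# If the hitter takes, the pitch is either ball four or a called strike.
p_walk_if_taken = ball_four / (ball_four + called_strike)

# AVG on 3-0 balls in play; every non-hit is treated as an out (a simplification).
avg_in_play = 0.373
p_out_if_in_play = 1 - avg_in_play

print(f"walk if taken:      {p_walk_if_taken:.3f}")   # 0.362
print(f"out if put in play: {p_out_if_in_play:.3f}")  # 0.627
```

Taking the pitch can never produce an out on that pitch, which is why an out rate above 60% on balls in play is the relevant cost of swinging.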
(For those concerned about selection bias, the group of hitters who received four-pitch walks had an aggregate seasonal batting line of .272/.346/.446 compared to .270/.340/.437 for hitters who took a called strike on 3-0 counts. The pitchers in the first group had a 4.80 ERA and the pitchers in the second group had a 4.90 ERA. So while there is some evidence that "fourth pitch ball" tends to be the domain of better hitters and poorer pitchers than "fourth pitch called strike," the difference is only 15 points of OPS for the hitters and one-tenth of a run per nine innings for the pitchers.)
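The "15 points of OPS" in the note above is simply OBP plus SLG for each group, differenced. A quick check using the quoted lines:

```python
# (AVG, OBP, SLG) season lines quoted above for the two groups.
ball_four_group = (0.272, 0.346, 0.446)
called_strike_group = (0.270, 0.340, 0.437)


def ops(line):
    """On-base plus slugging from an (AVG, OBP, SLG) tuple."""
    _, obp, slg = line
    return obp + slg


diff = ops(ball_four_group) - ops(called_strike_group)
print(f"{diff:.3f}")  # 0.015, i.e. 15 points of OPS
```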
Not over yet
Based on the above information, over 60% of plate appearances that start 3-0 continued past the fourth pitch. What happened on subsequent pitches?
With the count at 3-1 following ball-ball-ball-strike, a quarter of the 3-1 pitches were put into play.
Result of 3-1 pitch in play after count started 3-0
Sequence   AVG/SLG     N
BBBCX      .358/.496   1280
BBBFX      .340/.510     47
BBBSX      .272/.272     12
C = called strike, S = swinging strike, F = foul, X = in play, N = sample size
A good benchmark with which to compare these results is the outcome of all 3-1 pitches put into play, regardless of the pitch sequence leading up to the 3-1 count. Balls in play on any 3-1 pitch were hit for AVG/SLG of .351/.486 (N=4362).
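A per-sequence table like the one above, plus a significance check, can be sketched as follows. The article does not say which test was used. The two-proportion z-test here is one common choice, and it applies to AVG (hits per at-bat). SLG is not a proportion, so comparing it would need a different method, such as a bootstrap. Outcome labels and function names are hypothetical, and sacrifices are assumed to have been removed already.

```python
import math
from collections import defaultdict

TOTAL_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4, "OUT": 0}


def line_by_sequence(pas, sequences=("BBBCX", "BBBFX", "BBBSX")):
    """pas: iterable of (clean_pitch_sequence, outcome) pairs, with outcome
    one of '1B', '2B', '3B', 'HR', 'OUT'. Returns {sequence: (AVG, SLG, N)}."""
    groups = defaultdict(list)
    for seq, outcome in pas:
        if seq in sequences:
            groups[seq].append(outcome)
    table = {}
    for seq, outcomes in groups.items():
        n = len(outcomes)
        hits = sum(o != "OUT" for o in outcomes)
        bases = sum(TOTAL_BASES[o] for o in outcomes)
        table[seq] = (hits / n, bases / n, n)
    return table


def two_proportion_z(hits1, n1, hits2, n2):
    """Two-sided two-proportion z-test on hit rates; returns (z, p_value).
    Undefined (division by zero) if every at-bat, or none, was a hit."""
    pooled = (hits1 + hits2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (hits1 / n1 - hits2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


toy = [("BBBCX", "1B"), ("BBBCX", "OUT"), ("BBBFX", "HR"), ("CBBBX", "2B")]
print(line_by_sequence(toy))  # {'BBBCX': (0.5, 0.5, 2), 'BBBFX': (1.0, 4.0, 1)}
```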
I would not draw any conclusions from the BBBSX data given the extremely small sample size, but tests for statistical significance show that the differences between the BBBFX and BBBCX sequences are indeed real. Why would hitters hit for more power after fouling off a 3-0 pitch than after taking the 3-0 offering for a strike?
The statistical answer: selective sampling. It turns out that the hitters who elected to take the 3-0 pitch were, in aggregate, .270/.338/.437 (AVG/OBP/SLG) hitters, and those who elected to swing at (and subsequently foul off) the 3-0 pitch were .278/.349/.475 hitters. The quality of the pitchers was different, too: a 4.80 ERA versus a 5.08 ERA, respectively. Additionally, the hitters who swung and put the 3-0 pitch into play were collectively .275/.352/.471 hitters. It's not that the pitch sequence made them better hitters; it's that better hitters had different pitch sequences.
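One way to compute a group's "aggregate" line is to weight each hitter's season totals by how many times he appears in the group, then apply the usual AVG, OBP and SLG formulas to the summed counts. That weighting is an assumption, since the article does not describe its method, and the input structures below are hypothetical.

```python
from collections import Counter


def aggregate_line(group_batters, season):
    """group_batters: one batter ID per plate appearance in the group.
    season: {batter: dict of season totals keyed H, AB, BB, HBP, SF, TB}.
    Returns (AVG, OBP, SLG) from PA-weighted sums of the season totals."""
    weights = Counter(group_batters)
    tot = Counter()
    for batter, w in weights.items():
        for stat, value in season[batter].items():
            tot[stat] += w * value
    avg = tot["H"] / tot["AB"]
    obp = (tot["H"] + tot["BB"] + tot["HBP"]) / (
        tot["AB"] + tot["BB"] + tot["HBP"] + tot["SF"])
    slg = tot["TB"] / tot["AB"]
    return round(avg, 3), round(obp, 3), round(slg, 3)
```

Under this weighting, a hitter who swings on 3-0 often pulls the swinging group's aggregate toward his own season line, which is the selection effect described above.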
The baseball interpretation is that hitters who swing on 3-0 are typically better-than-average hitters, particularly in the power department. And of course, this makes sense: a hitter who knows he can hit for average and power is more likely to put the ball into play rather than hope for the walk. (I remember being realistic about my abilities as a Little Leaguer. I could put bat on ball, but I couldn't get the ball out of the infield. If I was anywhere near a walk, I was taking all the way.)
Unfortunately, we don't learn anything about directionality: there is no readily apparent difference between hitting a 3-1 pitch after three straight balls and hitting a 3-1 pitch in general. This is disappointing; I set out looking for evidence of momentum in plate appearances and found it on the 1-1 pitch, but I haven't been able to find it anyplace since. Still, looking at pitch-by-pitch data gives us information on the proclivities of hitters in certain counts, presented here as league aggregates. The vast majority of the time, hitters take the 3-0 pitch. Hitters who do occasionally swing tend to be better hitters overall, so pitchers ought to be careful when they put a fastball down the middle of the plate to a power hitter on a 3-0 count. He just may be swinging.
There's no reason why the same information couldn't be used to learn about individual players. And what do we call judgements on the proclivities of individual players? Scouting. Stat wonk sabermetric objective analysis is fast converging with sun-drenched old-school experiential scouting. It's a good time to be a baseball fan.
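Turning the league aggregate into a per-player note is mostly a matter of grouping by batter. This sketch assumes cleaned pitch strings and treats S, F, T (foul tip) and X as swings. The `min_counts` threshold is an arbitrary, hypothetical guard against tiny samples.

```python
from collections import defaultdict

SWINGS = set("SFTX")  # swinging strike, foul, foul tip, ball in play


def three_oh_swing_rates(pas, min_counts=10):
    """pas: iterable of (batter, clean_pitch_sequence) pairs.
    Returns {batter: share of 3-0 pitches swung at} for hitters with at
    least min_counts plate appearances that started BBB."""
    seen = defaultdict(int)
    swung = defaultdict(int)
    for batter, seq in pas:
        if seq.startswith("BBB") and len(seq) >= 4:
            seen[batter] += 1
            swung[batter] += seq[3] in SWINGS
    return {b: swung[b] / seen[b] for b in seen if seen[b] >= min_counts}
```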
References and Resources
As always, Retrosheet makes this all possible. The blurb: The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org". And you should all be interested.
Sal Baxamusa is a graduate student in chemical engineering.