One reason I love baseball is that every so often you come across an astonishing factoid about some player that sets your mind abuzz with questions that you want, nay, need, answered. Not you? Right, I’ll check in with the men in white coats later, but right now your favorite baseball site needs me to finish and submit this article.

###### Factoid #1

A few weeks ago I was gabbing with a journalist buddy at *USA Today*, to-ing and fro-ing over every baseball debate under the sun, when he slipped in this gem: “Did you know,” he asked, “that in 2005 Placido Polanco (now of the Detroit Tigers, but he also played for the Phillies that year) notched up a paltry 25 strikeouts in 501 at-bats?” To save you from whipping out the calculator, that is a strikeout/at-bat rate a shade under 5%!

I knew Polanco was stingy on strikeouts but I never knew how stingy. To put that into perspective, in 2005 Polanco struck out once every five games, or a little over once a week. Wow. How impressive is that?

Great question, if I do say so myself.

###### So, How Impressive is That?

Fortunately, with the plethora of baseball resources available on the Internet, it isn’t too difficult to check. Using the Lahman database it is easy enough to see how Polanco’s 2005 strikeout/at-bat rate compared to that of other hitters.

Strikeouts are a lot more common nowadays than they were in the past, so for the time being let’s restrict our data to the era of six-division play (actually I include 1993 too, as it makes the data slightly more interesting). Failure to do this means we’d get legends like Al Spalding heading our list. Al, for those who don’t know, recorded zero strikeouts in 384 at-bats in 1874, which was a massive improvement on his troublesome 1873 season, when he struck out once in 322 at-bats. Later on in the analysis we’ll come back and adjust for era.

| Full name | Year | AB | K/AB |
|---|---|---|---|
| Tony Gwynn | 1995 | 535 | 2.8% |
| Felix Fermin | 1993 | 480 | 2.9% |
| Ozzie Smith | 1993 | 545 | 3.3% |
| Tony Gwynn | 1999 | 411 | 3.4% |
| Tony Gwynn | 1996 | 451 | 3.8% |
| Tony Gwynn | 1993 | 489 | 3.9% |
| Tony Gwynn | 1998 | 461 | 3.9% |
| Tony Gwynn | 1994 | 419 | 4.5% |
| Juan Pierre | 2001 | 617 | 4.7% |
| Tony Gwynn | 1997 | 592 | 4.7% |
| Ozzie Guillen | 1997 | 490 | 4.9% |
| Placido Polanco | 2005 | 501 | 5.0% |
| Lance Johnson | 1995 | 607 | 5.1% |
| Juan Pierre | 2004 | 678 | 5.2% |
| Gregg Jefferies | 1996 | 404 | 5.2% |
| Juan Pierre | 2003 | 668 | 5.2% |
| Gary DiSarcina | 1997 | 549 | 5.3% |
| Jason Kendall | 2002 | 545 | 5.3% |
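For anyone who wants to reproduce a list like this, here is a minimal sketch of the filtering step. It assumes the rows look like the Lahman `Batting` table (columns `playerID`, `yearID`, `AB`, `SO`), where a player traded mid-season has one row per stint, so stints are summed before the at-bat cutoff is applied. The function name, the player IDs and the split of Polanco's 2005 line across two rows are illustrative assumptions; only the season totals come from the figures above.

```python
from collections import defaultdict

def season_k_rates(batting_rows, min_year=1993, min_ab=400):
    """Return (playerID, yearID, AB, K/AB) tuples, lowest strikeout rate first."""
    totals = defaultdict(lambda: [0, 0])  # (playerID, yearID) -> [AB, SO]
    for row in batting_rows:
        if row["yearID"] < min_year:
            continue  # drop seasons before the chosen era
        key = (row["playerID"], row["yearID"])
        totals[key][0] += row["AB"]  # sum at-bats across stints
        totals[key][1] += row["SO"]  # sum strikeouts across stints
    seasons = [
        (player, year, ab, so / ab)
        for (player, year), (ab, so) in totals.items()
        if ab >= min_ab
    ]
    return sorted(seasons, key=lambda s: s[3])

# Illustrative rows: the season totals match the article, but the IDs and the
# two-row split of Polanco's 2005 season are assumptions for the example.
sample_rows = [
    {"playerID": "gwynnto01", "yearID": 1995, "AB": 535, "SO": 15},
    {"playerID": "polanpl01", "yearID": 2005, "AB": 158, "SO": 9},
    {"playerID": "polanpl01", "yearID": 2005, "AB": 343, "SO": 16},
    {"playerID": "spaldal01", "yearID": 1874, "AB": 384, "SO": 0},  # filtered out by era
]

for player, year, ab, rate in season_k_rates(sample_rows):
    print(f"{player} {year} {ab:>4} {rate:.1%}")
```

Summing stints first matters: without it, a split season like Polanco's 2005 would fall below the at-bat cutoff and vanish from the list.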

Yikes, that has whacked my baseball nose decidedly out of kilter. Polanco doesn’t even crack the top 10 (he’s a cool 12th), but look at Tony Gwynn. Gwynn had a reputation for making contact and, let’s face it, he built a first-ballot Hall of Fame career on it, but the consistency with which he appears in the top 10 is nothing short of astonishing. Since 1993 Gwynn claims seven of the top 10 spots, out of 2,500 player seasons (with more than 400 AB).

###### Factoids #2 and #3

Bored? Here is another factoid: Did you know that Gwynn only had one game in his career where he struck out three times? Yes, that’s one out of 2,440 games that he played. Incredible.

In 1995, his finest season, he struck out a paltry 15 times in 535 at-bats. That is one strikeout every 10 days. If you needed further convincing as to how good Gwynn’s career was, consider the following table that shows career strikeout/at-bat stats for seasons from 1993. I’ve restricted the data to 2,000 or more career at-bats, which is about four seasons’ worth of fullish playing time.

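| Name | AB (since 1993) | K/AB |
|---|---|---|
| Gwynn | 3664 | 4.3% |
| Jefferies | 3203 | 5.8% |
| Pierre | 4110 | 6.1% |
| Johnson | 3320 | 6.2% |
| Vina | 4240 | 6.9% |
| Guillen | 2845 | 7.1% |
| Polanco | 3726 | 7.3% |
| Young | 5987 | 7.6% |
| Lo Duca | 3274 | 7.7% |
| DiSarcina | 3112 | 7.8% |
| Grace | 5258 | 7.9% |
| Eckstein | 3338 | 8.5% |
| Kendall | 5759 | 8.6% |
| Hall | 2107 | 8.8% |
| Cora | 3024 | 8.8% |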

As expected, Gwynn comes out comfortably on top with a strikeout rate that is 1.5 percentage points better than that of his nearest challenger, Gregg Jefferies. This is also one of those few categories that Juan Pierre dominates. Who’d have thought it, a list with Tony Gwynn and Juan Pierre in the top five … that’s a third factoid for you.

###### The Other Side of the Coin

Shooting down the career top 10 it is clear that these guys aren’t prolific sluggers. The old adage that good contact hitters strike out less appears to be true. What about the hitters at the other end of the list? Are these guys champion powerhouses?

| Name | K/AB | BA | SLG | ISO |
|---|---|---|---|---|
| Bellhorn | 34.3% | 0.231 | 0.396 | 0.165 |
| Dunn | 32.7% | 0.245 | 0.513 | 0.267 |
| Hernandez | 30.1% | 0.254 | 0.422 | 0.168 |
| Thome | 30.0% | 0.284 | 0.573 | 0.289 |
| Wilkerson | 29.9% | 0.252 | 0.448 | 0.197 |
| McGwire | 29.3% | 0.279 | 0.675 | 0.397 |
| LaRue | 29.0% | 0.239 | 0.415 | 0.176 |
| Burrell | 28.8% | 0.258 | 0.479 | 0.221 |
| Lankford | 28.3% | 0.271 | 0.488 | 0.217 |
| Wilson | 27.9% | 0.266 | 0.461 | 0.195 |
| Abbott | 27.9% | 0.256 | 0.423 | 0.167 |
| Canseco | 27.7% | 0.265 | 0.519 | 0.253 |
| Becker | 27.7% | 0.256 | 0.372 | 0.116 |
| Buhner | 27.5% | 0.258 | 0.512 | 0.254 |
| Cameron | 27.5% | 0.252 | 0.447 | 0.195 |
| Wilson | 27.1% | 0.264 | 0.471 | 0.207 |
| Hundley | 26.8% | 0.239 | 0.464 | 0.225 |
| Rodriguez | 26.8% | 0.261 | 0.489 | 0.228 |
| Sexson | 26.7% | 0.269 | 0.526 | 0.257 |

Generally, yes. Sluggers like Mark McGwire and Adam Dunn populate this list so superficially our assertion appears correct. However, we all know that by choosing the appropriate parameters we can manipulate any data set to bend to our hypotheses, so let’s try to be a little more robust. Here are career strikeout/contact/power data for 50-player cohorts since 1993.

| Cohort | K/AB | SLG | BA | ISO |
|---|---|---|---|---|
| 0-50 | 9.6% | 0.413 | 0.288 | 0.125 |
| 50-100 | 13.0% | 0.439 | 0.287 | 0.152 |
| 100-150 | 14.6% | 0.447 | 0.281 | 0.166 |
| 150-200 | 16.2% | 0.436 | 0.278 | 0.158 |
| 200-250 | 18.0% | 0.442 | 0.273 | 0.169 |
| 250-300 | 19.7% | 0.464 | 0.274 | 0.190 |
| 300-350 | 22.7% | 0.463 | 0.269 | 0.194 |
| 350-391 | 27.0% | 0.492 | 0.265 | 0.227 |
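The cohort idea is simple enough to sketch: rank hitters by career K/AB, slice the ranking into groups of 50, and average within each group. The version below is a rough illustration rather than the exact calculation behind the table. It takes a plain average of each hitter's rate (the table may instead have pooled at-bats), tracks only K/AB and ISO for brevity, and uses made-up numbers in the example.

```python
def cohort_averages(players, size=50):
    """players: list of (k_rate, iso) pairs, one per hitter.

    Returns a list of (label, mean_k_rate, mean_iso), one entry per cohort.
    """
    ranked = sorted(players, key=lambda p: p[0])  # fewest strikeouts first
    cohorts = []
    for start in range(0, len(ranked), size):
        group = ranked[start:start + size]  # the last group may be short
        mean_k = sum(p[0] for p in group) / len(group)
        mean_iso = sum(p[1] for p in group) / len(group)
        cohorts.append((f"{start}-{start + len(group)}", mean_k, mean_iso))
    return cohorts

# Hypothetical hitters as (career K/AB, career ISO); not real players.
toy_players = [(0.043, 0.10), (0.300, 0.29), (0.058, 0.12),
               (0.327, 0.27), (0.150, 0.16)]

for label, k_rate, iso in cohort_averages(toy_players, size=2):
    print(f"{label:>5}  K/AB {k_rate:.1%}  ISO {iso:.3f}")
```

Grouping like this trades detail for stability: individual hitters are noisy, but a 50-man average smooths most of that out, which is why the trend is easier to see at the extremes.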

We can see that at the extremes the contact hitters strike out less, and the sluggers aren’t shy about racking up some gaudy K/AB rates, but the middle ground is a touch murkier. Why might this be?

First, let’s step back and establish the statistical validity of the relationship between K/AB rate and power. The correlation coefficient (R) between ISO and K/AB is 0.49, so a relationship definitely exists. In statistical speak, this means that for every standard deviation increase in K/AB, ISO rises by 0.49 standard deviations on average, which is the direction the cohort table shows: ISO climbs as K/AB climbs. What about the repeatability of K/AB rates…how much of a skill is it?

A year-to-year correlation between 2005 and 2006 K/AB for hitters with more than 300 at-bats gives an R squared of 0.73, suggesting that a hitter’s strikeout rate is mostly a repeatable skill. Compare that to a stat we know to be heavily influenced by luck, such as line-drive percentage, where the R squared is 0.08.
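For readers who want to run the same kind of repeatability check, here is a minimal sketch of a year-to-year correlation. It pairs each hitter's rate in one season with his rate in the next and computes Pearson's R, then squares it. The five pairs of rates are hypothetical, chosen only to show the mechanics, and `pearson_r` is my own helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical K/AB rates for the same five hitters in consecutive seasons.
k_2005 = [0.050, 0.120, 0.180, 0.250, 0.330]
k_2006 = [0.070, 0.100, 0.210, 0.230, 0.310]

r = pearson_r(k_2005, k_2006)
print(f"R = {r:.2f}, R squared = {r * r:.2f}")
```

Keep in mind that R squared is always smaller than R in absolute terms: an R squared of 0.73 corresponds to an R of roughly 0.85, while 0.08 corresponds to an R of about 0.28.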

However, if we look back to the ambiguous no man’s land (cohorts 100-250), we see that the average K/AB rates for the different player cohorts are huddled together, with fewer than 2 percentage points separating neighboring cohorts. That translates to only one additional strikeout every 10 games; add in the statistical noise (random variance) and it isn’t a surprise that the trend isn’t perfect.

Right, let’s get back to business and try to work out just how good Tony Gwynn was at avoiding the big K.

###### Strike Rate Over Time

Tony Gwynn’s performance was pretty darn impressive, especially in an era of strikeout proliferation, but how impressive is it compared to other generations of hitters?

First, take a look at how K/AB has varied by decade in the bigs:

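| Decade | Average K/AB |
|---|---|
| 1870 | 4.3% |
| 1880 | 5.6% |
| 1890 | 4.7% |
| 1900 | N/A |
| 1910 | 9.7% |
| 1920 | 8.2% |
| 1930 | 9.6% |
| 1940 | 10.4% |
| 1950 | 13.3% |
| 1960 | 17.0% |
| 1970 | 14.9% |
| 1980 | 16.0% |
| 1990 | 18.2% |
| 2000 | 18.9% |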

Baseball has grown fonder of the clod-hopping slugger as opposed to the fleet-footed speedster, so K/AB rates have increased. That said, even in the late 19th century when strikeouts were anathema, Gwynn’s whiff rate would have been squarely in the top quartile when he was in his pomp. We have to go back to the 1870s, when soft underarm tossing was the norm, to observe K/AB rates on a par with Gwynn’s.

###### Adjusting for League Average

To pull together an all-time list we must adjust for context. I did this by working out the mean and standard deviation of K/AB for each year; I then ranked hitters by how many standard deviations they were from the mean of that year. That allows us to identify who has had the fewest adjusted strikeouts in a season.

| Name | Year | Z-score |
|---|---|---|
| Tony Gwynn | 1998 | -2.57 |
| Bob Lillis | 1965 | -2.44 |
| Tony Gwynn | 1999 | -2.42 |
| Tony Gwynn | 1995 | -2.40 |
| Nellie Fox | 1962 | -2.35 |
| Tony Gwynn | 1997 | -2.34 |
| Rafael Bournigal | 1998 | -2.32 |
| Dave Cash | 1976 | -2.32 |
| Juan Pierre | 2001 | -2.32 |
| Ozzie Guillen | 1997 | -2.31 |
| Nellie Fox | 1960 | -2.30 |
| Nellie Fox | 1961 | -2.28 |
| Glenn Beckert | 1967 | -2.27 |
| Gregg Jefferies | 1998 | -2.26 |
| Don Mueller | 1956 | -2.26 |
| Nellie Fox | 1959 | -2.25 |
| Gary DiSarcina | 1997 | -2.25 |
| Felix Fermin | 1993 | -2.24 |
| Tony Gwynn | 1992 | -2.24 |
| Vic Power | 1958 | -2.23 |
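The adjustment itself boils down to a per-year Z-score. Here is a small sketch of that step, with a couple of caveats: it uses the population standard deviation (`statistics.pstdev`), which may or may not be what produced the table above, it weights every hitter equally, and the six hitter-seasons in the example are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev

def z_scores_by_year(seasons):
    """seasons: list of (name, year, k_rate) tuples.

    Returns (name, year, z) tuples sorted with the most negative z first,
    i.e. the hitters who struck out least relative to their own season.
    """
    by_year = defaultdict(list)
    for _, year, k_rate in seasons:
        by_year[year].append(k_rate)
    stats = {year: (mean(rates), pstdev(rates)) for year, rates in by_year.items()}
    scored = []
    for name, year, k_rate in seasons:
        mu, sigma = stats[year]
        if sigma == 0:
            continue  # no spread that year, so a Z-score is undefined
        scored.append((name, year, (k_rate - mu) / sigma))
    return sorted(scored, key=lambda s: s[2])

# Hypothetical hitter-seasons: a high-strikeout year and a low-strikeout year.
toy_seasons = [
    ("A", 1995, 0.03), ("B", 1995, 0.15), ("C", 1995, 0.21),
    ("D", 1925, 0.01), ("E", 1925, 0.05), ("F", 1925, 0.09),
]

for name, year, z in z_scores_by_year(toy_seasons)[:2]:
    print(f"{name} {year} {z:+.2f}")
```

In this toy data hitter D has the lower raw rate, but hitter A ranks first because he is further below his own season's average in standard-deviation terms, which is exactly the effect the adjustment is meant to capture.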

The data show Tony Gwynn in a great light. He has four seasons in the top 10 (five in the top 20) and definitely appears to have a penchant for avoiding the strikeout. But…hang on just a cotton-picking minute. Budding statisticians may be slightly taken aback at the low Z-scores. Remember that, if the data were normally distributed, we’d expect a data point to land 2.5 or more standard deviations from the mean only about 1% of the time through luck. With thousands of player seasons in the sample we should therefore see plenty of scores beyond 2.5, yet here only one makes it. Moreover, if we look at the other side of the distribution we see Z-scores of five and more. This indicates that the data are skewed.

Simply put, the distribution is not normal. Take 2006: the average K/AB was 18% and the standard deviation was 6%. A hitter who never struck out at all would sit exactly three standard deviations below the mean, so getting a Z-score beyond -3 is impossible!
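The arithmetic behind that ceiling is worth spelling out. Since nobody can strike out in fewer than 0% of his at-bats, the best possible Z-score in any season is simply the Z-score of a 0% hitter. A tiny sketch using the 2006 figures quoted above (the helper name `z_floor` is mine):

```python
def z_floor(mean_k_pct, sd_k_pct):
    """Z-score of a hitter who never strikes out (K/AB of 0%)."""
    return (0 - mean_k_pct) / sd_k_pct

# 2006 figures from the text: mean K/AB of 18%, standard deviation of 6%.
print(z_floor(18, 6))  # -3.0
```

There is no matching limit on the other side, since a hitter can in principle strike out far more often than average, which is why the high-strikeout tail stretches out to Z-scores of five and more.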

(*Technical Note*: Although Z-scores should only be used when applied to a normal distribution, as we don’t require any significance testing, we can still apply the concept here. As such be careful to note that a Z-score of three does not imply a 99% confidence level in this instance.)

One check worth doing to ensure that our analysis is valid is to look at the maximum Z-score by year. If the standard deviation and mean interact in such a way that maximum Z-scores are higher as time goes on, then of course Gwynn will top the list.

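| Decade | Average of max Z |
|---|---|
| 1870 | 1.62 |
| 1880 | 2.15 |
| 1890 | 2.14 |
| 1900 | N/A |
| 1910 | 2.69 |
| 1920 | 2.07 |
| 1930 | 2.16 |
| 1940 | 2.17 |
| 1950 | 2.49 |
| 1960 | 2.73 |
| 1970 | 2.59 |
| 1980 | 2.70 |
| 1990 | 2.93 |
| 2000 | 3.01 |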

Hmm…we see that maximum Z-scores have slowly been moving toward three in the last few decades, whereas in the early 20th century Z was closer to two. Another lens through which to look at this is proximity to the maximum Z-score in each year (partly correcting for the phenomenon we see above). Here is a list of batters ranked by how close they were to the maximum Z-score in that year.

| Name | Year | Zdiff% | Z-score |
|---|---|---|---|
| Joe Sewell | 1932 | 0.08 | -2.06 |
| Joe Sewell | 1925 | 0.10 | -1.81 |
| Joe Sewell | 1933 | 0.10 | -1.90 |
| Joe Sewell | 1929 | 0.10 | -2.02 |
| Joe Sewell | 1930 | 0.11 | -2.25 |
| Don Mueller | 1956 | 0.13 | -2.45 |
| Charlie Hollocher | 1922 | 0.13 | -1.69 |
| Nellie Fox | 1962 | 0.14 | -2.41 |
| Stuffy McInnis | 1922 | 0.15 | -1.66 |
| Nellie Fox | 1958 | 0.15 | -2.23 |
| Nellie Fox | 1961 | 0.15 | -2.25 |
| Dave Cash | 1976 | 0.15 | -2.26 |
| Red Schoendienst | 1957 | 0.15 | -2.11 |
| Buck Jordan | 1938 | 0.15 | -1.65 |
| Lloyd Waner | 1936 | 0.16 | -1.82 |
| Stuffy McInnis | 1924 | 0.16 | -1.82 |
| Joe Sewell | 1926 | 0.16 | -1.77 |
| Nellie Fox | 1959 | 0.16 | -2.25 |
| Bob Lillis | 1965 | 0.16 | -2.39 |
| Emil Verban | 1947 | 0.16 | -1.88 |
| Tony Gwynn | 1995 | 0.17 | -2.56 |
| Dale Mitchell | 1952 | 0.17 | -1.83 |
| Nellie Fox | 1960 | 0.17 | -2.23 |
| Clint Courtney | 1954 | 0.17 | -2.18 |
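The Zdiff% column isn't spelled out as a formula here, so treat the following as one plausible reading rather than the definitive calculation. It assumes the "maximum Z-score" for a year is the score a hitter with zero strikeouts would earn, and that Zdiff% is the fraction of that distance a hitter fell short. The league figures in the example are hypothetical.

```python
def z_gap(k_rate, league_mean, league_sd):
    """Fractional shortfall from the zero-strikeout Z-score (assumed Zdiff%)."""
    z = (k_rate - league_mean) / league_sd
    z_limit = (0 - league_mean) / league_sd  # best possible score that year
    return (z_limit - z) / z_limit

# Hypothetical league: mean K/AB of 17% with a standard deviation of 5.5%.
print(round(z_gap(0.028, 0.17, 0.055), 2))
```

Under these assumptions the standard deviation cancels out, and the measure reduces to the hitter's K/AB divided by the league-average K/AB. If that reading is right, the list is really ranking hitters by their strikeout rate as a share of the league norm.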

Hall of Famer Joe Sewell tops our list. A quick glance at Baseball Reference tells us he was the “greatest contact hitter ever” and a look at his stats shows he had a quite remarkable career with the timber. Even in a low-whiff era he regularly struck out fewer than 10 times a season.

What about Tony Gwynn? Where is he in our ranking? His great 1995 season appears at number 20, and that’s out of 15,000 player seasons; he appears three times in the top 50 and six times in the top 100. Even though he isn’t at the summit, that is nothing to sneeze at.

###### Wrapping Up

Strikeouts are an inherently magical part of the game. We laud hurlers who can mow down more than 10 batters per nine innings without paying too much attention to hitters who consistently avoid the embarrassing swing and miss. And rightly so: a pitcher’s strikeout rate is far more indicative of his skill than a hitter’s K/AB rate is. That’s why we have Juan Pierre and Tony Gwynn at the top of the same list. Amen.

