Successes in Walking

Mike Trout saw more pitches per plate appearance than any other player in 2014. (via Erik Drost)

In our current, run-depressed environment, one hears a lot about increased strikeouts and the increased size of the strike zone. Although it follows naturally from these same things, one doesn’t hear as much about the decline in walks.

Perhaps this is because many don’t understand the challenges of drawing walks, and how players who draw walks succeed in doing so. I want to talk about all of these things, because there are some fairly interesting trends beneath the surface.

First, we need to correctly define the problem. Far too often, I see people writing about a player’s walk rate being “above average” or “below average.” On a league-wide basis, people cite the league-average walk rate as evidence that walks are down, and then compare individual players’ walk rates to that average. Indeed, all leading baseball sites track the weighted league-wide walk rate as an average.

Of course, there is indeed a league-average walk rate (7.8 percent for non-pitchers last year) and any player who registers a plate appearance can be compared to it. In that regard, 2014 was the worst year for average walk rate since 1980, and the last three league years have featured the worst such walk rates during that time period.

But these “averages” mask how bad the problem has become, because they are based on the arithmetic mean. As is often the case, the median (literally: the midpoint of the data) is more helpful in telling how walk rate distributes among players. Here is a table comparing the average and median walk rate for all batters in baseball since 2014:

[Table image not reproduced: average vs. median walk rate, all batters]

If we limit the sample to qualified hitters each season, the figures tighten up a bit, but not much:

[Table image not reproduced: average vs. median walk rate, qualified hitters]

Why am I belaboring a distinction between the average and the median, when both show a clear trend? Because the difference is a tip-off of a skew to the data. Like most talents in baseball, the ability to draw walks is not uniformly distributed. Indeed, these charts show that it is not even normally distributed:

[Chart image not reproduced: distribution of walk rates among qualified hitters]

Above we have the concentration of walk rates among qualified hitters, with a vertical blue line for the median walk rate. This histogram underscores the clear positive tail to the data, caused by a minority of players who are actually uniquely good at drawing walks — league trends notwithstanding. To put these same data into percentiles:

2014 Walk Rates
Quantile   0%    25%   50%   75%   100%
BB%        4%    6%    8%    10%   17%
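The gap between mean and median, and a quantile table like the one above, can be reproduced in a few lines. This is a minimal sketch in Python (the study itself used R), and the walk rates in it are invented for illustration: a hypothetical right-skewed sample, not the 2014 data.

```python
import statistics

# Hypothetical walk rates (percent) for 15 hitters, NOT the 2014 data.
# A few high-walk hitters create the long right tail described above.
walk_rates = [4, 5, 5, 6, 6, 7, 7, 8, 8, 8, 9, 10, 11, 13, 17]

mean_bb = statistics.mean(walk_rates)
median_bb = statistics.median(walk_rates)

# Quartile cut points (25th, 50th, 75th percentiles), interpolating
# between observed values.
q25, q50, q75 = statistics.quantiles(walk_rates, n=4, method="inclusive")

print(f"mean={mean_bb:.2f} median={median_bb} quartiles={q25}, {q50}, {q75}")
print("right-skewed" if mean_bb > median_bb else "not right-skewed")
```

When the mean sits above the median, as it does here, a minority of large values is pulling the average up. That is the pattern the article attributes to exceptional walkers.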

As you can see, a particularly gifted category of walkers settles in at a walk rate of 10 percent and above, populating the 75th percentile and up. Who are these exceptional walkers, and how do they do it in such a tough run environment?

There are, as it turns out, a few factors that drive a batter’s success at walking. Some you would expect, others you wrongly expected, and at least one you did not expect at all.

Method

I analyzed all qualified hitters in 2014. Qualified hitters were selected because they (a) created an adequate sample size (147), (b) are the hitters most likely to reflect their true talent (because they play so much), and (c) are the players managers choose to play basically every day.

The list of 2014 qualified hitters was split into a randomly selected training set, with the remainder set aside for testing. Regressions were performed using four different methods: a linear model, randomForest, the lasso, and earth (an open-source implementation of MARS), all in R. All models were weighted for plate appearances.
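For readers who want to see the mechanics, here is a rough sketch of a random train/test split followed by a fit weighted by plate appearances. It is written in Python rather than the R used for the study. The 70/30 split, the seed, the single-predictor model, and every data value are assumptions made for illustration; the article does not report the split fraction.

```python
import random

def split_train_test(rows, train_fraction=0.7, seed=2014):
    """Shuffle rows reproducibly and cut them into training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Invented rows: (P/PA, walk rate %, plate appearances). Not real players.
rows = [(3.5, 5.5, 520), (3.7, 7.0, 610), (3.8, 7.5, 650),
        (3.9, 9.0, 700), (4.0, 9.5, 580), (4.2, 12.0, 690),
        (3.6, 6.0, 505), (4.1, 10.5, 640), (3.4, 5.0, 530), (4.3, 13.0, 660)]

train, test = split_train_test(rows)
p_pa, bb, pa = zip(*train)
intercept, slope = weighted_line_fit(p_pa, bb, pa)  # weights = plate appearances
print(f"train={len(train)} test={len(test)} slope={slope:.2f}")
```

Weighting by plate appearances means a hitter with 700 PA pulls the fitted line harder than one with 505, on the theory that his rates are measured with less noise.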

The output of interest (walk rate) was regressed against various plate discipline metrics (as recorded by PITCHf/x and tracked by FanGraphs) and hit-type characteristics deemed by me to be potentially reflective of a player’s walk rate. These statistics included:

  • Pitches per Plate Appearance (P/PA)
  • Swing rate at pitches outside the strike zone (O-Swing %)
  • Swing rate at pitches inside the strike zone (Z-Swing %)
  • Overall swing rate (Swing %)
  • Contact rate with pitches outside the strike zone (O-Contact %)
  • Contact rate with pitches inside the strike zone (Z-Contact %)
  • Overall contact rate (Contact %)
  • Pitches received inside the strike zone (Zone %)
  • First strike rate (F-Strike %)
  • Overall swinging strike rate (SwStr%)
  • Groundball / Fly ball ratio (GB.FB)
  • Infield fly ball rate (IFFB %)
  • Infield hit rate (IFH %)
  • Bunt hit rate (BUH %)

Each model was evaluated through bootstraps of the training sample, validated on the test sample, and then re-fit to the full 2014 season of qualified hitters. Overall, the models produced similar results. Although the randomForest model had the lowest average error rate, it also retained over twice as many predictors as a simple linear model, with little ultimate benefit (.004 improvement in RMSE).
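The bootstrap evaluation described above can be sketched roughly as follows: resample the training rows with replacement, refit, and score each fit by RMSE on the rows that were left out of that resample. This Python sketch uses an unweighted one-predictor line and invented data. The number of rounds, the seed, and the out-of-bag scoring scheme are assumptions, not details reported in the article.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def bootstrap_rmse(x, y, rounds=200, seed=1):
    """Refit on resampled rows and score each fit on the rows left out."""
    rng = random.Random(seed)
    n = len(x)
    scores = []
    for _ in range(rounds):
        picked = [rng.randrange(n) for _ in range(n)]
        left_out = sorted(set(range(n)) - set(picked))
        if not left_out or len(set(picked)) < 2:
            continue  # nothing to score, or too few distinct rows to fit
        intercept, slope = fit_line([x[i] for i in picked], [y[i] for i in picked])
        predicted = [intercept + slope * x[i] for i in left_out]
        scores.append(rmse([y[i] for i in left_out], predicted))
    return sum(scores) / len(scores)

# Invented (O-Swing %, BB %) pairs, not real players.
o_swing = [19, 21, 22, 24, 26, 28, 30, 32, 35, 38, 41, 45]
bb_rate = [13, 12, 11, 10.5, 9, 9.5, 8, 7, 7.5, 6, 5, 4]

avg_error = bootstrap_rmse(o_swing, bb_rate)
print(f"average out-of-bag RMSE: {avg_error:.3f}")
```

Comparing this averaged error across candidate models is one way to judge whether a more complex model earns its extra predictors.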

The final linear model benefited from variable selections suggested by the more complex models. Overall tolerance was acceptable (>.42), and the predictors were highly statistically significant (p<.001).
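Tolerance, for readers unfamiliar with it, is 1 minus the R-squared obtained by regressing one predictor on the others (equivalently, 1 divided by the variance inflation factor). Values near 1 mean a predictor carries mostly independent information; values near 0 signal collinearity. With only two predictors it reduces to 1 minus the squared correlation, which is all the Python sketch below covers. The numbers are invented, not the study's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_tolerance(x, y):
    """Tolerance of one predictor given a single other predictor: 1 - r^2."""
    return 1 - pearson_r(x, y) ** 2

# Invented values for two predictors, not real players.
o_swing = [19, 22, 25, 28, 31, 34, 37]       # O-Swing %
p_pa = [4.3, 4.2, 3.9, 4.0, 3.8, 3.7, 3.5]   # pitches per plate appearance

tol = pairwise_tolerance(o_swing, p_pa)
print(f"tolerance={tol:.2f} VIF={1 / tol:.2f}")
```

With three or more predictors, the R-squared would come from a multiple regression of each predictor on all the rest, which this sketch does not attempt.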

Discussion

The four variables that ended up being most useful in predicting a batter’s walk rate were, in no particular order:

  1. Pitches per Plate Appearance (P/PA)
  2. Swing rate at pitches outside the zone (O-Swing %)
  3. Pitches received inside the strike zone (Zone %)
  4. Contact rate with pitches inside the zone (Z-Contact %)

We’ll discuss each of these. All of these predictors are, admittedly, arithmetic means, but that statistic is less problematic with these particular attributes.

Pitches per Plate Appearance (P/PA)

I suspect that if readers had to vote on one factor most predictive of a player’s walk rate, they would pick this one: P/PA. It makes sense that taking more pitches would increase one’s walk rate, since taking a walk requires receiving at least four pitches per plate appearance, and against major league pitchers usually requires even more.

But surprisingly, this factor is not that important. Of the four factors I isolated above, P/PA is a distant third in importance (t=3.021), just barely more meaningful than zone contact rate (t=-2.993). In part, this is because, as I’ve written before, the difference between hitters when it comes to plate patience is fairly minimal. The interquartile range (IQR) (the distance between the 25th and 75th percentiles) is less than one-third of one pitch per plate appearance, and there is a grand total of one extra P/PA between the most patient and impatient full-time hitters in baseball. The major-league percentiles for P/PA distribute as follows:

Pitches per plate appearance percentiles
Quantile   0%      25%     50%     75%     100%
P/PA       3.330   3.685   3.820   3.985   4.360

This is not to say that taking pitches is irrelevant. There is a statistically significant relationship, but much of that relationship seems to be a function of other factors. The difference between the top and bottom of the IQR is worth about 0.7 percentage points of a player’s walk rate. So, while some players may well have a conscious strategy of taking pitches, that does not seem to be a driving factor in their success at actually drawing walks.
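The arithmetic behind these figures is easy to check from the percentile table. The Python sketch below computes the IQR and the full range of P/PA, then backs out the slope implied by the stated IQR effect. That implied slope is an inference from the numbers quoted here, not a coefficient reported by the model.

```python
# Quantiles of P/PA taken from the percentile table in the article.
p_pa_quantiles = {0: 3.330, 25: 3.685, 50: 3.820, 75: 3.985, 100: 4.360}

iqr = p_pa_quantiles[75] - p_pa_quantiles[25]          # spread of the middle half
full_range = p_pa_quantiles[100] - p_pa_quantiles[0]   # most vs. least patient

# The article puts the IQR effect at about 0.7 points of walk rate.
iqr_effect_points = 0.7

# Implied walk-rate points per additional pitch seen (an inference only).
implied_slope = iqr_effect_points / iqr

print(f"IQR={iqr:.3f} range={full_range:.2f} implied slope={implied_slope:.2f}")
```

An IQR of about 0.3 pitches leaves very little room for P/PA to separate hitters, which is why even a real relationship moves walk rate so little across the middle of the league.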

The five qualified hitters in 2014 with the highest pitches per plate appearance were as follows:

P/PA Leaders, 2014
Name             Pit/PA   BB%
Mike Trout       4.44     12%
Brett Gardner    4.43     9%
Matt Carpenter   4.36     13%
Carlos Santana   4.30     17%
Adam Dunn        4.29     14%

All these walk rates are clearly above the median. Yet how far they exceed it varies widely, which is consistent with P/PA not being much of a distinguishing factor.

Swings at Outside Pitches

The next factor is incredibly important: the tendency of a batter to swing at pitches outside the strike zone, as tracked by PITCHf/x. This is certainly not a surprise. Batters who swing at bad pitches are racking up strikes, not balls, and batters who are generous with their swings can be expected to get a steady diet of bad pitches.

A batter’s swing rate at outside pitches is nonetheless interesting for two reasons. First, it basically ties with zone rate as the most important factor in determining a player’s walk rate (t=-8.9, SE=.04). Second, the batter’s swing rate at outside pitches correlates with his pitches per plate appearance, which interestingly is not at all true for either zone rate or zone-contact rate. In fact, if you take either P/PA or O-Swing% out of the model, the tolerance among the remaining predictors jumps up to the .9 range, making those variables almost entirely independent from one another. O-Swing%, therefore, is not important merely for its predictive value in walk rate, but for the increased overall opportunities it helps provide a player to get the pitch he wants.

The qualified players with the five best rates at resisting outside pitches are as follows:

O-Swing% Leaders, 2014
Name             O-Swing%   BB%
Matt Carpenter   19%        13%
Coco Crisp       21%        12%
Brett Gardner    22%        9%
Carlos Santana   22%        17%
Adam Dunn        22%        14%

As you can see, a player’s O-Swing % is inversely related to his walk rate. Many of the names we see here were on the first chart too, which is consistent with the relationship we are finding between O-Swing % and ultimate pitches per plate appearance. And how important is outside-swing rate? The difference between the top and bottom of the IQR (25th and 75th percentiles) is worth 2 percentage points to a player’s walk rate. That can mean the difference between mediocre and quite good.
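As a back-of-the-envelope check, a t-statistic is simply the estimated coefficient divided by its standard error, so the coefficient can be recovered from the two figures quoted for O-Swing %. The Python sketch below does that and then asks what IQR would produce a 2-point effect. It assumes the quoted SE belongs to the O-Swing % coefficient and that both variables are in percentage points. Neither assumption is confirmed by the article, so treat the outputs as rough.

```python
# Figures quoted in the article for O-Swing %.
t_stat = -8.9
std_error = 0.04

# t = coefficient / standard error, so coefficient = t * standard error.
coefficient = t_stat * std_error

# The article says the IQR of O-Swing % is worth about 2 points of walk rate.
iqr_effect_points = 2.0

# Implied width of the O-Swing % IQR under the unit assumptions above.
implied_iqr = iqr_effect_points / abs(coefficient)

print(f"coefficient={coefficient:.3f} implied O-Swing IQR={implied_iqr:.1f} points")
```

Under those assumptions, each extra point of O-Swing % costs roughly a third of a point of walk rate.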

Zone Percentage

A batter’s walk rate is also a function of how many strikes he gets thrown in the first place (t=9.2, SE=.05). This factor is not a surprise either, but that does not make it irrelevant. The extent to which a batter receives pitches in the zone is only somewhat related to the fortuity of his schedule. Generally, all major league pitchers have at least reasonable command. Thus, zone rate is driven substantially by the respect pitchers have for the batter in the box.

To take the simplest example, this past season there was a solid (r=.52) and very highly significant (p<.0001) relationship between a batter’s isolated power (ISO) and the number of pitches that batter received in the strike zone. Pitchers are not stupid, and they are not interested in having a good night’s work ruined by one pitch that catches too much of the plate. Rob Arthur has broken this concept down even further, finding that the typical distance of a batter’s pitches from the center of the strike zone is not only indicative of his ability, but predictive of future breakouts.
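To see why a correlation of that size counts as highly significant, one can convert r into a t-statistic with n - 2 degrees of freedom. The Python sketch below does so, assuming the sample is the 147 qualified hitters from the Method section. The article does not say which sample this correlation used, so n is an assumption.

```python
import math

def t_from_r(r, n):
    """t-statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r = 0.52   # correlation magnitude quoted in the article
n = 147    # assumption: the qualified-hitter sample from the Method section

t = t_from_r(r, n)
print(f"t = {t:.2f} on {n - 2} degrees of freedom")
```

A t-statistic above 7 on roughly 145 degrees of freedom is far beyond conventional thresholds, which fits the reported p<.0001.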

So, consider Zone % a function of the batter’s underlying reputation and ability, not just the pitcher who happens to be on the mound each day. In this regard, the lineup of the batters who face the lowest percentage of pitches in the zone should not be surprising:

Zone% Laggards, 2014
Name                Zone%   BB%
Pablo Sandoval      36%     6%
David Ortiz         39%     13%
Giancarlo Stanton   39%     15%
Freddie Freeman     39%     13%
Jay Bruce           39%     8%

David Ortiz, Giancarlo Stanton, and Jay Bruce: all three are feared power hitters, even if Bruce managed to have a terrible 2014. Freddie Freeman is an excellent all-around hitter (when he gets the ball in play), and Pablo Sandoval simply swings at everything, so pitchers do their best to oblige him.

Zone-Contact %

The last category is without question my favorite. It was the unexpected one that I at first assumed had to be a mistake. What does making contact have to do with drawing walks, given that swinging the bat can only result in a strike? And why is a batter’s zone contact rate inversely related to this? That’s right: The worse a player is at making contact with strikes, the more he draws walks. Swing, miss, and draw walks.

The answer lies in part with our “three true outcome” hitters. Many of these hitters are perceived as “all or nothing” in their approach at the plate, because they struggle to make everyday contact with pitches. As such, to survive in the lineup between home runs, they almost need to have an elevated walk rate. In part, these sluggers benefit from a lower percentage of strikes in the zone, as shown above. But they also benefit from that same inability to make contact with pitches that are in the zone, thereby extending their at-bats and increasing the chance of drawing a walk. In other words, when it comes to drawing walks, these players succeed through failure.
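The mechanism can be illustrated with a toy model of a plate appearance. The Python sketch below computes the exact probability of a walk from a 0-0 count for a batter with fixed per-pitch tendencies. Every rate in it is hypothetical, and the model is deliberately crude: swing decisions ignore the count, a fixed share of contact goes foul, and hit-by-pitches are left out. It is meant to show the direction of the effect, not its size.

```python
from functools import lru_cache

def walk_probability(zone, z_swing, o_swing, z_contact, o_contact, foul_share):
    """Exact chance of a walk from 0-0, given fixed per-pitch tendencies."""
    ball = (1 - zone) * (1 - o_swing)              # pitch taken outside the zone
    called = zone * (1 - z_swing)                  # pitch taken inside the zone
    miss = (zone * z_swing * (1 - z_contact)
            + (1 - zone) * o_swing * (1 - o_contact))
    contact = zone * z_swing * z_contact + (1 - zone) * o_swing * o_contact
    foul = contact * foul_share                    # the rest of contact is in play
    strike = called + miss

    @lru_cache(maxsize=None)
    def walk_from(balls, strikes):
        if balls == 4:
            return 1.0
        if strikes == 2:
            # A foul leaves the count unchanged, so solve
            # W = ball * W(balls + 1, 2) + foul * W for W.
            return ball * walk_from(balls + 1, 2) / (1 - foul)
        # Before two strikes, a foul counts the same as any other strike.
        # A ball in play ends the plate appearance and contributes zero.
        return (ball * walk_from(balls + 1, strikes)
                + (strike + foul) * walk_from(balls, strikes + 1))

    return walk_from(0, 0)

# Hypothetical tendencies shared by both batters; only zone contact differs.
shared = dict(zone=0.45, z_swing=0.65, o_swing=0.28, o_contact=0.65, foul_share=0.45)
low_contact = walk_probability(z_contact=0.78, **shared)
high_contact = walk_probability(z_contact=0.90, **shared)

print(f"walk chance, 78% zone contact: {low_contact:.4f}")
print(f"walk chance, 90% zone contact: {high_contact:.4f}")
```

With these made-up inputs, the lower-contact batter walks slightly more often. Fewer of his plate appearances end on a ball in play, so more of them survive long enough to reach ball four, even though he also strikes out more.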

This time, I chose the bottom six hitters in zone contact rate, and I think it makes the point:

Z-Contact% Laggards, 2014
Name           Z-Contact%   BB%
B.J. Upton     74%          10%
Chris Davis    78%          11%
Chris Carter   78%          10%
Adam Dunn      79%          14%
Justin Upton   79%          9%
Ryan Howard    80%          10%

All these hitters have terrible contact rates. But, they also all have decent to excellent walk rates, and their inability to hit the reduced number of strikes they already get is one reason for that. The difference between a good and bad zone contact rate (in other words, the top and bottom of the IQR) is about 6.5 percentage points, amounting to about a half point of walk rate. This is not huge, but this category of hitter needs every bit of on-base percentage they can get.

A Final Note

Mike Trout had a terrific walk rate last season of 12 percent. One factor in that walk rate? His significantly below average zone contact rate of 85 percent.

It’s good to know that even Mike Trout has things to work on at the plate.


Jonathan Judge has a degree in piano performance, but is now a product liability lawyer. He has written for Disciples of Uecker and Baseball Prospectus. Follow him on Twitter @bachlaw.
16 Comments
Eric
9 years ago

Well, Mike Trout struck out 185 times last year, so that makes sense as that is a completely atrocious figure. I am surprised, the one variable I didn’t see you analyze in here is foul ball rate, which extends a plate appearance, allowing a batter to see that ‘one more pitch’, which allows a batter to dig himself out of 0-2, and 1-2 holes, and come all the way back to 3-2 to garner walks. It allows you to fight off a tough pitcher that is really not giving you anything, in order to glean something positive. A la Dustin Pedroia anyone, or Kevin Youkilis? Some of these stats you chose get you there but indirectly, or skirt around the issue a bit, P/PA, and Zone Contact %, but try the direct route and see what that yields. You think this is odd, “That’s right: The worse a player is at making contact with strikes, the more he draws walks. Swing, miss, and draw walks.” Yet, consider actual contact, that would be ‘even odder’. Pun intended. Foul ball percentage or raw volume of foul balls, check it out.

Jonathan Judge
9 years ago
Reply to  Eric

Hi Eric,

I will look into that. However, a quick review of other studies seems to suggest that foul ball percentage is a “skill” but one that on average doesn’t tend to increase batter production in a meaningful way. Bill Petti discussed them here: http://www.beyondtheboxscore.com/2011/11/10/2551718/investigating-foul-balls-as-a-skill-or-signal-of-skill

Thanks for reading.

Jonathan

Jeff Zimmerman
9 years ago

Were intentional walks removed from the walk rates?

Jonathan Judge
9 years ago
Reply to  Jeff Zimmerman

Hi Jeff,

They were not. It would have been interesting to do that, but then I would have to find some way to adjust the other characteristics to exclude IBB plate appearances, which for me at least would have been difficult to do. I suppose I could also add a separate control variable for IBBs, to isolate any effect they would have, although I need to think that through. One complicating factor, of course, is that many intentional walks don’t start out that way, making them pretty similar to unintentional walks when all is said and done.

Regardless, this analysis currently assumes that intentional walks are a function of other factors or ultimately are not meaningfully different (e.g., they still involve hitters not swinging at pitches outside the zone).

Carlos Shwartz
9 years ago

Batters and pitchers should know the battle ends until one either bats or the umpire calls the strike out, fair arguing on foul ball rate but should remain the way it is. I think.

MGL
9 years ago

Not sure why you are surprised at the inverse relationship between contact rate and walk rate. If a pitch is in the zone and the batter makes contact, the ball is usually in play, the PA is over and the batter cannot walk. If he swings and misses, he gets a strike, but the PA is probably not over (unless it is strike 3) and he may end up drawing a walk. Of course contact rate and walk rate is inversely proportional. So one way the PA is either definitely or probably over (I am not sure if a foul ball is considered “contact” or not) and the other way the PA is probably not over. Which way do you think ends up with more walks (hint: one way can not ever end up with a walk, other than on a foul ball if that is considered contact)?

BTW, I think that Arthur’s theory about “zone distance” presaging a breakout season is “wrong.” I think what he is seeing is merely regression. If a player gets lots of pitches near the center of the zone in a season, i.e., a small “zone distance” he likely faced bad pitchers and/or got lucky in terms of the pitches he received. IOW, he got more than his fair share of cookies. We would expect him to regress the following season.

Likewise if a batter gets more than his fair share of tough pitches, i.e., he faced an unlucky distribution of pitches, we are likely to see a regression upward the next season.

So while his results are accurate, I think his interpretation is wrong. Sort of like the “sophomore slump” for rookies of the year. It is true but only has to do with regression.

Eric
9 years ago
Reply to  MGL

This was my exact point originally when I said, if you think it’s odd that swings and misses lead to a walk, then it would be odder still, that contact would lead to walks. Because most people when they think of contact, only consider Balls in Play, they overlook the value of a foul ball out of play. And yes, of course foul balls are contact. The only time you can strike out with a foul ball is on a bunt. So if you never bunt, and are sitting on strike two in your count, it’s not possible to strike out if you keep fouling off one pitch after another. Personally, I pull that trick all the time. I can foul off 2-5 pitches every plate appearance, unless I get a cookie early in the count. It’s maddening for a pitcher. That is why I told the author to try foul ball percentage as an independent variable. If you NEVER make contact, aka swing and miss, well a swing and miss as you alluded to could be strike three, but a foul ball allows you to stay in the batter’s box for at least another pitch, which could lead to a pitcher throwing a mistake or ultimately walking you. That is why, the single most valuable trait an offensive player can have is a discerning batting eye. Never overlook it. Those are the guys with the longest careers in MLB. Furthermore, those players have the longest careers precisely because they can skirt or halt the “regression rules” for way longer than the typical player, or the Adam Jones’ of this world – he of the 29 walks and 128 strikeout average per year. This business saying that striking out 100+ times a year doesn’t matter because it’s just another out is complete and utter bullshit. So many good things can happen with simple contact. The Royals led in contact outs, and just plain contact this year and it got them to the World Series.

LeeTrocinski
9 years ago
Reply to  Eric

Eric, your last sentence has no relevance to the argument. The Royals made the playoffs due to a bit of clutch luck and great run prevention. They allowed the 3rd fewest runs in the AL, thanks to their elite defense and good pitching. They only scored 4.02 runs per game, 9th in the AL. You can drop them below 4 runs a game if you take out their baserunning, which is not affected by how they got on base. Also notice how their 2 best hitters last year, Gordon and Cain, had the highest K rates on the team. Only 1-2% of the variation in Clutch score is explained by K%, so it has basically no effect.

Vince
9 years ago
Reply to  Eric

Is it hard to foul off all those pitches with the broken arm?

Rob Arthur
9 years ago
Reply to  MGL

Jonathan, great article.

MGL: re: the zone distance -> breakout theory, I don’t follow your reasoning. The idea is that if a player sees pitches which get further and further from the center of the zone in year N, they will break out in year N+1. It does not involve a comparison between years–only within a single year. What’s more, regression to the mean is accounted for by PECOTA, which is used as a baseline; players do even better than their regressed forecast. I’d love to hear more about why/how it could be some kind of regression, but the objection you raised doesn’t quite make sense to me.

dcs
9 years ago

While it’s true that swinging and missing can lead to more walks, that should not be taken to imply that it’s a desirable attribute. Swinging and missing also leads to more K’s, as well as putting the batter more behind in the count, so that he will likely get worse pitches to hit. That low contact batters are somewhat better, as a group, than high contact batters is not a counter, since that result is not a cause, but an effect…

Jonathan Judge
9 years ago
Reply to  dcs

Hey Dave,

Yes, I agree. I tend to think of this poor-contact “rebate” as being like regenerative braking, except that it’s probably not an intended result.

Jonathan

Joshua
9 years ago

Love the article. Players need to read it. Need to walk more.

Bpdelia
9 years ago
Reply to  Joshua

As a guy who played very well all the way through division I college and then competitive wood bat men’s leagues after players reading this will have all of zero impact.

Despite their best efforts players really cannot learn batting eye.

It’s possible they can get slightly better with thousands of reps but really it’s something you have or don’t.

No player has ever attempted to swing at a bad pitch.

They swing because that pitch LOOKED good.

Watching on tv it’s easy to think this is obvious but the speed at which this decision is made…..

Well the speed renders it almost a NON decision. An act of instinct that occurs on the subconscious thought level of consciousness.

Even change ups and curveballs arrive shockingly fast.

I’m over forty now and still regularly play and go to the batting cage. Always on “fast” .

You should go. Maybe it’s been awhile but generally “fast” is mid eighties which is nothing and is a near blur.

The only way to accomplish what you are asking for is guessing to a degree. Which I’ve always done.

You train yourself and think “ok if the ball is on the outer half I’m simply not swinging” etc.

And even THAT is very difficult. Sometimes your hands just disobey.

In conclusion people can’t learn batting eye after their initial exposure to the game.

I suspect any of the “improvement” we see in this regard is simply pitchers throwing less strikes and more obvious balls to hitters who have become dangerous upon contact.

Good example being Robinson Cano. The second his batted ball damage goes down, his slight improvement in walk rate will evaporate, because this is something not under his control any more than being happy is under the control of someone with bipolar or not hearing voices is in the control of the schizophrenic.

Michael
9 years ago
Reply to  Bpdelia

I suppose I am biased. I think you can learn batting eye, but then I still play like you, in local wood and metal bat leagues. To me 80-85 mph is easy, since I practice at that speed, but the bias is good genetics. Dad has 20/10 vision, better than perfect. You get extra reaction time that way, and you can see things clearer at a greater distance and wait longer to swing while others cannot with 20/20. I am your age and the last 2 seasons of 45 games a year were OBP’s of .627 and .611 – Ooooo, I’m regressing….and Thanks for passing on those genes, pop. Baseball is still fun after all these years only because I don’t have a single broken body part. KNOCK ON WOOD. On another note, it begs the question – was Ted Williams a great baseball player and student of the game, or was it all due to his 20/5 vision that made him the last one to hit .400? Is it truly hand-eye coordination, or is it a truer axiom as eye-hand coordination? You do know what “position” ole Teddy ballgame played in WWII right? That of pilot. Have to have excellent vision and eyes in the back of your head, and your head on a swivel to come home safe. Where Teddy was concerned, I think it was a bit of all, awesome genetics, talent, and experience.

Carlos Sun
8 years ago

Late to the game, but great article! Did you look at multicollinearity between P/PA and Zone-Contact%?