# The Hardball Times

## Predicting double play rate

by Dan Turkenkopf
July 16, 2009

It seems pretty intuitive, right? Big, slow sluggers will hit into more double plays than speedsters.

They hit the ball harder, so the ball gets to the infielders a lot faster. And they obviously don't run as well, so they're easier to double up. But sluggers also are less likely to put the ball in play and less likely to hit a ground ball when they do put the ball in play.

So is our theory really true? Do power hitters hit into more double plays than weaker hitters who might be faster?

### Methodology

The approach I used is a pretty simple one. I figured the double play rate for every player season since 1954 that had more than 50 opportunities, where an opportunity is a plate appearance with a runner on first and fewer than two outs. This left me with 13,169 player seasons. For each of those player seasons, I determined isolated power (slugging percentage - batting average), isolated walk rate (on base average - batting average) and speed score. I calculated speed score the same way Fangraphs does, using stolen base percentage, stolen base frequency, triple rate and runs scored percentage.
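As a rough illustration of the per-season inputs described above, here is a minimal Python sketch. The `PlayerSeason` record and its field names are hypothetical stand-ins for whatever your Retrosheet-derived data looks like, and speed score is left out because it needs several more inputs.

```python
from dataclasses import dataclass

MIN_OPPORTUNITIES = 50  # the article keeps seasons with MORE than 50 opportunities


@dataclass
class PlayerSeason:
    # Hypothetical field names; map them onto your own data source.
    gidp: int           # double plays grounded into
    opportunities: int  # PAs with a runner on first and fewer than two outs
    avg: float          # batting average
    obp: float          # on base average
    slg: float          # slugging percentage


def rate_stats(ps: PlayerSeason):
    """Return (gidp_rate, iso_p, iso_w), or None if the season is below the cutoff."""
    if ps.opportunities <= MIN_OPPORTUNITIES:
        return None
    gidp_rate = ps.gidp / ps.opportunities
    iso_p = ps.slg - ps.avg  # isolated power
    iso_w = ps.obp - ps.avg  # isolated walk rate
    return gidp_rate, iso_p, iso_w
```

A season with exactly 50 opportunities is dropped here, since the cutoff is "more than 50."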

Using isolated power (isoP), isolated walk rate (isoW), speed score and a dummy variable for handedness (0 for right handed, 1 for left handed and .5 for switch hitters) as my independent variables, I ran a linear regression against double play rate.
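For readers who want to try something similar, the setup can be sketched in plain Python. This is not the code behind the article: it assumes a fit with no intercept term (the published equation has no constant), it solves the normal equations directly rather than using a statistics package, and the helper names are made up for illustration.

```python
HAND_CODE = {"R": 0.0, "B": 0.5, "L": 1.0}  # "B" = switch hitter


def predictors(iso_p, iso_w, speed_score, bats):
    """Build one row of independent variables for a player season."""
    return [iso_p, iso_w, speed_score, HAND_CODE[bats]]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients for y ~ rows, with no intercept (an assumption)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)
```

Calling `fit_ols(rows, rates)` with one `predictors(...)` row per player season returns four coefficients in the order isoP, isoW, speed score, handedness. On real data a statistics package is the better choice, since it also reports significance and fit quality.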

This approach has some possible concerns though. The most problematic is that handedness, isoP, isoW and speed score might not be completely independent.

In fact, only two of the relationships are correlated even somewhat strongly. Isolated power and isolated walk rate have a 0.31 correlation (where a number closer to one or negative one represents a stronger relationship), while batting hand and isolated walk rate have a 0.15 correlation. Apparently lefties have a higher isolated walk rate than do righties.

The full set of correlations is:

| Pair | Correlation |
| --- | --- |
| isoP / isoW | 0.31 |
| isoP / speed score | -0.02 |
| isoW / speed score | -0.1 |
| handedness / speed score | 0.06 |
| handedness / isoP | 0.02 |
| handedness / isoW | 0.15 |
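Pairwise figures like these are presumably ordinary Pearson correlations (the article doesn't say which measure was used, so treat that as an assumption). A minimal sketch of the calculation for two equal-length lists of player-season values:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

Feeding it the isoP and isoW columns, for example, would give the isoP / isoW entry.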

Also, I'd really like to include ground ball/fly ball ratio, but I don't have reliable data for the entire Retrosheet era. Finally, it's entirely possible that the relationships between skills have changed over time, and we might be better off looking at smaller time periods to find the proper equation.

### Results

The equation to predict double play rate, or the chance of a batter grounding into a double play when faced with at least a runner on first and fewer than two outs, according to my regression is:

gidpRate = 0.215 * isoP + 0.529 * isoW + 0.009 * speed score - 0.015 * (0 if batter is right handed, 0.5 if batter is a switch hitter, 1 if batter is left handed).
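To make the equation concrete, here is a small Python sketch that plugs values into it. The coefficients come straight from the formula above. The function names are hypothetical, and turning a rate into an expected double play count by multiplying by opportunities is my assumption about how such a count would be derived.

```python
HAND_CODE = {"R": 0.0, "B": 0.5, "L": 1.0}  # "B" = switch hitter


def expected_gidp_rate(iso_p, iso_w, speed_score, bats):
    """Apply the published regression equation to one player season."""
    return (0.215 * iso_p
            + 0.529 * iso_w
            + 0.009 * speed_score
            - 0.015 * HAND_CODE[bats])


def expected_dps(rate, opportunities):
    """Expected double plays, assuming count = rate * opportunities."""
    return rate * opportunities


# Made-up inputs for illustration only (not a real player season).
rate = expected_gidp_rate(iso_p=0.200, iso_w=0.080, speed_score=5.0, bats="R")
print(round(rate, 3), round(expected_dps(rate, 150), 1))
```

Switching the same made-up hitter to left-handed lowers the predicted rate by 0.015, which is the whole effect of the handedness term.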

All the variables are significant at the 99 percent confidence level, and the entire formula describes roughly 76 percent of the variation in double play rate.

Interestingly enough, power, batting eye and speed all contribute positively to the rate of double plays. Being left handed is about the only advantage. Of course, since you rarely find players who are fast power hitters with good eyes, the formula probably doesn't do a good job of capturing the real interaction between those skill sets.

Now that the hard work is out of the way, let's look at the fun stuff.

Most expected double plays

| Player | Season | Expected DPs |
| --- | --- | --- |
| Barry Bonds | 2001 | 37 |
| Jeff Bagwell | 1999 | 36 |
| Barry Bonds | 1998 | 35 |
| Jeff Bagwell | 1996 | 33 |
| Jeff Bagwell | 2000 | 32 |
| Willie Mays | 1962 | 32 |
| Jeff Bagwell | 2001 | 32 |
| Sammy Sosa | 2001 | 31 |
| Alex Rodriguez | 2007 | 31 |
| Mark McGwire | 1998 | 31 |
| Larry Walker | 1997 | 31 |

Wow, that's definitely the Jeff Bagwell list. Basically, the better a hitter you are, the more double plays you're expected to hit into. This causes me some concern. More on that later.

Best at avoiding the double play

| Player | Season | Expected DPs | Actual DPs | Delta |
| --- | --- | --- | --- | --- |
| Barry Bonds | 2001 | 37 | 5 | 32 |
| Joe Morgan | 1976 | 28 | 2 | 25 |
| Sammy Sosa | 2001 | 31 | 6 | 25 |
| Joe Morgan | 1975 | 27 | 3 | 24 |
| Mickey Mantle | 1955 | 28 | 4 | 24 |
| Jimmy Wynn | 1969 | 29 | 5 | 24 |
| Mickey Mantle | 1961 | 25 | 2 | 23 |
| Barry Bonds | 2004 | 28 | 5 | 23 |
| Mickey Mantle | 1956 | 27 | 4 | 23 |
| Barry Bonds | 2002 | 27 | 4 | 23 |

Not surprisingly, there are a lot of the players we expected the formula to handle poorly: those who have good eyes, good power and fairly good speed. At this point, I'm thinking I probably should have included contact rate in this regression as well, although that will likely raise problems because of its relationship to isolated power.

Now the worst seasons:

Worst at avoiding the double play

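| Player | Season | Expected DPs | Actual DPs | Delta |
| --- | --- | --- | --- | --- |
| Brad Ausmus | 2002 | 9 | 30 | -21 |
| John Bateman | 1971 | 7 | 27 | -20 |
| A.J. Pierzynski | 2004 | 7 | 27 | -20 |
| Jerry Adair | 1969 | 5 | 24 | -19 |
| Miguel Tejada | 2008 | 13 | 32 | -19 |
| Paul Konerko | 2003 | 9 | 28 | -19 |
| Tony Armas | 1983 | 12 | 31 | -19 |
| Sean Casey | 2005 | 9 | 27 | -18 |
| Ken Reitz | 1976 | 6 | 24 | -18 |
| Al Oliver | 1984 | 5 | 23 | -18 |
| Ted Simmons | 1973 | 11 | 29 | -18 |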

Looking at this list, I'm thinking speed isn't being considered enough. Most of the players who miss on the low end are quite slow. Perhaps speed score isn't the best way to estimate the speed of a player. It definitely doesn't seem to be normally distributed, which might cause problems when using it as part of a regression.

### The wrapup

I've got a lot of misgivings about the usefulness of these results. I think as they stand right now, they mostly prove the hypothesis that the stereotypical beer league softball player (a.k.a. the Moneyball player) is expected to hit into more double plays than the weaker hitting speedster.

The regression formula seems to go too far though. Those hitters who best combined speed, power and batting eye are predicted to hit into the most double plays. The top hitters in the game dominate both the list of most expected double plays, and the list of best at avoiding expected double plays. I'm thinking the regression equation misses most at the extremes, which calls into question its entire applicability.

At this point, I'm not sure what it's really useful for, besides being a potentially interesting piece of data. If we're trying to predict future double play rate, then we might be better off using the next season's double play rate as our dependent variable. If we're attempting to decide whether it makes sense to intentionally walk the current batter because we think the next one might hit into a double play, this calculation might help, but perhaps not as much as looking at his actual double play rate. It's not a value measure, so it can't be used looking backwards.

Perhaps the best we can hope for is that it sheds some light on the interaction between batter skills and that strength at the plate overcomes any negative that stems from grounding into double plays. Future work on the topic can better account for ground ball / fly ball tendencies and contact rate, as well as speed of the runners on the bases, which may provide a more accurate picture of how double plays really unfold.

Dan Turkenkopf is a Yankees fan who spends way too much time poring over baseball statistics (at least according to his wife). He also writes for Beyond the Box Score and can be reached by email.