Predicting double play rate
by Dan Turkenkopf, July 16, 2009
It seems pretty intuitive, right? Big, slow sluggers will hit into more double plays than speedsters.
They hit the ball harder, so the ball gets to the infielders a lot faster. And they obviously don't run as well, so they're easier to double up. But sluggers also are less likely to put the ball in play and less likely to hit a ground ball when they do put the ball in play.
So is our theory really true? Do power hitters hit into more double plays than weaker hitters who might be faster?
Methodology
The approach I used is a pretty simple one. I figured the double play rate for every player season since 1954 that had more than 50 opportunities, where an opportunity is a plate appearance with a runner on first and fewer than two outs. This left me with 13,169 player seasons. For each of those player seasons, I determined isolated power (slugging percentage - batting average), isolated walk rate (on base average - batting average) and speed score. I calculated speed score the same way Fangraphs does, using stolen base percentage, stolen base frequency, triple rate and runs scored percentage.
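As a rough sketch of the bookkeeping described above, the derived rates for one player season could be computed like this. The `PlayerSeason` record and its field names are hypothetical (they are not Retrosheet's), and speed score is left out because its component weights aren't given here.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    # Hypothetical record layout; field names are illustrative only.
    avg: float             # batting average
    obp: float             # on base average
    slg: float             # slugging percentage
    dp_opportunities: int  # PAs with a runner on first and fewer than two outs
    gidp: int              # double plays grounded into


MIN_OPPORTUNITIES = 50  # seasons need more than 50 opportunities to qualify


def iso_power(season: PlayerSeason) -> float:
    """Isolated power: slugging percentage minus batting average."""
    return season.slg - season.avg


def iso_walk(season: PlayerSeason) -> float:
    """Isolated walk rate: on base average minus batting average."""
    return season.obp - season.avg


def dp_rate(season: PlayerSeason) -> float:
    """Double plays per opportunity."""
    return season.gidp / season.dp_opportunities


def qualifies(season: PlayerSeason) -> bool:
    """Keep only seasons with more than the minimum number of opportunities."""
    return season.dp_opportunities > MIN_OPPORTUNITIES


# Made-up season, not a real player.
sample = PlayerSeason(avg=0.300, obp=0.400, slg=0.500, dp_opportunities=100, gidp=12)
```

For the made-up `sample` season, isolated power is .200, isolated walk rate is .100 and the double play rate is 12 percent.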
Using isolated power (isoP), isolated walk rate (isoW), speed score and a dummy variable for handedness (0 for right handed, 1 for left handed and .5 for switch hitters) as my independent variables, I ran a linear regression against double play rate.
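To make the regression step concrete, here is a minimal least-squares sketch in plain Python. It is not the tool actually used for this article; `fit_least_squares` is a hypothetical helper that solves the normal equations directly, and it fits without an intercept only because the equation reported in the results shows none, which is an assumption about how the regression was run. The input rows are made up, not real player seasons.

```python
def fit_least_squares(rows, targets):
    """Return coefficients b minimizing sum((y - x.b)^2), with no intercept."""
    k = len(rows[0])
    # Build the augmented normal equations [X^T X | X^T y].
    aug = []
    for i in range(k):
        xtx_row = [sum(row[i] * row[j] for row in rows) for j in range(k)]
        xty = sum(row[i] * y for row, y in zip(rows, targets))
        aug.append(xtx_row + [xty])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Made-up rows of (isoP, isoW, speed score, handedness); not real data.
rows = [
    (0.150, 0.070, 5.0, 0.0),
    (0.220, 0.090, 3.5, 1.0),
    (0.090, 0.050, 7.0, 0.5),
    (0.180, 0.110, 4.0, 0.0),
    (0.250, 0.060, 6.0, 1.0),
]
# Generate targets from known coefficients so the fit can be checked.
known_coefs = [0.2, 0.5, 0.01, -0.02]
targets = [sum(c * x for c, x in zip(known_coefs, row)) for row in rows]

coefs = fit_least_squares(rows, targets)
```

Because the targets are built from `known_coefs` with no noise, the fit should recover those coefficients almost exactly. With real data, a statistics package would also report significance and goodness of fit, which this sketch does not.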
This approach has some possible concerns though. The most problematic is that handedness, isoP, isoW and speed score might not be completely independent.
In fact, only two of the relationships are correlated even somewhat strongly. Isolated power and isolated walk rate have a 0.31 correlation (where a number closer to one or negative one represents a stronger relationship), while batting hand and isolated walk rate have a 0.15 correlation. Apparently lefties have a higher isolated walk rate than do righties.
The full set of correlations is:
| Pair | Correlation |
|---|---|
| isoP / isoW | 0.31 |
| isoP / speed score | -0.02 |
| isoW / speed score | -0.10 |
| handedness / speed score | 0.06 |
| handedness / isoP | 0.02 |
| handedness / isoW | 0.15 |
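For readers who want to run the same independence check on their own data, a pairwise correlation of this kind can be sketched as follows. This assumes the figures above are ordinary Pearson correlation coefficients, which isn't stated explicitly; `pearson` is an illustrative helper name.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

A result near 1 or -1 means the two predictors move together closely. Values near 0, like most of the pairs in the table, suggest they carry largely separate information.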
Also, I'd really like to include ground ball/fly ball ratio but I don't have reliable data for the entire Retrosheet era. Finally, it's entirely possible that the relationships between skills have changed over time, and we might be better served by looking at smaller time periods to find the proper equation.
Results
The equation to predict double play rate, or the chance of a batter grounding into a double play when faced with at least a runner on first and fewer than two outs, according to my regression is:
gidpRate = 0.215 * isoP + 0.529 * isoW + 0.009 * speed score - 0.015 * (0 if batter is right handed, 0.5 if batter is a switch hitter, 1 if batter is left handed).
All the variables are significant at the 99 percent confidence level, and the entire formula describes roughly 76 percent of the variation in double play rate.
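The equation can be applied directly. In the sketch below the coefficients come from the formula above, while the function names are illustrative. Two things are assumptions: that an expected double play count is the predicted rate times the number of opportunities, and that delta means expected minus actual.

```python
def predicted_gidp_rate(iso_p, iso_w, speed_score, handedness):
    """Predicted double play rate per opportunity.

    handedness: 0 for right handed, 0.5 for switch hitters, 1 for left handed.
    """
    return 0.215 * iso_p + 0.529 * iso_w + 0.009 * speed_score - 0.015 * handedness


def expected_double_plays(rate, opportunities):
    """Assumed conversion: predicted rate times number of opportunities."""
    return rate * opportunities


def delta(expected, actual):
    """Assumed sign convention: positive means fewer double plays than predicted."""
    return expected - actual


# Made-up right handed hitter: .200 isoP, .080 isoW, speed score of 5.0.
righty_rate = predicted_gidp_rate(0.200, 0.080, 5.0, 0.0)
# The same hitter batting left handed.
lefty_rate = predicted_gidp_rate(0.200, 0.080, 5.0, 1.0)
```

For these made-up inputs the predicted rate is about 13.0 percent for the right handed hitter and about 11.5 percent for the left handed one, so handedness is the only term pulling the prediction down.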
Interestingly enough, power, batting eye and speed all contribute positively to the rate of double plays. Being left handed is about the only advantage. Of course, since you rarely find players who are fast power hitters with good eyes, the formula probably doesn't do a good job of capturing the real interaction between those skill sets.
Now that the hard work is out of the way, let's look at the fun stuff.
Most expected double plays
| Player | Season | Expected DPs |
|---|---|---|
| Barry Bonds | 2001 | 37 |
| Jeff Bagwell | 1999 | 36 |
| Barry Bonds | 1998 | 35 |
| Jeff Bagwell | 1996 | 33 |
| Jeff Bagwell | 2000 | 32 |
| Willie Mays | 1962 | 32 |
| Jeff Bagwell | 2001 | 32 |
| Sammy Sosa | 2001 | 31 |
| Alex Rodriguez | 2007 | 31 |
| Mark McGwire | 1998 | 31 |
| Larry Walker | 1997 | 31 |
Wow, that's definitely the Jeff Bagwell list. Basically, the better a hitter you are, the more double plays you're expected to hit into. This causes me some concern. More on that later.
Best at avoiding the double play
| Player | Season | Expected DPs | Actual DPs | Delta |
|---|---|---|---|---|
| Barry Bonds | 2001 | 37 | 5 | 32 |
| Joe Morgan | 1976 | 28 | 2 | 25 |
| Sammy Sosa | 2001 | 31 | 6 | 25 |
| Joe Morgan | 1975 | 27 | 3 | 24 |
| Mickey Mantle | 1955 | 28 | 4 | 24 |
| Jimmy Wynn | 1969 | 29 | 5 | 24 |
| Mickey Mantle | 1961 | 25 | 2 | 23 |
| Barry Bonds | 2004 | 28 | 5 | 23 |
| Mickey Mantle | 1956 | 27 | 4 | 23 |
| Barry Bonds | 2002 | 27 | 4 | 23 |
Not surprisingly, there are a lot of the players we expected the formula to handle poorly: those who have good eyes, good power and fairly good speed. At this point, I'm thinking I probably should have included contact rate in this regression as well, although that will likely raise problems because of its relationship to isolated power.
Now the worst seasons:
Worst at avoiding the double play
| Player | Season | Expected DPs | Actual DPs | Delta |
|---|---|---|---|---|
| Brad Ausmus | 2002 | 9 | 30 | -21 |
| John Bateman | 1971 | 7 | 27 | -20 |
| A.J. Pierzynski | 2004 | 7 | 27 | -20 |
| Jerry Adair | 1969 | 5 | 24 | -19 |
| Miguel Tejada | 2008 | 13 | 32 | -19 |
| Paul Konerko | 2003 | 9 | 28 | -19 |
| Tony Armas | 1983 | 12 | 31 | -19 |
| Sean Casey | 2005 | 9 | 27 | -18 |
| Ken Reitz | 1976 | 6 | 24 | -18 |
| Al Oliver | 1984 | 5 | 23 | -18 |
| Ted Simmons | 1973 | 11 | 29 | -18 |
Looking at this list, I'm thinking speed isn't being considered enough. Most of the players who miss on the low end are quite slow. Perhaps speed score isn't the best way to estimate a player's speed. It definitely doesn't seem to follow a normal distribution, which might cause problems when using it as part of a regression.
The wrapup
I've got a lot of misgivings about the usefulness of these results. I think as they stand right now, they mostly prove the hypothesis that the stereotypical beer league softball player (a.k.a. the Moneyball player) is expected to hit into more double plays than the weaker hitting speedster.
The regression formula seems to go too far though. Those hitters who best combined speed, power and batting eye are predicted to hit into the most double plays. The top hitters in the game dominate both the list of most expected double plays, and the list of best at avoiding expected double plays. I'm thinking the regression equation misses most at the extremes, which calls into question its entire applicability.
At this point, I'm not sure what it's really useful for, besides being a potentially interesting piece of data. If we're trying to predict future double play rate, then we might be better off using the next season's double play rate as our dependent variable. If we're attempting to predict whether it makes sense to intentionally walk the current batter because we think the next one might hit into a double play, this calculation might help, but perhaps not as much as looking at his actual double play rate. It's not a value measure, so it can't be used looking backwards.
Perhaps the best we can hope for is that it sheds some light on the interaction between batter skills and that strength at the plate overcomes any negative that stems from grounding into double plays. Future work on the topic can better account for ground ball / fly ball tendencies and contact rate, as well as speed of the runners on the bases, which may provide a more accurate picture of how double plays really unfold.
Dan Turkenkopf is a Yankees fan who spends way too much time poring over baseball statistics (at least according to his wife). He also writes for Beyond the Box Score and can be reached by email.






 
I’m not an expert on regressions, but this model just doesn’t seem to make sense. How can speed have a positive coefficient? This would mean that, all other things being equal, slow runners are better at avoiding the double play than fast runners. How can that be true, all other things being equal?
I think you could make a good model with just three independent variables: GB% (the percentage of plate appearances that result in a ground ball), speed score and handedness.
You can get a pretty good measure of GB% for all the Retrosheet period by looking at outs (on balls in play) and whether an assist was credited on the play. There are a few unassisted groundouts, of course, and perhaps GB% on outs is not the same as on all BIP, but this will be a decent enough measure.
Since you are parsing the Retrosheet data, you might as well just consider how a batter actually batted (L or R) for a particular plate appearance instead of adding switch-hitting to the handedness variable.