Retrosheet. A sweeter single word has never been uttered.

I’ve been having a lot of fun going through the fielding data that has been captured by Retrosheet. It’s not the subjective data, like trajectory or zone, but simply basic data: which pitcher threw the ball, which hitter put it in play, which fielder caught it, and in which park. This research resulted in a fielding system that I introduced in *The 2008 Hardball Times Annual* called **With Or Without You** (WOWY).

Analyzing baseball performance data is all about understanding the context—one look at Dante Bichette’s seasonal batting lines, and any baseball fan knows this to be true. What affects whether a shortstop will convert a batted ball into an out? The three most important parameters are the identity of the pitcher, the identity of the batter, and the park. There are other considerations, such as the game state (inning/score/bases/outs), the identity of the runner(s), the guys playing to the shortstop’s left and right, and the guy that he’s throwing the ball to. However, the first three have the greatest impact.

We can generalize the impact of the batter. When a right-handed batter is at the plate and the ball is in play (anywhere in the park), the shortstop makes the out 15% of the time. With a lefty batter, the out conversion rate is 9%. Clearly, the handedness of the batter matters a lot. Presuming that the shortstop will face a somewhat random collection of batters over the course of a season, let’s use the handedness of the batter as one of the three parameters, in place of the identity of the batter.

As it happens, we can also generalize the impact of the fielder. Today, I look at the data from the angle of the shortstop’s age.

**Matched Pairs**

In 1976, Robin Yount was the shortstop for pitcher Jerry Augustine on 233 occasions at County Stadium in Milwaukee when a righty batter put the ball in play (excluding home runs and bunts). Thirty of those times, the 21-year-old Yount made the out. In the following season, that exact same combination (Yount/Augustine/County Stadium/righty batter) was involved in 289 balls in play, of which Yount made the out 47 times. Prorated down to 233 balls in play (as in 1976), Yount would have made 38 outs (47 × 233 / 289 ≈ 37.9). The only substantial difference between the two seasons is that Yount was 21 years old when he made 30 outs, and 22 years old when he made (a prorated) 38 outs.

Let’s call these *matched pairs*: Every parameter is the same in both seasons except the age of the shortstop. All we need to do now is go through the Retrosheet years (1956-2006, excluding 1999) and repeat the work. There are 1,785 such matched pairs with the shortstop at ages 21 and 22. The total number of balls in play is 20,994, which is fairly substantial. Adding up the number of outs made by the shortstop, we have 2,721 outs at age 21 and (with these exact same shortstops, pitchers, parks and batter handedness, prorated as above) 2,750 outs at age 22. We can therefore conclude that the shortstops improved from age 21 to 22 by 29 extra outs on 20,994 balls in play. In a typical season with 4,000 balls in play, that translates to roughly five and a half more outs.
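The matching and prorating steps can be sketched in Python. This is a minimal sketch under stated assumptions: the `BallInPlay` record layout, its field names, and the helpers `matched_pair_totals` and `per_season` are hypothetical, not Retrosheet's actual file format and not necessarily how WOWY is implemented. It also assumes each shortstop carries a single age per season. A matched pair is one (shortstop, pitcher, park, batter hand) combination seen at two consecutive ages, with the second season's outs prorated to the first season's volume.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class BallInPlay(NamedTuple):
    """One ball in play, HRs and bunts already excluded (HYPOTHETICAL layout)."""
    shortstop: str
    ss_age: int      # assumed: one age per shortstop per season
    pitcher: str
    park: str
    bat_hand: str    # "R" or "L"
    ss_out: bool     # True if the shortstop converted it into an out


def matched_pair_totals(plays: Iterable[BallInPlay], age1: int) -> tuple[int, int, float]:
    """Total (BIP at age1, outs at age1, prorated outs at age1 + 1) over matched pairs."""
    # cells[(shortstop, pitcher, park, hand)][age] -> [balls in play, shortstop outs]
    cells = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for p in plays:
        cell = cells[(p.shortstop, p.pitcher, p.park, p.bat_hand)][p.ss_age]
        cell[0] += 1
        cell[1] += int(p.ss_out)

    bip_total, outs1_total, outs2_total = 0, 0, 0.0
    for by_age in cells.values():
        # A matched pair: the same combination observed at age1 and at age1 + 1.
        if age1 in by_age and age1 + 1 in by_age:
            bip1, outs1 = by_age[age1]
            bip2, outs2 = by_age[age1 + 1]
            bip_total += bip1
            outs1_total += outs1
            outs2_total += outs2 * bip1 / bip2  # prorate to the first season's volume
    return bip_total, outs1_total, outs2_total


def per_season(extra_outs: float, bip: int, season_bip: int = 4000) -> float:
    """Scale an out difference to a 'typical season' of season_bip balls in play."""
    return extra_outs / bip * season_bip


# Usage: rebuild the Yount/Augustine/County Stadium/righty-batter cell from the text.
yount = (
    [BallInPlay("Yount", 21, "Augustine", "County Stadium", "R", i < 30) for i in range(233)]
    + [BallInPlay("Yount", 22, "Augustine", "County Stadium", "R", i < 47) for i in range(289)]
)
bip, outs21, outs22 = matched_pair_totals(yount, 21)
print(bip, outs21, round(outs22))                # 233 30 38
print(round(per_season(2750 - 2721, 20994), 1))  # 5.5
```

Keying the cells by age rather than by season keeps the "same combination, next age" lookup to a single dictionary probe.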

If we repeat this process with a new set of players and parks for the age 22/23 matched pairs, we get 60,439 balls in play. In this case, there is almost no change: 7,889 outs at age 22 versus 7,895 outs at age 23. In effect, the performance of these shortstops with these pitchers at these parks was virtually the same at ages 22 and 23. We therefore conclude that there was no improvement in talent. At age 23/24 we also get exactly no improvement: 82,569 balls in play, with exactly 10,754 outs at each of ages 23 and 24.

Based on this study, the defensive peak for a shortstop is between the ages of 22 and 24. After that, the matched pairs at every age level show a decline. The average decline is nine plays per year (99 plays per 4,000 balls in play from age 24 to age 35), which roughly corresponds to seven runs per year.

Here’s the chart for the above data along with all the other age pairs:

| Age 1 | Age 2 | BIP | Outs, Yr 1 | Out Rate, Yr 1 | Outs, Yr 2 (prorated) | Out Rate, Yr 2 | DELTA | CHAIN |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 21 | 22 | 20,994 | 2,721 | 0.130 | 2,750 | 0.131 | 6 | -6 |
| 22 | 23 | 60,439 | 7,889 | 0.131 | 7,895 | 0.131 | 0 | 0 |
| 23 | 24 | 82,569 | 10,754 | 0.130 | 10,754 | 0.130 | 0 | 0 |
| 24 | 25 | 125,146 | 16,536 | 0.132 | 16,316 | 0.130 | -7 | 0 |
| 25 | 26 | 148,236 | 19,497 | 0.132 | 19,438 | 0.131 | -2 | -7 |
| 26 | 27 | 157,718 | 20,691 | 0.131 | 20,466 | 0.130 | -6 | -9 |
| 27 | 28 | 137,983 | 18,268 | 0.132 | 18,002 | 0.130 | -8 | -14 |
| 28 | 29 | 127,548 | 16,883 | 0.132 | 16,298 | 0.128 | -18 | -22 |
| 29 | 30 | 105,921 | 13,998 | 0.132 | 13,702 | 0.129 | -11 | -40 |
| 30 | 31 | 87,081 | 11,450 | 0.131 | 11,099 | 0.127 | -16 | -52 |
| 31 | 32 | 72,144 | 9,495 | 0.132 | 9,276 | 0.129 | -12 | -68 |
| 32 | 33 | 64,004 | 8,376 | 0.131 | 8,287 | 0.129 | -6 | -80 |
| 33 | 34 | 36,877 | 4,967 | 0.135 | 4,950 | 0.134 | -2 | -85 |
| 34 | 35 | 24,071 | 3,238 | 0.135 | 3,168 | 0.132 | -12 | -87 |
| 35 | | | | | | | | -99 |

DELTA is the change in outs per 4,000 balls in play from the first age to the second. CHAIN is the running total of DELTA, listed at the first age in each row. Start CHAIN at -6 outs per 4,000 balls in play at age 21. Then add the DELTA in the first row (+6) to that CHAIN (-6), and put the total (0) as the CHAIN at age 22. Repeat for every row.
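The DELTA and CHAIN arithmetic can be checked with a short Python sketch. The rows are copied from the table above (age, balls in play, outs at the first age, prorated outs at the next age), and the 4,000-ball season follows the text. One assumption is mine: the running total is shifted so that the best age sits at zero, which is how I read the choice of -6 as the starting value at age 21. With that shift, the sketch reproduces every published DELTA and CHAIN value after rounding.

```python
# (age1, balls in play, outs at age1, prorated outs at age1 + 1), from the table above
ROWS: list[tuple[int, int, int, int]] = [
    (21, 20994, 2721, 2750), (22, 60439, 7889, 7895),
    (23, 82569, 10754, 10754), (24, 125146, 16536, 16316),
    (25, 148236, 19497, 19438), (26, 157718, 20691, 20466),
    (27, 137983, 18268, 18002), (28, 127548, 16883, 16298),
    (29, 105921, 13998, 13702), (30, 87081, 11450, 11099),
    (31, 72144, 9495, 9276), (32, 64004, 8376, 8287),
    (33, 36877, 4967, 4950), (34, 24071, 3238, 3168),
]
SEASON_BIP = 4000  # "typical season" used to scale each difference


def deltas(rows):
    """Unrounded change in outs per SEASON_BIP balls in play for each age pair."""
    return [(age1, (outs2 - outs1) / bip * SEASON_BIP) for age1, bip, outs1, outs2 in rows]


def chain(rows):
    """Running total of DELTA by age, shifted so the best age sits at zero (ASSUMED)."""
    running = {rows[0][0]: 0.0}  # youngest age starts at zero before the shift
    for age1, d in deltas(rows):
        running[age1 + 1] = running[age1] + d
    peak = max(running.values())
    return {age: total - peak for age, total in running.items()}


DELTA = {age1: round(d) for age1, d in deltas(ROWS)}
CHAIN = {age: round(c) for age, c in chain(ROWS).items()}
print(DELTA)
print(CHAIN)
```

The shift only moves the curve up or down; it does not change the shape of the aging curve, so the peak ages and slopes are the same with or without it.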

What this chart shows is that there’s a defensive peak for shortstops from ages 22 to 24, and then a long progression downward as the shortstop ages.

**Selective Sampling**

Now, this is an enormous plummet. Our own expectation was a peak in the mid-20s, followed by a gentle drop of three plays or so every year; instead, we’re getting a drop of nine plays per year.

The steepness is due to selective sampling: The only shortstops who survive the study are those good enough (or thought to have been good enough) to play in back-to-back years. And a player who performs above the population average has likely benefited from some good fortune. Consider that the league-average out rate is around 12.5%, but, for every age pair in the chart, the first-year out rate for these shortstops is at least 13.0%. In essence, that 13.0% contains a tinge of luck. And that luck needs to be extracted.
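To see how selection alone can manufacture a decline, here is a toy Python simulation in which nobody ages at all: each shortstop's true talent is identical in both seasons. Only the 12.5% league rate and the 4,000-ball season come from the text. The talent spread, the 13.0% survival cutoff, and the normal approximation to binomial noise are assumptions chosen for illustration, not estimates from Retrosheet. Selecting on an individual year-1 cutoff is also a crude stand-in for "good enough to keep the job."

```python
import random
import statistics

SEASON_BIP = 4000    # typical season of balls in play, as in the text
LEAGUE_RATE = 0.125  # league-average shortstop out rate quoted in the text
TALENT_SD = 0.005    # ASSUMED spread of true talent across shortstops
CUTOFF = 0.130       # ASSUMED: only shortstops at or above this rate keep their jobs


def observed_rate(talent: float, rng: random.Random) -> float:
    """One season's out rate: true talent plus binomial noise (normal approximation)."""
    noise_sd = (talent * (1 - talent) / SEASON_BIP) ** 0.5
    return talent + rng.gauss(0.0, noise_sd)


def simulate(n_players: int = 5000, seed: int = 1) -> tuple[float, float]:
    """Mean year-1 and year-2 out rates for the shortstops who survive the cutoff."""
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_players):
        talent = rng.gauss(LEAGUE_RATE, TALENT_SD)  # no aging: talent is fixed
        first = observed_rate(talent, rng)
        if first >= CUTOFF:                         # selection on year-1 results
            year1.append(first)
            year2.append(observed_rate(talent, rng))
    return statistics.mean(year1), statistics.mean(year2)


yr1, yr2 = simulate()
print(f"year 1: {yr1:.4f}  year 2: {yr2:.4f}  "
      f"change per {SEASON_BIP} BIP: {(yr2 - yr1) * SEASON_BIP:+.1f} outs")
```

The survivors' year-2 rate comes out well below their year-1 rate even though no one got worse. The size of that drop depends entirely on the assumed `TALENT_SD` and `CUTOFF`; only its direction carries over to the real data.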

Just as an illustration, let’s say that we need to remove 0.15 percentage points from the first-year out rate (so 13.0% becomes 12.85%) to establish the group’s true talent level. (The 13.0% is a sample performance rate, a mix of true talent plus random variation.) Here’s how that new chart looks:

| Age 1 | Age 2 | BIP | Outs, Yr 1 | Out Rate, Yr 1 (regressed) | Outs, Yr 2 (prorated) | Out Rate, Yr 2 | DELTA | CHAIN |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 21 | 22 | 20,994 | 2,721 | 0.128 | 2,750 | 0.131 | 12 | -28 |
| 22 | 23 | 60,439 | 7,889 | 0.129 | 7,895 | 0.131 | 6 | -16 |
| 23 | 24 | 82,569 | 10,754 | 0.129 | 10,754 | 0.130 | 6 | -10 |
| 24 | 25 | 125,146 | 16,536 | 0.131 | 16,316 | 0.130 | -1 | -4 |
| 25 | 26 | 148,236 | 19,497 | 0.130 | 19,438 | 0.131 | 4 | -5 |
| 26 | 27 | 157,718 | 20,691 | 0.130 | 20,466 | 0.130 | 0 | 0 |
| 27 | 28 | 137,983 | 18,268 | 0.131 | 18,002 | 0.130 | -2 | 0 |
| 28 | 29 | 127,548 | 16,883 | 0.131 | 16,298 | 0.128 | -12 | -2 |
| 29 | 30 | 105,921 | 13,998 | 0.131 | 13,702 | 0.129 | -5 | -14 |
| 30 | 31 | 87,081 | 11,450 | 0.130 | 11,099 | 0.127 | -10 | -19 |
| 31 | 32 | 72,144 | 9,495 | 0.130 | 9,276 | 0.129 | -6 | -29 |
| 32 | 33 | 64,004 | 8,376 | 0.129 | 8,287 | 0.129 | 0 | -35 |
| 33 | 34 | 36,877 | 4,967 | 0.133 | 4,950 | 0.134 | 4 | -35 |
| 34 | 35 | 24,071 | 3,238 | 0.133 | 3,168 | 0.132 | -6 | -31 |
| 35 | | | | | | | | -36 |

Note that the rate column for year 1 is marked as “regressed” to avoid confusion. Now, the CHAIN column looks much more reasonable: The peak is from ages 24 to 28, and then the drop is only five plays per year. In the three years leading up to age 24, the gain averages eight plays per year.
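Why does such a small adjustment change the picture so much? Reading the 0.15% as 0.15 percentage points ($r = 0.0015$) of out rate, the adjustment adds the same amount to every DELTA. In the notation below (introduced here for illustration), $N_a$ is the first-season balls in play for the pair starting at age $a$, $O_a$ is the outs at age $a$, and $O_{a+1}$ is the prorated outs at age $a+1$:

$$
\Delta_a = 4000\left(\frac{O_{a+1}}{N_a} - \frac{O_a}{N_a}\right)
$$

$$
\Delta_a^{\text{reg}} = 4000\left(\frac{O_{a+1}}{N_a} - \left(\frac{O_a}{N_a} - r\right)\right) = \Delta_a + 4000\,r = \Delta_a + 6
$$

So every DELTA in the regressed chart is six outs higher than in the unregressed chart (for example, -18 becomes -12 at ages 28 to 29). Because CHAIN is a running total, that constant shift tilts the whole curve upward by six outs per year, which pushes the peak later. The value 0.0015 is the illustrative figure above, not a fitted regression amount.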

The degree of regression will establish the peak age and the slope toward the peak age. I tried different regression values, and the peak never came out later than age 28. So, that is one conclusion we can make: On average, shortstop fielding prowess peaks no later than age 28. Recall that, in the first (unregressed) table, the peak age was around 23. So, the true answer lies somewhere between these points (without regression and with maximum regression). The second chart above seems to satisfy this condition.

As for the slope toward the peak, that’s another issue. I can adjust the regression so that we have very little slope (only one fewer out per year), whereas with no regression, the slope was nine outs per year. What is the real answer? I don’t know yet. But, my guess is that the second chart above will likely be very close to the final best answer.

*Next up: aging curves for all the other positions.*
