In the back of The Hardball Times Season Preview 2008, I have a little essay about projecting a player’s career statistics using a fairly simple toy I developed. The toy also allows me to make probabilistic predictions—for example, predicting a player’s odds of reaching 3,000 hits.
In the article, I presented a table of the 10 players with the best odds of reaching 3,000 hits according to my system, among those whose seasonal age in 2008 will be between 21 and 41. For posterity, I’ll reproduce that table here:
Name                3,000-hit odds
Derek Jeter         42.1%
Alex Rodriguez      36.1%
Delmon Young        32.4%
Miguel Cabrera      30.1%
Ryan Zimmerman      27.4%
Ivan Rodriguez      27.4%
Jose Reyes          26.2%
Ken Griffey         23.5%
David Wright        23.3%
Hanley Ramirez      21.3%
A few things in this table stand out, but what surprised me most was the number of young hitters on this list. Here’s what I said in my Season Preview article:
I’ll admit that the odds for some of the younger players, Young and Zimmerman in particular, look really high. Young has one full major league season under his belt, Zimmerman has two. Neither has even broken the 190-hit mark yet. So how come our system says that the odds of one of the two gathering 3000 hits are better than the odds that neither does? Once more, it all comes down to two effects: The high multipliers for extremely young players and the high error bars associated with those kinds of projections. Young was just 21 years old in 2007; Zimmerman was 22. We end up projecting that both will end up with around 2300 hits and the huge standard errors mean that their odds of ending up at 3000 are pretty good.
The Favorite Toy sees things very differently. It tells us that Zimmerman has around an 11% chance at 3000 hits and places Young’s odds at zero. On the other hand, it gives Jeter an 86% chance at hitting 3000 and Rodriguez a 79% shot. Overall, both systems see about 10 active players who were between the ages of 20 and 40 in ’07 ending up with 3000 hits, an encouraging sign since research has shown that the Favorite Toy does a pretty good job at projecting how many players will end up with 3000 hits (though I will admit that number sounds high to me).
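The quoted claim that one of Young or Zimmerman is more likely than not to reach 3,000 hits follows from the table’s odds if, as a simplifying assumption, the two careers are treated as independent. A minimal Python sketch of that arithmetic (the helper name is just for illustration):

```python
def prob_at_least_one(probabilities):
    """Chance that at least one of several independent events occurs."""
    prob_none = 1.0
    for p in probabilities:
        prob_none *= 1.0 - p  # this player does NOT reach 3,000
    return 1.0 - prob_none

# Odds from the table above (Delmon Young, Ryan Zimmerman).
young, zimmerman = 0.324, 0.274
p_one = prob_at_least_one([young, zimmerman])
print(f"At least one: {p_one:.3f}, neither: {1 - p_one:.3f}")
```

The chance that neither gets there is 0.676 × 0.726, a bit under one half, so "at least one of them" comes out slightly ahead.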
Others, too, have commented that the numbers for some of these young players look unrealistically high. Tom Tango, author of The Book: Playing the Percentages in Baseball and occasional contributor to THT, put it bluntly: “It’s easy enough to test: Look for all guys like Zimmerman, and see if 20% of them reached 3,000 hits. My guess is that the answer will be: no.”
Well, let’s start with that. Our system tells us that Zimmerman has a 27.4 percent chance of reaching 3,000 hits. Since World War II, there have been ten 22-year-old hitters with between a 25 and 30 percent chance of reaching 3,000 hits according to my system, averaging, you guessed it, a 27.4 percent shot.
Altogether, these guys are very similar to Zimmerman. How many of them ended up with at least 3,000 career hits? Five, or 50 percent. In other words, based on this small test, the projection for Zimmerman looks not only reasonable but low!
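How surprising is five out of ten? A rough way to gauge it, assuming (as a simplification) that each of the ten players had exactly a 27.4 percent chance and that their careers were independent, is a binomial tail probability. This is an illustrative check, not part of the original analysis:

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n_players, p_each, observed = 10, 0.274, 5
expected = n_players * p_each  # about 2.7 players expected to reach 3,000
tail = binomial_tail(n_players, p_each, observed)
print(f"Expected {expected:.2f} of {n_players}; "
      f"P(at least {observed}) = {tail:.3f}")
```

A tail probability of this size shows why a 10-player sample can hint at, but not settle, whether the projection runs low.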
Okay, but that’s only 10 players. What if we expand our sample? What if we look at all players—do the results still hold?
First, let’s take a quick look at all hitters, young and old, who debuted after World War II but have since retired. Let’s project their career statistics at each age between 21 and 41 and see if our predictions match what actually happened. (Keep in mind that each player-season is counted separately, so Hank Aaron, for example, is counted as a 3,000-hit player 21 different times.)
Counted this way, we end up with 332 player-seasons belonging to hitters who reached the 3,000-hit plateau. And how many does our system expect? Three hundred thirty-eight, an almost perfect match. Bingo.
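The check above boils down to comparing two totals: the sum of the projected probabilities across all player-seasons (the expected number of 3,000-hit outcomes) and the number of player-seasons where the player actually got there. A minimal sketch, using a hypothetical record layout and made-up values rather than the article’s data:

```python
from dataclasses import dataclass

@dataclass
class PlayerSeason:
    # Hypothetical layout: one row per player per age, so a long
    # career contributes many rows (as with Hank Aaron above).
    player: str
    age: int
    projected_prob: float  # projected chance of reaching 3,000 hits
    reached_3000: bool     # what actually happened in his career

def calibration(records):
    """Return (expected, actual) counts of 3,000-hit outcomes."""
    expected = sum(r.projected_prob for r in records)
    actual = sum(1 for r in records if r.reached_3000)
    return expected, actual

# Made-up illustration, not real projections.
records = [
    PlayerSeason("Player A", 21, 0.30, True),
    PlayerSeason("Player A", 22, 0.45, True),
    PlayerSeason("Player B", 21, 0.20, False),
    PlayerSeason("Player B", 22, 0.05, False),
]
expected, actual = calibration(records)
print(f"expected {expected:.2f}, actual {actual}")
```

If the projections are well calibrated, the two totals should land close together over a large sample, as the 338 and 332 above do.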
Now that we see that our system works on an overall level, let’s move on and look at the younger players. Looking at all players who were playing in the major leagues at 20 years of age (and, as usual, who debuted after World War II and are now retired), the system projects that 12 would eventually reach 3,000 hits. Eight actually did. At 21, it projects that 16 guys would get 3,000 hits, and 13 did. At 22? 19 predicted, 15 actual. At 23? 22 and 16.
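The four age groups can be set side by side. The sketch below uses only the predicted and actual counts quoted above. The z-score is a rough check that assumes, as an approximation rather than the article’s stated method, that each age’s count is roughly Poisson with mean equal to the predicted count. The same players show up at several ages, so the rows are not independent and should not be pooled into a single test.

```python
from math import sqrt

# (age, predicted 3,000-hit players, actual) as quoted in the text.
by_age = [(20, 12, 8), (21, 16, 13), (22, 19, 15), (23, 22, 16)]

rows = []
for age, predicted, actual in by_age:
    ratio = actual / predicted
    # Rough Poisson approximation: variance equal to the predicted count.
    z = (actual - predicted) / sqrt(predicted)
    rows.append((age, ratio, z))
    print(f"age {age}: predicted {predicted}, actual {actual}, "
          f"ratio {ratio:.2f}, z {z:+.2f}")
```

Each age comes in below its prediction, but none of the individual gaps is large relative to its rough standard error.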
Overall, there does appear to be a bit of over-prediction, though statistically the difference is not significant. Nonetheless, if you want to downgrade Zimmerman to 20 percent and Young to 25 percent, I won’t complain.
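One way to arrive at downgrades of roughly that size, not necessarily the one I had in mind, is to scale each projection by the ratio of actual to predicted 3,000-hit players across ages 20 to 23. Pooling the ages double-counts players who appear at several of them, so treat this as a back-of-the-envelope sketch:

```python
predicted_total = 12 + 16 + 19 + 22  # ages 20-23, from the text
actual_total = 8 + 13 + 15 + 16
shrink = actual_total / predicted_total  # overall actual-to-predicted ratio

adjusted = {
    name: odds * shrink
    for name, odds in [("Ryan Zimmerman", 0.274), ("Delmon Young", 0.324)]
}
for name, odds in adjusted.items():
    print(f"{name}: {odds:.1%}")
```

The scaled odds land near 20 percent for Zimmerman and 25 percent for Young.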
Those figures, 20 percent for Zimmerman and 25 percent for Young, are still much higher than what Bill James’ Favorite Toy projects. So maybe we should run one more test: What if my system strongly over-projects young hitters with high odds of making it to 3,000 hits, like Young and Zimmerman?
Well then, let’s run the same test as above but restrict ourselves to only the players at each age who were in the top 10 in projected odds of gathering 3,000 hits. Among 20-year-olds, my system thought that 4.7 of the top 10 players would reach 3,000 hits; only two did. At age 21, the prediction was 3.7, and the actual number was three. At 22, 3.5 and four; at 23, 3.5 and four again.
In total, there was only a very slight over-prediction, and that was probably due just to random chance.
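The restricted test keeps only the ten highest projections at each age and then compares expected and actual counts as before. A minimal sketch, assuming a hypothetical tuple layout for the records; the final tally uses only the published per-age numbers from the text:

```python
from collections import defaultdict

def top_n_per_age(records, n=10):
    """Keep the n highest projected probabilities at each age.

    records: iterable of (player, age, projected_prob, reached_3000)
    tuples, a hypothetical layout used here for illustration.
    """
    by_age = defaultdict(list)
    for rec in records:
        by_age[rec[1]].append(rec)
    kept = []
    for age in sorted(by_age):
        ranked = sorted(by_age[age], key=lambda rec: rec[2], reverse=True)
        kept.extend(ranked[:n])
    return kept

# Made-up records to show the selection step.
sample = [
    ("Player A", 20, 0.50, True),
    ("Player B", 20, 0.10, False),
    ("Player C", 21, 0.30, False),
]
kept = top_n_per_age(sample, n=1)

# Published top-10 summary from the text: (age, predicted, actual).
top10 = [(20, 4.7, 2), (21, 3.7, 3), (22, 3.5, 4), (23, 3.5, 4)]
predicted = sum(p for _, p, _ in top10)
actual = sum(a for _, _, a in top10)
print(f"top 10, ages 20-23: predicted {predicted:.1f}, actual {actual}")
```

Summed over the four ages, the predictions exceed the actual count by a couple of players, most of it from the 20-year-olds.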
In other words, there is little evidence that our system highly overrates young hitters, and good reason to believe that one of Young or Zimmerman will eventually reach 3,000 hits.