A couple years ago, I created a system that predicted career statistics for major league players, debuting it in the *2008 THT Season Preview*. I published updated projections in last year’s *Season Preview*, but this year we’re moving the *Season Preview* to an online format (more about that very soon!), so I get to share the predictions with all of you this time around. The danger, of course, is that people take this too seriously, so please remember: We’re just trying to have some fun. The error margins around all of these predictions are huge, so please, take them all with a grain of salt.

Still, with that said, I think my system does a very nice job of predicting a player’s career numbers based on just a few simple inputs. It’s much like the Favorite Toy, but a little more rigorous, and as I argued in the *Season Preview 2008*, more accurate as well. Anyway, here’s a quick rundown of how the system works, taken straight from the pages of the *Season Preview 2008*:

First, I put together a database of all major league players who debuted after World War II and retired by 2006. I then found every string of three consecutive seasons for every hitter in my database and every string of two consecutive seasons for every pitcher (as a third season proved to be unnecessary). I grouped the players by age and found how many home runs, hits, or wins they had remaining in their career at a given age. Then I ran a regression trying to predict that number based on the number of hits, home runs, or wins they had in the previous few seasons. For very young or old players, only more recent seasons proved to be significant, so that’s all I used. Players younger than 20 or older than 40 in their most recent season were excluded because of the minuscule sample sizes.

Here’s an example of how the system works. A 28-year-old player’s remaining home runs can be projected by multiplying his home runs in the most recent season by 3.475; the year before that by 1.239; and the year before that by 0.939. So Adam Dunn, who will have a seasonal age of 28 in 2008 and has hit 40 home runs each of the past three seasons, is projected to hit 3.475*40 + 1.239*40 + 0.939*40 = 226 home runs in the rest of his career. With 238 career home runs already to his credit, the system projects that he will hit 464 career home runs. The Favorite Toy, meanwhile, predicts 526 home runs, while Frey’s system projects 437.
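The arithmetic in that example can be written out as a short sketch. The weights are the age-28 home run coefficients quoted above; the function and variable names are illustrative assumptions, not part of the original system.

```python
# Age-28 home run weights quoted in the article, most recent season first.
AGE_28_HR_WEIGHTS = (3.475, 1.239, 0.939)


def project_remaining(recent_totals, weights):
    """Weighted sum of recent season totals (most recent season first)."""
    if len(recent_totals) != len(weights):
        raise ValueError("need exactly one season total per weight")
    return sum(w * x for w, x in zip(weights, recent_totals))


# Adam Dunn entering 2008: 40 home runs in each of the past three seasons,
# with 238 career home runs already on the books.
dunn_remaining = project_remaining([40, 40, 40], AGE_28_HR_WEIGHTS)
dunn_career = 238 + dunn_remaining

print(round(dunn_remaining), round(dunn_career))  # 226 464
```

Other ages and other statistics would need their own weights, which are not listed here.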

But just projecting career home run totals is kind of boring. If Dunn ends up with 400 some-odd home runs, he will go down as a very good home run hitter, but nothing more: Dave Kingman with more walks. What we really want to know are Dunn’s odds of hitting some historic number, say 500 home runs or even 763. Luckily, statistics provide us with a simple way of estimating that, the standard error, which is what news organizations use to give you the margin of error associated with their polls.* In Dunn’s case, the standard error is around 111 home runs, meaning that he would have to finish only about a third of a standard deviation above his projection to hit 500 career home runs, but 2.7 standard deviations above it to hit 763. In plain English, that translates to a 38 percent chance of hitting 500 home runs, but less than a 4 percent chance of breaking Barry Bonds’ all-time record.

* For those who care about the statistical details: I could not use a general standard error because the estimates exhibited heteroskedasticity, which is an increase in error bars as the predictions get bigger. I tried a few well-known statistical methods for dealing with the issue, but it persisted, so I settled on a kind of ad-hoc solution, using regression analysis to project the standard errors at various prediction levels. So as the total number of home runs, hits, or wins remaining goes up, so do the error bars.
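As a rough sketch of how a projection and a standard error turn into milestone odds, the snippet below assumes the errors follow a normal (bell curve) distribution centered on the projection. That distributional assumption and the function name are mine; the article does not spell out the exact calculation.

```python
import math


def odds_of_reaching(milestone, projection, standard_error):
    """Upper-tail probability of a normal distribution centered on the projection.

    Assumes normally distributed errors, which is an illustrative assumption.
    """
    z = (milestone - projection) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))


# Dunn's example: projected 464 career home runs, standard error around 111.
p_500 = odds_of_reaching(500, 464, 111)
p_763 = odds_of_reaching(763, 464, 111)

print(f"500 HR: {p_500:.1%}, 763 HR: {p_763:.2%}")
```

Under these assumptions the 500-homer odds come out near 37 percent, in the neighborhood of the 38 percent quoted for Dunn, and the 763-homer odds come out well under 4 percent. Because the standard error grows with the size of the projection, the 111 used here applies only to Dunn's example.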

OK, so maybe that was not so quick. Anyway, I’m glad you’re still with me because now we can get to the fun part—projecting the pitchers and hitters likeliest to hit some of baseball’s biggest milestones. Let’s start by looking at the 3,000-hit mark:

| Name | Projected Hits | 3,000 Odds |
|---|---|---|
| Derek Jeter | 3197 | 68.7% |
| Alex Rodriguez | 2883 | 36.7% |
| Ivan Rodriguez | 2881 | 29.7% |
| Elvis Andrus | 2127 | 27.0% |
| Johnny Damon | 2762 | 24.7% |
| Justin Upton | 2077 | 21.8% |
| Miguel Cabrera | 2425 | 21.6% |
| Ken Griffey | 2863 | 20.9% |
| Albert Pujols | 2495 | 18.8% |
| Ryan Zimmerman | 2147 | 17.5% |

The first two years I did this exercise, no player registered better than even odds at hitting this mark, though both times, Jeter was very close. After a stellar 2009 season, Jeter now projects to achieve it handily, though I would bet that he has a better than 68.7 percent chance at doing so. Still, what the numbers reveal is just how difficult it is to project the future for older players. Sure, Jeter has been extremely durable for much of his career—still, at 36, it’s just a matter of time before his numbers and playing time begin to decline. In all likelihood, however, that time will come after he’s reached 3,000 hits.

Jeter is a couple years away from reaching 3,000 hits; Elvis Andrus, on the other hand, is going to need a couple decades. How can our system give him a 27 percent shot after just one season in the major leagues? Well, the key here is that this one season came at age 20, an extraordinarily rare accomplishment which implies a lot about a player: namely, that he is very, very good.

Andrus gathered 128 hits last year; here is a list of players who, since 1900, have collected between 118 and 138 hits at age 20: Whitey Witt, Jimmie Foxx, Cecil Travis, Willie Mays, Eddie Mathews, Hank Aaron, Roberto Clemente. Whether or not Andrus turns out to be as good as some of these players remains to be seen, but it’s clear that with just one season he has put himself in some very good company.

Now 3,000 hits is nice and all, but it’s no record—763 home runs, however, would be. Which players are most likely to pass Bonds in the record books? Let’s take a look:

| Name | Projected HR | 763 Odds |
|---|---|---|
| Prince Fielder | 513 | 21.8% |
| Alex Rodriguez | 667 | 14.2% |
| Evan Longoria | 379 | 8.0% |
| Justin Upton | 353 | 6.1% |
| Albert Pujols | 543 | 5.3% |
| Mark Reynolds | 375 | 4.6% |
| Ryan Braun | 372 | 3.5% |
| Miguel Cabrera | 437 | 3.1% |
| Adam Dunn | 488 | 3.1% |
| Ryan Howard | 426 | 2.3% |

Overall, the odds that one of these 10 players eventually passes Bonds are slightly better than even, and overall the odds that a currently active player sets the record are almost 3-in-4. Let’s focus on the likeliest contenders, though.
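One way to see where a "slightly better than even" figure can come from is to combine the ten individual odds, assuming each player's outcome is independent of the others. The independence assumption is mine and is only an approximation; the article does not say how the combined number was computed.

```python
import math

# 763-homer odds for the ten players in the table above, as fractions.
odds_763 = [0.218, 0.142, 0.080, 0.061, 0.053, 0.046, 0.035, 0.031, 0.031, 0.023]

# Assuming independence: P(at least one succeeds) = 1 - P(every one falls short).
p_none = math.prod(1 - p for p in odds_763)
p_at_least_one = 1 - p_none

print(f"{p_at_least_one:.1%}")
```

With these inputs the combined odds land a bit under 54 percent, consistent with "slightly better than even." Simply adding the ten percentages would overstate the answer, since it double-counts scenarios in which more than one player passes Bonds.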

The first question that you might have is how A-Rod doesn’t end up at the top of this list. The answer is simple: The system has no way of knowing that A-Rod has eight years remaining on his contract, and is likely to start for much of that time. Simply put, your average 34-year-old who has hit 30 and 35 home runs in the past two years is expected to tack on another 84 before he retires. That may sound low, but think about it: The most likely scenario is that this player slows down, plays a few more years, and retires. Maybe he hits 25 home runs this year, 22 the next, 20 the year after that, 10 the year after that playing part-time, and then seven more before finally moving on. But Rodriguez is no average 34-year-old.

The issue here is not so much that A-Rod is an exceptionally talented player as it is that he is very unlikely to retire until he’s 41 years old, and with his close to $30 million-a-year salaries, it’s extremely unlikely that he’ll get benched. So even if A-Rod’s injury troubles persist, and even if his skills start to go a bit, he will have every chance to keep hitting, and unless he completely falls off, that means that Rodriguez has a pretty decent chance at 763 home runs.

Still, Prince Fielder’s chances aren’t bad either. You can see that our system already believes he’ll reach the 500-home run mark, and it gives him a better than 1-in-5 shot at topping Bonds’ record as well. That’s what happens when you knock 160 home runs by age 25.

Now we have to be very careful, because Fielder’s projected career total is still a massive 250 home runs away from the record, but what the system is telling us is that there’s a lot of uncertainty in that projection. Could Fielder get injured tomorrow, never really recover, and fall short of 200 home runs? Sure. But remember, Fielder is also the youngest player ever to hit 50 home runs in a season. What’s to say he doesn’t keep up that pace for the next decade, age gracefully, and obliterate Bonds’ record? That’s not the likeliest scenario, but the fact is, **we just don’t know**, and ultimately that’s why the system ends up liking Fielder so much. (The same goes for Evan Longoria and Justin Upton, though obviously to a bit of a lesser extent.)

Now let’s turn our attention to the pitchers, and see if Randy Johnson will really be the last 300-game winner.

| Name | Projected Wins | 300 Odds |
|---|---|---|
| Andy Pettitte | 263 | 19.5% |
| CC Sabathia | 212 | 12.8% |
| Rick Porcello | 151 | 10.6% |
| Felix Hernandez | 167 | 10.1% |
| Roy Halladay | 199 | 7.2% |
| Justin Verlander | 146 | 6.0% |
| Jair Jurrjens | 123 | 5.4% |
| Josh Beckett | 166 | 5.4% |
| Mark Buehrle | 183 | 5.4% |
| Zack Greinke | 131 | 5.1% |

Here we can see both why so much has been written about the death of the 300-game winner and why that is unlikely to be the case. On the one hand, no current pitcher is particularly close to winning 300 games. Andy Pettitte is the closest, with 229 wins already to his name, but at 38 years of age, he still has just a 1-in-5 chance of winning 71 more games.

The good news for Pettitte is that he has a track record of durability and consistency and he pitches in front of a great offense; the bad news is that he’s already only a slightly above-average starter, meaning that any decline in ability could also mean the end of his career, and with his consistent 14-win seasons, he would still have to pitch until he was 42 or 43 to make it to 300. Now, that’s not to say that this is impossible, but it is unlikely.

If I had to bet on one pitcher making it to 300, my money would be on CC Sabathia. Sabathia has all the right characteristics for a 300-game winner: He started at an early age, winning 17 games at age 20; he’s been extremely durable, with only minimal injury problems; he’ll have a great offense backing him up for at least the next six years; and, of course, he is an elite pitcher.

Moreover, get this: Sabathia’s 136 wins through age 28 are more than any 300-game winner since Walter Johnson, who notched his 300th career victory in 1920. Now, none of this means that Sabathia will win 300 games—after all, there are many pitchers who won even more than 136 games by 28, and still fell well short of 300—which is why the system only gives him a 12.8 percent shot, but it does mean that he certainly has all the right stuff.

More importantly, what the numbers tell us is that if the next 300-game winner isn’t Pettitte or Sabathia (and more likely than not, that will indeed be the case), it will still be **someone**. There are tons of really good, really young pitchers out there who are still far away from the 300-game mark, but some of whom will eventually make a run at it. So, if you add the numbers up, chances are that some currently active pitcher will eventually win 300 games; we just don’t know who it will be yet.

And ultimately, isn’t that what makes baseball so great? Sure, we can run the numbers and see the likeliest contenders, but at the end of the day, the unlikeliest things always seem to happen on the field.

**References & Resources**

For those of you who want to see career projections for every major league player that was active in 2009, feel free to download this spreadsheet. It contains career projections for hits, home runs, and wins, as well as each player’s likelihood of reaching 3,000 hits, 500 home runs, 763 home runs, or 300 wins.

Josh said...

So Ichiro already has 2030 hits. The last 3 years he has 225, 213, and 238 respectively (and never below 200 in any season). Granted he is 36 years old this entire season but it seems he would have better odds of reaching 3000 hits than most of the players listed.

Joe R said...

Are these standard distributions? Because, take Fielder for example, if his error follows a bell curve, and 513 is the mean/median/mode, then 763 is the point where z = 0.78. That’s a standard deviation of 320.5 home runs.

That is a huge standard deviation.

David Gassko said...

Your math is a bit off, Joe—1 SD is 279 in this case. But yes, that is a huge error margin, which is why I keep repeating, these numbers are just for fun. Too many things can happen for me to claim any sort of accuracy or prescience here.

John Walsh said...

I agree with Jason’s point: your choice of a 118-138 hit window for Andrus’s comparables is misleading. Why cut your selection above at 138 hits?

If you look at everyone who had 120 hits or more at age 20, you get 46 players who averaged 158 hits in their age-20 season. Sure, there are many great players in the list, but by my quick count, I see 6 3k-hit guys there, or around 13%. A pretty high number, but it doesn’t look as impressive as your list of 7 comps, 3 of which have 3000 hits.

I know, this is just a fun toy, but I’m just sayin’.

David Gassko said...

Hey John,

I cut the selection at 138 simply because I wanted players with similar hits numbers to Andrus. I didn’t mean to mislead, though based on your comments it appears that players with even better numbers didn’t fare quite as well as the group I looked at.

However, you should take a look at an article I wrote in response to similar criticisms a couple of years ago:

http://www.hardballtimes.com/main/article/testing-my-career-projection-system/

All in all, the system does very well, though it may have a slight bias when it comes to young players. If you want to knock Andrus’ odds down to 20% or 22% or whatever, that’s fine, but the fact is his “true odds” are still much higher than the vast majority of people would expect.

Nick said...

I don’t see Omar Vizquel on the list; he played in 2009, so he should be on there, right?

Joe said...

I realize that you’re just showing the math here, and I’m not questioning the math, but in reality the only thing between Jeter and 3,000 hits is catastrophe. He’s 253 short, and the fewest he’s ever had over the course of two seasons is nearly 100 above that amount (344). If he’s able to play, he’s going to get there sometime in 2011.

David Gassko said...

Nick,

The projections are only for players who were between 20 and 40 years old in 2009, so there’s no projection for Vizquel.

Bookbook said...

It would be very helpful to me if you listed current age and total along with the projection.

Much though it pains me to remember that A-Rod is already 34, it helps frame what you’re talking about.

jason said...

Arod had 259 hits as a 20yo. So why does your system account for Elvis Andrus MAYBE being a very good player someday and NOT that ARod was/is a very good player? And does it really just use raw counting stats per year? not hr/ab with application of a regression to playing time averages?

David Gassko said...

Hey Jason,

This is supposed to be a simple system. Sure, we might be able to add a couple percentage points of accuracy by adding in all sorts of variables, but with the huge error bars inherent in projecting career statistics, I don’t see the point of doing that. We’re here to have some fun.

Also, given that, at A-Rod’s age, a player’s hits total from three years ago has no predictive value in projecting how many hits he’ll gather over the remainder of his career, I seriously doubt that his hits total from 14 years ago will matter either.

Gerry said...

Not every day that one sees “heteroskedasticity” and “home run” in the same paragraph.