As soon as a player starts racking up numbers in the big leagues, we know a lot about where his performance might go from there. Researchers have worked hard to understand how players develop at that level, and rightfully so: You can’t make respectable projections without at least some approximate knowledge in that department.
What’s more difficult, and has accordingly received less attention, is how players develop before they hit the majors. For minor leaguers, we can use major league equivalencies (MLEs) to approximate, but the wide range of run environments and competition levels in the minors forces us to make more and more assumptions, and the results become ever more approximate.
This is one of the few areas where working with college data might actually be a plus. Usually, the paucity of available stats, along with the vast number of players who will never go pro, means that college-level analysis is daunting. Here it solves some problems. We don’t need to worry about players changing leagues, or the biases involved in focusing only on players who reached a certain level. Many college players give us two, three, even four years of performance in the same park against the same competition.
College and the minors
What college data can give us, then, is approximate development patterns for ages 19-22. Not every college player is the same age, of course, but since date of birth is hard to come by for many players, it’s more practical to assume that freshmen are 19, sophomores are 20, and so on. Athletes tend to be a bit old for their grade, and baseball is played in the spring semester, so it’s a reasonable assumption.
Naturally, we need to use extreme caution in applying these development patterns outside of college ball. Even among players who end up in the pros, those who go to college profile very differently from those who don’t. A very polished 18-year-old may not go to college at all, while an 18-year-old who hasn’t convinced scouts of his ability to play is much more likely to start work on a degree.
Even if we are limited to using college results for college players, there’s a lot to be gained. Let’s turn to the numbers and see what they have to tell us.
Preliminary aging patterns
To generate year-to-year multipliers, I identified all the players over the last few years who amassed at least 150 at-bats in consecutive seasons. I assigned each pair of seasons to one of three groups: Freshman/Sophomore, Sophomore/Junior, or Junior/Senior. I then adjusted the numbers for park and strength of schedule. Each one can vary a bit from year to year for a given school, and the adjustment also allows us to include transfers.
Finally, for each of the three groups, I averaged every pertinent pair of seasons, weighting for playing time. For instance, a player who had 250 at-bats in each season is weighted more heavily than one who had 170 and 220. Across all of Division One college baseball, here’s what I came up with:
Pair       Players  $H    $D+T  $HR   $BB   $K    $SB   $SB/ATT
Fr -> So   335      1.05  1.26  1.73  1.19  0.98  1.34  1.11
So -> Jr   612      1.04  1.18  1.70  1.18  1.01  1.32  1.09
Jr -> Sr   746      1.03  1.18  1.57  1.18  1.01  1.30  1.07
For a small table, there’s a lot of information here. Each entry is a multiplier representing the average increase in a given statistical category. For instance, the average freshman with 150 or more at-bats got 5 percent more hits (per at-bat) as a sophomore. A few more observations:
- “Players” refers to the number of pairs of seasons I was able to use. The small number of Fr/So is no surprise since relatively few freshmen are given the opportunity to start.
- It’s also as expected that while most stats increase over time, extra-base hits increase more than hits, and home runs even more. Not only are players naturally getting bigger and stronger, but many factors increase their ability and motivation to get bigger. Better nutrition advice is available, high-tech college gyms are at hand, and coaches and peer groups expect players to take advantage of them.
- I expected that we’d see more improvement in strikeout rate. As it is, strikeout rate barely changes. It’s possible that, as players get stronger and learn to swing for the fences, their batting eye improves but they accept a higher number of swing-and-misses. Perhaps the two effects roughly cancel out.
- $SB refers to stolen bases per time on base. Thus, since the average player gets on base more often as he ages, he steals even more than these multipliers suggest. Compared to the increase in SB success rate, it’s clear that players run a lot more as they get older. This may have more to do with coaching (and a coach’s confidence in his players) than pure skill.
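The weighting step behind these multipliers can be sketched in a few lines of Python. This is a minimal illustration rather than the exact calculation used for the tables: the `SeasonPair` class, the function names, and the choice of the harmonic mean of the two at-bat totals as the weight are all assumptions, and the park and schedule adjustments are left out.

```python
from dataclasses import dataclass


@dataclass
class SeasonPair:
    """One player's consecutive seasons for a single stat (hypothetical layout)."""
    ab1: int      # at-bats in the earlier season
    events1: int  # e.g. hits in the earlier season
    ab2: int      # at-bats in the later season
    events2: int  # the same stat in the later season


def pair_weight(pair):
    # Assumed weighting: harmonic mean of the two at-bat totals, so a
    # 250/250 pair counts for more than a 170/220 pair.
    return 2 * pair.ab1 * pair.ab2 / (pair.ab1 + pair.ab2)


def weighted_multiplier(pairs):
    # Weighted average per-at-bat rate in each season, then the ratio
    # of the later-season rate to the earlier-season rate.
    rate1 = sum(pair_weight(p) * p.events1 / p.ab1 for p in pairs)
    rate2 = sum(pair_weight(p) * p.events2 / p.ab2 for p in pairs)
    return rate2 / rate1


# Made-up hit totals, purely for illustration.
pairs = [SeasonPair(250, 75, 250, 80), SeasonPair(170, 51, 220, 66)]
multiplier = weighted_multiplier(pairs)
print(round(multiplier, 3))
```

Taking the ratio of weighted average rates, instead of averaging each player’s individual ratio, avoids dividing by zero for a hitter with no home runs in his first season.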
But wait—maybe we can do better. As I noted above, we can make some assumptions about a player simply based on whether he goes pro or goes to college. They might not hold in all instances, but for a project like development patterns, they may be very pertinent.
What if the same is true for different segments of Division One? Certainly it seems wrong to treat heavily recruited hitters who end up at LSU the same way we treat the guys who end up starting at Iona or Alcorn State. Intuitively, the gap between high-profile programs and everyone else resembles the gap between 19-to-22-year-olds in college and 19-to-22-year-olds in the minors.
Most of us care more about those higher-profile players, so let’s focus on them. Using the conference strength data I shared about a year ago, I’ve arbitrarily divided D-1 into “elite conferences” (the 82 teams in the ACC, SEC, Pac-10, WCC, Big 12, Big East, Big West and Conference USA) and everybody else. It’s not a perfect distinction, but for projects that are explicitly geared toward identifying or analyzing draft-worthy talent, focusing on “elite” conferences seems more appropriate.
With that in mind, let’s look at the same table, only for players in these elite conferences:
Pair       Players  $H    $D+T  $HR   $BB   $K    $SB   $SB/ATT
Fr -> So   133      1.02  1.30  1.68  1.18  1.00  1.49  1.10
So -> Jr   221      1.01  1.15  1.62  1.18  1.03  1.25  1.06
Jr -> Sr   188      1.01  1.10  1.57  1.11  1.02  1.35  1.12
With just a few exceptions, there’s less improvement from year to year for players in elite conferences than for the average D-1 player. This makes sense. In general, the more impressive the player in prime recruiting years, the less room for improvement. The typical stud hitter who heads to Rice or Florida State has already received years of high-quality coaching, while someone at a second-tier school may be hearing helpful tips for the first time.
Perhaps most marked in this second table is the difference between Soph/Jr and Jr/Sr improvements. Keep in mind that the amateur draft has a lot to say about which juniors stick around for a senior year. Hundreds of players are plucked from the college ranks each year, a disproportionate number of them juniors from elite conferences. Thus, there are fewer pro-level prospects in the Jr/Sr pool than in the Soph/Jr pool.
Illustration: Zach Cox
The numbers we’ve seen so far are awfully abstract. Let’s look at a few illustrations to get a firmer grasp of what these multipliers mean.
Let’s start with a 2010 sophomore. Zach Cox of Arkansas State has only one year of college experience, but he’s draft-eligible this year, making his spring campaign particularly closely watched. The Red Wolves play in the Sun Belt conference, outside of my “elite” group. We could make an argument that a premium talent like Cox should be treated differently, but for today, let’s use the multipliers for all of Division One.
Here are Cox’s numbers from 2009, along with “projected” 2010 numbers, using only the raw ’09 stats and the generic multipliers:
Year  AB   H   2B  3B  HR  BB  SO  AVG    OBP    SLG
2009  199  53  15  2   13  20  65  0.266  0.345  0.558
2010  199  56  19  3   22  24  64  0.281  0.370  0.739
Before we get too excited, remember that this is an illustration, not a projection. To claim that Cox will actually post numbers like these, we’d have to assume not only that he has something like an “average” profile for a college sophomore, but also that his 2009 numbers represented his actual talent level, without much influence from luck.
Again, it’s worth noting the tiny observed change from year to year in strikeout rate. The knock on Cox is just that. As you can see from his ’09 stats, he K’d about one in three trips to the plate. If his freshman-to-sophomore transition works in the generic manner, he’ll whiff just as often this year. Of course, Cox isn’t generic, so we’ll have to wait and see if he can improve his contact rate.
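For anyone who wants to retrace the arithmetic, here is a minimal Python sketch of how a line like Cox’s 2010 row can be generated. It assumes that each counting stat is scaled by its Fr/So multiplier from the all-Division-One table, that at-bats are held constant, that the $D+T multiplier is applied to doubles and triples separately, and that results are rounded to whole numbers. The dictionary layout and function names are illustrative only, and OBP is skipped because it depends on totals (such as hit-by-pitches) that the table doesn’t show.

```python
# Fr -> So multipliers for all of Division One, from the first table.
# Applying $D+T to doubles and triples separately is an assumption.
FR_SO_ALL_D1 = {"H": 1.05, "2B": 1.26, "3B": 1.26, "HR": 1.73, "BB": 1.19, "SO": 0.98}


def apply_multipliers(line, multipliers):
    """Scale each counting stat, hold at-bats constant, round to whole events."""
    aged = {"AB": line["AB"]}
    for stat, mult in multipliers.items():
        aged[stat] = round(line[stat] * mult)
    return aged


def avg(line):
    return line["H"] / line["AB"]


def slg(line):
    # Every hit is worth one base; doubles, triples and homers add extra bases.
    total_bases = line["H"] + line["2B"] + 2 * line["3B"] + 3 * line["HR"]
    return total_bases / line["AB"]


cox_2009 = {"AB": 199, "H": 53, "2B": 15, "3B": 2, "HR": 13, "BB": 20, "SO": 65}
cox_2010 = apply_multipliers(cox_2009, FR_SO_ALL_D1)
print(cox_2010)
print(f"AVG {avg(cox_2010):.3f}  SLG {slg(cox_2010):.3f}")
```

Under those assumptions, the sketch yields 56 hits, 22 home runs, 24 walks and 64 strikeouts in 199 at-bats, for a .281 average and .739 slugging percentage.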
Illustration: Christian Colon
Let’s move on to a junior. Fullerton shortstop Christian Colon is perhaps the most highly touted junior in this year’s draft class, making him worthy of our attention.
In this case, I can show you two years of his stats, again along with what his 2010 numbers would look like if he ages like the generic college junior.
Year  AB   H   2B  3B  HR  BB  SO  SB  CS  AVG    OBP    SLG
2008  243  80  12  2   4   19  25  13  4   0.329  0.406  0.444
2009  255  91  16  2   8   24  24  15  7   0.357  0.442  0.529
2010  255  92  18  2   13  28  25  20  8   0.361  0.452  0.600
The typical Soph/Jr improvement, especially for a player in an elite conference, isn’t as striking as the Fr/Soph jump, so this doesn’t exactly show Colon becoming the next Matt LaPorta. But if he follows the generic path, his already superlative OBP will climb even higher, and his stolen base rate will creep past 70 percent.
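The stolen base claim is easy to check from the SB and CS columns. A short sketch, assuming success rate is defined as steals divided by total attempts (steals plus times caught); the function name is just for illustration:

```python
def sb_success_rate(sb, cs):
    # Successful steals as a share of all attempts.
    return sb / (sb + cs)


rate_2009 = sb_success_rate(15, 7)   # Colon's actual 2009 line
rate_2010 = sb_success_rate(20, 8)   # the generic 2010 line
print(f"2009: {rate_2009:.3f}  2010: {rate_2010:.3f}")
```

That works out to roughly 68 percent in 2009 and 71 percent in the generic 2010 line.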
Illustration: Blake Dean
To round out the set, let’s look at one more. Blake Dean, first baseman at Louisiana State, was selected in the 10th round by the Twins last year, but opted to return to school. Given the plateau his numbers hit from his sophomore to junior year, it’s understandable for him to think he could put together a big senior campaign and do better in his second try at the draft.
Here are Dean’s college stats, along with a 2010 line generated from his ’09 performance and the typical Jr/Sr improvement observed in elite conferences:
Year  AB   H   2B  3B  HR  BB  SO  AVG    OBP    SLG
2007  206  65  12  3   7   20  25  0.316  0.366  0.505
2008  269  95  18  3   20  35  46  0.353  0.432  0.665
2009  259  85  18  0   17  50  37  0.328  0.432  0.595
2010  259  86  20  0   27  56  38  0.332  0.445  0.722
Once again, this isn’t a projection. But in this case, given a sophomore season in which Dean might have outperformed his skill level and a junior year when he didn’t, the numbers seem plausible.
There are a lot of directions to go from here. “Elite” conferences aren’t the only way we can break down aging patterns. In the tried-and-true path of projectors throughout history, we could look at aging patterns by position, by body type, or any number of other variables.
Combined with appropriate regression to the mean (and maybe some summer league stats thrown in for good measure), we might just have ourselves the beginning of a projection system for college players.