Searching for Biases in the First Round of the Draft

The numbers heavily favor Corey Ray being productive early in his career. (via Univ. of Louisville Sports Info.)


Major League Baseball’s amateur draft is something of a crapshoot. Even in the early portion of the first round, where teams reap significant value from their picks on average, busts aren’t uncommon. Delmon Young, Matt Bush, Bryan Bullington, Matt Hobgood. The list goes on. Meanwhile, Mike Trout fell all the way to the 25th overall pick, and quickly blossomed into a generational talent.

Some degree of volatility is expected due to the sheer difficulty of what teams are tasked with doing. Figuring out how good a 21-year-old college kid will be at age 25 is a tall order, and doing so for a high schooler is an even taller one. Amateur players are inherently risky assets, which almost certainly helps explain the unevenness of first-round draft returns.

But I’d posit that at least some of that unevenness isn’t purely the result of random chance. Some of this is likely a case-by-case thing: More thorough scouting of the New Jersey area might have predicted Mike Trout’s star potential, for example. But perhaps there’s also something that runs deeper than that. Perhaps there are systemic biases in the way teams evaluate first-round talent, causing certain types of players to be overvalued or undervalued in the first round.

As you probably guessed, I did some math to search for these biases. If you’re not interested in reading about the nitty-gritty and would rather just read my conclusions, feel free to skip to the paragraph that starts with “That last paragraph was a bit wonky.” Everyone else, let’s get nerdy.

I ran some regressions to identify possible biases. My data set includes all players drafted within the first 30 picks from 2002-2009, which I split into hitters and pitchers. I excluded draftees who did not sign. It’s still a little early to know what to make of players drafted in 2010 and later, especially for some of the high school draftees like 2010 No. 2 overall pick Jameson Taillon, who just made his major league debut this week.

My dependent variable was WAR over a player’s first four years of team control. For players who haven’t yet exhausted their first four team-control years, I filled in the remaining years using rest-of-season (RoS) 2016 projections from the FanGraphs depth charts. Most of the players I had to use projections for weren’t good enough for it to make a noticeable difference: Only six from this category were projected for more than 1 WAR this year.

To start, I included a variable for a player’s draft selection in my regression to act as a proxy for his perceived value. It isn’t uncommon for a player to fall a few picks in the first round of the draft for signability reasons, which muddies the calculus a bit. But by and large, the spot at which a player is drafted correlates strongly with his perceived value.

From that baseline, I tested out the following variables in my regressions: a player’s handedness, his height, and whether he was drafted out of high school or college. For hitters, I also tested defensive position at the time of the draft. If teams are acting optimally, the variables pertaining to a player — his background, handedness, height and position — would not turn up statistically significant. After controlling for draft position, a college draftee would be no more or less likely to achieve big league success than a high school draftee. Nor would a left-handed pitcher compared to a righty, or a short player compared to a taller one.

On the pitching side, this is exactly what I found. Nothing came up significant. None of the characteristics I looked at — handedness, educational attainment or height — appeared to be over- or under-valued in the first round. Teams seem to be acting optimally.

Things looked much more interesting on the hitting side, however. The data suggest teams haven’t been valuing all demographics appropriately in the draft.

Since I’m a good boy who tries to avoid overfitting my statistical models, I partitioned my data set into two separate pieces before I got started: one included draftees from 2002-2005 and the other included draftees from 2006-2009. Within both subsets, I found a similar-looking interaction between a hitter’s draft selection and whether he was drafted out of high school or college. Here are the resulting coefficients R spit out when I applied these variables to the full data set: 2002-2009.

REGRESSION COEFFICIENTS PREDICTING WAR FOR DRAFTEES
Variable Coefficient P-Value
Intercept  15.564 0.00
Log(Pick)  -4.172 0.00
College Pitcher  -6.373 0.07
High School Hitter -10.575 0.00
High School Pitcher -10.593 0.05
Log(Pick) * College Pitcher   2.025 0.14
Log(Pick) * High School Hitter   3.916 0.01
Log(Pick) * High School Pitcher   4.068 0.05

That last paragraph was a bit wonky, and the interpretation of the regression coefficients is the opposite of straightforward. But the main takeaway is this: In the early part of the first round, college hitters tend to outperform high school hitters by a substantial margin. A visual might help make this clear.

Graph

Here’s what it looks like when I also include pitchers in the regression.

Graph

The high school versus college trend holds for pitchers as well, though the gap wasn’t large enough to trip the “statistically significant” alarm when I looked exclusively at pitchers. That doesn’t necessarily mean the high school versus college disparity doesn’t also exist for pitchers. The data just aren’t as convincing as they are on the hitting side, where the effect is more pronounced. The relative lack of high school pitchers selected in the first round (less than 17 percent of my data set) might explain why nothing super-substantial turned up.

Those graphs and equations are cool and all, but I’ve barely named any of the players who made them look the way they do. Let’s change that. The table below lists the high school hitters selected with the first 10 picks in the first round from 2002-2009.

HIGH SCHOOL HITTERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009
Year Pick Team Name WAR in First Four Years
2005   1 Diamondbacks Justin Upton 13.8
2003   1 Devil Rays Delmon Young  0.2
2008   1 Rays Tim Beckham  0.0
2004   1 Padres Matt Bush  0.0
2002   2 Devil Rays Melvin Upton Jr. 15.3
2007   2 Royals Mike Moustakas  9.3
2008   3 Royals Eric Hosmer  6.1
2007   3 Cubs Josh Vitters  0.0
2009   3 Padres Donavan Tate  0.0
2003   5 Royals Chris Lubanski  0.0
2008   6 Marlins Kyle Skipworth  0.0
2003   6 Cubs Ryan Harvey  0.0
2002   7 Brewers Prince Fielder 12.8
2002   8 Tigers Scott Moore  0.0
2004   9 Rockies Chris Nelson  0.0
2006   9 Orioles Billy Rowell  0.0
2005  10 Tigers Cameron Maybin  7.8
2003  10 Rockies Ian Stewart  2.5
Median 4.0  0.0
Average 4.8  3.8

An awful lot of zeroes in there. To name names: Delmon Young, Tim Beckham, Matt Bush, Josh Vitters, Donavan Tate, Chris Lubanski, Kyle Skipworth, Ryan Harvey, Scott Moore, Chris Nelson, Billy Rowell. That’s 11 high school hitters selected in the single digits over an eight-year span who were essentially useless.

The list of college hitters looks noticeably better.

COLLEGE HITTERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009
Year Pick Team Name WAR in First Four Years
2005   2 Royals Alex Gordon 11.1
2009   2 Mariners Dustin Ackley  7.0
2003   2 Brewers Rickie Weeks  6.8
2008   2 Pirates Pedro Alvarez  5.9
2006   3 Devil Rays Evan Longoria 28.7
2005   3 Mariners Jeff Clement  0.0
2005   4 Nationals Ryan Zimmerman 17.6
2009   4 Pirates Tony Sanchez  0.1
2008   5 Giants Buster Posey 23.7
2005   5 Brewers Ryan Braun 23.0
2007   5 Orioles Matt Wieters 14.2
2005   7 Rockies Troy Tulowitzki 16.0
2003   7 Orioles Nick Markakis 14.2
2008   7 Reds Yonder Alonso  3.8
2007   7 Brewers Matt LaPorta  0.0
2006   8 Reds Drew Stubbs  9.2
2008   8 White Sox Gordon Beckham  5.7
2008  10 Astros Jason Castro  6.3
2002  10 Rangers Drew Meyer  0.0
Median 5.0  7.0
Average 5.3 10.2

Evan Longoria, Ryan Zimmerman, Buster Posey, Ryan Braun, Troy Tulowitzki. Several hitters from this group were (or still are) among the best players in baseball. Even the “flops” weren’t complete zeroes in most cases: Dustin Ackley and Rickie Weeks look like stars next to Delmon Young and Josh Vitters. You don’t need a fancy regression model to see the difference between these two lists.
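The gap between the two hitter lists can be checked with a few lines of arithmetic. The sketch below is only an illustration: the WAR values are copied from the two hitter tables above, while the 1-WAR cutoff for a "bust" and the names in the code are assumptions made here, not part of the original analysis.

```python
from statistics import mean, median

# WAR in first four years, copied row by row from the two hitter tables above.
HS_HITTERS = [13.8, 0.2, 0.0, 0.0, 15.3, 9.3, 6.1, 0.0, 0.0,
              0.0, 0.0, 0.0, 12.8, 0.0, 0.0, 0.0, 7.8, 2.5]
COLLEGE_HITTERS = [11.1, 7.0, 6.8, 5.9, 28.7, 0.0, 17.6, 0.1, 23.7, 23.0,
                   14.2, 16.0, 14.2, 3.8, 0.0, 9.2, 5.7, 6.3, 0.0]

# Assumption for illustration only: treat anything under 1 WAR as a bust.
BUST_CUTOFF = 1.0


def summarize(war_values):
    """Return (average, median, share of players under the bust cutoff)."""
    busts = sum(1 for war in war_values if war < BUST_CUTOFF)
    return mean(war_values), median(war_values), busts / len(war_values)


for label, values in (("High school", HS_HITTERS), ("College", COLLEGE_HITTERS)):
    avg, med, bust_share = summarize(values)
    print(f"{label:<12} avg {avg:4.1f}  median {med:4.1f}  under 1 WAR {bust_share:.0%}")
```

The averages and medians it prints should line up with the Median and Average rows of the two tables (3.8 and 0.0 for high school hitters, 10.2 and 7.0 for college hitters).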

Here are the high school pitchers.

HIGH SCHOOL PITCHERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009
Year Pick Team Name WAR in First Four Years
2002   3 Reds Chris Gruler   0.0
2002   4 Orioles Adam Loewen   1.4
2004   5 Brewers Mark Rogers   0.9
2002   5 Expos Clint Everts   0.0
2009   5 Orioles Matt Hobgood   0.0
2002   6 Royals Zack Greinke  10.2
2009   6 Giants Zack Wheeler  5.0*
2006   7 Dodgers Clayton Kershaw  23.7
2004   7 Reds Homer Bailey   7.6
2003   9 Rangers John Danks  12.5
2007   9 Diamondbacks Jarrod Parker   5.0
2009   9 Tigers Jacob Turner   0.6
2007  10 Giants Madison Bumgarner  17.4
Median 6.0   5.0
Average 6.5   6.5
*Estimated based on projections

And here are the college pitchers.

COLLEGE PITCHERS DRAFTED IN THE EARLY FIRST ROUND, 2002-2009
Year Pick Team Name WAR in First Four Years
2007   1 Devil Rays David Price 19.5
2009   1 Nationals Stephen Strasburg 15.3
2006   1 Royals Luke Hochevar  6.6
2002   1 Pirates Bryan Bullington  0.0
2004   2 Tigers Justin Verlander 17.3
2006   2 Rockies Greg Reynolds  0.0
2004   3 Mets Philip Humber  2.6
2003   3 Tigers Kyle Sleeth  0.0
2004   4 Devil Rays Jeff Niemann  6.5
2008   4 Orioles Brian Matusz  4.9
2003   4 Padres Tim Stauffer  3.4
2007   4 Pirates Daniel Moskos  0.2
2006   4 Pirates Brad Lincoln  0.1
2006   5 Mariners Brandon Morrow  7.7
2005   6 Blue Jays Ricky Romero  8.6
2007   6 Nationals Ross Detwiler  4.0
2006   6 Tigers Andrew Miller  2.4
2004   6 Indians Jeremy Sowers  2.2
2009   7 Braves Mike Minor  6.9
2003   8 Pirates Paul Maholm  9.6
2009   8 Reds Mike Leake  5.7
2007   8 Rockies Casey Weathers  0.0
2005   8 Devil Rays Wade Townsend  0.0
2002   9 Rockies Jeff Francis 10.8
2005   9 Mets Mike Pelfrey  8.9
2006  10 Giants Tim Lincecum 25.9
2004  10 Rangers Thomas Diamond  0.0
Median 5.0  4.9
Average 5.2  6.3

On the whole, the list of college pitchers does not look any better than the list of high school arms, though, at the very top, David Price, Stephen Strasburg and Justin Verlander blow away the high school kids who were selected with the first few picks.

The far left-hand side of the above graph is the real story here, though what’s happening on the right side is also noteworthy. It seems the script flips towards the end of the first round: High school picks turn out better than college picks. I’m hesitant to say there’s much to this trend. We’re talking a difference of just a couple of WAR on average over several years, which isn’t enough to get too worked up about, especially in a small sample. Furthermore, some of this can be explained by the beautiful outlier that is Mike Trout. Take him out of the mix, and the lines for hitters move much closer together. What jumps out to me is that high schoolers taken at the beginning of the first round haven’t fared much better than their counterparts taken toward the end. This suggests the gap between the elite high school players and second-tier high school players might not be as large as the industry perceives it to be.
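For anyone curious where the script flips for hitters, one rough way to find it is to take the two high school hitter terms from the coefficient table, set their sum to zero and solve for the pick. This is a sketch under two assumptions: that Log(Pick) is the natural log and that college hitters are the baseline category. It also uses the full-data-set coefficients, so a hitters-only fit may cross at a somewhat different pick. The function names are made up for illustration.

```python
import math

# Copied from the regression table above (assumed baseline: college hitter).
HS_HITTER_SHIFT = -10.575       # "High School Hitter"
HS_HITTER_SLOPE_SHIFT = 3.916   # "Log(Pick) * High School Hitter"


def hs_minus_college_hitter(pick):
    """Predicted WAR of a high school hitter minus a college hitter at the same pick."""
    return HS_HITTER_SHIFT + HS_HITTER_SLOPE_SHIFT * math.log(pick)


def crossover_pick():
    """Pick at which the two hitter lines meet (the gap equals zero)."""
    return math.exp(-HS_HITTER_SHIFT / HS_HITTER_SLOPE_SHIFT)


for pick in (1, 5, 10, 20, 30):
    print(f"pick {pick:>2}: gap {hs_minus_college_hitter(pick):+5.1f} WAR")
print(f"lines cross near pick {crossover_pick():.1f}")
```

Under those assumptions the lines cross at about pick 15, and the high school edge at pick 30 works out to roughly 2.7 WAR. That squares with the "couple of WAR" described above and is small next to the gap of more than 10 WAR at the first pick.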

One of the more annoying quirks of analyzing prospects is that you have to wait a few years to really know how they turn out. Due to this limitation, my analysis looks exclusively at players who were drafted several years ago. This means that any bias that existed toward high school players vis-à-vis college players may have already been corrected.

Anecdotally, it seems the gap has narrowed, particularly due to better results on the high school side. Bryce Harper, Manny Machado, Carlos Correa and Francisco Lindor were all effectively drafted out of high school, and have already blossomed into stars. At the same time, though, Bubba Starling, Michael Choice, Courtney Hawkins and Dylan Bundy have failed hard.

Additionally, there are players who have had disappointing starts to their careers and are beginning to teeter on the fence of the failed prospect graveyard. This group includes Byron Buxton, Delino DeShields, Alex Jackson, Archie Bradley, Max Fried, Tyler Kolek and Jameson Taillon. It’s tough to say anything definitive about the recent drafts without knowing what will become of the Buxtons and Jacksons. But unlike with the 2002-2009 group, the list of failures doesn’t completely overwhelm the list of successes.

With all this in mind, let’s consider what it all might mean for this year’s crop of draftees. I’ve organized the top 15 from Keith Law’s recent ranking of draft prospects in the table below.

TOP 15 2016 DRAFT PROSPECTS
Rank Name Position Type
 1 Corey Ray OF College
 2 Jason Groome LHP High School
 3 Delvin Perez SS High School
 4 Mickey Moniak OF High School
 5 A.J. Puk LHP College
 6 Braxton Garrett LHP High School
 7 Blake Rutherford OF High School
 8 Kyle Lewis OF College
 9 Matt Manning RHP High School
10 Nick Senzel 3B College
11 Nolan Jones SS High School
12 Joey Wentz LHP High School
13 Riley Pint RHP High School
14 Ian Anderson RHP High School
15 Forrest Whitley RHP High School
SOURCE: Keith Law

Eleven of the top 15 prospects are high schoolers (or high school-aged players), the prospect archetype that has been most prone to failure in the past. Only three of the top 15 are college hitters, the archetype that’s been most successful.

Using nothing but their rank on this list (which I’m using as a proxy for draft slot) and their player type (high school hitter, college hitter, high school pitcher or college pitcher), let’s see what my math suggests these players will do over their first four years of team control. To be perfectly clear, these “projections” don’t take stats, scouting or any other knowledge into account. They’re dumb, terrible projections that are dumb and terrible on purpose. They’re just meant to demonstrate the magnitude of the varying production for each demographic of draftee.

DRAFT PROSPECTS’ EXPECTED WAR BASED ON PLAYER TYPE
Rank Name Position Type WAR in First Four Years
 1 Corey Ray OF College 15.6
 2 Jason Groome LHP High School  4.9
 3 Delvin Perez SS High School  4.7
 4 Mickey Moniak OF High School  4.6
 5 A.J. Puk LHP College  5.7
 6 Braxton Garrett LHP High School  4.8
 7 Blake Rutherford OF High School  4.5
 8 Kyle Lewis OF College  6.9
 9 Matt Manning RHP High School  4.7
10 Nick Senzel 3B College  6.0
11 Nolan Jones SS High School  4.4
12 Joey Wentz LHP High School  4.7
13 Riley Pint RHP High School  4.7
14 Ian Anderson RHP High School  4.7
15 Forrest Whitley RHP High School  4.7
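The numbers in this table follow from the regression coefficients listed earlier. Here is a minimal sketch of the arithmetic, assuming college hitters are the baseline category and Log(Pick) is the natural log. Neither is stated outright, but together they reproduce the table to one decimal place. The dictionary keys and function name are illustrative only.

```python
import math

# Coefficients copied from the regression table earlier in the article.
INTERCEPT = 15.564
LOG_PICK = -4.172
# Shift in level for each player type (assumed baseline: college hitter).
TYPE_SHIFT = {
    "college_hitter": 0.0,
    "college_pitcher": -6.373,
    "hs_hitter": -10.575,
    "hs_pitcher": -10.593,
}
# Shift in the Log(Pick) slope for each player type.
SLOPE_SHIFT = {
    "college_hitter": 0.0,
    "college_pitcher": 2.025,
    "hs_hitter": 3.916,
    "hs_pitcher": 4.068,
}


def expected_war(pick, player_type):
    """Predicted WAR over the first four team-control years (natural log assumed)."""
    slope = LOG_PICK + SLOPE_SHIFT[player_type]
    return INTERCEPT + TYPE_SHIFT[player_type] + slope * math.log(pick)


# Rank (used as a stand-in for draft slot) and type, from the prospect table.
PROSPECTS = [
    (1, "Corey Ray", "college_hitter"), (2, "Jason Groome", "hs_pitcher"),
    (3, "Delvin Perez", "hs_hitter"), (4, "Mickey Moniak", "hs_hitter"),
    (5, "A.J. Puk", "college_pitcher"), (6, "Braxton Garrett", "hs_pitcher"),
    (7, "Blake Rutherford", "hs_hitter"), (8, "Kyle Lewis", "college_hitter"),
    (9, "Matt Manning", "hs_pitcher"), (10, "Nick Senzel", "college_hitter"),
    (11, "Nolan Jones", "hs_hitter"), (12, "Joey Wentz", "hs_pitcher"),
    (13, "Riley Pint", "hs_pitcher"), (14, "Ian Anderson", "hs_pitcher"),
    (15, "Forrest Whitley", "hs_pitcher"),
]

for rank, name, player_type in PROSPECTS:
    print(f"{rank:>2} {name:<17} {expected_war(rank, player_type):4.1f}")
```

Because log(1) is zero, the first pick's projection is just the intercept plus the type shift, which is why Corey Ray sits at 15.6 while a high school hitter in the same slot would start near 5.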

This study had its flaws: My sample size was small and most of the data I used were several years old. A fixed number of players are drafted in the first round each year, which makes it difficult to do a rigorous analysis without analyzing decades of data. And since big league front offices are getting smarter and smarter, they’re probably not making the same mistakes their forefathers did 15 years ago. If I were to condense my findings into one sentence, that sentence would be closer to “In the recent past, college hitters (and likely pitchers too, though the evidence isn’t as strong) have been undervalued relative to their high school counterparts in the first few picks of the draft” than “College players are better bets than high school players in the first few picks of the draft.” Still, I think these findings are both recent enough and significant enough to keep in mind when teams make their picks today.

None of this is to say teams should avoid drafting high school players in the early portion of the first round. It would have been idiotic for the Nationals to pass on Bryce Harper just because Delmon Young and Matt Bush turned out worse than Evan Longoria. At any given pick, it’s entirely possible that a high schooler truly is the best player available. Although the history of high school hitters isn’t pretty, Delvin Perez and Mickey Moniak might actually be the best players available when their names are called on draft day. However, the Cubs thought the same about Josh Vitters. Ditto the Royals with Chris Lubanski, the Orioles with Billy Rowell and Matt Hobgood and the Rays with Tim Beckham and Delmon Young. An outsized share of teams that took high school players — particularly high school hitters — with their single-digit first round picks in the recent past ultimately regretted their decision, and it’s happened often enough that it probably isn’t a coincidence.

Chris works in economic development by day, but spends most of his nights thinking about baseball. He writes for Pinstripe Pundits, FanGraphs and The Hardball Times. He's also on the twitter machine: @_chris_mitchell. None of the views expressed in his articles reflect those of his daytime employer.
5 Comments
evo34
7 years ago

Would be interesting to see if age is a significant predictor among first-round HS hitters. Overall, love the concept here.

jdbolick
7 years ago

Nice work! This appears to dovetail with what we know regarding pitchers having much less of an aging curve during their 20s than hitters do. So college hitters would be farther along in their development, theoretically easier to project, and closer to their peak than high school hitters.

Jetsy Extrano
7 years ago

Nice work. This refines previous work that implies teams are taking too many pitchers too high, and would get more payback from hitters — it’s specifically college hitters they’d get more from, this says.

Can the data set support splitting hitters by side of the defensive spectrum? I feel like taking 1B high is a disaster…

What are the curves you show for wins versus draft position, are those a model you fitted? It looks like a pretty low-parameter model; I’d be curious to see a LOESS smoothing for comparison.

SideshowRaheem
7 years ago

Someone at BP did a similar study and found that it wasn’t even necessarily a split between HS/College as much as it was being young for whatever level the player is at. For example, a top HS prospect who is 19 when drafted has statistically done far worse than a top HS prospect that’s drafted at 17.

gary
7 years ago

Wouldn’t bonus paid be a better “value” proxy than draft position? Or try that and get back to us! See how close the results are/aren’t.