The Hardball Times

The uncertainty of Aaron Hill

by Geoff Young
December 16, 2009

Forgive the mess that follows. I've been reading again, and that always causes problems.

I've been thinking about the knowledge we have, the knowledge we think we have, and the gap between the two. I've been thinking about our tendency to construct "reasons" for behavior after the fact, based on information not readily available (or decipherable) earlier.

I find myself haunted by intangibles and uncertainty. In other words, it's just another day in the universe.

More practically, I've been contemplating Aaron Hill's monstrous 2009 campaign. Where did it come from, and where will it lead?

Unfortunately, the answer to both questions is, "I don't know." Fortunately, sometimes that is okay.

If you saw that coming, you lie

First, to get an idea of the caliber of player Hill out-homered in 2009, I thought it might be instructive to examine selected leaders from the previous year (listed in descending order by '08 homers) for the sake of comparison. Note that this ignores players who out-homered Hill; we're looking only for unexpected behavior:

Eleven players that Aaron Hill out-homered in 2009

                  2009                  2008                 Diff
Player            Age   PA  HR  PA/HR   Age   PA  HR  PA/HR    HR  PA/HR
Carlos Delgado     37  112   4   28.0    36  686  38   18.1   -34   -9.9
Manny Ramirez      37  431  19   22.7    36  654  37   17.7   -18   -5.0
Ryan Ludwick       30  539  22   24.5    29  617  37   16.7   -15   -7.8
Miguel Cabrera     26  685  34   20.1    25  684  37   18.5    -3   -1.6
Ryan Braun         25  708  32   22.1    24  663  37   17.9    -5   -4.2
Carlos Quentin     26  399  21   19.0    25  569  36   15.8   -15   -3.2
Alex Rodriguez     33  535  30   17.8    32  594  35   17.0    -5   -0.8
Jim Thome          38  434  23   18.9    37  602  34   17.7   -11   -1.2
Jermaine Dye       35  574  27   21.3    34  645  34   19.0    -7   -2.3
David Wright       26  618  10   61.8    25  735  33   22.3   -23  -39.5
Chase Utley        30  687  31   22.2    29  707  33   21.4    -2   -0.8
Aaron Hill         27  734  36   20.4    26  229   2  114.5   +34  +94.1
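
For anyone who wants to check the arithmetic, here is a minimal Python sketch of how the PA/HR and Diff columns can be reproduced. The helper names are hypothetical, and the sign convention (2008 rate minus 2009 rate, so that a negative number means a decline) is inferred from the table rather than stated anywhere.

```python
# (player, PA 2009, HR 2009, PA 2008, HR 2008), copied from the table above.
rows = [
    ("Carlos Delgado", 112, 4, 686, 38),
    ("David Wright", 618, 10, 735, 33),
    ("Aaron Hill", 734, 36, 229, 2),
]


def pa_per_hr(pa, hr):
    """Plate appearances per home run, rounded to one decimal as in the table.

    Assumes hr > 0; a homerless season would need separate handling.
    """
    return round(pa / hr, 1)


def year_over_year(pa_09, hr_09, pa_08, hr_08):
    """Return (HR diff, PA/HR diff); positive values mean 2009 was better."""
    hr_diff = hr_09 - hr_08
    # Fewer plate appearances per homer is better, so subtract 2009 from 2008.
    rate_diff = round(pa_per_hr(pa_08, hr_08) - pa_per_hr(pa_09, hr_09), 1)
    return hr_diff, rate_diff


for player, pa_09, hr_09, pa_08, hr_08 in rows:
    print(player, year_over_year(pa_09, hr_09, pa_08, hr_08))
```

The differences are taken between the already-rounded rates, which is how the published figures appear to have been computed.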

We can find excuses for most of these guys if we look hard enough—especially since we now have the benefit of hindsight. Delgado, Ramirez, Thome, and Dye are all old. Delgado got hurt, as did Quentin and Rodriguez. Ramirez had his little vacation.

Ludwick hadn't done much before 2008; that could have been a fluke. Wright? It happens; ask Gary Gaetti. At least Wright kept hitting and didn't stink like Gaetti. Not that I'm bitter or anything.

It's harder to concoct "reasons" for Cabrera, Braun, and Utley; then again, those guys all cleared 30 homers, so they don't really need any.

Let's try approaching this from a different angle. Here are some 26-year-olds who put up numbers comparable to those posted by Hill at the same age (listed in descending order by homers at age 27):

Nine players that had similar seasons to Aaron Hill's 2008

                  Age 26                            Age 27
Player            Year   PA    BA   OBP   SLG  HR   Year   PA    BA   OBP   SLG  HR
Aaron Hill        2008  229  .263  .324  .361   2   2009  734  .286  .330  .499  36
Jim Delsing       1952  459  .260  .329  .360   4   1953  552  .288  .380  .436  11
Rance Mulliniks   1982  353  .244  .323  .363   4   1983  427  .275  .373  .467  10
Dale Berra        1983  607  .251  .327  .358  10   1984  500  .222  .273  .318   9
Cy Perkins        1922  560  .267  .322  .366   6   1923  579  .270  .356  .370   2
Chick Galloway    1923  556  .278  .327  .361   2   1924  505  .276  .311  .341   2
Kurt Stillwell    1991  428  .265  .322  .361   6   1992  416  .227  .274  .298   2
Alex Sanchez      2003  599  .287  .319  .363   1   2004  352  .322  .335  .386   2
Marty Marion      1944  565  .267  .324  .362   6   1945  477  .277  .340  .370   1
Gary Thurman      1991  200  .277  .320  .359   2   1992  216  .245  .281  .305   0

Most of these guys had more plate appearances than did Hill, many of them a lot more, so they aren't perfect analogues (the information gleaned from Hill's 229 plate appearances may not be equal in quality to that gleaned from Berra's 607), but they're in the general neighborhood. The table above also tells us nothing about each player's respective career totals to date, which could provide additional insights.

That said, it's interesting to note that some players (Mulliniks, Delsing) improved the following year, while others (Berra, Stillwell) did not. But can we offer meaningful predictions on Hill's path forward based on such information? There are other factors to consider as well (era being one example; how similar are the conditions in Hill's time to those experienced by the likes of Perkins and Marion?), and it's not always clear which are important or to what degree.

Nobody else experienced the type of power spike that Hill did. Such is the nature of outliers. Typically we don't see that kind of a jump from year to year. There are exceptions, of course: Bert Campaneris in 1970, Wade Boggs and Dale Sveum in 1987, Brady Anderson in 1996 (although he was a 10-15 homer guy before jumping to 50 that year).

Phil Bradley went from zero homers in 1984 to 26 in 1985, and then settled into the low double digits for a few years. Kirby Puckett was like that. So were Jeffrey Leonard and Steve Finley, probably others I'm forgetting.

What I find both fascinating and frustrating is that there is no way we could have seen Hill's (or Campaneris's or Puckett's) outburst coming. According to reasonable expectations, it should not have happened, and yet it did.

Now armed with the knowledge that Hill knocked 36 homers in 2009, we can go back and point out that he hit 17 in 2007 and that he missed much of 2008 due to a concussion. But before the 2009 season started, anyone predicting that kind of output from Hill would have been written off as a crazy person and justifiably so.

Seriously, think of the odds you could have gotten if you'd been willing to wager that Hill would out-homer not only one of the players in that first table, but all of them. He knocked more than Delgado, Ramirez, and Wright combined for crying out loud. You could have bought an island with your winnings on that one... if you'd been "stupid" enough to place such a ridiculous bet.

Alas, such is the nature of future events. We can make educated guesses and then watch with everyone else to see what unfolds. If we are good and/or lucky, reality may coincide with our guesses. Sometimes, however, the two will diverge more severely than we might have imagined.

In Hill's case, we identify his dramatic and apparently unpredictable improvement as an outlier, an anomaly, according to the models of reality we have created. Maybe we retroactively concoct reasons for its presence—something to satisfy our desire for order in the midst of chaotic and dynamic systems.

But what have we learned about Hill? How can we apply these lessons to future possible anomalies, without knowing when those anomalies will occur or what shape they will take? How do we make accurate and timely decisions in the face of necessarily incomplete and imperfect information?

Forecasting and other impossibilities

"I am not a number, I am a person."
-- Number Six

A number? More like a series of numbers that we don't (and possibly can't) know. Ron Shandler, among others, has discussed the difficulties inherent in forecasting future performance based on past data.

To continue with Hill as an example, given what we know about Year 1 (2 HR) and Year 2 (36 HR), what should we predict for Year 3? A reasonable approach might be to take the average of those two, i.e., 19 homers. In fact, Hill hit 17 home runs in 2007, which is pretty well in line with the average of his past two seasons and suggests that 15-20 might be a more appropriate range of expectation for him.
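
The naive averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a real projection system; the variable names are hypothetical and the inputs are the three home run totals quoted in this article.

```python
# Home run totals quoted in the article.
hr_2007 = 17
hr_2008 = 2
hr_2009 = 36

# Naive forecast: the plain average of the two most recent seasons.
naive_forecast = (hr_2008 + hr_2009) / 2

# How far each of those two seasons actually landed from their own average.
misses = {2008: hr_2008 - naive_forecast, 2009: hr_2009 - naive_forecast}

# Distance between the average and the 2007 total.
gap_to_2007 = abs(hr_2007 - naive_forecast)

print(naive_forecast, misses, gap_to_2007)
```

The average sits close to the 2007 total, yet each of the two seasons that produced it missed that average by 17 homers in opposite directions.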

But how confident can we be in our prediction? Sure, Hill has averaged 19 home runs over the past two seasons, but in neither instance did he come anywhere near that actual total. In fact, in both cases, we could have predicted that Hill would hit between 3 and 35 home runs—a laughably wide range that effectively tells us nothing—and been wrong. It is very difficult to miss a target that large:

How unique is 3-35 home runs?

Year   200+ PA   3-35 HR    Pct
2008       344       288   .837
2009       346       299   .864
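
The Pct column is simply the in-range count divided by the number of players with at least 200 plate appearances. A minimal Python sketch, using only the counts from the table (the function name is hypothetical):

```python
# (season, players with 200+ PA, players who hit 3-35 HR), from the table above.
seasons = [(2008, 344, 288), (2009, 346, 299)]


def share_in_range(qualified, in_range):
    """Fraction of qualified players inside the range, to three decimals."""
    return round(in_range / qualified, 3)


for year, qualified, in_range in seasons:
    outside = qualified - in_range  # players below 3 or above 35 homers
    print(year, share_in_range(qualified, in_range), outside)
```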

Fewer than 20 percent of all players who amassed 200 or more plate appearances in either season fell outside the 3-35 home run range. Fewer still (14 total) did it both years:

Fewer than three homers in 2008 and 2009 (min. 200 plate appearances):

Emmanuel Burriss, Jamey Carroll, David Eckstein, Cesar Izturis, Jason Kendall, Augie Ojeda, Juan Pierre, Nick Punto, Willy Taveras

Pretty exclusive; only nine guys there. This next one's even more exclusive.

More than 35 homers in 2008 and 2009:

Adam Dunn, Adrian Gonzalez, Ryan Howard, Albert Pujols

It gets better.

Fewer than three homers in either 2008 or 2009, more than 35 the other year:

Aaron Hill
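
The three lists above sort players by where their two seasons fall relative to the 3-35 range. As a rough sketch of that sorting rule in Python (the function name and the category labels are hypothetical, and the 200 plate appearance minimum is assumed to have been applied beforehand):

```python
def classify(hr_first, hr_second, low=3, high=35):
    """Label a pair of season home run totals against the low-high range."""
    below = [hr < low for hr in (hr_first, hr_second)]
    above = [hr > high for hr in (hr_first, hr_second)]
    if all(below):
        return "under both years"
    if all(above):
        return "over both years"
    if (below[0] and above[1]) or (above[0] and below[1]):
        return "swing"
    # Everyone else landed inside the range in at least one season.
    return "in range at least once"


# 2008 and 2009 totals taken from the first table in this article.
print(classify(2, 36))   # Aaron Hill
print(classify(37, 34))  # Miguel Cabrera
print(classify(33, 10))  # David Wright
```

Only the "swing" label requires a player to jump from one extreme to the other, which is why that list is so short.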

It may well be that 15-20 home runs is a solid guess for Hill in 2010 (I'll go out on a limb: this is the year he reverts to 2007 form and hits between three and 35 homers), but with such crazy swings from Year 1 to Year 2, it's hard to say. If he does end up in that range, I'm sure we'll find "reasons" for his regression to the mean. If he falls short, we'll scramble to explain why 2009 was the fluke; if he duplicates (or approaches) last year's success, we'll scramble to explain why it represented a new level of ability.

Maybe Hill made an adjustment last year, or maybe he simply reached physical maturity. Both happen, although neither in entirely predictable ways. We hear about players making adjustments all the time (e.g., learning a new pitch) and sometimes it even makes a difference. But how useful is such information if it's accurate in some instances and not in others? What are we to do with players who reportedly make an adjustment and fail to improve anyway?

As for physical maturity, who knows? We have data that suggest probabilities, but then guys like Cesar Cedeno and John Olerud peak early, while Finley and Luis Gonzalez don't hit full stride until thirtysomething.

Okay, genius; what are we supposed to do?

Would you believe ... I don't know? Actually, that's a good first step. Understand and acknowledge that there is much we don't know. The act of not knowing leads to more questions, which can lead to insight.

This is a hard point to sell in a results-oriented world, but the act of questioning is critical. The process (observe, ask, theorize, test, etc.) is scientific, but the "answers" we have at any given time may be more a matter of faith, as is what we choose to do with them.

Don't mistake models for reality. Don't mistake probability for certainty. Don't fall in love with your data, even if it's really good. The world is not reducible to a set of equations, nor are its inhabitants.

The irritating thing, of course, is that at some point we have to stop analyzing whatever data we may have collected, take a deep breath, and make our best guess based on our understanding at that time of a given situation and its various components. We act on what we think we know, and hope for a favorable outcome.

Sometimes our information will be sufficiently accurate and complete for us to make a good choice; other times, not so much. Either way, with luck we will have learned something of value that we can apply toward future decisions.

(In the interest of full disclosure, I should mention that I've contributed to books that provide forecasts. Aside from the exposure and modest compensation, there are two reasons I do this. First, as I've said, at some point we have to take our best shot and do something. If I never had to make a decision, I could keep postulating until the end of time without fear of repercussion. Second, although most forecasts say roughly the same thing—i.e., Player X is probably going to do what he has done in the past unless he is very young, very old, or very injured—I like to believe there is value added in accompanying commentary, which is an opinion I should have since I provide such commentary. Really, would you be inclined to read my commentary if I didn't believe it mattered?)

This is all getting very heady, so let's cut to some practical points.

Fantasy implications

I believe that in the realm of fantasy games, projections have utility:

  1. Once you realize that they all are essentially the same and subject to the same errors that make fantasy baseball largely a game of chance, you can stop wasting so much time and energy on them.
  2. Because many people do not realize this—including some of your opponents—you can use projections to gain insight into how others are likely to value players; then you simply (heh, it's not so simple) identify and exploit the gaps between their perceptions and yours.

Real-world implications

For folks making decisions in real baseball, one potential implication of uncertainty—and this borders on blasphemy—may be that lack of reliance on advanced metrics and methods could be less destructive than we might assume. Not that they aren't useful—I happen to believe they can be if deployed properly—but perhaps the importance of these tools has been overstated by some.

For example, what if the would-be practitioners of such metrics and methods aren't comfortable using them and thinking in those terms? This is not to imply that they are stupid; it's just that some people are more conversant in, say, the language of scouting than in some foreign mode, and are therefore better able to contemplate and make judgments in it.

As others have noted elsewhere (and this lies well beyond the scope of our current inquiry), there probably should be a balance of approaches. The salient point is that a deficiency in one area might not be as devastating as it would initially seem to be. (For example, a person who is not "fluent" in sabermetrics isn't likely to become overly dependent on its techniques or blind to its weaknesses; not that practitioners necessarily will, either, but the chance is much greater.)

Conclusion

Strive to understand the difference between what we know and what we don't know. Question everything. Be prepared to make decisions based on incomplete and imperfect information. Sometimes we will make a sensible decision and be wrong; sometimes we will make a risky decision and be right. Try not to think about such things too hard; I speak from experience when I say that your head will hurt.

The downside to this type of skepticism is that people often think you are a complete nutjob. Worse, they may be right.



References and Resources
 
Books

The following books provided initial sparks of inspiration for this line of inquiry. They have nothing to do with baseball but make for good reading if you are of a philosophical bent.

Nassim Nicholas Taleb, The Black Swan
Malcolm Gladwell, Blink
Robert J. Shiller, Irrational Exuberance

Articles

During the course of my research, I found a few thought-provoking articles that address similar questions in a baseball context. You may find them useful and/or entertaining.

Risk vs. Uncertainty
The Occam’s Razor of Events
The Residual Fallacy
Scouts versus Stats: A Case Study in the Diffuse Nature of Knowledge

Heavy lifting

Baseball-Reference

Thanks also to my good friend Didi, with whom I have had many fascinating discussions on this topic and who helped focus my ideas.

Geoff Young covers the San Diego Padres at Ducksnorts and is a contributor to Baseball Prospectus. Feel free to send Geoff comments via email.
