The baseball economist
by John Beamer
April 14, 2008
Those of you looking for an introductory text on sabermetrics could do a lot worse than picking up a copy of JC Bradbury's The Baseball Economist. Written in an easily accessible style, it is heavy on explanation and light on statistics: perfect for a beginner.
Unfortunately, for more knowledgeable readers, that approach has a drawback: you want the back-up info to see whether you agree with JC's interpretation of the data.
In this article I want to do two things. One, give a very brief review of the book for those who haven't come across it, and two, home in on a couple of JC's conclusions and see if they stand up to a bit more scrutiny.
A short review
I'm not going to write a full review as others have done that with more rigor. Phil Birnbaum's review over at The Griddle is the best of the lot.
The book is divided into four sections. The first few chapters are about play on the field, and are probably the best reads in the book. The first study discusses whether the propensity for more hit batters in the AL is a result of the DH. Bradbury follows this up with a look at the role of the on-deck hitter and why there are no left-handed catchers (the answer is that there is generally more value having southpaws in other positions). The final study in this section asks whether managers can influence an umpire's call.
The second section is called "almost off the field" and opens with JC's famous Mazzone study, which asserts that Leo Mazzone's pitching methods were instrumental in the Braves' success over the last 15 years (more on this later). So far, so good.
JC then looks at the link between wins and population and subsequently asks which are the two best-run teams in baseball. It is at this stage, for me, where the book begins to lose a bit of steam. The correlation between population and wins is skewed by the New York Yankees and might be flat if they were omitted. Also, teams in the same metropolis each get to "claim" all of that city's population. That might be the right way to do it, but I'm not sure.
JC concludes that the Marlins and Indians are the two best-run teams in baseball (this was a couple of years ago). He derives this by working out which teams spend their dollars most efficiently and then looking at how often each team wins.
The issue is that the Marlins' strategy of cutting payroll to the bone is always going to be dollar efficient, because even with 25 replacement-level players a team can expect to garner 60 wins. The other problem is that JC then creates an arbitrary metric (equally weighted between "winningness" and "dollar efficiency"). Call me old-fashioned, but I dislike arbitrary stats: different arbitrary weightings produce different arbitrary answers.
JC also has an intriguing chapter on applying game theory to steroids, which I thoroughly enjoyed; it is a great introduction to game theory for aspiring economists.
The final two sections deal more with off-the-field matters. The centerpiece of this is his player valuation system, which attributes dollars to each player's performance. This is a good attempt to put a dollar value on each player, but it fails to account properly for replacement level (see Phil's review for more details) and uses wonky weights (OBP is overvalued relative to SLG). It also fails to take into account position and defense for hitters and leverage for pitchers. This is frustrating because all these are easy fixes. As a result JC tends to undervalue closers (for example) by a factor of two.
Anyway, to sum up, The Baseball Economist is a very enjoyable book, written in a very accessible style by an individual who is very knowledgeable about the game. I did enjoy the refreshing economics slant that JC provided.
Okay, now that I've given you an overview of the book, I want to take a more in-depth look at three of the studies.
The Mazzone effect
Leo Mazzone is the legendary Braves pitching coach who now works for the Baltimore Orioles. While Mazzone was in Georgia the Braves had an unprecedented run of success, dominating the National League in the 1990s. During this period (mostly 1991 - 2000) the Braves had one of the best rotations of all time (Greg Maddux, John Smoltz, Tom Glavine, Steve Avery, to name four).
Not only that, but a ton of reclamation projects cycled through the Braves' system. Guys who were mediocre pitchers suddenly became world-beaters. What's more, when they left the Braves they (largely) reverted to their old form. Remember guys like Chris Hammond, Mike Remlinger (1999 not 2007), Jorge Sosa and Jaret Wright? These are the sorts of hurlers we are talking about.
The question is whether this was due to luck or whether the Braves' coaching team, and in particular Mazzone, was the difference maker.
JC's approach was to use regression to try to tease out the Mazzone effect. According to the book, JC includes the following variables: pitcher age, defense, run environment, pitcher quality, starter/reliever and whether Mazzone is coach.
JC then reports the regression result, which is that Mazzone's presence reduces a pitcher's ERA by 0.64 runs (0.41 for starters and 0.71 for relievers)! Wow. Without any more data it is difficult to comment on the interpretation of the regression equation. Fortunately JC has published his studies on the Internet for us to look at. You can find this one here.
Is JC right?
At first pass the regression seems sound, although there are a few concerns. First, selection bias could play a role. A pitcher might pitch better than expected and so retain his spot on the roster (and accumulate a low ERA). Once said hurler showed signs of regression he'd be jettisoned and picked up by another team. Jorge Sosa would be a classic example, going to St Louis after a very promising year as a starter.
Second, another form of selection bias is that we happen to pick arguably the best staff of the era! Whenever you pick the best of something it is implicit that performance at that point is better than what came before it and what came after it.
Third, JC includes two age control variables (age and age squared). There is no evidence that those two variables are enough to control for age, especially as different aging patterns are more likely to dominate in such a small sample. (See the recent debate on Roger Clemens.)
Fourth, Guy points out that JC under-accounts for defense and league. This is what he says:
Consider these coefficients for starters:
League ERA: .271
Team DER: -5.554
This means that if a pitcher leaves Mazzone and goes to the AL, where the ERA was usually 0.50 higher, the model only “expects” his ERA to rise by 0.135. And if Atlanta has a team DER of .720, and a pitcher leaves Atlanta for a .700 team, the model expects his ERA to rise by only 0.11, when in fact 2 points of DER translates into about .5 runs/game. So two major advantages for Mazzone pitchers in this period—pitching in the NL, and in front of a good defense—are not fully compensated for. (Now, I gather the latest analysis focuses on Ks and HRs, which obviously are not defense dependent. But you still need to deal with league, and park effect on HRs and Ks.)
Also, the career ERA coefficient is just .62, so the model will tend to somewhat underestimate great pitchers (i.e. predict too high an ERA), and overestimate lousy pitchers. This will be true in both Mazzone and non-Mazzone years, of course, but to the extent the good pitchers in this sample pitched more years with the Braves, it will again create an illusion that pitching for Mazzone reduces ERA.
I’d like to see the K-rates and HR-rates for these pitchers, league/park-adjusted and age-adjusted, with and without Mazzone. I’d have more faith in that than these regressions
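Guy's arithmetic can be reproduced directly from the two coefficients he quotes. The sketch below is only a quick check of his numbers, not JC's model: the function and variable names are made up for illustration, while the 0.50 league gap and the .720 and .700 DER figures come from the comment above.

```python
# Coefficients for starters, as quoted in Guy's comment.
LEAGUE_ERA_COEF = 0.271
TEAM_DER_COEF = -5.554


def expected_era_change(league_era_change: float, der_change: float) -> float:
    """ERA change implied by the two coefficients for a change in
    league ERA and in team DER (hypothetical helper, illustration only)."""
    return LEAGUE_ERA_COEF * league_era_change + TEAM_DER_COEF * der_change


# Moving to a league whose ERA is 0.50 higher, with the same defense.
league_effect = expected_era_change(0.50, 0.0)

# Moving from a .720 DER defense to a .700 DER defense, in the same league.
defense_effect = expected_era_change(0.0, 0.700 - 0.720)

print(f"league move:  {league_effect:+.4f}")
print(f"defense move: {defense_effect:+.4f}")
```

Both implied changes come out at a little over a tenth of a run, which is the basis for Guy's point that the model credits league and defense with much less than the observed gaps.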
So is the Mazzone effect fiction? It's difficult to say, and more evidence is needed before we declare this either way. One thing is for sure: the Braves didn't think the effect was real, otherwise they'd have given him a mega contract. All eyes on Baltimore for the next few years.
The Moneyball effect
Moneyball is perhaps the most famous baseball book of recent times. The book is about how Billy Beane and his cohorts took a small-market franchise to the top by trying to exploit market inefficiencies. Perhaps the most famous example was the undervaluation of on-base percentage.
Beane and his team simply looked for high OBP players and were able to acquire them on the cheap. Hey presto ... you've got a pennant-winning team! JC charts this phenomenon in his "innovating to win" chapter. JC points to a study by Hakes and Sauer that confirms the inefficiency but also claims that it didn't last long. JC quotes this piece from the Hakes and Sauer paper:
The diffusion of statistical knowledge across a handful of decision making units in baseball was apparently sufficient to correct the mispricing of skill. The underpayment of the ability to get on base was substantially if not completely eroded within a year of Moneyball's publication.
A year? Really?
You can find the Hakes-Sauer study here. The study runs a regression of the log of salary on OBP, SLG and a couple of other variables for each year between 2000 and 2006. What they found was that between 2000 and 2003 SLG was valued more highly than OBP. Guess what? In 2004 that was reversed, and that trend held up in 2005 (and to a lesser extent in 2006). QED?
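For readers who want to see the shape of such a regression, here is a minimal sketch using NumPy least squares. It is not the Hakes-Sauer specification: the function name, the two-regressor design and the made-up inputs are all assumptions for illustration, and the study's other control variables are left out.

```python
import numpy as np


def fit_log_salary(obp, slg, salary):
    """Ordinary least squares of log(salary) on OBP and SLG.

    Returns (intercept, obp_coef, slg_coef). Hypothetical helper, not the
    authors' code; their additional control variables are omitted.
    """
    obp = np.asarray(obp, dtype=float)
    slg = np.asarray(slg, dtype=float)
    y = np.log(np.asarray(salary, dtype=float))
    # Design matrix: a column of ones for the intercept, then OBP and SLG.
    X = np.column_stack([np.ones_like(obp), obp, slg])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(coefs)


# Made-up, noise-free "players" whose salaries follow a known rule, so the
# fit should hand back the coefficients used to build them (12, 3 and 2).
obp = [0.300, 0.340, 0.360, 0.320, 0.380]
slg = [0.400, 0.450, 0.420, 0.500, 0.480]
salary = [np.exp(12.0 + 3.0 * o + 2.0 * s) for o, s in zip(obp, slg)]

intercept, obp_coef, slg_coef = fit_log_salary(obp, slg, salary)
```

Fitting one season at a time and comparing the OBP and SLG coefficients from year to year gives the kind of comparison the study reports.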
Perhaps, but it isn't quite the slam dunk we're led to believe (in my opinion). One issue is whether the market could really correct salaries in just one year. That's possible, but consider for a second how many players renegotiate their contracts every year: only a fraction (big-name free agents and journeymen). If the Hakes-Sauer assertion were true we'd expect to see the gap close slowly.
In fact what we actually see is overcompensation for the OBP effect in 2004 before a moderation in 2005 and 2006. Because the study looks at all players, it weights every salary equally when estimating the value of OBP and SLG, whether the contract was signed that winter or years earlier, which is incorrect.
Another issue is that we shouldn't use actual performance data to work out how teams make salary decisions, because these data are subject to a lot of random fluctuation. What we should really do is regress historic data to a mean to arrive at a projection for the player in question, as this is how ballclubs make salary decisions. If a player has an OBP above his career norm in a given year, the odds are it will be lower the following year. Here is a great intro to the phenomenon.
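One common way to regress an observed rate toward the mean is to blend it with the league average, weighting the player's own number by how many plate appearances stand behind it. The sketch below assumes that approach; the function name and the `stabilizer_pa` constant are illustrative assumptions, not figures from the book or from any club's actual method.

```python
def regress_to_mean(observed: float, pa: int, league_mean: float,
                    stabilizer_pa: float) -> float:
    """Shrink an observed rate toward the league mean.

    Equivalent to adding `stabilizer_pa` plate appearances of
    league-average performance to the player's record.
    Hypothetical helper for illustration.
    """
    if pa < 0 or stabilizer_pa < 0 or pa + stabilizer_pa == 0:
        raise ValueError("need a positive total of plate appearances")
    return (observed * pa + league_mean * stabilizer_pa) / (pa + stabilizer_pa)


# Illustrative inputs only: a .400 OBP over 600 PA, a .330 league mean,
# and an assumed 300 PA of league-average ballast.
projected_obp = regress_to_mean(0.400, 600, 0.330, 300)
print(f"projected OBP: {projected_obp:.3f}")
```

The projection always lands between the observed figure and the league mean, and it sits closer to the observed figure the more plate appearances the player has.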
Finally, it is slightly disingenuous to use SLG and OBP in the same regression. The fact is that these two stats are linked (hits count toward both), and the larger variance of SLG means that it is more likely to capture some of the OBP effect. The authors would have been better off using OBP and ISO and rebuilding the data from that (or even better singles, doubles, triples, homers, walks etc.).
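The overlap is easy to see from the standard definitions. Hits sit in the numerator of both OBP and SLG, while isolated power (ISO, slugging minus batting average) strips the credit for simply getting a hit out of SLG. A short sketch, with the function name and the sample stat line invented for illustration:

```python
def rate_stats(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (AVG, OBP, SLG, ISO) from counting stats."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = total_bases / ab
    iso = slg - avg  # extra bases per at-bat
    return avg, obp, slg, iso


# Invented stat line, not a real player.
avg, obp, slg, iso = rate_stats(ab=500, h=150, doubles=30, triples=5,
                                hr=20, bb=60, hbp=5, sf=5)
print(f"AVG {avg:.3f}  OBP {obp:.3f}  SLG {slg:.3f}  ISO {iso:.3f}")
```

Because every hit raises both OBP and SLG, the two regressors move together; ISO only moves on extra-base hits, which is why it pairs more cleanly with OBP.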
There is no doubt that something a little odd happened to salary in 2004—for a start total payroll fell that year for the first time since 1986. Whether the Moneyball hypothesis accounts for anything is at this point still unclear.
The final chapter I want to take a look at is called "The Legendary Power of the On-Deck Hitter". For my money this is the best chapter in the book. (The full study can be found here.)
Standard theory suggests that a batter with a good on-deck hitter behind him will more likely see a pitch he can drive because the hurler is desperately trying not to walk the batter currently in the box. The corollary of this is that if the on-deck hitter is mediocre to poor then the batter in front is more likely to be walked. We see this daily in the National League as the hurler comes to the plate.
Bradbury and one of his colleagues, Doug Drinen, ran a regression on batting average and a ton of other variables (OPS, score, bases, outs, innings, batter quality, pitcher quality) and concluded that a one standard deviation increase in the OPS of the on-deck hitter led to a 1 percent drop in batting average for the current hitter, a 2.6 percent drop in walk rate and a 3.7 percent drop in extra-base hits (including a 3 percent drop in home runs).
The argument is that with a David Ortiz on deck it is critical to get the current batter out. What that means practically is that the hurler will dial up his effort to try to strike out the batter (in effect reducing the potential impact of Ortiz's at bat). This goes against conventional wisdom. Are Drinen and Bradbury right?
The study seems robust, and while the authors don't take into account all the interaction effects, it is likely that the results would stand even if they did. It certainly seems plausible that pitchers can expend more effort when they need to strike out a particular batter.
I'd like to see two further tests to confirm the hypothesis. First I'd like to see the split between starters and relievers. Relievers will always be throwing closer to maximum effort as they pitch for shorter stints whereas starters are more likely to vary effort more. In other words the effect should either be much reduced or disappear when looking at relievers.
Second we could use pitch f/x data to look at the critical at-bats where a good on-deck hitter is present. With pitch f/x we could see if hurlers throw faster and hit the strike zone more consistently.
All in all I thoroughly enjoyed this book. It makes you think about the studies and what the potential pitfalls might be. It is a shame that Bradbury didn't include more details of each study in the book but given his target audience that isn't too surprising (this is Freakonomics for baseball). However, the majority of the studies are available on the Internet for us to look at.
Here's hoping for a sequel ... come on JC, whaddya say?
References and Resources
JC Bradbury blogs at Sabernomics. He has also written for the Hardball Times. Also thanks to various contributors at The Book blog and Sabermetric Research who have been invaluable in stimulating debate on these topics.
John is an unashamed glory supporter having followed the Atlanta Braves since 1991. He blogs the Braves at Chop-n-Change. He welcomes comments, criticisms and suggestions via e-mail