Sabermetric Book Reviews

It's hard to read multiple books at once, but there's no harm in trying (via Abhi Sharma).

Long-time readers of The Hardball Times may remember a statistic called GPA. It stood for Gross Production Average and it was the brainchild of THT co-founder Aaron Gleeman. The formula was (1.8 × OBP + SLG)/4, and the result looked like, and closely aligned with, batting average. It was designed to serve as a transition stat from the advanced-yet-flawed OPS to more advanced stats like wOBA.
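For the code-minded, here is a minimal sketch of that original formula in Python. The function name and the sample OBP and SLG values are made up for illustration; only the formula itself comes from the description above.

```python
def gross_production_average(obp: float, slg: float) -> float:
    """Original GPA: weight OBP by 1.8, add SLG, and divide by 4
    so the result lands on a batting-average-like scale."""
    return (1.8 * obp + slg) / 4


# Hypothetical hitter with a .360 OBP and a .480 SLG.
print(f"{gross_production_average(0.360, 0.480):.3f}")  # 0.282
```

The division by four is what pulls the number down into the familiar batting-average range.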

GPA served its purpose for a while, but it turns out that the baseball community isn’t really interested in transition stats. There seem to be two types of baseball stat nerds out there: those who like the old stats no matter what, and those who want to jump straight to the most advanced stats. There aren’t many people in between. So we put GPA in mothballs a few years ago.

This year, however, GPA is making a comeback. David P. Gerard has written a new book called Baseball GPA (the GPA here stands for Gross Productivity Average; slight difference) and it is indeed a more advanced statistic. In fact, it will take me a couple of paragraphs to explain it.

You know that run-scoring table that captures the average number of runs scored from the 24 different base-out situations? The one that looks like this?

    MenOn     Number of Outs
     FST       0      1      2
     ---    .482   .258   .096
     x--    .853   .510   .211
     -x-   1.095   .646   .293
     xx-   1.494   .907   .423
     --x   1.356   .940   .377
     x-x   1.804  1.151   .470
     -xx   2.169  1.418   .598
     xxx   2.429  1.549   .745

FST stands for runners at first, second and third, and the numbers are the number of runs that scored, on average, in that situation in a given year and league (in this case, the American League in 1992). For example, the average team scored 0.94 runs with a runner on third and one out. This table forms the foundation for determining the weights given to singles, doubles, triples, etc. in wOBA, wRC and many other advanced stats.

For those stats, we typically calculate the average impact each type of baseball event had on run scoring: the change in runs that scored and/or were projected to score from before the event to after it. For instance, home runs usually add about 1.4 runs in impact when you factor in all the different situations in which batters hit home runs. When calculating stats for individual players, we then multiply that weight by the number of events he contributed (1.4 times 20 home runs, for instance, results in 28 runs).*

*The weights are transformed for wOBA and its brethren stats, but we don’t have to get into that right now.

There’s another way to approach this. You could also take the specific outcome of every play and apply the difference in runs from the above table. That would allow you to capture how batters and pitchers performed in very specific situations.

The home run is worth about 1.4 runs when you average its impact across all 24 base/out states. But a batter who hits a home run with no one on has specifically contributed one run, while a batter who has hit a home run with the bases loaded and two outs has contributed about 3.35 runs. The “average” approach multiplies all home runs by 1.4; the “specific” approach multiplies each home run by a different number, depending on the base/out state in which it occurred.
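Here is a rough sketch of the “specific” approach in Python, using the 1992 American League table above. The dictionary layout and the function name are my own choices for illustration, not anything from Gerard's book. The value of a play is the runs that scored on it plus the change in run expectancy from the state before to the state after.

```python
# Run expectancy by runners (FST notation from the table) and outs (0, 1, 2),
# copied from the 1992 American League table above.
RUN_EXPECTANCY = {
    "---": (0.482, 0.258, 0.096),
    "x--": (0.853, 0.510, 0.211),
    "-x-": (1.095, 0.646, 0.293),
    "xx-": (1.494, 0.907, 0.423),
    "--x": (1.356, 0.940, 0.377),
    "x-x": (1.804, 1.151, 0.470),
    "-xx": (2.169, 1.418, 0.598),
    "xxx": (2.429, 1.549, 0.745),
}


def home_run_value(runners: str, outs: int) -> float:
    """Situation-specific run value of a home run (illustrative helper)."""
    runs_scored = runners.count("x") + 1      # every runner plus the batter
    before = RUN_EXPECTANCY[runners][outs]    # expectancy before the swing
    after = RUN_EXPECTANCY["---"][outs]       # bases empty, same number of outs
    return runs_scored + after - before


print(round(home_run_value("---", 0), 3))  # 1.0
print(round(home_run_value("xxx", 2), 3))  # 3.351
```

A solo shot is always worth exactly one run because the base/out state is the same before and after. The two-out grand slam scores four runs and trades a .745 expectancy for a .096 one, which is where the 3.35 comes from.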

To my knowledge, this idea was first proposed by Gary Skoog in Bill James’ 1987 Baseball Abstract. He called it the “Value Added” approach. FanGraphs readers now know it as RE24. Both stats measure the number of runs above/below average a batter contributed when you consider exactly when he performed each batting event. It’s a different and sometimes useful way to analyze performance.

David Gerard has added the wrinkle of converting situation-specific results into an average, the equivalent of batting average, just like the original GPA. In his book, Gerard sets the GPA standard weights by using the average of all base/out run values from 1997 to 2009 and setting the equivalent batting average to that from 2005 to 2008. He applies his math to all the Retrosheet years from 1952 to 2012. Then he goes to town.

He lists all the best seasons from 1952 to 2012, natch, both batters and pitchers. He uses GPA to develop ballpark factors. He uses the difference between GPA and batting average to define eras in recent baseball’s past (for the record, he identifies the years 1971-1984 as the Pitching Era and the years 1994-2004 as the Steroids Era).

Next he combines GPA with a powerful baseball simulator to try to answer some of baseball’s thorniest questions. How to construct a lineup? When to steal a base? When to attempt a sacrifice bunt? When to walk Barry Bonds?

The final section of the book includes a method by which Gerard incorporates GPA into a Wins Above Replacement (WAR) metric. He doesn’t fully create his own WAR because he doesn’t have a fielding component, but he goes into some interesting details (including his own Leverage Index approach). He also uses GPA to develop a free agent ranking system, review all past MVP and Cy Young awards voting and analyze Jim Rice’s 1984 season. So much great stuff in 250 pages.

Evidently, the concept of GPA first occurred to Gerard in the mid 1980s, when he was mulling over the true value of Rice’s year. A couple of decades later, after the Retrosheet Gang pulled together many years of box scores into a cogent database, he was able to grab the data and publish his book.

You can tell he’s been mulling over this for a while. Gerard uses GPA to address many of the things Thorn and Palmer addressed in The Hidden Game of Baseball or those that Tango, MGL and Dolphin addressed in The Book. The question is…how well does he achieve his goals?

First, let’s say that GPA is a good and legitimate statistic. It’s not for everyone, because some players are presented with more opportunities than others within a given year (something that Matt Hunter addressed in his own version of this stat). But if you appreciate RE24, there’s no reason you shouldn’t like GPA. One of the nice things about both stats is that you can apply them equally to batters and pitchers.

Gerard appears to be only casually familiar with sabermetric advances on the Internet. As a result, he creates a number of things that could have been better informed by similar work on the Web. On the other hand, he sometimes takes a different approach to a subject that many of us may not have considered yet.

His approach to Leverage Index, for instance. While Gerard doesn’t seem to know of Tom Tango’s work (he certainly doesn’t acknowledge it), he takes a worthwhile stab at the issue by assessing the relative impact runs have on the potential change in Win Probability within a given situation. The logic is independent of, but very similar to, Tango’s and I learned some things from it.

Since GPA already incorporates the base/out situation, Gerard calculates Leverage Index only to the inning. By multiplying the player’s GPA by his version of the Leverage Index, he comes up with a roundabout Win Probability Added metric (which he applies only to relievers). It’s a different approach, but legitimate in its own way.

Reading the book, you may find other interesting angles in some of his analyses, such as ballpark adjustments and lineup construction. In the end, however, Gerard’s fundamental findings don’t appear to be very different from what we believe today. Coors Field is still a hitter’s paradise. Barry Bonds was still walked way too often. But there is something to be learned from a person who approaches a problem in a different manner. Gerard’s book is definitely worth a read for those who like to dive into the details.

Having said that, there are some flaws here. About once a chapter, Gerard says things that make you scratch your head, such as the assertion on page 11 that batting average undervalues a home run by 99 percent since it values a single and a home run equally. Say what?

Gerard also doesn’t always present his results in a logical fashion. For instance, he has a habit of presenting the simple version of an analysis and then adding layers of complexity. This is okay, but you tend to say “Yeah, but what about…” when reading his book, only to discover that he addresses your issue in the next version of his analysis. You might call this the “deductive” style of writing, and it usually doesn’t serve a writer well. Memo to all research writers: Let readers know where you’re going. Give the answer first, not last.

Plus, this book is thick with tables…tables that are 10 columns wide and 20 rows long. He is the type of writer who likes to show you everything, but then makes it difficult to pull out the key information. It can be hard to follow his logic at times.

I wonder sometimes about the people who publish these books.  Do they read them for coherency and understanding? When books are dense and hard to read, do they realize what they’re asking of their readers? Michael Humphreys’ book about fielding stats—Wizardry—was a fine book with some intriguing ideas, but it was also very hard to read. Michael could have used a strong editor and publisher, but I guess no one was interested. I sometimes think that the publishers of these books don’t get it themselves and just don’t try.  Margins are low in the book industry. Who’s got the time?

I don’t mean to end on a down note, however. Gerard is to be congratulated for his impressive and deep thinking, analysis and work. If you have a hankering for the sabermetric details, I recommend this book.


Meanwhile, Benjamin Baumer and Andrew Zimbalist have written a book that summarizes the recent history and current state of sabermetrics.  It’s called The Sabermetric Revolution: Assessing the Growth of Analytics in Baseball. Unlike Baseball GPA, this book is an easy read that doesn’t really provide anything new for the long-time sabermetrician, except for the last section.

In 150 pages, the two authors…

  • Recount the truths and untruths of Moneyball, the book and movie
  • Trace the history of sabermetrics
  • Give an overview of current sabermetric thinking
  • Discuss how sports analytics has grown in other sports
  • Assess baseball’s competitive balance between teams
  • Estimate the impact of sabermetrics on the game.

Their credits are strong. Baumer once made headlines by being the first sabermetrician hired by the Mets. He now teaches statistics at Smith College, where Zimbalist is an economics professor. Zimbalist already has an impressive list of baseball business books to his credit. And most of the points they make in their book are spot on.

Having said that, the two authors tread familiar ground. Most people who follow sabermetrics will be very familiar with what’s in this book. The new material, at least for me, was in the final chapter (and the related appendix), in which the authors use stats to assess the “saber-intensity” of teams and identify the potential impact of those stats on winning.  I thought they did a nice job with this analysis.

On the down side, The Sabermetric Revolution reads like a textbook (I said it’s an easy read, not a fun read) and there is one citation error that really irritates me, for totally selfish reasons. On page 29, they say that Carlos Gomez (once Arizona’s director of international scouting) made his mark writing for Baseball Prospectus. As far as I know, Carlos never wrote for Prospectus. He wrote for The Hardball Times (and, before that, Baseball Think Factory). In fact, their related footnote references a Hardball Times article he wrote.

That lapse aside, this is a quick and easy read with some new insights in the final chapter. It would also be a good read for the thousands of people who have misinterpreted Moneyball. Unfortunately, most of you don’t know who you are.

References and Resources

  • Retrosheet for the 1992 American League run scoring table. God Bless Retrosheet.


Dave Studeman was called a "national treasure" by Rob Neyer. Seriously. Follow his sporadic tweets @dastudes.
Rally
10 years ago

Carlos Gomez still holds that title, he is currently with the Angels.

jim S.
10 years ago

Does Gerard say what that “powerful baseball simulator” is? The best baseball sim on the market is the one created by Tom Tippett, who now works for the Red Sox. It is called Diamond Mind, though he sold it about seven years ago and it hasn’t seen any substantial upgrades since.

bob
10 years ago

Although Amazon sells this book, I can’t recall them showing it to me when I was searching for books on advanced baseball statistics. For example, when I look at the Amazon page for “Baseball Between the Numbers,” this book is not on the list of “Customers Who Bought This Item Also Bought.” Linking from book to book this way is often how I find things. I found it after reading this article, of course, by using the book title, but Amazon would not have told me about it even though they know my interests.

Peter Jensen
10 years ago

Very thorough reviews of these books Dave. Thanks. And thanks for crediting Gary Skoog for his part in developing the RE matrix and the value added approach of player evaluation.
Gary was also the first to my knowledge to use a Markov chain in baseball analytics. People now don’t realize how difficult it was back in 1987 to make your baseball research public. Thanks for all you have done with The Hardball Times to make it easier today.

Ben Baumer
10 years ago

Dave, thanks for pointing out this mistake. I have notified the publisher so they can add this to the list of errata. I’m not sure how that one slipped by, given that we had the THT article cited in the endnote, but I can’t argue with the facts!

Thanks also for the (mostly) positive review!

Cliff Blau
10 years ago

As for RE24-type analysis, the first to use it as far as I know were the Mills Brothers in their 1970 booklet, Player Win Averages. See http://trace.tennessee.edu/cgi/viewcontent.cgi?article=1005&context=utk_harlan

Studes
10 years ago

The Mills Brothers wrote about WPA, not RE24, right?