Bases and outs ad nauseam

by Brandon Heipp
April 22, 2008

Bases are of the highest importance, competing with outs for the production of the sport’s gold: runs. The object of the game is circling the bases before the third out is made. To attain the highest number of bases while compiling the fewest number of outs is each batter’s dream.
—Barry Codell, “The Base-Out Percentage: Baseball’s Newest Yardstick,” 1979 SABR Baseball Research Journal

One of the common complaints about sabermetrics, raised by proponents and opponents of the field alike, is that there are just too many measures out there that evaluate offensive performance. We have OPS, Runs Created, Base Runs, Batting Runs, Extrapolated Runs ... the list goes on.

It was a simpler time when Codell introduced his Base-Out Percentage in 1979. While some of the offensive metrics mentioned above already had been introduced, many others had not. After all, it would be another three years until Bill James thrust sabermetrics into the spotlight.

Codell’s idea was that offensive productivity could be measured by a ratio of bases to outs. He expressed this idea mathematically in Base-Out Percentage:

BOP = (TB + W + HBP + SB + SH + SF)/(AB – H + SH + SF + CS + GDP)

Around the same time, noted sportswriter Tom Boswell, writing in Inside Sports, introduced his own statistic, Total Average:

TA = (TB + W + HBP + SB)/(AB + W + HBP + SB + CS)

Boswell’s creation was essentially (notwithstanding the inclusion of stolen base attempts in the denominator) a ratio of bases to plate appearances, as opposed to outs as Codell proposed. By 1981, though, Boswell had revised his formula for Total Average:

TA = (TB + W + HBP + SB)/(AB – H + CS + GDP)

The revised Total Average was identical to Base-Out Percentage, except it ignored sacrifices.
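To see how closely the three formulas above track one another, here is a minimal Python sketch that computes each of them from a single stat line. The `BattingLine` class, the function names and the sample numbers are all assumptions made for illustration; none of them come from Codell or Boswell.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for one batter, named after the abbreviations in the formulas."""
    ab: int   # at bats
    h: int    # hits
    tb: int   # total bases
    w: int    # walks
    hbp: int  # hit by pitch
    sb: int   # stolen bases
    cs: int   # caught stealing
    sh: int   # sacrifice hits
    sf: int   # sacrifice flies
    gdp: int  # grounded into double play


def base_out_percentage(s: BattingLine) -> float:
    """Codell's BOP: bases (including sacrifices) per out."""
    bases = s.tb + s.w + s.hbp + s.sb + s.sh + s.sf
    outs = s.ab - s.h + s.sh + s.sf + s.cs + s.gdp
    return bases / outs


def total_average_original(s: BattingLine) -> float:
    """Boswell's first TA: bases per plate appearance plus steal attempts."""
    bases = s.tb + s.w + s.hbp + s.sb
    opportunities = s.ab + s.w + s.hbp + s.sb + s.cs
    return bases / opportunities


def total_average_revised(s: BattingLine) -> float:
    """Boswell's revised TA: bases per out, with sacrifices ignored."""
    bases = s.tb + s.w + s.hbp + s.sb
    outs = s.ab - s.h + s.cs + s.gdp
    return bases / outs


# Hypothetical stat line, invented purely for illustration.
example = BattingLine(ab=500, h=150, tb=250, w=60, hbp=5,
                      sb=10, cs=5, sh=2, sf=5, gdp=10)

print(f"BOP           = {base_out_percentage(example):.3f}")
print(f"TA (original) = {total_average_original(example):.3f}")
print(f"TA (revised)  = {total_average_revised(example):.3f}")
```

For a line like this one, BOP and the revised TA differ only through the handful of sacrifices added to both the numerator and the denominator, while the original TA lands on a different scale because its denominator is opportunities rather than outs.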

TA and BOP were each discussed by John Thorn and Pete Palmer in their seminal work, The Hidden Game of Baseball. The book (as well as later Thorn/Palmer collaborations like Total Baseball) included lifetime and seasonal Total Average leaders in its statistical section, alongside OPS, RC, Batting Runs and a number of other metrics.

The mothers of reinvention

So far, so good. The bases-to-something ratio had established a niche as a little-used alternative offensive metric. Then, over the next 20 years and continuing to this day, something else happened: The statistic kept being “re-invented.” Numerous authors, bloggers and message board posters have come along with their own versions—each seemingly unaware of the existence of BOP or TA.

One can speculate on why this might be. Perhaps the idea of bases as a building block of baseball is so fundamental that it rings true to numerous fans. Perhaps it is that TA and BOP never caught on among sabermetricians (let alone in the mainstream) as did an alternative metric, OPS. People simply may not be aware of the existence of those metrics, while they don’t re-invent OBP + SLG because they are aware of OPS. Or perhaps people simply are lazy and are not interested in spending a few minutes learning about pre-existing metrics.

Whatever the reason, the proliferation of base-to-out and base-to-plate-appearance ratios continues. To make my point, here is a list of nine of these duplicates (organized alphabetically by author’s name):

1. Paul Adel posted his Offense Ratio on the Internet in the early part of the decade (his site no longer seems to be available):

OR = (TB + W)/(AB – H)

2. Bill Gilbert has posted his Bases per Plate Appearance metric in various places online for many years now:

BPA = (TB + W + HBP + SB – CS – GDP)/(AB + W + HBP + SF)

3. Stephen Grimble, in Setting the Record Straight: Baseball’s Greatest Batters, used Base Production Average as an alternative to his main statistic, which was based on runs scored and RBI:

BPA = (TB + W + SB – CS)/(AB + W)

4. In a 1994 book, Essential Baseball, Norm Hitzges and Dave Lawson (now a statistical analyst for the Milwaukee Brewers) used TOPR as their main measure of offensive productivity (they also broke TOPR down into components like power and walks, and used the same formula to evaluate pitchers, calling it TPER):

TOPR = (TB + W + HBP + SH + SF + SB – CS – GDP) / (AB – H + SH + SF + CS + GDP)

5. In the 2000 Baseball Research Journal, Mark Kanter argued for the use of New Production as an alternative to OPS:

NewProd = (TB + W + HBP + CI)/(AB + W + HBP + CI + SH + SF)

6. Leo Leahy, in his book Lumbermen, used Bases to Out Ratio:

BOR = (TB + W)/Outs

7. John McCarthy, in his book Baseball’s All-Time Dream Team, introduced Earned Base Average:

EBA = (TB + W + SB – CS)/(AB + W)

8. Craig Messmer, in a recent book entitled Stat One, uses what he calls P/E Ratio to evaluate batters. One of the two components of P/E Ratio is “complete bases” (TB + W + HBP + SB – CS) divided by plate appearances.

9. In the 1996 Baseball Research Journal, Lawrence Tenbarge wrote about his own Earned Base Average (apparently unrelated to McCarthy’s except in name):

EBA = (TB + W)/PA

This summary has focused solely on inventions of the statistic that appeared in books or prominently on the Internet. The list could be made much longer if I included proposals floated on message boards, sites with user diaries and other lower-profile outlets.

The amazing thing about this to me is that, as far as I can tell, all of these were presented as new ideas and new statistics. There were no disclaimers along the lines of “Statistic X is similar to Barry Codell’s Base-Out Percentage and Tom Boswell’s Total Average. However, I have decided to give it a different name because I excluded stolen base attempts and hit by pitch from the equation.” Quite the opposite: Some of the metrics were introduced breathlessly by their respective authors as a completely new way to look at the game.

This review of the history should make two things obvious. First, the idea of a base-to-out or base-to-PA ratio appeals to a lot of baseball observers, yet it inevitably fails to gain traction in the sabermetric community. Second, there is absolutely no need for any more metrics of this family to be “invented.”

So what are we trying to measure?

Why all of this emphasis on bases anyway? After all, isn’t the true goal of an offense to score runs? Of course, everyone recognizes that to score a run, a batter/runner must touch all four bases. Does that fact require us, though, to consider moving from first to second to be as valuable as reaching first base to begin with?

Let’s take a step back from that idea, and consider how TB + W + HBP + SB counts bases. It tells us that a single, a walk, a hit batter and a stolen base are all worth one base. This is obviously true if we consider only the man hitting the single or stealing the base. What about the baserunners, though? If there is no one on base, then a single will account for only the one base gained by the batter. With a runner at first, a single will account for at least two and quite often three bases.

All of the base-based measures above ignore the existence of baserunners, and implicitly assume that all plate appearances occur in bases-empty situations (or alternatively posit that the effects of a batter’s actions on the other baserunners are meaningless in evaluating his performance). What if instead we look at how many bases actually are accounted for, on average, by each event?

Thanks to the data collected by Retrosheet, here are the figures for 1999-2002 (kindly figured by Tango Tiger):

Event           Average Bases
Walk                   1.39
Hit Batter             1.48
Single                 1.83
Double                 3.23
Triple                 4.46
Home Run               5.41
Stolen Base            1.10

From this table, we confirm our intuition that singles account for more bases than walks. We also see that the number of bases actually produced does not climb nearly as steeply from single to double to triple to home run as the 1-2-3-4 total base scale implies.

To make better sense of those numbers, allow me to express them relative to the number of bases produced by the average single (1.83). For example, the double value in the table below will be 3.23 divided by 1.83 equals 1.77:

Event     Bases relative to Single      ?
Walk               0.76               0.68
Hit Batter         0.81               0.68
Single             1.00               1.00
Double             1.77               1.64
Triple             2.44               2.23
Home Run           2.96               3.00
Stolen Base        0.60               0.36

Rather than being equal in base production to a single, walks contribute about three-fourths as many bases. Home runs produce about three times as many bases as a single, not four. Again, the deviation from the simple total base relationship is a consequence of considering the bases advanced by baserunners in addition to those gained by the batter.
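The rescaling is simple division, and a short Python sketch reproduces the relative-to-single values from the average bases figures quoted above. The variable names are my own choices for illustration; the inputs are the 1999-2002 figures from the first table.

```python
# Average bases accounted for by each event (1999-2002 figures from the first table).
average_bases = {
    "Walk": 1.39,
    "Hit Batter": 1.48,
    "Single": 1.83,
    "Double": 3.23,
    "Triple": 4.46,
    "Home Run": 5.41,
    "Stolen Base": 1.10,
}

# Express every event relative to the average single, rounded to two decimals.
single = average_bases["Single"]
relative = {event: round(bases / single, 2) for event, bases in average_bases.items()}

for event, value in relative.items():
    print(f"{event:<12} {value:.2f}")
```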

I also included a mystery column in the table. You can see that the values in the mystery column are strongly related to the corresponding relative base values.

The mystery column is the linear weight value of each event relative to that of a single (I used Mitchel Lichtman’s Super Linear Weights, but other linear weight methods would give similar results). Once one accounts for all of the bases resulting from a given event, the relative values of the respective events are very similar to linear weights.
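One rough way to put a number on how closely the two columns track each other is a Pearson correlation. The sketch below is only illustrative: the `pearson` helper is written by hand for this example, and with just seven data points the result should be read as a description of the table, not as a statistical test.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns of the second table, in row order:
# Walk, Hit Batter, Single, Double, Triple, Home Run, Stolen Base.
relative_bases = [0.76, 0.81, 1.00, 1.77, 2.44, 2.96, 0.60]
relative_linear_weights = [0.68, 0.68, 1.00, 1.64, 2.23, 3.00, 0.36]

r = pearson(relative_bases, relative_linear_weights)
print(f"correlation = {r:.3f}")
```

The coefficient comes out very close to 1. The stolen base shows the largest relative gap between the two columns, which fits the point that advancing an extra base is worth less than reaching base in the first place.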

So why not just use linear weights? After all, the relative values of the events are nearly the same, and the linear weight values are based on changes in expected runs scored rather than just bases gained. All bases are not created equal; the average value of reaching first base is greater than the average value of stealing second. An additional benefit of linear weight values is that they are expressed in runs, and run scoring, not base compiling, is the true objective of an offense.

Those analysts who started by considering bases weren’t completely off base, especially when many recognized the importance of outs and constructed a base-out ratio. However, as the concept is refined and improved, the logical conclusion is the linear weight approach used by analysts from George Lindsey and Pete Palmer in the 1960s all the way to today.

As a final plea: surely we can all agree that the world has seen enough introductions of differently named, historically oblivious bases/something ratios, can’t we? Thirty years of reinventing the wheel should be sufficient.
