In 1976, David Shoebotham published an article in SABR’s Baseball Research Journal on Relative Batting Average, which he developed to contextualize batting averages. Relative batting average took the form of individual BA divided by league BA (Shoebotham actually removed the individual’s statistics from the league totals before calculating the league average). Since then, relative indices of this kind have become a frequently used tool. These measures are easily interpreted, as 1.0 or 100 percent represents league average (typically the percentage sign is omitted and the decimal point is dropped, so 100 is the average), and they can be adapted for use with almost all metrics.
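Shoebotham's calculation can be sketched in Python as below. The function name and the season totals are hypothetical, chosen only to illustrate the arithmetic of removing the player from the league baseline.

```python
def relative_batting_average(hits, at_bats, lg_hits, lg_at_bats):
    """Player BA divided by the BA of the rest of the league.

    The player's own hits and at-bats are removed from the league totals
    before the league average is computed, as Shoebotham did.
    """
    player_ba = hits / at_bats
    rest_of_league_ba = (lg_hits - hits) / (lg_at_bats - at_bats)
    return player_ba / rest_of_league_ba


# Hypothetical season: 200 H in 600 AB, league totals of 13,000 H in 50,000 AB.
rba = relative_batting_average(200, 600, 13_000, 50_000)
print(f"Relative BA: {rba:.3f} (displayed as {round(rba * 100)})")
```

For these made-up totals the script prints a relative BA of 1.286, displayed as 129.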
The most common relative measures of overall performance were developed by Pete Palmer and popularized first through his books (co-authored with John Thorn) The Hidden Game of Baseball and Total Baseball, and later made widely available by Baseball-Reference. For batters, Palmer used Normalized OPS or Production+ (later called OPS+) and for pitchers, ERA+. Thirty years after the publication of The Hidden Game, these two metrics remain the most frequently used relative measures in sabermetrically informed discussions. However, alternatives to these old standbys are now readily available through other websites, and they are gaining traction.
Faced with the choice of which of these metrics to use, it is important to understand their key properties and how they are constructed. I will endeavor to compare ERA+ and OPS+ to their FanGraphs counterparts in this article. My objective is not to tell you which metrics are “better” — ultimately that decision depends on a number of considerations which may not be constant across users or uses. However, I will express my thoughts on particular properties of the metrics that I find preferable.
ERA+ is calculated as League ERA/ERA. A pitcher with a 3.00 ERA in a league in which the average is 4.00 has an ERA+ of 1.33 (usually displayed as 133). The rationale for putting the individual’s figure in the denominator is that such a construct results in a higher ERA+ representing better performances, since lower is better for ERA. At first blush, this seems like a reasonable enough decision. Since the new metric is going to be expressed as a unitless ratio, why not satisfy the aesthetic desire of many users to have a higher figure be better?
Unfortunately, the trick that enables a high ERA+ to be better than a low ERA+ carries along with it some undesirable side effects. To understand why this is the case, it’s helpful to take a close look at the formula:
ERA+ = League ERA/ERA = League ERA/(ER*9/IP) = League ERA*IP/(ER*9)
Putting the pitcher’s ERA in the denominator makes earned runs, rather than innings pitched as in unadjusted ERA, the denominator of the metric. The first consequence is that one cannot compute an average ERA+ for multiple seasons or multiple pitchers by weighting by innings pitched, which would be the most intuitive procedure. Instead, any weighting must be done by (park-adjusted) earned runs.
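To make the weighting point concrete, here is a minimal Python sketch. The helper name and the two seasons are hypothetical, park adjustments are ignored, and the multi-season figure is assumed to be league-rate earned runs divided by actual earned runs (the pooled equivalent of League ERA/ERA).

```python
def era_plus(er, ip, lg_era):
    """ERA+ = League ERA / ERA = League ERA * IP / (ER * 9)."""
    return lg_era * ip / (er * 9)


# Hypothetical seasons: (earned runs, innings pitched, league ERA).
# No park adjustment is applied in this sketch.
seasons = [(50, 200.0, 4.00), (90, 180.0, 4.50)]
season_plus = [era_plus(er, ip, lg) for er, ip, lg in seasons]

# Pooled figure: earned runs expected at league rates divided by actual earned runs.
expected_er = sum(lg * ip / 9 for er, ip, lg in seasons)
actual_er = sum(er for er, ip, lg in seasons)
pooled = expected_er / actual_er

# Weighting each season's ERA+ by earned runs reproduces the pooled figure...
er_weighted = sum(p * er for p, (er, ip, lg) in zip(season_plus, seasons)) / actual_er

# ...while the more intuitive innings-pitched weighting overstates it.
total_ip = sum(ip for er, ip, lg in seasons)
ip_weighted = sum(p * ip for p, (er, ip, lg) in zip(season_plus, seasons)) / total_ip

print(f"pooled {pooled:.3f}, ER-weighted {er_weighted:.3f}, IP-weighted {ip_weighted:.3f}")
```

Here the pooled and ER-weighted figures both come out to 1.278 (128), while the innings-weighted average is 1.409 (141), because the dominant season is also the one with the fewest earned runs.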
Scaling the metric in this manner makes it awkward to express what the result represents. Typically, an ERA+ of 160 will be translated into English as “the pitcher’s ERA was 60 percent better than league average.” What an ERA+ of 160 actually means is that the league ERA was 60 percent higher than the individual’s ERA. Having to express the result in this manner actually shifts the focus to how the league compared to the pitcher. But given that the pitcher in question is the one whose performance the user of the statistic is attempting to contextualize, it would make much more sense to focus on how the pitcher compares to the league. Putting the pitcher’s ERA in the denominator makes it the standard of comparison rather than the league ERA.
As a result, the meaning of ERA+ differentials across pitchers is unintelligible. There are two basic ways to compare a pitcher to the league or to another pitcher: using a ratio or a differential. Most sabermetric approaches opt for the ratio (for good reasons that are beyond the scope of this article), but differentials can also be useful in some circumstances.
Consider two pitchers in the same 4.00 ERA league, one (Pitcher A) with a 2.00 ERA and the other (Pitcher B) with a 5.00 ERA. They have ERA+s of 2.0 (200) and .80 (80), respectively. Of course, since the league context is the same in this example, we don’t really need ERA+ or any sort of relative metric to compare the two. Pitcher A allowed 40 percent as many runs per inning as Pitcher B (2.00/5.00) and three fewer runs per nine innings (5.00 – 2.00).
If we attempt to replicate these differences using ERA+, we can do so in the ratio case, although we have to flip the numerator and denominator: pitcher B’s ERA+ divided by pitcher A’s ERA+ (80/200 = 40 percent) confirms that pitcher A allowed runs at 40 percent of pitcher B’s rate. However, we cannot make an easy comparison using the difference in ERA+ between the two.
One might think that we could take the ERA+ differential and multiply by the league average ERA to find the difference, but this does not work: (2.0 – .80)*4.00 = 4.80, whereas the correct answer is three. In order to get that result, we would have to take the reciprocal of each (1/.80 – 1/2.0)*4.00 = 3. Thus ERA+ distorts cross-pitcher differential comparisons. Each additional point of ERA+ is not worth the same in terms of runs.
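The arithmetic in the last few paragraphs can be checked with a short Python sketch. Only the 4.00 league ERA and the two pitchers come from the example; the `runs_per_point` helper is a hypothetical addition showing how many runs per nine innings a single ERA+ point represents at different levels.

```python
lg_era = 4.00
era_a, era_b = 2.00, 5.00

plus_a = lg_era / era_a  # 2.0, displayed as 200
plus_b = lg_era / era_b  # 0.8, displayed as 80

# Ratio comparison works once numerator and denominator are flipped.
ratio = plus_b / plus_a  # 0.4: A allows runs at 40 percent of B's rate

# Differential comparison: the naive approach overstates the gap...
naive_gap = (plus_a - plus_b) * lg_era  # 4.8 runs per nine innings
# ...and taking reciprocals recovers the true ERA gap.
true_gap = (1 / plus_b - 1 / plus_a) * lg_era  # 3.0 runs per nine innings


def runs_per_point(plus, lg_era=4.00):
    """ERA change (runs per nine) from raising ERA+ by one point (0.01)."""
    return lg_era / plus - lg_era / (plus + 0.01)


print(runs_per_point(0.80))  # about 0.062
print(runs_per_point(2.00))  # about 0.010
```

In a 4.00 league, moving from 80 to 81 is worth roughly six times as many runs as moving from 200 to 201.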
There is an alternative approach to calculating a relative ERA. It is as easy to figure as ERA+, maintains the ability to make ratio comparisons between pitchers, and also permits differential comparisons between pitchers. The drawback in the eyes of some users is that it does not preserve the “bigger is better” property exhibited by ERA+. This approach is simply to divide ERA by league ERA rather than the inverse.
Using this construct addresses all three of the downsides of ERA+ I’ve discussed:
1. Innings pitched are in the denominator, meaning that any weighted averages can use innings pitched rather than earned runs for weighting.
2. The result can be stated in terms of the pitcher’s ERA relative to the league. A 2.00 ERA pitcher in a 4.00 ERA league has a figure of .50 or 50, which means he allowed runs at 50 percent of the league average rate.
3. Ratio comparisons are preserved, and differential comparisons are meaningful as well. In the example of pitcher A and pitcher B above, pitcher A now has a figure of .50 (2.00/4.00) and pitcher B of 1.25 (5.00/4.00). The differential (1.25 – .50) times the league ERA (4.00) yields three, the actual ERA difference between the two pitchers.
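The same example, recast as ERA divided by league ERA, can be sketched as follows. The innings and earned-run totals are hypothetical values chosen to produce the 2.00 and 5.00 ERAs, and the weighting check assumes both pitchers share the same league ERA, which is the case where the innings-weighted average equals the combined figure exactly.

```python
def era_minus(er, ip, lg_era):
    """ERA / League ERA; innings pitched stay in the denominator."""
    return (er * 9 / ip) / lg_era


lg_era = 4.00
# Hypothetical workloads that yield the example ERAs: (earned runs, innings).
pitcher_a = (40, 180.0)  # 2.00 ERA
pitcher_b = (50, 90.0)   # 5.00 ERA

minus_a = era_minus(*pitcher_a, lg_era)  # 0.50
minus_b = era_minus(*pitcher_b, lg_era)  # 1.25

ratio = minus_a / minus_b           # 0.4, the same ratio as the raw ERAs
gap = (minus_b - minus_a) * lg_era  # 3.0, the raw ERA difference

# With both pitchers in the same league, an innings-weighted average of the
# individual figures equals the combined ERA divided by the league ERA.
total_ip = pitcher_a[1] + pitcher_b[1]
ip_weighted = (minus_a * pitcher_a[1] + minus_b * pitcher_b[1]) / total_ip
combined = era_minus(pitcher_a[0] + pitcher_b[0], total_ip, lg_era)  # 0.75
```

No reciprocals are needed at any step, which is the practical payoff of keeping innings pitched in the denominator.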
The use of such an approach has a long history in sabermetrics. Bill James used what he called “Percentage of League” as his primary metric for cross-era pitcher comparisons in the original 1985 version of The Historical Baseball Abstract. James used all runs allowed rather than just earned runs, but the math is the same. Recently, this approach has gained traction through its adoption by FanGraphs, which has referred to the metric as “ERA-” in order to distinguish it from ERA+.
FanGraphs uses the ERA- approach for all its relative pitching metrics, offering users the choice of ERA-, FIP- and xFIP-. While the specific metric of choice may depend on the preferences of the user and the question at hand, the math underpinning the comparison to league average is the same regardless of which run average is chosen.
Shifting the focus to offensive measures, OPS+ falls victim to name-based confusion as well. Many assume that OPS+ would be calculated as the ratio of OPS to league OPS given its name, but in fact OPS+ compares On Base Average and Slugging Average to the league separately, then combines the two ratios:
OPS+ = OBA/LgOBA + SLG/LgSLG – 1
Rather than the expected relative OPS, OPS+ is actually relative OBA plus relative SLG minus one. While the name may be misleading, this construct has three favorable properties compared to relative OPS:
1. It gives greater weight to OBA than relative OPS does. Since league SLG is typically in the neighborhood of 20 percent higher than league OBA, treating the two ratios equally is comparable to giving OBA a 20 percent boost relative to SLG. Why is this a positive for OPS+?
2. It correlates better with runs scored. Multiple approaches have suggested that, in a run estimator of the form x*OBA + SLG, the optimal weight x is around 1.8. Thus the implicit weight of roughly 1.2 in OPS+ is an improvement over the equal weighting in OPS, although still less than ideal.
3. The scale matches runs scored. A team with an OPS+ that is 10 percent better than league average will tend to score 10 percent more runs than league average. The same is not true for relative OPS; a team whose OPS exceeds the league average by 10 percent can be expected to score 20 percent more runs than league average. In other words, relative OPS has an approximately 2:1 relationship with team runs scored. Thus OPS+ is directly interpretable in terms of relative runs scored, while relative OPS is not.
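A short Python sketch makes the implicit OBA weight and the scale difference visible. The league and team figures are hypothetical, with the league set so that SLG is exactly 20 percent higher than OBA; `relative_ops` is the hypothetical simple ratio that OPS+ is often mistaken for.

```python
def ops_plus(oba, slg, lg_oba, lg_slg):
    """OPS+ as defined above: relative OBA plus relative SLG minus one."""
    return oba / lg_oba + slg / lg_slg - 1


def relative_ops(oba, slg, lg_oba, lg_slg):
    """A simple ratio of OPS to league OPS, for comparison."""
    return (oba + slg) / (lg_oba + lg_slg)


# Hypothetical league in which SLG is 20 percent higher than OBA.
lg_oba, lg_slg = 0.320, 0.384

# Rearranging: (OPS+ + 1) * lg_slg = (lg_slg / lg_oba) * OBA + SLG,
# so OBA carries this weight relative to SLG.
implicit_oba_weight = lg_slg / lg_oba  # 1.2

# A hypothetical team 10 percent above the league in both OBA and SLG.
team_oba, team_slg = 1.10 * lg_oba, 1.10 * lg_slg
print(relative_ops(team_oba, team_slg, lg_oba, lg_slg))  # about 1.10
print(ops_plus(team_oba, team_slg, lg_oba, lg_slg))      # about 1.20
```

When OBA and SLG are both k times the league figure, relative OPS equals k while OPS+ equals 2k - 1, which is the 2:1 relationship described in point 3.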
While OPS+ is superior in these respects to a hypothetical relative OPS, it still leaves much to be desired as an overall measure of offensive productivity, most significantly the undervaluation of the on-base portion of the equation. OPS+ also does not account for stolen bases and caught stealing.
Over the last decade or so, a consensus has developed within the sabermetric community that the optimal way to measure individual offensive productivity using standard season summary statistics (as opposed to using play-by-play data to incorporate situational effects) is through a linear weights approach. OPS is a statistic of convenience, used because OBA and SLG are both already in use and serve fairly well as complements to each other. OPS+ improves the weighting, but still relies on the coincidental marriage and haphazard weighting within OBA and SLG. The rationale for using a linear weights approach may have been best expressed by Thorn and Palmer when OPS+ was introduced in The Hidden Game. Describing OPS as a “shadow stat” to linear weights, they note that it is “not expressed in runs and thus lacks the philosophical appeal of Linear Weights.”
Given the advances in the ease of computing metrics over the past 30 years, there is no longer any significant advantage to be gained by using a “shadow stat” like OPS+ as the flagship measure of offensive productivity on an influential reference site. While OPS remains a quick and dirty calculation (given that OBA and SLG are already figured), no one calculates OPS+ for themselves by hand to begin with, making its continued use at the expense of linear weight-based metrics puzzling.
An implementation of linear weights that has gained significant traction is Weighted On Base Average (wOBA), created by Tom Tango and popularized through its use in The Book and its dissemination on FanGraphs. wOBA expresses a player’s run contribution, including the value created by avoiding outs and thus generating additional opportunities for one’s team. It is built by subtracting the run value of an out (typically around -.3 runs when measured on a zero baseline basis) from the linear weight value of each positive offensive event. A scalar multiplier is then applied to achieve equality with On Base Average at the league level, and the result is divided by plate appearances to complete the conversion to an OBA scale. For 2013, the formula as used on FanGraphs was:
wOBA = (.888S + 1.271D + 1.616T + 2.101HR + .690W + .722HBP + .2SB – .384CS) / PA
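As a sketch, the 2013 formula translates directly into Python. This follows the formula exactly as printed above, including the plate-appearance denominator, and is not FanGraphs' own code; the function name and the season line are hypothetical.

```python
# Event weights from the 2013 formula printed above.
WOBA_WEIGHTS_2013 = {
    "S": 0.888, "D": 1.271, "T": 1.616, "HR": 2.101,
    "W": 0.690, "HBP": 0.722, "SB": 0.2, "CS": -0.384,
}


def woba(events, pa, weights=WOBA_WEIGHTS_2013):
    """Sum of weighted events divided by plate appearances.

    `events` maps the event codes used in the formula (S, D, T, HR, W, HBP,
    SB, CS) to counts; missing events count as zero.
    """
    return sum(w * events.get(code, 0) for code, w in weights.items()) / pa


# Hypothetical season line.
line = {"S": 100, "D": 30, "T": 3, "HR": 25, "W": 60, "HBP": 5, "SB": 10, "CS": 3}
print(f"{woba(line, 650):.3f}")
```

For this made-up line of 650 plate appearances, the script prints a wOBA of 0.354.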
The use of OBA as a scale is far from an inevitable choice: there are any number of alternative and substantially equivalent ways in which a linear weight-based metric could be expressed, including absolute runs per out (similar to Bill James’ RC/27), runs above average per plate appearance, and a rate scaled to batting average rather than OBA (as Baseball Prospectus’ True Average is). Regardless of one’s scale preference for the initial raw metric, it’s imperative to consider how the relative version will be constructed.
Due to the modifications made to the event weights, a simple ratio of wOBA to league wOBA would not reflect the expected difference in run production (just as OPS/LgOPS does not). The conversion of wOBA to a relative measure (wRC+) is done by first computing a batter’s runs above average (wRAA):
wRAA = (wOBA – LgwOBA)/scalar * PA
Here, “scalar” is the scalar multiplier described earlier, generally around 1.25 (for 2013 it was 1.277).
From there, wRAA/PA is divided by the league average R/PA. This results in the batter’s runs above average per plate appearance as a percentage of the league average. Adding one converts the scale to the familiar one in which one (or 100 if the decimal point is dropped) represents league-average:
wRC+ = (wRAA/PA)/Lg(R/PA) + 1
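The two steps can be sketched in Python. The 1.277 scalar comes from the text; the league wOBA and league runs per plate appearance are hypothetical placeholders, the function names are illustrative, and, like the formulas above, the sketch applies no park adjustment.

```python
def wraa(woba, lg_woba, scale, pa):
    """Runs above average: (wOBA - league wOBA) / scale * PA."""
    return (woba - lg_woba) / scale * pa


def wrc_plus(woba, lg_woba, scale, pa, lg_runs_per_pa):
    """(wRAA / PA) / league R/PA + 1, with no park adjustment."""
    return (wraa(woba, lg_woba, scale, pa) / pa) / lg_runs_per_pa + 1


SCALE_2013 = 1.277      # the 2013 scalar quoted above
LG_WOBA = 0.314         # hypothetical league wOBA
LG_RUNS_PER_PA = 0.110  # hypothetical league runs per plate appearance

runs = wraa(0.380, LG_WOBA, SCALE_2013, 600)
index = wrc_plus(0.380, LG_WOBA, SCALE_2013, 600, LG_RUNS_PER_PA)
print(f"wRAA {runs:.1f}, wRC+ {round(index * 100)}")
```

With these placeholder inputs the script prints a wRAA of 31.0 and a wRC+ of 147. A hitter whose wOBA equals the league wOBA gets exactly 1.0 (100) regardless of playing time, which is the league-average anchor described above.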
An additional upside is that switching from OPS+ to wRC+ requires less adaptation than moving from ERA+ to ERA-. The two metrics have the same intended scale and meaning; wRC+ simply offers improved weights and better theoretical grounding.
Improved weights and better theoretical grounding sound great to a sabermetrician, but what does it mean to an average user of the stat who doesn’t dive deep into the theory? The improved weights mean that wRC+ will correlate better with runs scored on the team level than OPS+, but the marginal improvement is limited — as Palmer and Thorn pointed out 30 years ago, OPS+ tracks linear weights fairly well and is nearly as accurate in estimating team runs scored.
However, the range of performance among major league teams is much narrower than the range of individual performance, so accuracy comparisons at the team level can mask flaws in a metric (such as OPS+ undervaluing OBA). These errors are magnified for individual players, whose ratios of OBA to SLG can depart substantially from the league norm.
Any number of players could be used to illustrate the subtle difference between OPS+ and wRC+, but Rickey Henderson is an obvious choice due to his extreme walk tendencies (second all-time in career walks) and his stolen base exploits, which are considered by wRC+ but not by OPS+. Henderson’s career OPS+ is 127, but his career wRC+ is 132. A difference like that may not seem particularly significant, but over a career the length of Henderson’s, it can be equivalent to several wins, a distortion that can be avoided by simply starting with a different metric.