Why wOBA Works

Calculating wOBA for players like Mike Trout or fictional cohort Al Trout is very simple (via Bryan Horowitz).


Johnson’s Scale

In the 1985 Bill James Baseball Abstract, Bill included an article from a guy named Paul Johnson who had developed his own version of Runs Created. From a sabermetric perspective, this was an important event for several reasons, but for me it all came down to the math.

Johnson called his formula Estimated Runs Produced and it’s a simple construct, really: positive batting events minus negative batting events (also known as outs). The trick is in the weighting. The formula goes like this:

(Two times (total bases plus walks plus HBP) plus hits plus stolen bases… minus
(.605 times (at-bats plus caught stealing plus GIDP minus hits))) times 0.16

I know the first part of the formula looks complicated, but it really isn’t.  As Johnson explained in the article, he had found that batting events follow a natural scale relative to each other in terms of their impact on run scoring. He just played around with math to find a way to replicate that scale in a formula.

The scale starts at 9 (the last single digit) and touches down on every odd number, throwing in “2” right before the end. In other words, it’s 9, 7, 5, 3, 2, 1. Each number stands for the relative weight of a different type of batting event.

Specifically…

  • Home run: 9
  • Triple: 7
  • Double: 5
  • Single: 3
  • Walk: 2
  • Stolen base: 1
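If you want to see the scale fall out of the formula, here is a minimal Python sketch. It follows the formula as worded above; the function and argument names are mine, not Johnson’s, and each "event" is just a stat line with a single event in it.

```python
def erp_raw(tb=0, bb=0, hbp=0, h=0, sb=0, ab=0, cs=0, gidp=0):
    """Johnson's formula before the final 0.16 multiplier (the 1-9 scale)."""
    positives = 2 * (tb + bb + hbp) + h + sb
    outs = ab + cs + gidp - h
    return positives - 0.605 * outs


def estimated_runs_produced(**line):
    """Estimated Runs Produced: the raw scale total times 0.16."""
    return 0.16 * erp_raw(**line)


# One isolated event per line (illustrative inputs, not real player data).
EVENTS = {
    "Home run": dict(tb=4, h=1, ab=1),
    "Triple": dict(tb=3, h=1, ab=1),
    "Double": dict(tb=2, h=1, ab=1),
    "Single": dict(tb=1, h=1, ab=1),
    "Walk": dict(bb=1),
    "Stolen base": dict(sb=1),
    "Out": dict(ab=1),
}

for name, line in EVENTS.items():
    print(f"{name:12s} {erp_raw(**line):7.3f}")
```

Total bases are counted twice and hits once, which is why each extra base adds two to the scale and every hit starts from three.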

Johnson’s scale had a big impact on me and I have never forgotten it. If you remember Johnson’s scale, you’ll remember that a home run is worth about three times as much as a single, or that a walk and a stolen base together are worth about as much as a single. What’s more, Johnson’s scale is simple and easy to communicate. I’ve used it many times to explain how offense really works, including this article.

Most important, it’s true.

Linear Weights

Another reason Johnson’s article represented an important sabermetric landmark was that it directly addressed an argument that dominated sabermetrics for over two decades: what is the best way to estimate the number of runs a batter has contributed to his team? The argument raged with a white-hot intensity for a long time, but a winner eventually emerged. We call it linear weights.

I don’t want to go into the entire history or rationale for linear weights (here is a good history), but you should know that it is behind most of our advanced stats today. wOBA is based on linear weights, as are wRC and wRC+. RE24 and WPA are based on the same logic as linear weights. UZR and DRS are essentially linear weights applied to fielding. WAR and WARP are based on linear weights, too.

So I’m going to multiply each number on Johnson’s scale by 0.16 (as in Johnson’s formula) and then compare the results to the linear weights that Tom Tango and friends published on page 26 of The Book (and which were based on the years 1999 to 2002). As you’ll see, there is virtually no difference between the two.

Johnson’s Scale Vs. Linear Weights

| Event | Scale | Times 0.16 | Linear Weights |
|---|---|---|---|
| Home Run | 9 | 1.44 | 1.40 |
| Triple | 7 | 1.12 | 1.07 |
| Double | 5 | 0.80 | 0.78 |
| Single | 3 | 0.48 | 0.48 |
| Unintentional Walk | 2 | 0.32 | 0.32 |
| Stolen Base | 1 | 0.16 | 0.18 |

Paul Johnson’s scale mimics linear weights almost perfectly (don’t forget that linear weights change somewhat from year to year). Next time you need to remember the relative value of batting events, just remember Johnson’s scale. Leave the linear weights to the spreadsheet.

I like to remind people of Johnson’s scale because I’ve noticed that some folks are getting confused about linear weights and the relative value of baseball events.  One reason they’re getting confused is due to an invention that Tango introduced two pages later in The Book.

But first, we need to talk about the value of an out.

The Value of an Out

Go back and look at Johnson’s formula to see how he valued an out. He multiplied each out by .605, which means that an out equals -0.6 (that’s negative 0.6) on his 1-9 scale. Allow me to draw the comparison here in a way that mimics the table I just showed you. Following is the value of an out in…

The Value of an Out

| Source | Value |
|---|---|
| Johnson’s scale | -0.605 |
| Multiplied by 0.16 | -0.097 |
| In Tango’s Linear Weights table | -0.299 |

In this case, there is a big difference between Johnson’s scale and Tango’s linear weights. In linear weights, the negative impact of an out is about three times as large as in Johnson’s formula. You may wonder why this is.

Johnson wanted a metric that followed the same scale as a team’s total runs scored. In linear weights, however, everything is based on average. Generally speaking, if you apply Johnson’s formula to a league’s stats, you’ll get the total number of runs scored by the league.  If you apply linear weights to a league’s stats, you’ll get zero.

There’s another way to explain the difference. There are essentially three types of negative impacts of making an out:

  1. Removing a runner who is on base, which occurs during a caught stealing or double play.
  2. Decreasing the value of a runner on base, because he now has fewer outs in which to score during the rest of the inning.
  3. Reducing the potential number of runs a team can score in a game, by reducing the number of outs left in the game.

The third aspect—the “ticking clock” part of making an out—is calculated by simply dividing runs by outs. For instance, in the major leagues last year, there were 20,255 runs scored in 43,653.1 innings pitched, or 130,960 outs.  Divide 20,255 by 130,960 and you get 0.154. Make it negative and you have the “ticking clock” value of an out.
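Here is the same arithmetic as a small Python sketch. The one wrinkle is the innings notation: the ".1" after the decimal point means one third of an inning, not one tenth, so the helper below (a name I made up) converts innings to outs before dividing.

```python
def innings_to_outs(innings_text):
    """Convert box-score innings ('43653.1' means 43,653 and 1/3) to outs."""
    whole, _, thirds = innings_text.partition(".")
    return int(whole) * 3 + int(thirds or 0)


RUNS = 20255               # league runs scored, from the paragraph above
INNINGS = "43653.1"        # league innings pitched, in box-score notation

outs = innings_to_outs(INNINGS)
ticking_clock = -RUNS / outs   # runs per out, made negative

print(outs)
print(round(ticking_clock, 4))
```

The result lands within rounding of the figure quoted above.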

This is ignored in Johnson’s formula.  He just includes the impact of the first two aspects of making an out. This isn’t something he did on purpose—I didn’t understand it myself until Tango talked about the value of outs in this seminal post. But breaking apart the value of the out is a key to understanding different run estimation formulas like Estimated Runs Produced and wOBA.

wOBA

To create wOBA (which is basically a linear weights rate stat), Tom did something very clever. He ignored batter outs and instead added the positive value of an out (or the negative of the negative value) to each positive batting event. For example, he added 0.299 (the positive value of an out) to 0.48 (the value of a single) to get a new value for a single: 0.779. Let’s round up to 0.78.

Here is a table of the linear weight and wOBA weight of each batting event: (technical footnotes: I am not including the impact of the wOBA multiplier, which Tango uses to make wOBA follow the same scale as OBP.  It’s not really necessary for the discussion.  Also, FanGraphs’ implementation of wOBA doesn’t include stolen bases.)

Batting Event Weights

| Event | Linear | wOBA |
|---|---|---|
| Home run | 1.40 | 1.70 |
| Triple | 1.07 | 1.37 |
| Double | 0.78 | 1.08 |
| Single | 0.48 | 0.78 |
| Unintentional walk | 0.32 | 0.62 |
| Stolen base | 0.18 | 0.18 |

As you can see, the wOBA weight of each batting event is 0.3 runs more than its linear weights value, 0.3 being the positive value of the out. The stolen base is the exception: it is a baserunning event, so it doesn’t turn a batter’s out into anything else, and its weight stays at 0.18. To calculate wOBA, simply multiply each batting event by its wOBA weight, add them up and divide the total by plate appearances. Voila, the perfect rate stat.
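For anyone who would rather see it as code, here is a rough Python sketch of the conversion and of the unscaled rate stat. It uses the linear weights quoted from The Book, leaves out stolen bases (as FanGraphs does), and skips the wOBA multiplier, just as I do in the discussion. The function names and the sample batting line are my own inventions.

```python
OUT_VALUE = -0.299  # linear weight of an out, from Tango's table

LINEAR_WEIGHTS = {"HR": 1.40, "3B": 1.07, "2B": 0.78, "1B": 0.48, "BB": 0.32}


def woba_weights(linear_weights, out_value):
    """Add the positive value of an out to every batting event."""
    return {event: round(weight - out_value, 2)
            for event, weight in linear_weights.items()}


def unscaled_woba(counts, weights, plate_appearances):
    """Weighted batting events per plate appearance (no wOBA multiplier)."""
    total = sum(weights[event] * n for event, n in counts.items())
    return total / plate_appearances


WOBA_WEIGHTS = woba_weights(LINEAR_WEIGHTS, OUT_VALUE)

# A made-up batting line: 600 PA with these positive events, outs ignored.
sample = {"HR": 30, "3B": 5, "2B": 35, "1B": 100, "BB": 70}
print(WOBA_WEIGHTS)
print(round(unscaled_woba(sample, WOBA_WEIGHTS, 600), 3))
```

Notice that outs never appear in the batting line. Their cost is already baked into the weights, and the plate appearances in the divisor do the rest.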

Why wOBA Works

Let’s say you have a player, let’s call him Al Trout, who has hit six singles in 10 plate appearances (making an out in the other four) and the league, on average, hits three singles every 10 plate appearances (making an out in the other seven). To use linear weights to figure out how many more runs Al contributed above the league average, you’d…

  1. Calculate the extra runs Al contributed by hitting more singles, which equals the difference in singles times the run value of a single, or (6-3) times 0.48, or 1.44.  Then you’d…
  2. Calculate the extra runs that Al contributed by making fewer outs, which equals the difference in outs made times the run value of an out, or (4-7) times -0.30 (that’s a negative 0.30), or 0.9. Then you’d…
  3. Add the two together. 1.44 plus 0.9 equals 2.34. In his 10 plate appearances, Al contributed 2.34 runs more than the average player.

Okay, I have to add a technical footnote here. I have shown you this way of calculating linear weights because it will make it easier to understand the wOBA formula. But, in reality, you only have to multiply Al’s singles and outs by the appropriate linear weights of that league and year to calculate the number of runs he contributed above average.

Anyway, that’s the hard linear weights way to calculate the difference.  Here’s the wOBA way:

  1. Multiply the difference in singles times the wOBA weight of a single, or (6-3) times 0.78, or 2.34.

That’s it; one simple step. wOBA shows that Al contributed 2.34 runs more than the average player, the same outcome as the linear weights.
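The Al Trout comparison is easy to check in a few lines of Python. This is only a sketch of the arithmetic above, using the rounded weights from the example (0.48 for a single, -0.30 for an out); the variable names are mine.

```python
import math

SINGLE_LW = 0.48                   # linear weight of a single
OUT_LW = -0.30                     # linear weight of an out (rounded)
SINGLE_WOBA = SINGLE_LW - OUT_LW   # wOBA weight of a single

al = {"singles": 6, "outs": 4}       # Al Trout's 10 plate appearances
league = {"singles": 3, "outs": 7}   # league average per 10 plate appearances

# The linear weights way: extra singles plus the outs he avoided.
lw_way = ((al["singles"] - league["singles"]) * SINGLE_LW
          + (al["outs"] - league["outs"]) * OUT_LW)

# The wOBA way: extra singles only, with the out already in the weight.
woba_way = (al["singles"] - league["singles"]) * SINGLE_WOBA

print(round(lw_way, 2), round(woba_way, 2))
print(math.isclose(lw_way, woba_way))
```

The two agree only because both lines cover the same number of plate appearances, so every extra single is also one fewer out.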

Why does this happen? Because wOBA weights include the impact of the hit AND the impact of turning an out into a hit. When you keep the number of plate appearances even, you’re not just adding a hit to the hit total.  You’re also removing an out from the out total. wOBA captures the impact of both event changes.

wOBA fundamentally works because it is a rate stat.  Its divisor is plate appearances. When you compare two players’ wOBA you have equalized their total plate appearances.

Incremental vs. Changed Baseball Events

I hear and see this type of discussion all the time: what’s the impact of giving up a walk? Well, according to our linear weights table, it’s 0.32 runs, but according to our wOBA weights table, it’s 0.62 runs.  So which is it?

The answer is: Can you repeat the question? Because it really depends on what you’re asking. If you’re talking about adding a walk to a batter’s line—and also adding a plate appearance to account for the walk—then you’re adding 0.32 runs.  However, if you’re talking about adding a walk and keeping total plate appearances the same, that means that you’re adding a walk AND subtracting an out. The difference is 0.62 runs.

One kind of event is incremental; it’s added to the total. The other kind of event is a changed event; the number of plate appearances doesn’t change. To add an event you have to reduce the opposite kind of event.

Next time you get caught in one of these discussions, keep the distinction in mind.

Relative Value

So, then is a home run three times more valuable than a single (Johnson’s scale and linear weights) or 2.2 times more valuable (wOBA weights)? There’s no equivocation here: It is three times more valuable.  There is only one right way to ask and answer the question.

The wOBA weights don’t really speak to the relative value of baseball events; they speak to the number of plate appearances needed to make the tradeoff between the events and an out. In other words, “you have to convert 2.2 plate appearances from an out to a single to have the same impact as converting one plate appearance from an out to a home run.” Not many people think or speak in those terms.

So be very careful anytime you use wOBA weights as part of your thinking.  In fact, just stay away from wOBA weights and stick to the Johnson scale or the actual linear weights. You’ll be less confused.

Converting wOBA to Total Runs

I have a bias. I like run impact scales that add up to team and league totals. I like player run impact totals that are similar to their Runs Scored and/or Runs Batted In totals. I like being able to say that Paul Goldschmidt created 128 runs last year; not that he created 50 runs above average.

It’s just a thing of mine.

Thankfully, with Tango’s help, FanGraphs carries a number like that. It’s called wRC and it’s a simple derivative of wOBA. It works by taking the league average runs scored per plate appearance, adding in the player’s relative performance (according to wOBA) and then multiplying the total by the player’s plate appearances. It’s kind of cool, really.

The exact formula is…

(((wOBA – lgwOBA) / wOBAScale) + (lgR/PA)) * PA

wOBAScale is what I call the wOBA multiplier. You need it for the math, but it doesn’t impact the concepts we’re discussing here.
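Here is the wRC formula as a small Python function, in case it helps to plug in numbers. The formula is the one above; the inputs in the example are made up for illustration and are not any real player’s or league’s figures.

```python
def wrc(woba, lg_woba, woba_scale, lg_runs_per_pa, pa):
    """wRC = (((wOBA - lgwOBA) / wOBAScale) + lgR/PA) * PA."""
    return (((woba - lg_woba) / woba_scale) + lg_runs_per_pa) * pa


# Hypothetical inputs: a .400 wOBA hitter in a .315 league,
# a wOBAScale of 1.25, and 0.11 league runs per plate appearance.
star = wrc(woba=0.400, lg_woba=0.315, woba_scale=1.25,
           lg_runs_per_pa=0.11, pa=600)
average = wrc(woba=0.315, lg_woba=0.315, woba_scale=1.25,
              lg_runs_per_pa=0.11, pa=600)

print(round(star, 1), round(average, 1))
```

A league-average hitter gets exactly the league runs per plate appearance times his plate appearances, which is why the totals add up to something that looks like runs scored.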

By adding back in runs per plate appearance, FanGraphs is adding some of the negative elements of an out, but not the “ticking clock” value of the out, just as Johnson did in his formula. Last year, major league teams scored 20,255 runs in 184,873 plate appearances. That’s 0.11 runs per plate appearance (for the value of an out, we’d make it negative), which is pretty much the same multiplier that Johnson used in his formula. If you don’t believe me, go back to the top of the page to take a look.

And now you know a quick-and-dirty way to calculate the run impact of an average out. Just add together runs scored per out (the “ticking clock” portion) and the runs scored per plate appearance (the impact on baserunners).

R/PA plus R/O

The formula says that the average negative run impact of an out last year was -0.154 plus -0.11 = -0.264. Tango’s Markov Calculator returns a value of -0.258.  Pretty close and much easier, right?
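As a quick sketch in Python, using last year’s totals quoted in this article (the function name is mine):

```python
RUNS = 20255
PLATE_APPEARANCES = 184873
OUTS = 130960


def quick_out_value(runs, plate_appearances, outs):
    """Quick-and-dirty run value of an out: -(R/PA + R/O)."""
    runs_per_pa = runs / plate_appearances   # impact on baserunners
    runs_per_out = runs / outs               # the "ticking clock"
    return -(runs_per_pa + runs_per_out)


out_value = quick_out_value(RUNS, PLATE_APPEARANCES, OUTS)
print(f"{out_value:.3f}")
```

Swap in any league and year’s totals to get a rough out value without running a Markov model.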

wOBA Replacement Level

Maybe you have your own bias. Maybe you’re okay with runs against average, or maybe you’re a fan of replacement level. Sadly, there is no replacement level version of wOBA, but I’m going to show you how to make your own.

First, pick a target replacement level winning percentage—I’m going to pick .300—and then use the Pythagorean formula to figure out a corresponding percentage decrease in runs scored. For a .300 winning percentage, I found that a replacement level offense would be 65 percent of the league average (assuming that defense is average).

In that case, the replacement level wOBA is this:

League wOBA minus (0.35 times league Runs per plate appearance times the wOBA multiplier)

The 0.35 is the result of subtracting 65 percent from one.

To apply this level to a player, take the wRC formula and replace the league wOBA with the replacement-level wOBA.  The formula looks like this:

(((wOBA – ReplwOBA) / wOBAScale) + (lgR/PA)) * PA
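If you want to try other replacement levels, here is a rough Python sketch of the steps above. It assumes the classic Pythagorean exponent of 2 and an average defense; both are assumptions of the sketch, and the league inputs in the example are invented. Treat it as a way to play with the idea rather than a settled method.

```python
def replacement_offense_ratio(win_pct, exponent=2.0):
    """Runs scored as a share of league average for a target winning
    percentage, with runs allowed held at league average (assumed)."""
    return (win_pct / (1.0 - win_pct)) ** (1.0 / exponent)


def replacement_woba(lg_woba, lg_runs_per_pa, woba_scale,
                     win_pct=0.300, exponent=2.0):
    """League wOBA minus (shortfall * league R/PA * wOBA multiplier)."""
    shortfall = 1.0 - replacement_offense_ratio(win_pct, exponent)
    return lg_woba - shortfall * lg_runs_per_pa * woba_scale


def runs_vs_replacement_woba(woba, repl_woba, woba_scale, lg_runs_per_pa, pa):
    """The wRC formula with replacement-level wOBA swapped in for lgwOBA."""
    return (((woba - repl_woba) / woba_scale) + lg_runs_per_pa) * pa


ratio = replacement_offense_ratio(0.300)
# Hypothetical league: .315 wOBA, 0.11 R/PA, wOBA multiplier of 1.25.
repl = replacement_woba(lg_woba=0.315, lg_runs_per_pa=0.11, woba_scale=1.25)
print(round(ratio, 3), round(repl, 3))
```

With a .300 target the ratio comes out at roughly 65 percent, matching the figure above.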

To help you along, here is a list of all wOBA weights by year—the first column is the wOBA multiplier. Remember, ignore all the other weights because they’ll just confuse the issue. Stick with Johnson’s scale.


Dave Studeman was called a "national treasure" by Rob Neyer. Seriously. Follow his sporadic tweets @dastudes.
Dr. Doom
9 years ago

Wow! Great primer. I’ve never seen those Johnson weights before; now I guess I know what I’ll be using from now on when I goof around with this stuff. Thanks!

tz
9 years ago

Thanks for putting all this together in one place. I’ll be bookmarking this as my personal wOBA reference.

bob
9 years ago

Could there be credit for a runner who scores on a sac fly? Because an attempt to score that way is a potential inning killer if the runner can’t make it, so doesn’t he deserve something positive for not making an out? And maybe similar reasoning for the runner on a sac bunt.

Tangotiger
9 years ago
Reply to  bob

That’s why we have RE24.

We’ve got a metric designed to answer various specific questions, and yours is handled by RE24.

AP
9 years ago

Good article, but it needs a small correction. The wOBA weight of the stolen base in the “Batting Event Weights” table is wrong. A stolen base doesn’t turn an out into a safe, because it is a baserunning event. If the linear weight of SB is 0.18, then the wOBA weight of it is also 0.18.

studes
9 years ago
Reply to  AP

Thanks for the correction, AP. Tango has also told me that my math regarding replacement level is wrong, though I don’t really understand why. I’ll do some more research on it.

studes
9 years ago
Reply to  studes

Okay, I get the objection to the way I approached replacement level. It’s this: given the way that offense and defense interact, a .300 team would have players on both offense and defense who are roughly .400 level players. Put two sides that are .400 together and you have a .300 team.

The question is: how do you set replacement level for just one side or the other? Do you assume the other side (in my case, defense) is also replacement level, or do you assume it is equal to league average? Or something else?

In other words, proceed with caution.

Ethan
9 years ago
Reply to  studes

I think the answer here is that you just can’t make a replacement level for just offense or defense. The entire concept only works with overall value (WAR). There just isn’t a thing as “replacement level wOBA”. A replacement level player can have decent offense and bad defense, or good defense but awful offense, or anywhere in between. You could come up with a metric that calculated replacement-level wOBA assuming average defense (including positional adjustment), I think, though I can’t say I’m good enough with the math of it to say how. I think that might be what you ended up with here, roughly?

Noah Woodward
9 years ago

Great stuff, Studes!

Andy
9 years ago

You said Johnson’s value of an out, 0.097, was smaller than Tango’s 0.299, because the latter includes the “ticking clock” component, the effect the out has on scoring for the rest of the game. But the calculated value of this component, 0.154, does not account for all of the difference. I’m further confused when you say later that Tango used the Markov method to come up with a value of 0.258. What is the relationship between this value and the 0.299?

P.S. – Very picky correction:

In “If you remember Johnson’s scale, you’ll remember that a home run is worth about three runs more than a single…”, “runs” should be “times”, i.e., “worth about three times more [runs] than a single…”

Andy
9 years ago
Reply to  Dave Studeman

OK, that possibility did cross my mind, I just thought that there wouldn’t be such a large change in runs per out over a decade. If one assumes that most of the change is in the third component, the ticking clock, then the change is about 30% [(0.299 – 0.097)/0.154]. If I understand that parameter correctly, that means that runs per game has gone down about 30% over that time period. I know it’s decreased, but I didn’t think it was that much.

Even if the other two components had a similar decrease (which I guess is possible, because if offense has gone down, the negative value of an out is not as much?), the decrease is still about 15%.

Andy
9 years ago
Reply to  Dave Studeman

Thanks. I had another question, which I thought I posted, but it didn’t make it, so I’ll repeat it. Why can’t wOBA be calculated as just wRC/PA? It seems to me that wOBA should be just wRC normalized to PA, maybe adjusted by some factor to bring it in the same range as OBP.

Going through the current FG leaderboards, I see that (PA x wOBA)/wRC is close to but not precisely 2.00 for the top leaders, but for players with progressively lower wRC and wOBA values, the value increases. So obviously wOBA is not directly related to wRC/PA, but I don’t understand why (or why there isn’t a stat that incorporates this ratio).

I’ve also noticed that at FG, batting runs are close to, but not precisely the same as, wRAA. What is the difference between these two stats?

Andy
9 years ago

Also, while you say wRC is a counting stat, wRC+ is clearly a rate stat. So the difference between the two is not just park factors, but something much more. Is wRC+ a measure of above average, like OPS+ is vs. OPS? But then it must be a form of wRC that is normalized for PA, or something else that turns it into a rate stat.

Andy
9 years ago
Reply to  Andy

If wRC is a counting stat, something seems to be missing. There should be a “pure” rate stat involving wRC, then wRC+ is derived from this by comparing it to the league average for the wRC rate stat.

BobDD
9 years ago

I’ve always been amused that fans ask if some particular action has been thought of in some formula – now I’m the guy thinking there is something I haven’t ever seen included.

Do good hitters have more ‘passed balls’ occur during their ABs with runners aboard?
Are there more passed balls with runners on for RH hitters?

Regardless I assume these and ‘reached on errors’ are accounted for and given credit if someone has these ‘skills’ – and I would sure like to know who does.

Tangotiger
9 years ago
Reply to  BobDD

Reached on Error *is* included, or at least, can be included if one so chooses. I know I’m a champion of including them. Derek Jeter is a ROE machine.

PB as well as WP, BK, SB, CS, PK, DI, and the like, I attribute to the runner, not the batter. But, that’s just a decision I make in terms of selecting an either/or. You could just as well ask if there are more SB and fewer CS with a particular batter at the plate. Once you get into these partial credits, you are entering a new realm.

Andy
9 years ago
Reply to  Tangotiger

Are ROE data available at FG? I haven’t been able to find them on any of the leaderboards.

Adam
9 years ago

Thanks for the article! From my reading it seems as though one flaw of wOBA would be that it fails to differentiate one out from another in terms of run creation. For example, a sacrifice fly is a more valuable out than a GIDP. Also, I would expect that an out which advances a runner on the basepaths would have a less negative value than a strikeout. Am I missing something?

Adam
9 years ago
Reply to  Adam

Cannot edit my above comment, but I meant to write strikeout where I wrote GIDP.

Adam
9 years ago

Thanks for the response Dave. Wouldn’t a metric, perhaps like RE24, that accounts for types of each individual out be a more accurate measure for calculating how many runs player X created during a season?

John C
9 years ago

Excellent primer. I adopted wOBA for my teams as soon as I learned about it from Tango’s site. BA, OBA, SLG, ISO, ISOd are all readily calculated directly from baseball events that we see. wOBA’s intermediate LW calculations unfortunately provide a barrier to the uninitiated. Dave provided a huge service in clearly explaining the intermediate invisible calculations of wOBA.
The only issue I have with wOBA is given only that number, we don’t know if the value is driven by OBA, SLG, or a more or less balanced mix for individual players or teams. For example, players like Alfonso Soriano and Nelson Cruz derive much of their value from SLG, while Xander Bogaerts derives much of his value from OBP; wOBA alone will not tell us that.

Tangotiger
9 years ago

“wOBA alone will not tell us that”

Nor was it designed to do so. I don’t know how you can fairly compare two metrics (OBP and SLG) to one, and then say that it’s a deficiency in wOBA that it can only tell you one thing while the other two tell you… well, two.

In any case, I use wOBA and OBP. When wOBA is greater than OBP, then it’s disproportionate power to walks. When wOBA is less than OBP, then it’s disproportionate walks to power. And the greater the disparity, the greater the disproportion.