Why wOBA Works

Calculating wOBA for players like Mike Trout or fictional cohort Al Trout is very simple (via Bryan Horowitz).

Johnson’s Scale

In the 1985 Bill James Baseball Abstract, Bill included an article from a guy named Paul Johnson who had developed his own version of Runs Created. From a sabermetric perspective, this was an important event for several reasons, but for me it all came down to the math.

Johnson called his formula Estimated Runs Produced and it’s a simple construct, really: positive batting events minus negative batting events (also known as outs). The trick is in the weighting. The formula goes like this:

(Two times (total bases plus walks plus HBP) plus hits plus stolen bases… minus
(.605 times (at-bats plus caught stealing plus GIDP minus hits))) times 0.16

I know the first part of the formula looks complicated, but it really isn’t.  As Johnson explained in the article, he had found that batting events follow a natural scale relative to each other in terms of their impact on run scoring. He just played around with math to find a way to replicate that scale in a formula.
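If it helps to see the formula as code, here is a minimal sketch in Python. The function and argument names are my own shorthand, not Johnson's. Feeding one event at a time through the part of the formula that comes before the 0.16 multiplier shows how much each event is worth relative to the others.

```python
def erp_points(tb=0, bb=0, hbp=0, h=0, sb=0, ab=0, cs=0, gidp=0):
    """Johnson's formula before the final 0.16 multiplier."""
    positive = 2 * (tb + bb + hbp) + h + sb
    outs = ab + cs + gidp - h
    return positive - 0.605 * outs


def estimated_runs_produced(**stats):
    """Estimated Runs Produced: the raw points converted to runs."""
    return 0.16 * erp_points(**stats)


# One of each event, fed through the formula on its own.
scale = {
    "home run": erp_points(tb=4, h=1, ab=1),
    "triple": erp_points(tb=3, h=1, ab=1),
    "double": erp_points(tb=2, h=1, ab=1),
    "single": erp_points(tb=1, h=1, ab=1),
    "walk": erp_points(bb=1),
    "stolen base": erp_points(sb=1),
    "out": erp_points(ab=1),
}

for event, points in scale.items():
    print(f"{event:<12} {points:g}")
```

A hit adds one to both at-bats and hits, so it leaves the outs term untouched; only the positive half of the formula moves.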

The scale starts at 9 (the last single digit) and touches down on every odd number, throwing in “2” right before the end. In other words, it’s 9, 7, 5, 3, 2, 1. Each number stands for the relative weight of a different type of batting event.

Specifically…

  • Home run: 9
  • Triple: 7
  • Double: 5
  • Single: 3
  • Walk: 2
  • Stolen base: 1

Johnson’s scale had a big impact on me and I have never forgotten it. If you remember Johnson’s scale, you’ll remember that a home run is worth about three times more than a single, or that a walk and a stolen base together are worth about as much as a single. What’s more, Johnson’s scale is simple and easy to communicate. I’ve used it many times to explain how offense really works, including in this article.

Most important, it’s true.

Linear Weights

Another reason Johnson’s article represented an important sabermetric landmark was that it directly addressed an argument that dominated sabermetrics for over two decades: what is the best way to estimate the number of runs a batter has contributed to his team? The argument raged with a white-hot intensity for a long time, but a winner eventually emerged. We call it linear weights.

I don’t want to go into the entire history or rationale for linear weights (here is a good history), but you should know that it is behind most of our advanced stats today. wOBA is based on linear weights, as are wRC and wRC+. RE24 and WPA are based on the same logic as linear weights. UZR and DRS are essentially linear weights applied to fielding. WAR and WARP are based on linear weights, too.

So I’m going to multiply each number on Johnson’s scale by 0.16 (as in Johnson’s formula) and then compare the results to the linear weights that Tom Tango and friends published on page 26 of The Book (and which were based on the years 1999 to 2002). As you’ll see, there is virtually no difference between the two.

Johnson’s Scale Vs. Linear Weights

Event                 Scale   Times 0.16   Linear Weights
Home Run                9        1.44           1.40
Triple                  7        1.12           1.07
Double                  5        0.80           0.78
Single                  3        0.48           0.48
Unintentional Walk      2        0.32           0.32
Stolen Base             1        0.16           0.18

Paul Johnson’s scale mimics linear weights almost perfectly (don’t forget that linear weights change somewhat from year to year). Next time you need to remember the relative value of batting events, just remember Johnson’s scale. Leave the linear weights to the spreadsheet.
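To check the comparison yourself, here is a small Python sketch that multiplies each point on the scale by 0.16 and measures the gap to the linear weights quoted from The Book. The dictionary names are mine; the numbers are the ones in the comparison.

```python
johnson_scale = {
    "Home Run": 9,
    "Triple": 7,
    "Double": 5,
    "Single": 3,
    "Unintentional Walk": 2,
    "Stolen Base": 1,
}

# Linear weights quoted in the article (based on 1999 to 2002).
linear_weights = {
    "Home Run": 1.40,
    "Triple": 1.07,
    "Double": 0.78,
    "Single": 0.48,
    "Unintentional Walk": 0.32,
    "Stolen Base": 0.18,
}

# Gap between the scaled Johnson value and the linear weight, per event.
gaps = {
    event: points * 0.16 - linear_weights[event]
    for event, points in johnson_scale.items()
}
largest_gap = max(abs(gap) for gap in gaps.values())

for event, gap in gaps.items():
    print(f"{event:<20}{johnson_scale[event] * 0.16:>6.2f}{gap:>+7.2f}")
print(f"largest gap: {largest_gap:.2f} runs")
```

No event is off by more than about five hundredths of a run.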

I like to remind people of Johnson’s scale because I’ve noticed that some folks are getting confused about linear weights and the relative value of baseball events.  One reason they’re getting confused is due to an invention that Tango introduced two pages later in The Book.

But first, we need to talk about the value of an out.

The Value of an Out

Go back and look at Johnson’s formula to see how he valued an out. He multiplied each out by .605, which means that an out equals -0.6 (that’s negative 0.6) on his 1-9 scale. Allow me to draw the comparison here in a way that mimics the table I just showed you. Following is the value of an out in…

The Value of an Out

Source                             Value
Johnson’s scale                    -0.605
Multiplied by 0.16                 -0.097
In Tango’s Linear Weights table    -0.299

In this case, there is a big difference between Johnson’s scale and Tango’s linear weights. In linear weights, the negative impact of an out is three times greater than in Johnson’s formula. You may wonder why this is.

Johnson wanted a metric that followed the same scale as a team’s total runs scored. In linear weights, however, everything is based on average. Generally speaking, if you apply Johnson’s formula to a league’s stats, you’ll get the total number of runs scored by the league.  If you apply linear weights to a league’s stats, you’ll get zero.

There’s another way to explain the difference. There are essentially three types of negative impacts of making an out:

  1. Removing a runner who is on base, which occurs during a caught stealing or double play.
  2. Decreasing the value of a runner on base, because he now has fewer outs in which to score during the rest of the inning.
  3. Reducing the potential number of runs a team can score in a game, by reducing the number of outs left in the game.

The third aspect—the “ticking clock” part of making an out—is calculated by simply dividing runs by outs. For instance, in the major leagues last year, there were 20,255 runs scored in 43,653.1 innings pitched, or 130,960 outs.  Divide 20,255 by 130,960 and you get 0.154. Make it negative and you have the “ticking clock” value of an out.

This is ignored in Johnson’s formula.  He just includes the impact of the first two aspects of making an out. This isn’t something he did on purpose—I didn’t understand it myself until Tango talked about the value of outs in this seminal post. But breaking apart the value of the out is a key to understanding different run estimation formulas like Estimated Runs Produced and wOBA.

wOBA

To create wOBA (which is basically a linear weights rate stat), Tom did something very clever. He ignored batter outs and instead added the positive value of an out (or the negative of the negative value) to each positive batting event. For example, he added 0.299 (the positive value of an out) to 0.48 (the value of a single) to get a new value for a single: 0.779. Let’s round up to 0.78.

Here is a table of the linear weight and wOBA weight of each batting event: (technical footnotes: I am not including the impact of the wOBA multiplier, which Tango uses to make wOBA follow the same scale as OBP.  It’s not really necessary for the discussion.  Also, FanGraphs’ implementation of wOBA doesn’t include stolen bases.)

Batting Event Weights

Event                 Linear   wOBA
Home run               1.40    1.70
Triple                 1.07    1.37
Double                 0.78    1.08
Single                 0.48    0.78
Unintentional walk     0.32    0.62
Stolen Base            0.18    0.48

As you can see, each wOBA weight is exactly 0.3 runs more than its linear weights value, 0.3 being the positive value of the out. To calculate wOBA, simply multiply each batting event by its wOBA weight, add them up and divide the total by plate appearances. Voilà, the perfect rate stat.
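Here is a rough Python sketch of that recipe. It is an illustration under a few assumptions: it uses the 1999 to 2002 weights from this article, it skips the scaling that puts wOBA on the OBP scale (so the result is an unscaled figure), and it leaves out stolen bases, as FanGraphs does. The names are my own.

```python
OUT_VALUE = -0.299  # linear weight of an out, from the article

linear_weights = {"HR": 1.40, "3B": 1.07, "2B": 0.78, "1B": 0.48, "BB": 0.32}

# Subtracting a negative number adds the positive value of the out.
woba_weights = {
    event: round(weight - OUT_VALUE, 2)
    for event, weight in linear_weights.items()
}


def unscaled_woba(events, plate_appearances):
    """Weighted events per plate appearance, before any OBP-scale adjustment."""
    total = sum(woba_weights[event] * count for event, count in events.items())
    return total / plate_appearances


# Hypothetical batting line: six singles in 10 plate appearances.
print(woba_weights)
print(round(unscaled_woba({"1B": 6}, 10), 3))
```

Outs never appear in the calculation. Their cost is already baked into every weight, and they show up only through the plate appearances in the divisor.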

Why wOBA Works

Let’s say you have a player, let’s call him Al Trout, who has hit six singles in 10 plate appearances (making an out in the other four), while the league, on average, hits three singles every 10 plate appearances (again, making an out in the rest). To use linear weights to figure out how many more runs Al contributed above the league average, you’d…

  1. Calculate the extra runs Al contributed by hitting more singles, which equals the difference in singles times the run value of a single, or (6-3) times 0.48, or 1.44.  Then you’d…
  2. Calculate the extra runs that Al contributed by making fewer outs, which equals the difference in outs made times the run value of an out, or (4-7) times -0.30 (that’s a negative 0.30), or 0.9. Then you’d…
  3. Add the two together. 1.44 plus 0.9 equals 2.34. In his 10 plate appearances, Al contributed 2.34 runs more than the average player.

Okay, I have to add a technical footnote here. I have shown you this way of calculating linear weights because it will make it easier to understand the wOBA formula. But, in reality, you only have to multiply Al’s singles and outs by the appropriate linear weights of that league and year to calculate the number of runs he contributed above average.

Anyway, that’s the hard linear weights way to calculate the difference.  Here’s the wOBA way:

  1. Multiply the difference in singles times the wOBA weight of a single, or (6-3) times 0.78, or 2.34.

That’s it; one simple step. wOBA shows that Al contributed 2.34 runs more than the average player, the same outcome as the linear weights.
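The two routes are easy to put side by side in code. This Python sketch uses the rounded weights from the Al Trout example (0.48 for a single, -0.30 for an out); the variable names are mine.

```python
SINGLE_LW = 0.48                    # linear weight of a single
OUT_LW = -0.30                      # linear weight of an out (rounded)
SINGLE_WOBA = SINGLE_LW - OUT_LW    # wOBA weight of a single, 0.78

al = {"singles": 6, "outs": 4}      # Al Trout, 10 plate appearances
league = {"singles": 3, "outs": 7}  # league average, 10 plate appearances

extra_singles = al["singles"] - league["singles"]
extra_outs = al["outs"] - league["outs"]

# Linear weights way: credit the extra singles, then the outs avoided.
linear_weights_way = extra_singles * SINGLE_LW + extra_outs * OUT_LW

# wOBA way: one step, because the out is folded into the weight.
woba_way = extra_singles * SINGLE_WOBA

print(round(linear_weights_way, 2), round(woba_way, 2))
```

The match only holds because both lines cover the same 10 plate appearances, so every extra single is also one fewer out.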

Why does this happen? Because wOBA weights include the impact of the hit AND the impact of turning an out into a hit. When you keep the number of plate appearances even, you’re not just adding a hit to the hit total.  You’re also reducing an out from the out total. wOBA captures the impact of both event changes.

wOBA fundamentally works because it is a rate stat.  Its divisor is plate appearances. When you compare two players’ wOBA you have equalized their total plate appearances.

Incremental vs. Changed Baseball Events

I hear and see this type of discussion all the time: what’s the impact of giving up a walk? Well, according to our linear weights table, it’s 0.32 runs, but according to our wOBA weights table, it’s 0.62 runs.  So which is it?

The answer is: Can you repeat the question? Because it really depends on what you’re asking. If you’re talking about adding a walk to a batter’s line—and also adding a plate appearance to account for the walk—then you’re adding 0.32 runs.  However, if you’re talking about adding a walk and keeping total plate appearances the same, that means that you’re adding a walk AND subtracting an out. The difference is 0.62 runs.

One kind of event is incremental; it’s added to the total. The other kind of event is a changed event; the number of plate appearances doesn’t change. To add an event you have to reduce the opposite kind of event.
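A small Python sketch makes the two questions concrete. The batting line is hypothetical and tracks only walks and outs; the weights are the rounded ones used earlier (0.32 for a walk, -0.30 for an out).

```python
LINEAR_WEIGHTS = {"BB": 0.32, "OUT": -0.30}


def runs_vs_average(line):
    """Linear weights run value of a batting line."""
    return sum(LINEAR_WEIGHTS[event] * count for event, count in line.items())


base = {"BB": 1, "OUT": 7}      # 8 plate appearances
added = {"BB": 2, "OUT": 7}     # incremental: one more walk, 9 plate appearances
swapped = {"BB": 2, "OUT": 6}   # changed: an out becomes a walk, still 8

incremental_value = runs_vs_average(added) - runs_vs_average(base)
changed_value = runs_vs_average(swapped) - runs_vs_average(base)

print(round(incremental_value, 2), round(changed_value, 2))
```

The first difference is the linear weight of a walk; the second is its wOBA weight.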

Next time you get caught in one of these discussions, keep the distinction in mind.

Relative Value

So, then is a home run three times more valuable than a single (Johnson’s scale and linear weights) or 2.2 times more valuable (wOBA weights)? There’s no equivocation here: It is three times more valuable.  There is only one right way to ask and answer the question.

The wOBA weights don’t really speak to the relative value of baseball events; they speak to the number of plate appearances needed to make the tradeoff between the events and an out. In other words, “you have to convert 2.2 plate appearances from an out to a single to have the same impact as converting one plate appearance from an out to a home run.” Not many people think or speak in those terms.

So be very careful anytime you use wOBA weights as part of your thinking.  In fact, just stay away from wOBA weights and stick to the Johnson scale or the actual linear weights. You’ll be less confused.

Converting wOBA to Total Runs

I have a bias. I like run impact scales that add up to team and league totals. I like player run impact totals that are similar to their Runs Scored and/or Runs Batted In totals. I like being able to say that Paul Goldschmidt created 128 runs last year; not that he created 50 runs above average.

It’s just a thing of mine.

Thankfully, with Tango’s help, FanGraphs carries a number like that. It’s called wRC and it’s a simple derivative of wOBA. It works by taking the league average runs scored per plate appearance, adding in the player’s relative performance (according to wOBA) and then multiplying the total by the player’s plate appearances. It’s kind of cool, really.

The exact formula is…

(((wOBA – lgwOBA) / wOBAScale) + (lgR/PA)) * PA

wOBAScale is what I call the wOBA multiplier. You need it for the math, but it doesn’t impact the concepts we’re discussing here.
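As a sketch, the wRC formula translates directly into a Python function. The inputs in the example are made-up round numbers for illustration, not any real player’s or league’s figures.

```python
def wrc(woba, lg_woba, woba_scale, lg_runs_per_pa, pa):
    """wRC: (((wOBA - lgwOBA) / wOBAScale) + lgR/PA) * PA."""
    runs_above_average_per_pa = (woba - lg_woba) / woba_scale
    return (runs_above_average_per_pa + lg_runs_per_pa) * pa


# Hypothetical inputs, chosen only to show the mechanics.
example = wrc(woba=0.400, lg_woba=0.320, woba_scale=1.25,
              lg_runs_per_pa=0.11, pa=700)
average_hitter = wrc(woba=0.320, lg_woba=0.320, woba_scale=1.25,
                     lg_runs_per_pa=0.11, pa=700)

print(round(example, 1), round(average_hitter, 1))
```

A hitter with a league-average wOBA comes out with exactly the league’s runs per plate appearance times his own plate appearances, which is why the totals add up to something like league runs scored.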

By adding back in runs per plate appearance, FanGraphs is adding some of the negative elements of an out, but not the “ticking clock” value of the out, just as Johnson did in his formula. Last year, major league teams scored 20,255 runs in 184,873 plate appearances. That’s 0.11 runs per plate appearance (for the value of an out, we’d make it negative), which is pretty much the same as the out value in Johnson’s formula (0.605 times 0.16, or about 0.097). If you don’t believe me, go back to the top of the page to take a look.

And now you know a quick-and-dirty way to calculate the run impact of an average out. Just add together runs scored per out (the “ticking clock” portion) and the runs scored per plate appearance (the impact on baserunners).

R/PA plus R/O

The formula says that the average negative run impact of an out last year was -0.154 plus -0.11 = -0.264. Tango’s Markov Calculator returns a value of -0.258.  Pretty close and much easier, right?
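Here is the same quick-and-dirty arithmetic as a Python sketch, using the league totals quoted in this article. The outs figure is innings pitched times three, with the “.1” standing for one extra out.

```python
RUNS = 20_255
OUTS = 43_653 * 3 + 1          # 43,653.1 innings pitched = 130,960 outs
PLATE_APPEARANCES = 184_873

runs_per_out = RUNS / OUTS                # the "ticking clock" portion
runs_per_pa = RUNS / PLATE_APPEARANCES    # the impact on baserunners

# Both pieces are costs, so the value of an average out is negative.
out_value = -(runs_per_out + runs_per_pa)

print(f"R/O  = {runs_per_out:.4f}")
print(f"R/PA = {runs_per_pa:.4f}")
print(f"out  = {out_value:.3f}")
```

Carrying the unrounded pieces through to the end still lands on -0.264.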

wOBA replacement level

Maybe you have your own bias. Maybe you’re okay with runs against average, or maybe you’re a fan of replacement level. Sadly, there is no replacement level version of wOBA, but I’m going to show you how to make your own.

First, pick a target replacement level winning percentage—I’m going to pick .300—and then use the Pythagorean formula to figure out a corresponding percentage decrease in runs scored. For a .300 winning percentage, I found that a replacement level offense would be 65 percent of the league average (assuming that defense is average).

In that case, the replacement level wOBA is this:

League wOBA minus (0.35 times league Runs per plate appearance times the wOBA multiplier)

The 0.35 is the result of subtracting 65 percent from one.

To apply this level to a player, take the wRC formula and replace the league wOBA with the replacement-level wOBA.  The formula looks like this:

(((wOBA – ReplwOBA) / wOBAScale) + (lgR/PA)) * PA
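For anyone who wants to try a different target winning percentage, here is a hedged Python sketch of the steps. It assumes the classic Pythagorean exponent of 2 and an average defense, and the league inputs in the example are hypothetical round numbers. The function names are my own.

```python
def offense_share(target_win_pct, exponent=2.0):
    """Runs scored as a share of league average, from the Pythagorean formula.

    With an average defense, W = RS^x / (RS^x + RA^x),
    so RS / RA = (W / (1 - W)) ** (1 / x).
    """
    return (target_win_pct / (1 - target_win_pct)) ** (1 / exponent)


def replacement_woba(lg_woba, lg_runs_per_pa, woba_scale, target_win_pct=0.300):
    """League wOBA minus the offensive shortfall, converted to the wOBA scale."""
    shortfall = 1 - offense_share(target_win_pct)
    return lg_woba - shortfall * lg_runs_per_pa * woba_scale


share = offense_share(0.300)
# Hypothetical league values, for illustration only.
repl = replacement_woba(lg_woba=0.320, lg_runs_per_pa=0.11, woba_scale=1.25)

print(round(share, 2), round(repl, 3))
```

A .300 target gives an offense at about 65 percent of league average, the figure used in the text. As the discussion in the comments makes clear, setting a replacement level for offense alone is contested, so treat the output as a rough guide.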

To help you along, here is a list of all wOBA weights by year—the first column is the wOBA multiplier. Remember, ignore all the other weights because they’ll just confuse the issue. Stick with Johnson’s scale.

Comments

  1. Dr. Doom said...

    Wow! Great primer. I’ve never seen those Johnson weights before; now I guess I know what I’ll be using from now on when I goof around with this stuff. Thanks!

  2. bob said...

    Could there be credit for a runner who scores on a sac fly? Because an attempt to score that way is a potential inning killer if the runner can’t make it, so doesn’t he deserve something positive for not making an out? And maybe similar reasoning for the runner on a sac bunt.

    • Tangotiger said...

      That’s why we have RE24.

      We’ve got a metric designed to answer various specific questions, and yours is handled by RE24.

  3. AP said...

    Good article but it needs a small correction. The wOBA weight of the stolen base in the table “Batting Event Weights” is wrong. A stolen base doesn’t turn an out into a safe, because it is a baserunning event. If the linear weight of SB is 0.18, then the wOBA weight of it is also 0.18.

    • studes said...

      Thanks for the correction, AP. Tango has also told me that my math regarding replacement level is wrong, though I don’t really understand why. I’ll do some more research on it.

      • studes said...

        Okay, I get the objection to the way I approached replacement level. It’s this: given the way that offense and defense interact, a .300 team would have players on both offense and defense who are roughly .400 level players. Put two sides that are .400 together and you have a .300 team.

        The question is: how do you set replacement level for just one side or the other? Do you assume the other side (in my case, defense) is also replacement level, or do you assume it is equal to league average? Or something else?

        In other words, proceed with caution.

      • Ethan said...

        I think the answer here is that you just can’t make a replacement level for just offense or defense. The entire concept only works with overall value (WAR). There just isn’t a thing as “replacement level wOBA”. A replacement level player can have decent offense and bad defense, or good defense but awful offense, or anywhere in between. You could come up with a metric that calculated replacement-level wOBA assuming average defense (including positional adjustment), I think, though I can’t say I’m good enough with the math of it to say how. I think that might be what you ended up with here, roughly?

      • Dave Studeman said...

        You’re absolutely right, Ethan. I should have mentioned that a purist wouldn’t assign a replacement level to one particular skill, but only to a player’s total skillset. I included it here because I’ve seen people ask the same question online, but it’s probably best to tell people to just use wRC+.

  4. Andy said...

    You said Johnson’s value of an out, 0.097, was smaller than Tango’s 0.299, because the latter includes the “ticking clock” component, the effect the out has on scoring for the rest of the game. But the calculated value of this component, 0.154, does not account for all of the difference. I’m further confused when you say later that Tango used the Markov method to come up with a value of 0.258. What is the relationship between this value and the 0.299?

    P.S. – Very picky correction:

    In “If you remember Johnson’s scale, you’ll remember that a home run is worth about three runs more than a single…”, “runs” should be “times”, i.e., “worth about three times more [runs] than a single…”

    • Dave Studeman said...

      Hi Andy, sorry I wasn’t clear, but those out values are from different time periods. As I said, the 0.299 are based on the years 1999 to 2002. The latter values were for last year.

      • Andy said...

        OK, that possibility did cross my mind, I just thought that there wouldn’t be such a large change in runs per out over a decade. If one assumes that most of the change is in the third component, the ticking clock, then the change is about 30% [(0.299 - 0.097)/0.154]. If I understand that parameter correctly, that means that runs per game has gone down about 30% over that time period. I know it’s decreased, but I didn’t think it was that much.

        Even if the other two components had a similar decrease (which I guess is possible, because if offense has gone down, the negative value of an out is not as much?), the decrease is still about 15%.

      • Andy said...

        Thanks. I had another question, which I thought I posted, but it didn’t make it, so I’ll repeat it. Why can’t wOBA be calculated as just wRC/PA? It seems to me that wOBA should be just wRC normalized to PA, maybe adjusted by some factor to bring it in the same range as OBP.

        Going through the current FG leaderboards, I see that (PA x wOBA)/wRC is close to but not precisely 2.00 for the top leaders, but for players with progressively lower wRC and wOBA values, the value increases. So obviously wOBA is not directly related to wRC/PA, but I don’t understand why (or why there isn’t a stat that incorporates this ratio).

        I’ve also noticed that at FG, batting runs are close to, but not precisely the same as, wRAA. What is the difference between these two stats?

      • Dave Studeman said...

        Look at the wRC formula. If you divide that by PA, you simply have the first part of the equation left, which is (wOBA-lgwOBA)/wOBAScale plus lgR/PA. I don’t know what you’re trying to do, but if you’re trying to get back to the original wOBA, you’d have to know the league wOBA and the wOBA Scale. I’ll leave the math up to you.

  5. Andy said...

    Also, while you say wRC is a counting stat, wRC+ is clearly a rate stat. So the difference between the two is not just park factors, but something much more. Is wRC+ a measure of above average, like OPS+ is vs. OPS? But then it must be a form of wRC that is normalized for PA, or something else that turns it into a rate stat.

    • Andy said...

      If wRC is a counting stat, something seems to be missing. There should be a “pure” rate stat involving wRC, then wRC+ is derived from this by comparing it to the league average for the wRC rate stat.

  6. BobDD said...

    I’ve always been amused that fans ask if some particular action has been thought of in some formula – now I’m the guy thinking there is something I haven’t ever seen included.

    Do good hitters have more ‘passed balls’ occur during their ABs with runners aboard?
    Are there more passed balls with runners on for RH hitters?

    Regardless I assume these and ‘reached on errors’ are accounted for and given credit if someone has these ‘skills’ – and I would sure like to know who does.

    • Tangotiger said...

      Reached on Error *is* included, or at least, can be included if one so chooses. I know I’m a champion of including them. Derek Jeter is a ROE machine.

      PB as well as WP, BK, SB, CS, PK, DI, and the like, I attribute to the runner, not the batter. But, that’s just a decision I make in terms of selecting an either/or. You could just as well ask if there are more SB and fewer CS with a particular batter at the plate. Once you get into these partial credits, you are entering a new realm.

  7. Adam said...

    Thanks for the article! From my reading it seems as though one flaw of wOBA would be that it fails to differentiate one out from another in terms of run creation. For example, a sacrifice fly is a more valuable out than a GIDP. Also, I would expect that an out which advances a runner on the basepaths would have a less negative value than a strikeout. Am I missing something?

      • Dave Studeman said...

        That’s probably fair, Adam. wOBA doesn’t distinguish between different types of outs. If you want something like that, go with RE24, which is situation-specific. Or you can head over to Baseball Reference, where Sean lists “BtRuns,” which is good old-fashioned linear weights. For both stats, you’re pretty much stuck with the scale of runs above or below average.

  8. Adam said...

    Thanks for the response Dave. Wouldn’t a metric, perhaps like RE24, that accounts for types of each individual out be a more accurate measure for calculating how many runs player X created during a season?

    • Dave Studeman said...

      I’d say in general, yes, but it’s complicated. A stat like RE24 takes all situations into account. For instance, a single that drives in a runner from second is worth more with two outs than with no outs. There are a lot of people who feel that this shouldn’t be a factor, because batters can’t control if they manage a double with no out vs. two out, plus batters face unequal numbers of opportunities in each type of situation.

      So, these folks would prefer straight linear weights over RE24. However, sacrifice flies are the result of a situation too. So how much credit should be given to a batter who flies out when there just happens to be a runner on third? OTOH, if you believe giving credit is appropriate, why not go all the way and use WPA?

      Basically, what you value most in a run stat depends partly on what you believe is important and giving batters “credit” for the situation is a divisive issue.

  9. John C said...

    Excellent primer. I adopted wOBA for my teams as soon as I learned about it from Tango’s site. BA, OBA, SLG, ISO, ISOd are all readily calculated directly from baseball events that we see. wOBA’s intermediate LW calculations unfortunately provide a barrier to the uninitiated. Dave provided a huge service in clearly explaining the intermediate invisible calculations of wOBA.
    The only issue I have with wOBA is that, given only that number, we don’t know if the value is driven by OBA, SLG, or a more or less balanced mix for individual players or teams. For example, players like Alfonso Soriano and Nelson Cruz derive much of their value from SLG, while Xander Bogaerts derives much of his value from OBP; wOBA alone will not tell us that.

  10. Tangotiger said...

    “wOBA alone will not tell us that”

    Nor was it designed to do so. I don’t know how you can fairly compare two metrics (OBP and SLG) to one, and then say that it’s a deficiency in wOBA that it can only tell you one thing while the other two things tell you… well, two.

    In any case, I use wOBA and OBP. When wOBA is greater than OBP, then it’s disproportionate power to walks. When wOBA is less than OBP, then it’s disproportionate walks to power. And the greater the disparity, the greater the disproportion.
