Adjusting RE24 for baserunning

In 2012, Jason Heyward was worth seven and a half runs on the base paths, independent of his stolen bases and caught stealings. In other words, his ability to advance from first to third and second to home on singles, and first to home on doubles, led the Braves to score about seven or eight more runs on the season than they would have had an average baserunner replaced Heyward. This is quantified in UBR, or Ultimate Base Running.

UBR is one aspect of FanGraphs’ WAR formula, because WAR’s offensive component, Batting Runs, does not take the actual changes in base/out state into consideration. That is, Batting Runs cares only about the event itself, not the game context in which the event occurred.

But guess what does consider the beginning and end base-out states? That’s right: RE24! A reminder, in case you forgot: RE24 is a cumulative statistic that measures the change in run expectancy from the beginning to the end of the play. Hit a single with no one on and no one out? Your RE24 is just the run expectancy with a man on first and no outs minus the run expectancy with the bases empty and no outs. Sound reasonable? I think so.
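Here is a minimal sketch of that arithmetic in Python. The run expectancy values in `RUN_EXP` are illustrative placeholders, not the 2010-2012 values behind this article, and the function name `re24_for_play` and the `"1__"` style of state encoding are assumptions made up for the example.

```python
# Illustrative run expectancy values keyed by (bases, outs).
# Bases are a 3-character string for first/second/third: "1__" is a man on first.
# These numbers are placeholders for the sketch, not the values used in this article.
RUN_EXP = {
    ("___", 0): 0.50,
    ("1__", 0): 0.85,
    ("1_3", 0): 1.75,
}

def re24_for_play(start, end, runs_scored=0):
    """Change in run expectancy over one play, plus any runs that scored on it."""
    return RUN_EXP[end] - RUN_EXP[start] + runs_scored

# Single with the bases empty and no outs.
single_bases_empty = re24_for_play(("___", 0), ("1__", 0))

# Single with a man on first and no outs, runner advancing to third.
single_first_to_third = re24_for_play(("1__", 0), ("1_3", 0))

print(round(single_bases_empty, 2), round(single_first_to_third, 2))
```

A player's season RE24 is then just the sum of these per-play values over all of his plate appearances.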

But what about RE24 with runners on base? If, say, Dan Uggla hits a single when Jason Heyward is on first with no outs, and Heyward ends up advancing to third, Uggla gets credit for the change in run expectancy from runner on first with no outs to runners on first and third with no outs. But if you recall from the first paragraph, Jason Heyward was excellent (best in the league, in fact) at non-stolen-base baserunning, yet Uggla receives all the credit if we use RE24.

Let’s try to fix that. Remember, we want to give the hitter credit for the actual context-dependent value of a hit with runners on base, but without also giving him credit for the baserunning of those ahead of him. This is important, because unlike, for example, the quality of the pitcher or the quality of the defense, each hitter is likely to have only a limited variety of baserunners ahead of him in a given season.

Boring methodology

So how do we adjust RE24 to remove baserunning? My first thought when I was brainstorming this question was to simply figure out who was on base in front of each hitter during the season and subtract some portion of their UBR from the hitter’s RE24. That’s far too simple, though, because it assumes that a baserunner’s value is evenly distributed among his opportunities. This is not true, so we must be more specific about how we make this adjustment.

Then, it hit me. Instead of just looking at the beginning state and end state, which credits the entire change to the hitter, we could look at the event (single, double, strikeout, etc.) and figure out where we expect the baserunners to be at the end of the play. In other words, we simply need to find the average change in run expectancy for each event in each base/out state, and apply these values to each play. So if Dan Uggla hits a single and Heyward moves to third with no outs, Uggla will not get credit for the entire change in run expectancy. Instead, he will receive credit only for the average change in run expectancy for a single with a man on first and no outs, which is what we would expect if an average runner were on first.
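To make the idea concrete, here is a rough Python sketch of the averaging step. It is not the code I actually ran; the play rows, the batter names other than Uggla, and the function names `average_change_by_event` and `adjusted_re24` are all made up for illustration.

```python
from collections import defaultdict

# Each play: (batter, event, start_state, re_change), where re_change is the
# actual change in run expectancy on that play. All rows are made-up examples.
plays = [
    ("Uggla", "single", ("1__", 0), 0.90),     # runner went first to third
    ("Batter B", "single", ("1__", 0), 0.60),  # runner stopped at second
    ("Batter C", "single", ("1__", 0), 0.60),
    ("Batter C", "double", ("1__", 0), 1.20),
]

def average_change_by_event(plays):
    """Mean run expectancy change for every (event, start_state) combination."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _batter, event, start, re_change in plays:
        totals[(event, start)] += re_change
        counts[(event, start)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def adjusted_re24(plays):
    """Credit each batter with the average change for the event, not the actual one."""
    averages = average_change_by_event(plays)
    credit = defaultdict(float)
    for batter, event, start, _re_change in plays:
        credit[batter] += averages[(event, start)]
    return dict(credit)

adj = adjusted_re24(plays)
print(adj)
```

In this toy data, Uggla's single is credited with the average of the three singles rather than the larger change his runner actually produced.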

Of course, there are some issues with this approach. First of all, unless I use many, many years of data, some of the event/base/out combinations are going to have very small sample sizes. Triples are already rare, so a triple with a man on third and no outs will be even rarer (turns out this only happened seven times from 2010 to 2012). And while we could just leave triples out of this adjustment, there is a possibility of a baserunner being thrown out at home on a triple, so it is best to include them. Luckily, any event that has a very small sample size with regard to run expectancy will not have a significant impact on a player’s baserunning-adjusted RE24, so it is an issue that I can ignore for the sake of this non-scientific article.
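If you wanted to see which combinations are thin, a quick count per event/base/out cell would do it. This is only a sketch: the `min_count` threshold of 30 is an arbitrary assumption, the function name `small_sample_cells` is hypothetical, and the play list is fabricated apart from the count of seven triples mentioned above.

```python
from collections import Counter

def small_sample_cells(plays, min_count=30):
    """Return the (event, start_state) cells observed fewer than min_count times."""
    counts = Counter((event, start) for event, start in plays)
    return {key: n for key, n in counts.items() if n < min_count}

# Made-up play list: 7 triples with a man on third and no outs,
# plus 50 singles with a man on first and no outs.
plays = [("triple", ("__3", 0))] * 7 + [("single", ("1__", 0))] * 50

thin = small_sample_cells(plays)
print(thin)
```

Cells flagged this way could be reported alongside the results, or pooled across more seasons, rather than silently trusted.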

The other issue with this approach is that we do not want to adjust all events for baserunning, because we want to reward, for example, “productive outs.” I know, I know, that’s not a popular phrase in sabermetrics, and I’m not saying that it should be encouraged. However, if we want to measure context-dependent offensive contribution with run expectancy, we can’t assume that every out (by base/out state) is equal; it is better for hitters to hit a grounder to the right side with a runner on second than a pop-up to shallow left. Yes, there is some baserunning skill in outs, but my intuition, and hopefully yours as well, is that we should assign more credit to the hitter for productive outs than the baserunner.
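In code terms, that rule might look something like the sketch below. Treat the contents of `ADJUSTED_EVENTS` as an assumption for illustration rather than a complete list of what I adjusted; the function name `credited_change` and the run expectancy numbers are likewise made up.

```python
# Events whose credit is replaced by the league-average change. Which events belong
# here is an assumption for this sketch; outs are left out on purpose so that
# productive outs keep their actual value.
ADJUSTED_EVENTS = {"single", "double", "triple"}

def credited_change(event, start, actual_change, averages):
    """Run expectancy change credited to the batter for one play."""
    if event in ADJUSTED_EVENTS and (event, start) in averages:
        return averages[(event, start)]
    return actual_change

# Made-up average for a single with a man on first and no outs.
averages = {("single", ("1__", 0)): 0.70}

# The hit is credited at the average; the groundout keeps its actual change.
hit_credit = credited_change("single", ("1__", 0), 0.90, averages)
out_credit = credited_change("groundout", ("_2_", 0), -0.15, averages)
print(hit_credit, out_credit)
```

Falling back to the actual change when a cell has no average is a design choice for the sketch, so that an unseen combination never raises an error.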

I hope my reasoning makes sense. Before I give you my adjusted RE24 values, I want to make clear that these are not perfect measurements. I did not use Markov chains to calculate the run expectancy values, as I would/should if these data were to be actually used. I also didn’t necessarily assign all the proper credit where credit is due; however, you would likely have to go through every play individually to really get this right, and even then a lot of guesswork and subjectivity is involved. The purpose of this article is to simply get an idea of what happens when we remove baserunning from the equation, and who it helps and hurts the most.

A few final notes:
- I created my run expectancy values from the 2010-2012 seasons.
- I did not adjust for park or league, so keep that in mind if you use the results to rank players (which you probably should not do).
- “RE24” here is not the same as the RE24 on FanGraphs or Baseball-Reference, so don’t be confused by the differences you may find.
- I did not include stolen bases and caught stealings in either version of RE24.

Results

First of all, let’s look at the players who had the largest positive difference between their RE24 and their adjusted RE24. These are the players that benefited most from good baserunning ahead of them.

| Name | PA | RE24 | adjRE24 | Diff |
|---|---|---|---|---|
| Joe Mauer | 641 | 46.22 | 39.13 | 7.10 |
| Alex Gordon | 721 | 14.12 | 7.36 | 6.76 |
| Torii Hunter | 584 | 20.86 | 14.46 | 6.41 |
| Josh Hamilton | 636 | 46.47 | 41.13 | 5.34 |
| Edwin Encarnacion | 644 | 57.84 | 52.56 | 5.27 |
| Billy Butler | 678 | 27.12 | 21.95 | 5.17 |
| Howie Kendrick | 594 | -3.72 | -8.81 | 5.09 |
| Will Venable | 470 | 12.75 | 7.91 | 4.84 |
| Carlos Gonzalez | 579 | 26.01 | 21.17 | 4.84 |
| Ryan Zimmerman | 641 | 22.74 | 18.20 | 4.54 |

These names make a lot of sense once you look at the UBR leaderboard. Mauer’s difference comes largely from Ben Revere and Denard Span, whose impact was even greater because of the large number of singles and doubles that Mauer hit. Alex Gordon can be explained by Alcides Escobar, and Torii Hunter can be explained by none other than Mike Trout.

Interestingly, the first Braves player on that list isn’t until number 24 with Chipper Jones. So, although Jason Heyward had ridiculous baserunning numbers, the effect may have just been distributed more evenly among his teammates.

Next, we’ll look at the players with the biggest negative difference between the two; that is, hitters with bad baserunning in front of them:

| Name | PA | RE24 | adjRE24 | Diff |
|---|---|---|---|---|
| Hanley Ramirez | 667 | 0.83 | 8.87 | -8.04 |
| Nelson Cruz | 642 | 14.60 | 20.55 | -5.95 |
| Jose Altuve | 630 | -1.80 | 3.71 | -5.52 |
| A.J. Pierzynski | 520 | 9.66 | 14.76 | -5.10 |
| Everth Cabrera | 449 | -19.40 | -14.42 | -4.98 |
| Shane Victorino | 666 | -8.46 | -3.70 | -4.76 |
| Yunel Escobar | 608 | -23.17 | -18.74 | -4.43 |
| Bryce Harper | 597 | 7.51 | 11.78 | -4.28 |
| Adam LaRoche | 647 | 23.14 | 27.40 | -4.26 |
| Michael Saunders | 553 | -2.59 | 1.52 | -4.11 |

Hanley Ramirez, for some odd reason, leads the pack. I say some odd reason because I can’t for the life of me figure out why the baserunning in front of him was so bad. For reference, take a look at the runners who were on first base when Hanley came up to bat:

| Runner | PA | UBR |
|---|---|---|
| Jose Reyes | 37 | 1.8 |
| Emilio Bonifacio | 34 | 1.6 |
| Adrian Gonzalez | 34 | -2.4 |
| Omar Infante | 30 | 0.7 |
| Andre Ethier | 26 | 0.8 |
| Matt Kemp | 17 | -0.5 |

Doesn’t look like -8 runs of bad baserunning to you, does it? The numbers for runners on second are very similar, so I’m quite perplexed. It could be the case that these runners just happened to make their baserunning blunders on Hanley’s plate appearances, or that bad baserunners like Adrian Gonzalez were on most often when Hanley hit singles or doubles. After Hanley, Cruz’s difference is explained by the bad baserunning of Adrian Beltre and Michael Young, and Altuve’s by Jordan Schafer, though the extent of the difference is, as with Hanley, perplexing.

Removing baserunning from RE24 doesn’t do a whole lot for the majority of players; at most, it will change the result by seven or eight runs, but more often it will be around one or two. However, regardless of the practical effects, it’s important to think about how we assign credit and blame for both context-independent metrics like wOBA and context-dependent metrics like RE24. This is just one of the ways in which we can do that.

If you’d like to see these numbers for all players, I made a full Google spreadsheet here.

Thanks to Retrosheet, FanGraphs, and Baseball-Reference for the data.


Comments

  1. Peter Jensen said...

    Matt – If the hitter himself is a good baserunner he will stretch a single into a double more often and the man on first will have fewer opportunities to score.  How a ball is hit and the running ability of the batter are two important factors that UBR does not control for, hence they are showing up in your study as something that should be adjusted out of the batting numbers when they are very much characteristics of the batter. 

    Then there is also the factor of when the runner is running on the pitch and the batter puts the ball in play.

  2. Matt Hunter said...

    Professor: I’m glad you liked it!

    Peter: To your first point, you’re saying that my version will give too much credit to good baserunners, right? So if there is a runner on first and the batter hits what would usually be a single but stretches into a double, normal RE24 will just give him credit for the change from 1__ to _23. But my adjRE24 will assume the double was a normal double, which will give the batter credit for however often a runner on first usually scores on doubles.

    That’s definitely an issue with my approach, but I don’t see it as a major one. I’d expect that a batter stretching a single into a double is less common than a runner advancing from first to home on a double. That being said, there are definitely holes in my approach, and ways to improve, so thank you for your feedback!

    To the second point, I’m a bit confused, because it seems like this is the very reason I wanted to adjust RE24. The normal version would give the batter complete credit for extra bases advanced on a hit-and-run, whereas my version would just give the batter credit for the hit.

  3. Jon Roegele said...

    Really cool Matt.

    Yeah as Peter mentioned, I could imagine the specifics of the batted ball might explain some of the outlier cases. If Hanley pulls a lot of line drives to left field, no matter how good the baserunners are on first, they are probably not legging it out to third.

    I agree that overall this sort of thing would probably even itself out for most players. I wonder how much being able to classify hits even into bins for left/center/right would change your run expectancy tables for each starting state/event combination.

  4. Matt Hunter said...

    Great point, Jon. Definitely lots of ways to improve on the approach. More data on where the ball is hit would help a lot for sure.

  5. James Gentile said...

    Great stuff, Matt! It’s fascinating to see that names you’d expect to show up at the top, like Torii Hunter, actually show up at the top. And the difference isn’t really something to scoff at: seven runs for Mauer is something to keep in mind. Really loved this!

  6. Professor Longnose said...

    Very interesting. This is just about the right detail for me to follow, so it’s a good way for me to keep up on saberstats.

  7. Michael said...

    So, could Hanley’s numbers be the result of a conservative 3rd base coach? It seems that looking at players taking the extra base could open up these results to that influence. Could you look at how whole teams did compared to the average expectation? Then you could possibly find that some third base coaches are more aggressive, while others are more conservative. You may want to include some sort of baserunning rating/component, though, so that a coach isn’t getting more credit just due to having fast runners.

  8. MGL said...

    Wait, I thought that RE24 already used the average change in RE, and not the actual change. At least, I always thought that was the case. I guess not, though.

    Matt, you want to at least separate out the IF hits from the OF ones, if that data is easily available.

    Also, power hitters will advance runners more often because the OF has to play deep, so by assuming normal BR advances for all players, you are going to shortchange the power hitters and overvalue the slap hitters. And of course slap hitters get more IF hits and power hitters get fewer ones, so that if you don’t separate IF from OF hits, you are going to reward the slap hitters and punish the power hitters even more.

    So, I am not sure I like this approach at all. In fact, I think your first instinct was a better approach, which is to leave the RE24 alone, using actual base runner advances to do the calculations and then just separate out the base running by using the baserunners’ UBR.

    None of these versions is nearly perfect and I am not really sure which is the best one.

    Then again, I’m not crazy about RE24 anyway. If you are going to give a batter credit for the timing (at least bases and outs-wise, not inning and score) of his hits, then you might as well go the full 9 yards and give him WE credit, and not RE credit (which is WPA I guess). Thus, a hit or an out with a big lead or deficit is not going to be worth much of anything.

  9. Brian Cartwright said...

    When calculating the expected (mean) rates of advance for each situation, did you consider which field the ball was hit to (left, center, right), what type of batted ball (grounder, liner, fly) or how many outs?

    This week I was coding my own baserunning/throwing metric. Happily, I had the same number for Heyward’s 2012 season (+7.4), but I thought the above parameters may help explain Hanley Ramirez at the plate – assuming most of his singles are to left field, the average advance from first or second base to third base is much less than if the ball is hit to center or right field.
