Circle the Wagons: Running the Bases Part I

“Who says there’s an unemployment problem in this country? Just take the five percent unemployed and give them a baseball stat to follow.”
–Outfielder Andy Van Slyke

Way back in 1984 Bill James wrote in The Baseball Abstract:

“Baserunning is perfectly measurable; it can be easily defined and, given properly maintained scoresheets, easily researched. Our lack of knowledge on the subject is attributable entirely to record-keeping decisions that were made a little over a century ago and have never been intelligently or systematically reviewed.”

In other words, our lack of knowledge about baserunning is a matter of historical contingency. In 1845 the first box score appeared in the New York Morning News and contained only runs and outs, or “hands lost” in the nomenclature of the day. By the end of the 1850s box scores included nine additional columns per player, including foul outs and put outs, which counted balls caught on one bounce (at that time scored as outs).

A few years later a cricket enthusiast named Henry Chadwick, as documented in Alan Schwarz’s wonderful book The Numbers Game, invented the nine-by-nine grid and the system of letters and numbers that became the standard scoresheet and scoring system used to record the events of games. And although Chadwick, the acknowledged father of baseball statistics, codified the definitions of base hits, total bases, unearned runs, batting average and more, neither his scoring system nor his plethora of statistics captured how baserunners advanced around the bases, beyond counting when a runner reached base and whether that runner scored.

An attempt to do so was made, however, when stolen bases were tracked beginning in 1886 and included both traditional stolen bases and “extra” bases gained on hits. For example, a runner going from first to third on a single would be credited with a stolen base. Under these rules Hugh Nicol was credited with 138 stolen bases in 1887. The modern stolen base definition was adopted in 1898 (although the AL didn’t count caught stealing until after 1919 and the NL until after 1950) and took with it any attempt at quantifying baserunning. Things have pretty much remained the same ever since.

These and other problems with the traditional scoring system led James and John Dewan (who later went on to co-found STATS, Inc.) to create Project Scoresheet in the late 1980s. Under Project Scoresheet, volunteers across the country used a new scoresheet format and scoring system that better captured the information that tends to get lost in the cracks of Chadwick’s method. The essence of the system is that each of Chadwick’s cells representing a plate appearance is subdivided into three sections: pre events, the primary event, and post events. Capturing this level of detail makes it possible to track play-by-play data. The codes that were developed by Project Scoresheet now show up in the event files published by Retrosheet and in the codes used by DataCasters like myself for MLB.com’s Gameday system.

All that to say that with the appropriate scoring system in place and the data available, I took a stab last winter on my blog at quantifying baserunning using the play-by-play data for 2003 and 2004. Shortly after, James Click at Baseball Prospectus did the same and published his excellent analysis in the 2005 Baseball Prospectus under the title “Station to Station: The Expensive Art of Baserunning”. At the time I wrote on the subject, those two seasons were the only play-by-play data I could obtain. Now Retrosheet has made available the event files for 2000 to 2004, and so I thought I’d update my own framework and report the results in this week’s and next week’s articles.

The Methodology

The methodology I used when creating my baserunning framework is really quite simple. First, I examined the following scenarios:

  • Runner on first, second not occupied, and the batter singles
  • Runner on second, third not occupied, and the batter singles
  • Runner on first, second not occupied, and the batter doubles

Although these are not the only possible scenarios that might be used to measure baserunning, I chose them since I assumed they were fairly common and could provide a baseline to measure the magnitude of the difference between good and bad baserunners.

Next, I calculated the number of bases that runners advanced in each scenario, broken down by the number of outs and which fielder fielded the ball. Third, I took these aggregates and created a matrix of expected outcomes. For example, the expected outcomes in terms of percentages encompassing the five year period from 2000 to 2004 are shown in the three tables below.

Runner on first, second not occupied, and the batter singles

Outs/Where   Opp    To2nd    To3rd    Score       OA
0  Other    1611    0.779    0.196    0.012    0.014
0  Left     2696    0.854    0.136    0.004    0.006
0  Center   2393    0.731    0.255    0.005    0.008
0  Right    3114    0.570    0.413    0.007    0.011

1  Other    1709    0.737    0.239    0.008    0.016
1  Left     3453    0.851    0.135    0.006    0.009
1  Center   3018    0.705    0.281    0.004    0.010
1  Right    4060    0.562    0.419    0.007    0.012

2  Other    1945    0.714    0.258    0.014    0.014
2  Left     2886    0.839    0.143    0.009    0.009
2  Center   3219    0.664    0.312    0.013    0.011
2  Right    3278    0.522    0.459    0.013    0.006

Runner on second, third not occupied, and the batter singles

Outs/Where   Opp    To3rd    Score       OA
0  Other    1078    0.718    0.198    0.011
0  Left     1189    0.583    0.405    0.009
0  Center   1267    0.361    0.617    0.019
0  Right    1219    0.555    0.433    0.011

1  Other    1339    0.618    0.269    0.011
1  Left     2199    0.449    0.515    0.031
1  Center   2191    0.239    0.725    0.033
1  Right    2109    0.388    0.578    0.032

2  Other    1791    0.621    0.336    0.038
2  Left     2315    0.098    0.832    0.069
2  Center   2690    0.030    0.932    0.037
2  Right    2220    0.076    0.864    0.062

Runner on first, second not occupied, and the batter doubles

Outs/Where   Opp    To3rd    Score       OA
0  Other    509    0.566     0.432    0.002
0  Left    1028    0.699     0.281    0.019
0  Center   385    0.460     0.522    0.018
0  Right    728    0.709     0.283    0.008

1  Other    660    0.552     0.447    0.002
1  Left    1333    0.683     0.290    0.028
1  Center   533    0.336     0.585    0.081
1  Right    976    0.686     0.288    0.026

2  Other    641    0.359     0.640    0.002
2  Left    1264    0.498     0.441    0.063
2  Center   499    0.164     0.788    0.048
2  Right    900    0.449     0.492    0.062

The “Other” category includes balls fielded by all other positions.

From these tables it is immediately obvious that both the number of outs and the location where the ball is hit play a large role in determining the advancement of the runner, hence the need to take these into account. For example, with two outs and a man on first when the batter doubles, the runner scores about 79% of the time when the ball is fielded by the center fielder, but just 44% of the time when fielded by the left fielder. Using the position that fielded the ball also takes into account the differences between left-handed and right-handed hitters hitting behind certain baserunners.
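The aggregation behind tables like these can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the `plays` records, their field order, and the `outcome_matrix` helper are hypothetical names, and the sample data is made up rather than drawn from Retrosheet.

```python
from collections import Counter, defaultdict

# Hypothetical play records for one scenario: (outs, fielder, outcome).
# Outcome is "To3rd", "Score", or "OA" (out advancing). Made-up sample data.
plays = [
    (2, "Left", "Score"),
    (2, "Left", "To3rd"),
    (2, "Left", "To3rd"),
    (2, "Left", "OA"),
    (2, "Center", "Score"),
    (2, "Center", "Score"),
]

def outcome_matrix(plays):
    """Group plays by (outs, fielder) and turn outcome counts into shares."""
    counts = defaultdict(Counter)
    for outs, fielder, outcome in plays:
        counts[(outs, fielder)][outcome] += 1

    matrix = {}
    for key, tally in counts.items():
        opp = sum(tally.values())  # opportunities in this cell
        row = {"Opp": opp}
        for outcome, n in tally.items():
            row[outcome] = n / opp
        matrix[key] = row
    return matrix

matrix = outcome_matrix(plays)
print(matrix[(2, "Left")])
```

Keying the matrix on both outs and fielder keeps each runner's expectation tied to the situations he actually faced.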

Finally, for each individual player I compared their performance in each scenario to the matrix above (with all positions enumerated and for each season) and calculated not only how many bases they advanced but how that differed from the expected number of bases gained. For example, the expected number of bases gained with two outs and the runner on first when the batter doubles to left is:

2.19 = (.513 * 2) + (.428 * 3) + (.059 * -2)

Note that the runner is penalized two bases in this situation for getting thrown out, since I’m assuming they would otherwise have advanced two bases. So if Carlos Beltran scored in this situation he would be credited with .81 bases (3 - 2.19).
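The arithmetic in that example can be checked with a short Python sketch. The shares and base values are the ones in the formula above; the `expected_bases` helper and variable names are illustrative only.

```python
# (share of outcomes, bases gained), taken from the worked formula above.
outcomes = [
    (0.513, 2),   # runner stops at third: two bases
    (0.428, 3),   # runner scores: three bases
    (0.059, -2),  # runner thrown out advancing: the -2 in the last term
]

def expected_bases(outcomes):
    """Weighted average of bases gained across the possible outcomes."""
    return sum(share * bases for share, bases in outcomes)

eb = expected_bases(outcomes)
credit_for_scoring = 3 - eb  # actual bases (3) minus expected bases

print(round(eb, 2), round(credit_for_scoring, 2))  # prints: 2.19 0.81
```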

I call Incremental Bases (IB) the sum of the differences between the actual number of bases and the expected number of bases (EB). I christened Incremental Base Percentage (IBP) the ratio of actual bases to EB. An IBP of greater than 1.0 is good since it means that the runner advanced more bases than expected given the situations they found themselves in, while an IBP of less than 1.0 indicates a bit of a plodder.
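As a quick sanity check on these two measures, here is a minimal sketch using Juan Pierre's 2000-2004 totals from the results later in this article (411 bases against 367 expected). The function names are hypothetical. Note that the published IBP values are consistent with actual bases divided by EB, which is the assumption the code makes.

```python
def incremental_bases(actual_bases, expected_bases):
    """IB: actual bases minus expected bases."""
    return actual_bases - expected_bases

def incremental_base_pct(actual_bases, expected_bases):
    """IBP, assumed here to be actual bases over expected bases."""
    return actual_bases / expected_bases

# Juan Pierre, 2000-2004: 411 bases gained against 367 expected.
pierre_ib = incremental_bases(411, 367)
pierre_ibp = incremental_base_pct(411, 367)

print(pierre_ib, round(pierre_ibp, 2))  # prints: 44 1.12
```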

The Results

Enough of the preliminaries. First, the top 10 baserunners with 100 or more opportunities from 2000-2004 in total number of incremental bases. Here OA stands for “out advancing” and records the number of times the runner was thrown out in these scenarios.

Name             Opp Bases    EB IB OA  IBP
Juan Pierre      259   411   367 44  4 1.12
Luis Castillo    272   450   410 40  4 1.10
Mike Cameron     168   286   252 34  2 1.14
Cristian Guzman  209   349   315 34  3 1.11
Ray Durham       203   334   301 33  0 1.11
David Eckstein   216   352   319 33  3 1.10
Carlos Beltran   208   346   313 33  0 1.10
Johnny Damon     256   420   390 30  3 1.08
Edgar Renteria   210   338   310 28  4 1.09
Jay Payton       168   280   252 28  3 1.11

And the bottom 10…

Name             Opp Bases    EB  IB OA  IBP
Mike Lieberthal  130   165   192 -27  4 0.86
Richie Sexson    151   203   233 -30  7 0.87
J.T. Snow        150   202   233 -31  5 0.87
Edgar Martinez   178   238   269 -31  3 0.89
Ben Molina       138   177   208 -31  4 0.85
Dmitri Young     139   185   217 -32  9 0.85
Rafael Palmeiro  207   273   307 -34  8 0.89
John Olerud      195   248   285 -37  7 0.87
Bill Mueller     191   256   293 -37 10 0.87
Carlos Delgado   237   324   362 -38  8 0.89

From looking at these lists it is pretty apparent that indeed the system is measuring something—at the very least, raw speed since the first list is dominated by speedsters and the bottom by plodders. In addition, the plodders were thrown out on the bases a significant number of times, which of course has the tendency to dramatically drive down the number of incremental bases since the runner is credited with negative bases in those cases.

As you can see, the spread here runs from about +40 to -40 bases over the period of five years. In other words, the best baserunners are worth about sixteen bases more per year than the worst over that span. Looking at an individual year, the spread is about +15 to -15, or a span of 30 bases, as illustrated by the leaders and trailers for each season (with 25 or more opportunities).

Year Name            Opp Bases  EB  IB OA  IBP
2004 Rafael Furcal    59   101  87  14  0 1.17
2003 Brian Roberts    57    94  81  14  0 1.17
2002 Ray Durham       40    73  59  14  0 1.23
2001 David Eckstein   51    91  77  14  1 1.18
2000 Luis Castillo    57   105  86  19  0 1.22

Year Name            Opp Bases EB   IB OA  IBP
2004 Bill Mueller     47   56  74  -18  3  .76
2003 Juan Encarnacion 41   46  62  -16  5  .74
2002 Frank Thomas     40   46  60  -14  4  .77
2001 Luis Gonzalez    51   63  77  -14  2  .82
2000 Joe Randa        50   64  79  -15  2  .81

But because incremental bases is akin to a counting statistic like RBIs, it is heavily influenced both by the number of times the runner was on base and how often the hitters lower in the order got hits. Therefore, perhaps a better way to rank the runners is using the rate statistic, IBP.

Name             Opp Bases  EB  IB OA  IBP
Jack Wilson      117   203 179  24  1 1.14
Mike Cameron     168   286 252  34  2 1.14
Raul Mondesi     112   190 168  22  1 1.13
Chris Singleton  112   192 171  21  2 1.12
Juan Pierre      259   411 367  44  4 1.12
Vernon Wells     104   175 157  18  4 1.12
Jay Payton       168   280 252  28  3 1.11
Ray Durham       203   334 301  33  0 1.11
Torii Hunter     170   279 252  27  3 1.11
Cristian Guzman  209   349 315  34  3 1.11

Name             Opp Bases  EB  IB OA  IBP
Edgar Martinez   178   238 269 -31  3 0.89
Bill Mueller     191   256 293 -37 10 0.87
John Olerud      195   248 285 -37  7 0.87
Richie Sexson    151   203 233 -30  7 0.87
J.T. Snow        150   202 233 -31  5 0.87
Fred McGriff     110   140 161 -21  2 0.87
Frank Thomas     121   158 182 -24  5 0.87
Mike Lieberthal  130   165 192 -27  4 0.86
Dmitri Young     139   185 217 -32  9 0.85
Ben Molina       138   177 208 -31  4 0.85
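To make the difference between the counting stat and the rate stat concrete, here is a small sketch that ranks a handful of the players above both ways. The Opp, Bases, and EB figures are copied from the tables in this article; the `rank` helper and its `min_opp` parameter are hypothetical names. Because the published EB values are rounded, recomputed IBPs can differ slightly in the third decimal place.

```python
# (name, opportunities, actual bases, expected bases), copied from the tables.
runners = [
    ("Juan Pierre", 259, 411, 367),
    ("Luis Castillo", 272, 450, 410),
    ("Mike Cameron", 168, 286, 252),
    ("Carlos Delgado", 237, 324, 362),
]

def rank(runners, key, min_opp=100):
    """Return names sorted best-first by `key`, dropping small samples."""
    eligible = [r for r in runners if r[1] >= min_opp]
    return [r[0] for r in sorted(eligible, key=key, reverse=True)]

by_ib = rank(runners, key=lambda r: r[2] - r[3])   # counting stat: IB
by_ibp = rank(runners, key=lambda r: r[2] / r[3])  # rate stat: IBP

print(by_ib)
print(by_ibp)
```

In this sample Mike Cameron moves ahead of Juan Pierre and Luis Castillo once opportunities are factored out, which is the same reshuffling the IBP table shows.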

So are Jack Wilson and Mike Cameron really the best baserunners of the last five years and are Ben Molina and Dmitri Young really the worst? I don’t know for sure, but once again it seems like the framework produces reasonable results.

Odds and Ends

In looking at the aggregate data a few interesting nuggets turned up:

  • Bobby Abreu, who one would think would be a good baserunner, was thrown out 10 times during the five-year period, which drove his IB to -5 and his IBP to .99, pushing him to 117th out of 177 players with 100 or more opportunities. Bill Mueller and Juan Encarnacion were also thrown out 10 times
  • The leader in getting thrown out in a season was Juan Encarnacion, who was nabbed five times in 2003: twice when he was on first and the following batter doubled, and three times when he was on second and the batter singled
  • Larry Walker, not a particularly fast man, is ranked 17th in IBP at 1.10, validating his reputation as a good baserunner
  • The most opportunities in a single season was 91 by Juan Pierre in 2003. Ichiro was a close second with 89 in 2004
  • Speaking of Ichiro, it is strange that for a player with such a good baserunning reputation, he comes in with an IBP of only 1.04, good for 65th place out of 177 players with 100 or more opportunities. Still, that’s better than average, and his yearly IBPs were a consistent 1.08, 1.04, 1.04, and 1.01
  • Carlos Beltran had the most opportunities, 208, without ever being thrown out. Ray Durham and Garret Anderson both had 203 chances without being nailed, but while Beltran and Durham have IBPs over 1.0, Anderson is a conservative runner with an IBP of .96

Next

But of course this isn’t the end of the story. In addition to simply counting the number of incremental bases and calculating IBP there are, as Click discussed, team effects to consider, including the effect of managers and third base coaches as well as park factors. And of course, there is also the issue of converting these incremental bases into the number of runs gained or lost. I’ll explore these and other issues next week.

References & Resources
The Numbers Game: Baseball’s Lifelong Fascination with Statistics – by Alan Schwarz


Freedom
8 years ago

Hi, Mr. Dan Fox!

RE: Circle the Wagons: Running the Bases Part I

My name is Freedom, I am a professor in the Dept. Industrial Engineering of one of Korean national universities. These days, I am working on a baseball simulation. For most of its input, I use players’ data of Korea Pro-Baseball League. Thanks a lot for your nice articles!

Question: Table for runner on second, third not occupied, and the batter singles: the probabilities of each row do not sum up to 1; e.g., those of first row sum up to .927 and those of 5th row sum up to .898. Is it because of the cases in which the runner on second has to stay on the second? Thus remaining .073 and .102 is for such cases?

I would appreciate your kind reply. It will help my runner-advancement modelling greatly. I am willing to share my work with you if you wish.

Freedom from Korea.
“We are bathed in love every minute and every day, for we are LOVE.”