Stolen base attempts: an algorithm for allocating run value

It is customary to credit a runner if he runs on the pitch and reaches the next base safely without the benefit of any major misplays by the defense. We call this a stolen base.

But how much credit does the runner really deserve? After all, there are always other players involved in a stolen base attempt, and frequently these other players are more responsible than the runner for the advancement, or the out that results. The pitcher may ignore the runner and allow a long lead, or a walking lead, or he may execute a very slow delivery to home plate. The catcher may bobble the pitch, or execute a slow exchange and release; his throw may be off target, or weak. The fielder at the play base may drop or miss the throw, or may fail to apply the tag.

This article presents an algorithm for logically dividing credit on stolen base attempts among the participating players, sharing the run value of the play result based on the quality of their performances.

To keep things simpler, this algorithm will cover only situations with a sole runner on first base who attempts to steal second base, where the play does not result in a passed ball or wild pitch, and where there is no defensive error or any additional advancement by the runner beyond the play base. Other plays involving these situations have their own algorithms, which will be discussed elsewhere.

The algorithm will also consider contributions from only the runner, pitcher and catcher, leaving consideration of the fielder’s contribution for future discussion.

Play value

How much is a successful steal of second base worth to the offense? How much does a failed attempt at second base cost the offense? We will answer these questions using run expectancy values. Run expectancy (RE) is the average number of runs scored through the end of the inning from each of the 24 base-out states. Here are the RE numbers for 2012:

[Image: 2012 run expectancy by base-out state]

The run value of any play can be determined by calculating the change in run expectancy from the initial state to the final state. So, with a runner on first, a successful steal of second base with none out changes the RE from 0.858 to 1.073 (the run expectancy for a man on second, none out), a change of +0.215 runs. If the attempt is unsuccessful and the runner is thrown out, the change in RE is from the initial 0.858 to a final value of 0.263 (the run expectancy for bases empty, one out), a net change of -0.595 runs. With one out, the play values are +0.144 runs for success, and -0.411 runs for failure. With two outs, the values are +0.097 and -0.221 runs.
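
As a minimal sketch of this calculation, the change in run expectancy can be computed directly. Only the three zero-out RE values quoted above are included; the dictionary layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# Run expectancy (2012) for the three base-out states quoted in the text.
# Keys are (bases, outs); this layout is an illustrative assumption.
RE_2012 = {
    ("first", 0): 0.858,   # runner on first, none out
    ("second", 0): 1.073,  # runner on second, none out
    ("empty", 1): 0.263,   # bases empty, one out
}

def run_value(start, end, re_table=RE_2012):
    """Run value of a play: RE of the final state minus RE of the initial state."""
    return re_table[end] - re_table[start]

sb_value = run_value(("first", 0), ("second", 0))  # successful steal of second
cs_value = run_value(("first", 0), ("empty", 1))   # caught stealing at second

print(f"SB value: {sb_value:+.3f} runs")
print(f"CS value: {cs_value:+.3f} runs")
```

The same function would cover the one-out and two-out cases once the rest of the RE table is filled in.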

A brief aside here: it is important to keep in mind that the RE values listed above are an aggregation of all major league data, so the precise run expectancies for a situation may (will) be different, depending on the players involved. Figuring out how the odds shift in particular situations is one of the things a good manager does. Figuring out the “centerline” odds and making them available to the manager is one of the things a good analyst does.

So, how does one go about deciding if the potential reward of a stolen base is worth the risk? Let’s do the math. The “break-even point” (BEP) is the success rate for attempts for which run value gained on successes and run value lost on failures balance each other. It is given by the equation:

BEP = CS Value / (CS Value – SB Value)

For zero outs: BEP = (-0.595)/(-0.595 – 0.215) = 0.735 = 73.5%

So, if one can exceed a 73.5 percent success rate, attempting to steal second with none out will be beneficial in the long term. If not, one would be advised to not try for the steal, although making an attempt from time to time when the odds say not to will help to keep opposing teams from becoming too accurate in anticipating one’s tactical moves.
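
Here is a short sketch of the break-even calculation for all three out states, using the SB and CS values given earlier. The function and variable names are assumptions made for illustration. Because the inputs are rounded to three decimals, the results may differ by a few tenths of a percentage point from figures computed on unrounded RE values.

```python
def break_even_point(sb_value, cs_value):
    """Success rate p at which p * sb_value + (1 - p) * cs_value equals zero."""
    return cs_value / (cs_value - sb_value)

# (SB value, CS value) for a steal of second, keyed by number of outs (from the text).
PLAY_VALUES = {
    0: (0.215, -0.595),
    1: (0.144, -0.411),
    2: (0.097, -0.221),
}

for outs, (sb, cs) in PLAY_VALUES.items():
    print(f"{outs} out(s): BEP = {break_even_point(sb, cs):.1%}")
```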

With the groundwork now laid, we can move on to discussing the attribution of credit/blame for the outcome of stolen base attempts. When trying to allocate performance value on a play, we first must identify the participating players. For a stolen base attempt at second base, there are four participants: the runner, pitcher, catcher and a fielder (we will treat any advances or putouts that take place after the main play as separate events). Let’s consider each of the participants for a moment.

The participants

The runner is the most important player in any stolen base attempt, in the sense that he (among the involved players) decides when there will be an attempt, and of course he can unilaterally decide not to attempt a steal as well. Another important aspect of the runner’s involvement is that the runner’s performance forms one complete side of the confrontation: the runner’s reaction to the pitcher’s first move begins the play, and his initial touch of second base ends the play (if a tag hasn’t ended it sooner). The elapsed time between the pitcher’s first move and the runner’s touching second base is the key metric for the runner.

The pitcher’s delivery time to home plate governs the first portion of the defensive side of the stolen base attempt. The pitcher’s performance is relatively independent, in that the pitcher generally cannot alter his delivery or pitch selection based on the actions of the runner. The pitcher’s impact on the runner’s lead and/or jump, including the influence of the handedness of the pitcher, is significant, but will be discussed elsewhere.

The catcher exerts a huge influence on stolen base attempts, naturally. Unlike the runner and pitcher, the catcher does not begin his performance from a “clean slate”; he inherits a different situation on every stolen base attempt, based on the pitcher’s delivery time and the first portion of the runner’s sprint to second base. The catcher’s performance is encapsulated in the amount of time between his first touching the pitch and the arrival of his throw in the glove of the fielder covering second base. The accuracy of his throw is, of course, important, but will be discussed elsewhere.

The fielder’s task is simple, if not always easy: He catches the throw and applies the tag. In this discussion, we will assume the fielder catches the throw and applies the tag, and we will not consider the value he provides by doing so; analysis of the fielder’s contribution will be discussed elsewhere.

Calculating individual player values

A sampling of stolen base attempts from the 2011-13 seasons was analyzed, with times measured for segments of the play corresponding to the performances of the runner and pitcher. The data for successful and unsuccessful attempts were separated, and probability density functions (PDFs) were fit for each category. The PDFs were then weighted and combined to yield plots which show the likelihood of success vs. the runner’s and pitcher’s times.
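
The article does not say which distribution family was fit or exactly how the PDFs were weighted. One plausible reading, sketched below, is Bayes' rule: weight each outcome's PDF by how often that outcome occurs, then take the safe share at a given time. The normal distributions and every parameter here are hypothetical placeholders, not the fitted values behind the charts.

```python
from statistics import NormalDist

def safe_probability(time_s, safe_dist, out_dist, prior_safe):
    """P(safe | time): each outcome's PDF weighted by that outcome's frequency."""
    weighted_safe = prior_safe * safe_dist.pdf(time_s)
    weighted_out = (1.0 - prior_safe) * out_dist.pdf(time_s)
    return weighted_safe / (weighted_safe + weighted_out)

# Hypothetical fits of runner time (seconds); NOT taken from the article's data.
safe_times = NormalDist(mu=3.50, sigma=0.20)  # times on successful steals
out_times = NormalDist(mu=3.65, sigma=0.20)   # times on caught stealings
prior_safe = 0.74                             # assumed overall success rate

for t in (3.3, 3.5, 3.7, 3.9):
    p = safe_probability(t, safe_times, out_times, prior_safe)
    print(f"{t:.2f} s -> {p:.1%} safe")
```

With real data, the two distributions would be fit separately to the measured times for safe and out results, and the prior would be the observed success rate in the sample.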

Runner time chart:

[Image: Safe% vs. runner’s time]

Pitcher time chart:

[Image: Safe% vs. pitcher’s time]

Note: due to the limited size of the sample, these plots should be considered approximate, and those who wish to make use of this algorithm should avail themselves of a larger sample of data, ideally a full season or more. However, the effectiveness of the algorithm is not dependent on the precision of the charts, and the focus of this discussion will remain on the algorithm.

Upon measuring the runner or pitcher’s time, and using the appropriate chart to convert the time to a “Safe %,” the weighted value of the performance is calculated by multiplying the Safe% by the SB Value, multiplying (1-Safe%) by the CS Value, and adding the two numbers.

Runner’s Value: The first value contribution to be calculated is that of the runner. It is determined as follows:

  • Measure the runner’s time, which is the time elapsed between the pitcher’s first move and the runner touching second base. Even if the runner is tagged out, the runner’s time is counted to the instant he touches second base.
  • Consult the runner’s time chart and find the corresponding Safe% for the runner’s time.
  • Multiply the Safe% by the SB Value, multiply (1-Safe%) by the CS Value, and add the two numbers. This is the Runner’s Value.
  • Example (using values for zero outs): for runner’s time = 3.26 seconds, the corresponding Safe % is 90.0%. Multiply 90.0% by +0.215, and add (1-90.0%) times -0.595, which equals +0.134 runs. This is the Runner’s Value. A positive number indicates a favorable contribution for the runner (adding runs), while a negative number indicates an unfavorable contribution (reducing runs) .

Pitcher’s Value: Next, the pitcher’s value is determined, as follows:

  • Measure the pitcher’s time, which is the time elapsed between the pitcher’s first move and the pitch touching the catcher’s glove.
  • Consult the pitcher’s time chart and find the corresponding Safe % for the Pitcher’s Time.
  • Multiply the Safe% by the SB Value, multiply (1-Safe%) by the CS Value, and add the two numbers. This is the Pitcher’s Value.
  • Example: for pitcher’s time = 1.33 seconds, the corresponding Safe% is 68.5 percent. Multiply 68.5% by +0.215, and add (1-68.5%) * -0.595, which equals -0.040 runs. This is the Pitcher’s Value. The negative number here indicates a favorable result for the pitcher (reducing runs).
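
The runner and pitcher calculations use the same weighting formula, so both examples can be reproduced with one small function. This is a sketch using the zero-out values; the function name is an assumption for illustration.

```python
SB_VALUE = 0.215   # steal of second, none out
CS_VALUE = -0.595  # caught stealing at second, none out

def performance_value(safe_pct, sb_value=SB_VALUE, cs_value=CS_VALUE):
    """Weighted run value of a performance that yields the given Safe%."""
    return safe_pct * sb_value + (1.0 - safe_pct) * cs_value

runner_value = performance_value(0.900)   # runner's time 3.26 s -> 90.0% safe
pitcher_value = performance_value(0.685)  # pitcher's time 1.33 s -> 68.5% safe

print(f"Runner's Value:  {runner_value:+.3f} runs")
print(f"Pitcher's Value: {pitcher_value:+.3f} runs")
```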

 

Catcher’s Value: Finally, the catcher’s value is determined, as follows:

  • The Catcher’s Value is calculated as the overall run value of the play result (i.e. SB Value or CS Value) minus the sum of the Runner’s Value and Pitcher’s Value.
  • Example: given the inputs above (Runner’s Value = +0.134 runs, Pitcher’s Value = -0.040 runs), the Catcher’s Value will depend on whether the runner is safe or out at second base. If the runner is safe, the Catcher’s Value = +0.215 runs – (+0.134 runs) – (-0.040 runs) = +0.121 runs. If the runner is out at second, the Catcher’s Value = -0.595 runs – (+0.134 runs) – (-0.040 runs) = -0.689 runs.
  • If the runner steals second with a very fast time and the pitcher’s delivery to home is extremely slow, the sum of the Runner’s Value and Pitcher’s Value could, in an extremely rare instance, exceed the SB Value. The formula would then make the Catcher’s Value negative (i.e. reducing runs, a good defensive contribution), which would not make sense on a play where the runner was safe and the catcher had essentially no impact. In such a case, the Catcher’s Value is set to zero, and the Pitcher’s Value is adjusted so that the total play value equals the SB Value.
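
The three catcher rules above can be sketched as a single allocation function. The function name and return layout are illustrative assumptions, and the defaults are the zero-out values. The rare-case adjustment is implemented as described: the catcher is set to zero and the pitcher absorbs the difference.

```python
def allocate_credit(runner_value, pitcher_value, runner_safe,
                    sb_value=0.215, cs_value=-0.595):
    """Split a play's run value among the runner, pitcher and catcher."""
    play_value = sb_value if runner_safe else cs_value
    catcher_value = play_value - (runner_value + pitcher_value)

    # Rare case: the runner is safe, yet runner + pitcher already exceed the
    # SB value. Zero out the catcher and trim the pitcher so the total matches.
    if runner_safe and catcher_value < 0:
        catcher_value = 0.0
        pitcher_value = play_value - runner_value

    return {
        "runner": runner_value,
        "pitcher": pitcher_value,
        "catcher": catcher_value,
        "total": play_value,
    }

for safe in (True, False):
    result = allocate_credit(0.134, -0.040, runner_safe=safe)
    label = "SAFE" if safe else "OUT"
    print(label, {name: f"{value:+.3f}" for name, value in result.items()})
```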

If the runner is safe in our example, the credit/blame is allotted as follows:

  • Runner’s Value: +0.134 runs
  • Pitcher’s Value: -0.040 runs
  • Catcher’s Value: +0.121 runs
  • Total Run Value: +0.215 runs

If the runner is out in our example, the credit/blame is allotted as follows:

  • Runner’s Value: +0.134 runs
  • Pitcher’s Value: -0.040 runs
  • Catcher’s Value: -0.689 runs
  • Total Run Value: -0.595 runs

Note that the runner and pitcher get the same credit in both instances, because they delivered the same performances. The catcher’s credit depends on whether he was able to receive a pitch at time = +1.33 seconds, and get it to second base in time for the tag to be applied before time = +3.26 seconds. This is a tough play for a catcher to make, and if we do the math, we find that the catcher’s break-even point on this play is 15 percent—if he can throw out runners on a play like this more than 15% of the time, his performance is adding value to his team.
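
For readers who want to check the 15 percent figure, here is the arithmetic as a short sketch. The function name is an assumption. It solves for the caught-stealing rate at which the catcher's two possible values average out to zero.

```python
def catcher_break_even(value_if_safe, value_if_out):
    """Caught-stealing rate q where q * value_if_out + (1 - q) * value_if_safe = 0."""
    return value_if_safe / (value_if_safe - value_if_out)

# Catcher's Values from the example: +0.121 if the runner is safe, -0.689 if out.
rate = catcher_break_even(value_if_safe=0.121, value_if_out=-0.689)
print(f"Catcher's break-even caught-stealing rate: {rate:.1%}")
```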

Boundary cases:

To satisfy ourselves that this algorithm delivers sensible values, let’s consider some boundary plays (using values for zero outs).

Fast runner, slow pitcher: Runner’s time = 3.30 seconds -> 87 percent safe -> +0.110 runs. Pitcher’s time = 1.65 seconds -> 82 percent safe -> +0.067 runs. Catcher’s Value = +0.047 runs if SAFE, -0.763 runs if OUT. This fits: With a fast runner and slow pitcher delivery, the catcher gets a huge amount of credit if he throws the runner out, but only a small penalty for failing to do so.

Fast runner, fast pitcher: Runner’s time = 3.30 seconds -> 87 percent safe -> +0.110 runs. Pitcher’s time = 1.26 seconds -> 60 percent safe -> -0.109 runs. Catcher’s Value = +0.223 runs if SAFE, -0.588 runs if OUT. The runner’s excellent performance and the pitcher’s excellent performance cancel each other out, leaving the outcome of the play in the hands of the catcher.

Slow runner, slow pitcher: Runner’s time = 3.88 seconds -> 63 percent safe -> -0.082 runs. Pitcher’s time = 1.76 seconds -> 84 percent safe -> +0.082 runs. Catcher’s Value = +0.224 runs if SAFE, -0.587 runs if OUT. Again, the runner’s performance and the pitcher’s performance balance each other, rendering the catcher’s performance decisive.

Slow runner, fast pitcher: Runner’s time = 3.75 seconds -> 67 percent safe -> -0.052 runs. Pitcher’s time = 1.22 seconds -> 51 percent safe -> -0.185 runs. Catcher’s Value = +0.461 runs if SAFE, -0.349 runs if OUT. With a slow runner and fast pitcher delivery, the catcher has an easier-than-usual task, and thus merits a big penalty if he allows the stolen base; if he guns the runner down, he gets less credit than in most situations, since the runner and pitcher have essentially done some of his work for him.

Average runner, average pitcher: Runner’s time = 3.56 seconds -> 74 percent safe -> +0.001 runs. Pitcher’s time = 1.40 seconds -> 73 percent safe -> -0.001 runs. Catcher’s Value = +0.223 runs if SAFE, -0.587 runs if OUT. Both the runner and the pitcher have delivered performances that are essentially at the break-even point, which of course means that the catcher’s performance will decide the outcome.
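
The time and Safe% pairs quoted in the examples above can stand in for the charts in a rough lookup. The sketch below interpolates linearly between those quoted points and clamps outside their range. Both choices are assumptions made for illustration; the actual curves come from the fitted PDFs and need not be piecewise linear.

```python
from bisect import bisect_left

# (time in seconds, Safe%) pairs quoted in the article's examples, sorted by time.
RUNNER_POINTS = [(3.26, 0.90), (3.30, 0.87), (3.56, 0.74), (3.75, 0.67), (3.88, 0.63)]
PITCHER_POINTS = [(1.22, 0.51), (1.26, 0.60), (1.33, 0.685), (1.40, 0.73),
                  (1.65, 0.82), (1.76, 0.84)]

def safe_pct(time_s, points):
    """Linearly interpolate Safe% at time_s; clamp outside the tabulated range."""
    times = [t for t, _ in points]
    if time_s <= times[0]:
        return points[0][1]
    if time_s >= times[-1]:
        return points[-1][1]
    i = bisect_left(times, time_s)  # first tabulated time >= time_s
    (t0, p0), (t1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (time_s - t0) / (t1 - t0)

print(f"Runner 3.43 s  -> {safe_pct(3.43, RUNNER_POINTS):.1%} safe")
print(f"Pitcher 1.50 s -> {safe_pct(1.50, PITCHER_POINTS):.1%} safe")
```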

What about “deterrence”?

Some pitchers (typically left-handed ones) are known for their deceptive delivery, which makes it difficult for a runner to detect whether the pitcher is going home or coming over to first; this, of course, makes runners less willing to attempt a stolen base, since they don’t want to be picked off if they read the pitcher’s motion incorrectly. This apparent ability to deter stolen base attempts is usually regarded as a positive feature for a pitcher.

However, it is important to keep in mind that pitchers like this do not deter stolen bases; they deter stolen base attempts, and stolen base attempts end in both positive and negative results for both sides. In 2012, there were 3,229 successful stolen bases, and 1,136 caught stealing, for a success rate of 74.0 percent. The break-even success rate in 2012, based on the frequency of RE24 states during stolen base attempts, and the value of stolen bases and caught-stealings, was about 74.7 percent. In other words, major league teams in aggregate stole bases in 2012 at a success rate very close to break-even, meaning the overall run value from stolen base attempts is near zero.

If the average run value of a stolen base attempt is zero, then there is no value, positive or negative, in deterring attempts, on average. A pitcher who generally discourages attempts will allow fewer stolen bases, but he will also benefit from fewer caught stealings, and the net value will be essentially zero. Therefore, no value is attributed to a pitcher for stolen base attempts that do not occur.
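
A quick back-of-the-envelope check of the near-zero claim is sketched below. It applies the 2012 league totals to the zero-out values for a steal of second. That is a simplifying assumption, since the 74.7 percent break-even figure weights all base-out states, so the result is only indicative.

```python
stolen_bases = 3229
caught_stealing = 1136
success_rate = stolen_bases / (stolen_bases + caught_stealing)

# Simplifying assumption: value every attempt as a zero-out steal of second.
SB_VALUE, CS_VALUE = 0.215, -0.595
avg_value = success_rate * SB_VALUE + (1 - success_rate) * CS_VALUE

print(f"2012 success rate: {success_rate:.1%}")
print(f"Average run value per attempt: {avg_value:+.4f} runs")
```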

Future considerations

There are lots of areas where this stolen base attempt algorithm can be expanded. First of all, the performance values of the participating players can be subdivided, to provide additional insights on specific aspects of their play.

  • The runner’s performance value can be divided into lead, jump, run, and slide.
  • The pitcher’s performance value can be divided into release time, pitch time/speed and handedness (as it pertains to delaying the runner’s jump)
  • The catcher’s performance value can be divided into exchange/release time, throw accuracy and throw power

We discussed earlier that deterrence of steal attempts, such as might come from a pitcher having a very deceptive pitching motion, would not be assigned value, based on the similarity of the break-even rate and the actual success rate. However, a deceptive motion may not always completely deter attempts; it may instead hamper them, as measured by a shorter lead allowed, and/or a slower jump allowed. Future elaboration of the stolen base algorithm may include allotting a portion of the responsibility (run credit) for the runner’s lead and jump to the pitcher, which should allow better modeling of pitchers with deceptive deliveries.

Some other situations that were excluded from this discussion of the basic algorithm can be covered in the future. For example, stolen base attempts at third base, double steals and steals of second with a runner on third who stays put each have their own algorithms. Stolen base attempts where the pitch is off-target and not caught cleanly by the catcher can be considered. Wild throws, and the value added (or lost) by the fielder at the play base can be considered.

There is a lot to consider when diving deep on valuation of player performances; we are only at the very beginning.


9 Comments
Northern Rebel
10 years ago

I’m a stat freak, but I admit you have overwhelmed me with info!
Bill James purported that in order for base stealing to be beneficial, the baserunner has to succeed 2 out of 3 times.
Obviously the timing and circumstances of these attempted thefts are important.
Not to sidetrack the conversation, I would love to see a study of baserunning overall. Many speedsters are terrible baserunners!
Being a Sawx fan, I remember Tony Armas, not a whirlwind by any stretch, was one of the smartest baserunners you would ever find. He seemed to know when to go from first to third, and when to tag and go after a fly ball, etc. I suppose it’s instinct, and teaching can only go so far.

Peter Jensen
10 years ago

Greg – I think this is a good first attempt to divide responsibility for stolen bases.  I am concerned about your arguments about deterrence however.  There is a huge amount of deterrence in just being a left handed pitcher.  Using 2009 to 2011 Retrosheet data the rate of stolen base attempts including pickoffs with first base only occupied is 35% greater for right handed pitchers than for left handers.  And the SB success rate (again including pickoffs) is 67.6% against right handers and 59.4% against lefties.

I also think you have to consider the importance of pitchouts in the caught stealing rate.  Also, break even points change dramatically by ball/strike count and batting order position. 

Finally, I think you need to construct a PDF table for catcher times as well.  You mention the rare case where the runner’s value and the pitcher’s value may exceed the SB value, resulting in a negative value for the catcher.  However, the problem is greater than that.  Even if the pitcher’s time to home plate does not exceed the runner’s time from first to second, the difference between the two might be small enough that a catcher might not be able to physically make the throw to second no matter how good he is.  The catcher should not be considered to have any responsibility in these cases.

Greg Rybarczyk
10 years ago

Good comments, Peter, thanks.

You will notice that I talked about working more on deterrence in the “Future Considerations” section.  I’m not satisfied with entirely discounting it, but for this basic algorithm, I think it’s reasonable to do so.

You no doubt recognize that deterrence is not always a good thing for the pitcher.  If a pitcher is quick to home, or deceptive enough to induce bad jumps, such that SB attempts against him succeed at less than the break-even point, it would be a bad thing to deter all attempts, since such attempts would favor the pitcher in terms of average run value.  Deterring excellent base stealers, however, is a good thing.  So, a tricky pitcher deters mediocre base stealers (bad for the pitcher), but also deters excellent base stealers (good for the pitcher).  Lots more work to do here to capture all of this, which I hope to share more of soon.

I don’t favor lumping pickoffs and stolen base attempts together.  Obviously they’re related, but they can and should be handled separately.  The practice of labeling some pickoffs as caught stealing muddies the water, and is something I hope to be able to illuminate in a better way.

Pitchouts?  They are already picked up by this algorithm in the (presumably) quicker delivery time to home plate.

BEP by count and/or batting order?  I could see adding that in the next level version of this…

Re the catcher time, I think if you run the numbers (I did), you will see that the exception I described (capping pitcher +run value to keep the catcher at zero) covers this.  Your point is valid, but it’s covered.

Thanks for taking the time to provide your thoughts, they are much appreciated…

Sean
10 years ago

Also cross-posted at Tango’s blog:

Greg, I really like the framework, though my personal experience recording times makes me a bit skeptical of these particular success/fail curves. Four seconds for a runner is a lot of time to still have a greater than 50% chance of stealing the base. Many of the times I personally record are in the 3.4-3.5 range, with the game’s truly fast guys in the 3.3-3.35 range.

Let’s even consider this a different way: If a pitcher from the stretch is somewhere in the 1.3 (slide step) to 1.5 range, and a big league catcher should have a pop time around 1.8-2.0+, our success/fail inflection point should be somewhere closer to 3.5, give or take a tenth of a second or two.

My guess is that the difference is partially accounted for by errant throws, which do not get errors unless the runner takes an extra base, or pitches that hit the dirt. The latter was somewhat controlled for by Greg by removing PB/WP, but again, that typically doesn’t get scored if the runner was going on the pitch and only stole the one base. Also, as Tango mentions, the issue about pitch types and location.

So overall, the success/fail curves may be skewed due to including errant throws and pitches in the dirt, giving (slow) runners too much credit because of the lack of granularity. Or maybe it’s just differences in stopwatch accuracy, though this is a pretty big difference. Greg, maybe you can clarify how these times were recorded?

Peter Jensen
10 years ago

Greg – The value of pitchouts will not be in the quicker time to home plate, presumably a pitcher cannot throw a pitchout faster than his normal fastball, but in the catcher’s faster catch and release time to second.

I think you will find that the numbers for pitchers with good moves to first are going to parallel the numbers that I quoted for left handers versus right handers.  That is that deterrence (fewer stolen base attempts) and a lower success rate on attempts will go hand in hand.

You are already lumping CS and PO together because as you note above some of what is labeled CS are POs where the runner tried to reach 2nd instead of going back to first.  What you don’t mention is “some” is over 50% of the CS of second base for left handed pitchers.

Peter Jensen
10 years ago

Greg – I also think that you may have some selection bias problems.  If a pitcher is able to get the ball to the catcher in 1.20 seconds and still have 40% of the attempted steals successful then he either has a really bad catcher or only very fast base runners are attempting to steal on him.

Greg Rybarczyk
10 years ago

Sean,

  I went in with some preconceived notions about the shape of those curves as well, and what I found wasn’t what I expected.

  I’d definitely encourage you to pull up MLB’s video archive, search on key word stolen base, and time a bunch of attempts that fit my criteria (i.e. 2nd base, no other runners, no wild pitch, etc.)  You’ll understand how fast runners frequently get nailed and slow runners frequently reach safely a lot better if you do this yourself…

Sean
10 years ago

Will do, Greg. Thanks for the research!

Gary Bayer
10 years ago

Greg- I would have liked you to have shown the BEP for all of the potential starting states.  If I did the math right, below is what results. The surprise to me was that the requisite success rate for stealing home with two outs is less than 50%. On the other hand, perhaps the 2/3rds rule seems to be more like 3/4s.

Initial outs   Initial bases   Attempt   BEP
0              0               NA
0              1               1-2       73.5%
0              2               2-3       77.5%
0              3               3-h       85.2%
0              12              2-3       79.8%
0              13              1-2       78.3%
0              13              3-h       86.6%
0              23              3-h       87.3%
0              123             3-h       88.3%
1              0               NA
1              1               1-2       74.2%
1              2               2-3       69.5%
1              3               3-h       68.6%
1              12              2-3       73.8%
1              13              1-2       84.5%
1              13              3-h       71.6%
1              23              3-h       72.7%
1              123             3-h       75.0%
2              0               NA
2              1               1-2       69.3%
2              2               2-3       87.9%
2              3               3-h       33.0%
2              12              2-3       90.7%
2              13              1-2       83.3%
2              13              3-h       39.6%
2              23              3-h       44.0%
2              123             3-h       48.8%