Base Stealer Intangibles (Part 2)
by John Walsh
February 06, 2006
Do Base Stealers Disrupt the Pitcher When on First Base?
This is the question I first asked in Part 1 of this study. I set out to look at what batters do when a top base stealer is on first and compare that to what the same batters do in all situations. Before actually looking at those numbers, though, I considered how batting performance is affected by defensive positioning when there is a runner on first. In that case, I found (not surprisingly) that a lot more ground balls get through the infield; most of them are due to the first baseman having to hold the runner on, but a significant number are also due to the middle infielders playing for the double play when there are fewer than two outs. This effect has to be accounted for when trying to see if the best base stealers have a disruptive effect on the pitcher.
I identified the top 10 base stealers during the period 2003-2005 (I call them the Stealers) and considered all plate appearances with those runners on first base. The raw numbers show an overall improvement in hitting when there is a Stealer on first base (S1B for short). Batting average went up 25 points, although the gains in OBP (7 points) and SLG (7 points) were more modest. Still, batters created more runs per game (RC27 = 6.2) when there was a Stealer on first base than they did in all situations (RC27 = 5.7). As promised last time, we now try to determine how much of that improvement is due to defensive positioning and how much due to disruption of the pitcher.
Taking Defense Into Account
In a previous article, I showed how one can account for different defensive alignments when evaluating batting performance in a specific situation. The key point is that you have to consider batting elements that only depend on the batter-pitcher match up, but not on the defense. Most readers are familiar with using only strikeouts, walks and home runs to evaluate pitchers; these are defense-independent stats. To evaluate hitters, I add the number of the different batted-ball types, or trajectories: fly balls, ground balls, line drives and pop-ups, abbreviated with F, G, L and P, respectively.
To translate the trajectories into outcomes, we use a table of probabilities, called the Hit-Trajectory (HT) Matrix, calculated from the play-by-play data. For example, we find from the play-by-play data that, on average, a ground ball will result in an out 77% of the time. Furthermore, its chances of becoming a single, double or triple are 21%, 2% and 0.1%, respectively. Here is the full HT Matrix, determined from the 2003-2005 play-by-play data using all situations:
Average HT Matrix:

       Out     Single  Double  Triple  HR
F      0.733   0.056   0.080   0.012   0.119
G      0.767   0.213   0.019   0.001   0.000
L      0.265   0.519   0.176   0.015   0.025
P      0.981   0.015   0.003   0.000   0.000

Given this matrix, we can convert the batted-ball types into outs and hits, giving an estimate of what the runner on first base (R1B) performance would have been against a normal defensive alignment.
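As a rough illustration of that conversion, the sketch below multiplies a set of batted-ball counts by the average HT Matrix to get expected outs and hits. The matrix values are the ones quoted above; the batted-ball counts and the function name are made up for the example and are not taken from the study.

```python
# Average HT Matrix from the article (2003-2005, all situations).
# Rows: F = fly ball, G = ground ball, L = line drive, P = pop-up.
OUTCOMES = ("out", "single", "double", "triple", "hr")
HT_AVERAGE = {
    "F": (0.733, 0.056, 0.080, 0.012, 0.119),
    "G": (0.767, 0.213, 0.019, 0.001, 0.000),
    "L": (0.265, 0.519, 0.176, 0.015, 0.025),
    "P": (0.981, 0.015, 0.003, 0.000, 0.000),
}


def expected_outcomes(batted_balls, matrix=HT_AVERAGE):
    """Convert counts of batted-ball types into expected outs and hits."""
    totals = dict.fromkeys(OUTCOMES, 0.0)
    for trajectory, count in batted_balls.items():
        for outcome, prob in zip(OUTCOMES, matrix[trajectory]):
            totals[outcome] += count * prob
    return totals


# Hypothetical batted-ball counts, for illustration only.
sample = {"F": 300, "G": 450, "L": 200, "P": 50}
result = expected_outcomes(sample)
for outcome in OUTCOMES:
    print(f"{outcome:7s} {result[outcome]:7.1f}")
```

Swapping in a different matrix (for example, one computed for a specific group of batters) only requires passing it as the second argument.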
Before we do that, however, there is one more adjustment to make. THT's own Dave Studeman pointed out in a recent article that not all batted-ball types are created equal. For example, a single fly ball by Barry Bonds is worth about 0.4 more runs than a fly ball by Einar Diaz (this is pretty amazing, when you think about it). Anyway, we have already seen that our sample of batters is not "average," and therefore we cannot use the HT Matrix calculated using all batters. So, I have determined a custom HT Matrix for the batters in this study (refer to Part 1 for a list of top base stealers and the batters that bat behind them), which I show here:
Custom HT Matrix:

       Out     Single  Double  Triple  HR
F      0.726   0.063   0.081   0.015   0.115
G      0.754   0.225   0.020   0.001   0.000
L      0.267   0.518   0.173   0.016   0.024
P      0.974   0.020   0.006   0.000   0.000

Note that these are the hit (and out) probabilities for the batters in our sample in all situations, not in runner-on-first situations. This will enable us to translate the performance of those hitters from runner-on-first situations to generic ones. The numbers in the custom HT Matrix are only slightly different than the average one, but the differences are significant. Specifically, the batters in our sample have higher hit probabilities for flyball, groundball and pop-up trajectories.
OK, now we can translate the Stealer-on-first base (S1B) offensive line into a defense-independent (DI) performance: home runs, strikeouts and walks remain unchanged in the translation and the batted balls are converted into hits and outs using the HT matrix.
Defense Independent Performance
Let's go right to the numbers:
          AB     H     2B    3B    HR    BB    K
All:      3286   904   180   20    105   381   593
R1B-DI:   3385   968   198   23    94    303   502

The first line was shown in Part 1 and now I've substituted the runner-on-first line with its defense independent translation. We can see that the number of hits has been reduced by the translation to a defense independent context, as expected. Curiously, doubles and triples are slightly higher in the defense independent line. Here is the rest of the offensive line:
          AVG     OBP     SLG     RC    OUTS   RC27
All:      0.275   0.351   0.438   504   2381   5.71
R1B-DI:   0.286   0.345   0.441   515   2416   5.76

So, there are some differences between the "All" and "DI" rows, which we might expect given that the strikeout, walk and home run numbers, which aren't affected by the defense independent translation, were quite different in the two cases. Taking these numbers at face value, there seems to be a very slight improvement in batter performance when there is a Stealer on first base: 5.76 versus 5.71 in RC27, or about 0.05 runs per game.
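The article does not spell out the formulas behind these rate stats. As a sketch, assuming the basic Runs Created formula and an OBP that ignores hit batsmen and sacrifice flies, the defense-independent row can be reproduced to within rounding from its counting line. The function name is made up for the example, and the outs figure is taken directly from the table rather than derived.

```python
def batting_line(ab, h, doubles, triples, hr, bb, outs):
    """Rate stats from a counting line (HBP and sacrifice flies ignored)."""
    total_bases = h + doubles + 2 * triples + 3 * hr
    avg = h / ab
    obp = (h + bb) / (ab + bb)
    slg = total_bases / ab
    rc = (h + bb) * total_bases / (ab + bb)  # basic Runs Created (assumed)
    rc27 = 27 * rc / outs                    # runs created per 27 outs
    return {"AVG": avg, "OBP": obp, "SLG": slg, "RC": rc, "RC27": rc27}


# Defense-independent line from the tables above.
di = batting_line(ab=3385, h=968, doubles=198, triples=23, hr=94, bb=303, outs=2416)
print({name: round(value, 3) for name, value in di.items()})
```

The same assumptions do not reproduce every row in the article exactly (the OBP of the "All" row comes out a point lower, for instance), so the published numbers presumably include some details that are not listed in the tables.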
Controlling for Pitcher
One thing that I haven't considered yet is the quality of the pitching. There is reason to believe that the inherent quality of the pitching with a Stealer (or Runner) on first base is below average. The reasoning is that the worse pitchers will put more runners on and will tend to pitch more often with a runner on first base than an average pitcher will. I checked this hypothesis in the following way: I made a list of all pitchers (and the corresponding number of batters faced) for the Stealer-on-first base sample. I then looked at how those pitchers did in all situations and compared them to the average pitcher. I weighted the contribution of each pitcher in the Stealer-on-first base sample appropriately using the batters faced. Here are the results:
                 AVG    OBP    SLG    OPS  | RC27
All pitchers:    .264   .335   .423   .758 | 5.16
S1B pitchers:    .264   .330   .420   .749 | 5.04

So, it appears that, contrary to expectations, pitchers in Stealer-on-first situations are actually a little better than average. This means that the small "disruption" value of 0.05 runs per game needs to be adjusted upward. I think a reasonable way to do this is simply to add the difference found here (0.12 runs) to the "disruption" value found earlier (0.05 runs) to get 0.17 runs/game of disruption. This is probably not mathematically rigorous, but it's likely good enough for our purposes.
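The weighting and the adjustment can be sketched as follows. The per-pitcher numbers are hypothetical placeholders (the article does not list them), and using RC27 as the quantity being averaged is an assumption; only the 5.16, 5.04 and 0.05 figures come from the text.

```python
def weighted_rc27(pitchers):
    """Batters-faced-weighted average of each pitcher's all-situations RC27."""
    total_bf = sum(bf for bf, _ in pitchers)
    return sum(bf * rc27 for bf, rc27 in pitchers) / total_bf


# Hypothetical (batters faced in the S1B sample, all-situations RC27) pairs.
sample_pitchers = [(40, 4.2), (25, 5.5), (10, 6.1)]
print(round(weighted_rc27(sample_pitchers), 2))

# Adjustment using the figures quoted in the article.
all_pitchers_rc27 = 5.16
s1b_pitchers_rc27 = 5.04
raw_disruption = 0.05
adjusted = raw_disruption + (all_pitchers_rc27 - s1b_pitchers_rc27)
print(round(adjusted, 2))
```

A pitcher who faced many batters in the sample pulls the average toward his own level, which is the point of weighting by batters faced.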
Any Runner on First Base
The above analysis shows a small effect of "disruption" of the pitcher by a Stealer. But, to get to that conclusion, I had to do some complicated stuff, such as calculating a customized HT matrix, translating batted ball types into a defense independent context and controlling for pitching quality. Maybe there's an easier way: how about considering the case where any runner (Runners), not just a Stealer, is on first base (with second base open)? Presumably the important features of the defensive alignment do not change much depending on who the runner on first is. (Some very slow runners are not held on, especially when the pitching team is ahead, but I don't think this will change the results much.) Now, we cannot simply compare the batting line with Stealers on first to the batting line with Runners on first, since we've already seen that this would lead to the selection bias discussed above. However, it's fair to compare the improvement in batting that occurs with Runners and with Stealers on first base. Let's look at the Stealers again (note, I am not making the Defense-Independent translation here):
Batting Performance with Stealers on 1B

        AB     H      2B    3B    HR    BB    K
All:    3286   904    180   20    105   381   593
S1B:    3355   1007   187   14    94    303   502

        AVG     OBP     SLG     RC    OUTS   RC27
All:    0.275   0.351   0.438   504   2381   5.72
S1B:    0.300   0.358   0.448   539   2348   6.19
The following is what we obtain if we consider any runner on first base, with second base open:
Batting Performance with Runners on 1B

        AB       H       2B     3B    HR     BB     K
All:    103133   27356   5597   547   3401   9997   19367
R1B:    104327   29439   5791   489   3344   8323   18154

        AVG     OBP     SLG     RC      OUTS    RC27
All:    0.265   0.337   0.429   14908   75777   5.31
R1B:    0.282   0.342   0.443   15824   74888   5.71

With a Stealer on first base, the improvement is 0.47 in runs per game (6.19 minus 5.72). When any runner is on first, the improvement is nearly as large, 0.40 runs per game (5.71 minus 5.31). So, if we assume the improvement with Runners on first base is all due to defensive alignment, i.e., the Runners as a group don't disrupt, then we may conclude that the Stealers provide an additional 0.07 runs per game by "disruption". There is no need to make a correction for quality of pitcher in this case, since the pitcher quality for the S1B and R1B samples is identical. The value found here (0.07 runs of "disruption") is less than but fairly close to the value we found with the Defense Independent analysis with control for pitchers (0.17 runs).
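The comparison here is a difference of two differences, which can be written out in a few lines. The RC27 values are the ones quoted in the two tables; the dictionary keys and the function name are just labels chosen for this sketch.

```python
# RC27 values quoted in the two tables above.
stealer = {"all": 5.72, "on_first": 6.19}
runner = {"all": 5.31, "on_first": 5.71}


def improvement(line):
    """Gain in RC27 with a runner on first relative to all situations."""
    return line["on_first"] - line["all"]


stealer_gain = improvement(stealer)
runner_gain = improvement(runner)

# Whatever the Stealers add beyond the generic runner-on-first gain
# is attributed to "disruption".
disruption = stealer_gain - runner_gain
print(f"{stealer_gain:.2f} {runner_gain:.2f} {disruption:.2f}")
```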
In an attempt to make things even more straightforward, I thought it might be interesting to look at some simpler things that could indicate a pitcher is getting "rattled" by the Stealer on first: namely walks, hit batsmen and balks. If a pitcher is losing his cool out there, it stands to reason he might make more of these kinds of mistakes.
Walks have actually already been included in the above analysis, and we've seen that walk rate goes down by about 20% when there is a Stealer on first base. About half of this decrease is inherent to the sample of pitchers: the pitchers in the Stealer-on-first base sample walk about 10% fewer batters than the average pitcher. I believe that the rest of the reduction in walks in Stealer-on-first base situations is due primarily to a change in approach on the part of batters and pitchers, and is not caused by any real improvement in control in Stealer-on-first base situations. The batter wants to put the ball in play to advance the runner, and the pitcher wants to avoid putting another runner on via the walk. In any case, there is no evidence to suggest that a Stealer on first base causes the pitcher to increase his walk rate.
Hit batsmen might be a better indicator of pitcher disruption; it has nothing to do with the approach of the pitcher or batter, and it's clearly a (big) mistake on the part of the pitcher (except in beanball wars, but I am neglecting those here). When you look at the data you find that slightly more batters are hit by pitches when a Stealer is on first base. In the sample studied, 42 batters were hit by pitches, while the expected number was 35. However, since such a small number of batters are hit, it's necessary to perform a statistical test on the result to see if there is a real effect or just a fluctuation of the data. In fact, there is a fair chance (approximately 20%) that a difference this large would arise from statistical "noise" alone, with no pitcher disruption at all. Statisticians usually require a probability of less than 0.05 to claim that a real underlying effect exists.
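The article does not say which statistical test was used. One simple possibility, shown here purely as an assumption, is to treat the number of hit batsmen as a Poisson count and ask how often 42 or more would occur when 35 are expected. The function name is made up for this sketch.

```python
from math import exp


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = exp(-expected)      # P(X = 0)
    below = 0.0
    for k in range(observed):  # accumulate P(X = 0) .. P(X = observed - 1)
        below += term
        term *= expected / (k + 1)
    return 1.0 - below


# 42 hit batsmen observed, 35 expected (figures from the article).
p_hbp = poisson_upper_tail(42, 35)
print(f"one-sided p-value: {p_hbp:.3f}")
```

This one-sided Poisson calculation need not match the article's figure exactly, since a different test or a two-sided version would give a somewhat different probability, but it leads to the same verdict: the result is well above the usual 0.05 threshold.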
What about balks? First, I looked at the overall balk rate for situations where there is a runner (any runner) on first base and second base open. I found that pitchers balked about 2.7 times per 1000 batters faced. When I look for balks with a Stealer on first base and second base open, I find about 5.5 balks per 1000 batters faced. This time, the increase is significant: the p-value is 0.0024, meaning that it's very unlikely that the increase in balks is due to statistical fluctuation.
Wrapping it All Up
The goal of this study was to confirm or refute the notion that a good base stealer disrupts the opposing pitcher/defense simply by his presence on first base. I reasoned that any disruption would show up in the performance of the batters who came to the plate with a Stealer on first base. I studied the performance of 219 batters (almost 3700 plate appearances) when one of the top 10 base stealers was on first base (with second base open) and compared that to what would be expected from those particular hitters. I then used a custom hit-trajectory matrix to convert the Stealer-on-first base performance into a defense-independent context that can be compared to the generic case. Finally, a small correction for pitcher quality was found to be necessary.
The results obtained indicate a small effect of disruption, amounting to about 0.17 in RC27, for the Stealer on first base situation. An independent cross-check was made considering what happens with any runner on first and again a very small effect of disruption was found (0.07 runs per game for that check). Finally, I found that pitchers hit a few more batters than expected when a Stealer was on first, although the effect was not statistically significant. It could be real, but we can't say for certain. Pitchers do commit more balks with a Stealer on first base.
Assuming these small effects are real, how much are these base stealer intangibles worth over a season? A typical Stealer is on first base for 180-220 plate appearances per season. That corresponds to about 128 outs (for this group), which adds up to 4.75 games. Assuming the improvement due to disruption is 0.17 runs per game, this gives a measly 0.8 runs over the whole season. Let's throw in an extra balk and a half-HBP (I didn't include the HBPs in the OBP calculation for simplicity) and we get an additional 0.5 runs (more or less), for a grand total of about 1.3 extra runs a year.
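The season-long arithmetic is easy to check. The sketch below uses only the figures quoted above (128 outs, 0.17 runs per game, 0.5 runs for the extra balk and half-HBP); the function and variable names are made up for the example.

```python
OUTS_PER_GAME = 27


def season_runs(outs_with_stealer_on_first, disruption_rc27):
    """Extra runs per season from a per-27-outs disruption effect."""
    games = outs_with_stealer_on_first / OUTS_PER_GAME
    return games * disruption_rc27


from_disruption = season_runs(128, 0.17)  # figures from the article
balk_and_hbp = 0.5                        # the article's rough allowance

print(round(from_disruption, 1))
print(round(from_disruption + balk_and_hbp, 1))
```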
So, the next time you tune into the White Sox game and Hawk Harrelson is telling you that "Scotty" Podsednik, by virtue of his ability to disrupt the pitcher, is worth more than what his statistics show, well you now know he's telling you the truth. Podsednik is worth a little over one more run per season.
The above was all written and ready to go when I received an e-mail from Mitchel Lichtman (also known as "mgl" in sabermetric circles). After having read Part 1 of the study, he wrote to say that he thought my sample size of 3700 plate appearances was likely too small to draw any hard conclusions. In any case, he suggested that I quantify in a statistical way the precision of my findings (whatever they turned out to be).
Well, I hadn't worried about this too much, since 3700 plate appearances just seemed like a lot to me, but (as usual) Mitchel was right. I found that base stealers "disrupted" pitchers only to the tune of 0.17 in terms of RC27, but the uncertainty (one standard deviation) on that number is about 0.4 runs. So, what does this say about our conclusion, that "disruption" amounts to 0.17 runs per game? We can say the following:
There is an 84% chance that a Stealer on first base improves batter performance by less than 0.57 in RC27.
I know, it's much more satisfying to be able to say, the disruption effect is X runs per game, but that's life.
In terms of additional runs for a team over the course of a season, we can say that:
It's 84% likely that a base stealer adds fewer than 3.2 runs over the course of the season.
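Both 84% statements follow from a one-standard-deviation upper bound. The sketch below assumes the uncertainty is normally distributed and that the season figure includes the same 128 outs and 0.5-run balk/HBP allowance used earlier; the article does not show this arithmetic, so treat it as one plausible reconstruction.

```python
from statistics import NormalDist

estimate = 0.17  # disruption in RC27, from the article
std_dev = 0.4    # one standard deviation, from the article

# Probability that a normal variable falls below mean + 1 standard deviation.
coverage = NormalDist().cdf(1.0)
upper_rc27 = estimate + std_dev

# Convert to season runs (assumed: 128 outs per season plus 0.5 runs
# for the extra balk and half-HBP).
upper_season = upper_rc27 * 128 / 27 + 0.5

print(f"{coverage:.0%} bound: {upper_rc27:.2f} RC27, {upper_season:.1f} runs per season")
```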
I tried re-doing the analysis using the Top 20 Stealers over the last three years (instead of the Top 10) to increase the sample size. The result I found is a little more stringent: the 3.2 runs per season number goes to about two runs per season.
So, instead of saying that Podsednik adds a little over one run per season due to his "disruptive" powers, we must content ourselves with saying that he very likely adds no more than two runs per season.
References and Resources
- Mark Pankin, Do Base Stealers Help the Next Batters?. This PowerPoint presentation contains a wealth of information on the subject at hand. There is no attempt to disentangle the effects of defensive alignment, but many other aspects of the subject are covered.
- Cyril Morong, Does Base Stealing Create Havoc?. This article tries to answer exactly the same question that I've posed. It's an interesting take on the subject, written without the benefit of play-by-play data.
- The folks over at Retrosheet cannot be praised highly enough. They collect, digitize and make available play-by-play data for significant portions of baseball history. Studies like these (and many others!) would not be possible without their efforts.
- Thanks to Dave Studeman who read a preliminary version of this article and made several useful suggestions and also to MGL and other readers of Part 1 who e-mailed with comments and suggestions.
John Walsh dabbles in baseball analysis in his spare time. He welcomes questions and comments via e-mail.