So, who’s in charge here? In 2004, Yadier Molina made his major league debut in the same season as Chris Carpenter’s Cardinals debut, the beginning of one of the most successful batteries in recent baseball history. In the decade since, Carpenter has owned a 65 percent caught stealing rate; during the same time, Molina has caught roughly 40 percent of would-be base-stealers. While Carpenter had a quick move to the plate and excellent control, many would argue that it was Molina’s presence behind the plate that led to his success (in all facets of the game). But could it be the other way around? Did Molina benefit from Carpenter’s presence on the mound? If so, how much?
As with most interactions in baseball analysis, there is no clear way to distribute credit for the outcome. How the value of a caught stealing should be divided among the three parties involved (the pitcher, catcher and baserunner) is much debated. How much of a caught stealing should we grant to the pitcher on the mound? Does the credit lean toward the pitcher or the catcher? However unclear the picture looks on the surface, the answer depends on numerous, closely related factors.
Say Carpenter (if he were still active) is on the mound and the Cardinals manage to put away Billy Hamilton on an attempted steal of second. In this situation, we assume that Molina is the main cause, because his talent at throwing runners out is well-known, tested, and consistent across many different battery mates. In singular cases like this, it is easy to judge which party is at fault or deserves credit: Molina is one of the few guys with the ability to throw out Hamilton. Whether through scouting information (pop-times, move-to-mitt, the runner’s jump) or play-by-play data, we know Molina has the tangibles and the record to pull it off.
The Molina-Hamilton showdown is the most extreme example I could think up. But imagine the normal spectrum where average pitcher meets average catcher meets average baserunner. There, it isn’t quite as easy to determine credit objectively. This is usually where we use modeling to make sense of it.
Where Baseball Meets Basketball Analytics
Most of my research revolves around the battery dynamic — as I call it “The Battery Effect” — or the idea that there are so many confounding variables in a single battery interaction (a wild pitch or a stolen base attempt) that it’s wrong to assign all credit to one party.
For some reason, baseball decided to charge all caught steals and passed balls to the catcher and all wild pitches to the pitcher, instead of dividing responsibility “n-times” among “n-parties”. My previous research and the correlations suggest that pitchers “control” almost every battery outcome, but as you’ve heard before, correlation does not always equal causation. So I’ve been searching for a better way to adjust for “The Battery Effect” and assign credit more responsibly between the pitcher and catcher.
Nik Oza, founder of the Georgetown Sports Analysis, Business and Research Group, turned my attention to a popular basketball analytics model, Regularized Adjusted Plus-Minus (RAPM). RAPM adjusts each player’s plus-minus (the cumulative sum of score margin while the player is on the court) to reflect the quality of his teammates and opposition, while accounting for home court advantage. Without RAPM, plus-minus (+/-) suffers from many of the same problems we observe in the battery setting, like collinearity between teammates and lineups. These problems are inherent in any team sport, where the line between one player’s contributions and another’s influence is blurred. While baseball for the most part is a collection of individual contributions, perhaps no other situation is more similar to that of a team sport than the exchanges among the pitcher, the catcher and the baserunner, so a RAPM framework in this setting makes sense. The question is: who affects whom? What are each player’s true contributions, independent of his teammates?
Note: Feel free to skip the next section if you’re not interested in the mathematical details.
Introducing Regularized Adjusted Plus-Minus for Battery-Related Outcomes
The idea behind any RAPM framework, like the one suggested by Nik Oza, is a ridge regression, which penalizes coefficients and shrinks them towards zero. The penalty factor is known as lambda; the lambda that minimizes the model’s prediction error is chosen, and it can be computed in a program like R. The ridge regression can adjust for collinearity among variables and decrease the variance of the predictions at the cost of introducing some bias.
Say that pitcher A has pitched only with catcher B behind the plate, or that a majority of pitcher C’s innings have come with catcher D. The ridge regression does a good job of adjusting for the relationship between the two battery mates. With that in mind, each pitcher, catcher and baserunner becomes an independent variable in our model, with their respective coefficients reflecting the impact of their presence on a battery outcome. However, because the ridge regresses towards zero, in the future I would like the model to take a more Bayesian approach by regressing towards some prior (determined by pitcher handedness), with a minor league prior for rookies. This can be achieved without a RAPM framework: In the past, Jared Cross, of Steamer Projections, has used a Mixed Effects Model to assign credit among battery mates while adjusting for batting order.
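To make the shrinkage idea concrete, here is a minimal sketch of ridge regression in plain Python. It is not the glmnet fit used for this article (glmnet scales lambda and standardizes inputs differently); the function name `ridge_fit`, the toy data and the lambda values are all assumptions chosen for illustration. The toy case is a pitcher and catcher who only ever appear together, which ordinary least squares cannot separate.

```python
def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X'X + lam*I) b = X'y by Gauss-Jordan elimination."""
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X + lam*I | X'y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) + (lam if i == j else 0.0)
         for j in range(p)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data: columns are (pitcher A, catcher B), always paired.
# y is the outcome relative to league average on each opportunity.
X = [[1.0, 1.0]] * 4
y = [0.2, 0.2, 0.2, 0.2]

coef_small = ridge_fit(X, y, lam=1.0)   # mild penalty
coef_large = ridge_fit(X, y, lam=10.0)  # heavier penalty, more shrinkage
print(coef_small, coef_large)
```

With no penalty the system is singular, since the two columns are identical. With a penalty, the credit is split evenly between the two players and pulled towards zero, more strongly as lambda grows.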
Many other variables could affect a stolen base attempt, in addition to the battery or baserunner:
- Score margin
- Wild pitch
- Passed ball
- Pitcher handedness
- Hitter handedness
I added those factors to the model and pulled all opportunities from the last three years of Retrosheet play-by-play data. Each line of my data is one opportunity: a runner on first or second with no lead runner ahead. I then created dummy variables for every baserunner, pitcher and catcher. With the other factors above, I plugged and chugged these numbers through the R package “glmnet”, adopting and amending Jacob Frankel’s code for calculating RAPM (see References). (My code can be found here for replication’s sake.)
The quantity I sought to estimate was caught steals per opportunity (what we will call adjCS+/-). I then applied a similar framework to stolen base attempts, to get adjSBA+/-. We don’t want stolen base attempts in the denominator of our response variable, because we don’t want to assume the same frequency of attempts for all parties involved. We know that attempting a stolen base against one catcher/pitcher tandem is very different from attempting one against another. For this reason, I grouped by opportunity, or the number of times a baserunner was in a selected base state against the battery. I included only base states where the runner could steal the base ahead of him because no one was directly in front of him on the basepaths.
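The dummy-variable step can be sketched as follows. This is an illustration only, not the code linked above: the player IDs, the `cs` field and the function name `build_design_matrix` are hypothetical, and the extra covariates (score margin, handedness and so on) are left out for brevity.

```python
# Hypothetical opportunities: one row per runner-on-base state.
opportunities = [
    {"pitcher": "P1", "catcher": "C1", "runner": "R1", "cs": 1},
    {"pitcher": "P1", "catcher": "C1", "runner": "R2", "cs": 0},
    {"pitcher": "P2", "catcher": "C1", "runner": "R1", "cs": 0},
]


def build_design_matrix(rows, roles=("pitcher", "catcher", "runner")):
    """Return (columns, X, y) with one 0/1 indicator column per (role, player)."""
    columns = sorted({(role, row[role]) for row in rows for role in roles})
    index = {col: i for i, col in enumerate(columns)}
    X = []
    for row in rows:
        x = [0] * len(columns)
        for role in roles:
            x[index[(role, row[role])]] = 1  # player was involved in this opportunity
        X.append(x)
    y = [row["cs"] for row in rows]  # response: caught stealing on this opportunity
    return columns, X, y


columns, X, y = build_design_matrix(opportunities)
for col, values in zip(columns, zip(*X)):
    print(col, values)
```

Keying columns by (role, player) keeps a player who appears in more than one role from being merged into a single column. Each row has exactly three ones, one for each party in the opportunity.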
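A small worked example shows why the denominator matters. The two batteries and their counts below are made up for illustration, and the function name `rates` is an assumption.

```python
def rates(opportunities, attempts, caught):
    """Per-attempt and per-opportunity rates for one battery."""
    return {
        "cs_per_attempt": caught / attempts,
        "cs_per_opportunity": caught / opportunities,
        "sba_per_opportunity": attempts / opportunities,
    }


# Hypothetical batteries facing the same number of opportunities.
battery_a = rates(opportunities=100, attempts=20, caught=8)  # runners go often
battery_b = rates(opportunities=100, attempts=5, caught=2)   # runners rarely go

for name, r in (("A", battery_a), ("B", battery_b)):
    print(name, r)
```

Both batteries throw out the same share of the runners who go, so a per-attempt rate cannot tell them apart. The per-opportunity rates differ because runners test battery A four times as often, which is the information the attempt-based denominator would hide.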
Back to the Carpenter dilemma. As we mentioned, Carpenter did have immense success keeping runners from stealing, at 0.25 caught steals per stolen base attempt above average. However, when converting his plus-minus into an expected caught stealing percentage (xCS%), we see a large gap between his observed success and his adjusted performance. The difference between his CS% and his xCS% is 17 percentage points (65 percent versus 48 percent). So when adjusting for the quality of his backstop (mostly Molina), his adjusted performance nets three fewer runs than his career track record suggests. (Note: to derive xCS% I used caught stealings divided by attempts as the response variable; the rest of the article uses caught stealings divided by opportunities to keep it consistent with SBA+/-.)
Oh yeah, that’s three runs over a decade. So even when observing the most extreme drop-off between unadjusted CS% and expected CS% for a pitcher, we don’t see even half a win as a consequence. Simply put, the pitcher does not accumulate enough stolen base attempts against him (unless his reputation is awful) for his expected performance to mean much. While a pitcher may “control” most battery outcomes, it’s not what they are sought out or selected to do, and for good reason. Meanwhile, some catchers and their pitchers are not as far off in contribution as the eye would suggest.
The opposite is true of a catcher. He accumulates many more attempts against him by virtue of his job: to sit behind the plate for a thousand-plus innings a year. Still, a catcher’s value is not only in his arm but in his ability to reduce attempts through his reputation; the same is true for certain feared pitchers. So it is also important to introduce a statistic that objectively defines a player’s reputation, or the effective number of stolen base attempts added or subtracted per opportunity (adjSBA+/-).
Overall, both adjSBA+/- and adjCS+/- predicted next-season performance less well than the previous season’s caught stealing percentage did. I didn’t expect this method to be predictive in the first place. My reasoning is that if this is an independent measure of pitcher/catcher defensive contribution, it strips out the lingering influences of last year’s similar battery and similar environment, which are exactly what make caught stealing percentage predictable from year to year without reflecting “defensive skill.”
However, I expected that for team switchers, the independent evaluation metrics would overtake caught stealing percentage in predictability. To test this, I took all team-switchers from 2003-2013 and regressed year-two caught stealing percentage on year-one adjCS+/- and adjSBA+/-. Only 40 catchers switched teams from 2003-2013 with at least 500 opportunities with runners on the base paths. The year-to-year relationship between their caught stealing percentages was a mere 2 percent, compared with 8 percent between year-one adjCS+/- and year-two CS%. So, in all honesty, this is not a predictive tool to begin with. Descriptively, adjSBA+/- could explain 70 percent of the variation in a pitcher’s CSruns in the same year, but there were few other relationships of interest. Instead, I find it more interesting as a descriptive tool that tells us who affected the battery’s performance overall, which we will come back to later.
Below are the 2011-2013 numbers. (Note: Retrosheet removes pickoffs from caught stealings, so caught stealing percentage is without pickoffs while the regression included them. See Mark Buehrle for the discrepancy between the two.*)
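For readers who want to run this kind of check themselves, here is a minimal sketch of the share of variation explained by a one-variable regression. It assumes the percentages quoted above are R-squared values from simple linear regressions, which the text does not state outright. The data are made up and the function name `r_squared` is an assumption.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a simple linear regression on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # For one predictor, R-squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical team-switchers: year-one metric versus year-two CS%.
year1_adj_cs = [0.004, -0.002, 0.001, 0.006, -0.003]
year2_cs_pct = [0.31, 0.24, 0.30, 0.27, 0.26]
print(round(r_squared(year1_adj_cs, year2_cs_pct), 3))
```

Running the same function with year-one CS% as the predictor gives the comparison described above. With only 40 qualifying catchers, either figure comes with wide uncertainty.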
Of course, the “Runsadded” is just an estimate, mostly because of a rough estimation of SBA value — which can be debated in the comment section.
Here I took adjCS+/- and adjSBA+/- and factored in the number of opportunities each catcher/pitcher had (that is, runners in our selected base states). Multiplying these metrics by opportunities gives us a rough aggregate of their contribution (CSadded and SBAadded). The rough assumption in turning this into runs is the value of an SBA added or subtracted. In general, the fewer stolen bases the better, since we are not testing the probability of a success or failure. So I took the difference between SBAadded and CSadded and multiplied it by the value of a stolen base. Say that a player added two caught stealings and -50 stolen base attempts. Then, technically, he added -52 stolen bases. I know this is a rough way of assigning a run value, but it will do for now until we can objectively define the value of a stolen base attempt added or subtracted, which I would like to be related to individual catcher/pitcher break-even points, much like this framework.
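The arithmetic above can be sketched in a few lines. The run value of a stolen base (0.2 here) is a placeholder assumption, not the figure used for the tables, and the sign convention (negative means fewer runs for the offense) is also an assumption. The function name `runs_added` and the rates are hypothetical, chosen to reproduce the two-caught-stealings, minus-50-attempts example.

```python
def runs_added(adj_cs, adj_sba, opportunities, sb_run_value):
    """Turn per-opportunity plus-minus rates into counting totals and a run estimate."""
    cs_added = adj_cs * opportunities    # CSadded
    sba_added = adj_sba * opportunities  # SBAadded
    sb_added = sba_added - cs_added      # net stolen bases added
    return cs_added, sba_added, sb_added, sb_added * sb_run_value


# Hypothetical player: 1,000 opportunities, +0.002 CS and -0.05 SBA per opportunity.
cs_added, sba_added, sb_added, runs = runs_added(
    adj_cs=0.002,
    adj_sba=-0.05,
    opportunities=1000,
    sb_run_value=0.2,  # placeholder assumption for the value of a stolen base
)
print(cs_added, sba_added, sb_added, runs)
```

Because the run value enters as a single multiplier, swapping in a different estimate, or a player-specific break-even value later, changes only that one argument.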
|Paul Lo Duca||14||87||-12.8||108||468||4980||23%|
These numbers are on a counting basis, so what about on a pure rate statistic level?
Below are the leader boards from 2011-2012:
|Best and Worst Pitcher Reputation|
|Best and Worst Pitcher Caught Steals Added|
|Best and Worst Catcher Reputation|
|Best and Worst Catcher Caught Steals Added|
In the Scope of the Battery
As my previous research has shown, a pitcher’s adjusted CS performance correlates with the battery’s CS performance at a 1.5-to-1 ratio relative to the catcher’s past CS performance. With this in mind, I also used our proxy of reputation to see which battery mate had more of an impact on the number of stolen base attempts that took place under their watch.
When comparing CS+/- and SBA+/- with the battery numbers the following was found:
|Comparing CS+/- & SBA+/- With Battery Numbers|
The interesting note here is that it is the pitcher “reputation proxy” (adjSBA+/-) that correlates best with the actual SBA% of the battery — and the margin is not even close. For anyone who thinks that a base runner steals off the pitcher, this is more evidence in their favor.
When I return to this topic, I’d like to see Bayesian priors for both pitchers and catchers. For pitchers these can be based on handedness, and we have plenty of minor league catching data to build from. I think including them will improve the predictive value of this framework.
Meanwhile, I’d like to see dummy variables connected to pitch location, and swing/no swing. These factors have an effect on caught stealing percentage. On this front, pickoffs would need to be removed — which would require a link between PITCHf/x and Retrosheet.
References & Resources
- Thanks to Jared Cross, for his help with R Code and Method, and to Nik Oza, for his consult and his idea to adapt RAPM for the running game.
- Greg Rybarczyk, The Hardball Times, Stolen base attempts: an algorithm for allocating run value
- Jacob Frankel, Hickory High, How To Calculate RAPM
- Tangotiger, Evaluating Catchers