Baseball managers have myriad in-game decisions – steals, bunts, hit-and-runs, pinch hitters, and bullpen strategy to name a few. With the addition of instant replay, major league managers now have to decide whether to challenge just about every call during the game.
The decision will be simple in some situations. For example, a batter being out by a mile on a ground out to lead off the game should never be challenged, and a walk-off home run that veers toward the foul pole should always be challenged. The interesting decisions will be made on the plays between those two extremes.
The decision process, and the potential for fans and media to second-guess the decision, was summed up pretty well by Tony Kornheiser on Pardon the Interruption:
You’ve got one on and one out in the first and you don’t like a call. Are you going to drop that flag in the first inning? You have to have a sense of strategy. When it is most going to affect your game?
So, should a manager challenge a one-on, one-out play in the first inning? According to my analysis, the likely answer is “yes.” I found that the expected number of any over-turnable plays in a given game is less than one. Further, the expected number of over-turnable plays that go against your team in a medium-to-high leverage situation is far less than one.
Bottom line for managers: If there is a call that you have even a 50/50 chance of reversing, you should probably challenge, no matter the inning or the game situation. The odds are, there won’t be a better time to use that challenge later in the game.
That’s the 300-word answer. The rest of this post contains the 1,000-word answer of how I came to that conclusion and how teams can use their challenges to maximize win probability.
Without regurgitating the full text of the new rule, the key aspects as they pertain to managers are:
- Managers start the game with one challenge. If they use the challenge and the call gets overturned, then they get another challenge for a maximum of two challenges per manager, per game. If the initial challenge fails, then they can’t challenge for the rest of the game.
- Starting in the seventh inning, the umpire crew may initiate reviews without a manager’s challenge.
- Almost all plays are eligible for review, with the primary exception of balls and strikes.
To analyze this, it helps to define all the factors involved in a “challenge/don’t challenge” decision. Some can be estimated, and others will have to be assumed:
- Number of challenge-eligible plays per game. Balls and strikes are off the table, but this leaves ball-in-play outs, hits and errors. In 2013, there were more than 130,000 of these plays, or an average of 54.2 per game. On a per-inning basis, the average is six per inning in the first eight innings and four in the ninth. Admittedly, this method may eliminate some plays that do not show up in the box score, such as foul balls and pick-off plays, and plays with multiple calls.
- Probability of an “over-turnable” call. This is an unknown variable going into 2014, but the estimate should improve as data start pouring in. Bad calls happen, but considering our base is every non-strike zone play, major league umpires likely make more than 99 percent of calls correctly. For a starting point, I assume umpires make one percent of calls incorrectly. On any bad call, one team would benefit, so assuming bad calls are evenly distributed, any play has only a 0.5 percent probability of being a bad call against your team.
- The seventh inning. Starting in the seventh, umpires will be able to initiate reviews without a challenge, although managers can still challenge. We don’t know how often umpires will review plays later in the game, so again, we have to make an assumption. Considering there is no guarantee an umpire will initiate a review of a bad play, let’s assume a manager still has incentive to hang onto his challenge for the later innings.
- Leverage. This is defined as the difference between the win probability if the call stands and the win probability if it is overturned. Managers will try to maximize the impact of their challenge, so using it to review a solo home run that added to a 10-0 lead has far less impact than using it on a home run that broke a 0-0 tie. This is one variable where no assumptions have to be made. It is possible to calculate the leverage of the situation at any point in a game.
- Confidence. This is the probability of a call being overturned. It is unquantifiable, but important to the decision process. Challenges do not carry over to the next game and can only improve your team’s situation, so the only cost is an opportunity cost. If a manager waits for calls on which he is 90 percent confident a review will overturn the play, he will likely challenge only a handful of calls all season. A manager who challenges any call that has a 10 percent chance of being overturned may use his challenge every game, but will likely find himself in want of a challenge at the end of many games. This is where smart challenge-management will come into play.
With the groundwork laid, we can now crunch some numbers.
For a manager to realistically challenge a call, it must meet three criteria: (1) it is eligible to be challenged; (2) he believes it is an incorrect call; and (3) it is against his team. Using the aforementioned data and assumptions, a manager should expect 0.27 possible beneficial challenges per game. Further, if he restricts his challenges to high-leverage plays only (1.5 LI or higher), the figure drops to an expected 0.05 bad, challenge-able, beneficial, high-leverage calls per game.
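The 0.27 figure follows directly from the inputs listed above. Here is a minimal sketch of the arithmetic; the function and constant names are mine, and the one percent bad-call rate and the even split between teams are the assumptions stated earlier, not measured values.

```python
# Inputs quoted in the article.
ELIGIBLE_PLAYS_PER_GAME = 54.2  # 2013 average of ball-in-play outs, hits and errors
BAD_CALL_RATE = 0.01            # ASSUMPTION: umpires miss one percent of calls
SHARE_AGAINST_YOUR_TEAM = 0.5   # ASSUMPTION: bad calls are split evenly between teams


def expected_bad_calls_against(plays_per_game=ELIGIBLE_PLAYS_PER_GAME,
                               bad_call_rate=BAD_CALL_RATE,
                               share_against=SHARE_AGAINST_YOUR_TEAM):
    """Expected number of incorrect, challenge-eligible calls per game
    that go against one particular team."""
    return plays_per_game * bad_call_rate * share_against


per_game = expected_bad_calls_against()
games_between = 1.0 / per_game  # average number of games between such calls

print(f"expected beneficial challenges per game: {per_game:.2f}")
print(f"roughly one every {games_between:.1f} games")
```

The same function can be rerun with a different `bad_call_rate` once real replay data are available.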
This is where the basic strategy is formed. The odds are, there will be a challenge-able, high-leverage call that goes against your team only once every 20 games, so when one happens, challenge it, no questions asked. There will likely be only one challenge-able, average-leverage call that goes against your team per week, so you should challenge those every time too.
That’s the basic strategy, but now things get interesting. When should a manager challenge a low-leverage play? When should a manager challenge a call that isn’t obviously wrong?
The decision (like anything else) comes down to benefits and costs. The benefit of a challenge is added win probability, calculated as the difference between the state of the game if the call is overturned and the state of the game if the bad call stood.
For example, take a third-inning grounder to shortstop by Peter Bourjos. He is called out, but appears to be safe on replay. On the out, Bourjos lowers the Cardinals’ win probability to 40 percent. If he were called safe, it would rise to 45 percent. In this example, the benefit of a successful challenge would be 5 percent win probability added.
The cost of a challenge is the chance that a bad call will happen later in the game, on a play with more impact, with no way to challenge it. Let’s say manager Mike Matheny has a time-traveler in his dugout who knows that a bad call in the sixth inning will cost his team 10 percent win probability. In that (ridiculous) scenario, the benefit of the early challenge is five percent WPA, but the cost is 10 percent WPA. Cost outweighs benefit, so Matheny shouldn’t challenge the Bourjos grounder in the third inning.
Because managers do not know how the rest of the game will unfold, or how it will be umpired, this is where things get a little more complex.
As we said above, a manager can expect an average of 0.27 obviously challenge-able calls per game, and only 0.05 high-leverage challenge-able ones. This number decreases as the game progresses, because less of the game is left. It stands to reason that leverage increases as the game progresses, which is true, but the increase is not dramatic. Across all first innings in 2013, 13 percent of challenge-eligible plays were high leverage (1.5 LI or more). That number creeps up every inning to 30 percent in the ninth and 78 percent in extra innings. Remember, this is just the number of challenge-eligible plays, not necessarily ones where bad calls were made.
Taking into account the number of plays left in the game, the probability of a bad call, and the rising probability of high-leverage situations yields the following graph, showing the expected number of bad calls at any point in the game:
The increase in leverage does not offset the decrease in plays left in the game. Therefore, as the game progresses, the opportunity cost of losing a challenge decreases.
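The declining trend can be roughly reproduced from numbers given earlier in the post. The sketch below uses the per-inning play counts (six per inning through the eighth, four in the ninth) and the 0.5 percent chance of a bad call against your team. The post gives the high-leverage share only for the first and ninth innings, so the straight-line rise between them is my assumption, as are all the names in the code. Extra innings are ignored.

```python
# Per-inning counts of challenge-eligible plays quoted in the article:
# six per inning in innings 1-8 and four in the ninth.
PLAYS_BY_INNING = [6] * 8 + [4]
BAD_CALL_AGAINST_RATE = 0.005  # 1 percent bad calls, half of them against your team

# The article gives the high-leverage share only for the first (13%) and
# ninth (30%) innings. ASSUMPTION: the share rises linearly in between.
HIGH_LEV_SHARE = [0.13 + (0.30 - 0.13) * i / 8 for i in range(9)]


def expected_remaining(start_inning):
    """Expected bad calls against your team from the start of `start_inning`
    through the ninth, returned as (all plays, high-leverage plays only)."""
    total = 0.0
    high_leverage = 0.0
    for idx in range(start_inning - 1, 9):
        bad_calls = PLAYS_BY_INNING[idx] * BAD_CALL_AGAINST_RATE
        total += bad_calls
        high_leverage += bad_calls * HIGH_LEV_SHARE[idx]
    return total, high_leverage


remaining = [expected_remaining(inning) for inning in range(1, 10)]
for inning, (total, high) in enumerate(remaining, start=1):
    print(f"start of inning {inning}: all={total:.3f}  high-leverage={high:.3f}")
```

Under these assumptions both series fall every inning, which is the point made above: the rising share of high-leverage plays is not enough to make up for the shrinking number of plays left. The starting total comes out slightly below 0.27 because the rounded per-inning counts sum to 52 plays rather than 54.2.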
To calculate the expected value of a challenge, take the value of the overturned call multiplied by the probability it is overturned, then subtract the expected value of a future challenge multiplied by the probability that the challenge fails (a failed challenge forfeits the opportunity to make a future one). If the expected value is positive, it is a good challenge; if not, it is better to save the challenge for the prospect of a better opportunity.
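Written as a formula, the rule in the preceding paragraph looks like this. The symbols are my own shorthand, not notation from the rule or from any standard source: $p$ is the manager's estimated probability that the call is overturned, $W$ is the win probability gained if it is, and $C$ is the expected value of holding the challenge for the rest of the game.

```latex
\mathrm{EV}_{\text{challenge}} = p \cdot W - (1 - p) \cdot C,
\qquad
\text{challenge if } \mathrm{EV}_{\text{challenge}} > 0
\;\Longleftrightarrow\;
p > \frac{C}{W + C}
```

The second form shows why the threshold drops late in the game: as $C$ shrinks toward zero, almost any nonzero $p$ is enough.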
Here is what a full example would look like:
In the first inning, a questionable call happens on a play in which the team stands to gain eight percent WPA if the call is reversed. The manager feels he has a 10 percent chance to overturn the call, so the expected benefit is 0.8 percent WPA. At that point in the game, there are 0.24 expected beneficial challenges left in the game, 0.05 expected beneficial high-leverage challenges and 0.01 expected beneficial very high-leverage challenges. Multiply those values by their associated WPAs, apply the estimate that this particular challenge will fail 90 percent of the time, and the result is an opportunity cost of about 1 percent WPA. The opportunity cost is higher than the expected benefit, so the manager should not challenge.
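The example above can be checked with a few lines of code. This is only a sketch: the function names are mine, and because the post does not list the WPA attached to each tier of future challenge, the value of a held challenge is backed out from the stated opportunity cost of about 1 percent WPA rather than computed from scratch.

```python
def challenge_ev(wpa_gain, p_overturn, future_value):
    """Expected value of challenging now, in win probability.

    wpa_gain:     win probability gained if the call is overturned
    p_overturn:   manager's estimate that the review overturns the call
    future_value: expected value of keeping the challenge for later
    """
    expected_benefit = p_overturn * wpa_gain
    opportunity_cost = (1.0 - p_overturn) * future_value
    return expected_benefit - opportunity_cost


def break_even_probability(wpa_gain, future_value):
    """Smallest overturn probability at which challenging is not a losing bet."""
    return future_value / (wpa_gain + future_value)


# Numbers from the first-inning example.
WPA_GAIN = 0.08
P_OVERTURN = 0.10
STATED_OPPORTUNITY_COST = 0.01  # "about 1 percent WPA" from the article
# ASSUMPTION: back out the value of a held challenge from the stated cost.
FUTURE_VALUE = STATED_OPPORTUNITY_COST / (1.0 - P_OVERTURN)

ev = challenge_ev(WPA_GAIN, P_OVERTURN, FUTURE_VALUE)
threshold = break_even_probability(WPA_GAIN, FUTURE_VALUE)

print(f"expected value of challenging: {ev:+.4f}")
print("challenge" if ev > 0 else "hold the challenge")
print(f"break-even overturn probability: {threshold:.2f}")
```

With these inputs the expected value is slightly negative, matching the conclusion that the manager should hold the challenge. Setting `future_value` to zero makes any challenge with a nonzero chance of success worthwhile.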
Graphing the combinations of inning, confidence and leverage produces a sort of cheat sheet, showing when a manager should challenge and when he should not:
This graph can be read by picking a leverage on the x-axis, a probability of a successful challenge on the y-axis, and finding where the associated point lies. The different colored areas correspond to the inning of the game. If the point falls to the left, the manager should not challenge; if it falls to the right, he should.
The graph makes the case that managers should challenge far more than they probably will this season. Even an average-leverage play in the first inning should be challenged if there is a 30 percent probability of the call being reversed. Once the game reaches the later innings, managers should be more and more challenge-happy.
If this seems strange, consider this thought experiment: A manager should challenge the final play of every game his team lost, no matter how obvious the call. At that point, there is no opportunity cost of a failed challenge, so even the smallest probability of an overturned call would mean a positive expected value for the challenge. Again, there is no reward for ending the game with a challenge in your pocket.
Admittedly, this analysis hinges entirely on the assumption that umpires call 99 percent of plays correctly, leaving just one percent with even a chance of being overturned on review. If it turns out that umpires make 98 percent of calls correctly, leaving two percent with a chance of being overturned, then it would double the opportunity cost of using a challenge and lead to a much more conservative challenge-management strategy. If it goes the other way, and umpires actually make more than 99 percent of all calls correctly, it would call for an even more aggressive challenge strategy than is presented in this analysis.
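To see how sensitive the inputs are to that assumption, the short sketch below recomputes the expected number of bad calls against a team per game at a few umpire accuracy levels. The 99.5 percent case is a hypothetical stand-in for "more than 99 percent", and the function name is mine.

```python
ELIGIBLE_PLAYS_PER_GAME = 54.2  # 2013 average quoted in the article


def bad_calls_against_per_game(umpire_accuracy,
                               plays_per_game=ELIGIBLE_PLAYS_PER_GAME):
    """Expected bad calls per game against one team, assuming bad calls
    are split evenly between the two teams (the article's assumption)."""
    return plays_per_game * (1.0 - umpire_accuracy) * 0.5


# 0.995 is a HYPOTHETICAL value chosen only to illustrate "better than 99 percent".
scenarios = {acc: bad_calls_against_per_game(acc) for acc in (0.995, 0.99, 0.98)}

for accuracy, expected in scenarios.items():
    print(f"umpire accuracy {accuracy:.1%}: {expected:.3f} bad calls against per game")
```

Moving from 99 to 98 percent doubles the expected number of future bad calls, and with it the opportunity cost of burning a challenge early.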
A couple of predictions for the inaugural year of the new era of replay:
- This topic will become yet another saber-vs-traditional media battleground. If a manager follows the fairly aggressive challenge-management plan laid out in this post, he will make more challenges than any other manager, be wrong more often, and not have a challenge to use on crucial late-game calls more often than any other manager. This is a recipe for media scrutiny and a lack of job security. In football, teams should go for it on fourth down more often, but because of the off-the-field costs of going for it on fourth down and failing, coaches are more conservative than they should be.
- Quantifying manager quality is one of the last big white whales of the stats community. Simply put, it is very hard to isolate the quality of a manager from that of his team. Challenges will be a fascinating new data stream that can be attributed directly to managers, and will no doubt be used to evaluate them. It is also highly likely that these data will be used incorrectly by the mainstream media. Challenge success percentage is the knee-jerk stat to evaluate, but this does not tell the full story. The optimal challenge-management strategy would mean using the challenge every game, and being wrong more often than not.