Who is playing the percentages? What are the percentages?

by Max Marchi
October 11, 2011

– You! Strawberry! Good effort today. Take a lap and hit the showers. I’m putting in a right-handed batter.

– Pinch-hitting for me?

– Yes. You’re a left hander, and so is the pitcher. If I send up a right-handed batter, it’s called playing the percentages. It’s what smart managers do to win ball games.

– I’ve got nine home runs.

– You should be very proud. Sit down. Simpson! You’re batting for Strawberry.

(Mr. Montgomery Burns and Darryl Strawberry, manager and right fielder, respectively, of the Springfield Nuclear Power Plant softball team.)

This article was inspired by this thread at The Book Blog, in which Mitchell Lichtman criticizes Tony La Russa’s managerial choices (well, actually “criticizes” is quite a euphemism).

The question is: Your starter has breezed through eight innings and is due to bat. Your team is leading by one run. Do you pinch hit for him?

Tony La Russa did not. It was the do-or-die fifth game of the National League Division Series; Chris Carpenter had blanked the Phillies lineup and was going to lead off the top of the eighth.

For Lichtman, a.k.a. MGL, the numbers made going to a better hitter a no-brainer. You can read his arguments in his own words in the original thread, but the following summary should be a reasonable approximation of his position.

You pinch hit because:
1. You get a better chance of producing in your offensive inning.
2. Since every pitcher gets worse as he goes through the opposing lineup time after time, a fresh closer is a better option than a starter going to face the same batters for the fourth time.

Here we’ll expand a bit on point number two. Full disclosure: If I were managing in a deciding game and that situation occurred, I would NOT substitute for my starter.

Baseball talent

Let’s suppose we can visualize the distribution of baseball talent among people. It would probably be something like this.

The guys who actually play baseball should all be on the right end of the curve, with the major leaguers being on the extreme right part of the chart. Let’s zoom in on that part of the chart and focus on pitching talent. Something like the following might be reasonable.

No deep analysis has been performed to place those names on the talent spectrum, thus the positions are absolutely debatable. However, let’s suppose they’re placed appropriately.

We have Justin Verlander and CC Sabathia as the elite pitchers; Carpenter as a good-to-great player (if we were to consider the 2005/2006 seasons, his name would have been more to the right). Ted Lilly, who according to FanGraphs is one win over replacement level, can be considered a legitimate major leaguer. Finally, Dontrelle Willis, who has been attempting comebacks year after year, has to stay in the baseball limbo where the so-called Quad-A players have to live.

Starter versus closer

Let’s now try to visualize Carpenter’s effectiveness as the game progresses and compare it with Jason Motte’s effectiveness. Again, the placement of labels on the chart is guesswork on my part (though somewhat educated) and might not reflect the real values.

Two questions:
1. Why is Motte’s name in the above chart marked with an asterisk?
2. And why is Carpenter’s effectiveness the fourth time through the lineup marked with the “§” sign?

If you compare Carpenter’s and Motte’s ERAs (3.45 and 2.25 in 2011, respectively), or their batting averages allowed, or other more or less advanced metrics, you might be tempted to say Motte is the better pitcher, but we all know that’s not the case.

Motte enters the game in the ninth, so he does not face batters a second, third and fourth time through the game, when he is tired and the opponents have had the opportunity to time his pitches. Also, you have to consider that while Motte can put everything he has on every pitch, Carpenter has to pace himself if he wants to throw multiple innings.

Thus, the asterisk means: Yeah, Motte’s average effectiveness (that is, his ability to prevent opponents from scoring runs) looks better than Carpenter’s average effectiveness, but that’s because he pitches in a different setting. So, in the second chart, Motte’s name would be to the left of Carpenter’s name, not to its right.

Also, if you look at OPS allowed, you would think Carpenter has some kind of resurgence late in the game.

Situation                         OPS allowed
First time through the lineup     .676
Second time through the lineup    .745
Third time through the lineup     .732
Fourth time through the lineup    .682

Several pitchers show those kinds of numbers. Does it mean that pitchers get some kind of energy injection when they see the finish line? No, you know better.

Starters are allowed to pitch deep into games on nights when they are performing well, but get the quick hook when they show “they don’t have it.” If you forced managers to leave pitchers on the mound until they had completed their fourth time through the lineup regardless of the score, you would not see those “resurgences.”

(One can look at old-timers’ numbers, from back when pitchers were expected to complete their games, to see this. Don Newcombe’s times-through-the-order stats at Baseball-Reference are a good example. In a cursory look, I found similar patterns for Bob Feller, Robin Roberts and Whitey Ford.)
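The selection effect described above can be illustrated with a small simulation. This is only a sketch under invented assumptions, not an analysis of real data: the 4.0 runs-per-nine baseline, the 0.5-run fatigue penalty, the spread of daily form and the manager's 3.0 cutoff are all hypothetical numbers chosen for illustration.

```python
import random
from statistics import mean

BASELINE = 4.0         # hypothetical runs per nine innings on an average day
FATIGUE_PENALTY = 0.5  # hypothetical: every pitcher is this much worse the fourth time
HOOK_THRESHOLD = 3.0   # hypothetical: the manager keeps only starters doing better than this


def simulate(n_starts=20000, seed=2011):
    """Return (forced, observed) average run rates the fourth time through."""
    rng = random.Random(seed)
    forced, observed = [], []
    for _ in range(n_starts):
        form = rng.gauss(0.0, 1.0)  # today's form; negative means a good day
        early = BASELINE + form + rng.gauss(0.0, 1.0)  # results so far: form plus luck
        fourth = BASELINE + FATIGUE_PENALTY + form + rng.gauss(0.0, 1.0)
        forced.append(fourth)        # what we would see if nobody were ever pulled
        if early < HOOK_THRESHOLD:   # only these starters actually get a fourth time
            observed.append(fourth)
    return mean(forced), mean(observed)


forced_avg, observed_avg = simulate()
print(f"everyone forced to stay in: {forced_avg:.2f}")
print(f"only the survivors:         {observed_avg:.2f}")
```

Every simulated pitcher gets worse the fourth time through, yet the survivors' average comes out better than the first-time baseline, because only pitchers on a good day are left in to be measured.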

Thus, the “§” means: We have taken care of the selection bias issue.

Summarizing this section: If we suppose I have correctly laid down the labels in the chart, it’s better to bring a fresh Motte out of the bullpen than to have Carpenter face the opposing team for the fourth time.

Good and bad days

The sentence closing the previous section is true on average, or if the players are robots always performing at the same level (same performance, same decline each time through the order, and so on).

But players, fortunately, are human beings—if you had some question about Verlander not being human, the postseason games played so far should have convinced you of the contrary—and human beings have good and bad days.

The chart above shows Carpenter’s Game Scores throughout his career. Though we should expect Game Score variation for robots as well (due to luck), it’s safe to assume a significant portion of the variation in the chart is caused by Carpenter having good and bad days (due to health issues, psychological factors, luck…whatever).

Even in 2005 (shaded on the chart), his Cy Young season, Carpenter had a couple of extremely bad outings. It’s very possible that on those occasions he threw as well as in any other start and simply had bad luck, but it’s quite likely that he was not 100 percent: he might not have slept well the previous night, some minor ailment could have been affecting him, or he simply “didn’t have it” that night.

The chart below should not be too unreasonable.

On his best days, Carpenter can be the best pitcher in the game, while on an awful night he will resemble a back-of-the-rotation pitcher.

What kind of Carpenter was on the mound in the NLDS Game Five?

That question is why teams cannot be run by computers.

The average Motte facing the opposing lineup for the first time in the game is better than the average Carpenter facing that lineup for the fourth time.

I’m pretty sure La Russa knows this; otherwise, he would not have relievers in his bullpen. If La Russa decides to leave his ace on the mound for the ninth inning, it’s because he believes Carpenter is having one of his best days. Thus, the chart in La Russa’s mind should look like the following.

Note: Carpenter’s first and second time through the lineup, in this scenario, are literally off the charts.

Carpenter on his best night is probably better the fourth time through the lineup than the average Motte coming out of the pen. Yeah, Motte might also be having the best night of his life, but there’s a difference.

For Carpenter we have eight innings of blanking the mighty Phillies. Sure, it can be the usual Carpenter with a lot of luck on his side, but with eight goose eggs against a powerful lineup we are entitled to shift (if ever so slightly) our a priori idea of Carpenter’s effectiveness for the night. With Motte we don’t have any clue, except what he and the bullpen catcher can tell us.
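The idea of shifting an a priori belief after eight scoreless innings can be written as a small Bayesian update. The sketch below is illustrative only: the three kinds of day, their prior probabilities and their runs-per-nine rates are hypothetical, and scoring is assumed to follow a Poisson process, which is a simplification.

```python
import math

# Hypothetical prior belief about which Carpenter shows up on a given night.
PRIORS = {"great": 0.20, "usual": 0.60, "bad": 0.20}
# Hypothetical true run rates (runs per nine innings) for each kind of day.
RUNS_PER_NINE = {"great": 2.8, "usual": 3.5, "bad": 4.5}


def posterior_after_scoreless(priors, runs_per_nine, innings=8):
    """Update the prior after `innings` scoreless innings (Poisson assumption)."""
    # Under a Poisson model, P(zero runs) = exp(-expected runs over the stretch).
    likelihood = {
        day: math.exp(-runs_per_nine[day] * innings / 9.0) for day in priors
    }
    evidence = sum(priors[day] * likelihood[day] for day in priors)
    return {day: priors[day] * likelihood[day] / evidence for day in priors}


posterior = posterior_after_scoreless(PRIORS, RUNS_PER_NINE)
for day in PRIORS:
    print(f"{day:>5}: prior {PRIORS[day]:.2f} -> posterior {posterior[day]:.2f}")
```

With these made-up inputs the probability of a great day rises from 0.20 to roughly 0.35: a real shift, but far from certainty, since the usual Carpenter with some luck remains the most likely explanation.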

La Russa bets Carpenter is blessed with an inordinately great condition and that Motte is his usual self. Is that bet ill-advised?

Okay, time for some numbers.

I looked at games played in the past 20 years, thanks to the invaluable Retrosheet data. I selected all the instances in which the starting pitcher had completed eight innings while giving up one run at most. These should be the circumstances in which the manager can believe his starter “has it” and can complete the game.

I removed the games in which the offense had provided the pitcher more than three runs. Thus, we are dealing with situations in which the game is still on the line, and the manager should be trying to maximize his chances. (In a blowout the skipper’s choices could be dictated by having to rest the bullpen or wanting to try a young arm.)

The games were then split into two groups: games with the starter beginning the ninth (STARTER) and games with a reliever beginning the ninth (CLOSER).
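The selection and grouping rules just described can be sketched as a small filter. The record layout below is hypothetical (it is not Retrosheet's actual schema), and the article does not spell out every detail of the query, so treat this as one plausible reading rather than the code behind the numbers.

```python
from dataclasses import dataclass


@dataclass
class GameLine:
    """One starter's game, reduced to hypothetical fields."""
    innings_through_eighth: int  # full innings the starter pitched in the first eight
    runs_allowed: int            # runs he allowed over those innings
    run_support: int             # runs his own team had scored for him
    starter_began_ninth: bool    # True if the starter came out for the ninth


def classify(game):
    """Return 'STARTER', 'CLOSER', or None when the game is outside the sample."""
    if game.innings_through_eighth < 8:
        return None              # starter did not complete eight innings
    if game.runs_allowed > 1:
        return None              # more than one run allowed
    if game.run_support > 3:
        return None              # too much run support: game no longer on the line
    return "STARTER" if game.starter_began_ninth else "CLOSER"


sample = [
    GameLine(8, 0, 1, True),    # shutout through eight, starter stays in
    GameLine(8, 1, 2, False),   # reliever begins the ninth
    GameLine(8, 0, 7, True),    # blowout, excluded
    GameLine(6, 0, 1, False),   # starter pulled early, excluded
]
groups = [classify(g) for g in sample]
print(groups)
```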

Here’s how the two groups fared, with more than 1,000 games represented in each group.

Runs allowed    CLOSER (%)    STARTER (%)
0               76            74
1               14            16
2                7             5
3                2             3
4+               0             1
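As a rough worked example, the table can be turned into an average number of runs allowed and a chance of protecting a lead. The sketch reads the table as runs allowed in the ninth inning, counts the "4+" row as exactly four runs, and rescales each column because the rounded percentages add up to 99; all three are simplifying assumptions.

```python
# Percentages copied from the table above; the key 4 stands for the "4+" row.
TABLE = {
    "CLOSER": {0: 76, 1: 14, 2: 7, 3: 2, 4: 0},
    "STARTER": {0: 74, 1: 16, 2: 5, 3: 3, 4: 1},
}


def expected_runs(dist):
    """Average runs allowed, rescaling the rounded percentages to sum to one."""
    total = sum(dist.values())
    return sum(runs * pct for runs, pct in dist.items()) / total


def hold_probability(dist, lead):
    """Chance of allowing fewer runs than the lead, so the lead survives the inning."""
    total = sum(dist.values())
    return sum(pct for runs, pct in dist.items() if runs < lead) / total


for group, dist in TABLE.items():
    print(
        f"{group:>7}: expected runs {expected_runs(dist):.2f}, "
        f"holds a 1-run lead {hold_probability(dist, 1):.1%}, "
        f"holds a 2-run lead {hold_probability(dist, 2):.1%}"
    )
```

Under these assumptions the closers allow about 0.34 runs and the starters about 0.39, and both groups protect a two-run lead equally often.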

Looking at the numbers above, the decision whether to leave the starter in or remove him looks like a coin flip. However, the table can suffer from selection bias, and three possible sources of bias come to mind.

It’s not a given that the quality of opposing lineups does not influence the choice of going (or not going) to the bullpen. I believe the quality of offense faced is equal between the two groups, but a check should be done.

By contrast, I’m pretty sure that the talent of both the closer and the starter plays a role in the decision. If you have Mariano Rivera, you are more likely to give him the ball even when the starter has thrown eight frames of shutout ball, which has the effect of deflating the CLOSER numbers in the table above. But if the starter is a top-notch player (and this is true in many of the games we are analyzing) and the bullpen is not dependable, the manager will lean toward the slow hook, which should deflate the numbers in the STARTER column above.

So what are the percentages?

I would say you could flip a coin and make your decision. Tom Tango, performing different analyses, has arrived at a similar conclusion. And this is noteworthy. Many of us (I, for one) would have believed that leaving the starter in is the only choice. Instead, calling in the closer is an equally sensible choice. And when you factor in the pitcher being due at the plate in the National League, it becomes even more sensible.
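One way to put a number on the coin-flip reading is a two-proportion z-test on the scoreless-ninth rates (76 percent for CLOSER, 74 percent for STARTER). This is a sketch, not part of the original analysis: the exact group sizes are not given, so 1,000 games per group is assumed as a stand-in for "more than 1,000," and the test ignores the selection biases discussed above.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return z, p_value


ASSUMED_GAMES = 1000  # hypothetical group size; the article says "more than 1,000"
z, p_value = two_proportion_z(0.76, ASSUMED_GAMES, 0.74, ASSUMED_GAMES)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2f}")
```

With that assumed sample size the z statistic is only about 1.0, so a two-point gap is well within what chance alone would produce.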

Giving Lichtman’s post the title “Worst managing ever” surely attracts some extra clicks, but it also overstates reality, even when you add to the mix the highly questionable bunt calls not analyzed in this article.

References & Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Road, Newark, DE 19711.

During the past few days, a lot has been said (and calculated) on this subject at The Book Blog and Baseball Think Factory. I must admit that, while I tried to follow all the relevant threads, I might have missed significant parts of the discussions while writing this article.

15 Comments

Mitch

12 years ago

I’ve always been on the side of leaving in a guy like Carpenter. I don’t know the average production of a typical NL pinch-hitter but I’m certain it’s not too impressive, and you have to factor in that Carpenter is not completely inept at hitting a baseball either. It’s sort of like the old Moneyball conundrum: over a full season, smart pinch-hitting will net you a few extra runs but in one playoff instance, that **** doesn’t work.

Rick Rivas

12 years ago

So the answer is a coin flip. I just finished reading a book on the art of negotiation. I found that people can have a difficult time solving a problem if their approach is anchored in binary assumptions. Sometimes the solution is waiting to be found beyond each party’s assumptions.

This article helped me see my sport debates with friends and the loud debates on sports radio in a new light. Sometimes, based on probability, both parties are right; unless arguing with a Yankee fan, we should be open to this possibility.

tangotiger

12 years ago

Great stuff Max!

Guy

12 years ago

Nice analysis, Max. Was there any difference at all between the two samples in terms of average RA in the first 8 innings, and/or size of lead for the pitcher’s team?

I also wonder if the starters were pulled during the 9th in enough cases to matter? If so, it would be interesting to compare the performance of the closers and the starters specifically (in something like wOBA or OPS), excluding other relievers who may have come in.

tangotiger

12 years ago

Max, did you distinguish between home and away?

Can you look only at pitchers pitching in the top of the inning?

Bottom of the 9th offers its own limitations to number of runs a pitcher can allow, especially in your case, where he’s got at most a 2 run lead. If you have a disproportionate number of starters there, then that’s a source of bias.

David

12 years ago

To kind of extrapolate on Mitch’s point, isn’t it equally futile to expect certain results from one inning of pitching as it is from one at bat? Applying the probabilities you developed above on such a small sample size in order to make your decision seems kind of weak. Shouldn’t the decision to leave Carpenter in or not have been based on things like pitch count, how well he threw the last inning, how well Motte’s been throwing, potential match-ups?

Guy

12 years ago

Max:
A couple of other factors to consider are year and league. It could be that NL teams are less likely to allow the starter to pitch the 9th (because he was pinch hit for in the 8th or top of 9th). And it could be that starters were allowed to pitch the 9th more often in the 90s than today. Both of those could possibly bias your results (though I wouldn’t expect the impact to be huge).

Greg Rybarczyk

12 years ago

It seems that unless bringing in the closer is significantly better in terms of expected outcomes, you ought to leave the starter in. This because while doing so, you warm up the closer and have him ready to step in if the starter falters, or tires unexpectedly quickly in the 9th.

If you instead bring in the closer, you limit your options, as if the closer unexpectedly falters or tires prematurely, you can’t go back to the starter…

I was lucky enough to witness a game like this in person on Aug. 23, 2005 in the Metrodome, where Johan Santana pitched 8 innings of shutout, 3-hit ball, opposed by Freddy Garcia who pitched 8 dominant innings, giving up only 1 hit, a Jacque Jones home run in the bottom of the 8th. In the 9th, I was quite surprised to see the Twins call in Joe Nathan rather than run Santana out there again, but Nathan closed it out for the 1-0 win.

http://www.baseball-reference.com/boxes/MIN/MIN200508230.shtml

James

12 years ago

1. Greg, is it practical to consider a closer tiring prematurely in one inning of work? Injury or ineffectiveness I can see, but tiring?

2. Greg again, in this particular situation it’s also important to consider the value of pinch hitter/runner as Carpenter led off the previous inning.

3. David, you are misapplying the meaning of small sample size. The sample consisted of over 1,000 games in each group, which is an extremely large sample size. The fact that the analysis only covers one inning is not a sample size consideration.

4. Mitch, it’s not a conundrum, and it does work. The goal is to put the team in the best position to win, and even if it’s only a small advantage it’s still worth doing. This article was to determine the relative advantage of a starter vs a closer, and it turns out that it’s nearly a toss-up so either choice is equally valid. However, had one side proved superior to the other, it is worth doing because it makes it more likely to win a game, even if it’s only a small advantage. After all, isn’t the whole purpose of the manager to manage the team so it’s in the best position to win?

CircleChange11

12 years ago

@Mike S … yeah, it’s gotta be a first.

Not only would TLR have used Motte in the 9th, but had he PH’d for Carp in the t8, he would have used likely 2 more relievers in the 8th.

Philly alternates their R and L hitters at the top of the lineup, so it goes Utley, Pence, Howard.

TLR may have used Motte against all 3, or used any combination of Scrabble, Rhodes, Dotel, Salas, Motte over the last 2 IP.

To have Carp bat in the t8 and not have him go the distance would have been the head scratcher to me. Once he came to the plate in the 8th, I figured TLR had seen enough of the strikes, groundballs, etc to feel that carp was closing this game out, unless something drastic happened.

There is also the issue of bringing in a neophyte closer into a 1-0 game, on the road, in an elimination playoff game. I’m not comfortable assuming that we see the “average Motte” in that situation.

Andy

12 years ago

Did the starters pitch just as much in the 9th inning? I imagine they would be more likely to be taken out. What was the average IP for each group?

MikeS

12 years ago

I’m all for LaRussa bashing, but is this the first time anybody has ever accused him of not using his relievers enough?

Jeroen

12 years ago

When making this evaluation should we also account for the fact that relievers are generally more volatile than starting pitchers (Rivera excluded)? If relievers have a larger spread in their production and the chances of holding the lead is more than 50% anyway I can imagine that the starter is the way to go.

goGigantes

12 years ago

Wow, Max… A really fine write-up here, thanks for all the leg work on this. I like that you showed enough thoughtful consideration, but didn’t ‘bore’ us by trying to name ‘ALL’ factors that *might* be involved in the decision-making.

I’m not-so-good at data mining deep metrics, but this curiosity I’ve held for a long time relates to this…

I wonder how the pinch-hitters have done when they’ve been chosen to replace the pitcher in the scenario you’ve used. I won’t ‘assume’ the bench player is a .290 avg with a .340 OBP. Maybe more like a .265/.300? Sound reasonable?

So, I wonder if a large sample size (like 1,000 historic games) shows they had approximately 260-270 HITS or 290-310 ON-BASE… Then, you start getting into ‘clutch factor’ hitting…

And HOW MANY of those hits/on-bases by PH vs. pitcher:
a) add at least an RBI
b) keep a rally going (say, with one or two outs before the at-bat) that helped contribute to further runs from the 1-2-3 hitters, etc.

I’m not asking for you to do all this, I appreciate what you’ve already furnished here. Just got me to thinking about it some more.

Thanks.

goGigantes

12 years ago

Oh yeah, this made me think of a few more things. I look at batter ‘splits’ a lot, especially for switch-hitters. As a Giants fan, lots of Giants hitters hit better AGAINST the match-up. Lefties like Belt, Schierholtz, and Huff (yeah, it’s hard to use the word ‘better’ in a sentence even NEXT TO Huff’s name, I know…) trend to hit LHP’s better.

I’ve also noticed lots of quality hitters around MLB have varying degrees of this. Joey Votto and Buster Posey are quite ‘balanced’ on their splits. Whereas switch-hitting Jimmy Rollins has deteriorating splits as a RHB vs. LHB since 2007.

OTOH, some teams carry a great LOOGY LHP specialist (the Giants have Javier Lopez, for example). Some teams have a great closer (Rivera, as you mentioned; or Valverde). These bullpen ‘specialists’ are a somewhat ‘new’ contribution to baseball. I’d hardly believe TLR would’ve managed his bullpen as he does, if he managed in a time of a standard 4-pitcher rotation and all… Comparing the stats can just get a lil’ wonky then.

I don’t know how one could account for ‘nuanced’ variables, like a certain power hitter having ‘video game ownage’ of a bullpen pitcher (like 9-for-21 with two 3B and two HR) in the grand scheme of this discussion.

But I find it very curious when I look at splits, and I have previously noticed how BAD some hitters’ slash lines look on the 4th time seeing the starting pitcher. I think Rollins was the worst (BA .184). But most great hitters (Votto, Beltran, Reyes, Kemp, Fielder, McCutchen) are much BETTER seeing a starter for the 4th time in a game. Which, my fuzzy ‘logic’ would also agree ‘makes sense.’

In closing? I’d say it depends greatly on where the pitching strength lies (starter or bullpen). And factors like who is rested, if there is a day game after a night game, who’s at-bat, and who’d be available to pinch hit for the pitcher. A team like the Giants with a sad “noffense” in 2011, would trust ANY pitcher to hold a game more than ANY player to drive in a run. So, a defensive ‘double-switch’ is often called for.

The ‘situational data’ is much too prevalent to make any type of ‘conclusion’ IMHO.

So? Managers can go by ‘feel’ for the game. And can’t second-guess later if things don’t turn out well. And not rely upon the last success to determine their NEXT similar dilemma.
