How Promotions and Demotions Affect Relief Performance

Chris Devenski has gained the most leverage index of any reliever this season. (via Keith Allison)

Every year, we see a few major-league relievers get shunted to a lesser role while others get increased responsibilities. In 2017, Sam Dyson received a lot of coverage for his demotion, but Kevin Siegrist and Luke Gregerson have lost the most average leverage index (gmLI) since last year. Meanwhile, Chris Devenski has gained the most gmLI since last season, followed by Matt Belisle.

In June, I explored the relievers who gained or lost the most trust in a single offseason since 1974. Some were acquired by new teams, some were put under new managers, and others simply pitched better. It was a fun list to compile, and I learned some interesting stories about guys like Mike Timlin and Donne Wall.

Talking about relievers losing or gaining trust was fun and interesting from a historical perspective. Sometimes that’s enough. Other times, we want to dig deeper. To that end, one specific Internet citizen suggested some future research:

The question is a fine one. Answering it will help us understand what effect a demotion or promotion has on actual relief performance. Given today’s focus on defined relief pitcher roles, which, despite Andrew Miller’s 2016 postseason, are still the norm in major league baseball, knowing this information could guide bullpen construction and roster management. Plus, the question is just plain interesting. So I dug deeper.

Methodology

For this study, I used the same Year 1 and Year 2 gmLI measurements and data as I did in my last article. I also added my interpretation of the question, “Did the reliever respond any differently than he would have otherwise?” A subsequent tweet by Tango clarified “following” as Year 3 of this Year 1-to-Year 2 sequence, so I interpreted “otherwise” as the player’s Year 3 WAR projection and measured “respond” with his actual Year 3 WAR.

I set the threshold at 150 batters faced for three straight seasons because I wanted to include as many relievers as possible while ignoring flukes. As with many studies, selection bias rears its head here. The study includes only relievers who were good enough to be sent to a major league mound three years in a row. But it’s the data we have. Lowering the threshold any further opens us up to chance changes in gmLI or partial seasons, and I’m not equipped to incorporate those yet.
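To make the bookkeeping concrete, here is a minimal sketch of how the qualifying trios could be assembled, assuming a tidy table of reliever-seasons. The DataFrame and its column names (`player_id`, `season`, `bf`, `gmLI`, `war`, `proj_war`) are hypothetical stand-ins for the data described above, not the actual files I used.

```python
import pandas as pd

def build_trios(reliever_seasons: pd.DataFrame, min_bf: int = 150) -> pd.DataFrame:
    """Return one row per qualifying Year 1/2/3 trio with the two quantities studied here."""
    qualified = reliever_seasons[reliever_seasons["bf"] >= min_bf]
    rows = []
    for player_id, seasons in qualified.groupby("player_id"):
        seasons = seasons.set_index("season").sort_index()
        for year1 in seasons.index:
            year2, year3 = year1 + 1, year1 + 2
            if year2 in seasons.index and year3 in seasons.index:
                rows.append({
                    "player_id": player_id,
                    "year1": year1,
                    # trust gained or lost from Year 1 to Year 2
                    "gmli_change": seasons.at[year2, "gmLI"] - seasons.at[year1, "gmLI"],
                    # Year 3 WAR overperformance: actual minus projection
                    "war_over": seasons.at[year3, "war"] - seasons.at[year3, "proj_war"],
                })
    return pd.DataFrame(rows)
```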

Examples

Recall from my last article that Mike Timlin’s gmLI increased by 1.40 from 1994 to 1995. He didn’t change teams or managers; he just pitched better, earning a more important role on the team. The question I answered was: How well did Timlin pitch in 1996, the year following his “promotion,” compared with his 1996 projection, which is what we’d otherwise have expected of him?

Tango’s WARcels method projected Timlin for 0.38 WAR in 1996. This projection seems quite low until you glance at the WAR values it’s using. The 1995 campaign was a breakout 1.3 WAR year for him; in 1993 and 1994, he’d recorded WARs of just 0.2. And 1996 was also his age-30 season, making him old (for a pitcher, anyway). Any reasonable manager, let alone an algorithm, would have had reason to doubt he could maintain his 1995 breakout into 1996.
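As an aside, if you’re curious what a Marcel-style calculation looks like under the hood, here is a rough sketch. The 5/4/3 weights, the regression toward zero WAR, and the age adjustment are illustrative assumptions on my part, not Tango’s actual WARcel recipe, so it lands near but not exactly on the 0.38 figure.

```python
def marcel_style_war(war_last, war_prev, war_prev2, age, regression_weight=3.0):
    """Illustrative Marcel-style WAR projection (assumed constants, not the real WARcels)."""
    weighted = 5 * war_last + 4 * war_prev + 3 * war_prev2   # most recent season weighted heaviest
    regressed = weighted / (5 + 4 + 3 + regression_weight)   # extra weight drags the estimate toward 0 WAR
    age_factor = 1 + 0.003 * (29 - age)                      # mild penalty for every year past 29 (assumed)
    return regressed * age_factor

# Timlin heading into 1996: 1.3 WAR in 1995, 0.2 in 1994, 0.2 in 1993, age 30
print(round(marcel_style_war(1.3, 0.2, 0.2, age=30), 2))     # roughly 0.5 with these made-up constants, vs. the actual 0.38
```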

But maintain it he did. In 1996, Timlin recorded 1.42 WAR. So after seeing a Year 1 to Year 2 increase of 1.40 gmLI, he overperformed his Year 3 projections by 1.04 WAR. In the scatterplot you’ll see below, I’ll place a point at (1.40, 1.04).

Another example: From 2013 to 2014, Pat Neshek moved from the Athletics to the Cardinals and gained 1.15 gmLI along the way. After a solid season in St. Louis, his projected WAR for Year 3 (2015) was 0.42. He instead compiled only 0.21 WAR. So we add another data point, this time at (1.15, -0.21).

Last one. Sparky Lyle lost 0.46 gmLI between 1977, his Cy Young season, and 1978. His 1979 season projected for 0.23 WAR. Lyle instead recorded 0.57 WAR. Slap a data point at (-0.46, 0.34).
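All three points boil down to the same arithmetic: subtract the projected Year 3 WAR from the actual Year 3 WAR and pair the result with the Year 1 to Year 2 gmLI change. A quick check using the figures above:

```python
# (name, gmLI change Year 1 -> Year 2, projected Year 3 WAR, actual Year 3 WAR)
examples = [
    ("Mike Timlin",  1.40, 0.38, 1.42),
    ("Pat Neshek",   1.15, 0.42, 0.21),
    ("Sparky Lyle", -0.46, 0.23, 0.57),
]

for name, gmli_change, projected, actual in examples:
    overperformance = round(actual - projected, 2)       # Year 3 WAR overperformance
    print(f"{name}: point at ({gmli_change:+.2f}, {overperformance:+.2f})")
# Mike Timlin: point at (+1.40, +1.04)
# Pat Neshek: point at (+1.15, -0.21)
# Sparky Lyle: point at (-0.46, +0.34)
```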


Results

The following scatterplot shows all 2,279 qualifying season trios:

That’s a pretty blob of randomness. The line of best fit shows a minuscule (r = -0.032) relationship between the gmLI change from Year 1 to Year 2 and WAR overperformance in Year 3. Note the negative slope: relievers who gained trust in Year 2 of a three-year stretch overperformed their Year 3 WAR projections by less than those who lost trust did.

The effect on player performance also isn’t large. For every one point gained (or lost) in gmLI, overperformance decreased (or increased) by 0.098 WAR. That’s about one-tenth of a win, which isn’t a lot. WAR isn’t so precise that (for example) 5.4 WAR is meaningfully different from 5.3 or 5.5 WAR. The best you could say is that players who get demoted in Year 2 tend to rise a spot or two in the FanGraphs WAR leaderboard rankings in Year 3.

Also consider that a one-point swing in gmLI is rare. In this study, only 0.61 percent of season-pairs involved a gmLI loss of one or more points, and only 0.43 percent involved a gain of one or more points. Suddenly even that tenth of a win looks less and less achievable.
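For the record, both the fit and the rarity check are one-liners once the trios table exists. Here is a sketch using scipy, again leaning on the hypothetical `trios` DataFrame from the methodology section:

```python
from scipy.stats import linregress

def summarize(trios):
    fit = linregress(trios["gmli_change"], trios["war_over"])
    pct_big_loss = 100 * (trios["gmli_change"] <= -1.0).mean()   # share of pairs losing 1+ point of gmLI
    pct_big_gain = 100 * (trios["gmli_change"] >= 1.0).mean()    # share of pairs gaining 1+ point of gmLI
    print(f"r = {fit.rvalue:.3f}, slope = {fit.slope:.3f} WAR per point of gmLI")
    print(f"lost 1+ gmLI: {pct_big_loss:.2f}% | gained 1+ gmLI: {pct_big_gain:.2f}%")
```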

Why is a demotion associated with a WAR overperformance increase anyway? Why is the relationship negative? The best reason I can think of is our good friend regression to the mean. Players who got demoted in Year 2 were likely not as bad as they appeared to be. Meanwhile, managers may tend to promote players based on luck masquerading as an improvement in skill.

Even a full relief season is short enough that random variance can fool the most experienced baseball minds. I’m reminded of Daniel Kahneman describing his experience explaining regression to the mean, which he recounts in his excellent book, Thinking, Fast and Slow:

[The flight instructor] began by conceding that rewarding improved performance might be good for the birds, but he denied that it was optimal for flight cadets. This is what he said: ‘On many occasions I have praised flight cadets for clean execution of some aerobatic maneuver. The next time they try the same maneuver they usually do worse. On the other hand, I have often screamed into a cadet’s earphone for bad execution, and in general he does better on his next try. So please don’t tell us that reward works and punishment does not, because the opposite is the case.’

The instructor was right—but he was also completely wrong! His observation was astute and correct…but the inference he had drawn about the efficacy of reward and punishment was completely off the mark. What he had observed is known as regression to the mean, which in that case was due to random fluctuations in the quality of performance. The instructor had attached a causal interpretation to the inevitable fluctuations of a random process.

Taking one last look at the data, I bucketed the reliever season-pairs into 10 groups and calculated the mean WAR overperformance in each group. Group 1 lost the largest amount of gmLI in Year 2, whereas Group 10 gained the most. The following chart shows how each decile performed:

This bar graph shows the same trends as the scatterplot:

  • Relievers who get demoted overperform to a similar degree as relievers who get promoted.
  • Relievers who gain the most gmLI in Year 2 tend to overperform their WAR projections by the least amount in Year 3.
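For completeness, the decile bucketing behind that bar graph is a short pandas operation, once more using the hypothetical `trios` table from the methodology sketch:

```python
import pandas as pd

def decile_means(trios: pd.DataFrame) -> pd.Series:
    # Decile 1 = biggest gmLI loss in Year 2, decile 10 = biggest gain
    deciles = pd.qcut(trios["gmli_change"], 10, labels=range(1, 11))
    return trios.groupby(deciles)["war_over"].mean()   # mean Year 3 WAR overperformance per decile
```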

I am struck by the fact that all groups outperform their projections on average. I suspect this fact has more to do with the projections themselves and doesn’t mean there’s a hidden pattern here we’re not seeing. If anyone has suggestions for other projection mechanisms that would be suitable, let me know in the comments.

The conclusion here seems straightforward. I can find only the most tenuous connection between a role change for a reliever and his subsequent performance on the field. If a reliever does struggle or suddenly become lights out, look for other factors besides his recent promotion or demotion.


Ryan enjoys characterizing that elusive line between luck and skill in baseball. For more, subscribe to his articles and follow him on Twitter.
Comments
Paul G.
6 years ago

Very interesting.

“Regression to the mean” is a phenomenon that can be tricky. Sometimes true performance changes mimic random fluctuations. Sometimes the mean is a true thing, but only because outside forces create an equilibrium that would not exist otherwise. Statistics can fool you. Maybe those students would have performed the same without the instructor’s input, but then again, maybe they would not.

I think that the relievers who lost their jobs completely, and therefore fell out of your calculations, may be important to understanding the phenomenon here. A quality pitcher who goes from closer to mop-up (or the minors) due to fluky circumstances should rebound in theory. However, if the experience is especially stressful or depressing, it can permanently change the player’s performance, especially if substance abuse is involved. If the player has picked up a bad reputation as part of the process, it can also reduce his chances at a rebound even if his true ability never actually changed.

Paul G.
6 years ago
Reply to  Paul G.

As a quick follow-up, the anecdote used to describe “regression to the mean” is an odd one. The purpose of a flight instructor is to permanently change the student’s skill level. There is no stable mean to regress to if the instructor is doing his or her job, at least not until late in the teaching process.

Jared Cross
6 years ago

This is interesting and, like Paul G. said, I think the missing pitchers are likely important. I think the apparent over-projection may well be the result of setting a minimum of 150 batters in Year 3. You show that, conditional on facing 150 batters, these pitchers out-performed their projections, and I think that’s to be expected. If we include the pitchers who didn’t face that many batters in Year 3, I think the WARcel over-projection would largely go away. This *might* also explain why the over-projection is somewhat smaller for pitchers with higher LI in Year 2, since these players are more likely to reach 150 batters in Year 3 (or at least not fail to do so for performance reasons), so the bias is somewhat reduced for such pitchers.

Eek Lips
6 years ago

In the second to last paragraph, you mentioned being “struck by the fact that all groups outperform their projections on average.” Isn’t this likely a result of the selection bias discussed in the beginning? I’m not convinced it’s an issue with the projections themselves.