More on explaining the runs gap between starters and relievers

by Glenn DuPaul
November 28, 2012

Last week, I discussed the historical runs allowed (RA9) gap between starters and relievers.

In that piece, I looked at the years 1974-2012 and found that the gap between the average runs allowed per nine innings for starters and relievers across baseball seemed to be increasing over time.

I found that there was a significant positive correlation (r=0.29) between year and the RA9 gap. However, when I delved deeper into the data, I found that the increase in usage of relievers, which I defined as the percent of total innings thrown by relievers, was explaining the same thing as time, but with greater strength (r=0.41).

In my mind, that felt like just the tip of the iceberg. I had tested only two predictors, but many different factors could be explaining this gap. Also, percentage of innings pitched was explaining only about 16 percent of the variation in the RA9 gap, which isn’t a whole lot.
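For readers who want to see where a figure like that comes from: in a one-predictor linear fit, the share of variation explained (r-squared) is the square of the correlation coefficient. The short sketch below squares the two r values quoted above. The variable names are mine. Squaring the rounded 0.41 gives roughly 0.17, slightly above the "about 16 percent" quoted, presumably because the published r is itself rounded.

```python
# Correlations quoted in the article (rounded to two decimals there).
r_year = 0.29             # year vs. RA9 gap
r_reliever_share = 0.41   # percent of innings thrown by relievers vs. RA9 gap

# With a single predictor, r-squared is simply r squared.
r2_year = r_year ** 2
r2_reliever_share = r_reliever_share ** 2

print(f"year: r^2 = {r2_year:.3f}")
print(f"reliever innings share: r^2 = {r2_reliever_share:.3f}")
```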

Luckily for me, the community responded, offering many suggestions that could improve my results. Suggestions came both in the comments of my original piece and in Tom Tango’s response.

One commenter, Alan Guettel, suggested that I separate the American League and the National League in the test because:

In the American league, where pitchers don’t hit, managers are more likely to yank them before the end of an inning (with none, one or two outs). And with runners on base, I see so often, a reliever gives up a hit or two and allows a run or more, but it’s charged to the starter, while the reliever gets credit for an out or two. It happens in the National League, but not as often because managers often want to pinch-hit for them and save a reliever. If the difference is consistently greater in the American League, it might give some weight to this factor.

THT’s esteemed leader Dave Studeman and another commenter, Nick, both suggested that using innings pitched per reliever as a predictor could improve results because this could lead to better performances for individual relievers and thus also for relievers as a whole.

Tango seemed to make a similar suggestion about testing the innings per reliever, or appearance, as well as suggesting that instead of using 1974 as my start year, I should begin the test in the late 1980s during the time of Dennis Eckersley and Tom Henke.

I brought all of these suggestions together and re-formatted my tests accordingly.

The test

For this article’s test, I used the years 1987-2012. I again found the gap between the average RA9 for starters and relievers in each year, separated by league. I used a multiple regression to attempt to explain the gap over this time period in each league. The predictors I used were:
Year
Percent of innings pitched by relievers
Innings pitched per appearance for relievers*
The average strikeout percentage difference between starters and relievers**

*Instead of dividing the number of innings pitched for all relievers by the number of relievers who threw in a particular season, I divided the total innings by the number of overall relief appearances in the year. I felt the average length of an appearance was a better reflection of how more and more relievers are being used and leveraged in different ways.

**Relievers tend to strike out more batters on average than starters. So this predictor was calculated by subtracting the average starter strikeout percentage (K/PA) from the average reliever strikeout percentage.
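The definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual workflow used for the article: the `RoleTotals` structure, the field names and the numbers in the example are all made up, and innings are assumed to be true decimal innings (not the ".1/.2" box-score notation). The RA9 gap is taken as starter RA9 minus reliever RA9, which matches the sign convention used later in the piece.

```python
from dataclasses import dataclass


@dataclass
class RoleTotals:
    """League-wide season totals for one role (hypothetical layout)."""
    innings: float
    runs: int
    strikeouts: int
    batters_faced: int
    appearances: int


def ra9(t: RoleTotals) -> float:
    """Runs allowed per nine innings."""
    return 9.0 * t.runs / t.innings


def k_pct(t: RoleTotals) -> float:
    """Strikeouts per plate appearance (K/PA)."""
    return t.strikeouts / t.batters_faced


def season_row(year: int, starters: RoleTotals, relievers: RoleTotals) -> dict:
    """Build the dependent variable and the four predictors for one league-season."""
    total_ip = starters.innings + relievers.innings
    return {
        "year": year,
        "ra9_gap": ra9(starters) - ra9(relievers),
        "reliever_ip_share": relievers.innings / total_ip,
        # Total relief innings divided by total relief appearances,
        # not by the number of distinct relievers.
        "ip_per_relief_app": relievers.innings / relievers.appearances,
        "k_pct_diff": k_pct(relievers) - k_pct(starters),
    }


# Made-up totals, only to show the shape of the calculation.
starters = RoleTotals(innings=1000.0, runs=500, strikeouts=700,
                      batters_faced=4300, appearances=160)
relievers = RoleTotals(innings=500.0, runs=225, strikeouts=450,
                       batters_faced=2150, appearances=480)
example = season_row(2012, starters, relievers)
print(example)
```

One such row per league per season would give the data set that the regressions are run on.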

The two regressions yielded vastly different results.

American League

Out of the four predictors tested, the difference in average strikeout percentage was the only significant predictor, at a 95 percent confidence level, for explaining the RA9 gap between American League starters and relievers.

Despite the fact that only one of the four predictors proved to be useful in explaining this gap, the r-squared from the test was not horrible. I found that K percentage, by itself, explained 37.3 percent of variation in the RA9 gap.

This test resulted in an r-squared that was more than double what I found in the test from my original piece.

My interpretation of the r-squared was that the higher the average strikeout percentage for relievers is above the starters’ average, the larger the gap in average RA9 you’ll see.
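For anyone who wants to try this kind of test themselves, here is a minimal, self-contained sketch of an ordinary least squares multiple regression that reports coefficients, t-statistics and r-squared. The article does not say which software was used, so this is not a reconstruction of the original analysis; the helper names (`transpose`, `matmul`, `invert`, `ols`) and the demo numbers are mine, and in practice a statistics package would do the same job. A predictor is usually called significant at the 95 percent level when the absolute value of its t-statistic exceeds the critical value for the available degrees of freedom (about 2.08 with 26 seasons, four predictors and an intercept).

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ...; return (coefficients, t-statistics, r-squared)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    beta = [b[0] for b in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    sigma2 = ss_res / (n - k)  # residual variance
    se = [(sigma2 * xtx_inv[j][j]) ** 0.5 for j in range(k)]
    t_stats = [b / s if s > 0 else float("inf") for b, s in zip(beta, se)]
    return beta, t_stats, 1.0 - ss_res / ss_tot


# Made-up demo data: y is roughly 1 + 2*x1 - 3*x2 plus a little noise.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 2, 1, 3, 0, 2, 1]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [1 + 2 * a - 3 * b + e for a, b, e in zip(x1, x2, noise)]

beta, t_stats, r2 = ols(list(zip(x1, x2)), y)
print("coefficients:", [round(b, 3) for b in beta])
print("t-statistics:", [round(t, 1) for t in t_stats])
print("r-squared:", round(r2, 4))
```

With real data, each row would hold the four predictors for one league-season and `y` would hold that season's RA9 gap. Centering the year (for example, year minus 1987) keeps the arithmetic well behaved.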

National League

I used the same set of predictors for the National League test, yet the results I found were in no way similar.

In the National League test all four of the predictors tested were significant at a 95 percent confidence level. The year, the percentage of innings thrown by relievers, the average length of a relief appearance and the average difference in strikeout percentage for starters and relievers added predictive value for the National League gap.

The r-squared I found for the NL gap was pretty high. The combination of these four predictors explained 62 percent of the variation in the RA9 gap. This number was much higher than what I would’ve expected to find and was significantly more powerful than the AL test.

Why did the results of two tests differ so greatly?

Comparing the two leagues

Other than the obvious facts that the two leagues have different strategies, talent levels and rules, the dependent variable for each test differed by a significant margin.

The gap between average RA9 for starters and relievers has not been consistent over the time period tested; if it had, there would be nothing to test.

This inconsistency is common to both the AL gap and the NL gap; however, the two gaps differ a good deal. The RA9 gap for NL pitchers was lower on average during this time period than the AL gap:

	AL	NL
Mean	0.446	0.248
Variance	0.021	0.031

I ran a simple t-test to test whether the average gap for AL pitchers was significantly greater than the average NL gap (95 percent confidence). This test resulted in a t-statistic of 4.38, which indicated that the AL average was significantly greater than the NL average.
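The t-statistic can be roughly reproduced from the summary table alone. The sketch below is an approximation under stated assumptions, not the exact test that was run: it assumes one gap value per league per season (26 seasons, 1987-2012), treats the two leagues as independent samples, and uses the rounded means and variances from the table. With equal sample sizes, the pooled and unequal-variance (Welch) versions of the test give the same statistic. The function name `two_sample_t` is mine.

```python
import math

# Assumption: one RA9-gap observation per league per season, 1987 through 2012.
n_seasons = 2012 - 1987 + 1

# Rounded summary statistics from the table.
mean_al, var_al = 0.446, 0.021
mean_nl, var_nl = 0.248, 0.031


def two_sample_t(m1, v1, n1, m2, v2, n2):
    """Two-sample t-statistic computed from means, variances and sample sizes."""
    standard_error = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) / standard_error


t = two_sample_t(mean_al, var_al, n_seasons, mean_nl, var_nl, n_seasons)
print(f"n per league = {n_seasons}, t = {t:.2f}")
```

The rounded inputs give a value of about 4.4, in line with the 4.38 reported; the small difference is plausibly down to rounding in the table.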

Not only was the gap significantly lower for NL pitchers, there were actually two seasons (1997 and 2005) in which NL starters had a lower average RA9 than the NL relievers; thus, the gap being explained was actually negative.

I found it interesting that the NL gap had a higher variance, yet it was easier to explain than the AL gap. It’s possible that, given that the AL gap was significantly farther from zero, the NL gap was simply easier to explain.

Another possibility could have to do with strikeout percentage.

Over this time period, the percent of innings thrown by relievers has risen fairly consistently in both leagues, while the length per relief appearance has been decreasing fairly consistently in both leagues. However, the correlation (r=0.59) between the strikeout percentage gap and time for AL relievers is significantly higher than the NL correlation (r=0.36).

It seems to me that the gap in strikeout percentage for AL relievers is explaining the same thing as the length of appearance and percent of innings thrown by relievers, as all of those have trended consistently during this time. Thus, given that strikeout percentage had the strongest correlation with the RA9 gap, it was the only significant predictor.

Yet for NL relievers the strikeout percentage gap wasn’t increasing nearly as consistently over this time, which could have allowed the four predictors to work better together to explain the variation more capably.
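One simple way to check whether two predictors are "explaining the same thing" is to look at how strongly they correlate with each other and with time. Below is a minimal sketch of a Pearson correlation in plain Python. The function name `pearson_r` and the two short series are made up for illustration; they are not the league data behind the r values quoted above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up series: a predictor that climbs steadily with time and one that wobbles.
years = [0, 1, 2, 3, 4, 5]
steady = [0.030, 0.032, 0.035, 0.036, 0.039, 0.041]
wobbly = [0.030, 0.036, 0.029, 0.038, 0.031, 0.037]

print(f"steady vs. time: r = {pearson_r(years, steady):.2f}")
print(f"wobbly vs. time: r = {pearson_r(years, wobbly):.2f}")
```

When several predictors all track time closely, a regression tends to credit whichever one correlates best with the outcome, which is consistent with the AL result described above.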

Future studies

I’ll be the first to admit that the two tests I ran may not be the best way to go about explaining the gap between starters and relievers over time. However, I think working through these numbers brought about some ideas for future studies.

Some possibilities that have either been suggested to me or that I have thought of are:
Testing to see if leveraging platoon advantages in bullpens has improved relievers’ numbers
Seeing if the number of one-batter appearances is increasing (think LOOGY)
Getting a better idea of what is affecting the AL gap other than strikeout rate
Narrowing the sample further and using tests other than multiple regression

I’m willing to accept any and all other suggestions in the comment section or over email.

References & Resources
All statistics come from our friends at FanGraphs
