Should we ever use a complex ERA estimator?

In the past weeks I’ve written thousands of words about estimating runs allowed for pitchers. I’ve probably written more than was necessary on the subject, and it really has been a more taxing experience than I originally expected. After last week’s piece on relievers, I had just about run out of ideas for testing future runs with the approach that I was using. Instead of letting go of the subject for some period of time, I reached out for suggestions as to what else I could test in the future.

Two of the most sophisticated minds in sabermetrics, Colin Wyers and Brian Cartwright, each gave me good ideas.

Colin suggested that instead of trying to re-weight fielding independent pitching (FIP), which has been my primary concern for a few weeks, I should try to estimate future strikeouts, walks and home runs (the FIP components) rather than runs.

Brian suggested that instead of using walks and strikeouts to re-weight FIP, I should try to use contact and control to create an estimator.

Both were great suggestions that I plan on getting to in the coming weeks; however, their suggestions brought about one final (hopefully) idea for testing my statistic, predictive FIP (pFIP).

The True Talent Idea

Expected Fielding Independent Pitching (xFIP) is a version of FIP that does not use the number of home runs that a pitcher actually gave up, but instead uses the number of home runs that he was expected to give up (or should have given up). The idea of xFIP led me to the idea of an expected predictive FIP (x pFIP).

My idea for a regressed version of pFIP went beyond home runs; I wanted to regress strikeouts and walks, as well.

Most of the methodology behind this regressed statistic comes from combining the idea of xFIP with the ideas championed by Russell Carleton, Derek Carty and Harry Pavlidis. Their work dealt with “stabilization” and true talent level.

Their work, in simple terms, showed that after a certain number of plate appearances a statistic will reach a required correlation coefficient (r). For example, given an r of 0.50, we can consider half (50 percent) of a pitcher’s observed rate to reflect his skill level and regress the other half back to the league average for that statistic.

Basically, I’m trying to apply the appropriate amount of regression to the mean to each statistic (K, BB, HR), to get a better prediction model.

The concept resulted in the use of expected strikeouts, expected walks and expected home runs as the three components of an “x pFIP.”

The Study

Predictive FIP is a metric created to predict runs allowed for starting pitchers; thus, for this test, I only looked at starting pitchers.

I took a sample of starters from 2004-2012 (n=731) who had at least 100 innings pitched in Year X and at least 100 innings in Year X+1. I then regressed their strikeout percentage (K/PA), walk percentage (BB/PA) and home run percentage (HR/PA) in Year X against those percentages in Year X+1, to find r’s for each statistic.
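As a rough illustration of the year-to-year correlation step, here is a minimal Python sketch. The function name `pearson_r` and the five pairs of K/PA values are made up for illustration; they are not taken from the 731-starter sample, and this is only one way the r for each statistic could be computed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical K/PA for five starters (illustrative values only).
k_rate_year_x = [0.15, 0.22, 0.18, 0.25, 0.12]   # Year X
k_rate_year_x1 = [0.16, 0.20, 0.19, 0.23, 0.14]  # Year X+1

r_k = pearson_r(k_rate_year_x, k_rate_year_x1)
print(f"Year X to Year X+1 correlation for K/PA: {r_k:.3f}")
```

The same function would be applied separately to BB/PA and HR/PA to get one r per statistic.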

I used HR/PA instead of home run per fly ball, because I tend to stay away from using batted ball data due to the biases in that information.

Here are the correlation coefficients I found:

Statistic r
K/PA 0.77
BB/PA 0.68
HR/PA 0.366

I used these numbers to create an xK%, xBB% and xHR% to use for pFIP. For example, when finding xK% I took strikeouts’ r (.77) and multiplied that number by the starter’s strikeout percentage, and added that number to the league average K% for starters in that year multiplied by .23.
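The regression step described above can be sketched in a few lines of Python. The function name `regress_to_mean` is hypothetical, and the 16 percent league-average K% is an illustrative value rather than the actual league figure for any season in the sample.

```python
def regress_to_mean(observed_rate, league_rate, r):
    """Weight the observed rate by r and the league-average rate by (1 - r)."""
    return r * observed_rate + (1.0 - r) * league_rate


R_K = 0.77        # year-to-year r for K/PA from the table above
LEAGUE_K = 0.16   # illustrative league-average K% (an assumption)

# One starter well above average, one equally far below average.
x_k_high = regress_to_mean(0.25, LEAGUE_K, R_K)
x_k_low = regress_to_mean(0.07, LEAGUE_K, R_K)

print(f"xK% for a 25% strikeout starter: {x_k_high:.2%}")
print(f"xK% for a 7% strikeout starter:  {x_k_low:.2%}")
```

Both starters are pulled toward 16 percent by the same amount, so the formula regresses in both directions. The same function applies to walks and home runs with their own r values (0.68 and 0.366).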

I ran a multiple regression with these numbers against the starter’s next season runs allowed (RA9), and found this statistic for x pFIP:

x pFIP = ((50*xHR) + (10*x(BB-IBB+HBP)) - (11*xK))/PA + Constant
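Here is a minimal sketch of how that equation could be evaluated, reading it with the whole weighted sum divided by plate appearances. The function name `x_pfip` and the sample pitcher are hypothetical. The article does not state the value of the constant, so it is left as a required argument, and the 4.50 used below is purely a placeholder.

```python
def x_pfip(x_hr, x_bb_hbp, x_k, pa, constant):
    """Evaluate x pFIP from expected counts.

    x_hr      expected home runs
    x_bb_hbp  expected (BB - IBB + HBP)
    x_k       expected strikeouts
    pa        plate appearances (batters faced)
    constant  additive constant; not given in the article, so caller-supplied
    """
    if pa <= 0:
        raise ValueError("pa must be positive")
    return (50 * x_hr + 10 * x_bb_hbp - 11 * x_k) / pa + constant


# Hypothetical starter: 800 PA with regressed rates of 2.5% HR, 8% BB+HBP, 20% K.
pa = 800
x_hr = 0.025 * pa      # 20 expected home runs
x_bb_hbp = 0.08 * pa   # 64 expected walks plus hit batters
x_k = 0.20 * pa        # 160 expected strikeouts

PLACEHOLDER_CONSTANT = 4.50  # assumption for illustration only
print(round(x_pfip(x_hr, x_bb_hbp, x_k, pa, PLACEHOLDER_CONSTANT), 2))
```

Because every term is divided by PA, the function could just as well take the regressed rates (xHR%, xBB%, xK%) directly and skip the division.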

I tested this number against my original pFIP equation, which used the raw numbers, and used r-squared (r^2) as my measure of choice. These r-squareds tell us the percentage of variation in runs allowed in Year X+1 explained by the estimator in Year X:

Estimator r^2
x pFIP 20.12%
pFIP 19.74%
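For readers who want to see what that r-squared measures, here is a minimal sketch that fits a one-variable least-squares line and reports the share of variation explained. The function name `r_squared` and the five data points are invented for illustration and are not the study's data.

```python
def r_squared(xs, ys):
    """Share of variance in ys explained by a least-squares line on xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical Year X estimator values and Year X+1 RA9 for five starters.
estimator_year_x = [3.5, 4.0, 4.2, 3.8, 4.6]
ra9_year_x1 = [3.9, 4.1, 4.8, 3.6, 4.5]

r2 = r_squared(estimator_year_x, ra9_year_x1)
print(f"r^2: {r2:.2%}")
```

With a single predictor, this value equals the square of the correlation coefficient between the estimator and next-season RA9.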

The regressed version of pFIP was more predictive than my original raw statistic; however, the difference in r-squared is less than 0.4 percentage points, which is within the margin of error.

From this sample, there was not conclusive evidence to back the assumption that the more complex, regressed, “true talent” predictor would be a better predictor of future runs than the original raw statistic.

So, I tested the x pFIP that I found for a sample of starters (same min. 100 IP) for 1996-2004 (n=705) to see if x pFIP would continue to be more predictive (if only slightly) than the unregressed pFIP.

Here are the results:

Statistic r^2
pFIP 23.04%
x pFIP 22.32%

The two r-squareds were very close, but the raw pFIP estimator came out ahead of the regressed version. This, again, was not conclusive evidence for the regressed estimator being the more predictive stat, and, in fact, was evidence against it.

Before I could completely throw out the idea that using regressed versions of strikeouts, walks and home runs would improve the statistic, I found the correlations between Year X K,BB,HR and Year X+1 for the 1996-2004 sample to see if using those numbers would improve x pFIP:

Statistic r
K/PA 0.787
BB/PA 0.731
HR/PA 0.401

I used these r’s to find new xKs, xBBs and xHRs and combined those in a multiple regression to find a new x pFIP equation based on this sample:

x pFIP = ((33*xHR) + (10*xBB) - (12*xK))/PA + Constant

This x pFIP equation resulted in an r-squared of 22.63 percent, which is only a marginal improvement over the statistic found in the first sample, and still falls short of the raw pFIP.

Conclusion

My attempt to regress the components of predictive FIP toward the mean, or true talent level, for the starters in these samples either offered only a negligible improvement over the original statistic or actually hurt the predictive ability of the stat.

I’ll be the first to admit that this finding probably sounds irrelevant. But I think it does a great job of reflecting a larger over-arching theme.

I led off this article by saying that I’ve written a ton over the past month about estimating runs allowed. At the start, I had no plans of writing more than one or two articles on the subject, yet the total is now up to six. I never imagined that I would come up with my own idea for a statistic, but now I’ve written four articles about pFIP.

The one recurring theme that I have found to be true through all of the different tests I’ve run on the different samples I’ve gathered is simplicity.

I’ve yet to find a single piece of evidence to back the assumption that a complex estimator was better for predicting runs.

My first test looked at how ERA estimators performed within season for starters, and strikeouts minus walks was the best predictor. I then tested starters on a season-to-season basis, and K-BB was the best again, with FIP a close second. When I tested two seasons of work to predict the next season, basic FIP was the best predictor.

These findings led to my development of predictive FIP (pFIP), which is as simple as FIP, just with different weights.

Predictive FIP beat all of the other estimators, simple and complex, in predicting future runs for starters, over various large samples.

Finally, I tested to see how well pFIP worked for relievers. pFIP was created for starters, but it worked really well for relievers, too. However, it only brought a marginal improvement over the extremely simple statistic of strikeout percentage.

All of my tests reiterated this point about simplicity.

There’s a chance that the more complex estimators work better for special circumstances, like pitchers who change teams, throw a ton of groundballs, or are remarkably adept at keeping the ball in the yard. Also, they could add descriptive value over simply using the combination of FIP and batting average on balls in play, but I have yet to hear a really great argument explaining that fact.

What I can tell you though is that when a predictor is made more complex, it must add more predictive value, by a significant amount. If it does not, then my only response can be just two words:

References & Resources
All statistics come courtesy of FanGraphs


1. JKB said...

“For example, when finding xK% I took strikeouts’ r (.77) and multiplied that number by the starter’s strikeout percentage, and added that number to the league average K% for starters in that year multiplied by .23.”

Great idea Glenn.  I have a couple of suggestions that might improve the equation above.

Instead of using the correlation (r=.77 in the example above) as the multiplier, why don’t you regress Year 2 on Year 1 and use (intercept + Year 2 Coefficient * Year 1) as the multiplier?

Also, you might want to estimate two separate multipliers, one for pitchers that are Above the League Average in Year 1, and one for pitchers that are Below League Average in Year 1.  Regression to the mean goes in both directions (in a Gaussian Distribution), so your current formula is biased against pitchers that have a below average Year 1 and regress to average or above average in Year 2.

2. Glenn DuPaul said...

@JKB

Thanks for the comment.  Are you saying in the first suggestion that I regress say a starter’s K% in 2011 onto his K% in 2010? and then use that number to develop a linear regression equation for xK%?

As for the second suggestion, I considered breaking up above and below league average pitchers, but I was trying to keep things as simple as possible in an equation that was already getting complex.

I’m confused that you think my formula wasn’t regressing in both directions back to the mean.

Given a mean K% of 16%
Starter X1: 25%
Starter X2: 7%

Both are equally far away from the mean either above or below average.  Given my equation here are the projections:

Starter X1: 22.93%
Starter X2: 9.07%

How is that biased towards the below average pitcher? Maybe I’m confused, but they both look like they’re regressed toward the mean?

3. Bojan Koprivica said...

Good stuff, Glenn.

Are you aware of any attempts to predict future K and BB rates by looking at PitchF/X data, such as velocity, break and location?

4. Glenn DuPaul said...

Thanks man.  Not that I’m aware of.  I think there’s been a ton of research into how velocity affects K-rates, but as far as movement and location, I haven’t seen any.  That doesn’t mean there isn’t or that it is not a good feasible idea, I just personally have not seen it done.

5. asym said...

Nice series, I’ve enjoyed reading it.

6. JKB said...

Hi Glenn,

Understood that the xK% is regressed to the mean.

What I was interested in communicating was that you might get an interesting model by using (xK%-K%),  (xBB%-BB%), and (xHR%-HR%), in the model rather than xK%, xBB%, and xHR% so that the regression to the mean is modeled explicitly.

But I got off on a tangent and the thought got muddled.  Nice work!

7. Glenn DuPaul said...

@JKB

Oh alright, I get what you’re saying now. I could try that, it’d be interesting to see how that affected the results.  Thanks a ton for commenting and the suggestions.

8. chuck said...

why not use hr/fb, adjusted for park effects?

9. Glenn DuPaul said...

@chuck

because I shy away from using batted ball data (HR/FB included), because of the biases in the data.  Also, even if I did use that number, I don’t think it would improve the model by a significant amount… complicating these metrics in most cases does not improve them

10. ksw said...

‘a priori’, one does not need to estimate runs allowed.