Last week, I introduced a statistic, which I called predictive FIP (or pFIP). It was based on the fielding independent pitching (FIP) statistic developed by Tom Tango. The original FIP statistic is an alternative to the traditional earned run average (ERA); unlike ERA, it ignores outcomes that are affected by the defense playing behind the pitcher.
The original FIP was conceived as a descriptive statistic that is supposed to be a better indication of the pitcher’s true performance than actual ERA. Despite this, many people have used FIP as an ERA estimator instead of as an ERA-scaled descriptor. The FIP equation as it stands looks like this:
(13*HR + 3*BB – 2*K)/IP + Constant
This equation is descriptive, although it does a better job of predicting future runs allowed (RA9 or ERA) than typical runs statistics.
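As a quick illustration, the descriptive formula above can be written as a small Python function. This is only a sketch: the function name `fip`, the sample stat line and the constant of 3.10 are assumptions made for the example. The real constant is set each season so that league FIP lines up with league ERA.

```python
def fip(hr, bb, k, ip, constant):
    """Descriptive FIP: (13*HR + 3*BB - 2*K) / IP + constant."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical season line: 20 HR, 50 BB, 180 K in 200 innings.
# The constant 3.10 is a placeholder, not a value taken from this article.
example = fip(hr=20, bb=50, k=180, ip=200.0, constant=3.10)
print(round(example, 2))
```

Because strikeouts carry a negative weight, a pitcher with enough strikeouts can push the numerator below zero, which is why the constant is needed to put the result on an ERA-like scale.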
I ran a few tests on the predictive ability of certain ERA estimators, including FIP, and in almost every case, a simple predictor of subtracting walks from strikeouts beat the more sophisticated estimators.
I found this result to be as interesting as it was confusing. It made some sense that when home runs were added into the projection model, the actual predictive ability of the model would decrease. Home runs are prone to a good deal of random variability and noise; thus, a pitcher’s true talent ability to prevent home runs doesn’t begin to reveal itself until two-plus seasons of work. So, a rational conclusion could be that using just strikeouts and walks, instead of strikeouts, walks and home runs (as FIP does) would be more useful (or predictive) on a year-to-year basis.
At the same time, I felt I was leaving something to be desired by ignoring one of the three true outcomes (K, BB, HR). I’m all for a simple solution to runs allowed estimation, but I felt like there was a chance I could find a simple, more predictive metric that included home runs.
This is where the idea of weighting FIP in a predictive instead of descriptive manner came from.
The pFIP I introduced last week had different (more predictive) weights for home runs, walks and strikeouts than the original FIP. However, when I went about finding these new weights, I made a few mistakes, some bigger than others.
First, the biggest and most unfortunate mistake had to do with my data.
To find the weights for pFIP, I used a sample of starting pitchers from 2004-2012, but the data for one of the years (2011) had a large error. Instead of regressing the 2010 predictors against 2011 runs allowed (RA9), as I planned to do, the 2010 predictors were regressed against 2011 FIP, which threw off the weights and results a good deal.
Home runs seemed to be weighted too heavily in my original statistic, but the correction in data rectified this issue. The corrected statistic took this form:
pFIP = (5*HR + 1.6*BB – 2*K)/IP + Constant
For those who missed my last few articles, here’s an explanation of the regression measures displayed in my results:
The r-squared tells us the percentage of variation in RA9 in Year x+1 that is explained by variation in the predictor in Year x.
The root-mean-square error (RMSE) of the estimate also tells us about the strength of the predictor. It works sort of like a standard deviation; thus, the lower the RMSE, the better the model.
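For readers who want to see the two measures concretely, here is a minimal Python sketch. It assumes a single predictor compared directly with next-year RA9, so r-squared is just the squared correlation between the two. The function names and the four-pitcher data are made up for illustration, and the RMSE here divides by n, whereas regression software often divides by n minus the number of fitted parameters.

```python
import math


def r_squared(predictor, outcome):
    """Squared Pearson correlation between a Year x predictor and Year x+1 RA9."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_o = sum(outcome) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predictor, outcome))
    var_p = sum((p - mean_p) ** 2 for p in predictor)
    var_o = sum((o - mean_o) ** 2 for o in outcome)
    return cov ** 2 / (var_p * var_o)


def rmse(predicted, actual):
    """Root-mean-square error of predictions, dividing by n (an assumption)."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only (not from the article's sample).
year_x_estimate = [3.6, 4.1, 4.4, 4.9]
year_x1_ra9 = [3.9, 4.0, 4.8, 4.6]
print(round(r_squared(year_x_estimate, year_x1_ra9), 3))
print(round(rmse(year_x_estimate, year_x1_ra9), 3))
```

A higher r-squared and a lower RMSE both point to a stronger predictor, but they are not interchangeable: RMSE is in runs per nine innings, so it also depends on how spread out the outcome is.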
For the sample, I looked at starting pitchers who had at least 120 innings pitched in Year x and 100 innings in Year x+1. Here are the corrected results for the various predictors tested:
Last week, pFIP was the most predictive and it beat all of the other statistics again, but the margin of victory is not as wide here.
The second issue with the pFIP I introduced has to do with the denominator of the statistic. When I began this project I wanted to simply re-weight descriptive FIP as it currently stands; thus, innings pitched was my denominator of choice. However, strikeouts is the only one of the statistic’s three components that is part of the denominator (every strikeout records an out, and outs are what innings pitched counts). This seemed unfair to strikeouts, especially because strikeouts are the most predictive of the three components.
Also yesterday, Tango showed what the descriptive form of FIP would look like on a per plate appearance basis.
So for pFIP, I converted the formula’s denominator from innings pitched to total batters faced (plate appearances), and the formula changed to this:
(20*HR + 7*BB – 9*K)/PA + Constant
This change improved the overall r-squared slightly (from .2077 to .211).
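The per-plate-appearance version can be sketched the same way. The function name `pfip_per_pa`, the sample stat line and the constant of 5.15 are assumptions for the example; the weights default to the 2004-2012 values in the formula above.

```python
def pfip_per_pa(hr, bb, k, pa, constant, w_hr=20.0, w_bb=7.0, w_k=9.0):
    """Per-PA pFIP: (w_hr*HR + w_bb*BB - w_k*K) / PA + constant."""
    return (w_hr * hr + w_bb * bb - w_k * k) / pa + constant


# Hypothetical season line: 20 HR, 60 BB, 160 K against 800 batters faced.
# The constant 5.15 is a placeholder chosen for illustration.
estimate = pfip_per_pa(hr=20, bb=60, k=160, pa=800, constant=5.15)
print(round(estimate, 3))
```

With plate appearances in the denominator, each of the three components is divided by a count that none of them inflates on its own, which is the fairness point made above.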
Testing out of sample
The final issue with the weights I found from this sample is just that: they come from only this sample. These weights work great for this sample because they are based on it, so instead of predicting RA9 in Year x+1 like the statistic is supposed to, pFIP just describes RA9 in Year x+1 with Year x data.
To test the validity of these weights, I looked at a sample of starting pitchers from 1996-2004, with the same criteria (minimum 120 IP in Year x and minimum 100 IP in Year x+1).
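The sample criteria can be sketched as a filter over consecutive seasons. This is a minimal illustration, not the actual data pipeline: the record layout (`pitcher`, `year`, `ip`), the function name `pair_seasons` and the toy rows are all hypothetical, and the sketch does not try to separate starters from relievers.

```python
def pair_seasons(seasons, min_ip_x=120.0, min_ip_next=100.0):
    """Pair each pitcher's Year x with Year x+1 when both meet the IP minimums."""
    by_key = {(s["pitcher"], s["year"]): s for s in seasons}
    pairs = []
    for pitcher, year in sorted(by_key):
        current = by_key[(pitcher, year)]
        following = by_key.get((pitcher, year + 1))
        if following is None:
            continue  # no next season on record for this pitcher
        if current["ip"] >= min_ip_x and following["ip"] >= min_ip_next:
            pairs.append((current, following))
    return pairs


# Toy data, for illustration only.
seasons = [
    {"pitcher": "A", "year": 2000, "ip": 180.0},
    {"pitcher": "A", "year": 2001, "ip": 150.0},
    {"pitcher": "A", "year": 2002, "ip": 90.0},
    {"pitcher": "B", "year": 2000, "ip": 110.0},
    {"pitcher": "B", "year": 2001, "ip": 200.0},
]
pairs = pair_seasons(seasons)
print([(x["pitcher"], x["year"], y["year"]) for x, y in pairs])
```

Running the same filter on a different span of years, then scoring weights that were fit elsewhere, is what makes the second sample an out-of-sample check rather than another description of the first.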
Here is the pFIP formula I found for this sample:
(15*HR + 7*BB – 9*K)/PA + Constant
As you can see, walks and strikeouts were consistent with the original sample, while home runs differed slightly. The variation in home run weight was unsurprising, because of the variability in home run rates that exists in year-to-year samples.
I combined the two results to get a final pFIP formula for starting pitchers:
(17.5*HR + 7*BB – 9*K)/PA + Constant**
**Note– The constant ranges from 5.15-5.20
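One way to read "combined" is as a simple unweighted average of the two sets of weights, which reproduces the final formula. The sketch below works under that assumption and is not necessarily the exact procedure used; the dictionary and function names are made up for the example.

```python
# Weights from the two per-PA formulas above.
weights_2004_2012 = {"HR": 20.0, "BB": 7.0, "K": 9.0}
weights_1996_2004 = {"HR": 15.0, "BB": 7.0, "K": 9.0}


def average_weights(first, second):
    """Unweighted average of two weight sets (an assumed way of combining them)."""
    return {name: (first[name] + second[name]) / 2 for name in first}


combined = average_weights(weights_2004_2012, weights_1996_2004)
print(combined)
```

Only the home run weight moves, since the walk and strikeout weights were identical in the two samples.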
This new formula was still the most predictive for the 2004-2012 sample (r-squared = .2098, RMSE = .867). It was also the most predictive for 1996-2004; however, it had much less competition for that sample.
By less competition, I mean that SIERA and xFIP aren’t published for 1996-2001, due to a lack of batted ball data. Thus, I could only compare pFIP to strikeouts minus walks and descriptive FIP:
The simple estimator of strikeouts and walks came close to being more predictive, but fell short of pFIP.
An ERA-form of pFIP
Finally, questions were raised about how much the fact that pFIP was an RA9 estimator instead of an ERA estimator affected the results. So, I reweighted pFIP, using a combination of the two samples, to turn it from an RA9 estimator to an ERA estimator. The formula I found is:
(18.5*HR + 6*BB – 8*K)/PA + Constant**
**Note– The constant ranges from 4.70-4.75
The weights did not change too much from the RA9 estimator, with most of the change occurring in the constant. Home runs received slightly more weight, in comparison to strikeouts and walks.
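The shift toward home runs is easier to see as ratios of the weights. This is a small arithmetic check using only the two final formulas; the helper name `hr_ratio` is hypothetical.

```python
# Final per-PA weights for the RA9 and ERA versions of pFIP.
ra9_weights = {"HR": 17.5, "BB": 7.0, "K": 9.0}
era_weights = {"HR": 18.5, "BB": 6.0, "K": 8.0}


def hr_ratio(weights, other):
    """Home run weight expressed as a multiple of another component's weight."""
    return weights["HR"] / weights[other]


for other in ("K", "BB"):
    ra9 = hr_ratio(ra9_weights, other)
    era = hr_ratio(era_weights, other)
    print(f"HR/{other}: RA9 version {ra9:.2f}, ERA version {era:.2f}")
```

In both comparisons the home run weight is a larger multiple in the ERA version than in the RA9 version.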
Here’s how the ERA-version of pFIP stacked up against the other estimators in each sample:
Almost every predictor performed worse, in terms of r-squared, when the dependent variable was switched from RA9 to ERA. Strikeouts and walks were the most affected. The fits (RMSE) are tighter with ERA, only because the spread of RA9 is wider than that of ERA.
I like where pFIP is going, but there is more work to be done. For some people, the tools we already have could be enough. FIP, xFIP, SIERA and simple strikeouts minus walks do a fairly good job of predicting future runs. Also, projection systems, like PECOTA, Oliver, ZiPS, Steamer, Marcel and others, take many more factors into account when predicting future runs. Maybe adding another simple predictive number to the mix is just adding clutter.
I honestly don’t know; however, I like a statistic that can be calculated by hand and predicts runs allowed better than other more complicated/commonly accepted metrics.
So far, I’ve looked only at starting pitchers. There is a fairly large difference between relievers and starters, so for next week I plan on re-working pFIP specifically for relievers, hoping to separate the statistic into two different pFIPs, the starter version and the reliever version.
References & Resources
All statistics come courtesy of FanGraphs