Yet another pitching metric
by Peter Jensen
March 08, 2010
It’s been quite an offseason for pitching metrics. First, there was the controversy over the 2009 Cy Young award voting. Then there were some interesting discussions over the Christmas holidays at The Book Blog about FIP, in response to the 10 questions asked by Mike Silva. Then Baseball Prospectus announced a new pitching metric called SIERA, which was followed by even more discussion about SIERA and pitching metrics in general at both The Book Blog and BP.
If you have bothered to read this far, you have probably followed at least some, if not all, of this. So why should I clutter the landscape with yet another pitching metric when there are already DIPS, LIPS, FIP, xFIP, RA+, tRA, tRA*, PZR, SIERA, and others crowding for your attention? Well, there seems to be a need for one, because I think we may have taken a wrong turn when DIPS was introduced.
Don’t get me wrong: Voros McCracken’s idea, that by leaving out information and considering only those events over which the pitcher has complete control we can know almost everything we need to know to predict a pitcher’s future performance, was inspired. The wrong turn I referred to was the assumption that predicting future ERA was how we should measure the success of the new metrics.
Why in the world would we want to predict a pitching statistic that we know is heavily influenced by both luck and the quality of the pitcher's team defense? Isn't the pitcher's "true talent" what we really want to predict? It's as if, when Bill James introduced Runs Created and Palmer and Thorn introduced Batting Runs, we had decided which was better by seeing how well each predicted RBIs. Of course, we did something equally stupid and tested how well they predicted team runs, but we’ll leave that rant for another post.
The single inspiration for beginning to work on a new metric came from this comment by Matthew Cornwell at The Book Blog in response to Mike Silva’s question about FIP:
Which "leaves out" more?
ERA - defensive support, leveraging/quality of batters faced, park factors, bullpen support, the pitcher’s responsibility regarding unearned runs
FIP - event timing/sit. splits/LOB%, etc., what BABIP skill does exist, pitcher defense, DP inducing, HBP, WP, leveraging/quality of batters faced, park factors, XBH prevention, pick-offs, controlling running game.
Most pitchers can’t prevent enough runs by controlling the running game or limiting doubles or defending their position well in any given season to make a huge difference in their FIP or ERA. That is why FIP works so well at a seasonal level - it leaves in what pitchers control the most, and as a fair trade-off for most pitchers (the Glavine’s being examples,) takes out what is least impactful and controllable. However, over 15-20 seasons, those secondary run prevention tools add up to be tons of runs for many pitchers.
Take RA+ - if you could just adjust for defensive support you should get pretty close to “true” RA+ for long career guys. BABIP and HR/FB have had enough PA’s to stabilize, leveraging and quality of batters faced is not a huge factor for modern starters, park is considered already, and bullpen support tends to be a smallish factor for most pitchers over long careers. Outside of defensive support, what else would dramatically skew a long-tenured pitcher’s “real” RA+ level?
I guess my point is, given a very long career, some defensive-adjusted RA+ would be better than FIP or ERA. And then use FIP for future performance and evaluating pitchers with only a handful of seasons under their belt. FIP definitely is very useful. Like many have said, it does what it is intended to do.
Matthew’s comment seemed to concisely express the unease that many were feeling. Were existing metrics leaving out real skills that some pitchers possess and that are important to success? This has been a common complaint about all the DIPS-related metrics since DIPS was introduced, and it seems to stem from confusion between metrics that are primarily descriptive and those that are primarily predictive.
As I read Matthew’s list of complaints about FIP above, I realized that one of my favorite methodologies could be fashioned into a pitching metric that would correct some of the deficiencies he identified. Such a metric would be close to the ideal descriptive metric and might also lead to new ideas for improving predictive metrics.
That methodology is RVA, or Run Value Added (you may be more familiar with its FanGraphs name, RE24). Run Value Added was introduced by Gary Skoog in an article in the 1987 Bill James Baseball Abstract.
The concept is simple. Take the run value (from the run expectancy, or RE, table) for the base-out state that exists at the beginning of a play, and subtract it from the sum of the run value at the end of the play and the number of runs that scored on the play. For batters, this is the best descriptive metric of the run value he adds during a PA. It would be very close to perfect if it weren’t for the problem of apportioning the value of extra bases taken by runners.
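As a minimal sketch of that calculation, the Python below computes RVA for a single play. The run-expectancy values, the (outs, bases) state encoding, and the function names are illustrative assumptions, not the RE table or code actually used for DIRVA+.

```python
# Illustrative run-expectancy (RE) table: (outs, bases) -> expected runs for
# the rest of the inning. Placeholder values chosen for the example only.
RE = {
    (0, "---"): 0.50, (1, "---"): 0.27, (2, "---"): 0.11,
    (0, "1--"): 0.89, (1, "1--"): 0.53, (2, "1--"): 0.23,
    (0, "-2-"): 1.10,
}


def run_expectancy(outs: int, bases: str) -> float:
    """RE of a base-out state; three outs ends the inning, so RE is 0."""
    return 0.0 if outs >= 3 else RE[(outs, bases)]


def rva(start: tuple[int, str], end: tuple[int, str], runs: int) -> float:
    """Run Value Added for one play: RE(end) + runs scored - RE(start)."""
    return run_expectancy(*end) + runs - run_expectancy(*start)


# Runner on first, nobody out; the batter doubles and the runner scores,
# leaving a runner on second with nobody out.
double_rva = rva((0, "1--"), (0, "-2-"), runs=1)
print(round(double_rva, 2))
```

The same function handles an inning-ending out: the end state has three outs, so its RE is zero and the play is charged with the full RE of the starting state.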
As a predictive metric, RVA is not as good as Linear Weights (which are simply the league-average RVA for each event), because there is little evidence that batters are able to control their hitting by base-out state.
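The relationship between the two can be sketched in a few lines: the linear weight of an event is the mean of RVA over every occurrence of that event in the league. The play log below is hypothetical, with made-up values, and is only meant to show the averaging step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical league play log: (event type, RVA of that play).
plays = [
    ("single", 0.39), ("single", 0.52), ("single", 0.40),
    ("strikeout", -0.23), ("strikeout", -0.36), ("strikeout", -0.16),
    ("home_run", 1.00), ("home_run", 1.61),
]


def linear_weights(log):
    """Linear weight of each event = mean RVA of that event across the league."""
    by_event = defaultdict(list)
    for event, value in log:
        by_event[event].append(value)
    return {event: mean(values) for event, values in by_event.items()}


weights = linear_weights(plays)
print({event: round(w, 3) for event, w in weights.items()})
```

Every single is then credited with the same average value regardless of the base-out state in which it occurred, which is exactly the sequencing information that RVA keeps and Linear Weights discard.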
RVA has never been used as a pitching metric because it effectively zeros out over each inning. Strictly speaking, it doesn’t sum to exactly zero: because the run values at the start and end of each intermediate play cancel, each inning’s RVA is simply the number of runs scored in the inning minus the league-average runs scored per inning. Therefore, using the runs allowed by a pitcher is much simpler and just as accurate.
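That cancellation can be checked with a short sketch. It uses an illustrative RE table (hypothetical values, not the one used for DIRVA+) in which the bases-empty, nobody-out entry stands in for league-average runs per inning. Summing RVA over a whole inning leaves only the runs scored minus that starting value.

```python
# Illustrative RE table: (outs, bases) -> expected runs for the rest of the
# inning. Placeholder values only.
RE = {
    (0, "---"): 0.50, (1, "---"): 0.27, (2, "---"): 0.11,
    (0, "1--"): 0.89, (1, "1--"): 0.53, (2, "1--"): 0.23,
}


def run_expectancy(outs: int, bases: str) -> float:
    """RE of a base-out state; the inning is over at three outs, so RE is 0."""
    return 0.0 if outs >= 3 else RE[(outs, bases)]


# One hypothetical inning: (start state, end state, runs scored) per play.
inning = [
    ((0, "---"), (0, "---"), 1),  # home run
    ((0, "---"), (0, "1--"), 0),  # single
    ((0, "1--"), (1, "1--"), 0),  # strikeout
    ((1, "1--"), (2, "1--"), 0),  # strikeout
    ((2, "1--"), (3, "---"), 0),  # strikeout strands the runner
]

total_rva = sum(run_expectancy(*end) + runs - run_expectancy(*start)
                for start, end, runs in inning)
runs_scored = sum(runs for _, _, runs in inning)

# Intermediate RE terms cancel, so the inning's RVA equals runs scored minus
# the RE of the state that opens the inning.
print(round(total_rva, 2), runs_scored - run_expectancy(0, "---"))
```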
But if we separate the RVA that occurs on the DIPS events (HRs, Ks, NIBBs, and HBPs) from the RVA on the non-DIPS events (all non-HR hit balls except ROEs and safe FCs), and then process each group separately, we could have a very good descriptive pitching metric. That metric is DIRVA+.
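A minimal sketch of that split follows. The event labels are hypothetical shorthand, not Retrosheet's event codes. The text does not say which group wild pitches, pickoffs, and balks belong to, so this sketch leaves them, along with ROEs and safe FCs, outside both groups.

```python
# Hypothetical event labels for this sketch (not Retrosheet's event codes).
DIPS_EVENTS = {"HR", "K", "NIBB", "HBP"}
BATTED_BALLS = {"HR", "1B", "2B", "3B", "OUT", "DP", "SF", "SH", "ROE", "FC_SAFE"}
# Batted balls the text explicitly keeps out of the hit-ball group.
EXCLUDED_BATTED_BALLS = {"ROE", "FC_SAFE"}


def classify(event: str) -> str:
    """Assign an event to the DIPS group, the hit-ball group, or neither."""
    if event in DIPS_EVENTS:  # checked first, so HR never counts as a hit ball
        return "dips"
    if event in BATTED_BALLS and event not in EXCLUDED_BATTED_BALLS:
        return "hit_ball"
    # ROEs, safe FCs, and anything else (IBBs, wild pitches, balks, ...) fall
    # outside both groups in this sketch.
    return "other"


log = ["K", "1B", "HR", "ROE", "NIBB", "OUT", "WP"]
print([classify(e) for e in log])
```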
DIRVA without the plus stands for Defense Independent RVA: the pitcher’s RVA on DIPS events, minus the DIPS-event RVA an average pitcher would be expected to total in the same number of innings. The plus in DIRVA+ is the run value of the non-DIPS events, minus his team’s average run value for those events for the year. This essentially subtracts out the value that the defense adds and compares the run value the pitcher adds with what an average pitcher on his team adds.
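Putting the two pieces together, a minimal sketch of DIRVA+ might look like the function below. Two details are assumptions rather than statements from the text: the average pitcher’s expected DIPS-event RVA is modeled as a per-inning rate times innings, and the team’s average hit-ball run value is applied per batted ball. The example inputs are made up.

```python
def dirva_plus(dips_rva, innings, exp_rate_per_inning, hit_ball_rva, team_avg_hit_ball_rva):
    """Sketch of DIRVA+ (negative values = runs saved).

    dips_rva: RVA of each DIPS event (HR, K, NIBB, HBP) the pitcher allowed.
    exp_rate_per_inning: assumed league-average DIPS-event RVA per inning.
    hit_ball_rva: RVA of each non-DIPS batted ball against the pitcher.
    team_avg_hit_ball_rva: assumed team-wide average RVA per non-DIPS batted ball.
    """
    # DIRVA: the pitcher's DIPS-event RVA minus an average pitcher's
    # expectation for the same number of innings.
    dirva = sum(dips_rva) - exp_rate_per_inning * innings
    # The "plus": hit-ball RVA minus what the team's average rate would give
    # on the same number of batted balls, which nets out the team's defense.
    hit_ball_runs = sum(hit_ball_rva) - team_avg_hit_ball_rva * len(hit_ball_rva)
    return dirva + hit_ball_runs


# Hypothetical two-inning outing; all numbers are made up for illustration.
example = dirva_plus(
    dips_rva=[1.0, -0.25, -0.25, -0.25, 0.25],
    innings=2,
    exp_rate_per_inning=0.07,
    hit_ball_rva=[0.5, -0.25, -0.25],
    team_avg_hit_ball_rva=0.02,
)
print(round(example, 2))
```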
Is this a perfect measure of a pitcher’s control over his hit balls? No, for several reasons. First, the team’s pitching staff as a whole may be better than the league average at preventing runs on hit balls. But DIPS theory says that no individual pitcher has much control over his hit balls, so in the aggregate a staff’s variance in this ability should be even closer to zero.
A more problematic concern is that a particular pitcher’s hit-ball location distribution may differ from the staff average, sending more balls either to the better defenders or to the poorer ones. This problem is correctable, but only with accurate hit-ball location data and a lot of programming. I opted for the less accurate but simpler method, which can be used with datasets that lack hit-ball location data.
So what does DIRVA+ do that other pitching metrics don’t? Other metrics use Linear Weights to calculate the run values of events. Using RVA has the advantage of capturing the sequencing of events. If a pitcher pitches better or worse with men on base, DIRVA+ will show it where other metrics would not. If a pitcher has a high LOB%, induces more DPs than average, or allows fewer XBHs than average, DIRVA+ will show that too.
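To make the sequencing point concrete, here is a sketch of two hypothetical innings containing the same five events in a different order. The linear-weight values and the RE table are made-up illustrations; the point is only that Linear Weights credit both innings identically, while RVA picks up the extra run that scores when the single comes before the home run.

```python
# Illustrative RE table and linear weights; placeholder values only.
RE = {
    (0, "---"): 0.50, (1, "---"): 0.27, (2, "---"): 0.11,
    (0, "1--"): 0.89, (1, "1--"): 0.53, (2, "1--"): 0.23,
}
LW = {"HR": 1.40, "1B": 0.47, "K": -0.27}


def run_expectancy(outs: int, bases: str) -> float:
    """RE of a base-out state; three outs ends the inning, so RE is 0."""
    return 0.0 if outs >= 3 else RE[(outs, bases)]


# (event, start state, end state, runs scored) for each play.
inning_a = [  # home run first: 1 run scores
    ("HR", (0, "---"), (0, "---"), 1),
    ("1B", (0, "---"), (0, "1--"), 0),
    ("K", (0, "1--"), (1, "1--"), 0),
    ("K", (1, "1--"), (2, "1--"), 0),
    ("K", (2, "1--"), (3, "---"), 0),
]
inning_b = [  # single first: 2 runs score
    ("1B", (0, "---"), (0, "1--"), 0),
    ("HR", (0, "1--"), (0, "---"), 2),
    ("K", (0, "---"), (1, "---"), 0),
    ("K", (1, "---"), (2, "---"), 0),
    ("K", (2, "---"), (3, "---"), 0),
]


def lw_total(inning):
    """Linear Weights ignore order and context: just sum each event's weight."""
    return sum(LW[event] for event, *_ in inning)


def rva_total(inning):
    """RVA depends on the actual base-out states, so order matters."""
    return sum(run_expectancy(*end) + runs - run_expectancy(*start)
               for _, start, end, runs in inning)


for name, inning in (("A", inning_a), ("B", inning_b)):
    print(name, round(lw_total(inning), 2), round(rva_total(inning), 2))
```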
Of course, there is a lot of luck in a single year’s LOB, or RISP average, or DP rate, or XBHs. But this is a descriptive stat, so including the luck is OK. DIRVA+ also includes a measure of whether a pitcher is better than average at preventing runs on balls in play independent of his team’s defense. Some believe this is luck, others believe this is a pitcher skill. DIRVA+ doesn’t care because it is a descriptive statistic and both luck and skill count.
Looking at Matthew’s lists above, what doesn’t DIRVA+ do? It’s a pitching stat, so it leaves pitcher defensive value to the defensive metrics. Wild pitches, pickoffs, pickoff errors, and balks are included. Steals and passed balls are not, because the catcher bears a significant share of responsibility for them. DIRVA+ is not adjusted for parks, since it is a descriptive stat. The starting pitcher’s RVA is completely independent of his bullpen support. The numbers I will be presenting don’t adjust for the quality of the hitters.
On the whole, DIRVA+ satisfies most of Matthew’s requirements, but not all of them. The only thing left to do is show you who the stat thinks were the best pitchers in 2009.
And the results are:
Top Ten DIRVA+ Starting Pitchers, 2009

Rk  Pitcher          Innings  Exp Runs  DIRVA Runs  Hit Ball Runs  DIRVA+
1.  Zack Greinke         229      15.7       -41.6           -6.6   -63.9
2.  Roy Halladay         239      16.4       -18.1          -11.0   -45.5
3.  Adam Wainwright      233      16.0       -20.9           -7.9   -44.8
4.  Tim Lincecum         225      15.5       -30.6            2.8   -43.3
5.  Jair Jurrjens        215      14.8        -2.6          -25.8   -43.3
6.  Chris Carpenter      192      13.2       -18.1          -10.5   -41.8
7.  Javier Vazquez       219      15.1       -22.0           -3.9   -41.0
8.  Dan Haren            229      15.7       -11.2          -14.0   -40.9
9.  Wandy Rodriguez      205      14.1        -2.5          -16.3   -32.9
10. Ubaldo Jimenez       218      15.0        -8.9           -8.5   -32.4

Looks about right. Just to confuse you, I am sticking with the convention I used in my BZM defensive metric and showing runs saved by any defensive team member as a negative number. So Greinke’s DIRVA+ total of -63.9 runs is really, really good, not really, really bad.
These are actual runs saved over the course of the season, and they can be converted to wins at the same approximate rate of 10 runs per win used for offensive players. The innings shown in the table are innings pitched as a starter, with the fractional portion dropped. Exp Runs is the number of DIRVA runs a league-average pitcher would have in the same number of innings. DIRVA Runs and Hit Ball Runs are calculated by the methodology described above, and DIRVA+ is simply the sum of DIRVA Runs and Hit Ball Runs minus Exp Runs (so the DIRVA Runs column is the pitcher’s RVA on DIPS events before the Exp Runs baseline is subtracted).
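As a quick arithmetic check of that definition, the sketch below recomputes DIRVA+ for three rows of the starting-pitcher table and converts the result to wins at the 10-runs-per-win rate. Because the table’s components are rounded to one decimal, a recomputed total can differ from the listed one by a tenth of a run (Jair Jurrjens’s row, for example), so only rows that match exactly are used here.

```python
RUNS_PER_WIN = 10  # approximate conversion rate stated in the text

# (Exp Runs, DIRVA Runs, Hit Ball Runs, listed DIRVA+) copied from the table.
rows = {
    "Zack Greinke": (15.7, -41.6, -6.6, -63.9),
    "Roy Halladay": (16.4, -18.1, -11.0, -45.5),
    "Tim Lincecum": (15.5, -30.6, 2.8, -43.3),
}


def dirva_plus(exp_runs: float, dirva_runs: float, hit_ball_runs: float) -> float:
    """DIRVA+ = DIRVA Runs + Hit Ball Runs - Exp Runs (negative = runs saved)."""
    return dirva_runs + hit_ball_runs - exp_runs


for pitcher, (exp_runs, dirva_runs, hit_ball_runs, listed) in rows.items():
    total = dirva_plus(exp_runs, dirva_runs, hit_ball_runs)
    wins_saved = -total / RUNS_PER_WIN  # flip the sign so wins saved read as positive
    print(f"{pitcher}: recomputed {total:.1f}, listed {listed}, ~{wins_saved:.1f} wins saved")
```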
Look at the DIRVA+ run values for No. 3 Adam Wainwright through No. 8 Dan Haren. No wonder the NL Cy Young was so controversial: that’s a pretty tight grouping of pitchers. I am sure the roughly four runs separating No. 3 from No. 8 are within the margin of error for this metric.
The other interesting aspect of the chart, for me, was the variety of run values for hit-ball runs. Remember, these have already been adjusted for the quality of the defense on the pitcher’s team, so according to DIPS theory what remains should be mostly luck. If so, they should regress toward zero with multi-year sample sizes. You’ll have to wait for Part 2 to find out if they do.
I’ll leave you with a chart of the 10 best relievers by DIRVA+. WPA totals are another powerful way to judge relievers, because the leverage of the game state is built into WPA. So I am not sure how much DIRVA+ adds for relievers, but it can’t hurt to compare the two metrics.
Top 10 DIRVA+ Relievers, 2009

Rk  Pitcher           Innings  Exp Runs  DIRVA Runs  Hit Ball Runs  DIRVA+    LI
1.  Michael Wuertz         78       5.4       -16.8           -6.6   -28.8  1.25
2.  Joe Nathan             68       4.7       -12.0          -10.9   -27.6  1.86
3.  Andrew Bailey          83       5.7        -9.0          -12.4   -27.1  1.41
4.  Jonathan Papelbon      68       4.7       -13.9           -5.8   -24.4  2.17
5.  Mariano Rivera         66       4.6        -8.7           -9.0   -22.3  1.72
6.  Kiko Calero            60       4.1        -9.3           -8.6   -22.0  0.93
7.  Jeremy Affeldt         62       4.3        -5.6           -9.4   -19.3  1.46
8.  Ryan Franklin          61       4.2        -7.8           -7.1   -19.1  1.87
9.  Mike Gonzalez          74       5.1       -10.5           -3.4   -19.0  1.70
10. Phil Hughes            51       3.5       -10.0           -5.5   -19.0  1.39
11. Jonathan Broxton       76       5.2       -16.6            3.0   -18.8  1.82

I added Jonathan Broxton as an 11th entry because of his outstanding DIRVA runs (-16.6), which were offset by worse-than-average Hit Ball Runs (+3.0).
References and Resources
The LI (leverage index) numbers are from FanGraphs, where they are licensed from InsideTheBook.com. Other information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.
Full DIRVA+ results for all pitchers in 2009 are available as a downloadable spreadsheet.
When he was ten, Peter caught a foul ball hit by Ted Williams at Griffith Stadium. He keeps hoping, but so far life hasn't gotten any better than that.