Yet another pitching metric

It’s been quite an offseason for pitching metrics. First, there was the controversy over the 2009 Cy Young award voting. Then there were some interesting discussions over the Christmas holidays at The Book Blog about FIP in response to the 10 questions asked by Mike Silva. That was followed by Baseball Prospectus’ announcement of a new pitching metric called SIERA, and then by even more discussions about SIERA and pitching metrics in general at both The Book Blog and BP.

If you have bothered to read this far you have probably followed at least some, if not all of this. So why should I clutter the landscape with yet another pitching metric when there are already DIPS, LIPS, FIP, xFIP, RA+, tRA, tRA*, PZR, SIERA, and others crowding for your attention? Well, there seems to be a need for one, because I think we may have taken a wrong turn when DIPS was introduced.

Don’t get me wrong: Voros McCracken’s idea that we know almost everything we need to know to predict a pitcher’s future performance by actually leaving out information, and considering only those events over which the pitcher has complete control, was inspired. The wrong turn I am referring to was the decision that predicting future ERA was how we should measure the success of the new metrics.

Why in the world would we want to predict a pitching statistic that we know is heavily influenced by both the quality of the pitching team’s defense and luck? Isn’t the pitcher’s “true talent” what we really want to predict? It’s as if, when James introduced Runs Created and Palmer and Thorn introduced Batting Runs, we had tested which was better by seeing how well they predicted RBIs. Of course, we did something equally stupid and tested to see how well they predicted team runs, but we’ll leave that rant for another post.

The single inspiration for beginning to work on a new metric came from this comment by Matthew Cornwell at The Book Blog in response to Mike Silva’s question about FIP:

Which “leaves out” more?

ERA – defensive support, leveraging/quality of batters faced, park factors, bullpen support, the pitcher’s responsibility regarding unearned runs
FIP – event timing/sit. splits/LOB%, etc., what BABIP skill does exist, pitcher defense, DP inducing, HBP, WP, leveraging/quality of batters faced, park factors, XBH prevention, pick-offs, controlling running game.

Most pitchers can’t prevent enough runs by controlling the running game or limiting doubles or defending their position well in any given season to make a huge difference in their FIP or ERA. That is why FIP works so well at a seasonal level – it leaves in what pitchers control the most, and as a fair trade-off for most pitchers (the Glavine’s being examples,) takes out what is least impactful and controllable. However, over 15-20 seasons, those secondary run prevention tools add up to be tons of runs for many pitchers.

Take RA+ – if you could just adjust for defensive support you should get pretty close to “true” RA+ for long career guys. BABIP and HR/FB have had enough PA’s to stabilize, leveraging and quality of batters faced is not a huge factor for modern starters, park is considered already, and bullpen support tends to be a smallish factor for most pitchers over long careers. Outside of defensive support, what else would dramatically skew a long-tenured pitcher’s “real” RA+ level?

I guess my point is, given a very long career, some defensive-adjusted RA+ would be better than FIP or ERA. And then use FIP for future performance and evaluating pitchers with only a handful of seasons under their belt. FIP definitely is very useful. Like many have said, it does what it is intended to do.

Matthew’s comment seemed to concisely express the unease that many were feeling. Were existing metrics leaving out real skills that some pitchers possess and that are important to success? This has been a common complaint about all the DIPS-related metrics since DIPS was introduced, and it seems to stem from confusion between metrics that are primarily descriptive and those that are primarily predictive.

As I read Matthew’s list of complaints about FIP above, I realized that one of my favorite methodologies could be fashioned into a pitching metric that would correct some of the deficiencies he identified. Such a metric would be close to the ideal descriptive metric and might also lead to new ideas for improving predictive metrics.

That methodology is RVA, or Run Value Added; you may be more familiar with its FanGraphs name, RE24. Run Value Added was introduced by Gary Skoog in an article in the 1987 Bill James Baseball Abstract.

The concept is simple. Take the run value (from the RE table) for the baseout state that exists at the end of a play, add the number of runs that scored on the play, and subtract the run value for the baseout state that existed at the beginning of the play. For a batter, this is the best descriptive metric of the run value that he adds during a PA. It would be very close to perfect if it weren’t for the problem of apportioning the value of extra bases taken by runners.
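To make the arithmetic concrete, here is a minimal Python sketch of that calculation. The RE values are made-up placeholders rather than a real run-expectancy table, and the state encoding and the function name `run_value_added` are assumptions for illustration only.

```python
# Hypothetical run-expectancy (RE) values keyed by (bases, outs).
# Bases are a 3-character string for 1st/2nd/3rd: "1--" means a runner on first.
# These numbers are illustrative placeholders, not a published RE table.
RE = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("---", 1): 0.27,
    ("1--", 1): 0.53,
    ("---", 2): 0.10,
    ("1--", 2): 0.22,
}


def run_value_added(start_state, end_state, runs_scored):
    """RVA = RE after the play + runs scored on the play - RE before the play.

    An end_state of None means the play recorded the third out, so RE is 0.
    """
    re_end = 0.0 if end_state is None else RE[end_state]
    return re_end + runs_scored - RE[start_state]


# Leadoff single: bases empty, 0 outs -> runner on first, 0 outs.
single = run_value_added(("---", 0), ("1--", 0), 0)
# Solo home run with one out: the state is unchanged, but one run scores.
solo_hr = run_value_added(("---", 1), ("---", 1), 1)
# Strikeout with a runner on first and one out.
strikeout = run_value_added(("1--", 1), ("1--", 2), 0)

print(round(single, 2), round(solo_hr, 2), round(strikeout, 2))
```

From the batter’s side a positive number is good; from the pitcher’s side the same number counts against him.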

As a predictive metric RVA is not as good as Linear Weights (which is simply the league-average RVA for an event) because there is little indication that batters are able to control their hitting by baseout state.
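The relationship between the two can be sketched in a few lines of Python: average the RVA of every play of a given type and you have that event’s linear weight. The play log and its values below are invented for illustration, and the event codes are assumptions.

```python
from collections import defaultdict

# Hypothetical play log: (event type, RVA of that particular play).
# The same event is worth different amounts depending on the baseout state.
plays = [
    ("1B", 0.40), ("1B", 0.62), ("1B", 0.35),
    ("K", -0.25), ("K", -0.31),
    ("HR", 1.00), ("HR", 1.80),
]


def linear_weights(play_log):
    """Return the average RVA per event type, i.e. a linear weight per event."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event, rva in play_log:
        totals[event] += rva
        counts[event] += 1
    return {event: totals[event] / counts[event] for event in totals}


weights = linear_weights(plays)
for event, weight in sorted(weights.items()):
    print(event, round(weight, 3))
```

Averaging discards the baseout context of each play, which is exactly the context RVA keeps.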

RVA has never been used as a pitching metric because it zeros out for each inning. Strictly speaking, it doesn’t sum to zero: each inning’s RVA is simply the number of runs scored in the inning minus the league-average runs scored per inning. Therefore, using runs allowed by a pitcher is much simpler and just as accurate.
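A short Python sketch shows why: the RE terms of consecutive plays cancel, so only the runs scored and the RE of the starting state (bases empty, no outs, which is the average runs per inning) survive. The RE values and the half-inning below are hypothetical.

```python
# Placeholder RE values (not a real table), keyed by (bases, outs).
RE = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("1--", 1): 0.53,
    ("---", 1): 0.27,
    ("---", 2): 0.10,
}

# One hypothetical half-inning: (state before, state after, runs scored).
# A state of None after the play means the third out was recorded (RE = 0).
inning = [
    (("---", 0), ("1--", 0), 0),  # single
    (("1--", 0), ("1--", 1), 0),  # strikeout
    (("1--", 1), ("---", 1), 2),  # two-run homer
    (("---", 1), ("---", 2), 0),  # groundout
    (("---", 2), None, 0),        # flyout ends the inning
]


def rva(start, end, runs):
    """RVA of one play: RE after + runs scored - RE before."""
    re_end = 0.0 if end is None else RE[end]
    return re_end + runs - RE[start]


inning_rva = sum(rva(s, e, r) for s, e, r in inning)
runs_scored = sum(r for _, _, r in inning)
runs_minus_average = runs_scored - RE[("---", 0)]

print(round(inning_rva, 2), round(runs_minus_average, 2))
```

Both numbers are the same, so summing RVA over complete innings tells you nothing that runs allowed does not.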

But if we separate the RVA that occurs on the DIPS events (HRs, Ks, NIBBs, and HBPs) from the RVA on the non-DIPS events (all non-HR hit balls except ROEs and safe FCs), and then process each group separately, we could have a very good descriptive pitching metric. And that would be DIRVA+.

DIRVA without the plus is an acronym for Defense Independent RVA, and is the pitcher’s RVA for the DIPS events, minus what an average pitcher would be expected to have totaled for DIPS events RVA for the same number of innings. The plus in DIRVA+ is the run value of the non-DIPS events, minus his team’s average run value for those events for the year. This basically subtracts out the value that the defense adds and compares the run value the pitcher adds to what an average pitcher on his team adds.
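As a rough Python sketch of that split, under stated assumptions: the event codes, the function name, and all the rate inputs are hypothetical, and scaling the team baseline per hit ball is my reading of “his team’s average run value for those events” rather than a documented detail.

```python
DIPS_EVENTS = {"HR", "K", "NIBB", "HBP"}


def dirva_plus_components(plays, innings, lg_dips_rva_per_inning, team_rva_per_hit_ball):
    """Split one pitcher's plays into the DIPS part and the hit-ball part.

    plays: list of (event_code, rva); "BIP" marks a non-DIPS hit ball.
    lg_dips_rva_per_inning: league-average DIPS RVA per inning (hypothetical input).
    team_rva_per_hit_ball: team-average RVA per hit ball (assumed scaling).
    """
    dips_rva = sum(rva for event, rva in plays if event in DIPS_EVENTS)
    hit_ball_rvas = [rva for event, rva in plays if event == "BIP"]

    # What an average pitcher would total on DIPS events in the same innings.
    exp_runs = lg_dips_rva_per_inning * innings
    # Hit-ball RVA relative to what the team's defense allows on average.
    hit_ball_runs = sum(hit_ball_rvas) - team_rva_per_hit_ball * len(hit_ball_rvas)

    dirva = dips_rva - exp_runs
    return {
        "dirva": dirva,
        "hit_ball_runs": hit_ball_runs,
        "dirva_plus": dirva + hit_ball_runs,
    }


# Invented sample: three DIPS events and three hit balls over two innings.
plays = [
    ("K", -0.25), ("HR", 1.00), ("NIBB", 0.30),
    ("BIP", -0.20), ("BIP", 0.50), ("BIP", -0.30),
]
result = dirva_plus_components(
    plays, innings=2, lg_dips_rva_per_inning=0.07, team_rva_per_hit_ball=0.02
)
print({name: round(value, 2) for name, value in result.items()})
```

Negative totals mean runs saved, matching the sign convention used in the tables.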

Is this a perfect measure of a pitcher’s control over his hit balls? No, for several reasons. The team’s pitching staff as a whole may be better at preventing runs on hit balls than the league average. But DIPS theory says that no individual pitcher has much control over his hit balls, so in the aggregate a staff of pitchers should have a variance in this ability even closer to zero.

A more problematic concern is that a particular pitcher’s hit-ball location distribution may vary from the staff average, and either have more balls hit to the better defenders or the poorer defenders. This problem is correctable, but only with accurate hit-ball location data and much programming. I opted for the less accurate but simpler method that could be used for datasets without any hit ball location data.

So what does DIRVA+ do that other pitching metrics don’t? Other metrics use Linear Weights to calculate the run values of events. Using RVA has the advantage of including the sequencing of events. If a pitcher pitches better or worse with men on base, DIRVA+ will show it where other metrics would not. If a pitcher has a high LOB%, induces more DPs than average, or allows fewer XBHs than average, DIRVA+ will also show it.

Of course, there is a lot of luck in a single year’s LOB, or RISP average, or DP rate, or XBHs. But this is a descriptive stat, so including the luck is OK. DIRVA+ also includes a measure of whether a pitcher is better than average at preventing runs on balls in play independent of his team’s defense. Some believe this is luck, others believe this is a pitcher skill. DIRVA+ doesn’t care because it is a descriptive statistic and both luck and skill count.

Looking at Matthew’s lists above, what doesn’t DIRVA+ do? It’s a pitching stat, so it leaves pitcher defensive value to the defensive metrics. Wild Pitches, pickoffs, pickoff errors, and balks are included. Steals and passed balls are not, because the catcher has a significant portion of responsibility for them. DIRVA+ is not adjusted for parks since it is a descriptive stat. The starting pitcher’s RVA is completely independent of his bullpen support. The numbers that I will be presenting don’t adjust for the quality of the hitters.

On the whole DIRVA+ satisfies most of Matthew’s requirements, but not all of them. The only thing left to do is show you who the stat thinks were the best pitchers in 2009.

And the results are:

Top Ten DIRVA+ Starting Pitchers 2009
Pitcher                   Innings   Exp Runs   DIRVA Runs  Hit Ball Runs  DIRVA+
1. Zack Greinke              229       15.7        -41.6         -6.6     -63.9
2. Roy Halladay              239       16.4        -18.1        -11.0     -45.5
3. Adam Wainwright           233       16.0        -20.9         -7.9     -44.8
4. Tim Lincecum              225       15.5        -30.6          2.8     -43.3
5. Jair Jurrjens             215       14.8         -2.6        -25.8     -43.3
6. Chris Carpenter           192       13.2        -18.1        -10.5     -41.8
7. Javier Vazquez            219       15.1        -22.0         -3.9     -41.0
8. Dan Haren                 229       15.7        -11.2        -14.0     -40.9
9. Wandy Rodriguez           205       14.1         -2.5        -16.3     -32.9
10. Ubaldo Jimenez           218       15.0         -8.9         -8.5     -32.4

Looks about right. Just to confuse you, I am sticking to the plan that I used in my BZM defensive metric, and I am showing runs saved by any defensive team member as a negative number. So Greinke’s -63.9 runs DIRVA+ total is really, really good and not really, really bad.

These are actual runs saved over the course of the season, and they can be converted to wins at the same approximate 10-runs-per-win rate used for offensive players. The innings numbers shown in the table are innings as a starting pitcher, with the fractional portion lopped off. Exp Runs is the number of DIRVA runs a league-average pitcher would have for the same number of innings. DIRVA Runs and Hit Ball Runs are calculated by the methodology that I have described above, and DIRVA+ is just the sum of DIRVA Runs and Hit Ball Runs minus Exp Runs.
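As a quick check of that arithmetic, this small Python sketch recomputes DIRVA+ for three rows of the starters table and applies the approximate 10-runs-per-win rate. The function and variable names are mine, not part of the metric.

```python
# (pitcher, Exp Runs, DIRVA Runs, Hit Ball Runs, published DIRVA+) from the table above.
rows = [
    ("Greinke", 15.7, -41.6, -6.6, -63.9),
    ("Halladay", 16.4, -18.1, -11.0, -45.5),
    ("Lincecum", 15.5, -30.6, 2.8, -43.3),
]

RUNS_PER_WIN = 10.0  # approximate conversion rate cited in the text


def dirva_plus(exp_runs, dirva_runs, hit_ball_runs):
    """DIRVA+ = DIRVA Runs + Hit Ball Runs - Exp Runs (negative = runs saved)."""
    return dirva_runs + hit_ball_runs - exp_runs


for name, exp_runs, dirva_runs, hit_ball_runs, published in rows:
    total = dirva_plus(exp_runs, dirva_runs, hit_ball_runs)
    wins = -total / RUNS_PER_WIN  # flip the sign so wins saved are positive
    print(f"{name}: DIRVA+ {total:.1f} (published {published}), about {wins:.1f} wins")
```

Recomputing from the rounded columns will not always match the published total to the last decimal; Jurrjens’ row, for example, sums to -43.2 rather than the listed -43.3, presumably because the table’s components were rounded separately.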

Look at the DIRVA+ run values for No. 3 Adam Wainwright down through No. 8 Dan Haren. No wonder the NL Cy Young was so controversial. That’s a pretty tight grouping of pitchers. I am sure the four runs separating No. 3 from No. 8 are within the margin of error for this metric.

The other interesting aspect of the chart for me was the variety of run values for hit-ball runs. Remember these have already been adjusted for the quality of the defense on the pitcher’s team, so what remains should be mostly luck according to DIPS theory. If so, they should regress back toward zero with multi-year sample sizes. You’ll have to wait for Part 2 to find out if they do.

I’ll leave you with a chart of the 10 best relievers by DIRVA+. WPA totals are another powerful way to judge relievers because the leverage of the gamestate is built into WPA, so I am not sure how much using DIRVA+ for relievers adds, but it can’t hurt to compare the two metrics.

Top 10 DIRVA+ Relievers 2009
Pitcher                   Innings   Exp Runs  DIRVA Runs Hit Ball Runs  DIRVA+    LI
1. Michael Wuertz              78        5.4       -16.8          -6.6   -28.8  1.25
2. Joe Nathan                  68        4.7       -12.0         -10.9   -27.6  1.86
3. Andrew Bailey               83        5.7        -9.0         -12.4   -27.1  1.41
4. Jonathan Papelbon           68        4.7       -13.9          -5.8   -24.4  2.17
5. Mariano Rivera              66        4.6        -8.7          -9.0   -22.3  1.72
6. Kiko Calero                 60        4.1        -9.3          -8.6   -22.0  0.93
7. Jeremy Affeldt              62        4.3        -5.6          -9.4   -19.3  1.46
8. Ryan Franklin               61        4.2        -7.8          -7.1   -19.1  1.87
9. Mike Gonzalez               74        5.1       -10.5          -3.4   -19.0  1.70
10. Phil Hughes                51        3.5       -10.0          -5.5   -19.0  1.39
11. Jonathan Broxton           76        5.2       -16.6           3.0   -18.8  1.82

I added Jonathan Broxton because of the contrast between his very strong DIRVA runs (-16.6) and his below-average hit-ball runs (3.0).

References & Resources
The LI (leverage index) numbers are from FanGraphs, where they are licensed from InsideTheBook.Com. Other information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.

Click here to download a spreadsheet with full DIRVA+ results for all pitchers in 2009.


Comments

  1. David Gassko said...

    Great stuff, Peter. This is definitely a worthy addition to the slew of pitching metrics already out there.

  2. sportz4life said...

    Peter, any metric that measured pitching in 2009 and left out Felix Hernandez and Justin Verlander, yet included Wandy Rodriguez and Ubaldo Jimenez, has flaws. Not to denigrate either of those pitchers, but they don’t belong in the same sentence as Verlander or Hernandez.

    Seems like high K/9 pitchers were penalized

  3. Peter Jensen said...

    sportz4life – High K/9 pitchers are not penalized.  The full value of their K’s is in the DIRVA portion of DIRVA+, and both pitchers did quite well in DIRVA runs above average, Verlander ranking 2nd with 49 runs and Hernandez 9th with 29 runs.  DIRVA (without the plus) is comparable to other predictive metrics like DIPS and FIP, and one would expect both pitchers to do quite well in the future.

    DIRVA+ is a descriptive metric that also adds the runs that occur on a pitcher’s in-play hit balls above the run rate of his team’s fielders.  Both pitchers played for teams that had strong fielding in 2009.  Hernandez was about average on his hit balls and remained 12th in the full rankings.  For some reason Verlander gave up 30 more runs on his in-play hit balls than would have been expected given his fielders.  This lowered his final ranking considerably.  This, of course, varies from year to year and is less predictive than DIRVA of how the pitcher will perform in the future.

    You are certainly correct that Verlander and Hernandez should be considered as possessing more “true talent” than Jimenez and Rodriguez.  But Rodriguez played on a bad fielding team and did much better on his in-play hit balls.  And Jimenez had good scores on both DIRVA and his hit balls and just had a good year overall.

    There should have been a link to a spreadsheet giving all the information on all 2009 starting pitchers.  I think THT may be trying to add that to the article.

  4. Craig Tyle said...

    Peter, if I read this correctly, all pitchers on the same team will have the same expected results on balls-in-play; thus, a groundball pitcher and a flyball pitcher on the same team would have the same baseline—correct?
