This is sort of a follow-up to two articles. The seemingly obvious one is my article from last week, about how situational pitching might affect a pitcher’s ability to prevent (or allow) runs. In it, I advanced a theory that we could better assess a pitcher’s ability by taking into account how he pitches in particular situations, not just by looking at his overall performance.
How do we determine this, however? For that, I had to lean on a previous article on how well we can predict ERA using current methods. (While referencing that research, I discovered a minor bug that doesn’t change the conclusions. For more detail, see the References section.)
So I conducted a repeat of the previous test of persistence, if you will, in pitching metrics. I created another metric based on the skeleton of tRA, but more akin to RE24. In other words, every event was given a different linear weights value depending on the number of outs and the runners on base. That way, a pitcher can be credited for the added value he creates (or fails to create) by timing his pitching performance based upon the situation.
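To make the idea concrete, here is a minimal Python sketch of how context-dependent weights could be built from a run expectancy table. The table values, the state encoding and the function names are all hypothetical placeholders for illustration; they are not the figures or the code behind the metric described above.

```python
from collections import defaultdict

# Hypothetical run expectancy by (outs, runners) state.
# Illustrative placeholder numbers only, not a real run expectancy table.
RUN_EXP = {
    (0, "---"): 0.50,
    (0, "1--"): 0.90,
    (1, "---"): 0.27,
    (1, "1--"): 0.53,
}


def event_run_value(before, after, runs_scored):
    """RE24-style value: change in run expectancy plus runs scored on the play."""
    return RUN_EXP[after] - RUN_EXP[before] + runs_scored


def state_weights(events):
    """Average run value for each (event type, starting base-out state) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event_type, before, after, runs_scored in events:
        key = (event_type, before)
        totals[key] += event_run_value(before, after, runs_scored)
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up play-by-play rows: (event, state before, state after, runs scored).
events = [
    ("BB", (0, "---"), (0, "1--"), 0),
    ("OUT", (0, "---"), (1, "---"), 0),
    ("OUT", (0, "1--"), (1, "1--"), 0),
]

weights = state_weights(events)
for key, value in sorted(weights.items()):
    print(key, round(value, 2))
```

With these placeholder numbers, the same out costs the offense more with a runner on first than with the bases empty, which is the kind of difference a situational metric is meant to credit to the pitcher.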
And… it doesn’t seem to help any:
This looks at the Root Mean Square Error of split halves (in other words, even and odd numbered events) from 2003-2008. The first row does split halves for each pitcher by season; the second looks at splits for a pitcher over the entire five-year sample; the third looks at only those players with at least 500 IP in the first split half.
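For readers who want to see the mechanics, a split-half test of this sort might be sketched in Python as follows. This assumes each pitcher's events are stored in chronological order as per-event run values; the data and all of the names here are invented for illustration and are not the article's actual dataset or code.

```python
import math


def split_halves(events):
    """Split an ordered list of events into even- and odd-indexed halves."""
    return events[0::2], events[1::2]


def mean(values):
    return sum(values) / len(values)


def rmse(predicted, actual):
    """Root mean square error between paired predictions and outcomes."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up per-event run values for two hypothetical pitchers.
pitcher_events = {
    "pitcher_a": [0.4, -0.2, -0.3, 0.1, -0.2, -0.3],
    "pitcher_b": [-0.3, 0.5, -0.2, -0.2, 0.4, -0.3],
}

even_rates = []
odd_rates = []
for events in pitcher_events.values():
    even, odd = split_halves(events)
    even_rates.append(mean(even))
    odd_rates.append(mean(odd))

# How far off is one half when used to predict the other?
error = rmse(even_rates, odd_rates)
print(round(error, 3))
```

A lower error for one metric than another, on the same pitchers, would suggest that metric is capturing something that persists from one half to the other.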
And… nothing. At no point does looking at situational pitching splits add any predictive value.
At this point, I am required to note that it’s very possible I am simply chasing phantoms. But I’m not so sure just yet.
Another possible approach?
By breaking everything down into 24 base-out states, it’s possible that I’m splitting things too fine; in other words, adding more noise than I am signal. That doesn’t necessarily mean the signal isn’t there, just that we have to work harder to find it.
(And of course, the harder we work to find the signal, the less meaningful it probably is in the aggregate. But if it is there, there are probably some individual pitchers to whom this skill matters more than a little.)
Let’s circle back around to the value of a walk for a minute. The value of any event on offense is based upon three things:
- Getting on
- Moving other runners over
- Avoiding an out
By timing walks so that they occur more frequently with first base open, a pitcher can reduce the number of runners they move over. As Peter Jensen pointed out in the comments last week, it’s very possible that pitchers have no real skill here, as there are simply some times where the situation dictates a walk, regardless of the pitcher’s skill at issuing walks in general. Call these intentional unintentional walks. A pitcher with an overall low walk rate will issue more of these relative to his total walks, though, even if this is not a “skill” in any sense.
So let’s consider a pitcher’s rate of walks with first base open, compared to his walks in general. How well does that persist? Let’s look at split half correlation, weighted, from 1989 to 1999. And let’s compare it to the year to year BABIP over that same time period.
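As a rough illustration of what a weighted split-half correlation involves, here is a short Python sketch. It assumes the open-walk rate is defined as walks issued with first base open divided by all walks, and that each pitcher is weighted by his total walks; both choices, along with the sample numbers and function names, are assumptions made for this example.

```python
import math


def weighted_corr(x, y, weights):
    """Weighted Pearson correlation between two equal-length sequences."""
    total = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / total
    mean_y = sum(w * b for w, b in zip(weights, y)) / total
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y)) / total
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x)) / total
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y)) / total
    return cov / math.sqrt(var_x * var_y)


# Made-up rows, one per hypothetical pitcher:
# (open-base walks, all walks) in the first half, then the same in the second half.
pitchers = [
    (10, 25, 12, 27),
    (5, 30, 7, 28),
    (20, 40, 18, 42),
]

first_half = [open_1 / walks_1 for open_1, walks_1, _, _ in pitchers]
second_half = [open_2 / walks_2 for _, _, open_2, walks_2 in pitchers]
weights = [walks_1 + walks_2 for _, walks_1, _, walks_2 in pitchers]

r = weighted_corr(first_half, second_half, weights)
print(round(r, 2))
```

Weighting by the number of walks keeps pitchers with only a handful of observations from having the same influence on the result as those with many.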
There is slightly more correlation for our open-walk rate (.24) than for BABIP (.21), in substantially fewer chances (52 walks for the average pitcher in that time period, compared to 470 balls in play). What this tells us is that there is a stronger talent in controlling one’s situational walk rates than there is in controlling one’s BABIP, but it’s not any easier to pick up this talent due to the vastly smaller number of observations.
Again, there doesn’t seem to be a whole lot here. Unfortunately, it doesn’t seem that we can discover a whole lot about a pitcher by looking at these sorts of things (as much as I thought and perhaps hoped there would be). I’m not out, but I’m certainly down, shall we say.
References & Resources
The other note is that yes, I should be testing against RA, not ERA. I received a very nasty e-mail informing me of this, as a matter of fact. And I agree with the point in general. So why am I persisting in my error? Because on this scale, it really doesn’t matter which one I test against. For any individual player, yes, it matters which you use. But when you aggregate like this, the problems of ERA versus RA really disappear, and most people are more comfortable “thinking in ERA,” if you will.