Measuring defense is really hard. I know it. You know it. Stats people and non-stats people alike know it. We know that defense is important, that some players are far better defenders than other players, and that this difference is significant.
We also know that defense consists of more than just limiting mistakes, or errors. If a player just stands in one spot and waits for the ball to come to him, he will probably have a fantastic fielding percentage. But, of course, he will not be a good defender, because he will only turn a very small percentage of balls hit in his direction into outs. Long story short: defense is more than just errors and fielding percentage. Again, I think we can all agree on that.
So, what is defense, and how do we measure it? Let’s ignore errors, and fielding percentage, UZR and DRS and all of that crazy stuff for now. Because defense, when it is really reduced to the core, is about two simple goals: turning balls in play into outs, and limiting the advancement of baserunners; mostly, turning balls in play into outs. Players that are good at turning balls in play into outs are good defenders. Players that are bad at this are bad defenders. It’s really that simple.
I’d like to introduce an analogy that I have found helpful in understanding defense: RBI. As much as I believe the RBI to be both a misused and useless statistic, it is based in a very real and simple concept: turning baserunners (and the batter) into runs. Which, when you think about it, is the ultimate goal for an offense: score runs.
And yet, while both RBI and BIP-turned-into-outs measure the core of what we care about in offense and defense respectively, they are also both incredibly and obviously flawed when applied to individual players. Why?
Let’s start with RBI. Say we have two players with 100 RBI at the end of the regular season. Traditional analysis may say that they contributed equally to the team (all else being equal), but then we learn that Player A had 800 potential RBI—that is, 800 baserunners plus himself that he could have driven in—while Player B only had 700 potential RBI. All else being equal, Player B suddenly looks much better because he was able to drive in the same number of runs given lesser opportunity.
We can do the same thing with defense. Say Players A and B both made 300* outs, either putouts or assists, in center field. However, while Player A made these outs in 1,000 “chances”—that is, balls hit to the outfield—Player B only had 800 chances to make these plays. All else being equal, Player B was more valuable on defense.
*Keep in mind that these numbers are entirely fabricated and probably very unrealistic.
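To make the arithmetic behind both comparisons concrete, here is a minimal sketch using the same fabricated numbers from above. The function name `conversion_rate` is mine, purely for illustration; it is not an established statistic.

```python
def conversion_rate(successes, opportunities):
    """Share of opportunities that were converted (RBI or outs)."""
    return successes / opportunities

# RBI example: 100 RBI in 800 vs. 700 potential RBI (made-up numbers).
rbi_a = conversion_rate(100, 800)
rbi_b = conversion_rate(100, 700)

# Defense example: 300 outs in 1,000 vs. 800 chances (made-up numbers).
outs_a = conversion_rate(300, 1000)
outs_b = conversion_rate(300, 800)

print(f"RBI rate:  A={rbi_a:.3f}  B={rbi_b:.3f}")
print(f"Out rate:  A={outs_a:.3f}  B={outs_b:.3f}")
```

In both cases Player B comes out ahead, but only because every opportunity is counted as if it were equally hard.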
You probably see what’s wrong with both of these situations. While we now have the core measure along with the number of “opportunities” each player had to accumulate them, we have very little sense of the quality of these opportunities.
Let’s go back to RBI. Remember that we had two players, A and B, the first of whom had 100 RBI in 800 opportunities, and the second of whom had 100 RBI in 700 opportunities. If we take that at face value, we assume that all of those opportunities were created equal. Even if we remove the batter himself from the RBI opportunities, we are left with many different situations in which an RBI was a possibility.
Driving in a runner on third with less than two outs is much easier than driving in a runner on first with less than two outs. And yet, they are treated the same when we simply group all opportunities together. In order to truly understand how well a player drove in runs given their opportunity to do so, you can see that we have to look at each situation separately and somehow determine the number of runs that we expect them to drive in, compared to how many they actually drove in.
Luckily, we have a statistic that does almost exactly that, though few people know of it: RE24. In RE24, we look at how many runs we expect the entire team to score in the inning before the play, and compare that number to the number of runs we expect the team to score in the inning after the play, counting any runs that scored on the play itself. In this sense, it’s not exactly like RBI, because RE24 rewards advancing runners and getting on base. However, in the sense that it measures run production relative to quality of opportunity, it gets at the same core goal as RBI.
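Here is a rough sketch of how that before-and-after comparison works for a single play. The run-expectancy values in the table are placeholders I made up for illustration, not real league figures, and the names `RUN_EXPECTANCY` and `re24_for_play` are hypothetical.

```python
# Hypothetical run-expectancy values (NOT real league data), keyed by
# (runners_on_base, outs). "1--" means a runner on first only.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.90,
    ("--3", 1): 0.95,
    ("---", 2): 0.10,
}

def re24_for_play(before, after, runs_scored):
    """Change in expected runs across a play, plus runs that scored on it."""
    # after=None means the play ended the inning, so nothing more is expected.
    end_value = 0.0 if after is None else RUN_EXPECTANCY[after]
    return end_value - RUN_EXPECTANCY[before] + runs_scored

# Sacrifice fly: runner on third, one out -> bases empty, two outs, one run in.
sac_fly = re24_for_play(("--3", 1), ("---", 2), runs_scored=1)

# Leadoff single: bases empty, no outs -> runner on first, no outs.
single = re24_for_play(("---", 0), ("1--", 0), runs_scored=0)

print(f"sac fly: {sac_fly:+.2f}   leadoff single: {single:+.2f}")
```

With these placeholder values, the leadoff single earns more credit than the sacrifice fly even though only the sacrifice fly produces an RBI, which is exactly the gap between the two statistics.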
Can we do the same with defense? Well, sort of. Remember that we have Player A making 300 outs in 1,000 opportunities and Player B making 300 outs in 800 opportunities. Now those “opportunities” aren’t quite the same as the corresponding RBI opportunities because they probably include many, many balls that the centerfielders simply could not have reached.
Nevertheless, like RBI, in our pool of opportunities we have a range of difficulty and likelihood of turning the ball into an out, albeit to a much larger extent. A lazy fly ball straight to the centerfielder may have a 99% chance of being caught, but a line drive in the gap may have literally no chance of being caught. We can’t just give the two fielders the same credit for every opportunity, just like we can’t give every batter the same credit for an RBI opportunity.
Can we do the same thing on defense that we did for offense with RE24? Sort of. See, the problem with doing this is twofold. One, quality of opportunity in defense is not discrete; no two batted balls are the same, and there is no way to know the probability that any given batted ball will be turned into an out. Two, even if we could know the “quality” of each defensive opportunity, we have no way of measuring it. Theoretically, if we had FIELDf/x, or some other tool in which we could measure the exact trajectory of every batted ball, and we had a reasonable estimate of the probability of each of those balls turning into outs, then we could measure defense with relative accuracy.
But we don’t. And if the situation is even possible, it probably won’t happen for a long time. So we can’t do with defense what we do with offense and RBI and RE24. Unlike with RBI opportunities, we can’t accurately measure the quality or difficulty of each fielding opportunity.
Advanced defensive metrics like UZR, DRS, and FRAA try to solve this dilemma through a variety of methods that I’m not going to outline here. But each is forced to make assumptions about the quality of fielding opportunity. And because each must make said assumptions, each must be regressed significantly in small samples, and moderately in larger samples.
This regression is smart, and if we want to be as accurate as possible, we absolutely should regress defense. But, there’s a caveat, and it’s a caveat that I don’t often see made. When we regress defensive metrics to the mean, we reduce the overall range of runs saved.
In other words, say UZR, at this point in the season, is distributed normally—that is, a bell curve—over all fielders such that 99 percent of players will have a UZR between -10 and 10. And, while these numbers aren’t accurate on an individual level, I think it’s reasonable to assume that they are somewhat accurate as far as representing the true distribution of defensive runs saved (not to be confused with DRS). What I mean is, though UZR, in a short sample, is very inaccurate when looking at each player individually because of the uncertainty of measuring quality of opportunity, it is still likely to have a relatively accurate range of defensive value.
But, if that is in fact the case, then regressing UZR (or another defensive metric) will necessarily underrate great defenders and overrate bad defenders. Because now, by UZR, all those 7-10 run players are now 4-7 run players, even though 7-10 run players do exist. We just don’t know who they are.
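A toy simulation can show the squeeze. This is only a sketch under assumptions I am making up: true defensive value is normally distributed with a spread of 4 runs, a short-sample measurement adds 6 runs of noise, and we regress by the textbook amount (the share of observed variance that is real signal). None of these numbers come from actual UZR data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_SD = 4.0    # assumed spread of true defensive value, in runs
NOISE_SD = 6.0   # assumed short-sample measurement noise, in runs

true_runs = [random.gauss(0, TRUE_SD) for _ in range(2000)]
observed = [t + random.gauss(0, NOISE_SD) for t in true_runs]

# Shrink each observation toward the mean (0) by the share of
# observed variance that is signal rather than noise.
reliability = TRUE_SD**2 / (TRUE_SD**2 + NOISE_SD**2)
regressed = [reliability * o for o in observed]

spread_true = statistics.pstdev(true_runs)
spread_regressed = statistics.pstdev(regressed)

print(f"spread of true values:      {spread_true:.2f} runs")
print(f"spread of regressed values: {spread_regressed:.2f} runs")
```

Under these assumptions the regressed numbers are bunched more tightly than the true values they estimate: each individual estimate is better on average, but the elite and the awful both get pulled toward the middle.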
Why is this important? Not for the sake of accuracy, for regression helps us become more accurate, but for the more practical sake of understanding what we are measuring, and what we know. When we regress defensive metrics to make them more accurate, those players that truly are elite—or, I should say, have performed at elite levels thus far—are penalized (assuming their UZR was positive), and our new assessment of them reduces their contributions in our eyes.
But they exist, and we have to remember that they exist, because we have to remember that even with drastically regressed defensive numbers, WAR is not a perfect representation of a player’s contribution to the team. If two players, like Mark Reynolds and Elliot Johnson, for instance, have a difference of, say, eight runs or so before we consider defense, then it is likely that Reynolds has been better. And if we regress UZR or DRS or whatever, WAR will tell us that Reynolds has been better. But if Johnson has actually been an elite defender and/or Reynolds has been an awful defender, then they may have actually been equally valuable. We don’t know, and we probably shouldn’t assume, but it’s a possibility that we must take into account.
Here’s the short version of all those words that I just wrote. Defense measures outs, in the same way that RBI measures runs. But unlike what RE24 can do with RBI opportunities, our current defensive metrics cannot accurately measure the likelihood that balls in play will turn into outs. So we must base our metrics on assumptions. In doing so, we sacrifice accuracy, so in order to get that accuracy back, we regress our defensive metrics. But when we regress our defensive metrics, we also penalize those players that shouldn’t be regressed. We don’t know who those players are, but they exist, and we must remember that they exist. If we remember that they exist, but that we don’t know who they are, then we can both agree with people like Jon Heyman that some WAR values seem fishy, but disagree with him in his assumption (an assumption that I am admittedly assuming) that they must be wrong.
Thanks to Tom Tango and the discussion over on his blog for the inspiration for this article.