Editor’s Note: This is the fourth post of “10 Lessons Week!” For more info, click here.
Like many baseball analysts of my generation, my sabermetric interest was inspired by several revolutionary books in the 1980s. Most of you are probably familiar with them. When I was in my 20s, I voraciously read the Bill James Abstracts, the Elias Baseball Analysts, Pete Palmer’s and John Thorn’s The Hidden Game of Baseball, and Craig Wright’s The Diamond Appraised. In the 1990s, I also started to devour a few other publications, like Mike Gimbel’s Player and Team Ratings, and STATS Annual Baseball Scoreboards (as well as the STATS–later Bill James’–Handbooks).
Lesson #1: The People Behind Retrosheet Are Saints
After whetting my sabermetric appetite with those publications, I began doing a lot of my own research on evaluating offense and defense, springboarding off the work of many of the original pioneers like James and Palmer. I originally used play-by-play and batted ball location data provided by a volunteer organization of baseball fans and statheads who collected this data at the ballpark and from television broadcasts. The group was originally called Project Scoresheet, then The Baseball Workshop, and now Retrosheet. Some of the founders and principals of these organizations are Sherri Nichols, Dave Smith, and Gary Gillette, along with Pete Palmer, a long-time colleague and collaborator of Gillette. All of us owe a great deal of gratitude to these early data collectors and to those who currently compile and disseminate (for free) the Retrosheet data.
Lesson #2: Advanced Defensive Metrics Have Been a Long Time Coming
One of the first things I developed in the mid-’80s was a zone-based defensive metric (I don’t know that I called it anything in particular at the time), using the Project Scoresheet batted-ball location data. At around the same time, I had heard that Sherri Nichols and Pete Decoursey, two other early baseball researchers, were doing the same thing. They called their metric defensive average, not to be confused with the traditional fielding average.
A few years later, STATS, in 1989 or 1990, came out with its own Zone Rating (ZR) and presented it in its first annual Baseball Scoreboard. I think that all of us developed our own version of “zone rating” independently, as neither Nichols’ nor my work was disseminated broadly, and in fact, few people knew of their existence. Remember, this was all pre-internet, or at the very least, at the beginning of the internet, when a lot of baseball research was being shared and discussed on Usenet and other little-known “electronic bulletin boards” and the like.
Around 2000, I think, STATS came up with an Ultimate Zone Rating, whereby it assigned different values to catches or non-catches in various locations on the field for each fielder, rather than using one single zone for each fielder (and some shared zones). The assumption was that not every ball in a fielder’s zone was equally difficult to catch even though ZR treated them all the same. That might seem obvious today, but as with every new discovery or invention, it was apparently not so obvious at the time, and was considered somewhat of a breakthrough in defensive evaluation–at least by me.
For some reason, STATS abandoned this methodology after its initial presentation in the Scoreboard, and it was never heard from again, until John Dewan resurrected a modern, more advanced version, the plus-minus (PM), and eventually defensive runs saved (DRS), with BIS almost 10 years later. So the credit for the original Ultimate Zone Rating goes to STATS and not to me. I loved the concept and enthusiastically ran with it. I also kept the name, which was eventually shortened to UZR.
My work on UZR was never intended to be and never did become a commercial endeavor. I have spent hundreds of hours writing on, researching, and coding various incarnations of UZR over the years, and tens of thousands of dollars purchasing hit location data more advanced than that provided by Retrosheet and the early data providers. The only remuneration I have ever gotten is a small licensing fee from FanGraphs for the last few years. So, if anyone wants to accuse me of stealing the idea from STATS (not that anyone ever has, I don’t think), they would be completely justified. Basically, I loved the idea and decided to refine it. As they say, imitation is the sincerest form of flattery!
For some reason, my version of UZR has gained a lot of traction over the years and is often considered the de facto modern sabermetric defensive metric, despite the fact that there are many equally good and similar ones, including John Dewan’s DRS, David Pinto’s PMR, Michael Humphreys’ DRA, Shane Jensen’s SAFE, Sean Smith’s Total Zone, and others. I have to give John Dewan and everyone else at the original STATS company a lot of credit for never claiming that UZR was their original idea (which it was). In fact, I owe a lot of my early sabermetric inspiration to those STATS Scoreboards. They were wonderful publications filled with hundreds of well-researched and interesting questions about offense, defense, pitching, and other aspects of the game of baseball. I remember being pretty devastated when they ceased publication after the 2001 issue, I think.
By the way, if you want to read a very good, comprehensive history of defensive metrics, including UZR, I highly recommend this 2010 article by Dan Basco and Jeff Zimmerman from the SABR Baseball Research Journal.
Lesson #3: Asking the Right Question(s) Is Important
In order to understand UZR and defensive metrics in general, it is important to first ask the proper questions. In fact, the proper question is the most important thing when it comes to crafting any effective metric. One has to be perfectly clear what one is trying to capture in order for the metric to be any good, and the methodology has to do an adequate job in answering it.
Lesson #4: Defense Is Simple
So what are we trying to measure with UZR and how do we do it? Obviously we are trying to measure the quality of a fielder’s defensive performance, but what does that mean in the context of having access to certain batted ball data? Compared to hitting and pitching, surprisingly, evaluating defense is a lot easier–in a sense.
When a ball is put into play and is not a home run, there are only two things that can happen from a defensive perspective: Either the ball is turned into an out, or it falls for a hit or an error. Obviously, there are all kinds of other things that can happen after a batter reaches base safely or even if the batter is retired, which involve defense, but the bulk of the work implicated in evaluating defense starts and ends when the batter puts the ball in play and is either retired or reaches base. Even the value of the hit (single, double, etc.) doesn’t have a lot to do with the fielder if he isn’t able to turn the batted ball into an out. So, for now, we will focus on a binary outcome: Either a fielder turns a batted ball into an out, via a fly ball catch or a ground ball out at first or another base, or the batter reaches safely on a hit or an error. Seems easy, huh? Well, not really, as it turns out.
Lesson #5: Defense Is Far from Simple
Here is the key question when it comes to just about any good defensive metric, even the theoretical perfect one: Given the nature and location of each batted ball, how likely is it that an average fielder at each position would turn it into an out? If we knew the answer to that simple question, our job would be almost over and the results would be near-perfect. Let’s say that for a certain batted ball, say a fly ball to a certain location in center field, the answer to that question was “zero” for all fielders other than the center fielder, and for him it was 80 percent. First of all, how do we know those numbers? That’s simple too. We look at all such balls over some lengthy period of time, say five years, and we count how often each fielder catches each one of them and how often they don’t. In our example, no one but the center fielder ever catches that type and location of fly ball, and the center fielder catches it 80 percent of the time–a pretty routine fly ball that presumably all but the slowest or worst center fielders are able to catch (or those who are “out of position” for various reasons, which we shall discuss later).
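To make the counting step concrete, here is a minimal sketch in Python. It is not the actual UZR code; the function name `catch_rates`, the bucket labels, and the sample data are all made up for illustration. Each play is recorded as a bucket (type and location) plus the position that made the out, or `None` if the ball fell in.

```python
from collections import defaultdict

def catch_rates(plays):
    """Return {bucket: {position: share of balls in that bucket caught by that position}}.

    `plays` is an iterable of (bucket, fielder) pairs, where `fielder` is the
    position that recorded the out, or None if nobody did (hit or error).
    """
    totals = defaultdict(int)                      # balls in play per bucket
    outs = defaultdict(lambda: defaultdict(int))   # outs per bucket per position
    for bucket, fielder in plays:
        totals[bucket] += 1
        if fielder is not None:
            outs[bucket][fielder] += 1
    return {b: {pos: n / totals[b] for pos, n in outs[b].items()} for b in totals}

# Hypothetical five-year sample for one bucket: 80 catches by the CF, 20 non-catches.
bucket = ("fly", "medium-CF")
plays = [(bucket, "CF")] * 80 + [(bucket, None)] * 20
rates = catch_rates(plays)
print(rates[bucket])  # {'CF': 0.8}
```

Positions that never catch a ball in a bucket simply do not appear in that bucket's dictionary, which corresponds to the "zero" for every other fielder in the example.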
Once we know those numbers, our job is almost over. If a certain center fielder is on the field when that exact same batted ball is launched into the outfield, and he catches it, he gets credit for 20 percent of a catch. If he misses it, he gets debited 80 percent of a catch. A catch is worth the average value of an out which is around .25 runs in the current low run environment, plus the average value of a hit for that particular batted ball, which varies depending on its type and location. For example, a deep fly ball which is not caught might result in an extra base hit 75 percent of the time and a single 25 percent of the time, and a short fly ball not caught might result in a single 75 percent of the time and an extra base hit 25 percent of the time.
Let’s say that in our example, the average hit value of our batted ball is .6 runs, a little more than the value of a single. So, a catch, as compared to a non-catch, is worth .25 runs plus .6 runs, or .85 runs. A center fielder who catches the ball gets credit for 20 percent of .85 runs. When he doesn’t catch it, he is debited 80 percent of .85 runs. If our center fielder catches that ball 80 percent of the time, as often as an average center fielder, he would get credited .2 * .85, or .17 runs, 80 percent of the time, and debited .8 * .85, or .68 runs, 20 percent of the time, which, lo and behold, adds up to exactly zero runs! A fielder who makes plays as often as an average fielder at that position must, by definition, have a UZR of zero.
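The arithmetic is easy to check with a few lines of Python. This is only a sketch of the single-fielder case described here; the function name `play_credit` is invented for the illustration, and the run values are the ones used in the example.

```python
OUT_VALUE = 0.25  # average run value of an out, from the example
HIT_VALUE = 0.60  # average run value of a hit on this batted ball, from the example

def play_credit(caught, catch_rate, hit_value=HIT_VALUE, out_value=OUT_VALUE):
    """Runs credited (+) or debited (-) to the fielder for one batted ball."""
    swing = hit_value + out_value          # a catch versus a non-catch: .85 runs
    if caught:
        return (1 - catch_rate) * swing    # credit for the part an average fielder misses
    return -catch_rate * swing             # debit for the part an average fielder catches

credit = play_credit(True, 0.80)    # about +.17 runs
debit = play_credit(False, 0.80)    # about -.68 runs

# An average center fielder catches this ball 80 percent of the time.
expected = 0.80 * credit + 0.20 * debit
print(round(credit, 2), round(debit, 2), abs(round(expected, 6)))  # 0.17 -0.68 0.0
```

A fielder who catches the same ball 90 percent of the time would come out ahead by roughly .085 runs per ball, since .9 * .17 minus .1 * .68 is .085.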
Again, that seems like a simple, effective system. Once we know the type and location of a batted ball and how often each fielder catches it over the course of a season, we can tally up all of a fielder’s pluses and minuses and the result is a near-perfect accounting of a fielder’s performance, just like linear weights or wOBA for batters. Unfortunately, what seems like a simple answer to a simple question, yielding a perfect metric, turns out to be nothing of the sort. And here is where my 15-year headache begins.
Lesson #6: Multiple Fielders Complicate Things
The first of many problems I encountered was, “What to do with balls that are caught by one fielder and could have been caught by another?” Many batted balls hit to certain areas of the field are either turned into an out or not by one and only one fielder (at least during the pre-shift era) – for example a ground ball hit down the first or third base line. However, lots of balls are hit in areas in which at least two fielders can and do turn those batted balls into outs. For example, a fly ball or line drive in the left field gap might have a hit rate of 50 percent, a catch rate by the center fielder of 25 percent and a catch rate of 25 percent by the left fielder. That seems straightforward enough, at least on a hit, where we simply dock both of those fielders .25 balls each, or around .225 runs (25 percent of the sum of .65 for the hit and .25 for the out). But, what if the ball is caught? The fielder who catches the ball gets credit for half of a caught ball, or around .45 runs. What about the other fielder? For accounting purposes, we must dock him .25 balls, the same as if no one had caught the ball.
So, in that situation we end up penalizing each fielder the same amount whether no one catches the ball or the other fielder catches the ball. That doesn’t seem right, does it? Surely when the center fielder catches the ball, there is some chance that the left fielder could have caught the ball too, and vice versa. This situation often occurs when one fielder is a “ball hog,” that is, he tends to catch the majority of the batted balls that are catchable by more than one fielder (usually only one other).
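The oddity shows up clearly if we write the bookkeeping for the left field gap example as a small Python sketch. This follows the accounting exactly as described above, not any refined version of it; the function name `ledger` and the dictionary layout are invented for the illustration.

```python
SWING = 0.65 + 0.25                     # hit value plus out value for this ball, in runs
CATCH_RATE = {"CF": 0.25, "LF": 0.25}   # league catch rates for the gap ball
HIT_RATE = 0.50                         # share of these balls that fall for a hit

def ledger(caught_by=None):
    """Runs credited (+) or debited (-) to each fielder for one gap ball."""
    entries = {}
    for pos, rate in CATCH_RATE.items():
        if pos == caught_by:
            entries[pos] = HIT_RATE * SWING   # the catcher is credited half a ball
        else:
            entries[pos] = -rate * SWING      # everyone else is docked his catch rate
    return entries

nobody = ledger()          # ball falls in: both fielders docked about .225 runs
cf_catch = ledger("CF")    # CF credited about .45 runs, LF still docked about .225
print({p: round(v, 3) for p, v in nobody.items()})
print({p: round(v, 3) for p, v in cf_catch.items()})
```

The left fielder's entry is identical in both ledgers, which is exactly the problem: he is charged the same amount whether the ball drops or his teammate hauls it in.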
Lesson #7: Errors and Hits Are Not the Same
Another thing that I had to consider which, surprisingly, turned out to be a quagmire (a common theme in developing and refining UZR), was how to handle errors. Originally, in the early zone rating type metrics, errors were treated exactly the same as hits. After all, whether a batter reaches on an error or a hit makes little difference as far as potential run scoring is concerned, assuming the batter and runners end up on the same base(s). Remember, defensive metrics, even the most advanced ones, are primarily concerned with a binary outcome – either safe or out. Even most of the advanced metrics, like Dewan’s PM and Humphreys’ DRA, continued to treat errors the same as hits (other than perhaps the fact that their average run values might be slightly different).
I think I may have done the same thing when I designed my original simple zone rating system, but in thinking about UZR back in the ’90s, it occurred to me that errors and hits were very different in one regard even though their outcomes were very much the same. When a batted ball fell for a hit, we docked one or more fielders some percentage of a catch, depending on how often that batted ball type and location was normally turned into an out by each fielder. For example, if a ground ball in the shortstop hole is normally fielded by the shortstop 20 percent of the time, a difficult play, a shortstop gets docked 20 percent of a catch if he doesn’t make the play. What about that same batted ball that is scored an error? As I said, most defensive metrics treat it as if it were a hit, docking the shortstop the same 20 percent of a catch.
It occurred to me that the fact that the scorer deemed the play an error, by definition meant that the play could not have been a difficult one to make. So why are we docking the shortstop only 20 percent of a catch as if it were a tough play to make? I ended up using a different accounting method when a ball was scored as an error rather than a hit–one that assumes a relatively easy play, such that the fielder who makes the error is debited a larger percentage of a “play” than if the ball were a hit. Several analysts in the sabermetric community disagree with the way I handle errors. Some of their arguments are not unreasonable. Unfortunately, a full discussion of the “error issue” is beyond the scope of this article.
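To illustrate the general idea only, here is a small Python sketch. The actual rate used for errors is not given in this article, so the 90 percent figure below is purely an assumption, as are the names `ASSUMED_ERROR_CATCH_RATE` and `debit_fraction`.

```python
ASSUMED_ERROR_CATCH_RATE = 0.90  # hypothetical "easy play" rate, not a published UZR value

def debit_fraction(bucket_catch_rate, scored_error,
                   error_catch_rate=ASSUMED_ERROR_CATCH_RATE):
    """Fraction of a play debited to the fielder on a ball he did not convert."""
    if scored_error:
        # The scorer's ruling implies a routine play, so charge at least the easy-play rate.
        return max(bucket_catch_rate, error_catch_rate)
    return bucket_catch_rate  # ordinary hit: charge the bucket's normal catch rate

# Ground ball in the shortstop hole, normally converted 20 percent of the time.
print(debit_fraction(0.20, scored_error=False))  # 0.2
print(debit_fraction(0.20, scored_error=True))   # 0.9
```

Under this assumption, the same ball costs the shortstop 20 percent of a play when it is ruled a hit and 90 percent when it is ruled an error.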
Now, how is it possible that a batted ball can be of the same type and in the same location, yet be a difficult play one time and an error, or presumably an easy play, another time? Well, even though the answer to that is fairly obvious, it opens up a gigantic can of worms, and exposes us to the most problematic aspect of any advanced defensive metric, including UZR–positioning of the fielders, and the quality of the data.
Lesson #8: Park Factors Are Hard
Before we get to that, let’s talk a little about some of the more mundane issues I had to deal with, which also turned out to be somewhat problematic. I don’t know to what extent the other advanced defensive metrics handle park factors, but to me, they are quite important. Surely you can’t use league average catch rates for left fielders at Fenway or the vast expanses of Coors Field or even the short porches in Yankee Stadium and Minute Maid Park, among other quirky parks. As well, ground balls in Denver and Arizona scoot through the infield like a hockey puck on ice, whereas they get eaten up by the tall infield grass at Wrigley.
Again, using park factors to “neutralize” catch rates in the infield and outfield at the various parks seems pretty straightforward, but trust me, it’s not. For one thing, in the outfield, not all locations can be treated alike, even if we narrow the park factors to something like left field, center field and right field. For example, a short fly ball in left field at Fenway is more likely to be caught by an average fielder than one in Coors Field or Yankee Stadium, as the left fielder playing in front of the Green Monster is going to be normally stationed 10 or 20 feet shallower than in parks with a more expansive left field. And of course a long fly ball that might be a can of corn in most parks could be off the wall and uncatchable in left field at Fenway or in Cleveland, or in right field in Oriole Park. But where do we draw the distinction between a short and long fly ball, or left, center and right, to apply the appropriate park factors?
And what should those park factors be? We know that when computing and then applying park factors, for offense, pitching, or defense (e.g., UZR), we must regress the observed splits toward some mean or “league average.” If we know little or nothing about how or why a park might reasonably affect the stats we are adjusting, then we must assume that the mean is neutral and unbiased (i.e., a park factor of 100). However, for balls hit to left field at Fenway, or anywhere in the outfield at Coors, or ground balls through the hard and fast infields of Chase and Coors Field, we do know something even without observing the numbers.
Still, how do we establish those “means?” For example, if we observed that 70 percent of balls hit to a certain area in short left field are caught at Fenway, but only 50 percent are caught by the same fielders at all other parks, but our data cover only one year, what should we use as our park factor? Normally, we wouldn’t use the observed ratio, 7/5, or a PF of 140, because our sample size is so small–we have to regress that 140 heavily towards some mean, usually 100 (if we knew nothing about the park). But, we expect that balls hit to short left field at Fenway should be caught more often, even without looking at any of the numbers. So, what do we regress the observed PF of 140 toward? 110? 120? 150? Honestly, I have no idea. Again, there is probably a mathematically rigorous solution to this kind of problem, but unfortunately, it is above my pay grade!
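One common way to regress an observed value is a simple weighted average of the observation and the assumed mean. The Python sketch below uses that approach only to show how much the choice of mean matters; it is not the method UZR uses, and the sample size of 50 chances, the weight of 200, and the name `regressed_park_factor` are all assumptions made up for the illustration.

```python
def regressed_park_factor(observed_pf, chances, prior_pf=100.0, prior_weight=200.0):
    """Shrink an observed park factor toward a prior mean.

    `prior_weight` is a hypothetical number of "phantom" league-average chances;
    the larger it is relative to `chances`, the heavier the regression.
    """
    return (chances * observed_pf + prior_weight * prior_pf) / (chances + prior_weight)

# Observed PF of 140 on an assumed 50 chances, regressed toward three candidate means.
for prior in (100, 110, 120):
    print(prior, regressed_park_factor(140, chances=50, prior_pf=prior))
# 100 108.0
# 110 116.0
# 120 124.0
```

With so few chances, the answer is driven almost entirely by the mean we pick, which is why the question of what to regress toward is the hard part.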
Lesson #9: Fielders Are Everywhere
What about fielder positioning based on the outs and base runners and even the batters at the plate? If the batter is fast, the infielders are going to be playing shallower than if the batter were slow. Surely that affects their catch rates on ground balls–if you are forced to play shallow, hard hit ground balls are more likely to scoot through the infield. Same thing with outfielders and batter power. The more power the batter has, the deeper the outfielders must play. So I ended up grouping batters into various categories, slow, medium, and fast, low, medium, and high power, and then putting their batted balls into separate “buckets.” Again, not a particularly elegant solution, but, as they say, it’s good enough for government work!
I also faced a similar problem with respect to base runners and outs. Certain fielders would be positioned differently, depending on what runners were on what base, and to some extent, the number of outs. For example, with a runner on first and no outs, the first baseman almost always starts out on the bag, holding the runner, therefore limiting his range in the first base hole. At the same time, with less than two outs, the second baseman and shortstop are a bit shallower and pinched in towards the second base bag, in what is called “double play depth.” In some cases, with no outs, or with one out and a pitcher at bat, the third baseman is playing up in anticipation of a bunt. Again, these non-standard fielder positions must be accounted for in the UZR “engine.” And let’s not even talk about “shifts” which have been occurring at higher and higher rates around the league starting in around 2012 or 2013. Currently, UZR completely ignores batted balls that are affected by a shift.
As you can see, fielder positioning is a critical part of ascertaining how often a fielder “should” turn a certain batted ball into an out. To some small extent, it is actually part of a fielder’s skill–at least that is the refrain you often hear. For example, Cal Ripken Jr., an excellent fielding shortstop in his heyday, despite not being fleet of foot, was considered particularly adept at being in the “right place at the right time.” However, in the grand scheme of things, fielder positioning is much more a function of the batter and the game situation (and the manager and coaches), and thus a good fielding metric must try and somehow estimate and account for where each fielder might be positioned at any point in the game, based on the batter, pitcher, base runners, outs, score, inning, etc. As you can clearly see, this is not such an easy task.
Lesson #10: We Are at the Mercy of the Data
Finally, we get to perhaps the stickiest of problems in crafting and calibrating a batted-ball-based defensive metric like UZR–the quality of the data. Ideally, we want to know two things when a ball is put into play: One, the exact starting position of each relevant fielder, and two, the exact location and character of the batted ball. We’ve already discussed the first part and how we can estimate that. The second part is even more difficult and even under the best of circumstances can only be done on an aggregated basis.
For example, originally, with Project Scoresheet, the data recorded only four types of batted balls–fly balls, pop flies, line drives, and ground balls, and the recorded location was maybe a 10 foot by 20 foot swath of the field. Obviously not all fly balls or line drives are created equal, and all three “air ball” categories overlapped to some extent. As well, there were biases in the recording of the data, depending on the “stringer” (the person who does the data recording), the park, whether the game was being watched on television or in person, and even the players on the field (something called “range bias,” whereby the “rangier” the fielder, the closer to his original positioning the stringers would tend to record the location of the ball). Later on, with BIS and STATS, and other data companies, we had access to more granular data, such as the “exact” location of each batted ball, a few more “type” categories, such as fly ball and line drive “fliners” (in between a line drive and a fly ball), the speed of the ball, soft, medium, and hard, and in the last few years, the actual air or ground time, in tenths of a second. Needless to say, even with these tremendously helpful details, they are all only approximations and we are still at the mercy of the stringers’ judgment.
As you can see, while the premise of an advanced batted ball fielding metric seems relatively straightforward–how often is each type and location of batted ball fielded by position X, and did fielder Y make the play or not?–it is hardly that. Our primary impediments are fielder positioning and the quality of the data. The best we can do is approximate both, given the data we have access to. Then there are issues of ball hogging (fielder interactions) and park effects to contend with. With the introduction of some of the newer, computer and video-based data recording techniques, like HITf/x and FIELDf/x, we are getting closer and closer to the holy grail of fielding metrics. In the meantime, despite the gloom and doom you have just been introduced to, current advanced fielding metrics are actually quite good. Once we have a season or two of data, I think that they give us a relatively accurate indication of fielding talent and historical value. The methods employed by UZR and other defensive stats may seem a bit convoluted and even somewhat jury-rigged, but as Bill James once said, the messy truth is better than a tidy lie.