Getting defensive

Fielding analysis has always been like the search for the Holy Grail of sabermetrics. We’re ever so close yet ever so far from reaching our destination. Hitting is pretty straightforward, with singles, doubles, triples, home runs, and so on all neatly categorized into buckets and credited to individual hitters. Evaluating pitching has its complications, many of which can be attributed to the interplay between pitchers and fielders, but it is still relatively “easy,” as we can spot good pitching by looking at things like strikeouts, walks, and home runs allowed.

Fielding, though, is much tougher. It shouldn’t be, right? All we really want to know is who makes the most plays, based on their opportunities. The problem is that while we know how many plays a fielder has made, we don’t have an exact measurement for that player’s opportunities. As a basic refresher course, let’s run down some of the fielding stats that have developed through the years.

Non-play-by-play metrics

Non-play-by-play fielding metrics attempt to take seasonal data and estimate fielding prowess. They take a very broad view of fielding and don’t look at things like the exact location and speed of a batted ball.

Range Factor, developed by Bill James in the late 1970s, is one of the first attempts at evaluating fielding performance beyond fielding percentage and errors. It measures, quite simply, how many plays (putouts and assists) a player makes per nine innings. Range Factor does a great job of telling us who made the most plays, but it fails to address opportunities in any way. Therefore, players who get more opportunities to field batted balls, for any number of reasons, will look better than they really are.
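To make the arithmetic concrete, here is a minimal sketch of Range Factor as described above: plays made (putouts plus assists) per nine innings. The function name and the two season lines are hypothetical, made up purely for illustration.

```python
def range_factor(putouts: int, assists: int, innings: float) -> float:
    """Plays made (putouts + assists) per nine innings in the field."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 9 * (putouts + assists) / innings


# Hypothetical season lines for two shortstops with identical innings.
# Shortstop A simply had more balls hit his way, which Range Factor cannot see.
ss_a = range_factor(putouts=250, assists=480, innings=1300.0)
ss_b = range_factor(putouts=220, assists=410, innings=1300.0)

print(f"Shortstop A: {ss_a:.2f} plays per nine innings")
print(f"Shortstop B: {ss_b:.2f} plays per nine innings")
```

Shortstop A comes out ahead, but nothing in the calculation tells us whether he was better or just busier.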

Advances were made on James’ simple construct, like Tom Tippett’s Adjusted Range Factor. As its name implies, Adjusted Range Factor makes some simple yet important adjustments in an attempt to better estimate a fielder’s opportunities. For instance, Tippett uses balls in play while the player was in the field rather than just innings, and further adjusts for things like batter-handedness and a pitching staff’s groundball tendencies.
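The switch from innings to balls in play can be illustrated with a rough sketch. This is not Tippett’s actual formula: it ignores his handedness and groundball adjustments, and the function names and numbers below are assumptions chosen only to show why the denominator matters.

```python
def plays_per_nine(plays: int, innings: float) -> float:
    """Innings-based rate, in the spirit of plain Range Factor."""
    return 9 * plays / innings


def plays_per_100_bip(plays: int, balls_in_play: int) -> float:
    """Opportunity-based rate: plays made per 100 balls in play."""
    return 100 * plays / balls_in_play


# Hypothetical fielders with the same innings behind different pitching staffs.
# The high-strikeout staff allows fewer balls in play, so fewer chances.
behind_high_k_staff = {"plays": 600, "innings": 1300.0, "bip": 3600}
behind_low_k_staff = {"plays": 680, "innings": 1300.0, "bip": 4080}

per_nine_high = plays_per_nine(behind_high_k_staff["plays"], behind_high_k_staff["innings"])
per_nine_low = plays_per_nine(behind_low_k_staff["plays"], behind_low_k_staff["innings"])
per_bip_high = plays_per_100_bip(behind_high_k_staff["plays"], behind_high_k_staff["bip"])
per_bip_low = plays_per_100_bip(behind_low_k_staff["plays"], behind_low_k_staff["bip"])

print(f"Per nine innings: {per_nine_high:.2f} vs {per_nine_low:.2f}")
print(f"Per 100 balls in play: {per_bip_high:.2f} vs {per_bip_low:.2f}")
```

In this made-up case the two fielders look different per nine innings but identical per ball in play, which is the kind of distortion the adjustment is meant to remove.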

Similar statistics were developed, but for the most part, Tippett’s Adjusted Range Factor gets us about as far as we can go while using only aggregate season data.

Play-by-play metrics

Play-by-play metrics use data that analyze every batted ball, such as its hit location on the field and its estimated speed. These data are collected and distributed by companies like Baseball Info Solutions and STATS Inc.

Zone Rating is the most basic of the play-by-play metrics. It breaks the field up into distinct zones and assigns zones of responsibility to each position. It then counts how many plays a fielder makes in his zone and how many batted balls actually went through that zone, yielding a percentage of plays made per chance. Then it adds in out-of-zone plays and, voila, a more precise measure of fielding ability is created. Zone Rating has its problems, but it is nonetheless an essential building block for all of the advanced systems that followed. [Image: STATS Zone Rating grid]
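As a rough sketch of the Zone Rating idea, the function below divides plays made by chances. Folding out-of-zone plays into both the numerator and the denominator is one common convention and is an assumption here, not necessarily the exact STATS formula; the counts are hypothetical.

```python
def zone_rating(plays_in_zone: int, balls_in_zone: int, out_of_zone_plays: int = 0) -> float:
    """Share of chances converted into outs.

    Out-of-zone plays are added to both plays and chances (an assumed convention).
    """
    chances = balls_in_zone + out_of_zone_plays
    if chances <= 0:
        raise ValueError("need at least one chance")
    return (plays_in_zone + out_of_zone_plays) / chances


# Hypothetical shortstop: 370 balls hit into his zone, 300 converted,
# plus 30 plays made outside the zone.
in_zone_only = zone_rating(plays_in_zone=300, balls_in_zone=370)
with_out_of_zone = zone_rating(plays_in_zone=300, balls_in_zone=370, out_of_zone_plays=30)

print(f"In-zone rate: {in_zone_only:.3f}")
print(f"With out-of-zone plays: {with_out_of_zone:.3f}")
```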

MGL’s Ultimate Zone Rating (UZR), described here and here, is perhaps the most cited defensive metric today, freely available at FanGraphs. It is based on the original Zone Rating concept; however, it makes many useful adjustments to enhance its value. For one, the field is split up into many more small zones. A fielder’s performance is judged in each zone and compared to league average in that zone. Further, a bunch of other adjustments are made, including park factors, batted-ball speed, and base/out context.
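The core comparison, a fielder’s plays in each zone against what a league-average fielder would have made on the same balls, can be sketched as follows. This is a simplified illustration and not MGL’s actual implementation: it skips the park, speed, and base/out adjustments, and the zone names, league rates, and the run value per play are all assumed numbers.

```python
from dataclasses import dataclass


@dataclass
class ZoneLine:
    zone: str           # hypothetical zone label
    balls: int          # batted balls hit into this zone while the fielder was on the field
    plays: int          # how many of those the fielder turned into outs
    league_rate: float  # assumed league-average out rate for the position in this zone


def plays_above_average(lines: list[ZoneLine]) -> float:
    """Sum over zones of (actual plays - plays an average fielder would make)."""
    return sum(line.plays - line.league_rate * line.balls for line in lines)


RUNS_PER_PLAY = 0.75  # assumed run value of turning a hit into an out

lines = [
    ZoneLine("up the middle", balls=100, plays=40, league_rate=0.35),
    ZoneLine("straightaway", balls=200, plays=180, league_rate=0.90),
    ZoneLine("in the hole", balls=120, plays=50, league_rate=0.45),
]

extra_plays = plays_above_average(lines)
extra_runs = extra_plays * RUNS_PER_PLAY
print(f"Plays above average: {extra_plays:+.1f}")
print(f"Runs above average: {extra_runs:+.2f}")
```

Breaking the field into more zones changes nothing in the arithmetic; it only makes each league-average rate more specific to the balls the fielder actually saw.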

John Dewan’s Plus/Minus is another one of the Goliaths of fielding stats, yet is quite similar to UZR. It is also available at FanGraphs. Plus/Minus again slices the field into vectors and credits or debits a fielder for making (or not making) plays in specific areas of the field, also considering the speed and type of the batted ball. There are plenty of differences between the two stats, but the basic constructs are very similar.

What are we missing?

As you can see from above, there’s a lot we can do with fielding statistics, advancing from basic Range Factor to Ultimate Zone Rating, and from aggregate seasonal data to much finer, more granular play-by-play data. In the end, though, how close are we to truly, accurately measuring fielding performance? What are we missing?

  • A benchmark for comparison. If, say, Derek Jeter is 20 runs below average every year, that is fine, but how do we verify that it’s correct? Sure, it’s nice that the metric (or metrics) is consistent, but what if it is consistently wrong? How can we determine if UZR or Plus/Minus is right on the money or way off the mark, if we aren’t able to verify our measurements against something we know is right?
  • Positioning. Fielders are always moving around, based on the batter, the pitcher, the count, the specific situation, and so on. If we want to measure fielding performance, wouldn’t we want to know the fielder’s initial positioning? It can be argued, perhaps correctly, that positioning is part of fielding; therefore we don’t need to know it—it’s already part of each metric, and a player gets credited for good positioning and docked for bad positioning.

    That said, it is certainly something that would greatly help further our understanding of defensive performance and sort out who truly has great range, as commonly defined, versus those who position themselves optimally.

  • Hang time. While speed labels (“hard,” “medium,” and “soft”) and batted-ball types (line drive, fly ball, fliner) may serve as decent indicators of hang time (or a similar calculation on ground balls), they fall short of a precise measurement. There are developments in this area, but hang time has been largely ignored and is not included in most advanced fielding stats.
  • Consistent and unbiased data. The data that go into each of these stats are collected by humans (from either Baseball Info Solutions, STATS Inc., or MLB Gameday) and are probably far from perfect. Likely, they are biased in various ways. Imagine trying to distinguish a fly ball from a “fliner” from a line drive, or classify a grounder as “hard,” “medium,” or “soft,” all while watching from the press box or from a television feed. Not easy, I don’t think. Furthermore, as Colin Wyers has shown on these very pages, visual discrepancies between press boxes and ballparks, amongst other things, may help to skew a stringer’s perspective.

Consider the case of Tampa Bay Rays shortstop Jason Bartlett:

Year  Team   UZR (runs)  Plus/Minus (runs)
2005  Twins  +13         +13
2006  Twins  +12         +7
2007  Twins  +9          +15
2008  Rays   +2          -2
2009  Rays   -6          +4
2010  Rays   -10         +2

Bartlett was tremendous, by the numbers, in three seasons with the Twins. Upon moving to Tampa Bay, however, he transformed into an average, at best, defensive shortstop. It is plausible that Bartlett has simply declined, and that these numbers reflect, somewhat accurately (if not precisely), his fading fielding ability due to age, not to mention regression to the mean. However, it also seems possible that something about moving from the Twins to the Rays (the pitching staff, ballpark, stringers, teammates, etc.) has played a part in Bartlett’s fall from a great fielder to an average-to-below-average one, something over which Bartlett has had no control.
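A quick bit of arithmetic on Bartlett’s numbers shows both the drop between teams and how far the two metrics can drift apart in a single season. The figures are copied from the table above; the helper function is just an illustrative convenience.

```python
from statistics import mean

# (year, team, UZR, Plus/Minus), all in runs, taken from the table above.
bartlett = [
    (2005, "Twins", 13, 13),
    (2006, "Twins", 12, 7),
    (2007, "Twins", 9, 15),
    (2008, "Rays", 2, -2),
    (2009, "Rays", -6, 4),
    (2010, "Rays", -10, 2),
]


def team_average(rows, team):
    """Average UZR and Plus/Minus for the seasons spent with one team."""
    uzr = [row[2] for row in rows if row[1] == team]
    plus_minus = [row[3] for row in rows if row[1] == team]
    return mean(uzr), mean(plus_minus)


twins_uzr, twins_pm = team_average(bartlett, "Twins")
rays_uzr, rays_pm = team_average(bartlett, "Rays")

# Season-by-season disagreement between the two metrics (Plus/Minus minus UZR).
gaps = {year: pm - uzr for year, _, uzr, pm in bartlett}

print(f"Twins: UZR {twins_uzr:+.1f}, Plus/Minus {twins_pm:+.1f}")
print(f"Rays:  UZR {rays_uzr:+.1f}, Plus/Minus {rays_pm:+.1f}")
print("Plus/Minus minus UZR by year:", gaps)
```

With the Twins the two systems average within a run of each other; with the Rays they disagree by about six runs per season, and by 10 and 12 runs in 2009 and 2010.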

Where are we going?

Interestingly, some of the fielding stats created in recent years seem to have taken a step back from the likes of UZR and Plus/Minus, at least in terms of detail. The idea, however, is that if the added detail might not really be measuring fielding skill effectively, perhaps a broader approach would provide increased accuracy, at least in the long term.

Peter Jensen’s Big Zone Metric uses hit-location data collected by MLB Gameday. Jensen divides the field into four zones in the infield and three zones in the outfield, with each position being responsible for one big zone. He then looks at a fielder’s performance both in and out of his zone, and makes various adjustments while comparing the player’s performance to league average.

Colin Wyers’ new Fielding Runs Above Average (nFRAA) is designed to be as factual as possible, not using any type of data (hit-location, batted-ball data, etc.) that could introduce any systematic bias. Wyers theorizes that in using this type of detailed data in an attempt to reduce measurement error, we are simultaneously introducing potential biases. We are not sure if we are actually getting more accurate or less. So instead, his approach is to be as broad and factual as possible, eliminating potential biases, while creating a measurement that can be accurate over a long period of time.

Tom Tango’s With or Without You, described in the 2008 Hardball Times Annual, takes a far different approach. For example, Tango looks at what percentage of batted balls, say, Jason Bartlett fielded when David Price was pitching, versus the percentage fielded by all other shortstops behind Price. With large enough samples, such as Derek Jeter’s, which Tango used in his THT article, the results can be quite fascinating. The same approach can be applied to the ballparks the fielder played in, the hitters the fielder faced, and so on.
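The with-or-without comparison can be sketched in a few lines. This is only an illustration of the idea, not Tango’s actual procedure: weighting each pitcher by the smaller of his two samples is an assumption, and the pitcher labels and counts are hypothetical.

```python
def wowy_rate_difference(pairs):
    """Weighted average of (rate with the fielder - rate without him) across pitchers.

    Each pair is (plays_with, bip_with, plays_without, bip_without) for one pitcher.
    Each pitcher is weighted by the smaller of his two samples (an assumed choice).
    """
    total_weight = 0
    weighted_diff = 0.0
    for plays_with, bip_with, plays_without, bip_without in pairs:
        if bip_with == 0 or bip_without == 0:
            continue  # no comparison possible for this pitcher
        weight = min(bip_with, bip_without)
        diff = plays_with / bip_with - plays_without / bip_without
        weighted_diff += weight * diff
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no pitcher has both a with and a without sample")
    return weighted_diff / total_weight


# Hypothetical samples: balls in play and plays made by our shortstop
# versus all other shortstops behind the same two pitchers.
pairs = [
    (60, 400, 26, 200),  # Pitcher A: 15.0% with, 13.0% without
    (45, 300, 70, 500),  # Pitcher B: 15.0% with, 14.0% without
]

difference = wowy_rate_difference(pairs)
print(f"Extra plays per ball in play with our shortstop: {difference:+.3f}")
```

Because each comparison holds the pitcher constant, differences between pitching staffs wash out without any hit-location data at all, but the samples behind any one pitcher are small, which is why large career totals matter so much here.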

While these approaches offer a fresh contrast to the small-zone, play-by-play metrics covered above, I don’t believe they are adequate for the multimillion-dollar decisions that teams often face. What if we only have a couple of years of data to go by to decide who the Tampa Bay Rays’ starting shortstop is going to be? While these metrics may be preferred for career value, they probably do not do a good enough job of measuring fielding ability over a relatively small sample, at least not to the degree of accuracy we’d prefer.

What we really want is the detail of the play-by-play systems like UZR and Plus/Minus combined with the unbiased nature of, say, nFRAA. Enter Fieldf/x. Similar to PITCHf/x, Fieldf/x will record all of the movements that occur on the baseball field (batted balls, fielders, base runners, a swarm of midges) with high-resolution cameras installed at each stadium. Instead of relying on subjective estimates of ball speed and location, we’ll have rock-solid evidence of everything that happens on the field—a batted ball’s location, a fielder’s initial position and movement, etc. Undoubtedly, there will be plenty of issues to work through, not to mention the question of whether the data will be made public, but the possibilities of a Fieldf/x system could certainly be revolutionary. Fieldf/x could be that Holy Grail we’ve been in search of.

Currently, without any type of public Fieldf/x data, we have a lot of different fielding stats, all created with good intentions by very smart people. The issue is not so much in the metrics themselves; rather it’s in the data that go into these metrics. How much can we trust them? Do the biases cancel out over time or are they systematic? Are we still missing key elements like hang time or fielder positioning? Are the broad, less biased stats accurate enough in the short term? These questions are not easily answered, but at the same time we must resist the urge to simply trust fielding metrics at face value because we have no better alternative. At the very least, we must understand the uncertainty and the potential issues at hand.

Armed with MLB.TV’s 2010 archive, in the near future, I hope to watch some games (in detailed fashion) and attempt to play the role of a stringer, classifying batted balls and trying to judge fielding performance (albeit in a very limited sample). Certainly, I won’t bring the experience and knowledge of a true stringer, but I hope that it will illuminate the process and leave me (and you) with a better understanding of what goes into many fielding stats, and how much confidence we should have in them.

References & Resources
It is almost impossible to profile every fielding metric in an article while still making it readable. Here, however, I will provide links to and brief descriptions of a number of other fielding metrics, if you are interested in exploring the subject further.

Non-play-by-play

Fielding Win Shares: The fielding portion of Bill James’ Win Shares system follows a top-down approach of first looking at team fielding, then crediting individual players for their performance.

Fielding Runs: Pete Palmer’s Fielding Runs was another early fielding metric that attempts to measure a fielder’s worth by looking at how many plays he made and comparing that to a number of estimated chances.

Defensive Regression Analysis: Michael Humphreys’ fielding stat employs regression analysis to measure fielding.

Range: THT’s David Gassko takes a crack at a non-PBP metric, using things like batted-ball data and batter-handedness to estimate how many chances a fielder had.

Play-by-play

Total Zone: Sean Smith’s fielding metric uses Retrosheet play-by-play data, such as hit type and who fielded each hit, to estimate fielding performance.

Simple Fielding Runs: Dan Fox’s Simple Fielding Runs, similar to Total Zone, uses Retrosheet play-by-play data to analyze defense.

Probabilistic Model of Range: David Pinto’s PMR uses familiar data points such as direction of hit, hit type, and how hard a ball was hit to estimate fielding.

Spatial Aggregate Fielding Evaluation: Shane Jensen’s SAFE uses a smoothing function to estimate the probability of a player making an out, using play-by-play data from Baseball Info Solutions. SAFE was further discussed, in detail, at The Book Blog.


10 Comments
MikeS
13 years ago

This is just outstanding.

I have several problems with fielding metrics, myself.

1) Why isn’t the spread of defensive talent the same as for hitting or pitching?  What makes anybody believe that fielding talent clusters so much?  If a hitter or pitcher can be worth several full wins per season, why can’t a shortstop who has far more opportunities/game than a hitter be worth just as much with his glove?

2) Although most people admit that most of your arguments are correct, why do people still use fielding metrics to calculate a hitter’s overall talent or value?

3)  People tend to accept the best fielding metrics available as good or at least adequate.  The best of something can still suck.  This should be obvious to anyone who has seen the Pirates play.

4)  As you pointed out, how can you trust a metric that varies so widely.  Some guys have good or bad years with the bat but the variability within the total sample is not as volatile as with fielding.  Barring injury, few players will go from slugging .450 to .300 and back again year after year.  Guys go from good to bad to good again at random in the field which should be a sign to a researcher that perhaps there is a problem with their collection methods.  Too often people assume that the data must be good, so the player is inconsistent.  Of course, they then say fielding isn’t as important as hitting so it doesn’t really matter (see #1).

Overall, I just think that people need to understand that if you can’t measure something it does not mean it isn’t important, it just means you can’t measure it.  Discarding it out of hand or trusting bad data is never a good idea.

jinaz
13 years ago

On your proposed project, I’ve been doing something like this over the past season when scoring games on my iPhone with iScore.  I think you’ll find that you will inevitably a) mark balls closer to fielder positions when they are outs, and in the gaps when they are hits, and b) be more prone to mark balls as hit “hard” when they are hits as opposed to outs (and vice versa).  I find myself doing these things even though I KNOW they are problematic biases (of course, I have also known that I have no intention of doing anything with the data I’m scoring; it’s just a fun way for me to watch a ballgame).  I think this is especially likely to be a problem on televised games, but it probably still will happen when watching a game live at the park.

How much it actually changes fielding estimates, I dunno.  My guess is that it’s a matter of a few runs per season.  But I don’t know that.
-j

this guy
13 years ago

This article is about 10 years overdue. Glad someone finally wrote it.

Geoff
13 years ago

MikeS,
  I think the reason for the clustering of fielding value is that a hitter will get something like 500-700 PA in a season, but the fielder with the most chances only gets half of that (most will have far fewer than the top shortstop), and even then a lot of those plays would be easy, routine plays, whereas all at-bats are an exercise in talent and focus.

Given this, it’s hardly surprising that the best and worst gloves are only amounting to -20 to +20ish runs a season.

Mike Fast
13 years ago

Geoff, your point about number of fielding chances is a good one, but I’ve become skeptical of the assertion that most fielding plays are at the extremes of difficulty, either very hard or very easy.  I’ve not seen any evidence for that, and the evidence I have seen runs counter to that hypothesis.

Colin Wyers
13 years ago

Fielding value clusters because we cluster fielders.

Imagine for a second if you took all hitters and divided them into eight “troughs” based upon their ability as hitters, and then compared them ONLY to the hitters in that trough. (You could even do this with lineup spots.) What would that do to the observed spread of hitting value?

That said, I think that metrics using “zone” data, or analysis that tries to emulate zone fielding metrics (like some versions of TotalZone), will categorically understate the actual spread of fielding value.

Myron Logan
13 years ago

Good points all around, guys.

I was going to mention that Colin’s fielding metric appears to have a much larger spread than the zone-based ones.

Part of that may be due to range bias, which Colin talks about in a BP article (http://tinyurl.com/248sb95).

MikeS
13 years ago

Geoff.

You may be absolutely correct, but if you are not, then I would suggest it’s that kind of thinking that may just be rationalization of a bad model.

Rick Swanson
13 years ago

The formula we need is time divided by distance. We need to use feet and not meters.

The further you range in the shortest amount of time the lower the number will be.

R/R is the term I coined in 2004. Reaction Time over Range Distance.

FIELDf/x will give baseball that range distance number.

fredf
13 years ago

Good article, but a few things: what about retrstats? And there’s an article somewhere on the internet for SS that uses assists for all SS per team of any year and ranks them top to bottom. You could adjust for groundball to flyball and righty/lefty, but it’s interesting.