Using Gameday to build a fielding metric (Part 2)
by Peter Jensen
March 12, 2009
Editor's note: THT welcomes articles from guest writers, particularly if they represent a unique perspective or voice. You can learn more about Peter Jensen in the biography at the end of the article.
If you read Part 1 of this series, and you are still interested, you may be wondering whether the MLB hit location data is worth all the trouble. Is the data accurate enough to be useful in any serious analysis? It does suffer from a serious flaw.
Since it was created to fill the specific need of providing information for Gameday’s Player Hit Charts, a conscious decision was made to record where a hit ball was ultimately fielded rather than where it first hit the ground. So a line drive hit that barely clears the shortstop's glove and first hits the ground at 225 feet but rolls all the way to the wall will be recorded at 350 feet.
At first this may seem to make the data unusable, but for a fielding metric it's not much of a detriment. Assigning responsibility for hit balls to a specific fielder is more about having an accurate measure of angle than it is about having an accurate distance. Every analyst, of course, wants it all: every scrap of data that can be collected as accurately as possible. That’s why we often look starry eyed toward the future when electronic collection will tell us where the ball and the fielders are to within a fraction of a foot for every millisecond of the game. Oh, and we want it for free, of course.
The Gameday data is free, and two other sources of hit location data, STATS and Baseball Info Solutions, are not. BIS is the source for John Dewan’s plus/minus fielding metric which many feel is one of the best fielding metrics available.
Its main competitor is Mitchel Lichtman’s Ultimate Zone Rating (UZR), which has been calculated from both BIS and STATS data. Using the same metric on the two different data sources had the unexpected result of giving substantially different fielding values for identical players. This was seen as a setback for the confidence in defensive metrics in general, but for our purposes it’s more of an opportunity. It’s harder to criticize a metric based on Gameday hit locations when the two for-pay sources have such a large margin of error.
I don’t have the entire BIS and STATS datasets, but I do have hit-ball data from both sources for over 500 outs from last year’s Torii Hunter/Andruw Jones project. For that small subset, the standard deviations of the differences between Gameday and the BIS or STATS hit-ball angles were 3.1 and 3.4 degrees, respectively. The standard deviation of the differences between BIS and STATS was about 2.8 degrees.
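To make that comparison concrete, here is a minimal Python sketch of the calculation behind those numbers: take the per-ball angle differences between two sources and compute their standard deviation. The angles below are invented for illustration; the real figures came from the Hunter/Jones dataset.

```python
import statistics

# Sketch of the source-agreement check: for the same batted balls, compare
# the hit-ball angle recorded by two data sources and take the standard
# deviation of the per-ball differences. These angles are invented.
gameday_angles = [45.2, 78.1, 102.5, 33.0, 88.4, 61.7]
bis_angles     = [42.0, 80.5, 100.1, 36.2, 85.9, 64.0]

diffs = [g - b for g, b in zip(gameday_angles, bis_angles)]
sd = statistics.stdev(diffs)   # roughly 3 degrees for this toy sample
```

With the actual paired dataset, the same two lines would produce the figures quoted above.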
I concluded that a UZR or plus/minus type system that divides the field into small sectors and compares a fielder’s performance in each sector with an average fielder would probably suffer problems from this amount of potential error. The best use of the data would seem to be as a supplement to a whole-field type of fielding metric.
Several whole-field fielding metrics have been published over the last year or so. Sean Smith’s TotalZone was the first to be published, followed by Dan Fox’s Simple Fielding Runs (SFR), and PizzaCutter’s OPA!. Each uses the Retrosheet play-by-play data to construct a system that has more accurate inputs (and hopefully more accurate results) than non-PBP metrics. At the time they were conceived, their authors admitted that they could not hope to be as accurate as a system that included hit location data, but neither UZR nor Dewan’s plus/minus results were being made publicly available for the current year.
Since then the restrictions on UZR have been lifted; UZR fielding values are available on Fangraphs, and plus/minus results are available on Bill James Online. If UZR and plus/minus were in close agreement on players’ fielding values, there would be little value in constructing a new fielding metric based on Gameday data. But they are not. A Gameday-based fielding metric has just as much chance to be taken seriously on its own merits.
Plus, the process of constructing a metric from Gameday data has much to offer in evaluating the relative value of zone-based versus whole-field metrics, and how fielding metrics should be constructed in the future when more accurate hit-ball data becomes available.
Some people have said that it is logically impossible to construct a whole-field type fielding metric that will be as accurate as a zone-based system. On its face the argument seems to have merit. How can you gain accuracy by ignoring more detailed information?
The answer is, you can’t. You can’t if you are positive that the more detailed information is a true reflection of the skills you are trying to measure, and not the result of either measurement error or an erratic distribution caused by small sample size.
If measurement error and small sample size are potential problems, and they are with zone data, then a possible solution is aggregating the data into larger samples. Measurement errors tend to cancel out and distributions smooth in larger sample sizes. This is not a new concept in sabermetrics. Both our batting metrics and pitching metrics are based on aggregated data.
Where to begin? All fielding metrics share the same two basic concepts. How many plays did a fielder make compared to how many plays he "should" have made? And how many runs did he save on his made plays compared to the number of runs cost on the plays he did not make? Sounds simple, but the devil is in the details, and there are a lot of details.
"Plays made" is relatively simple. If a player fields a ball and either puts the batter out, by catching the ball in the air or forcing him at first, or forces another runner, it always counts as a play made. If the batter or runner would have been out but the fielder receiving the thrown ball makes an error, it still counts as a play made.
The only controversial decision on plays made is how to handle the fielder’s choice. Most analysts classify a fielder’s choice as a play made when an out occurs on the play. That is the method I have chosen. But there are arguments for ignoring all fielder’s choices, or classifying them all as plays made. An individual fielder usually averages fewer than two fielder's choices a year, so the practical consequences of choosing one method over another are minimal.
Whole-field systems and Dewan’s plus/minus count all of a fielder’s plays made in one calculation. Zone systems divide them into in-zone plays and out-of-zone (OOZ) plays. I have chosen to divide the infield into four large zones, each the responsibility of a fielder other than the pitcher. Likewise, the outfield is divided into three zones, one for each outfielder. Plays made are divided into in-zone plays and out-of-zone plays.
Deciding how to calculate how many plays a fielder should have made (Expected Outs) is one of the two crucial decisions for any fielding metric. For zone-based systems, the typical method is to calculate the percentage of plays that are made by a league average fielder for a ball in that zone and multiply that percentage times the number of balls hit in that zone for each fielder. This raw number can be further adjusted by park factors, a pitching factor, or a guesstimate of how hard the ball was hit.
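The zone-based expected-outs calculation described above can be sketched in a few lines of Python. The zone name and all totals are invented for illustration:

```python
# Sketch of the zone-based expected-outs calculation: the league-average
# out rate for balls in a zone, times the number of balls hit into that
# zone while this fielder was playing. All numbers here are invented.
league = {"ss_zone": {"balls": 5000, "outs": 4100}}  # league-wide zone totals
fielder_chances = {"ss_zone": 420}                   # balls hit into the zone

def expected_outs(zone):
    league_rate = league[zone]["outs"] / league[zone]["balls"]  # 0.82 here
    return league_rate * fielder_chances[zone]

exp = expected_outs("ss_zone")   # 0.82 * 420 = 344.4 expected outs
```

The raw number produced this way is what then gets adjusted by park factors or other corrections.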
The details of how each fielding metric handles these decisions are most often not available, and the differences between metrics can have large effects on the results. Dewan’s original plus/minus did not adjust for park effects for outfielders. He has since implemented a park effect by designating balls that hit the outfield wall above an outfielder’s reach as unfieldable. I believe that UZR park-adjusts in a conventional manner for both infielders and outfielders.
I have chosen to park-adjust for outfielders only, and to do it by a method used in some fashion by each of the whole-field metrics. Instead of creating a park-zone fielding factor based on home and away splits, I use a park-average fielding percentage based on a multi-year average of all fielders who have played that outfield position in that park. I have weighted the factor by using all the stats from visiting players and 1/12 the stats from home players; this way, a particularly good or bad home team fielder won’t overly influence the results. Other whole-field systems have chosen different weighting methods.
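As a sketch of that weighting scheme (with invented out and chance totals), the park-average fielding percentage combines full-weight visitor stats with 1/12-weight home stats:

```python
# Sketch of the park-average fielding percentage: visiting fielders' stats
# count in full, home fielders' stats at 1/12, so one unusually good or bad
# home fielder can't dominate the park factor. The totals are invented.
visitor_outs, visitor_chances = 900, 1100   # all visiting CFs in this park
home_outs, home_chances = 320, 380          # home CFs in this park

HOME_WEIGHT = 1 / 12
park_rate = (visitor_outs + HOME_WEIGHT * home_outs) / \
            (visitor_chances + HOME_WEIGHT * home_chances)
```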
I do not adjust for pitching quality or hit-ball speed. But I do adjust for the handedness of the batter.
I also make some adjustments for infielders that plus/minus or UZR might make (I'm not sure). It seems obvious that if the ball is fielded by a player before it reaches a shortstop then it shouldn’t count as a chance for the shortstop. So I remove from the total of hit balls in the shortstop’s zone all those balls fielded by a fielder in front of the shortstop. I make the same adjustment for all infielders even though it mostly affects shortstops and second basemen.
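A minimal sketch of that chance-removal adjustment, assuming hypothetical play records with a `fielder` field:

```python
# Sketch of the chance-removal adjustment: balls hit into the shortstop's
# zone but fielded by a player in front of him (most often the pitcher or
# a charging third baseman) are removed from his chances.
# The play records and position codes here are hypothetical.
IN_FRONT_OF_SS = {"P", "3B"}

balls_in_ss_zone = [
    {"fielder": "SS"},    # fielded by the shortstop: a chance
    {"fielder": "P"},     # cut off by the pitcher: not a chance
    {"fielder": None},    # went through for a hit: still a chance
    {"fielder": "3B"},    # fielded by the third baseman: not a chance
    {"fielder": "SS"},
]

ss_chances = [b for b in balls_in_ss_zone if b["fielder"] not in IN_FRONT_OF_SS]
```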
Whole-field metrics have an issue that zone systems don’t have: How to allocate the ground ball hits that go past an infielder or the air ball hits that fall between the outfielders. Because they have no hit-ball location information, whole-field metrics don’t know who the closest fielder was to the hit. The only solution is to use a fixed allocation based on league averages. But a fixed allocation cannot take into account the relative fielding abilities of the two adjacent fielders.
If the league average says that a shortstop usually is responsible for 60 percent of the ground ball hits that go between an average shortstop and third baseman, it causes significant inaccuracies to apply that percentage to a specific fielder when a shortstop is either much better or much worse than the third baseman next to him. This is one of the main failings of whole-field metrics and is the reason they report results with less range between good and bad fielders at a position than do zone-based systems.
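A tiny Python sketch makes the problem concrete. With an invented 60 percent league share and a hypothetical shortstop whose true share of the hole is lower, the fixed split systematically overcharges him:

```python
# Sketch of the fixed-allocation problem. A whole-field metric charges the
# shortstop a league-average share of the hits through the SS/3B hole, no
# matter who is playing next to him. All numbers here are invented.
LEAGUE_SS_SHARE = 0.60
hits_through_hole = 50
ss_charged = LEAGUE_SS_SHARE * hits_through_hole       # 30 hits on the SS

# Suppose this shortstop plays beside a poor-range third baseman, so his
# true share of responsibility (hypothetical) is lower.
true_ss_share = 0.45
ss_truly_responsible = true_ss_share * hits_through_hole

overcharge = ss_charged - ss_truly_responsible         # 7.5 hits wrongly on the SS
```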
Converting plays to runs
I end up with a plus/minus number of outs for each fielder’s one big zone and a separate plus/minus for the plays he makes outside of his zone. I then have to convert the plus/minus scores into runs. This conversion from plus/minus plays to runs is one of the two big sources of variation in the reported results of the different metrics.
The simplest method (let's call it the "generic value") is to multiply plus/minus plays made by the average linear weight difference between an out and an average hit, usually estimated at about .8 for infielders and .9 for outfielders. This method is simple, but there is no excuse for using it for all positions. For shortstops and second basemen it gives a reasonable value because almost all hits on plays at those positions are singles. But even for those two positions there are differences between fielders in the number of hits that are infield singles or singles to the outfield.
There are also differences in the number of errors made. Third basemen and first basemen have to adjust their on-field positioning due to the additional risk of allowing doubles if a hit ball gets by them on the foul line side. For outfielders, the distribution of singles, doubles and triples allowed is a key driver of their total runs allowed.
One possible solution is to track the plus/minus of infield singles, outfield singles, doubles, triples and errors just as you track plus/minus for plays made. You can then apply the appropriate linear weight (LW) to each event and create a plus/minus run total. Another solution is to measure the run value added (RVA) for each fielding event directly by calculating the increase of expected runs scored from the base-out state prior to the play to the base-out state after the play, adding any runs that actually scored on the play.
The advantage of the linear weights (LW) method is that it avoids run variation due to uneven distributions of base-out states. The advantage to the RVA method is that it includes runs differences from double plays and arm ratings without having to create separate calculations. It is also possible that fielders make strategic decisions on how they will play the ball for a given base-out state. A practical consideration is that the RVA method is much easier to calculate.
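Both conversion methods can be sketched in a few lines of Python. The linear weights, run-expectancy values, and plus/minus counts below are invented placeholders, not the actual values used by the metric:

```python
# Sketch of the two run-conversion methods. All weights, run-expectancy
# values and plus/minus counts are invented placeholders.

# Linear weights (LW) method: apply an event-level run weight to the
# plus/minus count of each event type. With this article's sign convention,
# a negative total means runs saved.
LW = {"out": -0.27, "single": 0.47, "double": 0.77, "triple": 1.04, "error": 0.50}
plus_minus = {"out": 12, "single": -6, "double": -4, "triple": -2, "error": 1}
lw_runs = sum(LW[event] * n for event, n in plus_minus.items())  # about -10.7

# Run value added (RVA) method: change in run expectancy across the play,
# plus any runs that actually scored. RE is a tiny excerpt of a
# run-expectancy table keyed by (bases, outs).
RE = {("1--", 1): 0.53, ("---", 2): 0.11}

def rva(state_before, state_after, runs_scored):
    return RE[state_after] - RE[state_before] + runs_scored

# A force out that erases the runner on first: expectancy drops, no runs score.
play_value = rva(("1--", 1), ("---", 2), 0)   # about -0.42 runs
```

The LW method needs plus/minus counts by event type; the RVA method needs only the base-out states before and after each play, which is why it is the easier of the two to calculate.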
I haven’t completely decided where I stand on the issues presented by the LW and RVA systems, so I am going to present both results for this metric. By now you are probably thoroughly confused. The best way to understand the methodology is to see an example.
Here is part of the 2008 center field line for Carlos Beltran. FB is a fly ball, LD is a line drive; EXP stands for Expected and ACT stands for Actual.
So the first thing we see is that Carlos had a very good year. He was a +12 for balls in his zone (subtracting the differences between Actual and Expected for both fly balls and line drives) and an astounding +17 on balls OOZ. His zone of responsibility ran from 74.7 degrees to 102 degrees for left-handed batters and 76.5 to 103.7 degrees for right-handed batters. It may seem counterintuitive to give the CF less than an angular third of the field, but you have to remember that there is a lot more square footage to cover in that 27-degree slice. I drew the boundaries where roughly 50 percent of the fly ball hits fell on each side of the line for the league as a whole.
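That boundary-drawing rule amounts to taking the median hit angle of the league's fly-ball hits in each gap, which is easy to sketch (with invented angles):

```python
import statistics

# Sketch of the boundary-drawing rule: set the zone line at the angle where
# half the league's fly-ball hits in the gap land on each side -- i.e., the
# median hit angle. The angles below are invented for illustration.
gap_hit_angles = [70.1, 72.4, 73.8, 75.0, 76.6, 78.2, 79.9]

boundary = statistics.median(gap_hit_angles)   # 75.0 degrees
```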
One might wonder whether Beltran’s +17 OOZ was due to ball hogging. I haven’t done a full study, but a preliminary look at the data seems to indicate that outfielders are pretty astute at judging the range ability of their fellow outfielders and positioning themselves so that they have about an equal chance of reaching hit balls between them.
The second observation from this data is that there is not a lot of room to show ability by catching more than the average number of line drives. But a speedy center fielder can save a lot of runs by turning line drives into singles rather than doubles or triples. Beltran had nine fewer LD extra-base hits than expected and seven fewer FB extra-base hits. Let’s look at how this all translates into runs, using our three run methodologies:
Beltran by any measure had a very good year. Oh, and minus values are good and positive values are bad; that's just the way I think about runs on defense. Minus means a run saved. I know it’s weird, but get used to it, because that’s what you are going to see in the leaderboards and spreadsheets in the next article.
Anyway, linear weights has Beltran saving 16.7 runs in 2008, not quite as much as the generic approach at 27.6 and a little more than Run Value Added at 14.8. Translated into a rate stat, linear weights is 16.3 runs saved per 150 games, where a game is defined as the average number of chances per game for a center fielder (4.2). Defining games by chances makes more sense than using innings, since what we really want to know is how many runs Beltran will save per opportunity.
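The rate-stat arithmetic can be sketched as follows; the season chance total here is a hypothetical number chosen only to illustrate the scaling, not Beltran's actual total:

```python
# Sketch of the rate-stat arithmetic: a "game" is defined as the average
# number of chances per game for a center fielder (4.2), and runs saved are
# scaled to a 150-game season. The season chance total is hypothetical;
# minus means runs saved, per this article's convention.
runs_saved = -16.7
season_chances = 645
CHANCES_PER_GAME = 4.2

games = season_chances / CHANCES_PER_GAME      # about 153.6 "games"
rate_per_150 = runs_saved / games * 150        # about -16.3 runs per 150
```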
Just for fun, let’s compare these results with UZR and Dewan's plus/minus. UZR has Beltran saving seven runs in 2008. Interestingly, UZR shows 377 expected outs and 418 putouts; I have him at 385 expected outs with 414 plays made. My slightly lower plays-made total comes from ignoring line drive outs OOZ and ignoring plays for which Gameday has no hit location data. How MGL interprets +41 plays made over expected as being worth only seven runs will have to be explained by him.
Plus/Minus gives Beltran 404 expected outs and +14 plays made. This is adjusted to +24 in the "enhanced" model and 14 runs saved (not including runs saved with his arm).
Since the methodology is different for infielders, let’s look at one. Here are Alex Gordon’s 2008 numbers at third base:
HB is the number of ground balls hit in the third base zone (40 degrees to 65.6 degrees for right-handed batters, and to 64.7 for left-handed batters). PP is the number of plays made by the pitcher in that zone (five is a little above average). Third basemen ranged from 0 PP for Evan Longoria in slightly fewer games to 12 for Casey Blake in fewer games but more HB. OUTS are non-double play outs.
Gordon does a little better than expected, with four more outs and two more outs OOZ. For a third baseman, OOZ outs means that he was moving to his left into the shortstop's zone. Combined with the higher-than-usual number of pitcher plays, this might be an indication that batters were hitting balls off his pitchers with less than usual speed off the bat. Or he might be cheating a little towards short. Or he might just be better at going to his left.
I_1B are infield singles. His much-higher-than-usual number could mean that he got to more hit balls, preventing them from going to the outfield. Or it could mean that he was slower getting the ball out of his glove and/or not getting as much on the throw to first. SSS is shortstop saves. It is the number of balls in the 3B zone that get by the third baseman and are fielded by the SS for an out. The lower than average number means he wasn’t getting much help from Mike Aviles and the other KC shortstops.
O_1B is the number of outfield singles. Sixty-five is the expected number, 62 was Gordon’s actual number and 57 the projected actual number if Gordon had had an average ability shortstop playing next to him. 2B is the number of ground ball doubles that were picked up in the 3B zone. Apparently, Gordon wasn’t cheating toward the shortstop to get his OOZ outs. Allowing only four actual doubles for the number of chances he had is extraordinary. His differential of 13 was the best in the league. Only Bill Hall, with 11, came close. Finally, Gordon’s 16 DPs in 45 opportunities was three fewer than expected.
Gordon was good in 2008, and projects to be a better-than-average third baseman in the future. However, all of his good stats and all of the areas where he underperformed could be partially explained if his pitchers were allowing hit balls that were hit with less speed than average. With slower-hit balls he is able to field more of them, allowing fewer to get by him to either be fielded by the SS or turn into outfield singles or doubles. But he also has less time to make the throw to first or to second for a double play so he appears to underperform in those areas. We anxiously await Hit f/x to provide more answers.
With all of this Gordon managed a LW/150 of -7.8 (remember minus is good), 12th in the league behind Scott Rolen’s leading -28.7. But remember those five extra outs Gordon would have had if he had an average SS playing next to him. That is counted in his run total, so his projection would be even better.
UZR has Gordon at a UZR/150 of 4.8 runs worse than average. Dewan has him at -9 GB plays worse than average and a rank of 27th. So there is plenty of disagreement here. Both UZR and Dewan use either pitcher quality and/or the speed of the ball to make adjustments to the raw data—I don’t do either. Perhaps they have concluded that Gordon was getting really, really easy ground balls to field. Time will tell.
I hope you have seen the possibilities that having detailed fielding numbers can provide for analysis. The only thing left for today is to give my fielding metric a name. Since its main feature is the use of supersize zones I am going to call it “Big Zone Metric” or BZM. Part 3, which will run next Tuesday, will take a look at the best BZMs for 2008 at each position.
When he was ten, Peter caught a foul ball hit by Ted Williams at Griffith Stadium. He keeps hoping, but so far life hasn't gotten any better than that.