Using Gameday to build a fielding metric (Part 3)

by Peter Jensen
March 17, 2009

I spent a couple of articles detailing a methodology for turning publicly available Gameday data into an advanced fielding metric. Now for the results.

Here is a link to download an Excel spreadsheet with 2008 BZMs. The 2008 BZM All Stars by position (based on Linear Weight Runs Saved per 150 games) were:

1B: Lance Berkman (-29)
2B: Adam Kennedy (-19)
SS: Mike Aviles (-23)
3B: Scott Rolen (-28)
LF: Jay Payton (-31)
CF: Jody Gerut (-36)
RF: Franklin Gutierrez (-41)

The spreadsheet speaks for itself. But the numbers may not always speak clearly, so I’ll be happy to answer questions about them. There are some surprise rankings, but they’re mostly what you would expect. I have made every effort to check for mistakes in calculations, but that doesn’t mean that mistakes haven’t occurred. If you spot anything that looks suspicious, please let me know. I’ll welcome the fresh eyes. There are 249 Access queries for the fielding metric portion alone, so there was plenty of opportunity to make stupid mistakes.

Here are some advance answers to some general questions that I am sure will come up.

Are the Gameday data as good as STATS or BIS data? No. I have already discussed the problem of Gameday giving the location where the ball is picked up instead of where it lands on base hits. In addition, some fielding locations are missing and others are clearly in error. The Gameday data also had to be normalized because of differences in the hit recording diagrams, and the normalization process itself will create errors.

Won’t this lack of quality in the Gameday data make the BZM numbers much less accurate than UZR or Plus/Minus? Even though the raw Gameday data are less accurate than BIS or STATS raw data, they are still very close. The omissions and errors that I mentioned above are present in only about 1 percent of the raw data and are less frequent in the most recent years.

All three data sources depend on human observation and no one knows which of the human observers has been most accurate in the data collection. The variation caused by errors in the Gameday data appears to be very small compared to the variation caused by the human observation process. And the final differences in the processed numbers of BZM, UZR and Plus/Minus are much more a result of the analysis within the metrics themselves than in the data they use.

Why did you choose to use the single supersize zone instead of the smaller zones in UZR and Plus/Minus? One reason was the question of accuracy in the Gameday raw data. A single supersize zone meant that any inaccuracies would affect the analysis only at the two edges of the zone instead of at all the edges of the smaller zones. But even with guaranteed accurate raw data, a single super zone may still be the best method.

You want your fielding metric to be measuring data that reflect an actual skill difference of the fielders and not a quirk in the distribution of hit balls to those fielders. The small sample of a single year’s data, divided into the even smaller samples of the smaller zones, makes it far more likely that you are measuring quirks in hit ball distribution instead of skill.
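The sample-size point can be illustrated with the ordinary binomial standard error. This is only a minimal sketch of per-zone noise, not part of the BZM calculations: the chance count, out rate and number of zones below are hypothetical, and real small zones are not equally busy.

```python
import math


def out_rate_standard_error(out_rate: float, chances: int) -> float:
    """Binomial standard error of an observed out rate over `chances` balls."""
    if chances <= 0:
        raise ValueError("chances must be positive")
    return math.sqrt(out_rate * (1.0 - out_rate) / chances)


# Hypothetical inputs, chosen only for illustration (not BZM data).
SEASON_CHANCES = 400   # balls hit toward one fielder in a season
TRUE_OUT_RATE = 0.70   # assumed out rate on those balls
SMALL_ZONES = 16       # assumed number of equally busy small zones

# One supersize zone uses every chance; each small zone sees only a share.
se_super_zone = out_rate_standard_error(TRUE_OUT_RATE, SEASON_CHANCES)
se_small_zone = out_rate_standard_error(TRUE_OUT_RATE, SEASON_CHANCES // SMALL_ZONES)

print(f"super zone: +/- {se_super_zone:.3f}")
print(f"small zone: +/- {se_small_zone:.3f}")
```

With equally busy zones, the noise in each small zone's out rate grows with the square root of the number of zones, which is the sense in which small zones are more exposed to distribution quirks.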

Why didn’t you then choose to use no zones at all like TotalZone, SFR and OPA!? I considered it. I had developed a whole-field fielding metric of my own several years ago. The problem that I saw was in calculating the expected outs. Having no zones meant that the areas of responsibility for adjacent fielders would overlap, making the calculation for expected outs for each fielder more complicated. I thought the supersize zone system was the better solution.

How important is the location of where you set the boundaries for each supersize zone? I truly don’t know. The main difference would be in the calculation for OOZ runs. Currently, if a player makes a play out of zone he gets the out value for making that play. On the other hand, he doesn’t receive the run value for saving a hit on that play as he does if the ball had been in his zone. I chose to do this because I felt there was no assurance that the ball would have been a hit rather than have been caught by the adjacent fielder.

There are certainly many cases in which it would have been a hit and the fielder making the OOZ play is being cheated out of runs that he deserves. But there are certainly cases where the adjacent fielder also could have made the play, so that giving the full hit value for an OOZ play made would be overly rewarding the player. There may be a compromise solution that would treat each position’s OOZ plays in an individual manner. Or changing the superzone boundaries a bit may yield better results.
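The trade-off between the current OOZ rule and a possible compromise can be sketched as a single function. The function name, the `hit_share` parameter, the run values and the sign convention (positive means runs saved) are all assumptions made for illustration; they are not taken from the BZM queries.

```python
def ooz_play_credit(out_value: float, hit_value: float, hit_share: float = 0.0) -> float:
    """Run credit for one out-of-zone play made.

    hit_share = 0.0 mirrors the rule described above: out value only.
    hit_share = 1.0 gives full in-zone credit: out value plus hit saved.
    Values in between are a hypothetical per-position compromise.
    """
    if not 0.0 <= hit_share <= 1.0:
        raise ValueError("hit_share must be between 0 and 1")
    return out_value + hit_share * hit_value


# Placeholder run values, not the linear weights used in BZM.
OUT_VALUE = 0.25
HIT_VALUE = 0.50

current_rule = ooz_play_credit(OUT_VALUE, HIT_VALUE)          # out value only
full_credit = ooz_play_credit(OUT_VALUE, HIT_VALUE, 1.0)      # treated as in zone
compromise = ooz_play_credit(OUT_VALUE, HIT_VALUE, 0.5)       # half the hit value

print(current_rule, compromise, full_credit)
```

A position-specific `hit_share` would be one way to express the idea of treating each position’s OOZ plays individually, but choosing those shares would need its own study.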

Why did you feel that it was better to use your method of park adjusting for outfielders than the traditional method of calculating park factors? I tried the traditional method and found that the year-to-year variation of the results was more than I thought was correct, even using multiple years. Some of that variation was because the traditional method is affected by changes in the other parks. Other parks don’t affect the park adjustment calculated by my method.

My park adjustment is biased by the quality of the fielders who actually played in that park. I don’t adjust for that and BZM would be more accurate if I did, but the calculations to do so seemed too involved for a minimal gain. The biggest problem would be in the new Washington park, where there is only one year of data on which to park adjust.

Why did you choose 1/12 as the weighting for home field stats in the park adjustment? Things would be a lot easier if there were still balanced schedules and no interleague play. I would then use 1/(x-1), where x is the number of teams in the league, and every team would be equally represented in each park for each separate league. I chose 1/12 for the home team because I wanted the home team to be represented at an amount that was closer to that of the teams in its own division than 1/13 or 1/15 would have been. I experimented with other fractions in my earlier whole field metric and found that the adjustment wasn’t particularly sensitive to the actual fraction that was chosen.
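To get a feel for why the choice of fraction matters so little, here is a rough sketch. It assumes, purely for illustration, that the park baseline is a simple blend in which the home team gets the chosen weight and the visiting teams share the rest. The actual park adjustment is described in the earlier parts of this series and may differ; the out rates below are hypothetical.

```python
from fractions import Fraction


def blended_rate(home_rate: float, visitor_rate: float, home_weight: Fraction) -> float:
    """Blend home and visiting fielders' out rates in one park (assumed form)."""
    w = float(home_weight)
    return w * home_rate + (1.0 - w) * visitor_rate


# Hypothetical out rates in one park, not measured values.
HOME_RATE = 0.72
VISITOR_RATE = 0.69

# The three candidate home weights mentioned in the text.
weights = [Fraction(1, 12), Fraction(1, 13), Fraction(1, 15)]
blends = {w: blended_rate(HOME_RATE, VISITOR_RATE, w) for w in weights}

for w, rate in blends.items():
    print(f"home weight {w}: blended out rate {rate:.4f}")

# Gap between the largest and smallest blended rate across the weights.
spread = max(blends.values()) - min(blends.values())
print(f"spread across weights: {spread:.4f}")
```

Under these assumptions the three fractions move the blended rate by well under a tenth of a percentage point, consistent with the observation that the adjustment is not very sensitive to the fraction chosen.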

You say that you used batter handedness as a factor in your calculations, but I don’t see it in the spreadsheet. Where is it? It is one step back in the process. The rankings are the final step where the numbers from left-handed and right-handed batters have been combined. Although there is a significant difference in the rates at which balls hit to the opposite field are fielded for an out, the differences in how each fielder performed were minimal, and I didn’t find them interesting enough to report.

Why didn’t you calculate arm ratings for outfielders like UZR and Plus/Minus do? I am not confident in the methods that I have seen used to calculate arm ratings. There are certainly differences in the abilities of outfielders to throw a ball strongly and accurately, but in the calculations that I have seen those differences in skill are overwhelmed by the variation created by other factors outside of the fielder’s control.

For now the difference between RVA/150 and LW/150 when calculated over multiple years reflects an outfielder’s arm skill and his strategy in adjusting to different base-out situations. For that reason I would use the multiple year RVA/150 for projecting a player’s future true ability when I had multiple years available. Otherwise, I would stick with LW/150 and assume that arm ratings are not that important.

Are you going to do BZMs for pitchers and catchers? Pitchers, yes, eventually. Catchers, no. Most of a catcher’s value is in how he handles his pitching staff, and very little of it is in fielding hit balls. I haven’t tackled pitchers’ fielding yet because it is not usually calculated and I wanted to make the other numbers available for comparison purposes.

Are you going to evaluate bunts, popups and line drives for infielders and ground balls for outfielders? Bunts, yes, eventually, for third basemen, first basemen and pitchers. Popups and line drives, probably no. Ground balls for outfielders, I’m not sure.

Covering bunts is a fielding skill and should be evaluated. Covering popups is also a fielding skill. Unfortunately, I am missing a vital piece of information needed to evaluate that skill properly: I need to know how many foul popups don’t get caught. Fielding infield line drives is mostly a matter of being in the right spot at the right time and of variations in hit ball distribution. Covering outfield ground balls and preventing extra bases is a skill and should be evaluated, but Gameday doesn’t give enough information to do so. With HITf/x it might be possible.

Finally, thanks to Retrosheet for making this kind of analysis possible. Some of the information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at http://www.retrosheet.org.
