Almost three months ago, I wrote an article introducing a new defensive metric that used The Hardball Times’ batted ball data to measure individual defensive performance, calling it Range. I found that it was overall pretty good, with some problems in right field and at first base. The 2005 calculations were then made available in The Hardball Times Baseball Annual 2006.
A little over a month ago, Chris Dial wrote an article over on Baseball Think Factory that compared Range to UZR (the best fielder rating tool known to man), as well as Zone Rating, using correlation and mean error as his comparison tools. Correlation measures how closely two sets of numbers track each other, with 1 meaning that they have a perfect positive linear relationship, and -1 meaning that they have a perfect negative linear relationship. Generally speaking, a correlation of .5 means that there is some relationship between two variables, and .7 means that the relationship is strong.
However, correlation has no interest in the magnitude of two data sets; if all my fielder ratings were three times as great as UZR, the correlation would still be one, but the conclusions that I might reach (that Darin Erstad is the greatest player of all-time, for example) would be ludicrous. That’s why Dial also used mean error, which measures the average difference between two variables. If one rating was on a whole different scale than UZR, mean error would make that (painfully) clear.
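To make the distinction concrete, here is a minimal Python sketch. The ratings in it are invented purely for illustration (they are not real UZR or Range numbers), and I am assuming that "mean error" means the average absolute difference between two ratings, since the exact formula is not spelled out above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def mean_abs_error(xs, ys):
    """Average absolute difference between paired values (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Made-up reference ratings in runs above/below average (illustration only).
reference = [-10.0, -5.0, 0.0, 5.0, 10.0]
# A hypothetical second system whose ratings are exactly three times as large.
tripled = [3 * r for r in reference]

print(pearson(reference, tripled))         # essentially 1: the scale is ignored
print(mean_abs_error(reference, tripled))  # large: the scale mismatch shows up
```

The two systems agree perfectly on who is better than whom, so the correlation is essentially one, but the mean error exposes that one of them is on the wrong scale.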
Let’s move past the statistical explanations, and look at what Dial found. His study found that the correlation between UZR and Range was .61 overall, with my main problems, as I had already known, in RF (correlation = .14), and first base (correlation = -.28). This prompted me to re-tool my ratings at those two positions, given how well Range did at other positions.
After re-doing Range in right field to better estimate the number of line drives caught by right fielders, I found that my ratings had barely changed, and the same was true for the correlation with UZR. So I removed one troublesome player: Nick Swisher. Range said that Swisher was 17 runs below average, UZR thought he was 23 runs above average, and ZR said he was a perfect 0. Clearly there are huge problems with his rating, so I thought that was the right thing to do.
The correlation between Range and UZR in right field immediately shot up to .54 (.55 using the new Range calculations in RF). That’s still not great; however, for whatever reason, non-play-by-play systems have huge problems with right fielder ratings. Baseball Prospectus’ ratings, for example, showed a .03 correlation with UZR right fielder ratings with Swisher removed. So I’m pretty happy with what I have here. With Swisher removed, the overall correlation between UZR and Range went to .68.
But I still had huge problems at first base. Here, I needed to completely overhaul my system, which had previously been based on Bill James’ idea of independent putouts. Instead, here’s what I now do:
First, I find each team’s outs on ground balls. I subtract infield assists from that, then add back first basemen’s assists and adjust for double plays. This way, I’m able to find how many ground ball outs a team’s first basemen made. I divide this number among a team’s individual first basemen based on the proportion of first base assists each made. Finally, I subtract expected plays made from that number; expected plays are based on the number of ground balls the team allowed and the percentage of its innings played by the first baseman.
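The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the actual Range code: the function name, the parameter names, and all the numbers in the example are hypothetical. The double play adjustment is passed in as a ready-made number because its formula is not given here, and the expected-play rate per ground ball is likewise an assumed input.

```python
def first_base_plays_above_expected(
    team_gb_outs,          # team outs recorded on ground balls
    infield_assists,       # all infield assists, first basemen included
    team_1b_assists,       # assists by all of the team's first basemen
    dp_adjustment,         # double play correction (formula not specified; assumed input)
    player_1b_assists,     # assists by the first baseman being rated
    team_ground_balls,     # ground balls allowed by the team
    player_innings_share,  # fraction of team innings this player spent at first base
    expected_rate,         # assumed expected plays per ground ball for a first baseman
):
    # Ground ball outs made by the team's first basemen as a group.
    team_1b_gb_outs = (
        team_gb_outs - infield_assists + team_1b_assists + dp_adjustment
    )
    # Split those outs by each player's share of first base assists.
    assist_share = player_1b_assists / team_1b_assists
    player_gb_outs = team_1b_gb_outs * assist_share
    # Expected plays scale with ground balls allowed and playing time.
    expected_plays = team_ground_balls * player_innings_share * expected_rate
    return player_gb_outs - expected_plays


# Invented numbers, for illustration only.
plays = first_base_plays_above_expected(
    team_gb_outs=1800,
    infield_assists=1600,
    team_1b_assists=100,
    dp_adjustment=50,
    player_1b_assists=80,
    team_ground_balls=2000,
    player_innings_share=0.8,
    expected_rate=0.17,
)
print(round(plays, 1))
```

A positive result means the first baseman made more plays than expected given his team's ground balls and his playing time; converting plays to runs is a separate step not shown here.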
How well does the system work? Well, the correlation at first base with UZR jumps from -.28 to .55. Removing Daryle Ward, whose Range rating is much closer to his ZR translation than his UZR rating is, the correlation is even better: .72. Increasing the number of first basemen in the sample (Mitchel Lichtman provided some UZR ratings for me that were not posted online) does not alter that correlation. Also keep in mind that my first base ratings give first basemen credit for lowering their infielders’ throwing error totals, which no other system does, as far as I know.
The overall correlation between UZR and Range is now .76 (Ward included), and the mean error has been improved from 12.2 to 10.4, which is almost as good as Zone Rating. All without actual zone data. You can now feel confident using Range ratings at every position by themselves or to “check” ZR, and make sure that it’s not off.
Also, for those who purchased the Hardball Times Annual (and if you haven’t, do so now), a spreadsheet with the new data will be uploaded to the Annual site. The per-150-game ratings, which were done incorrectly in the Annual, will also be there, this time in the correct format.
References & Resources
Thanks to Chris Dial for his great article on Range, Zone Rating, and UZR. Thanks to Mitchel Lichtman for UZR. And thank you to Baseball Info Solutions and The Hardball Times for providing batted ball data to the public, allowing for a multitude of research possibilities.