From Twitter: UZR and Plus/Minus

Two days ago, Fangraphs added the BIS +/- fielding statistics to its pages. Over the past couple of days, Twitter was the scene of a conversation comparing +/- Defensive Runs Saved (DRS) with Ultimate Zone Rating (UZR), which is also available at Fangraphs and is based on the same underlying BIS batted ball data set. The conversation was between Colin Wyers, me, and several other contributors. Rather than let it sink into the abyss of old forgotten tweets, I decided to preserve it here, with Colin’s permission.

Please consider that the following was not intended as an in-depth research presentation by any of the participants, and it was subject to the 140-character limit of the Twitter medium. I have lightly edited the transcript to improve readability.

Colin Wyers: I’m not sure what I think about this yet: http://bit.ly/aSQfXB
CW: For ’09, correlation between BIS DRS and UZR (same pbp data source) is .79. RMSE is 3.1.
CW: Should note – that RMSE is for typical player, or someone with about 66 BIZ chances as defined by BIS. Starter has ~ 300 BIZ.
CW: For players with > 1200 innings at a position, RMSE between UZR and BIS DRS is 7.5.
azruavatar: Is there anything detailing the +/- methodology?
CW: BIS periodically publishes the Fielding Bible. For difference between +/- and UZR, try this: http://bit.ly/bGC0rb
CW: Major difference between UZR and +/-, IIRC, is the baselines – UZR uses three-year baselines, +/- uses one year.
MNTwinsZealot: I understand John Dewan’s Fielding Bible +/- system, but how accurate is it? Should it be used just the same as UZR?
CW: I guess that depends on how you want to define accuracy.
CW: UZR has a higher year to year correlation with next season UZR than DRS has with next season DRS. Not sure it’s significant.
CW: UZR also seems to be a better predictor of next-season DRS than DRS is of next season UZR. Again, not sure if this is a significant result.
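
For readers who want to see what the correlation and RMSE figures above measure, here is a minimal Python sketch. It is not the code Colin used; the function names `pearson_r` and `rmse` and the five-player data set are made up for illustration. Correlation asks whether the two metrics rank players similarly, while RMSE asks how many runs apart the two ratings typically are for the same player.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(xs, ys):
    """Root-mean-squared difference between paired values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Made-up runs-saved totals for five hypothetical fielders (not real data).
drs = [10.0, -4.0, 2.0, 7.0, -8.0]
uzr = [6.0, -1.0, 3.0, 9.0, -5.0]

print("correlation:", round(pearson_r(drs, uzr), 2))
print("RMSE (runs):", round(rmse(drs, uzr), 1))
```

Because RMSE is in runs, it grows with playing time, which is why the figure for full-time players is larger than the figure for a typical player.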

…Colin and I carried on a chat and email conversation, not recorded here, which led to some of the following tweets…

Mike Fast: Split into infield vs. outfield. UZR does better with outfielders, DRS better with infielders, relatively speaking.
CW: That’s interesting. I need to look at this closer.
MF: Also, I don’t find the correlation between Yr-1 error and Yr-2 error to be very high. R-squared = 0.07 (2003-09, >400 innings)
MF: Where error = DRS/inning – UZR/inning. It’s not insignificant, but less than I had expected from what you had said.
MF: UZR Yr2-v-Yr1 outfield R-squared 0.21, infield 0.19. DRS Yr2-v-Yr1 outfield 0.12, infield 0.19. (2003-2009, >400 inn).
CW: You’re normalizing by playing time, which is probably better than what I was doing. Can you look at the park switchers, too?
MF: Wow, ugly. For team switchers, UZR Yr2-v-Yr1 outfield R-squared 0.01, infield 0.06. DRS Yr2-v-Yr1 outfield 0.00, infield 0.02.
CW: That is ugly, especially once you consider that both are adjusted for park.
Robbie Griffin: I’d be interested in hearing how you feel. I saw some of your comments on the UZR vs. Dewan conversation at the Book Blog.
CW: At this point, I don’t know what these numbers mean. And I’m starting to lose confidence that what we’re measuring is defense.
CW: That last tweet probably came off as MORE optimistic than intended. I’ve been losing confidence for months – not sure I have any left.
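
The year-to-year R-squared numbers quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual analysis: the `Season` record layout, the helper names, and the sample rows are all hypothetical, and it assumes one row per player per year at a single position. It pairs each player-season with the same player's next season, keeps pairs with more than 400 innings in both years, optionally keeps only players who changed teams, and then computes R-squared on a per-inning rate.

```python
from collections import namedtuple

# Hypothetical record layout: one row per player-season at a single position.
Season = namedtuple("Season", "player year team innings drs uzr")

def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

def year_pairs(seasons, min_innings=400, switchers_only=False):
    """Pair each season with the same player's following season."""
    by_key = {(s.player, s.year): s for s in seasons}
    pairs = []
    for s in seasons:
        nxt = by_key.get((s.player, s.year + 1))
        if nxt is None:
            continue
        if s.innings <= min_innings or nxt.innings <= min_innings:
            continue
        if switchers_only and s.team == nxt.team:
            continue
        pairs.append((s, nxt))
    return pairs

def uzr_rate(s):
    return s.uzr / s.innings

def drs_rate(s):
    return s.drs / s.innings

def metric_gap(s):
    # The "error" defined in the tweet above: DRS/inning minus UZR/inning.
    return drs_rate(s) - uzr_rate(s)

def yty_r_squared(pairs, rate):
    """R-squared between Year 1 and Year 2 values of a per-inning rate."""
    return r_squared([rate(a) for a, _ in pairs], [rate(b) for _, b in pairs])

# Made-up rows for illustration only (not real players or results).
sample = [
    Season("A", 2008, "X", 1200, 8.0, 5.0), Season("A", 2009, "Y", 1100, 1.0, 4.0),
    Season("B", 2008, "X", 900, -3.0, -6.0), Season("B", 2009, "X", 1000, -5.0, -2.0),
    Season("C", 2008, "Z", 1300, 12.0, 9.0), Season("C", 2009, "Z", 1250, 6.0, 10.0),
]
pairs = year_pairs(sample)
print("UZR/inn R^2:", round(yty_r_squared(pairs, uzr_rate), 2))
print("DRS/inn R^2:", round(yty_r_squared(pairs, drs_rate), 2))
print("gap R^2:", round(yty_r_squared(pairs, metric_gap), 2))
```

Passing `switchers_only=True` to `year_pairs` restricts the sample to players who changed teams, which is the comparison discussed in the tweets that follow. With only a handful of pairs the result is mostly noise, so a real run needs many player-seasons.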

Dan Turkenkopf: So some concerns about the quality of data that we’re basing a lot of our analysis on?
CW: Mike Fast and I have both found that the correlation for UZR year-to-year drops substantially if you look only at team switchers.
Graham MacAree: hardly a big surprise there.
CW: Combine that with my findings on the batted ball data itself (and there’s more that should be published soon), and, well…
RG: Is that for just +/- or all defensive metrics currently en vogue?
CW: You have two potential sources of discrepancy – methodology and the underlying data. At this point I have questions about both.
CW: @MacAree They’re both supposed to be park adjusted. Unless you have some other explanation…
GM: ‘supposed to be’ being the major question mark.
DT: Pitching staffs?
CW: Again, that’s SUPPOSED to be controlled for. If it isn’t, that’s a real problem.
Matt Klaassen: “and well,” what, back to range factor? FRAA?
RG: My main question would be along those lines: is it still probably better than anything else even with the issues?
CW: I don’t have good answers yet.
GM: My money is on us having a real problem. If our data is bollocksed we can’t do anything.
CW: That’s where I’m at right now as well.
GM: I don’t think I’m too disheartened by this thought – as long as the logic is sound I’m happy. Just need better data.
CW: For want of a stopwatch, the kingdom was lost.
MF: Except hope that Trackman and/or FIELDf/x some day become operational and public in some fashion.
GM: Right. As long as we’re attacking problems in the right way the source data will catch up.
CW: I think once you’ve identified the problems in the existing data, you can do more with it. We’re getting closer there.
MF: That’s a good point, Colin.

MF: Based on team-switchers, looks like outfield data is hopeless, wonder if that is mostly due to the LD counting issue you found?
MF: Team-switcher data shows UZR still has promise for infielders. What would the park/team bias be? How hard the groundballs are?
CW: There are two questions about the batted ball data – I’ve really only addressed trajectory. Location data could have issues too.
MF: Breakdown by position (sample size ~20 players) for team-switchers: UZR good for SS, 3B; mediocre for LF; useless for RF,CF,2B,1B.
MF: Breakdown by position for team-switchers: BIS +/- very good for SS; mediocre for LF; useless for RF,CF,3B,2B,1B.
MF: General insight: averaging UZR and BIS +/- does nothing to improve the year-to-year predictive power.
GM: This is for UZR/150 so pitcher staff GB rate won’t matter, yes?
CW: UZR/Inn, so that shouldn’t be a factor, right. I say shouldn’t but at this point I don’t know.
GM: Defensive innings, I imagine? so DG/9?
CW: Actual innings played, not derived innings.

CW: Can we look at year-to-year correlation for ExO in players who DON’T switch?
MF: I can’t find ExO on Fangraphs any more, is it okay to use BIZ?
CW: Hrm. Looks like they pulled DG as well. I don’t think BIZ quite gets at what I’m measuring, but it could work.
MF: Team-switchers BIZ/inn for outfielders y-t-y correlation R^2=0.06, for non-switchers R^2=0.18.
MF: Team-switchers BIZ/inn for infielders y-t-y correlation R^2=0.55, for non-switchers R^2=0.65.
MF: The difference between BIZ and ExO being ExO includes a measure of difficulty, e.g. how hard the scorer thought the ball was hit? Is that what you’re saying?
CW: Yeah. Still, I wonder how much of the BIZ correlation is pitcher tendencies and how much is scorer bias.

CW: My latest article on batted ball bias – http://bit.ly/bYycrn (This one’s free to nonsubscribers.)


6 Comments
Detroit Michael
13 years ago

Given this statement:

“CW: Major difference between UZR and +/-, IIRC, is the baselines – UZR uses three-year baselines, +/- uses one year.”

I believe this result is completely expected:

“CW: UZR has a higher year to year correlation with next season UZR than DRS has with next season DRS. Not sure it’s significant.”

Mike Fast
13 years ago

Michael, yes, there are definitely sample-size effects involved here.  That’s true for the fielding stats themselves as well as for the R-squared numbers or root-mean-squared errors that Colin and I reported.

One of the things we were trying to puzzle out is how much impact on the runs-saved season numbers came about simply because of the different sample-size baselines used by the two methods.

I think it’s reasonable to believe that there is a real impact from the sample-size baselines, but there seem to be other troubling issues in the data as well that make the whole picture very cloudy on the surface.

Some care will need to be taken to sort through these issues, discard the ones that are misleading, quantify the ones that are accurate, and hopefully emerge on the other side with better fielding metrics.

I imagine that the fielding metric gurus like MGL have already sorted through some of these problems.  One of my hopes from documenting this conversation was to have it around as a reference for questions in the future when MGL is talking about the construction of UZR.

Colin Wyers
13 years ago

I think they actually follow pretty logically from one another.

Think of it this way. Assume that in a particular zone the “true” out rate on a ground ball is 60%. (This is all entirely fictional data to illustrate a point.)

In the course of a season, let’s say 500 grounders are hit in that zone. We can figure the random variance as so:

.6*(1-.6)/500 = 0.00048

The square root of that gives us the standard deviation, 0.022.

So in other words, the observed out rate will typically fall somewhere between .578 and .622 (one standard deviation on either side of .600).

The smaller you split the zones, the lower your sample and so the more “noise” you get in your results. By smoothing out the values (either by increasing the number of years or the size of your zones, for instance) you cut down on the noise in the observed out rates. That should increase your predictive power year-to-year, all else being equal.
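
The arithmetic above can be checked with a short Python sketch. The 60% out rate and the 500 grounders are the fictional values from this comment; the 1,500-chance case is an added assumption standing in for a three-year baseline with the same number of grounders each season, and the helper name `out_rate_sd` is made up for illustration.

```python
import math

def out_rate_sd(true_rate, chances):
    """Binomial standard deviation of an observed out rate."""
    return math.sqrt(true_rate * (1 - true_rate) / chances)

true_rate = 0.6  # fictional "true" out rate for the zone

# 500 chances is the fictional one-season sample; 1500 assumes three such seasons.
for label, chances in [("one-year baseline", 500), ("three-year baseline", 1500)]:
    sd = out_rate_sd(true_rate, chances)
    low, high = true_rate - sd, true_rate + sd
    print(f"{label}: n={chances}, sd={sd:.3f}, range {low:.3f} to {high:.3f}")

# one-year baseline: n=500, sd=0.022, range 0.578 to 0.622
# three-year baseline: n=1500, sd=0.013, range 0.587 to 0.613
```

Tripling the sample cuts the noise by a factor of about 1.7 (the square root of 3), which is the smoothing effect described above.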

Peter Jensen
13 years ago

CW: @MacAree They’re both supposed to be park adjusted. Unless you have some other explanation…

From what source are you determining that UZR and +/- are both park adjusted? I am not sure that either one of them is.

Colin Wyers
13 years ago

Peter,

If you check the link early in the conversation on the differences, both MGL and Dewan/Ben say they are adjusting for park in computing the metrics. They actually spend some time discussing the differences in how each is adjusted for park.

Peter Jensen
13 years ago

Colin – If you read those comments in that thread closely you’ll see that Ben J. is saying that they don’t have to park adjust their numbers because of their smaller “more precise” buckets. MGL in the comments confirms that he does some type of generic park adjustment by outfield position and a single infield adjustment, but exactly how it is done is not clear.