Monday, October 20, 2008
Thinking more about risk: performance variation
Posted by Victor Wang at 12:01am
I've been thinking a lot about risk lately. In the risk profile series I have been doing, I have put all of a player's risk factors into one overall rating. However, when we talk about risk with players, we can really break risk up into two categories. The first is performance risk, and the second is playing time risk. It is this first category that I am going to discuss today in this article.
What I mean by performance risk is the variation we expect around a player's statistics that is not due to a change in playing time. For example, when we evaluate players we usually use a formula like (X-Y)*PA. "Y" is the baseline we set for the statistic we are using and "PA" is a player's plate appearances. We want to compare our player's statistic, "X," to a set level of performance, "Y." It is the variation of "X" that we care about when we talk about performance risk.
Let's use batting average as an example. A player's hits follow a binomial distribution, meaning that in every at-bat he will either get a hit or not get a hit. If we get a large enough sample size, this binomial distribution will approximate a normal distribution. The normal distribution has nice properties and would make it fairly easy for us to approximate the risk in a player's performance statistics. So the distribution for a .280 hitter would look something like this:
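For a rough numerical version of that picture, here is a minimal Python sketch of the normal approximation. The 500 at-bats and the function names are my own assumptions for illustration, not figures from any projection system.

```python
import math


def batting_average_sd(true_avg: float, at_bats: int) -> float:
    """Standard deviation of an observed batting average under a binomial model.

    Assumes every at-bat is an independent trial with the same hit probability.
    """
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


def normal_interval(true_avg: float, at_bats: int, z: float = 1.96):
    """Approximate 95% range for the observed average (normal approximation)."""
    sd = batting_average_sd(true_avg, at_bats)
    return true_avg - z * sd, true_avg + z * sd


# Hypothetical season: a true .280 hitter given 500 at-bats.
sd = batting_average_sd(0.280, 500)
low, high = normal_interval(0.280, 500)
print(f"SD of batting average: {sd:.4f}")
print(f"Approximate 95% range: {low:.3f} to {high:.3f}")
```

Under these assumptions the standard deviation comes out to about 20 points of batting average, so a true .280 hitter finishing anywhere from roughly .240 to .320 would not be surprising on chance alone.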
The exact variation will depend on how much playing time the player is projected for. As playing time increases, variation decreases. For now, though, let's assume that playing time is constant. There are still, unfortunately, a few problems that arise. We can never be certain that a player has a true talent of, say, .280. We can only estimate this. So there's a chance that our .280 hitter is actually a .270 hitter or a .290 hitter. Here is what our dilemma looks like:
If we believe .280 is our best estimate of a hitter's talent, we may underestimate the variation in a player's batting average since there is a chance that we are wrong about the hitter's true talent. So to be able to express a player's performance variation, we'll need to do three things:
1. Express the desired stat in binomial form.
2. Estimate the true talent of that player's statistic.
3. Come up with an error estimate of the true talent.
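One way to combine the three steps into a single number is the law of total variance: the spread of the observed average is the average binomial variance plus the variance of our talent estimate. The sketch below assumes a talent estimate of .280 with a standard error of .010 (loosely matching the .270/.290 example) over 500 at-bats. Those inputs, and the function name, are hypothetical.

```python
import math


def total_performance_sd(talent_mean: float, talent_sd: float, at_bats: int) -> float:
    """SD of an observed batting average when true talent is itself uncertain.

    Law of total variance with talent p having mean m and SD s:
        Var(avg) = E[p(1 - p)] / n + Var(p)
                 = (m(1 - m) - s^2) / n + s^2
    Assumes independent at-bats and a fixed number of at-bats n.
    """
    talent_var = talent_sd ** 2
    sampling_var = (talent_mean * (1.0 - talent_mean) - talent_var) / at_bats
    return math.sqrt(sampling_var + talent_var)


# Hypothetical inputs: .280 estimate, .010 standard error, 500 at-bats.
known_talent = total_performance_sd(0.280, 0.0, 500)
uncertain_talent = total_performance_sd(0.280, 0.010, 500)
print(f"SD if talent were known exactly: {known_talent:.4f}")
print(f"SD with .010 talent uncertainty: {uncertain_talent:.4f}")
```

Setting the talent error to zero recovers the plain binomial spread, so the gap between the two printed values is the part of performance risk that comes from not knowing the player's true talent.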
So from this process we can see that performance risk comes from two things, not including playing time. Those two factors are the risk that naturally comes from the distribution around a mean, and the risk in correctly estimating a player's talent. The first type of risk isn't something we can really control; deviations will occur simply due to chance. However, the second factor is a risk that we can attempt to manage.
So, theoretically, who should have the lowest risk when it comes to performance variation? Well, that is pretty simple to answer. It will be players who have had the greatest amounts of playing time since we will be able to create more accurate estimates of a player's true talent with a larger sample size. The errors around that true talent estimate will also be smaller.
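To see how much a longer track record helps, here is a deliberately naive sketch. It assumes the talent estimate is just the career batting average, so its standard error is the binomial standard error over career at-bats. It ignores aging, regression to the mean, and everything else a real projection system would do. The career totals are made up for illustration.

```python
import math


def naive_talent_se(career_avg: float, career_at_bats: int) -> float:
    """Naive standard error of a talent estimate taken as the career average."""
    return math.sqrt(career_avg * (1.0 - career_avg) / career_at_bats)


def next_season_sd(career_avg: float, career_at_bats: int, season_at_bats: int) -> float:
    """SD of next season's average: binomial noise plus talent-estimate error."""
    talent_var = naive_talent_se(career_avg, career_at_bats) ** 2
    sampling_var = (career_avg * (1.0 - career_avg) - talent_var) / season_at_bats
    return math.sqrt(sampling_var + talent_var)


def compare_track_records(career_avg: float, season_at_bats: int, career_totals):
    """Return (career_at_bats, talent_se, next_season_sd) for each track record."""
    return [
        (n, naive_talent_se(career_avg, n), next_season_sd(career_avg, n, season_at_bats))
        for n in career_totals
    ]


# Hypothetical .280 career hitters with short, medium, and long track records.
for n, se, sd in compare_track_records(0.280, 500, [500, 1500, 5000]):
    print(f"{n:>5} career AB: talent SE {se:.4f}, next-season SD {sd:.4f}")
```

The talent error shrinks as career at-bats grow, but the next-season spread never falls below the binomial floor for 500 at-bats. That floor is the part of the risk we cannot control.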
There is one more area of risk we need to consider for performance risk: the risk that a player's true talent changes, or in other words, that he has a breakout season or collapses. Typically, these changes are found in young or old players, with younger players having higher chances of breaking out and older players having higher chances of collapsing.
However, one problem we have when discussing changes in true talent is deciding whether a "breakout" or "collapse" type performance is actually a change in talent or just variation around the player's mean. For example, is Ryan Ludwick a new player or did he just have a lucky season? PECOTA tries to predict odds for breakouts and collapses, though the accuracy of those predictions has not been empirically tested. Also, David Gassko has published work looking at projecting breakout performances.
To summarize the main points of this article:
1. There will be natural variation around the true talent in a player's performance. However, our larger concern is the chance that we measure a player's true talent incorrectly.
2. If there were no error in estimating a player's true talent, we could pretty easily come up with a performance risk measurement.
3. Just because we have an estimate of a player's true talent doesn't mean a player's talent can't change.
Victor Wang's work on OPS has been featured in SABR's By the Numbers magazine, and he was the 2007 recipient of SABR's Jack Kavanagh Memorial Youth Baseball Research Award. He can be reached via email here.