Projecting playing time
by Victor Wang
April 07, 2009
I've been saying for a long time that the next step in sabermetrics will be incorporating risk management into player projections. A big part of this will involve more accurate projections of a player's playing time.
We can think of a player's risk in two main areas: performance risk and playing time risk. By performance risk I mean how certain are we that a player will perform near his projected statistics? In other words, what are the chances a player breaks out or collapses, relative to his projection? It can be hard to determine whether a risky player (by this I mean one with higher variance in his projection) is preferable to a less risky player with a similar projection. A lot will depend on a team's situation.
When we talk about playing time risk, I simply mean how accurately can we project how much a player will play? This factor usually involves injury risk, but it also can involve the chances a player gets demoted or benched. The tricky thing about this is that these risks can be correlated. For example, a player might lose playing time because his performance worsens. Or a player's performance might decline because he is playing hurt. However, this is a topic for another time. In this article I am going to focus on projecting playing time.
We've come pretty far in projecting performance. As this article shows, multiple projection systems out there, including THT's own developed by David Gassko, do pretty well. The next step, though, is to project how often a player will play. This information is arguably more important now, given how many quality projection systems are available. This got me wondering how accurately we could project a player's playing time, given certain information we have on players before a season begins. I used multiple linear regression to try to find out.
Here's what I did. From 2007 and 2008 Marcel projections, I took every player who was projected for at least 300 plate appearances. This got rid of players who may have received projections only because they were injury replacements or September call-ups. Next, I recorded several variables I thought would be important for projecting playing time. These included such things as whether the player was projected to be a starter, what position he played, his past injury history, etc. Finally, I recorded how many plate appearances that player actually received in 2007 and 2008. I compared these results to the Marcel projections, and then ran a regression using the variables I recorded to see if they had any relationship with a player's playing time.
First, it's important to know how Marcel did trying to project playing time, because if we can't beat this monkey, then it's not really worthwhile using a new system. Marcel projects playing time by adding a player's plate appearances from a year ago multiplied by .5 to a player's plate appearances from two years ago multiplied by .1. To that sum, he adds a constant of 200. When I ran a correlation between Marcel's projected plate appearances and a player's actual plate appearances, I came up with an R of .28 and an R-squared of about .08. This means Marcel's playing time projections explain about eight percent of the variation in a player's actual playing time.
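For readers who want to try the baseline themselves, here is a minimal Python sketch of the Marcel playing time formula as described above, along with a plain Pearson correlation for comparing projected and actual plate appearances. The function names and the sample inputs are my own illustrations, not part of Marcel or of the data set used in this article.

```python
import math


def marcel_pa(pa_last_year, pa_two_years_ago):
    """Marcel-style projection: 0.5 * last year + 0.1 * two years ago + 200."""
    return 0.5 * pa_last_year + 0.1 * pa_two_years_ago + 200


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical player (made-up numbers): 600 PA last year, 500 two years ago.
print(marcel_pa(600, 500))

# R-squared is simply R squared: an R of .28 gives roughly .08.
print(round(0.28 ** 2, 2))
```

To reproduce the comparison, you would pass the list of projected plate appearances and the list of actual plate appearances to `pearson_r` and square the result.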
Now let's look at the results of the multiple regression. The results were basically what you would expect. While I looked at a number of variables, these ended up being the highly significant ones:
1. Age: The older a player is, the lower his plate appearances total is expected to be. This makes sense because older players typically have an increased chance of getting hurt or seeing a large dropoff in performance. This also suggests that we could consider health or playing time a young player's skill, meaning players who struggle to stay healthy will age worse relative to their peer group.
2. Starter: This was a dummy variable in which I recorded if a player was an Opening Day starter. You have to be very careful using this variable. It is very easy to assume in hindsight that a player was a starter because of the amount of playing time he had by the end of the year, when what matters is whether he was expected to start before the season began. The result of this variable is very obvious. Starting players get more playing time than bench players.
3. PA1: This was the number of plate appearances recorded in a player's previous year. Plate appearances from two and three years ago were not statistically significant.
4. WAR1: This was the Wins Above Replacement a player had in his previous year. WAR from two and three years ago was not statistically significant. WAR data were taken from FanGraphs. Clearly, a player who did well in his previous season will receive regular playing time his next year; managers generally expect players to perform as they did in the past.
5. DL1: This was the number of disabled list days a player had in his previous year. DL days from two and three years ago were not statistically significant. DL data were taken from Ron Shandler's Baseball Forecaster. In most cases, players with more DL days in the past are more likely to get hurt in the future. No specific injuries were statistically significant at a five percent level. Groin injuries had the highest significance, being significant at an eight percent level (insert jokes here).
Using these five variables and a constant, the model was able to achieve an R-squared of .74 and an R of about .86. Each variable was significant at the 0.1 percent significance level. This suggests that these five variables can account for 74 percent of the variance in a player's playing time.
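Here is a rough Python sketch of how a regression like this could be set up, assuming ordinary least squares with an intercept. It is not the code or the data I used: the ten player rows are invented purely for illustration, the helper names are hypothetical, and the sketch does not compute significance levels. The columns follow the list above: age, starter (1 or 0), PA1, WAR1 and DL1.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_ols(rows, y):
    """Least squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    design = [[1.0] + list(map(float, r)) for r in rows]  # intercept column
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# INVENTED sample rows: (age, starter, PA1, WAR1, DL1). Not real players.
players = [
    (24, 1, 610, 3.1, 0), (31, 1, 580, 2.0, 15), (35, 0, 300, 0.4, 40),
    (28, 1, 650, 4.5, 0), (33, 0, 350, 0.8, 0), (26, 0, 400, 1.2, 20),
    (37, 1, 520, 1.5, 60), (29, 1, 600, 2.8, 10), (30, 0, 320, 0.2, 0),
    (27, 1, 560, 1.9, 30),
]
# INVENTED actual plate appearances for the same ten rows.
actual_pa = [640, 540, 210, 670, 330, 380, 400, 600, 290, 500]

coefs = fit_ols(players, actual_pa)
print("constant, age, starter, PA1, WAR1, DL1:", [round(c, 3) for c in coefs])
print("R-squared:", round(r_squared(players, actual_pa, coefs), 3))
```

With only ten invented rows the fit will look far better than it should, so the printed numbers mean nothing on their own. The point is only to show the shape of the calculation: one row per player, one column per variable, and actual plate appearances as the target.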
Other factors can cause some trouble predicting playing time. For example, playing time for back-ups is dependent on how healthy the starters stay. Case in point: Jose Molina got a little bit more playing time last year due to Jorge Posada's injury.
What if you aren't able to project starters for a team accurately? When I take the "Starter" variable out of the regression equation, the model is still able to achieve an R-squared of .67. These results are very encouraging, suggesting we can accurately predict a player's playing time for next year given a few key variables. While the variables used are fairly obvious, it's still nice to see our intuition backed up by significant results.
Obviously, there is more we can do with projecting playing time. The next steps will likely include creating probability distributions and projecting injury risk separate from a playing time projection.
References and Resources
Tom Tango would like your help as he too is working on creating better playing time projections.
Sig Mejdal, now working for the St. Louis Cardinals, wrote about projecting injuries in the Bill James 2005 Handbook.
Vince Gennaro has written about factoring risk into player valuation.
Victor Wang's work on OPS has been featured in SABR's By the Numbers magazine, and he was the 2007 recipient of SABR's Jack Kavanagh Memorial Youth Baseball Research Award. He can be reached via email here.