Pitching injuries: A PITCHf/x look
by Kyle Boddy
April 13, 2011
Manny Corpas has suffered multiple elbow injuries. Could PITCHf/x metrics help us determine why? (Icon/SMI)
My apologies for the long delay in articles. I've been hard at work on re-creating minorleaguesplits.com over at ML Splits using Jeff Sackmann's freely available data files. I have all the MLEs and park factors from Jeff's data for batters and pitchers, and the site is finally at a stable point where the archive data can be easily accessed. I've also modified some scripts with the help of THT's own resident genius - Brian Cartwright - and have started to spider all the original source data from all the minor leagues (Rookie to Triple-A) in hopes of creating a brand new data warehouse and continually updating site - just like Minor League Splits used to be, with a few more upgrades!
The other project I alluded to in previous articles is the Open Biomechanics Project. This has been pushed back to mid-May 2011 at the earliest due to my work with minor league data, vacation trips, and running Driveline Baseball in Seattle.
PITCHf/x Variables and Pitcher Injuries: The Link
Our holy grail is to correlate easily collectible data points with trips to the disabled list. Knowing which pitchers are more likely to break down could save teams millions of dollars. Josh Kalk wrote a bit about this using neural net analyses back in February 2009, and Jeff Zimmerman has done some work on this using BMI, which didn't show a lot of promise (see comments).
Using the Advanced Baseball Injury Database, I separated pitchers into two groups:
1) Pitchers who suffered shoulder or upper arm injuries that kept them out for at least 15 days from 2008-2010
2) Pitchers who suffered elbow or forearm injuries that kept them out for at least 15 days from 2008-2010
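The grouping step above can be sketched in a few lines of Python. This is only an illustration: the layout of the injury database isn't described here, so the record fields (`pitcher`, `season`, `body_part`, `days`) and the function name are assumptions, as is the choice to sum qualifying stints per pitcher.

```python
from collections import defaultdict

# Body-part labels are assumed; the real database may word them differently.
SHOULDER = {"shoulder", "upper arm"}
ELBOW = {"elbow", "forearm"}

def group_pitchers(dl_stints, min_days=15, seasons=range(2008, 2011)):
    """Return {group: {pitcher: total days lost}} for qualifying DL stints.

    dl_stints is a list of dicts with the hypothetical keys
    'pitcher', 'season', 'body_part' and 'days'.
    """
    groups = {"shoulder": defaultdict(int), "elbow": defaultdict(int)}
    for stint in dl_stints:
        # Keep only 2008-2010 stints that cost at least min_days.
        if stint["season"] not in seasons or stint["days"] < min_days:
            continue
        part = stint["body_part"].lower()
        if part in SHOULDER:
            groups["shoulder"][stint["pitcher"]] += stint["days"]
        elif part in ELBOW:
            groups["elbow"][stint["pitcher"]] += stint["days"]
    return {name: dict(days) for name, days in groups.items()}
```

A pitcher with both a shoulder and an elbow stint would show up in both groups under this sketch.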
I pulled all the days lost on the disabled list by pitchers that fit into these two groups and then calculated the following nine independent variables that went into my regression analysis:
-Body Mass Index (BMI)
-Adjusted vertical release point (Height - Average Z-release point)
-Average fastball velocity
-Variance of z-release point (weighted across all pitches)
-Variance of x-release point (weighted across all pitches - pitchers who clearly stood on different parts of the rubber were eliminated from the analysis)
-% of pitches that were fastballs
-% of pitches that were sliders and cutters (grouped)
-% of pitches that were curves or knuckle curves (grouped)
-% of pitches that were changeups or sinkers (grouped)
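For readers who want to build these variables themselves, here is a rough Python sketch for a single pitcher. Several things in it are my assumptions rather than a record of the original calculation: the pitch-type codes and how they map to the four groups, the reading of "weighted across all pitches" as a within-pitch-type variance weighted by pitch count, and the field names (`type`, `mph`, `x0`, `z0`, with release points in feet).

```python
from collections import defaultdict
from statistics import mean, pvariance

# Assumed mapping from pitch-type codes to the groups listed above.
GROUPS = {
    "FF": "fastball", "FT": "fastball", "FA": "fastball",
    "SL": "slider_cutter", "FC": "slider_cutter",
    "CU": "curve", "KC": "curve",
    "CH": "change_sinker", "SI": "change_sinker",
}

def bmi(weight_lb, height_in):
    """Body Mass Index from pounds and inches."""
    return 703.0 * weight_lb / height_in ** 2

def weighted_variance(pitches, key):
    """Variance within each pitch type, weighted by that type's pitch count."""
    by_type = defaultdict(list)
    for p in pitches:
        by_type[p["type"]].append(p[key])
    return sum(len(v) * pvariance(v) for v in by_type.values()) / len(pitches)

def pitcher_features(pitches, height_in, weight_lb):
    """Compute the nine independent variables for one pitcher."""
    n = len(pitches)
    shares = defaultdict(float)
    for p in pitches:
        shares[GROUPS.get(p["type"], "other")] += 1.0 / n
    fastballs = [p["mph"] for p in pitches if GROUPS.get(p["type"]) == "fastball"]
    return {
        "bmi": bmi(weight_lb, height_in),
        # Height minus average z-release point, both in feet.
        "adj_release_ft": height_in / 12.0 - mean(p["z0"] for p in pitches),
        "fastball_mph": mean(fastballs) if fastballs else None,
        "var_z0": weighted_variance(pitches, "z0"),
        "var_x0": weighted_variance(pitches, "x0"),
        "pct_fastball": shares["fastball"],
        "pct_slider_cutter": shares["slider_cutter"],
        "pct_curve": shares["curve"],
        "pct_change_sinker": shares["change_sinker"],
    }
```

Weighting the variance within pitch types keeps a pitcher from looking inconsistent simply because his curveball comes out of a different slot than his fastball. A plain variance over all pitches would be the other reasonable reading.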
Some common hypotheses about pitching mechanics and injuries are:
-A more consistent release point decreases the chance of injury
-The higher the release point in relation to the body, the more stress on the glenohumeral (shoulder) joint due to various stabilization issues
-The lower the release point in relation to the body, the more stress on the elbow joint due to pronation/hyperextension theories
-More sliders/cutters increase the chance of elbow injury (supinated release)
-More changeups/sinkers increase the chance of shoulder injury (pronated release)
Results: Shoulder Injuries
I identified 144 pitchers from 2008-2010 that suffered major shoulder injuries that fit the criteria that I set forth above. Using the nine-factor model above, no characteristic was statistically significant at the alpha = 0.05 level. None came close. This was surprising, as I personally thought that a higher release point (in relation to the body) would be weakly and positively correlated with increased risk for shoulder injury. This theory is shared by many educated rehabilitation specialists, and it wasn't even close to being statistically significant in this model.
No other factor came close to significance at alpha = 0.05 or even alpha = 0.10, so based on this model we cannot reject the null hypothesis of no relationship.
Results: Elbow Injuries
I identified 114 pitchers from 2008-2010 who suffered major elbow injuries that fit the criteria I set forth above. Using the nine-factor model above, BMI and Slider/Cutter % were statistically significant at the alpha = 0.10 level, while vertical release point variance was statistically significant at the alpha = 0.05 level! Refitting with only these three factors, the r-squared was 0.11 and both Slider/Cutter % and vertical release point variance had p-values < 0.05, while BMI's p-value was still just below 0.10.
The theory that a more varied vertical release point can lead to more elbow injuries may have some validity, as may the theory that increased use of sliders/cutters has the same detrimental effect. Increased BMI was weakly and negatively correlated with elbow injury - meaning if the effect is real, the bigger you are, the less likely you are to suffer an elbow injury.
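For anyone who wants to try a fit like the three-factor model on their own data, here is a minimal ordinary least squares sketch in plain Python. It is not the code behind the numbers above. It assumes the dependent variable is days lost on the disabled list, it solves the normal equations with a hand-rolled matrix inverse, and it approximates the p-values with a normal distribution instead of the t distribution (a reasonable shortcut with 100-plus pitchers, less so for tiny samples). The names `ols` and `invert` are mine.

```python
from statistics import NormalDist

def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(rows, y):
    """Regress y on the predictor rows, with an intercept added.

    Returns coefficients (intercept first), r-squared, and two-sided
    p-values from a normal approximation.
    """
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    cols = list(zip(*X))
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xtx_inv = invert(xtx)
    xty = [sum(a * b for a, b in zip(c, y)) for c in cols]
    beta = [sum(a * b for a, b in zip(row, xty)) for row in xtx_inv]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = ss_res / (n - k)  # residual variance
    se = [(sigma2 * xtx_inv[j][j]) ** 0.5 for j in range(k)]
    z = NormalDist()
    p_values = [2 * (1 - z.cdf(abs(b / s))) for b, s in zip(beta, se)]
    return {"coef": beta, "r_squared": 1 - ss_res / ss_tot, "p_values": p_values}
```

Each row passed in would hold one pitcher's BMI, Slider/Cutter % and vertical release point variance, with `y` holding his days lost. A negative coefficient on BMI would correspond to the "bigger is safer" direction described above.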
While I know most people were hoping for a bit more exciting news, the truth is that proper use of regression analyses and statistics very rarely leads to these kinds of discoveries. (The joke in scientific research is that if you've made a conclusion, you've done something terribly wrong.) We're simply not going to find a model of basic characteristics (height, weight, PITCHf/x values) that has an r-squared of 0.50 with p-values of each variable < 0.05, which would mean that 50% of the variance is explained and that each effect is very unlikely to be due to chance alone.
Generally an r-squared of 0.25 is desired to take action on the model, but there are some refinements to the data that may yet yield better (or at least different) results. Some data-specific issues can be drilled into further: looking at ligament damage specifically rather than lumping it in with bone chips, for example, would reduce the sample size but perhaps increase the specificity of the sample. Others need to be better clarified: days on the DL does not control for injuries suffered at the end of a season, where off-days are used to recover from the injury.
I plan on doing more research into this field and find it promising that we've discovered even a tenuous link between elbow injuries and some basic (though annoying to compile) variables. It certainly warrants further research, even if no conclusions can be made from the findings. It's possible that neural net analyses will turn up more interesting information on historical data, or that an in-game analysis tool can be built like Josh Kalk alluded to in his previously mentioned article.
Kyle Boddy is the owner of Driveline Baseball and Driveline Biomechanics Research, both in Seattle, Washington. At his facility, he's melded statistical analysis, strength & conditioning, prehab/rehab, and advanced biomechanical analysis concepts to develop improved efficiency, durability, and fastball velocity of baseball pitchers. He can be reached via email at email@example.com and found on Twitter: @drivelinebases.