Recall this scenario: It’s Game Two of the American League Division Series in 2010. James Shields has been dominant through four innings, and he takes the mound to face Matt Treanor to begin the fifth. Treanor takes a strike, followed by two pitches outside the zone. Shields’ next pitch hits Treanor in the back, and Treanor takes first base.
Buck Martinez notes that, while Shields has been great, he has released “a few errant throws” (similar to the pitch that hit Treanor) throughout the game. These awkward pitches seem to come almost out of nowhere, often when Shields appears to be in complete control.
Shields’ first pitch to Elvis Andrus is another one of those wild tosses. It misses the outside corner by a few feet and is released from a somewhat atypical arm slot. His next two pitches are much more effective, though. The third pitch of the at-bat, a good sinking fastball, induces a weak ground ball single that splits third baseman Longoria and shortstop Jason Bartlett.
In the dugout, pitching coach Jim Hickey picks up the phone and hangs it up a few seconds later. He motions to manager Joe Maddon, who stands near the dugout railing. The two talk briefly (though Maddon appears to be doing more listening than speaking) and Maddon nods before heading out to the mound.
As Maddon approaches the congregation forming near the mound, Shields looks furious and mutters a few words into his glove. As the pitcher walks off the field, he mouths one word caught by the cameras: “Wow.”
In a post-game autopsy of Maddon’s decision to remove his starter, Joe Smith of the Tampa Bay Times would reveal that Shields was “surprised” and “disappointed” to be removed from the game. Shields had a point. He had, after all, thrown a mere 68 pitches (a season low). The righty also thought he had the edge against Michael Young and Josh Hamilton, the next two hitters he would have faced. He had struck out each once and hadn’t allowed either to reach base that day. Maddon would offer little explanation for his decision.
Somewhere in the clubhouse sat Josh Kalk, former Hardball Times writer and current Tampa Bay Rays analyst. Kalk had been monitoring every pitch thrown by Shields, and he saw something in the last few pitches that he didn’t like. It was Josh Kalk’s opinion, not Joe Maddon’s, that led to Shields’ removal.
Everything I just wrote is true, except for Kalk’s role in removing Shields—that was wild speculation. I don’t know what was said on the phone to Hickey (or who was on the phone, for that matter), but I would guess that Hickey was merely checking on the status of reliever Chad Qualls.
I bring up this scenario, though, because I recently came across the article that vaulted Josh Kalk out of the blogosphere and into a major league baseball operations department. Way back in 2008, Kalk attempted to create a model that predicted and prevented pitching injuries. It was a tall task for someone with no access to a major league pitching coach, but Kalk thought he had data rich enough to give it a try.
The Injury Zone
When Kalk wrote “The Injury Zone” in early 2009, a new form of baseball data had just been released to the public. Kalk was able to take advantage of data produced by PITCHf/x cameras that have been recording baseball games since 2008, and that now sit in every major league stadium. The information that Sportvision’s system provides essentially allows for the reconstruction of the ball flight and release point of each pitch thrown in a major league ballpark.
Kalk assumed that somewhere in this mountain of data there had to be some variable (or combination of variables) that could provide a signal predictive of future arm health. What was really novel about his idea, though, was that it worked in the short term. Instead of taking on the task of projecting the likelihood of a pitcher succumbing to injury within the next week, month, or season, Kalk wanted to identify situations in which a pitcher might hurt his arm after throwing his next pitch.
The process seemed pretty straightforward. We’d look at PITCHf/x data for healthy pitchers and compare it to data for pitchers who suffered an arm injury. If the data look different a few pitches before the injury, then we have our signal, right?
Not exactly. It doesn’t make sense to compare raw pitch data across pitchers, and Kalk realized this. For example, changes in fastball velocity that look normal for a pitcher like Yu Darvish may look like trouble for a pitcher like John Lackey. The key, then, is to compare data from just before the occurrence of an arm injury to that same pitcher’s data from a point at which things were going well.
Kalk also realized that the few PITCHf/x variables he thought would stray from normal levels just before an arm injury were (individually) weakly correlated with actual injuries. If we were to simply plug these variables into an equation, we probably wouldn’t find anything meaningful. Thus, his solution made use of a neural network.
A neural network takes a series of input variables and creates a combination of weights that make the most sense for the data it is provided with. These weights are then used to create one single variable that is used to predict the output variable (in our case, a pitching injury). Neural networks tend to do a good job with complex data that may not exhibit a linear relationship with the outcome.
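To make the idea concrete, here is a minimal sketch of how a small network turns several inputs into one number between zero and one. The single hidden layer, the function names, and every weight below are placeholders chosen for illustration; they are not fitted values from any model discussed in this piece.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer, returning a single value in (0, 1)."""
    # Each hidden unit takes its own weighted combination of the inputs.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The hidden units are combined again into the one output variable.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical inputs and placeholder weights (assumptions, not real data).
inputs = [-1.0, 0.5, 0.2]
hidden_weights = [[-0.8, 0.1, 0.3], [0.4, -0.6, 0.9]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = -0.2

score = predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias)
print(round(score, 3))
```

Training is the search for weights that push this output toward one for injury cases and toward zero for healthy ones.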
The Case for a New Model
Kalk’s results turned more than a few heads, yet after much digging around I found only a single attempt to modify or recreate his work. Hardball Times writer and pitching analyst Kyle Boddy also worked with a neural network, and found that a pitcher’s last start before a DL stint doesn’t look much different from a randomly picked healthy start.
Five years have passed since Kalk’s piece was published, and teams don’t seem to be getting any better at preventing injuries. According to data from FanGraphs’ injury database, pitchers with an elbow, shoulder, or general arm injury spent a combined 11,656 days on the disabled list last year—up from 10,755 days in 2012 and 8,424 days in 2011. This upward trend seems not to be confined to minor setbacks. In 2013, the number of arm injuries that sidelined a pitcher for more than 15 days also reached its three-year high.
Preventing injuries to major league starting pitchers is clearly a valuable pursuit. In the above scenario, the Rays (hypothetically) saved Shields and what was left of his $2.5 million salary for the rest of the playoffs. In 2012, the Nationals were so worried about their investment in Stephen Strasburg that they held him to a strict innings limit during a race for the playoffs. This controversial decision upset many D.C. fans. A model similar to Kalk’s surely would have come in handy when the Nationals were deciding on how to distribute Strasburg’s innings.
What follows is my adaptation of Josh Kalk’s method. Though I set out with the same goal that Kalk did, I made a few significant changes to the model. At this point, the best model may not make use of publicly available data—but it’s worth a shot.
My model was created using PITCHf/x data from 2011-2013, and injury data from 2012 and 2013. One advantage I have in approaching this problem five years later is that Sportvision’s system has improved greatly since 2008. We used to see large discrepancies across stadiums (something that Kalk worried about), but not as much anymore. The injury data I used come from Jeff Zimmerman’s database, which can be found at baseballheatmaps.com.
I’ll also be introducing a new variable in my model: release point variance. In 2011, Boddy found that the variation in a pitcher’s release point is a significant predictor of elbow injuries. For our purposes, variance captures consistency around a pitcher’s average release point. After I created this measure and compared it to pitch count data, I found that most pitchers lose this release point consistency as they fatigue. I also found that the amount of vertical release point variance in the last 15 fastballs a starter throws tends to spike a few pitches before a DL stint, so I included this measure in the model.
Now I’ll get down to the nitty-gritty, but bear with me. I chose only fastballs thrown by starting pitchers who made at least five starts before suffering an elbow, shoulder or triceps injury during this span. I created injury and non-injury periods for each pitcher, because the model needs to be shown both types of data. Each injury period represents the last 15 fastballs thrown before an injured starter was removed. Kalk chose a 10-pitch range; I will soon explain why I went with a larger one. Period averages of the following PITCHf/x variables of interest were used as input variables for my neural network:
- Velocity: Measured in miles per hour.
- Horizontal (X) spin deflection: Measured in inches, this is correlated with horizontal movement.
- Vertical (Z) spin deflection: Also measured in inches, but correlated with vertical movement.
- Vertical release point: Ideally, this would measure the height off the ground at which the pitch is released. Instead, PITCHf/x captures the height of the pitch from 50 feet away from the plate. Mike Fast noted that most pitchers release the ball from 54 to 55 feet from home plate. Stuff happens within the five or so feet between release and measurement, and that stuff makes it difficult to compare pitchers using these data. I’ll attempt to work around this issue by comparing a pitcher’s PITCHf/x release point values only to the same pitcher’s other PITCHf/x release point values.
- Vertical release point variance: The amount of variance in a pitcher’s vertical release point around the average of his last 15 (fastball) release points. A pitcher whose last 15 release points are tightly grouped records a low variance, and a pitcher whose last 15 release points are all over the place is assigned a high value.
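The release point variance measure in the last bullet can be sketched in a few lines. The function name and the choice of population variance (rather than sample variance) are my assumptions for illustration, and the release heights are made-up numbers in feet.

```python
from statistics import pvariance


def release_point_variance(release_heights, window=15):
    """Variance of the vertical release point over the last `window` fastballs."""
    if len(release_heights) < window:
        raise ValueError("need at least `window` fastballs")
    # Only the most recent `window` release points count toward the measure.
    return pvariance(release_heights[-window:])


# Made-up release heights (feet): one tightly grouped, one all over the place.
tight = [6.00, 6.01, 5.99, 6.00, 6.02, 5.98, 6.00, 6.01,
         5.99, 6.00, 6.01, 5.99, 6.00, 6.02, 5.98]
wild = [6.00, 5.70, 6.30, 5.80, 6.20, 5.60, 6.40, 6.00,
        5.75, 6.25, 5.90, 6.10, 5.65, 6.35, 6.00]

print(release_point_variance(tight) < release_point_variance(wild))
```

The tightly grouped pitcher records the lower value, which matches the description above.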
When creating non-injury periods and baseline values, I think one factor that Kalk neglected to consider was in-game context. Most starting pitchers show evidence of fatigue as they progress through a start. The rate of decline, however, tends to vary by pitcher. To come closer to identifying signs of danger, I think it is important to look at changes relative to a pitcher’s normal response to fatigue.
For example, some starters (like Jake Westbrook) release each pitch at a slightly lower vertical point than the last one. This is why pitch counts become important. They create the pitcher-specific context necessary to evaluate anything out of the ordinary.
Look at the graph below, which plots averages of these five variables at each level of fatigue (measured by pitch count) for Westbrook. You’ll quickly realize that it doesn’t make sense to compare Westbrook’s average release point (in blue) from pitches one through 20 to an injury that occurred at pitch 45.
Kalk attempted to remedy this issue by creating non-injury periods representing averages of the last 10 pitches of each pre-injury start, but I don’t think that solves the problem completely. Instead, I created non-injury periods that match each exact pitch count level at which the injury occurred. So, if the injury period for Westbrook is made up of data from pitches 40-54 (assuming the injury occurred on pitch 55 and that these previous pitches were all fastballs), all non-injury periods for Westbrook will come from that pitch count range and come from previous starts. Similarly, my baseline data for Westbrook would be made up of all pitches thrown in this pitch count range preceding the injury.
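A rough sketch of that matching step follows, assuming each pitch is stored as a dictionary with a pitch count and a fastball flag. The field names and helper functions are hypothetical, and the two starts are synthetic.

```python
def injury_period(pitches, window=15):
    """Last `window` fastballs thrown before the starter was removed."""
    fastballs = [p for p in pitches if p["is_fastball"]]
    return fastballs[-window:]


def pitch_count_range(period):
    """Lowest and highest pitch count covered by a period."""
    counts = [p["pitch_count"] for p in period]
    return min(counts), max(counts)


def matched_period(pitches, low, high):
    """Fastballs from another start that fall in the same pitch count range."""
    return [p for p in pitches
            if p["is_fastball"] and low <= p["pitch_count"] <= high]


# Synthetic starts: the injury exit comes on pitch 55, so 54 pitches are on record.
injury_start = [{"pitch_count": n, "is_fastball": True} for n in range(1, 55)]
earlier_start = [{"pitch_count": n, "is_fastball": True} for n in range(1, 91)]

low, high = pitch_count_range(injury_period(injury_start))
healthy = matched_period(earlier_start, low, high)
print(low, high, len(healthy))
```

With every pitch a fastball, the injury period covers pitches 40 through 54, and the earlier start contributes the same 15-pitch stretch as a non-injury period.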
This last bit sets us up to compare a healthy version of a pitcher to the version approaching injury, but I couldn’t simply feed the model these raw differences. Instead, I measured the gap between a pitcher’s injury-period averages and the averages from that pitcher’s pitch count-specific baseline period. The differences, in standard deviations, were what I fed the model for each variable.
Another example might help explain this point. Say Westbrook’s baseline fastball within the pitch count range of 40-54 pitches is 88 mph with a standard deviation of two mph. If his fastball averaged 86 mph over this range before an injury exit on pitch 55, then I fed the model the difference of one standard deviation. When we use standard deviations instead of raw differences, we create a level playing field.
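The arithmetic in that example can be written out as a short sketch. The function name and the use of the population standard deviation are assumptions, and the velocities are invented so that the baseline averages 88 mph with a spread of two mph.

```python
from statistics import mean, pstdev


def standardized_gap(injury_values, baseline_values):
    """Gap between injury-period and baseline averages, in baseline standard deviations."""
    spread = pstdev(baseline_values)
    if spread == 0:
        raise ValueError("baseline has no variation to scale by")
    return (mean(injury_values) - mean(baseline_values)) / spread


# Invented velocities (mph) for the 40-54 pitch count range.
baseline = [86, 90, 86, 90]   # mean 88, standard deviation 2
pre_injury = [86, 86, 86]     # mean 86

gap = standardized_gap(pre_injury, baseline)
print(gap)
```

The sign records direction: a negative gap here means the fastball sat one standard deviation below the pitcher's own baseline.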
The sample size for arm injuries to starting pitchers isn’t as large as I would have hoped. After getting rid of pitchers who didn’t meet the criteria I described earlier, I ended up with 52 injury starts, and a little over 2,200 non-injury starts. To address sample size concerns, I created a few instances of the model.
Each time I created a model, I picked a random sample of 52 non-injury starts such that the number of injury and non-injury starts was equal. This full sample was used to train the neural network. Behind the scenes, the neural network looks for patterns between input variables and the injury output variable. The model uses these patterns to assign combinations of weights on PITCHf/x input variables. These individual weights then interact with each other, producing one single variable. The model then attempts to use many different combinations of initial weights until it reaches one that best predicts the real-life data it was trained with.
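The balancing step at the start of that process might look something like the sketch below. The function name, the seed argument, and the string identifiers standing in for starts are all hypothetical; only the counts (52 injury starts, roughly 2,200 healthy ones) come from the text.

```python
import random


def balanced_training_set(injury_starts, healthy_starts, seed=None):
    """Pair every injury start with an equal number of randomly drawn healthy starts."""
    rng = random.Random(seed)
    # Draw as many healthy starts as there are injury starts, without replacement.
    sampled = rng.sample(healthy_starts, k=len(injury_starts))
    rows = [(start, 1) for start in injury_starts] + [(start, 0) for start in sampled]
    rng.shuffle(rows)  # avoid presenting all injury cases first
    return rows


# Placeholder identifiers in place of real start-level inputs.
injury_starts = [f"injury_{i}" for i in range(52)]
healthy_starts = [f"healthy_{i}" for i in range(2200)]

training_rows = balanced_training_set(injury_starts, healthy_starts, seed=1)
print(len(training_rows), sum(label for _, label in training_rows))
```

Changing the seed draws a different set of healthy starts, which is one way to build several instances of the model from the same small injury sample.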
A neural network can be judged on how well it predicts things we already know the answer to. So, after training a few of these models, I tested them on new samples. In a perfect model, all pitchers in the non-injury sample would be assigned predicted values near zero, and each pitcher in the injury sample would be assigned a value near one. The first model is Kalk’s, and mine is placed below it.
At first glance, things don’t look that different. Most of the non-injured sample hovers around the .20 mark in both models, with quite a bit of overlap in the middle. The models differ a bit for pitchers assigned values greater than .80, though. In my model, every pitcher with a predicted value greater than .80 would suffer an injury on the next pitch. In Kalk’s, a few healthy pitchers reached this high value.
Below are two more models I created with different non-injury samples and test samples.
Again, the cutoff after which 100 percent of predicted values actually lead to injury appears to be .80. With the exception of two incorrectly predicted healthy cases in the second model, every pitcher in every sample that was assigned a value of .80 or greater suffered an injury on the next pitch. The cutoff for when a pitcher enters the “injury zone” appears to be pretty clear.
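One way to check a cutoff like this is to compute the share of flagged cases that were real injuries. This is a sketch under my own assumptions: the function name is hypothetical, and the predicted values and labels are made up, not output from the models above.

```python
def injury_rate_above(predictions, labels, cutoff=0.80):
    """Share of cases at or above the cutoff that were actual injuries (label 1)."""
    flagged = [label for value, label in zip(predictions, labels) if value >= cutoff]
    if not flagged:
        return None  # nothing reached the cutoff
    return sum(flagged) / len(flagged)


# Made-up predicted values and outcomes (1 = injury, 0 = healthy).
predictions = [0.15, 0.22, 0.55, 0.62, 0.68, 0.83, 0.91, 0.95]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

print(injury_rate_above(predictions, labels, cutoff=0.80))
print(injury_rate_above(predictions, labels, cutoff=0.50))
```

In this toy data every case at .80 or above is an injury, while lowering the cutoff to .50 lets healthy starts slip in.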
On the other hand, each of my models makes a bunch of mistakes in the .50-.70 range. The difference between a healthy pitcher and one who is on the verge of injury can’t be distinguished here. The additional data and adjustments I made make it clearer that some injuries can’t be predicted using short-run fluctuations in PITCHf/x variables.
Some of the highest predicted values in each of my models (those greater than .80) preceded elbow injuries. I wouldn’t draw any hard conclusions from this outcome, though, because the sample size for values greater than .80 is small (and the number of elbow injuries in this category even smaller). If we had more data, we could start to make connections between short-term red flags and specific types of arm injuries.
Velocity (Kalk’s “most important” variable) was a significant predictor in these models, and this makes sense. As Kalk pointed out, changes in velocity are usually a clear sign that something isn’t right for a pitcher. When we hold fatigue constant, we see that velocity is still a very important indicator of short-term arm health.
In my model, vertical release point was somewhat important as well. Interestingly, this variable was the least sensitive component in Kalk’s model. Why did it become relevant in mine? A pitcher’s release point tends to change more than any other variable over the course of a start. So, once you control for normal fatigue, a different release point spells trouble. The Westbrook example illustrates this idea.
The effect of release point variance on injuries also seemed to be relatively important in some models. Though a high weight was attached to this 15-pitch measure of release point consistency in my final models, the variable may receive this weight because it interacts with the other input variables. In other words, release point consistency matters when you take other factors under consideration. Simpler models I attempted to create didn’t attach any significant weighting to release point variance, suggesting it may not mean much on its own. A smaller measurement range, something like the last five pitches, might make release point variance relevant by itself.
With all this in mind, it seems like some pitching injuries do come with clear warning signs. These situations are worth acting on. In less than a handful of cases, pitchers entering the injury zone were fine after throwing the next pitch—but the rest weren’t.
In practice, I can see this becoming controversial. A manager who removes a starter who insists he’s fine after throwing 30 pitches will draw fire from the media, fans and the pitcher he pulled. “The model told me to” is hardly an explanation for pulling James Shields from Game 2 of the ALDS.
A preferable method of predicting pitching injuries would incorporate kinematic data and video on mechanics, rather than two-dimensional release point and ball flight data. The connection between kinematic information and injury is more direct than the one between PITCHf/x data and injury, and it would also be easier to defend in a press conference. Kyle Boddy has written about the complications in getting kinematic data.
Let’s Save Some Arms
In a way, Jaime Garcia is the face of the injury zone. The injury that preceded his recent discomfort forced season-ending surgery to repair rotator cuff and labrum tears in his left shoulder on May 22, 2013. Garcia had experienced pain in his shoulder in 2012, but he mentioned that he was experiencing “increasing pain” in his last three starts before the injury. The Cardinals were made aware of this issue only after it became severe during his final pre-injury start.
Garcia’s last 15 fastballs before exiting that last start pushed him deep into the injury zone.
This example may cause some to argue that pitchers entering this injury zone may, in fact, already be injured. Garcia’s silence shouldn’t come as a surprise to us. Pitchers, often motivated to appear durable, can put their own arms at risk.
We can debate the necessity of pulling an injured pitcher from a game one pitch or a few pitches early, but I think this kind of information has tremendous value. After all, it takes only one pitch for a sore elbow or shoulder to become something permanent, like a torn ligament. Just ask Jaime Garcia.