Tom Glavine was one of the many starting pitchers felled by an arm injury last year. (Icon/SMI)
By my count, from sifting through MLB’s transaction pages, 423 pitchers threw at least one big-league pitch last year and then landed on the disabled list. That doesn’t even count pitchers who began the year on the disabled list, like Carl Pavano, or pitchers who missed the entire year, like Chris Capuano.
Identifying the circumstances in which pitchers are in danger of injury is something sabermetricians have been pursuing for quite some time. In 2002, Keith Woolner took a giant leap forward when he showed that pitchers with excessive pitch counts had lowered performance and an increased chance of injury. Woolner’s new tool, Pitcher Abuse Points (PAP), had a profound impact on the game, especially on how young pitchers are handled.
The focus on pitch counts has gone so far that a strong opposition has emerged denouncing them. The basic argument is that pitch counts need to be individualized: you would let C.C. Sabathia throw more pitches than a youngster like Jair Jurrjens. This makes a lot of sense, but when all you have are raw pitch totals, tailoring a plan to a specific pitcher is hard. If an average MLB pitcher can safely throw 100 pitches in a game (just picking a number here), how many more do you trust Sabathia with? How many fewer do you give Jurrjens before going out to pull him?
Enter the PITCHf/x data. These data give us a huge number of variables with which to examine each pitch. With a nearly complete database of every pitch thrown in 2008, this material is a gold mine and an excellent place to expand our knowledge of when a pitcher might get injured. The key to this study is that the data let us compare a pitcher when we know he is healthy with the last few pitches he throws before going on the DL. This individualized comparison puts a pitcher like Sabathia and one like Jurrjens on equal footing.
Preparing the data
The first step is to isolate the injured pitchers we are going to use in our sample. First, this study looks only at starting pitchers. Starters and relievers clearly should be separated, and starters are far easier to study because most of them are on a regular five-day routine. Second, this study examines only starters who went on the disabled list with an arm injury. Non-pitching injuries, like Yovani Gallardo’s knee injury, need to be thrown out, along with groin and back injuries. Even if pitching caused such issues, those injuries may manifest themselves differently than arm injuries.
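A minimal sketch of this filtering step, assuming each disabled-list placement has already been transcribed into a record with the pitcher’s role, the listed body part and a flag for whether he had thrown a big-league pitch before the stint. The `DLStint` type, its fields and the `ARM_PARTS` set are hypothetical illustrations, not an MLB data format, and the player names are placeholders.

```python
from dataclasses import dataclass

# Hypothetical record for one disabled-list placement (not an MLB data format).
@dataclass
class DLStint:
    pitcher: str
    role: str             # "SP" for starters, "RP" for relievers
    body_part: str        # as listed in the transaction, e.g. "elbow", "knee"
    pitched_before: bool  # threw at least one big-league pitch before the stint

# Assumed set of body parts that count as an arm injury.
ARM_PARTS = {"shoulder", "rotator cuff", "elbow", "forearm", "biceps", "triceps", "arm"}

def study_sample(stints):
    """Keep starters who pitched in the majors and then went on the DL with an arm injury."""
    return [
        s for s in stints
        if s.role == "SP"
        and s.pitched_before
        and s.body_part.lower() in ARM_PARTS
    ]

stints = [
    DLStint("Pitcher A", "SP", "Elbow", True),
    DLStint("Pitcher B", "SP", "knee", True),       # non-arm injury: dropped
    DLStint("Pitcher C", "RP", "shoulder", True),   # reliever: dropped
    DLStint("Pitcher D", "SP", "shoulder", False),  # began the year on the DL: dropped
]
print([s.pitcher for s in study_sample(stints)])  # ['Pitcher A']
```

A filter like this is only as good as the body part listed in the transaction, which is not always the real injury.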
Next, we need to identify which PITCHf/x variables to correlate with the arm injuries. The variables I chose were speed, horizontal movement, vertical movement, horizontal release point and vertical release point. In previous articles I combined the horizontal and vertical release points into a release slot, but here I want to keep them separate. The reason is that PITCHf/x doesn’t know where a pitcher is standing on the rubber. If a pitcher shifts left or right, his horizontal release point changes even though nothing is wrong with his arm. If he lowers his arm angle because his arm is hurting, both the horizontal and vertical release points change. Keeping the two separate should let the vertical release point pick up a dropping arm angle while partly shielding the data from where the pitcher stands on the rubber.
In addition, we don’t want to use the raw PITCHf/x numbers; we want to compare them with the pitcher’s known healthy numbers. To do this, we calculate baseline values for each variable by averaging all of a pitcher’s pitches before the start he had to leave due to injury. We then compare the last 10 pitches he threw before going on the DL with those baseline means.
To normalize this process, we express these comparisons in standard deviations. This sounds complicated, so here is an example. If Joe Pitcher’s average fastball is 90 mph with a standard deviation of 1.5 mph and he throws a fastball at 87 mph, that fastball is (87 - 90) / 1.5 = -2 standard deviations from his average. Each pitch is always compared with the average for the same pitch type (fastballs to fastballs, sliders to sliders, etc.). This pitch and nine others are then averaged for each variable we are interested in. This way we can properly compare a fastball thrown by Jamie Moyer with one thrown by Ubaldo Jimenez.
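The baseline and standard-deviation comparison can be sketched as below. This is an illustration under assumptions, not the author’s code: each pitch is a dict with a pitch-type label and the five variables under hypothetical key names, the baseline uses the sample standard deviation, and in the study `final_pitches` would be the last 10 pitches before the pitcher was pulled. Each pitch type needs at least two baseline pitches for a standard deviation to exist.

```python
import statistics
from collections import defaultdict

# Hypothetical key names for the five variables: speed, horizontal/vertical
# movement and horizontal/vertical release point.
VARIABLES = ("speed", "h_move", "v_move", "h_release", "v_release")

def baselines(healthy_pitches, variables=VARIABLES):
    """Per-pitch-type mean and sample standard deviation of each variable,
    computed from the pitches thrown while the pitcher was known to be healthy."""
    grouped = defaultdict(lambda: defaultdict(list))
    for p in healthy_pitches:
        for var in variables:
            grouped[p["type"]][var].append(p[var])
    return {
        ptype: {var: (statistics.mean(v), statistics.stdev(v)) for var, v in by_var.items()}
        for ptype, by_var in grouped.items()
    }

def mean_z_scores(final_pitches, base, variables=VARIABLES):
    """Average, over the final pitches, of how many standard deviations each
    variable sits from the pitcher's baseline for that same pitch type."""
    totals = {var: 0.0 for var in variables}
    for p in final_pitches:
        for var in variables:
            mu, sd = base[p["type"]][var]
            totals[var] += (p[var] - mu) / sd
    return {var: t / len(final_pitches) for var, t in totals.items()}

# Joe Pitcher: healthy fastballs average 90 mph with a standard deviation of 1.5 mph.
healthy = [{"type": "FF", "speed": s} for s in (88.5, 90.0, 91.5)]
base = baselines(healthy, variables=("speed",))
print(mean_z_scores([{"type": "FF", "speed": 87.0}], base, variables=("speed",)))
# {'speed': -2.0}
```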
You may be wondering why I picked the last 10 pitches a pitcher threw. I want a small sample because the pitcher is likely feeling the effects of the injury only close to the time he is pulled. But I also want a large sample because at any point in a game a pitcher might throw two or three poor pitches in a row. After some experimentation, 10 appeared to be the sweet spot.
A similar procedure is applied to healthy pitchers to build a background sample. This sample plays a role like the control group in a clinical trial, and it is important because most starters are tiring by the end of a regular start. We don’t want to confuse this normal fatigue with injury, so the background sample helps remove that bias. I created it by comparing, against each pitcher’s baseline, the last 10 pitches of every start after which the starter made another start five days later. This should exclude cases where a starter skipped a start or was pushed back because of a minor injury that didn’t require a trip to the DL.
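Selecting the background starts can be sketched as follows. The helper name `background_starts` is hypothetical, and reading "five days later" literally as a gap of exactly five calendar days is an assumption; real rotations with off days might call for a looser rule. The dates belong to a made-up pitcher.

```python
from datetime import date, timedelta

def background_starts(start_dates, gap_days=5):
    """Starts followed by the pitcher's next start exactly `gap_days` later,
    i.e. starts after which he stayed on his regular turn."""
    dates = sorted(start_dates)
    return [d for d, nxt in zip(dates, dates[1:]) if nxt - d == timedelta(days=gap_days)]

# Hypothetical schedule: the third start is followed by a nine-day gap
# (skipped or pushed-back turn), and the last start has no follow-up yet.
starts = [date(2008, 6, 1), date(2008, 6, 6), date(2008, 6, 11), date(2008, 6, 20)]
print(background_starts(starts))
# [datetime.date(2008, 6, 1), datetime.date(2008, 6, 6)]
```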
Now we are ready to look for correlations between our variables and membership in the injured sample versus the healthy sample. The problem is that no variable by itself produces even a weak correlation (a correlation coefficient greater than 0.3). This is not too surprising: if any one of these variables were even weakly correlated, it likely would have been discovered already and teams would know what to watch for. Also, when you study something as complicated as throwing a baseball, where everything has to be in sync for good results, you often need a wide range of variables to properly study the sample. To go further, we are going to have to pull out a bigger hammer.
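The single-variable check can be sketched as a Pearson correlation between each variable and a 0/1 label (1 for the injured sample, 0 for the healthy background), which for a binary label is the point-biserial correlation. The article does not say exactly how its correlations were computed, so treat this as one reasonable reading; the function names and toy numbers are hypothetical.

```python
import numpy as np

def label_correlations(features, labels, names):
    """Pearson correlation of each feature column with a 0/1 label
    (1 = injured sample, 0 = healthy background)."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    return {name: float(np.corrcoef(X[:, j], y)[0, 1]) for j, name in enumerate(names)}

def weakly_correlated(corrs, threshold=0.3):
    """Names of the variables whose |r| exceeds the threshold."""
    return [name for name, r in corrs.items() if abs(r) > threshold]

# Toy mean z-scores for four 10-pitch windows (two injured, two healthy).
features = [[-1.0, 0.2], [0.1, 0.2], [-0.8, -0.2], [0.2, -0.2]]
labels = [1, 0, 1, 0]
corrs = label_correlations(features, labels, ("speed", "h_move"))
print(weakly_correlated(corrs))  # ['speed']
```

In the real 2008 sample, the article reports that no variable clears the 0.3 threshold on its own.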
The hammer of choice here is a neural network. Neural networks are used extensively in situations like this, where many variables are each weakly correlated with the thing you are interested in. The network sets up a hidden layer and, after training, produces one variable that is a combination of all the input variables. This new variable has a much stronger correlation with the thing you are interested in. So I fed my sample of PITCHf/x variables for injured and healthy pitchers to a pre-built neural network. The results were rather amazing.
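The author used a pre-built package, which the article does not name. As a stand-in to show what "a hidden layer that combines the inputs into one variable" means, here is a small from-scratch sketch in NumPy: one tanh hidden layer and a sigmoid output, trained by full-batch gradient descent on binary cross-entropy. The function name `train_network`, the layer size, learning rate and epoch count are all assumptions, and the training data below are synthetic, not PITCHf/x measurements.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer network (tanh hidden units, sigmoid output) trained by
    full-batch gradient descent on binary cross-entropy.

    X: (n, d) array, e.g. the five mean z-scores for each 10-pitch window.
    y: (n,) array with 1 for the injured sample and 0 for the healthy sample.
    Returns a function mapping new rows of X to outputs between 0 and 1.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)             # hidden layer, shape (n, hidden)
        p = sigmoid(H @ W2 + b2)             # network output, shape (n,)
        g = (p - y) / n                      # gradient of mean loss w.r.t. output logit
        gH = np.outer(g, W2) * (1.0 - H**2)  # back-propagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return lambda X_new: sigmoid(np.tanh(X_new @ W1 + b1) @ W2 + b2)

# Synthetic example: "injured" windows sit below baseline on all five variables.
rng = np.random.default_rng(1)
injured = rng.normal(-1.5, 1.0, (200, 5))
healthy = rng.normal(0.0, 1.0, (200, 5))
X = np.vstack([injured, healthy])
y = np.r_[np.ones(200), np.zeros(200)]
score = train_network(X, y)
print(((score(X) > 0.5) == (y == 1)).mean())  # training accuracy
```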
Here is the output from the neural network. Both the healthy and injured samples have been normalized so that each has a total area of one. The network tries different combinations of the input variables until it creates a variable for which most of the signal (the injured pitchers) sits near one and most of the background (the healthy pitchers) sits near zero.
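Normalizing each sample to unit area is what lets two samples of very different sizes be drawn on the same plot. A minimal sketch with NumPy’s `np.histogram(..., density=True)`, which divides counts by the sample size times the bin width; the function name, the 20-bin choice and the example scores are hypothetical.

```python
import numpy as np

def unit_area_histograms(scores, labels, bins=20):
    """Histogram the network output separately for the injured (label 1) and
    healthy (label 0) samples, each scaled so that its total area is one."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, bins + 1)
    signal, _ = np.histogram(scores[labels == 1], bins=edges, density=True)
    background, _ = np.histogram(scores[labels == 0], bins=edges, density=True)
    return edges, signal, background

edges, signal, background = unit_area_histograms(
    [0.92, 0.81, 0.35, 0.07, 0.12, 0.55, 0.02], [1, 1, 1, 0, 0, 0, 0])
widths = np.diff(edges)
print((signal * widths).sum(), (background * widths).sum())  # both one, up to rounding
```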
While the separation between the two samples looks large, it is actually one of the poorest separations I have seen. Quite often the background is just a spike at zero and the signal just a spike at one, so the network clearly struggled with these data. You can then place a cut on this variable wherever you want, and the data that pass it will contain a larger percentage of signal. For instance, if I remove all the data below 0.5, the region that remains holds about three times as large a share of the injured sample as of the healthy sample in the normalized distributions, while in the raw, unnormalized samples the injured pitchers there are about half as numerous as the healthy ones. This seems like a reasonable spot to me, so I will call the region beyond 0.5 the injury zone. A pitcher who enters this zone has about a one-in-three chance of ending up on the DL instead of making his next scheduled start.
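The two ways of reading the region above the cut can be made concrete with a small sketch. The counts below are made up so that they mirror the article’s "three times" and "one in three"; they are not the study’s data. Four injured windows (three above the cut) against 24 healthy windows (six above it) give a ratio of shares of 0.75 / 0.25 = 3, while the raw counts above the cut are 3 injured to 6 healthy, a 3 / 9 = 1/3 chance of injury. The function name is hypothetical.

```python
import numpy as np

def injury_zone_summary(scores, labels, cut=0.5):
    """Compare the injured (label 1) and healthy (label 0) samples above a cut.

    Returns (share_ratio, p_injury):
      share_ratio: fraction of the injured sample above the cut divided by the
                   fraction of the healthy sample above it (normalized view);
      p_injury:    injured / (injured + healthy) among raw counts above the cut.
    Assumes at least one healthy entry lies above the cut.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    inj_in = scores[labels == 1] > cut
    hea_in = scores[labels == 0] > cut
    share_ratio = inj_in.mean() / hea_in.mean()
    p_injury = inj_in.sum() / (inj_in.sum() + hea_in.sum())
    return float(share_ratio), float(p_injury)

scores = [0.8, 0.7, 0.6, 0.2] + [0.6] * 6 + [0.1] * 18
labels = [1] * 4 + [0] * 24
print(injury_zone_summary(scores, labels))  # (3.0, 0.3333333333333333)
```

Because healthy starts vastly outnumber injured ones in practice, a large ratio in the normalized plot can still translate into a modest probability for any one pitcher.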
I won’t report the messy combination of variables the network produced, but I will say that the most important variable is speed, followed by vertical movement, horizontal movement, vertical release point and then horizontal release point. We learned previously that even after throwing a lot of pitches, pitchers don’t lose much speed on their fastballs, so it isn’t too surprising that when a pitcher has lost a good deal of speed on any pitch, something might be wrong.
Vertical movement was the second most important variable, and if you add in horizontal movement, movement as a whole is actually more important than speed. Movement here is created entirely by spin and drag, and if a pitcher isn’t quite right it is very hard for him to put the proper spin on the ball. Vertical movement is likely more important than horizontal because most of the spin applied is backspin (fastball) or topspin (curveball), which makes the ball move up or down. Large horizontal movement is most often found in sliders, but not all pitchers throw a slider, and even those who do don’t always get a large break on it.
Release point was the least sensitive variable, but there is a good chance this is a limitation of the data. Not only does a pitcher moving left or right on the rubber change the release point, but PITCHf/x tracks the ball only while it is in flight, so it reports only a horizontal and a vertical number. No information about the third dimension (closer to or farther from home plate) is available, so our view of the actual release is rather incomplete. If we had information on where the pitcher was standing and on that extra dimension, release point would likely be much more sensitive.
Pros and cons to this method
This method takes recorded PITCHf/x variables and shows a distinct difference between pitchers who were injured and pitchers who were healthy. Because the data compare a pitcher with himself, the method is preferable to a single rigid measure like pitch counts. With this information, we should be able to track pitchers and identify when they are approaching the injury zone, but this isn’t a silver bullet just yet.
First, while I have tried my hardest to remove biases from these samples, some certainly remain. One of the biggest issues is the severity of the injury. In this study, a tired arm is treated exactly the same as a pitcher blowing out his elbow: if it means a trip to the DL, it qualifies. You could imagine a system in which injuries were weighted by severity, but getting that information can be extremely difficult. For instance, when the Mets put Billy Wagner on the DL, the listed reason was forearm stiffness, but it was later discovered that he needed reconstructive surgery on his elbow. So simply sifting through the transaction list isn’t enough. For high-profile pitchers like Wagner, (mostly) accurate information eventually comes out without much difficulty. For lesser-known players, getting accurate information can be daunting.
Also, tracking something like this in real time is easier said than done. The PITCHf/x data are updated on the Major League Baseball Advanced Media (MLBAM) website only after an at-bat, and these updates aren’t always super quick. A pitcher easily could have thrown 10 more pitches than you have access to, so even if you can grab and process the updated data instantaneously, there is going to be some lag.
In addition, the data have some quirks. Not only do correction factors need to reflect park-to-park differences, but sometimes a malfunction records bad data, and a single pitch tracked completely wrong could make things appear much worse than they are. Also, quickly identifying the pitch type for a proper comparison isn’t easy. MLBAM’s algorithm (which also uses a neural network) didn’t produce very appealing results in 2008. It may improve in 2009, but it may still have too many issues. My classification algorithm is considerably more accurate, but it takes several seconds per pitch, which is far too slow to run in real time.
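To see how much one bad reading can matter, consider a hypothetical 10-pitch window of speed z-scores in which nine pitches sit slightly below baseline and one is misrecorded at 60 mph against a 90 mph average with a 1.5 mph standard deviation, giving (60 - 90) / 1.5 = -20. The numbers are invented for illustration. The median is shown only as one possible guard against a single bad reading; the article does not say it uses one.

```python
import statistics

# Nine pitches slightly below baseline, plus one misrecorded pitch at z = -20.
window = [-0.5] * 9 + [(60.0 - 90.0) / 1.5]

print(statistics.mean(window))    # -2.45
print(statistics.median(window))  # -0.5
```

One bad pitch moves the 10-pitch average from -0.5 to -2.45 standard deviations, which is why tracking errors need to be caught before the numbers are trusted.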
Even if you could properly track this in real time, there is no guarantee that you would catch these injuries in time. First, about 25 percent of the injured pitchers never got to the injury zone. It is likely these injuries occurred very suddenly—a pitcher feeling a pop in his elbow, for example. It appears these are somewhat rare events, but they do happen and this method won’t provide complete protection.
You can ratchet down your cut, but the lower you go, the less real risk the flagged pitchers face. This would lead to pitchers who weren’t really in any danger being removed from the game. Even pitchers who enter, or come near, the injury zone and are properly removed might still be injured. Possibly these data separate as well as they do because the pitcher is already suffering the effects of the injury, whether he feels it or not. A manager might properly pull the pitcher only to find out that he still needs Tommy John surgery.
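The trade-off of lowering the cut can be sketched by scanning several cut values and counting, for each, the share of injured windows caught and the number of healthy windows that would also have been pulled. The function name and the network outputs below are hypothetical, not the study’s data.

```python
import numpy as np

def scan_cuts(scores, labels, cuts):
    """For each cut, return (cut, share of injured windows above the cut,
    number of healthy windows above the cut that would also be pulled)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    injured, healthy = scores[labels == 1], scores[labels == 0]
    return [(c, float((injured > c).mean()), int((healthy > c).sum())) for c in cuts]

# Hypothetical network outputs: 4 injured windows, 8 healthy windows.
scores = [0.9, 0.7, 0.55, 0.3, 0.6, 0.45, 0.35, 0.2, 0.15, 0.1, 0.05, 0.02]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
for cut, caught, pulled in scan_cuts(scores, labels, [0.5, 0.4, 0.3, 0.2]):
    print(f"cut {cut:.1f}: caught {caught:.0%} of injured, pulled {pulled} healthy")
```

As the cut drops, the share of injured windows caught can only rise, but so does the number of healthy pitchers pulled for no reason.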
That said, I feel this is an important step in understanding pitcher injuries. If you dig deep enough into the data, it is clear that pitchers who land on the DL are pitching differently from healthy pitchers. Future studies can look at how quickly a pitcher moves toward the injury zone and whether there is a sign of danger earlier than the very end. The separation between these samples also might be improved with better data, a more advanced neural network, or other methods.
Lastly, the real test will be to look at the 2009 data and determine how well this correlation holds. If the injury zone appears there, then we can have very good confidence in the method.
Sabathia the horse
I want to close by looking at the pitcher who was ridden as hard as any in the game last year, C.C. Sabathia.
Sabathia gobbled up innings late in the year as the Brewers desperately attempted to make the playoffs. There was much talk about the Brewers’ right to use a rented player this way, and about then-manager Ned Yost in particular.
Near the end of the season, I looked at the effects of this workload and found that Sabathia didn’t seem to be having any issues. With this new metric, we can ask how close Sabathia came to the injury zone. The answer: not very. His highest value was a mere 0.13, ironically on the last day of the season against the Cubs, when the Brewers clinched a playoff spot. This is very good news for Yankees fans, who shouldn’t worry that the Brewers wore Sabathia out, and it probably means Sabathia will be able to handle a large load this season as well.