Pitcher similarity scores
by Josh Kalk
February 12, 2008
Since baseball was invented, people have been trying to compare players to other players. You will hear fans say "That guy is like Jose Reyes, without the speed" or a scout describe a Double-A pitcher as "having a ceiling of Roy Oswalt." Intuitive comparisons like these are useful, but aren't very quantitative. How similar is a player to Reyes if he doesn't have the speed? What exactly does it mean for a pitcher's future if he has a ceiling of a Roy Oswalt?
To help answer these questions, you have to start using numbers. Bill James was the first to do this, in 1986, when he added similarity scores to his "Baseball Abstract." The scheme was straightforward: to calculate the similarity score between two players, you started with 1,000, then subtracted points depending on how much the two players' statistics differed in key categories. James' most recent update to similarity scores came in "The Politics of Glory," published in 1994. (If you can't find this excellent book, you can see the full explanation at baseball-reference.)
These similarity scores are best used at or near the end of a player's career, because they rely heavily on differences in counting statistics. It is no surprise, then, that these scores are great for comparing potential Hall of Fame players. If a player is most similar to someone already in the Hall, then it makes sense that the player should be as well.
The scores are not intended to compare pitchers' stuff, just the results. For example, Roger Clemens' most similar comparison is Greg Maddux. While this makes sense because they are similar in categories like wins, losses and ERA, these pitchers achieved these numbers quite differently. So, if we are interested in a similarity score that compares two pitchers' stuff, we will have to look elsewhere.
Another similarity score, developed by Nate Silver for Baseball Prospectus, is at the heart of his PECOTA forecasting system. Silver's similarity score looks at only the last three seasons and relies heavily on rate statistics (like K/9 and BB/9 for pitchers) and attributes like handedness, height and age. Silver's scores do an excellent job of finding the comparable players that PECOTA then uses to forecast a player's future.
Because of its reliance on rate statistics, the Silver system does a better job of comparing pitchers' stuff, but it will still occasionally throw a clunker in there. For example, before the 2007 season, retired knuckleballer Charlie Hough posted a similarity score of 45 with Brewers fireballer Derrick Turnbow. While 45 might not sound very similar, it actually is. According to the BP glossary, a score of 0 already indicates "meaningfully similar" players, and a score above 50 indicates "substantially comparable" ones.
Why is Hough showing up as so similar to Turnbow? Both have a hard time finding the strike zone, regularly walking nearly four batters every nine innings. But the reasons they miss the strike zone are quite different. Turnbow puts every last ounce of his energy into making the ball fly as fast as it can, and as a result he has a hard time controlling it. Hough fluttered the ball in the general direction of home plate and often missed the strike zone. While the results are similar, the methods could not be more different. So if we are going to properly compare pitchers' stuff, we are going to have to go deeper than rate statistics.
This is where PITCHf/x comes in. We can compare pitchers' stuff by comparing their fastballs, curveballs, sliders, etc., to each other. To do this, I will compare each pitch type using three kinds of variables: the percent of the time it is thrown, its speed (mph), and its horizontal and vertical movement (inches) relative to a pitch thrown without spin. By combining these variables in a crafty way, I will come up with a score that indicates roughly how similar two pitchers' stuff is.
Note that none of these variables contains anything related to the success of these pitches or the pitcher's command of them. This similarity score focuses only on the types of pitches a pitcher throws and how he throws them. The next section lays out the mathematical framework for the similarity score. While I am going to do my best to explain everything carefully, if you are really math averse or just want to skip to the results, now is the time.
The math behind the similarity scores
I could approach the problem the way Bill James does in his similarity scores, subtracting a point for each difference of .02 in ERA (to a maximum of 100 points), but there are a few problems with doing something like that. First, the artificial cap placed on variables like ERA is a problem: a pitcher with a 2.00 ERA comes out just as similar to a pitcher with a 4.00 ERA as he does to a pitcher with a 6.00 ERA. James does this because, left uncapped, some pitchers' similarity scores would blow up and be completely dominated by the difference in ERA.
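To make the cap concrete, here is a minimal Python sketch of just the ERA component as described above: one point per .02 of ERA difference, capped at 100 points, subtracted from James' starting value of 1,000. The function names are hypothetical, and James' other categories are deliberately left out, so this is not his full formula.

```python
def james_era_penalty(era1: float, era2: float) -> float:
    """One point per .02 of ERA difference, capped at 100 points."""
    return min(abs(era1 - era2) / 0.02, 100.0)


def era_only_similarity(era1: float, era2: float) -> float:
    # James-style scores start at 1,000 and subtract penalties; only the
    # ERA category is modeled here.
    return 1000.0 - james_era_penalty(era1, era2)


# Because of the cap, a 2.00 ERA pitcher looks exactly as similar to a
# 4.00 ERA pitcher as to a 6.00 ERA pitcher.
print(era_only_similarity(2.00, 4.00))  # 900.0
print(era_only_similarity(2.00, 6.00))  # 900.0
```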
Still, an ERA difference above 2.00 should count for something extra, so we would like a scale with no cap that instead lessens the penalty the further apart the pitchers are. In our ERA example, we could subtract one point for each difference of .02 in ERA up to 50 points, then one point for each difference of .04 in ERA for the next 50 points, and so on. Tax brackets work in a somewhat similar way.
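The bracket idea can be sketched in a few lines of Python. The text only says "and so on" after the first two brackets, so this sketch assumes every bracket is worth 50 points and that the ERA step per point keeps doubling (.02, .04, .08, ...). The function name and that doubling rule are illustrative assumptions.

```python
def tiered_era_penalty(diff: float, first_step: float = 0.02,
                       points_per_bracket: float = 50.0) -> float:
    """Uncapped penalty that grows more slowly as the ERA gap widens.

    Assumption: each bracket is worth `points_per_bracket` points, and the
    ERA step per point doubles from one bracket to the next.
    """
    remaining = abs(diff)
    step = first_step
    penalty = 0.0
    while remaining > 0:
        bracket_width = step * points_per_bracket  # ERA span covered by this bracket
        used = min(remaining, bracket_width)
        penalty += used / step
        remaining -= used
        step *= 2  # the next bracket charges one point per twice the ERA gap
    return penalty


print(round(tiered_era_penalty(2.00), 6))  # 75.0
print(round(tiered_era_penalty(4.00), 6))  # 112.5
print(round(tiered_era_penalty(6.00), 6))  # 137.5
```

Unlike the capped version, a 2.00 ERA pitcher now comes out less similar to a 6.00 ERA pitcher than to a 4.00 ERA pitcher, while no single category can run away with the score as quickly.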
The other question is why James chose one point for each difference of .02 in ERA. Does that make ERA too important, or maybe not important enough? Should he instead have chosen one point for each difference of .03 or maybe something in between? My guess is James chose .02 because it's a nice round number and when he finished assigning point values for ERA, walks, strikeouts, etc., the subtractions from each variable were pretty close to each other.
While pretty close is nice, mathematics offers a way of aligning the variables so that their weights are exactly equal. The method is called normalizing the data, or finding a normalizing constant. The advantage is that we get exact weights for each of our variables, so even when comparing things like pitch speed in mph and movement in inches, we can put them on equal footing. The disadvantage is that the math is quite hard to do and to explain. Lastly, I'd prefer a scale from 0 to 100, like a percentage, where 0 is completely dissimilar and 100 is exactly the same.
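As a generic illustration of putting mph and inches on equal footing, one common approach is to express each value in standard units (a z-score): subtract a league mean and divide by a league standard deviation. This is not the exact normalization used in the equation below, and the league means and standard deviations here are made-up placeholder values, not real PITCHf/x figures.

```python
from dataclasses import dataclass


@dataclass
class LeagueNorm:
    """Hypothetical league-wide mean and standard deviation for one attribute."""
    mean: float
    sd: float

    def z(self, value: float) -> float:
        # Distance from the league mean, measured in standard deviations,
        # so the original unit (mph or inches) drops out.
        return (value - self.mean) / self.sd


# Placeholder values for illustration only.
SPEED = LeagueNorm(mean=91.0, sd=2.0)   # mph
H_MOVE = LeagueNorm(mean=6.0, sd=4.0)   # inches

# A 2 mph gap is one standard deviation; an 8 inch gap is two.
print(SPEED.z(93.0) - SPEED.z(91.0))    # 1.0
print(H_MOVE.z(10.0) - H_MOVE.z(2.0))   # 2.0
```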
So here is the equation I chose. This is a first attempt and this equation might change as I do more research with it.
Okay, that might be a pretty intimidating mess to many of you, but once I break it down, it will make more sense. The similarity between two pitchers starts at 100. Every difference found is subtracted from that. That is the first part of the mess.
The next part of the equation is the Greek letter sigma, which stands for a sum. All it means is: add up the expression that follows, first plugging in the value on the bottom for the variable n and continuing until you reach the value on the top. Here the sum runs over the seven different pitch types we use to compare the pitchers. (I haven't included the knuckleball because there is too little data on knuckleballs.) The sum just saves me from writing out each term separately. So all this says is: compare the pitchers by looking at the differences in each type of pitch (e.g., pitcher 1's fastball to pitcher 2's fastball, and so on).
Then we compare the pitchers by the percent of the time they throw each type of pitch. For example, if pitcher 1 throws his fastball 66 percent of the time and pitcher 2 throws his fastball 50 percent of the time, this term would be 16/2, or 8. The absolute value is there to make sure the term is positive, because the difference between the two pitchers needs to be the same whether you use Greg Maddux as pitcher 1 and Tom Glavine as pitcher 2 or vice versa.
I am dividing by two to keep the total similarity between 0 and 100. Suppose we compared a pitcher who threw nothing but fastballs to a pitcher who threw nothing but curveballs. Comparing their fastballs would give |100-0|/2, or 50, and comparing their curveballs would give |0-100|/2, or 50. Those add to 100, and 100-100 is 0, meaning these two pitchers are completely different. If two pitchers throw the same types of pitches in the same amounts, they will come out with a high similarity score. If they throw different pitches, or throw the same pitches in radically different amounts, they will come out not very similar.
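The frequency part of the score, on its own, can be sketched in Python as below. It covers only the |P1 - P2|/2 terms summed over pitch types; the speed and movement terms are deliberately left out, so `frequency_only_similarity` is a hypothetical helper, not the full equation. The pitch-type labels are illustrative (the seven types used in the equation aren't listed here), and a pitch missing from one pitcher's repertoire counts as thrown 0 percent of the time.

```python
def frequency_penalty(freq1: dict[str, float], freq2: dict[str, float]) -> float:
    """Sum over pitch types of |p1 - p2| / 2, with percentages on a 0-100 scale."""
    pitch_types = set(freq1) | set(freq2)
    return sum(abs(freq1.get(p, 0.0) - freq2.get(p, 0.0)) / 2 for p in pitch_types)


def frequency_only_similarity(freq1: dict[str, float], freq2: dict[str, float]) -> float:
    # Start at 100 and subtract only the frequency differences.
    return 100.0 - frequency_penalty(freq1, freq2)


# 66/34 vs. 50/50: the fastball term is 16/2 = 8 and the slider term is also 8.
print(frequency_only_similarity({"fastball": 66, "slider": 34},
                                {"fastball": 50, "slider": 50}))  # 84.0

# All fastballs vs. all curveballs: 50 + 50 = 100 points off, so 0.
print(frequency_only_similarity({"fastball": 100}, {"curveball": 100}))  # 0.0
```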
Okay, now to the most complicated part of the equation, the Cs, Cx and Cz terms. We have already compared the pitchers by the frequency of each pitch type. Now it is time to compare them by the attributes of the pitches: the speed (Cs) and the movement relative to a ball thrown without spin in the horizontal (Cx) and vertical (Cz) directions. If one of the two pitchers doesn't throw a pitch, these terms are set to 1, because the other pitcher's pitch has no attributes to be compared to. If both pitchers throw the pitch, we have to calculate Cs, Cx and Cz. I have written out only Cs, but Cx and Cz are identical, just plugging movement instead of speed into the equation for Cs.
Cs is terribly messy because we need an equation whose penalty grows more and more slowly as the two pitchers get further apart, and that treats speed and movement on equal footing. This is done using a normal, or Gaussian, distribution. These "bell curves" are used everywhere in statistics to describe everything from IQ and test scores to the height and weight of a population. The distribution is handy because we can normalize the area under a Gaussian curve to 1. That is how we put the speed and movement of a pitch on equal footing: the Gaussian for each has the same total area. The normalization factor is the fraction outside of the integral, and the Gaussian itself is the term inside the integral. (If you are unfamiliar with standard deviation, which sets the width of the bell curve, it is worth reading up on.)
(If you haven't taken calculus, that funny squiggle with S1s on the bottom and S2s on the top is called an integral. All you need to know is that it means: find the area under the curve starting at point S1s and going to point S2s. A picture of the area under a curve, like the one on Wikipedia, gives a good idea of what this looks like.)
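Reading the description above literally, Cs for a given pitch type presumably has the following form. The symbols mu_s and sigma_s (the league mean and standard deviation of that pitch's speed) are my notation, and I am assuming S1s is the smaller of the two pitchers' speeds so the area is nonnegative. The second form rewrites the integral with the standard normal cumulative distribution function Phi, which is how it is usually computed in practice.

```latex
C_s = \frac{1}{\sigma_s \sqrt{2\pi}} \int_{S_{1s}}^{S_{2s}}
      \exp\!\left( -\frac{(s - \mu_s)^2}{2\sigma_s^2} \right) ds
    = \Phi\!\left( \frac{S_{2s} - \mu_s}{\sigma_s} \right)
    - \Phi\!\left( \frac{S_{1s} - \mu_s}{\sigma_s} \right),
\qquad
\Phi(z) = \frac{1}{2}\left[ 1 + \operatorname{erf}\!\left( \frac{z}{\sqrt{2}} \right) \right]
```

Because the whole curve has area 1, Cs always falls between 0 (identical speeds) and 1, which fits setting the term to 1 when one pitcher doesn't throw the pitch at all.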
The beauty of a Gaussian distribution is that when you compare two values on its curve, say 2 mph apart, how much that difference matters depends on where on the curve they sit. If one pitcher throws 1 mph above league average and the other throws 1 mph below league average, many, many pitchers fall in between them (shown by the larger area under the curve in the middle). So while the absolute difference between the pitchers is only 2 mph, the relative difference is very large. On the other hand, if both pitchers are fireballers who throw well above league average, far fewer pitchers fall between the two (the small area under the curve at the extremes), so the same 2 mph difference is relatively much smaller.
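The effect described above can be checked with a short Python sketch that computes a C term as the area under a normal curve between two pitchers' values, using only `math.erf` from the standard library. The league fastball mean and standard deviation are placeholder numbers, not real PITCHf/x data, and `c_term` is a hypothetical helper; the same function would serve for Cx and Cz by passing movement values and the matching league mean and standard deviation.

```python
import math
from typing import Optional


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Area under a normal curve from minus infinity up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def c_term(value1: Optional[float], value2: Optional[float],
           mean: float, sd: float) -> float:
    """Area under the league bell curve between two pitchers' values.

    Returns 1.0 when either pitcher doesn't throw the pitch (value is None).
    """
    if value1 is None or value2 is None:
        return 1.0
    lo, hi = sorted((value1, value2))
    return normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd)


# Placeholder league fastball figures, for illustration only.
LEAGUE_FB_MEAN, LEAGUE_FB_SD = 91.0, 2.5

middle = c_term(90.0, 92.0, LEAGUE_FB_MEAN, LEAGUE_FB_SD)   # straddles the average
extreme = c_term(97.0, 99.0, LEAGUE_FB_MEAN, LEAGUE_FB_SD)  # two fireballers
print(round(middle, 3), round(extreme, 3))
```

Both pairs are 2 mph apart, but with these placeholder figures the pair straddling the league average covers roughly 40 times as much area as the two fireballers, which is the "relative difference" effect described above.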
So, just as with the frequency of the pitches, if two pitchers throw the same pitch with almost the same speed and movement, the difference between them will be small. If they throw the same pitch with very different speed and movement (think Jamie Moyer's and Joel Zumaya's fastballs), the difference will be large.
Whew, finally we are done. If you don't fully understand what is going on in every part of the equation, that's okay. Just remember that the equation looks at the seven pitch types and compares them using frequency of each pitch and the speed and movement for that pitch.
So which two pitchers throw the most similar stuff? Using the above equation, the answer is Will Ohman and Rafael Perez, with a similarity score of 99.9. You can take a quick look at their stuff on their player cards at my blog.
Now, when you think about Ohman and Perez, the first thing that jumps out at you is their difference in success. Ohman didn't have a great 2007 pitching out of the Cubs bullpen; Perez was lights out in Cleveland. That said, they both are lefties (though the equation couldn't care less about that) who throw a fastball and a slider. They throw these pitches at nearly the same ratio (54 percent fastball, 46 percent slider) and with nearly the same speed (90 mph on the fastball and about 83 mph on the slider). In fact, looking at their player cards you can tell why they ended up with very high similarity scores.
It isn't too much of a surprise that a pair of two-pitch pitchers ended up with the highest similarity scores. The equation punishes pairs of pitchers if they don't throw the same type of pitches, and the more types of pitches thrown, the more chance for them to be thrown differently. Looking at pitchers who throw more pitches might be a little more enlightening. So here are the five most similar pitchers to Barry Zito, according to the equation.
Pitcher              Similarity score
Matt Herges          94.7
Mark Hendrickson     93.9
Ted Lilly            83.6
Rich Hill            81.5
Jon Garland          79.5
Zito throws a fastball, a change-up, a curveball and a slider, in that order of frequency. Although his nasty curve ranks only third, he throws relatively few fastballs, so he still uses the curve fairly often. What about the other pitchers on the list?
Herges throws those same four pitches but doesn't have nearly the same bite to his curve, so he throws it about half as much. His other pitches are very comparable, so he ends up rated very highly.
Hendrickson doesn't throw a slider and his curve isn't much to write home about, but he uses it almost as frequently as Zito so he grades out pretty well. After Hendrickson, though, Zito's comps fall off a cliff.
Lilly and Garland throw the right four pitches, but the frequencies and attributes are all messed up. Hill is missing the slider, but his strong curve is enough to keep him in the ballpark.
So what does this say about Zito? Well, it implies that there are only a few pitchers out there who are very similar to him. What exactly does that mean? How unique a pitcher is Zito? Those are questions for the next column; this one has only scratched the surface of what we can potentially do with these similarity scores.
References and Resources
I'd like to credit reader Kent, who mentioned this idea to me several months ago.