Pitcher similarity scores (Part 2)
by Josh Kalk
February 19, 2008
In the first part of this article we looked at comparing pitchers using a new similarity score that was intended to compare pitchers' stuff. Here, stuff means what types of pitches a pitcher throws, how often he throws them, how fast the ball travels and how much movement it has compared to a pitch thrown without spin.
In this article, we will dig deeper into these similarity scores and some results from them. But first, I made a change to the similarity scores themselves. The idea is still the same but the math is slightly different. If you want to see the math, read on. If not, skip on to the results in the next section.
Similarity scores take two
After conversing with readers who offered help with the similarity scores over at BallHype, I settled on a new equation.
Again, the beauty of this equation is that all scores will be between zero and 100. I toyed with removing the pitch frequency but found that too many pitchers who just dabbled with a pitch would end up with similarity scores dominated by how they threw that pitch. So pitch frequency is back, but now it is on the same footing as the other pitch attributes with a weight factor out front.
Results for the new similarity scores
There were 514 pitchers who threw at least 100 pitches tracked by PITCHf/x (after removing the two knuckleballers), so if you compare every pitcher to every other pitcher and plot all the similarity scores, you end up with a histogram like this.
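For a sense of scale, here is a minimal Python sketch of the all-pairs comparison behind a histogram like this one. It assumes each pair of pitchers is counted once (A vs. B is the same comparison as B vs. A); the `similarity` argument and the five-point bin width are hypothetical stand-ins, not the actual equation or binning used for the plot.

```python
from collections import Counter
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n pitchers."""
    return n * (n - 1) // 2


def score_histogram(pitchers, similarity, bin_width=5):
    """Bin the similarity score of every unordered pair of pitchers.

    `similarity` is a hypothetical callable returning a score in [0, 100].
    Returns a Counter mapping the lower edge of each bin to a pair count.
    """
    counts = Counter()
    for a, b in combinations(pitchers, 2):
        score = similarity(a, b)
        # A perfect 100 goes into the top bin rather than a bin of its own.
        edge = min(int(score // bin_width) * bin_width, 100 - bin_width)
        counts[edge] += 1
    return counts


print(pair_count(514))  # 131841
```

Under that assumption, 514 pitchers produce 131,841 comparisons, which is why the histogram is smooth enough to show distinct bumps.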
You can kind of break this histogram into three parts. The first part is the spike at 0. This occurs when two pitchers don't throw any pitches in common (e.g., one throws a sinker and a slider and the other throws a fastball and a curve). The next part is the bump near 30. This is from pitchers who throw just one pitch in common (e.g., one throws a sinker and a slider and the other throws a fastball and a slider).
The reason for the large spread in this bump is that how often the two pitchers throw their common pitch varies. As that percentage grows, so does the similarity score. Lastly, the curve rises until near 100, where it drops to zero. These represent pitchers who throw virtually identical pitches. It drops to zero at 100 because that is the maximum similarity. I like this curve because you can think of these scores as grades (60 and below F, 60-70 D, 70-80 C, 80-90 B, 90-100 A).
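The grade analogy can be written as a small Python function. The grade bands come from the text; how to treat a score that lands exactly on a boundary is my assumption. Since "60 and below" is an F, this sketch gives every boundary score the lower grade.

```python
def grade(score):
    """Map a 0-100 similarity score to a letter grade.

    Assumption: a score exactly on a boundary (60, 70, 80, 90) gets the
    lower grade, matching "60 and below F".
    """
    if score > 90:
        return "A"
    if score > 80:
        return "B"
    if score > 70:
        return "C"
    if score > 60:
        return "D"
    return "F"


print(grade(95), grade(72.5), grade(60))  # A C F
```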
Sometimes these similarity scores result in some revealing comparisons. For instance, if you look at the top five most similar pitchers to Roy Oswalt you come up with: Matt Cain, Manny Parra, Matt Garza, Phil Hughes and Scott Proctor. The top four pitchers are all young, rising stars. All throw hard and are four-pitch pitchers (fastball, change-up, curve and slider). The fifth is a journeyman reliever. This goes to show that throwing the same type of pitches isn't a recipe for results. Any one of the four youngsters on the list could harness their talent and have a career like Oswalt. Or they could slide back and become relatively anonymous.
I also want to point out that while the pitcher's arm angle isn't included in the similarity scores, the arm angle greatly affects the spin axis on the ball (and therefore the movement of the pitch), so the similarity score can pick up on it. For example, submariner Cla Meredith's top comps include fellow submariners Ehren Wassermann and Byung-Hyun Kim, who are first and third on his list. Who is second? Brandon Webb. Webb's sinker has so much sink to it that it is right in line with many of the side-armers. So while arm angle does play a role, pitchers like Webb can mess up some of the comparisons.
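Pulling a list of top comps like the ones for Oswalt and Meredith is just a sort over one pitcher's scores against everyone else. Here is a rough Python sketch; the `scores` layout (one entry per unordered pair) and the placeholder pitchers "A" through "D" with their numbers are made up for illustration and are not real results.

```python
def top_comps(pitcher, scores, n=5):
    """Return the n most similar pitchers to `pitcher`, best first.

    `scores` is a hypothetical dict keyed by frozenset({a, b}), so each
    unordered pair of pitchers is stored once.
    """
    comps = []
    for pair, score in scores.items():
        if pitcher in pair and len(pair) == 2:
            (other,) = pair - {pitcher}
            comps.append((other, score))
    # Highest score first; ties are broken alphabetically for stable output.
    comps.sort(key=lambda item: (-item[1], item[0]))
    return comps[:n]


# Made-up scores for four placeholder pitchers, for illustration only.
scores = {
    frozenset({"A", "B"}): 91.0,
    frozenset({"A", "C"}): 78.5,
    frozenset({"A", "D"}): 0.0,
    frozenset({"B", "C"}): 64.0,
}
print(top_comps("A", scores, n=2))  # [('B', 91.0), ('C', 78.5)]
```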
Okay, so now that we have a better definition for these similarity scores, what can we do with them? The easiest thing to do is to calculate a uniqueness rating for each pitcher. To do this I am going to use the top 20 most similar pitchers based on the similarity scores. Basically, I just add up the top 20 scores, divide by 20, subtract the result from 100 and then multiply by 3/2, so the scale goes roughly from zero (very common) to 100 (incredibly unique): a top-20 average of 100 gives zero, and an average near 33 gives 100. Unlike the similarity scores, though, most pitchers end up very low on the uniqueness scale.
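The uniqueness recipe is short enough to sketch in Python. The function and parameter names are mine. The baseline is a parameter that defaults to 100, which is an assumption: because similarity scores top out at 100, that is the value that puts a pitcher with twenty perfect comps at zero and a top-20 average near 33 at about 100.

```python
def uniqueness(scores_to_others, top_n=20, baseline=100.0, scale=1.5):
    """Uniqueness rating from one pitcher's similarity scores.

    Averages the top_n highest scores, subtracts that average from
    `baseline`, and multiplies by `scale` (3/2 in the article).
    `baseline=100.0` is an assumption chosen so the scale starts at zero.
    """
    if len(scores_to_others) < top_n:
        raise ValueError("need at least top_n similarity scores")
    top = sorted(scores_to_others, reverse=True)[:top_n]
    mean_top = sum(top) / top_n
    return (baseline - mean_top) * scale


# Twenty comps that all score 90 give a low (very common) rating.
print(uniqueness([90.0] * 20))  # 15.0
```

Because the similarity scores are capped at 100 but not pinned to 33 at the low end, a pitcher whose best comps average below 33 can land slightly above 100 on this scale.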
Here are all of the 514 pitchers' uniqueness scores. As you can see, most pitchers are below 20 and anything above 30 is quite rare. Here are the top five most unique pitchers.
Pitcher              Uniqueness
Kevin Cameron        103
Mariano Rivera       100
Justin Duchscherer   80
Jose Valverde        67
Paul Shuey           56
The top three pitchers on the list all feature the same pitch, the cut fastball. Because few pitchers throw a cutter a lot, the pitchers who do end up scoring quite high on the uniqueness scale. In fact, Cameron and Rivera are each other's top comps. You probably didn't need this uniqueness scale to know that Rivera is a pretty rare bird, but the fact that he ends up second on this list is a good sign for this metric.
Valverde is on the list because of his splitter. Last year he limited himself to only throwing his fastball and his splitter and his tracked pitches bear that out. His splitter, though, is an incredibly unique pitch which comes from how he holds the ball. His grip is regular but instead of holding the ball at the seams he places one seam in between his fingers. I've never heard of any other pitcher doing this. If you have heard of another pitcher doing this please let me know in the comment area below.
The result of this unique grip is a lot of sink for a splitter and much less horizontal movement, to the point that the pitch looks very similar to his fastball. This can really confuse hitters, who see fastball until the bottom drops out.
Shuey was a pretty big surprise to me. I have seen him pitch many times and I didn't find him remarkably different than other pitchers. In fact, none of Shuey's three pitches are very unique by themselves. His fastball is almost identical to a league average fastball, his sinker has a bit more bite to it but nothing extraordinary, and his curve is pretty over the top, producing pretty close to a 12 to 6 curve.
What is unique about Shuey is how he puts these pitches together. First, he uses his fastball and his sinker very frequently but mixes them up quite a bit, throwing nearly as many sinkers as regular fastballs. This is very unusual for a pitcher as most favor either the fastball or the sinker and rarely throw the other. Second, most pitchers who throw a sinker also throw a slider (how many times have you heard a pitcher described as a sinker/slider guy?). Shuey doesn't throw a slider at all and it is very rare to find a pitcher with a solid sinker throwing a curve, much less a 12 to 6 curve like Shuey's.
The future of similarity scores
I hope that as more data roll in we can start putting these similarity scores to good use. For instance, how does a pitcher compare with himself as the years go by? If he loses a tick on his fastball or starts throwing a new pitch, the similarity score will notice. The same is true for pitchers coming back from injury. You can imagine looking at pitchers who have come back from Tommy John surgery and using these similarity scores to check how close they are to their old selves.
Also, is it true that their second year after their surgeries is much better than their first? What about the effects of a pitching coach on a pitcher? Might a pitcher who no longer works with a Zen master like Leo Mazzone have some mechanical breakdowns that lead to his pitches being altered? So I hope this is just the tip of the iceberg.
Lastly, I've added similarity scores and uniqueness to each pitcher on my player cards page. If you would like to check out your favorite pitcher and see how he compares, that is the place to do it.
References and Resources
Thanks to all who commented about the similarity scores after the last article. Without your help, I never would have gotten to this current setup. I'd especially like to thank reader Ike, who suggested taking the square root of the sum of the squares, which really is the key to this whole thing. I'd also like to thank Daron Sutton, who did a great job of explaining Valverde's grip on his splitter during one of his broadcasts.