Curveball command
by Max Marchi, February 11, 2011
One night in Brooklyn (Johnny Sain) threw 32 straight curveballs from 32 different directions: underhand, overhand, sidearm, three-quarters, behind his back almost. The whole world knew what was coming. Didn't matter. We haven't hit one of them yet. He could drop a curve in a coffee mug.
- Rex Barney
Several PITCHf/x analysts have tackled the issue of measuring pitchers' ability to locate the ball where they intend.
Mike Fast in his essay on Cliff Lee's 2008 turnaround (see the 2009 Hardball Times Annual) wrote that "with a little extra work ... we can come close, at least qualitatively if not quantitatively, to assessing a pitcher's command of his pitches."
Dave Allen explored the matter by looking at charts of Mariano Rivera's cutter. Jeremy Greenhouse first approached the topic last June and published a list of the best at locating pitches in the 2011 Hardball Times Annual.
One issue making the road toward the perfect measure of command difficult is that we can't know the intended target of the pitch, and I believe we will never be able to do that. We'll get close when (if?) we have data on catchers' glove positioning (see Nick Steiner's follow-up on Jeremy's article in the Annual), but that won't be the end-all-debates solution.
Many catchers nowadays move until the last moment to keep hitters from peeking at their position. And, especially on breaking pitches, the position of the glove might be just a reference point: the pitcher aims at that target in order to deliver the ball to a different spot (lower, for example, on a curve).
Thus, data on catchers' glove positioning may be at best an acceptable proxy for intended location (as Mike Fast mentioned in the past, if tracking the backstops' feet is an easier task, that would work the same, if not better).
However, all those difficulties do not entirely prevent us from measuring command. In the following lines I'll try to add my contribution, which is heavily based on the excellent work of those who preceded me.
Let's start with a chart, showing John Lackey's curveball locations (data from 2009 and 2010).

As Harry Pavlidis first noted, the scatter plot of pitch locations of a given pitcher often reveals the arm angle of that pitcher.
Here's another one, relative to Matt Cain's curves.

One difference between Lackey's and Cain's charts is that the latter is, well, fatter. If we suppose Cain doesn't spray his curves around on purpose, the fatter scatter plot indicates a lesser ability to hit his spots.
Mike Fast, while working on the 2009 Annual article, noticed "(Cliff) Lee threw each pitch type to only one location, almost without fail." Curveballs have not been chosen at random for this example. While many of the other pitch types are more effective when delivered on the black, what's important for deuces is vertical location. You can see that in one of my first articles.
Thus the assumption that pitchers do not intentionally fatten their location scatter plot on curveballs can be accepted for the moment.
How can we measure pitchers' ability to hit their spots? Ideally we need to rotate the scatter plot so that the dots are somewhat vertically aligned, instead of roughly aligned along the delivery angle. In this way the rotated reference system would shift from horizontal location vs. vertical location to location perpendicular to the delivery angle vs. location parallel to the delivery angle. A measure of the variation along the new horizontal axis (for example the standard deviation) would capture the lateral command.
Let's go step by step.
1. Rotation
Principal Components Analysis (PCA) is an advanced statistical method. It consists of a linear transformation of the data that chooses a new coordinate system (a rotation of the original axes) for the data set such that the new axes coincide with directions of maximum variation of the original observations.
Okay, let's get ungeeky. Given the above Lackey chart, PCA finds a new axis as shown below.

I hope the above image is worth more than the 50 preceding words.
If the new axis (the first principal component) is chosen as the ordinate, then we have the desired new reference system, where the abscissa is location perpendicular to the delivery angle while the y-axis is location parallel to the delivery angle.
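For readers who want to try the rotation themselves, here is a minimal sketch in Python with NumPy. It is not the code used for this article: the function name, the `px`/`pz` variable names, and the synthetic pitch locations are all assumptions made for illustration. It finds the first principal component from the covariance matrix of the locations and projects every pitch onto the new axes.

```python
import numpy as np

def rotate_to_principal_axes(px, pz):
    """Rotate pitch locations onto their principal axes.

    px, pz: horizontal and vertical plate locations (hypothetical names).
    Returns (lateral, upright, angle_deg): coordinates along the second and
    first principal components, and the angle of the first component
    measured from the horizontal axis.
    """
    xy = np.column_stack([px, pz]).astype(float)
    centered = xy - xy.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order, so the last
    # eigenvector is the direction of maximum variation.
    eigvals, eigvecs = np.linalg.eigh(cov)
    pc1 = eigvecs[:, 1]
    # The sign of an eigenvector is arbitrary; make pc1 point upward.
    if pc1[1] < 0:
        pc1 = -pc1
    # Second axis: perpendicular to pc1.
    pc2 = np.array([pc1[1], -pc1[0]])
    upright = centered @ pc1
    lateral = centered @ pc2
    angle_deg = np.degrees(np.arctan2(pc1[1], pc1[0]))
    return lateral, upright, angle_deg

# Synthetic pitches (not real PITCHf/x data): spread 1.0 along a
# 60-degree delivery angle and 0.5 across it.
rng = np.random.default_rng(0)
n = 2000
along = rng.normal(0.0, 1.0, n)
across = rng.normal(0.0, 0.5, n)
theta = np.radians(60.0)
px = along * np.cos(theta) - across * np.sin(theta)
pz = along * np.sin(theta) + across * np.cos(theta) + 2.0

lateral, upright, angle_deg = rotate_to_principal_axes(px, pz)
print(round(angle_deg, 1), round(lateral.std(ddof=1), 2), round(upright.std(ddof=1), 2))
```

On the synthetic data, the recovered angle and the two spreads should land close to the values used to generate the pitches.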
2. Variability
The following histograms (distribution along the second principal component) show the lateral command (from now on short for location perpendicular to the delivery angle) of Lackey's and Cain's curves (they are limited to 2010 data against right-handed batters).


The good news is that both are symmetrical with a single central peak; i.e., roughly bell-shaped. That's another point in favor of the pitchers-aim-at-the-same-horizontal-spot-on-curveballs hypothesis.
Cain's histogram is wider, indicating higher dispersion around the mean (that is, the preferred target). The standard deviation for his lateral location is .627, while it's .517 for Lackey's.
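The lateral command number is simply the standard deviation along the second principal component. A rough sketch follows, again with hypothetical names and simulated pitches rather than the real data: the two spreads (.517 and .627) are borrowed from the text only to generate a "tight" and a "fat" cloud, and the 70-degree angle is an arbitrary assumption.

```python
import numpy as np

def lateral_command(px, pz):
    """Standard deviation of the locations along the second principal component."""
    xy = np.column_stack([px, pz]).astype(float)
    centered = xy - xy.mean(axis=0)
    # Singular values of the centered data, sorted in descending order,
    # give the spread along each principal axis.
    s = np.linalg.svd(centered, compute_uv=False)
    return s[1] / np.sqrt(len(centered) - 1)

def simulate_curves(rng, n, spread_across, spread_along=1.0, angle_deg=70.0):
    """Simulated pitch cloud (illustration only, not PITCHf/x data)."""
    along = rng.normal(0.0, spread_along, n)
    across = rng.normal(0.0, spread_across, n)
    t = np.radians(angle_deg)
    px = along * np.cos(t) - across * np.sin(t)
    pz = along * np.sin(t) + across * np.cos(t)
    return px, pz

rng = np.random.default_rng(1)
tight = lateral_command(*simulate_curves(rng, 1500, 0.517))
fat = lateral_command(*simulate_curves(rng, 1500, 0.627))
print(round(tight, 3), round(fat, 3))
```

The fatter cloud should return the larger value, which is all the measure is meant to capture.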
3. Results
Applying the algorithm to the regular curveballers (included in this analysis are the pitchers who have thrown at least 400 curveballs both in 2009 and 2010, according to MLBAM classifications), we get the following list for 2009 and 2010 combined. Lower values indicate better lateral command of the bender. Values have been separately calculated according to batter handedness, then recombined in a single value.
player lateral command
Roy Halladay 0.51
Barry Zito 0.51
John Lackey 0.51
Dan Haren 0.51
Paul Maholm 0.52
Felix Hernandez 0.53
Zach Duke 0.53
Jason Hammel 0.55
Sean Marshall 0.55
Chris Carpenter 0.56
Javier Vazquez 0.56
Randy Wolf 0.56
Bronson Arroyo 0.57
Jon Lester 0.57
Gio Gonzalez 0.58
Adam Wainwright 0.59
Gavin Floyd 0.59
Justin Verlander 0.59
Wandy Rodriguez 0.60
Yovani Gallardo 0.60
Joe Saunders 0.61
Roy Oswalt 0.63
Matt Cain 0.63
James Shields 0.63
Tim Lincecum 0.67
Ricky Romero 0.67
Other than the sexy list, the analysis spat out bell-shaped histograms for nearly all the pitchers considered. Another encouraging result is that pitchers show very similar lateral command values against right-handed and left-handed batters. The values are also stable across consecutive seasons, although the year-to-year correlation seems (remember, it's a data set of just 26 pitchers) a bit lower than the righties-versus-lefties one (I feel it should be that way).
4. Repeat the process for upright command
I have tackled lateral command first, because it was easier to accept the pitchers-aim-at-the-same-spot hypothesis for it.
However, since we have seen that it's vertical location that is important for Uncle Charlies, let's try to see if the whole process holds for upright command (from now on short for location along the delivery angle) as well.
Here's another scatter plot comparison, featuring curveballs by Roy Halladay and Felix Hernandez (2009 and 2010 data).


This time the charts are more or less equally "fat," but the one picturing the King's curves is way "longer" than the one of Doc's.
The standard deviation on the first principal component (the new axis along the delivery angle) is 1.19 for Hernandez and 0.99 for Halladay. Here are the corresponding histograms (again limited to 2010 data against righties).


Again, they clearly have a single peak and are fairly symmetrical. And the histograms of all pitchers listed before share more or less the same traits. Again, the hypothesis that pitchers aim roughly at a single spot when delivering curves is acceptable.
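The symmetry claim was judged by eye from the histograms, but a quick numerical check is possible. The sketch below is an assumption of mine, not part of the original analysis: it computes the sample skewness of a set of rotated locations, which should sit near zero for a symmetrical distribution. It says nothing about the number of peaks, and the data are simulated.

```python
import numpy as np

def sample_skewness(values):
    """Third standardized moment; roughly 0 for a symmetrical distribution."""
    v = np.asarray(values, dtype=float)
    d = v - v.mean()
    m2 = np.mean(d ** 2)
    m3 = np.mean(d ** 3)
    return m3 / m2 ** 1.5

rng = np.random.default_rng(2)
# Simulated "upright" locations with a bell shape (spread 1.19, as in the text).
bell_shaped = rng.normal(0.0, 1.19, 2000)
# A deliberately lopsided sample, for contrast.
lopsided = rng.exponential(1.0, 2000)

skew_bell = sample_skewness(bell_shaped)
skew_lopsided = sample_skewness(lopsided)
print(round(skew_bell, 2), round(skew_lopsided, 2))
```

A value far from zero for a real pitcher would be a hint that the single-target hypothesis deserves a second look for him.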
Now, some more ranking.
player upright command
Bronson Arroyo 0.92
Zach Duke 0.94
Dan Haren 0.95
Roy Halladay 0.99
Javier Vazquez 1.00
Sean Marshall 1.01
Adam Wainwright 1.04
Wandy Rodriguez 1.04
John Lackey 1.05
Justin Verlander 1.07
James Shields 1.07
Matt Cain 1.08
Jon Lester 1.12
Barry Zito 1.13
Tim Lincecum 1.14
Jason Hammel 1.15
Paul Maholm 1.16
Joe Saunders 1.17
Gio Gonzalez 1.17
Roy Oswalt 1.19
Felix Hernandez 1.19
Randy Wolf 1.21
Gavin Floyd 1.24
Yovani Gallardo 1.26
Chris Carpenter 1.29
Ricky Romero 1.47
Upright command showed a higher year-to-year correlation than lateral command, and there's also some relation between the two.
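The relation between the two measures can be checked from the two lists above. Here is a small sketch that pairs each pitcher's lateral and upright values (copied from the tables, so already rounded to two decimals) and computes the Pearson correlation with NumPy. The variable names are mine, and with only 26 pitchers the result should be read as a rough indication.

```python
import numpy as np

# (lateral, upright) command values for 2009 and 2010 combined,
# taken from the two tables in this article.
command = {
    "Roy Halladay": (0.51, 0.99),
    "Barry Zito": (0.51, 1.13),
    "John Lackey": (0.51, 1.05),
    "Dan Haren": (0.51, 0.95),
    "Paul Maholm": (0.52, 1.16),
    "Felix Hernandez": (0.53, 1.19),
    "Zach Duke": (0.53, 0.94),
    "Jason Hammel": (0.55, 1.15),
    "Sean Marshall": (0.55, 1.01),
    "Chris Carpenter": (0.56, 1.29),
    "Javier Vazquez": (0.56, 1.00),
    "Randy Wolf": (0.56, 1.21),
    "Bronson Arroyo": (0.57, 0.92),
    "Jon Lester": (0.57, 1.12),
    "Gio Gonzalez": (0.58, 1.17),
    "Adam Wainwright": (0.59, 1.04),
    "Gavin Floyd": (0.59, 1.24),
    "Justin Verlander": (0.59, 1.07),
    "Wandy Rodriguez": (0.60, 1.04),
    "Yovani Gallardo": (0.60, 1.26),
    "Joe Saunders": (0.61, 1.17),
    "Roy Oswalt": (0.63, 1.19),
    "Matt Cain": (0.63, 1.08),
    "James Shields": (0.63, 1.07),
    "Tim Lincecum": (0.67, 1.14),
    "Ricky Romero": (0.67, 1.47),
}

values = np.array(list(command.values()))
lateral, upright = values[:, 0], values[:, 1]

# Pearson correlation between lateral and upright command.
r = np.corrcoef(lateral, upright)[0, 1]
print(f"pitchers: {len(command)}, correlation: {r:.2f}")
```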
Comments
How can all this stuff really be useful other than for creating command rankings?
- It could be employed to monitor command improvements. Adam Wainwright, for example, improved both his lateral command (from 0.62 to 0.56) and his upright command (from 1.06 to 1.02) from 2009 to 2010.
- A check on whether better command of a pitch results in higher efficacy for that pitch (as measured by pitch run value) is surely required. Separate analyses of lateral and upright command would be advisable.
- Another field that could be explored is whether delivery angle plays a role in a pitch's effectiveness: Can pitchers with a lower angle get away with poorer command because their mistakes do not end up in the most dangerous locations?
- Finally, at a micro level, it might be employed to suggest where a pitcher should aim his throws, based on the hitter's weaknesses (everybody is heatmap crazy lately!) and the pitcher's command (both lateral and upright) and delivery angle.
Issues
One big issue with this analysis is that the PITCHf/x system doesn't have a perfect and consistent calibration, as other analysts have shown (see Mike Fast's work during last year's LCS to get a feel for how far off the system can be on a given night). Thus, if a pitcher played in games with extreme miscalibrations, he would look wilder than he is. This issue spurred me to build an algorithm for calculating correction factors for every game, so the locations used for this article have been corrected (for an explanation of the correction algorithm, just stay tuned).
Another problem is that on some pitch types you are more likely to find scatter plots like this one, featuring two distinct targets, for which a more elaborate algorithm is needed.

Well, not exactly like this one, because you'll hardly find someone hitting his spots like Mariano Rivera with his bread-and-butter cutter. We'll tackle this issue before long as well.
References and Resources
Mike Fast: The Cliff Lee Turnaround - The Hardball Times Baseball Annual 2009
Dave Allen: Measuring a Pitcher's Ability to Locate a Pitch - The Baseball Analysts
Jeremy Greenhouse: Spitballing on Command - The Baseball Analysts
Jeremy Greenhouse and Nick Steiner: Scouting by Numbers - The Hardball Times Baseball Annual 2011
After creating a baseball rendition of The Beatles' Sgt. Pepper cover, Max began his baseball writing because he needed an excuse to show the picture. He wrote for an Italian audience for six years before making the jump to The Hardball Times. You can contact him by e-mail.







 
Another terrific article, Max.
I wonder how much a pitcher's use of deception plays a part in the choice of locations. A pitcher may be choosing a pitch type and pitch location to have that pitch mimic another pitch without much concern for where the pitch actually ends up, since the success of the pitch will depend on how well it deceives the batter. That might defeat your analysis somewhat.
This is one area where teams have a clear advantage over independent analysts. While we can only infer a pitcher's command from in-game performance, they can ask a pitcher to pitch to a certain spot and measure how well he does.
Look forward to your future article on the calibration algorithm.