Curveball command

One night in Brooklyn (Johnny Sain) threw 32 straight curveballs from 32 different directions: underhand, overhand, sidearm, three-quarters, behind his back almost. The whole world knew what was coming. Didn’t matter. We haven’t hit one of them yet. He could drop a curve in a coffee mug.
Rex Barney

Several PITCHf/x analysts have tackled the issue of measuring pitchers’ ability to locate the ball where they intend.
Mike Fast in his essay on Cliff Lee’s 2008 turnaround (see the 2009 Hardball Times Annual) wrote that “with a little extra work … we can come close, at least qualitatively if not quantitatively, to assessing a pitcher’s command of his pitches.”
Dave Allen explored the matter by looking at charts of Mariano Rivera’s cutter. Jeremy Greenhouse first approached the topic last June and published a list of the best at locating pitches in the 2011 Hardball Times Annual.

One issue making the road toward the perfect measure of command difficult is that we can’t know the intended target of the pitch, and I believe we will never be able to do that. We’ll get close when (if?) we have data on catchers’ glove positioning (see Nick Steiner’s follow-up on Jeremy’s article in the Annual), but that won’t be the end-all-debates solution.
Many catchers nowadays move until the last moment to keep hitters from peeking at their position. And, especially on breaking pitches, the position of the glove might just be a reference point; i.e., the pitcher aims at that target in order to deliver the ball to a different spot (lower, for example, on a curve).

Thus, data on catchers’ glove positioning may be at best an acceptable proxy of intended location (as Mike Fast mentioned in the past, if tracking the backstoppers’ feet is an easier task, that would work the same, if not better).

However, all those difficulties do not entirely prevent us from measuring command. In the following lines I’ll try to add my contribution, which is heavily based on the excellent work of those who preceded me.

Let’s start with a chart, showing John Lackey’s curveball locations (data from 2009 and 2010).
image

As Harry Pavlidis first noted, the scatter plot of pitch locations of a given pitcher often reveals the arm angle of that pitcher.

Here’s another one, relative to Matt Cain’s curves.
image

One difference between Lackey’s and Cain’s charts is that the latter is—well, fatter. If we suppose Cain doesn’t throw his curves around on purpose, the fatter scatter plot is an indication of a lower ability to hit the spots.

Mike Fast, while working on the 2009 Annual article, noticed “(Cliff) Lee threw each pitch type to only one location, almost without fail.” Curveballs have not been chosen at random for this example. While many of the other pitch types are more effective when delivered on the black, what’s important for deuces is vertical location. You can see that in one of my first articles.
Thus the assumption that pitchers do not intentionally fatten their location scatter plot on curveballs can be accepted for the moment.

How can we measure pitchers’ ability to hit their spots? Ideally we need to rotate the scatter plot so that the dots are somewhat vertically aligned, instead of roughly aligned along the delivery angle. In this way the rotated reference system would shift from horizontal location vs. vertical location to location perpendicular to the delivery angle vs. location parallel to the delivery angle. A measure of the variation along the new horizontal axis (for example the standard deviation) would capture the lateral command.

Let’s go step by step.

1. Rotation

Principal Components Analysis (PCA) is an advanced statistical method. It consists of a linear transformation of the data that chooses a new coordinate system (a rotation of the original axes) for the data set such that the new axes coincide with directions of maximum variation of the original observations.

Okay, let’s get ungeeky. Given the above Lackey chart, PCA finds a new axis as shown below.

image

I hope the above image is worth more than the 50 preceding words.

If the new axis (the first principal component) is chosen as the ordinate, then we have the desired new reference system, where the abscissa is location perpendicular to the delivery angle while the y-axis is location parallel to the delivery angle.
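For readers who want to try the rotation themselves, here is a minimal sketch in plain Python. It assumes the pitch locations are available as two lists of horizontal and vertical coordinates; the function names are hypothetical and not taken from the article. In two dimensions the first principal component can be found in closed form from the covariance terms, so no statistics package is needed.

```python
import math

def principal_axis_angle(xs, ys):
    """Angle (radians, measured from the horizontal axis) of the first principal component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Direction of maximum variance for a 2x2 covariance matrix.
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def rotate_to_principal_axes(xs, ys):
    """Return (lateral, upright) coordinates, centered on the mean location.

    upright: position along the first principal component.
    lateral: position perpendicular to it (second principal component).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    theta = principal_axis_angle(xs, ys)
    c, s = math.cos(theta), math.sin(theta)
    upright = [(x - mx) * c + (y - my) * s for x, y in zip(xs, ys)]
    lateral = [-(x - mx) * s + (y - my) * c for x, y in zip(xs, ys)]
    return lateral, upright
```

Feeding in points that lie exactly on a straight line should give lateral coordinates of zero, which is a quick sanity check on the rotation.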


2. Variability

The following histograms (distribution along the second principal component) show the lateral command (from now on short for location perpendicular to the delivery angle) of Lackey’s and Cain’s curves (they are limited to 2010 data against right-handed batters).

image
image

The good news is that both are symmetrical with a single central peak; i.e., roughly bell-shaped. That’s another point in favor of the pitchers-aim-at-the-same-horizontal-spot-on-curveballs hypothesis.

Cain’s histogram is wider, indicating higher dispersion around the mean (that is, the preferred target). The standard deviation for his lateral location is .627, while it’s .517 for Lackey’s.
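The spread along each principal component can also be computed without rotating any points, because it equals the square root of the matching eigenvalue of the covariance matrix. The sketch below assumes population variances (dividing by n); the article does not say which convention was used, and the function name is hypothetical.

```python
import math

def command_spreads(xs, ys):
    """Return (lateral_sd, upright_sd) for one set of pitch locations.

    upright_sd: standard deviation along the first principal component.
    lateral_sd: standard deviation along the second principal component.
    Assumption: population variances (divide by n), not sample variances.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    mean_var = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    upright_sd = math.sqrt(mean_var + half_gap)
    lateral_sd = math.sqrt(max(0.0, mean_var - half_gap))  # guard against rounding below zero
    return lateral_sd, upright_sd
```

A lower `lateral_sd` corresponds to the narrower histogram.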

3. Results

Applying the algorithm to the regular curveballers (included in this analysis are the pitchers who have thrown at least 400 curveballs both in 2009 and 2010, according to MLBAM classifications), we get the following list for 2009 and 2010 combined. Lower values indicate better lateral command of the bender. Values have been separately calculated according to batter handedness, then recombined in a single value.

                   lateral
           player  command
     Roy Halladay  0.51
       Barry Zito  0.51
      John Lackey  0.51
        Dan Haren  0.51
      Paul Maholm  0.52
  Felix Hernandez  0.53
        Zach Duke  0.53
     Jason Hammel  0.55
    Sean Marshall  0.55
  Chris Carpenter  0.56
   Javier Vazquez  0.56
       Randy Wolf  0.56
   Bronson Arroyo  0.57
       Jon Lester  0.57
     Gio Gonzalez  0.58
  Adam Wainwright  0.59
      Gavin Floyd  0.59
 Justin Verlander  0.59
  Wandy Rodriguez  0.60
  Yovani Gallardo  0.60
     Joe Saunders  0.61
       Roy Oswalt  0.63
        Matt Cain  0.63
    James Shields  0.63
     Tim Lincecum  0.67
     Ricky Romero  0.67
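The values above were calculated separately by batter handedness and then recombined, but the recombination rule is not spelled out. One plausible choice, assumed here purely for illustration, is a pooled standard deviation weighted by the number of pitches in each split. The function name and the pitch counts in the usage line are hypothetical.

```python
import math

def pooled_sd(splits):
    """Combine per-split standard deviations into one value.

    splits: list of (standard_deviation, pitch_count) pairs, e.g. one pair
    for right-handed batters and one for left-handed batters.
    Assumption: weights are pitch counts; this is not confirmed by the article.
    """
    total = sum(count for _, count in splits)
    weighted_variance = sum(sd ** 2 * count for sd, count in splits) / total
    return math.sqrt(weighted_variance)

# Hypothetical splits: 0.50 over 300 curves vs. righties, 0.54 over 200 vs. lefties.
combined = pooled_sd([(0.50, 300), (0.54, 200)])
```

Pooling the variances within each split, rather than all pitches at once, keeps a difference in the preferred target against righties and lefties from inflating the combined value.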

Other than the sexy list, the analysis spat out bell-shaped histograms for nearly all considered pitchers. Another encouraging result is that pitchers show very similar lateral command values against right-handed and left-handed batters; the values are also stable in consecutive seasons, although the year-to-year correlation seems (remember, the data set has just 26 individuals) a bit lower than the versus-righties/versus-lefties one (I feel it should be that way).
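The stability checks mentioned here boil down to a correlation between two lists of per-pitcher values. A minimal sketch of the Pearson correlation follows; the per-season numbers in the usage lines are invented for illustration and are not the article's data.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical lateral command values for the same four pitchers in two seasons.
season_one = [0.52, 0.58, 0.61, 0.66]
season_two = [0.50, 0.60, 0.59, 0.68]
year_to_year = pearson(season_one, season_two)
```

With only 26 pitchers, a coefficient computed this way carries a wide margin of uncertainty, which is why the comparison in the text is hedged.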

4. Repeat the process for upright command

I have tackled lateral command first, because it was easier to accept the pitchers-aim-at-the-same-spot hypothesis for it.
However, since we have seen that it’s vertical location that is important for Uncle Charlies, let’s try to see if the whole process holds for upright command (from now on short for location along the delivery angle) as well.

Here’s another scatter plot comparison, featuring curveballs by Roy Halladay and Felix Hernandez (2009 and 2010 data).

image
image

This time the charts are more or less equally “fat,” but the one picturing the King’s curves is way “longer” than the one of Doc’s.
The standard deviation on the first principal component (the new axis along the delivery angle) is 1.19 for Hernandez and 0.99 for Halladay. Here are the corresponding histograms (again limited to 2010 data against righties).

image
image

Again, they surely have a single peak and are fairly symmetrical. And the histograms of all pitchers listed before share more or less the same traits. Again, the hypothesis that pitchers aim roughly to a single spot when delivering curves is acceptable.
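Judging symmetry by eye can be backed up with a number. One common choice, used here as an illustration and not as the article's method, is the moment-based skewness of the coordinates along a principal component: values near zero suggest misses are spread evenly on both sides of the center.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5.

    Roughly 0 for a symmetric distribution, positive when the long tail is on
    the high side, negative when it is on the low side.
    Raises ZeroDivisionError if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5
```

If high misses were systematically more common than low ones, the skewness along the first principal component would move clearly away from zero.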

Now, some more ranking.

                   upright
           player  command
   Bronson Arroyo  0.92
        Zach Duke  0.94
        Dan Haren  0.95
     Roy Halladay  0.99
   Javier Vazquez  1.00
    Sean Marshall  1.01
  Adam Wainwright  1.04
  Wandy Rodriguez  1.04
      John Lackey  1.05
 Justin Verlander  1.07
    James Shields  1.07
        Matt Cain  1.08
       Jon Lester  1.12
       Barry Zito  1.13
     Tim Lincecum  1.14
     Jason Hammel  1.15
      Paul Maholm  1.16
     Joe Saunders  1.17
     Gio Gonzalez  1.17
       Roy Oswalt  1.19
  Felix Hernandez  1.19
       Randy Wolf  1.21
      Gavin Floyd  1.24
  Yovani Gallardo  1.26
  Chris Carpenter  1.29
     Ricky Romero  1.47

Upright command showed a higher year-to-year correlation than lateral command, and there’s also some relation between the two.

Comments

How can all this stuff really be useful other than for creating command rankings?
It could be employed to monitor command improvements. Adam Wainwright, for example, improved both his lateral (from 0.62 to 0.56) and his upright command (from 1.06 to 1.02) from 2009 to 2010.
A check on whether better command of a pitch results in higher efficacy for that pitch (as measured by pitch run value) is surely required. Separate analyses of lateral and upright command would be advisable.
Another field that could be explored is whether delivery angle plays a role in a pitch’s effectiveness: Are pitchers with a lower angle allowed poorer command because their mistakes do not end up in the most dangerous locations?
Finally, at a micro level, it might be employed to suggest where a pitcher should aim his throws, based on the hitter’s weaknesses (everybody is heatmap crazy lately!) and the pitcher’s command (both lateral and upright) and delivery angle.

Issues

One big issue with this analysis is that the PITCHf/x system doesn’t have a perfect and consistent calibration, as other analysts have shown (see Mike Fast during last year’s LCS to get a feel for how much the system could be offset on a given night). Thus if a pitcher played in games with extreme miscalibrations, he would look wilder than he is. This issue spurred me to build an algorithm for calculating the correction factors for every game, so the locations used for this article have been corrected (for an explanation of the correction algorithm, just stay tuned).

Another problem is that on some pitch types you are more likely to find scatter plots like this one, featuring two distinct targets, for which a more elaborate algorithm is needed.

image

Well, not exactly like this one, because you’ll hardly find someone hitting his spots like Mariano Rivera with his bread-and-butter cutter. We’ll tackle this issue before long as well.
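The more elaborate algorithm is left for a future article, so what follows is only one possible first step and not the author's method: split the locations into two groups with a plain k-means (k = 2), then measure command within each group. The function name and the starting rule for the two centers are assumptions made for this sketch.

```python
def two_means(points, iterations=50):
    """Split 2D points (a list of (x, y) tuples) into two clusters with k-means.

    Assumption: the two targets differ mainly in horizontal location, so the
    centers start at the leftmost and rightmost points.
    Returns (labels, centers), with labels 0 or 1 for each point.
    """
    centers = [min(points), max(points)]  # tuples compare by x first, then y
    labels = []
    for _ in range(iterations):
        new_labels = []
        for x, y in points:
            d0 = (x - centers[0][0]) ** 2 + (y - centers[0][1]) ** 2
            d1 = (x - centers[1][0]) ** 2 + (y - centers[1][1]) ** 2
            new_labels.append(0 if d0 <= d1 else 1)
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return labels, centers
```

Once the pitches are labeled, the lateral and upright spreads can be computed separately for each target, so that a pitcher who deliberately works both sides of the plate is not mistaken for a wild one.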

References & Resources
Mike Fast: The Cliff Lee Turnaround – The Hardball Times Baseball Annual 2009
Dave Allen: Measuring a Pitcher’s Ability to Locate a Pitch – The Baseball Analysts
Jeremy Greenhouse: Spitballing on Command – The Baseball Analysts
Jeremy Greenhouse and Nick Steiner: Scouting by Numbers – The Hardball Times Baseball Annual 2011


10 Comments
Peter Jensen
13 years ago

Another terrific article Max.

I wonder how much a pitcher’s use of deception plays a part on the choice of locations.  A pitcher may be choosing a pitch type and pitch location to have that pitch mimic another pitch without much concern of where the pitch actually ends up since the success of the pitch will depend on how well it deceives the batter.  That might defeat your analysis somewhat. 

This is one area where teams have a clear advantage over independent analysts.  While we can only infer a pitcher’s command for in game performance, they can ask a pitcher to pitch to a certain spot and measure how well he does. 

Look forward to your future article on the calibration algorithm.

Dave Studeman
13 years ago

Fantastic, Max.

Jeremy Greenhouse
13 years ago

Very cool, Max. Wish I knew more about principal component analysis or clustering.

Derek
13 years ago

The underlying premise here is that vertical movement is all that matters.  (e.g. with good vertical movement, the hitters cannot hit it if they know it is coming)

Then, why not simply calculate

(strikes + balls located low and away)  / total curves attempted

I know that it would not seem nearly as “advanced” as throwing out the term principal components analysis, but anyone who understands PCA, knows that this simple fraction may make more sense as a command metric.

The problem with this method is that pitches that are high and heading toward the batter’s ear (clear mistakes that never result in a strike) count the same as well placed wipe out pitches low and away that often get strike three swinging or set the batter up for the next pitch.

Finally, one could imagine that having a score of zero on this metric (first component accounts for all of the variance) makes a pitcher VERY hit-able unless he has incredibly sharp breaks and tight spin on every pitch. 

Who gave up more home runs on their curveball:  Cain or Lackey?

Max Marchi
13 years ago

Peter,
Probably Scientific Baseball LLC has the best setting to put this kind of analysis to test.

Max Marchi
13 years ago

Derek,
what you propose is to see whether a pitcher throws the ball where it’s least dangerous, no matter if he intended to hit that spot. Many other analysts have already done that.

What I’m trying to see is whether a pitcher can locate the ball where he wants, no matter if it’s a good place.
If a pitcher wants to throw his curves in the fat part of the plate and letters high, and hit that spot every time, he has great command of his curve—too bad he aims it at bad spots. This particular pitcher doesn’t need to improve his command, he needs to select different locations.

And yes, for this kind of analysis I WANT a mistake at the batter’s ear to be the same as a mistake low and away. I know the outcome is way different, but the roughly bell-shaped histograms make me believe that the probability that a pitcher misses on one side or the other is the same.

Obviously, if we don’t know the intended location, we can’t say if the low-and-away pitch is a mistake (a good mistake) or if it was thrown there on purpose. The histograms and Mike Fast’s and Nick Steiner’s observations on catchers’ mitt positioning suggest to me it’s more often the former.

Also, I’m not sure where you get the underlying premise regarding vertical movement. I never mention movement in the article.

Derek Neal
13 years ago

Max:

The premise is that the best pitchers in the world have catchers who are not stupid.  Thus, unless one assumes that the vertical movement is the dominant (or even only) factor determining the batter’s success at hitting the pitch hard, one would tend to assume that balls left in the middle of the plate were more likely to be deviations from their intended target than balls placed low and right on the corner.

I am also willing to assume that curve balls at the ear are less likely to be hitting an intended spot (since batters never swing and they are never called for strikes) than curve balls low and away.

The concept of dominated strategies in game theory combined with the assertion that you do not get to call a MLB game if you are stupid imply that your parallel treatment of curves that are (i) high and in, (ii) right down the middle, (iii) right on the outside corner, and (iv) low and away in the dirt—as long as they all share the same first principal component—makes no sense. 

Better pitches—in terms of expected outcomes—should be more likely to be intended pitches or someone else should be calling the pitches.

Max Marchi
13 years ago

Derek:

A clarification first.
The centre of the histograms, i.e. the higher bar, does not represent the middle of the strike zone, rather the place where most of the pitches are located for one particular pitcher. That is, it might be down-the-middle if a pitcher constantly delivers down-the-middle, or low and away if the pitcher nibbles that corner on most of his offerings.

What you are saying makes perfect sense and doesn’t necessarily conflict with the findings in the article.

Let me try to rephrase my previous reply.
Say you aim 100 darts at the same target (center of the dartboard). I would expect the darts to scatter randomly around the desired target—close together if you have good command, far apart if you have poor command. In this case we know precisely where you are aiming, thus things are quite simple.

In the MLB pitchers case, we don’t know the desired location, so we have to try to infer it.
My hypothesis is the desired location is the center of the scatter plot.

If high pitches are more likely to be mistakes than low pitches (where high is “higher than the usual location”, not “high at the batter’s ears”), we should get skewed histograms. The fact that they are symmetrical hints (at least this is my interpretation) that pitchers miss high and low equally often. Alternatively, it could be that they move their intended location high or low equally often.

Summing up:
– When I talk about mistakes, I’m talking about deviations from the centre of the scatter plot. I’m sorry if I introduced some confusion with my previous reply.
– If you aim at a target, you are going to equally miss on either side (actually your error along your delivery angle is different in magnitude from your error perpendicular to your delivery angle). You can aim at different spots but you can’t decide which way you are going to err.
– The hypothesis underlying my reasoning is that pitchers aim at a single spot on curveballs. While I know this isn’t exactly true, observations on catchers’ mitts by Mike Fast and Nick Steiner and some of the results shown here hint that the assumption is not out of the ballpark.

Now, I would really love to test this in an experimental setting like this one: http://www.scientificbaseball.com/

Derek
13 years ago

I assumed that 0 on the horizontal axis was for a second principal component of zero (i.e. I assumed that the verbal description above the chart was correct and the label was wrong).  Thus, the histogram is providing the density function for the perpendicular distance (second principal component) from the bold line (what you call delivery angle).

My problem is that the second principal component is zero anywhere on that line because along that line the first component is giving a perfect fit, and ON THAT SAME LINE, many of the points are good pitches and many are bad pitches.

I do not care if it is right on the delivery angle if it is never going to be called a strike and a batter is never going to swing at it.

All of our other disagreements center around this assumption:

“The hypothesis underlying my reasoning is that pitchers aim at a single spot on curveballs.”

I find this hard to believe.

Max Marchi
13 years ago

Derek,
I presented two measures, one on the first and one on the second principal component.
Thus, if the pitch is along the delivery angle, it is considered to be well located “laterally”, but it might be badly located “upright”. The combination of the two should give a better idea of the pitcher’s command.

Thank you for all your thoughtful comments. I’m working on some ideas I got from reading them, including something on the single spot issue (common sense says you’re right, so the burden of proof is on my side).

Surely, as Peter suggested in the first comment, teams can push this kind of analysis further since they can create some kind of experimental setting (not during games, obviously) and force the constraint to be true.