The Internet cried a little when you wrote that on it
by Mike Fast
June 17, 2010
"The woman Folly is loud; she is undisciplined and without knowledge." (Proverbs 9)
It's easy to look stupid doing baseball analysis. Anyone who hasn't isn't trying. But you shouldn't make it any easier than it has to be. Anything from the list below is almost guaranteed to get you into trouble unless you've really done your homework. Beware!
"Scientia potentia est." (Bacon)
The PITCHf/x data set is an incredible resource, rich and complex, revealing depths to the game of baseball heretofore unseen. It can be a difficult data set to use well, particularly for the novice analyst. However, it also rewards the diligent.
If you call out for insight and cry aloud for understanding,
and if you look for it as for silver and search for it as for hidden treasure,
Then you will understand what is right and just and fair—every good path.
For wisdom will enter your heart, and knowledge will be pleasant to your soul.
Discretion will protect you, and understanding will guard you. (Proverbs 2)
Let's cover the potential pitfalls in more detail.
1. [Pitcher] is using his [pitch type] a lot more this year, according to PITCHf/x, where [pitch type] = two-seamer, cutter, etc.
The pitch classifications in the PITCHf/x data, as shown on Brooks Baseball, Texas Leaguers and Fangraphs (not to be confused with the BIS pitch classifications also on Fangraphs), are done by an algorithm developed by Ross Paul at MLB Advanced Media. Ross has made significant updates to this algorithm every year, and one of the most noticeable impacts is that the percentage of pitches classified as two-seam fastballs has increased with each update. This does not mean that the pitchers themselves have changed anything about their pitch selection or the movement on their pitches.
If you want to know if a pitcher has really added a new pitch type, explore his data on a site like Texas Leaguers and see if he's added a completely new pitch cluster from the previous year. Don't pay attention to how the pitches are labeled; learn for yourself how the different pitch types behave. When you have mastered that, then you're ready to identify when a pitcher has added a new pitch.
For example, take a look at the following two graphs that can be found on Texas Leaguers for Ricky Romero. They show the pitch speed versus spin axis angle for 2009 and 2010. The MLBAM pitch classifications are indicated by the different symbols on each graph. Note how Romero's stuff has remained mostly the same from 2009 to 2010, but the classifications attached to the pitch clusters have changed dramatically.
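One crude way to look for a genuinely new pitch without trusting the labels is to bin pitches on speed and spin axis and ask which bins are populated this year but were empty last year. The sketch below does only that. The field names (`speed`, `spin_axis`, `label`), the bin widths, the 5 percent threshold and the sample pitches are all made up for illustration, and coarse binning is a stand-in for eyeballing the clusters, not the method any of these sites use.

```python
from collections import Counter

def occupied_bins(pitches, speed_bin=2.0, axis_bin=20.0):
    """Share of pitches falling in each (speed, spin-axis) grid cell.

    Labels are deliberately ignored; only the measured values are used.
    """
    cells = Counter(
        (int(p["speed"] // speed_bin), int(p["spin_axis"] // axis_bin))
        for p in pitches
    )
    total = sum(cells.values())
    return {cell: n / total for cell, n in cells.items()}

def new_regions(prev_year, this_year, min_share=0.05):
    """Cells holding at least min_share of this year's pitches but none last year."""
    before = occupied_bins(prev_year)
    now = occupied_bins(this_year)
    return sorted(cell for cell, share in now.items()
                  if share >= min_share and cell not in before)

# Invented data: the fastball label changes from FF to FT between years,
# but only the 88 mph pitch occupies a region that was empty before.
prev = ([{"speed": 92.0, "spin_axis": 200.0, "label": "FF"}] * 6
        + [{"speed": 78.0, "spin_axis": 40.0, "label": "CU"}] * 4)
curr = ([{"speed": 92.5, "spin_axis": 205.0, "label": "FT"}] * 5
        + [{"speed": 78.5, "spin_axis": 45.0, "label": "CU"}] * 3
        + [{"speed": 88.0, "spin_axis": 170.0, "label": "FC"}] * 2)

print(new_regions(prev, curr))
```

A cluster that straddles a bin edge, or a spin axis near the 0/360 wraparound, can fool this check, so treat any hit as a prompt to go look at the plot, not as an answer.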
2. [Pitcher] is tipping his curveball by releasing it higher than his other pitches.
The most important thing to know here is that almost all PITCHf/x data presented on the web, including that at the three websites listed in Item 1, lists "release" points measured at 50 feet from home plate. Most pitchers release the ball at about 54-55 feet from home plate, give or take a foot or so.
The next thing to think about is some simple physics. If you throw one type of pitch that drops less than a foot on its way to home plate (let's call it Fastball) and another type of pitch that drops three feet on its way to home plate (let's call it Curve), and you want both to end up in the strike zone, which one do you think you'll need to release on a trajectory that's initially going higher? Right. Curve. And where do you think you'll see the biggest separation between the trajectories of the two types of pitches? Yeah, 5-10 feet out from release would be a good place to start to look. It just so happens that's where all the PITCHf/x data you see is looking, too.
Of course, not every pitcher has a big-dropping, slow curveball such that he needs to release it pointed noticeably higher than his fastball, which is why you don't see a huge separation for every pitcher. That's not to say some pitchers don't tip their pitches, or even that batters don't pick up the difference between fastball and curveball that way even if it's intentional by the pitcher. But take a closer look before you assume there's something wrong with seeing the curveball higher than the fastball 50 feet out from home plate.
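The physics argument above can be put into a few lines of arithmetic. This is a minimal sketch, assuming constant forward speed and a constant downward acceleration, and reading "drops" as the distance the ball falls below its initial line of flight by the time it reaches the plate. The release distance (55 feet), release height (6 feet), height at the plate (2.5 feet) and the 0.9-foot fastball drop are illustrative assumptions, not measurements.

```python
def height_at(distance_from_plate, drop, release_distance=55.0,
              release_height=6.0, plate_height=2.5):
    """Height (ft) of a pitch when it is distance_from_plate feet from home.

    Assumes constant forward speed and constant downward acceleration, so the
    fall below the initial line of flight grows with the square of the
    fraction of the flight completed, reaching `drop` feet at the plate.
    """
    frac = (release_distance - distance_from_plate) / release_distance
    # The initial aim is chosen so the pitch crosses the plate at plate_height.
    rise_of_aim_line = plate_height - release_height + drop
    return release_height + rise_of_aim_line * frac - drop * frac ** 2

# Same release point, same location at the plate, different amounts of drop.
fastball_50 = height_at(50.0, drop=0.9)
curve_50 = height_at(50.0, drop=3.0)
gap_inches = (curve_50 - fastball_50) * 12

print(f"Fastball at 50 ft: {fastball_50:.2f} ft")
print(f"Curve at 50 ft:    {curve_50:.2f} ft")
print(f"Gap at 50 ft:      {gap_inches:.1f} in")
```

With these made-up inputs the curve shows up about two inches higher at 50 feet even though both pitches left the hand at exactly the same spot, which is the kind of gap that gets mistaken for a tipped pitch.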
3. [Pitcher] has lost velocity on his fastball, but he's exchanged speed for movement.
There are typically at least two claims in these sorts of articles. The first is that a pitcher has lost fastball speed. That does happen on occasion and sometimes authors are correct about that. The smaller the sample size, the more suspicious you ought to be. Temperature has an effect on fastball speed, and for data from cold games in April you might want to pay attention to that.
The source for velocity information can be a radar gun or PITCHf/x cameras, and both obviously have measurement errors associated with them. In the PITCHf/x case, the camera systems are usually pretty good, but they can have calibration errors that can cause mismeasurement of pitch speeds by up to a couple of mph. A good way to check for this is to compare road data to home data.
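The home/road check is simple to run once you have the pitches in hand. Here is a minimal sketch, assuming each pitch is a dictionary with a hypothetical `is_home` flag and a `start_speed` field; those names and the sample numbers are invented for illustration and are not real measurements.

```python
from collections import defaultdict
from statistics import mean

def home_road_split(pitches, field="start_speed"):
    """Average `field` separately for pitches thrown at home and on the road."""
    groups = defaultdict(list)
    for p in pitches:
        groups["home" if p["is_home"] else "road"].append(p[field])
    return {venue: mean(values) for venue, values in groups.items()}

# Invented fastball speeds, for illustration only.
sample = [
    {"is_home": True, "start_speed": 94.0},
    {"is_home": True, "start_speed": 95.0},
    {"is_home": False, "start_speed": 92.5},
    {"is_home": False, "start_speed": 93.5},
]

split = home_road_split(sample)
gap = split["home"] - split["road"]
print(f"home {split['home']:.1f} mph, road {split['road']:.1f} mph, gap {gap:+.1f} mph")
```

Restrict the input to one pitch type before comparing, and pass a different `field` to run the same check on a movement measurement. A persistent gap points toward the home park's calibration, but it does not prove it.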
For example, here you can see that Zack Greinke's pitch speeds have been 1-2 mph faster at home than on the road this year, even as his velocity has increased throughout the year:
Once you are satisfied that the fastball velocity loss is real and not just a measurement artifact, you can turn to the second claim that so often accompanies the first. At least it does if the pitcher is still being effective with his lower velocity. If he's not, you may find the opposite claim—that he has also lost movement—which we will cover in a moment.
Like changes in velocity, the changes in movement are subject to measurement and calibration errors in the PITCHf/x cameras. Shifts of 2 to 4 inches due to errors are fairly common. Again, comparing road data to home data is one way to check for these kinds of errors.
Here you can see that the spin deflections for Zack Greinke are shifted up and to the left by a couple of inches at home relative to the data from his road starts in 2010:
The other confusing factor here is the question of what "movement" really means and whether a change of a couple of inches is something that would even affect a batter. Or whether a shift in the observed direction is a good thing or a bad thing. If the pitcher has improved, it's a good thing. If the pitcher has performed poorly, it's a bad thing. QED. Right? If the author bothers to check any of these critically important facts, he or she is a better man or woman than 95 percent of the baseball writers out there.
4. The [expletive] umpire called a strike on a pitch to [left-handed batter] that was way outside!
Many analysts have shown that the average strike zone called by umpires extends a couple of inches outside the rulebook zone to right-handed hitters and several more inches beyond that to left-handed hitters. While an author who notes that the pitch was not inside the rulebook strike zone may be technically correct, the point that the umpire was being unfair to the team of interest may not be correct.
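To make the distinction concrete, here is a small sketch that sorts an outside pitch into one of three bins. Home plate is 17 inches wide; everything else is an assumption for illustration. In particular, the 2-inch and 5-inch margins are placeholders for "a couple of inches" and "several more inches beyond that", not measured values, and the sketch ignores the width of the ball and the top and bottom of the zone.

```python
PLATE_HALF_WIDTH_IN = 17.0 / 2  # home plate is 17 inches wide

# Placeholder margins (inches) by batter handedness; assumed, not measured.
ASSUMED_EXTRA_OUTSIDE_IN = {"R": 2.0, "L": 5.0}

def classify_outside_pitch(offset_in, batter_hand):
    """Classify a pitch by its distance from the center of the plate.

    offset_in is measured in inches toward the outside edge, away from
    the batter; batter_hand is "R" or "L".
    """
    if offset_in <= PLATE_HALF_WIDTH_IN:
        return "inside rulebook zone"
    if offset_in <= PLATE_HALF_WIDTH_IN + ASSUMED_EXTRA_OUTSIDE_IN[batter_hand]:
        return "outside rulebook zone, inside typical called zone"
    return "outside typical called zone"

# The same pitch, a foot off the center of the plate, to each kind of hitter.
print("vs LHB:", classify_outside_pitch(12.0, "L"))
print("vs RHB:", classify_outside_pitch(12.0, "R"))
```

Under these placeholder margins, a pitch a foot off the center of the plate is an ordinary called strike to a left-handed hitter and a real miss to a right-handed one, so the handedness of the batter belongs in any complaint about the call.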
Fans love to hate umpires, so I won't belabor this point.
5. [Pitcher] is struggling this year because he's lost movement on his pitches.
We touched on this earlier in the "exchanging speed for movement" section. The first thing to check is whether any change in movement can be attributed to either a change in classification of the pitch types or to PITCHf/x measurement errors. I could probably stop right there because (1) almost no analyst does that and (2) most pitchers don't really change the movement on their pitches that much, at least not that I've noticed. Josh Kalk did find some changes in movement within a given game for a pitcher who was tiring, but that's a different kettle of fish.
If someone demonstrates that pitchers actually do lose movement on pitches (and what does "lose movement" mean on a slider, which typically has little or no spin movement anyway?), the next thing to ask is whether the change in movement matters to the performance of the pitcher. Dave Allen, among others, has given us a good beginning on the research on this question, but we're far from having the final answers at this point. Unless the article addresses these topics, the talk about a pitcher's performance changing because he lost movement is worthless fluff.
6. [Pitcher] is doing better this year because he's increased the movement on his pitches.
This is clearly a corollary to the previous point, but flipped on its head. The same caveats apply. Pitchers typically add movement by trying brand-new pitch types. They don't all of a sudden start snapping their wrists harder or finding extra snap on their fingertips to make the same old pitch type move more. I'm not saying that's impossible, but if I had a nickel for every time someone claimed it happened and had to give back a dollar for every time it was true, I'd be well ahead in the game.
What should I do?
Instead of jumping immediately to speed, release point and spin movement for explanations simply because they are the numbers listed on Fangraphs, try looking at the pitcher's results—balls in play, swinging strikes, called strikes, fouls and called balls—and working backward from there. Is the pitcher performing better because of better results on balls in play? Why? Look at his ball-in-play charts and see what you can learn.
Is the pitcher performing better because he's getting more strikeouts? What pitch type is he getting those whiffs with? And to which handedness of batter? The same pitch type from the pitcher looks very different to batters of different handedness.
Is the pitcher missing the strike zone more often with a certain pitch type? If so, where is he missing? Remember to consider batter handedness. Is he throwing to the same location as always but now batters are laying off pitches below the zone that they used to swing at?
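Working backward from results mostly means counting outcomes by pitch type and batter handedness before reaching for speed or movement. A minimal sketch follows, assuming each pitch is a dictionary with hypothetical `pitch_type`, `batter_hand` and `outcome` fields; the field names, outcome labels and sample pitches are invented for illustration.

```python
from collections import Counter, defaultdict

def outcome_rates(pitches):
    """Return {(pitch_type, batter_hand): {outcome: share of pitches}}."""
    counts = defaultdict(Counter)
    for p in pitches:
        counts[(p["pitch_type"], p["batter_hand"])][p["outcome"]] += 1
    rates = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        rates[key] = {outcome: n / total for outcome, n in counter.items()}
    return rates

# Invented pitches: the same slider gets whiffs from righties, takes from lefties.
sample = [
    {"pitch_type": "SL", "batter_hand": "R", "outcome": "swinging_strike"},
    {"pitch_type": "SL", "batter_hand": "R", "outcome": "swinging_strike"},
    {"pitch_type": "SL", "batter_hand": "R", "outcome": "foul"},
    {"pitch_type": "SL", "batter_hand": "R", "outcome": "ball"},
    {"pitch_type": "SL", "batter_hand": "L", "outcome": "ball"},
    {"pitch_type": "SL", "batter_hand": "L", "outcome": "ball"},
    {"pitch_type": "SL", "batter_hand": "L", "outcome": "in_play"},
    {"pitch_type": "SL", "batter_hand": "L", "outcome": "called_strike"},
]

rates = outcome_rates(sample)
for (pitch_type, hand), shares in sorted(rates.items()):
    print(pitch_type, "vs", hand, {k: round(v, 2) for k, v in sorted(shares.items())})
```

Run it on two seasons separately and compare the tables. Where the rates differ is where to start asking why, and only then is it worth checking whether speed, location or movement changed for that pitch against that kind of batter.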
I don't intend this list to discourage newcomers from dipping their feet into the PITCHf/x water, playing around with the data, and even publishing unpolished results. Yes, I want to discourage sloppy analysis, but scaring people away is not the point. Educating people about common mistakes and encouraging good principles of analysis is what I'm after. I've also noticed a certain awe and air of invincible authority ascribed to PITCHf/x analysis around the web, whether it be shoddy or excellent. I hope to educate the reader of PITCHf/x analyses a little about how to tell the good wine from the cheap wine.
Mike Fast is a Royals fan who enjoys investigating baseball questions using data of many sorts. He is a member of Complete Game Consulting. He welcomes comments via e-mail.