Those of you who keep a keen eye on baseball analysis will have no doubt sussed the latest data doing the rounds this year is no longer play-by-play but pitch-by-pitch. Courtesy of those kind folks at MLB.com we now have access to the super-detailed Pitch f/x system in selected major league parks.
Pitch f/x tracks the flight of every pitch thrown in the bigs and logs a ton of useful parameters such as start and end velocity, location, release point and break.
This gives an analyst a wealth of new information with which to play. Leading protagonists in this field are THT’s John Walsh, Dan Fox of Baseball Prospectus and Joe P. Sheehan of Baseball Analysts. Even I have dabbled with this data in the past looking at inter-park consistency using Kevin Millwood as an example.
The most exciting application of these data, I think, is to augment scouting information. No longer do we have to rely on the baloney spewed from Joe Morgan’s mouth; we can crack open the data and look for ourselves. Does King Felix really throw eight pitches? What about Daisuke Matsuzaka’s mythical gyroball? Does Roger Clemens tire after five innings?
These and many other fine questions can be answered by delving into Pitch f/x. Today I want to look at Braves pitcher Tim Hudson and see what we can learn about him using the Pitch f/x data.
What does Tim Hudson throw?
Pitch classification is a topic that will no doubt garner many column inches over the coming months and years but John Walsh’s excellent synopsis is a great starting point in our quest.
We can classify by speed and movement and although different hurlers have different pitch characteristics we can make some generalizations. Here is a basic cheat sheet (assuming a right handed pitcher throwing to a right handed batter):
- Fastball (four-seam): 90+mph pitch thrown with backspin so “rises” five to 10 inches. We say “rises” but the measurement is relative to a pitch thrown with no spin, so relative to this it “rises”. Fastballs tend to have a touch of side spin, so move five to 10 inches “in” on a right handed batter
- Sinker (two-seam, possibly): 90+ mph pitch that “rises” five inches or so, marginally less than the fastball; it will also have lateral movement “in” on the batter
- Splitter: 85-92mph, moves up three to five inches, likely to have less or different sideways movement to a fastball because of the way it is thrown
- Curveball: 70-80 mph, moves down about 10 inches relative to a pitch with no spin and moves away from the batter zero to 10 inches (depending on whether it is a classic 12-6 curve or 11-5)
- Slider: 80-85mph, movement is between the curve and fastball
- Change-up: 80-85mph, similar movement to a fastball (up and in)
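For the programmatically inclined, the cheat sheet above can be turned into a crude rule-based classifier. The sketch below is an illustration only, not the method used in this article. The function name, the argument names and the tie-break rules are all assumptions, and the overlapping speed ranges in the list have been simplified into non-overlapping bands.

```python
def classify_pitch(speed_mph, rise_in, tail_in):
    """Return a rough pitch label for a right-hander facing a right-handed batter.

    speed_mph: pitch speed in mph.
    rise_in:   vertical movement in inches relative to a spinless pitch
               (positive means the pitch "rises").
    tail_in:   horizontal movement in inches (positive means "in" on the batter).

    The bands come from the cheat sheet; the tie-breaks are assumptions.
    """
    if speed_mph < 80:
        # Curveballs drop relative to a pitch thrown with no spin.
        return "curveball" if rise_in < 0 else "unknown"
    if speed_mph < 85:
        # Slider and change-up share this speed band, so split on movement:
        # the change-up moves like a fastball (up and in).
        return "change-up" if rise_in >= 5 and tail_in > 0 else "slider"
    if speed_mph < 90:
        # Simplification: everything in this band is called a splitter.
        return "splitter"
    # 90+ mph: the sinker "rises" about five inches, the four-seamer more.
    return "four-seam" if rise_in > 5 else "sinker"
```

Since different hurlers have different pitch characteristics, fixed thresholds like these will misfire on plenty of real pitchers. They are a starting point, not a substitute for looking at the clusters.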
Those in the know (scouts and some fans) differ on whether Hudson has a four- or five-pitch repertoire. Let’s take a look at the Pitch f/x data and see what we can detect.
Before we launch into the data let me briefly discuss methodology. The chart includes all pitches tracked by Pitch f/x up until and including Hudson’s June 4 start against the Fish. Since then MLB has been experimenting with changing the point at which the pitch is first measured.
In the past a pitch has been picked up at 55 feet, but on June 6 this was altered to between 40 and 55 feet. When the pitch is picked up later the system records a lower start speed and less break. These pitches could cloud our classification, so they were omitted. My understanding is that in the future MLB will reset the release point at 50 feet. Also I’ve only looked at starts at the Ted to strip out any inter-park variation.
Anyway, let’s get back to the data plotted above. We have vertical movement on the y-axis and horizontal movement on the x-axis, with the different colors representing different pitch velocities. It’s a bit of a mess, for sure. If you look closely you can identify three main clusters and a potential fourth. (It’s worth noting that Hudson’s pitch types are particularly hard to classify relative to some other hurlers.)
- Cluster 1: Jets in at 85-95 mph rising slightly and breaking in to the right handed batter by around seven inches (red and green spots)
- Cluster 2: Moves similarly to cluster 1 but is much slower, clocking 75-85 mph (black spots)
- Cluster 3: 85-90 mph pitch that breaks away from the right handed hitter and rises a touch less than the fastball (red spots)
- Other clusters: There is a smattering of black spots moving away and down, which could be a fourth cluster
Armed with that information it is straightforward to glean pitch type. Cluster 1 is obviously a fastball, cluster 2 the change and cluster 3 is perhaps a splitter or slider. But it’s difficult to be sure and Hudson’s sinking fastball remains hidden (unless all his fastballs are sinkers).
John Walsh showed us a couple of weeks ago that sinkers have less upwards movement than a normal four-seam fastball. It might be that within the fastball cluster there are some sinkers (two-seamers, perhaps). Also within the morass of points in the right hand half of the chart there could be a mix of sliders and splitters.
We can use statistical clustering techniques to shed more light on the issue. Below I’ve used a clustering function to separate the groups. This is more of an art than a science but by playing around with cluster parameters it is easy to create some sensible groupings. If we change the format of the chart and plot speed on the y-axis with horizontal (solid spots) and vertical (open spots) movement on the x-axis the clusters are easier to see:
The graphic is tricky to interpret so the solid spots should be read as horizontal movement from the catcher’s perspective, while the further open spots are to the left the greater the downwards movement. The above graphic is based on specifying five clusters; four groups are now obvious. The change-up (blue) is easy to see as is the slider (red), which moves marginally down and away. The green markers show a pitch that sinks more than a typical fastball and also moves the other way. This is likely to be the splitter, which is thrown with a different grip to the fastball as well as around 5 mph slower.
Interestingly, the fastball grouping has split into two. Now, this could be a false grouping—it is difficult to tell—but there is a relationship between pace and movement. A faster ball moves both down and into the batter more. This is not what we’d intuitively expect as a faster pitch will have less opportunity to sink (and is thrown with more backspin). The greater lateral movement could be due to more side spin imparted on the ball in the release. However, scouting information suggests that Hudson throws both these pitches and that his sinker touches 92-93mph, which is consistent with the data.
Who knows? For the time being we’ll refer to these clusters as four-seam and two-seam/sinker fastballs.
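The clustering function used for the charts isn't named in this article, so as a stand-in here is a minimal k-means sketch in plain Python. It treats each pitch as a (speed, horizontal movement, vertical movement) tuple. Both the choice of k-means and that feature layout are assumptions for illustration, not a description of what produced the groupings above.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length tuples.

    Returns (centroids, labels), where labels[i] is the cluster index
    assigned to points[i] in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen pitches
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster alone
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Made-up pitches: (speed mph, horizontal inches, vertical inches).
    demo = [(92, 7, 6), (93, 8, 7), (91, 7, 5), (80, 7, 6), (81, 8, 5), (79, 6, 6)]
    print(kmeans(demo, 2))
```

Because speed is in mph and movement is in inches, the raw distances are dominated by whichever feature has the widest spread. Rescaling each feature before clustering, and trying several values of `k` and `seed`, is part of the "more art than science" fiddling mentioned above.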
Arm Slot by Pitch Type
Okay, so now we have classified Hudson’s pitches by type we can do some other cool analysis. For instance, can a hitter work out what is coming from Hudson’s motion and arm slot? Well, Pitch f/x can tell us:
Interestingly the four-seam and two-seam fastball appear to have a slightly different release point. The two-seam sinker seems to be thrown from a marginally lower position—so it looks as though it could be a different pitch after all.
The change-up release point isn’t perfect as it is a touch in from a four-seam fastball. In reality it is about a quarter of a foot, which is a meagre three to four inches. In practice, when you are in the batter’s box you’re going to need a radio telescope to detect the difference.
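A release-point comparison like this boils down to averaging the release coordinates for each pitch type and measuring the gap between the averages. Here is a minimal sketch, assuming each pitch is a `(pitch_type, x0, z0)` tuple with the release position in feet. The tuple layout and function names are illustrative assumptions, and the sample numbers in the usage note are made up.

```python
import math
from collections import defaultdict
from statistics import mean


def mean_release_points(pitches):
    """Average (x0, z0) release position in feet for each pitch type."""
    by_type = defaultdict(list)
    for pitch_type, x0, z0 in pitches:
        by_type[pitch_type].append((x0, z0))
    return {
        pitch_type: (mean(x for x, _ in pts), mean(z for _, z in pts))
        for pitch_type, pts in by_type.items()
    }


def gap_inches(a, b):
    """Straight-line distance between two release points, feet in, inches out."""
    return 12 * math.hypot(a[0] - b[0], a[1] - b[1])
```

For example, with made-up averages of `(-2.0, 6.0)` for the four-seamer and `(-1.75, 6.0)` for the change-up, `gap_inches` returns 3.0, the quarter of a foot discussed above.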
Of course the success of a pitcher, especially a starter, is not only how well he throws his pitches but how well he mixes his stuff up. Measuring pitch randomness is possible but requires effort; meanwhile it is easier to see whether batters rip a slider more than, say, a fastball. Here are the data for Hudson:

| Outcome | Four-seam | Slider | Splitter | Change | Sinker |
|---|---|---|---|---|---|
| Ball | 34% | 47% | 49% | 30% | 33% |
| Called Strike | 27% | 12% | 9% | 7% | 15% |
| Foul | 12% | 9% | 11% | 18% | 23% |
| Swinging strike | 4% | 18% | 15% | 17% | 4% |
| In play | 24% | 15% | 17% | 29% | 25% |
The first thing to be wary of is small sample size. Hudson has thrown 629 trackable pitches and the most in a category was 240, registered as sinkers. That isn’t many. The standard deviation on a 30% in-play rate is around 3%—a sizable range.
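The 3% figure is the usual binomial standard error, the square root of p(1 − p)/n. Here is a quick check, treating each pitch as an independent trial (a simplifying assumption); the function name is illustrative.

```python
import math


def proportion_se(p, n):
    """Standard error of a proportion p observed over n independent trials."""
    return math.sqrt(p * (1 - p) / n)


# A 30% in-play rate over the 240 pitches in the largest category.
print(f"{proportion_se(0.30, 240):.3f}")  # prints 0.030
```

The error only grows for the pitches he throws less often, since n sits in the denominator.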
Anyhow, let’s peer into the data. Hudson has most control over his fastball and change—these pitches have the highest strike and out rates, and he throws them more frequently than the others. By contrast he has much less control over the splitter and slider so throws them less often.
Look at the splitter for a second. Fifty percent of pitches are balls and 15% are swinging strikes, implying that perhaps they buzz the edge of the strike zone. His slider shows similar characteristics. Compare that to the pitch I’ve termed the sinking fastball. Only 33% of these pitches are balls and swinging strikes account for 4%. The difference is that hitters make contact (foul and in-play outs account for 39%), though fortunately for Braves fans never very good contact.
Another cut is to look at what happens at final ball—by this I mean the pitch that ends the at-bat. Why would we want to look at this? Simply, it is a purer measure of outcome because early in the count batters are willing to take balls and strikes. On the final ball we know for sure that they have definitively decided to either take the pitch on or leave it.
Here are the final ball outcomes by pitch type:
| Outcome | Four-seam | Slider | Splitter | Change | Sinker |
|---|---|---|---|---|---|
| Double | 4% | 0% | 5% | 0% | 5% |
| Fly Out | 16% | 14% | 5% | 13% | 13% |
| Ground Out | 38% | 0% | 38% | 41% | 35% |
| Home Run | 2% | 0% | 0% | 0% | 0% |
| Line Out | 4% | 0% | 5% | 3% | 3% |
| Pop Out | 2% | 14% | 5% | 0% | 1% |
| Single | 20% | 43% | 19% | 25% | 15% |
| Strikeout | 6% | 29% | 14% | 19% | 15% |
| Walk | 6% | 0% | 10% | 0% | 4% |
| Other | 2% | 0% | 0% | 0% | 9% |

The slider, again, looks like his weakest pitch, but if you get under the skin of the data you realise he has only thrown seven final ball sliders! So, we can’t infer anything useful. Probably the only semi-useful observation here is that by looking at the splitter and sinker we can see that pitches down in the zone are less likely to produce fly-outs and more likely to produce strikeouts. Even that speculation is moot, as a chi-square test is not significant.
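For anyone who wants to run this kind of test themselves, here is a minimal sketch of a Pearson chi-square statistic for a table of counts. The article reports percentages only, so the counts in the example are invented purely to show the mechanics. They are not Hudson's numbers, and this is not necessarily the exact test run for this article. The sketch assumes every row and column has a non-zero total and applies no continuity correction.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    table is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if pitch group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts: rows are two pitch groups, columns are [fly out, anything else].
stat, dof = chi_square([[8, 42], [10, 90]])
print(round(stat, 2), dof)  # prints 1.14 1
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic of 1.14 would not be significant. The test is also unreliable when expected counts are very small, which is exactly the situation with seven sliders.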
Another fascinating application of pitch by pitch data is to work out how a hurler handles different situations. We can do this in a number of ways. For instance, looking at how he varies his pitches according to the count. Another variable to consider is base out state when the pitch was thrown.
We don’t have the sample size to look at base out state but we can look at pitch by count.
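| Count | Four-seam | Slider | Splitter | Change | Sinker |
|---|---|---|---|---|---|
| 0-0 | 45% | 4% | 11% | 9% | 31% |
| 0-1 | 17% | 11% | 22% | 17% | 33% |
| 0-2 | 14% | 11% | 32% | 18% | 25% |
| 1-0 | 28% | 1% | 7% | 18% | 45% |
| 1-1 | 19% | 3% | 25% | 24% | 28% |
| 1-2 | 11% | 16% | 33% | 16% | 24% |
| 2-0 | 30% | 0% | 0% | 10% | 60% |
| 2-1 | 20% | 0% | 9% | 20% | 51% |
| 2-2 | 14% | 11% | 8% | 14% | 54% |
| 3-0 | 75% | 0% | 0% | 0% | 25% |
| 3-1 | 27% | 0% | 0% | 0% | 73% |
| 3-2 | 7% | 0% | 7% | 7% | 79% |
| TOTAL | 27% | 5% | 15% | 14% | 38% |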
If you’d faced Hudson on a 3-0 count this year then every time you’d have seen a fastball. Knowing that vastly increases the odds that you’ll belt it over the fences. Let’s see what other tidbits we can discern from the data:
- Fastball is the most likely pitch with which he’ll open the at-bat
- When the batter is down in the count the odds of a splitter increase dramatically
- On a 3-2 count expect a sinking fastball (two-seam)
- Like the splitter, a slider is most likely when the batter is behind in the count
- Change-ups are thrown mostly at random
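A pitch-by-count table of this sort is just a cross-tabulation. Here is a minimal sketch, assuming each pitch has already been labelled and is supplied as a `(count, pitch_type)` pair. The pair layout and the function name are illustrative assumptions.

```python
from collections import Counter, defaultdict


def pitch_mix_by_count(pitches):
    """Percentage of each pitch type thrown in each ball-strike count.

    pitches is an iterable of (count, pitch_type) pairs, e.g. ("3-0", "sinker").
    Returns {count: {pitch_type: percent}}.
    """
    tallies = defaultdict(Counter)
    for count, pitch_type in pitches:
        tallies[count][pitch_type] += 1
    mix = {}
    for count, tally in tallies.items():
        total = sum(tally.values())
        mix[count] = {ptype: 100 * n / total for ptype, n in tally.items()}
    return mix
```

Feeding it three four-seamers and one sinker on 3-0 (made-up counts) gives 75% and 25%, the same split as the 3-0 row above. With only a handful of pitches in the three-ball counts, a single pitch can swing a row by many percentage points.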
What about location? Does Hudson favor the inside or outside of the plate? The charts below plot pitch location for a right-handed batter (on the right) and left-handed batter (on the left):
The first thing that leaps out is that Hudson definitely works the lower outside corners. Also he mostly throws the splitter and slider to right handed batters because the lateral movement takes the ball away from the hitter. Likewise, the fastball and change combo are more frequently thrown to left-handers. It is only the sinker, his favorite pitch, that appears to be thrown independently of handedness.
Cool Stuff We Can’t Do Yet
Another important application of this data is to identify hot and cold zones. For instance, are balls down the middle of the plate hit more often than those on the corners? Although small sample size prevents us from drawing too many conclusions let’s have a look at pitches and balls in play by pitch location.
The numbers in the squares indicate the number of balls in that area. The chart on the left is all final ball outcomes, the chart on the right is those balls that were put in play. Based on this tiny sample it does appear that balls down the middle are whacked more than those on the edge but we can’t draw any firm conclusions.
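Counting pitches per square is a simple binning exercise. The sketch below assumes each pitch is a `(px, pz, in_play)` tuple, with `px` the horizontal position in feet from the centre of the plate, `pz` the height in feet and `in_play` a boolean. It also assumes a fixed strike zone from 1.5 to 3.5 feet high. The tuple layout, the zone heights and the 3-by-3 grid are all assumptions for illustration, not the grid used in the charts above.

```python
from collections import Counter

PLATE_HALF_WIDTH_FT = 17 / 24  # home plate is 17 inches wide


def zone_cell(px, pz, bottom=1.5, top=3.5, n=3):
    """Return (row, col) in an n-by-n grid over the zone, or None if outside.

    Row 0 is the top of the zone; column 0 is the most negative px.
    """
    inside_x = -PLATE_HALF_WIDTH_FT <= px < PLATE_HALF_WIDTH_FT
    inside_z = bottom <= pz < top
    if not (inside_x and inside_z):
        return None
    col = min(int((px + PLATE_HALF_WIDTH_FT) / (2 * PLATE_HALF_WIDTH_FT) * n), n - 1)
    up = min(int((pz - bottom) / (top - bottom) * n), n - 1)
    return (n - 1 - up, col)


def tally_zones(pitches):
    """Count all pitches and balls in play per grid cell."""
    all_counts, in_play_counts = Counter(), Counter()
    for px, pz, in_play in pitches:
        cell = zone_cell(px, pz)
        if cell is None:
            continue  # pitches outside the assumed zone are ignored here
        all_counts[cell] += 1
        if in_play:
            in_play_counts[cell] += 1
    return all_counts, in_play_counts
```

Dividing `in_play_counts` by `all_counts` cell by cell gives an in-play rate for each square, though with samples this small the rates will bounce around a lot.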
Eventually, the data will be able to tell us Hudson’s hot and cold zones. For instance, does he always get belted if he pitches inside? Pitch f/x’s pitch location data will inform us. Also we’ll be able to look at batters’ hot and cold spots too and understand where the holes are in each batter’s swing.
Analysing these data is both rewarding and frustrating. Rewarding because we are pushing the boundaries of what has been done in sabermetrics (because we have the data). Frustrating because the system is new, is very much in trial and MLB continues to make adjustments. This means it is very hard to get enough data points to draw firm conclusions.
Over the next few years that will definitely change.
References & Resources
A huge thanks to MLB and its data team for making the Pitch f/x data available. Also thanks to Joe P. Sheehan, Dan Fox and John Walsh for leading the charge in analysing this data; the more people we can get crunching this stuff the more we can learn. Thanks too to Cory Schwartz from MLB.com for being open to answering any inane questions about this stuff.