This year, MLBAM unveiled an algorithm for classifying pitches for their Gameday application. How successful is their algorithm?
I looked at data from nine pitchers in 14 appearances in the first week and a half of data. The nine pitchers are Tim Hudson, Odalis Perez, Brian Bannister, Jon Lester, Doug Davis, Johnny Cueto, Phil Hughes, Yasuhiko Yabuta, and Eric Gagne. I compared MLBAM’s labels to my own detailed post-game pitch classification, which should be 99 percent accurate or better.
First of all, what do the MLB pitch type codes mean?
Code  Pitch
FA    Fastball
FF    Four-seam fastball
SI    Sinker
CH    Change-up
SL    Slider
CU    Curveball
FC    Cut fastball
FS    Split-finger fastball
KN    Knuckleball
How well does MLBAM do at detecting the pitch type?
Rows are my classification; columns are the MLBAM code assigned to those pitches.

Pitch     FA   FF   SI   CH   SL   CU   FC   FS   KN  Correct
FB-4S    274    1   30    9    0    0    2    0    0      87%
FB-2S     34    0    0    0    0    0    0    0    0     100%
Chg       19    0    4   74   14    0    2   24    0      54%
Sld        1    0    0    0   62    3    2    0    0      91%
Crv        0    0    0    1    0   72    0    1    0      97%
Cut       23    0    0   23   19    0   17    0    0      21%
Spl        0    0    0    0    0    0    0    0    0
Knu        0    0    0    0    0    0    0    0    0
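The Correct column in the table above can be reproduced from the raw counts. The following is a minimal Python sketch, not the spreadsheet's actual formulas. The `ACCEPTED` mapping is my assumption about which MLBAM codes were scored as correct for each pitch type; it is inferred from the published percentages (for example, FA has to count as correct for both fastball types to arrive at 87 and 100 percent).

```python
# Counts copied from the table: rows are the manual classification,
# columns are the MLBAM codes in the order given by CODES.
CODES = ["FA", "FF", "SI", "CH", "SL", "CU", "FC", "FS", "KN"]
COUNTS = {
    "FB-4S": [274, 1, 30, 9, 0, 0, 2, 0, 0],
    "FB-2S": [34, 0, 0, 0, 0, 0, 0, 0, 0],
    "Chg":   [19, 0, 4, 74, 14, 0, 2, 24, 0],
    "Sld":   [1, 0, 0, 0, 62, 3, 2, 0, 0],
    "Crv":   [0, 0, 0, 1, 0, 72, 0, 1, 0],
    "Cut":   [23, 0, 0, 23, 19, 0, 17, 0, 0],
}

# Assumption: MLBAM codes treated as a correct call for each manual label.
# Accepting SI for two-seamers is a guess; no two-seamer was labeled SI,
# so it does not affect the numbers.
ACCEPTED = {
    "FB-4S": {"FA", "FF"},
    "FB-2S": {"FA", "SI"},
    "Chg": {"CH"},
    "Sld": {"SL"},
    "Crv": {"CU"},
    "Cut": {"FC"},
}

def percent_correct(label):
    """Share of pitches of this type that MLBAM labeled acceptably, in percent."""
    row = COUNTS[label]
    hits = sum(n for code, n in zip(CODES, row) if code in ACCEPTED[label])
    return round(100 * hits / sum(row))

for label in COUNTS:
    print(label, percent_correct(label))
```

Under that assumption, the loop prints the same percentages as the table, from 87 for four-seamers down to 21 for cutters.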
They do quite well at detecting fastballs, but not very well at distinguishing between four-seamers and two-seamers. That’s a tough thing to do for most pitchers, although Lester has an easily distinguishable two-seamer, and MLB didn’t pick it up.
They are pretty lousy at detecting the change-up. They label a lot of them as splitters (24 of 137), fastballs (19 as FA, plus 4 as SI), and sliders (14). That’s inexcusable for any sort of sophisticated classification system. Change-ups are probably the second-easiest pitch to classify, after curveballs. It seems like the system may be setting some sort of hard velocity and break bins that are getting it into trouble. Across the league, one change-up can look quite different from another, but for any given pitcher, change-ups are usually a very distinct cluster, separated from the fastball by a gap of six mph or more and separated from the slider and cutter by several inches of horizontal spin deflection.
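To illustrate the difference between league-wide bins and per-pitcher separation, here is a rough Python sketch of a test that measures each pitch against that pitcher's own fastball and slider. Everything in it is hypothetical: the `PitcherProfile` fields, the function name, and the 3-inch value standing in for "several inches" are my assumptions, and the check covers only the two separations described above. It makes no attempt to rule out curveballs or anything else.

```python
from dataclasses import dataclass

@dataclass
class PitcherProfile:
    """Per-pitcher reference values (hypothetical structure)."""
    fastball_mph: float      # this pitcher's typical fastball speed
    slider_break_in: float   # typical slider/cutter horizontal deflection, inches

def looks_like_changeup(speed_mph, break_in, profile,
                        min_speed_gap=6.0, min_break_gap=3.0):
    """True if the pitch is well separated from this pitcher's fastball in
    speed and from his slider/cutter in horizontal spin deflection.

    min_speed_gap follows the six-mph figure in the text; min_break_gap is
    an assumed stand-in for "several inches".
    """
    slower_than_fastball = profile.fastball_mph - speed_mph >= min_speed_gap
    apart_from_slider = abs(break_in - profile.slider_break_in) >= min_break_gap
    return slower_than_fastball and apart_from_slider

# Made-up numbers for illustration only, not from any pitcher in the study.
profile = PitcherProfile(fastball_mph=92.0, slider_break_in=2.0)
print(looks_like_changeup(84.0, -7.0, profile))  # slow and far from the slider
print(looks_like_changeup(90.0, -7.0, profile))  # too close to fastball speed
print(looks_like_changeup(84.0, 3.0, profile))   # too close to the slider's break
```

The point of the design is that the thresholds are gaps relative to the individual pitcher, so the same code handles an 88-mph change-up from a hard thrower and a 78-mph one from a soft tosser.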
Given the trouble they have with change-ups, they do surprisingly well with sliders. I’ll talk about this in the next section, but I think they are setting the parameters for the slider very generously, such that they label almost all of the sliders correctly, but they lump in other pitches, too.
Curveballs are the easiest pitch to classify, with uniqueness in both speed and movement. The system does well, and that’s a good sign.
As bad as the system was with the change-up, it is abysmal with the cutter. The cutter takes skill to identify; it’s not as hard as differentiating four-seamers and two-seamers, but it’s tough enough. Identifying the cutter is what separates the men from the boys in the pitch classification world. When the MLBAM system sees a cutter, it basically throws a dart at a board that lists fastball, cutter, slider, and change-up. It’s true that a cutter has features of all those pitches, but I’m disappointed that MLB is as poor as they are with it.
There were no split-finger fastballs or knuckleballs thrown by the pitchers I evaluated. (Well, that’s true for eight of the nine pitchers. I deferred on trying to identify Hudson’s splitter for the moment, but that should be okay for this comparison because MLB couldn’t identify his splitter either. It didn’t think that Hudson threw a single splitter. I’m pretty sure he did, but I just can’t tell his splitters apart from the fastballs and change-ups consistently without studying a larger data set.)
Let’s take a look at the data from the reverse perspective. When MLB identifies a pitch type, how trustworthy is that identification?
Rows are the MLBAM code; columns are my classification of those pitches.

Code  FB-4S  FB-2S    CHG    SLD    CRV    CUT    SPL    KNU  Correct
FA      274     34     19      1      0     23      0      0      88%
FF        1      0      0      0      0      0      0      0     100%
SI       30      0      4      0      0      0      0      0       0%
CH        9      0     74      0      1     23      0      0      69%
SL        0      0     14     62      0     19      0      0      65%
CU        0      0      0      3     72      0      0      0      96%
FC        2      0      2      2      0     17      0      0      74%
FS        0      0     24      0      1      0      0      0       0%
KN        0      0      0      0      0      0      0      0
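Each row of the table above can also be read as a breakdown of what a pitch really was, given the label MLBAM put on it. The Python sketch below turns a row into rounded percentages; the function name `actual_mix` is mine, and the counts are copied from the table.

```python
# Counts copied from the table: rows are MLBAM codes, columns are the
# manual classification in the order given by CLASSES.
CLASSES = ["FB-4S", "FB-2S", "Chg", "Sld", "Crv", "Cut", "Spl", "Knu"]
BY_CODE = {
    "FA": [274, 34, 19, 1, 0, 23, 0, 0],
    "FF": [1, 0, 0, 0, 0, 0, 0, 0],
    "SI": [30, 0, 4, 0, 0, 0, 0, 0],
    "CH": [9, 0, 74, 0, 1, 23, 0, 0],
    "SL": [0, 0, 14, 62, 0, 19, 0, 0],
    "CU": [0, 0, 0, 3, 72, 0, 0, 0],
    "FC": [2, 0, 2, 2, 0, 17, 0, 0],
    "FS": [0, 0, 24, 0, 1, 0, 0, 0],
    "KN": [0, 0, 0, 0, 0, 0, 0, 0],
}

def actual_mix(code):
    """Map each manual pitch class to its rounded percentage share of the
    pitches MLBAM labeled with `code`. Classes with a zero count are omitted;
    a code that was never assigned yields an empty dict."""
    row = BY_CODE[code]
    total = sum(row)
    if total == 0:
        return {}
    return {cls: round(100 * n / total)
            for cls, n in zip(CLASSES, row) if n > 0}

print(actual_mix("SL"))
print(actual_mix("FS"))
```

For SL this gives 65 percent true sliders, with the remainder split between cutters (20) and change-ups (15); for FS it gives 96 percent change-ups.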
Pitches identified as fastballs are correct 88 percent of the time. That’s not a very good percentage, in my opinion. This ought to be close to 99 percent for a good system, even in real time, and that’s leaving aside the issue of distinguishing two-seamers. (They appear to be looking for sinkers based on vertical and horizontal spin deflection components. In my experience, spin angle and spin rate parameters are much more useful in distinguishing two-seamers/sinkers from four-seamers.)
If they call it a change-up, it might be a cutter. If they call it a slider, it might be a change-up or cutter. Their definition for a slider appears to be too expansive. A slider can be a difficult pitch to distinguish from a cutter. (The BIS pitch-type data available at Fangraphs has this issue.) But distinguishing a slider from a change-up should not usually be very hard.
Their curveball identification is spot on, as expected. They lumped in three sliders that shouldn’t have been that hard to identify as sliders, but I won’t complain too much about that.
Pitches identified as cut fastballs are correct 74 percent of the time. That’s an okay percentage, as such. The problem lies in the fact that they’re being very conservative about cutters, and still they get them wrong 26 percent of the time. I’m not even sure what I would suggest for improvement. There are no hard and fast rules about cutters, and hard and fast rules seem to be what this system needs in order to succeed. Cutters are usually thrown closer in speed to fastballs than are sliders. They’ll often have a higher component of the spin rate in the x-z plane than a slider, and their spin axis is almost always closer to a fastball than a slider. But none of those rules are without significant exceptions. A human can look at the data and usually make a good judgment call, but I haven’t figured out how to communicate that to a computer in an algorithm.
What the system is calling splitters are really change-ups for these pitchers. A splitter usually has a more inclined spin axis than the four-seam fastball and speed a couple mph greater than a change-up. For a pitcher who throws both, it can be tough to tell them apart. For a pitcher who throws only one or the other, it’s usually best to err on the side of calling it a change-up unless you have an indication that the given pitcher throws a splitter. The MLB system doesn’t seem to incorporate prior information about a pitcher’s repertoire into its decision-making, but I’m not sure how you identify pitches like the splitter correctly if you don’t do that.
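The "err on the side of the change-up" rule is simple enough to write down. This is a minimal Python sketch of how prior knowledge of a pitcher's repertoire could be applied as a post-processing step. It is my illustration, not something the MLB system is known to do, and the function name and the repertoire sets are hypothetical.

```python
def apply_repertoire_prior(label, known_pitches):
    """Relabel a splitter (FS) as a change-up (CH) unless the pitcher is
    known to throw a splitter. All other labels pass through unchanged.

    known_pitches: set of MLBAM codes the pitcher is believed to throw
    (hypothetical input; it would have to come from scouting or past data).
    """
    if label == "FS" and "FS" not in known_pitches:
        return "CH"
    return label

# Hypothetical repertoires, for illustration only.
no_splitter = {"FA", "CH", "CU"}
has_splitter = {"FA", "FS", "SL"}

print(apply_repertoire_prior("FS", no_splitter))   # CH
print(apply_repertoire_prior("FS", has_splitter))  # FS
print(apply_repertoire_prior("SL", no_splitter))   # SL
```

This does nothing for the harder case of a pitcher who throws both pitches, where the spin axis and speed differences described above still have to do the work.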
This was not meant to be an exhaustive evaluation of the MLB algorithm, but I think it’s clear from the data that they still have a long way to go. Being right 80 percent of the time is not impressive when 70 percent of pitches are quite simple to classify. Getting that other 30 percent of the pitches mostly right is what will make this data useful for analysis purposes.
If you want to dig into this further, the Excel spreadsheet that I used for this analysis can be downloaded here.