Another look at Enhanced Gameday
by Joe Sheehan
April 18, 2007
MLB.com's Gameday application has been beefed up this season. It now tracks the type, location and velocity of every pitch thrown in a game. That means you could, for example, find the average speed of a Kevin Millwood fastball and see what hitters do when Millwood throws them one.
Gameday presents data to help answer these questions and many more. The system that generates this information first appeared during the 2006 playoffs under the name Enhanced Gameday, and it is supposed to be installed at every major league park this season. High-speed cameras and motion-capture software track every pitch, compute various data points for each one and dump that information into XML files, which the Gameday application then retrieves.
One piece of information in the XML describes where each pitch crossed the strike zone. The graph on the left shows (almost) every pitch that Felix Hernandez threw in his Opening Day start vs. the A's, colored by the pitch speed. The graph is from the catcher's perspective, so a left-handed batter would appear on the right of the graph.
The first thing you notice from the graph is that Hernandez throws hard. Of the 102 pitches I have data for (he threw 110 total), only nine were slower than 85 mph.
The other obvious feature of the graph is that most of the pitches were around the strike zone. That location, combined with his velocity, made for a great outing. If you divide the strike zone into a 3x3 grid, as in the graph on the bottom, you can count how many pitches he threw to each area of the zone. Eventually, with more than one start's worth of data, you could find out which part of the zone he typically throws to, where batters hit the most groundballs, or a batter's average on pitches in a given area of the zone.
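The grid count itself is simple to sketch. This is a minimal version, assuming each pitch comes with a horizontal location px and a height pz, both in feet, seen from the catcher's view with 0 at the middle of the plate. Those field names and units, and the 1.5 to 3.5 ft vertical bounds, are assumptions for illustration; the real zone top and bottom vary by batter.

```python
from collections import Counter

# Assumed strike-zone bounds in feet, catcher's view. Home plate is 17 inches
# wide, so each edge sits 8.5 in (17/24 ft) from the center of the plate.
# The top and bottom are hypothetical stand-ins for a batter-specific zone.
ZONE_LEFT, ZONE_RIGHT = -17 / 24, 17 / 24
ZONE_BOTTOM, ZONE_TOP = 1.5, 3.5


def zone_cell(px, pz):
    """Return (row, col) of the 3x3 cell a pitch crossed, or None if outside.

    Row 0 is the top third of the zone; column 0 is the catcher's left.
    """
    if not (ZONE_LEFT <= px <= ZONE_RIGHT and ZONE_BOTTOM <= pz <= ZONE_TOP):
        return None
    width = (ZONE_RIGHT - ZONE_LEFT) / 3
    height = (ZONE_TOP - ZONE_BOTTOM) / 3
    # min() keeps pitches exactly on the right or bottom edge in the last cell.
    col = min(int((px - ZONE_LEFT) / width), 2)
    row = min(int((ZONE_TOP - pz) / height), 2)
    return row, col


def count_by_cell(pitches):
    """Count (px, pz) pitches per grid cell; pitches outside the zone go under None."""
    return Counter(zone_cell(px, pz) for px, pz in pitches)
```

For example, a pitch at (0.0, 2.5) lands in the middle cell (1, 1), while one at (1.5, 2.0) is well off the plate and is tallied under None.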
Getting back to Millwood, to determine the average speed of his fastball, you need to classify all the pitches he throws. The difference between a curveball and a fastball is how they move through the air toward home, their aerodynamic fingerprint. Gameday measures the vertical and horizontal "break" of every pitch, providing enough information to classify a pitch.
According to the Enhanced Gameday blog, http://gameday.mlblogs.com/gameday/2006/10/enhanced_gameda.html, break (now called pfx) is defined as "the measurement of the distance between the location of the actual pitch thrown over the plate, and the calculated location of a ball thrown by the pitcher in the same way, with no spin." No measurement scale is given, but every pitch has a fingerprint consisting of its speed and two breaks. Not every fastball is going to go exactly 93 mph, but all fastballs from a pitcher are going to have similar speeds and breaks, which differ from the speed and breaks of a curveball from that same pitcher.
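One way to find those fingerprint clusters without eyeballing a graph is k-means, a standard clustering method swapped in here purely for illustration; it is not how the pitches in this article were grouped. This sketch treats each pitch as a (speed, horizontal break, vertical break) tuple. Because the pfx scale is unknown, speed and break are mixed in a single distance, so in practice you would probably want to rescale each feature first. The starting centers are guesses you supply, one per pitch type you expect.

```python
import math


def kmeans(points, centers, iterations=20):
    """Plain k-means: assign each point to its nearest center, then move each
    center to the mean of its assigned points, until nothing changes.

    points and centers are tuples like (speed_mph, pfx_x, pfx_z).
    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Seeding with explicit centers (say, one near fastball speed and one near curveball speed) keeps the result reproducible; random starting points can settle into different groupings from run to run.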
The next graph shows the horizontal and vertical breaks vs. speed for Millwood. The blue dots show the horizontal break for every pitch, while the red dots represent the vertical break. Each pitch has two dots, and from the graph you can pick out clusters of Millwood's pitches. His two fastest pitch types move in a similar fashion, although one is about 5 mph slower than the other.
Below is a similar graph, but with each type of pitch colored differently. You can see from both graphs how much Millwood has relied on his fastball so far this season. He's thrown 186 pitches and 123 have been fastballs, with an average velocity of 92.5 mph. I'm curious whether he usually throws this many fastballs or whether it has just happened in his first two starts, and whether throwing this many fastballs is unusual for a starting pitcher.
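Once every pitch carries a label, the usage share and average velocity are straightforward arithmetic. A minimal sketch, assuming you already have a list of (label, speed) pairs from some classification step; the function name usage_and_speed and the label strings are hypothetical.

```python
def usage_and_speed(pitches, pitch_type):
    """Return (share of all pitches, average speed in mph) for one pitch type.

    pitches is a list of (label, speed_mph) pairs. If no pitch has the given
    label, the share is 0.0 and the average speed is None.
    """
    speeds = [speed for label, speed in pitches if label == pitch_type]
    if not speeds:
        return 0.0, None
    return len(speeds) / len(pitches), sum(speeds) / len(speeds)
```

With 123 fastballs among 186 pitches, the share works out to 123 / 186 ≈ 0.66, which is where the 66% figure comes from.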
While I don't have enough data yet to quantify my answers, based on other pitchers I have charted and watched, I'd say that Millwood relies on his fastball more than a "typical" pitcher does and that his current usage is normal for him. As for naming his other pitches, I'm hesitant to classify them because I don't know what the pfx values actually mean. I don't know the scale, and I don't know why some are negative and some are positive. If I had to guess, though, I would say that pitch A is a curveball because of its speed and unique break patterns. Pitch B moves similarly to his fastball, so maybe it's a changeup, although an average speed of 87 mph seems too fast for a changeup.
Millwood throws fastballs 66% of the time, but what happens to those pitches at the plate? Of the 123 fastballs that Millwood threw, 22 were put in play, 48 were balls and 53 were strikes (24 called strikes, six swinging strikes and 23 fouls). Here's a chart showing what happened to all of his pitches.
                                    Pitch A   Pitch B   Pitch C
Number of pitches                        24        39       123
Balls                                   42%       38%       39%
In play (hits and outs)                 21%       21%       18%
All strikes                             38%       41%       43%
Swinging strikes (share of strikes)     78%       13%       11%
Called strikes (share of strikes)       11%       31%       45%
Fouls (share of strikes)                11%       56%       43%
A couple of notes about the chart. The first three percentages (balls, in play and all strikes) are a percentage of the total number of pitches of that type. The three percentages that break down the types of strikes are calculated as a percentage of the total strikes for pitches of that type. For example, 42% of all type A pitches were balls, while 78% of the strikes thrown with pitch A were swings-and-misses.
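The two different denominators are easy to mix up, so here is a minimal sketch of how one column of the chart can be computed from raw outcome counts. The function name pitch_outcome_row is hypothetical; the example counts are the fastball (pitch C) numbers given above.

```python
def pitch_outcome_row(balls, in_play, called, swinging, fouls):
    """Rounded percentages for one column of the chart.

    Balls, in play and all strikes are shares of every pitch of that type;
    swinging, called and foul strikes are shares of that type's strikes only.
    """
    strikes = called + swinging + fouls
    total = balls + in_play + strikes

    def pct(part, whole):
        # Guard against an empty denominator (e.g. a pitch type with no strikes).
        return round(100 * part / whole) if whole else 0

    return {
        "pitches": total,
        "balls": pct(balls, total),
        "in_play": pct(in_play, total),
        "all_strikes": pct(strikes, total),
        "swinging": pct(swinging, strikes),
        "called": pct(called, strikes),
        "fouls": pct(fouls, strikes),
    }


# Millwood's fastballs: 48 balls, 22 in play, 24 called, 6 swinging, 23 fouls.
print(pitch_outcome_row(balls=48, in_play=22, called=24, swinging=6, fouls=23))
```

Run on those counts, it reproduces the pitch C column: 123 pitches, 39% balls, 18% in play, 43% strikes, and 11%, 45% and 43% of strikes swinging, called and fouled.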
The other thing to notice about the chart is that none of the percentages indicate any skill yet. The chart only reflects what Millwood has done so far in 2007. I don't have enough data for each pitch to say that when he needs a swing-and-miss, Millwood throws pitch A. With more than two starts, you could get a better sense of how Millwood and other pitchers use their different pitches.
The only problem with the enhanced system so far is that it is incomplete. For whatever reason, enhanced data has been provided from only eight stadiums: Seattle, Anaheim, Chicago, Texas, San Diego, Atlanta, Los Angeles and Toronto. Because of this uneven coverage, most pitchers don't have multiple starts' worth of data yet.
Apparently the plan is to have the system installed at every park by mid-season, but for now there are large holes in the data. I'd love to chart a start by Dice-K (or Daniel Cabrera or Barry Zito or Rich Hill), but they haven't pitched in an online stadium yet. While it is frustrating to have limited data, complaining about this limitation is like finding $10 on the ground and whining that it wasn't $20.
Here's the link to the XML from Greg Maddux's start vs. the Rockies on April 6. The web directory is organized intuitively, and with a little poking around, you can find the XML files for any game. There's a ton of information contained in these XML files.
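Pulling the pitches out of one of those files takes only the standard library. A minimal sketch, assuming each pitch is a <pitch> element carrying attributes named type, start_speed, pfx_x, pfx_z, px and pz. Those attribute names, the read_pitches function and the tiny SAMPLE document below are assumptions for illustration, so check them against a real file before relying on them.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment in the rough shape of a Gameday game file.
# The third pitch has no tracking attributes, as from a non-enhanced park.
SAMPLE = """<game>
  <atbat>
    <pitch type="S" start_speed="91.8" pfx_x="-6.1" pfx_z="9.7" px="0.21" pz="2.64"/>
    <pitch type="B" start_speed="77.3" pfx_x="4.2" pfx_z="-5.0" px="-1.10" pz="1.20"/>
    <pitch type="X"/>
  </atbat>
</game>"""

NUMERIC_FIELDS = ("start_speed", "pfx_x", "pfx_z", "px", "pz")


def read_pitches(xml_text):
    """Return one dict per <pitch> element that has complete tracking data."""
    root = ET.fromstring(xml_text)
    pitches = []
    for node in root.iter("pitch"):  # finds pitches at any nesting depth
        try:
            record = {name: float(node.get(name)) for name in NUMERIC_FIELDS}
        except (TypeError, ValueError):
            # float(None) raises TypeError when an attribute is missing;
            # float("") raises ValueError when it is blank. Skip such pitches.
            continue
        record["result"] = node.get("type")  # assumed ball/strike/in-play code
        pitches.append(record)
    return pitches


for pitch in read_pitches(SAMPLE):
    print(pitch["result"], pitch["start_speed"], pitch["px"], pitch["pz"])
```

Iterating with root.iter("pitch") rather than a fixed path keeps the sketch indifferent to how innings and at-bats are nested in the real files.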
Joe Sheehan enjoys baseball and colorful graphs. He isn't that Joe Sheehan, although he does celebrate the man's entire collection. You can email him at email@example.com.