Drinking from a fire hose
by Mike Fast
May 22, 2008
One of the most incredible weekends of my life was spent recently in the San Francisco Bay Area. On May 10-11, I attended the First Annual PITCHf/x Summit, co-sponsored by Sportvision, the creator of the PITCHf/x system, and Major League Baseball Advanced Media (MLBAM). I was excited about the summit beforehand, as you might have been able to guess from this article, but the experience far exceeded my expectations.
First of all, I thank my wife, Lori, for her encouragement and assistance. We ostensibly made the occasion into a family vacation to beautiful San Francisco. In practice, it meant that she was navigating the city transit systems with our four-year-old and two-year-old while I was soaking up baseball knowledge and making new friends at the summit. To top it off, she had kid duty on Mother's Day, earning her the award of Best Wife and Mother of 2008. I would also like to thank Sportvision for their hospitality to my family.
One of the best parts of the summit was putting faces to names. You might already be familiar with Dr. Alan Nathan through his Physics of Baseball site. His work with PITCHf/x has been very influential in my own understanding of the physics of pitching, and it was good to finally meet him. Alan did a masterful job of pulling together people to present on disparate topics that turned the program into a whole that was greater than the sum of its parts.
It was helpful to see the PITCHf/x system in action. I learned in detail about the calibration and registration of the cameras, and I saw the software interface that the PITCHf/x operator uses during the game. I also learned for the first time that there are two people taking care of the system and the data entry during the game. (I learned this by confusing the terms "stringer" and "operator" and getting a blank look from one of the Sportvision or MLBAM folks.) First, there is the stringer, who has the primary responsibility for data entry, such as attaching ball/strike/in-play outcomes to the pitches detected by PITCHf/x; you can think of the stringer as the scorer for the MLB Gameday system. And then there is the operator, who has primary responsibility for the health of the PITCHf/x system of cameras, computers and software, making sure it stays calibrated and otherwise performs as expected throughout the game.
Saturday's session covered two interesting topics. The first topic dealt with methods for pitch classification, and the second topic dealt with proposals for gathering trajectory data for batted balls, also known affectionately as HITf/x. Pitch classification is a favorite subject of mine—I love learning about how pitchers attempt to use their craft to outwit the batter, and about the tools that pitchers have at their disposal. I am fascinated by the aerodynamics of spinning baseballs and how that translates into the athletic endeavors of pitchers. I had the privilege of presenting some of my thoughts on pitch classification to the group. My method for pitch classification relies primarily on three parameters: the speed of the pitch and the spin axis and spin rate of the baseball. Using John Smoltz as an example, I talked about how I use these parameters to classify pitches, and I showed a few results of the type of analysis that this classification enables. Those of you who have followed my previous writing are familiar with the kind of examples I showed.
The pitch classification panel also included Ross Paul of MLBAM. Ross demonstrated his method for real-time pitch classification that is incorporated in MLB's Gameday application. Basically, he uses a neural net algorithm that takes the location, velocity and acceleration data from PITCHf/x as inputs, weights the various inputs with a hidden layer, and outputs the confidence that the sampled pitch matches each pitch type. A 1.5x multiplier is applied to the confidence for pitches that are known to be part of the pitcher's repertoire, and the pitch with the highest output confidence is reported in Gameday.
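The last step of that process, the repertoire multiplier followed by picking the highest confidence, is simple enough to sketch in a few lines of Python. This is only an illustration of the idea as described above, not MLBAM's code: the function name, the pitch-type labels and the sample confidences are all made up, and the neural net itself is treated as a black box that has already produced its outputs.

```python
# Illustrative sketch only. The 1.5x multiplier and the "highest
# confidence wins" rule come from the description above; every name
# and number below is hypothetical.
REPERTOIRE_MULTIPLIER = 1.5


def pick_pitch_type(confidences, repertoire):
    """Return (best_label, adjusted_confidences).

    confidences: dict mapping pitch-type label -> raw network output
    repertoire:  set of labels the pitcher is known to throw
    """
    adjusted = {
        label: score * (REPERTOIRE_MULTIPLIER if label in repertoire else 1.0)
        for label, score in confidences.items()
    }
    # The label with the largest adjusted confidence is the one reported.
    best = max(adjusted, key=adjusted.get)
    return best, adjusted


# Made-up network outputs for a single pitch.
raw = {"FF": 0.40, "SL": 0.35, "CU": 0.10, "CH": 0.15}
label, adjusted = pick_pitch_type(raw, repertoire={"SL", "CH"})
print(label, adjusted)
```

In this made-up case the raw output favors the four-seam fastball, but the slider wins once the multiplier is applied because it is in the pitcher's known repertoire and the fastball is not.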
The training data for the neural net were scouting reports for 3,000 pitches from last year. Ross reported that some of the quirks in the pitch classification output, such as a dearth of two-seam fastballs, may be due to quirks in the training data. There was no paucity of suggestions for improvements to MLBAM's classification system. Ross mentioned one improvement that was already in progress—namely, scaling the input speed to a given pitcher's fastest and slowest recent pitches. Figuring out how to classify pitches quickly and accurately while still being able to account for any generic pitcher who might be promoted from the minors is no small task, but it's exciting to think how having this information available in the game broadcast will deepen the experience for the fan.
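One plausible reading of "scaling the input speed to a given pitcher's fastest and slowest recent pitches" is ordinary min-max scaling, sketched below in Python. That reading is an assumption; Ross did not spell out the formula, and the function name and sample speeds here are invented for illustration.

```python
# Illustrative sketch only: min-max scaling is an assumed interpretation
# of the improvement described above, not MLBAM's actual formula.
def scale_speed(speed_mph, recent_speeds_mph):
    """Map a pitch speed onto the pitcher's own recent speed range.

    Returns 0.0 for the slowest recent pitch and 1.0 for the fastest.
    Pitches outside the recent range fall below 0 or above 1.
    """
    slowest = min(recent_speeds_mph)
    fastest = max(recent_speeds_mph)
    if fastest == slowest:
        # No spread to scale against; the midpoint is an arbitrary choice.
        return 0.5
    return (speed_mph - slowest) / (fastest - slowest)


# Made-up recent speeds for a hypothetical pitcher.
recent = [92.0, 78.0, 85.0, 94.0]
print(scale_speed(86.0, recent))
```

The appeal of scaling this way is that an 86 mph pitch means something very different coming from a soft-tossing lefty than from a hard thrower, and the scaled value captures where the pitch sits within that particular pitcher's range.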
The HITf/x discussion panel was my first exposure to the frame-by-frame video captured by the PITCHf/x system. I'd seen some bits and pieces of PITCHf/x video captures in other places, but Peter Jensen's presentation was the first time I'd seen it advanced and rewound frame by frame where I could really see how the baseball was moving both on the pitch and off the bat after impact. Even with only a few frames of batted ball data in the existing PITCHf/x video captures, Peter highlighted how much valuable information could be gathered. Even a crude measure of the velocity of the ball off the bat would tell us a lot about the effectiveness of both hitters and pitchers. There is a potential treasure trove of information just in the limited HITf/x information captured by the existing PITCHf/x system.
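To show how little is needed for a crude velocity measure, here is a minimal Python sketch that estimates the speed of a ball from a handful of tracked positions. It assumes the positions have already been converted from pixels to feet in a real-world coordinate system and that the frame rate is known; the 60 frames per second in the example is an assumed value, and the positions are made up.

```python
import math

# Illustrative sketch only. Assumes positions are already in feet and
# the frames are evenly spaced in time.
FT_PER_S_TO_MPH = 3600.0 / 5280.0


def crude_speed_mph(positions_ft, frame_rate_hz):
    """Average speed over consecutive tracked positions, in mph.

    positions_ft:  list of (x, y, z) tuples, one per video frame
    frame_rate_hz: camera frames per second
    """
    if len(positions_ft) < 2:
        raise ValueError("need at least two tracked positions")
    # Sum the straight-line distance between each pair of adjacent frames.
    path_ft = sum(
        math.dist(a, b) for a, b in zip(positions_ft, positions_ft[1:])
    )
    elapsed_s = (len(positions_ft) - 1) / frame_rate_hz
    return path_ft / elapsed_s * FT_PER_S_TO_MPH


# Three made-up frames, 2.2 feet apart, at an assumed 60 frames per second.
frames = [(0.0, 0.0, 3.0), (0.0, 2.2, 3.0), (0.0, 4.4, 3.0)]
print(crude_speed_mph(frames, frame_rate_hz=60.0))
```

A real measurement would have to deal with measurement error in each frame and with the ball slowing down from drag, which is why this is labeled crude.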
Greg Rybarczyk of Hit Tracker Online followed with a proposal for a more comprehensive HITf/x system that would capture information not only about the launch of a batted ball off the bat but also about its flight path, landing point and fielding location. Greg made an impressive case for how this information would transform the fan experience in the MLB Gameday application. He also used information from the Washington Nationals construction webcam that was in place earlier in the year to show how a camera with a view of the whole field could be used to track the positions of fielders.
Matt Thomas then gave a very helpful talk on the use of cameras to photograph events on the field and how to calculate the errors involved in order to optimize the measurements. His presentation is a great reference for anyone who undertakes a project that involves registering camera pixels from a digital image to real-world locations. Matt showed some interesting applications of his work at Busch Stadium, including some of the more extreme infield shifts that he observed as well as things like the shallowest and deepest positions for outfielders. The shallowest outfielder that Matt observed was Kevin Mench, playing only 243 feet away in left field when pitcher Mike Maroth was at the plate; the deepest outfielder was David DeJesus, standing 350 feet away in center field when Albert Pujols was batting in the bottom of the ninth inning in a tie game.
Saturday afternoon, we all attended the Giants-Phillies game at AT&T Park. It was my first chance to see the beautiful new park. My previous Giants games had all been at Candlestick, where my fond memories will always include the cold summer winds for which the park gained its well-deserved reputation, and Greg Maddux pitching 5.2 no-hit innings before allowing a single to his mound opponent Mark Gardner. This day's ballgame featured crafty lefty Jamie Moyer for the Phils and young star Tim Lincecum for the Giants. Moyer was shelled for six runs and gone after four innings, but Lincecum pitched a gem, allowing only two solo home runs and three other baserunners over eight innings. I didn't pay as much attention to the game as I normally do. We were seated in the McCovey Cove Loft, courtesy of Sportvision, and there was a lot of chatting and eating and drinking to do.
One of my favorite moments at the park was getting Matt Lentzner's in-person tutorial of his energy model of the pitching grip and release. Matt and I had corresponded previously about the ideas in his model, but it was fascinating to see it demonstrated first-hand by Matt with the baseball that he carries in his pocket. Matt has promised to publish more details of his model to a wider audience, so I won't steal any more of his thunder. Suffice it to say that his model is very simple yet very powerful and informative, and if nobody has come up with it before (and I'm not aware that they have), I can't see why not.
Sunday's session covered three topics: data quality improvement, pitch database creation and the aerodynamics of baseballs. My fellow University of Oklahoma physics alum Ike Hall presented some excellent work that he has been doing using drag coefficient to detect park-to-park biases in the PITCHf/x cameras. After Ike's presentation, Mont Hubbard and someone else (whose name I forget) made suggestions for ways to include a measurement of gravity in the PITCHf/x pre-game calibration. I'm looking forward to the corrections and improvements to the data that come from the work of Ike, Mont and others.
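For readers who want a feel for the drag coefficient idea, here is a rough Python sketch. It uses the standard drag relation, in which the drag force equals one half times air density times cross-sectional area times drag coefficient times speed squared, to back out a drag coefficient from a pitch's deceleration, and then averages the result by park. This is not Ike's actual method, which is not detailed here; the ball constants are approximate, the function names are invented, and the sample pitches are made up.

```python
import math
from collections import defaultdict

# Illustrative sketch only: approximate ball constants, hypothetical names.
BALL_MASS_KG = 0.145           # roughly 5.125 ounces
BALL_CIRCUMFERENCE_M = 0.2318  # roughly 9.125 inches


def drag_coefficient(drag_decel_m_s2, speed_m_s, air_density_kg_m3=1.2):
    """Solve m * a = 0.5 * rho * A * Cd * v**2 for Cd."""
    area_m2 = BALL_CIRCUMFERENCE_M ** 2 / (4.0 * math.pi)
    return (2.0 * BALL_MASS_KG * drag_decel_m_s2) / (
        air_density_kg_m3 * area_m2 * speed_m_s ** 2
    )


def mean_cd_by_park(pitches):
    """Average drag coefficient per park.

    pitches: iterable of (park, drag_decel_m_s2, speed_m_s, air_density)
    """
    totals = defaultdict(lambda: [0.0, 0])
    for park, decel, speed, rho in pitches:
        totals[park][0] += drag_coefficient(decel, speed, rho)
        totals[park][1] += 1
    return {park: total / count for park, (total, count) in totals.items()}


# Made-up pitches: (park, drag deceleration, speed, air density).
sample = [
    ("Park A", 10.0, 40.0, 1.2),
    ("Park A", 10.4, 40.0, 1.2),
    ("Park B", 11.0, 40.0, 1.2),
]
print(mean_cd_by_park(sample))
```

The reasoning behind a check like this is that the ball itself does not change from park to park, so once air density is accounted for, a park whose average drag coefficient sits well apart from the rest is a candidate for a measurement bias rather than a physical difference.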
Ike and I presented to the group about the structure of our respective pitch databases, mine in MySQL and Ike's in ROOT. I'll skip the details here, but you can find our respective methods outlined at my web site and Ike's web site.
The aerodynamics panel was one of my favorites. I've been fascinated by the physics of baseball since my brother bought Dr. Bob Adair's Physics of Baseball book back in the early nineties. PITCHf/x has unleashed a volume of experimental data about the physics of pitched baseballs, and Alan Nathan had already grabbed my interest with his exploration of this data to investigate the drag and lift coefficients and find approximate solutions to the equations of motion for a pitch. Dr. Mont Hubbard looked at what we know historically about the drag and lift coefficients for spinning baseballs and at some of the theory behind the Magnus force and the so-called "drag crisis." There was some good discussion about why we don't see a drag crisis in the PITCHf/x data.
I hope that I've given you a good taste of the subject matter at the PITCHf/x summit. It was an amazing experience to be in a room full of people interested in the same niche subject as I am, even as they came at the topic from very different viewpoints. I enjoyed meeting a number of the club representatives as well as all of the other independent researchers.
Both Sportvision and MLBAM were very gracious hosts. I was surprised and pleased by their openness and eagerness to engage the research community. I was duly impressed by the technical skill of the Sportvision people, ranging from their creativity in coming up with PITCHf/x in the first place, to their engineering skill in making it work, to their production ability in making interesting and useful video and effects available to broadcasters in real time. I was excited to hear about MLBAM's vision for using PITCHf/x to improve the baseball experience and about their commitment to keeping the detailed pitch data freely available for academic purposes. Finally, no summary of the summit would be complete without a huge thanks to Catherine Kung of Sportvision for organizing it and making it a pleasant experience for all of us.
I left the summit excited and optimistic about the future of PITCHf/x and its coming impact on the game. Even though I might have considered myself an expert on PITCHf/x, after being around that group of people for a weekend, I was left with the distinct feeling of how little I really know and how much I depend on the efforts and smarts of others. For whatever reason, baseball analysis seems to attract the best and brightest. I look forward to seeing what creative research and PITCHf/x improvements are spawned from the First Annual PITCHf/x Summit, and I wonder what we'll be discussing at next year's summit!
Mike Fast is a Royals fan who enjoys investigating baseball questions using data of many sorts. He is a member of Complete Game Consulting. He welcomes comments via e-mail.