A view from the Sportvision Baseball Summit
by Harry Pavlidis
July 14, 2009
For the second year in a row, Marv White and the wonderful folks at Sportvision put on a showcase of their work, and allowed the analyst community to do the same.
Sportvision folks are those who gave us the first down marker and the K-Zone. They are partners with MLBAM in the PITCHf/x venture, something we all know through MLBAM's Gameday. White, the chief technology officer of Sportvision, augments his incredible team with the power of the community. With the pitch data available freely for research purposes, MLBAM has unleashed a monster, a monster that White and crew have used to advance their technology and business plans. As analysts, we get some of the most fascinating data around, sincere gratitude, and a good time.
The Summit itself features a trip to a Giants game, along with a firsthand (and hands-on) look at the PITCHf/x system. It also allows a lot of baseball nerds to get together and put their minds, and beer mugs, together. It's a very well-formed, symbiotic relationship. The beer and the analysts, that is. The same can be said about Sportvision/MLBAM and the analysts, too.
A true win-win. We get our toys, they get candid and enthusiastic feedback on the toys. Which ain't toys, not at all.
PITCHf/x in 2009
The main purpose of this conference, if you can pick just one, would be to talk about HITf/x. With April's data available, there were plenty of things to talk about. But it was something even newer that stole the show. We'll get to parts two and three of the conference agenda; we need to start at the beginning.
The one-day summit started with PITCHf/x. Quite a lot has happened since last year, and it was a chance for Dan Brooks to showcase his PITCHf/x toolkit, which you should know from his Brooks Baseball web site. Beyond his summary of the site's functionality, which is continually being augmented, Brooks discussed the challenges of small sample sizes and many other statistical topics. When Ross Paul, of MLBAM, briefly discussed the systems used to classify pitches for Gameday, it turned into a meeting of two neural net nerds, as Brooks and Paul discussed details and concepts that went over my head, bounced off the wall and continued, once again, over my head. Despite my own ignorance, I greatly appreciated the value of such an exchange, as it enhanced understanding of the MLBAM system for those who could understand it to begin with.
Matt Lentzner, whose work you have seen here, presented his continuing conceptual work on arm slots. In 2008, he gave us the Lentzner Axis, which became a critical component of pitch classification methodology. In 2009, Lentzner turned the line into a peanut. Lentzner's Peanut describes a consistent pattern of pitch movement at various points up and down the Lentzner Axis. The concept was demonstrated on a variety of pitchers, with a variety of arm slots—taking it from concept to tool. Paul, commenting in real time on a Beyond the Box Score thread, noted the value this could have in his pitch classification system.
Without any time line whatsoever, Paul is interested in pursuing the idea. This author will gladly provide training data, and has extended an offer to do so. Meanwhile, Brooks has already created a beta Lentzner Axis Finder, which can now be seen at Brooks Baseball. We appear to be on our way, on a couple of fronts, to improving automated pitch classification. It takes a village.
Villages need teachers. Good ones. Like Paul Robinson, who is a far better high school physics teacher than he is a ball dude. Robinson, widely known for his great work at San Mateo High School and on television, shared his thoughts on using PITCHf/x to teach physics. Paul Kagan, of Cal State, co-authored the presentation, but Robinson had the floor. And did he ever.
Sharing video clips of his "work" as a ball dude down the right field line at AT&T Park, Robinson showed his passion for the game, his ability to fall down in front of tens of thousands of people, and the reflexes of a cat. A cat that is supposed to let live balls go by, but a cat that decided to chase down a fly ball behind him instead.
But that's not the point. Robinson showed how valuable something real, like baseball, and so precise, like PITCHf/x data, can be in taking a topic many students dislike and making it relevant and fun. Next thing you know, you're learning. I wonder if Robinson would notice if I audited his class next semester.
To White's good-natured chagrin, it was pointed out by many (cough) that the 15,000 hit balls currently available in the HITf/x database are only enough to whet appetites, create beautiful graphs, and build frameworks for fielding, batting and pitching statistics. Baselines for quality analysis, however, will require more data. We did a fair job, as a group, of not groveling too pitifully for more. The business model for HITf/x is still in the works, so we'll enjoy what we've got already. But, in terms of fleshing out the business model, I would advocate for a broader release of data.
By providing the analyst community access to more data, improvements can be made to the early models and tools discussed during the summit. It is my belief that this would only serve to improve the application, and collection, of batted ball data. It would also enhance the value proposition in terms of broadcast and online entertainment, as well as player analysis, scouting and development in the front offices of major league clubs. More data will mean more meaningful research, which will further demonstrate the utility to the front office. Perhaps we have already reached that threshold, and I must admit, I have ulterior motives—I want more toys in the toy box. And I'm sure those who share my feelings on the value of releasing the data will frequently share the same bias.
The presentations on HITf/x kicked off with Greg Moore, who is a key part of the effort to develop, and monetize, HITf/x. Greg demonstrated the techniques used to calculate the hit ball data (like speed and angles). This is something that Peter Jensen discussed in 2008, and it was exciting to see the results of yet another collaborative, community-driven effort. Jensen, whose work you have also seen here at The Hardball Times, discussed his most recent article, Skill Dependent Metrics, in greater detail.
Not satisfied with analyzing the data, Jensen took the effort to play back, via MLB.TV, play after play after play to validate and understand the HITf/x data in relation to the stringer data in Gameday. In each park, a stringer records the outcome of a play, including the location where the ball was fielded. This is shown in Gameday, and available in the downloadable files. I'll let you download Jensen's talk, and hope he publishes more here.
Before Jensen, I had my turn at the podium. I stuck to run values and the value of HITf/x in that endeavor. Look for more from me in this space on that topic. I also had a chance to speak, in an impromptu discussion, to the group about park issues with the PITCHf/x system. In short, the discussion quickly turned to getting the park issues ironed out and, if possible, a publication of all the known changes to the systems. This avenue ran two ways, as the issues with Yankee Stadium, seen notably as a shift in release points in 2008, may be addressed based on analysis I will be sharing with Sportvision. Heck, I'll give them everything I've got on the topic as soon as I can. Let them rack up some frequent flyer miles so I don't have to write some really hard software to adjust the data myself.
The HITf/x portion of the program concluded in style, with two fascinating presentations. Dave Allen presented a tutorial on creating those amazing heat/contour maps he's become known for. Baseball Analysts has become informative eye candy with Allen's addition. It was a very technical presentation, and I encourage you to check it out if you want to learn how to use R to make graphics of the quality Allen regularly produces.
Alan Nathan followed Dave Allen with his analysis of landing spot estimates based on HITf/x data. Combining HITf/x and Hit Tracker, Nathan began working out the challenges and opportunities he discovered while looking into the supposedly homer-friendly new Yankee Stadium. Nathan described a technique for calculating estimates of flight characteristics, such as descent angle. Again, highly technical; see his slides for yourself. Short story: Yankee Stadium doesn't seem to have unusual conditions, beyond stadium dimensions, that create home runs. Perhaps the opposite. Coors Field, amazingly, was found to give the ball the most extra carry.
As I mentioned earlier, HITf/x was promoted as the star of the show. But it wasn't the headliner. This year, White wasn't going to let us push him around. He decided to blow our doors off.
But can it dance?
At the end of the 2008 summit, White remarked on how hungry the attendees were for HITf/x and more more more. He said he felt like the man who showed off his piano-playing cat, and we asked why the heck it couldn't sing. Well, if PITCHf/x was the piano, and HITf/x the singing, White showed us a feline moonwalking this time.
The final portion of the conference (excluding the Giants game and hands-on demo) started with Matt Thomas, who, despite being a Cardinals fan, is a really great guy. Thomas' expertise is photogrammetry, the use of two-dimensional photographs (what other kind are there?) to measure distances. Working as a stringer at Busch III, Thomas uses a combination of a camera, a PC and some software to capture player positions on the field. Using knowledge gleaned at last year's gathering, Thomas improved his model and has increased the overall accuracy of his measurements.
Thomas' talk was technical, so I won't hack it up here. He did share one nugget of data with me, when the group was discussing batted ball classifications (a folly we hope to discard in terms of analysis, but not presentation to the casual fan). In Thomas' data, the average "fly out" was caught 302 feet from home plate (usually measured by the location of a fielder's feet). That's from a sample of 308 fly outs. I lied, two nuggets. The average "line drive" single landed 240 feet from home. Thomas has 200 of those in his data set. I'll use this public forum to beg Thomas to share the two games he has that HITf/x also has so I can do some combined analysis. No pressure.
We were now moving into the realm of something new, FIELDINGf/x. Rick Swanson set the stage with his presentation, Reaction Over Range. The topic created a lively debate, and included an amazing historical note, so it was quite memorable. As it turns out, the concept of how far and how fast a fielder can get to make a play is far from new. Swanson took us through his own measurements of distance covered (estimated) over the time it took (via stopwatch). By dividing the distance by the time, Swanson finds a number that indicates an exceptional play. His proposal to track just the exceptional plays sparked a debate, since Swanson is, in effect, discarding data. That's one of the great things of events like the summit: Smart people debating complex ideas almost always leads to advancements.
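The distance-over-time idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not Swanson's actual method: the `FieldingPlay` record, the function names and the 20 feet-per-second cutoff are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class FieldingPlay:
    fielder: str
    distance_ft: float  # estimated distance the fielder covered, in feet
    time_s: float       # stopwatch time for the play, in seconds


def average_speed(play: FieldingPlay) -> float:
    """Distance divided by time, in feet per second."""
    if play.time_s <= 0:
        raise ValueError("time_s must be positive")
    return play.distance_ft / play.time_s


def exceptional_plays(plays, threshold_ft_per_s=20.0):
    """Keep only plays at or above the cutoff.

    The 20 ft/s default is a hypothetical value chosen for illustration.
    """
    return [p for p in plays if average_speed(p) >= threshold_ft_per_s]
```

Note that the filter returns only the plays above the cutoff, which is exactly the "discarding data" concern raised in the debate: everything below the threshold is dropped from view.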
What was truly memorable for me was the 1910 article by Hugh Fullerton that inspired Swanson. To see baseball dweebery of this quality 100 years ago is wild. Be sure to check out the links in the References and Resources, which will get you to a scanned, online version of the Fullerton piece. I vaguely remembered seeing it when Tom Tango linked it, but I had forgotten about it, and seeing it in the context of what I knew I was about to see was exciting.
We're in full stride at this point. The penultimate presentation: Greg Rybarczyk: Baseball F/X: Architecture for the Ultimate Virtual Gamecast.
In a room full of Ph.Ds, and future Ph.Ds, and many assortments of smart folk, Rybarczyk stood out. Again. Building from his 2008 presentation, Rybarczyk's vision for the ultimate, immersive game cast has blossomed. Using his expertise as a navigator, he showed us how to classify the direction a fielder is moving using a relative bearing. Just go check out his presentation; it covers everything from the way to align your measurements for describing fielding to an incredible example of what you can do visually once you've mapped the entire flight of the ball. An animation of a Mickey Mantle home run was shown, and would have been more impressive if White hadn't shown us something even more immersive.
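For readers unfamiliar with the navigation term, here is a minimal sketch of a relative bearing applied to a fielder. It is not taken from Rybarczyk's slides. The assumptions are mine: a flat field with home plate at the origin, +y pointing toward center field, +x toward the first-base side, and a fielder treated as "facing" home plate at the start of the play. The function names are hypothetical.

```python
import math


def compass_bearing(from_xy, to_xy):
    """Angle in degrees, measured clockwise from the +y axis, in [0, 360)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    if dx == 0 and dy == 0:
        raise ValueError("points are identical; bearing is undefined")
    return math.degrees(math.atan2(dx, dy)) % 360.0


def relative_bearing(start_xy, end_xy, home_xy=(0.0, 0.0)):
    """Direction of the fielder's movement relative to facing home plate.

    0 means straight in toward the plate, 180 means straight back,
    90 is toward the fielder's right, 270 toward the fielder's left.
    """
    facing = compass_bearing(start_xy, home_xy)
    moving = compass_bearing(start_xy, end_xy)
    return (moving - facing) % 360.0
```

Under these assumptions, a center fielder starting at (0, 300) who retreats to (0, 350) has a relative bearing of 180 degrees, while one who charges in to (0, 250) has a relative bearing of 0.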
Read Rybarczyk's presentation. It basically lays out the whole kit and kaboodle, and left White with little to do in his talk, other than show off the next great thing: A full digital record of the movement of each player, umpire and ball on the field. And seagulls.
You probably saw it in the New York Times, but we saw it in motion. An overhead view of a virtual playing field, where the pitch is tracked, and classified, the hit is tracked, and measured. The baserunners and fielders are tracked. You can see the shortstop move toward the batted ball, watch it go by him, follow him as he goes out to be the cut-off man. At the same time, more numbered/colored dots are moving. You select one, and you can see how far the runner is to the next base and how fast he's running, in miles per hour. The outfielder's throw, yes, that's measured, too. It was one play, but I could have watched it a thousand times.
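How Sportvision computes those readouts was not described, but the underlying arithmetic is simple once positions are tracked. The sketch below assumes each tracked sample is a hypothetical `(time_seconds, x_feet, y_feet)` tuple in a flat field coordinate system; the sample format and function names are illustrative only.

```python
import math

FEET_PER_MILE = 5280.0
SECONDS_PER_HOUR = 3600.0


def speed_mph(sample_a, sample_b):
    """Average speed between two (time_s, x_ft, y_ft) samples, in mph."""
    t0, x0, y0 = sample_a
    t1, x1, y1 = sample_b
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    feet_per_second = math.hypot(x1 - x0, y1 - y0) / dt
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE


def distance_to_base(position_xy, base_xy):
    """Straight-line distance in feet from a runner to a base."""
    return math.hypot(base_xy[0] - position_xy[0],
                      base_xy[1] - position_xy[1])
```

For example, a runner who covers 22 feet in 0.75 seconds is averaging 20 miles per hour over that stretch. A real system would presumably smooth over many samples rather than difference two of them.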
The play that was shown in the demo was from Oakland. The cameras are installed in AT&T Park. White tried to point them out to me, but the lights were on so we couldn't make them out. He expressed confidence in Sportvision's ability to capitalize on the potential of a full field view of the action such that it can be used to enhance our experience, as fans and analysts. It's a research project, and there are no time lines, but the proof-of-concept was spectacular. I hope the videos will be available online, if they aren't already.
So much more to describe, but this has gone long enough. Everything from the feasibility of tracking catcher position to the future installations in non-major league parks was discussed just in the conversations I was privy to. It was a well-attended event, and I hope others will share their experience and perspective as well.
References and Resources
You can download the presentations described above at the summit website (http://baseball.sportvision.com/summit/)
Baseball Analysts, home to much of Allen's work
Swanson's article on Reaction over Range, with a link to Fullerton's article.
I'd like to thank Mike Fast and Ike Hall for their over-the-web contributions to the summit, occasionally prompted by my requests over the webcast
Special thanks to Alan Nathan, Marv White, Cory Schwartz and Ross Paul for their continued interest and support of my work
Harry Pavlidis admits he has a baseball problem. He is the founder of Pitch Info LLC. His pitch classifications power the player cards at Brooksbaseball.net. Feedback, questions and comments are appreciated - Email email@example.com and Twitter @harrypav