A view from the Sportvision Baseball Summit

For the second year in a row, Marv White and the wonderful folks at Sportvision put on a showcase of their work, and allowed the analyst community to do the same.

Sportvision folks are those who gave us the first down marker and the K-Zone. They are partners with MLBAM in the PITCHf/x venture, something we all know through MLBAM’s Gameday. White, the chief technology officer of Sportvision, augments his incredible team with the power of the community. With the pitch data available freely for research purposes, MLBAM has unleashed a monster, a monster that White and crew have used to advance their technology and business plans. As analysts, we get some of the most fascinating data around and sincere gratitude, and a good time.

The Summit itself features a trip to a Giants game, along with a firsthand (and hands-on) look at the PITCHf/x system. It also allows a lot of baseball nerds to get together and put their minds, and beer mugs, together. It’s a very well formed, symbiotic relationship. The beer and the analysts, that is. The same can be said about Sportvision/MLBAM and the analysts, too.

A true win-win. We get our toys, they get candid and enthusiastic feedback on the toys. Which ain’t toys, not at all.

PITCHf/x in 2009

The main purpose of this conference, if you can pick just one, would be to talk about HITf/x. With April’s data available, there were plenty of things to talk about. But it was something even newer that stole the show. We’ll get to parts two and three of the conference agenda; we need to start at the beginning.

The one-day summit started with PITCHf/x. Quite a lot has happened since last year, and it was a chance for Dan Brooks to showcase his PITCHf/x toolkit, which you should know from his Brooks Baseball web site. Beyond his summary of the site’s functionality, which is continually being augmented, Brooks discussed the challenges of small sample sizes and many other statistical topics. When Ross Paul, of MLBAM, briefly discussed the systems used to classify pitches for Gameday, it turned into a meeting of two neural net nerds, as Brooks and Paul discussed details and concepts that went over my head, bounced off the wall and continued, once again, over my head. Despite my own ignorance, I greatly appreciated the value of such an exchange, as it enhanced understanding of the MLBAM system for those who could understand it to begin with.

Matt Lentzner, whose work you have seen here, presented his continuing conceptual work on arm slots. In 2008, he gave us the Lentzner Axis, which became a critical component of pitch classification methodology. In 2009, Lentzner turned the line into a peanut. Lentzner’s Peanut describes a consistent pattern of pitch movement at various points up and down the Lentzner Axis. The concept was demonstrated on a variety of pitchers, with a variety of arm slots—taking it from concept to tool. Paul, commenting in real time on a Beyond the Box Score thread, noted the value this could have in his pitch classification system.

Without any time line whatsoever, Paul is interested in pursuing the idea. This author will gladly provide training data, and has extended an offer to do so. Meanwhile, Brooks has already created a beta Lentzner Axis Finder, which can now be seen at Brooks Baseball. We appear to be on our way, on a couple of fronts, to improving automated pitch classification. It takes a village.

Villages need teachers. Good ones. Like Paul Robinson, who is a far better high school physics teacher than he is a ball dude. Robinson, widely known for his great work at San Mateo High School and on television, shared his thoughts on using PITCHf/x to teach physics. Paul Kagan, of Cal State, co-authored the presentation, but Robinson had the floor. And did he ever.

Sharing video clips of his “work” as a ball dude down the right field line at AT&T Park, Robinson showed his passion for the game, his ability to fall down in front of tens of thousands of people, and the reflexes of a cat. A cat that is supposed to let live balls go by, but a cat that decided to chase down a fly ball behind him instead.

But that’s not the point. Robinson showed how valuable something real, like baseball, and so precise, like PITCHf/x data, can be in taking a topic many students dislike and making it relevant and fun. Next thing you know, you’re learning. I wonder if Robinson would notice if I audited his class next semester.

HITf/x beta

To White’s good natured chagrin, it was pointed out by many (cough) that the 15,000 hit balls currently available in the HITf/x database are only enough to whet appetites, create beautiful graphs, and build frameworks for fielding, batting and pitching statistics. Baselines for quality analysis, however, will require more data. We did a fair job, as a group, of not groveling too pitifully for more. The business model for HITf/x is still in the works, so we’ll enjoy what we’ve got already. But, in terms of fleshing out the business model, I would advocate for a broader release of data.

By providing the analyst community access to more data, improvements can be made to the early models and tools discussed during the summit. It is my belief that this would only serve to improve the application, and collection, of batted ball data. It would also enhance the value proposition in terms of broadcast and online entertainment, as well as player analysis, scouting and development in the front offices of major league clubs. More data will mean more meaningful research, which will further demonstrate the utility to the front office. Perhaps we have already reached that threshold, and I must admit, I have ulterior motives—I want more toys in the toy box. And I’m sure those who share my feelings on the value of releasing the data will frequently share the same bias.

The presentations on HITf/x kicked off with Greg Moore, who is a key part of the effort to develop, and monetize, HITf/x. Greg demonstrated the techniques used to calculate the hit ball data (like speed and angles). This is something that Peter Jensen discussed in 2008, and it was exciting to see the results of yet another collaborative, community-driven effort. Jensen, whose work you have also seen here at The Hardball Times, discussed his most recent article, Skill Dependent Metrics, in greater detail.

Not satisfied with analyzing the data, Jensen took the effort to play back, via MLB.TV, play after play after play to validate and understand the HITf/x data in relation to the stringer data in Gameday. In each park, a stringer records the outcome of a play, including the location where the ball was fielded. This is shown in Gameday, and available in the downloadable files. I’ll let you download Jensen’s talk, and hope he publishes more here.

Before Jensen, I had my turn at the podium. I stuck to run values and the value of HITf/x in that endeavor. Look for more from me in this space on that topic. I also had a chance to speak, in an impromptu discussion, to the group about park issues with the PITCHf/x system. In short, the discussion quickly turned to getting the park issues ironed out and, if possible, a publication of all the known changes to the systems. This avenue ran two ways, as the issues with Yankee Stadium, seen notably as a shift in release points in 2008, may be addressed based on analysis I will be sharing with Sportvision. Heck, I’ll give them everything I’ve got on the topic as soon as I can. Let them rack up some frequent flyer miles so I don’t have to write some really hard software to adjust the data myself.

The HITf/x portion of the program concluded in style, with two fascinating presentations. Dave Allen presented a tutorial on creating those amazing heat/contour maps he’s become known for. Baseball Analysts has become informative eye candy with Allen’s addition. It was a very technical presentation, and I encourage you to check it out if you want to learn how to use R to make graphics of the quality Allen regularly produces.

Alan Nathan followed Dave Allen with his analysis of landing spot estimates based on HITf/x data. Combining HITf/x and Hit Tracker, Nathan began working out the challenges and opportunities he discovered while looking into the supposedly homer-friendly new Yankee Stadium. Nathan described a technique for calculating estimates of flight characteristics, such as descent angle. Again, highly technical; see his slides for yourself. Short story: Yankee Stadium doesn’t seem to have unusual conditions, beyond stadium dimensions, that create home runs. Perhaps the opposite. Coors Field, amazingly, was found to give the ball the most extra carry.

As I mentioned earlier, HITf/x was promoted as the star of the show. But it wasn’t the headliner. This year, White wasn’t going to let us push him around. He decided to blow our doors off.

But can it dance?

At the end of the 2008 summit, White remarked on how hungry the attendees were for HITf/x and more more more. He said he felt like the man who showed off his piano-playing cat, and we asked why the heck it couldn’t sing. Well, if PITCHf/x was the piano, and HITf/x the singing, White showed us a feline moonwalking this time.

The final portion of the conference (excluding the Giants game and hands-on demo) started with Matt Thomas, who, despite being a Cardinals fan, is a really great guy. Thomas’ expertise is photogrammetry, the use of two-dimensional photographs (what other kind are there?) to measure distances. Working as a stringer at Busch III, Thomas uses a combination of a camera, a PC and some software to capture player positions on the field. Using knowledge gleaned at last year’s gathering, Thomas improved his model and has increased the overall accuracy of his measurements.

Thomas’ talk was technical, so I won’t hack it up here. He did share one nugget of data with me, when the group was discussing batted ball classifications (a folly we hope to discard in terms of analysis, but not presentation to the casual fan). In Thomas’ data, the average “fly out” was caught 302 feet from home plate (usually measured by the location of a fielder’s feet). That’s from a sample of 308 fly outs. I lied, two nuggets. The average “line drive” single landed 240 feet from home. Thomas has 200 of those in his data set. I’ll use this public forum to beg Thomas to share the two games he has that HITf/x also has so I can do some combined analysis. No pressure.

We were now moving into the realm of something new, FIELDINGf/x. Rick Swanson set the stage with his presentation, Reaction Over Range. The topic created a lively debate, and included an amazing historical note, so it was quite memorable. As it turns out, the concept of how far and how fast a fielder can get to make a play is far from new. Swanson took us through his own measurements of distance covered (estimated) over the time it took (via stopwatch). By dividing the time by the distance, Swanson finds a number, in seconds per foot, that indicates an exceptional play when it is low enough. His proposal to track just the exceptional plays sparked a debate, since Swanson is, in effect, discarding data. That’s one of the great things about events like the summit: Smart people debating complex ideas almost always leads to advancements.
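
For readers who want to see the arithmetic, here is a minimal sketch. It assumes the seconds-per-foot convention Swanson spells out in the comments below (60 feet in 3 seconds works out to .050, and lower is better). The function names and the .060 cutoff for an "exceptional" play are hypothetical placeholders for illustration, not Swanson's published values.

```python
def reaction_over_range(seconds: float, feet: float) -> float:
    """Elapsed time divided by distance covered (seconds per foot).

    Lower numbers mean more ground covered in the same time.
    """
    if feet <= 0:
        raise ValueError("distance covered must be positive")
    return seconds / feet


def is_exceptional(rr: float, cutoff: float = 0.060) -> bool:
    """Flag a play as exceptional. The cutoff is an assumed value."""
    return rr <= cutoff


# Two plays timed at 3 seconds each, with estimated distances in feet.
plays = [("Player A", 3.0, 60.0), ("Player B", 3.0, 40.0)]

for name, seconds, feet in plays:
    rr = reaction_over_range(seconds, feet)
    print(f"{name}: R/R = {rr:.3f}, exceptional = {is_exceptional(rr)}")
```

Note that a single ratio like this treats the fielder as moving at a steady speed, which is the objection raised in the discussion below.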

What was truly memorable for me was the 1910 article by Hugh Fullerton that inspired Swanson. To see baseball dweebery of this quality 100 years ago is wild. Be sure to check out the links in the References and Resources, which will get you to a scanned, online version of the Fullerton piece. I vaguely remembered seeing it when Tom Tango linked it, but I had forgotten about it, and seeing it in the context of what I knew I was about to see was exciting.

The dance

We’re in full stride at this point. The penultimate presentation was Greg Rybarczyk’s Baseball F/X: Architecture for the Ultimate Virtual Gamecast.

In a room full of Ph.Ds, and future Ph.Ds, and many assortments of smart folk, Rybarczyk stood out. Again. Building from his 2008 presentation, Rybarczyk’s vision for the ultimate, immersive game cast has blossomed. Using his expertise as a navigator, he showed us how to classify the direction a fielder is moving using a relative bearing. Just go check out his presentation; it covers everything from the way to align your measurements for describing fielding to an incredible example of what you can do visually once you’ve mapped the entire flight of the ball. An animation of a Mickey Mantle home run was shown, and would have been more impressive if White hadn’t shown us something even more immersive.
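
The relative-bearing idea can be sketched in a few lines. This is an illustration under stated assumptions, not Rybarczyk's actual method: it assumes field coordinates in feet with home plate at the origin and the +y axis running out toward center field, assumes the fielder starts out facing home plate, and uses a hypothetical four-way labeling (in, right, back, left) that is not taken from his slides.

```python
import math

Point = tuple[float, float]


def bearing(dx: float, dy: float) -> float:
    """Compass-style bearing in degrees, measured clockwise from the +y axis."""
    return math.degrees(math.atan2(dx, dy)) % 360.0


def relative_bearing(start: Point, end: Point, facing: Point = (0.0, 0.0)) -> float:
    """Direction of travel relative to where the fielder is facing.

    0 means straight toward the point he faces (home plate by default),
    90 is to his right, 180 is straight back, 270 is to his left.
    """
    heading = bearing(facing[0] - start[0], facing[1] - start[1])
    course = bearing(end[0] - start[0], end[1] - start[1])
    return (course - heading) % 360.0


def direction_label(rel: float) -> str:
    """Bucket a relative bearing into four 90-degree sectors (assumed scheme)."""
    labels = ["in", "right", "back", "left"]
    return labels[int(((rel + 45.0) % 360.0) // 90.0)]


# A center fielder starting 400 feet from home plate, straight up the middle.
start = (0.0, 400.0)
for end in [(0.0, 380.0), (0.0, 420.0), (20.0, 400.0)]:
    rel = relative_bearing(start, end)
    print(end, round(rel, 1), direction_label(rel))
```

Measuring the angle from the fielder's own point of view, rather than from fixed field axes, means the same label describes the same kind of play no matter where on the field the fielder was standing.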

Read Rybarczyk’s presentation. It basically lays out the whole kit and kaboodle, and left White with little to do in his talk, other than show off the next great thing: A full digital record of the movement of each player, umpire and ball on the field. And seagulls.

You probably saw it in the New York Times, but we saw it in motion. An overhead view of a virtual playing field, where the pitch is tracked, and classified, the hit is tracked, and measured. The baserunners and fielders are tracked. You can see the shortstop move toward the batted ball, watch it go by him, follow him as he goes out to be the cut-off man. At the same time, more numbered/colored dots are moving. You select one, and you can see how far the runner is from the next base and how fast he’s running, in miles per hour. The outfielder’s throw, yes, that’s measured, too. It was one play, but I could have watched it a thousand times.

The play that was shown in the demo was from Oakland. The cameras are installed in AT&T Park. White tried to point them out to me, but the lights were on so we couldn’t make them out. He expressed confidence in Sportvision’s ability to capitalize on the potential of a full field view of the action such that it can be used to enhance our experience, as fans and analysts. It’s a research project, and there are no time lines, but the proof-of-concept was spectacular. I hope the videos will be available online, if they aren’t already.

Encore

So much more to describe, but this has gone long enough. Everything from the feasibility of tracking catcher position to the future installations in non-major league parks was discussed just in the conversations I was privy to. It was a well-attended event, and I hope others will share their experience and perspective as well.

References & Resources
You can download the presentations described above at the summit website (http://baseball.sportvision.com/summit/)
Brooks Baseball
Gameday
Baseball Analysts, home to much of Allen’s work
Hit Tracker
Swanson’s article on Reaction over Range, with a link to Fullerton’s article.

I’d like to thank Mike Fast and Ike Hall for their over-the-web contributions to the summit, occasionally prompted by my requests over the webcast.

Special thanks to Alan Nathan, Marv White, Cory Schwartz and Ross Paul for their continued interest and support of my work.


15 Comments
Mike Fast
14 years ago

Thanks, Harry!  I’m grateful for your great report on the summit.

I’m eager to hear more about the park/camera corrections for PITCHf/x.  I was chatting a bit with Dan Brooks about that last night, and I look forward to seeing details from someone.  Did Rand/Sportvision make any sort of commitment about releasing any of the behind the scenes correction factors to us?

Also, I’m totally on board with you about wanting/needing more HITf/x data in the public purview.  Unfortunately I missed this portion of the discussion on the webcast.  I know Sportvision has worked at producing HITf/x data out of the 2008 games.  It seems like that would be the perfect data to release to the public.  Even if they are trying to sell the 2009 data to the clubs, it seems like they would get less monetary return from the clubs, if any, from the 2008 data (which is probably why they are concentrating on keeping up-to-date with the 2009 data at a higher priority than producing the 2008 data).  However, I don’t think they’re going to be able to sell to the clubs nearly as effectively without us analysts providing the framework/context in which the data can be understood.  And one month of data is a pretty shaky context.  It’s a heckuva lot better context than a mere handful of games, but full 2008 season data plus April 2009 data I think would provide enough of a baseline that we could begin to see some really meaningful analysis emerge.  Did Greg or Marv talk about any plans to produce and release the 2008 data to the public?

Also, I’m curious, has anybody (Peter?) looked into finding errors in the HITf/x data?  I’m sure they’re in there, but I haven’t had time to go over the data with a fine-toothed comb yet.

Harry Pavlidis
14 years ago

Thanks guys, I’m glad I got it right.

No real commitment was made on the correction history information, but I’ll prod Marv on that soon, just to make sure we know what to expect, or not expect.

The release of more data is pending. The business plan is still being ironed out, so, it is possible the teams will get the data before we do. Or who knows, that’s up in the air. Obviously, I agree on the value of releasing more data.

Rick Swanson
14 years ago

Thanks Harry.

I’m glad you liked the Fullerton pictures. I feel that time over distance is the holy grail of defense. I know many think you need to use data on every play, and it might take some time before they change that view.

Think of it like they put errors in the boxscore. They only list those that are missed, not the plays made.

R/R plays that are measured might only happen once or twice a game.

I really felt that Greg’s presentation illustrated that more range distance in the same reaction time will create a lower Reaction over Range number.

Thanks again for including me in your article.

Hope to see you again next year.

Peter Jensen
14 years ago

Mike – I have looked for errors in the Hit f/x data.  I found about 30 or so entries that looked suspect and have passed the list along to Sportvision.  I have several others that I have promised Greg that I will send to him.  Marv asked for a show of hands of who thought it would be helpful to have a list of what changes were made to the data cameras or registration or anything that might have made a change in the data output we were getting, when they occurred, and what the correction factor should be, and an overwhelming majority raised their hands so I think that will happen.  I asked about 2007 and 2008 Hit f/x data.  Although Marv had told me in January that they intended to run that first, they decided not to and have still not run it.  The decision was to try and perfect the automation of the system to minimize the hand work before running the old data so they wouldn’t have to run it again if they made significant improvements in the automated processing.  So for right now they are trying to keep up with 2009 and tweak their automation process.  I told them that I thought that 92% to 94% was better than I thought they would get with an automated system, and the balls that were giving them problems, pop ups and balls chopped into the ground, were the least interesting to analysts anyway as both were well over 90% outs and should be near 100%.

Selling the data is a concern and is necessary for us to see a continued stream of data for analysis.  We may not ever see Hit f/x and Field f/x in real time like we see Pitch f/x because of the clubs’ need to get some period of exclusive access for the price they pay.  But several of the club representatives privately told me that releasing the data to the public before the following season would probably not be a deal breaker for them, and both Marv and MLBAM are looking out for our interests as much as they can as they realize and appreciate that we are the driving force in creating new uses for the data.

Mike Fast
14 years ago

Excellent.  Thanks, Peter.  Good to hear.

Tom M. Tango
14 years ago

Peter, what do you mean by “deal breaker”?  Unless I misunderstand the workings of MLBAM, the 30 team reps do not vote on how MLBAM handles the data.  I doubt the GMs would spend more than a few minutes to voice any kind of objection to the release of this data.  Did you get the feeling that the teams have a lot more influence over MLBAM’s operation than I am describing?

Brian Cartwright
14 years ago

In regards to Reaction over Range, dividing time by distance implies a steady speed of the player, disregarding the player’s initial acceleration from a standing start to full velocity. I’m picturing in my mind a series of bubbles around each player, with compass angles as Greg R illustrated, which show how far each fielder can range in given amounts of time. For example, if a ball is hit in a given way that it should land 90 feet to the right of the centerfielder 3.5 seconds after being struck by the bat, how far can that centerfielder travel to his right in 3.5 seconds? More or less than 90 feet?

Batted balls can be classified by their absolute expected landing location and hang time, and then that position relative (distance and direction) to the fielders.

Peter Jensen
14 years ago

Great article Harry.  You captured not only the inspiration of seeing the presentations, but just the pure fun of being there amidst so many people that share the same perversion.  I’m talking about baseball statistical analysis, not your other perversions Harry.  It was an exhausting trip, but it was worth every lost hour of sleep and every leg cramp from the coast to coast flying.  We even got to see live demonstrations of magnus force in the curve of the balls hit directly at us while we sat in our seats directly behind the right fielder.  Let’s hope that Marv and MLB continue to sponsor this event.  Those of you who haven’t been able to make it in 2008 or 9, save your vacation time for 2010.

Tom M. Tango
14 years ago

Brian, I have tried to make that point privately to Rick a few months ago.  The basic idea is fine.  The actual mathematics of it is wrong.  But, it is just one little equation away from being fine.

Brian Cartwright
14 years ago

Rick – Yes, I want to know how much ground a player can cover in a given amount of time, but I don’t think you can ignore the initial acceleration. The same player who can go 60 feet in 3 seconds (.050) will not be able to go 40 feet in 2 seconds.

Rick Swanson
14 years ago

Brian, I’m glad you keep asking questions. That is how we will find the answers.

What I am saying is, on every given play each player has X amount of distance to range in X amount of time. Some players will make the play and others will not.

There will be the furthest distance ranged in under 1 second. Furthest under two seconds, under 3, 4, 5, and 6 seconds.

Then you will have the number of .050-.060 plays that are made for each of these times. Maybe we will have plays from .040-.050. My distance measurements are always guesses. Matt Thomas sent me over 600 plays all with R/R numbers.

Rick Swanson
14 years ago

Brian,

My whole concept of Reaction time over Range Distance is to create a statistic that is simple to compute.

It does not matter how fast you accelerate, it only matters if you catch the ball in X amount of time.

Greg’s graphs illustrated it better than I said it.

If one player ranges 60 feet in 3 seconds, and another only 40 feet, player A will have an R/R number of .050, and player B will be .075. The lower the number, the better the play.

Mike
14 years ago

This is rather basic, but how do you open the xml files in powerpoint?

Mike Fast
14 years ago

Mike, if you’re talking about the .pptx files, they are the Office 2007 version of .ppt files.  If you have Office 2003 or an older version of Office, you can find a converter to open them here:
http://office.microsoft.com/en-us/products/HA101686761033.aspx

If you’re actually talking about .xml files, perhaps you could be more specific about which files you are referring to?

Mike
14 years ago

Thanks, Mike, looks like that did the trick.  For some reason IE was treating the .pptx files as archives that needed to be unzipped and read as .xml, instead of opening them directly with Powerpoint.  That’s still the case, but Firefox seems to have no trouble opening them.