Hands up who hasn’t heard of Hit Tracker? Right, no one … no introduction required then! Well, just in case you happen to be a THT virgin, I’ll proffer the one paragraph version.
Hit Tracker was, in my opinion, the best thing to come out of the Internet in 2006. Greg Rybarczyk, a former U.S. Naval officer, part-time physicist and all-round good guy, has spent many man hours developing software that tracks the speed and distance of every home run.
Greg has kindly agreed to stop by for some virtual coffee and cake to talk about the genesis of Hit Tracker, some cool things that he learned last year and what his plans for the future of the site are.
The interview is in two installments. This is part one, and part two will follow in a couple of weeks. Part one focuses more on the genesis, physics and accuracy of Hit Tracker. In part two we will talk about some of the cool things you can learn from the site. As you’ll see, Greg does most of the talking. (Which is a good thing, I might add!)
Here is what he had to say.
My Chat with Greg
John Beamer: Greg, many thanks for agreeing to spend some time talking about Hit Tracker. In the THT 2007 Annual you mention that the inspiration for Hit Tracker came from a Manny Ramirez home run in 2005. Is that really true?
Greg Rybarczyk: The Manny story is true. I have been a Red Sox fan all my life, and have always followed them any way I could as I moved around the country. In April, 2005, I read a game recap in the online version of the Boston Globe written by Sox beat reporter Chris Snow, who has since become a member of the front office of the Minnesota Wild NHL hockey team.
In the article, Snow described a long Ramirez homer that cleared the light tower in left field at Fenway on its way out of the park, and stated that the Red Sox no longer provide home run distances. “No one will ever know how far the ball traveled,” he wrote. That triggered an idea in my head: why not?
JB: Cool, so what happened after that? How long did it take, and what did you have to do to turn that dream into reality?
GR: I have some background in physics, and I love a challenge, so I decided to give it a try, and within a couple days I had a working prototype that incorporated all the forces that act on a flying baseball: gravity, wind resistance and Magnus Force. I came up with the idea of timing Ramirez’s homer to the point where it left the park, and by knowing exactly where that point was (height, distance, direction), I found I could project the rest of its flight path and say with confidence how far it would have gone: in that case, 449 feet.
Knowing that one piece of information was satisfying, but it also made me want to put it in context. I decided that it would be fun to look at the rest of the homers the Red Sox hit in Fenway that year, so I made a scale diagram of Fenway and analyzed all of those. Again, more desire for context, so next I made diagrams for a couple other parks (Yankee Stadium and Tropicana Field), and naturally started imagining what it would be like if I had diagrams for all 30 parks. If I had that, I could provide an accurate distance for every home run, with details to back them up, something that more often than not has been missing from past estimates. The concept of Hit Tracker originated then.
In September, 2005, I contacted Chris Snow at the Boston Globe and showed him Hit Tracker, and he immediately began consulting me for information on Red Sox homers, using Hit Tracker data in several articles over the last month of the season. Chris’ enthusiasm was a huge motivator for me to take the step of covering the whole of MLB, which turned out to be a monumental task.
During the off-season I worked probably two to three hours a night, four to five nights a week, refining Hit Tracker and especially researching and crafting the scale diagrams of all the parks. I didn’t finish the last park until the morning of opening day, April 2006! In parallel, I worked with a website designer to create a site to display Hit Tracker data, which went live in April, 2006 as well and has been a great success.
JB: Wow. That is quite a story. How did you go about making sure that the representations of the parks were correct and that landing points were accurately marked?
GR: It’s the upfront research that I’ve done that makes it accurate. I worry about it in November through March, so I don’t have to during the season.
I’ve spent a huge amount of time getting the diagrams to be accurate, that’s number one. I’ve taken satellite photos, which are truly a blessing of the modern world, and then did the trigonometry to correct them for the overhead angle not being exactly straight down (look at a Google Earth photo of Yankee Stadium’s right field pole, and it appears as a line, not a dot—that means the photo wasn’t taken from straight overhead).
I check all my diagrams against the known distances between bases (90′) and from home to 2nd (127.3′) whenever possible, and doing that has given me a lot of confidence that the satellite photos, and thus my diagrams, are accurate. Domed stadiums are a bit harder, but the same principles apply: use hi-res photos, apply trigonometry, and check everything against known distances. I guess I’m depending on the bases being the right distance apart, though!
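The cross-check Greg describes can be sketched in a few lines. This is only an illustration, not Hit Tracker code: the pixel coordinates, the helper names and the 1% tolerance are all assumptions. The one fixed fact is the geometry: the bases form a 90-foot square, so home to second is the diagonal, 90 × √2, or about 127.3 feet.

```python
import math

BASE_PATH_FT = 90.0                              # distance between adjacent bases
HOME_TO_SECOND_FT = BASE_PATH_FT * math.sqrt(2)  # diagonal of the square, about 127.3 ft


def feet_per_pixel(p1, p2, known_feet):
    """Scale implied by two diagram points that are a known distance apart."""
    return known_feet / math.dist(p1, p2)


def scales_agree(scale_a, scale_b, tolerance=0.01):
    """True when two independently derived scales differ by less than
    `tolerance`, expressed as a fraction of the first scale."""
    return abs(scale_a - scale_b) / scale_a < tolerance


# Hypothetical pixel coordinates read off a corrected overhead photo.
home = (0.0, 0.0)
first = (180.0, 0.0)
second = (180.0, 180.0)

scale_from_base_path = feet_per_pixel(home, first, BASE_PATH_FT)
scale_from_diagonal = feet_per_pixel(home, second, HOME_TO_SECOND_FT)

print(f"home to second should be {HOME_TO_SECOND_FT:.1f} ft")
print("diagram consistent:", scales_agree(scale_from_base_path, scale_from_diagonal))
```

If the two scales disagree, the photo is still distorted (or the points were marked badly) and the diagram needs more correction before any landing spots are measured on it.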
Having a correct diagram is only part of getting an accurate landing location, though. Making an accurate spot is the other part of that equation. That is just a matter of reviewing video very carefully, several times if needed, and being meticulous about transferring the spot from the video screen into the Hit Tracker program.
Here I think there probably is some error induced; for balls that land in easily marked spots (e.g. they hit a sign, the fair pole, a tunnel leading back to the concourse, etc.) this error is probably near zero, but when a ball lands in a sea of people in the middle of a wide bleacher section (e.g. Wrigley Field), the spot may be off by a foot or two, sometimes. There’s also more uncertainty when the cameraman zooms in very close for the landing, which seems to happen in U.S. Cellular Field frequently—I have to replay the video and mark people’s locations by what they’re wearing, before the zoom in, and then mark the ball spot from them.
Spotting balls that completely leave the park is the toughest job, and requires the most replays. This is most frequent in Fenway Park, Wrigley Field and Minute Maid Park (roof open), and occasionally comes up in PNC, Shea Stadium and a couple others. I try to make sure I record these games on my HDTV, so I can pick the ball up more precisely, and I look through the whole telecast for alternate angles (the Red Sox telecasts usually have a side view of the homers over the Monster).
Also, check my diagrams for Fenway and Minute Maid, and you’ll see that there is a special sub-diagram in left field for each, to help mark the long homers in that direction. The balls that fly near objects in these sub-diagrams (e.g. Vladimir Guerrero’s July 30 homer at Fenway that went through the light tower) can be very accurately marked, while some other balls disappear from the video and never are seen again, at which point I call them “missing observations” and leave their data blank (which happened 11 times in Minute Maid Park and 36 times total).
JB: Okay, so you’ve clearly put a lot of thought into the park layout; what about the analysis engine? And how much uncertainty is there in the final measurement … I imagine it’s quite easy if it lands near a known distance marker, but those over the Fenway light tower must be much harder to calculate accurately, especially when you are converting to a standard distance.
GR: Basically, it uses equations for aerodynamic drag and general projectile motion that are well accepted and have been around for centuries. Where specific values are needed (e.g. drag coefficients for a given ball velocity), I have used values provided by Dr. Robert Adair, whose book “The Physics of Baseball” is a well known classic. Adair himself has warned his readers that those drag coefficients have some uncertainty, but this uncertainty doesn’t have a significant impact on the distance numbers for most fly balls because of the way Hit Tracker works. I’ll try to explain that succinctly:
Hit Tracker’s algorithm is to measure the time of flight to a point near the end of a ball’s trajectory (say the grandstand, 15 feet above field level), and then generate a “test” trajectory to try to pass through the landing point after the right amount of time. The test trajectory (with the proper wind, temperature and altitude applied) is adjusted until it does pass through the landing point at the right time, and when it does, that is the trajectory the ball must have taken.
The closer the observed landing point is to field level, the less influence any of the uncertain drag numbers or weather numbers have on the outcome. In the extreme case, if the ball is observed to land at field level (say in an outfield bullpen), there is no uncertainty at all in the calculated distance. When the observation point is high in the air (light tower homers), there is more uncertainty, perhaps a few feet of distance, but unfortunately, when a ball strikes a light tower, you can’t check this estimate, since the ball never got to complete its trajectory. You’ve got to depend on the methodology, and the diligence of one’s observations.
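For readers who want to see the idea in action, here is a minimal sketch of the "adjust a test trajectory until it passes through the observed point at the observed time" approach. It is not Hit Tracker's code and it makes loud simplifications: the flight is two-dimensional, spin (Magnus force) is ignored, the drag constant `K_DRAG` is a single assumed value rather than Adair's speed-dependent coefficients, and the "observation" is synthetic. All function names are hypothetical.

```python
import math

G = 9.81               # gravity, m/s^2
K_DRAG = 0.0061        # assumed lumped drag constant 0.5*rho*Cd*A/m, in 1/m
FT_PER_M = 3.28084
MPH_PER_MPS = 2.23694


def step(state, dt, wind, k):
    """Advance (x, y, vx, vy) by one Euler step with quadratic drag."""
    x, y, vx, vy = state
    rel_vx = vx - wind                      # velocity relative to the air
    speed = math.hypot(rel_vx, vy)
    return (x + vx * dt, y + vy * dt,
            vx - k * speed * rel_vx * dt,
            vy - (G + k * speed * vy) * dt)


def position_at(vx, vy, t_obs, wind=0.0, k=K_DRAG, steps=1000):
    """Position (x, y) in metres, relative to the contact point, at t_obs."""
    state = (0.0, 0.0, vx, vy)
    for _ in range(steps):
        state = step(state, t_obs / steps, wind, k)
    return state[0], state[1]


def fit_launch(x_obs, y_obs, t_obs, wind=0.0, k=K_DRAG):
    """Adjust the launch velocity until the test trajectory reaches the
    observed point at the observed time (Newton's method, numeric Jacobian)."""
    vx = x_obs / t_obs                                  # drag-free first guess
    vy = (y_obs + 0.5 * G * t_obs ** 2) / t_obs
    h = 1e-4
    for _ in range(50):
        x, y = position_at(vx, vy, t_obs, wind, k)
        ex, ey = x - x_obs, y - y_obs
        if math.hypot(ex, ey) < 1e-6:
            break
        x1, y1 = position_at(vx + h, vy, t_obs, wind, k)
        x2, y2 = position_at(vx, vy + h, t_obs, wind, k)
        a, b = (x1 - x) / h, (x2 - x) / h               # dx/dvx, dx/dvy
        c, d = (y1 - y) / h, (y2 - y) / h               # dy/dvx, dy/dvy
        det = a * d - b * c
        vx -= (d * ex - b * ey) / det
        vy -= (a * ey - c * ex) / det
    return vx, vy


def distance_to_ground(vx, vy, dt, wind=0.0, k=K_DRAG):
    """Horizontal distance (m) at which the flight returns to contact height."""
    state = (0.0, 0.0, vx, vy)
    while True:
        nxt = step(state, dt, wind, k)
        if nxt[1] < 0.0:
            # Linear interpolation to the height-zero crossing.
            frac = state[1] / (state[1] - nxt[1])
            return state[0] + frac * (nxt[0] - state[0])
        state = nxt


# Synthetic "observation": where a ball launched at (35, 30) m/s is after 4.5 s.
T_OBS = 4.5
x_obs, y_obs = position_at(35.0, 30.0, T_OBS)

base = fit_launch(x_obs, y_obs, T_OBS)                      # nominal drag
high = fit_launch(x_obs, y_obs, T_OBS, k=1.1 * K_DRAG)      # drag 10% higher

for label, (vx, vy), k in (("nominal", base, K_DRAG),
                           ("drag +10%", high, 1.1 * K_DRAG)):
    speed_mph = math.hypot(vx, vy) * MPH_PER_MPS
    dist_ft = distance_to_ground(vx, vy, T_OBS / 1000, k=k) * FT_PER_M
    print(f"{label}: speed off bat {speed_mph:.1f} mph, distance {dist_ft:.1f} ft")
```

Both fits are forced through the same observed point at the same time, so a change in the assumed drag has to be absorbed mostly by the launch speed; comparing the two printed lines shows how the projected distance and the speed off the bat each respond in this toy setup.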
JB: What about weather factors such as temperature and wind? I know you’ve spent a lot of time thinking about those, but is there room for greater accuracy?
GR: The atmospheric conditions used by Hit Tracker are certainly a source of uncertainty, perhaps the largest one, and one of my biggest areas of focus going forward. These values, particularly wind, won’t impact most distance measurements greatly, for the reasons explained above related to the proximity of most observation points to field level, but they can have a big impact on the Speed Off Bat estimates.
When a ball lands in an outfield bullpen at field level, there is no doubt about the distance if it is marked properly, but the difference in how hard the ball was hit will be great between a tail wind, no wind or a head wind. So, I try to make sure the wind is properly estimated, and properly modeled. The best wind observations are made from watching a flag on the video replay. If this isn’t available, I use weather info from the Internet, and if I have nothing else, I will fall back on the box score weather data, which will be hours old by the time the game ends.
What I am hoping will happen is that MLB will install weather stations at their ballparks and make the data streams accessible to the public via the Internet. Weather Underground, a great weather site, has a program for doing so which requires a few hundred dollars and a computer connected to the Internet. Apparently there is already one of these stations set up at U.S. Cellular Field.
JB: I know you recently discovered a little nit in the program. Can you shed light on exactly what happened and what the implications are for the data?
GR: Yes, unfortunately, I did discover that there is a minor error in the Hit Tracker analysis code, or more precisely, Dr. Alan Nathan did. Dr. Nathan is a professor at the University of Illinois, and one of the most highly respected authorities on baseball physics.
Recently, I sent Dr. Nathan a copy of Hit Tracker so he could review my analytical model, particularly the spin modeling, which is based on the equations and parameters published in Dr. Robert Adair’s The Physics of Baseball. Dr. Nathan had a few welcome suggestions on how I can improve the spin modeling, based on recent research, but much to my horror, he also pointed out a problem in my code related to units. In the section of the code related to the Magnus Force (i.e. the force on the ball due to the spin), my equation used velocity in meters per second, where Adair’s equation called for velocity in mph.
Fortunately, by re-analyzing my original observations with the corrected code, I have been able to determine that the impact of this error is very small. The error has essentially no effect on the distance calculations, due to the way Hit Tracker works (using a known observation point at the end of the ball’s flight). It does, however, have a minor effect on the SOB, or speed off the bat, numbers, with the SOB value for most home runs changing by 0.5 to 1.0 mph (or in rare cases, slightly more than this). So, embarrassing as it is to find a mistake, the data impact is trivial, and there is no effect on any conclusions I have drawn from the home run data to date.
I plan to recast all of the 2006 and 2007 home run data, based on the correction of this units error, as well as any other modifications to the spin model that Dr. Nathan and I arrive at, and I am gratified that the end result will be a Hit Tracker that is more accurate than ever. And I take solace in the thought that, important as we all feel our baseball statistics to be, at least my English/Metric units error didn’t result in the loss of a $125 million spacecraft!
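The kind of units slip described here is easy to reproduce. The sketch below is an illustration only: `empirical_term_mph` is a hypothetical stand-in for any published formula whose constants assume mph, and it is deliberately linear in speed. It is not Adair's Magnus equation, which is not reproduced in this interview.

```python
MPH_PER_MPS = 3600.0 / 1609.344   # exact: 1 mile = 1609.344 m, 1 hour = 3600 s


def mps_to_mph(speed_mps):
    """Convert metres per second to miles per hour."""
    return speed_mps * MPH_PER_MPS


def empirical_term_mph(speed_mph, coeff=1.0):
    """Hypothetical stand-in for a formula whose constants assume mph.
    It is linear in speed purely for illustration."""
    return coeff * speed_mph


speed_mps = 44.704                                    # 100 mph expressed in m/s
correct = empirical_term_mph(mps_to_mph(speed_mps))   # convert first, then evaluate
wrong = empirical_term_mph(speed_mps)                 # m/s passed where mph is expected

print(f"wrong / correct = {wrong / correct:.3f}")
```

With a linear term the mistaken value comes out at roughly 0.447 of the correct one. Putting the unit in the function and variable names, as done here, is one cheap way to make this class of mistake visible at the call site.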
JB: So all-in how accurate do you reckon Hit Tracker is at the moment?
GR: Overall, I feel that most home run distances on Hit Tracker are accurate to within one or two feet, while the ones that leave the park at a high height are probably accurate within one to five feet.
That’s it for part one. Part two will run in a couple of weeks and will discuss Hit Tracker as a source of serendipity, as well as what we can look forward to in 2007 and beyond.
References & Resources
A big, big thanks to Greg for sharing the history and physics of Hit Tracker with us. The amount of time, effort and dedication that has gone into the site is nothing short of astonishing. Baseball fans everywhere should be eternally grateful.