Garik at Beyond the Boxscore recently warned us about PITCHf/x calibration issues, following in the footsteps of many other analysts. Mike Fast’s “The Internet cried a little…” can be cited as the central article on the subject, although it was not the first time the issue was raised. Josh Kalk (now working with the Rays) has probably been working on calibration corrections since PITCHf/x day one.
During last season’s League Championship Series, Fast published his estimates of how far the PITCHf/x system was offset in each game, pointing to an extreme case in ALCS Game Six, in which the horizontal location might have been off by as much as 5.5 inches.
This doesn’t mean PITCHf/x location data are unreliable, only that we have to make an effort to estimate the system’s bias on any given night.
Making our work simpler is the modern baseball habit of not letting pitchers go the full nine. With more than four pitchers getting into most games, we can check whether every hurler in a given game seems to place his offerings in spots unusual for him.
Let’s say PITCHf/x reports that Roy Oswalt is locating his slider wider than usual. That might indicate he’s not hitting his spots tonight. But if all the other pitchers who get into the game are reported to miss their usual spots in the same direction and by roughly the same distance, then it’s probably a bias of the system.
This is not an original idea: Fast calculates his values starting from the same assumptions.
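The reasoning above can be sketched in a few lines of Python. This is a deliberately naive version, not the model used in this article: it assumes we already know each pitcher’s usual horizontal location (a hypothetical `usual` dictionary, in inches) and it ignores pitch type, batter handedness and count. If every pitcher’s mean shift points the same way by a similar amount, the pitch-weighted average is a rough guess at the system bias for that game.

```python
from collections import defaultdict


def naive_game_offset(game_pitches, usual):
    """Rough calibration offset for one game.

    game_pitches: iterable of (pitcher, x) tuples, x = horizontal location in inches.
    usual: hypothetical dict mapping each pitcher to his typical horizontal location.
    Returns (pitch-weighted mean shift, dict of each pitcher's own mean shift).
    """
    shifts_by_pitcher = defaultdict(list)
    for pitcher, x in game_pitches:
        shifts_by_pitcher[pitcher].append(x - usual[pitcher])

    per_pitcher = {p: sum(s) / len(s) for p, s in shifts_by_pitcher.items()}
    n_pitches = sum(len(s) for s in shifts_by_pitcher.values())
    total_shift = sum(sum(s) for s in shifts_by_pitcher.values())
    return total_shift / n_pitches, per_pitcher


# Made-up example: three pitchers all miss their usual spots by +2 inches.
usual = {"A": 1.0, "B": -2.0, "C": 0.5}
game = [("A", 3.0), ("A", 3.0), ("B", 0.0), ("B", 0.0), ("B", 0.0), ("C", 2.5)]
offset, per_pitcher = naive_game_offset(game, usual)
print(offset, per_pitcher)  # 2.0 {'A': 2.0, 'B': 2.0, 'C': 2.0}
```

When the per-pitcher shifts disagree in sign or size, the average is much less convincing, which is one reason to move to a model that accounts for what each pitcher was throwing and to whom.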
I used Multilevel Modeling to estimate calibration offsets from 2008 to 2010. Some time ago I used the same statistical technique as an advanced form of With-Or-Without-You to evaluate holding baserunners, accounting for the abilities of runners, catchers and pitchers, and their interactions.
Location here is modeled as a function of the pitch type (the preferred location of a curveball differs from that of a fastball), the batter’s handedness (a slider from a right-handed pitcher to a right-handed hitter is aimed differently from a slider to a portsider), and the count (since we can expect pitchers to expand their zone when ahead).
The always useful Wikipedia defines Multilevel Models as “statistical models of parameters that vary at more than one level.”
In this case we have the single pitch as the basic level.
Going up one level we have the pitchers. For every pitcher, location should be a function of pitch type, batter handedness and count (among other parameters), but those parameters don’t have the same effect on all of them: one pitcher might expand his zone more when ahead, another might use the backdoor slider more frequently, and so on. Putting pitchers at a higher level in a Multilevel Model ensures those differences are taken into account.
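A real multilevel model estimates the pitcher-level variation from the data. The sketch below only illustrates the partial-pooling idea behind it, using a fixed, hypothetical shrinkage constant `k` and a hypothetical three-way simplification of the count. A pitcher’s preferred location in a given situation (pitch type, batter handedness, count) becomes a weighted compromise between his own average and the league’s average for that situation.

```python
def count_state(balls, strikes):
    """Hypothetical simplification of the count: is the pitcher ahead, even or behind?"""
    if strikes > balls:
        return "ahead"
    if balls > strikes:
        return "behind"
    return "even"


def situation(pitch_type, batter_hand, balls, strikes):
    """Key for one situation: pitch type, batter handedness and count state."""
    return (pitch_type, batter_hand, count_state(balls, strikes))


def pooled_mean(values, league_mean, k=25.0):
    """Partial pooling: shrink a pitcher's mean location in one situation
    toward the league-wide mean for that situation.

    k (hypothetical) acts like k pseudo-pitches at the league mean, so a
    pitcher with few pitches in the situation stays close to league_mean,
    while one with many pitches keeps roughly his own average.
    """
    n = len(values)
    if n == 0:
        return league_mean
    return (sum(values) + k * league_mean) / (n + k)


print(situation("SL", "R", 0, 2))                # ('SL', 'R', 'ahead')
print(pooled_mean([2.0] * 25, 0.0))              # 1.0
print(round(pooled_mean([2.0] * 2500, 0.0), 2))  # 1.98
```

With 25 pitches and `k=25`, the estimate lands halfway between the pitcher and the league. With 2,500 pitches it is almost entirely the pitcher’s own average.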
Another factor that needs to go on a higher level of the model hierarchy is the game, since our hypothesis is that each game has an effect on location (due to PITCHf/x miscalibration) that applies no matter what the count is or who is pitching.
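Fitting the full model is beyond a short example. A dedicated mixed-model library would normally be used for that. As a stand-in, here is a simplified sketch, not the model actually fit for this article: an alternating (backfitting) estimate of crossed pitcher and game effects, with each game effect shrunk toward zero in the way a random effect would be. The simulated data, the shrinkage constant `k` and the iteration count are all assumptions. `pitcher_key` can also be a `(pitcher, situation)` tuple, which gives each pitcher his own baseline for each pitch type, batter handedness and count.

```python
import random
from collections import defaultdict


def estimate_game_offsets(pitches, k=20.0, n_iter=30):
    """Alternate between pitcher baselines and shrunken game offsets.

    pitches: list of (pitcher_key, game, x) tuples; x is horizontal location in
    inches. pitcher_key can be a plain pitcher id or a (pitcher, situation) tuple.
    k: hypothetical shrinkage constant pulling each game offset toward zero.
    """
    by_pitcher, by_game = defaultdict(list), defaultdict(list)
    for p, g, x in pitches:
        by_pitcher[p].append((g, x))
        by_game[g].append((p, x))

    game_off = {g: 0.0 for g in by_game}
    base = {}
    for _ in range(n_iter):
        # Pitcher baseline: mean location once current game offsets are removed.
        for p, rows in by_pitcher.items():
            base[p] = sum(x - game_off[g] for g, x in rows) / len(rows)
        # Game offset: mean residual shrunk toward 0, as a random effect would be.
        for g, rows in by_game.items():
            game_off[g] = sum(x - base[p] for p, x in rows) / (len(rows) + k)
    return game_off, base


def simulate(n_pitchers=30, n_games=150, per_game=5, pitches_each=30, seed=1):
    """Fake data: location = pitcher baseline + true game offset + noise."""
    rng = random.Random(seed)
    pitchers = [f"P{i}" for i in range(n_pitchers)]
    base = {p: rng.gauss(0, 3) for p in pitchers}
    true_off = {f"G{j}": rng.gauss(0, 1.2) for j in range(n_games)}
    pitches = []
    for g, off in true_off.items():
        for p in rng.sample(pitchers, per_game):
            for _ in range(pitches_each):
                pitches.append((p, g, base[p] + off + rng.gauss(0, 4)))
    return pitches, true_off


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


pitches, true_off = simulate()
est, _ = estimate_game_offsets(pitches)
games = sorted(true_off)
print(round(pearson([est[g] for g in games], [true_off[g] for g in games]), 2))
```

On simulated data like this, the recovered game offsets track the true ones closely. Games with fewer pitchers or fewer pitches get pulled harder toward zero, which matches the intuition that short outings by a couple of starters make the bias harder to pin down.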
Between 2008 and 2010, more than 7,000 games were played in major league parks. In half of them, the calibration offset on horizontal location was smaller than one inch, and in nine out of 10 it was smaller than two inches. A total of 71 games had the PITCHf/x system recording values biased by more than three inches: 15 happened in Miller Park, eight at Camden Yards, seven in Minute Maid Park and six in the new Yankee Stadium.
As a result of the many games in which the cameras were off, Miller Park values were off by an average of 1.3 inches toward the third base side. At the other end of the spectrum was Fenway Park which, with three games in which the system was off by more than three inches, had numbers biased toward first base by an average of 0.8 inches. Histograms of the game-by-game offsets at these two most extreme stadiums show that correcting data with park averages would be a poor choice, since the two distributions overlap a lot.
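A quick way to see why a single park-level correction falls short is to subtract the park’s average offset from each game and measure what is left. The offsets below are invented for illustration, not real park data. When the game-by-game values are this spread out, the average correction still leaves more than an inch of typical (root-mean-square) error per game. A per-game correction removes that error by construction.

```python
def park_mean_correction(offsets):
    """Compare correcting every game with the park average vs. leaving data alone.

    offsets: game-by-game horizontal offsets (inches) for one park.
    Returns (park mean, RMS error uncorrected, RMS error after subtracting the mean).
    """
    n = len(offsets)
    mean = sum(offsets) / n
    rms_raw = (sum(o * o for o in offsets) / n) ** 0.5
    rms_left = (sum((o - mean) ** 2 for o in offsets) / n) ** 0.5
    return mean, rms_raw, rms_left


# Hypothetical offsets for one park, not real data.
park = [-3.0, -2.0, -1.0, 0.0, 0.5, -2.5, 1.0]
mean, rms_raw, rms_left = park_mean_correction(park)
print(mean, round(rms_raw, 2), round(rms_left, 2))  # -1.0 1.75 1.44
```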
Miscalibrations of vertical location were smaller over the same period, as the summaries below show.
offset (in.)      Min    Q1*  Median    Q3*    Max
horizontal       -5.0   -0.8     0.0    0.8    4.5
vertical         -3.3   -0.6     0.0    0.6    3.8
* Q1: first quartile; Q3: third quartile
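The summary table and the “half of the games under one inch” figures are plain descriptive statistics. Here is a minimal sketch using Python’s standard `statistics` module, run on toy numbers rather than the real offsets. The quartile definition is an assumption: `method="inclusive"` interpolates the same way as R’s default (type 7) quantiles, but the article does not say which definition its table used.

```python
import statistics


def summarize(offsets):
    """Min, first quartile, median, third quartile and max of game offsets."""
    q1, median, q3 = statistics.quantiles(offsets, n=4, method="inclusive")
    return min(offsets), q1, median, q3, max(offsets)


def share_within(offsets, limit):
    """Fraction of games whose absolute offset is below limit (inches)."""
    return sum(abs(o) < limit for o in offsets) / len(offsets)


print(summarize([1, 2, 3, 4, 5]))                 # (1, 2.0, 3.0, 4.0, 5)
print(share_within([-0.5, 0.2, 1.5, -2.5], 1.0))  # 0.5
```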
Just 11 contests had vertical errors over three inches, with Rangers Ballpark in Arlington the only park with multiple occurrences (four). Lowering the bar to 2.5 inches, we get 62 miscalibrated games: 14 in Arlington and six each in Busch Stadium and Dodger Stadium.
It does not appear that Sportvision has reduced the number of games with extreme bias over the past three years. On the contrary, in 2008 there were 10 games offset by more than three inches on the horizontal axis and 10 offset by more than 2.5 inches on the vertical axis. Those numbers grew to 25 and 23 in 2009, and to 36 and 29 in 2010.
Here are the “major offenses” of every MLB park in the 2008-2010 period.
                 --- horizontal > 3 in. ---- --- vertical > 2.5 in. ----
park               2008   2009   2010  total   2008   2009   2010  total overall
Arlington             0      1      2      3      1      4      9     14      17
Miller Park           2      5      8     15      0      0      0      0      15
Busch Stadium         1      0      2      3      1      2      3      6       9
Fenway Park           1      1      1      3      2      0      3      5       8
Camden Yards          1      4      3      8      0      0      0      0       8
Yankee Stadium        0      4      2      6      0      0      2      2       8
Coors Field           0      1      1      2      0      0      5      5       7
Dodger Stadium        1      0      0      1      0      4      2      6       7
Minute Maid           0      0      7      7      0      0      0      0       7
Comerica Park         0      2      0      2      0      1      1      2       4
Jacobs Field          1      0      1      2      2      0      0      2       4
Nationals Park        1      0      2      3      1      0      0      1       4
Tropicana Field       0      0      2      2      0      1      1      2       4
Wrigley Field         0      2      1      3      0      1      0      1       4
Metrodome             0      0      0      0      0      3      0      3       3
U.S. Cellular         0      1      0      1      0      1      1      2       3
AT&T Park             1      0      0      1      1      0      0      1       2
Citizens Bank         0      1      1      2      0      0      0      0       2
Great American        0      0      0      0      0      2      0      2       2
PETCO Park            1      0      0      1      0      0      1      1       2
PNC Park              0      0      0      0      1      0      1      2       2
Rogers Centre         0      0      0      0      0      2      0      2       2
Angel Stadium         0      0      1      1      0      0      0      0       1
Chase Field           0      1      0      1      0      0      0      0       1
Citi Field            0      0      1      1      0      0      0      0       1
Dolphin Stadium       0      1      0      1      0      0      0      0       1
Kauffman St.          0      1      0      1      0      0      0      0       1
McAfee Coliseum       0      0      0      0      0      1      0      1       1
Safeco Field          0      0      0      0      0      1      0      1       1
Shea Stadium          0      0      0      0      1      0      0      1       1
Target Field          0      0      1      1      0      0      0      0       1
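A table like this is a simple tally once every game has an estimated offset. Here is a sketch that assumes one `(park, year, horizontal offset, vertical offset)` record per game and uses the thresholds from the text (three inches horizontal, 2.5 inches vertical). The park names and offsets in the example are made up.

```python
from collections import Counter


def tally_extremes(games, h_limit=3.0, v_limit=2.5):
    """Count games per (park, year) whose offsets exceed the thresholds.

    games: iterable of (park, year, h_offset, v_offset), offsets in inches.
    """
    horizontal, vertical = Counter(), Counter()
    for park, year, h, v in games:
        if abs(h) > h_limit:
            horizontal[(park, year)] += 1
        if abs(v) > v_limit:
            vertical[(park, year)] += 1
    return horizontal, vertical


# Made-up games for two hypothetical parks.
games = [
    ("Park A", 2010, -3.4, 0.2),
    ("Park A", 2010, 1.0, 0.0),
    ("Park B", 2009, 3.1, -2.6),
    ("Park B", 2010, 0.5, 2.7),
]
horizontal, vertical = tally_extremes(games)
print(horizontal[("Park A", 2010)], vertical[("Park B", 2010)])  # 1 1
```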
Running a Multilevel Model produces offset values for every game in the PITCHf/x database. How can we be certain the calculated numbers reflect true system bias? The short answer is we can’t, but we can perform some sanity checks.
The first is to compare our numbers with those of others who have tackled the issue in the past. Below are the numbers Mike Fast published for the 2010 League Championship Series (horizontal offsets only, in inches), compared with those coming out of the Multilevel Model.
Venue and game                Mike Fast  Max Marchi
2010 NLCS
Citizens Bank Park, Game 1         -0.1         0.7
Citizens Bank Park, Game 2          0.0         0.0
AT&T Park, Game 3                  -1.7        -1.7
AT&T Park, Game 4                  -2.5        -2.2
AT&T Park, Game 5                  -1.8        -1.0
2010 ALCS
Rangers Ballpark, Game 1           -1.9        -1.9
Rangers Ballpark, Game 2           -2.7        -2.9
Yankee Stadium, Game 3             -0.5        -0.7
Yankee Stadium, Game 4             -2.1        -1.6
Yankee Stadium, Game 5             -1.0        -1.8
Rangers Ballpark, Game 6           -5.5        -5.4
Mike was also kind enough to share his values for the full 2010 season. With more than 2,400 games to compare, the correlation between our numbers exceeds 0.92. For 90 percent of the games our corrections are within an inch of each other, and for 98 percent within an inch and a half. Only one game has estimates that differ by more than 2.5 inches: a Phillies-Braves game in which all but 17 pitches were thrown by starters Roy Halladay and Derek Lowe. As mentioned earlier, PITCHf/x bias can be estimated more reliably when more pitchers take the mound.
Thus, the numbers coming from two different systems nearly coincide. This is encouraging, but it could also mean that Mike and I are making similar errors (which is possible, since the premises of our calculations are pretty similar).
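The agreement figures above (a correlation, and the share of games whose corrections fall within a given distance of each other) are easy to compute once both sets of offsets are lined up game by game. Here is a sketch using the 11 LCS games from the table. The helper names are illustrative. On this small sample the correlation comes out around 0.96.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def share_close(a, b, tol):
    """Fraction of games where the two estimates differ by at most tol inches."""
    return sum(abs(x - y) <= tol for x, y in zip(a, b)) / len(a)


# Horizontal offsets (inches) for the 11 games of the 2010 LCS table.
fast = [-0.1, 0.0, -1.7, -2.5, -1.8, -1.9, -2.7, -0.5, -2.1, -1.0, -5.5]
marchi = [0.7, 0.0, -1.7, -2.2, -1.0, -1.9, -2.9, -0.7, -1.6, -1.8, -5.4]

print(round(pearson(fast, marchi), 2))  # 0.96
print(share_close(fast, marchi, 1.0))   # 1.0
```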
Another sanity check can be done with umpires. Rob Drake, for example, called balls and strikes in two extreme games. According to my estimates, the May 13, 2009 contest between the Reds and the D-Backs in Arizona had the PITCHf/x tracking system off by 4.5 inches toward first base, while the Sept. 6, 2009 game featuring San Francisco at Milwaukee was off by 2.9 inches in the opposite direction. Below you can see how Drake called those games.
While no adjustment has been made for who was on the mound or the kinds of pitches delivered, we can notice something that supports the estimated PITCHf/x biases.
According to the recorded data, during the game in Milwaukee Drake called strikes on a lot of outside pitches to left-handed batters while ignoring offerings on the inside corner. A few months earlier in Arizona, he had never called a strike on a pitch away from a lefty, calling the inner part of the plate instead, especially on pitches at the knees.
Drake’s calls against righties show a consistent pattern. At Chase Field, the inner part of the plate looks as if it had been shaved off; in Milwaukee, on the contrary, the ump did not seem reluctant to give pitchers the inside edge against right-handed batters.
We could conclude that Drake called two antithetical games on May 13 and Sept. 6. Alternatively, we can apply the estimated corrections to the PITCHf/x data and redraw the charts.
Now the man in blue seems to have been quite consistent across those games.
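Redrawing the charts only requires shifting each pitch by its game’s estimated offset. Here is a minimal sketch in which the units and sign convention are assumptions (see the comments). The same subtraction would apply to vertical location (`pz`) using the vertical offset.

```python
def correct_location(px_feet, offset_inches):
    """Remove an estimated game offset from a PITCHf/x location.

    Assumptions: px is reported in feet, as in the PITCHf/x data, and
    offset_inches is the shift the system added to every pitch in that game,
    measured along the same axis with the same sign convention as px.
    """
    return px_feet - offset_inches / 12.0


# Hypothetical called strikes (px in feet) from a game estimated 6 inches off.
strikes_px = [0.5, -0.3, 1.1]
print([round(correct_location(px, 6.0), 2) for px in strikes_px])  # [0.0, -0.8, 0.6]
```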
Let’s take another look at umpires as a way to confirm the estimated PITCHf/x bias. For each game, take the horizontal location of every called strike, then find the game’s fifth and 95th percentiles. These give an idea of how far the umpire was willing to call strikes in either direction. (The fifth and 95th percentiles are used rather than the minimum and maximum, so that a single blunder by the umpire does not influence the values for the whole game.)
Then compute the correlation between each of the two percentiles and the estimated offsets.
Both the left and right limits show a positive correlation (coefficient around 0.5) with the calculated PITCHf/x biases, once more suggesting that the estimates make sense.
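The percentile check can be sketched as follows, assuming called-strike locations grouped by game and a dictionary of estimated offsets (both hypothetical inputs here). The toy data are built so that every game shares exactly the same zone, only shifted by its offset, so the correlations come out perfect. With real games, umpire-to-umpire differences and noise pull the coefficients well below 1.

```python
import statistics


def strike_edges(called_strike_x):
    """5th and 95th percentiles of called-strike horizontal locations in one game."""
    cuts = statistics.quantiles(called_strike_x, n=20, method="inclusive")
    return cuts[0], cuts[-1]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def edge_correlations(strikes_by_game, offsets):
    """Correlate each game's left/right strike edges with its estimated offset."""
    games = sorted(strikes_by_game)
    edges = [strike_edges(strikes_by_game[g]) for g in games]
    offs = [offsets[g] for g in games]
    left = pearson([e[0] for e in edges], offs)
    right = pearson([e[1] for e in edges], offs)
    return left, right


# Toy check: the same "true" zone recorded with a different shift in each game.
zone = [i / 2 for i in range(-17, 18)]  # -8.5 to 8.5 inches
offsets = {"g1": -2.0, "g2": 0.5, "g3": 3.0, "g4": 1.0}
strikes = {g: [x + off for x in zone] for g, off in offsets.items()}
print([round(r, 3) for r in edge_correlations(strikes, offsets)])  # [1.0, 1.0]
```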
A few more ingredients into the mix
A more complex model was also built, in which two other factors were considered. One is the presence of men on base. This could alter the pitcher’s target in a couple of ways: the pitcher might prefer to pitch around batters more when runners are threatening to score, and pitching from the stretch might slightly alter his mechanics, influencing the final destination of the ball.
The other is the batter’s identity: rather than simply separating righties and lefties, we can make the model adjust for each batter, so that a right-handed hitter who thrives on the inside part of the plate is treated differently from a right-handed hitter who likes to extend his arms on pitches away.
The values estimated by the complex model are nearly identical to those from the simpler one, with the largest differences on the order of 0.1 to 0.2 inches.
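The richer model can be approximated by extending the same backfitting idea to more crossed factors. This is again a sketch on simulated data, not the model actually fit for this article. The pitcher baseline is split by whether men are on base, each batter gets his own effect, and every effect is shrunk with the same hypothetical constant `k`.

```python
import random
from collections import defaultdict


def fit_crossed(rows, factors, k=20.0, n_iter=15):
    """Backfit additive, shrunken effects for several crossed grouping factors.

    rows: list of dicts with the response under "x" and one key per factor.
    factors: names of the grouping keys, e.g. ("pitcher_state", "batter", "game").
    k: hypothetical shrinkage constant, applied to every factor.
    """
    mu = sum(r["x"] for r in rows) / len(rows)
    effects = {f: defaultdict(float) for f in factors}
    for _ in range(n_iter):
        for f in factors:
            sums, counts = defaultdict(float), defaultdict(int)
            for r in rows:
                # Partial residual: remove the grand mean and all other effects.
                others = sum(effects[o][r[o]] for o in factors if o != f)
                sums[r[f]] += r["x"] - mu - others
                counts[r[f]] += 1
            for level, s in sums.items():
                effects[f][level] = s / (counts[level] + k)
    return mu, effects


def simulate(seed=7):
    """Fake data: pitcher baseline split by men on base + batter + game + noise."""
    rng = random.Random(seed)
    pitchers = [f"P{i}" for i in range(20)]
    batters = [f"B{i}" for i in range(60)]
    batter_eff = {b: rng.gauss(0, 1.0) for b in batters}
    state_eff = {(p, on): rng.gauss(0, 3) for p in pitchers for on in (False, True)}
    true_off = {f"G{j}": rng.gauss(0, 1.2) for j in range(120)}
    rows = []
    for g, off in true_off.items():
        for p in rng.sample(pitchers, 4):
            for _ in range(30):
                on = rng.random() < 0.4  # men on base for roughly 40% of pitches
                b = rng.choice(batters)
                x = state_eff[(p, on)] + batter_eff[b] + off + rng.gauss(0, 4)
                rows.append({"x": x, "pitcher_state": (p, on), "batter": b, "game": g})
    return rows, true_off


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


rows, true_off = simulate()
mu, effects = fit_crossed(rows, ("pitcher_state", "batter", "game"))
games = sorted(true_off)
est = [effects["game"][g] for g in games]
truth = [true_off[g] for g in games]
print(round(pearson(est, truth), 2))
```

With real data, comparing the game effects from a fit like this against those from the simpler one, game by game, is the analogue of the 0.1 to 0.2 inch comparison above.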
Nothing new here: we have been warned many times about the occasional miscalibrations of PITCHf/x. So, once again, think twice before using unadjusted data to point to an umpire’s ineptitude, or before drawing conclusions about allegedly modified pitching approaches.
Sportvision has done terrific work in the past few years tracking every major league pitch, and we are really fortunate that it (and MLBAM) let us put our hands on that wealth of data. Pointing out miscalibrations is not meant as a criticism of their amazing work, but rather as a way to give something back.