Back in 2014, the Yankees sponsored a competition for their fifth-starter role in spring training. Following a spirited audition, left-hander Vidal Nuno was awarded the consolation prize of a spot in the major league bullpen. But five starters is never enough. The Yankees were rained out on Jackie Robinson Day, which led to a doubleheader the next day and the need for a sixth starter five days later.
On April 20, Nuno got the nod in Tampa and capitalized by striking out six batters in five scoreless innings before watching the New York bullpen surrender his lead in the seventh. But even a casual fan with a box score, perhaps armed with an education delivered by Brian Kenny’s relentless anti-win crusade, could look past the no-decision and conclude the guy had pitched pretty well.
The still-winless Nuno got another start against the Rays on May 2 back in New York. This time he took a 2-1 lead into the fourth and, after retiring the leadoff man, appeared to get a quick second out when Evan Longoria hit a first-pitch curveball high in the air to center field. But Jacoby Ellsbury “lost it in the sky,” as he described it, and the ball landed for a triple. Nuno’s stat line absorbed two earned runs in the inning, but neither would have scored had his center fielder caught the fly ball. Nuno’s first win would have to wait. And this time there was not a trace of forensic evidence in the box score suggesting he deserved better.
Perhaps we need to dig deeper to get things right. Technologies like HITf/x and Statcast characterize a batted ball at contact, which provides the opportunity to separate a batted ball’s intrinsic value from its outcome. In the process, we remove the effect of factors such as the defense, the weather, the ballpark and random luck. As a result, we can define batted-ball statistics for batters and pitchers that are less subject to random variation than statistics that are based on batted-ball outcomes.
The HITf/x System
HITf/x is a system developed by Sportvision that estimates the initial trajectory of a batted ball in three dimensions using the same video sequences acquired by PITCHf/x. Each batted ball is described by three parameters. The speed, s, is an estimate of the ball’s initial speed in miles per hour. The vertical launch angle, v, is the angle the batted ball’s initial direction makes with the plane of the playing field where a vertical angle of -90 is straight down and a vertical angle of +90 is straight up.
The horizontal (or spray) angle, h, specifies the direction of the batted ball in the plane of the playing field where the direction toward third base has a horizontal angle of -45 and the direction toward first base has a horizontal angle of +45.
The result of a batted ball has a strong dependence on the (s,v,h) vector. For example, a vector of (75, 70, 35) typically is a pop-up to the first baseman, a vector of (60, -10, -15) typically is a ground ball to shortstop, and a vector of (100, 25, -35) usually results in a home run to left field. Early HITf/x studies by researchers including Peter Jensen, Brian Cartwright and Mike Fast demonstrated that HITf/x measurements provide significant advantages for analysis over previous data that included only a ground ball, line drive or fly ball descriptor for each batted ball.
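The angle conventions above can be made concrete with a short Python sketch. The `BattedBall` type, the `is_fair` check and the axis layout in `velocity_components` are illustrative assumptions, not part of the HITf/x system itself; only the meaning of s, v and h comes from the description above.

```python
import math
from typing import NamedTuple


class BattedBall(NamedTuple):
    s: float  # initial speed in miles per hour
    v: float  # vertical launch angle in degrees (-90 straight down, +90 straight up)
    h: float  # horizontal (spray) angle in degrees (-45 third base, +45 first base)


def is_fair(ball):
    """A ball starts in fair territory when h lies between the foul lines."""
    return -45.0 <= ball.h <= 45.0


def velocity_components(ball):
    """Split the initial speed into three components, in mph.

    Assumed axes (hypothetical, for illustration): x points toward the
    first-base side, y points toward center field, z points straight up.
    """
    v = math.radians(ball.v)
    h = math.radians(ball.h)
    horizontal = ball.s * math.cos(v)  # speed within the plane of the field
    return (horizontal * math.sin(h), horizontal * math.cos(h), ball.s * math.sin(v))


# The home run example from the text: 100 mph, 25 degrees up, pulled to left field.
home_run = BattedBall(s=100.0, v=25.0, h=-35.0)
x, y, z = velocity_components(home_run)
```

For this example x is negative (the ball heads to the left-field side) and z is positive (the ball is hit in the air), which matches the description of a home run to left field.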
Learning the Value of a Batted Ball
Sportvision generously provided data acquired by the HITf/x system for every regular-season major league game in 2014. Using these data, I generated a model for the intrinsic value of a batted ball as a function of its s, v and h parameters. The model was constructed using the data for all balls in play, excluding bunts, that were tracked by the system with a horizontal angle in fair territory. This results in a set of 124,364 batted balls, which represents more than 97 percent of the major league total for 2014.
A Bayesian framework was used to derive the mapping from batted-ball parameters to intrinsic value. If we consider the six batted-ball outcomes R_0=out, R_1=single, R_2=double, R_3=triple, R_4=home run, and R_5=batter reaches on error, then Bayes’ rule states that the probability of an outcome R_j given a measured HITf/x vector x=(s,v,h) is:
P(R_j | x) = p(x | R_j) P(R_j) / p(x)
where p(x | R_j) is the conditional probability density function for x given outcome R_j, P(R_j) is the prior probability of outcome R_j, and p(x) is the probability density function for x. For example, P(R_1 | (90,15,0)) is the probability of observing a single for a ball hit up the middle at ninety miles per hour with a vertical launch angle of 15 degrees.
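As a minimal sketch of how Bayes’ rule turns densities and priors into outcome probabilities, the Python function below computes P(R_j | x) for all six outcomes at once. The function name and every number in the example are hypothetical and are not taken from the HITf/x data; the only thing borrowed from the text is the rule itself, with p(x) obtained by summing p(x | R_j) P(R_j) over the outcomes.

```python
def posterior(likelihoods, priors):
    """Return P(R_j | x) for each outcome j via Bayes' rule.

    likelihoods[j] is p(x | R_j) evaluated at the batted ball's x = (s, v, h).
    priors[j] is P(R_j).
    The denominator p(x) is the sum over j of p(x | R_j) * P(R_j).
    """
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x)
    if evidence == 0:
        raise ValueError("p(x) is zero: no outcome explains this batted ball")
    return [value / evidence for value in joint]


# Hypothetical values, ordered as (out, single, double, triple, home run, error).
priors = [0.68, 0.20, 0.06, 0.01, 0.04, 0.01]
likelihoods = [1.0e-4, 4.0e-4, 1.5e-4, 0.5e-4, 0.1e-4, 1.0e-4]
probs = posterior(likelihoods, priors)
```

Because every term is divided by the same p(x), the six probabilities always sum to one, and an outcome with a small prior can still dominate if its density at x is large enough.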
Bayes’ rule is important because we can represent some of our favorite baseball statistics as a weighted sum of the P(R_j | x) values using:
S(x) = w_0 P(R_0 | x) + w_1 P(R_1 | x) + w_2 P(R_2 | x) + w_3 P(R_3 | x) + w_4 P(R_4 | x) + w_5 P(R_5 | x)
If we treat sacrifice flies as ordinary outs, then a weight vector of (w_0,w_1,w_2,w_3,w_4,w_5) = (0,1,1,1,1,0) turns S(x) into the expected batting average for a batted ball with parameter vector x. If we instead use the vector (0,1,2,3,4,0), then we get slugging percentage. The key point is that we have a way to estimate a batted ball’s value that is separate from the batted ball’s particular outcome.
As we know, batting average and slugging percentage are deficient for describing the value of a batted ball. Weighted on base average (wOBA), on the other hand, also can be represented in the form of S(x), but it uses coefficients derived from the average run value of each event. Therefore, we define the intrinsic value of a batted ball as wOBA(x) using the weight vector (0.000, 0.892, 1.283, 1.635, 2.135, 0.920), where w_0, w_1, w_2, w_3, and w_4 for 2014 were obtained from the FanGraphs guts page, and w_5 was obtained from The Book. Thus, wOBA(x) is proportional to the expected run value of a batted ball given only what occurs at contact.
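The weighted sum S(x) is simple enough to sketch in a few lines of Python. The three weight vectors are the ones given in the text; the function name and the outcome probabilities in the example are hypothetical and serve only to show the arithmetic.

```python
# Weight vectors from the text, ordered as
# (out, single, double, triple, home run, reached on error).
BA_WEIGHTS = (0, 1, 1, 1, 1, 0)
SLG_WEIGHTS = (0, 1, 2, 3, 4, 0)
WOBA_WEIGHTS_2014 = (0.000, 0.892, 1.283, 1.635, 2.135, 0.920)


def expected_stat(probs, weights):
    """Compute S(x) = sum over j of w_j * P(R_j | x)."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical outcome probabilities P(R_j | x) for one batted ball.
probs = (0.50, 0.30, 0.10, 0.02, 0.07, 0.01)

expected_ba = expected_stat(probs, BA_WEIGHTS)           # 0.49
expected_slg = expected_stat(probs, SLG_WEIGHTS)         # 0.84
expected_woba = expected_stat(probs, WOBA_WEIGHTS_2014)  # 0.58725
```

The same probabilities feed all three statistics; only the weights change, which is why a single model of P(R_j | x) is enough to produce any of them.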
For this process to work, we still need to find the functions on the right side of Bayes’ rule. This is accomplished by applying machine learning techniques to the HITf/x data. The details of the approach are somewhat technical, and the curious reader is encouraged to consult the technical addendum listed in the references. The Cliffs Notes version is that nonparametric estimates for the probability density functions are generated using a kernel method that employs cross-validation to learn an optimal set of anisotropic smoothing parameters.
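To give a feel for what such an estimator looks like, here is a rough Python sketch of a kernel density estimate with a separate smoothing parameter for each of s, v and h. The Gaussian product kernel, the leave-one-out likelihood score, the grid of candidate bandwidths and the sample data are all assumptions made for illustration; the model described in this article may differ in each of those choices.

```python
import itertools
import math


def product_gaussian_kernel(x, center, bandwidths):
    """Product of 1-D Gaussian kernels, one bandwidth per coordinate (anisotropic)."""
    value = 1.0
    for a, b, bw in zip(x, center, bandwidths):
        z = (a - b) / bw
        value *= math.exp(-0.5 * z * z) / (bw * math.sqrt(2.0 * math.pi))
    return value


def kde(x, samples, bandwidths):
    """Nonparametric density estimate at x: the average kernel over all samples."""
    return sum(product_gaussian_kernel(x, c, bandwidths) for c in samples) / len(samples)


def loo_log_likelihood(samples, bandwidths):
    """Leave-one-out cross-validation score: each point is scored by a
    density built from all the other points."""
    total = 0.0
    for i, x in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        total += math.log(kde(x, others, bandwidths) + 1e-300)  # guard against log(0)
    return total


def select_bandwidths(samples, candidates):
    """Pick the candidate bandwidth triple with the best cross-validation score."""
    return max(candidates, key=lambda bw: loo_log_likelihood(samples, bw))


# Hypothetical (s, v, h) vectors for one outcome class, e.g. singles.
samples = [
    (88.0, 8.0, -20.0),
    (92.0, 12.0, 5.0),
    (95.0, 10.0, 25.0),
    (85.0, -3.0, -2.0),
    (90.0, 14.0, -30.0),
    (97.0, 6.0, 15.0),
]
candidates = list(itertools.product([2.0, 5.0], [3.0, 6.0], [5.0, 15.0]))
best = select_bandwidths(samples, candidates)
density = kde((90.0, 15.0, 0.0), samples, best)  # estimate of p(x | R_j) at x
```

Allowing a different bandwidth per coordinate matters here because speed, vertical angle and horizontal angle are measured on different scales and influence outcomes at different resolutions. A real application would use far more data and a finer search than this six-point toy example.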
Now that the math is out of the way, we can enjoy the fruits of our labor. Since wOBA depends on the three variables — s, v, and h — we can visualize its structure by taking lower-dimensional slices through the wOBA cube. If we fix the batted-ball speed at 93 miles per hour, for example, we get a wOBA plane that depends on the other two variables:
For this value of s, the best results for batters occur for balls hit with vertical angles between 25 and 40 degrees that are near the left-field line (-45° to -35° in h) or the right-field line (35° to 45° in h) where ballpark dimensions are typically the shortest. These batted balls often result in home runs. Batted balls hit at the same speed with the same vertical angle are less valuable at horizontal angles near zero, which correspond to larger ballpark dimensions in center field. For this initial speed, batted balls with vertical angles near 12 degrees tend to carry over the infielders and land in front of the outfielders and have a high value for all horizontal angles.
Typical horizontal angle positions for the three outfielders are evident from the three cold zones for balls hit in the air with vertical angles between 15 and 20 degrees, and typical horizontal positions for the four infielders are evident from the four cold zones for ground balls (v < 0).
We can also examine one-dimensional slices through the wOBA volume. Let’s look at ground balls with a vertical angle of -2° that are hit at 85 and 93 miles per hour:
Minima in the two curves correspond to the typical position of infielders with the minima near -36, -14, 14, and 37 degrees, corresponding to the third baseman, shortstop, second baseman and first baseman, respectively. Over most horizontal angles, balls hit at 93 mph have a higher value than balls hit at 85 mph since ground balls hit at a higher speed have a higher probability of eluding a defender.
We also can consider balls hit in the air with a vertical angle of +15° at the same speeds:
Here minima in the two curves correspond to the typical position of outfielders with the minima near -20, zero, and 20 degrees, corresponding to the left fielder, center fielder and right fielder, respectively. For this vertical angle, balls hit in the direction of an outfielder have a higher value for a speed of 85 mph because these balls often fall in front of the outfielder for hits, while balls hit at 93 mph more frequently carry to the outfielder for outs. For both the ground balls and fly balls, the largest wOBA values occur for balls hit near the foul lines (h=-45° or h=45°) which often result in extra-base hits instead of singles.
Significant wOBA differences for the same s, v and h occur between left-handed and right-handed batters due to differences in the positioning of defenders. Thus, we can define separate values, wOBAl for left-handed batters and wOBAr for right-handed batters. Let’s look at the effect of batter handedness on ground balls with a vertical angle of -2° that are hit at 93 miles per hour:
As before, we observe four minima in each curve that correspond to the typical position of the four infielders. We see, however, that the minima for left-handed batters are shifted several degrees toward the first-base line (h=45°) compared to the corresponding minima for right-handed batters. This shift corresponds to the difference in fielder positioning as a function of batter handedness. We also see that ground balls near the first-base line have a higher value for right-handed batters since there is a lower probability of a defender in that region, and that ground balls near the third-base line (h=-45°) have a higher value for left-handed batters.
We also can look at balls hit in the air with a vertical angle of +15° and a speed of 93 miles per hour:
Once again, the three minima in each curve correspond to the typical positions of outfielders, but the minima are shifted several degrees toward the right-field line (h= 45°) for left-handed batters. We also see left-handed batters have an advantage for batted balls hit in the direction of the right fielder (h near 20°) since the right fielder is typically playing deeper for left-handed batters, which allows additional batted balls to fall safely for hits. We observe the opposite effect for batted balls hit in the direction of the left fielder (h near -20°) since the left fielder is typically playing deeper for right-handed batters.
Intrinsic Contact Statistics
A batted ball with the parameters s, v, and h can be assigned the intrinsic value given by either its wOBAl or wOBAr value depending on the handedness of the batter. Batted balls may also be assigned an observed value given by the wOBA coefficient for the result of the batted ball. The observed value depends on several factors that are beyond the control of the batter and the pitcher, such as the defense, the weather, and the ballpark. That high fly ball Ellsbury lost in the sky had a minuscule intrinsic value of 0.040, but it scored an observed value of 1.635 after landing for a triple. We’ve used the intrinsic and observed values to derive statistics that describe batters, pitchers, defense, and park effects. Today we’ll focus on batters and pitchers.
Analysts sometimes quantify the value of a hitter’s batted balls using the average, O, of his observed batted ball values over a period of time. The statistic O is referred to as “wOBA on contact” or wOBAcon. As we’ve pointed out, however, O depends on a number of variables that are independent of the batter’s quality of contact. Thus, we propose the average, I, of the intrinsic values as a more accurate valuation of a hitter’s collection of batted balls. The batters with the highest I in 2014 among players who hit at least 300 batted balls that were tracked by HITf/x are:
No surprises here. These guys tend to hit the ball hard.
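The two averages are easy to compute once each batted ball carries an intrinsic value and a recorded outcome. In the Python sketch below, the first batted ball is the Longoria fly ball described earlier (intrinsic value 0.040, observed as a triple); the other three are hypothetical, as are the function and variable names.

```python
# 2014 wOBA coefficients from the text, keyed by batted-ball outcome.
WOBA_WEIGHTS_2014 = {
    "out": 0.000,
    "single": 0.892,
    "double": 1.283,
    "triple": 1.635,
    "home_run": 2.135,
    "error": 0.920,
}


def observed_average(outcomes):
    """O: mean wOBA coefficient of what actually happened (wOBAcon)."""
    return sum(WOBA_WEIGHTS_2014[o] for o in outcomes) / len(outcomes)


def intrinsic_average(intrinsic_values):
    """I: mean model value wOBA(x), which ignores what actually happened."""
    return sum(intrinsic_values) / len(intrinsic_values)


# (intrinsic value, outcome) pairs; only the first pair comes from the article.
batted_balls = [
    (0.040, "triple"),
    (0.310, "out"),
    (0.650, "single"),
    (1.400, "home_run"),
]

I = intrinsic_average([value for value, _ in batted_balls])   # 0.600
O = observed_average([outcome for _, outcome in batted_balls])  # 1.1655
```

In this tiny sample O sits far above I, driven largely by the lost fly ball; over a full season of batted balls that kind of noise has less room to hide.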
For an individual batter, several factors can contribute to differences between the average observed outcome, O, and the intrinsic value, I, of his batted balls. Batters who are fast runners, for example, force infielders to play shallower, which compromises range and leads to additional hits. Fast runners also tend to beat out more infield hits and garner additional bases on hits to the outfield. Thus, a faster runner will tend to achieve a higher O for a given I.
Batters with a high degree of predictability in their batted balls, such as left-handed batters who hit a large majority of their ground balls to the right of second base, are easier to defend than batters who produce a more uniform distribution of batted balls. Batters with a higher degree of predictability, therefore, will tend to have a lower O for a given I. Luck also can play a role in creating differences between O and I.
The next table presents batters with the largest values of O-I during the 2014 season where both O and I are computed using the batted balls tracked by HITf/x:
Most of the players in this list have above-average running speed, with Hamilton and Cain having exceptional speed. The top two players on the list also benefited from good luck. Marte led major league baseball by reaching base on an error 14 times in 2014, which contributed to his major league-leading O-I. Jose Abreu also experienced significant good fortune, as many of his 36 home runs just barely cleared the fence, causing his home runs to have an average intrinsic value I of 1.461, which is significantly less than the corresponding O of 2.135.
Finally, we have the batters with the lowest values of O-I during 2014:
All of these batters have below-average running speed and several (Moss, Teixeira, Santana) also had sufficiently predictable batted-ball distributions against which opposing teams were able to employ extreme defensive shifts.
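The O-I comparisons above amount to grouping batted balls by batter and ranking the difference of the two averages. The Python sketch below shows one way to do it; the player labels, the records and the `min_balls` threshold parameter are hypothetical and are not drawn from the 2014 data.

```python
from collections import defaultdict

# 2014 wOBA coefficients from the text, keyed by batted-ball outcome.
WOBA_WEIGHTS_2014 = {
    "out": 0.000, "single": 0.892, "double": 1.283,
    "triple": 1.635, "home_run": 2.135, "error": 0.920,
}


def o_minus_i_by_player(records, weights, min_balls=1):
    """Rank players by O - I, largest first.

    records is an iterable of (player, intrinsic_value, outcome) tuples.
    Players with fewer than min_balls tracked batted balls are dropped.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # observed sum, intrinsic sum, count
    for player, intrinsic, outcome in records:
        entry = totals[player]
        entry[0] += weights[outcome]
        entry[1] += intrinsic
        entry[2] += 1
    diffs = {
        player: (obs - intr) / n
        for player, (obs, intr, n) in totals.items()
        if n >= min_balls
    }
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records for two made-up batters.
records = [
    ("fast_runner", 0.20, "single"),
    ("fast_runner", 0.30, "out"),
    ("fast_runner", 0.25, "error"),
    ("shifted_slugger", 0.60, "out"),
    ("shifted_slugger", 0.50, "out"),
    ("shifted_slugger", 1.50, "home_run"),
]
ranking = o_minus_i_by_player(records, WOBA_WEIGHTS_2014, min_balls=3)
```

The top of the resulting list corresponds to batters whose outcomes beat their contact, and the bottom to batters whose outcomes lag it, which is how the two tables in this section are organized.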
In 2001, McCracken suggested pitchers have little control over the result of opponent batted balls that are not home runs. Since then, however, a number of researchers (really, lots of them) have presented evidence that pitchers can have some effect on the expected outcome of balls in play. Despite this progress, models that isolate the impact of the pitcher on the fate of batted balls have been elusive due to the confounding effects of the defense, ballpark, weather and luck on a batted ball’s outcome.
Since the HITf/x system characterizes a batted ball at contact, the influence of these confounding factors can be removed. As with batters, we can assign the intrinsic value, I, to the collection of batted balls allowed by a pitcher. The statistic, I, provides a context-invariant measure of a pitcher’s opponent contact, which allows this aspect of his performance to be accurately quantified.
The pitchers with the lowest I values in 2014 among those who allowed at least 300 batted balls that were tracked by HITf/x are:
Eight of the 10 pitchers in the table had an average fastball speed in 2014 above the league average, and Richards, who earned the top spot in the list, enjoyed one of the highest average fastball speeds in the majors. The success of the two softer-tossing pitchers on the list was due in part to an exceptional sinker for Keuchel and an exceptional split-fingered fastball for Cobb. An interesting topic for future research will be to study pitcher characteristics that lead to low values of I.
Value for Modeling and Forecasting
The statistics that measure the intrinsic quality of contact for batters and pitchers are influenced less by random variation from contextual variables than traditional statistics that depend on batted-ball outcomes. In addition, the new statistics can be used to separate the various skill components that contribute to a player’s performance on batted balls.
A batter’s performance, for example, can be partitioned into statistics that measure his intrinsic contact, running speed, and batted-ball distribution, which determines susceptibility to defensive shifts. An important advantage of generating separate statistics to represent distinct skills is that each statistic can be regressed and projected using its individual reliability and aging curve during forecasting.
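As a hedged illustration of what regressing a statistic by its reliability can mean, the Python sketch below applies the standard shrinkage formula, which pulls an observed value toward the league mean in proportion to how unreliable the statistic is. This article does not specify a particular regression procedure, so the formula choice, the function name and all of the numbers are assumptions.

```python
def regress_to_mean(observed, league_mean, reliability):
    """Shrink an observed statistic toward the league mean.

    reliability is the share of the observed deviation expected to persist:
    1.0 keeps the observed value, 0.0 returns the league mean.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return league_mean + reliability * (observed - league_mean)


# Hypothetical values: the same observed figure regressed with two reliabilities.
league_mean = 0.370
observed = 0.450
more_reliable = regress_to_mean(observed, league_mean, reliability=0.6)  # 0.418
less_reliable = regress_to_mean(observed, league_mean, reliability=0.3)  # 0.394
```

The appeal of separate statistics is visible even in this toy case: a component with high reliability keeps most of its observed signal, while a noisier component is pulled much closer to average.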
The new statistics also allow us to investigate how players control quality of contact. We observed, for example, that many of the pitchers who were the most effective at controlling contact also exhibited an above-average fastball velocity. Given the wealth of descriptors measured by the PITCHf/x system, we have the opportunity to characterize the relationship between the quality of a pitcher’s opponent contact and his distribution and sequencing of pitches. Similarly, we can study the relationship between a batter’s intrinsic contact and his swing parameters.
I am grateful to Sportvision for providing the HITf/x data which made this work possible. I also thank Alan Nathan and Tom Tango for helpful comments on a previous draft of this article. I am happy to acknowledge the help of Qi Shi in the preparation of this document.
References & Resources
- A. Beaton. (July 27, 2014). Jose Abreu: the champion of the cheap home run.
- B. Bowman and J. Inzerillo. MLBAM: Putting the D in data. Sloan Sports Analytics Conference, Boston, 2014.
- J.C. Bradbury. (May 24, 2005). Another look at DIPS.
- J.C. Bradbury. Peak athletic performance and ageing: Evidence from baseball. Journal of Sports Sciences, 27(6):599–610, 2009.
- B. Cartwright. What ground balls can tell us about fly balls. The Hardball Times Baseball Annual, 2012, pages 249–254. ACTA Sports, Chicago, 2011.
- B. Efron and C. Morris. Stein’s paradox in statistics. Scientific American, 236(5):119–127, 1977.
- M. Fast. (Nov. 16, 2011). Who controls how hard the ball is hit?
- M. Fast. (Nov. 22, 2011). How does quality of contact relate to BABIP?
- G. Healey. Technical addendum to The Intrinsic Value of a Batted Ball, 2016.
- P. Jensen. (Jun. 30, 2009). Using HITf/x to measure skill.
- M. Lichtman. (Feb. 29, 2004). DIPS revisited.
- V. McCracken. (Jan. 23, 2001). Pitching and defense: How much control do hurlers have?
- A. Nathan. (Dec. 24, 2015). Optimizing the swing, part deux: Paying homage to Teddy Ballgame.
- N. Silver. Why was Kevin Maas a bust? Baseball Between the Numbers, pages 253–271. Basic Books, New York, 2006.
- M. Swartz. (Dec. 15, 2010). Ground-ballers: better than you think.
- M. Swartz. (Mar. 17, 2010). Why SIERA doesn’t throw BABIP out with the bathwater.
- T. Tango, M. Lichtman, and A. Dolphin. The Book: Playing the Percentages in Baseball. Potomac Books, Dulles, Va., 2007.
- T. Tippett. (Jul. 21, 2003). Can pitchers prevent hits on balls in play?
- wOBA and FIP constants.