Improving Projections with Exit Velocity

When the topic is exit velocity, the conversation begins with Giancarlo Stanton. (via Arturo Pardavila III)

Batted-ball exit velocity is all the rage these days. With MLB’s rollout of Statcast last year, exit velocity and other batted-ball metrics have become a part of the casual baseball fan’s vocabulary. Quite simply, it’s cool to be able to see how fast Giancarlo Stanton’s laser-shot home runs leave the bat.

 
And for a sabermetrics geek like me, batted-ball data appears to be the path forward in the everlasting quest for more accurate player analysis.

For the past seven months, Jared Cross and I have been working on a project to dig deeper into the Statcast data and put it to use. As the first fruits of our labor, we are releasing estimates of players’ average exit velocities for the 2012 through 2014 seasons. In addition, we are making a first attempt at adjusting players’ 2016 Steamer projections based on their 2015 average exit velocities. Not only are we excited by the potential uses of exit velocity information, but we believe its introduction signifies the beginning of a new era in baseball projections.

Fundamentally, building an accurate player projection system for hitters is about identifying skill and filtering out luck. In baseball, of course, there’s a lot of luck involved. We’ve all seen the scorching line drive that’s caught for an out or the soft dribbler that rolls down the third base line for a double. The key here is to focus on the process of hitting and not the outcome, because that is what the hitter can control.

We care not about whether a player ends with a hit, but whether he is carrying out the process that is conducive to good hitting. That is, hitting the ball hard and hitting it squarely. By looking at batted-ball data, we can figure out just that. Batted-ball metrics are defense-independent, and they let us filter out the pesky batted-ball luck we traditionally regress hitters’ BABIP to counteract. We think we also can use them to better predict a hitter’s home run power.

As you probably know, these kinds of luck work themselves out as the sample size of the data gets larger. But home run rate only stabilizes after 170 plate appearances or so. BABIP takes a whopping 820 balls in play to stabilize. So traditional outcome-stat-based projection systems need to use a large sample size of data for accuracy. What this means is that the projections for hitters who have had only a few plate appearances or whose skills have changed recently are not going to be very well informed. The goal of implementing batted-ball data is to reduce this minimum sample size necessary to make an accurate estimate of a player’s ability.

While Statcast appears to be the future of batted-ball data and can be credited for making our research possible, the system in its current form is not without its flaws. Probably the most publicized issue with Statcast has been gaps in the data—batted balls on which the exit velocity, for whatever reason, is not recorded. It has been found that this problem does not occur randomly, and some types of batted balls are, in fact, more likely to be missed than others.

Less well known, but perhaps more troubling, is the number of obviously bogus exit velocity readings scattered throughout the data we have. Take, for example, Noah Syndergaard’s first major league home run last year: Statcast recorded its velocity as 59 mph, which obviously is way off. Combine these bugs and holes with the fact that we have Statcast data for only a single full season, and it’s clear why we wanted to make our data set more robust. By the end of this season, the story could be much different, as MLB Advanced Media is already improving Statcast data.

For now, we decided to get creative. Through FanGraphs, we were given access to two other useful pieces of data: batted-ball distance from Baseball Info Solutions (BIS) and batted-ball hang time from Inside Edge. These data have the advantage of having very few missing data points, and luckily, they date all the way back to 2012.

The one disadvantage to using these data is that they exist only for line drives and fly balls, not for ground balls. The good news, though, is that we believe exit velocity on ground balls isn’t as valuable. Players’ ground ball exit velocities in the 2015 Statcast data fluctuated more than their exit velocities on line drives and fly balls. We ran an experiment in which we split the season into two essentially random halves (odd days and even days) and found that a player’s ground ball exit velocity was less than half as predictive from half to half as his exit velocity on line drives and fly balls (a correlation of 0.30 versus 0.68). Because of this, we thought we could judge players almost as well just using data for line drives and fly balls.

EVEN/ODD DAY EXIT VELOCITY CORRELATIONS
Group         E-GB Vel   O-GB Vel   E-notGB Vel
O-notGB Vel   0.26       0.18       0.68
E-notGB Vel   0.23       0.19
O-GB Vel      0.30
Notes:
1. E=even group, O=odd group.
2. According to the 2015 Statcast data, ground balls are hit an average of six mph softer than non-ground balls, which means we’re overestimating players’ overall exit velocities by failing to include grounders. How much we are overestimating depends on the individual player’s groundball rate, but for a league-average player, we think we would be overestimating by about two mph.
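For readers who want to try the even/odd split themselves, here is a minimal Python sketch of the idea. It is not the code we used. The input format and the `split_half_correlation` name are hypothetical, the day-of-month parity rule is an assumption about how "odd days and even days" might be defined, and the demo data are randomly generated rather than real Statcast readings.

```python
import random
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_correlation(batted_balls):
    """batted_balls: iterable of (player, day_of_month, exit_velocity_mph).

    Averages each player's exit velocity separately on even and odd days,
    then correlates the two averages across players present in both halves.
    """
    # Per player: [even_sum, even_count, odd_sum, odd_count]
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for player, day, velo in batted_balls:
        offset = 0 if day % 2 == 0 else 2
        totals[player][offset] += velo
        totals[player][offset + 1] += 1
    even_avgs, odd_avgs = [], []
    for even_sum, even_n, odd_sum, odd_n in totals.values():
        if even_n and odd_n:
            even_avgs.append(even_sum / even_n)
            odd_avgs.append(odd_sum / odd_n)
    return pearson(even_avgs, odd_avgs)


# Synthetic demo (made-up numbers): each hitter has a "true" velocity plus noise.
rng = random.Random(0)
demo = []
for player in range(200):
    true_velo = rng.gauss(92, 3)
    for _ in range(60):
        demo.append((player, rng.randint(1, 30), rng.gauss(true_velo, 12)))

r = split_half_correlation(demo)
print(round(r, 2))
```

Running the same function once on ground balls and once on line drives and fly balls would give the two diagonal comparisons in the table.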

Here is a graph of hang time, distance, and exit velocity on 2015 line drives and fly balls for which we have Statcast.

[Figure: Hangtime vs Distance]

You can see that if you know the distance and the hang time of a batted ball, you can pretty easily estimate what its velocity would be. This being true, we made a model that would spit out an estimate of a batted ball’s velocity from its hang time and distance. Now we would be able to plug in any batted ball dating back to 2012 for which hang time and distance were recorded and estimate its velocity. If you’re interested, you can see the R code we used to create the model here.

[Figure: Hangtime vs Distance (Model)]
Note: When creating our model, we removed batted balls for which Statcast data was suspected to be bogus, as identified by contradictions between the ball’s Statcast exit velocity and its BIS “hard”/“medium”/“soft” classification.
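To give a feel for what "a model that estimates velocity from hang time and distance" can look like, here is a deliberately simple Python sketch. It is not our model (that one lives in the linked R code). This stand-in is a plain least-squares fit with an intercept, a distance term and a hang-time term. The function names are hypothetical, and the training points come from a made-up linear rule, not from Statcast, BIS or Inside Edge.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
         + [sum(row[i] * y for row, y in zip(rows, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for idx in range(k):
            if idx != col:
                factor = a[idx][col] / a[col][col]
                a[idx] = [x - factor * p for x, p in zip(a[idx], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def features(distance_ft, hang_time_s):
    """Intercept, distance and hang time: a deliberately simple feature set."""
    return [1.0, float(distance_ft), float(hang_time_s)]


def estimate_velocity(coefs, distance_ft, hang_time_s):
    """Estimated exit velocity (mph) for one batted ball."""
    return sum(c * f for c, f in zip(coefs, features(distance_ft, hang_time_s)))


# Synthetic training set generated from a made-up linear rule (not real data).
train = [(d, h, 55 + 0.08 * d + 3.0 * h)
         for d in range(150, 451, 50) for h in range(1, 7)]
coefs = fit_least_squares([features(d, h) for d, h, _ in train],
                          [v for _, _, v in train])
print(round(estimate_velocity(coefs, 380, 5.2), 1))
```

A real fit would be trained on batted balls with trusted Statcast velocities and would likely need a more flexible shape than a straight plane, since the relationship in the graph is clearly curved.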

We were actually pretty impressed by the accuracy of the model. When we plugged in hang times and distances of batted balls from 2015 on which we knew the actual velocity from Statcast, we found our model’s estimates were off by only about two mph on average. And remember how long traditional outcome stats took to stabilize? Using this newly derived exit velocity data, we were able to determine that a player’s average exit velocity on line drives and fly balls stabilized at roughly 20 batted balls, which would be reached around 50 plate appearances.
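One way to make "stabilized at roughly 20 batted balls" concrete is the following Python sketch. It rests on an assumption the text above does not spell out: that "stabilized" means the sample size at which split-half reliability reaches 0.5, with reliability following r = n / (n + k). Other thresholds are also in use, so treat the functions (hypothetical names) and the 92 mph league mean (a made-up figure) as illustration only.

```python
def stabilization_point(sample_size, split_half_r):
    """Sample size k at which reliability would reach 0.5.

    Assumes r = n / (n + k) and solves for k.
    """
    return sample_size * (1 - split_half_r) / split_half_r


def regress_to_mean(observed, league_mean, sample_size, k):
    """Shrink an observed average toward the league mean by n / (n + k)."""
    weight = sample_size / (sample_size + k)
    return league_mean + weight * (observed - league_mean)


# A hitter averaging 96 mph over 20 line drives and fly balls, with k = 20
# and a hypothetical league mean of 92 mph, is regressed halfway back.
print(regress_to_mean(96.0, 92.0, 20, 20))
```

Under this convention, a stat with a small k needs little regression even in small samples, which is why a low stabilization point matters for players with few plate appearances.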

Most importantly of all, by estimating exit velocity in this way, we were able both to avoid the bugs associated with Statcast and to derive exit velocity data for seasons before Statcast became available. We’re excited to share with you a table of each player’s average exit velocities for seasons 2012 through 2015, adjusted for park effects (it’s important to correct for park effects when estimating velocity this way—a ball hit at a given velocity at Coors Field will travel farther and hang longer than a ball hit at the same velocity at, say, Minute Maid Park):

You can also view the whole sheet here.

Having this reliable exit velocity data going back to 2012 is pretty awesome in itself, but let’s be clear — the big question is whether it can help us better evaluate players. We had a feeling it might be pretty useful, but we wanted to know for sure, so we compared players’ average velocities to both their actual stat lines and their Steamer projections. We wanted to see not just whether players with higher velocities hit better in general, but if they outperformed their projections.

To start off, we ran a regression analysis comparing players’ Weighted On-Base Average (wOBA) to their average exit velocities from the previous season. (We matched up wOBA from 2013, 2014, and 2015 with exit velocity from 2012, 2013, and 2014, respectively. There were 1,028 players in our sample, each with at least 50 fly balls or line drives in the prior year and at least 50 plate appearances in the projected year.)

PREDICTED WOBA FROM PRIOR YEAR EXIT VELOCITY
Term                       Coefficient (standard error)   p-value
Prior year exit velocity   0.0083 (0.0005)                3 x 10^-53

We found that for each mph of exit velocity a hitter is above league average, we can expect him to put up eight additional points of wOBA the next season, which is pretty big. This tells us that players with higher exit velocities are, in fact, hitting better overall. But we also wanted to know if they were outperforming their projections, so we ran a similar regression but included Steamer projections as a variable.

PREDICTING WOBA USING STEAMER AND PREVIOUS SEASON EXIT VELOCITY
Term                       Coefficient (standard error)   p-value
Steamer                    0.754 (0.044)                  < 2 x 10^-16
Prior year exit velocity   0.0028 (0.0006)                3.6 x 10^-7
Intercept                  -0.176

When used in conjunction with Steamer, the impact of exit velocity was still significant, both statistically and practically. Holding a hitter’s Steamer projection fixed, we can expect roughly three more points of wOBA for each additional mph of previous-season exit velocity. (We found a similar effect when using either ZiPS projections or an average of ZiPS and Steamer.)
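As a rough illustration of how the regression table translates into an adjusted projection, here is a small Python sketch that simply plugs in the published, rounded coefficients. It assumes exit velocity enters the equation in raw mph (the table does not state the units explicitly), and the `adjusted_woba` name and the two example hitters are hypothetical.

```python
# Coefficients as published (rounded) in the regression table.
INTERCEPT = -0.176
STEAMER_COEF = 0.754
EXIT_VELO_COEF = 0.0028  # wOBA per mph of prior-year exit velocity


def adjusted_woba(steamer_woba, prior_exit_velo_mph):
    """Exit-velocity-adjusted wOBA from the fitted linear equation.

    Assumes prior_exit_velo_mph is the park-adjusted average on line drives
    and fly balls, in raw mph.
    """
    return (INTERCEPT
            + STEAMER_COEF * steamer_woba
            + EXIT_VELO_COEF * prior_exit_velo_mph)


# Two hypothetical hitters with the same .320 Steamer projection.
soft = adjusted_woba(0.320, 90.0)
hard = adjusted_woba(0.320, 95.0)
print(round(hard - soft, 3))
```

Because the coefficients are rounded, results from this sketch will differ slightly from any table built on the unrounded fit.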

Seeing the predictive value of previous-season velocity, we decided to make a table of exit velocity-adjusted Steamer wOBA projections for the 2016 season. This could be seen as a sort of “first taste” of what a batted-ball based projection system for hitter stats could look like:

You can also view the whole sheet here.

While this rough adjustment should indeed be an improvement over using Steamer alone, we do want to caution that it’s probably not the best way to use exit velocity to adjust projections. Look, for example, at two players with similarly high 2015 exit velocities, David Ortiz and Miguel Sano. Ortiz has been an elite hitter for several seasons, over the course of which his exit velocity always has been high. So Steamer, using his stats from those seasons, will give him a projection that’s already reflective of a hitter with high exit velocity. Sano, on the other hand, has played only part of a season in the major leagues, so his high exit velocity won’t be fully cooked into his Steamer projection. Our adjustment would give Sano and Ortiz about the same increase in wOBA, but in reality, Sano probably deserves a bigger bump up than Ortiz.

We also want to caution that a hitter may have an outstanding exit velocity but be far from a perfect hitter. A good example is our 2015 velocity leader, Joey Gallo. We certainly would expect Gallo to do better than a similar player with a less-impressive average exit velocity. But it matters not just how hard you hit the ball, but how often you hit it. Gallo had a crazy 46 percent strikeout rate last season, so while we know he can hit the ball hard when he puts it in play, he won’t have many opportunities to put the ball in play unless he can cut down on the strikeouts.

Finally, it’s important to pay attention to other batted-ball metrics besides exit velocity. Average vertical launch angle, whether measured precisely in degrees, or coarsely in terms of flyball, line-drive, pop-up and groundball rates, also plays a role. Regardless of their velocity, balls hit straight up into the air or straight down into the ground usually are the result of poor contact with the bat and are unlikely to produce hits. In fact, it has been found that ground balls as a category are the least valuable type of batted ball for run production. This works against a hitter like Pedro Alvarez, who, while he put up an impressive 2015 average exit velocity, had a high 52 percent groundball rate last season.

All in all, exit velocity is only one piece in the puzzle of evaluating hitters. It’s a big step forward, though, and there are more advancements on the way. We expect to see an improvement in both the reliability and scope of Statcast from 2016 on, including the addition of vertical launch angle data on all batted balls. And we’re hoping in the near future to release a more comprehensive batted-ball-based player projection system, one that takes into account a range of factors such as launch angle, handedness, defensive shifts, and running speed. So stay tuned: there’s more batted-ball fun right around the corner, and we think the best is yet to come.

William Sapolsky is a high school senior at Saint Ann’s School in Brooklyn. He is an avid Mets fan, and enjoys playing the piano and cooking in his free time. Follow him on Twitter @WilliamSapolsky.
22 Comments
Jeff Zimmerman
7 years ago

Great work.

I love the “Hangtime vs Distance (Model)”, but it is a little hard to read in places with the transitions. Is there any way to get the graph with just lines showing the breaks or a spreadsheet of the data?

Again, great work.

Peter Jensen
7 years ago

William – It is admirable that you and Jared have come up with an Exit Speed estimator from the BIS distance data and Inside Edge hang time. But your estimator will only be as good as the input data, and while hang time is relatively easy to estimate well from video, accurate hit ball distance is not. BIS sources have told me that they are happy if they can get multiple viewers to agree within 15 feet of estimated distance. Coupled with the fact that back spin or top spin on a hit ball will create a considerable variation in distance for balls hit at identical exit speeds and vertical angles, there would be a fairly large error range between estimated Exit Speed and actual Exit Speed for any individual hit ball.

As far as producing a model for using hit ball data to estimate future player projections, you may want to review the work done by researchers during June and early July of 2009 on the publicly released Hit Fx data including my own contribution of “Using Hit Fx to Measure Skill” published here at THT. Mike Fast, Harry Pavlidis, Max Marchi, Matt Lentzer and others also wrote articles or gave important presentations at the 2009 Sportvision Summit on the various aspects of how hit ball data from Hit Fx could be used. So far I have seen nothing in the recent articles on StatCast data that demonstrates improvement on the work that was done then. Eventually there will be, and it will come from a new generation of researchers like yourself, but only if you make yourself aware of what has already been accomplished by analysts in the past.

Incidentally, it is a little ironic that you are using distance and hang time to estimate hit ball speed. Hit Fx was originally an interim proxy developed from my failure to be able to accurately estimate hit ball landing location and hang time from the initial hit ball variables. Accurate landing location and hang time are the holy grail of sabermetric data that will ultimately produce the best metrics for hitting, pitching, and fielding. When all the bugs are worked out StatCast should be able to provide those data.

tz
7 years ago
Reply to  Peter Jensen

Peter, what are your thoughts on the framework used here, if accurate exit speeds had actually been available for the period of the regression study?

Peter Jensen
7 years ago
Reply to  tz

Even if Exit Speeds are perfectly known for every hit ball, and there are problems with both Hit Fx and Trackman systems, there is only so far that you can go building a projection system built on that data. And looking at average Exit speed is a particularly bad use of hit ball speed data.

tz
7 years ago
Reply to  Peter Jensen

Thanks. I think there’s some value in working on an analysis framework on flawed data as long as those flaws are transparently caveated – the value would come in being able to redeploy that framework once better data comes along.

In fact, one good use for such analysis is making the case for using distribution of exit speed instead of average exit speed. Intuition would suggest that two fly balls each hit with enough exit velocity to land right in the CF’s glove are less valuable than one hit 15 mph slower for a Texas Leaguer and one 15 mph faster for a home run. Even allowing for a generous margin of error, the existing exit speed data allows us to confirm that.

tz
7 years ago
Reply to  Peter Jensen

Thanks. I believe that there is some value in putting together models like this while the data is still quite spotty, so you don’t have to build from scratch once solid data becomes available. But I fully agree about the use of average exit speed – you need to reflect the full distribution of exit velocities somehow in such a model.

tz
7 years ago
Reply to  tz

oops – thought that my initial response didn’t make it through.

Mike P
7 years ago

Hi. Not sure where the “Syndergaard 59 mph HR” thing came from, but it’s shown here as being 105.6 mph.

http://atmlb.com/24u5KSV

J. Cross
7 years ago
Reply to  Mike P

We noticed that this was fixed on the new site. It looks like much of the data that was missing before has also been filled in.

William Sapolsky
7 years ago
Reply to  Mike P

J. Cross is right. We did this project using an older Statcast data file which had a lot of missing/mislabeled data points. At that point, we thought that it might be the best that would ever be available for 2015. Luckily, MLB/Baseball Savant are now providing better data both going forward into 2016 and for the completed 2015 year. Most exciting of all is the vertical launch angles, which are now published for nearly all 2015 batted balls.

As for why so many of the old data points were mislabeled, I think what was happening was that batted balls that went into play were getting confused with earlier foul balls in the at bat. So Syndergaard actually hit a 59 MPH foul ball earlier in the at bat and Statcast published that number instead of the 104 MPH home run.

Kevin Morse
7 years ago

Nice work. In my opinion, you should look at the research of Perry Husband as he was the original pioneer of understanding and improving exit speeds. My company has been assessing exit speeds for over 7 years and we rank HS players (soon to be college as well) on National, State and County Leaderboards. Would enjoy connecting with you.

Peter Jensen
7 years ago
Reply to  Kevin Morse

Perry Husband’s Effective Velocity system for pitch selection and sequencing deserves more attention than it’s been given. Unfortunately, Husband’s Well Hit Average is not a particularly good measure of hitting ability because it puts too much emphasis on Exit Speed.

Russell A. Carleton
7 years ago

Current mind status: blown.

Pete
7 years ago

“So Steamer, using his stats from those seasons, will give him a projection that’s already reflective of a hitter with high exit velocity. Sano, on the other hand, has played only part of a season in the major leagues, so his high exit velocity won’t be fully cooked into his Steamer projection. Our adjustment would give Sano and Ortiz about the same increase in wOBA, but in reality, Sano probably deserves a bigger bump up than Ortiz.”
I feel like there has to be a way to account for this, no?

Alan Nathan
7 years ago

I would be interested in knowing more about your algorithm for obtaining exit speed from distance and hang time. I realize I could go to your R code that you linked, but perhaps you could explain in words what you do. In particular, is it “statistics based” (i.e., some kind of regression analysis) or “physics based”? As you might guess, I would be particularly fond of the latter, less so of the former.

Alan Nathan
7 years ago

One more point I meant to add: If a physics-based extraction technique is used, then you can probably get the launch angle also. That is, there is (sort of) a one-to-one correspondence between (exit speed, launch angle) and (distance, hang time). The “sort of” caveat is to remind us that life is usually more complicated, as exit speed and launch angle do not uniquely determine distance and hang time. But they probably come pretty close for the type of analysis you want to do.

derp
7 years ago

Hello, I have personally developed a wOBA estimator based upon launch angle and exit velocity. I approached the problem from a totally different perspective. In your Sano example, I give him (adjusted to your numbers) a .378 wOBA. On the flip side, I gave Ortiz .438 (and he’s batting .440).

My system uses the launch angle as a key factor in how balls are evaluated, and I hope to write up a report of how it works and my findings soon. I’m private on twitter at the moment, but if you follow me, I’ll follow you back and we can talk about it. @derpymets

Alan Nathan
7 years ago

In my recent article, http://www.hardballtimes.com/going-deep-on-goin-deep/, I used Statcast data from 2015 to fine-tune my algorithm for calculating trajectories of batted balls. I just used that model to investigate the relationship between (exit speed, launch angle) and (distance, hang time), at least for a standard air density and no wind. Since the parameters of the model have already been adjusted to best fit the Statcast distance data (but not hang time, which I did not have), it should be the best physics-based method available for relating the two sets of quantities. Contact me privately if you want to discuss.

Boaty McBoatface
7 years ago

My idea would be to create a neutral ballpark and then assign each batted ball a value based on exit velocity, launch angle and direction. Adjust all non-HR for speed and all bip that might not clear infield by ground ball pull frequency.