Why Are Ground Balls Hit?

Zach Britton is, without question, baseball's best ground ball pitcher. (via kowarski)

Zach Britton is MLB’s best groundball pitcher. His groundball rates in each of the past two years—twice eclipsing 77 percent—are the highest figures on FanGraphs’ all-time leaderboard. If you don’t watch many Orioles games, you might assume Britton induces his grounders chiefly with sinkers at the bottom of the strike zone. Keeping the ball down, after all, is the sinkerballer’s credo, but the Orioles closer doesn’t fit into that mold.

[Figure: brittonKzone]
Against righties, there are ample pitches at and below the knees, but the thicker clusters are in the heart of the strike zone. Similarly, most grounders coming off lefties’ bats came on middle-middle pitches. What gives? The platitude about keeping the ball down hardly applies to Britton’s grounders, and it might be overblown for other pitchers, too. Many other pitch and contextual factors could play a part in predicting grounders. Which matter most?

Set-up

The plan here will be to predict whether or not a batted ball will be a grounder using the roughly three dozen variables below.

MODEL FACTORS TO TEST
Category: Inputs
Pitch Attributes: Vertical location, horizontal location, vertical movement, horizontal movement, velocity, release height, break angle, spin rate, arm angle
Situational: Handedness, plate count, leverage index, is double play situation, is sac fly situation, is two out and nobody on, year
Batter/Pitcher Talent: Batter’s projected GB%, batter’s projected HR%, entropy of pitcher’s arsenal
Previous Pitch Attributes: Vertical location, horizontal location, vertical movement, horizontal movement, velocity, break angle, is fastball, is offspeed, is breaking ball, is hard fastball inside, is slow fastball inside, pitcher pace
Previously in the PA: Total number of hard fastballs thrown inside before that pitch (THIFB), total number of slow fastballs thrown inside before that pitch (TSIFB)

Each of these inputs is its own individual question. Is a grounder more likely when the pitcher throws with a sharp downward plane? What if the pitcher mixes his pitches well (measured by entropy)? What if the situation calls for the batter to loft the ball into the outfield? To what extent does previously pitching inside matter? We’ll put these questions through a model that will answer each simultaneously: a decision tree model with boosting.
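
For readers unfamiliar with the entropy input, here is a minimal sketch of one way to score how evenly a pitcher mixes his pitches. It assumes plain Shannon entropy over pitch-type frequencies; the exact formula behind this article's variable isn't spelled out, and the function name and sample pitch logs are made up for illustration.

```python
import math
from collections import Counter


def arsenal_entropy(pitch_types):
    """Shannon entropy (in bits) of a pitcher's pitch-type mix.

    Assumption: entropy is computed from raw pitch-type frequencies.
    0 means one pitch only; higher means a more even mix.
    """
    counts = Counter(pitch_types)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical pitch logs, not real data.
one_pitch_guy = ["SI"] * 100
even_mixer = ["FF"] * 25 + ["SL"] * 25 + ["CH"] * 25 + ["CB"] * 25

one_pitch_score = arsenal_entropy(one_pitch_guy)  # no mixing at all
even_mix_score = arsenal_entropy(even_mixer)      # four pitches used equally
```

Under this definition, a pitcher who leans almost entirely on one pitch scores near zero, while a pitcher who splits evenly among four pitches scores two bits.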

You’ve likely come across a standard decision tree, which models an event by taking a data set and stratifying it into spaces. Those trees are nicely straightforward and take a simple flow chart form. But a single decision tree suffers from high variance: it tends to overfit its training data and then do poorly in out-of-sample testing with new data.

That weakness can be addressed by fitting many trees rather than one, motivating the use of boosting. This adds power to the decision tree framework by fitting each new tree to the residuals left by the trees already in the model. Over many tree-fitting iterations, this “slow learning” process attacks areas where the model is underperforming and adapts to provide more spot-on predictions. Because the model allows for non-linear relationships and accounts for any potential interaction effects, it’s suitable for spatial data (like pitch locations). Its output won’t be a tidy coefficient formula, but we’ll get the percentage influence of each factor as part of the whole.
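
To make the residual-fitting idea concrete, here is a toy sketch in Python. It is not the model used in this article: it boosts one-split trees (stumps) on a single made-up variable with squared-error residuals, and the names fit_stump, boost and predict, the learning rate, and the data are all assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the current residuals."""
    best = None
    # Skip the largest x: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values to split")
    return best[1:]


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the average, then repeatedly fit a stump to what is still unexplained."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # "Slow learning": apply only a fraction of each stump's correction.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Add the base rate and every stump's shrunken correction for one value of x."""
    total = model["base"]
    for threshold, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= threshold else right_mean)
    return total


# Made-up data: pitch height in inches relative to the knees, and 1 if a grounder.
heights = [-6, -4, -2, 0, 2, 4, 6, 8, 10, 12]
grounders = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

model = boost(heights, grounders)
low_pitch = predict(model, -4)   # predicted grounder rate for a low pitch
high_pitch = predict(model, 10)  # predicted grounder rate for a high pitch
```

In this toy setup the low pitch ends up with a much higher predicted grounder rate than the high one. A real boosted model would use deeper trees, many variables, and a loss function suited to yes/no outcomes.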

I’ll run individual models for three pitch-type groups: fastball, offspeed, and breaking. The broad pitch-type categories (detailed in the appendix) will allow us to home in on the traits necessary to get a grounder out of a fastball, offspeed pitch, or breaking pitch. For instance, the fastball model can simultaneously address the difference between four-seamers with a bit of tail vs. sinkers with great movement. Pitches on 0-0 counts are excluded at this stage so the previous pitch attributes can be evaluated in full.

After fine-tuning through cross-validation, the models perform reasonably well. By area under the ROC curve (AUC), my predictions would be regarded as fairly good, as per this primer from the University of Nebraska.
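
AUC can be read as the probability that a randomly chosen grounder receives a higher predicted probability than a randomly chosen non-grounder, with 0.5 meaning no better than a coin flip and 1.0 meaning perfect ranking. A minimal sketch of that calculation, using made-up labels and scores rather than this article's data:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive case (1) outscores a negative case (0);
    ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical batted balls: 1 = grounder, 0 = not, with a model's predicted probabilities.
was_grounder = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]

example_auc = auc(was_grounder, predicted)  # 8 of the 9 grounder/non-grounder pairs are ordered correctly
```

The pairwise loop is fine for a handful of rows; for a full season of batted balls, a rank-based formula or a statistics library would be the practical choice.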

Results

The influence scores reported by the boosted trees are normalized to sum to 100, showing the relative importance of each of the variables passed into the models. Pitch attributes will be our starting point, as they do the heavy lifting in explaining grounders. First we’ll put the sinkerballer’s credo to the test, considering vertical location’s influence results and its underlying marginal effects: modeled values of groundball percentage as the other factors are taken at their averages.
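
Both ideas in that paragraph can be sketched in a few lines. This is an illustration under assumptions, not the article's code: the raw importance scores are made up, toy_model is a hypothetical stand-in for a fitted boosted model, and the marginal effect is computed the way the text describes it, by sweeping one factor while holding the others at their averages.

```python
def normalize_influence(raw_importance):
    """Rescale raw importance scores so they sum to 100."""
    total = sum(raw_importance.values())
    return {name: 100.0 * value / total for name, value in raw_importance.items()}


def marginal_effect(model, rows, feature, grid):
    """Model output as one feature sweeps a grid while the rest sit at their averages."""
    averages = {name: sum(row[name] for row in rows) / len(rows) for name in rows[0]}
    curve = []
    for value in grid:
        point = dict(averages)
        point[feature] = value
        curve.append((value, model(point)))
    return curve


# Made-up raw scores, chosen so the shares come out to about 19.5, 22.8 and 57.7.
raw = {"vertical_location": 39.0, "vertical_movement": 45.6, "everything_else": 115.4}
influence = normalize_influence(raw)


def toy_model(row):
    """Hypothetical stand-in for a fitted model: maps a row of inputs to a GB probability."""
    return 0.60 - 0.014 * row["vertical_location"] - 0.035 * row["vertical_movement"]


# Two made-up pitches (inches above the knees, inches of vertical movement).
rows = [
    {"vertical_location": 0.0, "vertical_movement": 4.0},
    {"vertical_location": 12.0, "vertical_movement": 8.0},
]
curve = marginal_effect(toy_model, rows, "vertical_location", [0, 6, 12, 18])
```

Each entry in curve pairs a pitch height with the modeled grounder probability at that height, which is the kind of line plotted in the charts that follow.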

In this and several other charts, you’ll notice in the x-axes that PITCHf/x factors were rescaled so lefties and righties of various heights can be compared analogously. Here, a knee-high pitch takes a value of zero.

INFLUENCE OF VERTICAL LOCATION ON GB%
Fastball Offspeed Breaking
19.5% 23.0% 19.8%

[Figure: vertical_location]
Vertical location clearly is important. In terms of influence, it’s the most vital factor in offspeed and breaking-ball predictions and a close second in fastball predictions. Still, pitchers should recognize that it’s just ~20 percent of the equation.

What are the practical effects of a well-located pitch? With similarly interweaving curves, it’s clear that down is universally better. One notable difference here is that the curves for secondaries curl up on the right side—showing that as pitchers throw breaking balls and offspeed pitches higher and higher, GB% flattens out and stops dropping. Contrast that with fastballs, for which GB% continues to get worse as pitches are thrown belt-high and above. Locating an otherwise-identical pitch an inch lower raises the probability of a grounder by 1.4 percent on fastballs and 1.2 percent for offspeed and breaking balls.

Let’s now consider the horizontal component of location.

INFLUENCE OF HORIZONTAL LOCATION ON GB%
Fastball Offspeed Breaking
9.8% 17.7% 13.2%

[Figure: horizontal_location]
By influence, horizontal location matters quite a bit more for offspeed than hard or breaking stuff, but all chart curves here follow a sinusoidal path. Throwing inside to a batter is a bit better than grooving one (obviously); after passing by the middle of the plate, GB% climbs and climbs the farther outside a pitch is from a batter. I would have guessed that pounding hitters inside with riding sinkers or cutters is a reasonably effective way to get grounders, but that’s hardly the case. Instead, as long as a pitch is thrown at least from the middle of the plate (at dist ≈ 2.25) and outward, each additional inch outside will result in GB% rises of 1.7 percent for fastballs, 1.4 percent for offspeed, and 1.3 percent for breaking balls.

Moving down the line, we’ll next look at movement.

INFLUENCE OF VERTICAL MOVEMENT ON GB%
Fastball Offspeed Breaking
22.8% 14.3% 13.0%

[Figure: vertical_movement]
The most interesting takeaway here is that vertical movement is the most crucial factor for fastballs. Heat’s percentage importance nearly doubles the rates owned by secondaries, and its marginal GB% changes faster.

Still, for all pitches, the more downward movement (the more negative the number), the better the resulting GB%. “Rising” pitches, those with positive movement, aren’t good for grounders. The penalty eases up slightly at the highest rungs of movement, although the curves are still strongly linear (with correlations in excess of 0.98 in all three instances). All else equal, each additional inch of downward movement will increase GB% by 3.5 percent for fastballs, 2.2 percent for offspeed pitches, and 1.7 percent for breaking balls.

INFLUENCE OF HORIZONTAL MOVEMENT ON GB%
Fastball Offspeed Breaking
9.2% 1.6% 1.7%

[Figure: horizontal_movement]
Lateral movement is among the most important characteristics for fastballs, comprising nearly 10 percent of the recipe. The chart shows that fastballs moving in towards hitters (with negative movement) are effective groundball pitches; those are cutters to opposite-handed hitters and sinkers to same-handed hitters. For each additional inch of inward movement beyond 1.5 inches, a pitcher can raise his fastball GB% by 2.5 percent. Any outward movement beyond that hurts the groundball effort.

It’s easy to envision a hitter swinging at a fading changeup and weakly grounding out to the pull side. But the extent to which an offspeed (or breaking) pitch moves away from a batter is of no consequence. This is reflected in small influence figures and flat marginal curves. If anything, sliders and changeups moving in towards batters are a teeny tiny bit more effective, but in the end, lateral movement shouldn’t be part of the pitcher’s calculus if he’s looking to turn an offspeed or breaking pitch into a grounder.

Velocity is a big part of Britton’s ability to overpower hitters; how does it help groundball percentage?

INFLUENCE OF VELOCITY ON GB%
Fastball Offspeed Breaking
3.6% 10.1% 10.6%

[Figure: velocity]
In terms of influence, velocity is roughly three times as important for secondary pitches as for fastballs. Yet all the velocity curves are similar, being close to parallel as they proceed on extremely linear paths. With an extra 1.0 mph of velocity, an otherwise identical pitch will yield a 1.5 percent rise in GB% for fastballs, a 1.7 percent GB% bump for offspeed pitches, and a 1.6 percent jump for breaking balls.

The rest of the pitch attributes, shown below, carry much less weight in prediction. Surprisingly, release height is among this group.

INFLUENCE OF OTHER PITCH ATTRIBUTES
Input Fastball Offspeed Breaking
Release Height 2.4% 2.1% 2.0%
Break angle 1.9% 1.0% 3.6%
Spin rate 1.5% 1.2% 2.0%
Arm angle 2.9% 2.4% 2.5%

[Figure: release_height]
When a short pitcher comes along, there’s a question of whether his lack of downward plane will make it hard to get hitters out. But height doesn’t measure heart, and it also isn’t much of a groundball catalyst. It helps some; an extra upward inch in release height, for instance, adds a 0.4 percent GB% boost on fastballs. But again, there are much larger groundball rises to be had if a pitcher can squeak out inch-level improvements in location and movement. Short guys shouldn’t be deterred if their groundball stuff is otherwise solid.
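
As a back-of-the-envelope aid, the per-unit fastball figures quoted in this section can be collected into a small calculator. This is a rough sketch under a loud assumption: it treats the effects as linear and simply adds them, which the boosted model does not promise, since it allows interactions and curved relationships. The horizontal figures also only apply in the ranges stated above (from the middle of the plate outward, and inward movement beyond 1.5 inches). The dictionary keys and function name are made up for illustration.

```python
# GB% change per unit for fastballs, as quoted in the text above.
FASTBALL_SLOPES = {
    "inches_lower": 1.4,
    "inches_farther_outside": 1.7,
    "inches_more_downward_movement": 3.5,
    "inches_more_inward_movement": 2.5,
    "mph_faster": 1.5,
    "inches_higher_release": 0.4,
}


def approx_gb_change(changes, slopes=FASTBALL_SLOPES):
    """Add up slope * amount for each change.

    Assumption: effects are additive and linear, which is only a rough
    approximation of what the model's marginal curves show.
    """
    return sum(slopes[name] * amount for name, amount in changes.items())


# A fastball located two inches lower and thrown one mph harder.
lower_and_harder = approx_gb_change({"inches_lower": 2, "mph_faster": 1})

# Three extra inches of release height versus one extra inch of sink.
taller_release = approx_gb_change({"inches_higher_release": 3})
more_sink = approx_gb_change({"inches_more_downward_movement": 1})
```

Comparing the last two values shows the point made about short pitchers: under these figures, one inch of added sink is worth more than three inches of added release height.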

Next we’ll move on to the other variable categories. The influence figures for the situational factors are shown below.

INFLUENCE OF SITUATIONAL FACTORS
Input Fastball Offspeed Breaking
Handedness   0%   0%   0%
Plate count 2.1% 0.4% 0.5%
Leverage index 0.4% 0.7% 0.8%
Double-play situation   0%   0% 0.1%
Sac fly situation   0%   0% 0.1%
Two out nobody on   0% 0.1%   0%
Year 0.7% 1.2% 1.2%

Even if a batter wants to hit a sac fly, stay out of a double play, or launch a home run with two outs and the bases empty, there won’t be any change in whether his batted ball is a grounder or not. Whether or not the batter finds himself in a clutch situation hardly matters either. Failing to pick up crucial sac flies can be frustrating, but maybe we should give batters a pass, as the outcome appears to be out of their control and counterbalanced by the pitcher’s desire to prevent a fly ball.

Plate count matters a bit on fastballs—GB% trickles down by about one percent as the count becomes more favorable to the hitter.

Another finding here is that batter/pitcher handedness, in and of itself, is irrelevant. It’s how pitches move that is important. That’s a nontrivial distinction, particularly when many managers are wedded to making substitutions that optimize the traditional left/right platoon. Pitcher arsenals need to be considered when making relief and pinch-hitting substitutions.

INFLUENCE OF BATTER/PITCHER TALENT
Input Fastball Offspeed Breaking
Batter Proj. GB% 12.3% 11.4% 13.5%
Batter Proj. HR%  1.2%  1.2%  1.6%
Pitcher Entropy  1.3%  1.3%  1.6%

[Figure: batter_gb_pct]
Yes, it’s true: groundball-hitting batters hit grounders. These influence figures hover around 12 percent, far less than the combined weight of the pitch attributes discussed above. The upshot here is that pitchers are much more in control of whether or not the ball is hit on the ground.

The batter’s home run talent and pitcher’s ability to mix pitches hold virtually no significance, the same fate that meets the previous pitch characteristics.

INFLUENCE OF PREVIOUS PITCH ATTRIBUTES
Input Fastball Offspeed Breaking
Vertical location 1.5% 2.3% 2.2%
Horizontal location 2.2% 1.9% 2.4%
Vertical movement 1.0% 1.6% 1.9%
Horizontal movement 0.7% 0.7% 1.0%
Velocity 1.0% 1.1% 1.3%
Break angle 0.8% 1.4% 1.6%
Is fastball   0% 0.1%   0%
Is offspeed   0%   0%   0%
Is breaking   0%   0%   0%
Is hard fastball inside   0%   0%   0%
Is slow fastball inside   0%   0%   0%
Pitcher pace 0.8% 0.8% 1.7%

The way a pitcher immediately sets up the ball-in-play pitch is pretty unimportant in generating grounders. Taken together, the “previous” variables are a bit mightier (they compose about one-tenth of the recipe), but improving a pitch’s GB% this way can only be done in tiny increments. All pitches see slight groundball bumps if an inside pitch or a low pitch precedes the BIP pitch. Pitches’ groundball-friendliness also can get little boosts if the prior pitch “rises” high, is slower, or comes at a quick pace.

A pitcher who does all these things well can raise his GB% a few ticks. But the greatest increases come when pitchers improve their movement or location instead of sequencing. The last table shows that even the long-revered brushback pitch is inconsequential.

INFLUENCE OF PREVIOUS EVENTS IN THE PA
Input Fastball Offspeed Breaking
THIFB 0.1% 0%   0%
TSIFB 0.1% 0% 0.1%

Left out of the original analysis were a pair of extra factors that are worth testing. In previous research, Baseball Prospectus’ Harry Pavlidis found that to get a grounder from a pitched change-up, it’s good if there’s a small gap between the fastball and offspeed offering, and it’s good if the change-up sinks more relative to the fastball. I went back to re-run the offspeed model with these variables included. The direction of my results was in agreement with his: Offspeed pitches perform better with smaller velocity differentials and more sink than fastballs. But the big difference in my results is that these factors hold little import. Each factor hits just over three percent importance.

The difference, I’d think, is due to my use of more rigorous methods that further take context into account. Between this latest result and the lackluster results from the other sequencing variables, it’s clear that pitches have a natural groundball talent unto themselves, largely distinct from other aspects of the arsenal.

Wrapping up with the Best Groundball Pitchers

This analysis shows that keeping the ball down is just 20 percent of the groundball puzzle, a lower estimation than most sinkerballers surely would guess. It’s important, but the same can be said of several other factors. Velocity, both components of movement, horizontal location, and the batter’s own groundball tendencies matter a great deal, and other factors also claim smaller chunks of predictive power.

So who does the model predict as the best groundball pitchers? The table below shows the top ten player-seasons by predicted GB% on all pitches (min. 100) through 2015. For completeness, separate models were run to make groundball estimates for pitches coming on 0-0 counts, and those predictions are included in these tallies.

PLAYER SEASONS WITH THE HIGHEST PREDICTED GB% (ON ALL PITCHES)
Player Team Year Predicted GB%
Zach Britton Orioles 2015 81.8%
Zach Britton Orioles 2014 80.9%
Cody Eppley Yankees 2012 80.1%
Brad Ziegler Diamondbacks 2012 79.5%
Jonny Venters Braves 2011 79.5%
Jared Hughes Pirates 2012 79.3%
John Holdzkom Pirates 2014 78.6%
Jared Hughes Pirates 2011 78.6%
Mike MacDougal White Sox & Nationals 2009 78.5%
Jared Hughes Pirates 2013 78.3%
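
A leaderboard like the one above can be built from per-pitch predictions with a simple roll-up. The sketch below assumes a player-season's predicted GB% is the plain average of the model's predicted grounder probability over all of that player's pitches; the article doesn't spell out the aggregation, and the function name and sample rows are made up.

```python
from collections import defaultdict


def top_player_seasons(pitches, min_pitches=100, top_n=10):
    """Rank player-seasons by average predicted grounder probability.

    pitches: iterable of (player, year, predicted_gb_probability) tuples.
    Assumption: predicted GB% is the mean per-pitch probability, times 100.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (player, year) -> [probability sum, pitch count]
    for player, year, prob in pitches:
        entry = totals[(player, year)]
        entry[0] += prob
        entry[1] += 1
    rows = [
        (player, year, 100.0 * prob_sum / count)
        for (player, year), (prob_sum, count) in totals.items()
        if count >= min_pitches
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top_n]


# Hypothetical per-pitch predictions; the minimum is lowered so the toy data qualifies.
sample = (
    [("Pitcher A", 2015, 0.8)] * 3
    + [("Pitcher B", 2015, 0.5)] * 3
    + [("Pitcher C", 2015, 0.9)] * 1  # too few pitches, so excluded
)
leaders = top_player_seasons(sample, min_pitches=2)
```

Because every pitch counts, not just those put in play, a list built this way is not directly comparable to actual GB% on batted balls.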

Despite the upward locations in the initial chart, the models identify Britton as the best groundballer in the PITCHf/x era. His fastball’s velocity, downward movement, and lateral movement are all at the top of the class. Joining Britton at the top are several of the best sinkerballers of the past eight years.

Notice also there are more seasons coming from the Pirates than any other club. The model loves Jared Hughes, and John Holdzkom’s dominant nine-inning stint with the 2014 Buccos was enough to earn him a 7th-place ranking. This isn’t a surprise, given the Pirates’ devotion to a strategy of creating grounders to be gobbled up into defensive shifts. Interesting questions follow, such as: What impact does being a Pirate have on a pitcher’s groundball percentage? Tomorrow we’ll examine that question and take a closer look at Pittsburgh’s strategy.

References & Resources

Appendix

Here are a few technical PITCHf/x details:

MLBAM pitch types are categorized as follows:

PITCH TYPE CATEGORIES
Fastball: 4-seam fastballs (FF), 2-seam fastballs (FT), Sinkers (SI), General fastballs (FA), Cutters (FC) with below-average horizontal movement
Offspeed: Changeups (CH), Splitters (FS), Forkballs (FO)
Breaking: Curveballs (CB), Knucklecurves (KC), Sliders (SL), Cutters (FC) with above-average horizontal movement
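
The grouping can also be written as a small lookup. This sketch rests on two assumptions: that cutters with above-average horizontal movement go with the breaking balls and those with below-average movement go with the fastballs (my reading of the table), and that "average" is compared on the absolute size of the movement. The baseline is left as a parameter because the article doesn't say which average is used; the function and parameter names are hypothetical.

```python
FASTBALL_CODES = {"FF", "FT", "SI", "FA"}
OFFSPEED_CODES = {"CH", "FS", "FO"}
BREAKING_CODES = {"CB", "KC", "SL"}


def pitch_group(pitch_type, horizontal_movement=None, avg_cutter_movement=None):
    """Map a pitch-type code to fastball, offspeed, or breaking.

    Assumption: cutters (FC) are split by comparing the size of their
    horizontal movement against a caller-supplied average.
    """
    if pitch_type in FASTBALL_CODES:
        return "fastball"
    if pitch_type in OFFSPEED_CODES:
        return "offspeed"
    if pitch_type in BREAKING_CODES:
        return "breaking"
    if pitch_type == "FC":
        if horizontal_movement is None or avg_cutter_movement is None:
            raise ValueError("cutters need a movement value and an average to compare against")
        if abs(horizontal_movement) > abs(avg_cutter_movement):
            return "breaking"
        return "fastball"
    raise ValueError("unrecognized pitch type: " + str(pitch_type))


# Made-up movement values in inches.
sinker_group = pitch_group("SI")
big_cutter_group = pitch_group("FC", horizontal_movement=4.0, avg_cutter_movement=2.0)
small_cutter_group = pitch_group("FC", horizontal_movement=1.0, avg_cutter_movement=2.0)
```

Pitch types outside the table raise an error rather than being silently assigned to a group.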


Gerald Schifman is the lead researcher at Crain's New York Business and a writer at The Hardball Times. He previously worked in the New York Mets' baseball operations department and in Major League Baseball's publishing department. Follow him on Twitter @gschifman.
16 Comments
Jim S.
7 years ago

Excellent.

Adam
7 years ago

Great stuff. Do you have the top ten list limited to just SP?

Gerald Schifman
7 years ago
Reply to  Adam

Sure do. These are again all-pitch predictions, and the minimum pitch total was raised to 1,500.

Player Year Predicted GB%
Charlie Morton 2013 74.2%
Derek Lowe 2010 73.1%
Jake Westbrook 2013 72.6%
Derek Lowe 2008 72.4%
Aaron Cook 2009 72.3%
Roberto Hernandez 2008 72.0%
Justin Masterson 2010 72.0%
Felix Hernandez 2014 71.8%
Charlie Morton 2011 71.8%
Trevor Cahill 2014 71.7%

Peter Jensen
7 years ago

You only give us the top 10 predicted GB% pitcher seasons since 2008 to support your methodology’s effectiveness and unfortunately that is not much support at all. The top 3 predicted seasons are indeed the top 3 actual GB% pitcher seasons and Venters’ 2012, predicted as number 5, is actually number 6. Which would be impressive if any of the other 6 predicted top 10 were even close, but none of those even cracked the top 30. Hughes and Holdzkom’s predicted GB% as 78%+ actually turn out to be 56.3% and their top 10 rankings fall to the 280s. That would still rank those pitcher seasons in the top 10% of all pitcher seasons of more than 20 innings in GB% since 2008 and if that is all the precision that one is attempting then I guess it could be deemed a success. More troubling is that all of the top 10 predicted GB% are greater than the actual GB%. Some by as little as 3 percent, but 4 of the ten overestimating by more than 19%. This is an indication of significant bias that would have to be corrected. All of the top 10 are relief pitchers with consequently low innings. It would be interesting to see, as Adam suggests above, if a list of the top 10 starting pitchers is also biased by overly optimistic results. Also it appears that the author used the same data set for testing the results as he did for defining his model which would taint his results as well.

Gerald Schifman
7 years ago
Reply to  Peter Jensen

I didn’t list the top 10 seasons to show the models’ effectiveness; my intention was to highlight the best GB pitchers. I can show some additional testing here: I grabbed pitcher-pitch type-seasons which had at least 40 BIP, and compared the predicted GB% to the actual GB%, weighting pitchers by their total BIP. Aggregated up, fastball and changeup predictions were 0.1% lower than the actual figures. For breaking balls, the difference was an even slimmer 0.01% underprediction. Over- and underpredictions cancel each other out, leaving no persistent bias in either direction.

As MGL notes below, zero error is to be expected when the testing is done in-sample. I briefly touched upon my out-of-sample testing above, but should note that the AUC scores were produced via six-fold cross validation. Models were built on 5/6 chunks of data and tested on the other 1/6. This process was repeated for all permutations. The average AUC hovered around ~0.7 in each instance.

One thing to remember about the figures in the table is that these are all-pitch predictions, not just estimates for BIP. It’s not an apples-to-apples comparison between those predictions and the actual GB% (which come only on BIP, of course). Britton, for example, is a guy for whom that makes a difference; in 2015, he actually was more consistent at keeping the ball down for pitches not put into play (vs. LHB: http://atmlb.com/1YwbcD9; vs. RHB: http://atmlb.com/1UzmRzG). A strength of the models is that we can see a fuller picture of pitchers’ GB ability, because “ground ball quality” isn’t restricted to the pitches on which batters happened to swing and hit balls onto the field of play.

Peter Jensen
7 years ago

The average AUC hovered around ~0.7 in each instance.

You didn’t include the actual AUC value in your initial article so there was no way to evaluate your model from that standard. You stated that your predictions were “fairly good” but the source you cited in your article evaluates a .7 AUC as being on the cusp between fair and poor.

One thing to remember about the figures in the table is that these are all-pitch predictions, not just estimates for BIP. It’s not an apples-to-apples comparison between those predictions and the actual GB% (which come only on BIP, of course).

I understand that evaluating all pitches is what you did I just don’t understand why you used that information to evaluate pitchers. I you implying that Cody Eppley you you list as the 3rd best GB pitcher with your metric predicting 80.1% of his pitches would be ground balls if hit only had an actual GB rate of 60.3% because the batters wouldn’t cooperate and hit his best pitches? The only way to evaluate your metric properly is to test its predictions on pitches that were swung at and hit and compare with the actual hit ball results. When you have a nearly 20% disparity between prediction and results for a particular pitcher it is incumbent on you as a researcher to find out what that pitcher is doing that caused your model to work poorly for that pitcher. And when that nearly 20 % disparity happens in 4 of your top 10 it may be time to begin again and try and construct a better model.

The title of your article is “Why Ground Balls Are Hit”. Your metric is designed to find out what pitches are most likely to induce ground balls. Whether it does that well is questionable for the reasons given above, but even if it did, that would only provide part of the answer as to why ground balls are hit. It may be necessary information but it isn’t sufficient.

Gerald Schifman
7 years ago
Reply to  Peter Jensen

Your point about the AUCs has merit. Still, scores were generally inside that ‘C’-grade band, and the predictions are at least useful as benchmarks.

I’m saying that if Eppley’s every pitch were put into play, I predict that his GB% would be 80.1%. We can’t directly evaluate whether the model predictions are right or wrong, because many pitches won’t have a BIP to compare against. I look at the all-pitch predictions as another way to consider a pitcher’s GB talent beyond the typical GB/BIP percentage. The fact is that as a pitch crosses the plate with a certain location, movement, etc., it has an inherent GB probability—an expectation if the batter were to hit it into play. That table represents a quick look at GB proficiency that doesn’t rely on whether or not opposing batters chose to swing.

Peter Jensen
7 years ago

I you implying that Cody Eppley you you list as the 3rd best GB pitcher with your metric predicting 80.1% of his pitches would be ground balls if hit only had an actual GB rate of 60.3% because the batters wouldn’t cooperate and hit his best pitches?

This sentence should have been: Are you implying that Cody Eppley, who you listed as having the 3rd best pitcher season with your metric predicting 80.1% of his ground balls if they were hit, actually had a ground ball percent of 60.3 because the batters wouldn’t cooperate and hit his best pitches?

Sorry for the mistakes.

MGL
7 years ago

Good and interesting analyses, although somewhat intuitive. I agree with Peter in that the model should not be biased, i.e. in large samples the predicted and actual should be equal. He is also right that the training data and testing data should definitely be independent. Given that they are not, it is even more surprising that the predicted and actual results are not equal.

“This analysis shows that keeping the ball down is just 20 percent of the groundball puzzle, a lower estimation than most sinkerballers surely would guess. ”

A sinkerball is actually defined in two ways: One, pitches in the lower part of the zone, and two, more importantly, pitches that have less backspin than the average fastball such that the batter is fooled into swinging over the ball. You suggest that a “sinker” is simply a pitch thrown low in the zone. Really sinkers are fastballs with less-than-average backspin and SHOULD be thrown low in the zone to be effective, but if they are not, they are still sinkers.

Gerald Schifman
7 years ago
Reply to  MGL

Thanks, MGL.

I don’t think I imply that a sinkerball is only tied to vertical location, but yes, certainly—whether or not a pitch is a sinker hinges principally on downward movement.

As for your other remarks, please see my response to Peter (above).

GB
7 years ago
Reply to  MGL

“Good and interesting analyses, although somewhat intuitive.”

That may be the nicest thing MGL has ever said online.

Peter Jensen
7 years ago
Reply to  GB

That may be the nicest thing MGL has ever said online.

That is an unfair and untrue and silly comment. MGL is a tough critic and doesn’t pull his punches when he feels that posters haven’t done their homework, but he often gives praise to those who he feels have done serious research and analysis well.