Exploring contact quality

by Dan Turkenkopf
February 13, 2009

Voros McCracken shocked the baseball world when he introduced his concept of Defense Independent Pitching Statistics (DIPS), which suggested that pitchers had a lot less influence over the outcome of balls in play than anyone thought. This was a radical idea at the time, and many were quite adamant that it could not be true.

Everyone knew that some pitchers tended to induce weak grounders to the second baseman or soft line drives. Others seemed to get hit hard, whether by a screaming line drive or a scorching grounder up the middle.

Since then, many analysts have offered refinements. One is David Gassko, who released DIPS 3.0 on this site in 2005. With DIPS 3.0, batted ball rates were introduced into the equation, which begins to account for the types of balls in play a pitcher surrenders. But DIPS 3.0 is based on league average hit rates, and so doesn’t address the objection about pitchers giving up easier or harder balls to field.

Now, just because people believe something is true doesn’t necessarily mean it is. If that were the case, there would be no concept of defense independent pitching at all.

According to our eyes, though, and the collected wisdom of more than a century of baseball history, pitchers clearly have an impact on whether a ball in play is turned into an out. And most defense independent statistics do show at least some individual influence, by incorporating things like ground ball ratio or strikeout rate when determining a pitcher’s expected Batting Average on Balls in Play (BABIP).

But does such an impact exist?

Suppose that pitchers do have an influence on whether a given ball in play is turned into an out. Conceptually, we’d probably say that pitchers who give up harder hit balls or those whose pitches are hit in hard-to-reach places are more likely to have a higher BABIP against than those who give up soft line drives and two-hoppers to the infield. So our key identifying factors would be speed of the batted ball and where it was hit on the field—which I’ll give the nebulous term of “contact quality.”

With detailed ball in play data, it should be possible to identify the probability of a given ball in play being turned into an out. This is the idea behind all of the play by play fielding metrics out there. But the same concept can be applied to pitchers (and has been with PZR). By summing those probabilities over the course of every ball in play given up by a pitcher, we can predict his BABIP against.
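
As a minimal sketch of that summation, assume each ball in play already carries an out probability from some fielding model. The function name and the sample probabilities below are made up for illustration and are not taken from PMR or PZR.

```python
def predicted_babip(out_probabilities):
    """Predicted BABIP against, given one out probability per ball in play."""
    n = len(out_probabilities)
    if n == 0:
        raise ValueError("need at least one ball in play")
    # Summing the per-ball out probabilities gives the expected number of outs.
    expected_outs = sum(out_probabilities)
    # Whatever is not expected to be an out is expected to be a hit.
    return 1.0 - expected_outs / n


# Illustrative values only: four balls in play with assumed out probabilities.
sample = [0.95, 0.80, 0.25, 0.60]
print(round(predicted_babip(sample), 3))
```

The same function works for a fielder or a pitcher. The only difference is which balls in play go into the list.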

David Pinto takes care of that for us with his Probabilistic Model of Range (PMR). While PMR is intended as a measure of defense, Pinto also publishes a summary of how well the defense performed behind each pitcher. In that analysis, he determines what each pitcher’s predicted BABIP against is based on six criteria:

1. Direction of hit (a vector).
2. The type of hit (fly, ground, line drive, bunt).
3. How hard the ball was hit (slow, medium, hard).
4. The park.
5. The handedness of the pitcher.
6. The handedness of the batter.

Notice that our key components of contact quality are included as items one and three. But how can we isolate those two facets and see if there’s anything to the idea that pitchers have influence over the contact quality?

Based on batted ball rates, and using some sort of defense independent conversion rate, we can develop a different measure of expected BABIP—one that doesn’t include contact quality.

If we then subtract the batted ball expected BABIP (eBABIP) from the PMR predicted BABIP (pBABIP), we can remove the impact of batted ball type on BABIP against, leaving contact quality (plus things like park, and batter and pitcher handedness, but those things can be corrected for).

Note that because both measures are based on some average conversion rate, they are defense independent, so we don’t have to worry about the influence of the fielders behind a pitcher.

Let’s run through a quick example of the calculations.

In 2008, A.J. Burnett had 612 balls in play, broken down as follows:

Infield flies               23
Fly balls                  106
Bunts                       15
Groundballs                299
Liners                      76
Fliner infield flies         1
Fliner fly balls            49
Fliner LD infield flies      0
Fliner LD fly balls         43

For those who are unfamiliar with the term, a fliner is in between a fly ball and line drive. BIS separates fliners into two categories: more like a fly ball and more like a line drive.

Taking those events and assigning a league average hit rate to each category, we can determine that Burnett was expected to have an eBABIP against of just under .300.
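
The eBABIP step can be sketched as a weighted average of hit rates. The counts below are Burnett's from the table above. The league average hit rates are not listed in this article, so the sketch takes them as an input rather than guessing at them. The dictionary keys and the function name are my own labels.

```python
# Burnett's 2008 balls in play by BIS batted ball category (from the article).
BURNETT_2008 = {
    "infield_fly": 23,
    "fly_ball": 106,
    "bunt": 15,
    "ground_ball": 299,
    "liner": 76,
    "fliner_infield_fly": 1,
    "fliner_fly_ball": 49,
    "fliner_ld_infield_fly": 0,
    "fliner_ld_fly_ball": 43,
}


def expected_babip(counts, hit_rates):
    """Batted ball expected BABIP.

    counts:    balls in play per category
    hit_rates: league average hit rate per category (caller supplies these;
               the article does not publish them)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no balls in play")
    # Expected hits = count in each category times that category's hit rate.
    expected_hits = sum(n * hit_rates[cat] for cat, n in counts.items())
    return expected_hits / total


# Sanity check that the categories add up to the 612 balls in play quoted.
print(sum(BURNETT_2008.values()))
```

Feeding in real league rates for each category should land on a figure just under .300 if the rates match the ones used here.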

Looking at Pinto’s PMR numbers for 2008, Burnett was predicted to have 411 outs in 613 balls in play (I’m not sure where the discrepancy came from), for a pBABIP of .329.

We simply subtract the batted ball type average (eBABIP), .300, from the PMR average (pBABIP), .329, to get a difference of .029, or Contact Quality Average (CQA). This implies that the balls that were hit against Burnett were substantially harder to field than you would expect—to the tune of almost 30 points of batting average.
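
For anyone who wants to follow the arithmetic, here is a rough sketch of the two steps, using the figures quoted above. The function names are mine. The raw division comes out to roughly .3295, in line with the .329 quoted, and the subtraction uses the three-decimal values from the text.

```python
def pbabip_from_outs(predicted_outs, balls_in_play):
    """PMR-style predicted BABIP: the share of balls in play NOT predicted to be outs."""
    return (balls_in_play - predicted_outs) / balls_in_play


def contact_quality_average(pbabip, ebabip):
    """CQA: PMR predicted BABIP minus batted ball expected BABIP."""
    return pbabip - ebabip


# Burnett, 2008: 411 predicted outs in 613 balls in play (from Pinto's numbers).
raw_pbabip = pbabip_from_outs(411, 613)

# CQA using the values as quoted in the article (.329 and .300).
burnett_cqa = contact_quality_average(0.329, 0.300)

print(raw_pbabip)
print(round(burnett_cqa, 3))
```

A positive CQA means the balls in play were harder to field than the batted ball mix alone would suggest. A negative one means they were easier.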

The run value of turning a ball in play from a hit to an out in the American League in 2008 was .94 runs, so Burnett’s BABIP difference is worth almost 17 runs or approximately .7 runs of ERA.
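
The conversion to runs and ERA can be sketched the same way. The .94 run value and the 612 balls in play come from the text above. The innings total is not given in this article, so the figure used below (about 221 innings) is an assumption on my part, as are the function names.

```python
RUN_VALUE_AL_2008 = 0.94  # runs per hit-to-out swing on a ball in play (from the article)


def cqa_runs(cqa, balls_in_play, run_value=RUN_VALUE_AL_2008):
    """Runs attributable to contact quality."""
    extra_hits = cqa * balls_in_play  # hits above what the batted ball mix predicts
    return extra_hits * run_value


def runs_to_era(runs, innings_pitched):
    """Scale a run total to a per-nine-innings rate."""
    return runs * 9.0 / innings_pitched


# Burnett, 2008. ASSUMPTION: roughly 221.33 innings pitched (not stated in the article).
burnett_runs = cqa_runs(0.029, 612)
burnett_era = runs_to_era(burnett_runs, 221.33)

print(round(burnett_runs, 1), round(burnett_era, 2))
```

This treats every extra hit as worth the same flat run value, which is the linear weights shortcut. It ignores whether the extra hits were singles or doubles.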

I repeated the steps for the other 139 pitchers who had at least 300 balls in play in 2008 and identified the pitchers who gave up the easiest and hardest balls to field. CQA is the Contact Quality Average, or the difference in expected BABIP attributable to the location and speed of the batted balls. Runs is the effect of the CQA based on linear weight run values for balls in play, and ERA is the impact of that contact quality on the pitcher’s ERA.

Easiest balls to field

Pitcher              CQA      Runs     ERA
Greg Maddux         -0.017   -8.14   -0.48
Cole Hamels         -0.013   -7.42   -0.29
Ted Lilly           -0.010   -5.36   -0.24
John Maine          -0.008   -3.03   -0.19
Todd Wellemeyer     -0.008   -4.14   -0.19
David Bush          -0.007   -3.45   -0.17
Gavin Floyd         -0.006   -3.48   -0.15
Barry Zito          -0.004   -2.36   -0.12
Fausto Carmona      -0.004   -1.60   -0.12
Jake Peavy          -0.003   -1.15   -0.06

So far, so good. If you had to name one starter who induced weak contact, Greg Maddux would be among your top choices.

Hardest balls to field

Pitcher              CQA      Runs     ERA
Boof Bonser          0.055   19.94    1.52
Nate Robertson       0.054   28.57    1.53
C.C. Sabathia        0.050   15.77    1.16
Dustin McGowan       0.050   15.78    1.28
Vicente Padilla      0.044   21.93    1.15
Garrett Olson        0.044   18.56    1.26
Felix Hernandez      0.043   23.15    1.04
Daisuke Matsuzaka    0.041   17.17    0.92
Randy Johnson        0.041   19.66    0.96
Andy Sonnanstine     0.040   24.09    1.12

Note: Sabathia’s results incorporate only his time in Cleveland. In Milwaukee, he was an additional .46 runs of ERA above expected based on his batted ball types.

There are some surprises on this list, including some pitchers who had very good seasons. Matsuzaka had a very strange year by defense independent numbers. He ended up with a 2.90 ERA, which was over one run lower than his FIP. And, according to this measure, he should have given up yet another run based on the quality of contact against him. He looks like he might be in for some serious regression this season.

The leaders and trailers are interesting, but we haven’t yet hit the million-dollar question: Is contact quality a repeatable skill?

It’s hard to tell. To this point, I haven’t corrected for park and the handedness of the pitcher and the batter. The correlation between the CQA from 2007 to 2008 is .26, which is not great, but is roughly the same level as home run rate and the PMR predicted BABIP itself.
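
For readers who want to run the same kind of check, here is one way the year-to-year correlation could be computed. It is a plain Pearson correlation over pitchers who appear in both seasons. The function names and the toy numbers are hypothetical and are not the data behind the .26 figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


def year_to_year_correlation(cqa_year1, cqa_year2):
    """Correlate CQA across two seasons, using only pitchers present in both."""
    common = sorted(set(cqa_year1) & set(cqa_year2))
    return pearson([cqa_year1[p] for p in common],
                   [cqa_year2[p] for p in common])


# Toy data only: made-up pitchers and made-up CQA values.
toy_2007 = {"Pitcher A": -0.010, "Pitcher B": 0.020, "Pitcher C": 0.035, "Pitcher D": 0.005}
toy_2008 = {"Pitcher A": 0.000, "Pitcher B": 0.015, "Pitcher C": 0.030, "Pitcher E": 0.040}
print(year_to_year_correlation(toy_2007, toy_2008))
```

Matching on pitchers who qualify in both years matters. A pitcher who clears the balls in play minimum in only one season is dropped from the comparison.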

Of course, CQA as currently calculated includes other factors, like park and the handedness of the batter and pitcher, which will need to be accounted for to get a better understanding of the total effect of contact quality. And contact quality could be the result of a myriad of other factors that already are incorporated into DIPS 3.0 or another measure.

The next steps are to apply the corrections for park and handedness, then attempt to predict CQA from more common measures like batted ball rates or strikeout rate. I’d also like to see if it adds any predictive power to DIPS 3.0. Finally, I plan to incorporate PITCHf/x data to see what influence pitch speed and, perhaps more importantly, pitch location have on CQA. Stay tuned for further exploration in future articles.
