How Much Can We Learn By Looking At “Stuff”?

by David Gassko
December 14, 2005

With the A.J. Burnett signing, I returned to wondering about one of the more interesting phenomena in baseball: the value of having “stuff.” Burnett’s numbers haven’t been that impressive, but when scouts see his 97-mph fastball and nasty knuckle-curve, they see a future Cy Young winner, a guy well worth the $55 million contract handed to him by the Blue Jays.

I’ve done some research into what can be done by measuring a pitcher’s stuff, but I wanted to go further. From that arose this (not by any measure complete) study.

I have scouting reports on 419 pitchers taken from Scouts Inc. reports on ESPN.com. These reports contain a list of the pitcher’s pitches and ratings on a scale of 1-9, with 9 being excellent. For the purpose of this study, I combined these ratings into the following categories: fastball, change up, breaking ball and “other” (generally, a screwball, splitter, or knuckleball). Also listed is a rating for each pitcher’s control taken from these scouting reports.

I then divided up the pitchers based on pitch combinations. For this study, I’ll use the all-inclusive group, that is, the 30 pitchers who throw a fastball, change up, breaking ball and “other” pitches.

I regressed the pitchers’ ratings onto their 2004 ERAs. I used a very loose threshold for statistical significance (p < .2), because the relationship between ERA and stuff is not strong enough to clear a more stringent one (generally .05). That’s okay, in my opinion, if the results we get are statistically sound, though I’m willing to let someone who knows more about statistics correct me if that’s wrong.

Anyway, only one pitch type, “other,” was not statistically significant. I have two theories on why this is so: (1) “other” covers different types of pitches that might have different impacts on ERA (for example, maybe pitchers who throw forkballs have much lower ERAs than pitchers who throw screwballs), or (2) pitchers generally rely on their main pitches (fastball, change up, breaking ball) so often that the impact of other pitches is minimal and not statistically significant. It’s also quite possible (maybe even likely) that the relatively small sample size contributed to the statistical insignificance. Whatever the reason, I removed “other” pitches.
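For readers who want to see the mechanics, here is a minimal sketch of the kind of regression described above: an ordinary least squares fit of ERA on fastball, change up, breaking ball and control ratings. The ratings and ERAs are made up for illustration, the function names (`solve`, `fit_ols`, `predict`) are my own, and the sketch does not compute significance levels, which would need a statistics package. It is not the actual model or data behind the numbers in this article.

```python
# Ordinary least squares in plain Python, fitted to HYPOTHETICAL data.
# None of the ratings or ERAs below come from the study.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] that minimize squared error."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted ERA for one pitcher's ratings."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical (fastball, change up, breaking ball, control) ratings, 1-9 scale.
ratings = [
    (7, 5, 6, 6), (5, 6, 4, 7), (8, 4, 7, 4), (6, 7, 5, 5),
    (4, 5, 5, 8), (7, 6, 8, 6), (5, 4, 6, 5), (6, 5, 7, 7),
]
era_2004 = [3.60, 4.35, 4.10, 4.20, 4.55, 3.15, 4.95, 3.70]  # made-up ERAs

coefs = fit_ols(ratings, era_2004)
stuff_scores = [predict(coefs, r) for r in ratings]  # fitted ERA per pitcher
```

The fitted values in `stuff_scores` play the role of the predicted ERA for each pitcher. With only eight invented pitchers the coefficients mean nothing; the point is just the shape of the calculation.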

The regression showed a correlation coefficient of .56 with 2004 ERAs, but the question that interests me is whether or not a “Stuff Score” (the ERA predicted by a player’s pitch ratings) can be helpful in predicting future ERA. To that end, I looked at the correlations between the pitchers’ 2005 ERAs and their 2004 ERAs, their “Stuff Score” and Tangotiger’s Marcel predictions:

               2004 ERA  “Stuff Score”  Marcel
Correlation    0.47      0.45           0.49
Mean Error     0.95      0.93           0.95

You can see that in this sample, the three different measures were pretty close. Marcels had a slightly better correlation, but the lowest mean error came from the “Stuff Score.” There should be no bias here, as the “Stuff Score” was regressed onto 2004 ERAs. In essence, what that means is that knowing a player’s stuff can give you extra information in terms of forecasting a player’s future.
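The two measures in the table can be sketched as follows. I am assuming here that “Mean Error” means the mean absolute difference between predicted and actual ERA; that reading, the function names and the sample numbers are all illustrative assumptions rather than anything taken from the data set.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and actual ERA (assumed definition)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predicted and actual 2005 ERAs for five pitchers.
predicted_era = [3.80, 4.20, 4.60, 3.50, 5.00]
actual_era = [4.10, 3.90, 5.20, 3.40, 4.70]

r = pearson(predicted_era, actual_era)
err = mean_abs_error(predicted_era, actual_era)
```

Running the same two functions on each predictor (prior-year ERA, the stuff-based prediction, Marcel) against actual ERA is all a comparison like the one in the table requires.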

How much? That’s a valid question. To answer it, I regressed Marcels and “Stuff Scores” onto 2005 ERA. The regression equation was .57*Marcel + .411*Stuff Score. What that means is that to project a pitcher’s future ERA, you’d want to weight his past statistics (with an age adjustment and regression to the mean thrown in) at about 60%, and his stuff at about 40%. A pitcher’s stuff indeed tells you a lot about his future success.
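The arithmetic behind the 60/40 split is simple enough to spell out. The sketch below takes the two coefficients from the equation above, converts them into shares, and applies the equation to one invented pitcher. It assumes the equation has no intercept term, since none is given, and the Marcel and Stuff Score inputs are hypothetical.

```python
MARCEL_COEF = 0.57   # coefficient on the Marcel projection, from the equation above
STUFF_COEF = 0.411   # coefficient on the Stuff Score, from the equation above

# Share of the total weight carried by each input.
total = MARCEL_COEF + STUFF_COEF
marcel_share = MARCEL_COEF / total
stuff_share = STUFF_COEF / total

def combined_projection(marcel_era, stuff_score):
    """Apply the equation as written, assuming no intercept term."""
    return MARCEL_COEF * marcel_era + STUFF_COEF * stuff_score

# Hypothetical pitcher: a 4.20 Marcel projection and a 3.80 Stuff Score.
projection = combined_projection(4.20, 3.80)

print(f"Marcel share: {marcel_share:.1%}, stuff share: {stuff_share:.1%}")
print(f"Combined projection: {projection:.2f}")
```

The shares come out to roughly 58% and 42%, which is where the "about 60%" and "about 40%" figures come from.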

This is just the beginning, however. To continue down this road, I (or someone else) am going to need to use a larger sample size and probably find a better way of combining ratings for individual pitches. Perhaps whether a pitcher is a lefty or a righty would have an impact as well. This is only the first step on a long road, and who knows where it will take us? But it’s clear to me that when projecting a pitcher, having a rating of his stuff will help gain precious accuracy. Just by combining Marcels and my “Stuff Score,” I was able to improve the correlation between predicted and actual ERA to .53 and reduce the mean error to .91. With more research, we can do better.

And Burnett’s projected 2004 ERA based on his stuff, by the way? 3.73. That can serve, to an extent, as his true talent level. It’s not really all that inspiring if you’re a Blue Jays fan.
