Improving pitcher projections
by John R. Mayne
March 01, 2010
What if CHONE could scout pitchers?
What if the niftiest projection system out there could watch the pitchers throw, and determine from that how they would progress?
Fangraphs’ pitch data gives us the tools to figure this out.
We’re starting from a more difficult place than projecting hitters. Projecting pitchers is hard. For one thing, they easily break. You look at Rich Harden funny, and his elbow pops. (Try it!) For another, they’re more erratic than hitters. So, we look at strikeouts and walks and BABIP and ground-ball ratio and ERA and possibly pet names. But so far, no pitch data. You can bring CHONE to a game and all it will do is complain about the TRS-80 you ported it to. It won’t watch.
But what if it could integrate scouting? What if it could add some numerical input based on scouty knowledge? Could we improve CHONE? Could we get substantially more accurate pitcher projections?
The answers, for those of you tired of the foreshadowing portion of the article and waiting impatiently for the content portion, are “We can,” “We can,” and “Yeppers.”
Here’s one scouty thing for pitchers: It is better to throw harder than softer. Announcers and some pundits like to say that it’s all about mixing pitches, changing speeds, hitting spots (and all those things are nice, and if you can’t do any of them, you’re Kyle Farnsworth), but it’s still better to throw harder. If over half your pitches are fastballs and you throw 82, you’re playing against Sean Smith, not Albert Pujols, and I’m not the first to observe this.
My hypothesis was that pitchers who throw the ball real fast will do better than their projections. Pitchers who don’t throw the ball real fast will do worse.
So, if we had two pitchers:
Pitcher A, 5’11” 185 lbs, 165 IP 3.94 ERA 118K 74BB 1.71 GB/FB .299 BABIP
Average fastball speed 94.7 mph.
Pitcher B, 5’11” 185 lbs, 165 IP 3.94 ERA 118K 74BB 1.71 GB/FB .299 BABIP
Average fastball speed 85.2 mph.
Scouts would like Pitcher A a lot more.
Should CHONE look at the speed of the pitches? Does the speed of the fastball have predictive value beyond the component stats?
I looked at the top 20 and bottom 20 pitchers in fastball speed for 2006. To qualify, a pitcher had to 1) pitch at least 100 innings and 2) use at least 50 percent fastballs (Tim Wakefield is not relevant to this discussion).
I then looked at their 2007 CHONE and their 2007 results. I did the same analysis for the following two years.
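For readers who want to rebuild the cohorts themselves, here is a minimal sketch of the selection rule described above. The record layout, field names and sample rows are assumptions made up for illustration (the two "Pitcher" rows reuse the velocities from the hypothetical pair earlier); this is not Fangraphs' export format, and real data would have to be loaded separately.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    name: str
    innings: float       # innings pitched that season
    fastball_pct: float  # share of pitches that were fastballs, 0-100
    fastball_mph: float  # average fastball velocity


def velocity_cohorts(seasons, cohort_size=20, min_innings=100, min_fastball_pct=50):
    """Return (slowest, fastest) cohorts among pitchers who meet both cutoffs."""
    qualifiers = [
        s for s in seasons
        if s.innings >= min_innings and s.fastball_pct >= min_fastball_pct
    ]
    by_speed = sorted(qualifiers, key=lambda s: s.fastball_mph)
    slowest = by_speed[:cohort_size]
    fastest = by_speed[::-1][:cohort_size]
    return slowest, fastest


# Made-up rows, for illustration only.
sample = [
    PitcherSeason("Pitcher A", 165, 62.0, 94.7),
    PitcherSeason("Pitcher B", 165, 58.0, 85.2),
    PitcherSeason("Knuckleballer", 189, 10.0, 73.0),        # fails the 50 percent rule
    PitcherSeason("Short-stint reliever", 40, 70.0, 97.0),  # fails the innings rule
]

slowest, fastest = velocity_cohorts(sample, cohort_size=1)
print([s.name for s in slowest], [s.name for s in fastest])
```

With a full season of qualifiers and the default `cohort_size=20`, the two returned lists would correspond to the bottom-20 and top-20 groups used in the charts.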
What I hoped to see was that the hard-throwing cohort would outperform CHONE, and the softer throwers wouldn’t. If this were true, it would be a significant breakthrough: pitch data, pitch break, and other numerically definable scouting-type information could be used to generate some really excellent projections, since improving on CHONE appears to be improving on the best projections available. If the hypotheses were correct, this would be the beginning of a new, better way to project pitchers.
At this point, I must caution you, gentle reader. Unless you are some sort of Rotisserie or Scoresheet player, terrible pitching can hurt your eyes. Even reading about it may cause pain. The experienced fantasy leaguer knows that for every misery-inducing performance of one’s own team, there are many fabulously entertaining performances by one’s competitors. This year, those performances are called, “Rafaels,” after the Indians’ Mr. Perez.
Let us view those who try to get by with less velocity—but keep throwing fastballs. These lists start at the very slowest, and require that the pitcher throw at least 50 percent fastballs. The last column shows which was better, performance or CHONE:
2006 Slow-throwers (columns: pitcher, 2007 CHONE IP, 2007 CHONE ERA, 2007 actual IP, 2007 actual ERA, CHONE or Performance?)
Greg Maddux 197 3.93 198 4.14 CHONE
Livan Hernandez 211 4.79 204 4.93 CHONE
Mark Redman 180 5.06 41 7.62 CHONE
Kenny Rogers 190 4.40 63 4.43 CHONE
Aaron Sele 130 4.58 53 5.37 CHONE
Tom Glavine 192 4.18 200 4.45 CHONE
Kirk Saarloos 121 4.98 42 7.17 CHONE
Barry Zito 206 3.62 196 4.53 CHONE
Mark O’Connor 117 4.19 7.17 ERA in AA CHONE
Jeff Francis 192 4.42 215 4.22 Performance
Paul Byrd 179 4.47 192 4.59 Even
Casey Fossum 147 4.88 76 7.70 CHONE
Chris Capuano 195 3.92 150 5.10 CHONE
Woody Williams 161 4.51 188 5.27 CHONE
Josh Fogg 172 5.17 165 4.94 Performance
Jason Jennings 180 4.05 99 6.45 CHONE
Steve Trachsel 140 5.15 158 4.90 Performance
Mark Hendrickson 172 4.46 122 5.21 CHONE
Pedro Martinez 63 3.11 28 2.57 Even
Zach Duke 195 3.94 107 5.53 CHONE
Three very marginal wins for performance. 3-15-2 (performance, CHONE, even) is a pretty impressive set for the theory that low-velocity pitchers are bad bets, even compared to a good projection system like CHONE. Will it get better if we look at 2008? Darken your sunglasses; the meltdown’s brutal:
2007 Slow-throwers (columns: pitcher, 2008 CHONE IP, 2008 CHONE ERA, 2008 actual IP, 2008 actual ERA, CHONE or Performance?)
Mike Maroth 120 5.48 Nada CHONE
Livan Hernandez 198 5.19 180 6.05 CHONE
Tom Glavine 194 4.78 63 5.54 CHONE
Lenny DiNardo 107 4.88 23 7.43 CHONE
Mike Bacsik 137 4.93 None CHONE
Barry Zito 202 4.19 180 5.15 CHONE
Greg Maddux 205 3.91 194 4.22 CHONE
Justin Germano 160 4.11 43 5.98 CHONE
David Wells 144 5.13 Called in Fat CHONE
Matt Chico 166 4.93 48 6.19 CHONE
Paul Byrd 194 4.55 180 4.6 Even
Woody Williams 178 4.80 None CHONE
Orlando Hernandez 145 4.10 Nope CHONE
Jeff Francis 207 4.39 143 5.01 CHONE
Mark Hendrickson 143 4.28 133 5.45 CHONE
Noah Lowry 164 4.50 Broke CHONE
Chris Capuano 177 4.32 None CHONE
Andy Sonnanstine 184 4.50 193 4.38 Performance
Randy Wolf 98 4.04 190 4.3 Performance
Chuck James 155 4.41 29 9.1 CHONE
Even grimmer. CHONE collectively had some hope for these guys, but they were, as a class, even more thoroughly awful than expected; it’s like watching American Dad. If you rely on a fastball that isn’t fast, you’re going to get killed.
But maybe CHONE just has a sunny disposition when it comes to pitchers. (I think this is actually true; CHONE tends toward more optimism than some other methods, though it won’t be mistaken for the Bill James projections.)
Let’s go to the charts for the top-end velocity pitchers. The pitchers are listed from 1 to 20 among qualifiers in average fastball speed; the last column again shows which was better, the projection or the performance:
2006 Best Fastball (columns: pitcher, 2007 CHONE IP, 2007 CHONE ERA, 2007 actual IP, 2007 actual ERA, CHONE or Performance?)
Felix Hernandez 177 3.34 190 3.92 CHONE
Justin Verlander 152 3.99 201 3.66 Performance
AJ Burnett 167 3.74 165 3.75 Even
Daniel Cabrera 172 4.19 204 5.55 CHONE
Josh Beckett 186 4.00 200 3.27 Performance
Scott Proctor 91 3.99 86 3.65 Performance
Brad Penny 179 3.77 208 3.03 Performance
Seth McClung 99 4.98 12 3.75 Even
CC Sabathia 195 3.58 241 3.21 Performance
Matt Cain 184 3.50 200 3.65 Even
Jeremy Bonderman 200 3.63 174 5.01 CHONE
Kelvim Escobar 154 3.69 195 3.40 Performance
Chien-Ming Wang 182 4.09 199 3.70 Performance
Ervin Santana 169 3.94 150 5.76 CHONE
Johan Santana 212 2.51 219 3.33 CHONE
Jorge Sosa 119 4.20 112 4.47 CHONE
Ian Snell 169 3.94 208 3.76 Performance
Roy Oswalt 211 3.28 212 3.18 Even
John Smoltz 191 3.59 205 3.11 Performance
Ben Sheets 155 3.01 141 3.82 CHONE
OK, pretty close: 9-7 in favor of performance, with four even. But this is a decent set for the hypothesis. Let’s see how our hard throwers did in 2008:
2007 Best Fastball (columns: pitcher, 2008 CHONE IP, 2008 CHONE ERA, 2008 actual IP, 2008 actual ERA, CHONE or Performance?)
Felix Hernandez 191 3.53 200 3.45 Even
AJ Burnett 165 4.04 221 4.07 Even
Justin Verlander 188 3.88 201 4.84 CHONE
Dustin McGowan 157 4.41 111 4.37 Even
Josh Beckett 197 3.79 174 4.03 CHONE
Daniel Cabrera 195 4.57 180 5.25 CHONE
Tim Lincecum 118 3.28 227 2.62 Performance
Edwin Jackson 153 4.76 183 4.42 Performance
Zack Greinke 124 4.35 202 3.47 Performance
Kelvim Escobar 175 3.86 None, thanks CHONE
Fausto Carmona 180 3.85 120 5.44 CHONE
Jeremy Guthrie 155 4.76 190 3.63 Performance
Brad Penny 195 4.11 94 6.27 CHONE
Matt Cain 196 3.54 217 3.76 CHONE
CC Sabathia 221 3.50 253 2.70 Performance
Ben Sheets 137 4.01 198 3.09 Performance
Roy Oswalt 213 3.59 208 3.54 Even
Chien-Ming Wang 189 4.33 95 4.07 CHONE
Matt Albers 153 5.53 49 3.49 Even
Jake Peavy 217 2.99 173 2.85 Even
Alas, a 6-8 run. When I did this for PECOTA, performance made a much better showing against the projections. CHONE's general optimism for pitchers may well be because it’s right to be more optimistic than other systems.
Now, I did these charts after the fact, and I made the prediction that 2009 would be similarly situated. Was my method able to predict the future? Let’s go to the slowest of 2009 (with the prior caveats; retired pitchers weren’t considered for this year):
2008 Worst Fastball (columns: pitcher, 2009 CHONE IP, 2009 CHONE ERA, 2009 actual IP, 2009 actual ERA, CHONE or Performance?)
Livan Hernandez 172 5.23 183 5.44 Even
Barry Zito 176 4.70 192 4.03 Performance
Jeff Francis 169 4.58 143 5.01 CHONE
Jeff Suppan 161 5.37 161 5.29 Even
Chris Young 114 3.95 76 5.21 CHONE
John Lannan 148 4.68 206 3.88 Performance
Jarrod Washburn 151 4.41 176 3.78 Performance
Greg Smith 144 4.88 Minors/bad/hurt CHONE
Pedro Martinez 76 4.45 44 3.63 Even
Scott Olsen 176 4.96 62 6.03 CHONE
Mark Hendrickson 60 4.20 105 4.37 Performance
Brian Burres 127 5.24 Mostly minors/bad CHONE
Brandon Webb 209 3.70 Ow CHONE
Chris Sampson 72 3.88 55 5.04 CHONE
Darrell Rasner 96 4.50 113 5.40 CHONE
Nate Robertson 171 4.53 49 5.44 CHONE
Andy Pettitte 166 4.55 194 4.16 Performance
Garrett Olson 157 4.24 80 5.60 CHONE
Tom Gorzelanny 152 4.32 47 5.55 CHONE
David Bush 180 4.30 114 6.38 CHONE
Best year ever for our soft throwers, going 5-12-3, and with pretty big wins by Zito and Lannan. While not reflected in the charts above, the data suggests to me that lefties should receive about one mile per hour credit for being lefties; it’s actually true that left-handers can get by with less velocity.
Let’s see how our hard throwers did in 2009:
2008 Best Fastball (columns: pitcher, 2009 CHONE IP, 2009 CHONE ERA, 2009 actual IP, 2009 actual ERA, CHONE or Performance?)
Joba Chamberlain 102 3.26 157 4.75 CHONE
Ubaldo Jimenez 166 4.34 218 3.47 Performance
Dustin McGowan 118 3.97 Broke CHONE
Felix Hernandez 185 3.60 238 2.49 Performance
Ervin Santana 191 3.77 139 5.03 CHONE
Josh Beckett 177 3.61 212 3.86 Even
AJ Burnett 167 3.83 207 4.04 Even
Tim Lincecum 154 3.21 225 2.48 Performance
Clayton Kershaw 115 4.15 171 2.79 Performance
Edwin Jackson 145 4.78 214 3.62 Performance
CC Sabathia 211 3.41 230 3.37 Even
Justin Verlander 185 3.94 240 3.45 Performance
Edinson Volquez 166 3.58 49 4.35 CHONE
Johnny Cueto 145 4.19 171 4.41 Even
Zack Greinke 137 4.14 229 2.16 Performance
Jeremy Guthrie 156 4.33 200 5.04 CHONE
Seth McClung 68 3.84 62 4.94 CHONE
Matt Garza 168 3.96 203 3.95 Even
Jorge de la Rosa 108 4.67 185 4.38 Performance
Fausto Carmona 129 4.26 125 6.32 CHONE
We see a slight edge here for the performance, but it’s not big; further study is needed to see if there’s a reliable effect. The 2009 crew of hard throwers had little more injury-related falloff than prior years.
For now, we can confidently assert that slow-throwers who rely primarily on a fastball will do worse than their prior component stats and worse than a sophisticated projection model would otherwise predict.
We can unconfidently assert that fast-throwers will do slightly better than their prior component stats and better than a sophisticated projection model would otherwise predict.
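One rough way to put numbers on "confidently" and "unconfidently" is to pool the last column of the six charts and run an exact two-sided sign test. That test is my illustration, not something the charts themselves use, and it is crude: it ignores how big each miss was, drops the "Even" rows, and treats every pitcher-season as independent even though several pitchers appear in more than one year. The tallies below are simply the counts of "Performance", "CHONE" and "Even" read off each chart.

```python
from math import comb


def sign_test_p(wins, losses):
    """Exact two-sided sign test: the chance of a split at least this lopsided
    if 'Performance' and 'CHONE' were equally likely on every row."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# (performance wins, CHONE wins, evens) for the 2007, 2008 and 2009 results.
slow = [(3, 15, 2), (2, 17, 1), (5, 12, 3)]
fast = [(9, 7, 4), (6, 8, 6), (8, 7, 5)]


def pooled(cohorts):
    """Sum performance wins and CHONE wins across years, ignoring evens."""
    perf = sum(c[0] for c in cohorts)
    chone = sum(c[1] for c in cohorts)
    return perf, chone


for label, cohorts in (("slow", slow), ("fast", fast)):
    perf, chone = pooled(cohorts)
    print(label, perf, chone, sign_test_p(perf, chone))
```

Pooled, the slow throwers go 10-44 against CHONE and the hard throwers 23-22. Under these assumptions the first split is far too lopsided to be a coin flip, while the second is exactly what a coin flip would look like, which matches the two levels of confidence above.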
This is just the first step. We can look at how pitch selection and pitch speeds affect likely outcomes. At some point, we can incorporate pitch break. We can improve CHONE. Or, someone can improve CHONE who has actual database skills. Trained monkeys have better database skills than I do (at least, the trained monkeys who generate Marcel at Tango Palace do).
I’ve taken a few additional steps in modifying the formula: [Velocity=Good]. Because I’m a reprehensible human being, I’m not sharing. But, here’s the list of leaders and trailers for next year, under the alpha version of the projection modification system. This is a ranking of *differences from an established projection system*, not overall goodness. If you go draft Jeremy Guthrie ahead of Tim Lincecum, you’re missing the point.
Good:
1. Homer Bailey (You can imagine my concern about putting a Dusty pitcher here.)
2. Ubaldo Jimenez
3. Edwin Jackson
4. Justin Masterson
5. Jeremy Guthrie
6. Justin Verlander
7. Brad Penny
8. AJ Burnett
9. Josh Johnson
10. Felix Hernandez
Bad:
1. John Lannan
2. Jered Weaver
3. Aaron Laffey
4. Derek Lowe
5. Dallas Braden
6. Brian Moehler
7. Jeremy Sowers
8. Zach Duke
9. Joel Pineiro
10. Ted Lilly
I expect the Good guys to have ERAs about 5 percent less than CHONE projects, but I could be wrong. I expect the Bad guys to pitch many fewer innings than CHONE expects, and be far less effective, and I’m not wrong on that.
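To make the "differences, not overall goodness" point concrete, here is a small sketch of how a percentage adjustment moves a projected ERA and why the two rankings diverge. Everything in it is hypothetical: the pitchers, the projected ERAs and the per-pitcher percentages are invented for illustration, and the function is a plain percentage scale, not the unshared formula.

```python
def apply_adjustment(chone_era, pct_change):
    """Scale a projected ERA by a percentage (negative means a lower, better ERA)."""
    return chone_era * (1 + pct_change / 100)


# Hypothetical (projected ERA, percent adjustment) pairs. Not real CHONE numbers.
pitchers = {
    "Hypothetical ace": (3.00, -1.0),
    "Hypothetical mid-rotation arm": (4.60, -5.0),
}

adjusted = {name: apply_adjustment(era, pct) for name, (era, pct) in pitchers.items()}

# Ranked by how far the adjustment moves the projection (biggest improvement first).
by_gain = sorted(pitchers, key=lambda name: adjusted[name] - pitchers[name][0])

# Ranked by the adjusted ERA itself (best pitcher first).
by_era = sorted(pitchers, key=lambda name: adjusted[name])

print(by_gain[0], round(adjusted[by_gain[0]], 2))
print(by_era[0], round(adjusted[by_era[0]], 2))
```

The mid-rotation arm tops the "difference" list because a 5 percent cut takes his 4.60 to about 4.37, yet the ace is still the better pitcher at about 2.97. That is the Guthrie-versus-Lincecum distinction in miniature.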
At the end of the season, we’ll revisit if Dave Studeman drops the restraining order and lets me stuff more crumpled charts on yellow lined legal paper on his doorstep.
What do you think? Have I missed something important? Or am I right?
Place your bets in the comments.
John R. Mayne is a prosecutor in Northern California. Neither the county he works for nor his elected boss takes a position on Jeremy Guthrie’s likely 2010 efficacy, which probably doesn’t come as too much of a shock. Mayne has published baseball articles at Baseball Prospectus, BBHQ, and RotoWire and, years ago, in something called a "newspaper."