Benchmarks for pitch types

This one is about the numbers. No pitcher to profile, no story to tell. Instead I’m sharing the initial output of a fairly extensive project—2010 pitch classifications.

I’ve managed to tag and review every pitch thrown so far in 2010, including spring training. The numbers below include only regular season games and, despite my best efforts, there are errors in the pitch classifications. Given the post hoc nature (as opposed to Gameday’s real time) of these labels, and the mix of automated, pseudo-automated and manual processes, I’m fairly confident in the utility of the data set for at least one purpose: creating a baseline for a variety of metrics that can be referred to from here on out.

In other words, when it comes down to an individual pitcher, pitch tags can be moved around but, as a group, there is enough of a sample to use the following numbers as benchmarks.

Here’s your big baseline, a lump of every classified pitch. Unclassified pitches are the result of PITCHf/x glitches (rare) or mid-plate appearance pitching changes (less rare).

Data definitions:

Type = pitch type
# = number thrown
rvERA = a rough but reasonable estimate of pitch effectiveness based on linear weights and outcomes on ball/strike counts
MPH = speed at release, 55 ft. from the back end of home plate
Swing = swing rate (swings/pitches)
Whiff = whiff rate (misses/swings), includes foul tips
Foul = foul ball rate (fouls/swings)
B:CS = umpire called ball-to-called strike ratio
IWZ = rate of pitches thrown within a "wide" strike zone
Chase = swing rate outside of the wide zone
Watch = take rate inside of the wide zone (inverse of swing rate)
nkSLG = non-K slugging, or SLGCON
GB% = rate of balls in play tagged by MLBAM stringers as grounders
LD% = line drives
FB% = outfield fly balls
PU% = infield fly balls
HR/FL% = home runs per outfield fly + line drive

rvERA is not league adjusted, park adjusted or starter/reliever adjusted. Batted ball outcomes are regressed towards MLB average outcomes. It’s a toy, maybe a fancy one, but a toy nonetheless.
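
As a rough illustration of how the rate definitions above fit together, here is a minimal Python sketch. The Pitch record and its field names are hypothetical (they are not PITCHf/x fields or the author's actual data model), and the ten sample pitches are made up. Foul tips are assumed to be recorded as misses, per the Whiff definition. B:CS is left out because it would also need the umpire's call.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    # Hypothetical record; field names are illustrative only.
    swung: bool          # batter offered at the pitch
    contact: str         # "miss", "foul", "in_play", or "" when there was no swing
    in_wide_zone: bool   # location falls inside the "wide" strike zone


def ratio(num, den):
    # Guard against empty denominators in small samples.
    return num / den if den else float("nan")


def rates(pitches):
    swings = [p for p in pitches if p.swung]
    iwz = [p for p in pitches if p.in_wide_zone]
    owz = [p for p in pitches if not p.in_wide_zone]
    return {
        "Swing": ratio(len(swings), len(pitches)),                      # swings/pitches
        "Whiff": ratio(sum(p.contact == "miss" for p in swings), len(swings)),
        "Foul": ratio(sum(p.contact == "foul" for p in swings), len(swings)),
        "IWZ": ratio(len(iwz), len(pitches)),                           # pitches in wide zone
        "Chase": ratio(sum(p.swung for p in owz), len(owz)),            # swings outside zone
        "Watch": ratio(sum(not p.swung for p in iwz), len(iwz)),        # takes inside zone
    }


# Ten made-up pitches: five in the wide zone, five outside it.
SAMPLE = [
    Pitch(True, "miss", True),
    Pitch(True, "foul", True),
    Pitch(True, "in_play", True),
    Pitch(False, "", True),
    Pitch(False, "", False),
    Pitch(False, "", False),
    Pitch(True, "miss", False),
    Pitch(False, "", False),
    Pitch(True, "foul", True),
    Pitch(False, "", False),
]

if __name__ == "__main__":
    for name, value in rates(SAMPLE).items():
        print(f"{name}: {value:.3f}")
```

For the ten sample pitches this works out to Swing 0.5, Whiff 0.4, Foul 0.4, IWZ 0.5, Chase 0.2 and Watch 0.2. Note that Chase and Watch use different denominators (pitches outside and inside the wide zone, respectively), so neither can be derived from the overall Swing rate alone.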

Type # rvERA MPH Swing Whiff Foul B:CS IWZ Chase Watch nkSLG GB% LD% FB% PU% HR/FL%
All 131633 4.34 88 0.435 0.211 0.377 2.1 0.511 0.261 0.394 0.516 44% 20% 28% 7.4% 7.5%

These numbers include only 2010, so there are some weather-related changes to come. For example, fastballs (see below) will get faster and more fly balls will leave the park.

You’ll notice, despite a generous strike zone, pitchers have trouble throwing strikes, and the average ground ball rate is 44 percent. Both the GB percentage and whiff rate are up from 2009, so some decline could be coming over the next several hundred thousand pitches.

Now, for each pitch type. You’ll have to pardon my sometimes confusing two-letter abbreviations—please refer to this key.

Pitch-type abbreviations:

CH = Change-ups, may include some splitters that tail more than tumble
CU = Curveballs, probably some slurves
F2 = Two-seam fastball, sinkers, tailing fastballs
F4 = Four-seam fastball, generic fastballs
FC = Cutters and some slutters, can be a fuzzy group
FS = Splitters, foshes and forkballs, may include some other tumbling change-ups
KN = Knuckleballs, although some of Eddie Bonine's (et al.) are not in here
SB = Screwball, sole property of Danny Herrera
SL = Slider or slurve, even some slutters

Type # rvERA MPH Swing Whiff Foul B:CS IWZ Chase Watch nkSLG GB% LD% FB% PU% HR/FL%
SL 18722 3.82 84 0.456 0.327 0.317 2.4 0.484 0.304 0.374 0.505 45% 18% 29% 8.4% 7.8%
FS 1841 3.87 84 0.501 0.345 0.298 4.0 0.434 0.340 0.280 0.424 48% 19% 25% 7.3% 6.8%
CH 13325 4.16 83 0.491 0.307 0.291 3.7 0.441 0.325 0.287 0.452 50% 18% 25% 7.1% 6.9%
FC 7230 4.22 87 0.475 0.212 0.389 2.3 0.526 0.275 0.337 0.494 44% 21% 26% 9.3% 6.1%
CU 11156 4.47 77 0.373 0.261 0.327 2.2 0.467 0.254 0.487 0.512 49% 19% 27% 4.8% 8.3%
F4 46115 4.49 92 0.421 0.164 0.438 1.7 0.561 0.226 0.416 0.567 35% 21% 34% 9.6% 7.7%
F2 29551 4.53 91 0.430 0.128 0.391 1.9 0.543 0.239 0.400 0.499 52% 20% 23% 4.5% 7.3%
KN 821 4.88 69 0.445 0.227 0.384 2.7 0.515 0.236 0.348 0.563 37% 20% 32% 10.6% 8.1%
SB 42 5.26 66 0.333 0.071 0.286 2.5 0.429 0.250 0.556 0.111 44% 33% 11% 11.1% 0.0%

If a pitcher has a higher than expected HR/FL rate, will it regress toward league average or toward league average by pitch? For example, a fastball/curveball pitcher could be expected to give up more home runs per fly ball-plus-line drive than a sinker/change-up pitcher. If you get that awkward question, “where do ground balls come from?,” you can answer “from sinkers and off-speed pitches.” If your favorite pitcher doesn’t command, or even own, a sinker, a slider or change-up can get the ground ball when needed. You can also look at the above table and understand why a fastball that gets a whiff rate north of .3 is so darn impressive, while a slider with the same rate may not be.

Now let’s try some pitch types grouped together, but not in mutually exclusive groups. Cutters are in FC/SL and F4/FC—all of them. The CH/FS group is probably the most useful combination due to their similarity and overlap, followed by F4/F2 for the same reason. The rest are sketchy or totally arbitrary (KN/SB).

Type # rvERA MPH Swing Whiff Foul B:CS IWZ Chase Watch nkSLG GB% LD% FB% PU% HR/FL%
FC/SL 25952 3.93 85 0.461 0.295 0.337 2.4 0.496 0.296 0.364 0.502 45% 19% 28% 8.7% 7.3%
SL/CU 29878 4.06 81 0.425 0.302 0.321 2.3 0.478 0.285 0.416 0.508 46% 18% 28% 7.1% 8.0%
CH/FS 15166 4.12 83 0.492 0.312 0.292 3.7 0.440 0.327 0.286 0.449 50% 18% 25% 7.1% 6.9%
F4/FC 53345 4.45 91 0.428 0.171 0.431 1.8 0.556 0.233 0.405 0.557 36% 21% 33% 9.6% 7.5%
F4/F2 75666 4.51 92 0.425 0.150 0.420 1.8 0.554 0.231 0.410 0.540 42% 21% 30% 7.6% 7.5%
KN/SB 863 4.90 69 0.440 0.219 0.379 2.7 0.511 0.237 0.358 0.541 37% 21% 31% 10.6% 7.7%
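
For readers who want to build their own groupings, the per-type rows can be pooled by weighting each type by its pitch count. The sketch below is an assumption about how to approximate a group row from the published numbers, not a description of how the group table was actually produced (it was presumably computed from the raw pitches). The ROWS dictionary and the combine function are illustrative names, and only two columns are copied in.

```python
# (pitch count, Swing, IWZ) copied from the per-type table
ROWS = {
    "CH": (13325, 0.491, 0.441),
    "FS": (1841, 0.501, 0.434),
    "F4": (46115, 0.421, 0.561),
    "F2": (29551, 0.430, 0.543),
}

COUNT, SWING, IWZ = 0, 1, 2  # column positions inside each tuple


def total_pitches(rows, types):
    return sum(rows[t][COUNT] for t in types)


def combine(rows, types, col):
    # Pitch-count-weighted average; appropriate for rates whose
    # denominator is "all pitches" (Swing, IWZ).
    weighted = sum(rows[t][COUNT] * rows[t][col] for t in types)
    return weighted / total_pitches(rows, types)


if __name__ == "__main__":
    group = ["CH", "FS"]
    print(total_pitches(ROWS, group))
    print(round(combine(ROWS, group, SWING), 3))
    print(round(combine(ROWS, group, IWZ), 3))
```

For CH/FS this reproduces the 15166 pitches, the 0.492 swing rate and the 0.440 IWZ rate shown in the group table. Rates with a different denominator need different weights: Whiff and Foul are per swing, so they should strictly be weighted by swings (pitches times Swing) rather than by pitches, and Chase and Watch by pitches outside and inside the wide zone.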

While I’ve already called most of these groupings arbitrary and sketchy, there is utility hidden in a few places. For example, the SL/CU group may be handy for “breaking pitches” of unknown variety. I’m sure creative minds can think of more uses, and more sophisticated approaches. I hope we’ll see some of that in the comments. If nothing else, I hope this provides a handy reference.

References & Resources
PITCHf/x data from Sportvision and MLBAM. Pitch classifications by the author.


Comments

  1. Tom M. Tango said...

    Harry, good stuff.  A couple of points.  Why not have a “wide” and a “tight” strike zone, and then anything chased wide is Chase and anything watched tight is Watched.  The borderline pitches shouldn’t be in either chase or watch.

    Instead of SLGCON, why not wOBA CON?  SLG totally messes the weighting, while wOBA sets it right.  And, you can use simpler weights like 0.9 for 1B, 1.3 for 2B, 3B, and 2.0 for HR if you like.

  2. Harry Pavlidis said...

    Thanks Tom.
    To the first point, at Complete Game Consulting we are working on an improved strike zone model, but it isn’t ready for use in publications yet. I actually pulled the data out at the last minute. I think the notion of tossing the close pitches from chase/watch is fantastic, I’m taking that!

    Re SLGCON, I agree, in part. I present linear weight based rvERA already, so I go with something more “traditional” for the rest. But I think showing the non-swing, swing, and in-play+HR weights would be informative as you’ve pointed out.

  3. James M. said...

    Looking at rvERA it implies pitchers should throw a lot more sliders, cutters and changeups and a lot fewer fastballs.  But aren’t those results a function of when the different types of pitches are most likely to be thrown (i.e., the first three when ahead in the count, fastballs when even or behind)?  To judge the effectiveness of each pitch type, don’t you need to put it in context of the ball/strike count?

  4. Tom M. Tango said...

    “based on linear weights and outcomes on ball/strike counts”

    It’s LWTS by ball-strike count.

  5. Peter Jensen said...

    Harry – Are Start_Speeds really calculated at 55 feet when all other start parameters are calculated at y0 or 50 feet? Why would you say that Watch is the inverse of Swing Rate when that is neither mathematically nor conceptually correct?

    Otherwise I like the table and commend you on the hard work that you have put in thus far.  As far as SLGCON goes I am not sure why Tom would say that it “totally messes the weighting.”  Slugging (bases/AB) seems entirely appropriate for a measure “on contact”.  I do like his idea of separating the pitches at the edge of the strike zone from those that are either clearly in or clearly out however.

  6. Tom M. Tango said...

    Peter, if you hit a HR in 6 PA, that’s +1.4 runs for the HR and -1.4 runs for the 5 outs.  That makes it an average set of 6 at bats.

    If you hit 1 single and 1 double in 6 PA, that’s +1.25 runs for the hits, and -1.12 runs for the 4 outs, or a bit over average.

    SLGCON says .667 in the first case and .500 in the second case.

    The weightings are wrong.

    Now, if you did “2” for a HR, you get a wOBA of .333.  And if you did “0.9” for a 1B and “1.3” for a 2B, you get .367 in the second case.

    That’s why you should throw SLG out the window and into the sewer.  And for good measure, explode it.  Then evaporate it into the sky.
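
The arithmetic in the comment above can be reproduced in a few lines of Python. This is only a sketch: the simplified weights (0.9 for a single, 1.3 for a double or triple, 2.0 for a home run) are the ones suggested in the comment, outs are assumed to be worth zero in both scales, and the function and variable names are made up for illustration.

```python
# Slugging counts total bases; the simplified wOBA-style weights are
# the ones proposed in the comment thread.
SLG_WEIGHTS = {"out": 0.0, "1B": 1.0, "2B": 2.0, "3B": 3.0, "HR": 4.0}
SIMPLE_WOBA_WEIGHTS = {"out": 0.0, "1B": 0.9, "2B": 1.3, "3B": 1.3, "HR": 2.0}


def rate_on_contact(events, weights):
    # Average weight per batted-ball event (strikeouts are excluded upstream).
    return sum(weights[e] for e in events) / len(events)


# Case 1: one home run and five outs.
CASE_HR = ["HR"] + ["out"] * 5
# Case 2: one single, one double and four outs.
CASE_HITS = ["1B", "2B"] + ["out"] * 4

if __name__ == "__main__":
    for label, events in (("HR + 5 outs", CASE_HR), ("1B + 2B + 4 outs", CASE_HITS)):
        slg = rate_on_contact(events, SLG_WEIGHTS)
        woba = rate_on_contact(events, SIMPLE_WOBA_WEIGHTS)
        print(f"{label}: SLGCON {slg:.3f}, simple wOBAcon {woba:.3f}")
```

Slugging ranks the home run case higher (.667 against .500), while the simplified weights rank the single-plus-double case higher (.367 against .333), which is the ordering the run values in the comment point to.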

  7. Harry Pavlidis said...

    A key factor in rv based metrics is context. You can not assume that throwing more sliders would result in a better overall pitcher.

    Peter – MLBAM publishes the data from 50 ft, you can extrapolate location anywhere (home plate, release point) from there. Watch is the inverse of Swing rate in the zone, 1-Watch=Swings in the zone

  8. Peter Jensen said...

    Yes, Tom, Linear Weights better measures the run value of an AB because it includes the value of preserving future PAs by avoiding outs.  I understand that as well as you do.  But slugging, by measuring just the number of bases per AB, gives a better measure of how hard the ball was hit, which I think is a legitimate measure of the effectiveness of a specific type of pitch.

    If you want to throw SLG “out the window and into the sewer” go ahead, but SLG has its uses as a baseball metric.  It is just a matter of using it appropriately and I think Harry has.

  9. Peter Jensen said...

    “Peter – MLBAM publishes the data from 50 ft, you can extrapolate location anywhere (home plate, release point) from there. Watch is the inverse of Swing rate in the zone, 1-Watch=Swings in the zone”

    Harry – Now you are really confusing me.  Are you saying that MLBAM is publishing the Pitch f/x Start_Speed as of 50 feet, but that you have extrapolated it back to 55 feet? 

    I understand that 1-Watch = Swings in the zone.  But you don’t give Swings in the zone.  You defined Swing as swings/pitches and then said Watch was the inverse of that.  So is Swing = swings/all pitches or just swings/pitches in the zone?

  10. Tom M. Tango said...

    Peter, certainly, if you want to measure how “hard a ball is hit”, that’s perfectly legitimate.  I question that SLG is that method.  I see no reason to accept that 0,1,2,3,4 as the weights for out,1b,2b,3b,hr as a proxy for hardness of ball hit.

    One can easily argue that the weights could be:
    0.5 contact out
    1.0 single
    3.0 double
    5.0 triple
    10.0 HR

    Indeed, all you need to do is run a regression of speed off bat to the outcome events, and you can come up with a decent regression equation to give you what you want.

  11. Harry Pavlidis said...

    Peter – The confusion on swing/chase/watch is sloppy semantics.
    Swing = all swings/all pitches
    Chase = swings OWZ/pitches OWZ
    Watch = takes IWZ/pitches IWZ

    That said, based on the great feedback here, I’ll be changing those in the near future. I’ll also be sure to keep the explanations clear.

    Re. the 50 ft/55 ft. The data was originally provided at 40 ft, then 55, then they settled on 50. Everything is a parabolic curve fitting process, always has been. Plate locations are calculated, not measured, for example. You may recall Rand talking about the foam board experiments at the first SV Summit.

  12. Jeremy Greenhouse said...

    Harry, can you explain your reasoning as to why “you can not assume that throwing more sliders would result in a better overall pitcher.” The research I’ve seen would suggest otherwise.

  13. Harry Pavlidis said...

    That slider may be effective because it is set-up by “ineffective” fastballs or it may be effective because it isn’t seen often. This can be teased out, so there’s no need to assume one way or the other. Study the pitcher, how he uses the pitch etc, before determining a change in pitch mix. Or target a specific count/situation where there may be more clarity.

  14. Matt said...

    That the change-up would be an above average groundball generating pitch makes sense.  Most pitchers turn over the change-up and it really acts like a slow tumbling sinker.

    What a cool story, good read.

  15. Joe said...

    If you multiply Swing by Whiff, that gives you overall Miss%, which should be similar to fangraphs’ SwStr%.  Their league average is 8.3%, but here it’s 9.2% – can I assume that the difference is based on how you handle pitchouts (excluded?) and foul tips?

    Great article though – I had just been looking for league average swing and whiff rates like the kind mentioned in this article (http://www.fangraphs.com/fantasy/index.php/lincecum-on-another-level/) – Texas Leaguers says they have it but I couldn’t find it.
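
The Swing times Whiff calculation in this comment can be sketched as follows. The rates are copied from the article's tables; the function name and the choice of rows are illustrative only.

```python
def misses_per_pitch(swing_rate, whiff_rate):
    # (swings/pitches) * (misses/swings) = misses/pitches
    return swing_rate * whiff_rate


# (Swing, Whiff) pairs copied from the tables in the article
TABLE = {
    "All": (0.435, 0.211),
    "SL": (0.456, 0.327),
    "F4": (0.421, 0.164),
}

if __name__ == "__main__":
    for pitch_type, (swing, whiff) in TABLE.items():
        print(f"{pitch_type}: {misses_per_pitch(swing, whiff):.1%}")
```

For the All row this gives roughly 9.2 percent, the figure quoted in the comment. Because Whiff here includes foul tips, the product is not expected to line up exactly with a swinging-strike rate that treats them differently.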

  16. Harry Pavlidis said...

    Foul tips are considered swinging strikes, bunts and IBB are tossed out. Pitch outs, good question. I’m not actually doing anything to handle them.

    I do have an odd feeling about the overall whiff rate – it looks high to me, but I double-checked. I haven’t triple-checked it though.

  17. Mike Rogers said...

    Awesome work, Harry. I love reading this stuff. Really liked Tom’s suggestion about the strikezones. Glad to see you taking that up.

  18. Peter Jensen said...

    Harry – I am aware of the history of Pitch f/x and the changes in calculating Start_Speed. But I was under the impression that Sportvision had standardized on 50 feet during 2009 and had used that distance for all the calculations during 2010.  I was trying to determine whether your definition above:

    MPH = speed at release, 55 ft. from the back end of home plate

    was just a typographical error and you were using the MPH reported by MLBAM, or if you were using an MPH at 55 feet that you had extrapolated yourself.

  19. RZ said...

    Excellent work Harry. It would be cool if you updated this after the season, then divided it up by batter handedness. Curious to see how sliders rate.

  20. Harry Pavlidis said...

    This is great, you guys are outlining my next article for me

    - chase/watch based on Tango’s suggestion
    - LW broken down by swing/take and contact
    - L/R splits
    - break down by velocity

  21. Will said...

    I’d love to see FB2 and FB4 broken out further by speeds.

    I also think count-based outcomes are a key missing piece of information because every pitch is defined by that context.
