Monday, September 15, 2008
Introducing plate discipline stats
Posted by Derek Carty at 11:03pm
Last year, Pizza Cutter over at MVN's Statistically Speaking blog used Retrosheet pitch-by-pitch data to dig deeper into what we generally call "plate discipline." What is plate discipline, and how do we measure someone with good plate discipline? His findings were extremely interesting and have significant fantasy implications.
We now have access to data that weren't available when his study was originally conducted, so I'll be introducing some changes and two new stats based on these new data.
Until now, even some of the more statistically inclined fantasy analysts have gone only as far as breaking down "plate discipline" into strikeouts and walks, sometimes separately, other times in the form of BB/K. These analyses use BB/K as a measure of a hitter’s plate discipline, batting eye, or ability to distinguish between balls and strikes.
This seems to make sense at first glance. Pizza Cutter put it well when he said that "the logic behind the ratio seems relatively obvious: the two outcomes of a plate appearance that don’t involve the ball being put into play are the walk and the strikeout. As both events are made up by either good or bad command of the strike zone, it would seem that they are the opposite sides of the same coin." His research that follows, however, shows that this isn't nearly the strongest measure we can find. Strikeouts and walks don't actually appear to be direct opposites of each other, as many people assume and as BB/K seems to imply.
As an example, let's look at the argument that a player who takes a lot of walks is disciplined. While this may be true, it's also very possible he isn't disciplined at all. Consider the scenario in which a hitter takes a lot of pitches—even some good ones—because his batting eye is so poor he has difficulty distinguishing between good and bad pitches. So to compensate, he lets any ball he isn't totally sure about go by. He'll take walks, but that doesn't mean he's disciplined. In actuality, his eye stinks!
To take it a step further, let's say he makes contact with every pitch he actually does swing at, keeping his strikeout rate low. While his BB/K might be good, this isn't because he has a good eye. It's because he takes a ton of pitches and has good bat control to compensate.
What fantasy players care about
As fantasy players, we don't get a whole lot out of walk rates. Sure, those with high walk rates will score some extra runs and have more opportunities to steal bases, but there are plenty of more important stats to focus on. In the context of plate discipline, what we really care about is strikeout rate, because it has a direct effect on batting average.
We've discussed the topic before, but essentially, if a batter strikes out, he has no chance at all of getting a hit. If he avoids striking out and puts the ball into play, there is always a chance that it will fall for a hit. Therefore, the players who limit strikeouts best will have the most opportunities to get hits and, holding all else constant, the higher batting averages.
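To put rough numbers on that argument, here is a minimal Python sketch. It assumes every at-bat that doesn't end in a strikeout becomes a hit at one fixed rate; the function name, the 500 at-bats, and the .320 rate are hypothetical values chosen for illustration, not figures from the study.

```python
def batting_average(at_bats: int, strikeouts: int, avg_on_contact: float) -> float:
    """Batting average when every non-strikeout at-bat becomes a hit
    at the same fixed rate (a simplifying assumption)."""
    if at_bats <= 0 or not 0 <= strikeouts <= at_bats:
        raise ValueError("need at_bats > 0 and 0 <= strikeouts <= at_bats")
    non_strikeout_at_bats = at_bats - strikeouts
    return non_strikeout_at_bats * avg_on_contact / at_bats


# Two hypothetical hitters who differ only in how often they strike out.
low_k = batting_average(at_bats=500, strikeouts=75, avg_on_contact=0.320)
high_k = batting_average(at_bats=500, strikeouts=125, avg_on_contact=0.320)

print(f"75 strikeouts:  {low_k:.3f}")
print(f"125 strikeouts: {high_k:.3f}")
```

With everything else held equal, the hitter with 50 fewer strikeouts ends up with the higher average, which is the whole point of focusing on strikeout rate.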
This being said, today we'll look at some new statistics that will really illuminate a player's skills in terms of limiting strikeouts.
Sensitivity and response bias
We've now established that BB/K isn't optimal, but how should we measure plate discipline, and more importantly for us, the plate discipline skills that allow a batter to limit strikeouts? Pizza Cutter came up with a very cool way using signal detection theory. I'd highly recommend reading his entire work on the matter (although be warned: It is a lot to digest. There's a simpler discussion here).
In summary, he came up with two new statistics, which he calls sensitivity and response bias. Sensitivity "is a reflection of how good a batter is at judging between the pitches at which he should and shouldn’t swing." Response bias shows the batter's tendencies when he makes a mistake. Is he taking too many pitches or swinging at too many? (If a batter is going to make mistakes, these stats show that swinging more will limit strikeouts better than taking too many pitches.)
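For readers curious about the machinery, the sketch below shows the textbook signal detection calculation of sensitivity (usually written d') and response bias (the criterion, c). It is an illustration under assumptions: the equal-variance formulas are the standard ones, but the function and argument names are hypothetical, and nothing here is confirmed to match the exact scaling used in Pizza Cutter's work or in this article.

```python
from statistics import NormalDist


def sensitivity_and_bias(good_swings: int, bad_takes: int,
                         bad_swings: int, good_takes: int) -> tuple[float, float]:
    """Textbook equal-variance signal detection measures.

    good_swings: swings at pitches the batter should swing at
    bad_takes:   takes on pitches the batter should swing at
    bad_swings:  swings at pitches the batter should take
    good_takes:  takes on pitches the batter should take
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    # Both rates must be strictly between 0 and 1, or inv_cdf raises an error.
    hit_rate = good_swings / (good_swings + bad_takes)
    false_alarm_rate = bad_swings / (bad_swings + good_takes)
    d_prime = z(hit_rate) - z(false_alarm_rate)               # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))    # response bias
    return d_prime, criterion


# Hypothetical counts: a balanced hitter and a swing-happy one.
balanced = sensitivity_and_bias(80, 20, 20, 80)
swing_happy = sensitivity_and_bias(90, 10, 40, 60)
print(balanced)
print(swing_happy)
```

In this textbook convention, a criterion near zero means mistakes are split evenly between bad swings and bad takes, and a negative criterion means the batter errs on the side of swinging.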
Technical changes to sensitivity and response bias (for those who are interested)
When Pizza Cutter conducted his study, it was before FanGraphs published pitch-by-pitch data and right before PITCHf/x really exploded, so he was unable to incorporate location data. Now that we have location data, however, we are able to account for some of the things that were then impossible.
If you've read through his methodology, then you'll probably also be interested in the next few paragraphs, which explain the changes I made based on FanGraphs' location data.
In the previous incarnation, a "correct swing" included all contacted balls. In my version, it includes all in-zone swings. In the previous incarnation, a "Type I Error" included all swinging strikes. In my version, it includes only out-of-zone swinging strikes. All called strikes remain a "Type II Error," and all balls remain a "correct rejection."
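The revised definitions can be written out as a small Python sketch. Two things here are assumptions: pitch location stands in for the umpire's call on a taken pitch (a take in the zone is treated as a called strike, a take out of the zone as a ball), and an out-of-zone swing that makes contact is left unclassified because the definitions above don't assign it a category. The names are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT_SWING = "correct swing"
    TYPE_I_ERROR = "Type I error"
    TYPE_II_ERROR = "Type II error"
    CORRECT_REJECTION = "correct rejection"
    UNSPECIFIED = "not specified in the text"


def classify_pitch(swung: bool, in_zone: bool, contact: bool = False) -> Outcome:
    """Map one pitch to a signal detection category using the revised definitions."""
    if swung:
        if in_zone:
            return Outcome.CORRECT_SWING      # all in-zone swings
        if not contact:
            return Outcome.TYPE_I_ERROR       # out-of-zone swinging strike
        return Outcome.UNSPECIFIED            # out-of-zone contact: not covered above
    # Assumption: location decides whether a taken pitch is a strike or a ball.
    return Outcome.TYPE_II_ERROR if in_zone else Outcome.CORRECT_REJECTION


print(classify_pitch(swung=True, in_zone=True, contact=False).value)
print(classify_pitch(swung=False, in_zone=True).value)
```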
Pizza Cutter made a distinction between two-strike foul balls and zero and one-strike foul balls. Because FanGraphs doesn't make this distinction, all foul balls are simply treated as contacted balls.
I could have used PITCHf/x data and made this distinction, but there were a few reasons I didn't. The first is that we have only one full season's worth of data and we wouldn't be able to track a player's progress yet (something we like to do in fantasy). The second is that I'd like to run tests this offseason to see how plate discipline affects other skills, and more than one season is needed for this. The last is that I'd like to provide you with a way to calculate these stats on your own throughout the season. At the bottom of this article, you'll find a download link for an easy-to-use calculator.
The purpose of the statistics remains the same.
Superficial changes to sensitivity and response bias (for everyone)
Pizza Cutter used the terms "sensitivity" and "response bias" because they are the technical terms associated with the statistical technique used. We're going to change the names to make it easier to understand what I'm talking about. From this point, I'll refer to sensitivity as judgment (I'd call it eye, but this could get confusing since BB/K is often called eye) and response bias will be called aggressiveness/passivity bias (or Agg/Pass or A/P).
If anyone has a better suggestion, please let me know; I realize "aggressiveness/passivity bias" really isn't very catchy, although it does sum up what it is. When a hitter makes a mistake, is it because he is too aggressive or too passive? Swinging too much or too little?
In the spirit of keeping things simple, I've also made
The scale for
In addition to judgment and aggressiveness/passivity, I'll be using a third statistic that will help us evaluate a hitter's plate skills: bat control. If a batter has a perfect eye but isn't able to take advantage of it by swinging the bat well, what's the point?
Bat control is the percentage of balls within the strike zone that the hitter makes contact with (given that he swings). The formula is (in-zone contacted balls)/(in-zone swings).
Since a hitter is swinging, we can assume it's because he believes he can hit the ball (this isn't always true, but it's good enough for our purposes), and a ball within the strike zone is definitely capable of being hit. By focusing on in-zone pitches, we ignore the times a batter swings and misses at pitches out of the zone. Those misses are more likely to be caused by poor judgment than by poor bat control: the batter shouldn't be swinging at a ball outside the strike zone if he isn't able to hit it.
So the percentage of times he does what he intends to do (make contact) when he should be expected to (when it's in the strike zone), I contend, gives us a good measure of bat control.
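As a minimal Python sketch, the bat control formula is a single division with a guard against an empty sample. The function name and the counts in the example are hypothetical.

```python
def bat_control(in_zone_contacts: int, in_zone_swings: int) -> float:
    """(in-zone contacted balls) / (in-zone swings), returned as a fraction."""
    if in_zone_swings <= 0:
        raise ValueError("bat control is undefined without in-zone swings")
    if not 0 <= in_zone_contacts <= in_zone_swings:
        raise ValueError("contacts must be between 0 and the number of swings")
    return in_zone_contacts / in_zone_swings


# Hypothetical hitter: contact on 440 of 500 swings at in-zone pitches.
example_rate = bat_control(in_zone_contacts=440, in_zone_swings=500)
print(f"{example_rate:.0%}")
```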
I wanted to make this into an index stat as well, but because league average is roughly 88 percent, even a batter who hits every single in-zone pitch he swings at (100 percent bat control) would receive only a 113 bat control index score. This would be confusing, because the difference between a 103 and a 113 index doesn't look large, but it's actually the difference between a slightly better than league average batter and a perfect one!
So, we'll have to use the raw data. League average is 88 percent, better than 95 percent is very good, less than 80 percent is very bad.
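The index arithmetic behind that decision is easy to check. The sketch below assumes the usual index convention of 100 times the player's rate divided by the league rate (the text doesn't spell out the formula), and the function names are hypothetical.

```python
LEAGUE_AVG_BAT_CONTROL = 0.88  # league average quoted in the text


def index_score(rate: float, league_rate: float = LEAGUE_AVG_BAT_CONTROL) -> float:
    """Assumed index convention: 100 means league average."""
    return 100 * rate / league_rate


def rate_for_index(index: float, league_rate: float = LEAGUE_AVG_BAT_CONTROL) -> float:
    """Invert the index to recover the raw rate."""
    return index * league_rate / 100


perfect_index = index_score(1.00)     # ceiling of the scale, a bit above 113
rate_at_103 = rate_for_index(103)     # raw bat control behind a 103 index
print(round(perfect_index, 1), round(rate_at_103, 3))
```

A perfect hitter tops out between 113 and 114, while a 103 index corresponds to roughly 91 percent bat control, so the whole range from slightly above average to perfect gets squeezed into about ten index points.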
Bad ball hitting
While bat control measures a batter's ability to hit balls that the rules of baseball say he should be able to (pitches within the strike zone), some hitters are able to hit balls that they really aren't expected to (pitches out of the strike zone). So our second new stat we'll call bad ball hitting (name lifted from Dan Fox's article on plate discipline stats).
It is calculated as (out-of-zone contacted balls)/(out-of-zone swings).
Here it would be more feasible to use an index scoring system, but I think it would just get too confusing. Since bat control uses raw percentages, bad ball hitting will too, to stay consistent. League average is 57 percent. Above 75 seems to be good, above 85 is elite. Below 50 generally seems to be poor, below 45 is terrible.
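The formula and the rough cutoffs above can be sketched in a few lines of Python. The function names and the "no label" fallback for rates between 50 and 75 percent are hypothetical; the text only labels the ranges outside that band.

```python
def bad_ball_hitting(out_zone_contacts: int, out_zone_swings: int) -> float:
    """(out-of-zone contacted balls) / (out-of-zone swings), as a fraction."""
    if out_zone_swings <= 0:
        raise ValueError("bad ball hitting is undefined without out-of-zone swings")
    return out_zone_contacts / out_zone_swings


def bad_ball_tier(rate: float) -> str:
    """Label a rate using the cutoffs quoted in the text."""
    if rate > 0.85:
        return "elite"
    if rate > 0.75:
        return "good"
    if rate < 0.45:
        return "terrible"
    if rate < 0.50:
        return "poor"
    return "no label"  # 50 to 75 percent, which includes the league average of 57


# Hypothetical hitter: contact on 57 of 100 swings at out-of-zone pitches.
league_like = bad_ball_hitting(57, 100)
print(league_like, bad_ball_tier(league_like))
```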
Some quick correlations
If you're not interested in the exact numbers, it might be worth it to look quickly at the order in which our new stats best predict strikeout rate.
+-------------+-------------+
| STAT        | CORRELATION |
+-------------+-------------+
| Bat control |       -0.85 |
| Bad ball    |       -0.79 |
| Judgment    |       -0.57 |
| A/P         |        0.42 |
+-------------+-------------+
Just about as I'd expected: Bat control is most important, followed by the ability to make contact with balls that aren't expected to be hit, followed by judgment, and finally the hitter's aggressiveness/passivity bias.
And if you're interested, here's how they predict walk rate:
+-------------+-------------+
| STAT        | CORRELATION |
+-------------+-------------+
| A/P         |       -0.39 |
| Judgment    |        0.22 |
| Bad ball    |       -0.18 |
| Bat control |       -0.17 |
+-------------+-------------+
Here are the intraclass correlations that show how stable these stats are from year to year:
+-------------+-------------+------+
| STAT        | CORRELATION | R-SQ |
+-------------+-------------+------+
| Bat control |        0.90 | 0.82 |
| A/P         |        0.82 | 0.67 |
| Bad ball    |        0.81 | 0.65 |
| Judgment    |        0.74 | 0.55 |
+-------------+-------------+------+
As you see, they are all pretty stable.
Calculate our four plate discipline stats on your own
Click here to download an Excel spreadsheet that will allow you to calculate these stats on your own. You can use it throughout next season to see which players are changing their approach or getting better in terms of their plate skills.
All you need to do is copy the "Plate Discipline" section of stats from any FanGraphs player page and paste it into the corresponding section in the worksheet (make sure not to copy the headings or it screws things up). Everything else should be taken care of.
If you have questions about my methodology or thought process, about how to use the stats, or really about anything at all, feel free to e-mail me.
References, resources, and thanks
A huge thanks to Pizza Cutter. He spent a lot of time talking with me about this.
The new stats use FanGraphs plate discipline data.
While including location data has alleviated one of the concerns people had with sensitivity and response bias, others still haven't been addressed. The biggest is probably the assumption that all pitches thrown inside the strike zone should be swung at. Certainly some hitters have cold zones or see pitches that they don't think they can do much with. In the long run, they might be better off taking a called strike and waiting for a better pitch.
PITCHf/x would allow us to at least attempt to look at this, but when I went to do it, I ended up with a mile-long list of confounding factors. And if the majority of these factors aren't accounted for, how do we know if a batter took a called strike intentionally or because of a lapse in judgment? If we can't be relatively sure about this, I think it's a mistake to try to make a distinction. I think we're better off making our simple assumption.
Therefore, this issue remains untouched for now. Perhaps I'll look in the future, but for now, I think what we'll be using will be perfectly fine for our purposes, and certainly more accurate and more in-depth than BB/K.
Derek Carty, 23, has also been published by NBC's Rotoworld, Sports Illustrated, FOX Sports, and USA Today. This season, he'll be contributing to FanDuel and will be linking to all of his work at DerekCarty.com. In his three years competing in expert leagues, he has won 2 titles with 4 top three finishes, including a LABR NL title in 2009, making him the youngest person to ever win a major expert league title. Derek is a proud graduate of the MLB Scouting Bureau's Scout Development Program and is a firm believer in the importance of combining stats and scouting. He welcomes questions via e-mail, Facebook, or Twitter.