# The great strikeout debate (Part II)

About a month ago I introduced the idea that the common measure of strikeout ability for pitchers, strikeouts per nine innings (K/9), is flawed and suggested a better measure which I named True K percentage. True K percentage is different from K/9 in that its baseline is at-bats instead of outs. It is also different from strikeout percentage (K%) in that walks are filtered out of the equation because I believe control and strikeout ability are two unrelated skills for the most part.

To get a better understanding of why some of these decisions were made the way they were, I encourage you to read over the first “Strikeout debate” article and the accompanying comments.

As another refresher, here are the exact formulas I am using:

K/9 = (K * 9) / IP

K% = (K / TBF) * 100

True K% = (K / (K + BIP)) * 100
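For readers who want to compute these themselves, here is a minimal Python sketch of the three measures, reading True K% as strikeouts divided by the sum of strikeouts and balls in play. The function and argument names are illustrative assumptions, and the stat line at the bottom is made up rather than taken from any real pitcher.

```python
def k_per_9(k, ip):
    """Strikeouts per nine innings.

    Assumes `ip` is true decimal innings (outs / 3), not box-score
    notation where 6.1 means six and one-third innings.
    """
    return k * 9 / ip


def k_pct(k, tbf):
    """Strikeouts as a percentage of total batters faced (TBF)."""
    return k / tbf * 100


def true_k_pct(k, bip):
    """Strikeouts as a percentage of strikeouts plus balls in play (BIP)."""
    return k / (k + bip) * 100


# Hypothetical stat line, for illustration only.
k, ip, tbf, bip = 50, 50.0, 200, 130

print(f"K/9     = {k_per_9(k, ip):.2f}")
print(f"K%      = {k_pct(k, tbf):.2f}%")
print(f"True K% = {true_k_pct(k, bip):.2f}%")
```

Because walks appear in TBF but not in K + BIP, a pitcher's True K% will generally come out higher than his K%, and the gap widens the more batters he walks.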

Now that we have discussed the pros and cons of all three strikeout measures (K/9, K%, and True K%) in a theoretical sense, let’s roll out the numbers for each pitcher and see whom they disagree on. The following chart shows the top 25 starters for each measure in 2009, with a minimum of five games started (165 starting pitchers qualify).

```
              K/9                            K%                          True K%
1 Rich Harden         11.23    Javier Vazquez     31.16%    Rich Harden        34.64%
2 Javier Vazquez      11.21    Justin Verlander   30.37%    Justin Verlander   33.76%
3 Justin Verlander    11.05    Rich Harden        28.97%    Javier Vazquez     33.65%
4 Jon Lester          10.62    Tim Lincecum       28.14%    Jon Lester         31.58%
5 Tim Lincecum        10.53    Jon Lester         27.83%    Johan Santana      30.95%
6 Johan Santana       10.37    Johan Santana      27.58%    Tim Lincecum       30.93%
7 Jake Peavy          10.14    Jake Peavy         27.46%    Jake Peavy         30.77%
8 Jorge de la Rosa     9.62    Zack Greinke       26.43%    Chad Billingsley   29.61%
9 Jordan Zimmermann    9.47    Dan Haren          25.50%    Jorge de la Rosa   28.57%
11 Daisuke Matsuzaka    9.29    Jordan Zimmermann  24.90%    Zack Greinke       27.87%
12 Max Scherzer         9.27    Jorge de la Rosa   24.49%    Max Scherzer       27.82%
13 Zack Greinke         9.25    Erik Bedard        23.99%    Yovani Gallardo    27.80%
14 David Purcey         9.12    Max Scherzer       23.96%    Clayton Kershaw    27.56%
15 Josh Beckett         8.96    Yovani Gallardo    23.91%    Dan Haren          27.44%
16 Erik Bedard          8.91    Josh Beckett       23.24%    David Purcey       27.08%
17 Jonathan Sanchez     8.89    Felix Hernandez    23.20%    Erik Bedard        27.08%
18 Yovani Gallardo      8.88    Clayton Kershaw    22.79%    Edinson Volquez    26.86%
19 Felix Hernandez      8.86    Wandy Rodriguez    22.73%    Josh Beckett       26.48%
20 Clayton Kershaw      8.72    Roy Halladay       21.78%    Jonathan Sanchez   26.42%
21 Dan Haren            8.62    David Purcey       21.67%    Joba Chamberlain   25.66%
22 Edinson Volquez      8.52    Edinson Volquez    21.56%    Felix Hernandez    25.61%
23 Wandy Rodriguez      8.47    Randy Johnson      21.48%    Wandy Rodriguez    25.51%
24 Oliver Perez         8.31    Josh Johnson       21.33%    Randy Johnson      24.71%
25 Joba Chamberlain     8.24    Jered Weaver       21.23%    A.J. Burnett       24.54%
```

Reading across any row, you will often find three different pitchers, so the measures disagree significantly on many of them. Even the top spot is in dispute: K/9 and True K% say the best strikeout pitcher is Rich Harden, while K% prefers Javier Vazquez.

Since we understand the formulas behind the three, we know why some pitchers are ranked higher in some than in others. A pitcher like Dan Haren will be ranked more highly by K% since he walks very few batters. And Oliver Perez has the greatest difference between K% and True K% because of his 8.72 BB/9 rate.

But what type of pitcher is ranked most differently by K/9 and True K%? That type is harder to define, so let’s look at those with the biggest gaps.

The five pitchers with the greatest differential between their K/9 and True K% who are ranked higher by True K% are:

- Mark Buehrle
- David Bush
- Chris Carpenter
- Johnny Cueto
- Kyle Davies

The five pitchers with the greatest differential between their K/9 and True K% who are ranked lower by True K% are:

- Chien-Ming Wang
- Ricky Nolasco
- Oliver Perez
- Dana Eveland
- Kevin Slowey

I was not exactly sure what these pitchers had in common until I had finished the list of pitchers that are ranked lower and realized that all of them, with the exception of Slowey, had gotten off to terrible starts. Then I began thinking about what caused their poor performance and realized BABIP had a lot to do with it. Checking their BABIPs, I found that even Slowey has an unlucky BABIP of .351 and the group as a whole has an average mark of .406.

Then it was easy to realize the first group, those ranked higher, must have relatively low BABIPs. I was right; their collective average BABIP is .249, led by Carpenter’s .210 mark.

If you think about it, it makes sense that the difference is BABIP-dependent: for True K% you are dividing by all balls in play, while for K/9 you are dividing only by those balls in play that go for outs. The difference between all balls in play and balls in play that become outs is balls in play that do not become outs, or what you would otherwise call hits. And hits are the driving force behind a high or low BABIP.
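To see the effect in numbers, here is a rough Python sketch that gives one hypothetical pitcher the same strikeouts and balls in play under two different BABIPs. It rests on simplifying assumptions that are not part of the formulas themselves: every ball in play is either a hit or exactly one out, and double plays, baserunning outs and home runs are ignored. All names and inputs are made up for illustration.

```python
def k9_and_true_k(k, bip, babip):
    """Return (K/9, True K%) for a simplified, hypothetical stat line.

    Assumes each ball in play is either a hit or exactly one out, so
    outs = strikeouts + balls in play that were not hits.
    """
    hits = bip * babip
    outs = k + (bip - hits)
    ip = outs / 3                      # three outs per inning
    k9 = k * 9 / ip
    true_k = k / (k + bip) * 100       # hits never enter this formula
    return k9, true_k


# Same strikeouts and balls in play, different luck on balls in play.
lucky = k9_and_true_k(k=60, bip=200, babip=0.250)
unlucky = k9_and_true_k(k=60, bip=200, babip=0.400)

print(f"BABIP .250: K/9 = {lucky[0]:.2f}, True K% = {lucky[1]:.2f}%")
print(f"BABIP .400: K/9 = {unlucky[0]:.2f}, True K% = {unlucky[1]:.2f}%")
```

Under these assumptions the high-BABIP line records fewer outs, so the same 60 strikeouts are spread over fewer innings and K/9 climbs from roughly 7.7 to 9.0, while True K% is identical in both cases.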

My next thought was that the pitchers whose rank is about the same for both True K% and K/9 would have BABIPs near the league average of about .300. That also turned out to be true, as the 11 pitchers with no change in rank averaged a BABIP of .307.

This best shows how True K% is superior to K/9: True K% is not wrongly affected by BABIP, which, as far as I am aware, should not have any effect on a pitcher’s strikeout rate.

You should not think of True K% as an attempt to predict K/9; you should use True K% to replace K/9 completely. After seeing the numbers for the first time, I began wondering how many times I must have quoted a pitcher’s decreased K/9 rate as the reason for his problems when really it was poor BABIP luck showing up again in his strikeout rate.

The main practical use of True K% is that you can identify pitchers whose skills are perceived incorrectly, making them good trade targets. In the next article, I will get a spreadsheet up of the True K% numbers for all pitchers over the last few seasons and point out some specific pitchers whose perceived strikeout ability may be off because of a difference between their K/9 and True K% numbers. Now I will turn the floor over to your thoughts…

Thank you to Fangraphs for data and Derek for discussing the True K formula with me.


1. Mark said...

These were 2 GREAT and informative articles!!!

2. Jacob said...

Isn’t your interpretation completely backwards? True K% appears to be overrating pitchers with low (lucky) BABIP, and underrating those with high (unlucky) BABIPs. Nolasco, for instance, has been popping up on a lot of buy-low lists for the past few months, on the assumption that his BABIP will regress towards the league average.

3. Paul Singman said...

Jacob,

True K is not overrating pitchers with low BABIPs and underrating pitchers with higher BABIPs; K/9 is overrating the pitchers with high BABIPs and underrating those with lower BABIPs.

True K% is correctly ranking the second group of pitchers lower. This should make sense since a higher BABIP = fewer outs, which means the denominator of the K/9 equation is smaller (fewer outs, fewer IP). A smaller denominator means the overall value of the fraction will be higher, therefore True K is correct in ranking these pitchers lower.

When I get the numbers sorted out and have them organized for the last few seasons we will be able to look at specific pitchers and I think it will become more clear.

4. Jacob said...

Are we looking at this as an indicator of past success? I think we would agree that the pitchers with the high BABIPs did not have a lot of success, and thus, K/9 overrates that success.

However, we would EXPECT True K% to rise to match K/9, rather than the other way around, as their BABIP and thus their BIP regress to typical values, while their strikeout rates remain the same.

Thus K/9 seems more accurate as an indicator of future quality: we expect K/9 to remain constant, while True K% rises and falls with BABIP.

5. Paul Singman said...

Hmmm, Jacob you might be correct. I need a little more time to think about it which I don’t have right now.

I did create True K% as a better way to determine which pitchers truly were the better strikeout pitchers based on their past performance, not expected future output.

If that is true, though, then True K% would have little fantasy relevance but might be useful to someone trying to compare pitchers’ strikeout rates historically.

Nice catch.