## Evaluating xWHIP

Posted by Martin Alex Hambrick at 3:02am

The writer is xWHIP 2.0 Calculator co-collaborator and creator of Simple/Quick xWHIP. You can download the xWHIP 2.0 Calculator (requires Excel or Open Office) by clicking here.

About six months ago, I began tinkering with the concept of predicting WHIP. After all, WHIP is one of the most widely used indicators of pitcher skill, and in most fantasy leagues, it's a category. Additionally, the components should, in theory, be relatively easy to predict. Walk rate tends to remain consistent from year to year, and the number of hits a pitcher allows should regress, much like a pitcher's BABIP.

I set about predicting WHIP by regressing hits against various standards. While I was looking at the numbers, I noticed an interesting trend: WHIP correlated extremely strongly with K/BB ratio. I was interested in turning correlation into something useful for predicting WHIP, but I wasn't sure how to. So, I took the extremely scientific (sarcasm) path of using Excel to generate the trend line equation and then worked backwards from that.

I was left with the following equation: 1.54 - .512(K/IP) = xH/IP. I rounded the numbers, threw bases on balls into the mix, and was left with the following: 1.5 + (BB - .5K)/IP = xWHIP
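The trend-line step can be reproduced outside Excel. The sketch below fits an ordinary least-squares line to hits per inning against strikeouts per inning. The five data points and the `fit_line` helper are made up purely for illustration; they are not the sample behind the 1.54 and .512 figures.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical (K/IP, H/IP) pairs, invented for illustration only.
k_per_ip = [0.5, 0.7, 0.9, 1.1, 1.3]
h_per_ip = [1.30, 1.18, 1.08, 0.98, 0.86]

slope, intercept = fit_line(k_per_ip, h_per_ip)
print(f"xH/IP = {intercept:.3f} + ({slope:.3f}) * K/IP")
```

A negative slope here plays the same role as the -.512 coefficient: more strikeouts per inning, fewer expected hits per inning.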

This correlated extremely strongly with career WHIP, with an R-squared and R value of .6758 and .822, respectively. However, it consistently estimated WHIP far higher than the actual results. So despite the strong correlation, the accuracy wasn't quite there. I tinkered with the variables involved and tried to standardize it based on factors such as league hits/out, and was left with the following formula, which worked well and was quite accurate. But I didn't have any explanation for why it worked. "It just works" was my motto:

1.375 + (BB - .5K)/IP = xWHIP

I sat on this formula for a while until I met another stat-head, Jeff Gross, who had independently come up with the idea of xWHIP. However, he was using a far more scientific means of generating xWHIP, regressing against real game data averages for various events: ground balls, fly balls, pop-ups, line drives, etc. All of these tend to produce outs at a specific rate that remains constant from year to year. So if you have the number of batted ball types a pitcher allows, you should be able to predict the number of hits he will allow and thus predict his WHIP.

We worked together with the formula for expected hits, and came up with an additional statistic: expected outs, which would be used in place of innings pitched. While Jeff worked on further refining the formula, I reexamined the "Simple xWHIP" formula.

Using Jeff's formula, I plugged in the statistics of an "average" pitcher, one who throws an average number of ground balls, fly balls, line drives, and pop-ups. The only two variables I left unknown were strikeouts and walks. After simplifying the equation, I was left with the following: 1.3747 + (BB - .496*K)/IP. Since the goal is simplicity, and these numbers are close enough, we can round up and get exactly what I got before:

1.375 + (BB - .5K)/IP = xWHIP
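For anyone who would rather script it than do it in their head, here is a minimal sketch of the formula above. The function name `simple_xwhip` and the sample stat line are illustrative, not part of the xWHIP 2.0 Calculator.

```python
def simple_xwhip(walks, strikeouts, innings):
    """Simple xWHIP = 1.375 + (BB - 0.5 * K) / IP."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return 1.375 + (walks - 0.5 * strikeouts) / innings


# Hypothetical season line: 50 BB, 200 K, 200 IP.
# 1.375 + (50 - 100) / 200 = 1.125
print(round(simple_xwhip(50, 200, 200), 3))
```

One caution: box scores write innings as 200.1 or 200.2 to mean 200 1/3 or 200 2/3, so convert those to true fractions before passing them in.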

Finally there was a valid basis for my formula that "just worked." After discovering that, I went to work at checking the statistical validity of both xWHIP (using the expected innings formula, not actual innings pitched) and "Simple xWHIP." I did two calculations for each year: predictive and evaluative. Evaluative is simply comparing 2010 xWHIP vs. 2010 WHIP. Predictive is taking a pitcher's career xWHIP until a given year and using that to predict that year's WHIP.
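To make the two checks concrete, here is a minimal sketch using Simple xWHIP. The season lines are invented, and pooling the prior seasons' totals to get a "career" figure is an assumption about how career xWHIP is computed, not something spelled out above.

```python
def simple_xwhip(walks, strikeouts, innings):
    """Simple xWHIP = 1.375 + (BB - 0.5 * K) / IP."""
    return 1.375 + (walks - 0.5 * strikeouts) / innings


def career_xwhip_before(seasons, year):
    """Simple xWHIP over all seasons before `year`, pooling BB, K and IP.

    Returns None when the pitcher has no earlier seasons.
    """
    prior = [s for s in seasons if s["year"] < year]
    if not prior:
        return None
    bb = sum(s["bb"] for s in prior)
    k = sum(s["k"] for s in prior)
    ip = sum(s["ip"] for s in prior)
    return simple_xwhip(bb, k, ip)


# Hypothetical pitcher; every number here is made up for illustration.
seasons = [
    {"year": 2008, "bb": 60, "k": 150, "ip": 180.0, "whip": 1.31},
    {"year": 2009, "bb": 50, "k": 170, "ip": 200.0, "whip": 1.22},
    {"year": 2010, "bb": 45, "k": 190, "ip": 210.0, "whip": 1.15},
]

current = seasons[-1]
# Evaluative: same-season xWHIP compared with same-season WHIP.
evaluative = simple_xwhip(current["bb"], current["k"], current["ip"])
# Predictive: xWHIP from earlier seasons compared with this season's WHIP.
predictive = career_xwhip_before(seasons, current["year"])

print(f"2010 WHIP {current['whip']:.3f}")
print(f"evaluative xWHIP {evaluative:.3f}")
print(f"predictive xWHIP {predictive:.3f}")
```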

The results were as follows:

As you can see, xWHIP does a pretty good job as both a predictive and evaluative stat. Stat-heads can look at the R^2 value. For the lay person, just look at "Accuracy." What that says is, if you picked a pitcher at random, his xWHIP would be, on average, that much higher or lower than his real WHIP.

As you can also see, xWHIP and "Simple xWHIP" are remarkably similar in their predictive and evaluative power. Full xWHIP is clearly the better choice when you have all the data in front of you, and it will more accurately reflect the abilities of extreme groundball and flyball pitchers. But, when it comes to simplicity, you can't beat Simple xWHIP. In many cases, you don't even need a calculator to figure it out.
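For readers who want to run the same kind of check on their own league, the sketch below computes R, R^2 and an "Accuracy" figure for paired xWHIP and WHIP values. It reads "Accuracy" as the mean absolute difference between the two, which is one interpretation of the description above, and the five pitchers' numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_abs_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical xWHIP and WHIP values for five pitchers (illustration only).
xwhip = [1.10, 1.20, 1.30, 1.40, 1.50]
whip = [1.05, 1.25, 1.28, 1.45, 1.47]

r = pearson_r(xwhip, whip)
print(f"R = {r:.3f}, R^2 = {r * r:.3f}")
print(f"Accuracy (mean absolute error) = {mean_abs_error(xwhip, whip):.3f}")
```

The two numbers answer different questions: R^2 says how well xWHIP tracks the ordering and spread of WHIP, while the accuracy figure says how far off a typical estimate is. That is why the earlier 1.5 constant could correlate strongly and still miss high.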

So, using Simple xWHIP, let's examine pitchers in 2010. First, we'll look at the 10 pitchers whose xWHIP was significantly higher or significantly lower than their WHIP.

This gives a pretty good indication of people to steer clear of. Most of these are pretty obvious, but Matt Cain and Ubaldo Jimenez both strike me as pitchers a lot of people would overpay for.

On the flip side, some interesting names top the list of people poised for a bounce-back year. Of course, some of these you still want to avoid like the plague—a .13 dropoff from a 1.56 WHIP is still pretty bad. I'm looking at you, Paul Maholm.

Now, let's look at the people with the best WHIP, and what you can expect from them next year:

Cliff Lee, Jered Weaver, Roy Halladay, Mat Latos, Josh Johnson, Shaun Marcum and Justin Verlander are all poised to remain at or below a 1.2 WHIP, and you should pay face value for them. A few guys, like Adam Wainwright, Ted Lilly and Roy Oswalt, should still be high-end WHIP pitchers, but be aware that they pitched far beyond their abilities last season.

Finally, let's take a look at the pitchers with the best xWHIP and see if we can find any high value pickups.

Most of the pitchers with very low xWHIP also had very low WHIP, so you will have a hard time getting them for a good value. James Shields, Scott Baker and Francisco Liriano seem to represent the best values here. Dan Haren and Tim Lincecum are good values in theory, but more than likely you will see people pick them up simply because they have the marquee name, so you may still have to overpay to grab them.

So there you have it. xWHIP and Simple xWHIP. Simple xWHIP, like FIP, is easy to calculate, yet still very powerful. xWHIP is a bit more complex, but accounts for much more and will give you a more accurate view of any specific pitcher, especially an extreme groundball or flyball pitcher.

Danys said...

Of course, if you only focus on ERA and Ks, you’ll end up with a low WHIP anyway.

Posted 02/04  at  10:05 AM
Jeremy said...

Wouldn’t extreme flyball pitchers consistently outperform their simple xWHIP? This would explain Cain’s presence, among others, on the “outperformer” list.

Posted 02/04  at  10:08 AM
Richard Kenno said...

The last two lines in the article showing your equation 1.375+ (BB-0.5*K)= XWHIP, appear to be missing /IP under (BB-0.5*K) and the equation in line above is 1.5 -(BB-0.5*K)/IP = XWHIP. Should the equation not be 1.5 +(BB-0.5*K)/IP = XWHIP?, otherwise a pitcher that is wilder, i.e. who gives up more BBs would be calculated to have a lower WHIP.

Posted 02/04  at  11:09 AM
Martin Alex Hambrick said...

Danys- This isn’t the case. Gio Gonzalez and Jaime Garcia are two good examples from 2010. You could have 200+ Ks and a sub-3 ERA, but if your K/BB ratio is ~2:1, you will still have a pretty bad WHIP, xWHIP and Quick xWHIP.

Posted 02/04  at  11:09 AM
Martin Alex Hambrick said...

Good catch. The correct formula should be

1.375 + (BB-.5K)/IP

I will update the article.

Posted 02/04  at  11:12 AM
Jeffrey Gross said...

No, high IFFB guys tend to outperform their xWHIP, not extreme FB guys.

Posted 02/04  at  11:34 AM
Martin Alex Hambrick said...

Jeremy- To be sure, there is some impact. But it is relatively minimal considering the other variables involved. An extreme FB pitcher would outperform his simple xWHIP by approximately .015. With Matt Cain, I suspect his low HR/FB rate and his low LD% in 2010 had far more impact than his FB%.

Posted 02/04  at  11:44 AM
Danys said...

But neither Gio Gonzalez nor Jaime Garcia had 200 Ks in 2010. And only Garcia had a sub-3 ERA. And Garcia was the only pitcher with a sub-3 ERA with a WHIP below the median for qualified pitchers. And he’s not exactly a strikeout pitcher and gives up too many walks.

My point being, looking at K/9, BB/9, K/9-BB/9 (correlates better with ERA than K/BB), and ERA is much more useful than WHIP, which can vary due to BABIP.

Low ERA + High Ks + High K-BB = Low WHIP.

The peripherals are much more useful than actually using WHIP.

Posted 02/04  at  01:03 PM
Martin Alex Hambrick said...

Danys, your initial post didn’t mention BB at all. So I picked pitchers with high BB to show that a pitcher can have a low ERA, high K and still have a high WHIP.

Regarding your second post, the entire point of my post was to illustrate how strongly WHIP correlates to K and BB. We don’t really disagree on anything. Ultimately, Quick xWHIP is simply a way to rephrase K/9 and BB/9. We can argue the usefulness of WHIP; in fact, my research originally started as a way to show the inherent variability in WHIP.

But at the end of the day, WHIP is still a popular barometer, readily available, and it’s a category in almost every fantasy league. So being able to predict it with a few simple statistics can be valuable.

Posted 02/04  at  01:29 PM
Jeffrey Gross said...

Thanks for the writeup, Alex!

Posted 02/04  at  04:50 PM
dave smyth said...

I did this years ago but included HR, and I got

1.26 + BB/IP + HR/IP - .5*K/IP

Posted 02/06  at  08:33 AM
Martin Alex Hambrick said...

Dave, I like where your head is at, but I do have two issues. Firstly, by including HR directly, you aren’t accounting for the impact of luck; HR/FB tends to regress to ~10% ± .5%. This dovetails with my second issue: HRs are hits, plain and simple. The more numbers you use that directly include hits, the less accurate your statistic will be at predicting next year’s performance.

I would be interested in seeing the research that led to the constant and coefficient in your formula. As far as I can tell, the 1.26 number seems pretty accurate. How did you come by it?

Finally, I think you can take the concept and tweak it a bit to make it a better predictor. By replacing HR with FB, you can effectively account for almost everything that a pitcher controls (besides Popups): K, BB, HR, FB, GB, LD. (IP implicitly gives you the number of batted balls a pitcher allowed. Subtract FB, normalize LD to ~19% of the total batted balls, and voila- you have GB).

So that said, you could add a bit of complexity to Quick xWHIP to make it more effective by changing it to:

1.26 + (BB -.5*K + (League HR/FB rate * Park Factor)*FB)/IP

Again, that assumes the accuracy of the 1.26 constant; I would be curious to see how you came to that number. Of course, the more complex you make the stat, the closer to Full xWHIP it becomes. And Jeff already gave us a wonderful calculator to figure out Full xWHIP.

Posted 02/07  at  07:58 AM