|Are we really so sure that Matt Cain has some sort of special home run-preventing skill? (Icon/SMI)|
Last week, a long discussion developed over at RotoWire about DIPS, regression to the mean, xFIP, HR/FB, BABIP, and forecasting in general. I linked to it here at THTF, and a few of you posted comments on it. There was one comment in particular that I wanted to respond to at length:
Dan Haren is singlehandedly destroying my faith in FIP, xFIP, and SIERA.
I’ve kept him on my team all year long and he just continues to kick me in the nuts, like today with his 7 earned runs allowed. His K/BB is great, his strikeout rate is good, yet his BABIP continues to be off-the-charts high.
Meanwhile, I dropped Tim Hudson in early May because I thought his .240 BABIP was not sustainable and his K/BB was barely at 1.0. Hudson just keeps rolling along. His BABIP is now .235.
I’m beginning to think too much knowledge is a bad thing. I make moves based on underlying peripherals and with the thought of “regression to the mean” in mind, and I’m behind owners who pick up Carlos Silva and Livan Hernandez.
Another commenter followed with:
You need to look at the actual player too. Some players are good at bettering the stats while others don’t live up to them. Matt Cain seems to better them while someone like David Bush is not. Bush had 2 season with a 1.14 WHIP and his ERA was like 4.4 and 4.2.
The coin flip analogy
While Matt Cain has posted better-than-average HR/FBs for a few years now (probably the best and longest run we’ve seen since batted ball data became available), that doesn’t necessarily mean he’s any better at preventing home runs on fly balls than Dave Bush. Think about it this way: If we have 8,000 fair coins and we flip them, about 4,000 will land on heads and 4,000 on tails. If we take the “heads” coins and flip them again, about 2,000 will land on heads again. Flip those, and you get about 1,000 of them landing on heads. Do this another nine times (12 flips in all), and since 8,000 halved 12 times is about two, you’ll probably end up with two or three coins that landed on heads every single time.
But are these coins any different from the others we’ve been flipping? Is there something special about them that makes them more likely to land on heads than any of the coins that landed on tails along the way? Of course not. I told you in the beginning that they were fair coins. So if we flipped each of those last two or three coins another 8,000 times, I’ll bet you each one lands on heads close to 4,000 times.
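If you’d rather see it than take my word for it, here is a minimal Python sketch of the exercise. The function names and the seed are my own choices for illustration, and your exact counts will differ from run to run; the point is only that the survivors shrink by about half each round and that a “lucky” fair coin goes right back to roughly 50/50 when you keep flipping it.

```python
import random


def surviving_coins(n_coins, n_rounds, rng):
    """Flip every remaining coin each round and keep only the heads.

    Returns the number of coins still alive after each round,
    starting with the initial count.
    """
    survivors = n_coins
    history = [survivors]
    for _ in range(n_rounds):
        # Each remaining fair coin has a 50% chance of landing heads.
        survivors = sum(1 for _ in range(survivors) if rng.random() < 0.5)
        history.append(survivors)
    return history


def heads_in(n_flips, rng):
    """Count heads when a single fair coin is flipped n_flips times."""
    return sum(1 for _ in range(n_flips) if rng.random() < 0.5)


rng = random.Random(2010)  # arbitrary seed, only so a run can be repeated

history = surviving_coins(8000, 12, rng)
expected = [8000 / 2 ** k for k in range(13)]

print("survivors by round:", history)
print("expected by round: ", [round(x, 1) for x in expected])

# Take one of the "special" coins and flip it 8,000 more times.
print("heads in 8,000 more flips:", heads_in(8000, rng))
```

The expected count after 12 rounds is 8,000 / 4,096, or just under two coins, which is where “two or three” comes from.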
While it’s hard to view humans in this way, we do know that humans don’t have ultimate control over everything in a baseball game and that random chance is involved. If it weren’t, we’d have a much easier time projecting performance.
But which coins will they be?
|Most players are clustered toward the middle, but when a dataset is distributed normally, there will always be a few outliers in the 0.2% area.|
We know that stats like HR/FB follow a (relatively) normal distribution (the same as our coin flips would). They form a bell curve (of sorts), with most players clustered toward the middle, but there are always outliers who are far removed from the middle. We also know that these outliers are rarely the same from year to year. It’s the same with the coins: if we marked each coin and performed our coin flip exercise several times, we wouldn’t end up with the same two or three coins at the end of each trial. They’d almost always be different coins, even though we could be fairly confident that we’d end up with two or three of them each time. But predicting precisely which two or three would be impossible to do beforehand.
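To illustrate the “marked coins” idea, here is a small Python sketch that numbers each of the 8,000 coins, runs the 12-flip exercise five times, and checks whether any coin makes it to the end more than once. The helper name, the number of trials, and the seed are assumptions of mine for the example, not anything from the original discussion.

```python
import random
from collections import Counter


def last_coins_standing(n_coins, n_rounds, rng):
    """Return the IDs of the coins that land heads in every round."""
    alive = set(range(n_coins))
    for _ in range(n_rounds):
        # Keep a coin only if this flip comes up heads.
        alive = {coin for coin in alive if rng.random() < 0.5}
    return alive


rng = random.Random(42)  # arbitrary seed, only so a run can be repeated

# Five independent trials of the same exercise with the same marked coins.
trials = [last_coins_standing(8000, 12, rng) for _ in range(5)]
for number, winners in enumerate(trials, start=1):
    print(f"trial {number}: surviving coin IDs {sorted(winners)}")

# Count how many trials each coin survived.
appearances = Counter(coin for winners in trials for coin in winners)
repeaters = [coin for coin, count in appearances.items() if count > 1]
print("coins that survived more than one trial:", repeaters)
```

Any single coin survives a trial with probability 1 in 4,096, so a coin repeating across trials should be very rare, even though each trial usually leaves a coin or two standing.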
And the same holds true for things like BABIP and HR/FB. Sure, Livan Hernandez and Tim Hudson are having years where their ERAs don’t match their peripherals. But ask yourself this: How long do you expect them to continue doing that? If you don’t answer “indefinitely, because they truly deserve low BABIPs and HR/FBs,” then don’t beat yourself up. There’s nothing you can do, because the fact of the matter is, they are getting lucky. For the 2010 season, they are those final two coins remaining from the 8,000 flips. And it’s as simple as that.
And I put my money where my mouth is. I happen to own both Livan Hernandez and Carlos Silva in LABR NL this year (part of a strategy that involved owning a few crappy pitchers), but despite their successes, I’ve only used Livan for 87 innings and Silva for 70. I have begun to start Silva regularly over the past couple of months, though, because he’s combined legitimately good peripherals with a change in approach. Our coin flip example would still hold for him to an extent, because no one expected him to outperform his projections this much unless they scouted him in Spring Training and noticed his improved change-up, improved breaking ball, renewed control, etc.
To go along with this, I wanted to bring up one last comment from a post I made at the CardRunners site:
Dan Haren is an example of a first half ace. He’s a bum every second half… not only does his ERA jump about a run (3.29 to 4.22), but his WHIP goes from 1.10 to 1.31.
First half ERAs from 2006 to 2009: 3.52, 2.30, 2.72, 2.01. Second half from 2006 to 2009: 4.91, 4.15, 4.18, 4.62
As with BABIP and HR/FB, “second-half ERA” is a stat with lots of variation. It takes many years to stabilize, and because it’s normally distributed, there will always be outliers, especially when dealing with smaller samples. In Haren’s case, we are dealing with a small sample of four poor second halves (plus two years where his second half was better than his first), so claiming that he’s merely a “first-half ace” may be a bit hasty.
So does that mean we know nothing?
No, it doesn’t. Just because it’s possible that Matt Cain is a true 11% HR/FB pitcher doesn’t mean that he absolutely is. Even knowing that we’re looking at a mere sample, and that what we’ve seen could be simple random variation, we have still seen something. And what we’ve seen for Cain is a career 7.8% HR/FB. So what we do is weight his career and regress to the mean to remove the effects of luck as well as possible. Once we do that for Cain, we probably arrive at an expectation for his HR/FB of around 9% or so.
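For anyone who wants to see the arithmetic, here is a rough Python sketch of one common way to regress toward the mean: blend the observed rate with the league rate, giving the observed rate more weight as the sample grows. This is a simplification, not necessarily the exact weighting a projection system would use. I’m taking 11% as the league mean, and the 1,000 fly balls and the 600 “stabilizer” are made-up round numbers for illustration, not Cain’s actual total or a researched constant.

```python
def regress_to_mean(observed_rate, sample_size, league_mean, stabilizer):
    """Shrink an observed rate toward the league mean.

    The weight on the observed rate is n / (n + stabilizer), so a
    bigger sample earns more trust. `stabilizer` is a hypothetical
    tuning constant, not a published value.
    """
    weight = sample_size / (sample_size + stabilizer)
    return weight * observed_rate + (1 - weight) * league_mean


def implied_weight(observed_rate, league_mean, estimate):
    """Back out how much weight an estimate puts on the observed rate."""
    return (league_mean - estimate) / (league_mean - observed_rate)


# Career 7.8% HR/FB, an assumed 11% league mean, and hypothetical
# sample-size inputs (1,000 fly balls, stabilizer of 600).
cain = regress_to_mean(0.078, 1000, 0.11, 600)
print(f"regressed HR/FB: {cain:.3f}")

# How much weight does a 9% expectation put on the 7.8% career mark?
print(f"implied weight on career rate: {implied_weight(0.078, 0.11, 0.09):.3f}")
```

With these inputs the observed rate gets 62.5% of the weight and the league mean gets the rest, which is about what it takes to land near 9%. A smaller sample or a larger stabilizer would pull the estimate closer to 11%.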
And as I said in my previous article in this long-running discussion, that expectation would change if we have other data (such as scouting or a PITCHf/x study). But unless we have that data, that’s the best we can do.
I think that covers everything I wanted to cover, so if you have any questions or comments, feel free to let me know. I’m sure some of you will still be skeptical, so please voice your concerns if you are.