Do we use luck and randomness as a crutch?

by Derek Carty
July 30, 2010

On Wednesday, I posted an (entirely self-serving) article over at the CardRunners site, which could be summed up as half whining about my bad luck and half praising myself for entirely rebuilding my roster to compensate. In other words, nothing you’ll find interesting.

One of the commenters on my post, however, was leaguemate Chris Liss of RotoWire. One of his comments in particular stuck out, so I wanted to repost it here and then respond:

Dave Cameron had a good post about randomness in which he used the clearly random example of the NFC winning 14 straight coin tosses to illustrate that Dan Haren’s .350 BABIP could in fact be dumb luck and not have any cause, e.g., bad mechanics, tipping pitches, etc. that people tend to ascribe in those situations. And that’s entirely true. BUT—it’s also wrong to assume that his .350 BABIP must be dumb luck. It might well be, and it might not be. There could be a problem with his location, mechanics, etc. that partially or entirely explains it.

I think a mistake that a lot of the sabr community makes is to assume that bad pitcher BABIP is always bad luck, or bad HR/FB rate is always bad luck. Sometimes, there is something wrong.

In fact, Todd Zola sent me BABIP data by count, and BABIP goes up reliably as the count gets more hitter-favorable: like .315 on 3-0, and .285 on 0-2. It’s .305 on the first pitch. So let’s say a guy like Haren (or Aaron Harang or Dave Bush) gets a rep as an extreme strike thrower; then batters might swing more often at the first pitch, rather than take a pitch and get behind.

There are probably better examples out there, too—just wanted to point out that not everything that might be luck is in fact luck. The hard part is figuring out which is which.

I’ll absolutely agree with Chris that what may look like bad luck can—sometimes—be a legitimate problem. What often goes unnoticed is that part of what we “sabr” folk consider statistical regression to the mean is, in actuality, players making adjustments. If a player truly is tipping his pitches, he’s either going to be out of the big leagues before we can see him “regress” or he’s going to make the necessary adjustments to stick around long enough for us to actually see him “regress.” Of course, that’s not all that regression to the mean is—part of it truly is just mere statistical theory in the mold of Dave Cameron’s coin flip analogy — but it is definitely a part.
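
To put a number on the coin-toss analogy, here is a minimal Python sketch. It assumes a fair coin and independent tosses (assumptions of the illustration, not claims about how the tosses were actually conducted) and computes the chance that one pre-specified side wins 14 in a row.

```python
from fractions import Fraction

# Assumption for illustration: a fair coin, with each toss independent.
P_WIN = Fraction(1, 2)
STREAK = 14

# Probability that one pre-specified side wins every one of the 14 tosses.
p_streak = P_WIN ** STREAK

print(f"1 in {p_streak.denominator}")   # exact odds
print(f"{float(p_streak):.6f}")         # the same value as a decimal
```

Any single streak like this is very unlikely, but with enough streaks to look at, something this improbable will turn up somewhere without needing a cause.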

The Haren example

In Chris’ comment, he says, “I think a mistake that a lot of the sabr community makes is to assume that bad pitcher BABIP is always bad luck, or bad HR/FB rate is always bad luck. Sometimes, there is something wrong.” While this is absolutely true, Chris has made it seem (at least to me) that the instance where “something is wrong” comes along more often than it actually does, or more often than we’re truly able to identify it.

Perhaps he’s just taking this stance because he perceives the guys on the other side of the argument to hold the polar opposite view, and his actual views are more balanced, but I think, in this instance, Chris is overstating how often the “there is something wrong” scenario actually occurs. To continue using Haren as our example, let’s lay out what we know:

Haren has posted a .355 BABIP to this point in 2010
With a career BABIP of .305 over 1,400 innings, his 2010 figure is very abnormal
It takes roughly six full seasons for BABIP to normalize

Given these known facts—six full seasons!—you’re going to need to show me some very compelling evidence that Haren will not regress. That’s not to say that the evidence doesn’t exist, just that if we’re not buying into a Haren regression, we’re either being extremely foolish or we have some very convincing evidence at our fingertips. It’s entirely possible he’s tipping pitches or is having trouble with his mechanics, but the odds of him regressing are simply too great to ignore if we don’t have proof to the contrary.
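
As a rough check on how far chance alone can push a BABIP, here is a minimal Python sketch using an exact binomial tail. The .355 and .305 figures come from the list above; the count of 400 balls in play is a hypothetical round number chosen for illustration, and treating every ball in play as an independent trial with the same hit probability is a simplifying assumption.

```python
from math import comb

def prob_babip_at_least(observed, true_babip, balls_in_play):
    """Chance that a pitcher with the given true-talent BABIP allows a BABIP
    of about `observed` or higher over `balls_in_play` balls in play,
    treating every ball in play as an independent trial."""
    min_hits = round(observed * balls_in_play)
    return sum(
        comb(balls_in_play, hits)
        * true_babip ** hits
        * (1 - true_babip) ** (balls_in_play - hits)
        for hits in range(min_hits, balls_in_play + 1)
    )

# From the article: .355 so far in 2010, .305 for his career.
# Assumption: roughly 400 balls in play to date (hypothetical round number).
p = prob_babip_at_least(0.355, 0.305, 400)
print(f"{p:.3f}")
```

The result is small for any one pitcher, though not so small that it never happens across a league full of starters. That is why a single high BABIP, by itself, says little about whether something is actually wrong.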

The one other thing we need to consider is that if Haren is indeed tipping pitches or struggling with his mechanics, it’s highly unlikely that it would only manifest itself in his BABIP. Analysts will often say that BABIP is all luck, but that’s not really the case. If we were to put a Little Leaguer on the Diamondbacks and allow him to throw 200 neutral-luck innings, I guarantee you he’s posting a BABIP above .500. BABIP is in large part luck only when the pitcher in question is a bona-fide big leaguer.

In the case of the Little Leaguer, that .500 BABIP is going to come along with an 0.0001 K/9 and a 15.0 BB/9. He’s not a legitimate big leaguer, so a high BABIP is expected. Haren, though, is posting monster strikeout and walk numbers. Guys who post monster peripherals don’t consistently have high BABIPs. It just doesn’t happen. I defy you to show me one example in the history of baseball of a pitcher with a 9-plus K/9 and sub-2 BB/9 but whose BABIP stayed over .350 in the long run.

And even if we’re only talking about the short term here, if Haren’s high BABIP is a result of tipping pitches or doing something that bona-fide big leaguers can’t get away with, it’s highly unlikely that he’d also have peripherals worthy of a 3.32 xFIP, because those problems would affect his other numbers too! While it might be “wrong to assume that his .350 BABIP must be dumb luck,” it’s highly, highly probable that it is dumb luck, unless you can show me evidence that it isn’t.

More musings on luck and randomness

This ties in with another of Chris’ comments on the CR post:

I will take issue with one premise though that I think is not entirely true—when your players play worse or better than they have historically that is not bad luck… it seems like people are alleging that buying a breakout player is dumb luck. It’s not. Maybe you couldn’t predict the extent to which he’d break out, but for example, as loathsome as it is for me to give Eric any credit, he deserves it for rostering Josh Hamilton. And he’s entitled to whatever massive numbers Hamilton puts up even if he didn’t specifically foresee them because that was part of the bargain he made when he bought him—that possibility.

Without getting too heavily into this (I disagree that players over or underperforming projections is completely independent of chance), I wanted to delve just a bit into distinguishing when we are truly predicting breakouts and when we’re merely getting lucky—and deciphering one from the other is no easy task.

I think fantasy analysts (and I’m implying no one in particular here) sometimes fall into a confirmation bias trap of seeing their breakout picks pan out and automatically calling it a success, even if the original analysis supporting the pick was shoddy.

While I’m picking on Chris (kidding; I’m not really picking on Chris), one example of a breakout player that jumps to mind is Ricky Romero, whom Chris drafted and whose success he has trumpeted. Not to imply that the analysis was “shoddy” here (I don’t know what Chris’ analytical process with Romero was), but if we’re going to take credit for predicting Ricky Romero’s breakout, I think we need to make it clear why we thought he would break out. And it needs to be more than just “the ground ball rate last year really jumped out at me.” Dana Eveland had a better GB percentage than Romero last year, but he hasn’t broken out (quite the opposite, actually).

Again, this isn’t meant to be a shot at Chris in the slightest. I’ve made it clear in the past that I have a lot of respect for Chris, and he is the one winning the CR league right now. I’m quite sure there was more to it with Romero than just “he has a good groundball rate.” I would be interested in hearing about it, though.

My point is that I think we, as fantasy analysts, should be held accountable for our analysis and predictions. Or at the very least, we should need to explain our reasoning if we take credit for predicting a breakout.

Radio appearance

For those interested, I’ll be appearing on RotoWire’s radio show today at 11:30 am EST to talk with Chris Liss about these sorts of things.

31 Comments

Mike Podhorzer

13 years ago

I love the “we should need to explain our reasoning if we take credit for predicting a breakout” line. I’ve always felt this way as well. Tell me why you expected a player to perform the way he has that proves you “right” and then I will determine if you deserve credit or not based on your answer.

You thought Josh Hamilton would not only rebound, but have the best season of his short career so far, because his swing is too sweet not to? Buzz, you lose.

Mike Podhorzer

13 years ago

Another problem with the non-stats guys or “balanced” guys is they don’t seem to realize that there actually are stats available for many of the things they mention.

Chris brings up Haren and some other pitchers possibly throwing so many first pitch strikes that get swung at and are inflating their BABIP. Isn’t that easily checkable using PitchF/X and gasp…stats?? Instead of endless speculating, once again we could check the facts.

eric kesselman

13 years ago

I don’t think anyone’s suggesting breakouts are “completely independent of chance.” Very little is. I think the point (and perhaps people are being a bit too prone to jumping to extremes or accusing the other side of jumping to extremes) is that there is some skill involved here.

Next, sadly, I’m going to make a Lissian quant/quaint type objection. I’m all for using stats and having quantifiable reasons for our beliefs. However, sometimes we have strong intuitive senses about player breakouts. This feeling may not occur often, but when it does, it is extremely accurate. I’ll spare you the anecdotes, or list of players I’ve had strong feelings about, but my point is that I think it’s unfair to throw out anything that isn’t backed up by a justifiable or quantifiable change or rationale. We watch a lot of baseball, and even when I see a game where I’m aware something just gave me that feeling, I can’t always tell you what exactly it was. But I think it would be a mistake not to act on it.

I often feel the same way in poker, where I get a very strong feeling about my opponent’s hand, and I can’t tell you exactly why.

Where I tend to part ways with Chris is on pricing this feeling. I’d like to say ‘So if I’m right, what should I bid/bet?’ and that is the point where I think we can use quantifiable techniques to come up with a valuable play that is still based on my intuition.

Derek Carty

13 years ago

Agreed, Mike. I checked into that today, actually http://www.hardballtimes.com/main/fantasy/article/how-much-do-counts-affect-babip/

Eric,
TRAITOR!

Chris Liss

13 years ago

I liked Romero’s robust ground ball rate in a tough division with lots of power hitters, but obviously, I was aware of his very solid K rate, that he was in a major growth phase (second full season in the bigs), was a good prospect, etc. But the ground ball rate to me jumped out as an insurance policy of sorts, and sure enough, he’s allowed just 8 HR so far this year. But it’s not just stats as Eric says – it’s a feeling about a player from watching him, from tracking his games, from seeing what he did when he was on. I’d almost always rather have an inconsistent pitcher or one that fell off down the stretch after being really good, than a middling pitcher with similar stats when it comes to the end-game. But it’s all context-specific – if you try to form a rule from what I thought about Romero, it will probably steer you wrong.

And I realize there are first pitch swing stats for Haren, but you’re missing the point. I’m not saying that was the reason, just that it could be a reason in SOME cases that is other than pure randomness. I also wrote there are far better examples than that. I’m not even arguing that Haren’s bad cosmetic stats aren’t dumb luck – they very well might be. But if you want to buy Haren from me at his “luck-neutral” value, I’m selling. Because there might be something going on. I’ll also be letting you roster Aaron Harang whose xFIP is consistently good, but who has destroyed fantasy teams the last several years. Maybe it’s just bad luck – who knows? – but I’ll let you make that bet.

My feeling is that while the .300 expected BABIP hypothesis is correct generally (as is the 10 FB/HR one), it varies among particular pitchers, and at particular times for reasons other than luck we might or might not grasp. That the truth is more complex than “everyone regresses to .300 over the long haul,” and not to get wedded to one side of the debate, but try to take it case by case and keep the possibilities open. If I’m in a novice league, and someone’s selling Haren based on his cosmetic stats, sure I’m buying. But I don’t play in novice leagues, and in my experience, the money to be made is on the other side of the trade in my circles – selling Haren at his luck-neutral price to the regression zealots.

eric kesselman

13 years ago

Yeah, I wasn’t happy about it either.

Also, I’m not at all suggesting you can’t make meaningful conclusions entirely based on numbers, reasons, or analysis. I’m just saying we should ALSO be relying on our intuitive sense when we have a strong feeling.

Derek Carty

13 years ago

Thanks for the reply, Chris. I know you said there were better examples than count-based BABIP, but since that was the one you gave, I thought it’d be interesting to check it out. If you have better examples that you think would be testable, I’d be happy to run a study on them.

I also have a follow-up question for you that I’m curious to hear your response to. Do you feel the same way about “lucky” BABIPs as you do about “unlucky” BABIPs? That is, if you believe it’s possible for a guy like Haren’s high BABIP to be legit, do you consider it equally possible for a guy like David Price’s low BABIP to be legit?

And if you were the guy to draft, say, David Price this year, should you get credit for it, even though his success has largely been “luck”-driven?

Derek Ambrosino

13 years ago

I think I’ll write a bit more in depth about the idea of “taking credit” for predicting a break out for Wednesday’s column, but in a nutshell, it’s very tricky.

Branch Rickey famously said that luck is the residue of design.

Now, one way to interpret that in the context of this discussion is in relation to players’ seemingly lucky or unlucky performances. But, the other way of looking at this is from a fantasy GM point of view. To win, you need some luck, some legit break outs, so to speak, and some smoke and mirrors. Making wise decisions and reacting to what’s going on during the season put you in a better position to be the beneficiary of luck.

So, it really comes down to somebody’s body of work. If you’re out there making your opinions known, you can quite likely defend every one of your decisions reasonably well. So, I’m not totally sure we should be looking at being able to soundly claim credit for a prediction as being the barometer here. I’d focus instead on the overall body of work, figuring that if you play enough and know what you’re doing, you’ll get plenty of “illegitimate” booms and busts.

Further, often times we get caught up in strategy (because it can be discussed on a meta level), but execution was equally important. For example, let’s take Jose Bautista. Regardless of whether his break out is or is not legit, it seems like holding on to Jose Bautista was the proper move. But, could selling high also have been smart? Sure, it could have been – depends on what you sold him for. And, even if you held him, that may not have been the best move either depending on the context of your team.

Nick Steiner

13 years ago

Fantastic article Derek, you hit the nail on the head with Haren. The same thing happened with Smoltz last year – people readily accepted his BABIP as a reflection of his skill because they had a good explanation (he’s old, has lost his stuff, etc). But guys simply don’t strike out 8 per 9 and walk 1.5 per 9 in the AL East if they are old and lost their stuff (well Smoltz was old and lost his stuff, but he was still damned good, as evidenced by his time in the NL).

I think the other thing people miss is that a player’s single season BABIP simply isn’t a reflection of how well he’s pitched. That doesn’t mean that pitchers don’t have legitimate skills and vices that should allow them to post higher or lower BABIPs – it means that those skills are not manifested in the player’s actual BABIP because of the amount of luck that drives a player’s BABIP.

For example, Haren could be the worst BABIP pitcher in the game and could still easily luck into a .270 BABIP. Or he could be the best and could easily luck into a .350 BABIP.

At any rate, good job on the article again. Reading the comments to that FanGraphs post made my head hurt.

aweb

13 years ago

6 years to normalize BABIP is clearly far too long to take it seriously, if that is the case. 6 years is a long, long time for a pitcher – velocity drops, movement might change, new pitches get learned, defenses change, injuries happen. How is this 6 year figure determined? 6 full seasons is a pretty good career.

“I defy you to show me one example in the history of baseball of a pitcher with a 9-plus K/9 and sub-2 BB/9 but whose BABIP stayed over .350 in the long run.” – this statement is set up to be impossible, since pitchers who give up hits that often don’t last long. My favourite example of someone who had good K and BB numbers but stunk – http://www.baseball-reference.com/players/n/nakammi01.shtml
Long story short – he threw a breaking ball that was either devastating, or it hung and the batters killed it. Small sample size, obviously, but the minors and Japan had trouble with him, but MLB hitters just don’t miss the hangers enough (or hit singles off them as much). But he couldn’t have a long career, even if he could strikeout hitters without walking them. His xFIP was fine (around 4.00), but more HRs than BBs seemed likely to continue (as an aside, what’s the longest career where that happened?).

Oscar Heller

13 years ago

Excellent article in general, and the part about taking credit for breakout seasons deserves an article(s) of its own.

The point I really want to drive home is that in a very real way, results do not matter for evaluating decisions: what player you draft, what trade your favorite team’s GM makes, etc. To evaluate the quality of a decision-maker, the only thing you should look at is the decision-making process, not the result. If a GM signs a pitcher because (and only because) he won 15 games the past year despite a terrible ERA and peripherals, and the pitcher goes on to win the Cy Young, it was a bad decision. If I play roulette on a fair wheel with one or more zeroes, it’s a bad decision, no matter how I do.

If you pick up Luke Scott because you think he’s undervalued and might be a nice bench bat, and he hits 50 HR and carries you to a fantasy win, you don’t deserve credit for it.

Results only matter insofar as they bear on process. If you sign Mike Gonzalez without giving him an MRI (Orioles fan here) and then he gets injured, that result gives insight to flaws in the process. But if Josh Hamilton has a career year for your fantasy team, that result means nothing unless you had (past tense) both predicted it and had sound reasons for it.

The final point is that, in a game as inherently random as baseball, looking at results is sometimes actively BAD. Often, when looking at a front office, the process is opaque and we have no insight into why a particular move was made, and on what information. And in some cases, judging those moves on results not only adds no relevant information for judging the true talent of a GM, but actually gives BAD information.

A player’s season is like rolling a die. It can only end one way, and when it does end with a defined result, it obscures the fact that the season was still in one very real way a random result. If I roll a die once, it can’t accurately communicate its full range of outcomes the way a hundred rolls would. Just because Player X has been terrible this season doesn’t mean that Player X was definitely going to be terrible this season (although in a way he was ). Even though the die I rolled came up with a certain result, that doesn’t change the fact that the die roll was still random. And the decisions a GM makes regarding roster construction are die rolls (with asymmetric/unknown probabilities) that can only be rolled once. All a GM can do, and all we can judge them on, is how good the information they gathered was, and how good their decision-making process was with said information. And the necessary data for this kind of analysis is very difficult to find, but that doesn’t mean turning to results is an acceptable substitute – in fact, it is useless and misleading.

The final thing to consider, in this paradigm of roster construction as random events, is that the sample size of GM moves is appallingly small. I’m surprised that nobody’s really picked up on the fact that GMs are judged on a handful of moves (I can’t think of a GM who’s gotten out of the double digits in terms of meaningful player moves made) they make with a) imperfect information and b) inherent randomness. Would you judge a gambler’s skill based on 15 hands of poker, or a baseball player’s true talent on 15 (or 150) plate appearances? Then don’t pretend to have a bead on the quality of a GM based on 15 major roster moves – and the ONLY ones you can judge are the ones where you have insight into the process.

Results don’t matter!

matt

13 years ago

I think one of the problems is that a lot of people in the sabr community tend to be falling into the Duellish mindset of “Everything that can be invented has been invented.” Now, the real smart thinkers continue to push the stats forward, bettering them regularly, but many people seem to look at something like xFIP and say, hey, it’s better than ERA, it must be perfect.
I think Chris is merely acknowledging the fact that we don’t know everything, everything hasn’t been invented, so if Pangloss is going to give me what player x is worth in the best of all possible worlds, I’m going to take it, because there’s a chance we don’t live in that world.

I’m a firm believer that people are too ready to rest on their laurels and blame all we can’t account for as luck. Granted, luck plays a big role in baseball, any player will admit that, but I also believe the old saying that luck is what happens when preparation meets opportunity.

Blair

13 years ago

For interest sake:

For the top 700 SP in IP (in history), I ran the correlations for K% : BABIP, BB% : BABIP, and K/BB : BABIP.

There is a minor positive correlation between K% : BABIP, and a larger negative correlation between BB% : BABIP. The largest correlation (.3) is between K/BB : BABIP.

What does this tell us?

Pitchers that have great K/BB rates have a higher expected BABIP, and are less likely to regress to “league average”.

So, yes, Haren should regress, but when we see pitchers who spike in K/BB & BABIP simultaneously, we can’t attribute the BABIP to “chance” alone.

Nick Steiner

13 years ago

Blair – the correlation by itself doesn’t mean anything. The slope (and p-value) are more important. Can you please post those numbers?
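
To illustrate why the slope matters alongside the correlation, here is a minimal Python sketch on made-up numbers (these are not Blair’s data, and the variable names are hypothetical). Two invented BABIP series have exactly the same correlation with K/BB, yet one moves ten times as much per unit of K/BB as the other. The p-value is left out because it needs a t-distribution that the standard library does not provide.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up illustration data, not real pitchers.
k_per_bb = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
babip_a = [0.288, 0.296, 0.291, 0.303, 0.299, 0.310]
# Same pattern, but every deviation from .300 shrunk to a tenth of its size.
babip_b = [0.300 + (b - 0.300) / 10 for b in babip_a]

r_a, r_b = pearson_r(k_per_bb, babip_a), pearson_r(k_per_bb, babip_b)
slope_a, slope_b = ols_slope(k_per_bb, babip_a), ols_slope(k_per_bb, babip_b)
print(f"r: {r_a:.3f} vs {r_b:.3f}")
print(f"slope: {slope_a:.5f} vs {slope_b:.5f}")
```

The correlation says how tightly the points follow a line; the slope says how many points of BABIP a given change in K/BB is actually worth.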

Matt

13 years ago

@Oscar,

But results do matter, besides in the obvious real way, they matter to the decision making process. Unless we know everything (and if there’s ever a time where it’s safe to use never it’s to say that we’ll never know everything) we can’t be sure that our decision making process is in fact sound.

If I pick tails in a coin toss and lose 14 times in a row, that’s a weird coincidence. If I pick tails and lose 114 times in a row, I need to be aware of that and maybe pick heads next time because maybe there are some forces at work that I don’t know about.
In an isolated event (or SSS), if your process appears sound and the results were negative, you can say it’s bad luck, but we learn whether our process is sound or not by weighing it against results. That’s the whole purpose of the scientific method they teach you when you’re a kid. You learn from results, they aren’t meaningless.

Derek Carty

13 years ago

Matt,
I was just having this conversation with Eric Kesselman. Results are important, but – as you imply – more in a macro-perspective where we can look back and see a large sample of what processes produced what results. In that respect, results are extremely important. But the result of a single event, which a lot of people tend to focus on and make judgments off of, is relatively meaningless.

eric kesselman

13 years ago

I agree with most of the above. Results DO matter, but you need the right sample size. I think people might be a bit too hasty to use short term results to prove something, and I also think people might be too dismissive of some results, claiming we ‘know’ where the guy’s value really is.

Question: How are we to deal with results over short sample sizes? Having no experience with say, football, how would someone use football stats to say anything meaningful?

Tim

13 years ago

I find the whole results v. evaluative process discussion to be extremely interesting. And an important piece of this discussion, to my mind, is draft picks. In any sport, GMs are routinely harassed for making “poor picks.” It’s important to look at those picks without the convenience of 20/20 hindsight. Greg Oden, at the time he was drafted, was considered to be a potential franchise center. Durant was more of a wild card. We can’t criticize the Portland front office for what was, at the time, very likely the right pick. (Sorry for the non-baseball reference; the draft just has a higher profile in other sports)

Chris Liss

13 years ago

*A player’s season is like rolling a die. It can only end one way, and when it does end with a defined result, it obscures the fact that the season was still in one very real way a random result. If I roll a die once, it can’t accurately communicate its full range of outcomes the way a hundred rolls would.*

To me this is a fundamental misunderstanding that many in the sabr community have. A player’s season is NOT like rolling a die. A player isn’t an All-Star baseball or stratomatic card with certain defined probabilities in each at-bat or each season. Players are alive, so they change. Dice do not change. Playing cards do not change.

Is Jose Bautista just lucky this year? Or has he changed? Is he the 18 HR guy, and every spin of the wheel keeps shockingly landing in the HR area? Or is he now a 35 or 40 HR guy?

YOU CANNOT KNOW for sure because unlike with a die, you can’t roll it 10,000 times. Each player is unique, and each season’s circumstances are unique. Yes, we can gather *SOME* information based on his history (though that would have failed us in Bautista’s case) and yes, we know things about players generally because there have been a lot of them, and they tend to do certain things like peak near 27 and decline appreciably in their late 30s. (But that wouldn’t have helped us with Bautista, either).

And the Haren example is the simplest one: K:BB intact, BABIP not. Even then, I don’t presume to know that it’s pure luck, and I agree that the six-year argument for BABIP is crazy as no pitcher is the same guy for six years straight besides maybe Rivera and Halladay.

When we consider credit for breakouts, I’m not talking so much about better BABIP luck – any semi-literate baseball analyst can predict regression in either direction. I’m talking about skills growth. And not just a little growth due to nearing peak age and getting experience, but a true breakout where a leap is taken: Romero, Liriano (granted he was at this level a few years ago), Bautista, even David Price (just because he’s lucky doesn’t mean he’s not also good).

Let’s also not presume to know what we don’t. Luck is involved, and it’s pretty obvious we should expect *some* positive regression in Haren’s case. But I’ll remain agnostic as to whether it’s entirely bad luck, or whether there’s a repeatable cause. It might be 90 percent bad luck, 10 percent something else, or it could be 100 percent bad luck. I don’t know. And I’m going to value him slightly less based on that uncertainty.

eric kesselman

13 years ago

The point of the quote is merely that we care about the expectation of results, not the actual results. That’s pretty hard to disagree with, and I don’t think you do.

Your point (which you’ve made before in other forums) is just that we can’t KNOW in baseball what the expectation of results are. Unlike a die roll, a poker hand, or a strato simulation, we can’t just mathematically solve it for the real answer because we don’t have perfect information in baseball.

Instead, we are trying to infer who players ‘really’ are, and this often involves heavily relying on the results they’re actually generating. While there’s a ton of variance in there, we all know that there’s a correlation in there too. So we use results to try to figure out what we can meaningfully say about a player. The questions become: which results? over what time periods? How strong a statement can I make, and with what certainty?

Additionally you then complain that players’ skill sets change, and that complicates our analysis. We can’t separate variance from a possible skill set change. Is Jeter 2009 the ‘real’ Jeter? Or is Jeter 2010? Or were they both the ‘real’ Jeter, and he’s just declined between? Who can know?

I understand this is a tricky problem, but I always feel like you want to just shrug your shoulders, say we can’t ever know perfectly, and therefore the inquiry isn’t worthwhile. I don’t agree.

Derek Carty

13 years ago

Agreed with what Eric says, Chris. Additionally, you say “A player’s season is NOT like rolling a die,” but in many ways, it is. Even if we were to know with absolute certainty that Player X has true talent Y, we’re not going to say that there is a 100% chance he plays at Y level, because he won’t. Even if we know true talent with absolute certainty, there is still going to be variation around Y, because we’re looking at a finite sample size of 600 AB. That’s simply how the world works.
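
As a minimal sketch of that variation, the Python below assumes a hitter whose true talent is known exactly to be .300 (a hypothetical figure) and treats each of 600 AB as an independent trial with that hit probability, which is a simplification. It computes the binomial standard deviation of his batting average and then simulates 1,000 such seasons.

```python
import random
from math import sqrt

TRUE_AVG = 0.300   # hypothetical true talent, assumed known with certainty
AT_BATS = 600

# Standard deviation of a season batting average under a binomial model.
sd = sqrt(TRUE_AVG * (1 - TRUE_AVG) / AT_BATS)

def simulate_season(rng):
    """Batting average over one simulated 600 AB season."""
    hits = sum(rng.random() < TRUE_AVG for _ in range(AT_BATS))
    return hits / AT_BATS

rng = random.Random(2010)  # fixed seed so the run is repeatable
seasons = [simulate_season(rng) for _ in range(1000)]

print(f"binomial SD: {sd:.4f}")
print(f"simulated range: {min(seasons):.3f} to {max(seasons):.3f}")
```

With these inputs the standard deviation is about 19 points of batting average, so two standard deviations either side spans roughly .263 to .337 for a hitter whose talent never changed.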

The reason a lot of analysts have gotten into the habit of saying that we can only ever reach “70% accuracy” is because once you reach that point, the remaining “30%” is random statistical variation, and there’s absolutely nothing you can do about that.

As to the 6-year thing, sure, players don’t stay the same over a 6-year period, but that’s not the argument. The argument is merely that BABIP has so much random variation that it takes 6 years’ worth of data for us to observe it stabilizing. If you want to apply your argument to the study itself, then say it takes 5 years or 4 years. It’s still a very long time. The 6-year thing isn’t saying that all pitchers stay the same; it’s a way to quantify the amount of regression necessary to include in any reasonable forecast.

You say you’re going to value Haren “slightly less based on” the possibility of his BABIP not being bad luck (and that’s reasonable), but isn’t that what regression does? Only more precisely? Regression says “Okay, BABIP has a lot of random variation in it, even if we know 100% what a pitcher’s true talent BABIP is. Haren’s BABIP is bad this year, but unless you tell me he’s tipping his pitches and therefore has a true-talent .340 BABIP, I’m going to regress his .355 BABIP to league average .305 (or maybe a little more precisely to a “good pitcher” BABIP of .295 or .300). And because BABIP in general is so unstable, I’m going to regress most of the way. Not all the way, so I can account for the possibility that it’s not bad luck, but most of the way because, in all likelihood, it is.” Isn’t that kind of what you’re doing, Chris, even if you haven’t thought about it in that way?
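
One common way to write down “regress most of the way” is a weighted average of the observed rate and the mean, with the weight set by sample size. The Python below is a minimal sketch of that idea, not the method behind any particular projection system. The .355 and .305 figures are from this discussion; the 400 balls in play and the 3,600-ball stabilization size (about six seasons at a hypothetical 600 balls in play each) are assumptions for illustration, as is reading “six seasons to normalize” as the point where the observed rate and the mean get equal weight.

```python
def regress_to_mean(observed, mean, sample_size, stabilization_size):
    """Shrink an observed rate toward the mean.

    The observed rate gets weight n / (n + k), where n is the sample size
    and k is the sample size at which observed and mean count equally.
    """
    weight = sample_size / (sample_size + stabilization_size)
    return mean + weight * (observed - mean)

# Assumptions: 400 balls in play so far, k = 3600 balls in play.
estimate = regress_to_mean(
    observed=0.355,
    mean=0.305,
    sample_size=400,
    stabilization_size=3600,
)
print(f"{estimate:.3f}")
```

Under these assumptions the observed BABIP gets only a tenth of the weight, so the estimate lands at about .310: most of the way back to the mean, but not all of it, which leaves room for the possibility that something real is going on.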

Chris Liss

13 years ago

Yes, that’s what I’m doing – I’m aware that Haren’s BABIP is probably bad luck for the most part, but leaving open a small sliver of possibility that it’s the result of a decline in some skill. But Haren’s kind of an easy case. What about Aaron Harang or Dave Bush – I’m leaving open a much larger possibility that those guys aren’t just unlucky but actually bad. How much do you discount those guys off their xFIP?

Also, can we please dispense with this kind of garbage:

*Even if we know true talent with absolute certainty, there is still going to be variation around Y, because we’re looking at a finite sample size of 600 AB. That’s simply how the world works*

Do you really think that’s even up for debate – that variance exists over a 600 AB sample?

The question is the extent to which you can know a player’s actual talent level. My hypothesis is that you can know his previous talent level with a fairly high degree of accuracy. And that generally one’s previous talent level correlates highly with one’s present one.

BUT NOT ALWAYS. And the extent to which it doesn’t among the various players is where the rubber meets the road here. It’s 90 percent of the game.

All this other stuff you guys are talking about is beyond obvious. Sorry to be annoyed, but seriously, do you think I’m retarded or something?

Chris Liss

13 years ago

*I understand this is a tricky problem, but I always feel like you want to just shrug your shoulders, say we can’t ever know perfectly, and therefore the inquiry isn’t worthwhile. I don’t agree.*

Seriously, Eric – I’m shrugging my shoulders? No, I have my own way of determining whether I think a player will deviate strongly from his previously demonstrated skills. The quants treat breakouts of that sort as their own kind of variance. They just think I was lucky to get Ricky Romero. I think there’s more to it than that. In fact, I’d argue that they shrug their shoulders and roster a bunch of guys who could break out with no particular preference, but I actually target particular ones and care which ones I get for the most part.

eric kesselman

13 years ago

While I often feel you skim my replies, you didn’t miss that. I don’t want to be forced to defend anyone else, or their techniques. That being said, I don’t think your characterizations are fair. The quants are perfectly capable of taking shots at breakout candidates, and using methods similar to yours IN ADDITION to theirs. You keep trying to force them into one box. Bill and Robert for example had Feliz, Avila, Scherzer, Matuz, Wade Davis on their auction roster. I don’t think those players were picked because of past performance. Now maybe you are better at picking these kinds of guys, and that’s a big skill. But you aren’t the only one trying.

And the question that is most valuable to discussion is ‘how should we be trying to do this?’ I’m very satisfied when you give answers like your Romero one. I’d love to see more of that kind of stuff. Maybe the best you (or I) can say is ‘it’s a case by case basis, and I just know it when I see it’ but that just doesn’t advance the discussion anywhere. A lot of our dispute comes down to my effort to push to quantify and specify, which I feel you often resist with arguments like ‘a human being is not a die.’ But personally we really aren’t that far apart (and I plan on narrowing it even more in the standings!), and I’ve said that before too.

I don’t think the answer is just to go and play a bunch of fantasy leagues and try to do it, any more than the best way to get good at poker is just to go and play a bunch of poker. Experience/practice is a big part, sure, but so is studying and research. In baseball we’re still trying to figure out exactly what to study, and that’s where the discussion should be.

eric kesselman

13 years ago

I’m sorry you’re annoyed, but you singled out a quote whose whole point was about focusing on expectation, and not results. I don’t think it’s unfair when you get stuff back about basic expectation and variance.

Now I admit your point wasn’t really to counter that principle (as I noted in my last post), but I do feel that you tend to present your objections more in a ‘We can’t know the truth’ vein rather than in a ‘this is a tricky problem- how can we approach it?’ That’s all I mean by my ‘shrugging the shoulders’ comment.

Anyway, here I agree with you 100%: “The question is the extent to which you can know a player’s actual talent level. My hypothesis is that you can know his previous talent level with a fairly high degree of accuracy. And that generally one’s previous talent level correlates highly with one’s present one.

BUT NOT ALWAYS.”

And I agree that figuring out these candidates in advance is a huge part of the game (and you might be better at it), although I think pricing the ‘previous talent level’ is also a huge part of the game, and I disagree that prices are quite as market efficient as (I think) you do.

Chris Liss

13 years ago

And by the way, the post I cited was arguing that results don’t matter, only the odds going in. Which of course presumes that it’s like roulette or poker. But it’s not, and results are one of the ways we determine how many sides the die had, or how many zeroes were on the roulette wheel in baseball. Because we can’t simulate infinitely many results and find the true odds, the results we have are important. They would only not be important if somehow we knew the odds without them. When Bautista hits 40+ HR this year, that says something about what his baseline was before this year. Which we had no way of knowing without the results. But if someone looked at his swing, and what he did last September and projected him for 30 HR, that person would deserve credit. But only because we know the result!
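The idea that results tell us "how many sides the die had" can be sketched as a small Bayesian update. Everything here is a toy assumption: the candidate dice, the equal starting belief in each, and the made-up rolls. It only illustrates how observed results shift belief about unknown odds, not how anyone actually projects a hitter.

```python
from fractions import Fraction


def posterior_over_sides(rolls, candidate_sides):
    """Belief in each candidate die after seeing `rolls`, starting from
    equal belief in every candidate (uniform prior, fair dice assumed)."""
    weights = {}
    for sides in candidate_sides:
        if max(rolls) > sides:
            # This die cannot produce the largest roll we saw.
            weights[sides] = Fraction(0)
        else:
            # Each roll has probability 1/sides on a fair die.
            weights[sides] = Fraction(1, sides) ** len(rolls)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate die can explain the rolls")
    return {sides: weight / total for sides, weight in weights.items()}


# Hypothetical results: five rolls, three candidate dice.
rolls = [3, 1, 6, 2, 5]
posterior = posterior_over_sides(rolls, [6, 8, 12])

for sides, belief in posterior.items():
    print(f"{sides}-sided die: {float(belief):.3f}")
```

With no rolls higher than 6, belief piles up on the 6-sided die; a single 7 would rule it out entirely, which is the sense in which one surprising result can say something about what the baseline was all along.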

Chris Liss

13 years ago

Also

*but I do feel that you tend to present your objections more in a ‘We can’t know the truth’ vein rather than in a ‘this is a tricky problem- how can we approach it?’ That’s all I mean by my ‘shrugging the shoulders’ comment.*

Isn’t exactly the opposite true? I actually try to figure out how to solve the problem of players having unpredictable breakouts, whereas it’s the quants who chalk it up to variance. They’ve given up! So instead they work at the margins, trying to squeeze more efficiency out of the pricing of last year’s stats.

They’ve said: it’s too hard to reliably predict which players will break out, so we’ll remain agnostic and give up. Instead, we’ll focus on the known – last year’s stats (or this year’s projected ones which are just an adjustment of the past ones) and what they’re worth.

They’re shrugging their shoulders and giving up, while I’m actually pressing on and looking for ways to solve for this very tricky problem.

How is that not transparently clear?

eric kesselman

13 years ago

If you read what I wrote a few comments up, I think you’ll find we’re largely on the same page regarding results.

I wrote: “Instead, we are trying to infer who players ‘really’ are, and this often involves heavily relying on the results they’re actually generating. While there’s a ton of variance in there, we all know that there’s a correlation in there too. So we use results to try to figure out what we can meaningfully say about a player. The questions become: which results? over what time periods? How strong a statement can I make, and with what certainty?”

So I’m not sure where we’re disagreeing there. I’d also love for you to answer my football question. How do we deal with fantasy football where the sample sizes are always so insanely small? Now how do you make judgments with no meaningful results?

As for the tricky problem of changing baselines, and predicting breakouts- I’d love to hear more detail about how you go about solving the problem. My issues with what you’ve written in the past tend to be because they aren’t quantifiable. My understanding of your process is that you

1) Thoroughly study a bunch of stuff, from scouting reports and observations to statistical stuff.
2) Think about and internalize what you’ve learned.
3) Make a bid based on the auction and in line with your particular strategy.

Not to re-open the whole CR debate, but when you go past step 1 (which I feel you’ve articulated well), you’re kind of this black box. What is going on? How can I replicate it? What can I learn from it? What hypotheses can I test?

Chris Liss

13 years ago

*Bill and Robert for example had Feliz, Avila, Scherzer, Matuz, Wade Davis on their auction roster. I don’t think those players were picked because of past performance.*

Yes they were. Past performance plus historical projected trajectory (they’re not unaware that top prospects do take a leap). But for them to deviate from that would be to make a choice and not to be agnostic. But Bill acknowledged that he was 100 percent agnostic and didn’t even see the other side of the argument.

How can an “agnostic” have a feeling about a particular player above and beyond his expected career trajectory? He can only roster that player if he feels the market has accidentally discounted that player’s expected career arc.

Unless you think agnostic/genius is merely a semantic debate and we’re all doing the same thing – finding bargains. But it seemed like they took a pretty clear stance that they don’t target players.

*I don’t think the answer is just to go and play a bunch of fantasy leagues and try to do it, any more than the best way to get good at poker is just to go and play a bunch of poker. Experience/practice is a big part, sure, but so is studying and research. In baseball we’re still trying to figure out exactly what to study, and that’s where the discussion should be.*

With poker, you can step back and break it down mathematically because the cards have fixed values. With baseball, they are variable and to some real extent unknowable. Or they’re knowable to a point. To me there are two ways to proceed:
(1) Push the knowable another couple percent: really dissect how much a HR is worth in a given context (which is also somewhat unknowable in advance; historical stacking of categories might have *some* value as a predictive tool, but I’m not really clear about how much weight to give that and whether the margin for error in a given year makes adjusting for it more helpful than harmful), and (2) Dive into the unknown, which is 30 percent of what happens. Clearly (2) is where the real advances will be made and what makes the game worth playing.

How do we approach (2)? I’ve outlined my approach several times. Is it optimal? Probably not. But I’m convinced it’s better than punting and chalking it all up to unknowable variance. You are, too, because you deal with it the same way.

Can we derive some rules? Ron Shandler did a while ago with:

“If a player displays a skill, he owns it.” “Draft skills, not roles,” etc. Lawr Michaels likes to draft 3rd year players – not rookies, not first full season, but the second full season where he’s got that experience under his belt. Personally, I like players who have about 700-1000 career at-bats. That’s often when the light bulb goes on. Shandler’s 1-5 game scores for pitchers – roster ones with a lot of 1s and 5s rather than 2s and 3s.

How do we quantify each of these principles? It depends on the situation. I’m not sure how well it serves us to say someone in the 700-1000 AB sweet spot (if such a thing actually even exists) is worth $3 extra in a standard $260 auction. Maybe it does, and maybe one can quantify all of this into a formula and plug everything through it. But my hunch is that if you tried to do that, something would be lost in translation, and it wouldn’t be effective. But maybe I’m mistaken about that.

My theory is that once you have an eye for the breakout guys, your brain becomes the algorithm, and it’s fairly reliable over time. At least more reliable than just punting on the whole enterprise. But I’m just not that inclined to codify it into a formula. If you are, then I think you need to answer the questions – where do we start? What are some factors that can be quantified in your breakout formula? I gave you some starting points above. What else should we look at? And how do we know how to translate it into specific dollar adjustments?

Chris Liss

13 years ago

First off, do you not acknowledge that the agnostics are punting on the “tricky problem,” not those who actually try to anticipate breakouts for particular players? I must have skimmed over your retraction.

*Not to re-open the whole CR debate, but when you go past step 1 (which I feel you’ve articulated well), you’re kind of this black box. What is going on? How can I replicate it? What can I learn from it? What hypotheses can I test?*

We’ve been over this time and again, but you don’t seem satisfied with my answers even though in practice you obviously are since you play nearly the same way I do (and I’d argue not coincidentally are the only one with a good shot at catching me in CR):

IT’S A CASE BY CASE BASIS! I explained my rationale for Romero above, and also said I wouldn’t draw some kind of rule about it lest it mislead people. Why did you like Josh Hamilton this year? Should one always roster former drug addicts turned All-Star coming off bad years? Is there a Josh Hamilton formula?

Sometimes something jumps out at you. What percent of pitchers get 2.3:1 GB/FB? It’s got to be less than 10, maybe 5? Derek can look it up. And how many of those K 7+ per 9? How many of those did so in their rookie years and in the AL? I’d say 1 – Romero. So if you want a rule – 2.3/1 GB/FB as a rookie, 7+ K/9 in the AL, and still goes for $3 in LABR and CR. I’d say that’s a buy. But there probably won’t be another one like that for a long time. But there will be other combinations of factors that make sense. Liriano was killing it in the Winter League, and his velocity was largely back. He was the best pitcher in baseball three years ago. Good type of player to buy for less than $15. David Price – A+ pedigree and stuff. As long as he stayed healthy, it was a matter of time – and look at the defense behind him. It’s a combination of pedigree, stats, scouting, health, age, experience, team context, park, etc. Obviously. The trick is in weighting these combined factors in the right proportion.

Could these be quantified and systematically aggregated? Maybe. I just don’t know consciously exactly how much weight I give to each. But with Romero the extreme ground ball rate jumped out at me – it was rare. A 7+ K/9 rookie pitcher in the AL East is already rare, but with that GB rate? Why was he so cheap? Didn’t make sense. So I got him everywhere.

It’s not rocket science. There will be other players whose price doesn’t make sense given the skill set, history and situation. Over time you develop your instincts for identifying these types of players. Good scouts do it. Talent scouts do it in the music or movie business, art collectors do it, etc. You develop an “eye” for it.

You want to test the hypothesis? You do it every year in your leagues when you pick the guys YOU think have a much better chance than others think to break out. How has it worked out?

How do you replicate that? You get good at it. Some things humans are still good at. Not everything is best done by a machine.

But regardless of whether I’m right or wrong about this – i.e., whether I actually have this ability or even if I don’t that someone has it – I’m trying to do this. I’m trying to go beyond the mind-numbingly boring world of “past performance is everything, and anything that deviates outside of individual and general historical norms is unpredictable variance.”

eric kesselman

13 years ago

ARggh. Just had a long post eaten.

Let me try to re-create:

I think you make some interesting arguments.

I do think the genius/agnostic thing is a bit confusing at times. I just checked my spreadsheet pre-auction. I had Hamilton at $25.3. I suppose this is at least $5 over the market valuation. I guess that does make me a ‘genius’ on him, although I was ‘agnostic’ in the sense I didn’t target him, and wouldn’t go over my valuation. I just expected to get him given my valuation was likely on the high side. I think you could make a similar argument for Bill/Robert on guys like Jered Weaver. Are they ‘geniuses’ or just ‘agnostics’ with a higher valuation for the player? Is that all it takes to be a ‘genius?’

I see a lot of similarity between your $3 Romero and (for example) their $4 Feliz. I don’t think it’s fair to say you’re doing something different in nature here. I don’t think it’s fair to say ‘they aren’t unaware top prospects leap forward.’ Well sure. That’s the whole question. Which prospects? How far forward? How much do you pay for them? Perhaps the difference is you might be willing to bid $10 for Romero, and they wouldn’t for Feliz.

I’ve only tended to disagree with you in the past because you often seemed to argue against trying to make inroads in these areas, particularly through quantifiable techniques. You really tended to say stuff like ‘your model won’t work because these things are unknowable, or because baselines change’ whereas now you seem a lot more willing to concede it’s a tricky problem but these methods might eventually have some insight. I actually think we’ve reached a lot of consensus in this thread.

As to how to begin or what exactly to study, I have no idea.
