Monday, August 10, 2009
In search of new inefficiencies in the fantasy marketplace
Posted by Derek Carty at 2:00am
Joe Mauer has been the most valuable player in the American League this season, by most accounts. But is he really as good as he's been playing? (Icon/SMI)
Mainstreaming
We've reached the point where UZR/150 is being talked about during Braves broadcasts, HR/FB is being discussed during Diamondbacks broadcasts, and everyone and their mother seems to be using these stats for fantasy, even if they're not always using the best ones available or using them correctly. We're even seeing mainstream sites like CBS and ESPN make mention of once-nerdy stats like BABIP (okay, maybe our continued use of such succinctly-named acronyms as BABIP does still qualify us as nerds).
And judging by a lot of the comments we get here at THT Fantasy, it seems that a good chunk of our readership plays in leagues with owners who are savvy to these more advanced kinds of stats and analysis. There is one concept, however, that not all fantasy owners (or even analysts) appear to fully grasp. And when we notice this sort of thing, it often creates an opportunity for us to get a leg up on our competition.
A possible inefficiency?
What am I talking about? Placing too much emphasis on this year's statistics. I've been seeing this all too often recently, both among my leaguemates and on other sites. I imagine that this is partially due to ignorance and misunderstanding and partially due to human nature.
Most humans are results-driven. Even when we try to use our intellect and stay objective, we often find ourselves looking for ways to explain and justify what has happened, engaging in a form of confirmation bias in the process (for an example of this, see nearly every analyst on earth's reaction to David Ortiz's slow April and May). And because what has just happened is fresh in our minds, we pay more attention to and often place more emphasis on this than on things that have happened months or years in the past. After all, what's happening right now must be truer than what happened last year, right? Wrong. This kind of thinking, I believe, can lead to some faulty conclusions among fantasy owners and analysts.
Why is this thinking incorrect?
My underlying reasoning is based on two things: (1) the importance of utilizing all the information we have about a player and (2) regression to the mean.
Because almost all players of interest have been around for a while, we have more than just data from 2009 to work with. While 2009 data is certainly most relevant, seeing as it's most recent, it is incorrect to ignore the data from previous years entirely. Instead, logically, we should put more emphasis on 2009, less on 2008, even less on 2007, and so on. This isn't a concept many people would intuitively argue with (I don't think, anyway), but in the heat of the season and perhaps out of laziness or lack of sufficient thinking, people will often ignore previous years or at least put too little emphasis on them.
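To make the idea of declining weights concrete, here is a minimal sketch in Python. The 5/4/3 weights and the hitter's season lines are made up for illustration; they are assumptions, not figures from this article or from any particular projection system.

```python
def weighted_rate(seasons, weights):
    """Combine per-season (hits, at_bats) pairs into a single weighted rate.

    seasons: list of (hits, at_bats) tuples, most recent season first.
    weights: one weight per season, most recent first (hypothetical values).
    """
    if len(seasons) != len(weights):
        raise ValueError("need exactly one weight per season")
    weighted_hits = sum(w * hits for (hits, _), w in zip(seasons, weights))
    weighted_at_bats = sum(w * ab for (_, ab), w in zip(seasons, weights))
    return weighted_hits / weighted_at_bats


# Made-up hitter: a hot partial 2009 after two ordinary full seasons.
seasons = [(120, 330), (160, 560), (155, 550)]  # 2009, 2008, 2007
weights = [5, 4, 3]  # assumed weights: newest season counts most

current_only = seasons[0][0] / seasons[0][1]
blended = weighted_rate(seasons, weights)
print(f"2009 only: {current_only:.3f}  weighted: {blended:.3f}")
```

The weighted figure lands between the hot 2009 line and the older seasons, which is the point: 2009 counts the most, but it doesn't get to speak alone.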
My second point is something called regression to the mean, which has received quite a bit of play around the internet recently. MGL put this concept very succinctly a couple weeks ago:
Anyway, no one mentions the obvious so far. Any player who posts a better than average number in any category for one year or for 100 years is EXPECTED to do worse in any other time period you look at, even if that player’s true talent never changes.
This might sound crazy if you're new to the concept (or maybe even if you're not), but it is indisputably true. To elaborate, read this excerpt from David Gassko's piece from last week:
The important thing to remember is that statistics are just a sampling of an athlete’s true ability; actually, they’re less than that since that true ability constantly varies. But even if we forget about that variation, no number of plate appearances will tell us exactly how good that player is. At a trillion plate appearances, we might have to go out many, many decimal points before the player’s sample numbers and our best estimate of his true talent diverge, but eventually they would.
Because all we have to work with is a sample of a player's true ability, there will always, always be a non-zero chance that any player in baseball is no different than any other player. The chances might be one-in-a-million (or more), but there is a very real, non-zero chance that Barry Bonds was no better than Neifi Perez.
What we must also understand is that every stat stabilizes at its own rate. Some stabilize very quickly, while others take several years. If we use this year's data for some stats, we won't go too wrong, but if we use it for others, we could be way, way off. Batting average, for example, is a stat that takes a long time to stabilize. The terrific Pizza Cutter estimated that it takes roughly 1,000 plate appearances for batting average to stabilize.
Let's look at Ichiro Suzuki, who has posted an incredible .365 average in 472 MLB plate appearances in 2009. If this is all we knew about Ichiro and nothing else, we would be most accurate in guessing his batting average going forward by assuming that he is 32 percent likely to continue hitting at his current rate and 68 percent likely to hit at league average. That would leave us with a weighted expectation of just .298. That's how much pull regression to the mean can have if all we do is look at this year's statistics.
And that brings us back to my first point: the importance of utilizing all the information we have on a player. For almost all players, we have several years to look at, and those years tell us that Ichiro is probably not a sub-.300 hitter. We have over 6,000 PAs prior to this year for Ichiro, which tells us quite a bit about him. In fact, by using all of these PAs in our estimation, we'd be safe in assuming a split of 86 percent Ichiro, 14 percent league average. That would give us a batting average estimation of .324, which is not too far off from Ichiro's career .333 average.
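The Ichiro numbers above are consistent with a simple shrinkage rule: weight the player's own average by PA / (PA + stabilization point) and give the rest to league average. The sketch below assumes that rule, a league average of about .266 (back-solved from the .298 figure, not stated in the article), and a career sample of 6,000 prior PAs plus this year's 472.

```python
def regress_to_mean(observed, n, league_mean, stabilization_n):
    """Shrink an observed rate toward the league mean.

    The weight on the player's own numbers is n / (n + stabilization_n),
    so a sample exactly as large as the stabilization point gets a 50/50 split.
    Returns (weight_on_player, regressed_estimate).
    """
    weight = n / (n + stabilization_n)
    estimate = weight * observed + (1 - weight) * league_mean
    return weight, estimate


LEAGUE_AVG = 0.266        # assumption: back-solved, not given in the article
STABILIZATION_PA = 1000   # Pizza Cutter's rough estimate for batting average

# Using only Ichiro's 2009 line: .365 in 472 PA.
w_2009, est_2009 = regress_to_mean(0.365, 472, LEAGUE_AVG, STABILIZATION_PA)

# Using his whole career: .333 over an assumed 6,000 prior PA plus 472.
w_career, est_career = regress_to_mean(0.333, 6000 + 472, LEAGUE_AVG, STABILIZATION_PA)

print(f"2009 only: weight {w_2009:.2f}, estimate {est_2009:.3f}")
print(f"career:    weight {w_career:.2f}, estimate {est_career:.3f}")
```

Under those assumptions the two estimates come out at roughly .298 and .324, with weights of about 32 percent and 86 to 87 percent on Ichiro's own numbers. The bigger the sample, the less league average gets to pull.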
Falling prey
Let's take a look at a few recent examples of placing too much emphasis on this year's stats from some notable fantasy baseball sites. Please do not take this as a shot at any of these sites. I have great respect for each of them, and I'm simply citing these as examples to show how easy it is, even for the best of us, to fall prey to this kind of thinking (and many more sites than just these three do it).
With the Oakland A's last week dealing their best hitter, Matt Holliday, Cabrera was left in a horrible offense in a horrible hitter's park. In Minnesota, he'll be playing in what ranks this season as the AL's best park for offense.
Park factors are one stat where single-year numbers will lead you terribly wrong. If we look at David Gassko's park factors (which take multiple years into account and include proper regression to the mean), we see that Minnesota's Metrodome should have a park factor around 100.6 (ever so slightly higher than league average), a far cry from the 2009 factor of 119.2 cited in the article. Because of how regression to the mean works, the Metrodome's running projection would be higher than 100.6 at this point in the season, but it will still be pretty close to neutral.
Here's another one:
With 50 K in 44.1 IP, [Andrew] Bailey throws gas and for the most part, has been able to keep his control in check. His ERA has been helped by a [.250 BABIP], so it will continue to move closer to his 3.73 [ERA estimator of choice]. But the overall skills package from a 25-year-old who barely made the team is impressive. He'll need to keep tabs on his FB rate (47%), but the spacious Oakland park is forgiving in that regard (-11% RS). The challenges of negotiating the closer role over a full season are still ahead, but from the limited data set we have, Bailey seems well-prepared for the task.
This one comes from a highly respected site which usually adds a qualifier for small sample sizes, which makes it a very good example of how easy it is to fall victim to this kind of thinking. No doubt Andrew Bailey has been terrific this year (3.19 LIPS ERA to this point), but we must realize that we're dealing with a guy that no major projection system pegged for any better than an ERA in the low 5.00s at the start of the season. In Double-A last year, he didn't even strike out a batter per inning, and he walked 4.6 per nine.
Granted, Bailey was a starter for most of his minor league career, but just because he's pitched like a 3.73 ERA pitcher for 44.1 MLB innings this year does not mean we should expect him to regress to (or "move closer to") a 3.73 ERA. Instead, we should expect him to regress to whatever that 3.73 expected ERA changes his running projection to (adjusted for his move to the bullpen, of course), probably something in the mid-to-high 4.00s. This article was published at the end of June, and even after we've seen Bailey's innings total climb to 62.0, we still only see ZiPS (5.82 preseason projection) project a 4.64 ERA for the rest of 2009.
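One rough way to picture a "running projection" is as an innings-weighted blend of the preseason projection and what the pitcher has shown so far. The sketch below is only that, a picture: it is not how ZiPS or any other system works, it skips the bullpen adjustment entirely, and the 60 innings of weight given to the preseason projection is a made-up assumption.

```python
def running_projection(preseason_era, prior_innings, observed_era, innings):
    """Innings-weighted blend of a preseason projection and in-season results.

    prior_innings is how many innings of evidence the preseason projection
    is treated as being worth (a hypothetical tuning value).
    """
    total_innings = prior_innings + innings
    return (prior_innings * preseason_era + innings * observed_era) / total_innings


PRESEASON_ERA = 5.82   # ZiPS preseason projection cited above
EXPECTED_ERA = 3.73    # the ERA estimator from the quoted piece
INNINGS = 44 + 1 / 3   # "44.1 IP" means 44 and one-third innings
PRIOR_INNINGS = 60     # assumption, chosen only for illustration

updated = running_projection(PRESEASON_ERA, PRIOR_INNINGS, EXPECTED_ERA, INNINGS)
print(f"updated projection: {updated:.2f}")
```

Whatever weight you pick, the updated number sits between 3.73 and 5.82. Bailey's hot start pulls his projection down; it doesn't replace it.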
One more example:
Cordero made just one appearance this week, chucking two scoreless innings against Colorado in an eventual extra-innings loss. Co-Co has a 1.70 ERA on the season, but his xFIP (4.01) tells a different story. The righty has posted his lowest full-season K rate (7.65) since 2000. A .238 BABIP and 4.9 HR/FB rate have hidden the overt signs of decline, but batters are making more contact and swinging at fewer pitches out of the zone.
Has Cordero declined? Yes, but probably not as much as the 7.65 K/9 indicates. ZiPS pegs him for 9.8 the rest of the way and Heater concurs (for the most part) at 8.9. Heater's rest-of-season ERA projection is 3.17, much better than his 4.01 xFIP.
The irony of it all
What's ironic about all this is that most people have moved past using last year's data to pick their players on draft day. Most people are perfectly comfortable using preseason projections—which weight seasons in a declining fashion and include regression to the mean—yet when the heat of the season takes hold, it somehow changes things and makes us feel like we can explain away swings that are occurring in a small sample size (and, yes, four months is a small sample size for most stats, and for all stats in some sense).
Taking advantage of this inefficiency
So what can we do as fantasy owners if we notice our opponents falling into this pattern? Trade away players who are having career years and acquire those who are underperforming, especially if they are of the right age. If a player is having a career year at age 27, the other owners will be much more likely to buy into it than if the player were 37. Pitchers will probably be better than hitters to trade away since things like contact rate and home runs are relatively stable in comparison to ERA or even ERA estimators. Of course, this is all easier said than done.
Taking full advantage of this means finding players who your opponents could realistically believe have taken a legitimate step forward and will maintain it, yet based on sound statistical principles shouldn't be expected to (at least not fully). A few guys like this might be Joe Mauer, Edwin Jackson, and Luke Hochevar. On the flip side, you could try to acquire players who your opponents could realistically believe have taken a step backward. This might include players like Chris Young (the OF), Francisco Liriano, Russell Martin, and Garrett Atkins.
And of course, this won't apply to every player who is overperforming or underperforming. For some players, their history will be too ugly or the player will be too old for even the best 2009 stats to overcome in the minds of fantasy owners. No matter how good Jason Marquis's 3.49 ERA or 3.98 FIP look, he's been too bad in the past and is too old for someone to think that this is his new talent level. And with Johnny Damon at 35 years old, you'll be hard-pressed to find someone convinced he is now a 32 HR hitter.
Concluding thoughts
What do you guys think? Agree with me? Have you noticed your leaguemates engaging in this kind of thinking? What players make good buy or sell targets? And of course, if you have any questions, feel free to ask.
Derek Carty, 23, has also been published by NBC's Rotoworld, Sports Illustrated, FOX Sports, and USA Today. This season, he'll be contributing to FanDuel and will be linking to all of his work at DerekCarty.com. In his three years competing in expert leagues, he has won 2 titles with 4 top three finishes, including a LABR NL title in 2009, making him the youngest person to ever win a major expert league title. Derek is a proud graduate of the MLB Scouting Bureau's Scout Development Program and is a firm believer in the importance of combining stats and scouting. He welcomes questions via e-mail, Facebook, or Twitter.

I’m actually opposite you on this one, Derek. I wouldn’t be surprised if under-appreciating current performance is a bigger inefficiency than over-valuing it. Of course there are different roles for player-types depending on your team’s needs, but i often find that i’m too slow to take flyers on players in the midst of breakout campaigns, holding on to hopes like Chris Young and Garrett Atkins (though this was more PT than performance-based IMO) instead of turning to upstarts like Juan Rivera and Kendry Morales.
We can probably both agree, however, that the real inefficiency to exploit comes in identifying the true nature of these players defying expectations. Joe Mauer might be playing over his head, but Edwin Jackson and Josh Willingham might have legitimately transcended previous plateaus. Milton Bradley may have just needed to turn things around, but Francisco Liriano and Russell Martin may have lost certain physical capabilities. Distinguishing between these cases is where we can really make our bones. And it’s here that an off-season can have a real effect on how players regress to previously established norms, ‘resetting’ players a bit, giving them a chance to recuperate, work out kinks, whatever.
Anyway, i just don’t think you should completely discount prolonged season effects that can persist for one reason or another.