In search of new inefficiencies in the fantasy marketplace

by Derek Carty
August 10, 2009

Joe Mauer has been the most valuable player in the American League this season, by most accounts. But is he really as good as he’s been playing? (Icon/SMI)

As fantasy baseball players, we’re always looking for ways to one-up our opponents. We’re always looking for an edge, a different way to do things that will separate us from the competition. Over the last eight years or so, “sabermetrics” have grown exponentially in popularity.

Mainstreaming

We’ve reached the point where UZR/150 is being talked about during Braves broadcasts, HR/FB is being discussed during Diamondbacks broadcasts, and everyone and their mother seems to be using them for fantasy, even if they’re not always using the best available or even using them correctly. We’re even seeing mainstream sites like CBS and ESPN make mention of once-nerdy stats like BABIP (okay, maybe our continued use of such succinctly-named acronyms as BABIP does still qualify us as nerds).

And judging by a lot of the comments we get here at THT Fantasy, it seems that a good chunk of our readership plays in leagues with owners who are savvy to these more advanced kinds of stats and analysis. There is one concept, however, that not all fantasy owners (or even analysts) appear to fully grasp. And when we notice this sort of thing, it often creates an opportunity for us to get a leg up on our competition.

A possible inefficiency?

What am I talking about? Placing too much emphasis on this year’s statistics. I’ve been seeing this all too often recently, both among my leaguemates and on other sites. I imagine that this is partially due to ignorance and misunderstanding and partially due to human nature.

Most humans are results-driven. Even when we try to use our intellect and stay objective, we often find ourselves looking for ways to explain and justify what has happened, engaging in a form of confirmation bias in the process (for an example of this, see the reaction of nearly every analyst on earth to David Ortiz’s slow April and May). And because what has just happened is fresh in our minds, we pay more attention to it, and often place more emphasis on it, than on things that happened months or years in the past. After all, what’s happening right now must be truer than what happened last year, right? Wrong. This kind of thinking, I believe, can lead to some faulty conclusions among fantasy owners and analysts.

Why is this thinking incorrect?

My underlying reasoning is based on two things: (1) the importance of utilizing all the information we have about a player and (2) regression to the mean.

Because almost all players of interest have been around for a while, we have more than just data from 2009 to work with. While 2009 data is certainly the most relevant, seeing as it’s the most recent, it is incorrect to ignore the data from previous years entirely. Instead, logically, we should put more emphasis on 2009, less on 2008, even less on 2007, and so on. This isn’t a concept many people would intuitively argue with (I don’t think, anyway), but in the heat of the season, and perhaps out of laziness or a lack of careful thought, people will often ignore previous years or at least put too little emphasis on them.
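
One way to put declining weights into practice is to average a rate stat across seasons, counting each season in proportion to both its playing time and a recency weight. The sketch below assumes illustrative weights of 5, 4 and 3 for the three most recent seasons, and the player's seasons are hypothetical; any declining scheme expresses the same idea, and a full projection would also regress the result toward league average.

```python
# Hypothetical player data: (season, plate appearances, batting average),
# ordered from most recent to oldest.
seasons = [
    (2009, 450, 0.320),
    (2008, 600, 0.280),
    (2007, 580, 0.275),
]

# Assumed recency weights: the most recent season counts most.
RECENCY_WEIGHTS = [5, 4, 3]


def weighted_rate(seasons, weights):
    """Average a rate stat across seasons, weighting by PA times recency weight.

    `seasons` must be ordered from most recent to oldest, matching `weights`.
    """
    numerator = 0.0
    denominator = 0.0
    for (_year, pa, rate), w in zip(seasons, weights):
        numerator += w * pa * rate
        denominator += w * pa
    return numerator / denominator


estimate = weighted_rate(seasons, RECENCY_WEIGHTS)
flat = weighted_rate(seasons, [1, 1, 1])  # every season counted equally, for contrast

print(f"Recency-weighted: {estimate:.3f}")
print(f"Equal weights:    {flat:.3f}")
```

With these made-up numbers the recency-weighted average comes out a few points higher than the equal-weight one (about .293 versus .289), because the strong 2009 season counts for more without wiping out 2007 and 2008.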

My second point is something called regression to the mean, which has received quite a bit of play around the internet recently. MGL put this concept very succinctly a couple weeks ago:

Anyway, no one mentions the obvious so far. Any player who posts a better than average number in any category for one year or for 100 years is EXPECTED to do worse in any other time period you look at, even if that player’s true talent never changes.

This might sound crazy if you’re new to the concept (or maybe even if you’re not), but it is indisputably true. To elaborate, read this excerpt from David Gassko’s piece from last week:

The important thing to remember is that statistics are just a sampling of an athlete’s true ability; actually, they’re less than that since that true ability constantly varies. But even if we forget about that variation, no number of plate appearances will tell us exactly how good that player is. At a trillion plate appearances, we might have to go out many, many decimal points before the player’s sample numbers and our best estimate of his true talent diverge, but eventually they would.

Because all we have to work with is a sample of a player’s true ability, there will always, always be a non-zero chance that any player in baseball is no different than any other player. The chances might be one-in-a-million (or more), but there is a very real, non-zero chance that Barry Bonds was no better than Neifi Perez.

What we must also understand is that every stat stabilizes at its own rate. Some stabilize very quickly, while others take several years. If we use this year’s data for some stats, we won’t go too wrong, but if we use it for others, we could be way, way off. Batting average, for example, is a stat that takes a long time to stabilize. The terrific Pizza Cutter estimated that it takes roughly 1,000 plate appearances for batting average to stabilize.

Let’s look at Ichiro Suzuki, who has posted an incredible .365 average in 472 MLB plate appearances in 2009. If this is all we knew about Ichiro and nothing else, we would be most accurate in guessing his batting average going forward by assuming that he is 32 percent likely to continue hitting at his current rate and 68 percent likely to hit at league average. That would leave us with a weighted expectation of just .298. That’s how much pull regression to the mean can have if all we do is look at this year’s statistics.

And that brings us back to my first point: the importance of utilizing all the information we have on a player. For almost all players, we have several years to look at, and those years tell us that Ichiro is probably not a sub-.300 hitter. We have over 6,000 PAs prior to this year for Ichiro, which tells us quite a bit about him. In fact, by using all of these PAs in our estimation, we’d be safe in assuming a split of 86 percent Ichiro, 14 percent league average. That would give us a batting average estimation of .324, which is not too far off from Ichiro’s career .333 average.
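
The splits in the two Ichiro estimates can be reproduced with a simple shrinkage formula: the observed average gets weight PA / (PA + k), where k is the roughly 1,000 PA stabilization point for batting average cited above, and league average gets the rest. The league average of .267 is an assumption chosen to roughly match the article's figures, and treating the career sample as exactly 6,000 PA is likewise an assumed round number.

```python
# Stabilization point for batting average, per the Pizza Cutter estimate cited above.
K_BATTING_AVG = 1000

# Assumed league batting average (not given in the article).
LEAGUE_AVG = 0.267


def regress_to_mean(observed, pa, k=K_BATTING_AVG, league=LEAGUE_AVG):
    """Shrink an observed rate toward the league mean.

    The observed rate gets weight pa / (pa + k); the league mean gets the rest.
    Returns (weight_on_observed, estimate).
    """
    weight = pa / (pa + k)
    return weight, weight * observed + (1 - weight) * league


# 2009 only: .365 in 472 PA
w_2009, est_2009 = regress_to_mean(0.365, 472)

# Career: .333 over roughly 6,000 PA (assumed round figure)
w_career, est_career = regress_to_mean(0.333, 6000)

print(f"2009 only: {w_2009:.0%} Ichiro -> {est_2009:.3f}")
print(f"Career:    {w_career:.0%} Ichiro -> {est_career:.3f}")
```

Under this formula a sample of exactly k PA would be split half observed, half league average; 472 PA gets about 32 percent of the weight (an estimate near .298), while the much larger career sample gets about 86 percent (an estimate near .324).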

Falling prey

Let’s take a look at a few recent examples of placing too much emphasis on this year’s stats from some notable fantasy baseball sites. Please do not take this as a shot at any of these sites. I have great respect for each of them, and I’m simply citing these as examples to show how easy it is, even for the best of us, to fall prey to this kind of thinking (and many more sites than just these three do it).

With the Oakland A’s last week dealing their best hitter, Matt Holliday, Cabrera was left in a horrible offense in a horrible hitter’s park. In Minnesota, he’ll be playing in what ranks this season as the AL’s best park for offense.

Park factors are one stat that you will go terribly wrong with if you use single-year factors. If we look at David Gassko’s park factors (which take multiple years into account and include proper regression to the mean), we see that Minnesota’s Metrodome should have a park factor around 100.6 (ever so slightly higher than league average), a far cry from the 2009 factor of 119.2 cited in the article. Because of how regression to the mean works, the Metrodome’s running projection would be higher than 100.6 at this point in the season, but it will still be pretty close to neutral.
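
The same regression-to-the-mean idea applies to park factors, where 100 is neutral. The sketch below combines several seasons of one-year factors with declining weights and then keeps only part of the remaining deviation from 100. Only the 119.2 figure comes from the text; the older seasons, the weights and the keep fraction are hypothetical, and this is not Gassko's actual method.

```python
# One-year run park factors (100 = neutral), most recent season first.
# Only 119.2 comes from the text; the older seasons are made up for illustration.
YEARLY_FACTORS = [119.2, 98.0, 96.5, 101.0, 97.5]

# Assumed declining weights per season, most recent first.
WEIGHTS = [5, 4, 3, 2, 1]

# Assumed fraction of the deviation from neutral that we keep after regressing.
KEEP_FRACTION = 0.5


def regressed_park_factor(factors, weights, keep):
    """Weighted multi-year park factor, then regressed toward neutral (100)."""
    if len(factors) != len(weights):
        raise ValueError("need one weight per season")
    weighted = sum(f * w for f, w in zip(factors, weights)) / sum(weights)
    return 100 + keep * (weighted - 100)


single_year = 100 + KEEP_FRACTION * (YEARLY_FACTORS[0] - 100)
multi_year = regressed_park_factor(YEARLY_FACTORS, WEIGHTS, KEEP_FRACTION)

print(f"Raw 2009 factor:       {YEARLY_FACTORS[0]:.1f}")
print(f"2009 only, regressed:  {single_year:.1f}")
print(f"Multi-year, regressed: {multi_year:.1f}")
```

Neither output is meant to match Gassko's 100.6; the point is how each step pulls an extreme one-year number back toward neutral, with the multi-year version pulling hardest.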

Here’s another one:

With 50 K in 44.1 IP, [Andrew] Bailey throws gas and for the most part, has been able to keep his control in check. His ERA has been helped by a [.250 BABIP], so it will continue to move closer to his 3.73 [ERA estimator of choice]. But the overall skills package from a 25-year-old who barely made the team is impressive. He’ll need to keep tabs on his FB rate (47%), but the spacious Oakland park is forgiving in that regard (-11% RS). The challenges of negotiating the closer role over a full season are still ahead, but from the limited data set we have, Bailey seems well-prepared for the task.

This one comes from a highly respected site that usually adds a qualifier for small sample sizes, which makes it a very good example of how easy it is to fall victim to this kind of thinking. No doubt Andrew Bailey has been terrific this year (3.19 LIPS ERA to this point), but we must realize that we’re dealing with a guy whom no major projection system pegged for anything better than an ERA in the low 5.00s at the start of the season. In Double-A last year, he didn’t even strike out a batter per inning, and he walked 4.6 per nine innings.

Granted, Bailey was a starter for most of his minor league career, but just because he’s pitched like a 3.73 ERA pitcher for 44.1 MLB innings this year does not mean we should expect him to regress to (or “move closer to”) a 3.73 ERA. Instead, we should expect him to regress to whatever that 3.73 expected ERA changes his running projection to (adjusted for his move to the bullpen, of course), probably something in the mid-to-high 4.00s. This article was published at the end of June, and even now that Bailey’s innings total has climbed to 62.0, ZiPS (5.82 preseason projection) still projects a 4.64 ERA for the rest of 2009.
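
A running projection can be sketched as a weighted blend of the preseason projection and the in-season ERA estimator, with the preseason projection treated as if it were worth a fixed number of innings. The 5.82 and 3.73 figures come from the text; the 50-inning prior weight is a hypothetical choice, not how ZiPS works, and the sketch ignores the starter-to-reliever adjustment mentioned above. Baseball's innings notation writes 44 1/3 innings as 44.1, so it is converted before doing arithmetic.

```python
def innings_from_notation(ip: float) -> float:
    """Convert baseball IP notation (44.1 = 44 1/3 innings) to true innings."""
    whole = int(ip)
    outs = round((ip - whole) * 10)  # .1 -> 1 out, .2 -> 2 outs
    if outs not in (0, 1, 2):
        raise ValueError(f"invalid IP notation: {ip}")
    return whole + outs / 3


def running_projection(prior_era, observed_era, innings, prior_weight_ip):
    """Blend a preseason ERA projection with an in-season ERA estimator.

    The preseason projection is treated as if it were worth
    `prior_weight_ip` innings of evidence (a hypothetical parameter).
    """
    w = innings / (innings + prior_weight_ip)
    return w * observed_era + (1 - w) * prior_era


PRESEASON_ERA = 5.82   # ZiPS preseason projection cited in the text
ESTIMATOR_ERA = 3.73   # in-season ERA estimator cited in the quote
PRIOR_WEIGHT_IP = 50   # hypothetical: how many innings the prior is "worth"

ip = innings_from_notation(44.1)
blend = running_projection(PRESEASON_ERA, ESTIMATOR_ERA, ip, PRIOR_WEIGHT_IP)
print(f"{ip:.2f} IP -> running ERA projection {blend:.2f}")
```

With these assumptions the blend lands around 4.84, in the mid-to-high 4.00s rather than at 3.73; a heavier prior weight pushes it higher, a lighter one pulls it lower.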

One more example:

Cordero made just one appearance this week, chucking two scoreless innings against Colorado in an eventual extra-innings loss. Co-Co has a 1.70 ERA on the season, but his xFIP (4.01) tells a different story. The righty has posted his lowest full-season K rate (7.65) since 2000. A .238 BABIP and 4.9 HR/FB rate have hidden the overt signs of decline, but batters are making more contact and swinging at fewer pitches out of the zone.

Has Cordero declined? Yes, but probably not as much as the 7.65 K/9 indicates. ZiPS pegs him for 9.8 the rest of the way and Heater concurs (for the most part) at 8.9. Heater’s rest-of-season ERA projection is 3.17, much better than his 4.01 xFIP.

The irony of it all

What’s ironic about all this is that most people have moved past using last year’s data to pick their players on draft day. Most people are perfectly comfortable using preseason projections—which weight seasons in a declining fashion and include regression to the mean—yet when the heat of the season takes hold, it somehow changes things and makes us feel like we can explain away swings that are occurring in a small sample size (and, yes, four months is a small sample size for most stats, and for all stats in some sense).
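
To put a rough number on "small sample," one simple approach treats each at-bat as an independent trial with a fixed true hit probability (a simplifying assumption; real at-bats are not identical coin flips). The standard error of a batting average is then sqrt(p(1 - p) / n). The .300 true-talent hitter below is hypothetical.

```python
import math


def batting_avg_standard_error(true_avg: float, at_bats: int) -> float:
    """Binomial standard error of batting average over `at_bats` trials."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


TRUE_AVG = 0.300  # hypothetical true-talent average

for ab in (100, 400, 1000, 6000):
    se = batting_avg_standard_error(TRUE_AVG, ab)
    low, high = TRUE_AVG - 2 * se, TRUE_AVG + 2 * se
    print(f"{ab:>5} AB: SE = {se:.3f}, ~95% range {low:.3f} to {high:.3f}")
```

Over 400 at-bats, roughly four months of play, the standard error is about 23 points, so a true .300 hitter could plausibly finish anywhere from about .254 to .346 on luck alone.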

Taking advantage of this inefficiency

So what can we do as fantasy owners if we notice our opponents falling into this pattern? Trade away players who are having career years and acquire those who are underperforming, especially if they are of the right age. If a player is having a career year at age 27, the other owners will be much more likely to buy into it than if the player were 37. Pitchers are probably better candidates to trade away than hitters, since things like contact rate and home runs are relatively stable in comparison to ERA or even ERA estimators. Of course, this is all easier said than done.

Taking full advantage of this means finding players who your opponents could realistically believe have taken a legitimate step forward and will maintain it, yet based on sound statistical principles shouldn’t be expected to (at least not fully). A few guys like this might be Joe Mauer, Edwin Jackson, and Luke Hochevar. On the flip side, you could try to acquire players who your opponents could realistically believe have taken a step backward. This might include players like Chris Young (the OF), Francisco Liriano, Russell Martin, and Garrett Atkins.

And of course, this won’t apply to every player who is overperforming or underperforming. For some players, their history will be too ugly or the player will be too old for even the best 2009 stats to overcome in the minds of fantasy owners. No matter how good Jason Marquis’s 3.49 ERA or 3.98 FIP look, he’s been too bad in the past and is too old for someone to think that this is his new talent level. And with Johnny Damon at 35 years old, you’ll be hard-pressed to find someone convinced he is now a 32 HR hitter.

Concluding thoughts

What do you guys think? Agree with me? Have you noticed your leaguemates engaging in this kind of thinking? What players make good buy or sell targets? And of course, if you have any questions, feel free to ask.

24 Comments

Andrew

14 years ago

I agree that we should always be mindful of a player’s track record (or True Talent).

At the same time, I think it’s certainly possible that some players can just have lost seasons; that is, significant skills erosion, even at a young age. There’s nothing in Russell Martin’s underlying statistical profile to indicate that he’s still a top-5 catcher. In a game in which we grab new players at the beginning of each year (aside from keeper leagues), perhaps it’s not always best to wait around for that True Talent to come around.

Mike Podhorzer

14 years ago

Hmm Derek, didn’t you just basically conclude by saying we should attempt to buy low and sell high? Just stated in a prettier fashion? I think that whole concept is pretty close to dead to begin with, well at least in my leagues!

Also, to support the original author of the Cordero commentary, doesn’t the large decline in strikeout rate indicate that something might not be right? Sure, sometimes it is just a small sample fluke, and his K/9 will rise back to historical levels. But other times it could be an indication that Cordero is pitching hurt. In addition, if his K/9 does remain at this lower level, his ERA should rise to whatever higher level some expected ERA metric tells you.

Rest-of-season projections, whether from ZiPS or Heater, have no clue about potential injury, velocity drops, complaints of a sore shoulder, etc. They just blindly take history into account and combine that with what has happened so far this season. So the question becomes: are Cordero’s skills changing, so that he is now truly a 7.65 K/9 pitcher? If so, his ERA is going to go up. If not, then his skills will eventually catch up to his ERA, which will end up remaining about the same.

Derek Carty

14 years ago

archilochusColubris, Brian M, and Andrew:

You’re definitely right that we can’t “completely discount prolonged season effects, that can persist for one reason or another.” Every player should be taken on an individual basis. As I’ve said before (http://www.hardballtimes.com/main/fantasy/article/my-player-evaluation-philosophy/), I’m 100% in favor of combining scouting and stats.

We just need to make sure that, if we are going to stray from the firm statistical principles, we are using solid qualitative information to back up our assertions. If we are grasping at straws to try and explain away what’s happening, we may be right sometimes, but overall we are going to lose accuracy, guaranteed. If we don’t have this strong scouting information for a particular player, we are best off just trusting the stats.

If we are going to say that a player has reached a new plateau, we must have the utmost confidence in ourselves because this rules out the possibility of him regressing to his previous level. To use Edwin Jackson as an example, while his K/9 and BB/9 have both improved, we are only looking at 147 IP. To say that he will stay at this new level is completely unwise unless we have very, very good information to tell us that he is a different pitcher than last year. And even then, we’re only seeing a sample of this “new Edwin Jackson”, so the principles of Regression to the Mean would still apply to Edwin 2.0.

Even if a player’s underlying statistical profile for 2009 completely backs up the way the player is performing (maybe like Andrew’s example of Russell Martin), we must understand that we are only looking at a sample of Martin’s ability and that it is prone to fluctuations. He has performed at a high level in the past, and unless we are convinced that this is a completely different player, the possibility exists that he will revert to that previous state.

Derek Carty

14 years ago

Mike,
It could be viewed this way, although I think there is a distinct difference in what I’m saying. I see Buying Low and Selling High as trying to acquire Ricky Nolasco, where the peripherals don’t match the surface numbers. But if we see a legitimate rise or fall in peripherals this year, some owners may be inclined to believe that this represents a new skill level for the player. If this is the case, and they are ignoring the possibility of regression to the mean (or regression to previous levels), then I think an opportunity arises. Their personal evaluation of the player, in this case, will not match the proper evaluation of the player.

As to Cordero, it is definitely possible that this represents a legitimate shift in skills, as it does with any player. But we can’t say with 100% certainty that this is true. When we look at previous seasons and weight them, it’s almost as if we are giving each a probability of occurring again in the next time period. 2009’s is most likely, 2008’s is less likely, etc.

And, of course, you are absolutely right about projection systems ignoring things like injuries, mechanics, velocity, etc. Again, I’m 100% in favor of combining stats and scouting in a prudent manner, when the information is available (http://www.hardballtimes.com/main/fantasy/article/my-player-evaluation-philosophy/).

But when we have this scouting information, it still really only changes the percentages of each scenario taking place – it doesn’t say with 100% certainty that the player will remain at his current level, the same as the presence of a new level of peripheral statistics doesn’t tell us this.

We cannot always be right, but by committing to the extremes every time we have a little piece of information, we will decrease our accuracy.

In Cordero’s case, perhaps this is being caused by injury, but his fastball velocity isn’t to blame. It’s the highest it’s been since 2006. In any case, we’ll be best off by weighing the possibilities and going from there.

Adam

14 years ago

Not to speak for Derek, but I think there’s a huge difference between applying this theory to trades and applying this theory to waiver-wire pickups. Guys like Ben Zobrist, Kendry Morales, Brian Bannister, or Edwin Jackson probably weren’t even drafted in most leagues, so I don’t think that’s quite what he means.

I think all Derek is trying to say is that if you commit to the philosophy of trading guys like Bannister for guys like Arizona Chris Young, you’ll come out ahead over time.

Derek Carty

14 years ago

Exactly right, Adam. There’s a big difference between the waiver wire and trades. I’ve discussed at times in the past that it’s important to be aggressive on waivers early in the season, seeking out those high-variability guys. If all they cost is the worth of a bench spot, why not see if they’ll continue their terrific seasons? For trades at this time of year, though, I don’t think we’re necessarily looking for high variability. Some teams will be, but in general just going with the better percentage play is the way to go.

Ed D.

14 years ago

Derek,

Good article, and good examples.

The one thing that I don’t think you wholly considered was what it really takes to WIN a fantasy league. In most years, you NEED to capture a few “new plateaus” on your roster to end the season the winner. In a league of 12 or 15 or 18 competitors, SOMEONE will take a chance on every single one of “this year’s surprises” (except maybe, as you said, the Jason Marquis types for whom we have too much history to believe in him) and even though that means many owners will only grab the (majority of) players who regress, one or two will get lucky and grab the few that either (a) have in fact reached a new sustainable level of performance or (b) at least maintain their lucky unsustainable level of performance for the entire season.

In my opinion, a statistically-minded owner who ignores all of this activity will finish a very comfortable 3rd-5th place in most years. But to win, you need to capture some of the high-side variability that presents itself only after the auction has ended.

Jeremy

14 years ago

How would a guy like Mark Reynolds fit into this? I tried to trade him at an earlier peak this season and couldn’t get any value for him. So I held onto him and now find him at another peak. I don’t expect Reynolds to be a top 5 in all of baseball performer the rest of the season. But I didn’t expect him to be a top 30 in all of baseball performer for the rest of the season when I tried to trade him in June.

How do you know what kind of value to accept in a trade when other owners aren’t buying into this year’s stats?

archilochusColubris

14 years ago

I’m actually opposite you on this one, Derek. I wouldn’t be surprised if under-appreciating current performance turned out to be a bigger inefficiency than over-valuing it. Of course there are different roles for player-types depending on your team’s needs, but I often find that I’m too slow to take flyers on players in the midst of breakout campaigns, holding on to hopes like Chris Young and Garrett Atkins (though this was more PT than performance-based IMO) instead of turning to upstarts like Juan Rivera and Kendry Morales.

We can probably both agree, however, that the real inefficiency to exploit comes in identifying the true nature of these players defying expectations. Joe Mauer might be playing over his head, but Edwin Jackson and Josh Willingham might have legitimately transcended previous plateaus. Milton Bradley may have just needed to turn things around, but Francisco Liriano and Russell Martin may have lost certain physical capabilities. Distinguishing between these cases is where we can really make our bones. And it’s here that an off-season can have a real effect on player regression to previously established norms, ‘resetting’ players a bit and giving them a chance to recuperate, work out kinks, whatever.

Anyway, I just don’t think you should completely discount prolonged season effects, which can persist for one reason or another.

Winter Trabex

14 years ago

I’d love to see how any of this applies to hot and cold streaks, which seem to ignore regression to the mean or anything of the sort. I think this can explain why batting average drops once it gets to a certain point (i.e., .370 to .400). What it doesn’t explain, though, is why batting average doesn’t come up again once it’s down around .220 or so.

Garrett Jones and Ben Zobrist have won spots on my team because of immediate output. Does that mean I should expect them to fall off? I don’t know if that theory holds up. Maybe Jones really is a guy like Adam Dunn who can hit 40+ HR every year. Maybe Zobrist really is a legit 20/20 player.

And, more importantly, maybe Garrett Atkins really is washed up because he stopped taking illicit substances. That has to factor in too, don’t you think?

Chad Burke

14 years ago

Has Reynolds shown any signs of letting up down the stretch? Every time I see someone suggest to cash in on him he has seemed to go on a tear. I’ve got him in one league and I’m keeping him as even if he drops a little in production he’ll still be exceeding what I paid for him by a mile.

Russell Branyan, OTOH, I tried to sell for ages on another of my teams and nobody was buying it and apparently rightfully so.

Jeremy B.

14 years ago

Derek,
Nice article. This is my first year in a fantasy league, and I regret trading Carlos Lee earlier in the season, when I placed too much emphasis on his declining slugging percentage and too little emphasis on his historical trends. I could use him in my outfield right now.

Playing devil’s advocate, I think an important thing to consider when evaluating this year’s statistics vs. historical performance is the dynamic context of baseball. The player pool, team compositions, and stadium construction are not fixed in time.

Last year Yankee and Shea Stadiums had neutral park factors for home runs; this year the new parks (Yankee and Citi) rank first and fifth. Also, the majority of the Yankees’ starting rotation was not starting there last year.

Those differences are contextual and present us with some scenarios where emphasis on this year’s statistics is necessary, because we can’t expect Yankee Stadium’s park factor or the team’s pitching stats to regress to previous years, when they had a (mostly) different starting rotation pitching in a different stadium.

I suspect that the Yankees and Mets are exceptions though, and probably don’t typify the norm across teams and years. Thanks again for your advice, and I’ll be looking at multiple years of park factors from now on.

Derek Carty

14 years ago

Agreed, Ed D., although I think many of those guys can be had off waivers, not necessarily in August via trade.

Jeremy,
There are no hard and fast rules; it all depends on the kind of owners you are dealing with. As archilochusColubris initially mentioned, it may be that some owners will see any play above preseason projection as fluky and will refuse to pay anything additional for it. In this case, you might be better off hanging onto the player, since, after all, 2009 stats do count for something. Conversely, you could trade for players who are exceeding expectations when the other owner believes they will regress too far back toward a previous level.

Derek Carty

14 years ago

Winter Trabex,
I’m not quite sure what you mean with your .400 and .220 examples, but regression to the mean is exactly why “hot streaks” and “cold streaks” are just that, streaks. They are small sample sizes and shouldn’t be expected to continue. If they do continue for a long time, then it’s not really a streak anymore and it will show up to a significant degree in the seasonal numbers, which will be factored into our decision making.

As to your example of Garrett Jones, it’s absolutely possible that Jones is now an Adam Dunn-type hitter. However, it’s also possible that he is *not* Adam Dunn. That’s what my article is getting at. What we are looking at with Jones is a 134 at-bat sample, which means something, but is by no means absolute. Instead, we would be best off weighting this sample and combining it with all of his previous ABs (which also mean something) to arrive at an average estimated performance level. While sometimes Jones will continue to perform like Dunn, other times he’ll fall back to earth and start playing like Scott Rolen. In the long run we’ll be most successful by taking the weighted mean projection.

Again, everything you say about Atkins needs to get factored in, but because of both 1) regression to the mean and 2) his past, it would be incorrect to believe Atkins will continue to perform at such a low level going forward. It’s entirely possible that he has fallen off because he “stopped taking illicit substances” (not an accusation), but many other possibilities exist, not the least of which is simply random variation.

Derek Carty

14 years ago

Jeremy B.,
You’re 100% correct. Context is extremely important, as I’ve stressed many times in the past (my favorite: http://www.hardballtimes.com/main/fantasy/article/introducing-quality-of-opponent-adjustments-and-caps-for-pitchers/)

In the case of moving to a completely new stadium, the old park factors become irrelevant. Some purists would even argue that any change to a stadium (even something like lessening foul ground by a couple feet) means that we should begin a new park factor for it, and I don’t totally disagree.

In cases like this with New Yankee Stadium and Citi Field, looking at 2009 statistics is important because it’s all we have. But because 1) 2009 is all we have and 2) park factors take several years to stabilize, we can’t take the 2009 numbers at face value. Instead, the proper way to evaluate them would be to regress them heavily to the mean. This holds true for any instance where all we have is a small sample size to work with. It’s simply imprudent to do otherwise without solid qualitative evidence.

As an example of why looking at raw 2009 Citi/Yankee factors is incorrect, just look at Minnesota’s 119 factor. If this were the first year the stadium was in existence and we assumed it really did inflate scoring by that much, we would end up being terribly incorrect (which we know because the stadium *has* been in existence for many years, and its track record tells us so).

And as you said, this isn’t a typical scenario as teams keep their stadiums for many years.

BobbyRoberto

14 years ago

Derek,
So, considering your article above and the many interesting comments, if you had to choose right now between having David Ortiz (great history of production, but struggling again now) or Kendry Morales (having a great year, not a good history), would you stick with Ortiz? Is he the better percentage play?

Derek Carty

14 years ago

BobbyRoberto,
I don’t have an in-season projection system built myself, but I’d have to think Ortiz is the better power bet and Morales is the better contact bet. Overall, I’d probably go with Ortiz unless we had good complementary information about either player.

Andrew

14 years ago

For what it’s worth, going forward this year, I would rather have Kendry Morales over David Ortiz, Adam Lind over Vladimir Guerrero, Aaron Hill over Brian Roberts, Ben Zobrist over Jimmy Rollins, Mark Reynolds over David Wright, Josh Willingham over Alfonso Soriano, Wandy Rodriguez over Francisco Liriano, and JP Howell over Brad Lidge.

In my opinion, the underlying skills showcased this season should form the basis for the decisions that we make in non-keeper leagues.

BobbyRoberto

14 years ago

I’m inclined to agree with Andrew, which I think is exactly what Derek is cautioning against.

I know it’s just a handful of players, but I’m putting those names down and will compare their production from this point forward to season’s end.

Chad Burke

14 years ago

Are these rest-of-season projections taking into account that Papi is in his 4th year of declining statistics, is 34 years old, and is currently hitting .219 on the season? I held onto him all of last year, including his DL stint, hoping for a resurgence that has never come, other than a few hot streaks. No way in hell I trade Morales (on two of my three teams) for Papi. Right now I’d hesitate to trade Nick Swisher for Papi, and he hasn’t been anywhere near as hot as Morales has been. Morales’s one weak spot is that the majority of his production (22 of 25 HRs) has come against right-handers, but his average and OBP against lefties are still way better than Ortiz’s overall numbers. We are looking at almost 400 ABs, and his production has only gotten better with each month. What could possibly make one think that Papi is likely to flip the switch and revert to the Papi of 4-5 years ago?

As for the other player comparisons, some of them are very difficult to evaluate since the players offer such different skill sets. Aaron Hill will continue to put up better power numbers than Roberts, I guarantee it, but Roberts will be similarly valuable in R’s and SB’s. Wright/Reynolds is a good one, as they are both power/speed combos, but I just couldn’t trade a guy on pace for 40/25 for a guy who’ll be lucky to get to 15 bombs from a power position, playing in a park that appears to suppress power numbers, regardless of their histories.

Derek Carty

14 years ago

That’s exactly what I’m cautioning against, BobbyRoberto. That’s not to say I’d take the slumping guy in all of those cases, just that we shouldn’t automatically assume that the one who is playing better this year will be better going forward.

Chad Burke, projections do take into account the things you mentioned: a player’s history, age, and current season. I can’t speak to the exact methodology of ZiPS, but it definitely accounts for the same factors you do. As far as Papi goes, no one is saying that he’ll revert to the Papi of four or five years ago, not even ZiPS. What ZiPS and I are saying is that he will be better going forward than he has been in 2009. As the MGL quote above says, any player who performs poorly in one time period is expected to perform better in the next. This is especially true when the player performed better in each of the previous six seasons.

As I said in the article, this concept won’t be easy for everyone to accept, but it is undoubtedly true. 400 AB may seem like a lot, but it really isn’t.
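To make the sample-size point concrete, here is a minimal sketch, assuming each at-bat is an independent trial with a fixed underlying hit probability (a simplification that ignores park, opponents, and health). It computes the binomial standard error of a batting average over 400 AB, then shrinks an observed average toward a prior estimate by adding a hypothetical "ballast" of k at-bats at the prior rate. The .280 prior, the 88-for-400 line, and k = 500 are illustrative numbers only; this is not how ZiPS works internally.

```python
import math


def batting_avg_standard_error(true_avg: float, at_bats: int) -> float:
    """Binomial standard error of an observed batting average.

    Assumes every at-bat is an independent trial with the same
    hit probability (true_avg), which is a simplification.
    """
    return math.sqrt(true_avg * (1.0 - true_avg) / at_bats)


def regressed_avg(hits: int, at_bats: int, prior_avg: float, k: int) -> float:
    """Shrink an observed average toward a prior estimate.

    k is a hypothetical number of 'ballast' at-bats at the prior rate;
    a larger k pulls the estimate harder toward the prior.
    """
    return (hits + k * prior_avg) / (at_bats + k)


if __name__ == "__main__":
    se = batting_avg_standard_error(0.270, 400)
    # About 95% of observed averages land within two standard errors of true talent.
    print(f"SE at 400 AB: {se:.3f}, so +/- {2 * se:.3f} at two SEs")

    # Hypothetical hitter: 88-for-400 (.220) this year, history-based prior of .280.
    print(f"Regressed estimate: {regressed_avg(88, 400, 0.280, 500):.3f}")
```

Under these assumptions the standard error is about .022, so a true .270 hitter could plausibly finish anywhere from roughly .226 to .314 over 400 AB by chance alone, and the hypothetical .220 hitter's best estimate comes out near .253, between his current line and his history.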

BobbyRoberto

Again, looking at the players Andrew listed, I wanted to see what ZiPS thinks they’ll do from here on out. These are rest-of-season projections:

Morales .287/.327/.493, .820 OPS
Ortiz .263/.365/.509, .874 OPS

Lind .287/.341/.482, .823 OPS
Vlad .303/.359/.493, .852 OPS

Hill .284/.330/.450, .780 OPS
Roberts .286/.363/.439, .802 OPS

Zobrist .258/.359/.444, .803 OPS
Rollins .271/.327/.449, .776 OPS

Reynolds .272/.351/.533, .884 OPS
Wright .312/.402/.519, .921 OPS

Willingham .273/.373/.500, .873 OPS
Soriano .269/.332/.486, .818 OPS

So, among these six pairs of hitters, ZiPS projects only Zobrist and Willingham to post a higher OPS than the player they are paired with from this point forward.
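As a quick arithmetic check on that summary, the sketch below recomputes OPS as OBP + SLG from the ZiPS rest-of-season lines quoted above and reports which player in each pair projects higher. Pairs are kept in the order listed; using OPS as the single yardstick is an assumption of this sketch.

```python
# ZiPS rest-of-season projections quoted above, as (AVG, OBP, SLG).
PAIRS = [
    (("Morales", (0.287, 0.327, 0.493)), ("Ortiz", (0.263, 0.365, 0.509))),
    (("Lind", (0.287, 0.341, 0.482)), ("Vlad", (0.303, 0.359, 0.493))),
    (("Hill", (0.284, 0.330, 0.450)), ("Roberts", (0.286, 0.363, 0.439))),
    (("Zobrist", (0.258, 0.359, 0.444)), ("Rollins", (0.271, 0.327, 0.449))),
    (("Reynolds", (0.272, 0.351, 0.533)), ("Wright", (0.312, 0.402, 0.519))),
    (("Willingham", (0.273, 0.373, 0.500)), ("Soriano", (0.269, 0.332, 0.486))),
]


def ops(slash_line):
    """OPS is on-base percentage plus slugging percentage."""
    _avg, obp, slg = slash_line
    return obp + slg


def projected_winners(pairs):
    """Return the name with the higher projected OPS in each pair."""
    winners = []
    for (name_a, line_a), (name_b, line_b) in pairs:
        winners.append(name_a if ops(line_a) > ops(line_b) else name_b)
    return winners


if __name__ == "__main__":
    for (name_a, line_a), (name_b, line_b) in PAIRS:
        print(f"{name_a} {ops(line_a):.3f} vs {name_b} {ops(line_b):.3f}")
    print("Higher projected OPS:", projected_winners(PAIRS))
```

Only Zobrist and Willingham come out ahead of their partners. OPS does hide some detail: by AVG and SLG alone, Rollins’s projection is slightly higher than Zobrist’s, so Zobrist’s edge comes entirely from OBP.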

The pitchers:
W. Rodriguez 3-3, 4.42 ERA, 1.40 WHIP, 3.89 FIP
F. Liriano 3-2, 4.40 ERA, 1.38 WHIP, 4.16 FIP

JP Howell 2-1, 3.33 ERA, 1.22 WHIP, 3.52 FIP
B. Lidge 1-1, 4.29 ERA, 1.43 WHIP, 4.03 FIP

ZiPS sees Wandy as roughly equal to Liriano and Howell as better than Lidge.

Andrew

I wasn’t trying to stir up any controversy by providing that list of players having breakout seasons whom I would take over other players with more stable skill sets, some of whom are struggling. It’s pretty easy to cherry-pick a few names and come up with such a list.

I think you’re one of the best in the business, Derek. I guess it’s just tough for me to grasp that even four months of data is still not a significant sample size. If that’s true, and I accept that it is, then fantasy baseball becomes a test of who has the most accurate preseason rankings, since virtually nothing in-season matters.

So, in general, would you agree with this statement: barring injuries, the fantasy baseball team with the most True Talent will win most often over the long run?

Maybe that’s because I have only been playing for seven years, but in my experience that statement has not held true. Again, maybe this is another sample-size problem.

B N

I should preface this by saying that I’m getting a doctorate in modeling and simulation (including probabilistic models), so it’s not like I want to slam the use of statistics. However, I think the premise of some of this logic doesn’t hold up, because it confuses our analysis of the system with the actual system.

Take, for example, “there is a very real, non-zero chance that Barry Bonds was no better than Neifi Perez.” This depends greatly on the assumptions you are willing to make during analysis. I could always make a stat, “BondsRulesNeifi,” that clusters players by their similarity to each of those two players and works on the assumption that Bonds is better than Neifi. By that stat, Bonds will be better than Neifi with 100% certainty (if you are willing to assume that I can make the model work right).

“True Talent Level” is problematic for analysis because it is a complete fallacy. All talent is relative to the competition, which varies over time due to matchups and schedules. None of the major stats offer useful component analysis (the ability to hit pitches on certain trajectories, etc.), so you’re always throwing away tons of important information.

I think you’re on the right track in concentrating on the limitations of your opponents, but I think one of those weaknesses is in fact an overreliance on unreliable stats (BABIP definitely among them). I would say a big limitation is the very assumption of “regression to the mean,” which assumes stationary distributions over time (which we know not to be true).

I would say that time series and pattern analysis would be a lot more useful for fantasy purposes. For example, I tend to see a disproportionate number of rookies get off to hot starts and then trail off. It’s easy to say “this is unsustainable versus the mean and they will trail off,” but I think that is an oversimplification. In our explanations, we’ll often say instead, “The league will catch up with him.” But do we have a stat for that? How do we differentiate between a player to whom the league has to make adjustments (i.e., a game-theoretic context) and a player who is benefiting from random variation? The truth is that I haven’t seen that be a big part of the discussion.

I would say that is really the bigger advantage: replacing the indiscriminate use of distributions and probabilistic models that are known to be fundamentally unreliable with models that can anticipate a pattern over time rather than just an aggregate number (which, while easier, is far less informative).
