The ‘optimistic’ Bill James projections

Bill James has A-Rod down for 37 HRs while CHONE has him down for 34. Does that mean the James system is higher on A-Rod than CHONE? The answer may surprise you. (Icon/SMI)

It’s that time of year when projection systems are starting to be released. CHONE was released about a month ago, ZiPS is in the process of being released one team at a time, and Marcels will probably make an appearance sometime in the next few weeks. For hardcore fantasy baseball enthusiasts, it can be a lot of fun to go through the various systems and see how they are viewing players and how they stack up against each other. While I enjoy this time of year as much as anyone else, one of my biggest pet peeves is when people start talking about how optimistic the Bill James projection system is. Some examples of this sentiment:

Pending Pinstripes (Yankees blog):

I’ve always thought that the Bill James projections were wildly optimistic, but they’re still interesting to look at.

The McCovey Chronicles (Giants blog) comments section:

Considering how crazy-optimistic Bill James projections usually are, that seems awfully pessimistic for Affeldt.

The Crawfish Boxes (Astros blog):

Surprisingly, James has an offensive prediction for Chris Johnson, and even more surprisingly, James is somewhat more optimistic about Johnson than most of us on this board (including me). Obviously, James’ system believes that Johnson’s minor league numbers indicate decent enough power to offset, at least in part, a paltry OBP. I have my doubts on that.

South Side Sox (White Sox blog):

James tends to have the most optimistic projections of any of the major forecasters, specifically for young players.

Driveline Mechanics:

The James projections often seem optimistic…

I think you get the idea.

Flawed logic

Inevitably, each year, the James system seems to be higher on the vast majority of players than are other systems such as CHONE, ZiPS, or Oliver. And inevitably, each year, baseball analysts see nothing wrong with making straight comparisons between systems. A couple weeks ago, I saw one article about Jake Fox that read:

For 2010, Bill James projects a whopping .284/.339/.546 line for Fox in limited playing time. That strikes me as wildly optimistic. CHONE’s forecast appears much more reasonable, with a projected .257/.316/.452 performance.

Other times, we’ll see straight comparisons to previous years:

Bill James is super-optimistic – when I looked at the projections they came up with a few years ago, he had the majority of starters being better than average at every single position.

Still other times, sites will refer to the James projections while completely ignoring the context under which they were created:

James also projects Jose Reyes will return from injury to hit .285 with 57 stolen bases, 14 home runs, 67 RBI and 113 runs created.

…man, i hope so… it looks to me like James essentially took Jose’s brief 36 games in 2009 and extended them out over 162 games, which is totally fine by me.

Or:

The projections here are extremely pleasing. I believe that they’re too optimistic, though, especially for the bullpen.

This isn’t a shot at these articles or writers in the slightest, but I believe this line of reasoning—which is common across many sites and blogs—is a bit flawed, so today I’d like to help correct some of the common errors and misconceptions that many seem to have about projections.

Relativity and context

The most important concept I’d like to stress is that of relativity. The kinds of articles I just mentioned operate under the assumption that the James projection for a player should be looked at relative to another system’s projection for him or relative to last year. This is incorrect, though. What we should be doing is examining the James projection for a player relative to all of the other players the James system projects.

As I’ve stressed many times before, context is of the utmost importance when it comes to almost anything fantasy baseball related. In this case, most people ignore the run environment that the James projection system assumes. To illustrate my point, I’ll use a very extreme example. Let’s say that we transport Albert Pujols and his 44 HR projection into a league where it is common for the worst players to hit 80 HRs per year and the best to top 200 HRs. While Pujols and his 44 HRs look terrific in our reality, in this new one it looks kind of pathetic. That’s context.

“What does this have to do with the James projections, though?” you ask. Well, while the James projections don’t assume a run environment where people are routinely hitting 200 HRs, they usually do assume that hitters perform a little bit better on the whole than in previous seasons or in other projection systems. So if everyone is being projected to hit a few extra HRs, Alex Rodriguez’s 37 HR projection from James is not necessarily any more optimistic than CHONE’s 34 HR projection.

After all, when we’re drafting players in fantasy leagues, it doesn’t matter if the first pick has 200 HRs and the second pick has 190 or if the first pick has 40 and the second has 38. We don’t care about the actual numbers; we care about the relative rankings. It doesn’t matter if James has Albert Pujols at 44 HRs and CHONE only has him at 39. If James is inflating numbers across the board, Pujols will still be considered the No. 1 pick and everyone else will fall in line behind him, regardless of the system used or whether or not its numbers are inflated relative to other systems—we just can’t mix-and-match.
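To make the ranking point concrete, here is a minimal Python sketch. The Pujols and Rodriguez figures are the James HR projections quoted in this article; "Player C" and the 10% scaling factor are made up purely for illustration and don't come from any real system.

```python
def rank_players(projections):
    """Return player names ordered from most to fewest projected HRs."""
    return sorted(projections, key=projections.get, reverse=True)

# Pujols and Rodriguez use the James HR figures quoted in the article;
# "Player C" is a hypothetical third player.
james_like = {"Pujols": 44, "Rodriguez": 37, "Player C": 30}

# Hypothetical: the same players in a lower-offense environment,
# with every projection scaled down by the same 10%.
deflated = {name: hr * 0.9 for name, hr in james_like.items()}

print(rank_players(james_like))
print(rank_players(deflated))
```

Both calls print the same draft order. Scaling every player by the same factor changes the raw numbers but not who ranks ahead of whom.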

Comparing systems

So how can we compare systems if we can’t do it directly? Ideally, we’d find the league average for all systems for all of our relevant stats (or even more ideally, the average for all players that will be drafted in a particular fantasy league, though that obviously works better in theory than in practice) and create a set of conversion factors so direct comparisons can be made between systems.
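As a rough sketch of what such conversion factors might look like, the Python below divides one system's league average by another's for each stat and multiplies a player's line by the result. The league averages and the player line are hypothetical placeholders, not real figures from James, CHONE, or any other system, and simple ratio scaling is an assumption on my part rather than the only way to do it.

```python
def conversion_factors(source_avg, target_avg):
    """Per-stat multipliers that rescale source-system numbers
    onto the target system's league-average scale."""
    return {stat: target_avg[stat] / source_avg[stat] for stat in source_avg}

def convert(projection, factors):
    """Apply the per-stat multipliers to one player's projected line."""
    return {stat: value * factors[stat] for stat, value in projection.items()}

# Hypothetical league-average lines for two systems (not real data).
system_a_avg = {"HR": 20.0, "RBI": 80.0}
system_b_avg = {"HR": 18.0, "RBI": 76.0}

factors = conversion_factors(system_a_avg, system_b_avg)

# Hypothetical player as projected by system A.
player_in_a = {"HR": 40, "RBI": 100}
player_on_b_scale = convert(player_in_a, factors)

for stat, value in player_on_b_scale.items():
    print(stat, round(value, 1))
```

Once both systems are on the same scale, a direct comparison of a player's converted line against the other system's projection becomes meaningful.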

I didn’t buy the Bill James Handbook or the projections this year, so I don’t know what its league average is (and thus am not 100% certain that the James system is actually inflating stats this year, but they have been inflated in the past and anecdotally seem to be this year). If anyone wants to share what the league average is for James (or other systems that require payment), I’d be happy to whip up some quick conversion factors and post them for everyone to make use of.

Outliers

“But what about players whom the James system is extremely high on? Should they be disregarded?” Of course not. Like any other system, James will like certain players more than other systems do. They’re just a little tougher to pick out without applying the conversion factors, since we have to guess at how much we should discount their stats. One guy who might fit this description this year, though, is Mark Reynolds. James has him down for 40 HRs while CHONE is at just 30. Marcels will likely be closer to 30 as well when it comes out. That’s a big difference, even considering inflation. We just need to remember that all systems will favor certain players and show a distribution of players they like (relative to other systems), dislike, and are neutral on.
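One rough way to spot such players without full league averages is to compare each player's James-to-CHONE ratio against the typical ratio. The sketch below uses only the three HR pairs quoted in this article. The 10% threshold is an arbitrary assumption, and a median over three players is far too small a sample to trust; in practice you would want the whole player pool.

```python
from statistics import median

# HR projections quoted in this article.
james = {"Rodriguez": 37, "Pujols": 44, "Reynolds": 40}
chone = {"Rodriguez": 34, "Pujols": 39, "Reynolds": 30}

# How much higher James is than CHONE for each player.
ratios = {name: james[name] / chone[name] for name in james}

# Stand-in for the system-wide inflation (assumption: the median ratio).
typical = median(ratios.values())

# Ratio after removing the typical inflation; 1.0 means "no extra optimism".
relative = {name: ratios[name] / typical for name in ratios}

# Hypothetical threshold: flag anyone more than 10% above the typical ratio.
THRESHOLD = 1.10
flagged = [name for name, value in relative.items() if value > THRESHOLD]

for name in james:
    print(name, round(ratios[name], 3), round(relative[name], 3))
print("Flagged:", flagged)
```

With these three players, only Reynolds clears the threshold, which matches the point above: his gap is large even after allowing for across-the-board inflation.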

Fan Projections

One last point is that the fan projections FanGraphs is running will likely be sitting in the same boat with the James projections. I’d guess that fans will be more apt to project players they like, which means league average will probably be a bit higher for these projections as well. Just something to keep in mind.

Concluding thoughts

Hopefully this has cleared up some confusion regarding projections, specifically regarding the Bill James system. If you have any questions, feel free to e-mail me or post in the comments.


20 Comments
Greg F.
14 years ago

Hah, I remember writing that. I think I was more referring to young players. Austin Jackson’s projection was a little outlandish, giving him a .341 wOBA and a .371 BABIP—those two things are very unlikely to happen.

TCQ
14 years ago

It’s still a flaw in the James system if the league average it’s using does not correspond with reality.

Paul Singman
14 years ago

I understand that all of the projections need to be taken within the context of the system, but the Bill James projections at first baffled me, and now I just find them amusing. They’ve become almost an inside joke within my circle of baseball friends because of the irrational optimism behind them. It must take a conscious effort on Bill James’ part (or whoever does them) to inflate everyone’s stats to that level.

It’s something that could be resolved too easily if someone confronts him to find out his reasoning, but for now it will remain a mystery to me.

henry j.
14 years ago

Being a casual reader but avid fantasy player (12 varied formats on average per year), I would like to know what the actual stats were compared to each system’s predictions. This should be the definitive answer to this question of inflating the forecasting accuracy.

Toffer Peak
14 years ago

I have to disagree with this article for the same reason as TCQ, if BJ’s projections are based on a league average different than MLB’s then they are simply wrong, or at least projecting something different from reality. If at the end of the season his projections are found to be consistently too high then his projections are simply not good (in comparison to others). Your defense of his projections is only applicable to people who plan to use his numbers purely to create an ordinal ranking of fantasy baseball players. For that purpose, fine, it might be okay that the league average is off but for all other purposes it simply makes them bad projections.

The other problem is that you and he can’t have it both ways. He projects both hitters AND pitchers to do better than other projection systems. How can that be? If hitters are going to hit a ton of HRs and RBIs it must be coming at the expense of the pitchers but he seems to project them to play great as well. The only way I can imagine that this is explainable is that the players he doesn’t project are going to really stink.

Derek Carty
14 years ago

TCQ and Toffer Peak,
I should first note that I am in no way affiliated with the Bill James projection system, James himself, or Baseball Info Solutions.  This article wasn’t meant so much to defend the system or proclaim that it is superior to any others as it was to clarify some of the misconceptions with it as objectively as possible.

Objectively speaking (and of course speaking from the perspective of a fantasy analyst), I have to disagree (at least to a large extent) that they are “simply wrong” merely because they assume a different league average. Sure, it doesn’t mirror reality as closely as some other systems do, but no system perfectly predicts league average consistently. Even the previous year’s league average doesn’t project the next year’s league average 100% correctly. League average changes a little from year to year (HR/AB was 3.25% in 2004, 3.01% in 2005, 3.21% in 2006, and 2.95% in 2007), as does the projected league average for each system, so it really just comes down to our own personal tolerance for whether each system comes close enough to this general range. And that’s if we care about league average at all.

It makes absolutely no difference in the predictive quality of the system whether the league average HR/AB is 1% or 25%.  Sure, it makes interpreting the numbers difficult because we’re *used* to the numbers looking a certain way, but they truly do mean the exact same thing.  If you equate “difficult to interpret correctly” with “bad”, then sure, they might be a little bad.  But if we care about the predictive nature of them, “difficult to interpret correctly” in no way means “bad.”

As far as the hitter/pitcher conundrum, I really can’t comment since I haven’t studied the projections well enough to really know.  I will say that some problems can arise if the hitter/pitcher dynamic isn’t treated carefully enough, but I really don’t know how the James system works in this regard.

GBSimons,
To the best of my knowledge, James no longer is involved in the projection process of the “Bill James projections.”  I believe that the guys at Baseball Info Solutions create the projections without his aid and the name has remained as a means of marketing.  I think.

Derek Carty
14 years ago

henry j,
I think there’s a difference between inflating the forecasting accuracy and inflating the stats themselves.  The James system, anecdotally, seems to do the latter, which will have no impact on the system’s accuracy.

As far as picking out which system is most accurate, it’s not as straightforward as we might believe. There’s been a lot of this kind of thing done over at The Book Blog, but I’m not so sure we really have one definitive answer.

TCQ
14 years ago

Derek,

I understand the basic principle that assuming a different league average does not affect the predictive accuracy of the system, but I still don’t see how it can simply be dismissed that the Bill James system does not reflect reality as closely as CHONE or Marcels or what have you.

Also, it’s logically farcical to say that it doesn’t matter that the James system uses a less accurate league average simply because other systems aren’t perfect. Nothing’s perfect; that doesn’t mean that everything falls shy by the same amount.

gtwma
14 years ago

Recognizing all the problems in measuring forecast accuracy using this and other methods:

http://www.fantasybaseballcafe.com/2007/tips_assessingprojections.php

http://www.baseballthinkfactory.org/files/newsstand/discussion/2006_projection_results/

James, optimistic or not, seems to be little different in accuracy than others.  To me, it just seems that fantasy players focus on a handful of projections on young players in his projections.

GBSimons
14 years ago

So Bill James’ stat projections are bloated and unreasonable?  Sounds a bit like the man himself.

Sorry, I know James has led the statistical revolution and contributed mightily to the game, both directly and indirectly.  But reading his work, whether in books or comments in a chat, I get the feeling he’s among the more arrogant people on the planet.

Patrick
14 years ago

Derek,

I’m sure you understand this argument, from what you wrote, but I’d like to stress it, on behalf of the more considered complaints about the Bill James system (and this is more from a real-world than a fantasy baseball perspective).  Just before I start – I don’t know if the James system overstates offense and gets league average offense wrong.  I haven’t read The Book Blog’s thoughts on it.  Assume for a moment that it does, and here’s my opinion:

We do not lack for context.  We have one very simple context.  The real world and the players actual stats.

If James’ projection system is consistently over-projecting offense, then, as a projection system, there’s something wrong with it.

If all you’re interested in is using the insights of the James projection system for relative judgements with no consideration of the actual final numbers, then yes, it’s just fine.  That’s good for ranking fantasy players.

And if you’re willing to use only the James system, it’s good for making real world player decisions as well…  You just have to use only the James numbers in your decisions, or come up with a correction factor to bring them to nearer average.

If the other systems do a significantly better job of predicting league average offense, then they’re doing something better than the James system and the James system needs to be adjusted.

Maybe by some very simple adjustment, just tweaking the run scoring environment down a bit (though this might be done with components, and so it might NOT be so easy), etc, but it still should have it.

Perhaps the James system is amazing at predicting relative performance and in that sense is better…  But then we still need to scale it to match the real world averages.  And James et al could do that, and should, if they want it to be used for its actual numbers, as opposed to using only relative-to-average (OPS+, etc) type statistics when drawing from the James system.

Like I said, we do not lack for context.  We have one, simple, solid frame of reference, against which, ideally, we judge all projection systems.  If they spit out component numbers rather than relative-to-average numbers, then it seems to me they’re saying they want to be judged against the real world statistics, and not on a relative-to-average basis.
—-

This is not intended to give any offense – I think you get this – but I would put a different emphasis on the situation.  I very much sympathize with those who are uncomfortable with the James projections for this reason.

James
14 years ago

I’m not sure this article said anything new. But I am stunned that no one suggested taking the projections made by Bill James, CHONE, Marcels, et al. and comparing them to the actual stats of the players. Which service was closer to predicting what ARoid would hit last season? Or Jason Bay?

Andrew
14 years ago

Hi, Derek. I was just curious to know if you’ll be participating in any mock drafts anytime soon. Looking forward to seeing who you like for 2010.

Derek Carty
14 years ago

Yes, gtwma, the truth of the matter is, there is little difference between projection systems in terms of overall accuracy (including James).  It’s my assertion that the only substantial benefit to be gained (in fantasy baseball) from using one system over another is by using a system that is significantly different than the others.

TCQ
14 years ago

Nothing new to say here, but I’d like to give some props to Patrick…that was a really good extension of what I was trying to say (and what I think Toffer was saying as well).

Derek Carty
14 years ago

TCQ,
You’re right about that being a logical fallacy, although I don’t think I phrased what I said as carefully as I should have.  I think I was more directing it to Toffer’s comment of “if BJ’s projections are based on a league average different than MLB’s then they are simply wrong.”  I think that’s a little harsh since no system perfectly reflects MLB average.

Patrick,
You make some very valid points. I think all systems should try to reflect league average as closely as possible to make things easier for people. My ‘defense’ of them was simply meant for fantasy purposes where it doesn’t matter what the assumed league average is. I’ve actually been talking a little bit with a BIS analyst who works on the James projections, so I’ll be posting his explanation of some of these issues shortly.

James,
I’m quite sure what I said is nothing new.  It’s just that these principles don’t seem to be as widely known as they probably should be.  I’d be very surprised if no one else has ever made mention of this sort of thing, but it’s an important thing to stress, nonetheless.

As far as comparing projections to actual stats, figuring out which system is ‘best’ is a lot more complicated than it sounds, methodologically speaking. That’s a topic that would deserve a series of articles unto itself, just about the pitfalls of using different methodologies.

Derek Carty
14 years ago

Andrew,
I’ve only participated in one mock draft so far, the USA Today mock that will appear in their magazine.  I was going to be in Rotoworld’s magazine mock, but my hotel blocked Mock Draft Central so Paul Singman took my place.

I’ve participated in the official Mock Draft Central expert drafts the past couple years, so hopefully I’ll have the opportunity to do so again this year.  I believe I was in Draft #2 the past couple years (not sure if that means anything for this year).

I’ll keep you posted if I participate in any others.

Mike
14 years ago

A-Rod will hit over 40 HRs and will probably lead the league in HRs. He hit 30 last year despite missing the first month of the season and still building back his strength for the next three months after the surgery. He didn’t look comfortable until mid-August. 2010 will be the first year since 2007 that he’ll be healthy, as his hip started bothering him midway through 2008.

Bid high!

Toffer Peak
14 years ago

“TCQ – You’re right about that being a logical fallacy, although I don’t think I phrased what I said as carefully as I should have.  I think I was more directing it to Toffer’s comment of “if BJ’s projections are based on a league average different than MLB’s then they are simply wrong.”  I think that’s a little harsh since no system perfectly reflects MLB average.”

You’re right, that probably is a little harsh as no projection system is perfect. However I think it’s a pretty big flaw, particularly considering that it has been this way for multiple years.

I guess your main point is true enough that BJ’s system is just as good as others for creating an ordinal ranking of players (this was posted in the Fantasy section after all). But they probably shouldn’t be used if you’re trying to evaluate real players and real teams. After thinking about it for a while, I really think that these are just targeted at the average casual fan who wonders how a player “should” do, which pretty much means that it assumes a player will stay healthy and regress only a little, if at all, from their baseline.

Ed
14 years ago

Toffer I think you’re getting the point. I have used BJ for years (I play mostly Scoresheet though so his defensive evaluations are more important to me) to help make out a ranking list.

Who cares if Bill James predicts too many HRs for a given player? The issue isn’t whether player A hit as many HRs in real life as BJ predicted, but whether player A was predicted to hit more HRs than player B within the confines of the system. If you can reliably predict player A will be more productive than B, regardless of the unrealistic numbers within the system, then you’re going to draft well and win your league, and that’s really all we’re interested in.

So, in that case the BJ system isn’t really for the casual fan, (keeping in mind the casual fan is much more interested in seeing “real” numbers without having to make the conversion within their heads) but for fans who are able to look beyond the numbers and measure the relative production within the projection system.