The cool thing about looking toward the future is that we don’t really know what it holds. We can fantasize about peace in the Middle East or the Pirates making the playoffs because we can never predict what will happen. And yet, in any area, whether it be political relations or baseball, there are people trying to do just that.
Since The Hardball Times is not a political website, we’ll leave that alone and focus on baseball instead. Baseball projections are a multi-million-dollar business in the U.S., with huge demand for fantasy information. Everyone from fantasy baseball players to major league teams relies on getting the best and most accurate projections to win. And yet there’s still so much to be discovered and understood about how best to predict the future.
To help facilitate some discussion of this issue, I gathered most of the best forecasters in the business for a round table that will run Monday through Friday in five parts. Besides me (David Gassko), the participants were:
Chris Constancio, who writes a weekly column for The Hardball Times and publishes his projections for minor leaguers at FirstInning.com.
Mitchel Lichtman, who is a co-author of The Book – Playing the Percentages in Baseball. He has been doing sabermetric research for almost 20 years, and for two years was the senior advisor for the St. Louis Cardinals.
Voros McCracken, who is most famous for his invention of Defensive Independent Pitching Statistics (DIPS), and is a former consultant for the Boston Red Sox.
Dan Szymborski, who is Editor-in-Chief of Baseball Think Factory, and publishes the ZiPS projection system.
Tom Tango, who co-authored a baseball strategies book called The Book – Playing the Percentages in Baseball, which is available at Inside The Book.
Ken Warren, who started doing player projections and studying baseball statistics in 1992, when he was totally unsatisfied with the projections that STATS was publishing in their annual handbook, and Bill James’ claim that he couldn’t or wouldn’t project pitchers until they had 500 major league innings. Since then he has been continually adding little bits of sophistication and complexity to his system, and has noticed small improvements in accuracy over the years.
David Gassko: OK, last question. Seriously, we’re almost done. Where do you think we are in the baseball statistics community, in terms of our projection methods? Do we still have a ways to go, or are we close to maxing out what we can do with our projections? And what exactly is it that we need to do or figure out before we reach the point of diminishing returns (assuming we aren’t there yet)?
Ken Warren: It seems that we all have different methodologies and philosophies, so I suspect that there is lots of room for improvement. The next step in my mind is to compare projections created by different systems and methods, both before and after we have actual results, to see if one method or system actually outperforms the others. For example, after the projections are done we could compare two systems by grouping players according to the difference in their projected OPS: .050 or more, .030 to .049, .020 to .029, and less than .020. Ignore the players where the delta is less than .020 and then look for patterns in the other three categories: age, position, specific skills or lack thereof. We could use a delta of .020 for xERA and do the same sort of thing. Then, take the actual 2007 results, adjust them for luck factors (BABIP and LOB%), and compare which system did best on the projections where there was significant variation. Possible ideas: which system was more accurate on the most deltas, or a comparison of average error on the players where there was a significant delta. Or maybe find some combination of ideas that will lead to the best results of all. We would need to know the basic assumptions each projection system was making in coming up with its projected values: aging trends, similarity scores (the assumptions and how they are incorporated). Probably after a few seasons of data and actual results, we would start to learn a lot about what is working better and what is not working so well, and maybe be able to develop a comprehensive projection system.
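Ken’s head-to-head comparison can be sketched in a few lines. The version below is a minimal illustration, not any participant’s actual procedure: the function names are hypothetical, the bucket edges follow the OPS ranges he lists, and the “actual” values are assumed to have already been luck-adjusted (the BABIP/LOB% adjustment itself is not shown).

```python
from statistics import mean


def delta_bucket(delta: float) -> str:
    """Assign the absolute OPS difference between two systems to a bucket."""
    d = abs(delta)
    if d >= 0.050:
        return ".050+"
    if d >= 0.030:
        return ".030-.049"
    if d >= 0.020:
        return ".020-.029"
    return "<.020"


def compare_systems(proj_a, proj_b, actual):
    """proj_a, proj_b, actual: dicts mapping player -> OPS.
    `actual` is assumed to be luck-adjusted already (hypothetical input).
    Returns, per bucket, how often each system was closer and each system's
    mean absolute error."""
    raw = {}
    for player in proj_a.keys() & proj_b.keys() & actual.keys():
        bucket = delta_bucket(proj_a[player] - proj_b[player])
        if bucket == "<.020":
            continue  # the systems basically agree; ignore these players
        err_a = abs(proj_a[player] - actual[player])
        err_b = abs(proj_b[player] - actual[player])
        s = raw.setdefault(bucket, {"a_wins": 0, "b_wins": 0, "err_a": [], "err_b": []})
        if err_a < err_b:
            s["a_wins"] += 1
        elif err_b < err_a:
            s["b_wins"] += 1
        s["err_a"].append(err_a)
        s["err_b"].append(err_b)
    return {
        bucket: {
            "a_wins": s["a_wins"],
            "b_wins": s["b_wins"],
            "mae_a": mean(s["err_a"]),
            "mae_b": mean(s["err_b"]),
        }
        for bucket, s in raw.items()
    }
```

Looking for patterns by age, position, or skill would then mean splitting each bucket further by those attributes before summarizing.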
Mitchel Lichtman: As far as where we are and where we are going (a standard question in sabermetrics), good projections are not that hard to do, as evidenced by Marcel the Monkey (Tom Tango’s deliberately simple projection system), so I think we are, and have been for a while, at the point of diminishing returns, practically speaking. I have been improving my projection system for almost 20 years, and I think it may have improved by 10% over those 20 years.
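To show why a “monkey” can do reasonably well, here is a Marcel-style sketch: weight the last three seasons, regress toward the league average, and apply a small age adjustment. The specific constants (5/4/3 season weights, 1,200 PA of league-average regression, peak age 29, +0.006/−0.003 per year) are assumptions modeled on Tango’s public description of Marcel and should be checked against it; `marcel_rate` is a hypothetical name.

```python
# Assumed Marcel-style constants; verify against Tango's own description.
SEASON_WEIGHTS = (5, 4, 3)   # most recent season first
REGRESSION_PA = 1200         # PA of league-average performance mixed in
PEAK_AGE = 29


def marcel_rate(events, pa, league_rates, age):
    """Project a per-PA rate (e.g. HR/PA) from the last three seasons.

    events, pa, league_rates: sequences ordered most recent season first.
    Assumes the player has at least some PA in the window."""
    weighted_events = sum(w * e for w, e in zip(SEASON_WEIGHTS, events))
    weighted_pa = sum(w * p for w, p in zip(SEASON_WEIGHTS, pa))
    # League rate weighted the same way as the player's playing time
    league_rate = sum(
        w * p * r for w, p, r in zip(SEASON_WEIGHTS, pa, league_rates)
    ) / weighted_pa
    # Regress by adding REGRESSION_PA of league-average performance
    rate = (weighted_events + REGRESSION_PA * league_rate) / (weighted_pa + REGRESSION_PA)
    # Linear age adjustment around the assumed peak age
    if age < PEAK_AGE:
        rate *= 1 + 0.006 * (PEAK_AGE - age)
    else:
        rate *= 1 - 0.003 * (age - PEAK_AGE)
    return rate
```

A hitter with a .080 HR/PA rate over three full seasons at age 29 comes out around .076, pulled partway back toward a .050 league average.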
By the way, as is often the case when you have lots of smart people working on a particular problem and using different methodologies, there is a lot of collective wisdom. In other words, a combination of projections from everyone who is competent is likely to be much better than any one individual’s.
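The “collective wisdom” point can be illustrated with a plain average of several systems. This is a hypothetical sketch (`consensus` and `rmse` are illustrative names, and the numbers are made up): when systems miss in different directions, the average can beat every individual system. By convexity, its squared error can never exceed the average of the individual systems’ squared errors, although it is not guaranteed to beat the single best system.

```python
from statistics import mean


def consensus(projections):
    """projections: list of dicts (one per system) mapping player -> value.
    Returns the simple average for players that every system projects."""
    common = set.intersection(*(set(p) for p in projections))
    return {player: mean(p[player] for p in projections) for player in common}


def rmse(proj, actual):
    """Root-mean-square error of a projection dict against actual results."""
    return mean((proj[k] - actual[k]) ** 2 for k in proj) ** 0.5


# Hypothetical OPS projections from two systems that miss in opposite directions
actual = {"A": 0.800, "B": 0.700}
sys1 = {"A": 0.830, "B": 0.690}
sys2 = {"A": 0.780, "B": 0.720}
combined = consensus([sys1, sys2])
```

Here `combined` misses each player by only .005, while the two individual systems have RMSEs of about .022 and .020.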
Chris Constancio: Any good similarity system should really start with a more traditional “regression based” projection, so even if someone “nails” the regression-based system, that doesn’t mean there isn’t room to enhance projections with other techniques.
I think the similarity systems are one way—not the only way or even necessarily the best way—to address heterogeneity of development. I think my first e-mail message identified this as one of the main areas of weakness in most current projection systems. As in most areas of life, not everyone who plays baseball professionally develops in the same way at the same age. Instead of treating this as “noise”, we should really focus on different paths of development and try to understand what they mean. We lose a lot of information when we aggregate aging patterns to model a fictional “average” player’s development.
I might be over-simplifying what most people do when projecting ballplayers’ performances. If anyone feels like they already address this problem or have ideas about how to do so, I would be very interested in reading your thoughts.
Finally, I just want to echo the “only so much you can do” criticism of similarity systems. One perceived advantage of similarity systems is that they are better at handling “unusual” players. This may be true to a certain extent, but in some cases it’s just a feel-good claim and the similar player info doesn’t really add much to the projection. It was really difficult to find comparable pitchers to Felix Hernandez at age 19, so the similar player info really did not end up getting much weight. In that case, there should be no significant difference between a good similarity system’s projection and a good regression-based system’s projection.
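Chris’s point that a similarity system should start from a regression-based projection, and that sparse comparables (as with Felix Hernandez) should get little weight, can be sketched as a blend whose weight depends on the total similarity available. The weighting rule and the constant `k` are hypothetical choices for illustration, not a description of any participant’s system.

```python
def similarity_blend(regression_proj, comps, k=5.0):
    """Blend a regression-based projection with a comparable-player projection.

    regression_proj: projection from a regression-based system.
    comps: list of (similarity, next_season_value) pairs, similarity in [0, 1].
    k: hypothetical constant; the comps get half the weight once the total
       similarity reaches k."""
    total_sim = sum(s for s, _ in comps)
    if total_sim == 0:
        return regression_proj  # no usable comparables: rely on regression only
    comp_proj = sum(s * v for s, v in comps) / total_sim
    w = total_sim / (total_sim + k)  # weight on the comparable-player projection
    return w * comp_proj + (1 - w) * regression_proj


# Hypothetical ERA example: many close comps vs. one weak comp
many = similarity_blend(3.50, [(0.9, 3.00)] * 20)  # about 3.11
few = similarity_blend(3.50, [(0.2, 3.00)])        # about 3.48
```

With only one weak comparable, the result barely moves from the regression projection, which is the behavior Chris describes.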
Voros McCracken: I think the possibility of “maxing out” projections is really the wrong point of focus. I’ve always argued that teams that don’t use reasonable statistical projections for players are just being stubborn and foolish. It’s not that projections are perfect, but clubs absolutely must have some reasonable expectation of what the various players can be expected to do before they can make sound decisions on what moves to make.
As for how projections can be improved, I think the biggest area for improvement is how to deal with changes in performance from one year to the next. The problem is distinguishing real changes in ability from statistical flukery. But even if we work really hard, it’s worth remembering that we’re not projecting something like a child’s future height or the weather. The results we’re comparing our projections to are not quite what we’re really after. They are merely insights into the true ability levels of the player in question, not the ability levels themselves. “Thirty-two home runs” isn’t some abstract rating describing Jason Bay’s power hitting ability in 2005; it is simply his home run total. This allows you to say something amusing like, “My projection was 100% accurate; it was the results that were all wrong.”
So because of the inherent randomness of the results, the diminishing returns of improving projection methods that Tom brought up come into play. Still, it may wind up being a case where the new methods don’t help you at all for 149 players but help you immensely for the 150th. In an environment where being able to identify over- and under-valued players is so important, I think further refinements can still be useful in individual cases, even if on the whole they don’t do much.
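Voros’s distinction between ability and results can be made concrete with a simple binomial model. This sketch assumes, for simplicity, that every plate appearance is an independent trial with a fixed true home run rate. Under that assumption it gives the approximate 95% range of season totals. The rate and PA figures are hypothetical and are not Jason Bay’s actual numbers.

```python
import math


def hr_total_range(true_hr_rate, pa, z=1.96):
    """Approximate (low, expected, high) season HR totals for a hitter whose
    true per-PA home run rate is fixed, using a binomial model of each PA."""
    expected = true_hr_rate * pa
    sd = math.sqrt(pa * true_hr_rate * (1 - true_hr_rate))
    return expected - z * sd, expected, expected + z * sd


low, expected, high = hr_total_range(0.05, 640)
# expected = 32 HR, but the range runs from roughly 21 to 43
```

So a projection that captured a hitter’s true ability exactly could still be “off” by ten homers in either direction without anything being wrong with the projection.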
Chris Constancio: Well, we’re certainly at the point of diminishing returns in terms of projecting the performance of players who have enough experience to earn lots of guaranteed money in MLB. And that’s what organizations should be most concerned with.
There are a few areas in which there’s still a lot of work to be done, however. Here are three I’ll throw out there:
1) Long-term projections: I think lots of people are good at projecting results for the upcoming season, but we’re in our infancy of understanding player development. There are not many 3- to 5-year projections out there and we don’t know how good they are.
2) Estimating ranges of ability and performance: Not only do we need to convey the range of expected results given an ability level (as Voros nicely described), but we need to better estimate the range of ability level we can expect from a player in the short-term and long-term. Confidence intervals are a good thing.
3) Projecting performances for very young players: Nobody is any good at this right now and future prospects are limited, but that doesn’t mean we can’t use some performance and scouting information for teenagers to get some better-than-random estimates.
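For item 2, one standard way to put a range on ability rather than results is a Beta-binomial model: start from a prior centered on some expected rate and update it with the player’s record. The sketch below uses a normal approximation to the Beta posterior. The prior mean and “prior strength” values are hypothetical inputs for illustration, not figures from the round table.

```python
import math


def ability_interval(successes, trials, prior_mean, prior_strength, z=1.96):
    """Approximate interval for a player's true rate (e.g. OBP).

    The prior is a Beta distribution centered on prior_mean and worth
    prior_strength trials; it is updated with the observed record.
    Returns (low, posterior_mean, high) using a normal approximation."""
    a = prior_mean * prior_strength + successes
    b = (1 - prior_mean) * prior_strength + (trials - successes)
    posterior_mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return posterior_mean - z * sd, posterior_mean, posterior_mean + z * sd


# Hypothetical: a .400 OBP in 100 PA vs. the same OBP in 3,000 PA
rookie = ability_interval(40, 100, 0.330, 300)
veteran = ability_interval(1200, 3000, 0.330, 300)
```

The rookie’s estimate is pulled much closer to the prior and comes with an interval nearly three times as wide as the veteran’s, which is the kind of information a single point projection hides.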
Ken Warren: But instead of using “home runs”, if we used “isolated power” then we would have something that actually measures Jason Bay’s current skill level, and doesn’t have nearly the same amount of randomness or luck. We need to focus on stats that measure skills rather than traditional baseball stats that measure results and are subject to a lot of randomness and luck.
ERA, hits per inning, batting average on balls in play (BABIP) (for both hitters and pitchers), batting average, home runs, and RBIs are not measurable skills.
Skills can be measured as follows.
Plate discipline: BB/PA or OBP-AVG
Contact rate: (AB-K)/AB
ISO: SLG – AVG
Strikeout rate: K/IP
Control rate: BB/IP
Home run rate: HR/Outfield Fly
xERA: estimates what a pitcher’s ERA would be with a league-average BABIP and a league-average strand rate, adjusted for his OBP against. For example, pitchers who allow fewer baserunners would naturally be expected to have higher strand rates.
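The non-proprietary skill measures in Ken’s list are simple ratios, so they can be computed directly from a stat line. The sketch below uses hypothetical names and made-up numbers. It computes innings from outs recorded to avoid the 6.1/6.2 notation. xERA is left out because its exact formula is BaseballHQ’s.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    pa: int
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int
    k: int


def hitter_skills(line: BattingLine) -> dict:
    """Skill-oriented rates for a hitter, following the list in the text."""
    avg = line.h / line.ab
    obp = (line.h + line.bb + line.hbp) / (line.ab + line.bb + line.hbp + line.sf)
    total_bases = line.h + line.doubles + 2 * line.triples + 3 * line.hr
    slg = total_bases / line.ab
    return {
        "bb_per_pa": line.bb / line.pa,             # plate discipline
        "obp_minus_avg": obp - avg,                 # plate discipline, alternative
        "contact_rate": (line.ab - line.k) / line.ab,
        "iso": slg - avg,                           # isolated power
    }


def pitcher_skills(outs: int, k: int, bb: int, hr: int, outfield_flies: int) -> dict:
    """Strikeout, control, and home run rates for a pitcher."""
    ip = outs / 3  # innings pitched computed from outs recorded
    return {
        "k_per_ip": k / ip,
        "bb_per_ip": bb / ip,
        "hr_per_of_fly": hr / outfield_flies,
    }


hitter = hitter_skills(BattingLine(pa=650, ab=570, h=160, doubles=35, triples=3,
                                   hr=30, bb=70, hbp=5, sf=5, k=120))
pitcher = pitcher_skills(outs=600, k=180, bb=60, hr=20, outfield_flies=200)
```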
David Gassko: I agree, Voros and Ken, that what’s more important—and what we’re really looking for—is measuring “skill” and not necessarily what a player will actually do (though on a macro level, those things will even out and basically be one and the same). But Ken, though your projections obviously do quite well, I disagree with how you choose to define skill. BABIP, especially for hitters, is most certainly a skill. And something like Isolated Power isn’t quite skill either. ISO is still a combination of extra-base hits, and those too are prone to randomness and luck. Either we need to look at the most granular data possible (distance per fly ball), or we need to admit that randomness and luck affect all statistics and still try to adjust for even the smallest effects.
Ken Warren: I guess I should have been more thorough. I assumed that this sort of stuff is pretty well-known by this group.
Every hitter has a career-norm BABIP. It is the variations from this that are luck-related and not indicative of a new skill level. You can then plug this value into the batting average formula, batting average = hit % × contact rate, to come up with a player’s xBA. Any variation between this xBA and his actual batting average is not skill but luck.
In summary, you are right that BABIP is a skill for hitters, but we do know what a player’s BABIP should be based on his prior history. Fluctuations from this norm do not represent a change in a player’s skill level. If a player has increased his skill level by hitting more line drives or for more power, this will be reflected in the xBA formula.
I do take all this into account in my projections. I just didn’t explain in enough detail. Sorry.
My experience is that changes in a player’s ISO are sustained and do reflect a change in skill level. Any randomness in ISO is minimal compared to the randomness in such stats as AVG, ERA, RBI, etc.
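Ken’s idea of substituting a career-norm BABIP can be sketched with an exact identity for batting average: hits equal BABIP times balls in play (AB − K − HR + SF), plus home runs. This is a hypothetical stand-in that uses the standard BABIP definition in place of Ken’s “hit %” and is not BaseballHQ’s proprietary xBA formula. The function names and numbers are illustrative.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)


def expected_avg(career_babip, hr, ab, k, sf):
    """Batting average the hitter would have posted this season if his balls
    in play had fallen in at his career-norm BABIP (hypothetical sketch)."""
    balls_in_play = ab - k - hr + sf
    expected_hits = career_babip * balls_in_play + hr
    return expected_hits / ab


# Hypothetical season: .281 AVG on a .306 BABIP, against a career .290 BABIP
season_avg = 160 / 570
x_avg = expected_avg(0.290, hr=30, ab=570, k=120, sf=5)  # about .269
luck = season_avg - x_avg  # the part Ken would attribute to luck
```

Because the contact and power terms come from the current season, a real gain in contact rate or home runs still shows up in the expected average. Only the balls-in-play component is pulled back to the career norm.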
David Gassko: In that case, I agree that you can probably ignore actual BABIP for hitters. You might lose a little information, but not too much.
Tom Tango: What we are always after is the underlying true skill of the player. “Home runs” are not a skill, but a result. This is where scouting, injury information, and batting/pitching approach come into play.
Because of the huge amount of variation in the data, it’s important to be able to distinguish between Brady Anderson’s 50 home runs and Sammy Sosa’s 50 home runs. What we care about is how a player accomplished it, and how far from his true mean those results were. The more real information we have about a player, the better we can regress his sample data to his true talent.
We have very little way to go here for experienced hitters. We have some way to go with pitchers, and we have a long way to go with new players.
I’d get a team to spend as much money on scouting as possible, because scouts hold the key: they can close their eyes to the numbers and provide me with the necessary data.
David Gassko: So what you’re saying then is that the key to improving projections down the road is better scouting info?
Mitchel Lichtman: I would certainly agree with that. Theoretically, if we had perfect scouting information we would not need any data to project a player’s performance. On the flip side, when we have large amounts of data, we don’t really need any scouting info, other than that which might clue us in as to whether (and why/how) a player’s talent has changed. Bottom line is that for young players and prospects, scouting is key, and for experienced players, scouting is not very important at all, although definitely useful. The question then is what is more important in the grand scheme of things, projecting veteran player performance (for trades, salary, lineup construction, etc.), or projecting prospects and young players.
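Mitchel’s trade-off between scouting and data can be expressed as shrinkage toward a scouting-based estimate, with the weight on the player’s own numbers growing as his sample grows. In this sketch, `scouting_worth` is a hypothetical parameter: the number of PA a scouting report is treated as being “worth.” It must be positive. Nothing in the round table specifies that value.

```python
def blend_scouting(scout_estimate, observed_rate, sample_size, scouting_worth):
    """Shrink an observed rate toward a scouting-based estimate.

    scouting_worth: hypothetical number of PA the scouting report is worth
    (must be > 0). Returns (blended estimate, weight on the observed data)."""
    w_data = sample_size / (sample_size + scouting_worth)
    return w_data * observed_rate + (1 - w_data) * scout_estimate, w_data


# Hypothetical: scouts see a .300 talent; the player has posted .400
prospect, w_prospect = blend_scouting(0.300, 0.400, 100, 600)    # data weight ~0.14
veteran, w_veteran = blend_scouting(0.300, 0.400, 3000, 600)     # data weight ~0.83
```

With 100 PA the scouting view dominates, while with 3,000 PA it barely matters. This mirrors the point that scouting is key for prospects and much less important for established players.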
References & Resources
xBA, xERA, xOBA and xSLG, speed rating and “contact rate” are all proprietary statistics developed at BaseballHQ.