The cool thing about looking forward towards the future is that we don’t really know what it holds. We can fantasize about peace in the Middle East or the Pirates making the playoffs because we can never predict what will happen. And yet, in any area, whether it be political relations or baseball, there are people trying to do just that.
Since The Hardball Times is not a political website, we’ll leave that alone and instead let’s focus on baseball. Baseball projections are a multi-million dollar business in the U.S., with huge demand for fantasy information. Everyone from fantasy baseball players to major league teams relies on getting the best and most accurate projections to win. And yet there’s still so much to be discovered and understood about how to best predict the future.
To help facilitate some discussion of this issue, I gathered most of the best forecasters in the business for a round table that will run Monday through Friday in five parts. Besides me (David Gassko), the participants were:
Chris Constancio, who writes a weekly column for The Hardball Times and publishes his projections for minor leaguers at FirstInning.com.
Mitchel Lichtman, who is a co-author of The Book – Playing the Percentages in Baseball. He has been doing sabermetric research for almost 20 years, and for two years was the senior advisor for the St. Louis Cardinals.
Voros McCracken, who is most famous for his invention of Defensive Independent Pitching Statistics (DIPS), and is a former consultant for the Boston Red Sox.
Dan Szymborski, who is Editor-in-Chief of Baseball Think Factory, and publishes the ZiPS projection system.
Tom Tango, who co-authored a baseball strategies book called The Book – Playing The Percentages In Baseball, which is available at Inside The Book.
Ken Warren, who started doing player projections and studying baseball statistics in 1992, when he was totally unsatisfied with the projections that STATS was publishing in their annual handbook, and Bill James’ claim that he couldn’t or wouldn’t project pitchers until they had 500 major league innings. Since then he has been continually adding little bits of sophistication and complexity to his system, and has noticed small improvements in accuracy over the years.
David Gassko: Back to something that was touched upon a little bit earlier. There’s a lot of talk about eking out a couple percentage points of accuracy in projection systems, but what frontiers provide the greatest potential for improvement? Where can a projection system get ahead by more than just a few decimal points?
Dan Szymborski: Getting an edge up on the competition is very difficult. The projectors that will do the best in this regard are simply the people that do the best job of applying new baseball research into their system. That being said, the differences are unlikely to ever be large. We may have more work left to do in determining which projections are most likely to be reliable rather than actually making projections more reliable. Error bars aren’t as sexy as determining which pitcher will have an ERA three runs worse next year, but they’re important for any kind of cost-benefit analysis.
Ken Warren: Once a good projection system has been developed, I think that the only improvements that you will be able to make are going to be very small. Like anything else, big improvements usually come in a series of very small steps.
One such step is trying to incorporate good ideas from other systems that you are not currently using.
Since there is such a high luck factor in individual performance stats it’s not clear that there is a lot of improvement actually possible. To measure the accuracy of projections I think we need to normalize actual performance as much as possible.
For pitchers the parameters that I think we should focus on are: BB, K, K/BB, & HR rate.
For hitters we should emphasize OBP-AVG or BB/PA (plate discipline), SLG-AVG or ISO (power), HR, 1 – K/AB (contact rate), BB/K (batting eye). I believe that it is possible to project these skills very accurately for healthy players using aging patterns for various skills. To convert these projections to OBA, SLG, and OPS we can simply plug in xBA instead of BA.
Mitchel Lichtman: What a lot of people don’t understand is that a projection system estimates a player’s true talent, or his performance rates if he were to have an infinite number of plate appearances or batters faced. A player’s performance in the next month or year or whatever is just a sample of that talent and is necessarily and inevitably subject to sample error. That is why Tango says that even a perfect system (one that would tell us exactly what a player’s true rates are) has a limited correlation coefficient (or whatever you want to use to measure accuracy) for one year of performance data of around .7 (or whatever it is).
Add to that the fact that a player’s true talent can change at any given time, he can get injured (or stay very healthy), etc., all of these things essentially unpredictable, and what we find is that it is impossible to project “performance” per se to the level that some people (erroneously) expect. I don’t know if I am getting my point across, but what I mean is that when we look at one year of performance and compare that to our projections, there is so much random fluctuation in the one year of data that a portion of our projections are necessarily going to look horrible and there is nothing you can do about that (of course).
That being said, I don’t think that there is a “holy grail” frontier that will squeeze out those few extra percentage points. It is the sum of all of the things that we have been talking about throughout this discussion. Certainly, as Tango said, the TLV pitch data is a nice source of wonderful data for improving projections, among other things. The batted ball work being done and utilized by some researchers for projections is excellent. Minor league analysis can be improved for young players.
Use of park factors can be improved, as we’ve discussed. A better understanding of how player talent develops is something that can be done in the future. Injury projections. Better understanding of how aging varies and differs among different players and player types. And of course incorporating character and other intangibles into our projections systems. I am working with a few unnamed sports journalists on that.
David Gassko: OK, so we’ve got two diverging opinions here. Ken is saying that there isn’t much we can do besides measure the very basic components of skill. Mitchel is saying that there are still things that can have a relatively strong impact on the accuracy and utility of a projection system. Looking at what he cites, a short list is as follows:
a) Batted Ball Data
b) Injury Data
c) Individualized Aging Patterns
d) Psychological Factors
e) Pitch-by-Pitch Data.
It seems to me that we should add scouting information into that mix (I’ve found it improves projections remarkably) as well. Is there anything else we’re missing? And is this all really important, or do these little things all add up to a hill of beans?
Ken Warren: I thought I was saying pretty much the same thing as Mitchel, in this post and my previous posts.
a) Batted Ball Data: This is the basis for eliminating luck from prior performance stats, and I agree that it is critical.
b) Injury Data: This is how we know whether we are dealing with a healthy player or not.
c) Individualized Aging Patterns: Aging patterns vary based on player skills. For example a speed player peaks at an earlier age than a power hitter, who in turn peaks at an earlier age than a strike zone control specialist. If we project each skill independently from each other we will automatically be taking into account how different types of players age differently.
d) Psychological Factors: Not sure this is practical. I doubt that we can even predict how psychological factors will affect our own performance, and these types of factors are not reliably available for many players. So how can we start using these factors when projecting baseball players? I’m quite sure that all of Barry Bonds’ potential legal problems would affect other players totally differently than they affect his performance.
It seems to me that we should add scouting information into that mix (I’ve found it improves projections remarkably).
I’m not sure what this means or how it improves projections. If Troy Glaus is known to scouts as a hitter who likes sliders on the outside of the plate, can I use that information in doing his projection? If so, how?
Chris Constancio: One more idea I’ll throw into the mix: We need better estimates of the error surrounding an individual prediction. No single set of numbers will ever exceed .8 correlation with actual results on a consistent basis, but estimates of variance around a prediction can be quite useful for a number of purposes. I know Marcels now includes a reliability score and that’s a step in the right direction. Do others have ideas about how to handle this problem?
Individualized Aging Patterns: Aging patterns vary based on player skills. For example a speed player peaks at an earlier age than a power hitter, who in turn peaks at an earlier age than a strike zone control specialist. If we project each skill independently from each other we will automatically be taking into account how different types of players age differently.
I’m not sure what you mean here. Or rather, I think I know what you mean and I don’t agree. When I think of individual aging patterns, I don’t think what you’re suggesting really solves the problem. To take a basic example, would you project a 30-something-year-old catcher’s power to decline in the same way you would project a 30-something-year-old right fielder?
David Gassko: When I talk about scouting information, I mean things like pitcher stuff. I looked at it here, and found that a pitcher’s future ERA was 60% past performance, and 40% stuff. For example, I expect Jeff Francoeur to be a better hitter than other 22-year-olds with a sub-.300 OBP (Melky Cabrera?), because he looks good. He isn’t a good hitter yet, but I would project him to improve more steeply than your average 22-year-old. This sort of ties into Mitchel’s use of “character” in his projections: things that scouts can see and a stat line does not include which might tell us about a player’s future performance.
Mitchel Lichtman: They all add up to more than a hill of beans. And yes, I agree that scouting (and draft) info can be important for young players. If nothing else, the mean that a projection system regresses toward should be the mean of players from similar draft rounds, scouting assessment, etc. For young players that is. For established players, I hate to say it, but scouting info/traditional evaluation is worthless, except perhaps for older players who may or may not be losing their skills or for players who may be injured or coming off an injury-plagued season or two. For established pitchers, maybe (and I mean maybe), some scouting info might be helpful.
I was kidding about the character/psychological thing, by the way. Not that I discount it, but it would be awfully hard to put into a projection system; except for Jeff Weaver of course.
Ken Warren: Chris, yes, I would. Is this wrong?
I’ve heard people say that second basemen don’t age very well. Then I see Craig Biggio and Jeff Kent still going strong. Carlton Fisk and Mickey Tettleton were power-hitting catchers who did well at older ages. Mike Piazza is doing great this year, playing in a very pitcher-friendly ballpark.
Is there any evidence that hitters age/peak differently based on the position they play? I don’t recall any of the studies that took a look at a player’s peak performance years finding differences across positions.
Mitchel Lichtman: Tom usually has good ideas about confidence intervals surrounding projections, or whatever you want to call them. And PECOTA is “famous” for them of course. A system like PECOTA, that primarily uses similarity scores of other players (among other things I think) for their projections, is a natural for incorporating those confidence bands. For example, if 50 players are similar to player A and 10 of them exceed player A’s projection by 10%, they can say that there is a 20% chance that player A will exceed his projection by 10%, etc. (I am simplifying things). I’m not sure how reliable those confidence bands are, however. The sample sizes of those similar players are just too small.
To me, the less information you have on a player, the less reliable the projection is. Period. If we know nothing about a player, we simply project him as league average and our confidence band is exactly equal to the variance of talent among major league players. Period. If we have an infinite amount of historical data on a player, then our projection is spot on, with a zero width confidence band (more or less). Everyone else (which is all of our players of course) is in between.
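The simplified comparable-player calculation described above can be sketched in a few lines. This is only an illustration of the counting argument, not PECOTA's actual method; the function name, the sample OPS values, and the 10% margin are assumptions made up for the example. The standard error on the estimated probability shows why a pool of 50 comparables gives a fairly loose answer.

```python
import math

def exceed_probability(comparable_outcomes, projection, margin=0.10):
    """Share of comparable players who beat the projection by at least
    `margin`, plus the binomial standard error of that share."""
    threshold = projection * (1 + margin)
    n = len(comparable_outcomes)
    hits = sum(1 for outcome in comparable_outcomes if outcome >= threshold)
    p = hits / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, standard_error

# Hypothetical pool: 50 comparables, 10 of whom beat a .800 OPS projection by 10%+.
comparables = [0.900] * 10 + [0.780] * 40
p, se = exceed_probability(comparables, projection=0.800)
print(f"chance of beating projection by 10%: {p:.2f} (standard error {se:.3f})")
```

With only 50 comparables, the standard error on that 20% figure is between five and six percentage points, which is one way to quantify the small-sample concern.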
Keep in mind (and this is an important point, or distinction if you will), that the confidence band surrounding our estimate of a player’s true talent is completely different from the confidence band surrounding a player’s expected performance in some time period, usually a full season. For example, if we had that perfect projection for a player who has an infinite amount of historical data (and his true talent never has or never will change), we have a zero width confidence band around that projection, but there is a non-zero width band around our estimate of that player’s performance over the next year.
Of course that confidence band is a mathematical given. If we have a perfect projection on a player’s batting average, say .280, plus or minus no batting average points, if we are going to compare that projection with his actual performance in the next year, say over 600 at-bats, then we would expect that player to hit exactly .280 plus or minus 40 points in batting average, at two standard deviations. Those 40 points are given by the binomial approximation of batting average in 600 at-bats. So there are essentially two very different “confidence bands” or “intervals” if you will.
One is trivial (the random variance around the actual performance over some finite time period, usually one season), and the other depends on the variables going into your projection system (age of the player, amount of historical data, injury history and chance of future injury, etc.).
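The "trivial" band can be worked out directly. The sketch below assumes each at-bat is an independent trial with a constant .280 chance of a hit, which is the binomial approximation mentioned above; the function name is illustrative.

```python
import math

def binomial_sd(rate, trials):
    """Standard deviation of an observed rate over `trials` independent chances."""
    return math.sqrt(rate * (1 - rate) / trials)

true_average = 0.280
at_bats = 600

sd = binomial_sd(true_average, at_bats)
band = 2 * sd  # two standard deviations

print(f"one SD: {sd * 1000:.1f} points, two SD: {band * 1000:.1f} points")
print(f"range: {true_average - band:.3f} to {true_average + band:.3f}")
```

Two standard deviations come to roughly 37 points of batting average, which is the "plus or minus 40 points" quoted above once rounded.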
I agree with what David says about scouting information, although I was kidding about the character thing.
I think there is evidence that players at different positions age differently, but the “real” variable might be speed (or size) or something like that. In my research, the only thing I found was that faster players aged better than slower ones. Anything else I tried came up zilch. I have heard it said that players with only speed skills do not age well (after their speed goes they have nothing else). I have not confirmed this though.
Ken Warren: My studies indicate that speed peaks at age 24 (much earlier than any other skill), so it makes a lot of sense that “speed only” players will peak early and then fizzle out. Terrence Long, Roger Cedeno, and Juan Pierre are a few guys that come to mind.
What I get from David’s article is that past ERA is a poor indicator of future ERA and/or past performance, which we already know. If we used xERA, developed by Baseball HQ, we would get a lot higher coefficient than 60% for “past performance” I would imagine.
As for Francoeur and Cabrera:
Minor League Equivalent (MLE) OPS
Francoeur has been clearly superior (statistically speaking) for all of the past three seasons. I’m not sure how scouting reports would help with doing a 2006 projection. Francoeur may not be “a good hitter yet,” but he has been a lot better than Melky Cabrera.
Mitchel Lichtman: Just because speed peaks early (and I am not sure it peaks at age 24; in fact, I am pretty sure that it peaks earlier than that, but we don’t have enough major league data to know exactly when) does not mean that players with speed only (whatever that means) will “fizzle out” early. For whatever reason, players with speed, even with “speed only,” may retain all of their skills better as they age, than players who are not fast. Again, I don’t know whether it is true or not that players with “speed only” (again, there has to be a specific definition for that) do not age well, but by no means is it self-evident.
I’m not sure where these MLE OPS numbers come from and whether they are normalized to a major league OPS, but for 2005, I have Francoeur at .673 and Cabrera at .628. There seems to be a wide variety of MLEs depending on the source. Part of the reason is that the park and league factors are so important and so variable for the minor leagues. Another reason is the data.
Are intentional walks included in OBP (they should not be)? Are HBP considered a walk (they should be)? Are sac bunts included as anything (they should not be)? Are sac flies counted as a regular out (they should be)? Not to mention, how are the MLEs done (what coefficients are used? Are they applied linearly, etc.)? I have seen MLEs all over the board and frankly I don’t trust too many of them (other than my own of course).
Again, I think that scouting reports (good ones at least) are important for young players, as we have to regress their sample data towards something. If we simply regress it towards that of an average rookie or something like that, that can’t be correct when we can regress it toward the mean of rookies who are highly touted (or not), or first round draft picks, or whatever. Regression should always be toward the mean of the smallest group you think the player is similar to, assuming that that group has a different mean than the larger group it comes from. For example, big, first-round, lefty batters probably have different means than small (say a shortstop or second baseman), fifth-round righties.
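To see why the choice of group matters, here is a minimal sketch of regression toward a group mean. The weighting form (observed sample weighted by its size, group mean weighted by a constant) is one common way to express the idea and is not taken from this discussion; the constant `k`, the OBP figures, and the group means are all hypothetical.

```python
def regress_to_mean(observed, sample_size, group_mean, k):
    """Blend an observed rate with a group mean.
    `k` is a hypothetical regression constant: the sample size at which
    the observed rate and the group mean get equal weight."""
    return (observed * sample_size + group_mean * k) / (sample_size + k)

# Hypothetical rookie: .330 OBP in 300 plate appearances.
observed_obp, plate_appearances = 0.330, 300

# Same player regressed toward two different (made-up) group means.
vs_average_rookie = regress_to_mean(observed_obp, plate_appearances, group_mean=0.310, k=300)
vs_touted_rookie = regress_to_mean(observed_obp, plate_appearances, group_mean=0.340, k=300)

print(f"toward average rookies:      {vs_average_rookie:.3f}")
print(f"toward highly touted rookies: {vs_touted_rookie:.3f}")
```

The same sample produces two different estimates depending only on which group the player is assumed to belong to, which is the point of regressing toward the smallest relevant group.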
David, I re-read your article on “stuff” from last December. You used STATS ratings from ESPN.com and regressed that and 2004 ERA (and Marcel projections) on 2005 ERA, right? What year were the “stuff” ratings compiled? Before or after the 2005 season? For the study to be valid, the stuff ratings have to have been compiled before the 2005 season. If they were compiled after the scouts have “seen” the pitchers’ 2005 performance (and are obviously influenced by it), the study is worthless of course.
Ken Warren: I have used Baseball Forecaster’s speed ratings for the past several seasons. Every season so far has shown that age-24 players have higher speed ratings than any other age, at least at the major league level. If a player is well above average in speed and well below average in health, power, contact, plate discipline, and batting eye I would say that he is a speed-only player.
400 AB, 100 H, 15 Doubles, 5 Triples, 2 HR, 25 BB, 80 K, 27 SB, 11 CS, .250 BA, .294 OBA, .328 SLG, .622 OPS.
This to me is a “speed only” guy.
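The rate stats in that line follow from the counting stats. The sketch below recomputes them, assuming OBA is taken as (H + BB) / (AB + BB) because the line lists no hit-by-pitches or sacrifice flies; the variable names are illustrative.

```python
# Counting stats from the "speed only" example line.
ab, hits, doubles, triples, hr = 400, 100, 15, 5, 2
bb, so, sb, cs = 25, 80, 27, 11

singles = hits - doubles - triples - hr
total_bases = singles + 2 * doubles + 3 * triples + 4 * hr

ba = hits / ab
oba = (hits + bb) / (ab + bb)   # assumption: no HBP or SF in the line
slg = total_bases / ab
ops = oba + slg

contact_rate = 1 - so / ab      # 1 - K/AB
batting_eye = bb / so           # BB/K
sb_success = sb / (sb + cs)

print(f"BA {ba:.3f}  OBA {oba:.3f}  SLG {slg:.4f}  OPS {ops:.3f}")
print(f"contact {contact_rate:.2f}  eye {batting_eye:.2f}  SB% {sb_success:.2f}")
```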
Cabrera’s MLE OPS comes from Baseball Prospectus 2006 (page 313) and Francoeur’s from Baseball Forecaster (page 71). Baseball Prospectus gives Francoeur a .738 MLE OPS for his minor league AB and .873 MLE OPS for his major league time, which would be very much in line with Ron Shandler’s .789 (combined rating).
Francoeur posted a .809 OPS in 367 at-bats at Double-A, and .885 in 274 major league at-bats. Intuitively it’s hard to see how this results in a .673 MLE OPS for the season. Is there an explanation why it would be so low?
Mitchel Lichtman: I don’t know and I am too busy to look it up. Basically I take a player’s minor league stats and park and league (within the level) adjust them to make them context neutral Double-A or Triple-A (I don’t do MLE below Double-A). Then I apply linear MLE coefficients to each component (singles, doubles, triples, home runs, walks, strikeouts). Then I normalize that to the average rates for the major leagues for whatever year we are talking about.
That is why I had Francoeur projected so crappily this year (because of such poor MLEs), despite his good major league stats last year. I was not disappointed either, as his park adjusted linear weights so far this year are -21.2 per 630 PA (150 games), which is sub-sub-replacement. Melky has not disappointed either, with a -16 linear weights per 630 PA so far this year.
Since the pool of players is different at all age levels, in order to see at what age players in general “peak” at whatever skill, you have to look at the average difference between all players’ age-23 and age-24 seasons, age-22 and age-23 seasons, etc. That is the “delta method” and is the only way to do aging curves and establish peaks. You cannot look at speed scores among 23-year-old players and then 24-year-old players, etc. Heck, if you do that, you will find that offense peaks at like age 30 or 31 (since age 30 or 31 players are the best players offensively; the bad players never make it to age 30 or 31).
When the average difference goes from positive to negative, that is a peak age. That is hard to do for speed since there are so few players who have played in the major leagues at age 20-21 and 21-22, etc. When someone does that (uses the delta method) for speed, they can come back and tell me the peak age for speed. They probably need to look at minor league stats though, since, as I said, there is not a large enough sample of major leaguers at such a young age.
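A minimal sketch of the delta method as described: for each pair of consecutive ages, average the change among players who played at both ages, then look for the age where the average change flips from positive to negative. The player data and function names are invented for illustration, and a real study would also weight by playing time and need far more players.

```python
from collections import defaultdict

def delta_method(seasons):
    """seasons: {player: {age: metric}}.
    Returns {age: mean change from that age to age + 1}, using only
    players who have a season at both ages."""
    deltas = defaultdict(list)
    for by_age in seasons.values():
        for age, value in by_age.items():
            if age + 1 in by_age:
                deltas[age].append(by_age[age + 1] - value)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}

def peak_age(mean_deltas):
    """Age at which the mean change turns from positive to non-positive."""
    ages = sorted(mean_deltas)
    for prev, cur in zip(ages, ages[1:]):
        if cur == prev + 1 and mean_deltas[prev] > 0 and mean_deltas[cur] <= 0:
            return cur
    return None

# Hypothetical speed scores by age for three players.
speed_scores = {
    "Player A": {22: 5.0, 23: 5.5, 24: 5.8, 25: 5.6},
    "Player B": {23: 4.0, 24: 4.3, 25: 4.0, 26: 3.6},
    "Player C": {22: 3.0, 23: 3.4},
}

mean_deltas = delta_method(speed_scores)
print(mean_deltas)
print("peak age:", peak_age(mean_deltas))
```

Because each delta compares a player only with himself, the changing mix of players at each age does not distort the curve the way a simple average by age would.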
Ken Warren: I agree with you, but I don’t have the time, resources, or inclination to do all that.
I think that we agree that speed peaks very early compared to other baseball offensive skills, which is really the point that we were working with. And I do think that evidence does show that players whose primary skill is speed will not have very long careers.
Heck, if you do that, you will find that offense peaks at like age 30 or 31 (since age 30 or 31 players are the best players offensively; the bad players never make it to age 30 or 31).
I suspect it is primarily the “speed only” players, AKA “bad players” who are the ones who don’t make it to age 30 or 31.
Mitchel Lichtman: Ken said:
And I do think that evidence does show that players whose primary skill is speed will not have very long careers.
I am not trying to be argumentative, but that is not what we were discussing. They may in fact not have long careers because they are not that good. The question is whether their aging curves relative to their own baseline overall offensive production are different from the average player’s, and if so, in what way. How long a player’s career is, is mostly determined by how good he is overall at some baseline age.
Ken Warren: In projecting player performance there is no such thing as “the average player.” Some guys have power and plate discipline. Some have power, but poor batting eye and contact skills. Some have speed and can put the bat on the ball, but little power. Some have speed and a great batting eye. Every different type of skill set will have a different aging curve.
I don’t think it makes any sense to say that there is a peak age that applies to all baseball players. It all depends on what skills a player is most proficient at, and at what age those skills peak. We accept that in other sports; tennis players peak around 25, golfers in their late 30s, gymnasts and short distance swimmers in their teens, sprinters in their early 20s, quarterbacks in their early 30s, running backs in their early 20s, hockey forwards in their mid-20s, defensemen in their late 20s, and goaltenders in their early 30s. It only makes sense that speed players will peak at a different age than power guys, who in turn will peak at a different age than strike-zone control guys, or contact rate guys.
Mitchel Lichtman: I think this discussion is going nowhere and getting bogged down in semantics, but I’ll make one more reply.
We can certainly say at what age the average player peaks at total offensive production as measured by OPS or lwts (or EqA or whatever), or whatever other statistic, if we want to. It is around 28 for the average player. Obviously different players have different “true” (as opposed to at what age they happened to have their best offensive season) ages at which they peak offensively.
And yes, that probably depends on what type of player they are, among other things. The more we can isolate which type of players peak at what age, the better our projections. And yes, it is useful in a projection “system” to use different aging curves for each component. In fact, it is more than useful, it is necessary, as the components have largely different aging curves. As far as I know, only players with speed versus players without speed have been shown to have significantly different aging curves overall (for overall offensive performance). If you can point me to some research that indicates that there are other types that have different aging curves for overall offense or for any individual component, please do so.
It does not necessarily make sense (to me at least) that “speed players will peak at a different age than power guys, who in turn will peak at a different age than strike-zone control guys, or contact rate guys.” Again, peak at what? Overall offense? Some component? Do you have data or some research to back up this claim or does it just “make sense to you?”
As far as different types of players having different types of aging curves for one or more of the various components (for example, fast players peak in speed scores at a different age or rate than slow players), that would be difficult to analyze and “prove” as you would have huge selective sampling problems that may be insurmountable.
David Gassko: To respond to Mitchel’s earlier comment, the scouting reports were gathered prior to the 2005 season.
Mitchel Lichtman: You still have the problem of the scouting reports being influenced by the pitcher’s 2004 performance, which they no doubt are. IOW, the scouting reports and the 2004 ERA are not nearly independent. I’m not sure how that would affect the regression equation though.
Voros McCracken: And this is, of course, the problem with a lot of scouting and in particular the merger between scouting and statistics. You simply don’t know how the player’s current statistics are being factored into the scout’s report on the player.
Tom Tango: Actually, you do.
For example, when I correlate the 2004 UZR to the 2005 UZR, and the 2004 Fans Scouting Report to the 2005 UZR, I can tell that the 2004 UZR and Fans are independent.
The r for the UZR is .5 (r-squared is .25). The r for the Fans is .35 (r-squared is .12). The r for both is .6 (r-squared is .36).
If they were truly independent, I’d get .25+.12 =.37. That I get .36 tells me that they are virtually independent.
Scouting information is crucial, when your sample size is low, or there is a fundamental change in a player’s approach or skills.
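The independence check in that example is a short piece of arithmetic. The sketch below uses only the correlations quoted above, with illustrative variable names. It relies on the fact that when two predictors are uncorrelated with each other, the r-squared of the combined regression equals the sum of the individual r-squareds.

```python
# Correlations with 2005 UZR, as quoted in the discussion.
r_uzr = 0.50    # 2004 UZR alone
r_fans = 0.35   # 2004 Fans Scouting Report alone
r_both = 0.60   # both predictors together

sum_of_parts = r_uzr ** 2 + r_fans ** 2   # expected r-squared if fully independent
combined = r_both ** 2                    # r-squared actually observed
shortfall = sum_of_parts - combined

print(f"expected if independent: {sum_of_parts:.4f}")
print(f"observed combined:       {combined:.4f}")
print(f"shortfall:               {shortfall:.4f}")
```

The shortfall of about one point of r-squared is what supports calling the two sources virtually independent.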
Mitchel Lichtman: I would think that UZR and scouting would be almost completely independent as there is nothing in UZR that the scouts can “see.” In fact, UZR is almost designed to measure what no one can see.
Scouting reports on pitchers and ERA are completely different! The scouts can see the ERA and are unlikely to say, “What a great pitcher,” when he has a 5.50 ERA or, “What a crappy one,” when he has a 2.50 ERA. There is no doubt that ERA and scouting reports are very dependent.
That being said, I don’t remember what the individual “r”s were and the combined “r” was for ERA and scouting report in David’s sample of pitchers.
David Gassko: There is some dependence. Marcel’s and my “stuff score” combined had a .55 correlation with 2005 ERA. Marcel’s had a correlation of .49 with 2005 ERA, “stuff score” had an “r” of .45. If they were totally independent, they would have a .67 correlation, rather than .55. But I don’t see a problem with dependence. It seems pretty clear to me that scouting information is extremely important for building projections, especially those of young pitchers. In fact, and I’m thinking out loud here, but I believe the correct mean to regress to would indeed be based on scouting info.
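The .67 figure is what the two individual correlations would give if the predictors did not overlap at all. A small sketch, using the correlations quoted above and hypothetical function and variable names, shows where it comes from and how much the "stuff score" adds over Marcel alone.

```python
import math

def r_if_independent(r_a, r_b):
    """Combined correlation two predictors would reach if they were
    uncorrelated with each other (their r-squareds simply add)."""
    return math.sqrt(r_a ** 2 + r_b ** 2)

r_marcel, r_stuff, r_observed = 0.49, 0.45, 0.55

ceiling = r_if_independent(r_marcel, r_stuff)
share_of_ceiling = r_observed ** 2 / ceiling ** 2
gain_over_marcel = r_observed ** 2 - r_marcel ** 2

print(f"r if fully independent: {ceiling:.3f}")
print(f"share of that r-squared actually reached: {share_of_ceiling:.2f}")
print(f"r-squared gained over Marcel alone: {gain_over_marcel:.4f}")
```

On these numbers the combined model reaches about two thirds of the fully independent r-squared, so the two inputs overlap substantially, yet adding the stuff score still explains more than Marcel by itself.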
Mitchel Lichtman: Well, the fact that you got a higher “r” combining the two means that they are somewhat independent, so there is no problem combining the two. But if they ARE mostly dependent, then the scouting info is worthless over and above the ERA. But it all depends on the correlations I guess, although for one season of ERA or scouting I would think that our sample error on those “r’s” is pretty large.
As far as what number to regress to, if the scouting reports are mostly independent, then by all means regress to them. If they are somewhat dependent, which they apparently are, then you don’t want to regress to them; you either want a third term in the regression equation, or you want to use the “two-term” regression equation for ERA and scouting and then simply regress that “answer” toward some league average ERA (by age or experience, or L/R, or whatever). That is the same thing as having 3 terms though (I think).
But I will defer to the statistical experts of which I am not one.