Projection Roundtable (Part 2)

by David Gassko
August 1, 2006

Note: You can read Part 1 here.

The cool thing about looking toward the future is that we don’t really know what it holds. We can fantasize about peace in the Middle East or the Pirates making the playoffs because we can never predict what will happen. And yet, in any area, whether it be political relations or baseball, there are people trying to do just that.

Since The Hardball Times is not a political website, we’ll leave that alone and focus on baseball instead. Baseball projections are a multi-million dollar business in the U.S., with huge demand for fantasy information. Everyone from fantasy baseball players to major league teams relies on getting the best and most accurate projections to win. And yet there’s still so much to be discovered and understood about how to best predict the future.

To help facilitate some discussion of this issue, I gathered most of the best forecasters in the business for a round table that will run Monday through Friday in five parts. Besides me (David Gassko), the participants were:

Chris Constancio, who writes a weekly column for The Hardball Times and publishes his projections for minor leaguers at FirstInning.com.

Mitchel Lichtman, who is a co-author of The Book – Playing the Percentages in Baseball. He has been doing sabermetric research for almost 20 years, and for two years was the senior advisor for the St. Louis Cardinals.

Voros McCracken, who is most famous for his invention of Defense Independent Pitching Statistics (DIPS), and is a former consultant for the Boston Red Sox.

Dan Szymborski, who is Editor-in-Chief of Baseball Think Factory, and publishes the ZiPS projection system.

Tom Tango, who co-authored a baseball strategies book called The Book – Playing the Percentages in Baseball, which is available at Inside The Book.

Ken Warren, who started doing player projections and studying baseball statistics in 1992, when he was totally unsatisfied with the projections that STATS was publishing in their annual handbook, and Bill James’ claim that he couldn’t or wouldn’t project pitchers until they had 500 major league innings. Since then he has been continually adding little bits of sophistication and complexity to his system, and has noticed small improvements in accuracy over the years.

David Gassko: Okay, next question: How important are park factors in a projection? I know Voros has said that he disregards park factors for all but extreme stadiums (like Coors Field or Dodger Stadium), and Tom has criticized the use of multiplicative park factors (that is, if a player hits 40 home runs in a neutral stadium, and he moves to a stadium with a home run park factor of 1.2, saying that he will hit 40*1.2 = 48 home runs). How useful are they, and how should we be applying them?
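For readers who want the arithmetic spelled out, here is a minimal sketch of the "multiplicative" method described in the question. The function name is hypothetical and chosen only for illustration; it shows the calculation being criticized, not a recommended approach.

```python
def multiplicative_projection(neutral_hr: float, park_factor: float) -> float:
    """Scale a neutral-park home run total by a park factor.

    This is the simple multiplicative method from the question above,
    shown for illustration only.
    """
    return neutral_hr * park_factor


# The example from the question: 40 home runs, park factor of 1.2.
projected = multiplicative_projection(40, 1.2)
print(round(projected, 1))
```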

Mitchel Lichtman: Obviously, if you are just projecting a player’s actual performance and he does not change parks, you don’t have to do any park adjustments, which is ideal. If a player changes parks in the projection year, it is necessary to do some kind of park adjustment. If he has played in different home parks in your historical database (that you use for your projections), then park adjustments are necessary as well.

Personally, since my goal is to be able to compare one player to another on a level playing field, I use park adjustments for everything, and project a player’s context-neutral stats. Technically, if I were doing projections for a certain team (say, St. Louis), I might not do any park adjustments for St. Louis players (those who have been with the team for several years), and then project all other players as if they were going to play for St. Louis, so that I can compare players, with the assumption that they all were going to play for St. Louis.

So, any way you look at it, it is best to use park adjustments for some players in some situations. Obviously, for players who play their home games in extreme parks, if you want to project their context-neutral performance or their performance in another park, you better do some kind of park adjustments.

Park adjustments can be tricky and unreliable for many reasons, but they can be done, if you do them right. Some of the keys for doing good park adjustments are using multiple year data and the right regressions, and not using the “multiplicative method” that you mention. There are other “keys” which I am not at liberty to divulge (until 10 years after my death).

Voros McCracken: The first year I did projections, it was a rush job so I did disregard them, and my opinion was that you could do so and get away with it because for most parks it made little difference.

My opinion on park factors is that people too often make adjustments for statistically insignificant results. A 1% or 2% increase in a park factor is clearly statistically insignificant and tells us nothing about the park. The small studies I’ve done on the subject seem to confirm this: When you adjust for every minute difference you don’t really add any accuracy to future projections. When you stick strictly to the larger park differences, you do add a little accuracy to them.

Also I strongly disagree with using a player’s individual home/road splits to park adjust for him. In most cases, the sample is simply too small to make that kind of adjustment.

Interestingly enough, I’ve never found a whole lot of evidence that parks affect walk totals by more than trace amounts. The number of parks that have seasons with walk factors that measure as statistically significant tends to be equal to the number you’d expect from chance alone. Strikeouts, yes, walks not so much.
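As a rough illustration of the "chance alone" comparison, the sketch below tests one park-season's walk rate against the same teams' road walk rate, then computes how many parks would be flagged by chance at a given significance level. The two-proportion z-test, the 5% threshold, and the sample counts are assumptions made for this example; they are not taken from Voros's studies.

```python
import math


def walk_rate_z(home_bb: int, home_pa: int, road_bb: int, road_pa: int) -> float:
    """Two-proportion z statistic comparing walk rate at home vs. on the road."""
    p_home = home_bb / home_pa
    p_road = road_bb / road_pa
    # Pooled rate under the null hypothesis that the park has no effect.
    pooled = (home_bb + road_bb) / (home_pa + road_pa)
    se = math.sqrt(pooled * (1 - pooled) * (1 / home_pa + 1 / road_pa))
    return (p_home - p_road) / se


def expected_false_positives(n_parks: int, alpha: float = 0.05) -> float:
    """Number of parks expected to look 'significant' if no park had any effect."""
    return n_parks * alpha


# Hypothetical park-season: 540 walks in 6,200 PA at home, 520 in 6,200 on the road.
z = walk_rate_z(540, 6200, 520, 6200)
print(abs(z) > 1.96)                  # significant at the 5% level?
print(expected_false_positives(30))   # parks flagged by chance in a 30-park league
```

If the count of park-seasons that clear the threshold is close to the second number, the data offer little evidence of a real park effect on walks.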

All of this (except maybe the walks) changes when you look at the minor league and college levels, where parks differ in size and environment more severely than MLB parks do.

Tom Tango: I’m in the Voros camp here. The types of parks, until the new wave anyway, made it so that it didn’t make much of a difference insofar as forecasting was concerned. First, most players stay in the same park, so when you forecast Todd Helton or Larry Walker, it didn’t really matter what the park factor was. Secondly, as Voros points out, a few percentage points here or there is not worth getting worked up for.

Mitchel would likely say that something is better than nothing. Again, it goes back to cost/benefit. How much time do you want to spend to figure out whether Carlos Beltran should be +35 runs or +32 runs?

I’m not saying to ignore them, but, in terms of what worries me, park factors are lower on the totem pole. Coors, and the extremes, are always the exception.

Dan Szymborski: As Voros has stated, for most parks, the park factors aren’t going to make that much of a difference. The error bars in the projections and even the park factors themselves are going to dwarf the small adjustments you’d make to a player’s stats in most cases. Like pinch-hitting data for hitters, actual home/road splits of individual players don’t seem to help the accuracy of projections at all.

There are probably ways to utilize player type in order to make projections for players that change parks more accurate, but I don’t believe that anyone’s found that particular Holy Grail yet. Minor League Equivalencies (MLEs) do rely on good minor league park information, which has been frustratingly difficult to acquire since STATS stopped printing minor league park factors some years ago. In this department, Jeff Sackmann’s new site may very well be the most irreplaceable minor league site on the Internet now.

Ken Warren: Park factors generally only tell you the impact of a park on offense as a whole. Every player will be affected differently depending on what he brings to the table. It’s very possible that a player could move to a more hitter-friendly park, but be negatively affected personally.

If you’re going to use park factors to adjust individual projections you will also have to factor in whether a player is a fly ball or ground ball hitter, how much of his value comes from home runs, and whether his home runs are hit predominantly to the same field or not. Once you determine this, then you have to determine how his specific skill set will be affected by the change in stadiums.

Another consideration is whether the new stadium affects the batting average on balls in play (BABIP) on ground balls. Apparently Colorado now has about a 19% hit rate on ground balls compared to about 23% in previous seasons. This is supposedly because they are not cutting the grass as short anymore. This kind of factor will affect different players’ projections differently.

So basically unless you’re prepared to customize your park factors for each player they are probably of limited usefulness for individual player projections.

Chris Constancio: I don’t have too much to add to what has already been said, but there are two main points that I would like to elaborate on:

You can afford to de-emphasize park factors for certain purposes. This certainly is not an option, however, when dealing with a sample of players playing in minor league stadiums (or college or even high school, for that matter). In some of the smaller minor leagues, it almost doesn’t make sense to talk about a “league average” independent of home parks. Ray Winder Field and Whataburger Field look like they were designed for two different games, but they’re both Texas League parks.

My second point is closely related to Ken’s response. While many people are happy with a single “run” factor (along with a “hr” factor in some cases), I’m finding that it’s very helpful to individualize the factors to some extent. Handedness plays an important role in predicting how a hitter (and to a lesser extent, pitcher) will fare in a particular context. Similarly, batted ball information will also help you understand how a ballplayer’s actions will be affected by his environment. A left-handed ground ball hitter may not be affected by Fenway’s deep right field as much as a left-handed fly ball hitter.

All these variables interact in ways that I find to be quite informative. I get bored when I hear someone talking about adjusting hitters’ performance for some “neutral” context. In my opinion, we should not be trying to de-contextualize hitters’ performances but instead figuring out how an MLB hitter’s tendencies will play out differently in the context of 30 different major league parks in 6 different divisions with unbalanced schedules. This is a difficult process and doesn’t lead to huge gains in projection accuracy, but it does give us some improvements and, just as importantly, improves our understanding of how ballgames are affected by contextual factors.

Mitchel Lichtman: Voros wrote:

Also I strongly disagree with using a player’s individual home/road splits to park adjust for him. In most cases, the sample is simply too small to make that kind of adjustment.

I don’t know what he means by that. When you park adjust a player’s stats, you apply his home park factors to his home stats and the road park factors (which is basically what’s “left over”) to his road stats.
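One way to picture that home/road split is the sketch below. It makes several simplifying assumptions that are not Mitchel's: park factors average 1.0 across the league, road games are spread evenly over the other parks, and the adjustment is a plain ratio (the very "multiplicative" shortcut he cautions against, used here only to keep the example short). The function names are hypothetical.

```python
def road_park_factor(home_pf: float, n_teams: int = 30) -> float:
    """Average factor of the other parks: what is 'left over' if all
    park factors in the league average 1.0 (an assumption)."""
    return (n_teams - home_pf) / (n_teams - 1)


def neutralize(home_stat: float, road_stat: float,
               home_pf: float, n_teams: int = 30) -> float:
    """Park-adjust home and road totals separately, then add them.

    Uses a simple ratio adjustment for illustration only.
    """
    road_pf = road_park_factor(home_pf, n_teams)
    return home_stat / home_pf + road_stat / road_pf


# Hypothetical hitter: 24 HR at home in a 1.20 park, 16 HR on the road.
print(round(road_park_factor(1.20), 3))
print(round(neutralize(24, 16, 1.20), 1))
```

Note that the adjustment uses the park's factor and the player's playing time in each setting, not the player's own home/road performance split.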

And yes, as I said, the component factors (if that is what you are using) must be regressed appropriately. For example, for walk rate, as Voros indicates, the regression is almost 100% regardless of the number of years of data. The regressions are tricky because you want to regress toward the mean of the type of park you are dealing with. So, for example, you regress a park’s home run rate toward the mean of similar parks in terms of altitude, ambient weather, size, etc. It is tricky, but as I have always professed to Tango, if you don’t know exactly how much to regress something, computing the adjustment factors and regressing them a lot is always better than not doing the adjustments at all.

For example, let’s say that you have one year of home run data on a park that you know nothing about, and that one year home run factor is 1.20 (lots more home runs hit in that park than on the road). While some people may argue that you are better off not doing a home run park adjustment at all because you only have one year of data and you know nothing else about the park, they would be incorrect in not doing any adjustments.

You would be better off using a park home run factor of 1.01 and adjusting all player stats than not doing a home run park adjustment at all. Now, how much you actually regress one year of data for a park you know nothing about is tricky, but, as I said, it is definitely less than 95% (the 1.01) and more than 0% (using the 1.20).
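The numbers in this example follow from a simple shrinkage formula, sketched below. The function name and the default target of 1.0 are assumptions for illustration; as Mitchel notes above, the target could instead be the mean of similar parks, and the right amount of regression is the hard part.

```python
def regress_park_factor(observed: float, regression: float,
                        target: float = 1.0) -> float:
    """Shrink an observed park factor toward a target mean.

    regression=0.0 keeps the observed factor as is;
    regression=1.0 replaces it with the target.
    """
    return target + (1.0 - regression) * (observed - target)


# One-year home run factor of 1.20, regressed 95% toward a neutral 1.00.
print(round(regress_park_factor(1.20, 0.95), 2))
# No regression at all leaves the raw 1.20.
print(round(regress_park_factor(1.20, 0.0), 2))
```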

Speaking of Colorado, I don’t think the grass is as much of a factor this year on hits per ground balls as is the new use of the humidor. And yes, of course, you have to use a player’s individual profile to make the proper park adjustments. For players like Scott Podsednik or Jason Kendall, who almost never hit home runs, on the one hand, a park’s home run factor won’t affect them much, but on the other hand, because a park factor is not multiplicative, players like them might hit two or three home runs a year rather than one in a park like the old Coors (not the new one, which is around neutral with regard to home runs).

I’ve mentioned this many times before and I think we are all in agreement that minor league park factors (as well as league factors of course) are very important. In fact, that may be the difference between good and bad MLEs.

Tom Tango: Right, that goes with my general point that Busch Stadium cannot possibly affect Vince Coleman, Willie McGee and Jack Clark the same way. Wade Boggs, because of the type of hitter he is, might be affected differently than Fred Lynn, at Fenway. And at Dodger Stadium. Or really any park.

I prefer to do very limited park adjustment, because doing complete park adjustments, even the “1%” kind, gives me very little comfort. It makes me initially feel that “yup, I’ve got the adjustment, let’s move on”, instead of making me feel “darn it, I’ve gotta get this adjustment down really well”.

Homer Simpson believes that the American way is to do it half-assed, and that’s really what most park factors are. Unless it’s Coors or the Astrodome or a couple of the extreme parks, I wouldn’t bother with park factors, until you put your whole butt in there.

I agree that minor league park factors are a different animal than major league ones, for a few reasons. Since we care about the major league forecast, we don’t have the Todd Helton/Larry Walker issue. In every single instance, we are talking about moving a player into a completely new environment.

To me, the least interesting thing about forecasting, is forecasting. The interesting thing is to learn about how the parks affect players, and how other players affect players. The forecast is simply some overall accounting, accumulation, of everything we know, and don’t know. We are, in effect, masking what we learned, by trying to come up with a forecast, without justifying that forecast.

I’d rather learn why a forecaster thought Dante Bichette and Andres Galarraga would have found success in Denver. Or not. Saying “Bichette will get 27 home runs” really does nothing for me.
