Oliver, smarter than your average monkey

…and not just because he’s a chimp.

Oliver, a full-featured (batting, pitching and fielding) player projection system presented at The Hardball Times Forecasts, had its beginnings three years ago in an attempt to answer the question, “Just how good is Rajai Davis?” upon his promotion to the Pirates. The key concept is context—looking at all the other players who have come before, accounting for their ages, the ballparks they played in, the level of competition, and (where play-by-play is available) the in-game situation, and comparing the expected value to the observed. Oliver is also designed as a “learning machine,” in which rules define the relationships between the data, and it is expected to produce better and more specific results as more data are added.

This system of finding how other players did at each stop in the minors compared to their major-league performance soon outgrew a spreadsheet and was moved to a relational database—first Microsoft Access, and now MySQL—where it tops 50 gigabytes in size.

The then-unnamed projections were introduced to the public in an August 2008 piece at the late great Statspeak, entitled “Turning the Monkey Into a Gorilla,” as it was at that time based on Tom Tango’s Marcel model. Marcel is a weighted mean of the past three years, regressed to the league mean and adjusted for age; to this, Oliver added park factors and minor league statistics. Tango suggested naming it Oliver, after the chimpanzee who displayed human characteristics.

Oliver’s 2009 preseason batting projections were hosted at FanGraphs, alongside Bill James, ZiPS, Marcel and Chone. Before the 2010 season they moved to The Hardball Times, adding pitching and defense. Plans for 2011 include expanded defensive statistics and an injury database (though, as always, there are more ideas than time to implement them).

Which players are projected?

The goal is to profile every professional player:
{exp:list_maker}Everyone in Gameday, which includes the major leagues, the affiliated minors, Mexico and the fall and winter leagues, is included. From those, add any player who appeared in a game or on a game roster in the current season, or is listed in MLB’s biographical database as still being active.
Players in Japan’s Central and Pacific Leagues will be updated weekly in 2011.
Players in the major American independent minor leagues (Atlantic, Frontier, American Association, CanAm, Northern, Golden and United) will also be updated weekly in 2011.
Player stats in the Dominican and Venezuelan Summer leagues are collected and analyzed, but the player is not given a projection until he appears in a higher league.
Stats for college (most Division I and some lower schools) and major summer collegiate leagues (Cape Cod, Cal Ripken, Clark Griffith, Valley, Coastal Plains, Texas Collegiate) are updated annually, with the players given a projection once they are drafted. {/exp:list_maker}

The historical database starts in 1998, the year of the last expansion of the major leagues, which set the current ratio of the number of major to minor league teams. Having 13 years of statistics allows a comparison of past projections with their following seasons and the calculation of aging curves, both of which are used to adjust the current projections to make them as reliable as possible.

Gameday records begin in 2005, providing detailed play-by-play and batted-ball locations. When both season stats and play-by-play are available, the play-by-play takes precedence.

How Oliver works

Batters and pitchers are measured by the same sets of stats:
{exp:list_maker}Base hits per ball-in-play
Extra base hits per base hit
Triples per extra base hit
Home runs per ball contacted
Hit by pitch, bases on balls and strikeouts per plate appearance
Sacrifice bunts, sacrifice flies and grounded into double plays by balls-in-play {/exp:list_maker}
And for batters only:
{exp:list_maker}Steal attempts per single, walk, and HBP
Caught stealing per steal attempt {/exp:list_maker}
After any adjustments to these rates, they are reassembled into a standard counting stat line based on the player’s plate appearances or batters faced.
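The reassembly step can be sketched as follows. This is only an illustration, not Oliver's actual code: the function name, the dictionary keys and the sample rates are hypothetical, and it assumes that "base hits per ball-in-play" excludes home runs and that "extra base hits" means doubles plus triples, which the article does not spell out. Sacrifices, double plays and steals are left out for brevity.

```python
def rates_to_counts(pa, rates):
    """Turn per-opportunity rates back into a counting line for `pa` plate appearances."""
    # Outcomes measured per plate appearance.
    hbp = pa * rates["hbp_per_pa"]
    bb = pa * rates["bb_per_pa"]
    so = pa * rates["so_per_pa"]
    # Everything else is a ball the batter contacted.
    contacted = pa - hbp - bb - so
    hr = contacted * rates["hr_per_contact"]
    # Assumption: balls in play are contacted balls that did not leave the park.
    balls_in_play = contacted - hr
    hits_in_play = balls_in_play * rates["hits_per_bip"]
    # Assumption: extra base hits here are doubles and triples only.
    xbh = hits_in_play * rates["xbh_per_hit"]
    triples = xbh * rates["triples_per_xbh"]
    return {
        "PA": pa, "HBP": hbp, "BB": bb, "SO": so,
        "1B": hits_in_play - xbh, "2B": xbh - triples, "3B": triples, "HR": hr,
    }


# Hypothetical rates for a single batter.
example_rates = {
    "hbp_per_pa": 0.01, "bb_per_pa": 0.10, "so_per_pa": 0.20,
    "hr_per_contact": 0.04, "hits_per_bip": 0.30,
    "xbh_per_hit": 0.25, "triples_per_xbh": 0.10,
}
line = rates_to_counts(600, example_rates)
```

Keeping each rate tied to its own denominator means an adjustment to one skill (say, strikeouts) flows through to the number of balls contacted and, from there, to hits and home runs.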

Within each league, players’ stats are adjusted for the ballparks they played in and their age during each season. Then all the players from each league who have also played in the majors have those adjusted stats compared to their major league performance to determine the league’s level of competition, which is then applied to all the players in that league to produce a Major League Equivalency (MLE). A projection is the weighted mean of the past three seasons’ MLEs, aged and regressed (to adjust for small sample sizes).

Park factors

Park factors account for how each park modifies the stats of the players who played there. They are derived by taking how each team’s batters and pitchers performed in their home games compared to what they did on the road. However, in this model, a park’s factors will change when any road park changes, even when there is no actual change in the park being measured.

For example, from 1978 to 1998, Fenway Park dropped from the second-easiest home run park in the AL to the second-hardest, even though Fenway didn’t change. It was because most of Boston’s road parks were replaced with newer versions that were more homer-friendly. To determine a rate in each stat category for each park, I compare how each pair of teams (by right and left-handed batters) did in each of their parks, for all the seasons that those parks coexisted.

For example, take all the Pirates-Cubs games in PNC Park and compare them to all their games in Wrigley Field in the same seasons. Then the Pirates-Reds in PNC Park compared to Great American Ballpark, etc. Repeat for each combination of teams and parks, then compare the sum of the home games to the sum of the road games for each park.

The resulting factors represent how the average performance in each park compares to the average of all the parks in its league—the numbers (unless further analyzed) are not comparable across leagues, but serve the purpose of normalizing performances within each league. Park factors are regressed to the average of their league to prevent extreme values in smaller sample sizes when there is a new park.
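A rough sketch of the paired comparison and the regression toward league average might look like this. It is a simplification under stated assumptions: the class and function names are hypothetical, the factor is taken as a plain ratio of summed rates, and the regression "ballast" of 1,800 chances is an invented number, since the article does not say how much regression is used.

```python
from dataclasses import dataclass


@dataclass
class PairedSample:
    """Totals for one pair of teams over the seasons both parks coexisted."""
    events_here: float    # e.g. home runs in the park being rated
    chances_here: float   # e.g. balls contacted in that park
    events_there: float   # the same two teams in the opponent's park
    chances_there: float


def raw_park_factor(samples):
    """Rate in this park divided by the rate of the same matchups elsewhere."""
    rate_here = sum(s.events_here for s in samples) / sum(s.chances_here for s in samples)
    rate_there = sum(s.events_there for s in samples) / sum(s.chances_there for s in samples)
    return rate_here / rate_there


def regressed_park_factor(raw, chances, ballast):
    """Pull the factor toward 1.0 (league average); small samples move the most."""
    return (raw * chances + 1.0 * ballast) / (chances + ballast)


# Hypothetical data: two opponents' worth of paired games for one park.
samples = [
    PairedSample(30, 1000, 20, 1000),
    PairedSample(24, 800, 20, 800),
]
raw = raw_park_factor(samples)
chances = sum(s.chances_here for s in samples)
tempered = regressed_park_factor(raw, chances, ballast=1800)  # ballast is an assumption
```

Because each comparison uses the same two teams in the same seasons, replacing an unrelated road park does not shift the factor of a park that has not changed.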

Aging curves

Aging is used to adjust the stats used to calculate league factors, and to predict how much a player’s skills will change several years into the future.

There can often be several years between a player’s performance in a given minor league and his appearance in the major leagues. When comparing a player at age 19 in a short-season league, and the same player at age 24 in the majors, you would expect that the difference in performance would be affected not only by the change in quality of competition, but by the player’s expected change in skill level as he ages. In an attempt to isolate aging, I used park-adjusted statistics for players who were in the same league in two consecutive seasons, regardless of the level of the league, and grouped by age.

Each of the above-listed rate statistics has its own unique patterns. For example, speed-based skills such as triples and steals start declining by age 21, power peaks at 28 to 29, while walks and strikeouts peak in the early 30s.

When calculating and applying aging, Oliver uses an exact decimal age as of Aug. 1 of each season, rather than a rounded age. (If a rounded age were used, a player born July 31, 1990 would be considered age 21 in 2011, while another born Aug. 1, 1990 would be considered 20, and we would get aging adjustments that would treat them as being a year apart.)
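One way to compute a decimal age as of Aug. 1 is sketched below. The function names are hypothetical, and measuring the fraction as days since the last birthday over days between birthdays is an assumption; the article says only that the age is an exact decimal.

```python
from datetime import date


def _birthday_in(birth: date, year: int) -> date:
    """Birthday in a given year; Feb. 29 births fall back to Mar. 1 in non-leap years."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return date(year, 3, 1)


def decimal_age(birth: date, season: int) -> float:
    """Age in fractional years as of Aug. 1 of the given season."""
    ref = date(season, 8, 1)
    # Completed years: subtract one if this season's birthday comes after Aug. 1.
    years = ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))
    last_bday = _birthday_in(birth, birth.year + years)
    next_bday = _birthday_in(birth, birth.year + years + 1)
    return years + (ref - last_bday).days / (next_bday - last_bday).days


# Two players born two days apart, on either side of the cutoff.
older = decimal_age(date(1990, 7, 31), 2011)
younger = decimal_age(date(1990, 8, 2), 2011)
```

With whole-number ages these two players would be treated as 21 and 20; with decimal ages they differ by well under a hundredth of a year, so their aging adjustments are nearly identical.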

League factors

Oliver calculates league factors by a direct comparison of each player’s performance in each minor league with his performances in the major leagues, while most other systems use a “chaining” process in which only performances in adjacent levels are compared. An example of chaining is when High-A is compared to Double-A, which is compared to Triple-A, which is compared to the major leagues, to get a High-A to majors factor. A problem with chaining is that each additional element in the chain multiplies any selection bias that might be present. A direct comparison, without adjusting for the age of the players at each level, gives a better estimate than chaining of how well the player will perform if and when he gets to the major leagues. Adjusting for age allows an estimate of the player’s true talent now.

Once age and park factors have been accounted for, Oliver calculates the translation factors for each league. Applying these factors to each player’s park-adjusted stats in each league, then summing into a single stat line, produces the player’s MLE for that season.

Past season weighting

To estimate the player’s current true talent level, several years of data need to be considered. Tests have shown that the best results are obtained by using the past three seasons, with the most recent given a full weight of 1.0, the next 0.8, and the last 0.6, so that the most recent season accounts for 42 percent of the total. Pitchers are less reliable from season to season, so their most recent season is given additional relative weight: 1.0, 0.7, and 0.5.
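The weighting arithmetic can be checked with a few lines. This is a minimal sketch, assuming each season's events and opportunities are multiplied by the season weight before being summed (as in Marcel); the function names and sample totals are hypothetical.

```python
BATTER_WEIGHTS = (1.0, 0.8, 0.6)   # most recent season first
PITCHER_WEIGHTS = (1.0, 0.7, 0.5)


def weighted_rate(seasons, weights):
    """seasons: (events, opportunities) pairs, most recent first."""
    events = sum(w * e for w, (e, _) in zip(weights, seasons))
    chances = sum(w * c for w, (_, c) in zip(weights, seasons))
    return events / chances


def recent_share(weights):
    """Share of the total weight carried by the most recent season, given equal playing time."""
    return weights[0] / sum(weights)


# Hypothetical walk totals over three 600-PA seasons, most recent first.
walk_rate = weighted_rate([(60, 600), (50, 600), (40, 600)], BATTER_WEIGHTS)
batter_share = recent_share(BATTER_WEIGHTS)    # 1.0 / 2.4
pitcher_share = recent_share(PITCHER_WEIGHTS)  # 1.0 / 2.2
```

For batters the most recent season carries 1.0 / 2.4 of the weight, which rounds to the 42 percent quoted above; for pitchers it is 1.0 / 2.2, or about 45 percent.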

In-season weighting

A modified weighting scheme is used for projections generated at any time during a season. The current season is given a weight of 1.0, although the effective weight is equal to the percentage of the season’s games that have been played to that point. The previous seasons are given a sliding scale based on that percentage of games played. At the beginning of 2010, for batters 2009 had a weight of 1.0, with 2008 at 0.8 and 2007 at 0.6. At the end of 2010, 2009 will be weighted 0.8, 2008 at 0.6, and 2007 at 0.0 (not included). Therefore, after 81 games (50 percent) of 2010 have been played, the weight for 2009 will be 50 percent of the way between the preseason weight of 1.0 and the end of season weight of 0.8, which is 0.9. Likewise, 2008 is weighted 50 percent between 0.8 and 0.6 (0.7), and 2007 50 percent between 0.6 and 0.0 (0.3).
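The sliding scale is a linear interpolation between each prior season's preseason weight and its end-of-season weight. A minimal sketch, with a hypothetical function name and the batter weights as the default:

```python
def in_season_weights(fraction_played, preseason=(1.0, 0.8, 0.6)):
    """Weights for (current season, last year, two years ago, three years ago).

    fraction_played is the share of the current season's games already played.
    """
    preseason = tuple(preseason)
    # By season's end each prior season has slid one slot: 1.0 -> 0.8, 0.8 -> 0.6, 0.6 -> 0.0.
    end_of_season = preseason[1:] + (0.0,)
    prior = tuple(
        start + (end - start) * fraction_played
        for start, end in zip(preseason, end_of_season)
    )
    # The current season is always 1.0; its smaller sample limits its effective weight.
    return (1.0,) + prior


halfway = in_season_weights(0.5)  # after 81 of 162 games
```

At the halfway point this gives roughly 1.0, 0.9, 0.7 and 0.3, matching the 2010 example above.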

Applying future aging

The weighted mean of past-season MLEs gives an estimate of a player’s true talent now. Applying aging factors will give a talent estimate at any point in the future. In projecting the very next season, some aging has already leaked into the weighted mean, so a best fit is obtained by comparing projections from the past 12 seasons to their next season MLEs and then applying those factors to the current projections. All further years into the future are estimated by applying aging to the single-season projection.

Regressing to the mean

All the performance data collected for each player is a sample of his true talent. Flipping a coin does not always result in exactly 50 percent heads and 50 percent tails, even though we know that’s the true chance. The larger the sample size becomes, the closer the sample rate will be to the true rate. Regression helps avoid extreme values in smaller sample sizes by adding a fixed amount of average performance to each player’s stats. Whether projecting one or six years into the future, the base sample size remains the same, although more adjustments are applied with each additional year projected.

The values that each player is regressed to are determined by information about the player other than his performance—his age, position, and level played. A 19-year-old in Double-A will be regressed to a higher mean than a 23-year-old at that same level, as the team presumably considered the 19-year-old to have more talent in aggressively promoting him. Similarly, a first baseman will be regressed to a higher home run rate than a shortstop, as we know that, on average, first basemen hit more home runs than shortstops.

The lowest mean error of comparing each batter’s projected strikeout rate to his performance the following season is achieved by adding 100 plate appearances of average strikeout rate to each batter, while walks require 250 PAs of regression. Multi-year forecasts are subject to more variation, and are thus less reliable than those projecting one year into the future. Two years out, the optimal regression amounts needed to temper extreme values increase to 300 for strikeouts and 400 for walks.
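Adding a fixed number of average plate appearances works out to a simple formula. In this sketch the regression amounts come from the paragraph above, while the function name, the player's totals and the mean rates are hypothetical; in Oliver the mean would depend on the player's age, position and level.

```python
# Plate appearances of average performance added to every player.
ONE_YEAR_BALLAST = {"strikeouts": 100, "walks": 250}
TWO_YEAR_BALLAST = {"strikeouts": 300, "walks": 400}


def regress_rate(events, pa, mean_rate, ballast_pa):
    """Blend the observed sample with `ballast_pa` plate appearances at the mean rate."""
    return (events + mean_rate * ballast_pa) / (pa + ballast_pa)


# Hypothetical batter: 50 strikeouts in 200 weighted PA (a 25 percent rate),
# regressed toward an assumed 18 percent mean for his comparison group.
next_year = regress_rate(50, 200, 0.18, ONE_YEAR_BALLAST["strikeouts"])
two_years_out = regress_rate(50, 200, 0.18, TWO_YEAR_BALLAST["strikeouts"])
```

Because the ballast is fixed, a player with 200 PA is pulled much closer to the mean than one with 1,500 PA, and the larger two-year ballast pulls every player closer still.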

Defense

Each player on defense is rated at various skills.

Outfielders:
{exp:list_maker}Percent of air balls caught
Extra bases on air-ball hits
Extra bases on groundball hits
Extra bases by baserunners (planned) {/exp:list_maker}

Infielders:
{exp:list_maker}Percent of grounders kept in infield
Percent of those made into outs (bunts listed separately)
Double play starts
Double play pivots {/exp:list_maker}

Catchers and pitchers:
{exp:list_maker}Percent of infield grounders made into outs
Wild pitches and double plays
Stolen bases/caught stealing
{/exp:list_maker}
The number of groundball hits to the outfield that are assigned to the infielder deemed most responsible for the ball is estimated based on the distribution of infield grounders given the same ballpark, batted ball vector, base/out state and handedness of the batter. For all others, the actual totals are counted as accurately as possible from the play-by-play descriptions.

The counted totals are compared to each player’s expected totals to get an adjusted rate and runs allowed total. Each of these skills is graded separately. For example, Derek Jeter might rate as -10 runs on his range but +3 on infield grounders. The sum represents the player’s total rating at each position.

Playing time

Oliver estimates future playing time by looking at past playing time and how good the player is relative to other players in the same league. These estimates are used in the six-year forecast.

At the major league level, The Hardball Times staff maintains a depth chart for each team, in which the remaining playing time each season is divided among players under contract, based on the opinion of the editor.

Wins above replacement

WAR has two components: level of production (measured in runs), multiplied by playing time, so that given the same rates of production, a player’s WAR can rise or fall with changes in playing time.

All outcomes, whether for batting, pitching, baserunning or defense, can be expressed as a number of runs. Batting and pitching are compared to replacement, while defense and baserunning are compared to league average. With all measured in the same unit, runs, they can be summed and then divided by 10 to get the number of wins, plus or minus, that the player contributed to the team.

Replacement level is considered the minimum level of talent necessary to be a major league player. It would be expected that roughly half of the players at replacement level would be in the majors, the other half in the minors. This is the freely available talent, major league non-tenders and minor league free agents, who during the season occasionally start for a poor team, sit on the bench for many, or wait in Triple-A to replace a disabled major leaguer.

Production is compared to the replacement level at each fielding position, which is approximately 90 percent of the average batting wOBA of players at that position. Major league first basemen have the highest average wOBA of any position, .357, while catchers have the lowest, .312. The Yankees’ catching prospect, Jesus Montero, is projected to have a batting wOBA of .357 while suffering through -3 fielding runs. At catcher, this is 30 runs above replacement and produces 2.7 WAR, ranking sixth. If moved to first base, Montero’s same batting line would rank 14th, only 11 runs above replacement, as it is easier to find a player capable of playing first base and hitting this well than it is to find one who can catch.
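The Montero comparison can be roughly reproduced in a few lines. The batting-runs formula, (wOBA minus replacement wOBA) times PA divided by 1.15, is the one the author gives in the comments below. The function names and the 450 plate appearances are assumptions made for illustration, as is carrying the same -3 fielding runs over to first base.

```python
RUNS_PER_WIN = 10.0
WOBA_SCALE = 1.15         # divisor from the author's comment
REPLACEMENT_SHARE = 0.90  # replacement is about 90 percent of the positional average wOBA


def batting_runs_above_replacement(woba, position_avg_woba, pa):
    replacement_woba = REPLACEMENT_SHARE * position_avg_woba
    return (woba - replacement_woba) * pa / WOBA_SCALE


def war(batting_rar, fielding_raa=0.0, baserunning_raa=0.0, pitching_rar=0.0):
    """Sum every component in runs, then convert to wins."""
    return (batting_rar + pitching_rar + fielding_raa + baserunning_raa) / RUNS_PER_WIN


PA = 450  # hypothetical playing time; the article does not state it

# Same .357 wOBA measured against the catcher (.312) and first base (.357) averages.
as_catcher = war(batting_runs_above_replacement(0.357, 0.312, PA), fielding_raa=-3)
as_first_baseman = war(batting_runs_above_replacement(0.357, 0.357, PA), fielding_raa=-3)
```

Under these assumptions the same batting line is worth about 2.7 wins at catcher and about 1.1 at first base, in line with the figures quoted above.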

Final words

Collecting data from the amateur ranks to foreign leagues to the minors and the majors, Oliver attempts to capture all aspects of a player’s contribution, as well as the context in which the player performed, to give the most accurate statistical profile available, especially for players with little or no major league experience. I am proud of what Oliver can do now, but there’s always room to look for ways to improve existing methods and to add more features.

Also, please let me know in the comments if there is an aspect of Oliver you would like me to cover in more detail in future articles, and I will do my best to accommodate your requests.

In addition to writing for The Hardball Times, Brian has written for FanGraphs, consulted for a Major League Baseball team and invented the Oliver projection system. Follow him on Twitter @blcartwright.
22 Comments
Peter Jensen
13 years ago

Brian – Thanks for taking the time to write up this overview of your Oliver methodology.  Unfortunately, it doesn’t answer either of the two questions that I posed last week.  Specifically, why do the six-year projections of players’ playing time continue to project them with significant playing time after their projected WAR has substantially decreased, sometimes to below replacement-level rates?  And second, why does Andres Torres’ projected MLE defensive value as a center fielder greatly exceed any previous rate of performance that he has actually achieved?  Perhaps a step-by-step example using Torres would help me and others understand how you are making these evaluations.  Thanks.

Brian Cartwright
13 years ago

Peter, I am working to improve playing time estimates. Torres had a low PA year in 2009 which pulled down his historical mean, and the system is trying to get him back to fulltime. I had a routine that used a player’s WAR to adjust his playing time up or down, but I had to unplug it when I changed some other methods, and I need to get it back online. I’m also looking to use portion of available time played instead of raw PA totals, especially with the new injury db providing the numbers of games dl’ed.

As for Torres’ defensive projection, I need to check the code for how the last three years are being combined.

Jeffrey Gross
13 years ago

@Brian,

If I am not mistaken, P.T. estimates are ridiculously hard to predict accurately, right? Is it possible to plug in your own PT guess as an input?

Xeifrank
13 years ago

What will an example of the output look like for both a pitcher and hitter?  Will there be some kind of csv output for the projections?
vr, Xei

Jameson
13 years ago

Brian,

I love the explanation of Oliver.  I picked up my copy last week and have been in Nerd Heaven since then.  When reading your Aging Curves I noticed that you selected age 21 as the year a person starts to decline in speed related skills.  I’m curious why such an early age was chosen when studies show that power doesn’t generally decline until after age 25.

Dave Studeman
13 years ago

Xeifrank, if you go to the THT Forecasts link at the top of the page, you can see sample pages for batters and pitchers.  Also, yes, you can download stats in CSV format.

Great overview, Brian.  I know more details are forthcoming.  One thing I’d like to hear more about someday is your WAR approach.  For instance, it appears that you use wOBA as the basis for replacement level, but don’t consider any fielding stats?  I’d like to hear more about that, and about the calculations in general.

Brian Cartwright
13 years ago

Jameson – power and speed are two different skills that age at their own rates. I found that speed peaked the earliest. I will have a more in depth article on aging soon.

Studes – Fielding is part of WAR, I’m sorry if I didn’t make that clear. I take the player’s wOBA minus the replacement wOBA at his position (if he played more than one position, a weighted mean of all the positions), times PAs, then divided by 1.15 for batting runs above replacement. That plus total fielding runs above average at all positions, divided by 10 is WAR.

Jameson
13 years ago

Brian, sorry for the confusion, I was talking about power in terms of kinesiology.

Dave Studeman
13 years ago

Thanks, Brian.  I understand that fielding is part of WAR, but it appears that it doesn’t factor into your assessment of replacement value, right?  You use just wOBA to determine replacement value?

Brian Cartwright
13 years ago

(batting runs above replacement + pitching runs above replacement + fielding runs above average + baserunning runs above average) / 10 = WAR

Dave Studeman
13 years ago

Thanks, Brian.  That’s what I thought. So someone who is at replacement level offensively and average at fielding is replacement level overall.

Makes sense, depending on how you set the replacement level for batting. That’s what I’d like to hear more about someday.

JEH
13 years ago

Brian-

Thanks. 

To the extent I use any regression, I do it on the observed numbers (e.g., 2010 Observed Stats --‘regression’--> 2010 ‘True Talent Level’ --aging--> 2011 ‘True Talent Level’ --context (park)--> 2011 Projected Stats), but it probably makes little difference. 

Do you regress at the same levels when looking ahead multiple years?  At first glance, it seems that would be working at odds with the aging factor (one pulling up, one pulling down) as you progress into the future.  Coupled with the fact that the initial regression should go a ways toward “centering” the projections going forward my inclination would be to leave out regression and just apply aging.

I have not attempted projecting multiple years out, but I am curious if the projections for years beyond the next simply treat the projected stats as observed stats and repeat the “next year” process.

Thanks again,

Eric

JEH
13 years ago

Brian-

Are the steps as listed sequential?

Specifically, I am curious if, for projecting 2011, the regression is applied to 2008-2010 stats or the 2011 projected stats.

-Eric

Brian Cartwright
13 years ago

Eric, the data that is on line at this moment is all regressed the same amount, but the mean that is being regressed to will vary by the player’s age, level and position.

The biggest problem I have to deal with is getting realistic projections from young players who did very well with less than a significant sample size (about 800 weighted PAs). I thought everyone was behaving, but some popped back up again when I went to the continuous aging, which made some guys up to 6 months older than before, and affected the league factors.

Since posting the data I ran tests on each year projected in the future which showed that larger amounts of regression on the extended forecasts did minimize the mean error of each category. This helps with the problem group, but it does work against the aging curve. I want it to do that for the smaller samples, but not so much for the larger. It’s a balancing act. 99%+ of the players are fine, but I won’t be satisfied unless everyone is.

I also tried using the projected as ‘observed’ for future years, but it caused some players to get worse, then better, which was not received well, so I went back (for now) to just using aging curves and regression in the out years.

JEH
13 years ago

Brian-

Thanks.  Regression is to a specific pool of comparable players.  Got it. 

For your problem area, isn’t your level of regression a function of your observed sample size?  I assume yes, but I am hazy on why the larger regression on the problem group might have a significant impact on the larger pool.

Regarding your last paragraph, do you know why players got projected as worse and then better in future years?  That seems likely to be a code (as opposed to data) generated surprise. I.e., I can’t picture the scenario where simply treating the projected values as observed values causes that behavior.

Brian Cartwright
13 years ago

They are presented close to sequentially – regression is the next to last step, and is applied to the projections. Regression is only done once, so I take the 2011 basic projection, age one year, and regress; then age two years and regress; etc., on the same base sample size, age, level, position – what changes is number of years projected and the amount of regression (as more years in the future are less reliable).

Only thing that follows is adjusting from park neutral back to the player’s scheduled ballparks for his parent club’s team.

jinaz
13 years ago

hi Brian,

Thanks for the overview—Oliver’s learned a lot since his debut!

Two questions:

1. Any plans to include some form of scouting data into the mix, e.g. fastball velocity for pitchers?

2. Has anyone (you or others) done a review of how Oliver did vs other projection systems in 2010?  I’ve seen prior years, but wasn’t sure how much the engine has changed since then.

Thanks!
Justin

Brian Cartwright
13 years ago

Justin, we have all the pitch f/x info in the db, and I would like to display it in some way by spring training.

We have to decide whether to use mlbam’s pitch types (which have been redefined over the years) or develop our own method, which will take time but ensure consistency across the past years.

I have a beta that displays the pitch selection percentage vs each bat hand, min, mean and max speed of each pitch and mean movements.

Right now I’m setting up some tests of Oliver vs Chone, ZiPS & Pecota, and I’m especially interested in how we all do in forecasting rookies who have little or no MLB stats to go on.

jinaz
13 years ago

I’m not sure if this was clear or not, but I was talking about adding scouting data to the projection engine itself to try to improve the projections (incrementally).  Fastball velocity might be the easiest place to start.  Obviously you’d need to resolve the issues you mentioned to get it working.

Cheers,
Justin

Xeifrank
13 years ago

If you were looking to project a player’s current (as of this date) true talent level, would you use a methodology any different than you would for projecting a player’s stats over an entire season?  If so, what would you do differently?
vr, Xei

Brian Cartwright
13 years ago

Xei, I am assuming you are speaking of an in-season projection?

In that case, I use a modified weighting scheme that uses the current season as 1.0 and for the three previous a sliding scale between their pre and post season weights, dependent on the percentage of the current season played so far. That gives the true talent level at that point in time. Those rates are projected over the expected number of plate appearances remaining in the season and then added to the counting stats already accumulated to give the projected end of season total.

JEH
13 years ago

Brian-

“I’m especially interested in how we all do in forecasting rookies who have little or no MLB stats to go on.”

I am, admittedly, guessing, but I think you will want match-up data, at the very least, to consistently project rookies.  My hypothesis is that the level of competition varies enough that the stats alone do not necessarily tell the story.

I think that would be a good place to try and apply whatever other data you might have available (scouting reports, pitch velocity/location, match-up quality, prospect rankings, etc.) 

I have not even tried to automate the projection of rookies and am very leery of September data as well.

Now I am curious how you do with them as well.