Oliver, smarter than your average monkey
by Brian Cartwright, December 30, 2010
…and not just because he’s a chimp.
Oliver, a full-featured (batting, pitching and fielding) player projection system presented at The Hardball Times Forecasts, had its beginnings three years ago in an attempt to answer the question, “Just how good is Rajai Davis?” upon his promotion to the Pirates. The key concept is context—looking at all the other players who have come before, accounting for their ages, the ballparks they played in, the level of competition, and (where play-by-play is available) the in-game situation, and comparing the expected value to the observed. Oliver is also designed as a "learning machine," in which rules define the relationships between the data, and it is expected to produce better and more specific results as more data are added.
This system of finding how other players did at each stop in the minors compared to their major-league performance soon outgrew a spreadsheet and was moved to a relational database—first Microsoft Access, and now MySQL—where it tops 50 gigabytes in size.
The then-unnamed projections were introduced to the public in an August 2008 piece at the late, great StatSpeak, titled “Turning the Monkey Into a Gorilla,” because at that time they were built on Tom Tango's Marcel model. Marcel is a weighted mean of the past three years, regressed to the league mean and adjusted for age; to it were added park factors and minor league statistics. Tango suggested naming the system Oliver, after the chimpanzee who displayed human characteristics.
Oliver’s 2009 preseason batting projections were hosted at FanGraphs, alongside Bill James, ZiPS, Marcel and Chone. Before the 2010 season they moved to The Hardball Times, adding pitching and defense. Plans for 2011 include expanded defensive statistics and an injury database (though, as always, there are more ideas than time to implement them).
Which players are projected?
The goal is to profile every professional player:
- Everyone in Gameday, which covers the major leagues, the affiliated minors, Mexico, and the fall and winter leagues. From that pool, Oliver includes any player who appeared in a game or on a game roster in the current season, or who is listed in MLB’s biographical database as still active.
- Players in Japan’s Central and Pacific Leagues will be updated weekly in 2011.
- Players in the major American independent minor leagues (Atlantic, Frontier, American Association, CanAm, Northern, Golden and United) will also be updated weekly in 2011.
- Player stats in the Dominican and Venezuelan Summer leagues are collected and analyzed, but the player is not given a projection until he appears in a higher league.
- Stats for college (most Division I and some lower schools) and major summer collegiate leagues (Cape Cod, Cal Ripken, Clark Griffith, Valley, Coastal Plains, Texas Collegiate) are updated annually, with the players given a projection once they are drafted.
The historical database starts in 1998, the year of the last expansion of the major leagues, which set the current ratio of the number of major to minor league teams. Having 13 years of statistics allows a comparison of past projections with their following seasons and the calculation of aging curves, both of which are used to adjust the current projections to make them as reliable as possible.
Gameday records begin in 2005, providing detailed play-by-play and batted-ball locations. When both season stats and play-by-play are available, the play-by-play takes precedence.
How Oliver works
Batters and pitchers are measured by the same set of stats:
- Base hits per ball-in-play
- Extra base hits per base hit
- Triples per extra base hit
- Home runs per ball contacted
- Hit by pitch, bases on balls and strikeouts per plate appearance
- Sacrifice bunts, sacrifice flies and grounded into double plays per ball-in-play
And for batters only:
- Steal attempts per single, walk, and HBP
- Caught stealing per steal attempt
After any adjustments to these rates, they are reassembled into a standard counting stat line based on the player’s plate appearances or batters faced.
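To make the rate framework concrete, here is a minimal Python sketch that breaks a batting line into the component rates listed above and rebuilds a counting line for any number of plate appearances. The denominators are assumptions, not Oliver's confirmed definitions: "balls contacted" is taken as PA minus walks, HBP and strikeouts; "balls in play" as contacted balls minus home runs; and "base hits" as non-homer hits. Steals are left out, and the sketch assumes every denominator is nonzero.

```python
def to_rates(s):
    """Convert a counting-stat dict into component rates (assumed definitions)."""
    contact = s["PA"] - s["BB"] - s["HBP"] - s["SO"]  # balls contacted, incl. HR
    bip = contact - s["HR"]                           # balls in play, HR excluded
    hits = s["H"] - s["HR"]                           # base hits in the park
    xbh = s["2B"] + s["3B"]                           # extra-base hits in the park
    return {
        "BB": s["BB"] / s["PA"], "HBP": s["HBP"] / s["PA"], "SO": s["SO"] / s["PA"],
        "HR": s["HR"] / contact,
        "H": hits / bip,
        "XBH": xbh / hits,
        "3B": s["3B"] / xbh,
        "SH": s["SH"] / bip, "SF": s["SF"] / bip, "GDP": s["GDP"] / bip,
    }


def to_line(r, pa):
    """Reassemble a counting-stat line from rates and a plate-appearance total."""
    bb, hbp, so = r["BB"] * pa, r["HBP"] * pa, r["SO"] * pa
    contact = pa - bb - hbp - so
    hr = r["HR"] * contact
    bip = contact - hr
    hits = r["H"] * bip
    xbh = r["XBH"] * hits
    triples = r["3B"] * xbh
    return {
        "PA": pa, "BB": bb, "HBP": hbp, "SO": so, "HR": hr,
        "H": hits + hr, "2B": xbh - triples, "3B": triples,
        "SH": r["SH"] * bip, "SF": r["SF"] * bip, "GDP": r["GDP"] * bip,
    }
```

Because every count is derived from a rate and its own denominator, any adjustment (park, age, league) can be applied to a single rate without disturbing the others, and the line is rebuilt at whatever playing time is projected.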
Within each league, players' stats are adjusted for the ballparks they played in and their age during each season. Then all the players from each league who have also played in the majors have those adjusted stats compared to their major league performance to determine the league’s level of competition, which is then applied to all the players in that league to produce a Major League Equivalency (MLE). A projection is the weighted mean of the past three seasons' MLEs, aged and regressed (to adjust for small sample sizes).
Park factors
Park factors account for how each park modifies the stats of the players who played there. They are traditionally derived by comparing how each team’s batters and pitchers performed in their home games with how they did on the road. However, in that home/road model, a park's factors will change whenever any of its team's road parks changes, even when nothing about the park being measured has changed.
For example, from 1978 to 1998, Fenway Park dropped from the second-easiest home run park in the AL to the second-hardest, even though Fenway itself didn't change; most of Boston’s road parks had been replaced with newer, more homer-friendly versions. To determine a rate in each stat category for each park, I instead compare how each pair of teams (split by right- and left-handed batters) did in each of their parks, for all the seasons in which those parks coexisted.
For example, take all the Pirates-Cubs games in PNC Park and compare them to all their games in Wrigley Field in the same seasons. Then the Pirates-Reds in PNC Park compared to Great American Ballpark, etc. Repeat for each combination of teams and parks, then compare the sum of the home games to the sum of the road games for each park.
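The pairwise matching can be sketched as follows. This is a simplified illustration, not Oliver's code: each game is a hypothetical `(season, home, away, events, opportunities)` tuple for one stat (say, home runs and plate appearances), each team is assumed to play in a single park per season, and only opponent-season matchups that occur in both parks are counted. A handedness split would mean running it separately on right- and left-handed batters' data, and the result is neither normalized to the league average nor regressed.

```python
from collections import defaultdict


def pairwise_park_factor(games, team):
    """Compare `team`'s home games against each opponent with its road games
    in that opponent's park, using only seasons in which both matchups exist."""
    home = defaultdict(lambda: [0, 0])  # (opponent, season) -> [events, opps] in team's park
    road = defaultdict(lambda: [0, 0])  # (opponent, season) -> [events, opps] in opponent's park
    for season, h, a, events, opps in games:
        if h == team:
            bucket, key = home, (a, season)
        elif a == team:
            bucket, key = road, (h, season)
        else:
            continue  # game does not involve this team
        bucket[key][0] += events
        bucket[key][1] += opps
    shared = home.keys() & road.keys()  # matchups seen in both parks
    home_rate = sum(home[k][0] for k in shared) / sum(home[k][1] for k in shared)
    road_rate = sum(road[k][0] for k in shared) / sum(road[k][1] for k in shared)
    return home_rate / road_rate


games = [
    (2010, "PIT", "CHC", 10, 300), (2010, "CHC", "PIT", 15, 300),
    (2010, "PIT", "CIN", 8, 200), (2010, "CIN", "PIT", 12, 200),
    (2009, "PIT", "HOU", 5, 100),  # no matching road games, so excluded
]
```

In this toy data the Pittsburgh games yield 18 events in 500 opportunities at home against 27 in 500 on the road, a factor of about 0.67 for that stat.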
The resulting factors represent how the average performance in each park compares to the average of all the parks in its league—the numbers (unless further analyzed) are not comparable across leagues, but serve the purpose of normalizing performances within each league. Park factors are regressed to the average of their league to prevent extreme values in smaller sample sizes when there is a new park.
Aging curves
Aging is used both to adjust the stats used to calculate league factors and to predict how much a player’s skills will change several years into the future.
There can often be several years between a player’s performance in a given minor league and his appearance in the major leagues. When comparing a player at age 19 in a short-season league, and the same player at age 24 in the majors, you would expect that the difference in performance would be affected not only by the change in quality of competition, but by the player’s expected change in skill level as he ages. In an attempt to isolate aging, I used park-adjusted statistics for players who were in the same league in two consecutive seasons, regardless of the level of the league, and grouped by age.
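Comparing the same players in the same league in consecutive seasons is often called the delta method. A minimal sketch, assuming park-adjusted rates, integer ages (for example, the whole-number part of the decimal age) and a harmonic mean of the two seasons' plate appearances as the weight; that weighting is a common choice but is not confirmed as Oliver's. The curve builder assumes the deltas cover consecutive ages.

```python
from collections import defaultdict


def aging_deltas(pairs):
    """pairs: (age, rate_y1, pa_y1, rate_y2, pa_y2) for one player who stayed in
    the same league in consecutive seasons. Returns {age: weighted mean change
    in the rate from age to age + 1}."""
    num = defaultdict(float)
    den = defaultdict(float)
    for age, r1, pa1, r2, pa2 in pairs:
        w = 2 * pa1 * pa2 / (pa1 + pa2)  # harmonic mean of PA (assumed weighting)
        num[age] += w * (r2 - r1)
        den[age] += w
    return {age: num[age] / den[age] for age in num}


def aging_curve(deltas, base_age):
    """Chain year-to-year deltas into a cumulative curve relative to base_age."""
    curve, total = {base_age: 0.0}, 0.0
    for age in sorted(a for a in deltas if a >= base_age):
        total += deltas[age]
        curve[age + 1] = total
    return curve


pairs = [
    (21, 0.100, 400, 0.110, 400),
    (21, 0.100, 100, 0.090, 100),
    (22, 0.110, 300, 0.115, 300),
]
```

Each rate stat listed earlier would get its own curve, which is how patterns such as early-peaking speed and late-peaking plate discipline show up.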
Each of the above-listed rate statistics has its own unique patterns. For example, speed-based skills such as triples and steals start declining by age 21, power peaks at 28 to 29, while walks and strikeouts peak in the early 30s.
When calculating and applying aging, Oliver uses an exact decimal age as of Aug. 1 of each season, rather than a rounded age. (If a rounded age were used, a player born July 31, 1990 would be considered age 21 in 2011, while another born Aug. 1, 1990 would be considered 20, and we would get aging adjustments that would treat them as being a year apart.)
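A decimal season age can be computed as below. The function names are hypothetical, and dividing elapsed days by 365.25 is an approximation assumed here for brevity rather than Oliver's exact formula.

```python
from datetime import date


def season_age(birth: date, season: int) -> float:
    """Approximate decimal age on Aug. 1 of `season` (elapsed days / 365.25)."""
    return (date(season, 8, 1) - birth).days / 365.25


def whole_age(birth: date, season: int) -> int:
    """Whole-number age, for contrast: drops the fractional part."""
    return int(season_age(birth, season))
```

For two players born July 31 and Aug. 2, 1990, `whole_age` returns 21 and 20 for 2011, a full year apart, while `season_age` puts them only about two days (roughly 0.005 years) apart.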
League factors
Oliver calculates league factors by a direct comparison of each player’s performance in each minor league with his performances in the major leagues, while most other systems use a "chaining" process in which only performances in adjacent levels are compared. An example of chaining is when High-A is compared to Double-A, which is compared to Triple-A, which is compared to the major leagues, to get a High-A to majors factor. A problem with chaining is that each additional element in the chain multiplies any selection bias that might be present. A direct comparison, without adjusting for the age of the players at each level, gives a better estimate than chaining of how well the player will perform if and when he gets to the major leagues. Adjusting for age allows an estimate of the player’s true talent now.
Once age and park factors have been accounted for, Oliver calculates the translation factors for each league. Applying these factors to each player’s park-adjusted stats in each league, then summing into a single stat line, produces the player’s MLE for that season.
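The direct comparison and the MLE step can be sketched for a single rate stat. The inputs are assumptions: per-player event and opportunity totals that are already park- and age-adjusted, with the translation factor taken as a simple ratio of pooled rates. Oliver's actual fitting may differ.

```python
def league_factor(pairs):
    """pairs: (minor_events, minor_opps, mlb_events, mlb_opps) for each player who
    appeared in both this minor league and the majors.
    Factor = pooled major league rate / pooled minor league rate."""
    minor_rate = sum(p[0] for p in pairs) / sum(p[1] for p in pairs)
    mlb_rate = sum(p[2] for p in pairs) / sum(p[3] for p in pairs)
    return mlb_rate / minor_rate


def mle_rate(stints, factors):
    """stints: (league, events, opps) for one player-season. Translate each stint
    with its league factor, then pool into a single-season MLE rate."""
    events = sum(ev * factors[league] for league, ev, _ in stints)
    opps = sum(op for _, _, op in stints)
    return events / opps


pairs = [(30, 500, 10, 250), (20, 500, 8, 250)]   # hypothetical Triple-A/MLB players
factors = {"MLB": 1.0, "AAA": league_factor(pairs)}
stints = [("AAA", 20, 400), ("MLB", 5, 200)]       # one player's season
```

Because each minor league is compared straight to the majors, an error in one league's factor does not propagate into the others, which is the advantage over chaining described above.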
Past season weighting
To estimate the player’s current true talent level, several years of data need to be considered. Tests have shown that the best results come from using the past three seasons, with the most recent given a full weight of 1.0, the next 0.8, and the one before that 0.6, so that the most recent season accounts for 42 percent of the total. Pitchers are less reliable from season to season, so their most recent season is given additional relative weight: 1.0, 0.7, and 0.5.
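The season weights can be applied as below. Multiplying each season's counts by its weight (so a season with more playing time also counts for more) is an assumption borrowed from Marcel-style projections; under that scheme the 42 percent share holds when the three seasons have equal playing time.

```python
BATTER_WEIGHTS = (1.0, 0.8, 0.6)   # most recent season first
PITCHER_WEIGHTS = (1.0, 0.7, 0.5)


def weighted_rate(seasons, weights):
    """seasons: (events, opportunities) MLE totals, most recent first."""
    events = sum(w * e for w, (e, _) in zip(weights, seasons))
    opps = sum(w * o for w, (_, o) in zip(weights, seasons))
    return events / opps


def recent_share(weights):
    """Share of the total weight carried by the most recent season."""
    return weights[0] / sum(weights)
```

`recent_share` gives 1.0 / 2.4, about 0.42, for batters and 1.0 / 2.2, about 0.45, for pitchers.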
In-season weighting
A modified weighting scheme is used for projections generated at any time during a season. The current season is given a weight of 1.0, although the effective weight is equal to the percentage of the season's games that have been played to that point. The previous seasons are given a sliding scale based on that percentage of games played. At the beginning of 2010, for batters 2009 had a weight of 1.0, with 2008 at 0.8 and 2007 at 0.6. At the end of 2010, 2009 will be weighted 0.8, 2008 at 0.6, and 2007 at 0.0 (not included). Therefore, after 81 games (50 percent) of 2010 have been played, the weight for 2009 will be 50 percent of the way between the preseason weight of 1.0 and the end of season weight of 0.8, which is 0.9. Likewise, 2008 is weighted 50 percent between 0.8 and 0.6 (0.7), and 2007 50 percent between 0.6 and 0.0 (0.3).
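The in-season schedule is a linear interpolation between the preseason and end-of-season weights, driven by the share of the 162-game schedule already played. A sketch for batters using the weights given above; the function name is hypothetical, and pitchers would follow the same pattern with their own weights.

```python
PRESEASON = (1.0, 0.8, 0.6)       # weights of the three prior seasons on Opening Day
END_OF_SEASON = (0.8, 0.6, 0.0)   # their weights once the current season is complete


def in_season_weights(games_played, season_games=162):
    """Return (current, prior1, prior2, prior3) weights for batters.
    The current season keeps a nominal weight of 1.0; its effective weight
    grows with the fraction of games played because its counts grow."""
    f = games_played / season_games
    prior = tuple(p + f * (e - p) for p, e in zip(PRESEASON, END_OF_SEASON))
    return (1.0,) + prior
```

At 81 games, `in_season_weights(81)` returns approximately (1.0, 0.9, 0.7, 0.3), matching the midseason example.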
Applying future aging
The weighted mean of past-season MLEs gives an estimate of a player’s true talent now. Applying aging factors will give a talent estimate at any point in the future. In projecting the very next season, some aging has already leaked into the weighted mean, so a best fit is obtained by comparing projections from the past 12 seasons to their next season MLEs and then applying those factors to the current projections. All further years into the future are estimated by applying aging to the single-season projection.
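A minimal sketch of this two-stage aging, under stated assumptions: `next_year_factor` stands in for the multiplier fitted from past projections versus next-season MLEs, and `aging` holds hypothetical additive year-to-year changes by age. Neither set of values is Oliver's, and the names are illustrative.

```python
def project_forward(rate_now, age_now, next_year_factor, aging, years):
    """rate_now: weighted-mean MLE rate (current true talent estimate).
    next_year_factor: fitted adjustment for the very next season.
    aging: {age: additive change in the rate from age to age + 1}.
    Returns projected rates for years 1..years."""
    out = [rate_now * next_year_factor]   # year 1: fitted next-season adjustment
    age = age_now + 1
    for _ in range(years - 1):
        out.append(out[-1] + aging[age])  # later years: age the prior projection
        age += 1
    return out
```

For example, `project_forward(0.100, 27, 1.02, {28: -0.002, 29: -0.003}, 3)` gives roughly [0.102, 0.100, 0.097].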
Regressing to the mean
All the performance data collected for each player is a sample of his true talent. Flipping a coin does not always result in exactly 50 percent heads and 50 percent tails, even though we know that’s the true chance. The larger the sample size becomes, the closer the sample rate will be to the true rate. Regression helps avoid extreme values in smaller sample sizes by adding a fixed amount of average performance to each player’s stats. Whether projecting one or six years into the future, the base sample size remains the same, although more adjustments are applied with each additional year projected.
The values that each player is regressed to are determined by information about the player other than his performance—his age, position, and level played. A 19-year-old in Double-A will be regressed to a higher mean than a 23-year-old at that same level, as the team presumably considered the 19-year-old to have more talent in aggressively promoting him. Similarly, a first baseman will be regressed to a higher home run rate than a shortstop, as we know that, on average, first basemen hit more home runs than shortstops.
Comparing each batter’s projected strikeout rate with his performance the following season, the lowest mean error is achieved by adding 100 plate appearances of average strikeout rate to each batter, while walks require 250 PA of regression. Multi-year forecasts are subject to more variation, and are thus less reliable than those projecting one year into the future. Two years out, the optimal regression amounts needed to temper extreme values increase to 300 PA for strikeouts and 400 PA for walks.
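Regression by adding league-average plate appearances can be written in a few lines. The ballast amounts are the ones quoted above; the mean rate would come from the player's age, position and level, and is simply supplied by the caller in this hypothetical sketch.

```python
REGRESSION_PA = {  # PA of average performance added, by years projected ahead
    1: {"SO": 100, "BB": 250},
    2: {"SO": 300, "BB": 400},
}


def regress(events, pa, mean_rate, ballast_pa):
    """Blend the player's own line with `ballast_pa` PA at `mean_rate`."""
    return (events + mean_rate * ballast_pa) / (pa + ballast_pa)
```

A batter who struck out 30 times in 100 PA, against a 20 percent mean, regresses to (30 + 20) / 200 = 25 percent one year out; the same formula with 300 PA of ballast pulls him closer to the mean for a two-year forecast.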
Defense
Each player on defense is rated on several skills.
Outfielders:
- Percent of air balls caught
- Extra bases on air-ball hits
- Extra bases on groundball hits
- Extra bases by baserunners (planned)
Infielders:
- Percent of grounders kept in infield
- Percent of those made into outs (bunts listed separately)
- Double play starts
- Double play pivots
Catchers and pitchers:
- Percent of infield grounders made into outs
- Wild pitches and double plays
- Stolen bases/caught stealing
The number of groundball hits to the outfield that are assigned to the infielder deemed most responsible for the ball is estimated based on the distribution of infield grounders given the same ballpark, batted ball vector, base/out state and handedness of the batter. For all others, the actual totals are counted as accurately as possible from the play-by-play descriptions.
The counted totals are compared to each player’s expected totals to get an adjusted rate and runs allowed total. Each of these skills is graded separately. For example, Derek Jeter might rate as -10 runs on his range but +3 on infield grounders. The sum represents the player’s total rating at each position.
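Each skill's rating can be sketched as the gap between counted and expected plays, converted to runs. The per-play run values below are hypothetical placeholders, not Oliver's; for skills where fewer is better (such as extra bases allowed), a negative run value flips the sign so that positive numbers always mean runs saved.

```python
def component_runs(actual, expected, run_value):
    """Runs saved on one skill: (actual - expected) plays times runs per play."""
    return (actual - expected) * run_value


def position_rating(components):
    """components: {skill: (actual, expected, run_value)}.
    Returns per-skill runs and their sum, the rating at that position."""
    runs = {skill: component_runs(*vals) for skill, vals in components.items()}
    return runs, sum(runs.values())


# Hypothetical shortstop: fewer plays made on range than expected,
# slightly more outs than expected on infield grounders.
runs, total = position_rating({
    "range": (380, 395, 0.75),
    "grounders": (300, 296, 0.75),
})
```

Grading each skill separately like this is what allows a split verdict, such as a negative range rating alongside a positive rating on grounders fielded.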
Playing time
Oliver estimates future playing time by looking at past playing time and how good the player is relative to other players in the same league. These estimates are used in the six-year forecast.
At the major league level, The Hardball Times staff maintains a depth chart for each team, in which the remaining playing time each season is divided among players under contract, based on the opinion of the editor.
Wins above replacement
WAR has two components: the rate of production (measured in runs) and playing time. Because the two are multiplied, a player’s WAR can rise or fall with changes in playing time even when his rates of production stay the same.
All outcomes, whether for batting, pitching, baserunning or defense, can be expressed as a number of runs. Batting and pitching are compared to replacement level, while defense and baserunning are compared to league average. With everything measured in the same unit, runs, the components can be summed and then divided by 10 to get the number of wins, plus or minus, that the player contributed to his team.
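Putting the components together for a position player: with runs as the common unit and 10 runs per win, WAR is the sum of the run components divided by 10, and the batting component is a per-PA rate times playing time. The function names and the per-PA framing are illustrative assumptions.

```python
RUNS_PER_WIN = 10  # conversion used in the article


def batting_rar(rar_per_pa, pa):
    """Production rate times playing time: batting runs above replacement."""
    return rar_per_pa * pa


def war(batting_rar_total, baserunning_runs, fielding_runs, runs_per_win=RUNS_PER_WIN):
    """Batting is measured against replacement; baserunning and fielding
    against league average. All are in runs, so they can simply be added."""
    return (batting_rar_total + baserunning_runs + fielding_runs) / runs_per_win
```

For instance, 30 batting runs above replacement with no baserunning value and -3 fielding runs gives (30 - 3) / 10 = 2.7 WAR; halving the plate appearances at the same rate halves the batting component.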
Replacement level is considered the minimum level of talent necessary to be a major league player. It would be expected that roughly half of the players at replacement level would be in the majors, the other half in the minors. This is the freely available talent (major league non-tenders and minor league free agents) who, during the season, occasionally start for a poor team, sit on the bench for many others, or wait in Triple-A to replace a disabled major leaguer.
Production is compared to the replacement level at each fielding position, which is approximately 90 percent of the average batting wOBA of players at that position. Major league first basemen have the highest average wOBA of any position, .357, while catchers have the lowest, .312. The Yankees’ catching prospect, Jesus Montero, is projected to have a batting wOBA of .357 while rating -3 fielding runs. At catcher, his bat is 30 runs above replacement; after subtracting the 3 fielding runs, that produces 2.7 WAR, ranking sixth. If moved to first base, Montero’s same batting line would rank 14th, only 11 runs above replacement, as it is easier to find a player capable of playing first base and hitting this well than it is to find one who can catch.
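Positional replacement levels can be sketched from the averages quoted above. The 0.90 share is the article's approximation, while the wOBA scale of 1.15 and the plate-appearance total are hypothetical inputs, so this does not reproduce the 30- and 11-run figures exactly; it only shows why the same batting line is worth more behind the plate.

```python
POSITION_WOBA = {"1B": 0.357, "C": 0.312}  # MLB positional averages cited above
REPLACEMENT_SHARE = 0.90                   # "approximately 90 percent"


def replacement_woba(pos):
    """Replacement-level wOBA at a position."""
    return REPLACEMENT_SHARE * POSITION_WOBA[pos]


def runs_above_replacement(woba, pos, pa, woba_scale=1.15):
    """Convert the wOBA gap over replacement into runs.
    woba_scale and pa are hypothetical inputs, not Oliver's values."""
    return (woba - replacement_woba(pos)) / woba_scale * pa


rar_c = runs_above_replacement(0.357, "C", 600)
rar_1b = runs_above_replacement(0.357, "1B", 600)
```

A .357 wOBA clears the catcher replacement level (about .281) by roughly twice as much as the first-base level (about .321), so the catcher version is worth about twice as many runs.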
Final words
Collecting data from the amateur ranks to foreign leagues to the minors and the majors, Oliver attempts to capture all aspects of a player's contribution, as well as the context in which the player performed, to give the most accurate statistical profile available, especially for players with little or no major league experience. I am proud of what Oliver can do now, but there’s always room to look for ways to improve existing methods and to add more features.
Also, please let me know in the comments if there is an aspect of Oliver you would like me to cover in more detail in future articles, and I will do my best to accommodate your requests.
Brian got his start in amateur baseball way back in the 1970s as the statistician for his local college summer league in Johnstown, Pa., which also hosts the annual All-American Amateur Baseball Association tournament. A longtime APBA and Strat-o-Matic player, he still tends to look at everything as a simulation. He has also written for StatSpeak and FanGraphs, was runner-up in the Baseball Prospectus Idol competition, and has consulted for a major league team.
Brian - Thanks for taking the time to write up this overview of your Oliver methodology. Unfortunately, it doesn’t answer either of the two questions that I posed last week. First, why do the six-year projections continue to give players significant playing time after their projected WAR has dropped substantially, sometimes to below replacement level? And second, why does Andres Torres’s projected MLE defensive value as a center fielder greatly exceed any rate of performance he has actually achieved? Perhaps a step-by-step example using Torres would help me and others understand how you are making these evaluations. Thanks.