Minor league run environments
by Justin Inaz
February 23, 2010
Minor league run environments vary substantially from league to league. As a result, any time we evaluate a minor leaguer's hitting or pitching stats, we need to consider the context of those performances. Alex Pedicini had a brief but nice series at Hardball Times breaking down these league differences, but I wanted to take a deeper look at run environments, including an investigation of how they would model using Base Runs.
More on Base Runs later. Let's start with a graph looking at how the run environments of the minor leagues (and major leagues) vary:
The first thing to take from that graph is how much the run environments of these leagues vary. The NL and AL are fairly intermediate, while the minor leagues vary by about a half-run per game in either direction. The Florida State League (high Single-A) is a notorious pitchers' league, but I was surprised to see the International League (Triple-A) virtually tied with the Gulf Coast League (low Rookie) for the second-lowest run environment. Maybe I need to give the Reds' Triple-A Louisville players a bit more credit for their production (and be more cautious about their Triple-A pitchers!). At the other end of the spectrum are the high-scoring leagues: the California (high Single-A), the Arizona Summer League (Rookie), and the Pioneer League (Rookie) all averaged more than five runs per game from 2007-2009.
Here is a sampling of offensive statistics from each of these leagues so that we can see where the differences come from (again, using 2007-2009 data). The table is sortable if you click in the header.
These are substantial differences. For example, both the Florida State League and the California League are high Single-A leagues, and thus (probably) have roughly equivalent talent levels. What these data are saying, therefore, is that you could take an average hitter from the Florida State League (hitting .256/.324/.374) and move him to the California League, where he'd "improve" to .271/.339/.418. The reason? Nothing to do with the player himself. Rather, it's probably some combination of environmental factors (humidity, altitude) and ballparks. Nevertheless, it means that you have to be very careful about how you interpret a player's statistics coming from these leagues.
It's also the case that hitter leagues don't "achieve" those run environments in the same way. The Pacific Coast League gets its five-plus runs per game thanks to the highest minor league home run rate, second-highest OBP, but relatively low error rates (again, you can click in the header to sort the table and see this more easily). The Arizona Summer League, in contrast, has pretty high AVG and OBP, very high error rates (better only than the foreign rookie leagues), but relatively weak power totals (including one of the lowest HR/PA rates in the minors at 1.1 percent). The Dominican Summer League is one of the more interesting: the worst AVG, SLG, and HR percentage in baseball, but also the highest walk rate, error rate, and stolen base attempt rates you will find: Small ball is alive and well in the Dominican Republic!
Speaking of error rates, another interesting finding from this table can be seen if you click on the Error (E%) header above. This is showing the percentage of plate appearances that involve an error, and you can see that there is a virtually perfect relationship between error rate and quality of play. This has been observed before (see also this related piece by Harry), but I was surprised how strong the relationship is. Better-quality leagues have fewer errors. You see similar, though not as strong, relationships between league quality and stolen bases attempted per opportunity (SBA/Opp; more attempts at younger levels, almost without regard to run environment), home run rate (HR%; better leagues have more home runs, probably a reflection of hitter quality), and (perhaps) unintentional walk rate (niBB%; more walks at lower levels, probably due to young pitchers with lousy command).
The final thing I'd like you to take away from this is that "my" Base Runs model is doing a nice job of estimating the runs produced within each league. What is Base Runs? It's the best available run estimator today. If you're not familiar with it, Patriot wrote probably the best introduction, while Tango's series really demonstrates its power over other run estimators. Briefly, though, Base Runs is a simple approach to modeling run scoring in baseball, and written in English looks like:
[Baserunners] * [Baserunner Scoring Rate] + [Home Runs] = Runs
The major innovation of Base Runs over earlier estimators like Runs Created was the special treatment given to the home run. This helps it handle a much wider range of offensive environments than any other run estimator.
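To make that structure concrete, here is a minimal Python sketch of the generic Base Runs form, BsR = A * B/(B+C) + D. The function name and the sample inputs are hypothetical, and the step that turns raw events into the A, B, C, and D terms (which is where the coefficients live) is deliberately left out, since it differs from one version of the estimator to the next.

```python
def base_runs(a, b, c, d):
    """Generic Base Runs form: A * B / (B + C) + D.

    a: baserunners (times on base, not counting home runs)
    b: advancement factor (weighted sum of events that move runners along)
    c: outs
    d: home runs, which always score the batter
    """
    if b + c == 0:
        # No advancement and no outs recorded: only home runs can be counted.
        return float(d)
    scoring_rate = b / (b + c)  # estimated share of baserunners who score
    return a * scoring_rate + d


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = base_runs(a=100, b=30, c=70, d=10)
print(example)
```

Note how the home run term sits outside the scoring-rate product: a league with no baserunners at all would still be credited with one run per home run, which is the special treatment described above.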
I started with a Base Runs equation (from this work by Tango) and tweaked it (equation shown in resources below) to accurately predict linear weights values for MLB 2007-2008 (kindly provided by Colin Wyers via e-mail). With that equation in hand, I then ran it on 2007-2009 totals for both of the major leagues as well as all of their affiliates. I am using the exact same equation in each league, and yet the root mean square error is just 0.06 runs per game. That figure is after adjusting for an average shortfall of about 0.05 runs per game. I'm including as many events as I can find (passed balls, wild pitches, errors, etc.), but there are ways to score runs that I don't have in my minor league data set, and perhaps as a result I'm missing about eight runs per 162 games from each league.
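For readers who want to see what "error after adjusting for the average shortfall" means in practice, here is a small Python sketch. It is one plausible way to do the calculation, not necessarily the exact procedure used for this article, and the runs-per-game values are made up purely for illustration.

```python
import math


def bias_and_adjusted_rmse(actual_rpg, estimated_rpg):
    """Return (mean error, RMSE after removing that mean error)."""
    errors = [est - act for act, est in zip(actual_rpg, estimated_rpg)]
    mean_error = sum(errors) / len(errors)  # negative = systematic shortfall
    adjusted = [e - mean_error for e in errors]
    rmse = math.sqrt(sum(e * e for e in adjusted) / len(adjusted))
    return mean_error, rmse


# Hypothetical runs per game for four leagues (not real data).
actual = [4.50, 5.10, 4.20, 4.80]
estimated = [4.44, 5.08, 4.18, 4.70]
mean_error, rmse = bias_and_adjusted_rmse(actual, estimated)
print(f"mean error: {mean_error:+.3f} R/G, adjusted RMSE: {rmse:.3f} R/G")

# Scaling a 0.05 R/G shortfall to a 162-game schedule.
runs_per_162 = 0.05 * 162
print(f"shortfall per 162 games: {runs_per_162:.1f} runs")
```

The last two lines are just the arithmetic behind the "about eight runs per 162 games" figure: 0.05 runs per game times 162 games.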
One of the nice things about having a good-performing Base Runs equation like this is that one can use it to produce league-specific linear weights. With the aid of Patriot's spreadsheet that automates this process, here are the linear weights (in absolute runs) in a Google Spreadsheet. How well they work depends on how well the Base Runs equation is working: we're getting good matches in overall runs estimates, but the truth is that I don't really know if what is happening at the individual event level is correct (except that it is probably very close for the major leagues). However, in the absence of play-by-play data (maybe someday I'll try to get there with Gameday), this is probably the next best approach to linear weight generation. At least for players who don't have particularly unique skill sets, using these linear weights should give you a good estimate of their absolute runs production.
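One common way to pull linear weights out of a Base Runs equation is the "plus-one" method: add a single event to the league totals and see how much the run estimate changes. The Python sketch below illustrates the idea only. The coefficients in TOY_COEFFS and the league totals are hypothetical placeholders, not the equation or data used here, and Patriot's spreadsheet may well do the calculation differently (for instance, with partial derivatives).

```python
# Hypothetical, heavily simplified coefficients: how one of each event
# contributes to the A, B, C, and D terms. NOT the equation from this article.
TOY_COEFFS = {
    "single":   {"A": 1.0, "B": 0.8, "C": 0.0, "D": 0.0},
    "walk":     {"A": 1.0, "B": 0.1, "C": 0.0, "D": 0.0},
    "home_run": {"A": 0.0, "B": 1.8, "C": 0.0, "D": 1.0},
    "out":      {"A": 0.0, "B": 0.0, "C": 1.0, "D": 0.0},
}


def base_runs_from_events(totals, coeffs=TOY_COEFFS):
    """Sum each event's contribution to A, B, C, D, then apply A*B/(B+C)+D."""
    terms = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
    for event, count in totals.items():
        for term, weight in coeffs[event].items():
            terms[term] += weight * count
    return terms["A"] * terms["B"] / (terms["B"] + terms["C"]) + terms["D"]


def plus_one_weights(totals, coeffs=TOY_COEFFS):
    """Run value of each event: BsR with one extra event minus baseline BsR."""
    baseline = base_runs_from_events(totals, coeffs)
    weights = {}
    for event in coeffs:
        bumped = dict(totals)
        bumped[event] = bumped.get(event, 0) + 1
        weights[event] = base_runs_from_events(bumped, coeffs) - baseline
    return weights


# Hypothetical league totals (not real data).
league_totals = {"single": 9000, "walk": 4500, "home_run": 1200, "out": 36000}
weights = plus_one_weights(league_totals)
for event, value in weights.items():
    print(f"{event}: {value:+.3f} runs")
```

Because the baseline totals differ from league to league, the same equation yields different weights in a high-scoring league than in a low-scoring one, which is what makes the resulting linear weights league-specific.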
References and Resources
All minor league data were pulled from baseball-reference.com.
The Base Runs equation I used was modified from one by Tango: (BsR = A* B/(B+C) + D):
The major differences are that I separated the error terms ("Error" in Tango's work follows Retrosheet and refers only to Reached On Errors as far as I can tell), added GDPs, and tweaked the "b" coefficients. The largest tweak was on non-intentional walks, which saw its coefficient nearly doubled. I don't claim that my approach was particularly scientific here, but again, the equation now does a good job of matching Colin Wyers' empirically derived 2007-2008 linear weights when run on 2007-2008 MLB data. In general, it seems to get you to within an average of 0.11 runs per game across all leagues (0.05 of which is a systematic underestimate of runs; I could force the b-term to match actual production, but my preference was to avoid doing this in this case).
Justin is a life-long Reds fan who keeps the faith at Red Reporter. He also writes at FanGraphs and on twitter.