Adjusting minor league rates
by Harry Pavlidis
January 07, 2011
With minor league data available from MLBAM's Gameday, researchers have access to a world of information that would've been difficult to imagine a decade ago. Play-by-play, and sometimes pitch-by-pitch, records cover everything from winter ball and rookie ball all the way to the top levels of the minors. Want to study line drive rates in the Australian Winter League? You can do it.
It is a double-edged sword, as doing anything with minor league data can be tricky—comparing performance between leagues and levels, dealing with all sorts of different scoring environments and park environments, etc. Add in the complexities of batted-ball tagging and you've got a challenging, but worthwhile, task at hand. We'll explore one approach to tackling the problem, along with some actual examples of minor league conversions.
One way to consume endless hours of time is to take the Gameday data and shape it into a projection system. The images below describe one such opportunity, though it is by no means the best, the easiest or proven to work for that matter.
First is a conceptual flow of information, starting with park-adjusted data and aggregation for benchmarks (or regression targets), league adjustments and even aging patterns (gray boxes). You can put it all together at various stages (blue boxes) to arrive at a "projection". There are useful stops along the way. I can shoot holes in this approach—feel free to take aim.
For background on why one may arrive at such a process, you can read up on why we regress, and on why we need to be aware of, and attempt to handle, park factors and stringer bias (here, here and here).
Ground balls will be one of the elements explored below. We've looked at league-by-league groundball rates in the past, but this is different. Instead of saying, "This league average is more or less than this league average", we want to compare the performance of individual players across leagues (levels) in the same season.
Clay Davenport recently published an excellent review of groundball conversion rates at Baseball Prospectus. Davenport's study covered groundball outs as an attempt to remove bias from his batted-ball classifications (provided by BIS). We'll attempt to hold our noses and deal with bias issues through the park/stringer adjustments, and include safe hits as well. We'll also cover a few other events beyond ground balls.
Going back to the general approach shown above, the data in question can be broken down as shown below. The white boxes indicate termini, which happen to be the places where expected outs and run values should be calculated. We'll discuss the tRA-esque approach this entails at another time.
Brian Cartwright's Oliver system, the basis of the THT Forecasts, makes tremendous use of non-major league data to provide a breadth of projections that no other publicly available system currently offers.
Still, I want a pony. And my own toys. So I'm working on Harry's Arguably Redundant Use of Marcel Plus Hackery (HARUMPH), with regular help and guidance from Mr. Cartwright himself.
Cartwright's Oliver stands out for a variety of reasons, in particular its robust Major League Equivalents (MLE) framework. The metrics employed in Oliver's MLE calculations are not as exhaustive as the approach required for HARUMPH. This is not a bad thing, as there are downsides to the all-in approach, and Cartwright's focus is empirically supported.
There are many ways to skin this cat, but this is the path I've settled on. Someday we'll find out if it works.
First set of measurements
Before even dipping a toe into outcomes, we have to deal with events. For balls in play, these are the basic four rates to consider.
- Balls in play per batter faced (BIPpBF)
- Ground balls per ball in play (GBpBIP)
- Line drives per ball in air (LDpBIA)
- Pop-ups per fly ball plus pop-up (PUpPUFB)
From these, you can derive pretty much anything else you need.
- BIPpBF*BF = BIP
- GBpBIP*BIP = GB
- LDpBIA*(BIP-GB) = LD
- PUpPUFB*(BIP-GB-LD) = PU
- BIP-GB-LD-PU = FB
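The chain of derivations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name, the `BattedBallCounts` container and the sample rates are all made up for the example and are not taken from any real pitcher or from HARUMPH itself.

```python
from dataclasses import dataclass


@dataclass
class BattedBallCounts:
    """Hypothetical container for the derived batted-ball counts."""
    bip: float  # balls in play
    gb: float   # ground balls
    ld: float   # line drives
    pu: float   # pop-ups
    fb: float   # fly balls


def counts_from_rates(bf, bip_p_bf, gb_p_bip, ld_p_bia, pu_p_pufb):
    """Derive counts from batters faced and the four basic rates."""
    bip = bip_p_bf * bf                 # BIPpBF * BF = BIP
    gb = gb_p_bip * bip                 # GBpBIP * BIP = GB
    ld = ld_p_bia * (bip - gb)          # LDpBIA * (BIP - GB) = LD
    pu = pu_p_pufb * (bip - gb - ld)    # PUpPUFB * (BIP - GB - LD) = PU
    fb = bip - gb - ld - pu             # whatever is left is a fly ball
    return BattedBallCounts(bip, gb, ld, pu, fb)


# Illustrative rates only, not measured values.
counts = counts_from_rates(
    bf=400, bip_p_bf=0.70, gb_p_bip=0.45, ld_p_bia=0.30, pu_p_pufb=0.15
)
print(counts)
```

Because each rate is defined on what remains after the previous step, the four batted-ball types always add back up to BIP.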
Everything starts with measuring, adjusting and regressing the four rates. The impact of regression on the distributions of BIPpBF, GBpBIP, LDpBIA and PUpPUFB is shown below. The first row shows regressed (blue) and unregressed (red) distributions; each "stripe" is a single season from 2007 to 2010. Notice the extreme differences in scale. The vertical axis shows the percentage change in the statistic relative to the minor league performance. Positive values mean higher rates in the major leagues than in the minor leagues. Each dot represents one pitcher who worked in both Triple-A and MLB during that season (minimum 50 batters faced in each).
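The regression method isn't spelled out here, so the following is only a minimal sketch under an assumption: a simple shrink-toward-the-league-average approach, where a hypothetical `ballast` of league-average trials is added to each pitcher's observed sample. The `pct_change` helper mirrors the vertical axis described above. Both function names and all the numbers are invented for illustration.

```python
def regress_rate(successes, trials, league_rate, ballast):
    """Shrink an observed rate toward league_rate.

    Assumption: regression is done by adding `ballast` phantom trials
    at the league-average rate. The real process may differ.
    """
    return (successes + league_rate * ballast) / (trials + ballast)


def pct_change(minor_rate, major_rate):
    """Percentage change relative to the minor league rate.

    Positive means the rate was higher in the major leagues.
    """
    return 100.0 * (major_rate - minor_rate) / minor_rate


# Hypothetical pitcher: 60 ground balls on 120 balls in play in Triple-A.
raw_gb = 60 / 120
regressed_gb = regress_rate(60, 120, league_rate=0.44, ballast=200)
print(f"raw {raw_gb:.3f}, regressed {regressed_gb:.4f}")
print(f"change to a 0.45 MLB rate: {pct_change(raw_gb, 0.45):.1f}%")
```

The smaller the sample, the harder the rate is pulled toward the league average, which is why regressed and unregressed distributions can look so different in scale.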
If you click the third image above, you'll see both regressed and unregressed seasons on the same chart. Depending on which numbers you use, regressed or unregressed, you can derive slightly different conversion factors.
Both the International League (IL) and Pacific Coast League (PCL) provide reasonable samples for Triple-A-to-MLB conversions. For each pitcher with at least 50 batters faced at two levels, I selected the lesser number of batters faced as each pitcher's weighting in the calculation. The tables below summarize the average age, pitching seasons (with unique pitchers in parentheses) and the total number of "weighted" batters faced. Each of the four rates charted above is calculated using both regressed and unregressed seasonal data.
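As a rough sketch of that weighting, assume the conversion factor for a rate is the weighted major league rate divided by the weighted minor league rate, with each pitcher-season weighted by the lesser of his two batters-faced totals. That ratio form is an assumption made for this example, and the function name and sample tuples are hypothetical.

```python
MIN_BF = 50  # minimum batters faced required at each level


def conversion_factor(pitcher_seasons, min_bf=MIN_BF):
    """Return (factor, total weighted BF) for one rate.

    pitcher_seasons: iterable of
        (minor_bf, minor_rate, major_bf, major_rate)
    Assumption: factor = weighted MLB rate / weighted minor league rate.
    """
    num = 0.0
    den = 0.0
    weighted_bf = 0
    for minor_bf, minor_rate, major_bf, major_rate in pitcher_seasons:
        weight = min(minor_bf, major_bf)  # the lesser number of batters faced
        if weight < min_bf:
            continue                      # too small a sample at one level
        num += weight * major_rate
        den += weight * minor_rate
        weighted_bf += weight
    if den == 0:
        raise ValueError("no qualifying pitcher seasons")
    return num / den, weighted_bf


# Made-up pitcher seasons; the third fails the 50 BF minimum.
sample = [
    (300, 0.50, 100, 0.45),
    (80, 0.40, 200, 0.42),
    (40, 0.60, 500, 0.30),
]
factor, total_bf = conversion_factor(sample)
print(f"factor {factor:.4f} on {total_bf} weighted batters faced")
```

Feeding the same function regressed or unregressed rates is what produces the two slightly different sets of conversion factors.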
Yes, that's right, PCL pitchers are expected to allow fewer line drives per ball in air when moving to the Major Leagues. But, no, that's not necessarily true. We expect fewer balls that are allowed to be hit in the air to be tagged as line drives. That's the reality of this situation, as everything is based on human-generated tags.
Same thing, but now for the IL.
Using the IL as an example, let's estimate our expected AL and NL GBpBIP and LDpBIA rates using both the regressed and unregressed conversion factors.
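Applying a conversion factor is just a multiplication. The sketch below assumes the factor is a simple multiplier on the minor league rate; the factor values in `FACTORS` are placeholders invented for the example, not the measured IL factors.

```python
# Placeholder conversion factors, invented for illustration only.
FACTORS = {
    ("AL", "regressed"): 0.98,
    ("AL", "unregressed"): 0.97,
    ("NL", "regressed"): 1.01,
    ("NL", "unregressed"): 1.02,
}


def expected_rate(minor_rate, factor):
    """Scale a minor league rate by a factor, clamped to the range 0..1."""
    return min(1.0, max(0.0, minor_rate * factor))


def expected_rates(minor_rate, factors):
    """Expected rate for every (league, regression type) pair."""
    return {key: expected_rate(minor_rate, f) for key, f in factors.items()}


il_gb_p_bip = 0.45  # hypothetical IL ground ball rate
for (league, kind), rate in expected_rates(il_gb_p_bip, FACTORS).items():
    print(f"EXP {league} ({kind}): {rate:.3f}")
```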
Let's take a longer jump, from Double-A. The Eastern League happens to have a substantial-ish amount of data available for jumps to both Major Leagues. The Southern and Texas Leagues both feed the show directly, but not both circuits in volume.
And the expected rates for EL pitchers moving to the Major Leagues:
| EL GBpBIP | EXP AL | EXP AL | EXP NL | EXP NL |
| EL LDpBIA | EXP AL | EXP AL | EXP NL | EXP NL |
Refinements to the process described above are in order. We'll see where we go from here, which will be influenced by your feedback.
References and Resources
Batted ball data from MLBAM
Harry Pavlidis admits he has a baseball problem. He is the founder of Pitch Info LLC. His pitch classifications power the player cards at Brooksbaseball.net. Feedback, questions and comments are appreciated - Email firstname.lastname@example.org and Twitter @harrypav