May 26, 2013

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

Adjusting minor league rates

by Harry Pavlidis
January 07, 2011

With minor league data available from MLBAM's Gameday, researchers have access to a world of information that would've been difficult to imagine a decade ago. Play-by-play, and sometimes pitch-by-pitch, records cover winter ball, rookie ball and everything up to the top levels of the minors. Want to study line drive rates in the Australian Winter League? You can do it.

It is a double-edged sword, as doing anything with minor league data can be tricky—comparing performance between leagues and levels, dealing with all sorts of different scoring environments and park environments, etc. Add in the complexities of batted-ball tagging and you've got a challenging, but worthwhile, task at hand. We'll explore one approach to tackling the problem, along with some actual examples of minor league conversions.

Got time?


One way to consume endless hours of time is to take the Gameday data and shape it into a projection system. The images below describe one such opportunity, though it is by no means the best, the easiest or proven to work for that matter.

First is a conceptual flow of information, starting with park-adjusted data and aggregation for benchmarks (or regression targets), league adjustments and even aging patterns (gray boxes). You can put it all together at various stages (blue boxes) to arrive at a "projection". There are useful stops along the way. I can shoot holes in this approach—feel free to take aim.

image

For background on why one may arrive at such a process, you can read up on why we regress, and on why we need to be aware of and attempt to handle park factors and stringer bias (here, here and here).

Ground balls will be one of the elements explored below. We've looked at league-by-league groundball rates in the past, but this is different. Instead of saying, "This league average is more or less than this league average", we want to compare the performance of individual players across leagues (levels) in the same season.

Looking across


Clay Davenport recently published an excellent review of groundball conversion rates at Baseball Prospectus. Davenport's study covered groundball outs as an attempt to remove bias from his batted-ball classifications (provided by BIS). We'll attempt to hold our nose and deal with bias issues through the park/stringer adjustments, and include safe hits as well. We'll also cover a few other events beyond ground balls.

Going back to the general approach shown above, the data in question can be broken down as shown below. The white boxes indicate termini, which happen to be the places where expected outs and run values should be calculated. We'll discuss the tRA-esque approach this entails at another time.

image

Brian Cartwright's Oliver system, the basis of the THT Forecasts, makes tremendous use of non-major league data to provide a breadth of projections that no other publicly available system currently offers.

Still, I want a pony. And my own toys. So I'm working on Harry's Arguably Redundant Use of Marcel Plus Hackery (HARUMPH), with regular help and guidance from Mr. Cartwright himself.

Cartwright's Oliver stands out for a variety of reasons, in particular its robust Major League Equivalents (MLE) framework. The metrics employed in Oliver's MLE calculations are not as exhaustive as the approach required for HARUMPH. This is not a bad thing, as there are downsides to the all-in approach, and Cartwright's focus is empirically supported.

There are many ways to skin this cat, but this is the path I've settled on. Someday we'll find out if it works.

First set of measurements


Before even dipping a toe into outcomes, we have to deal with events. For balls in play, these are the basic four rates to consider.

  • Balls in play per batter faced (BIPpBF)
  • Ground balls per ball in play (GBpBIP)
  • Line drives per ball in air (LDpBIA)
  • Pop-ups per Flyball+Pop-up (PUpPUFB)


From these, you can derive pretty much anything else you need.

  • BIPpBF*BF = BIP
  • GBpBIP*BIP = GB
  • LDpBIA*(BIP-GB) = LD
  • PUpPUFB*(BIP-GB-LD) = PU
  • BIP-GB-LD-PU = FB
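
The chain of derivations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the list: the function name, the argument names and the sample rates are made up for the example and do not come from HARUMPH itself.

```python
def batted_ball_counts(bf, bip_per_bf, gb_per_bip, ld_per_bia, pu_per_pufb):
    """Turn the four rates into expected event counts for `bf` batters faced."""
    bip = bip_per_bf * bf        # BIPpBF * BF = BIP
    gb = gb_per_bip * bip        # GBpBIP * BIP = GB
    bia = bip - gb               # balls in air = everything that is not a grounder
    ld = ld_per_bia * bia        # LDpBIA * (BIP - GB) = LD
    pufb = bia - ld              # pop-ups plus fly balls
    pu = pu_per_pufb * pufb      # PUpPUFB * (BIP - GB - LD) = PU
    fb = pufb - pu               # BIP - GB - LD - PU = FB
    return {"BIP": bip, "GB": gb, "LD": ld, "PU": pu, "FB": fb}


# Hypothetical pitcher: 600 batters faced and invented rates.
counts = batted_ball_counts(
    bf=600, bip_per_bf=0.70, gb_per_bip=0.45, ld_per_bia=0.30, pu_per_pufb=0.20
)
for event, n in counts.items():
    print(f"{event}: {n:.1f}")
```

Because each rate is defined on what is left over from the previous step, the four batted-ball types always add back up to the ball-in-play total.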


Everything starts with measuring, adjusting and regressing the four rates. The impact of regression on the distributions of BIPpBF, GBpBIP, LDpBIA and PUpPUFB is shown below. The first row shows regressed (blue) and un-regressed (red) distributions; each "stripe" is a single season from 2007 to 2010. Click to zoom, and notice the extreme differences in scale. The vertical axis shows the percentage change in the statistic relative to the minor league performance. Positive values mean higher rates in the major leagues than in the minor leagues. Each dot represents one pitcher who worked in both Triple-A and MLB during that season (minimum 50 batters faced in each).
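
The article does not spell out how the rates are regressed. As a rough illustration only, here is one common Marcel-style way to do it: pad the observed sample with a fixed number of league-average trials. The function name, the league rate and the stabilizer value are all assumptions made for this sketch, not the values used in HARUMPH.

```python
def regress_rate(successes, trials, league_rate, stabilizer):
    """Shrink an observed rate toward the league rate.

    Adds `stabilizer` phantom trials of league-average performance to the
    observed sample; small samples move a lot, large samples barely move.
    """
    return (successes + league_rate * stabilizer) / (trials + stabilizer)


# Hypothetical pitcher: 30 ground balls on 50 balls in play (0.600 observed),
# an assumed league GBpBIP of 0.45 and an assumed stabilizer of 100 BIP.
regressed = regress_rate(successes=30, trials=50, league_rate=0.45, stabilizer=100)
print(round(regressed, 3))
```

With only 50 balls in play the estimate is pulled most of the way back toward the league rate, which is why the regressed distributions are so much tighter than the unregressed ones.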





imageimage
image



If you click the third image above, you'll see both regressed and unregressed seasons on the same chart. Depending on which numbers you use, regressed or unregressed, you can derive slightly different conversion factors.

Sample conversions


Both the International League (IL) and Pacific Coast League (PCL) provide reasonable samples for Triple-A-to-MLB conversions. For each pitcher with at least 50 batters faced at two levels, I selected the lesser number of batters faced as each pitcher's weighting in the calculation. The tables below summarize the average age, pitching seasons (with unique pitchers in parentheses) and the total number of "weighted" batters faced. Each of the four rates charted above is calculated using both regressed and unregressed seasonal data.
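
A minimal sketch of that weighting follows, assuming the conversion factor is the weighted average of each pitcher's percentage change between levels. The exact aggregation is not stated in the text, so treat the formula, the `PitcherSeason` class and the sample numbers as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    minors_rate: float   # e.g. GBpBIP at Triple-A
    majors_rate: float   # same rate in the majors, same season
    minors_bf: int       # batters faced in the minors
    majors_bf: int       # batters faced in the majors


def conversion_factor(seasons, min_bf=50):
    """Return (weighted mean percentage change, total weighted BF)."""
    weighted_sum = 0.0
    total_weight = 0
    for s in seasons:
        weight = min(s.minors_bf, s.majors_bf)   # the lesser BF total is the weight
        if weight < min_bf:                      # need at least 50 BF at both levels
            continue
        pct_change = s.majors_rate / s.minors_rate - 1.0
        weighted_sum += weight * pct_change
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no pitcher seasons meet the minimum BF requirement")
    return weighted_sum / total_weight, total_weight


# Invented sample: the third season is dropped (only 40 BF in the minors).
sample = [
    PitcherSeason(0.50, 0.45, 200, 80),
    PitcherSeason(0.40, 0.40, 60, 300),
    PitcherSeason(0.50, 0.60, 40, 100),
]
factor, wbf = conversion_factor(sample)
print(f"factor={factor:.3f}, wBF={wbf}")
```

Weighting by the smaller of the two samples keeps a pitcher with 400 batters faced in Triple-A and a 50-batter cup of coffee from counting as heavily as one with substantial time at both levels.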







PCL to AL    Age     Seasons (pitchers)    wBF
             26.6    175 (138)             20365

PCL to AL    BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.051    -0.080     0.075      0.111
REG           0.038    -0.060     0.000      0.013





PCL to NL    Age     Seasons (pitchers)    wBF
             26.9    230 (181)             23562

PCL to NL    BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.052    -0.069     0.064      0.111
REG           0.036    -0.051     0.002      0.021




Yes, that's right, PCL pitchers are expected to allow fewer line drives per ball in air when moving to the Major Leagues. But, no, that's not necessarily true. What we actually expect is that fewer of the balls hit in the air against them will be tagged as line drives. That's the reality of this situation, as everything is based on human-generated tags.

Same thing, but now for the IL.







IL to AL     Age     Seasons (pitchers)    wBF
             26.5    223 (187)             26343

IL to AL     BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.067    -0.066     0.019      0.069
REG           0.057    -0.060    -0.032     -0.020






IL to NL     Age     Seasons (pitchers)    wBF
             27.4    172 (140)             19777

IL to NL     BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.053    -0.057     0.001      0.080
REG           0.049    -0.056    -0.033     -0.017




Using the IL as an example, let's estimate our expected AL and NL GBpBIP and LDpBIA rates using both the regressed and unregressed conversion factors.














(R = regressed factors, UR = unregressed factors)

IL GBpBIP    EXP AL (R)    EXP AL (UR)    EXP NL (R)    EXP NL (UR)
0.60         0.564         0.561          0.567         0.566
0.55         0.517         0.514          0.519         0.519
0.50         0.470         0.467          0.472         0.471
0.45         0.423         0.421          0.425         0.424
0.40         0.376         0.374          0.378         0.377
0.35         0.329         0.327          0.330         0.330
0.30         0.282         0.280          0.283         0.283












(R = regressed factors, UR = unregressed factors)

IL LDpBIA    EXP AL (R)    EXP AL (UR)    EXP NL (R)    EXP NL (UR)
0.40         0.387         0.407          0.387         0.401
0.35         0.339         0.356          0.338         0.350
0.30         0.290         0.306          0.290         0.300
0.25         0.242         0.255          0.242         0.250
0.20         0.194         0.204          0.193         0.200
0.15         0.145         0.153          0.145         0.150
0.10         0.097         0.102          0.097         0.100
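
The expected rates in these tables are consistent with multiplying the minor league rate by one plus the conversion factor. For example, 0.60 × (1 − 0.060) = 0.564 for the regressed IL-to-AL GBpBIP factor. Here is a short sketch using the IL factors listed earlier; the dictionary layout and function name are just for illustration, and because the published tables were presumably built from unrounded factors, the third decimal can differ by one.

```python
# IL-to-MLB conversion factors for GBpBIP and LDpBIA, copied from the tables above.
IL_FACTORS = {
    ("AL", "REG"):   {"GBpBIP": -0.060, "LDpBIA": -0.032},
    ("AL", "UNREG"): {"GBpBIP": -0.066, "LDpBIA": 0.019},
    ("NL", "REG"):   {"GBpBIP": -0.056, "LDpBIA": -0.033},
    ("NL", "UNREG"): {"GBpBIP": -0.057, "LDpBIA": 0.001},
}


def expected_rate(minor_rate, factor):
    """Apply a percentage-change conversion factor to a minor league rate."""
    return minor_rate * (1.0 + factor)


# Expected major league GBpBIP for an IL pitcher with a 0.60 groundball rate.
for (league, kind), factors in IL_FACTORS.items():
    exp_gb = expected_rate(0.60, factors["GBpBIP"])
    print(f"{league} {kind}: {exp_gb:.3f}")
```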




Let's take a longer jump, from Double-A. The Eastern League happens to have a substantial-ish amount of data available for jumps to both Major Leagues. The Southern and Texas Leagues both feed the show directly, but not both circuits in volume.






             Age     Seasons (pitchers)    wBF
EL to AL     24      29 (28)               3077

EL to AL     BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.072    -0.164     0.624      0.216
REG           0.056    -0.107     0.285      0.051





             Age     Seasons (pitchers)    wBF
EL to NL     24.5    18 (17)               1862

EL to NL     BIPpBF    GBpBIP    LDpBIA    PUpPUFB
UNREG         0.083    -0.120     0.328      0.204
REG           0.067    -0.085     0.271      0.082




And the expected rates for EL pitchers moving to the Major Leagues:












(R = regressed factors, UR = unregressed factors)

EL GBpBIP    EXP AL (R)    EXP AL (UR)    EXP NL (R)    EXP NL (UR)
0.60         0.536         0.502          0.549         0.528
0.55         0.491         0.460          0.503         0.484
0.50         0.447         0.418          0.457         0.440
0.45         0.402         0.376          0.412         0.396
0.40         0.357         0.335          0.366         0.352
0.35         0.313         0.293          0.320         0.308
0.30         0.268         0.251          0.274         0.264














(R = regressed factors, UR = unregressed factors)

EL LDpBIA    EXP AL (R)    EXP AL (UR)    EXP NL (R)    EXP NL (UR)
0.40         0.514         0.650          0.508         0.531
0.35         0.450         0.568          0.445         0.465
0.30         0.385         0.487          0.381         0.398
0.25         0.321         0.406          0.318         0.332
0.20         0.257         0.325          0.254         0.266
0.15         0.193         0.244          0.191         0.199
0.10         0.128         0.162          0.127         0.133



Now what?


Refinements to the process described above are in order. We'll see where we go from here, which will be influenced by your feedback.


References and Resources
Batted ball data from MLBAM

Harry Pavlidis admits he has a baseball problem. He is the founder of Pitch Info LLC. His pitch classifications power the player cards at Brooksbaseball.net. Feedback, questions and comments are appreciated - Email harrypav@gmail.com and Twitter @harrypav
