Adjusting minor league rates

by Harry Pavlidis
January 7, 2011

With minor league data available from MLBAM’s Gameday, researchers have access to a world of information that would’ve been difficult to imagine a decade ago: play-by-play, and sometimes pitch-by-pitch, records covering winter ball, rookie ball and everything up to the top levels of the minors. Want to study line drive rates in the Australian Winter League? You can do it.

It is a double-edged sword, as doing anything with minor league data can be tricky: comparing performance between leagues and levels, dealing with widely varying scoring and park environments, and so on. Add in the complexities of batted-ball tagging and you’ve got a challenging, but worthwhile, task at hand. We’ll explore one approach to tackling the problem, along with some actual examples of minor league conversions.

Got time?

One way to consume endless hours is to take the Gameday data and shape it into a projection system. The images below describe one such approach, though it is by no means the best or the easiest, nor is it proven to work, for that matter.

First is a conceptual flow of information, starting with park-adjusted data and aggregation for benchmarks (or regression targets), league adjustments and even aging patterns (gray boxes). You can put it all together at various stages (blue boxes) to arrive at a “projection”, and there are useful stops along the way. I can shoot holes in this approach; feel free to take aim.

For background on how one might arrive at such a process, you can read up on why we regress and why we need to be aware of, and attempt to handle, park factors and stringer bias (here, here and here).

Ground balls will be one of the elements explored below. We’ve looked at league-by-league groundball rates in the past, but this is different. Instead of saying, “This league’s average is higher or lower than that league’s average”, we want to compare the performance of individual players across leagues (levels) in the same season.

Looking across

Clay Davenport recently published an excellent review of groundball conversion rates at Baseball Prospectus. Davenport’s study focused on groundball outs in an attempt to remove bias from his batted-ball classifications (provided by BIS). We’ll instead hold our nose, deal with the bias issues through park/stringer adjustments, and include safe hits as well. We’ll also cover a few other events beyond ground balls.

Going back to the general approach shown above, the data in question can be broken down as shown below. The white boxes indicate termini, which happen to be the places where expected outs and run values should be calculated. We’ll discuss the tRA-esque approach this entails another time.

Brian Cartwright’s Oliver system, the basis of the THT Forecasts, makes tremendous use of non-major league data to provide a breadth of projections that no other publicly available system currently offers.

Still, I want a pony. And my own toys. So I’m working on Harry’s Arguably Redundant Use of Marcel Plus Hackery (HARUMPH), with regular help and guidance from Mr. Cartwright himself.

Cartwright’s Oliver stands out for a variety of reasons, in particular its robust Major League Equivalents (MLE) framework. The metrics employed in Oliver’s MLE calculations are not as exhaustive as those required by the HARUMPH approach. This is not a bad thing: there are downsides to the all-in approach, and Cartwright’s focus is empirically supported.

There are many ways to skin this cat, but this is the path I’ve settled on. Someday we’ll find out if it works.

First set of measurements

Before even dipping a toe into outcomes, we have to deal with events. For balls in play, these are the four basic rates to consider:

Balls in play per batter faced (BIPpBF)
Ground balls per ball in play (GBpBIP)
Line drives per ball in air (LDpBIA)
Pop-ups per fly ball + pop-up (PUpPUFB)

From these, you can derive pretty much anything else you need.

BIPpBF * BF = BIP
GBpBIP * BIP = GB
LDpBIA * (BIP - GB) = LD
PUpPUFB * (BIP - GB - LD) = PU
BIP - GB - LD - PU = FB
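The definitions and derivations above can be sketched in Python, going from raw counts to the four rates and back again. This is a minimal illustration, assuming every ball in play is tagged as exactly one of ground ball, line drive, fly ball or pop-up (so a ball in air is anything other than a grounder); `BattedBallCounts`, `four_rates` and `derive_counts` are hypothetical names, not part of Gameday or any published tool.

```python
from dataclasses import dataclass


@dataclass
class BattedBallCounts:
    """Raw counts for one pitcher-season at one level (hypothetical container)."""
    bf: int  # batters faced
    gb: int  # ground balls
    ld: int  # line drives
    fb: int  # fly balls, excluding pop-ups
    pu: int  # pop-ups

    @property
    def bip(self) -> int:
        # Assumes every ball in play carries exactly one of the four tags.
        return self.gb + self.ld + self.fb + self.pu


def four_rates(c: BattedBallCounts) -> dict:
    """Compute BIPpBF, GBpBIP, LDpBIA and PUpPUFB from raw counts."""
    bia = c.bip - c.gb  # balls in air: line drives, fly balls and pop-ups
    pufb = c.fb + c.pu  # fly balls plus pop-ups
    return {
        "BIPpBF": c.bip / c.bf,
        "GBpBIP": c.gb / c.bip,
        "LDpBIA": c.ld / bia,
        "PUpPUFB": c.pu / pufb,
    }


def derive_counts(bf: float, bippbf: float, gbpbip: float,
                  ldpbia: float, puppufb: float) -> dict:
    """Rebuild (possibly fractional) event counts from the four rates."""
    bip = bippbf * bf               # BIPpBF * BF = BIP
    gb = gbpbip * bip               # GBpBIP * BIP = GB
    ld = ldpbia * (bip - gb)        # LDpBIA * (BIP - GB) = LD
    pu = puppufb * (bip - gb - ld)  # PUpPUFB * (BIP - GB - LD) = PU
    fb = bip - gb - ld - pu         # BIP - GB - LD - PU = FB
    return {"BIP": bip, "GB": gb, "LD": ld, "PU": pu, "FB": fb}


counts = BattedBallCounts(bf=400, gb=130, ld=50, fb=90, pu=30)
rates = four_rates(counts)
print(rates)
# Feeding the rates back in recovers the original counts (up to float rounding).
print(derive_counts(counts.bf, rates["BIPpBF"], rates["GBpBIP"],
                    rates["LDpBIA"], rates["PUpPUFB"]))
```

With 400 batters faced, 130 grounders, 50 liners, 90 fly balls and 30 pop-ups, BIPpBF is 0.75 and PUpPUFB is 0.25. The round trip is why only four rates need to be measured, adjusted and regressed: every count can be rebuilt from them.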

Everything starts with measuring, adjusting and regressing the four rates. The impact of regression on the distributions of BIPpBF, GBpBIP, LDpBIA and PUpPUFB is shown below. The first row shows regressed (blue) and unregressed (red) distributions; each “stripe” is a single season from 2007 to 2010. Notice the extreme differences in scale. The vertical axis shows the percentage change in the statistic relative to the minor league performance: positive values mean higher rates in the major leagues than in the minor leagues. Each dot represents one pitcher who worked in both Triple-A and MLB during that season (minimum 50 batters faced at each level).

The third image above shows both regressed and unregressed seasons on the same chart. Depending on which numbers you use, regressed or unregressed, you can derive slightly different conversion factors.
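The article does not spell out how the rates are regressed, so the following is only a hypothetical illustration of why regressed and unregressed changes differ. It assumes a simple shrinkage scheme (add `k` pseudo-opportunities at the league mean) and measures change relative to the minor league rate, as the vertical axis does. The regression constant, league means and pitcher line are all made-up values.

```python
def regress_rate(events: float, opportunities: float,
                 league_rate: float, k: float) -> float:
    """Shrink an observed rate toward league_rate.

    Adds k pseudo-opportunities at the league rate; this is an assumed
    scheme, not necessarily the one used in the article.
    """
    return (events + k * league_rate) / (opportunities + k)


def relative_change(mlb_rate: float, minor_rate: float) -> float:
    """Change relative to the minor league rate; positive means higher in MLB."""
    return (mlb_rate - minor_rate) / minor_rate


# Hypothetical pitcher: 60 GB in 100 BIP at Triple-A, 25 GB in 50 BIP in MLB.
K = 100.0                            # hypothetical regression constant
AAA_LEAGUE, MLB_LEAGUE = 0.44, 0.43  # hypothetical league GBpBIP means

unreg = relative_change(25 / 50, 60 / 100)
reg = relative_change(regress_rate(25, 50, MLB_LEAGUE, K),
                      regress_rate(60, 100, AAA_LEAGUE, K))
print(f"unregressed {unreg:+.3f}, regressed {reg:+.3f}")
```

Here the unregressed change is about -0.167 and the regressed change about -0.128. Pulling both seasons toward their league means damps the extreme swings that small samples produce, which is consistent with the differences in scale between the regressed and unregressed charts.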

Sample conversions

Both the International League (IL) and Pacific Coast League (PCL) provide reasonable samples for Triple-A-to-MLB conversions. For each pitcher with at least 50 batters faced at two levels, I selected the lesser number of batters faced as each pitcher’s weighting in the calculation. The tables below summarize the average age, pitching seasons (with unique pitchers in parentheses) and the total number of “weighted” batters faced. Each of the four rates charted above is calculated using both regressed and unregressed seasonal data.
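The weighting described above can be sketched as follows. The text says each pitcher is weighted by the lesser of his two batters-faced totals; this sketch assumes that weight is applied to each season’s relative change and the results averaged, which is one plausible reading rather than the confirmed calculation. The `PairedSeason` record and the sample data are invented for illustration.

```python
from dataclasses import dataclass

MIN_BF = 50  # minimum batters faced at each level, per the text


@dataclass
class PairedSeason:
    """One pitcher-season with time at both levels (hypothetical record)."""
    pitcher_id: str
    season: int
    bf_minor: int
    bf_mlb: int
    rate_minor: float  # e.g. GBpBIP in Triple-A
    rate_mlb: float    # the same rate in MLB


def conversion_factor(pairs):
    """Weighted mean relative change, weighting each season by its lesser BF."""
    eligible = [p for p in pairs
                if p.bf_minor >= MIN_BF and p.bf_mlb >= MIN_BF]
    if not eligible:
        raise ValueError("no pitcher-seasons meet the batters-faced minimum")
    weights = [min(p.bf_minor, p.bf_mlb) for p in eligible]
    changes = [(p.rate_mlb - p.rate_minor) / p.rate_minor for p in eligible]
    wbf = sum(weights)
    factor = sum(w * c for w, c in zip(weights, changes)) / wbf
    seasons = len(eligible)
    pitchers = len({p.pitcher_id for p in eligible})
    return factor, wbf, seasons, pitchers


sample = [
    PairedSeason("A", 2009, 300, 100, 0.50, 0.45),
    PairedSeason("A", 2010, 200, 150, 0.40, 0.40),
    PairedSeason("B", 2010, 400, 40, 0.45, 0.50),  # dropped: under 50 MLB BF
    PairedSeason("C", 2010, 80, 250, 0.50, 0.55),
]
print(conversion_factor(sample))
```

The sample yields a wBF of 330 from three seasons by two unique pitchers, mirroring the “Seasons (pitchers)” and wBF columns in the tables below.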

	Age	Seasons (pitchers)	wBF
PCL to AL	26.6	175 (138)	20365

PCL to AL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.051	-0.080	0.075	0.111
REG	0.038	-0.060	0.000	0.013

	Age	Seasons (pitchers)	wBF
PCL to NL	26.9	230 (181)	23562

PCL to NL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.052	-0.069	0.064	0.111
REG	0.036	-0.051	0.002	0.021

Yes, that’s right: PCL pitchers are expected to allow fewer line drives per ball in air when moving to the Major Leagues. But, no, that’s not necessarily true. What we actually expect is that fewer of the balls hit in the air will be tagged as line drives. That’s the reality of this situation, as everything is based on human-generated tags.

Same thing, but now for the IL.

	Age	Seasons (pitchers)	wBF
IL to AL	26.5	223 (187)	26343

IL to AL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.067	-0.066	0.019	0.069
REG	0.057	-0.060	-0.032	-0.020

	Age	Seasons (pitchers)	wBF
IL to NL	27.4	172 (140)	19777

IL to NL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.053	-0.057	0.001	0.080
REG	0.049	-0.056	-0.033	-0.017

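Using the IL as an example, let’s estimate our expected AL and NL GBpBIP and LDpBIA rates using both the regressed and unregressed conversion factors. Each expected rate is the IL rate multiplied by (1 + conversion factor); for example, a 0.60 IL GBpBIP becomes 0.60 * (1 - 0.060) = 0.564 in the AL using the regressed factor.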

IL GBpBIP	EXP AL REG	EXP AL UNREG	EXP NL REG	EXP NL UNREG
0.60	0.564	0.561	0.567	0.566
0.55	0.517	0.514	0.519	0.519
0.50	0.470	0.467	0.472	0.471
0.45	0.423	0.421	0.425	0.424
0.40	0.376	0.374	0.378	0.377
0.35	0.329	0.327	0.330	0.330
0.30	0.282	0.280	0.283	0.283

IL LDpBIA	EXP AL REG	EXP AL UNREG	EXP NL REG	EXP NL UNREG
0.40	0.387	0.407	0.387	0.401
0.35	0.339	0.356	0.338	0.350
0.30	0.290	0.306	0.290	0.300
0.25	0.242	0.255	0.242	0.250
0.20	0.194	0.204	0.193	0.200
0.15	0.145	0.153	0.145	0.150
0.10	0.097	0.102	0.097	0.100
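The two tables above are consistent with multiplying each IL rate by one plus the corresponding conversion factor. A minimal sketch of that step, using the rounded IL factors from the tables (`IL_FACTORS` and `expected_mlb_rate` are hypothetical names):

```python
# IL conversion factors copied from the tables above (relative change vs. the IL).
IL_FACTORS = {
    "GBpBIP": {"AL REG": -0.060, "AL UNREG": -0.066,
               "NL REG": -0.056, "NL UNREG": -0.057},
    "LDpBIA": {"AL REG": -0.032, "AL UNREG": 0.019,
               "NL REG": -0.033, "NL UNREG": 0.001},
}

# A few of the IL input rates used in the tables.
SAMPLE_RATES = {"GBpBIP": (0.60, 0.45, 0.30), "LDpBIA": (0.40, 0.25, 0.10)}


def expected_mlb_rate(minor_rate: float, factor: float) -> float:
    """Apply a relative-change conversion factor to a minor league rate."""
    return minor_rate * (1.0 + factor)


for stat, factors in IL_FACTORS.items():
    print(stat, *factors)  # header: the four REG/UNREG column labels
    for rate in SAMPLE_RATES[stat]:
        row = [f"{expected_mlb_rate(rate, f):.3f}" for f in factors.values()]
        print(f"{rate:.2f}", *row)
```

A few cells come out 0.001 off the published tables (0.560 versus 0.561 for a 0.60 IL GBpBIP in the unregressed AL column, for example), presumably because the tables were built from unrounded factors.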

Let’s take a longer jump, from Double-A. The Eastern League happens to have a substantial-ish amount of data available for jumps to both Major Leagues. The Southern and Texas Leagues also feed the show directly, but neither sends pitchers to both circuits in volume.

	Age	Seasons (pitchers)	wBF
EL to AL	24	29 (28)	3077

EL to AL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.072	-0.164	0.624	0.216
REG	0.056	-0.107	0.285	0.051

	Age	Seasons (pitchers)	wBF
EL to NL	24.5	18 (17)	1862

EL to NL	BIPpBF	GBpBIP	LDpBIA	PUpPUFB
UNREG	0.083	-0.120	0.328	0.204
REG	0.067	-0.085	0.271	0.082

And the expected rates for EL pitchers moving to the Major Leagues:

EL GBpBIP	EXP AL REG	EXP AL UNREG	EXP NL REG	EXP NL UNREG
0.60	0.536	0.502	0.549	0.528
0.55	0.491	0.460	0.503	0.484
0.50	0.447	0.418	0.457	0.440
0.45	0.402	0.376	0.412	0.396
0.40	0.357	0.335	0.366	0.352
0.35	0.313	0.293	0.320	0.308
0.30	0.268	0.251	0.274	0.264

EL LDpBIA	EXP AL REG	EXP AL UNREG	EXP NL REG	EXP NL UNREG
0.40	0.514	0.650	0.508	0.531
0.35	0.450	0.568	0.445	0.465
0.30	0.385	0.487	0.381	0.398
0.25	0.321	0.406	0.318	0.332
0.20	0.257	0.325	0.254	0.266
0.15	0.193	0.244	0.191	0.199
0.10	0.128	0.162	0.127	0.133

Now what?

Refinements to the process described above are in order. We’ll see where we go from here, which will be influenced by your feedback.

References & Resources
Batted ball data from MLBAM
