Tony LaRussa and the Search for Significance

“It is the mark of an educated mind to rest satisfied with the degree of precision which the nature of the subject admits and not to seek exactness where only an approximation is possible.” — Aristotle

Earlier this season on my blog I reviewed the book Three Nights in August: Strategy, Heartbreak and Joy Inside the Mind of a Manager by Buzz Bissinger (yes, I realize it was a bit tardy), which chronicles a three-game Cubs/Cardinals series played in August 2003 from Cardinals manager Tony LaRussa’s perspective. I had special interest in the series since I happened to be in St. Louis on business that week and attended the dramatic third game the book leads up to. I won’t spoil it for those of you who don’t remember or haven’t read the book.

One of the passages I found particularly interesting was related to how LaRussa uses individual batter/pitcher matchup data. His basic philosophy was explained in the following passage.

“La Russa pays special attention to the individual matchups, an essential ingredient of his approach to managing … The term bench player doesn’t really apply to the Cardinals, because LaRussa so frequently plugs utility players into the lineup based on little opportunities he unearths by sifting through the results of their previous experience with players on the opposing team. These individual matchups are so integral to his strategy that he copies them onto 5-by-7-inch preprinted cards that managers normally use to make out the game’s lineup. With ritualistic precision, he folds the cards down the middle 10 minutes before game time and then slips them into the back pocket of his uniform. During a game, he pulls them out continually, almost like worry beads, peering at them as if in search of evidence that everything is fine, that he is doing exactly what he needs to be doing. More practically, he refers to them when deciding who to bring on in relief or who may be the best matchup to pinch-hit.”

Bissinger notes that La Russa knows that matchups aren’t foolproof but still …

“There are some hitters who, never mind their mediocre batting averages, simply tag the living crap out of some pitchers. Conversely, there are pitchers, despite soggy ERAs, who simply do well against particular high-stroke hitters.”

After then digressing about the roles human nature and psychology play in these matchups, Bissinger re-emphasizes the roles they play in the mind of LaRussa.

“Of all the hours spent preparing before a game, many of them LaRussa spends searching for the explanations of these matchup numbers, a slide of seemingly buried narrative that during a season can single-handedly change the outcome of the four or five games that—in La Russa’s estimation—a manager can change.”

What I found interesting in this entire discussion was the omission of the three most important things that leap immediately to mind when I think about matchups—sample size, sample size and sample size.

And that got me to thinking how one might measure whether a particular matchup is statistically significant. In other words, when LaRussa looks at his index cards, how does he know whether the 6-for-26 performance of Aramis Ramirez against Chris Carpenter over the past three seasons is simply Carpenter getting a little lucky against a good hitter or whether Ramirez really has trouble picking up Carpenter’s sinker? Or when he’s picking a pinch hitter and sees 2 for 13 for John Mabry against Greg Maddux, is that enough to choose someone else?

Obviously, LaRussa has other information at his disposal with which to make his decisions, including an understanding of Carpenter’s repertoire and the voluminous charts that pitching coach Dave Duncan keeps, as mentioned by Bissinger and detailed by George Will in Men At Work: The Craft of Baseball. But those of us a little more on the outside might use other tools to try to answer the question.

Enter the Mathematicians

While pondering this question as the season wound down, I remembered that Ken Ross in his book, A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans, had written about binomial distributions and p-values and related them to batting average in his chapter on assessing streaks. A note on SABR-L from Mike Huber, an associate professor of mathematics at West Point, asking basically the same question I had been thinking about, then served as the impetus for this article.

If it’s been a while since you cracked open your Statistics 101 text from college, a p-value is the probability of an “observed event or any more extreme and surprising event.” For example, a p-value could be calculated that would indicate the probability that Cal Ripken would hit .340 or better in 1999, given his career .276 average prior to 1999. Ross does this in his book and calculates the p-value for such an event at .0052, or about one-half of one percent. Obviously that’s a small probability, and to a statistician a p-value of less than around .05 or 5% is an indication that something more interesting is going on. In other words the event, in this case Ripken’s .340 average, is statistically significant. Of course there is nothing magical about .05 as the cutoff for significance, and in fact Ross notes that anything between .02 and .08 would cause him to look twice at the event.

There are two more pieces to the puzzle here, however, that need to be considered. First, in order to calculate the p-value, a statistician employs a model—a set of assumptions—that enables him to paint a mathematical picture of the situation he’s trying to study. In the case above Ross assumed that each Ripken at-bat approximated a Bernoulli trial (named after the Swiss mathematician Jakob Bernoulli, 1654-1705). Each Bernoulli trial makes the following assumptions:

“a) Each time there are two possible outcomes, traditionally called success and failure.
b) There is a fixed probability p of success each time.
c) The events are independent.”

As you can imagine from the assumptions this model is often referred to as a coin-tossing model, with the only difference being that when flipping coins we assume the fixed probability (p) is .50 or 50%.

The outcomes of Bernoulli trials can then be analyzed using a binomial probability model or binomial distribution. The binomial distribution function is used to calculate the p-value given the number of trials, the number of successes and the probability of success.

Now of course each trial in baseball (an at-bat) does not satisfy the three criteria of Bernoulli trials. In particular there isn’t a fixed probability of success on each trial because weather, injuries, game situation, opposing pitcher and a host of other factors complicate things. The best we can do is to assign a probability, such as career batting average, as our best estimate of p. In addition, at-bats are not independent. As Bissinger notes in Three Nights, a batter’s mindset when facing a particular pitcher, because of his last at-bat against that pitcher or even his last at-bat in the current game, may have much to do with the outcome.

With that said, in the end statisticians employ models, albeit imperfect ones, in order to provide a basis for the study of real-world events. When they find low p-values, that’s an indication that the assumptions of the model may not hold for a particular set of trials. And those are the sets of trials that together make up a statistically significant event.

For the analysis in this article I’ll assume that Bernoulli trials and a binomial distribution are a good proxy for at-bats at the major league level, following in the footsteps of Ross and Jim Albert, co-author of Curve Ball: Baseball, Statistics, and the Role of Chance in the Game, who uses it as the model in his book Teaching Statistics Using Baseball.

The second piece of the puzzle brings us back to sample size. It is intuitive that in the case of Ripken’s 1999 season mentioned above, Cal would have a greater probability of hitting .340 or above in 50 at-bats than he would in 600 at-bats if his “true” average were below .340. The reason is that he’ll be more likely to get lucky in those 50 at-bats, where seeing-eye singles, bloopers, bleeders, squibbers and Texas Leaguers will have a larger relative impact. The binomial distribution function takes sample size into account when calculating the p-value. In other words, with fewer trials the p-value, all other things being equal, goes up, and the chances of the event being statistically significant go down.
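As a rough check of that sample-size intuition, here is a minimal Python sketch that compares the chance of a .276 hitter reaching .340 in 50 at-bats versus 600. The helper names are my own, and the hit totals (17 of 50 and 204 of 600) are simply the smallest counts that produce a .340 average; everything rests on the same Bernoulli-trial assumptions described above.

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at-bats (requires 0 < p < 1)."""
    # Work in log space so large n does not overflow or lose precision.
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

def upper_tail(k, n, p):
    """P(X >= k): probability of k or more hits in n at-bats."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

TRUE_AVG = 0.276  # assumed fixed probability of a hit

# (at-bats, hits needed to reach .340)
for at_bats, hits_needed in ((50, 17), (600, 204)):
    p_value = upper_tail(hits_needed, at_bats, TRUE_AVG)
    print(f"{hits_needed}+ hits in {at_bats} AB: p-value = {p_value:.4f}")
```

The short sample should come out far more likely than the long one, which is the whole point: the same .340 average is much weaker evidence over 50 at-bats.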

Putting it all together, we can calculate a p-value and binomial distribution for Ripken’s 1999 season given that he had 113 hits (successes) in 332 at-bats (trials) given a fixed probability of .276 using Microsoft Excel’s BINOMDIST function. That function can be used to produce the following graph of the distribution.

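[Figure: binomial distribution of Ripken’s possible batting averages over 332 at-bats, given a fixed probability of .276]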

Each point in the graph represents the probability of Ripken attaining that particular batting average given the assumptions of the model. As you’ll notice, the probability of Ripken hitting exactly .265 or .277 is only around 5%. However, the cumulative probability of him hitting over or under a certain average is represented by the area to the right or left of the average on the x-axis. The p-value that Ross calculated therefore represents the shaded area to the right of .340 in the graph below.

[Figure: the same distribution with the area to the right of .340 shaded]
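For readers without Excel handy, the Ripken calculation can be sketched in a few lines of Python. This is my own illustration rather than Ross’s code; the function names are hypothetical, and the inputs (113 hits, 332 at-bats, p = .276) come straight from the text. In Excel the equivalent upper tail would be 1 - BINOMDIST(112, 332, 0.276, TRUE).

```python
from math import exp, lgamma, log

def binom_pmf(k, n, p):
    """Probability of exactly k hits in n at-bats (requires 0 < p < 1)."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

def upper_tail(k, n, p):
    """P(X >= k): the one-sided p-value for k or more hits."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

AT_BATS, HITS, CAREER_AVG = 332, 113, 0.276

# Height of the curve near its peak (92 hits is roughly a .277 average).
print(f"P(exactly 92 hits)   = {binom_pmf(92, AT_BATS, CAREER_AVG):.4f}")

# Shaded area to the right of .340: 113 or more hits.
print(f"P(113 or more hits)  = {upper_tail(HITS, AT_BATS, CAREER_AVG):.4f}")
```

The second number should land in the neighborhood of the .0052 that Ross reports, though small differences are possible depending on how the tail is defined and rounded.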

The Methodology

Getting back to the question at hand, I wanted to see how many and which batter/pitcher matchups would be considered statistically significant in order to get a feel for how seriously one should take the matchups that are often reported by announcers. To do so I examined play-by-play data for 2003 through 2005. From that data I found all the batter/pitcher matchups where the batter had 50 or more total at-bats in the three-year period and where the matchup yielded five or more at-bats. This left me with 30,481 individual matchups.

To use the binomial distribution we then need to calculate the probability of success (p) for each hitter. Although I could have used the batting average of the hitter over the three-year period, I chose instead to employ the “log5” formula Bill James introduced in his 1981 Baseball Abstract. That formula takes into consideration not only the batting average of the hitter (BAVG) but also the batting average against the pitcher (PAVG) and the league context (LgAVG) to calculate an Expected Average (ExAvg).

ExAvg = ((BAVG * PAVG) / LgAVG) / (((BAVG * PAVG) / LgAVG) + (((1 - BAVG) * (1 - PAVG)) / (1 - LgAVG)))
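The formula is easier to read as a small function. The sketch below is a direct transcription of the expression above; the function and argument names are my own, and the sample inputs are the Ramirez/Carpenter figures used later in this article (.296 for the hitter, .237 against the pitcher, .266 for the league).

```python
def log5_expected_avg(bavg, pavg, lg_avg):
    """Expected average for a batter/pitcher matchup via the log5 formula.

    bavg   -- batter's batting average
    pavg   -- batting average allowed by the pitcher
    lg_avg -- league batting average
    """
    hit_term = (bavg * pavg) / lg_avg
    out_term = ((1 - bavg) * (1 - pavg)) / (1 - lg_avg)
    return hit_term / (hit_term + out_term)

# Aramis Ramirez (.296) against Chris Carpenter (.237) in a .266 league.
ex_avg = log5_expected_avg(0.296, 0.237, 0.266)
print(f"Expected average: {ex_avg:.3f}")
```

A useful sanity check on the formula is that a hitter facing a perfectly league-average pitcher should simply get his own average back.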

Dan Levitt wrote a nice article several years ago showing that this formula does a good job of predicting the outcomes for actual batter/pitcher matchups.

With the probability of success calculated all that was left was to run the numbers.

The Results

I calculated the p-value for each matchup and found that 956 of the matchups had p-values less than .05. In other words 3.1% of the matchups over the last three years would be considered statistically significant under the standard test used by statisticians. If the threshold is raised to .08 the number of statistically significant matchups goes up to 1,728 or 5.7%.

You may be wondering why fewer than 5% of the matchups had p-values under .05 when we would have expected there to be 10% (5% on both ends of the distribution) since that’s what a p-value of .05 means. That question bothered me until John Walsh pointed out the discrete nature of the binomial model when there are few trials (at-bats). It turns out that much of the reason has to do with the fact that of the sample of 30,481 matchups, over two-thirds consist of nine at-bats or fewer and just 3.5% include more than 20 at-bats. When there are so few trials, the probability of obtaining a p-value of less than .05 is actually less than .05 because the distribution is not a smooth curve. For example, in five at-bats you can calculate the following probability for each number of hits given a probability of success of .266:

H  Probability
0   0.2130
1   0.3860
2   0.2798
3   0.1014
4   0.0184
5   0.0013

Of these six possibilities, only where the batter gets four or five hits is the p-value less than .05. So given 1,000 matchups of five at-bats we would expect about one matchup of 5 for 5 (.13%) and 18 of 4 for 5 (1.8%) to be significant. In my study there were 7,328 matchups of five at-bats, so I would expect 10 matchups where the batter went 5 for 5 and 134 where the batter went 4 for 5. In reality I found five matchups of 5 for 5 and 145 of 4 for 5, six of which had p-values over .05 because the expected average was over .350. This tracks pretty well with what we’d expect and is an indication that the model works pretty well.
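The five-at-bat case is small enough to work out in full. The values in the table above are the binomial probabilities of each exact hit total; the sketch below reproduces them, adds the "that many or more" tail that a one-sided test would use, and scales to the 7,328 five-at-bat matchups in the study. The function names are my own and the code is only an illustration of the arithmetic.

```python
from math import comb

P_HIT = 0.266     # league average used in the article
AT_BATS = 5
MATCHUPS = 7328   # five-at-bat matchups found in the study

def prob_exact(hits, n=AT_BATS, p=P_HIT):
    """Probability of exactly `hits` hits in n at-bats."""
    return comb(n, hits) * p**hits * (1 - p)**(n - hits)

def prob_at_least(hits, n=AT_BATS, p=P_HIT):
    """Probability of `hits` or more hits in n at-bats."""
    return sum(prob_exact(h, n, p) for h in range(hits, n + 1))

print(" H   exact  or-more  expected matchups")
for hits in range(AT_BATS + 1):
    exact = prob_exact(hits)
    print(f"{hits:2d}  {exact:.4f}  {prob_at_least(hits):.4f}  {MATCHUPS * exact:8.1f}")

# A higher expected average moves a 4 for 5 toward the .05 line.
print(f"4+ hits at a .350 expected average: {prob_at_least(4, p=0.350):.4f}")
```

Because only a handful of outcomes are possible, the share of matchups that can clear a .05 cutoff jumps in steps rather than sliding smoothly, which is the discreteness Walsh pointed out.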

You can imagine, however, that in the real world the model may break down a bit at the extremes. We might postulate that this is due to the fact that both hitters and pitchers learn as they face one another repeatedly, which may well restrict the most extreme values on both ends. Also strategy (the reason LaRussa has his cards after all) dictates that extreme matchups be avoided both by the offense and defense through the use of relief specialists and pinch hitters.

Even at the .08 threshold that still means that on average less than one out of 17 of the matchups (given three years of data) written on LaRussa’s index cards each game are relevant in the sense that they may reveal information that the calculated expected average doesn’t (at least in terms of batting average). And as you can imagine, the problem only becomes worse when you consider that we’re using an expected average based on just three years’ worth of data, and that any batting average is merely an approximation of a hitter’s true ability (more on that later).

That said, let’s take a look at which matchups are considered the most statistically significant—in other words, those where we can make the argument that the model doesn’t hold and that there is something else going on that allows a particular pitcher to perform well against a particular hitter or vice versa.

First, we’ll take a look at the 25 most statistically significant matchups (lowest p-values) for “low-hit” matchups.

Batter           Pitcher              AB  H    Avg 3 Yr Avg  ExAvg p-value
Garret Anderson  Brian Anderson       22  0  0.000   0.300   0.335 0.00013
Bill Mueller     Mike Mussina         23  0  0.000   0.303   0.301 0.00027
Rondell White    Jake Westbrook       19  0  0.000   0.289   0.288 0.00159
Alfonso Soriano  John Lackey          26  1  0.038   0.280   0.285 0.00188
David Ortiz      Bartolo Colon        18  0  0.000   0.297   0.285 0.00239
Carlos Lee       Jeff Suppan          17  0  0.000   0.287   0.291 0.00287
Hideki Matsui    Aaron Sele           14  0  0.000   0.297   0.336 0.00327
Bobby Abreu      Mike Hampton         28  2  0.071   0.296   0.303 0.00345
Ivan Rodriguez   Jon Garland          16  0  0.000   0.303   0.298 0.00351
Tony Graffanino  Brian Anderson       21  1  0.048   0.281   0.315 0.00382
Mark Loretta     Kirk Rueter          27  3  0.111   0.314   0.349 0.00532
Travis Hafner    Bartolo Colon        15  0  0.000   0.295   0.284 0.00670
Edgar Renteria   Rodrigo Lopez        13  0  0.000   0.297   0.311 0.00789
Carlos Lee       Brian Anderson       33  4  0.121   0.287   0.320 0.00796
Rod Barajas      Bartolo Colon        18  0  0.000   0.244   0.234 0.00833
Jim Edmonds      Jason Jennings       13  0  0.000   0.280   0.308 0.00838
Mark Teixeira    Aaron Sele           12  0  0.000   0.282   0.319 0.00989
Adrian Beltre    Jason Schmidt        18  0  0.000   0.277   0.224 0.01053
Mark Loretta     Matt Kinney          11  0  0.000   0.314   0.339 0.01060
Matt Lawton      Mark Mulder          15  0  0.000   0.262   0.261 0.01066
Melvin Mora      Gustavo Chacin       12  0  0.000   0.312   0.314 0.01087
Jay Gibbons      Mark Hendrickson     12  0  0.000   0.270   0.307 0.01230
Scott Hatteberg  Brian Shouse         15  0  0.000   0.265   0.254 0.01245
Vernon Wells     Daniel Cabrera       14  0  0.000   0.288   0.267 0.01298
Aubrey Huff      Tim Wakefield        26  2  0.077   0.290   0.275 0.01362

So given Garret Anderson’s .300 batting average over the past three years and his expected average against Brian Anderson of .335, his 0 for 22 registered a probability of just .013%. In other words, if the model (Bernoulli trials and the expected average) perfectly mimicked real life, the odds of Anderson going 0 for 22 would be around 1 in 7,700. These kinds of odds lead us to say that in the battle of the Andersons, Brian very likely possesses some ability to get Garret out (starting with his left-handedness of course). In other words the assumptions of our model probably don’t hold for this matchup.

As you can tell from the table, the higher the expected batting average, the fewer hitless at-bats it takes to make the list. Mark Loretta with his .339 expected average off of Matt Kinney makes the list with his 0 for 11, and the p-value associated with that matchup is lower than the 0 for 15 Matt Lawton recorded against Mark Mulder. The reason of course is that it is more unlikely for the .339 hitting Loretta to go 0 for 11 than it is for the .261 hitting Lawton. So while it seems paradoxical, generally speaking a manager might make the decision to pinch-hit for a good hitter based on the evidence of fewer at-bats than he would for an average or poor hitter.
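For hitless matchups the binomial calculation collapses to a single term: the chance of making an out, raised to the number of at-bats. The short sketch below applies that to three rows from the table. The function name is my own, and because the expected averages shown in the table are rounded to three places, the results may differ slightly from the published p-values.

```python
def hitless_prob(at_bats, expected_avg):
    """Probability of going 0 for `at_bats` at a fixed expected average."""
    return (1 - expected_avg) ** at_bats

# (batter, pitcher, at-bats, expected average) taken from the table above
matchups = [
    ("Garret Anderson", "Brian Anderson", 22, 0.335),
    ("Mark Loretta", "Matt Kinney", 11, 0.339),
    ("Matt Lawton", "Mark Mulder", 15, 0.261),
]

for batter, pitcher, at_bats, ex_avg in matchups:
    p_value = hitless_prob(at_bats, ex_avg)
    print(f"{batter} vs. {pitcher}: 0 for {at_bats}, p-value = {p_value:.5f}")
```

Loretta’s 0 for 11 comes out a hair less likely than Lawton’s 0 for 15, which is the seeming paradox described above.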

You should also notice that almost all of these hitters have expected averages higher than the major league average (which was .266 over the three-year period) and in fact their cumulative average is .296. This is what one would expect since a higher average means that going hitless against a particular pitcher is less likely.

Of the 956 matchups that produced p-values less than .05, 204 of them were of the low-hit variety where the hitter recorded zero or only a few hits. And in looking at those 204, only one had seven at-bats and none had fewer. So as common sense would dictate, a 0 for 6 or 0 for 7 probably isn’t a big enough sample on which to base decisions. The 0 for 7 with the lowest p-value was Sean Casey versus Ben Hendrickson, where Casey was expected to hit a whopping .356 off Hendrickson, who gave up an average of .310 over the past three years. That matchup just slipped under the .05 threshold at .0458.

The significant low-hit matchups with the most at-bats were:

Batter           Pitcher              AB  H    Avg 3 Yr Avg  ExAvg p-value
Hank Blalock     Joel Pineiro         35  5  0.143   0.279   0.281 0.04571
Carlos Lee       Brian Anderson       33  4  0.121   0.287   0.320 0.00796
Alex Gonzalez    Livan Hernandez      33  3  0.091   0.249   0.245 0.02402
Shawn Green      Kirk Rueter          31  4  0.129   0.277   0.310 0.01824
Alex Rodriguez   Sidney Ponson        30  5  0.167   0.302   0.331 0.03757
Joe Crede        Brian Anderson       29  3  0.103   0.251   0.282 0.01979
Bobby Abreu      Mike Hampton         28  2  0.071   0.296   0.303 0.00345
Chone Figgins    Barry Zito           28  3  0.107   0.293   0.259 0.04445
Mark Loretta     Kirk Rueter          27  3  0.111   0.314   0.349 0.00532
Mark Loretta     Jason Jennings       27  4  0.148   0.314   0.343 0.02188

What’s more interesting, however, are those matchups that at first glance one might think are statistically significant but probably aren’t. The following list is a cluster of matchups with p-values around .20.

Batter            Pitcher              AB  H    Avg 3 Yr Avg  ExAvg p-value
Rocco Baldelli    Jon Lieber           13  2  0.154   0.285   0.299 0.20406
Royce Clayton     Paul Wilson          14  2  0.143   0.260   0.280 0.20419
Rich Aurilia      Brandon Webb         11  1  0.091   0.269   0.247 0.20423
Coco Crisp        Denny Bautista        8  1  0.125   0.290   0.328 0.20425
Frank Catalanotto Pedro Martinez       16  2  0.125   0.298   0.247 0.20464
Julio Franco      Al Leiter            14  2  0.143   0.295   0.279 0.20466
Brian Schneider   Josh Beckett         23  3  0.130   0.253   0.225 0.20469
Carlos Guillen    Terry Mulholland     11  2  0.182   0.305   0.348 0.20488
Todd Helton       Tom Martin           12  2  0.167   0.343   0.321 0.20492
John Mabry        John Thomson         10  1  0.100   0.258   0.268 0.20503
Mike Matheny      Kris Benson          16  2  0.125   0.247   0.247 0.20546
Nomar Garciaparra Brett Myers           9  1  0.111   0.299   0.295 0.20548
Jason Kendall     Russ Ortiz            9  1  0.111   0.305   0.295 0.20568
Todd Helton       Jesse Foppert         8  1  0.125   0.343   0.327 0.20569
Marcus Giles      Kevin Millwood        9  1  0.111   0.305   0.295 0.20577
Wes Helms         Kris Benson          10  1  0.100   0.268   0.268 0.20582
Jay Gibbons       Jake Westbrook       10  1  0.100   0.270   0.268 0.20589
Morgan Ensberg    Jerome Williams      10  1  0.100   0.283   0.268 0.20595

What these matchups reveal is that the common sense notion that a 1 for 11 or a 2 for 16 is enough to conclude that a particular hitter has trouble with a particular pitcher is often flawed. The above list shows that these kinds of performances don’t necessarily indicate that the hitter will continue to perform worse than his expected average. For example, the expected average of Todd Helton versus Tom Martin is .321, but Helton hit just .167 in his 12 at-bats from 2003 through 2005. The p-value of .205 may lead us to conclude that this is not enough evidence to assume that Helton is not really a .321 hitter against Martin, since the odds of Helton getting two or fewer hits in those 12 at-bats are about one in five (once again, if the model holds). In other words, matchups like this probably aren’t in and of themselves enough on which to base a pinch-hitting decision.
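The Helton example can be checked by adding up the chances of zero, one and two hits in 12 at-bats. This is a minimal sketch with my own function names, assuming the p-values for low-hit matchups are lower-tail sums of this kind; the .321 expected average is the rounded figure from the table.

```python
from math import comb

def prob_exact(hits, at_bats, expected_avg):
    """Probability of exactly `hits` hits in `at_bats` at-bats."""
    return (comb(at_bats, hits)
            * expected_avg**hits
            * (1 - expected_avg)**(at_bats - hits))

def lower_tail(hits, at_bats, expected_avg):
    """P(X <= hits): probability of that many hits or fewer."""
    return sum(prob_exact(h, at_bats, expected_avg) for h in range(hits + 1))

# Todd Helton vs. Tom Martin: 2 for 12 with an expected average of .321
p_value = lower_tail(2, 12, 0.321)
print(f"P(2 or fewer hits in 12 AB) = {p_value:.3f}")
print(f"Roughly 1 in {1 / p_value:.1f}")
```

The result should sit close to the .205 shown in the table, or about one chance in five.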

As an aside this tracks very well with the wisdom of Earl Weaver in his book Weaver on Strategy where he said when talking about matchups that “Most of the time, I think a player needs around 20 at-bats before I can get a reading on him against a certain pitcher.”

But that doesn’t mean that p-values like these are too high to inform decisions in a game. For example, let’s say La Russa is trying to decide between two pinch-hitters and that they are close in overall ability. If one of them shows a good matchup (say a 4 for 8) with a p-value of 0.2, he would be remiss to simply disregard the data and go on a hunch because the hitter could turn out to be a .200 hitter against this pitcher as indicated by his expected average. A p-value that low leads one to believe that there is a decent chance that the player in question really does hit better than his expected average against that particular pitcher.

On the other end of the spectrum here are the 25 most statistically significant matchups where the hitters were very successful.

Batter            Pitcher              AB  H   Avg 3 Yr Avg  ExAvg p-value
Larry Bigbie      Andy Pettitte        14 11  0.786   0.276   0.256 0.00005
Michael Young     Brandon Backe        10  9  0.900   0.317   0.318 0.00024
Marcus Giles      Jason Schmidt        14 10  0.714   0.305   0.248 0.00032
Preston Wilson    Jae Seo               6  6  1.000   0.268   0.271 0.00040
Preston Wilson    Byung-Hyun Kim       10  8  0.800   0.268   0.254 0.00046
Enrique Wilson    Pedro Martinez       13  8  0.615   0.214   0.174 0.00046
Jose Reyes        Jon Lieber           13 10  0.769   0.277   0.292 0.00050
Mark Grudzielanek Tim Hudson            6  6  1.000   0.304   0.286 0.00055
Derrek Lee        Mark Mulder          15 11  0.733   0.295   0.294 0.00056
Matt Holliday     Woody Williams        6  6  1.000   0.299   0.296 0.00067
Aubrey Huff       Jon Lieber           13 10  0.769   0.290   0.305 0.00076
Todd Helton       Damian Moss           7  7  1.000   0.343   0.367 0.00090
Clint Barmes      Odalis Perez         12  9  0.750   0.289   0.282 0.00102
Reggie Sanders    David Weathers        5  5  1.000   0.272   0.266 0.00133
David Bell        Gary Majewski         7  6  0.857   0.253   0.251 0.00137
Charles Johnson   Jake Peavy            6  5  0.833   0.230   0.197 0.00150
David Dellucci    Kevin Brown          14  9  0.643   0.242   0.241 0.00161
Mark Kotsay       Jamie Moyer          21 13  0.619   0.288   0.288 0.00162
Adrian Beltre     Dontrelle Willis      9  7  0.778   0.277   0.264 0.00192
Brad Wilkerson    Mike Matthews         7  6  0.857   0.257   0.268 0.00198
Matt LeCroy       Nate Robertson       15 10  0.667   0.273   0.280 0.00207
Mark Sweeney      Adam Eaton           11  8  0.727   0.277   0.271 0.00213
Jermaine Dye      Jarrod Washburn      24 13  0.542   0.253   0.252 0.00228
Aaron Rowand      Tim Wakefield         9  7  0.778   0.288   0.272 0.00232
Jeff Cirillo      Javier Vazquez        6  5  0.833   0.234   0.218 0.00243

If you look closely you’ll also notice that five of the 25 slots (six if you count Larry Bigbie who had 66 at-bats with the Rockies in 2005) are filled by players who have spent time with the Rockies. Given the way in which Coors Field inflates offense Matt Holliday’s 6 for 6 against Woody Williams and Clint Barmes going 9 for 12 off Odalis Perez are likely park influenced as well. As a result we could make adjustments to the model to take park factors into account.

You’ll also notice that hitters don’t need as many plate appearances to make this list. In other words, it requires relatively few consecutive hits to reach a statistically significant result. So while Weaver may generally need around 20 at-bats, a 5 for 6 or 6 for 7 is likely almost always enough to “get a reading.”

Overall 752 of the 956 significant matchups were of the high-hit variety with the following having the most at-bats:

Batter           Pitcher              AB  H    Avg 3 Yr Avg  ExAvg p-value
Derek Jeter      Rodrigo Lopez        40 19  0.475   0.307   0.321 0.03009
Jack Wilson      Ben Sheets           32 13  0.406   0.275   0.253 0.04100
Todd Helton      Odalis Perez         31 16  0.516   0.343   0.334 0.02790
Eric Chavez      Jamie Moyer          30 13  0.433   0.275   0.276 0.04598
Jimmy Rollins    Carl Pavano          29 13  0.448   0.281   0.285 0.04421
Bobby Kielty     Mark Buehrle         27 11  0.407   0.244   0.248 0.04958
Todd Walker      Roger Clemens        27 11  0.407   0.287   0.239 0.03964
Eric Byrnes      Mark Buehrle         27 12  0.444   0.260   0.264 0.03282
Trot Nixon       Roy Halladay         27 13  0.481   0.295   0.275 0.01751
Johnny Damon     Bartolo Colon        26 12  0.462   0.298   0.286 0.04313

Wrapping It Up

So is Ramirez’s performance against Carpenter significant? Ramirez hit .296 over the past three years, while the league hit .237 against Carpenter. Given the major league average, Ramirez would have been expected to hit .264. He actually hit .231 (6 for 26), so the p-value is a healthy .448, which means that Ramirez may in fact be a .264 hitter against Carpenter. The fact that Ramirez hit about 30 points lower than he “should” have could very well be chalked up to chance.

While all of this is interesting, there are many issues that serve to cloud the picture which I haven’t explored, and which, if you’re still reading this article, you’re probably more suited to pursue than I. In addition to augmenting the model to take into account park effects, some hitters hit better against fly ball versus ground ball pitchers and vice versa, so their probability of success should go up or down depending on the pitcher’s profile. Of course the same story can be said of platoon effects.

Further, as hinted at previously, batting average is highly variable from season to season, a fact Albert demonstrated in a 2004 article titled “A Batting Average: Does It Represent Ability or Luck?” In that article Albert concludes that measures such as strikeout rate, walk rate, home run rate and on-base percentage are all much more strongly correlated from year to year than batting average on balls in play (removing the effect of strikeouts) as well as batting average itself. In short, as much as 50% of the difference in batting average between players can be attributed to luck while 50% can be attributed to differences in their hitting ability. As a result, a measure like slugging percentage or OPS would probably be a better candidate for this kind of study, although it would require using a different kind of model.

References & Resources

  • Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett