In the September 8, 2004, edition of SABR-L, Dan Heisman posed a question which seemed to me to be worthy of a few hours’ research. As I cannot really improve on his phrasing of the issue, I have asked for and received permission from him to quote his letter:
Subject: The effect of stat frequency on the 162 game “asterisk”
The more frequently a statistic occurs, the larger the effect of a longer season. For example, the chances of breaking the record for most plate appearances is clearly more enhanced with 162 games compared to 154 when compared to the chances of breaking the record for Most Perfect games pitched in a season.
Many of the SABR-L members are statisticians. Is it possible for them to speculate how much easier it is for Ichiro to break Sisler’s 154 game hit record in 162 games than it was for Maris to break Ruth’s record? I would guess the answer is easier, but not dramatically so, but I am curious if anyone can give a quantitative answer to this question.
On a certain level, the question that Dan poses is obviously unanswerable. The impact of the longer schedule on a player’s probability of breaking the home run record is different if the player needs to hit 62 homers than it is if he needs to hit 61. It is very significantly different if he needs to hit 65. The impact is different if the player’s “true level of ability” (his sustainable production level) is 40 homers per 600 at-bats than if it is 45 homers per 600 at-bats. The impact of the longer schedule is different on every player that you could come up with; thus, there is no general answer to the question.
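To see how much the answer moves with the target and with the hitter, here is a rough Python sketch. It is not the simulation described in this article; it swaps in an exact binomial tail and assumes a fixed 600 at-bats in a 154-game season, scaled up in proportion for 162 games. The function names and the fixed at-bat totals are illustrative assumptions.

```python
from math import comb


def prob_at_least(target, at_bats, rate):
    """Exact binomial chance of `target` or more homers in `at_bats` at-bats."""
    miss = 1 - rate
    return sum(
        comb(at_bats, k) * rate**k * miss ** (at_bats - k)
        for k in range(target, at_bats + 1)
    )


def schedule_boost(target, rate, at_bats_154=600):
    """How many times likelier the target becomes with 162 games instead of 154.

    Assumption: at-bats scale in proportion to games on the schedule.
    """
    at_bats_162 = round(at_bats_154 * 162 / 154)
    return prob_at_least(target, at_bats_162, rate) / prob_at_least(
        target, at_bats_154, rate
    )


if __name__ == "__main__":
    for label, rate in (("40 HR per 600 AB", 40 / 600), ("45 HR per 600 AB", 45 / 600)):
        for target in (61, 62, 65):
            boost = schedule_boost(target, rate)
            print(f"{label}, target {target}: {boost:.2f} times likelier")
```

Under these assumptions the multiplier grows as the target rises, which is the sense in which no single number answers the question.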
Nonetheless, while we cannot give a perfectly objective answer to the question which covers a wide range of players, we can perhaps offer a reasonable answer to the question, based on reasonable assumptions. Suppose that there is a hitter who, in a typical season, given his real level of ability, can be expected to play 150 games (out of 154), bat 570 times, and hit 42 home runs. We will call this player “Roger Marix”. What is the chance that this player, given a 154-game schedule, will hit 60 or more home runs? What is the chance that he, given a 162-game schedule, will hit 60 or more? What is the difference between the two?
I wrote a very simple computer program to run that problem. For each game on the schedule, the program first randomly determined whether or not Marix would play, assuming that he should play in 150 of every 154 games.
Second, if Marix did play in the game, the computer randomly assigned him a number of at-bats for the game:
- 2 at-bats in 5% of the games
- 3 at-bats in 33% of the games
- 4 at-bats in 41% of the games
- 5 at-bats in 19% of the games
- 6 at-bats in 2% of the games
That works out to 3.80 at-bats per game, which is 570 at-bats per 150 games.
For each at-bat, the computer then randomly determined whether or not Marix would hit a home run, on the assumption that he should hit 42 home runs per 570 at-bats.
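A minimal Python sketch of the three steps just described might look like the following. The function names, the use of Python's `random` module, and the seed are illustrative assumptions, not details of the original program.

```python
import random

# At-bats per game and their probabilities, as listed above.
AT_BAT_CHOICES = [2, 3, 4, 5, 6]
AT_BAT_WEIGHTS = [0.05, 0.33, 0.41, 0.19, 0.02]

PLAY_PROB = 150 / 154  # chance Marix plays in any given game
HR_PROB = 42 / 570     # chance of a home run in any given at-bat


def simulate_season(games, rng):
    """Return Marix's home run total for one simulated season."""
    home_runs = 0
    for _ in range(games):
        # Step 1: does he play today?
        if rng.random() >= PLAY_PROB:
            continue
        # Step 2: how many at-bats does he get?
        at_bats = rng.choices(AT_BAT_CHOICES, weights=AT_BAT_WEIGHTS)[0]
        # Step 3: is each at-bat a home run?
        for _ in range(at_bats):
            if rng.random() < HR_PROB:
                home_runs += 1
    return home_runs


def count_seasons_at_or_above(games, target, seasons, seed=0):
    """Count simulated seasons in which Marix reaches `target` home runs."""
    rng = random.Random(seed)
    return sum(simulate_season(games, rng) >= target for _ in range(seasons))
```

Calling `count_seasons_at_or_above(154, 60, 200_000)` and then the same with `162` mirrors the experiment that follows. The counts depend on the random seed, so they should land near, not exactly on, the figures reported here.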
We will note in passing that there are many elements of the Roger Maris argument that this simulation does not deal with. This simulation does not deal with the fact that the expansion of the league may have weakened the level of competition. The simulation does not deal with the fact that one of the two new teams added to the league played in a bandbox. The simulation does not deal with the fact that Roger Maris’ hair fell out and HBO made a movie about it. Roger Marix is a much simpler creature than Roger Maris. We’re just dealing with one element of the problem, which is the impact of having 162 games on the schedule, as opposed to having 154.
I ran Marix through 200,000 simulated seasons of 154 games each. In those 200,000 seasons, Marix hit 60 or more home runs 806 times, or once every 248 years.
I then changed just one tiny element of the program, changing the season from 154 games to 162, and repeated the experiment. Marix hit 60 or more home runs 2,229 times, or once every 90 years. The eight extra games increased the number of times Marix hit 60 or more home runs by 177%.
I then created a comparable “clone” for Ichiro — Ichirox — using the same program, but just very slightly different parameters. Like Marix, Ichirox played in 150 of every 154 games, on average — some seasons 154, some seasons 140, but 150 on average. Ichirox had more at-bats per game:
- 2 at-bats in 2% of the games
- 3 at-bats in 20% of the games
- 4 at-bats in 41.5% of the games
- 5 at-bats in 27.5% of the games
- 6 at-bats in 8% of the games
- 7 at-bats in 1% of the games
That works out to 4.225 at-bats per game, which is 634 at-bats per 150 games, or 651 at-bats per 154 games, or 684 at-bats per 162 games. Ichirox hits .350 overall — .400 once in a great while, .299 some seasons, but .350 overall.
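The at-bat arithmetic is easy to check. Here is a short Python sketch; the dictionary layout and the function name are illustrative assumptions.

```python
# Ichirox's at-bats per game and their probabilities, as listed above.
ICHIROX_AT_BATS = {2: 0.02, 3: 0.20, 4: 0.415, 5: 0.275, 6: 0.08, 7: 0.01}


def expected_at_bats(distribution):
    """Average at-bats per game played, after checking the shares sum to 100%."""
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, not 1")
    return sum(at_bats * share for at_bats, share in distribution.items())


per_game = expected_at_bats(ICHIROX_AT_BATS)  # about 4.225
for games in (150, 154, 162):
    print(games, round(per_game * games))  # 634, 651 and 684 at-bats
```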
In this study, Ichirox collected 257 or more hits in a 154-game schedule 845 times in 200,000 seasons — essentially the same as the frequency with which Marix hit 60 or more homers. This was the operating assumption — I actually had to re-run the study several times and jiggle the numbers to make it come out in that area, so that we would have a basis to compare the effect of the longer schedule on Ichirox vs. Marix. Ichirox had 257 or more hits 845 times, which is once every 237 years.
I then changed one number in the program, changing the season from 154 games to 162, and re-ran the study.
Given the eight extra games, Ichirox tied or broke Sisler’s record 8,462 times in 200,000 seasons, or once every 24 years. Whereas the longer schedule increased Marix’ chance of breaking or tying Ruth’s record by 177%, it increased Ichirox’ chance of breaking or tying Sisler’s record by 901%.
For me to try to tell you how surprised I am by this answer would be a waste of my time and yours, but … I certainly did not expect this. Heisman speculated that it would be “easier, but not dramatically so.” I would have agreed. In fact, the impact is (about) five times greater on Ichiro than it was on Maris.
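The percentages and the "once every so many years" figures follow directly from the raw counts reported above. A small Python sketch of that arithmetic, with illustrative function names:

```python
SEASONS = 200_000  # simulated seasons in each run


def years_between(count, seasons=SEASONS):
    """Average number of seasons between occurrences."""
    return seasons / count


def percent_increase(before, after):
    """Percentage growth from the 154-game count to the 162-game count."""
    return 100 * (after - before) / before


marix = percent_increase(806, 2229)     # 60+ homers: 154 games vs. 162
ichirox = percent_increase(845, 8462)   # 257+ hits: 154 games vs. 162

print(round(marix), round(ichirox))     # 177 901
print(round(ichirox / marix, 1))        # 5.1
print(round(years_between(806)), round(years_between(2229)))   # 248 90
print(round(years_between(845)), round(years_between(8462)))   # 237 24
```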
The irony is that whereas the extra eight games on the schedule created a mega-furor when Maris broke Ruth’s record, the same factor is being almost totally ignored as Ichiro gets set to cruise past Sisler, even though this edge was a hamster for Maris, and is a gorilla for Ichiro. The reason for the muted reaction, of course, is that, when Maris had his moment broiling in the sun, the 162-game schedule was new, and thus controversial. But most of you reading this weren’t even born then, and the 162-game schedule has long since ceased to be a curiosity. On the one hand, the 162-game schedule has been around so long, and so many players have already “had” Ichiro’s advantage, that this no longer seems to be any big deal, while on the other hand, Barry Bonds in the last few years has so thoroughly trashed the record book that we’re all sort of numb to it. Nobody cares about that stuff anymore.
One other note. I also kept track of the frequency with which Ichirox would hit .400, just because this seemed like an obvious thing to want to know. In 200,000 seasons on the 162-game schedule, Ichirox hit .400 or better 809 times, essentially the same as the frequency with which Marix or Ichirox would break their respective “counting” records in a 154-game schedule. But, of course, as the longer schedule makes it easier to collect 257 hits, it makes it harder to maintain a .400 average. On the 154-game schedule, Ichirox hit .400 or better 954 times, or about 18% more often.
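The direction of that effect can be checked without a simulation. The Python sketch below uses an exact binomial tail rather than the program described in this article, and it assumes fixed at-bat totals of 651 and 684 (the full-schedule figures given earlier), ignoring missed games. The function name is an illustrative assumption.

```python
from math import ceil, comb


def prob_average_at_least(at_bats, true_avg, target_avg):
    """Exact binomial chance of batting `target_avg` or better over `at_bats`."""
    # Smallest hit total that reaches the target average; the tiny offset
    # guards against floating-point error when the product is a whole number.
    needed = ceil(target_avg * at_bats - 1e-9)
    miss = 1 - true_avg
    return sum(
        comb(at_bats, k) * true_avg**k * miss ** (at_bats - k)
        for k in range(needed, at_bats + 1)
    )


p_154 = prob_average_at_least(651, 0.350, 0.400)  # assumed 154-game at-bats
p_162 = prob_average_at_least(684, 0.350, 0.400)  # assumed 162-game at-bats
print(f"154 games: {p_154:.4f}, 162 games: {p_162:.4f}")
```

Because the at-bat totals are held fixed here, the probabilities will not match the simulated counts exactly, but the shorter season should still come out ahead.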
Maybe that is next year’s story; maybe next year Ichiro will miss 40 games with an injury, and hit .410 in the other 122. A friend of mine claims that it is obviously impossible for a modern hitter to hit .400, because, if it could be done, Barry Bonds obviously would have done it by now.
Bill James, Chairman Emeritus
Swift Boat Veterans for Kevin Youkilis