Americans Defeat Nationals in Pitchers’ Duel
by John Walsh
January 11, 2007
It's almost common knowledge now that the American League is currently playing at a higher level than the National League. Whenever a player changes leagues, commentators remark on the improvement in stats we can expect if the player is moving to the NL (see any commentary on the Barry Zito signing). Of course, players moving to the AL are widely perceived to be headed for a tougher playing environment. In fact, noted sabermetrician Mitchel Lichtman wrote an excellent three-part series on the subject here at the Hardball Times last summer. Mitchel presents a detailed analysis that uses several methods that are sensitive to any difference in the quality of the American and National Leagues.
Mostly we think of the AL as the better hitting league. Part of that is due to the designated hitter rule and the fact that run scoring is typically about a half-run per game higher in the AL. But even beyond the designated hitter, you often hear about how tough AL lineups are from top to bottom: it's not just the designated hitter, the AL simply has more good hitters. Mitchel's study bears that out, but I have done some additional investigation showing that the American League has been better than the National League in pitching and defense in recent years as well.
In the winter of 2005, Scott Hatteberg stood at a crossroads of sorts. About to turn 36 years old, the Oakland A's first baseman/designated hitter had not been offered arbitration by his (former) team and was a free agent. Trouble was, he was coming off his worst year in recent memory: his .256/.334/.343 line was not going to cut it for a first baseman/designated hitter-type. He had lost playing time in his last year in Oakland as well, appearing in 134 games, his lowest yearly total since leaving Boston as a part-time player in 2001.
There was speculation that Hatteberg might just hang up the ol' spikes and seek a position in somebody's front office. But the Reds were in need of a first baseman, and they took a low-risk flier on Hatteberg, offering him a one-year contract for just $750,000. Hatteberg had an unexpectedly fine 2006, producing a line of .289/.389/.436. Hatty was probably thinking "NL, where have you been all my life?"
Antonio Perez also underwent a big change in the 2005 off-season. The young Dodgers infielder had enjoyed a solid season as a utility infielder: playing in 98 games, mostly at second and third base, the (then) 25-year-old put up a very nice line of .297/.360/.398. However, he would not get a chance to break into the Dodgers infield, because he was soon traded to the A's in the Milton Bradley/Andre Ethier deal. The A's infield was much more set, so Perez's playing time in 2006 was cut way down. (Had he been able to play shortstop, he may have gotten more playing time since Bobby Crosby was injured for much of the season.) And when he was in there, Perez did not build on his solid 2005 campaign. Indeed, he hit about as poorly as possible, going .102/.185/.204 in 109 plate appearances. Welcome to the new neighborhood, kid.
Scott Spiezio is our third player who changed teams after the 2005 season. After spending his whole career in the AL, he moved to the Cardinals for 2006. In his two previous seasons, spent with the Mariners, Spiezio came to the plate 466 times and "produced" to the tune of .198/.270/.324. How did he do when playing his age-33 season in the NL? Just .272/.364/.496, that's all.
Measuring the Relative Strength of Pitching
What do these three little stories mean? By themselves, nothing, of course. I'm sure you could name a few players who have hit much better in the AL than in the NL. Actually, here are a few right here: Kevin Mench, Julio Lugo, Mark DeRosa and Jim Thome. There could be any number of reasons for a player to play better in one situation than another; the change of leagues may have nothing to do with it. Maybe Hatteberg's 2005 was just an off year and his bounceback in 2006 was just a return to his established level of play. Maybe Perez never got used to his reduced role and his hitting suffered. Maybe Spiezio was finally healthy, which allowed him to excel. In all these cases, the sample size is far too small to draw any conclusions about the quality of the two leagues.
What if we take all these players, though, all the hitters who have played in both leagues recently and see how they did collectively? Might we be able to coax out of the data some measure of relative league strength? Yes, we can. Read on for the details.
The analysis is basically quite simple—I select players who have played in both leagues in the period from 2004 to 2006 and I compare their collective hitting performance in the two leagues. If the group hit better in the NL than in the AL, we can infer that the pitching in the National League is not as good as it is in the American League. That's the basic outline — it's very simple, right?
There are a couple of additional details you should know. One simple one is that I require the players in my sample to have at least 100 plate appearances in each league in the three-year period. One hundred plate appearances is not many, but I wanted to include players that may have switched leagues at the trading deadline. Another thing to note is a key assumption: this method assumes that the average park in the NL does not favor hitters over the average AL park, and vice versa. Standard park factors compare parks within a league—we have very little information on how AL parks compare to NL parks, simply because of the low number of interleague games. My intuitive feeling is that the assumption is a reasonable one, but it should be kept in mind, nonetheless.
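The selection step might be sketched like this (the players and numbers here are made up for illustration; this is not the author's actual code or data):

```python
# Sketch of the player-selection step, using made-up data.
# Each record: (player, league, PA, OPS) aggregated over 2004-2006.
records = [
    ("Player A", "AL", 450, 0.780), ("Player A", "NL", 320, 0.810),
    ("Player B", "AL", 90,  0.700), ("Player B", "NL", 500, 0.750),
    ("Player C", "AL", 150, 0.690), ("Player C", "NL", 140, 0.720),
]

MIN_PA = 100  # at least 100 PA in each league over the three-year span

by_player = {}
for player, league, pa, ops in records:
    by_player.setdefault(player, {})[league] = (pa, ops)

# Keep only players who cleared the PA threshold in BOTH leagues.
two_league = {
    p: d for p, d in by_player.items()
    if all(lg in d and d[lg][0] >= MIN_PA for lg in ("AL", "NL"))
}
print(sorted(two_league))  # Player B misses the AL cutoff
```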
The first thing that surprised me when starting to look at the data was the sheer number of players who have played in both leagues. For example, between 2004 and 2006 the number of players with at least 100 plate appearances in both leagues is 120. Whoa! That's more than I expected. That's the highest number for any three-year period going back to 1900, but there were plenty of guys switching leagues going back to around 1960.
The graph on the right shows the number of two-league players since 1900, with each point representing a three-year period. Aside from the mixing of players due to the formation of the American League in 1901, there weren't many player exchanges between leagues until the advent of expansion in the 1960s. (I have not looked at mixing with the Federal League, which is why this graph shows no big spike in the 1914-1915 period.) From the '60s on the number of two-league players has grown fairly steadily, especially after the advent of free agency in the mid-70s, and nowadays we generally have around 100 players who have gotten at least 100 plate appearances in both leagues in a three-year period.
AL vs. NL
So, how did our pool of hitters do in each league? I have determined each player's OPS in each league without any corrections at all. No park adjustments, no corrections for aging, no league offensive context normalization—none of that stuff. There are three reasons for not adjusting for these effects: 1) each adjustment introduces some uncertainty of its own, 2) with a large number of players and plate appearances (as we'll see in a minute) things like park factors and aging effects will tend to cancel out, and 3) I wanted to keep the analysis as simple and comprehensible as possible.
The results are most easily visualized by the graphic on the right, where each point represents a single two-league player. The vertical position of the point shows the player's OPS in the NL, while the horizontal position is his AL OPS. The red dotted line corresponds to equal OPS's in both leagues. It's evident that more than half the points are above the line, meaning that the majority of batters did better in the NL. In fact 74 players lie above the line and 45 lie below the line. Mathematically-inclined readers will wonder about the one remaining player: Troy Glaus had exactly the same OPS (.885) in both leagues, so his point is right on the line.
The Summing Up
Since 62% of the players hit better in the NL, it would appear that NL defenses (pitching plus fielding) are inferior to their AL counterparts. To quantify this difference, I calculated the difference (NL OPS) minus (AL OPS) for each player. I then calculated the average difference for the whole sample, combining the individual differences with weighting appropriate to the number of plate appearances for each player (see the Resources section below for details). This method not only yields the average OPS difference, but also the (one standard deviation) uncertainty on that number—so we know how much faith to put in the calculation.
The result for the period 2004-2006 is that two-league players hit better in the National League by .029 points of OPS. The result is significant: the uncertainty (one standard deviation) is .008. In other words, the probability that the true level of pitching/defense in both leagues is actually equal in this period is 0.000015, i.e. pretty darn small. So we can say with a high degree of certainty (keeping in mind, though, the assumption about AL/NL park effects) that the AL had significantly stronger pitching/defense than the NL during the last three seasons.
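As a sanity check (a sketch, not the author's actual calculation), the normal approximation can turn a difference and its uncertainty into a probability. Note that the rounded values .029 and .008 give a somewhat larger p-value than the figure quoted in the text, which was presumably computed from unrounded inputs:

```python
import math

# Normal-approximation check of the quoted result: a .029 OPS gap
# with a one-standard-deviation uncertainty of .008.
diff, sd = 0.029, 0.008
z = diff / sd  # about 3.6 standard deviations from zero

# One-tailed probability of seeing a gap this large
# if the leagues were really equal.
p = 0.5 * math.erfc(z / math.sqrt(2))
print(f"z = {z:.2f}, p = {p:.5f}")
```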
How Long (Has This Been Going On)?
Since the number of players switching leagues is fairly substantial going back almost 50 years, it's possible to use this method to evaluate the relative strength of pitching/defense in the two leagues going back to 1960. Keep in mind, though, that the number of players in each three-year sample is decreasing as we go back in time, so the statistical reliability of the results will decrease as well. The graphic on the right shows the quantity (NL OPS) minus (AL OPS) for two-league players in each three-year period going back to 1960.
The dashed red line is "zero", where the two leagues are of equal strength. When the points lie above the red line, it means hitters performed better in the NL, hence the AL pitching/defense was stronger. The light-blue shaded band represents the one-standard-deviation uncertainty on the data points. As advertised, the thickness of this band increases as we go back towards 1960.
The graph shows that the AL has enjoyed an advantage in pitching/defense for the last 10-15 years. The line is bouncing around a bit, but it appears that the leagues had comparable pitching/defense from the mid-70s until the early-90s. Before that, going back to 1960, the NL seems to have been superior, although the uncertainties are getting pretty large. Still, the NL superiority in the 60s and 70s overlaps well with their stellar record in All-Star Games. The National League actually went 19-1 in All-Star Games in the period from 1963 to 1982 (and back then it really did count, at least more than it does now).
For those of you who want to see the numbers, here's a table of the data that were used to make the plot (remember, "Year" represents a three-year period centered on the listed year):

 Year   NL-AL OPS    1 SD
 ----   ---------    ----
 2005      0.029    0.008
 2002      0.011    0.009
 1999      0.021    0.009
 1996      0.019    0.010
 1993      0.024    0.012
 1990      0.004    0.012
 1987      0.007    0.013
 1984      0.001    0.014
 1981      0.013    0.013
 1978     -0.019    0.016
 1975      0.011    0.017
 1972     -0.005    0.015
 1969     -0.010    0.018
 1966     -0.024    0.020
 1963     -0.045    0.019
 1960     -0.020    0.018
How big is the .029 advantage in OPS that we found for 2004-2006? Well, at the team level, an increase of .029 in OPS would correspond to about 60 more runs scored over the course of the season. That would increase the winning percentage of an average team from .500 to about .540 (or six wins).
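The runs-to-wins step above can be sketched with the common rule of thumb of roughly ten runs per win (an assumption for illustration; the article doesn't state which conversion it used):

```python
# Rough sketch of the runs-to-wins conversion.
extra_runs = 60      # the article's estimate for a .029 OPS gain
runs_per_win = 10    # common rule of thumb (assumed, not from the article)

extra_wins = extra_runs / runs_per_win
new_pct = 0.500 + extra_wins / 162   # over a 162-game season

print(f"{extra_wins:.0f} wins, {new_pct:.3f} winning percentage")
```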
Of course, pitching/defense is only half of the story. This method could also be used to evaluate the relative strength of hitting in the two leagues by looking at pitchers who have pitched in both leagues. There is a complication due to the designated hitter in the American League—we expect pitchers to fare worse in the AL, even if the leagues have similar offensive quality (excluding the designated hitter). One way to take this into account might be to exclude pitchers and designated hitters from the analysis. Note that once we address the issue of offense, the assumption about park factors will become irrelevant, since the sum of offense plus defense will be largely independent of such effects.
I'm hoping that will be the subject of another article in the near future.
References and Resources
- Many thanks to THT colleague David Gassko for discussions on the methodology.
- The data for this analysis were obtained using the 2006 Lahman Database.
- The uncertainty (one standard deviation) on OPS is estimated with the following formula, which I derived using a simulation:

err(OPS) = 1.4/sqrt(PA)

- The uncertainty on the difference between AL and NL OPS was calculated using standard propagation of uncertainty.
- The average OPS difference of all players in each three-year period was calculated using the following formula:

aveOPSDiff = Sum(w_i * OPSDiff_i) / Sum(w_i)

where OPSDiff_i is the OPS difference of a single player and w_i = 1/(err_OPSDiff_i)^2 is the weight for that player. The uncertainty on the average OPS difference is given by err_aveOPSDiff = 1/sqrt(Sum(w_i)).
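The formulas above can be sketched in a few lines of Python on synthetic data (the player numbers below are made up for illustration, not the actual 2004-2006 sample):

```python
import math

# Sketch of the weighted-average calculation on synthetic data.
# For each player: (PA_al, OPS_al, PA_nl, OPS_nl).
players = [
    (400, 0.750, 350, 0.790),
    (120, 0.680, 500, 0.700),
    (300, 0.820, 250, 0.800),
]

def err_ops(pa):
    # The simulation-derived estimate: err(OPS) = 1.4 / sqrt(PA).
    return 1.4 / math.sqrt(pa)

num = den = 0.0
for pa_al, ops_al, pa_nl, ops_nl in players:
    diff = ops_nl - ops_al
    # Standard propagation: errors on the two OPS values add in quadrature.
    err_diff = math.hypot(err_ops(pa_al), err_ops(pa_nl))
    w = 1.0 / err_diff**2        # inverse-variance weight
    num += w * diff
    den += w

ave_diff = num / den             # weighted average NL-AL OPS difference
err_ave = 1.0 / math.sqrt(den)   # its one-standard-deviation uncertainty
print(f"{ave_diff:+.3f} +/- {err_ave:.3f}")
```

Weighting each player by the inverse of his error squared means that players with many plate appearances in both leagues dominate the average, while 100-PA part-timers contribute little.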
John Walsh dabbles in baseball analysis in his spare time. He welcomes questions and comments via e-mail.