Lineup Balance
by David Gassko
December 07, 2006
The Detroit Tigers were supposed to win the World Series because of their balanced lineup. They lost in five games. Was their balanced lineup to blame?
Lineup construction is a tricky issue, one we knew surprisingly little about prior to the publication of The Book. Authors Andy Dolphin, Mitchel Lichtman, and Tom Tango gave the issue a thorough treatment, but they only looked at it from a manager’s perspective. The Book tells us how to construct a batting order, but it does not stop to ask (nor should it, given the narrow subject matter of the tome): How might lineup balance, not order, impact a team’s ability to score?
There are a lot of theories in “the book,” and like all baseball wisdom, they are contradictory. Some believe that a balanced lineup gives a team consistent offense day in and day out, and a chance to win every game. Others prefer having a couple of superstars in the lineup, because good pitching shuts down only solid hitting.
These ideas came to a head in a recent thread on the Sons of Sam Horn message board (where I have been hanging out a lot recently, having hosted an Annual-related chat there last Friday). A bunch of interesting questions were raised: Do teams with balanced lineups score more runs than expected based on their component statistics? Are balanced lineups correlated with outperforming Pythagorean winning percentage? Are teams with balanced or unbalanced lineups more likely to win in the playoffs? Do teams that rely on power score fewer runs than expected? Do they win more games than expected? Do they win more in the playoffs? Let’s find out.
The first thing we need to do is come up with a measurement of balance in each team’s lineup. My solution is imperfect, but pretty good. What I did was take every American League team over the past 30 years (so that we wouldn’t be dealing with pitcher hitting), 416 teams in all, and calculate the weighted on-base average (wOBA, minus reached on error) for each of the top nine players in plate appearances on each team.
Then, I calculated the spread (standard deviation) in wOBA for each “lineup” and divided it by the average wOBA for that lineup, which yields a statistic called the coefficient of variation (CoV). We need the CoV because higher offense teams tend to have higher standard deviations, and we don’t want the quality of a team’s offense to pollute our results. We are only interested in the correlation between the balance in a lineup and various measures of team performance.
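The CoV calculation can be sketched in a few lines of Python. The lineups below are hypothetical, made up only to show that a top-heavy lineup yields a higher CoV than a balanced one; this sketch uses the population standard deviation, and switching to the sample version would change the scale but not the comparison.

```python
from statistics import mean, pstdev

def lineup_cov(wobas):
    """Coefficient of variation: the standard deviation of the nine
    lineup wOBAs divided by their average."""
    return pstdev(wobas) / mean(wobas)

# Hypothetical lineups (not real teams)
balanced  = [.340, .335, .330, .345, .338, .332, .341, .336, .333]
top_heavy = [.420, .410, .330, .320, .310, .300, .295, .290, .285]

print(round(lineup_cov(balanced), 3))   # small spread in wOBA
print(round(lineup_cov(top_heavy), 3))  # much larger spread
```

Dividing by the mean is what keeps a high-offense team from registering as "imbalanced" just because all of its wOBAs, and hence their spread, are larger.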
To find the relationship between the various variables and the team’s reliance on home runs, I simply divided the team’s home runs by its runs scored.
So what is balance correlated with? One suggestion is that balanced lineups will outperform their expected runs scored based on component statistics, the thinking being that in a balanced lineup, runners are more likely to be driven in than in an imbalanced lineup, all other things being equal.
To test that theory, I calculated the BaseRuns for each team, using a technical version which includes many minor categories, and a year-and-league-specific multiplier. In essence, I used the best and most accurate run modeler around. I then subtracted each team’s predicted runs scored (BsR) from their actual runs scored, and divided the result by their runs scored, which allows me to express their over-or-under-performance as a percentage, which I’ll call RunDiff%.
The correlation between RunDiff% and CoV was basically non-existent, .016 (p = .739), and highly insignificant, which refutes the hypothesis that balanced lineups will score more runs than their component statistics would suggest. Balanced lineups have no inherent advantage in getting runners home.
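For readers wondering where these p-values come from: with 416 teams, a correlation's significance can be checked with a simple t statistic. A sketch, using the article's r = .016 as input:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def t_stat(r, n):
    """t statistic for testing r != 0. With n = 416 teams (df = 414),
    |t| must exceed roughly 1.97 for significance at the 5% level."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# r = .016 over 416 teams is nowhere near significant:
print(round(t_stat(0.016, 416), 2))
```

The resulting t of about 0.33 corresponds to a two-tailed p-value in the neighborhood of the .739 quoted above.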
But do balanced lineups help their teams win more games than expected? Again, the theory here is that in a balanced lineup, it is less likely that each player will have an off-day, and thus balanced lineups score more runs in the optimal four-to-six run range, and exceed their Pythagorean record.
Indeed, the correlation between CoV and winning percentage minus expected winning percentage was -.066, which means that the more imbalanced a lineup gets, the worse it will do relative to its Pythagorean record. However, while this result is intriguing, it is not statistically significant (p = .178), and furthermore, even if it were, the effect would be equal to a swing of less than one game for 95% of all teams. An extremely balanced lineup would be expected to outperform its Pythagorean record by just half-a-win.
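As a reminder, the Pythagorean record converts runs scored and allowed into an expected winning percentage. A quick sketch with the classic exponent of 2 and a hypothetical team's numbers:

```python
def pythag_wpct(rs, ra, exponent=2.0):
    """Pythagorean expected winning percentage: RS^x / (RS^x + RA^x).
    The classic exponent is 2; refinements such as Pythagenpat vary
    it with the run environment."""
    return rs ** exponent / (rs ** exponent + ra ** exponent)

# Hypothetical team: 820 runs scored, 750 allowed, 86-76 record
expected = pythag_wpct(820, 750)
actual = 86 / 162
print(round(actual - expected, 4))  # over/under-performance in wpct
```

The difference between actual and expected winning percentage, multiplied by 162, is the over- or under-performance in wins that the article is correlating with CoV.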
Okay, now let’s perform one final lineup balance test, perhaps the most important of all. Does having a balanced lineup have anything to do with a team’s performance in the postseason? No. The correlation between playoff winning percentage and CoV is .004 (p = .974), the definition of insignificant. The Tigers lost to the Cardinals because they were outplayed, not because their lineup was not built for postseason success.
The next suggestion comes from SoSH poster Jim Bennett, who wrote:
I'd be willing to bet that there is an inverse relationship between the percentage of runs a team scores via home runs and their Pythagorean run performance. I've seen glimpses of it, but haven't really tried to prove it.
Intrigued, I decided to re-run the same tests I had run in measuring the effects of having a balanced lineup, but instead looking at the effect of scoring runs via the home run. So instead of CoV, we’re going to look at HR/R.
First, let’s look at how teams that hit a lot of home runs score versus expectation. The correlation between RunDiff% and HR/R is -.253 (p < .001), and highly significant. What that means is that teams that are more reliant on home runs tend to score fewer runs than we would expect. This is not a function of our run expectation system over-valuing home runs, because BaseRuns is the one run estimator that treats home runs correctly.
One standard deviation in this case is 6.15 runs, meaning that a team that goes from being average in terms of its reliance on home runs to being in the 16th percentile will increase its run output by over six runs over the course of a season, all else being equal. It is not immediately clear to me why this is, but the effect is obviously very real.
Now let’s address Bennett's observation, which is that teams that are reliant on the home run tend to win fewer games than we would expect based on their runs scored and runs allowed. Simply put, Bennett is wrong. The correlation between HR/R and the difference between actual and expected winning percentage is .085, which means that teams that are more reliant on hitting home runs tend to win more games than expected. This makes sense, as high home run teams are more likely to score at least one run in a game, and the only way to guarantee a loss is to get shut out, as Sal Baxamusa has noted.
The correlation is marginally significant (p = .081), and a one standard deviation change in HR/R is equivalent to .32 wins per season. Basically, this win effect offsets about half of the earlier effect we found of home run-reliant teams scoring fewer runs than we would expect.
For 95% of all teams, then, given both these parameters, their dependency on home runs will have an effect no greater than 12 runs overall, or one win. So there is some benefit in teams cutting down their reliance on home runs in the regular season, about half-a-win for going from average to near the bottom of the pile.
But that’s in the regular season; what about the playoffs? It turns out, there is a significant (r = .153, p = .002) correlation between a team’s reliance on home runs and its success in the playoffs. If an average team jumped to the 84th percentile in its reliance on home runs (a one standard deviation change), it would increase its postseason winning percentage by .037 points (or six wins per 162 games). That is a huge effect. It would most certainly behoove playoff contenders to build up their power.
This result makes sense, for teams face better pitching in the postseason. Thus, all hits are worth less, because they are less likely to drive a runner home (since a runner is less likely to be on base), and getting on base is not as valuable because a hit is less likely to follow. But a home run is always worth at least one, very important, run.
There is still a lot more to learn about the fascinating workings of a lineup. What we have learned today is that lineup balance is generally unimportant, at least to the categories which we have checked. On the other hand, a team’s dependence on power has a relatively large impact both on its regular and postseason numbers. In the regular season, dependence on power hurts teams, while in the postseason it is highly beneficial. Thus, it makes sense that playoff contenders should seek to add more power bats while teams that are just trying to get into the race should shy away from power instead.
References and Resources
An e-mailer has pointed out that for various reasons, the correct measure to use would not be home runs divided by runs, but home runs divided by BaseRuns. Re-running the numbers yields similar results, though the magnitude of the first two correlations changes. The "r" between RunDiff% and HR/BsR is -.098 (p = .045), while the correlation between a team's performance versus its Pythagorean record and HR/BsR drops to .076 (p = .121). The correlation between HR/BsR and postseason winning percentage actually increases ever-so-slightly, to .158 (p = .001).
This changes our conclusions a bit. Teams that are reliant on home runs actually do a little better in the regular season, and a lot better in the postseason. Thus, it is always a good thing to be reliant on home runs, given a choice between two equal offenses. Maybe the Tigers had such a great run through the postseason because of their high home run totals (203 on the year) and not because of their balanced lineup.
David Gassko is a former consultant to a major league team. He welcomes comments via e-mail.