Much has been said and written about having home field advantage. In all professional sports, playing on your home field gives you the advantage of not having to travel to the other team’s city, and having your fans at your back cheering you on. But unlike in many other sports, in baseball there is also a strategic advantage of being the home team, as you get to bat in the bottom of each inning, and most importantly, in the bottom of the ninth and any extra innings, allowing you the possibility of winning in dramatic walk-off fashion.

In the 162-game regular season, every team plays 81 home games and 81 away games, so over the whole season no team gains an extra edge in this category. In the playoffs, however, each series has an odd number of games, and the team with the better regular-season record receives home field advantage, which entitles it to the possibility of playing one extra home game. *(Note that starting in the 2017 season, the league that wins the All-Star Game no longer receives home field advantage in the World Series.)*

We may have just started a new season, but it’s never too early to think about the playoffs. At the end of last season, Sports on Earth published an article that summarized home team performance in recent major league seasons. According to the article, between the 2012 and 2016 seasons home teams won about 53.4 percent of their regular-season games, which means that away teams won about 46.6 percent. Curiously, that trend hasn’t carried over to the best-of-five League Division Series, where teams with home field advantage have advanced only 47.9 percent of the time (23 out of 48 times). But it *has* carried over to the best-of-seven League Championship Series and World Series, where teams with home field advantage have advanced 55.0 percent of the time (83 out of 151 times).

Clearly, the team with home field advantage in a playoff series is expected to advance more than 50 percent of the time. But what exactly is that percentage? This article walks through the steps to calculate the probability that the team with home field advantage wins a best-of-five LDS or a best-of-seven LCS or World Series, visualizes those formulas, and then builds linear estimators for them.

Others have researched and written about this topic before. Jeffrey Gross wrote here at The Hardball Times that in a best-of-seven series between two evenly matched teams, with a 54 percent win probability for the home team, the team with home field advantage will advance 51.25 percent of the time. We will generalize this result by removing the constraint that the two teams are evenly matched.

For this article, we will define all terms to be relative to the team with home field advantage in the series. So a “home game” would be a home game for the team with home field advantage, and an “away game” would be an away game for the team with home field advantage. For brevity, I’ll denote “team with home field advantage” as “HFA team.”

We will derive formulas for the expected probability of the HFA team advancing in terms of:

- Ph – The probability of a home win
- Pa – The probability of an away win

It follows that the probability of a home loss is (1-Ph), and the probability of an away loss is (1-Pa). We will assume that Ph and Pa are constant for every game in the series, and that the outcome of each game is independent of the others.

### Best-of-five League Division Series

In a best-of-five series, you must win three games to advance. The potential five games are played in a 2-2-1 format, so the HFA team is potentially looking at playing two games at home, two games on the road, and then one final game at home. It is possible for the HFA team to “win in three,” “win in four,” or “win in five,” and the total probability we are looking for is the sum of the probabilities of each of these occurrences.
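Before working through the cases by hand, it can help to have an independent numerical check. The sketch below is not the method used in this article: it assumes independent games with constant Ph and Pa, and enumerates every win/loss pattern of all five potential games as if the series were always played to completion. The team that wins a majority of those five games is exactly the team that would have reached three wins first, so the total matches the sum of the “win in three,” “win in four” and “win in five” cases. The function name `series_win_probability` and the `"HHAAH"` schedule string are illustrative choices.

```python
from itertools import product


def series_win_probability(schedule: str, ph: float, pa: float) -> float:
    """Probability that the HFA team wins a majority of the games in `schedule`.

    `schedule` gives the venue of each potential game from the HFA team's
    point of view ("H" = home, "A" = away), e.g. "HHAAH" for 2-2-1.
    Every outcome of every potential game is enumerated, as if all games
    were played even after the series is decided.
    """
    games_needed = len(schedule) // 2 + 1
    total = 0.0
    for outcome in product((True, False), repeat=len(schedule)):
        prob = 1.0
        for venue, won in zip(schedule, outcome):
            p_win = ph if venue == "H" else pa
            prob *= p_win if won else 1.0 - p_win
        if sum(outcome) >= games_needed:  # True counts as one win
            total += prob
    return total


# Evenly matched teams at the 2012-2016 home/away split.
print(round(series_win_probability("HHAAH", 0.534, 0.466), 3))    # prints 0.513
print(round(series_win_probability("HHAAAHH", 0.534, 0.466), 3))  # prints 0.511
```

The same function covers the 2-3-2 best-of-seven format by passing `"HHAAAHH"`, which is why the second call is included.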

The easiest case is “win in three.” There is only one way this can occur:

P(win in three) = P(win Game One)*P(win Game Two)*P(win Game Three)

= Ph^2*Pa

For “win in four,” the advancing team must win Game Four, which is an away game. So if four games were played in total, the advancing team must have won exactly two of the first three games. There are three scenarios in which this can occur:

P(win in four) = P(win two of first three games)*P(win Game Four)

= (P(win Game One)*P(win Game Two)*P(lose Game Three) + P(win Game One)*P(lose Game Two)*P(win Game Three) + P(lose Game One)*P(win Game Two)*P(win Game Three)) * P(win Game Four)

= (Ph*Ph*(1-Pa) + Ph*(1-Ph)*Pa + (1-Ph)*Ph*Pa) * Pa

= Ph*Pa*(Ph + 2Pa -3Ph*Pa)

(To avoid the messy algebra in the last step and in the upcoming calculations, I used Python’s SymPy library to perform the symbolic computations.)
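A minimal SymPy sketch of that kind of check for the “win in four” case might look like the following. It is an illustration, not the author's actual code: it rebuilds the three scenarios above and lets SymPy factor the result.

```python
import sympy as sp

Ph, Pa = sp.symbols("Ph Pa")

# Games One and Two are at home; Games Three and Four are on the road.
win_two_of_first_three = (
    Ph * Ph * (1 - Pa)      # win, win, lose
    + Ph * (1 - Ph) * Pa    # win, lose, win
    + (1 - Ph) * Ph * Pa    # lose, win, win
)
p_win_in_four = win_two_of_first_three * Pa  # then win Game Four on the road

# Should be equivalent to Ph*Pa*(Ph + 2Pa - 3Ph*Pa)
print(sp.factor(p_win_in_four))
```

`sp.factor` may arrange the signs differently (for example, pulling a minus sign out front), but the expression is algebraically the same as the one above.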

Similarly, for “win in five,” the advancing team must win exactly two of the first four games and then win Game Five at home. There are six scenarios in which this can occur (for brevity, I won’t enumerate all the scenarios or show the algebra):

P(win in five) = P(win exactly two of first four games)*P(win Game Five)

= Ph*(6Ph^2*Pa^2 - 6Ph*Pa^2 + Pa^2 - 6Ph^2*Pa + 4Ph*Pa + Ph^2)

Now we can compute the total probability by adding up these three cases:

P(HFA team advances) = P(“win in three”) + P(“win in four”) + P(“win in five”)

= Ph*(6Ph^2*Pa^2 - 9Ph*Pa^2 + 3Pa^2 - 6Ph^2*Pa + 6Ph*Pa + Ph^2)
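The same bookkeeping generalizes: to clinch in game k, the HFA team wins exactly (wins needed - 1) of the first k - 1 games and then wins game k. The sketch below sums those cases symbolically for the 2-2-1 schedule and compares the result with the total above. It assumes SymPy is installed, and the helper names `game_prob`, `p_win_in` and `p_advance` are hypothetical.

```python
from itertools import combinations

import sympy as sp

Ph, Pa = sp.symbols("Ph Pa")


def game_prob(venue, won):
    """Probability of one game result; venue is "H" or "A" for the HFA team."""
    p = Ph if venue == "H" else Pa
    return p if won else 1 - p


def p_win_in(schedule, k, wins_needed):
    """P(HFA team clinches in exactly game k)."""
    total = 0
    # Choose which of the first k - 1 games were won (exactly wins_needed - 1).
    for winners in combinations(range(k - 1), wins_needed - 1):
        term = 1
        for i in range(k - 1):
            term *= game_prob(schedule[i], i in winners)
        total += term
    # ...then win game k itself.
    return total * game_prob(schedule[k - 1], True)


def p_advance(schedule):
    """Sum the "win in k" cases for k = wins_needed .. len(schedule)."""
    wins_needed = len(schedule) // 2 + 1
    return sp.expand(sum(p_win_in(schedule, k, wins_needed)
                         for k in range(wins_needed, len(schedule) + 1)))


best_of_five = p_advance("HHAAH")  # 2-2-1 format
article = Ph * (6*Ph**2*Pa**2 - 9*Ph*Pa**2 + 3*Pa**2
                - 6*Ph**2*Pa + 6*Ph*Pa + Ph**2)
print(sp.expand(best_of_five - article))  # prints 0 when the two agree
```

Passing `"HHAAAHH"` to `p_advance` applies the same case-by-case sum to the 2-3-2 best-of-seven schedule.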

### Best-of-seven League Championship Series and World Series

In a best-of-seven series, you must win four games to advance. The potential seven games are played in a 2-3-2 format: the HFA team hosts Games One, Two, Six and Seven, and plays Games Three, Four and Five on the road. You arrive at the answer by following the same pattern as before:

P(win in four) = P(win Game One)*P(win Game Two)*P(win Game Three)*P(win Game Four)

P(win in five) = P(win exactly three of first four games)*P(win Game Five)

P(win in six) = P(win exactly three of first five games)*P(win Game Six)

P(win in seven) = P(win exactly three of first six games)*P(win Game Seven)

P(HFA team advances) = P(win in four) + P(win in five) + P(win in six) + P(win in seven)

= Ph*(-20Ph^3*Pa^3 + 40Ph^2*Pa^3 - 24Ph*Pa^3 + 4Pa^3 + 30Ph^3*Pa^2 - 48Ph^2*Pa^2 + 18Ph*Pa^2 - 12Ph^3*Pa + 12Ph^2*Pa + Ph^3)
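As a cross-check of this long polynomial, the sketch below uses a different technique from the win-in-k sum: it enumerates all 2^7 win/loss patterns for the 2-3-2 schedule symbolically (as if every game were played), keeps those with at least four HFA wins, and subtracts the formula above. It also evaluates Jeffrey Gross's evenly matched case (Ph = 0.54, Pa = 0.46). This is an illustrative sketch, not the author's code.

```python
from itertools import product

import sympy as sp

Ph, Pa = sp.symbols("Ph Pa")
SCHEDULE = "HHAAAHH"  # 2-3-2 format from the HFA team's point of view

# Playing out the "unneeded" games does not change who reaches four wins
# first, so summing these patterns equals summing the win-in-k cases.
p_advance = 0
for outcome in product((True, False), repeat=len(SCHEDULE)):
    if sum(outcome) >= 4:
        term = 1
        for venue, won in zip(SCHEDULE, outcome):
            p = Ph if venue == "H" else Pa
            term *= p if won else 1 - p
        p_advance += term
p_advance = sp.expand(p_advance)

article = Ph * (-20*Ph**3*Pa**3 + 40*Ph**2*Pa**3 - 24*Ph*Pa**3 + 4*Pa**3
                + 30*Ph**3*Pa**2 - 48*Ph**2*Pa**2 + 18*Ph*Pa**2
                - 12*Ph**3*Pa + 12*Ph**2*Pa + Ph**3)
print(sp.expand(p_advance - article))  # prints 0 when the formulas agree

# Gross's scenario: evenly matched teams, 54 percent home win probability.
print(round(float(p_advance.subs({Ph: 0.54, Pa: 0.46})), 4))  # prints 0.5125
```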

### Visualizing these formulas

The formulas we just derived are really long and complicated! The only thing that you may find intuitive in these formulas is that there is a single “Ph” multiplied by a much longer expression; the single “Ph” exists because in both a best-of-five and a best-of-seven series, the HFA team must win at least one home game to advance.

So let’s create graphs to visualize these formulas, which will hopefully make them more intuitive. Typically, even the best teams struggle to win two-thirds of their games, so we don’t need to plot the entire 0.0 to 1.0 probability range. Let’s assume that Ph will be between 0.35 and 0.75, and Pa will be between 0.25 and 0.65:

It turns out that we are still plotting too much data here. One extreme case has Ph at 75 percent and Pa at 25 percent, and the other has Ph at 45 percent and Pa at 65 percent. For evenly matched opponents, we saw that the home team has an edge of 53.4 percent - 46.6 percent = 6.8 percentage points. A 6.8-point gap seems reasonable, whereas a gap of 20 to 50 points doesn’t.

To correct this, let’s further assume that:

- Pa will never be greater than Ph
- Ph is at most 14 percentage points higher than Pa

By doing this, we are basically zooming in on the previous graphs.

Let’s also define a new axis whose value is (Ph - Pa). This axis is akin to a “home field advantage factor.” A value of zero on this axis means there is no difference between the probabilities of winning at home and winning on the road, and thus no home field advantage factor. A value of 14 percentage points (0.14) on this axis means there is a large home field advantage factor.
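To make the surfaces concrete without a plotting library, the sketch below tabulates both closed-form formulas on a coarse grid over the zoomed-in region, writing d = Ph - Pa so that Pa = Ph - d. The grid spacing is an arbitrary choice for illustration, and the article's graphs may use a different resolution. Reading across a row changes only d; reading down a column changes only Ph.

```python
def p_best_of_five(ph, pa):
    """Closed-form 2-2-1 best-of-five formula derived above."""
    return ph * (6*ph**2*pa**2 - 9*ph*pa**2 + 3*pa**2
                 - 6*ph**2*pa + 6*ph*pa + ph**2)


def p_best_of_seven(ph, pa):
    """Closed-form 2-3-2 best-of-seven formula derived above."""
    return ph * (-20*ph**3*pa**3 + 40*ph**2*pa**3 - 24*ph*pa**3 + 4*pa**3
                 + 30*ph**3*pa**2 - 48*ph**2*pa**2 + 18*ph*pa**2
                 - 12*ph**3*pa + 12*ph**2*pa + ph**3)


# Illustrative grid: Ph from 0.35 to 0.75, d = Ph - Pa from 0 to 0.14.
ph_values = [round(0.35 + 0.05 * i, 2) for i in range(9)]
d_values = [0.0, 0.035, 0.07, 0.105, 0.14]

for name, formula in (("Best-of-5", p_best_of_five), ("Best-of-7", p_best_of_seven)):
    print(name)
    print("  Ph   " + " ".join(f"d={d:<5}" for d in d_values))
    for ph in ph_values:
        row = " ".join(f"{formula(ph, ph - d):7.3f}" for d in d_values)
        print(f"  {ph:.2f} {row}")
```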

Here are the new graphs:

These graphs may be a little difficult to decipher. It is probably easiest to start near the middle of each axis, where Ph is 53.4 percent and the difference in probabilities is 6.8 percentage points. This is the previously mentioned case of two evenly matched teams, and the resulting probability of the HFA team advancing is just over 50 percent (about 51.3 percent in a best-of-five and 51.1 percent in a best-of-seven).

Moving up or down the Ph axis is equivalent to making the HFA team stronger or weaker. Moving up or down the (Ph - Pa) axis is equivalent to increasing or decreasing the effect of home field advantage. Note that there is an inverse relationship in the latter case: with Ph held fixed, increasing the effect of home field advantage means lowering Pa, which lowers the team’s chance of advancing.

### Creating linear estimators

While the equations we derived in the first part of this article were quite complicated, the graphs of these equations over the domains of interest are actually quite simple. By looking at the graphs, we see that we can very reasonably estimate the probability of the HFA team advancing as linear functions of Ph and Pa.

Linear estimators are used all the time in baseball analytics. Some of the most commonly used ones are:

- Scoring 10 more runs over the course of a season will result in about one more win.
- Buying a regular season win via a free agency signing will cost about $8 million.

Now, 10 runs and $8 million aren’t the exact figures you will find if you crunch the numbers, but they are nice round numbers that are still quite accurate. Besides being easy to understand and deal with, there are other properties of linear estimators that often make them more sensible choices than more complicated non-linear estimators.

Let’s “linearize around” the center point of each graph, where Ph is 53.4 percent, Pa is 46.6 percent, and (Ph - Pa) is 6.8 percentage points. This means that we take the true calculated value at the center point and adjust it as we move away from the center point in each direction. Each adjustment equals the slope of the graph in that direction multiplied by the distance moved. To get the slopes, we take the partial derivatives of the functions we previously derived and evaluate them at the center point. Again, to avoid the messy math, I used Python’s SymPy library.
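A minimal SymPy sketch of that calculation differentiates the two closed-form formulas and evaluates the value and both partial derivatives at the center point. It is not necessarily how the author set it up; the printed slopes should round to the values in the table that follows.

```python
import sympy as sp

Ph, Pa = sp.symbols("Ph Pa")

# Closed-form formulas derived above.
best_of_five = Ph * (6*Ph**2*Pa**2 - 9*Ph*Pa**2 + 3*Pa**2
                     - 6*Ph**2*Pa + 6*Ph*Pa + Ph**2)
best_of_seven = Ph * (-20*Ph**3*Pa**3 + 40*Ph**2*Pa**3 - 24*Ph*Pa**3 + 4*Pa**3
                      + 30*Ph**3*Pa**2 - 48*Ph**2*Pa**2 + 18*Ph*Pa**2
                      - 12*Ph**3*Pa + 12*Ph**2*Pa + Ph**3)

# Center point: the 2012-2016 home/away split for evenly matched teams.
center = {Ph: sp.Rational(534, 1000), Pa: sp.Rational(466, 1000)}

results = {}
for name, expr in (("Best-of-5", best_of_five), ("Best-of-7", best_of_seven)):
    value = float(expr.subs(center))
    slope_ph = float(sp.diff(expr, Ph).subs(center))  # slope in the Ph direction
    slope_pa = float(sp.diff(expr, Pa).subs(center))  # slope in the Pa direction
    results[name] = (value, slope_ph, slope_pa)
    print(f"{name}: center {value:.3f}, Ph slope {slope_ph:.2f}, Pa slope {slope_pa:.2f}")
```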

For the first graphs, where we have Ph and Pa as axes, we find that:

Series Type | Ph direction | Pa direction |
---|---|---|
Best-of-5 | 1.13 | 0.75 |
Best-of-7 | 1.25 | 0.94 |

This means that for the HFA team in a best-of-five series, if we increase or decrease Ph by one percentage point, we increase or decrease their probability of advancing by 1.13 percentage points. At the same time, if we increase or decrease Pa by one percentage point, we increase or decrease their probability of advancing by 0.75 percentage points. The same logic holds for best-of-seven series, except with 1.25 and 0.94, instead of 1.13 and 0.75, respectively.

We can now write much simpler equations for the probability that the HFA team advances:

P(HFA team advances in best-of-five) = P(HFA team advances at center point) + (Slope in home win direction)*(Ph - .534) + (Slope in away win direction)*(Pa - .466)

= .513 + 1.13*(Ph - .534) + 0.75*(Pa - .466)

P(HFA team advances in best-of-seven) = .511 + 1.25*(Ph - .534) + 0.94*(Pa - .466)
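The sketch below codes up both linear estimators next to the exact formulas and compares them at a hypothetical off-center point: an HFA team that wins 60 percent of its home games and 50 percent of its road games. The function names are illustrative.

```python
def exact_best_of_five(ph, pa):
    """Closed-form 2-2-1 best-of-five formula derived earlier."""
    return ph * (6*ph**2*pa**2 - 9*ph*pa**2 + 3*pa**2
                 - 6*ph**2*pa + 6*ph*pa + ph**2)


def exact_best_of_seven(ph, pa):
    """Closed-form 2-3-2 best-of-seven formula derived earlier."""
    return ph * (-20*ph**3*pa**3 + 40*ph**2*pa**3 - 24*ph*pa**3 + 4*pa**3
                 + 30*ph**3*pa**2 - 48*ph**2*pa**2 + 18*ph*pa**2
                 - 12*ph**3*pa + 12*ph**2*pa + ph**3)


def linear_best_of_five(ph, pa):
    """Linear estimator around Ph = .534, Pa = .466."""
    return 0.513 + 1.13 * (ph - 0.534) + 0.75 * (pa - 0.466)


def linear_best_of_seven(ph, pa):
    """Linear estimator around Ph = .534, Pa = .466."""
    return 0.511 + 1.25 * (ph - 0.534) + 0.94 * (pa - 0.466)


# Hypothetical HFA team: 60 percent at home, 50 percent on the road.
ph, pa = 0.60, 0.50
for name, exact, linear in (
    ("best-of-five", exact_best_of_five, linear_best_of_five),
    ("best-of-seven", exact_best_of_seven, linear_best_of_seven),
):
    print(f"{name}: exact {exact(ph, pa):.4f}, linear {linear(ph, pa):.4f}")
```

At this point the estimates land within a couple of thousandths of the exact values, which is consistent with the claim that the linearized graphs stay close to the originals.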

Let’s now plot these linearized equations:

If you compare these graphs with the original graphs, you’ll see that the values are quite close to each other. You may notice that the edges of the graph, which were previously almost-straight lines, are now actually straight lines thanks to the linearization.

Let’s repeat the process for the second group of equations and graphs:

Series Type | Ph direction | (Ph - Pa) direction |
---|---|---|
Best-of-5 | 1.88 | -0.75 |
Best-of-7 | 2.19 | -0.94 |

P(HFA team advances in best-of-five) = .513 + 1.88*(Ph - .534) - 0.75*((Ph - Pa) - .068)

P(HFA team advances in best-of-seven) = .511 + 2.19*(Ph - .534) - 0.94*((Ph - Pa) - .068)

Note that if you expand these equations and combine like terms, they reduce to the equations from the previous linearization, which is one way of checking that our math is correct. Also, the last graph shows some computed probabilities above 1.0, which is impossible; we could correct this by manually capping those values at 1.0.
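The sketch below performs that expansion check exactly, using SymPy rationals so that floating-point rounding does not get in the way, and adds a small hypothetical helper that caps a linear estimate to the valid probability range [0, 1].

```python
import sympy as sp

Ph, Pa = sp.symbols("Ph Pa")
R = sp.Rational  # exact decimals, e.g. R("0.534") == 267/500

# First linearization: axes Ph and Pa.
five_a = R("0.513") + R("1.13") * (Ph - R("0.534")) + R("0.75") * (Pa - R("0.466"))
seven_a = R("0.511") + R("1.25") * (Ph - R("0.534")) + R("0.94") * (Pa - R("0.466"))

# Second linearization: axes Ph and (Ph - Pa).
five_b = R("0.513") + R("1.88") * (Ph - R("0.534")) - R("0.75") * ((Ph - Pa) - R("0.068"))
seven_b = R("0.511") + R("2.19") * (Ph - R("0.534")) - R("0.94") * ((Ph - Pa) - R("0.068"))

print(sp.expand(five_b - five_a), sp.expand(seven_b - seven_a))  # prints 0 0


def clamp_probability(estimate):
    """Hypothetical helper: cap a linear estimate to the valid range [0, 1]."""
    return min(max(estimate, 0.0), 1.0)
```

Clamping only changes estimates that fall outside [0, 1]; everywhere else the linear estimator is returned unchanged.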

### Conclusion

While we went through the time-consuming process of deriving complicated equations for the probability that the HFA team advances in the playoffs, we also derived much simpler linear estimators for those probabilities. The estimators are more than good enough for our purposes, especially considering that the underlying Ph and Pa values will themselves be estimates based on some other model. We’ve done more than enough work today; now it’s up to you to compute these magical Ph and Pa values!

### References & Resources

- Sports on Earth, “Home Sweet Home Field”
- Jeffrey Gross, The Hardball Times, “Home-Field Advantage Does Not Exist in the Postseason”
- SymPy.org

Marc Schneider said...

This is an interesting topic and something I’ve thought about. But I’m not a statistician and this article does me no good. It’s completely inaccessible to anyone who isn’t able to parse the methodology. At least, provide some conclusions for the lay person. My eyes just glazed over.

Roger Cheng said...

Apologies that the article wasn’t accessible enough. If there is a single takeaway from the article that would be considered most important, it would be in the tables titled “SLOPES WHEN PLOTTING …”, and the paragraph that follows to explain the tables.

Jetsy Extrano said...

It’s a weird artifact that increasing HFA shows up as decreasing series win probability, because you’ve pegged the axes that way. Why not use (Ph – Pa) and (Ph + Pa)/2 ?

I’m afraid I can’t read anything out of the perspective on those flat surface graphs, so your quoted linear weights are all I get out. Maybe plot a couple of relevant slices as one-dimensional graphs.

Roger Cheng said...

To be completely honest with you, I didn’t think about using (Ph + Pa)/2 as an axis.

In reading the surface graphs, it is helpful to use the contour lines and shades of blue as guides. The contour lines represent lines where the z-axis (Probability of HFA team advancing) is constant, and the lighter the shade of blue, the closer you are to 1.0 on the z-axis.

Michael Bacon said...

Not all home field advantages are the same. The Altered League representative, Minnesota Twinkies, won two World Series, 1987 & 1991, without ever winning a game on the road. It is simply an unfair advantage for any team playing in such an aberrant ballpark. Add the fact that the National League team was forced to play deviant Baseball while in the aberrant American League to your equation(s) and what chance did the true Baseball teams, who were not designed to play in an unnatural league which violates one of the cardinal rules of Baseball, which is “Our NINE versus your NINE,” and what percentage chance did the NL teams have?

Marc Schneider said...

But the Braves still should have won. I tend to agree about the Metrodome, but they had an opportunity to score and the failure had nothing to do with the ballpark or the presence of the DH.