Tools: Game and Series Win Probabilities

Just how frequently would the Astros beat an All-Star team? (via Keith Allison)

Last month, FanGraphs implemented its very cool Game Odds system, which estimates the chance of a team winning a particular game while accounting for the fact that home teams win about 54 percent of the time. A couple of months earlier, I’d shared a tool with the Hardball Times/FanGraphs crew that aimed to do basically the same thing, though using a different method. Well, here I am, finally, with an article to introduce you to that tool. For fellow lovers of probability, I’m including a second tool of mine that complements the first by figuring out what specific game win probabilities mean to a team’s overall chance of winning a series.

FanGraphs’ Game Odds factors in the 54 percent Home Field Advantage (HFA) by simply adding four percentage points to the home team’s chance of winning. That makes sense if both teams are completely league-average, .500-level teams, as the 54 percent HFA is the league average. But to take things to a ridiculous extreme, a home team that would be expected to win 97 percent of the time before taking home field advantage into account couldn’t be expected to win 97 percent + 4 percent = 101 percent of the time. Of course, it’s not realistic to expect one major league team to beat another 97 percent of the time; I’d imagine even the Astros could beat an All-Star team more than three percent of the time. Even so, you can see how there are going to be diminishing returns on home field advantage toward the extremes. That’s why I think there’s a more accurate (though more complicated) way to do the calculations than to simply add four percent to the home team’s chances. I believe that way might be the odds ratio formula.

Astros vs. All-Stars

Just for fun: what would be the expected result of the Astros playing an All-Star team? Well, first of all, what is an All-Star team’s chance of beating a league-average team? In other words, what would be an All-Star team’s expected winning percentage over a season (or more)? So that existing teams don’t lose their best players to this team, let’s assume the roster is composed of exact duplicates of the majors’ best players. Off the top of my head, I’ll guess they’d likely win somewhere between 70 percent and 80 percent of the time against an average team. Let’s go with 75 percent. Let’s also presume that the Astros’ .364 record as of this writing represents their true winning percentage, even though FanGraphs’ projections page has their rest-of-season win percentage at a more conservative .429.

So if the All-Stars would beat a .500 team 75 percent of the time, how often would they beat a .364 team? The log5 formula popularized by Bill James, which FanGraphs’ Game Odds uses, would say 83.98 percent of the time, to be precise. That’s without home field advantage being considered. If the All-Stars were the home team, FanGraphs’ system would put their chances of winning at 87.98 percent; if they were visiting, that would drop to 79.98 percent. My tool, which you’ll see shortly, assuming an HFA of 54 percent, would instead predict these replicants to win 86.02 percent of their home games vs. the Astros, versus 81.70 percent away. That’s a difference between the two methods of around two percentage points in either direction: not huge, but worth considering.
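The numbers above can be checked with a short Python sketch. It is only an illustration, not the code behind FanGraphs’ Game Odds or the spreadsheet tool: the function names (log5, additive_hfa, odds_ratio) are made up for this example, the flat four-point bump is assumed from the description of Game Odds above, and the formulas are the ones written out in The Math section below.

```python
def log5(a: float, b: float) -> float:
    """Chance that Team A beats Team B, ignoring home field (log5)."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)


def additive_hfa(a: float, b: float, home: bool = True, bump: float = 0.04) -> float:
    """log5 result with a flat bump added (home) or subtracted (away).

    Assumption: this mirrors the 'add four percent' approach described above.
    """
    return log5(a, b) + (bump if home else -bump)


def odds_ratio(a: float, b: float, h: float = 0.54) -> float:
    """Odds ratio formula; h is Team A's home field factor (0.54 home, 0.46 away)."""
    losing = (1 - a) * b * (1 - h)   # Team A's losing factor
    winning = a * (1 - b) * h        # Team A's winning factor
    return 1 / (1 + losing / winning)


all_stars, astros = 0.75, 0.364

print(f"log5, neutral:       {log5(all_stars, astros):.4f}")                       # about 0.8398
print(f"additive, home:      {additive_hfa(all_stars, astros, home=True):.4f}")    # about 0.8798
print(f"additive, away:      {additive_hfa(all_stars, astros, home=False):.4f}")   # about 0.7998
print(f"odds ratio, home:    {odds_ratio(all_stars, astros, 0.54):.4f}")           # about 0.8602
print(f"odds ratio, away:    {odds_ratio(all_stars, astros, 0.46):.4f}")           # about 0.8170
```

Note that the additive version can drift above 100 percent for extreme inputs, while the odds ratio version always stays strictly between zero and one as long as its inputs do.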

The Math

The odds ratio formula, the use of which was introduced to me by a number of sabermetric all-stars in a piece on batter-pitcher match-ups last year, is one of my favorite tools nowadays. It’s an offspring of the famous Bayes’ Theorem held dear by many a statistician, and a more sophisticated sibling of the aforementioned log5 formula. The advantage the odds ratio formula (or at least this type of odds ratio formula) has over log5 is that it can take into account league averages, and can therefore apply to circumstances other than the one log5 was developed for.

What is that circumstance? Well, take a look at the formula:

log5: Chance of Team A winning against Team B = A*(1 – B) / (A*(1 – B) + (1 – A)*B)

…where “A” and “B” are the respective overall winning percentages of Teams A and B.

Now compare that to the odds ratio used in my calculations (in a simplified version):

Chance of Team A winning against Team B = 1 / (1+B*(1-A)*(1-H)/A/H/(1-B))

…where “A” and “B” are the respective overall winning percentages of Teams A and B, and “H” is Team A’s Home Field Advantage (by default, A’s HFA is 54 percent if it’s the home team, or 46 percent if visiting).

In an easier-to-read form:

Chance of Team A winning against Team B = 1 / (1 + [(1 – A) × B × (1 – H)] / [A × (1 – B) × H])

With a little algebra, you can see that if you were to take the (1-H) and the H out of the odds ratio equation, you’d end up with the log5 formula. Or, if you had H = 0.5, then 1-H would also equal 0.5, and the two would cancel each other out, again leaving the odds ratio formula with the same result as log5.

What’s happening in the formula, in a nutshell, is this:

  • (1-A) is the basic chance of Team A losing
  • B is the basic chance of B winning
  • (1-H) is the basic chance of a home team losing

If Team A is the home team, then all three of these events have to happen simultaneously for Team A to lose the game, right? That means you have to multiply them all out, and together they combine to relate to the chance of Team A losing. By the way, by “basic chance,” I mean as an overall average, i.e., not specific to this match-up.

Meanwhile:

  • A is the basic chance of Team A winning
  • (1-B) is the basic chance of Team B losing
  • H is the basic chance of a home team winning

Multiplied out together, the combination relates to the chance of Team A winning. That means the equation overall is equal to:

1 / (1 + (A’s losing factor) / (A’s winning factor))

The two factors together make up the L:W ratio for Team A. For example, if Team A has a “true” .620 record, Team B has a true record of .550, and home field advantage for Team A is 60 percent, then the losing factor comes to about .0836 (.380 x .550 x .400), and the winning factor comes to about twice that much, at .1674 (.620 x .450 x .600). You could simplify this to a 1:2 L:W ratio (or 2:1 W:L), or you could run it through the rest of the formula to convert that to a 66.7 percent chance of A winning the match-up.

The rest of the equation simply serves to convert the odds ratio into a probability. The result of the losing factor divided by the winning factor can range between 0 (when there’s no chance of losing) and infinity (when there’s no chance of winning), meaning the end result of the overall equation can range only between zero and 100 percent.

Has home field advantage been integrated into the odds ratio formula in this way before? I’m sure it has, but I bet you haven’t seen a handy tool to do it like this before:
Instructions: The white cells are the ones you’re intended to type in. Begin at the top by editing the “Overall Win” percentages for Teams 1 and 2, the two teams involved in the match-up you’re looking at. These percentages are supposed to reflect each team’s true win percentage against an average (.500) team. The Projected Rest of Season W% numbers on FanGraphs are probably a good source for these numbers.

The box directly under “Home Teams (HFA)” is set to 54 percent, based on home teams historically winning about 54 percent of match-ups. Perhaps you feel some teams have legitimately higher home field advantages than that, due to roster construction, familiarity with stadium and groundskeeping nuances, or umpire favoritism; if so, feel free to play with the number here.

Just remember that the percentages have to be greater than zero and lower than 100 percent. You can use something like 99.9999999 percent, but if a team could truly be expected to win exactly 100 percent of the time, there’d be no point in doing an exercise like this with that team anyway.

Below the teams’ win rates are columns with various scenarios. The first column shows the expected results of a single game, where Team 1 is the home team; if you want to make Team 2 the home team in the match-up, you can change the value of the cell next to “Team 1’s # of Home Games in Series” to 0. The highlighted cell that displays “63.78%” by default is the expected chance of Team 1 winning the one-game “series”; here, it’s the chance of a .600 team winning any particular match-up against a .500 team, while owning home field advantage. If this .600 team were to play the .500 team at home a million times, Groundhog Day-style, you’d expect the .600 team to win about 637,800 of them.

What if we’re instead talking about the chance of the .600 team winning a three-game home stand against the .500 team? Is it 63.78 percent? No, it’s higher: there’s a 70.15 percent chance of winning at least two of the three, as it shows. Let’s look at the possible ways the series could be won, and the probability of each occurrence:

W,W,W: .6378 x .6378 x .6378 = .2595
W,W,L: .6378 x .6378 x (1-.6378) = .1473
W,L,W: .6378 x (1-.6378) x .6378 = .1473
L,W,W: (1-.6378) x .6378 x .6378 = .1473

Added together, these four ways that the .600 team could win the series combine to 70.15 percent. The tool simplifies the math by making use of the binomial distribution. If the chance of winning each game in the series is not the same, however, then using the binomial distribution would actually be an oversimplification. Don’t worry, though; I’ll have a solution for that shortly.

If we’re talking about a five- or seven-game playoff, then neither team is going to have the home field advantage in each and every game. The “Team 1’s Overall HFA” you see in those series is therefore the team’s overall weighted average home field advantage. In a five-gamer, if it has a 54 percent advantage in its three home games and a 46 percent HFA (or disadvantage) in its two away games, that’s a (3 x .54 + 2 x .46)/5 = 50.8 percent overall HFA for the series.

In the last column, we have a 162-game season being calculated out for Team 1. The default setting shows the chance of Team 1 winning at least 100 games over the season, again given that Team 1’s true-talent win rate is .600 and that its average opponent is a .500 team (which it should be, more or less, over a season). The graph below will follow along with the settings in this column, but you can manually override those settings above the graph.
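For anyone who would rather follow the series arithmetic in code than in spreadsheet cells, here is a minimal Python sketch of the binomial approach described above. It is not the spreadsheet’s actual implementation; the function names are invented for this example, and the season line assumes an even home/away split, so an overall HFA of 50 percent.

```python
from math import comb


def odds_ratio(a: float, b: float, h: float) -> float:
    """Single-game chance of Team A beating Team B with home field factor h."""
    return 1 / (1 + (1 - a) * b * (1 - h) / (a * (1 - b) * h))


def at_least(wins_needed: int, games: int, p: float) -> float:
    """Binomial chance of winning at least wins_needed of `games` games,
    assuming every game is won with the same probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins_needed, games + 1)
    )


# Single home game: .600 team vs .500 team, 54 percent HFA
p_home = odds_ratio(0.600, 0.500, 0.54)
print(f"one home game:           {p_home:.4f}")                   # about 0.6378

# Three-game home stand: win at least two of three
print(f"three-game home stand:   {at_least(2, 3, p_home):.4f}")   # about 0.7015

# Best-of-five with three home games: weighted average HFA, then binomial
overall_hfa = (3 * 0.54 + 2 * 0.46) / 5                           # 0.508
p_five = odds_ratio(0.600, 0.500, overall_hfa)
print(f"best-of-five:            {at_least(3, 5, p_five):.4f}")

# 162-game season: chance of 100 or more wins (assumed 50 percent overall HFA)
p_season = odds_ratio(0.600, 0.500, 0.50)
print(f"100+ wins in 162 games:  {at_least(100, 162, p_season):.4f}")
```

Treating a best-of-five as “at least three wins in five games” gives the same answer as stopping when a team clinches, because any games played after the clinch cannot change who won the series.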

Exact Series Breakdown Tool

This follow-up tool shows the expected outcomes of series that are best of three, five or seven games, according to how many wins are needed for a team to clinch the series (two, three or four, respectively). As before, the white cells are the ones you can overwrite.
If the chance of a team winning each game in the series is identical, then there is no point in using this tool; the binomial distribution method used in the first tool will give you identical results. When the chances differ from game to game, though, this tool can give you much more accurate estimates than simply running the overall average win rate through the binomial distribution method. To take a ridiculous example: say your team is made up of Terminator androids from the future that run out of batteries after a couple of games in the series. In a five-game series, they have a 100 percent chance of winning the first two games, but zero chance of winning each of the next three. Overall, this team would have zero chance of winning the series, right? Taking the average win rate of (100%+100%+0%+0%+0%)/5 = 40% and feeding it into the first tool would tell you the team has a 31.7 percent chance of winning the series. The second tool will get it right, though, at zero percent.

On a more practical level, this tool becomes more useful when there are some lopsided starting pitching match-ups on both sides. It also does a better job of accounting for any other significant game-to-game differences, such as home field advantage.
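The Terminator example can be reproduced with a small Python sketch that tracks every possible series score game by game. This is one way to do an exact calculation, not necessarily the logic the spreadsheet uses; the function names and the state-tracking approach are assumptions made for illustration.

```python
from math import comb


def binomial_series(p: float, games: int, wins_needed: int) -> float:
    """Series win chance if every game had the same win probability p."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins_needed, games + 1)
    )


def exact_series(game_probs: list[float], wins_needed: int) -> float:
    """Series win chance when each game has its own win probability.

    game_probs[i] is the chance of winning game i + 1; the series ends as
    soon as either side reaches wins_needed wins.
    """
    live = {(0, 0): 1.0}   # (wins, losses) -> chance the series stands there
    clinched = 0.0
    for p in game_probs:
        nxt: dict[tuple[int, int], float] = {}
        for (w, l), prob in live.items():
            # Our team wins this game
            if w + 1 == wins_needed:
                clinched += prob * p
            else:
                nxt[(w + 1, l)] = nxt.get((w + 1, l), 0.0) + prob * p
            # Our team loses this game; drop the branch if the opponent clinches
            if l + 1 < wins_needed:
                nxt[(w, l + 1)] = nxt.get((w, l + 1), 0.0) + prob * (1 - p)
        live = nxt
    return clinched


terminators = [1.0, 1.0, 0.0, 0.0, 0.0]
average = sum(terminators) / len(terminators)            # 0.4

print(f"binomial on the average: {binomial_series(average, 5, 3):.3f}")   # about 0.317
print(f"exact, game by game:     {exact_series(terminators, 3):.3f}")     # 0.000
```

When every entry in game_probs is the same, exact_series and binomial_series agree, which matches the point above that the second tool only matters when the game-by-game chances differ.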

I tried to make this tool with a logic that can be expanded to larger series lengths, should any of you have some Excel skills and feel like doing it yourselves.  The tools can be downloaded via the green and white Excel icons at the bottom of each.

Next on the Agenda

[Probably] coming in the future will be another tool that attempts to make predictions based on run-scoring and run-prevention traits of the teams involved, along with some historical testing.  Until then, happy statisticking!


Comments

  1. tz said...

    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    I will not play with this at work today.
    .
    .
    .
    .

    • said...

      Thanks Cliff. It looks like Heg’s 1983-1988 findings were in opposition to James’ 1977-1982 findings, right? Heg showed a lower home field advantage for teams with higher winning percentages.

      I just ran the numbers on 2002-2013. My findings: a negative 0.05 correlation between overall winning percentage and the quantity (Home win% – Away win%), which contradicts James’ findings as well (or at least contradicts the idea that the factor is still applicable). To me, a -0.05 correlation means whatever trends previous studies found were somewhere between unreliable and a complete fluke.

      Bucketing the teams, however, showed the overall trend to be that the home field advantage actually tended to be lower for the higher win% teams. So if there’s a trend here at all, it’s probably the opposite of what James found for 1977-1982. But I’d feel safer saying it’s a non-factor.

  2. said...

    Nice. I’m a hockey analyst that built a very similar tool for the NHL playoffs that worked out the home-ice implications of the NHL’s 2-2-1-1-1 series format. Two big implications: 1) the team starting the series at home gets a huge advantage by winning the opening 2 games; 2) if the team starting the series on the road is better, sweeps or six-game series wins are more likely than five- or seven-game outcomes.
