Season leverage index

by Dave Studeman
December 11, 2008

Well, I don’t know about you, but I found the National League MVP voting truly enlightening, in a negative-BBWAA-sort-of-way. Like a lot of sabermetric types, I didn’t think that Ryan Howard was the most valuable player on the Phillies, let alone in the league. I didn’t even think he was one of the 10 most valuable players in the league.

Think about that. The guy led the league in home runs and RBIs by a good margin, yet a doofus like me thinks he wasn’t one of the 10 most valuable players in the league? As Craig Wright has written, it really is unbelievable. This is a list of things that Craig figured a player would have to do to achieve such a feat:
- Play a big offense position like first base and play it poorly,
- Have the third worst on-base percentage at his position,
- Lead his position in outs made, and
- Do it in a season with a lot of strong individual performances.

Yeah, Ryan Howard did all of those things. But that didn’t matter to the BBWAA. Howard finished second in MVP voting, with 12 voters placing him first (the winner, Albert Pujols, had 18 first place votes).

One of the reasons Howard received so many votes was his September performance, when he batted .352/.422/.852 while the Phillies overtook first the Brewers and then the Mets to qualify for the postseason. The MVP voting is really all about the drama, and Howard provided game thrills that those poor bored sportswriters seemed to enjoy.

Now, we can complain all we want that April wins count as much as September wins, but that’s not a very interesting way to look at things. It doesn’t thrill your typical sportswriter and, honestly, it probably doesn’t thrill your average fan. I have a feeling Howard might have done just as well if the MVP voting were conducted the same way MLB conducts All-Star Game voting: mass voting by casual and intense fans alike. Everyone likes a good stretch run.

Well, why not count late games in pennant races more than earlier games? After all, part of the fun of being a fan is seeing your favorite player or team come through “when it counts.” And the heart of a fan is much more likely to be crushed in September than April. As my barber said to me the other day, “Maybe it would have been better if the Cubs had never competed at all. Why can’t we just have fun again?” Yes, they’re still trying to recapture their mojo on the North Side.

The entire issue got me to thinking: if we accept that September games provide more drama and have more impact than April games, can we quantify the difference? (You’re probably not surprised I thought about the numbers angle.) For in-game situations, we have Tangotiger’s Leverage Index, in which 1.0 is an average situation. When the Leverage Index is higher than one, it’s more critical (2.0 or 3.0 are typically good cutoff points for really critical situations—about 3% of all plays had a Leverage Index of 3.0 or more last year). Can we derive the same sort of thing for in-season games?

Of course we can. I’ve put together a system that I think does the trick pretty well and I’m going to offer it here as a way of stimulating discussion and debate. Once we refine the method, we can apply it to Howard, Pujols and all the other MVP candidates to rank their “real-time” impact on the pennant race. Why? Because we baseball nerds really like to spoil the BBWAA party by quantifying everything we can think of.

I’m going to list all the gory mathematical details at the end of the article. Suffice it to say that I used a binomial distribution of how often a team is likely to gain X wins in the next Y games (assuming that all teams are naturally .500 teams) over 162 games. I then compared the difference in the binomial distribution between an incremental win and a loss at all points of the season, and indexed that to the 112th game (which, it turns out, represents the average difference).

Like I said, I’ll go into more detail in a second—let me first show you the results. For this index, I assumed that a .500 team would need to play .500 ball for the rest of the season to reach the postseason (I don’t think it really matters exactly what the goal is because we’re indexing the results). Here are the month-by-month results:

Month        LI
April        0.6
May          0.6
June         0.7
July         0.9
August       1.1
September    2.2

So, according to this methodology, September games are almost four times more critical than April games. In fact, they’re twice as critical as August games. That’s a pretty steep curve, which is even more apparent when you graph the index of every other game as the season progresses:

This curve may seem too steep to you, but it makes a lot of sense. Of course, when there is only one game left in the season, the potential outcome is going to have a lot more impact than an April game, when there are 150 to 160 more games to go. The question I have is: is this how steep the BBWAA (and many fans) think it is? In fact, don’t you sometimes get the feeling that MVP voters only consider second-half performances and nothing else? Why would Manny Ramirez finish fourth in MVP voting otherwise? Why would Chase Utley finish 15th?

So, let’s go back to our original inspiration. What does this crude tool do for Ryan Howard? To get at that question, I guesstimated monthly Runs Created for Pujols and Howard by calculating each player’s GPA for each month and applying this simple math. (By the way, GPA worked pretty well as a Runs Created estimator for these two players. It gave Ryan Howard 108 RC vs. an “actual” RC count of 113 (per Baseball Reference), and Pujols wound up with 150 RC vs. BRef’s 160.) I then applied the monthly Leverage Index for both players, like so:

                Original Runs Created          Leveraged Runs Created
Month             Howard   Pujols       LI       Howard   Pujols
April/March         11       32         0.6         6       19
May                 20       29         0.6        13       19
June                12       11         0.7         9        8
July                20       23         0.9        17       20
August              16       32         1.1        19       36
Sept/Oct            28       24         2.2        60       52
Total              108      150                   125      153
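The leveraged columns are just each month's Runs Created multiplied by that month's index. Here is a minimal Python sketch of that arithmetic, using the rounded monthly index values from the table above. The variable and function names are mine, purely for illustration, and because the rounded index is used (the table presumably used unrounded values, which is my assumption), the totals can land a run or so away from the ones shown.

```python
# Monthly index values as published (rounded to one decimal place).
monthly_li = {"Apr": 0.6, "May": 0.6, "Jun": 0.7, "Jul": 0.9, "Aug": 1.1, "Sep": 2.2}

# Estimated monthly Runs Created, copied from the table.
howard_rc = {"Apr": 11, "May": 20, "Jun": 12, "Jul": 20, "Aug": 16, "Sep": 28}
pujols_rc = {"Apr": 32, "May": 29, "Jun": 11, "Jul": 23, "Aug": 32, "Sep": 24}

def leveraged_runs(rc_by_month, li_by_month):
    """Weight each month's Runs Created by that month's leverage index."""
    return {month: rc * li_by_month[month] for month, rc in rc_by_month.items()}

howard_lev = leveraged_runs(howard_rc, monthly_li)
pujols_lev = leveraged_runs(pujols_rc, monthly_li)

howard_total = sum(howard_lev.values())
pujols_total = sum(pujols_lev.values())

print(f"Howard leveraged RC: {howard_total:.1f}")
print(f"Pujols leveraged RC: {pujols_total:.1f}")
```

Swapping in a game-by-game index instead of a monthly one would only change the keys of the two dictionaries.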

Howard closes the gap, but Pujols performed pretty well in September, too, and when you factor in fielding prowess, Pujols gains another 20 runs on Howard. It’s still a slam dunk decision.

But the case isn’t closed. This is a very crude approach—it doesn’t include the specific leverage of each game played by these two players. The Cardinals played fewer critical September games than the Phillies did, and the LI multiplier ought to reflect that.

Good news: I’m almost finished building that spreadsheet. But I first wanted to get some feedback about this general approach. Let me hear you.

Gory mathematical details

I’m not really a math geek; I just play one on the Internet. So take this system with a grain of salt. But to derive in-season game criticality, I used the ol’ binomial distribution (the distribution you get from a series of Bernoulli trials). A binomial distribution (which is a simple function in Excel called BINOMDIST) is a distribution of possible outcomes, given a certain number of trials and an underlying assumed probability. It works for any “binomial” outcome, such as wins/losses, heads/tails, yes/no, up/down, on/off.

For instance, if you have four games remaining and a 50% chance to win each one, the binomial distribution predicts this range of outcomes:
- 0 wins: 6.25% of the time
- 1 win: 25% of the time
- 2 wins: 37.5% of the time
- 3 wins: 25% of the time
- 4 wins: 6.25% of the time

Nice symmetrical output, isn’t it? If you multiply the probabilities times the number of wins, you get an average of two wins—as you should. Note that if you add these probabilities, you find that a team will win at least two games 68.75% of the time. In other words, if it has to play at least .500 ball the last four games to make the postseason, there is nearly a 70% chance that it will do so. This may run counter to your “natural” assumption that this team has only a 50% chance of making it in that situation.* That’s the power of the distribution output.

*I’m ignoring competition here. Once you factor in a competitor, the odds change.
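If you would rather check the four-game example outside of Excel, here is a small Python sketch that computes the same distribution from the standard binomial formula. The function name `binomial_pmf` is my own label, not anything from the spreadsheet.

```python
from math import comb

def binomial_pmf(wins, games, p=0.5):
    """Probability of exactly `wins` wins in `games` independent games,
    each won with probability `p`."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)

games = 4
dist = {w: binomial_pmf(w, games) for w in range(games + 1)}

# Average number of wins: each outcome weighted by its probability.
expected_wins = sum(w * prob for w, prob in dist.items())

# Chance of playing at least .500 ball (two or more wins in four games).
at_least_two = sum(prob for w, prob in dist.items() if w >= 2)

for w, prob in dist.items():
    print(f"{w} wins: {prob:.2%}")
print(f"expected wins: {expected_wins}")
print(f"at least 2 wins: {at_least_two:.2%}")
```

The 68.75% figure is simply the last three probabilities added together.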

So, to derive the index, I created a binomial distribution table for 162 games and all possible outcomes (keeping the underlying probability at 50%). I then took each “cell” in the table and compared the difference between a loss and a win at that point. In other words, I looked at the binomial distribution in the next column and took the difference between the team being a win under .500 and a win over .500 (assuming that the goal is to finish the season at .500). As the season progresses, this difference increases from .0626 in the first game to .5000 in the next-to-last game.*

*I only used even-numbered games to derive the index, because there were no “.5” wins in the table.

Then I divided those indices by the overall average, which is .1123. And that’s how I calculated in-season game leverage.
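For anyone who wants to play along without a spreadsheet, here is a rough Python sketch of one reading of the steps above. It assumes a team that enters each game at exactly .500 (so only games with an even number already played are used), a goal of finishing at .500 or better, and coin-flip odds in every remaining game. The names are hypothetical, and the average this sketch produces may not match .1123 exactly, since that depends on precisely which games go into the average.

```python
from math import comb

SEASON = 162          # games in the schedule
GOAL = SEASON // 2    # wins needed to finish at .500 (81)

def win_loss_swing(game):
    """Difference in the chance of reaching GOAL between winning and losing
    `game`, for a team that is exactly .500 going in.

    Assumes an even number of games already played and a 50% chance of
    winning every remaining game.
    """
    played = game - 1
    remaining = SEASON - game
    needed_if_loss = GOAL - played // 2
    # P(X >= needed - 1) - P(X >= needed) collapses to P(X == needed - 1),
    # where X is the number of wins in the remaining games.
    return comb(remaining, needed_if_loss - 1) / 2**remaining

# Games 1, 3, ..., 161: the ones a team can enter at exactly .500.
games = range(1, SEASON, 2)
swings = {g: win_loss_swing(g) for g in games}

# Index each game's swing to the season-long average, so 1.0 is average.
average_swing = sum(swings.values()) / len(swings)
season_li = {g: s / average_swing for g, s in swings.items()}

print(f"first game swing:        {swings[1]:.4f}")
print(f"next-to-last game swing: {swings[161]:.4f}")
print(f"average swing:           {average_swing:.4f}")
```

Changing `GOAL` is the obvious hook for a team that needs something other than a .500 finish, though the function would then have to handle records that are not exactly .500.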

When I apply this math to specific players, I will look at the postseason requirements of each specific team at the time of each game. I’ll calculate how far behind or ahead they are in the postseason race (all the wild card permutations make my head spin), and use that as the postseason goal instead of .500. To simplify the math, I’ll assume that all teams are .500 teams. I know this isn’t reality, but we’re not trying to predict pennant races; we’re just developing an index.

If you have questions or comments, please leave them at Ballhype. In the meantime, I’ll get back to my spreadsheet.

References & Resources
Ken Ross’ A Mathematician at the Ballpark is a great source of information about Bernoulli trials and binomial distributions. Of course, Tangotiger’s Leverage Index concept (my favorite sabermetric invention of the last umpteen years) was the original inspiration for this approach.
