The WPS Index (part one)

by Shane Tourtellotte
August 15, 2012

As my earliest readers at The Hardball Times know, and as I keep mentioning, I love steals of home. So the July 14 game between the Padres and Dodgers grabbed my attention when the tying run in the ninth came across on Everth Cabrera’s two-out steal of home, and Will Venable scored the go-ahead run on the throwing error that was made attempting to catch Cabrera. It was a great way to win a game, and one that got me thinking.

Allow me to get you thinking the same way. Take a look at the FanGraphs game graph for that contest.

Source: FanGraphs

Team     1  2  3  4  5  6  7  8  9  F
Padres   1  0  0  1  0  2  0  1  2  7
Dodgers  1  0  2  0  1  2  0  0  0  6

The game was tight all the way through, with neither team ever ahead by more than two runs and with a number of quick comebacks. The Padres entered the ninth down, 6-5, when this happened (you can mouse over the graph to follow along):
Yonder Alonso and Venable hit consecutive singles to set up the Padres with runners on the corners (including pinch-runner Cabrera) and no outs. Big swing for the Padres: they’re now favorites to win.
Cameron Maybin strikes out hacking. A swing back to the Dodgers.
Venable steals second, putting the winning run in scoring position and partly making up for the Maybin whiff.
Mark Kotsay pops out to second. Huge out for LA, as the Dodgers are now one out away from victory.
Cabrera bolts for home, making it on a wild throw that not only fooled the umpire, but let Venable come around to score, too. Padres take the lead and never lose it.
That’s a wonderful half-inning. Every play except the final out makes a big difference in Win Expectancy, the double-steal changing the odds by more than 60 percentage points. A good game became a great one, and it seemed one could measure the greatness by how much the Win Expectancy swung with each play.

Would this really work? Had I stumbled upon a way to tabulate how good, or at least how exciting, a baseball game was by how much the Win Expectancy moved play by play? I decided to run some numbers, but I also had to realize that I wasn’t the first person to invent a method of quantifying the excitement of baseball games. I wasn’t even close to the first person at The Hardball Times to do it.

The road more traveled

Before sailing into these waters, I will give brief explanations of a few statistics to be sure I’m not leaving any newer readers behind. If the opening section confused you, read on. More experienced hands can skip forward three paragraphs.

Win Expectancy (WE) measures the probability of a team winning the game, based on its current lead or deficit, the inning being played, the number of outs and the runners on base. Teams are assumed to be evenly matched, so each has a 0.50 WE as the game begins. (1.0 equals a win; 0.0 equals a loss.) Starting the game by, say, grounding out would lower your WE below 0.50; tripling to open the game would raise it above 0.50; each subsequent play would raise or lower the number accordingly.

Win Probability Added (WPA), which I’ll also call Win Percentage Added, is the change in a team’s chances of winning caused by a specific play or plays. A solo home run breaking a tie in the second inning would produce a WPA (rendered as a decimal) of something like +0.11; a caught stealing in a tied second inning could be around -0.03. Kirk Gibson’s homer to win Game One of the 1988 World Series was a +0.87 for the Dodgers.
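For readers who like to see the arithmetic, WPA is just the difference between consecutive Win Expectancy values. The short Python sketch below illustrates that. The function name `wpa_per_play` and the WE numbers are invented for illustration; they are not taken from FanGraphs or Baseball-Reference.

```python
def wpa_per_play(we_path):
    """Return the change in Win Expectancy between consecutive game states."""
    return [after - before for before, after in zip(we_path, we_path[1:])]


# Hypothetical WE values for one team, as decimals (not from a real game log):
# start of inning, after a caught stealing, after a solo homer, after an out.
we_path = [0.50, 0.47, 0.58, 0.55]

# Round to two places to hide floating-point noise.
deltas = [round(d, 2) for d in wpa_per_play(we_path)]
print(deltas)  # [-0.03, 0.11, -0.03]
```

Each entry is one play's WPA from that team's point of view; the other team's WPA on the same play is the same number with the sign flipped.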

Leverage Index (LI) is a ratio measuring how critical a play’s result will be in deciding the ultimate winner. The average value is 1.0; higher numbers indicate a bigger potential shift than average in Win Expectancy on that play. The LI is measured without regard to the result: pop-outs and grand slams alike can have a huge (or tiny) LI, depending on the volatility of the game situation before they are made.

These stats are the essence of two existing simple measures of a game’s excitement, employed by FanGraphs and listed for every contemporary game. Average Leverage Index (aLI) averages the LIs of all plays, producing an overall score of how tense a game is. A score above 1.5 marks a taut, closely fought game; a score below 0.5 indicates a laugher.

Average Win Expectancy (aWE) works much the same, but its numbers must be read differently. Scores range between 0.0 and 1.0, with low scores favoring the road team and high scores the home team. Scores closer to 0.5 indicate tighter games, or games that swung back and forth between favoring one team and the other. Scores near 0.0 or 1.0 show one team piling up a lead early and holding it, not the best recipe for excitement.
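Both averages are simple to compute once you have per-play values. Here is a rough sketch, assuming aLI and aWE are plain unweighted means over the plays of a game and that WE is tracked for the home team. Those are assumptions for illustration, not a description of FanGraphs' exact procedure, and the numbers are invented.

```python
from statistics import mean

# Hypothetical per-play values for a short stretch of a game.
leverage = [0.9, 1.4, 2.6, 0.7, 1.9]       # Leverage Index before each play
home_we = [0.50, 0.56, 0.41, 0.47, 0.62]   # home-team Win Expectancy

ali = mean(leverage)   # about 1.5: tense by the aLI yardstick
awe = mean(home_we)    # about 0.51: neither side dominated

print(round(ali, 2), round(awe, 3))
```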

Both methods are a bit crude but have the virtue of simplicity. They are easily understood and easily calculated by anything from a modest spreadsheet program to the awesome processing power of FanGraphs. They give you good first-order approximations, so of course we’re not satisfied with them.

THT’s first crack at a game-excitement formula came from Dennis Boznango in November 2005. His method combined two factors. Each half-inning, he’d add up how close to 50 percent the home team’s WE was (with extra credit for tight scores late in the game) and the WPA over the preceding half-inning, showing how much “action” there’d been in that frame. A close game with several turnarounds would score highly.

Boznango added factors for gauging the excitement of postseason series by how critical the series itself was (World Series down to LDS) and how close the series stayed (big leads discourage us from staying tuned, even if there is a thrilling comeback).

Next came Max Marchi. He considered a whole raft of factors, including mean change in Win Expectancy (something very close to what I’m considering), absorbed readers’ suggestions, and used advanced factor analysis to produce his method. It combines no fewer than eight separate variables relating to the three factors that he found made for an exciting game: equilibrium, rally, and late-game importance.

Thanks to its appearance in The Hardball Times’ Baseball Annual 2012, this method has a semi-official imprimatur. It also has the burdens of complexity and opacity. Marchi’s articles do not actually spell out how he calculates his game ratings, so we can’t duplicate his methods and apply them to games ourselves. We’re taking his word for it, and we clearly don’t like doing that, given how people keep coming up with their own systems!

Last year, Chris Jaffe joined that growing throng. His method (given at the bottom of the linked page) had a different look, being built top-down rather than bottom-up. Jaffe put together a points system based on things fans would subjectively consider exciting: close scores, comebacks, extra innings, walk-off wins, outstanding individual performances, and more. (He also rates the excitement of postseason series, which is a bit removed from what I’m doing today.)

The WPS method

And then we come to me. I should be daunted by all of the other methods that have been crafted. Actually, I am daunted. I’m plowing ahead anyway because I believe I’ve stumbled upon something that gives a good measure of a game’s excitement, at least by certain definitions, while being transparent and simple to calculate, so other people can have fun with it.

The basic system, which I am calling the Win Percentage Sum (WPS) Index, is described quite briefly. The WPS of a game is the sum of the absolute values of Win Percentage Added for every play in the game. WPAs are rendered as positive or negative, depending on which team sees its chance of winning rise. For WPS, remove all the minus signs and add everything up. There are other elements I will add to the system, but this is the foundation.
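In code, the foundation fits in a couple of lines. This is only a sketch: the function name `wps` and the sample WPA values are hypothetical, with values in percentage points as a play-by-play log might list them.

```python
def wps(wpa_values):
    """Sum the absolute values of per-play WPA (here in percentage points)."""
    return sum(abs(v) for v in wpa_values)


# Hypothetical WPA values for the plays of one half-inning.
# Positive favors one team, negative favors the other.
half_inning = [12.0, -8.5, 6.0, -15.5, 20.0]

print(sum(half_inning))  # 14.0  net swing, signs kept
print(wps(half_inning))  # 62.0  WPS, signs dropped
```

The gap between the two printed numbers is the point of the index: the net swing says who gained ground, while WPS says how much the odds moved along the way.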

Let’s show how it works with the game that triggered all this: Padres-Dodgers on July 14. There are two sources I can use for the vital statistics: FanGraphs and Baseball-Reference. FanGraphs does its WPA calculations to the tenth of a percentage point, where B-R rounds to whole percentage points. However, FanGraphs’ numbers go back only to 1974 for the regular season and 2002 for the playoffs, whereas B-R covers many more games, especially in October. The two sites also differ in how they calculate park effects.

For this case, I’ll show the results derived from both sources. Numbers are given as percentage points: e.g., Gibson’s homer would count for 87, not 0.87. Compare the numbers to the game graph and line score above.

WPS (FG)     1     2     3     4     5     6     7     8     9 Total
Padres    18.5   4.8   5.1  16.5   6.6  37.5   6.2  33.4 154.7
Dodgers   28.5   4.8  46.7  12.0  21.2  38.9   6.1   6.1  17.4   465

WPS (B-R)    1   2   3   4   5   6   7   8   9  Total
Padres      18   5   5  17   6  36   6  34 152
Dodgers     27   5  46  12  22  38   6   7  18    460
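The game totals in the two tables above are simply the half-inning figures added together. A quick cross-check in Python, using the numbers exactly as printed in the tables (the variable and function names are just for illustration):

```python
# Half-inning WPS figures copied from the two tables above.
fg = {
    "Padres":  [18.5, 4.8, 5.1, 16.5, 6.6, 37.5, 6.2, 33.4, 154.7],
    "Dodgers": [28.5, 4.8, 46.7, 12.0, 21.2, 38.9, 6.1, 6.1, 17.4],
}
br = {
    "Padres":  [18, 5, 5, 17, 6, 36, 6, 34, 152],
    "Dodgers": [27, 5, 46, 12, 22, 38, 6, 7, 18],
}


def game_total(table):
    """Add up every half-inning figure for both teams."""
    return sum(sum(innings) for innings in table.values())


fg_total = round(game_total(fg), 1)  # rounding hides floating-point noise
br_total = game_total(br)

print(fg_total)  # 465.0
print(br_total)  # 460
```

The five-point gap between the sources reflects the rounding and park-effect differences already mentioned, not a different method.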

This is a very good score on the WPS scale. The theoretical minimum a game can score is 50, going from an even start to certainty of victory without any break for the losing side. A good majority of games falls between 150 and 300. For comparisons on the high side, Game Six of the 1975 World Series comes out to 608 (according to Baseball-Reference), while Game Seven of the 1960 World Series tallies a mere 455 (ditto). Indeed, why not let you see what those look like?

WPS (B-R)   1   2   3   4   5   6   7   8   9  10  11  12 Total
Reds        9   4   6  13  53  17  62   9  15  38  30  43
Red Sox    32   3  10  11  16  15   9  75  72  15  15  36   608

WPS (B-R)   1   2   3   4   5   6   7   8   9 Total
Yankees     5   5   5   5  10  67  11  27  63
Pirates    25  34   4   1   2   9  26 119  37   455

Already, your eye is picking out those single-digit stretches where not much was happening and likewise fixing on those big numbers, recalling what made those innings big. Note for a second the Red Sox’ eighth and ninth: nearly identical numbers produced in much different ways. The eighth was Bernie Carbo’s three-run home run to tie the game; the ninth was bases loaded, nobody out, George Foster versus Denny Doyle, and Boston failing to push across the winning run. Two different ways of being exciting get similar credit. That’s encouraging.

All systems like this have their strengths and weaknesses, the incidents they emphasize and those they shrug off. To understand the results we’re getting, we need to look at these structural elements, and the WPS method does have very distinct likes and dislikes. The question is whether they match up with what’s interesting and what’s boring. Let’s see.

Things WPS likes

High scores. Lots of runs means lots of opportunities for big win-expectancy swings. Take the last four innings of the Yankees-Pirates finale. It may feel a bit childish, and we may still feel burned after the offensive surge of the 1990s turned out to be Better Hitting Through Chemistry. Regardless, runs are exciting. The system has a bias toward slugfests as opposed to nail-biting pitchers’ duels. Whether this is a bug or a feature may depend on what kind of game you like.

Tight scores and prompt comebacks. Even two- or three-run leads seriously depress WPA movement if they last. The sooner the other team climbs back, the better—and the sooner more back-and-forthing can occur. This fits well with our reactions. A four-run margin doesn’t put us on the edges of our seats the way a one-run margin, or a tie, does.

Interspersed outs. Especially in late and close innings, quick pendulum swings can produce a lot of WPA motion; the Padres and Dodgers demonstrated that. Breaking up the three outs with hits and other on-base events produces a “sawtooth” pattern, a meandering path to the ultimate result of the inning. (I nearly called my method the Richter Scale, after the look of a really exciting game graph. That name would be far less charming, though, after the next big earthquake.)

This factor helps uncover some “hidden” game excitement not firmly tied to scoring. A pitcher squeezing out of a jam is usually more interesting than mowing down the side in order (though I will get to the exceptions). In baseball, the most exciting distance between two points is not a straight line.
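The sawtooth effect is easy to see with numbers. The sketch below compares two scoreless half-innings that start and end at the same Win Expectancy: one where the side goes down in order, and one where the pitcher escapes a jam. All WE values are invented for illustration, and `path_wps` is a hypothetical helper name.

```python
def path_wps(we_path):
    """Sum of absolute WE changes along a path, in percentage points."""
    return sum(abs(b - a) for a, b in zip(we_path, we_path[1:]))


# Hypothetical batting-team WE (percent) through a scoreless half-inning.
in_order = [50, 48, 46, 44]             # three quick outs
jam = [50, 55, 60, 54, 58, 50, 44]      # two men reach, pitcher wriggles free

print(path_wps(in_order))  # 6
print(path_wps(jam))       # 34
```

Both innings leave the game in the same place, but the meandering one scores nearly six times as high, which is the "hidden" excitement the index is picking up.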

Extra innings. The WPS scale is cumulative, so each added frame—starting tied up for maximum leverage—contributes nicely to the total. Boston’s down-in-order 10th and 11th in Game Six still have some excitement, since one swing can literally decide it (as the 12th proved). I will gladly bow to the inherent excitement of extra innings. We’re all fans here, right? If we didn’t think more baseball was better, we’d be reading some other website.

Walk-off wins. A walk-off produces a score in the final half-inning ranging from substantial to huge, which beats the home team not even batting in the last of the ninth or going down meekly in their last licks. With how much fans love walk-offs, the WPS if anything underrates them, but we can remedy this later.

Things WPS hates

Big leads lasting a long time. Large leads sap the tension out of a game. They make one side’s victory look inevitable, especially if the margin persists for several innings. By the time the dramatic comeback begins, fans may have figured this is a good time to clean out the garage and let the spouse/roommate/whoever watch his or her shows. If one team jumps out to a big lead, the other needs to eat into it fast, or interest and WPS scores alike will sag.

Consecutive outs, and especially double plays. These are lost opportunities for the “sawtooth” pattern that produces the highest scores on the scale. Double plays can be exciting by themselves, but they also lower the chances of other exciting events that inning, often to zero.

Outs made on hits, runs scored on outs, and bases advanced on outs. What these plays have in common is that they have positive and negative run elements cancelling each other out. This is one of WPS’s greatest flaws. It devalues sacrifices, bunts and flies alike, and perhaps that’s fair. It also diminishes the excitement value of a runner getting thrown out going for the extra base, one of the most reliably exciting plays in baseball.

Things WPS ignores

Seasonal context. Game Seven of the World Series, a playoff for the division title, or an August game between also-rans are all the same to WPS. We can add adjustments for context, but it’s not going to come from the base numbers.

How the plays are made. WPS doesn’t do degree of difficulty. It can’t tell a pop fly to center field from The Catch, except that Larry Doby probably doesn’t go second to third on the pop fly—and thus, due to Hate No. 3 above, The Catch ranks as less exciting! Heck, look it up: The Catch ranks as the least exciting play of that half-inning! This method cannot measure that kind of excitement, and it’s hard to imagine a purely statistical method that could.

This extends to other plays. The system doesn’t know a leadoff single from a leadoff walk, a ringing double from a muffed medium fly. Not that errors can’t be exciting, but there’s nothing inherent in the system that can distinguish the exciting from the merely embarrassing.

The daily run environment. Even when FanGraphs and Baseball-Reference take park factors into account, they do so for a baseline of many games. They don’t catch variations within that average. Take the fabled Phillies-Cubs 23-22 slugfest from 1979 (base score: 502 by Baseball-Reference, 512.6 by FanGraphs). Things get boring on the WPS scale in the middle innings because Philadelphia has built a wide lead. The scale doesn’t take into account the stiff wind blowing out, which both helped produce that lead and made it much less safe than it would have been on a calm day at Wrigley.

Records and similar achievements. This is partly another matter of external context, how we subjectively gauge things as exciting or noteworthy. Take this well-known game, the participants obscured for dramatic effect.

WPS (B-R)  1   2   3   4   5   6   7   8   9 Total
Visitors   5   5   6   6   6   9   6   7   7
Home       5   5   6  20  11  24   6   1   X   135

The WPS Index does everything but yawn here. The home team scored once in the fourth and once in the sixth, and you would be hard-pressed to find a more boring 2-0 game, right? Look at the visitors’ line: it’s like they never got anything going.

And they didn’t. Don Larsen made sure of that.

My method, which likes pitchers getting into jams, betrays this kind of game, where the whole point is that Larsen never had to worry about a baserunner. This is a hole in the system, and it takes a big external override to fill it. Jaffe’s system does it directly, assigning large point values to perfect games and no-hitters.

With regrets, I have to leave such corrections out of the WPS system. I want all the factors I measure to spring from the Win Percentage Added data. If I start grafting on outside numbers, there is no clear stopping point. I might as well create a wholly top-down system, and Chris Jaffe has already done that quite well. I will accept the holes in the system.

To be continued

This doesn’t mean I won’t try to make some improvements that flow naturally from the data sets I’m using. In tomorrow’s concluding installment, I’ll lay out two modifications and explain what use they have. I will also spread a couple of compliments out to my immediate forebears in crafting game-excitement measurements.

Best of all, I will take WPS out for a spin, showing you how it rates some famous games. This test drive will demonstrate the workings of WPS and illuminate a few noteworthy facts about the games it is rating. After that … who knows?

So join us here tomorrow (when that link will actually work).

References & Resources
FanGraphs and Baseball-Reference were indispensable for this article.

3 Comments

studes

11 years ago

Shane, I’ve actually used this approach many times in the past, including several Hardball Times Annual articles.

Shane Tourtellotte

Oh.

Well, this will teach me to arrive late to the party. Or at least it will teach me to buy up the backlist of Annuals.

No problem. Lots of these slackers probably haven’t read them either.
