More on WAR

by Jeremy Greenhouse
February 1, 2010

Earlier this year, it seemed like every other week FanGraphs was improving its stats section. So I’m anxious to see what David Appelman has in store when the new season comes, as I’m sure his projects have been building up. But first, I had some thoughts on the already-existing WAR.

My main philosophical problem with FanGraphs’ WAR (fWAR) is that relievers are given extra value for having pitched in high-leverage situations. Personally, I don’t understand why we use a pitcher’s actual leverage index and chain from there. Why not just start and end with the deserved leverage index?

My new and unrelated idea is to replace linear weights as the primary metric used to calculate WAR with WPA/LI. I proposed this, and Colin Wyers disagreed. Wyers either doesn’t follow my thought process, or I’m wrong. So I felt I should explain my rationale.

For pitchers, FanGraphs has decided not to include defensive data. I propose that pitchers should be evaluated based on WPA/LI with defensive adjustments based on UZR. Rally’s WAR (rWAR) essentially follows this method by tallying a pitcher’s runs allowed and adjusting for defense with Total Zone. I’m uneasy using rWAR for pitchers, for the simple fact that I can’t get over an advanced metric using RA or ERA. But WPA/LI as the basis for rWAR would serve the same purpose and would be much more accurate in measuring a pitcher’s actual contributions. Also, for relievers, WPA/LI adjusts for the game state in which the reliever enters the game, which I think is a huge plus.

As for hitters, for which fWAR and rWAR both use basic linear weights, WPA/LI is just better in my opinion, in that its weights are “perfect.” WPA/LI has dynamic linear weights, so to speak.
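To make the “dynamic linear weights” idea concrete, here is a minimal Python sketch of how a WPA/LI total is built up: each play’s win probability change is divided by the leverage index of the situation in which it happened, and the results are summed. The `Play` class, the `wpa_li` function, and all of the numbers are made up for illustration; they are not FanGraphs’ code or data.

```python
from dataclasses import dataclass


@dataclass
class Play:
    wpa: float  # win probability added on the play (hypothetical value)
    li: float   # leverage index of the situation before the play


def wpa_li(plays):
    """Sum each play's WPA after dividing by that play's leverage index."""
    return sum(p.wpa / p.li for p in plays if p.li > 0)


# Invented sample: the same kind of hit in two very different game states,
# plus one out in an average-leverage spot.
plays = [
    Play(wpa=0.10, li=2.0),   # single in a tight late-inning situation
    Play(wpa=0.02, li=0.4),   # single in a blowout
    Play(wpa=-0.03, li=1.0),  # out at average leverage
]

total_wpa = sum(p.wpa for p in plays)  # context-dependent total
context_neutral = wpa_li(plays)        # context-neutral total

print(f"WPA    {total_wpa:+.3f}")
print(f"WPA/LI {context_neutral:+.3f}")
```

In this toy sample the single is worth 0.05 context-neutral wins whether it comes in the tight spot or the blowout, while raw WPA credits it with 0.10 in one case and 0.02 in the other.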

I feel WPA/LI should be the starting point, though far from the end point. WPA/LI has its problems, since the current version doesn’t distinguish between hitter, baserunner, pitcher, and defense. The hitter and pitcher are assigned equal responsibility for all events. The data is available to calculate WPA/LI for baserunners and defenders, but it would be exceedingly difficult to do so. Nevertheless, WPA/LI is just a step up from basic linear weights in that it accounts for the game state. It’s going to take a while for someone to develop a WPA/LI-based WAR, but I have a feeling that WPA/LI is the future.

12 Comments

Sean Smith

14 years ago

It’s not all luck. Some pitchers will consistently give up fewer hits on balls in play than others. In addition, some pitchers are better at getting the double play, picking off runners, preventing steals. Pitching to the situation has been shown to be a repeatable skill.

Some pitchers will give up fewer runs than others with the same combination of hits, walks, homers, etc. There is a lot of luck in one year of that, but over a career I have shown that it is not all random. It was my first article on THT, titled “Randy Johnson is not dead yet”.

You might get a better measure if you start with FIP, then consider a pitcher’s skill in preventing hits, aside from ballpark and defensive support, then see if his GB rate changes with runners on first and two out. Consider the pitcher’s own fielding. Use retrosheet to measure his skill in holding runners. Or you could just use runs allowed and be done with it.

Blue

14 years ago

I was under the impression that the number of hits on balls in play was essentially a function of the number of groundballs a pitcher coaxes (or rather of the number of fly balls: ground balls = higher BABIP and lower slugging, fly balls the other way around). The extra groundballs therefore lead to the extra double plays.

Picking off the runners and preventing steals (as well as actual fielding dexterity, which is tough to measure) are things that I think should be more included in pitcher’s conglomerate statistics, much like baserunning should be included for hitters.

I also did forget that point, which is something that seems to be forgotten often in the anti-clutch frenzy, that while clutch hitting is not repeatable, clutch pitching is. I’d personally like to see more written on the subject, I find it interesting.

See, that’s essentially where I get tripped up on advanced statistics, with regard to your last point. They claim (well they don’t claim, but imply) to give a whole picture, but as you pointed out, there are a lot of subjective details in the application of statistics. You can’t just blanket with information, you have to know how to use it too. Which is why I love seeing Pitch f/x analysis and sweeping statistical surveys on sites like THT and Fangraphs, they reflect things that either I have trouble with comprehending myself, or that I just don’t have the time to run, but wonder about.

Jeremy Greenhouse

14 years ago

Isn’t WPA/LI just RA but better? Rally, if you could flip a switch and have your WAR programmed to use WPA/LI, instead of RA, would you choose to do so?

Sean Smith

14 years ago

“I was under the impression that the amount of hits on balls in play was essentially a function of the amount of groundballs a pitcher coaxes (or rather of the amount of fly balls, ground balls=higher babip lower slugging, fly balls the other way around).”

This is true in general, but not complete. Even beyond his bbtype mix, you can’t explain Mariano Rivera’s low babip season after season after season, unless you give allowance that he has some kind of skill there. Yes, even with Derek Jeter as his shortstop.

Blue

14 years ago

I was under the impression that Mo is a crazy freak of nature when it comes to pitching and comparing other pitchers to him is typically a bad plan (although I did it). Also I think there’s something in Baseball Between the Numbers about knuckleballers demonstrating a similar tendency. Maybe it has to do with a reliance on a single pitch that is so wildly different than typical pitches.

As for the WPA/LI vs. RA thing, yeah, I guess it is a better version of RA, but at the same time, FanGraphs’ WAR doesn’t run off of RA; it runs off of FIP.

Colin Wyers

14 years ago

“This is true in general, but not complete. Even beyond his bbtype mix, you can’t explain Mariano Rivera’s low babip season after season after season, unless you give allowance that he has some kind of skill there. Yes, even with Derek Jeter as his shortstop.”

Can’t we?

Here’s my hunch – and I would love to check this out at some time. It’s not just Rivera, it’s almost all elite closers with an apparent BABIP prevention ability.

So what do we know about elite closers as a group?

* They rarely pitch more than one time through a lineup.
* They almost always start an inning cleanly – that is, few inherited runners.
* They typically have high strikeout and low walk rates.

The first point is an interesting one to consider – does BABIP “skill” change as a pitcher goes through the lineup?

The second and third, though, go to something more interesting. What does the number of baserunners have to do with BABIP? Well, it affects the positioning of the defenders – with runners on base, they have to position themselves suboptimally in order to hold runners.

Is an elite closer’s ability to keep batted balls from becoming hits simply an ability to keep runners off the bases, thereby letting the defense position themselves optimally?

Sean Smith

14 years ago

Good things to think about. It isn’t just Rivera, but most (though not all) closers who last a long time – weed out the Kevin Greggs of the world who close here and there when a team can’t find anyone better.

Some other factors to consider – Protecting a 9th inning close game you should be more likely to have a better defense behind you. This is when defensive replacements are used. But then, with 12 man pitching staffs rosters are so thin that you probably don’t have enough D reps to make a big difference.

With the game on the line, are players in general better fielders? More likely to dive for a ball to save a game instead of playing it on a bounce? Especially with veteran players, many may feel that playing all-out, all the time, is the way to find yourself on the DL. And maybe you don’t give that extra effort in the 6th inning of a 5 run game that you do with the game on the line.

One thing to check is find a group of setup pitchers who have the same low baserunners as the elite closers, isolate the games where they come in to start an inning and only pitch one inning, and see if they have a lower babip just like the closers do.

I don’t want to go into the history of DIPS research. Search the archives of Tango’s site if you want. There’s a ton there, and I thought those who’ve studied the issue in depth understand that there is some pitcher skill there, not all of BABIP is defense, BBtype, or luck.

Sean Smith

14 years ago

Here’s a good one.
http://www.insidethebook.com/ee/index.php/site/comments/r50_at_bip_1500_for_babip/

Too much to summarize, takes a long time to grok. But from looking at this and many studies like it I’m convinced that we’ve got a skill here. One that is hard to detect in less than multiyear samples, but a skill nonetheless.

Maybe some combination of park, defense, pitcher role, BBtype, and pitchf/x can explain it all someday, but we’re not there yet.

Blue

14 years ago

While I understand your concerns over using ERA in general, I see WAR as less of a predictive stat and more of a reflective one. While FIP, xFIP, tRA, etc. are more efficient at predicting future performance from a pitcher, ERA I think is a better indicator of how well a pitcher’s season resulted, whether that be through luck or skill. We don’t punish batters for inflated wOBAs due to BABIP concerns, nor reward them for a poor, unlucky year, so why should we do so for pitchers?

Jordan

14 years ago

I do see your point, but if you’re really looking for a stat that measures how well a pitcher’s season resulted without adjusting for luck (including defense), why not choose RA over ERA? Using ERA seems like a strange midway point to settle on; it’s neither fully reflective nor fully predictive. I guess it’s a fine stat so long as those who use it understand that, but I’d prefer to use stats that are either purely reflective or purely predictive.

I’m still not sure I’d prefer a WPA/LI based WAR, though. Isn’t LI a non-repeatable skill? I guess I’d buy a WPA/LI based WAR as a reflective measure of value, but I’d bet it correlates worse year-to-year than the current (fangraphs) model. So maybe a WPA/LI based WAR is the future for reflective stats?

That’s an interesting point, though, that FanGraphs adjusts pitchers’, but not hitters’, WAR for luck. Do hitter WARs correlate worse year-to-year than pitcher WARs?

Blue

14 years ago

I am reluctant to use RA as a stat, although really it should be the other way around, as ERA is plagued with the inconsistency of errors. Regardless, neither is an appropriate measure of reliever quality.

LI is a repeatable skill, although it does not rely on the pitcher; it relies on the way the pitcher is used. A closer who only pitches in late and close situations will therefore always have a higher LI than a long reliever who only enters the game with a 6+ run differential (e.g., Josh Fogg vs. Mariano Rivera).

I guess by my logic, WPA is a better reflective stat, in that a walkoff grand slam is more valuable to a team at any given time than a solo homer in a blowout, but going forward neither is more valuable as a prediction. If this makes sense: a hitter who is projected as a 30 HR hitter will likely have a normal spread on his 30 HR, but perhaps he was (as an adjective and not a skill) super-clutch for his season and hit 16 of his home runs in very high-leverage situations. He may well have won 16 games for his team, and this will be reflected in WPA but not WAR.

However, I am wary of very advanced metrics. I by no means consider advanced statistics below me, but I do feel that a proper utilization of simpler statistics can lead to a more universal and easier understanding of a player’s talents. I’m not talking about RBIs and Wins and Errors; I’m talking about OBA/SLG, K/BB, and UZR. WPA is an extremely foreign concept to the casual baseball fan, while the key Sabermetric component stats are often easy to explain and to use.

I’ve digressed a bit. Anyhow, it is interesting that pitcher WAR is normalized for luck, while hitter WAR is not. I think the main reason for this is the lack of a fielding-independent hitter statistic. Perhaps some sort of wOBA/(lgAvgBABIP/BABIP) stat would be in order.

Alex Krolewski

14 years ago

Blue said: “WPA is an extremely foreign concept to the casual baseball fan, while the key Sabermetric component stats are often easy to explain and to use.”

In fact, I think that it’s the other way around. The casual fan should easily be able to understand WPA—based on past games, we know that an average team in this specific situation has a particular chance of winning. WAR, meanwhile, is far more complex and seemingly “arbitrary.”

As for rating pitchers, I would strongly favor using a PZR-like defensive adjustment rather than Sean Smith’s defensive adjustment. Sean adjusts each pitcher’s stats for the overall quality of the team’s defense; therefore, a pitcher who receives bad defensive support, but from a good defensive team, is doubly penalized by Sean. PZR, on the other hand, subtracts the UZR compiled behind that one pitcher only; as a result, it avoids double penalties like the example above.
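A rough sketch of the difference, in Python. It assumes defensive runs are expressed as runs saved (positive means the fielders helped), that the team-level method prorates the team total by the pitcher’s share of the team’s balls in play, and that the numbers are invented; none of this is Sean’s actual code or the real PZR calculation.

```python
def team_level_adjusted_runs(runs_allowed, team_def_runs_saved, bip_share):
    """Assumed team-level method: credit the pitcher with a prorated share
    of the whole team's defensive runs saved."""
    return runs_allowed + team_def_runs_saved * bip_share


def pitcher_level_adjusted_runs(runs_allowed, def_runs_saved_behind_pitcher):
    """Assumed PZR-like method: use only the defensive runs saved on balls
    in play while this pitcher was on the mound."""
    return runs_allowed + def_runs_saved_behind_pitcher


# Hypothetical pitcher on a good defensive team (+20 runs saved overall)
# who personally received poor support (-3 runs saved behind him).
runs_allowed = 80
team_view = team_level_adjusted_runs(runs_allowed, team_def_runs_saved=20, bip_share=0.10)
pitcher_view = pitcher_level_adjusted_runs(runs_allowed, def_runs_saved_behind_pitcher=-3)

print(f"team-level adjustment:    {team_view:.1f} runs")
print(f"pitcher-level adjustment: {pitcher_view:.1f} runs")
```

With these made-up inputs the team-level version pushes the pitcher’s defense-neutral runs up to 82 even though his fielders already cost him runs, while the pitcher-level version pulls them down to 77.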

BABIPs each time through the lineup (from BR, 2009 AL):

1st 0.296
2nd 0.303
3rd 0.308
4th 0.320

There should also be a selective sampling effect here, as presumably pitchers with better BABIP skill stay in the game longer.
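For anyone who wants to reproduce a split like this, here is a minimal sketch using the standard BABIP formula, (H - HR) / (AB - K - HR + SF). The counts below are hypothetical placeholders, not the Baseball-Reference 2009 AL totals.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical counts keyed by time through the order.
splits = {
    "1st": dict(h=30, hr=4, ab=110, k=20, sf=2),
    "2nd": dict(h=33, hr=5, ab=112, k=18, sf=1),
}

babip_by_pass = {label: babip(**counts) for label, counts in splits.items()}

for label, value in babip_by_pass.items():
    print(f"{label} {value:.3f}")
```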

And last, to get back to the main point of the article, I like the idea of using WPA/LI instead of Linear Weights in WAR. Situational (“clutch”) hitting should be included in WAR, but WPA/LI also puts all batters on the same scale by adjusting for the situations they find themselves in.
