We took a little bit of a break over the winter, but THT’s Ten Things I Didn’t Know Last Week column is ready to go another season. Many thanks to the two people who encouraged me to keep going a fourth year.
Some of these items will be old news to you, but I’ve got to start somewhere. It was a busy offseason.
I think I know what sabermetrics is
My good buddy John Brattain wrote a column a couple of weeks ago taking sabermetricians to task for sometimes going too far with their findings:
The thing is, there is a disconnect between what has been studied and what I have seen. For example, we read about the limited utility of the stolen base and its effect on run scoring. However, to take that as absolute gospel is to say in effect that Rickey Henderson and Tim Raines would have been superior players (or at the very least, no less valuable) had they consistently stayed on first base whenever they got on.
Now, someone rightly pointed out to John that stolen bases aren’t the problem—it’s the outs incurred while trying to steal bases that concern most sabermetricians.
Having said that, I’ve got to admit something. I’m not exactly sure what a sabermetrician is. In the 1982 Baseball Abstract, Bill James called sabermetrics “the mathematical and statistical analysis of baseball records.” He should know, right? After all, he coined the term. More recently, in a Bill James Online article, he had this to say about sabermetrics:
First, the concept of a “traditional sabermetric model” is gibberish. Sabermetrics is not based on tradition. Sabermetrics is founded on the concept of questioning traditional beliefs, not on a moral imperative to embrace them.
Second, the main thrust of sabermetrics has been to pursue the truth.
Even though they come from different places, I think John and Bill share some of the same concerns. It seems that some folks use “traditional” sabermetrics to make their point about stolen bases or sacrifice bunts. But there is no such thing as traditional sabermetrics. There is only the pursuit of truth.
So who are the truth-seekers out there? Well, I know Tangotiger and MGL are. Greg Rybarczyk is. Most of us at THT are—even the ones who don’t dabble in math. We’re mostly interested in the truth of baseball; it’s what we like to write about.
The thing is, truth comes in many forms. The truth isn’t always about answers, it’s also about the framework, the context. For instance, I think one of the most important contexts to come out of sabermetrics is the power of the win. Most things come down to wins and losses. Even runs scored and allowed only have one purpose: to win and lose. Who should be the MVP? Whoever did the most to help his team win. Who should be in the Hall of Fame? There are a number of factors, but the most important ought to be who did the most to help his team win.
Which team was the best team in a specific year? The team that won the last game of the postseason.
This is why I’m a fan of Win Probability Added (Click on the link for more info about WPA). WPA directly measures how much impact each play had on an average team’s chance of winning, at the time the play occurred. In other words, it measures the impact of an event in “real time.” WPA does have its flaws—most notably that “real time” thing—and it doesn’t necessarily provide all the answers. But it’s a fantastic framework, the first one I would cite when discussing sabermetric truths with anyone, including John Brattain.
I was going to write an entire article about sabermetrics and framework, but I decided to include my ongoing thoughts in the Ten Things column during the season. That way, you folks will have a lot of time to correct me.
When to bunt
So I was messing around with WPA and looked up something that John might appreciate: Does it ever make sense to sacrifice bunt, according to WPA? Well, yes, one time. In the bottom of the ninth with a tied score, runner on second and no outs. Bunting him over to third increases the home team’s chance of winning from 83 percent to 84 percent. In every other instance I examined, the bunt didn’t increase the team’s chances of winning.
This is a gross measure, however. It doesn’t take into account the specific skills of the batter, or the batter on deck or the baserunner or the pitcher or the third baseman. But you can make some assumptions about those things and change the framework. In fact, this is what MGL did in The Book, and what James Click did at Baseball Prospectus a while ago.
Here’s one of the rules from The Book:
Late in a close game, in a low run-scoring environment, it is correct to often sacrifice bunt with a runner on first and no outs. In an average run-scoring environment, you should sometimes sacrifice to keep the defense honest.
I think that is the best “answer” from the best of sabermetrics. The Book has a lot of other rules about sacrifices, but my point is this: they came up with the rules for bunting by using the Win Probability framework as a beginning point.
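For the curious, here is a rough Python sketch of how you might start relaxing that "gross measure" caveat. It uses the 83 and 84 percent figures from the ninth-inning example above. The win probability after a failed bunt (`WP_FAILED_BUNT`) is a number I made up purely for illustration, and the function names are mine, not anything from The Book.

```python
def bunt_win_prob(p_success, wp_success, wp_failure):
    """Expected win probability of a bunt attempt that works p_success of the time."""
    return p_success * wp_success + (1 - p_success) * wp_failure


def break_even_success_rate(wp_before, wp_success, wp_failure):
    """Success rate at which attempting the bunt matches the starting win probability."""
    return (wp_before - wp_failure) / (wp_success - wp_failure)


WP_BEFORE = 0.83       # from the column: tie game, bottom 9, runner on second, no outs
WP_AFTER_BUNT = 0.84   # from the column: runner on third, one out
WP_FAILED_BUNT = 0.76  # hypothetical, NOT a real figure: the bunt fails and costs an out

needed = break_even_success_rate(WP_BEFORE, WP_AFTER_BUNT, WP_FAILED_BUNT)
print(f"Bunt must succeed about {needed:.1%} of the time to break even")
```

With a one-point gain on success, even a modest cost for failure demands a very reliable bunter, which is one way to see why the specific batter matters so much.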
What WPA/LI is
This is very technical, but it’s something I’ve devoted a lot of brain cells to lately. Tangotiger, one of the leading advocates of WPA, invented a statistic called Leverage Index (LI). You can read about Leverage Index here, but the concept is simple. LI measures the criticality of a situation. 1.0 is average. The bunt situation I just mentioned has an LI of 2.4; with the runner on third and one out (in other words, after the bunt), the LI is 4.3, which is very, very critical. You can do all sorts of fun things with LI related to game strategy.
You can also use it to modify WPA and address some of the issues people have with WPA. If you take the WPA of each play and divide it by the LI of the situation, you basically have a “normalized” WPA.
Then, if you take each player’s WPA/LI and add it up for the year, you’ve got a WPA rating in which players are rated on an even scale. That is, they don’t get extra credit for having more opportunities to impact a game. It’s a lot of work, but the good news is that Fangraphs is already doing the work for you. Look up any player at Fangraphs and you can find his WPA/LI for each of the last six years.
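If you wanted to do the bookkeeping yourself, a minimal sketch might look like the following. The play records are invented for illustration, the `(player, wpa, li)` layout is just my assumption about how you would store them, and skipping plays with zero leverage is a guess at a sensible convention rather than anything Fangraphs has documented.

```python
from collections import defaultdict


def situational_wins(plays):
    """Sum WPA/LI by player. Each play is a (player, wpa, li) tuple."""
    totals = defaultdict(float)
    for player, wpa, li in plays:
        if li <= 0:
            continue  # assumption: ignore plays with no leverage to avoid dividing by zero
        totals[player] += wpa / li
    return dict(totals)


# Made-up plays, for illustration only
plays = [
    ("Batter A", 0.10, 2.0),   # big swing in WPA, but in a high-leverage spot
    ("Batter A", -0.03, 0.5),  # small loss in a low-leverage spot
    ("Batter B", 0.10, 1.0),   # same WPA as A's first play, in an average spot
]

totals = situational_wins(plays)
for player, value in totals.items():
    print(player, round(value, 3))
```

Notice that Batter A and Batter B each had a +.10 WPA play, but B gets twice the WPA/LI credit for his because he did it without the help of a high-leverage situation.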
So WPA/LI is a very attractive way to rate players. Still, I stumbled over the question: what is it, exactly? What does WPA/LI measure? I mean, I understand the calculation, but when a situation has a higher WPA/LI than another, what does that mean?
You know, if you make a pain of yourself and keep asking stupid questions, eventually someone will answer. That’s what Tango did for me in this thread. The answer is that WPA/LI measures how well players performed in specific situations. Tango has started calling it “Situational Wins.”
Here’s an example. Which is worth more: a home run with runners on second and third and no outs, or a home run with no one on and two outs? Tie score, first inning. Well, they’re both home runs, so they’re both equal from that perspective. But the first one came with two on and resulted in three runs, while the second one only yielded one run.
On the other hand, runners on second and third with no outs are pretty likely to score anyway, so a batter probably shouldn’t get too much credit for batting them in. In fact, according to WPA, the first home run is worth .107 and the second is worth just a little less, at .087.
But WPA/LI rates the bases-empty home run much more highly (.202 vs. .096). Why? Because a home run with the bases empty and two out is the best outcome, by far. A home run with runners on second and third is pretty cool too, but a single or double would have been just about as powerful. Think about it from the pitcher’s point of view: he’d rather walk a guy than give up a home run with two outs and the bases empty. The home run is the last thing he would want to give up. With runners on second and third, however, he wants a strikeout and will pitch differently. He recognizes that he doesn’t want a batted ball at all.
So the pitcher receives a bigger “ding” for giving up a home run with no one on and two outs, and the batter receives a bigger “ka-ching.” The context drives the value in a different way than it does in WPA.
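You can back out the Leverage Index implied by the home run figures above, assuming each WPA/LI number is simply that play's WPA divided by the LI of the situation. This is only arithmetic on the numbers quoted in this column; the function name is mine.

```python
def implied_li(wpa, wpa_per_li):
    """Leverage Index implied by a play's WPA and its WPA/LI."""
    return wpa / wpa_per_li


# Figures quoted above (tie score, first inning)
runners_on = implied_li(0.107, 0.096)    # home run with runners on second and third, no outs
bases_empty = implied_li(0.087, 0.202)   # home run with no one on, two outs

print(round(runners_on, 2))   # about 1.11
print(round(bases_empty, 2))  # about 0.43
```

So the bases-empty, two-out spot carries less than half the average leverage, and dividing by that small number is what pushes its WPA/LI so high.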
There’s more to learn about WPA/LI, and not everyone buys into it. But it’s got me thinking.
Survey says: more about stats
We ran a survey this week so that you could tell us what you like and don’t like about the Hardball Times. So far, over 900 people have filled it out. Thank you for your fantastic feedback; we’re already working on some of your suggestions.
One of the themes that came out of the survey is that people would like to know more about our statistics. So I plan to highlight a THT statistic or two as we plug along this year. Remember that we do have a statistics glossary that contains definitions and links to articles with more information.
But I’d like to talk about one class of statistics that I’ve been thinking about lately: ERA estimators. ERA estimators have been in use for a long time now. We’ve got Expected ERA, component ERA, DIPS ERA, peripheral ERA, QERA and lots of other alphabet names I can’t remember. David Gassko has written about HIPS, LIPS and DIPS and Derek Carty uses LIPS in his fantasy work. Lookout Landing recently invented something called tERA.
I guess people will never stop coming up with new ways to estimate ERA. But the first question should be: why? Why bother to estimate ERA? After all, we have actual ERA, right?
There are a few basic reasons I can think of to estimate ERA:
– To estimate a pitcher’s “true talent” as a way to predict his future performance.
– To isolate the impact of the pitcher from his fielders (related to the first point).
– To pull out the impact that relievers might have had on inherited runners.
– For the heck of it.
At THT, we track two simple stats: FIP and xFIP. They’re both explained in our glossary, but the idea is simple. FIP (which stands for Fielding-Independent Pitching) uses strikeouts, walks and home runs to meet the first two goals listed above, and it does pretty well. A year and a half ago, Tango tested FIP to see if it worked as well as advertised, and it does.
FIP totally ignores batted balls, other than home runs. We know home run rates vary from year to year, are heavily dependent on a pitcher’s flyball rate, and can be affected by the ballpark. So I created something called xFIP, which takes the home run portion of FIP and adjusts it to a league-normal rate based on a pitcher’s flyball rate (but calibrated to the pitcher’s ballpark). In theory, xFIP should be a better indicator of future pitching performance than FIP.
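Here is a rough sketch of both stats in Python. It uses the commonly published form of FIP (13 for home runs, 3 for walks, minus 2 for strikeouts, per inning, plus a constant). The constant of 3.2 and the 11 percent league home-run-per-fly-ball rate are assumptions for illustration; the real values vary by league and year, and this sketch leaves out hit batters, intentional walks and the ballpark calibration entirely.

```python
def fip(hr, bb, k, ip, constant=3.2):
    """Fielding-Independent Pitching, simplified (no HBP or IBB adjustment).

    constant is an assumed league factor that puts FIP on the ERA scale.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(fly_balls, bb, k, ip, league_hr_per_fb=0.11, constant=3.2):
    """FIP with actual home runs swapped for an expected total.

    league_hr_per_fb is an assumed league rate; no ballpark adjustment is applied.
    """
    expected_hr = fly_balls * league_hr_per_fb
    return fip(expected_hr, bb, k, ip, constant)


# Made-up pitcher line: 200 IP, 180 K, 50 BB, 20 HR allowed on 200 fly balls
print(round(fip(hr=20, bb=50, k=180, ip=200), 2))
print(round(xfip(fly_balls=200, bb=50, k=180, ip=200), 2))
```

This imaginary pitcher gave up slightly fewer home runs than his fly balls would suggest, so his xFIP comes out a bit higher than his FIP.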
There have been lots of studies about pitching estimators; allow me to add another to the mix. I pulled together 140 matched pairs of seasons for pitchers in the last four years, with both years consisting of at least 100 innings pitched and for the same team (so, no impact from changing fielders or ballparks), and no season included more than once in the sample. I found that…
– ERA and FIP were about equal in how well they predicted the second year’s ERA. ERA was actually a bit better.
– xFIP was slightly better than either one.
– If you want to include more batted ball information in FIP, remember that FIP inherently includes GB/FB ratio information, by virtue of the home run count. But you can add infield pop flies to the strikeouts and line drives to the walks to get a “batted ball” FIP (Hat tip, Tango, natch), and a “batted ball” xFIP was even more highly correlated with the following year’s performance.
– None of these estimators was terrific at predicting the second year’s ERA.
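If you want to try a matched-pairs comparison like this yourself, here is a bare-bones sketch. I am assuming a plain Pearson correlation between each first-year stat and the second year's ERA as the yardstick, which may not be exactly how you would want to score it. The numbers and the estimator names "A" and "B" are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_estimators(year1_stats, year2_era):
    """Rank year-1 estimators by correlation with year-2 ERA, best first.

    year1_stats maps an estimator name to a list of values, one per pitcher,
    in the same pitcher order as year2_era.
    """
    scores = {name: pearson_r(values, year2_era) for name, values in year1_stats.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up matched pairs for four pitchers
year2_era = [3.0, 4.0, 5.0, 4.5]
year1_stats = {
    "A": [3.1, 4.1, 5.1, 4.6],
    "B": [5.0, 3.0, 4.0, 4.5],
}
ranking = rank_estimators(year1_stats, year2_era)
print(ranking)
```

Keeping both seasons with the same team, as in the sample above, is what lets you treat any difference between estimators as something other than a change in fielders or ballparks.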
You know, a couple of stats may be better predictors than FIP, but they’re much more complex. I don’t generally like complex. I’m a simple guy.
But the better question to ask is how well these pitching estimators did predicting ERA for those pitchers who switched teams. Come back next week.
Days of Rest Graph
I don’t know if you saw it, but this was my favorite sabermetric post of the offseason.
That’s a graph of the number of calendar days a major league pitcher rested between starts, using nine-year average rates. “0 Days” means he started on consecutive days, “1 Day” means he had one day of rest between starts, etc. As a graph nut, I love this graph because it shows some very complex information in a very simple way. It’s a line graph, easy to understand. It doesn’t try to collate the information into averages; it lets you pick out the complexity.
And the information is complex. The guy who created the data was investigating the question of when the transition went from three days of rest to four. As Steve Treder has pointed out several times, the transition hasn’t been that straightforward. Pitchers have been used in a variety of ways throughout the years. Here’s a summary quote from the original author:
Starting on 2 days of rest peaked in 1892 and rapidly declined when the mound moved back to 60’6″, with a slight jump in 1901. Starting on 3 days of rest peaked in 1897, 1920, and had its largest peak in 1973, followed by a rapid decline. Starts on 4 days rest have risen almost steadily, peaking in 1995, and have dropped in the last decade. The frequency of starts with 5 days rest began rising rapidly in the 1970s, and jumped to a new level in 1998. Starts with 6 days rest peaked in 1942, when Ted Lyons was ending several successful years as a Sunday starter.
I think the real move toward today’s “five-man rotation” began in the middle of the 1970s. From 1975 to 1976, starts with three days of rest declined from 33% to 24%, and continued to decline afterwards, while starts with four days of rest rose from 38% to 44% and really took off beginning in 1980. Starts with five days of rest made their biggest leap from 1987 to 1988 (21% to 24%).
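For anyone who wants to build percentages like these from a pitcher's game log, here is a small sketch of the counting, using the definition above ("0 Days" means starts on consecutive days). The dates are invented, and this is my own guess at the tallying, not the original author's code.

```python
from collections import Counter
from datetime import date


def rest_day_shares(start_dates):
    """Share of starts made on each number of rest days for one pitcher."""
    ordered = sorted(start_dates)
    # Days of rest = calendar days strictly between two consecutive starts
    rests = [(later - earlier).days - 1 for earlier, later in zip(ordered, ordered[1:])]
    counts = Counter(rests)
    total = len(rests)
    return {days: count / total for days, count in sorted(counts.items())}


# Made-up starts for one pitcher
starts = [date(1975, 4, 1), date(1975, 4, 6), date(1975, 4, 11), date(1975, 4, 15)]
shares = rest_day_shares(starts)
print(shares)
```

To get league-wide rates for a season, you would pool the rest-day counts for every pitcher before dividing.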
SABR’s minor league database
Another big event of the offseason that didn’t get much notice was SABR’s new minor league database. They don’t have all the data I’d like to see (For instance, Dwight Gooden and Lenny Dykstra are both listed with Lynchburg in 1983, but their 300 strikeouts and 105 stolen bases aren’t. Respectively.). But it is still very neat stuff. Random example: Tony Oliva.
The flukiest fielding year
Hopefully, you saw Sean Smith’s Total Zone article earlier this week. Some of the best sabermetrics these days concerns fielding: how to measure it and how to evaluate players with it.
Sean’s work gives a fielding rating (runs above or below average) to just about every fielder who has played since 1956. The spreadsheet is great fun to pore over. One of the things I decided to do was see which players had the greatest fluke fielding years. To determine that, I looked at each player’s performance per 100 plays at each position over his career, and then compared that to each individual year that player played that position (for a minimum of 400 chances). I only considered players whose careers are over, or close to.
The greatest fluke year I found was Kirby Puckett’s rookie year, 1984. He was 4.6 runs above average when he first came up to the majors, but never came close to that level again. His second-best year was his sophomore year, 1.2 runs above average, and the stats say he was about average or slightly below most years after that.
That’s certainly not the image most of us have of Kirby’s fielding prowess, but it brings to mind the work that has been done on fielding and aging, which suggests that most players’ best fielding years are in their youth.
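The fluke-year comparison described above can be sketched in a few lines of Python. The record layout and the sample numbers are hypothetical, and I am assuming the career rate includes the season being tested, which is only one way to do it.

```python
from collections import defaultdict


def fluke_seasons(seasons, min_chances=400):
    """Rank seasons by how far the rate per 100 chances sits above the career rate.

    seasons is a list of dicts with keys: player, position, year, runs, chances.
    """
    career = defaultdict(lambda: [0.0, 0])  # (player, position) -> [runs, chances]
    for s in seasons:
        key = (s["player"], s["position"])
        career[key][0] += s["runs"]
        career[key][1] += s["chances"]

    results = []
    for s in seasons:
        if s["chances"] < min_chances:
            continue  # small samples still count toward the career rate, but are not ranked
        runs, chances = career[(s["player"], s["position"])]
        career_rate = 100 * runs / chances
        season_rate = 100 * s["runs"] / s["chances"]
        results.append((season_rate - career_rate, s["player"], s["position"], s["year"]))
    return sorted(results, reverse=True)


# Made-up fielder, for illustration only
seasons = [
    {"player": "X", "position": "CF", "year": 2001, "runs": 20, "chances": 400},
    {"player": "X", "position": "CF", "year": 2002, "runs": 0, "chances": 500},
    {"player": "X", "position": "CF", "year": 2003, "runs": -5, "chances": 500},
    {"player": "X", "position": "CF", "year": 2004, "runs": 10, "chances": 100},
]
ranked = fluke_seasons(seasons)
print(ranked[0])
```

The first entry is the season that stands furthest above the player's own career norm at that position.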
Over the winter, Sean also did some work on how many runs a great fielding first baseman can save a team by scooping out bad throws, a study that MGL was able to duplicate.
The bottom line is that the difference between the best and worst first basemen in a given season is probably about one win, based solely on how well each fields throws from his fellow infielders.
I also hope you caught Dan Turkenkopf’s study of how well individual catchers blocked balls the past few years. This is another fantastic study, taking advantage of the latest Pitch f/x data and computing skills.
The bottom line: it’s a good thing IRod has a great arm.
There’s a job opening in Cooperstown
Dale Petroskey was asked to step down from his post as Executive Director at the Hall of Fame. I know nothing about this situation, but the press release says the Board felt he “failed to exercise proper fiduciary responsibility.” Doesn’t sound good, but we shouldn’t be too surprised. Eric Enders reported about a Petroskey incident four years ago.
Why, yes, I am available for a job. Why, yes, I would be willing to relocate to Cooperstown.
Spring Training foresight
Are spring training stats meaningful? John Dewan thinks that there is an angle that works. He likes to look at each player’s slugging percentage each spring and compare it to his career percentage. John projects that three-fourths of those who exceeded the career mark by 100 or more points are likely to improve this year. Check out the list.
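The screen itself is easy to reproduce if you have the numbers handy. Here is a minimal sketch, with invented players and slugging percentages, that flags anyone whose spring slugging beats his career mark by 100 points or more. Converting to whole "points" before comparing is my own choice, made to sidestep floating-point rounding.

```python
def spring_breakout_candidates(players, min_points=100):
    """Return names whose spring SLG exceeds career SLG by at least min_points.

    players is a list of (name, spring_slg, career_slg) tuples.
    One point is .001 of slugging percentage.
    """
    flagged = []
    for name, spring_slg, career_slg in players:
        points_over = round((spring_slg - career_slg) * 1000)
        if points_over >= min_points:
            flagged.append(name)
    return flagged


# Made-up players, for illustration only
players = [
    ("Player A", 0.500, 0.400),  # exactly 100 points over
    ("Player B", 0.450, 0.400),  # only 50 points over
    ("Player C", 0.620, 0.480),  # 140 points over
]
candidates = spring_breakout_candidates(players)
print(candidates)
```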
The moon versus the diamond
You didn’t ask for this, but here is a map of how far men walked on their first trip to the moon, superimposed on a baseball diamond:
Looks like a typical David Wells Spring Training workout to me. The map is courtesy of NASA.