THT Mailbag: How to be a Sabermetrician
by Bryan Tsao
January 10, 2007
In last week's mailbag, we began officially soliciting op-ed pieces for publication in this space. I'm happy to say that we're running our very first one, reader Mark Saleman's commentary on the rising cost of free agents, at the end of this mailbag; many thanks for all the submissions this past week.
Also, I'd like to use this space to give another update on our editor search. Applications for the copy editor position are almost closed, and I promise we will e-mail everyone within the next couple of days. However, we have yet to seriously begin reviewing the THT Daily/Links editor applications, so that will probably take another week.
Now, onto the questions.
How to become a Saber-magician
Can you take me through the process of statistical analysis to try and discover something? Where do you get the necessary data? Do you have some sort of database that is constantly and automatically updated during the season? If so, where do you get this database from and is it available to the public?
- Dave R., Toronto
Bryan Tsao: I'm certainly not one of the Hardball Times' resident stats experts, but I've almost uniformly found that the best analysis starts with an interesting question, and then seeks to answer it in the best way possible—sometimes, that will involve stats, and other times, it won't.
So before you even start the analysis, figure out what you want to discover, and then think about whether statistical analysis is the right tool. On our site, we've looked at questions like who has had the best outfield arms, or whether weight is a useful predictor of power. Chris Constancio does a lot of great work on his site, FirstInning.com, comparing minor league players by their performance. These are all questions for which we have a lot of data, thanks to the likes of the Lahman Database, Retrosheet, and sites like Jeff Sackmann's Minor League Splits. There's also the book Baseball Hacks to get you started. Or, you can get much of your data directly from us.
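To make "start with a question" concrete, here is a minimal sketch of how the weight-vs.-power question might be attacked with downloaded data. It assumes CSV exports with Lahman-style headers (`playerID` and `weight` in a people file; `playerID`, `HR` and `AB` in a batting file with one row per season or stint). The 300-AB cutoff and the choice of a plain Pearson correlation are illustrative assumptions, not THT's method.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def weight_vs_power(people_lines, batting_lines, min_ab=300):
    """Correlate listed weight with career home runs per at-bat.

    Assumes Lahman-style CSV headers (an assumption about the export format).
    Batting rows are summed per player because each season or stint is a
    separate row. min_ab is an arbitrary cutoff that drops tiny samples.
    """
    weights = {
        row["playerID"]: float(row["weight"])
        for row in csv.DictReader(people_lines)
        if row["weight"]  # skip players with no listed weight
    }
    totals = {}  # playerID -> (career HR, career AB)
    for row in csv.DictReader(batting_lines):
        hr, ab = totals.get(row["playerID"], (0, 0))
        totals[row["playerID"]] = (hr + int(row["HR"] or 0), ab + int(row["AB"] or 0))

    xs, ys = [], []
    for player_id, (hr, ab) in totals.items():
        if player_id in weights and ab >= min_ab:
            xs.append(weights[player_id])
            ys.append(hr / ab)
    return pearson(xs, ys)
```

Any open file object works as input, e.g. `open("People.csv", newline="")` (the file names are hypothetical). A correlation is only a first pass: it says nothing about era, park or position effects, which is where the real analysis starts.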
Dave Studeman: We purchase our stats from Baseball Info Solutions. They update our files automatically and our developer, Bryan Donovan, has automated a process that updates our site stats when BIS updates theirs (usually in the middle of the night). The data we purchase isn't publicly available, though we obviously make most of it available to you on our site.
If you use Excel, there is a relatively simple process you can use to update stats in a spreadsheet daily. Go to "Data > Import External Data > New Web Query" to choose a table of THT's stats you'd like to import into Excel. Once you've set up this query, you can simply update the files in the future by clicking on the exclamation point at the bottom of your screen.
The Amazing All Reliever Pitching Staff
What are the main reasons relievers typically have lower ERAs than starters? Is it that they have great pitches but no stamina, typically? Is it that batters see them less often? Or is it just the situations that they typically enter the game in and that this is just an illusion based on the nature of ERA?
Isn't there some way to exploit this through some sort of "all relievers" staff? For example, a staff that consisted of six average-to-decent one- to two-inning pitchers, three pseudo-starters who would pitch three innings every three days or so, two mediocre inning-eating long relievers for mop-up duty, and a top-notch relief ace.
Please tell me why this is a terrible, terrible idea.
- Jon S.
Steve Treder: Excellent questions, Jon. Let's start with the issue of why it is that relievers typically have lower ERAs than starters.
First of all, pitchers don't just have lower ERAs when relieving as opposed to when starting; they generally have better rate stats across the board. Perhaps you've had occasion to peruse my THT article, "Examining the Relief of Relieving." It isn't just an illusion based on the nature of ERA. Instead, it's likely a function of the fact that when a pitcher knows he will be working a short stint, he has no need to pace himself or worry about his pitch count.
Thus he's free to throw harder, as well as to pitch to the corners, since the risk of walking more batters is an acceptable price for allowing fewer hits and home runs. In addition, because a pitcher in a short stint will rarely have to face the same batter two or more times, he doesn't need to mix in as complete a variety of pitches. All in all, this means a less capable pitcher can generally be as effective in a short stint as a superior pitcher is in a long one.
So, does this suggest that an "all relievers" staff would make sense? It would seem so, wouldn't it? But here's the problem: baseball seasons are very long, and the volume of innings a staff must handle adds up pretty quickly. Let's do the math on the staff you suggest:
The three pseudo-starters pitching three innings every three days would pitch about 54 games and 162 innings apiece, right? That's a daunting workload right there; few pitchers in history have been able to manage that on a sustained basis. But let's say our three guys can handle it: that's 486 innings.
Let's say our top-notch relief ace works the typical modern closer workload, and pitches about 70 games, 75 innings. That brings our innings total up to 561.
Over a 162-game season, a staff will typically pitch somewhere around 1,450 innings. That leaves something close to 900 innings to be divided among our six one- or two-inning relievers and our two mop-up innings eaters. Let's say our one- or two-inning guys each pitch 60 games and 90 innings: that's 540 innings. That leaves over 300 innings for our two mop-up guys ... uh-oh, that's a whole lot of mopping up.
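The running innings totals in the last three paragraphs can be checked with a few lines of Python. Every workload figure here comes straight from the text; the only addition is the nominal 162 x 9 = 1,458 innings that the "around 1,450" figure rounds.

```python
SEASON_GAMES = 162
STAFF_INNINGS = 1450  # the text's round figure; 162 games x 9 innings = 1,458 nominally

# Three pseudo-starters, each pitching 3 innings every third day.
games_per_pseudo_starter = SEASON_GAMES // 3              # 54 games apiece
pseudo_starter_ip = 3 * games_per_pseudo_starter * 3      # 3 pitchers x 54 games x 3 IP

relief_ace_ip = 75                                        # typical modern closer workload
short_reliever_ip = 6 * 90                                # six relievers at 90 IP each

assigned = pseudo_starter_ip + relief_ace_ip + short_reliever_ip
mop_up_ip = STAFF_INNINGS - assigned                      # left for the two mop-up men

print(f"Pseudo-starters: {pseudo_starter_ip} IP")
print(f"Plus relief ace: {pseudo_starter_ip + relief_ace_ip} IP")
print(f"Left for the eight other relievers: {STAFF_INNINGS - pseudo_starter_ip - relief_ace_ip} IP")
print(f"Left for mop-up: {mop_up_ip} IP, or {mop_up_ip / 2} each")
```

This prints 486, 561 and 889 innings, leaving 349 innings (174.5 apiece) for the two mop-up men, which is the "whole lot of mopping up" in question.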
So the issue is that the workloads these relievers would have to manage would simply be unrealistically heavy, unless you're going to carry even more than 12 pitchers on your staff. Carrying 12 pitchers already creates what I think is an unhealthy level of shallowness and inflexibility at the other eight positions; more than 12 is severely problematic. It brings us to the conclusion that every team in history so far has reached: You have to have starting pitchers to eat the innings.
Starting is the most difficult of pitching assignments, but it's unavoidably necessary. Having at least two or three pitchers who can make 30-plus starts, averaging 6-plus innings per start, is just a requirement of a schedule of 162 nine-inning games. It's why good starting pitchers are more important and valuable than relievers, it's why they command so much salary in the marketplace, and it's why top-notch starting pitching remains one of the anchor points of any championship-caliber team.
Why the Cardinals Owe the Rocket
In looking back at the 2006 season, I wonder what might have been for the Houston Astros if Roger Clemens had decided to come back earlier than he did. Would two, four, or six more weeks of Clemens' production have been enough for the Astros to take the Central from the Cardinals? If Clemens had come back earlier than he did, might we have had a World Series champ other than the St. Louis Cardinals? Would the Detroit Tigers have capped a most remarkable story by winning the 2006 World Series?
- Anthony W., Anderson, South Carolina
John Beamer: Great question. Roger Clemens made his 2006 return against the Minnesota Twins on June 22. All told, he missed 72 games, and the Astros had a 37-35 record when he took the mound against the Twins. OK, assuming that the Rocket had stayed healthy all season and started pitching in April, how many more games would Houston have won? There are two parts to this.
First, we need to work out how many starts he would have made, and second, we need to calculate how much better he would have been than his replacement. With a five-man rotation, the 72 games that the Rocket missed equate to roughly 12 starts. Garner would probably have rested him for at least a couple of those; call it 10. Fernando Nieve is the pitcher who made way for Clemens (though dropping Taylor Buchholz would probably have been the better option). Nieve's ERA as a starter through the end of June was 4.76, so if we assume Clemens could have repeated his second-half performance in the first half, Nieve's 4.76 ERA becomes 2.30.
Over the course of 10 starts, or 60 innings, this works out to 16 runs, or a shade under two wins. As the Cards were 1.5 games up with a game to play, they would have remained hot favorites. All said and done, the Rocket's presence might only have forced a playoff with St. Louis. Other factors are also at play, such as the possibility that an earlier start would have cost Clemens effectiveness later in the season. One final point: Garner chose the wrong pitcher to ditch. Had Clemens taken Buchholz's spot, the run differential would have been 23 runs, which is closer to the number we need to get Houston to the postseason.
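The Nieve calculation is a simple ERA-difference conversion: the ERA gap times innings, divided by nine. The sketch below reproduces it with the figures above (10 starts of 6 innings, 4.76 vs. 2.30). Like the text, it treats ERA as if it captured all runs allowed. The 10-runs-per-win conversion is a common sabermetric rule of thumb and an assumption here; the answer does not state which conversion it used.

```python
def runs_saved(replacement_era, new_era, innings):
    """Runs prevented when `innings` shift from a pitcher with replacement_era
    to one with new_era (ERA is runs per nine innings)."""
    return (replacement_era - new_era) * innings / 9


RUNS_PER_WIN = 10  # rule-of-thumb assumption, not a figure from the article

starts = 10
innings_per_start = 6
innings = starts * innings_per_start  # 60 innings

saved = runs_saved(4.76, 2.30, innings)  # Nieve's 4.76 ERA replaced by Clemens' 2.30
print(f"{innings} innings: {saved:.1f} runs saved, about {saved / RUNS_PER_WIN:.1f} wins")
```

This prints 16.4 runs and about 1.6 wins under the 10-runs-per-win assumption, consistent with the "16 runs, or a shade under two wins" above.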
I'll be giving this topic a more complete treatment in my article next Monday so please check back in then.
Mac vs. Kong
I have long debated among my baseball fanatic brethren the Hall of Fame credentials (or lack thereof) of one Mark McGwire. This obviously is a hot topic this week, and many pundits have droned on about how McGwire should or should not be ushered into the HOF on his first ballot. Most, if not all, of these sages, scribes and talking heads, however, are focusing their energies on the wrong topic: steroid use.
I wish more people would have the courage to call it like it is. Why do so many find it so hard to admit that Big Mac's on-field achievements, while solid, do not merit inclusion in baseball's pantheon?
I am no Bill James disciple, nor do I have a statistics degree from MIT. But I do know a one-dimensional player when I see one, and, simply put, Mark McGwire is the Dave Kingman of the "juiced ball" era.
Doubt me? Consider the following similarities that I unearthed in perusing the archives at Baseball Reference:
Mac vs. Kong:
- Both born on West Coast in a city starting with "P".
- Both with a given name of David.
- Both batted right, threw right.
- Mac - 6'5", 220; Kong - 6'6", 210.
- Both played at USC.
- Both drafted in 1st round by a Bay Area franchise.
- Both played 16 seasons.
- Both came up at age 22 and retired at age 37.
- Both played 1B, DH, 3B and OF and no other position.
- Both in top-10 in HRs in 10 seasons.
- Games played: 1874 v. 1941
- AB: 6187 v. 6677
- H: 1626 v. 1575
- 2B: 252 v. 240
- GIDP: 147 v. 139
- SF: 78 v. 75
They are the only two players to hit 440-plus HR, strike out 1500-plus times, bat under .265 and collect fewer than 1650 hits.
Of the 34 players with 440-plus HRs, here is where Mac and Kong rank, respectively:
- Hits: 33 and 34
- Average: 32 and 34
- Doubles: 33 and 34
- Strikeouts: 13 and 8
- Total bases: 31 and 34
- Steals: 34 and 21
- Runs Created: 32 and 34
- Runs: 33 and 34
That's just not enough juice to stamp either player's ticket to the Hall. The Hall should be reserved for the best of the best and, while there are other marginal lights currently among Hall members, the gatekeepers should be loath to further water down the ranks with players who did one thing well in an era where doing that one thing became commonplace.
- Carlton P., Atlanta
Steve Treder: Carlton, you've obviously performed some detailed research, and the points of comparison you raise between Dave Kingman and Mark McGwire are truly interesting, and in many ways amusing. However, your fundamental conclusion that McGwire wasn't a meaningfully better player than Kingman is simply not supported by the evidence.
The two were unquestionably players of similar kind, but that doesn't mean they were players of similar quality. Essentially, any metric we have that evaluates players within ballpark and league context, and assesses their performance in comparison to that of their peers, shows McGwire to be, not just better than Kingman, but vastly better. It isn't close.
In career OPS+, for instance, Kingman's mark is 115; that is, 15% better than that of the league average. McGwire's is 163, tied with Jimmie Foxx for eleventh-best all time.
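For readers unfamiliar with OPS+, the basic idea is to compare a hitter's on-base and slugging percentages to the league's and scale the result so that league average is 100. The function below is a simplified sketch of the formula Baseball-Reference describes; it omits the park adjustment applied to the league figures, so it will not reproduce published career numbers such as the 115 and 163 cited here.

```python
def ops_plus(obp, slg, lg_obp, lg_slg):
    """Simplified OPS+: 100 * (OBP/lgOBP + SLG/lgSLG - 1).

    The published statistic also park-adjusts the league figures;
    that step is omitted in this sketch.
    """
    return 100 * (obp / lg_obp + slg / lg_slg - 1)


# Hypothetical hitter who exactly matches the league in both categories.
print(ops_plus(0.330, 0.420, 0.330, 0.420))  # prints 100.0
```

Because the two ratios are added, gains in on-base ability and in power both push the score up, which is part of why a hitter who draws walks and hits for power separates himself quickly on this scale.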
In career WARP3, Kingman's total is 50.5; that is, a bit better than 50 wins more than a replacement player would have provided his teams. McGwire's figure is 109.5, more than twice as good.
In career Win Shares, Kingman racked up 195; that is, he gets credit for about 65 of his teams' wins. McGwire's total is 343, nearly twice as many.
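The "about 65 wins" figure follows from the Win Shares scale, in which three win shares equal one team win. The snippet below applies that conversion and computes the McGwire-to-Kingman ratios for both career totals quoted above; no other data is involved.

```python
WIN_SHARES_PER_WIN = 3  # Win Shares scale: three win shares per team win

career = {
    "Kingman": {"warp3": 50.5, "win_shares": 195},
    "McGwire": {"warp3": 109.5, "win_shares": 343},
}

for name, stats in career.items():
    wins = stats["win_shares"] / WIN_SHARES_PER_WIN
    print(f"{name}: {stats['win_shares']} Win Shares, about {wins:.0f} team wins")

warp_ratio = career["McGwire"]["warp3"] / career["Kingman"]["warp3"]
ws_ratio = career["McGwire"]["win_shares"] / career["Kingman"]["win_shares"]
print(f"McGwire/Kingman: WARP3 {warp_ratio:.2f}x, Win Shares {ws_ratio:.2f}x")
```

The output (65 vs. about 114 wins; ratios of 2.17 and 1.76) matches "more than twice as good" for WARP3 and "nearly twice as many" for Win Shares.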
Why is it that McGwire performs so much better than Kingman on these metrics? First of all, it's because although Kingman was extremely good at hitting home runs, McGwire was even better, even when taking into account the differences in league home run context. And more significantly, it's because Kingman demonstrated extremely poor plate discipline, drawing very few walks, while McGwire's strike zone judgment was good even as a young player and became excellent in his later career: By accepting the bases on balls that pitchers were willing to give him, McGwire avoided outs far more effectively than Kingman.
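The effect of plate discipline shows up directly in the standard on-base percentage formula, (H + BB + HBP) / (AB + BB + HBP + SF): every walk is a time on base that never enters the at-bat column. The two season lines below are hypothetical (not McGwire's or Kingman's actual numbers); they share the same hits, at-bats and sacrifice flies and differ only in walks.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


# Hypothetical lines: identical H, AB, HBP and SF; only the walk totals differ.
free_swinger = on_base_pct(h=140, bb=40, hbp=5, ab=550, sf=5)
patient = on_base_pct(h=140, bb=100, hbp=5, ab=550, sf=5)
print(f"free swinger {free_swinger:.3f} vs. patient hitter {patient:.3f}")
```

The patient hitter reaches base far more often and makes the same number of outs in 60 more trips to the plate, which is the sense in which taking walks "avoids outs."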
Kingman was genuinely a one-dimensional player: He hit home runs wonderfully well, but in every other regard he was, there's no other way to put it, a terrible player. McGwire not only delivered home runs even more productively than Kingman, but hit for a better batting average, and drew walks at a vastly better rate. (McGwire was also a better defensive first baseman than Kingman, but that isn't the element that truly separates them.)
The performance-enhancing drug issue aside, McGwire's achievements are pretty clearly Hall of Fame-worthy. I don't think they're "first ballot" worthy, for whatever that's worth, but his performance was clearly within the bounds of the established standards of the Hall of Fame. Kingman's achievements aren't close. There is really no comparison between the quality of these two players in contributing to victories for their teams.
The Free Agent Cycle
This has got to be the worst offseason ever. The money that has been handed out to average and below-average players since your Nov. 30 article is out of control.
It has gotten so bad that the average fan can no longer afford to go to a game. The owners seem to know this, so they are building ballparks with smaller seating capacities and trying to sell as many corporate season tickets as they can so they can charge outrageous prices.
The Mets are a good example. Shea has about 55,000 seats. The new Citi Field will have only about 41,300 seats. (They say it will hold about 45,000, but that includes standing room. Why anyone would pay $25 to $30 to stand is beyond me, but I digress.)
Obviously ticket prices will go way up. The cheap seats Shea now has in the upper deck and in the back rows of the Loge and Mezzanine will no longer exist. At least for the first few years, there will probably be fewer than 10,000 tickets available per game for the average fan. This is how Mets management is thanking all the loyal fans who have supported the team for the past 45 years through thick and thin (mostly thin): by telling us they don't want or need us any longer.
This is going to happen around the majors as the average salary approaches the $5 million mark.
Eventually there will be a business downturn. Companies will cut back on excess spending, and tickets to sporting events will be among the cuts. The teams will suddenly have more tickets to offer the general public, but most people will be priced out; the average person can't pay $50, $70 or $100 to see a ball game. Teams will have to cut prices drastically and will lose tons of money.
Most of the owners are very wealthy and could probably ride this out for a year or two, but they will only be able to survive by renegotiating salaries down. Either that, or they will have to sell their teams or fold. That is my prediction for the future of Major League Baseball.
- Mark Saleman
Bryan Tsao is the editor of The Hardball Times website. He welcomes comments, questions, and suggestions for both himself and the site via email.