What’s it All About, AOWPy? (SP Leverage, Part 3)

This is the third article in a series on starting pitcher leveraging here at The Hardball Times. If you know what SP leveraging is, skip to the next paragraph. For you newbies, starting pitcher leverage refers to the once-common but now extinct practice of a team intentionally using one or more of its pitchers disproportionately against particular opposing teams. It could be an ace starting all the time against the best opposing teams, or southpaws starting against the most left-leaning offenses. For this study, I figured out that leveraging existed from the earliest days of baseball up to the 1960s, and thus looked at the usage patterns for virtually every pitcher in those years worth figuring. For this I invented a stat called AOWP+. Scroll down to see exactly how this stat works. Short version: it’s set up like ERA+ or OPS+, centered on 100. A higher score means the pitcher was used more against the best teams, a lower score means more against the worst teams, and if he’s used evenly against all teams, his AOWP+ will be 100.

The first two articles covered best and worst careers and best and worst single seasons. Before moving on, I want to address two main issues that readers have brought up: 1) ultimately, just how important was it, and 2) is there any way to adjust a pitcher’s stats based on leveraging?

What’s it All About, AOWP?

The first point was best brought up in a Rob Neyer piece that came out in response to Part I. As a former poster on the late, great Rob Neyer Message Board, I was rather thrilled and shocked to see The Flannel Clad One promote my piece, but upon closer inspection I was somewhat mortified to see his interpretation.

Conclusion? Jaffe doesn’t say this, but I will: Except perhaps at the margins, none of it means anything. . . . I support further research, and this research actually might argue for Pierce as a marginal Hall of Fame candidate. But this first effort suggests the impact of opposition quality is less than I’d believed.

Hey wait, I’m not sure I agree with that. To be fair, Neyer does say “less than I’d believed” so maybe he just expected it to be more extensive than I did.

Let’s look at it for a second. It’s true that pitchers rarely had career AOWPs more than 2% off their TOWPs, but that can be deceptive. An AOWP+ of 105 doesn’t mean a pitcher’s AOWP was .505 instead of .500. His AOWP would’ve been .525. Secondly, a career mark will virtually always be flattened out by the ends of a player’s career, because even the guys who are leveraged a lot in their primes are usually not leveraged much at the ends of their careers. Oftentimes they’re reverse leveraged at the ends.
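To make the scale concrete, here is a minimal Python sketch of the conversion from AOWP+ back to AOWP. It assumes a TOWP of exactly .500, as the example above does, and the function name is purely illustrative.

```python
def aowp_from_plus(aowp_plus: float, towp: float) -> float:
    """Recover AOWP from AOWP+, using AOWP+ = AOWP / TOWP * 100."""
    return aowp_plus / 100.0 * towp

# An AOWP+ of 105 against a .500 TOWP (the TOWP is assumed for illustration).
aowp_105 = aowp_from_plus(105, 0.500)
print(f"{aowp_105:.3f}")
```

The point is that the index scales the whole winning percentage, so five points of AOWP+ is 25 points of opponent winning percentage, not five.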

Let me use Mordecai Brown as an example. His AOWP+ of 104.45 was one of the best ever. His AOWP was .506 and his TOWP .484. Over the course of a season, .022 is worth 3-4 wins. Yet in some ways his leveraging was even more impressive than that. Had he been used evenly against all teams all seasons, in his 332 career starts he would’ve had 160 against teams with winning records. In reality, he had 194, a 21% increase. That’s a pretty damn substantial difference. And against the best teams, he was even more likely to be used. Here’s a chart showing what I mean:

Rivals	Actual GS	If Evenly Used	Actual as % of Even
.600+	71	54	131%
.500+	194	160	121%
.499-	138	172	80%
.399-	62	71	87%
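The percentage column in the chart above is just actual starts divided by the evenly used total. A short Python sketch that reproduces it is below. The evenly used totals are taken as given from the chart, since the season-by-season schedules behind them aren't shown here, and the variable names are illustrative.

```python
# (actual GS, GS if evenly used) for each tier, copied from the chart above.
brown_starts = {
    ".600+": (71, 54),
    ".500+": (194, 160),
    ".499-": (138, 172),
    ".399-": (62, 71),
}

def usage_ratio(actual: float, even: float) -> float:
    """Actual starts as a percentage of the evenly used total."""
    return 100.0 * actual / even

brown_ratios = {
    tier: round(usage_ratio(actual, even))
    for tier, (actual, even) in brown_starts.items()
}

for tier, pct in brown_ratios.items():
    print(f"{tier}\t{pct}%")
```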

Mordecai Brown isn’t unique in this. Lefty Gomez, for example, had an AOWP+ of only 102.86, but faced .600+ teams a whopping 30% more than he should’ve and winning teams 7% more.

Here’s another way of looking at Brown. If you take every start in his career and divvy it up as starts against the best opposing team, second best opposing team, and so on to the worst team, you can get a chart showing how he was leveraged. (It’s a little tricky because in his final year two teams tied for last and he started three times against one, and none against the other. I’ll divide it up as 1.5 games against the two worst availables to see how it works. In 1908 the Pirates and Giants tied, but he had the exact same number of starts against both of them.) For comparison, I’ll also throw in his games started during his peak, 1906-11, when he was the Cubs’ ace.

Rival	Career	1906-11
Best	54	35
2nd 	63	35
3rd	57	31
4th	43	21
5th	42	25
6th	33.5	16
Worst	39.5	19

If used evenly, he would’ve had 142 starts against the top three teams. He had 32 more than that. So he had an extra season’s worth of starts against those best teams. Most of those were at the expense of the worst teams. Hell, in his six ace years alone he had the better part of a season’s worth of games against the best three.
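Here is a rough Python sketch of that top-three arithmetic. It assumes even usage means splitting a pitcher's starts equally across the seven rivals in the chart, which is what the 142 figure implies. The function name is illustrative.

```python
# Starts by rival rank (best to worst), copied from the chart above.
career = [54, 63, 57, 43, 42, 33.5, 39.5]
peak_1906_11 = [35, 35, 31, 21, 25, 16, 19]

def extra_vs_top(starts, top_n=3):
    """Return (actual, even, extra) starts against the top_n rivals.

    ASSUMPTION: even usage is an equal split across all rivals listed.
    """
    total = sum(starts)
    even = total * top_n / len(starts)
    actual = sum(starts[:top_n])
    return actual, even, actual - even

career_actual, career_even, career_extra = extra_vs_top(career)
peak_actual, peak_even, peak_extra = extra_vs_top(peak_1906_11)

print(f"Career: {career_actual} actual vs {career_even:.0f} even ({career_extra:+.0f})")
print(f"Peak:   {peak_actual} actual vs {peak_even:.0f} even ({peak_extra:+.0f})")
```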

OK, but Brown was always an outlier. Even Neyer said at the edges it could mean something. What does a more normal season look like? Here’s Wes Ferrell’s 1935 campaign:

Rival	Pct	GS
Det	.616	4
NYY	.597	4
CLE	.536	4
CWS	.487	6
Was	.438	7
StB	.428	6
PhA	.389	7

As you can see, had he been used against the second division teams like he was against the first division, he would’ve had ten fewer starts against the bottom teams. That’s substantial leveraging. Result: an AOWP+ of 97 (or 96.586 to be exact). That’s the 366th-worst single season from a pitcher with at least 20 starts that I have, right between Tony Cloninger’s 1965 and Dutch Ruether’s 1927. Plus there are another 476 at the other end that are equally far or farther from 100. (There are more at the high end because guys who are poorly leveraged generally don’t get as many starts.) As impressive as that pattern looked, it was nothing special.
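The Ferrell chart can be turned into an AOWP+ with a few lines of Python. This is a rough sketch, not the exact calculation behind the 96.586 figure: the chart doesn't list how many games Ferrell's team played against each rival, so the code assumes a balanced schedule (every rival played equally often) when it estimates TOWP. Under that assumption the result lands in the mid-96s, the same neighborhood as the figure quoted above.

```python
# (rival, rival winning pct, Ferrell's GS against them), from the chart above.
ferrell_1935 = [
    ("Det", 0.616, 4),
    ("NYY", 0.597, 4),
    ("CLE", 0.536, 4),
    ("CWS", 0.487, 6),
    ("Was", 0.438, 7),
    ("StB", 0.428, 6),
    ("PhA", 0.389, 7),
]

def start_weighted_pct(rows):
    """Average opponent winning pct, weighted by starts (AOWP)."""
    total_gs = sum(gs for _, _, gs in rows)
    return sum(pct * gs for _, pct, gs in rows) / total_gs

aowp = start_weighted_pct(ferrell_1935)

# ASSUMPTION: balanced schedule, so TOWP is the plain average of rival pcts.
towp_assumed = sum(pct for _, pct, _ in ferrell_1935) / len(ferrell_1935)
aowp_plus = 100 * aowp / towp_assumed

# Second-division starts beyond what the first-division rate would give.
first_div, second_div = ferrell_1935[:3], ferrell_1935[3:]
rate = sum(gs for _, _, gs in first_div) / len(first_div)
extra_second_div = sum(gs for _, _, gs in second_div) - rate * len(second_div)

print(f"AOWP {aowp:.3f}, assumed TOWP {towp_assumed:.3f}, AOWP+ {aowp_plus:.1f}")
print(f"Extra second-division starts: {extra_second_div:.0f}")
```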

Then again, while I think leveraging has an impact, in fairness I ought to admit that as the person who did all this digging, I have a bias in promoting it. Let me play devil’s advocate for a second.

The logistics of starting 25-30 games a year minimize the impact leveraging can have. And really, if you think a guy’s good enough to leverage, you should think he’s good enough to get 30 starts. And when you start that often, a pitcher has to have 2-3 starts against every team. Only a third of his starts are really in play when it comes to leveraging.

That’s quite a bit, but let’s think about it for a sec. Not all those 8-10 starts will come against the best team. Some will be spaced out against the second best, and third best, and fourth best. And really, how important is it to be leveraged against the third and fourth best teams? All it comes down to is, at best, 4-5 starts out of 30 shifting from the worst teams to the best teams.

But boy, it sure would be nice to know for sure how much impact it has. Right now, I’m just arguing in circles. That’s where the second question comes in . . .

how much impact does this have on a pitcher’s stats?

I’ve been trying to figure this one out for a loooong time now, but never with much success. Then, after the publication of the first article, I had a thought-provoking e-mail exchange with Phil Birnbaum. Phil’s one of the bigwigs in SABR’s Statistical Analysis Committee, runs his own sabermetric blog, and created a nifty little database that I used for my personal favorite stathead study I’ve ever done, Evaluating Managers.

From the exchange, the following idea emerged: using a player’s AOWP and TOWP, you can use the Pythagorean equation to figure out how leveraging affected him. I’ll explain this by using Hank Wyse as an example.

In 159 starts, Wyse had an AOWP+ of 97.16 (yuck), and an AOWP of .486. That means his career TOWP should be .500; to be exact, .5002. (The TOWP makes Wyse an easy example.) Well, if you look over his career, going by his starts, an average of 4.37 runs per game was scored. (Only read this if you care about math: I figured 4.37 by taking his GS in each season, multiplying it by the league R/G average, adding all those together, and dividing by his career GS.)
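The start-weighted run environment described in that parenthetical can be sketched in Python as follows. The season lines below are made-up placeholders, not Wyse's actual starts or league averages, and the function name is illustrative. The only point is the weighting.

```python
def start_weighted_runs_per_game(seasons):
    """Weight each season's league R/G by the pitcher's GS that year.

    `seasons` is a list of (games_started, league_runs_per_game) pairs.
    """
    total_gs = sum(gs for gs, _ in seasons)
    return sum(gs * rpg for gs, rpg in seasons) / total_gs

# HYPOTHETICAL placeholder seasons, for illustration only.
placeholder_seasons = [(10, 4.0), (30, 4.5), (20, 4.2)]

career_rpg = start_weighted_runs_per_game(placeholder_seasons)
print(f"{career_rpg:.2f} runs per game")
```

Weighting by starts keeps a season with only a handful of starts from counting as much as a full one.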

Now, based on what I know, I can determine what the average strength of the offenses facing him was, and I can figure out what their average strength should’ve been had he not been leveraged. We’re looking at the hitting for the teams opposing Wyse, so treat their pitching staffs as a control, and say they allowed 4.37 runs per game: league average for a start in baseball’s Wyse Era. Since I’m making the opposing pitchers league average, I’ll split the difference between his AOWP of .486 and .500, because pitching should be half the reason why the opposing teams were below average. So that becomes .493.

I have the runs allowed per game (4.37), the winning percentage (.493), and I plug in 1.87 for the exponent. I can use it all to solve backwards for what his normal level of offensive opposition should’ve been. Here’s the Pythagorean equation for Wyse:

.493 = (X^1.87) / (X^1.87 + 4.37^1.87).

Then all I need to do is figure out the algebra so the equation solves for X, and when I say “figure it out” I mean ask THT’s very own Dave Studenmund to figure it out for me, because there’s no way I’ll ever manage that.
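For the record, the algebra is short. Multiply both sides by the denominator, gather the X terms on one side, and you get X = RA * (W / (1 - W))^(1 / 1.87), where W is the winning percentage and RA is the runs allowed per game. A minimal Python sketch of that rearrangement follows. The function names are illustrative, and it plugs in the rounded figures from the text, so treat the output as approximate.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.87):
    """Expected winning pct from runs scored and allowed per game."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def runs_scored_for_pct(win_pct, runs_allowed, exponent=1.87):
    """Invert the Pythagorean equation to solve for runs scored (X)."""
    return runs_allowed * (win_pct / (1.0 - win_pct)) ** (1.0 / exponent)

# Wyse's inputs as given in the text: W = .493, RA = 4.37, exponent = 1.87.
x = runs_scored_for_pct(0.493, 4.37)
print(f"X = {x:.3f} runs per game")

# Sanity check: feeding X back in should return the original winning pct.
print(f"Check: {pythagorean_pct(x, 4.37):.3f}")
```

Comparing X to the 4.37 league average gives the share of runs the offenses facing him scored relative to an evenly used pitcher.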

Before I get to the result, I should mention there are several problems with what I’m doing. IP should be used, but I’ve got GS. Then again, I don’t have IP for most years, and even for those I do, I set up AOWP on GS. Believe me, I have no enthusiasm to go back and reconfigure the whole mess with IP. Also, I’m not adjusting for park, and I probably should, but for now I’m not going to worry about it. It won’t be perfect, but it will give me a rough idea how much leveraging impacted a pitcher’s numbers.

So, what does this say for Wyse? Well, the offenses facing Hank Wyse (drum roll, please) scored only 98.47% as many runs as they would have had he been used evenly. That’s it. One point stinking five three percent worse, and this from one of the ten worst-leveraged starting pitchers of all time. In reality, he allowed 581 runs over 1257.7 IP. Had he been used evenly, he would’ve allowed 590. Nine frickin’ runs. That’s it. His ERA moves from 3.52 to 3.58, and his ERA+ from 105 to 103. I guess all those reasons I went over while playing devil’s advocate meant something. Dammit. Don’t you hate it when the math doesn’t do what you want it to?
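The run adjustment itself is a one-line scaling, sketched here in Python. It assumes runs allowed scale in direct proportion to the strength of the opposing offenses, which is the simplification the paragraph above relies on. The function name is illustrative.

```python
def even_use_runs(actual_runs: float, offense_ratio: float) -> float:
    """Runs allowed if the pitcher had faced an evenly distributed schedule.

    `offense_ratio` is the actual opposing offense divided by the evenly
    used offense (0.9847 for Wyse, per the text).
    ASSUMPTION: runs allowed scale linearly with opposing offense strength.
    """
    return actual_runs / offense_ratio

wyse_even = even_use_runs(581, 0.9847)
print(f"Evenly used: {wyse_even:.0f} runs ({wyse_even - 581:+.0f})")
```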

One last bit. Since I’ve looked at Mordecai Brown so much in this article, I may as well run him through this, too. He’s really an extreme: only one pitcher with 150 GS was better leveraged. Running through the same process, it turns out that the offenses facing Brown were 2.28% better than they should’ve been. Yippee. That’s a big 23-run difference. Given his career length and extreme leveraging, that’s likely the biggest run differential of any pitcher ever. His ERA/ERA+ move from 2.06/138 to 2.01/141.

That’s certainly disappointing. It doesn’t, however, mean studying leveraging is worthless, just that it ain’t earth-shatteringly important. Some areas worth looking at are how and why it first arose in the early days of professional baseball, and later petered out in the 1960s; what were its ebbs and flows over the decades in between; what was the role of platoon leveraging (which we’ve already seen from the first two articles was a key part of the leveraging story); and which managers did the most and least leveraging over time. Besides, if you’re a complete weirdo like I am, finding this stuff out is kinda fun. First though, there’s a ghost I have to exorcise in this study, and that will happen in Part IV.

References & Resources
What the heck is AOWP+?: The stat I invented to judge pitcher leveraging. It’s AOWP/TOWP*100. AOWP is Average Opponent Winning Percentage. TOWP is Team’s (Average) Opponent Winning Percentage. To figure AOWP for a single season, you take the number of starts a given pitcher had against each opposing team, and multiply that by the team’s winning percentage. After doing this for all rival squads, add up the products and divide by the pitcher’s total GS. The result is his AOWP. The same logic applies to TOWP, only here you look at how many games the team played against all rivals. If a pitcher’s used evenly, his AOWP will be the same as the TOWP, and he’ll have an AOWP+ of 100. If he’s used more against better teams, he’ll have a higher AOWP+. I calculated AOWP+ for 659 pitchers who started 182,000 games, including over two-thirds of all games from 1876-1969.
