Is it better to have a good defense or a good offense?

It shouldn’t matter, should it? A run scored should be worth exactly as much as a run prevented, right? That’s the basic assumption behind a lot of sabermetric tools, like the Pythagorean win expectation.

But there’s no reason to assume, or to go with what merely sounds reasonable, when there’s a way to test it. So let’s go ahead and test this notion.

Here’s what I did:

- Using the Retrosheet game logs, I built running totals of won-loss record and run differential for every team, for each day of every season from 1953 to 2008 except 1998. (A problem with my database made the 1998 entries unworkable, so for the time being I simply left them out.)
- Teams were labeled as being an “offensive” or “defensive” team based upon their runs scored or allowed, relative to the average runs per game that season. If a team’s runs scored above average exceeded its runs saved above average by more than half a run per game, I labeled it an offensive team. If its runs saved above average exceeded its runs scored above average by more than half a run per game, I labeled it a defensive team.
- I looked for matched pairs of teams: two teams with the same run differential in the same number of games. I kept the pairs where one team was an offensive team and one team was a defensive team.
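Here is a minimal Python sketch of the labeling and matching steps. The `Snapshot` record, its field names, and the `league_rpg_by_season` lookup (league-average runs per team per game) are hypothetical stand-ins for however the Retrosheet running totals are actually stored. It also assumes that every offensive snapshot is paired with every defensive snapshot that shares its games played and run differential; the original matching may have been done differently.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One team's running totals on one day (hypothetical record layout)."""
    team: str
    season: int
    date: str
    games: int
    wins: int
    runs_scored: int
    runs_allowed: int


def label(s: Snapshot, league_rpg: float, threshold: float = 0.5) -> Optional[str]:
    """Return 'offense', 'defense', or None using the half-run-per-game rule."""
    if s.games == 0:
        return None
    scored_above = s.runs_scored / s.games - league_rpg  # runs scored above average, per game
    saved_above = league_rpg - s.runs_allowed / s.games  # runs saved above average, per game
    if scored_above - saved_above > threshold:
        return "offense"
    if saved_above - scored_above > threshold:
        return "defense"
    return None  # balanced teams are left out of the comparison


def matched_pairs(
    snapshots: List[Snapshot], league_rpg_by_season: Dict[int, float]
) -> List[Tuple[Snapshot, Snapshot]]:
    """Pair offensive and defensive snapshots with equal games and run differential."""
    buckets = defaultdict(lambda: {"offense": [], "defense": []})
    for s in snapshots:
        kind = label(s, league_rpg_by_season[s.season])
        if kind is not None:
            key = (s.games, s.runs_scored - s.runs_allowed)
            buckets[key][kind].append(s)
    pairs = []
    for groups in buckets.values():
        # Every (offense, defense) combination within a bucket becomes one matched pair.
        pairs.extend(product(groups["offense"], groups["defense"]))
    return pairs
```

Because the same team appears once per day, one club can land in many buckets over a season, which is why the result set counts entries rather than distinct teams.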

[To clarify: for the sake of convenience I’m calling each entry a “team,” which isn’t true in the strictest sense. The same team can show up multiple times in the result set, at different points in the season; in fact, the same team could very well be labeled both an “offensive” and a “defensive” team at different times during the season.]

Doing this provided me with over 213,000 matched pairs, a very robust sample for testing. When the won-loss records are tabulated, the offensive group of teams won at just below a .5016 clip. The defensive group of teams won at a .5021 clip.

Okay, so this is a very subtle difference; rounding winning percentage to three digits all but erases it. We’re talking about a gap of roughly one win every 12 seasons between our defensive-minded teams and our offensive-minded teams.
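The “one win every 12 seasons” figure is just the winning-percentage gap scaled to a 162-game schedule (approximate, since the offensive figure is just below .5016):

```latex
\[
(0.5021 - 0.5016) \times 162 = 0.0005 \times 162 \approx 0.081 \text{ wins per season},
\qquad \frac{1}{0.081} \approx 12.3 \text{ seasons per extra win}
\]
```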

Using Student’s t-test, though, we can see that while the difference is not substantial, it is statistically significant. That is, we don’t think it’s simply a product of random chance.
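The article doesn’t spell out how the t-test was set up, so the following is only a sketch under one assumption: each game is treated as an independent 0/1 outcome (1 = win), which makes each group’s mean its winning percentage, and Student’s pooled-variance t statistic compares the two groups. The function name is hypothetical, and because the samples are so large the p-value uses a normal approximation rather than the exact t distribution.

```python
import math
from typing import Tuple


def student_t_win_pct(w1: int, n1: int, w2: int, n2: int) -> Tuple[float, float]:
    """Student's pooled-variance t statistic for two groups of 0/1 game outcomes.

    Each group is summarized by wins (w) and games (n). Returns
    (t, approximate two-sided p-value).
    """
    p1, p2 = w1 / n1, w2 / n2
    # For 0/1 data the sample variance is n*p*(1-p)/(n-1); pooling both groups
    # gives sum((n_i - 1) * s_i^2) / (n1 + n2 - 2):
    pooled_var = (n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (p1 - p2) / se
    # With hundreds of thousands of games the t distribution is essentially
    # normal, so the two-sided p-value is taken from the normal tail.
    p_value = math.erfc(abs(t) / math.sqrt(2))
    return t, p_value
```

Treating games as independent is a simplification: running totals on consecutive days share most of their games, so a more careful test would account for that overlap.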

There is one sampling concern the t-test isn’t picking up for us, though, and that’s the number of games each team had played. Narrowing the sample down to only teams that had played more than 100 games, we see an even greater disparity. Defensive teams won at a .5039 clip, compared to .5024 for the offensive teams, and the t-test says this result is still significant.
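Tabulating the two groups, with or without the games-played cutoff, can be sketched like this. The pair layout `((off_wins, off_games), (def_wins, def_games))` and the function name are hypothetical; the sketch pools wins and games across pairs rather than averaging per-pair percentages, which is an assumption about how the records were tabulated.

```python
from typing import Iterable, Tuple

# One matched pair: ((offense wins, offense games), (defense wins, defense games)).
Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def group_win_pcts(pairs: Iterable[Pair], min_games: int = 0) -> Tuple[float, float]:
    """Pooled winning percentage of the offensive and defensive groups,
    keeping only pairs whose teams had played more than min_games games."""
    off_w = off_g = def_w = def_g = 0
    for (ow, og), (dw, dg) in pairs:
        if og <= min_games:  # matched teams have equal games, so og == dg
            continue
        off_w += ow
        off_g += og
        def_w += dw
        def_g += dg
    if off_g == 0:
        raise ValueError("no pairs left after filtering")
    return off_w / off_g, def_w / def_g
```

Calling `group_win_pcts(pairs, min_games=100)` keeps only teams that had played more than 100 games, matching the cutoff described above.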

There is some sense to this, or at least I think there is. Given the use of bullpens and defensive replacements, a team can more readily leverage its run prevention than its run scoring. (This has been studied previously.) And a team’s runs scored per game are generally more consistent than its runs allowed: the batting lineup is normally the same, or very similar, from game to game, whereas there can be a substantial difference between a team’s staff ace and its fifth starter. (Again, this effect has been studied before.)

So what does this mean for Pythagoras? Let’s use the best Pythagorean win estimator around, Pythagenpat. And let’s stick with our 100-plus-game teams, mostly to ease the computational burden on my end. (Excel does not look kindly upon 213,000-record files.)
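For reference, Pythagenpat sets the Pythagorean exponent from the run environment, x = ((RS + RA) / G)^z, and estimates winning percentage as RS^x / (RS^x + RA^x). The sketch below assumes z = 0.287, one commonly cited value; other versions use slightly different constants, and the article doesn’t say which one was used.

```python
def pythagenpat(runs_scored: float, runs_allowed: float, games: int, z: float = 0.287) -> float:
    """Pythagenpat estimated winning percentage.

    The exponent x = ((RS + RA) / G) ** z grows with the run environment;
    z = 0.287 is an assumed constant, not necessarily the one used in the study.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    x = ((runs_scored + runs_allowed) / games) ** z
    rs_x, ra_x = runs_scored ** x, runs_allowed ** x
    return rs_x / (rs_x + ra_x)
```

For example, `pythagenpat(800, 700, 162)` comes out to roughly .563, and any team with equal runs scored and allowed is estimated at exactly .500.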

What we want to do is compare the accuracy of Pythagenpat on our offensive and defensive teams. The measure I used was average error: the average difference between a team’s actual and estimated winning percentage. I weighted that average by games played, so that teams with more games count for a larger share of it.
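A games-weighted average error can be computed as below. The tuple layout and function name are hypothetical, and taking the absolute value of each difference is an assumption, since the text says only “average difference.”

```python
from typing import Sequence, Tuple


def weighted_avg_error(teams: Sequence[Tuple[float, float, int]]) -> float:
    """Games-weighted mean of |actual W% - estimated W%|.

    Each entry is (actual_pct, estimated_pct, games); using the absolute
    difference is an assumption about how error was measured.
    """
    total_games = sum(g for _, _, g in teams)
    if total_games == 0:
        raise ValueError("no games to weight")
    return sum(abs(actual - est) * g for actual, est, g in teams) / total_games
```

Weighting by games means a team 150 games into its season moves the average three times as much as one 50 games in, so late-season entries with more settled records dominate.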

For the offensive teams, the average error of Pythagenpat was .013. For the defensive teams, the average error of Pythagenpat was .022. That’s nearly a game and a half’s difference over a 162-game season.
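The game-and-a-half figure comes from scaling the gap in average error to a full schedule:

```latex
\[
(0.022 - 0.013) \times 162 = 0.009 \times 162 \approx 1.46 \text{ games}
\]
```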

I want to go ahead and stress a few things here:

- We suspect, but have no evidence for, a mechanism of action here. Further research is warranted.
- Because of that, we have no idea whether or not this is a repeatable skill—we’re not looking at causation here.
- This is a subtle effect—generally speaking, it is still correct to say that a run on offense and a run on defense are equally valuable.

So please do not mistake this for a testament to the great and amazing powers of the 2007 Arizona Diamondbacks to defy mighty Pythagoras. (The 2008 Arizona Diamondbacks have a little to say on that subject, I should think.)

But I think a little caution is in order the next time you’re tempted to chalk up the difference between a team’s actual record and its Pythagorean record to luck alone. Pythagorean win expectation is a very good and very useful model of how a team’s run scoring translates into wins, and there is real power to it. But it’s not necessarily the whole story.

**References & Resources**

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

Sal Baxamusa would like to remind you that Pythagorean win expectation is still the best model we have for team wins.

Dan Fox has a great article on the accuracy of Pythagorean estimators.

Dave Studeman has learned ten things that you might want to learn, too.