A 10th man?

by Colin Wyers
December 11, 2008

Can you affect the outcome of a baseball game? (Legally, that is—leaping out of the stands and charging the mound is seriously discouraged, as is bribing a player or an umpire to throw a game.)

By “you” I specifically mean the home team crowd. Can you “root, root, root for the home team” enough to actually affect the on-field outcome? We suspect it’s true in other sports, although in football we can identify a more direct effect (crowd noise makes it harder for the visiting offense to communicate its plays). And ballplayers will talk about the joys of playing in front of the home crowd. Is that just another of Crash Davis’s clichés, or is there something more to it?

We’re trying to quantify an intangible here, so in lieu of measuring the intensity and heartfeltness of a crowd’s cheering, we’re going to have to use a proxy. In this case, the best proxy we have for what we’re measuring is the crowd’s size. Let’s go ahead and admit up front that this is a rather crude measure, but it should serve well enough. Retrosheet has attendance figures for nearly every game from 1974 on, which is the sample that we’ll be examining.

There are a few things to take care of first, to make sure that what we’re measuring is actually what we’re interested in. We don’t care about a team’s attendance per se, only its attendance compared to the typical team’s. So we’re looking at how far attendance in a given game was above or below the league average for that season.
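A minimal sketch of that adjustment, assuming a hypothetical record of `(season, attendance)` for each game, that “league average” means the mean attendance over all games in a season (both leagues pooled, which is an assumption), and that games with missing attendance have already been dropped:

```python
from collections import defaultdict

# Hypothetical game record: (season, attendance). Real Retrosheet game logs
# carry many more fields; only these two are needed for this step.
def attendance_differentials(games):
    """Return each game's attendance minus the mean attendance of its season.

    Assumption: the season average pools every game in both leagues.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for season, attendance in games:
        totals[season] += attendance
        counts[season] += 1
    season_avg = {season: totals[season] / counts[season] for season in totals}
    return [attendance - season_avg[season] for season, attendance in games]

games = [(1980, 30000), (1980, 20000), (1981, 15000), (1981, 25000)]
print(attendance_differentials(games))  # [5000.0, -5000.0, -5000.0, 5000.0]
```

The differential computed this way corresponds to what the tables below report, averaged by group, as P_M.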

The other issue is ensuring that we don’t have a biased sample. After all, it’s very possible that teams with higher attendance are simply better teams. So we also want to figure out a team’s expected win percentage based upon its record and its opponent’s record. We can figure that using the log5 method, devised by Bill James:

(A – A * B)/(A + B – 2 * A * B)

Where A is the home team’s win percentage and B is the visiting team’s win percentage. Essentially what we’re measuring is the expected chance of the home team winning, given their opposition.
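As a sketch, the log5 formula translates directly into a small function. The function name and the sample win percentages are illustrative only; A and B are the season win percentages described above.

```python
def log5(a: float, b: float) -> float:
    """Log5 estimate of the chance that a team with win percentage `a`
    (here, the home team) beats a team with win percentage `b`."""
    return (a - a * b) / (a + b - 2 * a * b)

print(round(log5(0.600, 0.500), 3))  # 0.6
print(round(log5(0.600, 0.400), 3))  # 0.692
print(round(log5(0.500, 0.500), 3))  # 0.5
```

Two evenly matched teams come out at .500, which is why an expected win percentage near .500 for every group signals that the groups are not dominated by stronger or weaker home teams.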

[A note: I used simple win percentage rather than regressed win percentage, one of the fancier Pythagorean methods, or both. Why? Because the problems with using a team’s observed win percentage (chiefly random noise) shouldn’t be an issue given the number of team-seasons we’re examining. And my preference is to look at real events, not estimated events, in large group studies.]

The difference between a team’s expected win percentage and their actual win percentage should be a measure of their home field advantage. I broke games into three groups: “Above,” or A, where attendance was more than 1000 fans above average; “Below,” or B, where attendance was more than 1000 fans below average; and “Control,” or C, for all other games.
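The grouping rule can be sketched as below, assuming the input is a game’s attendance minus its season average. The function name is hypothetical; the `threshold` parameter is there so the same rule can be rerun with the wider 15,000-fan cutoff used later.

```python
def attendance_group(differential: float, threshold: float = 1000) -> str:
    """Assign a game to "A" (above), "B" (below), or "C" (control).

    `differential` is the game's attendance minus the season average.
    Games must be *more than* `threshold` fans away to leave the control group.
    """
    if differential > threshold:
        return "A"
    if differential < -threshold:
        return "B"
    return "C"

print(attendance_group(1500))                      # A
print(attendance_group(1000))                      # C
print(attendance_group(-20000, threshold=15000))   # B
```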

	G	ATTEND	P_M	PCT	ExpPCT	Adv	Adv81
A	32979	37511	10981	0.555	0.499	0.056	5
B	36992	16109	-9789	0.524	0.501	0.023	2
C	4104	25975	-4	0.544	0.500	0.044	4

So what does this all mean?

G is games in sample.
ATTEND is the average attendance.
P_M is the average difference between attendance and the average in that season.
PCT refers to the actual win percentage of the home team in those games.
ExpPCT refers to the expected win percentage, based on the log5 method.
Adv is the home field advantage, expressed in terms of win percentage.
Adv81 is home field advantage rated out to 81 games (the home half of a 162-game season).
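A minimal sketch of how the win-related columns could be computed for one group. It assumes a hypothetical per-game record holding whether the home team won and its log5 expectation, and it assumes ExpPCT is the mean of the per-game log5 values; neither detail is spelled out above.

```python
from dataclasses import dataclass

@dataclass
class Game:
    home_won: bool        # did the home team win this game?
    expected_pct: float   # log5 expectation for the home team (hypothetical field)

def summarize(games):
    """Return (G, PCT, ExpPCT, Adv, Adv81) for one attendance group."""
    g = len(games)
    pct = sum(game.home_won for game in games) / g          # actual home win pct
    exp_pct = sum(game.expected_pct for game in games) / g  # mean log5 expectation
    adv = pct - exp_pct                                     # home field advantage
    return g, pct, exp_pct, adv, adv * 81                   # table rounds Adv81 to whole games

sample = [Game(True, 0.5), Game(True, 0.5), Game(False, 0.5), Game(True, 0.5)]
print(summarize(sample))  # (4, 0.75, 0.5, 0.25, 20.25)
```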

In spite of my concerns about bias, a team’s expected win percentage doesn’t seem to be an issue: in a sample this large it barely varies with attendance. (That .002 difference between the A and B groups bothers me, but I don’t think it means anything.)

Our control group, C, has a home field advantage of .044, or about 4 games, which is a pretty typical result for studies of home field advantage. Our A group outpaced that result, winning on average about one more game a season. Our B group underperformed that result, winning on average about two fewer games a season.
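The “one more game” and “two fewer games” figures come from simple arithmetic on the Adv column of the first table. The sketch below uses those rounded table values; the function name is illustrative.

```python
def games_vs_control(adv_group: float, adv_control: float, games: int = 81) -> float:
    """Extra home wins over 81 home games relative to the control group."""
    return (adv_group - adv_control) * games

# Rounded Adv values from the first table: A = .056, B = .023, C = .044.
print(round(games_vs_control(0.056, 0.044), 1))  # 1.0
print(round(games_vs_control(0.023, 0.044), 1))  # -1.7
```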

The results certainly seem to suggest that the home crowd is a factor in giving a team home field advantage. We have roughly 33,000 games in the A group and 37,000 in the B group (only about 4,000 in the C group), certainly a robust-seeming sample.

Some caveats:

As I said earlier, we’re using a crude proxy for the enthusiasm of the home crowd, its size. It’s possible that there’s some other reason that teams with higher attendance have a bigger home-field advantage.
This is a very small effect—again, something on the order of one to two games per season. It’s interesting, and it seems to confirm some bit of conventional wisdom, but there doesn’t seem to be much of a practical benefit to knowing this.

So what happens when we increase our differential from 1,000 to 15,000 fans?

	G	ATTEND	P_M	PCT	ExpPCT	Adv	Adv81
A	8662	46697	20835	0.557	0.498	0.060	5
B	7038	9399	-17938	0.504	0.503	0.000	0
C	58375	25164	-929	0.541	0.500	0.040	3

Curiouser and curiouser: our super-high-attendance teams don’t seem to win any more games than our high-attendance teams did, while our super-low-attendance teams show no home field advantage at all. Is this simply an effect caused by the low attendance of teams playing out the string? Let’s look at the initial results again, this time excluding games played after August:

	G	ATTEND	P_M	PCT	ExpPCT	Adv	Adv81
A	28093	37482	10980	0.552	0.499	0.053	4
B	29415	16553	-9490	0.526	0.501	0.025	2
C	3435	25906	-3	0.545	0.500	0.045	4
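A sketch of the after-August filter, assuming a hypothetical game record whose first field is the game date and assuming the cutoff is applied by calendar month (games on or after September 1 are dropped):

```python
from datetime import date

# Hypothetical game record: (game_date, attendance). Only the date is used here.
def through_august(games):
    """Keep only games played before September 1 of their season."""
    return [game for game in games if game[0].month <= 8]

games = [
    (date(1985, 4, 9), 41000),
    (date(1985, 8, 31), 22000),
    (date(1985, 9, 1), 9000),
    (date(1985, 10, 5), 12000),
]
print(len(through_august(games)))  # 2
```

The filtered games can then go through the same grouping and summary steps as the full sample.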

Excluding those late-season games seems to smooth out our results a bit. There is still an advantage to playing to a packed house, but not enough to make a difference over a full season’s slate of games. There is still a sizable disadvantage to playing to an empty house, but the question is: Does the lack of fans make the team play worse, or are the fans just staying away because they know something that we don’t? I hate to end on a mystery, but I don’t have a way to answer that question.

References & Resources
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Tom Meager looks at home field advantage by component, as does KJOK.

Phil Birnbaum looks at the effect of travel on home field advantage.
