Category influence

by Michael Lerra
December 11, 2008

In a typical head-to-head fantasy league, you compete against a single opponent in 10 categories each week. Since each category counts the same in the standings, intuition suggests that you should draft talent to compete in all 10 categories equally.

However, not all categories give you the same control over the outcome. How many times have you seen a team with Matt Holliday, Joe Mauer, and Albert Pujols (nice draft!) lose a week of batting average to a team with Adam Dunn, Jim Thome, and Mike Napoli?

After playing in three Yahoo Head-to-Head leagues, I wanted to answer the following question: Are there certain categories in which I can control my performance to a greater extent than others? In other words … which categories tend to allow the truly best teams to win? And which categories have results that are largely a product of luck?

Across three 12-team head-to-head leagues, I summarized each team’s statistics in each category, along with its won-loss record in each category. I was tempted to simply rank the teams as a measure of their performance. However, it quickly becomes apparent that the difference between first and second place is sometimes 45 runs, while the difference between fifth and sixth is a mere three runs. To account for this, I calculated the mean and standard deviation of each category and then assigned each team a z-score in each category based on its performance.
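A minimal Python sketch of the z-score step follows. The function name `z_scores`, the use of the population standard deviation (`statistics.pstdev`), and the sign flip for lower-is-better categories such as ERA and WHIP are assumptions for illustration; the article does not specify them. The run totals are hypothetical, chosen to mirror the 45-run and three-run gaps described above.

```python
from statistics import mean, pstdev


def z_scores(totals, lower_is_better=False):
    """Convert one category's totals (one value per team) into z-scores.

    Assumption: uses the population standard deviation across the league's
    teams; the article does not say which variant was used.
    """
    mu = mean(totals)
    sigma = pstdev(totals)
    scores = [(x - mu) / sigma for x in totals]
    # Assumption: for ERA and WHIP a lower total is better, so flip the sign
    # so that a positive z-score always means a stronger team in the category.
    return [-z for z in scores] if lower_is_better else scores


# Hypothetical run totals for four teams (not data from the article):
# 45 runs separate first and second, but only 3 separate second and third.
runs = [850, 805, 802, 799]
print([round(z, 2) for z in z_scores(runs)])
```

Plain ranks would space these four teams evenly, while the z-scores put the first-place team well clear of a tightly bunched pack.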

The analysis was simple: within each category, correlate each team’s z-score with its W-L record in that category. The results are below:
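The correlation step might look like the sketch below, which computes a Pearson correlation by hand. Treating the W-L record as a winning percentage, counting ties as half a win, and the helper names `pearson_r` and `win_pct` are all assumptions; the article does not say how records or ties were converted, or whether z-scores were computed within each league or across all 36 teams. The four-team data is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def win_pct(wins, losses, ties=0):
    """Category record as a winning percentage (hypothetical helper).

    Assumption: a tie counts as half a win.
    """
    return (wins + 0.5 * ties) / (wins + losses + ties)


# Hypothetical data for one category: each team's z-score and its
# won-loss record in that category over the season.
z = [1.5, 0.5, -0.5, -1.5]
records = [(18, 4), (12, 10), (11, 11), (5, 17)]
pct = [win_pct(w, l) for w, l in records]
print(f"r = {pearson_r(z, pct):.3f}")
```

On Python 3.10 or later, `statistics.correlation(z, pct)` returns the same Pearson coefficient without the hand-written helper.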

Category	Correlation (r)
SB	0.93
SV	0.93
K	0.91
HR	0.87
W	0.86
R	0.83
WHIP	0.76
RBI	0.75
AVG	0.68
ERA	0.66

Above is the correlation coefficient, “r,” for each statistic, which measures how closely a team’s performance in a category and its won-loss record in that category are related. From taking statistics courses in college, I remember that squaring “r” gives the share of the variance in one variable that is explained by the other. For strikeouts, squaring .91 gives (.91 x .91) = .8281. So about 83 percent of the variance in a team’s won-loss record in strikeouts is explained by its actual strikeout performance. What explains the other 17 percent? A combination of luck and perhaps some managerial skill. But mostly luck.
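The r-squared arithmetic can be applied to every row of the table. This sketch uses the correlations exactly as reported; the `variance_split` helper is a hypothetical name, and the "other" share lumps together luck, managerial decisions, and anything else the team totals do not capture.

```python
# Correlations (r) exactly as reported in the table above.
correlations = {
    "SB": 0.93, "SV": 0.93, "K": 0.91, "HR": 0.87, "W": 0.86,
    "R": 0.83, "WHIP": 0.76, "RBI": 0.75, "AVG": 0.68, "ERA": 0.66,
}


def variance_split(r):
    """Return (explained, unexplained) shares of variance for correlation r."""
    explained = r ** 2
    return explained, 1 - explained


for category, r in correlations.items():
    explained, unexplained = variance_split(r)
    # "other" lumps together luck, managerial decisions, and anything
    # else that team totals do not capture.
    print(f"{category:<5} r={r:.2f}  r^2={explained:.0%}  other={unexplained:.0%}")
```

Running it reproduces the figures in the text: about 83 percent explained for strikeouts, and about 44 percent explained for ERA with the remaining 56 percent left to other factors.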

Stolen bases, saves, and strikeouts top the list. I’d expect strikeouts to be high: readers of this site know that a pitcher’s strikeout rate is fairly stable over time. The small amount of unexplained variance here could be due, in part, to the randomness of two-start weeks.

Home runs being fourth on the list surprises me. In general, I’d expect categories with low weekly totals to be more subject to random variation. But, at least across these three leagues, teams seemed to accumulate home runs at a steady enough rate for their won-loss records in that category to reflect their team talent.

Let’s skip to the bottom of the chart: ERA and batting average. In this sample, these are the two categories most influenced by luck, and this is the heart of what I was trying to learn by examining these numbers. When you are evaluating preseason projections and building your team during a draft, these numbers suggest that you place less emphasis on a player’s projected average or ERA. While there is a positive correlation between having a good team ERA and having a good W-L record in that category, (.66 x .66) = .4356. So only about 44 percent of the variance in a team’s ERA record is explained by its actual overall ERA, and a whopping 56 percent comes from other factors, largely luck.

There are a few shortcomings in this analysis. Saves and stolen bases are the two “one-of-these-things-is-not-like-the-other” stats. Every season, in every league, there are likely to be a couple of managers who give up on one or both of these categories in order to bolster their other stats. Consequently, several teams end up with incredibly low save or stolen base totals and rightfully lose those categories every week. This artificially inflates those correlations, so they are less useful for drawing conclusions about typical teams with a typical number of closers or base stealers.

In addition, managerial decisions can reduce correlations. If, going into the weekend, I am leading in strikeouts, wins, ERA, and WHIP, and I know my opponent has no starting pitchers going, I may opt to bench my starters to protect my lead in ERA and WHIP. Doing so will leave my win and strikeout totals lower than they could have been, but I’ll still win both categories for the week. That artificially reduces the correlation between the won-loss record and the totals in each of those categories.

The bottom line is that if you are trying to build a fantasy team that competes across all 10 categories, it pays to trade some talent in average and ERA for talent in home runs, strikeouts, and other stats near the top of the chart. You’re more likely to get reliable returns in the standings with those stats … and when roughly 56 percent of the variance in ERA won-loss records comes from something other than team ERA, mostly luck, you’re never really out of contention in that category.
