# The Hardball Times

## The root (part 2)

by Colin Wyers
March 05, 2009

Last week, I discussed two methods of evaluating a player's monetary worth: marginal revenue product and free market returns. (Unless you're already familiar with the basic concepts, I'd suggest reading that article first.) This week, we put these theories to the test.

Before we look at how much a player is worth in coin of the realm, we should first take a look at it from the team perspective. Revenue, like run scoring or run prevention, is a team activity, and our models should first be validated at the team level.

### A cautionary note

Many of the results in this article come from a technique called multiple linear regression, which is used to determine how multiple variables affect one dependent variable.

So here are a few caveats regarding the use of regression analysis in this article:

• A regression analysis is only as good as the input variables used. If you feed your regression bad inputs, you will get bad outputs. This is true even if you get very high confidence levels from your statistical tests.
• The choice of inputs reflects the thoughts and opinions of whoever is running the regression. A regression cannot prove the starting assumptions underlying it, only provide a model using those assumptions.
• Even with good inputs and good assumptions, you can sometimes get results from a regression that are difficult to interpret, due to problems such as collinearity (two inputs that are closely related to each other).
• I do not have a strong academic background in statistics or econometrics. I have tried my best to create adequate models based upon the data at hand, but have no doubt that others could do better with the same data.

Because of these limitations, I am trying as much as possible to check these results against real-world data and see whether they adequately explain what is actually happening with team finances and player salaries.

### Estimating team revenues

The lynchpin of this, and of almost any other MRP-based analysis of team revenue, is the set of team financial estimates published by Forbes. The key word here is estimates. How Forbes comes up with these estimates is nearly as closely guarded as actual MLB team revenues.

In short, we can't be sure we can trust the Forbes estimates. This is a problem, because if those estimates are wrong, you can essentially lock these results away and throw away the key. In the absence of better data, though, this is what we're left with.

I limited the study to the years 2002-2007, because 2002 was the last year in which Forbes significantly changed the way it presented its revenue figures. Forbes breaks revenues into two categories: attendance and "other." Other revenue encompasses several kinds of income, including broadcasting revenue, league-wide shared revenues from things like merchandising, and playoff revenues.

### Market size

Since there are two main kinds of local revenue (attendance and broadcasting), there are two ways to measure market size: the population of the local metropolitan area, or the size of the television market.

Does it really make a difference? In some cases. For cities like New York, Chicago and Los Angeles, it doesn't matter either way. But looking at metro area population would suggest that Miami is the sixth-largest MLB market; looking at broadcast market drops Miami down to 16th, a more sensible result. Looking at TV markets also allows Boston to leapfrog cities like Miami, Washington, D.C., Houston and Detroit in the rankings.

It should also be noted that there are different units of measure. Metropolitan areas are measured in terms of population, market sizes in terms of the number of households. A household is roughly two and a half people (and a third of a dog or cat).

One thing I didn't account for is split markets; there are four markets that have two teams: New York, Chicago, Los Angeles and San Francisco/Oakland. For the purposes of our regression, each team in the market was treated the same. This is not right, per se, but for a broad analysis like this it should be fine. If I were doing a team-by-team analysis of payroll to wins, I'd want to come up with a better model for split markets.

The other issue with these market size estimates is Boston; Boston's actual size—metropolitan or market—simply isn't that large; its ability to draw from all over New England is what gives it its large revenue base.

### Baselines

As I mentioned last week, there is a limit to the amount of playing time a team has to allocate among its players. Because of this, there's a key fact to consider—and if you remember nothing else from this article, please remember this:

Baseball teams cannot purchase additional labor; they can only substitute the output of one laborer for another.

Thus, the marginal revenue product of a free agent to a team is the difference in revenue between him and what he is substituting for. Broadly speaking, free agent labor is a substitute for controlled, or pre-arbitration, labor. This controlled labor is not free, as teams invest a lot of money into scouting and player development. They set up academies in other countries, scout thousands of high schools, colleges and independent minor league teams, pay millions in draft bonuses and subsidize over a dozen minor leagues and two fall leagues.

In other words, at the team level, teams will spend free agent dollars to increase revenue. But a team with \$0 free agent dollars spent will not have \$0 revenue or 0 wins. Because of its investment in minor league players (and, as a result, its ability to sign minor-league free agents and other players available at the league minimum), a team that spends \$0 on free agency will still win games and make money. The question is, how much?

As discussed in my series on player value, I am fond of replacement level as a baseline for measuring player performance. And, wouldn't you know it, the typical definition of replacement player seems to coincide pretty well with our \$0 free agent team above—a team composed mostly of Quadruple-A players from the minor leagues. So it would make sense for a team's free agent spending to coincide with their wins above replacement level.

As I have mentioned before, replacement level is difficult to pin down precisely. For the purposes of this regression, I defined a replacement level team as one with 48 wins, or a .296 win percentage. This should line up pretty well with the WAR values on sites like Fangraphs or Baseball Projection or even Baseball Prospectus' updated WARP values, when those become available.
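The baseline arithmetic, and the substitution idea from the start of this section, can be sketched in a few lines. This is a minimal illustration, not the model behind the results below: the 162-game season is the only figure not stated above, and the linear dollars-per-win revenue model and the example numbers in the MRP call are hypothetical.

```python
GAMES_PER_SEASON = 162
REPLACEMENT_WINS = 48  # the replacement-level team defined above


def replacement_win_pct(wins=REPLACEMENT_WINS, games=GAMES_PER_SEASON):
    """Winning percentage of the replacement-level baseline team."""
    return wins / games


def wins_above_replacement(team_wins, replacement_wins=REPLACEMENT_WINS):
    """Team wins above the replacement-level baseline."""
    return team_wins - replacement_wins


def marginal_revenue_product(team_wins_with_player, team_wins_with_substitute, dollars_per_win):
    """Revenue gained by playing one player instead of the player he displaces.

    Hypothetical linear model: every win is worth dollars_per_win, so any
    revenue that does not depend on wins cancels out of the difference.
    """
    return dollars_per_win * (team_wins_with_player - team_wins_with_substitute)


print(f"{replacement_win_pct():.3f}")  # 0.296
print(wins_above_replacement(81))      # 33
# Hypothetical: a free agent lifts a team from 80 to 84 wins at $3 million per win.
print(f"${marginal_revenue_product(84, 80, 3_000_000):,}")  # $12,000,000
```

Because the player is measured against the man he displaces rather than against an empty roster spot, only the change in wins matters; under this definition a .500 team sits about 33 wins above replacement.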

### Results

So here's what I did. I took all the data above and created a model for the value of a win above replacement to the average team. I then broke those values down by year, assuming 10 percent revenue growth per season. Then, I compared those results to the values provided by Dave Cameron of Fangraphs, who looked at the prices paid by teams for free agents compared to their projected Wins Above Replacement.

| Year | Attendance | Other | Total | Fangraphs | Difference |
|------|-----------:|------:|------:|----------:|-----------:|
| 2002 | \$743,906 | \$1,544,257 | \$2,288,164 | \$2,600,000 | \$311,836 |
| 2003 | \$826,563 | \$1,715,841 | \$2,542,404 | \$2,800,000 | \$257,596 |
| 2004 | \$918,403 | \$1,906,490 | \$2,824,893 | \$3,100,000 | \$275,107 |
| 2005 | \$1,020,448 | \$2,118,323 | \$3,138,770 | \$3,400,000 | \$261,230 |
| 2006 | \$1,122,493 | \$2,330,155 | \$3,452,647 | \$3,700,000 | \$247,353 |
| 2007 | \$1,234,742 | \$2,563,170 | \$3,797,912 | \$4,100,000 | \$302,088 |
| 2008 | \$1,358,216 | \$2,819,487 | \$4,177,703 | \$4,500,000 | \$322,297 |

In other words, the gap between the Fangraphs values and our MRP model's estimates was, on average, \$282,501. That's a difference I can live with.
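The average gap can be checked directly from the table. The sketch below recomputes each year's difference (Fangraphs minus the model's total) and takes the mean; it also shows that the Fangraphs figure is the higher of the two in every season, so the model sits consistently below the market price rather than scattering around it.

```python
# (year, model total $ per win, Fangraphs $ per win), copied from the table above
ROWS = [
    (2002, 2_288_164, 2_600_000),
    (2003, 2_542_404, 2_800_000),
    (2004, 2_824_893, 3_100_000),
    (2005, 3_138_770, 3_400_000),
    (2006, 3_452_647, 3_700_000),
    (2007, 3_797_912, 4_100_000),
    (2008, 4_177_703, 4_500_000),
]

# Difference per year: Fangraphs value minus the MRP model's total
gaps = {year: fangraphs - model for year, model, fangraphs in ROWS}
average_gap = sum(gaps.values()) / len(gaps)

print(f"${average_gap:,.0f}")                 # $282,501
print(all(gap > 0 for gap in gaps.values()))  # True
```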

I have to say, I'm very pleased with these results. I was actually surprised at how closely the values matched, given the number of things I had to estimate (like market size) and the uncertainties in the underlying revenue data. I don't want to act as though this is the final word on payroll evaluation; we have a ways to go before we're "there," and I don't know if we'll ever get there without more detailed disclosures of team finances. But it's a start.

### Coming Up

Next week, we'll look at two big questions in evaluating salary: Do larger-market teams place a higher value on a win? And do superstar players command more money per win than average or worse players? We'll also look at how well our model explains the 2009 free agent market.

### References and Resources
All regressions were done using the Ordinary Least Squares function in gretl, a free (and open-source) econometrics software package.

For those curious, here are the regression models alluded to above. For attendance revenues:

Gate_Receipts=NEW_PARK*12.4446+Pop*3.06788+W*0.971855

And other revenues:

Other_Revenue=RSN_DUMMY*-14.2312+Households*8.90026+W*2.01745

The adjusted R-squared values for the two regressions are 0.83212 and 0.88171, respectively. The average population of an MLB market is 5.74 million people; the average television market size is 2.53 million households. The average team makes \$53.73 million in gate receipts and \$97 million in other revenues. NEW_PARK and RSN_DUMMY are both dummy variables, set to 1 if true and 0 if false. I considered a park to be new for the first seven years of its existence. RSN_DUMMY indicates whether or not a team owns its own regional sports network.
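Here is a minimal sketch of the two models as written, assuming revenues are in millions of dollars and Pop and Households are in millions (which matches the averages quoted above). The text does not spell out whether W is raw wins or wins above the 48-win replacement level, so the sketch only looks at a one-unit change in W, which comes out the same either way. The function names are hypothetical.

```python
# Coefficients from the two no-constant OLS models above.
# Assumption: revenue in $ millions; pop and households in millions.


def gate_receipts(new_park: bool, pop_millions: float, w: float) -> float:
    """Attendance model: NEW_PARK*12.4446 + Pop*3.06788 + W*0.971855."""
    return 12.4446 * int(new_park) + 3.06788 * pop_millions + 0.971855 * w


def other_revenue(owns_rsn: bool, households_millions: float, w: float) -> float:
    """Other-revenue model: RSN_DUMMY*-14.2312 + Households*8.90026 + W*2.01745."""
    return -14.2312 * int(owns_rsn) + 8.90026 * households_millions + 2.01745 * w


def total_revenue(new_park, owns_rsn, pop_millions, households_millions, w):
    """Sum of the two models for one team-season."""
    return gate_receipts(new_park, pop_millions, w) + other_revenue(owns_rsn, households_millions, w)


# Average-size market (5.74 million people, 2.53 million households).
AVG_POP, AVG_HOUSEHOLDS = 5.74, 2.53
w = 30  # arbitrary illustrative value; the per-win difference does not depend on it
per_win = (total_revenue(False, False, AVG_POP, AVG_HOUSEHOLDS, w + 1)
           - total_revenue(False, False, AVG_POP, AVG_HOUSEHOLDS, w))
print(f"{per_win:.3f}")  # 2.989 ($ millions per additional win)
```

Because both models are linear, the market-size and dummy terms cancel and one more win is worth the sum of the two W coefficients, roughly \$2.99 million, which falls between the 2004 and 2005 totals in the results table.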

Those versed in regression analysis will note that I have not used a constant in either regression. This is because the results produced using a constant were simply not useful: the dollar-per-win value for total revenue was less than a million dollars, and the model suggested that a zero-win team would bring in more than \$100 million in revenue. Neither of those results seems probable.
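The effect of dropping the constant shows up even on a toy example. The data below are made up, constructed so that revenue is exactly 100 plus 1 per win; they are not the Forbes figures, and this is not a reproduction of the gretl regressions. With a constant, least squares recovers the small per-win slope and the large intercept; forced through the origin, the fit has to credit all revenue to wins, and the slope rises sharply.

```python
def fit_with_constant(x, y):
    """Simple OLS with an intercept; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_through_origin(x, y):
    """OLS with no constant (regression through the origin); returns the slope."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


# Hypothetical data: wins vs. revenue in $ millions, built as revenue = 100 + 1 * wins.
wins = [20, 30, 40]
revenue = [120, 130, 140]

intercept, slope = fit_with_constant(wins, revenue)
print(intercept, slope)                             # 100.0 1.0
print(round(fit_through_origin(wins, revenue), 3))  # 4.103
```

Both are legitimate least-squares fits; which one is appropriate is a modeling judgment of the kind described in the paragraph above, not something the fit itself can settle.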

The regression models seem to suggest that RSN ownership decreases, not increases, revenues. There are two potential explanations for this. One is that the regression is simply wrong; the p-value suggests that this finding is not significant. But it is also possible that RSN ownership actually decreases on-book revenue: teams that own their own broadcast outlet can sell their broadcast rights to it at a below-market price, reducing reported team revenues. The revenue lost through this accounting trick then goes to the RSN, which is not subject to the same revenue sharing as the team's broadcasting revenue. If Forbes is reporting revenues in this fashion, then overall team revenues are probably understated compared to their actual value to their owners.

US population figures were taken from the 2007 US Census estimates, and Canadian population figures from the 2006 Canadian Census.

US media market sizes were obtained from Nielsen figures. Canadian figures are courtesy of the Television Bureau of Canada. To convert the Canadian population figures to households, I divided by 2.7, a ratio I arrived at by comparing the US figures provided by TVB to the Nielsen figures. RSN ownership was taken from a Wikipedia entry.
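The Canadian conversion is a single division. A minimal sketch, with a hypothetical function name and a made-up metro population used only to show the arithmetic:

```python
# Divisor from the text: Canadian population / 2.7 gives approximate TV households.
PEOPLE_PER_HOUSEHOLD = 2.7


def population_to_households(population: float, people_per_household: float = PEOPLE_PER_HOUSEHOLD) -> float:
    """Convert a metro population into an approximate household count."""
    return population / people_per_household


# Hypothetical metro of 5.4 million people.
print(f"{population_to_households(5_400_000):,.0f}")  # 2,000,000
```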

Two notable studies on market size could be used to further refine these estimates: Mike Jones' study and Nate Silver's study.

The Forbes estimates of team revenues and expenses were collected by Rodney Fort. Our own John Beamer has discussed the uncertainty around the Forbes data before.

Beamer also discusses some of the pitfalls of regression in a recent article; stay tuned for part two next week. Here's an interesting critique of the use of regression analysis in public policy.

Thanks to Paul D., Ron Johnson and Vince Gennaro for their input.

Colin Wyers knows exactly how much of a nerd he is. He is very interested in hearing about any other concerns you may have; you can reach him by e-mail, and he will try his best to respond in a timely fashion. He also blogs at Statistically Speaking.