Opportunity has been a hot topic lately in economics, and it is important in major league baseball as well. The rest of the world has been able to escape poverty at rates unobserved in human history, and foreigners have been able to play in the majors at record rates.
However, in terms of both economic mobility and baseball, it has been difficult for many native-born Americans to get ahead. In this article, I will discuss some of the changes that have transpired regarding where baseball players are born along with the economic implications.
Decline in the Share of Black Players
The most notable topic related to mobility that has appeared in baseball research is the concerning decline of African-Americans in baseball over the years. Mark Armour and Dan Levitt recently wrote about the rise and decline of the share of African-Americans in major league baseball from Jackie Robinson’s emergence on the scene until 2012. They show the percentage of baseball players who are African-Americans peaking at 18.7 percent in 1981, only to fall all the way to 7.2 percent in 2012.
Armour and Levitt highlight that some of this trend comes from the fact that African-Americans are disproportionately position players (especially outfielders), and the growing share of pitchers on major league rosters has contributed to this decline. However, this shift does not come close to explaining a decline of more than half, and neither does the growing share of Latinos and Asians.
As their data show, even though the share of Latinos and Asians has grown by 17.8 percentage points (from 11.0 percent to 28.8 percent) during this time, the share of whites has fallen only from 70.1 percent to 63.9 percent (a 6.2 percentage point decline), while the share of African-Americans has fallen by 11.5 percentage points.
J.C. Bradbury made some important contributions to the discussion on this topic, as well. He also noted the decline of African-Americans in the data and the limited explanatory power of the increased share of Latinos. Bradbury also firmly debunked the myth that African-Americans are simply choosing football and basketball instead, showing that the percentage of African-Americans in the NFL and NBA has remained roughly constant even as their share has been drastically reduced in major league baseball.
Bradbury also showed that although average wealth of African-Americans is behind that of whites in the United States, the gradual upward trend is similar for both races and has not diverged; so wealth disparity would not appear to explain the decline in the share of blacks in the majors. However, there is one wrinkle that Bradbury did not discuss that I will explain in this article: what if income matters more than it used to and the gap between blacks and whites is not larger but instead is more relevant?
Economic Mobility in the United States
I will not study the share of African-Americans directly in this article, as I am still working on the data. However, this piece will discuss the importance of opportunity in making the major leagues and how it has evolved over time. In one of the most important studies on economic mobility, Raj Chetty, Nathaniel Hendren, Emmanuel Saez, and Nicholas Turner looked at economic mobility by area across the country and found some important trends. They provide a tool for studying regional differences in economic mobility, which David Leonhardt wrote about in the New York Times in July, along with this map:
Areas in red have the worst income mobility, and areas in blue are those with the highest income mobility. What is extremely striking is the massive splotch of red you see in the South. This will relate to some of my findings later in this article, but it is important to note that this finding is not about race, per se.
As Leonhardt explains, “Regions with larger black populations had lower upward-mobility rates. But the researchers’ analysis suggested that this was not primarily because of their race. Both white and black residents of Atlanta have low upward mobility, for instance.” In other words, while low economic mobility in the United States is disproportionately affecting blacks, it is not just because of their race but because of crucial disadvantages they have more frequently than whites.
In fact, the researchers found a very distinct trend in the data that has been discussed extensively. This is that, “All else being equal, upward mobility tended to be higher in metropolitan areas where poor families were more dispersed among mixed-income neighborhoods.”
In other words, what makes mobility so low in the South and other regions is the geographic segregation of rich and poor, making it harder for the poor to have access to the same opportunities that the rich have. This is central to my argument about what has happened to African-Americans in the majors. As I will show, the most likely culprit is change in the opportunity to succeed.
Whether the Weather?
Another characteristic of the South other than urban sprawl is that (Polar Vortex notwithstanding) it is warm. Even in the winter, the South often has weather that enables people to play baseball. That matters because training for baseball has increasingly become a year-round activity; players in the South (and the Southwest) have been able to hone their skills throughout the winter in a way that players in colder climates have not.
Everyone knows this has been one of the reasons that baseball has thrived in Latin America, but I will show that this is also of increasing importance in the U.S. As the opportunity to improve throughout the calendar year presents itself, there is no reason to think that it would necessarily affect everyone equally. In fact, the data suggest that when the climate is warmer, local income matters more. This is especially true in recent years.
In today’s article, I will look at Wins Above Replacement (WAR) per birth at a regional, state, and county level for players born in the United States. My results suggest that warmer-weather states produce more ballplayers, that higher-income counties produce more players, and that both trends have strengthened over time. These results hold up under a variety of specifications, indicating that this is not a statistical quirk but rather a trend that may explain why African-Americans make up a smaller share of major league players today.
Baseball data came from Jeff Zimmerman, who helped me with a data file of career WAR along with date and city of birth for all players in major league history. I refined this to concentrate only on players born between 1940 and 1989, since this was most readily adapted to other data sources.
For weather data, I used average temperatures by state and month from 1971 to 2000 from noaa.gov. While this does not quite capture changes in temperatures over time, nor does it capture the variation in temperatures within each state (an obvious limit in particular for California), it at least gives a reasonable proxy for the opportunity to play baseball in the winter in a state. I will actually end up using only December, January and February average temperatures by state, which were far more relevant than average year-round temperatures.
To get county-level income, I used median household income data from the Census’ Small Area Income and Poverty Estimates from 2011. Obviously, income inequality exists within counties, too, so simply ascribing the median household income in a county does not quite explain an individual’s actual opportunity. Furthermore, this does not quite capture the incomes of these individual counties from 1989 or earlier, but it at least provides a reasonable proxy. As you will see, the results are statistically significant even with this noisy measure.
The next thing I needed was data on births per county, which I got from the National Cancer Institute’s Surveillance, Epidemiology, and End Results Program. This provided population by age group in every county from 1940 to 1989, and I used only data on the number of people who were under one year old in each county and summed across the decade. (I figured it was about time the National Cancer Institute’s data was used for something important like baseball!)
The last thing I needed was ZIP code data linking up counties to cities, since Zimmerman’s data file had each player’s city of birth, not the county. Fortunately, I was able to get this from the ZIP Code Database. This was good enough to match up the county with the player for 97 percent of players born after 1940, which should limit any issues with missing data.
WAR per Birth, by Region and State
Let’s start with a simple grouping by census region. The table below shows the ratio of the percentage of career WAR from U.S.-born players in a given decade from a given region, relative to the percentage of births in the U.S. in that region from the decade (which will be a central statistic that I use throughout this article). An important caveat should be noted. To avoid outliers driving the analysis, I capped all individual WAR at 20.0 (i.e. I treated a player with 120.1 WAR and a player with 20.1 WAR as both having 20.0 career WAR).
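For readers who want to reproduce the WAR/birth ratio described above, here is a minimal sketch in Python. It assumes a simple list of (region, career WAR) pairs and a dictionary of births per region; the function name, the data layout, and the sample numbers are all illustrative and are not taken from the article's data files.

```python
from collections import defaultdict

WAR_CAP = 20.0  # the per-player cap described in the article


def war_birth_ratios(players, births):
    """Return {region: share of capped WAR / share of births}.

    players: iterable of (region, career_war) pairs (hypothetical layout).
    births:  dict mapping region -> births in the decade (hypothetical layout).
    """
    war_by_region = defaultdict(float)
    for region, career_war in players:
        # Cap each player's career WAR so a few superstars do not dominate.
        war_by_region[region] += min(career_war, WAR_CAP)

    total_war = sum(war_by_region.values())
    total_births = sum(births.values())
    return {
        region: (war_by_region[region] / total_war) / (n / total_births)
        for region, n in births.items()
    }


# Made-up numbers purely for illustration.
players = [("South", 120.1), ("South", 5.0), ("West", 20.1), ("Midwest", 10.0)]
births = {"South": 300, "West": 100, "Midwest": 300, "Northeast": 300}
ratios = war_birth_ratios(players, births)
for region, ratio in sorted(ratios.items()):
    print(f"{region}: {ratio:.2f}")
```

A ratio of 1.0 means a region produced exactly the share of WAR that its share of births would predict; values above 1.0 mean it produced more.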
|WAR/births by decade, region|
What immediately jumps out is that the Northeast and Midwest have always produced fewer players (as measured in WAR to account for differences between players) than would be expected based on their population sizes. For most regions, the proportion of players has declined, but the fraction of players born in the South relative to the actual birth rate in the South has actually grown, especially during the 1980s.
When considering the table above, it is even more amazing that the fraction of African-American players has decreased. Given that African-Americans make up nearly twice as large a share of births in the South as in any other region, there should be an increasing share of African-Americans if all else were equal.
By my estimates, if there were no unique factors that affected African-American and white players other than the geographic region in which they were born, the share of African-Americans should have grown from about 11 percent to about 16 percent during this time period. In reality, it declined from about 19 percent to about 7 percent. Even as the South was producing more players, the share of black players was declining.
Of course, using general Census regions has benefits and limits. The benefit is that the sample size is large enough in a given decade that numbers can be trusted as meaningful. For instance, we can clearly see that the West produces more players than its population would suggest. However, regions also cluster together states with differing player production rates.
So, for the sake of completeness, and accepting the small sample size limitations, here are five maps (one for each decade) that show the trend over time in WAR/Birth Ratio (again, WAR is capped at 20.0 per person) at a state level. The red states are the ones that produce the most baseball players, and the blue states are the ones that produce the fewest. Orange states produce the second-most WAR and purple the second-least, with green being average. Watch the red and orange move south over time and the blue and purple move north.
Remember that the sample sizes can be very small for individual states, which really can throw off the individual state numbers in a given decade, so focus on the general trend especially in larger states. And the trend is that by the 1980s, there is almost no overlap between red and orange areas (higher WAR/birth ratios) and blue and purple areas (lower WAR/birth ratios). Although this trend was present beforehand, it is striking how much the map evolves over time.
Warm Winters and WAR
States with warmer winters produce more WAR per birth than states with colder winters, and this trend has strengthened over time. Consider the following table of the 10 most populous states in the nation, sorted by the average temperature in December, January and February.
|WAR/births by decade in 10 most populous states|
|State||Avg. Temp, Dec-Feb||1940s||1950s||1960s||1970s||1980s|
What should jump out is that even though the warmer winter states had higher WAR/birth ratios even in the 1940s, there were still states like Ohio and Michigan with colder winters that produced more ballplayers per birth than average, and states like Georgia and North Carolina that produced fewer players than expected.
However, by the 1980s, all five of the largest states with high average temperatures produced more WAR than would be expected based on the number of births, and all five of the largest states with low average temperatures produced less WAR than would be expected based on the number of births. In fact, move Texas down three spots and Illinois up one spot and the mapping of temperature to WAR/birth in the 1980s is straight-up monotonically increasing (i.e. for any two states you would pick, the warmer state would have more WAR per birth).
County Income and the Production of Baseball Players
Income is important in the production of baseball players, too. As I highlighted at the beginning of this article, the primary goal here is to study the effect of income over time. The following table shows the correlation of county income with WAR/birth ratio of that county in each decade and overall.
|Correlation of birth county income with WAR/birth ratio, by decade|
|Time Period||Correlation of Median Household Income with WAR/Birth Ratio||p-stat|
As mentioned above, county income is not an exact measure, and the income data do not come from the same time periods as the births, which explains why the correlations are on the low side. Still, with a sample size of about 3,000 counties, the results are statistically significant for the 1950s and 1980s, as well as overall, and the correlation is highest in the 1980s. Furthermore, since most counties produce zero ballplayers in a given decade, there is a lot of noise in the estimates, even with WAR capped at 20.0 per player. However, once I factor in temperatures, it becomes clearer that income has become more important over time.
The following few tables will show regression analyses, in which I control for correlated factors to isolate the significance of each variable. The specific quantitative results from the coefficients may be difficult to describe without more calculus than I would care or need to get into, but the most important takeaway is which variables show up as statistically significant.
In the end, these regressions are glorified correlations that have the added benefit of controlling for several factors at once. This approach shows how important weather and income are, and how they have become more important over time.
Since the data are skewed, the dependent variable is not the WAR/birth ratio of a county, but the natural log of the WAR/birth ratio, which is less skewed. Since the natural log of zero is undefined, I use the natural log of (WAR per million births + 1). From here on, I’m just going to call this “the WAR-birth function.”
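As a rough illustration of the transformation just described, the sketch below computes the log of (WAR per million births + 1) for a single county in a single decade. The function name and arguments are hypothetical, and it assumes the county's summed, capped WAR is not negative; the article does not say how negative totals were handled.

```python
import math


def war_birth_function(capped_war, births):
    """ln(WAR per million births + 1) for one county-decade.

    capped_war: sum of players' career WAR, each capped at 20.0 (assumed >= 0).
    births:     number of births in the county during the decade (must be > 0).
    """
    war_per_million = capped_war / births * 1_000_000
    # Adding 1 keeps the log defined for counties that produced no WAR at all.
    return math.log(war_per_million + 1.0)


# A county with no major leaguers maps to exactly 0.
print(war_birth_function(0.0, 5_000))
# A county with 20.0 capped WAR from 50,000 births.
print(round(war_birth_function(20.0, 50_000), 3))
```

Because most counties produce no players in a given decade, most observations sit at zero under this transformation, which is one reason the estimates are noisy.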
First, here is the regression of the WAR-birth function on county income, with an adjustment for decade:
|Regression of the WAR-birth function (N=15,210)^|
|Variable||Coefficient||p-stat|
|Median County Income (in millions)||16.6||.000|
|Median County Income * Decade Number Indicator||2.27||.010|
|Decade Number Indicator#||-0.143||.000|
^: Sample sizes don’t match above because of missing temperature data on Alaska and Hawaii
#: Note that this variable is just set to 0 for 1940, 1 for 1950, etc. as a means of removing bias to get a true impact of the interaction term, much like the constant term
The fact that the first row’s variable (median county income) is statistically significant with a positive coefficient means that county income level has a positive effect on WAR per birth for the county. The second variable is the interaction of median county income with the decade number indicator (a counter, rather than a true dummy variable, that simply increases by one each decade).
The fact that this is statistically significant is very important. It shows that not only is income important, but that it has become more important over time. In fact, the effective coefficient on county income is 16.6 for the 1940s, but about 25.7 for the 1980s (16.6 + 2.27*4). In other words, a county’s income level is more than 50 percent more important than it used to be in determining how many baseball players a county could produce.
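The arithmetic behind that comparison can be checked with a few lines of Python. This is only a sketch built from the two coefficients reported in the table (16.6 and 2.27); the function and constant names are illustrative.

```python
INCOME_COEF = 16.6        # coefficient on median county income (in millions)
INCOME_X_DECADE = 2.27    # coefficient on income * decade number indicator


def income_effect(decade_number):
    """Effective income coefficient; decade_number is 0 for the 1940s, 4 for the 1980s."""
    return INCOME_COEF + INCOME_X_DECADE * decade_number


for label, d in [("1940s", 0), ("1950s", 1), ("1960s", 2), ("1970s", 3), ("1980s", 4)]:
    print(f"{label}: {income_effect(d):.2f}")

growth = income_effect(4) / income_effect(0)
print(f"1980s relative to 1940s: {growth:.2f}x")
```

The 1980s value works out to 25.68, or roughly 1.55 times the 1940s value, which is where the "more than 50 percent" figure comes from.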
Putting Together Green Weather and Green Money
The next regression shows the importance of weather. It shows that warmer winters have boosted the production of baseball players mainly in richer counties.
|Regression of the WAR-birth function (N=15,210)|
|Variable||Coefficient||p-stat|
|Winter Avg. Temp.||-.008||.000|
|Winter Avg. Temp. * Median County Income (in millions)||.361||.000|
Ignore the negative coefficient on winter average temperature outside of the interaction because that would only show the effect of winter average temperature on a county with a median income of zero. Instead, what this shows is that if a county has a median income of about $22,000, the effect of warm winters would be near 0, but it would be positive beyond this income level. In words, this table shows that warmer winters matter more in higher income counties.
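The break-even income can be recovered directly from the two coefficients in the table. The sketch below assumes, as the table labels indicate, that income enters the regression in millions of dollars; the function and constant names are illustrative.

```python
TEMP_COEF = -0.008       # coefficient on winter average temperature
TEMP_X_INCOME = 0.361    # coefficient on temperature * income (income in millions)


def winter_temp_effect(median_income_dollars):
    """Effect of one extra degree of winter temperature at a given county income."""
    income_millions = median_income_dollars / 1_000_000
    return TEMP_COEF + TEMP_X_INCOME * income_millions


# Income at which the temperature effect is exactly zero.
break_even = -TEMP_COEF / TEMP_X_INCOME * 1_000_000
print(f"Break-even median income: ${break_even:,.0f}")

for income in (15_000, 30_000, 60_000):
    print(f"${income:,}: {winter_temp_effect(income):+.4f}")
```

The break-even point comes out just above $22,000. Below it the estimated effect of a warmer winter is slightly negative, and above it the effect grows with income.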
The next regression will show that warmer winters matter more in recent years.
|Regression of the WAR-birth function (N=15,210)|
|Variable||Coefficient||p-stat|
|Winter Avg. Temp.||.0028||.002|
|Winter Avg. Temp. * Decade Number Indicator||.0012||.002|
|Decade Number Indicator||-.052||.000|
Here, the coefficient on winter average temperature is positive, and so is the interaction term. This means that warmer winters made a county more likely to produce players even in the 1940s, but that this was much more important in the 1980s.
In fact, it says that a one-degree increase in winter average temperature would increase the production of baseball players in an area by more than twice as much in the 1970s as the 1940s and nearly three times as much by the 1980s. (You can ignore the decade number indicator and constant terms again, since they are just statistical techniques to un-bias the regression coefficients that we care about.)
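The "twice" and "nearly three times" figures follow from the two temperature coefficients in the table (.0028 and .0012). Here is a short sketch of that calculation; the names are illustrative.

```python
TEMP_COEF = 0.0028       # coefficient on winter average temperature
TEMP_X_DECADE = 0.0012   # coefficient on temperature * decade number indicator


def temp_effect(decade_number):
    """Effect of one extra degree; decade_number is 0 for the 1940s, 4 for the 1980s."""
    return TEMP_COEF + TEMP_X_DECADE * decade_number


def relative_to_1940s(decade_number):
    """How many times larger the temperature effect is than in the 1940s."""
    return temp_effect(decade_number) / temp_effect(0)


for label, d in [("1940s", 0), ("1970s", 3), ("1980s", 4)]:
    print(f"{label}: {relative_to_1940s(d):.2f}x the 1940s effect")
```

The ratios are about 2.3 for the 1970s and about 2.7 for the 1980s.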
Lastly, let’s put it all together. Let’s include all three interaction terms: income with decade, weather with decade, and weather with income. All three are statistically significant and positive in the regression below.
|Regression of the WAR-birth function (N=15,210)|
|Variable||Coefficient||p-stat|
|Median County Income (in millions)||-3.34||.115|
|Median County Income*Decade Number Indicator||1.58||.000|
|Winter Avg. Temp.||-0.011||.000|
|Winter Avg. Temp. * Decade Number Indicator||0.002||.000|
|Winter Avg. Temp. * Median County Income (in millions)||0.366||.000|
|Decade Number Indicator||-0.134||.000|
The three variables to consider here are the interaction terms. We have established that income and weather are important, but the number of interactions means that the coefficients on the individual terms cannot be interpreted on their own without taking partial derivatives, and we can skip that today. The keys here are that:
- Higher income has become more important over time
- Warmer weather has become more important over time
- Warmer weather is more important in high-income counties (and vice versa)
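For readers curious about the partial derivatives skipped above, the sketch below evaluates them from the coefficients in the final table. It assumes the regression is linear in income, temperature, the decade number, and the three interaction terms, with income in millions of dollars; the function names and the example inputs (a $50,000 county, a 40-degree winter) are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the final regression table.
INCOME = -3.34
INCOME_X_DECADE = 1.58
TEMP = -0.011
TEMP_X_DECADE = 0.002
TEMP_X_INCOME = 0.366


def marginal_income_effect(decade_number, winter_temp):
    """Partial derivative of the WAR-birth function with respect to income (in millions)."""
    return INCOME + INCOME_X_DECADE * decade_number + TEMP_X_INCOME * winter_temp


def marginal_temp_effect(decade_number, median_income_dollars):
    """Partial derivative of the WAR-birth function with respect to winter temperature."""
    income_millions = median_income_dollars / 1_000_000
    return TEMP + TEMP_X_DECADE * decade_number + TEMP_X_INCOME * income_millions


# Hypothetical county: 40-degree winters, $50,000 median income.
for label, d in [("1940s", 0), ("1980s", 4)]:
    print(
        f"{label}: income effect {marginal_income_effect(d, 40.0):.2f}, "
        f"temperature effect {marginal_temp_effect(d, 50_000):.4f}"
    )
```

Both marginal effects rise with the decade number, and the temperature effect rises with income, which matches the three takeaways listed above.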
Conclusions and Future Studies
I began this article by discussing the question of the declining share of African-Americans in baseball. Although I do not have data at this point on players’ races, I do think that what I have shown about incomes at the county level can help shine a light on this and other questions.
Consider three things together: the high and growing rate of baseball players relative to births coming out of the South in recent decades, the relatively larger share of African-Americans in the South, and the urban sprawl and low opportunity in lower-income areas of the South mentioned at the beginning of this article. I believe that a trend has emerged.
As becoming a baseball player has turned into a year-round pursuit in the South, richer families are better able to afford all that this entails. As a result, with a finite number of roster spots, lower-income young players (including, on average, African-Americans) are losing ground to higher-income young players who train throughout the year.
There are implications for MLB’s initiatives to address this issue, like the RBI Initiative (Reviving Baseball in Inner Cities). The issue may not be simply exposing young people to the game of baseball, but rather giving them an opportunity to train like richer families can afford.
In other words, MLB should not just settle for exposing young African-Americans to the game of baseball, letting them know that just maybe making millions of dollars playing a game might be a good way to live your life, even if you’re better at baseball than football or basketball. Instead it should enable promising youth to become great players by providing coaching and year-round training that would not otherwise be available to them.
Of course, all of my analysis was done without actual data on race. As a next step, I hope to redo some of these studies looking at African-American players’ production at a state and county level and separating out whether there were differences that could illuminate what has transpired.
Because it takes a few decades before one can even study career WAR by players born in a given decade, the fact that we are finding such trends for players born 25-to-34 years ago means that we are already pretty late and that correcting differences in opportunity will take decades.