Does size matter? I’ve already put you through three installments asking just that question, and I promise you won’t have to wait any longer for a conclusion. In part one, we looked at how performance varies among different-sized players, and found that, unsurprisingly, bigger guys are better hitters. Specifically, we found that larger players hit for more power (home runs, doubles), while smaller players have small advantages in terms of singles and triples. We also found that tall players strike out a lot more, while walks seem to be most common at the extremes.
Well, that’s all great information to have, but the more important question is, how much does size affect our expectations for a player’s development? If big guys hit more home runs, well, that will be reflected in a large player’s stat line, but will it affect his projected home run total in the next year? Should we expect a big guy and a little guy who hit the same number of home runs in one year to have different projections in the next season, solely due to their size?
That is the question that we explored in parts two and three. What we found was that, yes, size had a very real impact on a player’s projection: All else being equal, we expect a player to hit one additional home run for every extra 9.5 pounds. That difference is pretty huge: Jose Reyes and Carl Crawford have similar home run totals this year, but their weight difference makes for a 7.5 home run advantage in Crawford’s 2007 projection. That’s one win, all because of size.
Furthermore, and somewhat surprisingly, we found that the impact of size on a player’s projection is pretty constant. While I expected size to make less of a difference as players got older and peaked, it in fact continued to have the same effect across the spectrum of ages. We also found that the effect we’re witnessing is indeed the effect of size, and not size standing in as a proxy for a player’s position, which in turn would be a proxy for his hitting ability. Size matters when it comes to power. What about everything else?
Today, we’ll look at how player size affects walks, strikeouts, singles, doubles, and triples, and form a conclusion on the run impact of size on a player’s projection. We’ll do it by using the same methodology as used in the previous two installments. To recap quickly, I looked at all post-World War 2 seasons in which a player had more than 200 plate appearances in consecutive years. I pro-rated every player’s statistics to a 150-game season (630 plate appearances for walks and strikeouts, 475 at-bats less strikeouts for the rest). Then I tried to project each category based on the player’s performance in that category the previous season and his size. The results follow.
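The setup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not spell out the exact model, so the sketch assumes a plain least-squares fit of next year's pro-rated total on the previous year's total and the player's size, with an intercept, run separately for each age pair. The function names and the toy numbers are hypothetical and are not the data behind the tables.

```python
def prorate(count, opportunities, basis):
    """Scale a raw count to a fixed number of opportunities (e.g. 630 PA)."""
    return count * basis / opportunities


def solve_linear_system(a, b):
    """Solve a small dense system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_projection(prev_rate, size, next_rate):
    """Least-squares fit of next = a + b * prev + c * size (assumed model).

    Returns [a, b, c]; c plays the role of the size coefficient.
    """
    rows = [[1.0, p, s] for p, s in zip(prev_rate, size)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, next_rate)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Pro-rating: 30 walks in 315 plate appearances, scaled to 630.
walks_per_630 = prorate(30, 315, 630)

# Toy, noiseless data (hypothetical) built from known coefficients,
# so the fit should recover a = 2.0, b = 0.6, c = 0.1.
toy_prev = [10.0, 20.0, 15.0, 25.0, 30.0]
toy_size = [170.0, 180.0, 200.0, 190.0, 210.0]
toy_next = [2.0 + 0.6 * p + 0.1 * s for p, s in zip(toy_prev, toy_size)]
coefs = fit_projection(toy_prev, toy_size, toy_next)
print(walks_per_630, [round(c, 6) for c in coefs])
```

With real data, each age pair would get its own fit, and the third coefficient is the kind of number reported in the tables.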
As I already mentioned, I found quite a large relationship in part one of this series between player height and strikeouts. This is not surprising for a few reasons: (1) Taller players have bigger strike zones; (2) Taller players tend to hit for more power, and power is positively correlated with strikeouts (meaning the more power a player hits for, the more he tends to strike out); and (3) We know that height has a positive impact on the number of home runs a player hits in the future, and since more power means more strikeouts, height should also have a positive impact on strikeouts.
As well, we might expect taller players to strike out more than shorter ones the next year even if they had identical strikeout rates this season, because being taller means that it’s easier for the pitcher to throw strikes, so you’ll probably see more strikes, and therefore more strikeouts. Also, taller players may have a propensity for flailing away a little more because they have longer arms and therefore might feel that more bad pitches are “within reach.” So what’s the result? The following table lists the impact of height on a player’s strikeout rate going from one age to the next.
Age      Height    Significance
20-21     1.525
21-22    -0.585
22-23     1.025    5%
23-24     0.739
24-25     0.792    5%
25-26     1.065    1%
26-27     0.415
27-28     1.165    1%
28-29     0.975    1%
29-30     0.807    1%
30-31     0.419
31-32     1.319    1%
32-33     1.162    1%
33-34    -0.100
34-35     0.393
35-36     0.320
36-37     2.126    1%
37-38    -0.091
38-39     2.436    5%
39-40     0.014

1% = Statistically significant at the 1% level
5% = Statistically significant at the 5% level
Some interesting results here to discuss. First of all, and this applies to every category, I’d encourage you not to put too much stock into the first two and last five numbers in the table. The samples at those points are so small (fewer than 300 players) that there’s a lot of uncertainty involved in the coefficients. For example, the coefficient for height for players going from 39 to 40 is just .014, which implies that height has little or no effect on a player’s projected strikeouts. But what that coefficient really means, due to the small sample, is .014 +/- 2.44. So any answer between about -2.43 and 2.45 falls within the error range of our test. In the years with sufficient sample sizes, it looks like one inch of height is worth about one additional strikeout in the next season; the coefficient for 40 year-olds does nothing to contradict that.
So let’s concentrate on the numbers from 22-23 to 34-35. Eight out of 13 coefficients are significant, and a few others come close. The effect seems significant, but shaky. A weighted average of the whole sample and of just those ages gives us a coefficient of around .80, so one inch of height means about an extra eight-tenths of a strikeout in a player’s projection.
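As a rough check on that average, here is a small Python sketch that averages the height coefficients for ages 22-23 through 34-35 from the strikeout table, with the significance markers left out. The per-age sample sizes are not given in the article, so the sketch falls back to equal weights; the `weighted_mean` helper is a hypothetical name, and the real weighted figure would use the number of players at each age.

```python
def weighted_mean(values, weights=None):
    """Weighted average; with no weights supplied, every value counts equally."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Height coefficients for strikeouts, ages 22-23 through 34-35.
strikeout_height_coefs = [
    1.025, 0.739, 0.792, 1.065, 0.415, 1.165, 0.975,
    0.807, 0.419, 1.319, 1.162, -0.100, 0.393,
]

# Equal weights are an assumption; sample sizes per age are not published here.
unweighted = weighted_mean(strikeout_height_coefs)
print(round(unweighted, 3))
```

The equal-weight figure comes out a little under .80, which is in line with the weighted average quoted above.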
Despite some worries about significance, it seems to me that the relationship is real, and that even if it isn’t, the magnitude is small enough to have little effect on a player’s projection.
Walks provide an interesting case study for a variety of reasons. First of all, like strikeouts, walks are correlated with power; as hitters hit the ball harder, they get pitched around more, and get more free passes (it also can work the other way around; when a hitter gets more patient, he can wait for the pitch he wants and put a charge in it; Sammy Sosa is a prime example of patience begetting power). So if the height effect on strikeouts exists because strikeouts correlate with power, walks should show one too. If the effect on walks turns out to be insignificant, then we can probably conclude that tall players do not strike out more because of their tendency to gain power more than short players.
Secondly, the possible relationship between height and walks, if it exists, is intriguing to say the least. If walks are correlated with size, then we have to assume that hitters can indeed “learn patience,” and indeed we have an idea of which players do it better. If they’re not, then it seems that patience is more of an innate skill, and that there isn’t any way to categorize players as ones who will become more patient and those who won’t. Not after accounting for the powerful force that is regression to the mean, anyways (which means that a player who walks very little one season will tend to walk more the next, and that a player who draws a high number of walks one year will tend to draw fewer, though still a good number, in the next).
So how does height impact walk rates? Let’s take a look.
Age      Height    Significance
20-21    -0.411
21-22    -0.040
22-23     0.760    1%
23-24     0.006
24-25     0.223
25-26     0.167
26-27     0.581    1%
27-28    -0.196
28-29     0.196
29-30     0.041
30-31    -0.297
31-32     0.170
32-33    -0.104
33-34     0.543
34-35    -0.285
35-36     0.279
36-37    -1.238    5%
37-38    -0.497
38-39     0.056
39-40    -0.825

1% = Statistically significant at the 1% level
5% = Statistically significant at the 5% level
Well, here’s a picture of total insignificance. Actually, let me hijack this article for a bit. Even those of you who have gotten to this point are probably going to want to skip the next couple of paragraphs. You may have noticed that three of the coefficients above are indeed significant. And you may be wondering, well, how can I just reject them as insignificant? That’s a good question, and it leads to the important fact that statistically significant does not necessarily mean actually meaningful. Statistical significance just indicates that your result is unlikely to have happened by random chance; generally, statisticians use a significance level of 5%, meaning that a result is accepted as significant if there is a less than 5% chance that it could have happened by chance alone. But of course, if you run a lot of tests, some are going to end up being significant, even if there is no real effect. If you have 20 different data points with no real effect behind them, you should expect about one to be significant at the 5% level anyway.
Well, in this series we’re testing six different categories at 20 different age groups. With apologies for giving the rest of the article away, kind of, I’ll tell you right now that five of those categories are significant. If we just concentrate on the 13 age groups with sufficient samples, that gives us 78 different data points. We would expect, if we did our math right, 62 of those data points to be significant. How many data points pass the significance test? 62. So indeed, five of the categories are significant, just as we concluded. That a few of the above coefficients pass the significance test statistically does not mean that they have any meaning in actuality. They don’t. Height has no impact on walks.
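The arithmetic behind that argument can be laid out in a short Python sketch. The article gives only the totals (78 tests, 62 expected, 62 observed), so the breakdown below is one reading of it and an assumption: roughly 95% of the 65 tests in the five categories with a real effect pass, and about 5% of the 13 walk tests pass by chance. The function names are hypothetical.

```python
def expected_passes(n_tests, pass_rate):
    """Expected number of tests that come out significant."""
    return n_tests * pass_rate


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance that at least one of n independent null tests looks significant."""
    return 1 - (1 - alpha) ** n_tests


ALPHA = 0.05
AGE_GROUPS = 13
CATEGORIES = 6
REAL_CATEGORIES = 5  # every category except walks

total_tests = CATEGORIES * AGE_GROUPS

# With no real effect, 20 tests at the 5% level yield about one "hit".
chance_hits_in_20 = expected_passes(20, ALPHA)
any_hit_in_20 = prob_at_least_one_false_positive(20, ALPHA)

# Assumed reading of the 62: real effects pass ~95% of the time,
# and the walk tests pass only by chance.
real_passes = expected_passes(REAL_CATEGORIES * AGE_GROUPS, 1 - ALPHA)
chance_passes = expected_passes(AGE_GROUPS, ALPHA)

print(total_tests, round(chance_hits_in_20, 2), round(any_hit_in_20, 2))
print(round(real_passes + chance_passes, 1))
```

Under that reading the expected count lands at about 62 of 78, and the chance of seeing at least one spurious result in 20 null tests is roughly 64%, which is why a couple of starred walk coefficients are not alarming on their own.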
The weighted average of the height coefficient is .095. Only a few of the coefficients are significant, and as noted in the above two paragraphs, that’s simply a statistical hiccup. The sign of the coefficients continually changes. So we have to conclude that a player’s development in the walk category is not affected by his height.
While we used height as a factor for the strikeout and walk tests, we’re going to use weight for these final three categories (as we did for home runs). This doesn’t make a big difference, because height and weight are almost perfectly correlated, but weight is slightly better. It gives slightly more consistent results, it has a higher correlation, and I think it gives us a better idea of a player’s “size” (that is, a six-foot, 215-pound player still looks, and probably plays, like more of a slugger than a six-three, 190-pound guy). But obviously I’d be hard-pressed to argue that weight itself can in any way, shape, or form affect walks and strikeouts.
Anyway, there are a bunch of interesting things that affect the number of singles a player will hit, but let’s get to the results first.
Age      Weight    Significance
20-21     0.235    5%
21-22    -0.045
22-23    -0.083    5%
23-24    -0.087    1%
24-25    -0.054    5%
25-26    -0.125    1%
26-27    -0.109    1%
27-28    -0.084    1%
28-29    -0.065    1%
29-30    -0.112    1%
30-31    -0.088    1%
31-32    -0.097    1%
32-33    -0.075    5%
33-34    -0.108    1%
34-35    -0.084    5%
35-36    -0.036
36-37    -0.187    1%
37-38    -0.191    5%
38-39     1.797
39-40    -0.124

1% = Statistically significant at the 1% level
5% = Statistically significant at the 5% level

The weighted average effect here is -.079, which means that for every extra 10 pounds, we expect a player to hit eight-tenths of a single fewer the next season. That’s a small effect, but a very real one nonetheless.
So why is it that bigger players see a bigger decline in singles from one year to the next than small players? It seems at first blush that size would either play no role, or the opposite: That big guys would be more likely to keep up their singles-hitting from one year to the next because they’re not as reliant on a groundball finding the hole.
Well, this is where batted-ball data can be really helpful. JC Bradbury and I did a study in the Hardball Times Annual 2006 (and I’ll be re-visiting it in the 2007 Annual, which is available for pre-order) in which we looked at the year-to-year consistency of batted-ball distributions and outcomes. What we found was that (a) About half of all singles occur on line drives, and about 40% occur on groundballs; (b) The percentage of a hitter’s batted balls that are line drives shows some but little consistency from year-to-year, while groundball rates are highly stable; and (c) That the number of singles a hitter hits per groundball is about 50% more predictable than his rate of singles per line drive.
So what does that tell us? Well, bigger guys tend to hit fewer groundballs, so their singles rate in a year is much more dependent on their line drive rates. Meanwhile, line drive rates vary so much that the number of singles a non-groundball hitter will hit is much less predictable. Okay, but why is the coefficient negative? It’s just as easy to go from many line drives to few as it is to go from a few to many. Well, not quite, I don’t think. (Note: What comes next is just a theory; I have no concrete backing for it as of now.)
Line drives are not distributed normally. The average major league hitter will hit a line drive about 21% of the time he puts his bat on the ball. But, my guess is that a hitter is more likely to hit a line drive 26% of the time than he is to hit a liner 16% of the time, simply because hitters that don’t hit very many line drives just don’t make it to the major leagues. They’re not good enough hitters. On the other hand, based on my research on batting average on balls in play for pitchers, I suspect that hitters with high line drive percentages are likely to regress more than hitters with low line drive percentages. That is, a high line drive percentage tends to be mostly luck; a low line drive percentage can indicate a bad hitter.
So what does that mean? Well, hitters who are reliant on line drives for singles end up either with about the same number of singles on liners from year to year, or fewer if they hit a lot of line drives the previous season. That makes for a small overall decline in singles from year to year for non-groundball hitters, who tend to be bigger guys.
Alright, so what about doubles? It seems to me that doubles would follow the same pattern as home runs, because they too are an indicator of power. I guess that you could argue that doubles are somewhat an indicator of speed (with fast guys able to stretch long singles into two-baggers), but I doubt that the effect is very great. Simply put, it seems to me, doubles are a function of hitting the ball hard, and big guys hit the ball harder than little guys. Does that bear out in the results?
Age      Weight    Significance
20-21    -0.041
21-22     0.069    5%
22-23     0.069    1%
23-24     0.085    1%
24-25     0.086    1%
25-26     0.090    1%
26-27     0.062    1%
27-28     0.096    1%
28-29     0.071    1%
29-30     0.075    1%
30-31     0.073    1%
31-32     0.061    1%
32-33     0.074    1%
33-34     0.044    5%
34-35     0.080    1%
35-36     0.021
36-37     0.034
37-38     0.070
38-39    -0.048
39-40     0.013

1% = Statistically significant at the 1% level
5% = Statistically significant at the 5% level
Yes. Indeed, bigger players do tend to improve their doubles totals more than smaller ones. Every 10 pounds extra turns out to be worth about seven-tenths of an extra double the next year. Is it a big result? No. But it is significant.
For triples, let’s go straight to the results, and then proceed with some explanation.
Age      Weight    Significance
20-21    -0.040
21-22    -0.014
22-23    -0.023    5%
23-24    -0.019    1%
24-25    -0.022    1%
25-26    -0.041    1%
26-27    -0.022    1%
27-28    -0.030    1%
28-29    -0.022    1%
29-30    -0.024    1%
30-31    -0.031    1%
31-32    -0.017    1%
32-33    -0.025    1%
33-34    -0.020    1%
34-35    -0.019    5%
35-36    -0.024    5%
36-37    -0.037    1%
37-38    -0.026    5%
38-39    -0.092
39-40    -0.010

1% = Statistically significant at the 1% level
5% = Statistically significant at the 5% level
Ten extra pounds mean about a quarter of a triple less in a player’s projection, and the result is clearly very highly significant and highly consistent. Why is this? Well, there are a few theories I guess I could throw out there, and I’m guessing the effect we find is a combination of the three.
One: Non-normal distributions. Most players don’t hit more than a couple of triples on the season. Triples are highly luck-dependent events for all but a few players, so when a guy hits a few extra triples on the season, he’s not likely to repeat that performance the next year. The extra triples are luck. A bunch of triples are the result of skill only for the few guys who have blazing speed and rely on it as a big part of their game. Those guys are almost always little players, so triple totals fall off less for the little guys than they do for the larger ones.
Two: Power development. Triples are actually a sign of a lack of power; if you drive a triple a little further, it turns into a home run. Since we know that big guys improve their home run totals more than little guys, we would expect some of their triples to go over the fence the next season, more so than for small players. Also, as a player hits for more power, things like taking the extra base become less important and maybe even too dangerous, so big guys might be more likely to not risk going to third as they get older.
Three: Body type. Little players are generally speed demons; we should expect them to hit plenty of triples from year to year. As mentioned in the previous two points, this is not the case with big players.
So it’s come to that point; we’re about to make some conclusions. I’ve shown you the impact of size on various statistical categories; let’s see how that impact might actually translate in a projection. Let’s take two 27 year-olds with identically average batting lines. Why 27? Our sample is the greatest there, making for the most reliable coefficients, it’s a player’s peak age, it comes down the middle, etc. It’s just a good age. One is a 5’9”, 163 pound player named Little Larry, while the other is a 6’3”, 216 pound hulk named Big Barry. At 27, they both put up the following batting line (note: The batting lines presented here are rounded, but in doing the math, I have not rounded them so that the run values are correct):
PA    AB    H     2B   3B   HR   SO   BB   LW
630   558   152   26   4    16   83   57   -0.07
That last column is linear weights, or runs above average. That should be zero, but I guess a bit of rounding in the run values has caused it to be off slightly. That’s not important. Both players are completely average in every respect. So how will they do at 28?
Player         AB    H     2B   3B   HR   SO   BB   LW
Big Barry      561   153   29   3    19   86   56    1.69
Little Larry   554   148   24   5    13   79   56   -5.38
Due to his size alone, Barry is projected to be more than seven runs better than Larry, despite equal performances in the previous season! That’s a pretty large effect. He’s expected to hit five more doubles and six more home runs, and have only six fewer singles and two fewer triples. But in this example, we’re going from about the 7th percentile to the 93rd in terms of player size. For 95% of all players, the effect of size will be capped at a total difference of less than 10 runs. Is that significant? Certainly. But it’s not that much.
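To see roughly where the gap between the two projections comes from, here is a back-of-the-envelope Python sketch. It multiplies the height and weight differences between Barry and Larry by the age 27-28 coefficients from the tables (without their significance markers). Two things are assumptions: home runs use the all-ages figure of one per 9.5 pounds, since the age-specific home run coefficient is not listed in this installment, and walks are left out because the article concludes height has no effect on them. The results are on the pro-rated basis, so they only approximate the rounded lines above.

```python
# Player sizes from the example (heights converted to inches).
BARRY_HEIGHT_IN, BARRY_WEIGHT_LB = 6 * 12 + 3, 216
LARRY_HEIGHT_IN, LARRY_WEIGHT_LB = 5 * 12 + 9, 163

# Age 27-28 coefficients taken from the tables in this article.
STRIKEOUTS_PER_INCH = 1.165
PER_POUND = {
    "singles": -0.084,
    "doubles": 0.096,
    "triples": -0.030,
    # Assumption: all-ages rate of one home run per 9.5 pounds.
    "home_runs": 1 / 9.5,
}


def projection_gap(height_diff_in, weight_diff_lb):
    """Expected difference (big minus small) in next-season totals."""
    gap = {"strikeouts": STRIKEOUTS_PER_INCH * height_diff_in}
    for category, coef in PER_POUND.items():
        gap[category] = coef * weight_diff_lb
    return gap


height_diff = BARRY_HEIGHT_IN - LARRY_HEIGHT_IN
weight_diff = BARRY_WEIGHT_LB - LARRY_WEIGHT_LB
gap = projection_gap(height_diff, weight_diff)

for category, value in gap.items():
    print(f"{category}: {value:+.2f}")
```

Turning these differences into runs would require the linear weights run values, which are not listed here, so the sketch stops at the counting stats.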
Size, it seems, has about the same magnitude of effect as other “little things,” like baserunning ability, that performance analysis has ignored for too long.
References & Resources
I couldn’t have done any of this without the always-fabulous Lahman Database. However, the database does not quite contain full height and weight information, so the players for whom a height or weight was not listed were removed.