This article borrows its title from a book by Dave Eggers, but it could more aptly be named after an earlier work by Eggers entitled “A Heartbreaking Work of Staggering Genius.” The work of genius, however, was not my own, but derived from a brilliant hypothesis put forth by John R. Mayne in 2010. Mayne emailed to alert me to this piece earlier this year, after my release of SIERA (Skill-Interactive ERA) at FanGraphs, and I recently tested it. Despite initial pessimism, I was shocked by what I found.

Everyone now knows how important velocity is for a pitcher. For years, pitching coaches scolded amateurs about over-reliance on velocity. Prepubescent pitchers are lectured about Greg Maddux, told that movement and location are more important than a couple digits on a radar gun.

I’m no PITCHf/x expert, but everything I’ve read by those capable of studying that data says that velocity is actually very important, perhaps more important than movement and location after all. It’s hard to throw a 100 mph fastball that is easy to hit, and you have to be Jamie Moyer to get away with an 80 mph, lukewarm heater. Even a few ticks in the ones column of a radar gun can make a world of difference.

However, until very recently, I believed that a proper study of a pitcher’s peripherals could tell you which of two guys with a 92 mph fastball has the superior arm, and I also believed that two pitchers with the same SIERAs with different fastball speeds were no different in future skill level.

When discussing SIERA’s ability to adjust for pitcher control of BABIP, Dave Cameron once noted that velocity may explain some of the missing pieces of the puzzle that correlated with both strikeout and BABIP skills. However, I found that if you control for peripherals, age, year, and role, then knowing a pitcher’s velocity is not useful.

In fact, running a regression on all of these, you will actually get an insignificant and positive coefficient of .00035 on velocity; in other words, a 3.0-mph increase in velocity with the same characteristics will correspond with a BABIP that is a full point higher!

When Mayne emailed me with this suggestion, I expressed my skepticism, but I was thinking about the idea the wrong way. Mayne was talking about projections in that article, that is, about predicting the future. What I have now found is that a pitcher’s velocity tells you about his potential to improve on the statistics that best express his skill level.

If you just run a regression of a pitcher’s ERA next season on his ERA from the current season, you get the following equation:

ERA_next = 2.76 + .368*ERA

Include velocity, and you get:

ERA_next = 9.49 + .327*ERA – .073*velocity

This formula says that, of two pitchers with the same ERA last season, the one who threw faster is more likely to improve. That’s not surprising. We know that a pitcher was probably more capable if he threw faster, so he probably had better peripherals and worse luck if he had the same ERA and more velocity. Right?
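To make the comparison concrete, here is a minimal Python sketch that plugs two pitchers into the two ERA equations above. The pitchers, the 4.00 ERA, and the 89 and 94 mph fastballs are hypothetical values chosen for illustration; only the coefficients come from the regressions.

```python
# Coefficients copied from the two ERA equations above.
def era_next_from_era(era):
    """Next-season ERA projected from current-season ERA alone."""
    return 2.76 + 0.368 * era


def era_next_from_era_and_velocity(era, velocity_mph):
    """Next-season ERA projected from current-season ERA and fastball velocity."""
    return 9.49 + 0.327 * era - 0.073 * velocity_mph


# Hypothetical pitchers: identical 4.00 ERA, fastballs 5 mph apart.
slow = era_next_from_era_and_velocity(4.00, 89.0)
fast = era_next_from_era_and_velocity(4.00, 94.0)

print(f"ERA only:        {era_next_from_era(4.00):.2f}")
print(f"89 mph fastball: {slow:.2f}")
print(f"94 mph fastball: {fast:.2f}")
print(f"gap:             {slow - fast:.3f}")
```

The gap between the two projections is simply 5 × .073, or about .37 runs, regardless of the shared ERA that is plugged in.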

Actually, let’s take a closer look at the pitcher’s true skill level and replace his ERA with his SIERA to see what happens. If you run a regression of a pitcher’s ERA next season on SIERA from the current season, you will get the following equation:

ERA_next = 1.21 + .733*SIERA

However, if you run a regression of ERA next season on SIERA and velocity, you get the following result:

ERA_next = 4.52 + .677*SIERA – .034*velocity

Both coefficients are statistically significant at the 99.9 percent level. In words, this means that a 2.9-mph increase in velocity will correspond with a 0.10 lower ERA, even if you know the pitcher’s SIERA from the previous season.
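For readers who want to see the mechanics, the following is a rough sketch of how a two-variable regression like this one can be fit by ordinary least squares, and how the "2.9 mph per 0.10 of ERA" figure falls out of the velocity coefficient. It is not the Stata run behind the article: the pitcher-seasons are synthetic and noise-free, generated from the published equation purely so the fit has a known answer, and the helper names are made up for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def ols(rows, targets):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free pitcher-seasons (NOT real data): a grid of SIERA and
# velocity values pushed through the published equation.
rows = [(siera, velo) for siera in (3.0, 3.5, 4.0, 4.5, 5.0)
        for velo in (86.0, 89.0, 92.0, 95.0)]
targets = [4.52 + 0.677 * s - 0.034 * v for s, v in rows]

intercept, b_siera, b_velo = ols(rows, targets)

# Velocity gap that corresponds to a 0.10 difference in next-season ERA.
mph_per_tenth_of_era = 0.10 / abs(b_velo)
print(round(intercept, 3), round(b_siera, 3), round(b_velo, 3))
print(round(mph_per_tenth_of_era, 2))
```

Because the synthetic targets contain no noise, the fit simply recovers the published coefficients; with real pitcher-seasons the same procedure would also need standard errors before any claim about significance could be made.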

What’s going on here? Well, the pitchers who throw faster are doing something better than others with the same peripherals. What is that? I looked at various components of pitcher performance to find the answer and found why Mayne’s hypothesis was accurate.

Suppose you know a pitcher’s strikeout rate. In this case, you can predict his future strikeout rate next year very well:

K%_next = 3.87 + .764*K%

However, once you know that pitcher’s velocity, you have a lot more information.

K%_next = -16.1 + .701*K% + .233*velocity

Verbally, this means that if two pitchers had the same strikeout rate the previous year, the pitcher who throws 4.3 mph faster will strike out about one percentage point more batters the following year than the pitcher who throws slower.

What about walks?

BB%_next = 2.864 + .644*BB%

BB%_next = 0.237 + .638*BB% + .0296*velocity

In the case of walks, more velocity actually portends an increase in free passes.

However, if you start to include more terms, its significance disappears. Higher velocity is just correlated with other variables that are related to increases in walk rates, such as relief role, age, and strikeout rate itself!

Including strikeout rate in the regression on next year’s walks renders the velocity coefficient insignificant (p = .224), while it remains very significant (p = .000) in the regression on next year’s strikeouts:

BB%_next = 1.04 + .0151*K% + .6361*BB% + .0179*velocity

K%_next = -15.9 + .6984*K% + .0752*BB% + .2254*velocity

Controlling for both rates, more speed foreshadows an improvement in strikeout rate. Including a slew of other variables (results omitted for brevity) did not alter this conclusion.
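As an illustration of how differently velocity enters the two equations, this short Python sketch projects next year's rates for two hypothetical pitchers with identical strikeout and walk rates. The 18% strikeout rate, 8% walk rate, and 89 and 93 mph fastballs are assumed values, not data from the study; the coefficients are the ones in the equations above.

```python
def k_pct_next(k_pct, bb_pct, velocity_mph):
    """Next-season strikeout rate from the three-variable equation above."""
    return -15.9 + 0.6984 * k_pct + 0.0752 * bb_pct + 0.2254 * velocity_mph


def bb_pct_next(k_pct, bb_pct, velocity_mph):
    """Next-season walk rate from the three-variable equation above."""
    return 1.04 + 0.0151 * k_pct + 0.6361 * bb_pct + 0.0179 * velocity_mph


# Hypothetical pitchers: same 18% K rate and 8% BB rate, 4 mph apart.
K_PCT, BB_PCT = 18.0, 8.0
SLOW_MPH, FAST_MPH = 89.0, 93.0

k_gap = k_pct_next(K_PCT, BB_PCT, FAST_MPH) - k_pct_next(K_PCT, BB_PCT, SLOW_MPH)
bb_gap = bb_pct_next(K_PCT, BB_PCT, FAST_MPH) - bb_pct_next(K_PCT, BB_PCT, SLOW_MPH)

print(f"K% gap:  {k_gap:+.2f} percentage points")
print(f"BB% gap: {bb_gap:+.2f} percentage points")
```

The strikeout gap works out to roughly nine-tenths of a percentage point, while the walk gap is less than a tenth of a point, which matches the pattern of a significant velocity term for strikeouts and an insignificant one for walks.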

If you look at BABIP, you start to see more of an effect of a good fastball. If you try to predict BABIP next season using only this season’s BABIP, and then try to do so with BABIP and velocity, you can create a clearer picture:

BABIP_next = .238 + .191*BABIP

BABIP_next = .283 + .191*BABIP – .00050*velocity

Velocity helps predict next season’s BABIP pretty well, though this effect is somewhat minimized when considering the effect on other variables.

The rate of home runs per fly ball is another metric that is mostly determined by luck but incorporates some skill as well. Velocity actually corresponds well with a decreased rate of home runs per fly ball, even in the same season.

Running a regression of home runs per fly ball while incorporating peripherals with interactions, season, year, age, and role, we will still get a coefficient of -.00066 on velocity. This means that a pitcher who gives up 3.0-mph in velocity will yield one fewer home run every 500 fly balls. It’s not a big deal, but it’s statistically significant.

It also matters because the coefficient only goes down to -.00063 when changing the dependent variable to next year’s home run-per-fly ball rate. The skill is something that shines through over time, revealing an ability to get hitters out that gets behind the luck mashed in with other statistics.

However, if we simply check how much velocity adds to HR/FB itself in predicting next year’s HR/FB rate, we can see that:

(HR/FB%)_next = 8.39 + .186*(HR/FB%)

(HR/FB%)_next = 20.17 + .173*(HR/FB%) – .129*velocity

Knowing velocity is important for this as well.

Velocity is an even bigger deal than we thought, and Mayne hit the nail on the head. Not only do pitchers who throw faster succeed more often, but they improve more as well. It foretells a higher strikeout rate, lower BABIP, fewer home runs per fly ball, and a subsequently lower ERA than other pitchers with similar yearly statistics.
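The sketch below gathers the velocity coefficients from the two-variable equations in this article and scales each by the same gap, to show the size of each effect side by side. The 3 mph gap is an assumed, illustrative figure, and the dictionary labels are my own shorthand.

```python
# Velocity coefficients copied from the two-variable equations in the article.
VELOCITY_COEFFICIENTS = {
    "ERA (given SIERA)": -0.034,
    "K% (given K%)": 0.233,
    "BABIP (given BABIP)": -0.00050,
    "HR/FB% (given HR/FB%)": -0.129,
}

MPH_GAP = 3.0  # hypothetical gap between two otherwise similar pitchers


def effect_of_gap(coefficient, mph_gap):
    """Projected next-season difference implied by a velocity gap."""
    return coefficient * mph_gap


effects = {stat: effect_of_gap(coef, MPH_GAP)
           for stat, coef in VELOCITY_COEFFICIENTS.items()}

for stat, change in effects.items():
    print(f"{stat:<24}{change:+.4f}")
```

Each figure holds only the listed prior-season statistic constant, so the rows are not additive and should not be summed into a single ERA effect.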

Incorporating velocity into projection systems would appear to be not only useful but perhaps pivotal to understanding how much getting the ball to the batter sooner helps in getting him out.

Nate said...

This is certainly not unique to this particular article, but correct me if I’m wrong – most of the excel-based linear regression calculations of p-values and the like are a simple H0 v. Ha hypothesis test, right? So a p-value only tells you an effect is non-zero, not how big or small an effect is, right?

I.e., you would need to re-run these tests against a different null hypothesis (something other than zero) before making those size-of-effect comparisons, and at the very least you need confidence intervals to get a real look at how much these effects overlap. I also wonder why as you’re adding variables to your regression, you’re not giving adjusted r-squared figures to demonstrate to what degree the knowledge of velocity improves the model. It seems pretty obvious that there’s a boatload of colinearity with some of these variables (velocity is tied to K%, BB% and K% are likely both tied to pitches per at-bat, etc.), so it seems like I need to directly compare the fit of those two models before getting excited about how much knowledge of velocity improves it. And even with that knowledge, knowing it is improved and a p-value < .001 just means the improvement of the model is non-random, not that I now know that velocity had this or that size of effect on particular parameters (and certainly not that it will have the same effect going forward).

Sorry if that’s overly nit-picky, because I do think this is a pretty nifty analysis, genius especially since I think you probably have the figures to better assert what you have here. I’ll leave staggering when I see what the actual range of velocity effects likely is.

John R. Mayne said...

A brief thought on ideas and execution.

I think I read on a game development blog (probably Sarah Northway’s) that there’s an axiom that “Ideas are worthless.”

I don’t have the tools to execute the ideas, which means that I often can’t prove the ideas, which means they are worthless. I’m sure scores of people inclusive of me developed something like support-neutral win-loss – but Michael Wolverton did the legwork. He won the internet, and rightly so.

(Or, on Big Bang Theory last night, dude has an idea for 3-D glasses for all television. How does it work? “I don’t know. You’re the nerd. You figure it out.”)

On the specifics:

This is a particularly interesting article to me (I’m fascinated by this, obviously, and I’m extremely flattered to be mentioned). I’m interested to see where this goes.

I would note that I think you’ll get even higher effects for velocity if you filter by handedness (velo’s more important for righties, is my worthless idea) and by rate of fastball use.

If I have this right, we end up with:

220 IP, 3.70 ERA, 155K 65BB. Steve throws 94, while Joe throws 89. What are our expectations of Steve and Joe? If I’m doing this right, the ERA expectation of Steve is 0.38 better. SIERA’s going to be a lot less.

The fact that the SIERA change is so much lower (less than half!) seems to me to mean something important. For there to be a residue other than statistical noise means there’s a real skill difference between SIERA skill and ERA skill, right? Or am I mistaken?

Matt Swartz said...

@Nate:

Thanks for your comment, Nate. I think you might be mixing up a couple of concepts. These aren’t H0 vs. Ha p-tests, really, though the regression output does provide p-stats for the coefficient that do test H0=0. The coefficients are the center of a confidence interval of approximate effect, but you could run tests if I had reported the standard errors. When the p-stat is =.001 exactly (and many of these are <.001), then the bottom of the confidence interval is 63% of the way to 0. (So if the coefficient is 1.00 and the p=.001, then the 95% confidence interval would be 0.37 to 1.63 if I did that correctly.) These are all very significant unless reported otherwise.
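The back-of-envelope interval can be checked with a few lines of Python. This sketch assumes a large-sample normal approximation (the function name is made up for the example); under that assumption the half-width comes out near 0.60 rather than 0.63, which is in the same neighborhood, and a t-distribution with limited degrees of freedom would widen it somewhat.

```python
from statistics import NormalDist


def ci_from_p_value(coefficient, p_value, level=0.95):
    """Back out a confidence interval from a two-sided p-value (normal approx.)."""
    std_normal = NormalDist()
    # |coefficient| / standard error implied by the two-sided p-value.
    z_observed = std_normal.inv_cdf(1 - p_value / 2)
    std_error = abs(coefficient) / z_observed
    # Critical value for the requested level (about 1.96 for 95%).
    z_level = std_normal.inv_cdf(1 - (1 - level) / 2)
    half_width = z_level * std_error
    return coefficient - half_width, coefficient + half_width


low, high = ci_from_p_value(1.00, 0.001)
print(f"{low:.2f} to {high:.2f}")
```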

I didn’t report adjusted R^2 (I can), but when you add a second independent variable to an equation and both p-stats are <.001, then the adjusted R^2 goes way up, in general. I can report adjusted R^2 later though when I dig up my log files.

The “boatload of colinearity” that you refer to is definitely not true. I think you are mistaking multicollinearity with correlation, which is a common mistake that people make. Correlation between two independent variables is why regression is useful in the first place. There is no use for regression if there is ZERO correlation between independent variables. Regression coefficients answer the question of “what is the effect of X2 on Y, holding X1 constant?” If those were uncorrelated, you could run separate regressions without any difference in information. The point is that BECAUSE K% AND VELOCITY ARE CORRELATED, you run a regression to determine the relative importance of each. However, when the two are so highly correlated that it is difficult to isolate effects, then you have multicollinearity. That’s not the issue here, though.

I can throw in the adj R^2 and confidence intervals later tonight for those interested.

@John:

Thanks again for the idea. Not a worthless one to me!

I’m not 100% sure what you’re asking with the Steve & Joe example. I’d need to re-run something with controls for ERA, BB, and K, but I know with SIERA, the coefficient on velocity was -.034, so a 5mph difference would be 0.17 difference in ERA, holding SIERA constant.

The reason that the coefficient on velocity was lower with SIERA than with ERA was that a season of SIERA is a better indicator of pitching skill than a season of ERA. There are plenty of lucky pitchers who aren’t very good, had low ERAs, high SIERAs, and throw slowly. Knowing SIERA means that knowing velocity won’t give you that much more information. However, knowing ERA only in this case means that knowing velocity will give you much more information. ERA wasn’t as telling for next year’s ERA as SIERA was, so velocity was more useful.

Nate said...

Thanks for the clarifications – a few comments/questions, if you don’t mind. I enjoyed that you even included the p-values at all, as I’ve read a lot of articles that just drop the regression equations and a R-value as if that’s all there was to it. Hopefully these aren’t too stat-nerdy:

1. I guess I find that info about the confidence interval more helpful than an isolated p-value. Taking your example from above about a 2.9 mile per hour increase* leading to a .10 decrease in future ERA (with a p of .001), it seems as though with the confidence interval we are 95% sure** that a 2.9 mile per hour advantage will lead from a .037 to a .163 advantage in future ERA. Is that right? The obvious point being that a .037 difference isn’t as exciting while .163 would be crazy. And I would think if you were a little Bayesian about this and thought it seemed reasonable that velocity had an effect – albeit a small one (say, .04) – if you had run the test using Ho=0.04 (or whatever the corresponding coefficient in the regression would be), now your data is not significant and wouldn’t change your mind. Again, that’s pretty much always the case, but that’s all the more reason to speak in terms of confidence intervals … yes?

What about incorporating the uncertainty on both the SIERA coefficient and the velocity, or is the SIERA confidence interval small enough that it doesn’t really affect things?

* – Strictly speaking, “2.9 mph increase” is the wrong term – it’s “having a 2.9 mph faster fastball than the guy next to you who shares your SIERA.” Maybe you could look at a delta fastball speed (say, September to May, or the two previous years?), but it doesn’t sound like this study can comment on the pitcher who attempts to add velocity – or am I totally off on that?

** – Again strictly speaking, since the p-values come from frequentist conceptions, it’s not that we are 95% confident in this particular result, it’s that studies that use this methodology would catch the true value in the interval 95% of the time. People use these interchangeably, but they’re not exactly the same.

2. Sorry about the Ho-Ha, you’re right that it’s just a Ho=0 test, and p under a pre-defined value gives significance and indicates that Ho is to be rejected. Still, I was under the impression that controlling the one variable while testing to see if the other is effectively random (not correlated) is how you get the actual p-value and confidence intervals, right? And it’s interesting how the constant and SIERA coefficient get changed with the additional info of velocity. Introducing the new variable could drop the p-values of both coefficients, correct? Which I think is why having those adjusted r-squares can be nice, because it’s a clean indication that the new info improves the model overall without checking whether the increased knowledge actually widens the error range on the original variable’s coefficient.

3. And right on, I get that multicollinearity is the extreme of correlation. Sorry if I used the term sloppily – how do you before-the-fact figure out the threshold for “too correlated?” For example, it looks like the information of velocity is embedded enough in strikeout rate that having the velocity information doesn’t improve predictions for future walk rate … but it’s not too correlated in the second case because the additional info still helps K-rate. Am I thinking about that right? I guess what I am saying is sure, being perfectly correlated would make things collinear and it would be entirely useless to have both pieces of information, but it looks like it doesn’t have to be perfectly correlated to be “too correlated,” at least in some contexts.

4. While I agree that a large part of the purpose of multiple regression is to navigate correlated independent variables, I don’t see why they *must* be correlated. Example off the top of my head – if I have a data set that includes a random number of dimes in my pocket, an independently random number of quarters, a random (but unrecorded) debt to a friend ranging from -3 to 3 cents (to eliminate algebra as a simple solution), and a total number of cents I have, I can still use multiple regression to figure out the value of a dime and a quarter, yes? I can’t do them independently because knowing just the number of quarters or dimes alone in a case would not be adequately predictive of the final amount.

A wacky example, but I have to imagine that there are independent factors in baseball that can be predictive of a third factor while not necessarily predictive of one another.

thanks for your response! And again, kudos for actually addressing the underlying data.

Matt Swartz said...

Confidence intervals & R^2 (Stata didn’t report Adjusted R^2 for these, but the increases in R^2 are sufficient to see the difference)

For ERA on ERA & velo, the velo confidence interval was (-.088,-.057) and the increase in R^2 by including velo was .1072 to .1389.

For ERA on SIERA & velo, the velo confidence interval was (-.049,-.019) with R^2 increasing from .1850 to .1911.

For K% on K% & velo, the velo CI was (.0018,.0029) with R^2 going from .5629 to .5770.

For BB% on BB% & velo, velo CI: (.0000349,.0005571) with R^2 going from .3859 to .3871. (the p-stat was .026 there)

For BABIP the velo CI was (-.000927,-.000078) with R^2 up from .0319 to .0345.

For HR/FB the velo CI was (-.001774,-.000804) w/ R^2 going from .0177 to .0446!

John R. Mayne said...

Many of my fantasy leaguemates are now reading this. Can we screen them out?

(Seriously, Matt, thanks for the followup. Having the degree of the effect quantified is huge, and I don’t think this has been done before. A big step forward – and while my ideas were surely worth something, they weren’t worth as much as the execution.)

—JRM

Colleague: This will just inflate your ego.

JRM: How is that possible?

Jack Thomas said...

Matt interesting article. You guys are way above in the math—too many years since college.

I do not understand the statement “This means that a pitcher who gives up 3.0-mph in velocity will yield one fewer home run every 500 fly balls” Does this mean a pitcher with lower velocity gives up less HRs per flyball than high velocity pitcher? It seems to be opposite of the rest of the article—Velocity is good.

Matt Swartz said...

@Jack

I meant to say that a pitcher with higher velocity gives up fewer home runs. Sorry about that.

@Nyet

The R^2=.5 benchmark is random, and it doesn’t really matter here. There is a lot of variation in pitching outcomes that are random but have larger influences, but that only serves to make it more important to have a fundamental understanding of what matters over time. One season of data may induce an R^2 of less than .5, but a few seasons will beat .5. Half a season will do even less. The .5 level is not unique—income regressions generally have R^2 of .3 or so, but that doesn’t mean that knowing education, experience, age, sex, race, etc., aren’t important; it just means that 70% of income can’t be explained using those variables alone.

Nyet Jones said...

I disagree that .5 is “random;” you probably mean arbitrary, but it seems sensible that you would want to have a goodness of fit such that more variance is explained by the model than left unexplained. That doesn’t seem arbitrary to me, but perhaps you mean something else.

And I didn’t say it wasn’t important, I said it wasn’t useful. If I am trying to project forward for two players who are equal in all respects but then learn that the model with velocity explains 19% of the variation instead of 18.5, there is an exceedingly small chance (rough calc, 1 in 163? and that’s if you make it solely on a velocity difference and the rest of the variables were equal) that any future difference between those two players would be actually due to that velocity difference. And sure, the chance would increase with multiple trials, but that’s why it’s necessary to know after how many trials you would be likely to come out ahead using this as a decision guide.

(Eg, if I knew a coin flipped heads 50.001% of the time, sure, it’s correct to bet heads because it will pay off more over the long haul. But it’s going to take a huge number of flips before I’m likely to see a return on investment that isn’t due to chance – and the main point is that I’m better off playing some other game if all I have is that info about coins. Translation – stop worrying about pitcher velocities and start looking for better explanatory models or other variables of run control that are more likely to see a return in the career-span of these players / my job as a GM or what have you).

This is the problem with an “all things being equal” evaluation. Things will rarely actually be equal between two players in all respects except velocity, and then they will only be equal in the respects that are currently considered as independent variables. You are right that even a small jump in R^2 is worth considering over a large haul and mounds of data, but decisions are not really being made across those large mounds of data or long hauls. It would be one thing if our model explained, say, 30% of the variance, and we knew the other 70% was just plain error. Then you would have to pull the “what are you gonna do?” and place the smart bet even though you would be aware that error is more likely to drive the outcome than the model. But you don’t *really* know that, so when your model explains a relatively small amount of the variance, you should try to figure out what, if anything, is driving more of it.

All of which is to say that putting these findings in context – velocity has an impact, but it is small and the model has a poor fit – is key to understanding what a sabermetric analysis actually gets you. It’s cool that we have some evidence that velocity matters in this way, but it’s less clear that this info is actually practically useful.

Matt Swartz said...

Nyet, the goal is not always to increase the R^2. In graduate school, I was taught to stop focusing on R^2 much at all. If I wanted to increase my R^2, I could do it by throwing in a host of other variables. In unreported regressions to check for the robustness of my conclusions, I included variables like age that increased the R^2 somewhat, and I could have included park effects, quality of a team’s defense, etc. And if I wanted to study other aspects of pitching (something I have done in other articles and others have done in many other articles), that would be great. It doesn’t make this conclusion useless. Since Voros McCracken discovered DIPS, there have mostly been incremental improvements in understanding pitching. This is just one such study. In any study, the amount of randomness that goes into a single season of data is such that you’re never going to smell an R^2 of .5.

These other variables that we’re talking about do not change the importance of the model unless they are correlated with the regressors I’m using. I’ve checked that already. It’s not an issue. The effect is what the effect is, and if I omit variables that could increase the R^2 but would not change the coefficient on velocity, then all I’m doing is wasting time (or not studying velocity).

Nyet Jones said...

Thanks for the explanation. I was under the impression that to counter the increased R^2 by just adding variables one should use the adjusted R^2 to evaluate the fit. But your explanation about those other variables not being correlated to these variables on velocity does make it sound like you could just leave them embedded in the “error” and not worry about them messing up your coefficients for velocity and the like.

Still, I didn’t say (or at least didn’t mean to say) that the conclusion was useless; I said it was not (practically) useful. Looks like it’s very useful for studying the impact of velocity, a sort of knowledge for knowledge’s sake, as you state. But when will that impact be not just significant (ha), but important? It appears that I need a bunch of factors to be equal (which would rarely happen), and I need the cumulative effects of all the other variables that you did not include to not drown out the effect that velocity would have. So my real concern is how often those circumstances occur such that this info about the impact of velocity would make a real impact on my decision? I just don’t know based on what’s been stated here. A 3 mph per .10 of ERA impact with all-things equal just doesn’t sound like much because it seems like the differences in the non-velocity variables would have impacts that would almost always drown out this effect.

And this:

“In any study, the amount of randomness that goes into a single season of data is such that you’re never going to smell an R^2 of .5.”

is thoroughly true not just on the data analysis side but on the forecasting side. If a single year’s data has more variance explained by randomness than by known factors, then using the resulting model is going to miss a lot the next year. So why not be transparent about this – why not report the prediction as a range that incorporates all of the uncertainty due to error? I’m guessing it’s because it would make it clear that these predictions don’t really separate people because their ranges overlap. In a scientific study, you’d have overlapping CI’s and be hard-pressed to establish a difference between two players, but here we at least strongly imply that we should *expect* a .10 difference in ERA if there’s a 3 mph difference in the past year. And we should expect a .10 difference all ceteris paribus, but why would we *expect* that error is a thing we can hold equal?

Sorry if I sound combative; I’m just really interested in the actual, real world meaning of a significant statistical finding. I’m not sure, but in my forays into sabermetrics I don’t often see CI’s or admission of the relative utility of different findings. Some of the characteristics of baseball – limited data, limited careers, etc. – may just mean that even if a factor comes out in the aggregate over several years of data, it will never be useful in a single year because the natural variance makes it unseeable. We know this – we don’t think a seven game series really determines the better team. So why think these predictive metrics will work?

Nyet Jones said...

Alright, forgive me if I’m misunderstanding … but for every regression you’ve listed here except for K% on K% & velo, the R^2 is under (and often well under) .5. So in all of those cases, more of the variation is explained by error and/or hidden variables than is explained by the variables accounted for in the model?

Am I missing something? I just don’t understand how those models are useful if they don’t even get at half of the variance. And I don’t get how knowing the velocity helps if a difference is more explained by error than velocity. I thought that in “all things being equal,” the all can’t include error because you would never be able to know that the error was equal.

Matt Swartz said...

Let me try this another way. Teams on the free agent market behave as though one run in expectation is worth about $500K. If we know that a given starting pitcher is worth 20 runs above replacement +/- 20 runs in a given season, they will pay about $10MM for his services, knowing that it may be worth anywhere from $0-20MM to them.

If this is a SP expecting 180 IP, then a difference of .10 of ERA is about two runs. So take a pitcher who appears to be worth about 20 runs above replacement looking at his statistics. Now we know that if we look at his velocity and it’s 92mph, maybe it’s really 21 runs +/- 20 runs, and if it’s 89mph, then it’s 19 runs +/- 20 runs. So his value is either $10.5MM or $9.5MM in expected value based on what we know about his velocity.
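The arithmetic behind those figures can be laid out in a short Python sketch. The $500K per run, the 180 innings, and the 20-run baseline are the figures quoted in this comment; splitting the two-run gap evenly around the baseline is an assumption made to reproduce the 21 and 19 run figures.

```python
DOLLARS_PER_RUN = 500_000  # market rate quoted in this comment
INNINGS = 180              # starter workload from the example
BASELINE_RUNS = 20         # runs above replacement before looking at velocity


def runs_from_era_gap(era_gap, innings):
    """ERA is runs per nine innings, so scale the gap by innings / 9."""
    return era_gap * innings / 9


run_gap = runs_from_era_gap(0.10, INNINGS)

# Assumption: the gap is split evenly around the baseline estimate.
for label, runs in (("92 mph", BASELINE_RUNS + run_gap / 2),
                    ("89 mph", BASELINE_RUNS - run_gap / 2)):
    value_mm = runs * DOLLARS_PER_RUN / 1e6
    print(f"{label}: {runs:.0f} runs above replacement, ${value_mm:.1f}MM")
```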

There is no profession on the planet where learning about a million dollars of variance in expected marginal revenue product isn’t useful. I’m not even saying we definitely have that here, but that starts to sound very relevant.

Nyet Jones said...

Thanks again for your explanation; I hope your “let me try this another way” qualifier doesn’t mean you think I’m daft. I completely get that is the standard interpretation, sure. I understand that, and I understand why; it’s the “what are you gonna do, might as well take the bet that would pay off in the long run” response. Which is a perfectly sensible long-range multiple iteration game theoretical approach to the situation. I even understand that descriptively, that’s how actors on the free agent market behave.

What I am asking is whether that makes sense given that these decisions are typically made in non-iterative, widely error-prone outcome spaces. Or if it’s an across the board application of a policy based on this velocity finding, how many times do you need to apply it before it is reasonably guaranteed to pay off?

An example: I offer you a game of 5 coin flips where if you get more heads than tails you win $100 or lose it otherwise. I offer you two coins to use – one which you are 95% confident has a p(heads) from .48-.52, and another you are 95% confident has a p(heads) from .50-.54. We are only playing the game once.

Now – *fair enough*, if we were playing the game some huge amount of times and we were resetting the coins every time, the smart thing would be to pick the latter; the distribution would mean it would pay off reliably over the long haul. But we’re not doing that; we’re playing once. (And actually, even if you picked the second coin and we played an infinite number of times but didn’t reset the coins, you could presumably infer what its actual p(heads) is – given infinite flips and all – but unless it comes out .52 or higher, you can never know whether you picked the correct coin).

When we are only playing once, these typically frequentist interpretations are nonsensical. Coin one has whatever p(heads) it has as does coin 2; it won’t pay off in the long run to take the higher-but-overlapping range because there is no long run and we’re not resetting the coin. These coins are, by standard issue matters of interpreting overlapping confidence intervals, not different enough to warrant spending more on the second coin. In any medical study, if treatment A had a “higher interval” but overlapped treatment B by 50%, you would absolutely not be warranted in claiming that everyone should take treatment A. That might make sense if I only care about the population outcome, but you would be harming a fair percentage of people with that approach as they are individuals, not the population. That’s the whole point of showing the confidence interval – why is baseball different?

This is the case when the coins overlap by 50%; in your 19 or 21 plus or minus 20 example, those ranges overlap by 40 out of 44 or ~91%. Without crunching through it, the faster pitcher will be superior about 9.8% of the time and better by chance 45% of the time. So you are paying a million extra dollars for a player who has a 45% chance of being worse. Again, this makes sense over the long haul; I don’t know if the correctly crunched value is $1 million, but it would be beneficial to flip with the 55% coin, so to speak. But again, this isn’t the long haul; this is a single game. There’s a big risk that you are paying MORE for the worse outcome. That risk gets attenuated by multiple iterations, and at some point the very likely benefit will be guaranteed to warrant the cost. But – how many iterations is it?

I do appreciate your efforts to explain this, and I think your last example makes more sense in terms of trying to value someone rather than trying to choose between two players. But the smaller the impact of a given factor – i.e. the more the confidence intervals overlap – the more iterations it’s going to take before a policy adjustment is warranted. You don’t have to respond to this unless you want to explain why it makes sense to automatically pay more to a pitcher who overlaps to a large extent within another whereas such a finding in a medical study wouldn’t warrant a conclusion that the two cohorts are different at all. Is it because we only care about the long haul in baseball? In which case I would think that the finite number of decisions that are actually made in the space would warrant reconsideration of treating players differently (depending on your tolerance for risk, I suppose).

evo34 said...

I like the analysis here, but I’d have to agree with Nyet that the impact of velocity data on ERA prediction accuracy is smaller, not larger, than expected. A 1 mph difference in fastball velocity is only worth 0.03 rpg of skill that isn’t already being captured by peripherals? If I’m a GM, that tells me that I need to focus more on my statistical models and less on velocity.

That said, intuitively I do value velocity in pitching prospects. I would be interested to see if velocity has a larger effect on long-term player projections (rather than just the very next season).