Extra Innings: a reply

Any press, as they say, is good press.

Recently I had a chance to get my hands on the new Baseball Prospectus book, Extra Innings, which, like their previous book, Baseball Between the Numbers, has a series of chapters, each dealing with a central question.

This isn’t a book review, just an analysis of one chapter. Veteran BP writer (and recent Bleacher Report addition) Steven Goldman penned a chapter titled “How can we evaluate managers?” This really gets my attention, as I wrote a book titled Evaluating Baseball’s Managers. The similarity in titles between my book and his chapter isn’t a simple coincidence. This chapter is largely a response to my book.

The short version is that Goldman disagrees with much of my book. Though he states that I wrote “a fine book” that does an excellent job depicting the characteristics of the various managers it profiles, there is a fundamental disagreement between my book and his chapter. Ultimately, I wrote a book titled Evaluating Baseball’s Managers, and Goldman is quite a bit more pessimistic about our ability to do just that, evaluate baseball managers. So yeah, there are points of contention.

Criticisms I agree with

I’ll be the first to admit Goldman makes some valid critical points. Three in particular should be noted. First, Goldman notes that I misconstrued the conclusion made by former Prospectus writer (and current Tampa Bay Rays brain) James Click in a study he did for the Prospectus book, Baseball Between the Numbers, that examined a manager’s ability to improve individual batter performance. I stated that Click says managers have no impact on it.

Goldman corrects me, saying that Click’s study couldn’t show that managers affected player performance, but that’s not the same thing as saying the effect doesn’t exist. This distinction has been noted in sabermetrics since at least as far back as Bill James’ “Underestimating the Fog,” but I botched it.

That said, looking at it again, I notice that Click slipped a bit and also said, “managers show no consistent ability to improve batter performance.” Yeah, his main conclusion is what Goldman states, but that line sounds more like it doesn’t exist than we can’t prove it exists.

Second, Goldman justifiably calls me out on some mistakes made in my Casey Stengel section. I argued that Stengel’s Yankees were very unsentimental in their treatment of aging players, moving them out once they could. However, Goldman notes that some of the moves I cited as examples were actually made despite Stengel, not because of him. Fair enough, good point. Should I ever update the book, I’ll remember this.

Third, Goldman notes one thing I all but state on my own in my book: I engage in confirmation bias. I entered this study with an assumption that managers have an impact on player performance and present some (admittedly not conclusive) statistical evidence of it based on some data called the Birnbaum Database.

Goldman quotes me stating, “I do not believe in limiting myself to mathematical rationales. This evidence [the Birnbaum Database] beautifully corresponds to long-lasting and widely held notions that managers can and do have an impact on player performance. I therefore accept it.”

Goldman flatly states: “This is textbook definition of confirmation bias; Jaffe excepts (sic) the results because they conform his beliefs, not because they are illustrative of reality.” I plead guilty. But Goldman presses his point too far. It ain’t either/or. The results could both conform to my beliefs and be illustrative of reality.

Less surprisingly, I have areas of disagreement with Goldman. Let’s look at some main issues/themes of disagreement.

Sample size

Many of the disputes between Goldman and me can be seen as micro versus macro. Some of the arguments Goldman makes against my book make a lot of sense on the micro level. But the problem is, my book was trying to focus more on the macro level.

Let’s look at probably the most important dispute, when Goldman takes on the Birnbaum Database. First, let’s back up and explain the Birnbaum Database, which inspired my book. It’s an attempt to project how a player should have performed in any given season based on his real-life performances in the two preceding and two succeeding years.
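
To make that idea concrete, here is a minimal Python sketch of a projection built from surrounding seasons. It is an illustration under loud assumptions, not the Birnbaum Database's actual formula: it uses a plain unweighted average of whatever neighboring seasons exist, and the function names and the sample numbers are invented for the example.

```python
from statistics import mean


def project_season(runs_by_year, year, window=2):
    """Project a season from up to `window` seasons on either side of it.

    Assumption: a simple unweighted average of the neighboring seasons.
    The real database may weight or adjust seasons differently.
    """
    neighbors = [
        runs_by_year[y]
        for y in range(year - window, year + window + 1)
        if y != year and y in runs_by_year
    ]
    if not neighbors:
        return None  # nothing to project from
    return mean(neighbors)


def runs_vs_projection(runs_by_year, year):
    """Actual performance minus the projection; positive means over-performed."""
    projected = project_season(runs_by_year, year)
    if projected is None:
        return None
    return runs_by_year[year] - projected


# Made-up run totals for a hypothetical hitter, not real data.
hitter = {1970: 60.0, 1971: 70.0, 1972: 85.0, 1973: 66.0, 1974: 64.0}
diff_1972 = runs_vs_projection(hitter, 1972)
print(diff_1972)
```

For a single player the difference says little about the manager; the idea is to add up these differences across every player a manager had over a long career.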

Goldman criticizes this, and his criticism has a core of truth, especially on the micro level. He is absolutely right when he writes “players exceed or fall below projections for many reasons: injury, a happy new marriage, a nasty divorce, a taste of one of the magical elixirs in the PED cabinet. None of these elements reflect in any way on the manager.”

Any time you’re looking at just one player, any variation in projection tells you about the player—and virtually nothing about the manager. How could it? There are too many other factors muddying the waters; factors including injuries, divorces, marriages, and magic elixirs.

But I noted that in my book. Heck, before daring to introduce any results from the Birnbaum Database, I spend several pages discussing its flaws and limitations. A lot of the issues, especially the ones Goldman raised, can be minimized (albeit never perfectly solved) using sample size.

Let’s look at Earl Weaver. The Birnbaum Database rates his pitchers at +409 runs relative to expectation and his hitters at +183 runs. That places him among the best managers in history. This is a man who lasted about 2,500 games as a manager. Does it really sound reasonable to presume that his players were just that much more happily married than all opposing clubs? Doubtful. Was 1970s Baltimore some early and unknown haven for PED usage? Color me skeptical.

Sample size, it comes in handy. There’s a reason why the book focuses on managers with longer careers. They’re the ones where the numbers are less distorted by outside factors.

Look, these outside factors Goldman notes still distort things. The numbers on Weaver aren’t perfect. Nor are they for Tony La Russa or John McGraw or anyone. My book never claims they’re perfect. It’s just that imperfect isn’t a synonym for useless. Goldman’s comments are valid with regard to individual players, but the larger the sample size, the less valid those comments become.

For that matter, even with the increased sample size of a longer career, there are still times I have serious reservations with regard to the numbers. In fact, most of the managerial commentary on Don Zimmer and Terry Francona focuses on how I disagree with the Birnbaum Database.

One last note along these lines. Goldman also throws in injuries as a sign of how players might miss their projections. Yeah, on the micro level, that’s true. But what happens if, say, a manager has a whole ton of pitchers go down in injured tatters on his watch? Doesn’t that tell us something about a manager? If not, a hell of a lot of people owe Dusty Baker (among other managers) their apologies. For that matter, if a manager has a track record of players staying healthier than normal under his watch, that can be a sign he knows how to take care of them.

Mind you, the Birnbaum Database isn’t the only place sample size issues emerge. Near the end of the chapter, Goldman presents several examples of how it’s difficult to evaluate managers, and much of this also depends on small sample sizes. Goldman mentions how former Orioles manager Hank Bauer once engineered a trade for Billy Williams only to be vetoed by his bosses. Goldman also mentions Earl Weaver and the one year his players took control of in-game management on their own, and the team got better.

Look, if you start at the micro level trying to analyze each specific decision or even each individual season, you’ll go crazy trying to evaluate managers. You have to take the long view. I think getting the bigger picture first works better. Then, once you find trends, you can use specific examples to characterize the manager, but don’t start with specifics. If you do, you’ll never see the forest for the trees.

Department of huh?

Sometimes I didn’t quite get what the criticism was, or a passage read like a criticism but I didn’t see how it was one.

Let’s give an example, going back to what Goldman says about Click and me. Some background: Click wrote a chapter on managers in Baseball Between the Numbers that featured a one-page study I mentioned earlier on a manager’s impact on players. The rest of the chapter focuses on other matters such as baseball strategy and in-game decisions.

First, Goldman quotes my objections to Click’s study and then offers his response. After noting that I misstated Click’s conclusion, Goldman writes:

The key to making an argument about any subject, be it managers or murderers, is to present evidence. In the case of managers, a skipper who possessed the skill to consistently alter some attribute of his clubs would manifest that ability consistently and in a way we could document. While many managers do tend to mold their teams in certain characteristic ways over time (something that Jaffe’s book excels in demonstrating), there is no evidence that over time managers can have more than a small positive influence on the outcome of a given contest or season in terms of his on-field impact, be it through his tactical choices or in some Svengali-like effect that hypnotizes batters or pitchers to perform in a way that was dramatically different than they might have otherwise. This is distinct from how a manager might positively influence a club through the way he shapes the work environment or psychologically affects certain players, but these are impossible to pin down statistically.

There’s a lot in that quote that made me scratch my head. First—and this is easy to miss—the focus actually shifts at the outset. Please remember that just before this section, Goldman was recounting my dispute with Click’s study on managerial impact on individual batters. Here, immediately after implying I lack evidence (more on that in a second), Goldman talks about in-game strategy (“his tactical choices”).

Wait, the issue I had with Click’s study wasn’t about tactics, but about impact on individual batters. That’s coaching, not tactics. There’s a lot on tactics in Click’s chapter, but that wasn’t what the dispute was about.

Well, that bit about tactical choices was just a clause in a longer sentence. Yeah, but the rest of the sentence (the part about “some Svengali-like effect”) left me wondering, too. Specifically, it seemed like the very next sentence contradicted it. When I read the Svengali line, I figured Goldman was doubting that managers can have any meaningful impact on their players behind the scenes, either in terms of coaching or psychologically. However, Goldman’s very next sentence says managers can have a positive psychological impact.

But there’s still one big “huh?” left from near the top of the quote. What’s this about lacking evidence? Actually, I provided some evidence that the Birnbaum Database shows managers can impact player performance. I’ll be the first to admit it’s not conclusive, but proof and evidence are different words.

I divided all baseball games into those managed by men who lasted 2,000 or more games, 1,000-1,999 games, 500-999 games, or 499 or fewer games. Then I ran those four groups through the Birnbaum Database to see if the results looked more like luck or managerial skill. If they were luck, you’d expect the managers who lasted over 2,000 games to have the most average score. After all, luck should even out over time, and a minimum of 2,000 games is a long time. If it’s skill, you’d expect the 2,000s to be best. The results provided evidence of managerial skill to improve (or worsen) player performance.
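
The grouping step can be sketched in a few lines of Python. This is only a rough illustration: the four career-length bands come from the description above, but the comparison metric (runs versus projection per 162 games), the function names, and every number in the sample are assumptions made up for the example, not figures from the book.

```python
from collections import defaultdict

# Career-length bands from the text, checked from longest to shortest.
BUCKETS = [
    (2000, "2,000+"),
    (1000, "1,000-1,999"),
    (500, "500-999"),
    (0, "499 or fewer"),
]


def bucket_for(games):
    """Return the career-length band for a manager's total games."""
    for floor, label in BUCKETS:
        if games >= floor:
            return label
    raise ValueError("games must be non-negative")


def runs_per_162_by_bucket(managers):
    """Pool (games, runs_vs_projection) pairs by band and scale to 162 games.

    Assumption: a per-162-game rate is a fair way to compare bands of
    very different total size. The book may compare them differently.
    """
    totals = defaultdict(lambda: [0, 0.0])  # label -> [games, runs]
    for games, runs in managers:
        entry = totals[bucket_for(games)]
        entry[0] += games
        entry[1] += runs
    return {
        label: 162 * runs / games
        for label, (games, runs) in totals.items()
        if games > 0
    }


# Invented (career games, runs vs. projection) pairs, not real managers.
sample = [
    (2500, 500.0), (2100, 190.0),
    (1200, 30.0),
    (700, -35.0),
    (300, -20.0), (100, -10.0),
]
rates = runs_per_162_by_bucket(sample)
for _, label in BUCKETS:
    print(label, round(rates[label], 2))
```

If the longest-tenured band hugs zero, that looks like luck evening out; if it sits clearly above the others, as in this invented sample, that looks more like skill.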

So, immediately after discussing my dispute with Click, Goldman 1) says I don’t provide evidence (even though I do have some non-conclusive evidence), 2) brings up in-game tactics (which aren’t related to the debate), 3) doesn’t seem to think managers meaningfully help players when it comes to the human side of managing, but 4) then says they can meaningfully help individual players when it comes to the human side of managing. Huh?

There are other parts that just struck me as off. In fact, the time Goldman first mentioned my book seemed a bit off. Goldman said I wrote my book in response to Click’s Prospectus work.

Wait, what? That’s news to me. Phil Birnbaum first presented his work at a 2005 SABR convention, and that was my big inspiration, not Click. I’d already finished my first wave of research before coming across Click. Frankly, Click’s study was always something of a weird adjunct to work I’d already done. I knew I had to discuss it, but it was off my focus.

There’s one last bit dealing with Click. (Maybe this is too much on Click, but since Goldman implies Click inspired my book, I feel the need to clarify some points). In my book, I noted Click’s study and commented: “I have an admitted bias: I believe managers matter. To convince me otherwise would take more than an equation, no matter how brilliant its math. I need a clear and coherent argument based on thoughts instead of double regression studies and metrics. It takes words, not numbers, to convince me otherwise.”

Goldman calls me out on this, saying “Jaffe tries to have it both ways, insisting that managers are about human interactions and not equations, but then offering his own equations in defense of managers.”

Here’s the thing: I wasn’t opposed to a mathematical formula for managers, just that I’d like more than just a formula by itself. Click just spent a page presenting his study, stating the results from his R-squared test and moving on. If you’re going to convince me that Click’s right, I need more than the math. It’s not either/or. It’s not math or an explanation, but math and an explanation. Click only gives the math, so I have trouble being convinced.

In my work, I tried to have both: give numbers, and give numbers I could explain and understand. To be fair, reading what Goldman quotes makes it sound like I was saying it’s either/or.

There are other points I could mention that don’t deal with the Click section, but some of it is ticky-tack stuff, and this section has gone on long enough. The Click stuff had me the most bewildered.

Dealing with uncertainty

One theme I found particularly jolting and that caught me off guard is Goldman’s belief that managers do have an impact on the way players perform. He even goes so far as to flatly state: “The human element of managing is everything” (italics in the original).

First, I agree. Second, the main appeal of the Birnbaum Database for me was that it put some (imperfect) numbers on the above. But that’s why I was so caught off guard. Steve Goldman is not the first person to criticize my work or methodology. But most others who do it would never write the sentence above about the human element.

I’ve come across two main responses to my book. Some refuse to believe the human element matters unless I can offer inarguable proof it does. I can’t, so they have no interest in my work. Others agree that it matters, and those people generally go along with the Birnbaum Database. With Goldman, we have someone who agrees that the human side of managing is vital but wants nothing to do with the Birnbaum Database.

Ultimately, I think it boils down to how much uncertainty you’ll tolerate. You can never perfectly isolate managers, but I think the Birnbaum Database gives a good idea of what impact they have (given a large enough sample size).

Look, Steve Goldman is a terrific writer who has consistently done excellent work for years. He makes some good points, but ultimately I still stand behind Evaluating Baseball’s Managers.

References & Resources
Steve Goldman’s chapter, “How Can We Evaluate Managers,” is on pages 232-256 of Extra Innings.

James Click’s chapter, “Is Joe Torre a Hall of Fame Manager,” runs from pages 139-156 in Baseball Between the Numbers. Click’s study, “Improving Individual Batter Performance,” is on pages 152-53 of that chapter.

Almost all the information from my book, Evaluating Baseball’s Managers, that Goldman disputes comes from the first chapter. That’s where the Birnbaum Database is debuted and where I mention Click. The parts on Casey Stengel come in Chapter Seven, on pages 190 to 195.


7 Comments
bucdaddy
11 years ago

how each boss handles tough situations increases or decreases the productivity of her or his workers.
—-
This may be true for normal human beings, but I dunno about this at the MLB level, or at the level of any elite athlete. These guys are among the 1,000 best in their professions IN THE WORLD. I like to think (without any evidence to support it) that if these athletes were that easily “distracted,” to use the term the media just love, they wouldn’t be playing long in professional ball. Guys who let outside events influence their performance get, I hope, weeded out at about the Class A level. Top pro athletes are among the most focused people I’ve ever met. I think the ability to leave the outside issues outside the park and concentrate on playing ball for three or four hours a day has to be a character trait of the top athletes.

Obviously, everyone can point to a handful of players who weren’t up to the task. I’m speaking generally, and I would also add that happy athletes don’t always make the best teams. Your early ‘70s A’s and your Reggie-era Yankees are notable for their clubhouse wars, as well as their world championships.

TC
11 years ago

I have always been interested in the way fans understand the job of managing or coaching. One of the most common examples of this is in reading fan evaluations of the job a manager has done tactically. He should have hit and run here or let him swing away there. He should be playing this player in this spot or not. The assumption by the fan is that the manager apparently is too stupid to know the basics that he, the fan knows and thus is proof that the manager should be fired. This usually accompanies a team that is struggling and fans blaming the manager is the first remedy, at least in their minds.
But this kind of surface criticism is typical. For the White Sox last year, the fans had a field day. Why is Ozzie playing the bum, get him out of there. Look at the rookie, or the guy down on the farm. No amount of logic will assuage the anger. Adam Dunn collapsed. We could all see that but the Monday morning quarterbacking was ridiculous. It is not uncommon for power hitters to go through power outages, even for extensive periods of time. The problem with power hitters is that unless there is something obvious, like an injury or substance abuse, all you can do is keep playing them because in most cases they will return to form. So Ozzie did the only thing you can do and got vilified for it. Now I’m not saying it was the only thing he got vilified for, but the fan commentary on the subject was ludicrous and demonstrates once again the lack of understanding of people who have never played the game or managed anything.
I don’t know of any statistical way to evaluate what a manager or coach does. Having played and coached the game it is my view that what goes on between the lines is infinitesimally less important than what goes on outside of them. But whether the team fails or succeeds frequently has nothing to do with the manager. Who chose the players? Usually, not the manager. If you don’t have the horses, it doesn’t matter how good at player relations you are, your team isn’t going to win. I think a classic example of this is in another sport, basketball. Was Phil Jackson the greatest coach ever or was he a guy who was really good at getting the most advantageous jobs?
In any case, he could have seriously screwed up those jobs and because he didn’t he deserves credit. Clearly he had a style of coaching that was effective. And having Michael Jordan and Kobe Bryant didn’t hurt either. But Doug Collins had Jordan and didn’t win a championship? Was it just the timing? Jordan wasn’t quite ready to lead a team that far? Impossible to say. But what anyone who has ever played a sport should know is how far you’ll go, how hard you’ll work, how dedicated you’ll be when you have the correct form of motivation that works for you. Stating through any statistical measurement that Phil Jackson is the greatest is for people who need certainties about uncertain concepts.
Take Yankee managers, especially the ones prior to the institution of the draft. When the amateur ballplayer was able to sign with whatever team they wanted, the best frequently chose the Yankees. They had the most money and the greatest rep. But after the draft, what a collapse. Were the managers suddenly more stupid? Your average fan would say yes and did so repeatedly. Suddenly there is free agency, once again Yankee wealth reigned supreme and look who’s back. Where did the managers fall in all of this? They got fired and hired with a predictability that all of the other teams in baseball had been experiencing for decades. So how good was Stengel? Measure that.
To bucdaddy I would say, human beings are human beings. Doctors have slumps, lawyers have slumps so why wouldn’t a professional athlete? To assume they should be above that is, I’m sorry bucdaddy, but just plain naive. What, they’re getting a big paycheck so they aren’t supposed to have issues?
A good manager/coach recognizes this and adjusts his motivations to fit the current situation. But how are we, who are not in the clubhouse and without this personal information to know whether that manager/coach is handling these situations properly? No statistics will tell, no matter how big the sample, because no two situations are the same, and I am speaking about total team situations.
Honestly, I believe that most coach hiring/firing is about bones being tossed to the irate fans. Shuts them up for a while, until whatever the long term plan hopefully kicks in.
All discussions of how someone or some team is successful or not are inherently interesting if you love the game. Something is always learned. I liked Jaffe’s article but he is essentially acknowledging the difficulty in measuring, anything really, but managers in particular.

Brandon Isleib
11 years ago

“[P]layers exceed or fall below projections for many reasons: injury, a happy new marriage, a nasty divorce, a taste of one of the magical elixirs in the PED cabinet. None of these elements reflect in any way on the manager.”

The elements don’t reflect on the manager, but the manager can handle each of those elements differently, yes?  If so, then the manager can make a difference in those areas.

About 15 months ago, I had a great-uncle and a grandfather on the same side of the family die on consecutive Sundays while I was moving to a new apartment.  My boss was very reasonable and understood the need to get myself together.  As you might expect in my line of work as a lawyer, not every supervising lawyer you find is this reasonable; I’ve worked for one who probably would have been upset if I took off too much time for it.

Rather obviously, how each boss handles tough situations increases or decreases the productivity of her or his workers.  There’s plenty of managerial theory on this; I don’t know any of it, but it exists.

Importantly for Goldman’s criticisms and your premises, these are assumptions fundamental to work environments and managerial theory, NOT specifically to baseball.  Ordinarily, modern baseball evaluation has no need to import such ideas, making it feel weird to do so if you’re completely immersed in sabermetrics.  But it’s not like you were working with some novel idea about people and management.  If I remember your book content correctly, the premise on this was: “If the types of effects that bosses normally have on employees have a translation between managers and players, it would show up in these statistics and those effects,” and there was enough correlation to paint a reasonable picture on managerial effects.  The “if there is an effect, it would show up in X, so let’s see if it does” reasoning is straight out of Bill James.  I can’t argue with it.

The Goldman comments strike me as a case of tunnel vision.  I suppose your micro-v.-macro idea represents that, so I suppose this is my way of agreeing with you.  But if nobody’s saying it can be done at the micro level, why did Goldman feel the need to respond to you?  I too am puzzled by this.

Ben
11 years ago

Chris,

I admit that it’s been years since I’ve read the Click study, and I haven’t read your book or Goldman’s response.  However, I was wondering if you addressed possible collinearity of other coaches with managers in your study.  Specifically, I would guess that we don’t have a lot of variation in coaching staffs under a given manager, and therefore it would be hard to separate the effect of the manager from that of his coaches.  If there’s a lot of collinearity in coaching staffs, we wouldn’t be able to attribute the effect to the manager, but rather the staff as a whole.  Just curious to know if you’d looked into that at all.

Chris J.
11 years ago

Ben – in Chapter 1, I bring up coaches.  Before giving the results to the Birnbaum Database, I note there are many other factors that muddy the waters. Coaches are mentioned as one of the factors and I discuss that for a little bit.

Ultimately, there’s no way to separate them.

bucdaddy
11 years ago

TC,

I’m not saying these guys have no feelings. I’m saying I think their ability to put those feelings in a box and play ball while they’re on the job more than most ordinary people must be a trait of a successful athlete. I’m saying I think most of what outsiders would term “distractions” have little bearing on performance, certainly not to the degree media and fans think they do.

No doubt having a sick kid or a marriage falling apart or something is going to weigh on a player’s mind. But if you’re up at home plate thinking about the sick kid or the failing marriage, and not the fastball headed for your head, that’s a good way to get killed.

I think the “distractions” stuff is largely made up by the media because the media have pages and air time to fill and not enough to write and talk about. I also think ordinary people believe that if THEY had a sick kid or a failing marriage, it certainly would affect them at their jobs, so it MUST affect athletes too, no matter how many times they tell us it doesn’t.

But top-level pro athletes aren’t like ordinary people. They didn’t get there by being ordinary people. Lots of people with great physical gifts don’t make it to the top because they lack the mental make-up. People who have both, people who are physically AND mentally strong, are not going to be easily distracted from the task at hand.

Somebody should do a study somehow of how outside stresses affect the performance of major-league athletes. Maybe it would bury this “distractions” stuff once and for all.

TC
11 years ago

Hey bucdaddy, Don’t know if you’re still following this string but I want to respectfully disagree with you. There is no question that a professional athlete has greater powers of focus and concentration than the rest of us, at least in their specialty. But even within the more rarefied atmosphere that they operate, there are ups and downs. So many things contribute to this variance that it is impossible to catalog. Volumes have been written about slumps to no avail. A coach, in most cases, cannot resolve the issues that create a slump. What a coach can do is work with the athlete to create a realistic approach that allows the athlete to depressurize and refocus solely on solutions and stop focusing on their continuing failure. I don’t know whether Guillen did this for Dunn, the expectations were so high and the failure so great it might not have made any difference, but I do know that similar kinds of failure happen to athletes throughout their careers. It’s why hitters, for example, don’t hit .330 every season. And life is full of distractions. If you’re a rich athlete, very successful, lots of investments, lots of people coming at you with energy and concentration diverting schemes and ideas, it’s not that you’re standing at the plate contemplating investing in yet another hotel deal, but if you’re not in exactly the right “groove” because of all the energy drain, you just might get hit in the head. More likely you’ll just have a bad day/week/month at the plate. Yes, they operate on a higher plane of focus in their field than you or I but they are still human, and being human means not having machine-like consistency. And that is where good coaches/managers make a difference. And as I said about Stengel, measure that.