Book excerpt: Evaluating Baseball’s Managers, 1876-2008

by Chris Jaffe
November 16, 2009

For the next few weeks, THT will publish excerpts from my new book Evaluating Baseball’s Managers, 1876-2008, which is scheduled for release in December, but can be ordered now. Please note, as an added bonus, and for a special limited time only, if you order it now you will get at no additional charge—new book smell!! (I should note I make considerably more money if you order directly from the publisher, but if you want to get it from Amazon or another source that’s your call.)

(The following excerpt comes from Evaluating Baseball’s Managers Chapter 2: Evaluating Managerial Performance. This is actually a heavily edited and condensed version of that chapter.)

To really understand a manager, comprehending his inclinations and peculiarities is vital. Ideally, one would know who preferred veterans and who used kids; which ones bunted and which did not; who relied on contact hitters and who depended on power; which manager relied on his bench and who used his starters; who went to the bullpen and who relied on his starters, and so on.

These issues and many similar ones can be answered using the basic statistical record, but the raw stats can be deceptive because they provide information about both the manager and when he managed. For example, say one wanted to find out which managers had the most or least interest in using relief pitchers. All current skippers use relief pitchers more than anyone previously. However, this does not mean that any modern manager, whether it is Ron Gardenhire or Mike Scioscia or whoever, would use the same number of relievers if placed back in the 1930s as he did in 2006.

Before judging managers across eras, it is necessary to first compare them to their contemporaries. In his book on managers, Bill James had a few pages showing which managers’ clubs led the league in various categories the most often. That approach can be taken deeper. Instead of relying only on first-place finishes, it would be better to account for every ranking a manager had in a given category.

The Tendencies Database

Sticking with relief pitchers, manager Buck Ewing was one of the game’s leading users of the bullpen in the 1890s. He used a few dozen relievers a season, which consistently put him among the league leaders of his day. Ewing possessed good pitching staffs so necessity did not cause this; it was his managerial inclination. Here are the teams Ewing managed, how many relievers he used, and where he ranked:

Year	Team	RP	Rank
1890	NYP	23	2nd of 8 teams
1895	CIN	35	2nd of 12 teams
1896	CIN	26	5th of 12 teams (tied)
1897	CIN	38	1st of 12 teams
1898	CIN	27	1st of 12 teams (tied)
1899	CIN	28	3rd of 12 teams (tied)

Two problems exist with this data. First, there are several ties. Ideally, Ewing should be cleanly separated from those surrounding him. Second, and far more importantly, there is a difference between ranking second in eight- and 12-team leagues.

Fortunately, both problems can be solved. For ties, use relievers per game instead of relief pitchers used. In baseball’s early decades, teams frequently finished the season having played slightly different numbers of games owing to rainouts and darkness. Nowadays that is rarely the case, but the much greater number of relief appearances makes ties considerably less common. Using relievers per game has a second and much more important advantage: it makes the results more precise.

To adjust for league size, divide a squad’s rank by the number of teams in their league. For example, coming second in the eight-team 1890 Players League would be worth 0.250 (2/8), and 0.167 in the 12-team 1895 National League (2/12). From there, one can figure out a career average. Apply both adjustments to Ewing and here are the results:

Year	Team	Rank	Teams	Avg
1890	NYP	2	8	0.250
1895	CIN	2	12	0.167
1896	CIN	5	12	0.417
1897	CIN	1	12	0.083
1898	CIN	2	12	0.167
1899	CIN	4	12	0.333

The lower the score, the more a manager used relievers; the higher the score, the less inclined he was. Ewing’s score above averages out to 0.236. Coming in second in an eight-team league was 0.250, so he was a bit more extreme than that.
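The arithmetic behind that career average can be sketched in a few lines of Python. This is only an illustration of the rank-divided-by-league-size step described above; the function and variable names are made up for the example and are not taken from the book's database.

```python
# Ewing's (year, rank, teams in league) rows, copied from the table above.
ewing = [
    (1890, 2, 8),
    (1895, 2, 12),
    (1896, 5, 12),
    (1897, 1, 12),
    (1898, 2, 12),
    (1899, 4, 12),
]

def rank_share(rank, teams):
    """Rank divided by league size; lower means heavier reliever use."""
    return rank / teams

shares = [rank_share(rank, teams) for _, rank, teams in ewing]
career_average = sum(shares) / len(shares)
print(round(career_average, 3))  # 0.236
```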

Seems nice, but a quirk needs to be resolved. The midpoint shifts based on league size, which causes the average score to fluctuate. That matters because examining managers across all eras requires centering every score on the same point. If managerial scores float in space, with no constant fixed point of reference for all of them, comparisons across the decades are impossible.

Take an eight-team league like the 1890 Players League. To figure its midpoint, average all the scores: one-eighth plus two-eighths plus three-eighths, and so on up to eight-eighths, all divided by eight. The midpoint averages out at 0.5625, which is precisely halfway between four-eighths (0.500) and five-eighths (0.625). Similarly, in a twelve-team league like the 1895 National League, the midpoint is again halfway between the two middle markers. However, this time those are six-twelfths (0.500) and seven-twelfths (0.583). Thus the 1895 NL has a midpoint of 0.5417, a bit lower than that of the Players League.

It seems odd that the leagues would have different midpoints, but it makes sense. Ultimately, no matter what the league size is, last place will always be the same: eight-eighths is one, as is twelve-twelfths, and sixteen-sixteenths. First place, by contrast, constantly changes: one-eighth is 0.125 and one-twelfth is 0.083. If one endpoint always moves and the other never does, the center shifts.
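For anyone who wants to check the midpoint arithmetic, here is a minimal Python sketch. The function name is invented for illustration. The closed form in the comment, (teams + 1) / (2 × teams), is just the same average simplified.

```python
def league_midpoint(teams):
    """Average of rank/teams over every rank from 1 through teams.

    Algebraically this equals (teams + 1) / (2 * teams).
    """
    return sum(rank / teams for rank in range(1, teams + 1)) / teams

print(round(league_midpoint(8), 4))   # 0.5625
print(round(league_midpoint(12), 4))  # 0.5417
```

Because last place is always 1.000 while first place shrinks toward zero as leagues grow, the midpoint drifts down toward 0.500 in larger leagues.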

Fortunately, a simple fix exists. Take Ewing’s score for each year, and divide it by the league average. For 1890, that is 0.5625. For the other years it is 0.5417.

Year	Team	Rank	Teams	Avg	LgAvg	Final
1890	NYP	2	8	0.250	0.5625	0.444
1895	CIN	2	12	0.167	0.5417	0.308
1896	CIN	5	12	0.417	0.5417	0.771
1897	CIN	1	12	0.083	0.5417	0.154
1898	CIN	2	12	0.167	0.5417	0.308
1899	CIN	4	12	0.333	0.5417	0.617

Ewing’s score works out to 0.434 with relief pitchers—still better than coming in second place in an eight-team league.
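The whole adjustment can be put together in a short Python sketch. The names here are hypothetical, and this is only a reading of the steps described in this excerpt, not the actual code behind the database. Unrounded arithmetic lands within a few thousandths of the figures printed above (a career mark of roughly 0.433), and the small gap is presumably rounding at intermediate steps.

```python
def league_midpoint(teams):
    """League-average rank share: (teams + 1) / (2 * teams)."""
    return (teams + 1) / (2 * teams)

def tendency_score(rank, teams):
    """Rank share divided by the league midpoint, so an average team scores 1."""
    return (rank / teams) / league_midpoint(teams)

# (year, rank, teams in league) for Ewing, copied from the table above.
ewing = [
    (1890, 2, 8),
    (1895, 2, 12),
    (1896, 5, 12),
    (1897, 1, 12),
    (1898, 2, 12),
    (1899, 4, 12),
]

scores = {year: tendency_score(rank, teams) for year, rank, teams in ewing}
career = sum(scores.values()) / len(scores)

for year, score in scores.items():
    print(year, round(score, 3))
print("career", round(career, 3))
```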

With this method, an average score is always one; that is how the math works when you always divide by league average. This formula can allow for comparisons between different managers across the generations. Better yet, the formula does not just work for relief pitchers used. One can use this method for anything. Just choose a stat, and plug it in. I created a database to handle this—the Tendencies Database.
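The claim that the average always comes out to one is easy to verify. The sketch below, with invented names, scores every rank in leagues of several sizes and averages the results.

```python
def tendency_score(rank, teams):
    """Rank share divided by the league midpoint (teams + 1) / (2 * teams)."""
    midpoint = (teams + 1) / (2 * teams)
    return (rank / teams) / midpoint

def league_average(teams):
    """Mean tendency score across every rank in a league of the given size."""
    scores = [tendency_score(rank, teams) for rank in range(1, teams + 1)]
    return sum(scores) / teams

for teams in (8, 12, 16, 30):
    print(teams, round(league_average(teams), 6))
```

Every league size prints an average of 1.0, which is what lets scores from an eight-team league sit alongside scores from a thirty-team one.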

Guidelines for the Tendencies Database

The Tendencies Database only examines those who managed in the major leagues 10 or more seasons. (By this, I mean he should have managed the majority of a season at least ten times.) Playing talent always matters the most in determining what happens in a season, but over the years managers shape and guide a team in a manner that suits their tendencies. For example, if a manager really believes in contact hitting, team strikeouts should decline. Strikeout rate will play a larger role in determining playing time. The manager can coach the hitters to make contact, and/or hire a hitting coach to do that. In the old days the manager would make roster decisions that general managers currently cover. Even to this day, managers who last a long time on the job usually have good working relationships with their bosses; that relationship includes input into roster construction. Thus managers can affect stats containing no obvious managerial influence.

Several oddities arise in the Tendencies Database. For one, based on the way the database is set up, the man with the most relievers (or any other stat) ends up with the lowest score. That might seem backwards, but it makes sense. The Tendencies Database is not accounting for the raw number, but the rank. Teams with the most relief pitchers used or home runs hit always rank first—the lowest ranking of all.

The Ewing example also showcases the importance of adjusting for context, which should be done whenever possible. For example, when looking at sacrifice bunts, the raw number can be misleading. For a sacrifice to occur, a runner must be on base. If one team with a .350 OBP performed 100 sacrifice bunts, and another squad with a .310 OBP had 95, the latter liked to bunt more. The Tendencies Database adjusts for context whenever possible, which both makes the results more precise and reduces ties.
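To see why the second team in that example is the heavier bunter, here is a rough Python sketch. The excerpt does not give the exact adjustment the database uses for sacrifices, so the denominator below (on-base percentage times plate appearances) and the 6,000 plate appearances are assumptions made purely for illustration.

```python
def bunts_per_baserunner(sacrifices, obp, plate_appearances):
    """Sacrifice bunts divided by an estimate of how often batters reached base.

    Assumption: baserunners are approximated as OBP * plate appearances.
    """
    estimated_baserunners = obp * plate_appearances
    return sacrifices / estimated_baserunners

PLATE_APPEARANCES = 6000  # assumed to be the same for both teams

team_a = bunts_per_baserunner(100, 0.350, PLATE_APPEARANCES)
team_b = bunts_per_baserunner(95, 0.310, PLATE_APPEARANCES)

print(round(team_a, 4), round(team_b, 4))
print(team_b > team_a)  # True: fewer bunts, but more per chance
```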

Managers who last only part of the season are a sore spot for the Tendencies Database. In those situations, a manager will be added into the database if he lasted at least half of the season. This is not a perfect solution, but frankly perfection is not an option.

Tendencies Database: Overview of the Results

Relief Pitchers Used

Since the example given at the chapter’s start focused on relievers, that will be the first item examined. As noted already, the measure is relievers per game rather than raw relief pitcher appearances. Without question the quality of starting pitchers plays a role, but it is very difficult to make these leaderboards without a clear managerial preference for calling on relievers. Here are the results:

Most Relievers	
Burt Shotton	0.378
Jimy Williams	0.440
Lou Boudreau	0.458
Whitey Herzog	0.551
Joe Cronin	0.607

Fewest Relievers	
Bill Virdon	1.609
Ralph Houk	1.608
Earl Weaver	1.566
Jimmy Dykes	1.522
Cito Gaston	1.520

Shotton’s score reveals how athletic talent and managerial temperament both play a role. Shotton ran the 1928-33 Phillies, who had no quality pitchers. Naturally, he led the league in relief pitchers used every season he managed in Philadelphia. There was more to it than that, though. His 1928 Phillies completed only 42 games, easily the fewest ever to that point in history. The previous record was 49, by an 1876 club that played 60 games. All previous twentieth century pitching staffs had at least 52 completions, and no club dropped below 42 until 1941. Though Shotton had a bad rotation, by itself that does not explain the tremendous drop-off in complete games.

On the other side, it is very interesting to see Virdon and Houk virtually tied for first. Though both men were known for relying on their starters, if you asked 100 serious baseball fans which managers avoided their bullpens the most, not many would say those two. It helps to crunch the numbers.

Hit and Run

That example was straightforward. However, with the Tendencies Database, there were times I tried to use the available data to make the best estimate for stats unknown. The hit and run is a good example of this. You do not find stats for this play for most of baseball history. What if you want to judge how recent managers compare to their predecessors in this strategy?

Time to think it through. A main reason teams call the hit and run is to avoid the double play, so grounding into double plays can serve as a rough approximation for it. The formula GIDP/(ROE+BB+HB+H-2B-3B-HR-SH-SB-CS) is double plays divided by the number of times someone made it to first without moving past the bag. In other words, double plays divided by chances to be doubled up. Admittedly, the hit and run is not the only thing that would cause teams to hit into or avoid double plays. However, over a decade or more those things often even out. It is exceptionally unlikely to end up at either extreme in the Tendencies Database unless a manager had a pronounced interest in or antipathy toward the hit-and-run. This is an imperfect guess, but perfection is not attainable, and running this formula through the database provides a reasonable approximation. GIDP info goes back to 1939 for the AL and 1933 for the NL.
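The formula above translates directly into code. The sketch below uses a hypothetical `TeamSeason` record and invented sample numbers, so treat it as one way to write out the calculation rather than the database's own implementation.

```python
from dataclasses import dataclass

@dataclass
class TeamSeason:
    """Team batting totals needed for the formula (field names are illustrative)."""
    gidp: int     # grounded into double plays
    roe: int      # reached on error
    bb: int       # walks
    hb: int       # hit by pitch
    h: int        # hits
    doubles: int
    triples: int
    hr: int
    sh: int       # sacrifice hits
    sb: int       # stolen bases
    cs: int       # caught stealing

def double_play_rate(t):
    """GIDP / (ROE + BB + HB + H - 2B - 3B - HR - SH - SB - CS)."""
    chances = (t.roe + t.bb + t.hb + t.h
               - t.doubles - t.triples - t.hr
               - t.sh - t.sb - t.cs)
    return t.gidp / chances

# Invented totals, not a real team.
sample = TeamSeason(gidp=120, roe=60, bb=500, hb=40, h=1400,
                    doubles=250, triples=40, hr=150, sh=60, sb=80, cs=40)
print(round(double_play_rate(sample), 4))
```

A low rate relative to the league would then rank near the top for hit-and-run interest once the rates are fed through the same rank-and-normalize steps used for relievers.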

Most Interest in the Hit-and-Run	
Birdie Tebbetts	0.548
Billy Southworth	0.606
Al Lopez	0.639
Sparky Anderson	0.647
Davey Johnson	0.664

Least Interest in the Hit-and-Run	
Tom Kelly	1.467
Connie Mack	1.389
Frank Robinson	1.323
Mike Hargrove	1.316
Bill McKechnie	1.302

Season Pace

While the hit-and-run is one of the more ambitious portions of the Tendencies Database, it has the benefit of being a well-known play. The database also allows one to check items that have drawn less attention. One such issue is how a manager paces his team over a season. Baseball Reference lists the first and second half winning percentages for every team in baseball history. Put them both in the database, and divide the former by the latter to see whose teams improved or declined as the campaign wore on.
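As a rough sketch of that step, the Python below divides first-half winning percentage by second-half winning percentage and orders a league by the result. The three teams and their records are invented, and ranking the smallest ratio first is my reading of how low scores line up with in-season improvement.

```python
def pace_ratio(first_half_pct, second_half_pct):
    """First-half winning percentage divided by second-half winning percentage.

    Below 1 means the team played better after the midpoint.
    """
    return first_half_pct / second_half_pct

# Invented (first half, second half) winning percentages.
league = {
    "Team A": (0.480, 0.560),
    "Team B": (0.550, 0.500),
    "Team C": (0.500, 0.500),
}

ratios = {team: pace_ratio(*halves) for team, halves in league.items()}
ranking = sorted(ratios, key=ratios.get)  # most improved first
print(ranking)  # ['Team A', 'Team C', 'Team B']
```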

A manager should influence pacing to some degree. He keeps the team motivated, prepared, and most importantly of all, healthy. Pitchers’ arms can turn to dust through overuse or grow rusty if not called on often enough. Position players can also be overworked or underutilized. The manager has to rest his batters often enough so that little aches and pains do not become full-blown injuries or nagging problems that can deplete their strength as the year continues. Managers are not the only factor that affects team pacing, but the larger the sample size, the more important their role becomes.

Improved as Year Went on	
Al Lopez	0.646
Frank Chance	0.667
Jimy Williams	0.734
Ned Hanlon	0.734
Billy Martin	0.744

Worsened as Year Went on	
Joe Cronin	1.393
Bill Terry	1.289
Johnny Oates	1.280
Danny Murtaugh	1.222
Miller Huggins	1.216

One key problem exists with this list: managers do not always last a full season on the job. This is especially important with Billy Martin. He gets full credit for the Yankees’ late-season, pennant-winning explosion in 1978 despite the fact that it came after the franchise replaced him. The sixth-best score for in-season improvement belongs to Earl Weaver, at 0.800.

Teams managed by Lopez, Williams, and Chance virtually always played better as the year progressed. By contrast, Cronin’s and Terry’s teams melted in the dog days of summer almost every year.

Starter Percentage

This is a simple concept—add together plate appearances by the starting eight batters (or nine in a league with the designated hitter) and divide that by the team’s total plate appearances. Some managers prefer a set lineup while others platoon or mix-and-match their starters. As always, managers are not the only factor affecting the results. Player health is the most important one in an individual season. Over a decade or more, however, injuries should even out.
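A minimal Python sketch of the calculation follows. One assumption to flag: it treats the "starting eight" as the eight batters with the most plate appearances, which is a simplification, since picking one regular per position could give a slightly different group. The function name and sample totals are invented.

```python
def starter_percentage(plate_appearances, lineup_size=8):
    """Share of team plate appearances taken by the most-used batters.

    plate_appearances: one PA total per batter on the team.
    lineup_size: 8 by default, 9 for a league with the designated hitter.
    Assumption: starters are the batters with the most plate appearances.
    """
    starters = sorted(plate_appearances, reverse=True)[:lineup_size]
    return sum(starters) / sum(plate_appearances)

# Invented team: eight regulars plus five bench players.
team = [650, 640, 620, 600, 580, 550, 500, 450, 300, 250, 200, 150, 100]
print(round(starter_percentage(team), 3))
```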

Use Starters the Most	
Frank Selee	0.580
Dick Williams	0.584
Danny Murtaugh	0.600
Joe McCarthy	0.657
Ralph Houk	0.658

Use Bench the Most	
George Stallings	1.387
Frank Robinson	1.381
Jim Fregosi	1.290
Paul Richards	1.258
Casey Stengel	1.242

Pat Moran, who only managed nine years, scored 0.543 in this query. Moran served as a backup catcher for list-leader Frank Selee at the turn of the century. One can easily imagine Moran sitting on the bench next to Selee, soaking up wisdom.

It is perfect that George Stallings tops the bench-usage list. As manager of the 1914 Miracle Braves he did more to popularize platooning than any other manager in baseball history. That strategy fell out of style, only to be reinvigorated in the 1950s when Casey Stengel used it while winning five straight world titles with the Yankees. Fittingly, Stengel also appears on that same list.

Please note that Gil Hodges, who died after managing nine seasons, posted a mark of 1.667. Hodges could have used his bench the least of any manager for several years and still comfortably topped everyone.

Top Three Pitchers

It would be nice to have a pitching version of starter percentage: something that indicates which managers relied the most on their frontline hurlers and who liked to spread the innings around. A cursory examination of baseball history reveals that some managers treated their aces like pack animals while others evenly doled out the work. For example, in the first decade of the 20th Century John McGraw squeezed every inning he possibly could out of Christy Mathewson and Joe McGinnity while rival skipper Frank Chance parceled out the innings among his arms more evenly. Distributing innings is not simply a matter of starters versus bullpens, either. For much of baseball history, managers used pitchers in both roles. Aces—including Carl Hubbell, Lefty Grove, and Mordecai Brown—led the leagues in saves. Besides, starter/bullpen inning splits only go back to the 1950s.

No perfect way exists to distinguish between the McGraws and the Chances. However, when perfection is unattainable, take the best imperfect approximation. There is an effective, albeit rough, way of reckoning who leaned the most/least on their main arms. Go through every team in baseball history, find each squad’s top three leaders in innings, add together their workload, and divide by the team’s total innings pitched.

Why focus on the top three workhorses? Making it just the ace or top two pitchers would cause the results to be excessively dependent on whether a team had a dominant hurler. Broadening it out to four or five pitchers primarily reveals the overall depth of the rotation rather than the manager’s predilections. Three provides a nice middle ground. Quality of the ace and staff depth each bleed in, but both are more subdued.
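The top-three calculation is simple enough to sketch in Python. The function name and the sample staff are invented. Innings should be passed as true decimals or as outs, since the common box-score notation (200.1 for 200 and one-third) would throw off the sums.

```python
def top_three_share(innings):
    """Innings thrown by the three busiest pitchers, divided by team innings.

    innings: one innings-pitched total per pitcher on the staff.
    """
    top_three = sorted(innings, reverse=True)[:3]
    return sum(top_three) / sum(innings)

# Invented staff: three workhorses and seven other arms.
staff = [320, 290, 260, 180, 120, 90, 70, 50, 40, 30]
print(round(top_three_share(staff), 3))
```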

Rely on Main Pitchers	
Tommy Lasorda	0.429
Earl Weaver	0.435
Bobby Cox	0.443
Al Lopez	0.588
Frank Selee	0.599

Spread Out the Innings	
Frank Robinson	1.331
Gus Schmelz	1.322
Jack McKeon	1.285
Jimy Williams	1.279
Frank Chance	1.273

John McGraw scores a 0.823 with this stat. Once McGinnity faded away, McGraw eased up on his main starters, a tendency that increased when Mathewson waned.

Frank Selee not only relied heavily on his front line talent, but his protégé, Pat Moran, had a score of 0.519. Clearly, these men had some similar thoughts on how to run a team. Then again, Selee’s first baseman with the Cubs was Frank Chance, who obviously took a very different approach to handling his pitchers. Sometimes managers emulate those they played under, as Moran did with Selee. Other times they move in the opposite direction, as was the case with Chance.

All the above examples of the Tendencies Database look at one stat at a time. However, since all of the database’s results are centered at one, they can be combined rather easily. Some statistics have an underlying philosophical similarity, and combining them gives you a better appreciation of how managers approached their job.

Hopefully these little charts made sense, because the second part of the book is littered with them. The Tendencies Database is based on the overriding belief that good enough trumps nothing when perfection is not an option.

References & Resources
I should note that Chapter Two is approximately twice as long as this excerpt. Among other things, I address a lot of side issues with the Tendencies Database (how it handles ties when they inevitably arise, what areas are worth studying with it, etc.).

In order to make this excerpt work, I not only edited it, but engaged in some minor revising on a few occasions. For example, the chapter actually starts out by saying: “While the Birnbaum Database provides an overview of managerial performance, it does not enlighten anyone about the inclinations and peculiarities of individual skippers. To really understand a manager, comprehending such details is vital.”

In context of the book, that makes sense because Chapter One focuses on the Birnbaum Database. Here, it doesn’t work like that, hence some minor revising. Something similar happens in the last sentence of the chapter—which notes that good enough beats nothing with both the Tendencies and Birnbaum Databases.

Here’s another example of a revision. In the actual book, the Hit-and-Run section begins: “The above examples are straightforward.” Well, as it happens, the Hit-and-Run example is the fourth one given in the book. (I opted not to include the sacrifice hit and stolen base ones, as they already appear in The 2008 THT Annual.) Thus a shift from plural to singular. This explains almost all revisions: they refer back to something previously in the book but deleted in the excerpt.

I’m not happy that any revisions were made. That goes against the spirit of offering excerpts in the first place. On the other hand, the entire chapter, even though it’s a short chapter, is too big to serve as one excerpt. I don’t want to divide it into two excerpts because I’d rather focus on managers than the math in these pieces. (I can only assume that’s why people are going to buy the book.) Still, I think the Tendencies Database is important enough to the book that it should become an excerpt, and the revisions really are quite minor and rarely occur.

7 Comments

Brandon Isleib

14 years ago

From what I’ve seen in 1920s data, player-managers tended to substitute in-game more than their dugout counterparts, perhaps because they were more in tune with how people were on the field. Does this hold true to any degree through time?

Gilbert

Not sure how many mgrs make up Brandon’s player mgrs but a psychological item may be in play: The player mgr may be younger and have previously been a player, and one of the needs is to show who is boss. So if you don’t run out a ground ball or throw to the wrong base, there is one way to show that kind of play is not tolerated.

David P. Stokes

Another distortion that might exist in the data is that not everything is called by the manager. For example, a pitch-out might be called by the manager, but it might also be called by the catcher. Even worse, the tendency for the players to call things themselves rather than the manager was almost certainly more pronounced in the past. Actually, one thing that would be interesting to see would be data about which managers let players call things more themselves, and which ones kept those decisions more tightly controlled, but I don’t see how it would be possible to figure out.

The two main managers that come to mind are Bucky Harris and Tris Speaker. Harris pinch-ran a lot and used relievers often; Speaker substituted in the outfield and first base often. Much of the time, however, the substitution was to take themselves out of the game, hence why Stuffy Stewart was a frequent Senators pinch-runner (since he could play 2B). Speaker removed himself in CF more than usual, although not often by modern standards. I don’t recall Ty Cobb subbing much at all, however.

KJOK

Most Relievers
Burt Shotton 0.378
Jimy Williams 0.440

but you just said:

Ewing’s score works out to 0.434 with relief pitchers

So, shouldn’t Ewing be #2 on the list instead of Williams?

KJOK, I believe Chris’s lists in the book have a games managed threshold, making Ewing a convenient example but insufficient for making the list. Could be wrong, though.

Chris J.

KJOK – Like Brandon says, there’s a year minimum: 10 years as a team’s primary manager (this is noted in the excerpt, in the first sentence after the Guidelines for the Tendencies Database header).

Ewing has a shorter career, which makes him easier to use as an example.

I don’t have anything to say about in-game subbings of player-managers versus others. I didn’t look into that. Sounds interesting, and there might be something to it, provided you still allow some latitude for individual variation.

David – there’s a bunch of things that can distort the numbers. I try to note/account for those things when I can. The overall theme of the book is that though there are imperfections in all the data, you can still find some useful knowledge from it.
