Book excerpt: Evaluating Baseball’s Managers, 1876-2008
by Chris Jaffe
November 16, 2009
For the next few weeks, THT will publish excerpts from my new book Evaluating Baseball's Managers, 1876-2008, which is scheduled for release in December, but can be ordered now. Please note, as an added bonus, and for a special limited time only, if you order it now you will get at no additional charge—new book smell!! (I should note I make considerably more money if you order directly from the publisher, but if you want to get it from Amazon or another source that's your call.)
(The following excerpt comes from Evaluating Baseball's Managers Chapter 2: Evaluating Managerial Performance. This is actually a heavily edited and condensed version of that chapter.)
To really understand a manager, comprehending his inclinations and peculiarities is vital. Ideally, one would know who preferred veterans and who used kids; which ones bunted and which did not; who relied on contact hitters and who depended on power; which manager relied on his bench and who used his starters; who went to the bullpen and who relied on his starters, and so on.
These issues and many similar ones can be answered using the basic statistical record, but the raw stats can be deceptive because they provide information about both the manager and when he managed. For example, say one wanted to find out which managers had the most or least interest in using relief pitchers. All current skippers use relief pitchers more than anyone did previously. However, this does not mean that a modern manager, whether it is Ron Gardenhire or Mike Scioscia or whoever, would use the same number of relievers in the 1930s as he did in 2006.
Before judging managers across eras, it is necessary to first compare them to their contemporaries. In his book on managers, Bill James had a few pages showing which managers' clubs led the league in various categories the most often. That approach can be taken deeper. Instead of relying only on first-place finishes, it would be better to account for every ranking a manager had in a given category.
The Tendencies Database
Sticking with relief pitchers, manager Buck Ewing was one of the game's leading users of the bullpen in the 1890s. He used a few dozen relievers a season, which consistently put him among the league leaders of his day. Ewing possessed good pitching staffs so necessity did not cause this; it was his managerial inclination. Here are the teams Ewing managed, how many relievers he used, and where he ranked:
Year  Team  RP  Rank
1890  NYP   23  2nd of 8 teams
1895  CIN   35  2nd of 12 teams
1896  CIN   26  5th of 12 teams (tied)
1897  CIN   38  1st of 12 teams
1898  CIN   27  1st of 12 teams (tied)
1899  CIN   28  3rd of 12 teams (tied)
Two problems exist with this data. First, there are several ties. Ideally, Ewing should be cleanly separated from those surrounding him. Second, and far more importantly, there is a difference between ranking second in eight- and 12-team leagues.
Fortunately, both problems can be solved. For ties, use relievers per game instead of relief pitchers used. In baseball's early decades, teams frequently finished the season having played slightly different numbers of games owing to rainouts and darkness. Nowadays that is rarely the case, but the much greater number of relief appearances makes ties considerably less common. Using relievers per game has a second and much more important advantage: it makes the results more precise.
To adjust for league size, divide a squad's rank by the number of teams in its league. For example, coming second in the eight-team 1890 Players League would be worth 0.250 (2/8), while coming second in the 12-team 1895 National League would be worth 0.167 (2/12). From there, one can figure out a career average. Apply both adjustments to Ewing and here are the results:
Year  Team  Rank  Teams  Avg
1890  NYP   2     8      0.250
1895  CIN   2     12     0.167
1896  CIN   5     12     0.417
1897  CIN   1     12     0.083
1898  CIN   2     12     0.167
1899  CIN   4     12     0.333
The lower the score, the more a manager used relievers; the higher the score, the less inclined he was. Ewing's score above averages out to 0.236. Coming in second in an eight-team league was 0.250, so he was a bit more extreme than that.
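The two adjustments can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the database itself: the name `rank_score` and the data layout are hypothetical, and the ranks are the tie-broken ones from the second Ewing table.

```python
# Buck Ewing's seasons: (year, team, rank in relievers per game, teams in league)
EWING = [
    (1890, "NYP", 2, 8),
    (1895, "CIN", 2, 12),
    (1896, "CIN", 5, 12),
    (1897, "CIN", 1, 12),
    (1898, "CIN", 2, 12),
    (1899, "CIN", 4, 12),
]


def rank_score(rank, teams):
    """Rank divided by league size; a lower value means heavier bullpen use."""
    return rank / teams


scores = [rank_score(rank, teams) for _, _, rank, teams in EWING]
career_average = sum(scores) / len(scores)

for (year, team, _, _), score in zip(EWING, scores):
    print(f"{year} {team} {score:.3f}")
print(f"Career average: {career_average:.3f}")
```

The career figure is a plain average of the yearly scores, so every season counts equally no matter how many games the team played.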
Seems nice, but a quirk needs to be resolved. The midpoint shifts based on league size, which causes the average score to fluctuate. That is incredibly noteworthy because to examine managers across all eras, it is necessary to have them centered the same. If managerial scores float in space, with no constant fixed point of reference for all of them, comparisons across the decades are impossible.
Take an eight-team league like the 1890 Players League. To figure its midpoint, average all the scores: one-eighth plus two-eighths plus three-eighths up to eight-eighths, divided by eight. The midpoint averages out at 0.5625, which is precisely halfway between four-eighths (0.500) and five-eighths (0.625). Similarly, in a twelve-team league like the 1895 National League, the midpoint is again halfway between the two middle markers. However, this time those are six-twelfths (0.500) and seven-twelfths (0.583). Thus the 1895 NL has a midpoint of 0.5417, a bit lower than that of the Players League.
It seems odd that the leagues would have different midpoints, but it makes sense. Ultimately, no matter what the league size is, last place will always be the same: eight-eighths is one, as is twelve-twelfths, and sixteen-sixteenths. By contrast, first place constantly changes. One-eighth is 0.125 and one-twelfth is 0.083. If one endpoint always moves and the other never does, the center shifts.
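The shifting midpoint is easy to check numerically. The sketch below averages rank/teams over every rank in a league and compares it with the closed form (teams + 1) / (2 × teams), which follows from summing 1 through n. The function name `league_midpoint` is hypothetical.

```python
def league_midpoint(teams):
    """Average of rank/teams over every rank from 1 to `teams`."""
    return sum(rank / teams for rank in range(1, teams + 1)) / teams


for teams in (8, 12, 16):
    # Closed form: (1 + 2 + ... + n) / n**2 simplifies to (n + 1) / (2n).
    closed_form = (teams + 1) / (2 * teams)
    print(teams, league_midpoint(teams), closed_form)
```

Because the closed form equals one half plus 1/(2 × teams), the midpoint drifts down toward 0.500 as leagues grow but never reaches it.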
Fortunately, a simple fix exists. Take Ewing's score for each year, and divide it by the league average. For 1890, that is 0.5625. For the other years it is 0.5417.
Year  Team  Rank  Teams  Avg    LgAvg   Final
1890  NYP   2     8      0.250  0.5625  0.444
1895  CIN   2     12     0.167  0.5417  0.308
1896  CIN   5     12     0.417  0.5417  0.771
1897  CIN   1     12     0.083  0.5417  0.154
1898  CIN   2     12     0.167  0.5417  0.308
1899  CIN   4     12     0.333  0.5417  0.617
Ewing's score works out to 0.434 with relief pitchers—still better than coming in second place in an eight-team league.
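For readers who want to replay the centering step, here is a minimal sketch under the same assumptions as the table: each season's rank/teams score is divided by that league's average score. The function names are hypothetical. Carried at full precision, the career figure comes out a shade under the 0.434 quoted above, presumably because the printed table rounds intermediate values.

```python
# Buck Ewing's (rank, teams in league) for 1890 and 1895-1899
EWING = [(2, 8), (2, 12), (5, 12), (1, 12), (2, 12), (4, 12)]


def league_average(teams):
    """Mean of rank/teams across every rank in a league of this size."""
    return sum(rank / teams for rank in range(1, teams + 1)) / teams


def centered_score(rank, teams):
    """Season score divided by league average, so a league-average team is 1."""
    return (rank / teams) / league_average(teams)


career = sum(centered_score(rank, teams) for rank, teams in EWING) / len(EWING)
print(f"Ewing, relievers: {career:.3f}")

# Sanity check: the centered scores of a whole league average exactly one.
whole_league = [centered_score(rank, 8) for rank in range(1, 9)]
print(sum(whole_league) / len(whole_league))
```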
With this method, an average score is always one; that is how the math works when you always divide by league average. This formula can allow for comparisons between different managers across the generations. Better yet, the formula does not just work for relief pitchers used. One can use this method for anything. Just choose a stat, and plug it in. I created a database to handle this—the Tendencies Database.
Guidelines for the Tendencies Database
The Tendencies Database only examines those who managed in the major leagues 10 or more seasons. (By this, I mean a man should have managed the majority of a season at least ten times.) Playing talent always matters the most in determining what happens in a season, but over the years a manager shapes and guides a team in a manner that suits his tendencies. For example, if a manager really believes in contact hitting, team strikeouts should decline. Strikeout rate will play a larger role in determining playing time. The manager can coach the hitters to make contact, and/or hire a hitting coach to do that. In the old days the manager would make roster decisions that general managers currently cover. Even to this day, managers who last a long time on the job usually have good working relationships with their bosses; that relationship includes input into roster construction. Thus managers can affect stats containing no obvious managerial influence.
Several oddities arise in the Tendencies Database. For one, based on the way the database is set up, the man with the most relievers (or any other stat) ends up with the lowest score. That might seem backwards, but it makes sense. The Tendencies Database is not accounting for the raw number, but the rank. Teams with the most relief pitchers used or home runs hit always rank first—the lowest ranking of all.
The Ewing example also showcases the importance of adjusting for context, which should be done whenever possible. For example, when looking at sacrifice bunts, the raw number can be misleading. For a sacrifice to occur, a runner must be on base. If one team with a .350 OBP performed 100 sacrifice bunts, and another squad with a .310 OBP had 95, the latter liked to bunt more. The Tendencies Database adjusts for context whenever possible, which both makes the results more precise and reduces ties.
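To make the bunting example concrete, here is a rough sketch. It assumes, purely for illustration, that both teams had 6,000 plate appearances and that times on base can be estimated as OBP times plate appearances. Neither assumption comes from the book, and the database's actual adjustment may differ.

```python
ASSUMED_PA = 6000  # hypothetical; the text gives no plate-appearance totals


def bunts_per_baserunner(sacrifices, obp, plate_appearances=ASSUMED_PA):
    """Sacrifice bunts per estimated time on base (OBP * PA)."""
    return sacrifices / (obp * plate_appearances)


team_a = bunts_per_baserunner(100, 0.350)  # more bunts, more baserunners
team_b = bunts_per_baserunner(95, 0.310)   # fewer bunts, fewer baserunners

print(f"Team A: {team_a:.4f} bunts per baserunner")
print(f"Team B: {team_b:.4f} bunts per baserunner")
```

Under these assumptions the second team bunts more often per opportunity despite the lower raw total, which is the point of adjusting for context.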
Managers who last only part of the season are a sore spot for the Tendencies Database. In those situations, a manager will be added into the database if he lasted at least half of the season. This is not a perfect solution, but frankly perfection is not an option.
Tendencies Database: Overview of the Results
Relief Pitchers Used
Since the example given at the chapter's start focused on relievers, that will be the first item examined. As noted already, the measure is relievers per game rather than raw relief pitcher appearances. Without question the quality of starting pitchers plays a role, but it is very difficult to make these leaderboards without clear managerial preference for calling on relievers. Here are the results:
Most Relievers            Fewest Relievers
Burt Shotton    0.378     Bill Virdon    1.609
Jimy Williams   0.440     Ralph Houk     1.608
Lou Boudreau    0.458     Earl Weaver    1.566
Whitey Herzog   0.551     Jimmy Dykes    1.522
Joe Cronin      0.607     Cito Gaston    1.520
Shotton's score reveals how athletic talent and managerial temperament both play a role. Shotton ran the 1928-33 Phillies, who had no quality pitchers. Naturally, he led the league in relief pitchers used every season he managed in Philadelphia. There was more to it than that, though. His 1928 Phillies completed only 42 games, easily the fewest ever to that point in history. The previous record was 49, by an 1876 club that played 60 games. All previous twentieth century pitching staffs had at least 52 completions, and no club dropped below 42 until 1941. Though Shotton had a bad rotation, by itself that does not explain the tremendous drop-off in complete games.
On the other side, it is very interesting to see Virdon and Houk virtually tied for first. Though both men were known for relying on their starters, if you asked 100 serious baseball fans which managers avoided their bullpens the most, not many would say those two. It helps to crunch the numbers.
Hit and Run
That example was straightforward. However, with the Tendencies Database, there were times I tried to use the available data to make the best estimate for stats unknown. The hit and run is a good example of this. You do not find stats for this play for most of baseball history. What if you want to judge how recent managers compare to their predecessors in this strategy?
Time to think it through. A main reason teams call the hit and run is to avoid the double play, so grounding into double plays can serve as a rough approximation for it. The formula GIDP/(ROE+BB+HB+H-2B-3B-HR-SH-SB-CS) is double plays divided by the number of times someone made it to first without moving past the bag. In other words, double plays divided by chances to be doubled up. Admittedly, the hit and run is not the only thing that would cause teams to hit into or avoid double plays. However, over a decade or more those things often even out. It is exceptionally unlikely to end up at either extreme in the Tendencies Database unless a manager had a pronounced interest in or antipathy toward the hit-and-run. This is an imperfect guess, but perfection is not attainable, and running this formula through the database provides a reasonable approximation. GIDP info goes back to 1939 for the AL and 1933 for the NL.
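The formula translates directly into code. In the sketch below the class, field names, and sample season line are all hypothetical (the numbers are made up to show the arithmetic, not taken from any real team). Only the formula itself comes from the text.

```python
from dataclasses import dataclass


@dataclass
class TeamBatting:
    gidp: int     # grounded into double plays
    roe: int      # reached on error
    bb: int       # walks
    hbp: int      # hit by pitch (HB in the formula)
    h: int        # hits
    doubles: int
    triples: int
    hr: int
    sh: int       # sacrifice hits
    sb: int       # stolen bases
    cs: int       # caught stealing


def double_play_chances(t):
    """Times a batter reached first and stayed there: the formula's denominator."""
    return (t.roe + t.bb + t.hbp + t.h
            - t.doubles - t.triples - t.hr - t.sh - t.sb - t.cs)


def double_play_rate(t):
    """GIDP divided by chances to be doubled up."""
    return t.gidp / double_play_chances(t)


# Made-up season line, for illustration only.
sample = TeamBatting(gidp=120, roe=60, bb=500, hbp=40, h=1400,
                     doubles=250, triples=30, hr=150, sh=50, sb=90, cs=40)
print(double_play_chances(sample), f"{double_play_rate(sample):.4f}")
```

Each team's rate would then be ranked within its league and centered like any other stat in the database.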
Most Interest in the Hit-and-Run     Least Interest in the Hit-and-Run
Birdie Tebbetts    0.548             Tom Kelly         1.467
Billy Southworth   0.606             Connie Mack       1.389
Al Lopez           0.639             Frank Robinson    1.323
Sparky Anderson    0.647             Mike Hargrove     1.316
Davey Johnson      0.664             Bill McKechnie    1.302
While the hit-and-run is one of the more ambitious portions of the Tendencies Database, it has the benefit of being a well-known play. The database also allows one to check items that have drawn less attention. One such issue is how a manager paces his team over a season. Baseball Reference lists the first and second half winning percentages for every team in baseball history. Put them both in the database, and divide the former by the latter to see whose teams improved or declined as the campaign wore on.
A manager should influence pacing to some degree. He keeps the team motivated, prepared, and most importantly of all, healthy. Pitchers' arms can turn to dust through overuse or grow rusty if not called on often enough. Position players can also be overworked or underutilized. The manager has to rest his batters often enough so that little aches and pains do not become full-blown injuries or nagging problems that can deplete their strength as the year continues. Managers are not the only factor that affects team pacing, but the larger the sample size, the more important their role becomes.
Improved as Year Went on     Worsened as Year Went on
Al Lopez        0.646        Joe Cronin        1.393
Frank Chance    0.667        Bill Terry        1.289
Jimy Williams   0.734        Johnny Oates      1.280
Ned Hanlon      0.734        Danny Murtaugh    1.222
Billy Martin    0.744        Miller Huggins    1.216
One key problem exists with this list: managers do not always last a full season on the job. This is especially important with Billy Martin. He gets full credit for the Yankees' late-season, pennant-winning explosion in 1978 despite the fact that it came after the franchise replaced him. The sixth-best score for in-season improvement belongs to Earl Weaver, at 0.800.
Teams managed by Lopez, Williams, and Chance virtually always played better as the year progressed. By contrast, Cronin's and Terry's teams melted in the dog days of summer almost every year.
Starter percentage is a simple concept: add together plate appearances by the starting eight batters (or nine in a league with the designated hitter) and divide that by the team's total plate appearances. Some managers prefer a set lineup while others platoon or mix-and-match their starters. As always, managers are not the only factor affecting the results. Player health is the most important one in an individual season. Over a decade or more, however, injuries should even out.
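As a rough sketch of that calculation, the function below takes the plate appearances of a team's regulars and the team total. The function name and the sample numbers are hypothetical. How the regulars are identified is left to the caller, since the excerpt does not spell that out.

```python
def starter_share(regulars_pa, team_total_pa):
    """Share of a team's plate appearances taken by its regular starters."""
    return sum(regulars_pa) / team_total_pa


# Made-up totals for eight regulars on a team without a designated hitter.
regulars = [650, 640, 610, 590, 560, 520, 480, 450]
share = starter_share(regulars, team_total_pa=6200)
print(f"Starters took {share:.1%} of plate appearances")
```

A team in a designated-hitter league would simply pass nine regulars instead of eight.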
Use Starters the Most       Use Bench the Most
Frank Selee      0.580      George Stallings   1.387
Dick Williams    0.584      Frank Robinson     1.381
Danny Murtaugh   0.600      Jim Fregosi        1.290
Joe McCarthy     0.657      Paul Richards      1.258
Ralph Houk       0.658      Casey Stengel      1.242
Pat Moran, who only managed nine years, scored 0.543 in this query. Moran served as a backup catcher for list-leader Frank Selee at the turn of the century. One can easily imagine Moran sitting on the bench next to Selee, soaking up wisdom.
It is perfect that George Stallings tops the list on the right. As manager of the 1914 Miracle Braves he did more to popularize platooning than any other manager in baseball history. That strategy fell out of style, only to be reinvigorated in the 1950s when Casey Stengel used it while winning five straight world titles with the Yankees. Fittingly, Stengel also appears in the right-hand column.
Please note that Gil Hodges, who died after managing nine seasons, posted a mark of 1.667. Hodges could have used his bench the least of any manager for several years and still comfortably topped everyone.
Top Three Pitchers
It would be nice to have a pitching version of starter percentage: something that indicates which managers relied the most on their frontline hurlers and who liked to spread the innings around. A cursory examination of baseball history reveals that some managers treated their aces like pack animals while others evenly doled out the work. For example, in the first decade of the 20th Century John McGraw squeezed every inning he possibly could out of Christy Mathewson and Joe McGinnity while rival skipper Frank Chance parceled out the innings among his arms more evenly. Distributing innings is not simply a matter of starters versus bullpens, either. For much of baseball history, managers used pitchers in both roles. Aces—including Carl Hubbell, Lefty Grove, and Mordecai Brown—led the leagues in saves. Besides, starter/bullpen inning splits only go back to the 1950s.
No perfect way exists to distinguish between the McGraws and the Chances. However, when perfection is unattainable, take the best imperfect approximation. There is an effective, albeit rough, way of reckoning who leaned the most/least on their main arms. Go through every team in baseball history, find each squad's top three leaders in innings, add together their workload, and divide by the team's total innings pitched.
Why focus on the top three workhorses? Making it just the ace or top two pitchers would cause the results to depend excessively on whether a team had a dominant hurler. Broadening it out to four or five pitchers primarily reveals the overall depth of the rotation rather than the manager's predilections. Three provides a nice middle ground. Quality of the ace and staff depth each bleed in, but both are more subdued.
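The top-three calculation can be sketched as follows. The function name and the sample staff are hypothetical, and innings are assumed to be stored as ordinary decimals rather than in baseball's .1/.2 notation.

```python
def top_pitchers_share(innings, top_n=3):
    """Share of team innings thrown by the `top_n` busiest pitchers."""
    ranked = sorted(innings, reverse=True)
    return sum(ranked[:top_n]) / sum(innings)


# Made-up innings totals for a nine-man staff.
staff = [270, 250, 230, 180, 150, 120, 100, 80, 60]

for n in (1, 3, 5):
    print(n, f"{top_pitchers_share(staff, top_n=n):.3f}")
```

Keeping `top_n` as a parameter makes it easy to see how the share shifts when the cutoff moves from the ace alone to the top five.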
Rely on Main Pitchers      Spread Out the Innings
Tommy Lasorda   0.429      Frank Robinson   1.331
Earl Weaver     0.435      Gus Schmelz      1.322
Bobby Cox       0.443      Jack McKeon      1.285
Al Lopez        0.588      Jimy Williams    1.279
Frank Selee     0.599      Frank Chance     1.273
John McGraw scores a 0.823 with this stat. Once McGinnity faded away, McGraw eased up on his main starters, a tendency that increased when Mathewson waned.
Frank Selee not only relied heavily on his front-line talent, but his protégé, Pat Moran, had a score of 0.519. Clearly, these men had some similar thoughts on how to run a team. Then again, Selee's first baseman with the Cubs was Frank Chance, who obviously took a very different approach to handling his pitchers. Sometimes managers emulate those they played under, as Moran did with Selee. Other times they move in the opposite direction, as was the case with Chance.
All the above examples of the Tendencies Database look at one stat at a time. However, since all of the database's results are centered at one, they can be combined rather easily. Some statistics have an underlying philosophical similarity, and combining them gives you a better appreciation of how managers approached their job.
Hopefully these little charts made sense, because the second part of the book is littered with them. The Tendencies Database is based on the overriding belief that good enough trumps nothing when perfection is not an option.
References and Resources
I should note that Chapter Two is approximately twice as long as this excerpt. Among other things, I address a lot of side issues with the Tendencies Database (how it handles ties when they inevitably arise, what areas are worth studying with it, etc).
In order to make this excerpt work, I not only edited it, but engaged in some minor revising on a few occasions. For example, the chapter actually starts out by saying: "While the Birnbaum Database provides an overview of managerial performance, it does not enlighten anyone about the inclinations and peculiarities of individual skippers. To really understand a manager, comprehending such details is vital."
In context of the book, that makes sense because Chapter One focuses on the Birnbaum Database. Here, it doesn't work like that, hence some minor revising. Something similar happens in the last sentence of the chapter—which notes that good enough beats nothing with both the Tendencies and Birnbaum Databases.
Here's another example of a revision. In the actual book, the Hit-and-Run section begins: "The above examples are straightforward." Well, as it happens, the Hit-and-Run example is the fourth one given in the book. (I opted not to include the sacrifice hit and stolen base ones, as they already appear in The 2008 THT Annual.) Thus a shift from plural to singular. This explains almost all revisions: they refer back to something previously in the book but deleted in the excerpt.
I'm not happy that any revisions were made. That goes against the spirit of offering excerpts in the first place. On the other hand, the entire chapter, even though it's a short chapter, is too big to serve as one excerpt. I don't want to divide it into two excerpts because I'd rather focus on managers than the math in these pieces. (I can only assume that's why people are going to buy the book.) Still, I think the Tendencies Database is important enough to the book that it should become an excerpt, and the revisions really are quite minor and rarely occur.
History instructor by day, statnerd by night, Chris Jaffe leads one of the most exciting double lives imaginable, with the exception of every other double life possible to imagine. Despite his lack of comic-book-hero-worthiness, Chris enjoys farting around with this stuff. His new book, Evaluating Baseball's Managers, is available for order. Chris welcomes responses to his articles via e-mail. Oh, and now he's on Twitter.