Measuring greatness (part 2)

In part 1, we looked at two stats devised by Bill James, Win Shares and the Hall of Fame Monitor, and found that they generally hold up as standards for the Hall of Fame. As for the Black Ink Test and Similarity Scores, I think they are valuable, but there are ways to augment them. The Black Ink Test has an inherent era bias, which James alluded to when he published his study: “Of course, it is harder to lead the league in multiple categories now, when there are fourteen [or 16] teams in a league, than it was in 1935, but hey, nothing’s perfect.” (p. 67)

Sure, nothing is perfect, but now that there are twice as many teams in the National League as there were prior to expansion, we can try to make the playing field a bit more even.

Note how the average Black Ink score, even among the best candidates (based on Win Shares Grade), has been falling since expansion:

WS Grade  Decade  Black Ink Avg
A         1930s   99.82
A         1940s   57.33
A         1950s   46.75
A         1960s   70.20
A         1970s   59.00
A         1980s   41.80
A         1990s   35.40
A         2000s   21.60
B         1930s   36.00
B         1940s   45.56
B         1950s   14.27
B         1960s   24.30
B         1970s   19.00
B         1980s   24.33
B         1990s   37.38
B         2000s   14.78
C         1930s   17.00
C         1940s   21.71
C         1950s   35.25
C         1960s   26.30
C         1970s   24.08
C         1980s   22.88
C         1990s   20.17
C         2000s   19.00
D         1940s   10.67
D         1950s   26.00
D         1960s   4.00
D         1970s   22.10
D         1980s   7.50
D         1990s   0.00
D         2000s   3.67

I propose that we weight the Black Ink Test to compensate for league size. However, if we just base it on the number of teams, we end up crediting a league leader today twice as much as someone prior to expansion, which overly devalues the earlier players’ feats. Instead, we will weight the additional teams above eight by a factor of 0.5. (Also, given the disparity in player and team quality in the early days of the game, I have kept the weighting factor for 19th century league leaders at one. Otherwise, the handful of players who led the early leagues got too healthy a bump. Why should they benefit because the Altoona Mountain City club happened to play a handful of games in their league for a month or so?)
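One way to read that weighting, as a rough sketch only: the first eight teams count in full, each team beyond eight counts half, and the total is divided by eight so that an eight-team league comes out at exactly 1. The function name, the normalization by eight, and the 1900 cutoff for "19th century" are assumptions made for illustration, not a formula spelled out above.

```python
def league_size_weight(num_teams: int, season: int) -> float:
    """Hypothetical multiplier for the points earned by leading a league."""
    base_teams = 8       # pre-expansion league size used as the baseline
    extra_factor = 0.5   # each team beyond eight counts half

    # 19th-century leagues keep a weight of one (the cutoff year is an assumption).
    if season <= 1900:
        return 1.0

    extra_teams = max(num_teams - base_teams, 0)
    return (base_teams + extra_factor * extra_teams) / base_teams


print(league_size_weight(8, 1935))   # 1.0
print(league_size_weight(16, 2008))  # 1.5
print(league_size_weight(12, 1884))  # 1.0
```

Under this reading, a title in a 16-team league is worth one and a half times a title in an eight-team league, rather than double.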

Also, reviewing the system James used to award points for individual stats, I had a few questions. First, why were OBP and OPS left off the list? Over the last 15 years, they have probably become the two most important common stats for measuring batter performance. Perhaps James felt that the voters were not advanced enough to understand the concepts. I will give the voters the benefit of the doubt and add both to the mix. Even the NFL Channel, er, I mean ESPN, uses them now.

Second, I think that the point system can be revamped to more closely align the points assigned with the value of the stat, at least in the voters’ eyes. I ran a correlation between getting into the Hall and the total number of times leading a category. The numbers turned out to be pretty close to James’, but there were some minor changes. (Points for leading in batting average went from 4 to 3.5, RBI from 4 to 3.25, HR from 4 to 3, R from 3 to 3.5, H remained at 3, SLUG went from 3 to 3.25, 2B from 2 to 2.75, BB remained at 2, SB went from 2 to 1.5, G from 1 to 2.25, AB from 1 to 1.5, 3B from 1 to 2.25, OBP from 0 to 3.25, and OPS from 0 to 3.5.)
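To make the bookkeeping concrete, here is a minimal sketch of how a Black Ink total could be tallied under both point schedules. The point values come straight from the list above; the function name, the dictionary layout, and the sample player are made up for illustration, and the league-size weighting is left out to keep it short.

```python
# Points per league-leading title, keyed by stat abbreviation.
ORIGINAL_POINTS = {
    "AVG": 4, "RBI": 4, "HR": 4, "R": 3, "H": 3, "SLG": 3, "2B": 2,
    "BB": 2, "SB": 2, "G": 1, "AB": 1, "3B": 1, "OBP": 0, "OPS": 0,
}
MODIFIED_POINTS = {
    "AVG": 3.5, "RBI": 3.25, "HR": 3, "R": 3.5, "H": 3, "SLG": 3.25, "2B": 2.75,
    "BB": 2, "SB": 1.5, "G": 2.25, "AB": 1.5, "3B": 2.25, "OBP": 3.25, "OPS": 3.5,
}


def black_ink(titles: dict, points: dict) -> float:
    """Sum the points for every title; `titles` maps a stat to times led."""
    return sum(points[stat] * times_led for stat, times_led in titles.items())


# A hypothetical player: two HR titles, one OBP title, three SB titles.
sample = {"HR": 2, "OBP": 1, "SB": 3}
print(black_ink(sample, ORIGINAL_POINTS))  # 14
print(black_ink(sample, MODIFIED_POINTS))  # 13.75
```

The sample shows the trade-off: the OBP title now counts, but the home run and stolen base titles count for less.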

Here are the top 25 players for James’ Black Ink Test and for the Modified Black Ink Test:

Black Ink Batting Leaders

Original                    Modified
Player              Pts     Player              Pts
Babe Ruth           158     Babe Ruth           223.10
Ty Cobb             146     Ty Cobb             201.25
Rogers Hornsby      128     Ted Williams        198.00
Ted Williams        126     Rogers Hornsby      195.25
Stan Musial         116     Barry Bonds         178.59
Honus Wagner        109     Stan Musial         178.35
Dan Brouthers       79      Honus Wagner        151.65
Hank Aaron          76      Dan Brouthers       130.70
Lou Gehrig          75      Mike Schmidt        115.31
Mike Schmidt        74      Lou Gehrig          108.65
Nap Lajoie          72      Alex Rodriguez      105.33
Barry Bonds         69      Pete Rose           101.97
Alex Rodriguez      68      Carl Yastrzemski    101.36
Pete Rose           64      Nap Lajoie          92.15
Jimmie Foxx         63      Hank Aaron          91.51
Mickey Mantle       62      Mickey Mantle       88.19
Harry Stovey        62      Wade Boggs          87.45
Chuck Klein         60      Jimmie Foxx         85.75
Ed Delahanty        59      Willie Mays         85.29
Ross Barnes         59      Ross Barnes         84.90
Willie Mays         57      Ed Delahanty        83.75
Carl Yastrzemski    55      George Brett        78.18
Tony Gwynn          53      Rod Carew           75.79
Ralph Kiner         52      Chuck Klein         73.65
Cap Anson           52      Rickey Henderson    73.29

Next, I looked at the point system for the pitchers. Using the same method, I found new values for the various league-leading stats. In addition, I added Strikeouts-to-Walks ratio and WHIP (Walks plus Hits per Inning Pitched) to James’ league leaders to perform the evaluation. The results were a bit further off from James’ than the batting stats were. Also, the results confirmed that WHIP correlated more closely with being a Hall of Famer than its component stats, Hits per Inning Pitched and Walks per Inning Pitched. (Points for leading in Wins went from 4 to 4.75, ERA remained at 4, Strikeouts went from 4 to 4.25, Innings Pitched from 3 to 4.5, Winning Percentage from 3 to 2.75, Saves from 3 to 1.5, Complete Games from 2 to 4.25, Games Pitched from 1 to 2, Games Started from 1 to 3.25, and Shutouts from 1 to 4.25. WHIP, at 4.25, replaced BB per IP and H per IP, which would have remained at 2 points and gone from 2 to 3.25 points, respectively. Also, I hate when someone throws in some idiosyncratic stat, but given that relievers are so undervalued, I threw in a stat I developed based on Bill James’ reliever theories and using runs created adjusted by era. It is called Relief Win and has a value of 1.75.)
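For readers less familiar with the two added pitching stats, here is a quick sketch of how they are computed. The function names and the sample pitching line are illustrative only and are not taken from any real pitcher.

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits allowed, per inning pitched."""
    return (walks + hits) / innings_pitched


def strikeout_to_walk(strikeouts: int, walks: int) -> float:
    """Strikeouts recorded for every walk issued."""
    return strikeouts / walks


# Made-up season: 200 innings, 150 hits, 50 walks, 200 strikeouts.
print(whip(50, 150, 200.0))        # 1.0
print(strikeout_to_walk(200, 50))  # 4.0
```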

Here are the top 25 in Black Ink pitching based on the original and my modified formula.

Black Ink Pitching Leaders

Original                    Modified
Player              Pts     Player              Pts
Walter Johnson      150     Walter Johnson      264.25
Pete Alexander      126     Pete Alexander      201.75
Lefty Grove         111     Lefty Grove         187.50
Roger Clemens       103     Cy Young            182.00
Warren Spahn        101     Warren Spahn        166.50
Cy Young            100     Roger Clemens       157.50
Randy Johnson       99      Christy Mathewson   155.50
Bob Feller          98      Bob Feller          151.75
Christy Mathewson   92      Greg Maddux         143.25
Greg Maddux         87      Randy Johnson       130.75
Nolan Ryan          84      Sandy Koufax        121.25
Sandy Koufax        81      Dazzy Vance         119.50
Al Spalding         67      Ed Walsh            119.00
Ed Walsh            67      Robin Roberts       118.00
Dazzy Vance         66      Steve Carlton       104.25
Steve Carlton       66      John Clarkson       98.25
Joe McGinnity       64      Joe McGinnity       96.50
Robin Roberts       64      Al Spalding         95.00
Tom Seaver          60      Carl Hubbell        94.75
John Clarkson       60      Pedro Martinez      94.75
Pedro Martinez      58      Tom Seaver          89.75
Tim Keefe           58      Nolan Ryan          88.50
Amos Rusie          52      Curt Schilling      86.50
Dizzy Dean          52      Tim Keefe           81.50
Carl Hubbell        51      Dizzy Dean          80.25

Using the average Hall of Famer as a guide, the following players who are not in the Hall would meet the original Black Ink Test:

Player Pts
Roger Clemens 103
Randy Johnson 99
Greg Maddux 87
Barry Bonds 69
Alex Rodriguez 68
Pete Rose 64
Harry Stovey 62
Ross Barnes 59
Pedro Martinez 58
Bucky Walters 48
Tommy Bond 47
Gavvy Cravath 46
Curt Schilling 45
Johan Santana 42
Tony Oliva 41
Bill Hutchison 40
Tip O’Neill 39
Mark McGwire 36
Sherry Magee 35

For the most part, these are 19th-century players and active players who are obvious Hall choices.

Now, these are the players that meet the modified Black Ink Test standard:

Player Pts
Barry Bonds 178.59
Roger Clemens 157.50
Greg Maddux 143.25
Randy Johnson 130.75
Alex Rodriguez 105.33
Pete Rose 101.97
Pedro Martinez 94.75
Curt Schilling 86.50
Ross Barnes 84.90
Tommy Bond 79.50
Bucky Walters 72.00
Frank Thomas 69.37
Bill Hutchison 65.75
Mark McGwire 65.71
Harry Stovey 63.50
Johan Santana 62.00
Roy Halladay 61.50
Dick Allen 56.89
Gavvy Cravath 56.65
Albert Pujols 54.90
Manny Ramirez 54.86
Larry Walker 52.55
Tony Oliva 50.94
Jeff Bagwell 48.29
Edgar Martinez 47.85
Ichiro Suzuki 47.78
Dale Murphy 46.31
Pete Browning 44.85
Sammy Sosa 44.06
Tip O’Neill 44.05
Todd Helton 43.58
Albert Belle 43.04
Sherry Magee 41.30
George Burns 41.25
Juan Pierre 40.50
Dave Parker 38.81
Don Mattingly 38.23
Dwight Evans 37.61
Jason Giambi 37.54

I cannot say that I am overly pleased by Juan Pierre making the list, but a number of undervalued expansion-era players also make it. Several players in the continuing purgatory of the writers’ ballot (Parker, Mattingly, Murphy) make an appearance, as do some on the Vets list (Magee, Allen).

By the way, Rice now passes this test with a 49.57.

Next, we move to Similarity Scores. Of this method, James says, “A left-handed hitter will tend to be paired with another left-handed hitter. A player from the 1920s will tend to be paired with another player from the 1920s. A player who has post-playing career as a manager will tend to be paired with another player who was also a manager…Size—players are almost always paired with another player of the same size.” (p. 95)

That sounds like a great justification for a system, but it also might point to era biases within the system, which may pick players with similar demographics because those demographics directly affect the stats.

I propose that instead of comparing by raw stats, we compare based on stats weighted against era bias. Instead of looking at home runs hit, for example, we look at the number of home runs hit above expectation for a given era. Might we find similarities that are deeper and yet not apparent from James’ Similarity Scores?

I created a set of expected values per stat per league. Then I took the total plate appearances for each player per league and year and calculated the amount by which he exceeded or fell short of what was expected of the average player. In this way, Mike Schmidt leading the league with 36 home runs, which he did twice (1974 and 1984), stands out a bit more than the three players who hit 36 home runs in the American League in 1996 (Alex Rodriguez, Geronimo Berroa, and Ed Sprague, really) and ended up tied for 13th place in the league behind Mark McGwire’s 52 bombs.

Finally, the amount above/below expected was prorated per plate appearance over the player’s career. In this way, Babe Ruth exceeded the expected number of home runs for his career by .057 per plate appearance. He is followed by Mark McGwire (.051 per PA), Ryan Howard (.045), Jimmie Foxx (.041), Dave Kingman (.040), Ralph Kiner and Hank Greenberg (.039), Lou Gehrig (.038), and finally rounding out the top ten: Barry Bonds, Harmon Killebrew, and Mike Schmidt (.036). Ed Sprague is 785th.
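The calculation described above can be sketched roughly as follows, under stated assumptions: the expected total for a season is taken to be the player's plate appearances times the league's home runs per plate appearance, and the class name, field names, and league rates are invented for illustration (they are not real league figures).

```python
from dataclasses import dataclass


@dataclass
class Season:
    plate_appearances: int
    home_runs: int
    league_hr_rate: float  # league home runs per plate appearance that year


def hr_above_expected(season: Season) -> float:
    """Home runs beyond what an average hitter would manage in the same PA."""
    return season.home_runs - season.plate_appearances * season.league_hr_rate


def career_hr_above_expected_per_pa(seasons: list) -> float:
    """Career surplus (or shortfall) prorated per plate appearance."""
    total_pa = sum(s.plate_appearances for s in seasons)
    return sum(hr_above_expected(s) for s in seasons) / total_pa


# Two made-up 36-homer seasons: one in a low-offense year, one in a high one.
low_offense = Season(plate_appearances=600, home_runs=36, league_hr_rate=0.015)
high_offense = Season(plate_appearances=600, home_runs=36, league_hr_rate=0.030)

print(round(hr_above_expected(low_offense), 2))   # 27.0
print(round(hr_above_expected(high_offense), 2))  # 18.0
print(round(career_hr_above_expected_per_pa([low_offense, high_offense]), 4))  # 0.0375
```

The same 36 home runs are worth more above expectation in the low-offense year, which is the effect the Schmidt comparison is getting at.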

I dropped James’ penalties for games played and at-bats (assuming a minimum of 1000 plate appearances). I compared runs, hits, doubles, triples, home runs, runs batted in, walks, strikeouts, stolen bases, batting average, on-base percentage, slugging, and OPS, and I kept James’ defensive position penalty (though I have outfielders divided by actual position, not just lumped into one bucket as is done at Baseball Reference and may have been done in James’ original study).

The results are that the top-10 comps to Hank Aaron change from:

1. Willie Mays (783) *
2. Barry Bonds (I) (748)
3. Frank Robinson (667) *
4. Stan Musial (663) *
5. Babe Ruth (647) *
6. Carl Yastrzemski (627) *
7. Rafael Palmeiro (I) (611)
8. Mel Ott (601) *
9. Eddie Murray (588) *
10. Ken Griffey (I) (588)

To…

1. Joe DiMaggio (855) *
2. Willie Mays (810) *
3. Frank Robinson (765) *
4. Johnny Mize (761) *
5. Larry Walker (I) (754)
6. Vladimir Guerrero (I) (712)
7. Sam Thompson (663)
8. Chuck Klein (657) *
9. Harry Heilmann (635) *
10. Lip Pike (635)

I cannot say that I am entirely happy with either one of those lists. I think each has its pluses and minuses, but I am willing to follow the methodology and see what results we get, being in a democratic mindset, statistically speaking.

By the way, the players with the lowest scores for their best comps are Babe Ruth and Ted Williams (both at 419) and the ever-execrable Bill Bergen (566), who was arguably the worst Hall-eligible batter of all time (sorry, Steve Jeltz does not qualify). Bergen is he of the .170 career average, .228 OBP, .232 slugging average, and park-adjusted OPS 53 points below the league average. By James’ system, Bergen has a number of comparable players because his stats, though abysmal, fall within the range of a number of lesser batters from mostly pitchers’ eras.

So where does this all leave us? If we look at the three players that started this blogging bloviation, the three men who are about to enter the Hall—Rickey Henderson, Jim Rice, and Joe Gordon—does this new methodology help to make their cases (or lack thereof) more clearly? To quote the estimable Mr. Owl, let’s find out.

As I stated earlier, Henderson passed all four standards, Rice passed all but the Hall of Fame Standards test (missing by 7.1 points), and Gordon passed not a one.

Of our new tests, Henderson passes three of four: both Win Shares Above Baseline tests (210.35 WSAB) and Modified Black Ink (73.3). However, he fails the Modified Similarity Scores test. Just one similar batter, Max Carey, is in the Hall, and most of the rest are Deadball-era leadoff outfielders with decent speed and good OPSs, so Henderson’s uniqueness might not be properly captured there. Henderson falls from 100 percent passing to 88 percent.

Rice passes just the Modified Black Ink test (49.6). He has 96.34 WSAB, so he fails both Win Shares tests. He has just one similar batter (Tony Perez) in the Hall. Rice falls from 75 percent to 50 percent, which seems to capture his borderline status properly.

Gordon keeps his perfect goose-egg streak alive, going 0-for-4 in the new tests. He has just 80.67 WSAB, failing those two tests. He scores a 4.5 in the Modified Black Ink test. He has just one similar batter in the Hall (Bobby Doerr). Gordon remains at zero percent.
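The percentages for the three players are simple counting over the four original tests plus the four new ones. A small sketch of that arithmetic, with a hypothetical function name and the pass counts taken from the paragraphs above:

```python
def pass_rate(original_passed: int, new_passed: int, tests_per_set: int = 4) -> float:
    """Percentage of tests passed across the original and new sets combined."""
    return 100 * (original_passed + new_passed) / (2 * tests_per_set)


print(pass_rate(4, 3))  # Henderson: 87.5, which rounds to the 88 percent quoted
print(pass_rate(3, 1))  # Rice: 50.0
print(pass_rate(0, 0))  # Gordon: 0.0
```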

I think that these new tests are in the spirit of James’ “democratic” view of Hall of Fame candidates. The Modified Similarity Scores might need a little more tweaking and that is not surprising given the complexity and the scale of the calculations involved. At the other end of the spectrum, Win Shares Above Baseline are a vast improvement over the Fibonacci Win Scores that James originally proposed. In addition, having eight rather than four tests (five if you count the Fibonacci test) helps more clearly define the borderline cases, as Rice demonstrated.

