May 25, 2013

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

Measuring greatness (part 2)

by Mike Carminati
April 13, 2009

In part 1, we looked at two stats devised by Bill James, Win Shares and the Hall of Fame Monitor, and found that they generally hold up as standards for the Hall of Fame. As for the Black Ink Test and Similarity Scores, I think they are valuable, but there are ways to augment them. The Black Ink Test has an inherent era bias built into it, which James alluded to when he published his study: “Of course, it is harder to lead the league in multiple categories now, when there are fourteen [or 16] teams in a league, than it was in 1935, but hey, nothing’s perfect.” (p. 67)


Sure, nothing is perfect, but now that there are twice as many teams in the National League as there were prior to expansion, we can try to make the playing field a bit more even.

Note how the average Black Ink score, even among the best candidates (based on Win Shares Grade), has been declining since expansion:

WS Grade  Decade  Black Ink Avg
A         1930s   99.82
A         1940s   57.33
A         1950s   46.75
A         1960s   70.20
A         1970s   59.00
A         1980s   41.80
A         1990s   35.40
A         2000s   21.60
B         1930s   36.00
B         1940s   45.56
B         1950s   14.27
B         1960s   24.30
B         1970s   19.00
B         1980s   24.33
B         1990s   37.38
B         2000s   14.78
C         1930s   17.00
C         1940s   21.71
C         1950s   35.25
C         1960s   26.30
C         1970s   24.08
C         1980s   22.88
C         1990s   20.17
C         2000s   19.00
D         1940s   10.67
D         1950s   26.00
D         1960s   4.00
D         1970s   22.10
D         1980s   7.50
D         1990s   0.00
D         2000s   3.67


I propose that we weight the Black Ink Test to compensate for league size. However, if we just base it on the number of teams, we end up compensating a league leader today twice as much as someone prior to expansion, which overly devalues the earlier players' feats. Instead, we will weight the additional teams above eight by a factor of 0.5. (Also, given the disparity in player and team quality in the early days of the game, I have kept the weighting factor for 19th century league leaders at one. Otherwise, the handful of players who led the early leagues got too healthy a bump. Why should they benefit because the Altoona Mountain City club happened to play a handful of games in their league for a month or so?)
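To make the league-size weighting concrete, here is a minimal sketch of one way to read it. The function name, the reading of "a factor of 0.5" as counting each team beyond eight as half a team, and the 1900 cutoff for "19th century" are all assumptions for illustration, not the exact formula used for the tables in this article.

```python
def league_size_weight(num_teams: int, year: int) -> float:
    """Hypothetical weight applied to one league-leading season.

    Assumptions (not confirmed by the article):
    - an eight-team league is the baseline with weight 1.0
    - each team beyond eight counts as half a team
    - seasons through 1900 always keep a weight of 1.0
    """
    if year <= 1900:
        return 1.0
    extra_teams = max(0, num_teams - 8)      # only teams above eight are discounted
    effective_teams = 8 + 0.5 * extra_teams  # extra teams weighted by 0.5
    return effective_teams / 8


print(league_size_weight(8, 1935))   # pre-expansion league
print(league_size_weight(16, 2008))  # 16-team National League
print(league_size_weight(12, 1884))  # 19th-century league, no bump
```

Under this reading, a leader in a 16-team league gets 1.5 times the credit of a pre-expansion leader rather than twice as much.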

Also, reviewing the system James used to award points for individual stats, I had a few questions. First, why were OBP and OPS left off the list? Over the last 15 years, they have become probably the two most important common stats for measuring batter performance. Perhaps James felt that the voters were not so advanced as to understand the concepts. I will give him the benefit of the doubt and add them to the mix. Even the NFL Channel, er, I mean ESPN, uses them now.

Second, I think that the point system can be revamped to more closely align the points assigned to the value of the stat, at least in the voters' eyes. I ran a correlation between getting in the Hall and the total times leading in a category. The numbers were pretty close to James' as it turned out, but there were some minor changes. (Points for batting average leader went from 4 to 3.5, for RBI went from 4 to 3.25, for HR went from 4 to 3, for R went from 3 to 3.5, for H remained at 3, for SLUG went from 3 to 3.25, for 2B went from 2 to 2.75, for BB remained at 2, for SB went from 2 to 1.5, for G went from 1 to 2.25, for AB went from 1 to 1.5, for 3B went from 1 to 2.25, for OBP went from 0 to 3.25, and for OPS went from 0 to 3.5.)
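For anyone who wants to play along at home, the two batting point scales listed above can be written out as lookup tables. This is only a sketch: the category abbreviations, the `black_ink` helper, and the example player are made up for illustration, and no league-size weighting is applied here.

```python
# Points per league-leading season, as listed in the paragraph above.
ORIGINAL_POINTS = {
    "BA": 4, "RBI": 4, "HR": 4, "R": 3, "H": 3, "SLG": 3, "2B": 2,
    "BB": 2, "SB": 2, "G": 1, "AB": 1, "3B": 1, "OBP": 0, "OPS": 0,
}
MODIFIED_POINTS = {
    "BA": 3.5, "RBI": 3.25, "HR": 3, "R": 3.5, "H": 3, "SLG": 3.25, "2B": 2.75,
    "BB": 2, "SB": 1.5, "G": 2.25, "AB": 1.5, "3B": 2.25, "OBP": 3.25, "OPS": 3.5,
}


def black_ink(times_led, points):
    """Sum the points for every category a player led (unweighted)."""
    return sum(points[category] * count for category, count in times_led.items())


# Hypothetical player: three batting titles, one HR crown, two OBP titles.
example = {"BA": 3, "HR": 1, "OBP": 2}
print(black_ink(example, ORIGINAL_POINTS))  # 16
print(black_ink(example, MODIFIED_POINTS))  # 20.0
```

The made-up player gains ground under the modified scale only because of the OBP titles, which the original scale ignores.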

Here are the top 25 players for James’ Black Ink Test and for the Modified Black Ink Test:

Black Ink Batting Leaders - Original    Black Ink Batting Leaders - Modified
Player            Pts                   Player            Pts
Babe Ruth         158                   Babe Ruth         223.10
Ty Cobb           146                   Ty Cobb           201.25
Rogers Hornsby    128                   Ted Williams      198.00
Ted Williams      126                   Rogers Hornsby    195.25
Stan Musial       116                   Barry Bonds       178.59
Honus Wagner      109                   Stan Musial       178.35
Dan Brouthers     79                    Honus Wagner      151.65
Hank Aaron        76                    Dan Brouthers     130.70
Lou Gehrig        75                    Mike Schmidt      115.31
Mike Schmidt      74                    Lou Gehrig        108.65
Nap Lajoie        72                    Alex Rodriguez    105.33
Barry Bonds       69                    Pete Rose         101.97
Alex Rodriguez    68                    Carl Yastrzemski  101.36
Pete Rose         64                    Nap Lajoie        92.15
Jimmie Foxx       63                    Hank Aaron        91.51
Mickey Mantle     62                    Mickey Mantle     88.19
Harry Stovey      62                    Wade Boggs        87.45
Chuck Klein       60                    Jimmie Foxx       85.75
Ed Delahanty      59                    Willie Mays       85.29
Ross Barnes       59                    Ross Barnes       84.90
Willie Mays       57                    Ed Delahanty      83.75
Carl Yastrzemski  55                    George Brett      78.18
Tony Gwynn        53                    Rod Carew         75.79
Ralph Kiner       52                    Chuck Klein       73.65
Cap Anson         52                    Rickey Henderson  73.29


Next, I looked at the point system for the pitchers. Using the same method, I found new values for the various league-leading stats. In addition, I added Strikeouts-to-Walks ratio and WHIP (Walks plus Hits per Inning Pitched) to James’ league leaders to perform the evaluation. The results were a bit farther off from James’ than the batting stats were. Also, the results confirmed that WHIP more closely correlated to being a Hall of Famer than the component stats, Hits per Inning Pitched and Walks per Inning Pitched. (Points for wins leader went from 4 to 4.75, for ERA remained at 4, for Strikeouts went from 4 to 4.25, for Innings Pitched from 3 to 4.5, for Winning Percentage from 3 to 2.75, for Saves from 3 to 1.5, for Complete Games from 2 to 4.25, for Games Pitched from 1 to 2, for Games Started from 1 to 3.25, for Shutouts from 1 to 4.25, and WHIP, at 4.25, replaced BB per IP and H per IP, which would have remained at 2 points and gone from 2 to 3.25 points, respectively. Also, I hate when someone throws in some idiosyncratic stat, but given that relievers are so undervalued, I threw in a stat I developed based on Bill James’ reliever theories and using runs created adjusted by era. It is called Relief Win and had a value of 1.75.)

Here are the top 25 in Black Ink pitching based on the original and my modified formula.

Black Ink Pitching Leaders - Original   Black Ink Pitching Leaders - Modified
Player            Pts                   Player            Pts
Walter Johnson    150                   Walter Johnson    264.25
Pete Alexander    126                   Pete Alexander    201.75
Lefty Grove       111                   Lefty Grove       187.50
Roger Clemens     103                   Cy Young          182.00
Warren Spahn      101                   Warren Spahn      166.50
Cy Young          100                   Roger Clemens     157.50
Randy Johnson     99                    Christy Mathewson 155.50
Bob Feller        98                    Bob Feller        151.75
Christy Mathewson 92                    Greg Maddux       143.25
Greg Maddux       87                    Randy Johnson     130.75
Nolan Ryan        84                    Sandy Koufax      121.25
Sandy Koufax      81                    Dazzy Vance       119.50
Al Spalding       67                    Ed Walsh          119.00
Ed Walsh          67                    Robin Roberts     118.00
Dazzy Vance       66                    Steve Carlton     104.25
Steve Carlton     66                    John Clarkson     98.25
Joe McGinnity     64                    Joe McGinnity     96.50
Robin Roberts     64                    Al Spalding       95.00
Tom Seaver        60                    Carl Hubbell      94.75
John Clarkson     60                    Pedro Martinez    94.75
Pedro Martinez    58                    Tom Seaver        89.75
Tim Keefe         58                    Nolan Ryan        88.50
Amos Rusie        52                    Curt Schilling    86.50
Dizzy Dean        52                    Tim Keefe         81.50
Carl Hubbell      51                    Dizzy Dean        80.25


Using the average Hall of Famer as a guide, the following players who are not in the Hall would meet the original Black Ink Test:

Player Pts
Roger Clemens 103
Randy Johnson 99
Greg Maddux 87
Barry Bonds 69
Alex Rodriguez 68
Pete Rose 64
Harry Stovey 62
Ross Barnes 59
Pedro Martinez 58
Bucky Walters 48
Tommy Bond 47
Gavvy Cravath 46
Curt Schilling 45
Johan Santana 42
Tony Oliva 41
Bill Hutchison 40
Tip O'Neill 39
Mark McGwire 36
Sherry Magee 35


For the most part, these are 19th century players and active players who are obvious Hall choices.

Now, these are the players that meet the modified Black Ink Test standard:

Player Pts
Barry Bonds 178.59
Roger Clemens 157.50
Greg Maddux 143.25
Randy Johnson 130.75
Alex Rodriguez 105.33
Pete Rose 101.97
Pedro Martinez 94.75
Curt Schilling 86.50
Ross Barnes 84.90
Tommy Bond 79.50
Bucky Walters 72.00
Frank Thomas 69.37
Bill Hutchison 65.75
Mark McGwire 65.71
Harry Stovey 63.50
Johan Santana 62.00
Roy Halladay 61.50
Dick Allen 56.89
Gavvy Cravath 56.65
Albert Pujols 54.90
Manny Ramirez 54.86
Larry Walker 52.55
Tony Oliva 50.94
Jeff Bagwell 48.29
Edgar Martinez 47.85
Ichiro Suzuki 47.78
Dale Murphy 46.31
Pete Browning 44.85
Sammy Sosa 44.06
Tip O'Neill 44.05
Todd Helton 43.58
Albert Belle 43.04
Sherry Magee 41.30
George Burns 41.25
Juan Pierre 40.50
Dave Parker 38.81
Don Mattingly 38.23
Dwight Evans 37.61
Jason Giambi 37.54


I cannot say that I am overly pleased by Juan Pierre making the list, but there are a number of undervalued expansion-era players that also make it. A number of players in the continuing purgatory of the writers' ballot (Parker, Mattingly, Murphy) make an appearance, as do some on the Vets list (Magee, Allen).

By the way, Rice now passes this test with a 49.57.

Next, we move to Similarity Scores. Of this method, James says, “A left-handed hitter will tend to be paired with another left-handed hitter. A player from the 1920s will tend to be paired with another player from the 1920s. A player who has post-playing career as a manager will tend to be paired with another player who was also a manager…Size—players are almost always paired with another player of the same size.” (p. 95)

That sounds like a great justification for a system, but it also might point to era biases within the system that pick players with similar demographics perhaps because those demographics directly impact the stats.

I propose that instead of comparing by raw stats, we compare based on stats weighted against era bias. Instead of looking at home runs hit, for example, we look at the number of home runs hit above expectation for a given era. Might we find similarities that are deeper and yet not apparent from James’ Similarity Scores?

I created a set of expected values per stat per league. Then I took the total plate appearances for each player per league and year and calculated the amount by which he exceeded or fell short of what was expected of the average player. In this way, Mike Schmidt leading the league with 36 home runs, which he did twice (1974 & 1984), stands out a bit more than the three players who hit 36 home runs in the American League in 1996, Alex Rodriguez, Geronimo Berroa, and Ed Sprague (really), who ended up tied for 13th place in the league behind Mark McGwire’s 52 bombs.

Finally, the amount above/below expected was prorated per plate appearance over the player’s career. In this way, Babe Ruth exceeded the expected number of home runs for his career by .057 per plate appearance. He is followed by Mark McGwire (.051 per PA), Ryan Howard (.045), Jimmie Foxx (.041), Dave Kingman (.040), Ralph Kiner and Hank Greenberg (.039), Lou Gehrig (.038), and finally rounding out the top ten: Barry Bonds, Harmon Killebrew, and Mike Schmidt (.036). Ed Sprague is 785th.
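The above-expectation rate described here can be sketched in a few lines. This is an illustration only: the `Season` record, the use of a league-wide home runs per plate appearance rate as the "expected" value, and the sample numbers are all assumptions, not the actual data or code behind the figures above.

```python
from dataclasses import dataclass


@dataclass
class Season:
    plate_appearances: int
    home_runs: int
    league_hr_per_pa: float  # assumed expectation: league HR / league PA that year


def hr_above_expected_per_pa(seasons):
    """Career home runs above expectation, prorated per plate appearance."""
    total_pa = sum(s.plate_appearances for s in seasons)
    surplus = sum(
        s.home_runs - s.league_hr_per_pa * s.plate_appearances  # actual minus expected
        for s in seasons
    )
    return surplus / total_pa


# Made-up two-season career: 36 HR in a low-power league, 20 HR in a high-power one.
career = [Season(600, 36, 0.015), Season(500, 20, 0.030)]
print(round(hr_above_expected_per_pa(career), 3))
```

In the made-up career, the 36-homer season contributes 27 home runs above expectation while the 20-homer season contributes only 5, which is the same effect that separates Schmidt's 36 from Sprague's 36.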

I dropped James’ penalties for games played and at-bats (assuming a minimum of 1000 plate appearances). I compared runs, hits, doubles, triples, home runs, runs batted in, walks, strikeouts, stolen bases, batting average, on-base percentage, slugging, and OPS, and kept James’ defensive position penalty (though I have outfielders truly divided by actual position, not just lumped into one bucket as is done at Baseball Reference and may have been done in James’ original study).

The results are that the top-10 comps to Hank Aaron change from:

1. Willie Mays (783) *
2. Barry Bonds (I) (748)
3. Frank Robinson (667) *
4. Stan Musial (663) *
5. Babe Ruth (647) *
6. Carl Yastrzemski (627) *
7. Rafael Palmeiro (I) (611)
8. Mel Ott (601) *
9. Eddie Murray (588) *
10. Ken Griffey (I) (588)

To…

Joe DiMaggio* (855)
Willie Mays* (810)
Frank Robinson* (765)
Johnny Mize* (761)
Larry Walker (I) (754)
Vladimir Guerrero (I) (712)
Sam Thompson (663)
Chuck Klein* (657)
Harry Heilmann* (635)
Lip Pike (635)

I cannot say that I am entirely happy with either one of those lists. I think each has its plusses and minuses, but, being in a democratic mindset, statistically speaking, I am willing to follow the methodology and see what results we get.

By the way, the players with the lowest scores for their best comps are Babe Ruth and Ted Williams (both at 419) and the ever-execrable Bill Bergen (566), who was arguably the worst Hall-eligible batter of all time (sorry, Steve Jeltz does not qualify), he of the .170 career average, .228 OBP, .232 slugging average, and park-adjusted OPS 53 points below the league average. By James’ system, Bergen has a number of comparable players because his stats, though abysmal, fall within the range of a number of lesser batters during mostly pitchers’ eras.

So where does this all leave us? If we look at the three players that started this blogging bloviation, the three men who are about to enter the Hall—Rickey Henderson, Jim Rice, and Joe Gordon—does this new methodology help to make their cases (or lack thereof) more clearly? To quote the estimable Mr. Owl, let’s find out.

As I stated earlier, Henderson passed all four standards, Rice passed all but the Hall of Fame Standards test (missing by 7.1 points), and Gordon passed not a one.

Of our new tests, Henderson passes three of four: both Win Shares Above Baseline tests (210.35 WSAB) and the Modified Black Ink test (73.3). However, he fails the Modified Similarity Scores test. Just one similar batter, Max Carey, is in the Hall, and most of the rest are Deadball-era leadoff outfielders with decent speed and good OPS’s, but Henderson’s uniqueness might not be properly captured there. Henderson falls from 100 percent passing to 88 percent.

Rice passes just the Modified Black Ink test (49.6). He has 96.34 WSAB, so he fails both Win Shares tests. He has just one similar batter (Tony Perez) in the Hall. Rice falls from 75 percent to 50 percent, which seems to capture his borderline status properly.

Gordon keeps his perfect goose-egg streak going, going 0-for-4 in the new tests. He has just 80.67 WSAB, failing those two tests. He scores a 4.5 in the Modified Black Ink test. He has just one similar batter in the Hall (Bobby Doerr). Gordon remains at zero percent.
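The percentages quoted for the three players are just the share of all eight tests passed. Here is a quick sketch of that arithmetic, with the pass counts taken from the paragraphs above; the names `results` and `pass_rate` are made up for illustration.

```python
TESTS_PER_SET = 4  # four original standards plus four new tests

# (original tests passed, new tests passed), per the text above
results = {
    "Rickey Henderson": (4, 3),
    "Jim Rice": (3, 1),
    "Joe Gordon": (0, 0),
}


def pass_rate(original_passed, new_passed):
    """Fraction of all eight tests passed."""
    return (original_passed + new_passed) / (2 * TESTS_PER_SET)


for player, (original_passed, new_passed) in results.items():
    print(player, f"{pass_rate(original_passed, new_passed):.1%}")
```

Henderson's seven of eight works out to 87.5 percent, which is the 88 percent quoted above after rounding.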

I think that these new tests are in the spirit of James’ “democratic” view of Hall of Fame candidates. The Modified Similarity Scores might need a little more tweaking and that is not surprising given the complexity and the scale of the calculations involved. At the other end of the spectrum, Win Shares Above Baseline are a vast improvement over the Fibonacci Win Scores that James originally proposed. In addition, having eight rather than four tests (five if you count the Fibonacci test) helps more clearly define the borderline cases, as Rice demonstrated.

My site with my opinions, but I hope that, like Irish Spring, you like it, too.
