(Somewhat) sabermetric similarity scores

by Chris Jaffe
March 9, 2009

Several months ago I wrote an article for this website titled “The George Grantham all-stars” which had a somewhat ambitious, albeit largely frivolous goal: find out what the most well-rounded seasons of all time were.

Inspired by a study by Bill James, I (thanks to Baseball-Reference.com’s Play Index) put every season in which a player qualified for the batting title into a database, and compared his rate of producing each kind of event with the overall average of the league he played in. For example, it would take a player’s doubles divided by his plate appearances, and see if he rattled them out at a better rate than his peers. This was done for a dozen stats: batting average, slugging percentage, on-base percentage, doubles, triples, home runs, walks, strikeouts, stolen bases, hit by pitch, sacrifice hits, and grounding into double plays. (The last nine stats were all figured on a per-plate appearance basis.) For negative actions, such as GIDP, I counted it as a mark in a player’s favor if he hit into them less regularly than typical.

This was a fun study, but in hindsight I was so focused on my personal pet project that I possibly missed a bigger use for the database I had created. By comparing how a player did in these dozen categories with league rates, I can potentially create some sabermetric-friendly similarity scores for individual seasons.

Similarity scores are one of the multitude of inventions Bill James created back in the Abstract days (and one Sean Forman has since made available on all player pages at B-ref). The idea is simple: compare a player’s stats to all others, and see who had the most similar careers. If two players had identical numbers, their sim score would be 1,000. The lower the score, the further they were from each other.

This was a breakthrough, but one criticism has consistently been made of it: the results are entirely context-dependent. The game’s offensive environment changes over time, so the raw numbers can be deceiving. This is hardly news, and James himself clearly knew it. The sim scores were never meant to be perfect, but imperfection did not mean they were useless.

The database created for the Grantham study (call it the Grantham Database) offers a way to partially adjust for context. It doesn’t account for park, but its whole purpose is to account for era. That makes it worth looking at.

I should note that traditional sim scores have one huge advantage over what I am doing: they look at a person’s entire career, whereas this only looks at particular seasons. Actually, I’m sure there has to be some sort of way it could be used to account for careers, but that is well beyond my computer skills and the way the database is set up. (Long story—trust me on it, though.)

All I have to do is put a particular season through the database, and it will tell me what the most similar seasons were to it based on league-adjusted rates to the dozen offensive stats listed above. Sounds neat, doesn’t it?

Testing it: the Grantham Database and Geovany Soto

Since the database works best when looking at individual seasons, it probably works best to try out on someone with a short career, such as a rookie from 2008. This way I can look at what he did last year and his entire career arc at the same time. (The database isn’t really designed to look at someone’s entire career arc under any other circumstances.)

Geovany Soto makes a good test case for the database. Not only was he a rookie, but he won the Rookie of the Year Award. Besides, I’m a Cubs fan and I’d like to see what the crystal ball says for him.

Here was Soto’s line in 2008:

G	PA	AB	R	H	2B	3B	HR	RBI
141	563	494	66	141	35	2	23	86

BB	SO	HBP	SH	GIDP	SB	AVG	OBP	SLG
62	121	2	0	11	0	0.285	0.364	0.504

He was above average in most categories, most especially home runs per plate appearance. What happens when the 25-year-old catcher goes through the system?

Well, it turns out his No. 1 sim was another 25-year-old. Naturally, he’s another power hitter. It was the 1995 season for New York Mets first baseman Rico Brogna. Rico Brogna?!?!? I must admit, that would not have been my first guess. Looking at it, though, it makes sense. Like Soto, he had no steals, but plenty of pop. Along the lines, they had similar careers overall.

On the one hand, this is a compliment to Soto. As a catcher, his offensive performance was similar to a first baseman having a career year. That is not a bad thing.

This sim does highlight one problem I have with the Grantham Database. Just because two guys have extremely similar seasons, that does not mean their entire careers will play out the same. To do that, I’d need to examine several years of a player and see whose careers had similar contours. For various reasons, the database (and more importantly, myself) is very poorly equipped to do that.

For what it is worth, here are the top 10 sims for Soto’s 2008 season:

Year	Name	Sim Score
1995	Rico Brogna	892
2008	Adam LaRoche	892
1974	Richie Zisk	888
2000	Jermaine Dye	886
1980	Ken Singleton	881
1996	Jeff Conine	879
1994	Ken Caminiti	878
1975	Richie Zisk	876
1966	Frank Howard	875
2004	Raul Ibanez	871

(Rico Brogna’s season comes narrowly ahead of LaRoche if you take it to the decimals place.)

In Bill James’s sim scores, players almost always have someone over 900 unless they are really unique. That is not the case here, so don’t read too much into the lack of 900s. The scores don’t mean Brogna’s season was especially unique; it’s just a difference between James’s system and the Grantham Database’s method. (The math determining the exact sim scores is boring and takes a long time to explain, so check the References and Resources section if you’re curious. Otherwise, don’t worry about it.)

Two Richie Zisk seasons are present, and a second Jeff Conine year comes in 21st place (1995, if you’re curious).

Interesting, but that highlights another concern: none of those guys are catchers. Positions ought to matter when looking at similar seasons because a catcher who posts similar numbers to a first baseman is more valuable because the position as a whole does not hit as well. Also, part of the fun in doing these things is trying to figure out what lies ahead, and not all positions age the same—that is especially true of catchers.

For that matter, Brogna is one of the few who was the same age as Soto. Most are similarly aged, but not quite the same. Let’s account for this first. I’ll give a one-year age window either way around Soto’s season, so of the nearly 4,000 times someone aged 24 to 26 qualified for the batting title, these men had the most similar production to Soto:

Year	Age	Name	Sim Score
1995	25	Rico Brogna	892
1974	25	Richie Zisk	888
2000	26	Jermaine Dye	886
1975	26	Richie Zisk	876
1987	26	Alvin Davis	868
2002	25	Pat Burrell	865
2006	26	Mark Teixeira	863
2001	24	Pat Burrell	858
2008	25	Miguel Cabrera	853
1992	26	Paul Sorrento	851
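
The age filter amounts to keeping only the seasons whose age falls within a window around the target's age, then ranking what is left. Here is a minimal sketch in Python that uses the ten rows of the table above as the candidate pool. The function name `within_age_window` and the tuple layout are illustrative assumptions, not the actual structure of the Grantham Database.

```python
# Candidate seasons copied from the table above: (year, age, name, sim score).
SEASONS = [
    (1995, 25, "Rico Brogna", 892),
    (1974, 25, "Richie Zisk", 888),
    (2000, 26, "Jermaine Dye", 886),
    (1975, 26, "Richie Zisk", 876),
    (1987, 26, "Alvin Davis", 868),
    (2002, 25, "Pat Burrell", 865),
    (2006, 26, "Mark Teixeira", 863),
    (2001, 24, "Pat Burrell", 858),
    (2008, 25, "Miguel Cabrera", 853),
    (1992, 26, "Paul Sorrento", 851),
]


def within_age_window(seasons, target_age, window=1):
    """Keep seasons within `window` years of target_age, most similar first."""
    kept = [s for s in seasons if abs(s[1] - target_age) <= window]
    return sorted(kept, key=lambda s: s[3], reverse=True)


# Tightening the window to zero leaves only the exact-age matches for Soto (25).
same_age = within_age_window(SEASONS, target_age=25, window=0)
for year, age, name, score in same_age:
    print(year, age, name, score)
```

With `window=1` all ten rows survive, since the table was already restricted to ages 24 to 26. With `window=0` only Brogna, the 1974 Zisk season, the 2002 Burrell season and Cabrera remain.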

Interesting. There are some heavy hitters up there, as well as some memorable fizzlers. (I still remember thinking Alvin Davis was going to be a beast as a kid.)

Still no catchers, though. As it happens, catchers have qualified for the batting title about 1,100 times—295 were ages 24 to 26. (Well, it’s really only 126 because not all stats are available for every year of baseball history.) From that sample size, here are the most similar seasons to Soto’s 2008:

Year	Age	Name	Sim Score
1941	26	Frankie Hayes	839
1990	24	Todd Zeile	819
2004	25	Victor Martinez	805
1997	25	Mike Lieberthal	796
1978	26	Darrell Porter	780
1983	26	Jody Davis	764
2005	26	Victor Martinez	761
1967	25	Randy Hundley	759
1981	25	Lance Parrish	756
1956	25	Gus Triandos	749

Frankie Hayes? Yeah, I never heard of him either. His career was interesting. He broke into the majors for Connie Mack’s post-dynasty teams at age 18 and established himself as an offensive force by age 22. His production fell off drastically after 1941, but he still made All-Star games as late as 1946. His 1940 season is the 14th most similar season to Soto.

Victor Martinez is the only man to make it twice. Like Soto, his first big season came at age 25. He was very good for several years, but last year he was injured and ineffective all season at age 29.

Actually, many of the men above broke down early. That is always typical of catchers, admittedly, but it seems especially pronounced here. For example, the two other Cub catchers, Jody Davis and Randy Hundley, were both through by their early 30s. Lance Parrish was a perennial all-star in his prime, but he collapsed. It’s a bunch of Hall of Very Gooders.

Especially interesting are the 24-to-26-year catchers with the least similar seasons to Soto. The bottom 10 include Carlton Fisk’s 1972 season, and he went on to have the longest career of any catcher in baseball history. Also on the list was Craig Biggio’s year at catcher (before he was shuffled off to second base). Yogi Berra also appears in the bottom 10 (1950) and also has the 11th-least similar season (1949) to Soto’s 2008.

The question

That is interesting, but what does it mean? Do the data mean anything, or are they just fun? The sabermetric world is full of junk stats that are ultimately more interesting than they are illuminating, adding to the information more than they add to our knowledge.

Does this qualify as one? I honestly don’t know. My hunch is that most of the results are more noise than signal. There is some information to be gleaned from it, at times. Run 20 players through it and you might learn something valuable about two or three.

Even then, it is rarely clear-cut and always depends on the interpretation of the info. That’s fine by me, as I’ve always thought sabermetrics is more an art than a science, and this fits. Like a painting, what one thinks of the results depends at least as much on the individual looking at them as it does on the object itself.

Personally, I take one thing from this: a reminder of just how damn hard it is to be an effective hitting catcher for a prolonged period of time. Most of the guys listed (in virtually all of the lists) did well in the short term. Well, jeez, you really don’t need a study to figure out that someone who hits really well at age 25 will be good for a while.

Yet it is difficult, especially for a catcher, to keep it up over the long haul. The Hall of Very Good is exponentially larger than the Hall of Fame for a reason. My hunch is that Soto has a chance to age better than many of those listed above because he doesn’t quite have as many miles on his legs. Hundley still owns the record for most games caught in a season, with 160. When Jody Davis worked in Wrigley, the Cubs stupidly cut his backup. Hayes I already went over. Still, Soto’s upside is likely Lance Parrish. That’s not bad, to put it mildly. That being said, upsides, by their very nature, are optimistic-looking projections.

One more step

Actually, there is one other tweak that should be addressed. The Grantham Database is a very blunt instrument in that it weighs all 12 factors the same. If a person has very similar power, hitting, strikeouts and walks to another player, but is wildly off on sacrifice hits and hit by pitches, he won’t appear on the leaderboard.

Let’s focus on what really matters. While we’re at it, let’s polish it up a bit more. Some of the categories overlap—doubles, home runs, and slugging average all tell us something about power, for example.

I want to focus on some important, separate skills embodied in the Grantham Database and take it from there. To my mind, there are five worth focusing on: batting average, walks per plate appearance, strikeouts per plate appearance, stolen bases per plate appearance, and isolated power. (The last one isn’t one of the Grantham Database’s stats, but is easy to figure out based on it.)
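
Isolated power is conventionally slugging average minus batting average, so it can be derived from two stats the database already holds. Here is a small sketch in Python using Soto's 2008 line. The article does not spell out how the league-relative version was computed, so treating it as player ISO divided by league ISO is an assumption, and both function names are illustrative.

```python
def isolated_power(slg, avg):
    """Isolated power: slugging average minus batting average."""
    return slg - avg


def relative_iso(player_slg, player_avg, league_slg, league_avg):
    """Assumed league adjustment: player ISO divided by league ISO.

    The league figures would come from the database; none are supplied here.
    """
    return isolated_power(player_slg, player_avg) / isolated_power(league_slg, league_avg)


# Soto's 2008 SLG and AVG, taken from his stat line.
soto_iso = isolated_power(0.504, 0.285)
print(round(soto_iso, 3))  # 0.219
```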

They all look at different areas—hitting, batting eye, plate judgment, speed, and power—that matter. Some stats left out are more important, most notably OBP, but they are at least a bit redundant of what already is in. I’m looking for stats that best isolate components of offensive ability rather than the stats that are the most meaningful in and of themselves. I’d like to add in GIDP, but that rubs out much of the database, because MLB only began collecting that info in the 1930s.

Based on that, here are the most similar seasons to Soto’s 2008:

Year	Age	Name	Sim Score
1978	26	Darrell Porter	916
1956	25	Gus Triandos	914
1927	26	Gabby Hartnett	891
1973	24	Earl Williams	883
1941	26	Frankie Hayes	878
1965	24	Joe Torre	870
1967	26	Joe Torre	862
1985	25	Rich Gedman	857
1961	26	Johnny Romano	846
2004	25	Victor Martinez	834

Victor Martinez’s 2005 is the 11th-most similar season. It’s a better list, but it is still mostly a list of Hall of Very Gooders.

Want to hear something really cool? When I get rid of the age and position constraints and look for Soto’s 10 most similar seasons of all time among all the 12,500+ times someone qualified for a batting title in leagues that recorded all five stats, the names are virtually identical to the list just given.

Out of 12,500+ seasons, Darrell Porter’s 1978 is still number one. Hell, nine of the 10 seasons are the same. The only difference is that 23-year-old shortstop Ossie Bluege’s 1933 season squeezes in between the two Torre campaigns. The rest of the top 15 consist entirely of catchers ages 24 to 26. Neat, huh?

That leads me to conclude that this five-distinct-stat approach works far better than the overall approach.

Of the catchers listed above as Soto’s best comps, all made it to at least two All-Star teams, except for Earl Williams, who had to settle for winning a Rookie of the Year Award. Going by Pete Palmer’s batting runs statistic, four (Gus Triandos, Rich Gedman, Johnny Romano, and Victor Martinez) had their best offensive campaigns in their Soto-comp seasons. That being said, almost all remained effective hitters for a while afterwards.

In terms of career arc, the best comps from those listed above are Romano and Martinez. Like Soto, they did not play much until turning age 25, but performed extremely well as soon as they established themselves as starters. Romano remained an effective catcher through his age-31 season, but apparently was derailed by an injury, playing only 24 more games afterward. Martinez had four first-rate seasons before last year’s injury-marred campaign. We’ll see what the future holds for him.

It should be remembered that this approach does not adjust for park. Since Wrigley played as a hitters’ park last year, Soto’s numbers are thus inflated and the comps a bit over his head. Also, they all played more MLB games prior to age 25 than Soto. In fact, several had established themselves as stars before age 25.

Right now, my best guess is that Soto will be a cut below Romano in his career.

References & Resources
The Grantham Database was assembled thanks to B-ref’s Play Index. Bill James’s version of similarity scores can be found on pages 72-72 of The 1987 Baseball Abstract.

The math behind sim scores:

I’ll use an example to explain: let’s start with figuring Geovany Soto’s batting average. He hit .285 in a league with a .260 overall mark. Divide .285 by .260 for a result of 1.096: in other words, his batting average was 9.6 percent better than league average. Now run this for everyone else in the database for all stats under examination. (For stats a player wants to avoid, such as strikeouts and grounding into double plays, invert it, so it’s league rate divided by his rate. That way a score of 1.096 will always mean the individual was 9.6 percent better.)
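
The ratio step can be sketched in a few lines of Python. The function name `relative_score` and its `lower_is_better` flag are illustrative choices, and the league strikeout rate is a made-up placeholder rather than a real 2008 figure. Only the .285 and .260 batting averages and Soto's 121 strikeouts in 563 plate appearances come from the article.

```python
def relative_score(player_rate, league_rate, lower_is_better=False):
    """Compare a player's rate with the league's; 1.0 means league average.

    For stats a hitter wants to avoid, the ratio is inverted so that a
    value above 1.0 always means better than the league.
    """
    if lower_is_better:
        return league_rate / player_rate
    return player_rate / league_rate


# Batting average example from the text: .285 against a .260 league.
soto_avg = relative_score(0.285, 0.260)
print(round(soto_avg, 3))  # 1.096

# Strikeouts per plate appearance, inverted because fewer is better.
# HYPOTHETICAL_LEAGUE_SO_RATE is a placeholder, not a real league rate.
HYPOTHETICAL_LEAGUE_SO_RATE = 0.18
soto_so = relative_score(121 / 563, HYPOTHETICAL_LEAGUE_SO_RATE, lower_is_better=True)
```

One caveat about this sketch: a player with a rate of zero in an inverted stat would trigger a division by zero, and the article does not say how such cases were handled.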

At any rate, bringing it back to the batting average example, once you’ve figured Soto’s average was 1.096, figure out the difference between each player’s batting average score and 1.096. (In Excel, the equation is =ABS(1.096-player AVG).)

Now do this for every stat under investigation. In the first version given in this article, that is 12 stats. Figure out the difference between each player’s scores and Soto’s scores for all 12 categories. Then determine the average difference.

If an average difference between Soto and another player is 22 percent, then they would have a similarity score of 780—22 percent different, so 78 percent similar. (In keeping with classic similarity scores, this is set with a possible high of 1,000.)
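
Put together, the score is 1,000 times one minus the average absolute difference across the categories. Here is a minimal Python sketch of that arithmetic. The function name `sim_score` and the dictionary layout are assumptions, and the league-relative values below are invented so that the average difference works out to 22 percent.

```python
def sim_score(target, other):
    """Similarity on a 1,000-point scale from league-relative scores.

    `target` and `other` map each stat name to a league-relative score.
    """
    diffs = [abs(target[stat] - other[stat]) for stat in target]
    avg_diff = sum(diffs) / len(diffs)
    return 1000 * (1 - avg_diff)


# Hypothetical league-relative scores for two seasons (not real data).
season_a = {"AVG": 1.10, "BB": 1.20, "HR": 1.50}
season_b = {"AVG": 1.00, "BB": 1.00, "HR": 1.14}

# Differences of 0.10, 0.20 and 0.36 average out to 0.22.
print(round(sim_score(season_a, season_b)))  # 780
```

Two identical seasons have no differences at all, so they score exactly 1,000.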

This is why sim scores here are rarely over 900. All it takes is a major discrepancy on one stat to really foul up the average, and with a dozen stats you’ll have at least one way off virtually every time. It’s virtually impossible to be within 5 percent all the time, so a sim score of 950 is essentially unattainable even if no single category is far off.

Ultimately, the sim score itself is derived very differently from how James did it, which is why a sim score of 820 here and one of 820 for him mean very different things. Please remember Rico Brogna’s sim score might “only” be 892 by the first method, but it’s out of 14,000+ seasons.

For the sim score version given in the second half, the same system is used, but now there are only five categories instead of 12.
