Friday, March 23, 2012
Similarity Scores: a very beta feature
Posted by Dan Brooks
Last night during the heart-stopping Syracuse-Wisconsin game, Harry and I were talking (okay, Harry was mostly watching his team win by the skin of its teeth) about ways to improve the Brooks Baseball player card system. We exchanged some data and are presenting the first of our "data-driven" search tools: pitcher similarity.
This feature is incredibly beta and likely to change over the next few weeks, but right now when you search for a player (let’s pick Josh Beckett), you will get a table listing other players in the "Josh Beckett Family," along with the "distance" to each player.
The scores are generated by comparing each pitcher’s vector of pitch speed, frequency, release, spin angle, and spin rate, using MATLAB’s knnsearch function to identify the nearest neighbors. Currently, we’re presenting the top five neighbors for each pitcher.
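For readers who want to see the idea in miniature, here is a rough sketch of a nearest-neighbor lookup in plain Python. It is not the code behind the site (that runs in MATLAB); the pitcher names, the feature values, and the helper names `standardize` and `nearest_neighbors` are all made up for illustration, and rescaling each feature before measuring distance is an assumption on my part, not something described above.

```python
import math

# Hypothetical feature vectors, one per pitcher:
# (speed in mph, usage fraction, release height in ft,
#  spin angle in degrees, spin rate in rpm). All values are invented.
pitchers = {
    "Pitcher A": (93.0, 0.55, 6.1, 200.0, 2200.0),
    "Pitcher B": (92.5, 0.53, 6.0, 205.0, 2250.0),
    "Pitcher C": (88.0, 0.40, 5.5, 170.0, 1900.0),
    "Pitcher D": (96.0, 0.65, 6.4, 215.0, 2400.0),
}

def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1.

    Assumption: without this step, large-valued features such as spin
    rate would swamp small-valued ones such as usage fraction.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [
        tuple((x - m) / s for x, m, s in zip(row, means, stds))
        for row in rows
    ]

def nearest_neighbors(name, data, k=5):
    """Return up to k (distance, pitcher) pairs, closest first."""
    names = list(data)
    scaled = dict(zip(names, standardize([data[n] for n in names])))
    target = scaled[name]
    # Euclidean distance from the target to every other pitcher.
    dists = [(math.dist(target, scaled[n]), n) for n in names if n != name]
    return sorted(dists)[:k]

for distance, neighbor in nearest_neighbors("Pitcher A", pitchers, k=2):
    print(f"{neighbor}: {distance:.2f}")
```

A brute-force scan like this is fine for a few hundred pitchers; a dedicated nearest-neighbor routine mostly matters when the data set gets large.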
These are not perfect right now. We haven’t weighted the scores yet (that’s another conversation over basketball), so while we do a good job of representing pitch mix and style, we’re not doing a very good job of capturing pitch speed yet.
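To make the weighting idea concrete, here is one way it could look: a Euclidean distance where each feature's squared difference is multiplied by a weight. This is only a sketch of a possible approach, not what we have settled on; the function name `weighted_distance`, the example vectors, and the weight values are all hypothetical.

```python
import math

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical, already-rescaled features in the order:
# (speed, frequency, release, spin angle, spin rate).
target = (0.0, 0.0, 0.0, 0.0, 0.0)
same_mix_slower = (-1.5, 0.1, 0.1, 0.1, 0.1)         # similar style, much slower
same_speed_other_mix = (0.1, 0.9, 0.9, 0.9, 0.9)     # similar speed, different style

equal_weights = (1, 1, 1, 1, 1)
speed_heavy = (4, 1, 1, 1, 1)  # made-up weights that emphasize speed

for label, weights in (("equal", equal_weights), ("speed-heavy", speed_heavy)):
    d_slow = weighted_distance(target, same_mix_slower, weights)
    d_mix = weighted_distance(target, same_speed_other_mix, weights)
    closer = "same_mix_slower" if d_slow < d_mix else "same_speed_other_mix"
    print(f"{label}: closer match is {closer}")
```

With equal weights the slower pitcher with the matching mix comes out closer; once speed is weighted more heavily, the pitcher with the matching speed wins instead. Picking the actual weights is the hard part.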
There are also a few pitchers with hardly any comparables. Matt (@HouseOfTheBB) noted that neither Roy Halladay nor Mariano Rivera has a comparable pitcher at all!
Punch in a few pitchers, and let us know how our system is doing. Let us know over Twitter if we’ve really missed on someone. I'm @brooksbaseball, and Harry is @harrypav.
Dan Brooks is a Neuroscientist at Brown University. He operates BrooksBaseball.net and eats Fried Chicken during every Red Sox game, especially in September. Come follow him @brooksbaseball.







Had an auction-keeper draft last week. 10-team, AL-only, 12 fielders, 8 pitchers (field gets thin fast). My last pick was Vargas at $1. I plugged him into your system and his closest match is Lester (283). Is this a good sign? How could this be used for a fantasy edge?