Similarity Scores: a very beta feature

Last night during the heart-stopping Syracuse-Wisconsin game, Harry and I were talking (okay, Harry was mostly watching his team win by the skin of its teeth) about ways to improve the Brooks Baseball player card system. We exchanged some data and are presenting the first of our “data-driven” search tools—pitcher similarity.

This feature is incredibly beta and likely to change over the next few weeks, but right now when you search for a player (let’s pick Josh Beckett), you will get a table listing other players in the “Josh Beckett Family,” along with the “distance” to each player.

The scores are generated by comparing a vector of pitch speed, frequency, release, spin angle, and spin rate using MATLAB’s knnsearch algorithm to identify neighbors. Currently, we’re presenting the top five neighbors for each pitcher.
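For readers who want to see the idea in miniature, here is a rough Python sketch of a nearest-neighbor lookup over pitcher feature vectors. It is not the code running on the site, which uses MATLAB's knnsearch. The pitcher names, the five-number feature layout, and every value below are invented for illustration, and the Minkowski exponent `p` is an assumption (`p = 2` is plain Euclidean distance).

```python
# Hypothetical feature vectors, one per pitcher:
# (speed mph, usage fraction, release height ft, spin angle deg, spin rate rpm)
# All names and numbers are made up for this sketch.
PITCHERS = {
    "Pitcher A": (93.0, 0.55, 6.1, 210.0, 2250.0),
    "Pitcher B": (92.5, 0.52, 6.0, 205.0, 2300.0),
    "Pitcher C": (88.0, 0.40, 5.5, 180.0, 1900.0),
    "Pitcher D": (95.5, 0.65, 6.3, 215.0, 2400.0),
}


def minkowski(u, v, p=2.0):
    """Minkowski distance between two equal-length vectors (p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def nearest_neighbors(name, pitchers, k=5, p=2.0):
    """Return up to k (other_name, distance) pairs, closest first."""
    target = pitchers[name]
    scored = [
        (minkowski(target, vec, p), other)
        for other, vec in pitchers.items()
        if other != name  # a pitcher is not his own neighbor
    ]
    scored.sort()
    return [(other, dist) for dist, other in scored[:k]]


family = nearest_neighbors("Pitcher A", PITCHERS, k=2)
for other, dist in family:
    print(f"{other}: {dist:.1f}")
```

Note that with raw, unscaled inputs like these, the feature with the largest numeric range (spin rate here) contributes most of the distance.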

These are not perfect right now. We haven’t weighted the scores yet (that’s another conversation over basketball), so while we do a good job representing pitch mix and style, we’re not doing a very good job capturing pitch speed yet.
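To illustrate why weighting matters, here is a small Python sketch of one common approach: standardize each feature, then apply per-feature weights inside the distance. This is only an assumption about how weighting could work, not the scheme we will end up using, and the two-feature layout, the values, and the weights are all invented.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so no feature dominates just because of its units."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against zero spread
    return [tuple((x - m) / s for x, m, s in zip(row, mus, sds)) for row in rows]


def weighted_distance(u, v, weights):
    """Euclidean distance with a non-negative weight on each squared difference."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)) ** 0.5


# Invented (speed mph, spin rate rpm) pairs for three hypothetical pitchers.
rows = [(95.0, 2200.0), (88.0, 2210.0), (94.5, 2350.0)]

# Raw distances: spin rate swamps speed, so the pitcher throwing 7 mph slower
# looks like the closer match to the first pitcher.
raw_01 = weighted_distance(rows[0], rows[1], [1.0, 1.0])
raw_02 = weighted_distance(rows[0], rows[2], [1.0, 1.0])

# Standardized, with a hypothetical 4x weight on speed.
z = standardize(rows)
speed_heavy = [4.0, 1.0]
w_01 = weighted_distance(z[0], z[1], speed_heavy)
w_02 = weighted_distance(z[0], z[2], speed_heavy)

print(raw_01 < raw_02)  # slower pitcher ranks closer on raw inputs
print(w_02 < w_01)      # similar-speed pitcher ranks closer once speed is weighted
```

In this toy example the ranking flips once speed is put on the same scale as spin and given extra weight, which is the kind of behavior a weighting pass is meant to produce.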

There are also a few pitchers with hardly any comparables. Matt (@HouseOfTheBB) noted that neither Roy Halladay nor Mariano Rivera has a comparable pitcher at all!

Punch in a few pitchers, and let us know how our system is doing. Let us know over Twitter if we’ve really missed on someone. I’m @brooksbaseball, and Harry is @harrypav.


Comments

  1. SCIENCE! said...

    Had an auction-keeper draft last week. 10 team, AL only, 12 fielders, 8 pitchers (field gets thin fast). My last pick was Vargas at $1. I plugged him into your system and his closest match is Lester(283). Is this a good sign? How could this be used for a fantasy edge?

  2. Harry Pavlidis said...

    Interesting idea, but none of this is directly tied to performance. Naturally speed and movement are correlated but not the end of the story. Adding in pitch mix, height and release point are intended to bring them closer than just stuff alone. It’s also based on a single year and a work in progress.

    So, that said …. 283 is pretty good, but the closest I’ve been finding are 120-140. 400 is pretty far apart IMO, but the 200s are a common area. We’ll see how things develop.

  3. Harry Pavlidis said...

    yep, good to be reminded: pitch location is not a variable, but it should be, split by batter hand. Oh, Dan…

  4. CJ in Austin said...

    The closest I’ve found is Roy Oswalt and Zack Greinke at 97.  One of the oddest: most similar for Jordan Lyles is Livan Hernandez…the young and the old.

  5. Dan Brooks said...

    The algorithm is the Matlab KNN Function with some Minkowski metric as the distance function. So, really, the fact that Mo has no comps is more a win for the quality of the data than anything else. =)

  6. philly said...

    Pretty cool, but it fails the knuckleball test:

    Wakefield:

    Pitcher Distance

    Michael Ekstrom 1964
    Craig Kimbrel 1964
    Evan Meek 1964
    Anthony Slama 1966
    Chris Scholl 1969

    Dickey:

    Pitcher Distance

    Parker Frazier 1863
    Matt Daley 1864
    Erik Hamren 1869
    Donn Roach 1992
    Kyle Waldrop 2015

    When I saw Wake’s clustered in the 1960s I wondered if that was near a max distance, but Dickey blows him out of the water.

  7. Harry Pavlidis said...

    Wake and Dickey throw different speed knucklers (Wake also has more than 1 speed but rarely throws the super slow … excuse me, rarely threw the super slow), also Wake threw slower fastballs than Dickey’s sinker, Dickey throws changes.
    That said, the weightings are going to be adjusted and we may add inputs (I already gave Dan all the data split by batter hand and threw in some pitch location stuff just to play with)

  8. George said...

    I would be interested to see what kind of year-to-year numbers this would spit out. It would give an idea of how much an individual pitcher changed year to year. Also, doing an even-odd comparison would show how similar a pitcher is to himself, just to see how good this metric is.
