May 23, 2013

Creative Commons License
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.

Friday, March 23, 2012

Similarity Scores: a very beta feature

Posted by Dan Brooks
Last night during the heart-stopping Syracuse-Wisconsin game, Harry and I were talking (okay, Harry was mostly watching his team win by the skin of its teeth) about ways to improve the Brooks Baseball player card system. We exchanged some data and are presenting the first of our "data-driven" search tools: pitcher similarity.

This feature is incredibly beta and likely to change over the next few weeks, but right now when you search for a player (let’s pick Josh Beckett), you will get a table listing other players in the "Josh Beckett Family," along with the "distance" to each player.

The scores are generated by comparing a vector of pitch speed, frequency, release point, spin angle, and spin rate, using MATLAB’s knnsearch function to identify each pitcher’s nearest neighbors. Currently, we’re presenting the top five neighbors for each pitcher.
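As a rough illustration of the nearest-neighbor idea, here is a minimal pure-Python sketch. It is not the Brooks Baseball code (which uses MATLAB’s knnsearch); the feature layout, the pitcher names, and every number are hypothetical, and p=2 (Euclidean) is an assumed exponent for the Minkowski distance.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance between two equal-length feature vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbors(query_name, profiles, k=5, p=2.0):
    """Return the k pitchers closest to query_name as (name, distance), nearest first.

    profiles maps pitcher name -> feature vector. The query pitcher is skipped
    so that nobody is reported as his own closest comparable.
    """
    query = profiles[query_name]
    scored = [
        (name, minkowski(query, vector, p))
        for name, vector in profiles.items()
        if name != query_name
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical, made-up profiles: [speed (mph), usage (%), release height (ft),
# spin angle (deg), spin rate (rpm)]. Real inputs would be per-pitcher averages
# from pitch-tracking data.
profiles = {
    "Pitcher A": [93.0, 55.0, 6.1, 210.0, 2200.0],
    "Pitcher B": [92.0, 60.0, 6.0, 215.0, 2150.0],
    "Pitcher C": [88.0, 40.0, 5.8, 180.0, 1900.0],
    "Pitcher D": [95.0, 70.0, 6.3, 220.0, 2400.0],
}

for name, distance in nearest_neighbors("Pitcher A", profiles, k=2):
    print(f"{name}: {distance:.1f}")
```

This prints Pitcher B at about 50.5 and Pitcher D at about 200.8. Note that spin rate, measured in thousands of rpm, contributes most of each distance, which is one way unweighted features can under-represent something like speed.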

These are not perfect right now. We haven’t weighted the scores yet (that’s another conversation over basketball), so while we do a good job representing pitch mix and style, we’re not yet doing a very good job capturing pitch speed.
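To see why weighting matters, the sketch below (hypothetical data and function names, not the authors’ weighting scheme) first rescales each feature to a common scale and then multiplies each feature’s term by a weight. With raw numbers, a 100 rpm spin-rate gap swamps a 7 mph speed gap; after standardizing and doubling the weight on speed, the pitcher who matches on speed becomes the nearest neighbor.

```python
def standardize(profiles):
    """Rescale every feature to mean 0 and standard deviation 1 across all pitchers."""
    names = list(profiles)
    columns = list(zip(*(profiles[name] for name in names)))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((x - mean) ** 2 for x in col) / len(col)
        stds.append(variance ** 0.5 or 1.0)  # fall back to 1.0 for a constant feature
    return {
        name: [(x - m) / s for x, m, s in zip(profiles[name], means, stds)]
        for name in names
    }


def weighted_minkowski(a, b, weights, p=2.0):
    """Minkowski distance with each feature's term scaled by a (hypothetical) weight."""
    return sum(w * abs(x - y) ** p for x, y, w in zip(a, b, weights)) ** (1.0 / p)


def nearest(query_name, profiles, weights, p=2.0):
    """Name of the closest other pitcher under the weighted distance."""
    query = profiles[query_name]
    others = (name for name in profiles if name != query_name)
    return min(others, key=lambda name: weighted_minkowski(query, profiles[name], weights, p))


# Hypothetical two-feature profiles: [fastball speed (mph), spin rate (rpm)].
profiles = {
    "Query": [95.0, 2200.0],
    "Soft-tosser": [88.0, 2210.0],
    "Hard-thrower": [95.0, 2300.0],
}

print(nearest("Query", profiles, weights=[1.0, 1.0]))               # Soft-tosser
print(nearest("Query", standardize(profiles), weights=[2.0, 1.0]))  # Hard-thrower
```

Standardizing first keeps the weights meaningful: without it, a weight on speed would have to be enormous just to offset the difference in units.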

There are also a few pitchers with hardly any comparables. Matt (@HouseOfTheBB) noted that neither Roy Halladay nor Mariano Rivera has a comparable pitcher at all!

Punch in a few pitchers, and let us know how our system is doing. Let us know over Twitter if we’ve really missed on someone. I'm @brooksbaseball, and Harry is @harrypav.



Dan Brooks is a neuroscientist at Brown University. He operates BrooksBaseball.net and eats fried chicken during every Red Sox game, especially in September. Come follow him @brooksbaseball.


Comments

SCIENCE! said...

Had an auction-keeper draft last week: 10-team, AL-only, 12 fielders, 8 pitchers (the field gets thin fast). My last pick was Vargas at $1. I plugged him into your system and his closest match is Lester (283). Is this a good sign? How could this be used for a fantasy edge?

Posted 03/23  at  03:16 PM
Harry Pavlidis said...

Interesting idea, but none of this is directly tied to performance. Naturally speed and movement are correlated but not the end of the story. Adding in pitch mix, height and release point are intended to bring them closer than just stuff alone. It’s also based on a single year and a work in progress.

So, that said .... 283 is pretty good, but the closest I’ve been finding are 120-140. 400 is pretty far apart IMO, but the 200s are a common area. We’ll see how things develop.

Posted 03/23  at  04:25 PM
Nick Steiner said...

Have you guys read this?

http://www.hardballtimes.com/main/article/projecting-hanson/

How are you calculating the sim scores?

Posted 03/23  at  04:47 PM
Harry Pavlidis said...

Yep, good to be reminded: pitch location is not a variable, but it should be, split by batter hand. Oh, Dan…

Posted 03/23  at  10:09 PM
CJ in Austin said...

The closest I’ve found is Roy Oswalt and Zack Greinke at 97.  One of the oddest: most similar for Jordan Lyles is Livan Hernandez…the young and the old.

Posted 03/24  at  09:25 PM
Harry Pavlidis said...

Greinke and Guthrie are 64
http://brooksbaseball.net/player_cards/player_card.php?player=425844

Posted 03/24  at  09:30 PM
Max Marchi said...

Having Mo with no comps seems to me a hit for the algorithm.

Posted 03/25  at  12:37 PM
Dan Brooks said...

The algorithm is the Matlab KNN Function with some Minkowski metric as the distance function. So, really, the fact that Mo has no comps is more a win for the quality of the data than anything else. =)

Posted 03/25  at  12:57 PM
jessef said...

Have you considered putting all these data into an ordination like principal components analysis or nonmetric multidimensional scaling? For what it’s worth, I generated an NMDS ordination based on pitch usage data only: http://www.bluebirdbanter.com/2012/2/12/2793612/sixteen-clumsy-and-shy-how-weird-are-knuckleballers-arsenals

Posted 03/25  at  03:07 PM
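One way to try the ordination idea is sketched below under stated assumptions: a bare-bones principal components analysis in pure Python that finds only the first principal axis, by power iteration on the covariance matrix, and projects each pitcher onto it. It is PCA, not NMDS; the toy rows and function name are hypothetical, and a real analysis would standardize the features first and would more likely use a numerical library.

```python
def first_principal_component(rows, iterations=200):
    """Leading principal axis of the rows (power iteration) and each row's score on it.

    Assumes at least two rows that are not all identical; otherwise the
    covariance matrix is zero and the normalization below divides by zero.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting guess for the leading eigenvector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Project each centered row onto the axis to get its coordinate in the ordination.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical rows that lie exactly on a line, so the first axis explains everything.
axis, scores = first_principal_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
print(axis, scores)
```

Pitchers that land close together on the leading axes of an ordination like this would be natural candidates for the same "family."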
philly said...

Pretty cool, but it fails the knuckleball test:

Wakefield:

Pitcher Distance

Michael Ekstrom 1964
Craig Kimbrel 1964
Evan Meek 1964
Anthony Slama 1966
Chris Scholl 1969

Dickey:

Pitcher Distance

Parker Frazier 1863
Matt Daley 1864
Erik Hamren 1869
Donn Roach 1992
Kyle Waldrop 2015

When I saw Wake’s distances clustered in the 1960s, I wondered if that was near a max distance, but Dickey blows him out of the water.

Posted 03/27  at  11:37 AM
Harry Pavlidis said...

Wake and Dickey throw different speed knucklers (Wake also has more than one speed but rarely throws the super slow one ... excuse me, rarely threw the super slow one). Wake also threw slower fastballs than Dickey’s sinker, and Dickey throws changeups.
That said, the weightings are going to be adjusted and we may add inputs (I already gave Dan all the data split by batter hand and threw in some pitch location stuff just to play with)

Posted 03/27  at  11:54 AM
George said...

I would be interested to see what kind of year-to-year numbers this would spit out. It would give an idea of how much an individual pitcher changed from year to year. Also, doing an even/odd comparison to see how similar a pitcher is to himself would show how good this metric is.

Posted 03/29  at  05:44 PM
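One reading of the even/odd idea is a split-half check: build one profile from a pitcher’s even-numbered pitches and one from his odd-numbered pitches, then measure the distance between them. If the metric is sound, that self-distance should usually be much smaller than the distance to his nearest other pitcher, and the same distance function could compare one season’s profile with the next. The per-pitch feature layout, the simple averaging, the function names, and the sample numbers below are all hypothetical.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance between two equal-length vectors; p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def profile(pitches):
    """Average feature vector over a list of per-pitch feature vectors."""
    n = len(pitches)
    return [sum(pitch[j] for pitch in pitches) / n for j in range(len(pitches[0]))]


def split_half_distance(pitches, p=2.0):
    """Distance between the profiles of the even- and odd-numbered pitches."""
    even, odd = pitches[0::2], pitches[1::2]
    return minkowski(profile(even), profile(odd), p)


# Hypothetical per-pitch features: [speed (mph), spin rate (rpm)].
pitches = [[92.0, 2150.0], [93.0, 2180.0], [91.5, 2140.0], [92.5, 2170.0]]
print(round(split_half_distance(pitches), 2))  # 30.02
```

Averaging over many pitches per half matters: with only a handful, as here, random pitch-to-pitch variation alone produces a sizable self-distance.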