May 18, 2013
All content on this site (including text, graphs, and any other original works), unless otherwise noted, is licensed under a Creative Commons License.
Hello. Ball-tracking technology—PITCHf/x and its offspring—has changed the way we look at the game of baseball. This is a place for our writers to share pitcher profiles and thumbnails, topical information about games, trades and anything else we can think of that ball-tracking technology helps us understand or enjoy.
Friday, March 23, 2012
Last night during the heart-stopping Syracuse-Wisconsin game, Harry and I were talking (okay, Harry was mostly watching his team win by the skin of its teeth) about ways to improve the Brooks Baseball player card system. We exchanged some data and are presenting the first of our "data-driven" search tools—pitcher similarity.
This feature is incredibly beta and likely to change over the next few weeks, but right now when you search for a player (let’s pick Josh Beckett), you will get a table listing other players in the "Josh Beckett Family," along with the "distance" to each player.
The scores are generated by comparing a vector of pitch speed, frequency, release, spin angle, and spin rate using MATLAB’s knnsearch algorithm to identify neighbors. Currently, we’re presenting the top five neighbors for each pitcher.
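To make the idea concrete, here is a minimal Python sketch of a nearest-neighbor lookup. It is not the MATLAB code behind the player cards: the function name, the pitcher names, and the feature values are all made up for illustration, and it assumes plain unweighted Euclidean distance (which, as far as we know, is what knnsearch uses by default).

```python
import math

def nearest_neighbors(profiles, target, k=5):
    """Return the k pitchers closest to `target` by unweighted Euclidean distance."""
    query = profiles[target]
    distances = []
    for name, vector in profiles.items():
        if name == target:
            continue  # a pitcher is not his own neighbor
        distances.append((math.dist(query, vector), name))
    distances.sort()  # smallest distance first
    return [(name, dist) for dist, name in distances[:k]]

# Hypothetical profiles, not real data:
# (fastball mph, fastball share, spin angle in degrees, spin rate in hundreds of rpm)
profiles = {
    "Pitcher A": (93.0, 0.55, 210.0, 22.0),
    "Pitcher B": (92.5, 0.50, 212.0, 21.5),
    "Pitcher C": (88.0, 0.40, 180.0, 18.0),
    "Pitcher D": (95.0, 0.65, 215.0, 24.0),
}

# The two closest members of the "Pitcher A Family", with their distances.
family = nearest_neighbors(profiles, "Pitcher A", k=2)
```

Notice that in a sketch like this the features with the largest numeric range (here, spin angle) dominate the distance, which is one reason unweighted scores can underrepresent something like pitch speed.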
These are not perfect right now. We haven’t weighted the scores yet (that’s another conversation over basketball), so while we do a good job representing pitch mix and style, we’re not doing a very good job of capturing pitch speed yet.
There are also a few pitchers with hardly any comparables. Matt (@HouseOfTheBB) noted that neither Roy Halladay nor Mariano Rivera has a comparable pitcher at all!
Punch in a few pitchers, and let us know how our system is doing. Let us know over Twitter if we’ve really missed on someone. I'm @brooksbaseball, and Harry is @harrypav.
Friday, March 16, 2012
Here's something new. Jeffrey Gross from THT Fantasy and I are exchanging "watch this guy" ideas—fantasy picks from one side and PITCHf/x based picks on the other. First up is Jeff's first breakout candidate for a cheap but valuable pitcher—Kansas City's Danny Duffy.
Click for more...
Thursday, March 08, 2012
Your definition of fun may vary. But Yu Darvish and his eight-pitch mix are going to make life interesting for catchers, hitters and even PITCHf/x analysts. Here's a picture from his Cactus League debut. The pitch in red was a strike three splitter to end an inning. The axes show movement during the flight to home plate from the catcher's perspective.
Click for more...
I thought it important to describe a new feature we've added to the PitchFX Player Cards over the last month or so. I’ve previously tweeted (@Brooksbaseball) about these features but haven’t described them in detail.
When the cards first debuted, we were asked by a number of people to provide average data for comparison, especially for the "Sabermetric Outcomes" table. For example, if Clay Buchholz got 45.96 percent whiffs/swing on his change-up, people wanted to know how good that was relative to other pitchers, and so they wanted some average number of swings and misses. They had a feeling it was good, but they wanted to know just how good.
What people don’t realize is that they don’t really want the average. An average on its own is useful in some contexts, but it isn’t nearly as useful as knowing something about the distribution of scores. For example, if I told you that on some made-up metric Buchholz was a 7, and that the average was a 5, you’d know that Buchholz was above average but you wouldn’t know by how much. Maybe on this metric most good players score between 5 and 6, and so 7 is really outstanding. Maybe on this metric most good players score between 5 and 25, and so 7 is really not very exceptional.
So you can see, it would be nicer if I told you instead something about how far Buchholz was from the mean score as an expression of the distribution of scores. For that, we can use a Z-score.
The Z-Score is a simple concept in statistics. Simply, it tells you how many standard deviations a score is from the mean score. So, if you now scroll down to the “Sabermetric Outcomes” table on Buchholz’s player card, you can change “Percentages” to Z-Scores. This will contextualize the percentages that you see on the table (all of them, not just whiffs) so that you can better understand just how good the pitches are that you’re looking at.
It’s also important to think about which distribution is appropriate for comparison in this case. For example, we could compare Buchholz’s change-up to all other pitches, or, perhaps more appropriately in this context, compare the change-up to all other change-ups. We’ve chosen “all other change-ups” as our distribution. When you change the months or years on the player cards, it will change the numbers used for Buchholz’s data but won’t change the numbers used for calculating the Z-scores, because we didn’t want to get too fine with the comparisons that we made.
There’s also a problem in this dataset that arises when pitchers throw a very small number of pitches, because this makes their whiff numbers artificially high or low (luck plays a larger role). So, we’ve left pitches (e.g., Player X’s change-up) out of the distribution that didn’t get thrown at least 100 times. You can still look at those pitches as a function of Z-score, but they won’t be particularly meaningful. Even with these omissions, we’ve still got a very large sample to work with for each pitch in this case (except Knuckleball, which is a special case on its own).
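Here is a rough Python sketch of that calculation, under a few assumptions of ours: the pitcher names and whiff numbers are invented, the helper name is hypothetical, and we use the population standard deviation (the site may well use the sample version). The 100-pitch cutoff is the one described above.

```python
from statistics import mean, pstdev

MIN_PITCHES = 100  # pitches thrown fewer times than this stay out of the distribution

def z_score(value, population):
    """How many standard deviations `value` sits from the population mean."""
    mu = mean(population)
    sigma = pstdev(population)  # assumption: population, not sample, standard deviation
    return (value - mu) / sigma

# Hypothetical change-ups: (pitcher, times thrown, whiffs per swing in percent)
changeups = [
    ("Pitcher A", 450, 30.0),
    ("Pitcher B", 300, 25.0),
    ("Pitcher C", 620, 35.0),
    ("Pitcher D", 12, 80.0),   # small sample, excluded from the distribution
    ("Pitcher E", 210, 30.0),
]

# Build the comparison distribution from qualifying change-ups only.
reference = [whiff for _, thrown, whiff in changeups if thrown >= MIN_PITCHES]

# Any pitch can still be scored against it, including Pitcher D's,
# but a score built on 12 pitches will not mean much.
z_c = z_score(35.0, reference)
z_d = z_score(80.0, reference)
```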
You can also change the scores into “Pitch IQ Scores.” You can think of “Pitch IQ Scores” exactly as you would think of Z-scores. Some people don’t like Z-scores because they require explaining basic statistics, whereas IQ scores are an intuitive thing that we use in everyday life. The formula here is simply 100+15*Z (just like it is for IQ). We also often use the 100+/- system in baseball, for describing things like ERA+ or OPS+, so describing a pitch as having a 124 Whiff/Swing (pretty damn good) or a 64 Whiff/Swing (worse than useless) seems like something that could catch on, and might be easier for your readers to grok than numbers like “1.6” and “-2.4”.
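As a quick sketch, the conversion is a one-liner in Python (the function name is ours, not something from the site):

```python
def pitch_iq(z):
    """Convert a Z-score to the IQ-style scale: 100 is average, 15 points per standard deviation."""
    return 100 + 15 * z

good = pitch_iq(1.6)    # the "pretty damn good" example above
bad = pitch_iq(-2.4)    # the "worse than useless" example above
```

Rounded to whole numbers, those two come out to 124 and 64, matching the examples in the text.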
Have fun, tab through, figure out which representation of the data you like best. I hope you enjoy the new features and we look forward to hearing additional feedback as the season begins.
Monday, March 05, 2012
The radar gun in Peoria may be running a little warm, but Andrew Cashner was throwing gas. The ex-Cub came in for the Padres in the sixth inning. He threw nothing under 101 mph and broke 103 a few times. Even if you deduct a couple mph for what are probably calibration issues with the PITCHf/x system, he was still pumping serious heat.
Click for more...
Wednesday, February 29, 2012
How Brett Myers will do as a closer and how valuable he is to the Astros in that role is not the question for this space. Just the fastballs.
Click for more...
Monday, February 27, 2012
Pitch classification is not an exact science. It requires a lot of review and rechecks over time to ensure no funny IDs have snuck in and that a pitcher's arsenal is accurately reflected. That last point can be the most difficult, especially if a pitcher changes things over time. And then there are changes in arm angles.
So, what's a PITCHf/x junkie to do? Well, the Brooks Baseball player cards are a start, as they've given anyone with the urge to review data the chance to do so, even if that's not what they're setting out to do. Take a recent effort by Adam Foster of Project Prospect, mix in a little Twitter and, voila, you can polish up some pitch classifications with relative ease.
Oh, that little Twitter put in the blender? It is especially helpful if the pitcher in question is active in the micro-blogging space. And an ESPN The Magazine feature rounds it all out.
Click for more...
Tuesday, February 21, 2012
After much anticipation, the trade that sent A.J. Burnett from the Yankees to the Pirates was finally completed on Sunday. Burnett had a tumultuous three seasons in the Bronx, with a 4.79 ERA and 4.5 walks/hit batters per nine over 584 regular-season innings.
His signature pitch is an 83 mph spike-curveball with exceptional movement. (Harry Pavlidis mentioned it here a few weeks ago; it’s not quite as nasty as Craig Kimbrel’s.) Because he does a consistent job of keeping it below the strike zone, hitters don’t make much contact with the pitch when they swing at it, but it’s taken for a ball when they don’t. The two tables below are for whiffs per swing and balls per pitch, minimum 2,500 pitches thrown dating back to 2007.
Rank  Pitcher            Pitch      #     Whiff%
1     A.J. Burnett       Curveball  4575  43.9%
2     Cole Hamels        Changeup   4158  43.8%
3     Rich Harden        Changeup   3053  43.4%
4     Francisco Liriano  Slider     2632  43.4%
5     Tim Lincecum       Changeup   2946  42.9%
Rank  Pitcher            Pitch      #     Ball%
1     Randy Wolf         Curveball  2562  44.2%
2     Gio Gonzalez       Curveball  2663  43.2%
3     A.J. Burnett       Curveball  4575  43.1%
4     Justin Verlander   Curveball  3244  42.3%
5     Kevin Millwood     Fastball   2718  41.2%
Pitch labels are THT's.
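For anyone who wants to build a leaderboard like the two above, here is a minimal Python sketch. The class, the field names, and every count in it are hypothetical; only the two rate definitions (whiffs per swing, balls per pitch) and the 2,500-pitch minimum come from the text.

```python
from dataclasses import dataclass

@dataclass
class PitchLine:
    pitcher: str
    pitch: str
    thrown: int   # total pitches of this type
    swings: int
    whiffs: int
    balls: int

    @property
    def whiff_per_swing(self):
        return self.whiffs / self.swings

    @property
    def ball_per_pitch(self):
        return self.balls / self.thrown

def leaderboard(lines, stat, min_pitches=2500, top=5):
    """Rank qualifying pitches by `stat`, highest first."""
    qualified = [line for line in lines if line.thrown >= min_pitches]
    return sorted(qualified, key=lambda line: getattr(line, stat), reverse=True)[:top]

# Made-up counts, purely for illustration.
lines = [
    PitchLine("Pitcher A", "Curveball", 3000, 1000, 400, 1200),
    PitchLine("Pitcher B", "Changeup", 2600, 1200, 300, 1300),
    PitchLine("Pitcher C", "Slider", 500, 200, 150, 100),  # under the minimum
]

whiff_leaders = [line.pitcher for line in leaderboard(lines, "whiff_per_swing")]
ball_leaders = [line.pitcher for line in leaderboard(lines, "ball_per_pitch")]
```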
Friday, February 10, 2012
Craig Kimbrel throws a nasty breaking ball. He grips it like a spike slider (maybe) and the ball moves like a curveball (sort of). It's a relatively short curveball, but one moving upwards of 87 mph when it leaves Kimbrel's hand. The combination of speed and drop is unparalleled in the major leagues today.
Check out the discussion and a picture of the grip in the forum at Brooks Baseball. The label is not as important as the key characteristics (speed and drop) when looking for comparisons to Kimbrel's offering. The first comparison is flame-thrower Henry Rodriguez.
Click for more...
Friday, February 03, 2012
Long-term PITCHf/x data has always been difficult to find online. There are several existing sources available: Fangraphs has some of it, but not everything you might want (I’m sure they will tomorrow, just for that). Texas Leaguer has had a fantastic tool up for quite some time now, but there are still places where it lacks functionality. Josh Kalk used to have a wonderful website, but he moved on to the Rays. BrooksBaseball has never really had seasonal data per se, despite having data that spans each season.
We think that, generally, there are several reasons the data has been difficult to find in a long-term format. First, there is the technical limitation: The PITCHf/x dataset is large—millions of pitches. This means that dynamic solutions—most PitchFX systems are dynamic—must have very good caching systems, well written databases, and powerful hosting solutions.
But beyond raw computing, which we can solve using some combination of duct tape and Moore’s law, there are really two critical issues that are unfortunately intertwined. The first of these issues is a data quality limitation. The PITCHf/x data is beautiful and we have nothing but exemplary things to say about the people who both collect it and make it free to access. I hope I do not overstep my bounds when I say that Cory and his team at MLBAM and Greg and his team at Sportvision have probably contributed more to baseball research simply through the availability of this data than all but the most accomplished Sabermetricians.
That said, there are simply issues with the data, most of which are due to park-specific camera quirks that make individual games more grokkable than complete months or seasons (see Chris Carpenter’s data in the World Series, both in Texas and in St. Louis, for an example).
The second of these issues relates to pitch classification. This has become progressively less and less of an issue as the brilliant minds at BAM have worked to improve their classification algorithms, which have gone from mediocre to damn good in a very short amount of time. Yet still, if you’re going to average across a set of data to say something about a set of labels, the quality of what you report depends heavily on your data labels.
These last two issues are related in the following way: Chris Carpenter’s data includes park-specific errors in both St. Louis and Texas, so it is classified differently by the automatic algorithm in St. Louis and Texas despite being internally consistent. Therefore, quality issues propagate through various parts of the system. So, you really want a very qualified human to do the tagging. But good luck convincing THT's Harry Pavlidis or Lucas Apostoleris to tag three and a half million pitches, because that would be insane.
What’s that you say? They’ve actually done that?!
Click for more...