<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
    xmlns:admin="http://webns.net/mvcb/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:content="http://purl.org/rss/1.0/modules/content/">

    <channel>

    <title>The Hardball Times -- Josh Weinstock</title>
    <link>http://www.hardballtimes.com/main</link>
    <description>Baseball. Insight. Daily.</description>
    <dc:language>en</dc:language>
    <dc:creator>studes@hardballtimes.com</dc:creator>
    <dc:rights>Copyright 2013</dc:rights>
    <dc:date>2013-05-22T08:05:15+00:00</dc:date>
    <admin:generatorAgent rdf:resource="http://www.pmachine.com/" />


    <item>
      <title>Which umpire has the largest strike zone?</title>
       
<link>http://www.hardballtimes.com/main/article/which&#45;umpire&#45;has&#45;the&#45;largest&#45;strikezone/</link>
<guid>http://www.hardballtimes.com/main/article/which-umpire-has-the-largest-strikezone/#When:09:35:15</guid>       
<description><![CDATA[Admonished for reasonable mistakes and seldom credited for accuracy, umpires unquestionably have one of the hardest jobs in baseball. They must maintain a high level of concentration for <i>every single play</i>.<br />
<br />
But perhaps most difficult for umpires is when they are behind the plate. With modern technology, home plate umpires are under an absurd level of scrutiny. Borderline call? Replay it with K-zone or Pitchtrax. Egregious enough, and the mistake will be shown on ESPN or MLB Network the next day. They are also evaluated with Questec to standardize the strike zone.<br />
<br />
But this is not to trivialize putting umpires under the microscope. Indeed, home plate umpires have a significant effect on run scoring. How the strike zone is interpreted can affect not only ball-strike counts, but pitchers' approaches and batters' approaches. Small strike zone? A pitcher will have to modify his location so that he can throw strikes, an advantage for the hitter.<br />
<br />
Based on observational evidence, umpires seem to be pretty consistent across the league. But for a more exact measure, it is possible to calculate the size of an umpire's strike zone. To do this, I modified the method I used to find <a href="http://www.hardballtimes.com/main/article/a-different-take-on-plate-discipline/" title="swing area">swing area</a>. This time, I found the area in which umpires call a strike at least 50 percent of the time. I arbitrarily chose the 50 percent mark, but it seemed to be reasonable from a common sense standpoint. <br />
<br />
I eliminated all pitches where the umpire did not make a decision&mdash;when the batter swung. I was ambivalent about including pitches involved in intentional walks, but I ended up keeping them; they comprise only a small number of pitches, anyway. I also only looked at umpires who saw at least 3000 pitches in 2011 to make sure outliers weren't having as large an influence as they did when I looked at swing area. <br />
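<br />
For readers who want to experiment, the 50 percent idea can be roughed out in a few lines of Python. This is only a sketch under assumptions, not the calculation behind the numbers in this article: it bins called pitches on a fixed grid rather than adapting the swing area method, and the names zone_area, px, pz, bin_ft and min_pitches are hypothetical (px and pz stand for horizontal and vertical pitch location in feet).<br />
<br />
```python
import numpy as np

def zone_area(px, pz, called_strike, bin_ft=0.25, threshold=0.5, min_pitches=1):
    """Approximate area (square feet) where the called-strike rate is >= threshold.

    px, pz        -- pitch locations in feet (assumed inputs, taken pitches only)
    called_strike -- True where the umpire called a strike
    """
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    strike = np.asarray(called_strike, dtype=bool)

    # Grid covering a generous window around the plate (an assumption).
    x_edges = np.arange(-2.0, 2.0 + bin_ft, bin_ft)
    z_edges = np.arange(0.0, 5.0 + bin_ft, bin_ft)

    total, _, _ = np.histogram2d(px, pz, bins=[x_edges, z_edges])
    strikes, _, _ = np.histogram2d(px[strike], pz[strike], bins=[x_edges, z_edges])

    # Called-strike rate per bin; bins with too few pitches count as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        rate = np.where(total >= min_pitches, strikes / total, 0.0)

    # Each qualifying bin contributes bin_ft * bin_ft square feet.
    return float((rate >= threshold).sum() * bin_ft ** 2)
```
<br />
A coarse grid like this is sensitive to bin size and to sparsely populated bins, which is one reason to require a minimum number of pitches per umpire.<br />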
<br />
The umpires with the five smallest strikezones are:<br />
<pre>
Tim Tschida 	2.85
Tim McClelland 	2.99
Paul Schrieber	3.01
Ed Hickox	3.07
Chad Fairchild	3.07
</pre><br />
The umpires with the five largest strikezones are:<br />
<pre>
Phil Cuzzi	3.60
Ron Kulpa	3.60
Bill Miller	3.63
Ted Barrett	3.63
Doug Eddings	3.65
</pre><br />
The mean strikezone area is 3.32 square feet. The overall distribution looks like this:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/histump.png" border="0" alt="image" name="image" width="490" height="435" /><br />
<br />
The standard deviation is 0.16, but I'm not sure how well it describes the data in this case, so take that figure with a grain of salt. This means that Tim Tschida is nearly three standard deviations away from the mean, which is the largest distance from the center. I can't say I have much of a memory of Tschida's umpiring, so it would be interesting for people to report their observations of his umpiring in the comments. <br />
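<br />
The "nearly three standard deviations" figure is easy to check from the numbers already quoted. The snippet below is just that arithmetic; the function name z_score is made up for illustration.<br />
<br />
```python
MEAN_AREA = 3.32  # mean strike zone area in square feet (from the article)
SD_AREA = 0.16    # standard deviation (from the article)

def z_score(value, mean=MEAN_AREA, sd=SD_AREA):
    """Distance from the mean, measured in standard deviations."""
    return (value - mean) / sd

print(round(z_score(2.85), 2))  # Tim Tschida, smallest zone: -2.94
print(round(z_score(3.65), 2))  # Doug Eddings, largest zone: 2.06
```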
<br />
Because I limited the sample to umpires who had seen 3000 pitches in 2011, we only have 74 umpires. This is why the histogram looks a little ragged, even though strike zone area is probably normally distributed.<br />
<br />
But does strike zone area actually <i>mean anything</i>? To verify that these results actually mean something, I calculated the FIP for each umpire. The relationship was very significant, and in the correct direction&mdash;a larger strike zone means a lower FIP.<br />
<br />
The coefficient of strike zone area was -0.58, meaning that for every one-square-foot increase in strike zone area, we can expect a decrease in FIP of about 0.58. The relationship had limited explanatory value though, with an R-squared of 0.13. This means that the values of strike zone area explain 13 percent of the variance in the values of umpire FIP. You can see this relationship below:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/fip~umparea1.png" border="0" alt="image" name="image" width="490" height="435" /><br />
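<br />
To put the slope in context, here is what the fitted line implies for the gap between the smallest and largest zones listed above. This is a back-of-the-envelope sketch using only figures from this article; the helper name expected_fip_change is hypothetical, and with an R-squared of 0.13 any individual umpire can sit well off the line.<br />
<br />
```python
SLOPE = -0.58  # change in umpire FIP per square foot of zone area (from the article)

def expected_fip_change(area_from, area_to, slope=SLOPE):
    """FIP change implied by the fitted line when zone area moves between two values."""
    return slope * (area_to - area_from)

# Smallest listed zone (2.85 sq ft) to largest listed zone (3.65 sq ft).
print(round(expected_fip_change(2.85, 3.65), 2))  # -0.46
```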
<br />
However, we may be underestimating the value of the metric. FIP includes home runs as a major component. And while strike zone area probably does have a relationship with home run rate, ballpark effects and randomness probably play a much larger role. A metric that ignores home runs is kwERA, an ERA estimator based on only strikeouts and walks. You can read more about the metric <a href="http://www.insidethebook.com/ee/index.php/site/article/lego/" title="here">here</a>. <br />
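<br />
The difference between the two estimators is easiest to see side by side. The formulas below are the versions commonly cited for FIP and kwERA, written as a sketch; the default constants (3.10 for FIP, which varies by season, and 5.40 and 12 for kwERA) are assumptions and may not match the exact values used for the results in this article.<br />
<br />
```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding independent pitching; home runs carry the largest weight."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

def kwera(k, bb, pa, intercept=5.40, slope=12.0):
    """ERA estimator built only from strikeouts and walks per plate appearance."""
    return intercept - slope * (k - bb) / pa

# Hypothetical pitcher line, for illustration only.
print(fip(hr=20, bb=50, hbp=5, k=200, ip=200))
print(kwera(k=200, bb=50, pa=800))
```
<br />
Because home runs never enter kwERA, ballpark effects and home run luck cannot blur its relationship with strike zone area.<br />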
<br />
Strike zone area has a much stronger relationship with kwERA, yielding an R-squared of 0.39. You can see this relationship below:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/kwERA~area1.png" border="0" alt="image" name="image" width="490" height="435" /><br />
<br />
I was also interested in the relationship between strike zone area and swing rate. My thinking was that if an umpire had a smaller strike zone, batters would have better pitches to hit and swing more. In other words, I guessed that strike zone area would have an inverse relationship with swing rate.<br />
<br />
Turns out the opposite is true, and that strike zone area has a positive relationship with swing rate, significant at a 95-percent level. Although this seems to controvert common sense&mdash;or at least my intuition, anyway&mdash;upon reflection, it makes sense. In many pitchers' counts, batters swing more than average. This is because they are trying to protect the plate.<br />
<br />
So if a larger strike zone means more pitcher counts, then it might also mean a higher swing rate. I should also note that there is a very tight distribution of swing rates among umpires, ranging from 44 percent to 47 percent. <br />
<br />
<b>Limitations</b><br />
<br />
There are many more factors at play here than just umpires. I have not adjusted the metric for batter identity, pitcher identity, league or ballpark. The largest variable which I have not accounted for is batter handedness; because left-handed batters and right-handed batters have different called strikezones, this probably decreases the accuracy of the metric.<br />
<br />
However, if we assume that each home plate umpire had a similar distribution of batter handedness, then the imprecision is distributed in a way that it doesn't make much difference overall. <br />
<br />
<b>Finishing thoughts</b><br />
<br />
While it can be interpreted in many ways, it seems to me that umpires are pretty consistent in terms of their strike zone sizes. However, what differences there are definitely have a significant effect on run scoring, which we can see in the relationship between strike zone area and kwERA. Strike zone area should complement the research that is already out there, such as Brian Mills' <a href="http://princeofslides.blogspot.com/2011/04/umpire-call-database.html" title="umpire call database">umpire call database</a>.<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2012-01-11T09:35:15+00:00</dc:date>

    </item>

    <item>
      <title>The limits of baseball</title>
       
<link>http://www.hardballtimes.com/main/article/the&#45;limits&#45;of&#45;baseball/</link>
<guid>http://www.hardballtimes.com/main/article/the-limits-of-baseball/#When:07:00:15</guid>       
<description><![CDATA[In calculus there is a concept called a limit. As described by Wikipedia: <br />
<br />
<blockquote> the concept of a "limit" is used to describe the value that a function or sequence "approaches" as the input or index approaches some value.</blockquote><br />
<br />
Many measures of athletic performance have a limit. Every few years someone runs the mile more quickly than anyone had before. But while we are continually running the mile faster, it would be unthinkable to run the mile in negative time, or even no time. This means that there is a lower bound which our mile times are approaching, but won't quite reach. It's hard to say exactly what that limit is, or when we will reach it, but we know it exists. <br />
<br />
We can also observe this phenomenon in baseball; pitchers throw much <a href="http://www.hardballtimes.com/main/article/lose-a-tick-gain-a-tick/" title="harder ">harder </a>than they used to just a few years ago. Most estimates have the average fastball velocity as increasing more than 1 mph in the past few years. While it's possible that teams are simply emphasizing velocity more than they used to, it also seems likely at least some of this velocity increase is real. But at some point, pitchers will stop throwing harder; without this upper bound, baseball players would eventually average infinite miles per hour.   <br />
<br />
I firmly believe that there is a higher level of competition in the major leagues now than ever before. Players are bigger and stronger, pitchers throw harder, and teams are smarter. As Ben Lindbergh  <a href="http://www.baseballprospectus.com/article.php?articleid=15703" title="wrote">wrote</a>, many teams now employ intelligent, "sabr-savvy" GMs. This makes any discussion of "would a star in the past be a star now?" problematic. Thanks to ESPN broadcasts, I used to listen to <a href="http://www.fangraphs.com/players.aspx?lastname=Joe%20Morgan" target="_blank" class="player">Joe Morgan</a> occasionally engage in this discussion. He used to say that stars in the past could be stars now, but current stars may not have been as great in the past. Essentially, Morgan implicitly argued that players used to be better. His reasons, if I remember correctly, used to focus on how ballparks are smaller now, and that players are protected more.<br />
<br />
I ardently disagree. Say we estimate that in the past 10 years, pitchers are on average capable of throwing 1 mph harder than they were before. If we extrapolate this rate, then 50 years ago pitchers threw on average around 86 mph, which seems reasonable, although I wasn't around then. Baseball is a game in which every small change can have a large impact. Just hit the ball a fraction of an inch off the sweet spot, and you may weakly ground out or pop out. It stands to reason, then, that an increase from 86 to 91 mph is of an extreme magnitude. <br />
<br />
We have seen evidence that the level of competition in the majors has improved a lot, both in terms of players and in terms of the evaluation methods used. And despite our tendencies to cast star athletes as immortal, we know that they are human, bounded by flesh and bones. If baseball continues to improve, then at some point, all players will be nearly equal in talent and all teams will employ optimal strategies&mdash;the limit. <br />
<br />
What would such a world look like? And what does baseball today tell us about what it would be like?<br />
<br />
Pitchers would certainly be different. In recent times we have seen pitchers model themselves after <a href="http://www.fangraphs.com/statss.aspx?playerid=1303&position=P" target="_blank" class="player">Roy Halladay</a>; both <a href="http://www.fangraphs.com/players.aspx?lastname=Charlie%20Morton" target="_blank" class="player">Charlie Morton</a> and <a href="http://www.fangraphs.com/statss.aspx?playerid=4662&position=P" target="_blank" class="player">Brandon McCarthy</a> significantly revamped their mechanics and repertoire to imitate the Phillies right-hander. And some of the changes they implemented have become more widespread in baseball. Both pitchers focused on developing a cutter and two-seam combination, and with good results. The cutter especially is a pitch that has increased in usage in recent years, and it's not hard to see why. The cutter is most often used in two ways: like a fastball, or like a slider. When used like a slider, its benefits are especially clear. Sliders are notorious for causing pitching injuries, while cutters are generally considered to be less strenuous. And when used as a fastball, the cutter helps to keep batters guessing with a similar velocity pitch. <br />
<br />
The Roy Halladay model may grow in popularity. It doesn't require a pitcher to have very good stuff, but to be able to generate ground balls by throwing a cutter and two-seamer to each side of the plate. His mechanics are also considered very clean, and should be able to be imitated by many pitchers. But of course not every pitcher would be the same. That doesn't make sense for two reasons. Firstly, unless all pitchers are identical anatomically in the future, different pitchers have different optimal sets of mechanics and repertoires. Secondly, it would be a terrible decision from a game theory perspective, offering no variety in looks to the batter.  <br />
<br />
Change-ups may also grow in popularity. Teams may decide that they could save a tremendous amount of resources by eliminating injury-inducing sliders, which would help them get more innings out of their starters. And while much less strenuous, change-ups are not necessarily any less effective; last year two of the pitches with the highest whiff rates (swing-and-miss pitches) were <a href="http://www.fangraphs.com/statss.aspx?playerid=1852&position=P" target="_blank" class="player">Ryan Madson</a>'s and <a href="http://www.fangraphs.com/statss.aspx?playerid=4972&position=P" target="_blank" class="player">Cole Hamels</a>' change-ups. Change-ups have another big advantage as well; most have a very limited platoon split, while most sliders have significant platoon splits. Of course change-ups are not perfect. They largely rely on deception to be effective; if the batter recognizes the pitch, it's basically just a slow fastball. It is a "feel" pitch, and given the obvious advantages, would probably be much more popular today than it actually is if it were easier to throw effectively.  <br />
<br />
But to be honest, I have no idea what the future will bring. Maybe pitchers will throw a two-seam, cutter and change-up with greater frequency than they do now, or maybe everyone will just throw knuckleballs. But the recent proliferation of sabermetrics in the management styles of major league teams serves as a bit of a reminder of this ominous limit in the future. We can see a limit where, because all teams are so similar, major league baseball is not much more than a complex coin flip; a true game of inches, if it isn't already now. How far away is this limit and when will we get there? It's hard, if not impossible, to know. But we are certainly approaching it. <br /><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-12-30T07:00:15+00:00</dc:date>

    </item>

    <item>
      <title>Josh Willingham&#8217;s contact struggles</title>
       
<link>http://www.hardballtimes.com/main/article/josh&#45;willinghams&#45;contact&#45;struggles/</link>
<guid>http://www.hardballtimes.com/main/article/josh-willinghams-contact-struggles/#When:09:44:15</guid>       
<description><![CDATA[Free agent<a href="http://www.fangraphs.com/statss.aspx?playerid=2103&position=OF" title=" Josh Willingham"> Josh Willingham</a> has always been one of my favorite players. In the past six years, he has quietly posted a wRC+ of 123&mdash;good enough for seventh among all qualified left fielders during that time. And in each of those six years, he was remarkably consistent, posting a wRC+ of at least 117 in each year. Thanks to above average discipline and power, he plods along every year as an underrated contributor on offense. While he may not have quite the same skill on defense, he's been good enough overall for 2-3 WAR every year. I suppose I like him because in addition to being unheralded, he's good&mdash;but not <i>too </i>good. I don't know if that's a good reason, but I don't care.<br />
<br />
But I'm not the only one who likes Willingham. According to reports, this offseason he has been sought after by the Twins, Indians, Mariners and Rockies. These teams are still pursuing Willingham despite a major downturn in his plate discipline. In 2011 he halved his BB/K ratio from the previous year&mdash;a drop from .79 in 2010 to .37 in 2011. He struck out 150 times&mdash;28 more than his previous career high, and his walk rate fell, too. <br />
<br />
It's not that he tried a completely new approach at the plate in 2011. Indeed, he swung at about the same rate of pitches in 2011 as he did in 2008-2010, and he chased about the same rate of pitches too. Despite an ostensibly identical approach, his contact rate&mdash;the percentage of pitches that a batter makes contact on out of all swings&mdash;dropped from about 81 percent in 2008-2010 to 76 percent in 2011. This undoubtedly contributed to his drop in plate discipline in 2011. Perhaps he was struggling against a certain kind of pitch in 2011?<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/whiff~pz_pitch.PNG" border="0" alt="image" name="image" width="582" height="426" /><br />
<br />
This graph shows his whiff rate by pitch height, split up by pitch type and years. I grouped together multiple pitch types to create the larger categories to get larger samples and to mitigate classification issues. Dotted lines indicate the vertical borders of the strikezone. The "other" category is basically all knuckleballs, which he was terrible against in 2011. But excluding these knuckleballs, his contact issues in 2011 are not unique to a type of pitch. It actually looks like he struggled more in general on pitches that are simply low in the zone. <br />
<br />
 <img src="http://www.hardballtimes.com/images/uploads/whiff~pz.PNG" border="0" alt="image" name="image" width="573" height="424" /><br />
<br />
Indeed, it appears that his issues in 2011 can be traced more specifically to contact struggles on pitches low in the zone. I don't know what caused his problems on pitches low in the zone in 2011, but any suitors should be wary of this weakness. In the future, pitchers may expose Willingham on his new weakness, which would eliminate any possibility of Willingham's plate discipline returning to previous levels. Given that the bulk of his value comes from his bat, his contact issues on pitches down in the zone may indicate the beginning of decline.<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-12-15T09:44:15+00:00</dc:date>

    </item>

    <item>
      <title>Predictive ability of swing area</title>
       
<link>http://www.hardballtimes.com/main/article/predictive&#45;ability&#45;of&#45;swing&#45;area/</link>
<guid>http://www.hardballtimes.com/main/article/predictive-ability-of-swing-area/#When:09:58:15</guid>       
<description><![CDATA[The internet baseball community loves to use plate discipline statistics. Measures like swing rate, in-zone rate, first-pitch strike rate, and many others are prevalent in many sabermetric analyses. A month ago, I introduced another metric called swing area, which you can read about <a href="http://www.hardballtimes.com/main/article/pitchers-and-swing-area/" title="here ">here </a>and <a href="http://www.hardballtimes.com/main/article/a-different-take-on-plate-discipline/" title="here">here</a>.<br />
<br />
We like these metrics because they give us a glimpse into the actual batter-pitcher match-up. With most baseball statistics, we are stuck with the result of a plate appearance. Strikeouts, walks, home runs&mdash;these tell us nothing about what happened <i>during </i>the plate appearance. And while this is extremely important information, it feels somewhat detached from the actual baseball experience.<br />
<br />
But what can be lost in our frequent usage of plate discipline metrics is how much useful information they actually tell us. <br />
<br />
In the previous two swing area articles, I looked at what swing area tells us about batters and pitchers in the same year. This time I will look at what swing area, and other metrics, can tell us about the future. To test this, I first calculated swing area for 2010 pitchers with the same restriction as before: I only looked at pitchers who had thrown at least 1000 pitches.<br />
<br />
To confirm previous results, I looked at the relationship between 2010 swing area and 2010 strikeout rate. This time, after again ignoring outliers, I find that 2010 swing area explains 18.5 percent of the variation in 2010  strikeout rate. With 2011 data, I found that swing area explained 14.5 percent. As we found before, O-swing explained significantly less of the variation in strikeout rate than did swing area. <br />
<br />
If we look at the relationship between plate discipline metrics in the N-1 year and the strikeout rate in year N, we find some pretty interesting results. I should also note here that the only pitchers I looked at have thrown at least 1000 pitches in both 2010 and 2011 so that I could calculate swing area. I have also not controlled for aging or run environment. <br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/k_rate11~area101.png" border="0" alt="image" name="image" width="500" height="362" /><br />
<br />
Unsurprisingly, swing area does worse at predicting strikeout rates of the next year than in the same year. However, it still does pretty well, relative to similar metrics. The coefficient of swing area is .019. This means that for every increase in the values of 2010 swing area by one, we can expect a corresponding increase in 2011 strikeout rate of a little less than two percentage points. This is equivalent to the coefficient I found for 2011 swing area and 2011 strikeout rate.<br />
<br />
O-swing is entirely worthless at predicting next year's strikeout rate; a regression of 2011 strikeout rate on 2010 O-swing yields an R-squared of zero, and O-swing does not even approach significance (p-value = 0.4 for a one-sided test, 0.8 for a two-sided test). <br />
<br />
But if we have plate discipline metrics from 2010, we also know strikeout rate from 2010. Do these metrics give us any information that the previous year's strikeout rate does not?<br />
<br />
If I run a regression of 2011 strikeout rate on 2010 strikeout rate and 2010 swing area, I find that swing area no longer has significance. I find the same result for O-swing. In other words, these plate discipline metrics are not useful in predicting the next year's strikeout rate if we already know the previous year's strikeout rate. <br />
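<br />
A metric can predict next year's strikeout rate on its own and still add nothing once the previous year's strikeout rate is in the model. The sketch below shows how that happens, using synthetic pitchers rather than real data; every number in it is invented, and the helper name ols is hypothetical.<br />
<br />
```python
import numpy as np

def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic pitchers (NOT real data): next year's strikeout rate depends only
# on this year's strikeout rate; swing area is merely correlated with it.
rng = np.random.default_rng(0)
n = 200
k_2010 = rng.normal(0.19, 0.04, n)
area_2010 = 8.3 + 20 * (k_2010 - 0.19) + rng.normal(0, 1.0, n)
k_2011 = 0.05 + 0.73 * k_2010 + rng.normal(0, 0.02, n)

alone = ols(k_2011, area_2010)             # swing area by itself
together = ols(k_2011, k_2010, area_2010)  # swing area alongside prior K rate

print("area coefficient alone:   ", alone[1])
print("area coefficient together:", together[2])
```
<br />
In this made-up setup the swing area coefficient is clearly positive by itself and shrinks toward zero once 2010 strikeout rate is included, which is the pattern described above.<br />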
<br />
<br />
<h3 class="article_title">Year-to-year correlations</h3><br />
key:<br />
<br />
K/PA = strikeouts per plate appearance<br />
whiff = whiffs / pitch<br />
contact = 1 - (whiff/swings)<br />
swing = swing rate<br />
zone = rate at which pitches are thrown in the strike zone. I have used two strike zones here for left-handed and right-handed batters, based on Mike Fast's research.<br />
BB/PA = walks per plate appearance<br />
fip = <a href="http://www.hardballtimes.com/main/statpages/glossary/#fip" target="new">fielding independent pitching</a><br />
oswing = percentage of pitches outside the strikezone that the batter swings at. <br />
rv100 = linear weights based metric that multiplies a pitcher's average run value per pitch by 100<br />
babip = batting average on balls in play<br />
area = swing area<br />
<pre>
K/PA    .73
whiff   .73
contact .73
swing   .68
zone    .68
BB/PA   .64
fip     .48
oswing  .45
rv100   .44
babip   .31
area    .28
</pre>I should first restate some limitations. Again, these are only for pitchers who threw at least 1000 pitches in both 2010 and 2011. The data also include both relievers and starters, which is likely artificially increasing the correlation of more than a few of these metrics.<br />
<br />
Unsurprisingly, strikeout rate is very stable from year to year. Disappointingly though, swing area is not very stable from year to year, and is less stable than O-swing.<br />
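<br />
For anyone who wants to reproduce this kind of table with their own data, the rate definitions in the key and the year-to-year correlation are straightforward to compute. This is a minimal sketch that assumes one pair of values per pitcher; the function names and the sample values are hypothetical.<br />
<br />
```python
import numpy as np

def whiff_rate(whiffs, pitches):
    """whiff = whiffs / pitch, as defined in the key."""
    return whiffs / pitches

def contact_rate(whiffs, swings):
    """contact = 1 - (whiff/swings), as defined in the key."""
    return 1 - whiffs / swings

def year_to_year_r(year_one, year_two):
    """Pearson correlation between paired values, one pair per pitcher."""
    return float(np.corrcoef(year_one, year_two)[0, 1])

# Made-up strikeout rates for five pitchers in consecutive seasons.
k_first = [0.15, 0.18, 0.20, 0.24, 0.28]
k_second = [0.16, 0.17, 0.22, 0.23, 0.26]
print(year_to_year_r(k_first, k_second))
```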
<br />
This can lead us to infer that swing area is subject to more noise than O-swing. Does this mean that it's less of a skill? Probably. Of course some of the low year-to-year correlation for swing area can likely be attributed to kinks in the calculation method, which I'm sure can be improved so that it does not exaggerate the swing areas of pitchers with data problems and outliers.<br />
<br />
Also surprising is the year-to-year correlation for BABIP, which is higher than I expected. I'm sure the correlation is significantly inflated by the fact that I have both relievers and starters in the sample, and relievers typically demonstrate lower BABIPs than starters by about 17 percent. <br />
<br />
Still so much to explore. Why is swing area so much more useful for pitchers than hitters? Why is swing area much more useful than O-swing in predicting strikeout rates in both year N and year N-1, but less stable? And in more general terms, are we placing too much importance on plate discipline stats? These metrics seem to have use if we want to create a narrative, but they are not very helpful when we want to make predictions.<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-11-30T09:58:15+00:00</dc:date>

    </item>

    <item>
      <title>Pitchers and swing area</title>
       
<link>http://www.hardballtimes.com/main/article/pitchers&#45;and&#45;swing&#45;area/</link>
<guid>http://www.hardballtimes.com/main/article/pitchers-and-swing-area/#When:06:36:15</guid>       
<description><![CDATA[Two weeks ago I introduced a method to calculate the area of a batter's swing zone, which you can view <a href="http://www.hardballtimes.com/main/article/a-different-take-on-plate-discipline/" title="here">here</a>. I created the stat because O-swing&mdash;the percentage of swings on pitches outside the strike zone&mdash;is so... <i>unsatisfying</i>. O-swing gives us no information about how far a batter is chasing. Swing area, on the other hand, can tell us in easily understandable terms how large a batter's swing zone is. A more detailed explanation is given in the article, but I calculate swing area by finding the area of the 22.4 percent swing contour. <br />
<br />
While analysis of swing variance yielded intriguing results, swing area proved to be somewhat disappointing as a measure of plate discipline. For batters it measures the same skill as O-swing, and does not do a better job than O-swing of predicting walk rates. <br />
<br />
<i>Thanks again to Lucas Apostoleris for coming up with the idea to perform this kind of analysis. </i><br />
<br />
<h3 class="article_title"><br />
But what if we apply this analysis to pitchers?</h3><br />
I re-ran the swing area analysis on all pitchers who threw at least 1,000 pitches in 2011, including playoffs, with the intention of getting as many pitchers as possible. The cutoff is arbitrary, but the calculations involve regression that can be pretty sensitive to outliers if the sample is too small. The 1,000-pitch cutoff also creates some sampling bias; the only pitchers who threw at least 1,000 pitches in 2011 were the ones who were either healthy or good enough to do so. To demonstrate the effects of this sampling bias, here is the distribution of strikeout rates among the pitchers in the data set:<br />
<br />
 <img src="http://www.hardballtimes.com/images/uploads/density~k_rate.png" border="0" alt="image" name="image" width="500" height="363" /><br />
<br />
See that long tail on the right side of the graph? That tells us that the distribution is skewed positively, which is something to keep in mind later when we look at the relationships between various plate discipline metrics and strikeout rate. <br />
<br />
I should also note that, for the sake of transparency, I have slightly tweaked my calculation since I introduced it; I increased the number of bins that I was using, making the calculation a little more accurate. <br />
<br />
Back to swing area. After running the calculations, here are the top five starters in swing area (square feet):<br />
<br />
<pre>
Carlos Carrasco 14.7 
Josh Beckett 	13.1
Kevin Correia 	12.2
Douglas Fister  12.1
Jeff Niemann    11.5
</pre><br />
<br />
A major surprise at the top of the list is Carlos Carrasco. His placing is surprising because he neither records a lot of strikeouts nor is rated well by O-swing. He may be a good example of some of the limitations of this calculation method&mdash;I found a decent number of his pitches that were clearly PITCHf/x errors in my database. Not enough errors that we need to be very worried about the integrity of the data, but enough that we need to be mindful of the problem. Roy Halladay had the seventh largest swing area. <br />
<br />
The top five relievers who threw at least 1,000 pitches in 2011 are:<br />
<br />
<pre>
Cory Luebke     14.7
Drew Storen     14.2
Nick Masset     13.9
Heath Bell      13.6
Jonny Venters   12.9
</pre><br />
<br />
The top five here are far less surprising. I wasn't quite sure what group to put Luebke in as he started the year and finished as a starter, but most of his appearances came as a reliever so that's where I kept him. <br />
<br />
The top 10 overall pitchers are:<br />
<br />
<pre>
Cory Luebke     14.7
Carlos Carrasco 14.7 
Drew Storen     14.2
Nick Masset     13.9
Heath Bell      13.6
Josh Beckett 	13.1
Jonny Venters   12.9
John Axford     12.8
Jim Johnson     12.2
Kevin Correia 	12.2
</pre><br />
<br />
And the pitchers with the five smallest swing areas are:<br />
<br />
<pre>
Brad Penny      6.5
Fausto Carmona  6.5
Casey Coleman   6.5
Zachary Britton 6.4
Tyler Chatwood  6.2
</pre><br />
<br />
The overall distribution looks like this:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/density~area.png" border="0" alt="image" name="image" width="500" height="363" /><br />
<br />
The distribution suffers from the same skewness as the distribution of strikeout rates. This has the unfortunate effect of making it difficult to measure the center and spread of the distribution. In cases like this, mean and standard deviation do not do such a great job of describing the distribution, so we should use other methods. The median swing area is 8.3 square feet, and the first quartile&mdash;the 25th percentile&mdash;is 7.6 square feet. The third quartile&mdash;the 75th percentile&mdash;is nine square feet. Much less skewness was present with hitters. <br />
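<br />
The reason to prefer the median and quartiles here is that a few very large values pull the mean upward but barely move the median. A small sketch with made-up swing areas (not the actual data set) shows the effect:<br />
<br />
```python
import numpy as np

# Made-up swing areas with one large outlier, for illustration only.
sample = np.array([6.2, 7.6, 8.0, 8.3, 8.6, 9.0, 14.7])

q1, median, q3 = np.percentile(sample, [25, 50, 75])
mean = sample.mean()

print("mean:", round(mean, 2))
print("median:", median)
print("quartiles:", q1, q3)
```
<br />
Dropping the largest value changes the mean noticeably while the median shifts only slightly, which is the behavior we want from a summary of a skewed distribution.<br />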
<br />
<h3 class="article_title">Comparing to O-swing</h3><br />
With hitters, I found a very strong relationship between O-swing and swing area. With pitchers... not so much. The relationship between O-swing and swing area is weak, with a correlation coefficient of .42. I looked only at the relationship between the two variables after ignoring all swing areas greater than 11 square feet to try to combat some of the effect of skewness:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/oswing~area1.png" border="0" alt="image" name="image" width="550" height="399" /><br />
<br />
The relationship is much weaker than expected. This suggests that there may be a difference between the ability to get batters to chase outside the zone and the ability to get batters to chase <i>far</i> outside the zone. With hitters we found no evidence for two separate skills. <br />
<br />
But which metric, O-swing or swing area, tells us more about a pitcher's strikeout rate? After again removing outliers (swing areas > 11), we find a correlation of .376 between swing area and strikeout rate, with swing area significant at better than the 99 percent level. The coefficient of swing area is .019. This means that for every one-square-foot increase in swing area, we can expect strikeout rate to increase by a little less than two percentage points. The R-squared is .14, meaning that swing area explains 14 percent of the variation in strikeout rate. O-swing explains just 2.4 percent of the variation in strikeout rate, which is lower than expected. O-swing is also statistically significant, with a p-value of .01 (for a two-sided test). <br />
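<br />
For readers who want to see how the correlation, the coefficient and the R-squared fit together, here is a minimal Python sketch of a one-variable least-squares fit. The eight pitchers are hypothetical numbers made up for illustration, and the function name <i>simple_ols</i> is my own; the point is only that in a one-variable regression the R-squared is the square of the correlation, which is why a correlation of .376 goes with an R-squared of about .14.<br />
<br />
```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns slope, intercept, r, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)          # Pearson correlation
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy            # share of variation explained
    return slope, intercept, r, r_squared

# Hypothetical pitchers: swing area (sq ft) and strikeout rate (K per batter faced).
area = [6.5, 7.2, 7.9, 8.3, 8.8, 9.4, 10.1, 10.6]
k_rate = [0.15, 0.21, 0.16, 0.19, 0.23, 0.17, 0.22, 0.24]

slope, intercept, r, r_squared = simple_ols(area, k_rate)
print(f"slope={slope:.3f} r={r:.3f} R^2={r_squared:.3f}")

# The reported figures obey the same identity: .376 squared is about .14.
print(round(0.376 ** 2, 2))
```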
<br />
I also tested these two metrics against a measure of overall ability, in this case FIP. Both variables have a significant relationship with FIP, but swing area explains 15 percent of the variation while O-swing explains a little less than three percent. Both coefficients are negative, meaning that there is a negative relationship between each variable and FIP, which is to be expected&mdash;the more you can get a batter to chase, the better a pitcher you are. The coefficient of swing area is -.29. This means that for every one-square-foot increase in swing area, we can expect FIP to decrease by .29. <br />
<br />
You can see both of these relationships below:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/k_rate~area1.png" border="0" alt="image" name="image" width="500" height="362" /><br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/fip~area1.png" border="0" alt="image" name="image" width="500" height="362" /><br />
<br />
I also tested both O-swing and swing area against BABIP, and neither was close to significance. This is important because we often attribute a pitcher's ability to induce weak contact in part to his ability to get batters to chase, but there is no evidence of this relationship. <br />
<br />
<h3 class="article_title">Finishing thoughts</h3><br />
Swing area tells us much more than O-swing does about both a pitcher's overall ability and his more specific ability to record strikeouts. It seems that how far the batter is chasing does contain valuable information, and that O-swing may not be so useful for evaluating pitchers&mdash;it tells us little about either strikeout rate or BABIP ability. But why is it that swing area is so much more important for pitchers than for hitters? I'll expand on the explanation that I'm working on in a later post. <br />
<br />
I have made the full results available here via Google Docs:<br />
<a href="https://docs.google.com/spreadsheet/ccc?key=0Aki1tMtbkcC9dFdzVGlMZ19nSlRCWGwwTFFieDhvOGc" title="spreadsheet">spreadsheet</a><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-11-18T06:36:15+00:00</dc:date>

    </item>

    <item>
      <title>A different take on plate discipline</title>
       
<link>http://www.hardballtimes.com/main/article/a&#45;different&#45;take&#45;on&#45;plate&#45;discipline/</link>
<guid>http://www.hardballtimes.com/main/article/a-different-take-on-plate-discipline/#When:10:04:15</guid>       
<description><![CDATA[A few weeks ago, fellow THTer Lucas Apostoleris approached me with an interesting idea for evaluating plate discipline. Using O-swing&mdash;the percentage of pitches outside the strike zone at which a batter swings&mdash;we know how often the batter is chasing. But what we don't know&mdash;and what may be a very useful piece of information&mdash;is <i>how far</i> the batter is willing to chase outside the zone.<br />
<br />
This is because O-swing is binary. Whether the batter is chasing a pitch two inches off the corner or one that bounces before home plate is ignored:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/assumption_plot.png" border="0" alt="image" name="image" width="450" height="450" /><br />
<br />
The implicit assumption that all chases are equivalent helps to create a simple and powerful metric, but one that is flawed. Of course, we can try different variations of O-swing, such as a model with <a href="http://www.baseballprospectus.com/article.php?articleid=15216" title="three outcomes">three outcomes</a> instead of two, but we are still bound by the same flaws.<br />
<br />
An important restriction is that these metrics live and die with the accuracy of their definitions of the strike zone, the arbitrary box that defines strikes and balls. Of course, right now we are pretty confident in our knowledge of the called strike zone, but we are not <i>100 percent confident</i>. A metric that measures plate discipline* without relying on a strike zone definition may provide new information.<br />
<br />
<i>*Of course, we can't actually measure plate discipline, because that would involve knowing what pitches the batter actually wants to hit. This is analogous to our inability to measure command; we don't know where the pitcher wants to throw the ball, we can only infer based on his pitch locations. However, we can use various metrics like walk rate and O-swing as reasonable proxies for plate discipline.</i><br />
<br />
<h3 class="article_title">Swing area</h3><br />
After discussion with Lucas, I calculated the swing area for each batter with at least 1000 pitches thrown to him in 2011. I did this by finding the area of the 22.4 percent swing contour for each batter. I derived the predicted swing rates by performing logistic regression for each batter, with the existence of a swing being the dependent variable, and pitch location being the independent variable (with smoothing).<br />
<br />
Why 22.4 percent? That's half the average swing rate, which is admittedly somewhat arbitrary. The 22.4 percent swing contour is actually pretty expansive, representing all pitch locations where the batter is expected to swing at least 22.4 percent of the time. Here is the league-average 22.4 percent swing contour for right-handed batters:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/avg_swingarea1.PNG" border="0" alt="image" name="image" width="511" height="450" /><br />
<br />
Data are from the catcher's perspective, and the shading indicates the swing rate, where white indicates no swings and black indicates a high rate of swings. The dotted box represents the strike zone, and the black line that encircles it is the swing contour.<br />
<br />
As shown in the graph, the average swing area (at the 22.4 percent swing contour) is about 8 square feet. For context, the area of the strike zone is a little less than 4 square feet. The standard deviation of swing areas for batters who saw at least 1000 pitches in 2011 is 1.8 square feet.<br />
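<br />
To make the idea of a swing contour's area concrete, here is a minimal Python sketch. It is not the model used for this article: the function <i>swing_probability</i> and its coefficients are hypothetical stand-ins for a fitted (smoothed) logistic regression, and the area is approximated simply by counting grid cells where the predicted swing rate is at least 22.4 percent.<br />
<br />
```python
import math

def swing_probability(px, pz):
    """Hypothetical fitted swing model: swing chance falls off with distance
    from the middle of the zone (px, pz in feet, catcher's view)."""
    # Elliptical distance from an assumed zone centre at (0, 2.5).
    q = (px / 1.0) ** 2 + ((pz - 2.5) / 1.3) ** 2
    log_odds = 2.0 - 1.5 * q          # made-up coefficients, not the article's fit
    return 1.0 / (1.0 + math.exp(-log_odds))

def contour_area(prob, level, step=0.02):
    """Approximate the area (sq ft) where prob(px, pz) >= level by counting grid cells."""
    cells = 0
    nx = int(round(6.0 / step))   # px from -3 to 3 feet
    nz = int(round(7.0 / step))   # pz from -1 to 6 feet
    for i in range(nx):
        px = -3.0 + (i + 0.5) * step
        for j in range(nz):
            pz = -1.0 + (j + 0.5) * step
            if prob(px, pz) >= level:
                cells += 1
    return cells * step * step

area = contour_area(swing_probability, level=0.224)
print(f"swing area at the 22.4 percent contour: {area:.2f} sq ft")
```
<br />
Because this toy model has elliptical contours, the grid estimate can be checked against the exact ellipse area; a real per-batter fit would have irregular contours, which is why a numerical approach like this is needed.<br />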
<br />
Here are the five batters with the smallest swing areas:<br />
<pre>
Jack Cust       4.88 sq ft
Sam Fuld        4.88 sq ft
Jorge Posada    5.04 sq ft
Josh Willingham 5.12 sq ft
Nick Swisher    5.12 sq ft
</pre><br />
And the five largest swing areas are:<br />
<pre>
Mark Trumbo     15.08 sq ft
Alex Gonzalez   13.56 sq ft
Pablo Sandoval  13.16 sq ft
A.J. Pierzynski 12.68 sq ft
Miguel Olivo    12.64 sq ft
</pre><br />
<h3 class="article_title">Variance of swing rates</h3><br />
I also measured the variance of a batter's swing rates. Theoretically, variance should give us information about the batter's strategy. For example, if the player has no variance in his swing rates, then he is swinging at the same rate at every pitch location.<br />
<br />
Intuitively, a uniform distribution of swing rates would be a poor strategy, because there are different rewards (or penalties) based on pitch location. A batter with a high level of variance in his swing rates might swing very often at pitches down the middle, but very rarely at all other pitches. <br />
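<br />
Here is a minimal Python sketch of the idea, using two hypothetical batters with the same overall swing rate. The nine values per batter stand for predicted swing rates at nine pitch locations and are invented for illustration; the exact way locations were weighted in the real calculation is not shown here.<br />
<br />
```python
import statistics

# Hypothetical predicted swing rates at nine pitch locations (a 3x3 grid
# from the middle of the zone outward); none of these are real players.
uniform_swinger = [0.45] * 9
selective_swinger = [0.95, 0.70, 0.70, 0.70, 0.70, 0.10, 0.10, 0.05, 0.05]

for name, rates in [("uniform", uniform_swinger), ("selective", selective_swinger)]:
    overall = statistics.mean(rates)
    spread = statistics.pvariance(rates)   # population variance across locations
    print(f"{name}: mean swing rate={overall:.2f}, variance={spread:.3f}")
```
<br />
Both batters swing 45 percent of the time overall, but only the second one changes his behavior by location, and the variance picks that up.<br />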
<br />
Here are the five batters with the greatest variance in swing rates, in decreasing order:<br />
<br />
Vladimir Guerrero<br />
Mike Carp<br />
Josh Hamilton<br />
Aramis Ramirez<br />
Pablo Sandoval<br />
<br />
I should also note that there is a strong relationship between variance of swings and overall swing rate; the correlation between the two is .81. <br />
<br />
And here are the five batters with the lowest variance in swing rates, in ascending order:<br />
<br />
Brett Gardner<br />
Jamey Carroll<br />
Bobby Abreu<br />
Sam Fuld<br />
Chris Getz<br />
<br />
<h3 class="article_title">Comparing to O-swing</h3><br />
To find out if swing area has any value as a metric, I compared it to O-swing. I calculated O-swing using PITCHf/x data and the strike zone definitions found by Mike Fast <a href="http://www.baseballprospectus.com/article.php?articleid=14572" title="here">here</a>. In the interest of transparency, I found the average O-swing to be 28.2 percent and the average zone rate&mdash;the percentage of pitches in the strike zone&mdash;to be 48.4 percent. <br />
<br />
First of all, I find that O-swing and swing area have a very strong relationship:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/area~O-swing1.PNG" border="0" alt="image" name="image" width="550" height="435" /><br />
<br />
The strong relationship between the two metrics implies that they are measuring the same skill, or a very similar set of skills. But which metric is more useful?<br />
<br />
To test this, I looked at the relationships between these metrics and overall walk rate. I found that O-swing explains 40 percent of the variation in walk rate, while area explains 36 percent of the variance in walk rate. If I run a regression with both swing area and O-swing, I find that the predictive ability of the model is only marginally improved&mdash;further evidence that O-swing and swing area measure the same skill.<br />
<br />
This result suggests that, in terms of predicting walk rates, swing area adds little value once O-swing is available. Additionally, overall swing rate explains 38 percent of the variance in walk rates. <br />
<br />
<h3 class="article_title">Other findings</h3><br />
I also looked at the relationship between some of these metrics and a measure of overall batter success, run value per 100 pitches (rv100), a linear weights-based statistic that calculates the average run value a batter produces in 100 pitches.<br />
<br />
Out of swing area, O-swing, and swing variance, swing variance had the strongest relationship with rv100. As noted earlier, swing variance has a strong relationship with overall swing rate. Despite the strength of this relationship, swing variance has a positive relationship with rv100 and swing rate has a negative relationship. <br />
<br />
Perhaps most interestingly, if I run a multiple linear regression with rv100 as the dependent variable and swing rate and swing variance as the explanatory variables, I find an R-squared of 22 percent. This is notable because if I run a regression with rv100 as the dependent variable and walk rate as the (only) explanatory variable, I find that the R-squared is nearly identical at 21.4 percent.<br />
<br />
This finding suggests that swing variance and swing rate combined tell us as much about a batter's overall batting ability as walk rate, which is very surprising. Swing variance also does not explain walk rate very well, so this may suggest that swing variance plays a role in power (ISO) or BABIP ability, but that's something to research at a later time. <br />
<br />
<h3 class="article_title">Finishing Thoughts</h3><br />
While swing area may not have yielded any additional information about plate discipline, it does give us a way to measure the expansiveness of a batter's swing area in familiar terms. Additionally, swing variance appeared much more significant than I expected it to be and may give cause for further research as to the relationship between swing variance and isolated power or BABIP. I also plan to extend this analysis to pitchers. <br />
<br />
In terms of limitations, swing area doesn't actually tell us anything about which pitches the batter is swinging at or where, which is a serious impediment. Additionally, I was not able to account for the context of these swings. By this I mean that batters <a href="http://www.hardballtimes.com/main/article/when-batters-swing-and-when-they-dont/" title="do not swing at the same rate in each count">do not swing at the same rate in each count</a>, so we are implicitly including the effect of different count distributions. Of course, this is also a problem with O-swing, but nonetheless, it's a significant bias and should be noted.<br /><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-11-02T10:04:15+00:00</dc:date>

    </item>

    <item>
      <title>Two lackluster years</title>
       
<link>http://www.hardballtimes.com/main/article/two&#45;lackluster&#45;years/</link>
<guid>http://www.hardballtimes.com/main/article/two-lackluster-years/#When:10:02:15</guid>       
<description><![CDATA[<a href="http://www.fangraphs.com/statss.aspx?playerid=1507&position=P" title="John Lackey">John Lackey</a> has always been a little overrated. Praised for his postseason success and competitive demeanor, Lackey used to have a reputation for being one of the best starters in the American League. In 2007, he even contended for the Cy Young award, managing an impressive third-place finish.<br />
<br />
This esteem helped him land a monster five-year, $85 million contract from the Boston Red Sox. A surprising deal at the time, the signing is as perplexing now as it was then. In only one season had he bested an xFIP- of 87, and by the time he signed the contract, that was a few years in the past. <br />
<br />
Paid like a top-tier starter, Lackey has been anything but. Replacement level*. Bust. Flop. Call it what you want, he has not lived up to his contract. Aside from moving to a more offense-friendly environment and a tougher division, what has been driving his mediocrity? <br />
<br />
*<i>According to Baseball Reference WAR.</i><br />
<br />
<h3 class="article_title">Being easy to hit is bad</h3><br />
According to my database, Lackey had a whiff rate (swing-and-miss per pitch) of 8.1 percent from 2008-2009 and 7.3 percent in 2010-2011. Despite an ostensibly small difference, if we perform a significance test between the two proportions, we find a significant difference at a 95-percent level for a one-sided test. <br />
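<br />
For anyone who wants to reproduce this kind of check, here is a minimal Python sketch of a one-sided test for the difference between two proportions. The pitch totals are hypothetical placeholders (only the 8.1 and 7.3 percent rates come from the text), so the printed p-value illustrates the mechanics and will not match the actual result.<br />
<br />
```python
import math

def one_sided_two_proportion_test(hits_a, n_a, hits_b, n_b):
    """z statistic and one-sided p-value for H1: rate A is greater than rate B."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)      # common rate under H0
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of the standard normal
    return z, p_value

# Hypothetical pitch totals; only the 8.1 and 7.3 percent rates come from the text.
pitches_early, pitches_late = 7000, 7000
whiffs_early = round(0.081 * pitches_early)
whiffs_late = round(0.073 * pitches_late)

z, p_value = one_sided_two_proportion_test(
    whiffs_early, pitches_early, whiffs_late, pitches_late
)
print(f"z={z:.2f}, one-sided p={p_value:.3f}")
```
<br />
A p-value below .05 corresponds to significance at the 95 percent level for a one-sided test; with samples of several thousand pitches, even a gap of less than one percentage point can clear that bar.<br />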
<br />
<img src="http://www.hardballtimes.com/images/uploads/whiff~px.PNG" border="0" alt="image" name="image" width="553" height="368" /><br />
<br />
This graph shows the whiff rate of all of Lackey's pitches based on horizontal pitch location, split by batter handedness and time frame. Gray bands indicate confidence. <br />
<br />
For righties, we see that he is really the same in both time ranges. This is affirmed when we break down his whiff rates by batter handedness. <br />
<br />
<pre>
Years      Handedness  rv100  Whiff  Swing  Contact
2008-2009       L      -0.69   0.07   0.44     0.85
2008-2009       R      -0.18   0.10   0.45     0.78

2010-2011       L       1.87   0.05   0.44     0.88
2010-2011       R       0.88   0.10   0.47     0.79
</pre>rv100 = linear weight run value per 100 pitches<br />
Whiff = whiffs / pitches<br />
Contact = 1 - (whiffs / swings)<br />
<br />
As you can see, the whiff dropoff is coming against lefties. When we look at the graph, we see that there is a significant difference between his whiff rates on pitches middle inside between the two time frames. In fact, it almost appears as if his whiff rates to lefties in that area of the zone have been halved. <br />
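<br />
The Contact column in the table above can be recovered from the Whiff and Swing columns, since whiffs per swing is just whiffs per pitch divided by swings per pitch. A minimal Python sketch using the rounded values from the table; because the inputs are rounded to two decimals, the recomputed figures agree with the listed ones only to within about a point.<br />
<br />
```python
# Whiff (whiffs per pitch) and Swing (swings per pitch) as printed in the table above.
rows = [
    ("2008-2009", "L", 0.07, 0.44, 0.85),
    ("2008-2009", "R", 0.10, 0.45, 0.78),
    ("2010-2011", "L", 0.05, 0.44, 0.88),
    ("2010-2011", "R", 0.10, 0.47, 0.79),
]

for years, hand, whiff, swing, listed_contact in rows:
    # whiffs / swings = (whiffs / pitches) / (swings / pitches)
    contact = 1 - whiff / swing
    print(f"{years} {hand}: contact={contact:.3f} (table lists {listed_contact:.2f})")
```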
<br />
 <img src="http://www.hardballtimes.com/images/uploads/whiff~pz1.PNG" border="0" alt="image" name="image" width="586" height="387" /><br />
<br />
This graph shows the whiff rates of his pitches by vertical location, split by batter handedness and time frame. The vertical borders of the strike zone are marked by the two dotted lines. As the graph shows, his whiff rates on pitches below the bottom of the strike zone are much lower than before. <br />
<br />
Lackey has always relied on his breaking balls. His fastball has never had elite velocity or spectacular movement. His changeup is usable, but not special. Lackey has always made his money with a very effective curveball- and slider-heavy repertoire.<br />
<br />
As soon as batters stop having trouble with his breaking balls, Lackey stops having success, and that's exactly what we see here. The opposition is simply no longer swinging and missing at his breaking balls below the zone. His trademark curveball now only garners whiffs on about 9.5 out of every 100 pitches&mdash;a rate significantly below the league average. <br />
<br />
Interestingly, batters haven't stopped chasing pitches below the zone.<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/swing~pz1.PNG" border="0" alt="image" name="image" width="580" height="377" /><br />
<br />
Batters are chasing pitches below the zone at a similar rate as before; they are just making more contact on these pitches. All of this may have a compounding effect. Since his breaking balls are less effective than before, it's possible that he has more trouble getting ahead of batters and staying in pitchers' counts, an effect that would reduce the effectiveness of all of his pitches. But when we look at the distribution of his counts, this is not what we find.<br />
<br />
<b>2008-2009</b><br />
<br />
<pre>
  0-0    0-1    0-2    1-0    1-1    1-2    2-0    2-1    2-2    3-0    3-1    3-2 
0.2741 0.1368 0.0604 0.1013 0.1072 0.0927 0.0296 0.0516 0.0762 0.0090 0.0188 0.0425 
</pre><br />
<br />
<b>2010-2011</b><br />
<br />
<pre>
  0-0    0-1    0-2    1-0    1-1    1-2    2-0    2-1    2-2    3-0    3-1    3-2 
0.2607 0.1286 0.0598 0.0975 0.1120 0.0959 0.0302 0.0576 0.0790 0.0086 0.0210 0.0491
</pre><br />
<br />
The distribution of counts against Lackey is, surprisingly, almost identical in each time frame. The only count that shows a difference greater than one percentage point is the proportion of 0-0 counts, which is higher in 2008-2009 than 2010-2011, implying that his at-bats in 2010-2011 were longer than before. <br />
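<br />
The comparison above can be checked directly from the two tables. The Python sketch below uses the published shares; the only added assumption is that each share is a proportion of all pitches, in which case the 0-0 share is plate appearances divided by pitches and its reciprocal is pitches per plate appearance.<br />
<br />
```python
counts = ["0-0", "0-1", "0-2", "1-0", "1-1", "1-2",
          "2-0", "2-1", "2-2", "3-0", "3-1", "3-2"]
share_0809 = [0.2741, 0.1368, 0.0604, 0.1013, 0.1072, 0.0927,
              0.0296, 0.0516, 0.0762, 0.0090, 0.0188, 0.0425]
share_1011 = [0.2607, 0.1286, 0.0598, 0.0975, 0.1120, 0.0959,
              0.0302, 0.0576, 0.0790, 0.0086, 0.0210, 0.0491]

# Counts whose share moved by more than one percentage point.
big_moves = [c for c, a, b in zip(counts, share_0809, share_1011)
             if abs(a - b) > 0.01]
print(big_moves)

# Every plate appearance contributes exactly one 0-0 pitch, so the 0-0 share
# equals plate appearances / pitches, and its reciprocal is pitches per PA.
pitches_per_pa_0809 = 1 / share_0809[0]
pitches_per_pa_1011 = 1 / share_1011[0]
print(f"{pitches_per_pa_0809:.2f} vs {pitches_per_pa_1011:.2f} pitches per PA")
```
<br />
On these numbers, the average plate appearance against Lackey went from roughly 3.65 pitches to roughly 3.84.<br />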
<br />
<h3 class="article_title">Location, location, location</h3><br />
Until now, we really have only seen notable differences with Lackey's breaking balls; batters are no longer struggling with breaking balls below the zone. But his fastball has been very ineffective, as well, despite very similar movement and velocity. This suggests that another cause, such as command, may be at play here. To test this, let's look at the density of his horizontal pitch locations for his fastballs, split up by batter handedness and year.<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/density~px.PNG" border="0" alt="image" name="image" width="522" height="362" /><br />
<br />
In 2011, it appears that many more of his fastballs were down the middle (marked by the dotted lines) than in previous years. This is not a rigorous examination of his command, but it does suggest that he was having command issues with fastballs to lefties in 2011. <br />
<br />
<h3 class="article_title">Finishing thoughts</h3><br />
Lackey has struggled, with a major contributor being the reduced effectiveness of his breaking balls. His slider and curveball are simply not as challenging as they used to be, especially below the zone. As shown above, there is also evidence suggesting a drop in fastball command in 2011.<br />
<br />
Overall, though, not a lot has changed. Lackey started throwing more sliders than curveballs in 2011, but there is nothing in the data that suggests pitch selection is at fault here.<br />
<br />
A lot of his poor performance is likely due to bad luck; he has significantly underperformed his defense-independent pitching statistics (DIPS) in the past two years, which is a characteristic that he did not display before. He did post an xFIP- of 116 in 2011, which is bad, but not horrible. Going forward, it is doubtful Lackey will ever be worth what he signed for, but the tools are still there for a solid contribution. <br /><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-10-20T10:02:15+00:00</dc:date>

    </item>

    <item>
      <title>Rick Porcello&#8217;s struggles</title>
       
<link>http://www.hardballtimes.com/main/article/rick&#45;porcellos&#45;struggles/</link>
<guid>http://www.hardballtimes.com/main/article/rick-porcellos-struggles/#When:10:01:15</guid>       
<description><![CDATA[Ace. Bona fide number one starter. Comparable to <a href="http://www.fangraphs.com/statss.aspx?playerid=510&position=P" target="_blank" class="player">Josh Beckett</a>. Fast forward to the present, and <a href="http://www.fangraphs.com/statss.aspx?playerid=2717&position=P" target="_blank" class="player">Rick Porcello</a> has not lived up to the praise he received as a top pitching prospect. Now he is caught in a limbo of sorts&mdash;is he a developing young ace, or a number four or five starter?<br />
<br />
Drafted 27th overall in the 2007 first year player draft, Porcello commanded what was then the largest bonus ever given to a high school player. He would have been drafted higher if not for his exorbitant contract demands. <br />
<br />
In his first year in the minors he was very impressive but with one red flag. Against far more experienced players in High-A, Porcello posted a 2.66 ERA and a FIP of 3.83. He used a heavy sinker and developing off-speed and breaking pitches to neutralize opposing batters.<br />
<br />
His big flaw&mdash;which is still a problem today&mdash;was his inability to rack up strikeouts. Despite purportedly excellent stuff and command, he only struck out 5.18 batters per nine innings thrown in the lower minors.  The lack of whiffs was dismissed; Porcello was just aggressive, inexperienced. <br />
<br />
In 2009, he was given the following <a href="http://mlb.mlb.com/mlb/minorleagues/prospects/y2009/profile.jsp?t=p_top&pid=519144" title="scouting report">scouting report</a> from MLB.com as the No. 4 overall prospect:<br />
<br />
<blockquote><b>Scouting report:</b> Porcello throws both a two-seam and four-seam fastball...both of which can reach the mid-90s with regularity, although the four-seamer is a bit faster. The heavy two-seamer has plenty of life down in the zone and induces ground balls.<br />
<br />
His curve improved in 2008 and could become a swing-and-miss pitch. He has a good feel for his change-up and can throw it in any count. Though he has a slider, he didn't throw it in 2008. He possesses excellent command, particularly of his fastball, to both sides of the plate. Makeup, poise and mound presence are all off the charts.<br />
<br />
<b>Upside potential: </b>Ace, All-Star, <a href="http://www.fangraphs.com/statss.aspx?playerid=1014369&position=P" target="_blank" class="player">Cy Young</a> candidate, you name it. He's been compared to <a href="http://www.fangraphs.com/statss.aspx?playerid=1303&position=P" target="_blank" class="player">Roy Halladay</a>, <a href="http://www.fangraphs.com/statss.aspx?playerid=8700&position=P" target="_blank" class="player">Justin Verlander</a>, <a href="http://www.fangraphs.com/statss.aspx?playerid=571&position=P" target="_blank" class="player">Roy Oswalt</a> and Josh Beckett.</blockquote><br />
Promoted to the majors later that year, as the youngest player in the majors, Porcello managed a very solid xFIP- of 97, meaning that he was three percent better than average in terms of expected FIP. His rookie-year performance was promising, if not impressive. Unfortunately for Porcello, he has not progressed since that point, posting near-identical xFIP- figures.<br />
<br />
He also appears to be a pitcher who consistently underperforms his fielding independent pitching statistics. In over 500 career innings, the differential between his ERA and xFIP is 0.36 due to a poor strand rate.<br />
<br />
<br />
<h3 class="article_title">Looking for potential through PITCH-f/x</h3><br />
Overall, he is a pretty similar pitcher to the one who made his debut; he still relies heavily on his two-seam fastball, but a little less heavily than before, when he threw the pitch over 75 percent of the time. He has increased his slider usage and now throws the pitch about 20 percent of the time. He uses the curve much less than before, and his change-up usage has been pretty consistent. Here is what his repertoire looks like now:<br />
<br />
 <img src="http://www.hardballtimes.com/images/uploads/pfx_z~pfx_x1.PNG" border="0" alt="image" name="image" width="523" height="419" /><br />
<br />
This is a graph of the movement of Porcello's five pitches: four-seam (FF), two-seam (FT), changeup (CH), slider (SL), and curveball (CU). The color indicates velocity, and the size of the points indicates the usage of the pitch.<br />
<br />
As you can see, he throws his two-seamer most often and occasionally tosses in a curveball. Classifications are a mix of Gameday's and the output from clustering analysis. In terms of movement and velocity, his two-seamer is very, very average, as are the rest of his pitches. In this way, his repertoire is similar to his overall performance&mdash;ordinary. <br />
<br />
While his stuff is nothing special, it's possible that he can improve his approach. MGL <a href="http://www.insidethebook.com/ee/index.php/site/comments/rick_porcello_and_his_sinker/" title="hypothesized">hypothesized</a> that he was throwing his sinker in sub-optimal locations:<br />
<blockquote>I have never seen any of his Pitch-f/x data, but I would bet that he throws his sinker too far up in the zone, on the average, compared to other sinkerball pitchers. </blockquote><br />
So I looked into this, and compared the distribution of the pitch height of Porcello's sinkers to the league's sinkers:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/density~pz.PNG" border="0" alt="image" name="image" width="536" height="368" /><br />
It turns out that the pitch height of his sinkers is pretty much identical to that of the league, where by "league" I mean a random sample of 5000 two-seamers/sinkers thrown by right-handed pitchers in 2011. The two dotted lines represent the vertical borders of the strike zone. <br />
<br />
What did seem strange to me, though, is the frequency of his four-seam fastball. Why does he throw it so often (a little above 20 percent)? It's not a pitch that gets very many groundballs or whiffs, so it really does not seem to be effective on its own. Were he to replace all of his four-seams with two-seams, his groundball rate would likely be significantly higher, and he would not be sacrificing very many swings and misses.<br />
<br />
Of course, it is possible that his other pitches are benefited by his throwing a four-seam fastball, but I do not think that is very likely. And even if that were true, I would think that he could decrease his four-seam usage down to around five or 10 percent and retain the effect. It's also a pitch that he throws to both lefties and righties, so this is not a platoon matchup kind of pitch, either.<br />
<br />
Overall, Porcello is a pretty solid pitcher. He is never going to be a Beckett or a Halladay, but he may very well develop into a solid middle-of-the-rotation innings eater. If his career is disappointing, that's probably the fault of our own ridiculous expectations, and not any failure of his.<br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-10-06T10:01:15+00:00</dc:date>

    </item>

    <item>
      <title>Pitching repertoires and BABIP</title>
       
<link>http://www.hardballtimes.com/main/article/pitching&#45;repertoires&#45;and&#45;babip/</link>
<guid>http://www.hardballtimes.com/main/article/pitching-repertoires-and-babip/#When:10:04:15</guid>       
<description><![CDATA[People like trends. I like trends, you like trends, heck, even your mom likes trends. Why? Because we are human, and viewing the world in trends is intuitive and makes the big bad world just a little less scary. While tempting, this impulse does not always produce the best analysis.<br />
<br />
Consider hot streaks. All research indicates that the majority of hot streaks, especially in baseball, are not predictive. In fact, hot streaks have been referred to as a "cognitive illusion" by Thomas Gilovich, professor of psychology at Cornell University. But they certainly <i>feel</i> tangible, predictive, real. <br />
<br />
A great deal of sabermetrics controverts our itch for trends. A chief example is defense independent pitching statistics, or DIPS. Perhaps the ultimate sobering sabermetric doctrine, DIPS stresses the lack of control that pitchers have over outcomes on balls in play.<br />
<br />
Years after Voros McCracken's breakthrough research, DIPS has been intensely scrutinized and has come out relatively unscathed. Despite the support for DIPS, I will admit that at times I still struggle with the concept. I am human after all, and it is reasonable, if not wise, to be skeptical of a theory that so strongly goes against common sense.<br />
<br />
It is often said that pitching is about disrupting the timing of the batter. I do not know who first said this, but it sounds reasonable. <i>[Ed.: It was Warren Spahn.]</i> And what better types of pitchers personify this advice than pitchers who throw lots of changeups? <br />
<br />
In reality, a changeup is really only effective when coupled with a fastball. In terms of movement, a changeup is very similar to a two-seam fastball.  The only difference is velocity, usually around an eight-mph separation.  It does not seem outrageous, then, that changeup-heavy pitchers might be a little better at preventing hits on balls in play than other pitchers.<br />
<br />
There is certainly anecdotal evidence. <a href="http://www.fangraphs.com/statss.aspx?playerid=755&position=P" target="_blank" class="player">Johan Santana</a>, <a href="http://www.fangraphs.com/statss.aspx?playerid=3543&position=P" target="_blank" class="player">Clay Buchholz</a>, <a href="http://www.fangraphs.com/statss.aspx?playerid=6204&position=P" target="_blank" class="player">Shaun Marcum</a>, and <a href="http://www.fangraphs.com/statss.aspx?playerid=4972&position=P" target="_blank" class="player">Cole Hamels</a> are just a few changeup-reliant pitchers who have sustained significantly better BABIPs than average over considerable sample sizes. But this kind of support does not mean very much, so I turned to PITCH-f/x data for help. <br />
<br />
First, I found all pitchers in 2011 who have thrown at least 1350 pitches through Sept. 19. I used only 2011 data because the run environment from this season&mdash;and consequently the league-average BABIP&mdash;are lower than in years previous. I used the 1350-pitch threshold to eliminate relievers from the sample, because relievers are known to be able to sustain lower BABIPs than starters. This gave me a sample size of 157 starters, which is not a huge amount, but sufficient. <br />
<br />
Here is a graph of the relationship between changeup usage and BABIP:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/babip~ch_usage.PNG" border="0" alt="image" name="image" width="529" height="374" /><br />
<br />
This is for all 157 pitchers in the sample, and the gray bands indicate confidence. As you can see, BABIP fluctuates a little until around a .17 usage rate, when BABIP starts to fall. For purposes of transparency, I have the league-average BABIP for the 157 starters as .287. This is slightly lower than the league average found elsewhere because I did not deal with sacrifices (included as outs).<br />
<br />
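A minimal sketch of why counting sacrifices as outs pulls the figure down, assuming BABIP is computed as hits in play divided by all balls in play. The function name and the season line are hypothetical and are not taken from the data behind this article:<br />
<br />
```python
def babip(hits_in_play, outs_in_play):
    """Batting average on balls in play: hits in play / all balls in play."""
    total = hits_in_play + outs_in_play
    if total == 0:
        raise ValueError("no balls in play")
    return hits_in_play / total


# Hypothetical season line, invented for illustration only.
hits_in_play = 150
outs_in_play = 360
sacrifice_outs = 8  # sacrifice events counted as additional outs

without_sacrifices = babip(hits_in_play, outs_in_play)
with_sacrifices = babip(hits_in_play, outs_in_play + sacrifice_outs)

print(f"sacrifices excluded: {without_sacrifices:.3f}")
print(f"sacrifices as outs:  {with_sacrifices:.3f}")
```
<br />
Any event added to the outs column enlarges the denominator without touching the numerator, so the resulting BABIP can only go down.<br />
<br />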
After seeing this 17 percent threshold, I split the starters into two groups: those who threw changeups at least 17 percent of the time, and those who did not. I will refer to the pitchers who threw at least 17 percent changeups as the "changeup heavy" group. I will refer to the pitchers who threw fewer than 17 percent changeups as the "changeup light" group. <br />
<br />
The changeup heavy group (n=45) had a mean BABIP of .279, and the changeup light group (n=112) had a mean BABIP of .290. A two-sided t-test finds the difference in means to be statistically significant at a 98% level. <br />
<br />
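The split-and-test procedure can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for this article: it assumes Welch's unequal-variance form of the t-test (the article does not say which variant was run), it approximates the t distribution with a normal distribution to stay within the standard library, and the pitcher rows are made up. With a sample as small as the toy one here, the normal approximation understates the p-value, so a proper t distribution should be used for real work.<br />
<br />
```python
from math import sqrt
from statistics import NormalDist, mean, variance


def split_by_usage(pitchers, threshold=0.17):
    """Split (changeup_usage, babip) rows into heavy and light BABIP lists."""
    heavy = [b for u, b in pitchers if u >= threshold]
    light = [b for u, b in pitchers if threshold > u]
    return heavy, light


def welch_t(a, b):
    """Welch t statistic and a two-sided p-value (normal approximation)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_two_sided


# Hypothetical (changeup usage, BABIP) rows, invented for illustration.
pitchers = [
    (0.05, 0.295), (0.08, 0.288), (0.10, 0.301), (0.12, 0.284), (0.15, 0.292),
    (0.18, 0.281), (0.20, 0.275), (0.22, 0.283), (0.25, 0.270),
]

heavy, light = split_by_usage(pitchers, threshold=0.17)
t_stat, p_value = welch_t(heavy, light)
print(f"n_heavy={len(heavy)} n_light={len(light)} t={t_stat:.2f} p={p_value:.4f}")
```
<br />
Because the threshold is a parameter, the same two functions can be re-run at other cutoffs to check how sensitive the result is to where the line is drawn.<br />
<br />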
<br />
<h3 class="article_title">Group Breakdowns</h3><br />
As suggested by <a href="http://twitter.com/#!/dturkenk/status/115982468988936192" title="Dan Turkenkopf">Dan Turkenkopf</a>, I looked at the mean velocities of these two groups.  For pitches classified as four-seam fastballs by Gameday, the changeup-heavy group averaged 90.05 mph, and the changeup-light group averaged 91.0 mph. A one-mph difference was also observed with two-seam fastballs (including pitches classified as sinkers). For pitches classified as changeups, the changeup-heavy group averaged 81.97 mph and the changeup-light group averaged 83.38 mph. <br />
<br />
In terms of repertoires, here is how the two groups differed: <br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/usagediff~pitch_type.PNG" border="0" alt="image" name="image" width="540" height="374" /><br />
<br />
This graph shows difference in pitch usage for four-seams (FF), two-seams (FT), sliders (SL), curveballs (CU), cutters (FC), and of course changeups (CH). As you can see, the only major difference is with changeups (13.2 percent). <br />
<br />
<br />
<h3 class="article_title">Batted Ball Differences</h3><br />
We know that batted ball profiles can help us predict BABIP. Pitchers at the extremes&mdash;tons of flyballs or tons of groundballs&mdash;are known to be able to sustain lower-than-average BABIPs. Therefore, if changeup heavy pitchers fall into one of these groups, then we have not really found anything interesting.<br />
<br />
I am calculating these batted ball profiles using the data available from MLBAM stringers. There are four possible types of batted balls: Fly balls, ground balls, line drives, and pop-ups. This is different from Fangraphs, which includes pop-ups within fly balls. <br />
<br />
In terms of groundball rates, the changeup-heavy group averaged 42.5 percent, and the changeup-light group averaged 43.5 percent. All other types of batted ball rates were within two percent for each group, meaning the batted ball profiles were essentially the same! Assuming these batted ball classifications are reliable, we can reason that batted ball differences are not the reason for the difference in BABIP skill.<br />
<br />
So which batted balls are going for hits less often for changeup heavy pitchers? Both ground balls and fly balls go for hits less often for changeup heavy pitchers, though line drives become hits at a marginally higher rate. I have not tested these individual BABIP differences for statistical significance. <br />
<br />
<br />
<h3 class="article_title">Limitations</h3><br />
As stated earlier, all classifications used were from MLBAM. This creates some uncertainty about the actual changeup rate for some of these pitchers. For example, if a pitcher throws both a splitter and a changeup, Gameday usually has a lot of trouble distinguishing the two (see: <a href="http://www.fangraphs.com/statss.aspx?playerid=3374&position=P" target="_blank" class="player">Ubaldo Jimenez</a>, <a href="http://www.fangraphs.com/players.aspx?lastname=Freddy%20Garcia" target="_blank" class="player">Freddy Garcia</a>).<br />
<br />
In addition, I have not adjusted for the counts in which these pitches were thrown. Ideally, each pitcher in the sample would have an identical distribution of counts. This is because the average BABIP is not the same for each count; BABIP is presumably going to be higher in 2-0 counts than in 0-2 counts.<br />
<br />
Therefore, if pitchers that are in the heavy changeup usage group are better at getting ahead in the count, then we are just implicitly measuring the effect of count distribution on BABIP. Other cautions include that I have not adjusted for team fielding, ballpark, league, or opposition. Also important to note is the inaccuracy of MLBAM batted ball stringers. However, if these errors are randomly distributed, or at least distributed in a manner that does not systematically favor the BABIP of one group vs. another, then these limitations are not huge concerns. <br />
<br />
Another limitation, as pointed out by <a href="http://twitter.com/#!/garik16/status/116000553183617024" title="Josh Smolow">Josh Smolow</a>, is that of handedness. Indeed, we do find that changeup-heavy pitchers are more likely to be lefties than changeup-light pitchers:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/handedness~CH_group1.PNG" border="0" alt="image" name="image" width="470" height="374" /><br />
<br />
But does this really matter? For the pitchers in the dataset, lefties have an average BABIP of .291, and righties have an average BABIP of .285, a difference which is not statistically significant. If anything, this means that we may be underestimating the effect of having a changeup that you can throw 17 percent of the time. <br />
<br />
Here is a graph of the relationship of BABIP by changeup usage, split up by pitcher handedness:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/babip~ch_usage~throws.PNG" border="0" alt="image" name="image" width="528" height="372" /><br />
<br />
It may appear that lefties and righties are displaying very different relationships here to changeup usage, but part of that is because of the smoothing method used. If we present the data using a linear regression instead, the two groups look much more similar:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/babip~ch_usage~throws_linear.PNG" border="0" alt="image" name="image" width="555" height="378" /><br />
<br />
However, I have not looked into splitting up the data by the handedness of the batter. It is also important to note that the 17 percent changeup usage threshold used to create the two groups is arbitrary. At lower thresholds, the difference in means is not significant. I re-ran the t-test using different thresholds, and the difference in means is significant at the 92.5 percent level or better for every threshold from 17 to 24 percent (17 percent, 18 percent, 19 percent, etc.).<br />
<br />
Keep in mind this is also a two-sided t-test, despite the fact that our alternative hypothesis is really one-sided, meaning that we can halve the p-value. This means that for a one-sided t-test, these results would be significant at above a 95 percent level (the standard level in the social sciences) for every threshold in the 17-to-24 percent range. <br />
<br />
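The halving step is simple arithmetic, sketched here with a hypothetical helper function. It assumes the observed difference points in the hypothesized direction (changeup-heavy pitchers having the lower BABIP); halving is not valid otherwise.<br />
<br />
```python
def one_sided_level(two_sided_level):
    """Convert a two-sided significance level to its one-sided counterpart.

    Only valid when the observed effect is in the hypothesized direction.
    """
    p_two_sided = 1.0 - two_sided_level
    p_one_sided = p_two_sided / 2.0
    return 1.0 - p_one_sided


# The weakest two-sided result reported across the 17-to-24 percent thresholds.
floor = one_sided_level(0.925)
print(f"one-sided level: {floor:.4f}")
```
<br />
A two-sided level of 92.5 percent corresponds to a p-value of .075; halved, that is .0375, or a one-sided level of 96.25 percent, which clears the 95 percent bar.<br />
<br />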
<br />
<h3 class="article_title">One bias to watch out for</h3><br />
As stated earlier, I split the starters into two groups, those who throw changeups at least 17 percent of the time, and those who don't. So perhaps all we are measuring is the effect of having a pitch good enough to throw 17 percent of the time.<br />
<br />
Well, were this to be true, we would see a similar BABIP split with other pitches that are thrown at least 17 percent of the time.  But we don't. Slider-, curveball-, and cutter-heavy pitchers (usage greater than 17 percent) do not display a statistically significant BABIP advantage. <br />
<br />
<br />
<h3 class="article_title">Finishing Thoughts</h3><br />
Are changeup-heavy pitchers allowing weaker contact, or are these findings just the result of a confluence of luck and limited data? Unless we obtain a lot of HIT-f/x data, we won't know the answer to this question. But based on the above information, I would feel comfortable saying that there is evidence supporting a BABIP-suppressing skill for changeup-heavy pitchers.<br />
<br />
If we do accept that changeup-heavy pitchers have BABIP-suppressing skill, we also need to accept that this skill is not very large. In a way, that makes this just another win for DIPS. <br /><br /><br /><a href="http://www.hardballtimes.com/main/downloads/" target="new">Click here</a> to learn about THT's download subscriptions.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-09-22T10:04:15+00:00</dc:date>

    </item>

    <item>
      <title>Mark Teixeira&#8217;s shrinking BABIP</title>
       
<link>http://www.hardballtimes.com/main/article/mark&#45;teixieras&#45;babip&#45;issues/</link>
<guid>http://www.hardballtimes.com/main/article/mark-teixieras-babip-issues/#When:09:02:15</guid>       
<description><![CDATA[After five consecutive years of BABIPs over .300 for <a href="http://www.fangraphs.com/statss.aspx?playerid=1281&position=1B" title="Mark Teixeira">Mark Teixeira</a>, it was reasonable to write off his .268 BABIP in 2010 as an aberration. We see fluctuations in batting average on balls in play all the time, so 2010 seemed like just another example. Then 2011 happened. Currently posting a paltry .234, Teixeira has defied expectations. Unfortunately, we do not have the power right now to determine the cause.<br />
<br />
Baseball Info Solutions (BIS) data available at Fangraphs does suggest a slight shift in batted ball profile; in the past few years, Teixeira has steadily increased his flyball percentage. Of these flyballs, his pop-up rate has been higher than previous levels. Given that flyballs go for hits less often than line drives and ground balls, hitting more of them typically corresponds with a decrease in BABIP. However, given the uncertainty surrounding the precision of batted ball info, slight changes in the data are not all that dependable. <br />
<br />
Another possibility is that his plate discipline has declined, driving Teixeira to chase pitches and make weaker contact than before. This theory is also supported by BIS data: In the past two years Teixeira has posted O-swing (the percentage of swings on pitches outside the zone) rates that are much higher than previous norms. But just like before, we need to exercise caution. The league average O-swing has increased dramatically in the past few years. If we verify these rates using PITCHf/x data, we find nearly identical O-swing rates. Additionally, given that his walk and strikeout rates have stayed strong, the possibility that plate discipline is causing his reduced BABIP seems remote. <br />
<br />
It seems, then, that we cannot definitively give a <i>why</i>. But what we can do is look at <i>how</i>. <br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/babip~px.png" border="0" alt="image" name="image" width="550" height="442" /><br />
<br />
This graph shows BABIP by horizontal location, split up by Teixeira's handedness; the left side is for when he bats left-handed, and the right side is when he bats right-handed. The graph also is split up by time period&mdash;2008-2009 and 2010-2011. Of course these time frames are arbitrary, but they do help to contrast his recent BABIP performance with his previous levels. The data shown here are within a rough approximation of the strike zone. Gray bands indicate confidence. The sample sizes are not huge here, so we have reason to tread cautiously. <br />
<br />
As you can see, there's not much of anything of note with his performance as a right-handed batter. Indeed, in 2010 and 2011 his BABIP as a right-handed batter has stayed above .290, which is pretty close to his career average once we account for the lower run environment in recent years. His performance as a left-handed batter is where things get interesting. As shown in the graph, there is no significant difference in his BABIP on pitches inside and down the middle. But with pitches on the outer half of the plate [-1, 0], there is a clear dropoff. <br />
<br />
Were his BABIP decline in 2010-2011 completely the product of luck, we would expect to see it randomly distributed across his performance, not only on pitches that are outside. Of course that is only what we would expect; this still does not preclude the possibility that the difference in his BABIP is entirely due to luck, it just makes it seem less likely. The disparity is very stark; in 2010-2011, his BABIP on outside pitches is below .200, but in 2008-2009 his BABIP was around .300. We also see a difference when looking at the frequency of him pulling the baseball on outside pitches:<br />
<br />
<img src="http://www.hardballtimes.com/images/uploads/batted_angle~px1.png" border="0" alt="image" name="image" width="555" height="394" /><br />
<br />
This image shows the angle of Teixeira's batted balls by horizontal pitch location. This is only for his left-handed at-bats. The two dotted lines indicate the horizontal borders of the strike zone for a typical lefty. As expected, he pulls pitches most of the time. <br />
<br />
Of note here is that in 2010-2011 he does seem to be pulling outside pitches much more often than before. This may suggest that he has been rolling over on outside pitches more frequently in 2010-2011 than in 2008-2009, though I wouldn't necessarily categorize this as strong evidence for a change in approach. The graph also suggests that he is pulling inside pitches more often than before, which is surprising. <br />
<br />
Has Teixeira's swing as a left-handed batter changed such that his BABIP performance on outside pitches suffers? As shown above, the results appear to suggest that a change has occurred, but the answer to this question is one that could be better approached by a scout.]]>

</description>
      <dc:creator>Josh Weinstock</dc:creator>
      <dc:date>2011-09-07T09:02:15+00:00</dc:date>

    </item>


    </channel>
</rss>