I’m Batty for Baseball Stats
by Dave Studeman
May 10, 2005
It took a little longer than I had hoped, but we have our first cut of batter and pitcher statistics now available on our Hardball Times Statistics Page. We plan to roll out more stats soon, but this initial cut is designed to tell you most of the essential information you need to know about each batter and pitcher.
The stats have only changed a little from last year's, which were explained and elaborated upon in this article. We have added one more stat this year: the number of home runs per outfield flyball, which is discussed in this note. We also now have each batter's infield fly rate.
I like to think our stats do two things for our readers:
- Concisely show the underlying dynamics of success and failure for each batter and pitcher.
- Give you some useful information not readily available elsewhere; namely, batted ball information for batters and pitchers.
All of our statistics are collected and sent to us nightly by Baseball Information Solutions, whose scorers categorize each batted ball as a groundball, flyball or line drive. This information is relatively new to the sabermetric community, and some of us are still discovering what it all means. Here are some recent articles that have dealt with the subject:
- MGL's article from over a year ago, which looked at batted ball type by pitcher.
- An initial article I wrote last year, which basically documented that line drives and flyballs are good for batters.
- A follow-up regarding infield fly stats, which showed that flyball pitchers tend to induce a slightly higher proportion of infield flies (a finding reinforced by Robert Dudek's article in the Hardball Times Annual).
- Dan Agonistes' work with a different data source documenting the rate of outs and types of base hit by batted ball type.
- Research published in Ron Shandler's Baseball Forecaster that came to a couple of conclusions: pitchers don't have much control over whether batters hit line drives or not, and home runs hit off pitchers are pretty much a function of the number of flyballs allowed by the pitcher, and the ballpark.
There's power in them thar numbers, and we're just starting to unlock it. Let me see if I can add something to the sabermetric ether:
We can improve the predictive power of FIP by normalizing each pitcher's home run rate per flyball.
As Shandler's research uncovered, pitchers don't seem to have a whole lot of impact on the number of home runs allowed, other than the extent to which they allow flyballs in general. We track something called FIP at the Hardball Times, because we believe it is better than ERA at discerning how well a pitcher is pitching, and will pitch in the future.
Well, now we can improve FIP by "normalizing" the Home Run portion of the equation. Specifically, we will adjust the Home Run total to equal 11% of outfield flyballs allowed (which is the average in both leagues). Let's call this xFIP, for Expected FIP. Here's a list of pitchers who are most underperforming their xFIP (through last Friday's games) and can be expected to improve:
Player        Team    IP    ERA   xFIP   Diff
Brown K.      NYA   24.0   8.25   3.80   4.45
Wright J.     NYA   19.7   9.15   5.10   4.05
Lilly T.      TOR   24.3   7.77   4.29   3.47
Anderson B.   KC    28.7   6.91   4.01   2.90
Elarton S.    CLE   25.0   7.20   4.50   2.70
Bell R.       TB    25.0   8.28   5.63   2.65
Backe B.      HOU   35.7   6.81   4.22   2.60
Kennedy J.    COL   33.3   7.56   5.05   2.51
Lohse K.      MIN   21.7   6.65   4.15   2.49
Harper T.     TB    18.7   6.27   3.82   2.45
Wood K.       CHN   26.3   6.15   3.78   2.38
Wilson P.     CIN   36.0   7.25   5.11   2.14
Vazquez J.    ARI   44.0   4.70   2.71   2.00
And here's the converse list: pitchers whose ERAs are furthest below their xFIP, and who can be expected to decline in performance:
Player        Team    IP    ERA   xFIP   Diff
Blanton J.    OAK   30.3   2.67   5.39  -2.72
Garland J.    CHA   39.0   1.38   4.01  -2.63
Chacon S.     COL   22.0   3.27   5.87  -2.60
Rogers K.     TEX   38.3   2.11   4.60  -2.48
Moehler B.    FLA   24.7   2.19   4.40  -2.21
Hampton M.    ATL   43.7   2.47   4.62  -2.15
Patterson J.  WAS   33.7   1.60   3.71  -2.11
Sabathia C.   CLE   24.0   2.63   4.62  -2.00
Contreras J.  CHA   34.7   2.60   4.54  -1.94
Seo J.        NYN   18.0   2.00   3.93  -1.93
Santos V.     MIL   34.3   2.88   4.79  -1.90
Robertson N.  DET   28.0   4.18   6.08  -1.90

A third White Sox pitcher, El Duque, just missed making this list.
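For readers who want to try the xFIP adjustment themselves, here is a rough sketch in Python. The only number taken from this article is the 11% home run rate per outfield flyball. The article does not spell out the FIP formula, so the version below (13 times home runs, plus 3 times walks, minus 2 times strikeouts, per inning, plus a constant) is the commonly cited form and should be treated as an assumption, as should the placeholder constant of 3.20 and the function names.

```python
# League-average home runs per outfield flyball, as stated in the article.
LEAGUE_HR_PER_OF_FLY = 0.11


def fip(hr, bb, k, ip, constant=3.20):
    """Commonly cited FIP form (an assumption, not quoted from the article).

    `constant` is a placeholder that scales FIP to look like ERA; the real
    value depends on the league and season.
    """
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def xfip(outfield_flies, bb, k, ip, constant=3.20):
    """FIP with the pitcher's actual home runs replaced by an expected total."""
    expected_hr = LEAGUE_HR_PER_OF_FLY * outfield_flies
    return fip(expected_hr, bb, k, ip, constant)


if __name__ == "__main__":
    # Hypothetical pitcher: 30 IP, 40 outfield flies, 8 HR, 10 BB, 25 K.
    print(f"FIP:  {fip(hr=8, bb=10, k=25, ip=30):.2f}")
    print(f"xFIP: {xfip(outfield_flies=40, bb=10, k=25, ip=30):.2f}")
```

In this made-up case the pitcher has allowed 8 home runs where 11% of 40 flyballs would predict about 4.4, so his xFIP comes out well below his FIP.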
Hitting home runs and hitting for BABIP are different skills, and even run somewhat counter to each other.
Last year, I found that line drive rates impacted a batter's BABIP. This year, by analyzing the performance of all 2004 batters with at least 300 plate appearances, I was able to uncover a more complicated yet accurate model. Here's the equation I found by regressing many components against BABIP:
BABIP = .245 + .52 times the Line Drive Rate (LD/BIP) - .16 times the Flyball Rate (FB/BIP) + .11 times the Strikeout Rate (K/AB)
The R squared of this equation is .39, which isn't bad in the world of round balls and round bats. I initially threw a lot of different stats into the analysis, but found that these three simple stats tell the best story: If you hit line drives, avoid flyballs and strike out more often, you'll have a higher Batting Average on Balls in Play.
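If you'd like to plug a batter's numbers into the BABIP equation above, a minimal sketch follows. The coefficients come straight from the equation; the function name and the sample batter are made up for illustration, and all rates are assumed to be fractions (.20 rather than 20).

```python
def expected_babip(ld_rate, fb_rate, k_rate):
    """Expected BABIP from the regression equation in the article.

    ld_rate: line drives per ball in play (LD/BIP)
    fb_rate: flyballs per ball in play (FB/BIP)
    k_rate:  strikeouts per at bat (K/AB)
    """
    return 0.245 + 0.52 * ld_rate - 0.16 * fb_rate + 0.11 * k_rate


if __name__ == "__main__":
    # Hypothetical batter: 20% line drives, 35% flyballs, 18% strikeouts.
    print(f"{expected_babip(ld_rate=0.20, fb_rate=0.35, k_rate=0.18):.3f}")  # 0.313
```

With an R squared of .39, a result like this is a baseline rather than a forecast; most of the variation in BABIP is left unexplained.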
The Strikeout Rate finding was a surprise to me, but Paul Scott of the Fourth Outfielder blog came to the same conclusion recently and, really, I have a feeling that other baseball researchers have found the same thing in the past. You may be surprised at this finding because a strikeout is an out, but remember, we're measuring Batting Average on Balls in Play -- the batter has already hit the ball. The fact that he tends to strike out in other instances may be an indication that he's a hard swinger and is more likely to hit the ball hard when he does hit it.
Which led to the next dynamic I analyzed: Home Run Rate, or home runs as a percent of balls hit (adjusted for ballpark). Specifically, I ran a series of regression analyses to uncover what stats might lead to higher home run rates and found an equation with an R squared of .47:
HR/BIP = .11 times the Flyball Rate (FB%) - .07 times the Groundball Rate (GB%) + .16 times the Strikeout Rate (K/AB)
Yes, to hit home runs, you have to hit flyballs and avoid groundballs. Line drives proved to be a non-factor in hitting home runs. You probably noticed that flyballs, which are a negative factor for BABIP, are a positive factor for hitting home runs. And strikeouts are a positive factor for both.
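To make the flyball tradeoff concrete, here is a small sketch that uses the coefficients from the two equations above to ask what happens when a batter converts some groundballs into flyballs. It assumes line drives and strikeouts stay put and that the equations can be read as simple linear "what if" models, which is a stretch for regressions fit across players; the function names are mine.

```python
# Coefficients as reported in the article (rates are fractions, e.g. 0.35).
BABIP_COEF = {"intercept": 0.245, "ld": 0.52, "fb": -0.16, "k": 0.11}
HR_BIP_COEF = {"fb": 0.11, "gb": -0.07, "k": 0.16}


def expected_hr_per_bip(fb_rate, gb_rate, k_rate):
    """Expected home runs per ball hit, from the HR/BIP equation."""
    return (HR_BIP_COEF["fb"] * fb_rate
            + HR_BIP_COEF["gb"] * gb_rate
            + HR_BIP_COEF["k"] * k_rate)


def shift_gb_to_fb(shift):
    """Change in expected HR/BIP and BABIP when `shift` of a batter's balls
    in play move from groundballs to flyballs, everything else held fixed."""
    d_hr = (HR_BIP_COEF["fb"] - HR_BIP_COEF["gb"]) * shift
    d_babip = BABIP_COEF["fb"] * shift
    return d_hr, d_babip


if __name__ == "__main__":
    d_hr, d_babip = shift_gb_to_fb(0.05)
    print(f"HR/BIP change: {d_hr:+.4f}")
    print(f"BABIP change:  {d_babip:+.4f}")
```

Trading five points of groundballs for flyballs moves the two expected rates in opposite directions, which is the tension described above.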
Of course, I'm looking at rates of batted balls. As I said, the problem with strikeouts is that the batter doesn't bat the ball at all. So let me present one final regression result: the stats that most impact the rate of Home Runs Hit per At Bat (adjusted for ballpark):
HR/AB = .11 times the Flyball Rate (FB%) - .09 times BABIP + .11 times the Walk Rate (BB/PA) + .06 times the Strikeout Rate (K/AB)
The R squared of this regression was .48. And here you can clearly see that the BABIP rate is negatively associated with the Home Run Rate. Now, there may be a mathematical phenomenon happening here; after all, BABIP purposely leaves home runs out of the equation, and this may impact the regression results in some way I didn't capture. But this equation seems to cinch it: BABIP and HR rates are negatively correlated.
You probably also noticed that both walks and strikeouts are positively correlated with hitting home runs. But the impact of these two stats isn't very strong. If you leave them out of the regression analysis, you still achieve an R Squared of .42.
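The HR/AB equation above can be turned into a quick calculator. This is only a sketch: the coefficients are the ones reported in the equation, while the function name, the sample batter and the 550 at-bat season are hypothetical. Rates are assumed to be fractions.

```python
def expected_hr_per_ab(fb_rate, babip, bb_rate, k_rate):
    """Expected home runs per at bat from the regression in the article.

    fb_rate: flyball rate (FB%)
    babip:   batting average on balls in play
    bb_rate: walks per plate appearance (BB/PA)
    k_rate:  strikeouts per at bat (K/AB)
    """
    return 0.11 * fb_rate - 0.09 * babip + 0.11 * bb_rate + 0.06 * k_rate


if __name__ == "__main__":
    # Hypothetical batter: 35% flyballs, .300 BABIP, 9% walks, 18% strikeouts.
    rate = expected_hr_per_ab(fb_rate=0.35, babip=0.300, bb_rate=0.09, k_rate=0.18)
    print(f"HR/AB: {rate:.4f}")
    print(f"HR over 550 AB: {rate * 550:.1f}")
```

Raising the BABIP input while holding everything else fixed lowers the result, which is the negative relationship the equation describes.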
One last regression I ran, by the way, was to see if I could find the factors that most drove home runs hit as a percent of flyballs (again, adjusted for ballpark). My model achieved an R Squared of .36 and essentially included two factors: flyball rate and strikeout rate. Turns out that the more flyballs you hit, the higher the proportion of them that tend to go for home runs! And, once again, the more you strike out, the more likely you are to hit proportionately more home runs.
Please let me remind you that this analysis includes only regular major league baseball players. If you took the strikeout factor to the extreme -- that to hit home runs you should strike out all the time -- then I would be the greatest slugger in major league history. I can assure you that's not true. These equations describe relationships between the stats of established major league players, within a "normal" range of major league "behavior." We can use these equations to establish baselines for current major league players, but that's all we can do with them.
Here's A Fun Stat
After forcing you to read all that mind-numbing regression analysis, I'd like to conclude with a list that actually may not mean a dang thing. It's a list of each league's leading batters in LD% plus HR/OF. These are the guys who are really nailing the ball when they hit it: hitting line drives (for BABIP) and hitting flyballs over the wall. And I'm going to call this stat "Force" because I've just been studying Newton's Second Law of Motion with my son. I love helping my kids with their homework -- I learn again all the stuff I've forgotten.
Here are the leading batters (and pitchers, too) in "Force" in the American and National Leagues (minimum 50 plate appearances or 20 innings pitched).
American League

Batters                       Pitchers
Player        Team  Force     Player        Team  Force
Roberts B.    BAL   .640      Young C.      TEX   .137
Varitek J.    BOS   .539      Sabathia C.   CLE   .141
Morneau J.    MIN   .528      Garland J.    CHA   .154
Young D.      DET   .504      Rogers K.     TEX   .163
Belliard R.   CLE   .473      Johnson J.    DET   .176
Ramirez M.    BOS   .468      Bautista D.   KC    .183
Tejada M.     BAL   .447      Clement M.    BOS   .193
Sexson R.     SEA   .439      Bedard E.     BAL   .197
Jones J.      MIN   .431      Robertson N.  DET   .213
Rodriguez A.  NYA   .431      Kazmir S.     TB    .220

National League

Batters                       Pitchers
Player        Team  Force     Player        Team  Force
Branyan R.    MIL   .631      Martinez P.   NYN   .168
Pena W.       CIN   .563      Patterson J.  WAS   .172
Nady X.       SD    .559      Francis J.    COL   .182
Kearns A.     CIN   .532      Perez O.      LAN   .187
Floyd C.      NYN   .523      Redman M.     PIT   .189
Nevin P.      SD    .512      Chacon S.     COL   .194
Cabrera M.    FLA   .499      Marquis J.    STL   .213
Utley C.      PHI   .498      Rueter K.     SF    .224
Diaz V.       NYN   .487      Wright J.     COL   .237
Klesko R.     SD    .481      Loaiza E.     WAS   .237

Interestingly, there are a number of pitchers on the pitcher list (Sabathia, Patterson, Garland, Robertson and Rogers) who are also most out-performing their xFIP. Now we see why -- batters just aren't hitting the ball hard against them.
In theory, pitchers don't have a lot of control over line drives and home run rates, so we shouldn't expect these guys to maintain their pace. Let's watch them the rest of the year to see how they do.
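For anyone who wants to compute "Force" from their own batted ball data, here is a minimal sketch. It assumes Force is simply the line drive rate plus home runs per outfield flyball, both as fractions, and applies the 50 plate appearance minimum used for the batter lists. The class, field and function names and the sample batters are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BatterLine:
    name: str
    plate_appearances: int
    line_drive_rate: float  # LD%, as a fraction
    hr_per_of_fly: float    # HR/OF, home runs per outfield flyball

    @property
    def force(self):
        # "Force" as defined in the article: LD% plus HR/OF.
        return self.line_drive_rate + self.hr_per_of_fly


def force_leaders(batters, min_pa=50, top=10):
    """Return the top batters by Force among those with at least min_pa."""
    eligible = [b for b in batters if b.plate_appearances >= min_pa]
    return sorted(eligible, key=lambda b: b.force, reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical batters, not real data.
    sample = [
        BatterLine("Batter A", 120, 0.25, 0.20),
        BatterLine("Batter B", 30, 0.30, 0.40),   # below the PA minimum
        BatterLine("Batter C", 80, 0.18, 0.10),
    ]
    for b in force_leaders(sample):
        print(f"{b.name}: {b.force:.3f}")
```

The same calculation works for pitchers if you swap the playing-time filter for the 20 innings pitched minimum and sort in ascending order.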
References and Resources
Thanks to JC of the Sabernomics blog for his advice.
Dave was called a "national treasure" by Rob Neyer. Seriously. Comments about this article can be sent to him through the miracle of e-mail.