I’m Batty for Baseball Stats

It took a little longer than I had hoped, but we have our first cut of batter and pitcher statistics now available on our Hardball Times Statistics Page. We plan to roll out more stats soon, but this initial cut is designed to tell you most of the essential information you need to know about each batter and pitcher.

The stats have only changed a little from last year’s, which were explained and elaborated upon in this article. We have added one more stat this year: the number of home runs per outfield flyball, which is discussed in this note. We also now have each batter’s infield fly rate.

I like to think our stats do two things for our readers:

  • Concisely show the underlying dynamics of success and failure for each batter and pitcher.
  • Give you some useful information not readily available elsewhere; namely, batted ball information for batters and pitchers.

All of our statistics are collected and sent to us nightly by Baseball Information Solutions, whose scorers categorize each batted ball as a groundball, flyball or line drive. This information is relatively new to the sabermetric community, and some of us are still discovering what it all means. Here are some recent articles that have dealt with the subject:

There’s power in them thar numbers, and we’re just starting to unlock it. Let me see if I can add something to the sabermetric ether:

We can improve the predictive power of FIP by normalizing each pitcher’s home run rate per flyball.

As Shandler’s research uncovered, pitchers don’t seem to have a whole lot of impact on the number of home runs allowed, other than the extent to which they allow flyballs in general. We track something called FIP at the Hardball Times, because we believe it is better than ERA at discerning how well a pitcher is pitching, and will pitch in the future.

Well, now we can improve FIP by “normalizing” the Home Run portion of the equation. Specifically, we will adjust the Home Run total to equal 11% of outfield flyballs allowed (which is the average in both leagues). Let’s call this xFIP, for Expected FIP. Here’s a list of pitchers whose ERAs most exceed their xFIP (through last Friday’s games; Diff is ERA minus xFIP) and who can be expected to improve:

Player           Team    IP     ERA   xFIP   Diff
Brown K.         NYA    24.0   8.25   3.80   4.45
Wright J.        NYA    19.7   9.15   5.10   4.05
Lilly T.         TOR    24.3   7.77   4.29   3.47
Anderson B.      KC     28.7   6.91   4.01   2.90
Elarton S.       CLE    25.0   7.20   4.50   2.70
Bell R.          TB     25.0   8.28   5.63   2.65
Backe B.         HOU    35.7   6.81   4.22   2.60
Kennedy J.       COL    33.3   7.56   5.05   2.51
Lohse K.         MIN    21.7   6.65   4.15   2.49
Harper T.        TB     18.7   6.27   3.82   2.45
Wood K.          CHN    26.3   6.15   3.78   2.38
Wilson P.        CIN    36.0   7.25   5.11   2.14
Vazquez J.       ARI    44.0   4.70   2.71   2.00

And here’s the converse list: pitchers whose ERAs are furthest below their xFIP, and who can be expected to decline in performance:

Player           Team    IP     ERA   xFIP   Diff
Blanton J.       OAK    30.3   2.67   5.39  -2.72
Garland J.       CHA    39.0   1.38   4.01  -2.63
Chacon S.        COL    22.0   3.27   5.87  -2.60
Rogers K.        TEX    38.3   2.11   4.60  -2.48
Moehler B.       FLA    24.7   2.19   4.40  -2.21
Hampton M.       ATL    43.7   2.47   4.62  -2.15
Patterson J.     WAS    33.7   1.60   3.71  -2.11
Sabathia C.      CLE    24.0   2.63   4.62  -2.00
Contreras J.     CHA    34.7   2.60   4.54  -1.94
Seo J.           NYN    18.0   2.00   3.93  -1.93
Santos V.        MIL    34.3   2.88   4.79  -1.90
Robertson N.     DET    28.0   4.18   6.08  -1.90

A third White Sox pitcher, El Duque, just missed making this list.
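
The xFIP adjustment described above can be sketched in a few lines of Python. The article does not spell out the FIP formula, so this sketch assumes the commonly cited form, (13*HR + 3*BB - 2*K)/IP plus a league constant; the weights, the 3.20 constant, and the function names are assumptions for illustration, not the exact calculation behind the tables. Only the 11% home-runs-per-outfield-flyball figure comes from the text.

```python
def fip(hr, bb, k, ip, league_constant=3.20):
    """FIP in its commonly cited form (assumed; the article gives no formula)."""
    return (13 * hr + 3 * bb - 2 * k) / ip + league_constant


def xfip(outfield_flies, bb, k, ip, hr_per_of=0.11, league_constant=3.20):
    """FIP with actual home runs replaced by 11% of outfield flyballs allowed."""
    expected_hr = hr_per_of * outfield_flies
    return fip(expected_hr, bb, k, ip, league_constant)


# Hypothetical pitcher (not from the tables): 8 HR allowed on 40 outfield flies.
print(round(fip(hr=8, bb=10, k=25, ip=30.0), 2))
print(round(xfip(outfield_flies=40, bb=10, k=25, ip=30.0), 2))
```

For this made-up pitcher, 8 home runs on 40 outfield flies is a 20% rate; normalizing to 11% drops the expected home runs to 4.4 and pulls the estimate down from roughly 6.00 to roughly 4.44. Innings should be passed as true decimals (19 2/3 innings as 19.667).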

Hitting home runs and hitting for BABIP are different skills, and even run somewhat counter to each other.

Last year, I found that line drive rates impacted a batter’s BABIP. This year, by analyzing the performance of all 2004 batters with at least 300 plate appearances, I was able to uncover a more complicated yet more accurate model. Here’s the equation I found by regressing many components against BABIP:

BABIP = .245 + .52 times the Line Drive Rate (LD/BIP) - .16 times the 
Flyball Rate (FB/BIP) + .11 times the Strikeout Rate (K/AB)

The R squared of this equation is .39, which isn’t bad in the world of round balls and round bats. I initially threw a lot of different stats into the analysis, but found that these three simple stats tell the best story: If you hit line drives, avoid flyballs and strike out more often, you’ll have a higher Batting Average on Balls in Play.
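
As a quick illustration, the BABIP equation above can be written as a small Python function. The coefficients are the ones quoted in the equation; the function name and the sample batter are hypothetical, and with an R squared of .39 the result is a rough baseline rather than a forecast.

```python
def expected_babip(ld_rate, fb_rate, k_rate):
    """Baseline BABIP from the regression quoted above.

    ld_rate: line drives per ball in play (LD/BIP)
    fb_rate: flyballs per ball in play (FB/BIP)
    k_rate:  strikeouts per at bat (K/AB)
    """
    return 0.245 + 0.52 * ld_rate - 0.16 * fb_rate + 0.11 * k_rate


# Hypothetical batter: 20% line drives, 35% flyballs, strikes out in 18% of at bats.
print(round(expected_babip(0.20, 0.35, 0.18), 3))
```

A batter whose actual BABIP sits well above or below this baseline is a candidate for a closer look.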

The Strikeout Rate finding was a surprise to me, but Paul Scott of the Fourth Outfielder blog came to the same conclusion recently and, really, I have a feeling that other baseball researchers have found the same thing in the past. You may be surprised at this finding because a strikeout is an out, but remember, we’re measuring Batting Average on Balls in Play: the batter has already hit the ball. The fact that he tends to strike out in other instances may be an indication that he’s a hard swinger and is more likely to hit the ball hard when he does hit it.

Which led to the next dynamic I analyzed: Home Run Rate, or home runs as a percent of balls hit (adjusted for ballpark). Specifically, I ran a series of regression analyses to uncover what stats might lead to higher home run rates and found an equation with an R squared of .47:

HR/BIP = .11 times the Flyball Rate (FB%) - .07 times the 
Groundball Rate (GB%) + .16 times the Strikeout Rate (K/AB)

Yes, to hit home runs, you have to hit flyballs and avoid groundballs. Line drives proved to be a non-factor in hitting home runs. You probably noticed that flyballs, which are a negative factor for BABIP, are a positive factor for hitting home runs. And strikeouts are a positive factor for both.
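
To make the flyball trade-off concrete, here is a minimal Python sketch that applies the coefficients from the BABIP and HR/BIP equations to a hypothetical batter who turns five percentage points of groundballs into flyballs. The helper name and the size of the shift are made up for illustration, and the calculation assumes the linear relationships hold with everything else unchanged.

```python
# Coefficients copied from the two regression equations in the text.
BABIP_COEF = {"ld": 0.52, "fb": -0.16, "k": 0.11}
HR_BIP_COEF = {"fb": 0.11, "gb": -0.07, "k": 0.16}


def effect_of_shift(coef, shift):
    """Change in the predicted rate for the given changes in input rates.

    Inputs that do not appear in an equation contribute nothing.
    """
    return sum(coef.get(name, 0.0) * delta for name, delta in shift.items())


# Hypothetical change: 5 points fewer groundballs, 5 points more flyballs.
shift = {"gb": -0.05, "fb": 0.05}
print(f"BABIP change:  {effect_of_shift(BABIP_COEF, shift):+.3f}")
print(f"HR/BIP change: {effect_of_shift(HR_BIP_COEF, shift):+.3f}")
```

Under these equations the same swing change costs about eight points of BABIP while adding about nine points of home run rate per ball in play.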

Of course, I’m looking at rates of batted balls. As I said, the problem with strikeouts is that the batter doesn’t bat the ball at all. So let me present one final regression result: the stats that most impact the rate of Home Runs Hit per At Bat (adjusted for ballpark):

HR/AB = .11 times the Flyball Rate (FB%) - .09 times BABIP + .11 times the 
Walk Rate (BB/PA) + .06 times the Strikeout Rate (K/AB)

The R squared of this regression was .48. And here you can clearly see that the BABIP rate is negatively associated with the Home Run Rate. Now, there may be a mathematical phenomenon happening here; after all, BABIP purposely leaves home runs out of the equation, and this may impact the regression results in some way I didn’t capture. But this equation seems to cinch it: BABIP and HR rates are negatively correlated.

You probably also noticed that both walks and strikeouts are positively correlated with hitting home runs. But the impact of these two stats isn’t very strong. If you leave them out of the regression analysis, you still achieve an R squared of .42.

One last regression I ran, by the way, was to see if I could find the factors that most drove home runs hit as a percent of flyballs (again, adjusted for ballpark). My model achieved an R squared of .36 and essentially included two factors: flyball rate and strikeout rate. Turns out that the more you hit flyballs, the more likely it is that the proportion of flyballs you hit for home runs will increase! And, once again, the more you strike out, the more likely you are to hit proportionately more home runs.

Please let me remind you that this analysis includes only regular major league baseball players. If you took the strikeout factor to the extreme — that to hit home runs you should strike out all the time — then I would be the greatest slugger in major league history. I can assure you that’s not true. These equations describe relationships between the stats of established major league players, within a “normal” range of major league “behavior.” We can use these equations to establish baselines for current major league players, but that’s all we can do with them.

Here’s A Fun Stat

After forcing you to read all that mind-numbing regression analysis, I’d like to conclude with a list that actually may not mean a dang thing. It’s a list of each league’s leading batters in LD% plus HR/OF. These are the guys who are really nailing the ball when they hit it: hitting line drives (for BABIP) and hitting flyballs over the wall. And I’m going to call this stat “Force” because I’ve just been studying Newton’s Second Law of Motion with my son. I love helping my kids with their homework; I learn again all the stuff I’ve forgotten.
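
For anyone who wants to compute “Force” at home, here is a minimal Python sketch. It assumes LD% means line drives per ball in play (the LD/BIP used in the BABIP equation) and HR/OF means home runs per outfield flyball; the function name and the sample counts are hypothetical.

```python
def force(line_drives, balls_in_play, home_runs, outfield_flies):
    """Force = LD% + HR/OF, as described in the text.

    Assumes LD% is line drives per ball in play and HR/OF is
    home runs per outfield flyball.
    """
    ld_rate = line_drives / balls_in_play
    hr_per_of = home_runs / outfield_flies
    return ld_rate + hr_per_of


# Hypothetical batter: 20 line drives on 80 balls in play, 5 HR on 25 outfield flies.
print(round(force(20, 80, 5, 25), 3))
```

The same function works for pitchers by feeding in the batted balls they have allowed, in which case lower is better.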

Here are the leading batters (and pitchers, too) in “Force” in the American and National Leagues (minimum 50 plate appearances or 20 innings pitched).

American League

Batters                             Pitchers
Player          Team   Force        Player          Team   Force
Roberts B.      BAL     .640        Young C.        TEX     .137
Varitek J.      BOS     .539        Sabathia C.     CLE     .141
Morneau J.      MIN     .528        Garland J.      CHA     .154
Young D.        DET     .504        Rogers K.       TEX     .163
Belliard R.     CLE     .473        Johnson J.      DET     .176
Ramirez M.      BOS     .468        Bautista D.     KC      .183
Tejada M.       BAL     .447        Clement M.      BOS     .193
Sexson R.       SEA     .439        Bedard E.       BAL     .197
Jones J.        MIN     .431        Robertson N.    DET     .213
Rodriguez A.    NYA     .431        Kazmir S.       TB      .220


National League

Batters                             Pitchers
Player          Team   Force        Player          Team   Force
Branyan R.      MIL     .631        Martinez P.     NYN     .168
Pena W.         CIN     .563        Patterson J.    WAS     .172
Nady X.         SD      .559        Francis J.      COL     .182
Kearns A.       CIN     .532        Perez O.        LAN     .187
Floyd C.        NYN     .523        Redman M.       PIT     .189
Nevin P.        SD      .512        Chacon S.       COL     .194
Cabrera M.      FLA     .499        Marquis J.      STL     .213
Utley C.        PHI     .498        Rueter K.       SF      .224
Diaz V.         NYN     .487        Wright J.       COL     .237
Klesko R.       SD      .481        Loaiza E.       WAS     .237

Interestingly, a number of pitchers on the pitcher list (Sabathia, Patterson, Garland, Robertson and Rogers) are also among those most outperforming their xFIP. Now we see why: batters just aren’t hitting the ball hard against them.

In theory, pitchers don’t have a lot of control over line drives and home run rates, so we shouldn’t expect these guys to maintain their pace. Let’s watch them the rest of the year to see how they do.

References & Resources
Thanks to JC of the Sabernomics blog for his advice.


Dave Studeman was called a "national treasure" by Rob Neyer. Seriously. Follow his sporadic tweets @dastudes.
