The Hardball Times

How Do You Like Your Data?

by Dave Studeman
January 12, 2006

Outfield Flies and Striking Out

Last week, I posted an article with some pictures of batted ball data that sparked a lot of interest and a number of questions. One question was about the graph showing the likely value of a batter's outfield fly compared to how often he strikes out. The graph looked like this:

The question regarded the position of batters on the graph. Obviously, the guys far above the line are great batters, but what about the other guys? What should we think of Russell Branyan and Mark Bellhorn? I don't know the complete answer to that question, but there is a formula implied by the line in the graph that can help. It's...

Outfield Run Value = -.146 + 1.2*Strikeout Rate

So if a batter never strikes out, his average outfield fly will yield .146 runs fewer than the average plate appearance. If he strikes out 10% of the time, it will yield .026 runs fewer (-.146 plus 1.2*.10). Going from 0% to 10% will increase the value of his outfield flies, but he'll hit 9% fewer of them. At the extreme, if he strikes out 100% of the time, nearly every outfield fly will theoretically be a home run. But he'll never hit one.
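A quick way to check the arithmetic is a few lines of Python. This is only a sketch: the function name `outfield_run_value` is made up for illustration, and the coefficients are simply the ones in the formula above.

```python
def outfield_run_value(strikeout_rate):
    """Run value of an average outfield fly, relative to an average
    plate appearance, using the line fitted in the graph."""
    return -0.146 + 1.2 * strikeout_rate


# Evaluate the line at the three strikeout rates discussed in the text.
for rate in (0.0, 0.10, 1.0):
    value = outfield_run_value(rate)
    print(f"K rate {rate:.0%}: {value:+.3f} runs per outfield fly")
```

The three lines printed correspond to the never-strikes-out batter, the 10% batter and the theoretical 100% batter.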

So what to think? Well, there is actually a tradeoff in this equation, with a sweet spot at which an average player maximizes his value by balancing stronger outfield flies against more strikeouts, and a break-even point beyond which the strikeouts cost more than the flies give back. I'll leave the details in a footnote and graph the results here.

The "Relative Run Value" is the combined impact of harder-hit fly balls and more strikeouts, starting at zero when a batter never strikes out. At first, the good from the fly balls offsets the bad from the strikeouts. The maximum impact is between the 10% and 15% strikeout rate, and then the batter's overall value starts to decline. Once a batter strikes out 25% of the time, it's like not striking out at all. As the strikeout rate goes over 25%, the batter's value goes down fast.

Now, this is all theoretical, for an average batter. The graph only includes outfield flies and strikeouts (not line drives, walks and other stuff), and it ignores the realities of what makes an individual batter's style work. So take what I'm saying here with a large grain of salt. But when a player is striking out more than 25% of the time, and he's not above the line on the graph, he may not be helping your offense very much.

Presenting the Data

I should have mentioned that our batted-ball data is provided by Baseball Info Solutions, our baseball stats partner, who track every play of every game and enter the results in great big computers. All of our stats and much of our analysis are possible only because of their great data and hard work.

I bring this up because I get a lot of requests to see more of the batted-ball data. I'd like to comply, but there are a couple of issues. First, the data is very complex and it's not easy to present it effectively. Second, the data costs money because BIS incurs a lot of expense to collect it. We recoup our costs if enough people buy the Annual and click on the advertisements on the left. If that doesn't happen, we probably can't continue to buy the stats. Not a threat, just a fact.

But let's not worry about that second thing right now. Let's talk data tables. I spent the last week developing a data format for individual players that hopefully works: lots of data, but easy to understand and easy to pick out trends. I'm going to present the stats for a few pitchers so you can tell me what you think of the format. First up is Pedro:

Martinez, Pedro

             Net Runs per Ball    Percent of Batted Balls % of OF % of PA             Total Net Runs
Year           OF      LD      GB     OF%     LD%     GB%      HR       K      OF      LD      GB     NIP    Runs
2002        -0.07    0.39   -0.11     31%     22%     42%      7%     30%   -10.3    41.4   -22.4   -51.4   -47.1
2003        -0.05    0.38   -0.13     31%     22%     41%      4%     28%    -8.1    40.9   -25.8   -41.8   -42.7
2004         0.06    0.35   -0.09     37%     19%     38%     11%     25%    13.3    38.7   -21.0   -41.2   -20.6
2005         0.02    0.34   -0.15     39%     17%     38%      8%     25%     3.4    32.5   -32.0   -44.1   -49.3
Avg.         0.01    0.36   -0.12     36%     19%     39%      8%     26%     1.4    37.6   -26.5   -42.7   -39.5
MLB Avg.     0.04    0.36   -0.10     31%     21%     44%     11%     17%

The first three columns list the average relative run value of the three main types of batted balls (outfield flies, line drives and ground balls) given up by Martinez. You can see a real difference in his performance between 2002/2003 and 2004/2005; he's given up more on outfield flies but less on line drives. In the first two years, his outfield flies were clearly below major league average (the last line), but he's become more average over the last two years. (This data is adjusted for ballpark, by the way.)

The next three columns show how often each type of batted ball was hit off the pitcher. You can see that Martinez's outfield fly rate has really jumped while his line drive rate has declined. In the next two columns, you see that his home run rate has increased (which explains the increase in outfield fly values) and his strikeout rate has dropped. Finally, the right-hand columns sum the impact of each type of batted ball (value times rate) and add the impact of all balls not put in play (NIP: strikeouts, walks and HBPs). The final column is the sum impact of all of this, including all plate appearances.

Measured this way, 2005 was actually Pedro's best year (lowest net run value). He kept his outfield flies under control, got a lot of good impact from line drives and ground balls and didn't walk many batters (good NIP total). You'd rather have the Pedro of old, but this data shows how he has managed to stay successful while adapting to old baseball age.

Following are the stats of four more pretty good pitchers. Let me know if the stats "speak to you."

Santana, Johan

             Net Runs per Ball    Percent of Batted Balls % of OF % of PA             Total Net Runs
Year           OF      LD      GB     OF%     LD%     GB%      HR       K      OF      LD      GB     NIP    Runs
2002        -0.06    0.41   -0.05     39%     20%     33%      6%     30%    -6.0    21.5    -4.0   -24.1   -18.4
2003         0.00    0.30   -0.11     41%     24%     28%      9%     26%     0.7    30.6   -12.7   -33.5   -20.8
2004         0.03    0.30   -0.12     37%     16%     41%     11%     30%     5.3    27.0   -26.5   -56.9   -60.4
2005         0.03    0.35   -0.10     37%     18%     39%      9%     26%     7.0    37.0   -24.1   -54.3   -46.1
Avg.         0.02    0.32   -0.11     38%     19%     36%     10%     28%     4.1    32.2   -20.7   -47.8   -40.9
MLB Avg.     0.04    0.36   -0.10     31%     21%     44%     11%     17%

What's not to like about Johan? Look how much he dominates the plate; his NIP net run values the last two years have been great.

Clemens, Roger

             Net Runs per Ball    Percent of Batted Balls % of OF % of PA             Total Net Runs
Year           OF      LD      GB     OF%     LD%     GB%      HR       K      OF      LD      GB     NIP    Runs
2002         0.07    0.37   -0.09     31%     22%     43%     12%     25%    10.0    40.3   -19.3   -33.6    -6.7
2003         0.03    0.35   -0.08     30%     22%     44%     12%     22%     5.0    46.3   -22.7   -35.2   -13.2
2004         0.02    0.35   -0.11     30%     18%     49%      8%     25%     4.0    33.8   -31.0   -36.6   -37.2
2005        -0.08    0.29   -0.14     27%     21%     49%      6%     22%   -13.2    34.7   -39.3   -33.0   -56.5
Avg.        -0.01    0.33   -0.11     29%     20%     47%      8%     23%    -1.9    38.2   -31.0   -35.1   -36.4
MLB Avg.     0.04    0.36   -0.10     31%     21%     44%     11%     17%

Check out Roger's 2005. What's telling is that his NIP Runs stayed even with his 2004 levels. 'Twas the batted balls that made the difference.

Burnett, A.J.

             Net Runs per Ball    Percent of Batted Balls % of OF % of PA             Total Net Runs
Year           OF      LD      GB     OF%     LD%     GB%      HR       K      OF      LD      GB     NIP    Runs
2002        -0.07    0.37   -0.11     33%     20%     43%      6%     24%   -11.7    38.8   -24.7   -27.9   -33.1
2003         0.07    0.37   -0.09     29%     16%     51%     12%     20%     1.2     3.7    -2.8     0.1     1.0
2004         0.02    0.33   -0.07     29%     17%     50%      9%     23%     1.4    18.0   -12.0   -19.5   -17.5
2005        -0.03    0.35   -0.09     20%     19%     58%      5%     23%    -3.1    38.8   -30.9   -30.5   -29.7
Avg.         0.02    0.35   -0.08     26%     17%     53%      9%     22%     1.5    19.5   -14.4   -12.3   -10.2
MLB Avg.     0.04    0.36   -0.10     31%     21%     44%     11%     17%

Blue Jays fans are probably interested in this one. Check out that increase in groundball rates. And isn't it interesting that, for all of these pitchers, their NIP Runs tend to equal all of their runs? It looks as though any significant variance regresses to the mean. Hmm. Sounds like another article...
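To put rough numbers on that NIP observation, here is a small Python sketch that compares the NIP and Runs figures from the "Avg." row of each of the four tables above. The figures are copied as printed; the dictionary and function names are invented for the example, and this is a quick eyeball check, not a statistical test.

```python
# (NIP, Runs) taken from the "Avg." row of each pitcher's table, as printed.
avg_rows = {
    "Martinez": (-42.7, -39.5),
    "Santana": (-47.8, -40.9),
    "Clemens": (-35.1, -36.4),
    "Burnett": (-12.3, -10.2),
}


def nip_gap(nip_runs, total_runs):
    """Total net runs minus NIP runs, i.e. the part of the total
    that is not explained by balls not put in play."""
    return total_runs - nip_runs


gaps = {
    name: round(nip_gap(nip, runs), 1)
    for name, (nip, runs) in avg_rows.items()
}

for name, gap in gaps.items():
    print(f"{name:9s} Runs minus NIP: {gap:+.1f}")
```

A gap near zero means the pitcher's total is close to his NIP figure alone.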

Colon, Bartolo

             Net Runs per Ball    Percent of Batted Balls % of OF % of PA             Total Net Runs
Year           OF      LD      GB     OF%     LD%     GB%      HR       K      OF      LD      GB     NIP    Runs
2002        -0.05    0.32   -0.07     30%     20%     46%     16%     15%   -10.8    45.8   -23.8   -20.8   -21.0
2003        -0.03    0.31   -0.14     36%     22%     37%     11%     18%    -6.5    50.1   -36.4   -27.6   -30.5
2004         0.12    0.33   -0.07     41%     17%     39%     14%     18%    31.1    35.6   -18.4   -22.8    17.2
2005         0.05    0.33   -0.12     35%     19%     41%     11%     17%    11.2    44.2   -34.4   -31.0   -19.1
Avg.         0.05    0.32   -0.11     37%     19%     39%     12%     18%    11.9    43.2   -29.5   -27.5   -11.0
MLB Avg.     0.04    0.36   -0.10     31%     21%     44%     11%     17%

Colon somehow keeps his line drives in check, but he's only average in home run and strikeout rates.

So what do you think? I've posted an entry in my baseball graphs blog for you to post comments about the stats format. If you've got feedback, please let me have it!

References and Resources
Here's the specific math I used to calculate the flyball/strikeout breakeven point. The key was that I held all other rates steady at the major league average:
- Walks in 10% of plate appearances
- Plate appearances minus walks and strikeouts equal Balls in Play
- I assumed that each batted ball type (outfield flies, groundballs, etc.) makes up a constant percentage of all balls in play. The rate for outfield flies is 31%.
- I also assumed constant run values for other types of batted balls.
- As I increased the strikeout rate, I assumed the value of a flyball would increase as outlined in the formula. Then I multiplied that value by the number of flyballs (which decreases proportionately as strikeouts increase) and subtracted the increased negative value of strikeouts.
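
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the exact math behind the graph: the footnote does not give the run value of a strikeout or of the other batted ball types, so the sketch lumps them into one hypothetical constant, `NET_K_COST`, set to 0.287 runs per strikeout so that the curve returns to zero near a 25% strikeout rate, as described in the article. All names are invented for the example.

```python
WALK_RATE = 0.10    # walks per plate appearance (from the footnote)
OF_SHARE = 0.31     # outfield flies as a share of balls in play (from the footnote)
NET_K_COST = 0.287  # ASSUMPTION: net runs lost per strikeout; not in the article


def outfield_run_value(k_rate):
    """Run value of an average outfield fly, from the article's fitted line."""
    return -0.146 + 1.2 * k_rate


def outfield_flies_per_pa(k_rate):
    """Outfield flies per plate appearance: balls in play are plate
    appearances minus walks and strikeouts, and 31% of them are flies."""
    return OF_SHARE * (1.0 - WALK_RATE - k_rate)


def relative_run_value(k_rate):
    """Runs per plate appearance relative to a batter who never strikes out."""
    fly_runs = outfield_flies_per_pa(k_rate) * outfield_run_value(k_rate)
    baseline = outfield_flies_per_pa(0.0) * outfield_run_value(0.0)
    return (fly_runs - baseline) - NET_K_COST * k_rate


# Trace the curve in 5-point steps of strikeout rate.
for pct in range(0, 45, 5):
    k = pct / 100
    print(f"K rate {pct:2d}%: {relative_run_value(k):+.4f} runs per PA")
```

With that assumed strikeout cost, the curve peaks between the 10% and 15% strikeout rates and crosses back through zero at roughly 25%, which matches the shape described in the article. A different value of `NET_K_COST` would move both points.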