April Musings
by Sal Baxamusa
April 30, 2007
Writing about baseball while baseball is being played is tough for me. I like large datasets and lots of statistics with tight error bars. April is about the worst time of year for that. Don't get me wrong: any day that baseball is being played is a good day. It's just that April is the worst month of the season to be trying to draw conclusions about what's happening.
A few weeks into the season, some player is always teasing us by being on pace for 120 home runs or 30 losses. Internet and print writers start talking about the demise of Mariano Rivera or the resurrection of Ramon Ortiz. The Yankees are in last place and the Pirates are above .500. Anything seems possible.
But by June the regression demon kills the sample size fairy and we're back to baseball as usual. Rivera is untouchable. Ortiz is giving up bombs at his usual rate. The Yankees snuggle into first place and the Pirates shack up in the cellar.
Quickly, before the dataset gets too large!
Before all of that happens, it's always fun to trip through the statistics pages and pick out a few wacky numbers. One of my favorites is FIP, or Fielding Independent Pitching, which strips pitching ability down to walks, strikeouts, and home runs and expresses the result as a number on the same scale as ERA. Over the course of a few starts, however, a pitcher might get some flukey-good or flukey-bad defense behind him and his ERA might be way lower or way higher than his FIP. By the end of the season FIP is usually within 1.20 of ERA for most starting pitchers. But in April...
Pitcher              FIP    ERA   FIP-ERA
Matt Cain           4.02   1.55      2.47
Orlando Hernandez   4.90   2.53      2.37
Dan Haren           3.75   1.41      2.34
Rich Hill           3.83   1.57      2.26
John Maine          3.93   1.71      2.23
Tom Glavine         5.24   3.07      2.17
...
David Bush          2.78   5.04     -2.26
Zach Duke           4.64   6.92     -2.28
Miguel Batista      4.92   7.54     -2.63
Clay Hensley        4.62   7.86     -3.24
Jae Seo             5.90   9.51     -3.61
(among qualified pitchers through April 27)
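The FIP formula itself isn't spelled out here, so what follows is a minimal sketch assuming the commonly cited form: 13 times home runs, plus 3 times walks, minus 2 times strikeouts, all divided by innings pitched, plus a league constant that puts the result on the ERA scale. The constant of 3.2, the function name, and the sample stat line are assumptions for illustration, not figures taken from the table.

```python
def fip(hr: int, bb: int, k: int, ip: float, constant: float = 3.2) -> float:
    """Fielding Independent Pitching, in its commonly cited form.

    `constant` is an assumed league adjustment (roughly 3.2) that shifts
    the result onto the ERA scale; it varies a little by league and year.
    `ip` is innings pitched as a true decimal (30 and 1/3 innings is
    30.333..., not the box-score notation 30.1).
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


# Hypothetical April stat line: 30 innings, 2 HR, 8 BB, 25 K.
april_fip = fip(hr=2, bb=8, k=25, ip=30.0)
print(f"FIP: {april_fip:.2f}")

# Gap against a hypothetical 1.50 ERA; a positive gap means the ERA
# is lower than the walks, strikeouts, and home runs would suggest.
print(f"FIP-ERA: {april_fip - 1.50:+.2f}")
```

Nothing about defense or stranded runners enters the calculation, which is why a few lucky or unlucky starts can pull ERA so far away from it.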
Unsurprisingly, the pitchers at the top of the list are among the league leaders in ERA. While all are outperforming their FIP by a decent margin, only Orlando Hernandez and Tom Glavine could be described as pitching poorly. Matt Cain, John Maine, Rich Hill, and El Duque are stranding an ungodly percentage of their baserunners (90% or more), which is a neat trick but hardly sustainable. They can keep pitching just as well and still expect their ERAs to rise. Excluding Glavine, all of the top pitchers are also getting a good amount of defensive support, with their fielders converting their balls in play to outs at a high rate.
Among the pitchers at the bottom of the list, only David Bush has been doing the things that a pitcher should do: lots of strikeouts, few walks, and fewer home runs. But the guys at the bottom of the list generally have poor defensive support and are not doing a good job of stranding runners on base. Obviously, just the opposite of the pitchers at the top of the list.
The big picture is that we're early in the season and unlikely to see this kind of variation at the end of the year. So far this year, the standard deviation of FIP-ERA is around 2.3 (for pitchers with >3.0 IP). For a full year, such as last year, that number is around 1.0. In other words, the pitchers on the above list are likely to find themselves tumbling (or climbing) toward FIP-ERA = 0, and their ERA will start to better reflect their ability. All of which is a roundabout way of saying: don't put too much stock in April numbers. But you knew that already, didn't you?
April showers bring?
Nobody really knows what April will bring.
Think for a minute about a team's Pythagorean record. At the end of the season, we talk about teams that over- or underperformed their Pythagorean record; we wring our hands over our favored squad's bad luck or wonder what would have happened if the opposition hadn't played so far under its predicted record. But simple statistics tell us that over the course of a 162-game season, a team that played one standard deviation away from Pythagoras will have a winning percentage +/- .040 away from its Pythagorean winning percentage. That's around +/- 6 wins, so a "true talent" 90-win team can easily look like an 84-win disappointment due to nothing more than...well, let's not say luck. Let's call it "random variation." So team record doesn't tell us much even at the end of a season when champagne has been poured and managers have been canned.
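The "simple statistics" are presumably the binomial standard deviation: if each game is treated as an independent coin flip won with probability p, the standard deviation of winning percentage over n games is the square root of p(1-p)/n. That model is an assumption on my part rather than something stated above, and the function name is only illustrative. A minimal sketch for a .500 team:

```python
import math


def win_pct_sd(p: float, games: int) -> float:
    """Standard deviation of winning percentage over `games` games,
    assuming independent games each won with probability `p`."""
    return math.sqrt(p * (1 - p) / games)


# A .500 "true talent" team over a full 162-game season.
sd_full = win_pct_sd(0.5, 162)
print(f"SD of winning percentage: {sd_full:.3f}")
print(f"SD in wins: {sd_full * 162:.1f}")
```

Under those assumptions the result is just under .040 in winning percentage, or a little more than six wins, in line with the figures quoted above.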
And in April? We basically know nothing.
We're about a sixth of the way through the season and most teams have played somewhere in the neighborhood of 25 games. Basic statistics tell us that after 25 games, a team that has played one standard deviation away from Pythagoras is at +/- .100 away from its Pythagorean winning percentage. A .500 team can look like a .600 juggernaut at the end of April. And this doesn't even account for flukey-good or flukey-bad performances.
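To put a number on how often a .500 team looks like a .600 juggernaut, one option is the exact binomial tail: the chance of winning at least 15 of 25 games when every game is a fair coin flip. The independent-coin-flip model and the function name are assumptions for illustration.

```python
from math import comb


def prob_at_least(wins: int, games: int, p: float = 0.5) -> float:
    """Probability of winning at least `wins` of `games` independent games,
    each won with probability `p` (binomial upper tail)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


# 15-10 is a .600 record after 25 games.
print(f"P(.600 or better after 25 games): {prob_at_least(15, 25):.3f}")
```

Under this model the answer is roughly one in five, so in a 30-team league a handful of truly average clubs should be expected to look like contenders at the end of April.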
Not to pick on the Brewers, but they make a convenient example. Their Pythagorean winning percentage (as of early Saturday evening) is .510, not all that shabby, but their actual winning percentage is .636. They're sitting pretty atop the NL Central, and of course nobody can take away the wins that are already in the books, but they could just as easily have 11 or 12 wins and considerably less early-season buzz.
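The Brewers example can be checked with a quick calculation. The sketch below assumes a 14-8 record, which matches a .636 winning percentage but is my inference rather than a stated figure, and it treats games as independent coin flips won at the Pythagorean rate.

```python
import math

# Assumed record: 14-8 is consistent with .636, but it is an inference.
wins, games = 14, 22
pythag_pct = 0.510  # Pythagorean winning percentage quoted in the text

# Wins a .510 team would be expected to have after 22 games.
expected_wins = pythag_pct * games

# Binomial standard deviation of the win total over 22 games.
sd_wins = math.sqrt(games * pythag_pct * (1 - pythag_pct))

# How many standard deviations the actual total sits above expectation.
z_score = (wins - expected_wins) / sd_wins

print(f"Expected wins: {expected_wins:.1f}")
print(f"Actual wins are {z_score:.1f} standard deviations above expectation")
```

With these inputs the expected total lands between 11 and 12 wins, and the actual record sits a bit more than one standard deviation above it, which is not a rare event over so few games.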
A marathon, not a sprint
Am I a wet blanket? I hope not. But the weather hasn't warmed up yet and I'm already getting ulcers and throwing my laptop across the living room. Consider this self-medication for my inability to stay calm when Mike Piazza, whose SLG is lower than what I had hoped his OBP would be, hits another weak grounder. If you're like me, a little perspective is a good thing this time of year.
Sal Baxamusa is a graduate student in chemical engineering.