THT Mailbag: Underway

Time sure does fly when there’s baseball on. Just in the past couple weeks, we’ve seen a no-hitter, four straight home runs and Mariano Rivera blow saves to Marco Scutaro and the Red Sox. Still, things seem mostly to be going to chalk so far. The Mets and Red Sox are off to hot starts, the two Central Divisions are all jumbled up and the AL West is very mediocre. We’ve had one contender decimated so far by injuries (the Yankees) and one somewhat surprising division (the NL West). And now that I’ve gotten off my lazy butt, we have another THT Mailbag too.

Hard Way to Make an Easy Living

I’m currently a college junior who is majoring in Accounting and Geography. But my true love is baseball and the statistics of it. I am currently lined up for an unpaid internship with the collegiate league Corvallis Knights. I also have the option of having a paid accounting internship for the summer. I would really like to get into the industry and become a Major League Baseball statistician, but I do not know what the likelihood is of that. If you could let me know what it is like, it would be much appreciated. I want to learn more about being a statistician, but I do not know where to ask.

– Rob M., Eugene, Oregon

John Beamer: I guess this question practically defines the word “dilemma.” You’ll ask 100 people and get 101 different answers. Anyway, here is my opinion.

Becoming an MLB statistician is not easy. For a start there are only 30 teams and a lot of people just like you would saw their left arm and leg off to work for one. This has a couple of implications. One, unless you are very very good they’ll pay you a pittance, and two, you have to be very very good to even get a look in.

I have a few other thoughts. First, becoming a statistician for a ballclub might not be as grand as it first seems. For a start you’ll be in the back office and will likely only have minimal influence on team strategy and decision making. Second, you’ll probably get paid a pittance, provided you can even find a good role. Third, job security isn’t guaranteed. If the front office changes you could be on the streets.

You’ll also need to ensure you are correctly qualified. A Geography and Accounting degree doesn’t yell statistics to me (admittedly Geography can have a statistical bent), but unless you are already an expert statistician consider a masters course or some other qualification. If you can’t stomach that then you need to buy some textbooks and bury your head in them and learn all the common techniques. A good starting point, since you are obviously an avid baseball fan, is to read some of the statistically leaning blogs such as Tango’s and Phil Birnbaum’s and try to repeat some of the studies on there. After you learn your way around try to devise questions that you want answering and use the techniques you know to solve them.

You’ll almost certainly need to establish a pedigree before being noticed by a major league team. You’ll have to start writing about statistics, either on your own blog or for stats-leaning publications (e.g., By the Numbers). If you do this, and do original, good work, then you may get lucky and get noticed. You never know.

Saying that, my advice would be to try to establish yourself in a more secure job. Accounting is a very well paying profession that will give you financial security (provided you qualify). You could then do sabermetric work on the side and perhaps it eventually becomes something bigger. I certainly wouldn’t jack in a career in Accounting, say, to follow a 1% shot of getting a job with an MLB team.

As to the internship that is tricky. My personal view is that as an intern in an Accounting firm you learn little … but it does look good on the resume. The baseball internship will no doubt be a lot more fun (you’ll do more interesting work, I’m sure). The question is whether doing the baseball internship harms the Accounting career. It’s a tricky one … but I don’t think it will that much. I’d go for the baseball internship, have some fun, learn some stats but be prepared to take up an Accountancy career and do your baseball stuff on the side.

Useless No Hitter Info

The morning after Mark Buehrle’s no-hitter against the Rangers, I was curious. According to the scorecards, what is the lowest number of batters faced in a no-hitter? Buehrle faced 26 batters on the scorecard, so who’s been lower? Also, has anyone ever pitched the same game as Mark: allowing one walk and then picking off that runner?

– Joe N.

John Walsh: Buehrle actually faced 27 batters; I think you are referring to the fact that there were 26 at-bats in the game. To pitch a full-length no-hitter, a pitcher has to record 27 outs, hence he must face at least 27 batters.

It’s possible to face fewer than 27 batters and throw a no-hitter, of course, if the game is shortened by rain. That has happened several times, most recently on the last day of last season, when Boston’s Devern Hansack (who?) threw a five-inning no-hitter against the Orioles. If Hansack never makes it back to the majors (he’s currently pitching for the Red Sox’ Triple-A team in Pawtucket), he’ll have the distinction of throwing a no-hitter in half of his major league appearances.

Regarding no-hitters just like Buehrle’s, I searched through the play-by-play data from Retrosheet (which go back 50 years) and I found a game that was very similar. On June 4, 1964 Sandy Koufax threw a no-hitter against the Phillies. The only baserunner was Dick Allen, who, after drawing a base on balls, was thrown out trying to steal second base. I also came across a game from 1922, in which the Giants’ Jesse Barnes also threw a no-hitter, gave up one walk and faced the minimum 27 batters. The Giants turned a double play in that one, to eliminate the one baserunner.

Another interesting game was Ernie Shore’s nearly perfect game. Shore, pitching for the mid-teens Red Sox, came on in relief after just one batter, who was walked. Shore was needed because the starter was ejected for fighting with the umpire after the walk. Shore calmly picked off the runner and then retired 26 batters in a row for the near-perfecto. The starting pitcher in the game? Babe Ruth, of course.

Richard Barbieri: In the spirit of John’s reply, it only seems fitting to mention A.J. Burnett’s no-hitter from 2001, which was almost the exact opposite. Like Buehrle and Koufax, Burnett’s opposition (the Padres) only recorded 26 at-bats. Unlike those games, however, Burnett allowed nine walks. Of course, walks aren’t official at-bats, so Burnett managed to keep his total low, throwing a no-hitter despite a WHIP of 1.00.
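For anyone who wants to check that WHIP figure, the arithmetic is just walks plus hits divided by innings pitched. Here is a minimal Python sketch, assuming Burnett went the full nine innings; the function name `whip` is ours for illustration, not something from a stats library.

```python
def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings


# Burnett's line as described above: nine walks and no hits.
# Nine innings is an assumption (a full-length game).
burnett_whip = whip(walks=9, hits=0, innings=9)
print(f"{burnett_whip:.2f}")
```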

On the Side

This week Joe Torre used Andy Pettitte out of the bullpen for an inning in place of his scheduled between-start side session. Many in the media may be right that it was a product of early desperation, but regardless of the motivation, why is this tactic used so rarely?

Assuming the starter has no injury issues and is not struggling, it seems to me that using him for an inning of relief in place of his side session would get more innings out of a team’s top pitchers (in place of their worst pitchers) without taxing the starter’s arm significantly more than their side session. Flexibility would still remain as the pitcher could be taken out at any point and could also throw a side session as usual if the game didn’t present a good situation for their use.

– Jason G.

Steve Treder: Well, Jason, all I can tell you is I wonder exactly the same thing. This tactic was in fact rather commonly seen in the 1950s and 1960s, but by the 1970s, it faded out and has never come back with any degree of regularity. Why?

I think it’s an illustration of the culture of hyper-specialization among pitchers that has prevailed since the 1970s, and only gotten stronger. The modern mode features, to my way of thinking, an extreme concern with strictly defined, highly predictable “roles.” The very apparent belief is that pitchers perform significantly better when they are allowed to do the same thing, over and over, with the greatest possible minimization of variation. My sense is that while there is some basis of truth to this notion, its importance is vastly overemphasized.

Thus we have the intense reluctance to use starting pitchers in even the most occasional relief appearance. When a starter is taking his regularly scheduled side session anyway, assuming he’s healthy and feeling good, why not have him take an inning if the game situation presents itself, or face a batter or two? What exactly is the downside?

I strongly suspect the true reason this practice remains effectively extinct is nothing more sophisticated than, “Because we just don’t do that.” In all realms of human behavior, lots of things are handled this way. We’re creatures of habit and custom, which serves the convenient purpose of forgiving us the requirement of thinking all the time.

Twin Killing

I was hoping that you could help settle a little debate amongst my friends. This started off as a discussion of the fact that Miguel Tejada is consistently amongst the league leaders in double plays. Is this more a function of how often he is in a situation where he can ground into a double play, or more a function of not producing in these situations?

Maybe it’s a function of being a shortstop in Baltimore … if memory serves correctly, I think that Cal Ripken Jr. was usually near the top of the leader board in this category as well. My guess is that the answer lies somewhere in the middle. Does a stat exist that measures how often a player is in a situation with a runner on first and fewer than two outs? Perhaps a GIDP/GIDP Opportunity ratio? Help us settle the debate.

David Gassko: Baseball Prospectus keeps track of some great statistics, including each player’s double play opportunities. Last year, among 143 players, Tejada was 130th in double play percentage, hitting into a double play in 18% of his double play opportunities (versus an average of around 13%). In 2005, he was 139th among 143 players. In 2004, he was 136th out of 154 players.

So why is Tejada so bad in double play situations? I think it mostly has to do with his propensity for hitting ground balls—last season, he was 13th in groundball percentage among qualified players. Tejada just isn’t fast enough to consistently run those ground balls out.

John Walsh: An additional comment: another important factor in a player’s double play tendencies is how often he puts the ball in play. Tejada doesn’t walk or strike out much, and in fact puts the ball in play in about 83% of his plate appearances, which is a high percentage. So, with lots of balls in play and a large fraction of them being ground balls, Miggy is bound to hit into more than his share of double plays.

There are many slow-footed sluggers who have just the opposite profile: They are better than average at avoiding the double play because they put the ball in play at a low rate and when they do hit the ball, it usually goes in the air. Perhaps surprisingly, Jim Thome, Barry Bonds, Jason Giambi, Ryan Howard and Nick Swisher were all better than average at staying out of the double play in 2006.

As for Ripken, he had relatively high GIDP totals for the simple reason that he played every game for all those years. He was just a little above average in double plays grounded into per opportunity.
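The difference between raw GIDP totals and GIDP per opportunity is easy to see with a little arithmetic. The Python sketch below uses two made-up season lines (the counts are hypothetical, invented purely for illustration, and are not Tejada’s or Ripken’s actual numbers): an everyday player can top a part-timer in total double plays while still having the lower rate.

```python
def gidp_rate(gidp, opportunities):
    """Share of double play opportunities that ended in a GIDP."""
    return gidp / opportunities


# Hypothetical season lines, invented for illustration only.
everyday = {"gidp": 24, "opportunities": 170}   # plays every game, many chances
part_time = {"gidp": 18, "opportunities": 100}  # fewer chances, worse rate

for name, line in (("everyday", everyday), ("part_time", part_time)):
    # Print the raw total next to the per-opportunity rate.
    print(name, line["gidp"], f"{gidp_rate(**line):.1%}")
```

Ranking by the second column rather than the first is what separates a player who is simply in the lineup a lot from one who is genuinely prone to the double play.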

Still Cold

I was wondering about the correlation between runs per game and time of the season. It seems to me that teams score more runs as the season progresses, but I can’t seem to find any data to back this up. Can you help?

– Robby W.

David Gassko: There’s always Retrosheet for that! What you’ll find is that more runs are scored in the warm weather months, especially July. Runs per game generally rise through July, and then fall from August to the end of the season.

John Walsh: To quantify what David said, here is a table of runs per game in each month, using stats from the last five years:

+-------+---------+-------+------------+
| month | innings | runs  | runs_per_9 |
+-------+---------+-------+------------+
| 03    |   292.3 |   167 |       5.14 |
| 04    | 32368.3 | 17264 |       4.80 |
| 05    | 37154.7 | 19438 |       4.71 |
| 06    | 35555.0 | 18988 |       4.81 |
| 07    | 35548.0 | 19183 |       4.86 |
| 08    | 37733.7 | 19901 |       4.75 |
| 09    | 36224.7 | 18965 |       4.71 |
| 10    |  1612.0 |   780 |       4.35 |
+-------+---------+-------+------------+

Setting aside the handful of games played in March, July does have the highest scoring rate, but it’s only 0.06 runs per nine innings more than April. And, in this sample, April is higher-scoring than August, which is unexpected. I think the message here is that the effects are very small and, even using five years of data, there is probably some amount of statistical fluctuation in these numbers.
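If you want to check the table’s last column yourself, it is simply nine times runs divided by innings. Here is a short Python sketch using the innings and runs copied from the table; it assumes that a trailing .3 or .7 in the innings column means one-third or two-thirds of an inning, which is our reading of the table rather than something stated in it. The helper names are ours.

```python
# Rows copied from the table: (month, innings as printed, runs).
ROWS = [
    ("03", "292.3", 167),
    ("04", "32368.3", 17264),
    ("05", "37154.7", 19438),
    ("06", "35555.0", 18988),
    ("07", "35548.0", 19183),
    ("08", "37733.7", 19901),
    ("09", "36224.7", 18965),
    ("10", "1612.0", 780),
]


def innings_to_float(text):
    """Assumption: .3 and .7 stand for one and two thirds of an inning."""
    whole, _, frac = text.partition(".")
    thirds = {"": 0, "0": 0, "3": 1, "7": 2}[frac]
    return int(whole) + thirds / 3


def runs_per_9(runs, innings):
    """Runs scored per nine innings."""
    return 9 * runs / innings


rates = {
    month: runs_per_9(runs, innings_to_float(innings))
    for month, innings, runs in ROWS
}

for month, rate in rates.items():
    print(month, f"{rate:.2f}")

# Gap between July and April, the comparison made in the text.
print("July minus April:", f"{rates['07'] - rates['04']:.2f}")
```

Note how few innings sit behind the March and October rows compared with the full months, which is why those two rates wander furthest from the rest.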

