Platooning: the meaning of mean (Part 1)
by Bojan Koprivica
January 29, 2013
You know that feeling when you are all excited and motivated and you throw yourself into a big project, only to stumble on the very first step? Like when you invite a dozen friends to a self-made six-course dinner and you suddenly realize you have no clue whether you have to boil the eggs for three, five or eight minutes? And you haven't even started with the complicated dishes? And then you are not so excited and motivated anymore?
I actually wrote a little article a few days ago, thinking I'd take it easy in my return from the dead, strictly sabermetrically speaking. It was about platoons, it was not all that long, definitely not groundbreaking, and actually made some sense.
But then the devil in me started speaking, saying it would be a truly beneficial exercise, both for me and for THT readers, to explain how to do the platoon skill regression from scratch. Doing things from scratch is my favorite, as my wife and her wobbly desk can attest. Sure, Tango and his team taught us what numbers to use, but it would be much more fun doing the whole thing over again, plugging in the recent numbers and explaining everything step by step.
Or, so I thought, until I came to the very first step, my sabermetrically boiled eggs, if you will. To estimate the true talent, we have to regress to the mean. But what does mean mean?
Imagine there were only three hitters in the whole league, all three right-handed, and that they hit like this:
And now tell me, what is the league average platoon split for right-handers? The answer is: It's 0.062. Or 0.052. Or 0.047. Or 0.057.
With other stats this is a fairly simple question. You want the league average ERA? Well, sum the earned runs, divide by the sum of the innings pitched and multiply by nine for good measure. But here? I can actually offer four different answers:
1. Averaging the sum
We treat the three players as one. So we consider the three sets of data as parts of one big set, in which the league had 400 plate appearances against lefties and 900 against right-handers, and posted a wOBA of .376 against the former and .314 against the latter. We subtract the values and get a league average split of 0.062. Is that what you would do, too?
I don't know, just intuitively, this doesn't feel quite right, does it? It is the same approach we would take with ERA, but what we are trying to measure is how pairs of values compare for individual batters, and by mixing them all together we seem to muddy the picture. If you are not convinced that this is not the ideal path to pursue, consider the following, even simpler, scenario, where there are only two batters in the league:
We have one batter with a split of 0.060 and the other with a split of 0.040 and what we get as league average split is 0.071. Now convinced that our intuition was right?
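To see how pooling can push the league figure outside the range of the individual splits, here is a minimal sketch. The plate appearances and wOBA values are invented for illustration and are not the numbers from the two-batter table; only the individual splits of 0.060 and 0.040 are borrowed from the text.

```python
# Hypothetical two-batter league: (PA vs LHP, wOBA vs LHP, PA vs RHP, wOBA vs RHP).
# These values are made up for illustration only.
batters = {
    "A": (300, 0.400, 100, 0.340),  # individual split 0.060
    "B": (100, 0.320, 300, 0.280),  # individual split 0.040
}

def pooled_split(league):
    """Treat the league as one hitter: PA-weighted wOBA vs LHP minus vs RHP."""
    pa_l = sum(b[0] for b in league.values())
    pa_r = sum(b[2] for b in league.values())
    woba_l = sum(b[0] * b[1] for b in league.values()) / pa_l
    woba_r = sum(b[2] * b[3] for b in league.values()) / pa_r
    return woba_l - woba_r

individual = {name: b[1] - b[3] for name, b in batters.items()}
league_split = pooled_split(batters)
print({k: round(v, 3) for k, v in individual.items()}, round(league_split, 3))
```

In this made-up league the pooled split comes out larger than either batter's own split, simply because the better hitter took most of the PA against lefties and the weaker hitter most of the PA against righties.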
2. Averaging the individual splits
This seems better. At least, it has "individual" in the title, so it has to be better, I say. We average 0.100, 0.015 and 0.040, the splits our three players had, and we end up with 0.052 as a league average split. But this has an easy-to-notice flaw. Our first hitter's split carries the same weight as the other two, yet he batted fewer times, contributing less information to what we are trying to measure. We need to weight the splits.
3. Averaging weighted individual splits
We weight the first split by 350 (the number of total PA), the second one by 500 and the third one by 450. Doing that we get a third different result: 0.047. We have accounted for the fact that splits have to be measured on an individual level, and we have accounted for the fact that not every individual split carries the same weight. We did almost everything right, yet something still feels wrong.
Look at our second and third hitters again. We have put more weight on the second hitter's split because he played more. But he has actually given us less information about his hitting split, because almost all his plate appearances came against right-handed pitchers. Meanwhile, our third batter had fewer plate appearances, yet they were distributed more evenly. How can we account for that?
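The arithmetic behind the second and third answers can be checked in a few lines. This is only a sketch; the splits and PA totals are the ones quoted in the text, while the function and variable names are my own.

```python
splits = [0.100, 0.015, 0.040]   # observed splits quoted in the text
total_pa = [350, 500, 450]       # total PA quoted in the text

def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

simple = sum(splits) / len(splits)             # method 2: every batter counts equally
pa_weighted = weighted_mean(splits, total_pa)  # method 3: weight by total PA

print(f"{simple:.3f} {pa_weighted:.3f}")  # 0.052 0.047
```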
4. Averaging individual splits weighted by harmonic mean
What is harmonic mean, you say? This is harmonic mean:
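For reference, the standard textbook definition of the harmonic mean of n positive numbers (the symbols H, n and x_i are my notation, not the article's) is:

```latex
H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}
```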
We are looking at the harmonic mean of each batter's plate appearances against left-handers and against right-handers, because a harmonic mean is pulled toward the smallest (or, in our case of two elements, the smaller) of the elements. This way we can give more value to the PA that are more evenly spread, because they give us better information on the split itself. So, in the case of only two elements, the harmonic mean looks like this:
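Written out for two elements, here the PA versus left-handers and the PA versus right-handers (PA_L and PA_R are my labels), the standard formula is:

```latex
H = \frac{2}{\dfrac{1}{PA_L} + \dfrac{1}{PA_R}} = \frac{2 \cdot PA_L \cdot PA_R}{PA_L + PA_R}
```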
So, when we plug our values in it, we come up with the following:
You can think of it in this way. You get the weight for all the PA that you had against both LHP and RHP plus a discounted number for those you had against only one. So, our first and third batters both get 150 for the common one plus a portion of the unmatched ones. Our middle batter is weighted the least. He has only 50 PA in common between left and right, and the fact that he has batted an additional 400 times against right-handers doesn't bring him much, because it doesn't say much about the split. It means that our uncertainty about him is the greatest and he gets to represent our three-man league the least.
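A minimal sketch of the fourth method follows. One caveat: the per-batter PA pairs are inferred from the totals quoted in the text (400 PA versus left and 900 versus right for the league; 350, 500 and 450 total PA; 150, 50 and 150 PA "in common"), not copied from the original table. The harmonic mean is symmetric, so which side the extra PA fell on does not change the weights.

```python
# (PA vs LHP, PA vs RHP) per batter. Inferred from totals in the text,
# not taken from the original table.
pa = [(200, 150), (50, 450), (150, 300)]
splits = [0.100, 0.015, 0.040]   # observed splits quoted in the text

def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)

weights = [harmonic_mean(l, r) for l, r in pa]
league_split = sum(w * s for w, s in zip(weights, splits)) / sum(weights)

for (l, r), w in zip(pa, weights):
    common = min(l, r)  # PA matched on both sides
    # weight = common PA + a discounted share of the unmatched PA
    print(common, round(w - common, 1), round(w, 1))

print(f"{league_split:.3f}")  # 0.057
```

The loop prints each weight as the common PA plus a discounted remainder, which is the reading described above: the middle batter's 400 unmatched PA earn him only about 40 extra units of weight.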
Where and how to cut?
Now that we have decided how to calculate the mean, we need only to choose where we want to cut off our sample. Say we decide it is 200 plate appearances over the past five years. It gives us a decent number of players to work with and eliminates those who just barely put on a major league uniform.
But nothing is simple with platoons. Should it be 200 overall PA? Or at least 100 versus left and 100 versus right? That actually does make a difference, as there seem to be different observed levels of talent for the players who were more heavily platooned as opposed to those with more average PA splits.
I have made the arbitrary (aren't they all arbitrary?) cut at 200 PA for the harmonic mean of a player's plate appearances versus left-handers and right-handers. This way we were left with exactly 150 left-handed hitters and 264 right-handers. I have excluded pitchers from this group and look only at the hitters of dedicated handedness (a fancy way of saying that I omit switch-hitters).
The left-handers in our sample group have an observed platoon split of 9.8 percent (0.035 wOBA). The right-handers showed a split of 6.1 percent (0.023 wOBA).
First, a few words about the absolute numbers. In The Book, Andy came up with 0.017 for right-handed batters and 0.027 for lefties, basing his research on data from 2000-2004. We have used similar approaches (not because we are equally smart or versed in statistics, but because I was not too proud to ask for advice and because both Tango and Andy were nice enough to push me in the right direction). However, what you see here is a simplified version: Andy went to great lengths to squeeze the last possible drop of truth out of the platoon lemon, performing multiple iterations to properly weight the data.
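As a rough sketch, the cutoff described here could be applied like this. The function names and the sample PA lines are hypothetical; only the 200 PA harmonic-mean threshold comes from the text.

```python
def harmonic_mean(a, b):
    # Treated as 0 when either side is empty, so such players never qualify.
    return 2 * a * b / (a + b) if a > 0 and b > 0 else 0.0

def qualifies(pa_vs_l, pa_vs_r, cutoff=200):
    """True if the harmonic mean of PA vs LHP and PA vs RHP reaches the cutoff."""
    return harmonic_mean(pa_vs_l, pa_vs_r) >= cutoff

# Hypothetical PA lines
print(qualifies(150, 300))  # True: harmonic mean is exactly 200
print(qualifies(50, 450))   # False: 500 total PA, but harmonic mean is only 90
print(qualifies(100, 100))  # False: harmonic mean is 100
```

Note how a heavily platooned player with 500 total PA misses the cut, while a player with 450 more evenly spread PA makes it.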
Matt Klaassen has done some very good work on platoon splits in recent years, and my numbers are rather close to his latest overview. Regardless of the exact evaluation method, I think it is safe to say that platoon splits have increased since The Book was written.
As for the fact that the left-handers consistently have higher splits than right-handers, the explanation is rather simple. There are two truths about hitting in baseball. One, it is easier to hit opposite-hand pitching than same-hand pitching. And two, it is easier to hit right-handed pitching than left-handed pitching.
The reason for the latter is negative frequency-dependent perceptual advantage. If you prefer your explanations in English, it means that left-handed opponents (in any sport) are harder to decipher: our brains are wired to expect right-handers' movements, because we face those much more often. Thus, while right-handed batters have one piece of good and one piece of bad news, regardless of the handedness of the pitcher they are facing, for left-handed batters it is either all good or all bad, as left-handed pitchers are also the same-side pitchers for them.
Finally, you will notice that I list the values both as the absolute delta in wOBA points and as a percentage (the difference divided by the higher of the two wOBA values). I think that industry consensus is that the former is easier to use and the latter better, as better players have larger splits.
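The two ways of expressing a split can be sketched as follows. The wOBA values in the example are hypothetical and the function names are mine.

```python
def split_points(woba_a, woba_b):
    """Absolute split in wOBA points."""
    return abs(woba_a - woba_b)

def split_percent(woba_a, woba_b):
    """Split as a fraction of the higher of the two wOBA values."""
    return abs(woba_a - woba_b) / max(woba_a, woba_b)

# Hypothetical batter: .350 wOBA on his strong side, .315 on his weak side
points = split_points(0.350, 0.315)
percent = split_percent(0.350, 0.315)
print(f"{points:.3f} {percent:.1%}")  # 0.035 10.0%
```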
Now that we know the league average production, we can start figuring out how much each batter's split should be regressed to most accurately estimate his true talent level.
To do that, I have taken all the above batters and split their production into even days (calendar days ending with 0, 2, 4, 6 and 8) and odd days (calendar days ending with 1, 3, 5, 7 and 9). I then looked at how the buckets correlate. Starting with left-handers, we have r=0.13 with N(HM) = 320. If you prefer it spelled out, that means that the two buckets show a correlation coefficient of 0.13, with an average sample size of 320. That sample size is a harmonic mean of PA versus left and versus right.
Now, we could stick with harmonic means throughout the regression and that would probably be the more precise way to go. That means you would have to calculate the harmonic mean of the player's plate appearances and then regress that number against the fixed number of league average performance. It is probably easier for the reader to do the regression based on PA versus left, as is customary, so here is the overview of all the ways one could go about it:
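Here is a rough sketch of the even/odd procedure. Two things are assumptions on my part: the sample splits are invented, and the last function uses a commonly cited rule of thumb (regression amount = N × (1 − r) / r) that the article does not spell out, so its output should not be expected to match the article's own figures exactly.

```python
from datetime import date
from math import sqrt

def bucket(game_date):
    """'even' for calendar days ending in 0, 2, 4, 6, 8; otherwise 'odd'."""
    return "even" if game_date.day % 2 == 0 else "odd"

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def regression_pa(r, n):
    """Assumed rule of thumb: league-average PA to add = n * (1 - r) / r."""
    return n * (1 - r) / r

# Hypothetical per-batter splits in the even-day and odd-day buckets
even_splits = [0.050, 0.010, 0.030, 0.045, 0.020]
odd_splits = [0.020, 0.030, 0.040, 0.015, 0.035]
r = pearson_r(even_splits, odd_splits)
print(bucket(date(2012, 4, 10)), bucket(date(2012, 5, 31)), round(r, 2))

# With the rounded left-hander figures from the text (r = 0.13, N = 320)
print(round(regression_pa(0.13, 320)))  # 2142
```

The weaker the correlation between the two buckets, the more league-average PA gets added, which is why a correlation as low as 0.13 calls for such heavy regression.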
A few thoughts and explanations about those numbers:
- There was a higher correlation coefficient for right-handed batters when I used splits as percentages rather than absolute values.
- The N(L) is calculated using the league average percentage of PA versus left-handed pitchers (29 percent for RHB and 24 percent for LHB). So, for example, a right-handed hitter with a harmonic mean of 325 and a league-average distribution of PA will have 230 PA versus left and 563 PA versus right.
- The two different numbers of PA to regress with depend on whether you count only PA versus left or you use the harmonic mean.
- If I were to give advice on how to regress simply and quickly based on these numbers, I'd say: Always regress with 1,500 PA versus left, regardless of the batter handedness, and use 0.035 wOBA for left-handers and 0.023 for right-handers.
- If I were to offer another piece of advice (and you should really not take too many of those from me), I'd say to regress with either 2,260 or 1,670 HM PA and use 9.8 percent and 6.1 percent if you want the best results.
- The numbers I come up with are different from those we normally use (from The Book, for example). The reason is most probably the newer data, although methodology could definitely play a role in it, too.
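The quick-and-simple advice in the list above can be sketched as follows. I am assuming the usual form of regression to the mean (adding a fixed number of league-average PA to the observed sample); the function name and the example batter are hypothetical, while the 1,500 PA and 0.035 wOBA figures come from the text.

```python
def regress_split(observed_split, pa, league_split, regress_pa):
    """Shrink an observed split toward the league mean by mixing in
    regress_pa plate appearances of league-average split."""
    return (observed_split * pa + league_split * regress_pa) / (pa + regress_pa)

# Hypothetical left-handed batter: observed split of 0.080 wOBA
# over 500 PA versus left-handed pitchers.
estimate = regress_split(0.080, 500, league_split=0.035, regress_pa=1500)
print(f"{estimate:.3f}")  # 0.046
```

With only 500 PA versus left, the estimate stays much closer to the league's 0.035 than to the batter's observed 0.080.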
(For full disclosure, my underlying data covers 98-99 percent of all PA during the five-year period. I use my own program for parsing Retrosheet data, and I parse quite a lot of data at the same time, like runner advances, placement of the batted ball and so on. In about 20-30 random games every year, I get parsing errors, sometimes because there are inconsistencies in the way the data is recorded, sometimes because my parser doesn't know how to handle a certain rare occurrence. For example: a bunt single that moved the runner from second to third, but he then got caught in a rundown between third and home, while the batter-runner advanced to second on a fielder's choice, and then home on a throwing error. If my parsing program encounters an event it can't handle, it automatically ignores the game and puts it in my "to do bucket" that I periodically go back to, both to clean the files and to improve my algorithm. I have no reason to believe that my samples are biased by that in any way, but if you do, please scream. I also use the same wOBA formula for every year.)
Finally, to get a feel for how much difference there is when using different regressions, look at these four hitters. We have two left-handers and two right-handers, an experienced batter and a novice one in each group, all with pronounced observed splits.
The differences are not world-shattering, with 0.011 wOBA being the biggest delta among the four ways to regress. I think it is important that you do regress and it is important to understand why we need to do it. Which manner you use is up to you. For what it's worth, I will use the last column in the table above for the next articles on this topic (unless someone finds glaring holes in the way I went about my business), simply because it's mine and I'm like that when it comes to using things I made with my own hands. If someone wants the underlying data to run an analysis of their own, drop me a line.
The biggest thing, though, is that this is not at all about which numbers will yield the best results. It is much more about the journey, about trying to see what our obstacles are while we try to understand baseball better. It's a look under the sabermetric hood, if you will.
After playing, coaching and umpiring more than 500 games all over Europe, Bojan realized that it's actually writing about baseball that can be most easily done while holding a beer in a hand. If you want to discuss either baseball or beer with him, drop him a line.