Over the offseason, I decided to figure out how to make a baseball simulator, and not just any sim, but one that anyone could easily use. To my surprise, I actually did. You can find the result at baseball-sim.com.
The purpose of this article is not to promote my sim. Rather, creating the sim taught me a lot, both about the process of simulating baseball games and about the results of the simulations I ran myself, and I’d like to share those lessons.
Lesson #1: Baseball is simple
Ever since I started following sabermetrics, I have been fascinated by the discrete, state-based nature of baseball. That is, at any moment in between plays during a baseball game, the state of the game can be described with a small number of variables, most with just a few possible values. On a play-by-play level, those variables are: inning (and half), outs, baserunners, and score. Of course, other variables are involved, but at the simplest level, a baseball game can be reasonably described with these four, and all of them except score have fewer than 10 possible values.
I’m fascinated by this quality of the sport because it makes baseball simple and easy to describe, and things that are simple and easy to describe can be easily measured and simulated. With regard to measurement, I have always loved statistics like RE24, which essentially sums the change in run expectancy between states (plus any runs scored) over a player’s plate appearances. But, of course, the aspect we care about today is simulation.
Most of you have likely heard of the concept of a baseball sim, either through games like Out of the Park Baseball or Strat-O-Matic, or through custom simulators built by sabermetricians like MGL or Xeifrank. Those are all very complex, thorough simulators, but the great thing about the simple state-based nature of baseball is that a simulator does not need to be complex. All we need to do to create a simple sim is to keep track of the four variables I mentioned above, adjusting their values using whatever probabilities we want and a few simple rules about when innings transition, runners advance, and runs score. The basic process of creating this simple sim is as follows:
- Determine the probabilities of the main events occurring in a single plate appearance: single, double, triple, home run, walk, strikeout, and batted ball out.
- Determine rules for how runners advance given the event that occurred. In the simplest version, runners can advance the same way every time a given event occurs, or you can assign probabilities of going first to third on a single, etc. Also set up rules for when runs score: since every batter either makes an out, stays on base, or scores, the runs on a play equal outs + runners + 1 before the play minus outs + runners after the play.
- Start a game with the inning at 1, outs at 0, bases at 0, score at 0-0.
- Choose an event at random, weighted by the probabilities from the first step. Change the outs, bases, and score based on the rules specified above. Rinse and repeat until there are three outs.
- When there are three outs, start over, this time assigning runs to the other team. Rinse and repeat until there are three outs.
- When there are three outs, increase the inning by one, and start over. Rinse and repeat until 8.5 or 9 innings have been completed (8.5 if the home team leads after the top of the ninth).
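The steps above fit in a few dozen lines of code. What follows is a minimal Python sketch, not the code behind baseball-sim.com: the event probabilities in `EVENT_PROBS` are made-up numbers, every runner advances the same number of bases as the batter on a hit, outs never advance runners, and there are no extra innings or walk-off endings (ties are simply left as ties). Runs on each play come from the outs + runners + 1 rule.

```python
import random

# Hypothetical per-PA event probabilities (illustrative, not real data; sums to 1).
EVENT_PROBS = {
    "1B": 0.150, "2B": 0.045, "3B": 0.005, "HR": 0.030,
    "BB": 0.085, "K": 0.200, "OUT": 0.485,
}

# Simplest advancement rule: on a hit, everyone moves up as many bases as the batter.
BASES_ADVANCED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def apply_event(event, outs, bases):
    """Return (outs, bases, runs) after one plate appearance.

    `bases` is a tuple of three booleans: (runner on 1st, on 2nd, on 3rd).
    """
    runners_before = sum(bases)
    new_outs, new_bases = outs, bases
    if event in ("K", "OUT"):
        new_outs = outs + 1  # batter is out, runners hold
    elif event == "BB":
        first, second, third = bases
        # Only forced runners advance on a walk.
        new_bases = (True, first or second, third or (first and second))
    else:
        n = BASES_ADVANCED[event]
        # Everyone on base plus the batter (position 0 = home plate).
        positions = [b for b, on in zip((1, 2, 3), bases) if on] + [0]
        new_bases = tuple(any(p + n == b for p in positions) for b in (1, 2, 3))
    # Each batter makes an out, stays on base, or scores, so runs on the play are
    # (outs + runners + 1) before minus (outs + runners) after.
    runs = (outs + runners_before + 1) - (new_outs + sum(new_bases))
    return new_outs, new_bases, runs


def half_inning(rng):
    """Simulate one half-inning and return the runs scored."""
    outs, bases, runs = 0, (False, False, False), 0
    events, weights = list(EVENT_PROBS), list(EVENT_PROBS.values())
    while outs < 3:
        event = rng.choices(events, weights=weights)[0]
        outs, bases, scored = apply_event(event, outs, bases)
        runs += scored
    return runs


def simulate_game(rng):
    """Play nine innings and return (away_runs, home_runs)."""
    away = home = 0
    for inning in range(1, 10):
        away += half_inning(rng)
        if inning == 9 and home > away:
            break  # home team already leads: skip the bottom of the 9th
        home += half_inning(rng)
    return away, home


if __name__ == "__main__":
    rng = random.Random(2014)
    games = 5_000
    total = sum(sum(simulate_game(rng)) for _ in range(games))
    print(f"average runs per team per game: {total / (2 * games):.2f}")
```

Replacing the single `EVENT_PROBS` table with one table per lineup spot is the natural next step, which is where the rest of this article picks up.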
I took some shortcuts above, but the process of simulating baseball, at its simplest level, is, well, simple. Because of the small number of variables and the easy-to-understand rules, it’s not a hard model to set up.
Lesson #2: Baseball is not simple
The sim I laid out above, while interesting, is fairly useless when we’re trying to actually model a real baseball game. Here are some of the variables that are missing from the above model:
- Individual batter probabilities
- Starting pitchers
- Relief pitchers
- Batted ball profile
- and so on…
While making my sim, I quickly realized that each additional variable I included added a layer of complexity and work, and that complexity compounded each time I made a change. Some variables, like batter probabilities, were necessary for a remotely useful sim; others, like baserunning, while integral to the game of baseball, could be put off for another time, hopefully without sacrificing the usefulness of the tool. In the end, I created a model that considered the starting lineup and starting pitchers, but not much else. The result was a sim that was simple to use and understand and had some practical uses, but one that still needed a lot more work to be relevant for making predictions or optimizing lineups.
Lesson #3: Batter-pitcher matchups are much more complicated than you may think
The initial version of my simulator consisted of only the starting lineup. The sim would take the 2013 season stats of the players, turn those stats into probabilities of a single, double, triple, home run, strikeout, etc., and determine the outcome of each play based on those probabilities, adjusting the game state accordingly. That was fun and interesting, but after a while I became bored and ambitious. I wanted to include pitchers.
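Turning a season line into per-PA probabilities can be as simple as dividing counts by plate appearances. The sketch below is an assumption about how that step might look, not the sim’s actual code: anything that isn’t a hit, walk, or strikeout (hit-by-pitches, sacrifices, reaching on error) is lumped into the batted-ball-out bucket, and the 600-PA stat line is hypothetical.

```python
def per_pa_probabilities(pa, hits, doubles, triples, home_runs, walks, strikeouts):
    """Convert season counting stats into per-PA event probabilities.

    Everything that is not a hit, walk, or strikeout is lumped into "OUT",
    a simplifying assumption.
    """
    counts = {
        "1B": hits - doubles - triples - home_runs,
        "2B": doubles,
        "3B": triples,
        "HR": home_runs,
        "BB": walks,
        "K": strikeouts,
    }
    counts["OUT"] = pa - sum(counts.values())
    if min(counts.values()) < 0:
        raise ValueError("counting stats are inconsistent with the PA total")
    return {event: n / pa for event, n in counts.items()}


# Hypothetical 600-PA season line (not a real player).
probs = per_pa_probabilities(pa=600, hits=150, doubles=30, triples=3,
                             home_runs=20, walks=60, strikeouts=120)
print({event: round(p, 3) for event, p in probs.items()})
```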
I began with the foolish thought that this would be a simple task of essentially averaging the probabilities of pitcher and batter. The pitcher strikes out 20% and the batter strikes out 10%? Just use 15%. Easy!
Not easy. You can probably already see why. If a pitcher and a batter both have strikeout rates above the league average, then combining those rates should produce a number that is even higher than both of them. Put another way, if Adam Dunn (31% K-rate) faces Craig Kimbrel (47%) and you average the strikeout rates, you get 39%, which is lower than Kimbrel’s normal rate. In fact, we would expect Kimbrel to strike out Dunn at a much higher rate than normal.
That’s where the Odds Ratio method comes in. This is a variant of the log5 method made famous by Bill James to estimate the probability of team A beating team B, given both teams’ winning percentages. You can read a detailed description of the Odds Ratio method at Tom Tango’s blog, but the gist of the technique is that it lets you estimate the probability of an event in a given matchup from the rates at which the event occurs for the pitcher, the batter, and the league. The simplest form of the formula is:
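Odds = Odds(H) * Odds(P) / Odds(L)
where H, P, and L are the hitter’s, pitcher’s, and league’s rates for the event, and Odds(p) = p / (1 - p). The matchup probability is then Odds / (1 + Odds).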
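Here is the formula as a short Python sketch, applied to the Dunn-versus-Kimbrel example. The 20% league strikeout rate is an assumed round number, not the actual 2013 figure.

```python
def to_odds(p):
    """Probability -> odds, e.g. 0.20 -> 0.25."""
    return p / (1.0 - p)


def to_prob(odds):
    """Odds -> probability, the inverse of to_odds."""
    return odds / (1.0 + odds)


def odds_ratio(p_hitter, p_pitcher, p_league):
    """Matchup probability of a yes/no event via the Odds Ratio method."""
    return to_prob(to_odds(p_hitter) * to_odds(p_pitcher) / to_odds(p_league))


# Dunn's 31% K-rate vs. Kimbrel's 47%, with an assumed 20% league rate.
k_rate = odds_ratio(0.31, 0.47, 0.20)
print(f"{k_rate:.3f}")  # 0.614
```

That works out to roughly a 61% strikeout rate, well above both Kimbrel’s 47% and the naive 39% average. When the hitter is exactly league average, the hitter’s odds cancel the league’s and the formula simply returns the pitcher’s rate.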
To implement this technique, I took each event (hit, strikeout, walk, etc.), used the hitter, pitcher, and league odds to find the matchup odds, and converted those odds back into a probability, which I treated as the “true” probability of the event occurring. After doing so, the sim seemed to work fairly well: batters performed worse against good pitchers, and vice versa. But, as Beyond the Box Score writer/researcher John Choiniere pointed out, the method I used didn’t quite work.
Why didn’t it work? Well, it’s complicated, but the basic idea is that the Odds Ratio method is only valid when there are two possible outcomes. That is, you can use Odds Ratio to find the probability of contact or no contact overall, but you cannot sum the Odds Ratio probabilities for home runs, batted ball outs, and non-HR hits to find the probability of contact. The reasons for this are beyond my understanding, but I confirmed it by checking that the probabilities, as I had been calculating them, did not sum to 1.
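The sum-to-one problem is easy to reproduce. In this minimal sketch, the three-outcome rates (strikeout, walk, ball in play) for the hitter, pitcher, and league are made-up numbers. Each set sums to 1, but applying the Odds Ratio to each outcome separately gives matchup probabilities that do not.

```python
def to_odds(p):
    return p / (1.0 - p)


def to_prob(odds):
    return odds / (1.0 + odds)


def odds_ratio(p_hitter, p_pitcher, p_league):
    return to_prob(to_odds(p_hitter) * to_odds(p_pitcher) / to_odds(p_league))


# Hypothetical rates; each dictionary sums to 1.
hitter = {"K": 0.10, "BB": 0.15, "BIP": 0.75}
pitcher = {"K": 0.30, "BB": 0.05, "BIP": 0.65}
league = {"K": 0.20, "BB": 0.08, "BIP": 0.72}

# Naive approach: treat every outcome as its own independent yes/no event.
naive = {e: odds_ratio(hitter[e], pitcher[e], league[e]) for e in hitter}
total = sum(naive.values())
print(f"{total:.3f}")  # 0.941, not 1.000
```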
Instead, I used a chained binomial approach. In layman’s terms, this means that I split each plate appearance into a “decision tree” of sorts, so that each time I used the Odds Ratio method, it was on an “A or not-A” set of outcomes, rather than “A or B or C or D”. In this case, that means I used pitcher and batter statistics to choose contact or no contact, then, if contact, home run or no home run, then, if no home run, hit or no hit, and so on. In the end, I came up with probabilities for each event that added up to 1.
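A minimal sketch of a chained approach follows. It is one possible way to build the chain, not necessarily the sim’s: instead of starting with contact versus no contact, it walks through the events in a fixed order (an assumption) and, at each step, applies the Odds Ratio to the conditional rate of the event given that none of the earlier events happened. Whatever probability remains goes to the last event, so the results sum to 1 by construction. All rates are hypothetical.

```python
def to_odds(p):
    return p / (1.0 - p)


def to_prob(odds):
    return odds / (1.0 + odds)


def odds_ratio(p_hitter, p_pitcher, p_league):
    """Odds Ratio for a single yes/no split."""
    return to_prob(to_odds(p_hitter) * to_odds(p_pitcher) / to_odds(p_league))


def chained_odds_ratio(hitter, pitcher, league, order):
    """Chain of yes/no Odds Ratio splits over the events in `order`.

    At each step, each source's rate is converted to a conditional rate
    (given that none of the earlier events happened) before the Odds Ratio
    is applied. The last event takes whatever probability is left over.
    """
    result = {}
    left_h = left_p = left_l = 1.0  # probability each source has not yet used up
    reach = 1.0                     # matchup probability of reaching this split
    for event in order[:-1]:
        p = odds_ratio(hitter[event] / left_h,
                       pitcher[event] / left_p,
                       league[event] / left_l)
        result[event] = reach * p
        reach *= 1.0 - p
        left_h -= hitter[event]
        left_p -= pitcher[event]
        left_l -= league[event]
    result[order[-1]] = reach
    return result


# Hypothetical per-PA rates (each sums to 1). "H" means a non-HR hit.
hitter = {"K": 0.20, "BB": 0.08, "HR": 0.030, "H": 0.200, "OUT": 0.490}
pitcher = {"K": 0.25, "BB": 0.07, "HR": 0.025, "H": 0.190, "OUT": 0.465}
league = {"K": 0.20, "BB": 0.08, "HR": 0.025, "H": 0.205, "OUT": 0.490}

matchup = chained_odds_ratio(hitter, pitcher, league, ["K", "BB", "HR", "H", "OUT"])
print({event: round(p, 3) for event, p in matchup.items()})
print(round(sum(matchup.values()), 6))  # 1.0
```

Because this hitter is exactly league average on strikeouts, the first split returns the pitcher’s 25% strikeout rate unchanged, just as the single-event formula would.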
Lesson #4: On high-scoring teams, OBP rules
A topic that has interested me recently is the effect of lineup construction and ability on overall offensive performance; in other words, how do the types of hitters on a team affect the way in which, and the extent to which, the team scores runs? Is it better to have high-OBP or high-SLG players, overall offensive ability being equal? What if overall offense is below average? What type of hitter do you want as a ninth player if the other eight are bad?
Steve Staude at FanGraphs and Tom Tango have both covered the topic extensively, and both came to the conclusion that the higher the run environment, the more important OBP is relative to SLG. In other words, if you have a team of above-average hitters, you would rather they have a high OBP than a high SLG, holding wOBA constant.
To test this, I ran two simulations. In the first, I ran a team of all 2013 Carlos Santana against average pitching, and in the second, I ran a team of all 2013 Marlon Byrd against average pitching, then compared their final runs per game. I chose these players because both had a .364 wOBA last year, but Santana had a line of .268/.377/.455, while Byrd had a line of .291/.336/.511. Based on the findings of Staude and Tango, we would expect the Santana team to score more runs per game, since both hitters are above average.
The simulation confirmed this suspicion. The all-Santana team scored 6.2 runs per game, while the all-Byrd team scored only 5.6. That’s quite a difference for two hitters with identical wOBAs, and it’s only the tip of the iceberg when it comes to building a lineup that takes full advantage of these characteristics. Does this mean that front offices should value certain players more than others based on their current lineup quality, even if the players are of the same overall quality? Maybe, but that’s a question for another day.
Lesson #5: On low-scoring teams, SLG rules
The flip side of the above conclusion from Tango and Staude is that for a low-scoring team, it is better to have players with power than players who get on base, again, all else equal.
I tested this the same way as above, by looking at Jonathan Villar (.243/.321/.319) and Zack Cozart (.254/.284/.381), who both had a .289 wOBA in 2013. Because Cozart had the higher slugging percentage, we would expect a team of Cozarts to score more runs per game than a team of Villars, since slugging is more important in low-scoring environments.
The sim confirmed this expectation, with Villar’s team scoring 3.3 runs per game compared to Cozart’s 3.4. As you can see, the difference wasn’t nearly as drastic as that of Santana/Byrd, partly because Villar/Cozart weren’t as bad as the others were good, and partly because the difference between the two with regards to OBP and SLG wasn’t as great either. Still, over 10,000 games, even a tenth of a run per game is evidence of the benefit of slugging on lower-scoring teams (see: Stanton, Giancarlo).
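A quick back-of-the-envelope check shows why a tenth of a run can matter at this sample size. The sketch assumes each team was simulated for 10,000 games and that runs per game have a standard deviation of about 3 (an assumed round figure). Since 3.3 and 3.4 are themselves rounded, the true gap could be somewhat smaller or larger than 0.1.

```python
import math


def z_for_difference(mean_a, mean_b, sd_per_game, games_each):
    """How many standard errors separate two independent runs-per-game averages."""
    standard_error = sd_per_game * math.sqrt(2.0 / games_each)
    return (mean_b - mean_a) / standard_error


# Villar team vs. Cozart team, assuming an SD of 3 runs per game.
z = z_for_difference(3.3, 3.4, sd_per_game=3.0, games_each=10_000)
print(round(z, 2))  # 2.36
```

A gap of roughly 2.4 standard errors is unlikely to be pure simulation noise, though not by a wide margin.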
Lesson #6: Random variation is real and significant (for teams)
The effect of random variation should be well-known to those who read tangotiger.com often, myself included, but the extent to which simulation results vary, even over large samples, astonished me. As an example, I’m going to run a simulation of a sample game between the Yankees and Red Sox (based on ZiPS projections) 162 times, note the results, then repeat 10 times. Below are the results in winning percentages and runs per game:
On the one hand, randomness like this isn’t, and shouldn’t be, surprising. One of the great things about baseball is that the worse team always has a significant chance of winning, and it’s pretty much impossible to consistently predict who will win a game or series. So seeing variation in winning percentages is entirely expected, both in this sense and in the statistical sense.
On the other hand, the results shown above contradict the way most baseball fans and analysts think about the game and about season results. A team that wins 104 games is clearly better than a team that wins 88. Even a team that wins 98 would be put in a different category than a team that wins 93, especially with the same quality of opponent. And yet, these are entirely plausible outcomes for a team season. 104 and 88 are likely extreme cases for the team, but as we see above, it is not out of the question for a 96ish-win team to end up with 88 or 104 wins. As much as we want to ascribe final results entirely to skill, there’s always an error range, even for entire seasons, a fact that should be in the back of our minds when we summarize past seasons or predict future ones.
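A simple binomial model puts numbers on this. The sketch treats each game as an independent event with the same win probability (an assumption, though a reasonable one when the same simulated matchup is replayed) and looks at a team whose true talent is 96 wins over 162 games.

```python
import math


def win_sd(true_win_pct, games=162):
    """Binomial standard deviation of season wins."""
    return math.sqrt(games * true_win_pct * (1.0 - true_win_pct))


def prob_at_most(k, n, p):
    """Exact binomial P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


p = 96 / 162  # a 96-win true-talent team
sd = win_sd(p)
# Chance of finishing with 88 or fewer wins, or with 104 or more.
both_tails = prob_at_most(88, 162, p) + (1.0 - prob_at_most(103, 162, p))
print(f"SD = {sd:.2f} wins; P(<= 88 or >= 104 wins) = {both_tails:.2f}")
```

The standard deviation comes out to about 6.25 wins, so 88 and 104 each sit roughly 1.3 standard deviations from 96.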
Lesson #7: Random variation is real and significant (for players)
In the same vein as above, player results are significantly affected by random variation, as much as we want to ascribe them to skill rather than luck. Consider the final batting statistics of 18 Mike Trouts batting against Yu Darvish over 150 games:
The results here seem even more surprising and counter to the common way of thinking about statistics than the team results. Based on pure randomness, a full season of the exact same player can reasonably vary between .360 and .420 wOBA, .250 and .300 batting average, 22 and 40 home runs, and so on. Those are drastic differences, and again, though they are the extremes of reasonable outcomes, they still illustrate the role that randomness plays in player statistics. We may say that we are aware of random variation, and try to consider it in player evaluation, but for most people, including myself, the weight it gets in our day-to-day evaluation is probably not nearly as high as it should be.
(By the way, when I simulated 25 games, which is about as many games as most players will have played by the time you read this, Trout’s wOBA ranged from .312 to .510. Another reminder to take early-season statistics with a bagful, not a grain, of salt.)
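The same binomial reasoning applies to a rate stat like batting average. In this sketch, the .275 true-talent average and the at-bat counts (550 for a full season, 100 for roughly 25 games) are hypothetical round numbers.

```python
import math


def rate_sd(true_rate, opportunities):
    """Binomial standard deviation of an observed rate (e.g. batting average)."""
    return math.sqrt(true_rate * (1.0 - true_rate) / opportunities)


TRUE_AVG = 0.275  # hypothetical true-talent batting average
for label, at_bats in [("full season", 550), ("about 25 games", 100)]:
    sd = rate_sd(TRUE_AVG, at_bats)
    low, high = TRUE_AVG - 2 * sd, TRUE_AVG + 2 * sd
    print(f"{label}: SD = {sd:.3f}, two-SD range {low:.3f} to {high:.3f}")
```

Over 550 at-bats the standard deviation is about .019, a two-standard-deviation band of roughly .237 to .313; over 100 at-bats it more than doubles, to about .045.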
Lesson #8: Batting order matters, sort of
Much has been written and researched on the ideal order of a lineup, and there are tools available to determine the best order of a lineup. I was still curious, however, about what my simple simulator would say about these theories and about the differences between types of lineup orders.
One example I heard a lot about last year and in the offseason was the placement of Joey Votto and Zack Cozart in the Reds lineup. Last season, Votto batted third in almost every game, behind Shin-Soo Choo and Cozart. Choo, of course, was one of the highest-OBP players in baseball last season, but putting Cozart and his .284 OBP in arguably the most important OBP spot in the lineup was not, by any reasonable standard, a smart move.
This season, Votto has been batting second (behind Billy Hamilton), and Cozart eighth; this placement of the two should, by The Book, net the Reds more runs per game than batting Cozart second and Votto third, all else equal. So does it?
Yes, but barely. With Votto third and Cozart second, the Reds scored 4.23 runs per game using 2013 numbers (and 20,000 simulated games), but with Votto second and Cozart eighth, they scored 4.26. Over the course of the season, that’s about five runs (0.03 × 162), or half a win. That is not insignificant, and it surely means that Cozart should not bat second, but the impact of the change is small.
Lesson #9: A good closer makes a big difference
A fun aspect of the sim to play around with is the custom starting situation options, which let you begin a game at a given inning, with a given number of outs, runners, score, and batting spot. After adding this feature, I was particularly curious about how much it matters to have an elite closer finish a game instead of a mediocre one.
To test this, I took two random teams (Rays @ Angels in this case), and simulated 20,000 games starting in the bottom of the ninth, with the Rays up by one. The first time I ran the sim, I used Craig Kimbrel as the closer (with ZiPS projections), and the Rays won 87% of the time. Then, I ran the 20,000 games again, this time with Jose Valverde as the pitcher. In the second situation, the Rays won only 74% of the time.
Even before any deeper analysis, that took me aback. There is a huge difference between 87% and 74%, one that would have a big effect over the course of a season. If you assume that both pitchers have 50 save opportunities in a season, that’s a difference of 6.5 wins (0.13 × 50). Of course, reality is a bit more complicated than that, but it’s a good indication that there is a bigger difference between an elite and a below-average closer than WAR may indicate.
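A custom starting state only changes where the simulation begins. The sketch below estimates how often the fielding team holds a one-run lead when the home team bats in the bottom of the ninth with nobody on and nobody out. The closer profiles (`ELITE`, `MEDIOCRE`) are hypothetical per-PA probabilities assumed to be already matchup-adjusted, the advancement rules are the simplest ones from lesson #1, and a tie after nine is assumed to be a coin flip in extra innings.

```python
import random

BASES_ADVANCED = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def apply_event(event, outs, bases):
    """Return (outs, bases, runs) after one PA; bases = (on 1st, on 2nd, on 3rd)."""
    runners_before = sum(bases)
    new_outs, new_bases = outs, bases
    if event in ("K", "OUT"):
        new_outs = outs + 1  # batter out, runners hold
    elif event == "BB":
        first, second, third = bases
        new_bases = (True, first or second, third or (first and second))  # forced only
    else:
        n = BASES_ADVANCED[event]
        positions = [b for b, on in zip((1, 2, 3), bases) if on] + [0]  # 0 = batter
        new_bases = tuple(any(p + n == b for p in positions) for b in (1, 2, 3))
    runs = (outs + runners_before + 1) - (new_outs + sum(new_bases))
    return new_outs, new_bases, runs


def hold_probability(closer, games, rng, tie_win_prob=0.5):
    """Share of games the fielding team wins, starting in the bottom of the 9th
    with a one-run lead, nobody on, and nobody out."""
    events, weights = list(closer), list(closer.values())
    wins = 0.0
    for _ in range(games):
        outs, bases, runs = 0, (False, False, False), 0
        # Stop at three outs, or as soon as the home team scores the winning run.
        while outs < 3 and runs < 2:
            event = rng.choices(events, weights=weights)[0]
            outs, bases, scored = apply_event(event, outs, bases)
            runs += scored
        if runs == 0:
            wins += 1.0           # lead held
        elif runs == 1:
            wins += tie_win_prob  # tied after nine: assumed coin flip in extras
    return wins / games


# Hypothetical per-PA profiles allowed by each closer (each sums to 1).
ELITE = {"K": 0.40, "BB": 0.07, "1B": 0.10, "2B": 0.03, "3B": 0.003, "HR": 0.015, "OUT": 0.382}
MEDIOCRE = {"K": 0.18, "BB": 0.10, "1B": 0.16, "2B": 0.05, "3B": 0.005, "HR": 0.035, "OUT": 0.470}

if __name__ == "__main__":
    for name, profile in [("elite", ELITE), ("mediocre", MEDIOCRE)]:
        print(name, round(hold_probability(profile, 20_000, random.Random(1)), 3))
```

This sketch makes no attempt to reproduce the 87% and 74% figures above; it only shows how the starting state and the closer’s profile plug into the same plate-appearance loop.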
Lesson #10: My work will never end
When I began this project, I saw it as a simple opportunity to get more coding experience, and maybe make something that would be fun to use for a bit. I soon realized that 1) I knew nothing, and 2) there were a million things that I could do to improve the simulation engine itself, as well as add new, hopefully interesting, features. I wanted to add a “Scores” page, which would run simulations of all current games; I did this, but there is a lot I need to do to make it better. I want to add a lineup optimizer, a season simulator, and, most importantly, the many factors I mentioned in lesson #2.
The sim is not perfect, but as the lessons above show, it can still teach me, and hopefully others, some interesting lessons about baseball. Or, at the very least, it can remind us of lessons we should already know.