I’m on a bit of a pitcher evaluation kick at the moment. Just a couple of days ago, I wrote about crowdsourcing balls in play at Beyond the Box Score.

More importantly, two weeks ago I had an idea: instead of measuring starting pitching performances on an inning or plate-appearance basis, why not evaluate them game by game? Since team wins are the end goal for a pitcher, and since each game is basically independent, we could evaluate an entire season simply by evaluating each start and summing the results.

So how do we evaluate a single start? Traditionally, we have used pitcher wins. Then, those who wanted to ignore the effect of the pitcher’s team offense came up with the Quality Start. But do we really want to say that a six-inning, three-run start (a 4.50 ERA) is quality? The answer is no. No, we don’t.

There wasn’t a great way to evaluate a single start, so Bill James, doing what Bill James does best, created something called Game Score. Here’s the formula for Game Score:

$$\text{Game Score} = \text{Outs} + 2 \times (\text{innings completed after the fourth}) + \text{strikeouts} - 2 \times \text{hits} - 4 \times \text{earned runs} - 2 \times \text{unearned runs} - \text{walks} + 50$$
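
To make the formula concrete, here is a minimal Python sketch of James’ Game Score for one start. It assumes the workload is recorded as outs rather than innings (so 6⅔ innings is 20 outs); the function and parameter names are hypothetical. Only *completed* innings earn the bonus, so, as Tango points out in the comments, a single start should always produce an integer.

```python
def james_game_score(outs: int, strikeouts: int, hits: int,
                     earned_runs: int, unearned_runs: int, walks: int) -> int:
    """Bill James's Game Score for a single start (illustrative sketch)."""
    completed_innings = outs // 3                   # partial innings earn no bonus
    bonus_innings = max(0, completed_innings - 4)   # innings completed after the fourth
    return (50 + outs + 2 * bonus_innings + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks)


# A hypothetical nine-inning start: 27 outs, 10 K, 3 H, 0 ER, 0 UER, 1 BB.
print(james_game_score(27, 10, 3, 0, 0, 1))  # 90
```

Summing this function over a pitcher’s starts gives the season totals in the table below; because every term is an integer, those totals should be integers too.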

It was a pretty good start, but far from perfect. Weighting earned runs twice as heavily as unearned runs seems arbitrary, as does awarding bonus points only for innings completed after the fourth. I won’t get into the specifics of what’s wrong with this Game Score, because they don’t really matter for my purposes. But because it will be a good reference, here is the 2012 leader board for each pitcher’s Game Score summed over all of his starts:

Num | Name | GS |
---|---|---|
1 | Clayton Kershaw | 2089.33 |
2 | Justin Verlander | 2072.66 |
3 | R.A. Dickey | 2057.33 |
4 | Felix Hernandez | 1969.66 |
5 | Matt Cain | 1947.66 |
6 | Zack Greinke | 1917.66 |
7 | David Price | 1914.99 |
8 | Gio Gonzalez | 1912.66 |
9 | Johnny Cueto | 1907 |
10 | James Shields | 1893.33 |
11 | Kyle Lohse | 1885 |
12 | Mat Latos | 1870 |
13 | Jake Peavy | 1862.99 |
14 | Cole Hamels | 1859.66 |
15 | Hiroki Kuroda | 1858.33 |
16 | Madison Bumgarner | 1837.66 |
17 | Yovani Gallardo | 1828.66 |
18 | Jordan Zimmermann | 1797 |
19 | C.J. Wilson | 1796.33 |
20 | Jason Vargas | 1787.66 |

Looks like it passes the sniff test to me. Let’s move on.

A couple of years ago, Tom Tango introduced a few alternatives to James’ Game Score, each one based on a different method of evaluating pitchers. Let’s summarize them.

### Runs

The first new version of Game Score cares only about runs allowed. It’s essentially the Game Score version of RA9. Here’s the formula (again, as formulated by Tango):

$$\text{Game Score} = 6.4 \times \text{IP} - 10 \times \text{R} + 40$$
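
Since the plan is to evaluate each start and sum the results, a short Python sketch shows what that means for a linear formula like this one. The function and the three sample starts are hypothetical, and innings are stored as outs so that box-score notation like “6.1” (6⅓ innings) isn’t misread as a decimal. The sketch reproduces the 68.4 that Tango computes in the comments for a 6 IP, 1 R start, and it checks that a season total is simply the formula applied to season totals plus 40 points for every start.

```python
import math


def runs_game_score(outs: int, runs: int) -> float:
    """Runs-only Game Score for one start: 6.4*IP - 10*R + 40."""
    innings = outs / 3  # e.g. 6 1/3 innings ("6.1" in a box score) is 19 outs
    return 6.4 * innings - 10 * runs + 40


# Hypothetical season of three starts, each as (outs, runs allowed).
starts = [(18, 1), (21, 2), (15, 5)]
season_total = sum(runs_game_score(o, r) for o, r in starts)

# Linear formula: summing per-start scores equals scoring the season totals
# and adding the 40-point constant once per start.
total_outs = sum(o for o, _ in starts)
total_runs = sum(r for _, r in starts)
assert math.isclose(season_total,
                    6.4 * total_outs / 3 - 10 * total_runs + 40 * len(starts))

print(round(runs_game_score(18, 1), 1))  # 68.4
```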

And the 2012 leader board for total Game Score:

Num | Name | Runs GS |
---|---|---|
1 | Clayton Kershaw | 2077.06 |
2 | R.A. Dickey | 2049.06 |
3 | Justin Verlander | 2035.33 |
4 | Johnny Cueto | 1978.8 |
5 | Felix Hernandez | 1964.8 |
6 | David Price | 1960.39 |
7 | Matt Cain | 1953.73 |
8 | Kyle Lohse | 1940.4 |
9 | Zack Greinke | 1878.93 |
10 | Hiroki Kuroda | 1865.86 |
11 | Gio Gonzalez | 1865.73 |
12 | Jordan Zimmermann | 1842.26 |
13 | Matt Harrison | 1825.33 |
14 | Cole Hamels | 1818.13 |
15 | Jake Peavy | 1801.59 |
16 | Mat Latos | 1789.73 |
17 | Jason Vargas | 1780.93 |
18 | Jered Weaver | 1777.46 |
19 | Yovani Gallardo | 1765.6 |
20 | Cliff Lee | 1760.4 |

### Strikeouts and walks

Here we have the other end of the spectrum; instead of considering only runs allowed, this version is going to be based only on strikeouts and walks, and nothing else. It’s basically the Game Score version of kwERA.

$$\text{Game Score} = 0.4 \times \text{IP} + 3 \times (\text{SO} - \text{BB}) + 40$$
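
A minimal Python sketch of this version for a single start, with innings again stored as outs; the function name and the sample start are hypothetical. With only 0.4 points per inning, nearly all of the separation between starts comes from the strikeout-minus-walk margin.

```python
def kbb_game_score(outs: int, strikeouts: int, walks: int) -> float:
    """Strikeout/walk-only Game Score for one start: 0.4*IP + 3*(SO - BB) + 40."""
    innings = outs / 3
    return 0.4 * innings + 3 * (strikeouts - walks) + 40


# Hypothetical start: 7 innings (21 outs), 9 K, 2 BB.
print(round(kbb_game_score(21, 9, 2), 1))  # 63.8
```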

And the leader board:

Num | Name | KBB GS |
---|---|---|
1 | Justin Verlander | 1958.33 |
2 | R.A. Dickey | 1947.06 |
3 | Clayton Kershaw | 1924.06 |
4 | Felix Hernandez | 1913.8 |
5 | James Shields | 1912.06 |
6 | Zack Greinke | 1882.93 |
7 | Max Scherzer | 1874.06 |
8 | Cole Hamels | 1827.13 |
9 | Cliff Lee | 1821.4 |
10 | Ian Kennedy | 1811.33 |
11 | Madison Bumgarner | 1807.33 |
12 | Jake Peavy | 1805.59 |
13 | Matt Cain | 1796.73 |
14 | Mat Latos | 1793.73 |
15 | Johnny Cueto | 1784.8 |
16 | Yovani Gallardo | 1779.6 |
17 | David Price | 1768.39 |
18 | Adam Wainwright | 1764.46 |
19 | Hiroki Kuroda | 1761.86 |
20 | Gio Gonzalez | 1761.73 |

### FIP

Take the previous version, add home runs (and reweight the innings, strikeout and walk terms), and you have the FIP version. There’s really not too much else to say. As always, Tango’s formula:

$$\text{Game Score} = 2.5 \times \text{IP} + 2 \times \text{SO} - 3 \times \text{BB} - 13 \times \text{HR} + 40$$
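
The same kind of Python sketch for the FIP version (hypothetical names, innings stored as outs). At these weights, one home run allowed (−13) cancels the innings credit for 13 / 2.5 = 5.2 innings.

```python
def fip_game_score(outs: int, strikeouts: int, walks: int, home_runs: int) -> float:
    """FIP-style Game Score for one start: 2.5*IP + 2*SO - 3*BB - 13*HR + 40."""
    innings = outs / 3
    return 2.5 * innings + 2 * strikeouts - 3 * walks - 13 * home_runs + 40


# Hypothetical start: 7 innings (21 outs), 9 K, 2 BB, 1 HR.
print(round(fip_game_score(21, 9, 2, 1), 1))  # 56.5
```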

Leader board:

Num | Name | FIP GS |
---|---|---|
1 | Felix Hernandez | 1996 |
2 | Justin Verlander | 1972.83 |
3 | Clayton Kershaw | 1965.16 |
4 | R.A. Dickey | 1906.66 |
5 | Zack Greinke | 1894.83 |
6 | Johnny Cueto | 1875.5 |
7 | Gio Gonzalez | 1856.33 |
8 | James Shields | 1842.16 |
9 | Adam Wainwright | 1802.66 |
10 | David Price | 1798.49 |
11 | Matt Cain | 1791.33 |
12 | Kyle Lohse | 1775.5 |
13 | Madison Bumgarner | 1754.83 |
14 | Cole Hamels | 1751.33 |
15 | Max Scherzer | 1738.16 |
16 | Hiroki Kuroda | 1731.16 |
17 | Mat Latos | 1723.33 |
18 | Jake Peavy | 1720.49 |
19 | Cliff Lee | 1719.5 |
20 | Jordan Zimmermann | 1718.16 |

### Linear weights

Last one! This time, we’re going to use a simplified version of linear weights, looking only at walks, hits and home runs.

$$\text{Game Score} = 8.4 \times \text{IP} - 3 \times \text{BB} - 5 \times \text{H} - 8 \times \text{HR} + 40$$
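
A Python sketch of the linear-weights version for one start (hypothetical names, innings stored as outs). It assumes H counts all hits, home runs included, which is the usual box-score convention; under that reading a home run costs 5 points as a hit plus 8 more.

```python
def lwts_game_score(outs: int, walks: int, hits: int, home_runs: int) -> float:
    """Linear-weights Game Score for one start: 8.4*IP - 3*BB - 5*H - 8*HR + 40.

    Assumes `hits` includes home runs, so each HR costs 5 + 8 = 13 points.
    """
    innings = outs / 3
    return 8.4 * innings - 3 * walks - 5 * hits - 8 * home_runs + 40


# Hypothetical start: 7 innings (21 outs), 2 BB, 5 H (one of them a HR).
print(round(lwts_game_score(21, 2, 5, 1), 1))  # 59.8
```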

Leader board:

Num | Name | LWTS GS |
---|---|---|
1 | Clayton Kershaw | 2080.39 |
2 | Justin Verlander | 2035.99 |
3 | R.A. Dickey | 1984.39 |
4 | Felix Hernandez | 1943.8 |
5 | Matt Cain | 1919.39 |
6 | Gio Gonzalez | 1918.4 |
7 | Kyle Lohse | 1869.4 |
8 | Johnny Cueto | 1865.8 |
9 | David Price | 1848.39 |
10 | Zack Greinke | 1837.59 |
11 | James Shields | 1824.39 |
12 | Mat Latos | 1818.39 |
13 | Jake Peavy | 1804.59 |
14 | Madison Bumgarner | 1801.99 |
15 | Hiroki Kuroda | 1793.19 |
16 | Cole Hamels | 1759.8 |
17 | Jered Weaver | 1754.79 |
18 | C.J. Wilson | 1735.6 |
19 | Jordan Zimmermann | 1726.6 |
20 | Adam Wainwright | 1701.8 |

### Average

Now, it’s almost certain that none of these versions of Game Score is perfect on its own. However, as Tango said in the article a few years ago, we can assign weights to each one depending on our goals or preferences. Unfortunately, right now, I’m not sure how to do that. Maybe that will be a project for a future article. For now, I’m going to give you the average of all four new versions of Game Score.

Num | Name | Avg GS |
---|---|---|
1 | Clayton Kershaw | 2027.2 |
2 | Justin Verlander | 2015.028 |
3 | R.A. Dickey | 1988.9 |
4 | Felix Hernandez | 1957.612 |
5 | Zack Greinke | 1882.388 |
6 | Johnny Cueto | 1882.38 |
7 | Matt Cain | 1881.768 |
8 | Gio Gonzalez | 1862.97 |
9 | David Price | 1858.13 |
10 | James Shields | 1845.8 |
11 | Kyle Lohse | 1838.54 |
12 | Cole Hamels | 1803.21 |
13 | Hiroki Kuroda | 1802.08 |
14 | Jake Peavy | 1799.05 |
15 | Mat Latos | 1799.036 |
16 | Madison Bumgarner | 1789.028 |
17 | Jordan Zimmermann | 1755.656 |
18 | Cliff Lee | 1744.14 |
19 | Yovani Gallardo | 1741.292 |
20 | Max Scherzer | 1732.132 |
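
The Avg GS column can be checked against the individual tables. The Python sketch below copies two rows from the leader boards above and compares the listed average with the mean of Tango’s four versions and with the mean of all five versions (James’ original included). For these two pitchers, the five-way mean reproduces the listed figure and the four-way mean does not, which is consistent with what Zachary reports in the comments.

```python
from statistics import mean

# Season totals copied from the leader boards above.
rows = {
    "Clayton Kershaw": {"james": 2089.33, "runs": 2077.06, "kbb": 1924.06,
                        "fip": 1965.16, "lwts": 2080.39, "avg": 2027.2},
    "Felix Hernandez": {"james": 1969.66, "runs": 1964.8, "kbb": 1913.8,
                        "fip": 1996, "lwts": 1943.8, "avg": 1957.612},
}
TANGO = ("runs", "kbb", "fip", "lwts")

for name, r in rows.items():
    four_way = mean(r[k] for k in TANGO)
    five_way = mean(r[k] for k in TANGO + ("james",))
    print(f"{name}: listed {r['avg']}, 4-way {four_way:.3f}, 5-way {five_way:.3f}")
```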

This list looks good, but it is far from a perfect way to evaluate pitchers. It doesn’t take into account park or league factors, which are incredibly important. However, if you’re looking for a different way to evaluate pitchers that takes many different factors into account, this is something to consider.

### Conclusion

There you have it. For your reference, here’s a Google Docs spreadsheet of all the versions of Game Score for every pitcher who made at least one start in 2012.

Before I go, because I didn’t do a whole lot of actual analysis, here are some of my ideas at the moment for where to go next with these data:

- Include park and league factors
- Combine these versions of Game Score with varying weights
- Convert Game Score to wins
- Look at total Game Score over a career
- Probably much, much more. Stay tuned!

*Thanks again to Tom Tango for the inspiration and, honestly, most of the real analysis. Also thanks to James Gentile for the Retrosheet help.*

MrMan said...

Someone help me with something that continually perplexes me. I see on Fangraphs and sites like this a lot of really good, insightful analysis and well-thought out approaches to mining numbers.

But I also see a lot of poorly presented information. For example, in the very simple tables used on this post it’s difficult for readers to quickly grasp the meaning simply due to the poor formatting of the numbers.

Currently the numbers are presented in a centered format with two decimals. First, the two decimals are meaningless when you’re dealing with numbers in the thousands; there’s no meaningful difference between 1897.54 and 1897.21. The decimals should not be included.

Second, the one thousand comma indicator is not included when it should be. Any time you’re dealing with numbers in the thousands the comma provides a visual cue to the eye that enables the reader to more quickly make sense of the numbers he’s looking at.

Finally, the numbers are centered when they should be right-aligned. This creates confusion for the reader, as the digits lined up vertically are sometimes the 1st digit of a 4-digit number, sometimes the 2nd.

Combined, these three factors create something of a mess for the reader. Now, this is a simple table, and it simply makes the reader work a little more to make sense of the content. But it is indicative of a weakness I see in much SABR work.

Same thing goes for graphs and charts; I see a lot of lazy, poorly designed charts. Charts should take hard-to-comprehend numbers and make them easier to comprehend but I see a lot of charts that don’t actually make the information any more sensible than a raw table.

The presentation of data/findings is often as important as the actual content in communicating the meaning. I don’t understand why more effort isn’t put into the presentation, especially when you consider how little time it would take to make the changes I note above (less than a minute).

Matt Hunter said...

Thanks for the feedback MrMan. You’re completely correct, and I should have taken more care in presenting these tables well. I’ll keep your suggestions in mind for future tables, graphs, and charts.

Jim said...

If you take off the 50 free points per start then you can subtract 1650 from both Kershaw and Verlander and probably others. This takes care of the comma problem for those who don’t read numbers well.

I guess another reason for the 50 points is that if a pitcher starts, throws one pitch that is not hit, and then departs, his game score is 50. Wow!

MrMan said...

Matt

First, that’s a positive attitude. Most people don’t respond to any criticism with “you’re completely correct”. Second, you’re obviously smarter than I when it comes to baseball analytics; but if you ever do want some help in presenting data it’s basically what I do for a living and would be happy to help.

@Jim….I’m with you in not really understanding the granting of the 50 points per start. This accounts for as much as 80% to 90% of the overall score. Seems like a generous number for simply getting up on the mound and is too heavy compared to how difficult it is to accumulate additional numbers after that point.

Tangotiger said...

I can’t stand decimals either, in situations like this. If it has no meaningful difference, it shouldn’t be shown.

And this applies to ERA (should be to one decimal place), OBP, SLG (to two decimal places), and so on. At least for those, they can rely on inertia/tradition for their obstinance.

It’s why Bill James shows 107 runs created, and not 107.2.

***

My Game Score starts at 40, not 50, and someone recommended I start at 30 (and make each of the point be earned to get to the average of 50). Starting at 50 is problematic, precisely for the 1-pitch scenario.

Game Score is equivalent to win%, and so, by starting at 40 (or 30), you are explicitly starting everyone at “replacement level”.

***

Carl: what you are asking is pretty much exactly my first version of Game Score.

Carl said...

Tom Tango/Tangotiger (I assume same person)

I had missed how similar my proposal was to your #1 proposal. Nice catch. I do think my proposal would take that and tweak it by:

1) not giving 10 marginal runs as a win

This is important, I believe, to adjust for different scoring periods (1960s vs. 1920s vs. steroid era vs. deadball era), and

2) requiring a minimum of 5 innings (hidden in looking at wins for IP/ER allowed)

3) using ER instead of R to avoid differences in official scorers.

4) not adding 40.

Are you up to the challenge of creating such a tool and analysis? It would be awesome to see the leaders for a year and the careers of more controversial starters, e.g., Blyleven, John and Kaat.

Tangotiger said...

Yes, I would tweak it by era. The 10 runs per win is good enough though. Most eras will have it between 9 and 11, and, really, does it matter if the Game Score will show 64 instead of 67? Anyway, I agree that the “10” should be flexible.

***

“using ER instead of R to avoid differences in official scorers”

?? You are confused here. R has NO interpretation. ER is subject to official scorer interpretation. So, to avoid differences, you want R. You are arguing against yourself here.

***

And the “40” or “30” or whatever to add, is simply to give it a scale we can understand. If I say the average is “50” and 99% of the pitchers will be between 0 and 100, and that the number corresponds to chances of winning, isn’t that a good enough reason to do that?

***

I put out 4 different Game Scores, which is MY challenge to everyone. I’m sitting this one out, and just pointing out the issues as I see them.

Carl said...

Tom Tango,

1) You are 100% right that I got myself confused with ER/R, and while I had proposed ER above, I should not have. Using R is the superior measuring stick.

2) I don’t care for the 10. It doesn’t really matter for individual games (as you say, 64 vs. 67 doesn’t matter that much), but I want to avoid the situation where the size of the loss affects an adjusted W/L record. For example, a SP who gives up 8 runs in 1/3 of an inning will get the loss 99.99% of the time, and so will one who gives up 18 runs in 1/3 of an inning. Still, when doing game-by-game summations to get a W-L record, my way would count 1 loss while using 10 (or 11 or 9, by era) would count 2 losses.

Both game scores are going to be at or near zero (rightfully so), but I only care when compiling a season or career adjusted W/L record.

Sorry you didn’t take up the challenge. I simply don’t have time to pick up my own gauntlet. Anyone else out there? Bill James sitting around with a free afternoon?

Carl said...

PS: Tom Tango, loved the original article and remembered it as soon as I clicked on the link above.

Zachary said...

I know you didn’t mean too much by averaging the different metrics, so this is not any sort of criticism but just observations if you actually decide to attempt an improved game score.

You said that it is almost certain none of the versions on their own are perfect, but that assigning weights depending on goals/preferences to the metrics (like the 25/25/25/25 split in the simple average provided) could potentially produce a better game score figure.

In my opinion, almost any sort of weighting method across multiple versions will actually produce a worse result for evaluation, for a few reasons. One is the repetitive nature of some of these versions, for example the “strikeouts and walks” version and the “FIP” version. Like you said, the “FIP” version is just the “previous version” (K’s and BB’s) with HR’s added. The numbers in the two calculations are a little different, but from a simplified viewpoint the general value impacts are the same, so when you average these two together you are essentially creating a new formula where the BB and K impact on the score is doubled (or the HR impact is halved). I don’t see how this provides a better metric for evaluation; it is seemingly saying to take FIP but with HR’s having (roughly) half the effect they usually do. If this were preferable, I would think that would have been the weight used for FIP from the outset.

I don’t want to run on forever, so I will just say the addition of the other two methods only continues to distort the results each one was originally intended to produce. In the end if you wanted to take the time, you could make a formula for the average game score, which would just be a factor of the inputs [IP (used in all 4 versions), R, SO (2 versions), BB (3 versions), HR (2 versions), H]. My rough try at the formula came out to (for whatever it’s worth): [(4.425IP)+(1.25SO)-(2.5R)-(2.25BB)-(5.25HR)-(1.25H)+40] This creates a new metric which, at least to me, has the same issue you had with the original game score method, that it seems arbitrary.
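
Zachary’s combined formula can be checked by averaging the coefficients of the four per-start formulas term by term. The Python sketch below does that with exact fractions to avoid floating-point noise; the dictionary layout is just an illustration. The +40 constant is common to all four versions, so it carries over unchanged.

```python
from fractions import Fraction as F

# Per-start coefficients of Tango's four Game Scores (each also adds +40).
versions = {
    "runs": {"IP": F("6.4"), "R": F(-10)},
    "kbb":  {"IP": F("0.4"), "SO": F(3), "BB": F(-3)},
    "fip":  {"IP": F("2.5"), "SO": F(2), "BB": F(-3), "HR": F(-13)},
    "lwts": {"IP": F("8.4"), "BB": F(-3), "H": F(-5), "HR": F(-8)},
}
stats = ["IP", "SO", "R", "BB", "HR", "H"]

# A stat missing from a version contributes 0 to that version's score.
averaged = {s: sum(v.get(s, 0) for v in versions.values()) / len(versions)
            for s in stats}
print({s: float(c) for s, c in averaged.items()})
# {'IP': 4.425, 'SO': 1.25, 'R': -2.5, 'BB': -2.25, 'HR': -5.25, 'H': -1.25}
```

The result matches the coefficients in the comment: 4.425 IP + 1.25 SO − 2.5 R − 2.25 BB − 5.25 HR − 1.25 H + 40.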

Side note: you stated that the average game scores you calculated were made up of the four new methods provided, but when I ran through the numbers in Excel on my own, I found the results posted to be an average of the four new methods along with the original game score. This obviously doesn’t matter for your article, or for my commentary, as we both agree that either output (4-method or 5-method average) has its flaws. I just wanted to point it out for you.

I am sure that there is some way that weighting the different methods can produce a more effective output, just as I am sure there is a better method out there for evaluating pitchers, but I am not sure how you could possibly discern what that would actually entail; that is the sort of answer we are all searching for when it comes to player evaluation.

Tangotiger said...

I guess you are not familiar with “predictive FIP”, elsewhere at Hardball Times, because that’s pretty much what it comes down to.

***

The purpose of the different Game Scores is that each has a different view as to how to evaluate a pitcher’s performance.

And by laying it out there, each person is therefore free to decide how they want to view that performance… but forces that person to be consistent.

Is a 4-hit, 0-walk game the same as a 0-hit, 6-walk game? Well, fine. But if you are more impressed by the 4-hit, 0-walk game, then you accept that a lot of the hits don’t come down to the pitcher. Do you care for a 12-K game with 4 runs, or would you prefer the 3-K game with 2 runs?

The four Game Scores now open up the conversation, and they allow everyone to have a voice, and for everyone to be heard. They just force consistency: you can’t start choosing it one way for one pitcher, and then decide on a different way for a different pitcher.

Zachary said...

I absolutely agree that each Game Score version is a different but valid view as to how to evaluate a pitcher’s performance. There most likely isn’t a perfect Game Score method, and if there is one, or at least a best method, I couldn’t begin to guess what it would be composed of. I was also completely aware that this article was not meant to be an answer, just the opening of the discussion on the issue, and I had no intention to claim any one individual method was without merit. I also acknowledge I am in no way an expert in any of this, and the majority of the people here are probably significantly better suited to analyze the topic, especially those of you producing the content here, which brings readers like me here hoping to learn. It is very likely that I have made a mistake or did not understand something correctly; my comment was merely what first jumped out at me while reading.

All I was trying to say, which was probably not worded as it should have been to get my point across, is that the act of averaging the 4 methods, and of using a weighting system seems to be flawed from a statistical standpoint, as it seems to create results that don’t line up with the purpose of the score. What I mean is that when the same variable is used in multiple methods the way it is considered is unique to that method and merely combining them and taking an average doesn’t necessarily produce what either method was trying to accomplish. I see this as a possible issue in two ways, although I will say right now that I could be completely off base with my assumptions and understanding, in which case just disregard everything I have been saying…

The first is due to a difference in the weight of a metric. As an example, I look at the “FIP” method and the “Linear weights” method, as both use HR’s and IP. In the “FIP” method an IP is worth 2.5 points toward the Game Score, and giving up a HR costs 13 points. This implies that 5.2 IP offsets 1 HR, or that an IP earns back 19% of the points a HR costs. For the “Linear weights” method an IP is plus 8.4 pts and a HR is minus 8 pts, so .95 IP offsets giving up a HR, or an IP earns back 105% of the points a HR costs. Where I see the issue is that if you add the IP and HR values for the two and then divide by 2, which is how the average Game Score was computed, you get positive 5.45 pts for an IP and negative 10.5 pts for giving up a HR. This says that 1.93 IP offsets giving up a HR, or that an IP earns back 52% of the points a HR costs. Yet if you simply average the two methods’ ratios, you get 3.08 IP per HR, or 62%. The method used for computing the average Game Score produces results that are not actually representative of the middle ground between the different methods, even though the weighting was 50/50. I want to add that I am not 100% confident this is a bad thing, as the final values from both methods’ Game Scores come out pretty similar, so it might just be that everything is already on the same scale and one method simply weights the relationship more heavily than the other. I admit that this is probably very hard to follow, but hopefully the gist of it is clear enough.

The second issue, which is much more straightforward (at least to explain), is that some metrics are used in more methods than others. For example, BB’s are in three of the methods but SO’s are in only two. So essentially one method (linear weights) is saying that SO’s are not important to the Game Score, but by adding its BB weight without any SO weight you again get a relationship that doesn’t really represent what any of the methods was trying to produce. As with the previous issue, this isn’t necessarily a problem, since it reflects that, as a whole, BB’s are considered more important than SO’s when computing the Game Score. The concern is that when you combine SO’s with a method that didn’t use them, there might be a relationship where there wasn’t meant to be one.
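
The arithmetic in the comment is a case of “a ratio of averages is not an average of ratios.” A small Python sketch, using only the IP and HR coefficients from the FIP and linear-weights formulas, reproduces each number; the helper name is hypothetical.

```python
# Points per inning pitched and points lost per home run in each version.
fip = {"IP": 2.5, "HR": 13}
lwts = {"IP": 8.4, "HR": 8}


def innings_per_hr(coeffs):
    """Innings of work needed to offset one home run allowed."""
    return coeffs["HR"] / coeffs["IP"]


avg_coeffs = {k: (fip[k] + lwts[k]) / 2 for k in fip}   # IP 5.45, HR 10.5

print(round(innings_per_hr(fip), 2))         # 5.2
print(round(innings_per_hr(lwts), 2))        # 0.95
print(round(innings_per_hr(avg_coeffs), 2))  # 1.93: average the coefficients first
print(round((innings_per_hr(fip) + innings_per_hr(lwts)) / 2, 2))  # 3.08: average the ratios
```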

I don’t think there is anything conclusive that comes from my concerns above, it just seems to me that simply taking the averages between the different metrics doesn’t necessarily produce a result that is better, at least not in the way it was meant to. I can say that thinking about it more I can see how assigning weights to the different metrics can create a score that will be representative of the merit that each individual method holds in people’s eyes, I’m just not sure that having a value that comes from appeasing the supporters of each different method is necessarily the best way to find a Game Score.

Zachary said...

As an aside, in response to the comment on “predictive FIP”, I am not entirely sure I follow what you are trying to say. From what I understand, you are saying the pFIP calculation is similar, at least loosely, to how I was looking at the weighting of the “SO’s and BB’s” method and the “FIP” method. If so, then maybe using pFIP to calculate the Game Score would be of interest. What I was saying in the highlighted line was that we have a new formula for a game score that uses HR, BB, and SO that is different from the FIP formula used for the Game Score. If this new formula is actually preferable, and it may be, then doesn’t that imply that the original FIP formula is less preferable?

Sorry this was so long, I really did not intend it, just became very interested/curious on the issue the more I think about it, which I guess means the article was successful in opening the topic for conversation.

Jim said...

I came up with a totally different one years ago, which I will throw out here. I’m not in love with this one either, but it does reward excellence and not mediocrity.

Jim’s Modified game scores

1. Must pitch at least 7 innings.

2. Add one point for each batter retired after 21.

3. Subtract (or add) the difference between:
   - Strikeouts and walks
   - Hits and four
   - Earned runs and modified quality start
   - Average pitches per inning and eleven

My modified quality start is a straight ERA of 3.00 or less.
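
Because the comment says “subtract (or add)” without fixing a sign for each term, here is one possible reading as a Python sketch: every term is oriented so that a better performance adds points, and the 3.00-ERA quality start is converted to an allowed earned-run total of IP/3. The sign convention, the function name and the sample start are assumptions, not Jim’s exact method. The comment also doesn’t say whether a baseline is added, so the sketch returns the raw sum.

```python
def jims_modified_score(outs, strikeouts, walks, hits, earned_runs, pitches):
    """One interpretation of Jim's modified game score (sign convention assumed)."""
    if outs < 21:
        return None                       # must pitch at least 7 innings
    innings = outs / 3
    score = outs - 21                     # one point per batter retired after 21
    score += strikeouts - walks
    score += 4 - hits
    score += innings / 3 - earned_runs    # ER allowed at a 3.00 ERA vs. actual ER
    score += 11 - pitches / innings       # eleven vs. average pitches per inning
    return score


# Hypothetical complete game: 27 outs, 8 K, 1 BB, 5 H, 1 ER, 99 pitches.
print(jims_modified_score(27, 8, 1, 5, 1, 99))  # 14.0
```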

This can result in a negative game score, which I guess is the reason Bill James added 50 at the beginning, so no one would be negative for a game. On April 13, 2012, Matt Cain had a 91 using Bill James’ and a 72.22 using mine.

Mine eliminates the 6-inning pitchers, which eliminates a lot of games. However, if we develop a team game score to include relievers, this might be good. But then, we could use WPA for that. Doing this would make it non-attribute, and the current generation of metrics can’t stand team effort and the inability to not blame someone.

Carl said...

Carl’s Game Score:

Take the prior year’s winning percentage for each IP/ER combo and assign that percentage to each start with the corresponding IP/ER. Each individual game will be the fraction of a win earned, and the sum of a pitcher’s individual games will be his total adjusted wins. Subtract the adjusted wins from starts to get adjusted losses.

Using entire starts eliminates the situation where a starter is dominant one night and terrible the next, yet his .500 record looks unfair given lots of K’s, low walks, etc. It also eliminates the problem of knuckleballers (and other pitchers such as Hudson) who outperform their FIP.
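
A minimal Python sketch of the procedure described above, under stated assumptions: each prior-season start is a hypothetical `(outs, earned_runs, team_won)` tuple, innings are stored as outs, and the sample data are made up. The comment doesn’t say what to do with an IP/ER line that never occurred in the prior season, so this sketch simply raises a `KeyError` for one.

```python
from collections import defaultdict


def win_pct_table(prior_games):
    """Team winning percentage for each (outs, earned runs) line last season."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for outs, earned_runs, team_won in prior_games:
        games[(outs, earned_runs)] += 1
        wins[(outs, earned_runs)] += int(team_won)
    return {line: wins[line] / games[line] for line in games}


def adjusted_record(starts, table):
    """Adjusted wins = sum of each start's win percentage; losses = starts - wins.

    A line never seen in the prior season raises KeyError.
    """
    adjusted_wins = sum(table[line] for line in starts)
    return adjusted_wins, len(starts) - adjusted_wins


# Made-up prior season: three 6 IP / 1 ER starts (team went 2-1),
# one 7 IP / 0 ER start (won) and one 5 IP / 5 ER start (lost).
prior = [(18, 1, True), (18, 1, True), (18, 1, False), (21, 0, True), (15, 5, False)]
table = win_pct_table(prior)
wins, losses = adjusted_record([(18, 1), (21, 0), (15, 5)], table)
print(round(wins, 2), round(losses, 2))  # 1.67 1.33
```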

No ma'am we're musicians said...

I guess what bothers me about such efforts is the lumping of all walks into the ‘bad’ bin. I’ve seen situations where the walks did turn out bad for the defense, but I tend to remember more times when the walk of a hot hitter led to the scoreless end of the inning. Conversely, I’ve seen hits by a slow runner clog up the bases enough so that instead of being a blowout inning, only a couple of runs are picked up.

Some sort of different approach is needed, where the results of an inning factor back into the events.

bucdaddy said...

I can’t stand decimals either, in situations like this.

—-

Heh, I guess it’s just a glitch in the way the clocks in clock sports operate, but it amuses/irks me that the timers in, say, basketball start running decimals in the last minute, and that announcers feel obligated to include them. “Time out, Lakers, with 57.6 seconds to play …” I mean, there DOES come a point when the decimals matter to actual strategy, but that point is with 3.8 or 5.1 seconds to go, and not much sooner.

Dave Cornutt said...

I seem to recall that when James first published the Game Score formula and his original observations, he stated that he regarded it as a toy: fun to play with, but of questionable value as to enlightenment. However, it’s a good point that GS is an attempt to do a better version of what the Quality Start (which goes back to the dawn of sabermetrics) does. (Hey, does anyone remember Runs Created…) The main quibble I have with the way this is presented is that, as with all counting stats, you need some idea of context in order to really understand what they are telling you. The context, in this case, is starts, and given that the number of starts per season for a pitcher is fairly small to begin with, summing up the game scores could be misleading. Someone who piles up a bunch of average starts could end up with a higher total than a good pitcher with slightly fewer starts. (Consider the #5 starter on a really good staff vs. the #2 starter on a poor staff: who’s likely to have more starts?)

I think that average GS per start might be more enlightening. I might try to run that tonight.

MrMan said...

@Jim

Not sure if you’re serious or not.

But if you added all the money to the right of the decimal in my paychecks last year you come up with $12.45. Now, $12.45 isn’t worthless, you can have a decent meal with it or purchase a good Online game via XBOX Live or even make it halfway to a BPiA Boomstick.

But in terms of using it to evaluate my overall compensation… it would be meaningless. And realize that’s the aggregate of all my paychecks; for any one paycheck, the amount would be at most $0.99.

The point is that .99 doesn’t tell you or me anything leading to knowledge; it’s simply numerals. Numerals whose presence actually interferes with making sense of the thing you’re trying to make sense of.

So…it’s not a fear of numbers…it’s a prioritization of the numbers that matter.

Tangotiger said...

Jim: we’re arguing for rounding, not truncating.

I can’t believe you would use the plot device of Superman III to make your case. (And I can’t believe they made a worse Superman movie after that one.)

Carl said...

Okay guys, I’m intrigued enough to sacrifice a few weekends to get this done.

Can someone please direct me to where I can get a download (preferably in xls or csv format) of all 2010-2012 games, reflecting the last name/first name of the starter, the IP by that starter in the game, the runs allowed by each starter in the game, and whether the team (not the pitcher) won or lost the game? I’m envisioning a large file with 14,580 rows of data and 5 columns to start the analysis.

Thank you to all who can help me finish my journey out of the Dark into Saberland.

Tangotiger said...

Carl: you can get it at Baseball-Reference.com’s Play Index.

But you are going to find what can be explained by PythagenPat. If league average is 4.3 runs per 9 IP, and you have a pitcher who gave up, say, 1 run in 6 IP, this is what you have:

Team Runs scored = 4.3

Team Runs allowed = 1 + 4.3/9*3 = 2.43

And that will give you a win% of 72.5%.

My game score for that is:

Game Score = 6.4*6 – 10*1 + 40 = 68.4.

Which seems close enough for a crude measure.
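
Tango’s calculation can be reproduced with a short Python sketch of PythagenPat, where the exponent is (runs per game, both teams combined) raised to a power z. The value z = 0.28 is an assumption; figures around 0.28 to 0.29 are commonly used. With 0.28 the result lands at about 72.5%, matching the figure above, while z = 0.287 gives roughly 72.8%. As Tango notes further down, this is the winning percentage for a team that *averages* these run totals, not one that allows exactly this many runs in a single game.

```python
def pythagenpat_win_pct(runs_scored: float, runs_allowed: float, z: float = 0.28) -> float:
    """Expected winning percentage under PythagenPat (z = 0.28 is an assumed value)."""
    exponent = (runs_scored + runs_allowed) ** z
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


league_runs_per_9 = 4.3
# Starter allows 1 run in 6 IP; the bullpen allows league average over the other 3 IP.
team_runs_allowed = 1 + league_runs_per_9 / 9 * 3

print(round(team_runs_allowed, 2))                                          # 2.43
print(round(pythagenpat_win_pct(league_runs_per_9, team_runs_allowed), 3))  # 0.725
```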

Jim said...

I’m answering two posts directed to me at once.

Okay Tango, after rounding, what have you got? Truncation! And there would have been no clear-cut Triple Crown winner last year, because both Trout and Cabrera would have batted 33 or .33.

No, if you have your employer send to my bank account everything to the right of the decimal in your paycheck and I get 999 other people to do the same, that will add a good chunk to my disposable income. Figuring the average amount at 50 cents, times 1000 deposits per week, keeps me in Dale’s, I guarantee.

And all numbers matter, otherwise there wouldn’t be 10 (counting the zero, of course). Hope you are not getting this confused with prime numbers.

And to think this started out because someone complained about aesthetics, wow.

Tangotiger said...

If my paycheck is $100.60, I’d get $101. If someone else’s paycheck is $100.40, he’d get $100. Either way, our company is paying out $201. That’s why rounding works: given a large enough number of employees, things will work out.

***

Carl, I just wrote this up:

http://tangotiger.com/index.php/site/article/tangos-lab-deconstructing-game-score

Jim said...

For those of you who don’t like decimals, would you send me everything to the right of the decimal on your paycheck? If it’s not important to you, I will be able to use it. You guys are scared of numbers, methinks.

Tangotiger said...

The purpose of FIP is to be “descriptive”. The formula can’t change. It’s its raison d’etre.

The purpose of pFIP is, by definition, to be “predictive”.

Predictive is based on historical data, and that means to look backwards and figure out what the pitcher was really responsible for, and the rest was just random variation. (A descriptive stat still counts that random variation, while a predictive one removes it.)

By averaging the two game scores, we are reducing that random variation (somewhat).

***

And you can actually make a stronger case that you don’t like the averaging by comparing the walk to hit, and noting that the coefficient of the walk will be greater than the non-HR hit. Seems weird, but, we’re getting into trying to remove the random variation.

Rolling Fingers said...

After admittedly scrolling rather quickly through all the discussion about decimal places, we are still left to wonder why there are any fractions in the first table. The Bill James formula always yields an integer. The sum of a column of integers should always be an integer.

Carl said...

Tom Tango,

Thank you for both the suggestion to use baseball-reference.com (will subscribe tomorrow) and the other link.

The use of Pythag to get winning percentage is cool. After just a quick look, though, I think the small differences (i.e., 3 game score points in the example above) break down at the extremes. For example, a team that allows 8 runs in 9 innings, to me, is unlikely to win 22% of the time. Conversely, if in the next game that same team allows only 2 runs in 9 innings, are they going to win only 78% of the time?

Once done with my analysis, I will ask the HBT folks if they will publish my research as an article.

Tangotiger said...

Right, as I noted in the comments on my blog, I made a mistake. A team that AVERAGES allowing 2 runs will win 78% of the time, but if they allow EXACTLY 2 runs, the calculation will be different.

Tangotiger said...

As for the Bill James Game Score, since the decimals are all the .33 and .66 variety, it’s clear that Matt gave out points for partial innings, which is not correct.

Matt Hunter said...

Apologies, I must have had an error in my formula somehow.

Thanks for all the fantastic feedback, everyone. Definitely lots of material to think about and improve on for the future.

Hardwood said...

I do love watching people fail at statistics. Makes for quite the Friday night.

bucdaddy said...

IIRC, sending fractions of a cent into your own account was a plot point in “Office Space.” It worked out to be a little more than the penny-ante (literally) thieves thought it would.