Summing up Game Score

by Matt Hunter
April 18, 2013

I’m on a bit of a pitcher evaluation kick at the moment. Just a couple of days ago, I wrote about crowdsourcing balls in play at Beyond the Box Score.

More importantly, two weeks ago I had an idea: instead of measuring starting pitching performances on an inning or plate appearance basis, why don’t we evaluate them on a game-by-game basis? Since (team) wins are the end goal of a pitcher, and since each game is basically independent, we could evaluate an entire season simply by evaluating each start, and summing them up.

So how do we evaluate a single start? Traditionally, we have used pitcher wins. Then, those who wanted to ignore the effect of the pitcher’s team offense thought of the Quality Start. But do we really want to say that a six-inning, three-run start (4.50 ERA) is quality? No is the answer. No we don’t.

There wasn’t a great way to evaluate a single start, so Bill James, doing what Bill James does best, created something called Game Score. Here’s the formula for Game Score:

Game Score = Outs + 2*(innings completed after the fourth) + strikeouts – 2*hits – 4*earned runs – 2*unearned runs – walks + 50
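For anyone who wants to compute it themselves, here is a minimal Python sketch of the formula above. The function name and parameter names are my own assumptions for illustration; it takes outs recorded rather than innings pitched so that partial innings don't need special handling.

```python
def james_game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James' Game Score for a single start, per the formula above."""
    # "Innings completed after the fourth" counts whole innings only.
    innings_after_fourth = max(0, outs // 3 - 4)
    return (
        50
        + outs
        + 2 * innings_after_fourth
        + strikeouts
        - 2 * hits
        - 4 * earned_runs
        - 2 * unearned_runs
        - walks
    )


# Hypothetical line (not a real start): 6 IP, 6 H, 3 ER, 0 unearned, 2 BB, 4 SO.
print(james_game_score(outs=18, strikeouts=4, hits=6,
                       earned_runs=3, unearned_runs=0, walks=2))  # prints 50
```

That made-up six-inning, three-run start lands at exactly 50, the starting value.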

It was a pretty good start, but far from perfect. Weighting earned runs twice as strongly as unearned runs seems arbitrary, as does counting only innings after the fourth. I won’t get into the specifics of what’s wrong with this Game Score, because it doesn’t really matter for my purposes. But, because it will be a good reference, I’ll show you the leader board for the sum of each pitcher’s Game Score for each start in the 2012 season:

Num	Name	GS
1	Clayton Kershaw	2089.33
2	Justin Verlander	2072.66
3	R.A. Dickey	2057.33
4	Felix Hernandez	1969.66
5	Matt Cain	1947.66
6	Zack Greinke	1917.66
7	David Price	1914.99
8	Gio Gonzalez	1912.66
9	Johnny Cueto	1907
10	James Shields	1893.33
11	Kyle Lohse	1885
12	Mat Latos	1870
13	Jake Peavy	1862.99
14	Cole Hamels	1859.66
15	Hiroki Kuroda	1858.33
16	Madison Bumgarner	1837.66
17	Yovani Gallardo	1828.66
18	Jordan Zimmermann	1797
19	C.J. Wilson	1796.33
20	Jason Vargas	1787.66

Looks like it passes the sniff test to me. Let’s move on.

A couple years ago, Tom Tango introduced a few alternatives to James’ Game Score, each one based on a different method of evaluating pitchers. Let’s summarize them.

Runs

The first new version of Game Score cares only about runs allowed. It’s essentially the Game Score version of RA9. Here’s the formula (again, as formulated by Tango):

Game Score = 6.4*IP – 10*R + 40

And the 2012 leader boards for total Game Score:

Num	Name	Runs GS
1	Clayton Kershaw	2077.06
2	R.A. Dickey	2049.06
3	Justin Verlander	2035.33
4	Johnny Cueto	1978.8
5	Felix Hernandez	1964.8
6	David Price	1960.39
7	Matt Cain	1953.73
8	Kyle Lohse	1940.4
9	Zack Greinke	1878.93
10	Hiroki Kuroda	1865.86
11	Gio Gonzalez	1865.73
12	Jordan Zimmermann	1842.26
13	Matt Harrison	1825.33
14	Cole Hamels	1818.13
15	Jake Peavy	1801.59
16	Mat Latos	1789.73
17	Jason Vargas	1780.93
18	Jered Weaver	1777.46
19	Yovani Gallardo	1765.6
20	Cliff Lee	1760.4

Strikeouts and walks

Here we have the other end of the spectrum; instead of considering only runs allowed, this version is going to be based only on strikeouts and walks, and nothing else. It’s basically the Game Score version of kwERA.

Game Score = 0.4*IP + 3*(SO–BB) + 40

And the leader boards:

Num	Name	KBB GS
1	Justin Verlander	1958.33
2	R.A. Dickey	1947.06
3	Clayton Kershaw	1924.06
4	Felix Hernandez	1913.8
5	James Shields	1912.06
6	Zack Greinke	1882.93
7	Max Scherzer	1874.06
8	Cole Hamels	1827.13
9	Cliff Lee	1821.4
10	Ian Kennedy	1811.33
11	Madison Bumgarner	1807.33
12	Jake Peavy	1805.59
13	Matt Cain	1796.73
14	Mat Latos	1793.73
15	Johnny Cueto	1784.8
16	Yovani Gallardo	1779.6
17	David Price	1768.39
18	Adam Wainwright	1764.46
19	Hiroki Kuroda	1761.86
20	Gio Gonzalez	1761.73

FIP

See the previous version, but add home runs, and you have the FIP version. There’s really not too much else to say. As always, Tango’s formula:

Game Score = 2.5*IP + 2*SO – 3*BB – 13*HR + 40

Leader board:

Num	Name	FIP GS
1	Felix Hernandez	1996
2	Justin Verlander	1972.83
3	Clayton Kershaw	1965.16
4	R.A. Dickey	1906.66
5	Zack Greinke	1894.83
6	Johnny Cueto	1875.5
7	Gio Gonzalez	1856.33
8	James Shields	1842.16
9	Adam Wainwright	1802.66
10	David Price	1798.49
11	Matt Cain	1791.33
12	Kyle Lohse	1775.5
13	Madison Bumgarner	1754.83
14	Cole Hamels	1751.33
15	Max Scherzer	1738.16
16	Hiroki Kuroda	1731.16
17	Mat Latos	1723.33
18	Jake Peavy	1720.49
19	Cliff Lee	1719.5
20	Jordan Zimmermann	1718.16

Linear weights

Last one! This time, we’re going to use a simplified version of linear weights, looking only at walks, hits and home runs.

Game Score = 8.4*IP – 3*BB – 5*H – 8*HR + 40

Leader board:

Num	Name	LWTS GS
1	Clayton Kershaw	2080.39
2	Justin Verlander	2035.99
3	R.A. Dickey	1984.39
4	Felix Hernandez	1943.8
5	Matt Cain	1919.39
6	Gio Gonzalez	1918.4
7	Kyle Lohse	1869.4
8	Johnny Cueto	1865.8
9	David Price	1848.39
10	Zack Greinke	1837.59
11	James Shields	1824.39
12	Mat Latos	1818.39
13	Jake Peavy	1804.59
14	Madison Bumgarner	1801.99
15	Hiroki Kuroda	1793.19
16	Cole Hamels	1759.8
17	Jered Weaver	1754.79
18	C.J. Wilson	1735.6
19	Jordan Zimmermann	1726.6
20	Adam Wainwright	1701.8

Average

Now, it’s almost certain that none of these versions of Game Score is perfect on its own. However, as Tango said in the article a few years ago, we can assign weights to each one depending on our goals or preferences. Unfortunately, right now, I’m not sure how to do that. Maybe that will be a project for a future article. For now, I’m going to give you the average of all four new versions of Game Score.

Num	Name	Avg GS
1	Clayton Kershaw	2027.2
2	Justin Verlander	2015.028
3	R.A. Dickey	1988.9
4	Felix Hernandez	1957.612
5	Zack Greinke	1882.388
6	Johnny Cueto	1882.38
7	Matt Cain	1881.768
8	Gio Gonzalez	1862.97
9	David Price	1858.13
10	James Shields	1845.8
11	Kyle Lohse	1838.54
12	Cole Hamels	1803.21
13	Hiroki Kuroda	1802.08
14	Jake Peavy	1799.05
15	Mat Latos	1799.036
16	Madison Bumgarner	1789.028
17	Jordan Zimmermann	1755.656
18	Cliff Lee	1744.14
19	Yovani Gallardo	1741.292
20	Max Scherzer	1732.132

This list looks good, but it is far from a perfect way to evaluate pitchers. It doesn’t take into account park or league factors, which are incredibly important. However, if you’re looking for a different way to evaluate pitchers that takes many different factors into account, this is something to consider.
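For reference, the four Tango formulas quoted above, plus a plain or weighted blend of them, can be sketched in Python as follows. The function names, the `weights` parameter, and the sample stat line are assumptions for illustration only. Note that `ip` here means true fractional innings (outs divided by 3), not box-score notation like 6.1.

```python
def runs_gs(ip, r):
    # Runs version
    return 6.4 * ip - 10 * r + 40


def kbb_gs(ip, so, bb):
    # Strikeouts-and-walks version
    return 0.4 * ip + 3 * (so - bb) + 40


def fip_gs(ip, so, bb, hr):
    # FIP version
    return 2.5 * ip + 2 * so - 3 * bb - 13 * hr + 40


def lwts_gs(ip, bb, h, hr):
    # Simplified linear weights version
    return 8.4 * ip - 3 * bb - 5 * h - 8 * hr + 40


def blended_gs(ip, r, so, bb, h, hr, weights=(1, 1, 1, 1)):
    """Weighted average of the four versions; equal weights give a plain average."""
    scores = (
        runs_gs(ip, r),
        kbb_gs(ip, so, bb),
        fip_gs(ip, so, bb, hr),
        lwts_gs(ip, bb, h, hr),
    )
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical start (not real data): 7 IP, 2 R, 8 SO, 2 BB, 6 H, 1 HR.
print(round(blended_gs(ip=7, r=2, so=8, bb=2, h=6, hr=1), 3))
```

Changing `weights` is one simple way to express a preference for, say, the runs version over the FIP version; how to choose those weights is still the open question.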

Conclusion

There you have it. For your reference, here’s a Google Docs spreadsheet of all the versions of Game Score for every pitcher who made at least one start in 2012.

Before I go, because I didn’t do a whole lot of actual analysis, here are some of my ideas at the moment for where to go next with these data:

Include park and league factors
Combine these versions of Game Score with varying weights
Convert Game Score to wins
Look at total Game Score over a career
Probably much, much more. Stay tuned!

Thanks again to Tom Tango for the inspiration and, honestly, most of the real analysis. Also thanks to James Gentile for the Retrosheet help.

Matt is the founder of SaberSim, a daily sports projections and analytics company. Follow him on Twitter @MattR_Hunter and @SaberSim, or email him here and tell him all the things he should do to make the site better.

33 Comments

MrMan

11 years ago

Someone help me with something that continually perplexes me. I see on Fangraphs and sites like this a lot of really good, insightful analysis and well-thought out approaches to mining numbers.

But I also see a lot of poorly presented information. For example, in the very simple tables used on this post it’s difficult for readers to quickly grasp the meaning simply due to the poor formatting of the numbers.

Currently the numbers are presented in a centered format with two decimals. First, the two decimals are meaningless when you’re dealing with numbers in the thousands; there’s no meaningful difference between 1897.54 and 1897.21. The decimals should not be included.

Second, the one thousand comma indicator is not included when it should be. Any time you’re dealing with numbers in the thousands the comma provides a visual cue to the eye that enables the reader to more quickly make sense of the numbers he’s looking at.

Finally, the numbers are centered when they should be right-aligned. This creates confusion for the user as digits lined up vertically are sometimes the 1st digit in a 4-digit number, sometimes the 2nd.

Combined, these three factors create somewhat of a mess for the reader. Now this is a simple table and it simply makes the reader work a little more to make sense of the content. But it is indicative of a weakness I see in much SABR work.

Same thing goes for graphs and charts; I see a lot of lazy, poorly designed charts. Charts should take hard-to-comprehend numbers and make them easier to comprehend but I see a lot of charts that don’t actually make the information any more sensible than a raw table.

The presentation of data / findings is often as important as the actual content in communicating the meaning. I don’t understand why more effort isn’t put into the presentation, especially when you consider how long it would take to make the changes I note above (less than a minute).
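The three suggestions in the comment above (drop the decimals, add a thousands separator, right-align) can be sketched with Python's format specifiers. The `format_row` helper and the column widths are assumptions for illustration; the three sample rows are taken from the first leader board in the article.

```python
rows = [
    ("Clayton Kershaw", 2089.33),
    ("Justin Verlander", 2072.66),
    ("Johnny Cueto", 1907),
]


def format_row(rank, name, score):
    # ">7,.0f": right-align in 7 characters, thousands comma, no decimals.
    return f"{rank:>3}  {name:<20}{score:>7,.0f}"


for rank, (name, score) in enumerate(rows, start=1):
    print(format_row(rank, name, score))
```

Because the score column is right-aligned at a fixed width, the thousands digits line up vertically regardless of how long each name is.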

Matt Hunter

11 years ago

Thanks for the feedback MrMan. You’re completely correct, and I should have taken more care in presenting these tables well. I’ll keep your suggestions in mind for future tables, graphs, and charts.

Jim

11 years ago

If you take off the 50 free points per start then you can subtract 1650 from both Kershaw and Verlander and probably others. This takes care of the comma problem for those who don’t read numbers well.

I guess another reason for the 50 points is if a pitcher starts, throws one pitch that is not hit and then departs, his game score is 50. Wow!

MrMan

11 years ago

Matt

First, that’s a positive attitude. Most people don’t respond to any criticism with “you’re completely correct”. Second, you’re obviously smarter than I when it comes to baseball analytics; but if you ever do want some help in presenting data it’s basically what I do for a living and would be happy to help.

@Jim….I’m with you in not really understanding the granting of the 50 points per start. This accounts for as much as 80% to 90% of the overall score. Seems like a generous number for simply getting up on the mound and is too heavy compared to how difficult it is to accumulate additional numbers after that point.

Tangotiger

11 years ago

I can’t stand decimals either, in situations like this. If it has no meaningful difference, it shouldn’t be shown.

And this applies to ERA (should be to one decimal place), OBP, SLG (to two decimal places), and so on. At least for those, they can rely on inertia/tradition for their obstinance.

It’s why Bill James shows 107 runs created, and not 107.2.

***

My Game Score starts at 40, not 50, and someone recommended I start at 30 (and make each of the points be earned to get to the average of 50). Starting at 50 is problematic, precisely for the 1-pitch scenario.

Game Score is equivalent to win%, and so, by starting at 40 (or 30), you are explicitly starting everyone at “replacement level”.

***

Carl: what you are asking is pretty much exactly my first version of Game Score.

Carl

11 years ago

Tom Tango/Tangotiger (I assume same person)

I had missed how similar my proposal was to your #1 proposal. Nice catch. I do think my proposal would take that and tweak it by:
1) not giving 10 marginal runs as a win
This is important, I believe, to adjust for different scoring periods (1960s vs 1920s vs steroid era vs deadball era) and
2) requiring minimum 5 innings (hidden in looking at wins for IP/ER allowed)
3) using ER instead of R to avoid differences in official scorers.
4) not adding 40.

Are you up to the challenge of creating such a tool and analysis? Would be awesome to see leaders for a year and more controversial starters’ careers, e.g., Blyleven, John and Kaat.

Tangotiger

11 years ago

Yes, I would tweak it by era. The 10 runs per win is good enough though. Most eras will have it between 9 and 11, and, really, does it matter if the Game Score will show 64 instead of 67? Anyway, I agree that the “10” should be flexible.

***

“using ER instead of R to avoid differences in official scorers”

?? You are confused here. R has NO interpretation. ER is subject to official scorer interpretation. So, to avoid differences, you want R. You are arguing against yourself here.

***

And the “40” or “30” or whatever to add, is simply to give it a scale we can understand. If I say the average is “50” and 99% of the pitchers will be between 0 and 100, and that the number corresponds to chances of winning, isn’t that a good enough reason to do that?

***

I put out 4 different Game Scores, which is MY challenge to everyone. I’m sitting this one out, and just pointing out the issues as I see them.

Carl

11 years ago

Tom Tango,
1) You are 100% right that I got myself confused w ER/R and while I had proposed ER above, I should not have. Using R is the superior measuring stick.

2) I don’t care for the 10, as while it really doesn’t matter for individual games (as you say, 64 vs 67 doesn’t really matter that much). I want to avoid the situation where the amount of the loss affects an adjusted W/L record. For example, a SP who gives up 8 runs in 1/3 of an inning will get the loss 99.99% of the time. One who gives up 18 runs in 1/3 of an inning will get the loss 99.99% of the time. Still, when doing game-by-game summations to get a W-L record, my way would count for 1 loss while using 10 (or 11 or 9 by era) would get 2 losses.

Both game scores are going to be at or near zero (rightfully so), but I only care when compiling a season’s/career adjusted W/L record.

Sorry you didn’t take up the challenge. I simply don’t have time to pick up my own gauntlet. Anyone else out there? Bill James sitting around w a free afternoon?

Carl

11 years ago

PS> Tom Tango, Loved the original article and remembered it as soon as I clicked on the link above.

Zachary

11 years ago

I know you didn’t mean too much by averaging the different metrics, so this is not any sort of criticism but just observations if you actually decide to attempt an improved game score.
You said that it is almost certain none of the versions on their own are perfect, but that assigning weights depending on goals/preferences to the metrics (like the 25/25/25/25 split in the simple average provided) could potentially produce a better game score figure.
In my opinion almost any sort of weighting method across multiple versions will actually produce a worse result for evaluation for a few reasons. One is the repetitive nature of some of these versions, for example the “strikeouts and walks” version and the “FIP” version. Like you said the “FIP” version is just the “previous version” (k’s and bb’s) but adding hr’s. The numbers are a little different in the two calculations, but looking at it from a simplified viewpoint where the general value impacts are the same, when you average these two together, you are essentially just creating a new formula where the bb and k impact on the score is double (or the hr impact is halved). I don’t see a way that this provides a better metric for evaluation, it is seemingly saying take FIP but with HR’s having half (roughly) the effect they usually do. If this was preferable I would think that would be the weight used for FIP from the outset.
I don’t want to run on forever, so I will just say the addition of the other two methods only continues to distort the results each one was originally intended to produce. In the end if you wanted to take the time, you could make a formula for the average game score, which would just be a factor of the inputs [IP (used in all 4 versions), R, SO (2 versions), BB (3 versions), HR (2 versions), H]. My rough try at the formula came out to (for whatever it’s worth): [(4.425IP)+(1.25SO)-(2.5R)-(2.25BB)-(5.25HR)-(1.25H)+40] This creates a new metric which, at least to me, has the same issue you had with the original game score method, that it seems arbitrary.
Side note: you stated that the average game scores you calculated was made up of the four new methods provided, but when I just ran through the excel on my own, I found the results posted to be an average of the four new methods along with the original game score. This obviously doesn’t matter for your article, or for my commentary, as we both agree that either output (4 method or 5 method average) has its flaws. I just wanted to point it out for you.
I am sure that there is some way that weighting the different methods can produce a more effective output, just as I am sure there is a better method out there for evaluating pitchers, but I am not sure how you could possibly discern what that would actually entail; that is sort of an answer we are all searching for when it comes to player evaluation.
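The combined formula in the comment above can be checked by averaging the coefficients of the four Tango formulas from the article. This is only a sketch: the `VERSIONS` dictionary layout and the function name are assumptions for illustration, with negative numbers marking the terms that are subtracted.

```python
from collections import defaultdict

# Coefficients copied from the four formulas in the article.
VERSIONS = {
    "runs": {"IP": 6.4, "R": -10, "const": 40},
    "kbb": {"IP": 0.4, "SO": 3, "BB": -3, "const": 40},
    "fip": {"IP": 2.5, "SO": 2, "BB": -3, "HR": -13, "const": 40},
    "lwts": {"IP": 8.4, "BB": -3, "H": -5, "HR": -8, "const": 40},
}


def averaged_coefficients(versions):
    """Average each stat's coefficient over all versions (missing stats count as 0)."""
    totals = defaultdict(float)
    for coeffs in versions.values():
        for stat, weight in coeffs.items():
            totals[stat] += weight
    return {stat: total / len(versions) for stat, total in totals.items()}


for stat, value in averaged_coefficients(VERSIONS).items():
    print(stat, round(value, 3))
```

Rounded to three decimals, this gives 4.425 for IP, 1.25 for SO, -2.5 for R, -2.25 for BB, -5.25 for HR, -1.25 for H and a constant of 40, which matches the formula in the comment.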

Tangotiger

11 years ago

If this was preferable I would think that would be the weight used for FIP from the outset.

I guess you are not familiar with “predictive FIP”, elsewhere at Hardball Times, because that’s pretty much what it comes down to.

***

The purpose of the different Game Scores is that each has a different view as to how to evaluate a pitcher’s performance.

And by laying it out there, each person is therefore free to decide how they want to view that performance… but forces that person to be consistent.

Is a 4-hit, 0-walk game the same as a 0-hit 6-walk game? Well, fine. But, if you are more impressed by the 4-hit 0-walk game, then you accept that a lot of the hits don’t come down to the pitcher. Do you care for a 12-K game with 4 runs or would you prefer the 3-K game with 2 runs?

The four Game Scores now open up the conversation, and it allows everyone to have a voice, and for everyone to be heard. It just forces consistency, that you can’t start choosing it one way for one pitcher, and then decide on a different way for a different pitcher.

Zachary

11 years ago

I absolutely agree that each Game Score version is a different but valid view as to how to evaluate a pitcher’s performance. There most likely isn’t a perfect Game Score method, and if there is, or at least a best method, I couldn’t begin to guess as to what it would be composed of. I also was completely aware that this article was not meant to be an answer, just the opening of the discussion on the issue, and I had no intention to claim any one individual method was without merit. Also I acknowledge I am in no ways an expert in any of this, and the majority of the people here probably are significantly better suited to analyze the topic, especially those of you who are producing the content on here which brings readers like myself here hoping to learn. It is very likely that I have made a mistake or did not understand something correctly; my comment was merely what first jumped out at me while reading.
All I was trying to say, which was probably not worded as it should have been to get my point across, is that the act of averaging the 4 methods, and of using a weighting system seems to be flawed from a statistical standpoint, as it seems to create results that don’t line up with the purpose of the score. What I mean is that when the same variable is used in multiple methods the way it is considered is unique to that method and merely combining them and taking an average doesn’t necessarily produce what either method was trying to accomplish. I see this as a possible issue in two ways, although I will say right now that I could be completely off base with my assumptions and understanding, in which case just disregard everything I have been saying…
The first is due to a difference in weight of a metric. As an example I look at the “FIP” method and the “Linear weights” method, as both use HR’s and IP. In the “FIP” method an IP is 2.5 points towards the Game Score, and giving up a HR is 13 points from your Game Score. This implies that 5.2 IP and 1 HR are worth 0 pts, or an IP is 19% of the pts of HR back. For the “Linear weights” method an IP is plus 8.4 pts and a HR is minus 8 pts, so .95 IP equalizes with giving up a HR, or an IP is 105% of the pts back from a HR. Where I see the issue is that if you add the IP and HR values for the two then divide by 2, which is how the average Game Score was computed, you get positive 5.45 pts for an IP and negative 10.5 pts for giving up a HR. This is saying that 1.93 IP equalizes with giving up a HR, or an IP is 52% of the pts back for a HR. Yet if you merely average the relationship values of the two methods you get 3.08 IP equalizes with 1 HR, or an IP is 62% of the pts back for a HR. The method used for computing the average Game Score produces results that are not actually representative of the middle ground between the different methods, even though the weighting was 50/50. I want to add that I am not 100% confident this is a bad thing, as the final values from both method’s Game Scores come out to pretty similar values, so it might just be that everything is already on the same scale and one method just views the significance of the relationship more than the other. Also I admit that this is probably very hard to follow, but hopefully the gist of it is clear enough.
The second issue, which is much more straightforward, at least to explain, is that some metrics are used in more methods than others. For example BB’s are in three but SO’s are in two of the methods. So essentially one method is saying that SO’s are not important to the Game Score, but by adding the BB weight without any SO weight you again get a relationship that really doesn’t represent what any of the metrics was trying to produce. Like with the previous issue this isn’t necessarily a problem as it incorporates that as a whole the impact of BB’s is considered more important when computing the Game Score than SO’s are. The concern is that when you use the metric of SO’s with a method that didn’t use it there might be a relationship where there wasn’t meant to be one.
I don’t think there is anything conclusive that comes from my concerns above, it just seems to me that simply taking the averages between the different metrics doesn’t necessarily produce a result that is better, at least not in the way it was meant to. I can say that thinking about it more I can see how assigning weights to the different metrics can create a score that will be representative of the merit that each individual method holds in people’s eyes, I’m just not sure that having a value that comes from appeasing the supporters of each different method is necessarily the best way to find a Game Score.
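The first issue in the comment above comes down to the fact that a ratio of averaged coefficients is not the same as an average of ratios. A short Python sketch, using only the IP and HR coefficients from the FIP and linear weights formulas (the variable names are assumptions for illustration):

```python
# (points per IP, points lost per HR) for each version, from the article's formulas.
fip_ip, fip_hr = 2.5, 13
lwts_ip, lwts_hr = 8.4, 8

# How many innings it takes to cancel out one home run in each version.
fip_ratio = fip_hr / fip_ip      # 5.2 IP per HR
lwts_ratio = lwts_hr / lwts_ip   # about 0.95 IP per HR

# Averaging the coefficients first (what averaging the Game Scores does)...
avg_ip = (fip_ip + lwts_ip) / 2  # 5.45
avg_hr = (fip_hr + lwts_hr) / 2  # 10.5
ratio_of_averages = avg_hr / avg_ip

# ...versus averaging the two IP-per-HR ratios directly.
average_of_ratios = (fip_ratio + lwts_ratio) / 2

print(round(ratio_of_averages, 2))  # prints 1.93
print(round(average_of_ratios, 2))  # prints 3.08
```

Both figures match the ones in the comment. Neither is obviously the "right" middle ground; the sketch just shows that the two ways of splitting the difference disagree.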

Zachary

11 years ago

As an aside, in response to the comment on “predictive FIP”, I am not entirely sure I follow what you are trying to say. From what I think I understand it is that you are saying pFIP calculation is similar, at least loosely, to how I was looking at the weighting of “SO’s and BB’s” method and the “FIP” method. If that is so, then maybe using the pFIP to calculate the Game Score may be of interest. What I was saying in the highlighted line was that we have a new formula for a game score that uses HR, BB, and SO that is different than the formula for the FIP calculation of the Game Score used. If this new formula is actually preferable, and it may be, then doesn’t that imply that the original FIP formula is less preferable?

Sorry this was so long, I really did not intend it, just became very interested/curious on the issue the more I think about it, which I guess means the article was successful in opening the topic for conversation.

Jim

11 years ago

I came up with a totally different one years ago, which I will throw out here. I’m not in love with this one either, but it does reward excellence and not mediocrity.

Jim’s Modified game scores

1. Must pitch at least 7 innings.
2. Add one point for each batter retired after 21.
3. Subtract (or add) the difference between:
Strikeouts and walks.
Hits and four.
Earned runs and modified quality start
Average pitches per inning and eleven

My modified quality start is a straight ERA of 3.00 or less.

This can result in a negative game score, which I guess is the reason Bill James added 50 at the beginning, so no one would be negative for a game. On April 13, 2012, Matt Cain had a 91 using Bill James’ and a 72.22 using mine.

Mine eliminates the 6 inning pitchers, which eliminates a lot of games. However, if we develop a team game score to include relievers, this might be good. But then, we could use WPA for that. Doing this would make it non-attribute, and the current generation of metrics can’t stand team effort and the inability to not blame some one.

Carl

11 years ago

Carl’s Game Score:

Take the prior year’s winning percentage for each IP/ER combo and multiply that percentage for the corresponding IP/ER for each start. The individual games will be a % of win earned and the sum of a pitcher’s individual games will be his total wins. Subtract starts from the adjusted wins to get Adjusted Losses.

By using entire starts, eliminates the situation where a starter is dominant one night, terrible the next yet his .500 record looks unfair due to lots of K’s, low walks, etc. Also, eliminates knuckle ballers (and other pitchers such as Hudson) who outperform their FIP.

No ma'am we're musicians

11 years ago

I guess what bothers me about such efforts is the lumping into the ‘bad’ bin all walks. I’ve seen situations where the walks did turn out bad for the defense, but I tend to remember more times when the walk of a hot hitter led to the scoreless end of the inning. Conversely, I’ve seen hits by a slow runner clog up the bases enough so that instead of being a blowout inning, a couple runs are picked up.

Some sort of different approach is needed, where the results of an inning factor back into the events.

bucdaddy

11 years ago

I can’t stand decimals either, in situations like this.
—-
Heh, I guess it’s just a glitch in the way the clocks in clock sports operate, but it amuses/irks me that the timers in, say, basketball start running decimals in the last minute, and that announcers feel obligated to include them. “Time out, Lakers, with 57.6 seconds to play …” I mean, there DOES come a point when the decimals matter to actual strategy, but that point is with 3.8 or 5.1 seconds to go, and not much sooner.

Dave Cornutt

11 years ago

I seem to recall that when James first published the Game Score formula and his original observations, he stated that he regarded it as a toy: fun to play with, but with questionable value as to enlightenment. However, it’s a good point that GS is an attempt to do a better version of what Quality Start (which goes back to the dawn of sabermetrics) does. (Hey, does anyone remember Runs Created…) The main quibble I have with the way this is presented is that, like all counting stats, you need to have some idea of context in order to really understand what they are telling you. The context, in this case, is starts, and given that the number of starts per season for a pitcher is a fairly small number to begin with, summing up the game scores could be misleading. Someone who piles up a bunch of average starts could end up with a higher total than a good pitcher with slightly fewer starts. (Consider the #5 starter on a really good staff vs. the #2 starter on a poor staff: who’s likely to have more starts?)

I think that average GS per start might be more enlightening. I might try to run that tonight.
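The total-versus-average point in the comment above is easy to see with a toy example. The two pitchers and their game scores below are entirely hypothetical, made up only to illustrate the effect; the helper name is an assumption too.

```python
from statistics import mean


def total_and_average(game_scores):
    """Return (summed Game Score, average Game Score per start)."""
    return sum(game_scores), mean(game_scores)


# Hypothetical pitchers, not real data:
workhorse = [50] * 33     # 33 perfectly average starts
better_arm = [58] * 28    # 28 clearly better starts

print(total_and_average(workhorse))   # prints (1650, 50)
print(total_and_average(better_arm))  # prints (1624, 58)
```

The workhorse wins on total Game Score while the other pitcher is eight points better per start, which is the kind of distortion a per-start average would expose.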

MrMan

11 years ago

@Jim

Not sure if you’re serious or not.

But if you added all the money to the right of the decimal in my paychecks last year you come up with $12.45. Now, $12.45 isn’t worthless, you can have a decent meal with it or purchase a good Online game via XBOX Live or even make it halfway to a BPiA Boomstick.

But in terms of using it to evaluate my overall compensation…it would be meaningless. And realize that’s the aggregate of all my paychecks…for any one of the paychecks the amount, at most, would be $0.99.

The point is that .99 doesn’t tell you or me anything leading to knowledge; it’s simply numerals. Numerals whose presence actually interferes with making sense of the thing you’re trying to make sense of.

So…it’s not a fear of numbers…it’s a prioritization of the numbers that matter.

Tangotiger

11 years ago

Jim: we’re arguing for rounding, not truncating.

I can’t believe you would use the plot device of Superman III to make your case. (And I can’t believe they made a worse Superman movie after that one.)

Carl

11 years ago

Okay guys, I’m intrigued enough to sacrifice a few weekends to get this done.

Can someone please direct me to where I can get a download (pref in either xls or csv format) of a list of all 2010-2012 games reflecting the last name/first name of the starter, the IP by that starter in the game, the Runs allowed by each starter in the game and whether the team (not the pitcher) won or lost the game? I’m envisioning a large file w 14580 rows of data and 5 columns to start the analysis.

Thank you to all who can help me finish my journey out of the Dark into Saberland.

Tangotiger

11 years ago

Carl: you can get it at Baseball-Reference.com’s Play Index.

But, you are going to find what can be explained by PythagenPat. If league average is 4.3 runs per 9 IP, and if you have a pitcher that gave up, say, 1 run in 6 IP, this is what you have:
Team Runs scored = 4.3
Team Runs allowed = 1 + 4.3/9*3 = 2.43

And that will give you a win% of 72.5%.

My game score for that is:
Game Score = 6.4*6 – 10*1 + 40 = 68.4.

Which seems close enough for a crude measure.
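
For anyone who wants to reproduce the arithmetic in this comment, here is a minimal sketch. The function names are mine, and the PythagenPat power of 0.28 is an assumption: the comment does not state it, but it is a commonly quoted value and it reproduces the 72.5% figure.

```python
LEAGUE_RUNS_PER_9 = 4.3  # league average from the comment above


def team_runs_allowed(starter_runs, starter_innings, league_r9=LEAGUE_RUNS_PER_9):
    """Starter's runs plus a league-average bullpen for the rest of a 9-inning game."""
    return starter_runs + league_r9 / 9 * (9 - starter_innings)


def pythagenpat_win_pct(runs_scored, runs_allowed, power=0.28):
    """PythagenPat: the Pythagorean exponent depends on the run environment.

    power=0.28 is an assumed value, not one given in the comment.
    """
    exponent = (runs_scored + runs_allowed) ** power
    return runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)


def simple_game_score(innings, runs):
    """The crude game score quoted in the comment: 6.4*IP - 10*R + 40."""
    return 6.4 * innings - 10 * runs + 40


runs_allowed = team_runs_allowed(starter_runs=1, starter_innings=6)
win_pct = pythagenpat_win_pct(LEAGUE_RUNS_PER_9, runs_allowed)
print(f"team runs allowed: {runs_allowed:.2f}")
print(f"win%: {win_pct:.1%}")
print(f"game score: {simple_game_score(6, 1):.1f}")
```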

Jim

11 years ago

I’m answering two posts directed to me at once.

Okay tango, after rounding what have you got? Truncation! And there would have been no clear cut triple crown winner last year because both Trout and Cabrera would have batted 33 or .33.

No, if you have your employer send to my bank account everything to the right of the decimal in your paycheck and I get 999 other people to do the same, that will add a good chunk to my disposable income. Figuring the average amount at 50 cents, times 1000 deposits per week, keeps me in Dale’s, I guarantee.

And all numbers matter, otherwise there wouldn’t be 10 (counting the zero, of course). Hope you are not getting this confused with prime numbers.

And to think this started out because someone complained about aesthetics, wow.

Tangotiger

11 years ago

If my paycheck is $100.60, I’d get $101. If someone else’s paycheck is $100.40, he’d get $100. Either way, our company is paying out $201. That’s why rounding works: given a large enough number of employees, things will work out.
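
A quick sketch of the rounding-versus-truncating point. The two-paycheck case comes from the comment; the 10,000 random paychecks are simulated data made up for illustration, and the helper names are hypothetical. Amounts are kept in integer cents to avoid floating-point noise.

```python
import random


def round_to_dollar(cents):
    """Round an amount in cents to the nearest whole dollar (halves round up)."""
    return (cents + 50) // 100


def truncate_to_dollar(cents):
    """Drop everything to the right of the decimal point."""
    return cents // 100


# The two paychecks from the comment: 100.60 and 100.40.
two_paychecks = [10060, 10040]
print([round_to_dollar(c) for c in two_paychecks], "total:",
      sum(round_to_dollar(c) for c in two_paychecks))

# Simulated payroll (made-up amounts between 500.00 and 4999.99).
rng = random.Random(0)
paychecks = [rng.randrange(50_000, 500_000) for _ in range(10_000)]

actual_dollars = sum(paychecks) / 100
rounded_dollars = sum(round_to_dollar(c) for c in paychecks)
truncated_dollars = sum(truncate_to_dollar(c) for c in paychecks)

print(f"rounding changes the payroll by {rounded_dollars - actual_dollars:+.2f}")
print(f"truncating changes the payroll by {truncated_dollars - actual_dollars:+.2f}")
```

Rounding errors mostly cancel across employees, while truncation errors all point the same way and accumulate.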

***

Carl, I just wrote this up:

http://tangotiger.com/index.php/site/article/tangos-lab-deconstructing-game-score

Jim

11 years ago

For those of you who don’t like decimals, would you send me everything to the right of the decimal on your paycheck? If it’s not important to you, I will be able to use it. You guys are scared of numbers, methinks.

Tangotiger

11 years ago

The purpose of FIP is to be “descriptive”. The formula can’t change. It’s its raison d’etre.

The purpose of pFIP is, by definition, to be “predictive”.

Predictive is based on historical data, and that means to look backwards and figure out what the pitcher was really responsible for, and the rest was just random variation. (A descriptive stat still counts that random variation, while a predictive one removes it.)

By averaging the two game scores, we are reducing that random variation (somewhat).

***

And you can actually make a stronger case that you don’t like the averaging by comparing the walk to hit, and noting that the coefficient of the walk will be greater than the non-HR hit. Seems weird, but, we’re getting into trying to remove the random variation.

Rolling Fingers

10 years ago

After admittedly scrolling rather quickly through all the discussion about decimal places, we are still left to wonder why there are any fractions in the first table. The Bill James formula always yields an integer. The sum of a column of integers should always be an integer.

Carl

10 years ago

Tom Tango,

Thank you for both the suggestion to use Baseball-Reference.com (will subscribe tomorrow), as well as the other link.

The use of Pythag to get winning percentage is cool. After just a quick look though, I think the small differences (i.e. 3 game score points in the example above) break down at the extremes. For example, a team that allows 8 runs in 9 innings, to me, is unlikely to win 22% of the time. Conversely, if the next game that same team allows only 2 runs in 9 innings, are they going to win only 78% of the time?
Once done with my analysis, I will ask the HBT folks if they will publish my research as an article.

Tangotiger

10 years ago

Right, as I noted in the comments on my blog, I made a mistake. A team that AVERAGES allowing 2 runs will win 78% of the time, but if they allow EXACTLY 2 runs, the calculation will be different.
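
To illustrate the difference between AVERAGING 2 runs allowed and allowing EXACTLY 2, here is a rough sketch. It rests on assumptions that are mine, not from the thread: the PythagenPat power of 0.28, a Poisson distribution for the offense's runs (a crude stand-in, since real run scoring is more spread out), and a 50/50 split of games tied after nine innings.

```python
import math

LEAGUE_RUNS = 4.3  # league-average runs per game, as used earlier in the thread


def pythagenpat_win_pct(runs_scored, runs_allowed, power=0.28):
    """Win% for a team that AVERAGES these run totals (power=0.28 is assumed)."""
    exponent = (runs_scored + runs_allowed) ** power
    return runs_scored ** exponent / (runs_scored ** exponent + runs_allowed ** exponent)


def poisson_pmf(k, mean):
    """Probability of exactly k runs under the (assumed) Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def win_pct_allowing_exactly(runs_allowed, mean_runs_scored=LEAGUE_RUNS, tie_share=0.5):
    """Win% in a single game where the team allows EXACTLY runs_allowed runs."""
    p_fewer = sum(poisson_pmf(k, mean_runs_scored) for k in range(runs_allowed))
    p_tie = poisson_pmf(runs_allowed, mean_runs_scored)
    p_more = 1.0 - p_fewer - p_tie
    return p_more + tie_share * p_tie


for runs in (2, 8):
    average_case = pythagenpat_win_pct(LEAGUE_RUNS, runs)
    exact_case = win_pct_allowing_exactly(runs)
    print(f"{runs} runs: averaging -> {average_case:.0%}, exactly -> {exact_case:.0%}")
```

Under these assumptions the "exactly" figures are more extreme than 78% and 22%, which is the direction Carl's intuition pointed.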

Tangotiger

10 years ago

As for the Bill James Game Score, since the decimals are all the .33 and .66 variety, it’s clear that Matt gave out points for partial innings, which is not correct.
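
Here is a sketch of the Bill James formula as given in the article, next to one way the fractions could creep in. The second function is only a guess at the kind of error described, not a reconstruction of Matt's actual spreadsheet, and the stat line is invented for illustration.

```python
def game_score(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Bill James Game Score; only COMPLETED innings after the fourth earn the bonus."""
    innings_after_fourth = max(0, outs // 3 - 4)
    return (outs + 2 * innings_after_fourth + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks + 50)


def game_score_partial_innings(outs, strikeouts, hits, earned_runs, unearned_runs, walks):
    """Hypothetical buggy variant: partial innings after the fourth also earn the bonus."""
    innings_after_fourth = max(0.0, outs / 3 - 4)
    return (outs + 2 * innings_after_fourth + strikeouts
            - 2 * hits - 4 * earned_runs - 2 * unearned_runs - walks + 50)


# Made-up start: 6 1/3 innings (19 outs), 5 K, 4 H, 1 ER, 0 unearned, 2 BB.
line = dict(outs=19, strikeouts=5, hits=4, earned_runs=1, unearned_runs=0, walks=2)
print(game_score(**line))                            # always an integer
print(round(game_score_partial_innings(**line), 2))  # picks up thirds
```

With whole outs and integer division, every term in the first function is an integer, so any sum of starts is an integer too.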

Matt Hunter

10 years ago

Apologies, I must have had an error in my formula somehow.

Thanks for all the fantastic feedback, everyone. Definitely lots of material to think about and improve on for the future.

Hardwood

10 years ago

I do love watching people fail at statistics. Makes for quite the Friday night.

bucdaddy

10 years ago

IIRC, sending fractions of a cent into your own account was a plot point in “Office Space.” It worked out to be a little more than the penny-ante (literally) thieves thought it would.
