How to measure a player’s value (Part 3)

In Parts One and Two, I defined what I mean by player value and explained why I define it that way. Now it’s time to get down to brass tacks, to apply principles to practice and see how they look in action.

This is not the most in-depth or accurate method of calculating a player’s value. The goal is to present a process that lends itself well to explanation. That said, I believe the values I will be presenting will hold up well against most of the advanced uberstats that have been made publicly available.

Certain assumptions will be presented, particularly when it comes to the replacement level baseline and the adjustment between positions. When reading through the explanations, please try not to get too wrapped up in the specific details. Consider them a starting point for discussion, neither more nor less. Once you’ve made it through the examples and understand how it all fits together, we can discuss challenging those assumptions.

All player data is drawn from the Stats section of THT.

Offense

For estimating a player’s value on offense, a linear system works best. What linear systems tell us is how many runs that player contributes to an average team. Therefore, linear weights values are not guaranteed to add up to team totals on teams that are especially good or poor on offense.

There are many linear weights formulas, of varying quality. Since a linear weights formula is baselined against the production of an average team, the best linear weights formula to use is one that’s specific to the context under study. For our purposes, I’ve prepared a linear weights formula for 2008 based on play-by-play data. It’s specifically tuned to the data on our basic hitting pages. The definition of out is AB-H-K. (The values below are rounded to three decimal places, and therefore might not reconcile precisely.)

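| Out | K | BB | HBP | H | 2B | 3B | HR | SB | CS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| -0.278 | -0.280 | 0.308 | 0.329 | 0.451 | 0.290 | 0.569 | 0.950 | 0.150 | -0.467 |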

These are runs above or below an average player’s. These should be similar to the batting values you see listed on Fangraphs, or the Batting Runs on Baseball Reference.
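As a minimal sketch of how those weights turn a stat line into runs relative to average: the function name and the stat line below are hypothetical, and the sketch assumes the 2B, 3B and HR weights are increments on top of the H weight, which is how the table reads.

```python
# Linear weights from the 2008 table (runs relative to an average hitter).
WEIGHTS = {
    "Out": -0.278, "K": -0.280, "BB": 0.308, "HBP": 0.329, "H": 0.451,
    "2B": 0.290, "3B": 0.569, "HR": 0.950, "SB": 0.150, "CS": -0.467,
}

def batting_runs(line):
    """Runs above or below average for a stat line given as a dict of counts."""
    counts = dict(line)
    # The article defines an out as AB - H - K.
    counts["Out"] = line["AB"] - line["H"] - line["K"]
    # Assumption: every hit earns the H weight, and extra-base hits earn
    # their own weight in addition to it.
    return sum(WEIGHTS[stat] * counts.get(stat, 0) for stat in WEIGHTS)

# Hypothetical stat line, not a real player.
example = {"AB": 500, "H": 150, "2B": 30, "3B": 3, "HR": 20,
           "BB": 50, "HBP": 5, "K": 100, "SB": 10, "CS": 4}
print(round(batting_runs(example), 1))  # 16.2
```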

Now to park adjust. As discussed last week, a simple run-based park value formula is appropriate for a value metric. But we do have an issue. Our standard run-based park factors are typically applied as VALUE / PF. This works well with a lower bound of zero, but our linear weights are relative to average – negative values are in fact rather common. So what do we do?

What’s good for the goose should be good for the gander: If we’re using a linear run estimator, might it not be prudent to also use a linear park factor? What we simply do is take the average runs per plate appearance—roughly .122 in 2008—and park adjust that, and take the difference. That provides us with a linear factor to use, like so:

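| TEAM | PF | PF_PA |
| --- | --- | --- |
| ARI | 1.05 | 0.006 |
| ATL | 1.00 | 0.000 |
| BAL | 1.01 | 0.001 |
| BOS | 1.04 | 0.005 |
| CHA | 1.04 | 0.005 |
| CHN | 1.04 | 0.005 |
| CIN | 1.02 | 0.002 |
| CLE | 1.00 | 0.000 |
| COL | 1.09 | 0.011 |
| DET | 1.00 | 0.000 |
| FLA | 0.98 | -0.002 |
| HOU | 0.99 | -0.001 |
| KC | 1.00 | 0.000 |
| LAA | 0.98 | -0.002 |
| LAN | 0.99 | -0.001 |
| MIL | 1.00 | 0.000 |
| MIN | 1.00 | 0.000 |
| NYA | 1.00 | 0.000 |
| NYN | 0.97 | -0.004 |
| OAK | 0.98 | -0.002 |
| PHI | 1.02 | 0.002 |
| PIT | 0.98 | -0.002 |
| SD | 0.92 | -0.010 |
| SEA | 0.97 | -0.004 |
| SF | 1.01 | 0.001 |
| STL | 0.98 | -0.002 |
| TB | 0.99 | -0.001 |
| TEX | 1.03 | 0.004 |
| TOR | 1.02 | 0.002 |
| WAS | 1.01 | 0.001 |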

Simply multiply that by the number of plate appearances and add it to the linear weights values and you have your park adjustment. (For those accustomed to seeing park factors relative to 100 instead of one, remember to simply divide your park factor by 100 when using it.)
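One way to read "park adjust that, and take the difference" is as the league run rate times (PF - 1). That reading is an assumption, but it does reproduce the PF_PA column from the rounded park factors shown. A short sketch, with hypothetical function names:

```python
RUNS_PER_PA = 0.122  # league-average runs per plate appearance, roughly, in 2008

def linear_park_factor(pf, runs_per_pa=RUNS_PER_PA):
    """Per-PA linear park factor: the league run rate scaled by (PF - 1)."""
    return runs_per_pa * (pf - 1.0)

def park_adjustment(pf, plate_appearances):
    """Linear park factor multiplied out over a player's plate appearances."""
    return linear_park_factor(pf) * plate_appearances

# Compare with the PF_PA column: COL 0.011, ARI 0.006, SD -0.010.
for team, pf in [("COL", 1.09), ("ARI", 1.05), ("SD", 0.92)]:
    print(team, round(linear_park_factor(pf), 3))
```

The published PF values are rounded to two decimals, so a factor computed from them can differ slightly from one computed from unrounded park factors.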

And that’s really all there is to it. You have, simply enough, a measure of a player’s offensive value relative to average. We don’t want to adjust for position or replacement level yet—those both come later.

Defense

This is harder than measuring offense. This is also more controversial than measuring offense, and less accurate than measuring offense.

Please do not get bent around the axle about any of this. There is cause for reasonable people to have reasonable disagreements about this. And we can have those differences and discuss those differences reasonably.

But on the whole, any model based upon play-by-play data is going to get more right than it gets wrong. So we should be in the clear as long as we remember some of our basic principles:

  1. We can live with a certain amount of inaccuracy in our models, so long as we take pains to avoid bias, and
  2. We understand the limits of our models and take care to moderate our claims accordingly.

If you have two players who are five plays apart in fielding in a single season of a zone-based metric, with similar playing time at the same position, it makes little sense to declare too strongly that one was better than the other. And remember, there is no magic number for when a number becomes “reliable;” it’s not an on-off switch. Use everything with a grain of salt.

Start with the Revised Zone Rating figures published here on THT. What we have is a measure of playing time (BIZ, or balls in zone) and of plays made (Plays and OOZ, or out of zone plays). What we want is to baseline these against average, convert plays to runs, and then to adjust for the difference between positions.

Converting RZR to plays above or below average is simple. I use the formula:

BIZ * (PlayerRZR – LeagueRZR) + Innings * (PlayerOOZ/PlayerInn – LeagueOOZ/LeagueInn)

(You will occasionally see people use BIZ as a unit of playing time when measuring out of zone chances, but that doesn’t quite work the way you’d expect; there is a practical limit to the number of balls in play, and so at the team level what you tend to see is that the more balls in zone there are, the fewer out of zone chances there are. Using innings as the denominator is an imperfect solution to this issue.)
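The formula above can be sketched as follows, assuming a player's RZR is simply his in-zone plays divided by his BIZ. The function name, the fielder and the round-number league rates are all hypothetical.

```python
def plays_above_average(biz, plays, ooz, innings, league_rzr, league_ooz_per_inning):
    """Plays made above or below average, split into in-zone and out-of-zone parts."""
    player_rzr = plays / biz                 # share of in-zone balls converted
    in_zone = biz * (player_rzr - league_rzr)
    # Out-of-zone plays use innings, not BIZ, as the playing-time denominator.
    out_of_zone = innings * (ooz / innings - league_ooz_per_inning)
    return in_zone + out_of_zone

# Hypothetical fielder against hypothetical league rates.
print(round(plays_above_average(biz=400, plays=340, ooz=50, innings=1200,
                                league_rzr=0.83, league_ooz_per_inning=0.04), 1))  # 10.0
```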

From there, it’s a simple matter to use a constant to convert plays to runs. Then you adjust for position, prorating out the difference between positions based upon playing time. The positional adjustment should be based on the relative difficulty between the positions on defense, which we can measure to some extent by looking at players who play multiple positions. Here’s the full nitty-gritty:

| Pos | RZR | OOZ_INN | Run/Play | BIZ/30 | PAdj |
| --- | --- | --- | --- | --- | --- |
| 1B | 0.739422 | 0.030305 | 0.798 | 219 | -12.5 |
| 2B | 0.821831 | 0.026007 | 0.754 | 426 | 2.5 |
| 3B | 0.696536 | 0.038408 | 0.8 | 354.1333 | 2.5 |
| CF | 0.921688 | 0.067393 | 0.842 | 349.0333 | 2.5 |
| LF | 0.882837 | 0.04453 | 0.831 | 275.4 | -7.5 |
| RF | 0.899098 | 0.048531 | 0.843 | 292.0333 | -7.5 |
| SS | 0.828403 | 0.037678 | 0.753 | 424.8333 | 7.5 |

Once a player’s positional rating is figured, I simply prorate out the positional adjustment based upon playing time. To measure playing time, I use the number of BIZ chances relative to the league average. (Sometimes they’ll be prorated by plate appearances instead—I prefer to use a measure of defensive playing time instead, but that should be a largely pedantic point for the majority of players.)
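Here is a rough sketch of the last two steps, converting plays to runs and prorating the positional adjustment. It assumes the BIZ/30 column is the league's BIZ at a position divided by 30 teams, so that PAdj is scaled by the player's BIZ over that figure. That reading, and the function name, are assumptions rather than anything spelled out in the article.

```python
# Per position: (runs per play, league BIZ per team, positional adjustment in runs).
POSITIONS = {
    "1B": (0.798, 219.0, -12.5),
    "2B": (0.754, 426.0, 2.5),
    "3B": (0.800, 354.1333, 2.5),
    "CF": (0.842, 349.0333, 2.5),
    "LF": (0.831, 275.4, -7.5),
    "RF": (0.843, 292.0333, -7.5),
    "SS": (0.753, 424.8333, 7.5),
}

def defense_runs(pos, plays_above_avg, biz):
    """Fielding runs plus a positional adjustment prorated by BIZ."""
    run_per_play, league_biz, padj = POSITIONS[pos]
    fielding_runs = plays_above_avg * run_per_play
    # Assumption: playing time is the player's BIZ relative to the per-team average.
    positional_runs = padj * biz / league_biz
    return fielding_runs + positional_runs

# Hypothetical shortstop: 10 plays above average on 400 balls in zone.
print(round(defense_runs("SS", 10.0, 400), 1))  # 14.6
```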

Catchers are a different ball of wax—ability to turn a ball in play into an out is largely irrelevant to their defensive value on the field. Now, if there’s a reliable measure of a catcher’s game calling ability in a single season, I’ve yet to see it. But what we can measure is a catcher’s impact on the running game and a measure of how well he blocks pitches. That data is also available right here on THT.

We can measure a catcher’s value in controlling the running game using the same weights we used for stolen bases and caught stealing to evaluate hitters. (Simply reverse the sign; what’s positive for a runner is negative for a catcher.) But we also need to factor in the number of attempts—a catcher with a strong arm and a good reputation simply won’t see many attempts against him, while a weak armed catcher may see a lot of attempts against him. To account for this, we multiply stolen base attempts above or below average by -0.086. Blocked pitches are handled similarly: -0.232 for each wild pitch or passed ball blocked above average.

For catchers, I use innings played as the unit of playing time, and a positional adjustment of 12.5. For designated hitters, I use plate appearances as the unit of playing time, and a positional adjustment of -17.5; some would argue that the defensive value of a DH should be zero, not negative, but that’s not an apples-to-apples comparison with their peers. If a DH could play the field and provide some sort of value, it’s very likely that his team would use him in such a fashion. As it stands, they can’t, and that makes a player who can hit and field simply more valuable to the team.

I don’t pretend any of that is definitive or even state of the art; systems like UZR, PMR or the Fielding Bible Plus/Minus are superior to this approach. When it comes to presenting the values and thinking through their meaning, however, I do prefer having the input components of a system like RZR to look at, tear apart and put back together again.

Putting it together

We have offense, and we have defense. What now?

Here’s the step where we want to convert runs into wins. This is where many a brilliant sabermetrician has become shipwrecked. We do not want to end up with half-flooded engines and radios, with a half-buried bow. So we are going to proceed with extreme caution. In order to convert runs to wins in a sensical fashion, we need to use marginal runs; in short, the question is how many additional runs result in one additional win? You may scoff, but that word additional has caused us to lose many more good men than necessary.

The question of where to set that margin is a rather controversial one, but the most important guiding principle is that it is very dangerous to set the margin too low. Let’s use a practical example. Baseball Prospectus’ Wins Above Replacement Player is explicitly set to a baseline somewhere around a .125-.150 team win percentage. And yet if you add all the wins together, you end up with 2567.9 marginal wins for 2008. This is, quite frankly, absurd; in the actual 2008 AL and NL there were a combined 2428 wins. It’s simply impossible for marginal wins to exceed total wins.

But that is where you end up if you set the margin too low. A typical rule-of-thumb for converting runs to wins is to use 10 runs per win. But try applying that to the average team, which scored roughly 735 runs and allowed 735 runs. We know that a team that scores as many runs as it allows in the course of a season is right around a .500 team, or 81 wins in a 162-game season. But applying our run-to-wins converter to all runs scored gives us 73.5 wins creditable to the offense alone!
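The arithmetic behind that example, as a quick check (the variable names are only for illustration):

```python
RUNS_PER_WIN = 10          # the rule-of-thumb converter
avg_runs_scored = 735      # roughly, for an average 2008 team
games = 162

# Applying the converter to every run scored, not just marginal runs:
offense_wins = avg_runs_scored / RUNS_PER_WIN
# A team that scores as many runs as it allows should be about .500:
expected_wins = games / 2

print(offense_wins, expected_wins)  # 73.5 81.0
```

Crediting 73.5 of an average team's 81 wins to the offense leaves almost nothing for pitching and fielding, which is why the converter only makes sense on marginal runs.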

So we need to set our margin high enough to where our runs-to-wins conversion reconciles with team wins. We already have offense and defense measured relative to average; why not use that?

Here’s where we run into a problem. A player who, combining offense and defense, is precisely 0 is then worth 0 wins. And that’s true regardless of whether they play two games or ten or 162. In order to know a player’s overall value with an average baseline, we have to know both his run value and his playing time. This can be inconvenient and unwieldy.

So what we ideally want is a baseline low enough to capture the value of simply being on the field and contributing, but high enough to actually capture real contributions to team wins. The compromise between the two typically goes under the name “replacement level,” and is not so much one baseline but any number of baselines in the range between roughly a .290 and a .350 win percentage. For the time being, let’s simply say that a replacement level position player contributes -20 runs compared to average per 700 plate appearances. So credit each player with 20 runs per 700 plate appearances (prorated based upon playing time) on top of his totals relative to average.

One more thing: the NL and the AL are not, and have not been, equal leagues for quite some time. (Don’t believe me? Check the interleague records.) To compensate for that, add five runs per 700 plate appearances to players in the American League. Then, to convert runs to wins, simply divide the total by 10 (you can improve upon this, but it’s easy to remember and it works well enough for right now).
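Put together, the position-player calculation can be sketched like this. It reads the replacement adjustment as a credit of 20 runs per 700 plate appearances (25 in the AL), which is what the RepBonus column in the tables reflects. The function names and the sample players are hypothetical.

```python
def replacement_bonus(pa, league):
    """Runs credited for playing time, prorated per 700 plate appearances."""
    runs_per_700 = 20.0 + (5.0 if league == "AL" else 0.0)  # AL league-quality bump
    return runs_per_700 * pa / 700.0

def position_player_war(offense, defense, pa, league, runs_per_win=10.0):
    """Offense and defense are runs relative to average; returns wins above replacement."""
    total_runs = offense + defense + replacement_bonus(pa, league)
    return total_runs / runs_per_win

# Hypothetical NL regular: +30 batting, +5 fielding, 700 PA.
print(position_player_war(30.0, 5.0, 700, "NL"))  # 5.5
# Hypothetical average AL part-timer with 350 PA.
print(position_player_war(0.0, 0.0, 350, "AL"))   # 1.25
```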

So now, without further ado, the top 10 position players, 2008, according to WAR:

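| Last | First | Offense | RepBonus | Defense | Total | WAR |
| --- | --- | --- | --- | --- | --- | --- |
| Pujols | Albert | 73.3 | 18.3 | 16.7 | 108.4 | 10.8 |
| Jones | Chipper | 51.5 | 15.3 | 18.1 | 84.9 | 8.5 |
| Utley | Chase | 33.6 | 20.2 | 30.0 | 83.7 | 8.4 |
| Berkman | Lance | 49.5 | 19.0 | 12.6 | 81.0 | 8.1 |
| Rodriguez | Alex | 39.0 | 21.2 | 14.3 | 74.4 | 7.4 |
| Teixeira | Mark | 45.5 | 21.2 | 7.2 | 74.0 | 7.4 |
| Ramirez | Hanley | 40.8 | 19.8 | 9.6 | 70.2 | 7.0 |
| Wright | David A | 41.0 | 21.0 | 7.3 | 69.4 | 6.9 |
| Beltran | Carlos | 30.4 | 20.2 | 16.8 | 67.4 | 6.7 |
| Sizemore | Grady | 30.1 | 26.6 | 8.1 | 64.8 | 6.5 |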

And the bottom ten:

| Last | First | Offense | RepBonus | Defense | Total | WAR |
| --- | --- | --- | --- | --- | --- | --- |
| Wilkerson | Brad | -13.9 | 11.0 | -9.4 | -12.3 | -1.2 |
| Balentien | Wladimir R | -16.1 | 9.3 | -6.0 | -12.9 | -1.3 |
| Jacobs | Mike | 5.4 | 14.8 | -31.5 | -11.3 | -1.1 |
| Matthews Jr. | Gary | -12.6 | 17.0 | -16.4 | -11.9 | -1.2 |
| Patterson | Corey | -29.4 | 11.2 | 4.7 | -13.5 | -1.4 |
| Lamb | Mike | -15.6 | 9.6 | -8.5 | -14.4 | -1.4 |
| Francoeur | Jeff B | -26.0 | 18.7 | -4.6 | -11.9 | -1.2 |
| Pena | Tony F | -30.4 | 8.4 | 6.2 | -15.8 | -1.6 |
| Guillen | Jose | -9.7 | 22.6 | -28.3 | -15.4 | -1.5 |
| Gload | Ross | -15.4 | 14.9 | -20.1 | -20.6 | -2.1 |

Check out all those Royals! Now it’s time to turn our attention to…

Pitching

As discussed last week, we want to view pitchers separately from their defense. Thus, we need to use a model that analyzes a pitcher’s own contributions. These models are not perfect, because no model is perfect. We should always be trying to improve these models.

At THT we have Fielding Independent Pitching, a pretty simple model but a very effective one. I have a hangup about applying linear models to pitching, however, as discussed last week, and thus use a dynamic FIP based on BaseRuns. The difference in most cases is likely pedantic, but for extremely good or bad pitchers it will matter.

If you want to use FIP instead (or any metric scaled to ERA), you first need to convert to RA instead; an unearned run can lose you a game as easily as an earned run. As a rule of thumb, divide ERA (or anything scaled to look like ERA) by .92 before using in a player value metric.

Once we have a pitcher’s RA, we also have that pitcher’s runs per game if he were to pitch a whole game. From there, we can compute his win percentage compared to a league-average pitcher—what percent of his games would he win if he pitched all nine innings, assuming his team scored an average number of runs? To figure it out, we can use the Pythagorean win expectation. (Instead of park-adjusting the pitcher’s performance line, at this step I park-adjust the average runs per game.)
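A sketch of those two steps, ERA scale to RA and then RA to a win percentage. The references name Pythagenpat; the exponent constant of 0.287 used here is a commonly cited value and an assumption, not something stated in the article. Park adjustment of the league run rate is left out, and the function names are hypothetical.

```python
def era_scale_to_ra(value):
    """Convert an ERA-scaled figure (ERA, FIP) to runs allowed per nine innings."""
    return value / 0.92

def pitcher_win_pct(pitcher_ra, league_ra, exponent_constant=0.287):
    """Win percentage if the pitcher threw every inning for an average offense."""
    # Pythagenpat: the exponent grows with the total runs per game.
    exponent = (pitcher_ra + league_ra) ** exponent_constant
    scored = league_ra ** exponent    # the average offense's runs
    allowed = pitcher_ra ** exponent  # the pitcher's own runs allowed
    return scored / (scored + allowed)

# A pitcher who allows runs at the league rate comes out at .500.
print(pitcher_win_pct(4.66, 4.66))  # 0.5
# Hypothetical pitcher with a 3.22 FIP in a 4.66 RA league.
print(pitcher_win_pct(era_scale_to_ra(3.22), 4.66) > 0.5)  # True
```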

Then, we want to compare his production to the win percentage of a replacement-level pitcher. The replacement level of a pitcher depends on his role – a relief pitcher is easier to replace than a starting pitcher. And again, there’s a league quality difference.

There’s also the question of a relief pitcher’s leverage to consider: A fireman who comes in and pitches the tough outs is more valuable than a middle reliever, or a mop-up guy who is sent in once the game is essentially already out of hand. For that, we need to figure a pitcher’s leverage bonus—essentially, how many extra wins does his leverage add?

So to figure WAR, we use:

(WinPct - ReplacementWinPct) * IP/9

And add:

(WinPct - LevWinPct) * Lev * IP/9

The values I use for those constants:

| Lg | Start | Relief | Lev | RA |
| --- | --- | --- | --- | --- |
| AL | 0.37 | 0.46 | 0.57 | 4.72 |
| NL | 0.39 | 0.48 | 0.57 | 4.66 |
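As a hedged sketch of the two pitching formulas, reading each with the subtraction grouped before the multiplication. The article does not spell out how the Lev column maps onto the LevWinPct and Lev terms, so the leverage function takes both as explicit inputs. The function names and sample values are hypothetical.

```python
# Replacement-level win percentages by league and role, from the constants table.
REPLACEMENT_WIN_PCT = {
    ("AL", "start"): 0.37, ("AL", "relief"): 0.46,
    ("NL", "start"): 0.39, ("NL", "relief"): 0.48,
}

def pitcher_war(win_pct, innings, league, role):
    """(WinPct - ReplacementWinPct) * IP/9."""
    return (win_pct - REPLACEMENT_WIN_PCT[(league, role)]) * innings / 9.0

def leverage_bonus(win_pct, lev_win_pct, lev, innings):
    """(WinPct - LevWinPct) * Lev * IP/9, with both leverage terms supplied by the caller."""
    return (win_pct - lev_win_pct) * lev * innings / 9.0

# Hypothetical NL starter: .600 win percentage over 180 innings.
print(round(pitcher_war(0.60, 180, "NL", "start"), 1))  # 4.2
```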

And now, the top ten pitchers by WAR:

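| Last | First | BsRA | WAR |
| --- | --- | --- | --- |
| Halladay | Roy | 3.18 | 7.8 |
| Sabathia | CC | 6.35 | 7.5 |
| Lee | Cliff | 2.99 | 7.4 |
| Lincecum | Tim | 3.02 | 6.9 |
| Haren | Dan | 3.27 | 6.3 |
| Webb | Brandon | 3.54 | 5.8 |
| Santana | Ervin R | 3.50 | 5.6 |
| Mussina | Mike | 3.46 | 5.4 |
| Lowe | Derek | 3.34 | 5.3 |
| Danks | John W | 3.61 | 5.3 |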

Doc is sadly overlooked sometimes, isn’t he? And the bottom ten:

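| Last | First | BsRA | WAR |
| --- | --- | --- | --- |
| Gagne | Eric | 5.85 | -1.4 |
| Walker | Jamie | 6.54 | -1.4 |
| Manning | Charlie | 6.13 | -1.4 |
| Pinto | Renyel | 5.33 | -1.4 |
| Hansen | Craig R | 13.53 | -1.5 |
| Borkowski | Dave R | 6.48 | -1.5 |
| Villarreal | Oscar | 6.88 | -1.8 |
| Speier | Justin | 5.76 | -1.8 |
| Batista | Miguel | 6.78 | -1.9 |
| Heilman | Aaron | 5.57 | -2.1 |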

What’s worse than being a gascan? Being a gascan with a high leverage.

Wrapping it all up

Despite how it may feel, I have tried to be brief about these issues while still presenting enough to give an idea of what methods I use and, more importantly, why. I have also cribbed a lot—there are plenty of smart people out there and whenever possible I try to use their ideas rather than my own. There is a lot of material in the references down there that I heartily recommend, and I owe a debt to a lot of people, specifically folks like Tom Tango, Justin Inaz, Patriot and Sean Smith.

Now, like I promised above, here’s your opportunity to question the assumptions listed above. Provided is the complete spreadsheet I used to calculate all the values above. What do you get? Well, you get the full ratings – offense, defense, catching and pitching, plus WAR, for all players as well as team totals for 2008.

In addition, there’s a tab called “Assumptions.” That’s where I stored all the constants in the fancy charts from the article. Don’t like one of my assumptions? Play around with it. Substitute your own values. Substitute someone else’s values. See what happens.

But if you have questions (like, say, why Alexei Ramirez rates out at barely above replacement level, or why NL MVP runner-up Ryan Howard ranks only 117th in position player WAR), just check the Assumptions tab. Tell me what assumption you disagree with and why. Or tell me why RZR underrates their fielding. Or whatnot.

Just please, don’t come to the table assuming that you tell the model what to think about certain players, not the other way around. The model has limits, of course—they all do. But on the whole they work pretty well in explaining how real baseball teams win real games. And a model that doesn’t challenge our preconceptions is just as useful as no model at all.

References & Resources
Adapted from Tom Tango’s Wins Above Replacement methodology.

The information used to calculate linear weights values was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.

Park factors adapted from work by Patriot. Conversion to linear park factors inspired by a conversation with Tom Tango.

Method of adapting RZR to a plus/minus format adapted from work by Justin Inaz and Chris Dial. Catcher defense ratings adapted from work by Sean Smith. Tom Tango explains his positional adjustments.

Method for estimating reliever leverage courtesy of Justin Inaz. Method for estimating starter/reliever usage based upon games started courtesy of Tom Tango. Fielding Independent BaseRuns is my own work, based upon David Smyth’s BaseRuns and Tom Tango’s FIP. Pitcher win percentages derived from Pythagenpat.

