You, too, can be a scout

by John Walsh
January 28, 2008

Could you be a baseball scout? Can you judge baseball talent by
watching somebody play? Why not? Many serious fans watch 100 or 150
baseball games a year; surely they must learn something during all
those hours. As Yogi said, “You can observe a lot just by watching.”

Well, as many of you probably know, occasional THT
contributor Tom Tango believes that fans do know how to judge talent,
and he also believes in the wisdom of
crowds, meaning that if you ask enough people a question the
average answer will be pretty good. The result is Tom’s
Scouting Report, By the Fans, For the Fans.

The Fans’ Scouting Report

Basically, Tom asked fans, anybody who wanted to contribute, to rate
the defensive abilities of major league players. He wanted people who
had seen the player in question in at least 10 games during the 2007
season and asked for a judgment in seven different defensive
categories, which I list here:

Before/As ball is pitched:
1. Reaction/Instincts
While ball is in air/on ground:
1. Acceleration/First few steps
2. Velocity/Sprint speed
3. Hands/Catching
After ball is caught (throwing):
1. Release/Footwork
2. Throwing strength
3. Throwing accuracy

The fan/scout was asked to give a score of one to five for each category.
Tom then tabulated the results and massaged the data a bit to
convert the one-to-five scale to a zero-to-100 scale. He also kept
track of how many people contributed to each scouting report (the more
the better) and also reports an “Agreement Level” among the different
scouts.

Oh, one more important thing: Tom specifically requested that people
not take any stats into consideration when evaluating
fielders. Let’s take a look at an example:

Guerrero, Vlad  
Instincts  FirstStep  Speed  Hands  Release  Strength  Accuracy  
   48         43        50    26       67       93        58   
Ballots  Agreement  Overall
   29      0.65        52

Twenty-nine fans gave their opinion on Vlad’s defense and the
agreement level of 0.65 is roughly typical. He
scores above average in the three throwing categories (especially
“Strength”), but poorly in “Hands” and below average in “First Step.”
Tom calculates an overall, position-neutral score from the individual
categories and Vlad scores 52, just about average.

Cool, no? This is undoubtedly a great resource, but I have one
question: Can it possibly work? I don’t doubt that fans can recognize
good (or poor) play when they see it, but we fans are also clearly
influenced by a host of factors that go beyond observation:
reputation; past, perhaps out-dated, performance; our own prejudices,
including bias towards our favorite team and players; knowledge of
defensive stats; and probably others that I haven’t
thought of.

Another issue is how well fans/scouts can actually observe
players. I’m guessing that many of these fans are watching games on
television—how can they judge “first step” or “instincts” when
watching on TV? You have to be watching the player before the ball is
hit to judge those things. And even if you are at the park: are you
going to be staring at J.D. Drew as the pitcher delivers the ball?
You’d have to do that for every pitch in a game in order to see his
reaction to the handful of balls hit his way in a game. Who watches a
game like that?

So, “first step” is not so easy, but how about throwing? The throw is
always captured on TV or by the fan at the ballpark. Furthermore, it’s
fairly easy to judge arm strength and while “release” and “accuracy”
are a bit trickier, if you pay attention, you can judge these
aspects of throwing as well. So why don’t we have a look at how the
fans/scouts rated throwing and see if their observations agree with
the numbers?

Enter the stats

What numbers, you ask? Well, I happen to have some results on outfield throwing handy, so we can compare
those numbers to the fans’ report. The first thing to do is to
combine the three throwing categories of the fans’ report (remember,
release, strength and accuracy) into a single arm rating. I’m going to
do the easiest thing and simply average the three values. Here are the
top 10 outfield arms (any outfield position) according to the fans’
report:

10 Best OF Arms, Fans' Scouting Report 2007
Name              Arm Score
Young, Delmon     92.7
Suzuki, Ichiro    91.3
Victorino, Shane  91.3
Francoeur, Jeff   88.3
Cuddyer, Michael  87.0
Rios, Alex        84.0
Hawpe, Brad       82.7
Hamilton, Josh    82.3
Markakis, Nick    81.0
Pie, Felix        79.0

and here are the trailers (I am merciful, so I only list six):

Pierre, Juan        7.0
Podsednik, Scott    9.0
Bay, Jason         17.7
Damon, Johnny      17.7
Owens, Jerry       20.3
Gibbons, Jay       20.7
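
For readers who want to reproduce the Arm Score, the averaging step can be sketched in a few lines of Python. This is only an illustration: the function name `arm_score` is hypothetical, and the inputs are Vlad Guerrero’s three throwing scores from the sample report shown earlier.

```python
def arm_score(release: float, strength: float, accuracy: float) -> float:
    """Unweighted mean of the three throwing categories (0-100 scale)."""
    return (release + strength + accuracy) / 3


# Vlad Guerrero's Release, Strength and Accuracy from the sample report.
vlad = arm_score(release=67, strength=93, accuracy=58)
print(round(vlad, 1))  # 72.7
```

A plain average treats the three categories as equally important, which is an assumption of convenience rather than a claim about how throwing actually works.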

If you look back at my article on 2007 outfield arms, you will see
some agreement. My top five right field arms were: Cuddyer, Francoeur,
Young, Victorino and Rios. Wow! These are exactly the first five right
fielders in the fans’ report (although not in the same order). I also had Pierre and Owens as atrocious. Bay was around average in 2007 but poor in previous seasons. Damon and Podsednik did not qualify in 2007, but both have been terrible overall in recent seasons. Gibbons has been about average over the past few seasons.

Hey, this
is looking pretty good, so far. How about my top five center field arms? Here
they are, along with their ArmScore from the Fans’ Report:

Top Five Center Fielders, according to statistical analysis
Name             Arm Score 
Upton, BJ        57.3   (scouted as 2B)
Taveras, Willy   49.7
Edmonds, Jim     77.7
Suzuki, Ichiro   91.3
Freel, Ryan      39.3

Uh, oh. Not so good, is it? Taveras and Upton are rated as
average-ish, while Freel is definitely seen as below average. Ichiro,
considered to have the second-best outfield arm in baseball by the
fans, only manages fourth place in my center field ranking.

Hmmm, looks like we’re going to have to dig deeper to see what’s going
on. Instead of looking at individual results, let’s widen our
perspective a bit.
The plot on the right shows how well the results from my outfield arm
analysis match up with the Fans’ Report. Each point represents a
single outfielder season (between 2005 and 2007), with his Fans’ Arm
Score plotted on the horizontal axis and his runs saved per 200 opps
(my analysis) on the vertical axis. I require at least 50
opportunities in a given season and (of course) only outfielders that
have a scouting report are plotted.

I don’t know what you think, but in my opinion that’s an ugly plot.
Oh, we do see some agreement—if you squint your eyes
and tilt your head slightly, you can see an upward slope to the mass
of points as you move to your right. But I was expecting a tighter
bunching of the points around a straight line. To quantify the
agreement, it’s customary to quote a correlation coefficient: it turns
out to be 0.39. That is not a strong correlation.
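
For reference, the correlation coefficient quoted here is presumably the usual Pearson coefficient. A minimal sketch of that calculation follows; the helper name `pearson_r` and the toy numbers are made up for illustration and are not the article’s data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values only: points that fall exactly on a rising straight line.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

A value of 1 means the points sit exactly on a rising line, 0 means no linear relationship at all, and 0.39 lands in the loose, cloud-like territory described above.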

Proceed with caution

But you have to be careful with correlation coefficients, as Tom
Tango himself likes to point out forcefully. What I think is
happening here is that there is too
much noise in the statistical analysis. Brad Hawpe, who ranked high in
the Fans’ Report, scored 2.7, 11.2 and -4.4 runs per 200 opps from
2005 to 2007. Ichiro had years of 10.6, -1.2, 8.9 and 5.2
(2004-2007). Now, some of the variation may well be due to other
factors, but a large part of it is likely statistical noise.

We can try to reduce the effects of noise by increasing the minimum
number of opportunities—the more opportunities you consider,
the closer you can get to a player’s true talent level. The graph on
the right shows how the correlation coefficient between runs200 and
Arm Score varies as we increase the minimum number of
opportunities. To beef up the sample size for this plot, I combined all results
for the three-year period 2005-2007, and furthermore, I combined
results from the three different outfield positions if a player
played multiple positions in that time frame.

As expected, as we move to more and more opps, we see a stronger
correlation (the rising blue line) because there is less noise in the
data. The red line shows the number of players that meet the minimum opps
requirement. The point here is that there is a strong
correlation between the statistical analysis and the scouting
report. For example, when at least 500 opps are required, the
correlation coefficient rises to 0.78.
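
The procedure behind that plot can be sketched roughly as follows: drop every player below a minimum number of opportunities, then recompute the correlation on whoever is left. Everything named here (`correlation_by_min_opps`, `TOY_PLAYERS`) is hypothetical, and the numbers are deliberately exaggerated toy values, not the real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_min_opps(players, thresholds):
    """For each threshold, return (threshold, players kept, correlation).

    players is a list of (arm_score, runs200, opps) tuples. The correlation
    is None when fewer than three players survive the cut.
    """
    results = []
    for min_opps in thresholds:
        kept = [(arm, runs) for arm, runs, opps in players if opps >= min_opps]
        if len(kept) < 3:
            results.append((min_opps, len(kept), None))
            continue
        arms, runs = zip(*kept)
        results.append((min_opps, len(kept), pearson_r(arms, runs)))
    return results


# Made-up players: the three with many opps line up with their Arm Score,
# while the two with few opps are pure noise.
TOY_PLAYERS = [
    (20, -2.0, 600),
    (50, 0.0, 550),
    (80, 2.0, 700),
    (30, 3.0, 60),
    (70, -3.0, 80),
]

for min_opps, count, r in correlation_by_min_opps(TOY_PLAYERS, [50, 500]):
    print(min_opps, count, r)
```

With these invented numbers the coefficient goes from 0 at a 50-opp minimum (five players) to 1 at a 500-opp minimum (three players). Real data will never be that clean, but it shows the same trade-off as the blue and red lines: a stronger correlation, bought with a smaller sample.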

Another look

So, why don’t we go back to our original plot of runs200 vs. Arm
Score, but this time we use the combined data and ask for at least 300
opportunities for any given player. This gives me 70 outfielders to
look at. Here’s the graph:

Compared to the original version above, here we can see a strong
correlation between runs200 and Arm Score. I’ve superimposed the
“trend line” that best describes the data. Note how the trend line
comes very close to intersecting the point (50,0). In other words, an
average Arm Score (50) is predicting an average runs200 (0). This is
impressive agreement — there is nothing that I’ve done here that
forces the line through that point. This is the wisdom of crowds at work.

I’ve annotated a few players who have performed much better (in green)
or worse (in red) than what the scouts would have predicted. I just
picked these by eye, but for those of you who like numbers, here is a
list of the players who exceeded the scouts’ expectation by the
greatest amount.

Most Underrated by Fans/Scouts
                       ArmScore  Pred.   Actual    Difference
Soriano_Alfonso        73.0      2.6     11.1      8.6
Cuddyer_Michael        87.0      4.2     11.2      7.0
Taveras_Willy          49.7     -0.3      5.7      5.9
Francoeur_Jeff         88.3      4.4      9.8      5.4
Jones_Jacque           21.0     -3.7      1.6      5.3
Ramirez_Manny          54.3      0.3      5.6      5.2
Floyd_Cliff            30.3     -2.6      1.9      4.5
Lofton_Kenny           29.7     -2.7      1.2      3.9
Ibanez_Raul            32.3     -2.3      1.1      3.5
Rowand_Aaron           44.7     -0.9      2.5      3.4
---------------------------------------
Pred: predicted runs200 from trend line
Actual: actual runs200
Difference: Actual - Pred

and here are the underachievers:

Most Overrated by Fans/Scouts
                       ArmScore  Pred.   Actual    Difference
Green_Shawn            32.7     -2.3     -8.8     -6.5
Drew_J.D.              62.3      1.3     -4.5     -5.8
Anderson_Garret        58.3      0.8     -3.7     -4.5
Giles_Brian            46.0     -0.7     -4.6     -3.9
Griffey_Jr._Ken        64.0      1.5     -2.2     -3.7
Clark_Brady            38.7     -1.6     -5.0     -3.5
Gathright_Joey         28.0     -2.9     -6.1     -3.2
Jenkins_Geoff          76.0      2.9     -0.1     -3.0
Kearns_Austin          73.7      2.6     -0.2     -2.9
Dye_Jermaine           57.7      0.7     -1.9     -2.6
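
The trend line itself is not printed anywhere, but the Pred column in the two tables above lies on it (to within rounding), so a least-squares fit through those (ArmScore, Pred) pairs should recover it approximately. The sketch below does that. The function name `fit_line` is hypothetical, and because the inputs are rounded to one decimal the recovered slope and intercept are estimates only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# (ArmScore, Pred) pairs copied from the underrated and overrated tables.
ARM_AND_PRED = [
    (73.0, 2.6), (87.0, 4.2), (49.7, -0.3), (88.3, 4.4), (21.0, -3.7),
    (54.3, 0.3), (30.3, -2.6), (29.7, -2.7), (32.3, -2.3), (44.7, -0.9),
    (32.7, -2.3), (62.3, 1.3), (58.3, 0.8), (46.0, -0.7), (64.0, 1.5),
    (38.7, -1.6), (28.0, -2.9), (76.0, 2.9), (73.7, 2.6), (57.7, 0.7),
]

arm_scores, predictions = zip(*ARM_AND_PRED)
slope, intercept = fit_line(arm_scores, predictions)
value_at_50 = slope * 50 + intercept  # predicted runs200 for an average arm

print(f"slope={slope:.3f}  intercept={intercept:.2f}  at 50: {value_at_50:.2f}")
```

The fit comes out at roughly 0.12 runs200 per Arm Score point, with a predicted value at Arm Score 50 of about -0.2, consistent with the trend line passing very close to (50,0).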

What makes for a good thrower?

Since the scouts have done a good job of evaluating arms, might we look at
the individual components of arm rating, i.e., strength,
accuracy and release, to see if we can learn something about the
relative importance of these aspects?

Well, the short answer is “no.” The reason is that when you look at
these three categories, you find that there is a strong correlation
between any pair of them, as you can see in the graphic below (which
includes all outfielders in the Scouting Report).

I believe the correlations can have two sources: 1) real correlations
among defensive abilities—after all, good
defensive players often excel in more than one category—and 2) fan
bias.

I believe fan bias is playing a role here, because I doubt that the
real correlations are as strong as we are seeing. Furthermore, in the
case of strength vs. accuracy, I might have expected to see little, or
even negative, correlation, not the strong positive correlation we see
in the plot.

What I guess might be happening is that fans are able to judge a
player’s overall throwing skill, but they tend to give a good thrower
high scores in all three categories and conversely for poor
throwers. They are not able to judge independently the three different
throwing categories. That’s my hypothesis anyway.

In any case, the high correlations among the arm
categories make it impossible to determine the
relative importance of strength, accuracy and release in evaluating
outfield arms.
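
One informal way to see the problem: when the three ratings rise and fall together, almost any weighting of them ranks the players the same way, so the data cannot tell you which weighting is right. The sketch below uses invented ratings and hypothetical names (`weighted_score`, `ranking`) purely to illustrate that point.

```python
def weighted_score(scores, weights):
    """Weighted average of (release, strength, accuracy) ratings."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def ranking(players, weights):
    """Player names ordered from best to worst weighted score."""
    return sorted(
        players,
        key=lambda name: weighted_score(players[name], weights),
        reverse=True,
    )


# Invented (release, strength, accuracy) ratings that move together,
# mimicking the strong correlations seen in the Fans' Report.
TOY_RATINGS = {
    "Thrower A": (85, 90, 80),
    "Thrower B": (60, 65, 62),
    "Thrower C": (40, 35, 42),
    "Thrower D": (20, 15, 25),
}

strength_only = ranking(TOY_RATINGS, (0, 1, 0))
accuracy_only = ranking(TOY_RATINGS, (0, 0, 1))
equal_weights = ranking(TOY_RATINGS, (1, 1, 1))

print(strength_only == accuracy_only == equal_weights)  # True
```

Whether you believe arm strength is everything or accuracy is everything, you end up with the same list, which is why the ratings alone cannot settle the question.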

Last Word

The Fans’ Scouting Report, while not perfect, is a great resource, and
I believe it will be a useful piece of the puzzle in understanding
defensive ability. When you see Tom’s invitation to participate in
the 2008 Scouting Report, do not hesitate to do so.
