If something is quantifiable or qualifiable to a reasonable degree, that all goes into sabermetrics. His knees buckle? Sabermetrics. Heart of lion? Sabermetrics. Takes a check swing on a 3-0 count? Sabermetrics. Likes to party at night? SABERMETRICS!

It’s all about constructing a model of reality, and everything is reality.

– Tangotiger

June 25, 2005

Defense has been heralded as the next frontier of sabermetrics, and performance analysts are working hard to invent and improve objective measures of fielding ability. Reviewers of work in this area have published some excellent summaries; check out recent articles by David Gassko and David Cameron for overviews of the different defensive rating metrics. An older article by Tom Tippett adds context by discussing the evolution of fielding analysis.

Although sabermetrics is usually defined as being limited to the analysis of baseball records, the quotation from Tom Tango (Tangotiger) suggests a broader view that welcomes qualitative data as well as the strictly quantitative. He hints at the two faces of reality, an objective domain of facts and explanation, and a subjective domain of opinion and understanding. In the objective domain—the usual sabermetric territory—observations and measurements of a player’s performance can be made by anyone, or even by machines. Analysis proceeds mechanically, assuring that the results are reliable, valid and not dependent on the researcher.

Subjectivity, on the other hand, is private. Only the baseball scout himself can observe and measure his own evaluation of a player. Unlike objective performance analysts, scouts can reach different conclusions from the same observations. Consequently, sabermetrics regards scouting reports as being closer to opinion than information. Qualitative opinion must be defended; quantitative analysis approximates truth.

Sabermetrics evaluates a player’s “true” abilities as statistical regularities found in large samples. Defensive evaluation is the one aspect of the game that has been resistant to the statistical approach, at least so far. The customary sabermetric solution to this problem is to get more data and treat it with stronger mathematics, which is exactly what the advanced PBP-based defensive metrics do. Sometimes, though, progress requires not simply more data and better processing, but new methods that produce new observations that lead to new abstractions.

Fortunately, there is an established (if not well known) framework for the objective and rigorous analysis of subjective data. It’s called Q methodology, and it’s one of my favorite tools for understanding a phenomenon from a different perspective. Q provides instrumentation (the “Q sort”) for the quantification of subjectivity and a technology (factor analysis) for data reduction and interpretation.

A Q sort rank-orders subjects according to a series of subjective variables (such as those typically provided by baseball scouts), and factor analysis uncovers the commonalities among those subjective elements. The result is a better understanding of what it is that makes a fielder great. Let me run through an example, and I’ll put the more technical details in a footnote.

###### Analysis of Scouting Data

David Pinto’s PMR is one of the new breed of interesting PBP-based defensive metrics. I started with his listing of 2005 shortstops and took scouting reports from Stats, Inc. (as published on espn.com) and Tangotiger’s 2005 Fans’ Scouting Reports. Complete scouting reports were not available for every player on Pinto’s shortstop roster; my final sample included 30 players. The reports provided ratings and descriptions of players’ abilities in a dozen categories such as arm strength, range, hands, speed and instincts. Players were Q sorted according to each category, and then the sorts were factor analyzed.
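As a rough sketch of the sorting step (with invented player names and ratings standing in for the actual STATS, Inc. and Fans’ Scouting Report data), a Q sort for one category amounts to rank-ordering the players on that category’s rating:

```python
# Illustrative sketch of building one Q sort: rank-order players by a single
# scouting category. Player names and ratings are invented, not the study's data.

def q_sort(ratings):
    """Rank players by rating, best first (rank 1 = highest rated)."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {player: rank for rank, player in enumerate(ordered, start=1)}

# Hypothetical "range" ratings on a 0-100 scale:
range_ratings = {"Alvarez": 82, "Baker": 91, "Chen": 75}
print(q_sort(range_ratings))  # {'Baker': 1, 'Alvarez': 2, 'Chen': 3}
```

One such sort is produced per scouting category; the stack of sorts (players ranked twelve different ways) is what goes into the factor analysis.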

The analysis extracted two factors or high-level dimensions. One of the factors accords with qualities that are usually thought of as “tools,” the other factor with “skills.” The following table shows the scouting report ratings that loaded onto each factor.

| Category (Source) | Factor 1: TOOLS | Factor 2: SKILLS |
| --- | :---: | :---: |
| Arm Accuracy (STATS, Inc.) | X | |
| Arm Strength (STATS, Inc.) | X | |
| Hands (STATS, Inc.) | | X |
| Range (STATS, Inc.) | X | |
| Speed (STATS, Inc.) | X | |
| Instincts (Fans’ Scouting Report) | | X |
| First Step (Fans’ Scouting Report) | | X |
| Speed (Fans’ Scouting Report) | X | |
| Hands (Fans’ Scouting Report) | | X |
| Release (Fans’ Scouting Report) | | X |
| Arm Strength (Fans’ Scouting Report) | X | |
| Arm Accuracy (Fans’ Scouting Report) | | X |

Fielding ability is the complex of two distinct factors: Tools and Skills. Tools are physical attributes; Skills result from practice, preparation, experience and coaching. They do not lie on opposite ends of a continuum; Tools and Skills are orthogonal, and every player possesses some “quantity” of both. This is the Two Dimensional Model of Fielding Ability. Even though it’s based on data about 2005 major league shortstops, I strongly suspect it generalizes to any defensive position at any level of baseball. More research is needed for confirmation, but the Two Dimensional Model is surely a good first approximation.

That Tools and Skills factors would emerge is unsurprising. Baseball ability is already thought of in those terms, and many elements of human behavior exhibit nature/nurture dimensions. This study is significant because it discovered the factors by probing into descriptive, qualitative scouting information. I did not set out to test a hypothesis about Tools and Skills. No predetermined number of factors was extracted in the analysis—one factor or seven were equally plausible. The Tools and Skills dimensions quite literally bubbled up from the data. Given an opportunity to speak for itself, with minimal intrusion by the researcher, the data said “tools and skills.”

Where the professional scouts and knowledgeable fans both evaluated the same ability, their ratings agreed. This speaks to the credibility of the Fans’ Scouting Reports. The exception was Arm Accuracy, which the scouts treated as a Tool and the fans saw as a Skill. Conventional scouting reports typically subsume accuracy within the arm strength mechanic, which is defined by straight-line trajectories on thrown balls. Accuracy probably has both Tool (requires arm strength) and Skill (can be improved with practice) components.

###### Improving Defensive Metrics

All right, we’ve objectively analyzed subjective scouting data to break down the mechanics of fielding into Tools and Skills. Why is that important? Why does it lead to a better fielding model? (Glad you asked, Dave!)

If you accept that scouting evaluations of defensive mechanics are legitimate data derived from observation of actual events, albeit subjectively perceived, then the Two Dimensions serve to inject a little theory into the metric-building process. Suppose you want to build a thermometer. It would help to have a workable theory of, you know, temperature and its properties. Without a prior understanding of what exactly is being measured, it’s possible to get lost in data and mathematics. Because fielding ability breaks down to two underlying dimensions of Tools and Skills, a valid defensive metric—one that’s close to reality—should measure both tools and skills, as simply and directly as possible.

The Tools and Skills factor loadings tell us the components, like arm strength or instincts, that describe each dimension. And because a factor’s components correlate strongly with each other, it’s necessary to measure only a few of them. For instance, if range can be measured readily with existing PBP records, we really don’t need to worry about arm strength, which is much more difficult to quantify. Range and arm strength tend to go together, so we can infer one quantity from a measurement of the other. Were we starting from scratch on a new defensive metric, I think a reasonable starting point would be to assemble a list of reliably measurable quantities that represent Tools and Skills.

We could then sanity check the defensive metric by testing for correlation with the Tools and Skills factor scores. Zone Rating, in fact, correlates significantly (p < .01) with my Skills factor. This agrees with Tom Tippett’s assessment that ZR places more emphasis on soft hands (a Skill) than range (a Tool), because, as I understand it, the system counts only balls that a player is expected to reach. Extending or improving ZR, then, requires better capture of range and throwing performance without compromising its measurement of Skills. One such system that I tested—the PMR family—does not seem to have achieved this goal yet: PMR’s shortstop ratings are uncorrelated (that is, essentially random) with both Tools and Skills. If the Two Dimensional Model of Fielding Ability is a reasonable approximation of reality, then metrics that inadequately measure either dimension will tend to be unpredictable and will fail to converge into general agreement with other metrics. Analytically and anecdotally, Zone Rating does capture certain aspects of Skill.

###### Predicting Zone Rating

The relationship between ZR and the Skills dimension is strong enough to enable us to predict objective performance from subjective observations. Correlation between ZR and Skills increased to a respectable .73 when players who started fewer than 120 games at shortstop in 2005 were removed from the sample. This suggests that scouting reports of fielding ability may tend to converge with performance metrics over a large number of innings. Conversely, even the most reliable defensive metrics may be unstable for platoon players, backups or fielders who spend much time on the DL.
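That check is straightforward to sketch: filter out the low-playing-time players, then compute a Pearson correlation between the metric and the factor scores. The numbers below are invented for illustration, not the study’s actual data:

```python
# Sketch of the sanity check: correlate a defensive metric with factor scores
# after dropping players with limited playing time. All values are made up.
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# (games started, zone rating, skills factor score) -- hypothetical players
players = [(150, 0.85, 1.2), (40, 0.80, -0.5), (135, 0.83, 0.9),
           (155, 0.79, 0.1), (60, 0.88, 0.3), (148, 0.81, 0.6)]

regulars = [p for p in players if p[0] >= 120]   # the 120-game cutoff
r = pearson([p[1] for p in regulars], [p[2] for p in regulars])
print(len(regulars), round(r, 2))
```

With real data, r for the full sample versus the filtered sample is what tells you whether playing time stabilizes the relationship.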

The following chart compares the Zone Rating and Skills factor scores for the 21 shortstops with 120+ games last season. Remember, “factor scores” represent the quantized subjectivity from scouting reports of defensive ability. To simplify comparison, I’ve transformed both sets of scores to T-scores (mean=50, sd=10).
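The T-score transform itself is simple: shift and rescale any set of scores so they have mean 50 and standard deviation 10. A minimal sketch (assuming a population standard deviation, since the article doesn’t specify which was used):

```python
# Hedged sketch of the T-score transform (mean=50, sd=10).
# Input values are illustrative zone ratings, not the actual data.
from statistics import mean, pstdev

def t_scores(scores):
    """Rescale scores to mean 50, standard deviation 10, rounded to ints."""
    m, s = mean(scores), pstdev(scores)
    return [round(50 + 10 * (x - m) / s) for x in scores]

print(t_scores([0.87, 0.85, 0.80, 0.76]))  # [62, 57, 45, 36]
```

The transform preserves each player’s relative standing; it only puts the two sets of scores on a common scale so they can be compared side by side.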

| Player | Zone Rating | Skills Score |
| --- | ---: | ---: |
| Vizquel | 67 | 64 |
| Everett | 67 | 65 |
| Wilson | 61 | 69 |
| Greene | 61 | 59 |
| Gonzalez | 60 | 53 |
| Cabrera | 57 | 52 |
| Young | 56 | 37 |
| Uribe | 56 | 56 |
| Tejada | 55 | 42 |
| Rollins | 53 | 54 |
| Reyes | 52 | 43 |
| Furcal | 49 | 58 |
| Eckstein | 47 | 48 |
| Jeter | 46 | 47 |
| Renteria | 45 | 38 |
| Lopez | 44 | 49 |
| Lugo | 43 | 51 |
| Clayton | 42 | 40 |
| Berroa | 38 | 45 |
| Adams | 37 | 26 |
| Guzman | 33 | 36 |

Scouting reports predict shortstop Zone Rating scores surprisingly well, with only three outliers (defined as a 1+ standard deviation, or 10+ point, difference between scores), one of whom—Michael Young—is a converted second baseman. As a matter of fact, these “data-less” scouting reports predict ZR about as well as data-only methods. In truth, data comes in many forms, and objective baseball records have no exclusive claim to the term. Subjective information is data, too.

###### Limits of Statistics

That the scouts and ZR stats mostly agree about this set of players is an indication that both underlying methods, objective statistics and subjective observations, are measuring the same phenomenon, fielding skill. Why is it that the Skills dimension is measurable statistically, while Tools are difficult to measure objectively?

Most baseball metrics compare individual players against one another or against an average. These differences are statistical and require large samples to detect real differences against a noisy background. Skill is a statistical characteristic. The skill of a shortstop cannot be determined from one cleanly fielded ground ball and accurate throw to first. If he repeats it 100 or 1,000 times, then we become confident about his true ability. Truth is found in large numbers. Skill, which is a learned ability that is sharpened and maintained through repetition, is about making routine plays over and over. It’s the world of quality control—a world where statistical analysis is undisputed king.

Perhaps Tools simply operate on a different scale, an immediate and much more dynamic scale that is impossible to measure effectively through statistics. Opportunities to display extraordinary athleticism and raw talent come less frequently than the routine plays that depend on skillful execution. Tools show up at the scale of events that are beneath the sensitivity of statistical measurement. This does not mean that Tools are nonexistent or unimportant.

###### Summary

In this article I’ve demonstrated a new way of using subjective scouting data to make objective evaluations of players’ mechanics. Scouting reports contain a wealth of information that can now be analytically coupled with performance metrics, in defense and in other aspects of player evaluation.

From a mechanical perspective, fielding ability is the complex of two distinct factors—Tools and Skills. Tools are natural; skills can be learned. The Two Dimensional Model of Fielding Ability is a first step towards laying a theoretical foundation for defensive metrics that come closer to reality.

**References & Resources**

This is not an article about Q methodology, but a few more details will help understand the study. A Q sort is essentially a rank ordering of variables—shortstops here—according to some condition, such as the Stats, Inc. range evaluations. Along with categorical ratings, the scouting reports often include short narrative descriptions that I used for tie breaking.

Picture each sort as representing a narrow “dimension” of fielding ability—arm strength, range, hands, and so forth—with shortstops ranked along that specific dimension in terms of their mechanics, subjectively rated by Stats, Inc. scouts and Tangotiger’s fan panel. The purpose of factor analysis is to reduce the dimensionality of those twelve sorts into a smaller number of “factors.”

Simply stated, factor analysis extends correlation beyond the usual two (x, y) dimensions: a matrix of correlation coefficients is literally factored, in the same sense that factoring in algebra simplifies an expression. The factors extracted this way are “meta-correlations” that condense the information contained in a large number of measurements into a few meaningful factors or dimensions.

In the table, two factors form the simplest solution that explains most of the variance in the scouting data. The rankings that “load” onto a factor have high correlations with each other and, at the same time, low correlations with those on the other factor. More abstractly, a factor represents a latent or underlying dimension that is the “driving force” behind the measurements. We can’t directly observe Tools—the underlying phenomenon—we can only observe and measure physical traits like arm strength and speed. Through factor analysis, we discover which measurements “hang together” and thus describe a driving force such as Tools or Skills.
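To make the idea concrete, here is a toy sketch of extracting one factor from a small correlation matrix by power iteration. This is a stand-in for the extraction methods actually used in Q studies (centroid or principal-components analysis), and the correlation values are invented:

```python
# Toy factor extraction: power iteration finds the dominant eigenvector of a
# correlation matrix; loadings = sqrt(eigenvalue) * eigenvector components.
# The matrix below is hypothetical, not the study's correlations.
from math import sqrt

def first_factor(corr, iters=200):
    """Return loadings of each variable on the dominant factor."""
    n = len(corr)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged vector.
    eigval = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n)) for i in range(n))
    return [round(sqrt(eigval) * x, 2) for x in v]

# Hypothetical correlations among three "tools" sorts (arm, range, speed):
corr = [[1.0, 0.8, 0.7],
        [0.8, 1.0, 0.6],
        [0.7, 0.6, 1.0]]
print(first_factor(corr))  # all loadings are high: the three sorts "hang together"
```

When variables split into two weakly related clusters, a second factor emerges the same way from the residual correlations; that is the structure behind the two-factor Tools/Skills solution.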