I’ll be honest with you.

Had I known from the beginning that the idea of designing a model to estimate the difficulty of blocking every major league pitch is not a new one, let alone a groundbreaking one, I might have spent significant portions of my free time doing stuff that involves sun and physical activity instead. But, I didn’t. And, anyway, as Jovanotti already sang: *“Se tutti i grandi libri qualcuno li ha già scritti, mi chiedo ragazzi voi che cosa fate?”* (“If somebody has already written all the great books, I ask myself, guys, what are you doing?”)

And, yes, I am aware you have no idea who Jovanotti is. Your loss, really.

###### Vertical or horizontal plane?

When deciding how to track the location of the pitches to be caught, I was considering two options, neither of which really seemed completely appropriate. With help from **Mike Fast** I was able to calculate the spot where every pitch would have landed had the ball traveled untouched until it hit the ground.

While this was really helpful for pitches that clearly landed in the dirt in front of the catcher, it gave us only obscure information about the rest of the pitches. Is a pitch that would otherwise land 40 feet behind the catcher easier or harder to block than the one that would land 30 feet behind him? And if two pitches both land 40 feet behind the catcher, do a curveball and a fastball cross the plane where the catcher is placed at the same height?

On the other hand, the PITCHf/x data offered the height and the width of every pitch as it crosses the front edge of the plate. This leads to similar problems. First, the catcher finds himself several feet behind that plane. Second, the balls in the dirt are represented with the negative height number, again leading us to guess where they bounced.

Discussing these issues with **Tom Tango**, it became clear that if we want to model reality, we have to make the model as real as possible. In this case it meant working in three dimensions and using both the horizontal and a vertical plane, the former for the pitches that usually bounce in front of the catcher and the latter for those that don’t.

###### Catcher positioning

I placed the vertical plane three feet behind the back edge of home plate, because that is where I expect the catcher’s mitt to be most of the time. You might or might not agree with this. I don’t think there is a single correct distance from the plate that can always be used, as some catchers position themselves deeper than others and they also adapt to where the batter stands in the batter’s box. Generally, they try to move as far forward as possible without interfering with the swing.

The letter C in the above graphic is placed just about three feet behind the back edge of the home plate. And although the angle on the following shot of Yadier Molina doesn’t show great perspective, I think it can be assumed that his glove—not his body—is just behind the end of the batter’s box.

For sure, some of the pitches that would touch the ground a bit more than three feet behind home plate will not be caught in the air. Similarly, some of the ones that would bounce just a bit in front of that imaginary line will be caught on the fly. As we are not dealing with a binary model here, it doesn’t really matter all that much.
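The two-plane idea can be sketched in a few lines of Python. This is only an illustration, not the code behind the article. It assumes the usual PITCHf/x constant-acceleration fit (initial position, velocity and acceleration in feet and seconds, with y measured from the back of home plate toward the pitcher and z measured up from the ground). The names `locate_pitch`, `first_positive_root` and `CATCHER_PLANE_Y` are made up for the example.

```python
import math

# Assumption: the catcher's vertical plane sits three feet behind the plate.
CATCHER_PLANE_Y = -3.0


def first_positive_root(a, b, c):
    """Smallest t > 0 solving a*t^2 + b*t + c = 0, or None if there is none."""
    if a == 0:
        if b == 0:
            return None
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            return None
        sq = math.sqrt(disc)
        roots = [(-b - sq) / (2 * a), (-b + sq) / (2 * a)]
    positive = [t for t in roots if t > 0]
    return min(positive) if positive else None


def locate_pitch(x0, y0, z0, vx0, vy0, vz0, ax, ay, az,
                 plane_y=CATCHER_PLANE_Y):
    """Return ("bounce", length, x) or ("air", height, x).

    "bounce": the untouched ball reaches the ground before the catcher's
    plane; length is how far in front of that plane it lands.
    "air": the ball reaches the plane first; height is its height there.
    """
    def pos(p0, v0, a, t):
        return p0 + v0 * t + 0.5 * a * t * t

    t_ground = first_positive_root(0.5 * az, vz0, z0)            # z(t) = 0
    t_plane = first_positive_root(0.5 * ay, vy0, y0 - plane_y)   # y(t) = plane_y
    if t_ground is None and t_plane is None:
        raise ValueError("pitch reaches neither the ground nor the plane")
    if t_ground is not None and (t_plane is None or t_ground < t_plane):
        length = pos(y0, vy0, ay, t_ground) - plane_y
        return "bounce", length, pos(x0, vx0, ax, t_ground)
    return "air", pos(z0, vz0, az, t_plane), pos(x0, vx0, ax, t_plane)
```

Moving `plane_y` is the easy way to test how sensitive the results are to the assumed catcher depth.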

###### The model

After calculating the appropriate landing/crossing spots and creating some quick buckets, this is what the probabilities of a pitch getting away from the catcher looked like, depending on its location:

In a nutshell, we see that pitches that bounce are harder to block than the ones that don’t and that pitches further away from the center of the plate pose more problems for the catchers than the ones over the plate. Incredible, right? Still, this kind of data presentation made it easier for me to visualize the areas where catchers have to receive the pitches.
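For readers who want to reproduce this kind of bucketed view, here is a minimal sketch. The one-foot bucket size, the tuple layout and the function names are assumptions for illustration; the article does not say how the buckets were cut.

```python
import math
from collections import defaultdict


def bucket(value, size=1.0):
    """Lower edge of the bucket of width `size` that contains `value`."""
    return math.floor(value / size) * size


def pp_rate_by_bucket(pitches, size=1.0):
    """Share of pitches that got away, per location bucket.

    `pitches` is an iterable of (width_ft, y_ft, got_away) tuples, where
    got_away is True when the pitch got past the catcher.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [pitches that got away, chances]
    for width, y, got_away in pitches:
        key = (bucket(width, size), bucket(y, size))
        counts[key][0] += int(got_away)
        counts[key][1] += 1
    return {key: away / chances for key, (away, chances) in counts.items()}
```

Buckets with only a handful of pitches will produce noisy percentages, so the far corners of such a chart deserve some skepticism.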

###### Methodology

For the purposes of this research, I didn’t really care whether the balls that got away from the catcher were scored as a wild pitch or as a passed ball. So, quite ingeniously and after many hours of creative brainstorming, I came up with a name for both of them together—**Passed Pitch (PP)**. To determine the percentages in the above chart, we have to filter the pitches where a PP can happen in the first place. These are:

- With runners on base, all the called strikes, swinging strikes and balls*
- With no runners on base, all the called and swinging strikes when the count is already two strikes

* In addition, I decided to ignore all the pitchouts and intentional balls, although PP can occasionally happen on such pitches. The price of losing a few passed pitches seemed acceptable when compared to the danger of seriously skewing the data by including them.
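The filter described above could be written roughly as follows. The result labels (`"called_strike"`, `"swinging_strike"`, `"ball"`) and the function name are hypothetical; real pitch-by-play data will use its own codes.

```python
def is_pp_opportunity(runners_on, result, strikes_before,
                      pitchout=False, intentional=False):
    """True when a passed pitch could be recorded on this pitch.

    runners_on: True if at least one base is occupied.
    result: hypothetical label for the pitch outcome.
    strikes_before: strikes in the count before the pitch.
    """
    # Pitchouts and intentional balls are dropped on purpose.
    if pitchout or intentional:
        return False
    # Fouls and balls in play never qualify.
    if result not in {"called_strike", "swinging_strike", "ball"}:
        return False
    if runners_on:
        return True
    # Bases empty: only a third strike gives the batter a chance to run.
    return result != "ball" and strikes_before == 2
```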

###### Pitch location

I first looked at the PP dependency on pitch length/height. From the model above, the “length” of the pitch is the distance between the spot where the pitch bounced and the catcher’s plane. It is represented with a positive Y number. The “height” is the distance above ground at the catcher’s plane for the pitches that didn’t bounce. It is represented with a negative Y number.

The further in front of the catcher the pitch bounces, the harder it is to block. Pitches between two and three feet above the ground are the easiest to catch, and the difficulty increases from there upwards. That was the easy part; pitch width is more complicated:

What you see is the overall dependency and the absolute distances from the center of the plate. Before going into more detail, it can be said that in general, the pitches further away from the center of the plate are harder to block.*

* *As I mentioned above, I ignore intentional balls and pitchouts. When I don’t, it looks as if a ball four to five feet off the plate is easier to catch than one three feet off.*

Three more factors come into effect regarding pitch width. Batter handedness is one, and it is basically a disturbance factor: pitches inside are harder to block than the ones outside because the catcher has to deal with the batter and his bat, which also obscure his view of the ball. The second factor is that the width of the pitch doesn’t matter equally at different pitch lengths. A pitch that is a foot away from the center of the plate will not increase the chances of a PP by the same rate when it is in the dirt as it will when it is belt high, so I had to use multiple regressions.

And, finally, the center of the plate is not the easiest place to catch the ball, but rather a spot about half a foot to the left of it, as seen from the catcher’s perspective. I assume it has something to do with the fact that all the catchers are right-handed.

Unlike batter handedness, pitcher handedness didn’t influence the outcome, although it did seem so in the beginning. Right-handed pitchers appeared to be tougher to catch, but it turned out to be due to two factors we can otherwise control for, pitch location and speed. More on speed in a second. On average, right-handers threw about a mile and a half per hour faster than their left-handed counterparts and they threw to the places where catchers have a tougher time catching the ball:

###### Other factors

Speed matters, but mostly only on the pitches in the dirt. There was a strong correlation between pitch speed and PP percentage (controlling for the pitch location), but only on pitches about a foot above the ground or lower:

As for the pitch movement, I saw some correlation between vertical movement and PP percentage on the balls in the dirt, but that mostly came by the way of speed correlation (faster pitches generally had more vertical movement). Horizontal movement did not seem to affect the probabilities on the short pitches, but showed some correlation on the higher ones. I decided against incorporating both of them into the model, as the correlations seemed inconclusive.

Instead of classifying the pitches by type, going with speed and movement gives a fairer comparison, as both Jamie Moyer and Justin Verlander throw “fastballs,” for example. But by the time I started on the individual rankings and saw where poor Jarrod Saltalamacchia ended up, it was clear to me that there is one pitch type that needs to be looked at separately:

On average, a knuckleball is seven times as likely to get away as other pitches of the same speed and in the same location. Not everybody has the same problems with the knuckleball, though. Here is the list of all the catchers with at least 100 knuckleball pitches that needed to be caught over the last four years:

These are rather small samples, but that’s all we have (we’ll get to what the numbers mean in a second, for now it’s enough to know that numbers in the last two columns are bad when negative and good when positive). Every catcher performs worse against a knuckleball, but Saltalamacchia seemed to do an even poorer job than the rest of them.

Once the ball is not cleanly fielded by the catcher, other effects come into play, too. How far away from the catcher did the ball end up? How fast are the runners involved? Were they inclined to run based on the score? How much respect do they have for the catcher’s throwing arm? How far do they have to run? The answers to the first four questions are evenly spread between “no idea” and “too much work to check it out,” but the last one is rather easy. We can check the base-runner state and how it affects the probability of a pitch getting away:

Scoring from third and reaching first happen less often than the model predicts, while advances to second and third are more likely than what is calculated from other factors. It makes sense, too. Second base means the longest throw, and while first and third are equally far away from the catcher, the big difference is in getting the jump. The runner from second already has the lead and has nothing to worry about but running. The batter who just struck out might not even realize he has a chance to run until it’s too late.

And here is the final one, the one that I found to be quite counter-intuitive. If a batter disturbs the catcher just by *being there*, him swinging will cause even more of a disturbance, right? So, controlling for everything else and looking just at swing-versus-no-swing states we come up with this:

The only explanation I can come up with is hit-and-run. The runner is being sent, the batter protects and swings at the bad pitch, the ball rolls away from the catcher. The runner from first would have made it to second anyway, but due to the fact that he started prior to the pitch, his advance is being credited as a stolen base and not as a variety of a PP. I let this one just be.

###### Putting everything together

I originally used only the 2010 and 2011 data to model the expectancies, because I wanted to use 2008 and 2009 as a sort of control group. That’s true for most of the regressions I used, although a few of them occurred to me only after I had imported the 2009 season (base runners, pitch type and swinging). After importing 2008, I ran a number of checks, comparing what my model would expect with what really happened. I looked at close pitches, clearly wild pitches, slow ones, fast ones, the ones by left-handers, the ones by right-handers, split them by inning, year and month, and they all held up rather well. Here is the most random split I thought of:

I thought of using the day of the week, too, but I was afraid I could run into a replacement-catcher-on-a-Sunday-morning bias.

The bottom line is that this model has its inaccuracies. With what I looked at and the way I looked at it, they seemed to be acceptable. In no way, shape or form am I suggesting that it is perfect (and I’m certain that there will be objections and/or desired improvements out there), but in order to carry on from here I will use it as an evaluation tool for the catchers. For better or for worse.

###### Evaluation

You want some names, right? Here are some names. This is the list of the 15 best catchers at blocking pitches over the last four years:

What columns mean:

- cPP: Expected number of passed pitches from the model
- Pitches: Number of qualifying pitches as described above
- PP: Actual number of passed pitches
- PP+/-: The difference, with positive numbers indicating catchers who blocked more than their fair share of pitches
- Rpp+/-: The number of runs above or below average, using the 0.28 conversion rate from *The Book*
- Rpp120: Prorated runs saved using 120 games and the league average of 42 PP qualifying pitches per game
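One plausible reading of how these columns chain together is sketched below. The formula for Rpp120 (runs per qualifying pitch, scaled to 42 pitches a game over 120 games) is an interpretation of the description above, and the catcher line in the usage note is made up.

```python
RUNS_PER_PP = 0.28      # conversion rate quoted above
PITCHES_PER_GAME = 42   # league-average qualifying pitches per game, quoted above
GAMES = 120


def blocking_runs(cpp, pp, pitches):
    """Return (PP+/-, Rpp+/-, Rpp120) for one catcher.

    cpp: expected passed pitches from the model
    pp: actual passed pitches
    pitches: number of qualifying pitches
    """
    pp_plus_minus = cpp - pp               # positive = blocked more than expected
    rpp = RUNS_PER_PP * pp_plus_minus      # runs above or below average
    rpp120 = rpp / pitches * PITCHES_PER_GAME * GAMES
    return pp_plus_minus, rpp, rpp120
```

For a hypothetical catcher with 100 expected and 90 actual passed pitches over 10,080 qualifying pitches, `blocking_runs(100, 90, 10080)` gives a PP+/- of 10, about 2.8 runs, and about 1.4 runs per 120 games.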

And here are the 15 worst ones:

*(complete data here)*

The swing between the best and the worst catchers seems to be about one win a year. Or, put in absolute terms, over the last four years Yadier Molina’s performance blocking pitches was worth about three-and-a-half wins more than that of Miguel Olivo.

###### Is it a skill?

I used all the catchers with at least 1,000 chances in each of the last four years and split their even and odd years. This is how these two buckets compare:

*(complete data here)*
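An even-odd split like this one can be checked with a plain correlation. The sketch below assumes a per-catcher dictionary of seasons holding (PP+/-, chances) pairs; that layout, the function names and the use of a per-chance rate are all illustrative choices, not a description of the actual workflow.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def even_odd_rates(seasons, years=(2008, 2009, 2010, 2011), min_chances=1000):
    """Per-catcher (even-year rate, odd-year rate) of PP+/- per chance.

    seasons: {catcher: {year: (pp_plus_minus, chances)}}
    Catchers missing a year or below min_chances in any year are skipped.
    """
    pairs = []
    for by_year in seasons.values():
        if any(y not in by_year or by_year[y][1] < min_chances for y in years):
            continue
        totals = {0: [0.0, 0], 1: [0.0, 0]}  # parity -> [PP+/-, chances]
        for y in years:
            pm, chances = by_year[y]
            totals[y % 2][0] += pm
            totals[y % 2][1] += chances
        pairs.append((totals[0][0] / totals[0][1], totals[1][0] / totals[1][1]))
    return pairs
```

Feeding the two columns of `even_odd_rates(...)` into `pearson` gives one number to compare against what pure chance would produce.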

###### Glove versus arm

I mentioned that perhaps some catchers get good results blocking pitches because the runners are afraid to take their chances against good throwing arms. I checked the correlation between preventing base stealing and preventing advances on passed pitches, but found none:

###### Playing with the numbers

Recently, Mike presented his great research on catchers’ skills in framing pitches. FanGraphs offers the data on the catchers’ abilities to prevent stolen bases. What if we combined all these numbers for 2011?

*(data for catchers with at least 500 defensive innings in 2011 here)*

We see the heavy influence of the framing component. Alex Avila was below average both with his glove and his arm, yet he more than made up for it with the framing part. Mike Napoli lost his overall lead, but—for those of you counting at home—he was still five-and-a-half wins better than Jeff Mathis.

And, finally, I looked at the defensive talent spread observed in 2011, by defensive positions. I included all players with at least 500 innings played at that position and for everyone but catcher, I used UZR/150 numbers. For catchers, I once used only the stolen bases component of their defense and once the cumulative number comprised of all three components:

###### Next steps

Or, perhaps, previous steps? There are at least two other studies on this topic. Dan Turkenkopf wrote about it more than three years ago and Dave Allen took an approach similar to mine back in 2009. When I started my work, I was aware of the former, but not of the latter. Before you start asking me whether I’ve been living under a sabermetric rock or on a deserted island, let me preemptively admit: I *did* grow up on a small Mediterranean island that, by most standards, could be considered pretty deserted. So, I have that working for me.

What can be done next?

First, this model can be further improved upon. Just as I was finishing this article, I realized another dependency:

It took me a while to realize why the relative PP percentages went down in the last three rows. When there are runners on first, on first and second, or on every base and the count is already three balls, a passed pitch that was not swung at will be masked by the runners advancing on the base on balls. I’ll implement that into the model, but I do not expect any significant changes out of it.

The other thing we can see from this chart is that pitchers throw tougher pitches to block when they are ahead. So, a possible further step would be to look at the whole issue from the pitchers’ side. Are pitchers more likely to go for the strikeout by bouncing the curveball in the dirt when they have a good blocking catcher? Is it quantifiable?

And, you know, you could always go discover Jovanotti’s music.

Dan Turkenkopf said...

Wow. Amazing work.

Need to re-read to capture it all, but at first glance this looks phenomenal.

tangotiger said...

Brilliant!

{clap clap clap}

Harry Pavlidis said...

be still my heart

Lucas Apostoleris said...

Incredible work.

Marc Normandin said...

Poor Saltalamacchia. But learning just how much catching Wakefield hurts him was worth the read, for sure. Excellent work.

Ivan Grushenko said...

This is the most awesome thing ever!

Max said...

Bravo. what a great article. Thank you.

J.P. McIntyre said...

Absolutely tremendous work.

Bojan Koprivica said...

Thanks, everyone.

Marc, yes, Salty is just a tad below average in blocking everybody else, but he is epically bad against the knuckleball.

Mike Fast said...

This is a fabulous article, Bojan. Very well done. Very thorough, well modeled, and presented very clearly.

Kevin said...

Amazing stuff here. Awesome work.

Brad Johnson said...

Best article I’ve read in awhile. Get any calls from MLB clubs yet?

Bojan Koprivica said...

Thanks Mike, and again, thanks for your help.

Brad, sadly no. The scouting report on me says that I can’t hit the curveball and I have to admit it’s true.

But, for a funny and true story, a friend of mine got contacted by MLB some six or seven years ago. He was coaching a low level baseball team in Germany, something like the third or fourth highest level. The kind of team where nobody makes any money and on good days there are 50 people in the stands.

They were called Falcons and apparently their logo looked too much like the Toronto Blue Jays one. Anyway, they were somehow discovered and then contacted by MLB and threatened with a lawsuit unless they change it.

gerevpstr said...

Notice the difference between Yadier and his brothers. The reason is Mike Matheny – whose career was significantly shortened because he blocked every ball like it was the 7th game of the world series. He threw his whole body in front of the pitch, landing directly on his knees and then bent over the ball with his body, closed the 5 hole with his glove and SMOTHERED any bounce. During the season Molina does this in critical situations – every couple games. In the post season he has done this 3-4 times a game.

Brett Hale said...

Curious, where would Posey end up? or is his sample size too small to get any really good indication of how productive or unproductive he is?

Awesome article all around.

Larry Lawrence said...

So many variables to consider in this particular defensive skill. What is amazing is the fact that these guys make the hard stops look routine, and actually glove most of the damn things!

Don’t forget the obvious one- a coach puts his best guy out there with the wildest pitchers.

Bojan Koprivica said...

Brett, thanks.

Buster Posey ranked 21st out of 64 catchers with at least 1000 PP opportunities in 2011 (with 3.2 Rpp/120) and 23rd out of 67 in 2010 (with 2.3 Rpp/120). Over those two years he amassed some 4,500 chances, so if you are looking for true talent, your best guess is regressing him some 40% to the mean.

I added the 2010 and 2011 rankings for those who are interested. They can be found here:

holzfeder.com/gallery/THT/ranking2010.csv

holzfeder.com/gallery/THT/ranking2011.csv

Atul said...

erm, wheres Joe Mauer?

Scooter said...

This is wonderful. Not just the data; I enjoyed your writing as well.

I see that you examined which bases were occupied, and the count. It left me curious whether the number of outs made a difference.

Thanks again for doing this and for sharing it with us.

Eric Dykstra said...

Fantastic article!

One idea on swing vs no swing: batters are more likely to swing at balls that are easier to block? Unless you already accounted for this.

Bojan Koprivica said...

Atul, if you refer to the adjusted WAR rankings, he is not included because he only caught 408 innings this year.

Scooter, thanks. I can have a look at this.

Eric, thank you. The other factors are already accounted and controlled for when looking at the swinging influence.

David said...

Wow! Well done Bojan!

Choo said...

Great work! You had me at Rob Johnson -7.8.

Nathan Aderhold said...

This is wonderful.

Thanks so much for sharing your hard work, Bojan.

In my fantasy world, Mike Scioscia finds this article on his desk tomorrow morning and Mathis is DFA’d by Friday.

One can dream.

floydarogers said...

This is great; and appears to have taken an immense amount of work.

One thing that could be an improvement, however. It’s very clear that the catcher (and his mitt) is not always positioned directly behind the centerline of the plate. They move left and right to put their mitt/person WHERE THE PITCH IS EXPECTED. Essentially, the C1-C4 changes depending where the catcher and pitcher have decided to throw the ball (in/out). I shudder to consider how much work it would be to account for that.

Luis said...

FABULOUS work. THANK YOU!!

Bojan Koprivica said...

floydarogers,

Thanks and you are absolutely right. What this model doesn’t account for is the “element of surprise”, such as a pitch inside when it should be outside, a fastball instead of a curveball and such (not that a pitch 10 feet in front of the catcher is not a “surprise”).

I am afraid you are also correct on the difficulty of implementing it into the model, at least with the data that is publicly available to me right now.

Tim_the_Beaver said...

great post for THT and glad you shared with AN. just fantastic. Can I start calling you ‘Bro’jan?

Brandon T said...

Great work! This answers a question that’s been around for a long time. One question, though: if we throw out Wakefield for 2011 (both easy to steal on and hard to catch), where does Salty end up, i.e. what’s his true talent, non-knuckleball division?

Bojan Koprivica said...

Brandon, I don’t have the “making of” numbers for throwing out baserunners, so I can only go with the final number that FanGraphs offers. Sorry.

As for blocking pitches, if we remove knuckleballs out of the equation, we are left with following:

cPP = cPP(all) – cPP(KN) = 86-2 = 84

PP = PP(all) – PP(KN) = 117-32 = 85

Sample size = 10075 – 510 = 9565

So, without having to worry about Wakefield’s knuckleball, Salty appears to be perfectly average in blocking pitches.