F/X Visualizations March 16, 2009
Run Value by Pitch Location

[Editor's note: Dave Allen has agreed to join Baseball Analysts. He is a graduate student whose research involves analysis of spatial data and spatially explicit modeling. He also loves baseball. Dave will combine these two interests in the F/X Visualizations series.]

A lot of interesting new sabermetric work has become possible over the past two years thanks to the availability of PITCHf/x data. In this new series, I will continue that line of analysis and present the results in a simple, yet hopefully effective, visual manner.

This first post builds on work that Joe Sheehan did a year ago on the run value of each pitch based on its location. He placed each pitch into one of 25 bins and calculated the average run value in each bin. In that post he suggested that it would be interesting to get rid of the bins and take a continuous approach. A year later, it seems no one has done that, so I thought it would be a good way to launch my work.
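For comparison with the continuous approach, here is a minimal sketch of the binned method: every pitch is dropped into a grid cell and the run values in each cell are averaged. The bin edges are hypothetical inputs (Sheehan's exact boundaries are not given here); six edges per axis produce the 5 x 5 grid of 25 bins. Each pitch is assumed to be a `(px, pz, run_value)` tuple with locations in feet.

```python
import bisect
from collections import defaultdict


def bin_index(value, edges):
    """Return k such that edges[k] <= value < edges[k + 1], or None if outside."""
    k = bisect.bisect_right(edges, value) - 1
    if 0 <= k < len(edges) - 1:
        return k
    return None


def binned_run_values(pitches, x_edges, z_edges):
    """Average run value per (x_bin, z_bin) cell.

    pitches: iterable of (px, pz, run_value) tuples (hypothetical layout).
    x_edges, z_edges: sorted bin boundaries in feet (hypothetical choice).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for px, pz, rv in pitches:
        i, j = bin_index(px, x_edges), bin_index(pz, z_edges)
        if i is None or j is None:
            continue  # pitch lies outside the gridded area, so it is skipped
        sums[(i, j)] += rv
        counts[(i, j)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

The drawback is visible in the structure: every pitch in a cell counts equally, whether it sits at the cell's center or on its edge, and the map can only be as sharp as the grid.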

Using the first table in this post, I assigned a run value to every pitch in the PITCHf/x database, not just the pitches that ended an at-bat, and then averaged the run values of all the pitches at each location. I split the data by the handedness of the pitcher and the batter. The number in parentheses is the average run value for all pitches regardless of location. The images are drawn from the catcher's perspective, so a right-handed batter stands to the left of the strike zone and a left-handed batter stands to the right of it.
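The post does not spell out how the averaging "at each location" is done without bins, so the sketch below swaps in one common continuous technique, a Gaussian-kernel weighted average: nearby pitches count more than distant ones. Everything here is an assumption for illustration: the pitch dictionary keys (`px`, `pz` in feet with `px = 0` at the middle of the plate, plus `balls`, `strikes`, `outcome`, `p_throws`, `stand`), the layout of `run_value_table` keyed by `(balls, strikes, outcome)`, and the `bandwidth` value. `mean_run_value` corresponds to the overall average shown in parentheses.

```python
import math
from collections import defaultdict


def assign_run_values(pitches, run_value_table):
    """Attach a run value to every pitch, not only pitches that end an at-bat.

    run_value_table maps (balls, strikes, outcome) -> run value; this key
    layout is a hypothetical stand-in for the source table.
    """
    for p in pitches:
        p["rv"] = run_value_table[(p["balls"], p["strikes"], p["outcome"])]
    return pitches


def split_by_handedness(pitches):
    """Group pitches by (pitcher hand, batter stance)."""
    groups = defaultdict(list)
    for p in pitches:
        groups[(p["p_throws"], p["stand"])].append(p)
    return groups


def mean_run_value(pitches):
    """Average run value regardless of location."""
    return sum(p["rv"] for p in pitches) / len(pitches)


def smoothed_run_value(pitches, x, z, bandwidth=0.25):
    """Gaussian-kernel weighted mean run value at (x, z); bandwidth in feet."""
    num = den = 0.0
    for p in pitches:
        d2 = (p["px"] - x) ** 2 + (p["pz"] - z) ** 2
        w = math.exp(-d2 / (2 * bandwidth ** 2))  # weight decays with distance
        num += w * p["rv"]
        den += w
    return num / den if den > 0 else float("nan")


def run_value_surface(pitches, xs, zs, bandwidth=0.25):
    """Smoothed run values on a grid: one row per height z, one column per x."""
    return [[smoothed_run_value(pitches, x, z, bandwidth) for x in xs] for z in zs]
```

A smaller `bandwidth` follows the data more closely but gets noisy where pitches are sparse; a larger one gives smoother maps at the cost of blurring real features such as the edges of the zone.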

This method reproduces some of Sheehan's results:

• Pitches outside the strike zone have a higher run value than those inside the strike zone.
• Pitches down the middle of the zone have the highest run value of pitches in the strike zone.
• Inside pitches have higher run values than outside pitches.
• Pitches down and in have higher run values than those that are up and in.
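The first comparison above, pitches inside versus outside the strike zone, can be sketched as follows. This is an assumption-laden illustration, not the method behind the images: the zone is taken as the plate's half-width of 17/24 ft (half of 17 inches) horizontally and the batter-specific `sz_bot` to `sz_top` range vertically, and the pitch dictionary keys (`px`, `pz`, `sz_bot`, `sz_top`, `rv`) are hypothetical. Whether to widen the zone by the radius of the ball is a judgment call left out here.

```python
PLATE_HALF_WIDTH_FT = 17 / 24  # half of the 17-inch plate, expressed in feet


def in_strike_zone(px, pz, sz_bot, sz_top, half_width=PLATE_HALF_WIDTH_FT):
    """True if the pitch center crosses the plate inside the assumed zone."""
    return abs(px) <= half_width and sz_bot <= pz <= sz_top


def _mean(values):
    return sum(values) / len(values) if values else float("nan")


def mean_in_vs_out(pitches):
    """Return (mean run value inside the zone, mean run value outside it)."""
    inside, outside = [], []
    for p in pitches:
        if in_strike_zone(p["px"], p["pz"], p["sz_bot"], p["sz_top"]):
            inside.append(p["rv"])
        else:
            outside.append(p["rv"])
    return _mean(inside), _mean(outside)
```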

This continuous approach also gives some additional insights beyond Sheehan's:
• Among outside pitches, those high in the zone have a slightly higher run value than those low in the zone. This is interesting because it suggests hitters prefer inside pitches down in the zone and outside pitches up in the zone.
• The area of negative, zero, and just slightly positive run values (the red, yellow, and green area) extends well beyond the defined strike zone.
• This area of negative to zero run values extends far above the strike zone, peaking at x = 0 (the middle of the plate) more than a foot above the top of the strike zone.

Tango and Lichtman made some important comments on the limitations of Sheehan's original work, which did not split the data by swing/take or by pitch type. These critiques apply equally, if not more so, here, because I also did not split the data by count as Sheehan did.

I hope to address these points in future posts. For example, I assume the peak of negative to zero run values a foot above the center of the zone is mostly the result of 'high heat' fastballs in pitcher's counts. By analyzing the run value of pitch locations for fastballs alone in specific counts, I will be able to confirm or refute this assumption.
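One way to start testing that assumption is to filter the data down to fastballs thrown in pitcher's counts and check what share of the pitches above the zone fall into that group. This is a rough sketch under stated assumptions: the set of fastball codes (`FA`, `FF`, `FT`) is an assumed grouping of PITCHf/x pitch types, "pitcher's count" is assumed to mean more strikes than balls, and the dictionary keys (`pitch_type`, `balls`, `strikes`, `pz`, `sz_top`) are hypothetical.

```python
FASTBALL_TYPES = {"FA", "FF", "FT"}  # assumed grouping of fastball type codes


def is_pitchers_count(balls, strikes):
    """Assumed definition: the pitcher is ahead when strikes exceed balls."""
    return strikes > balls


def is_fastball_in_pitchers_count(p):
    return p["pitch_type"] in FASTBALL_TYPES and is_pitchers_count(p["balls"], p["strikes"])


def fastballs_in_pitchers_counts(pitches):
    """Subset of pitches to feed into the same run value maps."""
    return [p for p in pitches if is_fastball_in_pitchers_count(p)]


def share_of_high_pitches(pitches, predicate):
    """Fraction of pitches above the top of the zone that satisfy predicate."""
    high = [p for p in pitches if p["pz"] > p["sz_top"]]
    if not high:
        return float("nan")
    return sum(1 for p in high if predicate(p)) / len(high)
```

A high share would be consistent with the 'high heat' explanation, though it would not rule out other contributors such as swing decisions.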