The Year in PITCHf/x Calibration
This week, I handed in potentially the final paper of my academic career. It was titled, "The History of PITCHf/x." That is to say that I greatly enjoy thinking about, reading about, and writing about PITCHf/x data. So I don't mean to cast PITCHf/x in a negative light by bringing up its calibration issues, but data is kind of worthless without knowing the error involved. And while PITCHf/x is precise within a fraction of an inch, the accuracy is not always there, as some ballparks can report errors more along the lines of fractions of a foot.
The list of public analysts who have completed data correction systems is only a few names long. I believe Mike Fast, Josh Kalk, Harry Pavlidis, and Ike Hall have done some quality work in the area. My first pass is likely not as rigorous as their methods, but I feel I stumbled upon enough points of interest to warrant writing something up. My sample consisted of the fastest 25% of pitches thrown by each pitcher in each game. I compared the actual properties of those pitches to a set of expected values. These expected values were generated by finding the average properties of pitches thrown in other ballparks by the same pitchers. There were five values that I tested: the initial horizontal and vertical position (release point), the resultant horizontal and vertical position (plate location), and the pitch velocity.
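The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for this article: the record layout (dicts with `pitcher`, `game`, `park`, and `velo` keys), the function names, and the rounding rule for the top quartile are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fastest_quartile(pitches):
    """Keep the fastest 25% of pitches from each pitcher in each game."""
    by_outing = defaultdict(list)
    for p in pitches:
        by_outing[(p["pitcher"], p["game"])].append(p)
    kept = []
    for outing in by_outing.values():
        ranked = sorted(outing, key=lambda p: p["velo"], reverse=True)
        # Assumption: round down, but always keep at least one pitch.
        kept.extend(ranked[:max(1, len(ranked) // 4)])
    return kept


def game_deltas(pitches, field):
    """Average (actual - expected) of `field` for each game."""
    by_pitcher_park = defaultdict(list)
    for p in pitches:
        by_pitcher_park[(p["pitcher"], p["park"])].append(p[field])

    diffs = defaultdict(list)
    for p in pitches:
        # Expected value: the same pitcher's average in every other park.
        elsewhere = [
            v
            for (pitcher, park), vals in by_pitcher_park.items()
            if pitcher == p["pitcher"] and park != p["park"]
            for v in vals
        ]
        if elsewhere:  # skip pitchers seen in only one park
            diffs[p["game"]].append(p[field] - mean(elsewhere))
    return {game: mean(d) for game, d in diffs.items()}
```

Calling `game_deltas(fastest_quartile(pitches), "velo")` would give one velocity delta per game. The same call with a release-point or plate-location field name covers the other four values, provided those fields exist in the records.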
One mid-August homestand in Houston jumped out at me. The graphs I present below contain the actual and expected values as detailed above, as well as the difference between the two, which loosely represents the magnitude of correction needed.
You can see that the actual release points and the expected release points follow each other quite well over the first half of the season. For instance, when two left-handed pitchers start, the average release point jumps to the opposite side of the graph. But then in August, the blue delta line spikes by a foot. I created a GIF comparing all of Brett Myers' release points leading up to his August 13 game with his recorded release points in that game. Without context, it would be easy to draw the conclusion that Myers had altered his approach.
Some parks were consistently miscalibrated the entire year. Or perhaps the rubber on the pitching mound was off-center. Kansas City had on average a three-inch difference between the actual and expected horizontal release points. This was certainly the fault of Dayton Moore.
More importantly, Kansas City overstated velocity, a trend fortunately spotted by Jeff Zimmerman early on in the season. Here, the delta line is plotted on a different axis.
On average, the delta was 1.1 miles per hour, the exact same number reported by Mike Fast.
Mike published his own 2010 velocity corrections on THT, and I found the correlation coefficient between his and mine to be 0.8.
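For anyone who wants to run the same kind of check between two sets of park corrections, here is a minimal sketch of a Pearson correlation coefficient. I am assuming Pearson's r is the coefficient in question, and the function name and inputs (two equal-length lists of per-park corrections, in the same park order) are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either list is constant.
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two sets of corrections rank and scale the parks similarly. It says nothing about whether either set is right.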
Texas was at the other end of the spectrum.
And Detroit was fine until the final months of the season.
Like Kauffman, Dodger Stadium was on average three inches off with its horizontal release points. Several parks deviated a couple of inches from what we'd expect with their vertical release points. Again, rubber position and mound heights are not standardized across MLB, so it could be that pitchers really do throw from different release points depending on the stadium. Citizens Bank and Yankee Stadium reported high release points, while Safeco and Petco came in lower.
Plate location adjustments are much harder to nail down. For one, the values reported by PITCHf/x around the plate are generally accurate, as they are more directly observed by cameras, as opposed to the release points which are extrapolated. Furthermore, pitchers vary their intended pitch locations much more than they do their release points. The park with the greatest pitch location abnormality is Yankee Stadium, and the reason is clear. The Yankees possess such a disproportionate number of left-handed batters that pitchers throw to the third-base side of the plate more than they would against any other team.
Correcting PITCHf/x data seems hard. Differences in a ballpark's configuration and a pitcher's intentions are difficult to separate from an oddity in PITCHf/x calibration. Including batter handedness appears vital, given that pitchers shift their position on the rubber or throw to a different side of the plate depending on which side the batter hits from. I do not think that an automated correction system is the answer. I can imagine how hard it would be to pick up on sudden shifts in the data that stem from recalibrations without also picking up on the random game-to-game noise. It would possibly be easiest to simply eyeball a span of time during which one fixed level of adjustment is needed.
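The eyeball approach could look something like the sketch below: pick the start and end dates of the suspect span by hand, take the average delta over those games as the fixed adjustment, and subtract it from the recorded values. The function names and the input layout (a list of `(date, delta)` pairs with ISO-formatted date strings) are assumptions for illustration only.

```python
from statistics import mean


def span_offset(game_deltas, start, end):
    """Mean delta over games dated from `start` to `end`, inclusive.

    Dates are ISO strings (YYYY-MM-DD), which sort correctly as text.
    """
    in_span = [delta for date, delta in game_deltas if start <= date <= end]
    if not in_span:
        raise ValueError("no games fall inside the chosen span")
    return mean(in_span)


def apply_offset(values, offset):
    """Subtract one fixed adjustment from every recorded value."""
    return [v - offset for v in values]
```

The span boundaries stay a human judgment call here. The code only does the arithmetic once someone has decided where a recalibration began and ended.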