Touching Bases December 24, 2010
The Year in PITCHf/x Calibration

This week, I handed in potentially the final paper of my academic career. It was titled, "The History of PITCHf/x." That is to say that I greatly enjoy thinking about, reading about, and writing about PITCHf/x data. So I don't mean to cast PITCHf/x in a negative light by bringing up its calibration issues, but data is kind of worthless without knowing the error involved. And while PITCHf/x is precise within a fraction of an inch, the accuracy is not always there, as some ballparks can report errors more along the lines of fractions of a foot.

The list of public analysts who have completed data correction systems is only a few names long. I believe Mike Fast, Josh Kalk, Harry Pavlidis, and Ike Hall have done some quality work in the area. My first pass is likely not as rigorous as their methods, but I feel I stumbled upon enough points of interest to warrant writing something up. My sample consisted of the fastest 25% of pitches thrown by each pitcher in each game. I compared the actual properties of those pitches to a set of expected values. These expected values were generated by finding the average properties of pitches thrown in other ballparks by the same pitchers. There were five values that I tested: the initial horizontal and vertical position (release point), the resultant horizontal and vertical position (plate location), and the pitch velocity.
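For anyone who wants to try the comparison described above, here is a minimal sketch of it, not the exact code behind the graphs. It assumes each pitch is a dictionary with hypothetical keys `pitcher`, `game`, `park`, and `speed`, plus whichever property is being tested; those names are placeholders, not PITCHf/x field names.

```python
from collections import defaultdict
from statistics import mean


def fastest_quarter(pitches):
    """Return the fastest 25% of pitches (at least one) by recorded speed."""
    ranked = sorted(pitches, key=lambda p: p["speed"], reverse=True)
    keep = max(1, len(ranked) // 4)
    return ranked[:keep]


def game_deltas(pitches, field):
    """Actual minus expected value of `field` for each (pitcher, game)."""
    by_game = defaultdict(list)
    for p in pitches:
        by_game[(p["pitcher"], p["game"], p["park"])].append(p)

    # Keep only the fastest quarter of pitches from each pitcher-game.
    samples = {key: fastest_quarter(group) for key, group in by_game.items()}

    deltas = {}
    for (pitcher, game, park), sample in samples.items():
        # Expected value: the same pitcher's sampled pitches in other parks.
        elsewhere = [
            q[field]
            for (other_pitcher, _, other_park), other in samples.items()
            if other_pitcher == pitcher and other_park != park
            for q in other
        ]
        if not elsewhere:
            continue  # no baseline for this pitcher outside this park
        actual = mean(p[field] for p in sample)
        deltas[(pitcher, game)] = actual - mean(elsewhere)
    return deltas
```

Calling `game_deltas(pitches, "speed")` or passing a release-point key gives one delta per pitcher-game, which can then be averaged by park and date to draw a delta line.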

One mid-August homestand in Houston jumped out at me. The graphs I present below contain the actual and expected values as detailed above, as well as the difference between the two, which loosely represents the magnitude of correction needed.

You can see that the actual release points and the expected release points follow each other quite well over the first half of the season. For instance, when two left-handed pitchers start, the average release point jumps to the opposite side of the graph. But then in August, the blue delta line spikes by a foot. I created a gif comparing all of Brett Myers' release points leading up to his August 13 game and his recorded release points in that game. Without context, it would be easy to draw the conclusion that Myers had altered his approach.

Some parks were consistently miscalibrated the entire year. Or perhaps the rubber on the pitching mound was off-center. Kansas City had on average a three-inch difference between the actual and expected horizontal release points. This was certainly the fault of Dayton Moore.

More importantly, Kansas City overstated velocity, a trend fortunately spotted by Jeff Zimmerman early on in the season. Here, the delta line is plotted on a different axis.

On average, the delta was 1.1 miles per hour, the exact same number reported by Mike Fast.
Mike published his own 2010 velocity corrections on THT, and I found the correlation coefficient between his and mine to be 0.8.

Texas was at the other end of the spectrum.

And Detroit was fine until the final months of the season.

Like Kauffman, Dodger Stadium was on average three inches off with its horizontal release points. Several parks deviated a couple inches from what we'd expect with their vertical release points. Again, rubber position and mound heights are not standardized across MLB, so it could be that pitchers do throw from different release points depending on the stadium. Citizens Bank and Yankee Stadium reported high release points, while Safeco and Petco came in lower.

Plate location adjustments are much harder to nail down. For one, the values reported by PITCHf/x around the plate are generally accurate, as they are more directly observed by cameras, as opposed to the release points which are extrapolated. Furthermore, pitchers vary their intended pitch locations much more than they do their release points. The park with the greatest pitch location abnormality is Yankee Stadium, and the reason is clear. The Yankees possess such a disproportionate number of left-handed batters that pitchers throw to the third-base side of the plate more than they would against any other team.

Correcting PITCHf/x data seems hard. Differences in a ballpark's configurations and a pitcher's intentions are difficult to separate from an oddity in PITCHf/x calibration. Including batter handedness appears vital, given that pitchers shift their position on the rubber or throw to a different side of the plate depending on batter handedness. I do not think that an automated correction system is the answer to correcting PITCHf/x data. I envision how hard it would be to pick up on sudden shifts in the data that stem from recalibrations without picking up on the random game-to-game noise. It would possibly be easiest to simply eyeball a span of time during which one fixed level of adjustment is needed.
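The eyeball approach can still be written down. The sketch below assumes the span boundaries have already been picked by hand from a graph, and it uses the median delta within each span as the fixed adjustment; the median is my own choice for illustration, and the function names are made up.

```python
from statistics import median


def span_adjustments(deltas, breakpoints):
    """Median delta for each hand-picked span of consecutive games.

    deltas: per-game deltas in chronological order.
    breakpoints: indices where a new span starts, chosen by eye.
    Returns a list of (start, end, level) with `end` exclusive.
    """
    edges = [0] + sorted(breakpoints) + [len(deltas)]
    spans = []
    for start, end in zip(edges, edges[1:]):
        if start < end:
            spans.append((start, end, median(deltas[start:end])))
    return spans


def apply_adjustments(deltas, spans):
    """Subtract each span's fixed level from the deltas inside that span."""
    out = list(deltas)
    for start, end, level in spans:
        for i in range(start, end):
            out[i] = deltas[i] - level
    return out
```

Using one level per span keeps the game-to-game noise from leaking into the correction, at the cost of relying on someone to choose the breakpoints.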

Per commenter request:

Any chance your paper will be available for viewing online any time in the future?

Also, good work as always. I knew about the Houston inaccuracies (I spotted them because they affected a few Mets pitchers while the Mets were in Houston during the abnormal results), but it's good to see it posted.

Thanks for posting this, Jeremy. It's great to see the calibration errors presented graphically. How consistent was the Yankee Stadium velocity last year?

Ha, no. Not that kind of paper. You wouldn't want to read it.

Let me know if you're aware of any other inaccuracies.

Nice article Jeremy. One minor correction. The home plate pitch locations are extrapolated about the same amount as the release point values. Because the Pitch Fx ball identification algorithm would be confused by batter and bat movement, the processing of ball locations ends about 8 to 10 feet in front of home plate. However, the accuracy of the home plate locations should be better because there are more calibration points close to home plate.

At the 2009 Summit we asked Marv White to publicly post when major corrections had occurred at the different parks and what the correction factors should be, so that we could all be using identical correction factors and have uniform databases for analysis. He agreed to this, but with his departure from Sportvision that request seems to have been lost in the shuffle. At this year's Summit I reminded Ryan Zander about that promise and he said that he would try to put it on the agenda of things to accomplish during the off season.

Thanks, Peter. I remember having read that the plate locations were more accurate than the release points, and I appreciate your clarification as to the reason why.

That would be nice if Sportvision published corrections, although I guess it would mean that I wasted several hours over the last few days trying to come up with something.

Good work, Jeremy.

I believe the Houston excursion in August 2010 is the worst in the history of PITCHf/x.

Given how often these things shift, and the fact that you can frequently see them progressively getting worse or better over time, I don't believe that the shifts are due to physical changes in the mound or pitching rubber. There may be a few such physical changes showing up in the data, but they don't make up the majority by any means.

Jeremy, what we would like from Sportvision is a record of when they recalibrated cameras and the resultant calibration parameters.

That's different than correcting the final data. The cameras can go out of calibration without Sportvision realizing it or doing anything about it, and in fact I believe they often do.

One other thought, regarding velocity corrections, the difficulty I have run into with them is that the correction varies based on velocity. For instance, an 85-mph pitch may need to be corrected by a different amount than a 95-mph pitch.

Position corrections are fairly straightforward (though nontrivial), but velocity corrections are a hairy mess by comparison.
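One simple way to picture a correction that varies with velocity is to fit a line between expected and recorded speeds for a park and span, then invert it. This is only a sketch under that linear assumption, with made-up function names; it is not the method used by anyone in this thread, and it ignores the interaction with position corrections.

```python
def fit_line(expected, actual):
    """Ordinary least-squares fit: actual ~ intercept + slope * expected."""
    n = len(expected)
    mean_x = sum(expected) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, actual))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def correct_speed(recorded, intercept, slope):
    """Invert the fitted line to map a recorded speed to the expected scale."""
    return (recorded - intercept) / slope
```

If the fitted slope is 1, this reduces to a fixed offset for every pitch; any other slope means an 85-mph pitch and a 95-mph pitch get different corrections.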

What Marv impressed upon me was that when a miscalibration occurs, it is going to affect all the Pitch Fx parameters in a nonlinear way. Corrections made on individual parameters, although close, are only going to be approximate. That's why we really need to have the resultant calibration parameters from Sportvision.

Peter, any corrections, whether made with knowledge of the camera calibration parameters or not, are going to be approximate. Knowing the camera calibration parameters would help us help Sportvision improve the initial accuracy of their data. So it would be helpful in that sense.

However, I don't believe that knowledge of the camera calibration parameters is necessary to provide optimally corrected data at a practical level.

Lucas, I added Yankee Stadium velocity to the end of the article.

Mike, good to know these are likely shifts in the PITCHf/x data and not the ballparks.

I had a line in the article that I took out about not knowing squat about drag forces, which led me to calculate velocity adjustments the same way. Can you explain in some way I might be able to understand why you can't just take off a fixed percentage from the initial velocity?

"Citizens Bank and Yankee Stadium reported high release points, while Safeco and Petco came in lower."

Any chance the clubs are purposefully raising/lowering the mound to combat the respective offensive effects these parks have? Probably nonsense but just a passing thought as I read the article.

Andrew, unless the Phillies are moving the mound up and down all the time, I doubt the changes in release point are a real thing. There have been at least six noticeable shifts up and down over the last two years at Citizens Bank Park, some of them as much as 3-4 inches.

Similarly, at Safeco, you'd have to believe that they gradually built up the mound by an inch or two over the first three homestands in 2009, then lowered it somewhat for the next two homestands, then gradually built it up again for the next two homestands, then maintained it through the first homestand in 2010, then radically dropped the mound by 3 inches, then raised it back an inch and let it gradually inch up until the last couple homestands of 2010.

Yankee Stadium and Petco haven't seen the same whipsaw back and forth in 2010, but they have in previous seasons.

It's not impossible that teams are changing the mounds, but it seems a bit outlandish of an explanation for most of the changes we observe in the data. Now it is possible that a real change in the mound could happen here or there and we wouldn't be able to tell it apart from all the changes that are happening due to camera calibration drift and changes.

Jeremy, I think the way you are calculating the velocity adjustments is fine for a first pass. I do something similar when I want a quick and dirty estimate, and I think it works fine for most purposes. But to actually produce a fully corrected data set, one has to account for the fact that the position adjustments also affect the velocity adjustments.

Mike, right, and the position adjustments also probably affect the position adjustments. I imagine a correction system is an iterative process. Thanks for hanging around here to answer questions.

Mike,

Thanks for the reply - that all makes sense. I guess I assumed the mound height was relatively uniform throughout the year(s) and not changing constantly as you described in your reply.

And I forgot to say in my first comment: very interesting article!

Man am I a fool. Sorry for the double post but I must correct my prior comment:

Thanks to JEREMY for the article and to MIKE for the reply.

Andrew,