PITCHf/x Detective: Has Bradley's Strike Zone Been Widened?
Bradley's claim that his strike zone has been widened this year was brought to my attention by Craig Calcaterra's ShysterBall blog, where he suggested that someone with "PITCHf/x-fu" could check the assertion. I am not 100% sure what "PITCHf/x-fu" is, but I like to think I have it. Either way, I thought this was an exciting new application of the PITCHf/x data, so I decided to take Craig up on it and see if Bradley's strike zone has been any different this year.
First off we need the smallest bit of background on the strike zone. It is called differently to right- and left-handed batters; the outside edge is extended out a couple inches to lefties. In addition, its size is count-dependent, expanding in hitter's counts and shrinking in pitcher's counts. These two facts make an assessment of Bradley's claims a little tricky. He is a switch hitter so we have to break up the analysis for him as a LHB and as a RHB. And any differences could be the result of differences in the fraction of time he is in hitter's versus pitcher's counts this year compared to the past.
The PITCHf/x system was phased in during 2007 and has been operational in every game since, so I am going to compare pitches Bradley took in the covered part of 2007 and all of 2008 to those he has taken in 2009 thus far (ignoring the count issue temporarily). Here are the pitches he took as a RHB. Remember, the images are from the catcher's perspective, so negative values of x are inside to a RHB and positive values are inside to a LHB. The gray dots are balls and the black dots are called strikes.
There are too few taken pitches in 2009 as a righty to draw a firm conclusion, but the zone does not look terribly out of whack. There are two called strikes on the inside edge, but right below them are four balls also along the inside edge.
Here are pitches he took as a LHB.
Bradley has way more at-bats as a lefty and thus there are more taken pitches. These additional pitches allowed me to make called strike contours. These contours are closed lines such that a pitch inside the line is a strike 50% of the time or more and a pitch outside the line is a ball 50% of the time or more. Here you can see how the outside edge of the strike zone is shifted farther outside to Bradley as a lefty, as is the case for all LHBs. The inside edges of the pre-2009 and 2009 zones are almost exactly the same. Up and outside the pre-2009 zone is larger, but down and outside the 2009 zone is larger. As a whole the two are almost exactly the same size.
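To make the idea of a 50% contour concrete, here is a rough sketch of one way to approximate it: chop the plane into square cells, compute the fraction of taken pitches in each cell that were called strikes, and keep the cells at or above 50%. This is a crude binned stand-in, not necessarily how the contours in the plots were drawn. The function names, the (x, z, called_strike) tuple layout, the cell size and the toy pitches are all assumptions made up for illustration.

```python
import math
from collections import defaultdict

def called_strike_grid(pitches, cell=0.5):
    """Return {cell index: fraction of taken pitches called strikes}.

    `pitches` is an iterable of (x, z, called_strike) tuples, where
    called_strike is 1 for a called strike and 0 for a ball. The
    coordinates and `cell` must share one unit (assumed, not from the post).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [strikes, total]
    for x, z, called_strike in pitches:
        key = (math.floor(x / cell), math.floor(z / cell))
        counts[key][0] += called_strike
        counts[key][1] += 1
    return {key: s / n for key, (s, n) in counts.items()}

def cells_inside_contour(grid, threshold=0.5):
    """Cells where a taken pitch is a strike at least `threshold` of the time."""
    return {key for key, frac in grid.items() if frac >= threshold}

# Toy data (invented): three pitches near the middle, three off the edge.
toy_pitches = [
    (0.1, 2.1, 1), (0.2, 2.2, 1), (0.3, 2.3, 0),
    (1.1, 2.1, 0), (1.2, 2.2, 0), (1.3, 2.4, 1),
]
grid = called_strike_grid(toy_pitches)
inside = cells_inside_contour(grid)
print(sorted(inside))
```

The boundary between the cells that make the cut and those that do not plays the role of the contour line. With real data a smoother would give a cleaner line than raw bins.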
To make this conclusion statistically explicit, and correct for the count, I ran a binomial logistic regression. This is a regression in which the dependent variable takes only two values, in this case 1 if a taken pitch is called a strike and 0 if it is called a ball. The dependent variable is regressed against any number of numeric and/or categorical variables. In effect this binomial logistic model uses these regressors to calculate the probability a taken pitch is called a strike, and tells you which of the regressors are statistically significant in determining that probability. The technique is identical to that taken in my earlier strike zone post, but this time I restrict the analysis to just Bradley's data.
I regressed Bradley's strike/ball taken pitches against the horizontal distance between that pitch and the horizontal middle of the zone (with a different middle for Bradley as a LHB and as a RHB), the vertical distance between that pitch and the vertical middle of the zone, the interaction of these two distances, the number of balls and strikes (to control for the count) and a categorical factor of pre-2009 or 2009.
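For readers who want to see the shape of such a model, here is a minimal, self-contained sketch that fits the same set of regressors by Newton-Raphson. It runs on synthetic pitches, not Bradley's data, and it is not the code behind the numbers in this post. The "true" coefficients, the distance ranges, the sample size and every function name are assumptions chosen only to make the example run.

```python
import math
import random

def sigmoid(t):
    """Logistic function, written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, iterations=15):
    """Maximum-likelihood logistic fit by Newton-Raphson; returns coefficients."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Synthetic taken pitches (invented). The last coefficient is the 2009
# effect, set to zero here so the simulated zone is the same in both periods.
random.seed(0)
NAMES = ["(Intercept)", "x Dist.", "y Dist.", "x*y", "Strikes", "Balls", "2009"]
TRUE_BETA = [4.0, -3.0, -3.0, 1.0, -0.9, 0.25, 0.0]
X, y = [], []
for _ in range(2000):
    x_dist = random.uniform(0.0, 1.5)   # distance from zone middle, arbitrary unit
    y_dist = random.uniform(0.0, 1.5)
    strikes = float(random.randint(0, 2))
    balls = float(random.randint(0, 3))
    is_2009 = 1.0 if random.random() < 0.3 else 0.0
    row = [1.0, x_dist, y_dist, x_dist * y_dist, strikes, balls, is_2009]
    p = sigmoid(sum(b * v for b, v in zip(TRUE_BETA, row)))
    X.append(row)
    y.append(1 if random.random() < p else 0)

beta = fit_logistic(X, y)
for name, b in zip(NAMES, beta):
    print(f"{name:>12}: {b: .3f}")
```

In practice a statistics package would do the fitting and report standard errors as well. The point here is only that the 2009 indicator is one more column in the design matrix, so its coefficient measures the year effect after distance and count are accounted for.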
Binomial Logistic Regression
+-----------------+----------+------------+---------+------------+
|                 | Estimate | Std. Error | z Value | P(>|z|)    |
+-----------------+----------+------------+---------+------------+
| (Intercept)     |    5.995 |      0.370 |   16.21 | < 2e-16 *  |
| x Dist.         |   -0.364 |      0.022 |  -16.37 | < 2e-16 *  |
| y Dist.         |   -0.526 |      0.031 |  -17.48 | < 2e-16 *  |
| x*y Interaction |    0.012 |      0.000 |   13.87 | < 2e-16 *  |
| Num. Strikes    |   -0.897 |      0.178 |   -5.03 | 4.8e-07 *  |
| Num. Balls      |    0.251 |      0.085 |    2.96 | 0.003 *    |
| 2009            |   -0.023 |      0.217 |   -0.10 | 0.914      |
+-----------------+----------+------------+---------+------------+
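The last two columns follow from the first two. The z value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a standard normal at that z. The sketch below recomputes them for three rows from the rounded figures in the table, so the results land close to, but not exactly on, the printed values. The function name is my own.

```python
import math

def wald_test(estimate, std_error):
    """Return (z, two-sided p-value) for a coefficient and its standard error."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# (estimate, standard error) copied from the table above.
rows = {
    "Num. Strikes": (-0.897, 0.178),
    "Num. Balls": (0.251, 0.085),
    "2009": (-0.023, 0.217),
}
results = {name: wald_test(est, se) for name, (est, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name:>12}: z = {z: .2f}, p = {p:.3g}")
```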
Regressors with a negative estimate decrease the likelihood of a pitch being called a strike. So as the x or y distance increases the probability of a strike decreases, as expected. As the number of strikes increases the probability of a strike decreases (the strike zone shrinks in pitcher's counts) and as the number of balls increases the probability of a strike increases (the strike zone expands in hitter's counts). All of these effects are strongly significant and mirror the results for all hitters.
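The estimates are on the log-odds scale, so exponentiating one gives the factor by which the odds of a called strike are multiplied when that regressor goes up by one, holding the others fixed. A small sketch using the table's estimates (the dictionary and variable names are mine):

```python
import math

# Estimates copied from the regression table (log-odds scale).
estimates = {
    "Num. Strikes": -0.897,
    "Num. Balls": 0.251,
    "2009": -0.023,
}

# exp(estimate) is the odds ratio for a one-unit increase in the regressor.
odds_ratios = {name: math.exp(b) for name, b in estimates.items()}
for name, ratio in odds_ratios.items():
    print(f"{name:>12}: odds multiplied by {ratio:.2f}")
```

Each additional strike multiplies the odds of a called strike by roughly 0.41 and each additional ball by roughly 1.29, while the 2009 factor sits at about 0.98, very close to no change at all.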
The difference between the pre-2009 and 2009 zones is very slight, and if anything the 2009 zone is slightly smaller. Taken pitches in 2009, correcting for distance and count, are slightly less likely to be strikes. But this effect is nowhere near statistically significant: with a p-value of 0.914, a difference at least this large would show up more than 90% of the time even if the two zones were truly identical. There is no statistically significant difference between Bradley's zone this year and his zone in 2007 and 2008.
I can understand why Bradley was frustrated on Sunday. The Cubs had just lost seven straight games, and in five of those games they scored either zero or one run. He is hitting a meager .196/.322/.373 this season, but he has his decreased BABIP and LD% and increased GB% to blame for it, not the umpires.