Controlling the Zone
"The STRIKE ZONE is that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the knee cap. The Strike Zone shall be determined from the batter's stance as the batter is prepared to swing at a pitched ball." Eddie Gaedel knows not a called strike. The 3-foot-7 dwarf took four balls in his lone Major League plate appearance. (If you want to see a discussion on the practicality of short pinch-hitters taken well beyond its logical extreme, follow this link. Gaedel physically shrunk the strike zone. I’m interested to see what batters can control the strike zone without any such advantage. Who manages to earn a ball on a pitch on the black or a strike on a pitch at the letters? That’s where pitchf/x comes into play. John Walsh and Dave Allen have found the true dimensions of the strike zone using pitchf/x data. Jonathan Hale has studied individual umpire strike zones and found that Cy Young winners and control pitchers get better calls, and Hale dispelled the myth that rookies get big leagued by umps. I assigned every pitch since 2008 an expected called strike probability based on the horizontal location of the pitch and a scaled vertical location*, while also accounting for batter handedness, pitch movement/velocity, and the umpire. After that, I added up the expected balls and called strikes of players, and the actual ball/strike numbers for all players. Here are the batters who have the largest disparity between their expected ball probability and the actual rate at which balls are called on them.
Michael Young and Carlos Beltran (who I suppose is synonymous with the called strike to Met fans) have the highest and lowest number of extra balls among all players, respectively. The average difference between a called strike and a ball is between a tenth and an eighth of a run. So Young has gotten nearly 20 runs of value out of controlling the strike zone better than Beltran has.
To look deeper into this, I plotted their respective strike zones (Beltran's a switch hitter, so two for him) against the league average strike zones. Inside these contour lines, a pitch is more likely than not to be called a strike, while outside the contour lines, pitches are called balls more than 50% of the time. The difference between Beltran and Young can be seen at the knees. I should note the caveat that this entire effect could be caused by a few stringers listing Beltran’s bottom of the strike zone too high and Young's too low.
I don't want to draw any rash conclusions about what type of players get the benefit of the doubt from umpires, but with three Rangers in the top ten, and another five Rangers in the next dozen on my list, I feel that I can say with confidence that Rudy Jaramillo is paying off umpires. Just throwing it out there. But I'm pretty sure it's true.
Seriously, though, one of the first things I noticed was that 10 of the top 30 players on the leaderboard were catchers. It turns out catchers are 2-3% more likely to have a pitch called a ball than average. It's entirely possible that that's just noise, of course.
I was especially interested in batters' luck in full count situations. The leverage of a full count is double that of any other count, with the disparity in value between a walk and an out coming in at around 0.6 runs. It turns out that Jack Cust, who has taken more full count pitches in the last two years than anyone but Adam Dunn, has had easily the best luck on full counts, with ten more balls called than expected. (Dunn's had one fewer than expected.)
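For a sense of scale, the run values quoted above can be turned into a back-of-envelope conversion. This is only a sketch built on the article's rough figures (a tenth to an eighth of a run per call, and about 0.6 runs between a walk and an out on a full count); the function names are made up for the example, and treating every full-count call as worth the full 0.6 runs is a simplifying assumption.

```python
# Rough run values quoted in the article; treat them as estimates.
RUNS_PER_CALL_LOW = 0.100        # "a tenth" of a run per ball-vs-strike call
RUNS_PER_CALL_HIGH = 0.125       # "an eighth" of a run
RUNS_PER_FULL_COUNT_CALL = 0.6   # walk vs. out on a full count

def run_value_range(extra_balls):
    """Low and high run estimates for a number of extra balls."""
    return (extra_balls * RUNS_PER_CALL_LOW, extra_balls * RUNS_PER_CALL_HIGH)

def full_count_run_value(extra_balls):
    """Run estimate when every extra ball comes on a full count (assumption)."""
    return extra_balls * RUNS_PER_FULL_COUNT_CALL

def implied_call_gap(runs):
    """Invert the conversion: how many calls a given run gap would imply."""
    return (runs / RUNS_PER_CALL_HIGH, runs / RUNS_PER_CALL_LOW)

# Cust's ten extra full-count balls, under the assumption above.
cust_runs = full_count_run_value(10)

# A 20-run gap, as between Young and Beltran, expressed in calls.
young_beltran_gap = implied_call_gap(20)
```

Under these assumptions, ten extra full-count balls come to roughly 6 runs, and a gap of nearly 20 runs corresponds to somewhere around 160 to 200 calls.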
Here I've plotted Cust's called strikes in green, balls in red, and the average LHB strike zone contour in blue. I count two strikes easily outside the zone, and nine balls that were easily inside the zone. Most batters experience a smaller strike zone on a full count than on average, but Cust has been particularly lucky. Serves him right for not swinging too often in a full count.
How about on the pitcher's side?
About the reliability of these ball and strike probabilities: For batters, the split-half correlation for "ball probability" (which I'm defining as the probability of a called pitch being called a ball above what is expected) reaches .5 when I limit my sample to batters with a minimum of 125 called pitches. It takes batters with at least 600 called pitches to reach a .7 correlation. The league average is 3.8 pitches per plate appearance, and an average of 2.1 of those pitches are called a ball or strike by the umpire. So I’d say that it takes about 300 plate appearances for this metric to stabilize, or a sample of players with at least 50 plate appearances to know to regress halfway to the mean. (You can compare that to more common metrics by reading the series by Pizza Cutter.)
For pitchers, r = .5 when the sample is limited to pitchers who have thrown at least 60 called pitches, and it takes 300 called pitches to reach an r of .7.
*And Glove Slap to Tango on how to scale vertical location. I unfortunately decided to use the mean values of every batter's top and bottom strike zone values as inputted by MLBAM stringers. I probably should have scaled to the median, or better yet the median by month. Maybe next time.
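For readers who want to try the reliability check themselves, here is one way a split-half correlation and the resulting regression to the mean might be computed. It is a sketch under stated assumptions, not the procedure used for the numbers above: the odd/even split, the function names, and the input layout (a list of per-pitch residuals for each player, where a residual is 1 for a called ball or 0 for a called strike, minus the expected ball probability) are all choices made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation; needs 2+ points and nonzero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def split_half_correlation(residuals_by_player, min_called=125):
    """Correlate each player's mean residual across two halves of his pitches.

    The odd/even split is an assumption; any random split would also do.
    """
    first, second = [], []
    for residuals in residuals_by_player.values():
        if len(residuals) < max(min_called, 2):
            continue  # too few called pitches to split
        even, odd = residuals[0::2], residuals[1::2]
        first.append(sum(even) / len(even))
        second.append(sum(odd) / len(odd))
    return pearson(first, second)

def regress_to_mean(observed, r, league_mean=0.0):
    """Shrink an observed rate toward the league mean by reliability r."""
    return league_mean + r * (observed - league_mean)

def called_pitches_to_pa(called_pitches, called_per_pa=2.1):
    """Convert called pitches to plate appearances at 2.1 called pitches per PA."""
    return called_pitches / called_per_pa
```

With r = .5, `regress_to_mean` moves an observed rate halfway back to the league mean, which is the "regress halfway" rule of thumb. At 2.1 called pitches per plate appearance, `called_pitches_to_pa(600)` works out to a little under 290 plate appearances, in line with the "about 300" figure.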
Comments
The really striking thing about this analysis is that the average strike zone is an ellipse! Isn't it supposed to be a rectangle? Maybe that's the whole problem. We are relying on umpires, whose eyes have round lenses, to call strikes within a rectangular boundary. They simply cannot see pitches on the outer parts of the boundary, and thus call them balls. That's a really good insight, but I wonder what anyone can do about it.
Posted by: tim at December 3, 2009 8:02 AM
I've been calling for umpires with square lenses for a long time now. I don't think the MLB is willing to invest the research $$ required to get it done. Too bad.
Posted by: cdm at December 4, 2009 6:53 AM
Jeremy, this is a great piece. But the part I respect the most is where you note the potential error of the stringers. I don't see that often.
Posted by: JDSussman at December 4, 2009 7:14 AM
Obviously, I was referring to some kind of computerized system that would open up the strike zone to a more rectangular dimension. Now go back to spending your dad's trust fund money on ironic t-shirts and weed, cdm.
Posted by: tim at December 4, 2009 11:28 AM