On Count-Based Linear Weights
Ever since the work of Joe P. Sheehan, pitch-by-pitch run values have been a staple of PITCHf/x analysis. More recently, Bloomberg analysts Craig Glaser and Pat Andriola really got me thinking about what these values might mean.
We all know that Cliff Lee's walk rate is otherworldly. But last week, Jeff Sullivan wrote, "Of the 201 pitchers in baseball with at least 50 innings pitched, Lee's three-ball count rate is lower than 67 individual walk rates." That is an awesome piece of information. Let's say you have a pitcher who somehow manages a walk rate identical to Lee's, and let's say he has the same strikeout and home run rates too. But what if we knew that this pitcher had, say, twice as many three-ball counts as Lee? The two may have been of equal value, but surely Lee projects better going forward.
FanGraphs has a whole assortment of what they call plate discipline stats. In essence, these stats are trying to separate the process from the results. A pitcher has a high strikeout rate. Does he throw a lot of strikes or does he induce out-of-zone swings? A batter has a high strikeout rate. Does he never swing or does he never make contact?*
*To those who do such things, please don't use contact rate to predict strikeout rate.
Here's where count-based linear weights come into play. Everything that happened before the result of a plate appearance can be summed up best by the count. A pitcher who walks nobody has better process if he never even goes to three-ball counts, like Cliff Lee.
Using Retrosheet data since 2002, I found the expected run value of the count at the final pitch of every plate appearance, excluding intentional walks. So if a player homers on the first pitch of a plate appearance, that goes down as 0 runs toward his count-based linear weights, since the count going into that pitch was 0-0. In turn, a pitcher will have a worse score if he walks a batter on a 3-0 count than on a 3-2 count. Here are the values straight from Joe's article. Harry Pavlidis and others have used updated values.
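For anyone who wants to try this at home, here is a rough Python sketch of the bookkeeping described above. It assumes each plate appearance has already been boiled down to the count going into its final pitch; the function name, the tuple layout, and especially the run values in `ILLUSTRATIVE_VALUES` are placeholders I made up for the example, not Joe's numbers or anyone's updated ones.

```python
from collections import defaultdict


def count_based_weights(plate_appearances, count_run_values):
    """Sum, per player, the run value of the count going into the final pitch.

    plate_appearances: iterable of (player, balls, strikes, intentional_walk)
    count_run_values:  dict mapping (balls, strikes) -> runs relative to 0-0
    """
    totals = defaultdict(float)
    for player, balls, strikes, intentional_walk in plate_appearances:
        if intentional_walk:
            continue  # intentional walks are excluded, as in the article
        totals[player] += count_run_values[(balls, strikes)]
    return dict(totals)


# Placeholder run values for illustration only (NOT real estimates).
# The 0-0 count is worth 0 by definition.
ILLUSTRATIVE_VALUES = {(0, 0): 0.0, (3, 0): 0.2, (3, 2): 0.05, (0, 2): -0.1}

# Hypothetical plate appearances for two made-up hitters.
sample = [
    ("A", 0, 0, False),  # first-pitch result: contributes 0 runs
    ("A", 3, 0, False),  # ended on a 3-0 count
    ("B", 3, 2, False),  # ended on a full count
    ("B", 0, 2, False),  # ended on an 0-2 count
    ("B", 3, 0, True),   # intentional walk: skipped
]

result = count_based_weights(sample, ILLUSTRATIVE_VALUES)
```

Passing the run-value table in as an argument makes it easy to swap Joe's values for an updated set without touching the tally itself. For pitchers, the same sum works with the sign read the other way: a higher total means worse counts.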
Barry Bonds and Curt Schilling stand unparalleled in getting into quality counts. Angel Berroa and Kirk Rueter, not so much. Players who get into good counts but have bad results are, more often than not, burned by BABIP.
As for the top and bottom performers of 2009, here are the hitters:
And the pitchers:
After spending some time with the data, I have unfortunately yet to find much predictive power in the metric beyond what we can get out of normal peripheral stats. Nevertheless, I think there's value to a count-based linear weight as a DIPS-type metric for pitchers.