Touching Bases
August 19, 2010
On Count-Based Linear Weights
By Jeremy Greenhouse

Ever since the work of Joe P. Sheehan, pitch-by-pitch run values have been a staple of PITCHf/x analysis. More recently, Bloomberg analysts Craig Glaser and Pat Andriola really got me thinking about what these values might mean.

We all know that Cliff Lee's walk rate is otherworldly. But last week, Jeff Sullivan wrote, "Of the 201 pitchers in baseball with at least 50 innings pitched, Lee's three-ball count rate is lower than 67 individual walk rates." That is an awesome piece of information. Let's say you have a pitcher who somehow manages a walk rate identical to Lee's, and we can say he has the same strikeout and home run rates too. But what if we knew that this pitcher had, say, twice as many three-ball counts as Lee? The two may have been of equal value so far, but surely Lee projects better going forward.

FanGraphs has a whole assortment of what they call plate discipline stats. In essence, these stats are trying to separate the process from the results. A pitcher has a high strikeout rate. Does he throw a lot of strikes or does he induce out-of-zone swings? A batter has a high strikeout rate. Does he never swing or does he never make contact?*

*To those who do such things, please don't use contact rate to predict strikeout rate.

Here's where count-based linear weights come into play. Everything that happened before the result of a plate appearance can be summed up best by the count. A pitcher who walks nobody has better process if he never even goes to three-ball counts, like Cliff Lee.

Using Retrosheet data since 2002, I found the expected run value of the count on which each plate appearance ended, that is, the count just before the final pitch, excluding intentional walks. So if a player homers on the first pitch of an at-bat, that goes down as 0 runs toward his count-based linear weights. Likewise, a pitcher will have a worse score if he walks a batter on a 3-0 count than on a 3-2 count. Here are the values straight from Joe's article. Harry Pavlidis and others have used updated values.

Count  Runs/PA
3&0    0.207
3&1    0.137
2&0    0.097
3&2    0.062
2&1    0.035
1&0    0.034
0&0    0.000
1&1   -0.016
2&2   -0.037
0&1   -0.043
1&2   -0.083
0&2   -0.104
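
As a rough sketch of the bookkeeping described above (not the actual code behind these numbers), here is one way to total count-based linear weights in Python. The run values are copied from the table. The names `COUNT_RUNS` and `count_lwts` and the three sample plate appearances are made up for illustration, and the input is assumed to already exclude intentional walks.

```python
# Run values keyed by (balls, strikes), copied from the table above.
COUNT_RUNS = {
    (3, 0): 0.207, (3, 1): 0.137, (2, 0): 0.097, (3, 2): 0.062,
    (2, 1): 0.035, (1, 0): 0.034, (0, 0): 0.000, (1, 1): -0.016,
    (2, 2): -0.037, (0, 1): -0.043, (1, 2): -0.083, (0, 2): -0.104,
}


def count_lwts(final_counts):
    """Sum the run value of the count each PA stood at before its final pitch.

    `final_counts` is a list of (balls, strikes) tuples, one per plate
    appearance. Assumption: intentional walks were filtered out upstream.
    """
    return sum(COUNT_RUNS[count] for count in final_counts)


# Hypothetical batter: a first-pitch homer (0-0), a walk from 3-0,
# and a strikeout from 0-2. The outcomes themselves are ignored.
example = [(0, 0), (3, 0), (0, 2)]
total = count_lwts(example)
print(round(total, 3))
```

For these three made-up plate appearances the total comes to about 0.103 runs: nothing for the first-pitch homer, plus 0.207 for reaching 3-0, minus 0.104 for falling behind 0-2.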

Barry Bonds and Curt Schilling stand unparalleled in getting into quality counts. Angel Berroa and Kirk Rueter, not so much. Players who get into good counts but have bad results are, more often than not, burned by BABIP.

As for the top and bottom performers of 2009, here are the hitters:

Player           PAs  Total lwts  Count lwts
Chipper Jones    577         9.5        11.1
Lance Berkman    546        24.6         8.7
Albert Pujols    665        59.0         8.6
Adrian Gonzalez  657        35.8         8.2
Nick Swisher     655        15.6         7.7
Ivan Rodriguez   447       -20.8       -10.9
Jose Lopez       642        -5.8       -10.9
David Eckstein   553       -16.4       -10.9
Miguel Olivo     414        -1.8       -11.1
Clint Barmes     607       -10.9       -16.1

And the pitchers:

Player             PAs  xFIP  Count lwts
Cliff Lee         1103  3.69       -20.9
Roy Halladay       962  3.05       -20.8
Justin Verlander   968  3.26       -19.6
Johan Santana      689  4.13       -16.6
Cole Hamels        885  3.69       -16.1
Kyle Davies        532  5.12         4.8
Doug Davis         871  4.68         5.1
Joe Saunders       837  4.8          5.2
Trevor Cahill      764  4.92         5.2
Zach Miner         405  4.86         6.1

After spending some time with the data, I've unfortunately yet to find much predictive power in the metric beyond what we can already get out of the usual peripheral stats. Nevertheless, I think there's value in a count-based linear weight as a DIPS-type metric for pitchers.

Comments

Good stuff as usual. I'm wondering, though - is using whiff rate as a predictor of strikeout rate *completely* useless? Even though there are other variables involved, I'd figure that a large disparity between whiff rate and strikeout rate would even out over a season. One example I've noticed this year is Phil Hughes, who's maintained a relatively low whiff rate for most of the year and has recently seen his strikeout rate plummet.

Jeremy, I don't quite understand. How did you compute "count lwts"? Each single count in an at-bat times the weight? That is, if a plate appearance goes to 3-2, did you count all four previous counts? What did you do with the final pitch?

I guess I am a bit confused as well.

What is the difference between the "count lwts" and "total lwts"?

How were they computed, as studes asked? Thanks.

Studes, sorry I wasn't clear.

If a PA goes to 3-2, I counted it as 0.062 runs. If you add up the pitch-by-pitch changes in run value across the previous counts, you will still end up at 0.062. I didn't count the final pitch, since that pitch is what we already know from runs created or linear weights.
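
To make that concrete, here is a minimal sketch in Python, assuming each pitch is credited with the change in count run value it produces. The values come from the table in the post. The helper name `pitch_deltas` and the particular route to 3-2 are made up for illustration.

```python
# Run values keyed by (balls, strikes), copied from the table in the post.
COUNT_RUNS = {
    (3, 0): 0.207, (3, 1): 0.137, (2, 0): 0.097, (3, 2): 0.062,
    (2, 1): 0.035, (1, 0): 0.034, (0, 0): 0.000, (1, 1): -0.016,
    (2, 2): -0.037, (0, 1): -0.043, (1, 2): -0.083, (0, 2): -0.104,
}


def pitch_deltas(path):
    """Change in count run value produced by each pitch along a path of counts."""
    return [COUNT_RUNS[after] - COUNT_RUNS[before]
            for before, after in zip(path, path[1:])]


# One hypothetical route to a full count: ball, strike, ball, strike, ball.
path = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]
deltas = pitch_deltas(path)

# The intermediate counts cancel out, leaving only the final count's value.
print(round(sum(deltas), 3), COUNT_RUNS[(3, 2)])
```

Because every intermediate count is added once and subtracted once, any route to 3-2 gives the same total, which is why only the final count needs to be looked up.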

I'm still not clear on what you are doing. For example, the value for ending the PA on the 1st pitch is 0.000--yet the OPS+ for these PAs is 130 or so.

And the link you provide for not trying to predict K rate from contact rate is a thick piece on multicollinearity, which will only be understood by people with a better statistics background than many of your readers. Why not just state the principle in simple English?

I think I get it. Instead of giving the batter who puts a ball in play on the first pitch the league avg result of that, you are giving him the expectancy right before that, which is of course 0 lwts runs.

Not sure if this is the best way to do it.

Dave,

Sorry if you took offense to my being glib in directing people to the concept of multicollinearity. It was meant as a throwaway line.

I'm wholly open to the thought that the method I used is completely wrong. I do think the concept of using the final count of each plate appearance has merit in some way or another, so if you have any ideas on how to convert that into runs, I'd love to hear them.

Well, why not give the batter credit for the average result of a PA ending on a given count? For example, in 2009 my quick calc shows that a PA ending on the first pitch was worth around +.04 lwts runs. Give the guy the .04 'count runs', whether or not his physical talent (and luck etc) actually resulted in his gaining .07, or losing .02, on the 1st pitch.

Most analysts like to use the 'through' counts as the preferable set of numbers, but IMO all that really matters is the end result, which are the 'at' counts. If a batter ends a PA on 1-2, who cares if he went thru the favorable 1-0 count on the way there--he ended the PA on (on average) a mediocre pitch to hit.