The Baseball Analysts: On Count-Based Linear Weights

By Jeremy Greenhouse

Ever since the work of Joe P. Sheehan, pitch-by-pitch run values have been a staple of PITCHf/x analysis. More recently, Bloomberg analysts Craig Glaser and Pat Andriola really got me thinking about what these values might mean.

We all know that Cliff Lee's walk rate is otherworldly. But last week, Jeff Sullivan wrote, "Of the 201 pitchers in baseball with at least 50 innings pitched, Lee's three-ball count rate is lower than 67 individual walk rates." That is an awesome piece of information. Let's say you have a pitcher who somehow manages a walk rate identical to Lee's, and we can say he has the same strikeout and home run rates too. But what if we knew that this pitcher had, say, twice as many three-ball counts as Lee? The two may have been of equal value so far, but surely Lee projects better going forward.

FanGraphs has a whole assortment of what they call plate discipline stats. In essence, these stats are trying to separate the process from the results. A pitcher has a high strikeout rate. Does he throw a lot of strikes or does he induce out-of-zone swings? A batter has a high strikeout rate. Does he never swing or does he never make contact?*

*To those who do such things, please don't use contact rate to predict strikeout rate.

Here's where count-based linear weights come into play. Everything that happened before the result of a plate appearance can be summed up best by the count. A pitcher who walks nobody has better process if he never even goes to three-ball counts, like Cliff Lee.

Using Retrosheet data since 2002, I took the count on which every plate appearance ended, excluding intentional walks, and credited the player with the expected run value of that count. The final pitch itself is not counted. So if a player homers on the first pitch of an at-bat, that goes down as 0 runs toward his count-based linear weights. In turn, a pitcher will have a worse score if he walks a batter on a 3-0 count than on a 3-2 count. Here are the values straight from Joe's article. Harry Pavlidis and others have used updated values.

Count  Runs/PA
3&0    0.207
3&1    0.137
2&0    0.097
3&2    0.062
2&1    0.035
1&0    0.034
0&0    0.000
1&1   -0.016
2&2   -0.037
0&1   -0.043
1&2   -0.083
0&2   -0.104
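
As a minimal sketch of the bookkeeping, assuming each plate appearance has already been reduced to the count it ended on (the Retrosheet parsing is not shown), the table can be applied as below. The function name `count_lwts` and the four sample plate appearances are made up for illustration.

```python
# Run values by (balls, strikes), copied from the table above.
COUNT_RUN_VALUES = {
    (3, 0): 0.207, (3, 1): 0.137, (2, 0): 0.097, (3, 2): 0.062,
    (2, 1): 0.035, (1, 0): 0.034, (0, 0): 0.000, (1, 1): -0.016,
    (2, 2): -0.037, (0, 1): -0.043, (1, 2): -0.083, (0, 2): -0.104,
}


def count_lwts(final_counts):
    """Sum the run value of the count each plate appearance ended on.

    The result of the final pitch is ignored on purpose: only the count
    the player reached before that pitch earns credit.
    """
    return sum(COUNT_RUN_VALUES[count] for count in final_counts)


# Hypothetical sample: a first-pitch homer, a walk on 3-0,
# a walk on 3-2, and a strikeout on 0-2.
sample = [(0, 0), (3, 0), (3, 2), (0, 2)]
total = count_lwts(sample)
print(round(total, 3))
```

From the batter's side, the first-pitch homer adds nothing and the 3-0 walk adds more than the 3-2 walk, which matches the description above.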

Barry Bonds and Curt Schilling stand unparalleled in getting into quality counts. Angel Berroa and Kirk Rueter, not so much. Players who get into good counts but have bad results are, more often than not, burned by BABIP.

As for the top and bottom performers of 2009, here are the hitters:

Player	PAs	Total lwts	Count lwts
Chipper Jones	577	9.5	11.1
Lance Berkman	546	24.6	8.7
Albert Pujols	665	59.0	8.6
Adrian Gonzalez	657	35.8	8.2
Nick Swisher	655	15.6	7.7
Ivan Rodriguez	447	-20.8	-10.9
Jose Lopez	642	-5.8	-10.9
David Eckstein	553	-16.4	-10.9
Miguel Olivo	414	-1.8	-11.1
Clint Barmes	607	-10.9	-16.1

And the pitchers:

Player	PAs	xFIP	Count lwts
Cliff Lee	1103	3.69	-20.9
Roy Halladay	962	3.05	-20.8
Justin Verlander	968	3.26	-19.6
Johan Santana	689	4.13	-16.6
Cole Hamels	885	3.69	-16.1
Kyle Davies	532	5.12	4.8
Doug Davis	871	4.68	5.1
Joe Saunders	837	4.8	5.2
Trevor Cahill	764	4.92	5.2
Zach Miner	405	4.86	6.1

After spending some time with the data, I've unfortunately yet to find much predictive power in the metric, beyond what we can get out of normal peripheral stats. Nevertheless, I think there's value to a count-based linear weight as a DIPS-type metric for pitchers.

Comments

Good stuff as usual. I'm wondering, though - is using whiff rate as a predictor of strikeout rate *completely* useless? Even though there are other variables involved, I'd figure that a large disparity between whiff rate and strikeout rate would even out over a season. One example I've noticed this year is Phil Hughes, who's maintained a relatively low whiff rate for most of the year and has recently seen his strikeout rate plummet.

Posted by: Lucas Apostoleris at August 19, 2010 6:48 AM

Jeremy, I don't quite understand. How did you compute "count lwts?" Each single count in an at bat times the weight? That is, if a plate appearance goes to 3-2, did you count all four previous counts? What did you do with the final pitch?

Posted by: studes at August 19, 2010 7:48 AM

I guess I am a bit confused as well.

What is the difference between the "count lwts" and "total lwts"?

How were they computed, as studes asked? Thanks.

Posted by: Erik at August 19, 2010 9:29 AM

Studes, sorry I wasn't clear.

If a PA goes to 3-2, I counted it as 0.062 runs. If you add the four previous counts, you will still eventually get to 0.062. I didn't count the final pitch, since that pitch is what we already know from runs created or linear weights.
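
A small check of that arithmetic, as a sketch: if each pitch is credited with the change in count value it causes, the credits along any path to 3-2 cancel out and leave only the value of 3-2. The particular path below is a made-up example, and the values are the ones from the table in the article.

```python
# Run values by (balls, strikes), copied from the table in the article.
COUNT_RUN_VALUES = {
    (3, 0): 0.207, (3, 1): 0.137, (2, 0): 0.097, (3, 2): 0.062,
    (2, 1): 0.035, (1, 0): 0.034, (0, 0): 0.000, (1, 1): -0.016,
    (2, 2): -0.037, (0, 1): -0.043, (1, 2): -0.083, (0, 2): -0.104,
}

# One hypothetical route to a full count: ball, strike, ball, strike, ball.
path = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2)]

# Credit each pitch with the change in count value it produced.
deltas = [
    COUNT_RUN_VALUES[after] - COUNT_RUN_VALUES[before]
    for before, after in zip(path, path[1:])
]

total = sum(deltas)
print(round(total, 3))
```

Any other route to 3-2 gives the same total, because every intermediate count is added once and subtracted once.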

Posted by: Jeremy Greenhouse at August 19, 2010 11:40 AM

I'm still not clear on what you are doing. For example, the value for ending the PA on the 1st pitch is 0.000--yet the OPS+ for these PAs is 130 or so.

And the link you provide for not trying to predict K rate from contact rate is a thick piece on multicollinearity, which will only be understood by people with a better statistics background than many of your readers. Why not just state the principle in simple English?

Posted by: dave at August 19, 2010 3:32 PM

I think I get it. Instead of giving the batter who puts a ball in play on the first pitch the league avg result of that, you are giving him the expectancy right before that, which is of course 0 lwts runs.

Not sure if this is the best way to do it.

Posted by: dave at August 19, 2010 3:37 PM

Dave,

Sorry if you took offense to my being glib in directing people to the concept of multicollinearity. It was meant as a throwaway line.

I'm wholly open to the thought that the method I used is completely wrong. I do think the concept of using the final count of each plate appearance has merit in some way or another, so if you have any ideas on how to convert that into runs, I'd love to hear them.

Posted by: Jeremy Greenhouse at August 19, 2010 3:56 PM

Well, why not give the batter credit for the average result of a PA ending on a given count? For example, in 2009 my quick calc shows that a PA ending on the first pitch was worth around +.04 lwts runs. Give the guy the .04 'count runs', whether or not his physical talent (and luck etc) actually resulted in his gaining .07, or losing .02, on the 1st pitch.

Most analysts like to use the 'through' counts as the preferable set of numbers, but IMO all that really matters is the end result, which are the 'at' counts. If a batter ends a PA on 1-2, who cares if he went thru the favorable 1-0 count on the way there--he ended the PA on (on average) a mediocre pitch to hit.
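
A rough sketch of that 'at' count idea, with invented numbers: average the linear weights result of every league PA that ended on a given count, then hand each batter that average for each of his PAs. The function names and the tiny league sample are hypothetical, chosen only so the first-pitch average lands near the +.04 mentioned above.

```python
from collections import defaultdict


def average_by_ending_count(league_pas):
    """Mean lwts result of PAs grouped by the count they ended on."""
    totals = defaultdict(float)
    n = defaultdict(int)
    for count, runs in league_pas:
        totals[count] += runs
        n[count] += 1
    return {count: totals[count] / n[count] for count in totals}


def at_count_runs(player_counts, averages):
    """Credit a player with the league average for each ending count."""
    return sum(averages[count] for count in player_counts)


# Made-up league sample: (ending count, lwts result of that PA).
league = [
    ((0, 0), 0.47), ((0, 0), -0.27), ((0, 0), -0.08),
    ((1, 2), -0.27), ((1, 2), -0.25), ((1, 2), 0.10),
]
averages = average_by_ending_count(league)

# A hypothetical batter who ended two PAs on 0-0 and one on 1-2.
player_total = at_count_runs([(0, 0), (0, 0), (1, 2)], averages)
print(round(averages[(0, 0)], 2), round(player_total, 2))
```

The batter gets the same credit whether his own first-pitch swings produced homers or pop-ups, which is the point of the suggestion.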

Posted by: dave at August 19, 2010 4:31 PM