Baseball Beat
February 20, 2009
Using Z-Scores to Rank Pitchers
By Rich Lederer

If you're not a stathead, fantasy geek, or a baseball nerd, then you might want to skip ahead to the rankings of pitchers in the middle and at the bottom of this article. Or you just may want to skip this article altogether and check out Deadspin, the Onion, or read the latest story or opinion on Alex Rodriguez and his cousin.

You see, I've been sorting and manipulating spreadsheets on the computer in my parents' basement (kind of embarrassing when you're 53) for the past several days. However, I'm not only planning on seeing the light of day this afternoon, I will be one of the fortunate souls who will attend two season openers today: Stephen Strasburg and San Diego State are facing Bethune-Cookman at 2 p.m. PT at the MLB Youth Academy in Compton and the Dirtbags are meeting the Trojans at 6:30 p.m. at Dedeaux Field on the campus of USC. I'll be sure to trade in my pajamas and green eyeshade for a pair of jeans and my Long Beach State (my hometown team) and USC (my college) baseball caps.

In the meantime, thanks to loyal reader and baseball enthusiast Ryan Thibodaux, I have developed a system to rank pitchers based on their strikeout, walk, and groundball rates. I had categorized pitchers by K and GB rates last week before adding BB to the mix earlier this week. The K and GB rankings were grouped in quadrants while the K-BB-GB rankings were presented in eight different sets.

On average, we know that pitchers in the northeast quadrant and those with above-average K-BB-GB rates fared better than their peers, yet many of the top hurlers fell into the southeast quadrant despite sporting strikeout rates – the most important variable of the three – that were superior to many of their counterparts in the more tony neighborhood of the NEQ. So which one is better? A pitcher with above-average K and GB/K-BB-GB rates or one with an outstanding K rate and more modest BB-GB rates?

To help answer that question, Ryan posted a spreadsheet with z-scores on a fantasy baseball website that linked to one of my articles above. After reading the thread and a comment that he left on our site, I contacted him and proposed that he weight the three variables by their relative impact rather than evenly. The deltas in above-average and below-average ERA and RA (vs. their means) for each of the various classifications, as well as the individual K, BB, and GB correlations to ERA and RA, suggested to me that strikeout rates were nearly two times as important as walk rates and five times as important as groundball rates. The best-fit ratio was approximately 5:3:1 or 5:2.5:1.

If you're one of the statheads, fantasy geeks, or baseball nerds still with me, here are the correlation coefficients for strikeout, walk, and groundball rates to ERA and RA for the universe of 135 starting pitchers with 100 or more innings last year:

           K          BB          GB
ERA     -0.5786     0.3306     -0.1121
RA      -0.5918     0.3118     -0.0796

Using standard deviations (4.32% for K, 2.29% for BB, and 6.70% for GB), Ryan created z-scores (which indicate how many standard deviations an observation is above or below the mean) and then weighted them using the 5:2.5:1 ratios mentioned above. The weighted z-scores produced correlations of -0.7228 for ERA and -0.7203 for RA. By squaring these correlations, we produce coefficients of determination (R²) that measure how well outcomes are predicted by the model. Accordingly, the 5:2.5:1 weighting explains a little over 50 percent of the variation in ERA and RA among these pitchers (R² of roughly 0.52), which is incredibly high given that team defense accounts for the lion's share of the unexplained balance. While we can improve the R² by substituting HR rates for GB, the former is not as reliable as the latter in terms of predicting future performance.
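For readers who want to try the mechanics themselves, here is a minimal Python sketch of the z-score and weighting steps described above. The four pitchers and their rates are invented for illustration and are not from the spreadsheet. Flipping the sign on walks (so that higher is always better) and using the population standard deviation are my assumptions about how the sheet was built, not something confirmed by it.

```python
from statistics import mean, pstdev

# Hypothetical (K%, BB%, GB%) rates for a toy pool of pitchers.
# These numbers are made up; they are NOT the article's data.
pitchers = {
    "Pitcher A": (26.0, 8.0, 45.0),
    "Pitcher B": (18.0, 6.0, 50.0),
    "Pitcher C": (15.0, 9.5, 40.0),
    "Pitcher D": (21.0, 7.0, 55.0),
}

WEIGHTS = (5.0, 2.5, 1.0)  # K : BB : GB, the ratio given in the article
SIGNS = (1.0, -1.0, 1.0)   # assumption: walks are flipped so higher is better


def z_scores(values):
    """Standard deviations above or below the pool mean for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


def weighted_z(pool):
    """Weighted sum of the three z-scores for every pitcher in the pool."""
    names = list(pool)
    columns = list(zip(*pool.values()))      # one tuple per stat
    z_cols = [z_scores(col) for col in columns]
    totals = {}
    for i, name in enumerate(names):
        totals[name] = sum(
            w * s * z_cols[j][i]
            for j, (w, s) in enumerate(zip(WEIGHTS, SIGNS))
        )
    return totals


scores = weighted_z(pitchers)
ranking = sorted(scores, key=scores.get, reverse=True)

# R-squared from the correlations quoted in the article.
r2_era = (-0.7228) ** 2
r2_ra = (-0.7203) ** 2

print(ranking)
print(round(r2_era, 4), round(r2_ra, 4))
```

Because each column of z-scores sums to zero, the weighted totals across the pool also sum to zero, which is a handy sanity check on any spreadsheet built this way.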

The K-BB-GB rates and z-score rankings can be accessed in this spreadsheet. The 135 qualifying pitchers were separated in quintiles by color. As such, there are 27 starters in each grouping or about one per team. If you'd like, think in terms of each quintile as No. 1s, No. 2s, No. 3s, No. 4s, and No. 5s in starting rotations. The reality is that front, middle, and back of the rotation starters are determined based on quality (which is the sole determinant of these rankings) and quantity (ability to pitch every fifth day, go deep into games, and amass a lot of innings over the course of a season).

The top quintile is presented below.


Tim Lincecum, CC Sabathia, Dan Haren, Roy Halladay, A.J. Burnett, Cliff Lee, Brandon Webb, Mike Mussina, Chad Billingsley, Roy Oswalt, Edinson Volquez, Derek Lowe, James Shields, and Ryan Dempster are all residents of the northeast quadrant. Sabathia, Haren, Halladay, Lee, Webb, Mussina, Oswalt, Lowe, and Shields are nine of the dozen pitchers from the K+/BB+/GB+ grouping. The other three are John Lackey, who heads up the second quintile; Andy Pettitte, ranked fourth in the second quintile and 31st overall; and Jon Lester, another member of the second quintile.

For purposes of illustration, I have included Lincecum's z-scores for K/BF and BB/BF (top row) and GB (bottom row) below. The colored portion of the normal distribution represents the area of probability. (You can compute your own z-scores in this applet.)



Lastly, here are the top 30 relievers as measured by z-scores.


Mariano Rivera and Jonathan Papelbon 1-2. There must be something to this methodology.


Nice article. The spreadsheet did not upload, but I will blame my computer rather than your file.

I've been pushing the strikeouts and groundball pitcher type for a long time too, so I'm not complaining, but this is basically just xFIP with a slightly different proxy for batted ball stats, no?

I grabbed the data from THT, stuck it in the spreadsheet, and the correlation between Weighted Z and xFIP was -.93. That's about as strong an inverse correlation as you're going to get from two things that aren't completely identical.

Z-scores ... I love them. So simple, yet so informative.

Clearly you have to choose the best stats for it to work correctly, which is done here, though the sample size of each pitcher may need adjusting (see Grant Balfour). I've been tempted to create a multivariate efficiency control chart with similar metrics, but with weightings of my creation. However, I don't have the time to work on this ... just throwing it out there.

David: Yes. I didn't compute the correlation between the weighted z-scores and xFIP but noticed the strong similarity when comparing the two rankings. However, I had never seen anyone place an actual weighting on these variables but apologize if it is something I've overlooked. Furthermore, I like my presentation (showing the K, BB, and GB rates in percentage terms, particularly compared to the means, as well as detailing the standard deviations for each variable) and believe it is more comprehensive and revealing for those who wish to study the whys and wherefores of ERA, RA, FIP, or xFIP.

As a roto "not so geek", this stuff is genius in identifying highly skilled pitchers and pockets of value. I mean, Mike Adams? Who's going to bid on Mike Adams? I might now after I look at him further.

Nice work!

This is probably inane (or just wrong) to somebody with a deep understanding of statistics, but...12.3 standard deviations from the norm?! That's way the hell out there, no? If these skills were distributed normally, that would be like the 99.9999999th percentile or something. Wow.

Indeed, this is a fine piece of work and very helpful.
I believe, however, that Mike Adams is currently hurt and not expected to pitch much this season.

@Mark R: Keep in mind that you're looking at weighted Z-scores in that final column, so Lincecum is not 12.3 standard deviations from the mean.

He is, however, 2.70 standard deviations from the mean with regard to K/BF. Weighted at 5x, that puts his weighted Z for K/BF at 13.5. Then he loses a little bit due to his walk rate (-.51 * 2.5 = -1.275) and gains a mere 0.07 back from his GB% Z-score (weighted at 1x). Totaled, that's how we get 12.30.
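The arithmetic in this comment can be checked in a few lines of Python. The three z-scores are the ones quoted above (with the walk value already signed so that a negative number hurts the pitcher), and the variable names are just mine.

```python
# Z-scores as quoted in the comment above; BB is already signed so that
# a negative value counts against the pitcher.
lincecum_z = {"K": 2.70, "BB": -0.51, "GB": 0.07}
weights = {"K": 5.0, "BB": 2.5, "GB": 1.0}  # the article's 5:2.5:1 ratio

# Contribution of each skill to the weighted total.
contributions = {stat: weights[stat] * lincecum_z[stat] for stat in weights}
weighted_total = sum(contributions.values())

print(contributions)
print(round(weighted_total, 3))
```

The total works out to 12.295 from these rounded inputs, in line with the 12.30 shown in the spreadsheet.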

Wish it included the last three years.


I like your line of thinking...


Thanks for the clarification.

What significance does the weighted z-score end up having, though? Isn't it supposed to approximate the pitcher's total effectiveness relative to his peers?

@Mark R: Yes, these Weighted Z's provide a measure of overall skill relative to other pitchers. However, that measure doesn't have a direct relationship to mean and standard deviation any longer once weightings are applied and totaled.

Notice that if you do a simple sum of Z-scores in individual skill categories, C.C. Sabathia actually has a higher cumulative Z-score than Lincecum (3.17 to 2.26). But, as Rich explained, K/BF is about 5 times as important as GB% and BB/BF is about 2.5 times as important as GB%, hence the 5:2.5:1 weightings.

Even though C.C. had better walk and ground ball rates in 2008, Lincecum's superior K-rate brings his weighted Z-score total above C.C.'s.
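To see how the ordering can flip once weights are applied, here is a small sketch. Lincecum's three z-scores are the ones quoted earlier in the comments. "Pitcher X" is purely hypothetical: his components are invented so that they add up to 3.17, the simple sum cited above, and they should not be read as Sabathia's actual z-scores.

```python
WEIGHTS = (5.0, 2.5, 1.0)  # K : BB : GB


def simple_sum(z):
    """Unweighted total of the three z-scores."""
    return sum(z)


def weighted_sum(z):
    """Total with the article's 5:2.5:1 weights applied."""
    return sum(w * v for w, v in zip(WEIGHTS, z))


# (K, BB, GB) z-scores for Lincecum, as quoted in the comments.
lincecum = (2.70, -0.51, 0.07)

# HYPOTHETICAL components chosen only so the simple sum equals 3.17.
pitcher_x = (1.20, 1.10, 0.87)

simple_lincecum = simple_sum(lincecum)
simple_x = simple_sum(pitcher_x)
weighted_lincecum = weighted_sum(lincecum)
weighted_x = weighted_sum(pitcher_x)

print(round(simple_lincecum, 2), round(simple_x, 2))
print(round(weighted_lincecum, 3), round(weighted_x, 3))
```

The more balanced pitcher wins on the simple sum, while the strikeout specialist wins once K/BF counts five times as much as GB%.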

This is a very good method to identify the best pitchers. I use a similar method when I approximate fantasy baseball values. But, what does this add? It's a solid method and extremely well done, but I'm not sure I understand why this is any different than just ranking pitchers by QERA or adjusting FIP for neutral HR/FB numbers. It seems like somewhat of a lateral move, albeit an effective method with probably solid implications for other evaluations. Is there something I'm missing? Even if I'm not, it's definitely a very solid method, and I'm sure it's useful in many contexts. Well done.

Very nice! This is easy to understand intuitively, and the single-score summation gives a very nice way of ranking and comparing the pitchers.

If I could suggest one improvement: Instead of just selecting weights for the parameters ad hoc, it might be more effective to regress RA or ERA on the three variables and use the coefficients estimated by the regression (or some multiple of them if you want to scale them) as the weights. It will probably improve your correlation between the weighted score and RA/ERA and might make it easier to explain where the weights came from.
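The regression idea suggested here could look roughly like the sketch below, written in plain Python so it runs without any add-on libraries. Everything in it is an assumption for illustration: the data are synthetic, the "true" coefficients are invented, and the helper names `solve` and `ols` are mine. It is not the method used for the article's rankings.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic stand-in for 135 pitchers: three z-scores each (K, BB, GB),
# with BB signed so that higher is better. None of this is real data.
random.seed(0)
true_coefs = (-0.50, -0.25, -0.10)  # invented effect of each z-score on ERA
zs = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(135)]
era = [
    4.30 + sum(c * z for c, z in zip(true_coefs, row)) + random.gauss(0, 0.1)
    for row in zs
]

intercept, b_k, b_bb, b_gb = ols(zs, era)

# Express the fitted coefficients relative to GB, like the 5:2.5:1 ratio.
ratios = (b_k / b_gb, b_bb / b_gb, 1.0)
print(round(intercept, 2), [round(r, 2) for r in ratios])
```

One caveat with this approach: the smallest coefficient (GB here) is the noisiest, so ratios expressed relative to it can swing quite a bit from one sample to the next.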

are these adjusted for league and/or park? thanks.

The weights were not selected ad hoc. I calculated and provided the correlations on each of the three variables as well as various weighted combinations. The latter correlations max out at just over .72 with the best fit at approximately 5-3-1 or 5-2.5-1, both of which produced almost identical correlations.
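Trying "various weighted combinations" and keeping the one with the strongest correlation can be sketched as a simple grid search. The version below is only a guess at the general shape of that process. The data are synthetic, the step size and search range are arbitrary choices of mine, and the GB weight is pinned at 1 because multiplying all three weights by the same number does not change a correlation.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def composite(zs, w_k, w_bb, w_gb=1.0):
    """Weighted z-score total for each pitcher."""
    return [w_k * k + w_bb * bb + w_gb * gb for k, bb, gb in zs]


def best_weights(zs, era, step=0.5, max_w=8.0):
    """Grid-search K and BB weights (GB fixed at 1) for the most negative r."""
    best = None
    n_steps = int(max_w / step)
    for i in range(1, n_steps + 1):
        for j in range(0, n_steps + 1):
            w_k, w_bb = i * step, j * step
            r = pearson(composite(zs, w_k, w_bb), era)
            if best is None or r < best[0]:
                best = (r, w_k, w_bb, 1.0)
    return best


# Synthetic stand-in for 135 pitchers; none of this is real data.
random.seed(1)
zs = [tuple(random.gauss(0, 1) for _ in range(3)) for _ in range(135)]
era = [
    4.30 - 0.50 * k - 0.25 * bb - 0.10 * gb + random.gauss(0, 0.5)
    for k, bb, gb in zs
]

r_521 = pearson(composite(zs, 5.0, 2.5), era)
best = best_weights(zs, era)
print(round(r_521, 4), best)
```

On data like this, neighboring weightings tend to produce nearly the same correlation, which fits the observation that 5-3-1 and 5-2.5-1 were almost indistinguishable.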

No, the K, BB, and GB rates, standard deviations, and z-scores are not adjusted for league or park effects.

Just out of curiosity what program are you using to generate the probability graphs?

This is really cool!! Do you think there is a way to calculate a team's z score?

Ross: The link to the applet for creating those probability graphs was provided directly above the images.

Amy: Sure. We didn't calculate z-scores for teams but the idea would apply at the club level as well. The spreadsheet is available via a link in the article should you wish to pursue this matter.

I meant to do a z-score for the entire team, not just the pitchers. I am actually doing a presentation for my MBA on whether the Phillies can win the World Series title again in 2009. I have done a complete analysis of the stats from 2008 along with payroll and manager performance, but was wondering if a z-score could be done, and if so, whether it would be a factor. I used the Cubs, Red Sox, and Rays in this analysis. Any help would be great.