Using Z-Scores to Rank Pitchers
If you're not a stathead, fantasy geek, or baseball nerd, then you might want to skip ahead to the rankings of pitchers in the middle and at the bottom of this article. Or you may just want to skip this article altogether and check out Deadspin or the Onion, or read the latest story or opinion on Alex Rodriguez and his cousin.
You see, I've been sorting and manipulating spreadsheets on the computer in my parents' basement (kind of embarrassing when you're 53) for the past several days. However, I'm not only planning on seeing the light of day this afternoon, I will be one of the fortunate souls who will attend two season openers today: Stephen Strasburg and San Diego State are facing Bethune-Cookman at 2 p.m. PT at the MLB Youth Academy in Compton, and the Dirtbags are meeting the Trojans at 6:30 p.m. at Dedeaux Field on the campus of USC. I'll be sure to trade in my pajamas and green eyeshade for a pair of jeans and my Long Beach State (my hometown team) and USC (my college) baseball caps.
In the meantime, thanks to loyal reader and baseball enthusiast Ryan Thibodaux, I have developed a system to rank pitchers based on their strikeout, walk, and groundball rates. I had categorized pitchers by K and GB rates last week before adding BB to the mix earlier this week. The K and GB rankings were grouped in quadrants while the K-BB-GB rankings were presented in eight different sets.
On average, we know that pitchers in the northeast quadrant and those with above-average K-BB-GB rates fared better than their peers, yet many of the top hurlers fell into the southeast quadrant despite sporting strikeout rates – the most important variable of the three – that were superior to many of their counterparts in the more tony neighborhood of the NEQ. So which one is better? A pitcher with above-average K and GB/K-BB-GB rates or one with an outstanding K rate and more modest BB-GB rates?
To help answer that question, Ryan posted a spreadsheet with z-scores on a fantasy baseball website that linked to one of my articles above. After reading the thread and a comment that he left on our site, I contacted him and proposed that he weight the three variables by their relative impact rather than evenly. The deltas in above-average and below-average ERA and RA (vs. their means) for each of the various classifications, as well as the individual K, BB, and GB correlations to ERA and RA, suggested to me that strikeout rates were nearly two times as important as walk rates and five times as important as groundball rates. The best-fit ratio was approximately 5:3:1 or 5:2.5:1.
If you're one of the statheads, fantasy geeks, or baseball nerds still with me, here are the correlation coefficients for strikeout, walk, and groundball rates to ERA and RA for the universe of 135 starting pitchers with 100 or more innings last year:
          K          BB         GB
ERA    -0.5786     0.3306    -0.1121
RA     -0.5918     0.3118    -0.0796
Using standard deviations (4.32% for K, 2.29% for BB, and 6.70% for GB), Ryan created z-scores (which indicate how many standard deviations an observation is above or below the mean) and then weighted them using the 5:2.5:1 ratio mentioned above. The weighted scores produced correlations of -0.7228 for ERA and -0.7203 for RA. By squaring these correlations, we produce coefficients of determination (R²) that measure how well outcomes are predicted by the model. Accordingly, the 5:2.5:1 weighting explains about 52 percent of the variation in pitchers' ERA and RA, which is incredibly high given that team defense accounts for the lion's share of the unexplained balance. While we can improve the R² by substituting HR rates for GB, the former is not as reliable as the latter in terms of predicting future performance.
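For the statheads who want to tinker, here is a minimal Python sketch of the weighted z-score described above. It is not Ryan's actual spreadsheet formula. The standard deviations and the 5:2.5:1 weights come from this article, but the league means, the sample pitcher, the sign flip on walks (fewer is better), and the division by the sum of the weights are all assumptions made for illustration.

```python
# Standard deviations (in percentage points) are the ones quoted in the article.
SD = {"k": 4.32, "bb": 2.29, "gb": 6.70}

# ASSUMPTION: placeholder league means, not the actual 2008 values.
MEAN = {"k": 17.0, "bb": 8.0, "gb": 44.0}

# The 5:2.5:1 weighting for K, BB, and GB from the article.
WEIGHTS = {"k": 5.0, "bb": 2.5, "gb": 1.0}

# ASSUMPTION: walks hurt a pitcher, so the BB z-score is flipped
# so that a higher composite score is always better.
SIGN = {"k": 1.0, "bb": -1.0, "gb": 1.0}


def z_score(value, mean, sd):
    """Number of standard deviations a value sits above or below the mean."""
    return (value - mean) / sd


def weighted_z(rates):
    """Combine the K, BB, and GB z-scores using the 5:2.5:1 weights."""
    total = sum(
        WEIGHTS[stat] * SIGN[stat] * z_score(rates[stat], MEAN[stat], SD[stat])
        for stat in WEIGHTS
    )
    # ASSUMPTION: dividing by the weight total keeps the result on a z-like scale.
    return total / sum(WEIGHTS.values())


def r_squared(r):
    """Coefficient of determination from a correlation coefficient."""
    return r * r


if __name__ == "__main__":
    hypothetical_pitcher = {"k": 25.0, "bb": 7.0, "gb": 45.0}  # made-up rates
    print(round(weighted_z(hypothetical_pitcher), 3))
    # Squaring the article's correlations gives 0.5224 (ERA) and 0.5188 (RA).
    print(round(r_squared(-0.7228), 4), round(r_squared(-0.7203), 4))
```

Dividing by the sum of the weights does not change the order of the rankings. It only rescales the numbers, so a pitcher who is exactly one standard deviation better than average in all three categories scores 1.0.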
The K-BB-GB rates and z-score rankings can be accessed in this spreadsheet. The 135 qualifying pitchers were separated into quintiles by color. As such, there are 27 starters in each grouping or about one per team. If you'd like, think of each quintile as No. 1s, No. 2s, No. 3s, No. 4s, and No. 5s in starting rotations. The reality is that front, middle, and back of the rotation starters are determined based on quality (which is the sole determinant of these rankings) and quantity (ability to pitch every fifth day, go deep into games, and amass a lot of innings over the course of a season).
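The quintile split itself is simple enough to sketch in a few lines of Python. The function name and the made-up scores below are hypothetical, and the spreadsheet may well have been sorted and colored by hand. The idea is only to rank pitchers by composite score and cut the list into five equal groups.

```python
def assign_quintiles(scores):
    """Map each pitcher to a quintile from 1 (best) to 5 (worst).

    `scores` is a dict of pitcher name -> composite z-score,
    where a higher score is assumed to be better.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    # Integer arithmetic puts the first n/5 names in quintile 1, and so on.
    return {name: (i * 5) // n + 1 for i, name in enumerate(ranked)}


if __name__ == "__main__":
    # Made-up scores for 135 placeholder pitchers, purely for illustration.
    fake_scores = {f"Pitcher {i}": float(i) for i in range(135)}
    quintiles = assign_quintiles(fake_scores)
    group_sizes = [list(quintiles.values()).count(q) for q in range(1, 6)]
    print(group_sizes)
```

With 135 qualifiers, each group comes out to 27 pitchers, matching the count above.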
The top quintile is presented below.
Tim Lincecum, CC Sabathia, Dan Haren, Roy Halladay, A.J. Burnett, Cliff Lee, Brandon Webb, Mike Mussina, Chad Billingsley, Roy Oswalt, Edinson Volquez, Derek Lowe, James Shields, and Ryan Dempster are all residents of the northeast quadrant. Sabathia, Haren, Halladay, Lee, Webb, Mussina, Oswalt, Lowe, and Shields are nine of the dozen pitchers from the K+/BB+/GB+ grouping. The other three are John Lackey, who heads up the second quintile; Andy Pettitte, ranked fourth in the second quintile and 31st overall; and Jon Lester, another member of the second quintile.
For purposes of illustration, I have included Lincecum's z-scores for K/BF and BB/BF (top row) and GB (bottom row) below. The colored portion of the normal distribution represents the area of probability. (You can compute your own z-scores in this applet.)
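If you don't have the applet handy, the shaded area under the normal curve can be approximated with Python's standard library. This is a sketch that assumes the rates are roughly normally distributed. The z-score used in the example is a made-up value, not Lincecum's actual figure.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    hypothetical_z = 2.0  # made-up z-score, two standard deviations above the mean
    share_below = normal_cdf(hypothetical_z)
    print(f"About {share_below:.1%} of pitchers would fall below z = {hypothetical_z}")
```

The result is the share of the distribution expected to fall below a given z-score, which is the usual way of turning a z-score into a percentile.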
Lastly, here are the top 30 relievers as measured by z-scores.
Mariano Rivera and Jonathan Papelbon rank 1-2. There must be something to this methodology.