Categorizing Pitchers by Batted Ball Types and Strikeout Rates
We all know that strikeouts are the best outcome for a pitcher. When a batter fails to put the ball in play, there is little or no chance for him to reach base or to advance runners on base. Among batted ball types, infield flies are the least harmful, followed by groundballs, outfield flies, and line drives. Although groundballs result in a higher batting average than fly balls, their run impact is lower because the hits are usually limited to singles and an occasional double down the first or third base line, whereas balls in the air that turn into hits are almost always doubles, triples, or home runs.
According to Dave Studenmund's Batted Balls Redux article in The Hardball Times Baseball Annual 2007, strikeouts had a run impact of -0.113, infield flies -0.088, groundballs 0.045, outfield flies 0.192, and line drives 0.391 per incident last year.
Based on the above information, it follows that just as pitchers with high strikeout rates would generally fare better than those with low rates, pitchers with high groundball rates would normally fare better than those with low rates (all else being equal). It also suggests that pitchers who combine higher strikeout and groundball rates will outperform those with lower rates.
To provide a visual aid to categorize such pitchers, I created a graph (with the help of David Appelman of FanGraphs), plotting the strikeout and groundball rates for everyone in the major leagues who completed 100 or more innings and started in at least 33% of their appearances. The y-axis is groundball percentage (GB%) and the x-axis is strikeouts per batter faced (K/BF). The graph is divided into four quadrants with the mid-point equal to the average GB% of 43.80% and average K/BF of 15.88%.
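As a rough sketch of how the quadrant assignment works (this is not the code behind the graph), the snippet below places a pitcher by comparing his two rates to the midpoints quoted above. The function name `quadrant`, the constant names, and the tie-break rule are assumptions made for illustration; the sample rates are taken from the tables later in the article.

```python
# Midpoints quoted in the article: average GB% and K/BF among qualifiers.
AVG_GB_PCT = 43.80
AVG_K_PER_BF = 15.88


def quadrant(gb_pct: float, k_per_bf: float) -> str:
    """Return the compass quadrant (NE, SE, SW, or NW) for one pitcher.

    North/south follows the y-axis (GB%); east/west follows the x-axis (K/BF).
    A rate exactly equal to the average is treated as below average, which is
    an arbitrary tie-break chosen for this sketch.
    """
    north_south = "N" if gb_pct > AVG_GB_PCT else "S"
    east_west = "E" if k_per_bf > AVG_K_PER_BF else "W"
    return north_south + east_west


# (GB%, K/BF) pairs as listed in the article's tables.
sample = {
    "Francisco Liriano": (55.33, 30.44),
    "Johan Santana": (40.59, 26.54),
    "Scott Elarton": (29.49, 9.78),
    "Chien-Ming Wang": (62.80, 8.44),
}

for name, (gb_pct, k_per_bf) in sample.items():
    print(name, quadrant(gb_pct, k_per_bf))
```

Liriano lands in the NE, Santana in the SE, Elarton in the SW, and Wang in the NW, matching where each appears in the tables.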
The northeast quadrant comprises pitchers with above-average groundball and strikeout rates; the southeast quadrant encompasses pitchers with above-average strikeout and below-average groundball rates; the northwest quadrant is made up of pitchers with above-average groundball and below-average strikeout rates; and the southwest quadrant is the home for pitchers with below-average groundball and strikeout rates.
The average ERA among all qualifiers was 4.44. The average ERA for starters with above-average K rates was 4.12. The average ERA for those with below-average K rates was 4.78. Similarly, the average ERA for starters with above-average GB rates was 4.24, while the average ERA for those with below-average GB rates was 4.62. The average ERA by quadrants:
     W     E
N  4.53  3.94
S  5.01  4.27
Not surprisingly, pitchers with the highest strikeout and groundball rates had the lowest average ERA (3.94), while those with the lowest K and GB rates had the highest average ERA (5.01). In the hybrid categories, pitchers with above-average strikeout and below-average groundball rates (4.27) beat those with below-average K and above-average GB rates (4.53).
Looking at the outliers in the graph leads to the same conclusion. There isn't a one of us who wouldn't take Brandon Webb, Felix Hernandez, Chris Carpenter, or Francisco Liriano over Scott Elarton or Runelvys Hernandez. Liriano, in fact, arguably had the best combination of K and GB rates, while Elarton had the worst combo. Lo and behold, Liriano had the lowest ERA (2.16) among starters with at least 100 IP and Elarton had one of the worst (5.34).
Let's take a closer look at the results, starting with the northeast quadrant and going clockwise. The pitchers are sorted by K/BF rates in the first three tables and by GB% in the last table.
NORTHEAST QUADRANT (ABOVE-AVG GB AND K RATES)
Name GB% K/BF
Francisco Liriano 55.33% 30.44%
Carlos Zambrano 46.88% 22.90%
Brett Myers 45.55% 22.69%
Roger Clemens 49.02% 22.62%
Jeremy Bonderman 48.17% 22.37%
John Smoltz 46.29% 21.98%
Scott Olsen 44.80% 21.81%
Felix Hernandez 57.72% 21.57%
C.C. Sabathia 45.05% 21.45%
Chris Carpenter 53.34% 20.54%
A.J. Burnett 50.49% 20.45%
Erik Bedard 48.81% 20.26%
Josh Johnson 45.77% 20.18%
Adam Loewen 48.48% 19.44%
Andy Pettitte 49.77% 19.16%
Dave Bush 46.65% 19.10%
Danny Haren 45.24% 18.92%
Brandon Webb 66.48% 18.74%
Kelvim Escobar 44.70% 18.63%
Roy Oswalt 48.80% 18.53%
Josh Beckett 45.10% 18.18%
Vicente Padilla 44.10% 17.89%
Doug Davis 44.08% 17.59%
Cory Lidle 49.72% 17.45%
Kevin Millwood 44.57% 17.31%
Dontrelle Willis 47.54% 16.41%
Jose Contreras 44.62% 16.09%
Wandy Rodriguez 44.95% 16.04%
That's an elite group of pitchers. The best of the bunch are those with strikeout rates above 20% and/or groundball rates over 50%. I had already singled out Hernandez, Carpenter, and Liriano, but check out A.J. Burnett. He is one of the premier pitchers in baseball when healthy.
While Jose Contreras and Wandy Rodriguez fall into the NE quadrant, both pitchers have K and GB rates that are close to the league average. As such, I would be reluctant to label either one as a special pitcher. On the other hand, I would be shocked if Hernandez (4.52), Adam Loewen (5.37), and Josh Beckett (5.01) don't lower their ERA by at least 0.50 and perhaps by more than 1.00 in 2007. With Liriano injured, I would rank Felix directly behind Johan Santana as the pitcher most likely to lead the AL in ERA this year.
SOUTHEAST QUADRANT (BELOW-AVG GB AND ABOVE-AVG K RATES)
Name GB% K/BF
Ben Sheets 40.41% 26.98%
Scott Kazmir 41.97% 26.72%
Johan Santana 40.59% 26.54%
Cole Hamels 38.86% 25.99%
Jake Peavy 38.00% 25.41%
Pedro Martinez 36.29% 24.91%
Daniel Cabrera 40.72% 23.72%
Orlando Hernandez 33.78% 23.46%
Chris Young 25.37% 22.31%
Curt Schilling 39.83% 21.94%
Matt Cain 35.98% 21.88%
Aaron Harang 38.69% 21.75%
Jered Weaver 30.03% 21.43%
Mike Mussina 42.37% 21.39%
Javier Vazquez 39.84% 21.10%
Ian Snell 42.78% 20.79%
John Lackey 42.99% 20.61%
Jason Schmidt 37.40% 20.13%
Ted Lilly 37.73% 20.08%
Boof Bonser 41.69% 20.05%
Randy Johnson 41.71% 20.00%
Oliver Perez 30.12% 19.28%
James Shields 42.75% 19.26%
Gil Meche 43.12% 19.24%
Byung-Hyun Kim 41.51% 18.72%
Chris Capuano 39.88% 18.59%
Bronson Arroyo 38.15% 18.55%
Brad Penny 43.54% 18.20%
Chuck James 27.67% 18.06%
Kyle Lohse 42.89% 17.11%
Ervin Santana 38.41% 16.67%
Claudio Vargas 39.86% 16.47%
Ricky Nolasco 38.84% 16.15%
Taylor Buchholz 43.73% 16.08%
Rodrigo Lopez 42.61% 16.06%
Justin Verlander 41.72% 15.98%
Barry Zito 38.20% 15.98%
Ryan Madson 42.79% 15.97%
There are a couple of dozen outstanding pitchers in this group, most notably those listed in the top half (or with K rates over 20%). Ben Sheets, Scott Kazmir, and Santana had almost identical K and GB rates. Cole Hamels and Jake Peavy rank just below this threesome with metrics not too dissimilar from one another or those immediately above them.
Oliver Perez (6.55) has the most room to shave a couple of runs off his ERA. Having gone to bat for Daniel Cabrera (4.74) last year, I hesitate to bring up his name again but he and Ian Snell (4.74) are good bets to show improvement in 2007.
SOUTHWEST QUADRANT (BELOW-AVG GB AND K RATES)
Name GB% K/BF
Chan Ho Park 43.60% 15.84%
Jason Jennings 43.77% 15.74%
Victor Santos 43.64% 15.52%
Brett Tomko 37.47% 15.48%
Tim Wakefield 39.39% 14.75%
Freddy Garcia 41.19% 14.72%
Cliff Lee 32.70% 14.63%
Esteban Loaiza 42.10% 14.29%
Odalis Perez 43.72% 14.14%
Jon Lieber 42.96% 14.01%
Tony Armas Jr. 38.52% 14.00%
Jeff Weaver 38.99% 13.90%
Eric Milton 30.80% 13.60%
Jaret Wright 38.30% 13.44%
Livan Hernandez 36.58% 13.35%
Michael O'Connor 36.17% 12.97%
Jarrod Washburn 39.87% 12.73%
Joe Blanton 43.13% 12.50%
Jae Seo 35.40% 12.45%
Jon Garland 42.11% 12.44%
Noah Lowry 36.38% 12.19%
Josh Fogg 42.55% 12.16%
Jamie Moyer 40.03% 12.08%
Seth McClung 37.25% 12.07%
Brad Radke 41.64% 12.05%
Shawn Chacon 32.70% 12.02%
Ramon Ortiz 40.81% 11.94%
Woody Williams 35.69% 11.54%
Kris Benson 41.32% 11.27%
Jason Marquis 42.88% 11.03%
John Koronka 42.30% 11.01%
Paul Byrd 38.52% 10.93%
Steve Trachsel 41.52% 10.73%
Runelvys Hernandez 38.58% 9.84%
Scott Elarton 29.49% 9.78%
Carlos Silva 43.62% 8.63%
This is the quadrant that you want to avoid. It is inhabited by some of the worst starters in the game. If you fail to miss bats and don't keep the ball on the ground when it is put into play, you are going to run into trouble. There is basically only one way to survive in this quadrant: throwing strikes and maintaining a low walk rate. Freddy Garcia, Jon Lieber, Jon Garland, Brad Radke, Paul Byrd, and Carlos Silva fit this description. But these types of pitchers live on the edge with very little margin for error.
Chan Ho Park, Jason Jennings, and Victor Santos were near league average in K and GB rates and should be classified more like Jose Contreras and Wandy Rodriguez (both of whom fell in the NE quadrant) than the rest of their SW brethren.
NORTHWEST QUADRANT (ABOVE-AVG GB AND BELOW-AVG K RATES)
Name GB% K/BF
Derek Lowe 67.04% 13.47%
Chien-Ming Wang 62.80% 8.44%
Jake Westbrook 60.80% 12.06%
Jason Johnson 59.02% 10.25%
Jamey Wright 58.06% 11.69%
Aaron Cook 57.77% 10.05%
Tim Hudson 57.66% 14.70%
Roy Halladay 57.33% 15.07%
Kirk Saarloos 54.00% 9.49%
Clay Hensley 53.87% 15.50%
Paul Maholm 53.05% 14.85%
Miguel Batista 51.66% 12.09%
Zach Duke 51.12% 12.51%
Greg Maddux 50.80% 13.57%
Kenny Rogers 50.07% 11.66%
Luke Hudson 49.10% 14.55%
Mark Hendrickson 47.59% 13.77%
Joel Pineiro 47.46% 11.55%
Sean Marshall 46.81% 13.68%
Nate Robertson 46.75% 15.55%
Jeff Suppan 46.65% 12.43%
Casey Fossum 45.88% 14.81%
Aaron Sele 45.76% 12.64%
Matt Morris 45.66% 12.96%
Brian Moehler 45.25% 10.43%
Jeff Francis 44.67% 13.88%
Anibal Sanchez 44.61% 15.35%
Mark Redman 44.41% 10.27%
Mark Buehrle 44.35% 11.19%
Tom Glavine 44.28% 15.56%
Elizardo Ramirez 44.00% 14.84%
Enrique Gonzalez 43.84% 14.29%
This is an interesting group of pitchers. As a whole, they rank well behind those in the NE quadrant and well ahead of those in the SW quadrant. Although they are the opposite of the pitchers in the SE quadrant, their results (in terms of ERA) are the most similar. The two groups just get there in drastically different ways. The NW pitchers succeed by inducing grounders and keeping the ball in the park, whereas the SE hurlers thrive on strikeouts.
Derek Lowe, Chien-Ming Wang, and Jake Westbrook are the biggest outliers - three pitchers who turned more than 60% of batted balls into grounders. As such, Lowe (0.58), Wang (0.50), and Westbrook (0.64) had extraordinarily low HR/9 rates. They also have one other common thread: low walk rates. Lowe (2.27), Wang (2.15), and Westbrook (2.34) offset their high hit rates by limiting the number of bases on balls.
Roy Halladay, Clay Hensley, Nate Robertson, Anibal Sanchez, and Tom Glavine - all with K/BF rates exceeding 15% - are within a few whiskers of being in the NE group. However, Sanchez and Glavine are close to league average with K rates just to the west and GB% slightly to the north of the means.
I would rather know a pitcher's strikeout and groundball rates than his ERA. Throw in a third dimension - walk rates - and you have almost everything you need to know about a pitcher. Focusing on the components gives one a much more comprehensive understanding of a pitcher's upside and downside than looking at a single metric such as ERA.
(Notes: I could have chosen run average (RA) rather than earned run average (ERA), but the results would have been essentially the same in both direction and magnitude. As previously demonstrated, groundball pitchers generally give up a greater percentage of unearned runs because more errors (of the fielding and throwing type) are committed on grounders than balls hit in the air. Ballpark factors, team defense, and the level of competition may affect the components and/or ERA to varying degrees.)
Tomorrow: Categorizing Relievers by Batted Ball Types and Strikeout Rates.
Comments
It's really tricky, Rich. ERA measures some qualities of a pitcher not covered by the components, controlling the running game (which impacts DP rates) and the ability to pitch from the stretch being the two major ones.
If one is looking only at one year's worth of statistics, I'd rather know the component data. Over the longer haul, I'd rather know the ERA but the component data is still very useful in understanding seasonal changes which may be indicative of longer term growth or decline.
Posted by: Mike Green at January 8, 2007 9:05 AM
I agree that "ERA measures some qualities of a pitcher not covered by the components" but it is also affected by areas outside the control of the pitcher (such as team defense and bullpen support).
It basically comes down to one's preference for FIP or DIPS vs. ERA. However, rather than using HR rate as in FIP/DIPS, I like GB% because it has a strong correlation with HR rate and GIDP rate.
ERA may do a better job explaining a pitcher's past performance, but I believe the components do a better job at predicting future performance.
I also like categorizing pitchers in this manner because you can get a better feel for the "how" and "why" they differ from others. In addition, the quadrants are visually useful to me.
Posted by: Rich Lederer at January 8, 2007 9:27 AM
I love looking at graphs....great way to really look at data and see information.
I wonder, if you took the 10-15 pitchers that fell closest to the center of the quadrants and averaged their ERAs, what that would get you and who they would be.
If I have a second, I'll give it a try. Looking forward to seeing the relievers version of this.
Tim
Red Sox Times
Posted by: Tim at January 8, 2007 1:48 PM
~~~According to Dave Studenmund's Batted Balls Redux article in The Hardball Times Baseball Annual 2007, strikeouts had a run impact of -0.113, infield flies -0.088, groundballs 0.045, outfield flies 0.192, and line drives 0.391 per incident last year.~~~
Rich, Dave, or anyone -
Don't Run Expectancy tables say that a whiff (K) by a batter is no worse than any other form of out? If so, why does it favor the pitcher, in terms of preventing runs, by getting the whiff?
I know that K's are fielding independent, etc. And, that's why, etc.
So, then why does every sabermetric expert say that K's by a batter are no big deal?
This has always bothered me. Anything you can share would be appreciated. Thanks.
Posted by: Steve Lombardi at January 8, 2007 1:56 PM
I would also be interested to know what the variance looked like in each of these quadrants in ERA.
Was any group more clustered around a mean than another? Meaning, Quadrant A is high risk, high reward... with upside higher than Quadrant B, where everyone is clustered together and there is consistency/predictability in that type of pitcher?
Tim
Red Sox Times
Posted by: Tim at January 8, 2007 2:13 PM
Thanks, Tim.
Steve - The batted ball info encompasses hits and outs. Not all infield/outfield flies and groundballs turn into outs, whereas 99.9% (or thereabouts) of all strikeouts result in an out.
It should also be pointed out that some batted balls, even when they result in the batter being put out, advance runners on base. Therefore, these so-called "productive outs" are more valuable than striking out. Hope that helps.
Posted by: Rich Lederer at January 8, 2007 2:21 PM
Steve, the reason is that the Run Expectancy tables are measuring different things for batters versus pitchers.
For pitchers, they're measuring a virtually certain out (K), versus balls in play which *might* be turned into outs (and generally are, at a rate of ~70% or so overall). That still gives a strikeout a bonus in those outcomes, because it is converted to an out 99+% of the time, and the others aren't.
For batters, you're measuring what kind of out, not whether or not it was an out. The batter's first job is not to make an out; if he makes an out, it usually doesn't matter what kind. (The less-supported extension is that high strikeout totals aren't bad, they're good; that turns out to be true if the hitter also has high walk rates, high fly-ball percentages, and high fly-ball/HR ratios. In other words, it's true when the hitter is taking an approach that maximizes both not-making-outs and hitting-the-ball-hard. This is confusing cause and effect; not-making-outs and hitting-ball-hard are good, high strikeout totals can be an effect of that, but don't cause it.)
If you break the RunExpectancy tables down for kind of out, there is very little difference i.e. going from 1st-2nd-one-out to 1st-3rd-two-outs is the big difference, not 1st-2nd-one-out to 1st-3rd-two-outs-where-second-out-was-grounder.
Finally, if you break down batted-ball data for hitters, you will find the same sorts of correlations i.e. that hitters as a group do better when they put the ball in play or hit a HR than when they strike out, because a strikeout converts to an out effectively all the time, and the others don't.
Posted by: Subrata Sircar at January 8, 2007 2:37 PM
I took the forty players clustered around the median in both categories and marked them. I pulled the list of players that were marked in both categories and grouped them. This should be the twenty players closest to the center.
Name GB% K/BF ERA
Anibal Sanchez 44.61% 15.35% 2.83
Justin Verlander 41.72% 15.98% 3.63
Jason Jennings 43.77% 15.74% 3.78
Tom Glavine 44.28% 15.56% 3.82
Jose Contreras 44.62% 16.09% 4.27
Brad Penny 43.54% 18.20% 4.33
Vicente Padilla 44.10% 17.89% 4.50
Kevin Millwood 44.57% 17.31% 4.52
Chan Ho Park 43.60% 15.84% 4.81
Esteban Loaiza 42.10% 14.29% 4.89
Doug Davis 44.08% 17.59% 4.91
Jon Lieber 42.96% 14.01% 4.93
Elizardo Ramirez 44.00% 14.84% 5.37
Enrique Gonzalez 43.84% 14.29% 5.67
Ryan Madson 42.79% 15.97% 5.69
Victor Santos 43.64% 15.52% 5.70
Kyle Lohse 42.89% 17.11% 5.83
Taylor Buchholz 43.73% 16.08% 5.89
Rodrigo Lopez 42.61% 16.06% 5.90
Odalis Perez 43.72% 14.14% 6.20
This group's average ERA is 4.87. Certainly nothing to write home about. I wonder what that says about Sanchez and Verlander?
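For anyone who wants to repeat this, here is a minimal sketch in Python. It assumes that "clustered around the median" means the smallest absolute distance from the median, which may not be exactly how the list above was built. The function names and the toy pitchers A through E are hypothetical; only the twenty ERAs come from the table above.

```python
from statistics import mean, median


def nearest_to_median(values, n):
    """Return the names of the n entries whose value is closest to the median."""
    mid = median(values.values())
    ranked = sorted(values, key=lambda name: abs(values[name] - mid))
    return set(ranked[:n])


def central_group(gb_pct, k_per_bf, n):
    """Pitchers marked in both categories (the intersection of the two sets)."""
    return nearest_to_median(gb_pct, n) & nearest_to_median(k_per_bf, n)


# Toy input (hypothetical pitchers A-E), just to show the mechanics.
toy_gb = {"A": 40.0, "B": 44.0, "C": 50.0, "D": 43.0, "E": 60.0}
toy_k = {"A": 16.0, "B": 15.0, "C": 10.0, "D": 25.0, "E": 20.0}
print(central_group(toy_gb, toy_k, 2))  # {'B'}

# ERAs of the twenty pitchers in the table above, in table order.
eras = [
    2.83, 3.63, 3.78, 3.82, 4.27, 4.33, 4.50, 4.52, 4.81, 4.89,
    4.91, 4.93, 5.37, 5.67, 5.69, 5.70, 5.83, 5.89, 5.90, 6.20,
]
print(f"{mean(eras):.2f}")  # 4.87
```

The second print reproduces the simple (unweighted) average of 4.87.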
Posted by: Tim at January 8, 2007 3:11 PM
On the other hand, the "strikeout is worse" theory is somewhat mitigated by the fact that putting the ball into play (with a runner on base) puts you at risk of hitting into a double play, which is worse than a strikeout.
Posted by: Vishal at January 8, 2007 4:44 PM
This group's average ERA is 4.87. Certainly nothing to write home about. I wonder what that says about Sanchez and Verlander?
Sanchez had an unsustainably low BABIP of .244 last year. I would look for his ERA to jump a full run, if not more. Pitching half of his games in Florida may prevent his ERA from regressing all the way back to the league average for all qualifiers of 4.44 or the NW quadrant average of 4.53 or the 20 pitchers "closest to center" (as you called them) of 4.87. (BTW, my guess is that the weighted-average ERA of the latter group is better than 4.87 as the pitchers with the best ERA also were those with the most IP.)
Verlander, on the other hand, is perhaps the most difficult pitcher in the majors to steal bases against. He only allowed one SB vs. five CS. The rookie was also 21st in the bigs with 21 GIDP. I would hesitate to bet against Verlander given his raw talent but would be surprised if he lowered his ERA without a corresponding level of improvement in his component stats.
Posted by: Rich Lederer at January 8, 2007 5:16 PM
The one thing this study, and most others, don't have a way to evaluate is pitchers who only go for a strikeout when they need to. Verlander and Carpenter both seem to fit that mold. Verlander seems to have a lot of "stuff" in reserve for when he needs it. Carpenter likes to paint the corners with sinkers, trying to get an easy out unless there are RISP.
Could we add a 3rd dimension based on pitches per PA? The idea being that the ability to combine a high strikeout rate with a low number of pitches per PA is more valuable than the same K rate with a high pitch count.
Posted by: Jason at January 8, 2007 5:32 PM
Don't Run Expectancy tables say that a whiff (K) by a batter is no worse than any other form of out? If so, why does it favor the pitcher, in terms of preventing runs, by getting the whiff?
Steve -- no, the run expectancy tables I use say that a strikeout is worse than a generic out. The best primary source I can think of is Tom Ruane's excellent Retrosheet article at...
http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm
As you can see, a strikeout results in -.238 runs, while a batted ball out (other than double plays) results in -.200 runs.
Posted by: studes at January 8, 2007 6:02 PM
great point Rich...weighted average ERA for the group closest to the center is 4.77 (as long as my quick math is right).
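The gap between the simple and weighted figures comes from weighting each ERA by innings pitched. A minimal sketch of the difference, using two made-up pitchers (the ERA and innings values below are hypothetical, not 2006 data, and innings are assumed to be plain decimals rather than the .1/.2 box-score notation):

```python
def simple_average_era(pitchers):
    """Unweighted mean of ERA; every pitcher counts the same."""
    return sum(era for era, _ in pitchers) / len(pitchers)


def weighted_average_era(pitchers):
    """Mean of ERA weighted by innings pitched."""
    total_ip = sum(ip for _, ip in pitchers)
    return sum(era * ip for era, ip in pitchers) / total_ip


# Hypothetical (ERA, IP) pairs: the better pitcher also threw more innings.
sample = [(3.00, 200.0), (6.00, 100.0)]

print(simple_average_era(sample))    # 4.5
print(weighted_average_era(sample))  # 4.0
```

When the pitchers with the best ERAs also log the most innings, the weighted figure comes out lower than the simple one, which is the direction of the 4.87 versus 4.77 difference.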
Posted by: Tim at January 8, 2007 7:13 PM
Very cool, Rich. Thanks. A picture says a...well, you know.
Anywho. Here's an idea for even more work for someone other than me to do. Take last year's data. Overlay it on this year's data. Use those cool arrow things studes used in his recent "But I Regress" column to show the changes in a (select group of) pitcher's plots/tendencies. See if you can find the same data for 2004. Rinse, wash, repeat.
You know, that kind of thing.
Thanks again.
Yours in Cyle Hankerd,
Kent
Posted by: Kent Bonham at January 9, 2007 7:04 AM
I agree that RA, not ERA, should be used.
***
RE can be found here:
http://www.tangotiger.net/RE9902event.html
The run value of the K is around .01 runs worse than a regular out (see last line). This link also gives you the run value of the K and other outs by base/out state. K with man on 3b and less than 2 outs is great for a pitcher.
Posted by: tangotiger at January 9, 2007 7:25 AM
Thanks, Tango.
K with man on 3b and less than 2 outs is great for a pitcher.
Agree. That is one of the biggest differentiating characteristics of pitchers. Conversely, as it relates to hitters, putting the ball in play in such situations (as opposed to striking out) is important, too.
Posted by: Rich Lederer at January 9, 2007 7:51 AM
I was wondering if you have done something similar to this with batting. I did something kind of like this from the batting standpoint, and found that the two statistics with the closest relevance to runs were SLG and POS. Of course I only used the basic statistics, and nothing as in-depth. I used 7 years across baseball's history to get these results. It was actually interesting.
Posted by: Michael at January 9, 2007 4:45 PM
Wondering what people think of Chris Young. Can he remain a top starter for the Padres with a GB% around 25%? Crazy outliers like Wang and Young kinda freak me out.
Posted by: Jurgen at January 9, 2007 5:09 PM
You should think about normalizing the results based on the fielding percentage for the team they pitch. KC pitchers might not be as bad as their ERA shows.
Posted by: Unitsmoke at January 11, 2007 11:53 AM
The purpose of the study was to categorize pitchers by GB and K rates, not ERA. One of the beauties of looking at these two metrics only is that neither is dependent on team defense. Elarton and (Runelvys) Hernandez were just plain awful on their own.
Posted by: Rich Lederer at January 11, 2007 12:38 PM