Baseball Beat
January 08, 2007
Categorizing Pitchers by Batted Ball Types and Strikeout Rates
By Rich Lederer

We all know that strikeouts are the best outcome for a pitcher. When a batter fails to put the ball in play, there is little or no chance for him to reach base or to advance runners on base. Among batted ball types, infield flies are the least harmful, followed by groundballs, outfield flies, and line drives. Although groundballs result in a higher batting average than fly balls, their run impact is lower because the hits are usually limited to singles and an occasional double down the first or third base line, whereas balls in the air that turn into hits are almost always doubles, triples, or home runs.

According to Dave Studenmund's Batted Balls Redux article in The Hardball Times Baseball Annual 2007, strikeouts had a run impact of -0.113, infield flies -0.088, groundballs 0.045, outfield flies 0.192, and line drives 0.391 per incident last year.
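As a rough illustration of how these per-incident values add up, the sketch below sums them over a pitcher's outcomes. The run values are the ones quoted above; the two outcome mixes are hypothetical, chosen only to contrast a strikeout/groundball profile with a fly ball/contact profile, and do not describe any real pitcher.

```python
# Per-incident run impact quoted above (Studenmund, 2006 season).
RUN_VALUES = {
    "strikeout": -0.113,
    "infield_fly": -0.088,
    "groundball": 0.045,
    "outfield_fly": 0.192,
    "line_drive": 0.391,
}


def run_impact(counts):
    """Sum the run impact over a dict of outcome -> number of incidents."""
    return sum(RUN_VALUES[outcome] * n for outcome, n in counts.items())


# Hypothetical outcome mixes per 100 incidents (not real pitchers).
groundball_k_profile = {
    "strikeout": 20, "infield_fly": 5, "groundball": 45,
    "outfield_fly": 20, "line_drive": 10,
}
flyball_contact_profile = {
    "strikeout": 10, "infield_fly": 5, "groundball": 30,
    "outfield_fly": 40, "line_drive": 15,
}

for label, counts in [("GB/K profile", groundball_k_profile),
                      ("FB/contact profile", flyball_contact_profile)]:
    print(f"{label}: {run_impact(counts):+.2f} runs per 100 incidents")
```

Under these made-up mixes, the profile with more strikeouts and groundballs carries the lower total, which is the direction the per-incident values imply.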

Based on the above information, it follows that just as pitchers with high strikeout rates would generally fare better than those with low rates, pitchers with high groundball rates would normally fare better than those with low rates (all else being equal). Furthermore, it also suggests that pitchers who combine higher strikeout and groundball rates will outperform those with lower rates.

To provide a visual aid to categorize such pitchers, I created a graph (with the help of David Appelman of FanGraphs), plotting the strikeout and groundball rates for everyone in the major leagues who completed 100 or more innings and started in at least 33% of their appearances. The y-axis is groundball percentage (GB%) and the x-axis is strikeouts per batter faced (K/BF). The graph is divided into four quadrants with the mid-point equal to the average GB% of 43.80% and average K/BF of 15.88%.

The northeast quadrant is comprised of pitchers with above-average groundball and strikeout rates; the southeast quadrant encompasses pitchers with above-average strikeout and below-average groundball rates; the northwest quadrant is made up of pitchers with above-average groundball and below-average strikeout rates; and the southwest quadrant is the home for pitchers with below-average groundball and strikeout rates.
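The quadrant assignment can be sketched in a few lines. The midpoints are the averages quoted above, and the sample rates are copied from the quadrant tables in this article; treating a rate exactly equal to the average as "below average" is my assumption, since the article does not say how ties were handled.

```python
# League-average midpoints quoted in the article (in percent).
AVG_GB = 43.80
AVG_K = 15.88


def quadrant(gb_pct, k_pct):
    """Return 'NE', 'SE', 'SW', or 'NW' for a pitcher's GB% and K/BF.

    Assumption: a rate exactly equal to the average counts as below average.
    """
    north_south = "N" if gb_pct > AVG_GB else "S"
    east_west = "E" if k_pct > AVG_K else "W"
    return north_south + east_west


# (GB%, K/BF) pairs taken from the quadrant tables in this article.
pitchers = {
    "Francisco Liriano": (55.33, 30.44),
    "Ben Sheets": (40.41, 26.98),
    "Scott Elarton": (29.49, 9.78),
    "Derek Lowe": (67.04, 13.47),
}

for name, (gb, k) in pitchers.items():
    print(name, quadrant(gb, k))
```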

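[Figure: Starter-GBK.png. Scatter plot of starters' groundball percentage (GB%, y-axis) against strikeouts per batter faced (K/BF, x-axis), divided into four quadrants at the league averages.]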

The average ERA among all qualifiers was 4.44. The average ERA for starters with above-average K rates was 4.12. The average ERA for those with below-average K rates was 4.78. Similarly, the average ERA for starters with above-average GB rates was 4.24, while the average ERA for those with below-average GB rates was 4.62.

The average ERA by quadrants:

                     W (below-avg K)   E (above-avg K)
N (above-avg GB)     4.53              3.94
S (below-avg GB)     5.01              4.27

Not surprisingly, pitchers with the highest strikeout and groundball rates had the lowest average ERA (3.94), while those with the lowest K and GB rates had the highest average ERA (5.01). In the hybrid categories, pitchers with above-average strikeout and below-average groundball rates (4.27) beat those with below-average K and above-average GB rates (4.53).
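As a sanity check, the quadrant averages can be recombined into the split averages quoted earlier. This is only a sketch: it assumes each quadrant figure is a simple mean over the pitchers listed in that quadrant's table in this article (28 NE, 38 SE, 36 SW, 32 NW), which the article does not state, and it starts from averages already rounded to two decimals.

```python
# Quadrant average ERAs from the grid above, paired with the number of
# pitchers listed in each quadrant table of this article.
quadrants = {
    "NE": (3.94, 28),
    "SE": (4.27, 38),
    "SW": (5.01, 36),
    "NW": (4.53, 32),
}


def combined_era(names):
    """Pitcher-count-weighted mean of the listed quadrants' average ERAs."""
    total = sum(quadrants[q][0] * quadrants[q][1] for q in names)
    count = sum(quadrants[q][1] for q in names)
    return total / count


groups = {
    "above-average K (east)": ["NE", "SE"],
    "below-average K (west)": ["NW", "SW"],
    "above-average GB (north)": ["NE", "NW"],
    "below-average GB (south)": ["SE", "SW"],
    "all qualifiers": ["NE", "SE", "SW", "NW"],
}

for label, names in groups.items():
    print(f"{label}: {combined_era(names):.2f}")
```

The recombined values land within a few hundredths of the quoted 4.12, 4.78, 4.24, 4.62, and 4.44. The small gaps are consistent with rounding or with a slightly different averaging method.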

Looking at the outliers in the graph would help us reach the same conclusion. There isn't a one of us who wouldn't take Brandon Webb, Felix Hernandez, Chris Carpenter, or Francisco Liriano over Scott Elarton or Runelvys Hernandez. Liriano, in fact, arguably had the best combination of K and GB rates, while Elarton had the worst combo. Lo and behold, Liriano had the lowest ERA (2.16) among starters with at least 100 IP and Elarton had one of the worst (5.34).

Let's take a closer look at the results, starting with the northeast quadrant and going clockwise. The pitchers are sorted by K/BF rates in the first three tables and by GB% in the last table.

NORTHEAST QUADRANT (ABOVE-AVG GB AND K RATES)

Name                    GB%     K/BF
Francisco Liriano       55.33%  30.44%
Carlos Zambrano         46.88%  22.90%
Brett Myers             45.55%  22.69%
Roger Clemens           49.02%  22.62%
Jeremy Bonderman        48.17%  22.37%
John Smoltz             46.29%  21.98%
Scott Olsen             44.80%  21.81%
Felix Hernandez         57.72%  21.57%
C.C. Sabathia           45.05%  21.45%
Chris Carpenter         53.34%  20.54%
A.J. Burnett            50.49%  20.45%
Erik Bedard             48.81%  20.26%
Josh Johnson            45.77%  20.18%
Adam Loewen             48.48%  19.44%
Andy Pettitte           49.77%  19.16%
Dave Bush               46.65%  19.10%
Danny Haren             45.24%  18.92%
Brandon Webb            66.48%  18.74%
Kelvim Escobar          44.70%  18.63%
Roy Oswalt              48.80%  18.53%
Josh Beckett            45.10%  18.18%
Vicente Padilla         44.10%  17.89%
Doug Davis              44.08%  17.59%
Cory Lidle              49.72%  17.45%
Kevin Millwood          44.57%  17.31%
Dontrelle Willis        47.54%  16.41%
Jose Contreras          44.62%  16.09%
Wandy Rodriguez         44.95%  16.04%

That's an elite group of pitchers. The best of the bunch are those with strikeout rates above 20% and/or groundball rates over 50%. I had already singled out Hernandez, Carpenter, and Liriano, but check out A.J. Burnett. He is one of the premier pitchers in baseball when healthy.

While Jose Contreras and Wandy Rodriguez fall into the NE quadrant, both pitchers have K and GB rates that are close to the league average. As such, I would be reluctant to label either one as a special pitcher.

On the other hand, I would be shocked if Hernandez (4.52), Adam Loewen (5.37), and Josh Beckett (5.01) don't lower their ERA by at least 0.50 and perhaps by more than 1.00 in 2007. With Liriano injured, I would rank Felix directly behind Johan Santana as the pitcher most likely to lead the AL in ERA this year.

SOUTHEAST QUADRANT (BELOW-AVG GB AND ABOVE-AVG K RATES)

Name                    GB%     K/BF
Ben Sheets              40.41%  26.98%
Scott Kazmir            41.97%  26.72%
Johan Santana           40.59%  26.54%
Cole Hamels             38.86%  25.99%
Jake Peavy              38.00%  25.41%
Pedro Martinez          36.29%  24.91%
Daniel Cabrera          40.72%  23.72%
Orlando Hernandez       33.78%  23.46%
Chris Young             25.37%  22.31%
Curt Schilling          39.83%  21.94%
Matt Cain               35.98%  21.88%
Aaron Harang            38.69%  21.75%
Jered Weaver            30.03%  21.43%
Mike Mussina            42.37%  21.39%
Javier Vazquez          39.84%  21.10%
Ian Snell               42.78%  20.79%
John Lackey             42.99%  20.61%
Jason Schmidt           37.40%  20.13%
Ted Lilly               37.73%  20.08%
Boof Bonser             41.69%  20.05%
Randy Johnson           41.71%  20.00%
Oliver Perez            30.12%  19.28%
James Shields           42.75%  19.26%
Gil Meche               43.12%  19.24%
Byung-Hyun Kim          41.51%  18.72%
Chris Capuano           39.88%  18.59%
Bronson Arroyo          38.15%  18.55%
Brad Penny              43.54%  18.20%
Chuck James             27.67%  18.06%
Kyle Lohse              42.89%  17.11%
Ervin Santana           38.41%  16.67%
Claudio Vargas          39.86%  16.47%
Ricky Nolasco           38.84%  16.15%
Taylor Buchholz         43.73%  16.08%
Rodrigo Lopez           42.61%  16.06%
Justin Verlander        41.72%  15.98%
Barry Zito              38.20%  15.98%
Ryan Madson             42.79%  15.97%

There are a couple of dozen outstanding pitchers in this group, most notably those listed in the top half (or with K rates over 20%). Ben Sheets, Scott Kazmir, and Santana had almost identical K and GB rates. Cole Hamels and Jake Peavy rank just below this threesome with metrics not too dissimilar from one another or those immediately above them.

Oliver Perez (6.55) has the most room to shave a couple of runs off his ERA. Having gone to bat for Daniel Cabrera (4.74) last year, I hesitate to bring up his name again but he and Ian Snell (4.74) are good bets to show improvement in 2007.

SOUTHWEST QUADRANT (BELOW-AVG GB AND K RATES)

Name                    GB%     K/BF
Chan Ho Park            43.60%  15.84%
Jason Jennings          43.77%  15.74%
Victor Santos           43.64%  15.52%
Brett Tomko             37.47%  15.48%
Tim Wakefield           39.39%  14.75%
Freddy Garcia           41.19%  14.72%
Cliff Lee               32.70%  14.63%
Esteban Loaiza          42.10%  14.29%
Odalis Perez            43.72%  14.14%
Jon Lieber              42.96%  14.01%
Tony Armas Jr.          38.52%  14.00%
Jeff Weaver             38.99%  13.90%
Eric Milton             30.80%  13.60%
Jaret Wright            38.30%  13.44%
Livan Hernandez         36.58%  13.35%
Michael O'Connor        36.17%  12.97%
Jarrod Washburn         39.87%  12.73%
Joe Blanton             43.13%  12.50%
Jae Seo                 35.40%  12.45%
Jon Garland             42.11%  12.44%
Noah Lowry              36.38%  12.19%
Josh Fogg               42.55%  12.16%
Jamie Moyer             40.03%  12.08%
Seth McClung            37.25%  12.07%
Brad Radke              41.64%  12.05%
Shawn Chacon            32.70%  12.02%
Ramon Ortiz             40.81%  11.94%
Woody Williams          35.69%  11.54%
Kris Benson             41.32%  11.27%
Jason Marquis           42.88%  11.03%
John Koronka            42.30%  11.01%
Paul Byrd               38.52%  10.93%
Steve Trachsel          41.52%  10.73%
Runelvys Hernandez      38.58%   9.84%
Scott Elarton           29.49%   9.78%
Carlos Silva            43.62%   8.63%

This is the quadrant that you want to avoid. It is inhabited by some of the worst starters in the game. If you fail to miss bats and don't keep the ball on the ground when it is put into play, you are going to run into trouble. There is basically only one way to survive in this quadrant: throwing strikes and maintaining a low walk rate. Freddy Garcia, Jon Lieber, Jon Garland, Brad Radke, Paul Byrd, and Carlos Silva fit this description. But these types of pitchers live on the edge with very little margin for error.

Chan Ho Park, Jason Jennings, and Victor Santos were near league average in K and GB rates and should be classified more like Jose Contreras and Wandy Rodriguez (both of whom fell in the NE quadrant) than the rest of their SW brethren.

NORTHWEST QUADRANT (ABOVE-AVG GB AND BELOW-AVG K RATES)

Name                    GB%     K/BF
Derek Lowe              67.04%  13.47%
Chien-Ming Wang         62.80%   8.44%
Jake Westbrook          60.80%  12.06%
Jason Johnson           59.02%  10.25%
Jamey Wright            58.06%  11.69%
Aaron Cook              57.77%  10.05%
Tim Hudson              57.66%  14.70%
Roy Halladay            57.33%  15.07%
Kirk Saarloos           54.00%   9.49%
Clay Hensley            53.87%  15.50%
Paul Maholm             53.05%  14.85%
Miguel Batista          51.66%  12.09%
Zach Duke               51.12%  12.51%
Greg Maddux             50.80%  13.57%
Kenny Rogers            50.07%  11.66%
Luke Hudson             49.10%  14.55%
Mark Hendrickson        47.59%  13.77%
Joel Pineiro            47.46%  11.55%
Sean Marshall           46.81%  13.68%
Nate Robertson          46.75%  15.55%
Jeff Suppan             46.65%  12.43%
Casey Fossum            45.88%  14.81%
Aaron Sele              45.76%  12.64%
Matt Morris             45.66%  12.96%
Brian Moehler           45.25%  10.43%
Jeff Francis            44.67%  13.88%
Anibal Sanchez          44.61%  15.35%
Mark Redman             44.41%  10.27%
Mark Buehrle            44.35%  11.19%
Tom Glavine             44.28%  15.56%
Elizardo Ramirez        44.00%  14.84%
Enrique Gonzalez        43.84%  14.29%

This is an interesting group of pitchers. As a whole, they rank well behind those in the NE quadrant and well ahead of those in the SW quadrant. Although they are the opposite of the pitchers in the SE quadrant, their results (in terms of ERA) are the most similar. The two groups just get there in drastically different ways. The NW pitchers succeed by inducing grounders and keeping the ball in the park, whereas the SE hurlers thrive on strikeouts.

Derek Lowe, Chien-Ming Wang, and Jake Westbrook are the biggest outliers - three pitchers who turned more than 60% of batted balls into grounders. As such, Lowe (0.58), Wang (0.50), and Westbrook (0.64) had extraordinarily low HR/9 rates. They also have one other common thread: low walk rates. Lowe (2.27), Wang (2.15), and Westbrook (2.34) offset their high hit rates by limiting the number of bases on balls.

Roy Halladay, Clay Hensley, Nate Robertson, Anibal Sanchez, and Tom Glavine - all with K/BF rates exceeding 15% - are within a few whiskers of being in the NE group. However, Sanchez and Glavine are close to league average with K rates just to the west and GB% slightly to the north of the means.

I would rather know a pitcher's strikeout and groundball rates than his ERA. Throw in a third dimension - walk rates - and you have almost everything you need to know about a pitcher. Focusing on the components gives one a much more comprehensive understanding of a pitcher's upside and downside than looking at a single metric such as ERA.

(Notes: I could have chosen run average (RA) rather than earned run average (ERA), but the results would have been essentially the same in both direction and magnitude. As previously demonstrated, groundball pitchers generally give up a greater percentage of unearned runs because more errors (of the fielding and throwing type) are committed on grounders than balls hit in the air. Ballpark factors, team defense, and the level of competition may affect the components and/or ERA to varying degrees.)

Tomorrow: Categorizing Relievers by Batted Ball Types and Strikeout Rates.

Comments

It's really tricky, Rich. ERA measures some qualities of a pitcher not covered by the components, controlling the running game (which impacts DP rates) and ability to pitch from the stretch being the two major ones.

If one is looking only at one year's worth of statistics, I'd rather know the component data. Over the longer haul, I'd rather know the ERA but the component data is still very useful in understanding seasonal changes which may be indicative of longer term growth or decline.

I agree that "ERA measures some qualities of a pitcher not covered by the components" but it is also affected by areas outside the control of the pitcher (such as team defense and bullpen support).

It basically comes down to one's preference for FIP or DIPS vs. ERA. However, rather than using HR rate as in FIP/DIPS, I like GB% because it has a strong correlation with HR rate and GIDP rate.

ERA may do a better job explaining a pitcher's past performance, but I believe the components do a better job at predicting future performance.

I also like categorizing pitchers in this manner because you can get a better feel for the "how" and "why" they differ from others. In addition, the quadrants are visually useful to me.

I love looking at graphs....great way to really look at data and see information.

I wonder if you could take the 10-15 pitchers that fell most closely around the center of the quadrants and average their ERAs, what that would get you and who they would be.

If I have a second, I'll give it a try. Looking forward to seeing the relievers version of this.

Tim
Red Sox Times

~~~According to Dave Studenmund's Batted Balls Redux article in The Hardball Times Baseball Annual 2007, strikeouts had a run impact of -0.113, infield flies -0.088, groundballs 0.045, outfield flies 0.192, and line drives 0.391 per incident last year.~~~

Rich, Dave, or anyone -

Don't Run Expectancy tables say that a whiff (K) by a batter is no worse than any other form of out? If so, why does it favor the pitcher, in terms of preventing runs, by getting the whiff?

I know that K's are fielding independent, etc. And, that's why, etc.

So, then why does every sabermetric expert say that K's by a batter are no big deal?

This has always bothered me. Anything you can share would be appreciated. Thanks.

I would also be interested to know what the variance looked like in each of these quadrants in ERA.

Was any group more clustered around a mean than another? Meaning, Quadrant A is high risk, high reward, with upside higher than Quadrant B, where everyone is clustered together and there is consistency/predictability in that type of pitcher?

Tim
Red Sox Times

Thanks, Tim.

Steve - The batted ball info encompasses hits and outs. Not all infield/outfield flies and groundballs turn into outs, whereas 99.9% (or thereabouts) of all strikeouts result in an out.

It should also be pointed out that some batted balls, even when they result in the batter being put out, advance runners on base. Therefore, these so-called "productive outs" are more valuable than striking out. Hope that helps.

Steve, the reason is that the Run Expectancy tables are measuring different things for batters versus pitchers.

For pitchers, they're measuring a virtually certain out (K), versus balls in play which *might* be turned into outs (and generally are, at a rate of ~70% or so overall). That still gives a strikeout a bonus in those outcomes, because it is converted to an out 99+% of the time, and the others aren't.

For batters, you're measuring what kind of out, not whether or not it was an out. The batter's first job is not to make an out; if he makes an out, it usually doesn't matter what kind. (The less-supported extension is that high strikeout totals aren't bad, they're good; that turns out to be true if the hitter also has high walk rates, high fly-ball percentages, and high fly-ball/HR ratios. In other words, it's true when the hitter is taking an approach that maximizes both not-making-outs and hitting-the-ball-hard. This is confusing cause and effect; not-making-outs and hitting-ball-hard are good, high strikeout totals can be an effect of that, but don't cause it.)

If you break the RunExpectancy tables down for kind of out, there is very little difference i.e. going from 1st-2nd-one-out to 1st-3rd-two-outs is the big difference, not 1st-2nd-one-out to 1st-3rd-two-outs-where-second-out-was-grounder.

Finally, if you break down batted-ball data for hitters, you will find the same sorts of correlations i.e. that hitters as a group do better when they put the ball in play or hit a HR than when they strike out, because a strikeout converts to an out effectively all the time, and the others don't.

I took the forty players clustered around the median in each category and marked them. I then pulled the players that were marked in both categories and grouped them. These should be the twenty players closest to the center:

Name                    GB%     K/BF    ERA
Anibal Sanchez          44.61%  15.35%  2.83
Justin Verlander        41.72%  15.98%  3.63
Jason Jennings          43.77%  15.74%  3.78
Tom Glavine             44.28%  15.56%  3.82
Jose Contreras          44.62%  16.09%  4.27
Brad Penny              43.54%  18.20%  4.33
Vicente Padilla         44.10%  17.89%  4.50
Kevin Millwood          44.57%  17.31%  4.52
Chan Ho Park            43.60%  15.84%  4.81
Esteban Loaiza          42.10%  14.29%  4.89
Doug Davis              44.08%  17.59%  4.91
Jon Lieber              42.96%  14.01%  4.93
Elizardo Ramirez        44.00%  14.84%  5.37
Enrique Gonzalez        43.84%  14.29%  5.67
Ryan Madson             42.79%  15.97%  5.69
Victor Santos           43.64%  15.52%  5.70
Kyle Lohse              42.89%  17.11%  5.83
Taylor Buchholz         43.73%  16.08%  5.89
Rodrigo Lopez           42.61%  16.06%  5.90
Odalis Perez            43.72%  14.14%  6.20

This group's average ERA is 4.87. Certainly nothing to write home about. I wonder what that says about Sanchez and Verlander?
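The 4.87 figure is the simple (unweighted) mean of the 20 ERAs listed above. Here is a quick check with the values copied from that list; the spread calculation is an extra I added to give a feel for how widely the group is scattered, not something from the original tally.

```python
from statistics import mean, pstdev

# ERAs of the 20 "closest to center" pitchers, in the order listed above.
eras = [
    2.83, 3.63, 3.78, 3.82, 4.27, 4.33, 4.50, 4.52, 4.81, 4.89,
    4.91, 4.93, 5.37, 5.67, 5.69, 5.70, 5.83, 5.89, 5.90, 6.20,
]

group_mean = mean(eras)      # unweighted: every pitcher counts equally
group_spread = pstdev(eras)  # population standard deviation of the group

print(f"pitchers: {len(eras)}")
print(f"mean ERA: {group_mean:.2f}")
print(f"std dev:  {group_spread:.2f}")
```

An innings-weighted average would need each pitcher's IP, which this list does not include.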

On the other hand, the "strikeout is worse" theory is somewhat mitigated by the fact that putting the ball into play (with a runner on base) puts you at risk of hitting into a double play, which is worse than a strikeout.

This group's average ERA is 4.87. Certainly nothing to write home about. I wonder what that says about Sanchez and Verlander?

Sanchez had an unsustainably low BABIP of .244 last year. I would look for his ERA to jump a full run, if not more. Pitching half of his games in Florida may prevent his ERA from regressing all the way back to the league average for all qualifiers of 4.44 or the NW quadrant average of 4.53 or the 20 pitchers "closest to center" (as you called them) of 4.87. (BTW, my guess is that the weighted-average ERA of the latter group is better than 4.87 as the pitchers with the best ERA also were those with the most IP.)

Verlander, on the other hand, is perhaps the most difficult pitcher in the majors to steal bases against. He only allowed one SB vs. five CS. The rookie was also 21st in the bigs with 21 GIDP. I would hesitate to bet against Verlander given his raw talent but would be surprised if he lowered his ERA without a corresponding level of improvement in his component stats.

The one thing this study, and most others, don't have a way to evaluate is pitchers who only go for a strikeout when they need to. Verlander and Carpenter both seem to fit that mold. Verlander seems to have a lot of "stuff" in reserve for when he needs it. Carpenter likes to paint the corners with sinkers, trying to get an easy out unless there are RISP.

Could we add a 3rd dimension based on pitches per PA? The idea being that the ability to combine a high strikeout rate with a low number of pitches per PA is more valuable than the same K rate with a high pitch count.

Don't Run Expectancy tables say that a whiff (K) by a batter is no worse than any other form of out? If so, why does it favor the pitcher, in terms of preventing runs, by getting the whiff?

Steve -- no, the run expectancy tables I use say that a strikeout is worse than a generic out. The best primary source I can think of is Tom Ruane's excellent Retrosheet article at...

http://www.retrosheet.org/Research/RuaneT/valueadd_art.htm

As you can see, a strikeout results in -.238 runs, while a batted ball out (other than double plays) results in -.200 runs.
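To put that gap in perspective, here is a small sketch using the two values just quoted. The 150-strikeout season is a hypothetical round number, not any particular hitter's total.

```python
# Run values per out quoted above from Tom Ruane's Retrosheet article.
STRIKEOUT = -0.238
BATTED_BALL_OUT = -0.200  # excludes double plays, per the comment above

# Extra run cost of a strikeout relative to a batted-ball out.
extra_cost_per_k = STRIKEOUT - BATTED_BALL_OUT


def extra_runs_lost(strikeouts):
    """Runs lost versus making the same number of batted-ball outs."""
    return strikeouts * extra_cost_per_k


# Hypothetical hitter with 150 strikeouts in a season.
print(f"per strikeout: {extra_cost_per_k:+.3f} runs")
print(f"over 150 strikeouts: {extra_runs_lost(150):+.1f} runs")
```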

Great point, Rich. The weighted-average ERA for the group closest to the center is 4.77 (as long as my quick math is right).

Very cool, Rich. Thanks. A picture says a...well, you know.

Anywho. Here's an idea for even more work for someone other than me to do. Take last year's data. Overlay it on this year's data. Use those cool arrow things studes used in his recent "But I Regress" column to show the changes in a (select group of) pitcher's plots/tendencies. See if you can find the same data for 2004. Rinse, wash, repeat.

You know, that kind of thing.

Thanks again.

Yours in Cyle Hankerd,
Kent

I agree that RA, not ERA, should be used.

***

RE can be found here:
http://www.tangotiger.net/RE9902event.html

The run value of the K is around .01 runs worse than a regular out (see last line). This link also gives you the run value of the K and other outs by base/out state. K with man on 3b and less than 2 outs is great for a pitcher.

Thanks, Tango.

K with man on 3b and less than 2 outs is great for a pitcher.

Agree. That is one of the biggest differentiating characteristics of pitchers. Conversely, as it relates to hitters, putting the ball in play in such situations (as opposed to striking out) is important, too.

I was wondering if you did something similar to this with batting. I did something kind of like this from the batting standpoint and found that the two statistics most closely related to runs were SLG and POS. Of course, I only used the basic statistics, nothing as in-depth. I used 7 years across baseball's history to get these results. It was actually interesting.

Wondering what people think of Chris Young. Can he remain a top starter for the Padres with a GB% around 25%? Crazy outliers like Wang and Young kinda freak me out.

You should think about normalizing the results based on the fielding percentage for the team they pitch. KC pitchers might not be as bad as their ERA shows.

The purpose of the study was to categorize pitchers by GB and K rates, not ERA. One of the beauties of looking at these two metrics only is that neither is dependent on team defense. Elarton and (Runelvys) Hernandez were just plain awful on their own.