Behind the Scoreboard
February 16, 2010
There Are Two Types of Pitchers....
By Sky Andrecheck

Two weeks ago, I used a principal component analysis to try to separate hitters into two distinct groups. The hitters broke down between "three-true-outcome" players like Adam Dunn (lots of homers, walks and strikeouts) and small-ball type players like Ichiro Suzuki (contact hitters with a lot of singles, but not many walks or homers). This week I'll attempt to do the same for pitchers. As I mentioned last week, the principal component analysis basically attempts to create a "component", a weighted combination of the input variables, that maximizes the variance between players. The created component will be the one metric that best differentiates between the players.

A principal component analysis depends greatly on the variables fed into it. For hitters, I used the singles, doubles, triples, homers, walks, and strikeouts per plate appearance as the input variables. While I could do that here, I thought I would use variables over which the pitcher had more direct control. Using Fangraphs pitch data, I used the following: % of Fastballs Thrown (including cutters), % of Sliders, % of Changeups, Velocity of Fastball, Ground Ball%, Walks per PA, and Strikeouts per PA. I thought about using Hits per PA and HR per PA, but since those are largely a function of luck and I didn't want to measure that, I decided to leave them out. As before, each variable was normalized before being put into the model.
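The normalization and first-component steps described above can be sketched roughly as follows. This is only an illustration, not the code used for this article: the data are random placeholders rather than Fangraphs numbers, the column names are shorthand invented here for the seven inputs, and it assumes "normalized" means converting each variable to a z-score.

```python
import numpy as np

# Hypothetical shorthand for the seven input variables named in the text.
COLUMNS = ["fb_pct", "sl_pct", "ch_pct", "fb_velo", "gb_pct", "bb_per_pa", "k_per_pa"]

def standardize(X):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_component(Z):
    """Return (loadings, scores) for the first principal component of Z."""
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]
    scores = Z @ loadings  # one score per pitcher (row)
    return loadings, scores

# Placeholder data only: 200 made-up "pitchers", not real statistics.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(COLUMNS)))
Z = standardize(X)
loadings, scores = first_component(Z)
for name, weight in zip(COLUMNS, loadings):
    print(f"{name:>10}: {weight:+.3f}")
```

Two caveats apply. The sign of a principal component is arbitrary, so the whole set of weights may need to be flipped for higher scores to mean what you want them to mean. Some statistical packages also report loadings scaled by the square root of the component's variance, so weights from different tools may not match digit for digit.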

For hitters I was uncertain of what to expect, but for pitchers I had a fairly good idea. I expected the two groupings to be power pitchers and control pitchers. However, I wasn't exactly sure how it would break down. Running the analysis, the factor loadings for the first principal component were as follows:


As it turns out, my intuition was correct - it does indeed separate pitchers into power pitchers and control guys. Higher scores indicate power pitchers. A pitcher's strikeout rate is the biggest determinant of his power score, followed by his velocity and how often he throws his slider. Another indicator of being a "power pitcher" is walking more hitters. Predictably, pitchers who threw a lot of changeups had a lower power pitcher score. Meanwhile, somewhat surprising (to me, at least) was that whether the pitcher was a flyball or groundball pitcher didn't really make much of a difference one way or the other. I suppose I had expected power pitchers to throw high fastballs and hence give up more flyballs. With a coefficient of -.111, the ground ball loading was in that direction, but was not very strong. Also surprising was that the percentage of fastballs thrown was not a major factor.

So who were the top and bottom pitchers in terms of "power" score? Like last week, the scores were standardized to have an average of 100 and a standard deviation of 15. The top 10 power pitchers were all relievers, many of them very good. This is perhaps to be expected. After all, relievers have the luxury of being one-pitch or two-pitch pitchers, and hence they can throw harder and likely don't rely on the change-up. The number one power pitcher is Cubs reliever Carlos Marmol, whom Richard Lederer has profiled recently. Marmol relies heavily on his slider, throws hard, and gives up a ton of walks while also getting his fair share of strikeouts. At #2 is the Dodgers' Jonathan Broxton, who throws a flaming fastball and strikes out a ton of hitters as well.
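The rescaling to an average of 100 and a standard deviation of 15 is a simple linear transformation of the raw component scores. A minimal sketch, using made-up raw scores and assuming the population standard deviation (the sample version is an equally plausible choice):

```python
from statistics import mean, pstdev

def rescale(scores, target_mean=100.0, target_sd=15.0):
    """Linearly rescale scores to the given mean and standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)  # population SD; an assumption, see the note above
    return [target_mean + target_sd * (s - mu) / sd for s in scores]

raw = [-2.1, -0.4, 0.0, 0.7, 1.8]  # made-up component scores, not real pitchers
power_scores = rescale(raw)
print([round(p, 1) for p in power_scores])
```

Because the transformation is linear, the ranking of pitchers is unchanged; only the units become easier to read.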


How about the "craftiest" pitchers? The leaderboard is below:


As you might expect, Tim Wakefield is the craftiest. Throwing no sliders, and only 10% fastballs at an average speed of just 72 mph, he's the direct opposite of Jonathan Broxton or Carlos Marmol. Jamie Moyer is also the quintessential "crafty left-hander". Righties can be crafty as well, with the Cardinals' Brad Thompson listed as the fourth craftiest pitcher, throwing very few sliders and neither giving up many walks nor dishing out many strikeouts.

An interesting case is #7, Trevor Hoffman. Most closers are power pitchers, with closers comprising about half of the top 10 most powerful pitchers. Hoffman used to be that guy, but he now has below-average velocity and relies heavily on the change-up (he does still get his fair share of K's, however, which is why he isn't listed higher).

With the top 10 power pitchers all relievers, you might wonder who the most powerful starting pitchers were. The list of leaders is below:


As you can see, it's a pretty exclusive group. While some of the power pitching relievers aren't necessarily all that effective, the top 10 power starters are all pretty much All-Star caliber. Apparently, if you're a starting pitcher who has the ability to pitch like a reliever for an entire game, you're going to be really effective. Sitting at #1 is the 21-year-old phenom Clayton Kershaw. The biggest reason he's on the list is that he both strikes out a ton of batters and walks a lot as well. Couple that with a huge fastball, and you've got a true power pitcher. The rest of the list is a who's who of young, outstanding flamethrowers. The only exception is Randy Johnson, who can miraculously still pitch like a power pitcher well into his 40s.

Unlike the hitting breakdown, where three-true-outcome hitters were about as good as small-ball hitters, the two groups of pitchers are not equally good. Power pitchers are clearly more effective, in general, than "crafty" pitchers. Not that there aren't effective crafty pitchers such as Mark Buehrle or Trevor Hoffman, but as a rule power pitchers are better. There's a reason that teams love guys who can throw hard. The results of the analysis weren't too surprising, but it was interesting to see how the principal component analysis divided the pitchers into two groups. In theory, we could look to find other orthogonal traits by looking at the second and third principal components. However, as with the hitting data, I wasn't able to make much substantive sense out of the other components.
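One way to judge whether the second and third components deserve a closer look is to check how much of the total variance each one captures. The sketch below shows that calculation on random placeholder data, so the printed numbers say nothing about real pitchers; the function name is made up for illustration.

```python
import numpy as np

def explained_variance_ratio(Z):
    """Share of total variance captured by each principal component.

    Z must already be centered (here it is fully standardized).
    """
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, largest first
    var = s ** 2
    return var / var.sum()

# Placeholder data only: 200 made-up rows with 7 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 7))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
ratios = explained_variance_ratio(Z)
print(np.round(ratios, 3))
```

With seven standardized inputs, a component that captures well above one-seventh of the variance is doing more than any single variable could on its own.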

You can check out the full list of pitchers (with 50 or more IP) at the link below:

View image


how exactly did you come up with the weights? They seem semi-arbitrary.
Anyway, very cool stuff. I admire this kind of stuff. Well done.


Cool stuff. Were there other components that explained a lot of variance, or was it pretty much all wrapped up in the first component? If there are additional variables, do they show any other major separations between groups of pitchers?

Harris, Yeah, the process basically does its best to separate the data, and those are the weights it gives. It ends up looking arbitrary, but the computer is basically spitting out the best numbers possible.

JinAZ, The other components did explain some variance as well, but I couldn't really make heads or tails of what underlying factor they were defining. Obviously the first component was pretty clear.

Did you try a varimax rotation? Sometimes they can result in much easier to interpret components. Or not--sometimes they are just gobbledygook. :D

It's PRINCIPAL not principle