Designated Hitter September 21, 2009
Best Fastballs in Baseball

A few weeks back, Jeremy Greenhouse presented a new method for evaluating who throws the best pitches in baseball. Building on work by Dave Allen and John Walsh, the method evaluates pitches based on their outcomes. Jeremy's innovation was to use regression to predict the likelihood of each outcome, given the velocity and movement of each pitch. Previous methods (such as those at FanGraphs) have the problem of giving too much credit to lucky pitchers. If two pitchers throw exactly the same pitch, Bronson Arroyo may get an out while Chris Carpenter gives up a hit. A method built on predicted outcomes would give both pitchers exactly the same credit.
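To see why crediting predicted outcomes removes luck, here is a minimal Python sketch. It is not Jeremy's model: the multinomial-logit coefficients in `outcome_probs` and the run values in `RUN_VALUES` are hypothetical placeholders chosen only for illustration. Each pitch is credited with the probability-weighted run value of its possible outcomes rather than with what actually happened.

```python
import math

# Hypothetical run values per outcome, from the batter's side
# (negative = good for the pitcher). Placeholders, not fitted values.
RUN_VALUES = {"ball": 0.03, "strike": -0.04, "out": -0.25, "hit": 0.45}


def outcome_probs(velocity_mph, vert_break_in):
    """Hypothetical multinomial-logit model mapping pitch traits to outcome
    probabilities. Coefficients are invented for illustration."""
    dv = velocity_mph - 90.0
    scores = {
        "ball": 0.0,  # baseline category
        "strike": 0.05 * dv + 0.03 * vert_break_in,
        "out": 0.02 * dv,
        "hit": -0.04 * dv - 0.02 * vert_break_in,
    }
    total = sum(math.exp(s) for s in scores.values())
    return {k: math.exp(s) / total for k, s in scores.items()}


def expected_value(velocity_mph, vert_break_in):
    """Credit a pitch with the probability-weighted run value of its outcomes."""
    probs = outcome_probs(velocity_mph, vert_break_in)
    return sum(probs[k] * RUN_VALUES[k] for k in RUN_VALUES)


def actual_value(outcome):
    """Credit a pitch with whatever actually happened, luck included."""
    return RUN_VALUES[outcome]


# The same pitch (94 mph, 10 inches of vertical movement) with different luck.
pitch = (94.0, 10.0)
arroyo_actual = actual_value("out")
carpenter_actual = actual_value("hit")
arroyo_expected = expected_value(*pitch)
carpenter_expected = expected_value(*pitch)
```

Because `expected_value` depends only on the pitch itself, the Arroyo and Carpenter versions of the same pitch receive identical credit, while `actual_value` charges Carpenter 0.70 more runs.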

While Jeremy was working on his analysis, I was working in parallel on a similar method. I've used a kernel density estimator and an expectation-maximization algorithm to classify each of the 480,000 pitches thrown by right-handed pitchers to right-handed batters between 2007 and 2009, and then to estimate the likelihood of the relevant outcomes. There are a few differences. Instead of using only movement and velocity, this analysis includes five parameters: horizontal location, vertical location, velocity, vertical movement, and horizontal movement. Further, we can look at each pitch along each dimension in isolation to get a rough estimate of the importance of each dimension.
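For readers unfamiliar with kernel methods, here is a minimal sketch of one way a kernel density approach can turn similar historical pitches into an outcome probability: a Gaussian-kernel (Nadaraya-Watson) estimate over the five dimensions, which is the ratio of a kernel density estimate for pitches with a given outcome to one for all pitches. This is an illustrative stand-in, not the exact estimator used in this analysis; the bandwidths in `BANDWIDTHS`, the outcome labels, and the toy pitches are all hypothetical, and the EM classification step is omitted.

```python
import math

# Pitches are 5-tuples in this (assumed) order:
# (horizontal location, vertical location, velocity, vertical movement, horizontal movement)
# Hypothetical bandwidths: feet, feet, mph, inches, inches.
BANDWIDTHS = (0.3, 0.3, 1.5, 1.5, 1.5)


def gaussian_kernel(a, b, bandwidths=BANDWIDTHS):
    """Unnormalized product Gaussian kernel between two 5-D pitches."""
    total = 0.0
    for ai, bi, h in zip(a, b, bandwidths):
        total += ((ai - bi) / h) ** 2
    return math.exp(-0.5 * total)


def outcome_probability(pitch, history, outcome):
    """Kernel-weighted share of similar historical pitches that ended in
    `outcome`: a Nadaraya-Watson estimate of P(outcome | pitch)."""
    num = den = 0.0
    for past_pitch, past_outcome in history:
        w = gaussian_kernel(pitch, past_pitch)
        den += w
        if past_outcome == outcome:
            num += w
    return num / den if den > 0 else float("nan")


# Toy history of made-up pitches and outcomes.
history = [
    ((0.5, 3.0, 96.0, 10.0, -6.0), "swinging_strike"),
    ((0.4, 3.1, 95.5, 10.5, -5.5), "swinging_strike"),
    ((0.0, 2.2, 89.0, 6.0, -8.0), "hit"),
    ((-0.1, 2.0, 88.5, 5.5, -8.5), "hit"),
]
new_pitch = (0.45, 3.05, 95.8, 10.2, -5.8)
p_whiff = outcome_probability(new_pitch, history, "swinging_strike")
```

A new pitch that sits close to the two hard, high pitches in all five dimensions inherits nearly all of its weight from them, so its estimated whiff probability is close to 1. Smaller bandwidths make the estimate more local but noisier.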

Note that although this method is not biased in favor of lucky pitchers, it may be biased against pitchers with "intangibles." We can build any physically measurable factor into our model, but that won't help us quantify the value of "deception." I fully believe that some pitchers have strange deliveries that throw off a batter's timing, and that some are better at sequencing their pitches. This method will undervalue them, because it essentially evaluates each pitch in isolation: it cannot account for pitch selection, sequencing, or any other contextual variable. Having a variety of pitches allows a pitcher to set up better sequences, which will make the same fastball more successful, and that benefit is invisible here.

Relative Importance of Components
Once each pitch was evaluated along each of the five dimensions, we could see how well these values correlated with the overall value of the pitch. This is sort of daft: we have a high-powered mathematical algorithm that takes high-order statistical dependencies into account, and then we use a linear regression to evaluate the components. By using a regression for this step, we lose the ability to look at nonlinearities and interactions, but it's a first step. Depending on which pitches we look at (just 4-seamers, or all fastballs), this linear model explains 50 to 90% of the variance.

Regardless of which pitches we include, the most valuable component is velocity (with a beta of .592), followed by vertical location and vertical movement (.494 and .338, respectively). Horizontal location limps in next at .163, and horizontal movement might as well have stayed home, at .070. These numbers change slightly depending on the parameters of the model, the filters, and so on, but the general picture remains the same.
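The component regression described above can be sketched in plain Python: z-score each component and the total value, solve the least-squares normal equations, and report the resulting betas and R². The data below are synthetic, with arbitrarily chosen weights, purely to show the mechanics; the component names are hypothetical labels for the five dimensions.

```python
import math
import random


def standardize(values):
    """Return z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def standardized_betas(components, total):
    """Regress z-scored total value on z-scored components (no intercept
    is needed because every variable is centered). Returns (betas, R^2)."""
    names = list(components)
    cols = [standardize(components[k]) for k in names]
    y = standardize(total)
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    betas = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(betas, cols)) for i in range(len(y))]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    r2 = 1.0 - ss_res / sum(yi ** 2 for yi in y)
    return dict(zip(names, betas)), r2


# Synthetic demo: five independent components plus noise, arbitrary weights.
random.seed(0)
n = 500
names = ["velocity", "vert_loc", "vert_mov", "horiz_loc", "horiz_mov"]
components = {k: [random.gauss(0, 1) for _ in range(n)] for k in names}
weights = {"velocity": 3.0, "vert_loc": 2.0, "vert_mov": 1.0, "horiz_loc": 0.5, "horiz_mov": 0.1}
total = [sum(weights[k] * components[k][i] for k in names) + random.gauss(0, 2) for i in range(n)]
betas, r2 = standardized_betas(components, total)
```

Standardizing first puts every beta on the same scale, which is what makes it meaningful to rank the components against each other; the trade-off, as noted above, is that a linear fit ignores nonlinearities and interactions.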

Top 20 Fastballs

Here is a list of the top 20 fastballs thrown between 2007 and August 2009, inclusive. Pitches are averaged by pitch type (4-seam fastball, FB; 2-seam fastball, FT; cut fastball, FC) for each pitcher, and then ranked by average value; more negative values are better for the pitcher. The marginal value of each pitch dimension is summarized in the Control, Velocity, and Movement columns, calculated as how much the pitch's value would drop if that dimension were removed. These values are expressed as weighted Z scores.
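One plausible reading of "how much value would drop by removing these dimensions" is sketched below, under two stated assumptions: "removing" a dimension means replacing it with a league-average value, and the Z scores are weighted by pitch count. `toy_value` and the pitcher groups are hypothetical stand-ins for the real pitch-value model and data.

```python
import math


def marginal_contribution(pitches, value_fn, dim, league_mean):
    """Average change in value when one dimension is replaced by the league
    average (an assumed stand-in for 'removing' it). Values are runs from
    the batter's side, so a positive result means the dimension helped."""
    loss = 0.0
    for p in pitches:
        neutral = dict(p)
        neutral[dim] = league_mean
        loss += value_fn(neutral) - value_fn(p)
    return loss / len(pitches)


def weighted_z_scores(scores, weights):
    """Z-scores using a weighted mean and weighted standard deviation."""
    wsum = sum(weights)
    mean = sum(s * w for s, w in zip(scores, weights)) / wsum
    var = sum(w * (s - mean) ** 2 for s, w in zip(scores, weights)) / wsum
    return [(s - mean) / math.sqrt(var) for s in scores]


def toy_value(pitch):
    # Hypothetical model: each mph above 91 saves 0.002 runs per pitch.
    return -0.002 * (pitch["velocity"] - 91.0)


# Hypothetical pitcher/pitch-type groups; weights are pitch counts.
groups = {
    "pitcher_a_FB": [{"velocity": 97.0}, {"velocity": 96.0}],
    "pitcher_b_FB": [{"velocity": 91.0}] * 3,
    "pitcher_c_FT": [{"velocity": 88.0}] * 4,
}
keys = list(groups)
raw = [marginal_contribution(groups[k], toy_value, "velocity", 91.0) for k in keys]
z = weighted_z_scores(raw, [len(groups[k]) for k in keys])
```

This also shows why a great pitch tends to score high on every column: the better the pitch, the more value it has to lose when any one dimension is flattened to league average.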

Rank  Player            Value    Type  Control  Velocity  Movement
   1  Zack Greinke      -0.0313  FT     1.13     2.68      0.90
   2  Roy Halladay      -0.0304  FT     0.73     2.19      0.72
   3  Ronald Belisario  -0.0181  FB     0.38     2.05      0.33
   4  Ubaldo Jimenez    -0.0166  FB     0.33     2.14      0.25
   5  Jonathan Broxton  -0.0164  FB     0.18     1.87      0.09
   6  Felix Hernandez   -0.0155  FB     0.38     1.94      0.25
   7  Roy Halladay      -0.0150  FC     0.50     0.96      0.55
   8  Heath Bell        -0.0149  FB     0.41     1.39      0.28
   9  Mariano Rivera    -0.0130  FC     0.19     1.32      0.32
  10  Bobby Jenks       -0.0123  FB     0.31     1.15      0.07
  11  Daniel Bard       -0.0122  FB     0.13     1.42     -0.01
  12  Brandon Morrow    -0.0118  FB     0.11     1.20      0.21
  13  Joel Zumaya       -0.0112  FB    -0.08     1.95     -0.14
  14  Vin Mazzaro       -0.0106  FB     0.34     1.04      0.26
  15  Andrew Bailey     -0.0101  FC     0.38     0.51      0.37
  16  J.J. Putz         -0.0095  FB     0.20     0.96      0.14
  17  Joe Nathan        -0.0092  FB     0.19     0.63      0.26
  18  Freddy Dolsi      -0.0090  FB     0.12     1.47      0.12
  19  Chris Carpenter   -0.0090  FB     0.24     1.34      0.26
  20  Kevin Jepsen      -0.0090  FB     0.18     0.99      0.13

Pitcher Plots
Below, I've plotted the pitch values on a pitch-by-pitch basis for a few pitchers I selected arbitrarily. The first plot shows the movement and the velocity of each pitch, to give a sense of how successful the pitch classification system was. The second and third plots show the expected value of each pitch plotted against its X location and velocity.
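The movement/velocity plot is a check on the pitch classifier. As a rough illustration of how expectation-maximization can separate pitch types, here is a minimal one-dimensional Gaussian-mixture EM on velocity alone. The actual classifier works in more dimensions and uses kernel density estimates; the two-component setup, the starting values, and the synthetic fastball and change-up speeds below are assumptions for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_pitch_types(speeds, iters=200):
    """Fit a two-component 1-D Gaussian mixture to pitch speeds by EM.
    Returns (weights, means, sds, resp), where resp[i] is the posterior
    probability that pitch i belongs to the faster component."""
    n = len(speeds)
    overall_mean = sum(speeds) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in speeds) / n)
    # Start one component at each end of the observed range.
    w, mu, sd = [0.5, 0.5], [min(speeds), max(speeds)], [overall_sd, overall_sd]
    for _ in range(iters):
        # E-step: responsibility of the faster component for each pitch.
        resp = []
        for x in speeds:
            slow = w[0] * normal_pdf(x, mu[0], sd[0])
            fast = w[1] * normal_pdf(x, mu[1], sd[1])
            resp.append(fast / (slow + fast))
        # M-step: re-estimate mixture weights, means, and standard deviations.
        n_fast = sum(resp)
        n_slow = n - n_fast
        w = [n_slow / n, n_fast / n]
        mu = [sum((1 - r) * x for r, x in zip(resp, speeds)) / n_slow,
              sum(r * x for r, x in zip(resp, speeds)) / n_fast]
        sd = [math.sqrt(sum((1 - r) * (x - mu[0]) ** 2 for r, x in zip(resp, speeds)) / n_slow),
              math.sqrt(sum(r * (x - mu[1]) ** 2 for r, x in zip(resp, speeds)) / n_fast)]
    return w, mu, sd, resp


# Synthetic speeds: 200 fastballs near 94 mph, 100 change-ups near 84 mph.
random.seed(1)
speeds = [random.gauss(94, 1.0) for _ in range(200)] + [random.gauss(84, 1.2) for _ in range(100)]
w, mu, sd, resp = em_two_pitch_types(speeds)
labels = ["FB" if r > 0.5 else "CH" for r in resp]
```

When two pitch types overlap in velocity (a hard change-up and a sinker, say), one dimension is not enough, which is one reason a real classifier also uses movement; a misclassified change-up like the one suspected for Halladay is the typical failure mode.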

#1 Zack Greinke, 2-Seam Fastball

Greinke's two-seam fastball was given the highest rating in control, velocity, and movement. These values reflect how much the value of the pitch decreases when you remove that dimension from the equation, so they are a little misleading: a better pitch has more to lose if you remove an important dimension. Many pitchers throw harder than Greinke, but none would suffer more if they suddenly had league-average velocity.

#2 Roy Halladay, 2-Seam Fastball

My classification system says that Halladay has three pitches: the 2-seam fastball, the cutter, and the curveball. He probably also has a change-up that is being misclassified. But however you split them, his two-seamer and cutter are a very good pair of pitches. The value-by-location plot shows pretty good control; he hits the outside half of the plate frequently.

#3 Ronald Belisario, 4-Seam Fastball

If you don't know who Ronald Belisario is, you're not alone. His fastball averages 95 mph and crosses the plate in the strike zone 56% of the time. He has a 1.92 ERA in 65 innings, though with a somewhat low BABIP. We only have 319 pitches to analyze, so he's likely getting somewhat lucky.

#5 Jonathan Broxton, 4-Seam Fastball

Broxton has a crazy-good, totally boring fastball. It's all about velocity. He averages nearly 97 mph, and you can see from the value-by-velocity graph that he can touch 100, where his value spikes. His vertical movement is good, averaging 10 inches. He also has good command, hitting the strike zone 57% of the time. No bells or whistles here, just heat.

#9 Mariano Rivera, Cutter

If Rivera weren't included among the top fastballs, we'd know something was wrong. Want to see something really beautiful? Check out the histogram at the bottom of the value-by-location plot. That's control.

#11 Daniel Bard, 4-Seam Fastball

Our scouts tell us Bard relies on a 96 mph fastball that can reach 101 mph and an 82 mph slider with bite. He also supposedly has a high-80s cutter, a low-90s sinker, and a change-up. We don't have enough data on Bard to see his full range (he barely makes the 100-pitch minimum), but we can still get an initial look.

Pitch F/X agrees with the scouts: he has a very consistent 97 mph fastball with 11 inches of vertical movement. He relies heavily on the fastball and slider, but he has also thrown a handful of change-ups. He has not yet thrown a low-90s sinker or a high-80s cutter in the majors. The lateral location of his pitches looks bimodal, almost as if he's either trying to pitch inside or to hit the outside edge. Those inside pitches account for many of his worst pitches; his best pitches were high and outside.

#27 Jonathan Papelbon, 4-Seam Fastball

When he's not dancing, he throws this...

Other Rankings of Note
#29 Grant Balfour
#30 Josh Beckett
#32 Matt Lindstrom
#43 Frank Francisco
#45 Justin Verlander
#50 Zack Greinke's other fastball