When the PITCHf/x system debuted last year, the first thing I wanted to know (besides how hard Joel Zumaya actually threw) was exactly how different pitches moved. This was a basic question, and from watching baseball on television and playing it, I had a pretty good idea of how different pitches moved, but my knowledge lacked precision. I know a curveball from a left-handed pitcher breaks down-and-away from a left-handed hitter, but how much does it move? Where do you start measuring? Where do you finish? How do you separate the downward movement from the away movement? Should you? That curveball ends up low and away, but would you say it broke 5 inches down-and-away, or 3 inches down and 4 inches away? Which is "better"? Break is a tricky thing to define, let alone measure.
The first attempt to quantify break using PITCHf/x debuted during the 2006 playoffs and compared the actual pitch to a pitch thrown without spin. The system would capture the flight path of a pitch, then create a hypothetical pitch thrown with the same initial velocity and release point but with only gravity and drag acting on it. The difference between where this hypothetical pitch would have ended up and where the actual pitch ended up was reported as the "pfx" of the pitch. There are a couple of problems with this definition, the biggest being that nobody knows what a pitch without spin looks like. That isn't to say its path can't be calculated; rather, nobody has ever seen one, so people have no frame of reference for what the values mean. But it was a start. In the XML files, there were two pfx values, one for the x direction and one for the z direction. Graphing these values, either alone or against the speed of the pitch, remains an excellent way to identify different pitches. Even if it's unclear how a pitch that ends up 10 inches higher than a non-spinning pitch actually moves, other pitches of that type will also have pfx_z values around 10 inches.
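To make the no-spin comparison concrete, here is a minimal Python sketch of how a pfx-style number could be computed. It assumes a constant-acceleration fit to the pitch, PITCHf/x-style axes with +z up and the pitch traveling toward -y, and that drag acts exactly opposite a single, average velocity direction. The function names are hypothetical, and this is not the official PITCHf/x calculation. Because the real pitch and its spinless twin share the same release point and initial velocity, their positions after time t differ by 0.5 * Δa * t², where Δa is the acceleration left over after gravity and drag are removed.

```python
import math

GRAVITY_FT_S2 = (0.0, 0.0, -32.174)  # gravitational acceleration in ft/s^2, +z up


def spin_acceleration(accel, velocity):
    """Remove gravity and drag from a constant acceleration (ft/s^2).

    Assumption: drag acts exactly opposite the (average) velocity direction,
    so whatever acceleration remains perpendicular to the velocity is
    attributed to spin.
    """
    non_gravity = [a - g for a, g in zip(accel, GRAVITY_FT_S2)]
    speed = math.sqrt(sum(v * v for v in velocity))
    unit_v = [v / speed for v in velocity]
    along_v = sum(a * u for a, u in zip(non_gravity, unit_v))  # drag-like part
    return [a - along_v * u for a, u in zip(non_gravity, unit_v)]


def pfx_inches(accel, velocity, flight_time):
    """Gap (inches) between the real pitch and a spinless twin at the plate.

    Both pitches share release point and initial velocity, so with constant
    accelerations their positions differ by 0.5 * delta_a * t^2.
    """
    a_spin = spin_acceleration(accel, velocity)
    disp_ft = [0.5 * a * flight_time ** 2 for a in a_spin]
    return disp_ft[0] * 12.0, disp_ft[2] * 12.0  # (pfx_x, pfx_z)
```

With made-up inputs, `pfx_inches((-10.0, 25.0, -20.0), (0.0, -130.0, 0.0), 0.4)` comes out at roughly (-9.6, 11.7) inches.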
The next attempt at quantifying break arrived this season and is more in line with how people imagine break. This version defines break as the greatest distance between the path of the pitch and the straight-line path from the release point to home plate. A 12-to-6 curve will have a large value, while a regular fastball will have a small one. This definition is confusing to think about, so if you're having trouble with it, imagine holding a bow by one end with the other end held away from you (and slightly down). The end you're holding is the release point, the other end is where the ball crosses home plate, and the string is the straight-line path, while the ball travels along the bow itself. If you rotate the bow around the string to a given angle, you get the actual path of the pitch and its break as given by PITCHf/x. (Thanks to John Walsh for the bow analogy.)
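The bow picture translates directly into a calculation. The sketch below, again assuming a constant-acceleration trajectory and using hypothetical function names, samples the flight path and reports the largest perpendicular gap between the path (the bow) and the straight line from release to the plate (the string).

```python
import math


def position(p0, v0, a, t):
    """Constant-acceleration position at time t (feet, seconds)."""
    return [p + v * t + 0.5 * acc * t * t for p, v, acc in zip(p0, v0, a)]


def distance_to_line(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    chord = [e - s for s, e in zip(start, end)]
    rel = [p - s for s, p in zip(start, point)]
    chord_len2 = sum(c * c for c in chord)
    proj = sum(r * c for r, c in zip(rel, chord)) / chord_len2
    perp = [r - proj * c for r, c in zip(rel, chord)]
    return math.sqrt(sum(x * x for x in perp))


def break_length(p0, v0, a, flight_time, samples=1000):
    """Largest gap (inches) between the flight path and the release-to-plate chord."""
    end = position(p0, v0, a, flight_time)
    gaps = (
        distance_to_line(position(p0, v0, a, flight_time * i / samples), p0, end)
        for i in range(samples + 1)
    )
    return 12.0 * max(gaps)
```

Sampling keeps the idea visible, but for a constant-acceleration path the largest gap always occurs exactly halfway through the flight and equals |a⊥|T²/8, where a⊥ is the part of the acceleration perpendicular to the chord and T is the flight time, so the sampled answer can be checked against that formula.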
This break value becomes even more valuable (at least to me) when you split it into x and z components, and Dr. Alan Nathan's website has some (more) helpful equations for calculating break-z and break-x values. To visualize break-z, imagine keeping the endpoints fixed and rotating the bow around the string until the bow is above the string and perpendicular to the ground. Break-x is the same thing, but with the bow parallel to the ground (don't worry yet about whether the bow is to the left or right of the string). The break values are very similar to the pfx values, except they are measured relative to an imaginary straight line, something that is easy to visualize. If the break-z value for a Barry Zito curve is 17 inches, the pitch really breaks 17 inches from its "high point" to where it crosses home plate. If Mariano Rivera's cutter has a break-x value of -1.3 inches, it moves 1.3 inches in on a lefty between its point of maximum horizontal deviation and where it crosses the plate. This makes a ton of sense and is much closer to how people think about break.
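Splitting break into x and z pieces can be sketched the same way. This version leans on the midpoint result for constant acceleration (the largest gap sits at half the flight time, where the path is offset by -a·T²/8 from the chord) and projects that offset onto a horizontal direction and an upward direction, both perpendicular to the chord. It is not a transcription of Dr. Nathan's equations, and the sign choices (positive break-z for a pitch that bows above the chord, negative break-x when the spin pushes the ball toward +x) are assumptions meant to echo the convention described here.

```python
import math


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def break_components(v0, a, flight_time):
    """Split the maximum chord deviation into (break_x, break_z) in inches.

    Assumes constant acceleration a (ft/s^2), initial velocity v0 (ft/s),
    and +z up. Sign conventions here are assumptions, not PITCHf/x's own.
    """
    t = flight_time
    chord = [v * t + 0.5 * acc * t * t for v, acc in zip(v0, a)]
    c_hat = _unit(chord)
    # Horizontal unit vector perpendicular to the chord (z-hat x chord);
    # for a pitch moving toward -y it points toward +x.
    h_hat = _unit(_cross([0.0, 0.0, 1.0], c_hat))
    # c_hat x h_hat is the part of +z perpendicular to the chord, so it
    # already points upward and has unit length.
    u_hat = _cross(c_hat, h_hat)
    # Offset of the path from the chord at mid-flight. Its along-chord part
    # drops out of both dot products because h_hat and u_hat are perpendicular
    # to the chord.
    gap = [-acc * t * t / 8.0 for acc in a]
    return 12.0 * _dot(gap, h_hat), 12.0 * _dot(gap, u_hat)
```

A pitch with only gravity acting on it gets a break-x of zero and a positive break-z, and adding a +x acceleration pushes break-x negative.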
Once you understand and are comfortable with the break values, they behave pretty much the same as the pfx values, with the benefit of meaning something. Comparing the two Barry Zito graphs below shows some of the similarities. The new definition of break is graphed on the left, while the no-spin version is graphed on the right. Note that because of a convention change, positive break-x values (left-hand graph) correspond to negative pfx_x values (right-hand graph), but the basic pattern of pitches is the same in both cases.
Negative break-x values mean movement away from RHB, and you can see that Zito's pitches typically move away from RHB. This type of horizontal movement (toward the arm side) is what you would expect from any pitcher's fastball and change-up. Zito's curveball breaks slightly away from LHB, which is how curveballs from LHP are "supposed" to break, but its horizontal break is smaller than normal. The table below shows other similar curveballs from LHP, sorted by vertical break.
| Name | Pitches | Break-x (in.) | Break-z (in.) | MPH |
|---|---|---|---|---|
| Barry Zito | 142 | 0.15 | 17.18 | 70.1 |
| Doug Davis | 165 | 2.31 | 16.83 | 68.0 |
| Ted Lilly | 157 | 1.73 | 15.62 | 70.8 |
| Sean Marshall | 62 | 2.24 | 15.47 | 73.2 |
| Rich Hill | 202 | 3.10 | 14.93 | 73.2 |
| Lenny DiNardo | 95 | 0.78 | 14.68 | 69.9 |
Zito's curveball actually has the biggest vertical drop of any pitch thrown this year, and compared with the other pitches in the table, its horizontal break is much smaller. Zito has historically fared better against RHB than LHB (career OPS allowed of .669 vs. RHB and .730 vs. LHB), so maybe his unique curveball is the reason. It's reasonable to think that because the curveball doesn't move away from LHB as much as normal, they would have an easier time hitting it. The only pitcher with a similar curveball is DiNardo, and he too shows a reverse split (career OPS allowed of .792 vs. RHB and .814 vs. LHB). Joe Saunders' curve is the next most similar to DiNardo's, although it has less vertical break and an almost normal horizontal break, and Saunders doesn't have a reverse split. Once you get past Saunders, however, no other curveballs have a horizontal break close to Zito's or DiNardo's.
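One simple way to frame "similar" here is to rank the other curveballs by how close their horizontal break is to Zito's. The sketch below does exactly that with the numbers from the Zito table; it ignores vertical break and speed, and the article doesn't say what weighting (if any) was used to judge similarity, so treat it as an illustration rather than the method behind the comparison.

```python
# (name, break_x in inches, break_z in inches, mph), copied from the table above
LHP_CURVEBALLS = [
    ("Barry Zito", 0.15, 17.18, 70.1),
    ("Doug Davis", 2.31, 16.83, 68.0),
    ("Ted Lilly", 1.73, 15.62, 70.8),
    ("Sean Marshall", 2.24, 15.47, 73.2),
    ("Rich Hill", 3.10, 14.93, 73.2),
    ("Lenny DiNardo", 0.78, 14.68, 69.9),
]


def rank_by_horizontal_break(target, pitches):
    """Return (name, gap in inches) for the other pitches, closest break-x first."""
    target_x = next(bx for name, bx, _, _ in pitches if name == target)
    gaps = [(name, abs(bx - target_x)) for name, bx, _, _ in pitches if name != target]
    return sorted(gaps, key=lambda pair: pair[1])
```

With the table's values, `rank_by_horizontal_break("Barry Zito", LHP_CURVEBALLS)` puts DiNardo first (about 0.63 inches away) and Lilly second (about 1.58), while everyone else is more than 2 inches off.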
On The Book's blog this week, there was a discussion about comparing Mariano Rivera's cutter to other pitches and seeing whether pitchers who threw those pitches had a reverse split like Rivera. The only problem with doing this for Rivera is that you have a better chance of seeing Bigfoot than of finding a pitch similar to his cutter. First, the horizontal movement on the pitch is totally unique. No other fastball (from either a lefty or a righty) breaks as much to the pitcher's glove side as Rivera's does. The amount of movement he gets is consistent with a slider, but the cutter is thrown faster than an average fastball. A final difference is that it breaks less vertically than a slider does. The table below shows some of the pitches most comparable to Rivera's cutter, based on horizontal movement.
| Name | Pitch | Break-x (in.) | Break-z (in.) | MPH |
|---|---|---|---|---|
| Tim Hudson | Cutter | -0.66 | 6.67 | 87.0 |
| Miguel Batista | Cutter | -0.71 | 5.27 | 89.6 |
| Gil Meche | Slider | -0.97 | 5.87 | 87.1 |
| Mariano Rivera | Cutter | -1.30 | 4.11 | 93.0 |
| Buddy Carlyle | Slider | -1.44 | 5.41 | 87.3 |
| John Smoltz | Slider | -1.56 | 6.31 | 87.2 |
| Dustin McGowan | Slider | -1.66 | 7.88 | 87.4 |
None of these pitches matches Rivera's cutter very well, and Meche is the only one of these pitchers with a reverse split for his career. One idea I had while looking at Zito and Rivera is that uniqueness in horizontal movement might cause reverse splits. Rivera throws a fastball that breaks horizontally like nobody else's in baseball. Zito's curve is unique not because of its vertical break (although that is large) but because of its lack of horizontal break.
I had two topics I wanted to cover this week. The second one is important to me, though it's probably a little less interesting for other people: I'm using a new algorithm to categorize pitches. It works better than applying a set of logical rules to each pitch, and it takes less time to run too.
As for the nuts and bolts of the system: for each pitcher, the algorithm calculates the distance between every pair of pitches using their break and velocity. Once it has those distances, it combines the two pitches that are closest together, recalculates the distances between that new cluster and the remaining pitches, and combines the next two objects that are closest together. It repeats this process until the remaining groups differ from each other by a certain amount. Once the algorithm has run for an individual pitcher, all of their pitches are assigned to a group, and using some of the logical statements from my original filter, along with other patterns in the speed and break of different pitch types, I can label each group (and all its members) as a specific pitch type.
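The merging process described above is agglomerative (bottom-up hierarchical) clustering. Here is a small, self-contained sketch of it. The article doesn't say how the distance between a new cluster and the remaining pitches is recalculated or what the stopping level is, so this version assumes average linkage, unscaled Euclidean distance on (break-x, break-z, mph), and a hypothetical `max_merge_distance` threshold.

```python
import math
from itertools import combinations


def average_linkage(pitches, group_a, group_b):
    """Mean pairwise distance between two groups of pitch indices."""
    total = sum(math.dist(pitches[i], pitches[j]) for i in group_a for j in group_b)
    return total / (len(group_a) * len(group_b))


def cluster_pitches(pitches, max_merge_distance):
    """Agglomerative clustering of one pitcher's pitches.

    pitches: list of (break_x, break_z, mph) tuples.
    Returns a group id for each pitch, in input order.
    """
    groups = [[i] for i in range(len(pitches))]  # every pitch starts alone
    while len(groups) > 1:
        # Find the two closest groups under average linkage.
        (a, b), closest = min(
            (((i, j), average_linkage(pitches, groups[i], groups[j]))
             for i, j in combinations(range(len(groups)), 2)),
            key=lambda pair: pair[1],
        )
        if closest > max_merge_distance:
            break  # remaining groups are different enough; stop merging
        groups[a].extend(groups[b])  # a < b, so deleting b leaves a in place
        del groups[b]
    labels = [0] * len(pitches)
    for group_id, members in enumerate(groups):
        for i in members:
            labels[i] = group_id
    return labels
```

This recomputes every cluster distance at each step, which is easy to follow but slow for a full season of pitches; a library routine such as SciPy's hierarchical clustering would be the practical choice there.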
Labeling pitches by group membership is better than applying a set of static rules to every individual pitch in the database because it lets me compare each pitch to the rest of that pitcher's repertoire instead of worrying about how it compares to a global rule. One problem with my old filter was that I had to find a way to get both Jamie Moyer's and Josh Beckett's fastballs recognized as fastballs, which wasn't easy given the difference in speed. With the new method, the fastest group for each pitcher is automatically labeled as a fastball... no fuss, no muss. The new algorithm is also better at identifying individual pitches at the edges of clusters. These pitches clearly belong with the rest of the cluster, but under the old system they would occasionally fail to match the logical rules used for classification and be labeled as unknown pitches.
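The fastest-group rule is what lets Moyer and Beckett share a label without a global speed cutoff. A minimal sketch, assuming each pitch already has a group id from the clustering step; the other groups get a placeholder "unlabeled" tag because the remaining naming rules aren't spelled out here.

```python
from collections import defaultdict
from statistics import mean


def label_fastest_group(speeds, groups):
    """Label every pitch in the pitcher's fastest group as a fastball.

    speeds: speed (mph) of each pitch.
    groups: cluster id of each pitch, in the same order.
    Pitches in other groups get a placeholder label.
    """
    speeds_by_group = defaultdict(list)
    for mph, group in zip(speeds, groups):
        speeds_by_group[group].append(mph)
    fastest = max(speeds_by_group, key=lambda g: mean(speeds_by_group[g]))
    return ["fastball" if g == fastest else "unlabeled" for g in groups]
```

A hypothetical soft-tosser whose groups average about 82 and 73 mph and a hypothetical flamethrower whose groups average about 95 and 84 mph both get their faster group labeled as fastballs, whereas a fixed rule such as "90+ mph is a fastball" would miss the first pitcher's fastball entirely.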
Although I'm still working some of the kinks out of this classification system, I can already generate a list of fastballs (for pitchers who have thrown at least 500 total pitches) and see which ones have the greatest vertical break.
| Name | Fastballs | Break-x (in.) | Break-z (in.) | MPH |
|---|---|---|---|---|
| Sean Green | 300 | 3.64 | 8.49 | 89.8 |
| Jesse Litsch | 290 | -0.59 | 7.23 | 84.8 |
| Brandon Webb | 637 | 3.71 | 7.06 | 89.0 |
| Kameron Loe | 428 | 3.14 | 6.37 | 88.6 |
| Greg Maddux | 555 | 3.56 | 6.36 | 86.3 |
| Derek Lowe | 670 | 3.93 | 6.32 | 90.3 |
| Jake Westbrook | 462 | 3.50 | 6.28 | 90.8 |
| Justin Germano | 466 | 3.38 | 5.79 | 86.9 |
| Roy Halladay | 268 | 3.51 | 5.60 | 93.9 |
| Jamey Wright | 320 | 3.02 | 5.59 | 89.1 |
Look familiar? Instead of saying Webb's sinker ends up 3 inches higher than a non-spinning pitch while a 4-seam fastball ends up 6 inches higher (or whatever the numbers were), now you can say that Webb's sinker has a 7-inch downward break.
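The fastball list above amounts to a filter followed by a sort. Here is a sketch of that step, assuming a hypothetical flat record per pitch of (pitcher, pitch_type, break_x, break_z, mph) with pitch types already assigned. The 500-pitch minimum applies to a pitcher's total pitches, while the count reported in each row is only his fastballs, which is how a pitcher with 300 fastballs can still make the list.

```python
from collections import Counter, defaultdict
from statistics import mean


def fastballs_by_vertical_break(pitches, min_total=500):
    """Rank pitchers' fastballs by average break-z, largest first.

    pitches: iterable of (pitcher, pitch_type, break_x, break_z, mph) tuples,
    a hypothetical record layout used only for this sketch.
    """
    pitches = list(pitches)
    totals = Counter(p[0] for p in pitches)  # every pitch type counts toward the minimum
    fastballs = defaultdict(list)
    for pitcher, pitch_type, bx, bz, mph in pitches:
        if pitch_type == "fastball" and totals[pitcher] >= min_total:
            fastballs[pitcher].append((bx, bz, mph))
    rows = [
        (
            pitcher,
            len(fb),  # number of fastballs, not total pitches
            round(mean(p[0] for p in fb), 2),
            round(mean(p[1] for p in fb), 2),
            round(mean(p[2] for p in fb), 1),
        )
        for pitcher, fb in fastballs.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)
```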