Command Post September 21, 2007
"Breaking" Away

When the PITCHf/x system debuted last year, the first thing I wanted to know (besides how hard Joel Zumaya actually threw) was exactly how different pitches moved. This was a basic question, and from watching baseball on television and playing it, I had a pretty good idea of how different pitches moved, but my knowledge lacked precision. I know a curveball from a left-handed pitcher breaks down-and-away from a left-handed hitter, but how much does it move? Where do you start measuring? Where do you finish? How do you separate the downward movement from the away movement? Should you? That curveball ends up low and away, but would you say it broke 5 inches down-and-away, or 3 inches down and 4 inches away? Which is "better"? Break is a tricky thing to define, let alone measure.
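The "5 inches vs. 3 and 4 inches" question is partly bookkeeping. A combined break and its down and away parts carry the same information, and the Pythagorean theorem links them. Here is a minimal Python sketch using the hypothetical 3-inch/4-inch curveball from the paragraph above. The function name `combine_break` is made up for illustration.

```python
import math


def combine_break(down_in: float, away_in: float) -> tuple[float, float]:
    """Combine vertical and horizontal break into one length and direction.

    Returns (total break in inches, angle below horizontal in degrees).
    """
    total = math.hypot(down_in, away_in)                # sqrt(down^2 + away^2)
    angle = math.degrees(math.atan2(down_in, away_in))  # 0 = purely "away"
    return total, angle


# The hypothetical curveball from the paragraph: 3 inches down, 4 inches away.
total, angle = combine_break(3.0, 4.0)
print(f"{total:.1f} inches at {angle:.1f} degrees below horizontal")
# prints "5.0 inches at 36.9 degrees below horizontal"
```

The sketch does not decide which description is "better." It only shows that the two are interchangeable once you choose where to start and stop measuring.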

The first attempt to quantify break using PITCHf/x debuted during the 2006 playoffs and compared the actual pitch to a pitch thrown without spin. The system would capture the flight path of a pitch, then create a hypothetical pitch that was thrown with the same initial velocity and release point, but with only gravity and drag acting on it. The difference between where this hypothetical pitch would have ended up and where the actual pitch ended up was given as the "pfx" of the pitch. There are a couple of problems with this definition, the biggest being that nobody knows what a pitch without spin looks like. That isn't to say that its path can't be calculated, but rather that nobody has ever seen one, so people don't have a frame of reference for what the values mean. But it was a start. If you went into the XML files, there were two pfx values, one for the x direction and one for the z direction. Graphing these values, either alone or vs. the speed of the pitch, remains an excellent method for identifying different pitches. Even if it's unclear how a pitch that ends up 10 inches higher than a non-spinning pitch actually moved, other pitches of that type will also have pfx_z values around 10 inches.
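Here is a minimal sketch of the "difference from a no-spin pitch" idea. It rests on stated assumptions. The pitch is modeled with constant accelerations (ax, ay, az in ft/s², as in a fitted trajectory). The reference pitch feels gravity only, with drag ignored in the x and z directions. The real pfx calculation also accounts for drag, so this approximates the idea and is not the PITCHf/x formula. A few more things are assumed: y is measured from home plate toward the mound, so vy0 is negative. The 50 ft starting distance, the 17/12 ft plate location, and the helper names are illustrative. The sample numbers are made up.

```python
import math

GRAVITY_FT_S2 = -32.174   # acceleration of gravity, ft/s^2 (z points up)
PLATE_Y_FT = 17.0 / 12.0  # assumed y of the front edge of home plate, ft


def time_to_plate(y0: float, vy0: float, ay: float) -> float:
    """Seconds until y(t) = y0 + vy0*t + 0.5*ay*t^2 reaches the plate."""
    if ay == 0.0:
        return (PLATE_Y_FT - y0) / vy0
    a, b, c = 0.5 * ay, vy0, y0 - PLATE_Y_FT
    root = math.sqrt(b * b - 4.0 * a * c)
    candidates = ((-b + root) / (2.0 * a), (-b - root) / (2.0 * a))
    return min(t for t in candidates if t > 0.0)  # first time it gets there


def pfx_sketch(ax: float, az: float, t: float) -> tuple[float, float]:
    """Deviation (inches) from a pitch with the same start that feels gravity only.

    With constant accelerations, two pitches sharing a release point and
    initial velocity differ in position by 0.5 * (difference in acceleration) * t^2.
    Drag is ignored in x and z here, which is a simplification.
    """
    pfx_x = 0.5 * ax * t * t * 12.0                    # reference has no sideways push
    pfx_z = 0.5 * (az - GRAVITY_FT_S2) * t * t * 12.0  # reference falls with gravity
    return pfx_x, pfx_z


# Made-up sample values, in feet and seconds.
t = time_to_plate(y0=50.0, vy0=-130.0, ay=28.0)
print(round(t, 3), pfx_sketch(ax=-10.0, az=-15.0, t=t))
```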

The next try at quantifying break arrived this season and is more in line with how people imagine break. This version of break is defined as the greatest distance between the path of the pitch and the straight-line path from the release point to home. A 12-to-6 curve will have a large value, while a regular fastball will have a small one. This definition is hard to picture, so if you're having trouble with it, imagine holding a bow by one end, with the other end held away from you (and slightly down). The end you're holding is the release point, the other end is where the ball crosses home, and the string is the straight-line path, while the ball travels along the bow itself. If you rotate the bow around the string to a given angle, you get the actual path of the pitch, and the bow's greatest distance from the string is the break given by PITCHf/x. (Thanks to John Walsh for the bow analogy.)
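Suppose the pitch is modeled with constant acceleration. That is an assumption, though a common one for fitted PITCHf/x-style trajectories. The bow picture then has a closed form. At time t, the gap between the path and the string is 0.5 * |a_perp| * t * (T - t). Here a_perp is the part of the acceleration perpendicular to the release-to-plate line, and T is the flight time. So the largest gap (the break) is |a_perp| * T^2 / 8, reached halfway through the flight in time. Below is a minimal Python sketch. The function name and the sample pitch are hypothetical.

```python
import math

Vec = tuple[float, float, float]  # (x, y, z) in ft, ft/s, or ft/s^2


def break_length(r0: Vec, v0: Vec, a: Vec, T: float) -> float:
    """Largest perpendicular gap (inches) between a constant-acceleration
    path and the straight line from release (t = 0) to the plate (t = T)."""
    # Where the ball is when it reaches the plate.
    rT = tuple(p + v * T + 0.5 * acc * T * T for p, v, acc in zip(r0, v0, a))
    chord = [e - s for e, s in zip(rT, r0)]  # the "bow string"
    length = math.sqrt(sum(c * c for c in chord))
    unit = [c / length for c in chord]
    # Only the part of the acceleration perpendicular to the string bends
    # the path away from it.
    along = sum(ai * ui for ai, ui in zip(a, unit))
    a_perp = math.sqrt(max(sum(ai * ai for ai in a) - along * along, 0.0))
    # The gap at time t is 0.5 * a_perp * t * (T - t), largest at t = T/2.
    return a_perp * T * T / 8.0 * 12.0


# Made-up pitch: released 6 ft high and 50 ft out, 100 ft/s toward home, gravity only.
print(round(break_length((0.0, 50.0, 6.0), (0.0, -100.0, 0.0), (0.0, 0.0, -32.0), 0.5), 2))
```

A pitch with no acceleration at all would follow the string exactly and get a break of zero.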

This break value becomes even more valuable (at least to me) when you split it into x and z components, and Dr. Alan Nathan's website has some (more) helpful equations that allow you to calculate break-z and break-x values. To visualize break-z, imagine keeping the endpoints constant and rotating the bow around the string until the bow is above the string and perpendicular to the ground. Break-x is the same thing, but with the bow parallel to the ground (don't worry about whether the bow is to the left or right of the string just yet). The break values are very similar to the pfx values, except that they are measured relative to an imaginary straight line, something that is easy to visualize. If the break-z value is 17 inches for a Barry Zito curve, that means it really breaks 17 inches from its "high point" to where it crosses home. If Mariano Rivera's cutter has a break-x value of -1.3 inches, that means it moves 1.3 inches in on a lefty between its maximum horizontal deviation and its end point. This makes a ton of sense and is much closer to how people think of break.
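One way to get break-x and break-z from the same bow picture is to walk along the flight. At each step, compare the ball's position with the straight line at the same distance from home, and keep the largest sideways and vertical offsets. The sketch below does this under a constant-acceleration assumption. It is a hypothetical decomposition for illustration, not Dr. Nathan's equations. Break-z comes out positive when the path bows above the string. The sign of break-x simply follows whatever x convention the inputs use, not necessarily the article's.

```python
Vec = tuple[float, float, float]  # (x, y, z) in ft, ft/s, or ft/s^2


def break_components(r0: Vec, v0: Vec, a: Vec, T: float,
                     steps: int = 1000) -> tuple[float, float]:
    """Largest horizontal and vertical offsets (inches) of a constant-acceleration
    path from the release-to-plate line, compared at the same y (distance from home)."""

    def position(t: float) -> Vec:
        return tuple(p + v * t + 0.5 * acc * t * t for p, v, acc in zip(r0, v0, a))

    x0, y0, z0 = position(0.0)
    x1, y1, z1 = position(T)
    best_x = best_z = 0.0
    for i in range(steps + 1):
        x, y, z = position(T * i / steps)
        frac = (y - y0) / (y1 - y0)       # fraction of the way to home
        dx = x - (x0 + frac * (x1 - x0))  # sideways gap from the line
        dz = z - (z0 + frac * (z1 - z0))  # vertical gap from the line
        if abs(dx) > abs(best_x):
            best_x = dx
        if abs(dz) > abs(best_z):
            best_z = dz
    return best_x * 12.0, best_z * 12.0


# Made-up pitch: gravity only, so all of the bend is vertical.
print(break_components((0.0, 50.0, 6.0), (0.0, -100.0, 0.0), (0.0, 0.0, -32.0), 0.5))
```

For this made-up pitch the vertical offset is 12 inches. The offset is measured straight up rather than perpendicular to the slightly tilted string, so it runs a touch larger than the perpendicular gap.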

Once you understand and are comfortable with the break values, they act pretty much the same as the pfx values, with the benefit of meaning something. Comparing the two Barry Zito graphs below shows some of the similarities. The new definition of break is graphed on the left, while the no-spin version is graphed on the right. One thing to note is that because of a convention change, positive break-x values (left-hand graph) correspond to negative pfx_x values (right-hand graph), but the basic pattern of pitches is the same in both cases.

Negative break-x values mean movement away from a RHB, and you can see that Zito's pitches typically move away from a RHB. This type of horizontal movement (toward the arm side) is what you would expect for a fastball and change-up from any pitcher. Zito's curveball breaks slightly away from a LHB, which is how curveballs from LHP are "supposed" to break, but the magnitude of Zito's horizontal break is less than normal. The table below shows other similar curveballs from LHP, sorted by their vertical break.

```
Name           Count   BreakX    BreakZ    MPH
Barry Zito       142    0.15"    17.18"    70.1
Doug Davis       165    2.31"    16.83"    68.0
Ted Lilly        157    1.73"    15.62"    70.8
Sean Marshall     62    2.24"    15.47"    73.2
Rich Hill        202    3.10"    14.93"    73.2
Lenny DiNardo     95    0.78"    14.68"    69.9
```

Zito's curveball actually has the biggest vertical drop of any pitch thrown this year, and comparing it to the other pitches in the chart, you can see that its horizontal break is much smaller. Zito has historically fared better against RHB than LHB (669/730 career OPS vs. RHB/LHB), so maybe his unique curveball is the reason why. It's reasonable to think that because the curveball doesn't move away from LHB as much as normal, they would have an easier time hitting it. The only pitcher with a similar curveball is DiNardo, and he too shows a reverse split (792 career OPS vs. RHB/814 vs. LHB). Joe Saunders' curve is the next most similar to DiNardo's, although it has less vertical break and an almost normal horizontal break, and Saunders doesn't have a reverse split. Once you get past Saunders, though, no other curveballs have a horizontal break close to Zito's or DiNardo's.

On The Book's blog this week, there was a discussion about comparing Mariano Rivera's cutter to other pitches and seeing whether the pitchers who threw those pitches had a reverse split like Rivera's. The only problem with doing this for Rivera is that you have a better chance of seeing Bigfoot than of finding a pitch similar to his cutter. First of all, the horizontal movement on the pitch is totally unique. No other fastball (from either a lefty or a righty) breaks as much to the pitcher's glove side as Rivera's does. The amount of movement he gets is consistent with a slider, but the cutter is thrown faster than an average fastball. A final difference is that it breaks less vertically than a slider does. The table below shows some of the pitches most comparable to Rivera's cutter, based on horizontal movement.

```
Name            Pitch    BreakX    BreakZ    MPH
Tim Hudson      Cutter   -0.66"    6.67"     87.0
Miguel Batista  Cutter   -0.71"    5.27"     89.6
Gil Meche       Slider   -0.97"    5.87"     87.1
Mariano Rivera  Cutter   -1.30"    4.11"     93.0
Buddy Carlyle   Slider   -1.44"    5.41"     87.3
John Smoltz     Slider   -1.56"    6.31"     87.2
Dustin McGowan  Slider   -1.66"    7.88"     87.4
```

None of these pitches match Rivera's cutter very well, and Meche is the only one of these pitchers to have a reverse split for his career. One idea I had as I was looking at Zito and Rivera is that uniqueness in horizontal movement might cause reverse splits. Rivera throws a fastball that breaks horizontally like nobody else's in baseball. Zito's curve is unique not because of its vertical break (although that is large), but because of its lack of horizontal break.
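"How similar is similar?" can be made concrete with a distance over the three numbers in the Rivera table above. Here is a minimal sketch. It assumes plain Euclidean distance on (BreakX, BreakZ, MPH), with one inch of break weighted the same as one mph. That weighting is an arbitrary choice for illustration, not the method used for the table. The data rows are copied from that table.

```python
import math

# (name, pitch, break_x in inches, break_z in inches, mph), from the table above
PITCHES = [
    ("Tim Hudson", "Cutter", -0.66, 6.67, 87.0),
    ("Miguel Batista", "Cutter", -0.71, 5.27, 89.6),
    ("Gil Meche", "Slider", -0.97, 5.87, 87.1),
    ("Mariano Rivera", "Cutter", -1.30, 4.11, 93.0),
    ("Buddy Carlyle", "Slider", -1.44, 5.41, 87.3),
    ("John Smoltz", "Slider", -1.56, 6.31, 87.2),
    ("Dustin McGowan", "Slider", -1.66, 7.88, 87.4),
]


def nearest_to(name: str) -> list[tuple[float, str, str]]:
    """Rank the other pitches by plain Euclidean distance to `name`'s pitch.

    One inch of break counts the same as one mph here, an arbitrary
    weighting chosen for illustration.
    """
    target = next(p for p in PITCHES if p[0] == name)
    return sorted(
        (math.dist(target[2:], p[2:]), p[0], p[1])
        for p in PITCHES
        if p[0] != name
    )


for dist, who, pitch in nearest_to("Mariano Rivera"):
    print(f"{who:16s} {pitch:7s} {dist:5.2f}")
```

With this weighting, speed dominates the distances. Even the nearest pitch, Batista's cutter, sits more than 3.5 units away. Rescaling the features would change the order, which is worth keeping in mind before leaning on any "most similar" list.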

I had two topics I wanted to cover this week. The second one is important to me, though it's probably a little less interesting to other people: I'm using a new algorithm to categorize pitches. It works better than applying a set of logical rules to each pitch, and it takes less time to run, too.

As far as the nuts and bolts of the system go, for each pitcher the algorithm calculates the distance between every pair of pitches using their break and velocity. Once it has those distances, it combines the two pitches that are closest together, recalculates the distances between that new cluster and the remaining pitches, and combines the next two objects that are closest together. It repeats this process until the closest remaining groups differ by more than a set amount. Once the algorithm has run for an individual pitcher, each of their pitches is assigned to a group. Then, using some of the logical statements from my original filter, as well as other patterns in the speed and break of different types of pitches, I can label each group (and all its members) as a specific pitch type.
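Here is a minimal Python sketch of the merge-the-closest-pair loop described above, which is agglomerative hierarchical clustering, run on one pitcher's pitches. The paragraph doesn't say how the distance from a merged cluster to the rest is recalculated. This sketch assumes average linkage, the mean distance between members. It also uses raw (break_x, break_z, mph) values with no rescaling. The function names, the `stop_at` threshold, and the sample pitches are hypothetical.

```python
import math
from itertools import combinations

Pitch = tuple[float, float, float]  # (break_x in, break_z in, mph)


def average_linkage(a: list[Pitch], b: list[Pitch]) -> float:
    """Mean distance between every pitch in cluster a and every pitch in b."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))


def cluster_pitches(pitches: list[Pitch], stop_at: float) -> list[list[Pitch]]:
    """Merge the two closest clusters, recompute, and repeat until the closest
    remaining pair is farther apart than stop_at."""
    clusters = [[p] for p in pitches]  # every pitch starts in its own cluster
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if average_linkage(clusters[i], clusters[j]) > stop_at:
            break  # the remaining groups are different enough to stay apart
        clusters[i].extend(clusters.pop(j))  # i < j, so index i is unaffected
    return clusters


# Hypothetical pitches: three fastball-like and two curveball-like.
sample = [(-6.0, 9.5, 91.0), (-5.8, 9.0, 90.5), (-6.2, 9.8, 91.4),
          (4.0, -5.0, 76.0), (4.3, -5.5, 75.2)]
for group in cluster_pitches(sample, stop_at=5.0):
    print(group)
```

The recalculation each pass is quadratic in the number of clusters, which is fine for one pitcher's season but slow for a whole league at once. Libraries such as SciPy provide faster implementations of the same family of methods.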

Labeling pitches by group membership is better than applying a set of static rules to every individual pitch in the database. It lets me compare each pitch to the rest of that pitcher's repertoire without worrying about how it compares to a global rule. One problem with my old filter was that I had to find a way to get both Jamie Moyer's and Josh Beckett's fastballs recognized as fastballs, which wasn't easy given the difference in speed. With the new method, the fastest group for each pitcher is automatically labeled as a fastball...no fuss, no muss. The new algorithm is also more successful at identifying individual pitches at the edges of clusters. These pitches clearly belong with the rest of the cluster, but under the old system they would occasionally fail to match the logical rules used for classification and be labeled as unknown pitches.
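The "fastest group is the fastball" rule is easy to write down because it only compares a pitcher's groups with each other, never with a league-wide speed cutoff. Here is a minimal Python sketch. It assumes each group is a list of (break_x, break_z, mph) tuples, such as the output of a clustering step. The function name and the two sample pitchers are hypothetical.

```python
from statistics import mean

Pitch = tuple[float, float, float]  # (break_x in, break_z in, mph)


def label_fastest_group(groups: list[list[Pitch]]) -> list[str]:
    """Call the group with the highest average speed the fastball.

    Every other group is left as "unlabeled" here; further rules about
    speed and break would be needed to name those groups.
    """
    avg_speed = [mean(p[2] for p in g) for g in groups]
    fastest = avg_speed.index(max(avg_speed))
    return ["fastball" if k == fastest else "unlabeled" for k in range(len(groups))]


# Two hypothetical pitchers whose fastballs differ by about 14 mph.
soft_tosser = [[(-5.0, 8.0, 81.0), (-5.2, 8.1, 82.0)], [(4.0, -6.0, 70.0)]]
power_arm = [[(3.0, -4.0, 78.0)], [(-6.0, 10.0, 95.0), (-6.1, 9.8, 96.0)]]
print(label_fastest_group(soft_tosser))  # ['fastball', 'unlabeled']
print(label_fastest_group(power_arm))    # ['unlabeled', 'fastball']
```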

While some of the kinks are still being worked out of this classification system, I can still generate a list of fastballs (for pitchers who have thrown at least 500 total pitches) and see which ones have the greatest vertical break.

```
Name            N       BreakX    BreakZ    MPH
Sean Green      300      3.64"    8.49"     89.8
Jesse Litsch    290     -0.59"    7.23"     84.8
Brandon Webb    637      3.71"    7.06"     89.0
Kameron Loe     428      3.14"    6.37"     88.6
Greg Maddux     555      3.56"    6.36"     86.3
Derek Lowe      670      3.93"    6.32"     90.3
Jake Westbrook  462      3.50"    6.28"     90.8
Justin Germano  466      3.38"    5.79"     86.9
Roy Halladay    268      3.51"    5.60"     93.9
Jamey Wright    320      3.02"    5.59"     89.1
```

Look familiar? Instead of saying Webb's sinker ends up 3 inches higher than a non-spinning pitch, while a 4-seam fastball ends up 6 inches higher (or whatever the numbers were), now you can say that Webb's sinker has a 7-inch downward break.

Very interesting and definitely an improvement over the theoretical spinless pitch. Thanks a lot for the work.

Joe -- your articles are very good, but you need to read them thoroughly before posting.

alex

I've got to be honest: I don't see what all the fuss is about with regard to the pfx_x/_z values.

In the x direction a pitch with no spin doesn't move, so the pfx_x value is just the deviation of the ball from the straight line path as it crosses the plate -- to me that is reasonably intuitive. In the z direction, for the spinless pitch the only force on the ball is gravity and the pfx_z value is the deviation from this theoretical pitch, as we all know.

The problem I see with break_x and break_z is that they don't always measure true break, in my opinion. The slower the pitch, the higher the break (imagine tossing a weak parabolic underarm pitch). That pitch would have a VERY high break under the break_x/_z system, but in actual fact it wouldn't break much at all. The parameters (break and speed) are dependent.

Perhaps from a communication point of view we should use the break definition, but from an analyst's point of view I feel it is less meaningful.
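The lobbed-pitch point in the comment above can be put in numbers with a simplified model. Assume no drag, a constant speed toward home, and the same vertical acceleration (gravity only) at every speed. The largest vertical gap between the path and the straight line is then |a| * T^2 / 8. Because the flight time T grows as speed drops, halving the speed quadruples the gap. The 55 ft release distance and the function name are assumptions for illustration.

```python
GRAVITY_FT_S2 = 32.174  # magnitude of gravitational acceleration, ft/s^2


def max_gap_inches(speed_mph: float, accel_ft_s2: float = GRAVITY_FT_S2,
                   distance_ft: float = 55.0) -> float:
    """Largest vertical gap (inches) between a parabolic path and the straight
    line joining its ends, for constant speed toward home and no drag."""
    speed_ft_s = speed_mph * 5280.0 / 3600.0  # mph -> ft/s
    flight_time = distance_ft / speed_ft_s    # seconds from release to plate
    return abs(accel_ft_s2) * flight_time ** 2 / 8.0 * 12.0


for mph in (90.0, 60.0, 30.0):
    print(f"{mph:4.0f} mph: {max_gap_inches(mph):5.1f} inches")
```

Under these assumptions a spinless 90 mph pitch has a gap of about 8 inches, while a 30 mph lob has one of about 75 inches, even though neither pitch gets any help from spin.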

There are benefits and problems with both definitions of break and I don't know if either of them totally describes what people think of as break. I'm also not sure if there is a common definition for "break" that is being used yet, which I think might be where some of the confusion comes from.

You have a good point about the horizontal break, but for vertical break I don't agree with you. As an analyst, I feel totally comfortable using the pfx_z data to compare the vertical movement of pitches and classify them, and I understand what it means. However, as a reader I don't know exactly what is being described. I can't visualize the pfx_z data, which I think is important in this type of analysis. Another important thing to realize is that the pfx values are relative to the pitch type being thrown. They describe the path of a non-spinning pitch with the same initial parameters (initial velocity and release point) as a given pitch. However, those initial parameters, especially velocity, are unique for different pitch types, so you end up with a different baseline depending on which pitch you're looking at.

The break-z value, especially on something simple like a 12-6 curveball, is exactly how I would describe break (distance from the top of its path to where it ends up), but I agree that an underhand pitch would have a weird "break z" number. How would you describe that pitch? I'm not asking to be difficult, but because I don't know. Would you say it doesn't break? Again, I'm not sure.

You mentioned that speed is an important factor in determining break, and there is a term in the XML that gives the y-distance from home at which the "break_length" value is measured. Maybe incorporating the distance from home where the max break occurs could account for the parabolic pitch and the "sharpness" of the break?

Overall, I think the two terms are telling pretty much the same story for most pitches, so maybe it would be interesting to look at pitches where the two break values disagree, like in the case of the parabola pitch.

Is it really difficult to visualise a pfx_z? Perhaps I am out of tune with reality (that is definitely possible), but a pitch where the only force acting on the ball is gravity is a good reference (imo). A fastball has a bit of backspin, which creates lift, which is why it "rises" -- people talk about rising fastballs and this is, in a sense, what people mean. It accurately describes the physics of the pitch.

The point you make, that no-one has ever thrown such a pitch, is correct. However, I'd challenge anyone to differentiate between a fastball that 'rose' 3" and one that 'rose' 6". I think it would be difficult. It is very tricky to create a baseline pitch.

The issue with the break_z component is that I don't think we can use it to compare pitchers. If Zito throws his curveball slower, he will almost certainly have imparted a larger break on it, because it travels at a lower speed and therefore needs a higher peak to cross the plate. What the fan on the street really wants to know is how much more a pitch moves than a typical curveball (thrown at the same speed -- though said fan might not care about that). Unless we classify all curveballs and define the 3D angular velocity to create an average curveball, I fear we are stuck with the pfx_z component as the best fit.

Rereading your post, I think we are probably on the same page. The question is how the analytic community should communicate the amazing results from the pfx data to fans. In that environment break_z definitely works, but for me it isn't as analytically rigorous.

If I didn't say it the first time (which I didn't): great work, and good to see you posting weekly. Keep up the good work. This offseason, once the THT Annual is out of the way, I want to spend a lot of time with these data -- there is a treasure trove of great stuff out there.

Even speed affects the pfx values. For instance, say, there's two balls with the same spin but one is thrown slower than the other. The slower ball will have more break (in the pfx sense). But like John Beamer said, the break_z values will be even more influenced by speed because of the force of gravity. They do more accurately show what is seen when watching the game, but I'll continue looking at the pfx values. Using those break values might get more people involved and interested, though. Or it might just confuse more people.

I think in that last table it would have been nice to include the pfx_z values to see how they compare, and maybe even the az values.

Ultxmzpx.

I'm not sure that is right -- it could be, but I'm not sure. My understanding of the pfx_z value is that it is measured against where the pitch would end up if everything were the same bar spin. Therefore, for two identical balls with the same spin but different speeds, the zero-spin balls will end up in different places.

John,

I meant visualize in terms of actually imagining how a pitch moved. I have no problem imagining a pitch that is only affected by gravity and using that in my analysis; I just don’t know what that looks like during its flight toward home. Even though speed impacts break_z, I have a good idea of what a straight line from release point to home looks like. I agree that pfx_z is a more meaningful term to compare between pitchers because it removes the impact of speed from its measurement. But I think whatever is measured by break_z (break? Gravity drop? Speed drop?) is important too, maybe not for comparing different pitchers, but for describing what the pitches did during their flight. Overall though, I think we are on the same page with this stuff.

Ultxmzpx

I’m not sure if you’re right on that either. I think there is a different “no-spin” pitch calculated for every pitch thrown and depends on the initial speed and release point of the pitch. If you graph pfx_z and break_z, there's pretty much a linear relationship between the two.

You guys may be right.

For (my) clear understanding, the amount of break on a pitch depends on the angular velocity or spin rate (rpm) on the ball. But is there another variable that break amount depends on in the pfx calculations? Is it the distance or is it the time the ball travels? Is the spin rate already dependent on the velocity of the pitch, so velocity doesn't matter in the pfx calculations?

A ball with no spin is called a knuckleball and is very unpredictable due to the seams catching air. A ball with no spin does not go straight.

Of course, you guys know that, right?

A knuckleball has a very slight amount of spin, maybe 1/2 to 1/4 of a rotation. The knuckle effect is due to air flowing over the stitches of the ball in an asymmetric way. An actual pitch thrown without any spin at all will probably gain a little spin on the way to the plate, due to imbalances, and act like a knuckleball as a result. A pitch thrown with no-spin and that doesn't acquire any spin on the way to the plate will break in a predictable way, based on the initial orientation of the seams. Since we don't know the orientation of the seams, the assumption is the only forces acting on the ball are gravity and drag.

A ball truly without spin will break, but it isn't what we think of as a knuckleball. A slowly spinning ball will knuckle.

Is the path of the no-spin pitch calculated based on release point or release vector?

FWIW, I prefer the definition of break to be how much the ball has deviated from a ball with no spin.

Even speed affects the pfx values. For instance, say, there's two balls with the same spin but one is thrown slower than the other. The slower ball will have more break (in the pfx sense).

Are we sure that's really the way it works? I mean, it seems like a logical conclusion, but for anyone that's watched baseball for a time, things don't always follow from what seems logical.

The Magnus force that causes the ball to break is created by the spin of the ball, but also by the movement of the ball through the air. Visualize this by imagining a ball that's spinning but not moving through the air. There's no break or movement by the ball. It's only when it starts moving through the air that it gains the Magnus force. So if two balls are thrown with the same amount of spin but different velocities, wouldn't the faster pitch create more Magnus force and break at a faster rate? Less time in the air to break, for sure, but maybe that extra Magnus force makes up for it.
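One rough way to frame this question ignores drag and treats the flight time as distance over speed. The Magnus force is commonly written in terms of a lift coefficient C_L, air density ρ, the ball's cross-sectional area A, and speed v. With ball mass m and a distance D to the plate, the deflection is then approximately:

```latex
F_M = \tfrac{1}{2} C_L \rho A v^2, \qquad
t \approx \frac{D}{v}, \qquad
d = \frac{1}{2}\,\frac{F_M}{m}\,t^2 \approx \frac{C_L \rho A D^2}{4m}
```

In this sketch the speed cancels: the faster pitch does feel more force, but for less time. What remains depends on C_L. Measurements generally show C_L rising with the spin parameter S = rω/v, where r is the ball's radius and ω is the spin rate. So at the same spin rate, a faster pitch has a smaller S and, other things equal, tends to deflect somewhat less.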

Joe,

I didn't see you include Brian Wilson's and Jason Isringhausen's cutters in the comparison to Rivera. I haven't seen the raw data, but from viewing the Gameday applet, it seems those two pitchers' cutters might be a good match for Rivera's.

Also, something weird I noticed about Brian Wilson's "straight" fastball is that it has glove-side movement as opposed to arm-side movement.

As far as the break on a curveball goes, it really depends on the arm slot of the pitcher. Zito throws from a "12 o'clock" arm slot, so his curve breaks a true 12-6. A few other pitchers who throw from a near-12 o'clock slot are Buchholz, Lincecum, and Gallardo. Maybe you could compare their curveballs against Zito's. I'm surprised to see that DiNardo's curveball doesn't have much lateral movement, as his arm slot is not as close to 12 o'clock as Zito's.