Touching Bases
May 19, 2009
HitTracker F/X
By Jeremy Greenhouse

What happens when you combine Hit Tracker data with Pitch f/x data? You get a whole lot of data.

First, I looked at batter age in relation to standard home run distance. Standard home run distance is the distance a home run would travel in neutral conditions if it were to land at field level. My sample contains data on home runs from 2007 and 2008, totaling nearly 10,000 data points.

It appears to me that the age 25-29 peak holds true. I had data on 16 homers hit by players before their 21st birthday and the average distance was 420 feet. This is because Justin Upton is an absolute monster. The oldest grouping of players is likely biased, since players who maintain the ability to hit home runs at that age are almost entirely power-happy first basemen and designated hitters. That group will include fewer light-hitting middle infielders than the younger groups.

There are only about 500 to 1,000 home runs per grouping, which leaves each average prone to being skewed by a few individual hitters. Albert Pujols and Adam Dunn were born two months apart and their tremendous power probably contributed to the large break between ages 28-29 and 29-30.

Next up I graphed standard distance against a batter's weight. It’s a standard assumption that heavier players have more raw power. And even though listed player weights are some of the more unreliable baseball data available, the relationship is still undeniable.

Less obvious is the relationship between home run distance and batter height. Yet the trend is just as distinct.

When it comes to raw power, short players are at a greater disadvantage than light players while heavy players are at a greater advantage than tall players.

All of our assumptions about quantifiable measures that contribute to a batter’s power seem to hold true. Age, height, and weight are important in determining power. With pitch f/x data, we can also see what effects pitchers have on home run distance. This is getting into Defense Independent Pitching Statistics theory. Max Marchi wrote a couple of great articles combining hit location and pitch f/x data. A good chunk of gameday data from 2007 did not have pitch f/x data, so I am working with closer to 7,000 home runs.

One would think that pitch velocity plays a part in determining how hard a ball is hit. To compare apples to apples, I used Hit Tracker’s speed off bat measure instead of standard distance.

It looks to me like pitch velocity is insignificant. Perhaps on the slowest of pitches, the ball doesn’t receive the same force off the bat, but every group faster than 80 miles per hour generates a speed off bat within half a mile per hour of each other. That’s nothing.

I wanted to see if there were any balls that left the pitcher’s hand with a greater velocity than that which they flew off the bat. There were about a dozen cases, with the biggest disparity in velocity coming on a 345-foot, 96 mile per hour Carlos Pena homer off a 99 mile per hour A.J. Burnett fastball.
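For readers who want to run the same kind of check, here is a minimal sketch of one way to do it. The `HomeRun` record and its field names are my own invention for illustration, not the actual Hit Tracker or Gameday column names, and every sample row except the 99/96 mph one is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class HomeRun:
    # Hypothetical field names; the real data sets label these differently.
    pitch_mph: float          # pitch speed out of the pitcher's hand
    speed_off_bat_mph: float  # speed of the ball leaving the bat


def pitch_faster_than_bat(homers):
    """Return homers where the pitch was faster than the batted ball,
    ordered from the largest velocity gap to the smallest."""
    slower_off_bat = [h for h in homers if h.pitch_mph > h.speed_off_bat_mph]
    return sorted(
        slower_off_bat,
        key=lambda h: h.pitch_mph - h.speed_off_bat_mph,
        reverse=True,
    )


# The first row mirrors the 99 mph pitch / 96 mph homer described above;
# the other two rows are invented purely to exercise the filter.
sample = [
    HomeRun(pitch_mph=99.0, speed_off_bat_mph=96.0),
    HomeRun(pitch_mph=92.0, speed_off_bat_mph=104.0),
    HomeRun(pitch_mph=95.0, speed_off_bat_mph=94.5),
]
flagged = pitch_faster_than_bat(sample)
```

Sorting by the gap puts the biggest disparity first, which is the case singled out in the text.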

Now, if I were Dave Allen I would come up with some awesome heat charts to demonstrate the relationship between pitch location and standard distance. I am not. But I do have bar charts. Here is pitch height plotted against standard distance.

I’m 6’2” and the top of my knee is exactly two feet high. Meanwhile, the top of my belt would be 3.5 feet high, but there just aren’t that many homers hit in the top layer of the strike zone. It would appear that home runs are hit the farthest on pitches at or around the knees. I’m not a physicist, or a physician for that matter, but I believe there are two factors a batter can control in how far he hits the ball: force and trajectory. I decided to break these down by pitch height.

pitchheight.jpg

Batters hit the ball hardest on pitches down in the zone. But the elevation angle (defined by Hit Tracker as the angle above horizontal at which the ball left the bat, in degrees) might actually determine why balls fly farther when batters go down to get them. The increase in elevation angle is uniform, and in general the lower the elevation angle, the longer the home run distance. The correlation coefficient between the two is -.25. Furthermore, there is a correlation coefficient of -.5 between elevation angle and speed off bat, which affirms that batters want to get on top of the ball, so to speak. Of course, the reason for the negative correlation between home run distance and pitch height could actually be the horizontal launch angle. Maybe low pitches are easier to turn on than higher pitches.
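The correlation coefficients quoted above are presumably ordinary Pearson coefficients, though that is my assumption. A minimal sketch of the calculation follows; the function name `pearson_r` is hypothetical and is not tied to any particular stats package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

Passing a list of elevation angles and the matching list of distances would give a value between -1 and 1, with a negative sign meaning that distance tends to fall as elevation angle rises.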

I broke down horizontal pitch location by batter handedness.

This is from the batter’s perspective, so pitches 2-6 inches from the center of the plate (on the right) are outside to right-handed batters.

I’m extremely surprised to see that batters hit pitches outside farther than they hit pitches inside.

I incorporated pitcher handedness as well as home run field location to find the differences in platoon splits.

platoon.jpg

Lefties not only hit longer homers on outside pitches than righties, but they also hit longer opposite-field home runs. These two points are probably intertwined. Other than that, I don’t see anything notable in platoon splits.

Finally, I looked at the count’s effect on home run distance. I might have saved the best for last, as there is quite a clear relationship, which strongly signifies a change in hitter approach.

counts.jpg

On 3-0, hitters get better pitches to hit and might even swing harder when they choose to let it fly, and with two strikes hitters get worse pitches to hit and might shorten their swing to protect the plate. Again, this is selective sampling. Batters will only hit home runs on decent pitches. And pitchers are even more likely to throw fastballs over the heart of the plate when behind in the count than they are when ahead.

Thanks to Greg Rybarczyk and MLB for making all this wonderful data freely available.

Comments

As far as weight & power, it may be the case that heavier players must have power because their weight makes them poor fielders so they have to be more useful with their bat. Heavy guys that don't have power probably don't last long in the majors (Sean Casey aside).

I'd guess heavier players draw more walks too, though walks are not at all directly related to weight but rather are indirectly related to having power and being older.

Really cool stuff.

Some of the charts/graphs show only a difference of a foot or so between delineations. Is such a small measurement significant? You're talking about 1 foot out of almost 400. 1/4 of one percent. Seems pretty insignificant.

Mike, good point. Heavy guys aren't just more powerful, they have to be.

Saber, I thought the pitch velocity chart was insignificant. The rest of them I thought had telling results. What did you think?

Jeremy, loved the article. Some of these are questions I had been looking to answer for a while now myself. But I think you got this part backwards:

"I wanted to see if there were any balls that left the bat with a greater velocity than that which they left the pitcher’s hand. There were about a dozen, with the biggest disparity in velocity coming on a 345-foot, 96 mile per hour Carlos Pena homer off a 99 mile per hour A.J. Burnett fastball."

In that case, the pitch was faster than the SOB, which I agree is rare. Switch around the first sentence and it makes sense. So I think it should read: "I wanted to see if there were any balls that left the bat with a lesser velocity than that which they left the pitcher’s hand..."

Overall, though, extremely interesting work.

Jeremy,

Before I comment, let me say this is great stuff, and I thank you for investigating and writing it up for all of our enjoyment.

There are two obvious (to me anyway) approaches to checking significance. The first is something I'd encourage you to always do on plots like the first four, which is to add error bars. Here you'd calculate them by computing the standard deviation of the sample in each bin and dividing by the square root of the number of samples in each bin to get the error on measurement of the mean. This would probably be somewhat inaccurate, since the underlying distributions are not going to be Gaussian, but it would give a ballpark from which to judge significance.
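The error-bar recipe described here (standard deviation in each bin divided by the square root of the bin's sample count) could be sketched roughly as follows. The function name, the `(x, y)` pair layout, and the fixed bin width are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def bin_means_with_error(samples, bin_width):
    """Group (x, y) pairs into bins of width bin_width on x.

    Returns {bin_start: (mean_y, standard_error_of_mean, n)}.
    The standard error is stdev(y) / sqrt(n); it is NaN for a bin with
    a single sample, where the sample standard deviation is undefined.
    """
    bins = defaultdict(list)
    for x, y in samples:
        bin_start = math.floor(x / bin_width) * bin_width
        bins[bin_start].append(y)

    result = {}
    for bin_start, ys in sorted(bins.items()):
        n = len(ys)
        sem = stdev(ys) / math.sqrt(n) if n > 1 else float("nan")
        result[bin_start] = (mean(ys), sem, n)
    return result
```

With x as listed weight and y as standard distance, each bar on the chart would get a whisker of plus or minus one standard error, and gaps of a foot or so between bars could be judged against that whisker.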

The second is to take the individual measurements and do regression analysis to calculate the significance of the regression coefficients.
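As a rough illustration of the regression approach, here is a sketch of a one-variable least-squares fit that also reports the standard error and t-statistic of the slope. It assumes a single predictor and the usual ordinary-least-squares error model; the function name is hypothetical, and a real analysis would more likely use a statistics package with several predictors at once.

```python
import math


def slope_t_statistic(xs, ys):
    """Fit y = intercept + slope * x by least squares.

    Returns (slope, standard_error_of_slope, t_statistic).
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    if std_err == 0:
        t_stat = math.copysign(math.inf, slope)  # perfect fit
    else:
        t_stat = slope / std_err
    return slope, std_err, t_stat
```

A t-statistic far from zero (beyond roughly 2 in absolute value for large samples) suggests the slope is unlikely to be zero, which speaks to statistical significance rather than to how large the effect is.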

Larry,

Yeah, graphs with error bars are always nice. I'll look into seeing what type of graphs I can do that with. I wasn't so concerned with statistical significance as I was with the importance, as in the magnitude with which each factor can affect home runs.

Throwing this stuff into a regression might be an idea.

Is this a typo? Isn't it supposed to read "less"?


"I wanted to see if there were any balls that left the bat with a greater velocity than that which they left the pitcher’s hand. There were about a dozen, with the biggest disparity in velocity coming on a 345-foot, 96 mile per hour Carlos Pena homer off a 99 mile per hour A.J. Burnett fastball."

Thanks. Fixed.

The thing I don't understand is that we are not seeing players hitting longer home runs than the "smaller" players of the 1950s/1960s. Mantle, Killebrew and others were barely six feet and under 200 pounds yet they hit a number of home runs that carried well over 500 feet. Today's players are six inches taller and 30-50 pounds heavier but we're not seeing new distance records. Perhaps it is the lighter bats? Maybe it's physics?

Larry

Larry,

I'm sure there are a number of variables. Bats and balls change all the time. We also only have hearsay on how far Mantle and Killebrew actually did hit home runs. And Mantle was a one-of-a-kind player. Perhaps it was just those two unique players who had abnormal raw power and the trend has existed throughout time.

WRT the hearsay of distances, we do know that many parks were larger, so for a home run to exist, it had to travel a significant distance. Also, many moons ago, before data was so widespread, most people would use World Series video/data because it was very well kept. Perhaps a review of WS home runs from the 50s (ff) could shed some light on HR distances from the past.