The Baseball Analysts: The Case of Michael Young and Line Drive Rates

By Rich Lederer

Courtesy of The Hardball Times, the table below details the top 20 line-drive rates over the past five seasons. Do you notice any repeaters? There are only two players who qualified more than once: David Wright twice (2005 and 2008) and Michael Young FOUR times (2004-2007).

Does this data say more about Young's proclivity in hitting liners, his home ballpark, or the bias of scorekeepers? A combination of the three? Or perhaps something else?

This table captures a number of career years. Freddy Sanchez hit .344 with an OPS of .851 in 2006 vs. career averages of .300 and .753. Brian Roberts hit .314/.903 in 2005 vs. .284/.771. Geoff Jenkins hit .292/.888 in 2005 vs. .275/.834. Chone Figgins hit .330/.825 in 2007 vs. .290/.743. Ryan Ludwick hit .299/.966 in 2008 vs. .273/.857. Brady Clark hit .306/.798 in 2005 vs. .277/.744. Joe Mauer hit .347/.936 in 2006 vs. .317/.856.

Other than Juan Pierre, all of these players had BA/BIP over .300 with a mean of .340. Young, for what it's worth, owns three of the top four BA/RISP (within this sample), including the only one greater than .400.

Of note, Young is the only Texas player included in the above list, which suggests LD% has more to do with the hitter than the effects of the ballpark or scorekeeper. However, it should be noted that Mark Teixeira had a 28.2% LD rate in 2003. In addition, Hank Blalock (2005), Milton Bradley (2008), and Ian Kinsler (2008) had rates that fell just outside the top 20. As such, I think it is fair to say that ballparks influence LD rates.

According to Baseball Analysts contributor Jeremy Greenhouse, there have been about 50 Rangers with at least 100 plate appearances since 2005 and the average line-drive rate (sans Young) was 20.5% vs. 19.9% league-wide. Furthermore, in a study at Fangraphs, Brian Cartwright determined that "a batter is 18% more likely to have a batted ball coded as a LD" in Arlington . . . "while in Minneapolis, it's 20% less likely."

As Tangotiger wrote in response to Brian's work, "A 'line drive' is not necessarily a line drive. If hitters are showing as hitting 20% fewer line drives in the Metrodome than away from the Metrodome, we don't know if it's because the Metrodome depresses LD rates, or if it's because the scorer in Minnesota is depressing it. Since it makes a huge difference when looking at LD and FB rates, then you need some sort of park factor to normalize the data . . . Taking a guess, I have to believe this is a scorer issue. A line drive is really a batted ball that leaves the bat at a certain angle, at a certain velocity. I don't see how those things would affect whether a ball is a LD, FB, or GB, regardless of the park you are in. I can see how the scorer can be influenced by the positioning of the fielder (and worse, if the fielder caught the ball or not), and try to assign a batted ball code."

The thread attached to Tango's comments is fascinating and includes posts by Colin Wyers, Mike Fast, MGL, Greg Rybarczyk, Dave Studeman, and David Gassko. It is worth reading if you're into advanced batted ball studies. As studes points out, "From my work in the 2006 THT Annual, there was a greater standard error in line drive rates per park than in GB or Outfield Fly rates. Not outrageously higher, but definitely higher." You can also download a PDF of the 2004 THT Annual that includes Robert Dudek’s groundbreaking article on hang time, which is important because, as Tango notes, "how much time it takes for the ball and the fielder to intersect" is what is really important in differentiating between batted balls.

There are a number of questions to ask when it comes to batted balls. What percentage is attributed to the hitter or pitcher, the ballpark, or the scorekeeper? What distinguishes a line drive from a hard-hit groundball or a looping flyball? Is a one hopper that skips past the infield classified as a grounder or a liner? Does the ball have to hit the outfield grass first in order to be coded as a line drive? How high can a ball be hit and still be considered a line drive? Should the outcome have an effect on how a batted ball is coded? Does the outcome have an effect?

Play by play, batted ball, pitch f/x. We know a lot more today than we did just five years ago and we will know a lot more in five years than we know today. Hit f/x is next. Stats are not ridiculous. Only those who ignore (the right) stats are ridiculous.

Comments

Obvious question: Are there big differences in Young's (and other TX players') LD rates in home and away games? I looked but couldn't find home-away splits for LD rates anywhere.

Posted by: t ball at March 9, 2009 10:31 AM

I don't have access to Young's batted ball splits. However, as noted in the article with respect to Texas players in general, "a batter is 18% more likely to have a batted ball coded as a LD in Arlington" than on the road.

Batted ball splits are not broken out by Fangraphs, THT, Bill James Online, or any other publicly available source that I'm aware of, although this information could be queried from a database like Retrosheet or GameDay.

Perhaps Brian Cartwright, should he stop by here, could provide the info on Young.

Posted by: Rich Lederer at March 9, 2009 1:41 PM

I agree, Harold Reynolds IS ridiculous.

Posted by: Alex at March 9, 2009 3:13 PM

They will never listen, Rich. It seems to be a no-win trying to get non "stat-heads" to understand the numbers. Some come around, it may take years, and some never come around (or at least never admit to coming around).

Posted by: Joe at March 9, 2009 3:18 PM

HR is just a bitter man these days.

Posted by: Rafa at March 9, 2009 5:24 PM

My ears were burning.

I was updating my Park Factors thru 2008, and added extra categories such as LD% and Foul Fly%. I regret to say I haven't finished it yet; I was recoding from scratch, as I know a lot more SQL now than when I did it the first time. I got it through the first iteration, but never did the 2nd & 3rd iterations to adjust the factors of the road parks in the calculations.

RetroSheet LD data is complete and presumed accurate from 2003-2008, as well as some earlier years, but NOT for 1999-2002, so I restricted the LD studies to the past six seasons. My full results are still unpublished.

Top "Career" rates for those six seasons, 500 or more BIP, are
.279 Cory Sullivan
.245 Todd Helton
.243 Mark Loretta
.242 Michael Young
.238 Garrett Atkins
.238 Bobby Abreu
.235 Freddy Sanchez
.235 Josh Hamilton
.234 Lyle Overbay
.231 Ian Kinsler

There are several Rockies and Rangers. Torii Hunter, Jacque Jones, Luis Rivas and Lew Ford are in the bottom 10.

I just ran an ad-hoc query to find the six-year totals at Arlington and everywhere else (sorted by BIP at Arlington):

ARL   Oth   PF    Batter
.264  .220  1.20  Michael Young
.259  .190  1.36  Hank Blalock
.241  .186  1.29  Mark Teixeira
.207  .170  1.22  Kevin Mench
.265  .197  1.35  Ian Kinsler
.187  .163  1.15  Gary Matthews
.219  .191  1.15  Alfonso Soriano
.183  .169  1.08  Rod Barajas
.216  .163  1.32  Gerald Laird
.237  .202  1.17  Marlon Byrd
.222  .165  1.35  David Dellucci
.222  .201  1.10  Laynce Nix
.195  .177  1.10  Alex Rodriguez
.242  .194  1.25  Frank Catalanotto
.295  .200  1.48  Josh Hamilton
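
For readers who want to check the PF column, it appears to be the Arlington rate divided by the rate everywhere else. The following is a minimal Python sketch under that assumption, using four rows copied from the table above. The `park_factor` function and the `rates` and `factors` names are illustrative only and are not taken from the original query. Because the rates shown are rounded to three decimals, recomputing this way can differ from the listed PF in the last digit for some rows.

```python
# (ARL rate, Oth rate) pairs copied from the table above.
rates = {
    "Michael Young": (0.264, 0.220),
    "Hank Blalock": (0.259, 0.190),
    "Kevin Mench": (0.207, 0.170),
    "Ian Kinsler": (0.265, 0.197),
}


def park_factor(home_rate, other_rate):
    """Assumed definition: LD rate at the park divided by LD rate elsewhere."""
    return home_rate / other_rate


# Recompute the factor for each batter, rounded to two decimals like the table.
factors = {name: round(park_factor(arl, oth), 2) for name, (arl, oth) in rates.items()}

for name, pf in factors.items():
    print(f"{pf:.2f}  {name}")
```

A factor above 1.00 means a larger share of the batter's balls in play were coded as line drives in Arlington than elsewhere.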

After making this list, I thought of a couple of things to test for: are home-team batters scored differently than visitors, and is there any difference by the "reputation" of the batter?

Posted by: Brian Cartwright at March 10, 2009 6:14 PM

Thanks, Brian. Great stuff. Your tables are powerful.

The questions you posed at the end are worthwhile to explore although it's unclear to me how you would determine "reputation" of the batter.

Do you know what the typical home/road adjustment is for line drives? Just as players, all things equal, tend to perform better at home than on the road with respect to AVG/OBP/SLG, I would have to believe that the same holds true for LD. However, as Dave Cameron pointed out, BABIP, which is loosely correlated with LD rates, has consistently been higher on the road than at home.

Like umpires, there may be more bias from certain scorekeepers than others, both in terms of coding batted balls one way or the other and in treating home and visiting players.

Posted by: Rich Lederer at March 10, 2009 7:35 PM

Excuse my ignorance, but what's the importance of a high line drive rate? It's not like that list is full of stars. Loretta, Kennedy, Clark? What am I missing? Thanks in advance.

Posted by: Nails at March 11, 2009 8:56 AM

As partly a stats guy, this stuff fascinates me. I will admit, however, that I'm not much of a stats guy when it comes to fielding metrics because I don't think the variables (at least at this point) are capable of being isolated. When hit f/x comes out, I'll then be more inclined to trust some fielding metrics than I am now. Stats are great, but in some specific areas I think they have a ways to go. Fortunately with the increasing ability to both collect and mine data, it's going to be impossible to ignore.

Your guys' work is pioneering and a joy to peruse. Keep up the good work.

Posted by: Russ at March 11, 2009 12:31 PM

Meant to include the segue that brought me to defense in the first place (otherwise I'm sure my post reads like it's out of place, which it might be :-))

Some of the factors that influence the coding of the LD%, such as park scorer and others, bother me when it comes to trying to get reliable metrics along with predictability. I think the latest craze regarding defensive metrics has this same problem, only worse. Like I mentioned in the previous post, I am a stats guy (sort of - my father was a former beat writer for an MLB team so I still rely on my eyes for the 'sniff test'), but I can't get myself to trust in defensive metrics, because they seem so context driven with what appear to be an awful lot of random variables. The same problem you guys allude to in the LD% would seem to me to be a bigger one when it comes to defense.

Posted by: Russ at March 11, 2009 12:39 PM

My general model is to calculate the expected value and compare it to the observed value. The science is in building an expected value.
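
As a rough illustration of the expected-versus-observed idea, the Python sketch below applies a baseline line-drive rate to a count of balls in play and compares the result with the number actually coded as line drives. The function name and the sample counts (500 BIP, 132 coded LD) are hypothetical and chosen only for the arithmetic. The .220 baseline is borrowed from Michael Young's rate away from Arlington in the table earlier in this thread.

```python
def expected_vs_observed(bip, observed_ld, baseline_rate):
    """Return (expected line drives, observed minus expected).

    bip           -- balls in play in the sample
    observed_ld   -- balls in play actually coded as line drives
    baseline_rate -- line-drive rate assumed to be the "expected" rate
    """
    expected_ld = bip * baseline_rate
    return expected_ld, observed_ld - expected_ld


# Hypothetical sample: 500 BIP, 132 coded as LD, baseline rate of .220.
expected, diff = expected_vs_observed(bip=500, observed_ld=132, baseline_rate=0.220)
print(f"expected {expected:.1f}, observed 132, difference {diff:+.1f}")
```

The hard part, as the comment says, is choosing the baseline; this sketch simply takes it as given.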

For fielding, I was already an adult when all we still had was games, putouts, assists and errors. It didn't give us a very clear picture of how many actual opportunities a fielder had. Today, with the publicly available play by play, we still don't have the answer to "whose ball was it?" such as on a ground single through the hole.

The title of my article that Rich linked to was "What I Hate About Line Drives" - no one knew how much LD% varied between ballparks. The problem is that the expected value of a line drive is so very much different than the expected value of a fly ball, but at the same time the data recorders have a problem labeling them in an objective manner.

Hit f/x will hopefully be able to tell us the speed off the bat, elevation angle, horizontal angle, and distance travelled for each batted ball. Then, when calculating how many outs we expected an outfielder to make, given the opportunities presented, we can come up with a less subjective model.

Posted by: Brian Cartwright at March 11, 2009 3:49 PM