Designated Hitter
October 06, 2005
Extra Base Hits
By James Click

Baserunning - actually running the bases as opposed to stolen bases - has long been one of the more ignored aspects of baseball performance analysis. There have been brief discussions of it here and there (Dan Fox's work comes to mind), but a comprehensive study of it was lacking. To that end, I made a stab at valuing baserunning in Baseball Prospectus 2005 in an article entitled "Station to Station: The Expensive Art of Baserunning." While any data on baserunning numbers is a welcome relief from the void that currently exists, when evaluating baserunning skill and decisions it's important to remember that there are many factors at play when a baserunner decides to attempt the extra base.

In Baseball Prospectus 2005, I considered several factors and their impact on baserunning decisions: the ballpark, the number of outs, the fielders, and the batter at the plate. Before applying each of these factors to the baserunning numbers, it's important to confirm that each one has a consistent effect on baserunner performance. For example, baserunning park factors - for both attempt rate and success rate - are very consistent from year to year. Much like other aspects of offense, the ballpark affects baserunning performance, though in the case of baserunning, I assume the size of the outfield and irregularity of the dimensions have more to do with it than things like the hitter's background, size of foul territory, altitude, and other more general differences. Regardless, because baserunning park factors are so consistent from year to year, we can say with confidence that the park has an effect on baserunning numbers.

An even stronger correlation was present with regards to the number of outs, but the other two aspects - the fielder and the batter - were found to be essentially random and thus were not considered. It's the latter of these two factors to which I want to return today.

It's difficult to imagine that the batter at the plate has no discernable effect on the ability of baserunners to advance extra bases. Slap hitters like Ichiro Suzuki and Juan Pierre would seem unlikely to advance baserunners beyond the next bag on their high numbers of infield singles while power hitters like David Ortiz would seem likely to advance those runners on booming singles off the wall or cut off in the gap. There are some possible reasons for this - the same ball hit to the same place may be a single for some batters and a double for others or runners may be able to more quickly determine that a slap single is a single as opposed to a drive to the gap or wall - but on the whole, the absence of any ability to advance baserunners by batters was surprising.

With another year's perspective, let's dive back in and check this out again. To do so, three common baserunning situations will be considered: a single with a runner on first or second and a double with a runner on first. (If there are runners on first and second when a single is hit, only the lead runner is considered.) In each of these situations, any runner advancing more than the number of total bases of the hit will be considered to have taken the extra base. Additionally, runners thrown out at the extra base will also be considered to have attempted the extra base.
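As a rough illustration of that definition, here is a minimal Python sketch. The `Play` record and its field names are hypothetical, invented for this example rather than taken from the play-by-play data behind the study.

```python
from dataclasses import dataclass


@dataclass
class Play:
    hit_bases: int     # 1 = single, 2 = double
    start_base: int    # base the runner held before the hit (1 or 2)
    target_base: int   # base the runner ended up going for (4 = home)
    safe: bool = True  # False if the runner was thrown out at target_base


def in_study(play: Play) -> bool:
    # The three situations considered: a single with a runner on first or
    # second, and a double with a runner on first.
    return (play.hit_bases, play.start_base) in {(1, 1), (1, 2), (2, 1)}


def attempted_extra_base(play: Play) -> bool:
    # An attempt is any try for a base beyond the total bases of the hit,
    # whether the runner was safe or thrown out.
    return play.target_base > play.start_base + play.hit_bases


def took_extra_base(play: Play) -> bool:
    # Taking the extra base requires the attempt to succeed.
    return attempted_extra_base(play) and play.safe
```

Under this sketch, a runner going first to third on a single counts as both an attempt and a success, while a runner thrown out at home on a double from first counts as an attempt only.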

Next, each batter's and runner's totals will be adjusted for three factors: the park, the number of outs, and whether there is a full count. The first two of these were used in the original analysis in Baseball Prospectus 2005, but the final one is a new twist. It's a well-known fact that runners get a head start when there is a full count on the batter, and the difference between runners attempting the extra base with a full count and without is remarkably consistent from year to year. From 1990-2005, the attempt rate was between 43% and 47% without a full count and 58% and 68% with three balls and two strikes. (Interestingly, the lowest attempt rate with a full count prior to 2004 was 62%, but the numbers the last two years have been unusually low. Insert your chosen rant about "modern players not doing the little things to win" and "playing the game the right way" here.)

To determine if a batter truly has an effect on the runners on base, it's important to remember that specific batters and baserunners are often paired because lineup orders are often repeated. Sluggers in the middle of the lineup usually come to the plate with leadoff or #2 hitters on base, runners who are often the team's best baserunners. Thus, it may initially appear that middle-of-the-lineup hitters advance baserunners more than other hitters, but that conclusion would be based more on the quality of the baserunners than that of the hitter. Instead, for each batter and runner, an expected attempt rate (ATTr) is calculated by looking at the baserunning numbers excluding a particular runner.

Working with an example should make things clearer. Assume that Ortiz hit a single or a double with Johnny Damon on first ten times and Damon took the extra base seven of those times. To determine who's more responsible for that advancement, we'll instead calculate Damon's baserunning in all situations except when Ortiz is batting. Assume then that Damon takes the extra base 60% of the time when other batters are hitting. In this case, then, Ortiz is credited with one of Damon's seven advances because Damon normally only takes six of ten extra bases.
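The arithmetic in that example can be written out in a few lines of Python. The function name `batter_credit` is just a label chosen for this sketch, and the numbers are the hypothetical ones from the paragraph above, not real data.

```python
def batter_credit(opportunities, advances, runner_rate_without_batter):
    # Advances the runner would be expected to make on his own, based on
    # his rate when any other batter is at the plate.
    expected = opportunities * runner_rate_without_batter
    # Whatever is left over is credited to the batter.
    return advances - expected


# Ten chances, seven advances, and a 60% rate for the runner with other
# batters up: six expected advances, so about one credited to the batter.
credit = batter_credit(opportunities=10, advances=7,
                       runner_rate_without_batter=0.60)
print(round(credit, 2))
```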

Once those numbers are adjusted for the park, the count, and the number of outs, we can total the number of extra bases we would have expected each batter to advance the baserunners based on the ATTr of the runners when that batter was not up. We'll dub this rate - the ATTr above expected - netATTr. If batters show consistent netATTr from season to season, we can say with confidence that batters do have an influence on baserunning numbers.
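To make the bookkeeping concrete, here is one way the calculation might be sketched in Python. Two assumptions are worth flagging: the sketch skips the park, count, and outs adjustments entirely, and it expresses netATTr as extra attempts above expected per opportunity, which is one reading of the definition above rather than a confirmed formula. The event format is also made up for the example.

```python
from collections import defaultdict


def net_attr(events):
    """events: iterable of (batter, runner, attempted) tuples, one per play."""
    runner_tot = defaultdict(lambda: [0, 0])  # runner -> [attempts, chances]
    pair_tot = defaultdict(lambda: [0, 0])    # (batter, runner) -> same
    for batter, runner, attempted in events:
        for tot in (runner_tot[runner], pair_tot[(batter, runner)]):
            tot[0] += int(attempted)
            tot[1] += 1

    actual = defaultdict(float)
    expected = defaultdict(float)
    chances = defaultdict(int)
    for (batter, runner), (att, n) in pair_tot.items():
        # The runner's record with this batter's plate appearances removed.
        other_att = runner_tot[runner][0] - att
        other_n = runner_tot[runner][1] - n
        if other_n == 0:
            continue  # no baseline for this runner without this batter
        actual[batter] += att
        expected[batter] += n * other_att / other_n
        chances[batter] += n

    # Attempts above expected, per chance, for each batter.
    return {b: (actual[b] - expected[b]) / chances[b] for b in chances}
```

Feeding it the Ortiz and Damon example (seven attempts in ten chances with Ortiz up, six in ten with everyone else) gives Ortiz a netATTr of roughly +0.10 under these assumptions.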

Unfortunately, the correlation looks something like this:


Ugh. That may look more like a Rorschach test than a correlation, but it's simply an extremely definitive picture of complete randomness. If batters showed a strong tendency to advance baserunners more than expected from year to year, those dots would form more of a line from the lower left quadrant to the upper right. Instead, a great, lifeless blob stares back at us from the center of the plot with no discernable trend.

Of course, the problem could be that the sample sizes are too small. The set of data in use - singles or doubles hit with men on first or second - doesn't necessarily occur very often for most batters every season. Given that, we can employ a technique based on the one used by Keith Woolner in his rebuttal to Voros McCracken's research on Pitcher Control on Balls in Play. In that article, Woolner broke pitcher careers into two halves and compared them, but rather than breaking them up chronologically, he put all even-numbered seasons in one half and all odd-numbered seasons in another. This technique drastically increases sample size while still effectively choosing random data to avoid picking up other trends. (In the current case, we want to avoid batters who changed their hitting approach later in their careers, perhaps advancing more or fewer baserunners as a result.)
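A small Python sketch of the even/odd split might look like the following. The row layout (batter, year, extra bases above expected, baserunning instances) is an assumption made for this example, and the default minimum of 100 instances per half mirrors the cutoff used in this analysis.

```python
from collections import defaultdict


def split_half_rates(season_rows, min_instances=100):
    """season_rows: iterable of (batter, year, above_expected, instances)."""
    # totals[batter][0] is the even-year half, [1] the odd-year half;
    # each half holds [extra bases above expected, instances].
    totals = defaultdict(lambda: [[0.0, 0], [0.0, 0]])
    for batter, year, above_expected, instances in season_rows:
        half = totals[batter][year % 2]
        half[0] += above_expected
        half[1] += instances

    rates = {}
    for batter, (even, odd) in totals.items():
        # Keep only batters with enough instances in both halves.
        if even[1] >= min_instances and odd[1] >= min_instances:
            rates[batter] = (even[0] / even[1], odd[0] / odd[1])
    return rates
```

Each surviving batter ends up with a pair of rates, one per career half, which are the two coordinates of a single dot on a scatter plot.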

Comparing those two career halves and restricting it to batters who were involved in at least 100 baserunning instances in each half of their careers, the scatter plot now looks like this:


That may not look like much, but our correlation has jumped dramatically. In statistics, a tool called the coefficient of determination - commonly referred to as r-squared - reveals how much of the variance in one set of data can be explained by the other. R-squared is presented on a scale of 0 to 1 with 1 being a perfect correlation and 0 being complete randomness. In the first scatter plot, r-squared was 0.001, indicating nearly complete randomness. In the second plot, r-squared has jumped to .249, meaning 24.9% of the change in netATTr in one career half can be explained by the other half. Generally, r-squared needs to be a little higher before employing one variable to project the other. But for the purposes of establishing whether or not batters have some control over the runners on base in front of them, we can say that - contrary to the complete randomness seen in the first plot - there does appear to be some ability, albeit inconsistent, for hitters to advance baserunners more or less than league average.
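For readers who want to run the same check on their own numbers, here is a minimal sketch of r-squared for two paired lists, computed as the square of the ordinary Pearson correlation. It uses only plain Python; the function name and inputs are illustrative and are not the tooling used for the plots above.

```python
def r_squared(xs, ys):
    # Square of the Pearson correlation between two equal-length lists.
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when one list is constant")
    return (sxy * sxy) / (sxx * syy)
```

Here `xs` would hold each batter's netATTr in one career half and `ys` the same batters' netATTr in the other half, in the same order.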

This conclusion fits with what most of us have seen on the field. Batters show a huge degree of variance from season to season when it comes to advancing baserunners more than the runners would advance themselves, but over a career, there are some batters who will move runners around the bases a little more often than their lighter-hitting counterparts.

James Click is an author for Baseball Prospectus where he writes a weekly column, Crooked Numbers, and spends too much time looking up random baseball stats that he forgets as soon as the query is done running, a condition that has cost him more than a few bar bets. He lives in San Francisco, CA.


Mr. Click,
I just read your section, "What does Mike Redmond know about Tom Glavine?" in Baseball Between the Numbers, and I was interested in the sabermetric approach to studying platoons and matchups. Up until reading your book I had normally thought of sabermetrics as a tool used most commonly and effectively by GMs in putting together the best team, but now I see how these tools can be used by managers. Has anyone ever made a list of the most important factors in deciding which of two players to play? A manager must ask a number of questions, and I'm wondering which of these answers apply most to how a player will perform on that day. For example:
Who is the better player?
Who is playing better right now?
Who plays better against pitchers of this handedness?
Who plays better against this pitcher?
Is fatigue inhibiting either player's performance?
Judging by the tools (not the stats) of the batters and the pitcher, who would seem to be a better fit?
Which hitter fares better in that ball park?

-I know there is a lot in here, but I was wondering if you could rank those questions in order of importance and add some of your own.