Behind the Scoreboard
April 11, 2009
A Study in Home Field Advantage - Will the New Stadiums Be Friendly to NY Teams?
By Sky Andrecheck

On Monday, Major League Baseball will christen two new stadiums, New Yankee Stadium and Citi Field. The previous New York stadiums were known as intimidating places to play, and fans are probably wondering whether the new ballparks will confer as great a home field advantage as the old buildings. This is particularly true of Yankee Stadium, where the ghosts of Babe Ruth, Lou Gehrig, and others were said to give the park a certain aura of invincibility. This article explores the relationship between a stadium and home field advantage, and how it might affect the two New York ballparks.

Of course, everybody knows that playing at home is an advantage. Over the course of modern baseball history, the difference between playing at home and on the road has been about 80 points of team WPCT (.080): a road team will win about 46% of its games, and a home team about 54% of its games, all else being equal. But do some parks confer more of an advantage than others? Or is the "Yankee mystique" no more potent than the Padres mystique?
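To make the metric concrete, here is a minimal Python sketch of home field advantage as this article defines it (home WPCT minus road WPCT). The function name and the 54-46 and 46-54 records are illustrative, not taken from the data.

```python
def home_field_advantage(home_w, home_l, road_w, road_l):
    """Home WPCT minus road WPCT, the definition used throughout this article."""
    home_wpct = home_w / (home_w + home_l)
    road_wpct = road_w / (road_w + road_l)
    return home_wpct - road_wpct

# Hypothetical team: 54-46 at home and 46-54 on the road (per 100 games)
print(round(home_field_advantage(54, 46, 46, 54), 3))  # 0.08
```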

Gathering data from all major league home parks during the modern era (thanks to Retrosheet), I found the average home field advantage (defined as home WPCT minus road WPCT) of each park in each year. A quick chi-squared test shows that the home park is indeed highly significant: not all home parks are identical. To the average baseball fan this comes as no surprise. We expect some parks to be more advantageous than others, and the real data bear this out: Fenway Park has a lifetime home field advantage of .109, while Seattle's Kingdome had a lifetime home field advantage of .070. So what is it about a park that gives one place a bigger advantage than another?
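The article doesn't say exactly how the chi-squared test was set up. One plausible version, and only an assumption here, is a Pearson test of independence on a table of home wins and losses by park: a large statistic relative to its degrees of freedom suggests home results depend on the park. The counts below are hypothetical, and this simple version does not adjust for team quality.

```python
def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count if park didn't matter
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical home [wins, losses] for three parks
table = [[550, 450], [520, 480], [535, 465]]
stat, dof = chi_squared_independence(table)
print(f"chi-squared = {stat:.2f}, dof = {dof}")  # chi-squared = 1.81, dof = 2
```

In practice, `scipy.stats.chi2_contingency` returns the same statistic along with a p-value (on 2x2 tables it applies Yates' continuity correction by default).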

Home Field Advantage Over Time

One factor to consider is the year. Throughout history, home field advantage has fluctuated, and it has been suggested that it has decreased strongly over time. A cursory look at the data seems to confirm this: from 1901-1910 teams had a home field advantage of .104, but by the 1980s the advantage was down to .080. Cyril Morong finds a statistically significant decrease in home field advantage over time, and people have attributed it to shorter travel times, more luxuries and amenities for the players, more comfortable hotels, etc.

It's a nice theory, but I find it to be untrue. Modeling home field advantage using year alone does indeed suggest a powerful effect. However, that model leaves out an important confounding variable: ballparks have also changed over time. If the decrease in home field advantage were due to things like air travel and player amenities, we would see home field advantage decreasing even within the same ballpark over time. Yet when we run a model with both year and ballpark included, the effect of year on home field advantage is no longer significant, with a p-value of .50 (in fact, the direction of the year effect actually switches to positive!). Nearly all of the variability over time is due to the parks themselves, not the year. I also ran the model with a pre/post-1960 variable (around the time that travel became easier and cushier for visiting players) instead of the continuous year variable, and again there was no effect. From this, we see that the decrease in home field advantage over time is due to different ballparks being built, not to things like air travel and amenities as commonly believed.
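The confounding argument can be illustrated with hypothetical toy data in which home field advantage depends only on the park: an older park used 1901-1930 at .104 and a newer one used 1971-2000 at .080. A year-only regression finds a decline, but once year and HFA are demeaned within each park (which, by the Frisch-Waugh-Lovell theorem, gives the same year coefficient as an ordinary least-squares model with a dummy variable for each park), the year effect disappears. This is a sketch of the idea, not the author's actual model.

```python
from collections import defaultdict

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (the year-only model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def within_park_slope(parks, xs, ys):
    """Year slope after demeaning year and HFA within each park (park fixed effects)."""
    groups = defaultdict(list)
    for park, x, y in zip(parks, xs, ys):
        groups[park].append((x, y))
    sxy = sxx = 0.0
    for rows in groups.values():
        mx = sum(x for x, _ in rows) / len(rows)
        my = sum(y for _, y in rows) / len(rows)
        sxy += sum((x - mx) * (y - my) for x, y in rows)
        sxx += sum((x - mx) ** 2 for x, _ in rows)
    return sxy / sxx

# Hypothetical toy data: HFA depends only on which park is in use
years = list(range(1901, 1931)) + list(range(1971, 2001))
parks = ["Old Park"] * 30 + ["New Park"] * 30
hfa = [0.104] * 30 + [0.080] * 30

print(f"year only:    {ols_slope(years, hfa):+.5f} per season")
print(f"within parks: {within_park_slope(parks, years, hfa):+.5f} per season")
```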

With this knowledge, we can move forward more confidently. If individual ballparks do have a major effect on home field advantage, then what are the features that make up this advantage?

What Features Are Advantageous?

For this I looked at several statistical variables and several qualitative variables for each park. From the Retrosheet data, I was able to calculate park indices for several statistics: runs, hits, doubles, triples, home runs, walks, and strikeouts. A number greater than 1 indicates that those events were more likely to occur in the park, while a number less than 1 indicates they were less likely. I also created an additional variable for each of these statistics by taking the absolute value of its difference from 1 (so the value for homers would be high if the park either allowed a lot of homers or allowed very few). Finally, I created several subjective variables: whether the park's features were "quirky" (odd dimensions, wild wind, strange ground rules, etc.), whether the park strongly favored left- or right-handed batters, whether the park was in a hot outdoor climate, whether the park had grass or turf, whether the park was a dome, whether the park had "rabid" fans (confined mostly to old baseball cities on the east coast: NY, BOS, PHI, etc.), and what era the park was built in (wooden era, classic era, modern era, nostalgic era).
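The article doesn't give its park index formula. A common simple version, assumed here, divides the per-game rate of an event in a team's home games (both teams combined) by the per-game rate in its road games; the second function is the absolute-difference-from-1 variable described above. The function names and the 810/729 run totals are hypothetical.

```python
def park_index(home_events, home_games, road_events, road_games):
    """Per-game rate of an event in home games divided by the rate in road games.

    Counts should include both teams (e.g. doubles by the home team and its opponents).
    """
    return (home_events / home_games) / (road_events / road_games)

def deviation_from_neutral(index):
    """Absolute distance from a neutral park (1.0): high for extreme parks either way."""
    return abs(index - 1.0)

# Hypothetical season: 10.0 runs per game at home vs 9.0 on the road
runs_idx = park_index(810, 81, 729, 81)
print(round(runs_idx, 3), round(deviation_from_neutral(runs_idx), 3))  # 1.111 0.111
```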

Running all of this in a model (weighted by the number of seasons each park was used), we find that most of the variables are not significant at all. Among the qualitative variables, there was no advantage to being in a city with rabid fans, no advantage to a dome, no advantage if the park strongly favored one hand over the other, no advantage to being in a hot climate, and no advantage based on the era in which the park was built. Most of the statistical variables had no effect either. The most interesting result was that the number of homers allowed by the park had no effect on home field advantage (all the time spent agonizing over whether to build a homer-happy park, a homer-deprived park, or simply a homer-neutral park was time wasted).
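Because the model was weighted by seasons per park, here is a minimal weighted least-squares sketch in NumPy. It assumes one row per park with its predictors, its lifetime HFA, and its number of seasons as the weight; scaling each row by the square root of its weight turns the problem into ordinary least squares. The predictor values in the example are made up.

```python
import numpy as np

def weighted_least_squares(X, y, weights):
    """Fit y = b0 + X @ b by weighted least squares; returns [b0, b1, ...]."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # add intercept
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling each row by sqrt(weight) turns WLS into ordinary least squares
    beta, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(y, dtype=float) * sw, rcond=None)
    return beta

# Hypothetical parks: one predictor (e.g. a doubles index), lifetime HFA, seasons used
doubles_idx = [[1.10], [0.95], [1.00], [1.25]]
lifetime_hfa = [0.090, 0.075, 0.080, 0.105]
seasons = [40, 25, 30, 12]
print(weighted_least_squares(doubles_idx, lifetime_hfa, seasons))
```

Weighting this way lets a park used for 40 seasons count more than one used for 12; significance tests would need the usual standard-error calculations on top of this.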

After taking out the insignificant terms, we are left with this final model:

[Figure: final model]

As you can see, the 5 factors of home field advantage are, in order of significance:
1) Having either a good hitter's park or a good pitcher's park, not a neutral park.
2) Having a "quirky" park (weird field dimensions, difficult fences, weird wind patterns, etc.)
3) Having a park conducive to doubles
4) Having a park conducive to triples
5) Having a park conducive to strikeouts
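Written out, the final model has roughly the following form. This is a reconstruction from the list above, not the fitted equation: the coefficient values are in the figure and are not reproduced here, and entering the first factor as the absolute deviation of the runs index from 1 is an assumption based on the variables described earlier. Here $R_p$, $D_p$, $T_p$, and $K_p$ are park $p$'s runs, doubles, triples, and strikeout indices, $Q_p$ is 1 if the park is quirky and 0 otherwise, and $w_p$ is the park's number of seasons.

```latex
\mathrm{HFA}_p = \beta_0 + \beta_1 \lvert R_p - 1 \rvert + \beta_2 Q_p + \beta_3 D_p + \beta_4 T_p + \beta_5 K_p + \varepsilon_p,
\qquad \hat{\beta} = \arg\min_{\beta} \sum_p w_p \varepsilon_p^2
```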

From the above list, all of the variables seem to favor more unusual parks. Parks that deviate from the normal number of runs scored seem to be advantageous. Parks that allow a high proportion of doubles and triples also tend to be more unusual, with odd angles and odd dimensions. This makes intuitive sense as well. The more unusual a park is, the more difficult it is to play in for the first time, giving the home team, which is already familiar with the park, a significant advantage. Likewise, cookie-cutter parks, which require little adjustment on the part of visiting players, have the lowest advantage. It also could be that teams bring in specific players who are particularly suited to an unusual home park, further increasing home field advantage. However, if this were the main reason, I would expect to see a spike in home field advantage after the start of the free agent era, when it became much easier to bring in specific players. Since we don't see this, it's likely not the driving force.

The quirky variable is an attempt at a subjective definition of "unusual," and the model sees it as highly significant, even after accounting for the statistical variables above. Admittedly, the "quirky" variable is highly subjective and is simply binary, so it doesn't take into account just how quirky a park is, but its inclusion gives some information that statistics alone cannot. It also varies a lot with the era of the park: in the wooden/classic era 18 of 23 parks were considered quirky, in the modern era just 3 of 27, and in the nostalgic era 5 of 17.
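The era counts are easier to compare as shares. A short Python calculation using only the numbers quoted above (the `share` helper is just for illustration):

```python
# Quirky parks by era, as counted in the article: (quirky, total)
quirky_by_era = {
    "wooden/classic": (18, 23),
    "modern": (3, 27),
    "nostalgic": (5, 17),
}

def share(quirky, total):
    """Fraction of parks in a group judged quirky."""
    return quirky / total

for era, counts in quirky_by_era.items():
    print(f"{era}: {share(*counts):.0%}")

all_quirky = sum(q for q, _ in quirky_by_era.values())
all_parks = sum(t for _, t in quirky_by_era.values())
print(f"overall: {all_quirky} of {all_parks} ({share(all_quirky, all_parks):.0%})")
```

That works out to roughly 78% of wooden/classic parks, 11% of modern parks, and 29% of nostalgic parks.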

The strikeout variable was perhaps the most interesting of the bunch (though it's only marginally significant): parks that increase strikeouts tend to increase home field advantage. My guess is that this is related to the hitting background, with more difficult or unusual backgrounds being advantageous to the home team, since its hitters get more practice under those tough conditions.

Below is a table of the top and bottom 5 parks in each of the statistical categories (among parks with 10 or more seasons in MLB).

[Table: top and bottom 5 parks in each statistical category]
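The table's selection rule (at least 10 seasons, then the top and bottom 5) can be sketched as follows. The record layout and the park names and values in the usage example are hypothetical.

```python
def top_and_bottom(parks, stat, n=5, min_seasons=10):
    """Highest and lowest n parks on `stat`, among parks with at least `min_seasons` seasons.

    Each park is a dict such as {"name": ..., "seasons": ..., "triples": ...}.
    """
    eligible = [p for p in parks if p["seasons"] >= min_seasons]
    ranked = sorted(eligible, key=lambda p: p[stat], reverse=True)
    # Bottom list is returned lowest-first
    return ranked[:n], ranked[::-1][:n]

# Hypothetical parks and triples indices
parks = [
    {"name": "Park A", "seasons": 12, "triples": 1.35},
    {"name": "Park B", "seasons": 5, "triples": 2.00},   # too few seasons, excluded
    {"name": "Park C", "seasons": 30, "triples": 0.80},
    {"name": "Park D", "seasons": 10, "triples": 1.00},
]
top, bottom = top_and_bottom(parks, "triples", n=2)
print([p["name"] for p in top], [p["name"] for p in bottom])
```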

Additionally, below is a chart of the top and bottom 5 parks by predicted home field advantage according to the model. As you can see, the model predicts Coors Field to have by far the biggest home field advantage in the history of baseball. It's followed by the Baker Bowl, that wacky Philadelphia ballpark, and classic Fenway Park. The other parks rounding out the current top 5 most advantageous parks according to the model are Minnesota's Metrodome, Minute Maid Park in Houston, and AT&T Park in San Francisco. The bottom 5 parks are dominated by more modern ballparks, with New Comiskey Park the lowest and the other current low-advantage parks being Angel Stadium, Jacobs Field, Turner Field, and Camden Yards.

[Chart: top and bottom 5 parks by predicted home field advantage]

With an R-squared of .38, the model is far from a perfect fit, explaining only 38% of the variability in home field advantage between parks. The model misses considerably on several parks (in fact, New Comiskey has enjoyed a decent home field advantage over its 18 years). Of its misses, the most egregious (accounting for the number of seasons) are overestimates of Camden Yards and Riverfront Stadium and underestimates of Crosley Field and the Astrodome. A graph of the predicted and actual home field advantage of each park appears below. As you can see, there are still other unknown factors at play, but the model does a fair job of predicting how much home field advantage a park will bring.

[Graph: predicted vs. actual home field advantage by park]
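A sketch of the fit statistics discussed above, assuming per-park actual and predicted HFA with seasons as weights. Weighted R-squared compares residual variation to variation around the weighted mean. For ranking misses, scaling each residual by the square root of the park's seasons is one assumed way of "accounting for the number of seasons," not necessarily the one used here. The park names and values are hypothetical.

```python
import numpy as np

def weighted_r_squared(actual, predicted, weights):
    """Share of the weighted variation in actual HFA explained by the predictions."""
    a, p, w = (np.asarray(v, dtype=float) for v in (actual, predicted, weights))
    mean = np.average(a, weights=w)
    ss_res = np.sum(w * (a - p) ** 2)
    ss_tot = np.sum(w * (a - mean) ** 2)
    return 1.0 - ss_res / ss_tot

def biggest_misses(names, actual, predicted, weights, k=2):
    """Parks with the largest |actual - predicted|, scaled by sqrt(seasons)."""
    a, p, w = (np.asarray(v, dtype=float) for v in (actual, predicted, weights))
    scaled = np.sqrt(w) * np.abs(a - p)
    order = np.argsort(scaled)[::-1]  # largest scaled miss first
    return [names[i] for i in order[:k]]

# Hypothetical parks
names = ["Park A", "Park B", "Park C", "Park D"]
actual = [0.095, 0.060, 0.085, 0.110]
predicted = [0.090, 0.080, 0.084, 0.095]
seasons = [30, 20, 15, 40]
print(round(weighted_r_squared(actual, predicted, seasons), 2))
print(biggest_misses(names, actual, predicted, seasons))
```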

What Does This Mean?

So some parks have a greater home field advantage than others, and we now have some idea of why, but is the difference significant in a baseball sense? If we assume a team plays .460 ball on the road, a team with a healthy home field advantage may play .560 ball at home, while a team with a small home field advantage may play only .520 ball at home. Over a season, the team with the big advantage will win more than 3 games more than the team with the small advantage. That is far from insignificant and could easily be the difference between winning and losing the pennant. Additionally, a good home field advantage is the gift that keeps on giving, with the team reaping the benefit year after year. In the extreme case of Coors Field, the ballpark makes a .500 team into an 86-win team, a lifetime gain of 70 wins so far, making it likely the most valuable member of the Rockies franchise.
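The arithmetic behind the 3-game and 70-win figures, assuming 81 home games per season and taking the 14 Coors Field seasons from the author's reply in the comments:

```python
HOME_GAMES = 81

def extra_home_wins(big_home_wpct, small_home_wpct, home_games=HOME_GAMES):
    """Extra wins per season from a higher home WPCT, with road play held fixed."""
    return (big_home_wpct - small_home_wpct) * home_games

# .560 vs .520 at home, both teams playing .460 on the road
print(round(extra_home_wins(0.560, 0.520), 2))  # 3.24

# Coors Field: 86 wins instead of 81 for a .500 team, over 14 seasons
extra_per_season = 86 - 81
print(extra_per_season * 14)  # 70
```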

So what does this mean for new parks being built? If I were building a new park for maximum home field advantage, I would choose one with a difficult hitting background, increasing strikeouts and making it a low-scoring park, with short but high fences down the lines to maximize doubles, and spacious alleys to maximize triples and further minimize scoring. AstroTurf would also help (while AstroTurf is not significant in itself, it is positively correlated with doubles and triples). Throw in gale-force winds, hard brick walls, and a hill in right field, and you'd have yourself a ballpark. It may be baseball's most hideous park, but it would probably net a fairly decent home field advantage.

Thankfully, neither the Yankees nor the Mets decided to go this route, and it remains to be seen how their new parks will play. Shea Stadium actually gave the Mets a poor home field advantage (.063 actual, .074 predicted), so Citi Field could be a boon to the team. It's not particularly quirky, but it is being billed as a hitter's park. After a year or two, we'll see how it plays.

Yankee Stadium, in contrast, did give the Yankees a healthy home field advantage (.091 actual, .086 predicted). However, the advantage predictably decreased after the 1970s renovation, when the outfield walls were brought in. Before the renovation, Yankee Stadium had a home field advantage of .094, but in the 20 years since the walls were brought in to their current dimensions, the advantage has dropped to .070. This can be attributed in part to the triples park factor falling from 1.40 to .73. Since New Yankee Stadium will have the exact same dimensions as the old park, we can expect it to confer about the same advantage, which is to say, not nearly the advantage that Ruth, DiMaggio, and Mantle enjoyed. Of course, that doesn't consider the ghosts.

An update to this study can be found here.


Comments

Sky -- Does having a park with an above-average HFA affect the team on the road? I.e., is it possible that the team plays worse on the road if it has an odd home park? I'm specifically thinking of the Rockies, but maybe this extends to other teams as well.

The statement that jumps out at me in the wrap-up is "if we assume a team plays .460 ball on the road"; but can we? That is, does home-field advantage have any countervailing effect on road disadvantage? That is the crucial point for those involved in park design.

Without having thought much about it, it seems at first blush as if the factors that operate to give a home-field advantage would also operate to give an away disadvantage, in that players in highly idiosyncratic parks would be at something of a disadvantage in other places. Indeed, an idiosyncratic park might give a high home-field advantage merely by lowering road win percentages. The thing to check, I'd imagine, is the effect of a given team moving from a fairly standard park into an idiosyncratic new one, or vice-versa, to see which percentage--home or away--changed more, and by how much, and how the two compare.

If one designs an idiosyncratic ballpark to gain a .560 home win percentage but thereby acquires a .430 road win percentage, the move is unwise. So it is crucial to know how the differentials compare, and how park changes affect teams.
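To put this trade-off in numbers, the short sketch below compares season wins under two home/road splits, using the article's typical .540/.460 split as the baseline (an assumption) against the hypothetical .560/.430 split above.

```python
def season_wins(home_wpct, road_wpct, games_each=81):
    """Total wins in a 162-game season split evenly between home and road."""
    return (home_wpct + road_wpct) * games_each

baseline = season_wins(0.540, 0.460)  # typical split from the article (assumed baseline)
quirky = season_wins(0.560, 0.430)    # hypothetical idiosyncratic park
print(f"baseline {baseline:.2f}, quirky {quirky:.2f}, difference {quirky - baseline:+.2f}")
```

The idiosyncratic park comes out about 0.8 wins worse: a home gain only pays if road play doesn't erode by more.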

Could home field advantage also be due to having better teams? Are you correcting somehow for how well a team does at home relative to how it does on the road, taking into account skill factors? I would expect, for example, the Yankees to have a better home field advantage than the Rays simply because they've been a better team over the course of their histories.

Sky, I liked your article. I have a couple of issues with it, though. One is that it's not very useful to run a regression on a dataset where all but one of the observations are clustered together, which is certainly the case here, as Coors Field is a huge outlier. It will affect your results too much. Try running your regression again without Coors; you will likely get completely different coefficients. The issue with Coors is almost definitely altitude-related, but altitude is not in your regression. This leads to omitted variable bias, where variables correlated with the omitted variable are falsely credited as the cause.

I have done a good bit of research on HFA myself. My main article is here:

http://www.thegoodphight.com/2008/7/3/564256/homefield-advantage

It definitely relates to what you did here, and it's a good starting point for looking at this stuff, since I isolated exactly what homefield advantage affects. It seems to mostly affect hitters through the mound and the batting eye, though defense (especially outfield defense) seems to be a large culprit as well. A couple of other findings that were very important include:

(1) HFA is not a persistent effect for the vast majority of teams (a notable exception being Coors)

(2) HFA definitely is related to travel times, but not exclusively. The largest HFA occurs in games where the players are less familiar with the road stadium and where the teams have not traveled the furthest.

(3) In a different article, I found that HFA is actually larger in the 2nd game of the series than in the 1st game, so it's probably not a jet lag issue or a travel day issue. (That link is here: http://www.thegoodphight.com/2008/8/17/595739/home-team-dominance-the-mi).

(4) HFA is larger for poorer teams, indicating that home field advantage is actually more of a road field disadvantage than a home field advantage.

Thanks for the comments all around.

Matt, you make an excellent point about outliers influencing the dataset. The Cook's Distance of Coors Field was the highest of any park, although it was not greater than 1 (it was .35, because the data are weighted and there have been only 14 seasons there). Removing Coors from the data as you suggest does give slightly different results: 1) there is an additional advantage to low-scoring parks; 2) the p-values for triples and strikeouts increased, so they are no longer significant at the .10 level.
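For readers who want to reproduce this kind of outlier check, here is a minimal NumPy sketch of Cook's distance for a weighted least-squares fit. It uses the standard formula, with leverages taken from the hat matrix of the sqrt-weighted design. The data are hypothetical, with one deliberately extreme row standing in for a park like Coors Field; values above 1 are a common rule-of-thumb flag.

```python
import numpy as np

def cooks_distance(X, y, weights):
    """Cook's distance for each row of a weighted least-squares fit (intercept added)."""
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    Xw = X * sw[:, None]                      # rows scaled by sqrt(weight)
    yw = np.asarray(y, dtype=float) * sw
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta                    # weighted residuals
    n, p = Xw.shape
    s2 = resid @ resid / (n - p)              # residual variance estimate
    h = np.diag(Xw @ np.linalg.pinv(Xw.T @ Xw) @ Xw.T)  # leverages
    return resid ** 2 / (p * s2) * h / (1 - h) ** 2

# Hypothetical data: one predictor, with an extreme last row
x = list(range(10))
y = [0.080, 0.082, 0.081, 0.084, 0.083, 0.086, 0.085, 0.088, 0.087, 0.140]
seasons = [10] * 10
d = cooks_distance(x, y, seasons)
print(int(np.argmax(d)), round(float(d.max()), 2))  # row 9 has the largest distance
```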

Eric, interesting point. Intuitively, I would think a team's play on the road would be generally the same (learning to track fly balls in the Metrodome or deal with the Green Monster won't hinder your play in other parks), but I think it's an interesting question to explore.

Check back next week, as I'll likely follow up with another column on park effects.