A Study in Home Field Advantage - Will the New Stadiums Be Friendly to NY Teams?
On Monday, Major League Baseball will christen two new stadiums, New Yankee Stadium and Citi Field. The previous New York stadiums were known as intimidating places to play, and fans are probably wondering whether the new ballparks will confer as great a home field advantage as the old buildings - particularly in the case of Yankee Stadium, where the ghosts of Babe Ruth, Lou Gehrig, and others were said to give the park a certain aura of invincibility. This article explores the relationship between a stadium and home field advantage, and what it might mean for the two New York ballparks.
Of course everybody knows that playing at home is indeed an advantage. Over the course of modern baseball history, the difference between playing at home vs. on the road has been about 80 points of team WPCT - a road team will win about 46% of its games, and a home team will win about 54% of its games, all else being equal. But do some parks confer more of an advantage than others? Or is the "Yankee mystique" no more potent than the Padres mystique?
Gathering data from all major league home parks during the modern era (thanks to Retrosheet), I found the average home field advantage (defined as home WPCT minus road WPCT) of each park during each year. A quick chi-squared test shows that the effect of the home park is indeed highly significant, and not all home parks are identical. To the average baseball fan, this comes as no surprise - we expect that some parks are more advantageous than others, and indeed we see this borne out in real data: Fenway Park has a lifetime average advantage of .109 while Seattle's Kingdome had a lifetime home field advantage of .070. So what is it about a park that gives one place a bigger advantage than others?
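To make the definition concrete, here is a minimal Python sketch of the home field advantage calculation (home WPCT minus road WPCT). The function names and the 44-37 / 37-44 record are made up for illustration; they are not taken from the Retrosheet data used in the study.

```python
def win_pct(wins, losses):
    """Winning percentage (WPCT) from a win-loss record."""
    return wins / (wins + losses)


def home_field_advantage(home_w, home_l, road_w, road_l):
    """Home WPCT minus road WPCT for one team-season."""
    return win_pct(home_w, home_l) - win_pct(road_w, road_l)


# Hypothetical .500 team: 44-37 at home, 37-44 on the road.
example_hfa = home_field_advantage(44, 37, 37, 44)
print(round(example_hfa, 3))
```

A team like this one sits close to the historical average: it plays roughly .543 ball at home and .457 on the road.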
Home Field Advantage Over Time
One factor to consider is the year - throughout history, home field advantage has fluctuated, and it's been suggested that there has been a strong decrease in home field advantage over time. A cursory look at the data suggests that this is the case - from 1901-1910 teams had a home field advantage of .104, but by the 1980s, the advantage was down to .080. Cyril Morong finds a statistically significant decrease in home field advantage over time, and people have suggested that it's due to shorter travel times, increased luxuries and amenities for the players, more comfortable hotels, etc.
It's a nice theory, but I find it not to be true. Modeling home field advantage using year alone does indeed suggest a powerful effect. However, this approach leaves out an important confounding variable - the fact that ballparks have also changed over time. If the decrease in home field advantage were due to things like air travel and player amenities, we would see decreasing home field advantages even when looking within the same ballpark over time. However, when we run a model with both year and ballpark included, we see that the effect of year on home field advantage is no longer significant, with a p-value of .50 (in fact, the direction of the year effect actually switches to being positive!). Nearly all of the variability over time is due to the parks themselves, not the year. I also ran the model with a pre/post-1960 variable (around the time that travel became easier and cushier for visiting players) instead of the continuous year variable, and again there was no effect. From this, we see that the decrease in home field advantage over time is actually due to different ballparks being built, not to things like air travel and amenities as commonly believed.
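The confounding argument can be illustrated with a toy example. The sketch below is not the model used in the study; it uses invented rows for two hypothetical parks whose advantage never changes within the park. A year-only fit still shows a decline, because the old park has the larger advantage. Subtracting each park's own averages first (which gives the same year slope as a least-squares fit with a separate intercept per park) makes the year effect vanish.

```python
from collections import defaultdict


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def within_park_slope(rows):
    """Slope of HFA on year after removing each park's own means."""
    by_park = defaultdict(list)
    for park, year, hfa in rows:
        by_park[park].append((year, hfa))
    xs, ys = [], []
    for seasons in by_park.values():
        mean_year = sum(y for y, _ in seasons) / len(seasons)
        mean_hfa = sum(h for _, h in seasons) / len(seasons)
        for year, hfa in seasons:
            xs.append(year - mean_year)
            ys.append(hfa - mean_hfa)
    return slope(xs, ys)


# Invented data: (park, year, HFA). Each park's advantage is constant.
ROWS = [("Old Park", y, 0.10) for y in range(1901, 1906)]
ROWS += [("New Park", y, 0.08) for y in range(1981, 1986)]

pooled = slope([r[1] for r in ROWS], [r[2] for r in ROWS])
within = within_park_slope(ROWS)
print("year-only slope:", pooled)    # negative: looks like a decline
print("within-park slope:", within)  # essentially zero
```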
With this knowledge, we can move forward more confidently. If individual ballparks do have a major effect on home field advantage, then what are the features that make up this advantage?
What Features Are Advantageous?
For this I looked at several statistical variables for each park, as well as several qualitative variables. From the Retrosheet data, I was able to calculate park indices for several different statistics: runs, hits, doubles, triples, home runs, walks, and strikeouts. A number greater than 1 indicates the park was more likely to have those events occur, while a number less than 1 indicates the park was less likely. I also created additional variables for each of the above statistics by taking the absolute value of the difference from 1 (so the value for homers would be high if the park either allowed a lot of homers or allowed very few). I also created several subjective variables: whether or not the park's features were "quirky" (odd dimensions, wild wind, strange ground rules, etc.), whether the park strongly favored batters of one hand over the other, whether the park was in a hot outdoor climate, whether the park was grass or turf, whether the park was a dome, whether the park had "rabid" fans (confined to old baseball cities mostly on the east coast, NY, BOS, PHI, etc.), and what era the park was built in (wooden era, classic era, modern era, nostalgic era).
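As a rough sketch of the two kinds of statistical variables, the Python below computes a park index and its deviation from neutral. The article does not spell out the index formula, so the one here (events per game in the team's home games divided by events per game in its road games, both teams combined) is an assumption based on the common definition, and the function names and counts are hypothetical.

```python
def park_index(home_events, home_games, road_events, road_games):
    """Assumed park index: per-game event rate at home divided by
    the per-game rate in the same team's road games."""
    return (home_events / home_games) / (road_events / road_games)


def deviation_from_neutral(index):
    """Absolute distance of a park index from 1 (a neutral park)."""
    return abs(index - 1.0)


# Hypothetical counts: 90 triples in 81 home games, 60 in 81 road games.
triples_index = park_index(90, 81, 60, 81)
print(round(triples_index, 2), round(deviation_from_neutral(triples_index), 2))
```

Under the deviation variable, a park with an index of 1.40 and one with an index of 0.60 count as equally unusual, which is the point of taking the absolute value.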
Running this all in a model (weighted for the number of seasons the park was used), we find that most of the variables are not significant at all. Of the qualitative variables, there was no advantage to being in a city with rabid fans, no advantage to a dome, no advantage if the park strongly favored one hand over the other, no advantage to being in a hot climate, and no advantage based on what era the park was built in. Of the statistical variables, most had no effect as well. The most interesting result was that the number of homers allowed by the park had no effect on the home field advantage (all that time agonizing over whether to build a homer-happy park, homer-deprived park, or simply a homer-neutral park was time wasted).
After taking out the insignificant terms, we are left with this final model:
As you can see, the 5 factors of home field advantage are, in order of significance:
From the above list, all of the variables seem to favor more unusual parks. Parks which deviate from the normal amount of runs scored seem to be advantageous. Parks that allow a high proportion of doubles and triples also tend to be more unusual, with odd angles and odd dimensions. This makes intuitive sense as well. The more unusual a park is, the more difficult it would be to play in it for the first time - giving the home team, who is already familiar with the park, a significant advantage. Likewise, cookie cutter parks, requiring little adjustment on the part of visiting players, have the lowest advantage. It also could be the case that teams bring in specific players who are particularly suited for an unusual home park, also increasing home field advantage. However, if this were the main reason, I would think we would see a spike in home field advantage after the free agent era, when it became much easier to bring in specific players - since we don't see this, it's likely not the driving force.
The quirky variable is an attempt at a subjective definition of unusual and the model sees it as highly significant - even when considering the statistical variables above. Obviously, the "quirky" variable is highly subjective and is simply a binary variable that doesn't take into account just how quirky a park is, but the inclusion gives some information that statistics alone cannot. It also varies a lot depending on the era of the park - in the wooden/classic era 18 of 23 were considered quirky, in the modern era just 3 out of 27 were considered quirky, and in the nostalgic era 5 out of 17 were considered quirky.
The strikeout variable was perhaps the most interesting of the bunch (though it's only marginally significant) - parks which increase strikeouts tend to increase home field advantage. My guess is that this is related to the hitting background - with more difficult or unusual hitting backgrounds being advantageous to the home team since they have more practice hitting under those tough conditions.
Below is a table of the top and bottom 5 parks in each of the statistical categories (with 10 years or more as an MLB park).
Additionally, you can see a chart of the top and bottom 5 parks by predicted home field advantage according to the model. As you can see, the model predicts Coors Field to have by far the biggest home field advantage in the history of baseball. It's followed by Baker Bowl, that wacky Philadelphia ballpark, and classic Fenway Park. The other parks rounding out the current top 5 most advantageous parks according to the model are Minnesota's Metrodome, Minute Maid Park in Houston, and AT&T Park in San Francisco. The bottom 5 parks are dominated by the more modern ballparks, with New Comiskey Park being the lowest and the other current low advantage parks being Angel Stadium, Jacobs Field, Turner Field, and Camden Yards.
With an R-squared of .38, the model is far from a perfect fit, explaining only 38% of the variability in home field advantage between parks. The model misses considerably on several parks (in fact, New Comiskey has enjoyed a decent home field advantage over its 18 years). Of the misses that the model makes, its most egregious (accounting for the number of seasons) are overestimates of Camden Yards and Riverfront Stadium and underestimates of Crosley Field and the Astrodome. A graph of the predicted and actual home field advantage of each park can be seen here. As you can see, there are still other unknown factors at play, but the model does a fair job of predicting how much home field advantage a park will bring.
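For readers unfamiliar with R-squared, the following Python sketch shows the standard unweighted calculation: one minus the ratio of the squared prediction errors to the total squared variation around the mean. The model in this article was weighted by seasons, so this plain version is only an illustration of the idea, and the function name and sample numbers are made up.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up advantages for three parks and two sets of predictions.
actual = [0.06, 0.08, 0.10]
print(r_squared(actual, [0.06, 0.08, 0.10]))  # perfect predictions
print(r_squared(actual, [0.08, 0.08, 0.08]))  # predicting the mean for all
```

A value of .38 falls between these two extremes: better than guessing the average for every park, but with most of the park-to-park variation left unexplained.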
What Does This Mean?
So, some parks have a greater home field advantage than others, and we now have some idea of why, but is it significant in a baseball sense? Over the course of the year, if we assume a team plays .460 ball on the road, a team with a healthy home field advantage may play .560 ball at home, while a team with a small home field advantage may play only .520 ball at home. Over a full home schedule, the team with the big advantage will win over 3 more games than the team with the small advantage. This is not insignificant at all and could easily be the difference between winning and losing the pennant. Additionally, a good home field advantage is the gift that keeps on giving, with the team reaping the advantage year after year. In the extreme case, Coors Field, the ballpark makes a .500 team into an 86-win team - so far a lifetime gain of 70 wins for the park, making it likely the most valuable member of the Rockies franchise.
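The "over 3 games" figure is simple arithmetic, sketched here in Python. The 81 home games assume a standard 162-game schedule, and the function name is just for illustration.

```python
def extra_home_wins(home_pct_big, home_pct_small, home_games=81):
    """Extra wins per season from the larger home winning percentage,
    assuming identical play on the road."""
    return (home_pct_big - home_pct_small) * home_games


# .560 at home versus .520 at home, both .460 on the road.
print(round(extra_home_wins(0.560, 0.520), 2))
```

Since both teams are assumed to play .460 on the road, the road games cancel out and the whole difference comes from the 81 home dates.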
So what does this mean for new parks being built? If I were building a new park for maximum home field advantage, I would choose one which had a difficult hitting background, increasing strikeouts and making it a low scoring park, with short but high fences down the lines to maximize doubles, and spacious alleys to maximize triples and further minimize scoring. Astroturf would also help (while astroturf is not significant in itself, it is positively correlated with doubles and triples). Throw in gale force winds, hard brick walls, and a hill in right field, and you'd have yourself a ballpark. It may be baseball's most hideous park, but it'd probably net a fairly decent home field advantage.
Thankfully, neither the Yankees nor the Mets decided to go this route, and it remains to be seen how their new parks will play. Shea Stadium actually gave the Mets a poor home field advantage (.063 actual, .074 predicted), so Citi Field should be a boon to the team. It's not particularly quirky, but is being billed as a hitter's park. After a year or two, we'll see how it plays.
Yankee Stadium, in contrast, did give the Yankees a healthy home field advantage (.091 actual, .086 predicted). However, the advantage predictably decreased after the 1970s renovation, when the outfield walls were brought in. Before the renovation, Yankee Stadium had a home field advantage of .094, but in the 20 years since the walls were brought in to their current dimensions, the advantage has dropped to .070. This can be attributed in part to the triples park factor decreasing from 1.40 to .73. Since the New Yankee Stadium will have the exact same dimensions as the old park, we can expect the park to confer about the same advantage - which is to say, not nearly the advantage that Ruth, DiMaggio, and Mantle enjoyed. Of course, that doesn't consider the ghosts.
An update to this study can be found here.