Wins Above Replacement vs. Salaries in 2009
A couple of weeks ago, I did an analysis of the wins provided per dollar in Major League Baseball for free agent eligible players, arbitration eligible players, and players under team control. I did this with a regression using Rally's WAR data and salary data from the Lahman database. After a rousing and lengthy discussion over at the Book Blog about the dollar-per-win value of free agent eligible players (defined as any player with 6 or more years of service), my 2008 estimate of about $6 million per win was shown to be a bit higher than the commonly held $4.5 million mark that is usually used. However, since Rally gives out fewer WAR than Fangraphs, this was cited as one possible reason for the difference. Additionally, the fact that I estimated service time, and the fact that contracts could be backloaded, were other potential sources of bias.

For the 2009 season, I took different data, this time using contract data from Cot's Contracts and WAR data from Fangraphs. Cot's data lists the deal each player is currently in, including the length of the contract as well as the overall contract value. Cot's Contracts also gives the exact service time for the 2009 season. Wins Above Replacement was taken from Fangraphs, since it is the most widely used form of WAR. Here I look only at players with over 6 years of MLB service to try to estimate the same dollars-per-win figure for 2009.

To account for contracts potentially being backloaded or frontloaded, I used the average yearly salary over the life of the contract, rather than the actual salary paid to the player in 2009. Another data caveat was that I threw out all players who had signed their contracts before they were actually free agents. Since their averages would include years when they were only arbitration-eligible, simply using the average salary of these players would be artificially low.
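The salary figure and sample filter described above can be sketched in a few lines of code. This is a minimal illustration, not the author's actual code, and the field names (contract value, length, arbitration coverage) are hypothetical stand-ins for the Cot's Contracts data:

```python
# Sketch of the salary measure used above: the contract's average annual
# value in $MM, so backloaded and frontloaded deals count the same in
# every season of the contract.
def average_annual_salary(total_value_mm, length_years):
    """Average yearly salary in $MM over the life of the contract."""
    return total_value_mm / length_years

def in_sample(service_years, covers_arb_years):
    # Keep only players with 6+ years of service whose current deal was
    # signed as a true free agent; contracts that bought out arbitration
    # years would drag the average salary down artificially.
    return service_years >= 6 and not covers_arb_years

# A backloaded 4-year, $48MM deal counts as $12MM per year throughout.
print(average_annual_salary(48.0, 4))  # 12.0
print(in_sample(7, False))             # True
print(in_sample(7, True))              # False (excluded from the sample)
```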
Additionally, these players were never available on the open market, so they are not really in the population we are interested in.

Running the regression on this data set, I expected to find a dollar-per-win value around $5 million or so. What I found was vastly different. The equation for the number of WAR expected for each million dollars spent is below:

WAR = .216 + .138*(Salary)

This translates to a whopping $7.25 million per win spent on free agents in 2009. This means that a free agent with a $20 million average contract would be expected to produce only 3 WAR, while a player with a $2 million average contract would be expected to produce 0.5 WAR. This seems surprising, but the data points seem to back up the analysis, as you can see below.

There is an argument to be made that the intercept should be locked at zero, to represent the fact that a player earning zero dollars should be expected to produce zero WAR. This is also reasonable, so here I fit the same regression with no intercept:

WAR = .156*(Salary)

While this brings the dollars-per-win value down slightly, it still translates to $6.4 million per win, far higher than the common $4.5 million figure.

Perhaps the relationship between dollars and wins would show more strongly if other factors were accounted for. For instance, someone in the first year of a long-term contract will probably be expected to produce more WAR than someone in the last year of a long-term contract, even at the same salary. Here I accounted for average salary as well as the length of the contract and how many years into the contract the player was. I also included an interaction term of salary*length to account for the fact that the salary-to-WAR slope might be different for longer contracts.
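As a quick sanity check, the arithmetic behind the two simple fits above can be reproduced directly. The coefficients are the article's own (salary in $MM); nothing here is re-estimated, and the dollars-per-win figure is just the reciprocal of the salary slope:

```python
# The article's two fitted free-agent models, coefficients as reported.
def war_with_intercept(salary_mm):
    return 0.216 + 0.138 * salary_mm

def war_no_intercept(salary_mm):
    return 0.156 * salary_mm

# Marginal cost of a win = 1 / (WAR per $MM of salary).
print(round(1 / 0.138, 2))  # 7.25 -> $7.25MM per win (with intercept)
print(round(1 / 0.156, 1))  # 6.4  -> $6.4MM per win (no intercept)

# The expected-WAR examples from the text:
print(round(war_with_intercept(20.0), 1))  # 3.0 WAR for a $20MM AAV
print(round(war_with_intercept(2.0), 1))   # 0.5 WAR for a $2MM AAV
```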
I came up with this model:

WAR = .456 + .118*(Salary) + .029*(Length of Contract) - .171*(Year of Contract) + .005*(Salary)*(Length)

Unfortunately, while the theory may have been good, the data didn't back it up. With the exception of average salary, none of the terms in the model were significant. The p-value for the Year of Contract variable was the closest to significance, at .16. Paring down the model or adding other interactions was also futile, and attempts to include only significant terms led right back to the basic salary-to-WAR model, though the Year of Contract variable was close to significant. If more data were available, I would guess this would prove to be a factor. In any case, controlling for these other terms does not strongly change the dollars paid per WAR for free agents.

As a final attempt, I looked only at players who were in the first year of their contracts in 2009. These are players who were actually available on the free agent market in 2009 (as opposed to the other analyses, which included all players who would be eligible based on service time, whether they were actually free agents or not). As you might expect, the value of these players was higher than that of players still working off old contracts. However, the change was not huge. Controlling for whether the player had signed a multi-year contract or not, I got the following formula:

WAR = .277 + .184*(Salary) - .407*(MultiYear)

The dollar-per-win mark here was lower, at just $5.4 million. However, this doesn't capture the true cost, since players signing multi-year contracts will likely be worse at the end of their contracts than during the first year studied here. Even with this bias, the $5.4 million mark is far more than the usual $4.5 million. An additional counterintuitive finding is that players signing multi-year contracts tended to perform worse than their single-year contemporaries.
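The first-year-only model above can be checked the same way. Again, the coefficients are taken directly from the article (salary is average annual value in $MM; MultiYear is 1 for a multi-year deal, else 0), and the $10MM salary used to illustrate the dummy effect is an arbitrary example:

```python
# First-year free agents only, with a multi-year-contract dummy.
def war_first_year(salary_mm, multi_year):
    return 0.277 + 0.184 * salary_mm - 0.407 * multi_year

# Marginal cost of a win, again the reciprocal of the salary slope:
print(round(1 / 0.184, 1))  # 5.4 -> $5.4MM per win

# The (non-significant) dummy implies ~0.4 fewer WAR for a multi-year
# signee than a single-year signee at the same average salary:
print(round(war_first_year(10.0, 1) - war_first_year(10.0, 0), 3))  # -0.407
```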
This multi-year term was not significant, however, so the result isn't generalizable. Still, it was surprising to find the 2009 effect going in the opposite direction from what one would expect.

While 2009 could have been just a bad year for free agents, this is further evidence that the $4.5 million per win mark commonly used may be, if not wrong, at least obsolete. This 2009 data, drawn from two different sources, again shows a dollars-per-win value above $6 million. While estimates based on projected WAR may yield a different figure, the reality is that teams are paying much more than that (or at least they did in 2009). Interestingly, 2009 was seen at the time as a depressed free agent market, where teams could pick up relatively cheap bargains. At $6.5 to $7 million per win, there were very few bargains to be had.

Update: I had a few missing players in my dataset, and the numbers have been changed to reflect that. However, the difference with these players added was very slight.
Comments
I just read through the (interesting!) text once and I'm not sure what model you used, but off the top of my head, a couple of notes:
1. Does the relationship between salary and wins have to be linear?
2. Are the residuals really normal (assuming you assumed that)?
3. Do you have a confidence interval for the price of a win?
Best regards
Bjoern
Posted by: Bjoern at December 15, 2009 12:02 PM
Sky, how many players are in the sample? Can you provide a list somewhere?
Interesting work.
Posted by: Rally at December 15, 2009 12:36 PM
I identified 223 players who had the required years of service and were playing under a contract that didn't include any arbitration years.
There were 127 players who were in the first year of their contracts.
Posted by: Sky Andrecheck at December 15, 2009 1:11 PM
Sky, have you looked at hitters and pitchers separately?
Posted by: Nathaniel Dawson at December 15, 2009 10:44 PM
Nate, I did look at hitters and pitchers separately, but didn't find a significant difference, at least in this sample.
Posted by: Sky Andrecheck at December 16, 2009 6:27 PM
Sky,
I would recommend re-running the model using a multiplicative, unbiased error.
To unbias the error, you would essentially have the sum of the differences between the predicted and actual values, divided by the predicted values, equal zero (∑[(F(X)-Y)/F(X)] = 0). The error of each term is that expression squared, and the total error is the square root of the sum of the squared errors divided by the degrees of freedom, or
Standard Percentage Error = SQRT[ ∑((Y-F(X))/F(X))² / df ]
MAYBE that will normalize the slope of the equation? The outliers on the right likely have too much influence on the regression results.
Posted by: Joe R at December 17, 2009 2:47 PM
What are the results for just a simple average, both for first year only and all free agents? How much of an effect does regressing vs averaging have?
Posted by: Bill L at December 17, 2009 4:07 PM