Behind the Scoreboard
November 30, 2009
WAR, Salary, and Service: Estimating Dollars Per Win
By Sky Andrecheck

The Hot Stove League is in full swing, and what better way to dig in than by estimating player salaries? In this post I'll attempt to find a simple relationship between salaries, Wins Above Replacement (WAR), and years of service. In particular, how much of a pay cut do those in arbitration or under team control take compared to those eligible for free agency?

The WAR data is from Sean Smith, and the salary data comes from the Sean Lahman database. Data on service years is scarce, so I estimated years of service based on playing time: 130 PA, 20 games pitched, or 50 innings pitched equaled one year of service. It's not perfect, but it will do for now; I cross-checked it against actual service time for 2007 players, and my method of estimating service wasn't too terrible. I divided players into three groups by service time: those with less than three years of service, who presumably are held under team control; those with 3-5 years of service, who are arbitration eligible; and those with 6 or more years of service, who are eligible for free agency.
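As a rough illustration of the service-time rule described above, here is a minimal Python sketch. The class and function names are made up for this example, and it assumes that a season counts as one year of service when any one of the three thresholds is met or exceeded; the exact bookkeeping behind the estimates in this post may differ.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    """One player's playing time in one season (hypothetical record layout)."""
    plate_appearances: int = 0
    games_pitched: int = 0
    innings_pitched: float = 0.0


def counts_as_service_year(season):
    # Assumption: meeting ANY one threshold earns a full year of service.
    return (season.plate_appearances >= 130
            or season.games_pitched >= 20
            or season.innings_pitched >= 50)


def estimated_service_years(prior_seasons):
    # Count the prior seasons that qualify as a year of service.
    return sum(1 for s in prior_seasons if counts_as_service_year(s))


def service_class(years):
    # Three groups: under 3 years, 3-5 years, 6 or more years.
    if years < 3:
        return "team control"
    if years < 6:
        return "arbitration eligible"
    return "free agent eligible"


career = [SeasonLine(plate_appearances=45),
          SeasonLine(plate_appearances=510),
          SeasonLine(plate_appearances=620),
          SeasonLine(plate_appearances=601)]
print(service_class(estimated_service_years(career)))
```

The first season in the example falls short of 130 PA, so the player is credited with three years and lands in the arbitration-eligible group.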

There are two ways to examine the relationship between WAR and salary. One is to estimate the salary of the player based upon the player's WAR. Another way is to estimate the player's WAR based upon the salary.

Predicting Salary from Performance

Let's go with the first approach first. My dependent variable is player salary, and I want to estimate it from WAR, service category, and year. Lahman's salary data goes back to 1985, but for now I'll look at just 2008.

As others have found, the relationship between salary and WAR is linear. The model I estimated can be boiled down to three equations - one for each level of service. Here I'll present the results for 2008 (salary in millions of dollars):

When under team control: Salary = .51 + WAR*.001
When Arb eligible: Salary = 2.26 + WAR*.31
When FA eligible: Salary = 5.53 + WAR*1.23

fa_elig.GIF

The $500,000 salary of pre-arbitration players seems reasonable. Not surprisingly, the players' actual contribution to the team is of very slight importance. Basically these players get close to the minimum for their efforts no matter what.

However, when looking at the free agent eligible players, things get interesting. According to the formula, a player producing absolutely nothing for the team is due to be paid $5.5 million. What team in their right mind would do that? Well, none of course, but plenty of teams DO pay a lot of money for no production. In fact, there's probably a do-nothing overpaid free agent sitting on your favorite team's bench right now. Chances are that if a team has a 0 WAR producing free agent, he'll be making over $5 million. Bad signings, injuries, bad luck, and a host of other problems can often cause a worthless free agent to be paid a lot of money.

High producing free agents do make more, of course, but not way more - about $1.2 million per win. While a worthless free agent would be expected to make $5.5 million, a free agent producing an MVP-type season of 6 WAR is expected to have pulled in $12.9 million.

Arbitration-eligible players fall in the middle as you might expect, with 0 WAR players making an expected $2.3 million, and players with great seasons (6 WAR) making $4.1 million. What's the relationship between arbitration-eligible players and free agent-eligible players? It appears from the data that low-value free agents make about double the amount of low-value arbitration eligibles ($5.5 mil vs. $2.3 mil). However, as the player increases his performance, the gap widens. For a 5-WAR season, the free agent will make three times as much as the arbitration eligible player ($11.7 mil vs. $3.8 mil). Meanwhile, pre-arbitration players earn the same no matter what. As one might expect, the better the player, the greater the benefit of being a free agent.
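The 0-WAR and 5-WAR comparisons above follow directly from the three 2008 equations. Here is a small Python sketch that plugs the rounded coefficients into a lookup table; the dictionary and function names are just labels chosen for this example.

```python
# 2008 equations from the post: salary (in $ millions) = intercept + slope * WAR
SALARY_MODEL_2008 = {
    "team control": (0.51, 0.001),
    "arbitration eligible": (2.26, 0.31),
    "free agent eligible": (5.53, 1.23),
}


def expected_salary(service_class, war):
    """Expected 2008 salary in $ millions for a given service class and WAR."""
    intercept, slope = SALARY_MODEL_2008[service_class]
    return intercept + slope * war


for war in (0, 5):
    fa = expected_salary("free agent eligible", war)
    arb = expected_salary("arbitration eligible", war)
    print(f"{war} WAR: FA ${fa:.1f}M vs. arb ${arb:.1f}M (ratio {fa / arb:.1f}x)")
```

Because the coefficients are rounded to two or three decimals, results can differ slightly from figures computed with the full-precision model.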

How does this compare to the results from years past? Just for fun, here are the formulas from 1990:

When under team control: Salary = .14 + WAR*.02
When Arb eligible: Salary = .51 + WAR*.09
When FA eligible: Salary = .95 + WAR*.10

Obviously, these salaries are much lower than today's. What's interesting is that high-WAR players did not make much more than low-WAR players, even among free agents. In 1990, a 6-WAR free agent would be expected to make 64% more than a 0-WAR free agent. In 2008, by the equations above, a 6-WAR free agent would make about 133% more than a 0-WAR free agent. Perhaps this is a sign that teams are getting more for their money, or a sign of some other change in the market. Perhaps I will explore this relationship over time in a later post.

Predicting Performance from Salary

While predicting salary from performance is interesting, perhaps more relevant is predicting performance from salary. A player's salary is determined before the player performs, so it makes more sense to analyze it this way. It's also useful to ask, "if we spend $10 million on a free agent, how many wins should we expect?"

We can answer this question using the same sets of models, with Salary and WAR swapped in the equations. In 2008, the numbers were:

When under team control: WAR = .84 + Salary*.002
When Arb eligible: WAR = .62 + Salary*.21
When FA eligible: WAR = 0 + Salary*.16

fa_elig2.GIF

As expected, the numbers are vastly different for each of the three categories. For those under team control, the player's salary basically has no correlation with the number of wins he is expected to produce - everybody is getting paid the same, good, bad, or ugly - hence the flat curve. For those arbitration eligible, a player getting paid the league minimum will be expected to produce 0.7 WAR, with another .21 WAR for every million dollars after that. A star arbitration eligible player making $7 million will be expected to produce 2.1 WAR. In general, as the graph shows, teams get more value from high-priced arbitration eligible players than from high-priced free agents.

For free agents, the link between salary and performance is more tenuous. Those making the league minimum will be expected to produce 0.1 WAR. For every million dollars paid out after that, the average player will return .16 WAR. This means that a $10 million free agent will be likely to produce just 1.6 WAR. There are a lot of overpaid free agents out there.
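The salary-to-WAR figures quoted above can be reproduced from the three equations. This is a minimal Python sketch using the rounded 2008 coefficients; the names are labels chosen for this example only.

```python
# 2008 equations from the post: WAR = intercept + slope * salary (in $ millions)
WAR_MODEL_2008 = {
    "team control": (0.84, 0.002),
    "arbitration eligible": (0.62, 0.21),
    "free agent eligible": (0.0, 0.16),
}


def expected_war(service_class, salary_millions):
    """Expected 2008 WAR for a player in a service class at a given salary."""
    intercept, slope = WAR_MODEL_2008[service_class]
    return intercept + slope * salary_millions


# A $7 million arbitration-eligible star and a $10 million free agent.
print(round(expected_war("arbitration eligible", 7.0), 2))
print(round(expected_war("free agent eligible", 10.0), 2))
```

Rounded to one decimal, these give the 2.1 WAR and 1.6 WAR quoted in the text.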

The data show that on the open market, teams will have to pay about $6 million for an expected return of one win. This $6 million figure is a bit more than the $4.5 million that is commonly used as the dollar per win ratio, as at Fangraphs. The Fangraphs method differs from mine in that it calculates the expected win value based upon an estimate of "true performance level," and then compares that to the amount that players are actually signing for on the free agent market. In contrast, my method compares salary to WAR in a particular year for all players, regardless of when a player was signed or what his true talent really is. Since there is more noise in a player's actual yearly WAR than in a player's true talent estimate, WAR and salary will have a lower correlation - hence the higher cost to gain an expected win.

In 2008, Albert Pujols made a salary of $13.9 million and contributed a league best 9.6 WAR. A free agent eligible player making $13.9 million would have been expected to contribute 2.3 wins. The fact that Pujols actually contributed 9.6 wins means that he gave the Cardinals 7.3 wins more than they bargained for, making him the league's best value. To get an expected return of 9.6 WAR on the free agent market, a team would have to pay $59 million - making Pujols a huge bargain. While $59 million seems like a lot, think of all of the Jason Schmidts and Andruw Joneses that might have been bought instead with no value to the team.
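The price of an expected win comes from turning the free agent equation around: solve WAR = intercept + slope * Salary for Salary. Here is a short Python sketch using the rounded slope of .16; the function name is made up for this example. With the rounded coefficient the 9.6 WAR price comes out near $60 million rather than $59 million, a gap that presumably reflects rounding in the published slope (that is an assumption, not something checked against the underlying fit).

```python
# Free agent eligible equation for 2008: WAR = 0 + 0.16 * salary ($ millions)
FA_INTERCEPT_2008 = 0.0
FA_SLOPE_2008 = 0.16


def salary_for_expected_war(target_war,
                            intercept=FA_INTERCEPT_2008,
                            slope=FA_SLOPE_2008):
    """Salary ($ millions) at which the fitted line predicts target_war."""
    return (target_war - intercept) / slope


cost_per_win = salary_for_expected_war(1.0)       # about 6.25
pujols_equivalent = salary_for_expected_war(9.6)  # about 60
print(f"${cost_per_win:.2f}M per expected win")
print(f"${pujols_equivalent:.0f}M for an expected 9.6 WAR")
```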

From Pujols' perspective however, he didn't make all that much less than expected. An average 9.6 WAR producer would have been expected to make $17.3 million compared to $13.9 million. Why the major discrepancy in Pujols' dollar value? The reason is the regression effect of course. Since dollars and wins are only loosely related, both will regress to the mean quite strongly. For teams, it means that you have to pay a lot to get a little. For players, it means that a season of great performance doesn't earn too much more than a season of mediocre performance.
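The regression effect can be seen on made-up data. The Python sketch below generates synthetic WAR and salary values (invented numbers, not the 2008 sample), fits a line in each direction, and shows that the WAR-per-dollar slope is much flatter than the inverse of the dollar-per-WAR slope. For two simple one-variable fits on the same data, the product of the two slopes equals R^2. If the published free agent lines behave like such fits, which is an assumption here, then 1.23 * .16 suggests an R^2 of roughly 0.2.

```python
import random


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Synthetic, loosely related data: NOT the actual 2008 players.
random.seed(2008)
war = [random.gauss(2.0, 2.0) for _ in range(500)]
salary = [5.5 + 1.2 * w + random.gauss(0.0, 5.0) for w in war]

salary_per_war = ols_slope(war, salary)   # predicting salary from WAR
war_per_salary = ols_slope(salary, war)   # predicting WAR from salary

print("naive inverse of salary slope:", 1.0 / salary_per_war)
print("actual WAR-on-salary slope:   ", war_per_salary)
print("product of slopes:", salary_per_war * war_per_salary)
print("R^2:              ", r_squared(war, salary))

# Same identity applied to the rounded free agent slopes from the post.
implied_r2 = 1.23 * 0.16
print("implied R^2 from published slopes:", round(implied_r2, 2))
```

The looser the relationship, the further each fitted line sits from the inverse of the other, which is why the team's price for 9.6 expected wins and the player's expected pay for 9.6 actual wins are so far apart.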

As fans, we're probably more interested in how many wins can be squeezed out of dollars than the other way around, which makes the first formulation (where Pujols is worth $59 million) more apt. Since teams would have to spend $59 million to get an average return of 9.6 wins, this would have been a fair price had Pujols been guaranteed in advance to provide 9.6 wins.

In the next week or two, I'll be exploring this relationship a bit more in depth. However, this simple formulation does provide some insight on just how much teams are paying for marginal wins.

Update: I've had a few requests to see the data points plotted, so here they are for free agent eligibles in 2008. The data looks linear to me, and although the variance of the errors does get a little larger as salary increases, it doesn't seem like a major problem.

fa_datapoints.PNG

Comments

Very interesting, but can we get error bars on the plots and/or error estimates on the regression coefficients? It'd be interesting to know those to get a feel for how to interpret these numbers.

Sky, did you yourself test the linearity of WAR and salary? I hope you do. And I believe the slope of your Free Agent regression line is so flat because you're using actual WAR as opposed to predicted WAR, no? Therefore, high paid players will often get injured and total 0 WAR while low paid players will rarely total 5 WAR. Seems like you have the data to do some really solid work with predicted WAR and actual salary.

Hi Jeremy, Here is the data so you can see the linearity for yourself. Seems linear to me, and it makes sense to me theoretically as well (until you get really extreme).

For others interested, there's a rip-roaring discussion about this over at the Book Blog.

Could we get an R^2 value for your linear fit? It seems like it'd be easy enough to see if the data really is linear (say, check the R^2 value of a second order polynomial and how it compares to the linear R^2, or an exponential, etc).