Touching Bases
May 07, 2009
Findings from the Free Agent Market
By Jeremy Greenhouse

Curt Flood really started something with this whole free agency thing, huh? Using ESPN’s Free Agent Tracker, I collected data for all free agents since 2006 and used regression analysis to pick up on some trends.

WAR to Wages

This offseason, Fangraphs unveiled its Wins Above Replacement measure in the Value section of its stats pages. WAR is a statistic that combines offensive, defensive and positional value and sets it against a replacement-level baseline to find the marginal wins a player contributes to his team. There has been debate over how to convert these marginal wins into a marginal value in terms of dollars. One of the first things I looked at was whether the relationship between WAR and salary was linear or nonlinear. I plotted the WAR from each free agent's contract year—excluding those who were injured all year or who came over from Japan—against the average annual value of the contract they signed.


I admit that before having seen any data, I had a bias toward the nonlinear relationship, since it just makes intuitive sense to me.

The regression lines look rather similar. It would appear that the nonlinear regression has an advantage at the extremes, since it won’t predict negative salaries for very negative WAR and it better captures the disproportionate value of superstar players. However, there is little difference between the regression lines for the vast majority of players, those between 0 WAR and 5 WAR. The R² values, which measure the percentage of the variance in Average Annual Value that is explained by WAR, are similar, in an impressive .62-.64 range. This affirms that a single year of WAR captures a lot of a player’s value. Keep in mind when looking at these R² values that R² never decreases when a new term is added to a polynomial equation, so we definitely cannot draw any conclusions about either method from this graph alone.
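The R² caveat can be made concrete with a minimal Python sketch. It is illustrative only: it does not use the free agent data set, the `war` and `aav` arrays are synthetic stand-ins drawn from a made-up linear relationship, and the helper names are hypothetical. The point is that a quadratic fit cannot score a lower R² than a linear fit on the same data, so adjusted R², which penalizes the extra term, is the fairer yardstick.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Synthetic stand-in data (NOT the article's data): AAV in millions of dollars.
rng = np.random.default_rng(0)
war = rng.uniform(-1.0, 7.0, size=200)
aav = 1.0 + 2.3 * war + rng.normal(0.0, 2.5, size=200)

# Fit a straight line and a second-order polynomial to the same points.
lin_coefs = np.polyfit(war, aav, deg=1)
quad_coefs = np.polyfit(war, aav, deg=2)

r2_lin = r_squared(aav, np.polyval(lin_coefs, war))
r2_quad = r_squared(aav, np.polyval(quad_coefs, war))
adj_lin = adjusted_r_squared(r2_lin, len(war), 1)
adj_quad = adjusted_r_squared(r2_quad, len(war), 2)

print(f"linear:    R2={r2_lin:.4f}  adjusted={adj_lin:.4f}")
print(f"quadratic: R2={r2_quad:.4f}  adjusted={adj_quad:.4f}")
```

If the adjusted values are close, as the raw values are in the graph above, the extra term is not buying much.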

Time 100’s own Nate Silver, in deriving Marginal Value Over Replacement Player, used a nonlinear form of WARP. I have duplicated his graph here, which projects WARP for 2005's free agent class by using three years of WARP from 2002-2004 instead of the one previous year of WAR I used for 2006-2008 free agents. I have superimposed a rough line of best fit to portray the difference between a linear and nonlinear model.


The thinking behind a nonlinear model is that there is an abnormal distribution of talent in baseball, which makes top talent disproportionately more valuable than average talent.

Phil Birnbaum shows that individual skills in the major leagues may be normally distributed. Anecdotally, this is reaffirmed by the 20-80 scouting scale, which is based on a normal distribution with a mean of 50 and standard deviation of 10. Furthermore, Tom Tango shows that “when you consider the number of opportunities each player gets (in the Major Leagues), the total effective talent distribution is rather typical.”
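As a rough illustration of what the 20-80 scale implies, the sketch below converts a scouting grade into a z-score and a percentile, assuming grades really do follow a normal distribution with mean 50 and standard deviation 10. The function names are hypothetical, and real scouting grades are judgment calls rather than exact draws from this curve.

```python
import math

SCALE_MEAN = 50.0  # a grade of 50 is major league average
SCALE_SD = 10.0    # each 10 points is one standard deviation

def grade_to_z(grade):
    """Number of standard deviations a grade sits above the mean."""
    return (grade - SCALE_MEAN) / SCALE_SD

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grade_to_percentile(grade):
    """Share of the assumed talent pool at or below this grade."""
    return normal_cdf(grade_to_z(grade))

for grade in (20, 35, 50, 65, 80):
    print(grade, f"{grade_to_percentile(grade):.4f}")
```

Under this assumption an 80 grade is three standard deviations above average, which is why such grades are handed out so rarely.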

However, when observing only the Major Leagues, we neglect the fact that most subpar baseball talent resides at another level. There is an abundance of freely available talent that could provide marginal upgrades to current Major Leaguers. What this means in terms of player value is that below-average players will be disproportionately underpaid compared to above-average players due to the difference in the supply within each pool.

Bill James once wrote “talent in baseball is not normally distributed. It is a pyramid. For every player who is 10 percent above the average player, there are probably twenty players who are 10 percent below average.” I believe this theory holds if by baseball he means the total baseball universe and by average he means the Major League average. So, Tango may be right that, at the Major League level, talent follows a normal distribution, but when we add talent from all player pools, the curve does begin to look like the right tail of a normal distribution.

Think of it this way: would you rather have the right side of the Cardinals’ infield or the Reds’ infield? The combinations of Albert Pujols/Skip Schumaker and Joey Votto/Brandon Phillips will both produce 8 WAR, give or take. Through the currently dominant model for fair-market evaluation, both sets of players are worth some $35 million if you simply multiply their WAR by $4-5 million. But my intuition tells me that I'd rather have the pair on the Cardinals. The key is that Pujols takes up only one roster spot and provides the same value as a pair of players who take up two. I might be able to upgrade over Schumaker on the cheap eventually. We also must account for the fact that freely available talent is, well, free, while the superstars who bring in 5+ WAR will need to be acquired through trading or bidding.
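The arithmetic behind the linear model is simple enough to sketch. In the snippet below, the $4.5 million per win figure is just the midpoint of the $4-5 million range, and the individual WAR splits (7 and 1 versus 4 and 4) are hypothetical round numbers chosen to sum to 8, not the players' actual figures. The sketch shows that a purely linear valuation cannot tell the two pairs apart.

```python
DOLLARS_PER_WIN = 4_500_000  # assumed midpoint of the $4-5 million range

def linear_value(war, dollars_per_win=DOLLARS_PER_WIN):
    """Fair-market value under the linear model: WAR times dollars per win."""
    return war * dollars_per_win

# Hypothetical splits of 8 WAR across two roster spots.
star_and_role_player = [7.0, 1.0]
two_solid_regulars = [4.0, 4.0]

star_total = sum(linear_value(w) for w in star_and_role_player)
solid_total = sum(linear_value(w) for w in two_solid_regulars)

print(f"star + role player: ${star_total:,.0f}")
print(f"two solid regulars: ${solid_total:,.0f}")
print("linear model sees a difference:", star_total != solid_total)
```

Any preference for the star-heavy pair therefore has to come from something the linear model leaves out, such as the scarcity of roster spots or the cost of acquiring a superstar.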

Furthermore, I found statistically significant evidence that the Type A tag for free agents is correlated with increased pay. In a practical sense, the Type A label decreases a player's value in a free market since it costs prospective teams a first-round pick to acquire the player or the label costs the player in leverage if he tries to re-sign with his former team. However, Type A free agents tend to be the best players in my sample, so it is evident that teams ignore the Type A tag and are willing to spend what it takes to reel in superior players.

Separating position players and pitchers, I find that it is much easier to predict position players' salaries in general, and the nonlinear regression fits better for position players than it does for pitchers. In separating the two pools of players, I decided to test for some skills that do not translate into a hitter’s or pitcher’s WAR, but still might directly relate to his salary.

General Managers dig the fastball

Fangraphs keeps track of pitch usage and velocity for all pitchers since 2002, and all the data can be easily exported to a spreadsheet. This is a good thing for baseball analysts. Dave Allen and Dan Turkenkopf both used PITCHf/x data to show how velocity relates to production. In these regressions, I account for a player’s WAR, and therefore can try to isolate the effect of a pitcher’s fastball velocity on his salary. Here is the regression output.

      Source |       SS       df       MS              Number of obs =     149
-------------+------------------------------           F(  4,   144) =   62.82
       Model |  1.7252e+15     4  4.3131e+14           Prob > F      =  0.0000
    Residual |  9.8863e+14   144  6.8655e+12           R-squared     =  0.6357
-------------+------------------------------           Adj R-squared =  0.6256
       Total |  2.7139e+15   148  1.8337e+13           Root MSE      =  2.6e+06
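------------------------------------------------------------------------------
         aav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------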
         WAR |    2399138   153233.6    15.66   0.000      2096260     2702016
         fbv |   164514.8   72588.22     2.27   0.025     21038.76    307990.9
          o7 |  -423055.5   545027.9    -0.78   0.439     -1500344    654233.1
          o8 |   -1365307   508682.7    -2.68   0.008     -2370757   -359857.4
       _cons |  -1.19e+07    6496299    -1.83   0.069    -2.47e+07    954444.2

This means that there is statistically significant evidence that fastball velocity (fbv) contributes to a pitcher’s salary. For every additional mile per hour on his fastball, a pitcher is paid about $165,000 more per year, holding WAR constant. Fastball velocities typically range from 85 MPH to 95 MPH, so if two players were to put up the same WAR, but one was a soft tosser and the other a flamethrower, teams in an auction may bid up to a couple million dollars more for the harder thrower, based on that skill alone.
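The size of that velocity premium can be worked out directly from the fbv row of the output above. The sketch below multiplies the point estimate and the two ends of its 95% confidence interval by a 10 MPH gap; the function name is hypothetical, and the calculation assumes the effect is linear across the whole 85-95 MPH range.

```python
# Values copied from the fbv row of the regression output (dollars per MPH).
FBV_COEF = 164514.8
FBV_CI_LOW = 21038.76
FBV_CI_HIGH = 307990.9

def velocity_premium(fast_mph, slow_mph, dollars_per_mph=FBV_COEF):
    """Extra AAV predicted for the harder thrower at equal WAR."""
    return (fast_mph - slow_mph) * dollars_per_mph

premium = velocity_premium(95.0, 85.0)
premium_low = velocity_premium(95.0, 85.0, FBV_CI_LOW)
premium_high = velocity_premium(95.0, 85.0, FBV_CI_HIGH)

print(f"point estimate: ${premium:,.0f}")
print(f"95% interval:   ${premium_low:,.0f} to ${premium_high:,.0f}")
```

The interval is wide, so the premium for a flamethrower over a soft tosser could plausibly be anywhere from a couple hundred thousand dollars to about three million per year.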

I created two player pools, separating those with above-average fastball velocities and those with below-average fastball velocities. The average fastball in my sample of 149 pitchers travels 89.7 miles per hour. The WAR of both player pools is nearly identical, as the harder throwers average .97 WAR compared to .96 WAR for the softer throwers. Yet the harder throwers earned $4.9 million per year in free agency compared to $4.2 million for the latter group. Perhaps fastball velocity predicts future performance, or perhaps there is an allure to signing a player who can light up the radar gun, or maybe fans come out to see fast pitchers. No matter the case, throwing hard gets you paid.

I also included time fixed effects in this regression, setting dummy variables to represent the year in which the pitcher became a free agent. We find statistically significant evidence of deflation in 2008. While 2006 and 2007 appear stable in terms of free agent salaries, pitchers with similar production in 2008 were liable to lose on average about $1.4 million per year on their contracts because they hit the market at the wrong time.
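To see how the year dummies enter a prediction, here is a small sketch that plugs the reported coefficients into the regression equation. It assumes that o7 and o8 are the dummies for the 2007 and 2008 free agent classes, with 2006 as the baseline; that reading fits the output but the variable definitions are not spelled out. The constant is only shown to three significant digits in the output, so predicted salary levels are approximate, while the gap between years depends only on the dummy coefficients.

```python
# Coefficients copied from the pitcher regression output.
PITCHER_COEFS = {
    "WAR": 2399138.0,
    "fbv": 164514.8,
    "o7": -423055.5,    # assumed: 1 for the 2007 class, else 0
    "o8": -1365307.0,   # assumed: 1 for the 2008 class, else 0
    "_cons": -1.19e7,   # rounded in the printed output
}

def predict_pitcher_aav(war, fbv, year):
    """Predicted average annual value in dollars for a free agent pitcher."""
    if year not in (2006, 2007, 2008):
        raise ValueError("the regression only covers 2006-2008")
    o7 = 1.0 if year == 2007 else 0.0
    o8 = 1.0 if year == 2008 else 0.0
    return (PITCHER_COEFS["_cons"]
            + PITCHER_COEFS["WAR"] * war
            + PITCHER_COEFS["fbv"] * fbv
            + PITCHER_COEFS["o7"] * o7
            + PITCHER_COEFS["o8"] * o8)

# Same hypothetical pitcher (2 WAR, 90 MPH fastball) in two different markets.
aav_2006 = predict_pitcher_aav(2.0, 90.0, 2006)
aav_2008 = predict_pitcher_aav(2.0, 90.0, 2008)
print(f"2006: ${aav_2006:,.0f}  2008: ${aav_2008:,.0f}")
print(f"cost of hitting the 2008 market: ${aav_2006 - aav_2008:,.0f}")
```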

General Managers dig the longball

By longball, I don’t mean home runs. I mean actual distance. From Hit Tracker, I included the average true distance in feet of home runs for all players in my dataset. I also included the weight of each player in pounds, which might measure raw power or might measure nothing, but was significant in the regression. Unfortunately, weight is also probably the least accurate data point I could use since there are no reliable sources for it.

      Source |       SS       df       MS              Number of obs =     169
-------------+------------------------------           F(  3,   165) =  123.05
       Model |  2.5996e+15     3  8.6653e+14           Prob > F      =  0.0000
    Residual |  1.1620e+15   165  7.0421e+12           R-squared     =  0.6911
-------------+------------------------------           Adj R-squared =  0.6855
       Total |  3.7616e+15   168  2.2390e+13           Root MSE      =  2.7e+06
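------------------------------------------------------------------------------
         aav |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------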
         WAR |    2256088   125521.3    17.97   0.000      2008253     2503923
        true |   28062.52   13259.32     2.12   0.036     1882.712    54242.32
      weight |    24497.9   10709.87     2.29   0.023     3351.842    45643.95
       _cons |  -1.49e+07    4881150    -3.05   0.003    -2.45e+07    -5253468

These measures are essentially independent of WAR but do affect salary. I believe home run distance and weight are actually capturing a known phenomenon: salary correlates more strongly with slugging percentage than with almost any other basic statistic. Weight and True Distance correlate very well with slugging percentage. We can say with confidence that there is a bias toward heavier players who hit for power, all else being equal. For every ten pounds of weight or ten feet in home run distance, a hitter can expect a positive return averaging around 250 grand.
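The "around 250 grand" figure comes straight from the coefficients in the output above. As a quick check, the sketch below scales the true distance and weight coefficients by ten units each; the helper name is hypothetical, and the calculation holds WAR and the other variable fixed.

```python
# Coefficients copied from the hitter regression output (dollars per unit).
HITTER_COEFS = {
    "true": 28062.52,   # per foot of average home run distance
    "weight": 24497.9,  # per pound of listed weight
}

def salary_premium(variable, change):
    """Predicted change in AAV for a given change in one variable."""
    return HITTER_COEFS[variable] * change

ten_feet = salary_premium("true", 10.0)
ten_pounds = salary_premium("weight", 10.0)

print(f"ten extra feet of distance: ${ten_feet:,.0f}")
print(f"ten extra pounds of weight: ${ten_pounds:,.0f}")
```

Both land between roughly $245,000 and $281,000 per year, which is where the quarter-million shorthand comes from.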

This is not to say whether paying these players more for the ability to throw fast or hit long home runs is efficient or not. I did this analysis to observe trends in the market over the last few years, and I am not trying to comment on any sort of inefficiencies that may exist.

Thanks to all the data sources I used in this study, including ESPN, Fangraphs, Hit Tracker, Forbes, and Fantasypitchfx.

Edit: At Jake's request, I have separated the data series by year and added separate trendlines for each year.



Easy suggestion: use "millions of dollars" instead of just dollars. It will just make it easier on the eyes.

Why did you regress against the previous year's WAR instead of some sort of projection of next year's WAR? For example, Marcel's 5/3/2 weighting of the last 3 years seems like a sensible choice.

The criticism of regressing salary against WARP was that WARP had a very low replacement level, which made the resulting line have a parabolic shape. Tango's opinion was that with a properly constructed WAR (Fangraphs uses his research, I believe) a linear relationship was better.

Alex, thanks for the suggestion. I'll try that out for the graph.

Anonymous, I agree that creating a projection would have been a better method, and a 5/3/2 weighting as Silver and Marcel did would have been easy. It was just that I collected data a couple months ago and when I decided to use it for these purposes I was too lazy to go back and find players' previous years' worth of WAR. Not really any excuse.

I am aware of the criticism of regressing salary against WARP. I am not sure which relationship is "better," but a parabolic shape is inevitable when using a second-order polynomial equation, as Silver did, and I repeated. A linear shape appears to be less complicated, which is a plus, and seems to get similar results to more complicated models for 90% of players, which is also a point in its favor.


In the first chart, I might make a suggestion for the data visualization. Since you lumped several years of contracts into one graph, it misses a chance to explain a whole lot more. A new chart (same basic format) with the contracts grouped by the time of signing would allow us to distinguish year-by-year patterns. I see the chart is from Excel, so it would be quite painless to assign different groups different colored dots.

The external environment makes a huge deal as we saw this past free agent period.

The correlation between home run distance and fast ball speed and salary is not surprising. For projecting future performance, a player who hits the ball farther or throws faster would appear less likely to regress and have a greater possibility of breaking out than the weaker, but equally valuable player.

Doug, agreed. The important thing is that the numbers bear that out.



I have a suggestion for the first chart: if you group the scatter points by years, you can get Excel to plot the groups in different colors.

This would add a lot to the graph, there is a lot going on and being able to see the patterns based on the years would be very helpful to the message you are trying to get across.

As we saw this free agent period, the external market factors can have a big play on salaries.

Jake, thanks for the suggestion. I don't currently have the data in front of me, but I'll try playing around with it when I do.

Great article on the height of Major Leaguers. I tested for height but nothing significant came up in its relationship with salary. Maybe I should have broken it down into tall, average, and short players since the relationship with height and production isn't linear apparently.

Oops on the double post. I figured 5 hours after I'd originally posted the comment and more people had been put up there that mine had failed somehow.

Would love to see the chart again after you make whatever adjustments to it.

Appreciate the comment on my article. I'm not surprised height wasn't significant. It hasn't been shown to be a significant influence on any performance metric I'm aware of either. And it's not a case of not being 95% sig, it's not even been close. I've been looking at that question a while now. Plus the data doesn't have enough variation in it because 95% of MLB pitchers are between 6'0" and 6'8". So when we see outcomes that persist in this fashion, I have to chalk it up to natural selection telling us something.