Draft Picks and Expected Wins Above Replacement
Last week here at Baseball Analysts, we covered the baseball draft in detail with player interviews, scouting reports, and a live blog of the draft. Each team has high hopes for the players it drafts, hopes that often go unrealized. A great deal of the expectation heaped upon a player is determined by the pick at which he was drafted. Teams understandably expect more out of the #1 overall pick than they do out of a 30th round choice. But how much contribution can a team really expect out of each pick?
This subject has been covered before, over at Beyond the Boxscore, by Hardball Times' Victor Wang, and elsewhere. Here I intend to add to the discussion by fitting a theoretical model that predicts the lifetime win contribution from a particular draft pick. It's no secret that the higher the pick, the more production we can expect from a player, but just what is the difference between, say, the #1 and the #500 pick?
Baseball Reference recently listed all picks in the history of the draft, which provides a handy starting point for this research. I collected all picks from #1 to #50 and then every 25th pick after that. This gave me a database of over 2,500 picks to analyze. I then matched this data with Sean Smith's lifetime Wins Above Replacement (WAR) values. Due to data issues, I used a home-brewed method of calculating WAR for very low-achieving players, but the vast majority of the WAR values come from Sean's actual data.
WAR is probably the best metric out there for assessing a player's total value to major league teams, so I use it as my statistic of interest. I use career WAR rather than WAR over the first six years (pre-free agency), although I think both are probably useful. Since I used career WAR, I had to either make some assumptions about the rest of recent players' careers or throw out a lot of data. I chose to impute the rest of recently drafted players' careers. I assumed that players drafted in 2001 had by now accumulated 50% of their lifetime WAR, and I gradually increased that share for each earlier class, so that the 1996 draft class is assumed to have already earned 100% of its lifetime WAR. Draft classes from 2002 onward were thrown out, since it is too soon to predict those players' career WAR.
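The imputation step can be sketched in a few lines of Python. This is only an illustration: the article says the share rises "gradually" from 50% (2001) to 100% (1996), and the straight-line interpolation below is my assumption, as are the function names.

```python
def completed_share(draft_year, full_year=1996, half_year=2001):
    """Assumed share of career WAR already earned by a draft class.

    Assumption: the share falls linearly from 100% for the 1996 class
    (and earlier) to 50% for the 2001 class. The article only says the
    change is gradual, so the linear shape is a guess.
    """
    if draft_year > half_year:
        raise ValueError("draft classes after 2001 are excluded from the study")
    if draft_year <= full_year:
        return 1.0
    step = 0.5 / (half_year - full_year)  # 10 percentage points per year
    return 1.0 - step * (draft_year - full_year)


def projected_career_war(war_to_date, draft_year):
    """Scale WAR earned so far up to a projected career total."""
    return war_to_date / completed_share(draft_year)


# Example: a hypothetical 2001 draftee with 10 WAR so far projects to 20.
print(projected_career_war(10.0, 2001))
```

Under this scheme a 1999 draftee is treated as having earned 70% of his career value, so his WAR to date is divided by 0.7.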
Fitting A Model
Looking at all of the data gives quite a messy picture. At every pick there are many, many players clustered at the point where WAR equals zero. These players either succumbed to injury, flamed out, or otherwise never made it to the big show. A few players have slightly below zero values, meaning that they made it to the majors but played worse than a replacement player would have. Then, of course, there are the players with positive contributions, ranging from Barry Bonds' 174 Wins Above Replacement to Harold Baines' 40 WAR, down to the many, many Dave Clarks and Franklin Stubbses who made a positive, but quite small, contribution to their teams.
We can clean up this picture by plotting the average WAR at each draft pick, rather than plotting every data point. What we see is below:
As you can see, there is a lot of variability even when looking at the average WAR of each pick. However, you can also see that the data follows a definite curve. There is a major advantage to having the very first pick in the first round vs. having the last pick in the first round (#30 overall). The point where the expected WAR tends to level off also seems to be around the end of the first round of the draft. Mathematically we'd like to fit this curve to a model to get a theoretical valuation of each pick. The data certainly isn't linear, but instead seems to follow a definite power law and can be explained by the following formula:
WAR = a * (selection#)^b, where a and b are the parameters of the model.
Running a non-linear regression, we find those parameters equal to a = 19.8 and b = -0.50. The model fits very well, as you can see from the graph above.
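For readers who want to try a fit like this themselves, here is a minimal pure-Python sketch. It is not the regression routine used for the article. It relies on the fact that, for a fixed exponent b, the best scale a has a closed form, and then runs a golden-section search over b. The data in the example are synthetic, noise-free points generated from the fitted curve itself (with an assumed upper pick of 1,000), used only to check that the routine recovers known parameters.

```python
import math


def best_scale(picks, wars, b):
    """Closed-form least-squares scale a for a fixed exponent b."""
    basis = [p ** b for p in picks]
    num = sum(w * x for w, x in zip(wars, basis))
    den = sum(x * x for x in basis)
    return num / den


def sse(picks, wars, b):
    """Sum of squared errors at exponent b, using the best scale for that b."""
    a = best_scale(picks, wars, b)
    return sum((w - a * p ** b) ** 2 for p, w in zip(picks, wars))


def fit_power_law(picks, wars, lo=-2.0, hi=0.0, tol=1e-9):
    """Fit WAR = a * pick**b by golden-section search over b in [lo, hi].

    Assumes the error is unimodal in b on that interval.
    """
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if sse(picks, wars, left) < sse(picks, wars, right):
            hi = right  # minimum lies in [lo, right]
        else:
            lo = left   # minimum lies in [left, hi]
    b = (lo + hi) / 2
    return best_scale(picks, wars, b), b


# Synthetic check only: picks 1-50 plus every 25th pick up to an assumed
# maximum of 1000, with WAR taken exactly from WAR = 19.8 * pick**-0.5.
picks = list(range(1, 51)) + list(range(75, 1001, 25))
wars = [19.8 * p ** -0.5 for p in picks]
a, b = fit_power_law(picks, wars)
print(round(a, 2), round(b, 2))
```

With real, noisy pick averages the recovered parameters would of course differ from run to run of the data collection. The point of the sketch is only the mechanics of fitting a two-parameter power law.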
What can we learn from it? Plugging the picks into the formula, we see that the #1 overall selection will accumulate an average of about 19.8 WAR over the course of his career. Meanwhile, it drops significantly to 14.0 WAR for the #2 pick. From there it drops rapidly to an expected 6.2 WAR for pick #10 before leveling off at 3.6 WAR for #30, 2.0 WAR for #100, and 0.9 WAR for #500. The model-based approach makes sense because it uses a relationship which both fits the data and matches our preconceived notions that the #1 pick is likely to become an excellent player, followed by a sharp drop-off in value with each successive pick until leveling off.
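As a quick check on those figures, the formula can be evaluated directly. The function name below is mine, and it uses the rounded parameters quoted above, so the output can differ slightly from the article's numbers (pick #10 comes out near 6.3 rather than 6.2, presumably because the published parameters are rounded).

```python
def expected_war(pick, a=19.8, b=-0.5):
    """Expected career WAR for an overall pick under WAR = a * pick**b."""
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    return a * pick ** b


# Evaluate the curve at the picks discussed in the text.
for pick in (1, 2, 10, 30, 100, 500):
    print(f"pick {pick:>3}: {expected_war(pick):.1f} WAR")
```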
Other Factors Affecting Expected WAR
The beauty of a model is that we can add other variables to it to determine whether other factors affect the curve. Going back to the full dataset (which gives the same parameter estimates as using the average-by-pick data), we can add terms to our model to differentiate between college players and high school players, as well as between pitchers and hitters. The model was defined as follows:
WAR = (a + college*c + pitcher*p) * (selection#)^b, where a and b are the usual parameters, c adds to or subtracts from the scale parameter if the player was drafted out of college, and p adjusts the scale parameter if the player is a pitcher.
Others have talked about the wisdom of choosing hitters as well as college players, and here we have a model that backs up this assertion. The results are below:
In formula form we get:
Expected Lifetime WAR = (20.7 + (-8.5 * pitcher) + (4.6 * college)) * selection^(-0.49)
where pitcher is equal to 1 if a player is a pitcher, college is equal to 1 if he is a college player, and selection is equal to the player's overall selection number in the draft.
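The formula above translates directly into a small function. This is a sketch using the rounded coefficients from the formula, and the function and variable names are my own.

```python
def expected_war_by_type(pick, pitcher=False, college=False):
    """Expected lifetime WAR from the extended model.

    Coefficients are the rounded values quoted in the formula:
    base scale 20.7, pitcher adjustment -8.5, college adjustment +4.6,
    exponent -0.49.
    """
    if pick < 1:
        raise ValueError("pick numbers start at 1")
    scale = 20.7 - 8.5 * pitcher + 4.6 * college
    return scale * pick ** -0.49


# Tabulate the four player types at the 1st and 100th overall picks.
player_types = [
    ("college hitter", False, True),
    ("high school hitter", False, False),
    ("college pitcher", True, True),
    ("high school pitcher", True, False),
]
for label, is_pitcher, is_college in player_types:
    first = expected_war_by_type(1, is_pitcher, is_college)
    hundredth = expected_war_by_type(100, is_pitcher, is_college)
    print(f"{label:<20} #1: {first:5.1f}   #100: {hundredth:4.1f}")
```

Because both adjustments act on the scale rather than the exponent, every gap between player types shrinks by the same factor as the pick number grows.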
Here we see a major penalty in WAR for teams choosing a pitcher. If the player is a #1 selection, we would expect a difference of 8.5 WAR between a hitter and a pitcher. Meanwhile choosing a college player is indeed a benefit. The benefit of choosing a college player as the #1 pick amounts to about 4.6 WAR. Both of these numbers of course decrease in proportion to the power law as the draft goes on, so the difference between choosing a high school pitcher and a college pitcher is quite small in absolute terms by the time you get down to the 100th selection in the draft. Below is a pair of charts showing the expected WAR for each type of player at both the 1st and the 100th overall selection.
You can also take a look at a graph of each of the four types of players according to the model. As you can see, the shapes remain the same, with the hitters and college players having a higher expected WAR.
The model given above is just the final model with significant terms. I also tried using parameters for college players and pitchers in the exponent to see if the overall shape of the WAR curve changes depending on the type of player. However, this gave a null result, indicating that the pitchers, hitters, college players, and high schoolers all follow the same basic curve - just that hitters and college players start with a higher win expectation. An interaction term between the pitcher and college parameters also came up null, as did parameters distinguishing between various types of position players.
Overall, this analysis backs up the assertion made by others that college hitters have historically been the best type of player to take on draft day, meaning that sabermetrically minded teams can take advantage of this information (and some have been!). Of course, the more teams that catch on to this trend, the less advantageous taking hitters and college players will be. If all teams were drafting for maximum value with this information, all types of players would eventually have the same expected WAR. However, I don't believe we are at that point yet.
Aside from measuring the effects of drafting pitchers and college players, this study is useful because it fits a nice smooth curve to easily quantify the expected WAR of each pick, allowing teams and fans to know what type of player to expect with each pick using a simple formula. Armed with this information, we can know what to realistically expect from the players recently selected on June 9th.