Draft Picks and Expected Wins Above Replacement
By Sky Andrecheck

Last week here at Baseball Analysts, we covered the baseball draft in detail with player interviews, scouting reports, and a live blog of the draft. Each team has high hopes for the players it drafts - hopes that often go unrealized. A great deal of the expectation heaped upon a player is determined by the pick at which he was drafted. Teams understandably expect more out of the #1 overall pick than they do out of a 30th-round choice. But how much contribution can a team really expect out of each pick?

This subject has been covered before, over at Beyond the Boxscore, by Hardball Times' Victor Wang, and elsewhere. Here I intend to add to the discussion with a theoretical model that predicts the lifetime win contribution from a particular draft pick. It's no secret that the higher the pick, the more production we can expect from a player, but just what is the difference between, say, the #1 and the #500 pick?

Baseball Reference recently listed all draft picks in the history of the draft, which provides a handy reference from which to start this research. I collected all picks from #1 to #50 and then every 25th pick after that. This gave me a database of over 2,500 picks to analyze. I then matched this data with Sean Smith's lifetime Wins Above Replacement (WAR) values (due to data issues I actually used a home-brewed method of calculating WAR for very low achieving players - however, the vast majority of WAR values are from Sean's actual data).

WAR is probably the best metric out there for assessing a player's total value to major league teams, and so I use this as my statistic of interest. I use career WAR rather than WAR over the first six years (pre-free agency), although I think both are probably useful. Since I used career WAR, I had to either make some assumptions about the rest of recent players' careers or throw out a lot of data. I chose to impute the rest of recently drafted players' careers. I assumed that players drafted in 2001 had by now accumulated 50% of their lifetime WAR, with that share rising gradually for earlier classes up to the 1996 draft class, which I assumed had already earned 100% of its career WAR. Draft classes from 2002 onward were thrown out since it is too soon to predict those players' career WAR.
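A minimal sketch of this imputation step follows. The article only says the earned share rises "gradually" between the 2001 and 1996 classes, so the straight-line schedule below is an assumption, and the function names are hypothetical.

```python
def assumed_fraction_earned(draft_year):
    """Share of career WAR assumed to be already earned, by draft class.

    Assumption: a straight line from 50% (2001 class) to 100% (1996 class).
    The article says only that the share rises "gradually".
    """
    if draft_year > 2001:
        raise ValueError("draft classes after 2001 were excluded from the study")
    if draft_year <= 1996:
        return 1.0
    return 1.0 - 0.1 * (draft_year - 1996)


def projected_career_war(war_to_date, draft_year):
    """Scale the WAR earned so far up to an imputed career total."""
    return war_to_date / assumed_fraction_earned(draft_year)


# Hypothetical player: 10 WAR so far, drafted in 2001.
example_projection = projected_career_war(10.0, 2001)
print(example_projection)
```

Under this schedule a 2001 draftee's WAR to date is doubled, while anyone drafted in 1996 or earlier is left unchanged.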

Fitting A Model

Looking at all the data gives quite a messy picture. There are many, many players at every pick clustered at the point where WAR equals zero. These players either succumbed to injury, flamed out, or otherwise never made it to the big show. A few players have slightly below zero values, meaning that they made it to the majors but performed worse than a replacement player would have. Then, of course, there are the players with positive contributions, ranging from Barry Bonds' 174 Wins Above Replacement to Harold Baines' 40 WAR, down to the many, many Dave Clarks and Franklin Stubbses who made a positive, but quite small, contribution to their teams.

We can clean up this picture by plotting the average WAR at each draft pick rather than every individual data point. What we see is below:

war4.GIF

As you can see, there is a lot of variability even when looking at the average WAR of each pick. However, you can also see that the data follows a definite curve. There is a major advantage to having the very first pick in the first round vs. having the last pick in the first round (#30 overall). The expected WAR also seems to level off around the end of the first round of the draft. Mathematically, we'd like to fit a model to this curve to get a theoretical valuation of each pick. The data certainly isn't linear; instead it seems to follow a power law, which can be described by the following formula:

WAR = a * (selection#)^b, where a and b are the parameters of the model.

Running a non-linear regression, we find those parameters equal to a=19.8 and b=-.50. The model fits very well as you can see from the graph above.
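For readers who want to try a fit like this, here is a minimal sketch using only the Python standard library. It is not necessarily the regression routine used for the article: it relies on the fact that for any fixed b the least-squares value of a has a closed form, so only b needs a one-dimensional search (a golden-section search here, which assumes the error has a single minimum in the search range). The function name, the search range, and the synthetic data (generated from the rounded parameters reported above, not the real draft data) are all assumptions.

```python
import math


def fit_power_law(picks, wars, b_lo=-2.0, b_hi=0.0, iters=100):
    """Least-squares fit of war = a * pick**b; returns (a, b)."""

    def best_a(b):
        # For a fixed exponent b, the error-minimizing scale a is closed-form.
        num = sum(w * p ** b for p, w in zip(picks, wars))
        den = sum(p ** (2 * b) for p in picks)
        return num / den

    def sse(b):
        # Sum of squared errors at exponent b, using the best a for that b.
        a = best_a(b)
        return sum((w - a * p ** b) ** 2 for p, w in zip(picks, wars))

    # Golden-section search for the exponent that minimizes the error.
    ratio = (math.sqrt(5) - 1) / 2
    lo, hi = b_lo, b_hi
    for _ in range(iters):
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        if sse(left) < sse(right):
            hi = right
        else:
            lo = left
    b = (lo + hi) / 2
    return best_a(b), b


# Synthetic stand-in data: picks #1-#50, then every 25th pick up to #1000,
# with WAR taken exactly from the reported curve (no noise).
picks = list(range(1, 51)) + list(range(75, 1001, 25))
wars = [19.8 * p ** -0.5 for p in picks]
a_hat, b_hat = fit_power_law(picks, wars)
print(f"a = {a_hat:.2f}, b = {b_hat:.2f}")
```

On real data the points would scatter around the curve, and the same routine could be run on either the per-pick averages or the individual players.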

What can we learn from it? Plugging the picks into the formula, we see that the #1 overall selection will accumulate an average of about 19.8 WAR over the course of his career. Meanwhile, it drops significantly to 14.0 WAR for the #2 pick. From there it drops rapidly to an expected 6.2 WAR for pick #10 before leveling off at 3.6 WAR for #30, 2.0 WAR for #100, and 0.9 WAR for #500. The model-based approach makes sense because it uses a relationship which both fits the data and matches our preconceived notions that the #1 pick is likely to become an excellent player, followed by a sharp drop-off in value with each successive pick until leveling off.
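The plug-in step is easy to reproduce. The sketch below uses the rounded parameters quoted above (a=19.8, b=-.50), so a value can differ from the article's in the last digit where the unrounded fit was used; the function name is hypothetical.

```python
def expected_war(pick, a=19.8, b=-0.50):
    """Expected career WAR for an overall pick number under the power-law model."""
    return a * pick ** b


for pick in (1, 2, 10, 30, 100, 500):
    print(f"#{pick}: {expected_war(pick):.1f} WAR")
```

With b at -.50, expected WAR halves every time the pick number quadruples: pick #4 is worth half of pick #1, and pick #400 half of pick #100.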

Other Factors Affecting Expected WAR

The beauty of a model is that we can also add other variables to the data to determine if other factors affect the curve. Going back to the full dataset (which gives the same parameter estimates as using the average by pick data), we can add terms to our model to differentiate between college players and high school players as well as between pitchers and hitters. The model was defined as the following:

WAR = (a + college*c + pitcher*p) * (selection#)^b, where a and b are the usual parameters, c adds to or subtracts from the scale parameter if the player is a college player, and p adjusts the scale parameter if the player is a pitcher.

Others have talked about the wisdom of choosing hitters and college players, and here we have a model that backs up this assertion. The results are below:

war1.GIF

In formula form we get:
Expected Lifetime WAR = (20.7 + (-8.5 * pitcher) + (4.6 * college)) * selection ^ (-.49)

where pitcher is equal to 1 if a player is a pitcher, college is equal to 1 if he is a college player, and selection is equal to the # overall selection in the draft.
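As a quick sketch, the formula can be evaluated for each of the four player types. The coefficients are the ones given above; the function name and the choice of picks to print are illustrative assumptions.

```python
def expected_lifetime_war(selection, pitcher=False, college=False):
    """Expected career WAR from the fitted model with pitcher and college terms."""
    scale = 20.7 - 8.5 * int(pitcher) + 4.6 * int(college)
    return scale * selection ** -0.49


player_types = {
    "high school hitter": (False, False),
    "college hitter": (False, True),
    "high school pitcher": (True, False),
    "college pitcher": (True, True),
}

for selection in (1, 100):
    for label, (is_pitcher, is_college) in player_types.items():
        war = expected_lifetime_war(selection, is_pitcher, is_college)
        print(f"#{selection} {label}: {war:.1f} WAR")
```

Because both adjustments sit inside the scale term, the gaps between player types shrink by the same selection ^ (-.49) factor as the baseline does.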

Here we see a major penalty in WAR for teams choosing a pitcher. If the player is a #1 selection, we would expect a difference of 8.5 WAR between a hitter and a pitcher. Meanwhile choosing a college player is indeed a benefit. The benefit of choosing a college player as the #1 pick amounts to about 4.6 WAR. Both of these numbers of course decrease in proportion to the power law as the draft goes on, so the difference between choosing a high school pitcher and a college pitcher is quite small in absolute terms by the time you get down to the 100th selection in the draft. Below is a pair of charts showing the expected WAR for each type of player at both the 1st and the 100th overall selection.

war2.GIF

You can also take a look at a graph of each of the 4 types of players according to the model. As you can see, the shapes remain the same, with the hitters and college players having a higher expected WAR.

war3.GIF

The model given above is just the final model with significant terms. I also tried using parameters for college players and pitchers in the exponent to see if the overall shape of the WAR curve changes depending on the type of player. However, this gave a null result, indicating that the pitchers, hitters, college players, and high schoolers all follow the same basic curve - just that hitters and college players start with a higher win expectation. An interaction term between the pitcher and college parameters also came up null, as did parameters distinguishing between various types of position players.

Conclusion

Overall, this analysis backs up the assertion made by others that college hitters have historically been the best type of player to draft, meaning that sabermetrically minded teams can take advantage of this information (and some have been!). Of course, the more teams that catch on to this trend, the less advantageous taking hitters and college players will be. If all teams were drafting with an eye for maximum value with this information, all types of players would eventually have the same Expected WAR. However, I don't believe we are at that point yet.

Aside from measuring the effects of drafting pitchers and college players, this study is useful because it fits a nice smooth curve to easily quantify the expected WAR of each pick, allowing teams and fans to know what type of player to expect with each pick using a simple formula. Armed with this information, we can know what to realistically expect from the players recently selected on June 9th.

Comments

I did a study of the draft long ago, and my point then, I think, still applies now: averages don't mean much when the odds of any one draftee becoming a good major league player are so low. One of the key points is that the vast majority of players don't become good major league players, even high in the draft (though what you do here is very good work on separating out the different types of players).

A low probability of success has huge implications for analyzing and understanding the draft. People see the low bonus a major leaguer got when he was drafted, but with a low probability, you need to account for all the bonuses paid to draftees who never made it when analyzing the true cost of developing a major league star.

With a low probability, it means that losing a draft pick due to signing a free agent is not that big a deal, as that pick most likely will not turn into a good player.

With a low probability, it means that when you lose a good player to free agency, you are probably better off trading him to get known prospects, known quantities, than waiting for the draft for your two picks, which probably don't even add up to a 20% probability of selecting a good player; it's probably closer to 15%.

In addition, I see studies of the draft using averages but never pointing out the implications of that. For example, the average WAR in your study for someone drafted around picks 20-30 overall appears to be around 5. So I started looking for someone with that. Mark Loretta has 18.7, so he's better. Julio Lugo has 13.9. Ah, Felipe Lopez has 4.9, but he'll pass that soon since his career is not over. OK, here is a good one: Terrence Long has a career WAR of 6.3 and it appears his career is over. So, the average player selected in the back of the first round is worse than a Terrence Long. Not too useful, right?

What is more important is figuring out the odds of selecting a good baseball player with each draft pick. My study found that it starts out at less than a coin flip for the very top of the draft and, by the end of the first round, is no better than roughly 10%. Those are horrible odds, but fans go crazy over these first round draft picks as if they are going to become star players. The odds are that they are going to fail miserably at that; they are only lottery tickets, giving hope to the fans but ultimately disappointing them.

FYI, the BP draft study also examined this distinction between HS and College, and found that the difference had lessened over time (FYI: they split their data in half and examined the relative differences).

Also, I have never seen a study examine the market value of hitters vs. pitchers. Using your study (and BP's), one could assume that it is better to draft hitters, since they offer more value, on average. And many fans do.

However, what I have not seen is what the value of that is. Is it relatively harder to find pitchers of equal value to hitters? If so, they are rarer and thus could be more valuable to a team. Even if their average career WAR is lower than a hitter's, if it is harder to find such a pitcher - and since the averages are clearly lower all through the draft, that appears to be so - then a lower WAR value pitcher could be just as valuable as a hitter. It is not like you can field a team of all high WAR hitters; you need to include pitchers in the mix.

Thus, it is interesting that hitters create more WAR than pitchers, but the more pertinent question is whether that is more valuable to the team? There is a balance each team has to have, between the hitters and pitchers, so even if a hitter might be more valuable, you still need pitching.

I think you need to DEFINITELY look only at the WAR prior to them hitting free agency. This is the only thing a team is buying at below market value.

Tango, I absolutely would like to look at the first-six-year data as well, and I probably will do so soon, since as you mention, it is more useful for team management purposes.

However, I also think there is value in looking at career data from a more general fan standpoint of "how good will this player be", regardless of monetary considerations and the current MLB salary structure.

OGC, thanks for your comments. I agree that looking at the distribution of player values within each pick is also important and could be the subject of another study. You make your own point, though, when saying that it's better to take known quantities over draft picks - the average #10 overall pick turns out to be worth only about as much as Felipe Lopez. Of course, some players will be much more valuable than that while others are complete busts.

On your point about hitters vs. pitchers, obviously it's possible to have a glut of one type of player in your system, so I wouldn't advise teams to take a hitter with every single pick, but all else being equal a hitter will provide more value.

"how good will this player be"

Sky, you've got the average first round pick as having 6.4 career WAR. That's one typical MVP season! So, I do not think the question you are asking will have much value to the fan.

Would it be possible to send me (or post) your data, say:
careerWAR, selectionNumber, playerId, playerName

Could you plot, say, the 50th percentile WAR value for each pick, and the 80th%, and the 95th%?

Good suggestions and ideas all around (both here and at the Book Blog). I'll be following up with another article soon.

Per OGC's comments, I think a good way to separate the values of these players might be to look at the probability of each draft pick sustaining above average WAR over his first six years. The way I might do this is to take each of those six years, subtract 2 from the player's WAR, sum the results, and then average across all your players. As has been said, the absolute WAR is only a small fraction of the picture. One guy like Griffey, with a long career and great performance, is going to skew the picture greatly, especially if you're not showing error bars. Another way to break it down might be to show the simple percentage of players that average more than 2 WAR per season over the first six years, with maybe the percentages averaging more than 1 and 3, to give a glimpse of the distribution.

Also, I didn't notice an R^2 term for your fit? On that note, in jumping from the 50th pick to the 75th, you may be dragging down the curve too far too soon. It seems possible the 50th-75th picks might remain around 3-4 WAR. Though I fully understand the reasons for making this jump.

Just some ideas for those that actually want to go through the analysis. But really cool stuff, and a great start.