Behind the Scoreboard
January 18, 2010
The Value of a Good Farm System
By Sky Andrecheck

Baseball America's farm system rankings are one of the most respected rankings of a club's minor league talent around. Since 1984, they've been rating and ranking minor league systems in terms of their potential for major league impact. In this post, I try to determine just how much of an impact a team's farm system has on future performance.

Recently, Baseball America came out with its December farm system rankings. It had the Houston Astros dead last, while the Rangers were ranked #1. If you're a Rangers fan, you might be smiling ear to ear, believing that the Rangers, who were also ranked #1 in 2009, are poised for a long-term dynasty. Meanwhile, Astros fans might despair, knowing that good young talent is not on the way.

But really, how predictive are these rankings? Does a good ranking actually lead to future success? If so, just how much?

To test this, I obtained Baseball America's organizational rankings from 1984-2010. I first transformed the rankings into ratings, assuming that teams' minor league talent was normally distributed. This reflects the likely reality that the difference between having the #13 and #17 farm system is pretty small, but the difference between the #1 and #5 farm system is quite large. Transforming the rankings into normally distributed scores (which range from about -2.1 to 2.1) captures this nicely.
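As a rough sketch of that transformation: the post doesn't say exactly how ranks were mapped to percentiles, so the `(rank - 0.5) / n` convention below is an assumption, as is the function name `rank_to_rating`. With 30 teams it gives scores running from about 2.1 down to about -2.1, which matches the range quoted above.

```python
from statistics import NormalDist

def rank_to_rating(rank, n_teams=30):
    """Map a farm-system rank (1 = best) to a standard-normal score."""
    # Assumed convention: place each rank at the middle of its slice of the
    # distribution, so rank 1 of 30 sits at roughly the 98.3rd percentile.
    percentile = 1 - (rank - 0.5) / n_teams
    return NormalDist().inv_cdf(percentile)

for rank in (1, 5, 13, 17, 30):
    print(f"#{rank:>2}: {rank_to_rating(rank):+.2f}")

# Compare a gap near the top of the rankings with one in the middle.
top_gap = rank_to_rating(1) - rank_to_rating(5)
mid_gap = rank_to_rating(13) - rank_to_rating(17)
print(f"#1 vs #5 gap: {top_gap:.2f}   #13 vs #17 gap: {mid_gap:.2f}")
```

Under this assumption the gap between #1 and #5 comes out roughly three times the gap between #13 and #17. The `n_teams` parameter is there because the league had fewer than 30 teams in the earlier years of the sample.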

I then used statistical regression to find the relationship between Baseball America ratings and team winning percentage. Doing a simple, single-term linear regression, it appears that the Baseball America rankings have predictive power for many years forward. One year's Baseball America ranking has a statistically significant effect on winning percentage for each of the next 8 years. As you would expect, those with higher rankings will tend to do better. If the only information you have is a team's 2010 Baseball America ranking, you would predict that a team with good rankings now will have an advantage come 2018.

But of course, we have more information than that. To really get at the heart of the matter, we need to take into account potential confounding variables. We can do this by using a multiple regression. To predict the next year's WPCT, the significant factors were:

a) WPCT from last year
b) WPCT from two seasons ago
c) Salary from this season
d) Salary from last season
e) Market size

Now, to test the effect of farm systems, we can add in the Baseball America rankings data. When we do, we get an interesting, yet difficult to interpret model, the results being the following:

ba1.PNG
*market size was also transformed from a ranking to a normalized rating
**salary variables were expressed as a ratio of team salary to league-average salary
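For readers who want to see the shape of such a model, here is a minimal sketch of an ordinary least squares fit with these predictors. Everything numeric in it is invented: the rows are random toy data, `TRUE_COEFS` are arbitrary values chosen only so the fit has something to recover, and the column names are hypothetical. It is not the data or the result from this post.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical column names and arbitrary coefficients (toy values only).
COLUMNS = ["intercept", "wpct_prev1", "wpct_prev2", "salary_now",
           "salary_prev", "market", "ba_rating"]
TRUE_COEFS = [0.25, 0.30, 0.10, 0.08, -0.04, 0.003, 0.010]

rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [1.0,
           rng.uniform(0.35, 0.65),  # last year's WPCT
           rng.uniform(0.35, 0.65),  # WPCT two seasons ago
           rng.uniform(0.5, 1.8),    # this year's salary / league average
           rng.uniform(0.5, 1.8),    # last year's salary / league average
           rng.gauss(0, 1),          # market size as a normal score
           rng.gauss(0, 1)]          # BA rating as a normal score
    X.append(row)
    # No noise is added, so the fit should recover TRUE_COEFS up to rounding.
    y.append(sum(c * v for c, v in zip(TRUE_COEFS, row)))

fitted = ols(X, y)
for name, coef in zip(COLUMNS, fitted):
    print(f"{name:>12}: {coef:+.4f}")
```

Real team-seasons are noisy and correlated from year to year, so an actual fit would also need standard errors, which this sketch leaves out.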

Clearly the salary and previous winning percentage variables are the main predictors of a team's success in a season, with market size close to significant. Less clear are the Baseball America rankings, which don't show a clear pattern. The years with most predictive power are the rankings from the previous season and from four seasons ago. Rankings from two years ago and from seven years ago show some predictive power, but not a lot. Meanwhile the other years show very little predictive power, with the effect being negative in some years.

The reason for this volatility, of course, is that the sample size is fairly small, so the estimates are not all that accurate. While using these weights would give the best fit, it doesn't seem to make sense that a BA ranking from one or four years ago would have much more predictive value than the BA ranking from two or three years ago. What does appear clear, however, is that rankings from the previous four years combined have a pretty strong correlation with WPCT, while rankings from before that time, on the whole, don't really have much effect.

My imperfect solution, then, is to put the average of the previous four years of BA ratings into the model. When I do this, I get the following result.

ba2.PNG

Overall, the values of the other terms are relatively unchanged, but we get a nice, highly significant result for the Baseball America rankings. What does it all mean? A team ranked as the #1 farm system in each of the previous four years would get the maximum Baseball America score of 2.1. Multiplying 2.1 by .0155 means that it would be expected to add about .033 points to its WPCT in the next season. Over a 162-game season, that translates to about 5.3 wins. Now five wins is nothing to sneeze at, but it's also not an enormous factor. Teams with weak farm systems do take a hit in future production, but it's certainly not insurmountable. The Astros, ranked last now for three consecutive years, figure to take a hit of 3.3 wins in 2010 and 4.4 wins in 2011. While that's certainly not desirable, there's no reason they still can't compete in the coming years, despite a poor farm system.
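The arithmetic can be sketched in a few lines. The .0155 coefficient and the 162-game season come from the calculation above; the `(rank - 0.5) / n` rank-to-score convention and the function names are assumptions made for illustration.

```python
from statistics import NormalDist, mean

GAMES = 162        # games in a regular season
BA_COEF = 0.0155   # coefficient on the four-year average rating, as quoted above

def rating(rank, n_teams=30):
    # Assumed rank-to-normal-score convention (not specified in the post).
    return NormalDist().inv_cdf(1 - (rank - 0.5) / n_teams)

def farm_wins(ranks_last_four, n_teams=30):
    """Expected next-season win boost from the previous four BA ranks."""
    avg_rating = mean(rating(r, n_teams) for r in ranks_last_four)
    return avg_rating * BA_COEF * GAMES

best = farm_wins([1, 1, 1, 1])
worst = farm_wins([30, 30, 30, 30])
print(f"Four straight #1 rankings:  {best:+.1f} wins")
print(f"Four straight #30 rankings: {worst:+.1f} wins")
```

Because the boost is driven by a four-year average, a single #1 ranking surrounded by middling years moves the estimate much less than four straight top rankings would.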

The model can be extended to predict values further into the future as well. Using only known WPCTs, salaries, market size, and Baseball America rankings, we can build models for years down the road. For instance, using only known 2010 variables, how many wins does the #1 farm system provide in 2015? The models show that being the best farm system in 2010 correlates to about 4 extra wins in 2015.

The Rangers should feel good, but not get too overconfident, despite having the #1 system in both 2009 and 2010. They were ranked only #27 in 2008 and #15 in 2007. What do the models show the Rangers' farm system producing over the next several years? The models predict the following boost in wins:
2010: 1.2 games
2011: 2.6 games
2012: 5.2 games
2013: 5.6 games
2014: 4.9 games
2015: 3.8 games
2016: 3.3 games
2017: 3.1 games
2018: 2.1 games

Since the Rangers' system was rated #27 as recently as 2008, the expected farm impact in 2010 is small. However, the impact increases dramatically starting in 2012. Overall, over the next 9 years, the Rangers' farm system will likely net them 31 extra wins, meaning that while their system won't have a huge effect in any one particular year, it's likely to have a strong impact on the Rangers franchise over the next decade.

How about their Texas counterpart, the Houston Astros? For them, the 9-year outlook looks as follows:
2010: -3.3 games
2011: -4.4 games
2012: -6.0 games
2013: -5.6 games
2014: -4.9 games
2015: -3.8 games
2016: -3.3 games
2017: -3.1 games
2018: -2.1 games
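As a quick check on the nine-year totals, the yearly figures for both teams can simply be summed. The numbers below are copied from the two lists above; the variable names are just for illustration.

```python
# Projected yearly farm-system effect in wins, 2010 through 2018.
rangers = {2010: 1.2, 2011: 2.6, 2012: 5.2, 2013: 5.6, 2014: 4.9,
           2015: 3.8, 2016: 3.3, 2017: 3.1, 2018: 2.1}
astros = {2010: -3.3, 2011: -4.4, 2012: -6.0, 2013: -5.6, 2014: -4.9,
          2015: -3.8, 2016: -3.3, 2017: -3.1, 2018: -2.1}

rangers_total = sum(rangers.values())
astros_total = sum(astros.values())
gap = rangers_total - astros_total

print(f"Rangers, {len(rangers)} seasons: {rangers_total:+.1f} wins")
print(f"Astros,  {len(astros)} seasons: {astros_total:+.1f} wins")
print(f"Gap between the two systems: {gap:.1f} wins")
```

The sums come to about +31.8 and -36.5, so the roughly 31 wins quoted for the Rangers is that total rounded down.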

For the Astros, it's nearly the opposite situation. Their farm system projects to cost them over 36 games over the next nine years. So, is the difference between the Rangers and Astros farm systems really 67 wins over the next nine years? It would appear that way, although there are some caveats. For one, the year-to-year farm system rankings are correlated with one another, so the fact that the Rangers have a good farm system now is also indicative that they will have a good system in the future. That undoubtedly accounts for some of the large difference in wins. While the Rangers may not still be reaping fruit from their 2010 farm system in the year 2018, the fact that they have a good farm system now bodes well for their future farm systems, and hence their future major league teams.

Another factor to consider is how teams go about team-building. The fact that the Rangers have a good farm system means that they may be in strong contention in the next few years. With the team blossoming, this may spur the front-office to go out and sign free agents to supplement the team. Thus, the wins the future free agents provide are also correlated with the Rangers having a good farm team. While the Rangers may win more because of the free agents, this boost (reflected in these numbers) is not necessarily a direct product of having a good farm system in 2010.

For these reasons, I would hesitate to put a dollar value on having the #1 farm system in baseball vs. the #30 farm system in baseball - at least using this analysis. There are too many potential confounding variables here, such as the ones I mentioned above. Still, if you are a fan, it matters little where your team's wins are coming from. Rangers fans really do have a reason to be smiling. While a handful of wins each year may not have a major impact, 30 wins over the next 9 seasons is a significant force. Whether the Rangers can parlay those wins into championships remains to be seen.

The following graph shows some trajectories for some of the more extreme teams in the league:

ba3.PNG

The results are also a testament to the accuracy and relevance of the Baseball America organizational rankings. While obviously a #1 ranking doesn't guarantee championships, the ranking is a significant predictor of major league wins far into the future. Kudos to Baseball America for doing these rankings. Their reputation is well-deserved.

Comments

This is a great piece. I would love to see this idea applied to previous seasons to see how closely the expected jump in wins correlates to actual wins. Obviously there are factors independent of the farm system which contribute to wins and losses, but it would still be interesting. I know I will be following the standings at the end of this upcoming baseball season to see if the Rangers come out 1 or 2 wins higher this year compared to last year.

Interesting piece...A few questions:

-Are you using OLS? If so, did you explore other options (e.g., beta regression) given that the DV is not continuous? Try it with wins and use Poisson or NB?
-What happens if you change your assumptions regarding normal distribution of rankings? Did you run it straight up? What do those results look like?
-What do the CI's look like for the win predictions?

This result also seems to pass the smell test: a great organization should be able to churn out 85 wins over and above what other competitors should be able to find in average systems and on the free agent market. Works for me.

Thanks for the comments.

The standard errors on predicted WPCT's are quite high due to random error, so don't expect the standings to necessarily match up with the results here.

I did not try other methods besides OLS. I think there are some other methods out there that would analyze this data more efficiently, but this was my first stab at it. Using a hierarchical model or some sort of advanced time-series approach may be useful.

Sky:

If you're just using OLS, then you should be dropping Market and Salary from your models.

Previous and current year salaries will only have an indirect effect through previous and current year WPCT. Likewise, market size will only have an indirect effect via salary and then WPCT.

Unless you're using a hierarchical model that's going to measure the effect of Market on Salary, then Salary on current and previous WPCT, and then those variables along with the farm system on future WPCT, you're just testing the same causal effects multiple times.

Finally, you may consider just sticking with rankings rather than normalizing. Normalization is an additional assumption you're throwing into the model. Nate Silver uses rankings rather than a normalized index in his very successful Secret Sauce model and that works fine.

Very interesting study and results. Thanks for the good job!

How many wins are the Giants looking to add on top from 2010-2018?

Looks like it will peak in 2013-14 about a little more than 3 wins. What does that mean exactly, they will add 3 wins over 2012? (meaning it is cumulative) Or just 3 wins over 2009?

Thanks.

JD, If market size and salary have only an indirect effect (through current WPCT) on future WPCT, then those variables would no longer appear significant when current WPCT is included in the model.

The fact is that salary and market size are significant predictors of future WPCT, ***even when current WPCT is accounted for***. That's why they are still in the model.

If Silver is using straight rankings in an OLS model I would really question his methodology.

What's wrong with straight rankings? Is there evidence that suggests that your assumption of normality is more appropriate? Both work as independent variables in an OLS framework. It is your dependent variable that is violating one of the core assumptions of OLS. (Plus it's really easy to re-run the model and see if there's much of a difference)

OGC, It means that the Giants system will give them a 3 win boost in 2013, also a 3 win boost in 2014. It's not cumulative though, so the farm system helps them the same in both 2013 and 2014 - three wins in each season. Overall the Giants can expect about a total 16 win boost over the next nine seasons due to their farm system.

Sparky & others, due to the Central Limit Theorem, things made up of many small variables (such as the talent on a farm team for an MLB team) will necessarily be normally distributed. That's why I transformed it the way I did. Farm talent isn't linearly distributed like using straight rankings would assume.

Also, WPCT is close enough to being continuous that it's no problem to treat it as such.

I'm not saying that it is/is not normally distributed, I'm just saying you provide no evidence that it is (and that it is simple enough to run it both ways to let us know what the difference is, if any). Similarly, I'm no statistician so I might be off base, but I'm having difficulty understanding how the central limit theorem justifies the assumption that the quality of farm systems is normally distributed.

Sky,

I agree with JD that the conversion to z-scores is an unnecessary assumption. I think the cleanest test would be a non-parametric approach like Spearman's rank correlation. I think SPSS and the like allow you to enter in covariates, so you should be able to run the same exact test as above. It will be more conservative, but I doubt it will change your results.

It's probably worth doing because (I'd argue) the normality assumption is probably wrong. You would want the distribution of farm values (across teams) to be normally distributed. You're right that if any given team's farm value is a sum of independent random variables, that team's farm value distribution will be normal. But that doesn't mean that the distribution will be normal across teams. If there are systematic biases in how talent is distributed between farm systems, then you would expect a very non-normal distribution. But again, my guess is that the more conservative nonparametric tests will support your conclusions nonetheless.

Thanks for the publicity, Sky. I do want to point out that the rankings you linked to on SI.com are NOT Baseball America's rankings, however. They were mine. I don't do BA's farm system rankings myself; those are a collaborative effort. In fact, in that article, I ranked 1-5 and 26-30, but 6-25 were merely grouped in the top half and bottom half, then listed alphabetically. We haven't released our rankings yet, but they are in -- you guessed it -- the Prospect Handbook, which actually ships next week. Thanks for the study, though. We feel good about our process and we're proud of our track record.

Please email me from a regular email.
You were close to stumbling upon one of the most remarkable patterns in baseball. Email me back or start your range a few years earlier, covering Davey Johnson's Mets.

Ted

Sky:

Re: statistical significance. First of all, your Market variable ISN'T statistically significant. Second of all, this does not mean they have independent effects. It could mean that, but it also could mean that there's a high degree of collinearity between those variables. Did you test for this? Third of all, unless you have an understanding of why Market and Salary should have an independent effect on future wins, then you simply shouldn't include them in the model. Period. Do you have a theory on why they would? I'd love to hear it.

Re: Rankings vs. Z-scores. I doubt as well Nate would use rankings in an OLS model, but that wasn't my point. You're assuming that the value of each farm system is normally distributed, even though in most of the years you only have 28 data points. Assuming that any set of data points with n=28 is going to be normally distributed is a very questionable assumption, and you don't have much of a foundation to apply it.

JD,
Ok, I'll agree that market size is only marginally significant. As a result it isn't doing a lot in the model. If you remove it, things will remain pretty much the same either way. I will add, however, that when forecasting further into the future, market size has a stronger effect. Why? A team may overspend or underspend its means in a particular year, but market size is a more permanent measure of a team's ability to attract talent.

As for normality, you have to assume the rankings have some sort of distribution in order to use it in a parameterized model. It's certainly not linear. As defense for normality, take a look at the dollar valuations done at BtB:
http://www.beyondtheboxscore.com/2009/3/27/807059/farm-system-value-rankings
As it turns out, the distribution is.....approximately normal.

Regardless of whether or not the model fulfills the assumptions of the Central Limit Theorem, it doesn’t satisfy the other requirements of Gauss-Markov. Just looking at the output raises red flags. It is conceivable that last year's salary has a negative impact on future winning percentage. One can imagine that teams with high payrolls may be suffering from bloated contracts given to aging players who depress winning percentage. This seems unlikely though (and may be the result of the multicollinearity between this year’s salary and last). I would be shocked if the coefficient was still negative if the model was run excluding this year’s salary.

Amongst other problems, in order for OLS to provide a best linear unbiased estimate, the covariance of the error terms must equal zero (no autocorrelation). Yet that won’t be the case in this example because the error terms will surely be related from year to year as salaries and winning percentage over time are included. There will also be issues of multicollinearity since the right hand side variables are related (e.g. market and salary). The model is also not likely to be homoscedastic where, for example, the marginal return of wins declines as salary rises.

Most importantly, though, the endogenous variable is not independent of the exogenous variables. To see this think of a variable that was excluded from this model: revenue. Winning percentage is almost certainly a function of revenue (not exclusively of course, but it is a variable). Revenue is not perfect because clubs have different debt structure and philosophical constraints, but it is sufficient for this example. Salary captures revenue, but not in its entirety. In a simple model Salary (S) may be a function of R- anticipated revenue as well as actual revenues as the acquisitions may be made through the season, the owner’s wealth (O) and debt (D) with an error term (u).
S = β1 + β2R + β3O + β4D + u

If you plug this into the model (with Market (M)…) then Winning percentage (W):
W = δ1 + δ2M + δ3(β1 + β2R + β3O + β4D + u) + δ4W(t-1) + …

But revenue is certainly a function of winning percentage as more wins tend to increase attendance… Here then revenue could be expressed:
R = π1 + π2M + π3W + …

In this example, where winning percentage is a function of revenue and revenue is a function of winning percentage, there is a simultaneous equation.

There are fixes for many of the problems (autocorrelation, et al) and many software packages have quick tests and even fix them with little effort. However if there isn’t independence between left and right hand side variables then OLS will not provide an efficient and consistent estimate and another method should be used. Most relationships you would wish to examine suffer from similar problems and require methods other than OLS.

James,
Let me first say, that yes, the model is not airtight. There are some other methods that could be used to refine it. That said, it works pretty well overall.

In response to your first point, salary from this year has a positive effect on future WPCT, while salary from last year has a negative effect. This is exactly what we would expect. Teams that have increased their payroll from last year expect to improve. Which team will do better? A) team payroll of $100 million, finished .500 last year, spent $50 million last year. Or B) team payroll of $100 million, finished .500 last year, and spent $200 million last year. The answer is obviously A. That's why the variable is negative. You are correct that if you exclude this year's payroll, the variable will show the opposite effect.

I'll agree I ignored the autocorrelation. However, the main result of doing that is that my standard errors are probably artificially small. Estimates remain the same, and most of the variables have sufficient power to still be significant even after inflating them. Overall, the basic model works, even if there are minor adjustments that could be made.

Sky,
Absolutely, no equation is airtight otherwise it would be an identity and not a model. Furthermore, I wouldn’t be surprised if OLS produced a reasonably accurate model in this case (i.e. consistently high rankings by BA indicating a degree of the value in the farm system that translates into a higher winning percentages in the years to come).

Most of my comment concerned minor points. The predictive power of models with multicollinearity, heteroscedasticity and autocorrelation problems will be fine. However, models are not only supposed to predict but explain. The failure of these assumptions affects the explanatory component of the model. The coefficient should tell us about the impact of each variable in isolation from the others. Autocorrelation, as between past and present salaries, inflates the coefficient of determination and leads to inefficient estimators. The negative coefficient does not fully explain the effect of the past year’s salary without considering the present year’s salary. In terms of prediction this is fine, but not for explanation. Though explanation was less the goal of this analysis than prediction, did you consider for example using percent increase/decrease from last year to the present as the second variable to the current payroll? The covariance of the error terms I’d guess would be zero, and though it might introduce multicollinearity, I suspect it would be a cleaner variable.

My greater concern would be with the dependence of the left and right hand side variables. Predictively, OLS may be incidentally correct (and I’d certainly concede that the probability is high here) even if the model fails to satisfy the assumptions of Gauss-Markov. Personally, I wouldn’t be comfortable interpreting this or similar models 9 years into the future. The Texas Rangers are a good example of the variability and impact of management that would make me uneasy stretching that far, though that really is nit-picking. But if I was trying to quantify the effect with a fine degree of precision with regard to the magnitude and not just direction I would use something other than OLS.

Excuse me, but I had a question I forgot. Did you consider whether or not you were rejecting variables that are in fact statistically significant?

... it doesn't seem to make sense that a BA ranking from one or four years ago would have much more predictive value than the BA ranking from two or three years ago.

I have two hypotheses that might explain this.

1. Contending teams are looking to win now, and when they have a highly ranked system *now* (best expressed by last year's rankings), that implies they have highly regarded minor league prospects, some of whom can be traded for help in the current year. So I wonder whether last year's system ranking correlates better because teams in contention trade prospects for current year help.

2. As for 4 years as opposed to 3, 5, or 6, I'd guess arbitration and cost is a major factor. 4 years after a system is ranked #1, its top prospects should have 2-3 years of MLB experience, enough to be getting very good, but not enough yet to command much salary in arbitration. So for earlier years, your talent hasn't yet matured to the point to optimally help your big league club. But for later years, cost-conscious teams may have already had to trade players they no longer can afford.

Big picture, I think you're right that small sample size is likely the biggest source of the difference in p-values. But I also wonder whether the two effects I describe might well be true.

To test the first hypothesis, I suppose you could break teams up into two groups, based, say, on how close they are to contention at the All Star break. Obviously contending teams would have a better record than non-contenders, so you'd have to adjust for that, but my hypothesis of trading prospects for current year help would predict that in the contending group, there would still be a positive correlation between last year's farm system rankings and this year's winning percentages. I would expect non-contending teams to have a much smaller correlation, or perhaps a negative correlation (teams not in contention are, if anything, more likely to trade away good talent for even more prospects, further weakening their current team). Of course, that makes sample sizes even smaller, and it becomes even harder to draw meaningful conclusions.

Hey Sky,

I didn't see it pointed out in the comments, but the Rangers were #4 in 2008.

This is the closest to a source I can find: http://www.sportsworldny.com/index.php?showtopic=16764

Keep posting stuff like this, I really like it.