(Hint: It's Not What You Think)
Years ago, fresh out of graduate school, I was hired by Data Resources, Inc., Otto Eckstein’s econometric consulting firm. Our green, doe‐eyed class went through DRI’s training program and steeped ourselves in academic literature on building time‐series regression models.
Yet after building dozens of models and seeing which ones forecast well and which did not, I reached a surprising conclusion: models that would pass even the most rigorous academic scrutiny were not necessarily the ones that produced the best forecasts. In fact, some models that appeared weak from an academic perspective and could never get published were forecasting champs. After examining a number of models like this, I realized that one thing separated winners from losers. More surprisingly, this key factor did not appear in any academic textbook. What was it? Coefficient elasticities. Models whose coefficients implied reasonable elasticities performed well; models whose elasticities were too large or too small did not.
Years later, after building almost 200 choice models in commercial settings, I find a similar situation in this realm, too. Choice models are typically judged by their performance on holdout samples. It’s easy to see why. Predicting holdout samples is comparatively easy and appears to be a reasonable test. Journal editors, in particular, like this idea. For practitioners, though, the goal is very different: they have to contend with the messy task of producing models that forecast well in the real world. Holdout samples, especially those built from made‐up data, sidestep the messiness and complexity of that world.
Practitioners are less interested in getting papers published. Real‐world clients don’t pay for forecasts of holdout samples; they care about how well models forecast in a complex setting where extraneous variables often affect outcomes. A number of studies, including mine, show that models that predict holdout samples well do not necessarily produce better forecasts in a complicated world.
So what does work? Once again, the answer is not found in academic papers or conference proceedings. Based on years of observing successful and unsuccessful models, this is what I have found: