Although some argue that stock market returns are predictable to a certain degree, the consensus is that estimating expected returns is a challenging, perhaps even insurmountable, task. This difficulty makes the output of traditional optimisation algorithms, such as Markowitz’s ground-breaking mean-variance portfolio, very controversial. First, expected returns are highly uncertain; second, their impact on the result is substantial, since small differences in the inputs have a huge influence on the optimised portfolio.

### Predictive powers?

Though constant expected returns are widely agreed to be too restrictive, many use the simple long-run average of realised returns as their best estimate of expected returns, thereby skipping the daunting task of capturing the undeniably time-varying behaviour of risk premiums. This choice is defensible. The academic literature has identified several variables with some predictive ability, the dividend-to-price ratio and the book-to-market ratio to name but two. However, most of these traditional valuation ratios gave bearish signals during the 1990s stock market rally, and their predictive power seems to have been lost. This illustrates the main problem that most individual predictors face: although they may have value at a certain moment, sooner or later their predictive power breaks down, temporarily or for good.

Additionally, many of the documented relationships rely on an *in-sample* fit, meaning that the merit of the *predicting* variable is judged by how well it described the past, using the complete dataset. However, the resulting relationship would not have been available to an investor in the past, as it is (at least partly) based on data that did not yet exist at that time, and an investor unfortunately doesn’t have the benefit of foresight. For an investor, and for any forecaster in general, the real value of a forecasting model should logically be judged on the quality of the predictions it actually made. Anyone considering the challenge of beating the historical average as the best predictor of future returns should refrain from relying too heavily on *individual, in-sample fitted* models.

### Pooling forecasts

One way to tackle these issues is to intelligently combine the forecasts of individual models into one pooled forecast. Intuitively, relying on a forecast drawn from several models is less risky than relying on a single model, although this doesn’t necessarily mean that a combined forecast will perform better than the best individual predictor(s). The analogy with investing is clear: combining different assets in a portfolio doesn’t necessarily yield the best return, but it reduces the risk of being in the wrong asset(s) at the wrong time.

The list of potential combination methods is surprisingly long. They range from a simple average (e.g. with four individual predictors, each gets a weight of 25%) to complex statistical techniques such as Bayesian model averaging. An interesting and not too complex class of methods sets the weight of each individual prediction according to its historical forecasting performance. Well-performing predictors get a higher weight in the combined forecast and poorly performing predictors get a lower, potentially even zero, weight. The *forecasting* performance is judged by a model’s *out-of-sample* performance.

One way to measure it is a real-time assessment. Suppose the investor wants to check the performance of a model that tries to predict the 10-year expected return. He or she could use the data up to today to estimate the relationship and make a prediction for the return over the next ten years. The next quarter, he or she uses the extra data point to recalibrate the relationship and makes a new prediction. After ten years, he or she can see how good the first prediction was. The limitation of this method is clear: an unrealistically long wait is needed before the investor has a sufficiently long track record to judge the quality of the model.
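The performance-based weighting described in this section can be sketched as follows. This is a minimal illustration, assuming weights proportional to the inverse of each model’s mean squared forecast error (MSFE); the function names and the numbers are hypothetical, not taken from the study. Note that this plain scheme never assigns a weight of exactly zero, so dropping a predictor entirely would require an additional rule that is not specified here.

```python
def msfe(forecasts, realised):
    """Mean squared forecast error over the forecasts that can already be evaluated."""
    errors = [(f - r) ** 2 for f, r in zip(forecasts, realised)]
    return sum(errors) / len(errors)


def inverse_msfe_weights(forecast_history, realised):
    """Weight each model in proportion to 1 / MSFE, normalised to sum to one.

    Assumes every model has a strictly positive MSFE.
    """
    inverse = {name: 1.0 / msfe(fc, realised) for name, fc in forecast_history.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def pooled_forecast(latest_forecasts, weights):
    """Weighted average of the models' most recent forecasts."""
    return sum(weights[name] * latest_forecasts[name] for name in weights)


# Hypothetical data: three past forecasts per model and the returns realised afterwards.
realised = [0.05, 0.07, 0.03]
history = {
    "model_a": [0.06, 0.06, 0.04],  # small errors
    "model_b": [0.10, 0.02, 0.08],  # large errors
}
weights = inverse_msfe_weights(history, realised)
combined = pooled_forecast({"model_a": 0.04, "model_b": 0.09}, weights)
```

In this toy example the model with the smaller past errors dominates the pooled forecast, which therefore sits much closer to 0.04 than to 0.09.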

### Using past data

Another, more feasible, solution is to simulate the forecaster’s situation using past data. The investor places himself or herself at a certain moment in the past, uses data *known up until that point only* to look for a relationship between 10-year returns and the predictive variable, and makes a prediction. Then he or she expands the initial sample with one extra observation, recalibrates the model and makes a new prediction, and so on. Given a sufficiently long data set, he or she can verify how well the model predicts, as a large part of the subsequent 10-year returns has already been realised. For example, today the last forecast that can be judged is the one created with data up until ten years ago (i.e. he or she knows the return that accumulated over the last ten years and can compare it with the prediction that *would have been* made in the past). This exercise is done for each individual model and, as stated before, the better-performing models receive more weight in the combined forecast[1]. When new data become available the next quarter, the whole exercise is repeated and new weights are determined.
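The expanding-window exercise can be sketched in a few lines. This is an illustrative sketch only: it assumes a simple univariate linear regression of the subsequent return on the predictor, and the function names, the `horizon` parameter (the forecast horizon in periods, e.g. 40 quarters for ten years) and the `min_obs` parameter are hypothetical choices rather than the specification used in the study.

```python
def fit_ols(x, y):
    """Univariate least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def expanding_window_forecasts(predictor, future_return, horizon, min_obs):
    """Simulate the forecasts an investor could have made in the past.

    predictor[t]     : value of the predictive variable observed at time t
    future_return[t] : return realised over the `horizon` periods after t
    Returns a dict {t: forecast made at time t}.
    """
    forecasts = {}
    for t in range(min_obs + horizon - 1, len(predictor)):
        # At time t the return starting at s is known only if s + horizon <= t,
        # so the estimation sample is s = 0 .. t - horizon.
        known = t - horizon + 1
        intercept, slope = fit_ols(predictor[:known], future_return[:known])
        forecasts[t] = intercept + slope * predictor[t]
    return forecasts
```

Each forecast is built from returns that had fully materialised by the forecast date, which is what separates this out-of-sample exercise from an in-sample fit. The resulting forecasts can then be compared with the returns realised afterwards, for every date at which those are already known.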

### Putting it into practice

Going from theory to practice, we examined a set of individual predictors and checked how well they predicted returns compared with a pooled forecast, using the combination technique described above. The left graph shows the 10-year expected annualised return forecasts for the US market made by our six chosen variables. *Dividend Yield (DY), Shiller’s PE (SHILLERPE), Earnings Yield (EP), Tobin’s Q (TOBINQ)* and *Market Cap to Gross Value Added (MC2GVA)*[2] are all variables whose predictive power has been studied by academics and/or investment practitioners. A sixth variable, *the US unemployment rate (U)*, was added. We do not expect this last variable to be a valuable predictor of long-term expected returns, but we want to see how the pooling algorithm treats it[3].

The individual forecasts (left-hand graph) display high variability. *Shiller’s PE, Earnings Yield, Tobin’s Q* and *Market Cap to Gross Value Added* seem to have decent predictive power. *Dividend Yield* gives rather erratic forecasts, but probably cannot be ignored completely. The presumption that the unemployment rate is a poor predictor seems confirmed, as it fails to capture most of the dynamics of stock returns. The benchmark forecast, the long-term average of past returns, also clearly misses the time-varying behaviour of financial markets. The pooled forecast (right-hand graph) seems to do relatively well: it clearly captured the big trends, and for a couple of years its projection was even spot-on with the subsequently realised 10-year return.

The weight given to each individual forecast, and its evolution over time, is shown in the graphs below. *Market Cap to Gross Value Added* received the biggest weight over the full evaluation period. *Unemployment rate* was banished quickly. *Dividend Yield* was ousted at the end of 2004 but revived a couple of years later. Interestingly, the only predictor that did a better job than the combined forecast was *Market Cap to Gross Value Added*, which received an average weight of 42%[4]. So the combined forecast beats five of the six individual predictors while reducing the risk associated with relying on a single model.
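The comparison between predictors rests on the out-of-sample R² statistic. The formula is not spelled out here, so the sketch below uses the commonly cited definition: one minus the ratio of the model’s squared forecast errors to those of the historical-average benchmark. The function and argument names are hypothetical.

```python
def out_of_sample_r2(realised, model_forecasts, benchmark_forecasts):
    """Out-of-sample R²: 1 - SSE(model) / SSE(historical-average benchmark).

    All three sequences are aligned by forecast date. The benchmark forecast
    at each date should itself use only data available at that date.
    """
    sse_model = sum((r - f) ** 2 for r, f in zip(realised, model_forecasts))
    sse_bench = sum((r - b) ** 2 for r, b in zip(realised, benchmark_forecasts))
    return 1.0 - sse_model / sse_bench
```

A positive value means the model’s forecasts had smaller squared errors than the historical average over the evaluation period; a negative value means the historical average did better.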

### Conclusions

Now, is this the long-awaited holy grail? Unfortunately not: the pooled forecast missed the stock market rally of the 1990s by a mile. Additionally, a discrepancy of 2% between expected and realised *annualised* return looks acceptable visually, but amounts to almost 22% of *cumulative* difference over the 10-year prediction period. For relative valuation among different regions, the use is limited as well. The US is a widely researched region with a long data history, which these types of algorithms need. For other regions, such as Europe or Emerging Markets, finding the right (and sufficiently long) data set is hard. Although trends in expected returns contain valuable information for an investor, one should remain sceptical of any point estimate, especially if it is derived from an individual predictor. No matter how well a single predictor has forecast in the past, structural breaks will occur sooner or later and its predictive power will be lost. Using different predictors and then synthesising them seems a far better solution, but even then a great deal of uncertainty remains. Therefore, the following number should be treated with care: 1.3%. That is the forecast annualised return for the US market over the next ten years.
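The cumulative figure quoted above for a 2% annual discrepancy follows from compounding over ten years. As a rough check, assuming the gap is compounded on its own (an approximation that ignores the level of the return itself):

$$
(1 + 0.02)^{10} - 1 \approx 0.219
$$

that is, close to 22% over the full horizon.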

*1 Weights are inversely related to the Mean Squared Forecast Error.*

*2 Market Cap to Gross Value Added is a valuation metric proposed by John Hussman and is an adaptation of the more widely known “Buffett Indicator”, Market Cap to GDP.*

*3 We’ve limited ourselves to 6 variables, but of course it is possible to choose a wider set of potential predictors*

4 Based on the out-of-sample R² statistic, introduced by Campbell & Thompson (2008)
