Adjusting Band-Regression Estimators for Prediction: Shrinkage and Downweighting

Erhard Reschenhofer, Marek Chudy

International Journal of Econometrics and Financial Management, Vol. 3, No. 3, 2015, pp. 121-130.


Department of Statistics and Operations Research, University of Vienna, Vienna, Austria

Abstract

This paper proposes further developments of band-regression models for forecasting purposes, namely a simple method for shrinking the parameter estimates and a method for the automatic selection of the underlying frequency band. In combination with a method for downweighting older data, the improved band-regression model is used to forecast real GDP growth across nine industrialized economies. The results of this empirical study show that this forecasting approach outperforms conventional forecasting methods. As a secondary finding, the empirical results also raise doubts as to whether the yield-curve spread is really a valuable leading indicator of GDP growth.


1. Introduction

A robust and often successful forecasting strategy is to combine forecasts from different models, either by averaging [1] or by weighted averaging [2]. Combining forecasts can also be beneficial in situations where there is only one model, but there is uncertainty about the optimal time period used for forecasting [3, 4]. In the absence of structural breaks, the use of the full sample period is clearly the method of choice. But if there are breaks, the forecast based on the full sample will be biased. In this case, one could try to estimate the locations of the breaks [5, 6, 7, 8] and use only data from the most recent regime for prediction. Unfortunately, it is often very difficult to obtain precise estimates of the locations and the sizes of the breaks [9]. The problem intensifies when predictions have to be made shortly after some parameter has changed. In general, the forecast based on a recent subsample will have a smaller bias and a higher variance than that based on the full sample. Trying to balance the trade-off between bias and variance, Pesaran and Timmermann [3] proposed to combine forecasts based on subsamples of different lengths. In the context of forecasting random walks with breaks in the drift, Pesaran and Pick [10] obtained Monte Carlo results as well as empirical results indicating that averaging forecasts over estimation windows can improve the forecasting performance relative to a single estimation window when the break size is not too small relative to the volatility. However, Steinberger [11] showed that, depending on the specification of the volatility process, single-window forecasting is asymptotically as good as average-window forecasting or even better. Pesaran et al. [12] derived “robust optimal” weights, which do not require knowledge of the break dates, and applied them to forecast real GDP growth across nine industrialized economies using the yield-curve spread as a predictor. In this application, their weights outperformed competing weighting schemes in a variety of settings.

The GDP forecasts in [12] are obtained in two steps. First, the observations are weighted, and then conventional regression models are applied to the weighted observations. The main focus of our paper is on the second step. We modify conventional regression models by imposing frequency-domain restrictions that are plausible for economic data and are also supported by empirical evidence. Our approach is related to the band-regression approach, which is based on the assumption that linear relationships between several variables exist only in certain frequency bands [13, 14]. For example, band regression is used for the analysis of relationships in the presence of seasonal patterns or for the estimation of cointegrating relationships (e.g. [15]). In the former case, narrow bands around all seasonal frequencies must be excluded and, in the latter case, the focus should be on a narrow band close to frequency 0. However, in contrast to our approach, band regression cannot be used for prediction in the time domain.

Section 2.1 derives a shrinkage version of the conventional least squares (LS) estimator under certain frequency-domain restrictions. In addition, fractional degrees of freedom are defined, which are later used in Section 3 for the automatic choice of the frequency-domain restrictions. Section 2.2 discusses different weighting schemes intended to deal with parameter instability. Section 3 combines the methods discussed in Sections 2.1 and 2.2 and applies them to the economic data set used in [12]. All methods required for this empirical analysis were programmed in R [16]. Section 4 concludes.

2. Methods

2.1. Band Regression
2.1.1. Deterministic Regressors

Let

(1)

where , , and is a nonstochastic matrix with full column rank . Pre-multiplication of both sides of this equation by the matrix , , which consists of the first rows of the matrix

(2)

where , , gives

(3)

with and

(4)

We assume that all variables are mean-corrected, hence there is no need to include a constant dummy variable which would produce a column of zeroes in . Alternatively, this singularity could be avoided by including frequency zero, i.e., adding the row

(5)

In the case of odd , the matrix would then have columns. In the case of even , this can also be achieved by additionally including the frequency , i.e., by also adding the column

(6)

While the estimator

(7)

is best linear unbiased, it will be inefficient relative to

(8)

if the rank of is less than . Hence, the latter estimator should always be preferred to the former unless it is suspected that some model assumptions are violated.
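As a concrete illustration of this construction, the following R sketch follows the standard band-regression setup of Hannan [13] and Engle [14]; the object names and the normalization of the Fourier rows are our own choices and need not coincide with the displayed formulas (1)-(8).

    band_matrix <- function(n, freq_index) {
      # rows of cosines and sines at the Fourier frequencies 2*pi*j/n, j in freq_index
      A <- NULL
      for (j in freq_index) {
        w <- 2 * pi * j / n
        A <- rbind(A, sqrt(2 / n) * cos(w * (1:n)), sqrt(2 / n) * sin(w * (1:n)))
      }
      A
    }

    set.seed(1)
    n <- 123; k <- 2
    X <- scale(matrix(rnorm(n * k), n, k), scale = FALSE)   # mean-corrected regressors
    y <- X %*% c(1, -0.5) + rnorm(n)
    y <- y - mean(y)                                        # mean-corrected dependent variable

    r <- 12                                   # number of Fourier frequencies in the band
    A <- band_matrix(n, 1:r)                  # 2r x n matrix of low-frequency rows

    beta_full <- solve(t(X) %*% X, t(X) %*% y)     # conventional LS estimator on the raw data
    AX <- A %*% X; Ay <- A %*% y
    beta_band <- solve(t(AX) %*% AX, t(AX) %*% Ay) # band-regression estimator (transformed data)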

Replacing both the dependent variable and all regressors by their projections onto the span of the columns of also leads to the band-regression estimator, i.e.,

(9)

where

(10)

The covariance matrix of is given by

(11)

Now assume that the linear regression model (1) is misspecified because a linear relationship exists only between and , but not between and , i.e.,

(12)

for some , whereas

(13)

Under (13), it seems natural to modify the conventional LS estimator

(14)

by setting

(15)

The resulting estimator

(16)

is a compromise between the band-regression estimator , which is suitable for , and the trivial estimator , which is suitable for .
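A minimal sketch of this compromise, under the assumption that the modified estimator keeps the full-sample moment matrix but band-limits the cross moments with the dependent variable (this reading is consistent with the scalar version in Section 2.1.2 below, but it is an assumption, not a transcription of (14)-(16)):

    # Continuing the sketch above (X, y, A as defined there); an assumption about
    # the form of (16), not a literal transcription:
    beta_shrunk <- solve(t(X) %*% X, t(X) %*% t(A) %*% (A %*% y))
    # beta_shrunk lies between beta_band (a relationship only inside the band)
    # and the zero vector (no relationship outside the band).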


2.1.2. Jointly Stationary Series

If k = 1, i.e., if there is only a single regressor, the band-regression estimator (7) can be written as

(17)

where

(18)

is the discrete Fourier transform of the series and

(19)

is the cross periodogram of the series and .
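The discrete Fourier transform and the cross periodogram can be computed with R's fft(); in the sketch below, the 1/(2*pi*n) scaling is one common convention and need not coincide with the one used in (18) and (19).

    cross_periodogram <- function(y, x) {
      # complex cross periodogram I_yx at the Fourier frequencies 2*pi*j/n, j = 1, ..., m
      n <- length(y)
      m <- floor((n - 1) / 2)
      dy <- fft(y - mean(y))[2:(m + 1)]      # discrete Fourier transform, frequency zero dropped
      dx <- fft(x - mean(x))[2:(m + 1)]
      dy * Conj(dx) / (2 * pi * n)           # scaling is one common convention
    }

    set.seed(2)
    x <- as.numeric(arima.sim(list(ar = 0.8), n = 123))
    y <- 0.5 * x + rnorm(123)
    I_yx <- cross_periodogram(y, x)
    co_pgram <- Re(I_yx)                     # real part: co-periodogram (estimates the cospectrum)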

If the regressor is not deterministic but the two series are instead jointly stationary, the real part of the cross periodogram is an estimator of the cospectrum, the integral of which is just the covariance between and . However, cospectra of economic time series rarely appear to be constant over all frequencies. If the integral of the cospectrum over the interval is negligible compared with that over the interval and the goal is to find an appropriate linear predictor of the form , then it seems natural to use rather than the conventional LS estimator

(20)

which minimizes the sum of squared errors. Noting that

(21)

and

(22)

if both series are mean-corrected and is odd, the sum of squared errors can also be written as

(23)

Incorporating the information that the integral of the cospectrum over the interval is negligible leads to the estimator

(24)

which can also be obtained from the band-regression estimator by shrinkage towards zero, i.e., by multiplication by

(25)

If the focus is on the relationship between the low-frequency components of two series, the use of the band-regression estimator is appropriate. However, if the focus is on prediction in the time domain, the whole frequency range will be used. Ideally, the low-frequency components should be multiplied by the band-regression estimate and the high-frequency components by zero. Unfortunately, this differentiation is not possible in the time domain. While the unrestricted estimator already strikes a balance between the two extremes, the shrinkage estimator (24) is clearly the better choice because it additionally makes use of the available extra information.


2.1.3. Degrees-of-Freedom Adjustment

The variance of a stationary process with variance can be obtained from its spectral density via

(26)

hence we have for large

(27)

The sum of squared residuals

(28)

is therefore approximately equal to

(29)

when the integral of the cospectrum over the interval is negligible and is large.

Now assume that the two processes are independent Gaussian white-noise processes. Then

(30)

are i.i.d. , which implies that

(31)

and

(32)

An approximately unbiased estimate of is given by

(33)

and can be estimated by

(34)

In the standard case, when all m frequencies are used, i.e., r = m, the number of degrees of freedom is just k. In contrast, it is a fractional number, namely kr/m, when only r of the m frequencies are used.
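A minimal sketch of how the fractional degrees of freedom can be used, assuming that (33) divides the residual sum of squares by n minus kr/m (the exact expression may differ):

    # Assumed form of an approximately unbiased variance estimate: rss divided by
    # n minus the fractional degrees of freedom k*r/m.
    sigma2_hat <- function(rss, n, k, r, m) rss / (n - k * r / m)
    sigma2_hat(rss = 100, n = 123, k = 2, r = 12, m = 61)    # illustrative numbers only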

2.2. Downweighting

Pesaran and Pick [10] investigated the performance of univariate one-step-ahead forecasts of the form

(35)

where

(36)

and found that the average-window (AW) weights

(37)

(see [3]), which are obtained by averaging across all subsamples

(38)

outperform the single-window (SW) weights

(39)

unless the breaks are very small. Complementing this result, Steinberger [11] compared these weights in a low-signal/high-noise framework. In his asymptotic analysis, he allowed for an increasing number of breaks. Depending on the specification of the volatility process, the asymptotic mean square forecasting error (MSFE) of the SW weights turned out to be asymptotically as small as that of the AW weights or even smaller.

Assuming that both the regression parameters and the error variance of a simple regression model are subject to a single structural break, i.e.,

(40)

and

(41)

Pesaran et al. [12] derived optimal weights for one-step-ahead forecasts of the form

(42)

where

(43)

Their optimal weights have a discrete change that depends on the time and size of the break. This discrete change can only be poorly approximated by weights that are either constant or slowly decaying. Moreover, the time and size of the structural break are usually unknown and must therefore be estimated. Monte Carlo results in [12] suggest that the use of estimates instead of the true values for the calculation of the weights is advisable only if the break is large and can easily be identified. Otherwise, a more robust approach is required. Pesaran et al. [12] derived "robust optimal" (RO) weights

(44)

by integrating out the effects of uncertainty about the breaks with respect to some given distribution. However, as can be seen from

(45)

the "robust optimal" weights are very similar to the conventional average-window weights, hence it is unrealistic to expect too much of them.

In any case, no fully specified weights can be appropriate for all applications. At the very least, adjustments to the application at hand must be made with the help of some tuning parameters. For example, an integer-valued tuning parameter can be introduced by requiring a minimum size for the subsamples to be used in window averaging (see [10]) and a real-valued tuning parameter by averaging across exponentially decaying weights rather than equal weights, i.e.,

(46)

where

(47)

and

(48)

as . Clearly, it can never be ruled out that in a concrete application some ad-hoc weights such as

(49)

outperform weights that are supposedly more sophisticated.
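Purely for illustration, window averaging with exponentially decaying rather than equal weights on the window lengths can be sketched as follows; lambda is a hypothetical tuning parameter, and this is not a transcription of the displayed expressions (46)-(49).

    # Illustrative only, not the formulas (46)-(49): average the single-window
    # weights with geometrically decaying weights on the window length.
    aw_exp_weights <- function(nobs, lambda = 0.9, w_min = 1) {
      sw <- sapply(w_min:nobs, function(w) c(rep(0, nobs - w), rep(1 / w, w)))
      wl <- lambda^(seq_len(ncol(sw)) - 1)       # weight of window length w_min, w_min + 1, ...
      as.numeric(sw %*% (wl / sum(wl)))          # resulting weights on the observations
    }
    round(aw_exp_weights(10, lambda = 0.8), 3)   # lambda -> 1 recovers the equal-weight AW weights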

3. Empirical Results

Downweighting older observations is an established method to take care of parameter instability in forecasting tasks. Pesaran et al. [12] derived optimal weights for one-step-ahead forecasts under the assumption that information about the location and the size of the breaks is available and then obtained "robust optimal" weights by integrating out the effects of uncertainty about the breaks with respect to some given distribution. Using the latter weights to forecast real GDP with the slope of the yield curve as explanatory variable, they observed a significant improvement over conventional methods. In the following, we will examine whether the focus on certain frequency bands can lead to a further improvement. For the sake of comparability, the same data set will be used. This data set contains quarterly observations on the real GDP as well as long and short interest rates from 1979Q1 to 2009Q4 of Australia, Canada, France, Germany, Italy, Japan, Spain, UK, and USA. It is part of the larger data set “GVAR data”, which is an extension of the data set originally used by Dees et al. [17] and can be downloaded from http://www-cfap.jbs.cam.ac.uk/research/gvartoolbox/download.html.

The slope of the yield curve is supposedly a valuable leading indicator of GDP growth (e.g. [18-23], [12]). Forecasts of the GDP growth rate at time n+1, y_{n+1}, may therefore be obtained from (42) and (43), where the explanatory variable x_{n+1} is the slope of the yield curve at time n. The growth rates are defined as the differenced log GDP values (multiplied by 100) and the slopes as the differences between the long-term interest rates and the short-term interest rates (yield-curve spreads).
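A sketch of this variable construction; the column names gdp, lr and sr (real GDP level, long rate, short rate) are hypothetical, and the layout of the downloaded GVAR file may differ.

    # Variable construction as described above; column names are hypothetical.
    make_variables <- function(d) {
      y <- 100 * diff(log(d$gdp))       # quarterly GDP growth in percent, quarters 2, ..., n
      x <- (d$lr - d$sr)[-nrow(d)]      # yield-curve spread of the preceding quarter
      data.frame(y = y, x = x)
    }

    # Example with made-up numbers only:
    set.seed(4)
    d <- data.frame(gdp = 1000 * exp(cumsum(rnorm(124, 0.005, 0.01))),
                    lr = rnorm(124, 5, 1), sr = rnorm(124, 4, 1))
    head(make_variables(d))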

At first glance, forecasts of the form (42) seem a bit naïve because no lagged values of the dependent variable are used. But this is only a serious problem if SW weights are used in (43). In the case of declining weights, the dummy variable takes over the job of modeling the conditional mean and can compete with low-order ARMA models, at least when there are tuning parameters. For example, the best forecast of a simple invertible MA(1) process

(50)

is just given by

(51)
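In standard notation (which need not coincide with the notation of (50) and (51)), an invertible MA(1) process and its best linear one-step-ahead forecast can be written as

    \[ y_t = \varepsilon_t + \theta \varepsilon_{t-1}, \qquad |\theta| < 1, \]
    \[ \hat{y}_{n+1} = \theta \varepsilon_n = \theta \sum_{j \ge 0} (-\theta)^j y_{n-j}, \]

i.e., a weighted sum of past observations whose weights decline geometrically in absolute value, which is the general form of forecast produced by a weighted regression on a constant with declining observation weights.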

It turns out that the dummy variable is quite successful in modeling the conditional mean. Indeed, it is so successful that there is hardly anything left for the actual explanatory variable, the spread, to explain.

Following [3], recursive out-of-sample forecasts are constructed for the period from 1994Q1 to 2009Q4. Table 1.a compares averages (over all countries) of sums of squared forecast errors. Consistent with the findings in [3], the forecasts obtained with the AW weights and the RO weights outperform those obtained with the SW weights. While it is not surprising, because of relationship (45), that the former two are practically equivalent, it is remarkable that these “optimal” weights are clearly outperformed by some ad-hoc weights such as (49), and to make matters worse, even without fine-tuning, i.e., simply with . Even more remarkable is the fact that the performance of the forecasts using the “optimal” weights does not deteriorate when the spread is removed and only the dummy variable is left, which raises doubts about whether the yield-curve spread is really a valuable leading indicator of GDP growth. In contrast, using the lagged GDP growth instead of the spread in addition to the dummy variable immediately boosts the forecasting performance. For a more informative comparison of the competing forecasts, we use cumulative sums of forecast errors rather than total sums. Figure 1 shows the cumulative sums of absolute forecast errors separately for each country. Although absolute errors are used rather than squared errors for reasons of robustness, the curves still fluctuate considerably. Overall, the results are largely consistent across countries and in line with Table 1.a. There are hardly any country-specific characteristics; the only exception is Australia, where the benchmark model (with dummy variable plus spread and SW weights) is competitive.
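A sketch of the recursive exercise as we describe it above: an expanding estimation window, weighted least squares of the growth rate on a constant and the (already lagged) spread, and a one-quarter-ahead forecast at each origin. The first forecast origin and all object names are illustrative choices; aw_weights() refers to the helper sketched in Section 2.2.

    # Recursive one-step-ahead forecasts; the first forecast origin and the
    # object names are illustrative choices.
    recursive_forecasts <- function(y, x, weights_fun, first_origin) {
      n <- length(y)
      fc <- rep(NA_real_, n)
      for (t in first_origin:(n - 1)) {
        w <- weights_fun(t)                                  # weights for observations 1, ..., t
        fit <- lm(y[1:t] ~ x[1:t], weights = w)              # weighted LS on dummy and spread
        fc[t + 1] <- sum(coef(fit) * c(1, x[t + 1]))         # one-quarter-ahead forecast
      }
      fc
    }

    # Example with simulated data and the aw_weights() helper from Section 2.2:
    set.seed(5)
    x <- as.numeric(arima.sim(list(ar = 0.9), n = 123))
    y <- 0.2 * x + rnorm(123)
    fc <- recursive_forecasts(y, x, weights_fun = aw_weights, first_origin = 60)
    mean(abs(y - fc), na.rm = TRUE)                          # mean absolute forecast error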

Figure 1. Cumulative absolute forecast errors of models 1 SW (gray), 1+x AW (red), 1 AW (pink), 1+x RO (green), 1+x AH (darkgreen), 1+L(y) SW (blue), 1+L(y) AW (purple) relative to 1+x SW (black): (a) USA (b) Japan (c) Germany (d) UK (e) France (f) Italy (g) Spain (h) Canada (i) Australia

Table 1. Averages (over all countries) of the sums of squared forecast errors of weighted (band) regressions of GDP growth (y) at time n on a dummy variable (1) and the yield-curve spread (x) at time n-1. Also included are forecasts where the spread is replaced by the lagged GDP growth (Ly). The conventional forecast based on the whole frequency band and obtained with SW weights serves as a benchmark (=1.000)

Figure 2. Frequency-domain analysis of U.S. data: (a) GDP growth, (b) Periodogram of GDP growth, (c) Spread, (d) Periodogram of spread, (e) Co-periodogram (f) Quadrature periodogram

Figure 2 displays the results of a frequency-domain analysis of the U.S. data. The size of the periodogram at a particular frequency indicates how strongly oscillations at this frequency are represented in a time series. The co-periodogram indicates for each frequency how strongly two time series oscillate with a phase difference of zero or half a cycle, and the quadrature periodogram indicates how strongly they oscillate with a phase difference of a quarter cycle. Figure 3 displays the co-periodograms of GDP growth and (lagged) spread for all nine countries. Figure 2 and Figure 3 show that the variance of the growth rate, the variance of the spread, and the covariance between the growth rate and the spread are largely determined by the low-frequency components of these time series. A band-regression approach may therefore be promising. Thus, the shrinkage estimator (24), which was derived in the previous section, is also included in the comparative study. Table 1.b shows the results. Again, cumulative sums of absolute forecast errors are displayed separately for each country to provide more detailed information about differences in the forecasting performance (see Figure 4). Apparently, focusing only on a low-frequency band has no effect on the forecasts utilizing the spread. They are just as bad as before.

Figure 3. Co-periodogram of GDP growth and lagged spread: (a) USA (b) Japan (c) Germany (d) UK (e) France (f) Italy (g) Spain (h) Canada (i) Australia

However, when the spread is replaced by a variable with predictive power beyond that of the dummy variable, the band-regression approach works quite well. The best results are obtained when the concepts of downweighting and band regression are combined (see the last line of Table 1.b). A suitable upper bound for the frequency band lies in the range from 0.15π to 0.25π. The fractional BIC

(52)

also selects an upper bound in that range (on average about 0.15π). In contrast, the fractional AIC

(53)

is of little help as it typically selects all frequencies. Both the fractional AIC and the fractional BIC are obtained from their conventional counterparts simply by replacing the degrees of freedom k occurring in these criteria by the fractional degrees of freedom kr/m.
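A minimal sketch of this replacement, assuming the criteria otherwise take their usual form (additive constants and the exact expressions of (52) and (53) may differ):

    # Fractional information criteria with k replaced by k*r/m; rss is the
    # residual sum of squares of the corresponding (band) regression fit.
    frac_bic <- function(rss, n, k, r, m) n * log(rss / n) + (k * r / m) * log(n)
    frac_aic <- function(rss, n, k, r, m) n * log(rss / n) + 2 * k * r / m
    # Minimizing frac_bic over candidate bands r = 1, ..., m gives an automatic band choice.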

Figure 4. Cumulative absolute forecast errors of band regression (r=0.2m) models 1+x SW (red), 1+x AW (gold), 1+L(y) SW (green), 1+L(y) AW (blue) relative to standard regression model 1+x SW (black): (a) USA (b) Japan (c) Germany (d) UK (e) France (f) Italy (g) Spain (h) Canada (i) Australia

4. Discussion

Using frequency-domain information for forecasting purposes is a nontrivial task because it is not clear how the information contained in the phases should be processed unless there are strict periodicities. In contrast, forecasts based on time-domain models such as ARMA models depend only on the most recent observations and thereby utilize phase information automatically. Of course, estimates of the parameters of time-domain models can also be obtained indirectly from the amplitudes, but this extra effort will only be worthwhile if there is some advantage over straightforward time-domain estimation. For example, if a linear relationship between variables exists only in a certain frequency band, it will make sense to focus just on this frequency band and ignore the other frequencies. However, the improved estimator of the linear relationship obtained in this way, i.e., the band-regression estimator, is only relevant for certain frequencies and completely irrelevant for the other frequencies. The estimator proposed in this paper therefore forges a compromise between these extremes by shrinking the band-regression estimator towards zero, where the right amount of shrinkage is determined by imposing frequency-domain restrictions on the conventional time-domain estimator.

This estimator is then applied to forecast international GDP growth rates, and its forecasting performance is compared with that of other estimators. In addition, the effect of downweighting older observations to deal with structural breaks is examined. The new estimator applied to downweighted observations comes out best. Remarkably, it makes no difference whether or not the slope of the yield curve is included in the model. Previous evidence that this variable may be a valuable leading indicator of GDP growth is attributed to a possible misspecification of the univariate dynamics of GDP growth.

In principle, the pooling of cross section and time series data would allow the investigation of country specific and time specific factors. However, this is beyond the scope of this paper and is left for future work.

Acknowledgement

We would like to thank David Preinerstorfer, Lukas Steinberger and the reviewer for their valuable comments and suggestions, which substantially improved this manuscript.

References

[1]  Barnard, G., “New methods of quality control,” Journal of the Royal Statistical Society: Series A, 126, 255-258. 1963.
[2]  Bates, J.M. and Granger, C.W.J., “The Combination of Forecasts,” OR, 20, 451-468. 1969.
[3]  Pesaran, M.H. and Timmermann, A., “Selection of estimation window in the presence of breaks,” Journal of Econometrics, 137, 134-161. 2007.
[4]  Clark, T.E. and McCracken, M.W., “Improving forecast accuracy by combining recursive and rolling forecasts,” IER, 50, 363-395. 2009.
[5]  Bai, J. and Perron, P., “Estimating and testing linear models with multiple structural changes,” Econometrica, 66, 47-78. 1998.
[6]  Bai, J. and Perron, P., “Computation and analysis of multiple structural change models,” Journal of Applied Econometrics, 18, 1-22. 2003.
[7]  Preinerstorfer, D., “Linear forecasting and subset-selection-based forecasting in structural break models,” Master's thesis, University of Vienna, Austria. 2011.
[8]  Reschenhofer, E., Preinerstorfer, D. and Steinberger, L., “Non-monotonic penalizing for the number of structural breaks,” Computational Statistics, 28, 2585-2598. 2013.
[9]  Paye, B.S. and Timmermann, A., “Instability of return prediction models,” Journal of Empirical Finance, 13, 274-315. 2006.
[10]  Pesaran, M.H. and Pick, A., “Forecast combination across estimation windows,” Journal of Business and Economic Statistics, 29, 307-318. 2011.
[11]  Steinberger, L., “Forecasting random walks with structural breaks: averaging across estimation windows,” Master's thesis, University of Vienna, Austria. 2012.
[12]  Pesaran, M.H., Pick, A. and Pranovich, M., “Optimal forecasts in the presence of structural breaks,” Journal of Econometrics, 177, 134-152. 2013.
[13]  Hannan, E.J., “Regression for Time Series,” in Proceedings of the Symposium on Time Series Analysis, John Wiley and Sons, 14-37. 1963.
[14]  Engle, R.F., “Band spectrum regression,” International Economic Review, 15, 1-11. 1974.
[15]  Phillips, P.C.B., “Spectral regression for co-integrated time series,” in Nonparametric and Semiparametric Methods in Economics and Statistics, Cambridge University Press. 1991.
[16]  R Core Team, “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/. 2013.
[17]  Dees, S., Di Mauro, F., Pesaran, M.H. and Smith, L.V., “Exploring the international linkages of the euro area: a global VAR analysis,” Journal of Applied Econometrics, 22, 1-38. 2007.
[18]  Estrella, A. and Hardouvelis, G.A., “The term structure as a predictor of real economic activity,” Journal of Finance, 46, 555-576. 1991.
[19]  Estrella, A. and Mishkin, F.S., “The predictive power of the term structure of interest rates in Europe and the United States: Implications for the European Central Bank,” European Economic Review, 41, 1375-1401. 1997.
[20]  Stock, J.H. and Watson, M.W., “Forecasting output and inflation: The role of asset prices,” Journal of Economic Literature, 41, 788-829. 2003.
[21]  Estrella, A., Rodriguez, A.P. and Schich, S., “How stable is the predictive power of the yield curve? Evidence from Germany and the United States,” Review of Economics and Statistics, 85, 629-644. 2003.
[22]  Giacomini, R. and Rossi, B., “How stable is the forecasting performance of the yield curve for output growth?,” Oxford Bulletin of Economics and Statistics, 68, 783-795. 2006.
[23]  Schrimpf, A. and Wang, Q., “A reappraisal of the leading indicator properties of the yield curve under structural instability,” International Journal of Forecasting, 26, 836-857. 2010.