Forecasting

Forecasting Setup

Imagine you have a dataset with [math]T[/math] observations and you are planning to run a forecasting exercise with a forecast horizon of [math]\tau[/math]. If [math]\tau=1[/math] then we are talking about one-step ahead forecasts. You want to use your available data sample to produce "out-of-sample"[1] forecasts and evaluate these. This means that we need to split the [math]T[/math] observations into data which are used to estimate model parameters and observations for which we then produce forecasts.

In order to make this discussion more tangible we will use the following example. Imagine we have a univariate time series stored in y. It is of length [math]T[/math] and you want to produce the following conditional mean forecasts, starting with an information set [math]I_R[/math]

[math]E[y_{R+\tau}|I_R]\\ E[y_{R+1+\tau}|I_{R+1}]\\ E[y_{R+2+\tau}|I_{R+2}]\\ ...\\ E[y_{T}|I_{T-\tau}][/math]

Each forecast depends on information available at the time of the forecast (in the case of a univariate model this is just the values of the series available at that time) and on an estimated parameter vector, say [math]\widehat{\mathbf{\beta}}[/math]. The next issue we need to pin down is on the basis of which observations we obtain these parameter estimates.

There are three common, distinctly different schemes.

Fixed Scheme

In this scheme we estimate the model parameters once only, for the first forecast period. To be precise, we use observations 1 to [math]R[/math] to obtain [math]\widehat{\mathbf{\beta}}_{1,R}[/math], where the subscript reflects the observations on the basis of which the estimate is obtained. For subsequent forecasts we continue to use that estimate.

This is clearly potentially suboptimal in the sense that we are not making the best possible use of newly available information, e.g. [math]y_{R+1}[/math] for the second forecast, [math]E[y_{R+1+\tau}|I_{R+1}][/math]. While that observation is used in the conditioning information set, it may well be that it would also change our parameter estimate. This scheme is generally only used if the model is extremely difficult to estimate, in the sense that each estimation takes a long time.

What follows is a schematic piece of MATLAB code that could accomplish this. Assume that your [math](T \times 1)[/math] vector for the dependent variable is stored in y with typical element [math]y_t[/math]. Further, the [math](T \times k)[/math] matrix X contains the corresponding explanatory variables. For this example we will assume that we care about one-step ahead forecasts and that all values in the [math]t[/math]th row of X are available at time [math]t-1[/math]. To illustrate this, assume that you are using an AR(1) model, in which case the variables [math]y[/math] and [math]X[/math] would be defined as follows (assuming we have 1001 observations available and [math]T=1001-1=1000[/math])

[math]y = \left( \begin{array}{c} y_2 \\ y_3 \\ \vdots \\ y_{1000} \\ y_{1001} \\ \end{array} \right); X = \left( \begin{array}{cc} 1 & y_1 \\ 1 & y_2 \\ \vdots & \vdots \\ 1 & y_{999} \\ 1 & y_{1000} \\ \end{array} \right)[/math]
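
To make this concrete, here is a minimal MATLAB sketch of how y and X could be constructed for the AR(1) case; the variable name yraw is hypothetical and the random numbers merely stand in for an actual series of 1001 observations.

yraw = randn(1001,1);               % placeholder raw series y_1,...,y_1001 (hypothetical data)

y = yraw(2:end);                    % dependent variable y_2,...,y_1001, a (1000 x 1) vector
X = [ones(1000,1), yraw(1:end-1)];  % constant and first lag y_1,...,y_1000, a (1000 x 2) matrix

T = length(y);                      % T = 1000 effective observations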

Further assume that we want to produce out-of-sample forecasts for periods 802 onwards. In the fixed scheme this implies that we will estimate the model parameters using information up to [math]t=801[/math]. Note that element [math]i[/math] of the vector y defined above is [math]y_{i+1}[/math], so period 802 corresponds to the MATLAB index R = 801 used in the code below. We would then produce the following loop:

y;  % (1000 x 1) dependent variable
X;  % (1000 x k) explanatory variables
T = 1000;
R = 801;

%% Fixed Scheme
[par_est] = ModelEstimation(y(1:R-1),X(1:R-1,:));   %Use data up to R-1 to estimate parameters

save_forecasts = zeros(T-R+1,1);    % save forecasts in here
count = 1;

for i = R:T     % loop from R to T

    save_forecasts(count) = ModelForecast(par_est,X(i,:));  % produce model forecasts, conditional on info at i
    count = count+1;        % increase forecast counter by 1
end

Here ModelEstimation and ModelForecast are functions that are used to estimate the model parameters and produce forecasts respectively. They will depend on the particular models used. In the case of an AR(1) model they could be the functions armaxfilter and armaforc discussed in the section on univariate time-series models.

Recursive Scheme

In this scheme we re-estimate the parameters for every new forecast. The parameter estimate for the forecast [math]E[y_{R+\tau}|I_R][/math] remains as above, [math]\widehat{\mathbf{\beta}}_{1,R}[/math]. For [math]E[y_{R+1+\tau}|I_{R+1}][/math] we use [math]\widehat{\mathbf{\beta}}_{1,R+1}[/math] and for [math]E[y_{R+2+\tau}|I_{R+2}][/math] we use [math]\widehat{\mathbf{\beta}}_{1,R+2}[/math] and so on.

In other words, at any point in time we use all available information to obtain the parameter estimates, i.e. an increasing estimation window. The schematic MATLAB code would change to the following:

%% Recursive Scheme

save_forecasts = zeros(T-R+1,1);    % save forecasts in here
count = 1;

for i = R:T     % loop from R to T
    [par_est] = ModelEstimation(y(1:i-1),X(1:i-1,:));   % Use data up to i-1 to estimate parameters
    save_forecasts(count) = ModelForecast(par_est,X(i,:));  % produce model forecasts, conditional on info at i
    count = count+1;        % increase forecast counter by 1
end

The difference to the fixed scheme is that the parameter estimation has come into the loop and uses ever increasing sample sizes.

Rolling Scheme

Here we also re-estimate the model parameters for every forecast, but we do that while keeping the estimation window at a constant size. The parameter estimate for the forecast [math]E[y_{R+\tau}|I_R][/math] remains as above, [math]\widehat{\mathbf{\beta}}_{1,R}[/math]. For [math]E[y_{R+1+\tau}|I_{R+1}][/math] we use [math]\widehat{\mathbf{\beta}}_{2,R+1}[/math] and for [math]E[y_{R+2+\tau}|I_{R+2}][/math] we use [math]\widehat{\mathbf{\beta}}_{3,R+2}[/math] and so on.

This sounds sub-optimal as we are not using all available information. However, this scheme has two nice aspects. Firstly, it may deliver some protection against structural breaks compared to the recursive scheme, which, with its increasing estimation window size, becomes more vulnerable to changes in the underlying model parameters. The second, and more important, advantage is that this scheme makes forecast comparison more straightforward. We will pick up on this point again when we get to the forecast comparison techniques a little later.

%% Rolling Scheme

save_forecasts = zeros(T-R+1,1);    % save forecasts in here
count = 1;

for i = R:T     % loop from R to T
    [par_est] = ModelEstimation(y(i-R+1:i-1),X(i-R+1:i-1,:));   % Use the rolling window i-R+1 to i-1 to estimate parameters
    save_forecasts(count) = ModelForecast(par_est,X(i,:));  % produce model forecasts, conditional on info at i
    count = count+1;        % increase forecast counter by 1
end

The difference to the recursive scheme is that the size of the estimation window remains constant. If you compare the two code snippets you can see that the difference in the code is rather minimal.

Model Setup

Often you will have several competing models which you want to evaluate. In fact, often you will ask a question similar to "Which of the models at hand produces the best forecasts?" This seems like a question made for statistical inference. Indeed we will use hypothesis tests to answer questions like this, but although this seems an innocuous enough question, it turns out that it is rarely easy to answer.

One aspect that will later complicate issues is how the models considered relate to each other.

Nested, non-nested and overlapping Models

We will not go into any technical details here, but explain the issue by illustration. Consider two different models:

[math]\textbf{Model A}: y_{t} = \beta_0 + \beta_1 * x_t + u_{At}\\ \textbf{Model B}: y_{t} = \beta_0 + \beta_1 * x_t + \beta_2 * z_t + u_{Bt}[/math]

This combination of models is nested, as a simple parameter restriction ([math]\beta_2=0[/math]) in one model (here Model B) turns Model B into Model A. As it turns out, if you use models that are related in this way, statistical inference to establish which of them is the superior forecasting model is, in general, greatly complicated.

It is easier to compare models statistically if they are non-nested models. These are often models coming from different model classes. Say you are using a nonlinear model (without being specific about its type) and a linear model. It is often impossible to restrict the parameters of one of the models (here the more complex, nonlinear model) in such a way that it simplifies to the less complex (here linear) model.

The next relationship type between models is that of overlap. Consider the following two models A (as above) and C

[math]\textbf{Model A}: y_{t} = \beta_0 + \beta_1 * x_t + u_{At}\\ \textbf{Model C}: y_{t} = \alpha_0 + \alpha_1 * z_t + u_{Ct}[/math]

These models are in general different, unless the following parameter restrictions are valid: [math]\beta_1=0[/math] in Model A and [math]\alpha_1=0[/math] in Model C. If both these restrictions hold, the two models will deliver identical results. Such models are called overlapping. Comparing models of this sort is equally complicated.

Forecast Evaluation

When it comes to comparing forecasts from different models there is a wide range of possibilities. In this section we will only touch on a limited selection of these.

Consider a model that produces a series of forecasts [math]\hat{y}_{i,\tau}[/math] where the index [math]\tau[/math] is over all periods for which you produce forecasts and the index [math]i[/math] is to differentiate between forecasts coming from different models. At this stage we will assume that the forecasts [math]\hat{y}_{i,\tau}[/math] are one step ahead forecasts and have been conditioned on information available at time [math]\tau-1[/math]. The forecast error [math]u_{i,\tau}[/math] is defined as [math]u_{i,\tau}= y_{\tau} - \hat{y}_{i,\tau}[/math] and is the basis of all methods of forecast evaluation used here.

Individual measures of forecast precision

Here we present measures that are used to give a summary statistic for a given forecast model.

The most common summary measures of forecast performance are the following

[math]\textbf{Mean Forecast Bias}: bias_i = \frac{1}{P} \sum_{\tau} u_{i,\tau}\\ \textbf{Mean Square Error}: MSE_i = \frac{1}{P} \sum_{\tau} u_{i,\tau}^2\\ \textbf{Mean Absolute Error}: MAE_i = \frac{1}{P} \sum_{\tau} |u_{i,\tau}|\\[/math]

Here the summations are over all [math]P[/math] forecast periods [math]\tau[/math]. When these measures are used we would argue that smaller values (in absolute terms for the bias measure) are to be preferred. What these measures cannot tell us is whether the differences in these measures, between different models, are statistically significant.
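
Given the realised values and the forecasts over the evaluation period, these measures are straightforward to compute. The following MATLAB sketch assumes a (P x 1) vector of realisations y_out and a (P x 1) vector of one-step ahead forecasts y_hat from one model; both variable names are hypothetical.

u    = y_out - y_hat;      % forecast errors u_{i,tau} over the P forecast periods
P    = length(u);          % number of forecast periods

bias = mean(u);            % mean forecast bias
MSE  = mean(u.^2);         % mean square error
MAE  = mean(abs(u));       % mean absolute error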

Comparing Different Forecasts

We can distinguish two different types of approaches to comparing two or more forecast models. Clark and McCracken (2011) describe the traditional approach as the population level approach and distinguish it from finite sample inference. The key issue that the different approaches address is the fact that model parameters are estimated on the basis of finite samples. The second approach, finite sample inference, accepts these estimated parameters as they are and asks whether, given these estimated parameters, different models deliver significantly different forecast performance. The former, the population level approach, uses the same sample evidence, produced with parameters estimated on the basis of finite samples, but tests hypotheses about the true model parameters. This implies that the resulting test statistics have to take account of the variation introduced through the parameter estimation process.

In what follows we will point out which tests belong to which category.

Diebold-Mariano (DM) test

This is possibly the most commonly applied test for forecast accuracy. But let's start with the most important restriction of the test: it is designed for comparing forecasts that are not based on estimated models. If they are, as hinted at above, we need to take into account the fact that model parameters are estimated with uncertainty, and indeed we need to consider the relation between the models that are compared (nested or non-nested!). The DM test abstracts from these difficulties as we assume that the forecasts basically "fall from the sky". They could come, for instance, from different surveys.

The hypothesis the DM test is designed to test is [math]H_0: MSE_A = MSE_B[/math], where we refer to two forecast series, A and B. At the core of the test statistic is what is called the loss differential, [math]d_{AB,\tau}=L(u_{A,\tau})-L(u_{B,\tau})[/math]. If we use a quadratic loss function this is [math]d_{AB,\tau}=u_{A,\tau}^2-u_{B,\tau}^2[/math]. The test statistic is then

[math]DM_{AB} = \frac{\bar{d}_{AB}}{\hat{\sigma}_{\bar{d}_{AB}}}[/math]

where [math]\bar{d}_{AB}=\frac{1}{P}\sum_{\tau} d_{AB,\tau}[/math], assuming that we have [math]P[/math] forecast periods, and [math]\hat{\sigma}_{\bar{d}_{AB}}[/math] is a consistent estimate of the standard deviation of [math]\bar{d}_{AB}[/math]. If the series [math]d_{AB,\tau}[/math] is covariance stationary then, under the null hypothesis of equal predictive ability, it is easy to show that [math]DM_{AB}[/math] is asymptotically standard normally distributed.

The only complication here is that [math]\hat{\sigma}_{\bar{d}_{AB}}[/math] needs to be estimated consistently, for instance allowing for autocorrelation and heteroskedasticity. The simplest way of achieving this is to use a regression framework. You merely need to estimate an OLS regression

[math]d_{AB,\tau} = \alpha + \epsilon_{\tau}[/math]

where [math]\hat{\alpha}[/math] will be equal to [math]\bar{d}_{AB}[/math] and the estimated standard error of the constant term will be an estimate of [math]\hat{\sigma}_{\bar{d}_{AB}}[/math]. If you want to allow for heteroskedasticity and/or autocorrelation you should employ an OLS routine that calculates Newey-West standard errors (for example the one discussed here). It is then obvious that the t-statistic for the constant term is equivalent to [math]DM_{AB}[/math].
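
As an illustration, the following self-contained MATLAB sketch computes the DM statistic directly from two forecast error series uA and uB (hypothetical (P x 1) vectors), replacing the external OLS routine with a Bartlett-kernel (Newey-West type) estimate of the long-run variance; the bandwidth rule is a common rule of thumb rather than part of the test, and normcdf requires the Statistics Toolbox.

d     = uA.^2 - uB.^2;                 % loss differential d_{AB,tau} under quadratic loss
P     = length(d);
dbar  = mean(d);                       % equals the OLS estimate of the constant in the regression above

nlags = floor(4*(P/100)^(2/9));        % rule-of-thumb bandwidth (an assumption)
e     = d - dbar;                      % demeaned loss differential
lrv   = e'*e/P;                        % variance term of the Bartlett-kernel long-run variance
for j = 1:nlags
    lrv = lrv + 2*(1 - j/(nlags+1))*(e(j+1:end)'*e(1:end-j)/P);   % weighted autocovariances
end

DM    = dbar/sqrt(lrv/P);              % DM test statistic
pval  = 2*(1 - normcdf(abs(DM)));      % two-sided p-value from the standard normal distribution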

Comparing MSE when using estimated models

The following tests belong to the category of population level tests. Here we compare forecasts from two models, A and B, and decide whether the null hypothesis [math]H_0:MSE_A=MSE_B[/math], formulated for the true but unknown population parameters, can be rejected. The difference to the DM test discussed in the previous sub-section is that we now allow the forecasts to come from estimated models and hence we need to allow for parameter uncertainty to come into the equation. In effect we are moving from comparing forecasts (as in the DM test) to comparing models (via their forecasts).

As it turns out, the test statistic

[math]MSE-t_{AB} = \frac{\bar{d}_{AB}}{\hat{\sigma}_{\bar{d}_{AB}}}[/math]

is exactly the same as the [math]DM[/math] statistic[2]. To be precise, in this context the loss differential [math]\bar{d}_{AB}(\hat{\theta}_A,\hat{\theta}_B)[/math] is conditional on the estimated model parameters [math]\hat{\theta}_A[/math] and [math]\hat{\theta}_B[/math], while the hypothesis tested, [math]H_0:MSE_A(\theta_A)=MSE_B(\theta_B)[/math], states that the models have equal predictive ability under the true (but unknown) model parameters [math]\theta_A[/math] and [math]\theta_B[/math].

As it turns out, this difference to the DM setup, although it may sound minor or technical, wreaks havoc when it comes to deriving the asymptotic distribution of the test statistic. Under the simple assumption made in DM the test statistic was asymptotically standard normally distributed. Now the test’s distribution under the null hypothesis of equal predictive ability depends on a number of features, such as whether the models are nested or not and which forecasting scheme has been used.

In general, the resulting distributions are such that one has to employ bootstrapping techniques to establish correct critical values (see Clark and McCracken, 2011, for details).

Forecast encompassing

The test discussed in the previous section is a two-sided test in the sense that there is no prior view on which of the models may be superior. The concept of forecast encompassing works with a different hypothesis. Now the null hypothesis is that, say, Model A is as good as it gets, or at least that Model B does not contribute any extra forecasting value beyond that delivered by Model A. If this is the case then Model A is said to forecast encompass Model B.

Previously the test statistic was based on [math]d_{AB,\tau}=u_{A,\tau}^2-u_{B,\tau}^2[/math]. Now we base the test on the term [math]c_{AB,\tau}=u_{A,\tau}(u_{A,\tau}-u_{B,\tau})[/math]. We are essentially trying to establish whether the forecast error of Model A, [math]u_{A,\tau}[/math], is correlated with the difference in forecast errors, [math](u_{A,\tau}-u_{B,\tau})[/math]. If Model B cannot improve on Model A, then there should be no such correlation. The test statistic we use here is

[math]ENC-t_{AB} = \frac{\bar{c}_{AB}}{\hat{\sigma}_{\bar{c}_{AB}}}[/math]

where [math]\bar{c}_{AB}=\frac{1}{P}\sum_{\tau} c_{AB,\tau}[/math], assuming that we have [math]P[/math] forecast periods, and [math]\hat{\sigma}_{\bar{c}_{AB}}[/math] is a consistent estimate of the standard deviation of [math]\bar{c}_{AB}[/math]. As for the DM and [math]MSE-t[/math] tests, there is a convenient regression setup that can be used to calculate the test statistic

[math]u_{A,\tau} = \alpha (u_{A,\tau}-u_{B,\tau}) + \epsilon_{\tau}[/math]

in which the null hypothesis of Model A encompassing Model B is represented by [math]H_0: \alpha = 0[/math] (against [math]H_A: \alpha \gt 0[/math], one-sided!). The [math]ENC-t[/math] test statistic is then essentially equivalent to the t-test on [math]\alpha[/math]. It is again important that the standard error of [math]\hat{\alpha}[/math] is estimated using robust methods, such as Newey-West standard errors.
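
Analogous to the DM sketch above, the ENC-t statistic can be computed directly from the forecast error series uA and uB (hypothetical (P x 1) vectors), again using a Bartlett-kernel estimate of the long-run variance, this time of the c series.

c     = uA.*(uA - uB);                 % c_{AB,tau}
P     = length(c);
cbar  = mean(c);                       % sample mean of c_{AB,tau}

nlags = floor(4*(P/100)^(2/9));        % rule-of-thumb bandwidth (an assumption)
e     = c - cbar;                      % demeaned series
lrv   = e'*e/P;                        % variance term of the Bartlett-kernel long-run variance
for j = 1:nlags
    lrv = lrv + 2*(1 - j/(nlags+1))*(e(j+1:end)'*e(1:end-j)/P);   % weighted autocovariances
end

ENCt  = cbar/sqrt(lrv/P);              % ENC-t statistic, to be tested one-sided (H_A: alpha > 0)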

So far so good, but as it turns out, the asymptotic distribution of this test statistic will, as for [math]MSE-t[/math], depend on the forecasting scheme (recursive, rolling, fixed) and the relation between models A and B. Therefore, a bootstrap methodology is again called for.

Literature

An excellent overview of the issues at hand is given in

Clark, T.E. and McCracken, M.W. (2011) Advances in Forecast Evaluation, Federal Reserve Bank of St. Louis Working Paper 2011-025, http://research.stlouisfed.org/wp/2011/2011-025.pdf

A significant part of this paper builds on a previous survey paper:

West, K.D. (2006) Forecast Evaluation, in: Handbook of Economic Forecasting, Volume 1, edited by G. Elliott, C.W.J. Granger and A. Timmermann

A review of the uses and abuses of the Diebold-Mariano test has recently been provided by the man himself, Francis Diebold (http://www.ssc.upenn.edu/~fdiebold/papers/paper113/Diebold_DM%20Test.pdf).

Footnotes

  1. Here we are not talking about genuine out-of-sample forecasts, as they would forecast for time periods after [math]T[/math].
  2. The test statistic has a different label to indicate that the underlying assumptions are different from those for the [math]DM[/math]-test. The name [math]MSE-t[/math] is the one used in the Clark and McCracken (2011) paper.