UniTS
Introduction
When dealing with time-series data we will often revert to modelling them in a univariate framework, meaning we build one model for one time-series, without reference to any other variables. It is, of course, understood that in an economic context no time-series evolves without being influenced by other series, and therefore univariate models should not be understood as being even a remotely realistic representation of the true dynamics.
However, such univariate models have been found to be useful for at least two reasons:
- To be used as a forecasting tool (mainly in the short term).
- To describe some important dynamic features of the time-series. In particular the most common tests for stationarity are couched in a univariate time-series modeling framework.
ARMA models
Two basic linear models are commonly used in time-series modelling: autoregressive (AR) and moving average (MA) models. Combinations of the two are called ARMA models. Below is the general form of an ARMA(p,q) model.
[math]y_{t}=\alpha +\phi _{1}y_{t-1}+...+\phi _{p}y_{t-p}+\varepsilon _{t}+\theta_{1}\varepsilon _{t-1}+...+\theta _{q}\varepsilon _{t-q}[/math]
For notational ease we shall collect all parameters in a [math]((p+q+1) \times 1)[/math] vector [math]\beta = (\alpha \ \phi_1 \ ... \ \phi_p \ \theta _{1} \ ... \ \theta _{q})'[/math]. It is straightforward to extend the above model to also include further (weakly exogenous) variables. Such models are often called ARMAX models and the MATLAB code used below allows for this extension in a very straightforward manner. Such models should be applied to time series [math]y_t[/math] that are stationary. The stationarity of a time-series should be established prior to applying this modelling approach.
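To get a feel for the type of dynamics such a model generates, the following short sketch simulates an ARMA(1,1) process in MATLAB; the parameter values used are purely illustrative.
T     = 500;                             % number of observations to simulate
alpha = 0.5;  phi = 0.7;  theta = 0.3;   % illustrative parameter values only
e     = randn(T,1);                      % white noise errors
y     = zeros(T,1);
y(1)  = alpha/(1-phi);                   % start the recursion at the unconditional mean
for t = 2:T
    y(t) = alpha + phi*y(t-1) + e(t) + theta*e(t-1);
end
plot(y); title('Simulated ARMA(1,1) series')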
Parameter estimation
Parameter estimation of ARMA(p,0) models, i.e. AR(p) models, is extremely straightforward as it can be achieved by standard OLS estimation. Say you have your favourite OLS estimation procedure (e.g. OLSest.m from this page). You will then need to define your vector with observations of the dependent variable (dep) and a matrix with the explanatory variables in its columns (exp), which are used as inputs into your OLS function. All of these will have to be defined from your one vector of time-series data. Let's say your time-series is saved in the vector data, that this data vector has 315 observations and that you want to estimate an AR(3) model:
[math]data_{t}=\alpha +\phi _{1}data_{t-1}+\phi _{2}data_{t-2}+\phi _{3}data_{t-3}+\varepsilon _{t}[/math]
You therefore have four explanatory variables, the constant, [math]data_{t-1}[/math], [math]data_{t-2}[/math] and [math]data_{t-3}[/math], meaning that your matrix exp should have four columns. It is important that the dep and exp variables have an identical number of rows and that they are lined up appropriately, meaning that in the row in which we have [math]data_{10}[/math] in the dep variable, we should have a 1 for the constant and [math]data_9[/math], [math]data_8[/math] and [math]data_7[/math] in the exp variable. This implies that the first row of dep should contain [math]data_4[/math] and the first row of exp should contain a 1, [math]data_3[/math], [math]data_2[/math] and [math]data_1[/math]. Altogether the input variables should look as follows:
[math]dep = \left( \begin{array}{c} data_4 \\ data_5 \\ \vdots \\ data_{314} \\ data_{315} \\ \end{array} \right); \quad exp = \left( \begin{array}{cccc} 1 & data_3 & data_2 & data_1 \\ 1 & data_4 & data_3 & data_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & data_{313} & data_{312} & data_{311} \\ 1 & data_{314} & data_{313} & data_{312} \\ \end{array} \right)[/math]
These are a [math](312 \times 1)[/math] vector and a [math](312 \times 4)[/math] matrix respectively. In MATLAB we will achieve this as follows, assuming the variable data is a [math](315 \times 1)[/math] vector with all observations:
dep = data(4:end);
exp = [ones(size(dep)) data(3:end-1) data(2:end-2) data(1:end-3)];
All that is left to do is to feed these variables into the OLS estimation function and the [math](4 \times 1)[/math] estimated parameter vector will be the OLS estimates for [math](\alpha \ \phi_1 \ \phi_2 \ \phi_3)'[/math].
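If you do not have a pre-written OLS function at hand, the estimation step itself can be sketched directly with the standard OLS formula, using the dep and exp variables constructed above (this is only a minimal illustration; a function such as OLSest.m will also deliver standard errors and other output):
bhat = (exp'*exp)\(exp'*dep);                 % (4 x 1) vector of OLS parameter estimates
res  = dep - exp*bhat;                        % estimated residuals
s2   = (res'*res)/(size(exp,1)-size(exp,2));  % estimated residual variance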
Estimation of MA(q), and by implication ARMA(p,q) models where [math]q\gt 0[/math], is less straightforward as the [math]\varepsilon _{t}[/math]s are unobserved. This has the consequence that there is no neat analytical formula (unlike for OLS estimation) which can be used to obtain the optimal parameters. What is required is a recursive calculation of all error terms (this calculation will, of course, depend on the parameter vector [math]\beta[/math] used). As there is no analytical solution we will have to revert to a nonlinear (or trial and error) optimisation strategy (see NonlinOptTheory for details).
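To illustrate what this recursive calculation involves, the following sketch computes the residuals of an ARMA(1,1) model for one given candidate parameter vector [math]\beta = (\alpha \ \phi \ \theta)'[/math], assuming the series is saved in the vector data; a numerical optimiser would then search over [math]\beta[/math] to make these residuals as small as possible (e.g. by minimising their sum of squares):
alpha = 0.1;  phi = 0.5;  theta = 0.2;   % one candidate parameter vector (illustrative values only)
T = length(data);
e = zeros(T,1);                          % e(1) is set to zero as a starting value
for t = 2:T
    e(t) = data(t) - alpha - phi*data(t-1) - theta*e(t-1);
end
ssr = e'*e;    % one possible criterion a numerical optimiser would minimise over the parameters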
MATLAB implementation
To estimate AR, MA or ARMA models we recommend the use of pre-written functions, in particular the functions provided through the MFE toolbox (https://bitbucket.org/kevinsheppard/mfe_toolbox) [1]. The particular function that is written to estimate any type of ARMAX(p,q) model is parameters = armaxfilter(data,1,P,Q). For all the details and possible outputs that can be provided by this function (e.g. log-likelihood, estimated residuals, AIC etc.) see doc armaxfilter. The first input is the time-series vector, the second input is equal to "1" if you want to include a constant ("0" otherwise), the third input, P, is a vector containing all the AR lags you want to include (see example below), and the last input, Q, is a vector that contains all the MA lags you want to include.
Assume you have a time-series saved in y. In order to estimate an AR(3) model (as above), we would call
pars = armaxfilter(y,1,(1:3),[])
This code will return the estimated constant (in pars(1)) and the three AR coefficients (in pars(2:end)). You can see that the last input was an empty vector ([ ]), indicating that we did not want to include any MA terms (AR(3) is equivalent to ARMA(3,0)). If you want to add MA terms then you indicate which terms you want in that last input. For example, if you wanted to estimate an ARMA(3,2) model then the relevant function call would be
pars = armaxfilter(y,1,(1:3),(1:2))
const = pars(1); % extract constant
arpar = pars(2:4); % extract AR parameters
mapar = pars(5:6); % extract MA parameters
The results vector pars will return the estimated parameters in the following order: constant, AR parameters, MA parameters. Please consult the help for the armaxfilter function to see how to include exogenous variables (the X in ARMAX) in the estimation.
Model identification
A practical problem when using univariate time series models is to identify the correct order of the ARMA(p,q) model, i.e. the values for [math]p[/math] and [math]q[/math] to be used. The most common approach is to use information criteria to identify the best model. The idea behind information criteria is the following. The more explanatory variables we use (larger [math]p[/math] and [math]q[/math]), the better we will fit the data. However, it has been shown that this does not necessarily translate into better forecast performance, and hence we want to trade off this improved in-sample data fit against the need to include more explanatory variables to achieve it.
This trade-off is formalised in the form of information criteria. The in-sample fit is measured by [math]s_{\widehat{\varepsilon }}^{2}[/math], the estimated residual variance of the model. The number of parameters in the case of [math]ARMA(p,q)[/math] models is [math]k=p+q+1[/math], assuming that a constant is included. The sample size used in the estimation is [math]n[/math]. The Akaike Information Criterion (AIC) is:
[math]AIC\left( k\right) =\log \left( s_{\widehat{\varepsilon }}^{2}\right) +2k/n[/math]
and the Schwarz (Bayes) Information Criterion (SIC) is:
[math]SIC\left( k\right) =\log \left( s_{\widehat{\varepsilon }}^{2}\right) +k\log \left( n\right) /n[/math]
The idea is that you calculate these statistics for the different models and then choose the model that delivers the smallest value of the criterion. There are different arguments for why you should use one or the other information criterion (and more than just these two are available), but in practical terms it is useful to know that, in general, the AIC criterion will tend to favour models with more explanatory variables compared to the SIC.
One last note on the use of these information criteria is required. When you compare time-series models different lag lengths also mean different sample sizes. When you estimate time-series models with different lag lengths you should ensure that they are all estimated on the same effective sample size.
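As an illustration of the two formulas, here is a short sketch of how the criteria could be computed by hand, assuming the estimated residuals of a model are saved in the vector res, the model has k parameters and [math]s_{\widehat{\varepsilon }}^{2}[/math] is computed as the sum of squared residuals divided by [math]n[/math]:
n   = length(res);            % effective sample size used in the estimation
k   = 4;                      % number of parameters, e.g. an AR(3) model with constant
s2  = (res'*res)/n;           % estimated residual variance
AIC = log(s2) + 2*k/n;
SIC = log(s2) + k*log(n)/n;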
MATLAB implementation
We just stated that we want to find the ARMA(p,q) model that minimises the information criterion of our choice. When doing so we need to make a decision on what the maximum values for [math]p[/math] and [math]q[/math] should be. Let's say it was maxlag = 4. Then we would have to estimate 25 possible models, as [math]p[/math] and [math]q[/math] can each take values from 0 to 4. The way to implement this is in a loop. Here is the general structure:
%% Identify the best lag orders p and q using SIC and AIC
% ARMA(p,q) model with p,q = 0,1,...,maxlag
maxlag = 4;                            % We search over lag orders p,q = 0,1,...,4
save_AIC = zeros(maxlag+1,maxlag+1);   % Save the AIC for all possible combinations of p and q
save_SIC = zeros(maxlag+1,maxlag+1);   % Save the SIC for all possible combinations of p and q
% Estimate ARMA(p,q) model and save AIC and SIC
for i = 0:maxlag
    for j = 0:maxlag
        % Define required lags
        P = (1:i);   % if i==0 this will create an empty vector as required
        Q = (1:j);   % if j==0 this will create an empty vector as required
        % need 5th output of armaxfilter as that is a structure
        % which contains the AIC and SIC
        [b, LL, errors, SEregression, di] = armaxfilter(y,1,P,Q,[],[],[],maxlag);
        save_AIC(i+1,j+1) = di.AIC;
        save_SIC(i+1,j+1) = di.SBIC;
    end
end
This code starts by defining the maximum lag, followed by defining storage matrices into which we will save the calculated AIC and SIC criteria. You can then see two nested loops. Inside the loops we first define the lags in the way in which they are needed by the armaxfilter function. Then we call the armaxfilter function that does all the hard work. There are two noteworthy differences to the way we called the armaxfilter function previously. First, we are asking it to return five outputs: [b, LL, errors, SEregression, di]. This is required because the fifth output, di, is a structure which contains the SIC and AIC statistics (in di.SBIC and di.AIC).
Second, we are also delivering more inputs, namely eight: (y,1,P,Q,[],[],[],maxlag). We are familiar with the first four, and inputs 5, 6 and 7 are left empty ([ ]). You can consult help armaxfilter to figure out exactly what the (unused) inputs 5 to 7 are for. But the eighth input is used and we deliver maxlag to the function. This input ensures that all the models in these loops are evaluated on the same set of observations. Consider two different ARMA models, ARMA(2,1) and ARMA(4,4). For the former we lose two observations, but for the latter we lose four. That means that, if we just estimated these models, the ARMA(2,1) model would be evaluated on two more observations than the ARMA(4,4) model. But this also means that when we compare information criteria for these two models, their respective [math]s_{\widehat{\varepsilon }}^{2}[/math] are based on slightly different observations and therefore we cannot really compare them anymore. The eighth input (maxlag) basically excludes the first maxlag observations from the estimation. As long as none of the AR or MA lags is larger than maxlag, this ensures that all models are evaluated on the same range of observations.
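Once the two matrices are filled, the combination of [math]p[/math] and [math]q[/math] that minimises the chosen criterion can be located as follows (recall that row [math]i+1[/math] and column [math]j+1[/math] of the storage matrices correspond to [math]p=i[/math] and [math]q=j[/math]); this is a minimal sketch using the SIC:
[~, ind]   = min(save_SIC(:));              % position of the smallest SIC in the matrix
[row, col] = ind2sub(size(save_SIC), ind);  % convert the linear index to row/column indices
best_p = row - 1;                           % rows correspond to p = 0,...,maxlag
best_q = col - 1;                           % columns correspond to q = 0,...,maxlag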
Forecasting
Footnotes
- ↑ This means that you need to have this toolbox downloaded and made available to your MATLAB - use FILE - Set Path. Or alternatively use the technique demonstrated in this video clip.