# R TimeSeries

## The forecast package

In this section we will demonstrate how to do basic univariate time-series modelling with R. We will use a package written by Rob Hyndman called "forecast". Before you get started you need to install it in R:

   install.packages("forecast")


Note that this package requires R version 3 or later. Then, at the beginning of your code, you will have to load the library by adding

   library(forecast)


## Importing Data

Here we will initially use a dataset on UK CPI, UKCPI.xls. Download this and save it as a CSV file, as this will make importing it into R easier.

   setwd("YOUR DIRECTORY")        # This sets the working directory
   data <- read.csv("UKCPI.csv")  # Opens UKCPI.csv from working directory


which will produce a data frame (data) with two variables: one giving dates (DATE) and the other containing the actual CPI data for 1988Q1 to 2013Q4 (UKCPI). You could find more recent data e.g. from the excellent FRED database [1].
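To check that the import worked you can inspect the frame with str() or head(). A sketch with made-up values standing in for the real file (the actual numbers come from UKCPI.csv):

```r
# Illustrative only: a tiny stand-in for the data frame read.csv() returns
data <- data.frame(DATE  = c("1988 Q1", "1988 Q2", "1988 Q3"),
                   UKCPI = c(48.4, 49.3, 49.8))
str(data)    # 'data.frame': 3 obs. of 2 variables: DATE, UKCPI
head(data)   # shows the first rows of the frame
```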

## Basic Time-Series Data Transformations

### Forcing Time-Series

The first thing we need to do is ensure that R knows that the data we are dealing with are actually time-series data. There is a function in R you can use to check whether R recognises data as a time series or not:

   print(is.ts(data$UKCPI)) # check whether it is a time-series object


This will print either TRUE or FALSE. In this case the answer is FALSE: data$UKCPI is not recognised as a time series. We can now force R to recognise this variable as a time series. Use the following command:

   data$UKCPI <- ts(data$UKCPI, start=c(1988, 1), end=c(2013, 4), frequency=4)


Let's see what is happening here. ts() is the function used and, as usual, you can use ?ts to find out more details. The first input is the data series we want to be recognised as a time series. The next inputs are the date of the first observation, start=c(1988, 1), the date of the last observation, end=c(2013, 4), and, importantly, the data frequency, frequency=4 (4 representing quarterly data and 12 monthly data).

To confirm that the data are now time-series data you could use the is.ts() function again which should now return a TRUE.
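Once a series is a ts object, helper functions such as start(), end(), frequency() and window() become available. A small illustration with a made-up quarterly series of the same shape as the CPI data:

```r
# Illustrative quarterly series: 104 observations = 1988Q1 to 2013Q4
x <- ts(rnorm(104), start = c(1988, 1), frequency = 4)
print(is.ts(x))        # TRUE
print(frequency(x))    # 4
print(start(x))        # 1988 1
print(window(x, start = c(2010, 1)))  # subset from 2010Q1 onwards
```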

### Differencing data

Often you will need a differenced time series $\Delta y_t = y_{t} - y_{t-1}$. The command to create this is straightforward:

   dUKCPI <- diff(data$UKCPI)  # first difference of the UK CPI series
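The base-R function for this is diff(); applied to a ts object it returns another ts object, starting one period later (the first observation is lost in differencing). A toy illustration:

```r
# diff() on a ts object keeps the time-series attributes
y  <- ts(c(100, 102, 101, 105), start = c(1988, 1), frequency = 4)
dy <- diff(y)      # y_t - y_{t-1}
print(dy)          # 2 -1 4, a quarterly ts starting 1988 Q2
print(is.ts(dy))   # TRUE
```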

## ARMA models

A generic ARMA(p,q) model can be written as

$y_t = \alpha_0 + \alpha_1 y_{t-1} + ... + \alpha_p y_{t-p} + \epsilon_t + \delta_1 \epsilon_{t-1} + ... + \delta_q \epsilon_{t-q}$

If the series that is being modelled with this ARMA(p,q) process is a differenced series, e.g. dUKCPI, then we would actually call this an ARIMA(p,1,q) model. If, however, we model the UK CPI directly, then we would call it an ARIMA(p,0,q) model.

It should be noted that the forecast package which we use actually estimates the following model:

$(y_t-m) = \alpha_1 (y_{t-1}-m) + ... + \alpha_p (y_{t-p}-m) + \epsilon_t + \delta_1 \epsilon_{t-1} + ... + \delta_q \epsilon_{t-q}$

in other words, we drop the constant and instead estimate a mean parameter $m$. This has the advantage that the estimated value for $m$ (it will be reported as the constant) can actually be interpreted as the mean of the modelled series.
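You can see the same parameterisation in base R's arima(), which also reports the estimated mean under the label "intercept". A simulated sketch (the series and its mean of 10 are made up for illustration):

```r
set.seed(1)
# AR(1) with true mean 10: (y_t - 10) = 0.8 (y_{t-1} - 10) + e_t
y   <- arima.sim(model = list(ar = 0.8), n = 500) + 10
fit <- arima(y, order = c(1, 0, 0))
# The "intercept" estimates the mean m (about 10),
# not the regression constant alpha_0 = m * (1 - 0.8) = 2
print(fit$coef["intercept"])
```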

### Estimating ARMA Models

To estimate a specific ARMA(p,q) model, say an ARIMA(4,0,0), ARMA(4,0) or AR(4) model, we use the Arima() function from the "forecast" package. Its use is very simple and straightforward:

   fit_diff_400 <- Arima(dUKCPI, order=c(4,0,0))


The first input is the time series for which the model is being estimated and the second input delivers the model order information to R, order=c(4,0,0). The first number gives the AR order, the second gives the number of times we want to difference the time series, and the third is the MA order. You got it if you realise that

   Arima(data$UKCPI, order=c(4,1,0))


delivers exactly the same results. If you don't believe it, try it!

Before we discuss the output of such an ARIMA estimation, there is one further little detail worth mentioning. Whenever we are using autoregressive lags, e.g. $p$ lags of $y_t$, we are in some sense losing $p$ observations. The values $y_1$ to $y_p$ only appear on the right-hand side, never on the left-hand side as the dependent variable, since we do not have all the conditioning information for them. If we were to estimate the model only for $y_{p+1}$ onwards, we would essentially be treating $y_1$ to $y_p$ as conditioning information only and not trying to fit the model to these values. This means we would not be using all information efficiently. It is not straightforward to explain how to use these data well (read James Hamilton's textbook on Time Series Analysis if you want to know), but we are lucky: the clever people who wrote the forecast package did the right thing. So when you use the Arima() function as described above all is good.

The output of this function is a list (just like the output of the lm() function). To figure out what information you can extract from this estimated model, type

   names(fit_diff_400)


which in this case will produce

    [1] "coef"      "sigma2"    "var.coef"  "mask"
    [5] "loglik"    "aic"       "arma"      "residuals"
    [9] "call"      "series"    "code"      "n.cond"
   [13] "model"     "bic"       "aicc"      "x"


You should recognise a few of these. There are estimated coefficients (coef), residuals and information criteria (aic, bic and aicc). As usual, you can access individual elements using the $-sign, e.g. fit_diff_400$coef.

When estimating univariate time-series models it is crucial to get the process orders right. The most common technique is to find the process order that minimises an information criterion.
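A sketch of such an order search, using a simulated series and base R's arima() as a stand-in for forecast::Arima() (both expose the fitted log-likelihood, so AIC() works on either):

```r
set.seed(1)
y <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 200)  # simulated AR(2)

# Fit AR(p) for p = 0,...,4 and record each model's AIC
aics   <- sapply(0:4, function(p) AIC(arima(y, order = c(p, 0, 0))))
best_p <- which.min(aics) - 1   # sapply position 1 corresponds to p = 0
print(best_p)                   # the order with the smallest AIC
```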
You could do that manually by estimating many models of different orders and comparing an information criterion, or you could let R do the hard work and use the auto.arima() function. Without going into any details (by now you know how to access details about the use of functions), just try

   fit_diff <- auto.arima(dUKCPI, seasonal=FALSE)


and try to figure out what happens.

### Forecasting ARMA Models

Once you have an estimated model you can use that information to produce forecasts. Again there is a convenient function that handles the details of the forecasting; you merely need to know what information needs to be handed to the forecasting function, forecast(). We will illustrate this with an example.

   fit_diff_400f <- forecast(fit_diff_400, h=10)


The first input the forecast() function requires is the list in which we saved the model estimates, here fit_diff_400. The second input specifies how many periods we want to forecast ahead, here h=10 periods. Again this function calculates all sorts of things, and names(fit_diff_400f) reveals which outputs are available. The actual forecasts can be accessed from fit_diff_400f$mean.

Often you will want to see a graphical representation of the forecasts. This is again easily done using the plot function. Let's look at the following example:

   plot(forecast(fit_diff_400,h=10))


The first input is our above call to the forecast function, forecast(fit_diff_400, h=10) (you could also have used fit_diff_400f directly). This will deliver the following plot:

As you can see, this plots all the data used to estimate the model plus the 10 quarters of forecasts. The shaded areas indicate the forecast confidence intervals. If you add a second input, include = 40, to the plot function

   plot(forecast(fit_diff_400,h=10), include = 40)


then we will only see the last 40 observations of the series used to estimate the model (this plot is not shown here).
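If you want the plotted numbers themselves rather than the picture, the same quantities can be computed without the forecast package. A self-contained base-R sketch (simulated data, with arima()/predict() standing in for Arima()/forecast()):

```r
set.seed(42)
y   <- arima.sim(model = list(ar = 0.5), n = 100)  # simulated AR(1) series
fit <- arima(y, order = c(1, 0, 0))
fc  <- predict(fit, n.ahead = 10)
print(fc$pred)  # 10-step-ahead point forecasts
print(fc$se)    # their standard errors, used to build the interval bands
```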

## Online Demo

Here is an online clip which goes through this material [2].