Difference between revisions of "R TimeSeries"

From ECLR

Revision as of 22:19, 11 February 2015

In this section we will demonstrate how to do basic univariate time-series modelling with R. We will use the "forecast" package, written by Rob Hyndman. Before you get started, you need to install it from within R:

   install.packages("forecast")

Note that this package requires R version 3 or later. Then, at the beginning of your code, you will have to load the library by adding

   library(forecast)

to your code.

Importing Data

Here we will initially use a dataset on UK CPI, UKCPI.xls. Download this and save it as a csv file, as this makes it easier to import into R.

   setwd("YOUR DIRECTORY")              # This sets the working directory
   data <- read.csv("UKCPI.csv")  # Opens UKCPI.csv from working directory

which will produce a dataframe (data) with two variables, one giving dates (DATE) and the other containing the actual CPI data for 1988Q1 to 2013Q4 (UKCPI).
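To check that an import has produced the expected structure, it can help to inspect the dataframe straight away. The following is a minimal sketch that does not need the real file: it writes a tiny stand-in csv to a temporary location (the four CPI values are made up for illustration, and the date format is an assumption) and then reads it back with the same read.csv() call.

```r
# Hypothetical stand-in for UKCPI.csv: the values and date format are made up
csv_path <- tempfile(fileext = ".csv")
writeLines(c("DATE,UKCPI",
             "1988Q1,48.0",
             "1988Q2,48.9",
             "1988Q3,49.2",
             "1988Q4,49.9"), csv_path)

data <- read.csv(csv_path)   # same call as above, pointed at the temporary file
str(data)                    # lists the columns and their types
head(data)                   # shows the first few rows
```

With the real file you would skip the writeLines() step and call str(data) and head(data) directly after read.csv("UKCPI.csv").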

Basic Time-Series Data Transformations

The first thing we need is to ensure that R knows that the data we are dealing with are actually time series data. There is a function in R you can use to check whether R realises that data are time series data or not:

   print(is.ts(data$UKCPI))  # check whether it is a time-series object

This will print either TRUE or FALSE. In this case the answer is FALSE: data$UKCPI is not recognised as a time series. We can now force R to recognise this variable as a time series. Use the following command:

   data$UKCPI <- ts(data$UKCPI, start=c(1988, 1), end=c(2013, 4), frequency=4) 

Let's see what is happening here. ts() is the function used and as usual you can use ?ts to figure out some more details. Besides the data themselves, we needed to supply the date of the first observation, start=c(1988, 1), the date of the last observation, end=c(2013, 4), and importantly the data frequency, frequency=4 (4 representing quarterly data and 12 monthly data).
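As a minimal sketch of how these arguments fit together, the following uses a made-up numeric vector (not the CPI data) with 104 values, one for each quarter from 1988Q1 to 2013Q4. The names x and x_ts are chosen here for illustration only. It converts the vector with ts() and then reads the time-series attributes back with start(), end() and frequency().

```r
# Made-up values standing in for the CPI column: 104 quarters, 1988Q1 to 2013Q4
x <- seq(50, 101.5, length.out = 104)

print(is.ts(x))          # a plain numeric vector is not a ts object

# start and frequency are given; the end date then follows from the length of x
x_ts <- ts(x, start = c(1988, 1), frequency = 4)

print(is.ts(x_ts))       # now recognised as a time series
print(start(x_ts))       # first period as (year, quarter)
print(end(x_ts))         # last period as (year, quarter)
print(frequency(x_ts))   # number of observations per year
```

Here the end argument is left out on purpose, so that end(x_ts) acts as a check: if it does not report the last quarter you expect, the number of observations and the stated start date do not match. When both start and end are supplied, as in the command above, it is worth confirming that the length of the data agrees with the implied number of periods.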

Additional Resources

  • A very quick intro from Quick-R can be found here [1]
  • We are using the package "forecast" authored by Rob Hyndman who has also written an online textbook on the topic of forecasting [2]
  • To access some very useful data-series in a very convenient way we will also use the QUANDL package.