R Packages

This page introduces how packages are used in R. Being able to use packages is extremely important in R and will save you a lot of time, as you can use solutions that other people have already programmed.

What are they

The R software has some basic functionality, but the power of R comes from the ability to use code that other people have written to perform statistical and econometric techniques. These additional pieces of software are called packages. Packages usually include a host of functions that perform tasks related to a particular problem type (say, using probability distributions, estimating GARCH models, performing Bayesian inference etc.). The next step is to learn how to find, install and use them.

Finding the right package

The most difficult task is often to find the right package you want to use. Usually Dr Google or Prof Bing will be the right people to ask.

Let's say you want to show an empirical frequency distribution for a categorical variable, like the number of children under 6 or aged 6 and over (kidslt6, kidsge6). It is a bit of an art to find the right package (and there may be several packages which do the job). I googled the following term: "R empirical probability distributions package". If you don't include the word "package" you will tend to find solutions for programming it yourself, but if you want pre-written code, including "package" helps.

Scanning the results, there is a link to a package called "prob". If you open the pdf file (to which my search engine linked) you will find a list of the functions contained in that package, and you will soon see an "empirical" function which appears to be designed to do the job. Keep this file open as we will need to consult it to understand how to use the function. But first we need to make this code available to our script file.

Installing packages

Such packages do not come pre-installed into R, but luckily, they are easily installed and used. If you first want to check which packages are already installed on your computer, you can use the following command:

    ip <- installed.packages(.Library)

This produces an object ip which contains the list of all the packages that are already installed in R's default library (.Library). The process you need to go through to use a new package is the following:

  1. Install the package: install.packages("NAME_OF_PACKAGE"). The name comes in inverted commas. This only needs to be done once on any computer. The very nice thing is that you will not need to download anything yourself. R will do all the work for you! [1]
  2. Load the package into your particular code. You need to do this every time you start a new R session. So, if you are working in a script you would have the following command at the beginning: library(NAME_OF_PACKAGE). Now you use the package name without inverted commas.
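
The two steps above, together with the check for installed packages, can be combined in a short sketch. The helper name load_or_install is hypothetical (it is not part of R or of any package); it only wraps the standard calls requireNamespace(), install.packages() and library(). The example call uses "tools", a package that ships with R, so nothing is downloaded; replace it with "prob" to follow the example in this text.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
load_or_install <- function(pkg, repos = getOption("repos")) {
  # requireNamespace() returns FALSE (instead of stopping) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)  # downloads from a CRAN mirror
  }
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# List the installed packages (all library locations) and look one up by name
ip <- installed.packages()
is_installed <- "tools" %in% rownames(ip)

# "tools" ships with R, so this call only loads it
load_or_install("tools")
loaded <- "package:tools" %in% search()
```

Passing the package name as a string to the helper is a design choice made here so that the same name can be used for checking, installing and loading.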

How to use packages

Once you have installed a package and included the library(NAME_OF_PACKAGE) statement, you can access all the functions contained in the package. Go back to the pdf file that describes the "prob" package and find the section on the "empirical" function.

[Figure: documentation entry for the "empirical" function in the "prob" package manual]

This is a typical description of an R function. As it turns out, this is a rather easy function which only accepts one input, the variable for which you want to calculate the empirical distribution. The following applies this function to the kidslt6 variable:

    test <- empirical(mydata[c("kidslt6")])

This creates a new data frame test that contains the empirical distribution of kidslt6:

      kidslt6       probs
    1       0   0.804780876
    2       1   0.156706507
    3       2   0.034528552
    4       3   0.003984064

from which we can see that 80% of women in the dataset have no children under the age of 6 and only 3% have two.
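
As a cross-check that does not require the "prob" package, the same kind of table can be sketched with base R's table() and prop.table(). The values below are toy data made up for illustration, not the dataset used above, and the object names toy and result are assumptions.

```r
# Toy data for illustration only (NOT the dataset used in the text)
toy <- data.frame(kidslt6 = c(0, 0, 0, 1, 0, 2, 0, 1, 0, 0))

# table() counts how often each value occurs;
# prop.table() turns the counts into relative frequencies
freq  <- table(toy$kidslt6)
probs <- prop.table(freq)

# Arrange the result as a data frame with one row per distinct value
result <- data.frame(
  kidslt6 = as.numeric(names(probs)),
  probs   = as.numeric(probs)
)
print(result)
```

In this toy sample 7 of the 10 observations are zero, so the first row has probability 0.7.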

Updating Packages

Every once in a while you will want to update your packages (as they get maintained and updated by their authors). This is easily done using the function update.packages() (part of the "utils" package). All you need to do is call update.packages() from the R console, and R will iterate through the packages you have installed and ask you whether you want them updated. Depending on how many packages you have, this can take a while.
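
A minimal sketch of such an update session is shown below. It assumes a CRAN mirror has already been chosen, and it uses old.packages() and packageVersion(), which are both part of "utils". Because the update calls need an internet connection, the sketch only runs them in an interactive session.

```r
# Version of an installed package (here "utils", which ships with R)
v <- packageVersion("utils")
print(v)

# The calls below contact a CRAN mirror, so they are only run interactively
if (interactive()) {
  outdated <- old.packages()  # NULL if every package is up to date
  if (!is.null(outdated)) {
    # Compare the installed version with the version in the repository
    print(outdated[, c("Installed", "ReposVer"), drop = FALSE])
  }
  update.packages(ask = FALSE)  # update everything without asking per package
}
```

Leaving ask at its default makes R confirm each package individually, as described above; ask = FALSE is used here only to keep the sketch short.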

Additional Resources for Packages

  • A list of available packages can be found on the CRAN webpage.
  • If you are looking for something in particular, it may be a good idea to look at the CRAN Task Views, which are short overviews introducing useful packages for particular topic areas.

Footnotes

  1. The first time you do this on your computer, R will ask you which mirror you want to download from and will offer a list. Choose the one that is geographically closest to you.