Maps in R

From ECLR

Introduction

Here we will demonstrate how to combine maps with data. This is a very introductory task, but if you are interested you will find enough starting capital here to continue developing your knowledge and plotting skills.

I myself used the following introductory tutorial and found it very useful.

Let's start by setting a working directory of your choice:

setwd("C:/Users/Ralf/Dropbox/R/maptools")

The challenge

The task in this tutorial is to produce a map that is colour coded according to the level of monthly minimum wage for different countries in Europe.

The data

I got the 2015 minimum wage data for different countries from Eurostat. You can use this link to download the latest data, but for the purpose of practicing you could also just create the following dataframe:

countries <- c("Belgium","Bulgaria","Czech Republic","Denmark","Germany","Estonia","Ireland","Greece","Spain","France","Croatia","Italy","Cyprus","Latvia","Lithuania","Luxembourg","Hungary","Malta","Netherlands","Austria","Poland","Portugal","Romania","Slovenia","Slovakia","Finland","Sweden","United Kingdom","Iceland","Liechtenstein","Norway","Switzerland","Montenegro","Former Yugoslav Republic of Macedonia","Albania","Serbia","Turkey")
geo <- c("BE","BG","CZ","DK","DE","EE","IE","EL","ES","FR","HR","IT","CY","LV","LT","LU","HU","MT","NL","AT","PL","PT","RO","SI","SK","FI","SE","UK","IS","LI","NO","CH","ME","MK","AL","RS","TR")
minwage <- c(1501.82,184.07,331.71,NA,1473,390,1461.85,683.76,756.7,1457.52,395.61,NA,NA,360,300,1922.96,332.76,720.46,1501.8,NA,409.53,589.17,217.5,790.73,380,NA,NA,1378.87,NA,NA,NA,NA,288.05,213.72,156.99,235.04,424.26)
mydata <- data.frame(countries,geo,minwage,stringsAsFactors = FALSE)

Notice that we prevented R from translating the country names to factor variables. R likes factor variables, but for the purpose of plotting we want the country names as strings.
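If you want to convince yourself what stringsAsFactors does, the following minimal sketch may help. The two small dataframes demo_chr and demo_fct are made up for illustration and contain only two of the countries from above. Note that from R 4.0.0 onwards stringsAsFactors = FALSE is the default, so on a recent installation you would get strings even without specifying it.

```r
# Two tiny illustrative dataframes (names are hypothetical, not used later)
demo_chr <- data.frame(countries = c("Belgium", "Denmark"),
                       minwage = c(1501.82, NA),
                       stringsAsFactors = FALSE)
demo_fct <- data.frame(countries = c("Belgium", "Denmark"),
                       minwage = c(1501.82, NA),
                       stringsAsFactors = TRUE)

class(demo_chr$countries)      # "character"
class(demo_fct$countries)      # "factor"

# Count the countries without a minimum wage entry
sum(is.na(demo_chr$minwage))   # 1
```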

The maps

This tutorial recommends the use of a package called ggmap (although others are available), so we will follow this advice and, after installing them, load the following libraries:

x <- c("ggplot2","ggmap", "rgdal", "rgeos", "maptools", "dplyr", "tidyr", "tmap")
# install.packages(x) # warning: uncommenting this may take a number of minutes
lapply(x, library, character.only = TRUE) # load the required packages

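ggplot2 is an extremely powerful plotting tool, and ggmap builds on it to draw maps. The packages rgdal, rgeos and maptools handle spatial data such as shape files, dplyr and tidyr help with data manipulation, and tmap produces thematic maps.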
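Installing all packages again every time can be slow, so you may prefer to check first which ones are missing. The following is a sketch using base R only; the names needed, is_installed and missing_pkgs are my own choice and not part of any package.

```r
# The packages used in this tutorial
needed <- c("ggplot2", "ggmap", "rgdal", "rgeos", "maptools",
            "dplyr", "tidyr", "tmap")

# system.file() returns "" for a package that is not installed
is_installed <- vapply(needed,
                       function(p) nzchar(system.file(package = p)),
                       logical(1))
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only these
} else {
  message("All packages are installed.")
}
```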

To get started we find and print a map of The University of Manchester, my employer, by entering the country and postcode as the location parameter.

map <- get_map(location = 'UK,M139PL', zoom = 17)
ggmap(map)
[Figure: UoM map.png]

The zoom parameter is pretty self-explanatory, and as location you should enter what you would normally enter into a Google Maps search box. So here is how we get the map of Europe:

map <- get_map(location = 'Europe', zoom = 4)
ggmap(map)
[Figure: Europe map1.png]

While this was nice, what we really want is to plot maps which colour code certain areas. We need a slightly different technology for that.

Using area boundaries

If you want to colour regions, such as countries, according to some variable, R needs to know where the boundaries of your regions are. This information is available in so-called shape files, and these exist for most of the administrative areas you may be interested in.

For European data a good source is Eurostat, but if you need shape files for different regions you should find them by searching online for shape files for that specific region. Go to the Eurostat page and download the NUTS_2010_03M_SH.zip file, which you should then extract into your working folder.

We now need to make the relevant boundary information available to R. We do this with the readShapePoly function which is part of the maptools package we loaded earlier.

# Ensure the path is correct on your computer
eurMap <- readShapePoly(fn="NUTS_2010_03M_SH/data/NUTS_RG_03M_2010")

All the shape (polygon) information is now available in the eurMap object. While there is no need to understand exactly what is hidden behind it, it is good to have some basic understanding. This object is a special type of dataframe, specialised to handle geographic data. The object eurMap contains two slots (sections), the data section (eurMap@data) and the polygons section (eurMap@polygons). The former is just like a normal dataframe and we can, for instance, look at the first three entries:

head(eurMap@data,3)
##   STAT_LEVL_ NUTS_ID SHAPE_Leng SHAPE_Area
## 0          0      AT   24.85974   10.02825
## 1          0      BE   14.00564    3.89695
## 2          0      BG   21.37210   12.20904

We can see that we have four variables. The important information for our purpose is in the NUTS_ID column, in which the areas are specified (AT stands for Austria, BE for Belgium and BG for Bulgaria). It is the information in NUTS_ID that will allow us to link the minimum wage information, and it is to this dataframe that we will soon attach our information on the countries' minimum wages. One extra bit of important information is the STAT_LEVL_ column, as it indicates which level of unit we are looking at. An entry of 0 indicates that we are looking at countries.

You could try and extract all countries:

# Select countries only
eurMap@data[eurMap@data$STAT_LEVL_ == 0,]

Altogether there are 1920 (check nrow(eurMap)) administrative units represented in this object (most of which are smaller administrative regions in the European countries).
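To see how the units split across the different levels you can tabulate the STAT_LEVL_ column. The sketch below does this on a small stand-in dataframe called nuts_info, which is hypothetical and only mimics two columns of eurMap@data; the three country rows are taken from the output above and the two regional rows (AT1 and AT11) are added for illustration.

```r
# Hypothetical stand-in for two columns of eurMap@data
nuts_info <- data.frame(NUTS_ID = c("AT", "BE", "BG", "AT1", "AT11"),
                        STAT_LEVL_ = c(0, 0, 0, 1, 2),
                        stringsAsFactors = FALSE)

# How many units are there at each level?
table(nuts_info$STAT_LEVL_)

# Keep the country level only (STAT_LEVL_ equal to 0)
countries_only <- nuts_info[nuts_info$STAT_LEVL_ == 0, ]
nrow(countries_only)   # 3
```

With the real object you would replace nuts_info by eurMap@data.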

The second slot of the eurMap object (eurMap@polygons) is used to save the geographic information for each of the units. You can try and explore them, but there is no immediate need to do so. Suffice to say that all the ingredients for the graphical magic that is about to happen are hidden in here (if you are a geographer, then you should of course get very excited about these!). All I want to mention here is that the geographic information could be delivered in a number of different ways. Polygons, which are used here, are just one; others are points, lines, pixels, grids etc.

Linking the map and statistical data

We now merge the shape and administrative unit (i.e. countries) information and the data on the minimum wage. As we said above, eurMap contains way more information than countries only. For this example we are only interested in the country info, so let's select that information.

eurMap_countries  <- eurMap[eurMap@data$STAT_LEVL_ == 0,]

Now we merge the eurMap_countries and the mydata objects. This will create one big dataframe. When we merge we need to specify by what variables they should be merged. Both datasets contain the geocode (two-digit country codes, e.g. UK for the United Kingdom). This variable is called NUTS_ID in eurMap_countries (and hence by.x="NUTS_ID") and is called geo in mydata (and hence by.y="geo"). This implies that we will attach the minimum wage information to the geographical units in eurMap_countries.

# merge map and data
eurMap_merge <- merge(eurMap_countries, mydata, by.x="NUTS_ID", by.y="geo")

The warning tells you that two records in mydata did not find a match in eurMap_countries. Upon inspection you will find that these are Albania (AL) and Serbia (RS). I suspect that if we were to get a somewhat newer shape file from Eurostat (we got the 2010 version) these would be available as well.
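One way to do this inspection is to compare the two sets of country codes with setdiff. The sketch below uses two short made-up code vectors (data_codes and map_codes are hypothetical names, and the vectors are only a small subset of the real codes) so that it runs on its own.

```r
# A few of the codes in mydata$geo
data_codes <- c("BE", "BG", "AL", "RS", "TR")
# A hypothetical short list standing in for the codes in the shape file
map_codes <- c("AT", "BE", "BG", "TR")

# Codes that appear in the data but have no counterpart in the map
unmatched <- setdiff(data_codes, map_codes)
unmatched   # "AL" "RS"

# With the real objects this would be something like:
# setdiff(mydata$geo, as.character(eurMap_countries$NUTS_ID))
```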

To understand what happened here it may be useful to look at the first 8 entries in the @data slot:

head(eurMap_merge@data,8)
##   NUTS_ID STAT_LEVL_ SHAPE_Leng SHAPE_Area      countries minwage
## 1      AT          0  24.859744 10.0282545        Austria      NA
## 2      BE          0  14.005645  3.8969496        Belgium 1501.82
## 3      BG          0  21.372101 12.2090423       Bulgaria  184.07
## 4      CH          0  17.455046  4.8662851    Switzerland      NA
## 5      CY          0   6.326353  0.9188543         Cyprus      NA
## 6      CZ          0  20.821580  9.8424177 Czech Republic  331.71
## 7      DE          0  67.633696 45.9463067        Germany 1473.00
## 8      DK          0  61.585337  6.2008428        Denmark      NA

Here we can see that we attached the new variables countries (the country name) and minwage to the information that previously existed in eurMap_countries.
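If the by.x and by.y arguments are still a little mysterious, it may help to see the same merge on two ordinary dataframes. This is only a sketch: map_info and wage_info are hypothetical stand-ins for eurMap_countries@data and mydata, and I set all.x = TRUE explicitly so that every map unit is kept even if there is no wage data for it.

```r
# Hypothetical stand-in for the data slot of the map object
map_info <- data.frame(NUTS_ID = c("AT", "BE", "BG"),
                       STAT_LEVL_ = 0,
                       stringsAsFactors = FALSE)
# Hypothetical stand-in for mydata (AL has no counterpart in map_info)
wage_info <- data.frame(geo = c("BE", "BG", "AL"),
                        minwage = c(1501.82, 184.07, 156.99),
                        stringsAsFactors = FALSE)

# Match NUTS_ID in the first dataframe with geo in the second
merged <- merge(map_info, wage_info,
                by.x = "NUTS_ID", by.y = "geo", all.x = TRUE)
merged
```

The result has one row per map unit: AT gets an NA for minwage as there is no wage entry for it, while AL is dropped as it has no counterpart in map_info.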

Plotting the map

There is nothing wrong with having a quick fix, right? So here we use the easiest way to produce maps. In my opinion this is delivered by the tmap package, which we already loaded earlier. The core function that takes a shape file, like our eurMap_merge, and turns it into a map is the qtm function.

The first input shp = eurMap_merge specifies which shape file we are using, the second fill = "minwage" indicates which variable (contained in the data slot of the shape file) we use to determine the colours and lastly fill.palette = "-Blues" tells R what sort of colours to use.

qtm(shp = eurMap_merge, fill = "minwage", fill.palette = "-Blues")
## Warning in process_shapes(shps, x[shape.id], gmeta, data_by, dw, dh,
## masterID): Currect projection of shape eurMap_merge unknown. Long-lat
## (WGS84) is assumed.
[Figure: Europe map2.png]

Nice! There is the obvious issue that the French overseas territories determine the boundaries of the map, but hey, I promised a quick and dirty fix. As usual, check ?qtm to find more things to change in this map.
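One way to deal with the overseas territories is to restrict the map to a bounding box around mainland Europe. The following is only a sketch: the coordinates are my own rough guess, europe_bbox is a hypothetical name, and the commented-out plotting call assumes a tmap version before 4, in which the palette argument used in this tutorial is still available.

```r
# Rough longitude/latitude limits for mainland Europe (an assumption,
# adjust to taste). Rows are x and y, columns are min and max.
europe_bbox <- matrix(c(-25, 34, 45, 72), nrow = 2,
                      dimnames = list(c("x", "y"), c("min", "max")))
europe_bbox

# With tmap loaded and eurMap_merge available, something like the
# following should then draw the restricted map:
# tm_shape(eurMap_merge, bbox = europe_bbox) +
#   tm_polygons("minwage", palette = "-Blues")
```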

If you want more sophisticated map manipulations you may well want to use other packages (like the ggmap extension to ggplot2 or the Leaflet package), and a good place to start is the mapping in R tutorial linked in the introduction.

Shape File Resources

Good sources for shape files are:

  • GADM database of Global Administrative Areas, [1], although for the UK they do not have the smaller areas. You can specify that you want the data for R, and you then get the shape dataframe directly.