Maps in R
Here we will demonstrate how to combine maps with data. This is a very introductory task, but if you are interested you will find enough starting capital here to continue developing your knowledge and plotting skills.
I myself used the following introductory tutorial and found it very useful.
Let's start by setting a working directory of your choice
The task in this tutorial is to produce a map that is colour coded according to the level of monthly minimum wage for different countries in Europe.
I got the 2015 minimum wage data for different countries from Eurostat. You can use this link to download the latest data, but for the purpose of practicing you could also just create the following dataframe:
countries <- c("Belgium", "Bulgaria", "Czech Republic", "Denmark", "Germany",
               "Estonia", "Ireland", "Greece", "Spain", "France", "Croatia",
               "Italy", "Cyprus", "Latvia", "Lithuania", "Luxembourg", "Hungary",
               "Malta", "Netherlands", "Austria", "Poland", "Portugal", "Romania",
               "Slovenia", "Slovakia", "Finland", "Sweden", "United Kingdom",
               "Iceland", "Liechtenstein", "Norway", "Switzerland", "Montenegro",
               "Former Yugoslav Republic of Macedonia", "Albania", "Serbia", "Turkey")
geo <- c("BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR", "HR", "IT",
         "CY", "LV", "LT", "LU", "HU", "MT", "NL", "AT", "PL", "PT", "RO", "SI",
         "SK", "FI", "SE", "UK", "IS", "LI", "NO", "CH", "ME", "MK", "AL", "RS", "TR")
minwage <- c(1501.82, 184.07, 331.71, NA, 1473, 390, 1461.85, 683.76, 756.7,
             1457.52, 395.61, NA, NA, 360, 300, 1922.96, 332.76, 720.46, 1501.8,
             NA, 409.53, 589.17, 217.5, 790.73, 380, NA, NA, 1378.87, NA, NA, NA,
             NA, 288.05, 213.72, 156.99, 235.04, 424.26)
mydata <- data.frame(countries, geo, minwage, stringsAsFactors = FALSE)
Notice that we prevented R from translating the country names to factor variables. R likes factor variables, but for the purpose of plotting we want the country names as strings.
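To see what the stringsAsFactors argument changes, here is a minimal sketch on a toy vector (the three country names are only an illustration, not the Eurostat data). Since R 4.0.0 the default is already FALSE, so setting it explicitly mainly matters on older installations.

```r
# Toy data, not the Eurostat figures
countries <- c("Belgium", "Bulgaria", "Belgium")

as_strings <- data.frame(countries, stringsAsFactors = FALSE)
as_factors <- data.frame(countries, stringsAsFactors = TRUE)

class(as_strings$countries)       # "character"
class(as_factors$countries)       # "factor"
levels(as_factors$countries)      # distinct values stored once: "Belgium" "Bulgaria"
as.integer(as_factors$countries)  # factors are integer codes underneath: 1 2 1
```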
That tutorial recommends the use of a package called ggmap (although others are available), so we will follow this advice and, after installing them, load the following libraries:
x <- c("ggplot2", "ggmap", "rgdal", "rgeos", "maptools", "dplyr", "tidyr", "tmap")
# install.packages(x)  # warning: uncommenting this may take a number of minutes
lapply(x, library, character.only = TRUE)  # load the required packages
ggmap, rgdal, rgeos, maptools and tmap are packages for working with maps and spatial data, while dplyr and tidyr help with data manipulation. ggplot2 is an extremely powerful plotting tool.
To get started we find and print a map of The University of Manchester, my employer, by entering the country and postcode as the location parameter.
map <- get_map(location = 'UK,M139PL', zoom = 17)
The zoom parameter is pretty self-explanatory, and as location you should enter what you would normally enter into a Google Maps search box. So here is how we get the map of Europe:
map <- get_map(location = 'Europe', zoom = 4)
While this was nice, what we really want is to plot maps which colour code certain areas. We need a slightly different technology for that.
Using area boundaries
If you want to colour certain regions, such as countries, according to some variable, R will need to know where the boundaries of your regions are. This information is available in so-called shape files, and these shape files exist for most of the administrative areas you may be interested in.
For European data a good source is Eurostat, but if you need shape files for different regions you should find them if you search online for shape files for a specific region. Go to the Eurostat page and download the NUTS_2010_03M_SH.zip file, which you should then extract into your working folder.
We now need to make the relevant boundary information available to R. We do this with the readShapePoly function, which is part of the maptools package we loaded earlier.
# Ensure the path is correct on your computer
eurMap <- readShapePoly(fn = "NUTS_2010_03M_SH/data/NUTS_RG_03M_2010")
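If the import fails, it is worth checking the path first. A shape file is really a small set of files sharing one base name (at least .shp, .shx and .dbf). The helper below is a hypothetical convenience function, not part of maptools, that reports which of these parts exist for a given base path.

```r
# Hypothetical helper (not from any package): report which parts of a
# shape file are present. `base` is the path without extension, i.e. the
# same string that is passed as `fn` above.
shapefile_parts <- function(base, exts = c("shp", "shx", "dbf")) {
  paths <- paste0(base, ".", exts)
  found <- file.exists(paths)
  names(found) <- exts
  found
}

# Check the path used in this tutorial; every entry should be TRUE
shapefile_parts("NUTS_2010_03M_SH/data/NUTS_RG_03M_2010")
```

Any FALSE entry points to a wrong working directory or an incomplete extraction of the zip file.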
All the shape (polygon) information is now available in the eurMap object. While there is no need to understand exactly what is hidden behind it, it is good to have some basic understanding. This object is a special type of dataframe, specialised to handle geographic data. Two of the slots (sections) of the eurMap object matter here: the data section (eurMap@data) and the polygons section (eurMap@polygons). The former is just like a normal dataframe and we can, for instance, look at the first three entries:
##   STAT_LEVL_ NUTS_ID SHAPE_Leng SHAPE_Area
## 0          0      AT   24.85974   10.02825
## 1          0      BE   14.00564    3.89695
## 2          0      BG   21.37210   12.20904
We can see that we have four variables. The important information for our purpose is in the NUTS_ID column, in which the areas are specified (AT stands for Austria, BE for Belgium and BG for Bulgaria). It will be this dataframe to which we will soon attach our information on the countries' minimum wage. One extra bit of important information is the STAT_LEVL_ column, as it indicates which level of unit we are looking at (we will use it to pick out the country-level units before linking the minimum wage information). An entry of 0 indicates that we are looking at countries.
You could try and extract all countries
# Select countries only
eurMap@data[eurMap@data$STAT_LEVL_ == 0, ]
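The same row selection can be tried out without the shape file. The sketch below uses a small made-up dataframe as a stand-in for eurMap@data (the codes and levels are for illustration only).

```r
# Toy stand-in for eurMap@data (values are made up for illustration)
toy <- data.frame(
  STAT_LEVL_ = c(0, 1, 2, 0, 3),
  NUTS_ID    = c("AT", "AT1", "AT11", "BE", "BE100"),
  stringsAsFactors = FALSE
)

# Rows where the condition holds; nothing after the comma means "all columns"
toy_countries <- toy[toy$STAT_LEVL_ == 0, ]
toy_countries

# How many units are there at each level?
table(toy$STAT_LEVL_)
```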
Altogether there are 1920 (check nrow(eurMap)) administrative units represented in this object (most of which are smaller administrative regions in the European countries).
The second slot of the eurMap object (eurMap@polygons) is used to save the geographic information for each of the units. You can try and explore them, but there is no immediate need to do so. Suffice it to say that all the ingredients for the graphical magic that is about to happen are hidden in here (if you are a geographer, then you should of course get very excited about these!). All I want to mention here is that the geographic information could be delivered in a number of different ways. Polygons, which are used here, are just one; others are points, lines, pixels, grids etc.
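To get a feel for how slots and the @ operator work, here is a minimal sketch with a toy S4 class. The class name ToyMap and its contents are invented for illustration; the real eurMap object is of a class defined by the sp package and is considerably richer.

```r
# Toy S4 class with a data slot and a polygons slot.
# "ToyMap" is a made-up name; this is not the class used by maptools/sp.
setClass("ToyMap", representation(data = "data.frame", polygons = "list"))

toy_map <- new("ToyMap",
  data = data.frame(NUTS_ID = c("AT", "BE"), stringsAsFactors = FALSE),
  polygons = list(
    matrix(c(0, 0, 1, 0, 1, 1, 0, 0), ncol = 2, byrow = TRUE),  # outline for row 1
    matrix(c(2, 2, 3, 2, 3, 3, 2, 2), ncol = 2, byrow = TRUE)   # outline for row 2
  )
)

slotNames(toy_map)        # "data" "polygons"
toy_map@data$NUTS_ID      # "AT" "BE"
length(toy_map@polygons)  # 2, one outline per row of the data slot
```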
Linking the map and statistical data
We now merge the shape and administrative unit (i.e. countries) information with the data on the minimum wage. As we said above, eurMap contains way more information than countries only. For this example we are only interested in the country info, so let's select that information.
eurMap_countries <- eurMap[eurMap@data$STAT_LEVL_ == 0,]
Now we merge the eurMap_countries and the mydata objects. This will create one big dataframe. When we merge we need to specify by what variables they should be merged. Both datasets contain the geocode (two-letter country codes, e.g. UK for the United Kingdom). This variable is called NUTS_ID in eurMap_countries (and hence by.x="NUTS_ID") and is called geo in mydata (and hence by.y="geo"). This implies that we will attach the minimum wage information to the geographical units in eurMap_countries.
# merge map and data
eurMap_merge <- merge(eurMap_countries, mydata, by.x = "NUTS_ID", by.y = "geo")
The warning tells you that two records which were in mydata did not find a match in eurMap_countries. Upon inspection you will find that these are Albania (AL) and Serbia (RS). I suspect that if we were to get a somewhat newer shape file from Eurostat (we got the 2010 version) these would be available as well.
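One way to carry out such an inspection is to compare the two sets of codes with setdiff. The sketch below uses short made-up code vectors rather than the full lists.

```r
# Toy stand-ins for the two sets of country codes (not the full lists)
map_codes  <- c("AT", "BE", "BG", "FR")
data_codes <- c("BE", "BG", "AL", "RS")

setdiff(data_codes, map_codes)  # codes in the data without a shape: "AL" "RS"
setdiff(map_codes, data_codes)  # shapes without any data: "AT" "FR"
```

With the objects from this tutorial the corresponding call would be setdiff(mydata$geo, as.character(eurMap_countries@data$NUTS_ID)).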
To understand what happened here it may be useful to look at the first 8 entries in the data slot of the merged object:
##   NUTS_ID STAT_LEVL_ SHAPE_Leng SHAPE_Area      countries minwage
## 1      AT          0  24.859744 10.0282545        Austria      NA
## 2      BE          0  14.005645  3.8969496        Belgium 1501.82
## 3      BG          0  21.372101 12.2090423       Bulgaria  184.07
## 4      CH          0  17.455046  4.8662851    Switzerland      NA
## 5      CY          0   6.326353  0.9188543         Cyprus      NA
## 6      CZ          0  20.821580  9.8424177 Czech Republic  331.71
## 7      DE          0  67.633696 45.9463067        Germany 1473.00
## 8      DK          0  61.585337  6.2008428        Denmark      NA
Here we can see that we attached the new variables countries (the country name) and minwage to the information that previously existed in the data slot of eurMap_countries.
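The by.x and by.y mechanics can be tried out on two small, plain dataframes. This is only a sketch with invented stand-in tables, and it uses the ordinary dataframe version of merge; the version applied to spatial objects may treat unmatched rows differently (as the warning above suggests).

```r
# Toy stand-ins; the key column has a different name in each table
shapes <- data.frame(NUTS_ID = c("AT", "BE", "BG"),
                     SHAPE_Area = c(10.0, 3.9, 12.2),
                     stringsAsFactors = FALSE)
wages <- data.frame(geo = c("BE", "BG", "AL"),
                    minwage = c(1501.82, 184.07, 156.99),
                    stringsAsFactors = FALSE)

# Default for dataframes: keep only the keys found in both tables
inner <- merge(shapes, wages, by.x = "NUTS_ID", by.y = "geo")
inner

# all.x = TRUE keeps every row of the first table and fills gaps with NA
left <- merge(shapes, wages, by.x = "NUTS_ID", by.y = "geo", all.x = TRUE)
left
```

Note that the merged table keeps the key under the name used in the first table (NUTS_ID), which is also what we see in the output above.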
Plotting the map
There is nothing wrong with having a quick fix, right? So here we use the easiest way to produce maps. In my opinion this is delivered by the tmap package, which we already loaded earlier. The core function that takes a shape file, like our eurMap_merge, and turns it into a map is the qtm (quick thematic map) function. The first input, shp = eurMap_merge, specifies which shape file we are using; the second, fill = "minwage", indicates which variable (contained in the data slot of the shape file) we use to determine the colours; and lastly fill.palette = "-Blues" tells R what sort of colours to use.
qtm(shp = eurMap_merge, fill = "minwage", fill.palette = "-Blues")
## Warning in process_shapes(shps, x[shape.id], gmeta, data_by, dw, dh,
## masterID): Currect projection of shape eurMap_merge unknown. Long-lat
## (WGS84) is assumed.
Nice. There is the obvious issue that the French overseas territories determine the boundaries of the map, but hey, I promised a quick and dirty fix. As usual, check ?qtm to find more things to change in this map.
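One possible way to deal with the overseas territories is to restrict the plotting window with a bounding box. The following is only a sketch under several assumptions: the longitude/latitude limits are rough guesses for mainland Europe, and the tm_shape / tm_polygons calls follow the tmap version 3 interface (newer versions may prefer different argument names for the palette).

```r
# Rough longitude/latitude window around mainland Europe.
# The limits are illustrative guesses, so adjust them to taste.
europe_bbox <- matrix(c(-25, 34, 45, 72), nrow = 2,
                      dimnames = list(c("x", "y"), c("min", "max")))

# Only draw if tmap is installed and the merged object from above exists
if (requireNamespace("tmap", quietly = TRUE) && exists("eurMap_merge")) {
  library(tmap)
  print(
    tm_shape(eurMap_merge, bbox = europe_bbox) +
      tm_polygons("minwage", palette = "-Blues")
  )
}
```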
If you want more sophisticated map manipulations you may well want to use other packages (like the ggmap extension to ggplot2 or the Leaflet package), and a good place to start is the mapping in R tutorial linked in the introduction.
Shape File Resources
Good sources for shape files are:
- GADM database of Global Administrative Areas, although for the UK they do not have the smaller areas. You can, however, specify that you want the data for R, in which case you get the shape dataframe directly.