<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://eclr.humanities.manchester.ac.uk/index.php?action=history&amp;feed=atom&amp;title=Scraping_in_R</id>
		<title>Scraping in R - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://eclr.humanities.manchester.ac.uk/index.php?action=history&amp;feed=atom&amp;title=Scraping_in_R"/>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;action=history"/>
		<updated>2026-05-11T20:16:06Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.30.1</generator>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4108&amp;oldid=prev</id>
		<title>Rb: /* Get your reference spreadsheet */</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4108&amp;oldid=prev"/>
				<updated>2015-11-06T15:22:44Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Get your reference spreadsheet&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr style=&quot;vertical-align: top;&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 15:22, 6 November 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l39&quot; &gt;Line 39:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 39:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Get your reference spreadsheet ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Get your reference spreadsheet ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;We said we wanted this info for all Primary Schools in the country. So what we need is a list of these and, importantly, their &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. Fortunately this exists. Go to the homepage of the UK&amp;#039;s [https://www.gov.uk/government/organisations/department-for-education Department for Education] and then click on &amp;#039;&amp;#039;&amp;#039;Edubase&amp;#039;&amp;#039;&amp;#039;. In the data downloads part of the website you will find the following spreadsheet: &amp;amp;quot;All EduBase data.csv&amp;amp;quot;, which you should download into your working directory. Have a look at the spreadsheet. You should easily be able to identify the column named &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. This is the list we are after. Also note that there are more than 44K entries in that spreadsheet. A lot of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;websirtes &lt;/del&gt;to look at!&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;We said we wanted this info for all Primary Schools in the country. So what we need is a list of these and, importantly, their &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. Fortunately this exists. Go to the homepage of the UK&amp;#039;s [https://www.gov.uk/government/organisations/department-for-education Department for Education] and then click on &amp;#039;&amp;#039;&amp;#039;Edubase&amp;#039;&amp;#039;&amp;#039;. In the data downloads part of the website you will find the following spreadsheet: &amp;amp;quot;All EduBase data.csv&amp;amp;quot;, which you should download into your working directory. Have a look at the spreadsheet. You should easily be able to identify the column named &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. This is the list we are after. Also note that there are more than 44K entries in that spreadsheet. A lot of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;websites &lt;/ins&gt;to look at!&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# Read in school admin data&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# Read in school admin data&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;schooldata &amp;amp;lt;- read.csv(&amp;amp;quot;edubasealldata20151029.csv&amp;amp;quot;)&amp;#160;  # make sure the name matches that of your csv file&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;schooldata &amp;amp;lt;- read.csv(&amp;amp;quot;edubasealldata20151029.csv&amp;amp;quot;)&amp;#160;  # make sure the name matches that of your csv file&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;; the date in the file name changes frequently&lt;/ins&gt;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The spreadsheet is now available as the dataframe &amp;lt;code&amp;gt;schooldata&amp;lt;/code&amp;gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The spreadsheet is now available as the dataframe &amp;lt;code&amp;gt;schooldata&amp;lt;/code&amp;gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4107&amp;oldid=prev</id>
		<title>Rb at 12:06, 6 November 2015</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4107&amp;oldid=prev"/>
				<updated>2015-11-06T12:06:56Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr style=&quot;vertical-align: top;&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 12:06, 6 November 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot; &gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= ScrapingIntroduction =&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Ralf Becker&amp;lt;br /&amp;gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Monday, November 02, 2015&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= The challenge =&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= The challenge =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4106&amp;oldid=prev</id>
		<title>Rb: /* Merging data */</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4106&amp;oldid=prev"/>
				<updated>2015-11-05T14:57:56Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Merging data&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr style=&quot;vertical-align: top;&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 14:57, 5 November 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l276&quot; &gt;Line 276:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 276:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# primary_data_merged_2014 &amp;amp;lt;- merge(primary_data,save_data,by=&amp;amp;quot;URN&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# primary_data_merged_2014 &amp;amp;lt;- merge(primary_data,save_data,by=&amp;amp;quot;URN&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &amp;lt;code&amp;gt;merge&amp;lt;/code&amp;gt; command does what it says: it combines the information from both dataframes into one. The variable by which it merges is &amp;lt;code&amp;gt;URN&amp;lt;/code&amp;gt;, which exists in both &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. Check out &amp;lt;code&amp;gt;?merge&amp;lt;/code&amp;gt; to find &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;moe &lt;/del&gt;details on merging two dataframes.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &amp;lt;code&amp;gt;merge&amp;lt;/code&amp;gt; command does what it says: it combines the information from both dataframes into one. The variable by which it merges is &amp;lt;code&amp;gt;URN&amp;lt;/code&amp;gt;, which exists in both &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. Check out &amp;lt;code&amp;gt;?merge&amp;lt;/code&amp;gt; to find &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;more &lt;/ins&gt;details on merging two dataframes.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Saving data ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Saving data ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4105&amp;oldid=prev</id>
		<title>Rb at 14:54, 5 November 2015</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4105&amp;oldid=prev"/>
				<updated>2015-11-05T14:54:23Z</updated>
		
		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr style=&quot;vertical-align: top;&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 14:54, 5 November 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot; &gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= ScrapingIntroduction =&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Ralf Becker&amp;lt;br /&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Monday, November 02, 2015&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= The challenge =&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= The challenge =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot; &gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 16:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The last six digits in that URL (132089) are the &amp;#039;&amp;#039;Unique Reference Number (URN)&amp;#039;&amp;#039;. This will be important later.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The last six digits in that URL (132089) are the &amp;#039;&amp;#039;Unique Reference Number (URN)&amp;#039;&amp;#039;. This will be important later.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;But for starters, have a look at that website and you should be able to find some information on the percentage of pupils eligible for Free School Meals, here 25.2% (as of November 2015). Also browse this website and check what other info may be of interest to you. Clearly we will want to look at the &amp;#039;&amp;#039;Total Income (&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/del&gt;per pupil)&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure (&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/del&gt;per pupil)&amp;#039;&amp;#039; information. We may also be interested in the actual number of pupils (&amp;#039;&amp;#039;Total number of pupils on roll (all ages)&amp;#039;&amp;#039;).&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;But for starters, have a look at that website and you should be able to find some information on the percentage of pupils eligible for Free School Meals, here 25.2% (as of November 2015). Also browse this website and check what other info may be of interest to you. Clearly we will want to look at the &amp;#039;&amp;#039;Total Income (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/ins&gt;per pupil)&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/ins&gt;per pupil)&amp;#039;&amp;#039; information. We may also be interested in the actual number of pupils (&amp;#039;&amp;#039;Total number of pupils on roll (all ages)&amp;#039;&amp;#039;).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Our end result should be a spreadsheet with these bits of information for all Primary schools (and perhaps their location, Local Authority, etc.). Be warned: you will need some patience, and you will have to learn a little programming and HTML to achieve this.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Our end result should be a spreadsheet with these bits of information for all Primary schools (and perhaps their location, Local Authority, etc.). Be warned: you will need some patience, and you will have to learn a little programming and HTML to achieve this.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l79&quot; &gt;Line 79:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 84:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;save_data &amp;amp;lt;- data.frame(&amp;amp;quot;URN&amp;amp;quot;=-9999, &amp;amp;quot;eFSM&amp;amp;quot;=NA, &amp;amp;quot;TotalIncome&amp;amp;quot;=NA, &amp;amp;quot;TotalExp&amp;amp;quot;=NA)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;save_data &amp;amp;lt;- data.frame(&amp;amp;quot;URN&amp;amp;quot;=-9999, &amp;amp;quot;eFSM&amp;amp;quot;=NA, &amp;amp;quot;TotalIncome&amp;amp;quot;=NA, &amp;amp;quot;TotalExp&amp;amp;quot;=NA)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;= Getting the data from the website &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;=&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Getting the data from the website =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Eventually we will write a loop that repeats the information extraction for all the schools in &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;. First, though, we investigate how this works for one particular school: the first school in the &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; dataframe, Sir John Cass&amp;#039;s Foundation Primary School (URN = 100000).&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Eventually we will write a loop that repeats the information extraction for all the schools in &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;. First, though, we investigate how this works for one particular school: the first school in the &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; dataframe, Sir John Cass&amp;#039;s Foundation Primary School (URN = 100000).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l97&quot; &gt;Line 97:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 102:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;newschool &amp;amp;lt;- NULL&amp;#160;  # create object that will be attached to dataframe&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;newschool &amp;amp;lt;- NULL&amp;#160;  # create object that will be attached to dataframe&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;newschool[&amp;amp;quot;URN&amp;amp;quot;] &amp;amp;lt;- id&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;newschool[&amp;amp;quot;URN&amp;amp;quot;] &amp;amp;lt;- id&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;= Finding the relevant information &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;=&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Finding the relevant information =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt; is more structured than &amp;lt;code&amp;gt;SOURCE&amp;lt;/code&amp;gt;, it is still massive. Fortunately, the very convenient function &amp;lt;code&amp;gt;xpathSApply&amp;lt;/code&amp;gt; lets you quickly pick out the relevant parts of the page.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt; is more structured than &amp;lt;code&amp;gt;SOURCE&amp;lt;/code&amp;gt;, it is still massive. Fortunately, the very convenient function &amp;lt;code&amp;gt;xpathSApply&amp;lt;/code&amp;gt; lets you quickly pick out the relevant parts of the page.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
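&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Here is a minimal sketch of how &amp;lt;code&amp;gt;xpathSApply&amp;lt;/code&amp;gt; works, assuming the &amp;lt;code&amp;gt;XML&amp;lt;/code&amp;gt; package is installed. The tiny HTML snippet and its &amp;lt;code&amp;gt;fsm&amp;lt;/code&amp;gt; class are made up for illustration and do not reflect the real structure of the performance tables pages. The XPath expression &amp;lt;code&amp;gt;//td[@class=&amp;#039;fsm&amp;#039;]&amp;lt;/code&amp;gt; selects every table cell with that class, and &amp;lt;code&amp;gt;xmlValue&amp;lt;/code&amp;gt; is applied to each match to return its text. On the real page you would pass &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt; instead of &amp;lt;code&amp;gt;doc&amp;lt;/code&amp;gt; and adapt the XPath expression to what &amp;amp;quot;Inspect Element&amp;amp;quot; shows.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;library(XML)&amp;#160; # provides htmlParse, xpathSApply and xmlValue&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;toy &amp;amp;lt;- &amp;amp;quot;&amp;amp;lt;table&amp;amp;gt;&amp;amp;lt;tr&amp;amp;gt;&amp;amp;lt;td class=&amp;#039;fsm&amp;#039;&amp;amp;gt; 33.2&amp;amp;lt;/td&amp;amp;gt;&amp;amp;lt;/tr&amp;amp;gt;&amp;amp;lt;/table&amp;amp;gt;&amp;amp;quot;&amp;#160; # hypothetical page&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;doc &amp;amp;lt;- htmlParse(toy, asText = TRUE)&amp;#160; # parse a string instead of a URL&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;res &amp;amp;lt;- xpathSApply(doc, &amp;amp;quot;//td[@class=&amp;#039;fsm&amp;#039;]&amp;amp;quot;, xmlValue)&amp;#160; # text of every matching cell&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;print(res)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;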
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;== Finding the FSM data &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;==&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Finding the FSM data ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Let&amp;#039;s quote the line first and explain after:&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Let&amp;#039;s quote the line first and explain after:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l168&quot; &gt;Line 168:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 173:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In the previous block of code I used two functions, &amp;lt;code&amp;gt;xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;gsub&amp;lt;/code&amp;gt;. I added comments to explain what these functions do, but of course you can use &amp;lt;code&amp;gt;?xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;?gsub&amp;lt;/code&amp;gt; to find out more. How do you find such functions in the first place? Dr Google is your friend. For instance, search for &amp;amp;quot;How can I extract values from XML elements in R?&amp;amp;quot; and after some searching you will usually find someone who had the same problem, and often someone who has posted a solution.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In the previous block of code I used two functions, &amp;lt;code&amp;gt;xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;gsub&amp;lt;/code&amp;gt;. I added comments to explain what these functions do, but of course you can use &amp;lt;code&amp;gt;?xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;?gsub&amp;lt;/code&amp;gt; to find out more. How do you find such functions in the first place? Dr Google is your friend. For instance, search for &amp;amp;quot;How can I extract values from XML elements in R?&amp;amp;quot; and after some searching you will usually find someone who had the same problem, and often someone who has posted a solution.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
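&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;As a minimal illustration of the kind of cleaning &amp;lt;code&amp;gt;gsub&amp;lt;/code&amp;gt; does, the sketch below starts from two made-up strings of the sort a scraper typically returns (a leading space, a percent sign, a thousands separator). It strips everything except digits and the decimal point so that &amp;lt;code&amp;gt;as.numeric&amp;lt;/code&amp;gt; can convert them. The pattern is an assumption for this example. It would also strip a minus sign, so it only suits non-negative values.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;scraped &amp;amp;lt;- c(&amp;amp;quot; 33.2%&amp;amp;quot;, &amp;amp;quot;9,578&amp;amp;quot;)&amp;#160; # hypothetical raw strings&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;clean &amp;amp;lt;- gsub(&amp;amp;quot;[^0-9.]&amp;amp;quot;, &amp;amp;quot;&amp;amp;quot;, scraped)&amp;#160; # keep only digits and the decimal point&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;values &amp;amp;lt;- as.numeric(clean)&amp;#160; # numeric vector holding 33.2 and 9578&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;print(values)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;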
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;== Finding the financial data &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;=&lt;/del&gt;==&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== Finding the financial data ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The financial data we are after, &amp;#039;&amp;#039;Total income&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total Expenditure&amp;#039;&amp;#039;, are in tables with a slightly different structure. Open a school&amp;#039;s page on the performance tables website and go to its financial table (note that some schools publish little such information). Then right-click on the page, choose &amp;amp;quot;Inspect Element&amp;amp;quot;, and you should see something like this:&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The financial data we are after, &amp;#039;&amp;#039;Total income&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total Expenditure&amp;#039;&amp;#039;, are in tables with a slightly different structure. Open a school&amp;#039;s page on the performance tables website and go to its financial table (note that some schools publish little such information). Then right-click on the page, choose &amp;amp;quot;Inspect Element&amp;amp;quot;, and you should see something like this:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l206&quot; &gt;Line 206:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 211:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;##&amp;#160; &amp;#160; &amp;#160; &amp;#160;  URN&amp;#160; &amp;#160; &amp;#160; &amp;#160; eFSM TotalIncome&amp;#160; &amp;#160; TotalExp &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre&amp;gt;##&amp;#160; &amp;#160; &amp;#160; &amp;#160;  URN&amp;#160; &amp;#160; &amp;#160; &amp;#160; eFSM TotalIncome&amp;#160; &amp;#160; TotalExp &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;##&amp;#160; &amp;#160; &amp;amp;quot;100000&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 33.2&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 9578&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 9628&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;##&amp;#160; &amp;#160; &amp;amp;quot;100000&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 33.2&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 9578&amp;amp;quot;&amp;#160; &amp;#160;  &amp;amp;quot; 9628&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;You can see that we have now also added the financial data to our &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; entry. We have collected everything we want. Let&amp;#039;s add that new school to our dataframe with new data, &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. For this we use the &amp;lt;code&amp;gt;rbind&amp;lt;/code&amp;gt; command which attaches &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;two matrices (or rows in our case) &lt;/del&gt;to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;each other&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;You can see that we have now also added the financial data to our &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; entry. We have collected everything we want. Let&amp;#039;s add that new school to our dataframe with new data, &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. For this we use the &amp;lt;code&amp;gt;rbind&amp;lt;/code&amp;gt; command &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;(row binding) &lt;/ins&gt;which attaches &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; entry &lt;/ins&gt;to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the already existing &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; dataframe&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;&amp;#160; # add the newschool entry to the dataframe&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;&amp;#160; # add the newschool entry to the dataframe&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&amp;#160; save_data &amp;amp;lt;- rbind(save_data,newschool)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&amp;#160; save_data &amp;amp;lt;- rbind(save_data,newschool)&amp;lt;/pre&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Note that we used identical variable names in both. You could print &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; at this stage to convince yourself that we have achieved what we wanted.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;print(save_data)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre&amp;gt;##&amp;#160; &amp;#160; &amp;#160; URN&amp;#160; eFSM TotalIncome TotalExp&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;## 1&amp;#160; -9999&amp;#160; &amp;amp;lt;NA&amp;amp;gt;&amp;#160; &amp;#160; &amp;#160; &amp;#160; &amp;amp;lt;NA&amp;amp;gt;&amp;#160; &amp;#160;  &amp;amp;lt;NA&amp;amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;## 2 100000&amp;#160; 33.2&amp;#160; &amp;#160; &amp;#160; &amp;#160; 9578&amp;#160; &amp;#160;  9628&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;As you can see, the data for the school with URN = 100000 have now been added to our dataframe.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
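&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The same row-binding step can be reproduced offline. The sketch below is a variant, not the code used above. The values are typed in by hand from the printed output, and &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; is built as a one-row dataframe rather than a named vector, so that &amp;lt;code&amp;gt;rbind&amp;lt;/code&amp;gt; matches the columns by name and keeps them numeric. The last step removes the placeholder row with URN = -9999, which only served to give &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; its column names.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;save_data &amp;amp;lt;- data.frame(URN = -9999, eFSM = NA, TotalIncome = NA, TotalExp = NA)&amp;#160; # placeholder row&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;newschool &amp;amp;lt;- data.frame(URN = 100000, eFSM = 33.2, TotalIncome = 9578, TotalExp = 9628)&amp;#160; # values typed in by hand&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;save_data &amp;amp;lt;- rbind(save_data, newschool)&amp;#160; # columns of two dataframes are matched by name&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;save_data &amp;amp;lt;- save_data[save_data$URN != -9999, ]&amp;#160; # drop the placeholder row&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;print(save_data)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;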
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= Putting it all together =&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;So far we have learned how to get the required information for one particular school. Now we want to do the same for all the schools in our database, i.e. all the URNs in &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;. Recall that we previously defined the &amp;lt;code&amp;gt;id&amp;lt;/code&amp;gt; variable as follows:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;id &amp;amp;lt;- primary_data$URN[1]&amp;#160; # get first URN from primary_data&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Once &amp;lt;code&amp;gt;id&amp;lt;/code&amp;gt; was defined, it determined which web page we called up, and everything else followed automatically. We now need to write a loop that repeats everything we have done so far for all schools in our database &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;for (i in 1:nrow(primary_data)){&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; id &amp;amp;lt;- primary_data$URN[i]&amp;#160; # get URN of the i-th school from primary_data&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; url &amp;amp;lt;- paste0(url_base_2014,id) # assemble the complete url&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; # ADD all the parsing and information extraction code here&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; &lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; &amp;#160; # save_data &amp;amp;lt;- rbind(save_data,newschool)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;#160; }&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;nrow(primary_data)&amp;lt;/code&amp;gt; returns the number of rows in &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;, so the loop runs through all rows and picks out the URN of every school. After the line that defines &amp;lt;code&amp;gt;url&amp;lt;/code&amp;gt;, paste all the code we wrote above that gets the data off the web page, extracts the required information and saves it into &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt;. The last line in the loop should then be &amp;lt;code&amp;gt;save_data &amp;amp;lt;- rbind(save_data,newschool)&amp;lt;/code&amp;gt;, i.e. the commented-out line in the skeleton above.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Once the loop has completed, &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; will contain all the extracted information.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= Tidying up and merging =&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Changing variable types ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Let&amp;#039;s look at the structure of &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; (after only scraping the info from one school, i.e. before implementing the previous loop).&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;str(save_data)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre&amp;gt;## &amp;#039;data.frame&amp;#039;:&amp;#160; &amp;#160; 2 obs. of&amp;#160; 4 variables:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ URN&amp;#160; &amp;#160; &amp;#160; &amp;#160; : chr&amp;#160; &amp;amp;quot;-9999&amp;amp;quot; &amp;amp;quot;100000&amp;amp;quot;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ eFSM&amp;#160; &amp;#160; &amp;#160;  : chr&amp;#160; NA &amp;amp;quot; 33.2&amp;amp;quot;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ TotalIncome: chr&amp;#160; NA &amp;amp;quot; 9578&amp;amp;quot;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ TotalExp&amp;#160;  : chr&amp;#160; NA &amp;amp;quot; 9628&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;You can see that all four variables in the data frame are stored as &amp;lt;code&amp;gt;chr&amp;lt;/code&amp;gt; variables, i.e. text (string) variables. That means you cannot perform calculations with them. We therefore convert them to numeric variables where that makes sense:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;save_data[&amp;amp;quot;URN&amp;amp;quot;]&amp;amp;lt;-lapply(save_data[&amp;amp;quot;URN&amp;amp;quot;], as.numeric)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;save_data[&amp;amp;quot;eFSM&amp;amp;quot;]&amp;amp;lt;-lapply(save_data[&amp;amp;quot;eFSM&amp;amp;quot;], as.numeric)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;save_data[&amp;amp;quot;TotalIncome&amp;amp;quot;]&amp;amp;lt;-lapply(save_data[&amp;amp;quot;TotalIncome&amp;amp;quot;], as.numeric)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;save_data[&amp;amp;quot;TotalExp&amp;amp;quot;]&amp;amp;lt;-lapply(save_data[&amp;amp;quot;TotalExp&amp;amp;quot;], as.numeric)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The &amp;lt;code&amp;gt;lapply(x,fctname)&amp;lt;/code&amp;gt; (list apply) function applies a particular function (&amp;lt;code&amp;gt;fctname&amp;lt;/code&amp;gt;) to every element of a list &amp;lt;code&amp;gt;x&amp;lt;/code&amp;gt;. A data frame is a list of columns, so &amp;lt;code&amp;gt;save_data[&amp;amp;quot;URN&amp;amp;quot;]&amp;lt;/code&amp;gt; is a list with one element, the URN column, and &amp;lt;code&amp;gt;as.numeric&amp;lt;/code&amp;gt; converts that whole column to numeric values. If we now look again&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;str(save_data)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre&amp;gt;## &amp;#039;data.frame&amp;#039;:&amp;#160; &amp;#160; 2 obs. of&amp;#160; 4 variables:&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ URN&amp;#160; &amp;#160; &amp;#160; &amp;#160; : num&amp;#160; -9999 100000&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ eFSM&amp;#160; &amp;#160; &amp;#160;  : num&amp;#160; NA 33.2&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ TotalIncome: num&amp;#160; NA 9578&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;##&amp;#160; $ TotalExp&amp;#160;  : num&amp;#160; NA 9628&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;we see that all four variables are now numeric.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Merging data ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;We want to add the scraped info to the existing database of primary schools.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# primary_data_merged_2014 &amp;amp;lt;- merge(primary_data,save_data,by=&amp;amp;quot;URN&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The &amp;lt;code&amp;gt;merge&amp;lt;/code&amp;gt; command combines the two data frames, matching rows by the variable &amp;lt;code&amp;gt;URN&amp;lt;/code&amp;gt;, which exists in both &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. Check out &amp;lt;code&amp;gt;?merge&amp;lt;/code&amp;gt; for more details on merging two data frames.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;== Saving data ==&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# save(primary_data_merged_2014,file = &amp;amp;quot;primary_data_merged_2014.Rda&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= Things to look out for =&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;When you access data from the internet you should make sure that you are not acting against any terms and conditions you may have inadvertently agreed to. In our example we are covered by the Open Government Licence 3.0 (see the link at the bottom of the Department for Education&amp;#039;s [https://www.gov.uk/government/organisations/department-for-education website]). Scraping has indeed exercised the courts: here is a [http://www.out-law.com/en/articles/2015/january/website-operators-can-prohibit-screen-scraping-of-unprotected-data-via-terms-and-conditions-says-eu-court-in-ryanair-case/ link] to an article that describes how Ryanair has been allowed to restrict scraping activities.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;eBay is another website that does not allow scraping of its data, but it does offer access to the data via an Application Programming Interface (API). As it turns out, many websites have APIs and allow free use of them. Examples are [https://go.developer.ebay.com/what-ebay-api eBay], [https://dev.twitter.com/rest/public Twitter] and [https://developers.google.com/apis-explorer/#p/ Google]. Often, though, you will need techniques very similar to the ones discussed here to access the data and bring them into a usable format.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4101&amp;oldid=prev</id>
		<title>Rb: /* The challenge */</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4101&amp;oldid=prev"/>
				<updated>2015-11-04T10:58:08Z</updated>
		
		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;The challenge&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr style=&quot;vertical-align: top;&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: white; color:black; text-align: center;&quot;&gt;Revision as of 10:58, 4 November 2015&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot; &gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The last six digits in that url (132089) are the &amp;#039;&amp;#039;Unique reference Number (urn)&amp;#039;&amp;#039;. This will be important later.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The last six digits in that url (132089) are the &amp;#039;&amp;#039;Unique reference Number (urn)&amp;#039;&amp;#039;. This will be important later.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;But for starters, have a look at that website and you should be able to find some information on the percentage of pupils eligible for Free School meals, here 25.2% (as per November 2015). Also browse this website and check what other info may be of interest to you. Clearly we will want to look at the &amp;#039;&amp;#039;Total Income (&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Â£ &lt;/del&gt;per pupil)&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure (&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Â£ &lt;/del&gt;per pupil)&amp;#039;&amp;#039; information. And we may also be interested in the actual number of pupils (&amp;#039;&amp;#039;Total number of pupils on roll (all ages)&amp;#039;&amp;#039;).&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color:black; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;But for starters, have a look at that website and you should be able to find some information on the percentage of pupils eligible for Free School meals, here 25.2% (as per November 2015). Also browse this website and check what other info may be of interest to you. Clearly we will want to look at the &amp;#039;&amp;#039;Total Income (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/ins&gt;per pupil)&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure (&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;£ &lt;/ins&gt;per pupil)&amp;#039;&amp;#039; information. 
And we may also be interested in the actual number of pupils (&amp;#039;&amp;#039;Total number of pupils on roll (all ages)&amp;#039;&amp;#039;).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;What we want our end result to be is a spreadsheet with these bits of information for all Primary schools (and perhaps their location, Local Authority etc). I should warn you, you will need some patience and you will have to learn a little programming and html to achieve this.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f9f9f9; color: #333333; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #e6e6e6; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;What we want our end result to be is a spreadsheet with these bits of information for all Primary schools (and perhaps their location, Local Authority etc). I should warn you, you will need some patience and you will have to learn a little programming and html to achieve this.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	<entry>
		<id>http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4100&amp;oldid=prev</id>
		<title>Rb: Created page with &quot;= The challenge =  The internet is a very rich source of data and here we demonstrate how to tap it. well, at least we provide an introduction into how this can be done if the...&quot;</title>
		<link rel="alternate" type="text/html" href="http://eclr.humanities.manchester.ac.uk/index.php?title=Scraping_in_R&amp;diff=4100&amp;oldid=prev"/>
				<updated>2015-11-04T10:57:47Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with &amp;quot;= The challenge =  The internet is a very rich source of data and here we demonstrate how to tap it. well, at least we provide an introduction into how this can be done if the...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= The challenge =&lt;br /&gt;
&lt;br /&gt;
The internet is a very rich source of data, and here we demonstrate how to tap it. Well, at least we provide an introduction to how this can be done when the data are nicely structured.&lt;br /&gt;
&lt;br /&gt;
The example we shall use here is the following. Schools that educate children from low-income families receive extra money for educating these children. In educational jargon these pupils are called &amp;#039;&amp;#039;Free School Meal (FSM)&amp;#039;&amp;#039; pupils. We shall investigate how the proportion of &amp;#039;&amp;#039;FSM&amp;#039;&amp;#039; children correlates with the actual expenditure per child.&lt;br /&gt;
&lt;br /&gt;
The source of this information is going to be the Department for Education&amp;#039;s [http://www.education.gov.uk/schools/performance/ School Performance Database]. Say we are interested in Claycots School in Slough, which is close to being the largest primary school in the country. Once you have searched for that school, you will find a large amount of information. At this stage, have a good look at the URL of that page:&lt;br /&gt;
&lt;br /&gt;
http://www.education.gov.uk/cgi-bin/schools/performance/school.pl?urn=132089&lt;br /&gt;
&lt;br /&gt;
The last six digits of that URL (132089) are the school&amp;#039;s &amp;#039;&amp;#039;Unique Reference Number (URN)&amp;#039;&amp;#039;. This will be important later.&lt;br /&gt;
&lt;br /&gt;
But for starters, have a look at that page. You should be able to find some information on the percentage of pupils eligible for Free School Meals, here 25.2% (as of November 2015). Also browse the page and check what other information may be of interest to you. Clearly we will want to look at the &amp;#039;&amp;#039;Total Income (£ per pupil)&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure (£ per pupil)&amp;#039;&amp;#039; information. We may also be interested in the actual number of pupils (&amp;#039;&amp;#039;Total number of pupils on roll (all ages)&amp;#039;&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
The end result we want is a spreadsheet with these bits of information for all primary schools (and perhaps their location, Local Authority, etc.). I should warn you: you will need some patience, and you will have to learn a little programming and HTML to achieve this.&lt;br /&gt;
&lt;br /&gt;
= The website structure =&lt;br /&gt;
&lt;br /&gt;
Before you can think about extracting data from a website, you have to understand its rough structure. First, since we want to extract information for several thousand schools, you need to understand what changes on this website from school to school. Go to the URL in your browser and change the URN from 132089 to 101193 (Manor Infants School in Barking, another huge primary school). When you change the URN in the URL and press Enter, you should see that nothing changes in the structure of the page except the entries in the table. This is an important feature, because it allows us to automate the scraping (collection) of data.&lt;br /&gt;
&lt;br /&gt;
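Because only the URN changes from school to school, the URL of any school page can be assembled from a fixed base plus its URN. Below is a minimal sketch in R, assuming the URL pattern shown above. The variable names are illustrative, and the code only builds strings; it does not download anything. These sketches use = for assignment, which R accepts just like the arrow operator used elsewhere on this page.&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch: the pages differ only in the urn= parameter, so one URL
# per school is obtained by pasting each URN onto a common base.
url_base = "http://www.education.gov.uk/cgi-bin/schools/performance/school.pl?urn="

# The two example URNs from the text (Claycots School and Manor Infants School)
urns = c(132089L, 101193L)

# paste0() is vectorised: it returns one complete URL per URN
urls = paste0(url_base, urns)
print(urls)
```
&lt;br /&gt;
Applied to the URN column of a list of schools, the same idea is what later lets a loop visit thousands of pages.&lt;br /&gt;
&lt;br /&gt;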
Next, you need to understand a little of the HTML code used to produce the page. Here is how to see it (on Windows, at least). Put your cursor on the part of the page you are interested in, say the table row with the FSM pupil percentage. Then right-click, choose &amp;amp;quot;Inspect Element&amp;amp;quot; and ... refrain from panicking! You will see something like this:&lt;br /&gt;
&lt;br /&gt;
[[File:InspectElement.JPG|frame|none]]&lt;br /&gt;
&lt;br /&gt;
This is how tables look in HTML. &amp;lt;code&amp;gt;&amp;amp;lt;table&amp;amp;gt;&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;&amp;amp;lt;/table&amp;amp;gt;&amp;lt;/code&amp;gt; mark the beginning and end of each table. Each row begins with &amp;lt;code&amp;gt;&amp;amp;lt;tr&amp;amp;gt;&amp;lt;/code&amp;gt; and ends with &amp;lt;code&amp;gt;&amp;amp;lt;/tr&amp;amp;gt;&amp;lt;/code&amp;gt;. These markers are called tags, and we will soon use them to our advantage. Tags are your friends! In essence, we will later ask R to find the row in the table that has the entry &amp;amp;quot;Percentage of pupils eligible for FSM at any time during the past 6 years&amp;amp;quot;. We will then simply take the entry in the second column of that same row as the number we want. Sounds easy, right? Conceptually, yes; of course, making it happen is another story.&lt;br /&gt;
&lt;br /&gt;
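The look-up strategy above can be tried out without any web page at all. In this sketch, a small made-up character matrix stands in for the table rows of a page, with the header cell in column 1 and the value in column 2. The real page is of course parsed HTML, not a matrix.&lt;br /&gt;
&lt;br /&gt;
```r
# Illustrative sketch of the look-up strategy on a small made-up table
# (column 1 = header cell, column 2 = value cell of each table row).
rows = rbind(
  c("Street", "St James's Passage"),
  c("Town", "London"),
  c("Percentage of pupils eligible for FSM at any time during the past 6 years", "33.2%")
)

# Find the row whose first (header) cell matches the label we want ...
label = "Percentage of pupils eligible for FSM at any time during the past 6 years"
hit = which(rows[, 1] == label)

# ... and take the entry in the second column of that same row
value = rows[hit, 2]
print(value)
```
&lt;br /&gt;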
= Prep Work =&lt;br /&gt;
&lt;br /&gt;
Start with setting your working directory&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# make sure this points to your working directory&lt;br /&gt;
setwd(&amp;amp;quot;C:/Users/Ralf/Dropbox/R/scraping/DeptEduc&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;br /&gt;
We will require the &amp;lt;code&amp;gt;RCurl&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;XML&amp;lt;/code&amp;gt; packages, so let&amp;#039;s install them (if you haven&amp;#039;t already done so) and load them with the &amp;lt;code&amp;gt;library&amp;lt;/code&amp;gt; command.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;x &amp;amp;lt;- c(&amp;amp;quot;RCurl&amp;amp;quot;,&amp;amp;quot;XML&amp;amp;quot;)&lt;br /&gt;
# install.packages(x) # warning: uncommenting this may take a number of minutes&lt;br /&gt;
lapply(x, library, character.only = TRUE) # load the required packages&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;## Loading required package: bitops&amp;lt;/pre&amp;gt;&lt;br /&gt;
== Get your reference spreadsheet ==&lt;br /&gt;
&lt;br /&gt;
We said we wanted this info for all primary schools in the country. So what we need is a list of these schools and, importantly, their &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. Fortunately, such a list exists. Go to the homepage of the UK&amp;#039;s [https://www.gov.uk/government/organisations/department-for-education Department for Education] and then click on &amp;#039;&amp;#039;&amp;#039;Edubase&amp;#039;&amp;#039;&amp;#039;. In the data downloads part of the website you will find the spreadsheet &amp;amp;quot;All EduBase data.csv&amp;amp;quot;, which you should download into your working directory. Have a look at the spreadsheet. You should easily be able to identify the column named &amp;#039;&amp;#039;URN&amp;#039;&amp;#039;. This is the list we are after. Also note that there are more than 44K entries in that spreadsheet. A lot of websites to look at!&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# Read in school admin data&lt;br /&gt;
schooldata &amp;amp;lt;- read.csv(&amp;amp;quot;edubasealldata20151029.csv&amp;amp;quot;)   # make sure the name matches that of your csv file&amp;lt;/pre&amp;gt;&lt;br /&gt;
The spreadsheet is now available as the dataframe &amp;lt;code&amp;gt;schooldata&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Let&amp;#039;s rename a few variables as some of the names are extremely clumsy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;names(schooldata)[names(schooldata)==&amp;amp;quot;PhaseOfEducation..name.&amp;amp;quot;] &amp;amp;lt;- &amp;amp;quot;EducPhase&amp;amp;quot;&lt;br /&gt;
names(schooldata)[names(schooldata)==&amp;amp;quot;EstablishmentStatus..name.&amp;amp;quot;] &amp;amp;lt;- &amp;amp;quot;EstabStatus&amp;amp;quot;&lt;br /&gt;
names(schooldata)[names(schooldata)==&amp;amp;quot;TypeOfEstablishment..name.&amp;amp;quot;] &amp;amp;lt;- &amp;amp;quot;EstabType&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
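If more columns need renaming, a lookup vector of old and new names keeps this in one place. Here is a sketch on a made-up two-row dataframe; only the three column names come from the code above.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: rename several columns in one pass with a lookup vector
# (names = old column names, values = new names). The data are made up.
demo_df = data.frame(URN = c(100000, 100001),
                     PhaseOfEducation..name. = c("Primary", "Secondary"),
                     EstablishmentStatus..name. = c("Open", "Closed"),
                     stringsAsFactors = FALSE)

rename_map = c(PhaseOfEducation..name.    = "EducPhase",
               EstablishmentStatus..name. = "EstabStatus",
               TypeOfEstablishment..name. = "EstabType")

# For each current column name: its position in rename_map, or NA if absent
pos = match(names(demo_df), names(rename_map))
found = !is.na(pos)

# Only the columns that appear in the map are renamed; others are left alone
names(demo_df)[found] = unname(rename_map[pos[found]])
print(names(demo_df))
```
&lt;br /&gt;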
Now we select the relevant schools (primary schools that are still open):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# Select relevant schools&lt;br /&gt;
primary_data &amp;amp;lt;- schooldata[schooldata$EducPhase == &amp;amp;quot;Primary&amp;amp;quot;,]  # only primary schools&lt;br /&gt;
primary_data &amp;amp;lt;- primary_data[primary_data$EstabStatus == &amp;amp;quot;Open&amp;amp;quot;,]  # only open schools&amp;lt;/pre&amp;gt;&lt;br /&gt;
This leaves us with 17,988 primary schools.&lt;br /&gt;
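&lt;br /&gt;
One thing to watch with this kind of logical indexing: if a column contains &amp;lt;code&amp;gt;NA&amp;lt;/code&amp;gt; values, each of them produces an extra row filled with NAs. Wrapping the condition in &amp;lt;code&amp;gt;which()&amp;lt;/code&amp;gt; avoids that. We have not checked whether the EduBase file actually has missing values in these columns; if it has none, both versions select the same schools. Here is a sketch with made-up data:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: which() versus a bare logical index when a column contains NA
# (made-up data for illustration).
demo_schools = data.frame(EducPhase   = c("Primary", NA, "Secondary", "Primary"),
                          EstabStatus = c("Open", "Open", "Open", "Closed"),
                          stringsAsFactors = FALSE)

# A bare logical index turns the NA comparison into an all-NA row
bad = demo_schools[demo_schools$EducPhase == "Primary", ]
print(nrow(bad))    # 3 rows, one of them all NA

# which() keeps only the positions where the comparison is TRUE
good = demo_schools[which(demo_schools$EducPhase == "Primary"), ]
good = good[which(good$EstabStatus == "Open"), ]
print(nrow(good))   # 1 row
```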
&lt;br /&gt;
== Setup of required info ==&lt;br /&gt;
&lt;br /&gt;
Now we start preparing for the real work. First we specify what the URL looks like, apart from the school&amp;#039;s URN:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# Base url from which to find the data - URN needs to be appended&lt;br /&gt;
url_base_2014 &amp;amp;lt;-  &amp;amp;quot;http://www.education.gov.uk/cgi-bin/schools/performance/school.pl?urn=&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
We will soon combine this with the &amp;#039;&amp;#039;URN&amp;#039;&amp;#039; information from the reference spreadsheet.&lt;br /&gt;
&lt;br /&gt;
Let&amp;#039;s prepare to get information from the website. As we argued before, our search strategy is to find the table rows whose entries identify the data we are interested in. So let&amp;#039;s collect these pieces of identifying information in two lists. We will not really use them later, but they help focus the mind:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;# these labels identify the rows of interest&lt;br /&gt;
get_info &amp;amp;lt;- c(&amp;amp;quot;Percentage of pupils eligible for FSM at any time during the past 6 years&amp;amp;quot;,&lt;br /&gt;
              &amp;amp;quot;Total income&amp;amp;quot;,&lt;br /&gt;
              &amp;amp;quot;Total expenditure&amp;amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# these will be the variable names for the respective data&lt;br /&gt;
get_info_new &amp;amp;lt;- c(&amp;amp;quot;eFSM&amp;amp;quot;,&lt;br /&gt;
                  &amp;amp;quot;TotalIncome&amp;amp;quot;,&lt;br /&gt;
                  &amp;amp;quot;TotalExp&amp;amp;quot;)&amp;lt;/pre&amp;gt;&lt;br /&gt;
Now we prepare the dataframe into which we will save the scraped information. We are basically creating a dummy entry (row) in a new dataframe called &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;save_data &amp;amp;lt;- data.frame(&amp;amp;quot;URN&amp;amp;quot;=-9999, &amp;amp;quot;eFSM&amp;amp;quot;=NA, &amp;amp;quot;TotalIncome&amp;amp;quot;=NA, &amp;amp;quot;TotalExp&amp;amp;quot;=NA)&amp;lt;/pre&amp;gt;&lt;br /&gt;
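A placeholder row like this has to be removed again at the end, for example by dropping the row with &amp;lt;code&amp;gt;URN == -9999&amp;lt;/code&amp;gt;. An alternative is to collect one small dataframe per school in a list and bind them once at the end. This also avoids re-copying a growing dataframe on every iteration, which can become slow for many thousands of schools. Here is a sketch with made-up values (the second URN is hypothetical):&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: one small dataframe per school, collected in a list ...
rows = list()
rows[[1]] = data.frame(URN = "100000", eFSM = "33.2",
                       TotalIncome = "9578", TotalExp = "9628",
                       stringsAsFactors = FALSE)
# ... a hypothetical second school for which no data were found
rows[[2]] = data.frame(URN = "999999", eFSM = NA,
                       TotalIncome = NA, TotalExp = NA,
                       stringsAsFactors = FALSE)

# ... then bound once: dataframe columns are matched by name, no dummy row
save_data = do.call(rbind, rows)
print(save_data)
```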
== Getting the data from the website ==&lt;br /&gt;
&lt;br /&gt;
In the end we will have to create a loop that repeats the information extraction for all the schools in &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt;. But first we shall investigate how this works for one particular school: the first school in the &amp;lt;code&amp;gt;primary_data&amp;lt;/code&amp;gt; dataframe, Sir John Cass&amp;#039;s Foundation Primary School (URN = 100000).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;id &amp;amp;lt;- primary_data$URN[1]  # get first URN from primary_data&lt;br /&gt;
url &amp;amp;lt;- paste0(url_base_2014,id) # assemble the complete url&lt;br /&gt;
print(url)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;## [1] &amp;amp;quot;http://www.education.gov.uk/cgi-bin/schools/performance/school.pl?urn=100000&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;SOURCE &amp;amp;lt;- getURL(url, .encoding = &amp;amp;quot;UTF-8&amp;amp;quot;) # download the page as text&lt;br /&gt;
PARSED &amp;amp;lt;- htmlParse(SOURCE) # parse the html code into a tree of tags &amp;lt;/pre&amp;gt;&lt;br /&gt;
The &amp;lt;code&amp;gt;getURL&amp;lt;/code&amp;gt; function downloads the HTML code you saw when you inspected the page at &amp;lt;code&amp;gt;url&amp;lt;/code&amp;gt; and saves it in the &amp;lt;code&amp;gt;SOURCE&amp;lt;/code&amp;gt; object. At this stage this is basically one very long, unstructured piece of text. The next line, &amp;lt;code&amp;gt;htmlParse(SOURCE)&amp;lt;/code&amp;gt;, then breaks this text into nested pieces using the tags we saw earlier (like &amp;lt;code&amp;gt;&amp;amp;lt;table&amp;amp;gt;&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;&amp;amp;lt;tr&amp;amp;gt;&amp;lt;/code&amp;gt;). This makes our search much easier.&lt;br /&gt;
&lt;br /&gt;
Print &amp;lt;code&amp;gt;SOURCE&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt; to see the difference.&lt;br /&gt;
&lt;br /&gt;
We continue by creating a new object in which we will save all of this school&amp;#039;s info. Eventually we will add it to &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;newschool &amp;amp;lt;- NULL   # create object that will be attached to dataframe&lt;br /&gt;
newschool[&amp;amp;quot;URN&amp;amp;quot;] &amp;amp;lt;- id&amp;lt;/pre&amp;gt;&lt;br /&gt;
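Note that &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; only gains the fields that are actually found on a page. It is later appended to &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt; with &amp;lt;code&amp;gt;rbind&amp;lt;/code&amp;gt;, and as far as we can tell, a plain vector is placed into the columns by position rather than by name. A school without, say, a finance table could therefore end up with values in the wrong columns. One way to guard against this is to start each school from a template that already holds all four fields as &amp;lt;code&amp;gt;NA&amp;lt;/code&amp;gt;. Here is a sketch with a made-up school; the helper name &amp;lt;code&amp;gt;new_row&amp;lt;/code&amp;gt; is hypothetical:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: every school starts from the same four named fields, so the vector
# always has the same length and order, whatever data are found on its page.
fields = c("URN", "eFSM", "TotalIncome", "TotalExp")

new_row = function(id) {                       # hypothetical helper
  row = setNames(rep(NA_character_, length(fields)), fields)
  row["URN"] = as.character(id)
  row
}

# Made-up example: FSM data found, but no finance table on the page.
# 100000L is an integer, so as.character() does not give "1e+05".
newschool = new_row(100000L)
newschool["eFSM"] = "33.2"

# Placeholder dataframe in the style used above, then append the new school
save_data = data.frame(URN = "-9999", eFSM = NA, TotalIncome = NA, TotalExp = NA,
                       stringsAsFactors = FALSE)
save_data = rbind(save_data, newschool)
print(save_data)
```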
== Finding the relevant information ==&lt;br /&gt;
&lt;br /&gt;
While &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt; is more structured than &amp;lt;code&amp;gt;SOURCE&amp;lt;/code&amp;gt;, it is still massive. But a very convenient function, &amp;lt;code&amp;gt;xpathSApply&amp;lt;/code&amp;gt;, lets us quickly identify the relevant parts of the page.&lt;br /&gt;
&lt;br /&gt;
=== Finding the FSM data ===&lt;br /&gt;
&lt;br /&gt;
Let&amp;#039;s quote the line first and explain after:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;temp_tr &amp;amp;lt;- xpathSApply(PARSED, &amp;amp;quot;//tr[count(td)=1]&amp;amp;quot;)  # get all table rows that contain exactly one td cell&amp;lt;/pre&amp;gt;&lt;br /&gt;
The first input is the object &amp;lt;code&amp;gt;PARSED&amp;lt;/code&amp;gt;. The second is an XPath expression, a kind of search criterion for tags. This is where you have to go back to the HTML code to understand what you are looking for. The FSM info is in a table with two columns: the first is a header cell (&amp;lt;code&amp;gt;&amp;amp;lt;th&amp;amp;gt;...&amp;amp;lt;/th&amp;amp;gt;&amp;lt;/code&amp;gt;) and the second an ordinary cell (&amp;lt;code&amp;gt;&amp;amp;lt;td&amp;amp;gt;...&amp;amp;lt;/td&amp;amp;gt;&amp;lt;/code&amp;gt;). So we search for all table rows (&amp;lt;code&amp;gt;&amp;amp;lt;tr&amp;amp;gt;...&amp;amp;lt;/tr&amp;amp;gt;&amp;lt;/code&amp;gt;) that contain exactly one &amp;lt;code&amp;gt;&amp;amp;lt;td&amp;amp;gt;...&amp;amp;lt;/td&amp;amp;gt;&amp;lt;/code&amp;gt; entry, which is what &amp;lt;code&amp;gt;//tr[count(td)=1]&amp;lt;/code&amp;gt; expresses. If you want to understand the syntax of this search criterion and how to adjust it to your problem, check this [http://www.zvon.org/xxl/XPathTutorial/General/examples.html website].&lt;br /&gt;
&lt;br /&gt;
Look at the first two elements to see what &amp;lt;code&amp;gt;temp_tr&amp;lt;/code&amp;gt; delivers:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;head(temp_tr,2)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;## [[1]]&lt;br /&gt;
## &amp;amp;lt;tr&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;th&amp;amp;gt;&lt;br /&gt;
##     &amp;amp;lt;label&amp;amp;gt;Street&amp;amp;lt;/label&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;/th&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;td class=&amp;amp;quot;num &amp;amp;quot;&amp;amp;gt; St James&amp;#039;s Passage&amp;amp;lt;/td&amp;amp;gt;&lt;br /&gt;
## &amp;amp;lt;/tr&amp;amp;gt; &lt;br /&gt;
## &lt;br /&gt;
## [[2]]&lt;br /&gt;
## &amp;amp;lt;tr&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;th&amp;amp;gt;&lt;br /&gt;
##     &amp;amp;lt;label&amp;amp;gt;Town&amp;amp;lt;/label&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;/th&amp;amp;gt;&lt;br /&gt;
##   &amp;amp;lt;td class=&amp;amp;quot;num &amp;amp;quot;&amp;amp;gt; London&amp;amp;lt;/td&amp;amp;gt;&lt;br /&gt;
## &amp;amp;lt;/tr&amp;amp;gt;&amp;lt;/pre&amp;gt;&lt;br /&gt;
Each element is an HTML row and indeed has exactly one &amp;lt;code&amp;gt;&amp;amp;lt;td&amp;amp;gt;...&amp;amp;lt;/td&amp;amp;gt;&amp;lt;/code&amp;gt; element, preceded by a &amp;lt;code&amp;gt;&amp;amp;lt;th&amp;amp;gt; ... &amp;amp;lt;/th&amp;amp;gt;&amp;lt;/code&amp;gt; element. We will now loop through all these elements and check whether any of them is one we are interested in.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;  # search through selected rows to find the ones we need&lt;br /&gt;
  if (length(temp_tr)&amp;amp;gt;0){   # case in which school data are available&lt;br /&gt;
    for (j in 1:length(temp_tr)){&lt;br /&gt;
      temp &amp;amp;lt;- temp_tr[[j]]              # this picks out the current row (including all tags)&lt;br /&gt;
      temp1 &amp;amp;lt;- xmlSApply(temp,xmlValue) # this extracts the values from the tags&lt;br /&gt;
      # check if we found the value we are after&lt;br /&gt;
      if (temp1[[1]] == &amp;amp;quot;Percentage of pupils eligible for FSM at any time during the past 6 years&amp;amp;quot;) {&lt;br /&gt;
        var &amp;amp;lt;- &amp;amp;quot;eFSM&amp;amp;quot;                      # sets the variable name we found&lt;br /&gt;
        temp2  &amp;amp;lt;- gsub(&amp;amp;quot;%&amp;amp;quot;,&amp;amp;quot;&amp;amp;quot;,temp1[[2]])  # gsub replaces the % sign with nothing&lt;br /&gt;
        newschool[var] &amp;amp;lt;- temp2            # places the value in newschool&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  } &amp;lt;/pre&amp;gt;&lt;br /&gt;
To check what we have achieved so far we look at &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;print(newschool)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;##      URN     eFSM &lt;br /&gt;
## &amp;amp;quot;100000&amp;amp;quot;  &amp;amp;quot; 33.2&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
As you can see, we have added the percentage of children eligible for Free School Meals (without the percentage sign).&lt;br /&gt;
&lt;br /&gt;
Just a few extra notes on the previous block of code. It starts with an &amp;lt;code&amp;gt;if&amp;lt;/code&amp;gt; condition. This is needed because some schools&amp;#039; webpages have no relevant data (no elements in &amp;lt;code&amp;gt;temp_tr&amp;lt;/code&amp;gt;), and in that case the code would fail. When I first wrote this code I did not have this condition. The code worked well for the first 100 or so schools, but then it failed on a school with no available information. When you write code like this you will do this a lot: proofing the code against unexpected behaviour. This is why code never works on the first go, and it can fail in all sorts of different ways. Think of yourself as a detective who needs to figure out how it fails and then how to fix it.&lt;br /&gt;
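&lt;br /&gt;
The failure described above is easy to reproduce. When &amp;lt;code&amp;gt;temp_tr&amp;lt;/code&amp;gt; is empty, &amp;lt;code&amp;gt;1:length(temp_tr)&amp;lt;/code&amp;gt; is &amp;lt;code&amp;gt;1:0&amp;lt;/code&amp;gt;, i.e. the two numbers 1 and 0. The loop would therefore still try to pick out &amp;lt;code&amp;gt;temp_tr[[1]]&amp;lt;/code&amp;gt;. The base R function &amp;lt;code&amp;gt;seq_along()&amp;lt;/code&amp;gt; returns an empty sequence instead, which gives an extra safeguard alongside the &amp;lt;code&amp;gt;if&amp;lt;/code&amp;gt; condition. Here is a minimal sketch:&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: an empty result, as for a school page without any matching rows
temp_tr = list()

print(1:length(temp_tr))    # 1 0 : the loop body would run twice
print(seq_along(temp_tr))   # integer(0) : the loop body is skipped

# With 1:length(), the very first iteration fails on temp_tr[[1]]
failed = tryCatch({ temp_tr[[1]]; FALSE }, error = function(e) TRUE)
print(failed)               # TRUE

# With seq_along(), the loop simply does nothing
n_iter = 0
for (j in seq_along(temp_tr)) {
  n_iter = n_iter + 1
}
print(n_iter)               # 0
```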
&lt;br /&gt;
In the previous block of code I used two functions, &amp;lt;code&amp;gt;xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;gsub&amp;lt;/code&amp;gt;. I added comments to explain what these functions do, but of course you can use &amp;lt;code&amp;gt;?xmlSApply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;?gsub&amp;lt;/code&amp;gt; to find out more. How do you find these functions in the first place? Dr Google is your friend. For instance, search for &amp;amp;quot;How can I extract values from XML elements in R?&amp;amp;quot;. After some searching you will find someone who had the same problem as you, and usually also someone who provided a solution.&lt;br /&gt;
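&lt;br /&gt;
Note also that the scraped values are still text, for instance &amp;lt;code&amp;gt;&amp;amp;quot; 33.2&amp;amp;quot;&amp;lt;/code&amp;gt; with a leading space, and you will want numbers before computing any correlations. Here is a sketch of the conversion, using strings in the form the page delivers them but typed in by hand. The string &amp;amp;quot;not available&amp;amp;quot; is purely illustrative of a non-numeric entry.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: cleaning scraped strings into numbers (values typed in by hand)
raw_fsm    = " 33.2%"
raw_income = " 9578"

# fixed = TRUE treats "%" literally; trimws() drops the surrounding spaces
fsm    = as.numeric(trimws(gsub("%", "", raw_fsm, fixed = TRUE)))
income = as.numeric(trimws(raw_income))
print(c(fsm, income))

# Anything that is not a number becomes NA (with a warning), so check for NAs
print(suppressWarnings(as.numeric("not available")))
```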
&lt;br /&gt;
=== Finding the financial data ===&lt;br /&gt;
&lt;br /&gt;
The financial data we are after, &amp;#039;&amp;#039;Total income&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure&amp;#039;&amp;#039;, are in tables with a slightly different structure. Open one of the schools&amp;#039; pages in the performance tables and go to the financial table (note that some schools publish little such information). Then right-click, choose &amp;amp;quot;Inspect Element&amp;amp;quot; and you should see something like this:&lt;br /&gt;
&lt;br /&gt;
[[File:InspectElement2.JPG|frame|none]]&lt;br /&gt;
&lt;br /&gt;
Here you can see that in this table each row has one &amp;lt;code&amp;gt;&amp;amp;lt;th&amp;amp;gt; ...&amp;amp;lt;/th&amp;amp;gt;&amp;lt;/code&amp;gt; element and four &amp;lt;code&amp;gt;&amp;amp;lt;td&amp;amp;gt;...&amp;amp;lt;/td&amp;amp;gt;&amp;lt;/code&amp;gt; elements. This means that we have to change our filtering strategy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;temp_tr &amp;amp;lt;- xpathSApply(PARSED, &amp;amp;quot;//tr[count(td)=4]&amp;amp;quot;)  # get all table rows that contain exactly four td cells&amp;lt;/pre&amp;gt;&lt;br /&gt;
Now we have selected all rows with four data cells. Each of these rows also has a header cell, although our search criterion does not ask for one.&lt;br /&gt;
&lt;br /&gt;
What follows now is the search for the rows with &amp;#039;&amp;#039;Total income&amp;#039;&amp;#039; and &amp;#039;&amp;#039;Total expenditure&amp;#039;&amp;#039;, which works in almost exactly the same way as for &amp;#039;&amp;#039;FSM&amp;#039;&amp;#039;. See whether you can spot the slight differences.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;  # search through selected rows to find the ones we need&lt;br /&gt;
  if (length(temp_tr)&amp;amp;gt;0){   # case in which school data are available&lt;br /&gt;
    for (j in 1:length(temp_tr)){&lt;br /&gt;
      temp &amp;amp;lt;- temp_tr[[j]]              # this picks out the current row (including all tags)&lt;br /&gt;
      temp1 &amp;amp;lt;- xmlSApply(temp,xmlValue) # this extracts the values from the tags&lt;br /&gt;
      # check if we found the value we are after&lt;br /&gt;
      if (temp1[[1]] == &amp;amp;quot;Total income&amp;amp;quot;) {&lt;br /&gt;
        var &amp;amp;lt;- &amp;amp;quot;TotalIncome&amp;amp;quot;      # sets the variable name we found&lt;br /&gt;
        temp2  &amp;amp;lt;- temp1[[2]]      # the school&amp;#039;s finance data are in column 2&lt;br /&gt;
        newschool[var] &amp;amp;lt;- temp2   # places the value in newschool&lt;br /&gt;
      }&lt;br /&gt;
      if (temp1[[1]] == &amp;amp;quot;Total expenditure&amp;amp;quot;) {&lt;br /&gt;
        var &amp;amp;lt;- &amp;amp;quot;TotalExp&amp;amp;quot;         # sets the variable name we found&lt;br /&gt;
        temp2  &amp;amp;lt;- temp1[[2]]      # the school&amp;#039;s finance data are in column 2&lt;br /&gt;
        newschool[var] &amp;amp;lt;- temp2   # places the value in newschool&lt;br /&gt;
      }&lt;br /&gt;
    }&lt;br /&gt;
  } &amp;lt;/pre&amp;gt;&lt;br /&gt;
This time we looked for two pieces of information rather than just one, which is why we had two &amp;lt;code&amp;gt;if&amp;lt;/code&amp;gt; conditions. If you have many more, you may want to learn about the &amp;lt;code&amp;gt;switch&amp;lt;/code&amp;gt; function instead. We also didn&amp;#039;t need to strip any &amp;amp;quot;%&amp;amp;quot; signs out of our values, so the definition of &amp;lt;code&amp;gt;temp2&amp;lt;/code&amp;gt; is a little more straightforward.&lt;br /&gt;
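&lt;br /&gt;
For example, the label-to-variable mapping could be written with &amp;lt;code&amp;gt;switch&amp;lt;/code&amp;gt; roughly as follows. This is a sketch, not the code used on this page. The helper name &amp;lt;code&amp;gt;label_to_var&amp;lt;/code&amp;gt; is made up, and the FSM value would still need its % sign removed.&lt;br /&gt;
&lt;br /&gt;
```r
# Sketch: map a row label to our variable name with a single switch() call.
# Unmatched labels fall through to the final unnamed argument (NULL).
label_to_var = function(label) {               # hypothetical helper
  switch(label,
         "Percentage of pupils eligible for FSM at any time during the past 6 years" = "eFSM",
         "Total income"      = "TotalIncome",
         "Total expenditure" = "TotalExp",
         NULL)
}

print(label_to_var("Total income"))   # "TotalIncome"
print(label_to_var("Town"))           # NULL

# Inside the loop over rows this could replace the separate if() blocks:
#   var = label_to_var(temp1[[1]])
#   if (!is.null(var)) newschool[var] = temp1[[2]]
```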
&lt;br /&gt;
To check what we have achieved so far we look at &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;print(newschool)&amp;lt;/pre&amp;gt;&lt;br /&gt;
&amp;lt;pre&amp;gt;##         URN        eFSM TotalIncome    TotalExp &lt;br /&gt;
##    &amp;amp;quot;100000&amp;amp;quot;     &amp;amp;quot; 33.2&amp;amp;quot;     &amp;amp;quot; 9578&amp;amp;quot;     &amp;amp;quot; 9628&amp;amp;quot;&amp;lt;/pre&amp;gt;&lt;br /&gt;
You can see that we have now also added the financial data to our &amp;lt;code&amp;gt;newschool&amp;lt;/code&amp;gt; entry, so we have collected everything we want. Let&amp;#039;s add this new school to our dataframe of new data, &amp;lt;code&amp;gt;save_data&amp;lt;/code&amp;gt;. For this we use the &amp;lt;code&amp;gt;rbind&amp;lt;/code&amp;gt; command, which stacks matrices or dataframes on top of each other (in our case, a dataframe and a new row).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre class=&amp;quot;r&amp;quot;&amp;gt;  # add the newschool entry to the dataframe&lt;br /&gt;
  save_data &amp;amp;lt;- rbind(save_data,newschool)&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Rb</name></author>	</entry>

	</feed>