R Packages
Revision as of 22:07, 14 January 2015
In this section we demonstrate how to do some basic data analysis on data stored in a dataframe. We will also use this task to introduce how packages are used in R.
Basic Data Analysis
The easiest way to obtain basic summary statistics for the variables in a dataframe is the following command:
summary(mydata)
You will find that this provides a range of summary statistics for each variable (minimum and maximum, quartiles, mean and median). If the dataframe contains many variables, as the dataframe based on mroz.xls does, this output can be lengthy. Say you are only interested in the summary statistics for two of the variables, hours and husage; then you would select only these two variables, as follows:
summary(mydata[,c("hours","husage")])
This will produce the following output:
     hours            husage
 Min.   :   0.0   Min.   :30.00
 1st Qu.:   0.0   1st Qu.:38.00
 Median : 288.0   Median :46.00
 Mean   : 740.6   Mean   :45.12
 3rd Qu.:1516.0   3rd Qu.:52.00
 Max.   :4950.0   Max.   :60.00
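The same selection can be written in a few other ways. The sketch below uses a small made-up dataframe in place of mydata (the values are invented for illustration and are not the mroz.xls data, so the printed statistics will differ from the output above). It also shows one pitfall: selecting a single column with [ , ] returns a plain vector unless drop = FALSE is given.

```r
# Hypothetical toy data standing in for mydata; all values are invented.
mydata <- data.frame(
  hours  = c(0, 120, 800, 1500, 2000),
  husage = c(31, 40, 45, 52, 58),
  educ   = c(10, 12, 12, 14, 16)
)

vars <- c("hours", "husage")

# Three ways to keep only the columns named in vars
a <- mydata[, vars]                 # matrix-style indexing, as in the text
b <- mydata[vars]                   # list-style indexing, also a dataframe
d <- subset(mydata, select = vars)  # subset() with a select argument
summary(a)

# Caution: a single column selected with [ , ] is dropped to a vector,
# so summary() then reports on a vector rather than a one-column table.
class(mydata[, "hours"])                # "numeric"
class(mydata[, "hours", drop = FALSE])  # "data.frame"
```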
Another extremely useful statistic is the correlation between variables, which is computed with the cor( ) function. Say we want the correlations among educ, motheduc and fatheduc; then we use it in the same manner:
cor(mydata[,c("educ","motheduc","fatheduc")])
This results in the following correlation matrix:
              educ  motheduc  fatheduc
educ     1.0000000 0.4353365 0.4424582
motheduc 0.4353365 1.0000000 0.5730717
fatheduc 0.4424582 0.5730717 1.0000000
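Real data may contain missing values, and by default cor() returns NA for any correlation that involves a column containing an NA. The sketch below, again on an invented toy dataframe rather than the mroz.xls data (so the numbers will not match the matrix above), shows the use argument of cor() for handling this, how to get a single correlation between two variables, and how to round the matrix for easier reading.

```r
# Hypothetical toy data standing in for mydata; values are invented and
# one motheduc value is deliberately missing.
mydata <- data.frame(
  educ     = c(12, 12, 16, 10, 14, 12),
  motheduc = c(12, 10, 14, 8, NA, 12),
  fatheduc = c(10, 12, 16, 7, 12, 14)
)
vars <- c("educ", "motheduc", "fatheduc")

# Default (use = "everything"): entries involving motheduc come out as NA
cor(mydata[, vars])

# Drop every incomplete row before computing (listwise deletion) ...
cor(mydata[, vars], use = "complete.obs")
# ... or drop incomplete rows separately for each pair of variables
cor(mydata[, vars], use = "pairwise.complete.obs")

# A single correlation between two variables
cor(mydata$educ, mydata$fatheduc)

# Rounded matrix, easier to read
round(cor(mydata[, vars], use = "complete.obs"), 2)
```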
Packages
Base R provides core functionality, but much of the power of R comes from the ability to use code written by other people to perform statistical and econometric techniques. These additional pieces of software are called packages, and the next step is to learn how to use them.
Most such packages do not come pre-installed with R, but fortunately they are easy to install and use.
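A minimal sketch of the usual install-then-load pattern, wrapped in a hypothetical helper use_package() that is not part of R itself. install.packages() downloads a package from a CRAN mirror and therefore needs an internet connection; library() attaches an installed package for the current session. The example call uses stats, which ships with every R installation, so nothing is downloaded.

```r
# Hypothetical helper (not a built-in R function): install a package
# from CRAN if it is missing, then attach it for this session.
use_package <- function(pkg) {
  # requireNamespace() returns FALSE instead of failing when the
  # package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "stats" is part of base R, so this call only attaches it.
use_package("stats")
```

Installing is needed only once per R installation, whereas library() has to be called again in every new R session.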