Difference between revisions of "Dummy Variables in R"
Revision as of 09:27, 7 July 2015
In this section we explain how dummy variables can be used in regressions, and we will use the Baseball Wages dataset for this purpose.
Dummy Variables
Econometricians think of dummy variables as binary (0/1) variables, and in some datasets you will find the data presented as such right from the start. This is, for instance, the case for the Baseball wages dataset. After importing the dataset you will find information on the position each player takes in his team. These are first base (frstbase), second base (scndbase), third base (thrdbase), short stop (shrtstop), outfield (outfield) and catcher (catcher). Each player is given exactly one of these positions.
setwd("YOUR DIRECTORY PATH") # This sets the working directory
load("mlb1.RData") # Opens mlb1 dataset from R datafile
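One way to check the claim that each player has exactly one position is to confirm that the six position dummies sum to one in every row. The sketch below does this on a small invented data frame, here called toy, that only borrows the column names from the text; with the real data you would replace toy by whatever data frame load() puts in your workspace (its name is not assumed here).

```r
# Toy stand-in for the position dummies (all values are invented for illustration)
toy <- data.frame(
  frstbase = c(1, 0, 0, 0),
  scndbase = c(0, 1, 0, 0),
  thrdbase = c(0, 0, 0, 0),
  shrtstop = c(0, 0, 0, 0),
  outfield = c(0, 0, 1, 0),
  catcher  = c(0, 0, 0, 1)
)

pos_cols <- c("frstbase", "scndbase", "thrdbase", "shrtstop", "outfield", "catcher")

# Number of positions recorded for each player (should be 1 everywhere)
n_positions <- rowSums(toy[, pos_cols])
all_one <- all(n_positions == 1)
print(all_one)

# Number of players in each position
print(colSums(toy[, pos_cols]))
```

If all_one comes back FALSE on real data, some rows have either no position or more than one, and the dummies cannot be treated as a single categorical variable without further cleaning.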
As we discussed in the Data Section, categorical variables are labelled as factor variables in R. As econometricians we often think of these as dummy variables. In the data analysis section we already learned how to get frequency counts of categorical variables using the table( ) or summary( ) command.
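As a brief reminder of what these two commands do with a factor, here is a minimal sketch. The vector position and its values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical categorical variable stored as a factor
position <- factor(c("catcher", "outfield", "outfield", "frstbase"))

# Frequency counts per category
counts <- table(position)
print(counts)

# summary() on a factor also reports counts per level
print(summary(position))
```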
Using Categorical/Factor variables in regressions
When using such categorical variables as explanatory variables in regressions, we use them in the form of dummy variables. Our practice dataset doesn't have any real factor variables (remember, wages and log(wages) were initially treated as factor variables due to the missing values).
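To illustrate how R turns a factor into dummy variables when it appears on the right-hand side of a regression, the following sketch builds a small hypothetical data frame. The names toy, lsalary and position, and all the values, are assumptions made for this example only. model.matrix( ) shows the dummy columns that lm( ) creates behind the scenes, with the first factor level serving as the omitted base category.

```r
# Hypothetical data: a log salary and a position stored as a factor
toy <- data.frame(
  lsalary  = c(13.1, 13.8, 14.2, 13.5, 14.0, 13.3),
  position = factor(c("catcher", "outfield", "outfield",
                      "frstbase", "frstbase", "catcher"))
)

# Design matrix: an intercept plus one dummy per non-base level
X <- model.matrix(~ position, data = toy)
print(colnames(X))

# The same dummies are created automatically inside lm()
fit <- lm(lsalary ~ position, data = toy)
print(coef(fit))
```

In this sketch the intercept is the mean of lsalary for the base category (catcher, the first level alphabetically), and each remaining coefficient is the difference between that position's mean and the base mean.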