Dummy Variables in R

From ECLR
Revision as of 10:27, 7 July 2015 by Rb

In this section we explain how dummy variables can be used in regressions, using the Baseball Wages dataset as our example.

Dummy Variables

Econometricians think of dummy variables as binary (0/1) variables, and in some datasets you will find the data presented as such right from the start. This is, for instance, the case for the Baseball Wages dataset. After importing the dataset you will find information on the position each player takes in his team. These are first base (frstbase), second base (scndbase), third base (thrdbase), shortstop (shrtstop), outfield (outfield) and catcher (catcher). Each player is given exactly one of these positions, so for every player exactly one of these variables equals 1.

    setwd("YOUR DIRECTORY PATH")              # This sets the working directory
    load("mlb1.RData")  # Opens mlb1 dataset from R datafile
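The claim that each player has exactly one position can be checked directly on 0/1 dummies. The following is a minimal sketch only: `toy` is a made-up data frame that borrows the column names listed above so that the snippet runs on its own; it is not the actual mlb1 data, and the same checks would have to be pointed at whatever object the load() call creates.

```r
# Made-up stand-in for the position dummies (NOT the real mlb1 data)
toy <- data.frame(
  frstbase = c(1, 0, 0, 0, 0, 0),
  scndbase = c(0, 1, 0, 0, 0, 0),
  thrdbase = c(0, 0, 1, 0, 0, 0),
  shrtstop = c(0, 0, 0, 1, 0, 0),
  outfield = c(0, 0, 0, 0, 1, 0),
  catcher  = c(0, 0, 0, 0, 0, 1)
)
positions <- c("frstbase", "scndbase", "thrdbase",
               "shrtstop", "outfield", "catcher")

# Does every position column contain only 0s and 1s?
is_binary <- sapply(toy[positions], function(x) all(x %in% c(0, 1)))

# Does every row (player) have exactly one dummy equal to 1?
one_position <- rowSums(toy[positions]) == 1

# Number of players in each position
n_per_position <- colSums(toy[positions])

print(all(is_binary))
print(all(one_position))
print(n_per_position)
```

If all(one_position) came back FALSE on real data, some players would have either no position or more than one recorded.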


As we discussed in the Data Section, categorical variables are represented as factor variables in R. As econometricians we often think of these as dummy variables. In the data analysis section we already learned how to get frequency counts of categorical variables using the table() or summary() commands.
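To make the link between 0/1 dummies and a factor concrete, here is a small sketch. It assumes a made-up data frame `toy` with three hypothetical dummy columns (not the real dataset), collapses them into one factor, and then produces the frequency counts with table() and summary().

```r
# Made-up dummies (hypothetical values, NOT the real mlb1 data)
toy <- data.frame(
  frstbase = c(1, 0, 0, 0),
  catcher  = c(0, 1, 0, 1),
  outfield = c(0, 0, 1, 0)
)
pos_names <- c("frstbase", "catcher", "outfield")

# For each row, find the column holding the 1 and use its name as the category
which_col <- max.col(as.matrix(toy[, pos_names]), ties.method = "first")
toy$position <- factor(pos_names[which_col], levels = pos_names)

# Frequency counts of the categorical variable
counts <- table(toy$position)
print(counts)
print(summary(toy$position))
```

This only works cleanly because each row has exactly one dummy equal to 1; with overlapping or missing categories the mapping to a single factor would not be well defined.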

Using Categorical/Factor variables in regressions

When we use such categorical variables as explanatory variables in regressions, they enter in the form of dummy variables. Our practice dataset doesn't have any real factor variables (remember, wages and log(wages) were initially treated as factor variables only because of the missing values).
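Since the practice dataset has no genuine factor variable, the following sketch uses made-up data to illustrate how R turns a factor regressor into dummy variables. The data frame `toy`, the variable names `lwage` and `position`, and all values are hypothetical and chosen only for illustration.

```r
# Made-up data (hypothetical values, NOT the real mlb1 data)
toy <- data.frame(
  lwage    = c(13.2, 13.9, 14.1, 13.5, 14.4, 13.0),
  position = factor(c("catcher", "outfield", "frstbase",
                      "catcher", "outfield", "frstbase"))
)

# The design matrix R builds from the factor: an intercept plus one
# dummy for every level except the first (the base category)
X <- model.matrix(~ position, data = toy)
print(colnames(X))

# lm() performs the same expansion internally
fit <- lm(lwage ~ position, data = toy)
print(coef(fit))
```

By default R uses the first factor level (here catcher, the first alphabetically) as the base category, so the intercept is the mean of lwage for that group and each remaining coefficient is the difference between a group's mean and the base group's mean. A different base category can be chosen with relevel().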