R reg diag

From ECLR
Revision as of 08:13, 14 April 2015 by Rb

When estimating regression models you will usually want to undertake some diagnostic testing. The functions we will use are all contained in the "AER" package (see the relevant CRAN webpage).

Heteroskedasticity

One of the Gauss-Markov assumptions is that the variance of the regression error terms is constant. If it is not, the OLS parameter estimators are no longer efficient, and heteroskedasticity-robust standard errors are needed to obtain valid inference on the regression coefficients (see R_robust_se).

Tests for heteroskedasticity are usually based on an auxiliary regression of the squared estimated regression residuals on a set of explanatory variables that are suspected to be related to the potentially changing error variance. We continue the example started in R_Regression#A first example, which is replicated here. Note the first line, which we include to gain access to the procedures in the AER package:

    library(AER)  # allow access to AER package
    # This is my first R regression!
    setwd("T:/ECLR/R/FirstSteps")              # This sets the working directory
    mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory
     
    # Now convert variables with "." to num with NA
    mydata$wage <- as.numeric(as.character(mydata$wage))
    mydata$lwage <- as.numeric(as.character(mydata$lwage))

Before we run our initial regression model we restrict the dataframe mydata to those observations that do not have missing wage information, using the following subset command:

    mydata <- subset(mydata, !is.na(wage))  # keep only rows where wage is not missing

Now we can run our initial regression:

    # Run a regression
    reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
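
The auxiliary-regression idea described above can be sketched by hand in base R. The following is a minimal illustration, not a definitive implementation: it uses simulated data (the names simdata, reg_sim and aux, and all parameter values, are hypothetical and stand in for mroz.csv so that the code runs on its own). It regresses the squared residuals on the explanatory variables and forms the LM statistic as n times the R-squared of that auxiliary regression.

```r
# Simulated data (hypothetical; stands in for mroz.csv so the sketch is self-contained)
set.seed(123)
n <- 400
exper   <- runif(n, 0, 30)
huswage <- runif(n, 2, 20)
# The error standard deviation grows with exper, so the errors are heteroskedastic
lwage <- 0.5 + 0.02 * exper + 0.1 * log(huswage) +
  rnorm(n, sd = 0.1 + 0.05 * exper)
simdata <- data.frame(lwage, exper, huswage)

# Main regression, same specification as reg_ex1
reg_sim <- lm(lwage ~ exper + log(huswage), data = simdata)

# Auxiliary regression: squared residuals on the explanatory variables
simdata$res2 <- residuals(reg_sim)^2
aux <- lm(res2 ~ exper + log(huswage), data = simdata)

# LM statistic = n * R^2 of the auxiliary regression; under the null of
# homoskedasticity it is asymptotically chi-squared with
# df = number of slope coefficients in the auxiliary regression
lm_stat <- n * summary(aux)$r.squared
df_aux  <- length(coef(aux)) - 1
p_value <- pchisq(lm_stat, df = df_aux, lower.tail = FALSE)

cat("LM statistic:", lm_stat, " df:", df_aux, " p-value:", p_value, "\n")
```

A small p-value suggests rejecting the null hypothesis of a constant error variance. With the AER package loaded, calling bptest(reg_ex1) (a function from the lmtest package, which AER attaches) should give a test of this type for the actual regression without the manual steps.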

Autocorrelation

Residual Normality

Structural Break