Difference between revisions of "R reg diag"

From ECLR

Revision as of 09:13, 14 April 2015

When estimating regression models you will usually want to undertake some diagnostic testing. The functions we will use are all contained in the "AER" package (see the relevant CRAN webpage).

Heteroskedasticity

One of the Gauss-Markov assumptions is that the variance of the regression error terms is constant. If it is not, the OLS parameter estimators remain unbiased but are no longer efficient, and the usual standard errors are invalid. One then needs to use heteroskedasticity robust standard errors to obtain valid inference on the regression coefficients (see R_robust_se).

Tests for heteroskedasticity are usually based on an auxiliary regression of the squared estimated regression residuals on a set of explanatory variables that are suspected to be related to the potentially changing error variance. We continue the example we started in R_Regression#A first example, which is replicated here. Note the first line, which we include to gain access to the procedures in the AER package:

    library(AER)  # allow access to AER package
    # This is my first R regression!
    setwd("T:/ECLR/R/FirstSteps")              # This sets the working directory
    mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory
     
    # Now convert variables that use "." for missing values to numeric (with NA)
    mydata$wage <- as.numeric(as.character(mydata$wage))
    mydata$lwage <- as.numeric(as.character(mydata$lwage))
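The detour via as.character in the two lines above matters when the column has been read as a factor, which older R versions do by default for text columns. The following is a minimal sketch on a hypothetical toy vector (raw_wage and its values are made up for illustration, not taken from mroz.csv):

```r
# Hypothetical toy column: wages stored as text, with "." marking missing values
raw_wage <- c("3.35", ".", "4.59", ".")
wage_fac <- factor(raw_wage)   # how such a column looks when read as a factor

# as.numeric() applied directly to a factor returns the internal level codes
codes <- as.numeric(wage_fac)

# Going via character recovers the values; "." cannot be parsed and becomes NA
# (R issues a coercion warning here, which we suppress for the illustration)
wage_num <- suppressWarnings(as.numeric(as.character(wage_fac)))

print(codes)                # level codes, not wages
print(wage_num)             # wages, with NA where the file had "."
print(sum(is.na(wage_num))) # number of missing observations
```

If the column was read as character rather than factor, the as.character step is harmless, so the same two lines work in either case.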

Before we run our initial regression model we shall restrict the dataframe mydata to the observations that do not have missing wage information, using the following subset command:

    mydata <- subset(mydata, !is.na(wage))  # keep only rows where wage is not NA

Now we can run our initial regression:

    # Run a regression
    reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
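The auxiliary-regression idea described at the start of this section can be sketched by hand in base R. The example below uses simulated data (the variables x and y and all parameter values are assumptions made for illustration, not the mroz data) and computes the n times R-squared version of the test statistic. It is a sketch of one common variant, not necessarily the exact procedure used later on this page.

```r
# Simulated data (hypothetical, NOT the mroz data set)
set.seed(123)
n <- 500
x <- runif(n, 1, 10)
# The error standard deviation grows with x, so the errors are heteroskedastic
y <- 1 + 0.5 * x + rnorm(n, sd = 0.5 * x)

reg <- lm(y ~ x)        # main regression
u2  <- resid(reg)^2     # squared estimated residuals
aux <- lm(u2 ~ x)       # auxiliary regression of squared residuals on x

# LM statistic: n * R^2 of the auxiliary regression. Under the null of
# homoskedasticity it is asymptotically chi-squared, with degrees of freedom
# equal to the number of regressors in the auxiliary regression (here 1).
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
print(c(LM = lm_stat, p = p_value))
```

A small p-value is evidence against constant error variance. With the AER package loaded, the bptest function from the lmtest package, for example bptest(reg_ex1), should deliver a comparable (studentized Breusch-Pagan) test for the regression estimated above.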

Autocorrelation

Residual Normality

Structural Break