R reg diag

From ECLR
 

Revision as of 09:42, 14 April 2015

When estimating regression models you will usually want to undertake some diagnostic testing. The functions we will use are all contained in the "AER" package (see the relevant CRAN webpage).

Heteroskedasticity

One of the Gauss-Markov assumptions is that the variance of the regression error terms is constant. If it is not, the OLS parameter estimators are no longer efficient and one needs to use heteroskedasticity robust standard errors to obtain valid inference on regression coefficients (see R_robust_se).
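As a minimal sketch of what such robust inference looks like in R: the functions vcovHC and coeftest come from the sandwich and lmtest packages, both of which are loaded automatically with AER. The data below are simulated purely for illustration; they are not the mroz data used later in this section.

```r
library(sandwich)  # vcovHC: heteroskedasticity-consistent covariance estimators
library(lmtest)    # coeftest: coefficient tests with a user-supplied covariance

set.seed(42)
x <- rnorm(200)
y <- 1 + 2 * x + rnorm(200, sd = abs(x))  # error variance depends on x

reg <- lm(y ~ x)
coeftest(reg, vcov = vcovHC(reg, type = "HC1"))  # t-tests with robust standard errors
```

The type = "HC1" argument selects one common small-sample adjustment; vcovHC offers several variants.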

Tests for heteroskedasticity are usually based on an auxiliary regression of estimated squared regression residuals on a set of explanatory variables that are suspected to be related to the potentially changing error variance. We continue the example we started in R_Regression#A first example and which is replicated here, but note the first line which we include to gain access to the procedures in the AER toolbox:

    library(AER)  # allow access to AER package
    # This is my first R regression!
    setwd("T:/ECLR/R/FirstSteps")              # This sets the working directory
    mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory
     
    # Now convert variables with "." to num with NA
    mydata$wage <- as.numeric(as.character(mydata$wage))
    mydata$lwage <- as.numeric(as.character(mydata$lwage))

Before we run our initial regression model we shall restrict the dataframe mydata to those data that do not have missing wage information, using the following subset command:

   mydata <- subset(mydata, !is.na(wage))  # keep observations with non-missing wage

Now we can run our initial regression:

    # Run a regression
    reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
    reg_ex1_sm <- summary(reg_ex1)

as we have learned in R_Regression these two objects now contain all the information we would want from a regression. A test for error heteroskedasticity is based on an auxiliary regression that uses the estimated squared regression residuals (as proxies for the error variance) as the dependent variable and a range of variables as explanatory variables. This testing principle goes back to Trevor Breusch and Adrian Pagan, two excellent Australian econometricians.[1] Which variables to include depends on the problem; in general, they should be any variables that the researcher suspects are correlated with the error variance.

The AER toolbox contains a procedure that makes performing this test a doddle.

   # Test whether the residuals are homoskedastic
   print(bptest(reg_ex1))

will deliver:

   studentized Breusch-Pagan test
   data:  reg_ex1 
   BP = 9.5792, df = 2, p-value = 0.008316

The p-value comes from the asymptotic distribution (a Chi-square distribution with 2 degrees of freedom) of the test statistic under the null hypothesis of homoskedasticity. In this particular case the p-value is fairly small (smaller than 1%), which would lead us to reject the null hypothesis of homoskedasticity at the 1% significance level.
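You can verify this p-value yourself from the Chi-square distribution with base R's pchisq, using the reported test statistic:

```r
# Recover the reported p-value: P(Chi-square(2) > 9.5792)
1 - pchisq(9.5792, df = 2)  # approximately 0.008316
```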

This was pretty easy, but I want to demonstrate what is really happening behind this test so that you know how to adapt it to the problem at hand. As discussed above, we need to decide which explanatory variables to use in the test. The default choice of bptest is to use the same explanatory variables as those in the original regression model, here exper and log(huswage). You could get the same result by running this auxiliary regression yourself and calculating the test statistic [math]n \cdot R^2[/math], where [math]n[/math] is the number of observations and [math]R^2[/math] comes from the auxiliary regression.
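The manual version of the test can be sketched as follows. The data here are simulated so the snippet is self-contained; with the mroz data you would replace y, x1 and x2 by lwage, exper and log(huswage) and reuse reg_ex1 from above.

```r
# Breusch-Pagan test "by hand": auxiliary regression of squared residuals
set.seed(123)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = exp(0.4 * x1))  # heteroskedastic errors

reg     <- lm(y ~ x1 + x2)
res_sq  <- residuals(reg)^2              # squared residuals as variance proxies
aux     <- lm(res_sq ~ x1 + x2)          # auxiliary regression
bp_stat <- n * summary(aux)$r.squared    # test statistic n * R^2
p_val   <- 1 - pchisq(bp_stat, df = 2)   # df = number of auxiliary regressors
```

Under the null of homoskedasticity, bp_stat is asymptotically Chi-square distributed with degrees of freedom equal to the number of explanatory variables in the auxiliary regression.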

Autocorrelation

Residual Normality

Structural Break

  1. T. S. Breusch and A. R. Pagan (1979), "A Simple Test for Heteroscedasticity and Random Coefficient Variation", Econometrica, Vol. 47, No. 5, pp. 1287-1294