R robust se
Latest revision as of 14:35, 26 August 2015
Here we briefly discuss how to estimate robust standard errors for linear regression models.
Which package to use
There are a number of pieces of code available to facilitate this task[1]. Here I recommend using the "sandwich" package, which has the most comprehensive robust standard error options I am aware of.
As described in more detail in R_Packages, you should install the package the first time you use it on a particular computer:
install.packages("sandwich")
and then load the package at the beginning of your script:
library(sandwich)
All code snippets below assume that you have done so. In fact, you may instead want to use another package called "AER", which loads the sandwich package along with other relevant packages (such as the one used for instrumental variables estimation, IV_in_R). See the relevant CRAN webpage
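If you go the AER route, the setup is analogous to the sandwich setup above (a minimal sketch; run the install line only once per computer):

```r
# Install once, then load; AER pulls in sandwich, lmtest and
# other related packages in one step
install.packages("AER")
library(AER)
```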
Heteroskedasticity robust standard errors
I assume that you know that the presence of heteroskedastic error terms renders OLS estimators of linear regression models inefficient (although they remain unbiased). More seriously, however, heteroskedasticity also implies that the usual standard errors computed for your coefficient estimates (e.g. when you use the summary() command as discussed in R_Regression) are incorrect (or, as we sometimes say, biased). This in turn implies that inference based on these standard errors will be incorrectly sized. What we need are coefficient standard errors that are correct even when the regression error terms are heteroskedastic, sometimes called White standard errors.
Let's assume that you have calculated a regression (as in R_Regression):
# Run a regression
reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
The function from the "sandwich" package that you want to use is called vcovHC() and you use it as follows:
vcv <- vcovHC(reg_ex1, type = "HC1")
This saves the heteroskedasticity-robust variance-covariance matrix in vcv[2]. Now you can calculate robust t-tests by using the estimated coefficients and the new standard errors (square roots of the diagonal elements of vcv). But note that inference using these standard errors is only valid for sufficiently large sample sizes (the t-statistics are only asymptotically normally distributed).
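The manual calculation just described can be sketched as follows (using the vcv and reg_ex1 objects from above):

```r
# Robust standard errors: square roots of the diagonal of vcv
robust_se <- sqrt(diag(vcv))

# Robust t-statistics: coefficient estimates divided by robust SEs
robust_t <- coef(reg_ex1) / robust_se

# Two-sided p-values from the standard normal (asymptotic) distribution
robust_p <- 2 * pnorm(-abs(robust_t))
```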
You may actually want a neat way to see the standard errors, rather than having to calculate the square roots of the diagonal of this matrix yourself. This is done with the coeftest() function (part of the lmtest package, which is automatically installed if you installed the AER package as recommended above):
coeftest(reg_ex1, vcv)
if you already calculated vcv. Try it out and you will find the regression coefficients along with their new standard errors, t-statistics and p-values. If you have not yet calculated vcv, you may as well use this line
coeftest(reg_ex1, vcov = vcovHC(reg_ex1,type="HC1"))
which incorporates the call to the vcovHC function.
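Note that vcovHC() supports several small-sample adjustments besides "HC1" (types "HC0" through "HC3", among others), and it can be instructive to compare them. A quick sketch of such a comparison:

```r
# Compare robust standard errors across common HC types;
# differences shrink as the sample size grows
for (t in c("HC0", "HC1", "HC2", "HC3")) {
  cat(t, ":", sqrt(diag(vcovHC(reg_ex1, type = t))), "\n")
}
```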
Autocorrelation and heteroskedasticity robust standard errors
When the error terms are autocorrelated (and potentially heteroskedastic), all of the above still applies, but we need yet another estimator for the coefficient standard errors, sometimes called the Newey-West estimator.
The function from the "sandwich" package that you want to use is called vcovHAC() and you use it as follows:
vcv <- vcovHAC(reg_ex1)
Everything else is as for heteroskedastic error terms. That is, you would print these standard errors along with the coefficient estimates, t-statistics and p-values from:
coeftest(reg_ex1, vcov = vcovHAC(reg_ex1))
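If you want more control over the lag length, the sandwich package also provides the NeweyWest() function; the lag value below is purely illustrative, not a recommendation:

```r
# Newey-West estimator with an explicitly chosen lag length
# (lag = 4 is an illustrative choice, not a rule)
coeftest(reg_ex1, vcov = NeweyWest(reg_ex1, lag = 4, prewhite = FALSE))
```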
Heteroskedasticity Robust F-tests
To illustrate robust F-tests, we shall basically replicate the example from the standard inference section. We first estimate a somewhat larger regression model.
reg_ex2 <- lm(lwage~exper+log(huswage)+age+educ,data=mydata)
reg_ex2_sm <- summary(reg_ex2)
and now we want to test whether the inclusion of the two extra variables age and educ is statistically significant. In the standard inference section we learned that one way to do that is by means of the following command
> waldtest(reg_ex2, .~. - age - educ)
which delivered
Wald test
Model 1: lwage ~ exper + log(huswage) + age + educ
Model 2: lwage ~ exper + log(huswage)
  Res.Df Df      F    Pr(>F)
1    423
2    425 -2 24.741 6.895e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But this procedure assumed that the error terms were homoskedastic. If you want to allow for heteroskedastic error terms, you merely have to add another input to the waldtest function call.
> waldtest(reg_ex2, .~. - age - educ, vcov=vcovHC)
Wald test
Model 1: lwage ~ exper + log(huswage) + age + educ
Model 2: lwage ~ exper + log(huswage)
Res.Df Df F Pr(>F)
1 423
2 425 -2 28.854 1.791e-12 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The input vcov=vcovHC instructs R to use a robust version of the variance-covariance matrix.
As you can see, this produces slightly different results, although there is no change in the substantive conclusion: you should not omit these two variables, as the null hypothesis that both are irrelevant is soundly rejected.
If you prefer the lht function to perform F-tests, you can calculate robust F-tests by adding the argument white.adjust = TRUE to your function call.
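A minimal sketch of that lht route (lht is the linearHypothesis shorthand from the "car" package, which is loaded by AER; the restriction strings assume the variable names used above):

```r
# Robust F-test of H0: age = 0 and educ = 0, using a
# White-adjusted variance-covariance matrix
lht(reg_ex2, c("age = 0", "educ = 0"), white.adjust = TRUE)
```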