Regression Inference in R
Here we discuss how to perform standard inference in regression models.
Setup
We continue the example we started in the section "A first example" of R_Regression, which is replicated here:
# This is my first R regression!
setwd("T:/ECLR/R/FirstSteps")   # This sets the working directory
mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory

# Now convert variables with "." to num with NA
mydata$wage  <- as.numeric(as.character(mydata$wage))
mydata$lwage <- as.numeric(as.character(mydata$lwage))
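Before R 4.0.0, read.csv() turned text columns into factors by default, which is why the conversion goes through as.character() before as.numeric(). A possible alternative is to tell read.csv() that "." marks a missing value through its na.strings argument, so the affected columns are read as numeric straight away. The sketch below assumes made-up rows in place of mroz.csv (the inline text and the data frame d are illustrative only); with the real file the call would be read.csv("mroz.csv", na.strings = ".").

```r
# Made-up rows in the layout described above; "." marks a missing value
csv_text <- "wage,lwage,exper
3.35,1.21,14
.,.,5
1.39,0.33,5"

# na.strings = "." makes read.csv() store "." as NA, so wage and lwage
# are read directly as numeric columns (no as.numeric() step needed)
d <- read.csv(text = csv_text, na.strings = ".")
str(d)
```

Note that na.strings replaces the default value "NA", so a file that uses both codes would need na.strings = c(".", "NA").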
Before we run our initial regression model, we restrict the data frame mydata to the observations that do not have missing wage information, using the following subset command:
mydata <- subset(mydata, !is.na(wage)) # select non-NA data
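subset() keeps only the rows for which its condition is TRUE and drops rows where the condition evaluates to NA. The small sketch below, on a hypothetical three-row data frame d, shows that a test with is.na() removes the missing wage, and that a comparison such as wage != "NA" ends up dropping the same row, because comparing a missing value yields NA rather than FALSE.

```r
# Hypothetical data frame with one missing wage
d <- data.frame(wage  = c(3.354, NA, 1.3889),
                lwage = c(1.2102, NA, 0.3285))

# Explicit test for missing values: keeps rows 1 and 3
kept <- subset(d, !is.na(wage))

# The comparison is NA (not FALSE) for the missing wage ...
cond <- d$wage != "NA"   # TRUE NA TRUE
# ... and subset() treats NA as "drop", so the result is the same
kept2 <- subset(d, wage != "NA")

print(kept)
```

The is.na() version is usually preferred because it states the intent directly instead of relying on how NA comparisons are handled.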
Now we can run our initial regression:
# Run a regression
reg_ex1 <- lm(lwage ~ exper + log(huswage), data = mydata)
reg_ex1_sm <- summary(reg_ex1)
We now use this model to introduce inference.
t-tests
We use t-tests to test a single restriction on a regression coefficient, for example the null hypothesis that the coefficient equals zero.
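For every coefficient, summary() already reports the t-statistic for the null hypothesis that the coefficient is zero, together with its two-sided p-value; for the model above these sit in the coefficient table of reg_ex1_sm (columns Estimate, Std. Error, t value and Pr(>|t|)), e.g. coef(reg_ex1_sm)["exper", ]. The sketch below shows how to pull out that table and how to test a null value other than zero by hand. So that it runs without mroz.csv, it uses simulated data with hypothetical variables y, x1 and x2, and the null value 0.5 is an arbitrary choice for illustration.

```r
# Hypothetical simulated data standing in for mroz.csv
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)

fit    <- lm(y ~ x1 + x2)
fit_sm <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_tab <- coef(fit_sm)
print(coef_tab)

# t-test reported by summary() for H0: beta_x1 = 0
t_zero <- coef_tab["x1", "t value"]

# Manual t-test of a non-zero null, H0: beta_x1 = 0.5
b0     <- 0.5
b      <- coef_tab["x1", "Estimate"]
se     <- coef_tab["x1", "Std. Error"]
t_stat <- (b - b0) / se
df_res <- df.residual(fit)   # n - k - 1 = 200 - 2 - 1 = 197
p_val  <- 2 * pt(abs(t_stat), df = df_res, lower.tail = FALSE)
cat("t =", round(t_stat, 3), " p =", round(p_val, 3), "\n")
```

The same two lines for b and se, applied to coef(reg_ex1_sm), would test a non-zero hypothesis in the regression above.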
F-tests
F-tests are used to test several restrictions on the regression coefficients jointly.
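For reference, the usual textbook form of the F-statistic that compares a restricted with an unrestricted model is given below. The notation is ours: RSS_r and RSS_u are the residual sums of squares of the restricted and unrestricted models, q is the number of restrictions, n the number of observations and k the number of slope coefficients in the unrestricted model (which also contains an intercept).

```latex
\[
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/(n-k-1)}
\;\sim\; F_{q,\,n-k-1} \quad \text{under } H_0
\]
```

A large increase in the residual sum of squares when the restrictions are imposed produces a large F and counts against the null hypothesis.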
Let's say we are interested in whether two additional variables, age and educ, should be included in the model. As a good econometrics student, or even master, you know that to calculate an F-test you need the residual sums of squares from a restricted model (that is, model reg_ex1) and an unrestricted model. We estimate the latter here:
reg_ex2 <- lm(lwage ~ exper + log(huswage) + age + educ, data = mydata)
reg_ex2_sm <- summary(reg_ex2)
Calculating the F-test is now very easy. We use the function anova:
print(anova(reg_ex1,reg_ex2))
which delivers the following output:
Analysis of Variance Table

Model 1: lwage ~ exper + log(huswage)
Model 2: lwage ~ exper + log(huswage) + age + educ
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    425 210.11
2    423 188.10  2    22.004 24.741 6.895e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The table at the heart of this output delivers the residual sums of squares of both models (RSS), their difference (Sum of Sq), the F-test statistic and its p-value. The p-value is extremely small, so we reject the null hypothesis that the coefficients on age and educ are both zero and conclude that at least one of the two variables is significant. If you look at the individual t-tests in the regression output of reg_ex2 (reg_ex2_sm), you will see that it is the education variable.
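The F-statistic can also be reproduced by hand from the two fitted models: deviance() returns the residual sum of squares of an lm fit and df.residual() its residual degrees of freedom. With the numbers from the table, F = (22.004/2)/(188.10/423) ≈ 24.74, which is the reported value. The sketch below repeats this calculation and checks it against anova(); it uses simulated data with hypothetical regressors x1 to x4 (x3 and x4 play the role of the two added variables), not the mroz data.

```r
# Hypothetical simulated data standing in for mroz.csv
set.seed(42)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 + 0.3 * d$x4 + rnorm(n)

restricted   <- lm(y ~ x1 + x2, data = d)
unrestricted <- lm(y ~ x1 + x2 + x3 + x4, data = d)

rss_r   <- deviance(restricted)      # RSS of the restricted model
rss_u   <- deviance(unrestricted)    # RSS of the unrestricted model
n_restr <- df.residual(restricted) - df.residual(unrestricted)  # 2 restrictions
df_u    <- df.residual(unrestricted) # n - k - 1 of the unrestricted model

f_stat <- ((rss_r - rss_u) / n_restr) / (rss_u / df_u)
p_val  <- pf(f_stat, n_restr, df_u, lower.tail = FALSE)

# Compare with the F-test reported by anova()
av <- anova(restricted, unrestricted)
print(c(manual = f_stat, anova = av$F[2]))

# Individual t-tests of the two added regressors
print(coef(summary(unrestricted))[c("x3", "x4"), ])
```

Computing the statistic by hand is mainly a check on understanding; in practice anova() does the same work and also reports the degrees of freedom.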