Regression Inference in R
Revision as of 12:57, 14 April 2015
Here we discuss how to perform standard inference in regression models.
Setup
We continue the example started in R_Regression#A first example, which is replicated here:
# This is my first R regression!
setwd("T:/ECLR/R/FirstSteps")   # This sets the working directory
mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory
# Now convert variables with "." to num with NA
mydata$wage <- as.numeric(as.character(mydata$wage))
mydata$lwage <- as.numeric(as.character(mydata$lwage))
Before we run our initial regression model we shall restrict the dataframe mydata to those data that do not have missing wage information, using the following subset command:
mydata <- subset(mydata, !is.na(wage))  # keep only rows with non-missing wage
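The conversion and filtering steps can be illustrated on a tiny made-up data frame. This is only a sketch: the name toy and its values are hypothetical and are not taken from mroz.csv. It shows that "." entries turn into NA after the numeric conversion, and that is.na() then identifies the rows to drop.

```r
# Hypothetical toy data: "." marks a missing wage, as in the raw file
toy <- data.frame(wage = c("3.35", ".", "1.39"), stringsAsFactors = FALSE)

# Convert to numeric; "." cannot be parsed and becomes NA (R warns about this)
toy$wage <- suppressWarnings(as.numeric(as.character(toy$wage)))

# Keep only the rows where wage is observed
toy_ok <- subset(toy, !is.na(wage))

print(toy)     # three rows, the second with NA
print(toy_ok)  # the two rows with observed wages
```

Because wage is numeric after the conversion, testing with is.na() states the intent directly, whereas a comparison with the string "NA" relies on subset() silently dropping rows whose condition evaluates to NA.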
Now we can run our initial regression:
# Run a regression
reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
reg_ex1_sm <- summary(reg_ex1)
We will introduce inference in this model.
t-tests
We use t-tests to test simple restrictions on individual regression coefficients.
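As a minimal sketch of how such a test can be read off or computed by hand, the following example fits the same model specification on simulated data. The data frame sim, its values and the null value 0.02 are assumptions made for illustration only; they are not the mroz.csv data. The coefficient table returned by coef(summary(...)) already contains the t-statistic and p-value for the null hypothesis that a coefficient is zero; for any other null value the statistic is (estimate - null value) / standard error.

```r
# Simulated stand-in data (hypothetical, NOT the mroz.csv values)
set.seed(1)
n <- 200
sim <- data.frame(exper = runif(n, 0, 30), huswage = runif(n, 2, 20))
sim$lwage <- 0.5 + 0.02 * sim$exper + 0.1 * log(sim$huswage) + rnorm(n, sd = 0.5)

fit  <- lm(lwage ~ exper + log(huswage), data = sim)
ctab <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)

b  <- ctab["exper", "Estimate"]
se <- ctab["exper", "Std. Error"]

# H0: coefficient on exper equals 0 (reproduces the summary output)
t_stat <- b / se
p_val  <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

# H0: coefficient on exper equals 0.02 (an illustrative non-zero null)
t_stat_02 <- (b - 0.02) / se
p_val_02  <- 2 * pt(abs(t_stat_02), df = df.residual(fit), lower.tail = FALSE)

print(ctab)
print(c(t = t_stat, p = p_val))
print(c(t = t_stat_02, p = p_val_02))
```

With the actual regression, the same table should be available as reg_ex1_sm$coefficients.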
F-tests
F-tests are used to test several restrictions on regression coefficients jointly.
Let's say we are interested in whether two additional variables, age and educ, should be included in the model. As a good econometrics student, or even master, you know that to calculate an F-test you need the residual sum of squares from a restricted model (that is, model reg_ex1) and from an unrestricted model. The latter we estimate here:
reg_ex2 <- lm(lwage~exper+log(huswage)+age+educ,data=mydata)
reg_ex2_sm <- summary(reg_ex2)
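For reference, the F-statistic built from the two residual sums of squares is F = ((RSS_R - RSS_U)/q) / (RSS_U/df_U), where q is the number of restrictions and df_U is the residual degrees of freedom of the unrestricted model. The sketch below computes it by hand on simulated data; the data frame sim and the model names restricted and unrestricted are hypothetical stand-ins, not the mroz.csv models.

```r
# Simulated stand-in data (hypothetical, NOT the mroz.csv values)
set.seed(2)
n <- 300
sim <- data.frame(exper   = runif(n, 0, 30),
                  huswage = runif(n, 2, 20),
                  age     = runif(n, 30, 60),
                  educ    = sample(8:17, n, replace = TRUE))
sim$lwage <- 0.2 + 0.02 * sim$exper + 0.1 * log(sim$huswage) +
  0.08 * sim$educ + rnorm(n, sd = 0.5)

restricted   <- lm(lwage ~ exper + log(huswage), data = sim)
unrestricted <- lm(lwage ~ exper + log(huswage) + age + educ, data = sim)

# Residual sums of squares of both models
rss_r <- sum(resid(restricted)^2)
rss_u <- sum(resid(unrestricted)^2)

# Number of restrictions and residual degrees of freedom (unrestricted model)
q    <- df.residual(restricted) - df.residual(unrestricted)
df_u <- df.residual(unrestricted)

# F-statistic and its p-value from the F(q, df_u) distribution
f_stat <- ((rss_r - rss_u) / q) / (rss_u / df_u)
p_val  <- pf(f_stat, q, df_u, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The hand-computed values should agree with the second row of anova(restricted, unrestricted).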
Calculating the F-test is now very easy. We use the function anova:
print(anova(reg_ex1,reg_ex2))
which delivers the following output:
Analysis of Variance Table

Model 1: lwage ~ exper + log(huswage)
Model 2: lwage ~ exper + log(huswage) + age + educ
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1    425 210.11
2    423 188.10  2    22.004 24.741 6.895e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
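The table at the heart of this output delivers the residual sum of squares of each model, the F-test statistic and its p-value. The p-value is far below any conventional significance level, so we reject the null hypothesis that the coefficients on age and educ are both zero.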
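The entries of the table are linked by simple arithmetic, which can be checked with the printed (rounded) numbers. The following is a small sketch; the variable names are chosen here for illustration, and because the inputs are rounded the results only match the table up to rounding.

```r
# Numbers copied from the printed anova table (rounded as displayed)
rss_u  <- 188.10   # RSS of Model 2 (unrestricted)
sum_sq <- 22.004   # "Sum of Sq": reduction in RSS from Model 1 to Model 2
q      <- 2        # "Df": number of restrictions tested
df_u   <- 423      # "Res.Df" of Model 2

# F = (reduction in RSS per restriction) / (unrestricted RSS per degree of freedom)
f_stat <- (sum_sq / q) / (rss_u / df_u)

# p-value: upper tail of the F(q, df_u) distribution
p_val <- pf(f_stat, q, df_u, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_val)
```

The computed statistic is approximately 24.74 and the p-value is of the order 7e-11, in line with the F and Pr(>F) columns.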