Difference between revisions of "Regression Inference in R"

From ECLR

Revision as of 12:59, 14 April 2015

Here we discuss how to perform standard inference in regression models.

Setup

We continue the example started in R_Regression#A first example, which is replicated here:

    # This is my first R regression!
    setwd("T:/ECLR/R/FirstSteps")              # This sets the working directory
    mydata <- read.csv("mroz.csv")  # Opens mroz.csv from working directory
     
    # Convert variables that use "." for missing values to numeric ("." becomes NA)
    mydata$wage <- as.numeric(as.character(mydata$wage))
    mydata$lwage <- as.numeric(as.character(mydata$lwage))

Before we run our initial regression model we shall restrict the dataframe mydata to those data that do not have missing wage information, using the following subset command:

   mydata <- subset(mydata, !is.na(wage))  # keep only rows with non-missing wage

Now we can run our initial regression:

    # Run a regression
    reg_ex1 <- lm(lwage~exper+log(huswage),data=mydata)
    reg_ex1_sm <- summary(reg_ex1)

We will introduce inference in this model.

t-tests

We use t-tests to test simple restrictions on individual regression coefficients.
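
As a minimal sketch of how such a test can be read off a fitted model, the example below uses simulated stand-in data (the names sim, reg_sim, ctab and the tested value 0.5 are made up for illustration and are not part of the mroz example). It assumes only that the coefficient table of a summary has the usual columns "Estimate", "Std. Error", "t value" and "Pr(>|t|)". The same steps should carry over to reg_ex1 and reg_ex1_sm from the Setup section.

```r
# Simulated stand-in data (hypothetical; NOT the mroz data used in the text)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
sim <- data.frame(y = y, x = x)

reg_sim <- lm(y ~ x, data = sim)
reg_sim_sm <- summary(reg_sim)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
# The "t value" column tests H0: coefficient = 0
ctab <- coef(reg_sim_sm)
print(ctab)

# t-test of a different null, H0: coefficient on x = 0.5
b  <- ctab["x", "Estimate"]
se <- ctab["x", "Std. Error"]
t_stat <- (b - 0.5) / se
# Two-sided p-value from the t distribution with the residual degrees of freedom
p_val <- 2 * pt(abs(t_stat), df = reg_sim$df.residual, lower.tail = FALSE)
print(c(t = t_stat, p = p_val))
```

The table printed by the summary already covers the common case of testing whether a coefficient is zero; the manual calculation is only needed for other hypothesised values.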


F-tests

F-tests are used to test multiple restrictions on regression coefficients jointly.

Let's say we are interested in whether two additional variables, age and educ, should be included in the model. As a good econometrics student, or even master, you know that to calculate an F-test you need the residual sums of squares from a restricted model (that is, model reg_ex1) and from an unrestricted model. The latter we estimate here:

   reg_ex2 <- lm(lwage~exper+log(huswage)+age+educ,data=mydata)
   reg_ex2_sm <- summary(reg_ex2)

Calculating the F-test is now very easy. We use the function anova:

   print(anova(reg_ex1,reg_ex2))

which delivers the following output:

   Analysis of Variance Table
   Model 1: lwage ~ exper + log(huswage)
   Model 2: lwage ~ exper + log(huswage) + age + educ
     Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
   1    425 210.11                                  
   2    423 188.10  2    22.004 24.741 6.895e-11 ***
   ---
   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

The table at the heart of this output delivers the individual residual sums of squares (column RSS), the F-test statistic (column F) and its p-value (column Pr(>F)).
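
To see where the F statistic comes from, the following sketch recomputes it by hand from the numbers printed in the table, using the standard formula F = ((RSS_r - RSS_u)/q) / (RSS_u/df_u). The variable names (rss_r, rss_u, q, df_u) are chosen here for illustration. Because the printed RSS values are rounded, the result should only be close to the reported 24.741, not identical.

```r
# Values copied from the anova output above
rss_r <- 210.11   # RSS of the restricted model (Model 1)
rss_u <- 188.10   # RSS of the unrestricted model (Model 2)
q     <- 2        # number of restrictions (column Df)
df_u  <- 423      # residual degrees of freedom of the unrestricted model

# F statistic: scaled drop in RSS relative to the unrestricted residual variance
F_stat <- ((rss_r - rss_u) / q) / (rss_u / df_u)

# p-value: upper tail of the F distribution with (q, df_u) degrees of freedom
p_val <- pf(F_stat, df1 = q, df2 = df_u, lower.tail = FALSE)

print(c(F = F_stat, p = p_val))
```

The very small p-value indicates that the null hypothesis that the coefficients on age and educ are both zero would be rejected at conventional significance levels.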