R Asymptotics

From ECLR

OLS Estimator asymptotic distribution

Here we will demonstrate that OLS estimators (assuming the Gauss-Markov assumptions hold) are random variables that are approximately normally distributed in large samples, with a variance that shrinks as the sample size grows.

We will first generate a large population (Npop <- 100000) of [math](y_i,x_i)[/math] pairs of observations. The true relationship between the [math]x_i[/math] and the [math]y_i[/math] is

[math]y_i = 0.5 + 1.5 x_i + u_i[/math]

Then we will randomly draw Q samples from that population for each of three sample sizes (Nsamp1, Nsamp2 and Nsamp3) and estimate the regression coefficients by OLS in every sample. Finally we will investigate the properties of the resulting distribution of the estimated slope coefficient [math]\hat{\beta}[/math].

Of course you should note that this is a rather artificial situation. In practice you will have only one sample and you will not know what the true coefficients are.
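
As a rough guide to what the simulation should show, the textbook variance of the OLS slope under homoskedastic errors can be written out for this design. This is a sketch added for orientation; the symbols [math]\hat{\beta}_1[/math] (estimated slope), [math]\sigma^2[/math] (error variance) and [math]n[/math] (sample size) are notation assumed here.

[math]Var(\hat{\beta}_1 | x) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \approx \frac{\sigma^2}{n \, Var(x)}[/math]

In the setup used here the errors are standard normal, so [math]\sigma^2 = 1[/math], and [math]x_i[/math] takes the integer values 8 to 19 with equal probability, so [math]Var(x) = (12^2 - 1)/12 \approx 11.92[/math]. The standard deviation of the slope estimate should therefore be roughly [math]1/\sqrt{11.92 \, n}[/math], which falls by a factor of about [math]\sqrt{10} \approx 3.16[/math] each time the sample size increases tenfold.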

Generate the population

Npop <- 100000  # Population Size
u <- rnorm(Npop)      # true error terms
beta <- matrix(c(0.5, 1.5),2,1)   # true parameters
Xpop <- matrix(1,Npop,2) # initialise population X
Xpop[,2] <- floor(runif(Npop)*12+8) # explanatory variable: integers 8 to 19, equally likely
Ypop <- Xpop %*% beta + u    # %*% is matrix multiplication

Now we have a large vector Ypop and a large matrix Xpop which contain the population data.
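
A quick sanity check on the population can be reassuring before sampling from it. The sketch below regenerates the population under the same settings and confirms the range of x and the population-level fit. The seed value and the names x_values, y_vec and pop_fit are illustrative choices, not part of the original code.

```r
set.seed(123)  # assumed seed, only to make this sketch reproducible

# Rebuild the population with the same settings as in the text
Npop <- 100000
u <- rnorm(Npop)
beta <- matrix(c(0.5, 1.5), 2, 1)
Xpop <- matrix(1, Npop, 2)
Xpop[, 2] <- floor(runif(Npop) * 12 + 8)
Ypop <- Xpop %*% beta + u

# x should take only the integer values 8, 9, ..., 19
x_values <- sort(unique(Xpop[, 2]))
print(x_values)

# Regressing on the whole population should give coefficients
# close to the true values 0.5 (intercept) and 1.5 (slope)
y_vec <- drop(Ypop)  # turn the one-column matrix into a plain vector
pop_fit <- lm(y_vec ~ Xpop[, 2])
print(coef(pop_fit))
```

The population coefficients will not be exactly 0.5 and 1.5, because the population itself is one (very large) random draw from the model.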

Prepare the sampling

We set up the number of samples to draw and the three different sample sizes.

Nsamp1 <- 1000    # smallest sample size
Nsamp2 <- 10000    # medium sample size
Nsamp3 <- 100000    # largest sample size (equal to Npop; rows are drawn with replacement)
Q <- 600  # number of samples taken for each sample size

Now we prepare a [math](Q \times 3)[/math] matrix into which we will save the estimated slope coefficients (initially it will be filled with zeros). We have 3 columns for the three different sample sizes.

save_beta1 <- matrix(0,Q,3)

Now draw the samples

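We now repeat the same steps Q times, using a for loop. In each pass we draw a random set of row indices for each sample size (ceiling(runif(n)*Npop) picks n integers between 1 and Npop, with replacement), estimate the regression on the selected rows and store the estimated slope.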

for (i in 1:Q ) {
     sel1 <- ceiling(runif(Nsamp1)*Npop)  # select Nsamp1 observations
     Ysel <- Ypop[sel1,]    
     Xsel <- Xpop[sel1,]
     reg_sel <- lm(Ysel~Xsel[,2])   # estimate regression in sample
     save_beta1[i,1] <- reg_sel$coefficients[2]  # save estimated beta_1
     
     sel2 <- ceiling(runif(Nsamp2)*Npop)    # select Nsamp2 observations
     Ysel <- Ypop[sel2,]
     Xsel <- Xpop[sel2,]
     reg_sel <- lm(Ysel~Xsel[,2])   # estimate regression in sample
     save_beta1[i,2] <- reg_sel$coefficients[2]  # save estimated beta_1
 
     sel3 <- ceiling(runif(Nsamp3)*Npop)    # select Nsamp3 observations
     Ysel <- Ypop[sel3,]
     Xsel <- Xpop[sel3,]
     reg_sel <- lm(Ysel~Xsel[,2])   # estimate regression in sample
     save_beta1[i,3] <- reg_sel$coefficients[2]  # save estimated beta_1
   }
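
The three blocks inside the loop differ only in the sample size, so the same experiment could be written more compactly with a small helper. The following is a sketch under stated assumptions rather than the original code: draw_slope is a hypothetical helper name, sample.int(..., replace = TRUE) is swapped in for ceiling(runif(n)*Npop) (both draw row indices with replacement), and the seed, the smaller sample sizes and the smaller Q are chosen here so that the sketch runs quickly.

```r
set.seed(42)  # assumed seed for reproducibility

# Population generated from the same model as in the text
Npop <- 100000
x_pop <- floor(runif(Npop) * 12 + 8)
y_pop <- 0.5 + 1.5 * x_pop + rnorm(Npop)

# Hypothetical helper: draw n rows with replacement, return the OLS slope
draw_slope <- function(n) {
  sel <- sample.int(Npop, n, replace = TRUE)
  y_sel <- y_pop[sel]
  x_sel <- x_pop[sel]
  unname(coef(lm(y_sel ~ x_sel))[2])
}

sizes <- c(100, 1000, 10000)  # smaller than Nsamp1 to Nsamp3, for speed
Q <- 100                      # fewer replications than in the text

# One column per sample size, one row per replication
save_beta1 <- sapply(sizes, function(n) replicate(Q, draw_slope(n)))
print(dim(save_beta1))
```

Adding a further sample size then only requires extending the sizes vector, at the cost of hiding the individual steps that the explicit loop makes visible.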

Analyse the distribution for estimated coefficients

Now we have filled the matrix save_beta1 with estimated slope coefficients for three different sample sizes. Let's analyse them. We will plot kernel density estimates (think smoothed histograms) for the three distributions (one for each sample size). There is really no need for you to try and understand the details below, just look at the picture.

 # Kernel Density Estimates (smooth histograms)
   d1 <- density(save_beta1[,1])
   d2 <- density(save_beta1[,2])
   d3 <- density(save_beta1[,3])
   # Plots
   colors <- rainbow(3) 
   plot(range(c(1.35,1.65)), 
        range(c(0,max(d3$y)+0.05)), type="n", xlab="beta", ylab="Density")

   lines(d1$x,d1$y, col = colors[1], lwd=3)
   lines(d2$x,d2$y, col = colors[2], lwd=3)
   lines(d3$x,d3$y, col = colors[3], lwd=3)
   title(main = "Distribution of estimated Parameters")
   legend("topleft", inset=.0, title="Sample Size",
          c("Nsamp1","Nsamp2","Nsamp3"), fill=rainbow(3), bty = "n")

Figure (OLSasympdist kernel.jpeg): kernel density estimates of the estimated slope coefficient for the three sample sizes.
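
The picture can be backed up with numbers. The sketch below is a self-contained, scaled-down version of the experiment (smaller sample sizes, fewer replications and an arbitrary seed, all chosen here for speed). It reports the mean and standard deviation of the slope estimates for each sample size next to the rough theoretical value [math]1/\sqrt{n \, Var(x)}[/math] implied by a unit error variance. The names sim_slopes, sim_sd, theory_sd and summary_table are illustrative.

```r
set.seed(2024)  # arbitrary seed, only for reproducibility

# Population generated from the same model as in the text
Npop <- 100000
x_pop <- floor(runif(Npop) * 12 + 8)
y_pop <- 0.5 + 1.5 * x_pop + rnorm(Npop)

sizes <- c(100, 1000, 10000)  # scaled-down sample sizes (assumption)
Q <- 200                      # scaled-down number of replications

# Rows: replications, columns: sample sizes
sim_slopes <- matrix(0, Q, length(sizes))
for (j in seq_along(sizes)) {
  for (i in 1:Q) {
    sel <- ceiling(runif(sizes[j]) * Npop)  # row indices, with replacement
    sim_slopes[i, j] <- coef(lm(y_pop[sel] ~ x_pop[sel]))[2]
  }
}

# Simulated spread versus the approximation sigma / sqrt(n * Var(x)), sigma = 1
sim_sd <- apply(sim_slopes, 2, sd)
theory_sd <- 1 / sqrt(sizes * var(x_pop))

summary_table <- data.frame(n = sizes,
                            mean_slope = colMeans(sim_slopes),
                            sim_sd = sim_sd,
                            theory_sd = theory_sd)
print(summary_table)
```

If the asymptotic argument holds, the means should stay close to 1.5 while the standard deviations shrink by roughly a factor of [math]\sqrt{10}[/math] from one row to the next.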