R Asymptotics
OLS Estimator asymptotic distribution
Here we will demonstrate that OLS estimators (assuming the Gauss-Markov assumptions hold) are random variables that are approximately normally distributed, with a variance that shrinks as the sample size grows.
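More precisely, under these assumptions the OLS slope estimator is unbiased and, in large samples, approximately distributed as [math]\hat{\beta}_1 \sim N\left(\beta_1, Var(\hat{\beta}_1)\right)[/math], where [math]Var(\hat{\beta}_1)[/math] is (roughly) proportional to [math]1/n[/math] and therefore shrinks towards zero as the sample size [math]n[/math] grows. This is exactly what the simulation below illustrates.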
We will first generate a large population dataset (Npop <- 100000) with [math](y_i,x_i)[/math] pairs of observations. The true relationship between [math]x_i[/math] and [math]y_i[/math] is
[math]y_i = 0.5 + 1.5 x_i + u_i[/math]
Then we will randomly draw Q samples (of sample size Nsamp1) from that population and estimate the regression coefficients by OLS. Finally we will investigate the properties of the resulting distribution of the estimated [math]\hat{\beta}[/math].
Of course you should note that this is a rather artificial situation. In practice you will have one sample only and you will not know what the true coefficients are.
Generate the population
Npop <- 100000                        # population size
u <- rnorm(Npop)                      # true error terms
beta <- matrix(c(0.5, 1.5),2,1)       # true parameters
Xpop <- matrix(1,Npop,2)              # initialise population X
Xpop[,2] <- floor(runif(Npop)*12+8)   # explanatory variable, uniform r.v. in [8,19]
Ypop <- Xpop %*% beta + u             # %*% is matrix multiplication
Now we have Ypop and Xpop, which contain the population data.
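As a quick sanity check you could run the regression on the entire population; the estimates should then be almost identical to the true values of 0.5 and 1.5 (the object name reg_pop below is just a suggestion):

reg_pop <- lm(Ypop[,1] ~ Xpop[,2])  # regression using the full population
coef(reg_pop)                       # should be very close to 0.5 and 1.5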
Prepare the sampling
We set up the number of samples to draw and the (three different) sample sizes
Nsamp1 <- 1000     # sample size 1
Nsamp2 <- 10000    # sample size 2
Nsamp3 <- 100000   # sample size 3
Q <- 600           # number of samples taken
Now we prepare a [math](Q \times 3)[/math] matrix into which we will save the estimated slope coefficients (initially it will be filled with zeros). We have 3 columns for the three different sample sizes.
save_beta1 <- matrix(0,Q,3)
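Optionally, you could give the three columns descriptive names so that later output is easier to read; this step is purely cosmetic and not needed for the code below:

colnames(save_beta1) <- c("Nsamp1","Nsamp2","Nsamp3")  # optional column labels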
Now draw the samples
Now we repeat the sampling and estimation Q times: draw a sample, run the regression, and save the estimated slope coefficient. We use a for loop for this.
for (i in 1:Q) {
  sel1 <- ceiling(runif(Nsamp1)*Npop)          # select Nsamp1 observations
  Ysel <- Ypop[sel1,]
  Xsel <- Xpop[sel1,]
  reg_sel <- lm(Ysel~Xsel[,2])                 # estimate regression in sample
  save_beta1[i,1] <- reg_sel$coefficients[2]   # save estimated beta_1

  sel2 <- ceiling(runif(Nsamp2)*Npop)          # select Nsamp2 observations
  Ysel <- Ypop[sel2,]
  Xsel <- Xpop[sel2,]
  reg_sel <- lm(Ysel~Xsel[,2])                 # estimate regression in sample
  save_beta1[i,2] <- reg_sel$coefficients[2]   # save estimated beta_1

  sel3 <- ceiling(runif(Nsamp3)*Npop)          # select Nsamp3 observations
  Ysel <- Ypop[sel3,]
  Xsel <- Xpop[sel3,]
  reg_sel <- lm(Ysel~Xsel[,2])                 # estimate regression in sample
  save_beta1[i,3] <- reg_sel$coefficients[2]   # save estimated beta_1
}
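A brief note on the sampling step: ceiling(runif(Nsamp1)*Npop) draws Nsamp1 random observation numbers between 1 and Npop (with replacement). An equivalent and perhaps more readable alternative, sketched here for the first sample size only, is R's built-in sample() function:

sel1 <- sample(Npop, Nsamp1, replace = TRUE)  # Nsamp1 random indices from 1 to Npop
Ysel <- Ypop[sel1,]                           # corresponding y observations
Xsel <- Xpop[sel1,]                           # corresponding rows of X
reg_sel <- lm(Ysel~Xsel[,2])                  # same regression as before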
Analyse the distribution for estimated coefficients
Now we have filled the matrix save_beta1 with estimated coefficients for three different sample sizes. Let's analyse them. We will plot kernel density estimates (think smoothed histograms) for the three distributions (one for each sample size). There is really no need for you to try and understand the details below, just look at the picture.
# Kernel density estimates (smooth histograms)
d1 <- density(save_beta1[,1])
d2 <- density(save_beta1[,2])
d3 <- density(save_beta1[,3])

# Plots
colors <- rainbow(3)
plot(range(c(1.35,1.65)), range(c(0,max(d3$y)+0.05)), type="n", xlab="beta", ylab="Density")
lines(d1$x,d1$y, col = colors[1], lwd=3)
lines(d2$x,d2$y, col = colors[2], lwd=3)
lines(d3$x,d3$y, col = colors[3], lwd=3)
title(main = "Distribution of estimated Parameters")
legend("topleft", inset=.0, title="Sample Size",
       c("Nsamp1","Nsamp2","Nsamp3"), fill=rainbow(3), bty = "n")
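You should see that all three densities are centred around the true value of 1.5 and that the distribution becomes tighter as the sample size increases. You can confirm this numerically by looking at the mean and the standard deviation of the saved estimates in each column:

apply(save_beta1, 2, mean)  # all three means should be close to 1.5
apply(save_beta1, 2, sd)    # the standard deviation shrinks as the sample size grows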