# R Asymptotics

# OLS Estimator asymptotic distribution

Here we will demonstrate that OLS estimators (assuming the Gauss-Markov assumptions hold) are random variables that are normally distributed, with a variance that shrinks as the sample size grows.

We will first generate a large population dataset (`Npop <- 100000`) with [math](y_i,x_i)[/math] pairs of observations. The true relationship between [math]x_i[/math] and [math]y_i[/math] is

[math]y_i = 0.5 + 1.5 x_i + u_i[/math]

Then we will randomly draw `Q` samples (each of sample size `Nsamp1`) from that population and estimate the regression coefficients by OLS. Finally, we will investigate the properties of the resulting distribution of the estimated [math]\hat{\beta}[/math].

Of course, you should note that this is a rather artificial situation: in practice you will have only one sample and you will not know the true coefficients.

# Generate the population

```r
Npop <- 100000                            # population size
u    <- rnorm(Npop)                       # true error terms
beta <- matrix(c(0.5, 1.5), 2, 1)         # true parameters
Xpop <- matrix(1, Npop, 2)                # initialise population X
Xpop[, 2] <- floor(runif(Npop) * 12 + 8)  # exp. var.: uniform r.v. in [8,19]
Ypop <- Xpop %*% beta + u                 # %*% is matrix multiplication
```
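As a quick sanity check (a suggested addition, not part of the original exercise), we can run the regression on the entire population: with 100,000 observations the estimated coefficients should be very close to the true values of 0.5 and 1.5. The object name `pop_reg` is ours.

```r
# Standalone sanity check: regenerate the population and regress Y on X.
# With the full population, the OLS estimates should almost coincide
# with the true parameters (0.5, 1.5).
set.seed(1)
Npop <- 100000
u    <- rnorm(Npop)
beta <- matrix(c(0.5, 1.5), 2, 1)
Xpop <- matrix(1, Npop, 2)
Xpop[, 2] <- floor(runif(Npop) * 12 + 8)
Ypop <- Xpop %*% beta + u

pop_reg <- lm(Ypop ~ Xpop[, 2])  # regression on the whole population
coef(pop_reg)                    # intercept near 0.5, slope near 1.5
```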

Now we have a large vector `Ypop` and a large matrix `Xpop`, which contain the population data.

# Prepare the sampling

We set up the number of samples to draw and the (three different) sample sizes:

```r
Nsamp1 <- 1000    # sample size 1
Nsamp2 <- 10000   # sample size 2
Nsamp3 <- 100000  # sample size 3
Q      <- 600     # number of samples taken
```

Now we prepare a [math](Q \times 3)[/math] matrix into which we will save the estimated slope coefficients (initially it will be filled with zeros). We have 3 columns for the three different sample sizes.

```r
save_beta1 <- matrix(0, Q, 3)  # one column per sample size
```

# Now draw the samples

Now we repeat the sampling and estimation `Q` times. We use a `for` loop for this.

```r
for (i in 1:Q) {
  sel1 <- ceiling(runif(Nsamp1) * Npop)        # select Nsamp1 observations
  Ysel <- Ypop[sel1, ]
  Xsel <- Xpop[sel1, ]
  reg_sel <- lm(Ysel ~ Xsel[, 2])              # estimate regression in sample
  save_beta1[i, 1] <- reg_sel$coefficients[2]  # save estimated beta_1

  sel2 <- ceiling(runif(Nsamp2) * Npop)        # select Nsamp2 observations
  Ysel <- Ypop[sel2, ]
  Xsel <- Xpop[sel2, ]
  reg_sel <- lm(Ysel ~ Xsel[, 2])              # estimate regression in sample
  save_beta1[i, 2] <- reg_sel$coefficients[2]  # save estimated beta_1

  sel3 <- ceiling(runif(Nsamp3) * Npop)        # select Nsamp3 observations
  Ysel <- Ypop[sel3, ]
  Xsel <- Xpop[sel3, ]
  reg_sel <- lm(Ysel ~ Xsel[, 2])              # estimate regression in sample
  save_beta1[i, 3] <- reg_sel$coefficients[2]  # save estimated beta_1
}
```
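A side note on the index draw: `ceiling(runif(Nsamp1) * Npop)` selects `Nsamp1` indices uniformly from 1 to `Npop`, with replacement. An equivalent and arguably more explicit alternative (our suggestion, not part of the original code) is base R's `sample()` with `replace = TRUE`:

```r
# Equivalent way to draw sample indices (suggested alternative):
# sample() makes the "with replacement" choice explicit.
Npop   <- 100000
Nsamp1 <- 1000
sel1 <- sample(Npop, Nsamp1, replace = TRUE)  # Nsamp1 draws from 1..Npop
head(sel1)
```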

# Analyse the distribution for estimated coefficients

Now we have filled the matrix `save_beta1` with estimated coefficients for three different sample sizes. Let's analyse them. We will plot kernel density estimates (think smoothed histograms) for the three distributions (one for each sample size). There is really no need for you to try and understand the details below, just look at the picture.

```r
# Kernel density estimates (smooth histograms)
d1 <- density(save_beta1[, 1])
d2 <- density(save_beta1[, 2])
d3 <- density(save_beta1[, 3])

# Plots
colors <- rainbow(3)
plot(range(c(1.35, 1.65)), range(c(0, max(d3$y) + 0.05)), type = "n",
     xlab = "beta", ylab = "Density")
lines(d1$x, d1$y, col = colors[1], lwd = 3)
lines(d2$x, d2$y, col = colors[2], lwd = 3)
lines(d3$x, d3$y, col = colors[3], lwd = 3)
title(main = "Distribution of estimated Parameters")
legend("topleft", inset = .0, title = "Sample Size",
       c("Nsamp1", "Nsamp2", "Nsamp3"), fill = rainbow(3), bty = "n")
```
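Beyond the picture, we can quantify the asymptotic claim: the standard deviation of [math]\hat{\beta}_1[/math] should shrink at rate [math]1/\sqrt{n}[/math], so increasing the sample size tenfold should shrink it by about [math]\sqrt{10} \approx 3.16[/math]. Below is a standalone sketch that re-creates a smaller version of the experiment above (the helper `slope_sd` and the number of replications `Q = 200` are our choices, not from the original):

```r
# Check the root-n rate: sd of the slope estimates across repeated
# samples should fall by roughly sqrt(10) when n grows tenfold.
set.seed(1)
Npop <- 100000
x <- floor(runif(Npop) * 12 + 8)           # same DGP as above
y <- 0.5 + 1.5 * x + rnorm(Npop)

# Draw Q samples of size n, estimate the slope in each, return the sd
slope_sd <- function(n, Q = 200) {
  sd(replicate(Q, {
    sel <- sample(Npop, n, replace = TRUE)
    coef(lm(y[sel] ~ x[sel]))[2]
  }))
}

sd_small <- slope_sd(1000)
sd_large <- slope_sd(10000)
sd_small / sd_large   # should be close to sqrt(10), i.e. about 3.16
```

This makes the point of the density plot numerically: the spread of the sampling distribution of [math]\hat{\beta}_1[/math] collapses as the sample size grows.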