OLS Estimator unbiasedness

Here we demonstrate by simulation that the OLS estimators are unbiased, provided the Gauss-Markov assumptions hold.
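For reference, the standard argument behind this claim can be sketched in matrix notation. The symbols $X$, $y$, $u$ and $\beta$ are assumptions introduced here for the stacked regressors, outcomes, error terms and true coefficients; the key condition is $E[u \mid X] = 0$.

$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$

$$E[\hat{\beta} \mid X] = \beta + (X'X)^{-1}X'\,E[u \mid X] = \beta$$

The simulation that follows checks this result numerically rather than proving it.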

We will first generate a large population (Npop = 100000) of $(y_i,x_i)$ pairs of observations. The true relationship between the $x_i$ and the $y_i$ is

$y_i = 0.5 + 1.5 x_i + u_i$

Then we will randomly draw Q samples (each of sample size Nsamp1) from that population and estimate the regression coefficients by OLS in each sample. Finally we will investigate the properties of the resulting distribution of the estimated slope coefficients $\hat{\beta}$.

Of course you should note that this is a rather artificial situation. In practice you will have one sample only and you will not know what the true coefficient is.

Generate the population

Npop <- 100000                             # population size
u <- rnorm(Npop)                           # true error terms (standard normal)
beta <- matrix(c(0.5, 1.5), 2, 1)          # true parameters (intercept, slope)
Xpop <- matrix(1, Npop, 2)                 # initialise population X (first column: constant)
Xpop[, 2] <- floor(runif(Npop) * 12 + 8)   # explanatory variable: uniform on the integers 8 to 19
Ypop <- Xpop %*% beta + u                  # %*% is matrix multiplication

Now we have Ypop (a column of outcomes) and Xpop (a matrix with a column of ones and the explanatory variable), which together contain the population data.
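As a quick sanity check, one could regress Ypop on the explanatory variable using the whole population. The sketch below rebuilds the population so that it runs on its own. The seed and the names `y_vec` and `pop_fit` are assumptions made for this illustration and are not part of the original code.

```r
set.seed(123)  # assumed seed, only so the check is reproducible

# Rebuild the population exactly as above
Npop <- 100000
u <- rnorm(Npop)
beta <- matrix(c(0.5, 1.5), 2, 1)
Xpop <- matrix(1, Npop, 2)
Xpop[, 2] <- floor(runif(Npop) * 12 + 8)
Ypop <- Xpop %*% beta + u

# The explanatory variable should only take integer values from 8 to 19
range(Xpop[, 2])

# Regression on the full population; drop() turns the one-column
# matrix Ypop into a plain vector
y_vec <- drop(Ypop)
pop_fit <- lm(y_vec ~ Xpop[, 2])
coef(pop_fit)  # should be close to 0.5 and 1.5
```

With 100000 observations the estimates should sit very close to the true parameters, which confirms that the population was generated as intended.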

Draw the samples

We set the number of samples to draw and the sample size:

Nsamp1 <- 100    # Sample Size
Q <- 300  # number of samples taken

Now we prepare a $(Q \times 1)$ matrix into which we will save the estimated slope coefficients (initially it will be filled with zeros).

save_beta1 <- matrix(0,Q,1)

Now draw the samples

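We need to repeat the same steps Q times, so we use a for loop. In each iteration we draw Nsamp1 row indices at random (with replacement), select the corresponding observations, estimate the regression and save the estimated slope.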

for (i in 1:Q) {
  sel1 <- ceiling(runif(Nsamp1) * Npop)  # select Nsamp1 row indices (with replacement)
  Ysel <- Ypop[sel1, ]   # draws the Y values
  Xsel <- Xpop[sel1, ]   # draws the X values
  reg_sel <- lm(Ysel ~ Xsel[, 2])   # estimate regression in sample
  save_beta1[i, 1] <- reg_sel$coefficients[2]  # save estimated beta_1 (slope)
}
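The same experiment can be written more compactly with `sample.int()` and `replicate()`. This is an alternative sketch, not the loop used above. The seed, the helper `draw_slope()` and the names `x_pop`, `y_pop` and `slopes` are hypothetical and introduced only for illustration.

```r
set.seed(42)  # assumed seed, only for reproducibility

# Population generated from the same model as above
Npop <- 100000
x_pop <- floor(runif(Npop) * 12 + 8)
y_pop <- 0.5 + 1.5 * x_pop + rnorm(Npop)

Nsamp1 <- 100  # sample size
Q <- 300       # number of samples

# Hypothetical helper: draw one sample (with replacement) and
# return the estimated slope
draw_slope <- function() {
  sel <- sample.int(Npop, Nsamp1, replace = TRUE)
  coef(lm(y_pop[sel] ~ x_pop[sel]))[[2]]
}

# Repeat the draw Q times; the result is a numeric vector of length Q
slopes <- replicate(Q, draw_slope())
mean(slopes)  # should be close to the true slope of 1.5
```

Here `sample.int(Npop, Nsamp1, replace = TRUE)` plays the role of `ceiling(runif(Nsamp1) * Npop)`, and `replicate()` replaces both the pre-allocated matrix and the explicit loop.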

Analyse the distribution for estimated coefficients

Now we have filled the matrix save_beta1 with the estimated slope coefficients. Let's analyse them.

summary(save_beta1)
##        V1
##  Min.   :1.424
##  1st Qu.:1.481
##  Median :1.498
##  Mean   :1.498
##  3rd Qu.:1.516
##  Max.   :1.564

As you can see, the estimated coefficients vary considerably from sample to sample (compare the minimum of 1.424 with the maximum of 1.564). But on average the estimates are very close to the true value of 1.5: both the mean and the median are 1.498.
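The size of this spread can be compared with a rough theoretical benchmark. The calculation below is an approximation under stated assumptions: the error variance is $\sigma^2 = 1$ (the errors come from rnorm), the explanatory variable is uniform on the 12 integers from 8 to 19 and so has variance $(12^2 - 1)/12 = 143/12 \approx 11.92$, and the sample sum of squares is replaced by Nsamp1 times that variance.

$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i}(x_i - \bar{x})^2} \approx \frac{1}{100 \times 143/12} \approx 0.00084, \qquad \operatorname{sd}(\hat{\beta}_1) \approx 0.029$$

On this benchmark the observed minimum and maximum lie roughly 2.6 and 2.2 standard deviations from 1.5, which is about what one would expect from 300 draws.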

Let's look at the distribution of the estimated coefficients, which is centred around 1.5.

hist(save_beta1, breaks = 15, col = "blue", plot = TRUE)
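To make the centring easier to see, one could mark the true slope and the average estimate on the histogram. The sketch below runs on its own, so it regenerates a stand-in for save_beta1 by drawing each sample directly from the model rather than from the stored population. That shortcut, the seed, and the plot title and axis label are assumptions made for this illustration.

```r
set.seed(1)  # assumed seed, only for reproducibility

Nsamp1 <- 100  # sample size
Q <- 300       # number of samples

# Stand-in for the estimates from the loop: Q slope estimates, each
# from a fresh sample of size Nsamp1 drawn directly from the model
save_beta1 <- replicate(Q, {
  x <- floor(runif(Nsamp1) * 12 + 8)
  y <- 0.5 + 1.5 * x + rnorm(Nsamp1)
  coef(lm(y ~ x))[[2]]
})

hist(save_beta1, breaks = 15, col = "blue",
     main = "Estimated slope coefficients", xlab = "Estimated slope")
abline(v = 1.5, col = "red", lwd = 2)                 # true slope
abline(v = mean(save_beta1), col = "black", lty = 2)  # average estimate
```

If the estimator is unbiased, the two vertical lines should nearly coincide.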