# R Unbiasedness

## Contents

# OLS Estimator unbiasedness

Here we will demonstrate that OLS estimators (assuming Gauss-Markov assumptions hold) are unbiased estimators.

We will first generate a large dataset (`Npop <- 100000`

) with [math](y_i,x_i)[/math] pairs of observations. The true relationship between the [math]x_i[/math] and the [math]Y_i[/math] is

[math]y_i = 0.5 + 1.5 x_i + u_i[/math]

Then we will randomly draw `Q`

samples (of sample size `Nsamp1`

) from that population and estimate the regresison coefficients by OLS. Finally we will investigate the properties of the resulting distribution for the estimated [math]\hat{\beta}[/math].

Of cours eyou should note that this is a rather artificial situation. In practice you will have one sample only and you will not know what the true coefficient is.

# Generate the population

Npop <- 100000 # Population Size u <- rnorm(Npop) # true error terms beta <- matrix(c(0.5, 1.5),2,1) # true parameters Xpop <- matrix(1,Npop,2) # initialise population X Xpop[,2] <- floor(runif(Npop)*12+8) # exp var uniform r.v. in [8,19] Ypop <- Xpop %*% beta + u # %*% is matrix multiplication

Now we have large vectors of `Ypop`

and `Xpop`

which contain the population data.

# Draw the samples

We set up the number of samples to draw and the sample size

Nsamp1 <- 100 # Sample Size Q <- 300 # number of samples taken

Now we prepare a [math](Q \times 1)[/math] matrix into which we will save the estimated slope coefficients (initially it will be filled with zeros).

save_beta1 <- matrix(0,Q,1)

# Now draw the samples

Now we are repeating something `Q`

times. We use a `for`

loop for this.

for (i in 1:Q ) { sel1 <- ceiling(runif(Nsamp1)*Npop) # select Nsamp1 observations Ysel <- Ypop[sel1,] # draws the Y values Xsel <- Xpop[sel1,] # draws the X values reg_sel <- lm(Ysel~Xsel[,2]) # estimate regression in sample save_beta1[i,1] <- reg_sel$coefficients[2] # save estimated beta_1 }

# Analyse the distribution for estimated coefficients

Now we have filled the vector `save_beta`

with estimated coefficients. Let's analyse them.

summary(save_beta1)

## V1 ## Min. :1.424 ## 1st Qu.:1.481 ## Median :1.498 ## Mean :1.498 ## 3rd Qu.:1.516 ## Max. :1.564

As you can see the estimated coefficients come in a considerable range (compare the max and min values). But on average the values are very close to the correct value of 1.5.

Let's look at a distribution, which is centered around 1.5.

hist(save_beta1,breaks = 15, col = "blue",plot = TRUE)