IV
Introduction
In this section we will demonstrate how to estimate the parameters of a linear regression model by instrumental variables (IV). The material will follow the notation in the Heij et al. textbook[1]. The model of interest is
[math]\mathbf{y}=\mathbf{X\beta }+\mathbf{\varepsilon }[/math]
The issue is that we may suspect (or know) that the explanatory variables are correlated with the (unobserved) error term,
[math]p\lim \left( \frac{1}{n}\mathbf{X}^{\prime }\mathbf{\varepsilon }\right) \neq 0.[/math]
Reasons for such a situation include measurement error in [math]x[/math], endogenous explanatory variables (simultaneity), omitted relevant variables, or a combination of these. The consequence is that the OLS estimator of [math]\mathbf{\beta}[/math] is biased and inconsistent. Fortunately, it is well established that IV estimation of [math]\mathbf{\beta}[/math] can deliver consistent parameter estimates. This does, however, require the availability of sufficient instruments [math]\mathbf{Z}[/math].
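To see this inconsistency at work, consider the following minimal simulation sketch (all variable names and parameter values here are illustrative assumptions, not taken from the textbook): the regressor shares a common shock with the error term, and the OLS slope estimate settles around 2.5 rather than its true value of 2, even in a large sample.
rng(42);                          % illustrative seed for reproducibility
n    = 10000;                     % large sample to highlight inconsistency
u    = randn(n,1);                % common shock driving the endogeneity
x2   = 0.8*u + randn(n,1);        % regressor, correlated with the error term
err  = u + 0.5*randn(n,1);        % error term, also driven by u
y    = 1 + 2*x2 + err;            % true coefficients: constant 1, slope 2
x    = [ones(n,1) x2];            % regressor matrix including a constant
bols = (x'*x)\(x'*y);             % OLS estimate; slope approx. 2.5, not 2
disp(bols');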
Before continuing it is advisable to be clear about the dimensions of certain variables. Let’s assume that [math]\mathbf{y}[/math] is a [math](n \times 1)[/math] vector containing the [math]n[/math] observations for the dependent variable. [math]\mathbf{X}[/math] is a [math](n \times k)[/math] matrix with the [math]k[/math] explanatory variables in the columns, usually containing a vector of 1s in the first column, representing a regression constant. Now, let [math]\mathbf{Z}[/math] be a [math](n \times p)[/math] matrix with instruments. Importantly, [math]p \ge k[/math], and further [math]\mathbf{X}[/math] and [math]\mathbf{Z}[/math] may have columns in common. If so, these are explanatory variables from [math]\mathbf{X}[/math] that are judged to be certainly uncorrelated with the error term (like the constant).
It is well established that the instrumental variables in [math]\mathbf{Z}[/math] need to meet certain requirements in order to deliver useful IV estimators of [math]\mathbf{\beta}[/math]. They need to be uncorrelated with the error term, and they need to be correlated with the explanatory variables in [math]\mathbf{X}[/math] that are deemed to be endogenous (related to the error term). Further, they should have no relevance for the dependent variable other than through their relation to the potentially endogenous variables (the exclusion restriction).
IV estimator
The IV estimator of [math]\mathbf{\beta}[/math] is calculated as follows
[math]\mathbf{\widehat{\beta}}_{IV} = \left(\mathbf{X}'\mathbf{P}_Z \mathbf{X}\right)^{-1} \mathbf{X}'\mathbf{P}_Z \mathbf{y}[/math]
where [math]\mathbf{P}_Z = \mathbf{Z}\left(\mathbf{Z}^{\prime}\mathbf{Z}\right)^{-1}\mathbf{Z}^{\prime}[/math] is the projection matrix of [math]\mathbf{Z}[/math]. When performing inference, the variance-covariance matrix of [math]\mathbf{\widehat{\beta}}_{IV}[/math] is of obvious interest; it is calculated as follows
[math]Var\left(\mathbf{\widehat{\beta}}_{IV} \right) = \sigma ^{2}\left( \mathbf{X}^{\prime }\mathbf{P}_{Z}\mathbf{X}\right)^{-1}[/math]
where the estimate for the error variance comes from
[math]\begin{aligned} s_{IV}^{2} &= \frac{1}{n-k}\widehat{\mathbf{\varepsilon}}_{IV}^{\prime}\widehat{\mathbf{\varepsilon}}_{IV} \\ &= \frac{1}{n-k}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{\beta}}_{IV}\right)^{\prime}\left(\mathbf{y}-\mathbf{X}\widehat{\mathbf{\beta}}_{IV}\right)\end{aligned}[/math]
MATLAB implementation
The following code extract assumes that y contains the [math](n \times 1)[/math] vector of observations for the dependent variable, that the [math](n \times k)[/math] matrix x contains all explanatory variables, and that z is a [math](n \times p)[/math] matrix (with [math]p \ge k[/math]) of instruments.
[n,k] = size(x);        % sample size and number of explanatory variables
pz = z*inv(z'*z)*z';    % Projection matrix of Z
xpzxi = inv(x'*pz*x);   % this is also (Xhat'Xhat)^(-1)
biv = xpzxi*x'*pz*y;    % IV estimate
res = y - x*biv;        % IV residuals
ssq = res'*res/(n-k);   % Sample variance of the IV residuals
s = sqrt(ssq);          % Sample standard deviation of the IV residuals
bvcv = ssq*xpzxi;       % Variance-covariance matrix of the IV estimates
bse = sqrt(diag(bvcv)); % Extract diagonal and take square root -> standard errors for IV estimators
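As a usage sketch, the following lines (again, all names and parameter values are illustrative assumptions) simulate an endogenous regressor together with a valid instrument w and then apply the code extract above; the OLS slope is distorted upwards while the IV slope settles close to its true value of 2.
rng(99);                              % illustrative seed
n    = 10000;                         % illustrative sample size
u    = randn(n,1);                    % common shock
w    = randn(n,1);                    % outside instrument, unrelated to u
x2   = 0.8*u + 0.5*w + randn(n,1);    % endogenous regressor, driven by w
err  = u + 0.5*randn(n,1);            % error term, correlated with x2
y    = 1 + 2*x2 + err;                % true coefficients: constant 1, slope 2
x    = [ones(n,1) x2];                % explanatory variables incl. constant
z    = [ones(n,1) w];                 % instruments incl. the constant
bols = (x'*x)\(x'*y);                 % OLS slope is distorted upwards
pz   = z*inv(z'*z)*z';                % now apply the extract above ...
biv  = inv(x'*pz*x)*x'*pz*y;          % ... the IV slope is close to 2
disp([bols(2) biv(2)]);               % compare the two slope estimates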
One feature of IV estimation is that, in general, it is an inferior estimator of [math]\mathbf{\beta}[/math] if all explanatory variables are exogenous. In that case, assuming that all other Gauss-Markov assumptions are met, the OLS estimator is BLUE; in other words, IV estimators have larger standard errors for the coefficient estimates. One would therefore like to avoid relying on IV estimators unless, of course, they are the only estimators that deliver consistent estimates.
For this reason any application of IV should be accompanied by evidence that establishes that it was necessary. Once that is established, one should also demonstrate that the chosen instruments meet the necessary requirements (being correlated with the endogenous variables and exogenous to the regression error term).
Testing exogeneity
The null hypothesis to be tested here is
[math]p\lim \left( \frac{1}{n}\mathbf{X}^{\prime }\mathbf{\varepsilon }\right) = 0[/math]
against the alternative that this probability limit differs from zero; the test therefore establishes whether an IV estimation is required or not. The procedure described here follows Heij et al. and consists of the following three steps.
Step 1: Estimate [math]\mathbf{y}=\mathbf{X\beta }+\mathbf{\varepsilon}[/math] by OLS and save the residuals [math]\widehat{\mathbf{\varepsilon}}[/math].
Step 2: Estimate
[math]\mathbf{x}_{j}=\mathbf{Z\gamma }_{j}\mathbf{+v}_{j}[/math]
by OLS for each of the [math]\widetilde{k}[/math] elements of [math]\mathbf{X}[/math] that are possibly endogenous and save the residuals [math]\widehat{\mathbf{v}}_{j}[/math]. Collect these in the [math]\left( n\times \widetilde{k}\right)[/math] matrix [math]\widehat{\mathbf{V}}[/math].
Step 3: Estimate the auxiliary regression
[math]\widehat{\mathbf{\varepsilon}}=\mathbf{X\delta}_{0}+\widehat{\mathbf{V}}\mathbf{\delta}_{1}+\mathbf{u}[/math]
and test the following hypothesis
[math]\begin{aligned} H_{0} &: \mathbf{\delta}_{1}=0~~\mathbf{X}\text{ is exogenous} \\ H_{A} &: \mathbf{\delta}_{1}\neq 0~~\mathbf{X}\text{ is endogenous} \end{aligned}[/math]
using the usual test statistic [math]\chi ^{2}=nR^{2}[/math] which, under [math]H_{0}[/math], is [math] \chi ^{2}\left( \widetilde{k}\right) [/math] distributed.
Implementing this test requires nothing other than OLS regressions. In the following excerpt we assume that the dependent variable is contained in the vector y, the elements of [math]\mathbf{X}[/math] that are assumed to be exogenous are contained in x1, those suspected to be endogenous are in x2, and the instrument matrix is saved in z. As before, z should contain all elements of x1.
The code also uses the OLSest function for the Step 3 regression. That could, however, easily be avoided by computing the required quantities directly, as is done for the regressions in Steps 1 and 2.
x = [x1 x2];            % Combine to one matrix X
xxi = inv(x'*x);        % inv(X'X), used in Step 1
b = xxi*x'*y;           % Step 1: OLS estimator
res = y - x*b;          % Step 1: saved residuals
zzi = inv(z'*z);        % inv(Z'Z), used in Step 2
gam = zzi*z'*x2;        % Step 2: OLS coefficients of the Step 2 regressions
                        % This works even if we have more than one element in x2;
                        % we get as many columns of gam as we have elements in x2
vhat = x2 - z*gam;      % Step 2: residuals (has as many columns as x2)
[b,bse,res,n,rss,r2] = OLSest(res,[x vhat],0); % Step 3 regression (overwrites b and res from Step 1, which are no longer needed)
teststat = size(res,1)*r2;               % Step 3: Calculate the nR^2 test statistic
pval = 1 - chi2cdf(teststat,size(x2,2)); % Step 3: Calculate the p-value, chi^2(ktilde) under H0
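As a usage sketch (all simulated names and parameter values are again illustrative assumptions, and the OLSest function is assumed to be available on the MATLAB path), running the excerpt above on simulated data like that from the earlier sketches produces a large test statistic and a p-value of essentially zero, so exogeneity is clearly rejected and IV estimation is warranted.
rng(99);                              % illustrative seed
n    = 10000;                         % illustrative sample size
u    = randn(n,1);                    % common shock
w    = randn(n,1);                    % outside instrument
x2   = 0.8*u + 0.5*w + randn(n,1);    % suspect (endogenous) regressor
y    = 1 + 2*x2 + u + 0.5*randn(n,1); % true coefficients: constant 1, slope 2
x1   = ones(n,1);                     % exogenous regressors: the constant
z    = [x1 w];                        % instrument matrix contains all of x1
% ... now run the excerpt above to obtain teststat and pval ...
fprintf('nR^2 = %6.2f, p-value = %6.4f\n', teststat, pval);
% a p-value below 0.05 rejects H0 and supports the use of IV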
Footnotes
- ↑ Heij, C., de Boer, P., Franses, P.H., Kloek, T. and van Dijk, H.K. (2004) Econometric Methods with Applications in Business and Economics, Oxford University Press, New York. This is an all-round good textbook that uses matrix algebra throughout.