Function

Overview

Functions are an essential tool in every programming language. They are used to "outsource" a piece of code that is generic enough to be reused on a number of occasions. To make it reusable, the code is written so that the things that may change (e.g. different datasets) are easy to swap in and out.
In fact, a good analogy is a drinks vending machine. The box or machine (or in our language, function) hides a large number of things (computer, mechanics etc.) from the eyes of the user. All the user does is provide some input (money and choice of drink), then the machine does its work, and eventually delivers some output, hopefully an ice-cold can of your favorite soft drink.
Here we will do exactly the same. We will write a bit of code that does something useful (in our case it will calculate an OLS regression). To do that it will require the user to provide some input. The function will do its work and deliver back some output.

Econometric Background

This is not the place to review econometric theory in detail, but to make the context clearer, consider that we are concerned with estimating the regression model

[math]\label{OLSModel} \mathbf{y}=\mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}[/math]

where [math]\mathbf{y}[/math] is a [math](n \times 1)[/math] vector that contains all [math]n[/math] observations for the dependent variable and [math]\mathbf{X}[/math] is a [math](n \times k)[/math] matrix that contains all explanatory variables. The [math](k \times 1)[/math] vector [math]\mathbf{\beta}[/math] represents the unobserved population coefficient vector and [math]\mathbf{\epsilon}[/math] is a [math](n \times 1)[/math] vector of unobserved error terms. The OLS estimator for the unknown parameter vector is of course

[math]\label{OLSest} \widehat{\mathbf{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}[/math]

You will also recall that useful associated statistics to such a regression are the standard errors of [math]\widehat{\mathbf{\beta}}[/math], the residual sum of squares and the [math]R^2[/math], all of which you can review in the Econometrics textbook of your choice.
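
For reference, the standard textbook formulas that our function will implement are

[math]\widehat{\mathbf{\epsilon}} = \mathbf{y} - \mathbf{X}\widehat{\mathbf{\beta}}, \qquad RSS = \widehat{\mathbf{\epsilon}}'\widehat{\mathbf{\epsilon}}, \qquad s^2 = \frac{RSS}{n-k}[/math]

[math]\widehat{Var}(\widehat{\mathbf{\beta}}) = s^2 (\mathbf{X}'\mathbf{X})^{-1}, \qquad R^2 = 1 - \frac{RSS}{\sum_{i=1}^{n}(y_i - \bar{y})^2}[/math]

where the standard errors [math]s_{\widehat{\mathbf{\beta}}}[/math] are the square roots of the diagonal elements of [math]\widehat{Var}(\widehat{\mathbf{\beta}})[/math].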

Function structure

In general a function will look like this:

    function [out1,out2,...] = FunctionName(in1,in2,...)
    programming commands;
    ...
    programming commands;
    end

We have a set of input variables (in1, in2, etc.) which will be used in a set of calculations (programming commands). These calculations use the input variables to calculate some outputs (out1, out2, etc.). The function then hands back the values of these output variables (out1, out2, ...) so that they can be used in any subsequent calculations. More on this here. Where should you save this code? It is best to save it in an m-file that has exactly the same name as the function, i.e. FunctionName.m. If you do that you can call the function from any other code, which is very useful for the OLS function we are about to write, as it will be used very often.
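
As a minimal illustration (the function addTwo and its file name are made up for this example and are not part of the ECLR material), the following would be saved as addTwo.m:

    function [s,d] = addTwo(a,b)
    % addTwo returns the sum (s) and difference (d) of its two inputs
    s = a + b;
    d = a - b;
    end

and could then be called from any script with [mysum,mydiff] = addTwo(3,5);.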

So, before we continue we need to specify what the inputs ought to be and what outputs we should expect from our function. The input that is required to estimate a regression is the following:

  • A vector that contains all observations for the dependent variable, [math]\mathbf{y}[/math]
  • A matrix, [math]\mathbf{X}[/math] that contains all explanatory variables in the columns. This matrix should have the same number of rows as [math]\mathbf{y}[/math].
  • (optional) A variable that indicates whether we want the regression output printed into the MATLAB command window or not.

The function will then estimate a regression and deliver some output. It is of course in the hands of the programmer (that is you!) to determine what regression outputs you want. For the sake of this exercise we shall deliver the following

  • [math]\widehat{\mathbf{\beta}}[/math] (or b below), the vector containing the estimated regression coefficients.
  • [math]s_{\widehat{\mathbf{\beta}}}[/math] a [math](k \times 1)[/math] vector with the estimated OLS standard errors for [math]\widehat{\mathbf{\beta}}[/math] (bse below).
  • [math]\widehat{\mathbf{\epsilon}}[/math], the [math](n \times 1)[/math] vector of estimated regression residuals (res below).
  • [math]n[/math], the number of observations used.
  • [math]RSS[/math], the residual sum of squares.
  • [math]R^2[/math] (or r2 below).

Refer to the section How are functions used? below to see how we use this function from another bit of code.

OLSest in detail

You can obtain the complete function code from here. With the above list of inputs and outputs we know that our function (which we will call OLSest) will have the following architecture:

    function [b,bse,res,n,rss,r2] = OLSest(y,x,output)
       programming commands;
       ...
       programming commands;
    end

We will now discuss the core of the function, the programming commands that transform the input variables into the output variables.

    function [b,bse,res,n,rss,r2] = OLSest(y,x,output)
    % This function performs an OLS estimation
    % input vars:    y, vector with dependent variable
    %                x, matrix with explanatory variables
    %                   function will automatically add a constant if the first col
    %                   is not a vector of ones
    %                output, 1 = printed output
    % output vars:   b, estimated parameters
    %                bse, standard errors for bhat
    %                res, estimated residuals
    %                n, number of observations used
    %                rss, residual sum of squares
    %                r2, Rsquared

All lines beginning with a % are comment lines and you should make it a habit to describe every function at the beginning and to outline what the required input and the output variables are. This is extremely important to facilitate the re-use of your function. Just imagine you have written a piece of code a year ago and you want to re-use it now. You will be extremely grateful for any explanation! Also, if you type help OLSest or doc OLSest the block of comments you wrote directly after the function header will show in the MATLAB command window.

    [n,k] = size(x);
    xxi   = inv(x'*x);
    b     = xxi*x'*y;

These commands establish the dimensions of [math]\mathbf{X}[/math] and use the OLS estimator formula above to estimate the OLS coefficients, which are then stored in b. Note that [math](\mathbf{X}'\mathbf{X})^{-1}[/math] is saved in xxi as it will be used later (in the calculation of [math]s_{\widehat{\mathbf{\beta}}}[/math]), and as inverting big matrices is computationally intensive we want to avoid having to do this twice. Saving the result and re-using it is an efficient way to use the computer’s limited resources.
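
As an aside, MATLAB's backslash operator delivers the same coefficient estimates and is numerically more robust when [math]\mathbf{X}'\mathbf{X}[/math] is close to singular; we do not use it here only because we want to keep [math](\mathbf{X}'\mathbf{X})^{-1}[/math] for the standard error calculation. A sketch of the alternative:

    b = x\y;    % least-squares solution, equivalent to inv(x'*x)*x'*y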

    res   = y - x*b;
    rss   = res'*res;
    ssq   = rss/(n-k);
    s     = sqrt(ssq);
    bse   = ssq*xxi;
    bse   = sqrt(diag(bse));
    ym    = y - mean(y);
    r2    = 1 - (res'*res)/(ym'*ym);

The commands in this section calculate the residuals (res), the residual sum of squares (rss), the coefficient estimates' standard errors (bse) and the regression's [math]R^2[/math] (r2). Again we refer to standard econometric textbooks for the details of these calculations.
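
If you have access to the Statistics Toolbox you can cross-check these numbers against MATLAB's built-in regress function. This is only a sketch, assuming y and x (including a column of ones) are already in memory:

    [bcheck,~,~,~,stats] = regress(y,x);   % stats(1) contains the R^2
    disp(max(abs(b - bcheck)));            % differences should be numerically zero
    disp(abs(r2 - stats(1)));              % ditto for the R^2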


    if output
        fprintf('===========================================================\n');
        fprintf('===== Regression Output  ==================================\n');
        % ninit (the number of observations before removing missing values) is set in the full OLSest.m
        fprintf('Obs used = %4.0f, missing obs = %4.0f \n',n,(ninit-n));
        fprintf('Rsquared = %5.4f \n',r2);
        fprintf('===== Estimated Model Parameters ==========================\n');
        fprintf('=   Par       se(Par)  ==================\n');
        format short;
        disp([b bse]);
        fprintf('===== Model Statistics ====================================\n');
        fprintf(' standard error = %5.4f\n',sqrt(ssq));
        fprintf('RSS = %5.4f \n',rss);
        fprintf('===========================================================\n');
    end

This section of the code is only executed if the third input variable output is true or equal to 1. In this way the user can control whether she wants this bit printed (likely if you are only performing a single regression) or not (likely if you are estimating many regressions inside some bigger procedure). This bit also contains a number of commands (like format and fprintf) which may be unknown to you at this stage, but are useful when printing results to the screen. Use the MATLAB help function for some more guidance (type doc format into the command window and the relevant documentation will open).
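
To give a feel for fprintf: in a format string, %5.4f stands for a floating point number printed with four decimal places and \n starts a new line, so that (with a made-up value)

    fprintf('Rsquared = %5.4f \n',0.8321);

prints Rsquared = 0.8321 followed by a line break.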

The actual example OLSest function in the file OLSest.m is a somewhat expanded version of this as it also calculates t-statistics, p-values and Durbin-Watson test statistics. But all these extra stats only appear in the output (if output = 1). It also checks whether the input matrix X contains a column of constants (and if not adds one) and checks for missing observations.

How are functions used?

So far we have described how to write a function and we understand that it is the equivalent of a drinks machine ready to be used (receiving some inputs and handing back outputs). The question remains how to use it. The best way is to save the function into a new m-file (*.m) that has the same file name as the function (here OLSest.m).


Having done that you can use that function (say in a MATLAB script, an example of which you can find here) as demonstrated in the following code extract:

        depvar = ...; % a vector which contains the dependent variable
        expvar = ...; % a matrix that contains all explanatory variables in columns;
                      % should include a column of 1s for the constant
        [bhat,bhatse,resids,obs,resss,rsq] = OLSest(depvar,expvar,0);
        disp(bhat)

As you can see, the input and output variables here have names different from those used in the code of the function itself. Take the first input variable, which contains the dependent variable; in this script it is known as depvar. In the function call it is handed over to the function, where it adopts the name y, as that is the name given to the first input variable of OLSest. Inside the function the name depvar is unknown. Also note that the third input variable was given the value 0; by doing so we ensure that the function does not print the regression output. The first output variable is given the name bhat in the above script. The function OLSest itself does not know that variable; it calculated the OLS parameter estimate, saved it (inside the function) as b and then handed the value back as the first output variable. In the script file it is then known as bhat and you can continue using that value.
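
Note also that you do not have to collect every output; MATLAB simply discards the ones you do not ask for. If, say, you only need the coefficients and their standard errors, the call

        [bhat,bhatse] = OLSest(depvar,expvar,0);

is perfectly valid.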

Numerical test example

If you use the data in the OLSexample.xls spreadsheet (column 1 as the dependent variable and columns 2 to 4 as explanatory variables; don’t forget to include a vector of ones as the constant) you should obtain the following OLS parameter estimates:

[math]\label{OLSest2} \widehat{\mathbf{\beta}}=\left( \begin{array}{c} 0.3983 \\ 1.0574 \\ -1.9973 \\ 0.4953 \\ \end{array} \right)[/math]

where the first element is the estimated constant and the remaining parameters are the OLS estimates associated with the variables in columns two to four respectively.
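
A short script to reproduce this result could look as follows. This is only a sketch: it assumes the data sit in the first worksheet of OLSexample.xls without a header row, so adjust the xlsread call if your file differs.

    data = xlsread('OLSexample.xls');            % import the spreadsheet
    y    = data(:,1);                            % column 1: dependent variable
    x    = [ones(size(data,1),1) data(:,2:4)];   % constant plus columns 2 to 4
    [b,bse,res,n,rss,r2] = OLSest(y,x,1);        % estimate and print the output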