Function

From ECLR
Revision as of 11:39, 21 November 2014 by Rb (talk | contribs) (How are functions used?)

Overview

Functions are an essential tool in every programming language. They are used to "outsource" a piece of code that is generic enough to be reused on a number of occasions. To make it reusable, the code is written such that the things that may change (e.g. different datasets) are passed in as inputs and are therefore easy to change.
In fact, a good analogy is a drinks vending machine. The machine (or, in our language, the function) hides a large number of things (computer, mechanics etc.) from the eyes of the user. All the user does is provide some input (money and choice of drink); the machine then does its work and eventually delivers some output, hopefully an ice-cold can of your favorite soft drink.
Here we will do exactly the same. We will write a bit of code that does something useful (in our case it will calculate an OLS regression). To do that it will require the user to provide some input. The function will do its work and deliver back some output.

Econometric Background

This is not the place to review the Econometric Theory in detail, but to make the context clearer consider that we are concerned with estimating a regression model

[math]\label{OLSModel} \mathbf{y}=\mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}[/math]

where [math]\mathbf{y}[/math] is a [math](n \times 1)[/math] vector that contains all [math]n[/math] observations for the dependent variable and [math]\mathbf{X}[/math] is a [math](n \times k)[/math] matrix that contains all explanatory variables. The [math](k \times 1)[/math] vector [math]\mathbf{\beta}[/math] represents the unobserved population coefficient vector and [math]\mathbf{\epsilon}[/math] is a [math](n \times 1)[/math] vector of unobserved error terms. The OLS estimator for the unknown parameter vector is of course

[math]\label{OLSest} \widehat{\mathbf{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}[/math]

You will also recall that useful statistics associated with such a regression are the standard errors of [math]\widehat{\mathbf{\beta}}[/math], the residual sum of squares and the [math]R^2[/math], all of which you can review in the Econometrics textbook of your choice.

Function structure

In general a function will look like this:

    function [out1,out2,...] = FunctionName(in1,in2,...)
    programming commands;
    ...
    programming commands;
    end

We have a set of input variables (in1, in2, etc.) which will be used in a set of calculations (programming commands). These calculations will use the input variables to calculate some outputs (out1, out2, etc.). The function then hands back the values for these variables (out1, out2, ...) such that they can be used in any subsequent calculations. More on this below. Where should you save this code? It is best to save it in an m-file that has exactly the same name as the function, i.e. FunctionName.m. If you do that you can call the function from any other code, which is very useful for the OLS function we are about to write, as that will be used very often.
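
To make the template concrete, here is a minimal sketch of a function with one input and two outputs. The name MeanStd and its variable names are made up for this illustration and are not part of the ECLR code.

```matlab
function [m,s] = MeanStd(v)
% MeanStd (hypothetical example) returns the mean and the sample
% standard deviation of the vector v
% input vars:    v, vector of numbers
% output vars:   m, mean of v
%                s, sample standard deviation of v
    m = mean(v);
    s = std(v);
end
```

Saved as MeanStd.m, it could be called with [avg,sd] = MeanStd([1 2 3 4]); after which avg holds 2.5 and sd holds the corresponding standard deviation.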

So, before we continue we need to specify what the inputs ought to be and what outputs we should expect from our function. The input that is required to estimate a regression is the following:

  • A vector that contains all observations for the dependent variable, [math]\mathbf{y}[/math]
  • A matrix, [math]\mathbf{X}[/math] that contains all explanatory variables in the columns. This matrix should have the same number of rows as [math]\mathbf{y}[/math].
  • (optional) A variable that indicates whether we want the regression output printed into the MATLAB command window or not.

The function will then estimate a regression and deliver some output. It is of course in the hands of the programmer (that is you!) to determine what regression outputs you want. For the sake of this exercise we shall deliver the following:

  • [math]\widehat{\mathbf{\beta}}[/math] (or b below), the vector containing the estimated regression coefficients.
  • [math]s_{\widehat{\mathbf{\beta}}}[/math] a [math](k \times 1)[/math] vector with the estimated OLS standard errors for [math]\widehat{\mathbf{\beta}}[/math] (bse below).
  • [math]\widehat{\mathbf{\epsilon}}[/math], the [math](n \times 1)[/math] vector of estimated regression residuals (res below).
  • [math]n[/math], the number of observations used.
  • [math]RSS[/math], the residual sum of squares.
  • [math]R^2[/math] (or below r2).

Refer to the section "How are functions used?" to see how we use this function from another bit of code.

OLSest in detail

You can obtain the complete function code from here. With the above list of inputs and outputs we know that our function (which we will call OLSest) will have the following architecture:

    function [b,bse,res,n,rss,r2] = OLSest(y,x,output)
       programming commands;
       ...
       programming commands;
    end

We will now discuss the core of the function, the programming commands that transform the input variables into the output variables. The function starts with its header and a block of comments:

    function [b,bse,res,n,rss,r2] = OLSest(y,x,output);
    % This function performs an OLS estimation
    % input vars:    y, vector with dependent variable
    %                x, matrix with explanatory variable
    %                   function will automatically add a constant if the first col
    %                   is not a vector  of ones
    %                output, 1 = printed output
    % output vars:   b, estimated parameters
    %                bse, standard errors for bhat
    %                res, estimated residuals
    %                n, number of observations used
    %                rss, residual sum of squares
    %                r2, Rsquared

All lines beginning with a % are comment lines and you should make it a habit to describe every function at the beginning and to outline what the required input and the output variables are. This is extremely important to facilitate the re-use of your function. Just imagine you have written a piece of code a year ago and you want to re-use it now. You will be extremely grateful for any explanation! Also, if you type help OLSest or doc OLSest the block of comments you wrote directly after the function header will show in the MATLAB command window.

    [n,k] = size(x);
    xxi   = inv(x'*x);
    b     = xxi*x'*y;

These commands establish the dimensions of [math]\mathbf{X}[/math], and use the OLS estimator formula given above to estimate the OLS coefficients, which are then stored in b. Note that [math](\mathbf{X}'\mathbf{X})^{-1}[/math] is saved in xxi as it will be used later (in the calculation of [math]s_{\widehat{\mathbf{\beta}}}[/math]). Inverting big matrices is computationally intensive, so we want to avoid doing this twice. Saving the result and re-using it is an efficient way to use the computer’s limited resources.
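
As an aside, MATLAB's documentation generally recommends solving linear systems with the backslash operator rather than forming an inverse explicitly. The following sketch uses made-up data (not the ECLR example data) to compare the coefficients from both approaches; the names b_inv and b_bs are chosen for this illustration only.

```matlab
% Made-up data for illustration: a constant and one regressor
x = [ones(5,1) (1:5)'];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Approach used in OLSest: explicit inverse, kept for later use
xxi   = inv(x'*x);
b_inv = xxi*x'*y;

% Alternative: let MATLAB solve the least squares problem directly
b_bs  = x\y;

% The two coefficient vectors should agree up to rounding error
disp(max(abs(b_inv - b_bs)));
```

In OLSest the inverse is needed anyway for the standard errors, so keeping xxi is a reasonable design choice there.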

    res   = y - x*b;
    rss   = res'*res;
    ssq   = rss/(n-k);
    s     = sqrt(ssq);
    bse   = ssq*xxi;
    bse   = sqrt(diag(bse));
    ym    = y - mean(y);
    r2    = 1 - (res'*res)/(ym'*ym);

The commands in this section calculate the residuals (res), the residual sum of squares (rss), the standard errors of the coefficient estimates (bse) and the regression's [math]R^2[/math] (r2). Again we refer to standard econometric textbooks for the details of these calculations.
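
For reference, the code lines above correspond to the following textbook formulas, written with the symbols introduced earlier. Here [math]s^2[/math] is the variable ssq in the code, and the standard error formula assumes homoskedastic, uncorrelated error terms. The [math]R^2[/math] shown is the usual one for a regression that includes a constant.

[math]s^2 = \frac{\widehat{\mathbf{\epsilon}}'\widehat{\mathbf{\epsilon}}}{n-k} = \frac{RSS}{n-k}[/math]

[math]\widehat{Var}(\widehat{\mathbf{\beta}}) = s^2 (\mathbf{X}'\mathbf{X})^{-1}, \qquad s_{\widehat{\mathbf{\beta}}} = \sqrt{diag\left(\widehat{Var}(\widehat{\mathbf{\beta}})\right)}[/math]

[math]R^2 = 1 - \frac{RSS}{\sum_{i=1}^{n}(y_i-\bar{y})^2}[/math]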


    if output
        % ninit, the number of observations before missing values are removed,
        % is set in the full OLSest.m; in this simplified version use ninit = n
        fprintf('===========================================================\n');
        fprintf('===== Regression Output  ==================================\n');
        fprintf('Obs used = %4.0f, missing obs = %4.0f \n',n,(ninit-n));
        fprintf('Rsquared = %5.4f \n',r2);
        fprintf('===== Estimated Model Parameters ==========================\n');
        fprintf('=   Par       se(Par)  ==================\n');
        format short;
        disp([b bse]);
        fprintf('===== Model Statistics ====================================\n');
        fprintf(' standard error = %5.4f\n',sqrt(ssq));
        fprintf('RSS = %5.4f \n',rss);
        fprintf('===========================================================\n');
    end

This section of the code is only activated if the third input variable output is true or equal to 1. In this way the user can control whether she wants this bit printed (likely if you are only performing a single regression) or not (likely if you are estimating many regressions in some bigger procedure). This bit also contains a number of commands (like format and fprintf) which may be unknown to you at this stage, but are useful when printing results to the screen. Use the MATLAB help function for some more guidance (type doc format into the command window and the relevant documentation will open).
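
The input list above describes output as optional, yet the simplified code would stop with an error at if output when the third input is left out. One common way to supply a default is MATLAB's nargin, which holds the number of inputs actually passed. The sketch below uses a hypothetical function, OutputFlagDemo, to show the pattern; it is not taken from OLSest.m.

```matlab
function flag = OutputFlagDemo(y,x,output)
% OutputFlagDemo (hypothetical example) shows how to give the third
% input a default value when the caller leaves it out
    if nargin < 3       % fewer than three inputs were supplied
        output = 0;     % default: do not print
    end
    flag = output;      % hand back the value that will be used
end
```

With this pattern OutputFlagDemo(1,2) returns 0, while OutputFlagDemo(1,2,1) returns 1. The same three lines could be placed at the top of OLSest.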

The actual example OLSest function in the file OLSest.m is a somewhat expanded version of this as it also calculates t-statistics, p-values and Durbin-Watson test statistics. But all these extra stats only appear in the output (if output = 1). It also checks whether the input matrix X contains a column of constants (and if not adds one) and checks for missing observations.
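
The following is a minimal sketch of how such checks might look. It is an assumption about one possible approach, not a copy of what OLSest.m does, and it uses made-up data. It also shows where a variable like ninit, used in the printing code above, could come from.

```matlab
% Made-up data for illustration: one missing value, no constant column
y = [1.0; 2.0; NaN; 4.0; 5.5];
x = [0.5; 1.1; 1.4; 2.2; 2.4];

ninit = size(y,1);                 % observations before cleaning
keep  = ~any(isnan([y x]),2);      % rows without any missing value
y     = y(keep);
x     = x(keep,:);

if ~all(x(:,1) == 1)               % first column is not a column of ones
    x = [ones(size(x,1),1) x];     % so prepend a constant
end

[n,k] = size(x);
fprintf('Obs used = %4.0f, missing obs = %4.0f \n',n,(ninit-n));
```

Here one of the five rows is dropped, so n is 4, and x ends up with two columns.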

How are functions used?

So far we have described how to write a function and we understand that it is the equivalent of a drinks machine ready to be used (receiving some inputs and handing back outputs). The question remains how to use it. The best way is to save the function in a new m-file (*.m) that has the same name as the function (here OLSest.m).

Having done that you can use that function (say in a MATLAB script, an example of which you can find here) as demonstrated in the following code extract:

        depvar = ...; % a vector which contains the dependent variable
        expvar = ...; % a matrix that contains all explanatory variables in columns,
                      % should include a column of 1s for the constant
        [bhat,bhatse,resids,obs,resss,rsq] = OLSest(depvar,expvar,0);
        disp(bhat)

As you can see here all input and output variables have names different to those used in the code of the function itself. Let’s take the first input variable depvar which contains a vector with the dependent variable. In this script this vector is known as depvar. In this function call it is handed over to the function and there it adopts the name y as that is the name given to the first input variable into the function OLSest. Inside the function the variable depvar is actually unknown. Also note that the third input variable was given the value 0. By doing so we ensure that the function does not print the regression output. The first output variable is given the name bhat in the above script. The function OLSest itself actually does not know that variable. However, it did calculate the OLS parameter estimate, saved it (inside the function) as b and then handed this value back as the first output variable. Here in the script file this is then known as bhat and you can continue using that value.

Often you will call the function from a script file while the actual function is saved in its own file, e.g. OLSest.m. Whenever you call a function, MATLAB will search for the relevant .m file (e.g. OLSest.m). It will first look in the current folder and, if the file is not there, in the folders on its search path (type path to see which folders these are).
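
If you are unsure whether MATLAB can find your function, the following short sketch may help. It uses the built-in functions exist and which; the folder name in the commented addpath line is hypothetical and would need to be replaced with your own.

```matlab
% exist(...,'file') returns 2 if an ordinary file such as OLSest.m is
% found in the current folder or on the search path, and 0 if not
found = exist('OLSest','file');
if found == 2
    disp(which('OLSest'));      % show the full path of the file found
else
    disp('OLSest.m not found; change folder or use addpath.');
    % addpath('C:\MyMatlabFunctions');  % hypothetical folder name
end
```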

There is an alternative way in which you can tie functions into your code. You can open a new MATLAB file and include your main code (which would normally be in a script file) and a (or several) function(s) which are used in that code. Here is the simplest of examples.

    function main()     % main code

    for i = 1:4
        temp = fct2(i);
        disp(temp);
    end
    test = 1;
    end % end of main() code


    function out1 = fct2(x)
        out1 = x^2;
    end  % end of fct2

The main code merely contains a loop in which we repeatedly call the function fct2, which does nothing but square the input variable. The main code is enveloped in a function called main(). Following this function you can see the function definition fct2(x). If you are using several functions you can attach their definitions below as well.

When you now run the code it will execute all the code in main(), and as soon as MATLAB reaches the end of that main function it will clear all variables created inside it (the function's workspace). So if you want to run your code but also want to see all the created variables at the end, it is convenient to introduce a redundant line of code at the end of your main function, here test = 1;, and put a breakpoint on that line such that the execution of the code pauses before it reaches the end. At that stage you can still see all the variables that have been created.
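
An alternative to the breakpoint, sketched here as one possible variation rather than the approach used in the ECLR example, is to let main() hand back the values you want to keep as an output variable. The variable name results is chosen for this illustration.

```matlab
function results = main()     % main code, now with an output
    results = zeros(4,1);     % storage for the four squared values
    for i = 1:4
        results(i) = fct2(i);
    end
end % end of main() code


function out1 = fct2(x)
    out1 = x^2;
end  % end of fct2
```

Calling r = main(); from the command window then leaves the vector r = [1; 4; 9; 16] available after main() has finished.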

But note that any function you write and may want to reuse in another project should really go in its own file, so that it can be accessed from other pieces of code.

Numerical test example

If you use the data in the OLSexample.xls spreadsheet (column 1 as the dependent variable and columns 2 to 4 as explanatory variables; don’t forget to include a vector of ones as a constant) you should obtain the following OLS parameter estimate:

[math]\label{OLSest2} \widehat{\mathbf{\beta}}=\left( \begin{array}{c} 0.3983 \\ 1.0574 \\ -1.9973 \\ 0.4953 \\ \end{array} \right)[/math]

where the first element is the estimated constant and the remaining parameters the OLS estimates associated with the variables in columns two to four respectively.