Difference between revisions of "Function"
Latest revision as of 11:39, 21 November 2014
Overview
Functions are an essential tool in every programming language. They are used to "outsource" a piece of code that is generic enough to be reused on a number of occasions. To make it reusable, it is written such that the things that may change (e.g. the dataset) are passed in as inputs and are therefore easy to change.
In fact, a good analogy is a drinks vending machine. The machine (or, in our language, the function) hides a large number of things (computer, mechanics etc.) from the eyes of the user. All the user does is provide some input (money and choice of drink); the machine then does its stuff and eventually delivers some output, hopefully an ice-cold can of your favorite soft drink.
Here we will do exactly the same. We will write a bit of code that does something useful (in our case it will calculate an OLS regression). To do that it will require the user to provide some input. The function will do its work and deliver back some output.
Econometric Background
This is not the place to review the Econometric Theory in detail, but to make the context clearer consider that we are concerned with estimating a regression model
[math]\label{OLSModel} \mathbf{y}=\mathbf{X} \mathbf{\beta} + \mathbf{\epsilon}[/math]
where [math]\mathbf{y}[/math] is a [math](n \times 1)[/math] vector that contains all [math]n[/math] observations for the dependent variable and [math]\mathbf{X}[/math] is a [math](n \times k)[/math] matrix that contains all explanatory variables. The [math](k \times 1)[/math] vector [math]\mathbf{\beta}[/math] represents the unobserved population coefficient vector and [math]\mathbf{\epsilon}[/math] is a [math](n \times 1)[/math] vector of unobserved error terms. The OLS estimator for the unknown parameter vector is of course
[math]\label{OLSest} \widehat{\mathbf{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}[/math]
You will also recall that useful associated statistics to such a regression are the standard errors of [math]\widehat{\mathbf{\beta}}[/math], the residual sum of squares and the [math]R^2[/math], all of which you can review in the Econometrics textbook of your choice.
Function structure
In general a function will look like this:
function [out1,out2,...] = FunctionName(in1,in2,...)
programming commands;
...
programming commands;
end
We have a set of input variables (in1, in2, etc.) which will be used in a set of calculations (programming commands). These calculations use the input variables to calculate some outputs (out1, out2, etc.). The function then hands back the values of these variables (out1, out2, ...) such that they can be used in any subsequent calculations. More on this in the section "How are functions used?" below. Where should you save this code? It is best to save it in an m-file that has exactly the same name as the function, i.e. FunctionName.m. If you do that you can call the function from any other code, which is very useful for the OLS function we are about to write, as that will be used very often.
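As a concrete, deliberately tiny illustration of this template, consider the following sketch. The function name SumAndMean and its variable names are hypothetical and are not part of the original material; the code would be saved in a file called SumAndMean.m.

```matlab
function [total, avg] = SumAndMean(v)
% SumAndMean  Illustrative example only (hypothetical name, not from the text).
% input vars:  v, numeric vector
% output vars: total, sum of the elements of v
%              avg, arithmetic mean of the elements of v
    total = sum(v);            % first output
    avg   = total / numel(v);  % second output, reuses the first
end
```

Calling [s, m] = SumAndMean([1 2 3 4]) from a script or the command window would then store 10 in s and 2.5 in m.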
So, before we continue we need to specify what the inputs ought to be and what outputs we should expect from our function. The input that is required to estimate a regression is the following:
- A vector that contains all observations for the dependent variable, [math]\mathbf{y}[/math]
- A matrix, [math]\mathbf{X}[/math] that contains all explanatory variables in the columns. This matrix should have the same number of rows as [math]\mathbf{y}[/math].
- (optional) A variable that indicates whether we want the regression output printed into the MATLAB command window or not.
The function will then estimate a regression and deliver some output. It is of course in the hands of the programmer (that is you!) to determine what regression outputs you want. For the sake of this exercise we shall deliver the following:
- [math]\widehat{\mathbf{\beta}}[/math] (or b below), the vector containing the estimated regression coefficients.
- [math]s_{\widehat{\mathbf{\beta}}}[/math], a [math](k \times 1)[/math] vector with the estimated OLS standard errors for [math]\widehat{\mathbf{\beta}}[/math] (bse below).
- [math]\widehat{\mathbf{\epsilon}}[/math], the [math](n \times 1)[/math] vector of estimated regression residuals (res below).
- [math]n[/math], the number of observations used.
- [math]RSS[/math], the residual sum of squares (rss below).
- [math]R^2[/math] (r2 below).
Refer to the section "How are functions used?" to see how we use this function from another bit of code.
OLSest in detail
You can obtain the complete function code from the FctExampleCode page. With the above list of inputs and outputs we know that our function (which we will call OLSest) will have the following architecture:
function [b,bse,res,n,rss,r2] = OLSest(y,x,output)
programming commands;
...
programming commands;
end
We will now discuss the core of the function, the programming commands that transform the input variables into the output variables. The function starts with its header and a block of comments:
function [b,bse,res,n,rss,r2] = OLSest(y,x,output)
% This function performs an OLS estimation
% input vars:  y, vector with dependent variable
%              x, matrix with explanatory variables
%                 function will automatically add a constant if the first col
%                 is not a vector of ones
%              output, 1 = printed output
% output vars: b, estimated parameters
%              bse, standard errors for b
%              res, estimated residuals
%              n, number of observations used
%              rss, residual sum of squares
%              r2, Rsquared
All lines beginning with a % are comment lines. You should make it a habit to describe every function at the beginning and to outline what the required input and output variables are. This is extremely important to facilitate the re-use of your function. Just imagine you wrote a piece of code a year ago and want to re-use it now. You will be extremely grateful for any explanation! Also, if you type help OLSest, the block of comments you wrote directly after the function header will be shown in the MATLAB command window (doc OLSest displays the same text in the documentation viewer).
[n,k] = size(x);
xxi = inv(x'*x);
b = xxi*x'*y;
These commands establish the dimensions of [math]\mathbf{X}[/math] and use the formula for [math]\widehat{\mathbf{\beta}}[/math] given above to estimate the OLS coefficients, which are then stored in b. Note that [math](\mathbf{X}'\mathbf{X})^{-1}[/math] is saved in xxi as it will be used again later (in the calculation of [math]s_{\widehat{\mathbf{\beta}}}[/math]). Inverting big matrices is computationally intensive, so we want to avoid doing it twice. Saving the result and re-using it is an efficient way to use the computer's limited resources.
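As an aside, MATLAB's backslash operator solves the same least-squares problem without forming the inverse explicitly, which is generally regarded as the numerically more robust route. The sketch below is not part of the original function; it uses simulated data (all numbers and the names b_inv and b_bs are made up for illustration) to show that the two approaches agree up to rounding error. The inverse is still kept in xxi because the standard errors need it.

```matlab
rng(1);                          % fix the random seed so the run is repeatable
n = 50;
x = [ones(n,1) randn(n,2)];      % simulated regressors including a constant
y = x*[1; 2; -1] + 0.1*randn(n,1);

xxi   = inv(x'*x);               % as in the text, kept for the standard errors
b_inv = xxi*x'*y;                % textbook formula
b_bs  = x\y;                     % least-squares solution via backslash

disp(max(abs(b_inv - b_bs)));    % difference should be of rounding-error size
```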
res = y - x*b;
rss = res'*res;
ssq = rss/(n-k);
s = sqrt(ssq);
bse = ssq*xxi;
bse = sqrt(diag(bse));
ym = y - mean(y);
r2 = 1 - (res'*res)/(ym'*ym);
The commands in this section calculate the residuals (res), the residual sum of squares (rss), the standard errors of the coefficient estimates (bse) and the regression's [math]R^2[/math] (r2). Again we refer to standard econometric textbooks for the details of these calculations.
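For reference, the quantities computed by these lines can be written in the notation used above as follows. The symbols [math]s^2[/math] (the variable ssq in the code), [math]y_i[/math] and [math]\bar{y}[/math] (the sample mean of the dependent variable) are introduced here only for this summary.
[math]\widehat{\mathbf{\epsilon}} = \mathbf{y} - \mathbf{X}\widehat{\mathbf{\beta}}, \qquad RSS = \widehat{\mathbf{\epsilon}}'\widehat{\mathbf{\epsilon}}, \qquad s^2 = \frac{RSS}{n-k}[/math]
[math]s_{\widehat{\mathbf{\beta}}} = \sqrt{\mathrm{diag}\left( s^2 (\mathbf{X}'\mathbf{X})^{-1} \right)}, \qquad R^2 = 1 - \frac{RSS}{\sum_{i=1}^{n} (y_i - \bar{y})^2}[/math]
Here the square root is taken element by element, exactly as sqrt(diag(bse)) does in the code.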
if output
    % ninit (observations before removing missing values) is not defined in
    % this extract; it comes from the missing-data check in the full OLSest.m
    fprintf('===========================================================\n');
    fprintf('===== Regression Output ==================================\n');
    fprintf('Obs used = %4.0f, missing obs = %4.0f \n',n,(ninit-n));
    fprintf('Rsquared = %5.4f \n',r2);
    fprintf('===== Estimated Model Parameters ==========================\n');
    fprintf('= Par se(Par) ==================\n');
    format short;
    disp([b bse]);
    fprintf('===== Model Statistics ====================================\n');
    fprintf(' standard error = %5.4f\n',sqrt(ssq));
    fprintf('RSS = %5.4f \n',rss);
    fprintf('===========================================================\n');
end
This section of the code is only activated if the third input variable output is true or equal to 1. In this way the user can control whether she wants this bit printed (likely if you are only performing a single regression) or not (likely if you are estimating many regressions in some bigger procedure). This bit also contains a number of commands (like format and fprintf) which may be unknown to you at this stage, but are useful when printing results to the screen. Use the MATLAB help function for some more guidance (type doc format into the command window and the relevant documentation will open).
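To see what the format specifiers used above do, here is a minimal sketch with made-up numbers (the variables r2demo, txt1 and txt2 are illustrative only). In %5.4f the 4 requests four digits after the decimal point and the 5 is the minimum field width; %4.0f prints no decimals and pads the number to at least four characters. sprintf works like fprintf but returns the text instead of printing it.

```matlab
r2demo = 0.87654321;                     % made-up value for illustration
fprintf('Rsquared = %5.4f \n', r2demo);  % prints: Rsquared = 0.8765

txt1 = sprintf('%5.4f', r2demo);         % '0.8765'
txt2 = sprintf('%4.0f', 25);             % '  25' (padded to width 4)
disp(['[' txt2 ']']);                    % brackets make the padding visible
```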
The actual example OLSest function in the file OLSest.m is a somewhat expanded version of this, as it also calculates t-statistics, p-values and Durbin-Watson test statistics. But all these extra stats only appear in the printed output (if output = 1). It also checks whether the input matrix x contains a column of constants (and if not adds one) and checks for missing observations.
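The expanded file is not reproduced here, so the following is only a guess at how such checks could look, not the actual code in OLSest.m. It assumes that missing observations are coded as NaN and, following the comment block in the function header, that the constant is expected in the first column. The data are made up for illustration.

```matlab
% Illustrative input: one missing value and no constant column
y = [1.0; 2.0; NaN; 4.0; 5.0];
x = [0.5; 1.5; 2.5; 3.5; 4.5];

ninit = size(y,1);                   % observations before cleaning
keep  = ~any(isnan([y x]), 2);       % rows without any missing value
y = y(keep);
x = x(keep,:);
n = size(y,1);                       % observations actually used

if ~all(x(:,1) == 1)                 % first column is not a vector of ones
    x = [ones(n,1) x];               % prepend a constant
end

fprintf('Obs used = %4.0f, missing obs = %4.0f\n', n, ninit-n);
```

With a step like this in place, the quantity ninit-n printed in the regression output is simply the number of rows that were dropped.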
How are functions used?
So far we have described how to write a function and we understand that it is the equivalent of a drinks machine ready to be used (receiving some inputs and handing back outputs). The question remains how to use it. The best way is to save the function in a new m-file (*.m) that has the same file name as the function (here OLSest.m).
Having done that you can use that function (say in a MATLAB script, an example of which you can find on the FctExampleCode page) as demonstrated in the following code extract:
depvar = ...;   % a vector which contains the dependent variable
expvar = ...;   % a matrix that contains all explanatory variables in columns,
                % should include a column of 1s for the constant
[bhat,bhatse,resids,obs,resss,rsq] = OLSest(depvar,expvar,0);
disp(bhat)
As you can see here all input and output variables have names different to those used in the code of the function itself. Let's take the first input variable depvar, which contains a vector with the dependent variable. In this script this vector is known as depvar. In the function call it is handed over to the function and there it adopts the name y, as that is the name given to the first input variable of the function OLSest. Inside the function the variable depvar is actually unknown. Also note that the third input variable was given the value 0. By doing so we ensure that the function does not print the regression output. The first output variable is given the name bhat in the above script. The function OLSest itself actually does not know that variable. However, it did calculate the OLS parameter estimate, saved it (inside the function) as b and then handed this value back as the first output variable. Here in the script file this is then known as bhat and you can continue using that value.
Often you will call the function from a script file while the actual function is saved in its own file, e.g. OLSest.m. Whenever you call a function, MATLAB will search for the relevant .m file (e.g. OLSest.m). It first looks in the current folder and, if the file is not there, in the folders on its search path (type path to see which folders these are).
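If you are unsure whether MATLAB can find your function, a quick check along the following lines may help. This is a minimal sketch; the folder in the commented-out addpath call is a hypothetical placeholder that you would replace with the location of your own files.

```matlab
status = exist('OLSest', 'file');    % 2 means a file of that name was found
if status == 2
    disp(which('OLSest'));           % full path of the file MATLAB would run
else
    disp('OLSest.m is not in the current folder or on the search path.');
    % addpath('C:\myfunctions');     % hypothetical folder; adjust and uncomment
end
```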
There is an alternative way in which you can tie functions into your code. You can open a new MATLAB file and include your main code (which would normally be in a script file) and a (or several) function(s) which are used in that code. Here is the simplest of examples.
function main() % main code
    for i = 1:4
        temp = fct2(i);
        disp(temp);
    end
    test = 1;
end % end of main() code

function out1 = fct2(x)
    out1 = x^2;
end % end of fct2
The main code merely contains a loop in which we repeatedly call the function fct2, which does nothing else but square the input variable. The main code is wrapped in a function called main(). Following this function you can see the function definition fct2(x). If you are using several functions you can attach their definitions below as well.
When you now run the code it will execute all the code in main(), and as soon as MATLAB reaches the end of that main function all variables created inside it (its local workspace) are cleared. So if you want to run your code but also want to see all the created variables at the end, it is convenient to introduce a redundant line of code at the end of your main function, here test = 1;, and to put a breakpoint on that line such that the execution of the code pauses before it reaches the end. At that stage you can still see all the variables that have been created.
Note, however, that any function which you think you may want to reuse in another project should really go in its own file to facilitate access from other pieces of code.
Numerical test example
If you use the data in the OLSexample.xls spreadsheet (column 1 as dependent variable and columns 2 to 4 as explanatory variables; don't forget to include a vector of ones as constant) you should obtain the following OLS parameter estimate:
[math]\label{OLSest2} \widehat{\mathbf{\beta}}=\left( \begin{array}{c} 0.3983 \\ 1.0574 \\ -1.9973 \\ 0.4953 \\ \end{array} \right)[/math]
where the first element is the estimated constant and the remaining parameters the OLS estimates associated with the variables in columns two to four respectively.