= Introduction =

Here we will introduce the code used to find optimal parameters by means of nonlinear optimisation. In the first section we will introduce the grid-search methodology. The following sections will discuss the implementation of gradient-based and non-gradient-based nonlinear search algorithms.

= Grid search =
In the [[NonlinOptTheory|theory]] section we discussed the basic structure of the grid-search algorithm. The particular problem to be tackled here is to find the parameter vector <math>\beta = (\beta_0 ~ \beta_1)'</math> for the model

<math>y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{1}^{2}x_{2,t}+u_{t}</math>

that minimises the sum of squared residuals. The dataset used is available in [[media:judge.xls]].
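Written out, the objective to be minimised is the residual sum of squares as a function of the parameter vector:

<math>S(\beta )=\sum_{t=1}^{T}u_{t}^{2}=\sum_{t=1}^{T}\left( y_{t}-\beta _{0}-\beta _{1}x_{1,t}-\beta _{1}^{2}x_{2,t}\right) ^{2}</math>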
If you want to tackle this problem, the first step is to write a little function that accepts the data (<math>y_t</math>, <math>x_{1,t}</math> and <math>x_{2,t}</math>) and values for the two parameters and returns the sum of squared residuals: <source enclose="none">rss = judgerss(theta,data)</source>, where <source enclose="none">theta</source> represents the parameter vector and <source enclose="none">data</source> the data matrix (here with three columns).
<source>function rss = judgerss(theta,data)
% first calculate the residual vector: res = y - theta(1) - theta(2)*x1 -
% theta(2)^2*x2
res = data(:,1) - theta(1) - theta(2)*data(:,2) - theta(2)^2*data(:,3);
rss = res'*res; % output for function is Residual Sum of Squares
end</source>
You should note that the first line of code inside the function calculates the residual vector <math>u</math> in the above model. The second line is a short form for the sum of squares (<math>u'u</math>). Calling this function with the dataset and any parameter vector will return the residual sum of squares in <source enclose="none">rss</source>. Save this function in an m-file named <source enclose="none">judgerss.m</source>.
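To convince yourself that the function works, you can call it from the command window with an arbitrary parameter vector (the values chosen here are purely illustrative):

<source>judgedata = xlsread('judge.xls');   % load the data (col1: y, col2: x1, col3: x2)
rss = judgerss([0;1],judgedata)     % RSS at the arbitrary point \beta_0 = 0, \beta_1 = 1</source>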
Now we need to write the code that sets up the range of values for the parameters, loops through all possible combinations, calculates the associated <source enclose="none">rss</source> and then establishes which combination of parameters delivers the minimum. First we need to load the data and set up the parameter grid (using the <source enclose="none">(start:step:end)</source> operation to create a regular grid). We should also prepare a matrix into which we save all the residual sums of squares. Lastly we need to think about how to establish which parameter combination is best. One way of doing so is to calculate all possible RSS values first and then find the minimum. The strategy we use here is that after every RSS calculation we check whether we have found a new optimum, and if so we save the associated parameters. This implies that as soon as we have finished looping through all possibilities, the optimal parameter values are ready. To do this we need a variable <source enclose="none">min_rss</source> that always stores the best RSS. Before the loop starts we need to initialise this with a value that we know will be beaten. Here that is the sum of squares of <math>y_t</math> (<source enclose="none">judgedata(:,1)'*judgedata(:,1)</source>). Then we need a <math>(2 \times 1)</math> vector in which we store the parameter values associated with that <source enclose="none">min_rss</source>, here called <source enclose="none">min_rss_beta</source>.
<source>clc % Clears the command window
clear all % Clears workspace
judgedata = xlsread('judge.xls'); % import data for judge example
% col1: y, col2: x1, col3: x2

grid_beta0 = (-3:0.01:3)'; % Set grid of values for parameters
grid_beta1 = (-3:0.01:3)'; % \beta_0 and \beta_1

% RSS for all possible par combinations will be saved in here
save_rss = zeros(length(grid_beta0),length(grid_beta1));

min_rss = judgedata(:,1)'*judgedata(:,1); % variable to store best RSS
min_rss_beta = [1;1]; % stores the values of \beta_0 and \beta_1 associated with min_rss (arbitrary starting values)</source>
Now we are ready to start the loops. As we have two parameters we need two nested loops in order to evaluate all possible combinations of parameter values. In the loop we call <source enclose="none">judgerss(beta,judgedata)</source>. Note that the data (<source enclose="none">judgedata</source>) remain the same throughout the code, but the parameter vector changes at every iteration. The result is saved straight into <source enclose="none">save_rss</source>. Also, before continuing to the next combination we check whether this latest RSS (<source enclose="none">save_rss(i0,i1)</source>) is the new best (smallest) value and, if so, we save it together with the associated parameter vector (<source enclose="none">min_rss_beta = beta</source>). If not we don’t do anything and just continue to the next combination (which is why the <source enclose="none">if</source> command has no <source enclose="none">else</source> part).
<source>for i0 = 1:size(grid_beta0,1) % loop through \beta_0
    for i1 = 1:size(grid_beta1,1) % loop through \beta_1

        beta = [grid_beta0(i0);grid_beta1(i1)]; % get \beta_0 and \beta_1 values
        save_rss(i0,i1) = judgerss(beta,judgedata); % calculate rss for this combination and save

        if save_rss(i0,i1) < min_rss % check if we found new minimum RSS
            min_rss = save_rss(i0,i1); % if yes save as new best
            min_rss_beta = beta; % and save the associated beta
        end

    end
end</source>
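As an aside, the alternative strategy mentioned above (calculate all possible RSS values first, then find the minimum) takes only a few lines once the loop has filled <source enclose="none">save_rss</source>. This is just a sketch: it applies <source enclose="none">min</source> to the vectorised matrix and uses <source enclose="none">ind2sub</source> to recover the row and column indices:

<source>[min_rss_alt,idx] = min(save_rss(:)); % smallest RSS and its linear index
[i0min,i1min] = ind2sub(size(save_rss),idx); % convert linear index to row/column indices
min_rss_beta_alt = [grid_beta0(i0min);grid_beta1(i1min)]; % associated parameter values</source>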
The very last thing you want to do is to display your results. The straightforward way to do so is:

<source>disp('optimal par value after grid search');
disp(min_rss_beta);
disp('Min RSS value');
disp(min_rss);</source>
which for this example should deliver:

<source>optimal par value after grid search
   -0.1400
    1.2400

Min RSS value
   16.2503</source>
One advantage of a grid search is that you automatically have all the information required for a contour plot, which you can use to visualise the problem the grid search just solved.
<source>v = [(16:0.1:16.9)';(17:1:51)']; % where to draw contour lines
contour(grid_beta1,grid_beta0,save_rss,v); figure(gcf) % columns of save_rss vary with \beta_1 (x-axis), rows with \beta_0 (y-axis)
xlabel('\beta_1');
ylabel('\beta_0');
title('Judge Contour plot');</source>
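If you also want to mark the optimum found by the grid search on this plot, a couple of extra lines suffice (a sketch reusing <source enclose="none">min_rss_beta</source> from above; recall that its first element is <math>\beta_0</math> and its second <math>\beta_1</math>):

<source>hold on; % add to the existing contour plot
plot(min_rss_beta(2),min_rss_beta(1),'rx'); % x-coordinate is \beta_1, y-coordinate is \beta_0
hold off;</source>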
Don’t worry about the details of this command at this stage. There is more detail on how to plot pictures in MATLAB in the [[Graphing]] section. It should produce the following picture.

This can be read like a topographical map with height lines. You can identify two peaks (although here they are really troughs, as we are looking for a minimum). The optimum is at the parameter values <math>\beta_0 = -0.14</math> and <math>\beta_1=1.24</math>.
The disadvantage of this method (just to repeat what was discussed in the theory section) is that it becomes extremely computationally intensive if you have many parameters. Also, you need to know that the optimal parameter values lie inside the pre-specified grid. Both of these concerns are addressed by the more general nonlinear optimisation routines below (although this advantage comes at a price; nothing is for free).
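To get a feel for how quickly the cost grows, note that the grid above uses 601 points per parameter, so a grid search over <math>k</math> parameters requires <math>601^k</math> evaluations of the objective function:

<source>n = length(grid_beta0); % 601 grid points per parameter
n^2 % two parameters: 361,201 RSS evaluations
n^5 % five parameters: roughly 7.8e13 RSS evaluations</source>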
= Gradient-based optimisation, <source enclose="none">fminunc</source> =

coming soon
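In the meantime, here is a minimal sketch of how the same problem could be handed to MATLAB's gradient-based routine <source enclose="none">fminunc</source> (the anonymous-function wrapper and the starting value <source enclose="none">[0;1]</source> are illustrative choices, not part of the original text):

<source>judgedata = xlsread('judge.xls'); % same data as above
f = @(theta) judgerss(theta,judgedata); % wrap judgerss so the routine only sees theta
[beta_hat,rss_hat] = fminunc(f,[0;1]) % gradient-based minimisation from an arbitrary starting value</source>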
= Non-gradient based, <source enclose="none">fminsearch</source> =
+ | |||
coming soon
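Similarly, a minimal sketch using MATLAB's derivative-free routine <source enclose="none">fminsearch</source> (Nelder-Mead simplex search), again with a purely illustrative starting value:

<source>f = @(theta) judgerss(theta,judgedata); % same objective as above
[beta_hat,rss_hat] = fminsearch(f,[0;1]) % non-gradient (simplex) minimisation</source>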