NonlinOptImp


Introduction

Here we will introduce the code used to find optimal parameters by means of nonlinear optimisation. In the first section we will introduce the grid-search methodology. The following sections will discuss the implementation of gradient-based and non-gradient-based nonlinear search algorithms.

Grid search

In the theory section we discussed the basic structure of the grid-search algorithm. The particular problem to be tackled here is to find the parameter vector [math]\beta = (\beta_0 ~ \beta_1)'[/math] for the model

[math]y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{1}^{2}x_{2,t}+u_{t}[/math]

that minimises the sum of squared residuals. The dataset used is available in media:judge.xls.

If you want to tackle this problem the first step is to write a small function that accepts the data ([math]y_t[/math], [math]x_{1,t}[/math] and [math]x_{2,t}[/math]) and values for the two parameters and returns the sum of squared residuals: rss = judgerss(theta,data), where theta represents the parameter vector and data the data matrix (here with three columns).

function rss = judgerss(theta,data)
% residual vector: res = y - theta(1) - theta(2)*x1 - theta(2)^2*x2
res = data(:,1) - theta(1) - theta(2)*data(:,2) - theta(2)^2*data(:,3);
rss = res'*res;     % output for function is Residual Sum of Squares
end

You should note that the first line of code inside the function calculates the residual vector [math]u[/math] in the above model. The second line is a short form for the sum of squared residuals ([math]u'u[/math]). Calling this function with the dataset and any parameter vector will return the residual sum of squares in rss. Save this function in an m-file named judgerss.m.
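To check that the function works you can call it with an arbitrary trial vector. A minimal sketch (this assumes the data have already been imported into judgedata as in the next listing; theta_test is just an illustrative guess):

theta_test = [0;1];                       % illustrative trial values for \beta_0 and \beta_1
rss_test = judgerss(theta_test,judgedata) % returns u'u for this trial vector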

Now we need to write the code that sets up the range of values for the parameters, loops through all possible combinations, calculates the associated RSS, and then establishes which combination of parameters delivers the minimum. First we need to load the data and set up the parameter grid (using the (start:step:end) operation to create a regular grid). We should also prepare a matrix into which we save all the residual sums of squares. Lastly we need to think about how to figure out what the best parameter combination is. One way of doing so is to calculate all possible RSS values and then find the minimum. The strategy we use here is that after every RSS calculation we check whether we have found a new optimum, and if so we save the associated parameters. That implies that as soon as we have looped through all possibilities we have the optimal parameter values ready. To do this we need a variable min_rss that always stores the best RSS so far. Before the loop starts we need to initialise this with a value that we know will be beaten. Here that is the sum of squares of [math]y_t[/math], (judgedata(:,1)'*judgedata(:,1)), which is the RSS of a model with all parameters equal to zero. Then we need a [math](2 \times 1)[/math] vector in which we store the parameter values associated with that min_rss, here called min_rss_beta.

clc             % Clears the command window
clear all       % Clears workspace
judgedata = xlsread('judge.xls');       % import data for judge example
                                        % col1: y, col2: x1, col3: x2

grid_beta0 = (-3:0.01:3)';      % Set grid of values for parameters
grid_beta1 = (-3:0.01:3)';      % \beta_0 and \beta_1

        % RSS for all possible par combinations will be saved in here
save_rss = zeros(length(grid_beta0),length(grid_beta1));

min_rss = judgedata(:,1)'*judgedata(:,1); % initial best RSS: sum of squares of y
min_rss_beta = [1;1];            % stores the values of grid_beta0 and grid_beta1 associated with min_rss

Now we are ready to start the loops. As we have two parameters we need two nested loops in order to evaluate all possible combinations of parameter values. In the loop we call judgerss(beta,judgedata). Note that the data (judgedata) remain the same throughout the code, but the parameter vector changes with every iteration. The result is saved straight into save_rss. Also, before continuing to the next combination we check whether this latest RSS (save_rss(i0,i1)) is the new best (smallest) value for the RSS, and if so we save it along with the associated parameter vector (min_rss_beta = beta). If not we don't do anything and just continue to the next combination (which is why the if command has no else part).

for i0 = 1:size(grid_beta0,1)       % loop through \beta_0
    for i1 = 1:size(grid_beta1,1)   % loop through \beta_1

        beta = [grid_beta0(i0);grid_beta1(i1)];     % get \beta_0 and \beta_1 values
        save_rss(i0,i1) = judgerss(beta,judgedata); % calculate rss for this combination and save

        if save_rss(i0,i1) < min_rss        % check if we found new minimum RSS
            min_rss = save_rss(i0,i1);      % if yes save as new best
            min_rss_beta = beta;            % and save the associated beta
        end

    end
end

The very last thing you want to do is to display your results. The straightforward way to do so is:

disp('optimal par value after grid search');
disp(min_rss_beta);
disp('Min RSS value');
disp(min_rss);

which for this example should deliver:

optimal par value after grid search
   -0.1400
    1.2400

Min RSS value
   16.2503

One advantage of a grid search is that you automatically have all the information required for a contour plot, which you can use to visualise the problem the grid search just solved.

v = [(16:0.1:16.9)';(17:1:51)']; % where to draw contour lines
contour(grid_beta1,grid_beta0,save_rss,v); figure(gcf)
        % columns of save_rss vary with \beta_1 (x-axis),
        % rows with \beta_0 (y-axis)
xlabel('\beta_1');
ylabel('\beta_0');
title('Judge Contour plot');

Don't worry about the details of this command at this stage. There is more detail on how to produce plots in MATLAB in the Graphing Section. It should produce the following picture.

This can be read like a topographical map with height lines. You can identify two peaks (although here they are really troughs as we are looking for a minimum). The optimum is at parameter values [math]\beta_0 = -0.14[/math] and [math]\beta_1=1.24[/math].
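If you want to mark this optimum on the contour plot, a minimal sketch (run straight after the plotting commands above, so that the figure is still active) is:

hold on                                     % add to the existing contour plot
plot(min_rss_beta(2),min_rss_beta(1),'rx'); % x-axis: \beta_1, y-axis: \beta_0
hold off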

The disadvantage of this method (just to repeat what was discussed in the theory section) is that it becomes extremely computationally intensive if you have many parameters. With the 601-point grids used above, two parameters already require 601^2 = 361,201 calls to judgerss, and each additional parameter multiplies that number by 601. Also, you need to know that the optimal parameter values lie inside the pre-specified grid. Both of these concerns are addressed by the more general nonlinear optimisation routines below (although this flexibility comes at a price; nothing is for free).

Gradient-based optimisation, fminunc

coming soon
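In the meantime, a minimal sketch of how MATLAB's gradient-based optimiser fminunc could be applied to the same problem might look as follows (it reuses judgerss and judgedata from above; the starting vector beta_start is an arbitrary choice):

beta_start = [1;1];                         % arbitrary starting values for the search
f = @(theta) judgerss(theta,judgedata);     % objective as a function of theta only
[beta_hat,min_rss] = fminunc(f,beta_start); % quasi-Newton search from beta_start
disp('optimal par value from fminunc');
disp(beta_hat);

Since the RSS surface above has two troughs, the parameter vector this returns can depend on the starting values chosen.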

Non-gradient based, fminsearch

coming soon
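In the meantime, a corresponding sketch using the derivative-free simplex routine fminsearch (again reusing judgerss and judgedata from above, with an arbitrary starting vector) could look like this:

beta_start = [1;1];     % arbitrary starting values for the search
[beta_hat,min_rss] = fminsearch(@(theta) judgerss(theta,judgedata),beta_start);
disp('optimal par value from fminsearch');
disp(beta_hat);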