ConNonlinOptImp

From ECLR
Jump to: navigation, search


Introduction

This Section is based on the Section in which we demonstrate how to implement an unconstrained optimisation (which is where the UNC in fminUNC) came from. This was called unconstrained, because we allowed the optimisation to find any parameter value as the optimal value. The optimal parameter value could be any real number on the real line.

This is appropriate on many occasions (and is equivalent to what happens when we estimate a linear regression model by OLS) but may be inappropriate in some:

  1. Some parameters only make sense if they are constraint to a subset of the real line. An example are variances or standard deviations which have to be positive.
  2. You want to estimate a restricted model where the restriction cannot be implemented by a rearrangement of the explanatory variables (e.g. the dropping of a variable from the model).
  3. You want to estimate a model where some model parameters are related to each other and you need to impose this relationship.

There are basically two ways in which you can implement restrictions. The first is by imposing the restrictions inside the function which is to be optimised (minimised), the second is to use a constrained optimisation routine.

Adjusting the target function

This is the easiest way to deal with constraints if at all possible. As an example we shall consider the case in which we estimate a linearasied regression function

[math]y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{1}^{2}x_{2,t}+u_{t}[/math]

by Maximum Lilkelihood. In that case we will have to specify the distribution of the error terms [math]u_t[/math] as that will determine the form of the target function. Let’s assume that [math]u_t[/math] come from a normal distribution with variance [math]\sigma^2[/math]. It is well known that the density of the normal distribution is

[math]f(u_{t})=\frac{1}{\sqrt{2\pi \sigma^2}}exp\left( \frac{u_t^2}{2\sigma^2}\right)[/math]

In practice we will have to replace [math]u_t[/math] with the estimate

[math]\widehat{u}_{t}=y_{t}-\beta _{0}-\beta _{1}x_{1,t}-\beta _{1}^{2}x_{2,t}[/math]

This means that [math]f(u_{t})[/math] is a function of all three parameters ([math]\beta _{0}[/math], [math]\beta _{1}[/math] and [math]\sigma[/math]). We want to minimise the negative log-likelihood function and the following target function is written to return just that.

function nll = judgeml(theta,data)
% this returns the negative log likelihood function for the Judge function
% input: (i) theta, parameter vector, 3rd value is standard deviation
%        (ii) data, data matrix

    % first calculate residual for res = y - theta(1) - theta(2)*x1 -
    % theta(2)^2*x2
res = data(:,1) - theta(1) - theta(2)*data(:,2) - theta(2)^2*data(:,3);

    % negative log likelihood
nlogl = 0.5*log(2*pi) + 0.5*log(theta(3)^2) + (res.^2)./(2*theta(3)^2);
nll = sum(nlogl);     % output for function is sum negative log likelihood

end

Why is this function an example for imposing a restriction through the target function? One of the parameters that needs to be estimated is the error variance [math]\sigma^2[/math]. That variance needs to be positive, besides, if it wasn’t the term [math]log(\sigma^2)[/math] was not computable. This is the reason why the parameter vector is defined to contain [math]\theta=(\beta _{0}~\beta _{1}~\sigma)'[/math] and not [math]\theta=(\beta _{0}~\beta _{1}~\sigma^2)'[/math]. Note that in the target function we actually never need [math]\sigma[/math], but always [math]\sigma^2[/math]. This is fortunate as in that way, [math]\sigma^2[/math] (i.e. theta(3)^ 2) will always be positive, even if the nonlinear optimisation routine (e.g. fminunc or fminsearch) suggests a negative value for [math]\sigma[/math] (i.e. theta(3)).

If we had submitted the variance into the judgeml(theta,data) as the third element in theta and had directly used theta(3) in the calculations it may well have happened that fminunc or fminsearch had attempted to use a negative value for that parameter, leading to nonsensical results (or even an error message in the code). Remember, the optimisation routines are just "stupid" trial and error machines who have no knowledge of the meaning of the parameters. If you had submitted the variance, you could rescue the situation by not using theta(3) directly in the calculation of the negative log likelihood, but rather abs(theta(3)).

Useful parameter transformation

Sometimes you need parameters to be constrained to certain regions. Examples:

  • A parameter should be positive (e.g. a variance parameter).
  • A parameter should take a value between 0 and 1.
  • A parameter should take a value between [math]a[/math] and [math]b[/math].

There are fairly straightforward ways to enforce such restrictions without having to resort to constrained optimisation algorithms (see below).

Let’s say a parameter ([math]\theta[/math]) should be positive (e.g. a variance parameter). We have seen in the above example for judgeml how to deal with this. We define the parameter that get’s handed into the function as the standard deviation and whenever we need the variance we use the squared value of the parameter handed in as standard deviation. But what if we need the standard deviation. In that case we use either the absolute value (abs(theta)) of the parameter handed in or we now take the square root of the variance (sqrt(theta2)). This will ensure that even if the optimisation algorithm hands in a negative value for the standard deviation, it will be treated as if it was positive. You just need to remember that if your algorithm returned an optimised negative standard deviation, that it really is a positive value.

Next assume that you parameter should be larger than a certain value, say [math]\theta \gt 5[/math]. It is then best to take the input theta and transform it into theta2 = 5 + exp(theta). Confirm yourself that theta2 will be larger than 5 for any value of theta on the real line.

Another common restriction you may be interested in is that a parameter should take values between 0 and 1. Now you need to find a function that transposes any value on the real line into that interval. We are in uck as we should be familiar with one example of such functions, cumulative density functions. So here is the trick. The parameter handed into the function is theta, but the parameter you really use it theta2 = normcdf(theta,0,1). This works as normcdf(theta,0,1) will deliver the cumulative (standard) normal density for value theta and that, by definition of the cdf is between 0 and 1.

One drawback of the above methods is that any transformations may complicate the calculation of standard errors. Say you restricted the parameter to be in the unit interval according to theta2 = normcdf(theta,0,1). This means that it really is the parameter theta2 which has some economic meaning and hence you would like any standard errors (and hence hypothesis tests) to apply to theta2 rather than theta. However, if you use the techniques described in the section describing how to estimate standard errors for Maximum Likelihood parameter estimates, then these methods will automatically calculate the standard error for theta (the parameter the optimisation black box sees) and not theta2. This means that you need to translate the standard error for theta into the one for theta2. The method used is called the Delta Method.

Using fmincon

There will be occasions where implementing parameter restrictions with the methods above is impossible or cumbersome, or you want to avoid the issues arising from having to translate the estimated standard errors. What we want is a bigger tool, an optimisation routine that can be implemented with similar ease as fminunc but at the same time respects parameter restrictions. Unsurprisingly this is not possible, but the nonlinear optimiser fmincon gets close. All we need is to tell MATLAB what the restrictions are and once we’ve done that the implementation is indeed as straightforward as the use of fminunc.

Let’s first review how we would optimise the judgeml function written above, being the one that requires a three parameter input, the last being the standard deviation. We would call it as follows

theta0 = [0;0;1];   % need to specify initial parameter vector, 3rd parameter is sigma
options = optimset('Display','iter');
thetaopt = fminunc(@judgeml,theta0,options,data);
disp('estimated parameters');
disp(thetaopt');

which will result in the following output

estimated parameters
   -0.1305    1.2355    0.9014

confirming that we get the right results. As discussed previously we made sure that negative inputs for the standard deviation parameter theta0(3) would not cause any problems as we always used the square value[1].

What would we do if we wanted to ensure that the optimisation routine does not hand in negative values for the standard deviation (as discussed that wouldn’t be necessary here)? We would use the fmincon optimisation routine in the following manner:

theta0 = [0;0;1];   % need to specify initial parameter vector, 3rd parameter is sigma
options = optimset('Display','iter');

            % specify constraints
A = [];
B = [];
Aeq = [];
Beq = [];
LB = [-inf -inf 0.00001];
UB = [inf inf inf];
NONLCON = [];

thetaopt = fmincon(@judgeml,theta0,A,B,Aeq,Beq,LB,UB,NONLCON,options,data);
disp('estimated parameters');
disp(thetaopt');

First look at the line in which we call the fmincon function. If you compare that to the line in which we call the fminunc routine you will see that the fmincon function demands the following seven extra inputs: A,B,Aeq,Beq,LB,UB,NONLCON. You will then see that all these inputs, which are all related to possible constraints are defined just above the fmincon call. Most of these have been defined as empty variables ([]), which means that they are not in use. The only ones used here are LB and UB representing lower and upper bounds. In each of these we have three values, one for each of the three coefficients. However, we only want to restrict the last, the standard deviation, and that should be restricted only such that it should not become negative. Hence we restrict it to be in the interval [math][0.00001,\inf][/math]. The first two coefficients should remain unrestricted and hence their intervals are [math][-\inf,\inf][/math] and [math][-\inf,\inf][/math].

When we run this code we get exactly the same parameter estimates as before plus a notice that none of the constraints is activated, meaning that the optimal parameter estimates are not in conflict with the imposed constraints.

Footnotes

  1. This is best illustrated if you change the starting value for the standard deviation from 1 to -1. What happens nothing much different, just that the estimated value for theta(3) is -0.9014, but you will realise that the estimated standard error is really 0.9014.