ConNonlinOptImp
Contents
Introduction
This Section is based on the Section in which we demonstrate how to implement an unconstrained optimisation (which is where the UNC in fminUNC) came from. This was called unconstrained, because we allowed the optimisation to find any parameter value as the optimal value. The optimal parameter value could be any real number on the real line.
This is appropriate on many occasions (and is equivalent to what happens when we estimate a linear regression model by OLS) but may be inappropriate in some:
- Some parameters only make sense if they are constraint to a subset of the real line. An example are variances or standard deviations which have to be positive.
- You want to estimate a restricted model where the restriction cannot be implemented by a rearrangement of the explanatory variables (e.g. the dropping of a variable from the model).
- You want to estimate a model where some model parameters are related to each other and you need to impose this relationship.
There are basically two ways in which you can implement restrictions. The first is by imposing the restrictions inside the function which is to be optimised (minimised), the second is to use a constrained optimisation routine.
Adjusting the target function
This is the easiest way to deal with constraints if at all possible. As an example we shall consider the case in which we estimate a linearasied regression function
[math]y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{1}^{2}x_{2,t}+u_{t}[/math]
by Maximum Lilkelihood. In that case we will have to specify the distribution of the error terms [math]u_t[/math] as that will determine the form of the target function. Let’s assume that [math]u_t[/math] come from a normal distribution with variance [math]\sigma^2[/math]. It is well known that the density of the normal distribution is
[math]f(u_{t})=\frac{1}{\sqrt{2\pi \sigma^2}}exp\left( \frac{u_t^2}{2\sigma^2}\right)[/math]
In practice we will have to replace [math]u_t[/math] with the estimate
[math]\widehat{u}_{t}=y_{t}-\beta _{0}-\beta _{1}x_{1,t}-\beta _{1}^{2}x_{2,t}[/math]
This means that [math]f(u_{t})[/math] is a function of all three parameters ([math]\beta _{0}[/math], [math]\beta _{1}[/math] and [math]\sigma[/math]). We want to minimise the negative log-likelihood function and the following target function is written to return just that.
function nll = judgeml(theta,data)
% this returns the negative log likelihood function for the Judge function
% input: (i) theta, parameter vector, 3rd value is standard deviation
% (ii) data, data matrix
% first calculate residual for res = y - theta(1) - theta(2)*x1 -
% theta(2)^2*x2
res = data(:,1) - theta(1) - theta(2)*data(:,2) - theta(2)^2*data(:,3);
% negative log likelihood
nlogl = 0.5*log(2*pi) + 0.5*log(theta(3)^2) + (res.^2)./(2*theta(3)^2);
nll = sum(nlogl); % output for function is sum negative log likelihood
end
Why is this function an example for imposing a restriction through the target function? One of the parameters that needs to be estimated is the error variance [math]\sigma^2[/math]. That variance needs to be positive, besides, if it wasn’t the term [math]log(\sigma^2)[/math] was not computable. This is the reason why the parameter vector is defined to contain [math]\theta=(\beta _{0}~\beta _{1}~\sigma)'[/math] and not [math]\theta=(\beta _{0}~\beta _{1}~\sigma^2)'[/math]. Note that in the target function we actually never need [math]\sigma[/math], but always [math]\sigma^2[/math]. This is fortunate as in that way, [math]\sigma^2[/math] (i.e. theta(3)^ 2
) will always be positive, even if the nonlinear optimisation routine (e.g. fminunc
or fminsearch
) suggests a negative value for [math]\sigma[/math] (i.e. theta(3)
).
If we had submitted the variance into the judgeml(theta,data)
as the third element in theta
and had directly used theta(3)
in the calculations it may well have happened that fminunc
or fminsearch
had attempted to use a negative value for that parameter, leading to nonsensical results (or even an error message in the code). Remember, the optimisation routines are just "stupid" trial and error machines who have no knowledge of the meaning of the parameters. If you had submitted the variance, you could rescue the situation by not using theta(3)
directly in the calculation of the negative log likelihood, but rather abs(theta(3))
.
Useful parameter transformation
Sometimes you need parameters to be constrained to certain regions. Examples:
- A parameter should be positive (e.g. a variance parameter).
- A parameter should take a value between 0 and 1.
- A parameter should take a value between [math]a[/math] and [math]b[/math].
There are fairly straightforward ways to enforce such restrictions without having to resort to constrained optimisation algorithms (see below).
Let’s say a parameter ([math]\theta[/math]) should be positive (e.g. a variance parameter). We have seen in the above example for judgeml
how to deal with this. We define the parameter that get’s handed into the function as the standard deviation and whenever we need the variance we use the squared value of the parameter handed in as standard deviation. But what if we need the standard deviation. In that case we use either the absolute value (abs(theta)
) of the parameter handed in or we now take the square root of the variance (sqrt(theta2)
). This will ensure that even if the optimisation algorithm hands in a negative value for the standard deviation, it will be treated as if it was positive. You just need to remember that if your algorithm returned an optimised negative standard deviation, that it really is a positive value.
Next assume that you parameter should be larger than a certain value, say [math]\theta \gt 5[/math]. It is then best to take the input theta
and transform it into theta2 = 5 + exp(theta)
. Confirm yourself that theta2
will be larger than 5 for any value of theta
on the real line.
Another common restriction you may be interested in is that a parameter should take values between 0 and 1. Now you need to find a function that transposes any value on the real line into that interval. We are in uck as we should be familiar with one example of such functions, cumulative density functions. So here is the trick. The parameter handed into the function is theta
, but the parameter you really use it theta2 = normcdf(theta,0,1)
. This works as normcdf(theta,0,1)
will deliver the cumulative (standard) normal density for value theta
and that, by definition of the cdf is between 0 and 1.
One drawback of the above methods is that any transformations may complicate the calculation of standard errors. Say you restricted the parameter to be in the unit interval according to theta2 = normcdf(theta,0,1)
. This means that it really is the parameter theta2
which has some economic meaning and hence you would like any standard errors (and hence hypothesis tests) to apply to theta2
rather than theta
. However, if you use the techniques described in the section describing how to estimate standard errors for Maximum Likelihood parameter estimates, then these methods will automatically calculate the standard error for theta
(the parameter the optimisation black box sees) and not theta2
. This means that you need to translate the standard error for theta
into the one for theta2
. The method used is called the Delta Method.
Using fmincon
There will be occasions where implementing parameter restrictions with the methods above is impossible or cumbersome, or you want to avoid the issues arising from having to translate the estimated standard errors. What we want is a bigger tool, an optimisation routine that can be implemented with similar ease as fminunc
but at the same time respects parameter restrictions. Unsurprisingly this is not possible, but the nonlinear optimiser fmincon
gets close. All we need is to tell MATLAB what the restrictions are and once we’ve done that the implementation is indeed as straightforward as the use of fminunc
.
Let’s first review how we would optimise the judgeml
function written above, being the one that requires a three parameter input, the last being the standard deviation. We would call it as follows
theta0 = [0;0;1]; % need to specify initial parameter vector, 3rd parameter is sigma
options = optimset('Display','iter');
thetaopt = fminunc(@judgeml,theta0,options,data);
disp('estimated parameters');
disp(thetaopt');
which will result in the following output
estimated parameters
-0.1305 1.2355 0.9014
confirming that we get the right results. As discussed previously we made sure that negative inputs for the standard deviation parameter theta0(3)
would not cause any problems as we always used the square value[1].
What would we do if we wanted to ensure that the optimisation routine does not hand in negative values for the standard deviation (as discussed that wouldn’t be necessary here)? We would use the fmincon
optimisation routine in the following manner:
theta0 = [0;0;1]; % need to specify initial parameter vector, 3rd parameter is sigma
options = optimset('Display','iter');
% specify constraints
A = [];
B = [];
Aeq = [];
Beq = [];
LB = [-inf -inf 0.00001];
UB = [inf inf inf];
NONLCON = [];
thetaopt = fmincon(@judgeml,theta0,A,B,Aeq,Beq,LB,UB,NONLCON,options,data);
disp('estimated parameters');
disp(thetaopt');
First look at the line in which we call the fmincon
function. If you compare that to the line in which we call the fminunc
routine you will see that the fmincon
function demands the following seven extra inputs: A,B,Aeq,Beq,LB,UB,NONLCON
. You will then see that all these inputs, which are all related to possible constraints are defined just above the fmincon
call. Most of these have been defined as empty variables ([]
), which means that they are not in use. The only ones used here are LB
and UB
representing lower and upper bounds. In each of these we have three values, one for each of the three coefficients. However, we only want to restrict the last, the standard deviation, and that should be restricted only such that it should not become negative. Hence we restrict it to be in the interval [math][0.00001,\inf][/math]. The first two coefficients should remain unrestricted and hence their intervals are [math][-\inf,\inf][/math] and [math][-\inf,\inf][/math].
When we run this code we get exactly the same parameter estimates as before plus a notice that none of the constraints is activated, meaning that the optimal parameter estimates are not in conflict with the imposed constraints.
Footnotes
- ↑ This is best illustrated if you change the starting value for the standard deviation from 1 to -1. What happens nothing much different, just that the estimated value for
theta(3)
is -0.9014, but you will realise that the estimated standard error is really 0.9014.