MaxLik
Introduction
This section makes use of the nonlinear optimisation procedure described in the Section on nonlinear optimisation. There we saw how to make MATLAB minimise a function by finding the parameter vector that minimises an objective/target function. In Maximum Likelihood (ML) estimation the target function is a negative (log-)likelihood function: we want to choose the set of parameters for which the likelihood of the data is maximised (or, equivalently, the negative log-likelihood minimised).
Below we illustrate the use of maximum likelihood with a very simple example, a simple linear regression. Since we know that such a problem can be solved by OLS, we have a benchmark against which to compare the results we obtain from ML estimation. We will introduce the objective function that is to be minimised and then discuss how to obtain the relevant standard errors for the ML parameter estimates.
The example
The problem addressed here is to find the parameters of the following linear model:
[math]\begin{aligned} y_{t}&=\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t}+u_{t}\\ u_t&\sim N(0,\sigma^2)\end{aligned}[/math]
It is of course well known that if the data are stationary then the above model is best estimated by OLS, which delivers an estimate for [math]\mathbf{\beta}=(\beta_0~\beta_1~\beta_2)'[/math]. Once that is obtained, an estimate for [math]\sigma^2[/math] follows from the sample variance of the estimated regression residuals.
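For reference, here is how that OLS benchmark can be computed (a minimal sketch; it assumes y and x, with x including the constant column, are defined as in the section on calling fminunc below):
bOLS  = (x'*x)\(x'*y);                         % OLS coefficient estimates
resid = y - x*bOLS;                            % estimated regression residuals
s2OLS = (resid'*resid)/(length(y)-size(x,2));  % sample variance of the residuals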
When performing ML estimation one obtains estimates for all four parameters [math]\mathbf{\theta}=(\mathbf{\beta}'~\sigma)'[/math] in one go. It is well known (check for instance the two textbooks in the Literature listing at the bottom of this page) that ML estimates of [math]\mathbf{\theta}[/math] are asymptotically normally distributed, provided the model is correctly specified.
The reason we use this apparently simple example to introduce ML reflects an important programming point: when you programme something new, you should first apply your method to a problem whose solution you already know, so that you can check whether you actually obtain the correct result. Once that is established you can move on to more complicated or new problems.
The negative log-likelihood function
Before writing the objective function we need to recognise that the idea of ML estimation is to maximise the log-likelihood of the data conditional on the parameters, [math]\ln L(\mathbf{\theta}) = \sum_t \log p(y_t|\mathbf{x}_t,\mathbf{\theta})[/math]. For the above model it is easy to establish that [math]y_t[/math] conditional on [math]\mathbf{x}_t[/math] and [math]\mathbf{\theta}[/math] is normally distributed as follows:
[math]y_t \sim N(\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t},\sigma^2).[/math]
Therefore the density is as follows:
[math]p(y_t|\mathbf{x}_t,\mathbf{\theta})=\frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(-\frac{1}{2} \frac{(y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t})^2}{\sigma^2}\right).[/math]
When replacing [math]y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t}[/math] with [math]r_t[/math] and taking the log of the density we obtain:
[math]\log p(y_t|\mathbf{x}_t,\mathbf{\theta})=-\frac{1}{2}\log(2\pi) -\frac{1}{2}\log(\sigma^2)-\frac{r_t^2}{2 \sigma^2}.[/math]
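If you want to convince yourself that this expression is correct, you can compare it to MATLAB's built-in normal density for a trial value (a quick check, assuming the Statistics Toolbox function normpdf is available; the trial values are arbitrary):
yt  = 1.3; mu = 0.5; sig = 2;                            % arbitrary trial values
rt  = yt - mu;
ld1 = -0.5*log(2*pi) - 0.5*log(sig^2) - rt^2/(2*sig^2);  % formula above
ld2 = log(normpdf(yt,mu,sig));                           % should be identical to ld1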
This will be at the core of the negative log-likelihood function. We need the negative of the log-likelihood because the MATLAB optimisation algorithms (fminunc or fminsearch) minimise rather than maximise.
function nll = nll_lin( theta, data, vec )
% input: (i)   theta, coefficient vector; last element is the error sd
%        (ii)  data, data matrix; col 1: y, cols 2:end: explanatory
%              variables (incl. constant)
%        (iii) vec, 0 if the vector of negative log-likelihoods should be
%              returned, 1 if their sum should be returned
beta = theta(1:end-1);
sig  = abs(theta(end)) + 0.000001;  % this ensures a non-zero variance
y    = data(:,1);
x    = data(:,2:end);
res  = y - x*beta;                  % residuals for the given beta
nll  = (-0.5)*(-log(2*pi)-log(sig^2)-(res.^2/(sig^2)));  % neg log density per obs
if vec
    nll = sum(nll);
end
end
The variable nll is a vector that contains the negative log density for each observation. The function will either return this vector or its sum. The need to allow for both types of output will become clear when we discuss how to calculate the standard errors of the parameter estimates.
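To preview why the vector output matters: one common estimator of the covariance matrix of ML estimates is the outer product of gradients (OPG), which requires the score of each individual observation. The following is only a sketch; it assumes betaopt is the ML estimate obtained from the fminunc call shown below, datamat is the data matrix, and the step size for the numerical derivative is an arbitrary choice:
T = size(datamat,1); k = numel(betaopt);
g = zeros(T,k);                % score of each observation w.r.t. each parameter
h = 1e-5;                      % step size for the numerical derivative
for j = 1:k
    e = zeros(k,1); e(j) = h;
    g(:,j) = (nll_lin(betaopt+e,datamat,0)-nll_lin(betaopt-e,datamat,0))/(2*h);
end
se = sqrt(diag(inv(g'*g)));    % OPG standard errors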
You can see from the code that this function expects to be handed one parameter vector that contains all parameters, including the error term's standard deviation as the last element. The variance used in the definition of nll needs to be positive by virtue of being a variance. The MATLAB optimisation routine does not know this, and hence it is the job of the econometrician to ensure that the variance used in nll is indeed positive. This is why we define the parameter to be estimated as the standard deviation sig and then use sig^2 in the calculation; this will be positive even if MATLAB delivers a negative sig, which it may well do[1]. If this was all we wanted, the line in which we define sig as the last element of theta would merely be sig = theta(end). There is, however, one extra little snag: the variance should also be non-zero (as it appears in the denominator of nll). To ensure this we add a very small value to abs(theta(end)).
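As an aside, an alternative to the abs(.)-plus-small-constant trick, not used here but common in practice, is to treat the last element of theta as the log of the standard deviation and exponentiate it inside the function, since exp(.) is strictly positive by construction:
sig = exp(theta(end));   % strictly positive for any real value of theta(end)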
The above function should be saved in a file called nll_lin.m.
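Once saved, it is good practice to call the function once with sensible inputs to make sure it runs and returns a finite number. A minimal check on simulated data (the parameter values here are arbitrary):
T     = 200;
xsim  = [ones(T,1) randn(T,2)];               % constant plus two regressors
btrue = [1; 0.5; -0.3];                       % true coefficients
ysim  = xsim*btrue + 0.8*randn(T,1);          % true error sd = 0.8
check = nll_lin([btrue; 0.8],[ysim xsim],1)   % should return a finite scalar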
Calling fminunc
Calling the optimisation function works exactly as explained in the Section on nonlinear optimisation. I assume that you have imported your data and saved them in a vector y and a matrix x, where the latter has three columns: the first a vector of ones to represent the constant, the second the values for [math]x_1[/math] and the third the values for [math]x_2[/math].
datamat = [y x]; % define data matrix for use in nll_lin
theta0 = [mean(y); 0; 0; std(y)]; % this sets the initial parameter vector
options = optimset; % sets optimisation options to default
[betaopt] = fminunc(@nll_lin,theta0,options,datamat,1);
Recall that we need starting values for [math]\mathbf{\theta}[/math]. As discussed in the Section on nonlinear optimisation, you should choose reasonable starting values, as otherwise there is the risk that the optimisation algorithm finds a local rather than the global minimum. Therefore we use the average value of [math]y_t[/math] as the starting value for the constant and the standard deviation of [math]y_t[/math] as the starting value for the error standard deviation. These would indeed be the correct values if [math]x_{1,t}[/math] and [math]x_{2,t}[/math] were insignificant ([math]\beta_1 = \beta_2 = 0[/math]), which is why the starting values for [math]\beta_1[/math] and [math]\beta_2[/math] are set to 0.
Also note that the last two inputs into the fminunc function are datamat, the matrix containing the data, and a 1, which lets the function know that it should return the sum of the negative log-likelihoods.
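Note that passing extra arguments after options, as done above, is a legacy MATLAB convention. In current MATLAB releases the same effect is usually achieved with an anonymous function that fixes the extra arguments:
betaopt = fminunc(@(theta) nll_lin(theta,datamat,1),theta0,options);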
When you estimate this model you should obtain virtually the same parameter estimates as if you had estimated the model by OLS.
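A quick way to verify this is to print the two sets of estimates side by side (this sketch assumes bOLS and s2OLS were computed as in the OLS benchmark above; the variance estimates will differ slightly because ML divides by [math]T[/math] while OLS divides by [math]T-k[/math]):
disp([betaopt(1:end-1) bOLS]);   % coefficient estimates side by side
disp([betaopt(end)^2 s2OLS]);    % error variance estimates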
Literature
Hamilton J.D. (1994) Time Series Analysis, Princeton University Press, Section 5.7, as well as Judge G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.-C. Lee (1985) The Theory and Practice of Econometrics, John Wiley, Appendix B, give good introductions to the mechanics of nonlinear optimisation algorithms.
Martin V., Hurn S. and Harris D. (2012) Econometric Modelling with Time Series: Specification, Estimation and Testing (Themes in Modern Econometrics), Cambridge University Press, Chapter 3, gives an excellent introduction to nonlinear optimisation strategies.
Footnotes
- ↑ The optimisation algorithms fminunc and fminsearch will explore any value (positive or negative) to minimise the objective function. If you need constrained parameter values that cannot be obtained with the simple strategy used here, you should consider using fmincon.