MaxLik
Introduction
This section makes use of the nonlinear optimisation procedure described in the Section on nonlinear optimisation. There we saw how to make MATLAB minimise a function by finding the parameter vector that minimises an objective/target function. In Maximum Likelihood (ML) estimation the target function is a negative (log-)likelihood function: we want to choose the set of parameters for which the likelihood of the data is maximised (or, equivalently, the negative log-likelihood minimised).
Below we illustrate the use of maximum likelihood with a very simple example, a simple linear regression. Since we know that such a problem can be solved by OLS, we have a benchmark against which to compare the results we obtain from ML estimation. We will introduce the objective function that is to be minimised and then discuss how to obtain the relevant standard errors for the ML parameter estimates.
The example
The problem addressed here is to find the parameters of the following linear model:
[math]\begin{aligned} y_{t}&=\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t}+u_{t}\\ u_t&\sim N(0,\sigma^2)\end{aligned}[/math]
It is of course well known that if the data are stationary then the above model is best estimated by OLS, which delivers an estimate for [math]\mathbf{\beta}=(\beta_0~\beta_1~\beta_2)'[/math]. Once that is obtained, an estimate for [math]\sigma^2[/math] follows from the sample variance of the estimated regression residuals.
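For reference, here is how that OLS benchmark can be computed (a minimal sketch; it assumes y and x, with x including the constant column, are defined as in the section on calling fminunc below):
bOLS  = (x'*x)\(x'*y);                         % OLS coefficient estimates
resid = y - x*bOLS;                            % estimated regression residuals
s2OLS = (resid'*resid)/(length(y)-size(x,2));  % sample variance of the residuals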
When performing ML estimation one obtains estimates for all four parameters [math]\mathbf{\theta}=(\mathbf{\beta}'~\sigma)'[/math] in one go. It is well known (check for instance the two textbooks in the Literature listing at the bottom of this page) that ML estimates of [math]\mathbf{\theta}[/math] are asymptotically normally distributed, provided the model is correctly specified.
The reason we use this apparently simple example to introduce ML reflects an important programming point: when you programme something new, you should first apply your method to a problem whose solution you already know, so that you can check whether you actually obtain the correct result. Once that is established you can move on to more complicated or new problems.
The negative log-likelihood function
Before writing the objective function we need to recognise that the idea of ML estimation is to maximise the log-likelihood of the data conditional on the parameters, [math]\ln L(\mathbf{\theta}) = \sum_t \log p(y_t|\mathbf{x}_t,\mathbf{\theta})[/math]. For the above model it is easy to establish that [math]y_t[/math] conditional on [math]\mathbf{x}_t[/math] and [math]\mathbf{\theta}[/math] is normally distributed as follows:
[math]y_t \sim N(\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t},\sigma^2).[/math]
Therefore the density is as follows:
[math]p(y_t|\mathbf{x}_t,\mathbf{\theta})=\frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(-\frac{1}{2} \frac{(y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t})^2}{\sigma^2}\right).[/math]
When replacing [math]y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t}[/math] with [math]r_t[/math] and taking the log of the density we obtain:
[math]\log p(y_t|\mathbf{x}_t,\mathbf{\theta})=-\frac{1}{2}\log(2\pi) -\frac{1}{2}\log(\sigma^2)-\frac{r_t^2}{2 \sigma^2}.[/math]
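If you want to convince yourself that this expression is correct, you can compare it to MATLAB's built-in normal density for a trial value (a quick check, assuming the Statistics Toolbox function normpdf is available; the trial values are arbitrary):
yt  = 1.3; mu = 0.5; sig = 2;                            % arbitrary trial values
rt  = yt - mu;
ld1 = -0.5*log(2*pi) - 0.5*log(sig^2) - rt^2/(2*sig^2);  % formula above
ld2 = log(normpdf(yt,mu,sig));                           % should be identical to ld1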
This will be at the core of the negative log-likelihood function. We need the negative of the log-likelihood because the MATLAB optimisation algorithms (fminunc or fminsearch) minimise rather than maximise.
function nll = nll_lin( theta, data, vec )
% input: (i)   theta, coefficient vector; last element is the error sd
%        (ii)  data, data matrix; col 1: y, cols 2:end: explanatory
%              variables (incl. constant)
%        (iii) vec, 0 if the vector of negative log-likelihoods should be
%              returned, 1 if their sum should be returned
beta = theta(1:end-1);
sig  = abs(theta(end)) + 0.000001;  % this ensures a non-zero variance
y    = data(:,1);
x    = data(:,2:end);
res  = y - x*beta;                  % residuals for the given beta
nll  = (-0.5)*(-log(2*pi)-log(sig^2)-(res.^2/(sig^2)));  % neg log density per obs
if vec
    nll = sum(nll);
end
end
The variable nll is a vector that contains the negative log density for each observation. The function will either return this vector or its sum. The need to allow for both types of output will become clear when we discuss how to calculate the standard errors of the parameter estimates.
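To preview why the vector output matters: one common estimator of the covariance matrix of ML estimates is the outer product of gradients (OPG), which requires the score of each individual observation. The following is only a sketch; it assumes betaopt is the ML estimate obtained from the fminunc call shown below, datamat is the data matrix, and the step size for the numerical derivative is an arbitrary choice:
T = size(datamat,1); k = numel(betaopt);
g = zeros(T,k);                % score of each observation w.r.t. each parameter
h = 1e-5;                      % step size for the numerical derivative
for j = 1:k
    e = zeros(k,1); e(j) = h;
    g(:,j) = (nll_lin(betaopt+e,datamat,0)-nll_lin(betaopt-e,datamat,0))/(2*h);
end
se = sqrt(diag(inv(g'*g)));    % OPG standard errors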
You can see from the code that this function expects to be handed one parameter vector that contains all parameters, including the error term's standard deviation as the last element. The variance used in the definition of nll needs to be positive by virtue of being a variance. The MATLAB optimisation routine does not know this, and hence it is the job of the econometrician to ensure that the variance used in nll is indeed positive. This is why we define the parameter to be estimated as the standard deviation sig and then use sig^2 in the calculation; this will be positive even if MATLAB delivers a negative sig, which it may well do[1]. If this was all we wanted, the line in which we define sig as the last element of theta would merely be sig = theta(end). There is, however, one extra little snag: the variance should also be non-zero (as it appears in the denominator of nll). To ensure this we add a very small value to abs(theta(end)).
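As an aside, an alternative to the abs(.)-plus-small-constant trick, not used here but common in practice, is to treat the last element of theta as the log of the standard deviation and exponentiate it inside the function, since exp(.) is strictly positive by construction:
sig = exp(theta(end));   % strictly positive for any real value of theta(end)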
The above function should be saved in a file called nll_lin.m.
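Once saved, it is good practice to call the function once with sensible inputs to make sure it runs and returns a finite number. A minimal check on simulated data (the parameter values here are arbitrary):
T     = 200;
xsim  = [ones(T,1) randn(T,2)];               % constant plus two regressors
btrue = [1; 0.5; -0.3];                       % true coefficients
ysim  = xsim*btrue + 0.8*randn(T,1);          % true error sd = 0.8
check = nll_lin([btrue; 0.8],[ysim xsim],1)   % should return a finite scalar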
Calling fminunc
Calling the optimisation function works exactly as explained in the Section on nonlinear optimisation. I assume that you have imported your data and saved them in a vector y and a matrix x, where the latter has three columns: the first a vector of ones to represent the constant, the second the values for [math]x_1[/math] and the third the values for [math]x_2[/math].
datamat = [y x]; % define data matrix for use in nll_lin
theta0 = [mean(y); 0; 0; std(y)]; % this sets the initial parameter vector
options = optimset; % sets optimisation options to default
[betaopt] = fminunc(@nll_lin,theta0,options,datamat,1);
Recall that we need starting values for [math]\mathbf{\theta}[/math]. As discussed in the Section on nonlinear optimisation, you should choose reasonable starting values, as otherwise there is the risk that the optimisation algorithm finds a local rather than the global minimum. Therefore we use the average value of [math]y_t[/math] as the starting value for the constant and the standard deviation of [math]y_t[/math] as the starting value for the error standard deviation. These would indeed be the correct values if [math]x_{1,t}[/math] and [math]x_{2,t}[/math] were insignificant ([math]\beta_1 = \beta_2 = 0[/math]), which is why the starting values for [math]\beta_1[/math] and [math]\beta_2[/math] are set to 0.
Also note that the last two inputs into the fminunc function are datamat, the matrix containing the data, and a 1, which lets the function know that it should return the sum of the negative log-likelihoods.
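Note that passing extra arguments after options, as done above, is a legacy MATLAB convention. In current MATLAB releases the same effect is usually achieved with an anonymous function that fixes the extra arguments:
betaopt = fminunc(@(theta) nll_lin(theta,datamat,1),theta0,options);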
When you estimate this model you should obtain virtually the same parameter estimates as if you had estimated the model by OLS.
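A quick way to verify this is to print the two sets of estimates side by side (this sketch assumes bOLS and s2OLS were computed as in the OLS benchmark above; the variance estimates will differ slightly because ML divides by [math]T[/math] while OLS divides by [math]T-k[/math]):
disp([betaopt(1:end-1) bOLS]);   % coefficient estimates side by side
disp([betaopt(end)^2 s2OLS]);    % error variance estimates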
Literature
Hamilton J.D. (1994) Time Series Analysis, Princeton University Press, Section 5.7, as well as Judge G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.-C. Lee (1985) The Theory and Practice of Econometrics, John Wiley, Appendix B, give good introductions to the mechanics of nonlinear optimisation algorithms.
Martin V., Hurn S. and Harris D. (2012) Econometric Modelling with Time Series: Specification, Estimation and Testing (Themes in Modern Econometrics), Cambridge University Press, Chapter 3, gives an excellent introduction to nonlinear optimisation strategies.
Footnotes
- ↑ The optimisation algorithms fminunc and fminsearch will explore any value (positive or negative) to minimise the objective function. If you need constrained parameter values that cannot be obtained with the simple strategy used here, you should consider using fmincon.