MaxLik
Introduction
This section makes use of the nonlinear optimisation procedure described in the Section on nonlinear optimisation. There it was shown how MATLAB can minimise a function by finding the parameter vector that minimises an objective/target function. In Maximum Likelihood (ML) estimation the target function is the negative (log-)likelihood function. We want to choose the set of parameters that maximises the likelihood of the data (or, equivalently, minimises the negative log-likelihood).
Below we illustrate the use of maximum likelihood with a very simple example, a simple linear regression. Since we know that such a problem can be solved by means of OLS, we have a ready means of comparison for the results we obtain from ML estimation. We will introduce the objective function that is to be minimised and then discuss how to obtain the relevant standard errors for the ML parameter estimates.
The example
The problem addressed here is to find the parameters of the following linear model:
[math]\begin{aligned} y_{t}&=&\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t}+u_{t}\\ u_t&\sim&N(0,\sigma^2)\end{aligned}[/math]
It is of course well known that if the data are stationary then the above model is best estimated by OLS which will find an estimate for [math]\mathbf{\beta}=(\beta_0~\beta_1~\beta_2)'[/math]. Once that is obtained one can obtain an estimate for [math]\sigma^2[/math] by estimating the sample variance of the estimated regression residual.
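As a point of reference, these OLS estimates take only a few lines in MATLAB. The following sketch assumes that y is a (T x 1) vector of observations and x a (T x k) matrix of regressors whose first column is a constant (these variable names are illustrative):

```matlab
betahat = x\y;                  % OLS estimate; numerically preferable to inv(x'*x)*x'*y
res     = y - x*betahat;        % estimated regression residuals
sig2hat = (res'*res)/(length(y)-size(x,2));   % residual sample variance (df-adjusted)
```

Note that ML estimation will deliver a variance estimate that divides by T rather than T-k; in large samples the difference is negligible.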
When performing ML estimation one will obtain estimates for all four parameters [math]\mathbf{\theta}=(\mathbf{\beta}'~\sigma)'[/math] in one swoosh. It is well known (check for instance the textbooks in the Literature Listing at the bottom of this page) that ML estimates for [math]\mathbf{\theta}[/math] are asymptotically normally distributed, assuming that the model is correctly specified.
The reason why we use this apparently simple example to introduce ML is actually a very important programming point. When you programme something new you really want to apply your method to a problem to which you know the solution and therefore you can check whether you actually obtain what you know to be the correct solution. Once that is established you can explore either more complicated or new problems.
The negative log-likelihood function
Before writing the objective function we need to recognise that the idea of ML estimation is to maximise the log-likelihood of the data conditional on the parameters, [math]\ln L(\mathbf{\theta}) = \sum_t \log p(y_t|\mathbf{x}_t,\mathbf{\theta})[/math]. For the above model it is easy to establish that [math]y_t[/math] conditional on [math]\mathbf{x}_t[/math] and [math]\mathbf{\theta}[/math] is normally distributed as follows
[math]y_t \sim N(\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t},\sigma^2).[/math]
Therefore the density is as follows:
[math]p(y_t|\mathbf{x}_t,\mathbf{\theta})=\frac{1}{\sqrt{2 \pi \sigma^2}}\exp\left(-\frac{1}{2} \frac{(y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t})^2}{\sigma^2}\right).[/math]
When replacing [math]y_t-\beta _{0}-\beta _{1}x_{1,t}-\beta _{2}x_{2,t}[/math] with [math]r_t[/math] and taking the log of the density we obtain:
[math]ll_t=\log p(y_t|\mathbf{x}_t,\mathbf{\theta})=-\frac{1}{2}\log(2\pi) -\frac{1}{2}\log(\sigma^2)-\frac{r_t^2}{2 \sigma^2}.[/math]
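Summing the negative of [math]ll_t[/math] over all [math]T[/math] observations delivers the function that the optimiser will actually minimise:

[math]-\ln L(\mathbf{\theta})=\sum_{t=1}^{T}\left(\frac{1}{2}\log(2\pi)+\frac{1}{2}\log(\sigma^2)+\frac{r_t^2}{2\sigma^2}\right).[/math]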
This will be at the core of the negative log-likelihood function. We need the negative of the log-likelihood as the MATLAB optimisation algorithms (fminunc or fminsearch) minimise rather than maximise.
function nll = nll_lin(theta, data, vec)
% input: (i)   theta, coefficient vector, last element is the error sd
%        (ii)  data, (T x k) matrix, col 1: y, cols 2:end: explanatory
%              variables (incl. constant)
%        (iii) vec, 0 if the (T x 1) vector of negative log-likelihoods
%              should be returned, 1 if their sum should be returned
beta = theta(1:end-1);              % regression coefficients
sig  = abs(theta(end)) + 0.000001;  % ensures a non-zero standard deviation
y    = data(:,1);
x    = data(:,2:end);
res  = y - x*beta;                  % regression residuals r_t
nll  = (-0.5)*(-log(2*pi) - log(sig^2) - (res.^2/(sig^2)));  % -ll_t for each t
if vec
    nll = sum(nll);                 % scalar negative log-likelihood
end
end
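To see the function in action, the following sketch (with illustrative variable names; it assumes y, x1 and x2 are (T x 1) data vectors) estimates the model and allows a comparison with OLS:

```matlab
T      = length(y);                 % number of observations
data   = [y ones(T,1) x1 x2];       % col 1: y; cols 2-4: constant, x1, x2
theta0 = [0; 0; 0; 1];              % starting values: beta = (0,0,0)', sd = 1
opts   = optimset('MaxFunEvals',10000,'MaxIter',10000);
thetahat = fminsearch(@(theta) nll_lin(theta,data,1),theta0,opts);
betahat  = thetahat(1:end-1);       % should be very close to the OLS estimate x\y
sighat   = abs(thetahat(end));      % recall that nll_lin uses abs(theta(end))
```

The same function handle can be passed to fminunc, which can in addition return a numerical Hessian of the negative log-likelihood at the optimum; inverting that Hessian gives the asymptotic covariance matrix from which ML standard errors are obtained.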