NonlinOptTheory


Introduction

This section gives you an introduction to a very common problem in econometrics[1]. Recall what Ordinary Least Squares achieves: it minimises the sum of squared residuals [math]\sum_{t=1}^{T}\hat{u}_{t}^{2}[/math] in regression specifications of the following sort

[math]y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{2}x_{2,t}+u_{t}\text{.}[/math]

In other words, it finds those values of [math]\beta _{0}[/math], [math]\beta _{1}[/math] and [math]\beta _{2}[/math] that leave us with the smallest variation of the observed values [math]y_{t}[/math] around the model's predictions. In your basic econometrics lectures you learned that, provided this specification is linear in its parameters, you can find the optimal values for [math]\mathbf{\beta }=\left( \beta _{0}~\beta _{1}~\beta _{2}\right)' [/math] analytically, meaning there is a particular formula which gives you [math]\widehat{\mathbf{\beta }}_{OLS}[/math] as a function of the values observed for [math]y_{t}[/math], [math]x_{1,t}[/math] and [math]x_{2,t}[/math].
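For this linear case the analytical solution is the familiar OLS formula [math]\widehat{\mathbf{\beta }}_{OLS}=(X'X)^{-1}X'y[/math]. As a point of reference, a minimal MATLAB sketch of this calculation, assuming column vectors y, x1 and x2 of length T are already in the workspace, could look as follows:

% analytical OLS solution for the linear model above; y, x1 and x2 are
% assumed to be (T x 1) data vectors already in the workspace
T = length(y);
X = [ones(T,1) x1 x2];       % regressor matrix including a constant
beta_ols = (X'*X)\(X'*y);    % betahat_OLS = (X'X)^(-1) X'y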

Unfortunately there will be many other occasions where your model parameters cannot be found by means of OLS. These could be linear models, but you may like to find parameters that optimise a criterion other than the residual sum of squares (say you want to minimise some moment conditions) or it could be nonlinear models. What is required in such situations is a nonlinear optimisation algorithm. What does this mean? Well, essentially we will have to follow some trial and error strategy to find the parameter values which optimises the criterion function.

Nonlinear Least Squares

In order to introduce the problem we stick to the minimisation of residual sum of squares as our objective, but we want to do that in a nonlinear model. Consider the simple model:

[math]y_{t}=\beta _{0}+\beta _{1}x_{1,t}+\beta _{1}^{2}x_{2,t}+u_{t}[/math]

where the [math]u_{t}[/math] are independent and identically distributed random variables with mean [math]0[/math] and variance [math]\sigma ^{2}[/math]. (I call this function the Judge Function as it appears in a book by Judge and others). The nonlinear least-squares estimate for [math]\mathbf{\beta }[/math] is defined as that value of [math]\mathbf{\beta} = (\beta_0~\beta_1)'[/math] that minimises the residual sum of squares

[math]S(\mathbf{\beta })=\sum_{t=1}^{T}\hat{u}_{t}^{2}=\sum_{t=1}^{T}\left( y_{t}-\beta _{0}-\beta _{1}x_{1,t}-\beta _{1}^{2}x_{2,t}\right) ^{2}[/math]
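In MATLAB this objective can be written compactly as a function handle. The following is a sketch, again assuming data vectors y, x1 and x2 (each T x 1) are available; the name S is simply the one used in these sketches:

% residual sum of squares S(beta) for the Judge function, written as an
% anonymous function; y, x1 and x2 are assumed (T x 1) data vectors
S = @(beta) sum( (y - beta(1) - beta(2)*x1 - beta(2)^2*x2).^2 );

This handle will be reused below when a general-purpose optimiser is applied to the problem.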

The smallest achievable value for [math]S(\mathbf{\beta })[/math] is known as the global minimum, and the value of [math]\mathbf{\beta}[/math] that delivers that value is the optimal parameter. But how to find it? The next two sections will introduce two fundamentally different strategies. The first (grid search) is easy to understand but not implementable in complex problems. The second (nonlinear optimisation) is more generally applicable.

Grid Search

The simplest way of finding an estimate of [math]\mathbf{\beta }[/math] is what is known as a grid search. This procedure is a brute-force computational approach. One proposes a range of possible values for each parameter (here [math]\beta _{0}[/math] and [math]\beta _{1}[/math]), then simply computes [math]S(\mathbf{\beta })[/math] for all possible combinations of values of [math]\mathbf{\beta }[/math] and identifies the parameter combination for which [math]S(\mathbf{\beta })[/math] is minimised. A MATLAB sketch, again assuming that the data vectors y, x1 and x2 are already in the workspace, could look as follows:

% example grids; the ranges and step sizes are choices you have to make
beta0_range = (-2:0.01:2)';    % (n0 x 1) vector of possible values for beta0
beta1_range = (-2:0.01:2)';    % (n1 x 1) vector of possible values for beta1
n0 = length(beta0_range);
n1 = length(beta1_range);

save_S = zeros(n0,n1);

for i0 = 1:n0
    for i1 = 1:n1

        % calculate S as a function of beta0_range(i0) and beta1_range(i1)
        res = y - beta0_range(i0) - beta1_range(i1)*x1 - beta1_range(i1)^2*x2;
        save_S(i0,i1) = sum(res.^2);

    end
end

% find the minimum value of S in save_S; the values of beta0_range and
% beta1_range that are related to that minimum value are the optimal
% parameter values from the grid search
[~, idx] = min(save_S(:));
[i0min, i1min] = ind2sub(size(save_S), idx);
beta_grid = [beta0_range(i0min); beta1_range(i1min)];

As a general proposition the grid-search approach is not attractive, as it is computationally extremely intensive. It is easy to see that this procedure quickly becomes infeasible when you have many parameters and/or use a finer or larger grid for the parameters (with 100 grid points per parameter, a five-parameter problem already requires [math]100^{5}=10^{10}[/math] evaluations of [math]S(\mathbf{\beta })[/math]). You also have to rely on (that is, hope that) the optimal parameter values lying inside the ranges you chose for the parameters. Fortunately there are alternative optimisation algorithms which do not possess these shortcomings.

In the Implementation Section you can find an example of a grid search. For the above reasons grid search is rarely used as the optimisation algorithm of choice. However, it often has a role to play in finding starting values for a nonlinear optimisation algorithm, as sketched below.
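As an illustration of that division of labour, the following sketch hands the grid-search result beta_grid from the code above to MATLAB's derivative-free optimiser fminsearch as a starting value, using the function handle S defined earlier (both names are specific to these sketches):

% refine the grid-search result with a general-purpose optimiser;
% S and beta_grid come from the sketches above
[beta_hat, S_min] = fminsearch(S, beta_grid);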

General Nonlinear Optimisation Algorithms

Here I will provide a schematic representation of the procedure used to find optimal parameter estimates when analytical solutions are not available. Such algorithms are called nonlinear optimisation algorithms. Consider [math]S(\theta )[/math] to be the target function, the function that is to be minimised. The value of this function varies with the parameter vector [math]\theta [/math] and the goal is to find that parameter vector, [math]\widehat{\theta }[/math], which minimises [math]S(\theta )[/math]. In the above example [math]S(\theta )[/math] was the sum of squared residuals, [math]S(\theta )=\sum_{t=1}^{T}\hat{u}_{t}^{2}[/math], and the parameter vector was [math]\theta =\mathbf{\beta }[/math], but that is only a special case. Very commonly [math]S(\theta )[/math] represents a negative likelihood function, [math]S(\theta )=-L\left( \theta \right) [/math], where [math]L\left( \theta\right) [/math] is the likelihood function to be maximised (and hence [math]S\left(\theta \right) [/math] is to be minimised). More on this in a later section; for the time being you only need to realise that [math]S\left( \theta \right) [/math] is to be minimised, however it is defined.

In general, the task is to provide an iterative scheme of the form

[math]\begin{align} \widehat{\theta }^{\left( 1\right) } &= \widehat{\theta }^{\left( 0\right)}+\eta \\ \widehat{\theta }^{\left( 2\right) } &= \widehat{\theta }^{\left( 1\right)}+\eta \\ &\;\;\vdots \\ \widehat{\theta }^{\left( k+1\right) } &= \widehat{\theta }^{\left( k\right)}+\eta \\ &\;\;\vdots \end{align}[/math]
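The text does not yet specify how the adjustment [math]\eta [/math] is chosen at each step. Purely as an illustration of the mechanics, the sketch below uses one simple choice, a small move against a numerical gradient of [math]S\left( \theta \right) [/math]; the step size, derivative increment, stopping rule and starting value theta0 are all assumptions made for the sake of the example:

% a stylised iterative scheme theta(k+1) = theta(k) + eta, where eta is
% chosen as a small step in the direction of the negative numerical
% gradient of S; step size, tolerance and iteration limit are illustrative
theta = theta0;            % starting value, e.g. the grid-search result
step  = 1e-3;              % step size
h     = 1e-6;              % increment for numerical derivatives
for k = 1:1000
    g = zeros(size(theta));
    for j = 1:length(theta)            % numerical gradient of S at theta
        e    = zeros(size(theta));
        e(j) = h;
        g(j) = (S(theta + e) - S(theta - e)) / (2*h);
    end
    eta   = -step * g;                 % the adjustment eta for this iteration
    theta = theta + eta;
    if norm(eta) < 1e-8                % stop when the adjustment becomes tiny
        break
    end
end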

Literature

Hamilton, J.D. (1994) Time Series Analysis, Princeton, Section 5.7, as well as Judge, G.G., W.E. Griffiths, R.C. Hill, H. Lütkepohl and T.-C. Lee (1985) The Theory and Practice of Econometrics, John Wiley, Appendix B, give good introductions to the mechanics of nonlinear optimisation algorithms. Martin, V., S. Hurn and D. Harris (2012) Econometric Modelling with Time Series: Specification, Estimation and Testing (Themes in Modern Econometrics), Chapter 3, gives an excellent introduction to nonlinear optimisation strategies.

Footnotes

  1. I am indebted to Stan Hurn, School of Economics and Finance, Queensland University of Technology, who allowed me to use some of his lecture material.