Maximum Likelihood Theory

8.1. Introduction

Consider a random sample $Z_1, \dots, Z_n$ from a $k$-variate distribution with density $f(z|\theta_0)$, where $\theta_0 \in \Theta \subset \mathbb{R}^m$ is an unknown parameter vector, with $\Theta$ a given parameter space. As is well known, owing to the independence of the $Z_j$'s, the joint density function of the random vector $Z = (Z_1^T, \dots, Z_n^T)^T$ is the product of the marginal densities, $\prod_{j=1}^n f(z_j|\theta_0)$. The likelihood function in this case is defined as this joint density with the nonrandom arguments $z_j$ replaced by the corresponding random vectors $Z_j$, and $\theta_0$ by $\theta$:

$$L_n(\theta) = \prod_{j=1}^n f(Z_j|\theta). \qquad (8.1)$$

The maximum likelihood (ML) estimator of $\theta_0$ is now $\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} L_n(\theta)$, or equivalently,

$$\hat{\theta} = \operatorname{argmax}_{\theta \in \Theta} \ln(L_n(\theta)), \qquad (8.2)$$

where “argmax” stands for the argument for which the function involved takes its maximum value.
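The maximization in (8.2) can also be carried out numerically. As a minimal sketch (the $N(\theta, 1)$ model, the sample size, and the grid are illustrative assumptions, not from the text), the following compares a grid-search argmax of the log-likelihood with the closed-form ML estimator for that model, the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sample: n i.i.d. draws from N(theta0, 1) with theta0 = 2 (assumed).
theta0 = 2.0
Z = rng.normal(theta0, 1.0, size=500)

def log_likelihood(theta, z):
    # ln L_n(theta) = sum_j ln f(Z_j | theta) for the N(theta, 1) model
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (z - theta) ** 2)

# Maximize ln L_n(theta) over a grid of candidate values, as in (8.2).
grid = np.linspace(0.0, 4.0, 4001)
theta_hat = grid[np.argmax([log_likelihood(t, Z) for t in grid])]

# For this model the ML estimator is the sample mean, so the two should agree
# up to the grid spacing.
print(theta_hat, Z.mean())
```

The grid search stands in for the general-purpose optimizers one would use in practice; the point is only that $\hat{\theta}$ is defined through the maximization of $\ln(L_n(\theta))$.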

The ML estimation method is motivated by the fact that, in this case,

$$E[\ln(L_n(\theta))] \le E[\ln(L_n(\theta_0))]. \qquad (8.3)$$

To see this, note that $\ln(u) = u - 1$ for $u = 1$ and $\ln(u) < u - 1$ for $0 < u < 1$ and $u > 1$. Therefore, if we take $u = f(Z_j|\theta)/f(Z_j|\theta_0)$, it follows that, for all $\theta$, $\ln(f(Z_j|\theta)/f(Z_j|\theta_0)) \le f(Z_j|\theta)/f(Z_j|\theta_0) - 1$, and if we take expectations it follows now that

$$E[\ln(f(Z_j|\theta)/f(Z_j|\theta_0))] \le E[f(Z_j|\theta)/f(Z_j|\theta_0)] - 1 \le 0,$$

because $E[f(Z_j|\theta)/f(Z_j|\theta_0)] = \int_{\{z:\, f(z|\theta_0) > 0\}} f(z|\theta)\,dz \le 1$.

Summing up for j = 1, 2,…,n, (8.3) follows.
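The inequality (8.3) can be checked numerically for a concrete model. The sketch below (assuming, purely for illustration, the $N(\theta, 1)$ model with $\theta_0 = 1$) approximates $E[\ln f(Z_j|\theta)] = \int \ln f(z|\theta)\, f(z|\theta_0)\, dz$ on a grid and confirms that it is maximized at $\theta = \theta_0$:

```python
import numpy as np

# Illustrative model (an assumption, not from the text): Z ~ N(theta0, 1).
theta0 = 1.0
z = np.linspace(theta0 - 10, theta0 + 10, 200001)  # integration grid
dz = z[1] - z[0]
f0 = np.exp(-0.5 * (z - theta0) ** 2) / np.sqrt(2 * np.pi)  # true density

def expected_loglik(theta):
    # E[ln f(Z|theta)] = integral of ln f(z|theta) * f(z|theta0) dz (Riemann sum)
    logf = -0.5 * np.log(2 * np.pi) - 0.5 * (z - theta) ** 2
    return np.sum(logf * f0) * dz

vals = {t: expected_loglik(t) for t in [0.0, 0.5, 1.0, 1.5, 2.0]}
# The maximum over the candidate values should occur at theta = theta0 = 1.0.
print(vals)
```

Summing such expectations over $j$ is exactly how (8.3) is obtained from the single-observation inequality.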

This argument reveals that neither the independence assumption on the data $Z = (Z_1^T, \dots, Z_n^T)^T$ nor the absolute continuity assumption is necessary for (8.3). The only condition that matters is that

$$E[L_n(\theta)/L_n(\theta_0)] \le 1 \qquad (8.4)$$

for all $\theta \in \Theta$ and $n \ge 1$. Moreover, if the support of $Z_j$ is not affected by the parameters in $\theta_0$ – that is, if in the preceding case the set $\{z \in \mathbb{R}^k : f(z|\theta) > 0\}$ is the same for all $\theta \in \Theta$ – then the inequality in (8.4) becomes an equality:

$$E[L_n(\theta)/L_n(\theta_0)] = 1 \qquad (8.5)$$

for all $\theta \in \Theta$ and $n \ge 1$. Equality (8.5) is the most common case in econometrics.
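The role of the support condition can be illustrated with a model whose support does depend on $\theta$. The following is a hedged sketch (the Uniform$(0, \theta)$ model and all numbers are assumptions, not from the text): if $Z_j \sim$ Uniform$(0, \theta_0)$, then for $\theta > \theta_0$ the likelihood ratio has expectation $(\theta_0/\theta)^n < 1$, so (8.4) holds with strict inequality and the equality (8.5) fails:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model for illustration: Z_j ~ Uniform(0, theta0), whose support
# [0, theta] varies with theta, so the support condition for (8.5) is violated.
theta0, n, reps = 1.0, 5, 100_000
Z = rng.uniform(0.0, theta0, size=(reps, n))

def likelihood_ratio(theta):
    # L_n(theta)/L_n(theta0) = (theta0/theta)^n * 1{all Z_j <= theta}
    inside = np.all(Z <= theta, axis=1)
    return (theta0 / theta) ** n * inside

r_small = likelihood_ratio(0.8).mean()  # theta < theta0: expectation is still 1
r_large = likelihood_ratio(2.0).mean()  # theta > theta0: (1/2)^5 = 0.03125 < 1
print(r_small, r_large)
```

For $\theta > \theta_0$ the indicator is always one, so the Monte Carlo average is exactly $(\theta_0/\theta)^n$; for $\theta < \theta_0$ the truncation of the support is exactly offset by the larger density, and the expectation remains one.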

To show that absolute continuity is not essential for (8.3), suppose that the $Z_j$'s are independent and identically discrete distributed with support $S$; that is, for all $z \in S$, $P[Z_j = z] > 0$ and $\sum_{z \in S} P[Z_j = z] = 1$. Moreover, now let $f(z|\theta_0) = P[Z_j = z]$, where $f(z|\theta)$ is the probability model involved. Of course, $f(z|\theta)$ should be specified such that $\sum_{z \in S} f(z|\theta) = 1$ for all $\theta \in \Theta$. For example, suppose that the $Z_j$'s are independent Poisson($\theta_0$) distributed, and thus $f(z|\theta) = e^{-\theta}\theta^z/z!$ and $S = \{0, 1, 2, \dots\}$. Then the likelihood function involved also takes the form (8.1), and

$$E[f(Z_j|\theta)/f(Z_j|\theta_0)] = \sum_{z \in S} \frac{f(z|\theta)}{f(z|\theta_0)}\, f(z|\theta_0) = \sum_{z \in S} f(z|\theta) = 1;$$

hence, (8.5) holds in this case as well and therefore so does (8.3).
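For the Poisson example, the chain of equalities above can be verified numerically. In the sketch below the infinite support is truncated at $z = 99$, which leaves negligible mass, and the particular values $\theta_0 = 3$ and $\theta = 5$ are illustrative assumptions:

```python
from math import exp, factorial

# Assumed parameter values for illustration.
theta0, theta = 3.0, 5.0

def f(z, th):
    # Poisson probability model: f(z|theta) = exp(-theta) * theta^z / z!
    return exp(-th) * th ** z / factorial(z)

support = range(100)  # truncated support; the tail beyond z = 99 is negligible
ratio_expectation = sum(f(z, theta) / f(z, theta0) * f(z, theta0) for z in support)
total_mass = sum(f(z, theta) for z in support)
print(ratio_expectation, total_mass)  # both sums should be (numerically) 1
```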

In this and the previous case the likelihood function takes the form of a product. However, in the dependent case we can also write the likelihood function as a product. For example, let $Z = (Z_1^T, \dots, Z_n^T)^T$ be absolutely continuously distributed with joint density $f_n(z_n, \dots, z_1|\theta_0)$, where the $Z_j$'s are no longer independent. It is always possible to decompose a joint density as a product of conditional densities and an initial marginal density. In particular, letting, for $t \ge 2$,

$$f_t(z_t|z_{t-1}, \dots, z_1, \theta) = f_t(z_t, \dots, z_1|\theta)/f_{t-1}(z_{t-1}, \dots, z_1|\theta),$$

we can write

$$f_n(z_n, \dots, z_1|\theta) = f_1(z_1|\theta) \prod_{t=2}^n f_t(z_t|z_{t-1}, \dots, z_1, \theta).$$

Therefore, the likelihood function in this case can be written as

$$L_n(\theta) = f_n(Z_n, \dots, Z_1|\theta) = f_1(Z_1|\theta) \prod_{t=2}^n f_t(Z_t|Z_{t-1}, \dots, Z_1, \theta). \qquad (8.6)$$

It is easy to verify that in this case (8.5) also holds, and therefore so does (8.3). Moreover, it follows straightforwardly from (8.6) and the preceding argument that

$$E\!\left[\left.\frac{L_t(\theta)/L_{t-1}(\theta)}{L_t(\theta_0)/L_{t-1}(\theta_0)}\,\right|\, Z_{t-1}, \dots, Z_1\right] = 1 \quad \text{for } t = 2, 3, \dots, n;$$

hence,

$$P\big(E[\ln(L_t(\theta)/L_{t-1}(\theta)) - \ln(L_t(\theta_0)/L_{t-1}(\theta_0)) \mid Z_{t-1}, \dots, Z_1] \le 0\big) = 1 \quad \text{for } t = 2, 3, \dots, n.$$

Of course, these results hold in the independent case as well.
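The decomposition (8.6) can be checked numerically on a concrete dependent process. In the sketch below (a stationary Gaussian AR(1) process, chosen purely for illustration and not taken from the text), the joint $n$-variate normal log-density coincides with the sum of the initial marginal and the conditional log-densities:

```python
import numpy as np

# Assumed illustrative process: Z_t = rho*Z_{t-1} + eps_t, eps_t ~ N(0, 1),
# with Z_1 drawn from the stationary distribution N(0, 1/(1 - rho^2)).
rho, n = 0.6, 8
rng = np.random.default_rng(2)
z = np.empty(n)
z[0] = rng.normal(0.0, 1.0 / np.sqrt(1 - rho ** 2))
for t in range(1, n):
    z[t] = rho * z[t - 1] + rng.normal()

def norm_logpdf(x, mean, var):
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

# Right-hand side of (8.6): initial marginal times the conditional densities,
# here f_t(z_t | z_{t-1}, ..., z_1, theta) = N(rho * z_{t-1}, 1).
loglik_cond = norm_logpdf(z[0], 0.0, 1.0 / (1 - rho ** 2))
loglik_cond += sum(norm_logpdf(z[t], rho * z[t - 1], 1.0) for t in range(1, n))

# Left-hand side: the joint n-variate normal density with covariance
# Sigma[i, j] = rho^|i-j| / (1 - rho^2).
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
sign, logdet = np.linalg.slogdet(Sigma)
loglik_joint = -0.5 * (n * np.log(2 * np.pi) + logdet + z @ np.linalg.solve(Sigma, z))

print(loglik_cond, loglik_joint)  # the two evaluations should coincide
```

In ML estimation of time series models it is precisely the conditional form on the right-hand side that is evaluated, since the conditional densities are usually much simpler than the joint density.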