Methods of Estimation

Consider a Normal distribution with mean μ and variance σ². This is the important "Gaussian" distribution, which is symmetric and bell-shaped and completely determined by its measure of centrality, its mean μ, and its measure of dispersion, its variance σ². μ and σ² are called the population parameters. Draw a random sample X1,…,Xn independent and identically distributed (IID) from this population. We usually estimate μ by μ̂ = X̄ and σ² by

s² = Σᵢ₌₁ⁿ(Xi − X̄)²/(n − 1).

For example, μ = mean income of a household in Houston. X̄ = sample average of incomes of 100 households randomly interviewed in Houston.
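As a quick numerical sketch (the income figures below are made up for illustration, not taken from the text), the two estimators can be computed directly:

```python
# Hypothetical incomes (in dollars) for a small interviewed sample.
incomes = [52000.0, 61000.0, 47500.0, 70250.0, 58300.0]

n = len(incomes)
x_bar = sum(incomes) / n                               # mu-hat = X-bar
s2 = sum((x - x_bar) ** 2 for x in incomes) / (n - 1)  # s^2 with the n - 1 divisor

print(x_bar, s2)  # prints 57810.0 76305500.0
```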

This estimator of μ could have been obtained by either of the following two methods of estimation:

(i) Method of Moments

Simply stated, this method of estimation uses the following rule: keep equating population moments to their sample counterparts until you have estimated all the population parameters.

B. H. Baltagi, Econometrics, Springer Texts in Business and Economics, DOI 10.1007/978-3-642-20059-5_2, © Springer-Verlag Berlin Heidelberg 2011

The normal density is completely identified by μ and σ², hence only the first two equations are needed:

μ̂ = X̄ and μ̂² + σ̂² = Σᵢ₌₁ⁿ Xi²/n

Substituting the first equation into the second, one obtains

σ̂² = Σᵢ₌₁ⁿ Xi²/n − X̄² = Σᵢ₌₁ⁿ(Xi − X̄)²/n
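These two moment equations can be checked numerically. This is a minimal sketch with made-up data; the variable names are illustrative, not the text's:

```python
import math

# Method of moments for the normal: equate the first two population moments
# to their sample counterparts and solve for mu and sigma^2.
data = [2.1, 3.4, 1.9, 2.8, 3.0, 2.5]
n = len(data)

m1 = sum(data) / n                 # first sample moment, X-bar
m2 = sum(x * x for x in data) / n  # second sample moment

mu_hat = m1                        # from mu = X-bar
sigma2_hat = m2 - m1 ** 2          # from mu^2 + sigma^2 = sum(X_i^2)/n

# The shortcut agrees with the direct average of squared deviations:
direct = sum((x - mu_hat) ** 2 for x in data) / n
assert math.isclose(sigma2_hat, direct)
```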

(ii) Maximum Likelihood Estimation (MLE)

For a random sample of size n from the Normal distribution Xi ~ N(μ, σ²), we have

fi(Xi; μ, σ²) = (1/σ√2π) exp{−(Xi − μ)²/2σ²},  −∞ < Xi < +∞

Since X1,…,Xn are independent and identically distributed, the joint probability density function is given as the product of the marginal probability density functions:

f(X1,…,Xn; μ, σ²) = Πᵢ₌₁ⁿ fi(Xi; μ, σ²) = (1/2πσ²)^(n/2) exp{−Σᵢ₌₁ⁿ(Xi − μ)²/2σ²}    (2.1)

Usually, we observe only one sample of n households, which could have been generated by any pair (μ, σ²) with −∞ < μ < +∞ and σ² > 0. For each pair, say (μ₀, σ₀²), f(X1,…,Xn; μ₀, σ₀²) denotes the probability (or likelihood) of obtaining that sample. By varying (μ, σ²) we get different probabilities of obtaining this sample. Intuitively, we choose the values of μ and σ² that maximize the probability of obtaining this sample. Mathematically, we treat f(X1,…,Xn; μ, σ²) as L(μ, σ²) and we call it the likelihood function. Maximizing L(μ, σ²) with respect to μ and σ², one gets the first-order conditions of maximization:

(∂L/∂μ) = 0 and (∂L/∂σ²) = 0

Equivalently, we can maximize logL(μ, σ²) rather than L(μ, σ²) and still get the same answer. Usually, this monotonic transformation of the likelihood is easier to maximize, and the first-order conditions become

(∂logL/∂μ) = 0 and (∂logL/∂σ²) = 0

For the Normal distribution example, we get

logL(μ, σ²) = −(n/2)log σ² − (n/2)log 2π − (1/2σ²) Σᵢ₌₁ⁿ(Xi − μ)²

∂logL(μ, σ²)/∂μ = (1/σ²) Σᵢ₌₁ⁿ(Xi − μ) = 0 ⇒ μ̂_MLE = X̄

∂logL(μ, σ²)/∂σ² = −(n/2)(1/σ²) + Σᵢ₌₁ⁿ(Xi − μ)²/2σ⁴ = 0

⇒ σ̂²_MLE = Σᵢ₌₁ⁿ(Xi − μ̂_MLE)²/n = Σᵢ₌₁ⁿ(Xi − X̄)²/n
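As a numerical check (a sketch with made-up data, not part of the text), one can verify that the closed-form solutions of these first-order conditions do maximize the log-likelihood:

```python
import math

# Normal log-likelihood and its closed-form maximizers, checked numerically.
data = [1.2, 0.7, 1.9, 1.4, 0.9, 1.6]
n = len(data)

def log_lik(mu, sigma2):
    # logL = -(n/2) log sigma^2 - (n/2) log 2pi - (1/2 sigma^2) sum (X_i - mu)^2
    rss = sum((x - mu) ** 2 for x in data)
    return (-(n / 2) * math.log(sigma2)
            - (n / 2) * math.log(2 * math.pi)
            - rss / (2 * sigma2))

mu_mle = sum(data) / n                                # X-bar
sigma2_mle = sum((x - mu_mle) ** 2 for x in data) / n # divisor n, not n - 1

best = log_lik(mu_mle, sigma2_mle)
# Any nearby parameter pair gives a strictly lower likelihood:
for dmu in (-0.1, 0.1):
    for ds in (-0.05, 0.05):
        assert log_lik(mu_mle + dmu, sigma2_mle + ds) < best
```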

Note that the moments estimators and the maximum likelihood estimators are the same for the Normal distribution example. In general, the two methods need not give the same estimators. Also, note that the moments estimators will always have the same estimating equations; for example, the first two equations are always

E(X) = μ = Σᵢ₌₁ⁿ Xi/n = X̄ and E(X²) = μ² + σ² = Σᵢ₌₁ⁿ Xi²/n.

For a specific distribution, we need only substitute the relationship between the population moments and the parameters of that distribution. Again, the number of equations needed depends upon the number of parameters of the underlying distribution. For example, the exponential distribution has one parameter and needs only one equation, whereas the gamma distribution has two parameters and needs two equations. Finally, note that the maximum likelihood technique relies heavily on the form of the underlying distribution, but it has desirable properties when it exists. These properties will be discussed in the next section.
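For instance, under the common parameterizations E(X) = 1/λ for the exponential, and E(X) = k·s, Var(X) = k·s² for the gamma with shape k and scale s, the moment equations solve out directly. The data and variable names below are illustrative assumptions only:

```python
# One moment equation for the exponential (rate lam): E(X) = 1/lam.
# Two for the gamma (shape k, scale s): E(X) = k*s and Var(X) = k*s^2.
data = [0.8, 1.5, 0.4, 2.2, 1.1]    # made-up positive observations
n = len(data)

m1 = sum(data) / n                  # first sample moment
m2 = sum(x * x for x in data) / n   # second sample moment
var = m2 - m1 ** 2                  # sample analogue of Var(X)

lam_hat = 1.0 / m1    # exponential: one equation pins down lam
s_hat = var / m1      # gamma scale: (k*s^2)/(k*s)
k_hat = m1 / s_hat    # gamma shape: from m1 = k*s
```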

So far we have dealt with the Normal distribution to illustrate the two methods of estimation. We now apply these methods to the Bernoulli distribution and leave applications to other distributions for the exercises. We urge the student to practice on these exercises.

Bernoulli Example: In many real-life situations the outcome of an event is binary: a worker may join the labor force or may not; a criminal may return to crime after parole or may not; a television off the assembly line may be defective or not; a tossed coin comes up heads or tails; and so on. In this case θ = Pr[Head] and 1 − θ = Pr[Tail] with 0 < θ < 1, and this can be represented by the discrete probability function

f(X; θ) = θ^X(1 − θ)^(1−X)   X = 0, 1

= 0 elsewhere

The Normal distribution is a continuous distribution since it takes values for all X over the real line. The Bernoulli distribution is discrete, because it is defined only at X = 0 and X = 1. Note that P[X = 1] = f(1; θ) = θ and P[X = 0] = f(0; θ) = 1 − θ for all values of 0 < θ < 1. A random sample of size n drawn from this distribution will have a joint probability function

L(θ) = f(X1,…,Xn; θ) = θ^(Σᵢ₌₁ⁿ Xi)(1 − θ)^(n − Σᵢ₌₁ⁿ Xi)

with Xi = 0,1 for i = 1,…,n. Therefore,

logL(θ) = (Σᵢ₌₁ⁿ Xi) log θ + (n − Σᵢ₌₁ⁿ Xi) log(1 − θ)

∂logL(θ)/∂θ = (Σᵢ₌₁ⁿ Xi)/θ − (n − Σᵢ₌₁ⁿ Xi)/(1 − θ) = 0

Solving this first-order condition for θ, one gets (Σᵢ₌₁ⁿ Xi)(1 − θ) − θ(n − Σᵢ₌₁ⁿ Xi) = 0, which reduces to

θ̂_MLE = Σᵢ₌₁ⁿ Xi/n = X̄.

This is the frequency of heads in n tosses of a coin.
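A small sketch (the tosses below are made up) confirms numerically that this observed frequency maximizes the Bernoulli log-likelihood derived above:

```python
import math

# Bernoulli log-likelihood for coin-toss data (1 = head):
# logL(theta) = (sum X_i) log theta + (n - sum X_i) log(1 - theta)
tosses = [1, 0, 1, 1, 0, 1, 0, 1]   # hypothetical sample
n = len(tosses)
heads = sum(tosses)

def log_lik(theta):
    return heads * math.log(theta) + (n - heads) * math.log(1 - theta)

theta_hat = heads / n               # = X-bar = 5/8
# Nearby values of theta give a strictly lower likelihood:
assert log_lik(theta_hat) > log_lik(theta_hat - 0.1)
assert log_lik(theta_hat) > log_lik(theta_hat + 0.1)
```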

For the method of moments, we need

E(X) = Σ_{X=0,1} X·f(X; θ) = 1·f(1; θ) + 0·f(0; θ) = f(1; θ) = θ

and this is equated to X̄ to get θ̂ = X̄. Once again, the MLE and the method of moments yield the same estimator. Note that only one parameter θ characterizes this Bernoulli distribution, and one does not need to equate second or higher population moments to their sample values.