# Maximum Likelihood Estimation

Assumption 5: The $u_i$'s are independent and identically distributed $N(0, \sigma^2)$.

This assumption allows us to derive distributions of estimators and other test statistics. In fact, using (3.5) one can easily see that $\hat{\beta}_{OLS}$ is a linear combination of the $u_i$'s. But a linear combination of normal random variables is itself a normal random variable, see Chapter 2, problem 15. Hence, $\hat{\beta}_{OLS}$ is $N(\beta, \sigma^2/\sum_{i=1}^n x_i^2)$. Similarly, $\hat{\alpha}_{OLS}$ is $N(\alpha, \sigma^2\sum_{i=1}^n X_i^2/n\sum_{i=1}^n x_i^2)$, and $Y_i$ is $N(\alpha + \beta X_i, \sigma^2)$. Moreover, we can write the joint probability density function of the $u_i$'s as $f(u_1, u_2, \ldots, u_n; \alpha, \beta, \sigma^2) = (1/2\pi\sigma^2)^{n/2}\exp(-\sum_{i=1}^n u_i^2/2\sigma^2)$. To get the likelihood function we make the transformation $u_i = Y_i - \alpha - \beta X_i$ and note that the Jacobian of the transformation is 1. Therefore,

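$$f(Y_1, Y_2, \ldots, Y_n; \alpha, \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left\{-\sum_{i=1}^n (Y_i - \alpha - \beta X_i)^2/2\sigma^2\right\} \tag{3.8}$$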

Taking the log of this likelihood, we get

$$\log L(\alpha, \beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \sum_{i=1}^n (Y_i - \alpha - \beta X_i)^2/2\sigma^2 \tag{3.9}$$

Maximizing this likelihood with respect to $\alpha$, $\beta$ and $\sigma^2$ one gets the maximum likelihood estimators (MLE). However, only the second term in the log likelihood contains $\alpha$ and $\beta$, and that term (without the negative sign) has already been minimized with respect to $\alpha$ and $\beta$ in (3.2) and (3.3), giving us the OLS estimators. Hence, $\hat{\alpha}_{MLE} = \hat{\alpha}_{OLS}$ and $\hat{\beta}_{MLE} = \hat{\beta}_{OLS}$. Similarly, by differentiating $\log L$ with respect to $\sigma^2$ and setting this derivative equal to zero one gets $\hat{\sigma}^2_{MLE} = \sum_{i=1}^n e_i^2/n$, see problem 7. Note that this differs from $s^2$ only in the divisor ($n$ rather than $n - 2$). In fact, $E(\hat{\sigma}^2_{MLE}) = (n - 2)\sigma^2/n \neq \sigma^2$. Hence, $\hat{\sigma}^2_{MLE}$ is biased, but note that it is still asymptotically unbiased, since $(n - 2)/n \to 1$ as $n \to \infty$.
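The relationship between the OLS and ML estimators can be checked numerically. The following is a minimal sketch, not part of the original text: the data values and the helper names `ols` and `log_likelihood` are hypothetical, and only the Python standard library is assumed. It computes the estimates, compares the two variance estimators, and confirms that small perturbations of the parameters lower the log likelihood in (3.9).

```python
import math

# Hypothetical illustrative data (not from the text).
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(X)

def ols(X, Y):
    """Return the OLS (and, under normality, ML) estimates of alpha and beta."""
    x_bar = sum(X) / len(X)
    y_bar = sum(Y) / len(Y)
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def log_likelihood(alpha, beta, sigma2, X, Y):
    """Evaluate the normal log likelihood (3.9) at the given parameter values."""
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(X, Y))
    return -(len(X) / 2) * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

alpha_hat, beta_hat = ols(X, Y)
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
sigma2_mle = rss / n        # ML estimator: divisor n (biased)
s2 = rss / (n - 2)          # unbiased estimator: divisor n - 2

# Log likelihood at the ML estimates versus at slightly perturbed values.
best = log_likelihood(alpha_hat, beta_hat, sigma2_mle, X, Y)
steps = [(0.1, 0, 0), (-0.1, 0, 0), (0, 0.1, 0), (0, -0.1, 0), (0, 0, 0.1), (0, 0, -0.1)]
perturbed = [
    log_likelihood(alpha_hat + da, beta_hat + db, sigma2_mle + ds, X, Y)
    for da, db, ds in steps
]

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print(f"sigma2_mle = {sigma2_mle:.4f}, s2 = {s2:.4f}")
print("log L is highest at the ML estimates:", all(best > v for v in perturbed))
```

For these data the residual sum of squares is 2.4, so $\hat{\sigma}^2_{MLE} = 2.4/5 = 0.48$ while $s^2 = 2.4/3 = 0.8$. With so few observations the difference in divisors is substantial.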

So far, the gains from imposing assumption 5 are the following: the likelihood can be formed, maximum likelihood estimators can be derived, and distributions can be obtained for these estimators. One can also derive the Cramér-Rao lower bound for unbiased estimators of the parameters and show that $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$ attain this bound whereas $s^2$ does not. This derivation is postponed until Chapter 7. In fact, one can show, following the theory of complete sufficient statistics, that $\hat{\alpha}_{OLS}$, $\hat{\beta}_{OLS}$ and $s^2$ are minimum variance unbiased estimators for $\alpha$, $\beta$ and $\sigma^2$, see Chapter 2. This is a stronger result (for $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$) than that obtained using the Gauss-Markov Theorem. It says that among all unbiased estimators of $\alpha$ and $\beta$, the OLS estimators are the best. In other words, our set of estimators now includes all unbiased estimators and not just linear unbiased estimators. This stronger result is obtained at the expense of a stronger distributional assumption, i.e., normality. If the distribution of the disturbances is not normal, then OLS is no longer MLE. In this case, MLE will be more efficient than OLS as long as the distribution of the disturbances is correctly specified. Some of the advantages and disadvantages of MLE were discussed in Chapter 2.

We found the distributions of $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$; now we give that of $s^2$. In Chapter 7, it is shown that $\sum_{i=1}^n e_i^2/\sigma^2$ is a chi-squared with $(n - 2)$ degrees of freedom. Also, $s^2$ is independent of $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$. This is useful for tests of hypotheses. In fact, the major gain from assumption 5 is that we can perform tests of hypotheses.
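These distributional results can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration rather than anything from the text: the regressor values, the parameter values, the seed and the number of replications are all hypothetical choices. It checks that the simulated mean and variance of the slope estimator are close to $\beta$ and $\sigma^2/\sum x_i^2$ (with $x_i$ taken as the deviation of $X_i$ from its sample mean), and that the average of $\sum e_i^2/\sigma^2$ is close to $n - 2$, the mean of a chi-squared variable with $n - 2$ degrees of freedom.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical design and parameter values chosen for illustration.
X = [float(i) for i in range(1, 11)]   # n = 10 fixed regressor values
alpha, beta, sigma = 1.0, 2.0, 1.5
n = len(X)
x_bar = sum(X) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)

reps = 5000
beta_hats = []
scaled_rss = []
for _ in range(reps):
    # Draw normal disturbances, as in assumption 5, and build Y.
    Y = [alpha + beta * xi + random.gauss(0.0, sigma) for xi in X]
    y_bar = sum(Y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
    a = y_bar - b * x_bar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(X, Y))
    beta_hats.append(b)
    scaled_rss.append(rss / sigma ** 2)

mean_beta = sum(beta_hats) / reps
var_beta = sum((b - mean_beta) ** 2 for b in beta_hats) / (reps - 1)
mean_scaled_rss = sum(scaled_rss) / reps
mean_sigma2_mle = sigma ** 2 * mean_scaled_rss / n   # average of RSS / n

print(f"mean of beta_hat      : {mean_beta:.4f} (theory {beta})")
print(f"variance of beta_hat  : {var_beta:.5f} (theory {sigma ** 2 / sxx:.5f})")
print(f"mean of RSS / sigma^2 : {mean_scaled_rss:.3f} (theory {n - 2})")
print(f"mean of sigma2_mle    : {mean_sigma2_mle:.3f} (theory {(n - 2) * sigma ** 2 / n:.3f})")
```

The simulated averages only approximate the theoretical values; the gap shrinks as the number of replications grows.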

Standardizing the normal random variable $\hat{\beta}_{OLS}$, one gets $z = (\hat{\beta}_{OLS} - \beta)/(\sigma^2/\sum_{i=1}^n x_i^2)^{1/2} \sim N(0, 1)$. Also, $(n - 2)s^2/\sigma^2$ is distributed as $\chi^2_{n-2}$. Hence, one can divide $z$, a $N(0, 1)$ random variable, by the square root of $(n - 2)s^2/\sigma^2$ divided by its degrees of freedom $(n - 2)$ to get a t-statistic with $(n - 2)$ degrees of freedom. The resulting statistic is $t_{obs} = (\hat{\beta}_{OLS} - \beta)/(s^2/\sum_{i=1}^n x_i^2)^{1/2} \sim t_{n-2}$, see problem 8. This statistic can be used to test $H_0$: $\beta = \beta_0$ versus $H_1$: $\beta \neq \beta_0$, where $\beta_0$ is a known constant. Under $H_0$, $t_{obs}$ can be calculated and its value can be compared to a critical value from a t-distribution with $(n - 2)$ degrees of freedom, at a specified significance level of $\alpha\%$. Of specific interest is the hypothesis $H_0$: $\beta = 0$, which states that there is no linear relationship between $Y_i$ and $X_i$. Under $H_0$,

$$t_{obs} = \hat{\beta}_{OLS}\Big/\left(s^2\Big/\sum_{i=1}^n x_i^2\right)^{1/2} = \hat{\beta}_{OLS}/se(\hat{\beta}_{OLS})$$

where $se(\hat{\beta}_{OLS}) = (s^2/\sum_{i=1}^n x_i^2)^{1/2}$. If $|t_{obs}| > t_{\alpha/2;n-2}$, then $H_0$ is rejected at the $\alpha\%$ significance level. Here $t_{\alpha/2;n-2}$ represents a critical value obtained from a t-distribution with $n - 2$ degrees of freedom. It is determined such that the area to its right under a $t_{n-2}$ distribution is equal to $\alpha/2$.
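The test of $H_0$: $\beta = 0$ can be sketched in a few lines. Everything specific here is an assumption made for illustration: the data are hypothetical, and the critical value 2.306 is the tabulated $t_{0.025;8}$ for a two-sided 5% test with $n - 2 = 8$ degrees of freedom, hard-coded so that only the Python standard library is needed.

```python
import math

# Hypothetical data with n = 10 observations (not from the text).
X = [float(i) for i in range(1, 11)]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
n = len(X)

# OLS estimates, using deviations of X from its sample mean.
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
alpha_hat = y_bar - beta_hat * x_bar

# s^2 uses the divisor n - 2; se(beta_hat) = sqrt(s^2 / sum of x_i^2).
rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
s2 = rss / (n - 2)
se_beta = math.sqrt(s2 / sxx)

# t-statistic under H0: beta = beta_0, here with beta_0 = 0.
beta_0 = 0.0
t_obs = (beta_hat - beta_0) / se_beta

# Tabulated two-sided 5% critical value for 8 degrees of freedom (assumed).
T_CRIT = 2.306
reject_h0 = abs(t_obs) > T_CRIT

print(f"beta_hat = {beta_hat:.4f}, se = {se_beta:.4f}, t_obs = {t_obs:.2f}")
print("reject H0: beta = 0 at the 5% level:", reject_h0)
```

Because these data lie very close to a straight line, the statistic is far above the critical value and $H_0$ is rejected. For other sample sizes or significance levels the critical value would have to be looked up again.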

Similarly, one can get a confidence interval for $\beta$ by using the fact that $\Pr[-t_{\alpha/2;n-2} < t_{obs} < t_{\alpha/2;n-2}] = 1 - \alpha$ and substituting for $t_{obs}$ its value derived above as $(\hat{\beta}_{OLS} - \beta)/se(\hat{\beta}_{OLS})$. Since the critical values are known, and $\hat{\beta}_{OLS}$ and $se(\hat{\beta}_{OLS})$ can be calculated from the data, the following $100(1 - \alpha)\%$ confidence interval for $\beta$ emerges:

$$\hat{\beta}_{OLS} \pm t_{\alpha/2;n-2}\, se(\hat{\beta}_{OLS}).$$
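As a rough illustration of this interval, the sketch below computes a 95% confidence interval for the slope. The function name `slope_confidence_interval` and the data are hypothetical, and the critical value 2.306 (the tabulated $t_{0.025;8}$) is hard-coded on the assumption that $n = 10$.

```python
import math

def slope_confidence_interval(X, Y, t_crit):
    """Return (beta_hat, lower, upper) for beta_hat +/- t_crit * se(beta_hat)."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(X, Y))
    s2 = rss / (n - 2)                 # unbiased estimator of sigma^2
    se_beta = math.sqrt(s2 / sxx)      # standard error of beta_hat
    return beta_hat, beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta

# Hypothetical data with n = 10, so the t-distribution has 8 degrees of freedom.
X = [float(i) for i in range(1, 11)]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

# Tabulated t_{0.025; 8}, giving a 95% interval (assumed value from a t-table).
beta_hat, lower, upper = slope_confidence_interval(X, Y, t_crit=2.306)
print(f"beta_hat = {beta_hat:.4f}, 95% CI = [{lower:.4f}, {upper:.4f}]")
```

The interval excludes zero for these data, which agrees with rejecting $H_0$: $\beta = 0$ at the 5% level. The two procedures rest on the same t-statistic.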

Tests of hypotheses and confidence intervals on $\alpha$ and $\sigma^2$ can be similarly constructed using the normal distribution of $\hat{\alpha}_{OLS}$ and the $\chi^2_{n-2}$ distribution of $(n - 2)s^2/\sigma^2$.
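One way to write these out, offered as a sketch rather than as the text's own derivation, is the following. The notation $\chi^2_{p;n-2}$ is an assumption introduced here for the value with area $p$ to its right under a $\chi^2_{n-2}$ distribution, and $se(\hat{\alpha}_{OLS})$ is obtained by replacing $\sigma^2$ with $s^2$ in the variance of $\hat{\alpha}_{OLS}$ given earlier.

$$\frac{\hat{\alpha}_{OLS} - \alpha}{se(\hat{\alpha}_{OLS})} \sim t_{n-2}, \qquad se(\hat{\alpha}_{OLS}) = \left(\frac{s^2\sum_{i=1}^n X_i^2}{n\sum_{i=1}^n x_i^2}\right)^{1/2}, \qquad \hat{\alpha}_{OLS} \pm t_{\alpha/2;n-2}\, se(\hat{\alpha}_{OLS})$$

$$\Pr\left[\chi^2_{1-\alpha/2;n-2} < \frac{(n - 2)s^2}{\sigma^2} < \chi^2_{\alpha/2;n-2}\right] = 1 - \alpha \quad\Longrightarrow\quad \frac{(n - 2)s^2}{\chi^2_{\alpha/2;n-2}} < \sigma^2 < \frac{(n - 2)s^2}{\chi^2_{1-\alpha/2;n-2}}$$

Unlike the intervals for $\alpha$ and $\beta$, the interval for $\sigma^2$ is not symmetric around $s^2$, because the chi-squared distribution is skewed.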