General Results
4.1.1 Consistency
Because there is no essential difference between maximization and minimization, we shall consider an estimator that maximizes a certain function of the parameters. Let us denote the function by $Q_T(y, \theta)$, where $y = (y_1, y_2, \ldots, y_T)'$ is a $T$-vector of random variables and $\theta$ is a $K$-vector of parameters. [We shall sometimes write it more compactly as $Q_T(\theta)$.] The vector $\theta$ should be understood to be the set of parameters that characterize the distribution of $y$. Let us denote the domain of $\theta$, or the parameter space, by $\Theta$ and the “true value” of $\theta$ by $\theta_0$. The parameter space is the set of all the possible values that the true value $\theta_0$ can take. When we apply operations such as expectation or probability limit to a function of $y$, we shall use the value $\theta_0$.
As a preliminary to the consistency theorems, we shall define three modes of uniform convergence of a sequence of random variables.
Definition 4.1.1. Let $g_T(\theta)$ be a nonnegative sequence of random variables depending on a parameter vector $\theta$. Consider the three modes of uniform convergence of $g_T(\theta)$ to 0:
(i) $P[\lim_{T\to\infty} \sup_{\theta\in\Theta} g_T(\theta) = 0] = 1$,
(ii) $\lim_{T\to\infty} P[\sup_{\theta\in\Theta} g_T(\theta) < \epsilon] = 1$ for any $\epsilon > 0$,
(iii) $\lim_{T\to\infty} \inf_{\theta\in\Theta} P[g_T(\theta) < \epsilon] = 1$ for any $\epsilon > 0$.
If (i) holds, we say $g_T(\theta)$ converges to 0 almost surely uniformly in $\theta \in \Theta$. If (ii) holds, we say $g_T(\theta)$ converges to 0 in probability uniformly in $\theta \in \Theta$. If (iii) holds, we say $g_T(\theta)$ converges to 0 in probability semiuniformly in $\theta \in \Theta$.
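It is easy to show that (i) implies (ii) and (ii) implies (iii). (For the second implication, note that the event $\sup_{\theta\in\Theta} g_T(\theta) < \epsilon$ is contained in the event $g_T(\theta) < \epsilon$ for every $\theta$.) Consider an example of a sequence for which (iii) holds but (ii) does not. Let the parameter space $\Theta$ be $[0, 1]$ and the sample space $\Omega$ also be $[0, 1]$ with the probability measure equal to Lebesgue measure. For $\theta \in \Theta$ and $\omega \in \Omega$, define $g_T(\omega, \theta)$ by
$$g_T(\omega, \theta) = \begin{cases} 1 & \text{if } \theta = \dfrac{i}{T} \text{ and } \dfrac{i}{T} \le \omega \le \dfrac{i+1}{T}, \quad i = 0, 1, \ldots, T-1, \\ 0 & \text{otherwise.} \end{cases}$$
Then, for $0 < \epsilon < 1$,
$\inf_{\theta\in\Theta} P[g_T(\omega, \theta) < \epsilon] = (T-1)/T$ and $P[\sup_{\theta\in\Theta} g_T(\omega, \theta) < \epsilon] = 0$ for all $T$.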
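The two probabilities in this example can be checked by direct computation. The sketch below is only an illustration: it assumes $g_T(\omega, \theta) = 1$ exactly when $\theta = i/T$ and $i/T \le \omega \le (i+1)/T$ for some $i = 0, \ldots, T-1$, uses exact fractions, and checks the supremum on a finite grid of $\omega$ values rather than on all of $[0, 1]$. The function names are hypothetical.

```python
from fractions import Fraction

def g(T, omega, theta):
    """g_T(omega, theta): 1 if theta = i/T and i/T <= omega <= (i+1)/T, else 0."""
    for i in range(T):
        if theta == Fraction(i, T) and Fraction(i, T) <= omega <= Fraction(i + 1, T):
            return 1
    return 0

def prob_g_small(T, theta):
    """Exact Lebesgue probability of g_T(., theta) < eps for 0 < eps < 1."""
    # g equals 1 only on an interval of length 1/T, and only when theta = i/T, i < T.
    on_grid = any(theta == Fraction(i, T) for i in range(T))
    return 1 - Fraction(1, T) if on_grid else Fraction(1)

def sup_over_theta(T, omega):
    """sup over theta of g_T(omega, theta); only theta = i/T can give a nonzero value."""
    return max(g(T, omega, Fraction(i, T)) for i in range(T))

T = 10
# Off-grid values of theta give probability 1, so the infimum is reached on the grid.
inf_prob = min(prob_g_small(T, Fraction(i, T)) for i in range(T + 1))
omegas = [Fraction(k, 997) for k in range(998)]
sup_always_one = all(sup_over_theta(T, w) == 1 for w in omegas)
print(inf_prob, sup_always_one)  # 9/10 True
```

For each fixed $\theta$ the bad set has probability at most $1/T$, yet every $\omega$ is bad for some $\theta$, which is why mode (iii) holds and mode (ii) fails.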
Now we shall prove the consistency of extremum estimators. Because we need to distinguish between the global maximum and a local maximum, we shall present two theorems to handle the two cases.
Theorem 4.1.1. Make the assumptions:
(A) The parameter space $\Theta$ is a compact subset of the Euclidean $K$-space ($R^K$). (Note that $\theta_0$ is in $\Theta$.)
(B) $Q_T(y, \theta)$ is continuous in $\theta \in \Theta$ for all $y$ and is a measurable function of $y$ for all $\theta \in \Theta$.
(C) $T^{-1}Q_T(\theta)$ converges to a nonstochastic function $Q(\theta)$ in probability uniformly in $\theta \in \Theta$ as $T$ goes to $\infty$, and $Q(\theta)$ attains a unique global maximum at $\theta_0$. (The continuity of $Q(\theta)$ follows from our assumptions.)
Define $\hat\theta_T$ as a value that satisfies
$$Q_T(\hat\theta_T) = \max_{\theta\in\Theta} Q_T(\theta). \tag{4.1.1}$$
[It is understood that if $\hat\theta_T$ is not unique, we appropriately choose one such value in such a way that $\hat\theta_T(y)$ is a measurable function of $y$. This is possible by a theorem of Jennrich (1969, p. 637).] Then $\hat\theta_T$ converges to $\theta_0$ in probability.
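Proof. Let $N$ be an open neighborhood in $R^K$ containing $\theta_0$. Then $\bar N \cap \Theta$, where $\bar N$ is the complement of $N$ in $R^K$, is compact. Therefore $\max_{\theta\in\bar N\cap\Theta} Q(\theta)$ exists. Denote
$$\epsilon = Q(\theta_0) - \max_{\theta\in\bar N\cap\Theta} Q(\theta). \tag{4.1.2}$$
Let $A_T$ be the event “$|T^{-1}Q_T(\theta) - Q(\theta)| < \epsilon/2$ for all $\theta$.” Then
$$A_T \Rightarrow Q(\hat\theta_T) > T^{-1}Q_T(\hat\theta_T) - \epsilon/2 \tag{4.1.3}$$
and
$$A_T \Rightarrow T^{-1}Q_T(\theta_0) > Q(\theta_0) - \epsilon/2. \tag{4.1.4}$$
But, because $Q_T(\hat\theta_T) \ge Q_T(\theta_0)$ by the definition of $\hat\theta_T$, we have from (4.1.3)
$$A_T \Rightarrow Q(\hat\theta_T) > T^{-1}Q_T(\theta_0) - \epsilon/2. \tag{4.1.5}$$
Therefore, adding both sides of the inequalities in (4.1.4) and (4.1.5), we obtain
$$A_T \Rightarrow Q(\hat\theta_T) > Q(\theta_0) - \epsilon. \tag{4.1.6}$$
Therefore, from (4.1.2) and (4.1.6) we can conclude $A_T \Rightarrow \hat\theta_T \in N$, which implies $P(A_T) \le P(\hat\theta_T \in N)$. But, since $\lim_{T\to\infty} P(A_T) = 1$ by assumption C, $\hat\theta_T$ converges to $\theta_0$ in probability.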
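A small simulation can make the conclusion of Theorem 4.1.1 concrete. The sketch below is an illustration under assumptions that are not part of the text: the objective is taken to be $Q_T(\theta) = -\sum_t (y_t - \theta)^2$ with $y_t$ drawn independently from a normal distribution with mean $\theta_0 = 1.5$ and unit variance, the compact parameter space is $[-5, 5]$, and the maximum is located by a grid search with a hypothetical step of 0.01. In that setup $T^{-1}Q_T(\theta)$ converges to $-1 - (\theta - \theta_0)^2$, which has a unique maximum at $\theta_0$.

```python
import random

def extremum_estimate(y, lo=-5.0, hi=5.0, step=0.01):
    """Grid-search maximizer of Q_T(theta) = -sum((y_t - theta)^2) over [lo, hi]."""
    T = len(y)
    s1 = sum(y)
    s2 = sum(v * v for v in y)
    n_points = int(round((hi - lo) / step)) + 1
    best_theta, best_q = lo, float("-inf")
    for k in range(n_points):
        theta = lo + k * step
        # Expanded form of -sum((y_t - theta)^2), so each evaluation is O(1).
        q = -(s2 - 2.0 * theta * s1 + T * theta * theta)
        if q > best_q:
            best_theta, best_q = theta, q
    return best_theta

random.seed(0)
theta0 = 1.5  # assumed true value, chosen for illustration
errors = {}
for T in (10, 100, 10_000):
    y = [random.gauss(theta0, 1.0) for _ in range(T)]
    errors[T] = abs(extremum_estimate(y) - theta0)
print(errors)
```

The estimation error is limited by the grid step as well as by sampling noise, so it should not be expected to fall below roughly half the step.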
Note that $\hat\theta_T$ is defined as the value of $\theta$ that maximizes $Q_T(\theta)$ within the parameter space $\Theta$. This is a weakness of the theorem insofar as the extremum estimators commonly used in practice are obtained by unconstrained maximization or minimization. This practice prevails because of its relative computational ease, even though constrained maximization or minimization would be more desirable if a researcher believed that the true value lay in a proper subset of $R^K$. The consistency of the unconstrained maximum $\tilde\theta_T$ defined by
$$Q_T(\tilde\theta_T) = \sup_{\theta\in R^K} Q_T(\theta) \tag{4.1.7}$$
will follow from the additional assumption
(D) $\lim_{T\to\infty} P[Q_T(\theta_0) > \sup_{\theta\notin\Theta} Q_T(\theta)] = 1$
because of the inequality
$$P[Q_T(\theta_0) > \sup_{\theta\notin\Theta} Q_T(\theta)] \le P(\tilde\theta_T \in \Theta). \tag{4.1.8}$$
As we shall see in later applications, $Q_T$ is frequently the sum of independent random variables or, at least, of random variables with a limited degree of dependence. Therefore we can usually expect $T^{-1}Q_T(\theta)$ to converge to a constant in probability by using some form of a law of large numbers. However, the theorem can be made more general by replacing $T^{-1}Q_T(\theta)$ with $h(T)^{-1}Q_T(\theta)$, where $h(T)$ is an increasing function of $T$.
The three major assumptions of Theorem 4.1.1 are (1) the compactness of $\Theta$, (2) the continuity of $Q_T(\theta)$, and (3) the uniform convergence of $T^{-1}Q_T(\theta)$ to $Q(\theta)$. To illustrate the importance of these assumptions, we shall give examples that show what goes wrong when one or more of the three assumptions are removed. In all the examples, $Q_T$ is assumed to be nonstochastic for simplicity.
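Example 4.1.1. $\Theta = [-1, 1]$, $\theta_0 = -1/2$, $T^{-1}Q_T$ not continuous and not uniformly convergent.
$$T^{-1}Q_T(\theta) = \begin{cases} 1 + \theta, & -1 \le \theta \le \theta_0 \\ -\theta, & \theta_0 < \theta \le 0 \\ \dfrac{\theta^T}{1 - \theta^T}, & 0 < \theta < 1 \\ 0, & \theta = 1. \end{cases}$$
Here the extremum estimator does not exist, although $\lim T^{-1}Q_T$ attains its unique maximum at $\theta_0$.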
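Example 4.1.2. $\Theta = [-1, \infty)$, $\theta_0 = -1/2$, $T^{-1}Q_T$ continuous but not uniformly convergent.
$$T^{-1}Q_T(\theta) = Q(\theta) + h_T(\theta),$$
where
$$Q(\theta) = \begin{cases} 1 + \theta, & -1 \le \theta \le \theta_0 \\ -\theta, & \theta_0 < \theta \le 0 \\ 0, & \text{elsewhere,} \end{cases}$$
and
$$h_T(\theta) = \begin{cases} \theta - T, & T \le \theta \le T + 1 \\ T + 2 - \theta, & T + 1 < \theta \le T + 2 \\ 0, & \text{elsewhere.} \end{cases}$$
Here we have $\operatorname{plim} \hat\theta_T = \operatorname{plim}\,(T + 1) = \infty$, although $\lim T^{-1}Q_T = Q$ attains its unique maximum at $\theta_0$.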
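The behavior in Example 4.1.2 can be reproduced numerically. The sketch below assumes that $T^{-1}Q_T$ is the sum of a fixed tent $Q$ with peak $1/2$ at $\theta_0 = -1/2$ and a moving tent $h_T$ of height 1 centered at $T + 1$. The grid, its step of 0.25, and the truncation of the unbounded parameter space to $[-1, T + 3]$ are illustrative choices, not part of the example.

```python
def Q(theta):
    """Limit function Q: tent with peak 1/2 at theta0 = -1/2, zero elsewhere."""
    if -1.0 <= theta <= -0.5:
        return 1.0 + theta
    if -0.5 < theta <= 0.0:
        return -theta
    return 0.0

def h(T, theta):
    """Moving bump h_T: tent of height 1 centered at T + 1, zero elsewhere."""
    if T <= theta <= T + 1:
        return theta - T
    if T + 1 < theta <= T + 2:
        return T + 2 - theta
    return 0.0

def argmax_on_grid(T, step=0.25):
    """Grid maximizer of Q + h_T over [-1, T + 3], an interval containing both peaks."""
    n = int((T + 4) / step) + 1
    grid = [-1.0 + k * step for k in range(n)]
    return max(grid, key=lambda th: Q(th) + h(T, th))

maximizers = {T: argmax_on_grid(T) for T in (1, 5, 25)}
print(maximizers)  # {1: 2.0, 5: 6.0, 25: 26.0}
```

The maximizer tracks the bump at $T + 1$ and never settles near $\theta_0$, even though the bump vanishes at every fixed $\theta$ as $T$ grows.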
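Example 4.1.3. $\Theta = [0, 2]$, $\theta_0 = 1.5$, $T^{-1}Q_T$ continuous but not uniformly convergent.
$$T^{-1}Q_T(\theta) = \begin{cases} T\theta, & 0 \le \theta \le \dfrac{1}{2T} \\ 1 - T\theta, & \dfrac{1}{2T} < \theta \le \dfrac{1}{T} \\ 0, & \dfrac{1}{T} < \theta \le 1 \\ \dfrac{T}{T+1}(\theta - 1), & 1 < \theta \le \theta_0 \\ \dfrac{T}{T+1}(2 - \theta), & \theta_0 < \theta \le 2. \end{cases}$$
Here we have $\operatorname{plim} \hat\theta_T = \operatorname{plim}\,(2T)^{-1} = 0$, although $\lim T^{-1}Q_T$ attains its unique maximum at $\theta_0$.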
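A short computation illustrates why the maximizer in Example 4.1.3 drifts to 0. The sketch rests on an assumed piecewise-linear form of $T^{-1}Q_T$ on $[0, 2]$: a spike of height $1/2$ at $1/(2T)$, zero from $1/T$ to 1, and a tent of height $T/(2(T+1))$ at $\theta_0 = 1.5$. Because a continuous piecewise-linear function attains its maximum at a kink or an endpoint, only those points are compared. The function names are hypothetical.

```python
def q_T(T, theta):
    """Assumed form of T^{-1} Q_T(theta) on [0, 2] for Example 4.1.3."""
    if 0.0 <= theta <= 1.0 / (2 * T):
        return T * theta
    if theta <= 1.0 / T:
        return 1.0 - T * theta
    if theta <= 1.0:
        return 0.0
    if theta <= 1.5:
        return T / (T + 1) * (theta - 1.0)
    return T / (T + 1) * (2.0 - theta)

def argmax_over_kinks(T):
    """Maximizer among the kinks and endpoints of the piecewise-linear function."""
    kinks = [0.0, 1.0 / (2 * T), 1.0 / T, 1.0, 1.5, 2.0]
    return max(kinks, key=lambda th: q_T(T, th))

for T in (2, 16, 128):
    th_hat = argmax_over_kinks(T)
    # Compare the spike near 0 with the tent at theta0 = 1.5.
    print(T, th_hat, q_T(T, th_hat), q_T(T, 1.5))
```

The spike keeps height $1/2$ for every $T$ while its location shrinks to 0, whereas the tent at $\theta_0$ only approaches height $1/2$ from below.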
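Example 4.1.4. $\Theta = [-2, 1]$, $\theta_0 = -1$, $T^{-1}Q_T$ not continuous but uniformly convergent.
$$T^{-1}Q_T(\theta) = \begin{cases} (1 - 2^{1-T})\theta + 2 - 2^{2-T}, & -2 \le \theta \le -1 \\ -(1 - 2^{1-T})\theta, & -1 < \theta \le 0 \\ \dfrac{2^{T+1} - 2}{2^{T+1} - 1}\,\theta, & 0 < \theta \le 1 - 2^{-(T+1)} \\ -\dfrac{2^{T+1} - 2}{2^{T+1} - 1}\,\theta + 2 - 2^{1-T}, & 1 - 2^{-(T+1)} < \theta < 1 \\ 0, & \theta = 1. \end{cases}$$
Here we have $\operatorname{plim} \hat\theta_T = \operatorname{plim}\,[1 - 2^{-(T+1)}] = 1$, although $\lim T^{-1}Q_T$ attains its unique maximum at $\theta_0$. (If we change this example so that $\Theta = [-2, 1)$, $T^{-1}Q_T$ becomes continuous and only the compactness assumption is violated.)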
The estimator $\hat\theta_T$ of Theorem 4.1.1 maximizes the function $Q_T(\theta)$ globally. However, in practice it is often difficult to locate a global maximum of $Q_T(\theta)$, for it means that we must look through the whole parameter space except in the fortunate situation where we can prove that $Q_T(\theta)$ is globally concave.
Another weakness of the theorem is that it is often difficult to prove that $Q(\theta)$ attains its unique global maximum at $\theta_0$. Therefore we would also like to have a theorem regarding the consistency of a local maximum.
Still another reason for having such a theorem is that we can generally prove asymptotic normality only for the local maximum, as we shall show in Section 4.1.2. Theorem 4.1.2 is such a theorem.
Theorem 4.1.2. Make the assumptions:
(A) Let $\Theta$ be an open subset of the Euclidean $K$-space. (Thus the true value $\theta_0$ is an interior point of $\Theta$.)
(B) $Q_T(y, \theta)$ is a measurable function of $y$ for all $\theta \in \Theta$, and $\partial Q_T/\partial\theta$ exists and is continuous in an open neighborhood $N_1(\theta_0)$ of $\theta_0$. (Note that this implies $Q_T$ is continuous for $\theta \in N_1$.)
(C) There exists an open neighborhood $N_2(\theta_0)$ of $\theta_0$ such that $T^{-1}Q_T(\theta)$ converges to a nonstochastic function $Q(\theta)$ in probability uniformly in $\theta$ in $N_2(\theta_0)$, and $Q(\theta)$ attains a strict local maximum at $\theta_0$.
Let $\Theta_T$ be the set of roots of the equation
$$\frac{\partial Q_T}{\partial\theta} = 0 \tag{4.1.9}$$
corresponding to the local maxima. If that set is empty, set $\Theta_T$ equal to $\{0\}$.
Then, for any $\epsilon > 0$,
$$\lim_{T\to\infty} P\Big[\inf_{\theta\in\Theta_T} (\theta - \theta_0)'(\theta - \theta_0) > \epsilon\Big] = 0.$$
Proof. Choose a compact set $S \subset N_1 \cap N_2$. Then the value of $\theta$, say $\theta_T^*$, that globally maximizes $Q_T(\theta)$ in $S$ is consistent by Theorem 4.1.1. But because the probability that $T^{-1}Q_T(\theta)$ attains a local maximum at $\theta_T^*$ approaches 1 as $T$ goes to $\infty$, $\lim_{T\to\infty} P(\theta_T^* \in \Theta_T) = 1$.
We sometimes state the conclusion of Theorem 4.1.2 simply as “there is a consistent root of Eq. (4.1.9).”
The usefulness of Theorem 4.1.2 is limited by the fact that it merely states that one of the local maxima is consistent and does not give any guide as to how to choose a consistent maximum. There are two ways we can gain some degree of confidence that a local maximum is a consistent root: (1) if the solution gives a reasonable value from an economic-theoretic viewpoint and (2) if the iteration by which the local maximum was obtained started from a consistent estimator. We shall discuss the second point more fully in Section 4.4.2.
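The second point can be illustrated with a model in which the first-order condition may have several roots. The sketch below is an assumption-laden illustration, not the procedure of Section 4.4.2: it takes $Q_T(\theta) = -\sum_t \log[1 + (y_t - \theta)^2]$, the log likelihood of a Cauchy location model up to a constant, draws data with a hypothetical true value $\theta_0 = 2$, and runs Newton-Raphson on the first-order condition starting from the sample median, which is a consistent estimator of the Cauchy location. All function names are hypothetical.

```python
import math
import random
import statistics

def score(y, theta):
    """Derivative of Q_T(theta) = -sum(log(1 + (y_t - theta)^2)) with respect to theta."""
    return sum(2.0 * (v - theta) / (1.0 + (v - theta) ** 2) for v in y)

def hessian(y, theta):
    """Second derivative of the same Q_T with respect to theta."""
    return sum(2.0 * ((v - theta) ** 2 - 1.0) / (1.0 + (v - theta) ** 2) ** 2 for v in y)

def root_from(y, start, steps=25):
    """Newton-Raphson on the first-order condition, started at `start`."""
    theta = start
    for _ in range(steps):
        h = hessian(y, theta)
        if h >= 0.0:  # not locally concave here: stop instead of taking the step
            break
        theta -= score(y, theta) / h
    return theta

random.seed(1)
theta0, T = 2.0, 2000  # assumed true value and sample size
# Inverse-CDF draws from a Cauchy distribution centered at theta0.
y = [theta0 + math.tan(math.pi * (random.random() - 0.5)) for _ in range(T)]
start = statistics.median(y)  # consistent starting value
root = root_from(y, start)
print(start, root, score(y, root) / T)
```

Starting the iteration at a consistent estimator keeps it in the neighborhood of $\theta_0$ where the theorem guarantees a consistent root, which is the source of the added confidence mentioned in the text.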