# Basic Concepts

Consider a structure $s$ that describes the probability distribution function $P_s(y)$ of a random vector $y$. The set of all a priori possible structures is called a model. We assume that $y$ is generated by a known parametric probability function $P(\cdot)$ conditional on a parameter vector $\theta \in S$, where $S$ is an open subset of $\mathbb{R}^l$. So a structure is described by a parameter point $\theta$, and a model is defined by the set $\{P(y, \theta) \mid \theta \in S\}$. Submodels $\{P(y, \theta) \mid \theta \in H\}$ are defined by sets of structures $H$ that are subsets of $S$: $H \subseteq S$. Hence a structure is described by a parameter point $\theta$, and a model is a set of points $H \subseteq \mathbb{R}^l$. The problem of distinguishing between structures is thus reduced to the problem of distinguishing between parameter points.

Definition 1. The sets of structures $S_1$ and $S_2$ are observationally equivalent if $\{P(y, \theta) \mid \theta \in S_1\} = \{P(y, \theta) \mid \theta \in S_2\}$. In particular, two parameter points $\theta_1$ and $\theta_2$ are observationally equivalent if $P(y, \theta_1) = P(y, \theta_2)$ for all $y$.
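As a concrete illustration (a hypothetical model chosen for this sketch, not taken from the text), consider $y \sim N(\theta_1 + \theta_2, 1)$: any two parameter points with the same sum induce the same density for every $y$ and are therefore observationally equivalent in the sense of Definition 1. A minimal numerical check:

```python
import math

# Hypothetical model (illustration only): y ~ N(theta1 + theta2, 1).
# Parameter points with the same sum theta1 + theta2 induce the same
# density for every y, so they are observationally equivalent.

def density(y, theta1, theta2):
    """Normal density of y with mean theta1 + theta2 and unit variance."""
    mu = theta1 + theta2
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)

# (1, 2) and (2, 1) are distinct parameter points, yet the densities
# coincide at every y checked:
for y in [-1.0, 0.0, 0.5, 3.0]:
    assert abs(density(y, 1, 2) - density(y, 2, 1)) < 1e-15
```

Exact knowledge of the distribution of $y$ can thus never separate $(1, 2)$ from $(2, 1)$ in this model.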

Definition 2. The element $\theta^0_k$ of the parameter vector $\theta^0 \in S$ is said to be locally identified in $S$ if there exists an open neighborhood of $\theta^0$ containing no point $\theta \in S$, with $\theta_k \neq \theta^0_k$, that is observationally equivalent to $\theta^0$.

The notion of identification is related to the existence of an unbiased or consistent estimator. That is, if $\theta^0_k$ is locally not identified, there exist points $\theta$ arbitrarily close to $\theta^0$ with $\theta_k \neq \theta^0_k$ and $P(y, \theta) = P(y, \theta^0)$. Hence exact knowledge of $P(y, \theta^0)$ is not sufficient to distinguish between $\theta^0_k$ and $\theta_k$.²

Now consider an estimator $\hat\theta_k$ of $\theta_k$. Its distribution function is a function of $P(y, \theta^0)$, so that, again, exact knowledge of this distribution function is not sufficient to distinguish between $\theta^0_k$ and $\theta_k$. Asymptotically the same holds with respect to the limit distribution of $\hat\theta_k$. In that case $\theta^0_k$ cannot be expressed as a function of the small- or large-sample distribution of $\hat\theta_k$. In particular, $\theta^0_k$ cannot be expressed as the expectation or probability limit of $\hat\theta_k$.

On the other hand, if $\theta^0_k$ is locally identified and if we restrict the parameter space to a sufficiently small open neighborhood of $\theta^0$, we find that $P(y, \theta^0)$ corresponds uniquely to the single value $\theta_k = \theta^0_k$. In fact we have the following theorem.

Theorem 1. Let $P(y, \theta)$ be a continuous function of $\theta \in S$ for all $y$. Then $\theta^0_k$ is locally identified (in $S$) if and only if there exists an open neighborhood $O_{\theta^0}$ of $\theta^0$ such that any sequence $\theta^i$, $i = 1, 2, \ldots$, in $S \cap O_{\theta^0}$ for which $P(y, \theta^i) \to P(y, \theta^0)$ for all $y$ also satisfies $\theta^i_k \to \theta^0_k$.

Proof. The proof is in two parts.

Necessity. If $\theta^0_k$ is locally not identified, then for any open neighborhood $O_{\theta^0}$ there exists a point $\theta \in S \cap O_{\theta^0}$ with $P(y, \theta) = P(y, \theta^0)$ and $\theta_k \neq \theta^0_k$. Thus if we take $\theta^i = \theta$, $i = 1, 2, \ldots$, we find $\theta^i_k = \theta_k \neq \theta^0_k$.

Sufficiency. If for any open neighborhood $O_{\theta^0}$ there exists a sequence $\theta^i$, $i = 1, 2, \ldots$, in $S \cap O_{\theta^0}$ for which $P(y, \theta^i) \to P(y, \theta^0)$ while $\theta^i_k$ does not converge to $\theta^0_k$, we may consider a convergent subsequence in a compact neighborhood, with limit $\theta^*$ satisfying $\theta^*_k \neq \theta^0_k$. By continuity we find that $P(y, \theta^*) = P(y, \theta^0)$, so that $\theta^0_k$ is locally not identified. ■

Hence, if $P(y, \theta^0)$ can be consistently estimated, as for example in the case of iid observations, then $\theta^0_k$, the $k$th element of $\theta^0 \in S$, can be consistently estimated if and only if it is identified. Thus, if one considers a sample as a single observation on a random vector with probability distribution $P(y, \theta^0)$ and uses an asymptotic parameter sequence consisting of repeated samples, i.e. iid observations on this random vector, then $\theta_k$ can be consistently estimated if and only if it is identified.³
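The link with consistent estimation can be sketched in a small simulation, again using a hypothetical model $y_i \sim N(\theta_1 + \theta_2, 1)$ iid (an assumption of this illustration, not an example from the text): the identified function $\theta_1 + \theta_2$ is consistently estimated by the sample mean, whereas $\theta_1$ on its own is not identified, since a point such as $(2, 1)$ generates exactly the same data distribution as $(1, 2)$.

```python
import random

# Hypothetical illustration (not from the text): y_i ~ N(theta1 + theta2, 1) iid.
# The identified function theta1 + theta2 is consistently estimated by the
# sample mean; theta1 alone is not identified, because (2.0, 1.0) induces
# exactly the same data distribution as (1.0, 2.0).
random.seed(0)
theta1, theta2 = 1.0, 2.0
n = 200_000
ybar = sum(random.gauss(theta1 + theta2, 1.0) for _ in range(n)) / n

# The sample mean is within sampling error of the identified sum:
assert abs(ybar - (theta1 + theta2)) < 0.02
```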

So much for the identification of a single parameter. Definition 2 extends straightforwardly to the whole parameter vector.

Definition 3. If all elements of $\theta^0$ are locally identified, then $\theta^0$ is said to be locally identified.

Although the notion of local identification plays the predominant role, we will occasionally refer to global identification.

Definition 4. If the open neighborhood referred to in Definition 2 is equal to $S$, then the identification is said to be global in $S$.

Definitions 2 and 3 are obviously difficult to apply in practice. In the following section we present a much more manageable tool for the characterization of local identification.

# Identification and the Rank of the Information Matrix

When analyzing local identification of a model, the information matrix is a convenient tool. The following theorem, due to Rothenberg (1971) but with a slightly adapted proof, contains the essential result.

Definition 5. Let $M(\theta)$ be a continuous matrix function of $\theta \in \mathbb{R}^l$ and let $\theta^0 \in \mathbb{R}^l$. Then $\theta^0$ is a regular point of $M(\theta)$ if the rank of $M(\theta)$ is constant for points in an open neighborhood of $\theta^0$.
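A minimal sketch of Definition 5, using a hypothetical scalar example $M(\theta) = [\theta]$ (chosen for this illustration, not from the text): the rank is 1 for every $\theta \neq 0$ and 0 at $\theta = 0$, so every nonzero point is regular, while $\theta^0 = 0$ is not, since any open neighborhood of 0 contains points of both ranks.

```python
# Hypothetical example: the 1x1 matrix function M(theta) = [theta].
# rank M(theta) = 1 for theta != 0 and rank M(0) = 0, so every nonzero
# theta0 is a regular point and theta0 = 0 is not.

def rank_M(theta):
    """Rank of the 1x1 matrix [theta]."""
    return 0 if theta == 0 else 1

def is_regular_point(theta0, eps=1e-6):
    """Check rank constancy on a small symmetric grid around theta0."""
    grid = [theta0 + k * eps for k in range(-3, 4)]
    return len({rank_M(t) for t in grid}) == 1

assert is_regular_point(0.5)        # rank 1 throughout the neighborhood
assert not is_regular_point(0.0)    # rank drops to 0 exactly at theta = 0
```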

Theorem 2. Let $\theta^0$ be a regular point of the information matrix $\Psi(\theta)$. Assume that the distribution of $y$ has a density function $f(y, \theta)$, and assume that $f(y, \theta)$ and $\log f(y, \theta)$ are continuously differentiable in $\theta$ for all $\theta \in S$ and for all $y$. Then $\theta^0$ is locally identified if and only if $\Psi(\theta^0)$ is nonsingular.

Proof. Let

$$g(y, \theta) = \log f(y, \theta), \qquad h(y, \theta) = \partial \log f(y, \theta)/\partial\theta.$$

Then the mean value theorem implies

$$g(y, \theta) - g(y, \theta^0) = h(y, \theta^*)'(\theta - \theta^0), \quad (7.1)$$

for all $\theta$ in a neighborhood of $\theta^0$, for all $y$, and with $\theta^*$ between $\theta$ and $\theta^0$ (although $\theta^*$ may depend on $y$). Now suppose that $\theta^0$ is not locally identified. Then any open neighborhood of $\theta^0$ will contain parameter points that are observationally equivalent to $\theta^0$. Hence we can construct an infinite sequence $\theta^1, \theta^2, \ldots, \theta^k, \ldots$ such that $\lim_{k\to\infty} \theta^k = \theta^0$, with the property that $g(y, \theta^k) = g(y, \theta^0)$ for all $k$ and all $y$. It then follows from (7.1) that for all $k$ and all $y$ there exist points $\theta^{*k}$ (which again may depend on $y$) such that

$$h(y, \theta^{*k})'\delta^k = h(y, \theta^{*k})'(\theta^k - \theta^0)/\|\theta^k - \theta^0\| = 0, \quad (7.2)$$

with $\theta^{*k}$ between $\theta^k$ and $\theta^0$.

Since $\theta^k \to \theta^0$, there holds $\theta^{*k} \to \theta^0$ for all $y$. Furthermore, the sequence $\delta^1, \delta^2, \ldots, \delta^k, \ldots$ is an infinite sequence on the unit sphere, so there must be at least one limit point. Let $\delta^0$ be such a limit point. Then (7.2) implies $h(y, \theta^0)'\delta^0 = 0$ for all $y$. For the information matrix this gives

$$E\{(h(y, \theta^0)'\delta^0)^2\} = \delta^{0\prime} E\{h(y, \theta^0)h(y, \theta^0)'\}\,\delta^0 = \delta^{0\prime}\Psi(\theta^0)\delta^0 = 0,$$

so that indeed nonidentification of $\theta^0$ implies singularity of the information matrix.

Conversely, if $\theta^0$ is a regular point but $\Psi(\theta^0)$ is singular, then there exists a vector $c(\theta)$ such that, in an open neighborhood of $\theta^0$,

$$c(\theta)'\Psi(\theta)c(\theta) = E\{(h(y, \theta)'c(\theta))^2\} = 0.$$

This implies, for all $\theta$ in this neighborhood, that $h(y, \theta)'c(\theta) = 0$ for all $y$. Since $\Psi(\theta)$ is continuous and of constant rank, $c(\theta)$ can be chosen to be continuous in a neighborhood of $\theta^0$. We use this property to define a curve $\theta(t)$ which solves, for $0 \le t \le t^*$, the differential equation $d\theta(t)/dt = c(\theta)$, $\theta(0) = \theta^0$. This gives

$$\frac{dg(y, \theta(t))}{dt} = h(y, \theta)'\frac{d\theta(t)}{dt} = h(y, \theta)'c(\theta) = 0$$

for all $y$. So $g(y, \theta)$ is constant along the curve for $0 \le t \le t^*$; hence $\theta^0$ is not identified. ■
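Theorem 2 can be checked numerically on a hypothetical model $y \sim N(\theta_1 + \theta_2, 1)$ (an assumption of this sketch, not an example from the text). The score is $h(y, \theta) = (y - \theta_1 - \theta_2)(1, 1)'$, so $\Psi(\theta) = E\{h h'\}$ has all four entries equal to 1 and is singular at every $\theta$; by Theorem 2 the parameter point is not locally identified, in line with the observational equivalence of all points with the same sum.

```python
import random

# Hypothetical model (illustration only): y ~ N(theta1 + theta2, 1).
# Score w.r.t. (theta1, theta2): h(y, theta) = (y - theta1 - theta2) * (1, 1)'.
# Hence Psi = E{h h'} = [[1, 1], [1, 1]], which is singular, and by
# Theorem 2 the point (theta1, theta2) is not locally identified.
random.seed(1)
theta1, theta2 = 1.0, 2.0
n = 100_000

psi = [[0.0, 0.0], [0.0, 0.0]]   # Monte Carlo estimate of E{h h'}
for _ in range(n):
    y = random.gauss(theta1 + theta2, 1.0)
    s = y - theta1 - theta2
    h = (s, s)                    # score vector
    for i in range(2):
        for j in range(2):
            psi[i][j] += h[i] * h[j] / n

# Entries are near 1 and the determinant vanishes (rank 1):
det = psi[0][0] * psi[1][1] - psi[0][1] * psi[1][0]
assert abs(psi[0][0] - 1.0) < 0.05
assert abs(det) < 1e-9
```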