The Jacobian Matrix Criterion

The advantage of Theorem 2 is that we do not have to operate on the joint probability distribution of the underlying random variables directly when analyzing the identification of a particular model. It suffices to consider the information matrix. A further simplification is possible when the underlying distribution admits a sufficient statistic (of a size that does not depend on the sample size). Under suitable regularity conditions, which are essentially those for the Cramér-Rao theorem, such a sufficient statistic exists if and only if the distribution belongs to the exponential family. Its density function can then be written as

$$f(y, \theta) = a(y)\, e^{b(y)'\tau(\theta) + c(\theta)},$$

for a suitable choice of functions $a(\cdot)$, $b(\cdot)$, $c(\cdot)$ and $\tau(\cdot)$, where $\tau(\cdot)$ and $b(\cdot)$ are vector functions. Without loss of generality we assume that the covariance matrix of $b(y)$ is nonsingular.

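When $y_1, \ldots, y_n$ denote the vectors of observations, a sufficient statistic is given by $s(y) = \sum_{i=1}^{n} b(y_i)$, as follows from the factorization theorem for jointly sufficient statistics. The first-order derivative of the loglikelihood is

$$\frac{\partial \log l(\theta)}{\partial \theta} = Q(\theta)\, s(y) + n\, \frac{\partial c(\theta)}{\partial \theta},$$

where

$$Q(\theta) = \frac{\partial \tau(\theta)'}{\partial \theta}. \qquad (7.3)$$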

Since $E\{\partial \log l(\theta)/\partial \theta\} = 0$, the information matrix is given by the variance of the derivative of the loglikelihood:

$$\Psi(\theta) = Q(\theta)\, \Sigma_{s(y)}\, Q(\theta)'.$$

Since $\Sigma_{s(y)}$, the covariance matrix of $s(y)$, is of full rank, the information matrix is of full rank if and only if $Q(\theta)$ is of full row rank. So we have established the following result.

Theorem 3. Let $f(y, \theta)$ belong to the exponential family. Let $\theta_0$ be a regular point of the information matrix $\Psi(\theta)$. Let $Q(\cdot)$ be as defined in (7.3). Then $\theta_0$ is locally identified if and only if $Q(\theta_0)$ has full row rank.

As a byproduct of this theorem, we note that $\tau(\theta)$ (sometimes called the canonical or natural parameter) is identified, since the corresponding information matrix is simply the covariance matrix of $s(y)$, which is assumed to be of full rank.
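The rank condition of Theorem 3 can be checked numerically at a given parameter point. The sketch below is only an illustration under stated assumptions: it uses the univariate normal written in exponential-family form with natural parameter (mu/var, 1/var), and the two parameterizations `tau_identified` and `tau_unidentified`, the helper names, and the finite-difference step are hypothetical choices that do not come from the text.

```python
import numpy as np

def jacobian(fun, theta, h=1e-6):
    """Central-difference approximation of d fun / d theta' (an m x l matrix)."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        cols.append((fun(theta + step) - fun(theta - step)) / (2.0 * h))
    return np.column_stack(cols)

def Q(fun, theta):
    """Q(theta) = d tau(theta)' / d theta, the l x m transpose of the Jacobian."""
    return jacobian(fun, theta).T

def tau_identified(theta):
    # Hypothetical model: mean = theta[0], variance = theta[1].
    mu, var = theta
    return np.array([mu / var, 1.0 / var])

def tau_unidentified(theta):
    # Hypothetical model: mean = theta[0] + theta[1], variance fixed at 1.
    # Only the sum of the two parameters enters, so they cannot be separated.
    mu = theta[0] + theta[1]
    return np.array([mu, 1.0])

theta0 = np.array([0.5, 2.0])
rank_ok = np.linalg.matrix_rank(Q(tau_identified, theta0), tol=1e-6)
rank_bad = np.linalg.matrix_rank(Q(tau_unidentified, theta0), tol=1e-6)

# Full row rank means the rank equals the number of parameters (here 2).
print("identified model: rank of Q =", rank_ok)
print("unidentified model: rank of Q =", rank_bad)
```

A numerical rank depends on the tolerance, so a check of this kind indicates rather than proves local identification, and it says nothing about whether the chosen point is regular.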

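A major application of Theorem 3 concerns the $k$-dimensional normal distribution with parameters $\mu$ and $\Sigma$ whose elements are functions of a parameter vector $\theta$. For the normal we can write

$$f(y, \theta) = (2\pi)^{-k/2}\, |\Sigma|^{-1/2}\, e^{-\frac{1}{2}(y-\mu)'\Sigma^{-1}(y-\mu)}$$

$$= (2\pi)^{-k/2}\, e^{(y',\, -\frac{1}{2}\, y' \otimes y')(\Sigma^{-1}\mu;\ \operatorname{vec}\Sigma^{-1}) + \log|\Sigma|^{-1/2} - \frac{1}{2}\mu'\Sigma^{-1}\mu}.$$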

This gives the normal distribution in the form of the exponential family, but the term $y \otimes y$ contains redundant elements and hence its covariance matrix is singular. To eliminate this singularity, let $N_k$ (of order $k^2 \times k^2$), $D_k$ (of order $k^2 \times \frac{1}{2}k(k+1)$), and $L_k$ (of order $\frac{1}{2}k(k+1) \times k^2$) be matrices with properties

$$N_k \operatorname{vec} A = \operatorname{vec} \tfrac{1}{2}(A + A'), \qquad D_k v(A) = \operatorname{vec} A, \qquad L_k' v(B) = \operatorname{vec} B$$

for every $k \times k$ matrix $A$ (symmetric in the case of the property of $D_k$) and for every lower triangular $k \times k$ matrix $B$, where the $\frac{1}{2}k(k+1)$-vector $v(A)$ is the vector obtained from $\operatorname{vec} A$ by eliminating all supradiagonal elements of $A$. Then $D_k L_k N_k = N_k$ (Magnus, 1988, theorem 5.5(ii)), and

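$$(y \otimes y)' \operatorname{vec} \Sigma^{-1} = (y \otimes y)' N_k \operatorname{vec} \Sigma^{-1}$$

$$= (y \otimes y)' N_k L_k' D_k' \operatorname{vec} \Sigma^{-1} = (y \otimes y)' L_k' D_k' \operatorname{vec} \Sigma^{-1} = (L_k (y \otimes y))' D_k' \operatorname{vec} \Sigma^{-1}.$$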

So the normal density fits in the k-dimensional exponential family with

$$b(y) = (y;\ -\tfrac{1}{2} L_k (y \otimes y)), \qquad \tau(\theta) = (\Sigma^{-1}\mu;\ D_k' \operatorname{vec} \Sigma^{-1}).$$
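The matrix identities used in this reduction can be verified numerically for a small dimension. The following sketch builds the symmetrizer, duplication and elimination matrices for k = 3 and checks that the reduced statistic and natural parameter reproduce the exponent of the normal density. The construction, the function names and the random test values are assumptions made for illustration only; vec is taken to stack columns, and v(A) keeps the elements on and below the diagonal in that same order.

```python
import numpy as np

def vec(A):
    """Stack the columns of A into one vector."""
    return A.reshape(-1, order="F")

def lower_positions(k):
    """Positions in vec(A) of the elements on or below the diagonal."""
    return [j * k + i for j in range(k) for i in range(j, k)]

def elimination_matrix(k):
    """L_k, which satisfies L_k vec(A) = v(A)."""
    pos = lower_positions(k)
    L = np.zeros((len(pos), k * k))
    L[np.arange(len(pos)), pos] = 1.0
    return L

def duplication_matrix(k):
    """D_k, which satisfies D_k v(A) = vec(A) for symmetric A."""
    pos = lower_positions(k)
    D = np.zeros((k * k, len(pos)))
    for col, p in enumerate(pos):
        j, i = divmod(p, k)          # element (i, j) with i >= j
        D[j * k + i, col] = 1.0      # element (i, j) in vec(A)
        D[i * k + j, col] = 1.0      # mirrored element (j, i)
    return D

def symmetrizer_matrix(k):
    """N_k, which satisfies N_k vec(A) = vec((A + A') / 2)."""
    K = np.zeros((k * k, k * k))     # commutation matrix: K vec(A) = vec(A')
    for i in range(k):
        for j in range(k):
            K[i * k + j, j * k + i] = 1.0
    return 0.5 * (np.eye(k * k) + K)

k = 3
N, D, L = symmetrizer_matrix(k), duplication_matrix(k), elimination_matrix(k)

rng = np.random.default_rng(0)
y = rng.normal(size=k)
mu = rng.normal(size=k)
A = rng.normal(size=(k, k))
Sigma = A @ A.T + k * np.eye(k)      # an arbitrary positive definite matrix
Sigma_inv = np.linalg.inv(Sigma)

# Quadratic form written with the full and with the reduced statistic.
yy = np.kron(y, y)
quad_full = yy @ vec(Sigma_inv)
quad_reduced = (L @ yy) @ (D.T @ vec(Sigma_inv))

# b(y)' tau(theta) should equal y' Sigma^{-1} mu - y' Sigma^{-1} y / 2.
b = np.concatenate([y, -0.5 * (L @ yy)])
tau = np.concatenate([Sigma_inv @ mu, D.T @ vec(Sigma_inv)])
exponent_direct = y @ Sigma_inv @ mu - 0.5 * y @ Sigma_inv @ y
```

In this sketch `b` has k + k(k+1)/2 elements, which is the number of non-redundant first and second moments of y.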

The identification of a normality-based model with parameterized mean and variance hence depends on the column rank of the matrix of derivatives of $\tau(\theta)$ with respect to $\theta'$, or equivalently on the column rank of the matrix of derivatives of

$$\sigma(\theta) = (\mu;\ v(\Sigma)). \qquad (7.4)$$

The equivalence is due to the fact that the Jacobian matrix of the transformation from $\tau(\theta)$ to $\sigma(\theta)$ is equal to

$$\begin{pmatrix} \Sigma^{-1} & -(\mu'\Sigma^{-1} \otimes \Sigma^{-1}) D_k \\ 0 & -D_k'(\Sigma^{-1} \otimes \Sigma^{-1}) D_k \end{pmatrix}$$

and is hence nonsingular. If, as is often the case in practice, the mean of the distribution is zero, (7.4) reduces to $\sigma(\theta) = v(\Sigma)$. So, the identification of a model when the underlying distribution is multivariate normal with zero means depends on the structure of the covariance matrix only.
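For a zero-mean normal model the criterion therefore amounts to a rank check on the Jacobian of v(Sigma) with respect to the parameters. The sketch below applies it to a one-factor covariance structure, Sigma = lambda lambda' + diag(psi). This model is a hypothetical illustration chosen here and is not taken from the text, and the evaluation points and tolerance are arbitrary assumptions.

```python
import numpy as np

def vech(A):
    """v(A): elements on and below the diagonal, column by column."""
    k = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(k)])

def sigma_one_factor(theta):
    """v(Sigma) for the assumed structure Sigma = lam lam' + diag(psi)."""
    k = theta.size // 2
    lam, psi = theta[:k], theta[k:]
    return vech(np.outer(lam, lam) + np.diag(psi))

def jacobian(fun, theta, h=1e-6):
    """Central-difference approximation of d fun / d theta'."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        cols.append((fun(theta + step) - fun(theta - step)) / (2.0 * h))
    return np.column_stack(cols)

def has_full_column_rank(theta, tol=1e-6):
    """True if the Jacobian of v(Sigma) has rank equal to the number of parameters."""
    J = jacobian(sigma_one_factor, np.asarray(theta, dtype=float))
    return bool(np.linalg.matrix_rank(J, tol=tol) == J.shape[1])

# Three indicators: 6 parameters and 6 distinct covariance elements.
theta_k3 = [1.0, 0.8, 0.6, 0.5, 0.5, 0.5]
# Two indicators: 4 parameters but only 3 distinct covariance elements.
theta_k2 = [1.0, 0.8, 0.5, 0.5]

print("k = 3, full column rank:", has_full_column_rank(theta_k3))
print("k = 2, full column rank:", has_full_column_rank(theta_k2))
```

With two indicators the Jacobian has more columns than rows, so it cannot have full column rank at any point. With three indicators the rank is full at the chosen point, which has nonzero loadings; the check is local and does not address the sign of the loadings.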

While normality is frequently an assumption of convenience, not justified by theory or data, it should be stressed that, when dealing with problems of identification, it is also a conservative assumption (cf. Aigner et al., 1984; Bekker, 1986). When the underlying distribution is nonnormal, models that are not identified under normality may nevertheless be identified, since higher order moments can then be added to $\sigma(\theta)$, which either leaves the rank of the Jacobian matrix unaffected or increases it.

For the remainder of this chapter we do not make a specific assumption as to the form of the distribution. We merely assume in generality that there exists a vector function $\sigma$ of order $n$, $\sigma(\theta): \mathbb{R}^l \to \mathbb{R}^n$, such that there exists a one-to-one relation between the elements of $\{P(y, \theta) \mid \theta \in \mathbb{R}^l\}$ and $\{\sigma(\theta) \mid \theta \in \mathbb{R}^l\}$; the identification of the parameters from the underlying distribution is "transmitted" through $\sigma(\theta)$.
