PROPERTIES OF THE SYMMETRIC MATRIX

Now we shall study the properties of symmetric matrices, which play a major role in multivariate statistical analysis. Throughout this section, A will denote an n × n symmetric matrix and X a matrix that is not necessarily square. We shall often assume that X is n × K with K < n.

The following theorem about the diagonalization of a symmetric matrix is central to this section.

THEOREM 11.5.1 For any symmetric matrix A, there exists an orthogonal matrix H (that is, a square matrix satisfying H’H = I) such that

(11.5.1) H’AH = Λ,

where Λ is a diagonal matrix. The diagonal elements of Λ are called the characteristic roots (or eigenvalues) of A. The ith column of H is called the characteristic vector (or eigenvector) of A corresponding to the characteristic root of A which is the ith diagonal element of Λ.

Proof. See Bellman (1970, p. 54).

Note that H and Λ are not uniquely determined for a given symmetric matrix A, since H’AH = Λ would still hold if we changed the order of the diagonal elements of Λ and the order of the corresponding columns of H. The set of the characteristic roots of a given matrix is unique, however, if we ignore the order in which they are arranged.

Theorem 11.5.1 is important in that it establishes a close relationship between matrix operations and scalar operations. For example, the inverse of a matrix defined in Definition 11.3.2 is related to the usual inverse of a scalar in the following sense. Premultiplying and postmultiplying (11.5.1) by H and H’ respectively, and noting that HH’ = H’H = I, we obtain

(11.5.2) A = HΛH’.

Inverting both sides of (11.5.2) and using Theorem 11.3.7 yields

(11.5.3) A⁻¹ = HΛ⁻¹H’,

since H’H = I implies H⁻¹ = H’. Denote Λ by D(λᵢ), indicating that it is a diagonal matrix with λᵢ in the ith diagonal position. Then clearly Λ⁻¹ = D(λᵢ⁻¹). Thus the orthogonal diagonalization (11.5.1) has enabled us to reduce the calculation of the matrix inversion to that of the ordinary scalar inversion.

More generally, a matrix operation f(A) can be reduced to the corresponding scalar operation by the formula

(11.5.4) f(A) = HD[f(λᵢ)]H’.

The reader should verify, for example, that

(11.5.5) A²(= AA) = HD(λᵢ²)H’.
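Formula (11.5.4) is easy to exercise numerically. The following sketch (our own code, not from the text) uses NumPy's `numpy.linalg.eigh`, which returns the characteristic roots and an orthogonal H for a symmetric matrix; the function name `apply_to_roots` is ours.

```python
import numpy as np

def apply_to_roots(A, f):
    """Compute f(A) = H D[f(lambda_i)] H' as in (11.5.4),
    for a symmetric matrix A. (Our own helper name.)"""
    lam, H = np.linalg.eigh(A)          # H'AH = diag(lam), H orthogonal
    return H @ np.diag(f(lam)) @ H.T

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# (11.5.5): A^2 = H D(lambda_i^2) H'
print(np.allclose(apply_to_roots(A, lambda lam: lam**2), A @ A))             # True

# (11.5.3): A^{-1} = H D(lambda_i^{-1}) H'
print(np.allclose(apply_to_roots(A, lambda lam: 1 / lam), np.linalg.inv(A))) # True
```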

Given a symmetric matrix A, how can we find A and H? The following theorem will aid us.

THEOREM 11.5.2 Let λ be a characteristic root of A and let h be the corresponding characteristic vector. Then

(11.5.6) Ah = λh

and

(11.5.7) |A − λI| = 0.

Proof. Premultiplying (11.5.1) by H yields

(11.5.8) AH = HΛ.

Singling out the ith column of both sides of (11.5.8) yields Ahᵢ = λᵢhᵢ, where λᵢ is the ith diagonal element of Λ and hᵢ is the ith column of H. This proves (11.5.6). Writing (11.5.6) as (A − λI)h = 0 and using Theorem 11.4.1 proves (11.5.7). □

Let us find the characteristic roots and vectors of the matrix

A = [1 2]
    [2 1].

By (11.5.7) we have

|1 − λ    2  |
|  2    1 − λ| = (1 − λ)² − 4 = 0.

Therefore the characteristic roots are 3 and −1. Solving

[1 2] [x₁] = 3 [x₁]   and   x₁² + x₂² = 1
[2 1] [x₂]     [x₂]

simultaneously for x₁ and x₂, we obtain x₁ = x₂ = 1/√2. Solving

[1 2] [y₁] = (−1) [y₁]   and   y₁² + y₂² = 1
[2 1] [y₂]        [y₂]

simultaneously for y₁ and y₂, we obtain y₁ = 1/√2 and y₂ = −1/√2. (y₁ = −1/√2 and y₂ = 1/√2 also constitute a solution.) The diagonalization (11.5.1) can be written in this case as

[1/√2   1/√2] [1 2] [1/√2   1/√2]   [3  0]
[1/√2  −1/√2] [2 1] [1/√2  −1/√2] = [0 −1].
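The worked example above can be verified numerically; the following NumPy sketch (our own code, not part of the text) checks the roots, the orthogonality of H, and the diagonalization.

```python
import numpy as np

# Verify: A = [[1, 2], [2, 1]] has characteristic roots 3 and -1
# with orthonormal characteristic vectors (1,1)'/sqrt(2) and (1,-1)'/sqrt(2).
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
lam, H = np.linalg.eigh(A)   # eigh returns the roots in ascending order

print(np.allclose(lam, [-1.0, 3.0]))            # True
print(np.allclose(H.T @ A @ H, np.diag(lam)))   # True: H'AH is diagonal
print(np.allclose(H.T @ H, np.eye(2)))          # True: H is orthogonal
```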

The characteristic roots of any square matrix can also be defined by (11.5.7). Under this definition some of the theorems presented below hold for general square matrices. Whenever we speak of the characteristic roots of a matrix, however, the reader may assume that the matrix in question is symmetric. Even when a theorem holds for a general square matrix, we shall prove it only for symmetric matrices.

The following are useful theorems concerning characteristic roots.

THEOREM 11.5.3 The rank of a square matrix is equal to the number of its nonzero characteristic roots.

Proof. We shall prove the theorem for an n × n symmetric matrix A. Suppose that n₁ of the roots are nonzero. Using (11.5.2), we have

rank(A) = rank(HΛH’)

= rank(ΛH’) by Theorem 11.4.11

= rank(HΛ) by Theorem 11.4.7

= rank(Λ) by Theorem 11.4.11

= n₁. □

THEOREM 11.5.4 For any matrices X and Y, not necessarily square, the nonzero characteristic roots of XY and YX are the same, whenever both XY and YX are defined.

Proof. See Bellman (1970, p. 96).

THEOREM 11.5.5 Let A and B be symmetric matrices of the same size. Then A and B can be diagonalized by the same orthogonal matrix if and only if AB = BA.

Proof. See Bellman (1970, p. 56).

THEOREM 11.5.6 Let λ₁ and λₙ be the largest and the smallest characteristic roots, respectively, of an n × n symmetric matrix A. Then for every nonzero n-component vector x,

(11.5.9) λ₁ ≥ x’Ax/x’x ≥ λₙ.

Proof. Using (11.5.1) and HH’ = I, we have

(11.5.10) x’Ax/x’x = z’Λz/z’z,

where z = H’x. The inequalities (11.5.9) follow from z’(λ₁I − Λ)z ≥ 0 and z’(Λ − λₙI)z ≥ 0. □
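The bounds in (11.5.9) can be spot-checked numerically; this sketch (our own, assuming NumPy) compares the Rayleigh quotient against the extreme roots for many random vectors.

```python
import numpy as np

# Check (11.5.9): lambda_1 >= x'Ax / x'x >= lambda_n
# for a symmetric A and random nonzero vectors x.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                        # a symmetric 4 x 4 matrix
lam = np.linalg.eigvalsh(A)        # roots in ascending order

ok = True
for _ in range(1000):
    x = rng.standard_normal(4)
    r = x @ A @ x / (x @ x)        # the Rayleigh quotient x'Ax / x'x
    ok = ok and (lam[0] - 1e-12 <= r <= lam[-1] + 1e-12)
print(ok)   # True
```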

Each characteristic root of a matrix can be regarded as a real-valued function of the matrix which captures certain of its characteristics. The determinant of a matrix, which we examined in Section 11.3, is another important scalar representation of a matrix. The following theorem establishes a close connection between the two concepts.

THEOREM 11.5.7 The determinant of a square matrix is the product of its characteristic roots.

Proof. We shall prove the theorem only for a symmetric matrix A. Taking the determinant of both sides of (11.5.2) and using Theorems 11.3.1 and 11.3.5 yields |A| = |H|²|Λ|. But H’H = I implies |H|² = 1. Therefore |A| = |Λ|, which implies the theorem, since the determinant of a diagonal matrix is the product of the diagonal elements. □
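Theorem 11.5.7 can be checked for the 2 × 2 matrix used earlier, whose roots are 3 and −1; a NumPy sketch (ours, not from the text):

```python
import numpy as np

# Theorem 11.5.7: |A| equals the product of the characteristic roots.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])          # roots 3 and -1, so |A| should be -3
lam = np.linalg.eigvalsh(A)
print(np.isclose(np.prod(lam), np.linalg.det(A)))   # True
print(np.isclose(np.linalg.det(A), -3.0))           # True
```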

We now define another important scalar representation of a square matrix called the trace.

DEFINITION 11.5.1 The trace of a square matrix, denoted by the notation tr, is defined as the sum of the diagonal elements of the matrix.

The following useful theorem can be proved directly from the definition of matrix multiplication.

THEOREM 11.5.8 Let X and Y be any matrices, not necessarily square, such that XY and YX are both defined. Then tr XY = tr YX.

There is a close connection between the trace and the characteristic roots.

THEOREM 11.5.9 The trace of a square matrix is the sum of its characteristic roots.

Proof. We shall prove the theorem only for a symmetric matrix A. Using (11.5.2) and Theorem 11.5.8, we have

(11.5.11) tr A = tr HΛH’ = tr ΛH’H = tr Λ. □
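Both Theorem 11.5.8 and Theorem 11.5.9 are easy to confirm numerically; a NumPy sketch (our own construction):

```python
import numpy as np

rng = np.random.default_rng(1)

# Theorem 11.5.8: tr XY = tr YX for conformable X (3 x 5) and Y (5 x 3).
X = rng.standard_normal((3, 5))
Y = rng.standard_normal((5, 3))
print(np.isclose(np.trace(X @ Y), np.trace(Y @ X)))              # True

# Theorem 11.5.9: tr A is the sum of the characteristic roots of A.
B = rng.standard_normal((4, 4))
A = B + B.T                                                      # symmetric
print(np.isclose(np.trace(A), np.linalg.eigvalsh(A).sum()))      # True
```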

We now introduce the concept of positive definiteness, which plays an important role in statistics. We deal only with symmetric matrices.

DEFINITION 11.5.2 If A is an n × n symmetric matrix, A is positive definite if x’Ax > 0 for every n-vector x such that x ≠ 0. If x’Ax ≥ 0 for every x, we say that A is nonnegative definite or positive semidefinite. (Negative definite and nonpositive definite or negative semidefinite are similarly defined.)

If A is positive definite, we write A > 0. The inequality symbol should not be regarded as meaning that every element of A is positive. (If A is diagonal, however, A > 0 does imply that all the diagonal elements are positive.) More generally, if A − B is positive definite, we write A > B. For nonnegative definiteness, we use the symbol ≥.

THEOREM 11.5.10 A symmetric matrix is positive definite if and only if its characteristic roots are all positive. (The theorem is also true if we change the word “positive” to “nonnegative,” “negative,” or “nonpositive.”)

Proof. The theorem follows immediately from Theorem 11.5.6. □
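Theorem 11.5.10 gives a practical test for positive definiteness: compute the roots rather than check x’Ax for all x. A NumPy sketch (our own example matrix):

```python
import numpy as np

# Theorem 11.5.10: a symmetric matrix is positive definite iff all its
# characteristic roots are positive. A = [[2, 1], [1, 2]] has roots 1 and 3.
rng = np.random.default_rng(2)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print((np.linalg.eigvalsh(A) > 0).all())                  # True

# Consistent with the definition: x'Ax > 0 for random nonzero x.
ok = all(x @ A @ x > 0 for x in rng.standard_normal((1000, 2)))
print(ok)                                                 # True
```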

THEOREM 11.5.11 A > 0 ⇒ A⁻¹ > 0.

Proof. The theorem follows from Theorem 11.5.10, since the characteristic roots of A⁻¹ are the reciprocals of the characteristic roots of A because of (11.5.3). □

THEOREM 11.5.12 Let A be an n × n symmetric matrix and let X be an n × K matrix, where K < n. Then A > 0 ⇒ X’AX ≥ 0. Moreover, if rank(X) = K, then A > 0 ⇒ X’AX > 0.

Proof. Let c be an arbitrary nonzero vector of K components, and define d = Xc. Then c’X’AXc = d’Ad. Since A > 0 implies d’Ad ≥ 0, we have X’AX ≥ 0. If X is of full rank, then d ≠ 0. Therefore A > 0 implies d’Ad > 0 and hence X’AX > 0. □
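Theorem 11.5.12 can be checked numerically; the sketch below (our own construction, assuming NumPy) builds a positive definite A and a full-column-rank X and confirms X'AX > 0 via the eigenvalue test of Theorem 11.5.10.

```python
import numpy as np

# With A (4 x 4) positive definite and X an n x K matrix (n = 4, K = 2)
# of full column rank, X'AX should be positive definite.
rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)          # positive definite by construction
X = rng.standard_normal((4, 2))      # full column rank with probability 1

M = X.T @ A @ X
print((np.linalg.eigvalsh(M) > 0).all())   # True, by Theorem 11.5.10
```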
THEOREM 11.5.13 Let A and B be symmetric positive definite matrices of the same size. Then A ≥ B ⇒ B⁻¹ ≥ A⁻¹ and A > B ⇒ B⁻¹ > A⁻¹.

Proof. See Bellman (1970, p. 93).

Next we discuss the application of the above theorems concerning positive definite matrices to the theory of estimation of multiple parameters. Recall that in Definition 7.2.1 we defined the goodness of an estimator using the mean squared error as the criterion. The question we now pose is, How do we compare two vector estimators of a vector of parameters? The following is a natural generalization of Definition 7.2.1 to the case of vector estimation.

DEFINITION 11.5.3 Let θ̂ and θ̃ be estimators of a vector parameter θ. Let A and B be their respective mean squared error matrices; that is, A = E(θ̂ − θ)(θ̂ − θ)’ and B = E(θ̃ − θ)(θ̃ − θ)’. Then we say that θ̂ is better than θ̃ if A ≤ B for any parameter value and A ≠ B for at least one value of the parameter. (Both A and B can be shown to be nonnegative definite directly from Definition 11.5.2.)

Note that if θ̂ is better than θ̃ in the sense of this definition, θ̂ is at least as good as θ̃ for estimating any element of θ. More generally, it implies that c’θ̂ is at least as good as c’θ̃ for estimating c’θ for an arbitrary vector c of the same size as θ. Thus we see that this definition is a reasonable generalization of Definition 7.2.1. Unfortunately, we cannot always rank two estimators by this definition alone. For example, consider

In neither example can we establish that A ≤ B or B ≤ A. We must use some other criteria to rank estimators. The two most commonly used are the trace and the determinant. In (11.5.12), tr A < tr B, and in (11.5.13), |A| < |B|. Note that A < B implies tr A < tr B because of Theorem 11.5.9. It can also be shown that A < B implies |A| < |B|. The proof is somewhat involved and hence is omitted. In each case the converse is not necessarily true.
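Since the matrices displayed in (11.5.12) and (11.5.13) did not survive reproduction here, the sketch below uses hypothetical MSE matrices of our own choosing that are likewise not ranked by the matrix inequality, yet are ranked by both the trace and the determinant criteria.

```python
import numpy as np

# Two hypothetical MSE matrices (ours, not the text's examples): neither
# B - A nor A - B is nonnegative definite, so A and B are not comparable
# under Definition 11.5.3.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 0.0],
              [0.0, 3.0]])

print((np.linalg.eigvalsh(B - A) >= 0).all())   # False: B - A is not >= 0
print((np.linalg.eigvalsh(A - B) >= 0).all())   # False: A - B is not >= 0

# Yet the trace and determinant criteria both rank A above B.
print(np.trace(A) < np.trace(B))                # True: 3 < 4
print(np.linalg.det(A) < np.linalg.det(B))      # True: 2 < 3
```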

In the remainder of this section we discuss the properties of a particular nonnegative definite matrix of the form P = X(X’X)⁻¹X’, where X is an n × K matrix of rank K. This matrix plays a very important role in the theory of the least squares estimator developed in Chapter 12.

THEOREM 11.5.14 An arbitrary n-dimensional vector y can be written as y = y₁ + y₂ such that Py₁ = y₁ and Py₂ = 0.

Proof. By Theorem 11.4.9, there exists an n × (n − K) matrix Z such that (X, Z) is nonsingular and X’Z = 0. Since (X, Z) is nonsingular, there exists an n-vector c such that y = (X, Z)c = Xc₁ + Zc₂. Set y₁ = Xc₁ and y₂ = Zc₂. Then clearly Py₁ = y₁ and Py₂ = 0. □

It immediately follows from Theorem 11.5.14 that Py = y₁. We call this operation the projection of y onto the space spanned by the columns of X, since the resulting vector y₁ = Xc₁ is a linear combination of the columns of X. Hence we call P a projection matrix. The projection matrix M = Z(Z’Z)⁻¹Z’, where Z is as defined in the proof of Theorem 11.5.14, plays the opposite role from the projection matrix P. Namely, My = y₂.
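The decomposition y = y₁ + y₂ of Theorem 11.5.14 can be sketched numerically; the code below (ours, assuming NumPy) forms P explicitly and checks Py₁ = y₁ and Py₂ = 0.

```python
import numpy as np

# The projection P = X(X'X)^{-1}X' with n = 4, K = 2.
rng = np.random.default_rng(4)
X = rng.standard_normal((4, 2))
P = X @ np.linalg.inv(X.T @ X) @ X.T

y = rng.standard_normal(4)
y1 = P @ y                 # projection onto the column space of X
y2 = y - y1                # the remainder, annihilated by P

print(np.allclose(P @ y1, y1))              # True: Py1 = y1
print(np.allclose(P @ y2, np.zeros(4)))     # True: Py2 = 0
print(np.allclose(X.T @ y2, np.zeros(2)))   # True: y2 is orthogonal to X
```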

THEOREM 11.5.15 I − P = M.

Proof. We have

(11.5.14) (I − P − M)(X, Z) = (X, Z) − (X, 0) − (0, Z) = 0.

Postmultiplying both sides of (11.5.14) by (X, Z)⁻¹ yields the desired result. □

THEOREM 11.5.16 P = P’ = P².

This can be easily verified. Any square matrix A for which A² = A is called an idempotent matrix. Theorem 11.5.16 states that P is a symmetric idempotent matrix.

THEOREM 11.5.17 rank(P) = K.

Proof. As we have shown in the proof of Theorem 11.5.14, there exists an n × (n − K) full-rank matrix Z such that PZ = 0. Suppose PW = 0 for some matrix W with n rows. Since, by Theorem 11.5.14, W = XA + ZB for some matrices A and B, PW = 0 implies XA = 0, which in turn implies A = 0. Therefore W = ZB, which implies rank(W) ≤ n − K. Thus the theorem follows from Theorem 11.4.10. (An alternative proof is to use Theorem 11.5.3 and Theorem 11.5.18 below.) □

THEOREM 11.5.18 The characteristic roots of P consist of K ones and n − K zeros.

Proof. By Theorem 11.5.4 the nonzero characteristic roots of X(X’X)⁻¹X’ and (X’X)⁻¹X’X are the same. But since the second matrix is the identity of size K, its characteristic roots are K ones. □
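Theorems 11.5.17 and 11.5.18 can both be confirmed on a small example; a NumPy sketch (our own, with n = 5 and K = 2):

```python
import numpy as np

# P = X(X'X)^{-1}X' should have rank K, with characteristic roots
# consisting of K ones and n - K zeros.
rng = np.random.default_rng(5)
n, K = 5, 2
X = rng.standard_normal((n, K))
P = X @ np.linalg.inv(X.T @ X) @ X.T

lam = np.linalg.eigvalsh(P)
print(np.allclose(np.sort(lam), [0, 0, 0, 1, 1]))   # True
print(np.linalg.matrix_rank(P))                     # 2
```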

THEOREM 11.5.19 Let X be an n × K matrix of rank K. Partition X as X = (X₁, X₂) such that X₁ is n × K₁ and X₂ is n × K₂ and K₁ + K₂ = K. If we define X₂* = [I − X₁(X₁’X₁)⁻¹X₁’]X₂, then we have

X(X’X)⁻¹X’ = X₁(X₁’X₁)⁻¹X₁’ + X₂*(X₂*’X₂*)⁻¹X₂*’.

Proof. The theorem follows from noting that

[X(X’X)⁻¹X’ − X₁(X₁’X₁)⁻¹X₁’ − X₂*(X₂*’X₂*)⁻¹X₂*’][X₁, X₂, Z] = 0. □

3. (Section 11.3)

A = [1 2]    and    B = […]
    [3 4]

4. (Section 11.3)

Prove A⁻¹ − (A + B)⁻¹ = A⁻¹(A⁻¹ + B⁻¹)⁻¹A⁻¹ whenever all the inverses exist. If you cannot prove it, verify it for the A and B given in Exercise 3 above.

5. (Section 11.4)

Solve the following equations for x₁ and x₂; first, by using the inverse of the matrix, and second, by using Cramer’s rule:

[1 1] [x₁]   […]
[3 4] [x₂] = [1]

6. (Section 11.4)

Solve the following equations for x₁, x₂, and x₃; first, by using the inverse of the matrix, and second, by using Cramer’s rule:

[1 −2  3] [x₁]   [1]
[1  1 −1] [x₂] = [0]
[2  1  2] [x₃]   [1]

7. (Section 11.4)

Find the rank of the matrix

[1 1 1]
[2 3 1]
[2 1 3]

8. (Section 11.4)

Find the rank of the matrix

[1 2 1 2]
[1 2 3 4]
[1 2 5 6]
[2 4 0 2]

9. (Section 11.5)

Define

A = [ 3  √2]
    [√2   2]

and compute A^0.5.

10. (Section 11.5)

Compute

A^(−0.5), where

A = [5 2]
    [2 2].

11. (Section 11.5)

Prove Theorem 11.5.8.

12. (Section 11.5)

Let A be a symmetric matrix whose characteristic roots are less than one in absolute value. Show that

(I − A)⁻¹ = I + A + A² + … .

13. (Section 11.5)

Suppose that A and В are symmetric positive definite matrices of the same size. Show that if AB is symmetric, it is positive definite.

14. (Section 11.5)

Find the inverse of the matrix I + xx’ where x is a vector of the same dimension as I.

15. (Section 11.5) Define

1 1 1 1

1 -2

Compute X(X’X)⁻¹X’ and its characteristic vectors and roots.