Some Useful Matrix Properties
This book assumes that the reader has encountered matrices before and knows how to add, subtract, and multiply conformable matrices. It also assumes that the reader is familiar with the transpose, trace, rank, determinant, and inverse of a matrix. Unfamiliar readers should consult standard texts like Bellman (1970) or Searle (1982). The purpose of this Appendix is to review some useful matrix properties that are used in the text and to provide easy access to them. Most of these properties are given without proof.
Starting with Chapter 7, our data matrix $X$ is organized such that it has $n$ rows and $k$ columns, so that each row denotes an observation on $k$ variables and each column denotes $n$ observations on one variable. This matrix is of dimension $n \times k$. The rank of an $n \times k$ matrix is always less than or equal to its smaller dimension. Since $n > k$, rank$(X) \leq k$. When there is no perfect multicollinearity among the variables in $X$, this matrix is said to be of full column rank $k$. In this case, $X'X$, the matrix of cross-products, is of dimension $k \times k$. It is square, symmetric and of full rank $k$. This uses the fact that rank$(X'X) = {}$rank$(X) = k$. Therefore, $X'X$ is nonsingular and the inverse $(X'X)^{-1}$ exists. This is needed for the computation of Ordinary Least Squares. In fact, for least squares to be feasible, $X$ should be of full column rank $k$ and no variable in $X$ should be a perfect linear combination of the other variables in $X$. If we write
$X' = [x_1, x_2, \ldots, x_n]$, where $x_i$ denotes the $i$-th observation in the data, then $X'X = \sum_{i=1}^{n} x_i x_i'$, where $x_i$ is a column vector of dimension $k \times 1$.
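These rank and cross-product facts can be checked numerically. The following is a minimal sketch in Python with NumPy, using made-up data: it verifies that a full-column-rank $X$ yields a nonsingular $X'X$, and that $X'X$ equals the sum of outer products $\sum_i x_i x_i'$.

```python
import numpy as np

# Hypothetical data: n = 5 observations on k = 2 variables
# (first column is a constant, as in a regression with an intercept).
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0],
              [1.0, 7.0],
              [1.0, 8.0]])
n, k = X.shape

# Full column rank: rank(X'X) = rank(X) = k, so (X'X)^{-1} exists.
XtX = X.T @ X
assert np.linalg.matrix_rank(X) == k
assert np.linalg.matrix_rank(XtX) == k

# X'X accumulated observation by observation as sum of outer products.
outer_sum = sum(np.outer(x_i, x_i) for x_i in X)
assert np.allclose(XtX, outer_sum)
```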
An important and widely encountered matrix is the identity matrix, which will be denoted by $I_n$ and subscripted by its dimension $n$. This is a square $n \times n$ matrix whose diagonal elements are all equal to one and whose off-diagonal elements are all equal to zero. Also, $\sigma^2 I_n$ will be a familiar scalar covariance matrix, with every diagonal element equal to $\sigma^2$, reflecting homoskedasticity or equal variances (see Chapter 5), and zero covariances or no serial correlation (see Chapter 5). Let
\[
\Omega = \mathrm{diag}[\sigma_i^2]
\]
be an $(n \times n)$ diagonal matrix with the $i$-th diagonal element equal to $\sigma_i^2$ for $i = 1, 2, \ldots, n$. This matrix will be encountered under heteroskedasticity, see Chapter 9. Note that $\mathrm{tr}(\Omega) = \sum_{i=1}^{n} \sigma_i^2$ is the sum of its diagonal elements. Also, $\mathrm{tr}(I_n) = n$ and $\mathrm{tr}(\sigma^2 I_n) = n\sigma^2$. Another useful matrix is the projection matrix $P_X = X(X'X)^{-1}X'$, which is of dimension $n \times n$. This matrix is encountered in Chapter 7. If $y$ denotes the $n \times 1$ vector of observations on the dependent variable, then $P_X y$ generates the predicted values $\hat{y}$ from the least squares regression of $y$ on $X$. The matrix $P_X$ is symmetric and idempotent. This means that $P_X' = P_X$ and $P_X^2 = P_X P_X = P_X$, as can be easily verified. One property of idempotent matrices is that their rank is equal to their trace. Hence, rank$(P_X) = \mathrm{tr}(P_X) = \mathrm{tr}[X(X'X)^{-1}X'] = \mathrm{tr}[(X'X)^{-1}X'X] = \mathrm{tr}(I_k) = k$.
Here, we used the fact that $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$. In other words, the trace is unaffected by cyclical permutations of the product. Of course, these matrices should be conformable and the product should result in a square matrix. Note that $\bar{P}_X = I_n - P_X$ is also a symmetric and idempotent matrix. In this case, $\bar{P}_X y = y - P_X y = y - \hat{y} = e$, where $e$ denotes the vector of least squares residuals $y - X\hat{\beta}_{OLS}$, with $\hat{\beta}_{OLS} = (X'X)^{-1}X'y$, see Chapter 7. Some properties of these projection matrices are the following:
\[
P_X X = X, \quad \bar{P}_X X = 0, \quad \bar{P}_X e = e \quad \text{and} \quad P_X e = 0.
\]
In fact, $X'e = 0$ means that the matrix $X$ is orthogonal to the vector of least squares residuals $e$. Note that $X'e = 0$ means that $X'(y - X\hat{\beta}_{OLS}) = 0$, or $X'y = X'X\hat{\beta}_{OLS}$. These $k$ equations are known as the OLS normal equations, and their solution yields the least squares estimator $\hat{\beta}_{OLS}$. By the definition of $\bar{P}_X$, we have (i) $P_X + \bar{P}_X = I_n$. Also, (ii) $P_X$ and $\bar{P}_X$ are idempotent and (iii) $P_X\bar{P}_X = 0$. In fact, any two of these properties imply the third. The rank$(\bar{P}_X) = \mathrm{tr}(\bar{P}_X) = \mathrm{tr}(I_n - P_X) = n - k$. Note that $P_X$ and $\bar{P}_X$ are of rank $k$ and $(n-k)$, respectively. Neither matrix is of full rank. In fact, the only full rank, symmetric idempotent matrix is the identity matrix.
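The projection-matrix properties above are easy to verify on a small example. The sketch below (hypothetical numbers) builds $P_X$ and $\bar{P}_X = I_n - P_X$, and checks symmetry, idempotency, rank = trace, the normal equations $X'e = 0$, and $P_X\bar{P}_X = 0$.

```python
import numpy as np

# Hypothetical regression data: n = 6 observations, k = 2 regressors.
X = np.column_stack([np.ones(6), np.array([1.0, 2.0, 4.0, 5.0, 7.0, 9.0])])
y = np.array([2.0, 3.0, 5.0, 4.0, 8.0, 9.0])
n, k = X.shape

P = X @ np.linalg.inv(X.T @ X) @ X.T   # P_X, projects onto the columns of X
M = np.eye(n) - P                      # P_bar_X = I_n - P_X

# Symmetric and idempotent; rank equals trace.
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.isclose(np.trace(P), k) and np.isclose(np.trace(M), n - k)

# P_X y gives fitted values, P_bar_X y gives residuals e, and X'e = 0.
e = M @ y
assert np.allclose(X.T @ e, 0)
assert np.allclose(P @ M, np.zeros((n, n)))  # P_X P_bar_X = 0
assert np.allclose(P @ y + e, y)             # decomposition y = y_hat + e
```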
Matrices that are not of full rank are singular, and their inverses do not exist. However, one can find a generalized inverse of a matrix $Q$, which we will call $Q^-$, satisfying the following requirements:
(i) $QQ^-Q = Q$, (ii) $Q^-QQ^- = Q^-$, (iii) $Q^-Q$ is symmetric, and (iv) $QQ^-$ is symmetric.
Even if $Q$ is not square, a unique $Q^-$ can be found for $Q$ which satisfies the above four properties. This is called the Moore-Penrose generalized inverse.
Note that a symmetric idempotent matrix is its own Moore-Penrose generalized inverse. For example, it is easy to verify that if $Q = P_X$, then $Q^- = P_X$ satisfies the above four properties. Idempotent matrices have characteristic roots that are either zero or one, and the number of non-zero characteristic roots is equal to the rank of the matrix. The characteristic roots of $Q^{-1}$ are the reciprocals of the characteristic roots of $Q$, but the characteristic vectors of both matrices are the same.
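The four Moore-Penrose conditions, and the fact that a symmetric idempotent matrix is its own generalized inverse, can be checked numerically. The following sketch uses NumPy's `pinv` on a hypothetical singular matrix and on a projection matrix built from made-up data.

```python
import numpy as np

# A singular (rank 1) matrix; its ordinary inverse does not exist.
Q = np.array([[1.0, 2.0],
              [2.0, 4.0]])
Qg = np.linalg.pinv(Q)  # Moore-Penrose generalized inverse Q^-

# The four defining conditions.
assert np.allclose(Q @ Qg @ Q, Q)            # (i)  Q Q^- Q = Q
assert np.allclose(Qg @ Q @ Qg, Qg)          # (ii) Q^- Q Q^- = Q^-
assert np.allclose((Q @ Qg).T, Q @ Qg)       # (iii) Q Q^- symmetric
assert np.allclose((Qg @ Q).T, Qg @ Q)       # (iv)  Q^- Q symmetric

# A symmetric idempotent matrix is its own generalized inverse,
# e.g. the projection matrix P_X for a hypothetical X.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 3.0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T
assert np.allclose(np.linalg.pinv(P), P)
```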
The determinant of a matrix is non-zero if and only if it has full rank. Therefore, if $A$ is singular, then $|A| = 0$. Also, the determinant of a matrix is equal to the product of its characteristic roots. For two square matrices $A$ and $B$, the determinant of the product is the product of the determinants: $|AB| = |A| \cdot |B|$. Therefore, the determinant of $Q^{-1}$ is the reciprocal of the determinant of $Q$. This follows from the fact that $|Q||Q^{-1}| = |QQ^{-1}| = |I| = 1$. This property is used in writing the likelihood function for Generalized Least Squares (GLS) estimation, see Chapter 9. The determinant of a triangular matrix
is equal to the product of its diagonal elements. Of course, it immediately follows that the determinant of a diagonal matrix is the product of its diagonal elements.
The constant in the regression corresponds to a vector of ones in the matrix of regressors $X$. This vector of ones is denoted by $\iota_n$, where $n$ is the dimension of this column vector. Note that $\iota_n'\iota_n = n$ and $\iota_n\iota_n' = J_n$, where $J_n$ is a matrix of ones of dimension $n \times n$. Note also that $J_n$ is not idempotent, but $\bar{J}_n = J_n/n$ is, as can be easily verified. The rank$(\bar{J}_n) = \mathrm{tr}(\bar{J}_n) = 1$. Note also that $I_n - \bar{J}_n$ is idempotent with rank $(n-1)$. $\bar{J}_n y$ has a typical element $\bar{y} = \sum_{i=1}^{n} y_i/n$, whereas $(I_n - \bar{J}_n)y$ has a typical element $(y_i - \bar{y})$. So $\bar{J}_n$ is the averaging matrix, whereas premultiplying by $(I_n - \bar{J}_n)$ yields deviations from the mean.
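A quick numerical sketch (hypothetical $y$) illustrates the averaging matrix: $\bar{J}_n = J_n/n$ is idempotent with trace one, $\bar{J}_n y$ replaces every element by the mean, and $(I_n - \bar{J}_n)y$ gives deviations from the mean.

```python
import numpy as np

n = 4
y = np.array([2.0, 4.0, 6.0, 8.0])     # hypothetical data, mean 5
iota = np.ones(n)                      # the vector of ones
J = np.outer(iota, iota)               # J_n, an n x n matrix of ones
Jbar = J / n                           # the averaging matrix

assert np.allclose(J @ J, n * J)       # J_n is not idempotent: J_n^2 = n J_n
assert np.allclose(Jbar @ Jbar, Jbar)  # but J_n / n is idempotent
assert np.isclose(np.trace(Jbar), 1.0) # rank = trace = 1

assert np.allclose(Jbar @ y, np.full(n, 5.0))              # every element is y-bar
assert np.allclose((np.eye(n) - Jbar) @ y, y - 5.0)        # deviations from the mean
```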
For two nonsingular matrices A and B
\[
(AB)^{-1} = B^{-1}A^{-1}
\]
Also, the transpose of a product of two conformable matrices is $(AB)' = B'A'$. For the product of three conformable matrices, this becomes $(ABC)' = C'B'A'$. The transpose of the inverse is the inverse of the transpose, i.e., $(A^{-1})' = (A')^{-1}$.
The inverse of a partitioned matrix
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\]
is given by
\[
A^{-1} = \begin{bmatrix} E & -EA_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}E & \ A_{22}^{-1} + A_{22}^{-1}A_{21}EA_{12}A_{22}^{-1} \end{bmatrix}
\]
where $E = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$. Alternatively, it can be expressed as
\[
A^{-1} = \begin{bmatrix} A_{11}^{-1} + A_{11}^{-1}A_{12}FA_{21}A_{11}^{-1} & \ -A_{11}^{-1}A_{12}F \\ -FA_{21}A_{11}^{-1} & F \end{bmatrix}
\]
where $F = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}$. These formulas are used in partitioned regression models, see for example the Frisch-Waugh-Lovell Theorem and the computation of the variance-covariance matrix of forecasts from a multiple regression in Chapter 7.
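The partitioned inverse formula can be verified directly. The sketch below builds the first form, with $E = (A_{11} - A_{12}A_{22}^{-1}A_{21})^{-1}$, block by block for a hypothetical $4 \times 4$ matrix and compares it with the ordinary inverse.

```python
import numpy as np

# A hypothetical nonsingular 4 x 4 matrix split into 2 x 2 blocks.
A = np.array([[4.0, 1.0, 0.5, 0.0],
              [1.0, 3.0, 0.0, 0.5],
              [0.5, 0.0, 2.0, 0.3],
              [0.0, 0.5, 0.3, 2.5]])
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

inv = np.linalg.inv
E = inv(A11 - A12 @ inv(A22) @ A21)   # Schur-complement inverse

top = np.hstack([E, -E @ A12 @ inv(A22)])
bot = np.hstack([-inv(A22) @ A21 @ E,
                 inv(A22) + inv(A22) @ A21 @ E @ A12 @ inv(A22)])
A_inv_blocks = np.vstack([top, bot])

assert np.allclose(A_inv_blocks, inv(A))
assert np.allclose(A @ A_inv_blocks, np.eye(4))
```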
An $n \times n$ symmetric matrix $\Omega$ has $n$ distinct characteristic vectors $c_1, \ldots, c_n$. The corresponding $n$ characteristic roots $\lambda_1, \ldots, \lambda_n$ may not be distinct, but they are all real numbers. The number of nonzero characteristic roots of $\Omega$ is equal to the rank of $\Omega$. The characteristic roots of a positive definite matrix are positive. The characteristic vectors of the symmetric matrix $\Omega$ are orthogonal to each other, i.e., $c_i'c_j = 0$ for $i \neq j$, and can be made orthonormal with $c_i'c_i = 1$ for $i = 1, 2, \ldots, n$. Hence, the matrix of characteristic vectors $C = [c_1, c_2, \ldots, c_n]$ is an orthogonal matrix, such that $CC' = C'C = I_n$ with $C' = C^{-1}$. By definition, $\Omega c_i = \lambda_i c_i$, or $\Omega C = C\Lambda$ where $\Lambda = \mathrm{diag}[\lambda_i]$. Premultiplying the last equation by $C'$, we get $C'\Omega C = C'C\Lambda = \Lambda$. Therefore, the matrix of characteristic vectors $C$ diagonalizes the symmetric matrix $\Omega$. Alternatively, we can write $\Omega = C\Lambda C' = \sum_{i=1}^{n} \lambda_i c_i c_i'$, which is the spectral decomposition of $\Omega$.
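The spectral decomposition can be checked on a small hypothetical symmetric matrix: the eigenvector matrix is orthogonal, it diagonalizes $\Omega$, and summing $\lambda_i c_i c_i'$ recovers $\Omega$.

```python
import numpy as np

# A hypothetical symmetric matrix.
Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.5, 0.3],
                  [0.0, 0.3, 1.0]])
lam, C = np.linalg.eigh(Omega)   # characteristic roots and orthonormal vectors

# C is orthogonal: C'C = CC' = I_n, so C' = C^{-1}.
assert np.allclose(C.T @ C, np.eye(3))
# C diagonalizes Omega: C' Omega C = Lambda.
assert np.allclose(C.T @ Omega @ C, np.diag(lam))
# Spectral decomposition: Omega = sum_i lambda_i c_i c_i'.
recon = sum(lam[i] * np.outer(C[:, i], C[:, i]) for i in range(3))
assert np.allclose(recon, Omega)
```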
A real symmetric $n \times n$ matrix $\Omega$ is positive semi-definite if for every $n \times 1$ vector $y$ we have $y'\Omega y \geq 0$. If $y'\Omega y$ is strictly positive for every non-zero $y$, then $\Omega$ is said to be positive definite. A necessary and sufficient condition for $\Omega$ to be positive definite is that all the characteristic roots of $\Omega$ are positive. One important application is the comparison of efficiency of two unbiased estimators of a vector of parameters $\beta$. In this case, we subtract the variance-covariance matrix of the efficient estimator from that of the inefficient one and show that the resulting difference is a positive semi-definite matrix, see the Gauss-Markov Theorem in Chapter 7.
If $\Omega$ is a symmetric and positive definite matrix, there exists a nonsingular matrix $P$ such that $\Omega = PP'$. In fact, using the spectral decomposition of $\Omega$ given above, one choice is $P = C\Lambda^{1/2}$, so that $\Omega = C\Lambda C' = PP'$. This is a useful result which we use in Chapter 9 to obtain Generalized Least Squares (GLS) as a least squares regression after transforming the original regression model by $P^{-1} = (C\Lambda^{1/2})^{-1} = \Lambda^{-1/2}C'$. In fact, if $u \sim (0, \sigma^2\Omega)$, then $P^{-1}u$ has zero mean and $\mathrm{var}(P^{-1}u) = P^{-1}\mathrm{var}(u)P^{-1\prime} = \sigma^2 P^{-1}\Omega P^{-1\prime} = \sigma^2 P^{-1}PP'P^{-1\prime} = \sigma^2 I_n$.
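The factorization $\Omega = PP'$ with $P = C\Lambda^{1/2}$, and the whitening property $P^{-1}\Omega P^{-1\prime} = I_n$ that underlies the GLS transformation, can be sketched as follows (hypothetical $\Omega$):

```python
import numpy as np

# A hypothetical positive definite Omega.
Omega = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
lam, C = np.linalg.eigh(Omega)
P = C @ np.diag(np.sqrt(lam))          # one choice: P = C Lambda^{1/2}

assert np.allclose(P @ P.T, Omega)     # Omega = PP'

# P^{-1} = Lambda^{-1/2} C' transforms a covariance sigma^2 Omega
# back to the scalar form sigma^2 I_n.
P_inv = np.diag(1.0 / np.sqrt(lam)) @ C.T
assert np.allclose(P_inv @ P, np.eye(2))
assert np.allclose(P_inv @ Omega @ P_inv.T, np.eye(2))
```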
From Chapter 2, we have seen that if $u \sim N(0, \sigma^2 I_n)$, then $u_i/\sigma \sim N(0,1)$, so that $u_i^2/\sigma^2 \sim \chi_1^2$ and $u'u/\sigma^2 = \sum_{i=1}^{n} u_i^2/\sigma^2 \sim \chi_n^2$. Therefore, $u'(\sigma^2 I_n)^{-1}u \sim \chi_n^2$. If $u \sim N(0, \sigma^2\Omega)$ where $\Omega$ is positive definite, then $u^* = P^{-1}u \sim N(0, \sigma^2 I_n)$ and $u^{*\prime}u^*/\sigma^2 \sim \chi_n^2$. But $u^{*\prime}u^* = u'P^{-1\prime}P^{-1}u = u'\Omega^{-1}u$. Hence, $u'\Omega^{-1}u/\sigma^2 \sim \chi_n^2$. This is used in Chapter 9.
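The algebraic step $u^{*\prime}u^* = u'\Omega^{-1}u$ is purely mechanical and can be checked for any fixed vector; the sketch below uses a hypothetical $\Omega$ and a made-up disturbance vector $u$.

```python
import numpy as np

# Hypothetical positive definite Omega, factored as PP' via eigendecomposition.
Omega = np.array([[1.5, 0.4],
                  [0.4, 1.0]])
lam, C = np.linalg.eigh(Omega)
P_inv = np.diag(1.0 / np.sqrt(lam)) @ C.T   # P^{-1} = Lambda^{-1/2} C'

# For a fixed (made-up) disturbance vector u: u*'u* = u' Omega^{-1} u.
u = np.array([0.7, -1.2])
u_star = P_inv @ u
assert np.isclose(u_star @ u_star, u @ np.linalg.inv(Omega) @ u)
```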
Note that the OLS residuals can be written as $e = \bar{P}_X u$. If $u \sim N(0, \sigma^2 I_n)$, then $e$ has mean zero and $\mathrm{var}(e) = \sigma^2 \bar{P}_X I_n \bar{P}_X' = \sigma^2 \bar{P}_X$, so that $e \sim N(0, \sigma^2 \bar{P}_X)$. Our estimator of $\sigma^2$ in Chapter 7 is $s^2 = e'e/(n-k)$, so that $(n-k)s^2/\sigma^2 = e'e/\sigma^2$. The last term can also be written as $u'\bar{P}_X u/\sigma^2$. In order to find the distribution of this quadratic form in Normal variables, we use the following result, stated as Lemma 1 in Chapter 7.
Lemma 1: For every symmetric idempotent matrix $A$ of rank $r$, there exists an orthogonal matrix $P$ such that $P'AP = J_r$, where $J_r$ is a diagonal matrix with the first $r$ elements equal to one and the rest equal to zero.
We use this lemma to show that $e'e/\sigma^2$ is a chi-squared with $(n-k)$ degrees of freedom. To see this, note that $e'e/\sigma^2 = u'\bar{P}_X u/\sigma^2$ and that $\bar{P}_X$ is symmetric and idempotent of rank $(n-k)$. Using the lemma, there exists an orthogonal matrix $P$ such that $P'\bar{P}_X P = J_{n-k}$ is a diagonal matrix with the first $(n-k)$ diagonal elements equal to one and the last $k$ equal to zero. An orthogonal matrix $P$ is by definition a matrix whose inverse is its own transpose, i.e., $P'P = I_n$. Let $v = P'u$; then $v$ has mean zero and $\mathrm{var}(v) = \sigma^2 P'P = \sigma^2 I_n$, so that $v$ is $N(0, \sigma^2 I_n)$ and $u = Pv$. Therefore,
\[
e'e/\sigma^2 = u'\bar{P}_X u/\sigma^2 = v'P'\bar{P}_X Pv/\sigma^2 = v'J_{n-k}v/\sigma^2 = \sum_{i=1}^{n-k} v_i^2/\sigma^2.
\]
But the $v_i$'s are independent, identically distributed $N(0, \sigma^2)$, so $v_i^2/\sigma^2$ is the square of a standardized $N(0,1)$ random variable, which is distributed as $\chi_1^2$. Moreover, the sum of independent $\chi^2$ random variables is a $\chi^2$ random variable with degrees of freedom equal to the sum of the respective degrees of freedom, see Chapter 2. Hence, $e'e/\sigma^2$ is distributed as $\chi_{n-k}^2$.
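The mechanics of Lemma 1 can be sketched numerically: the characteristic roots of $\bar{P}_X$ are $k$ zeros and $(n-k)$ ones, and the quadratic form $u'\bar{P}_X u = e'e$ reduces to a sum of $(n-k)$ squared elements of $v = P'u$ (hypothetical $X$ and $u$; here `eigh` orders the unit roots last rather than first, so the sum is taken over the unit-root positions).

```python
import numpy as np

# Hypothetical X (n = 5, k = 2) and a made-up disturbance vector u.
X = np.column_stack([np.ones(5), np.array([1.0, 2.0, 3.0, 5.0, 8.0])])
u = np.array([0.3, -0.5, 0.2, 0.9, -0.4])
n, k = X.shape

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # P_bar_X, rank n - k
lam, P = np.linalg.eigh(M)     # orthogonal P with P' M P diagonal

# Characteristic roots of an idempotent matrix are zero or one (Lemma 1).
assert np.allclose(np.sort(lam), [0, 0, 1, 1, 1])

v = P.T @ u
e = M @ u
# e'e = u' P_bar_X u equals the sum of squared v's at the unit roots.
assert np.isclose(e @ e, u @ M @ u)
assert np.isclose(u @ M @ u, np.sum(lam * v**2))
```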
The beauty of the above result is that it applies to all quadratic forms $u'Au$ where $A$ is symmetric and idempotent. In general, for $u \sim N(0, \sigma^2 I)$, a necessary and sufficient condition for $u'Au/\sigma^2$ to be distributed $\chi_k^2$ is that $A$ be idempotent of rank $k$, see Theorem 4.6 of Graybill (1961). Another useful theorem on quadratic forms in normal random variables is the following: if $u \sim N(0, \sigma^2\Omega)$, then $u'Au/\sigma^2$ is $\chi_k^2$ if and only if $A\Omega$ is an idempotent matrix of rank $k$, see Theorem 4.8 of Graybill (1961). If $u \sim N(0, \sigma^2 I)$, two positive semi-definite quadratic forms in normal random variables, say $u'Au$ and $u'Bu$, are independent if and only if $AB = 0$, see Theorem 4.10 of Graybill (1961). A sufficient condition is that $\mathrm{tr}(AB) = 0$, see Theorem 4.15 of Graybill (1961). This is used in Chapter 7 to construct F-statistics to test hypotheses, see for example Problem 11. For $u \sim N(0, \sigma^2 I)$, the quadratic form $u'Au$ is independent of the linear form $Bu$ if $BA = 0$, see Theorem 4.17 of Graybill (1961). This is used in Chapter 7 to prove the independence of $s^2$ and $\hat{\beta}_{OLS}$, see Problem 8. In general, if $u \sim N(0, \Sigma)$, then $u'Au$ and $u'Bu$ are independent if and only if $A\Sigma B = 0$, see Theorem 4.21 of Graybill (1961). Many other useful matrix properties exist; this is only a sample of those that will be implicitly or explicitly used in this book.
The Kronecker product of two matrices, say $\Sigma \otimes I_n$ where $\Sigma$ is $m \times m$ and $I_n$ is the identity matrix of dimension $n$, is defined as follows:
\[
\Sigma \otimes I_n = \begin{bmatrix} \sigma_{11}I_n & \cdots & \sigma_{1m}I_n \\ \vdots & & \vdots \\ \sigma_{m1}I_n & \cdots & \sigma_{mm}I_n \end{bmatrix}
\]
In other words, we place an $I_n$ next to every element of $\Sigma = [\sigma_{ij}]$. The dimension of the resulting matrix is $mn \times mn$. This is useful when we have a system of equations like Seemingly Unrelated Regressions in Chapter 10. In general, if $A$ is $m \times n$ and $B$ is $p \times q$, then $A \otimes B$ is $mp \times nq$. Some properties of Kronecker
products include $(A \otimes B)' = A' \otimes B'$. If both $A$ and $B$ are square matrices of order $m \times m$ and $p \times p$, then $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, $|A \otimes B| = |A|^p|B|^m$ and $\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\mathrm{tr}(B)$. Applying these results to $\Sigma \otimes I_n$, we get
\[
(\Sigma \otimes I_n)^{-1} = \Sigma^{-1} \otimes I_n \quad \text{and} \quad |\Sigma \otimes I_n| = |\Sigma|^n|I_n|^m = |\Sigma|^n
\]
and $\mathrm{tr}(\Sigma \otimes I_n) = \mathrm{tr}(\Sigma)\,\mathrm{tr}(I_n) = n\,\mathrm{tr}(\Sigma)$.
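These Kronecker-product properties can be verified with NumPy's `kron` for a hypothetical $\Sigma$ with $m = 2$ and $n = 3$:

```python
import numpy as np

# Hypothetical Sigma (m = 2) and identity of dimension n = 3.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
m, n = 2, 3
K = np.kron(Sigma, np.eye(n))      # Sigma (x) I_n, of dimension mn x mn

assert K.shape == (m * n, m * n)
# (Sigma (x) I_n)^{-1} = Sigma^{-1} (x) I_n
assert np.allclose(np.linalg.inv(K), np.kron(np.linalg.inv(Sigma), np.eye(n)))
# |Sigma (x) I_n| = |Sigma|^n
assert np.isclose(np.linalg.det(K), np.linalg.det(Sigma) ** n)
# tr(Sigma (x) I_n) = n tr(Sigma)
assert np.isclose(np.trace(K), n * np.trace(Sigma))
```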
Some useful properties of matrix differentiation are the following:
\[
\partial(x'b)/\partial b = x, \quad \text{where } x' \text{ is } 1 \times k \text{ and } b \text{ is } k \times 1,
\]
and
\[
\partial(b'Ab)/\partial b = (A + A')b, \quad \text{where } A \text{ is } k \times k.
\]
If $A$ is symmetric, then $\partial(b'Ab)/\partial b = 2Ab$. These two properties will be used in Chapter 7 in deriving the least squares estimator.
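The derivative rules for quadratic forms can be checked against a central-difference numerical gradient; the sketch below uses made-up matrices.

```python
import numpy as np

# Hypothetical (non-symmetric) A and vector b.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([1.0, 2.0])

def quad(M, bv):
    return bv @ M @ bv          # the scalar b'Mb

def num_grad(M, bv, eps=1e-6):  # central-difference gradient of b'Mb
    return np.array([(quad(M, bv + eps * e) - quad(M, bv - eps * e)) / (2 * eps)
                     for e in np.eye(len(bv))])

# d(b'Ab)/db = (A + A')b in general.
assert np.allclose((A + A.T) @ b, num_grad(A, b), atol=1e-4)

# For a symmetric matrix S this reduces to 2Sb.
S = np.array([[2.0, 0.5],
              [0.5, 3.0]])
assert np.allclose(2 * S @ b, num_grad(S, b), atol=1e-4)
```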
Bellman, R. (1970), Introduction to Matrix Analysis (McGraw-Hill: New York).
Graybill, F.A. (1961), An Introduction to Linear Statistical Models (McGraw-Hill: New York).
Searle, S. R. (1982), Matrix Algebra Useful for Statistics (John Wiley and Sons: New York).