Inner Product, Orthogonal Bases, and Orthogonal Matrices
It follows from (I.10) that the cosine of the angle φ between the vectors x in (I.2) and y in (I.5) is
Figure I.5. Orthogonalization.
Definition I.13: The quantity xᵀy is called the inner product of the vectors x and y.
If xᵀy = 0, then cos(φ) = 0; hence, φ = π/2 or φ = 3π/2. This corresponds to angles of 90° and 270°, respectively; hence, x and y are perpendicular. Such vectors are said to be orthogonal.
Definition I.14: Conformable vectors x and y are orthogonal if their inner product xᵀy is zero. Moreover, they are orthonormal if, in addition, their lengths are 1: ||x|| = ||y|| = 1.
In Figure I.4, if we flip point p over to the other side of the origin along the line through the origin and point a and add b to −p, then the resulting vector c = b − p is perpendicular to the line through the origin and point a. This is illustrated in Figure I.5. More formally,
aᵀc = aᵀ(b − p) = aᵀ(b − (aᵀb/||a||²)·a)
= aᵀb − (aᵀb/||a||²)·||a||² = 0.
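This orthogonality can be checked numerically. The following sketch uses NumPy and the example vectors a = (6, 4)ᵀ, b = (3, 7)ᵀ that reappear in (I.48):

```python
import numpy as np

a = np.array([6.0, 4.0])  # the example vectors from (I.48)
b = np.array([3.0, 7.0])

# p is the projection of b on the line through the origin and a.
p = (a @ b / (a @ a)) * a

# c = b - p is perpendicular to a: a'c = a'b - (a'b/||a||^2)||a||^2 = 0.
c = b - p
print(abs(a @ c) < 1e-12)  # True
```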
This procedure can be generalized to convert any basis of a vector space into an orthonormal basis as follows. Let a1,…,ak, k < n, be a basis for a subspace of ℝⁿ, and let q1 = ||a1||⁻¹·a1. The projection of a2 on q1 is now p = (q1ᵀa2)·q1; hence, a2* = a2 − (q1ᵀa2)·q1 is orthogonal to q1. Thus, let q2 = ||a2*||⁻¹·a2*. The next step is to erect a3* perpendicular to q1 and q2, which can be done by subtracting from a3 its projections on q1 and q2, that is, a3* = a3 − (a3ᵀq1)q1 − (a3ᵀq2)q2. Using the facts that, by construction,
q1ᵀq1 = 1, q2ᵀq2 = 1, q1ᵀq2 = 0, q2ᵀq1 = 0,
we have indeed that q1ᵀa3* = q1ᵀa3 − (a3ᵀq1)q1ᵀq1 − (a3ᵀq2)q1ᵀq2 = q1ᵀa3 − a3ᵀq1 = 0 and, similarly, q2ᵀa3* = 0. Thus, now let q3 = ||a3*||⁻¹·a3*. Repeating this procedure yields
Theorem I.16: Let a1,…,ak be a basis for a subspace of ℝⁿ, and construct q1,…,qk recursively by
q1 = ||a1||⁻¹·a1 and aj* = aj − Σ_{i=1}^{j−1} (ajᵀqi)·qi,
qj = ||aj*||⁻¹·aj* for j = 2, 3,…,k. (I.42)
Then q1,…,qk is an orthonormal basis for the subspace spanned by a1,…,ak.
The construction (I.42) is known as the Gram–Schmidt process. The orthonormality of q1,…,qk has already been shown, but it still has to be shown that q1,…,qk spans the same subspace as a1,…,ak. To show this, observe from (I.42) that a1,…,ak are related to q1,…,qk by
aj = Σ_{i=1}^{j} ui,j·qi, j = 1, 2,…,k, (I.43)
where
uj,j = ||aj*||, ui,j = qiᵀaj for i < j,
ui,j = 0 for i > j, i, j = 1,…,k, (I.44)
with a1* = a1. It follows now from (I.43) that a1,…,ak are linear combinations of q1,…,qk, and it follows from (I.42) that q1,…,qk are linear combinations of a1,…,ak; hence, the two bases span the same subspace.
Observe from (I.44) that the k x k matrix U with elements ui,j is an upper-triangular matrix with positive diagonal elements. Moreover, if we denote by A the n x k matrix with columns a1,…,ak and by Q the n x k matrix with columns q1,…,qk, it follows from (I.43) that A = QU. Thus, Theorem I.17 follows from Theorem I.16, (I.43), and (I.44):
Theorem I.17: Let A be an n x k matrix with rank k. There exists an n x k matrix Q with orthonormal columns and an upper-triangular k x k matrix U with positive diagonal elements such that A = QU.
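The recursion (I.42) and the resulting factorization A = QU of Theorem I.17 can be sketched in NumPy as follows (the matrix A below is an arbitrary rank-2 example, not one from the text):

```python
import numpy as np

def gram_schmidt(A):
    """Return Q (orthonormal columns) and upper-triangular U with A = Q U,
    following the recursion in (I.42) and the elements u_{i,j} in (I.44)."""
    n, k = A.shape
    Q = np.zeros((n, k))
    U = np.zeros((k, k))
    for j in range(k):
        v = A[:, j].copy()
        for i in range(j):
            U[i, j] = Q[:, i] @ A[:, j]   # u_{i,j} = q_i' a_j for i < j
            v -= U[i, j] * Q[:, i]        # subtract the projections on q_1,...,q_{j-1}
        U[j, j] = np.linalg.norm(v)       # u_{j,j} = ||a_j*||
        Q[:, j] = v / U[j, j]
    return Q, U

A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, 0.0]])  # example 3 x 2 matrix of rank 2
Q, U = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)), np.allclose(Q @ U, A))  # True True
```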
In the case k = n, the matrix Q in Theorem I.17 is called an orthogonal matrix:
Definition I.15: An orthogonal matrix Q is a square matrix with orthonormal columns: QᵀQ = I.
In particular, if Q is an orthogonal n x n matrix with columns q1,…,qn, then the elements of the matrix QᵀQ are qiᵀqj = I(i = j), where I(·) is the indicator function⁹; hence, QᵀQ = In. Thus, Qᵀ = Q⁻¹. It follows now from Theorem I.1 also that QQᵀ = In; that is, the rows of an orthogonal matrix are also orthonormal.
Orthogonal transformations of vectors leave the angles between the vectors, and their lengths, the same. In particular, let x and y be vectors in ℝⁿ and let Q be an orthogonal n x n matrix. Then (Qx)ᵀ(Qy) = xᵀQᵀQy = xᵀy and ||Qx|| = √((Qx)ᵀ(Qx)) = √(xᵀx) = ||x||.
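A small numerical illustration of this invariance (the angle θ and the vectors are arbitrary choices):

```python
import numpy as np

theta = 0.3  # arbitrary angle
# A 2x2 rotation matrix is orthogonal: Q'Q = I.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([6.0, 4.0])
y = np.array([3.0, 7.0])

print(np.isclose((Q @ x) @ (Q @ y), x @ y))                   # inner product unchanged: True
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # length unchanged: True
```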
The last equality in (I.47) is a matter of normalization inasmuch as −(a1b2 − b1a2) would also fit (I.47), but the chosen normalization is appropriate for (I.46) because, then,
det(A) = a1b2 − b1a2 = 6 x 7 − 3 x 4 = 30. (I.48)
However, as I will show for the matrix (I.50) below, a determinant can be negative or zero.
Equation (I.47) reads in words:
Definition I.16: The determinant of a 2 x 2 matrix is the product of the diagonal elements minus the product of the off-diagonal elements.
We can also express (I.47) in terms of the angles φa and φb of the vectors a and b, respectively, with the right-hand side of the horizontal axis:
a1 = ||a|| cos(φa), a2 = ||a|| sin(φa),
b1 = ||b|| cos(φb), b2 = ||b|| sin(φb);
hence,
det(A) = a1b2 − b1a2
= ||a||·||b||·(cos(φa) sin(φb) − sin(φa) cos(φb))
= ||a||·||b||·sin(φb − φa). (I.49)
Because, in Figure I.3, 0 < φb − φa < π, we have that sin(φb − φa) > 0.
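Formula (I.49) can be verified numerically for the example vectors of (I.48):

```python
import numpy as np

a = np.array([6.0, 4.0])   # the example vectors from (I.48)
b = np.array([3.0, 7.0])

det = a[0] * b[1] - b[0] * a[1]   # a1*b2 - b1*a2 = 30
phi_a = np.arctan2(a[1], a[0])    # angle of a with the horizontal axis
phi_b = np.arctan2(b[1], b[0])    # angle of b with the horizontal axis

# (I.49): det(A) = ||a|| * ||b|| * sin(phi_b - phi_a)
print(np.isclose(det,
                 np.linalg.norm(a) * np.linalg.norm(b) * np.sin(phi_b - phi_a)))  # True
```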
As an example of a negative determinant, let us swap the columns of A and call the result matrix B:
B = (b, a) = (3 6; 7 4) = A·P1,2, (I.50)
where P1,2 = (0 1; 1 0) is the elementary permutation matrix involved. Then det(B) = b1a2 − a1b2 = −30.
At first sight this seems odd because the area enclosed by the parallelogram in Figure I.3 has not been changed. However, it has! Recall the interpretation of a matrix as a mapping: a matrix moves a point to a new location by replacing the original perpendicular coordinate system with a new system formed by the column space of the matrix involved, with the new units of measurement given by the lengths of the columns. In the case of the matrix B in (I.50) we have Be1 = b and Be2 = a.
Thus, b is now the first unit vector, and a is the second. If we adopt the convention that the natural position of unit vector 2 is above the line spanned by the first unit vector, as is the case for e1 and e2, then we are actually looking at the parallelogram in Figure I.3 from the backside, as in Figure I.6.
Thus, the effect of swapping the columns of the matrix A in (I.46) is that Figure I.3 is flipped over vertically through 180°. Because we are now looking at Figure I.3 from the back, which is the negative side, the area enclosed by the parallelogram is negative too! Note that this corresponds to (I.49): if we swap the columns of A, then we swap the angles φa and φb in (I.49); consequently, the determinant flips sign.
As another example, let a be as before, but now position b in the southwest quadrant, as in Figures I.7 and I.8. The fundamental difference between these two cases is that in Figure I.7 point b is above the line through a and the origin, and thus φb − φa < π, whereas in Figure I.8 point b is below that line: φb − φa > π. Therefore, the area enclosed by the parallelogram in Figure I.7 is positive, whereas the area enclosed by the parallelogram in Figure I.8 is
Figure I.7. det(a, b) > 0.
negative. Hence, in the case of Figure I.7, det(a, b) > 0, and in the case of Figure I.8, det(a, b) < 0. Again, in Figure I.8 we are looking at the backside of the picture; you have to flip it vertically to see the front side.
What I have demonstrated here for 2 x 2 matrices is that, if the columns are interchanged, then the determinant changes sign. It is easy to see that the same
Figure I.8. det(a, b) < 0.
applies to the rows. This property holds for general n x n matrices as well in the following way.
Theorem I.18: If two adjacent columns or rows of a square matrix are swapped,10 then the determinant changes sign only.
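A quick NumPy check of Theorem I.18 on the 2 x 2 example (using `numpy.linalg.det`):

```python
import numpy as np

A = np.array([[6.0, 3.0],
              [4.0, 7.0]])   # A = (a, b) with det(A) = 30, as in (I.48)
B = A[:, ::-1]               # swap the two columns: B = (b, a)
C = A[::-1, :]               # swap the two rows

print(np.isclose(np.linalg.det(A),  30.0))  # True
print(np.isclose(np.linalg.det(B), -30.0))  # True: column swap flips the sign
print(np.isclose(np.linalg.det(C), -30.0))  # True: row swap flips the sign too
```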
Next, let us consider determinants of special 2 x 2 matrices. The first special case is the orthogonal matrix. Recall that the columns of an orthogonal matrix are perpendicular and have unit length. Moreover, recall that an orthogonal 2 x 2 matrix rotates a set of points around the origin, leaving angles and distances the same. In particular, consider the set of points in the unit square formed by the vectors (0, 0)ᵀ, (0, 1)ᵀ, (1, 0)ᵀ, and (1, 1)ᵀ. Clearly, the area of this unit square equals 1, and because the unit square corresponds to the 2 x 2 unit matrix I2, the determinant of I2 equals 1. Now multiply I2 by an orthogonal matrix Q. The effect is that the unit square is rotated without affecting its shape or size. Therefore,
Theorem I.19: The determinant of an orthogonal matrix is either 1 or — 1, and the determinant of a unit matrix is 1.
The “either-or” part follows from Theorem I.18: swapping adjacent columns of an orthogonal matrix preserves orthonormality of the columns of the new matrix but switches the sign of the determinant. For example, consider the orthogonal matrix Q in (I.45). Then it follows from Definition I.16 that
det(Q) = −cos²(θ) − sin²(θ) = −1.
Now swap the columns of the matrix (I.45):
Q = (sin(θ) −cos(θ); cos(θ) sin(θ)).
Then it follows from Definition I.16 that
det(Q) = sin²(θ) + cos²(θ) = 1.
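Both determinants can be confirmed numerically (θ = 0.3 is an arbitrary choice):

```python
import numpy as np

theta = 0.3  # arbitrary angle
Q = np.array([[np.sin(theta), -np.cos(theta)],
              [np.cos(theta),  np.sin(theta)]])   # the column-swapped matrix above

assert np.allclose(Q.T @ Q, np.eye(2))            # Q is orthogonal
print(np.isclose(np.linalg.det(Q), 1.0))          # True

Q2 = Q[:, ::-1]                                   # swap the columns back
print(np.isclose(np.linalg.det(Q2), -1.0))        # True: sign flips, stays orthogonal
```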
Note that Theorem I.19 is not confined to the 2 x 2 case; it is true for orthogonal and unit matrices of any size.
Next, consider the lower-triangular matrix
L = (a 0; b c).
10 The operation of swapping a pair of adjacent columns or rows is also called a column or row exchange, respectively.
Figure I.9. det(L).
According to Definition I.16, det(L) = a·c − 0·b = a·c, and thus in the 2 x 2 case the determinant of a lower-triangular matrix is the product of the diagonal elements. This is illustrated in Figure I.9. The determinant of L is the area enclosed by the parallelogram formed by the columns of L, which is the same as the area of the rectangle formed by the vectors (a, 0)ᵀ and (0, c)ᵀ. This area is a·c. Thus, you can move b freely along the vertical axis without affecting the determinant of L. If you were to flip the picture over vertically, which corresponds to replacing a by −a, the parallelogram would be viewed from the backside; hence, the determinant flips sign.
The same result applies of course to upper-triangular and diagonal 2 x 2 matrices. Thus, we have
Theorem I.20: The determinant of a lower-triangular matrix is the product of the diagonal elements. The same applies to an upper-triangular matrix and a diagonal matrix.
Again, this result is not confined to the 2 x 2 case but holds in general.
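Theorem I.20 is easy to check numerically, here for a 3 x 3 lower-triangular example:

```python
import numpy as np

L = np.array([[2.0, 0.0, 0.0],
              [5.0, 3.0, 0.0],
              [1.0, 4.0, 7.0]])   # arbitrary lower-triangular matrix

# The determinant equals the product of the diagonal elements: 2 * 3 * 7 = 42.
print(np.isclose(np.linalg.det(L), np.prod(np.diag(L))))  # True
```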
Now consider the determinant of a transpose matrix. In the 2 x 2 case the transpose AT of A can be formed by first swapping the columns and then swapping the rows. Then it follows from Theorem I.18 that in each of the two steps only the sign flips; hence,
Theorem I.21: det(A) = det(Aᵀ).
The same applies to the general case: the transpose of A can be formed by a sequence of column exchanges and a corresponding sequence of row exchanges, and the total number of column and row exchanges is an even number.
It follows from Theorem I.11 that, in the case of a square matrix A, there exist a permutation matrix P possibly equal to the unit matrix I, a lower-triangular matrix L with diagonal elements all equal to 1, a diagonal matrix D, and an upper-triangular matrix U with diagonal elements all equal to 1 such that PA = LDU. Moreover, recall that a permutation matrix is orthogonal because it consists of permutations of the columns of the unit matrix. Thus, we can write
A = PᵀLDU.
Now consider the parallelogram formed by the columns of U. Because the diagonal elements of U are 1, the area of this parallelogram is the same as the area of the unit square: det(U) = det(I). Therefore, the effect of the transformation PᵀLD on the area of the parallelogram formed by the columns of U is the same as the effect of PᵀLD on the area of the unit square, and consequently det(PᵀLDU) = det(PᵀLD). The effect of multiplying D by L is that the rectangle formed by the columns of D is tilted and squeezed without affecting the area itself. Therefore, det(LD) = det(D), and consequently det(PᵀLDU) = det(PᵀD). Next, Pᵀ permutes the rows of D, and so the effect on det(D) is a sequence of sign switches only. The number of sign switches involved is the same as the number of column exchanges of Pᵀ necessary to convert Pᵀ into the unit matrix. If this number of swaps is even, then det(P) = det(Pᵀ) = 1; otherwise, det(P) = det(Pᵀ) = −1. Thus, in the 2 x 2 case (as well as in the general case) we have
Theorem I.22: det (A) = det (P) ■ det (D), where P and D are the permutation matrix and the diagonal matrix, respectively, in the decomposition PA = LDU in Theorem I.11 for the case of a square matrix A.
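For the 2 x 2 example from (I.48), the decomposition PA = LDU can be written out by hand (no row exchange is needed here, so P = I), after which Theorem I.22 is immediate:

```python
import numpy as np

A = np.array([[6.0, 3.0],
              [4.0, 7.0]])
L = np.array([[1.0, 0.0],
              [4.0/6.0, 1.0]])     # unit lower-triangular
D = np.diag([6.0, 5.0])            # the pivots
U = np.array([[1.0, 0.5],
              [0.0, 1.0]])         # unit upper-triangular

assert np.allclose(L @ D @ U, A)   # PA = LDU with P = I
# Theorem I.22: det(A) = det(P) * det(D) = 1 * (6 * 5) = 30
print(np.isclose(np.linalg.det(A), np.prod(np.diag(D))))  # True
```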
This result yields two important corollaries. First,
Theorem I.23: The determinant of a singular matrix is zero.
To see this, observe from the decomposition PA = LDU that A is singular if and only if D is singular. If D is singular, then at least one of the diagonal elements of D is zero; hence, det(D) = 0.
Second, for conformable square matrices A and B we have
Theorem I.24: det(AB) = det(A) ■ det(B).
This result can be shown in the same way as Theorem I.22, that is, by showing that det(A) = det(PTLDUB) = det(P) ■ det(DB) and det(DB) = det(D) ■ det( B).
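Theorem I.24 can also be checked on random matrices (a sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed for reproducibility
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det(AB) = det(A) * det(B)
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))  # True
```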
Moreover, Theorems I.20 and I.24 imply that
Theorem I.25: Adding or subtracting a constant times a row or column to another row or column, respectively, does not change the determinant.
The reason is that this operation is equivalent to multiplying a matrix by an elementary matrix and that an elementary matrix is triangular with diagonal elements equal to 1.
Furthermore, we have
Theorem I.26: Let A be an n x n matrix and let c be a scalar. If one of the columns or rows is multiplied by c, then the determinant of the resulting matrix is c ■ det(A). Consequently, det(c ■ A) = cn ■ det(A).
This theorem follows straightforwardly from Theorems I.20 and I.24. For example, let B be a diagonal matrix with diagonal elements 1, except for one element, such as diagonal element i, which equals c. Then BA is the matrix A with the ith row multiplied by c. Because, by Theorem I.20, det(B) = c, the first part of Theorem I.26 for the "row" case follows from Theorem I.24, and the "column" case follows from det(AB) = det(A)·det(B) = c·det(A). The second part follows by choosing B = c·In.
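Both parts of Theorem I.26 can be verified on the 2 x 2 example (c = 2 is an arbitrary scalar):

```python
import numpy as np

A = np.array([[6.0, 3.0],
              [4.0, 7.0]])   # det(A) = 30, as in (I.48)
c, n = 2.0, 2

B = A.copy()
B[:, 0] *= c                 # multiply the first column by c
print(np.isclose(np.linalg.det(B), c * np.linalg.det(A)))         # True
print(np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A)))  # True
```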
The results in this section merely serve as a motivation for what a determinant is as well as its geometric interpretation and basic properties. All the results so far can be derived from three fundamental properties, namely, the results in Theorems I.18, I.20, and I.24. If we were to assume that the results in Theorems I.18, I.20, and I.24 hold and treat these properties as axioms, all the other results would follow from these properties and the decomposition PA = LDU. Moreover, the function involved is unique.
As to the latter, suppose that δ(A) is a function satisfying
(a) If two adjacent rows or columns are swapped, then δ switches sign only.
(b) If A is triangular, then δ(A) is the product of the diagonal elements of A.
(c) δ(AB) = δ(A)·δ(B).
Then it follows from the decomposition A = PᵀLDU and axiom (c) that
δ(A) = δ(Pᵀ)δ(L)δ(D)δ(U).
Moreover, it follows from axiom (b) that δ(L) = δ(U) = 1 and δ(D) = det(D). Finally, it follows from axiom (b) that the functions δ(·) and det(·) coincide for unit matrices, and therefore, by axiom (a), δ(Pᵀ) = δ(P) = det(P). Thus, δ(A) = det(A); hence, the determinant is uniquely defined by the axioms (a), (b), and (c). Consequently,
Definition I.17: The determinant of a square matrix is uniquely defined by three fundamental properties:
(a) If two adjacent rows or columns are swapped, then the determinant switches sign only.
(b) The determinant of a triangular matrix is the product of the diagonal elements.
(c) The determinant of AB is the product of the determinants of A and B.
These three axioms can be used to derive a general expression for the determinant together with the results in the next section regarding determinants of block-triangular matrices.