SIMULTANEOUS LINEAR EQUATIONS

Throughout this section, A will denote an n × n square matrix and X a matrix that is not necessarily square. Generally, we shall assume that X is n × K with K ≤ n.

Consider the following n linear equations:

(11.4.1)    a11x1 + a12x2 + · · · + a1nxn = y1
            a21x1 + a22x2 + · · · + a2nxn = y2
            . . . . . . . . . . . . . . . . .
            an1x1 + an2x2 + · · · + annxn = yn

Define x = (x1, x2, . . . , xn)' and y = (y1, y2, . . . , yn)', and let A be as in (11.1.1) with n = m. Then (11.4.1) can be written in matrix notation as

(11.4.2) Ax = y.

A major goal of this section is to obtain a necessary and sufficient condition on A such that (11.4.2) can be solved for x for any y. Using the notation ∃ (there exists), ∀ (for any), and s.t. (such that), we can express the last clause of the previous sentence as

∀ y ∃ x s.t. Ax = y.

Let us consider a couple of examples. The matrix

(11.4.3)    | 1  2 |
            | 2  4 |

does not satisfy the above-mentioned condition because there is clearly no solution to the linear system

(11.4.4)    | 1  2 | | x1 |   | 1 |
            | 2  4 | | x2 | = | 1 |

But there are infinitely many solutions to the system

(11.4.5)    | 1  2 | | x1 |   | 1 |
            | 2  4 | | x2 | = | 2 |

since any point on the line x1 + 2x2 = 1 satisfies (11.4.5). In general, if A is such that Ax = y has no solution for some y, it has infinitely many solutions for some other y.

Next consider

(11.4.6)    | 1  1 |
            | 1  2 |

This matrix satisfies the condition because x1 = 2y1 − y2, x2 = y2 − y1 constitute the unique solution to

(11.4.7)    | 1  1 | | x1 |   | y1 |
            | 1  2 | | x2 | = | y2 |

for any y1 and y2. It can be shown that if A satisfies the said condition, the solution to Ax = y is unique.
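The closed-form solution just stated can be verified mechanically. The following sketch (the function names are my own, not from the text) substitutes x1 = 2y1 − y2 and x2 = y2 − y1 back into both equations for a grid of right-hand sides:

```python
# Hypothetical check of the closed-form solution for the 2 x 2 system
# with A = [[1, 1], [1, 2]]: x1 = 2*y1 - y2, x2 = y2 - y1.

def solve_11_4_7(y1, y2):
    """Return (x1, x2) from the closed-form solution stated in the text."""
    return 2 * y1 - y2, y2 - y1

def check(y1, y2):
    x1, x2 = solve_11_4_7(y1, y2)
    # Plug back into both equations: x1 + x2 = y1 and x1 + 2*x2 = y2.
    return (x1 + x2 == y1) and (x1 + 2 * x2 == y2)

# The solution satisfies the system for every (y1, y2) tried.
print(all(check(y1, y2) for y1 in range(-3, 4) for y2 in range(-3, 4)))  # True
```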

We now embark on a general discussion, in which the major results are given as a series of definitions and theorems.

DEFINITION 11.4.1 A set of vectors x1, x2, . . . , xK is said to be linearly independent if c1x1 + c2x2 + · · · + cKxK = 0 implies ci = 0 for all i = 1, 2, . . . , K. Otherwise it is linearly dependent.

For example, the vectors (1, 1) and (1, 2) are linearly independent because

c1(1, 1) + c2(1, 2) = (0, 0)

only for c1 = c2 = 0. But (1, 2) and (2, 4) are linearly dependent because

c1(1, 2) + c2(2, 4) = (0, 0)

can be satisfied by setting c1 = −2 and c2 = 1.
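For pairs of 2-vectors, linear independence reduces to the familiar determinant test ad − bc ≠ 0, anticipating Theorems 11.4.1 and 11.4.5 below. A minimal sketch (the helper name is invented here):

```python
def independent_2d(u, v):
    """Two 2-vectors u, v are linearly independent iff the determinant
    of the matrix with columns u, v is nonzero."""
    return u[0] * v[1] - u[1] * v[0] != 0

print(independent_2d((1, 1), (1, 2)))   # True: only c1 = c2 = 0 works
print(independent_2d((1, 2), (2, 4)))   # False: c1 = -2, c2 = 1 works

# Verify the dependence relation from the text: -2*(1, 2) + 1*(2, 4) = (0, 0).
print(tuple(-2 * a + 1 * b for a, b in zip((1, 2), (2, 4))))  # (0, 0)
```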

DEFINITION 11.4.2 If the column vectors of a matrix (not necessarily square) are linearly independent, we say the matrix is column independent (abbreviated as CI); if the row vectors are linearly independent, we say the matrix is row independent (RI).

THEOREM 11.4.1 A is not CI ⇒ |A| = 0.

Proof. Write A = (a1, a2, . . . , an) and |A| = F(a1, a2, . . . , an). Since A is not CI, there is a vector x ≠ 0 such that Ax = 0. But x ≠ 0 means that at least one element of x is nonzero. Assume x1 ≠ 0 without loss of generality, where x1 is the first element of x. From Definition 11.3.1, we have

(11.4.8)    F(Ax, a2, . . . , an) = x1|A| + x2F(a2, a2, a3, . . . , an)
                                  + x3F(a3, a2, a3, . . . , an) + · · ·
                                  + xnF(an, a2, a3, . . . , an).

But the left-hand side of (11.4.8) is zero by Theorem 11.3.2 and the right-hand side is x1|A| by Theorem 11.3.4. Therefore, since x1 ≠ 0, |A| = 0. □

The converse of Theorem 11.4.1, stated below as Theorem 11.4.5, is more difficult to prove; therefore, we prove three other theorems first.

THEOREM 11.4.2 If X' is K × n, where K < n, then X' is not CI.

Proof. Assume K = n − 1 without loss of generality, for otherwise we can affix n − 1 − K row vectors of zeroes at the bottom of X'. We prove the theorem by induction.

The theorem is clearly true for n = 2. Assume that it is true for n, and consider n + 1. Is there a vector c ≠ 0 such that X'c = 0, where X' is n × (n + 1) and c = (c1, c2, . . . , cn+1)'? Write the nth row of X' as (xn1, xn2, . . . , xn,n+1) and assume without loss of generality that xn,n+1 ≠ 0. Solving the last equation of X'c = 0 for cn+1 and inserting its value into the remaining equations yields n − 1 equations for which the theorem was assumed to hold. So the prescribed c exists. □

THEOREM 11.4.3 A is CI ⇒ ∀ y ∃ x s.t. Ax = y.

Proof. Using the matrix (A, y) as X' in Theorem 11.4.2 shows that (A, y) is not CI. Therefore there exists c ≠ 0 such that (A, y)c = 0. Since A is CI, the coefficient on y in c is nonzero (otherwise the remaining elements of c would give a nontrivial linear combination of the columns of A equal to zero), and solving for y yields Ax = y. □

THEOREM 11.4.4 |A| ≠ 0 ⇔ ∀ y ∃ x s.t. Ax = y.

Proof. (⇒) If |A| ≠ 0, A⁻¹ exists by Definition 11.3.2. Set x = A⁻¹y. Premultiplying both sides by A yields Ax = y because of Theorem 11.3.6.

(⇐) Let ei be a column vector with 1 in the ith position and 0 everywhere else. Then Ax1 = e1, Ax2 = e2, . . . , Axn = en may be summarized as AX = I by setting X = (x1, x2, . . . , xn) and noting that (e1, e2, . . . , en) = I. Since |I| = 1, |A| ≠ 0 follows from Theorem 11.3.5. □
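The (⇐) construction can be traced concretely on the 2 × 2 example from earlier in the section. The sketch below (assuming the closed-form solution for A = [[1, 1], [1, 2]] given above) solves Ax = ei for each unit vector and collects the solutions as the columns of X, confirming AX = I:

```python
# For A = [[1, 1], [1, 2]], solve A x = e_i for each unit vector e_i
# and stack the solutions as columns; the product AX is the identity.

A = [[1, 1], [1, 2]]

def solve(y1, y2):
    # Closed-form solution for this particular A, taken from the text:
    # x1 = 2*y1 - y2, x2 = y2 - y1.
    return [2 * y1 - y2, y2 - y1]

# The solutions of A x = e1 and A x = e2.
cols = [solve(1, 0), solve(0, 1)]
X = [[cols[j][i] for j in range(2)] for i in range(2)]  # solutions as columns

AX = [[sum(A[i][k] * X[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
print(AX)  # [[1, 0], [0, 1]], the identity matrix
```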

Combining Theorems 11.4.3 and 11.4.4 immediately yields the converse of Theorem 11.4.1, namely,

THEOREM 11.4.5 A is CI ⇒ |A| ≠ 0.

From the results derived thus far, we conclude that the following five statements are equivalent:

|A| ≠ 0.

A⁻¹ exists.

A is CI.

A is RI.

∀ y ∃ x s.t. Ax = y.

DEFINITION 11.4.3 If any of the above five statements holds, we say A is nonsingular.

If A is nonsingular, we can solve (11.4.2) for x as x = A⁻¹y. An alternative solution is given by Cramer's rule. In this method xi, the ith element of x, is determined by

(11.4.9)    xi = |Bi| / |A|,

where Bi is the n × n matrix obtained by replacing the ith column of A by the vector y.
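A minimal sketch of Cramer's rule (11.4.9), assuming |A| ≠ 0; the determinant is computed here by Laplace expansion along the first row, which is fine for small n but exponential in general (the function names are my own):

```python
# det() expands |M| along the first row; cramer() replaces the ith
# column of A by y to form B_i and returns x_i = |B_i| / |A|.

def det(M):
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j in range(len(M)):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]  # delete row 0, col j
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, y):
    d = det(A)  # assumed nonzero, i.e., A nonsingular
    xs = []
    for i in range(len(A)):
        B_i = [row[:i] + [y[k]] + row[i + 1:] for k, row in enumerate(A)]
        xs.append(det(B_i) / d)
    return xs

print(cramer([[1, 1], [1, 2]], [1, 2]))  # [0.0, 1.0]
```

Cramer's rule is mainly of theoretical interest; for numerical work, Gaussian elimination (or a library solver) is preferred.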

The remainder of this section is concerned with the relationship between nonsingularity and the concept called full rank; see Definition 11.4.5 below.

DEFINITION 11.4.4 For an arbitrary matrix X, we denote the maximal number of linearly independent column vectors of X by CR(X) (read “column rank of X”) and the maximal number of linearly independent row vectors of X by RR(X) (read “row rank of X”).

THEOREM 11.4.6 Let an n × K matrix X, where K < n, be CI. Then RR(X) = K.

Proof. RR(X) > K contradicts Theorem 11.4.2. If RR(X) < K, there exists a vector a ≠ 0 such that Xa = 0 because of Theorem 11.4.2; but this contradicts the assumption that X is CI. □

THEOREM 11.4.7 CR(X) = RR(X).

Proof. Suppose that X is n × K, K ≤ n, and CR(X) = r ≤ K. Let X1 consist of a subset of the column vectors of X such that X1 is n × r and CI. Then by Theorem 11.4.6, RR(X1) = r. Let X11 consist of a subset of the row vectors of X1 such that X11 is r × r and RI. Then (X11, Y) is RI, where Y is an arbitrary r × (K − r) matrix. Therefore RR(X) ≥ CR(X). By reversing the rows and columns in the above argument, we can similarly show RR(X) ≤ CR(X). □

DEFINITION 11.4.5 CR(X) or, equivalently, RR(X), is called the rank of X. If rank(X) = min(number of rows, number of columns), we say X is full rank. Note that a square matrix is full rank if and only if it is nonsingular.

THEOREM 11.4.8 An n × K matrix X, where K < n, is full rank if and only if X'X is nonsingular.

Proof. To prove the "only if" part, note that X'Xc = 0 ⇒ c'X'Xc = 0 ⇒ Xc = 0 ⇒ c = 0, where the last step uses the column independence of X. To prove the "if" part, note that Xc = 0 ⇒ X'Xc = 0 ⇒ c = 0. □
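Theorem 11.4.8 gives a practical full-rank test: form the K × K Gram matrix X'X and check that its determinant is nonzero. A sketch for the two-column case (the helper name is invented here):

```python
# For an n x 2 matrix X (given as a list of rows), compute |X'X|.
# X is full rank iff this determinant is nonzero (Theorem 11.4.8).

def gram_det_2cols(X):
    a = sum(r[0] * r[0] for r in X)  # (X'X)[0][0]
    b = sum(r[0] * r[1] for r in X)  # (X'X)[0][1] = (X'X)[1][0]
    c = sum(r[1] * r[1] for r in X)  # (X'X)[1][1]
    return a * c - b * b

print(gram_det_2cols([[1, 0], [0, 1], [1, 1]]))  # 3: nonzero, full rank
print(gram_det_2cols([[1, 2], [2, 4], [3, 6]]))  # 0: columns dependent
```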

THEOREM 11.4.9 Let an n × K matrix X be full rank, where K < n. Then there exists an n × (n − K) matrix Z such that (X, Z) is nonsingular and X'Z = 0.

Proof. Because of Theorem 11.4.2, there exists a vector z1 ≠ 0 such that X'z1 = 0. By the same theorem, there exists a vector z2 ≠ 0 such that (X, z1)'z2 = 0, and so on. Collect these n − K vectors and define Z = (z1, z2, . . . , zn−K). Clearly, X'Z = 0. We have

(11.4.10)   (X, Z)'(X, Z) = | X'X   0 |
                            |  0    D |

where D = Z'Z is a diagonal matrix. Therefore, by Theorems 11.3.1 and 11.3.5,

(11.4.11)   |(X, Z)|² = |X'X| |D|.

But since |X'X| ≠ 0 and |D| ≠ 0, |(X, Z)| ≠ 0. □

THEOREM 11.4.10 Let X be an n × K matrix, not necessarily square, and suppose that there exists an n × (n − K) matrix Z of rank n − K such that Z'X = 0. Then rank(X) ≤ K. If W'X = 0 for some matrix W with n columns implies that rank(W) ≤ n − K, then rank(X) = K.

Proof. Suppose rank(X) = r > K. Let S be r linearly independent columns of X. Suppose Sc + Zd = 0 for some vectors c and d. Premultiplying by S' yields S'Sc = 0. Therefore, by Theorem 11.4.8, c = 0. Premultiplying by Z' yields Z'Zd = 0. Again by Theorem 11.4.8, d = 0. Then CR[(S, Z)] = r + n − K > n, which is a contradiction.

Next, suppose that rank(X) = r < K. Let S be as defined above. Then, by Theorem 11.4.9, there exists an n × (n − r) matrix W such that S'W = 0 and rank(W) = n − r. But since n − r > n − K, this contradicts the assumption of the theorem. Therefore, rank(X) = K. □

THEOREM 11.4.11 For any matrix X, not necessarily square, rank(BX) = rank(X) if B is nonsingular (NS).

Proof. Let rank(BX) and rank(X) be r1 and r2, respectively. By Theorem 11.4.9 there exists a full-rank (n − r1) × n matrix Z' such that Z'BX = 0. Since B is NS, Z'B is also full rank. (To see this, suppose that a'Z'B = 0 for some a. Then a'Z' = 0 because B is NS. But this implies that a = 0, since Z is full rank.) Therefore r2 ≤ r1 by the first part of Theorem 11.4.10. Also by Theorem 11.4.9, there exists a full-rank (n − r2) × n matrix Y' such that Y'X = 0. We can write Y'X = Y'B⁻¹BX. Clearly, Y'B⁻¹ is full rank. Therefore r1 ≤ r2, and r1 = r2. □