SIMULTANEOUS LINEAR EQUATIONS
Throughout this section, A will denote an n X n square matrix and X a matrix that is not necessarily square. Generally, we shall assume that X is n X К with К ^ n.
Consider the following n linear equations:
anxi + a12x2 + ■ ■ ■ + alnxn = y
CL%X} T CL22X2 T ‘ ‘ ‘ T Cl2nXn y2
(11.4.1)
ani*i "f "f ■ ■ ■ T annxn yn
Define x = (xj, x2,. . . , x„)’ and у = (уь Уъ • • • > Уп)’ an(l let A be as in
(11.1.1) with n = m. Then (11.4.1) can be written in matrix notation as
(11.4.1) Ax = y.
A major goal of this section is to obtain a necessary and sufficient condition on A such that (11.4.2) can be solved in terms of x for any y. Using the notation 3 (there exists), V (for any), and s. t. (such that), we can express the last clause of the previous sentence as
V у 3 x s. t. Ax = y.
Let us consider a couple of examples. The matrix
1 2
2 4 does not satisfy the abovementioned condition because there is clearly no solution to the linear system
‘l 2 
X 
V 

2 4 
Xg 
і 
(11.4.4) 
But there are infinite solutions to the system
і 
2′ 
X 
Y 

2 
4 
Xg 
2 
since any point on the line jq + 2×2 = 1 satisfies (11.4.5). In general, if A is such that Ax = у has no solution for some y, it has infinite solutions for some other y.
Next consider
1 1 1 2
This matrix satisfies the condition because x = 2yi — уъ x2 = y2 — Уі constitute the unique solution to
1 1’ 
X[ 
Уі 

1 2 
Xg 
J2_ 
for any ji and y2. It can be shown that if A satisfies the said condition, the solution to Ax = у is unique.
We now embark on a general discussion, in which the major results are given as a series of definitions and theorems.
DEFINITION 11.4.1 A set of vectors x1; x2, . . . , xx is said to be linearly independent if l. f=! c, x, = 0 implies сг = 0 for all і = 1, 2К. Otherwise it is linearly dependent.
For example, the vectors (1,1) and (1, 2) are linearly independent because
only for сі = C2 = 0. But (1,2) and (2, 4) are linearly dependent because
can be satisfied by setting ci = — 2 and C2 = 1.
DEFINITION 11.4.2 If the column vectors of a matrix (not necessarily square) are linearly independent, we say the matrix is column independent (abbreviated as CI); if the row vectors are linearly independent, we say the matrix is row independent (RI).
THEOREM 11.4.1 A is not CI => A = 0.
Proof. Write A = (ab a2,. . . , a„) and A = Р(щ, a2, . . . , an). Since A is not CI, there is a vector x Ф 0 such that Ax = 0. But x Ф 0 means that at least one element of x is nonzero. Assume xx Ф 0 without loss of generality, where xx is the first element of x. From Definition 11.3.1, we have
(11.4.8) T(Ax, a2, . . . , an) = xjA + x2F(a2, a2, аз,… , a„)
+ x$F(a3, аг, a3,. . . , an) + • • •
* xnI’ (a,;, a2, a3,. . . , an).
But the lefthand side of (11.4.8) is zero by Theorem 11.3.2 and the righthand side is xj A by Theorem 11.3.4. Therefore A = 0. □
The converse of Theorem 11.4.1, stated below as Theorem 11.4.5, is more difficult to prove; therefore, we prove three other theorems first.
theorem 11.4.2 If X’ is К X n, where К < n, X’ is not CI.
Proof. Assume К = n — 1 without loss of generality, for otherwise we can affix n — 1 — К row vectors of zeroes at the bottom of X’. We prove the theorem by induction.
The theorem is clearly true for n = 2. Assume that it is true for n, and consider n + 1. Is there a vector с Ф 0 such that X’c = 0, where X’ is n X (n + 1) and c = (q, c2, . . . , cn+1)’? Write the nth row of X’ as (xnl, xn2, • • ■ ,xn, n+1) and assume without loss of generality that xnn+1 Ф 0. Solving the last equation of X’c = 0 for cn+i and inserting its value into the remaining equations yields n — 1 equations for which the theorem was assumed to hold. So the prescribed c exists. □
THEOREM 11.4.3 A is СІ => V у 3 x s. t. Ax = y.
Proof. Using the matrix (A, y) asX’ in Theorem 11.4.2 shows that (A, y) is not CL Therefore there exists с Ф 0 such that (A, y)c = 0. Since A is Cl, the coefficient on у in c is nonzero and solving for у yields Ax = y. □
THEOREM 11.4.4 A Ф 0 <=> V у 3 x s. t. Ax = y.
Proof. (=>) If A Ф 0, A 1 exists by Definition 11.3.2. Set x = A_1y – Premultiplying both sides by A yields Ax = у because of Theorem 11.3.6.
(<=) Let e* be a column vector with 1 in the ith position and 0 everywhere else. Then Axj = ei, Ax2 = e2, . . . , Axn = e„ may be summarized as AX = I by setting X = (xj, x2,. . . , x„) and noting that (e1; e2, . . . , en) =1. Since l = 1, A Ф 0 follows from Theorem 11.3.5. □
Combining Theorems 11.4.3 and 11.4.4 immediately yields the converse of Theorem 11.4.1, namely,
THEOREM 11.4.5 A is Cl => A Ф 0.
From the results derived thus far, we conclude that the following five statements are equivalent:
A Ф 0.
A 1 exists.
A is CI.
A is RI.
V у 3 x s. t. Ax = y.
definition 11.4.В If any of the above five statements holds, we say A is
nonsingular.
If A is nonsingular, we can solve (11.4.2) for x as x = A ‘y. An alternative solution is given by Cramer’s rule. In this method xt, the ith element of x, is determined by
(11.4.9) * = ^ –
where В* is the n X n matrix obtained by replacing the zth column of A by the vector y.
The remainder of this section is concerned with the relationship between nonsingularity and the concept called full rank, see Definition 11.4.5 below.
DEFINITION 11.4.4 For an arbitrary matrix X, we denote the maximal number of linearly independent column vectors of X by CR(X) (read “column rank of X”) and the maximal number of linearly independent row vectors of X by RR(X) (read “row rank of X”).
THEOREM 11.4.6 Let an n X К matrix X, where К < n, be Cl. Then, RR(X) = K.
Proof. RR(X) > К contradicts Theorem 11.4.2. If RR(X) < K, there exists a vector а Ф 0 such that Ха = 0 because of Theorem 11.4.2; but this contradicts the assumption that X is Cl. □
THEOREM 11.4.7 CR(X) = RR(X).
Proof. Suppose that X is и X ЛД < и, and CR(X) = r < K. Let Xj consist of a subset of the column vectors of X such that X] is n X r and CL Then by Theorem 11.4.6, RR(Xj) = r. Let Xu consist of a subset of the row vectors ofXi such that Xn is г X r and RI. Then (Xn, Y) is RI, where Yis an arbitrary г X (K — r) matrix. Therefore RR(X) > CR(X). By reversing the rows and columns in the above argument, we can similarly show RR(X) < CR(X). □
DEFINITION 11.4.5 CR(X) or, equivalently, RR(X), is called the rank of X. If rank(X) = min(number of rows, number of columns), we say X is
full rank. Note that a square matrix is full rank if and only if it is nonsingular.
THEOREM 11.4.8 An n X К matrix X, where К < n, is full rank if and only if X’X is nonsingular.
Proof. To prove the “only if’ part, note that X’Xc = 0 => c’X’Xc = 0 => c = 0. To prove the “if” part, note that Xc = 0 => X’Xc = 0 => c = 0. □
THEOREM 11.4.9 Let an n X К matrix X be full rank, where К < n. Then there exists an n X (n — K) matrix Z such that (X, Z) is nonsingular and X’Z = 0.
Proof. Because of Theorem 11.4.2, there exists a vector Ф 0 such that X’zj = 0. By the same theorem, there exists a vector z2 Ф 0 such that (X, zi)’z2 = 0, and so on. Collect these n — К vectors and define Z as (z1; z2,. . . , zпк) Clearly, X’Z = 0. We have
where D = Z’Z is a diagonal matrix. Therefore, by Theorems 11.3.1 and 11.3.5,
(11.4.11) (X, Z)2 = X’X D.
But since X’X Ф 0 and D Ф 0,  (X, Z)  Ф 0. □ theorem 11.4.10 LetX be a matrix not necessarily square, and suppose that there exists an n X (n — K) matrix Z of rank n — К such that Z’X = 0. Then rank(X) £ K. If W’X = 0 for some matrix W with n columns implies that rank(W) ^ n — K, then rank(X) = K.
Proof. Suppose rank(X) = r > K. Let S be r linearly independent columns of X. Suppose Sc + Zd = 0 for some vectors c and d. Premultiplying by S’ yields S’Sc = 0. Therefore, by Theorem 11.4.8, c = 0. Premultiplying by Z’ yields Z’Zd = 0. Again by Theorem 11.4.8, d = 0. Then CR[(X, Z)] > n, which is a contradiction.
Next, suppose that rank(X) = r < K. Let S be as defined above. Then,
by Theorem 11.4.9, there exists an n X (n — r) matrix W such that S’W = 0 and rank(W) = n — r. But this contradicts the assumption of the theorem. Therefore, rank(X) = K. □
theorem n.4.11 For any matrixX not necessarily square, rank(BX) = rank(X) if В is nonsingular (NS).
Proof. Let rank(BX) and rank(X) be ту and r2, respectively. By Theorem 11.4.9 there exists a fullrank (n — rf X n matrix Z’ such that Z’BX = 0. Since В is NS, Z’B is also full rank. (To see this, suppose that a’Z’B = 0 for some a. Then a’Z’ = 0 because В is NS. But this implies that a = 0, since Z is full rank.) Therefore r2 ^ r, by the first part of Theorem 11.4.10. Also by Theorem 11.4.9, there exists a fullrank (n — r2) X n matrix Y’ such that Y’X = 0. We can write Y’X = Y’B_1BX. Clearly, Y’B1 is full rank. Therefore r < r2 and ту = r2. □
Leave a reply