Uniform Continuity

A function $g$ on $\mathbb{R}^k$ is called uniformly continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|g(x) - g(y)| < \varepsilon$ if $\|x - y\| < \delta$. In particular,

Theorem II.7: If a function $g$ is continuous on a compact subset $\Theta$ of $\mathbb{R}^k$, then it is uniformly continuous on $\Theta$.

Proof: Let $\varepsilon > 0$ be arbitrary, and observe from the continuity of $g$ that, for each $x$ in $\Theta$, there exists a $\delta(x) > 0$ such that $|g(x) - g(y)| < \varepsilon/2$ if $\|x - y\| < 2\delta(x)$. Now let $U(x) = \{y \in \mathbb{R}^k : \|y - x\| < \delta(x)\}$. Then the collection $\{U(x), x \in \Theta\}$ is an open covering of $\Theta$; hence, by compactness of $\Theta$ there exists a finite number of points $\theta_1, \ldots, \theta_n$ in $\Theta$ such that $\Theta \subset \bigcup_{j=1}^{n} U(\theta_j)$. Next, let $\delta = \min_{1 \le j \le n} \delta(\theta_j)$. Each point $x \in \Theta$ belongs to at least one of the open sets $U(\theta_j)$: $x \in U(\theta_j)$ for some $j$. Then $\|x - \theta_j\| < \delta(\theta_j) < 2\delta(\theta_j)$ and hence $|g(x) - g(\theta_j)| < \varepsilon/2$. Moreover, if $\|x - y\| < \delta$, then

$$\|y - \theta_j\| = \|y - x + x - \theta_j\| \le \|x - y\| + \|x - \theta_j\| < \delta + \delta(\theta_j) \le 2\delta(\theta_j);$$

hence, $|g(y) - g(\theta_j)| < \varepsilon/2$. Consequently, $|g(x) - g(y)| \le |g(x) - g(\theta_j)| + |g(y) - g(\theta_j)| < \varepsilon$ if $\|x - y\| < \delta$. Q.E.D.
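The point of the theorem is that on a compact set a single $\delta$ works for all points simultaneously. As a minimal numerical sketch (our own addition, not part of the text), take $g(x) = x^2$ on the compact set $[0, 1]$: there $|g(x) - g(y)| = |x + y|\,|x - y| \le 2|x - y|$, so $\delta = \varepsilon/2$ is a uniform choice. The function names below are ours.

```python
# Grid check of uniform continuity of g(x) = x^2 on the compact set [0, 1].
# On [0, 1], |g(x) - g(y)| = |x + y| |x - y| <= 2 |x - y|, so delta = eps / 2
# works for *all* pairs of points at once.

def g(x):
    return x * x

def violates(eps, delta, num=200):
    """Search a grid of pairs (x, y) in [0, 1]^2 with |x - y| < delta
    for a violation of |g(x) - g(y)| < eps."""
    pts = [i / (num - 1) for i in range(num)]
    for x in pts:
        for y in pts:
            if abs(x - y) < delta and abs(g(x) - g(y)) >= eps:
                return True
    return False

eps = 0.1
print(violates(eps, eps / 2))  # False: the uniform delta = eps/2 suffices
print(violates(eps, 0.5))      # True: too large a delta fails near x = 1
```

Note that no such uniform $\delta$ exists for $g(x) = x^2$ on all of $\mathbb{R}$, which is why compactness matters.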

II.3. Derivatives of Vector and Matrix Functions

Consider a real function $f(x) = f(x_1, \ldots, x_n)$ on $\mathbb{R}^n$, where $x = (x_1, \ldots, x_n)^{\mathrm{T}}$. Recall that the partial derivative of $f$ with respect to a component $x_i$ of $x$ is denoted and defined by

$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial f(x_1, \ldots, x_n)}{\partial x_i} \stackrel{\text{def}}{=} \lim_{\delta \to 0} \frac{f(x_1, \ldots, x_{i-1}, x_i + \delta, x_{i+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{\delta}.$$

For example, let $f(x) = \beta^{\mathrm{T}} x = x^{\mathrm{T}} \beta = \beta_1 x_1 + \cdots + \beta_n x_n$. Then

$$\begin{pmatrix} \partial f(x)/\partial x_1 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_n \end{pmatrix} = \beta.$$

This result could also have been obtained by treating $x^{\mathrm{T}}$ as a scalar and taking the derivative of $f(x) = x^{\mathrm{T}} \beta$ with respect to $x^{\mathrm{T}}$: $\partial(x^{\mathrm{T}} \beta)/\partial x^{\mathrm{T}} = \beta$. This motivates the convention to denote the column vector of partial derivatives of $f(x)$ by $\partial f(x)/\partial x^{\mathrm{T}}$. Similarly, if we treat $x$ as a scalar and take the derivative of $f(x) = \beta^{\mathrm{T}} x$ with respect to $x$, then the result is a row vector: $\partial(\beta^{\mathrm{T}} x)/\partial x = \beta^{\mathrm{T}}$. Thus, in general,

$$\frac{\partial f(x)}{\partial x^{\mathrm{T}}} = \begin{pmatrix} \partial f(x)/\partial x_1 \\ \vdots \\ \partial f(x)/\partial x_n \end{pmatrix}, \qquad \frac{\partial f(x)}{\partial x} = \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right).$$

If the function $H$ is vector valued, for instance $H(x) = (h_1(x), \ldots, h_m(x))^{\mathrm{T}}$, $x \in \mathbb{R}^n$, then applying the operation $\partial/\partial x$ to each of the components yields an $m \times n$ matrix:

$$\frac{\partial H(x)}{\partial x} = \begin{pmatrix} \partial h_1(x)/\partial x \\ \vdots \\ \partial h_m(x)/\partial x \end{pmatrix} = \begin{pmatrix} \partial h_1(x)/\partial x_1 & \cdots & \partial h_1(x)/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial h_m(x)/\partial x_1 & \cdots & \partial h_m(x)/\partial x_n \end{pmatrix}.$$
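These conventions are easy to verify numerically; the following sketch (our own addition, using NumPy) compares the analytic gradient $\partial(\beta^{\mathrm{T}} x)/\partial x^{\mathrm{T}} = \beta$ and an $m \times n$ Jacobian with central finite differences. All names are ours.

```python
import numpy as np

# Finite-difference check of the conventions:
#   d(beta^T x)/dx^T = beta             (column vector of partials)
#   dH(x)/dx         = m x n Jacobian   (row i = dh_i(x)/dx)

def num_grad(f, x, h=1e-6):
    """Central-difference column vector of partial derivatives of scalar f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

beta = np.array([1.0, -2.0, 3.0])
x0 = np.array([0.5, 1.5, -1.0])
print(np.allclose(num_grad(lambda x: beta @ x, x0), beta))  # True

# Vector-valued H(x) = (h_1(x), h_2(x))^T gives a 2 x 3 Jacobian:
H = [lambda x: x @ x, lambda x: beta @ x]
J = np.vstack([num_grad(h, x0) for h in H])       # stack the rows dh_i/dx
print(np.allclose(J, np.vstack([2 * x0, beta])))  # True
```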

Moreover, applying the latter to a column vector of partial derivatives of a real function f yields

$$\frac{\partial\left(\partial f(x)/\partial x^{\mathrm{T}}\right)}{\partial x} = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_n} \end{pmatrix} = \frac{\partial^2 f(x)}{\partial x \, \partial x^{\mathrm{T}}},$$

for instance.

In the case of an $m \times n$ matrix $X$ with columns $x_1, \ldots, x_n \in \mathbb{R}^m$, $x_j = (x_{1,j}, \ldots, x_{m,j})^{\mathrm{T}}$, and a differentiable function $f(X)$ on the vector space of $m \times n$ matrices, we may interpret $X = (x_1, \ldots, x_n)$ as a "row" of column vectors, and thus

$$\frac{\partial f(X)}{\partial X} \stackrel{\text{def}}{=} \begin{pmatrix} \partial f(X)/\partial x_1 \\ \vdots \\ \partial f(X)/\partial x_n \end{pmatrix}$$

is an $n \times m$ matrix. For the same reason, $\partial f(X)/\partial X^{\mathrm{T}} = (\partial f(X)/\partial X)^{\mathrm{T}}$. An example of such a derivative with respect to a matrix is given by Theorem I.33 in Appendix I, which states that if $X$ is a square nonsingular matrix, then $\partial \ln[\det(X)]/\partial X = X^{-1}$.
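The identity $\partial \ln[\det(X)]/\partial X = X^{-1}$ lends itself to a quick numerical check. In the sketch below (our own addition, using NumPy), the entry-wise partial derivatives satisfy $\partial \ln \det(X)/\partial X_{i,j} = (X^{-1})_{j,i}$, so the matrix of partials is $(X^{-1})^{\mathrm{T}}$; stacking the partials with respect to each column of $X$ as a row, as in the convention above, then yields $X^{-1}$ itself. Helper names are ours.

```python
import numpy as np

# Finite-difference check of d ln det(X) / dX = X^{-1} (Theorem I.33).
# Entry-wise, d ln det(X) / dX[i, j] = (X^{-1})[j, i], so the array of
# entry-wise partials below equals inv(X) transposed.

def num_matrix_grad(f, X, h=1e-6):
    """G[i, j] = central-difference partial of f with respect to X[i, j]."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # well-conditioned, nonsingular
f = lambda M: np.log(np.linalg.det(M))

G = num_matrix_grad(f, X)
print(np.allclose(G, np.linalg.inv(X).T, atol=1e-4))  # True
```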

Next, consider the quadratic function $f(x) = a + x^{\mathrm{T}} b + x^{\mathrm{T}} C x$, where $C = (c_{i,j})$ with $c_{i,j} = c_{j,i}$. Thus, $C$ is a symmetric matrix. Then

$$\begin{aligned}
\frac{\partial f(x)}{\partial x_k} &= b_k + \frac{\partial \sum_{i=1}^{n} \sum_{j=1}^{n} x_i c_{i,j} x_j}{\partial x_k} \\
&= b_k + 2 c_{k,k} x_k + \sum_{\substack{i=1 \\ i \ne k}}^{n} x_i c_{i,k} + \sum_{\substack{j=1 \\ j \ne k}}^{n} c_{k,j} x_j \\
&= b_k + 2 \sum_{j=1}^{n} c_{k,j} x_j, \qquad k = 1, \ldots, n;
\end{aligned}$$

hence, stacking these partial derivatives in a column vector yields

$$\partial f(x)/\partial x^{\mathrm{T}} = b + 2Cx. \qquad \text{(II.8)}$$

If $C$ is not symmetric, we may without loss of generality replace $C$ in the function $f(x)$ by the symmetric matrix $C/2 + C^{\mathrm{T}}/2$ because $x^{\mathrm{T}} C x = (x^{\mathrm{T}} C x)^{\mathrm{T}} = x^{\mathrm{T}} C^{\mathrm{T}} x$, and thus

$$\partial f(x)/\partial x^{\mathrm{T}} = b + Cx + C^{\mathrm{T}} x.$$
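Both gradient formulas are easy to verify numerically. The sketch below (our own addition, using NumPy) compares a finite-difference gradient of $f(x) = a + x^{\mathrm{T}} b + x^{\mathrm{T}} C x$ with $b + Cx + C^{\mathrm{T}} x$ for a deliberately non-symmetric $C$; when $C$ is symmetric this reduces to $b + 2Cx$ as in (II.8). All names are ours.

```python
import numpy as np

# Finite-difference check of df(x)/dx^T = b + Cx + C^T x for
# f(x) = a + x^T b + x^T C x (which equals b + 2Cx when C is symmetric).

def num_grad(f, x, h=1e-6):
    """Central-difference column vector of partial derivatives of scalar f."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

a = 2.0
b = np.array([1.0, -1.0, 0.5])
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0],
              [4.0, 0.0, 2.0]])   # deliberately non-symmetric
f = lambda x: a + b @ x + x @ C @ x
x0 = np.array([0.3, -0.7, 1.2])

analytic = b + C @ x0 + C.T @ x0
print(np.allclose(num_grad(f, x0), analytic))  # True
```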

The result (II.8) for the case b = 0 can be used to give an interesting alternative interpretation of eigenvalues and eigenvectors of symmetric matrices, namely, as the solutions of a quadratic optimization problem under quadratic restrictions. Consider the optimization problem

$$\max \text{ or } \min \; x^{\mathrm{T}} A x \quad \text{s.t.} \quad x^{\mathrm{T}} x = 1, \qquad \text{(II.9)}$$

where $A$ is a symmetric matrix and "max" and "min" include local maxima and minima and saddle-point solutions. The Lagrange function for solving this problem is

$$\mathcal{L}(x, \lambda) = x^{\mathrm{T}} A x + \lambda (1 - x^{\mathrm{T}} x),$$

with first-order conditions

$$\partial \mathcal{L}(x, \lambda)/\partial x^{\mathrm{T}} = 2Ax - 2\lambda x = 0 \;\Rightarrow\; Ax = \lambda x, \qquad \text{(II.10)}$$

$$\partial \mathcal{L}(x, \lambda)/\partial \lambda = 1 - x^{\mathrm{T}} x = 0 \;\Rightarrow\; \|x\| = 1. \qquad \text{(II.11)}$$

Condition (II.10) defines the Lagrange multiplier $\lambda$ as an eigenvalue and the solution for $x$ as the corresponding eigenvector of $A$, and (II.11) is the normalization of the eigenvector to unit length. If we combine (II.10) and (II.11), it follows that $\lambda = x^{\mathrm{T}} A x$.

Figure II.1. The mean value theorem.
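This interpretation can be illustrated numerically: the maximum (minimum) of $x^{\mathrm{T}} A x$ over the unit sphere is the largest (smallest) eigenvalue of $A$, attained at the corresponding unit eigenvector. The sketch below (our own addition) checks this with NumPy's symmetric eigensolver.

```python
import numpy as np

# The constrained problem max/min x^T A x s.t. x^T x = 1 is solved by unit
# eigenvectors of the symmetric matrix A, with lambda = x^T A x the eigenvalue.

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                    # a symmetric matrix

lam, V = np.linalg.eigh(A)           # eigenvalues ascending, unit eigenvectors

x_min, x_max = V[:, 0], V[:, -1]
print(np.isclose(x_max @ A @ x_max, lam[-1]))  # max of x^T A x on ||x|| = 1
print(np.isclose(x_min @ A @ x_min, lam[0]))   # min of x^T A x on ||x|| = 1

# Any other unit vector lies between the two extreme eigenvalues:
z = rng.standard_normal(5)
z /= np.linalg.norm(z)
print(lam[0] <= z @ A @ z <= lam[-1])          # True
```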