# The Multivariate Normal Distribution

Now let the components of $X = (x_1, \ldots, x_n)^T$ be independent, standard normally distributed random variables. Then $E(X) = 0 \in \mathbb{R}^n$ and $\mathrm{Var}(X) = I_n$. Moreover, the joint density $f(x) = f(x_1, \ldots, x_n)$ of $X$ in this case is the product of the standard normal marginal densities:

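$$
f(x) = f(x_1, \ldots, x_n) = \prod_{j=1}^{n} \frac{\exp\left(-\frac{1}{2}x_j^2\right)}{\sqrt{2\pi}} = \frac{\exp\left(-\frac{1}{2}\sum_{j=1}^{n} x_j^2\right)}{(\sqrt{2\pi})^n} = \frac{\exp\left(-\frac{1}{2}x^T x\right)}{(\sqrt{2\pi})^n}.
$$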
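The following sketch is a quick numerical sanity check of this factorization. It uses Python with NumPy, and the function names are illustrative rather than taken from the text. It evaluates the joint density at an arbitrary point in two ways: as a product of univariate standard normal densities, and in the closed form $\exp(-\frac{1}{2}x^T x)/(\sqrt{2\pi})^n$.

```python
import math
import numpy as np

def std_normal_pdf(u):
    # Univariate N(0,1) density: exp(-u^2/2) / sqrt(2*pi)
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def joint_density_product(x):
    # Independence: the joint density is the product of the n marginal densities
    return math.prod(std_normal_pdf(float(xj)) for xj in x)

def joint_density_quadratic(x):
    # Equivalent closed form: exp(-x^T x / 2) / (sqrt(2*pi))^n
    x = np.asarray(x, dtype=float)
    return math.exp(-0.5 * float(x @ x)) / math.sqrt(2.0 * math.pi) ** x.size

x = np.array([0.3, -1.2, 2.0])  # arbitrary evaluation point with n = 3
print(joint_density_product(x), joint_density_quadratic(x))  # agree up to rounding
```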

The shape of this density for the case n = 2 is displayed in Figure 5.1.

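Next, consider the linear transformation $Y = \mu + AX$ of $X$, where $\mu = (\mu_1, \ldots, \mu_n)^T$ is a vector of constants and $A$ is a nonsingular $n \times n$ matrix with nonrandom elements. Because $A$ is nonsingular and therefore invertible, this transformation is a one-to-one mapping with inverse $X = A^{-1}(Y - \mu)$. Then the density function $g(y)$ of $Y$ is equal to

$$
\begin{aligned}
g(y) &= f(x)\left|\det\left(\partial x / \partial y^T\right)\right| = f(A^{-1}y - A^{-1}\mu)\left|\det\left(\partial (A^{-1}y - A^{-1}\mu) / \partial y^T\right)\right| \\
&= f(A^{-1}y - A^{-1}\mu)\left|\det(A^{-1})\right| = \frac{\exp\left[-\frac{1}{2}(y - \mu)^T (A^{-1})^T A^{-1}(y - \mu)\right]}{(\sqrt{2\pi})^n \left|\det(A)\right|} \\
&= \frac{\exp\left[-\frac{1}{2}(y - \mu)^T (AA^T)^{-1}(y - \mu)\right]}{(\sqrt{2\pi})^n \sqrt{\left|\det(AA^T)\right|}}.
\end{aligned}
$$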
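The change-of-variables argument can be checked numerically. The sketch below assumes NumPy, an arbitrarily chosen nonsingular $A$ and an arbitrary vector $\mu$. It computes $g(y)$ in two ways: as $f(A^{-1}(y - \mu))/|\det(A)|$, and from the closed form with $(AA^T)^{-1}$. The helper names are hypothetical.

```python
import numpy as np

def std_normal_joint(x):
    # f(x) = exp(-x^T x / 2) / (sqrt(2*pi))^n for independent N(0,1) components
    return np.exp(-0.5 * (x @ x)) / np.sqrt(2.0 * np.pi) ** x.size

def density_by_change_of_variables(y, mu, A):
    # g(y) = f(A^{-1}(y - mu)) * |det(A^{-1})| = f(A^{-1}(y - mu)) / |det(A)|
    x = np.linalg.solve(A, y - mu)
    return std_normal_joint(x) / abs(np.linalg.det(A))

def density_closed_form(y, mu, A):
    # g(y) = exp(-(y - mu)^T (A A^T)^{-1} (y - mu) / 2) / ((sqrt(2*pi))^n * sqrt(det(A A^T)))
    S = A @ A.T
    d = y - mu
    quad = d @ np.linalg.solve(S, d)
    return np.exp(-0.5 * quad) / (np.sqrt(2.0 * np.pi) ** y.size * np.sqrt(np.linalg.det(S)))

# Arbitrary illustrative values; A only needs to be nonsingular (here det(A) = 3.5)
mu = np.array([1.0, -2.0])
A = np.array([[2.0, 0.5],
              [-1.0, 1.5]])
y = np.array([0.4, -1.1])
print(density_by_change_of_variables(y, mu, A), density_closed_form(y, mu, A))
```

The code uses `np.linalg.solve` instead of forming $A^{-1}$ explicitly. This is the usual numerically preferable choice, and here it gives the same value either way.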

Observe that $\mu$ is the expectation vector of $Y$: $E(Y) = \mu + A(E(X)) = \mu$. But what is $AA^T$? We know from (5.2) that $\mathrm{Var}(Y) = E[YY^T] - \mu\mu^T$. Therefore, substituting $Y = \mu + AX$ yields

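$$
\begin{aligned}
\mathrm{Var}(Y) &= E\left[(\mu + AX)(\mu^T + X^T A^T) - \mu\mu^T\right] \\
&= \mu\left(E(X^T)\right)A^T + A\left(E(X)\right)\mu^T + A\left(E(XX^T)\right)A^T = AA^T,
\end{aligned}
$$

because $E(X) = 0$ and $E[XX^T] = I_n$. Thus, $AA^T$ is the variance matrix of $Y$.

Figure 5.1. The bivariate standard normal density on $[-3, 3] \times [-3, 3]$.

This argument gives rise to the following definition of the $n$-variate normal distribution:

Definition 5.1: Let $Y$ be an $n \times 1$ random vector satisfying $E(Y) = \mu$ and $\mathrm{Var}(Y) = \Sigma$, where $\Sigma$ is nonsingular. Then $Y$ is distributed $N_n(\mu, \Sigma)$ if the density $g(y)$ of $Y$ is of the form

$$
g(y) = \frac{\exp\left[-\frac{1}{2}(y - \mu)^T \Sigma^{-1}(y - \mu)\right]}{(\sqrt{2\pi})^n \sqrt{\det(\Sigma)}}.
$$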

In the same way as before we can show that a nonsingular (hence one-to-one) linear transformation of a normal distribution is normal itself:

Theorem 5.1: Let $Z = a + BY$, where $Y$ is distributed $N_n(\mu, \Sigma)$ and $B$ is a nonsingular matrix of constants. Then $Z$ is distributed $N_n(a + B\mu, B\Sigma B^T)$.

Proof: First, observe that $Z = a + BY$ implies $Y = B^{-1}(Z - a)$. Let $h(z)$ be the density of $Z$ and $g(y)$ the density of $Y$. Then

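$$
\begin{aligned}
h(z) &= g(y)\left|\det\left(\partial y / \partial z^T\right)\right| = g(B^{-1}z - B^{-1}a)\left|\det\left(\partial (B^{-1}z - B^{-1}a) / \partial z^T\right)\right| \\
&= \frac{g(B^{-1}z - B^{-1}a)}{\left|\det(B)\right|} = \frac{g(B^{-1}(z - a))}{\sqrt{\det(BB^T)}} \\
&= \frac{\exp\left[-\frac{1}{2}(B^{-1}(z - a) - \mu)^T \Sigma^{-1}(B^{-1}(z - a) - \mu)\right]}{(\sqrt{2\pi})^n \sqrt{\det(\Sigma)}\sqrt{\det(BB^T)}} \\
&= \frac{\exp\left[-\frac{1}{2}(z - a - B\mu)^T (B\Sigma B^T)^{-1}(z - a - B\mu)\right]}{(\sqrt{2\pi})^n \sqrt{\det(B\Sigma B^T)}}.
\end{aligned}
$$

Q.E.D.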
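Theorem 5.1 can also be illustrated numerically. This sketch assumes NumPy and arbitrary example values:

- $\mu$;
- a symmetric, positive definite $\Sigma$;
- $a$;
- a nonsingular $3 \times 3$ matrix $B$.

`normal_pdf` is a hypothetical helper that implements the density in Definition 5.1. At one point, the code compares $h(z) = g(B^{-1}(z - a))/|\det(B)|$ with the $N_n(a + B\mu, B\Sigma B^T)$ density.

```python
import numpy as np

def normal_pdf(y, mu, Sigma):
    # N_n(mu, Sigma) density as in Definition 5.1
    d = y - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / (np.sqrt(2.0 * np.pi) ** y.size * np.sqrt(np.linalg.det(Sigma)))

# Arbitrary example values (Sigma is diagonally dominant, hence positive definite)
mu = np.array([0.5, 1.0, -1.0])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 1.5]])
a = np.array([1.0, 0.0, 2.0])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])   # det(B) = -5, so B is nonsingular

def density_of_z_by_change_of_variables(z):
    # h(z) = g(B^{-1}(z - a)) / |det(B)|
    y = np.linalg.solve(B, z - a)
    return normal_pdf(y, mu, Sigma) / abs(np.linalg.det(B))

def density_of_z_theorem(z):
    # Theorem 5.1: Z ~ N_n(a + B mu, B Sigma B^T)
    return normal_pdf(z, a + B @ mu, B @ Sigma @ B.T)

z = np.array([2.0, 0.5, 1.0])
print(density_of_z_by_change_of_variables(z), density_of_z_theorem(z))
```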

I will now relax the assumption in Theorem 5.1 that the matrix $B$ is a nonsingular $n \times n$ matrix. This more general version of Theorem 5.1 can be proved using the moment-generating function or the characteristic function of the multivariate normal distribution.

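Theorem 5.2: Let $Y$ be distributed $N_n(\mu, \Sigma)$. Then the moment-generating function of $Y$ is $m(t) = \exp(t^T\mu + t^T\Sigma t/2)$, and the characteristic function of $Y$ is $\varphi(t) = \exp(i \cdot t^T\mu - t^T\Sigma t/2)$.

Proof: We have

$$
\begin{aligned}
m(t) = E[\exp(t^T Y)] &= \int \frac{\exp\left[t^T y - \frac{1}{2}(y - \mu)^T \Sigma^{-1}(y - \mu)\right]}{(\sqrt{2\pi})^n \sqrt{\det(\Sigma)}}\,dy \\
&= \exp\left(t^T\mu + \tfrac{1}{2}t^T\Sigma t\right)\int \frac{\exp\left[-\frac{1}{2}(y - \mu - \Sigma t)^T \Sigma^{-1}(y - \mu - \Sigma t)\right]}{(\sqrt{2\pi})^n \sqrt{\det(\Sigma)}}\,dy.
\end{aligned}
$$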

Because the last integral is equal to 1 (its integrand is the $N_n(\mu + \Sigma t, \Sigma)$ density), the result for the moment-generating function follows. The result for the characteristic function follows from $\varphi(t) = m(i \cdot t)$. Q.E.D.
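The key step in the proof of Theorem 5.2 is completing the square. Multiplying the $N_n(\mu, \Sigma)$ density by $\exp(t^T y)$ gives $m(t)$ times the $N_n(\mu + \Sigma t, \Sigma)$ density. The sketch below uses NumPy, arbitrary example values and hypothetical helper names. It checks this identity pointwise, then compares $m(t)$ with a Monte Carlo estimate of $E[\exp(t^T Y)]$.

```python
import numpy as np

def normal_pdf(y, mu, Sigma):
    # N_n(mu, Sigma) density as in Definition 5.1
    d = y - mu
    quad = d @ np.linalg.solve(Sigma, d)
    return np.exp(-0.5 * quad) / (np.sqrt(2.0 * np.pi) ** y.size * np.sqrt(np.linalg.det(Sigma)))

def mgf_closed_form(t, mu, Sigma):
    # m(t) = exp(t^T mu + t^T Sigma t / 2), as stated in Theorem 5.2
    return np.exp(t @ mu + 0.5 * (t @ Sigma @ t))

# Arbitrary example values (det(Sigma) = 0.34 > 0)
mu = np.array([0.2, -0.4])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])
t = np.array([0.3, -0.2])

# Completing the square: exp(t^T y) g(y; mu, Sigma) = m(t) * g(y; mu + Sigma t, Sigma)
rng = np.random.default_rng(0)
ys = rng.normal(size=(5, 2))  # a few arbitrary evaluation points
lhs = np.array([np.exp(t @ y) * normal_pdf(y, mu, Sigma) for y in ys])
rhs = np.array([mgf_closed_form(t, mu, Sigma) * normal_pdf(y, mu + Sigma @ t, Sigma) for y in ys])

# Monte Carlo estimate of E[exp(t^T Y)] for comparison with the closed form
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
mc = np.exp(samples @ t).mean()
print(mc, mgf_closed_form(t, mu, Sigma))
```

The Monte Carlo estimate is only approximate. Its error shrinks roughly like one over the square root of the number of draws.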

Theorem 5.3: Theorem 5.1 holds for any linear transformation Z = a + BY.
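Before the proof, Theorem 5.3 can be illustrated by simulation with a non-square $B$. The sketch assumes NumPy, a $2 \times 3$ matrix $B$ and other arbitrary example values. It compares three sample quantities of $Z = a + BY$ with their theoretical counterparts:

- the sample mean with $a + B\mu$;
- the sample covariance with $B\Sigma B^T$;
- the empirical characteristic function at one $t$ with $\exp(i \cdot (a + B\mu)^T t - \frac{1}{2}t^T B\Sigma B^T t)$.

Agreement is approximate, up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example values (Sigma is diagonally dominant, hence positive definite)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 2.0, 0.3],
                  [0.0, 0.3, 0.5]])
a = np.array([0.5, -2.0])
B = np.array([[1.0, -1.0, 2.0],
              [0.5, 0.0, 1.0]])   # 2 x 3, so Theorem 5.1 does not apply directly

Y = rng.multivariate_normal(mu, Sigma, size=400_000)  # each row is one draw of Y
Z = a + Y @ B.T                                        # each row is one draw of Z = a + B Y

mean_theory = a + B @ mu
cov_theory = B @ Sigma @ B.T

def cf_theory(t):
    # phi_Z(t) = exp(i (a + B mu)^T t - t^T B Sigma B^T t / 2)
    return np.exp(1j * (mean_theory @ t) - 0.5 * (t @ cov_theory @ t))

t = np.array([0.4, -0.3])
cf_empirical = np.exp(1j * (Z @ t)).mean()  # sample average of exp(i t^T Z)

print(Z.mean(axis=0), mean_theory)
print(np.cov(Z, rowvar=False), cov_theory, sep="\n")
print(cf_empirical, cf_theory(t))
```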

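Proof: Let $Z = a + BY$, where $B$ is $m \times n$. It is easy to verify that the characteristic function of $Z$ is

$$
\begin{aligned}
\varphi_Z(t) &= E[\exp(i \cdot t^T Z)] = E[\exp(i \cdot t^T(a + BY))] \\
&= \exp(i \cdot t^T a)\,E[\exp(i \cdot t^T BY)] = \exp\left(i \cdot (a + B\mu)^T t - \tfrac{1}{2}t^T B\Sigma B^T t\right).
\end{aligned}
$$

Theorem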