2.8. Moment-Generating Functions and Characteristic Functions

2.8.1. Moment-Generating Functions

The moment-generating function of a bounded random variable X (i.e., P[|X| ≤ M] = 1 for some positive real number M < ∞) is defined as the function

m(t) = E[exp(t·X)], t ∈ R, (2.31)

where the argument t is nonrandom. More generally:

Definition 2.15: The moment-generating function of a random vector X in R^k is defined by m(t) = E[exp(tᵀX)] for t ∈ T ⊂ R^k, where T is the set of nonrandom vectors t for which the moment-generating function exists and is finite.

For bounded random variables the moment-generating function exists and is finite for all values of t. In particular, in the univariate bounded case we can write

m(t) = E[exp(t·X)] = E[Σ_{k=0}^{∞} (t^k·X^k)/k!] = Σ_{k=0}^{∞} (t^k/k!)·E[X^k].

It is easy to verify that the jth derivative of m(t) is

m^(j)(t) = d^j m(t)/(dt)^j = Σ_{k=j}^{∞} (t^{k−j}/(k−j)!)·E[X^k]; (2.32)

hence, the jth moment of X is

m^(j)(0) = E[X^j]. (2.33)

This is the reason for calling m(t) the “moment-generating function.”
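As a concrete illustration (not from the text), the sketch below takes X to be a fair six-sided die roll, a bounded discrete random variable, and recovers its first two moments from finite-difference derivatives of m(t) at zero, in the spirit of (2.33); all names in the snippet are ours:

```python
import math

# Hypothetical example: X is a fair-die roll, P[X = j] = 1/6 for j = 1..6,
# a bounded discrete random variable, so m(t) is finite for every t.
support = range(1, 7)
probs = {j: 1 / 6 for j in support}

def mgf(t):
    """m(t) = E[exp(t*X)] for the discrete distribution above."""
    return sum(p * math.exp(t * j) for j, p in probs.items())

def mgf_derivative(j, h=1e-3):
    """j-th derivative of m at 0 via central finite differences;
    by (2.33) this approximates the j-th moment E[X^j]."""
    return sum((-1) ** i * math.comb(j, i) * mgf((j / 2 - i) * h)
               for i in range(j + 1)) / h ** j

exact_mean = sum(j * p for j, p in probs.items())       # 3.5
exact_second = sum(j ** 2 * p for j, p in probs.items())  # 91/6
print(mgf_derivative(1), exact_mean)    # both ≈ 3.5
print(mgf_derivative(2), exact_second)  # both ≈ 15.1667
```

The finite-difference step h trades truncation error against rounding error; for exact symbolic moments one would differentiate m(t) analytically instead.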

Although the moment-generating function is a handy tool for computing moments of a distribution, its actual importance arises because the shape of the moment-generating function in an open neighborhood of zero uniquely characterizes the distribution of a random variable. In order to show this, we need the following result.

Theorem 2.21: The distributions of two random vectors X and Y in R^k are the same if and only if for all bounded continuous functions φ on R^k, E[φ(X)] = E[φ(Y)].

Proof: I will only prove this theorem for the case in which X and Y are random variables: k = 1. Note that the "only if" case follows from the definition of expectation.

Let F(x) be the distribution function of X and let G(y) be the distribution function of Y. Let a < b be arbitrary continuity points of F(x) and G(y), and define

φ(x) = 0 if x > b, = 1 if x < a, = (b − x)/(b − a) if a ≤ x ≤ b. (2.34)

Clearly, (2.34) is a bounded, continuous function, and therefore, by assumption, we have E[φ(X)] = E[φ(Y)]. Now observe from (2.34) that

E[φ(X)] = ∫ φ(x)dF(x) = F(a) + ∫_a^b ((b − x)/(b − a)) dF(x) ≥ F(a)

and

E[φ(X)] = ∫ φ(x)dF(x) = F(a) + ∫_a^b ((b − x)/(b − a)) dF(x) ≤ F(b).

Similarly,

E[φ(Y)] = ∫ φ(y)dG(y) = G(a) + ∫_a^b ((b − y)/(b − a)) dG(y) ≥ G(a)

and

E[φ(Y)] = ∫ φ(y)dG(y) = G(a) + ∫_a^b ((b − y)/(b − a)) dG(y) ≤ G(b).

If we combine these inequalities with E[φ(X)] = E[φ(Y)], it follows that for arbitrary continuity points a < b of F(x) and G(y),

G(a) ≤ F(b), F(a) ≤ G(b). (2.35)

If we let b ↓ a, it follows from (2.35) that F(a) = G(a). Q.E.D.

Now assume that the random variables X and Y are discrete and take, with probability 1, the values x_1, …, x_n. Without loss of generality we may assume that x_j = j, that is,

P[X ∈ {1, 2, …, n}] = P[Y ∈ {1, 2, …, n}] = 1.

Suppose that all the moments of X and Y match: for k = 1, 2, 3, …, E[X^k] = E[Y^k]. I will show that then, for an arbitrary bounded continuous function φ on R, E[φ(X)] = E[φ(Y)].

Denoting p_j = P[X = j], q_j = P[Y = j], we can write E[φ(X)] = Σ_{j=1}^{n} φ(j)·p_j, E[φ(Y)] = Σ_{j=1}^{n} φ(j)·q_j. It is always possible to construct a polynomial ρ(t) = Σ_{k=0}^{n−1} ρ_k·t^k such that ρ(j) = φ(j) for j = 1, …, n by solving

( 1  1  1²  ⋯  1^{n−1} ) ( ρ_0     )   ( φ(1) )
( 1  2  2²  ⋯  2^{n−1} ) ( ρ_1     ) = ( φ(2) )
( ⋮  ⋮  ⋮      ⋮       ) (  ⋮      )   (  ⋮   )
( 1  n  n²  ⋯  n^{n−1} ) ( ρ_{n−1} )   ( φ(n) )

Then E[φ(X)] = Σ_{j=1}^{n} Σ_{k=0}^{n−1} ρ_k·j^k·p_j = Σ_{k=0}^{n−1} ρ_k·Σ_{j=1}^{n} j^k·p_j = Σ_{k=0}^{n−1} ρ_k·E[X^k] and, similarly, E[φ(Y)] = Σ_{k=0}^{n−1} ρ_k·E[Y^k]. Hence, it follows from Theorem 2.21 that if all the corresponding moments of X and Y are the same, then the distributions of X and Y are the same. Thus, if the moment-generating functions of X and Y coincide on an open neighborhood of zero, and if all the moments of X and Y are finite, it follows from (2.33) that all the corresponding moments of X and Y are the same:
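The construction above can be sketched numerically. In the snippet below (all names and the choice φ(x) = 1/(1 + x²) are ours, for illustration), we solve the Vandermonde system for ρ by plain Gaussian elimination and check that E[φ(X)] = Σ_k ρ_k·E[X^k] for a discrete X on {1, …, n}:

```python
# Illustrative sketch: build the interpolating polynomial rho with
# rho(j) = phi(j) for j = 1..n, then verify E[phi(X)] = sum_k rho_k E[X^k].
n = 4
phi = lambda x: 1.0 / (1.0 + x * x)   # any bounded continuous function
p = [0.1, 0.2, 0.3, 0.4]              # P[X = j] for j = 1..n (sums to 1)

# Vandermonde system V rho = (phi(1), ..., phi(n)), solved by Gaussian
# elimination with partial pivoting (no external libraries needed).
V = [[float(j ** k) for k in range(n)] for j in range(1, n + 1)]
b = [phi(j) for j in range(1, n + 1)]
for col in range(n):
    piv = max(range(col, n), key=lambda r: abs(V[r][col]))
    V[col], V[piv] = V[piv], V[col]
    b[col], b[piv] = b[piv], b[col]
    for r in range(col + 1, n):
        f = V[r][col] / V[col][col]
        for c in range(col, n):
            V[r][c] -= f * V[col][c]
        b[r] -= f * b[col]
rho = [0.0] * n
for r in range(n - 1, -1, -1):        # back substitution
    rho[r] = (b[r] - sum(V[r][c] * rho[c] for c in range(r + 1, n))) / V[r][r]

E_phi = sum(phi(j) * p[j - 1] for j in range(1, n + 1))
moments = [sum(j ** k * p[j - 1] for j in range(1, n + 1)) for k in range(n)]
E_rho = sum(rho[k] * moments[k] for k in range(n))
print(E_phi, E_rho)   # agree up to rounding, as the argument predicts
```

Since ρ agrees with φ on the entire support of X, E[ρ(X)] = E[φ(X)] exactly; the only discrepancy in the printout is floating-point rounding.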

Theorem 2.22: If the random variables X and Y are discrete and take with probability 1 only a finite number of values, then the distributions of X and Y are the same if and only if the moment-generating functions of X and Y coincide on an arbitrarily small open neighborhood of zero.

However, this result also holds without the conditions that X and Y are discrete and take only a finite number of values, and for random vectors as well, but the proof is complicated and is therefore omitted:

Theorem 2.23: If the moment-generating functions m_X(t) and m_Y(t) of the random vectors X and Y in R^k are defined and finite in an open neighborhood N_0(δ) = {x ∈ R^k : ‖x‖ < δ} of the origin of R^k, then the distributions of X and Y are the same if and only if m_X(t) = m_Y(t) for all t ∈ N_0(δ).

2.8.2. Characteristic Functions

The disadvantage of the moment-generating function is that it may not be finite in an arbitrarily small open neighborhood of zero. For example, if X has a standard Cauchy distribution, that is, X has density

f(x) = 1/(π·(1 + x²)), (2.36)

then

m(t) = ∫_{−∞}^{∞} exp(t·x)·f(x)dx = ∞ if t ≠ 0, = 1 if t = 0. (2.37)

There are many other distributions with the same property as (2.37) (see Chapter 4); hence, the moment-generating functions in these cases are of no use for comparing distributions.
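A quick numerical illustration (ours, not from the text): truncating the integral in (2.37) shows convergence to 1 for t = 0 but unbounded growth in the truncation point for t ≠ 0:

```python
import math

def truncated_mgf(t, M, steps=200_000):
    """Midpoint-rule approximation of integral of exp(t*x)/(pi*(1+x^2))
    over [-M, M], i.e., the MGF integral of the standard Cauchy density
    truncated at +/- M."""
    h = 2 * M / steps
    total = 0.0
    for i in range(steps):
        x = -M + (i + 0.5) * h
        total += math.exp(t * x) / (math.pi * (1 + x * x)) * h
    return total

print(truncated_mgf(0.0, 1000))  # ≈ 1: the density integrates to one
print(truncated_mgf(0.5, 50))    # already huge for t = 0.5
print(truncated_mgf(0.5, 100))   # larger still: the integral diverges
```

The exponential factor exp(t·x) eventually dominates the polynomial tail 1/(1 + x²) for any t ≠ 0, which is exactly why (2.37) is infinite off the origin.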

The solution to this problem is to replace t in (2.31) with i·t, where i = √−1. The resulting function φ(t) = m(i·t) is called the characteristic function of the random variable X: φ(t) = E[exp(i·t·X)], t ∈ R. More generally,

Definition 2.16: The characteristic function of a random vector X in R^k is defined by φ(t) = E[exp(i·tᵀX)], t ∈ R^k, where the argument t is nonrandom.

The characteristic function is bounded because exp(i·x) = cos(x) + i·sin(x), so that |exp(i·x)| = 1 (see Appendix III). Thus, the characteristic function in Definition 2.16 can be written as

φ(t) = E[cos(tᵀX)] + i·E[sin(tᵀX)], t ∈ R^k.

Note that by the dominated convergence theorem (Theorem 2.11), lim_{t→0} φ(t) = 1 = φ(0); hence, a characteristic function is always continuous in t = 0.

Replacing moment-generating functions with characteristic functions, we find that Theorem 2.23 now becomes

Theorem 2.24: Random variables or vectors have the same distribution if and only if their characteristic functions are identical.

The proof of this theorem is complicated and is therefore given in Appendix 2.A at the end of this chapter. The same applies to the following useful result, which is known as the inversion formula for characteristic functions:

Theorem 2.25: Let X be a random vector in R^k with characteristic function φ(t). If φ(t) is absolutely integrable (i.e., ∫_{R^k} |φ(t)|dt < ∞), then the distribution of X is absolutely continuous with joint density f(x) = (2π)^{−k}·∫_{R^k} exp(−i·tᵀx)·φ(t)dt.
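As a numerical sketch (ours), one can check the inversion formula on the standard normal distribution, whose characteristic function exp(−t²/2) is absolutely integrable; since it is real and even, the inversion integral reduces to a cosine integral:

```python
import math

def inverted_density(x, T=12.0, steps=20_000):
    """Inversion formula for the standard normal: with phi(t) = exp(-t^2/2)
    real and even, f(x) = (1/(2*pi)) * int exp(-i*t*x) phi(t) dt reduces to
    (1/pi) * int_0^inf cos(t*x) exp(-t^2/2) dt, truncated at T (the tail
    beyond T = 12 is negligible)."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint rule
        total += math.cos(t * x) * math.exp(-t * t / 2) * h
    return total / math.pi

for x in (0.0, 1.0, 2.0):
    exact = math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    print(x, inverted_density(x), exact)  # the two columns agree closely
```

The recovered values match the standard normal density exp(−x²/2)/√(2π) to high accuracy, as Theorem 2.25 predicts.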

2.9. Exercises

1. Prove that the collection D in the proof of Theorem 2.1 is a σ-algebra.

2. Prove Theorem 2.3.

3. Prove Theorem 2.4 for the max, sup, limsup, and lim cases.

4. Why is it true that if g is Borel measurable, then so are g^+ and g^− in (2.6)?

5. Prove Theorem 2.7.

6. Prove Theorem 2.8.

7. Let g(x) = x if x is rational and g(x) = —x if x is irrational. Prove that g(x) is Borel measurable.

8. Prove parts (a)-(f) of Theorem 2.9 for simple functions

g(x) = Σ_{i=1}^{n} a_i·I(x ∈ B_i), f(x) = Σ_{j=1}^{m} b_j·I(x ∈ C_j).

9. Why can you conclude from Exercise 8 that parts (a)-(f) of Theorem 2.9 hold for arbitrary, nonnegative, Borel-measurable functions?

10. Why can you conclude from Exercise 9 that Theorem 2.9 holds for arbitrary Borel-measurable functions provided that the integrals involved are defined?

11. From which result on probability measures does (2.11) follow?

12. Determine for each inequality in (2.12) which part of Theorem 2.9 has been used.

13. Why do we need the condition in Theorem 2.11 that ∫ g(x)dμ(x) < ∞?

14. Note that we cannot generalize Theorem 2.5 to random variables because something is missing that prevents us from defining a continuous mapping X: Ω → R. What is missing?

15. Verify (2.16) and complete the proof of Theorem 2.18.

16. Prove equality (2.2).

17. Show that var(X) = E(X²) − (E(X))², cov(X, Y) = E(X·Y) − E(X)·E(Y), and −1 ≤ corr(X, Y) ≤ 1. Hint: Derive the latter result from var(Y − λ·X) ≥ 0 for all λ.

18. Prove (2.17).

19. Which parts of Theorem 2.15 have been used in (2.18)?

20. How does (2.20) follow from (2.19)?

21. Why does it follow from (2.28) that (2.29) holds for simple random variables?

22. Prove Theorem 2.19.

23. Complete the proof of Theorem 2.20 for the case p = q = 1.

24. Let X = U_0·(U_1 − 0.5) and Y = U_0·(U_2 − 0.5), where U_0, U_1, and U_2 are independent and uniformly [0, 1] distributed. Show that E[X²·Y²] ≠ (E[X²])·(E[Y²]).

25. Prove that if (2.29) holds for simple random variables, it holds for all random variables. Hint: Use the fact that convex and concave functions are continuous (see Appendix II).

26. Derive the moment-generating function of the binomial (n, p) distribution.

27. Use the results in Exercise 26 to derive the expectation and variance of the binomial (n, p) distribution.

28. Show that the moment-generating function of the binomial (n, p) distribution converges pointwise in t to the moment-generating function of the Poisson (λ) distribution if n → ∞ and p ↓ 0 such that n·p → λ.

29. Derive the characteristic function of the uniform [0, 1] distribution. Is the inversion formula for characteristic functions applicable in this case?

30. If the random variable X has characteristic function exp(i • t), what is the dis­tribution of X?

31. Show that the characteristic function of a random variable X is real-valued if and only if the distribution of X is symmetric (i. e., X and —X have the same distribution).

32. Use the inversion formula for characteristic functions to show that φ(t) = exp(−|t|) is the characteristic function of the standard Cauchy distribution [see (2.36) for the density involved]. Hints: Show first, using Exercise 31 and the inversion formula, that

f(x) = (1/π)·∫_0^{∞} cos(t·x)·exp(−t)dt,

and then use integration by parts.

APPENDIX

2.A. Uniqueness of Characteristic Functions

To understand characteristic functions, you need to understand the basics of complex analysis, which is provided in Appendix III. Therefore, it is recommended that Appendix III be read first.

In the univariate case, Theorem 2.24 is a straightforward corollary of the following link between a probability measure and its characteristic function.

Theorem 2.A.1: Let μ be a probability measure on the Borel sets in R with characteristic function φ, and let a < b be continuity points of μ: μ({a}) = μ({b}) = 0. Then

μ((a, b]) = lim_{T→∞} (1/(2π))·∫_{−T}^{T} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt. (2.38)

Proof: Using the definition of characteristic function, we can write

(1/(2π))·∫_{−T}^{T} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt
= (1/(2π))·∫_{−T}^{T} ∫ [(exp(i·t·(x − a)) − exp(i·t·(x − b)))/(i·t)] dμ(x)dt. (2.39)

Next, observe that

|(exp(−i·t·a) − exp(−i·t·b))/(i·t)| = √(2·(1 − cos(t·(b − a)))/t²) ≤ b − a.

Therefore, it follows from the bounded convergence theorem that

(1/(2π))·∫_{−T}^{T} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt
= (1/(2π))·∫ [∫_{−T}^{T} (exp(i·t·(x − a)) − exp(i·t·(x − b)))/(i·t) dt] dμ(x). (2.40)

 

The integral between square brackets can be written as

 

∫_{−T}^{T} [(exp(i·t·(x − a)) − exp(i·t·(x − b)))/(i·t)] dt
= 2·∫_0^{T} (sin(t·(x − a))/t) dt − 2·∫_0^{T} (sin(t·(x − b))/t) dt
= 2·sgn(x − a)·∫_0^{T·|x−a|} (sin(t)/t) dt − 2·sgn(x − b)·∫_0^{T·|x−b|} (sin(t)/t) dt, (2.41)

where sgn(x) = 1 if x > 0, sgn(0) = 0, and sgn(x) = −1 if x < 0. The last two integrals in (2.41) are of the form

∫_0^{x} (sin(t)/t) dt = ∫_0^{x} ∫_0^{∞} sin(t)·exp(−t·u) du dt = ∫_0^{∞} ∫_0^{x} sin(t)·exp(−t·u) dt du
= ∫_0^{∞} (1/(1 + u²)) du − ∫_0^{∞} ((cos(x) + u·sin(x))·exp(−x·u)/(1 + u²)) du, (2.42)

where the last equality follows from integration by parts:

∫_0^{x} sin(t)·exp(−t·u)dt = −∫_0^{x} (d cos(t)/dt)·exp(−t·u)dt
= −cos(t)·exp(−t·u)|_0^{x} − u·∫_0^{x} cos(t)·exp(−t·u)dt
= 1 − cos(x)·exp(−x·u) − u·sin(x)·exp(−x·u) − u²·∫_0^{x} sin(t)·exp(−t·u)dt;

hence,

∫_0^{x} sin(t)·exp(−t·u)dt = (1 − cos(x)·exp(−x·u) − u·sin(x)·exp(−x·u))/(1 + u²).

Clearly, the second integral on the right-hand side of (2.42) is bounded in x > 0 and converges to zero as x → ∞. The first integral on the right-hand side of (2.42) is

∫_0^{∞} (1/(1 + u²)) du = ∫_0^{∞} d arctan(u) = arctan(∞) = π/2.

Thus, the integral (2.42) is bounded (hence so is (2.41)), and

lim_{T→∞} ∫_{−T}^{T} [(exp(i·t·(x − a)) − exp(i·t·(x − b)))/(i·t)] dt = π·[sgn(x − a) − sgn(x − b)]. (2.43)

It follows now from (2.39), (2.40), (2.43), and the dominated convergence theorem that

lim_{T→∞} (1/(2π))·∫_{−T}^{T} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt
= (1/2)·∫ [sgn(x − a) − sgn(x − b)] dμ(x) = μ((a, b)) + (1/2)·μ({a}) + (1/2)·μ({b}). (2.44)

The last equality in (2.44) follows from the fact that

sgn(x − a) − sgn(x − b) = 0 if x < a or x > b, = 1 if x = a or x = b, = 2 if a < x < b.

The result (2.38) now follows from (2.44) and the condition μ({a}) = μ({b}) = 0. Q.E.D.
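The Dirichlet integral ∫_0^∞ sin(t)/t dt = π/2 is the engine of the proof above; the snippet below (ours, for illustration) checks it numerically with the midpoint rule:

```python
import math

def si(x, steps=100_000):
    """Midpoint-rule approximation of integral_0^x sin(t)/t dt, which
    should approach pi/2 as x grows, as derived in the proof."""
    h = x / steps
    return sum(math.sin((i + 0.5) * h) / ((i + 0.5) * h) * h
               for i in range(steps))

print(si(50.0), math.pi / 2)   # close, off by roughly cos(x)/x
print(si(500.0), math.pi / 2)  # closer still
```

The convergence is slow (the error decays like 1/x because of the oscillating tail), which is consistent with taking the limit T → ∞ only at the end of the proof.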

Note that (2.38) also reads as

F(b) − F(a) = lim_{T→∞} (1/(2π))·∫_{−T}^{T} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt, (2.45)

where F is the distribution function corresponding to the probability measure μ.

Next, suppose that φ is absolutely integrable: ∫_{−∞}^{∞} |φ(t)|dt < ∞. Then (2.45) can be written as

F(b) − F(a) = (1/(2π))·∫_{−∞}^{∞} [(exp(−i·t·a) − exp(−i·t·b))/(i·t)]·φ(t)dt,

and it follows from the dominated convergence theorem that

F′(a) = lim_{b↓a} (F(b) − F(a))/(b − a)
= (1/(2π))·∫_{−∞}^{∞} lim_{b↓a} [(1 − exp(−i·t·(b − a)))/(i·t·(b − a))]·exp(−i·t·a)·φ(t)dt
= (1/(2π))·∫_{−∞}^{∞} exp(−i·t·a)·φ(t)dt.

This proves Theorem 2.25 for the univariate case. In the multivariate case Theorem 2.A.1 becomes

Theorem 2.A.2: Let μ be a probability measure on the Borel sets in R^k with characteristic function φ. Let B = ×_{j=1}^{k}(a_j, b_j], where a_j < b_j for j = 1, 2, …, k, and let ∂B be the border of B, that is, ∂B = {×_{j=1}^{k}[a_j, b_j]}\{×_{j=1}^{k}(a_j, b_j)}. If μ(∂B) = 0, then

μ(B) = lim_{T_1→∞} ⋯ lim_{T_k→∞} (1/(2π))^k·∫_{−T_1}^{T_1} ⋯ ∫_{−T_k}^{T_k} [∏_{j=1}^{k} (exp(−i·t_j·a_j) − exp(−i·t_j·b_j))/(i·t_j)]·φ(t)dt, (2.46)

where t = (t_1, …, t_k)ᵀ.

This result proves Theorem 2.24 for the general case.
Moreover, if ∫_{R^k} |φ(t)|dt < ∞, (2.46) becomes

μ(B) = (2π)^{−k}·∫_{R^k} [∏_{j=1}^{k} (exp(−i·t_j·a_j) − exp(−i·t_j·b_j))/(i·t_j)]·φ(t)dt,

and by the dominated convergence theorem we may take partial derivatives inside the integral:

∂^k μ(B)/(∂a_1 ⋯ ∂a_k) = (2π)^{−k}·∫_{R^k} exp(−i·tᵀa)·φ(t)dt, (2.47)

where a = (a_1, …, a_k)ᵀ. The latter is just the density corresponding to μ in point a. Thus, (2.47) proves Theorem 2.25.
