# Moment-Generating Functions and Characteristic Functions

2.8.1. Moment-Generating Functions

The moment-generating function of a bounded random variable $X$ (i.e., $P[|X| \le M] = 1$ for some positive real number $M < \infty$) is defined as the function

$$m(t) = E[\exp(t \cdot X)], \quad t \in \mathbb{R}, \tag{2.31}$$

where the argument t is nonrandom. More generally:

Definition 2.15: The moment-generating function of a random vector $X$ in $\mathbb{R}^k$ is defined by $m(t) = E[\exp(t^{\mathrm{T}} X)]$ for $t \in T \subset \mathbb{R}^k$, where $T$ is the set of nonrandom vectors $t$ for which the moment-generating function exists and is finite.

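For bounded random variables the moment-generating function exists and is finite for all values of $t$. In particular, in the univariate bounded case we can write

$$m(t) = E[\exp(t \cdot X)] = E\left[\sum_{k=0}^{\infty} \frac{t^k X^k}{k!}\right] = \sum_{k=0}^{\infty} \frac{t^k E[X^k]}{k!}.$$

It is easy to verify that the $j$th derivative of $m(t)$ is

$$m^{(j)}(t) = \frac{d^j m(t)}{(dt)^j} = \sum_{k=j}^{\infty} \frac{t^{k-j} E[X^k]}{(k-j)!};$$

hence, the $j$th moment of $X$ is

$$m^{(j)}(0) = E[X^j]. \tag{2.33}$$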

This is the reason for calling m(t) the “moment-generating function.”
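The relation between derivatives at zero and moments can be checked numerically. The following is a minimal sketch, assuming a hypothetical bounded random variable that is uniform on {1, 2, 3} (not taken from the text) and using central finite differences as a stand-in for exact differentiation; the helper names `mgf`, `moment`, and `derivative_at_zero` are illustrative only.

```python
import math

# Hypothetical bounded random variable: X uniform on {1, 2, 3}.
values = [1, 2, 3]
probs = [1 / 3, 1 / 3, 1 / 3]

def mgf(t):
    """m(t) = E[exp(t * X)] for the discrete distribution above."""
    return sum(p * math.exp(t * x) for x, p in zip(values, probs))

def moment(j):
    """Direct computation of E[X^j]."""
    return sum(p * x ** j for x, p in zip(values, probs))

def derivative_at_zero(f, j, h=1e-2):
    """Central finite-difference approximation of the j-th derivative of f at 0."""
    return sum(
        (-1) ** i * math.comb(j, i) * f((j / 2 - i) * h) for i in range(j + 1)
    ) / h ** j

# The two columns should agree up to the finite-difference error.
for j in (1, 2, 3):
    print(j, derivative_at_zero(mgf, j), moment(j))
```

The finite-difference step `h` is a compromise between truncation and rounding error; the agreement is approximate, not exact.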

Although the moment-generating function is a handy tool for computing moments of a distribution, its actual importance arises because the shape of the moment-generating function in an open neighborhood of zero uniquely characterizes the distribution of a random variable. In order to show this, we need the following result.

Theorem 2.21: The distributions of two random vectors $X$ and $Y$ in $\mathbb{R}^k$ are the same if and only if for all bounded continuous functions $\varphi$ on $\mathbb{R}^k$, $E[\varphi(X)] = E[\varphi(Y)]$.

Proof: I will only prove this theorem for the case in which $X$ and $Y$ are random variables: $k = 1$. Note that the “only if” case follows from the definition of expectation.

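Let $F(x)$ be the distribution function of $X$ and let $G(y)$ be the distribution function of $Y$. Let $a < b$ be arbitrary continuity points of $F(x)$ and $G(y)$ and define

$$\varphi(x) = \begin{cases} 1 & \text{if } x < a, \\ \dfrac{b - x}{b - a} & \text{if } a \le x < b, \\ 0 & \text{if } x \ge b. \end{cases} \tag{2.34}$$

Clearly, (2.34) is a bounded, continuous function and therefore, by assumption, we have $E[\varphi(X)] = E[\varphi(Y)]$. Now observe from (2.34) that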

$$E[\varphi(X)] = \int \varphi(x)\,dF(x) = F(a) + \int_a^b \frac{b - x}{b - a}\,dF(x) \ge F(a)$$

and

$$E[\varphi(X)] = \int \varphi(x)\,dF(x) = F(a) + \int_a^b \frac{b - x}{b - a}\,dF(x) \le F(b).$$

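Similarly,

$$E[\varphi(Y)] = \int \varphi(y)\,dG(y) = G(a) + \int_a^b \frac{b - y}{b - a}\,dG(y) \ge G(a)$$

and

$$E[\varphi(Y)] = \int \varphi(y)\,dG(y) = G(a) + \int_a^b \frac{b - y}{b - a}\,dG(y) \le G(b).$$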

If we combine these inequalities with $E[\varphi(X)] = E[\varphi(Y)]$, it follows that for arbitrary continuity points $a < b$ of $F(x)$ and $G(y)$,

$$G(a) \le F(b), \qquad F(a) \le G(b). \tag{2.35}$$

If we let $b \downarrow a$, it follows from (2.35) that $F(a) = G(a)$. Q.E.D.
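The sandwich used in this proof, $F(a) \le E[\varphi(X)] \le F(b)$, can be illustrated with exact arithmetic. The sketch below assumes a hypothetical example (a fair six-sided die with $a = 2.5$ and $b = 3.5$) and a test function that equals 1 to the left of $a$, 0 to the right of $b$, and is linear in between; none of these specific choices come from the text.

```python
from fractions import Fraction

# Hypothetical example: X is a fair six-sided die, with a = 2.5 and b = 3.5.
a, b = Fraction(5, 2), Fraction(7, 2)
support = range(1, 7)
prob = Fraction(1, 6)

def phi(x):
    """Piecewise-linear test function: 1 left of a, 0 right of b, linear in between."""
    if x < a:
        return Fraction(1)
    if x < b:
        return (b - x) / (b - a)
    return Fraction(0)

def cdf(c):
    """F(c) = P[X <= c]."""
    return sum(prob for x in support if x <= c)

expected_phi = sum(prob * phi(x) for x in support)
print(cdf(a), expected_phi, cdf(b))  # prints 1/3 5/12 1/2
```

Shrinking the interval $(a, b]$ around a continuity point squeezes the two bounds together, which is the step the proof takes when it lets $b \downarrow a$.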

Now assume that the random variables $X$ and $Y$ are discrete, and take with probability 1 the values $x_1, \ldots, x_n$. Without loss of generality we may assume that $x_j = j$, that is,

$$P[X \in \{1, 2, \ldots, n\}] = P[Y \in \{1, 2, \ldots, n\}] = 1.$$

Suppose that all the moments of $X$ and $Y$ match: For $k = 1, 2, 3, \ldots$, $E[X^k] = E[Y^k]$. I will show that then, for an arbitrary bounded continuous function $\varphi$ on $\mathbb{R}$, $E[\varphi(X)] = E[\varphi(Y)]$.

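Denoting $p_j = P[X = j]$, $q_j = P[Y = j]$, we can write $E[\varphi(X)] = \sum_{j=1}^{n} \varphi(j) p_j$, $E[\varphi(Y)] = \sum_{j=1}^{n} \varphi(j) q_j$. It is always possible to construct a polynomial $\rho(t) = \sum_{k=0}^{n-1} \rho_k t^k$ such that $\varphi(j) = \rho(j)$ for $j = 1, \ldots, n$ by solving

$$\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2^2 & \cdots & 2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & n & n^2 & \cdots & n^{n-1} \end{pmatrix} \begin{pmatrix} \rho_0 \\ \rho_1 \\ \vdots \\ \rho_{n-1} \end{pmatrix} = \begin{pmatrix} \varphi(1) \\ \varphi(2) \\ \vdots \\ \varphi(n) \end{pmatrix}.$$

Then

$$E[\varphi(X)] = \sum_{j=1}^{n} \sum_{k=0}^{n-1} \rho_k j^k p_j = \sum_{k=0}^{n-1} \rho_k \sum_{j=1}^{n} j^k p_j = \sum_{k=0}^{n-1} \rho_k E[X^k]$$

and, similarly, $E[\varphi(Y)] = \sum_{k=0}^{n-1} \rho_k E[Y^k]$. Hence, it follows from Theorem 2.21 that if all the corresponding moments of $X$ and $Y$ are the same, then the distributions of $X$ and $Y$ are the same. Thus, if the moment-generating functions of $X$ and $Y$ coincide on an open neighborhood of zero, and if all the moments of $X$ and $Y$ are finite, it follows from (2.33) that all the corresponding moments of $X$ and $Y$ are the same: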

Theorem 2.22: If the random variables $X$ and $Y$ are discrete and take with probability 1 only a finite number of values, then the distributions of $X$ and $Y$ are the same if and only if the moment-generating functions of $X$ and $Y$ coincide on an arbitrary, small, open neighborhood of zero.
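The polynomial argument leading up to Theorem 2.22 can be verified in exact arithmetic. The sketch below assumes a hypothetical distribution on {1, 2, 3, 4} and the test function $1/(1 + x^2)$, neither of which comes from the text. It solves the Vandermonde system for the polynomial coefficients and confirms that the expectation of the test function equals the corresponding linear combination of the moments $E[X^0], \ldots, E[X^{n-1}]$. The helper `solve` is a plain Gauss-Jordan routine written for this illustration.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

n = 4
# Hypothetical probabilities p_j = P[X = j] for j = 1, ..., n.
p = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

def phi(x):
    """Hypothetical bounded continuous test function."""
    return Fraction(1, 1 + x * x)

# Vandermonde system: sum_k rho_k * j**k = phi(j) for j = 1, ..., n.
vandermonde = [[Fraction(j) ** k for k in range(n)] for j in range(1, n + 1)]
rho = solve(vandermonde, [phi(j) for j in range(1, n + 1)])

moments = [sum(pj * Fraction(j) ** k for j, pj in enumerate(p, start=1)) for k in range(n)]
lhs = sum(pj * phi(j) for j, pj in enumerate(p, start=1))  # E[phi(X)]
rhs = sum(r * m for r, m in zip(rho, moments))             # sum_k rho_k * E[X^k]
print(lhs == rhs)  # prints True
```

Because only the values of the test function on the support matter, any two distributions on {1, ..., n} with the same first $n - 1$ moments give the same expectation for every such function.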

However, this result also applies without the conditions that X and Y are discrete and take only a finite number of values, and for random vectors as well, but the proof is complicated and is therefore omitted:

Theorem 2.23: If the moment-generating functions $m_X(t)$ and $m_Y(t)$ of the random vectors $X$ and $Y$ in $\mathbb{R}^k$ are defined and finite in an open neighborhood $N_0(\delta) = \{x \in \mathbb{R}^k : \|x\| < \delta\}$ of the origin of $\mathbb{R}^k$, then the distributions of $X$ and $Y$ are the same if and only if $m_X(t) = m_Y(t)$ for all $t \in N_0(\delta)$.

2.8.2. Characteristic Functions

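The disadvantage of the moment-generating function is that it may not be finite in an arbitrarily small, open neighborhood of zero. For example, if $X$ has a standard Cauchy distribution, that is, $X$ has density

$$f(x) = \frac{1}{\pi(1 + x^2)}, \tag{2.36}$$

then

$$m(t) = \int_{-\infty}^{\infty} \exp(t \cdot x) f(x)\,dx = \begin{cases} \infty & \text{if } t \ne 0, \\ 1 & \text{if } t = 0. \end{cases} \tag{2.37}$$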

There are many other distributions with the same property as (2.37) (see Chapter 4); hence, the moment-generating functions in these cases are of no use for comparing distributions.
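The divergence of the Cauchy moment-generating function can be made visible numerically. The sketch below approximates the integral of $\exp(t \cdot x) f(x)$ over a truncated range $[-A, A]$ with a simple midpoint rule; the truncation bounds and step count are arbitrary choices made for illustration. For $t = 0.5$ the truncated values grow without bound as $A$ increases, whereas for $t = 0$ they approach 1.

```python
import math

def cauchy_density(x):
    """Standard Cauchy density f(x) = 1 / (pi * (1 + x^2))."""
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_mgf(t, bound, steps=50_000):
    """Midpoint-rule approximation of the integral of exp(t*x) f(x) over [-bound, bound]."""
    h = 2.0 * bound / steps
    total = 0.0
    for i in range(steps):
        x = -bound + (i + 0.5) * h
        total += math.exp(t * x) * cauchy_density(x)
    return total * h

# Second column explodes with the bound; third column stays below 1.
for bound in (10, 20, 40, 80):
    print(bound, truncated_mgf(0.5, bound), truncated_mgf(0.0, bound))
```

A numerical experiment of this kind only suggests divergence; the statement itself follows from the fact that the exponential factor grows faster than the polynomial tail of the density decays.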

The solution to this problem is to replace $t$ in (2.31) with $i \cdot t$, where $i = \sqrt{-1}$. The resulting function $\varphi(t) = m(i \cdot t)$ is called the characteristic function of the random variable $X$: $\varphi(t) = E[\exp(i \cdot t \cdot X)]$, $t \in \mathbb{R}$. More generally,

Definition 2.16: The characteristic function of a random vector $X$ in $\mathbb{R}^k$ is defined by $\varphi(t) = E[\exp(i \cdot t^{\mathrm{T}} X)]$, $t \in \mathbb{R}^k$, where the argument $t$ is nonrandom.

The characteristic function is bounded because $\exp(i \cdot x) = \cos(x) + i \cdot \sin(x)$ has modulus 1 for every real $x$. See Appendix III. Thus, the characteristic function in Definition 2.16 can be written as

$$\varphi(t) = E[\cos(t^{\mathrm{T}} X)] + i \cdot E[\sin(t^{\mathrm{T}} X)], \quad t \in \mathbb{R}^k.$$

Note that by the dominated convergence theorem (Theorem 2.11), $\lim_{t \to 0} \varphi(t) = 1 = \varphi(0)$; hence, a characteristic function is always continuous at $t = 0$.
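These properties are easy to check for a concrete case. The sketch below assumes a hypothetical three-point distribution (not from the text), evaluates its characteristic function both as $E[\exp(i \cdot t \cdot X)]$ and in the cosine/sine form, and checks that the modulus never exceeds 1 on a grid and that the value at $t = 0$ is 1.

```python
import cmath
import math

# Hypothetical discrete random variable used only for illustration.
values = [-1.0, 0.5, 3.0]
probs = [0.2, 0.5, 0.3]

def char_fn(t):
    """phi(t) = E[exp(i * t * X)]."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

def char_fn_trig(t):
    """Same function written as E[cos(tX)] + i * E[sin(tX)]."""
    real = sum(p * math.cos(t * x) for x, p in zip(values, probs))
    imag = sum(p * math.sin(t * x) for x, p in zip(values, probs))
    return complex(real, imag)

grid = [k / 10 for k in range(-100, 101)]
print(abs(char_fn(0.0) - 1.0) < 1e-12)                       # value at t = 0 is 1
print(max(abs(char_fn(t)) for t in grid) <= 1.0 + 1e-12)     # modulus bounded by 1
print(abs(char_fn(1.3) - char_fn_trig(1.3)) < 1e-12)         # both forms agree
```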

Replacing moment-generating functions with characteristic functions, we find that Theorem 2.23 now becomes

Theorem 2.24: Random variables or vectors have the same distribution if and only if their characteristic functions are identical.

The proof of this theorem is complicated and is therefore given in Appendix 2.A at the end of this chapter. The same applies to the following useful result, which is known as the inversion formula for characteristic functions:

Theorem 2.25: Let $X$ be a random vector in $\mathbb{R}^k$ with characteristic function $\varphi(t)$. If $\varphi(t)$ is absolutely integrable (i.e., $\int_{\mathbb{R}^k} |\varphi(t)|\,dt < \infty$), then the distribution of $X$ is absolutely continuous with joint density $f(x) = (2\pi)^{-k} \int_{\mathbb{R}^k} \exp(-i \cdot t^{\mathrm{T}} x)\,\varphi(t)\,dt$.
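The inversion formula can be tried out numerically in the univariate case. The sketch below takes the function $\exp(-|t|)$, which is absolutely integrable and is identified in the exercises of this chapter as the characteristic function of the standard Cauchy distribution, and compares a midpoint-rule approximation of the inversion integral with the Cauchy density $1/(\pi(1 + x^2))$. The truncation bound and step count are assumptions chosen for illustration.

```python
import cmath
import math

def char_fn(t):
    """Characteristic function exp(-|t|) of the standard Cauchy distribution."""
    return math.exp(-abs(t))

def inverted_density(x, bound=40.0, steps=80_000):
    """Midpoint-rule approximation of (1/(2*pi)) * integral of exp(-i*t*x) * phi(t) dt."""
    h = 2.0 * bound / steps
    total = 0.0 + 0.0j
    for i in range(steps):
        t = -bound + (i + 0.5) * h
        total += cmath.exp(-1j * t * x) * char_fn(t)
    return (total * h / (2.0 * math.pi)).real

def cauchy_density(x):
    """Standard Cauchy density, used as the reference value."""
    return 1.0 / (math.pi * (1.0 + x * x))

# The two columns should agree up to the quadrature error.
for x in (0.0, 1.0, 2.5):
    print(x, inverted_density(x), cauchy_density(x))
```

The agreement is a numerical check, not a proof; the analytic derivation is the subject of the corresponding exercise.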

2.9. Exercises

1. Prove that the collection D in the proof of Theorem 2.1 is a $\sigma$-algebra.

2. Prove Theorem 2.3.

3. Prove Theorem 2.4 for the max, sup, limsup, and lim cases.

4. Why is it true that if $g$ is Borel measurable then so are $g^+$ and $g^-$ in (2.6)?

5. Prove Theorem 2.7.

6. Prove Theorem 2.8.

7. Let $g(x) = x$ if $x$ is rational and $g(x) = -x$ if $x$ is irrational. Prove that $g(x)$ is Borel measurable.

8. Prove parts (a)-(f) of Theorem 2.9 for simple functions

$$g(x) = \sum_{i=1}^{n} a_i I(x \in B_i), \qquad f(x) = \sum_{j=1}^{m} b_j I(x \in C_j).$$

9. Why can you conclude from Exercise 8 that parts (a)-(f) of Theorem 2.9 hold for arbitrary, nonnegative, Borel-measurable functions?

10. Why can you conclude from Exercise 9 that Theorem 2.9 holds for arbitrary Borel-measurable functions provided that the integrals involved are defined?

11. From which result on probability measures does (2.11) follow?

12. Determine for each inequality in (2.12) which part of Theorem 2.9 has been used.

13. Why do we need the condition in Theorem 2.11 that $\int g(x)\,d\mu(x) < \infty$?

14. Note that we cannot generalize Theorem 2.5 to random variables because something missing prevents us from defining a continuous mapping $X: \Omega \to \mathbb{R}$. What is missing?

15. Verify (2.16) and complete the proof of Theorem 2.18.

16. Prove equality (2.2).

17. Show that $\mathrm{var}(X) = E(X^2) - (E(X))^2$, $\mathrm{cov}(X, Y) = E(X \cdot Y) - (E(X)) \cdot (E(Y))$, and $-1 \le \mathrm{corr}(X, Y) \le 1$. Hint: Derive the latter result from $\mathrm{var}(Y - \lambda X) \ge 0$ for all $\lambda$.

18. Prove (2.17).

19. Which parts of Theorem 2.15 have been used in (2.18)?

20. How does (2.20) follow from (2.19)?

21. Why does it follow from (2.28) that (2.29) holds for simple random variables?

22. Prove Theorem 2.19.

23. Complete the proof of Theorem 2.20 for the case p = q = 1.

24. Let $X = U_0(U_1 - 0.5)$ and $Y = U_0(U_2 - 0.5)$, where $U_0$, $U_1$, and $U_2$ are independent and uniformly $[0, 1]$ distributed. Show that $E[X^2 \cdot Y^2] \ne (E[X^2])(E[Y^2])$.

25. Prove that if (2.29) holds for simple random variables, it holds for all random variables. Hint: Use the fact that convex and concave functions are continuous (see Appendix II).

26. Derive the moment-generating function of the binomial (n, p) distribution.

27. Use the results in Exercise 26 to derive the expectation and variance of the binomial (n, p) distribution.

28. Show that the moment-generating function of the binomial $(n, p)$ distribution converges pointwise in $t$ to the moment-generating function of the Poisson $(\lambda)$ distribution if $n \to \infty$ and $p \downarrow 0$ such that $n \cdot p \to \lambda$.

29. Derive the characteristic function of the uniform [0, 1] distribution. Is the inversion formula for characteristic functions applicable in this case?

30. If the random variable $X$ has characteristic function $\exp(i \cdot t)$, what is the distribution of $X$?

31. Show that the characteristic function of a random variable $X$ is real-valued if and only if the distribution of $X$ is symmetric (i.e., $X$ and $-X$ have the same distribution).

32. Use the inversion formula for characteristic functions to show that $\varphi(t) = \exp(-|t|)$ is the characteristic function of the standard Cauchy distribution [see (2.36) for the density involved]. Hints: Show first, using Exercise 31 and the inversion formula, that

$$f(x) = \pi^{-1} \int_0^{\infty} \cos(t \cdot x) \exp(-t)\,dt,$$

and then use integration by parts.

APPENDIX

2.A. Uniqueness of Characteristic Functions

To understand characteristic functions, you need to understand the basics of complex analysis, which are provided in Appendix III. Therefore, it is recommended that Appendix III be read first.

In the univariate case, Theorem 2.24 is a straightforward corollary of the following link between a probability measure and its characteristic function.

Theorem 2.A.1: Let $\mu$ be a probability measure on the Borel sets in $\mathbb{R}$ with characteristic function $\varphi$, and let $a < b$ be continuity points of $\mu$: $\mu(\{a\}) = \mu(\{b\}) = 0$. Then

$$\mu((a, b]) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt. \tag{2.38}$$

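Proof: Using the definition of characteristic function, we can write

$$\begin{aligned} \int_{-T}^{T} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt &= \int_{-T}^{T} \int_{-\infty}^{\infty} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,d\mu(x)\,dt \\ &= \int_{-T}^{T} \lim_{M \to \infty} \int_{-M}^{M} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,d\mu(x)\,dt. \end{aligned} \tag{2.39}$$

Next, observe that

$$\begin{aligned} \left| \int_{-M}^{M} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,d\mu(x) \right| &\le \left| \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t} \right| \mu([-M, M]) \\ &\le \frac{|\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)|}{|t|} = \sqrt{\frac{2(1 - \cos(t \cdot (b - a)))}{t^2}} \le b - a. \end{aligned}$$

Therefore, it follows from the bounded convergence theorem that

$$\begin{aligned} \int_{-T}^{T} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt &= \lim_{M \to \infty} \int_{-T}^{T} \int_{-M}^{M} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,d\mu(x)\,dt \\ &= \lim_{M \to \infty} \int_{-M}^{M} \int_{-T}^{T} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,dt\,d\mu(x) \\ &= \int_{-\infty}^{\infty} \left[ \int_{-T}^{T} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,dt \right] d\mu(x). \end{aligned} \tag{2.40}$$

The integral between square brackets can be written as

$$\begin{aligned} \int_{-T}^{T} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t(x - b))}{i \cdot t}\,dt &= \int_{-T}^{T} \frac{\exp(i \cdot t(x - a)) - 1}{i \cdot t}\,dt - \int_{-T}^{T} \frac{\exp(i \cdot t(x - b)) - 1}{i \cdot t}\,dt \\ &= \int_{-T}^{T} \frac{\sin(t(x - a))}{t}\,dt - \int_{-T}^{T} \frac{\sin(t(x - b))}{t}\,dt \\ &= 2 \int_{0}^{T(x - a)} \frac{\sin(t)}{t}\,dt - 2 \int_{0}^{T(x - b)} \frac{\sin(t)}{t}\,dt \\ &= 2\,\mathrm{sgn}(x - a) \int_{0}^{T|x - a|} \frac{\sin(t)}{t}\,dt - 2\,\mathrm{sgn}(x - b) \int_{0}^{T|x - b|} \frac{\sin(t)}{t}\,dt, \end{aligned} \tag{2.41}$$

where $\mathrm{sgn}(x) = 1$ if $x > 0$, $\mathrm{sgn}(0) = 0$, and $\mathrm{sgn}(x) = -1$ if $x < 0$. The last two integrals in (2.41) are of the form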

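$$\begin{aligned} \int_{0}^{x} \frac{\sin(t)}{t}\,dt &= \int_{0}^{x} \sin(t) \int_{0}^{\infty} \exp(-t \cdot u)\,du\,dt = \int_{0}^{\infty} \int_{0}^{x} \sin(t) \exp(-t \cdot u)\,dt\,du \\ &= \int_{0}^{\infty} \frac{du}{1 + u^2} - \int_{0}^{\infty} \frac{[\cos(x) + u \cdot \sin(x)] \exp(-x \cdot u)}{1 + u^2}\,du, \end{aligned} \tag{2.42}$$

where the last equality follows from integration by parts:

$$\begin{aligned} \int_{0}^{x} \sin(t) \exp(-t \cdot u)\,dt &= -\int_{0}^{x} \frac{d\cos(t)}{dt} \exp(-t \cdot u)\,dt \\ &= -\cos(t) \exp(-t \cdot u) \Big|_{0}^{x} - u \int_{0}^{x} \cos(t) \exp(-t \cdot u)\,dt \\ &= 1 - \cos(x) \exp(-x \cdot u) - u \int_{0}^{x} \frac{d\sin(t)}{dt} \exp(-t \cdot u)\,dt \\ &= 1 - \cos(x) \exp(-x \cdot u) - u \cdot \sin(x) \exp(-x \cdot u) - u^2 \int_{0}^{x} \sin(t) \exp(-t \cdot u)\,dt. \end{aligned}$$

Clearly, the second integral at the right-hand side of (2.42) is bounded in $x > 0$ and converges to zero as $x \to \infty$. The first integral at the right-hand side of (2.42) is

$$\int_{0}^{\infty} \frac{du}{1 + u^2} = \int_{0}^{\infty} d\arctan(u) = \arctan(\infty) = \pi/2.$$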

Thus, the integral (2.42) is bounded (hence so is (2.41)), and

$$\lim_{T \to \infty} \int_{-T}^{T} \frac{\exp(i \cdot t(x - a)) - \exp(i \cdot t \cdot (x - b))}{i \cdot t}\,dt = \pi\,[\mathrm{sgn}(x - a) - \mathrm{sgn}(x - b)]. \tag{2.43}$$
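The two facts about the integral of $\sin(t)/t$ used here, boundedness in $x$ and convergence to $\pi/2$, can be observed numerically. The sketch below uses a plain midpoint rule with an arbitrarily chosen resolution; the function name `sine_integral` is illustrative.

```python
import math

def sine_integral(x, steps_per_unit=200):
    """Midpoint-rule approximation of the integral of sin(t)/t over [0, x]."""
    steps = max(1, int(x * steps_per_unit))
    h = x / steps
    # Midpoints are strictly positive, so the division by t is always defined.
    return sum(math.sin((i + 0.5) * h) / ((i + 0.5) * h) for i in range(steps)) * h

# The values stay bounded and drift toward pi/2 as x grows.
for x in (1, 10, 100, 1000):
    print(x, sine_integral(x), math.pi / 2)
```

The convergence is slow and oscillatory, which is consistent with the correction term in (2.42) decaying only like $1/x$.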

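It follows now from (2.39), (2.40), (2.43), and the dominated convergence theorem that

$$\begin{aligned} \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt &= \frac{1}{2} \int [\mathrm{sgn}(x - a) - \mathrm{sgn}(x - b)]\,d\mu(x) \\ &= \mu((a, b)) + \frac{1}{2}\mu(\{a\}) + \frac{1}{2}\mu(\{b\}). \end{aligned} \tag{2.44}$$

The last equality in (2.44) follows from the fact that

$$\mathrm{sgn}(x - a) - \mathrm{sgn}(x - b) = \begin{cases} 0 & \text{if } x < a \text{ or } x > b, \\ 1 & \text{if } x = a \text{ or } x = b, \\ 2 & \text{if } a < x < b. \end{cases}$$

The result (2.38) now follows from (2.44) and the condition $\mu(\{a\}) = \mu(\{b\}) = 0$. Q.E.D.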

Note that (2.38) also reads as

$$F(b) - F(a) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt, \tag{2.45}$$

where $F$ is the distribution function corresponding to the probability measure $\mu$.
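Formula (2.45) can be checked numerically for a distribution whose characteristic function and distribution function are both known in closed form. The sketch below assumes the standard normal distribution as the example (an assumption made for illustration, not taken from this section), with characteristic function $\exp(-t^2/2)$, and compares the truncated integral with $F(b) - F(a)$ computed from the error function.

```python
import cmath
import math

def char_fn(t):
    """Characteristic function of the standard normal distribution (assumed example)."""
    return math.exp(-0.5 * t * t)

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inversion_integral(a, b, T, steps=40_000):
    """Midpoint-rule value of (1/(2*pi)) times the integral over [-T, T] of
    (exp(-i*t*a) - exp(-i*t*b)) / (i*t) * phi(t) dt.
    An even number of steps keeps every midpoint away from t = 0."""
    h = 2.0 * T / steps
    total = 0.0 + 0.0j
    for i in range(steps):
        t = -T + (i + 0.5) * h
        kernel = (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t)
        total += kernel * char_fn(t)
    return (total * h / (2.0 * math.pi)).real

a, b = -1.0, 1.0
print(inversion_integral(a, b, T=10.0), normal_cdf(b) - normal_cdf(a))
```

For this example the characteristic function decays so quickly that a moderate truncation point already reproduces the limit; for a slowly decaying characteristic function a much larger $T$ would be needed.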

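Next, suppose that $\varphi$ is absolutely integrable: $\int_{-\infty}^{\infty} |\varphi(t)|\,dt < \infty$. Then (2.45) can be written as

$$F(b) - F(a) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\exp(-i \cdot t \cdot a) - \exp(-i \cdot t \cdot b)}{i \cdot t}\,\varphi(t)\,dt,$$

and it follows from the dominated convergence theorem that

$$\begin{aligned} F'(a) = \lim_{b \downarrow a} \frac{F(b) - F(a)}{b - a} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{b \downarrow a} \frac{1 - \exp(-i \cdot t \cdot (b - a))}{i \cdot t \cdot (b - a)} \exp(-i \cdot t \cdot a)\,\varphi(t)\,dt \\ &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp(-i \cdot t \cdot a)\,\varphi(t)\,dt. \end{aligned}$$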

This proves Theorem 2.25 for the univariate case. In the multivariate case Theorem 2.A.1 becomes

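Theorem 2.A.2: Let $\mu$ be a probability measure on the Borel sets in $\mathbb{R}^k$ with characteristic function $\varphi$. Let $B = \times_{j=1}^{k} (a_j, b_j]$, where $a_j < b_j$ for $j = 1, 2, \ldots, k$, and let $\partial B$ be the border of $B$, that is, $\partial B = \{\times_{j=1}^{k} [a_j, b_j]\} \setminus \{\times_{j=1}^{k} (a_j, b_j)\}$. If $\mu(\partial B) = 0$, then

$$\mu(B) = \lim_{T_1 \to \infty} \cdots \lim_{T_k \to \infty} \int_{\times_{j=1}^{k} (-T_j, T_j)} \prod_{j=1}^{k} \left[ \frac{\exp(-i \cdot t_j \cdot a_j) - \exp(-i \cdot t_j \cdot b_j)}{i \cdot 2\pi \cdot t_j} \right] \times \varphi(t)\,dt, \tag{2.46}$$

where $t = (t_1, \ldots, t_k)^{\mathrm{T}}$.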

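This result proves Theorem 2.24 for the general case.

Moreover, if $\int_{\mathbb{R}^k} |\varphi(t)|\,dt < \infty$, (2.46) becomes

$$\mu(B) = \int_{\mathbb{R}^k} \prod_{j=1}^{k} \left[ \frac{\exp(-i \cdot t_j \cdot a_j) - \exp(-i \cdot t_j \cdot b_j)}{i \cdot 2\pi \cdot t_j} \right] \times \varphi(t)\,dt,$$

and by the dominated convergence theorem we may take partial derivatives inside the integral:

$$(-1)^k \frac{\partial^k \mu(B)}{\partial a_1 \cdots \partial a_k} = \frac{1}{(2\pi)^k} \int_{\mathbb{R}^k} \exp(-i \cdot t^{\mathrm{T}} a)\,\varphi(t)\,dt, \tag{2.47}$$

where $a = (a_1, \ldots, a_k)^{\mathrm{T}}$. The latter is just the density corresponding to $\mu$ in point $a$. Thus, (2.47) proves Theorem 2.25.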