# Conditional Expectations

3.1. Introduction

Roll a die, and let the outcome be Y. Define the random variable X = 1 if Y is even, and X = 0 if Y is odd. The expected value of Y is E[Y] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. But what would the expected value of Y be if it is revealed that the outcome is even, that is, X = 1? The latter information implies that Y is 2, 4, or 6 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 1, is E[Y|X = 1] = (2 + 4 + 6)/3 = 4. Similarly, if it is revealed that X = 0, then Y is 1, 3, or 5 with equal probabilities 1/3; hence, the expected value of Y, conditional on the event X = 0, is E[Y|X = 0] = (1 + 3 + 5)/3 = 3. Both results can be captured in a single statement:

E [Y | X] = 3 + X. (3.1)

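In this example the conditional probability of Y = y, given X = x, is [12]

$$
\begin{aligned}
P(Y = y \mid X = x) &= \frac{P(Y = y \text{ and } X = x)}{P(X = x)} \\
&= \begin{cases}
\dfrac{P(\{y\} \cap \{2, 4, 6\})}{P(\{2, 4, 6\})} = \dfrac{P(\{y\})}{P(\{2, 4, 6\})} = \dfrac{1}{3} & \text{if } x = 1 \text{ and } y \in \{2, 4, 6\}, \\[2ex]
\dfrac{P(\{y\} \cap \{2, 4, 6\})}{P(\{2, 4, 6\})} = \dfrac{P(\emptyset)}{P(\{2, 4, 6\})} = 0 & \text{if } x = 1 \text{ and } y \notin \{2, 4, 6\}, \\[2ex]
\dfrac{P(\{y\} \cap \{1, 3, 5\})}{P(\{1, 3, 5\})} = \dfrac{P(\{y\})}{P(\{1, 3, 5\})} = \dfrac{1}{3} & \text{if } x = 0 \text{ and } y \in \{1, 3, 5\}, \\[2ex]
\dfrac{P(\{y\} \cap \{1, 3, 5\})}{P(\{1, 3, 5\})} = \dfrac{P(\emptyset)}{P(\{1, 3, 5\})} = 0 & \text{if } x = 0 \text{ and } y \notin \{1, 3, 5\};
\end{cases}
\end{aligned}
\tag{3.2}
$$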

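hence,

$$
\sum_{y=1}^{6} y\,P(Y = y \mid X = x) =
\begin{cases}
(2 + 4 + 6)/3 = 4 & \text{if } x = 1, \\
(1 + 3 + 5)/3 = 3 & \text{if } x = 0,
\end{cases}
\;=\; 3 + x.
$$

Thus, in the case in which both Y and X are discrete random variables, the conditional expectation E[Y|X] can be defined as

$$
E[Y \mid X] = \sum_{y} y\,p(y \mid X), \quad \text{where } p(y \mid x) = P(Y = y \mid X = x) \text{ for } P(X = x) > 0.
$$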
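The discrete definition can be checked mechanically. The following is a minimal sketch, not a general-purpose routine: the function name `conditional_expectation` and the dictionary representation of the joint probabilities are illustrative assumptions, and exact rational arithmetic is used so that the results for the die example can be compared without rounding.

```python
from fractions import Fraction


def conditional_expectation(joint_pmf):
    """Return {x: E[Y | X = x]} for a joint pmf given as {(y, x): probability}.

    Hypothetical helper for illustration; it implements
    E[Y | X = x] = sum_y y * P(Y = y and X = x) / P(X = x).
    """
    marginal_x = {}    # P(X = x)
    weighted_sum = {}  # sum_y y * P(Y = y and X = x)
    for (y, x), p in joint_pmf.items():
        marginal_x[x] = marginal_x.get(x, 0) + p
        weighted_sum[x] = weighted_sum.get(x, 0) + y * p
    # The conditional expectation is only defined where P(X = x) > 0.
    return {x: weighted_sum[x] / marginal_x[x]
            for x in marginal_x if marginal_x[x] > 0}


# Fair die: Y is the outcome, X = 1 if Y is even and X = 0 if Y is odd.
die_pmf = {(y, 1 if y % 2 == 0 else 0): Fraction(1, 6) for y in range(1, 7)}
cond_exp = conditional_expectation(die_pmf)

for x in sorted(cond_exp):
    print(f"E[Y | X = {x}] = {cond_exp[x]}")
```

For the die, the two values agree with 3 + x for x = 0 and x = 1, which is the statement E[Y|X] = 3 + X.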

A second example is one in which X is uniformly [0, 1] distributed, and given the outcome x of X, Y is randomly drawn from the uniform [0, x] distribution. Then the distribution function F(y) of Y is

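$$
\begin{aligned}
F(y) = P(Y \le y) &= P(Y \le y \text{ and } X \le y) + P(Y \le y \text{ and } X > y) \\
&= P(X \le y) + P(Y \le y \text{ and } X > y) \\
&= y + E[I(Y \le y)\,I(X > y)] \\
&= y + \int_0^1 \left( x^{-1} \int_0^x I(z \le y)\,dz \right) I(x > y)\,dx \\
&= y + \int_y^1 \frac{y}{x}\,dx = y(1 - \ln(y)) \quad \text{for } 0 < y \le 1.
\end{aligned}
$$

Hence, the density of Y is

$$
f(y) = F'(y) = -\ln(y) \text{ for } y \in (0, 1], \qquad f(y) = 0 \text{ for } y \notin (0, 1].
$$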

Thus, the expected value of Y is $E[Y] = \int_0^1 y(-\ln(y))\,dy = 1/4$. But what would the expected value be if it is revealed that X = x for a given number $x \in (0, 1)$? The latter information implies that Y is now uniformly [0, x] distributed; hence, the conditional expectation involved is

$$
E[Y \mid X = x] = x^{-1} \int_0^x y\,dy = x/2.
$$

More generally, the conditional expectation of Y given X is

$$
E[Y \mid X] = X^{-1} \int_0^X y\,dy = X/2. \tag{3.3}
$$
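As a rough numerical check of this second example, the sketch below integrates the density f(y) = -ln(y) with a simple midpoint rule and also simulates the two-stage draw. The helper `midpoint_integral`, the number of steps, the number of draws, and the seed are all arbitrary choices made for illustration.

```python
import math
import random


def midpoint_integral(func, lower, upper, steps=100_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def density_y(y):
    """Density of Y on (0, 1]: f(y) = -ln(y)."""
    return -math.log(y)


# The density should integrate to about 1, and y * f(y) to about 1/4.
total_mass = midpoint_integral(density_y, 0.0, 1.0)
mean_y = midpoint_integral(lambda y: y * density_y(y), 0.0, 1.0)

# Two-stage simulation: X is uniform on [0, 1], then Y is uniform on [0, X].
rng = random.Random(0)  # seed chosen arbitrarily for reproducibility
draws = 200_000
simulated_mean = sum(rng.uniform(0.0, rng.random()) for _ in range(draws)) / draws

print(total_mass, mean_y, simulated_mean)
```

The midpoint rule is used because it never evaluates the integrand at y = 0, where ln(y) is undefined.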

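The latter example is a special case of a pair (Y, X) of absolutely continuously distributed random variables with joint density function $f(y, x)$ and marginal density $f_x(x)$. The conditional distribution function of Y, given the event $X \in [x, x + \delta]$, $\delta > 0$, is

$$
P(Y \le y \mid X \in [x, x + \delta]) = \frac{P(Y \le y \text{ and } X \in [x, x + \delta])}{P(X \in [x, x + \delta])} = \frac{\int_{-\infty}^{y} \frac{1}{\delta} \int_x^{x+\delta} f(u, v)\,dv\,du}{\frac{1}{\delta} \int_x^{x+\delta} f_x(v)\,dv}.
$$

Letting $\delta \downarrow 0$ then yields the conditional distribution function of Y given the event X = x:

$$
F(y \mid x) = \lim_{\delta \downarrow 0} P(Y \le y \mid X \in [x, x + \delta]) = \int_{-\infty}^{y} f(u, x)\,du \Big/ f_x(x), \quad \text{provided } f_x(x) > 0.
$$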

Note that we cannot define this conditional distribution function directly as

$$
F(y \mid x) = P(Y \le y \text{ and } X = x)/P(X = x)
$$

because for continuous random variables X, P (X = x) = 0.
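The limiting argument can be illustrated with the uniform example above, in which X is uniform on [0, 1] and Y given X = v is uniform on [0, v]. For 0 ≤ y ≤ x and x + δ ≤ 1, the probability P(Y ≤ y and X ∈ [x, x + δ]) equals the integral of y/v over v in [x, x + δ], that is, y ln((x + δ)/x), and P(X ∈ [x, x + δ]) = δ. The sketch below, with the hypothetical name `window_cdf` and arbitrarily chosen values of x and y, shows the window probability approaching y/x as δ shrinks.

```python
import math


def window_cdf(y, x, delta):
    """P(Y <= y | X in [x, x + delta]) in the uniform example.

    Assumes 0 <= y <= x and x + delta <= 1 (illustrative helper only).
    """
    joint = y * math.log((x + delta) / x)  # integral of y / v over [x, x + delta]
    return joint / delta                   # P(X in [x, x + delta]) = delta


x, y = 0.5, 0.2        # arbitrary illustration point
limit_value = y / x    # distribution function of the uniform [0, x] law at y

# Distance to the limit for delta = 0.1, 0.01, ..., 0.000001.
gaps = [abs(window_cdf(y, x, 10.0 ** -k) - limit_value) for k in range(1, 7)]
for k, gap in enumerate(gaps, start=1):
    print(f"delta = 1e-{k}: |P(Y <= y | window) - y/x| = {gap:.3e}")
```

Each window has positive probability, so the ratio is well defined for every δ > 0 even though the event X = x itself has probability zero.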

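The conditional density of Y, given the event X = x, is now

$$
f(y \mid x) = \partial F(y \mid x)/\partial y = f(y, x)/f_x(x),
$$

and the conditional expectation of Y given the event X = x can therefore be defined as

$$
E[Y \mid X = x] = \int_{-\infty}^{\infty} y f(y \mid x)\,dy = g(x), \text{ for instance.}
$$

Plugging in X for x then yields

$$
E[Y \mid X] = \int_{-\infty}^{\infty} y f(y \mid X)\,dy = g(X). \tag{3.4}
$$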
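To make the continuous case concrete, the sketch below evaluates the conditional mean of Y given X = x, namely the integral of y f(y, x)/f_x(x) over y, by numerical integration. The joint density f(y, x) = x + y on the unit square is a hypothetical choice made only for illustration (it does not appear in the text), as are the helper names. For this density f_x(x) = x + 1/2 and the conditional mean at x equals (x/2 + 1/3)/(x + 1/2).

```python
def midpoint_integral(func, lower, upper, steps=2_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def joint_density(y, x):
    """Hypothetical joint density f(y, x) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= y <= 1.0 and 0.0 <= x <= 1.0 else 0.0


def marginal_density_x(x):
    """f_x(x): the joint density integrated over y."""
    return midpoint_integral(lambda y: joint_density(y, x), 0.0, 1.0)


def conditional_mean(x):
    """E[Y | X = x] = integral of y * f(y, x) / f_x(x) over y."""
    fx = marginal_density_x(x)
    if fx <= 0.0:
        raise ValueError("the conditional density requires f_x(x) > 0")
    return midpoint_integral(lambda y: y * joint_density(y, x), 0.0, 1.0) / fx


g_half = conditional_mean(0.5)
print(g_half)  # close to (0.25 + 1/3) / 1.0 = 7/12
```

Plugging a random draw of X into `conditional_mean` would give the corresponding value of E[Y|X].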

These examples demonstrate two fundamental properties of conditional expectations. The first one is that E[Y|X] is a function of X, which can be translated as follows: Let Y and X be two random variables defined on a common probability space $\{\Omega, \mathcal{F}, P\}$, and let $\mathcal{F}_X$ be the σ-algebra generated by X, $\mathcal{F}_X = \{X^{-1}(B), B \in \mathcal{B}\}$, where $X^{-1}(B)$ is a shorthand notation for the set $\{\omega \in \Omega : X(\omega) \in B\}$ and $\mathcal{B}$ is the Euclidean Borel field. Then,

$$
Z = E[Y \mid X] \text{ is measurable } \mathcal{F}_X, \tag{3.5}
$$

which means that, for all Borel sets B, $\{\omega \in \Omega : Z(\omega) \in B\} \in \mathcal{F}_X$. Secondly, we have

$$
E[(Y - E[Y \mid X])\,I(X \in B)] = 0 \text{ for all Borel sets } B. \tag{3.6}
$$

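In particular, in the case (3.4), writing $g(x) = E[Y \mid X = x]$, we have

$$
\begin{aligned}
E[(Y - E[Y \mid X])\,I(X \in B)]
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (y - g(x))\,I(x \in B)\,f(y, x)\,dy\,dx \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y f(y \mid x)\,dy \right) I(x \in B)\,f_x(x)\,dx \\
&\quad - \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(y \mid x)\,dy \right) g(x)\,I(x \in B)\,f_x(x)\,dx \\
&= \int_{-\infty}^{\infty} g(x)\,I(x \in B)\,f_x(x)\,dx - \int_{-\infty}^{\infty} g(x)\,I(x \in B)\,f_x(x)\,dx = 0.
\end{aligned}
\tag{3.7}
$$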

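Because $\mathcal{F}_X = \{X^{-1}(B), B \in \mathcal{B}\}$, property (3.6) is equivalent to

$$
\int_A (Y(\omega) - Z(\omega))\,dP(\omega) = 0 \text{ for all } A \in \mathcal{F}_X. \tag{3.8}
$$

Moreover, note that $\Omega \in \mathcal{F}_X$, and thus (3.8) implies

$$
E(Y) = \int_\Omega Y(\omega)\,dP(\omega) = \int_\Omega Z(\omega)\,dP(\omega) = E(Z) \tag{3.9}
$$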

provided that the expectations involved are defined. A sufficient condition for the existence of E(Y) is that

$$
E(|Y|) < \infty. \tag{3.10}
$$

We will see later that (3.10) is also a sufficient condition for the existence of E (Z).

I will show now that condition (3.6) also holds for the examples (3.1) and (3.3). Of course, in the case of (3.3) I have already shown this in (3.7), but it is illustrative to verify it again for the special case involved.

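In the case of (3.1) the random variable $Y \cdot I(X = 1)$ takes the value 0 with probability 1/2 and the values 2, 4, or 6 with probability 1/6; the random variable $Y \cdot I(X = 0)$ takes the value 0 with probability 1/2 and the values 1, 3, or 5 with probability 1/6. Thus,

$$
E[Y \cdot I(X \in B)] =
\begin{cases}
E[Y \cdot I(X = 1)] = 2 & \text{if } 1 \in B \text{ and } 0 \notin B, \\
E[Y \cdot I(X = 0)] = 1.5 & \text{if } 1 \notin B \text{ and } 0 \in B, \\
E[Y] = 3.5 & \text{if } 1 \in B \text{ and } 0 \in B, \\
0 & \text{if } 1 \notin B \text{ and } 0 \notin B,
\end{cases}
$$

which by (3.1) and (3.6) is equal to

$$
\begin{aligned}
E[(E[Y \mid X])\,I(X \in B)] &= 3E[I(X \in B)] + E[X \cdot I(X \in B)] \\
&= 3P(X \in B) + P(X = 1 \text{ and } X \in B) \\
&= \begin{cases}
3P(X = 1) + P(X = 1) = 2 & \text{if } 1 \in B \text{ and } 0 \notin B, \\
3P(X = 0) + P(X = 1 \text{ and } X = 0) = 1.5 & \text{if } 1 \notin B \text{ and } 0 \in B, \\
3P(X = 0 \text{ or } X = 1) + P(X = 1) = 3.5 & \text{if } 1 \in B \text{ and } 0 \in B, \\
0 & \text{if } 1 \notin B \text{ and } 0 \notin B.
\end{cases}
\end{aligned}
$$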
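The four cases can be enumerated directly. In the sketch below only the intersection of B with {0, 1} matters, so B is represented by a subset of {0, 1}; the function names `lhs` and `rhs` are illustrative, and exact fractions are used so that both sides can be compared for equality.

```python
from fractions import Fraction
from itertools import combinations

outcomes = range(1, 7)   # possible values of Y
prob = Fraction(1, 6)    # probability of each outcome of a fair die


def x_of(y):
    """X = 1 if Y is even, X = 0 if Y is odd."""
    return 1 if y % 2 == 0 else 0


def lhs(b):
    """E[Y * I(X in B)]."""
    return sum(prob * y for y in outcomes if x_of(y) in b)


def rhs(b):
    """E[(3 + X) * I(X in B)], using E[Y|X] = 3 + X."""
    return sum(prob * (3 + x_of(y)) for y in outcomes if x_of(y) in b)


# The four possible intersections of B with {0, 1}: {}, {0}, {1}, {0, 1}.
cases = [frozenset(c) for r in range(3) for c in combinations((0, 1), r)]
results = {b: (lhs(b), rhs(b)) for b in cases}

for b, (left, right) in results.items():
    print(sorted(b), left, right)
```

In each of the four cases the two expectations coincide, which is property (3.6) for the die example.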

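Moreover, in the case of (3.3) the distribution function of $Y \cdot I(X \in B)$ is

$$
\begin{aligned}
F_B(y) &= P(Y \cdot I(X \in B) \le y) = P(Y \le y \text{ and } X \in B) + P(X \notin B) \\
&= P(X \in B \cap [0, y]) + P(Y \le y \text{ and } X \in B \cap (y, 1)) + P(X \notin B) \\
&= \int_0^y I(x \in B)\,dx + y \int_y^1 x^{-1} I(x \in B)\,dx + 1 - \int_0^1 I(x \in B)\,dx \\
&= 1 - \int_y^1 I(x \in B)\,dx + y \int_y^1 x^{-1} I(x \in B)\,dx \quad \text{for } 0 \le y \le 1;
\end{aligned}
$$

hence, the density involved is

$$
f_B(y) = \int_y^1 x^{-1} I(x \in B)\,dx \text{ for } y \in [0, 1], \qquad f_B(y) = 0 \text{ for } y \notin [0, 1].
$$

Thus,

$$
E[Y \cdot I(X \in B)] = \int_0^1 y \left( \int_y^1 x^{-1} I(x \in B)\,dx \right) dy = \frac{1}{2} \int_0^1 y \cdot I(y \in B)\,dy,
$$

which is equal to

$$
E(E[Y \mid X]\,I(X \in B)) = \frac{1}{2} E[X \cdot I(X \in B)] = \frac{1}{2} \int_0^1 x \cdot I(x \in B)\,dx.
$$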
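A simulation gives a rough check of this equality for one particular Borel set. In the sketch below B is taken to be an interval [a, b] inside [0, 1], in which case both sides equal (b² − a²)/4. The endpoints, the number of draws, the seed, and the function name `simulate_both_sides` are arbitrary choices for illustration.

```python
import random


def simulate_both_sides(a, b, draws=200_000, seed=0):
    """Monte Carlo estimates of E[Y * I(X in [a, b])] and E[(X / 2) * I(X in [a, b])].

    X is uniform on [0, 1] and, given X = x, Y is uniform on [0, x].
    """
    rng = random.Random(seed)
    sum_y = 0.0
    sum_z = 0.0
    for _ in range(draws):
        x = rng.random()
        y = rng.uniform(0.0, x)
        if a <= x <= b:
            sum_y += y          # contributes Y * I(X in B)
            sum_z += x / 2.0    # contributes E[Y|X] * I(X in B), with E[Y|X] = X/2
    return sum_y / draws, sum_z / draws


a, b = 0.2, 0.7                  # hypothetical choice of B = [a, b]
exact = (b * b - a * a) / 4.0    # (1/2) * integral of x over [a, b]
left, right = simulate_both_sides(a, b)
print(left, right, exact)
```

The two estimates differ from each other and from the exact value only by simulation error.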

The two conditions (3.5) and (3.8) uniquely define Z = E[Y|X] in the sense that if there exist two versions of E[Y|X], such as $Z_1 = E[Y \mid X]$ and $Z_2 = E[Y \mid X]$, satisfying the conditions (3.5) and (3.8), then $P(Z_1 = Z_2) = 1$. To see this, let

$$
A = \{\omega \in \Omega : Z_1(\omega) < Z_2(\omega)\}. \tag{3.11}
$$

Then $A \in \mathcal{F}_X$; hence, it follows from (3.8) that

$$
\int_A (Z_2(\omega) - Z_1(\omega))\,dP(\omega) = E[(Z_2 - Z_1)\,I(Z_2 - Z_1 > 0)] = 0.
$$

The latter equality implies $P(Z_2 - Z_1 > 0) = 0$, as I will show in Lemma 3.1. If we replace the set A by $A = \{\omega \in \Omega : Z_1(\omega) > Z_2(\omega)\}$, it follows similarly that $P(Z_2 - Z_1 < 0) = 0$. Combining these two cases, we find that $P(Z_2 \neq Z_1) = 0$.

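Lemma 3.1: $E[Z \cdot I(Z > 0)] = 0$ implies $P(Z > 0) = 0$.

Proof: Choose $\varepsilon > 0$ arbitrarily. Then

$$
\begin{aligned}
0 = E[Z \cdot I(Z > 0)] &= E[Z \cdot I(0 < Z < \varepsilon)] + E[Z \cdot I(Z \ge \varepsilon)] \\
&\ge E[Z \cdot I(Z \ge \varepsilon)] \ge \varepsilon E[I(Z \ge \varepsilon)] = \varepsilon P(Z \ge \varepsilon);
\end{aligned}
$$

hence, $P(Z > \varepsilon) = 0$ for all $\varepsilon > 0$. Now take $\varepsilon = 1/n$, n = 1, 2, …, and let $C_n = \{\omega \in \Omega : Z(\omega) > n^{-1}\}$.

Then $C_n \subset C_{n+1}$; hence,

$$
P(Z > 0) = P\left( \bigcup_{n=1}^{\infty} C_n \right) = \lim_{n \to \infty} P(C_n) = 0.
$$

Q.E.D.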

Conditions (3.5) and (3.8) only depend on the conditioning random variable X via the sub-σ-algebra $\mathcal{F}_X$ of $\mathcal{F}$. Therefore, we can define the conditional expectation of a random variable Y relative to an arbitrary sub-σ-algebra $\mathcal{F}_0$ of $\mathcal{F}$, denoted by $E[Y \mid \mathcal{F}_0]$, as follows:

Definition 3.1: Let Y be a random variable defined on a probability space $\{\Omega, \mathcal{F}, P\}$ satisfying $E(|Y|) < \infty$, and let $\mathcal{F}_0 \subset \mathcal{F}$ be a sub-σ-algebra of $\mathcal{F}$. The conditional expectation of Y relative to the sub-σ-algebra $\mathcal{F}_0$, denoted by $E[Y \mid \mathcal{F}_0] = Z$, for instance, is a random variable Z that is measurable $\mathcal{F}_0$ and is such that for all sets $A \in \mathcal{F}_0$,

$$
\int_A Y(\omega)\,dP(\omega) = \int_A Z(\omega)\,dP(\omega).
$$
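On a finite sample space the definition can be verified exhaustively. The sketch below assumes a sub-σ-algebra generated by a finite partition whose atoms all have positive probability; in that situation a random variable that is constant on each atom, equal to the probability-weighted average of Y over the atom, satisfies the defining integral condition. The partition {1, 2}, {3, 4, 5, 6} of the die outcomes and the function names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def conditional_expectation(prob, y, atoms):
    """Return Z as {omega: value}, constant on each atom of the partition.

    Illustrative helper: assumes every atom has positive probability.
    """
    z = {}
    for atom in atoms:
        mass = sum(prob[w] for w in atom)
        if mass == 0:
            raise ValueError("this sketch assumes atoms with positive probability")
        average = sum(y[w] * prob[w] for w in atom) / mass
        for w in atom:
            z[w] = average
    return z


def integral(values, prob, event):
    """Integral of a random variable over an event on a finite sample space."""
    return sum(values[w] * prob[w] for w in event)


omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}   # fair die
y = {w: w for w in omega}                   # Y(omega) = omega
atoms = [frozenset({1, 2}), frozenset({3, 4, 5, 6})]  # hypothetical partition
z = conditional_expectation(prob, y, atoms)

# The generated sigma-algebra consists of all unions of atoms,
# including the empty set and the whole sample space.
events = [frozenset().union(*combo)
          for r in range(len(atoms) + 1)
          for combo in combinations(atoms, r)]
defining_property_holds = all(
    integral(y, prob, a) == integral(z, prob, a) for a in events
)
print(z, defining_property_holds)
```

Because Z is constant on each atom it is measurable with respect to the generated σ-algebra, and the loop over `events` checks the integral condition for every set in it.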