Properties of Conditional Expectations

As conjectured following (3.10), the condition $E(|Y|) < \infty$ is also a sufficient condition for the existence of $E(E[Y|\mathcal{F}_0])$. The reason is twofold. First, I have already established in (3.9) that

Theorem 3.1: $E[E(Y|\mathcal{F}_0)] = E(Y)$.

Second, conditional expectations preserve inequality:

Theorem 3.2: If $P(X \le Y) = 1$, then $P(E(X|\mathcal{F}_0) \le E(Y|\mathcal{F}_0)) = 1$.

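Proof: Let $A = \{\omega \in \Omega : E(X|\mathcal{F}_0)(\omega) > E(Y|\mathcal{F}_0)(\omega)\}$. Then $A \in \mathcal{F}_0$, and

$$\int_A X(\omega)\,dP(\omega) = \int_A E(X|\mathcal{F}_0)(\omega)\,dP(\omega) \le \int_A Y(\omega)\,dP(\omega) = \int_A E(Y|\mathcal{F}_0)(\omega)\,dP(\omega);$$

hence,

$$0 \le \int_A \bigl(E(Y|\mathcal{F}_0)(\omega) - E(X|\mathcal{F}_0)(\omega)\bigr)\,dP(\omega) \le 0. \tag{3.13}$$

It follows now from (3.13) and Lemma 3.1 that $P(\{\omega \in \Omega : E(X|\mathcal{F}_0)(\omega) > E(Y|\mathcal{F}_0)(\omega)\}) = 0$. Q.E.D.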
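As a rough numerical illustration (not part of the formal argument), Theorems 3.1 and 3.2 can be checked on a finite sample space, where a sub-$\sigma$-algebra is generated by a partition and the conditional expectation is the probability-weighted average on each block. The sample space, probabilities, and the helper names `cond_exp` and `expect` are assumptions made for this sketch only.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F0] on a finite sample space; F0 is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # probability-weighted mean on the block
        for w in block:
            z[w] = avg                                 # the version is constant on each block
    return z

def expect(y, p):
    """Unconditional expectation E[Y]."""
    return sum(yw * pw for yw, pw in zip(y, p))

# Hypothetical example: six outcomes 0..5 with unequal probabilities.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
blocks = [[0, 1], [2, 3], [4, 5]]                 # partition generating F0
x = [Fraction(v) for v in (0, 1, 1, 2, 3, 3)]
y = [Fraction(v) for v in (1, 1, 4, 2, 5, 3)]     # X <= Y on every outcome

ex = cond_exp(x, p, blocks)
ey = cond_exp(y, p, blocks)

mean_matches = expect(ey, p) == expect(y, p)             # Theorem 3.1
order_preserved = all(a <= b for a, b in zip(ex, ey))    # Theorem 3.2
print(mean_matches, order_preserved)
```

Exact rational arithmetic is used so that the equalities hold without rounding error; with floating-point numbers one would compare up to a tolerance instead.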

Theorem 3.2 implies that $|E(Y|\mathcal{F}_0)| \le E(|Y|\,|\mathcal{F}_0)$ with probability 1, and if we apply Theorem 3.1 it follows that $E[|E(Y|\mathcal{F}_0)|] \le E(|Y|)$. Therefore, the condition $E(|Y|) < \infty$ is sufficient for the existence of $E(E[Y|\mathcal{F}_0])$. Conditional expectations also preserve linearity:

Theorem 3.3: If $E[|X|] < \infty$ and $E[|Y|] < \infty$, then $P[E(\alpha X + \beta Y|\mathcal{F}_0) = \alpha E(X|\mathcal{F}_0) + \beta E(Y|\mathcal{F}_0)] = 1$.

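Proof: Let $Z_0 = E(\alpha X + \beta Y|\mathcal{F}_0)$, $Z_1 = E(X|\mathcal{F}_0)$, $Z_2 = E(Y|\mathcal{F}_0)$. For every $A \in \mathcal{F}_0$ we have

$$\int_A Z_0(\omega)\,dP(\omega) = \int_A \bigl(\alpha X(\omega) + \beta Y(\omega)\bigr)\,dP(\omega) = \alpha \int_A X(\omega)\,dP(\omega) + \beta \int_A Y(\omega)\,dP(\omega),$$

$$\int_A Z_1(\omega)\,dP(\omega) = \int_A X(\omega)\,dP(\omega),$$

and

$$\int_A Z_2(\omega)\,dP(\omega) = \int_A Y(\omega)\,dP(\omega);$$

hence,

$$\int_A \bigl(Z_0(\omega) - \alpha Z_1(\omega) - \beta Z_2(\omega)\bigr)\,dP(\omega) = 0. \tag{3.14}$$

If we take $A = \{\omega \in \Omega : Z_0(\omega) - \alpha Z_1(\omega) - \beta Z_2(\omega) > 0\}$, it follows from (3.14) and Lemma 3.1 that $P(A) = 0$; if we take $A = \{\omega \in \Omega : Z_0(\omega) - \alpha Z_1(\omega) - \beta Z_2(\omega) < 0\}$, it follows similarly that $P(A) = 0$; hence, $P(\{\omega \in \Omega : Z_0(\omega) - \alpha Z_1(\omega) - \beta Z_2(\omega) \ne 0\}) = 0$. Q.E.D.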
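The linearity property can be illustrated in the same finite setting. The sketch below is only a sanity check under assumed data: the six-point sample space, the coefficients `alpha` and `beta`, and the helper `cond_exp` are all hypothetical choices.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F0] on a finite sample space; F0 is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # weighted mean on the block
        for w in block:
            z[w] = avg
    return z

# Hypothetical data on outcomes 0..5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
blocks = [[0, 1, 2], [3, 4, 5]]
x = [Fraction(v) for v in (2, -1, 0, 3, 1, 4)]
y = [Fraction(v) for v in (1, 5, -2, 0, 2, 2)]
alpha, beta = Fraction(3), Fraction(-1, 2)

z0 = cond_exp([alpha * a + beta * b for a, b in zip(x, y)], p, blocks)
z1 = cond_exp(x, p, blocks)
z2 = cond_exp(y, p, blocks)

# Z0 = alpha*Z1 + beta*Z2 on every outcome, as in Theorem 3.3.
linear = all(a == alpha * b + beta * c for a, b, c in zip(z0, z1, z2))
print(linear)
```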

If we condition a random variable Y on itself, then intuitively we may expect that $E(Y|Y) = Y$ because then Y acts as a constant. More formally, this result can be stated as

Theorem 3.4: Let $E[|Y|] < \infty$. If Y is measurable $\mathcal{F}$, then $P(E(Y|\mathcal{F}) = Y) = 1$.

Proof: Let $Z = E(Y|\mathcal{F})$. For every $A \in \mathcal{F}$ we have

$$\int_A \bigl(Y(\omega) - Z(\omega)\bigr)\,dP(\omega) = 0. \tag{3.15}$$

Take $A = \{\omega \in \Omega : Y(\omega) - Z(\omega) > 0\}$. Then $A \in \mathcal{F}$; hence, it follows from (3.15) and Lemma 3.1 that $P(A) = 0$. Similarly, if one takes $A = \{\omega \in \Omega : Y(\omega) - Z(\omega) < 0\}$, it follows that $P(A) = 0$. Thus, $P(\{\omega \in \Omega : Y(\omega) - Z(\omega) \ne 0\}) = 0$. Q.E.D.

In Theorem 3.4 I have conditioned Y on the largest sub-$\sigma$-algebra of $\mathcal{F}$, namely $\mathcal{F}$ itself. The smallest sub-$\sigma$-algebra of $\mathcal{F}$ is $\mathcal{T} = \{\Omega, \emptyset\}$, which is called the trivial $\sigma$-algebra.

Theorem 3.5: Let $E[|Y|] < \infty$. Then $P[E(Y|\mathcal{T}) = E(Y)] = 1$.

Proof: Exercise, along the same lines as the proofs of Theorems 3.2 and 3.4.

The following theorem, which plays a key role in regression analysis, follows from combining the results of Theorems 3.3 and 3.4:

Theorem 3.6: Let $E[|Y|] < \infty$ and $U = Y - E[Y|\mathcal{F}_0]$. Then $P[E(U|\mathcal{F}_0) = 0] = 1$.

Proof: Exercise.
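Without giving away the exercises, the statements of Theorems 3.4 through 3.6 can at least be checked numerically on a finite sample space: conditioning on the partition into singletons returns Y itself, conditioning on the trivial partition returns E(Y), and the residual U has conditional expectation zero. The data and the helper `cond_exp` below are assumptions made for illustration.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F0] on a finite sample space; F0 is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # weighted mean on the block
        for w in block:
            z[w] = avg
    return z

# Hypothetical data on outcomes 0..5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
y = [Fraction(v) for v in (1, 1, 4, 2, 5, 3)]
e_y = sum(yw * pw for yw, pw in zip(y, p))

full = [[w] for w in range(6)]        # singletons: the largest sigma-algebra
trivial = [list(range(6))]            # one block: the trivial sigma-algebra
blocks = [[0, 1], [2, 3], [4, 5]]     # an intermediate F0

thm_3_4 = cond_exp(y, p, full) == y                  # E(Y|F) = Y
thm_3_5 = cond_exp(y, p, trivial) == [e_y] * 6       # E(Y|T) = E(Y)

z = cond_exp(y, p, blocks)
u = [a - b for a, b in zip(y, z)]                    # U = Y - E[Y|F0]
thm_3_6 = cond_exp(u, p, blocks) == [0] * 6          # E(U|F0) = 0
print(thm_3_4, thm_3_5, thm_3_6)
```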

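Next, let (Y, X, Z) be jointly continuously distributed with joint density function $f(y, x, z)$ and marginal densities $f_{y,x}(y, x)$, $f_{x,z}(x, z)$, and $f_x(x)$. Then the conditional expectation of Y given X = x and Z = z is $E[Y|X, Z] = \int_{-\infty}^{\infty} y f(y|X, Z)\,dy = g_{x,z}(X, Z)$, for instance, where $f(y|x, z) = f(y, x, z)/f_{x,z}(x, z)$ is the conditional density of Y given X = x and Z = z. The conditional expectation of Y given X = x alone is $E[Y|X] = \int_{-\infty}^{\infty} y f(y|X)\,dy = g_x(X)$, for instance, where $f(y|x) = f_{y,x}(y, x)/f_x(x)$ is the conditional density of Y given X = x alone. If we denote the conditional density of Z given X = x by $f_z(z|x) = f_{x,z}(x, z)/f_x(x)$, it follows now that

$$\begin{aligned}
E\bigl(E[Y|X, Z]\,\big|\,X\bigr) &= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y f(y|X, z)\,dy \right) f_z(z|X)\,dz \\
&= \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} y \frac{f(y, X, z)}{f_{x,z}(X, z)}\,dy \right) \frac{f_{x,z}(X, z)}{f_x(X)}\,dz \\
&= \frac{1}{f_x(X)} \int_{-\infty}^{\infty} y \left( \int_{-\infty}^{\infty} f(y, X, z)\,dz \right) dy \\
&= \int_{-\infty}^{\infty} y \frac{f_{y,x}(y, X)}{f_x(X)}\,dy \\
&= \int_{-\infty}^{\infty} y f(y|X)\,dy = E[Y|X].
\end{aligned}$$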

This is one of the versions of the law of iterated expectations. Denoting by $\mathcal{F}_{X,Z}$ the $\sigma$-algebra generated by (X, Z) and by $\mathcal{F}_X$ the $\sigma$-algebra generated by X, we find this result can be translated as

$$E\bigl(E[Y|\mathcal{F}_{X,Z}]\,\big|\,\mathcal{F}_X\bigr) = E[Y|\mathcal{F}_X].$$

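Note that $\mathcal{F}_X \subset \mathcal{F}_{X,Z}$ because

$$\begin{aligned}
\mathcal{F}_X &= \{\{\omega \in \Omega : X(\omega) \in B\},\ B \in \mathcal{B}\} \\
&= \{\{\omega \in \Omega : X(\omega) \in B_1,\ Z(\omega) \in \mathbb{R}\},\ B_1 \in \mathcal{B}\} \\
&\subset \{\{\omega \in \Omega : X(\omega) \in B_1,\ Z(\omega) \in B_2\},\ B_1, B_2 \in \mathcal{B}\} = \mathcal{F}_{X,Z}.
\end{aligned}$$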

Therefore, the law of iterated expectations can be stated more generally as

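Theorem 3.7: Let $E[|Y|] < \infty$, and let $\mathcal{F}_0 \subset \mathcal{F}_1$ be sub-$\sigma$-algebras of $\mathcal{F}$. Then

$$P\bigl[E(E[Y|\mathcal{F}_1]\,|\,\mathcal{F}_0) = E(Y|\mathcal{F}_0)\bigr] = 1.$$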

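Proof: Let $Z_0 = E[Y|\mathcal{F}_0]$, $Z_1 = E[Y|\mathcal{F}_1]$, and $Z_2 = E[Z_1|\mathcal{F}_0]$. It has to be shown that $P(Z_0 = Z_2) = 1$. Let $A \in \mathcal{F}_0$. Then also $A \in \mathcal{F}_1$. It follows from Definition 3.1 that $Z_0 = E[Y|\mathcal{F}_0]$ implies $\int_A Y(\omega)\,dP(\omega) = \int_A Z_0(\omega)\,dP(\omega)$, $Z_1 = E[Y|\mathcal{F}_1]$ implies $\int_A Y(\omega)\,dP(\omega) = \int_A Z_1(\omega)\,dP(\omega)$, and $Z_2 = E[Z_1|\mathcal{F}_0]$ implies $\int_A Z_2(\omega)\,dP(\omega) = \int_A Z_1(\omega)\,dP(\omega)$. If we combine these equalities, it follows that for all $A \in \mathcal{F}_0$,

$$\int_A \bigl(Z_0(\omega) - Z_2(\omega)\bigr)\,dP(\omega) = 0. \tag{3.16}$$

Now choose $A = \{\omega \in \Omega : Z_0(\omega) - Z_2(\omega) > 0\}$. Note that $A \in \mathcal{F}_0$. Then it follows from (3.16) and Lemma 3.1 that $P(A) = 0$. Similarly, if we choose $A = \{\omega \in \Omega : Z_0(\omega) - Z_2(\omega) < 0\}$, then, again, $P(A) = 0$. Therefore, $P(Z_0 = Z_2) = 1$. Q.E.D.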
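The law of iterated expectations can be illustrated on a finite sample space by taking two nested partitions, the finer one generating the larger $\sigma$-algebra. This is a minimal sketch under assumed data; the partitions `fine` and `coarse` and the helper `cond_exp` are hypothetical names chosen for the example.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F] on a finite sample space; F is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # weighted mean on the block
        for w in block:
            z[w] = avg
    return z

# Hypothetical data on outcomes 0..5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
y = [Fraction(v) for v in (1, 1, 4, 2, 5, 3)]

fine = [[0, 1], [2, 3], [4], [5]]     # generates F1
coarse = [[0, 1, 2, 3], [4, 5]]       # generates F0; each block is a union of fine blocks

z0 = cond_exp(y, p, coarse)           # E[Y|F0]
z1 = cond_exp(y, p, fine)             # E[Y|F1]
z2 = cond_exp(z1, p, coarse)          # E[E[Y|F1]|F0]

tower = z2 == z0                      # Theorem 3.7
print(tower)
```

The check relies on every block of `coarse` being a union of blocks of `fine`, which is the finite-space counterpart of the inclusion of the smaller $\sigma$-algebra in the larger one.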

The following monotone convergence theorem for conditional expectations plays a key role in the proofs of Theorems 3.9 and 3.10 below.

Theorem 3.8: (Monotone convergence). Let $X_n$ be a sequence of nonnegative random variables defined on a common probability space $\{\Omega, \mathcal{F}, P\}$ such that $P(X_n \le X_{n+1}) = 1$ and $E[\sup_{n \ge 1} X_n] < \infty$. Then $P(\lim_{n\to\infty} E[X_n|\mathcal{F}_0] = E[\lim_{n\to\infty} X_n|\mathcal{F}_0]) = 1$.

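Proof: Let $Z_n = E[X_n|\mathcal{F}_0]$ and $X = \lim_{n\to\infty} X_n$. It follows from Theorem 3.2 that $Z_n$ is monotonic nondecreasing; hence, $Z = \lim_{n\to\infty} Z_n$ exists. Let $A \in \mathcal{F}_0$ be arbitrary and $Y_n(\omega) = Z_n(\omega) \cdot I(\omega \in A)$, $Y(\omega) = Z(\omega) \cdot I(\omega \in A)$ for $\omega \in \Omega$. Then also $Y_n$ is nonnegative and monotonic nondecreasing and $Y = \lim_{n\to\infty} Y_n$; hence, it follows from the monotone convergence theorem that $\lim_{n\to\infty} \int Y_n(\omega)\,dP(\omega) = \int Y(\omega)\,dP(\omega)$, which is equivalent to

$$\lim_{n\to\infty} \int_A Z_n(\omega)\,dP(\omega) = \int_A Z(\omega)\,dP(\omega). \tag{3.17}$$

Similarly, if we let $U_n(\omega) = X_n(\omega) \cdot I(\omega \in A)$, $U(\omega) = X(\omega) \cdot I(\omega \in A)$, it follows from the monotone convergence theorem that $\lim_{n\to\infty} \int U_n(\omega)\,dP(\omega) = \int U(\omega)\,dP(\omega)$, which is equivalent to

$$\lim_{n\to\infty} \int_A X_n(\omega)\,dP(\omega) = \int_A X(\omega)\,dP(\omega). \tag{3.18}$$

Moreover, it follows from the definition of $Z_n = E[X_n|\mathcal{F}_0]$ that

$$\int_A Z_n(\omega)\,dP(\omega) = \int_A X_n(\omega)\,dP(\omega). \tag{3.19}$$

It follows now from (3.17)-(3.19) that

$$\int_A Z(\omega)\,dP(\omega) = \int_A X(\omega)\,dP(\omega). \tag{3.20}$$

Theorem 3.8 easily follows from (3.20). Q.E.D.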
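A finite sample space cannot show genuine limiting behavior, because the limit is reached after finitely many steps, but it can still illustrate the two ingredients of the proof: the conditional expectations of a nondecreasing sequence are themselves nondecreasing, and they end up at the conditional expectation of the limit. In the sketch below the truncations $X_n = \min(X, n)$ are an assumption chosen for the example, as are the data and the helper `cond_exp`.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F0] on a finite sample space; F0 is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # weighted mean on the block
        for w in block:
            z[w] = avg
    return z

# Hypothetical data on outcomes 0..5; X is nonnegative with maximum 5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
blocks = [[0, 1, 2], [3, 4, 5]]
x = [Fraction(v) for v in (0, 3, 1, 5, 2, 4)]

# Z_n = E[min(X, n) | F0] for n = 1, ..., 6; min(X, n) increases to X.
z_seq = [cond_exp([min(v, n) for v in x], p, blocks) for n in range(1, 7)]

# Z_n <= Z_{n+1} on every outcome (a consequence of Theorem 3.2).
nondecreasing = all(
    a <= b
    for zn, zn1 in zip(z_seq, z_seq[1:])
    for a, b in zip(zn, zn1)
)

# Once n >= max(X), the truncation equals X, so Z_n equals E[X | F0].
limit_matches = z_seq[-1] == cond_exp(x, p, blocks)
print(nondecreasing, limit_matches)
```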

The following theorem extends the result of Theorem 3.4:

Theorem 3.9: Let X be measurable $\mathcal{F}_0$, and let both $E(|Y|)$ and $E(|XY|)$ be finite. Then $P[E(XY|\mathcal{F}_0) = X \cdot E(Y|\mathcal{F}_0)] = 1$.

Proof: I will prove the theorem involved only for the case in which both X and Y are nonnegative with probability 1, leaving the general case as an easy exercise.

Let $Z = E(XY|\mathcal{F}_0)$, $Z_0 = E(Y|\mathcal{F}_0)$. If

$$\forall A \in \mathcal{F}_0: \quad \int_A Z(\omega)\,dP(\omega) = \int_A X(\omega) Z_0(\omega)\,dP(\omega),$$

then the theorem under review holds.

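(a) First, consider the case in which X is discrete: $X(\omega) = \sum_{j=1}^{n} \beta_j I(\omega \in A_j)$, for instance, where the $A_j$'s are disjoint sets in $\mathcal{F}_0$ and the $\beta_j$'s are nonnegative numbers. Let $A \in \mathcal{F}_0$ be arbitrary, and observe that $A \cap A_j \in \mathcal{F}_0$ for $j = 1, \ldots, n$. Then by Definition 3.1,

$$\begin{aligned}
\int_A X(\omega) Z_0(\omega)\,dP(\omega) &= \sum_{j=1}^{n} \beta_j \int_{A \cap A_j} Z_0(\omega)\,dP(\omega) = \sum_{j=1}^{n} \beta_j \int_{A \cap A_j} Y(\omega)\,dP(\omega) \\
&= \int_A \sum_{j=1}^{n} \beta_j I(\omega \in A_j) Y(\omega)\,dP(\omega) = \int_A X(\omega) Y(\omega)\,dP(\omega) = \int_A Z(\omega)\,dP(\omega),
\end{aligned}$$

which proves the theorem for the case in which X is discrete.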

(b) If X is not discrete, then there exists a sequence of discrete random variables $X_n$ such that for each $\omega \in \Omega$ we have $0 \le X_n(\omega) \le X(\omega)$ and $X_n(\omega) \uparrow X(\omega)$ monotonic; hence, $X_n(\omega)Y(\omega) \uparrow X(\omega)Y(\omega)$ monotonic. Therefore, it follows from Theorem 3.8 and part (a) that $E[XY|\mathcal{F}_0] = \lim_{n\to\infty} E[X_n Y|\mathcal{F}_0] = \lim_{n\to\infty} X_n E[Y|\mathcal{F}_0] = X E[Y|\mathcal{F}_0]$ with probability 1. Thus, the theorem under review holds for the case that both X and Y are nonnegative with probability 1.

(c) The rest of the proof is left as an exercise. Q. E.D.
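The discrete case in part (a) can be mirrored numerically: on a finite sample space, a random variable is measurable with respect to the conditioning $\sigma$-algebra exactly when it is constant on each block of the generating partition. The following is a sketch under assumed data, with `cond_exp` a hypothetical helper.

```python
from fractions import Fraction

def cond_exp(y, p, blocks):
    """E[Y | F0] on a finite sample space; F0 is generated by the partition `blocks`."""
    z = [None] * len(y)
    for block in blocks:
        mass = sum(p[w] for w in block)                # P(block), assumed positive
        avg = sum(y[w] * p[w] for w in block) / mass   # weighted mean on the block
        for w in block:
            z[w] = avg
    return z

# Hypothetical data on outcomes 0..5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
blocks = [[0, 1], [2, 3], [4, 5]]
x = [Fraction(v) for v in (2, 2, 0, 0, 5, 5)]   # constant on each block: F0-measurable
y = [Fraction(v) for v in (1, 1, 4, 2, 5, 3)]   # nonnegative, otherwise arbitrary

lhs = cond_exp([a * b for a, b in zip(x, y)], p, blocks)    # E[XY|F0]
rhs = [a * b for a, b in zip(x, cond_exp(y, p, blocks))]    # X * E[Y|F0]

factor_out = lhs == rhs                                     # Theorem 3.9
print(factor_out)
```

If `x` were changed so that it varies within a block, it would no longer be measurable with respect to the partition and the equality would generally fail.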

We have seen for the case in which Y and X are jointly, absolutely continuously distributed that the conditional expectation $E[Y|X]$ is a function of X. This holds also more generally:

Theorem 3.10: Let Y and X be random variables defined on the probability space $\{\Omega, \mathcal{F}, P\}$, and assume that $E(|Y|) < \infty$. Then there exists a Borel-measurable function g such that $P[E(Y|X) = g(X)] = 1$. This result carries over to the case in which X is a finite-dimensional random vector.

Proof: The proof involves the following steps:

(a) Suppose that Y is nonnegative and bounded: $\exists K < \infty : P(\{\omega \in \Omega : 0 \le Y(\omega) \le K\}) = 1$, and let $Z = E(Y|\mathcal{F}_X)$, where $\mathcal{F}_X$ is the $\sigma$-algebra generated by X. Then $P(\{\omega \in \Omega : 0 \le Z(\omega) \le K\}) = 1$.

(b) Under the conditions of part (a) there exists a sequence of discrete random variables $Z_m$, $Z_m(\omega) = \sum_{i=1}^{m} \alpha_{i,m} I(\omega \in A_{i,m})$, where $A_{i,m} \in \mathcal{F}_X$, $A_{i,m} \cap A_{j,m} = \emptyset$ if $i \ne j$, $\cup_{i=1}^{m} A_{i,m} = \Omega$, $0 \le \alpha_{i,m} < \infty$ for $i = 1, \ldots, m$, such that $Z_m(\omega) \uparrow Z(\omega)$ monotonic. For each $A_{i,m}$ we can find a Borel set $B_{i,m}$ such that $A_{i,m} = X^{-1}(B_{i,m})$. Thus, if we take $g_m(x) = \sum_{i=1}^{m} \alpha_{i,m} I(x \in B_{i,m})$, then $Z_m = g_m(X)$ with probability 1.

Next, let $g(x) = \limsup_{m\to\infty} g_m(x)$. This function is Borel measurable, and $Z = \limsup_{m\to\infty} Z_m = \limsup_{m\to\infty} g_m(X) = g(X)$ with probability 1.

(c) Now let Y be nonnegative but not necessarily bounded, and let $Y_n = Y \cdot I(Y < n)$. Then $Y_n(\omega) \uparrow Y(\omega)$ monotonic. By part (b) it follows that there exists a Borel-measurable function $g_n(x)$ such that $E(Y_n|\mathcal{F}_X) = g_n(X)$. Let $g(x) = \limsup_{n\to\infty} g_n(x)$, which is Borel measurable. It follows now from Theorem 3.8 that

$$E(Y|\mathcal{F}_X) = \lim_{n\to\infty} E(Y_n|\mathcal{F}_X) = \limsup_{n\to\infty} E(Y_n|\mathcal{F}_X) = \limsup_{n\to\infty} g_n(X) = g(X).$$

(d) Let $Y^+ = \max(Y, 0)$, $Y^- = \max(-Y, 0)$. Then $Y = Y^+ - Y^-$, and therefore by part (c), $E(Y^+|\mathcal{F}_X) = g^+(X)$, for instance, and $E(Y^-|\mathcal{F}_X) = g^-(X)$. Then $E(Y|\mathcal{F}_X) = g^+(X) - g^-(X) = g(X)$. Q.E.D.
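When X takes only finitely many values, the function g of Theorem 3.10 can be written down explicitly: g(x) is the probability-weighted average of Y over the level set {X = x}. The sketch below builds such a g as a dictionary and checks the defining integral equality on every event in the $\sigma$-algebra generated by X, each of which is a union of level sets. The data and the names `g`, `level_sets`, and `integral` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical data on outcomes 0..5.
p = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
x = [1, 1, 2, 3, 3, 2]                          # X(w) for w = 0..5
y = [Fraction(v) for v in (4, 0, 1, 2, 6, 3)]   # Y(w)

# The level sets {X = x} form the partition that generates sigma(X).
level_sets = {}
for w, xv in enumerate(x):
    level_sets.setdefault(xv, []).append(w)

# g maps a value of X to the weighted mean of Y on the corresponding level set.
g = {
    xv: sum(y[w] * p[w] for w in ws) / sum(p[w] for w in ws)
    for xv, ws in level_sets.items()
}
g_of_x = [g[xv] for xv in x]                    # the random variable g(X)

def integral(f, event):
    """Integral of f over `event` with respect to P."""
    return sum(f[w] * p[w] for w in event)

# Check that the integrals of Y and g(X) agree on every event in sigma(X).
values = list(level_sets)
defining_property = all(
    integral(y, event) == integral(g_of_x, event)
    for r in range(len(values) + 1)
    for combo in combinations(values, r)
    for event in [[w for v in combo for w in level_sets[v]]]
)
print(defining_property)
```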

If random variables X and Y are independent, then knowing the realization of X will not reveal anything about Y, and vice versa. The following theorem formalizes this fact.

Theorem 3.11: Let X and Y be independent random variables. If $E[|Y|] < \infty$, then $P(E[Y|X] = E[Y]) = 1$. More generally, let Y be defined on the probability space $\{\Omega, \mathcal{F}, P\}$, let $\mathcal{F}_Y$ be the $\sigma$-algebra generated by Y, and let $\mathcal{F}_0$ be a sub-$\sigma$-algebra of $\mathcal{F}$ such that $\mathcal{F}_Y$ and $\mathcal{F}_0$ are independent, that is, for all $A \in \mathcal{F}_Y$ and $B \in \mathcal{F}_0$, $P(A \cap B) = P(A)P(B)$. If $E[|Y|] < \infty$, then $P(E[Y|\mathcal{F}_0] = E[Y]) = 1$.

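Proof: Let $\mathcal{F}_X$ be the $\sigma$-algebra generated by X, and let $A \in \mathcal{F}_X$ be arbitrary. There exists a Borel set B such that $A = \{\omega \in \Omega : X(\omega) \in B\}$. Then

$$\begin{aligned}
\int_A Y(\omega)\,dP(\omega) &= \int_\Omega Y(\omega) I(\omega \in A)\,dP(\omega) \\
&= \int_\Omega Y(\omega) I(X(\omega) \in B)\,dP(\omega) \\
&= E[Y I(X \in B)] = E[Y] E[I(X \in B)],
\end{aligned}$$

where the last equality follows from the independence of Y and X. Moreover,

$$E[Y] E[I(X \in B)] = E[Y] \int_\Omega I(X(\omega) \in B)\,dP(\omega) = E[Y] \int_\Omega I(\omega \in A)\,dP(\omega) = \int_A E[Y]\,dP(\omega).$$

Thus,

$$\int_A Y(\omega)\,dP(\omega) = \int_A E[Y]\,dP(\omega).$$

By the definition of conditional expectation, this implies that $E[Y|X] = E[Y]$ with probability 1. Q.E.D.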
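For discrete random variables, the first statement of Theorem 3.11 can be checked directly by building the joint distribution as the product of the marginals. This is a minimal sketch; the marginal distributions `px` and `py` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginal distributions of X and Y.
px = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
py = {-1: Fraction(1, 4), 2: Fraction(1, 2), 5: Fraction(1, 4)}

# Independence: the joint probability is the product of the marginals.
joint = {(xv, yv): px[xv] * py[yv] for xv, yv in product(px, py)}

e_y = sum(yv * q for yv, q in py.items())       # E[Y]

# E[Y | X = x] = sum_y y * P(X = x, Y = y) / P(X = x)
g = {xv: sum(yv * joint[(xv, yv)] for yv in py) / px[xv] for xv in px}

independent_case = all(v == e_y for v in g.values())
print(independent_case)
```

Replacing `joint` by a distribution that does not factorize into its marginals would in general make the conditional expectations differ across values of X.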