Bayes’ Rule

Let $A$ and $B$ be sets in $\mathscr{F}$. Because the sets $A$ and $\tilde{A}$ form a partition of the sample space $\Omega$, we have $B = (B \cap A) \cup (B \cap \tilde{A})$; hence,

$$P(B) = P(B \cap A) + P(B \cap \tilde{A}) = P(B|A)P(A) + P(B|\tilde{A})P(\tilde{A}).$$

Moreover,

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)P(A)}{P(B)}.$$

Combining these two results now yields Bayes’ rule:

$$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|\tilde{A})P(\tilde{A})}.$$

Thus, Bayes’ rule enables us to compute the conditional probability $P(A|B)$ if $P(A)$ and the conditional probabilities $P(B|A)$ and $P(B|\tilde{A})$ are given.

More generally, if $A_j$, $j = 1, 2, \ldots, n$ $(< \infty)$ is a partition of the sample space $\Omega$ (i.e., the $A_j$’s are disjoint sets in $\mathscr{F}$ such that $\Omega = \cup_{j=1}^n A_j$), then

$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{\sum_{j=1}^n P(B|A_j)P(A_j)}.$$
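As a minimal sketch of how the partition form of Bayes’ rule can be evaluated numerically, the Python function below takes the prior probabilities $P(A_j)$ and the conditional probabilities $P(B|A_j)$ and returns all posteriors $P(A_i|B)$. The function name `bayes_posterior` and the numbers in the example are hypothetical and chosen only for illustration; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return the list [P(A_i | B)] for a partition A_1, ..., A_n.

    priors[i]      = P(A_i), which must sum to 1 over the partition
    likelihoods[i] = P(B | A_i)
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have equal length")
    if abs(sum(priors) - 1) > 1e-12:
        raise ValueError("the priors of a partition must sum to 1")
    # Numerators P(B | A_j) P(A_j); their sum is P(B).
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) = 0, so P(A_i | B) is undefined")
    return [j / p_b for j in joint]


# Hypothetical two-set partition {A, complement of A}:
# P(A) = 1/4, P(B | A) = 4/5, P(B | complement of A) = 1/5.
priors = [Fraction(1, 4), Fraction(3, 4)]
likelihoods = [Fraction(4, 5), Fraction(1, 5)]
posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # [Fraction(4, 7), Fraction(3, 7)]
```

With two sets the function reduces to the formula for $P(A|B)$ in terms of $A$ and its complement; the posteriors always sum to 1 because the denominator is $P(B)$.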

Bayes’ rule plays an important role in a special branch of statistics (and econometrics) called Bayesian statistics (econometrics).

1.10.3. Independence

If $P(A|B) = P(A)$, knowing that the outcome is in $B$ does not give us any information about $A$. In that case the events $A$ and $B$ are described as being independent. For example, if I tell you that the outcome of the dice experiment is contained in the set $\{1, 2, 3, 4, 5, 6\} = \Omega$, then you know nothing about the outcome: $P(A|\Omega) = P(A \cap \Omega)/P(\Omega) = P(A)$; hence, $\Omega$ is independent of any other event $A$.

Note that $P(A|B) = P(A)$ is equivalent to $P(A \cap B) = P(A)P(B)$. Thus,

Definition 1.14: Sets $A$ and $B$ in $\mathscr{F}$ are (pairwise) independent if $P(A \cap B) = P(A)P(B)$.

If events $A$ and $B$ are independent, and events $B$ and $C$ are independent, are events $A$ and $C$ independent? The answer is: not necessarily. As a counterexample, observe that if $A$ and $B$ are independent, then so are $\tilde{A}$ and $B$, $A$ and $\tilde{B}$, and $\tilde{A}$ and $\tilde{B}$ because

$$P(\tilde{A} \cap B) = P(B) - P(A \cap B) = P(B) - P(A)P(B) = (1 - P(A))P(B) = P(\tilde{A})P(B),$$

and similarly,

$$P(A \cap \tilde{B}) = P(A)P(\tilde{B}) \quad \text{and} \quad P(\tilde{A} \cap \tilde{B}) = P(\tilde{A})P(\tilde{B}).$$

Now if $C = \tilde{A}$ and $0 < P(A) < 1$, then $B$ and $C = \tilde{A}$ are independent if $A$ and $B$ are independent, but

$$P(A \cap C) = P(A \cap \tilde{A}) = P(\emptyset) = 0,$$

whereas

$$P(A)P(C) = P(A)P(\tilde{A}) = P(A)(1 - P(A)) \neq 0.$$
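The counterexample can be checked on the dice experiment. The sketch below assumes a fair die and uses hypothetical events $A = \{1, 2, 3, 4\}$ and $B = \{1, 3, 5\}$, chosen only because they happen to be independent; $C$ is the complement of $A$. The helper names `prob` and `independent` are illustrative.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # outcomes of one roll of a fair die


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def independent(e1, e2):
    """Pairwise independence: P(e1 and e2) == P(e1) P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


A = frozenset({1, 2, 3, 4})  # P(A) = 2/3
B = frozenset({1, 3, 5})     # P(B) = 1/2
C = OMEGA - A                # complement of A, P(C) = 1/3

print(independent(A, B))  # True:  P({1, 3}) = 1/3 = (2/3)(1/2)
print(independent(B, C))  # True:  P({5}) = 1/6 = (1/2)(1/3)
print(independent(A, C))  # False: P(empty set) = 0, but P(A)P(C) = 2/9
```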

Thus, for more than two events we need a stronger condition for independence than pairwise independence, namely,

Definition 1.15: A sequence $A_j$ of sets in $\mathscr{F}$ is independent if for every subsequence $A_{j_i}$, $i = 1, 2, \ldots, n$, $P(\cap_{i=1}^n A_{j_i}) = \prod_{i=1}^n P(A_{j_i})$.

By requiring that the latter hold for all subsequences rather than only $P(\cap_{i=1}^\infty A_i) = \prod_{i=1}^\infty P(A_i)$, we avoid the problem that a sequence of events would be called independent if one of the events were the empty set.
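For a finite list of events, the condition in Definition 1.15 can be checked by brute force over all sub-collections. The sketch below is only an illustration under assumptions not in the text: the sample space is two tosses of a fair coin, and the events `A`, `B`, `C` and the helper `is_independent` are hypothetical names. It shows that pairwise independence does not imply independence, and that the product rule for the full collection alone is satisfied trivially once the empty set is included.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

OMEGA = frozenset({"HH", "HT", "TH", "TT"})  # two tosses of a fair coin


def prob(event):
    """P(event) under the uniform measure; event must be a subset of OMEGA."""
    return Fraction(len(event), len(OMEGA))


def is_independent(events):
    """Product rule of Definition 1.15 for every sub-collection of size >= 2."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            if prob(frozenset.intersection(*sub)) != prod(prob(e) for e in sub):
                return False
    return True


A = frozenset({"HH", "HT"})  # first toss is heads
B = frozenset({"HH", "TH"})  # second toss is heads
C = frozenset({"HH", "TT"})  # both tosses agree
EMPTY = frozenset()

pairwise = all(is_independent([x, y]) for x, y in combinations([A, B, C], 2))
mutual = is_independent([A, B, C])
print(pairwise, mutual)  # True False

# Product rule for the full collection only: holds because both sides are 0.
full_only = prob(A & A & EMPTY) == prob(A) * prob(A) * prob(EMPTY)
print(full_only, is_independent([A, A, EMPTY]))  # True False
```

The last line illustrates why the definition quantifies over all subsequences: the collection $A, A, \emptyset$ passes the single product test, yet $A$ is not independent of itself when $0 < P(A) < 1$.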

The independence of a pair or sequence of random variables or vectors can now be defined as follows.

Definition 1.16: Let $X_j$ be a sequence of random variables or vectors defined on a common probability space $\{\Omega, \mathscr{F}, P\}$. $X_1$ and $X_2$ are pairwise independent if for all Borel sets $B_1$, $B_2$ the sets $A_1 = \{\omega \in \Omega : X_1(\omega) \in B_1\}$ and $A_2 = \{\omega \in \Omega : X_2(\omega) \in B_2\}$ are independent. The sequence $X_j$ is independent if for all Borel sets $B_j$ the sets $A_j = \{\omega \in \Omega : X_j(\omega) \in B_j\}$ are independent.

As we have seen before, the collection $\mathscr{F}_j = \{\{\omega \in \Omega : X_j(\omega) \in B\}, B \in \mathcal{B}\} = \{X_j^{-1}(B), B \in \mathcal{B}\}$ is a sub-$\sigma$-algebra of $\mathscr{F}$. Therefore, Definition 1.16 also reads as follows: The sequence of random variables $X_j$ is independent if for arbitrary $A_j \in \mathscr{F}_j$ the sequence of sets $A_j$ is independent according to Definition 1.15.

Independence usually follows from the setup of a statistical experiment. For example, draw randomly with replacement $n$ balls from a bowl containing $R$ red balls and $N - R$ white balls, and let $X_j = 1$ if the $j$th draw is a red ball and $X_j = 0$ if the $j$th draw is a white ball. Then $X_1, \ldots, X_n$ are independent (and $X_1 + \cdots + X_n$ has the binomial $(n, p)$ distribution with $p = R/N$). However, if we drew these balls without replacement, then $X_1, \ldots, X_n$ would not be independent.
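The difference between the two sampling schemes already shows up in the first two draws. The sketch below computes $P(X_1 = 1, X_2 = 1)$ and $P(X_1 = 1)P(X_2 = 1)$ exactly; the function name `joint_and_product` and the values $N = 10$, $R = 4$ are hypothetical choices for illustration, and the code assumes $N \geq 2$ and $1 \leq R \leq N$.

```python
from fractions import Fraction


def joint_and_product(N, R, replace):
    """Return (P(X1=1, X2=1), P(X1=1) * P(X2=1)) for the first two draws."""
    p1 = Fraction(R, N)  # P(X1 = 1)
    if replace:
        # The bowl is restored after each draw.
        p2_given_red = Fraction(R, N)
        p2_given_white = Fraction(R, N)
    else:
        # One ball is missing at the second draw.
        p2_given_red = Fraction(R - 1, N - 1)
        p2_given_white = Fraction(R, N - 1)
    joint = p1 * p2_given_red
    # Law of total probability for P(X2 = 1).
    p2 = p1 * p2_given_red + (1 - p1) * p2_given_white
    return joint, p1 * p2


with_repl = joint_and_product(10, 4, replace=True)
without_repl = joint_and_product(10, 4, replace=False)
print(with_repl)     # (Fraction(4, 25), Fraction(4, 25)): product rule holds
print(without_repl)  # (Fraction(2, 15), Fraction(4, 25)): product rule fails
```

Note that $P(X_2 = 1) = R/N$ under both schemes; only the joint probability changes, which is exactly what dependence means here.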

For a sequence of random variables $X_j$ it suffices to verify the condition in Definition 1.16 only for Borel sets $B_j$ of the type $(-\infty, x_j]$, $x_j \in \mathbb{R}$:

Theorem 1.14: Let $X_1, \ldots, X_n$ be random variables, and denote, for $x \in \mathbb{R}$ and $j = 1, \ldots, n$, $A_j(x) = \{\omega \in \Omega : X_j(\omega) \leq x\}$. Then $X_1, \ldots, X_n$ are independent if and only if for arbitrary $(x_1, \ldots, x_n)^{\mathrm{T}} \in \mathbb{R}^n$ the sets $A_1(x_1), \ldots, A_n(x_n)$ are independent.

The complete proof of Theorem 1.14 is difficult and is therefore omitted, but the result can be motivated as follows. Let $\mathscr{F}_j^0 = \{\Omega, \emptyset, X_j^{-1}((-\infty, x]), X_j^{-1}((y, \infty)), \forall x, y \in \mathbb{R}$, together with all finite unions and intersections of the latter two types of sets$\}$. Then $\mathscr{F}_j^0$ is an algebra such that for arbitrary $A_j \in \mathscr{F}_j^0$ the sequence of sets $A_j$ is independent. This is not too hard to prove. Now $\mathscr{F}_j = \{X_j^{-1}(B), B \in \mathcal{B}\}$ is the smallest $\sigma$-algebra containing $\mathscr{F}_j^0$ and is also the smallest monotone class containing $\mathscr{F}_j^0$. One can show (but this is the hard part), using the properties of a monotone class (see Exercise 11 below), that, for arbitrary $A_j \in \mathscr{F}_j$, the sequence of sets $A_j$ is independent as well.

It follows now from Theorem 1.14 that

Theorem 1.15: The random variables $X_1, \ldots, X_n$ are independent if and only if the joint distribution function $F(x)$ of $X = (X_1, \ldots, X_n)^{\mathrm{T}}$ can be written as the product of the distribution functions $F_j(x_j)$ of the $X_j$’s, that is, $F(x) = \prod_{j=1}^n F_j(x_j)$, where $x = (x_1, \ldots, x_n)^{\mathrm{T}}$.

The latter distribution functions $F_j(x_j)$ are called the marginal distribution functions. Moreover, it follows straightforwardly from Theorem 1.15 that, if the joint distribution of $X = (X_1, \ldots, X_n)^{\mathrm{T}}$ is absolutely continuous with joint density function $f(x)$, then $X_1, \ldots, X_n$ are independent if and only if $f(x)$ can be written as the product of the density functions $f_j(x_j)$ of the $X_j$’s:

$$f(x) = \prod_{j=1}^n f_j(x_j), \quad \text{where } x = (x_1, \ldots, x_n)^{\mathrm{T}}.$$

The latter density functions are called the marginal density functions.
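The factorization criterion can be illustrated with two bivariate densities. Both examples are assumptions introduced here for illustration and do not appear in the text: the first density factorizes into its marginals, and the second does not.

```latex
% Independent case: joint density on (0,\infty)^2.
f(x_1, x_2) = e^{-x_1 - x_2} = e^{-x_1} \cdot e^{-x_2} = f_1(x_1)\, f_2(x_2).

% Dependent case: joint density on [0,1]^2 (it integrates to 1/2 + 1/2 = 1).
f(x_1, x_2) = x_1 + x_2, \qquad
f_1(x_1) = \int_0^1 (x_1 + x_2)\, dx_2 = x_1 + \tfrac{1}{2}, \qquad
f_2(x_2) = x_2 + \tfrac{1}{2},

f_1(x_1)\, f_2(x_2) = x_1 x_2 + \tfrac{1}{2}(x_1 + x_2) + \tfrac{1}{4}
\neq x_1 + x_2 .
```

In the second case the product of the marginals is at least $1/4$ everywhere, whereas the joint density is below $1/4$ on the set $\{x_1 + x_2 < 1/4\}$, which has positive Lebesgue measure. The two functions therefore differ on more than a null set, so $X_1$ and $X_2$ are not independent.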

1.6. Exercises

1. Prove (1.4).

2. Prove (1.17) by proving that $\ln[(1 - x/n)^n] = n \ln(1 - x/n) \to -x$ for $n \to \infty$.

3. Let $\mathscr{F}_*$ be the collection of all subsets of $\Omega = (0, 1]$ of the type $(a, b]$, where $a < b$ are rational numbers in $[0, 1]$, together with their finite disjoint unions and the empty set $\emptyset$. Verify that $\mathscr{F}_*$ is an algebra.

4. Prove Theorem 1.2.

5. Prove Theorem 1.5.

6. Let $\Omega = (0, 1]$, and let $\mathfrak{C}$ be the collection of all intervals of the type $(a, b]$ with $0 < a < b < 1$. Give as many distinct examples as you can of sets that are contained in $\sigma(\mathfrak{C})$ (the smallest $\sigma$-algebra containing this collection $\mathfrak{C}$) but not in $\alpha(\mathfrak{C})$ (the smallest algebra containing the collection $\mathfrak{C}$).

7. Show that $\sigma(\{[a, b] : \forall a < b, \ a, b \in \mathbb{R}\}) = \mathcal{B}$.

8. Prove part (g) of Theorem 1.8.

9. Prove that $\mathscr{F}_0$ defined by (1.20) is an algebra.

10. Prove (1.22).

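11. A collection $\mathscr{F}$ of subsets of a set $\Omega$ is called a monotone class if the following two conditions hold:

$A_n \in \mathscr{F}$, $A_n \subset A_{n+1}$, $n = 1, 2, 3, \ldots$ imply $\cup_{n=1}^\infty A_n \in \mathscr{F}$,

$A_n \in \mathscr{F}$, $A_n \supset A_{n+1}$, $n = 1, 2, 3, \ldots$ imply $\cap_{n=1}^\infty A_n \in \mathscr{F}$.

Show that an algebra is a $\sigma$-algebra if and only if it is a monotone class.

12. A collection $\mathscr{F}_\lambda$ of subsets of a set $\Omega$ is called a $\lambda$-system if $A \in \mathscr{F}_\lambda$ implies $\tilde{A} \in \mathscr{F}_\lambda$, and for disjoint sets $A_j \in \mathscr{F}_\lambda$, $\cup_{j=1}^\infty A_j \in \mathscr{F}_\lambda$. A collection $\mathscr{F}_\pi$ of subsets of a set $\Omega$ is called a $\pi$-system if $A, B \in \mathscr{F}_\pi$ implies that $A \cap B \in \mathscr{F}_\pi$. Prove that if a $\lambda$-system is also a $\pi$-system, then it is a $\sigma$-algebra.

13. Let $\mathscr{F}$ be the smallest $\sigma$-algebra of subsets of $\mathbb{R}$ containing the (countable) collection of half-open intervals $(-\infty, q]$ with rational endpoints $q$. Prove that $\mathscr{F}$ contains all the Borel subsets of $\mathbb{R}$: $\mathcal{B} = \mathscr{F}$.

14. Consider the following subset of $\mathbb{R}^2$: $L = \{(x, y) \in \mathbb{R}^2 : y = x, \ 0 < x < 1\}$. Explain why $L$ is a Borel set.

15. Consider the following subset of $\mathbb{R}^2$: $C = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 < 1\}$. Explain why $C$ is a Borel set.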

16. Prove Theorem 1.11. Hint: Use Definition 1.12 and Theorem 1.8. Determine first which parts of Theorem 1.8 apply.

17. Let $F(x) = \int_{-\infty}^{x} f(u)\,du$ be an absolutely continuous distribution function. Prove that the corresponding probability measure $\mu$ is given by the Lebesgue integral (1.27).

18. Prove that the Lebesgue integral over a Borel set with zero Lebesgue measure is zero.

19. Let $\{\Omega, \mathscr{F}, P\}$ be a probability space, and let $B \in \mathscr{F}$ with $P(B) > 0$. Verify that $\{B, \mathscr{F} \cap B, P(\cdot|B)\}$ is a probability space.

20. Are disjoint sets in $\mathscr{F}$ independent?

21. (Application of Bayes’ rule): Suppose that a certain disease, for instance HIV+, afflicts 1 out of 10,000 people. Moreover, suppose that there exists a medical test for this disease that is 90% reliable: If you don’t have the disease, the test will confirm that with probability 0.9; the probability is the same if you do have the disease. If a randomly selected person is subjected to this test, and the test indicates that this person has the disease, what is the probability that this person actually has this disease? In other words, if you were this person, would you be scared or not?

22. Let $A$ and $B$ in $\mathscr{F}$ be pairwise independent. Prove that $\tilde{A}$ and $B$ are independent (and therefore $A$ and $\tilde{B}$ are independent and $\tilde{A}$ and $\tilde{B}$ are independent).

23. Draw randomly without replacement $n$ balls from a bowl containing $R$ red balls and $N - R$ white balls, and let $X_j = 1$ if the $j$th draw is a red ball and $X_j = 0$ if the $j$th draw is a white ball. Show that $X_1, \ldots, X_n$ are not independent.

APPENDIXES

1.A. Common Structure of the Proofs of Theorems 1.6 and 1.10

The proofs of Theorems 1.6 and 1.10 employ a similar argument, namely the following:

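Theorem 1.A.1: Let $\mathfrak{C}$ be a collection of subsets of a set $\Omega$, and let $\sigma(\mathfrak{C})$ be the smallest $\sigma$-algebra containing $\mathfrak{C}$. Moreover, let $\rho$ be a Boolean function on $\sigma(\mathfrak{C})$, that is, $\rho$ is a set function that takes either the value “True” or “False.” Furthermore, let $\rho(A) = \text{True}$ for all sets $A$ in $\mathfrak{C}$. If the collection $\mathfrak{D}$ of sets $A$ in $\sigma(\mathfrak{C})$ for which $\rho(A) = \text{True}$ is a $\sigma$-algebra itself, then $\rho(A) = \text{True}$ for all sets $A$ in $\sigma(\mathfrak{C})$.

Proof: Because $\mathfrak{D}$ is a collection of sets in $\sigma(\mathfrak{C})$ we have $\mathfrak{D} \subset \sigma(\mathfrak{C})$. Moreover, by assumption, $\mathfrak{C} \subset \mathfrak{D}$, and $\mathfrak{D}$ is a $\sigma$-algebra. But $\sigma(\mathfrak{C})$ is the smallest $\sigma$-algebra containing $\mathfrak{C}$; hence, $\sigma(\mathfrak{C}) \subset \mathfrak{D}$. Thus, $\mathfrak{D} = \sigma(\mathfrak{C})$ and, consequently, $\rho(A) = \text{True}$ for all sets $A$ in $\sigma(\mathfrak{C})$. Q.E.D.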

This type of proof will also be used later on.

Of course, the hard part is to prove that $\mathfrak{D}$ is a $\sigma$-algebra. In particular, the collection $\mathfrak{D}$ is not automatically a $\sigma$-algebra. Take, for example, the case in which $\Omega = [0, 1]$, $\mathfrak{C}$ is the collection of all intervals $[a, b]$ with $0 \leq a < b \leq 1$, and $\rho(A) = \text{True}$ if the smallest interval $[a, b]$ containing $A$ has positive length, $b - a > 0$, and $\rho(A) = \text{False}$ otherwise. In this case $\sigma(\mathfrak{C})$ consists of all the Borel subsets of $[0, 1]$, but $\mathfrak{D}$ does not contain singletons, whereas $\sigma(\mathfrak{C})$ does, and thus $\mathfrak{D}$ is smaller than $\sigma(\mathfrak{C})$ and is therefore not a $\sigma$-algebra.