# Random Variables and Their Distributions

## 1.8.1. Random Variables and Vectors

In broad terms, a random variable is a numerical translation of the outcomes of a statistical experiment. For example, flip a fair coin once. Then the sample space is Ω = {H, T}, where H stands for heads and T stands for tails. The σ-algebra involved is ℱ = {Ω, ∅, {H}, {T}}, and the corresponding probability measure is defined by P({H}) = P({T}) = 1/2. Now define the function X(ω) = 1 if ω = H and X(ω) = 0 if ω = T. Then X is a random variable that takes the value 1 with probability 1/2 and the value 0 with probability 1/2:

P(X = 1) = P({ω ∈ Ω : X(ω) = 1}) = P({H}) = 1/2,

P(X = 0) = P({ω ∈ Ω : X(ω) = 0}) = P({T}) = 1/2.

Moreover, for an arbitrary Borel set B we have

P(X ∈ B) = P({ω ∈ Ω : X(ω) ∈ B}),

where, again, P(X ∈ B) is shorthand notation[7] for P({ω ∈ Ω : X(ω) ∈ B}).

In this particular case, the set {ω ∈ Ω : X(ω) ∈ B} is automatically equal to one of the elements of ℱ, and therefore the probability P(X ∈ B) = P({ω ∈ Ω : X(ω) ∈ B}) is well defined. In general, however, we need to confine the mappings X : Ω → ℝ to those for which we can make probability statements about events of the type {ω ∈ Ω : X(ω) ∈ B}, where B is an arbitrary Borel set, which is only possible if these sets are members of ℱ:
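On a finite sample space this pullback construction can be made completely explicit. The sketch below (names such as `prob`, `preimage`, and `P_X_in` are our own, not from the text) computes P(X ∈ B) as P(X⁻¹(B)) for the coin-flip example:

```python
from fractions import Fraction

# The coin-flip probability space: Omega = {H, T}, P({H}) = P({T}) = 1/2.
sample_space = {"H", "T"}
prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}  # P on the atoms

X = {"H": 1, "T": 0}  # the random variable X(omega)

def preimage(B):
    """X^{-1}(B) = {omega : X(omega) in B} for a set B of reals."""
    return {omega for omega in sample_space if X[omega] in B}

def P_X_in(B):
    """P(X in B), computed as P(X^{-1}(B))."""
    return sum(prob[omega] for omega in preimage(B))

print(P_X_in({1}))       # P(X = 1) = 1/2
print(P_X_in({0, 1}))    # the whole range, probability 1
print(P_X_in({2, 3}))    # empty preimage, probability 0
```

Exact rational arithmetic via `Fraction` keeps the probabilities free of rounding noise.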

Definition 1.8: Let {Ω, ℱ, P} be a probability space. A mapping X : Ω → ℝ is called a random variable defined on {Ω, ℱ, P} if X is measurable ℱ, which means that for every Borel set B, {ω ∈ Ω : X(ω) ∈ B} ∈ ℱ. Similarly, a mapping X : Ω → ℝᵏ is called a k-dimensional random vector defined on {Ω, ℱ, P} if X is measurable ℱ in the sense that for every Borel set B in ℝᵏ, {ω ∈ Ω : X(ω) ∈ B} ∈ ℱ.

In verifying that a real function X : Ω → ℝ is measurable ℱ, it is not necessary to verify that {ω ∈ Ω : X(ω) ∈ B} ∈ ℱ for all Borel sets B, but only that this property holds for Borel sets of the type (−∞, x]:

Theorem 1.10: A mapping X : Ω → ℝ is measurable ℱ (hence X is a random variable) if and only if for all x ∈ ℝ the sets {ω ∈ Ω : X(ω) ≤ x} are members of ℱ. Similarly, a mapping X : Ω → ℝᵏ is measurable ℱ (hence X is a random vector of dimension k) if and only if for all x = (x₁, …, xₖ)ᵀ ∈ ℝᵏ the sets ∩_{j=1}^{k} {ω ∈ Ω : Xⱼ(ω) ≤ xⱼ} = {ω ∈ Ω : X(ω) ∈ ×_{j=1}^{k}(−∞, xⱼ]} are members of ℱ, where the Xⱼ's are the components of X.
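On a finite sample space, where ℱ can be listed as a set of subsets, this criterion can be checked mechanically. The sketch below is our own illustrative construction (not from the text): ℱ is the four-element σ-algebra {∅, evens, odds, Ω} on a die roll, and since a map on a finite space takes finitely many values, it suffices to check the sets {ω : X(ω) ≤ x} for x ranging over those values:

```python
# A finite sample space and a small sigma-algebra on it
# (frozensets so that set membership tests work).
Omega = frozenset({1, 2, 3, 4, 5, 6})
F = {frozenset(), frozenset({2, 4, 6}), frozenset({1, 3, 5}), Omega}

def is_measurable(X, F, Omega):
    # Theorem 1.10's criterion: every set {omega : X(omega) <= x}
    # must lie in F; on a finite space, checking x at the values
    # taken by X covers all thresholds.
    return all(
        frozenset(w for w in Omega if X(w) <= x) in F
        for x in {X(w) for w in Omega}
    )

X = lambda w: 1 if w % 2 == 0 else 0   # parity indicator
Y = lambda w: w                        # the identity map

print(is_measurable(X, F, Omega))  # True: preimages are odds, Omega
print(is_measurable(Y, F, Omega))  # False: {1} is not a member of F
```

The identity map Y fails because this small ℱ cannot distinguish individual outcomes, only even from odd.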

Proof: Consider the case k = 1. Suppose that {ω ∈ Ω : X(ω) ∈ (−∞, x]} ∈ ℱ, ∀x ∈ ℝ. Let 𝒟 be the collection of all Borel sets B for which {ω ∈ Ω : X(ω) ∈ B} ∈ ℱ. Then 𝒟 ⊂ ℬ, and 𝒟 contains the collection of half-open intervals (−∞, x], x ∈ ℝ. If 𝒟 is a σ-algebra itself, it is a σ-algebra containing the half-open intervals. But ℬ is the smallest σ-algebra containing the half-open intervals (see Theorem 1.6), and thus ℬ ⊂ 𝒟; hence, 𝒟 = ℬ. Therefore, it suffices to prove that 𝒟 is a σ-algebra:

(a) Let B ∈ 𝒟. Then {ω ∈ Ω : X(ω) ∈ B} ∈ ℱ; hence, the complement

Ω \ {ω ∈ Ω : X(ω) ∈ B} = {ω ∈ Ω : X(ω) ∈ B̃} ∈ ℱ, and thus B̃ ∈ 𝒟, where B̃ denotes the complement of B.

(b) Next, let Bⱼ ∈ 𝒟 for j = 1, 2, …. Then {ω ∈ Ω : X(ω) ∈ Bⱼ} ∈ ℱ; hence,

∪_{j=1}^{∞} {ω ∈ Ω : X(ω) ∈ Bⱼ} = {ω ∈ Ω : X(ω) ∈ ∪_{j=1}^{∞} Bⱼ} ∈ ℱ, and thus ∪_{j=1}^{∞} Bⱼ ∈ 𝒟.

The proof of the case k > 1 is similar. Q.E.D.[10]

The sets {ω ∈ Ω : X(ω) ∈ B} are usually denoted by X⁻¹(B):

X⁻¹(B) = {ω ∈ Ω : X(ω) ∈ B}.

The collection ℱ_X = {X⁻¹(B), ∀B ∈ ℬ} is a σ-algebra itself (Exercise: Why?) and is called the σ-algebra generated by the random variable X. More generally,

Definition 1.9: Let X be a random variable (k = 1) or a random vector (k > 1). The σ-algebra ℱ_X = {X⁻¹(B), ∀B ∈ ℬₖ} is called the σ-algebra generated by X.

In the coin-tossing case, the mapping X is one-to-one, and therefore in that case ℱ_X is the same as ℱ, but in general ℱ_X will be smaller than ℱ. For example, roll a die and let X = 1 if the outcome is even and X = 0 if the outcome is odd. Then

ℱ_X = {{1, 2, 3, 4, 5, 6}, {2, 4, 6}, {1, 3, 5}, ∅},

whereas ℱ in this case consists of all subsets of Ω = {1, 2, 3, 4, 5, 6}.
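For a random variable with finitely many values, ℱ_X can be enumerated directly: the subsets of the finite range stand in for the Borel sets B, and ℱ_X collects their preimages. A sketch for this die-parity example (the helper names are our own):

```python
from itertools import chain, combinations

Omega = frozenset({1, 2, 3, 4, 5, 6})
X = lambda w: 1 if w % 2 == 0 else 0   # 1 if even, 0 if odd
values = sorted({X(w) for w in Omega})  # the range of X: [0, 1]

def powerset(s):
    """All subsets of s, as tuples."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# F_X = {X^{-1}(B) : B a subset of range(X)}
F_X = {frozenset(w for w in Omega if X(w) in B) for B in powerset(values)}

print(len(F_X))  # 4: the empty set, odds, evens, and Omega
```

The four preimages are exactly the elements of ℱ_X listed in the text.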

Given a k-dimensional random vector X, or a random variable X (the case k = 1), define for arbitrary Borel sets B ∈ ℬₖ:

μ_X(B) = P(X⁻¹(B)) = P({ω ∈ Ω : X(ω) ∈ B}). (1.24)

Then μ_X(·) is a probability measure on {ℝᵏ, ℬₖ}:

(a) for all B ∈ ℬₖ, μ_X(B) ≥ 0;

(b) μ_X(ℝᵏ) = 1;

(c) for all disjoint Bⱼ ∈ ℬₖ, μ_X(∪_{j=1}^{∞} Bⱼ) = Σ_{j=1}^{∞} μ_X(Bⱼ).[8]

Thus, the random variable X maps the probability space {Ω, ℱ, P} into a new probability space, {ℝ, ℬ, μ_X}, which in its turn is mapped back by X⁻¹ into the (possibly smaller) probability space {Ω, ℱ_X, P}. The behavior of random vectors is similar.

Definition 1.10: The probability measure μ_X(·) defined by (1.24) is called the probability measure induced by X.
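For the die-parity example, the induced measure μ_X and properties (a)–(c) can be verified numerically. A minimal sketch, assuming a uniform P on Ω (the name `mu_X` is ours):

```python
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
P_atom = Fraction(1, 6)                 # uniform P on the atoms of Omega
X = lambda w: 1 if w % 2 == 0 else 0    # parity indicator

def mu_X(B):
    """mu_X(B) = P(X^{-1}(B)), the induced measure of (1.24)."""
    return sum(P_atom for w in Omega if X(w) in B)

print(mu_X({1}))                               # P(X = 1) = 1/2
print(mu_X({0, 1}))                            # the whole range: 1, property (b)
print(mu_X({0}) + mu_X({1}) == mu_X({0, 1}))   # True: additivity, property (c)
```

Nonnegativity, property (a), holds because each term in the sum is a nonnegative rational.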

## 1.8.2. Distribution Functions

For Borel sets of the type (−∞, x], or ×_{j=1}^{k}(−∞, xⱼ] in the multivariate case, the value of the induced probability measure μ_X is called the distribution function:

Definition 1.11: Let X be a random variable (k = 1) or a random vector (k > 1) with induced probability measure μ_X. The function F(x) = μ_X(×_{j=1}^{k}(−∞, xⱼ]), x = (x₁, …, xₖ)ᵀ ∈ ℝᵏ, is called the distribution function of X.

It follows from these definitions and Theorem 1.8 that

Theorem 1.11: A distribution function of a random variable is always right continuous, that is, ∀x ∈ ℝ, lim_{δ↓0} F(x + δ) = F(x), and monotonic nondecreasing, that is, F(x₁) ≤ F(x₂) if x₁ < x₂, with lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1.

Proof: Exercise.

However, a distribution function is not always left continuous. As a counterexample, consider the distribution function of the binomial (n, p) distribution in Section 1.2.2. Recall that the corresponding probability space consists of the sample space Ω = {0, 1, 2, …, n}, the σ-algebra ℱ of all subsets of Ω, and the probability measure P({k}) defined by (1.15). The random variable X involved is defined as X(k) = k, with distribution function

F(x) = 0 for x < 0,

F(x) = Σ_{k≤x} P({k}) for x ∈ [0, n],

F(x) = 1 for x > n.

Now, for example, let x = 1. Then, for 0 < δ < 1, F(1 − δ) = F(0) and F(1 + δ) = F(1); hence, lim_{δ↓0} F(1 + δ) = F(1), but lim_{δ↓0} F(1 − δ) = F(0) < F(1).
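The jump at x = 1 can be checked numerically. The sketch below implements the binomial distribution function directly from the counting formula; the choices n = 3 and p = 1/2 are ours, and exact rational arithmetic makes the one-sided limits visible:

```python
from fractions import Fraction
from math import comb, floor

# Binomial(n, p) distribution function; n = 3, p = 1/2 are
# illustrative choices, not values fixed by the text.
n, p = 3, Fraction(1, 2)

def F(x):
    """F(x) = sum of P({k}) over integers k <= x."""
    if x < 0:
        return Fraction(0)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min(floor(x), n) + 1))

delta = Fraction(1, 10**6)
print(F(1 + delta) == F(1))   # right continuity at x = 1: True
print(F(1 - delta) == F(0))   # the left limit at x = 1 is F(0): True
print(F(0) < F(1))            # so F jumps at x = 1: True
```

Shrinking `delta` further changes nothing, since F is constant between the integers.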

The left limit of a distribution function F in x is usually denoted by F(x−) and is defined as

F(x−) = lim_{δ↓0} F(x − δ).

Thus, if x is a continuity point, then F(x−) = F(x); if x is a discontinuity point, then F(x−) < F(x).

The binomial distribution involved is an example of a discrete distribution. The uniform distribution on [0, 1] derived in Section 1.5 is an example of a continuous distribution with distribution function

F(x) = 0 for x < 0,

F(x) = x for x ∈ [0, 1], (1.25)

F(x) = 1 for x > 1.

In the case of the binomial distribution (1.15), the number of discontinuity points of F is finite, and in the case of the Poisson distribution (1.16) the number of discontinuity points of F is countably infinite. In general we have that

Theorem 1.12: The set of discontinuity points of a distribution function of a random variable is countable.

Proof: Let D be the set of all discontinuity points of the distribution function F(x). Every point x in D is associated with a nonempty open interval (F(x−), F(x)) = (a, b), say, which is contained in [0, 1]. Because F is monotonic nondecreasing, these open intervals are pairwise disjoint, and each of them contains a rational number q with a < q < b; hence, the number of open intervals (a, b) involved is countable because the rational numbers are countable. Therefore, D is countable. Q.E.D.

The results of Theorems 1.11 and 1.12 only hold for distribution functions of random variables, though. It is possible to generalize these results to distribution functions of random vectors, but this generalization is far from trivial and is therefore omitted.

As follows from Definition 1.11, a distribution function of a random variable or vector X is completely determined by the corresponding induced probability measure μ_X(·). But what about the other way around? That is, given a distribution function F(x), is the corresponding induced probability measure μ_X(·) unique? The answer is yes, but I will prove the result only for the univariate case:

Theorem 1.13: Given the distribution function F of a random vector X ∈ ℝᵏ, there exists a unique probability measure μ on {ℝᵏ, ℬₖ} such that for x = (x₁, …, xₖ)ᵀ ∈ ℝᵏ, F(x) = μ(×_{j=1}^{k}(−∞, xⱼ]).

Proof: Let k = 1 and let 𝒯₀ be the collection of all intervals of the type

(a, b), [a, b], (a, b], [a, b), (−∞, a), (−∞, a], (b, ∞), [b, ∞), a ≤ b ∈ ℝ, (1.26)

together with their finite unions, where [a, a] is the singleton {a}, and (a, a), (a, a], and [a, a) should be interpreted as the empty set ∅. Each set in 𝒯₀ can be written as a finite union of disjoint sets of the type (1.26) (compare (1.20)); hence, 𝒯₀ is an algebra. Define for −∞ < a < b < ∞,

μ((a, a)) = μ((a, a]) = μ([a, a)) = μ(∅) = 0,

μ({a}) = F(a) − lim_{δ↓0} F(a − δ), μ((a, b]) = F(b) − F(a),

μ([a, b)) = μ((a, b]) − μ({b}) + μ({a}),

μ([a, b]) = μ((a, b]) + μ({a}),

μ((a, b)) = μ((a, b]) − μ({b}), μ((−∞, a]) = F(a),

μ((−∞, a)) = F(a) − μ({a}), μ((b, ∞)) = 1 − F(b),

μ([b, ∞)) = μ((b, ∞)) + μ({b}),

and let μ(∪_{j=1}^{n} Aⱼ) = Σ_{j=1}^{n} μ(Aⱼ) for disjoint sets A₁, …, Aₙ of the type (1.26). Then the distribution function F defines a probability measure μ on 𝒯₀, and this probability measure coincides on 𝒯₀ with the induced probability measure μ_X. It follows now from Theorem 1.9 that there exists a σ-algebra 𝒯 containing 𝒯₀ for which the same applies. This σ-algebra 𝒯 may be chosen equal to the σ-algebra ℬ of Borel sets. Q.E.D.
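The construction in this proof can be sketched concretely: given a distribution function F, define μ on the interval types via the displayed formulas. Below, the binomial(3, 1/2) distribution function serves as the input F (an illustrative choice of ours); the helper `F_left` computes the left limit F(a−) exactly, which is possible here because this F is a step function:

```python
from fractions import Fraction
from math import comb, floor

n, p = 3, Fraction(1, 2)

def F(x):
    """The binomial(n, p) distribution function."""
    if x < 0:
        return Fraction(0)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(min(floor(x), n) + 1))

def F_left(a):
    """F(a-) for this step function: sum P({k}) over integers k < a."""
    if a <= 0:
        return Fraction(0)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k < a)

# The measure mu on interval types, as defined in the proof:
mu_point = lambda a: F(a) - F_left(a)                  # mu({a}) = F(a) - F(a-)
mu_lopen = lambda a, b: F(b) - F(a)                    # mu((a, b]) = F(b) - F(a)
mu_closed = lambda a, b: mu_lopen(a, b) + mu_point(a)  # mu([a, b])

print(mu_point(1))       # P({1}) = 3/8
print(mu_lopen(0, 3))    # F(3) - F(0) = 7/8
print(mu_closed(0, 3))   # adds mu({0}) = 1/8, giving total mass 1
```

The closed interval [0, 3] carries the full mass 1, as it contains every atom of this distribution.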

The importance of this result is that there is a one-to-one relationship between the distribution function F of a random variable or vector X and the induced probability measure μ_X. Therefore, the distribution function contains all the information about μ_X.

Definition 1.12: A distribution function F on ℝᵏ and its associated probability measure μ on {ℝᵏ, ℬₖ} are called absolutely continuous with respect to Lebesgue measure if for every Borel set B in ℝᵏ with zero Lebesgue measure, μ(B) = 0.

We will need this concept in the next section.
