# 3.6 CHANGE OF VARIABLES

In this section we shall primarily study how to derive the probability distribution of a random variable Y from that of another random variable X when Y is given as a function, say φ(X), of X. The problem is simple if X and Y are discrete, as we saw in Section 3.2.1; here we shall assume that they are continuous.

We shall initially deal with monotonic functions (that is, either strictly increasing or decreasing) and later consider other cases. We shall first prove a theorem formally and then illustrate it by a diagram.

THEOREM 3.6.1 Let f(x) be the density of X and let Y = φ(X), where φ is a monotonic differentiable function. Then the density g(y) of Y is given by

(3.6.1) g(y) = f[φ⁻¹(y)] · |dφ⁻¹(y)/dy|,

where φ⁻¹ is the inverse function of φ. (Do not mistake it for 1 over φ.)

Proof. We have

(3.6.2) P(Y ≤ y) = P[φ(X) ≤ y].

Suppose ф is increasing. Then we have from (3.6.2)

(3.6.3) P(Y ≤ y) = P[X ≤ φ⁻¹(y)].

Denote the distribution functions of Y and X by G(·) and F(·), respectively. Then (3.6.3) can be written as

(3.6.4) G(y) = F[φ⁻¹(y)].

Differentiating both sides of (3.6.4) with respect to y, we obtain

(3.6.5) g(y) = f[φ⁻¹(y)] · dφ⁻¹(y)/dy.

Next, suppose ф is decreasing. Then we have from (3.6.2)

(3.6.6) P(Y ≤ y) = P[X ≥ φ⁻¹(y)],

which can be rewritten as

(3.6.7) G(y) = 1 − F[φ⁻¹(y)].

Differentiating both sides of (3.6.7), we obtain

(3.6.8) g(y) = −f[φ⁻¹(y)] · dφ⁻¹(y)/dy.

Since dφ⁻¹/dy > 0 when φ is increasing and dφ⁻¹/dy < 0 when φ is decreasing, both (3.6.5) and (3.6.8) reduce to (3.6.1), and the theorem follows. □

The term in absolute value on the right-hand side of (3.6.1) is called the Jacobian of transformation.

Since dφ⁻¹/dy = (dφ/dx)⁻¹, we can write (3.6.1) as

(3.6.9) g(y) = f(x) / |dy/dx| (or, mnemonically, g(y)|dy| = f(x)|dx|),

which is a more convenient formula than (3.6.1) in most cases. However, since the right-hand side of (3.6.9) is still given as a function of x, one must replace x with φ⁻¹(y) to obtain the final answer.
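As a numerical sketch (not part of the text), formula (3.6.1) can be checked in Python for the hypothetical monotone transform Y = e^X with X uniform on (0, 1); here φ⁻¹(y) = log y and dφ⁻¹/dy = 1/y, so g(y) = 1/y on (1, e).

```python
import math
import random

# Density of Y = exp(X) for X ~ Uniform(0, 1), by Theorem 3.6.1:
# g(y) = f(phi^{-1}(y)) * |d phi^{-1}(y)/dy| = 1 * (1/y) on (1, e).
def g(y):
    return 1.0 / y if 1.0 < y < math.e else 0.0

# Check that g integrates to 1 (midpoint rule on (1, e)).
n = 100_000
width = (math.e - 1.0) / n
total = sum(g(1.0 + (i + 0.5) * width) * width for i in range(n))

# Monte Carlo check of the implied CDF: P(Y <= 2) = integral of 1/y = log 2.
random.seed(0)
draws = 200_000
p_hat = sum(math.exp(random.random()) <= 2.0 for _ in range(draws)) / draws
```

Both checks agree: `total` is close to 1, and `p_hat` is close to log 2 ≈ 0.693.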

EXAMPLE 3.6.1 Suppose f(x) = 1 for 0 < x < 1 and = 0 otherwise. Assuming Y = X², obtain the density g(y) of Y.

Since dy/dx = 2x, we have by (3.6.9)

(3.6.10) g(y) = 1/(2x), 0 < x < 1.

Since x = √y, we have from (3.6.10)

(3.6.11) g(y) = 1/(2√y), 0 < y < 1.

It is a good idea to check the accuracy of the result by verifying that the obtained function is indeed a density. The test is passed in this case, because (3.6.11) is clearly nonnegative and we have

(3.6.12) ∫₀¹ 1/(2√y) dy = [√y]₀¹ = 1.

The same result can be obtained by using the distribution function and without using Theorem 3.6.1, as follows. We have

(3.6.13) G(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y)
= ∫₀^√y f(x) dx = ∫₀^√y dx = √y.

Therefore, differentiating (3.6.13) with respect to y, we obtain

(3.6.14) g(y) = 1/(2√y), 0 < y < 1.

This latter method is lengthier, as it does not utilize the power of Theorem 3.6.1. It has the advantage, however, of being more fundamental.
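The result G(y) = √y in (3.6.13) is easy to confirm by simulation; the following sketch (an illustration, not from the text) compares the empirical distribution function of Y = X² with √y at a few points.

```python
import math
import random

random.seed(1)
n = 200_000
ys = [random.random() ** 2 for _ in range(n)]   # Y = X^2 with X ~ Uniform(0, 1)

# Empirical CDF versus G(y) = sqrt(y) from (3.6.13).
max_err = max(
    abs(sum(y <= y0 for y in ys) / n - math.sqrt(y0))
    for y0 in (0.1, 0.25, 0.5, 0.9)
)
```

The discrepancy `max_err` is on the order of 1/√n, as expected for an empirical distribution function.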

Figure 3.10 illustrates the result of Theorem 3.6.1. Since Y lies between y and y + Δy if and only if X lies between x and x + Δx, shaded regions (1) and (2) must have the same area. If Δx is small then Δy is also small, and the area of (1) is approximately f(x)Δx and the area of (2) is approximately g(y)Δy. Therefore we have approximately

(3.6.15) g(y)Δy = f(x)Δx.

But if Δx is small, we also have approximately

(3.6.16) Δy = (dy/dx)Δx.

From (3.6.15) and (3.6.16) we have

(3.6.17) g(y) = f(x) / (dy/dx).

Since we can make Δx arbitrarily small, (3.6.17) in fact holds exactly. In this example we have considered an increasing function. It is clear that we would need the absolute value of dy/dx if we were to consider a decreasing function instead.

In the case of a nonmonotonic function, the formula of Theorem 3.6.1 will not work, but we can get the correct result if we understand the process by which the formula is derived, either through the formal approach, using the distribution function, or through the graphic approach.

EXAMPLE 3.6.2 Given f(x) = 1/2, −1 < x < 1, and

Y = X if X > 0,
= X² if X < 0,

find g(y).

We shall first employ a graphic approach. In Figure 3.11 we must have area (3) = area (1) + area (2). Therefore

(3.6.18) g(y)Δy = f(x₁)Δx₁ + f(x₂)Δx₂ = (1/2)Δx₁ + (1/2)Δx₂.

Since y = x₁ and y = x₂², we have Δx₁ = Δy and, approximately, Δx₂ = Δy/(2√y). Therefore

(3.6.19) g(y) = 1/2 + 1/(4√y), 0 < y < 1.

Figure 3.11 is helpful even in a formal approach. For 0 < y < 1 we have

(3.6.20) P(Y ≤ y) = P(x₂ ≤ X ≤ x₁)
= P(−√y ≤ X ≤ y)
= P(X ≤ y) − P(X ≤ −√y).

Therefore

(3.6.21) G(y) = F(y) − F(−√y).

Differentiating (3.6.21) with respect to y, we get

(3.6.22) g(y) = f(y) + f(−√y)/(2√y) = 1/2 + 1/(4√y), 0 < y < 1.
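Both (3.6.21) and (3.6.22) can be checked numerically; the sketch below (an illustration, not from the text) simulates the piecewise transform and also verifies that the density in (3.6.22) integrates to one.

```python
import math
import random

random.seed(2)
n = 200_000
xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
ys = [x if x > 0 else x * x for x in xs]        # Y = X if X > 0, X^2 if X < 0

# G(y) = F(y) - F(-sqrt(y)) with F(x) = (x + 1)/2, i.e. G(y) = (y + sqrt(y))/2.
max_err = max(
    abs(sum(y <= y0 for y in ys) / n - (y0 + math.sqrt(y0)) / 2.0)
    for y0 in (0.1, 0.25, 0.5, 0.9)
)

# The density g(y) = 1/2 + 1/(4 sqrt(y)) of (3.6.22) integrates to 1 on (0, 1).
m = 100_000
total = sum((0.5 + 0.25 / math.sqrt((i + 0.5) / m)) / m for i in range(m))
```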

The result of Example 3.6.2 can be generalized and stated formally as follows.

THEOREM 3.6.2 Suppose the inverse of y = φ(x) is multivalued and can be written as

(3.6.23) x = φᵢ(y), i = 1, 2, . . . , n_y.

Note that n_y indicates the possibility that the number of values of x varies with y. Then the density g(·) of Y is given by

(3.6.24) g(y) = Σ_{i=1}^{n_y} f[φᵢ(y)] / |φ′[φᵢ(y)]|,

where f(·) is the density of X and φ′ is the derivative of φ.
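Theorem 3.6.2 translates directly into code. The sketch below (with hypothetical helper names, not from the text) applies it to X uniform on (−1, 1) with Y = X², whose inverse has the two branches ±√y, and checks that the resulting density integrates to one.

```python
import math

def density_of_transform(f, branches, dphi, y):
    """Theorem 3.6.2: g(y) = sum over the inverse branches x_i = phi_i(y)
    of f(x_i) / |phi'(x_i)|."""
    return sum(f(x) / abs(dphi(x)) for x in branches(y))

# X ~ Uniform(-1, 1), Y = X^2; the inverse branches are +sqrt(y) and -sqrt(y).
f = lambda x: 0.5 if -1.0 < x < 1.0 else 0.0
branches = lambda y: (math.sqrt(y), -math.sqrt(y))
dphi = lambda x: 2.0 * x                       # phi'(x) for phi(x) = x^2

def g(y):
    return density_of_transform(f, branches, dphi, y)

# g(y) = 0.5/(2 sqrt(y)) + 0.5/(2 sqrt(y)) = 1/(2 sqrt(y)), integrating to 1.
m = 100_000
total = sum(g((i + 0.5) / m) / m for i in range(m))
```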

So far we have studied the transformation of one random variable into another. In the next three examples we shall show how to obtain the density of a random variable which is a function of two other random variables. We shall always use the method in which the distribution function is obtained first and then the density is obtained by differentiation. Later in this section we shall discuss an alternative method, called the Jacobian method, which is a generalization of Theorem 3.6.1; but the present method is more fundamental and will work even when the Jacobian method fails.

EXAMPLE 3.6.3 Assume f(x, y) = 1 for 0 < x < 1, 0 < y < 1 and = 0 otherwise. Calculate the density function g(z) of Z = max(X, Y).

For any z, the event (Z ≤ z) is equivalent to the event (X ≤ z, Y ≤ z); hence, the probability of the two events is the same. Since X and Y are independent, we have

(3.6.25) P(Z ≤ z) = P(X ≤ z)P(Y ≤ z)
= z², 0 < z < 1.

Since the density g(z) is the derivative of the distribution function, we conclude that g(z) = 2z, 0 < z < 1.
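A simulation sketch (not from the text) confirms g(z) = 2z: with G(z) = z², P(Z ≤ 0.5) = 0.25, and E(Z) = ∫₀¹ z · 2z dz = 2/3.

```python
import random

random.seed(3)
n = 200_000
zs = [max(random.random(), random.random()) for _ in range(n)]  # Z = max(X, Y)

p_half = sum(z <= 0.5 for z in zs) / n   # should be near G(0.5) = 0.25
mean_z = sum(zs) / n                     # should be near 2/3
```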

EXAMPLE 3.6.4 Let X and Y have the joint density f(x, y) = 1 for 0 < x < 1 and 0 < y < 1. Obtain the density of Z defined by Z = Y/X.

See Figure 3.12. Let F(·) be the distribution function of Z. Then

(3.6.26) F(z) = P(Y/X ≤ z) = P(Y ≤ zX)
= area A = z/2 for 0 ≤ z ≤ 1,
= 1 − area B = 1 − 1/(2z) for z ≥ 1.

Differentiating (3.6.26) with respect to z, we get

(3.6.27) f(z) = 1/2 for 0 < z < 1,
= 1/(2z²) for z > 1.

EXAMPLE 3.6.5 Assume again f(x, y) = 1 for 0 < x < 1, 0 < y < 1 and = 0 otherwise. Obtain the conditional density f(x | Y = 0.5 + X).

This problem was solved earlier, in Example 3.4.9, but here we shall present an alternative solution using the distribution function. The present solution is more complicated but serves as an exercise. Define Z = Y − X − 0.5. Then we have

(3.6.28) F(z | x) = P(Z ≤ z | X = x) = P(Y ≤ z + x + 0.5)
= z + x + 0.5, −0.5 − x < z < 0.5 − x.

Therefore

(3.6.29) f(z | x) = 1, −0.5 − x < z < 0.5 − x, 0 < x < 1.

Therefore, from (3.6.29) and the marginal density of X,

(3.6.30) f(x, z) = 1, −0.5 − x < z < 0.5 − x, 0 < x < 1.

The domain of the joint density f(x, z) is indicated by the shaded region in Figure 3.13. From (3.6.30) we get

(3.6.31) f(z) = ∫₀^{(1/2)−z} dx = 1/2 − z, −1/2 < z < 1/2,
= ∫_{−(1/2)−z}^{1} dx = 3/2 + z, −3/2 < z < −1/2.

Therefore, from (3.6.30) and (3.6.31), we finally get

(3.6.32) f(x | Y = 0.5 + X) = f(x | Z = 0) = 2 for 0 < x < 0.5,
= 0 otherwise.
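As a numerical aside (not in the text), the marginal density (3.6.31) implies P(Z ≤ 0) = 1 − ∫₀^{1/2}(1/2 − z)dz = 7/8, which a simulation of Z = Y − X − 0.5 reproduces.

```python
import random

random.seed(5)
n = 200_000
# Z = Y - X - 0.5 with X, Y independent Uniform(0, 1).
zs = [random.random() - random.random() - 0.5 for _ in range(n)]

p0 = sum(z <= 0 for z in zs) / n   # should be near 7/8 = 0.875
```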

Theorem 3.6.3 generalizes Theorem 3.6.1 to a linear transformation of a bivariate random variable into another.

THEOREM 3.6.3 Let f(x₁, x₂) be the joint density of a bivariate random variable (X₁, X₂) and let (Y₁, Y₂) be defined by a linear transformation

(3.6.33) Y₁ = a₁₁X₁ + a₁₂X₂,
Y₂ = a₂₁X₁ + a₂₂X₂.

Suppose a₁₁a₂₂ − a₁₂a₂₁ ≠ 0, so that (3.6.33) can be solved for X₁ and X₂ as

(3.6.34) X₁ = b₁₁Y₁ + b₁₂Y₂,
X₂ = b₂₁Y₁ + b₂₂Y₂.

Then the joint density g(y₁, y₂) of (Y₁, Y₂) is given by

(3.6.35) g(y₁, y₂) = f(b₁₁y₁ + b₁₂y₂, b₂₁y₁ + b₂₂y₂) / |a₁₁a₂₂ − a₁₂a₂₁|,

where the support of g, that is, the range of (y₁, y₂) over which g is positive, must be appropriately determined.

The absolute value |a₁₁a₂₂ − a₁₂a₂₁| appearing on the right-hand side of (3.6.35) is called the Jacobian of transformation. That it is needed can best be understood by the following geometric consideration. Consider a small rectangle in the X₁-X₂ plane whose four corners, counterclockwise starting from the southwest corner, are (X₁, X₂), (X₁ + ΔX₁, X₂), (X₁ + ΔX₁, X₂ + ΔX₂), and (X₁, X₂ + ΔX₂). The linear mapping (3.6.33) maps this rectangle to a parallelogram in the Y₁-Y₂ plane, whose corners are (a₁₁X₁ + a₁₂X₂, a₂₁X₁ + a₂₂X₂), (a₁₁X₁ + a₁₂X₂ + a₁₁ΔX₁, a₂₁X₁ + a₂₂X₂ + a₂₁ΔX₁), (a₁₁X₁ + a₁₂X₂ + a₁₁ΔX₁ + a₁₂ΔX₂, a₂₁X₁ + a₂₂X₂ + a₂₁ΔX₁ + a₂₂ΔX₂), and (a₁₁X₁ + a₁₂X₂ + a₁₂ΔX₂, a₂₁X₁ + a₂₂X₂ + a₂₂ΔX₂). The area of the rectangle is ΔX₁ΔX₂, and if we suppose for simplicity that all the a's are positive and that a₁₁a₂₂ − a₁₂a₂₁ > 0, then the area of the parallelogram is (a₁₁a₂₂ − a₁₂a₂₁)ΔX₁ΔX₂.

Chapter 11 shows that a₁₁a₂₂ − a₁₂a₂₁ is the determinant of the 2 × 2 matrix

[a₁₁ a₁₂]
[a₂₁ a₂₂].

By using matrix notation, Theorem 3.6.3 can be generalized to a linear transformation of a general n-variate random variable into another.
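The geometric argument above is easy to verify numerically. The sketch below (with illustrative coefficients, taken from the map of Example 3.6.6 below) pushes the edge vectors of a small rectangle through (3.6.33) and checks that the area of the image parallelogram equals |a₁₁a₂₂ − a₁₂a₂₁| times the rectangle's area.

```python
# Coefficients of a linear map (3.6.33); illustrative values.
a11, a12, a21, a22 = 1.0, 2.0, 1.0, -1.0
dx1, dx2 = 0.1, 0.2                      # sides of a small rectangle

# Images of the rectangle's edge vectors (dx1, 0) and (0, dx2).
e1 = (a11 * dx1, a21 * dx1)
e2 = (a12 * dx2, a22 * dx2)

# Parallelogram area = |cross product|; Jacobian = |a11*a22 - a12*a21|.
area = abs(e1[0] * e2[1] - e1[1] * e2[0])
jacobian = abs(a11 * a22 - a12 * a21)    # = 3 here
```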

EXAMPLE 3.6.6 Suppose f(x₁, x₂) = 4x₁x₂ for 0 ≤ x₁ ≤ 1 and 0 ≤ x₂ ≤ 1. If

(3.6.36) Y₁ = X₁ + 2X₂,
Y₂ = X₁ − X₂,

what is the joint density of (Y₁, Y₂)?

Solving (3.6.36) for Xi and X2, we obtain

(3.6.37) X₁ = (1/3)Y₁ + (2/3)Y₂,
X₂ = (1/3)Y₁ − (1/3)Y₂.

Inserting the appropriate numbers into (3.6.35), we immediately obtain

(3.6.38) g(y₁, y₂) = (4/27)(y₁ + 2y₂)(y₁ − y₂).

FIGURE 3.14 Illustration for Example 3.6.6

Next we derive the support of g. Since 0 ≤ x₁ ≤ 1 and 0 ≤ x₂ ≤ 1, we have from (3.6.37)

(3.6.39) 0 ≤ (1/3)y₁ + (2/3)y₂ ≤ 1,
0 ≤ (1/3)y₁ − (1/3)y₂ ≤ 1.

Thus the support of g is given as the inside of the parallelogram in Figure 3.14.
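A quick algebraic spot-check (not from the text): for any (x₁, x₂) in the unit square, the point (y₁, y₂) = (x₁ + 2x₂, x₁ − x₂) must satisfy g(y₁, y₂) = 4x₁x₂/3, since the Jacobian is |−3| = 3.

```python
import random

random.seed(4)

def g(y1, y2):
    # Joint density from (3.6.38).
    return (4.0 / 27.0) * (y1 + 2.0 * y2) * (y1 - y2)

max_err = 0.0
for _ in range(1000):
    x1, x2 = random.random(), random.random()
    y1, y2 = x1 + 2.0 * x2, x1 - x2
    # (3.6.35) requires g(y1, y2) = f(x1, x2) / 3 = 4*x1*x2 / 3.
    max_err = max(max_err, abs(g(y1, y2) - 4.0 * x1 * x2 / 3.0))
```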