# Mathematical Expectation

With these new integrals introduced, we can now answer the second question stated at the end of the introduction: How do we define the mathematical expectation if the distribution of $X$ is neither discrete nor absolutely continuous?

Definition 2.12: The mathematical expectation of a random variable $X$ is defined as $E(X) = \int X(\omega)\,dP(\omega)$ or, equivalently, as $E(X) = \int x\,dF(x)$ (cf. (2.15)), where $F$ is the distribution function of $X$, provided that the integrals involved are defined. Similarly, if $g(x)$ is a Borel-measurable function on $\mathbb{R}^k$ and $X$ is a random vector in $\mathbb{R}^k$, then, equivalently, $E[g(X)] = \int g(X(\omega))\,dP(\omega) = \int g(x)\,dF(x)$, provided that the integrals involved are defined.

Note that the latter part of Definition 2.12 covers both examples (2.1) and (2.3).
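To make the integral $\int x\,dF(x)$ concrete for a distribution that is neither discrete nor absolutely continuous, the following minimal sketch approximates it by a Riemann-Stieltjes sum. The example distribution, $X = \min(U, 0.5)$ with $U$ uniform on $[0, 1]$, and the helper names `F` and `stieltjes_expectation` are illustrative assumptions, not taken from the text.

```python
def F(x: float) -> float:
    """Distribution function of X = min(U, 0.5), U uniform on [0, 1].

    Assumed example: F is continuous on [0, 0.5) and jumps by 0.5 at x = 0.5.
    """
    if x < 0.0:
        return 0.0
    if x < 0.5:
        return x
    return 1.0


def stieltjes_expectation(dist, lo: float, hi: float, n: int) -> float:
    """Approximate the integral of x dF(x) over (lo, hi] by a Riemann-Stieltjes sum."""
    step = (hi - lo) / n
    total = 0.0
    prev = dist(lo)
    for i in range(1, n + 1):
        x = lo + i * step
        cur = dist(x)
        # Right endpoint of the cell times the probability mass of the cell.
        total += x * (cur - prev)
        prev = cur
    return total


approx = stieltjes_expectation(F, -1.0, 1.0, 200_000)
# Continuous part (integral of x over [0, 0.5)) plus the atom at 0.5.
exact = 0.125 + 0.5 * 0.5
print(approx, exact)
```

The sum picks up both the continuous part and the point mass without treating them separately, which is the point of defining $E(X)$ through $dF$.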

As motivated in the introduction, the mathematical expectation $E[g(X)]$ may be interpreted as the limit of the average payoff of a repeated game with payoff function $g$. This is related to the strong law of large numbers, which we will discuss in Chapter 7: Let $X_1, X_2, X_3, \ldots$ be a sequence of independent random variables or vectors, each distributed the same as $X$, and let $g$ be a Borel-measurable function such that $E[|g(X)|] < \infty$. Then

$$P\left(\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} g(X_j) = E[g(X)]\right) = 1.$$
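The average-payoff interpretation can be illustrated by simulation. The sketch below assumes, purely for illustration, that $X$ is uniform on $[0, 1]$ and that the payoff function is $g(x) = x^2$, so that $E[g(X)] = 1/3$; the function names and the seed are hypothetical choices.

```python
import random


def average_payoff(g, draw, n):
    """Return (1/n) * sum of g(X_j) over n independent draws X_j = draw()."""
    total = 0.0
    for _ in range(n):
        total += g(draw())
    return total / n


def payoff(x):
    """Assumed payoff function g(x) = x^2."""
    return x * x


rng = random.Random(2024)  # fixed seed so that the run is reproducible
for n in (100, 10_000, 200_000):
    # Each average uses fresh draws; the values should settle near 1/3.
    print(n, average_payoff(payoff, rng.random, n))
```

A single simulated path can only suggest the almost-sure convergence stated above; it does not prove it.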

There are a few important special cases of the function $g$, in particular the variance of $X$, which measures the variation of $X$ around its expectation $E(X)$, and the covariance of a pair of random variables $X$ and $Y$, which measures how $X$ and $Y$ fluctuate together around their expectations:

Definition 2.13: The $m$th moment ($m = 1, 2, 3, \ldots$) of a random variable $X$ is defined as $E(X^m)$, and the $m$th central moment of $X$ is defined by $E(|X - \mu_x|^m)$, where $\mu_x = E(X)$. The second central moment, for instance, is called the variance of $X$: $\mathrm{var}(X) = E[(X - \mu_x)^2] = \sigma_x^2$. The covariance of a pair $(X, Y)$ of random variables is defined as $\mathrm{cov}(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$, where $\mu_x$ is the same as before and $\mu_y = E(Y)$. The correlation (coefficient) of a pair $(X, Y)$ of random variables is defined as

$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{var}(X)}\,\sqrt{\mathrm{var}(Y)}}.$$

The correlation coefficient measures the extent to which $Y$ can be approximated by a linear function of $X$, and vice versa. In particular,

$$\text{if exactly } Y = \alpha + \beta X, \text{ then } \mathrm{corr}(X, Y) = \begin{cases} 1 & \text{if } \beta > 0, \\ -1 & \text{if } \beta < 0. \end{cases} \tag{2.17}$$
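The statement (2.17) can be checked on a small example. The sketch below computes the correlation for a discrete $X$ with an assumed, arbitrary probability mass function, using the standard formula $\mathrm{corr}(X, Y) = \mathrm{cov}(X, Y)/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}$; the support points, probabilities, and coefficients are illustrative choices only.

```python
from math import sqrt


def expectation(values, probs):
    """E(V) for a discrete variable taking values[i] with probability probs[i]."""
    return sum(v * p for v, p in zip(values, probs))


def corr(xs, ys, probs):
    """Correlation of (X, Y) when (xs[i], ys[i]) occurs with probability probs[i]."""
    mu_x = expectation(xs, probs)
    mu_y = expectation(ys, probs)
    cov = expectation([(x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)], probs)
    var_x = expectation([(x - mu_x) ** 2 for x in xs], probs)
    var_y = expectation([(y - mu_y) ** 2 for y in ys], probs)
    return cov / sqrt(var_x * var_y)


xs = [0.0, 1.0, 2.0, 5.0]      # assumed support of X
probs = [0.1, 0.4, 0.3, 0.2]   # assumed probabilities, summing to 1
alpha = 3.0

corr_up = corr(xs, [alpha + 2.0 * x for x in xs], probs)    # beta = 2 > 0
corr_down = corr(xs, [alpha - 0.5 * x for x in xs], probs)  # beta = -0.5 < 0
print(corr_up, corr_down)
```

Up to floating-point rounding, the two values are $1$ and $-1$, whatever the magnitude of $\beta$.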

Moreover,

Definition 2.14: Random variables $X$ and $Y$ are said to be uncorrelated if $\mathrm{cov}(X, Y) = 0$. A sequence of random variables $X_j$ is uncorrelated if, for all $i \neq j$, $X_i$ and $X_j$ are uncorrelated.

Furthermore, it is easy to verify that

Theorem 2.19: If $X_1, \ldots, X_n$ are uncorrelated, then $\mathrm{var}\left(\sum_{j=1}^{n} X_j\right) = \sum_{j=1}^{n} \mathrm{var}(X_j)$.

Proof: Exercise.
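Before attempting the proof, it may help to verify the theorem on a concrete case. The sketch below uses two independent fair dice (an assumed example; independent variables with finite variances are uncorrelated) and exact rational arithmetic, so the equality holds exactly rather than approximately.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 36)  # probability of each outcome (x1, x2) of two fair dice


def E(h):
    """Expectation of h(X1, X2) under the joint distribution of the two dice."""
    return sum(h(x1, x2) * p for x1, x2 in product(faces, faces))


def var(h):
    """Variance of h(X1, X2), computed as E[(h - E(h))^2]."""
    mu = E(h)
    return E(lambda x1, x2: (h(x1, x2) - mu) ** 2)


# Covariance of X1 and X2: zero, so the two variables are uncorrelated.
cov12 = E(lambda x1, x2: x1 * x2) - E(lambda x1, x2: x1) * E(lambda x1, x2: x2)

var_of_sum = var(lambda x1, x2: x1 + x2)
sum_of_vars = var(lambda x1, x2: x1) + var(lambda x1, x2: x2)
print(cov12, var_of_sum, sum_of_vars)
```

The check covers only $n = 2$ and one distribution; the general argument expands the square of $\sum_j (X_j - E(X_j))$ and uses the fact that all cross terms have zero expectation.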

## 2.6. Some Useful Inequalities Involving Mathematical Expectations

There are a few inequalities that will prove to be useful later on – in particular the inequalities of Chebishev, Hölder, Liapounov, and Jensen.

### 2.6.1. Chebishev's Inequality

Let $X$ be a nonnegative random variable with distribution function $F(x)$, and let $p(x)$ be a monotonic, increasing, nonnegative Borel-measurable function on $[0, \infty)$. Then, for arbitrary $\varepsilon > 0$,

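$$\begin{aligned} E[p(X)] &= \int p(x)\,dF(x) = \int_{\{p(x) > p(\varepsilon)\}} p(x)\,dF(x) + \int_{\{p(x) \le p(\varepsilon)\}} p(x)\,dF(x) \\ &\ge \int_{\{p(x) > p(\varepsilon)\}} p(x)\,dF(x) \ge p(\varepsilon) \int_{\{p(x) > p(\varepsilon)\}} dF(x) \\ &= p(\varepsilon) \int_{\{x > \varepsilon\}} dF(x) = p(\varepsilon)\,(1 - F(\varepsilon)); \end{aligned} \tag{2.18}$$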

hence,

$$P(X > \varepsilon) = 1 - F(\varepsilon) \le \frac{E[p(X)]}{p(\varepsilon)}. \tag{2.19}$$

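In particular, it follows from (2.19) that, for a random variable $Y$ with expected value $\mu_y = E(Y)$ and variance $\sigma_y^2$ (apply (2.19) to the nonnegative random variable $|Y - \mu_y|$ with the function $x^2$ and threshold $\sqrt{\sigma_y^2/\varepsilon}$),

$$P\left(\left\{\omega \in \Omega : |Y(\omega) - \mu_y| > \sqrt{\sigma_y^2/\varepsilon}\right\}\right) \le \varepsilon. \tag{2.20}$$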
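The bound (2.20) can be checked numerically. The sketch below assumes, for illustration only, that $Y$ is the outcome of a fair die, and compares the tail probability $P(|Y - \mu_y| > \sqrt{\sigma_y^2/\varepsilon})$ with $\varepsilon$ for a few arbitrary values of $\varepsilon$.

```python
from math import sqrt

values = [1, 2, 3, 4, 5, 6]  # assumed example: outcome of a fair die
prob = 1.0 / 6.0

mu = sum(v * prob for v in values)                # expected value of Y
var = sum((v - mu) ** 2 * prob for v in values)   # variance of Y


def tail_prob(eps):
    """P(|Y - mu| > sqrt(var / eps)), computed exactly from the mass function."""
    threshold = sqrt(var / eps)
    return sum(prob for v in values if abs(v - mu) > threshold)


checks = [(eps, tail_prob(eps)) for eps in (0.25, 0.5, 0.75, 1.0)]
for eps, tail in checks:
    # Chebishev's inequality asserts tail <= eps in every row.
    print(eps, tail, tail <= eps)
```

In this example the bound is far from tight, which is typical: the inequality uses only the mean and the variance of $Y$.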

### 2.6.2. Hölder's Inequality

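Hölder's inequality is based on the fact that $\ln(x)$ is a concave function on $(0, \infty)$: for $0 < a < b$ and $0 < \lambda < 1$, $\ln(\lambda a + (1 - \lambda) b) \ge \lambda \ln(a) + (1 - \lambda) \ln(b)$;

hence,

$$\lambda a + (1 - \lambda) b \ge a^{\lambda} b^{1 - \lambda}. \tag{2.21}$$

Now let $X$ and $Y$ be random variables, and put $a = |X|^p / E(|X|^p)$, $b = |Y|^q / E(|Y|^q)$, where $p > 1$ and $p^{-1} + q^{-1} = 1$. Then it follows from (2.21), with $\lambda = 1/p$ and $1 - \lambda = 1/q$, that

$$\frac{1}{p} \frac{|X|^p}{E(|X|^p)} + \frac{1}{q} \frac{|Y|^q}{E(|Y|^q)} \ge \left(\frac{|X|^p}{E(|X|^p)}\right)^{1/p} \left(\frac{|Y|^q}{E(|Y|^q)}\right)^{1/q} = \frac{|X \cdot Y|}{(E(|X|^p))^{1/p}\,(E(|Y|^q))^{1/q}}.$$

Note that the expectation of the left-hand side is $p^{-1} + q^{-1} = 1$.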

Taking expectations yields Hölder's inequality:

$$E(|X \cdot Y|) \le (E(|X|^p))^{1/p}\,(E(|Y|^q))^{1/q}, \quad \text{where } p > 1 \text{ and } \frac{1}{p} + \frac{1}{q} = 1. \tag{2.22}$$

For the case $p = q = 2$, inequality (2.22) reads $E(|X \cdot Y|) \le \sqrt{E(X^2)}\,\sqrt{E(Y^2)}$, which is known as the Cauchy-Schwarz inequality.
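As a numerical illustration, the sketch below evaluates both sides of (2.22) for a pair $(X, Y)$ with an assumed four-point joint distribution and several values of $p$, including the Cauchy-Schwarz case $p = 2$. The support points and probabilities are arbitrary illustrative choices.

```python
xs = [-2.0, 0.5, 1.0, 3.0]         # assumed values of X
ys = [1.0, -4.0, 2.0, 0.5]         # assumed values of Y, paired with xs
probs = [0.25, 0.25, 0.25, 0.25]   # probability of each pair (xs[i], ys[i])


def holder_sides(p):
    """Return (E|XY|, (E|X|^p)^(1/p) * (E|Y|^q)^(1/q)) with 1/p + 1/q = 1."""
    q = p / (p - 1.0)  # conjugate exponent of p
    lhs = sum(abs(x * y) * w for x, y, w in zip(xs, ys, probs))
    norm_x = sum(abs(x) ** p * w for x, w in zip(xs, probs)) ** (1.0 / p)
    norm_y = sum(abs(y) ** q * w for y, w in zip(ys, probs)) ** (1.0 / q)
    return lhs, norm_x * norm_y


results = {p: holder_sides(p) for p in (1.5, 2.0, 3.0)}
for p, (lhs, rhs) in results.items():
    # The first number should never exceed the second.
    print(p, lhs, rhs)
```

The left-hand side does not depend on $p$, so the computation also shows how the quality of the bound varies with the choice of exponent.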