# Implications of Linearity

Suppose random variables $y_t$ and $x_t^*$ have finite second moments and their variance-covariance matrix is denoted by

$$
\begin{pmatrix} \sigma_1^2 & \sigma_{12}' \\ \sigma_{12} & \Sigma_{22} \end{pmatrix}.
$$

Then we can always write

$$
y_t = \beta_0 + x_t^{*\prime}\beta_1 + v_t, \qquad (1.1.3)
$$

where $\beta_1 = \Sigma_{22}^{-1}\sigma_{12}$, $\beta_0 = Ey_t - \sigma_{12}'\Sigma_{22}^{-1}Ex_t^*$, $Ev_t = 0$, $Vv_t = \sigma_1^2 - \sigma_{12}'\Sigma_{22}^{-1}\sigma_{12}$, and $Ex_t^* v_t = 0$. It is important to realize that Model 1 implies certain assumptions that (1.1.3) does not: (1.1.3) does not generally imply linearity of $E(y_t|x_t^*)$ because $E(v_t|x_t^*)$ may not generally be zero.
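These moment conditions can be checked numerically. The sketch below (assuming NumPy; the data-generating process is made up, with a deliberately nonlinear conditional mean) computes the best-linear-predictor coefficients from sample moments and verifies that the residual satisfies $Ev_t = 0$ and $Ex_t^* v_t = 0$, even though $E(v_t|x_t^*)$ is not zero:

```python
import numpy as np

# Made-up data with a nonlinear conditional mean: E(y|x) = sin(x1) + x2^2.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=(n, 2))
y = np.sin(x[:, 0]) + x[:, 1] ** 2 + rng.normal(size=n)

# Coefficients of the best linear predictor, from sample moments:
# beta1 = Sigma22^{-1} sigma12,  beta0 = E y - sigma12' Sigma22^{-1} E x*.
Sigma22 = np.cov(x, rowvar=False)
sigma12 = np.array([np.cov(x[:, j], y)[0, 1] for j in range(2)])
beta1 = np.linalg.solve(Sigma22, sigma12)
beta0 = y.mean() - beta1 @ x.mean(axis=0)

v = y - beta0 - x @ beta1
print(v.mean())        # E v_t = 0 (up to floating-point error)
print(x.T @ v / n)     # E x_t* v_t = 0 as well

# But E(v_t | x_t^*) is not zero here: the residual is systematically
# positive where x2 is large in absolute value.
print(v[np.abs(x[:, 1]) > 1].mean())   # clearly positive
```

The unconditional moment conditions hold by construction for any joint distribution with finite second moments; the last line illustrates why they do not imply $E(v_t|x_t^*) = 0$.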

We call $\beta_0 + x_t^{*\prime}\beta_1$ in (1.1.3) the best linear predictor of $y_t$ given $x_t^*$ because $\beta_0$ and $\beta_1$ can be shown to be the values of $b_0$ and $b_1$ that minimize $E(y_t - b_0 - x_t^{*\prime}b_1)^2$. In contrast, the conditional mean $E(y_t|x_t^*)$ is called the best predictor of $y_t$ given $x_t^*$ because $E[y_t - E(y_t|x_t^*)]^2 \le E[y_t - g(x_t^*)]^2$ for any function $g$.
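As a numerical illustration of the two notions of "best," the sketch below (a made-up model, assuming NumPy) compares the mean squared prediction errors of the best linear predictor and the conditional mean in a case where the latter is nonlinear:

```python
import numpy as np

# Made-up model: y_t = x_t^2 + u_t with x_t, u_t independent standard normal,
# so the best predictor E(y|x) = x^2 is nonlinear in x.
rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
u = rng.normal(size=n)
y = x**2 + u

# Best linear predictor: Cov(x, x^2) = 0 for normal x, so it is nearly E y.
b1 = np.cov(x, y)[0, 1] / x.var(ddof=1)
b0 = y.mean() - b1 * x.mean()
mse_blp = np.mean((y - b0 - b1 * x) ** 2)   # approx Var(x^2) + Var(u) = 3

# Best predictor: the conditional mean E(y|x) = x^2.
mse_cond = np.mean((y - x**2) ** 2)         # approx Var(u) = 1

print(mse_cond < mse_blp)  # True
```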

The reader might ask why we work with eq. (1.1.2) rather than with (1.1.3). The answer is that (1.1.3) is so general that it does not allow us to obtain interesting results. For example, whereas the natural estimators of $\beta_0$ and $\beta_1$ can be defined by replacing the moments of $y_t$ and $x_t^*$ that characterize $\beta_0$ and $\beta_1$ with the corresponding sample moments (these estimators actually coincide with the least squares estimator), the mean of the estimator cannot be evaluated without specifying more about the relationship between $x_t^*$ and $v_t$.
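The parenthetical claim, that replacing population moments by sample moments reproduces the least squares estimator, can be sketched as follows (simulated data, purely illustrative):

```python
import numpy as np

# Illustrative data; the coefficients are made up.
rng = np.random.default_rng(1)
n = 1_000
x = rng.normal(size=(n, 2))
y = 0.5 + x @ np.array([1.5, -0.7]) + rng.normal(size=n)

# (a) Replace the population moments in beta_1 = Sigma22^{-1} sigma12 and
#     beta_0 = E y - sigma12' Sigma22^{-1} E x* by sample moments.
Sigma22 = np.cov(x, rowvar=False)
sigma12 = np.array([np.cov(x[:, j], y)[0, 1] for j in range(2)])
b1_mom = np.linalg.solve(Sigma22, sigma12)
b0_mom = y.mean() - b1_mom @ x.mean(axis=0)

# (b) Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose([b0_mom, *b1_mom], b_ols))  # True
```

The two coincide because least squares on demeaned data is exactly the sample-covariance formula, and the degrees-of-freedom factor cancels between numerator and denominator.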

How restrictive is the linearity of $E(y_t|x_t^*)$? It holds if $y_t$ and $x_t^*$ are jointly normal or if $y_t$ and $x_t^*$ are both scalar dichotomous (Bernoulli) variables.¹ But the linearity may not hold for many interesting distributions. Nevertheless, the linearity assumption is not as restrictive as it may appear at first glance because $x_t^*$ can be variables obtained by transforming the original independent variables in various ways. For example, if the conditional mean of $y_t$, the supply of a good, is a quadratic function of the price $p_t$, we can put $x_t^* = (p_t, p_t^2)'$, thereby making $E(y_t|x_t^*)$ linear.
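A minimal sketch of this device, with made-up supply-curve coefficients: regressing on the transformed variables $(p_t, p_t^2)$ recovers a conditional mean that is quadratic in $p_t$ but linear in $x_t^*$.

```python
import numpy as np

# Hypothetical supply curve, quadratic in price: E(y|p) = 3 + 2p - 0.5p^2.
rng = np.random.default_rng(2)
n = 5_000
p = rng.uniform(0.5, 2.0, size=n)
y = 3.0 + 2.0 * p - 0.5 * p**2 + rng.normal(scale=0.1, size=n)

# Linear regression on the transformed regressors x_t^* = (p_t, p_t^2)'.
X = np.column_stack([np.ones(n), p, p**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # close to (3.0, 2.0, -0.5)
```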