Prediction
We shall add to Model 1 the $p$th period relationship (where $p > T$)
$$y_p = \mathbf{x}_p'\boldsymbol{\beta} + u_p, \tag{1.6.1}$$
where $y_p$ and $u_p$ are scalars and $\mathbf{x}_p$ are the $p$th period observations on the regressors that we assume to be random variables distributed independently of $u_p$ and $\mathbf{u}$.¹⁰ We shall also assume that $u_p$ is distributed independently of $\mathbf{u}$ with $Eu_p = 0$ and $Vu_p = \sigma^2$. The problem we shall consider in this section is how to predict $y_p$ by a function of $\mathbf{y}$, $\mathbf{X}$, and $\mathbf{x}_p$ when $\boldsymbol{\beta}$ and $\sigma^2$ are unknown.
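We shall only consider predictors of $y_p$ that can be written in the following form:
$$y_p^* = \mathbf{x}_p'\boldsymbol{\beta}^*, \tag{1.6.2}$$
where $\boldsymbol{\beta}^*$ is an arbitrary estimator of $\boldsymbol{\beta}$ and a function of $\mathbf{y}$ and $\mathbf{X}$. Here, $\boldsymbol{\beta}^*$ may be either linear or nonlinear and unbiased or biased. Although there are more general predictors of the form $f(\mathbf{x}_p, \mathbf{y}, \mathbf{X})$, it is natural to consider (1.6.2) because $\mathbf{x}_p'\boldsymbol{\beta}$ is the best predictor of $y_p$ if $\boldsymbol{\beta}$ is known.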
The mean squared prediction error of $y_p^*$ conditional on $\mathbf{x}_p$ is given by
$$E[(y_p^* - y_p)^2 \mid \mathbf{x}_p] = \sigma^2 + \mathbf{x}_p' E(\boldsymbol{\beta}^* - \boldsymbol{\beta})(\boldsymbol{\beta}^* - \boldsymbol{\beta})' \mathbf{x}_p, \tag{1.6.3}$$
where the equality follows from the assumption of independence between $\boldsymbol{\beta}^*$ and $u_p$. Equation (1.6.3) clearly demonstrates that as long as we consider only predictors of the form (1.6.2) and as long as we use the mean squared prediction error as a criterion for ranking predictors, the prediction of $y_p$ is essentially the same problem as the estimation of $\mathbf{x}_p'\boldsymbol{\beta}$. Thus the better the estimator of $\mathbf{x}_p'\boldsymbol{\beta}$ is, the better the predictor of $y_p$.
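The step behind (1.6.3) can be written out as follows. This is a sketch that uses only (1.6.1), (1.6.2), and the stated independence assumptions; the cross term vanishes because $\boldsymbol{\beta}^*$ depends only on $\mathbf{y}$ and $\mathbf{X}$, is independent of $u_p$, and $Eu_p = 0$.

```latex
\begin{align*}
y_p^* - y_p
  &= \mathbf{x}_p'\boldsymbol{\beta}^* - \mathbf{x}_p'\boldsymbol{\beta} - u_p
   = \mathbf{x}_p'(\boldsymbol{\beta}^* - \boldsymbol{\beta}) - u_p, \\
E[(y_p^* - y_p)^2 \mid \mathbf{x}_p]
  &= \mathbf{x}_p' E(\boldsymbol{\beta}^* - \boldsymbol{\beta})
       (\boldsymbol{\beta}^* - \boldsymbol{\beta})' \mathbf{x}_p
     - 2\,\mathbf{x}_p' E[(\boldsymbol{\beta}^* - \boldsymbol{\beta})\,u_p]
     + E u_p^2 \\
  &= \sigma^2 + \mathbf{x}_p' E(\boldsymbol{\beta}^* - \boldsymbol{\beta})
       (\boldsymbol{\beta}^* - \boldsymbol{\beta})' \mathbf{x}_p .
\end{align*}
```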
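In particular, the result of Section 1.2.5 implies that $\mathbf{x}_p'\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is the LS estimator, is the best predictor in the class of predictors of the form $\mathbf{x}_p'\mathbf{C}'\mathbf{y}$ such that $\mathbf{C}'\mathbf{X} = \mathbf{I}$, which we shall state as the following theorem:
Theorem 1.6.1. Let $\hat{\boldsymbol{\beta}}$ be the LS estimator and $\mathbf{C}$ be an arbitrary $T \times K$ matrix of constants such that $\mathbf{C}'\mathbf{X} = \mathbf{I}$. Then
$$E[(\mathbf{x}_p'\hat{\boldsymbol{\beta}} - y_p)^2 \mid \mathbf{x}_p] \le E[(\mathbf{x}_p'\mathbf{C}'\mathbf{y} - y_p)^2 \mid \mathbf{x}_p], \tag{1.6.4}$$
where the equality holds if and only if $\mathbf{C} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$.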
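Actually we can prove a slightly stronger theorem, which states that the least squares predictor $\mathbf{x}_p'\hat{\boldsymbol{\beta}}$ is the best linear unbiased predictor.
Theorem 1.6.2. Let $\mathbf{d}$ be a $T$-vector the elements of which are either constants or functions of $\mathbf{x}_p$. Then
$$E[(\mathbf{d}'\mathbf{y} - y_p)^2 \mid \mathbf{x}_p] \ge E[(\mathbf{x}_p'\hat{\boldsymbol{\beta}} - y_p)^2 \mid \mathbf{x}_p]$$
for any $\mathbf{d}$ such that $E(\mathbf{d}'\mathbf{y} \mid \mathbf{x}_p) = E(y_p \mid \mathbf{x}_p)$. The equality holds if and only if $\mathbf{d}' = \mathbf{x}_p'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$.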
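Proof. The unbiasedness condition $E(\mathbf{d}'\mathbf{y} \mid \mathbf{x}_p) = E(y_p \mid \mathbf{x}_p)$ implies
$$\mathbf{d}'\mathbf{X} = \mathbf{x}_p'. \tag{1.6.5}$$
Using (1.6.5), we have
$$E[(\mathbf{d}'\mathbf{y} - y_p)^2 \mid \mathbf{x}_p] = E[(\mathbf{d}'\mathbf{u} - u_p)^2 \mid \mathbf{x}_p] = \sigma^2(1 + \mathbf{d}'\mathbf{d}). \tag{1.6.6}$$
But from (1.6.3) we obtain
$$E[(\mathbf{x}_p'\hat{\boldsymbol{\beta}} - y_p)^2 \mid \mathbf{x}_p] = \sigma^2[1 + \mathbf{x}_p'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_p]. \tag{1.6.7}$$
Therefore the theorem follows from
$$\mathbf{d}'\mathbf{d} - \mathbf{x}_p'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_p = [\mathbf{d} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_p]'[\mathbf{d} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_p], \tag{1.6.8}$$
where we have used (1.6.5) again.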
Theorem 1.6.2 implies Theorem 1.6.1 because $\mathbf{C}\mathbf{x}_p$ of Theorem 1.6.1 satisfies the condition for $\mathbf{d}$ given by (1.6.5).¹¹
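The inequality in Theorem 1.6.2 can be checked numerically. The following is a minimal sketch, not part of the original argument: the dimensions, the random seed, and the simulated `X` and `x_p` are all illustrative assumptions. It builds one unbiased weight vector $\mathbf{d}$ other than the least squares weights by adding a component orthogonal to the columns of $\mathbf{X}$, then compares the two conditional mean squared prediction errors in units of $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, illustrative only
T, K = 20, 3                        # hypothetical sample size and number of regressors
X = rng.normal(size=(T, K))         # hypothetical regressor matrix
x_p = rng.normal(size=K)            # hypothetical prediction-period regressors

XtX_inv = np.linalg.inv(X.T @ X)
d_ls = X @ XtX_inv @ x_p            # LS weights: d' = x_p'(X'X)^{-1}X'

# Another unbiased weight vector: add a component orthogonal to the columns of X.
M = np.eye(T) - X @ XtX_inv @ X.T   # projector onto the orthogonal complement of col(X)
d = d_ls + M @ rng.normal(size=T)

def cond_mspe_over_sigma2(w):
    """Conditional MSPE divided by sigma^2, that is 1 + w'w as in (1.6.6)."""
    return 1.0 + w @ w

unbiased_gap = np.max(np.abs(d @ X - x_p))   # (1.6.5): d'X = x_p'
lhs = d @ d - x_p @ XtX_inv @ x_p            # left-hand side of (1.6.8)
rhs = (d - d_ls) @ (d - d_ls)                # right-hand side of (1.6.8)

print(unbiased_gap, lhs, rhs)
print(cond_mspe_over_sigma2(d_ls), cond_mspe_over_sigma2(d))
```

Because the added component lies in the null space of $\mathbf{X}'$, the vector `d` still satisfies (1.6.5), and the gap between the two errors equals the squared length of that component, as (1.6.8) states.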
In Section 1.2.4 we stated that if we cannot choose between a pair of estimators by the criterion defined by Definition 1.2.2 (that is, if the difference of the mean squared error matrices is neither nonnegative definite nor nonpositive definite), we can use the trace or the determinant as a criterion. The conditional mean squared prediction error defined in (1.6.3) provides an alternative scalar criterion, which may have a more intuitive appeal than the trace or the determinant because it is directly related to an important purpose to which estimation is put, namely, prediction. However, it has one serious weakness: at the time when the choice of estimators is made, $\mathbf{x}_p$ is usually not observed.
A solution to the dilemma is to assume a certain distribution for the random variables $\mathbf{x}_p$ and take the further expectation of (1.6.3) with respect to that distribution. Following Amemiya (1966), let us assume
$$E\mathbf{x}_p\mathbf{x}_p' = T^{-1}\mathbf{X}'\mathbf{X}. \tag{1.6.9}$$
Then we obtain from (1.6.3) the unconditional mean squared prediction error
$$E(y_p^* - y_p)^2 = \sigma^2 + T^{-1}E(\boldsymbol{\beta}^* - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}(\boldsymbol{\beta}^* - \boldsymbol{\beta}). \tag{1.6.10}$$
This provides a workable and intuitively appealing criterion for choosing an estimator. The use of this criterion in choosing models will be discussed in Section 2.1.5. The unconditional mean squared prediction error of the least squares predictor $\mathbf{x}_p'\hat{\boldsymbol{\beta}}$ is given by
$$E(\mathbf{x}_p'\hat{\boldsymbol{\beta}} - y_p)^2 = \sigma^2(1 + T^{-1}K). \tag{1.6.11}$$
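The factor $1 + T^{-1}K$ in (1.6.11) can be illustrated with a short numerical sketch. One assumption here is not in the text: $\mathbf{x}_p$ is taken to be drawn uniformly from the $T$ observed rows of $\mathbf{X}$, which is just one simple distribution satisfying (1.6.9) exactly. The simulated `X` and the dimensions are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)      # fixed seed, illustrative only
T, K = 50, 4                        # hypothetical sample size and number of regressors
X = rng.normal(size=(T, K))         # hypothetical regressor matrix
XtX_inv = np.linalg.inv(X.T @ X)

# Assumption: x_p is equally likely to be any observed row of X,
# so E[x_p x_p'] is the average of the outer products of the rows.
second_moment = sum(np.outer(x, x) for x in X) / T

# x_t'(X'X)^{-1}x_t for every row t; its mean is E[x_p'(X'X)^{-1}x_p].
quad_forms = np.einsum("ti,ij,tj->t", X, XtX_inv, X)

# Averaging (1.6.7) over x_p gives the unconditional MSPE in units of sigma^2.
uncond_mspe_over_sigma2 = 1.0 + quad_forms.mean()

print(uncond_mspe_over_sigma2, 1.0 + K / T)
```

The two printed numbers agree because the quadratic forms sum to the trace of $\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, which equals $K$.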
Exercises
1. (Section 1.1.2)
Give an example of a pair of random variables that are noncorrelated but not independent.
2. (Section 1.1.3)
Let $y$ and $x$ be scalar dichotomous random variables with zero means.
Define $u = y - \operatorname{Cov}(y, x)(Vx)^{-1}x$. Prove $E(ux) = 0$. Are $u$ and $x$ independent?
3. (Section 1.1.3)
Let $y$ be a scalar random variable and $\mathbf{x}$ be a vector of random variables. Prove $E[y - E(y \mid \mathbf{x})]^2 \le E[y - g(\mathbf{x})]^2$ for any function $g$.
4. (Section 1.1.3)
A fair die is rolled. Let $y$ be the face number showing and define $x$ by the rule:
$x = y$ if $y$ is even, and $x = 0$ if $y$ is odd.
Find the best predictor and the best linear predictor of $y$ based on $x$ and compare the mean squared prediction errors.
5. (Section 1.2.1)
Assume that $\mathbf{y}$ is $3 \times 1$ and $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2)$ is $3 \times 2$ and draw a three-dimensional analog of Figure 1.1.
6. (Section 1.2.5)
Prove that the class of linear unbiased estimators is equivalent to the class of instrumental variables estimators.
7. (Section 1.2.5)
In Model 1 find a member of the class of linear unbiased estimators for which the trace of the mean squared error matrix is the smallest, by minimizing tr C’C subject to the constraint C’X = I.
8. (Section 1.2.5)
Prove that $\hat{\boldsymbol{\beta}}_1$ defined by (1.2.14) is a best linear unbiased estimator of $\boldsymbol{\beta}_1$.
9. (Section 1.2.5)
In Model 1 further assume $K = 1$ and $\mathbf{X} = \mathbf{1}$, where $\mathbf{1}$ is the vector of ones. Define $\beta^+ = \mathbf{1}'\mathbf{y}/(T + 1)$, obtain its mean squared error, and compare it with that of the least squares estimator $\hat{\beta}$.
10. (Section 1.2.5)
In Model 1 further assume that $T = 3$, $K = 1$, $\mathbf{X} = (1, 1, 1)'$, and that $\{u_t\}$, $t = 1, 2, 3$, are independent with the distribution
$u_t = \sigma$ with probability $\tfrac{1}{2}$
$\phantom{u_t} = -\sigma$ with probability $\tfrac{1}{2}$.
Obtain the mean squared error of $\hat{\beta}_R = \mathbf{y}'\mathbf{y}/\mathbf{x}'\mathbf{y}$ and compare it with that of the least squares estimator $\hat{\beta}$. (Note that $\hat{\beta}_R$, the reverse least squares estimator, is obtained by minimizing the sum of squared errors in the direction of the $x$-axis.)
11. (Section 1.3.2)
Assume $K = 1$ in Model 1 with normality and furthermore assume $\beta^2 = \sigma^2$. Obtain the maximum likelihood estimator of $\beta$ and obtain the Cramér-Rao lower bound.
12. (Section 1.4.2) Suppose
Find a row vector R’ such that (Q, R) is nonsingular and R’Q = 0.
13. (Section 1.4.2)
Somebody has run a least squares regression in the classical regression model (Model 1) and reported
On the basis of this information, how would you estimate $\boldsymbol{\beta}$ if you believed $\beta_1 + \beta_2 = \beta_3$?
14. (Section 1.5.3)
We have $T$ observations on the dependent variable $y_t$ and the independent variables $x_t$ and $z_t$, $t = 1, 2, \ldots, T$. We believe a structural change occurred from the first period consisting of $T_1$ observations to the second period consisting of $T_2$ observations ($T_1 + T_2 = T$) in such a way that in the first period $Ey_t$ depends linearly on $x_t$ (with an intercept) but not on $z_t$, whereas in the second period $Ey_t$ depends linearly on $z_t$ (with an intercept) but not on $x_t$. How do you test this hypothesis? You may assume that $\{y_t\}$ are independent with constant variance $\sigma^2$ for $t = 1, 2, \ldots, T$.
15. (Section 1.5.3)
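Consider the following two regression equations, each of which satisfies the assumptions of the classical regression model (Model 1):
(1) $\mathbf{y}_1 = \alpha_1\mathbf{1} + \alpha_2\mathbf{x}_1 + \mathbf{u}_1$
(2) $\mathbf{y}_2 = \beta_1\mathbf{1} + \beta_2\mathbf{x}_2 + \mathbf{u}_2$
where $\alpha$'s and $\beta$'s are scalar parameters, $\mathbf{y}_1$, $\mathbf{y}_2$, $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{u}_1$, and $\mathbf{u}_2$ are seven-component vectors, and $\mathbf{1}$ is the seven-component vector of ones. Assume that $\mathbf{u}_1$ and $\mathbf{u}_2$ are normally distributed with the common variance $\sigma^2$ and that $\mathbf{u}_1$ and $\mathbf{u}_2$ are independent of each other. Suppose that $\mathbf{1}'\mathbf{x}_1 = \mathbf{1}'\mathbf{x}_2 = 0$ and $\mathbf{1}'\mathbf{y}_1 = \mathbf{1}'\mathbf{y}_2 = 7$. Suppose also that the sample moment matrix of the four observable variables is given as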
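$$
\begin{array}{c|cccc}
 & \mathbf{y}_1 & \mathbf{y}_2 & \mathbf{x}_1 & \mathbf{x}_2 \\ \hline
\mathbf{y}_1 & 9.3 & 7 & 2 & 1.5 \\
\mathbf{y}_2 & 7 & 9 & 3 & 1 \\
\mathbf{x}_1 & 2 & 3 & 2 & 1.2 \\
\mathbf{x}_2 & 1.5 & 1 & 1.2 & 1
\end{array}
$$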
For example, the table shows $\mathbf{y}_1'\mathbf{y}_1 = 9.3$ and $\mathbf{y}_1'\mathbf{y}_2 = 7$. Should you reject the joint hypothesis "$\alpha_1 = \beta_1$ and $\alpha_1 + 2\alpha_2 = \beta_2$" at the 5% significance level? How about at 1%?
16. (Section 1.5.3)
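Consider a classical regression model
$\mathbf{y}_1 = \alpha_1\mathbf{x}_1 + \beta_1\mathbf{z}_1 + \mathbf{u}_1$
$\mathbf{y}_2 = \alpha_2\mathbf{x}_2 + \beta_2\mathbf{z}_2 + \mathbf{u}_2$
$\mathbf{y}_3 = \alpha_3\mathbf{x}_3 + \beta_3\mathbf{z}_3 + \mathbf{u}_3$,
where $\alpha$'s and $\beta$'s are scalar unknown parameters, the other variables are vectors of ten elements, $\mathbf{x}$'s and $\mathbf{z}$'s are vectors of known constants, and $\mathbf{u}$'s are normal with mean zero and $E\mathbf{u}_i\mathbf{u}_i' = \sigma^2\mathbf{I}$ for every $i$ and $E\mathbf{u}_i\mathbf{u}_j' = \mathbf{0}$ if $i \ne j$. Suppose the observed vector products are as follows: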
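$\mathbf{y}_1'\mathbf{y}_1 = 2$, $\mathbf{y}_2'\mathbf{y}_2 = 5$, $\mathbf{y}_3'\mathbf{y}_3 = 2$
$\mathbf{y}_1'\mathbf{x}_1 = 1$, $\mathbf{y}_2'\mathbf{x}_2 = 3$, $\mathbf{y}_3'\mathbf{x}_3 = 2$
$\mathbf{y}_1'\mathbf{z}_1 = 2$, $\mathbf{y}_2'\mathbf{z}_2 = 3$, $\mathbf{y}_3'\mathbf{z}_3 = \ldots$
$\mathbf{x}_i'\mathbf{x}_i = \mathbf{z}_i'\mathbf{z}_i = 4$ for every $i$
$\mathbf{x}_i'\mathbf{z}_i = 0$ for every $i$.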
Test the joint hypothesis ($\alpha_1 = \alpha_2$ and $\beta_2 = \beta_3$) at the 5% significance level.
17. (Section 1.6)
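Consider Model 1 with the added prediction-period equation (1.6.1). Suppose $\mathbf{Z}$ is a $T \times L$ matrix of known constants and $\mathbf{z}_p$ is an $L$-vector of known constants. Which of the following predictors of $y_p$ is better? Explain.
$\hat{y}_p = \mathbf{z}_p'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{y}$
$\tilde{y}_p = \mathbf{z}_p'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.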
18. (Section 1.6)
Consider the case $K = 2$ in Model 1, where $y_t = \beta_1 + \beta_2 x_t + u_t$. For the prediction period we have $y_p = \beta_1 + \beta_2 x_p + u_p$, where $u_p$ satisfies the assumptions of Section 1.6. Obtain the mean squared prediction error of the predictor $\tilde{y}_p = T^{-1}\sum_{t=1}^{T} y_t$ and compare it with the mean squared prediction error of the least squares predictor.
19. (Section 1.6)
Prove that any $\mathbf{d}$ satisfying (1.6.5) can be written as $\mathbf{C}\mathbf{x}_p$ for some $\mathbf{C}$ such that $\mathbf{C}'\mathbf{X} = \mathbf{I}$.