# Linear Models

Suppose there are observations of $1 + k_1 + k_2$ variables $(y_{it}, x_{it}', z_{it}')$ on $N$ cross-sectional units over $T$ time periods, where $i = 1, \ldots, N$ and $t = 1, \ldots, T$. Let $y = (y_1', \ldots, y_N')'$, $X = \operatorname{diag}(X_i)$, and $Z = \operatorname{diag}(Z_i)$, where $y_i' = (y_{i1}, \ldots, y_{iT})$, and $X_i$ and $Z_i$ are $T \times k_1$ and $T \times k_2$ matrices of the $T$ observed values of the explanatory variables $x_{it}'$ and $z_{it}'$ for $i = 1, \ldots, N$. If all the individuals in the panel data are different, we have the unconstrained linear model,

$$y = X\beta + Z\gamma + u, \tag{16.2}$$

where $\beta = (\beta_1', \ldots, \beta_N')'$ and $\gamma = (\gamma_1', \ldots, \gamma_N')'$ are $Nk_1 \times 1$ and $Nk_2 \times 1$ vectors of constants, $u = (u_1', \ldots, u_N')'$ is an $NT \times 1$ vector of error terms, and $\beta_i$, $\gamma_i$, and $u_i$ denote the coefficients of $X_i$, the coefficients of $Z_i$, and the error of the $i$th individual for $i = 1, \ldots, N$. We assume that $u$ is independent of $x$ and $z$ and is multivariate normally distributed with mean $0$ and covariance matrix $C_1$,

$$u \sim N(0, C_1). \tag{16.3}$$
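The stacking in (16.2) can be made concrete with a small simulation. The following is only a sketch: the dimensions, the random data, the helper `block_diag`, and the choice $C_1 = I$ are illustrative assumptions, not part of the text.

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays along the diagonal of a zero matrix (hypothetical helper)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

rng = np.random.default_rng(0)
N, T, k1, k2 = 3, 5, 2, 2          # illustrative sizes (assumed)

# One T x k1 block X_i and one T x k2 block Z_i per cross-sectional unit.
X_i = [rng.normal(size=(T, k1)) for _ in range(N)]
Z_i = [rng.normal(size=(T, k2)) for _ in range(N)]
X = block_diag(X_i)                # NT x N*k1
Z = block_diag(Z_i)                # NT x N*k2

beta = rng.normal(size=N * k1)     # stacked (beta_1', ..., beta_N')'
gamma = rng.normal(size=N * k2)    # stacked (gamma_1', ..., gamma_N')'
u = rng.normal(size=N * T)         # assumes C1 = I for simplicity

y = X @ beta + Z @ gamma + u       # unconstrained model (16.2)
```

Because $X$ and $Z$ are block diagonal, the first $T$ rows of `y` depend only on the first unit's coefficients, which is why the unconstrained model amounts to $N$ separate regressions.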

There is no particular advantage to pooling the panel data to estimate (16.2), except for the possibility of exploiting common shocks in the error term, when $C_1$ is not block diagonal, by applying Zellner's (1962) seemingly unrelated regression estimator. To take advantage of the panel data there must be constraints on (16.2). Two types of constraints are commonly imposed: stochastic and exact. We shall assume that the coefficients of $x_{it}$ are subject to stochastic constraints and the coefficients of $z_{it}$ are subject to exact constraints.

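To postulate stochastic constraints, we let

$$\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_N \end{pmatrix} = A_1 \bar{\beta} + \varepsilon, \tag{16.4}$$

where $A_1$ is an $Nk_1 \times m$ matrix with known elements, $\bar{\beta}$ is an $m \times 1$ vector of constants, and

$$\varepsilon \sim N(0, C_2). \tag{16.5}$$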

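The variance-covariance matrix $C_2$ is assumed to be nonsingular. Furthermore, we assume that

$$\operatorname{cov}(\varepsilon, u) = 0, \quad \operatorname{cov}(\varepsilon, X) = 0, \quad \text{and} \quad \operatorname{cov}(\varepsilon, Z) = 0. \tag{16.6}$$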

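To postulate exact constraints, we let

$$\gamma = \begin{pmatrix} \gamma_1 \\ \vdots \\ \gamma_N \end{pmatrix} = A_2 \bar{\gamma}, \tag{16.7}$$

where $A_2$ is an $Nk_2 \times n$ matrix with known elements, and $\bar{\gamma}$ is an $n \times 1$ vector of constants. Because $A_2$ is known, (16.2) is formally identical to

$$y = X\beta + \tilde{Z}\bar{\gamma} + u, \tag{16.8}$$

where $\tilde{Z} = ZA_2$.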
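As a quick illustration of $\tilde{Z} = ZA_2$, the sketch below takes the simplest exact constraint, a common $\gamma$ for every unit, so that $A_2$ stacks $N$ identity matrices. The sizes, the random data, and the helper `block_diag` are assumptions made for the example.

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays along the diagonal of a zero matrix (hypothetical helper)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

rng = np.random.default_rng(0)
N, T, k2 = 3, 4, 2                       # illustrative sizes (assumed)
Z_i = [rng.normal(size=(T, k2)) for _ in range(N)]
Z = block_diag(Z_i)                      # NT x N*k2

# Exact constraint gamma_i = gamma_bar for all i: A2 = e_N (x) I_k2, so n = k2.
A2 = np.kron(np.ones((N, 1)), np.eye(k2))
Z_tilde = Z @ A2                         # NT x k2

# Under this constraint Z_tilde is just the Z_i stacked vertically.
stacked = np.vstack(Z_i)
```

Here `Z_tilde` and `stacked` coincide, so the constrained model is the familiar pooled regression on the stacked regressors.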

Formulation (16.4)-(16.8) encompasses various linear panel data models as special cases. These include:

- A common model for all cross-sectional units, by letting $X = 0$ and $A_2 = e_N \otimes I_{k_2}$, where $e_N$ is an $N \times 1$ vector of ones, $I_p$ denotes a $p \times p$ identity matrix, and $\otimes$ denotes the Kronecker product.
- Different models for different cross-sectional units, by letting $X = 0$ and $A_2$ be an $Nk_2 \times Nk_2$ identity matrix.
- Variable intercept models (e.g. Kuh, 1963; Mundlak, 1978), by letting $X = 0$, $Z_i = (e_T, \underline{Z}_i)$ with $\underline{Z}_i$ the remaining $k_2 - 1$ columns of $Z_i$, and $A_2 = (I_N \otimes i_{k_2} \,\vdots\, e_N \otimes I^*_{k_2 - 1})$, where $i_{k_2}$ is a $k_2 \times 1$ vector $(1, 0, \ldots, 0)'$ and $I^*_{k_2 - 1}$ is a $k_2 \times (k_2 - 1)$ matrix $(0 \,\vdots\, I_{k_2 - 1})'$.
- Error components models (e.g. Balestra and Nerlove, 1966; Wallace and Hussain, 1969), by letting $X_i = e_T$, $A_1 = e_N$, and $C_2 = \sigma^2 I_N$.
- Random coefficients models (e.g. Hsiao, 1974, 1975; Swamy, 1970), by letting $Z = 0$, $A_1 = e_N \otimes I_{k_1}$, and $C_2 = I_N \otimes \Delta$, where $\Delta = E(\beta_i - \bar{\beta})(\beta_i - \bar{\beta})'$.
- Mixed fixed and random coefficients models (e.g. Hsiao et al., 1989; Hsiao and Mountain, 1995; Hsiao and Tahmiscioglu, 1997), as postulated by (16.4)-(16.8).
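The selection matrix for the variable intercept case is perhaps the least obvious of these. The sketch below builds it for small, assumed dimensions and checks that $\gamma = A_2\bar{\gamma}$ gives each unit its own intercept and common slopes; the numerical values of `gamma_bar` are made up for illustration.

```python
import numpy as np

N, k2 = 3, 3                                   # illustrative sizes (assumed)

# i_k2 = (1, 0, ..., 0)' picks out the intercept position within each unit.
i_k2 = np.zeros((k2, 1))
i_k2[0, 0] = 1.0

# I*_{k2-1}: a row of zeros on top of I_{k2-1}, picking out the slope positions.
I_star = np.vstack([np.zeros((1, k2 - 1)), np.eye(k2 - 1)])

e_N = np.ones((N, 1))
A2 = np.hstack([np.kron(np.eye(N), i_k2), np.kron(e_N, I_star)])

# gamma_bar holds N unit-specific intercepts followed by k2 - 1 common slopes.
gamma_bar = np.array([10.0, 20.0, 30.0, 0.5, -0.5])   # made-up values
gamma = A2 @ gamma_bar

# One row per unit: [intercept_i, slope_1, slope_2].
gamma_by_unit = gamma.reshape(N, k2)
```

With these values `gamma_by_unit` has rows `[10, 0.5, -0.5]`, `[20, 0.5, -0.5]`, and `[30, 0.5, -0.5]`, so $n = N + k_2 - 1$ free parameters replace the $Nk_2$ unconstrained ones.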

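Substituting (16.4) into (16.8), we have

$$y = XA_1\bar{\beta} + \tilde{Z}\bar{\gamma} + u^*, \tag{16.9}$$

with $u^* = u + X\varepsilon$. The generalized least squares (GLS) estimator of (16.9) is

$$
\begin{pmatrix} \hat{\bar{\beta}}_{GLS} \\ \hat{\bar{\gamma}}_{GLS} \end{pmatrix}
=
\begin{pmatrix}
A_1'X'(C_1 + XC_2X')^{-1}XA_1 & A_1'X'(C_1 + XC_2X')^{-1}\tilde{Z} \\
\tilde{Z}'(C_1 + XC_2X')^{-1}XA_1 & \tilde{Z}'(C_1 + XC_2X')^{-1}\tilde{Z}
\end{pmatrix}^{-1}
\begin{pmatrix}
A_1'X'(C_1 + XC_2X')^{-1}y \\
\tilde{Z}'(C_1 + XC_2X')^{-1}y
\end{pmatrix}.
\tag{16.10}
$$

The GLS estimator is also the Bayes estimator conditional on $C_1$ and $C_2$ with a diffuse prior for $\bar{\beta}$ and $\bar{\gamma}$ (Lindley and Smith, 1972; Hsiao, 1990). Moreover, if predicting the individual $\beta_i$ is of interest, the Bayes procedure predicts $\beta_i$ as a weighted average between the GLS estimator of $\bar{\beta}$ and the individual least squares estimator of $\beta_i$ (Hsiao et al., 1993),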

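$$\beta^* = \{X'DX + C_2^{-1}\}^{-1}\{X'DX\hat{\beta} + C_2^{-1}A_1\hat{\bar{\beta}}_{GLS}\}, \tag{16.11}$$

where $D = [C_1^{-1} - C_1^{-1}\tilde{Z}(\tilde{Z}'C_1^{-1}\tilde{Z})^{-1}\tilde{Z}'C_1^{-1}]$ and

$$\hat{\beta} = \{X'DX\}^{-1}\{X'Dy\}. \tag{16.12}$$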

In other words, if cross-sectional units are similar as postulated in (16.4), there is an advantage to pooling: when there is not enough information about one cross-sectional unit, one can obtain a better prediction of that individual's outcome by learning from other cross-sectional units that behave similarly.
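The estimators in (16.10)-(16.12) can be sketched numerically. The example below assumes a mixed model with random slopes on $X_i$ and a common coefficient on $Z_i$, takes $C_1 = I$ and $C_2 = I_N \otimes \Delta$ as known, and uses simulated data; all sizes, parameter values, and the helper `block_diag` are illustrative assumptions.

```python
import numpy as np

def block_diag(blocks):
    """Place 2-D arrays along the diagonal of a zero matrix (hypothetical helper)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

rng = np.random.default_rng(1)
N, T, k1, k2 = 4, 12, 2, 1                      # illustrative sizes (assumed)

X = block_diag([rng.normal(size=(T, k1)) for _ in range(N)])
Z_tilde = np.vstack([rng.normal(size=(T, k2)) for _ in range(N)])  # Z A2, common gamma
A1 = np.kron(np.ones((N, 1)), np.eye(k1))       # beta_i = beta_bar + eps_i
C1 = np.eye(N * T)                              # assumed known
C2 = np.kron(np.eye(N), 0.5 * np.eye(k1))       # assumed known, Delta = 0.5 I

# Simulate from (16.4) and (16.8) with made-up true values.
beta_bar, gamma_bar = np.array([1.0, -2.0]), np.array([0.5])
beta = A1 @ beta_bar + rng.multivariate_normal(np.zeros(N * k1), C2)
y = X @ beta + Z_tilde @ gamma_bar + rng.normal(size=N * T)

# GLS estimator (16.10), written with the stacked regressor matrix W = (X A1, Z_tilde).
Omega_inv = np.linalg.inv(C1 + X @ C2 @ X.T)
W = np.hstack([X @ A1, Z_tilde])
coef = np.linalg.solve(W.T @ Omega_inv @ W, W.T @ Omega_inv @ y)
beta_bar_gls, gamma_bar_gls = coef[:k1], coef[k1:]

# Individual least squares estimator (16.12) after partialling out Z_tilde.
C1_inv = np.linalg.inv(C1)
ZtC = Z_tilde.T @ C1_inv
D = C1_inv - C1_inv @ Z_tilde @ np.linalg.solve(ZtC @ Z_tilde, ZtC)
XDX = X.T @ D @ X
beta_hat = np.linalg.solve(XDX, X.T @ D @ y)

# Bayes predictor (16.11): matrix-weighted average of beta_hat and A1 beta_bar_gls.
C2_inv = np.linalg.inv(C2)
beta_star = np.linalg.solve(XDX + C2_inv,
                            XDX @ beta_hat + C2_inv @ A1 @ beta_bar_gls)
```

The weights make the pooling argument visible: a unit whose own data are informative (large $X'DX$ block) stays close to its least squares estimate, while a poorly identified unit is pulled toward the pooled mean $A_1\hat{\bar{\beta}}_{GLS}$. In practice $C_1$ and $C_2$ are unknown and must be estimated, which this sketch does not address.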

The above formulation presupposes that it is known which variables are subject to stochastic constraints and which are subject to deterministic constraints. In practice, there is very little knowledge about this. In certain special cases, formal statistical testing procedures have been proposed (e.g. Breusch and Pagan, 1980; Hausman, 1978). However, a typical test postulates a simple null versus a composite alternative. The distribution of a test statistic is derived under an assumed true null hypothesis. A rejection of a null hypothesis does not automatically imply the acceptance of a particular alternative. Moreover, most of the tests of fixed versus random effects specification are indirect in the sense that they exploit a particular implication of the random effects formulation. For instance, the rejection of a null of homoskedasticity does not automatically imply the acceptance of a particular alternative of random effects. In fact, it would be more useful to view the above different formulations as different models and treat the choice among them as an issue of model selection. Various model selection criteria (e.g. Akaike, 1973; Schwarz, 1978) can be used. Alternatively, the predictive density ratio (e.g. Hsiao and Tahmiscioglu, 1997; Min and Zellner, 1993) can be used to select the appropriate formulation. The predictive density ratio approach divides the time series observations into two periods: $1$ to $T_1$, denoted by $y_1^*$, and $T_1 + 1$ to $T$, denoted by $y_2^*$. The first period observations, $y_1^*$, are used to derive the posterior probability distributions of $\theta_0$ and $\theta_1$ given hypotheses $H_0$ and $H_1$, $f(\theta_0 \mid y_1^*)$ and $f(\theta_1 \mid y_1^*)$. The second period observations are used to compare how well $H_0$ or $H_1$ predicts the outcome. The predictive density ratio is then computed as

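$$\frac{\int f(y_2^* \mid \theta_0, y_1^*)\, f(\theta_0 \mid y_1^*)\, d\theta_0}{\int f(y_2^* \mid \theta_1, y_1^*)\, f(\theta_1 \mid y_1^*)\, d\theta_1}, \tag{16.13}$$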

where $f(y_2^* \mid \theta_0, y_1^*)$ and $f(y_2^* \mid \theta_1, y_1^*)$ are the conditional densities of $y_2^*$ given $y_1^*$ and $\theta_0$ or $\theta_1$. If (16.13) is greater than 1, then $H_0$ is favored. If (16.13) is less than 1, then $H_1$ is favored. When $T$ is small, a recursive scheme that updates the posterior probability distributions of $\theta_0$ and $\theta_1$ with each additional observation can be used to balance the sample dependence of the predictive outcome against the informativeness of the conditional density of $\theta_i$ given the observables. Monte Carlo studies appear to indicate that the predictive density ratio performs well in selecting the appropriate formulation (e.g. Hsiao et al., 1995; Hsiao and Tahmiscioglu, 1997).
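To make the mechanics of (16.13) concrete, the sketch below evaluates the ratio in a deliberately simplified setting that is not the panel formulation above: a single normal linear regression with known error variance and a flat prior, where both integrals have closed forms. The two hypotheses (intercept only versus intercept plus one regressor), the data, and the function name `log_predictive_density` are all assumptions made for illustration.

```python
import numpy as np

def log_predictive_density(y1, W1, y2, W2, sigma2):
    """Log of the integral of f(y2 | theta, y1) f(theta | y1) over theta.

    Assumes y = W theta + e, e ~ N(0, sigma2 I), sigma2 known, flat prior on
    theta. Then theta | y1 ~ N(theta_hat, sigma2 (W1'W1)^{-1}) and
    y2 | y1 ~ N(W2 theta_hat, sigma2 (I + W2 (W1'W1)^{-1} W2')).
    """
    G = np.linalg.inv(W1.T @ W1)
    theta_hat = G @ W1.T @ y1                       # posterior mean from period 1
    resid = y2 - W2 @ theta_hat
    cov = sigma2 * (np.eye(len(y2)) + W2 @ G @ W2.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(y2) * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(2)
T, T1, sigma2 = 40, 20, 1.0                         # illustrative sizes (assumed)
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)              # data generated under H1

W_h0 = np.ones((T, 1))                              # H0: intercept only
W_h1 = np.column_stack([np.ones(T), x])             # H1: intercept and slope

y1, y2 = y[:T1], y[T1:]                             # estimation / prediction split
log_p0 = log_predictive_density(y1, W_h0[:T1], y2, W_h0[T1:], sigma2)
log_p1 = log_predictive_density(y1, W_h1[:T1], y2, W_h1[T1:], sigma2)

log_ratio = log_p0 - log_p1                         # log of (16.13)
favored = "H0" if log_ratio > 0 else "H1"
```

Working with the log of the ratio avoids numerical underflow when $T - T_1$ is large; a positive value corresponds to (16.13) exceeding 1. Because the data here are simulated under $H_1$ with a strong slope, the second-period predictions under $H_0$ are poor and the ratio falls below 1.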