# Overspecification and Underspecification of the Regression Equation

So far we have assumed that the true linear regression relationship is always correctly specified. This is likely to be violated in practice. In order to keep things simple, we consider the case where the true model is a simple regression with one regressor X1.

True model: $Y_i = \alpha + \beta_1 X_{1i} + u_i$

with $u_i \sim \text{IID}(0, \sigma^2)$, but the estimated model is overspecified with the inclusion of an additional irrelevant variable $X_2$, i.e.,

Estimated model: $\hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$

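From the previous section, it is clear that $\hat{\beta}_1 = \sum_{i=1}^n \hat{\nu}_{1i} Y_i / \sum_{i=1}^n \hat{\nu}_{1i}^2$, where $\hat{\nu}_1$ denotes the OLS residuals of $X_1$ on $X_2$. Substituting the true model for $Y$ we get

$$\hat{\beta}_1 = \beta_1 \frac{\sum_{i=1}^n \hat{\nu}_{1i} X_{1i}}{\sum_{i=1}^n \hat{\nu}_{1i}^2} + \frac{\sum_{i=1}^n \hat{\nu}_{1i} u_i}{\sum_{i=1}^n \hat{\nu}_{1i}^2}$$

since $\sum_{i=1}^n \hat{\nu}_{1i} = 0$. But $X_{1i} = \hat{X}_{1i} + \hat{\nu}_{1i}$ and $\sum_{i=1}^n \hat{X}_{1i} \hat{\nu}_{1i} = 0$, implying that $\sum_{i=1}^n \hat{\nu}_{1i} X_{1i} = \sum_{i=1}^n \hat{\nu}_{1i}^2$. Hence,

$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n \hat{\nu}_{1i} u_i}{\sum_{i=1}^n \hat{\nu}_{1i}^2}$$

and $E(\hat{\beta}_1) = \beta_1$ since $\hat{\nu}_1$ is a linear combination of the $X$'s, and $E(X_k u) = 0$ for $k = 1, 2$. Also,

$$\text{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^n \hat{\nu}_{1i}^2} = \frac{\sigma^2}{\sum_{i=1}^n x_{1i}^2 (1 - R_1^2)} \tag{4.9}$$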
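The residual-based formula for the slope and the two forms of the denominator in (4.9) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming simulated data; the seed, sample size, parameter values, and the helper name `ols` are illustrative choices, not taken from the text. A constant is included in every regression, which is what makes the residuals sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
n = 200
alpha, beta1 = 1.0, 2.0         # illustrative true parameters (assumption)

X2 = rng.normal(size=n)
X1 = 0.6 * X2 + rng.normal(size=n)  # X1 is correlated with X2
u = rng.normal(size=n)
Y = alpha + beta1 * X1 + u          # true model: X2 plays no role


def ols(y, *regressors):
    """Return OLS coefficients of y on a constant plus the given regressors."""
    Z = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef


# Slope on X1 from the overspecified regression of Y on (1, X1, X2).
beta1_hat = ols(Y, X1, X2)[1]

# Residuals of X1 on a constant and X2.
a, b = ols(X1, X2)
nu1 = X1 - (a + b * X2)

# Residual-based formula for the same slope.
beta1_hat_resid = nu1 @ Y / (nu1 @ nu1)

# Two forms of the denominator in (4.9).
x1 = X1 - X1.mean()
R1_sq = np.corrcoef(X1, X2)[0, 1] ** 2  # R^2 of X1 on X2 in a simple regression
denom_resid = nu1 @ nu1
denom_R2 = (x1 @ x1) * (1 - R1_sq)

print(beta1_hat, beta1_hat_resid)
print(denom_resid, denom_R2)
```

Both printed pairs should agree up to floating-point error, since the relationships are algebraic identities rather than sampling results.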

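where $x_{1i} = X_{1i} - \bar{X}_1$ and $R_1^2$ is the $R^2$ of the regression of $X_1$ on $X_2$. Using the true model to estimate $\beta_1$, one would get $b_1 = \sum_{i=1}^n x_{1i} y_i / \sum_{i=1}^n x_{1i}^2$, where $y_i = Y_i - \bar{Y}$, with $E(b_1) = \beta_1$ and $\text{var}(b_1) = \sigma^2 / \sum_{i=1}^n x_{1i}^2$. Hence, $\text{var}(\hat{\beta}_1) \geq \text{var}(b_1)$. Note also that in the overspecified model, the estimate for $\beta_2$, which has a true value of zero, is given by

$$\hat{\beta}_2 = \frac{\sum_{i=1}^n \hat{\nu}_{2i} Y_i}{\sum_{i=1}^n \hat{\nu}_{2i}^2}$$

where $\hat{\nu}_2$ denotes the OLS residuals of $X_2$ on $X_1$. Substituting the true model for $Y$ we get

$$\hat{\beta}_2 = \frac{\sum_{i=1}^n \hat{\nu}_{2i} u_i}{\sum_{i=1}^n \hat{\nu}_{2i}^2}$$

since $\sum_{i=1}^n \hat{\nu}_{2i} X_{1i} = 0$ and $\sum_{i=1}^n \hat{\nu}_{2i} = 0$. Hence, $E(\hat{\beta}_2) = 0$ since $\hat{\nu}_2$ is a linear combination of the $X$'s and $E(X_k u) = 0$ for $k = 1, 2$. In summary, overspecification still yields unbiased estimates of $\beta_1$ and $\beta_2$, but the price is a higher variance.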
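The overspecification result can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions: the regressors are held fixed across replications, the disturbances are drawn as normal (one convenient IID choice), and the seed, sample size, number of replications, and parameter values are arbitrary and not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed (assumption)
n, reps = 50, 5000                 # illustrative sizes (assumption)
alpha, beta1, sigma = 1.0, 2.0, 1.0

# Regressors are drawn once and then held fixed across replications.
X2 = rng.normal(size=n)
X1 = 0.8 * X2 + rng.normal(size=n)
Z_true = np.column_stack([np.ones(n), X1])       # correctly specified
Z_over = np.column_stack([np.ones(n), X1, X2])   # includes irrelevant X2

# Each column of Y is one replication generated from the true model.
U = rng.normal(scale=sigma, size=(n, reps))
Y = alpha + beta1 * X1[:, None] + U

# lstsq fits all replications at once when the right-hand side is 2-D.
B_true = np.linalg.lstsq(Z_true, Y, rcond=None)[0]   # shape (2, reps)
B_over = np.linalg.lstsq(Z_over, Y, rcond=None)[0]   # shape (3, reps)
b1, beta1_hat, beta2_hat = B_true[1], B_over[1], B_over[2]

# Theoretical variances from the formulas in the text.
x1 = X1 - X1.mean()
R1_sq = np.corrcoef(X1, X2)[0, 1] ** 2
var_b1_theory = sigma**2 / (x1 @ x1)
var_beta1_hat_theory = sigma**2 / ((x1 @ x1) * (1 - R1_sq))

print("mean b1, mean beta1_hat, mean beta2_hat:",
      b1.mean(), beta1_hat.mean(), beta2_hat.mean())
print("var b1 (simulated, theory):", b1.var(), var_b1_theory)
print("var beta1_hat (simulated, theory):", beta1_hat.var(), var_beta1_hat_theory)
```

Under these assumptions, both slope estimates should average close to the true value and the coefficient on the irrelevant regressor should average close to zero, while the simulated variance of the overspecified estimate should exceed that of $b_1$ by roughly the factor $1/(1 - R_1^2)$.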

Similarly, the true model could be a two-regressors model

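True model: $Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

where $u_i \sim \text{IID}(0, \sigma^2)$, but the estimated model is

Estimated model: $\hat{Y}_i = \hat{\alpha} + \hat{\beta}_1 X_{1i}$

The estimated model omits a relevant variable $X_2$ and underspecifies the true relationship. In this case

$$\hat{\beta}_1 = \frac{\sum_{i=1}^n x_{1i} Y_i}{\sum_{i=1}^n x_{1i}^2}$$

where $x_{1i} = X_{1i} - \bar{X}_1$. Substituting the true model for $Y$ we get

$$\hat{\beta}_1 = \beta_1 + \beta_2 \frac{\sum_{i=1}^n x_{1i} X_{2i}}{\sum_{i=1}^n x_{1i}^2} + \frac{\sum_{i=1}^n x_{1i} u_i}{\sum_{i=1}^n x_{1i}^2} \tag{4.13}$$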
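The decomposition in (4.13) is an exact identity in any given sample, which can be confirmed when the disturbances are known, as they are in simulated data. A minimal sketch in Python with NumPy follows; the seed, sample size, and parameter values are illustrative assumptions, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)         # arbitrary seed (assumption)
n = 200
alpha, beta1, beta2 = 1.0, 2.0, -1.5   # illustrative true parameters (assumption)

X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)     # omitted variable, correlated with X1
u = rng.normal(size=n)
Y = alpha + beta1 * X1 + beta2 * X2 + u   # true two-regressor model

# Slope from the underspecified regression of Y on a constant and X1.
x1 = X1 - X1.mean()
beta1_hat = x1 @ Y / (x1 @ x1)

# The three terms on the right-hand side of (4.13).
omitted_term = beta2 * (x1 @ X2) / (x1 @ x1)
disturbance_term = (x1 @ u) / (x1 @ x1)
rhs = beta1 + omitted_term + disturbance_term

print(beta1_hat, rhs)
```

The two printed numbers should match up to floating-point error. The constant drops out because the deviations `x1` sum to zero.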

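Hence, $E(\hat{\beta}_1) = \beta_1 + \beta_2 b_{12}$ since $E(x_1 u) = 0$, with $b_{12} = \sum_{i=1}^n x_{1i} X_{2i} / \sum_{i=1}^n x_{1i}^2$. Note that $b_{12}$ is the regression slope estimate obtained by regressing $X_2$ on $X_1$ and a constant. Also, the

$$\text{var}(\hat{\beta}_1) = E\left(\hat{\beta}_1 - E(\hat{\beta}_1)\right)^2 = E\left(\frac{\sum_{i=1}^n x_{1i} u_i}{\sum_{i=1}^n x_{1i}^2}\right)^2 = \frac{\sigma^2}{\sum_{i=1}^n x_{1i}^2}$$

which understates the variance of the estimate of $\beta_1$ obtained from the true model, i.e., $b_1 = \sum_{i=1}^n \hat{\nu}_{1i} Y_i / \sum_{i=1}^n \hat{\nu}_{1i}^2$, where $\hat{\nu}_1$ denotes the OLS residuals of $X_1$ on $X_2$ and $R_1^2$ is the $R^2$ of that regression, with

$$\text{var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^n \hat{\nu}_{1i}^2} = \frac{\sigma^2}{\sum_{i=1}^n x_{1i}^2 (1 - R_1^2)} \geq \text{var}(\hat{\beta}_1). \tag{4.14}$$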

In summary, underspecification yields biased estimates of the regression coefficients and understates the variance of these estimates. This is also an example of imposing a zero restriction on $\beta_2$ when in fact it is not true. This introduces bias, because the restriction is wrong, but reduces the variance because it imposes more information even if this information may be false. We will encounter this general principle again when we discuss distributed lags in Chapter 6.
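The bias and variance trade-off under underspecification can likewise be illustrated by simulation. This is a sketch under the same kind of assumptions as any such experiment: fixed regressors, normally distributed disturbances, and an arbitrary seed, sample size, and set of parameter values, none of which come from the text.

```python
import numpy as np

rng = np.random.default_rng(3)     # arbitrary seed (assumption)
n, reps = 50, 5000                 # illustrative sizes (assumption)
alpha, beta1, beta2, sigma = 1.0, 2.0, -1.5, 1.0

# Regressors are drawn once and then held fixed across replications.
X1 = rng.normal(size=n)
X2 = 0.8 * X1 + rng.normal(size=n)
Z_short = np.column_stack([np.ones(n), X1])      # omits relevant X2
Z_full = np.column_stack([np.ones(n), X1, X2])   # correctly specified

# Each column of Y is one replication generated from the true model.
U = rng.normal(scale=sigma, size=(n, reps))
Y = alpha + beta1 * X1[:, None] + beta2 * X2[:, None] + U

beta1_hat = np.linalg.lstsq(Z_short, Y, rcond=None)[0][1]  # underspecified slope
b1 = np.linalg.lstsq(Z_full, Y, rcond=None)[0][1]          # true-model slope

# Quantities appearing in the bias and variance formulas of the text.
x1 = X1 - X1.mean()
b12 = (x1 @ X2) / (x1 @ x1)                 # slope of X2 on X1 and a constant
R1_sq = np.corrcoef(X1, X2)[0, 1] ** 2
expected_short = beta1 + beta2 * b12        # E(beta1_hat) under omission
var_short_theory = sigma**2 / (x1 @ x1)
var_full_theory = sigma**2 / ((x1 @ x1) * (1 - R1_sq))

print("mean beta1_hat vs beta1 + beta2*b12:", beta1_hat.mean(), expected_short)
print("mean b1 vs beta1:", b1.mean(), beta1)
print("var beta1_hat (simulated, theory):", beta1_hat.var(), var_short_theory)
print("var b1 (simulated, theory):", b1.var(), var_full_theory)
```

With these settings the underspecified slope should center on $\beta_1 + \beta_2 b_{12}$ rather than on $\beta_1$, and its simulated variance should fall below that of the estimate from the true model.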