Empirical Example
Table 3.2 gives (i) the logarithm of cigarette consumption (in packs) per person of smoking age (> 16 years) for 46 states in 1992, (ii) the logarithm of the real price of cigarettes in each state, and (iii) the logarithm of real disposable income per capita in each state. This is drawn from the Baltagi and Levin (1992) study of the dynamic demand for cigarettes. It can be downloaded as Cigarett.dat from the Springer web site.
Table 3.2 Cigarette Consumption Data
LNC: log of consumption (in packs) per person of smoking age (>16)
LNP: log of real price (1983$/pack)
LNY: log of real disposable income per capita (in thousand 1983$)
Data: Cigarette Consumption of 46 States in 1992 
Table 3.3 Cigarette Consumption Regression Analysis of Variance

Figure 3.9 Residuals Versus LNP (x-axis: Log of Real Price, 1983$/Pack)
Table 3.3 gives the SAS output for the regression of logC on logP. The price elasticity of demand for cigarettes in this simple model is (dlogC/dlogP), which is the slope coefficient. This is estimated to be −1.198 with a standard error of 0.282. This says that a 10% increase in the real price of cigarettes leads to an estimated 12% drop in per capita consumption of cigarettes. The R² of this regression is 0.29, and s² is given by the Mean Square Error of the regression, which is 0.0266. Figure 3.9 plots the residuals of this regression versus the independent variable, while Figure 3.10 plots the predictions along with the 95% confidence interval band for these predictions. One observation clearly stands out as an influential observation, given its distance from the rest of the data: the observation for Kentucky, a producer state with a very low real price. This observation almost anchors the straight-line fit through the data. More on influential observations is given in Chapter 8.
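The OLS computations behind Table 3.3 can be sketched with the deviation-form formulas of this chapter. A minimal Python sketch; the (logP, logC) values below are hypothetical stand-ins, not the actual Table 3.2 data:

```python
# Simple OLS via the deviation-form formulas:
# beta_hat = sum(x_i*y_i)/sum(x_i^2), alpha_hat = Ybar - beta_hat*Xbar.
def ols(X, Y):
    n = len(X)
    xbar = sum(X) / n
    ybar = sum(Y) / n
    x = [Xi - xbar for Xi in X]          # deviations from the sample means
    y = [Yi - ybar for Yi in Y]
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    alpha = ybar - beta * xbar
    e = [Y[i] - alpha - beta * X[i] for i in range(n)]   # OLS residuals
    s2 = sum(ei * ei for ei in e) / (n - 2)              # Mean Square Error
    r2 = 1 - sum(ei * ei for ei in e) / sum(yi * yi for yi in y)
    return alpha, beta, s2, r2

logP = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35]   # hypothetical log real prices
logC = [4.90, 4.85, 4.78, 4.72, 4.66, 4.60]   # hypothetical log consumption
alpha, beta, s2, r2 = ols(logP, logC)
print(beta)   # slope = estimated price elasticity (negative here)
```

Run on the actual Cigarett.dat series, the same formulas should reproduce the elasticity and the other Table 3.3 statistics.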
Figure 3.10 95% Confidence Band for Predicted Values
Problems

1. For the simple regression with a constant Yi = α + βXi + ui given in equation (3.1), where α̂_OLS can be written as α̂_OLS = Σ_{i=1}^n λiYi with λi = (1/n) − X̄xi/Σ_{j=1}^n x_j², verify the following properties of the OLS estimators:

(b) Show that Σ_{i=1}^n λi = 1 and Σ_{i=1}^n λiXi = 0.

(c) Show that α̂_OLS is consistent for α.

(d) Show that cov(α̂_OLS, β̂_OLS) = −X̄var(β̂_OLS) = −σ²X̄/Σ_{i=1}^n x_i². This means that the sign of the covariance is determined by the sign of X̄. If X̄ is positive, this covariance will be negative. This also means that if α̂_OLS is overestimated, β̂_OLS will be underestimated.

(e) Prove that for any other linear unbiased estimator of α, say α̃ = Σ_{i=1}^n biYi with bi = λi + fi, var(α̃) = σ²Σ_{i=1}^n b_i² = σ²Σ_{i=1}^n λ_i² + σ²Σ_{i=1}^n f_i² = var(α̂_OLS) + σ²Σ_{i=1}^n f_i².
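The unbiasedness conditions on these weights can be checked numerically. A minimal sketch, assuming the standard weights λi = 1/n − X̄xi/Σ_j x_j² that write α̂_OLS as a linear combination of the Yi's; the data are arbitrary:

```python
# Check that sum(lambda_i) = 1, sum(lambda_i * X_i) = 0, and that
# sum(lambda_i * Y_i) reproduces the OLS intercept Ybar - beta_hat*Xbar.
X = [1.0, 3.0, 4.0, 6.0, 8.0]
Y = [2.1, 3.9, 5.2, 7.8, 9.1]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]              # deviations from the mean
Sxx = sum(xi * xi for xi in x)
lam = [1.0 / n - xbar * xi / Sxx for xi in x]

beta = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
alpha_ols = ybar - beta * xbar
alpha_lam = sum(l * Yi for l, Yi in zip(lam, Y))   # intercept via the weights

print(sum(lam), sum(l * Xi for l, Xi in zip(lam, X)))  # 1 and 0, up to rounding
print(alpha_ols, alpha_lam)                            # identical up to rounding
```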
7. (a) Differentiate (3.9) with respect to α and β and show that α̂_MLE = α̂_OLS and β̂_MLE = β̂_OLS.

(b) Differentiate (3.9) with respect to σ² and show that σ̂²_MLE = Σ_{i=1}^n e_i²/n.
8. The t-Statistic in a Simple Regression. It is well known that a standard normal random variable N(0,1) divided by the square root of a chi-squared random variable divided by its degrees of freedom, (χ²_ν/ν)^{1/2}, results in a random variable that is t-distributed with ν degrees of freedom, provided the N(0,1) and the χ²_ν variables are independent; see Chapter 2. Use this fact to show that

(β̂_OLS − β)/[s/(Σ_{i=1}^n x_i²)^{1/2}] ~ t_{n−2}.

9. Relationship Between R² and r²_xy.

(a) Using the facts that R² = Σ_{i=1}^n ŷ_i²/Σ_{i=1}^n y_i², ŷ_i = β̂_OLS x_i and β̂_OLS = Σ_{i=1}^n x_iy_i/Σ_{i=1}^n x_i², show that R² = r²_xy, where r²_xy = (Σ_{i=1}^n x_iy_i)²/(Σ_{i=1}^n x_i²)(Σ_{i=1}^n y_i²).

(b) Using the fact that y_i = ŷ_i + e_i, show that Σ_{i=1}^n y_iŷ_i = Σ_{i=1}^n ŷ_i², and deduce that r²_{y,ŷ} = (Σ_{i=1}^n y_iŷ_i)²/(Σ_{i=1}^n y_i²)(Σ_{i=1}^n ŷ_i²) is equal to R².

10. Prediction. Consider the problem of predicting Y_0 from (3.11). Given X_0,
(a) Show that E(Y_0) = α + βX_0.

(b) Show that Ŷ_0 is unbiased for E(Y_0).

(c) Show that var(Ŷ_0) = var(α̂_OLS) + X_0²var(β̂_OLS) + 2X_0cov(α̂_OLS, β̂_OLS). Deduce that var(Ŷ_0) = σ²[(1/n) + (X_0 − X̄)²/Σ_{i=1}^n x_i²].

(d) Consider a linear predictor of E(Y_0), say Ỹ_0 = Σ_{i=1}^n a_iY_i. Show that Σ_{i=1}^n a_i = 1 and Σ_{i=1}^n a_iX_i = X_0 for this predictor to be unbiased for E(Y_0).

(e) Show that var(Ỹ_0) = σ²Σ_{i=1}^n a_i². Minimize Σ_{i=1}^n a_i² subject to the restrictions given in (d). Prove that the resulting predictor is Ỹ_0 = α̂_OLS + β̂_OLS X_0 and that the minimum variance is σ²[(1/n) + (X_0 − X̄)²/Σ_{i=1}^n x_i²].
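The variance formula in part (c) can be checked by simulation. A hedged Monte Carlo sketch; the design points, parameters, and prediction point X0 below are arbitrary choices, not from the text:

```python
import random
# Monte Carlo check of var(Y0_hat) = sigma^2 * [1/n + (X0 - Xbar)^2 / sum(x_i^2)].
random.seed(42)
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
alpha_true, beta_true, sigma = 1.0, 0.5, 1.0
X0 = 10.0
n = len(X)
xbar = sum(X) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)

preds = []
for _ in range(20000):
    Y = [alpha_true + beta_true * Xi + random.gauss(0, sigma) for Xi in X]
    ybar = sum(Y) / n
    beta = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
    alpha = ybar - beta * xbar
    preds.append(alpha + beta * X0)      # Y0_hat for this simulated sample

m = sum(preds) / len(preds)
emp_var = sum((p - m) ** 2 for p in preds) / (len(preds) - 1)
theo_var = sigma ** 2 * (1.0 / n + (X0 - xbar) ** 2 / Sxx)
print(emp_var, theo_var)   # the two should agree closely
```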
11. Optimal Weighting of Unbiased Estimators. This is based on Baltagi (1995). For the simple regression without a constant, Y_i = βX_i + u_i, i = 1, 2, …, n, where β is a scalar and u_i ~ IID(0, σ²) independent of X_i, consider the following three unbiased estimators of β:

β̂_1 = Σ_{i=1}^n X_iY_i/Σ_{i=1}^n X_i², β̂_2 = Ȳ/X̄, and
β̂_3 = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ)/Σ_{i=1}^n (X_i − X̄)², where X̄ = Σ_{i=1}^n X_i/n and Ȳ = Σ_{i=1}^n Y_i/n.

(a) Show that cov(β̂_1, β̂_2) = var(β̂_1) > 0, and that ρ_12 (the correlation coefficient of β̂_1 and β̂_2) = [var(β̂_1)/var(β̂_2)]^{1/2} with 0 < ρ_12 < 1. Show that the optimal combination of β̂_1 and β̂_2, given by β̂ = αβ̂_1 + (1 − α)β̂_2 where −∞ < α < ∞, occurs at α* = 1. Optimality here refers to minimizing the variance. Hint: read the paper by Samuel-Cahn (1994).

(b) Similarly, show that cov(β̂_1, β̂_3) = var(β̂_1) > 0, and that ρ_13 (the correlation coefficient of β̂_1 and β̂_3) = [var(β̂_1)/var(β̂_3)]^{1/2} = (1 − ρ_12²)^{1/2} with 0 < ρ_13 < 1. Conclude that the optimal combination of β̂_1 and β̂_3 is again α* = 1.

(c) Show that cov(β̂_2, β̂_3) = 0 and that the optimal combination of β̂_2 and β̂_3 is β̂ = (1 − ρ_12²)β̂_3 + ρ_12²β̂_2 = β̂_1. This exercise demonstrates a more general result: the BLUE of β (in this case β̂_1) has a positive correlation with any other linear unbiased estimator of β, and this correlation can easily be computed from the ratio of the variances of these two estimators.
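The combination in part (c) is an exact algebraic identity that holds sample by sample, so it can be verified on arbitrary data. A sketch, using ρ_12² = nX̄²/ΣX_i² (the variance ratio under the model):

```python
# Check beta1_hat = (1 - rho12^2)*beta3_hat + rho12^2*beta2_hat exactly.
X = [1.0, 2.0, 4.0, 5.0, 7.0]
Y = [1.2, 2.5, 3.7, 5.9, 6.4]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n

beta1 = sum(Xi * Yi for Xi, Yi in zip(X, Y)) / sum(Xi ** 2 for Xi in X)
beta2 = ybar / xbar
beta3 = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y)) \
        / sum((Xi - xbar) ** 2 for Xi in X)

rho12_sq = n * xbar ** 2 / sum(Xi ** 2 for Xi in X)   # var(b1)/var(b2)
combo = (1 - rho12_sq) * beta3 + rho12_sq * beta2
print(beta1, combo)   # identical up to rounding
```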
12. Efficiency as Correlation. This is based on Oksanen (1993). Let β̂ denote the Best Linear Unbiased Estimator of β and let β̃ denote any linear unbiased estimator of β. Show that the relative efficiency of β̂ with respect to β̃ is the squared correlation coefficient between β̂ and β̃. Hint: compute the variance of β̂ + λ(β̃ − β̂) for any λ. This variance is minimized at λ = 0 since β̂ is BLUE. This should give you the result that E(β̂²) = E(β̂β̃), which in turn proves the required result; see Zheng (1994).
13. For the numerical illustration given in section 3.9, what happens to the least squares regression coefficient estimates (α̂_OLS, β̂_OLS), s², the estimated se(α̂_OLS) and se(β̂_OLS), the t-statistics for testing H_0^a: α = 0 and H_0^b: β = 0, and R² when:
(a) Y_i is regressed on X_i + 5 rather than X_i. In other words, we add a constant 5 to each observation of the explanatory variable X_i and rerun the regression. It is very instructive to see how the computations in Table 3.1 are affected by this simple transformation on X_i.
(b) Yi + 2 is regressed on Xi. In other words, a constant 2 is added to Yi.
(c) Y_i is regressed on 2X_i. (Each X_i is multiplied by the constant 2.)
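The effect of these three transformations on the slope and intercept can be previewed numerically. A minimal sketch on an artificial data set (not the section 3.9 illustration):

```python
# (a) X+5 shifts only the intercept; (b) Y+2 shifts only the intercept;
# (c) 2X halves the slope. s^2 and R^2 are unchanged in all three cases.
def ols(X, Y):
    n = len(X)
    xbar, ybar = sum(X) / n, sum(Y) / n
    beta = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y)) \
           / sum((Xi - xbar) ** 2 for Xi in X)
    return ybar - beta * xbar, beta

X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.2, 2.9, 4.1, 4.8, 6.0]
a0, b0 = ols(X, Y)
a1, b1 = ols([Xi + 5 for Xi in X], Y)    # (a) X + 5
a2, b2 = ols(X, [Yi + 2 for Yi in Y])    # (b) Y + 2
a3, b3 = ols([2 * Xi for Xi in X], Y)    # (c) 2X
print(b0, b1, b2, b3)   # b1 = b2 = b0, while b3 = b0/2
```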
14. For the cigarette consumption data given in Table 3.2:
(a) Give the descriptive statistics for logC, logP and logY. Plot their histogram. Also, plot logC versus logY and logC versus logP. Obtain the correlation matrix of these variables.
(b) Run the regression of logC on logY. What is the income elasticity estimate? What is its standard error? Test the null hypothesis that this elasticity is zero. What is the s and R2 of this regression?
(c) Show that the square of the simple correlation coefficient between logC and logY is equal to R2. Show that the square of the correlation coefficient between the fitted and actual values of logC is also equal to R2 .
(d) Plot the residuals versus income. Also, plot the fitted values along with their 95% confidence band.
15. Consider the simple regression with no constant: Y_i = βX_i + u_i, i = 1, 2, …, n, where u_i ~ IID(0, σ²) independent of X_i. Theil (1971) showed that among all linear estimators in Y_i, the minimum mean squared error estimator of β, i.e., that which minimizes E(β̃ − β)², is given by

β̃ = β²Σ_{i=1}^n X_iY_i/(β²Σ_{i=1}^n X_i² + σ²).

(a) Show that E(β̃) = β/(1 + c), where c = σ²/β²Σ_{i=1}^n X_i² > 0.

(b) Conclude that Bias(β̃) = E(β̃) − β = −[c/(1 + c)]β. Note that this bias is positive (negative) when β is negative (positive). This also means that β̃ is biased towards zero.

(c) Show that MSE(β̃) = E(β̃ − β)² = σ²/[Σ_{i=1}^n X_i² + (σ²/β²)]. Conclude that it is smaller than MSE(β̂_OLS).
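The MSE comparison in part (c) can be illustrated by simulation. A hedged Monte Carlo sketch with arbitrary parameters; note that β̃ uses the true β and σ², so it is an infeasible benchmark rather than a practical estimator:

```python
import random
# Compare the MSE of OLS with Theil's minimum-MSE estimator, which shrinks
# OLS towards zero by the factor beta^2*Sxx / (beta^2*Sxx + sigma^2).
random.seed(7)
X = [0.5, 1.0, 1.5, 2.0, 2.5]
beta_true, sigma = 1.0, 2.0
Sxx = sum(Xi ** 2 for Xi in X)

se_ols, se_theil = 0.0, 0.0
R = 20000
for _ in range(R):
    Y = [beta_true * Xi + random.gauss(0, sigma) for Xi in X]
    Sxy = sum(Xi * Yi for Xi, Yi in zip(X, Y))
    b_ols = Sxy / Sxx
    b_theil = beta_true ** 2 * Sxy / (beta_true ** 2 * Sxx + sigma ** 2)
    se_ols += (b_ols - beta_true) ** 2
    se_theil += (b_theil - beta_true) ** 2

mse_ols, mse_theil = se_ols / R, se_theil / R
print(mse_ols, mse_theil)   # Theil's MSE is the smaller of the two
```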
Table 3.4 Energy Data for 20 countries

16. Table 3.4 gives cross-section data for 1980 on real gross domestic product (RGDP) and aggregate energy consumption (EN) for 20 countries.
(a) Enter the data and provide descriptive statistics. Plot the histograms for RGDP and EN. Plot EN versus RGDP.
(b) Estimate the regression:
log(EN) = α + β log(RGDP) + u.
Be sure to plot the residuals. What do they show?
(c) Test H_0: β = 1.
(d) One of your Energy data observations has a misplaced decimal. Multiply it by 1000. Now repeat parts (a), (b) and (c).
(e) Was there any reason for ordering the data from the lowest to highest energy consumption? Explain.
Lesson Learned: Always plot the residuals. Always check your data very carefully.
17. Using the Energy Data given in Table 3.4, corrected as in part (d) of problem 16, is it legitimate to reverse the form of the equation?

log(RGDP) = γ + δ log(EN) + ε
(a) Economically, does this change the interpretation of the equation? Explain.
(b) Estimate this equation and compare its R² with that of the previous problem. Also, check whether δ̂ = 1/β̂. Why are they different?
(c) Statistically, by reversing the equation, which assumptions do we violate?
(d) Show that δ̂β̂ = R².
(e) Effects of changing the units in which variables are measured. Suppose you measured energy in BTUs instead of kilograms of coal equivalents, so that the original series is multiplied by 60. How does this change α and β in the following equations?

log(EN) = α + β log(RGDP) + u
EN = α* + β* RGDP + v

Can you explain why α̂ changed but not β̂ for the log-log model, whereas both α̂* and β̂* changed for the linear model?
(f) For the loglog specification and the linear specification, compare the GDP elasticity for Malta and W. Germany. Are both equally plausible?
(g) Plot the residuals from both linear and loglog models. What do you observe?
(h) Can you compare the R² and standard errors from both models in part (g)? Hint: Retrieve the predicted values of log(EN) from the log-log equation, exponentiate them to obtain predictions of EN, then compute the corresponding residuals and s. These are comparable to those obtained from the linear model.
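The identity in part (d) holds exactly in any sample, since the product of the two slopes is the squared sample correlation. A minimal sketch on artificial stand-ins for the log series:

```python
# If beta_hat is the slope from regressing log(EN) on log(RGDP), and delta_hat
# the slope from the reversed regression, then delta_hat * beta_hat = R^2.
Y = [1.1, 2.0, 2.8, 4.2, 5.1]    # stand-in for log(EN)
X = [1.0, 2.0, 3.0, 4.0, 5.0]    # stand-in for log(RGDP)
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
Sxy = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y))
Sxx = sum((Xi - xbar) ** 2 for Xi in X)
Syy = sum((Yi - ybar) ** 2 for Yi in Y)

beta = Sxy / Sxx            # slope of Y on X
delta = Sxy / Syy           # slope of the reversed regression, X on Y
r2 = Sxy ** 2 / (Sxx * Syy) # R^2 of either regression
print(beta * delta, r2)     # equal
```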
18. For the model considered in problem 16, log(EN) = α + β log(RGDP) + u, and measuring energy in BTUs (as in part (e) of problem 17):
(a) What is the 95% confidence prediction interval at the sample mean?
(b) What is the 95% confidence prediction interval for Malta?
(c) What is the 95% confidence prediction interval for West Germany?
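These intervals use var(Ŷ_0) = s²[1/n + (X_0 − X̄)²/Σx_i²] with the t_{n−2} critical value. A hedged sketch on artificial stand-ins for the energy data (not Table 3.4); the critical value 2.101 is t_{0.025,18} from a t-table for n = 20:

```python
# 95% confidence band for the predicted value at X0; widest far from the mean.
X = [float(i) for i in range(1, 21)]                  # n = 20 "countries"
Y = [0.8 * Xi + 1.0 + ((-1) ** i) * 0.5 for i, Xi in enumerate(X)]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
x = [Xi - xbar for Xi in X]
Sxx = sum(xi * xi for xi in x)
beta = sum(xi * (Yi - ybar) for xi, Yi in zip(x, Y)) / Sxx
alpha = ybar - beta * xbar
e = [Yi - alpha - beta * Xi for Xi, Yi in zip(X, Y)]
s2 = sum(ei * ei for ei in e) / (n - 2)
t_crit = 2.101                                        # t_{0.025, 18} from a t-table

def band(X0):
    yhat = alpha + beta * X0
    half = t_crit * (s2 * (1.0 / n + (X0 - xbar) ** 2 / Sxx)) ** 0.5
    return yhat - half, yhat + half

lo_mean, hi_mean = band(xbar)    # narrowest interval, at the sample mean
lo_far, hi_far = band(20.0)      # wider interval, away from the mean
print(hi_mean - lo_mean, hi_far - lo_far)
```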
Additional readings on the material covered in this chapter can be found in:
Baltagi, B. H. (1995), "Optimal Weighting of Unbiased Estimators," Econometric Theory, Problem 95.3.1, 11: 637.
Baltagi, B. H. and D. Levin (1992), "Cigarette Taxation: Raising Revenues and Reducing Consumption," Structural Change and Economic Dynamics, 3: 321–335.
Belsley, D. A., E. Kuh and R. E. Welsch (1980), Regression Diagnostics (Wiley: New York).
Greene, W. (1993), Econometric Analysis (Macmillan: New York).
Gujarati, D. (1995), Basic Econometrics (McGraw-Hill: New York).
Johnston, J. (1984), Econometric Methods (McGraw-Hill: New York).
Kelejian, H. and W. Oates (1989), Introduction to Econometrics (Harper and Row: New York).
Kennedy, P. (1992), A Guide to Econometrics (MIT Press: Cambridge).
Kmenta, J. (1986), Elements of Econometrics (Macmillan: New York).
Maddala, G. S. (1992), Introduction to Econometrics (Macmillan: New York).
Oksanen, E. H. (1993), "Efficiency as Correlation," Econometric Theory, Problem 93.1.3, 9: 146.
Samuel-Cahn, E. (1994), "Combining Unbiased Estimators," The American Statistician, 48: 34–36.
Theil, H. (1971), Principles of Econometrics (Wiley: New York).
Wallace, D. and L. Silver (1988), Econometrics: An Introduction (Addison-Wesley: New York).
Zheng, J. X. (1994), "Efficiency as Correlation," Econometric Theory, Solution 93.1.3, 10: 228.
Appendix
From the OLS regression on (3.1) we get

Y_i = Ŷ_i + e_i,  i = 1, 2, …, n  (A.1)

where Ŷ_i = α̂_OLS + β̂_OLS X_i. Squaring and summing the above equation, we get

Σ_{i=1}^n Y_i² = Σ_{i=1}^n Ŷ_i² + Σ_{i=1}^n e_i²  (A.2)

since Σ_{i=1}^n Ŷ_ie_i = 0. The uncentered R² is given by

uncentered R² = 1 − Σ_{i=1}^n e_i²/Σ_{i=1}^n Y_i² = Σ_{i=1}^n Ŷ_i²/Σ_{i=1}^n Y_i²
Note that the total sum of squares for Y_i is not expressed in deviations from the sample mean Ȳ. In other words, the uncentered R² is the proportion of Σ_{i=1}^n Y_i² that is explained by the regression on X_i. Regression packages usually report the centered R², which was defined in section 3.6 as 1 − (Σ_{i=1}^n e_i²/Σ_{i=1}^n y_i²), where y_i = Y_i − Ȳ. The latter measure focuses on explaining the variation in Y_i after fitting the constant.

From section 3.6, we have seen that a naive model with only a constant in it gives Ȳ as the estimate of the constant; see also problem 2. The variation in Y_i that is not explained by this naive model is Σ_{i=1}^n y_i² = Σ_{i=1}^n (Y_i − Ȳ)². Subtracting nȲ² from both sides of (A.2) we get

Σ_{i=1}^n y_i² = Σ_{i=1}^n Ŷ_i² − nȲ² + Σ_{i=1}^n e_i²

and the centered R² is

centered R² = 1 − (Σ_{i=1}^n e_i²/Σ_{i=1}^n y_i²) = (Σ_{i=1}^n Ŷ_i² − nȲ²)/Σ_{i=1}^n y_i²

If there is a constant in the model, the mean of the fitted values equals Ȳ (see section 3.6), and Σ_{i=1}^n ŷ_i² = Σ_{i=1}^n (Ŷ_i − Ȳ)² = Σ_{i=1}^n Ŷ_i² − nȲ². Therefore, the centered R² = Σ_{i=1}^n ŷ_i²/Σ_{i=1}^n y_i², which is the R² reported by regression packages. If there is no constant in the model, some regression packages give you the option of (no constant), and the R² reported is usually the uncentered R². Check your regression package documentation to verify what you are getting. We will encounter the uncentered R² again in constructing test statistics using regressions; see for example Chapter 11.
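The gap between the two measures can be made concrete with a small artificial sample for a model that includes a constant:

```python
# Contrast the centered and uncentered R^2 of the appendix on toy data.
X = [1.0, 2.0, 3.0, 4.0]
Y = [3.1, 4.2, 4.9, 6.3]
n = len(X)
xbar, ybar = sum(X) / n, sum(Y) / n
beta = sum((Xi - xbar) * (Yi - ybar) for Xi, Yi in zip(X, Y)) \
       / sum((Xi - xbar) ** 2 for Xi in X)
alpha = ybar - beta * xbar
Yhat = [alpha + beta * Xi for Xi in X]
e = [Yi - Yh for Yi, Yh in zip(Y, Yhat)]

sse = sum(ei ** 2 for ei in e)
uncentered = 1 - sse / sum(Yi ** 2 for Yi in Y)          # credits the mean too
centered = 1 - sse / sum((Yi - ybar) ** 2 for Yi in Y)   # variation around Ybar
print(uncentered, centered)   # uncentered is the larger of the two
```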