Empirical Example

Table 3.2 gives (i) the logarithm of cigarette consumption (in packs) per person of smoking age (> 16 years) for 46 states in 1992, (ii) the logarithm of the real price of cigarettes in each state, and (iii) the logarithm of real disposable income per capita in each state. The data are drawn from the Baltagi and Levin (1992) study on the dynamic demand for cigarettes and can be downloaded as Cigarett.dat from the Springer web site.

Table 3.2 Cigarette Consumption Data

LNC: log of consumption (in packs) per person of smoking age (>16)
LNP: log of real price (1983$/pack)
LNY: log of real disposable income per capita (in thousand 1983$)

OBS   STATE   LNC       LNP        LNY
  1   AL      4.96213    0.20487   4.64039
  2   AZ      4.66312    0.16640   4.68389
  3   AR      5.10709    0.23406   4.59435
  4   CA      4.50449    0.36399   4.88147
  5   CT      4.66983    0.32149   5.09472
  6   DE      5.04705    0.21929   4.87087
  7   DC      4.65637    0.28946   5.05960
  8   FL      4.80081    0.28733   4.81155
  9   GA      4.97974    0.12826   4.73299
 10   ID      4.74902    0.17541   4.64307
 11   IL      4.81445    0.24806   4.90387
 12   IN      5.11129    0.08992   4.72916
 13   IA      4.80857    0.24081   4.74211
 14   KS      4.79263    0.21642   4.79613
 15   KY      5.37906   -0.03260   4.64937
 16   LA      4.98602    0.23856   4.61461
 17   ME      4.98722    0.29106   4.75501
 18   MD      4.77751    0.12575   4.94692
 19   MA      4.73877    0.22613   4.99998
 20   MI      4.94744    0.23067   4.80620
 21   MN      4.69589    0.34297   4.81207
 22   MS      4.93990    0.13638   4.52938
 23   MO      5.06430    0.08731   4.78189
 24   MT      4.73313    0.15303   4.70417
 25   NE      4.77558    0.18907   4.79671
 26   NV      4.96642    0.32304   4.83816
 27   NH      5.10990    0.15852   5.00319
 28   NJ      4.70633    0.30901   5.10268
 29   NM      4.58107    0.16458   4.58202
 30   NY      4.66496    0.34701   4.96075
 31   ND      4.58237    0.18197   4.69163
 32   OH      4.97952    0.12889   4.75875
 33   OK      4.72720    0.19554   4.62730
 34   PA      4.80363    0.22784   4.83516
 35   RI      4.84693    0.30324   4.84670
 36   SC      5.07801    0.07944   4.62549
 37   SD      4.81545    0.13139   4.67747
 38   TN      5.04939    0.15547   4.72525
 39   TX      4.65398    0.28196   4.73437
 40   UT      4.40859    0.19260   4.55586
 41   VT      5.08799    0.18018   4.77578
 42   VA      4.93061    0.11818   4.85490
 43   WA      4.66134    0.35053   4.85645
 44   WV      4.82454    0.12008   4.56859
 45   WI      4.83026    0.22954   4.75826
 46   WY      5.00087    0.10029   4.71169

Data: Cigarette Consumption of 46 States in 1992

Table 3.3 Cigarette Consumption Regression

Analysis of Variance

Source   DF   Sum of Squares   Mean Square   F Value   Prob > F
Model     1   0.48048          0.48048       18.084    0.0001
Error    44   1.16905          0.02657

Root MSE   0.16300    R-square   0.2913
Dep Mean   4.84784    Adj R-sq   0.2752
C.V.       3.36234

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
INTERCEP    1    5.094108            0.06269897       81.247                  0.0001
LNP         1   -1.198316            0.28178857       -4.253                  0.0001

Figure 3.9 Residuals Versus LNP (horizontal axis: Log of Real Price, 1983$/Pack)

Table 3.3 gives the SAS output for the regression of logC on logP. The price elasticity of demand for cigarettes in this simple model is $d\log C/d\log P$, which is the slope coefficient. This is estimated to be -1.198 with a standard error of 0.282. This says that a 10% increase in the real price of cigarettes is associated with an estimated 12% drop in per capita consumption of cigarettes. The $R^2$ of this regression is 0.29, and $s^2$ is given by the Mean Square Error of the regression, which is 0.0266. Figure 3.9 plots the residuals of this regression versus the independent variable, while Figure 3.10 plots the predictions along with the 95% confidence interval band for these predictions. One observation clearly stands out as an influential observation given its distance from the rest of the data, and that is the observation for Kentucky, a producer state with a very low real price. This observation almost anchors the straight line fit through the data. Influential observations are discussed further in Chapter 8.
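The regression in Table 3.3 can be reproduced from the LNC and LNP columns of Table 3.2. The following is a minimal sketch in plain Python rather than SAS; the function name simple_ols and the dictionary keys are illustrative choices, not part of the original text. Its printed estimates should be close to the SAS values reported in Table 3.3.

```python
from math import sqrt

# LNC and LNP columns of Table 3.2, listed in OBS order (AL, AZ, ..., WY).
LNC = [
    4.96213, 4.66312, 5.10709, 4.50449, 4.66983, 5.04705, 4.65637, 4.80081,
    4.97974, 4.74902, 4.81445, 5.11129, 4.80857, 4.79263, 5.37906, 4.98602,
    4.98722, 4.77751, 4.73877, 4.94744, 4.69589, 4.93990, 5.06430, 4.73313,
    4.77558, 4.96642, 5.10990, 4.70633, 4.58107, 4.66496, 4.58237, 4.97952,
    4.72720, 4.80363, 4.84693, 5.07801, 4.81545, 5.04939, 4.65398, 4.40859,
    5.08799, 4.93061, 4.66134, 4.82454, 4.83026, 5.00087,
]
LNP = [
    0.20487, 0.16640, 0.23406, 0.36399, 0.32149, 0.21929, 0.28946, 0.28733,
    0.12826, 0.17541, 0.24806, 0.08992, 0.24081, 0.21642, -0.03260, 0.23856,
    0.29106, 0.12575, 0.22613, 0.23067, 0.34297, 0.13638, 0.08731, 0.15303,
    0.18907, 0.32304, 0.15852, 0.30901, 0.16458, 0.34701, 0.18197, 0.12889,
    0.19554, 0.22784, 0.30324, 0.07944, 0.13139, 0.15547, 0.28196, 0.19260,
    0.18018, 0.11818, 0.35053, 0.12008, 0.22954, 0.10029,
]


def simple_ols(x, y):
    """OLS of y on a constant and x; returns the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    beta = sxy / sxx                      # slope = price elasticity here
    alpha = y_bar - beta * x_bar          # intercept
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)                    # mean square error
    return {
        "alpha": alpha,
        "beta": beta,
        "s2": s2,
        "se_alpha": sqrt(s2 * (1.0 / n + x_bar ** 2 / sxx)),
        "se_beta": sqrt(s2 / sxx),
        "r2": 1.0 - rss / syy,
    }


fit = simple_ols(LNP, LNC)
for name, value in fit.items():
    print(f"{name:9s}{value: .5f}")
```

Kentucky is observation 15, the only state with a negative LNP; dropping it from both lists and refitting is a quick way to see how strongly that single point anchors the line.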

Figure 3.10 95% Confidence Band for Predicted Values (horizontal axis: Log of Real Price, 1983$/Pack)
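A band like the one in Figure 3.10 can be sketched from the summary numbers in Table 3.3 alone, using the variance of the predicted mean, $\sigma^2[(1/n) + (X_0 - \bar{X})^2/\sum x_i^2]$, with $s^2$ in place of $\sigma^2$. The sketch below makes two assumptions that are not in the original text: the sample mean of LNP and $\sum x_i^2$ are backed out from the reported intercept and standard error, and the 5% two-sided critical value of the t-distribution with 44 degrees of freedom is taken to be roughly 2.015. The function name confidence_band is hypothetical.

```python
from math import sqrt

# Summary statistics copied from Table 3.3.
n = 46
alpha_hat = 5.094108
beta_hat = -1.198316
s = 0.16300             # Root MSE
se_beta = 0.28178857    # standard error of the slope
y_bar = 4.84784         # Dep Mean

# Quantities implied by the table (derived here, not reported in the text).
x_bar = (y_bar - alpha_hat) / beta_hat   # from alpha_hat = y_bar - beta_hat * x_bar
sxx = (s / se_beta) ** 2                 # from se(beta_hat) = s / sqrt(sum of x_i^2)

# Assumption: approximate 97.5th percentile of t with 44 degrees of freedom.
T_CRIT = 2.015


def confidence_band(x0):
    """Return (lower, fitted, upper) for the predicted mean of LNC at LNP = x0."""
    fitted = alpha_hat + beta_hat * x0
    half_width = T_CRIT * s * sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return fitted - half_width, fitted, fitted + half_width


band_at_mean = confidence_band(x_bar)
band_at_kentucky = confidence_band(-0.03260)   # lowest LNP in Table 3.2

for label, band in (("sample mean", band_at_mean), ("Kentucky", band_at_kentucky)):
    lower, fitted, upper = band
    print(f"{label:12s} lower={lower:.4f} fitted={fitted:.4f} upper={upper:.4f}")
```

The band is narrowest at the sample mean of LNP and widens as the price moves away from it, which is why it flares out near the Kentucky observation.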

Problems

1. For the simple regression with a constant $Y_i = \alpha + \beta X_i + u_i$, given in equation (3.1), verify the following numerical properties of the OLS estimator:

$\sum_{i=1}^n e_i = 0, \quad \sum_{i=1}^n e_i X_i = 0, \ldots$

2. [image098]

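(c) Show that $\hat{\alpha}_{OLS}$ is consistent for $\alpha$.

(d) Show that $\text{cov}(\hat{\alpha}_{OLS}, \hat{\beta}_{OLS}) = -\bar{X}\,\text{var}(\hat{\beta}_{OLS}) = -\sigma^2 \bar{X}/\sum_{i=1}^n x_i^2$. This means that the sign of the covariance is determined by the sign of $\bar{X}$. If $\bar{X}$ is positive, this covariance will be negative. This also means that if $\hat{\alpha}_{OLS}$ is over-estimated, $\hat{\beta}_{OLS}$ will be under-estimated.

(b) Show that $\sum_{i=1}^n \lambda_i = 1$ and $\sum_{i=1}^n \lambda_i X_i = 0$.

(e) Prove that $\text{var}(\tilde{\alpha}) = \sigma^2 \sum_{i=1}^n b_i^2 = \sigma^2 \sum_{i=1}^n \lambda_i^2 + \sigma^2 \sum_{i=1}^n f_i^2 = \text{var}(\hat{\alpha}_{OLS}) + \sigma^2 \sum_{i=1}^n f_i^2$.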

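7. (a) Differentiate (3.9) with respect to $\alpha$ and $\beta$ and show that $\hat{\alpha}_{MLE} = \hat{\alpha}_{OLS}$, $\hat{\beta}_{MLE} = \hat{\beta}_{OLS}$.

(b) Differentiate (3.9) with respect to $\sigma^2$ and show that $\hat{\sigma}^2_{MLE} = \sum_{i=1}^n e_i^2/n$.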

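8. The t-Statistic in a Simple Regression. It is well known that a standard normal random variable $N(0,1)$ divided by the square root of a chi-squared random variable divided by its degrees of freedom, $(\chi^2_\nu/\nu)^{1/2}$, results in a random variable that is t-distributed with $\nu$ degrees of freedom, provided the $N(0,1)$ and the $\chi^2$ variables are independent, see Chapter 2. Use this fact to show that

$(\hat{\beta}_{OLS} - \beta)/[s/(\sum_{i=1}^n x_i^2)^{1/2}] \sim t_{n-2}$.

9. Relationship Between $R^2$ and $r^2_{xy}$.

(a) Using the fact that $R^2 = \sum_{i=1}^n \hat{y}_i^2/\sum_{i=1}^n y_i^2$; $\hat{y}_i = \hat{\beta}_{OLS} x_i$ and $\hat{\beta}_{OLS} = \sum_{i=1}^n x_i y_i/\sum_{i=1}^n x_i^2$, show that $R^2 = r^2_{xy}$ where

$r^2_{xy} = (\sum_{i=1}^n x_i y_i)^2/[(\sum_{i=1}^n x_i^2)(\sum_{i=1}^n y_i^2)]$.

(b) Using the fact that $y_i = \hat{y}_i + e_i$, show that $\sum_{i=1}^n \hat{y}_i y_i = \sum_{i=1}^n \hat{y}_i^2$, and hence deduce that

$r^2_{y\hat{y}} = (\sum_{i=1}^n y_i \hat{y}_i)^2/[(\sum_{i=1}^n y_i^2)(\sum_{i=1}^n \hat{y}_i^2)]$

is equal to $R^2$.

10. Prediction. Consider the problem of predicting $Y_0$ from (3.11). Given $X_0$,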

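(a) Show that $E(Y_0) = \alpha + \beta X_0$.

(b) Show that $\hat{Y}_0$ is unbiased for $E(Y_0)$.

(c) Show that $\text{var}(\hat{Y}_0) = \text{var}(\hat{\alpha}_{OLS}) + X_0^2\,\text{var}(\hat{\beta}_{OLS}) + 2X_0\,\text{cov}(\hat{\alpha}_{OLS}, \hat{\beta}_{OLS})$. Deduce that $\text{var}(\hat{Y}_0) = \sigma^2[(1/n) + (X_0 - \bar{X})^2/\sum_{i=1}^n x_i^2]$.

(d) Consider a linear predictor of $E(Y_0)$, say $\tilde{Y}_0 = \sum_{i=1}^n a_i Y_i$; show that $\sum_{i=1}^n a_i = 1$ and $\sum_{i=1}^n a_i X_i = X_0$ for this predictor to be unbiased for $E(Y_0)$.

(e) Show that $\text{var}(\tilde{Y}_0) = \sigma^2 \sum_{i=1}^n a_i^2$. Minimize $\sum_{i=1}^n a_i^2$ subject to the restrictions given in (d). Prove that the resulting predictor is $\tilde{Y}_0 = \hat{\alpha}_{OLS} + \hat{\beta}_{OLS} X_0$ and that the minimum variance is $\sigma^2[(1/n) + (X_0 - \bar{X})^2/\sum_{i=1}^n x_i^2]$.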

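11. Optimal Weighting of Unbiased Estimators. This is based on Baltagi (1995). For the simple regression without a constant $Y_i = \beta X_i + u_i$, $i = 1, 2, \ldots, n$; where $\beta$ is a scalar and $u_i \sim \text{IID}(0, \sigma^2)$ independent of $X_i$. Consider the following three unbiased estimators of $\beta$:

$\hat{\beta}_1 = \sum_{i=1}^n X_i Y_i/\sum_{i=1}^n X_i^2$, $\hat{\beta}_2 = \bar{Y}/\bar{X}$ and

$\hat{\beta}_3 = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})/\sum_{i=1}^n (X_i - \bar{X})^2$, where $\bar{X} = \sum_{i=1}^n X_i/n$ and $\bar{Y} = \sum_{i=1}^n Y_i/n$.

(a) Show that $\text{cov}(\hat{\beta}_1, \hat{\beta}_2) = \text{var}(\hat{\beta}_1) > 0$, and that $\rho_{12}$ = (the correlation coefficient of $\hat{\beta}_1$ and $\hat{\beta}_2$) = $[\text{var}(\hat{\beta}_1)/\text{var}(\hat{\beta}_2)]^{1/2}$ with $0 < \rho_{12} < 1$. Show that the optimal combination of $\hat{\beta}_1$ and $\hat{\beta}_2$, given by $\hat{\beta} = \alpha\hat{\beta}_1 + (1 - \alpha)\hat{\beta}_2$ where $-\infty < \alpha < \infty$, occurs at $\alpha^* = 1$. Optimality here refers to minimizing the variance. Hint: Read the paper by Samuel-Cahn (1994).

(b) Similarly, show that $\text{cov}(\hat{\beta}_1, \hat{\beta}_3) = \text{var}(\hat{\beta}_1) > 0$, and that $\rho_{13}$ = (the correlation coefficient of $\hat{\beta}_1$ and $\hat{\beta}_3$) = $[\text{var}(\hat{\beta}_1)/\text{var}(\hat{\beta}_3)]^{1/2} = (1 - \rho_{12}^2)^{1/2}$ with $0 < \rho_{13} < 1$. Conclude that the optimal combination of $\hat{\beta}_1$ and $\hat{\beta}_3$ is again $\alpha^* = 1$.

(c) Show that $\text{cov}(\hat{\beta}_2, \hat{\beta}_3) = 0$ and that the optimal combination of $\hat{\beta}_2$ and $\hat{\beta}_3$ is $\hat{\beta} = (1 - \rho_{12}^2)\hat{\beta}_3 + \rho_{12}^2\hat{\beta}_2 = \hat{\beta}_1$. This exercise demonstrates a more general result, namely that the BLUE of $\beta$, in this case $\hat{\beta}_1$, has a positive correlation with any other linear unbiased estimator of $\beta$, and that this correlation can be easily computed from the ratio of the variances of these two estimators.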

12. Efficiency as Correlation. This is based on Oksanen (1993). Let $\hat{\beta}$ denote the Best Linear Unbiased Estimator of $\beta$ and let $\tilde{\beta}$ denote any linear unbiased estimator of $\beta$. Show that the relative efficiency of $\tilde{\beta}$ with respect to $\hat{\beta}$ is the squared correlation coefficient between $\hat{\beta}$ and $\tilde{\beta}$. Hint: Compute the variance of $\hat{\beta} + \lambda(\hat{\beta} - \tilde{\beta})$ for any $\lambda$. This variance is minimized at $\lambda = 0$ since $\hat{\beta}$ is BLUE. This should give you the result that $E(\hat{\beta}^2) = E(\hat{\beta}\tilde{\beta})$, which in turn proves the required result, see Zheng (1994).

13. For the numerical illustration given in section 3.9, what happens to the least squares regression coefficient estimates ($\hat{\alpha}_{OLS}$, $\hat{\beta}_{OLS}$), $s^2$, the estimated se($\hat{\alpha}_{OLS}$) and se($\hat{\beta}_{OLS}$), the t-statistics for $\hat{\alpha}_{OLS}$ and $\hat{\beta}_{OLS}$ for $H_0^a$: $\alpha = 0$ and $H_0^b$: $\beta = 0$, and $R^2$ when:

(a) $Y_i$ is regressed on $X_i + 5$ rather than $X_i$. In other words, we add a constant 5 to each observation of the explanatory variable $X_i$ and rerun the regression. It is very instructive to see how the computations in Table 3.1 are affected by this simple transformation on $X_i$.

(b) Yi + 2 is regressed on Xi. In other words, a constant 2 is added to Yi.

(c) Yi is regressed on 2Xi. (A constant 2 is multiplied by Xi).

14. For the cigarette consumption data given in Table 3.2.

(a) Give the descriptive statistics for logC, logP and logY. Plot their histogram. Also, plot logC versus logY and logC versus logP. Obtain the correlation matrix of these variables.

(b) Run the regression of logC on logY. What is the income elasticity estimate? What is its standard error? Test the null hypothesis that this elasticity is zero. What are the $s$ and $R^2$ of this regression?

(c) Show that the square of the simple correlation coefficient between logC and logY is equal to $R^2$. Show that the square of the correlation coefficient between the fitted and actual values of logC is also equal to $R^2$.

(d) Plot the residuals versus income. Also, plot the fitted values along with their 95% confidence band.

15. Consider the simple regression with no constant: $Y_i = \beta X_i + u_i$, $i = 1, 2, \ldots, n$

where $u_i \sim \text{IID}(0, \sigma^2)$ independent of $X_i$. Theil (1971) showed that among all linear estimators in $Y_i$, the minimum mean square estimator for $\beta$, i.e., that which minimizes $E(\tilde{\beta} - \beta)^2$, is given by

$\tilde{\beta} = \beta^2 \sum_{i=1}^n X_i Y_i/(\beta^2 \sum_{i=1}^n X_i^2 + \sigma^2)$.

(a) Show that $E(\tilde{\beta}) = \beta/(1 + c)$, where $c = \sigma^2/(\beta^2 \sum_{i=1}^n X_i^2) > 0$.

(b) Conclude that the Bias($\tilde{\beta}$) = $E(\tilde{\beta}) - \beta = -[c/(1 + c)]\beta$. Note that this bias is positive (negative) when $\beta$ is negative (positive). This also means that $\tilde{\beta}$ is biased towards zero.

(c) Show that MSE($\tilde{\beta}$) = $E(\tilde{\beta} - \beta)^2 = \sigma^2/[\sum_{i=1}^n X_i^2 + (\sigma^2/\beta^2)]$. Conclude that it is smaller than MSE($\hat{\beta}_{OLS}$).

Table 3.4 Energy Data for 20 countries

RGDP: in 10^6 1975 U.S. $'s
EN: 10^6 Kilograms Coal Equivalents

Country        RGDP      EN
Malta            1251       456
Iceland          1331      1124
Cyprus           2003      1211
Ireland         11788     11053
Norway          27914     26086
Finland         28388     26405
Portugal        30642     12080
Denmark         34540     27049
Greece          38039     20119
Switzerland     42238     23234
Austria         45451     30633
Sweden          59350     45132
Belgium         62049     58894
Netherlands     82804     84416
Turkey          91946     32619
Spain          159602     88148
Italy          265863    192453
U.K.           279191    268056
France         358675    233907
W. Germany     428888    352.677

16. Table 3.4 gives cross-section data for 1980 on real gross domestic product (RGDP) and aggregate energy consumption (EN) for 20 countries.

(a) Enter the data and provide descriptive statistics. Plot the histograms for RGDP and EN. Plot EN versus RGDP.

(b) Estimate the regression:

$\log(En) = \alpha + \beta \log(RGDP) + u$.

Be sure to plot the residuals. What do they show?

(c) Test $\beta = 1$.

(d) One of your Energy data observations has a misplaced decimal. Multiply it by 1000. Now repeat parts (a), (b) and (c).

(e) Was there any reason for ordering the data from the lowest to highest energy consumption? Explain.

Lesson Learned: Always plot the residuals. Always check your data very carefully.
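The lesson above can be illustrated with a short script. This is a hedged sketch, not a worked solution: it fits the log-log regression of part (b) to Table 3.4 exactly as printed, flags the observation with the largest absolute residual as the suspect, applies the factor-of-1000 correction from part (d) to it, and refits. The names loglog_fit, suspect and corrected are illustrative, and picking the suspect by largest residual is an assumption standing in for actually looking at a residual plot.

```python
from math import log

# (country, RGDP, EN) rows of Table 3.4, exactly as printed.
ENERGY = [
    ("Malta", 1251, 456), ("Iceland", 1331, 1124), ("Cyprus", 2003, 1211),
    ("Ireland", 11788, 11053), ("Norway", 27914, 26086),
    ("Finland", 28388, 26405), ("Portugal", 30642, 12080),
    ("Denmark", 34540, 27049), ("Greece", 38039, 20119),
    ("Switzerland", 42238, 23234), ("Austria", 45451, 30633),
    ("Sweden", 59350, 45132), ("Belgium", 62049, 58894),
    ("Netherlands", 82804, 84416), ("Turkey", 91946, 32619),
    ("Spain", 159602, 88148), ("Italy", 265863, 192453),
    ("U.K.", 279191, 268056), ("France", 358675, 233907),
    ("W. Germany", 428888, 352.677),
]


def loglog_fit(rows):
    """OLS of log(EN) on a constant and log(RGDP); returns estimates and residuals."""
    x = [log(rgdp) for _, rgdp, _ in rows]
    y = [log(en) for _, _, en in rows]
    n = len(rows)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    r2 = 1.0 - sum(e * e for e in resid) / syy
    return alpha, beta, r2, resid


alpha_raw, beta_raw, r2_raw, resid_raw = loglog_fit(ENERGY)

# Flag the observation with the largest absolute residual as the suspect.
suspect = max(range(len(ENERGY)), key=lambda i: abs(resid_raw[i]))
name, rgdp, en = ENERGY[suspect]

# Part (d): multiply the suspect EN value by 1000 and refit.
corrected = list(ENERGY)
corrected[suspect] = (name, rgdp, en * 1000)
alpha_fix, beta_fix, r2_fix, resid_fix = loglog_fit(corrected)

print(f"suspect observation: {name}")
print(f"as printed: beta={beta_raw:.3f}  R2={r2_raw:.3f}")
print(f"corrected:  beta={beta_fix:.3f}  R2={r2_fix:.3f}")
```

Comparing the two printed lines shows how much a single mis-keyed value can move both the slope and the fit, which is the point of plotting the residuals before trusting the estimates.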

17. Using the Energy Data given in Table 3.4, corrected as in problem 16 part (d), is it legitimate to reverse the form of the equation?

$\log(RGDP) = \gamma + \delta \log(En) + \epsilon$

(a) Economically, does this change the interpretation of the equation? Explain.

(b) Estimate this equation and compare the $R^2$ of this equation with that of the previous problem. Also, check if $\hat{\delta} = 1/\hat{\beta}$. Why are they different?

(c) Statistically, by reversing the equation, which assumptions do we violate?

(d) Show that $\hat{\delta}\hat{\beta} = R^2$.

(e) Effects of changing units in which variables are measured. Suppose you measured energy in BTU's instead of kilograms of coal equivalents so that the original series was multiplied by 60. How does it change $\alpha$ and $\beta$ in the following equations?

$\log(En) = \alpha + \beta \log(RGDP) + u$

$En = \alpha^* + \beta^* RGDP + \nu$

Can you explain why $\hat{\alpha}$ changed, but not $\hat{\beta}$ for the log-log model, whereas both $\hat{\alpha}^*$ and $\hat{\beta}^*$ changed for the linear model?

(f) For the log-log specification and the linear specification, compare the GDP elasticity for Malta and W. Germany. Are both equally plausible?

(g) Plot the residuals from both linear and log-log models. What do you observe?

(h) Can you compare the $R^2$ and standard errors from both models in part (g)? Hint: Retrieve $\log(En)$ and its fitted value $\widehat{\log(En)}$ from the log-log equation, exponentiate, then compute the residuals and $s$. These are comparable to those obtained from the linear model.

18. For the model considered in problem 16: $\log(En) = \alpha + \beta \log(RGDP) + u$ and measuring energy in BTU's (like part (e) of problem 17).

(a) What is the 95% confidence prediction interval at the sample mean?

(b) What is the 95% confidence prediction interval for Malta?

(c) What is the 95% confidence prediction interval for West Germany?

References

Additional readings on the material covered in this chapter can be found in:

Baltagi, B. H. (1995), “Optimal Weighting of Unbiased Estimators,” Econometric Theory, Problem 95.3.1, 11:637.

Baltagi, B. H. and D. Levin (1992), “Cigarette Taxation: Raising Revenues and Reducing Consumption,” Structural Change and Economic Dynamics, 3: 321-335.

Belsley, D. A., E. Kuh and R. E. Welsch (1980), Regression Diagnostics (Wiley: New York).

Greene, W. (1993), Econometric Analysis (Macmillan: New York).

Gujarati, D. (1995), Basic Econometrics (McGraw-Hill: New York).

Johnston, J. (1984), Econometric Methods (McGraw-Hill: New York).

Kelejian, H. and W. Oates (1989), Introduction to Econometrics (Harper and Row: New York).

Kennedy, P. (1992), A Guide to Econometrics (MIT Press: Cambridge).

Kmenta, J. (1986), Elements of Econometrics (Macmillan: New York).

Maddala, G. S. (1992), Introduction to Econometrics (Macmillan: New York).

Oksanen, E. H. (1993), “Efficiency as Correlation,” Econometric Theory, Problem 93.1.3, 9: 146.

Samuel-Cahn, E. (1994), “Combining Unbiased Estimators,” The American Statistician, 48: 34-36.

Wallace, D. and L. Silver (1988), Econometrics: An Introduction (Addison Wesley: New York).

Zheng, J. X. (1994), “Efficiency as Correlation,” Econometric Theory, Solution 93.1.3, 10: 228.

Appendix

Centered and Uncentered R2

From the OLS regression on (3.1) we get

$Y_i = \hat{Y}_i + e_i, \quad i = 1, 2, \ldots, n$    (A.1)

where $\hat{Y}_i = \hat{\alpha}_{OLS} + X_i\hat{\beta}_{OLS}$. Squaring and summing the above equation we get

$\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n \hat{Y}_i^2 + \sum_{i=1}^n e_i^2$    (A.2)

since $\sum_{i=1}^n \hat{Y}_i e_i = 0$. The uncentered $R^2$ is given by

uncentered $R^2 = 1 - \sum_{i=1}^n e_i^2/\sum_{i=1}^n Y_i^2 = \sum_{i=1}^n \hat{Y}_i^2/\sum_{i=1}^n Y_i^2$    (A.3)

Note that the total sum of squares for $Y_i$ is not expressed in deviation from the sample mean $\bar{Y}$. In other words, the uncentered $R^2$ is the proportion of variation of $\sum_{i=1}^n Y_i^2$ that is explained by the regression on $X$. Regression packages usually report the centered $R^2$, which was defined in section 3.6 as $1 - (\sum_{i=1}^n e_i^2/\sum_{i=1}^n y_i^2)$ where $y_i = Y_i - \bar{Y}$. The latter measure focuses on explaining the variation in $Y_i$ after fitting the constant.

From section 3.6, we have seen that a naive model with only a constant in it gives $\bar{Y}$ as the estimate of the constant, see also problem 2. The variation in $Y_i$ that is not explained by this naive model is $\sum_{i=1}^n y_i^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2$. Subtracting $n\bar{Y}^2$ from both sides of (A.2) we get

$\sum_{i=1}^n y_i^2 = \sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2 + \sum_{i=1}^n e_i^2$

and the centered $R^2$ is

centered $R^2 = 1 - (\sum_{i=1}^n e_i^2/\sum_{i=1}^n y_i^2) = (\sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2)/\sum_{i=1}^n y_i^2$    (A.4)

If there is a constant in the model, $\bar{\hat{Y}} = \bar{Y}$, see section 3.6, and $\sum_{i=1}^n \hat{y}_i^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{\hat{Y}})^2 = \sum_{i=1}^n \hat{Y}_i^2 - n\bar{Y}^2$. Therefore, the centered $R^2 = \sum_{i=1}^n \hat{y}_i^2/\sum_{i=1}^n y_i^2$, which is the $R^2$ reported by regression packages. If there is no constant in the model, some regression packages give you the option of (no constant) and the $R^2$ reported is usually the uncentered $R^2$. Check your regression package documentation to verify what you are getting. We will encounter the uncentered $R^2$ again in constructing test statistics using regressions, see for example Chapter 11.
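The difference between the two measures is easy to check numerically. The sketch below uses a small hypothetical data set (not from the text) and two illustrative helper functions, fit_line and r_squared, to compute both versions of $R^2$ for a regression with a constant and for a regression through the origin. It also evaluates the right-hand side of (A.4) directly so that it can be compared with the centered $R^2$.

```python
def fit_line(x, y, constant=True):
    """Return OLS fitted values of y on x, with or without a constant."""
    n = len(x)
    if constant:
        x_bar, y_bar = sum(x) / n, sum(y) / n
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        beta = sxy / sxx
        alpha = y_bar - beta * x_bar
    else:
        beta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        alpha = 0.0
    return [alpha + beta * xi for xi in x]


def r_squared(y, fitted):
    """Return (centered, uncentered) R^2 for the given fitted values."""
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    uncentered = 1.0 - rss / sum(yi * yi for yi in y)          # as in (A.3)
    centered = 1.0 - rss / sum((yi - y_bar) ** 2 for yi in y)  # section 3.6 definition
    return centered, uncentered


# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5, 6]
y = [12.1, 12.9, 14.2, 14.8, 16.1, 16.9]
n = len(y)
y_bar = sum(y) / n

# Regression with a constant.
fitted_c = fit_line(x, y, constant=True)
centered_c, uncentered_c = r_squared(y, fitted_c)

# Right-hand side of (A.4): (sum of Yhat^2 - n * Ybar^2) / sum of y^2.
a4 = (sum(f * f for f in fitted_c) - n * y_bar ** 2) / sum((yi - y_bar) ** 2 for yi in y)

# Regression through the origin (no constant).
fitted_nc = fit_line(x, y, constant=False)
centered_nc, uncentered_nc = r_squared(y, fitted_nc)

print(f"with constant: centered={centered_c:.4f} (A.4)={a4:.4f} uncentered={uncentered_c:.4f}")
print(f"no constant:   centered={centered_nc:.4f} uncentered={uncentered_nc:.4f}")
```

In this example the mean of Y is far from zero, so forcing the line through the origin fits worse than the naive constant-only model and the centered formula turns negative, while the uncentered measure stays between 0 and 1. This is one reason packages switch to the uncentered $R^2$ when the constant is suppressed.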
