Dummy Variables
Many explanatory variables are qualitative in nature. For example, the head of a household could be male or female, white or nonwhite, employed or unemployed. In this case, one codes these variables as “M” for male and “F” for female, or changes this qualitative variable into a quantitative variable called FEMALE which takes the value “0” for male and “1” for female. This obviously begs the question: “why not have a variable MALE that takes on the value 1 for male and 0 for female?” Actually, the variable MALE would be exactly 1 − FEMALE. In other words, the zero and one can be thought of as a switch, which turns on when it is 1 and off when it is 0. Suppose that we are interested in the earnings of households, denoted by EARN, and MALE and FEMALE are the only explanatory variables available; then problem 10 asks the reader to verify that running OLS on the following model:
EARN = αM MALE + αF FEMALE + u (4.21)
gives α̂M = “average earnings of the males in the sample” and α̂F = “average earnings of the females in the sample.” Notice that there is no intercept in (4.21); this is because of what is known in the literature as the “dummy variable trap.” Briefly stated, there would be perfect multicollinearity between MALE, FEMALE and the constant. In fact, MALE + FEMALE = 1. Some researchers may choose to include the intercept and exclude one of the sex dummy variables, say MALE; then
EARN = α + β FEMALE + u (4.22)
and the OLS estimates give α̂ = “average earnings of males in the sample” = α̂M, while β̂ = α̂F − α̂M = “the difference in average earnings between females and males in the sample.” Regression (4.22) is more popular when one is interested in contrasting the earnings of males and females, since a single regression yields both the markup or markdown in average earnings (α̂F − α̂M) and a test of whether this difference is statistically different from zero. The latter is simply the t-statistic on β̂ in (4.22). On the other hand, if one is interested in estimating the average earnings of males and females separately, then model (4.21) should be the one to consider. In this case, the t-test for αF − αM = 0 would involve further calculations not directly given by the regression in (4.21), but similar to the calculations given in Example 3.
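As a quick numerical check of these algebraic results, here is a minimal sketch in Python; the earnings figures are invented for illustration, and numpy's least-squares routine stands in for a regression package. Running (4.21) without an intercept recovers the two group means, while (4.22) recovers the male mean and the female–male differential.

```python
import numpy as np

# Hypothetical toy sample: earnings for 3 males followed by 3 females.
earn = np.array([30.0, 34.0, 32.0, 25.0, 27.0, 29.0])
female = np.array([0, 0, 0, 1, 1, 1], dtype=float)
male = 1.0 - female

# Model (4.21): no intercept, both dummies included.
X1 = np.column_stack([male, female])
a_M, a_F = np.linalg.lstsq(X1, earn, rcond=None)[0]

# Model (4.22): intercept plus FEMALE only.
X2 = np.column_stack([np.ones_like(earn), female])
a, b = np.linalg.lstsq(X2, earn, rcond=None)[0]

print(a_M, a_F)   # group means: 32.0 and 27.0
print(a, b)       # 32.0 and 27.0 - 32.0 = -5.0
```

Note that including the intercept together with both MALE and FEMALE would make X1 singular, which is the dummy variable trap in matrix form.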
What happens when another qualitative variable is included to depict another classification of the individuals in the sample, say race? Suppose there are three race groups in the sample: WHITE, BLACK and HISPANIC. One could create a dummy variable for each of these classifications. For example, WHITE takes the value 1 when the individual is white and 0 when the individual is nonwhite. Note that the dummy variable trap does not allow the inclusion of all three categories, as they sum to 1. Also, even if the intercept is dropped, once MALE and FEMALE are included, perfect multicollinearity is still present because MALE + FEMALE = WHITE + BLACK + HISPANIC. Therefore, one category from race should be dropped. Suits (1984) argues that the researcher should use the choice of omitted dummy variable category to his or her advantage in interpreting the results, keeping in mind the purpose of the study. For example, if one is interested in comparing earnings across the sexes holding race constant, the omission of MALE or FEMALE is natural, whereas if one is interested in the race differential in earnings holding gender constant, one of the race variables should be omitted. Whichever variable is omitted becomes the base category with which the other categories’ earnings are compared. Most researchers prefer to keep an intercept, although regression packages allow a no-intercept option. In this case one should omit one category from each of the race and sex classifications. For example, if MALE and WHITE are omitted:
EARN = α + βF FEMALE + βB BLACK + βH HISPANIC + u (4.23)
Assuming the error u satisfies all the classical assumptions, and taking expected values of both sides of (4.23), one can see that the intercept α = the expected value of earnings of the omitted category, which is “white males.” For this category, all the other switches are off. Similarly, α + βF is the expected value of earnings of “white females,” since only the FEMALE switch is on. One can conclude that βF = the difference in the expected value of earnings between white females and white males. Similarly, one can show that α + βB is the expected earnings of “black males” and α + βF + βB is the expected earnings of “black females.” Therefore, βF also represents the difference in expected earnings between black females and black males. In fact, problem 11 asks the reader to show that βF represents the difference in expected earnings between Hispanic females and Hispanic males. In other words, βF represents the differential in expected earnings between females and males holding race constant. Similarly, one can show that βB is the difference in expected earnings between blacks and whites holding sex constant, and βH is the differential in expected earnings between Hispanics and whites holding sex constant. The key to interpreting the dummy variable coefficients is to turn the proper switches on and off and write down the correct expectations.
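The switch logic can be made mechanical. The sketch below plugs hypothetical coefficient values into (4.23) (the numbers are invented for illustration, not estimates) and confirms that the female–male gap equals βF within every race group:

```python
# Hypothetical coefficients for (4.23); the values are invented for illustration.
a, bF, bB, bH = 30.0, -5.0, -4.0, -3.0

def expected_earnings(female, black, hispanic):
    # Turn each dummy "switch" on (1) or off (0).
    return a + bF * female + bB * black + bH * hispanic

# White males are the base category: all switches off.
assert expected_earnings(0, 0, 0) == a

# The female-male differential is bF whatever the race:
for black, hispanic in [(0, 0), (1, 0), (0, 1)]:
    gap = expected_earnings(1, black, hispanic) - expected_earnings(0, black, hispanic)
    assert gap == bF
```

The same function shows, for instance, that black females are expected to earn a + bF + bB, exactly as derived in the text.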
The real regression will contain other quantitative and qualitative variables, like
EARN = α + βF FEMALE + βB BLACK + βH HISPANIC + γ1 EXP (4.24)
+ γ2 EXP2 + γ3 EDUC + γ4 UNION + u
where EXP is years of job experience, EDUC is years of education, and UNION is 1 if the individual belongs to a union and 0 otherwise. EXP2 is the squared value of EXP. Once again, one can interpret the coefficients of these regressions by turning the proper switches on or off. For example, γ4 is interpreted as the expected difference in earnings between union and nonunion members, holding all other variables included in (4.24) constant. Halvorsen and Palmquist (1980) warn economists about the interpretation of dummy variable coefficients when the dependent variable is in logs. For example, if the earnings equation is semilogarithmic:
log(Earnings) = α + β UNION + γ EDUC + u
then γ = % change in earnings for one extra year of education, holding union membership constant. But what about the returns to union membership? If we let Y1 = log(Earnings) when the individual belongs to a union, and Y0 = log(Earnings) when the individual does not, then g = % change in earnings due to union membership = (e^Y1 − e^Y0)/e^Y0. Equivalently, one can write log(1 + g) = Y1 − Y0 = β, or g = e^β − 1. In other words, one should not hasten to conclude that β has the same interpretation as γ. In fact, the % change in earnings due to union membership is e^β − 1 and not β. The error involved in using β rather than e^β − 1 to estimate g can be substantial, especially if β is large. For example, when β = 0.5, 0.75, 1; g = e^β − 1 = 0.65, 1.12, 1.72, respectively. Kennedy (1981) notes that if β̂ is unbiased for β, ĝ is not necessarily unbiased for g. However, consistency of β̂ implies consistency of ĝ. If one assumes lognormally distributed errors, then E(e^β̂) = e^(β + 0.5 var(β̂)). Based on this result, Kennedy (1981) suggests estimating g by ĝ = e^(β̂ − 0.5 v̂ar(β̂)) − 1, where v̂ar(β̂) is a consistent estimate of var(β̂).
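To see the size of the approximation error, the following sketch computes g = e^β − 1 for the values quoted above, and then applies Kennedy's (1981) correction to the UNION coefficient and standard error reported in Table 4.1 (using them here as a worked example is my own choice, not the text's):

```python
import math

# Percentage change implied by a dummy coefficient beta in a semilog equation.
for beta in (0.5, 0.75, 1.0):
    g = math.exp(beta) - 1.0
    print(f"beta = {beta:.2f}  g = {g:.2f}")  # 0.65, 1.12, 1.72

# Kennedy's (1981) suggestion applied to the UNION coefficient of Table 4.1.
beta_hat = 0.106278
se_beta_hat = 0.03167547
g_hat = math.exp(beta_hat - 0.5 * se_beta_hat**2) - 1.0
print(round(g_hat, 4))
```

For small β the correction is minor (here ĝ is about 0.112 versus β̂ = 0.106), but at β = 1 reporting β itself would understate the percentage change by nearly half.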
Another use of dummy variables is in taking into account seasonal factors, i.e., including 3 seasonal dummy variables with the omitted season becoming the base for comparison.1 For example:
Sales = α + βW Winter + βS Spring + βF Fall + γ1 Price + u (4.25)
the omitted season being Summer. If (4.25) models the sales of air-conditioning units, then βF is the difference in expected sales between the Fall and Summer seasons, holding the price of an air-conditioning unit constant. If these were heating units, one may want to change the base season for comparison.
Another use of dummy variables is for War years, where consumption is not at its normal level, say due to rationing. Consider estimating the following consumption function:
Ct = α + βYt + δ WARt + ut,  t = 1, 2, …, T (4.26)
where Ct denotes real per capita consumption, Yt denotes real per capita personal disposable income, and WARt is a dummy variable taking the value 1 in War time periods and 0 otherwise. Note that the War years do not affect the slope of the consumption line with respect to income, only the intercept. The intercept is α in non-War years and α + δ in War years. In other words, the marginal propensity to consume out of income is the same in War and non-War years; only the level of consumption differs.
Of course, one can dummy other unusual years, like periods of strikes, years of natural disaster (earthquakes, floods, hurricanes), or external shocks beyond control, like the oil embargo of 1973. If this dummy covers only one year, say 1973, then the dummy variable for 1973, call it D73, takes the value 1 for 1973 and zero otherwise. Including D73 as an extra variable in the regression has the effect of removing the 1973 observation from estimation, and the resulting regression coefficient estimates are exactly the same as those obtained by excluding the 1973 observation. In fact, using matrix algebra in Chapter 7, we will show that the coefficient estimate of D73 is the forecast error for 1973 from the regression that ignores the 1973 observation. In addition, the standard error of this dummy coefficient estimate is the standard error of this forecast. This is a much easier way of obtaining the forecast error and its standard error from the regression package without additional computations, see Salkever (1976). More on this in Chapter 7.
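The Salkever (1976) result is easy to verify numerically. In the sketch below the data are simulated under an assumed consumption-style process (not the 1973 episode itself): adding a dummy for a single observation reproduces the coefficients obtained by dropping that observation, and the dummy's coefficient equals the forecast error for it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated annual data (assumed numbers, for illustration only).
t = np.arange(20)
y_inc = 100 + 5 * t + rng.normal(0, 2, 20)       # income
cons = 10 + 0.8 * y_inc + rng.normal(0, 1, 20)   # consumption
d = (t == 10).astype(float)                      # dummy for one "unusual" year

# Regression with the single-year dummy included.
X = np.column_stack([np.ones(20), y_inc, d])
coef = np.linalg.lstsq(X, cons, rcond=None)[0]

# Regression that drops that observation entirely.
keep = t != 10
Xk = np.column_stack([np.ones(keep.sum()), y_inc[keep]])
coef_k = np.linalg.lstsq(Xk, cons[keep], rcond=None)[0]
forecast_error = cons[10] - (coef_k[0] + coef_k[1] * y_inc[10])

# Intercept and slope match, and the dummy coefficient is the forecast error.
assert np.allclose(coef[:2], coef_k)
assert np.isclose(coef[2], forecast_error)
```

The dummy absorbs the flagged observation's residual completely, which is exactly why the remaining coefficients behave as if that year were deleted.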
Interaction Effects
So far the dummy variables have been used to shift the intercept of the regression keeping the slopes constant. One can also use the dummy variables to shift the slopes by letting them interact with the explanatory variables. For example, consider the following earnings equation:
EARN = α + αF FEMALE + β EDUC + u (4.27)
In this regression, only the intercept shifts from males to females. The return to an extra year of education is simply β, which is assumed to be the same for males and females. But if we now introduce the interaction variable (FEMALE × EDUC), then the regression becomes:
EARN = α + αF FEMALE + β EDUC + γ(FEMALE × EDUC) + u (4.28)
In this case, the return to an extra year of education depends upon the sex of the individual. In fact, ∂(EARN)/∂(EDUC) = β + γ(FEMALE) = β if male, and β + γ if female. Note that the interaction variable = EDUC if the individual is female and 0 if the individual is male.
Estimating (4.28) is equivalent to estimating two earnings equations, one for males and another one for females, separately. The only difference is that (4.28) imposes the same variance across the two groups, whereas separate regressions do not impose this (albeit restrictive) assumption of equal variances. This setup is ideal for testing the equality of slopes, equality of intercepts, or equality of both intercepts and slopes across the sexes. This can be done with the F-test described in (4.17). In fact, for H0: equality of slopes, given different intercepts, the restricted residual sum of squares (RRSS) is obtained from (4.27), while the unrestricted residual sum of squares (URSS) is obtained from (4.28). Problem 12 asks the reader to set up the F-test for the following null hypotheses: (i) equality of slopes and intercepts, and (ii) equality of intercepts given the same slopes.
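The RRSS/URSS comparison can be sketched as follows. The data here are simulated under an assumed process in which the education slope differs by sex; this is not the PSID sample, just an illustration of the mechanics of the F-test in (4.17).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
female = (rng.random(n) < 0.5).astype(float)
educ = rng.integers(8, 18, n).astype(float)
# Assumed data-generating process: females get a lower return to education.
earn = 5 + 0.8 * educ - 1.0 * female - 0.2 * female * educ + rng.normal(0, 1, n)

def rss(X, y):
    # Residual sum of squares from an OLS fit of y on X.
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

ones = np.ones(n)
# Restricted model (4.27): common slope, different intercepts.
rrss = rss(np.column_stack([ones, female, educ]), earn)
# Unrestricted model (4.28): adds the interaction FEMALE x EDUC.
urss = rss(np.column_stack([ones, female, educ, female * educ]), earn)

k = 4                                          # parameters in the unrestricted model
F = ((rrss - urss) / 1) / (urss / (n - k))     # 1 restriction: gamma = 0
print(F)
```

Since the restricted model can never fit better, RRSS ≥ URSS always holds, and a large F leads to rejecting the equality of slopes.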
Dummy variables have many useful applications in economics. For example, several tests, including the Chow (1960) test and the Utts (1982) Rainbow test described in Chapter 8, can be applied using dummy variable regressions. Additionally, they can be used in modeling splines, see Poirier (1976) and Suits, Mason and Chan (1978), and fixed effects in panel data, see Chapter 12. Finally, when the dependent variable is itself a dummy variable, the regression equation needs special treatment, see Chapter 13 on qualitative and limited dependent variables.
Empirical Example: Table 4.1 gives the results of a regression on 595 individuals drawn from the Panel Study of Income Dynamics (PSID) in 1982. This data is provided on the Springer web site as EARN.ASC. A description of the data is given in Cornwell and Rupert (1988). In particular, log wage is regressed on years of education (ED), weeks worked (WKS), years of full-time work experience (EXP), occupation (OCC = 1, if the individual is in a blue-collar occupation), residence (SOUTH = 1, SMSA = 1, if the individual resides in the South, or in a standard metropolitan statistical area), industry (IND = 1, if the individual works in a manufacturing industry), marital status (MS = 1, if the individual is married), sex and race (FEM = 1, BLK = 1, if the individual is female or black), and union coverage (UNION = 1, if the individual’s wage is set by a union contract). These results show that the return to an extra year of schooling is 5.7%, holding everything else constant. They show that males on the average earn more than females, blacks on the average earn less than whites, and union workers earn more than nonunion workers. Individuals residing in the South earn less than those living elsewhere. Those residing in a standard metropolitan statistical area earn more on the average than those
Table 4.1

Dependent Variable: LWAGE

Analysis of Variance

Source    DF    Sum of Squares    Mean Square    F Value    Prob > F
Model      12        52.48064        4.37339      41.263      0.0001
Error     582        61.68465        0.10599
C Total   594       114.16529

Root MSE    0.32556    R-square    0.4597
Dep Mean    6.95074    Adj R-sq    0.4485
C.V.        4.68377

Parameter Estimates

Variable    DF    Parameter Estimate    Standard Error    T for H0: Parameter=0    Prob > |T|
INTERCEP     1          5.590093          0.19011263             29.404              0.0001
WKS          1          0.003413          0.00267762              1.275              0.2030
SOUTH        1         -0.058763          0.03090689             -1.901              0.0578
SMSA         1          0.166191          0.02955099              5.624              0.0001
MS           1          0.095237          0.04892770              1.946              0.0521
EXP          1          0.029380          0.00652410              4.503              0.0001
EXP2         1         -0.000486          0.00012680             -3.833              0.0001
OCC          1         -0.161522          0.03690729             -4.376              0.0001
IND          1          0.084663          0.02916370              2.903              0.0038
UNION        1          0.106278          0.03167547              3.355              0.0008
FEM          1         -0.324557          0.06072947             -5.344              0.0001
BLK          1         -0.190422          0.05441180             -3.500              0.0005
ED           1          0.057194          0.00659101              8.678              0.0001
who do not. Individuals who work in a manufacturing industry, are not blue-collar workers, or are married earn more on the average than those who are not. With EXP2 = (EXP)², this regression indicates a significant quadratic relationship between earnings and experience. All the variables were significant at the 5% level except for WKS, SOUTH and MS.
1. There are more sophisticated ways of seasonal adjustment than introducing seasonal dummies, see Judge et al. (1985).
1. For the Cigarette Data given in Table 3.2, run the following regressions:
(a) Real per capita consumption of cigarettes on real price and real per capita income. (All variables are in log form, and all regressions in this problem include a constant).
(b) Real per capita consumption of cigarettes on real price.
(c) Real per capita income on real price.
(d) Real per capita consumption on the residuals of part (c).
(e) Residuals from part (b) on the residuals in part (c).
(f) Compare the regression slope estimates in parts (d) and (e) with the regression coefficient estimate of the real income coefficient in part (a), what do you conclude?
2. Simple Versus Multiple Regression Coefficients. This is based on Baltagi (1987b). Consider the multiple regression
Yi = α + β2 X2i + β3 X3i + ui    i = 1, 2, …, n
along with the following auxiliary regressions:
X2i = a + b X3i + ν2i
X3i = c + d X2i + ν3i
In section 4.3, we showed that β̂2, the OLS estimate of β2, can be interpreted as the slope estimate of a simple regression of Y on the OLS residuals ν̂2. A similar interpretation can be given to β̂3. Kennedy (1981, p. 416) claims that β̂2 is not necessarily the same as δ̂2, the OLS estimate of δ2 obtained from the regression of Y on ν̂2, ν̂3 and a constant, Yi = γ + δ2 ν̂2i + δ3 ν̂3i + wi. Prove this claim by finding a relationship between the δ̂’s and the β̂’s.
3. For the simple regression Yi = α + βXi + ui considered in Chapter 3, show that
(a) β̂OLS = Σ xi yi / Σ xi², with sums running over i = 1, …, n, can be obtained using the residual interpretation by regressing X on a constant first, getting the residuals x̂, and then regressing Y on x̂.
(b) α̂OLS = Ȳ − β̂OLS X̄ can be obtained using the residual interpretation by regressing 1 on X, obtaining the residuals, and then regressing Y on these residuals.
(c) Check that var(α̂OLS) and var(β̂OLS) in parts (a) and (b) agree with those obtained from the residual interpretation.
4. Effect of Additional Regressors on R2. This is based on Nieswiadomy (1986).
(a) Suppose that the multiple regression given in (4.1) has K1 regressors in it. Denote the least squares sum of squared errors by SSE1. Now add K2 regressors so that the total number of regressors is K = K1 + K2. Denote the corresponding least squares sum of squared errors by SSE2. Show that SSE2 ≤ SSE1, and conclude that the corresponding R-squares satisfy R²₂ ≥ R²₁.
(b) Derive the equality given in (4.16) starting from the definitions of R² and R̄².
(c) Show that the corresponding adjusted R-squares satisfy R̄²₁ ≥ R̄²₂ when the F-statistic for the joint significance of these additional K2 regressors is less than or equal to one.
5. Perfect Multicollinearity. Let Y be the output and X2 = skilled labor and X3 = unskilled labor in the following relationship:
Yi = α + β2 X2i + β3 X3i + β4 (X2i + X3i) + β5 X2i + β6 X3i + ui
What parameters are estimable by OLS?
6. Suppose that we have estimated the parameters of the multiple regression model:
Yt = β1 + β2 Xt2 + β3 Xt3 + ut
by the Ordinary Least Squares (OLS) method. Denote the estimated residuals by (et, t = 1, …, T) and the predicted values by (Ŷt, t = 1, …, T).
(a) What is the R2 of the regression of e on a constant, X2 and X3?
(b) If we regress Y on a constant and Ŷ, what are the estimated intercept and slope coefficients? What is the relationship between the R2 of this regression and the R2 of the original regression?
(c) If we regress Y on a constant and e, what are the estimated intercept and slope coefficients? What is the relationship between the R2 of this regression and the R2 of the original regression?
(d) Suppose that we add a new explanatory variable X4 to the original model and reestimate the parameters by OLS. Show that the estimated coefficient of X4 and its estimated standard error will be the same as in the OLS regression of e on a constant, X2, X3 and X4.
7. Consider the Cobb-Douglas production function in Example 5. How can you test for constant returns to scale using a t-statistic from the unrestricted regression given in (4.18)?
8. Testing Multiple Restrictions. For the multiple regression given in (4.1), set up the F-statistic described in (4.17) for testing
(a) H0: β2 = β4 = β6.
(b) H0: β2 = β3 and β5 − β6 = 1.
9. Monte Carlo Experiments. Hanushek and Jackson (1977, pp. 60-65) generated the following data
Yi = 15 + 1 X2i + 2 X3i + ui for i = 1, 2, …, 25 with a fixed set of X2i and X3i, and ui’s that
are IID ~ N(0, 100). For each set of 25 ui’s drawn randomly from the normal distribution, a corresponding set of 25 Yi’s is created from the above equation. Then OLS is performed on the resulting data set. This can be repeated as many times as we can afford. 400 replications were performed by Hanushek and Jackson. This means that they generated 400 data sets each of size 25 and ran 400 regressions, giving 400 OLS estimates of α, β2, β3 and σ². The classical assumptions are satisfied for this model by construction, so we expect these OLS estimators to be BLUE, MLE and efficient.
(a) Replicate the Monte Carlo experiments of Hanushek and Jackson (1977) and generate the means of the 400 estimates of the regression coefficients as well as σ². Are these estimates unbiased?
(b) Compute the standard deviation of these 400 estimates and call it σ̂b. Also compute the average of the 400 standard errors of the regression estimates reported by the regression; denote this mean by sb. Compare these two estimates of the standard deviation of the regression coefficient estimates to the true standard deviation, knowing the true σ². What do you conclude?
(c) Plot the frequency distribution of these regression coefficient estimates. Does it resemble the theoretical distribution?
(d) Increase the sample size from 25 to 50 and repeat the experiment. What do you observe?
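Part (a) of this experiment can be sketched as follows; the fixed X2 and X3 values are drawn once at random here, since Hanushek and Jackson's exact regressors are not reproduced in the text:

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 25, 400

# Fixed regressors, held constant across replications (assumed values).
x2 = rng.uniform(0, 10, n)
x3 = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x2, x3])

coefs = np.empty((reps, 3))
for r in range(reps):
    u = rng.normal(0, 10, n)        # IID N(0, 100) disturbances
    y = 15 + 1 * x2 + 2 * x3 + u
    coefs[r] = np.linalg.lstsq(X, y, rcond=None)[0]

print(coefs.mean(axis=0))   # should be close to (15, 1, 2)
```

With 400 replications the averages of the OLS estimates land close to the true parameters (15, 1, 2), illustrating unbiasedness; doubling the sample size, as part (d) asks, tightens the spread of the 400 estimates.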
10. Female and Male Dummy Variables.
(a) Derive the OLS estimates of αF and αM for Yi = αF Fi + αM Mi + ui, where Y is Earnings, F is FEMALE and M is MALE, see (4.21). Show that α̂F = ȲF, the average of the Yi’s for females only, and α̂M = ȲM, the average of the Yi’s for males only.
(b) Suppose that the regression is Yi = α + βFi + ui, see (4.22). Show that α̂ = α̂M and
β̂ = α̂F − α̂M.
(c) Substitute M = 1 − F in (4.21) and show that α = αM and β = αF − αM.
(d) Verify parts (a), (b) and (c) using the earnings data underlying Table 4.1.
11. Multiple Dummy Variables. For equation (4.23),
EARN = α + βF FEMALE + βB BLACK + βH HISPANIC + u, show that
(a) E(Earnings | Hispanic Female) = α + βF + βH; also E(Earnings | Hispanic Male) = α + βH. Conclude that βF = E(Earnings | Hispanic Female) − E(Earnings | Hispanic Male).
(b) E(Earnings | Hispanic Female) − E(Earnings | White Female) = E(Earnings | Hispanic Male) − E(Earnings | White Male) = βH.
(c) E(Earnings | Black Female) − E(Earnings | White Female) = E(Earnings | Black Male) − E(Earnings | White Male) = βB.
12. For the earnings equation given in (4.28), how would you set up the F-test and what are the restricted and unrestricted regressions for testing the following hypotheses:
(a) The equality of slopes and intercepts for Males and Females.
(b) The equality of intercepts given the same slopes for Males and Females. Show that the resulting F-statistic is the square of a t-statistic from the unrestricted regression.
(c) The equality of intercepts allowing for different slopes for Males and Females. Show that the resulting F-statistic is the square of a t-statistic from the unrestricted regression.
(d) Apply your results in parts (a), (b) and (c) to the earnings data underlying Table 4.1.
13. For the earnings data regression underlying Table 4.1.
(a) Replicate the regression results given in Table 4.1.
(b) Verify that the joint significance of all slope coefficients can be obtained from (4.20).
(c) How would you test the joint restriction that expected earnings are the same for Males and Females whether Black or NonBlack holding everything else constant?
(d) How would you test the joint restriction that expected earnings are the same whether the individual is married or not and whether this individual belongs to a Union or not?
(e) From Table 4.1 what is your estimate of the % change in earnings due to Union membership? If the disturbances are assumed to be lognormal, what would be the estimate suggested by Kennedy (1981) for this % change in earnings?
(f) What is your estimate of the % change in earnings due to the individual being married?
14. Crude Quality. Using the data set of U.S. oil field postings on crude prices ($/barrel), gravity (degree API) and sulphur (% sulphur) given in the CRUDES.ASC file on the Springer web site:
(a) Estimate the following multiple regression model: POIL = β1 + β2 GRAVITY + β3 SULPHUR + ε.
(b) Regress GRAVITY = α0 + α1 SULPHUR + ν, then compute the residuals (ν̂t). Now perform the regression
POIL = γ1 + γ2 ν̂t + ε
Verify that γ̂2 is the same as β̂2 in part (a). What does this tell you?
(c) Regress POIL = φ1 + φ2 SULPHUR + w. Compute the residuals (ŵt). Now regress ŵt on the ν̂t obtained from part (b), to get ŵt = δ1 + δ2 ν̂t + residuals. Show that δ̂2 = β̂2 in part (a). Again, what does this tell you?
(d) To illustrate how additional data affects multicollinearity, show how your regression in part (a) changes when the sample is restricted to the first 25 crudes.
(e) Delete all crudes with sulphur content outside the range of 1 to 2 percent and run the multiple regression in part (a). Discuss and interpret these results.
Table 4.2 U.S. Gasoline Data: 1950-1987

Year        CAR      QMG (1,000 Gallons)   PMG ($)   POP (1,000)   RGNP (Billion)   PGNP
1950    49195212        40617285            0.272      152271         1090.4         26.1
1951    51948796        43896887            0.276      154878         1179.2         27.9
1952    53301329        46428148            0.287      157553         1226.1         28.3
1953    56313281        49374047            0.290      160184         1282.1         28.5
1954    58622547        51107135            0.291      163026         1252.1         29.0
1955    62688792        54333255            0.299      165931         1356.7         29.3
1956    65153810        56022406            0.310      168903         1383.5         30.3
1957    67124904        57415622            0.304      171984         1410.2         31.4
1958    68296594        59154330            0.305      174882         1384.7         32.1
1959    71354420        61596548            0.311      177830         1481.0         32.6
1960    73868682        62811854            0.308      180671         1517.2         33.2
1961    75958215        63978489            0.306      183691         1547.9         33.6
1962    79173329        62531373            0.304      186538         1647.9         34.0
1963    82713717        64779104            0.304      189242         1711.6         34.5
1964    86301207        67663848            0.312      191889         1806.9         35.0
1965    90360721        70337126            0.321      194303         1918.5         35.7
1966    93962030        73638812            0.332      196560         2048.9         36.6
1967    96930949        76139326            0.337      198712         2100.3         37.8
1968   101039113        80772657            0.348      200706         2195.4         39.4
1969   103562018        85416084            0.357      202677         2260.7         41.2
1970   106807629        88684050            0.364      205052         2250.7         43.4
1971   111297459        92194620            0.361      207661         2332.0         45.6
1972   117051638        95348904            0.388      209896         2465.5         47.5
1973   123811741        99804600            0.524      211909         2602.8         50.2
1974   127951254       100212210            0.572      213854         2564.2         55.1
1975   130918918       102327750            0.595      215973         2530.9         60.4
1976   136333934       106972740            0.631      218035         2680.5         63.5
1977   141523197       110023410            0.657      220239         2822.4         67.3
1978   146484336       113625960            0.678      222585         3115.2         72.2
1979   149422205       107831220            0.857      225055         3192.4         78.6
1980   153357876       100856070            1.191      227757         3187.1         85.7
1981   155907473       100994040            1.311      230138         3248.8         94.0
1982   156993694       100242870            1.222      232520         3166.0        100.0
1983   161017926       101515260            1.157      234799         3279.1        103.9
1984   163432944       102603690            1.129      237001         3489.9        107.9
1985   168743817       104719230            1.115      239279         3585.2        111.5
1986   173255850       107831220            0.857      241613         3676.5        114.5
1987   177922000       110467980            0.897      243915         3847.0        117.7
15. Consider the U.S. gasoline data from 1950-1987 given in Table 4.2, and obtained from the file USGAS.ASC on the Springer web site.
(a) For the period 1950-1972 estimate models (1) and (2):
logQMG = β1 + β2 logCAR + β3 logPOP + β4 logRGNP + β5 logPGNP + β6 logPMG + u    (1)
log(QMG/CAR) = γ1 + γ2 log(RGNP/POP) + γ3 log(CAR/POP) + γ4 log(PMG/PGNP) + ν    (2)
(b) What restrictions should the β’s satisfy in model (1) in order to yield the γ’s in model (2)?
(c) Compare the estimates and the corresponding standard errors from models (1) and (2).
(d) Compute the simple correlations among the X’s in model (1). What do you observe?
(e) Use the Chow F-test to test the parametric restrictions obtained in part (b).
(f) Estimate equations (1) and (2) now using the full data set 1950-1987. Discuss briefly the effects of the larger data set on the individual parameter estimates and their standard errors.
(g) Using a dummy variable, test the hypothesis that gasoline demand per CAR permanently shifted downward for model (2) following the Arab Oil Embargo in 1973.
(h) Construct a dummy variable regression that will test whether the price elasticity has changed after 1973.
16. Consider the following model for the demand for natural gas by the residential sector, call it model (1):
logConsit = β0 + β1 logPgit + β2 logPoit + β3 logPeit + β4 logHDDit + β5 logPIit + uit
where i = 1, 2, …, 6 states and t = 1, 2, …, 23 years. Cons is the consumption of natural gas by the residential sector; Pg, Po and Pe are the residential prices of natural gas, distillate fuel oil, and electricity; HDD is heating degree days; and PI is real per capita personal income. The data cover 6 states: NY, FL, MI, TX, UT and CA over the period 1967-1989, and are given in the NATURAL.ASC file on the Springer web site.
(a) Estimate the above model by OLS. Call this model (1). What do the parameter estimates imply about the relationship between the fuels?
(b) Plot actual consumption versus the predicted values. What do you observe?
(c) Add a dummy variable for each state except California and run OLS. Call this model (2). Compute the parameter estimates and standard errors and compare to model (1). Do any of the interpretations of the price coefficients change? What is the interpretation of the New York dummy variable? What is the predicted consumption of natural gas for New York in 1989?
(d) Test the hypothesis that the intercepts of New York and California are the same.
(e) Test the hypothesis that all the states have the same intercept.
(f) Add a dummy variable for each state and run OLS without an intercept. Call this model (3). Compare the parameter estimates and standard errors to the first two models. What is the interpretation of the coefficient of the New York dummy variable? What is the predicted consumption of natural gas for New York in 1989?
(g) Using the regression in part (f), test the hypothesis that the intercepts of New York and California are the same.
This chapter draws upon the material in Kelejian and Oates (1989) and Wallace and Silver (1988). Several econometrics books have an excellent discussion on dummy variables, see Gujarati (1978), Judge et al. (1985), Kennedy (1992), Johnston (1984) and Maddala (2001), to mention a few. Other readings referenced in this chapter include:
Baltagi, B.H. (1987a), “To Pool or Not to Pool: The Quality Bank Case,” The American Statistician, 41: 150-152.
Baltagi, B.H. (1987b), “Simple versus Multiple Regression Coefficients,” Econometric Theory, Problem 87.1.1, 3: 159.
Chow, G.C. (1960), “Tests of Equality Between Sets of Coefficients in Two Linear Regressions,” Econometrica, 28: 591-605.
Cornwell, C. and P. Rupert (1988), “Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators,” Journal of Applied Econometrics, 3: 149-155.
Dufour, J.M. (1980), “Dummy Variables and Predictive Tests for Structural Change,” Economics Letters, 6: 241-247.
Dufour, J.M. (1982), “Recursive Stability of Linear Regression Relationships,” Journal of Econometrics, 19: 31-76.
Gujarati, D. (1970), “Use of Dummy Variables in Testing for Equality Between Sets of Coefficients in Two Linear Regressions: A Note,” The American Statistician, 24: 18-21.
Gujarati, D. (1970), “Use of Dummy Variables in Testing for Equality Between Sets of Coefficients in Two Linear Regressions: A Generalization,” The American Statistician, 24: 50-52.
Halvorsen, R. and R. Palmquist (1980), “The Interpretation of Dummy Variables in Semilogarithmic Equations,” American Economic Review, 70: 474-475.
Hanushek, E.A. and J.E. Jackson (1977), Statistical Methods for Social Scientists (Academic Press: New York).
Hill, R. Carter and L.C. Adkins (2001), “Collinearity,” Chapter 12 in B.H. Baltagi (ed.) A Companion to Theoretical Econometrics (Blackwell: Massachusetts).
Kennedy, P.E. (1981), “Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations,” American Economic Review, 71: 802.
Kennedy, P.E. (1981), “The Ballentine: A Graphical Aid for Econometrics,” Australian Economic Papers, 20: 414-416.
Kennedy, P.E. (1986), “Interpreting Dummy Variables,” Review of Economics and Statistics, 68: 174-175.
Nieswiadomy, M. (1986), “Effect of an Additional Regressor on R²,” Econometric Theory, Problem 86.3.1, 2: 442.
Poirier, D. (1976), The Econometrics of Structural Change (North-Holland: Amsterdam).
Salkever, D. (1976), “The Use of Dummy Variables to Compute Predictions, Prediction Errors, and Confidence Intervals,” Journal of Econometrics, 4: 393-397.
Suits, D. (1984), “Dummy Variables: Mechanics vs Interpretation,” Review of Economics and Statistics, 66: 177-180.
Suits, D.B., A. Mason and L. Chan (1978), “Spline Functions Fitted by Standard Regression Methods,” Review of Economics and Statistics, 60: 132-139.
Utts, J. (1982), “The Rainbow Test for Lack of Fit in Regression,” Communications in Statistics - Theory and Methods, 11: 1801-1815.
Appendix