Indicator Variables

Indicator variables allow us to construct models in which some or all of the parameters can change for subsets of the sample. As discussed in chapter 2, an indicator variable records whether a certain condition is met: it equals 1 if the condition holds and 0 if it does not. Indicator variables are often referred to as dummy variables, and gretl uses this term in a utility that creates them.
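As a small illustration of the 0/1 coding described above, the following sketch builds an indicator from a logical condition. It uses an artificial six-observation dataset rather than the chapter's data, and the series names x and d are hypothetical, chosen only for this example.

```hansl
# Artificial dataset with 6 observations (not the chapter's data)
nulldata 6
# x takes the values 1, 2, ..., 6
series x = index
# d equals 1 when the condition x > 3 holds and 0 otherwise
series d = (x > 3)
# List both series side by side
print x d --byobs
```

In gretl a logical comparison on a series evaluates to 1 where it is true and 0 where it is false, so no separate recoding step is needed.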

The example used in this section is again based on the utown.gdt real estate data. First we will open the dataset and examine the data.

1 open "@gretldir\data\poe\utown.gdt"
2 smpl 1 8
3 print price sqft age utown pool fplace --byobs
4 smpl full
5 summary

Line 2 limits the sample to the first 8 observations. The two numbers that follow the smpl command give the first and last observations of the subsample. Logical statements can also be used to restrict the sample; examples will be given later. In the current case, eight observations are enough to see that price and sqft are continuous, that age is discrete, and that utown, pool, and fplace are likely to be indicator variables. The print statement is used with the --byobs option so that the listed variables are printed in columns.

       price    sqft   age   utown   pool   fplace
1    205.452   23.46     6       0      0        1
2    185.328   20.03     5       0      0        1
3    248.422   27.77     6       0      0        0
4    154.690   20.17     1       0      0        0
5    221.801   26.45     0       0      0        1
6    199.119   21.56     6       0      0        1
7    272.134   29.91     9       0      0        1
8    250.631   27.98     0       0      0        1
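The text above mentions that logical statements can also restrict the sample. A minimal sketch of that idea, assuming the same utown.gdt file and the --restrict option of smpl, keeps only the houses near the University and then restores the full sample. The choice of condition is an illustration, not necessarily the example the later sections use.

```hansl
# Open the chapter's dataset (path assumed to match the script above)
open "@gretldir\data\poe\utown.gdt"
# Keep only the observations for which utown equals 1
smpl utown == 1 --restrict
# Summary statistics for the restricted sample only
summary price sqft --simple
# Restore the full sample before continuing
smpl full
```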

The sample is restored to completeness, and the summary statistics are printed. These give an idea of the range and variability of price, sqft and age. The means tell us about the proportions of homes that are near the University and that have pools or fireplaces.

Summary Statistics, using the observations 1-1000

Variable        Mean      Median     Minimum     Maximum
price        247.656     245.833     134.316     345.197
sqft         25.2097     25.3600     20.0300     30.0000
age          9.39200     6.00000    0.000000     60.0000
utown       0.519000     1.00000    0.000000     1.00000
pool        0.204000    0.000000    0.000000     1.00000
fplace      0.518000     1.00000    0.000000     1.00000

Variable   Std. Dev.        C.V.     Skewness   Ex. kurtosis
price        42.1927    0.170368    0.0905617      -0.667432
sqft         2.91848    0.115768   -0.0928347       -1.18500
age          9.42673     1.00370      1.64752        3.01458
utown       0.499889    0.963177   -0.0760549       -1.99422
pool        0.403171     1.97633      1.46910       0.158242
fplace      0.499926    0.965108   -0.0720467       -1.99481

You can see that roughly half of the houses in the sample are near the University (519/1000). It is also fairly clear that prices are measured in units of $1000 and square footage in units of 100 square feet. The oldest house is 60 years old, and there are some new ones in the sample (age=0). A minimum of 0 together with a maximum of 1 usually signals an indicator variable. This confirms what we concluded by looking at the first few observations in the sample.
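The two checks described above can also be done directly in a script. The sketch below is one possible way, assuming the same data file: it computes the mean of utown, which for a 0/1 series is the share of observations equal to 1, and tests whether the series is bounded by 0 and 1. The scalar names share_utown and is01 are hypothetical.

```hansl
# Open the chapter's dataset (path assumed to match the script above)
open "@gretldir\data\poe\utown.gdt"
# For a 0/1 series the mean is the proportion of ones
scalar share_utown = mean(utown)
printf "Share of houses near the University: %.3f\n", share_utown
# Minimum 0 and maximum 1 suggest, but do not prove, an indicator
scalar is01 = (min(utown) == 0) && (max(utown) == 1)
printf "utown bounded by 0 and 1: %d\n", is01
```

The bounds check is only suggestive, since a continuous variable could also range from 0 to 1. Inspecting a few observations, as done earlier, remains a useful complement.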
