Monte Carlo Simulation

The first step in a Monte Carlo exercise is to model the data generation process. This requires what Davidson and MacKinnon (2004) refer to as a fully specified statistical model. A fully specified parametric model “is one for which it is possible to simulate the dependent variable once the values of the parameters are known” (Davidson and MacKinnon, 2004, p. 19). First you’ll need a regression function, for instance:

E (ytQt) = ві + в2 xt (2.14)

where yt is your dependent variable, xt the dependent variable, Qt the current information set, and ві and в2 the parameters of interest. The information set Qt contains xt as well as other potential explanatory variables that determine the average of yt. The conditional mean of yt given the information set could represent a linear regression model or a discrete choice model. However, equation (2.14) is not complete; it requires some description of how the unobserved or excluded factors affect ytQt.

To complete the the specification we need to specify an “unambiguous recipe” for simulating the model on a computer (Davidson and MacKinnon, 2004, p. 17). This means we’ll need to specify a probability distribution for the unobserved components of the model and then use a pseudo-random number generator to generate samples of the desired size.

In this example the data generation process will be as follows. We will let N = 40 and consider a linear model of the form

Уі = ві + в2 Xi + ei i = 1, 2, ••• , 40. (2.15)

The errors of the model will iid N(0, 88). The parameters ві = 100 and в2 = 10. Finally, let xl, x2, ■ ■ ■ ,x20 = 10 and let x2l, x22, ■ ■ ■ , x40 = 20. This gives us enough information to simulate samples of yi from the model. The hansl script (hansl is an acronym for hansl’s a neat scripting language is:

1 nulldata 40

2 # Generate X

3 series x = (index>20) ? 20 : 10

4

4 # Generate systematic portion of model

5 series ys = 100 + 10*x

7

6 loop 1000 —progressive —quiet

7 y = ys + normal(0,50)

8 ols y const x

9 scalar b1 = $coeff(const)

10 scalar b2 = $coeff(x)

11 scalar sig2 = $sigma"2

12 print b1 b2 sig2

13 store "@workdircoef. gdt" b1 b2 sig2

14 endloop

17

15 open "@workdircoef. gdt"

16 summary

17 freq b2 —normal

The first line creates an empty dataset that has room for 40 observations. Line 3 contains a ternary conditional assignment operator.2 Here is how it works. A series x is being created. The statement in parentheses is checked. The question mark (?) is the conditional assignment. If the statement in parentheses is true, then x is assigned the value to the left of the colon. If false it gets the value to the right. So, when index (a gretl default way of identifying the observation number) is greater than 20, x is set to 20, if index is less than or equal to 20 it is set to 10.

Next, the systematic portion of the model is created. For this we need x and the known values of the parameters (100, 10). Then we loop from 1 to 1000 in increments of 1. Normal random variates are added to the model, it is estimated by ols, and several statistics from that computation are retrieved, printed, and stored in a specified location.

The normal(0,50) statement generates normal random variables with mean of 0 and a variance of 50. The print statement used in this context actually tells gretl to accumulate the things that are listed and to print out summary statistics from their computation inside the loop. The store command tells gretl to output b1, b2, and sig2 to an external file. The —progressive option to the loop command alters the print and store commands a bit, and you can consult the Gretl Users Guide for more information about how.

Here is the output from the Monte Carlo. First, the output from the progressive loop:

Подпись: Variable

image040 image041

OLS estimates using the 40 observations 1-40 Statistics for 1000 repetitions Dependent variable: у

In a progressive loop, gretl will print out the mean and standard deviation from the series of estimates. It works with all single equation estimators in gretl and is quite useful for Monte Carlo analysis. From this you can see that the average value of the constant in 1000 samples is 100.491. The average slope was 9.962. The third column gives the mean of the standard error calculation

2 A ternary operator has three parts. In this case, the parts give us a fancy way of creating if/else statements. The first part, a, lies to the left of?, the second, b, falls between the question mark and the colon and the last, c, is to the right of the colon, e. g., a? b:c. If a is true, then b if not, then c.

from the simulation. If the standard errors are being estimated consistently, then these should be fairly close to the standard deviation of estimated coefficients to their left. The outcome from the print command is:

Подпись: std. dev.Statistics for 1000 repetitions Variable mean

M 100.491 24.5847

Ь2 9.96204 1.57931

sig2 2497.03 551.720

When the print command is issued, it will compute and print to the screen the ‘mean’ and ‘std. dev.’ of the estimated scalar. Notice that b1 and b2 match the output produced by the —progressive option. The print command is useful for studying the behavior of various statistics (like tests, confidence intervals, etc) and other estimators that cannot be handled properly within a progressive loop (e. g., mle, gmm, and system estimation commands).

The store statement works behind the scenes, but yields this informative piece of information:

3tore: using filename c:teirpcoef. gdt Data written OX.

This tells you where gretl wrote the dataset that contains the listed scalars, and that is was written properly. Now you are ready to open it up and perform additional analysis. In this example, we have used the @workdir macro. This basically tells gretl to go to the working directory to write the file. You could write files to gretl’s temporary directory using @dotdircoef. gdt.

The data set is opened and the summary statistics generated (again, if needed)

1 open "@workdircoef. gdt"

2 summary

3 freq b2 —normal

From here you can plot frequency distribution and test to see whether the least squares estimator of slope is normally distributed.

image043

The histogram certainly appears to be normally distributed compared to the line plot of the normal. Also, the hypothesis test of the normality null against nonnormality cannot be rejected at any reasonable level of significance.

2.2 Script

The script for chapter 2 is found below. These scripts can also be found at my website http: //www. learneconometrics. com/gretl.

1 set echo off

2 open "@gretldirdatapoefood. gdt"

3 setinfo food_exp – d "household food expenditure per week"

4 – n "Food Expenditure"

5 setinfo income – d "weekly household income" – n "Weekly Income"

6 labels

7

7 #Least squares

8 ols food_exp const income —vcv

9 ols 10 2

11

10 #Summary Statistics

11 summary food_exp income

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

#Plot the Data gnuplot food_exp income

#List the Data

print food_exp income —byobs #Elasticity

ols food_exp const income –quiet

scalar elast=$coeff(income)*mean(income)/mean(food_exp) #Prediction

scalar yhat = $coeff(const) + $coeff(income)*20 #Table 2.2

open "@gretldirdatapoetable2_2.gdt" list ylist = y1 y2 y3 y4 y5 y6 y7 y8 y9 y10 loop foreach i ylist

ols ylist.$i const x endloop

# slopes and elasticities at different points open "@gretldirdatapoebr. gdt"

series sqft2 = sqftrt2 ols price const sqft2

scalar slope_2000 = 2*$coeff(sqft2)*2000 scalar slope_4000 = 2*$coeff(sqft2)*4000 scalar slope_6000 = 2*$coeff(sqft2)*6000 scalar elast_2000 = slope_2000*2000/117461.77 scalar elast_4000 = slope_4000*4000/302517.39 scalar elast_6000 = slope_6000*6000/610943.42

# histogram for price and log(price) series l_price = ln(price)

freq price freq l_price

# regression using indicator variables open "@gretldirdatapoeutown. gdt" ols price const utown —quiet

scalar ut = $coeff(const)+$coeff(utown) scalar other = $coeff(const)

printf "nThe average in Utown is %.4f and the average elsewhere is %.4fn",ut, other

# Monte Carlo simulation

open "@gretldirdatapoefood. gdt" set seed 3213789 loop 100 —progressive —quiet series u = normal(0,88) series y1= 80+10*income+u ols y1 const income

endloop

# Monte Carlo simulation #2

# Generate systematic portion of model nulldata 40

# Generate X

series x = (index>20) ? 20 : 10

# Generate systematic portion of model series ys = 100 + 10*x

loop 1000 —progressive —quiet series y = ys + normal(0,50) ols y const x scalar b1 = $coeff(const) scalar b2 = $coeff(x) scalar sig2 = $sigmart2 print b1 b2 sig2

store "@workdircoef. gdt" b1 b2 sig2 endloop

open "@workdircoef. gdt" summary

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

freq b2 —normal

image044

Figure 2.11: Summary statistics

Подпись: Figure 2.9: The model window appears with the regression results. From here you can conduct subsequent operations (graphs, tests, analysis, etc.) on the estimated model. Подпись: Figure 2.10: Using the pull-down menus to obtain summary statistics. Highlight the desired variables and use View>Summary statistics from the pull-down menu.image047

image048

Figure 2.12: Results from the script to compute an elasticity based on a linear regression.

Model 4: OLS, using

obse

Display actual, fitted, residual

Dependent variable:

food

Forecasts…

Confidence intervals for coefficients

coefficient

Cr>nfir|gp|f P РІІІГКР…

e

const 83.4160

^Coefficient covariance matrix_^^

*

income 10.2096

ANOVA

35 ***

Mean dependent var

233

Bootstrap…

.6752

.з гоз ь. v. dependent var

112

Sum squared resid

304505.2 5.E. Of regression

39.

51700

R-squared

0.385002 Adjusted R-squared

0.368818

F(l, 38)

23.

78884 P-value(F)

0.000019

Log-likelihood

-235

.5088 Akaike criterion

475

.0176

Schwarz criterion

473

.3954 Hannan-Quinn

476

.2339

image049Figure 2.13: Obtain the matrix that contains the least squares estimates of variance and covariance from the pull-down menu of your estimated model.

Подпись: Figure 2.14: Choose Data>Define or edit list from the gretl menu bar

Подпись: Variable

Подпись: mean of estimated coefficients Подпись: std. dev. of estimated coefficients Подпись: mean of estimated std. errors Подпись: std. dev. of estimated std. errors

Подпись: const income

Подпись: 88.1474 9.59723 Подпись: 40.3705 2.01529 Подпись: 42.1194 2.03102 Подпись: 4.49704 0.216350

OLS estimates using the 40 observations 1-40 Statistics for 100 repetitions Dependent variable: yl

Подпись: std. dev. 40.3705 2.01529 Statistics for 100 repetitions Variable mean

Ы 38.1474

b2 9.59723

store: using filename c:tempcoeff. gdt

Data, written OK.

Figure 2.15: The summary results from 100 random samples of the Monte Carlo experiment.

image062

1.4е+006

 

1.2е+006

 

le+006

 

800000

 

600000

 

400000

 

200000

 

3000

 

4000

 

5000

 

6000

 

7000

 

8000

 

1000

 

2000

 

image063

image064

Figure 2.16: Price versus size from the quadratic model.

image065

Figure 2.17: Price and its natural logarithm.

 

Chapter

Leave a reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>