A few basic commands and conventions
The first thing I usually do is change the name to something less generic, e.g., cola, using
> cola <- gretldata
You can also load the current gretl data into R manually. To load the data properly, you have to locate the Rdata.tmp file that gretl creates when you launch R from the GUI. Mine was cleverly hidden in C:/Users/Lee/AppData/Roaming/gretl/Rdata.tmp. Once found, use the read.table command in R to read it. The system you are using (Windows in my case) dictates whether its native slashes are forward or backward, but R accepts forward slashes in file paths on every system. Also, I read the data in as cola rather than the generic gretldata to make things easier later.
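A minimal sketch of the manual load, assuming a small made-up file in place of gretl's Rdata.tmp. The temporary path and the three data rows are placeholders for illustration, not the cola data.

```r
# Stand-in for gretl's Rdata.tmp: a whitespace-delimited text file
# whose first row holds the variable names (values are made up).
tmp <- tempfile(fileext = ".tmp")
writeLines(c("price feature display",
             "1.50 0 0",
             "1.20 1 0",
             "1.10 1 1"), tmp)

# header = TRUE tells read.table to treat the first row as names.
# On your machine, replace tmp with the path to your own Rdata.tmp.
cola <- read.table(tmp, header = TRUE)

names(cola)   # "price" "feature" "display"
nrow(cola)    # 3
```

Without header = TRUE the names row would be read as an ordinary row of data, which is usually not what you want.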
The addition of header = TRUE to the code that gretl writes for you ensures that the variable names, which are included on the first row of Rdata.tmp, get read into R properly. The regression can then be run in R.
R code to estimate a linear model and print results
fitols <- lm(price~feature+display, data=cola)
The fitols <- lm(price~feature+display, data=cola) command estimates a linear regression model with price as the dependent variable. The results are stored in memory under the name fitols. The variables feature and display are included as regressors. R automatically includes an intercept. To print the results to the screen, you have to use the summary(fitols) command.

> summary.lm(fitols)

Call:
lm(formula = price ~ feature + display, data = cola)

Residuals:
     Min       1Q   Median       3Q      Max
-1.24453 -0.12085 -0.01453  0.08915  1.58547

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.404527   0.004198  334.60   <2e-16 ***
feature     -0.249883   0.006997  -35.71   <2e-16 ***
display     -0.253789   0.007272  -34.90   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2148 on 5463 degrees of freedom
Multiple R-squared: 0.5074, Adjusted R-squared: 0.5072
F-statistic: 2813 on 2 and 5463 DF, p-value: < 2.2e-16

Figure D.3: The fitols <- lm(price~feature+display, data=cola) command estimates a linear regression model with price as the dependent variable. The variables feature and display are included as regressors.

Before going further, let me comment on this terse piece of computer code. First, in R the symbol <- is used as the assignment operator; it assigns whatever is on the right-hand side (lm(y~x, data=gretldata)) to the name you specify on the left (fitols). The arrow can be reversed (->) if you want to give the name on its right to whatever is computed on its left.
The lm command stands for ‘linear model’ and in this example it contains two arguments within the parentheses. The first is the regression model. The dependent variable is price and the independent variables are feature, display, and a constant. The dependent variable and independent variables are separated by the symbol ~, which substitutes in this case for an equals sign. The independent variables are separated by plus signs (+). In a linear model the meaning of this is unambiguous. The other argument points to the data set that contains these variables. This data set, pulled into R from gretl, is by default called gretldata. We changed the name to cola above and that is what we refer to here. There are other options for the lm command, and you can consult the substantial pdf manual to learn about them.

In any event, you’ll notice that when you enter this line and press the return key (which executes the line), R responds by issuing a command prompt, and no results! R does not bother to print results unless you ask for them. This is handier than you might think, since most programs produce a lot more output than you actually want and must be coerced into printing less. The anova(fitols) command asks R to print the ANOVA table to the screen. This gives the result in Figure D.4. It’s that simple!
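The pieces described above can be tried without gretl. The following is a sketch only: the data frame is simulated, so its numbers are made up and will not match the cola results, and the name fitols2 is introduced here purely to illustrate the reversed arrow.

```r
# Simulated stand-in for the cola data (values are made up).
set.seed(1)
n <- 200
cola <- data.frame(feature = rbinom(n, 1, 0.3),
                   display = rbinom(n, 1, 0.3))
cola$price <- 1.4 - 0.25 * cola$feature - 0.25 * cola$display +
  rnorm(n, sd = 0.2)

# Estimate the model; nothing is printed at this point.
fitols <- lm(price ~ feature + display, data = cola)

# The same assignment written with the reversed arrow.
lm(price ~ feature + display, data = cola) -> fitols2

# Ask for output explicitly.
print(summary(fitols))   # coefficient table, R-squared, F-statistic
print(anova(fitols))     # ANOVA table
```

At the interactive prompt, typing summary(fitols) on its own prints the same table; the explicit print() calls matter only when the code is run as a script.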
To do multiple regression in R, you can also put each of your independent variables (other than the intercept) into a matrix and use the matrix as the independent variable. A matrix is a rectangular array (which means it contains numbers arranged in rows and columns). You can think of a matrix as the rows and columns of numbers that appear in a spreadsheet program like MS Excel. Each row contains an observation on each of your independent variables; each column contains all of the observations on a particular variable. For instance, suppose you have two variables, x1 and x2, each having 5 observations. These can be combined horizontally into the matrix X. Computer programmers sometimes refer to this operation as horizontal concatenation. Concatenation essentially means that you connect or link objects in a series or chain; to concatenate horizontally means that you are binding one or more columns of numbers together.
The function in R that binds columns of numbers together is cbind. So, to horizontally concatenate x1 and x2, use the command
X <- cbind(x1, x2)
Then the regression is estimated using
fitols <- lm(y~X)
There is one more thing to mention about R that is very important, and this example illustrates it vividly. R is case sensitive. That means that two objects x and X can mean two totally different things to R. Consequently, you have to be careful when defining and calling objects in R to distinguish lower from upper case letters.
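A small sketch of the matrix approach and of case sensitivity. The numbers in x1, x2, and y are made up for illustration, as is the scalar x.

```r
# Made-up data: two regressors with five observations each.
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(2, 1, 4, 3, 6)
y  <- c(3.1, 3.9, 7.2, 6.8, 11.1)

# Bind the columns side by side into a 5 x 2 matrix.
X <- cbind(x1, x2)
dim(X)   # 5 2

# lm adds the intercept; the slopes are labelled Xx1 and Xx2.
fitols <- lm(y ~ X)
coef(fitols)

# Case matters: x and X are different objects.
x <- 10
identical(x, X)   # FALSE
```

Note that lm builds the coefficient labels by pasting the matrix name onto the column names, which is why the slopes appear as Xx1 and Xx2.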
The following section is taken with very minor changes from Venables et al. (2006).
All R functions and datasets are stored in packages. Only when a package is loaded are its contents available. This is done both for efficiency (the full list would take more memory and would take longer to search than a subset), and to aid package developers, who are protected from name clashes with other code. The process of developing packages is described in section Creating R packages in Writing R Extensions. Here, we will describe them from a user’s point of view. To see which packages are installed at your site, issue the command library() with no arguments. To load a particular package (e.g., the MCMCpack package containing functions for estimating models in Chapter 16), use a command like
> library(MCMCpack)
If you are connected to the Internet you can use the install.packages() and update.packages() functions (both available through the Packages menu in the Windows GUI). To see which packages are currently loaded, use
> search()
to display the search list. The HTML help system, which includes a listing of the installed packages and their help topics, is started with
> help.start()
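The package commands can be tried out as follows. This is a sketch under one assumption: MCMCpack may or may not be installed on your machine, so the code checks first rather than calling library() unconditionally.

```r
# Packages currently on the search list.
search()

# stats ships with R and is attached at startup.
"package:stats" %in% search()

# Load a package only if it is installed; MCMCpack is not part of
# base R, so check first rather than assume it is there.
if (requireNamespace("MCMCpack", quietly = TRUE)) {
  library(MCMCpack)
} else {
  message("MCMCpack is not installed; install.packages(\"MCMCpack\") would fetch it")
}
```

Calling library() on a package that is not installed stops with an error, which is why the check is useful in scripts that other people will run.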
With R you can read in datasets in many different formats. Your textbook includes a dataset written in Stata’s format, and R can both read and write this format. To read and write Stata’s .dta files, you’ll have to load the foreign package using the library command:
1 library(foreign)
2 nels <- read.dta("c:/temp/nels_small.dta")
3 pse <- nels$psechoice
Line 2 reads the Stata dataset directly into R using the read.dta command. It is placed into an object called nels. There are two things to note, though. First, the slashes in the filename are backwards from the Windows convention. Second, you need to point to the file in your directory structure and enclose the path/filename in double quotes. R looks for the file where you’ve directed it and, provided it finds it, reads it into memory. It places the variable names from Stata into the object. Then, to retrieve a variable from the object, you use the statement in line 3. Now you have created a new object called pse that contains the variable psechoice retrieved from the nels object. This seems awkward at first, but believe it or not, it becomes pretty intuitive after a short time.
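If you do not have nels_small.dta at hand, the same steps can be rehearsed on a file you create yourself. This sketch assumes the foreign package is installed (it ships with most R distributions); the temporary file and its four psechoice values are made up and stand in for the textbook dataset.

```r
library(foreign)   # provides read.dta and write.dta

# Made-up stand-in for nels_small.dta, written to a temporary file.
path <- tempfile(fileext = ".dta")
write.dta(data.frame(psechoice = c(1, 3, 2, 3)), path)

# Read the Stata file back and pull one variable out with $.
nels <- read.dta(path)
pse  <- nels$psechoice

names(nels)   # "psechoice"
pse
```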
The command attach(nels) will take each of the columns of nels and allow you to refer to it by its variable name. So, instead of referring to nels$psechoice you can directly ask for psechoice without using the nels$ prefix. For complex programs, using attach() may lead to unexpected results. If in doubt, it is probably a good idea to forgo this option. If you do decide to use it, you can later undo it using detach(nels).
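One way attach() can surprise you is that it works on a copy of the data. The sketch below uses a made-up nels data frame; the names before, stale, and fresh are introduced only for this illustration.

```r
nels <- data.frame(psechoice = c(1, 2, 3, 3, 2))  # made-up values

attach(nels)
before <- mean(psechoice)      # no nels$ prefix needed

# Changing nels afterwards does not touch the attached copy.
nels$psechoice <- nels$psechoice * 10
stale <- mean(psechoice)       # still the old values
fresh <- mean(nels$psechoice)  # the updated values

detach(nels)
c(before = before, stale = stale, fresh = fresh)
```

Here before and stale are both 2.2 while fresh is 22, so code that mixes the two styles can silently work with outdated numbers.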
about using R in econometrics. He gives some alternatives to using MCMCpack for the models discussed in Chapter 16.