# Multinomial Logit

Starting with version 1.8.1, Gretl includes a routine to estimate multinomial logit (MNL) using maximum likelihood. In versions before 1.8.1 the alternatives were either (1) use gretl’s maximum likelihood module to estimate your own or (2) use another piece of software! In this section we’ll estimate the multinomial logit model using the native gretl function and I’ll relegate the other methods to a separate (optional) section 16.3.1. The other methods serve as good examples of how to use gretl’s scripting language and how to use it in conjunction with R.

In this model the dependent variable is categorical and is coded in the following way. A student graduating from high school chooses among three alternatives: attend no college psechoice=1, enroll in a 2-year college psechoice=2, or enroll in a 4-year college psechoice=3. The explanatory variable is grades, which is an index ranging from 1.0 (highest level, A+ grade) to 13.0 (lowest level, F grade) and represents combined performance in English, Math and Social Studies. For this example, the choices are treated as being unordered. There are 1000 observations.

To estimate the model of school choice as a function of grades and a constant open the nelssmall. gdt dataset and use the logit command with the —multinomial option as shown:

1 open "c:Program Filesgretldatapoenels_small. gdt"

2 logit psechoice const grades —multinomial

This yields the output shown in Figure 16.2:

The coefficients appear in sets. The first set are the coefficients that go with psechoice=2 and the second set go with psechoice=3; this implies that gretl chose psechoice=1 used as the base.

Model 1: Multinomial Logitf using observations 1-1000 Dependent variable: psechoice Standard errors based on Hessian

Mean dependent var 2.305000 Log-likelihood -875.3131 Schwarz criterion 1778.257

Number of cases ‘correctly predicted’ = 585 (58.5%) Likelihood ratio test: Chi-square(2) = 286.689 [0.0000]

Figure 16.3: You can obtain the outcome probabilities from the multinomial logit model window. These are also available after estimation in the \$mnlprobs accessor.

The probability of choosing an alternative in multinomial logit is

Obtaining the probabilities is simple. If you estimate the model via the GUI (Model>Nonlinear models>Logit>Multinomial) then you will have an option in the model window to produce the pre­dicted probabilities for each case in the sample. In Figure 16.3 you will see that Analysis>Outcome probabilities can be selected. The first few of these are shown:

Estimated outcome probabilities for psechoice

 1 2 3 1 0.4408 0.3274 0.2319 2 0.3511 0.3308 0.3181 3 0.2539 0.3148 0.4313 4 0.2539 0.3148 0.4313 5 0.2539 0.3148 0.4313 1000 0.0339 0.1351 0.831

A script can be written to obtain predicted probabilities that shows off a few more tricks. The proposed function is called mlogitprob and the script for it is:

1 function list mlogitprob(series y, list x, matrix theta)

2 list probs = null

3 matrix X = { x }

4 scalar j = max(y)

5 scalar k = cols(X)

6 matrix b = mshape(theta, k,j-1)

7 matrix tmp = X*b

8 series den = (1 + sumr(exp(tmp)))

9

9 loop for i=1..j —quiet

10 if i = 1

11 series p\$i = 1/den

12 else

13 scalar q = i – 1

14 series num = exp(X[q,]*b[,q])

15 series p\$i=num/den

16 endif

17 list probs += p\$i

18 endloop

19 return probs

20 end function

The inputs are the dependent variable, y, a list of independent variables, x, and the coefficients from multinomial logit estimation, theta. The function will return a list that contains the computed probabilites. These will be added to the dataset.

An empty list must be created, which is done using list null. In line 3 the independent variables are converted into a matrix called X. The fourth line finds the maximum category in the coding of the dependent variable. Ours, psechoice, takes values 1, 2, and 3 in the data so this will return the value 3. If your data are coded 0, 1, 2 like they sometimes are, you will have to alter your script to account for that. The scalar k counts the number of independent variables. In MNL there are J choices and J — 1 sets of k parameters. The matrix b reshapes the (J — 1)k x 1 vector of coefficients produced by logit —multinomial into a k x (J — 1) matrix. Each column of this matrix contains the coefficients for the (j — 1)th choice. The matrix labeled tmp computes the indexes for each choice. The matrix den computes the row sums of these to produce the denominator found in the MNL probabilities.

The loop is required because of the way MNL probabilities are computed. For the normalized choice, the numerator is 1. For the others it is exp(indexj). The computed probabilities are added to the list probs using the operator (+=), which is a fancy way of appending new results to existing results. The loop ends and you must return the list probs in order for the computed series to be passed out of the function and added to the dataset. If you have been working through the simpler functions we’ve considered up to this point, then the added complexity of this one will not be bothersome. If this is your first function in gretl, then you are no doubt lost. Again, it is not required to get probabilities from MNL.

To use the function, create the variable list, estimate the model and save the coefficients to a matrix. Finally, create a list and print it by observation as in:

1 open "@gretldirdatapoenels_small. gdt"

2 list x = const grades

3 logit psechoice x —multinomial

4 matrix theta = \$coeff

5 list n = mlogitprob(psechoice, x, theta)

6 smpl 1 12

7 print n —byobs

8 smpl full

Of course, you could make things easy on yourself and just use the accessor, \$mnlprobs. This gives you access to the probabilities from the multinomial logit that we obtained using the GUI. Not much fun in that, but it is easy. However, with our homemade function we can compute marginal effects.

To get average marginal effects is a snap at this point. We will simply add 1 to the value of grades, recompute the probabilities, and average the difference between the two. This will require renaming the predicted probabilities, but that is easily done.

1 rename p1 p01

2 rename p2 p02

3 rename p3 p03

4

5 list x1 = const grade1

6 list n1 = mlogitprob(psechoice, x1, theta)

7 series d1 = p1-p01

8 series d2 = p2-p02

9 series d3 = p3-p03

10 summary d* –simple

The script yields:

 Summary statistics, using the observations 1 – 1000 Mean Minimum Maximum Std. Dev. d1 0.080044 0.0092216 0.11644 0.034329 d2 -0.00014717 -0.11560 0.017795 0.023719 d3 -0.066086 -0.31899 -0.00037743 0.069045

As a student’s performance gets worse (grades increases by 1), the average probability of not attending college goes up by 0.08. The probability of attending 4-year school declines by -0.066.

Finding marginal effects at specific points requires another function, but it is very similar to the one used above. The only substantive change is feeding the function a matrix rather than the list of variables and changing series computations within the function to either scalars or matrix. Here is the new function called

1 function matrix mlogitprob_at(series y, matrix X, matrix theta)

2 matrix probs = {}

3 scalar j = max(y)

4 scalar k = cols(X)

5 matrix b = mshape(theta, k,j-1)

6 matrix tmp = X*b

7 scalar den = (1 + sumr(exp(tmp)))

8

8 loop for i=1..j —quiet

9 if i = 1

10 scalar p\$i = 1/den

11 else

12 scalar q = i – 1

13 scalar num = exp(X*b[,q])

14 scalar p\$i=num/den

15 endif

16 matrix probs = probs ~ p\$i

17 endloop

18 return probs

19 end function

The function is quite easy to use and reproduces the results in POE4 Table 16.3.

1 open "@gretldirdatapoenels_small. gdt"

2 list x = const grades

3 logit psechoice x –multinomial

4 matrix theta = \$coeff

5 matrix Xm = {1 , quantile(grades,.50)}

6 matrix p50 = mlogitprob_at(psechoice, Xm, theta)

7 matrix Xm = {1 , quantile(grades,.05)}

8 matrix p05 = mlogitprob_at(psechoice, Xm, theta)

9 printf "nThe predicted probabilities for student

11 printf "nThe predicted probabilities for student

To use the function to get marginal effects of 1 unit change in grades for median and 95th percentile students we create quantiles based on the series grades and use these in our new function. Taking the difference in probabilities will give us an approximate marginal effect at those quantiles.

1 open "@gretldirdatapoenels_small. gdt"

2 list x = const grades

3 logit psechoice x —multinomial

4 matrix theta = \$coeff

6 matrix Xm = {1 , q50-0.5}

7 matrix p0 = mlogitprob_at(psechoice, Xm, theta)

8 matrix Xm = {1 , q50+0.5}

9 matrix p1 = mlogitprob_at(psechoice, Xm, theta)

10 matrix me = p1-p0

11 printf "nThe marginal effect of grades for student

13

14 matrix Xm = {1 , q05-0.5}

15 matrix p0 = mlogitprob_at(psechoice, Xm, theta)

16 matrix Xm = {1 , q05+0.5}

17 matrix p1 = mlogitprob_at(psechoice, Xm, theta)

18 matrix me = p1-p0

19 printf "nThe marginal effect of grades for student

20 grades=%6.2f is %8.4fn", q05, me

Notice that the script returns the predicted probabilities for these students and the change in those probabilities resulting from a 1 unit change in grades. The total probabilities should sum to 1 and the marginal effects should sum to zero. This script also uses a common trick. The one unit change is evaluated at ±0.5 on either side of each quantile; then the discrete difference is taken. The results match those in POE4 reasonably well.

The option —multinomial is used when the choices are unordered. For ordered logit, this option is omitted. Gretl takes a look at the dependent variable, in this case psechoice, to make sure that it is actually discrete. Ours takes on three possible values (1, 2, or 3) and the logit function in gretl will handle this automatically.

In this section I’ll give you an idea of how to estimate this model using another gretl script and in section 16.10 I’ll show you how to estimate the model in another free software called R.

Although versions of Gretl prior to 1.8.1 did not include a specific function for estimating MNL, it can be estimated with a little effort. Gretl contains two things that make this reasonably easy to do. First, it includes a module for maximizing user specified likelihood functions (see chapter 14 for other examples). To use the mle function, the user has to write a program in hansl to compute a model’s log-likelihood given the data. The parameters of the log-likelihood must be declared and given starting values (using the scalar command). If you want, you can specify the derivatives of the log-likelihood function with respect to each of the parameters; if analytical derivatives are not supplied, a numerical approximation is computed. In many instances, the numerical approximations work quite well. In the event that the computations based on numerical derivatives fail, you may have to specify analytical ones to make the program work.

What appears below is taken from the Gretl Users Guide. The example for MNL for POE4 requires only a slight modification in order for the program to run with our dataset.

The multinomial logit function, which can be found in the Gretl User’s Guide, is defined

1 function mlogitlogprobs(series y, matrix X, matrix theta)

2 scalar n = max(y)

3 scalar k = cols(X)

4 matrix b = mshape(theta, k,n)

5 matrix tmp = X*b

6 series ret = – ln(1 + sumr(exp(tmp)))

7 loop for i=1..n —quiet

8 series x = tmp[,i]

9 ret += (y=\$i) ? x : 0

10 end loop

11 return series ret

12 end function

The function is named mlogitlogprobs and has three arguments. The first is the dependent variable, series y, the second is set of independent variables contained in matrix X, and the last is the matrix of parameters, called theta. Scalars in the function are defined for sample size, number of regressors, and the coefficients are placed in an n x k array in order to match the dimensionality of the data. The index tmp=X*b is created and ret returns the log-likelihood function. Don’t worry if you can’t make sense of this because you should not have to change any of this to estimate MNL with another dataset. That is one of the beauties of defining and using a function.

To use the mlogitlogprobs function, you need to know a little about how it works. You will have to get your data into the right form in order for the function to work properly. After loading the data, make sure that the dependent choice variable is in the correct format for the function.

The function requires the choices to start at 0. If you list the data, you’ll find that psechoice is coded 1, 2, 3 instead of the required 0, 1, 2. So the next step is to subtract 1 from psechoice.

Create the matrix of regressors, define the number of regressors and use these to initialize the matrix of coefficients, theta. Then list the dependent variable, matrix of independent variables, and the initialized parameter matrix as arguments in the function. Click the run button and wait for the result.

1 open "@gretldirdatapoenels_small. gdt"

2 psechoice = psechoice-1 # dep. var. must be 0-based

3 list x = const grades

4 smpl full

5 matrix X = { x }

6 scalar k = cols(X)

7 matrix theta = zeros(2*k, 1)

8 mle loglik = mlogitlogprobs(psechoice, X,theta)

9 params theta io end mle —hessian

The only changes I had to make to the original example in the Gretl User Guide are (1) change the dataset (2) change the dependent variable to psechoice (3) put the desired regressors into X and (4) make sure the function contains the desired variables.

The results from the program appear below in Figure 16.4. Wow! They match those in POE4 and are dirt simple to obtain. Finally, if you want to produce the probabilities and marginal effects,

Log-likelihood -875.3131 Akaike criterion 1758.626

Schwarz criterion 1778.257 Hannan-Quinn 1766.087

Figure 16.4: These results are from a gretl function taken from the Gretl Users Guide.

you can use the estimates gretl has stored in the 4 x 1 vector called theta. This was the approach taken in the preceding section and I won’t repeat the details here.