# Spatial dependence in models for qualitative data

Empirical analysis of interacting agents requires models that incorporate spatial dependence for discrete dependent variables, such as counts or binary outcomes (Brock and Durlauf, 1995). This turns out to be quite complex and continues to be an active area of research. While an extensive discussion of the technical aspects associated with spatial discrete choice models is beyond the scope of the current chapter, the salient issues may be illustrated with a spatial version of the probit model, which has recently received considerable attention.¹⁹

The point of departure is the familiar expression for a linear model in a latent (unobserved) dependent variable $y_i^*$,

$$y_i^* = x_i'\beta + \varepsilon_i, \tag{14.17}$$

where $\varepsilon_i$ is a random variable for which a given distribution is assumed (e.g., the normal for the probit model). The realization of $y_i^*$ is observed in the form of discrete events, $y_i = 1$ for $y_i^* > 0$, and $y_i = 0$ for $y_i^* < 0$. The discrete events are related to the underlying probability model through the error term; for example, $y_i^* > 0$ implies $-x_i'\beta < \varepsilon_i$, and, therefore, for a symmetric distribution such as the normal,

$$E[y_i] = P[y_i = 1] = \Phi[x_i'\beta], \tag{14.18}$$

where $\Phi$ is the cumulative distribution function for the standard normal.
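As a small numerical illustration of (14.18), the sketch below evaluates the probit probability for a single observation. The coefficient vector `beta` and the regressor values `x_i` are hypothetical numbers chosen only for the example, and the helper names `std_normal_cdf` and `probit_prob` are not from the text.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_prob(x, beta):
    """P[y_i = 1] = Phi(x_i' beta) for one observation, as in (14.18)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return std_normal_cdf(index)


# Hypothetical values: an intercept plus one regressor.
beta = [0.2, 1.0]
x_i = [1.0, 0.5]

p_i = probit_prob(x_i, beta)  # Phi(0.2 + 0.5) = Phi(0.7)
print(f"P[y_i = 1] = {p_i:.4f}")
```

Because the errors are independent and share a unit variance in this non-spatial case, the same function applies unchanged to every observation.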

Spatial autocorrelation can be introduced into this model in the form of a spatial autoregressive process for the error term $\varepsilon_i$ in (14.17), or

$$\varepsilon_i = \lambda \sum_j w_{ij} \varepsilon_j + u_i, \tag{14.19}$$

where $\lambda$ is an autoregressive parameter, the $w_{ij}$ are the elements in the $i$th row of a spatial weights matrix $W$, and $u_i$ may be assumed to be iid standard normal. As a consequence of the spatial multiplier in the autoregressive specification, the random error at each location now becomes a function of the random errors at all other locations as well. Its distribution is multivariate normal with $N \times N$ variance-covariance matrix

$$E[\varepsilon\varepsilon'] = [(I - \lambda W)'(I - \lambda W)]^{-1}. \tag{14.20}$$
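The effect of the spatial multiplier can be made concrete with a few lines of NumPy. The sketch below is only an illustration under assumed values: five units arranged on a line with adjacent units as neighbors, row-standardized weights, and $\lambda = 0.5$. None of these choices comes from the text. It solves (14.19) for the errors in reduced form, $\varepsilon = (I - \lambda W)^{-1} u$, and then builds the variance-covariance matrix in (14.20).

```python
import numpy as np

n, lam = 5, 0.5  # hypothetical number of locations and autoregressive parameter

# Hypothetical weights: units on a line, adjacent units are neighbors,
# and each row is scaled to sum to one.
adjacency = np.eye(n, k=1) + np.eye(n, k=-1)
W = adjacency / adjacency.sum(axis=1, keepdims=True)

A = np.eye(n) - lam * W        # I - lambda W
multiplier = np.linalg.inv(A)  # spatial multiplier (I - lambda W)^{-1}

# Reduced form of (14.19): every eps_i is a weighted sum of all the u_j.
rng = np.random.default_rng(0)
u = rng.standard_normal(n)
eps = multiplier @ u

# The reduced-form errors satisfy the structural equation (14.19).
residual = eps - (lam * W @ eps + u)

# Variance-covariance matrix (14.20).
Sigma = np.linalg.inv(A.T @ A)
variances = np.diag(Sigma)

print("diagonal of Sigma:", np.round(variances, 3))
print("first row of Sigma:", np.round(Sigma[0], 3))
```

In this configuration the diagonal elements differ between the end points and the interior of the line, and the off-diagonal elements are nonzero, which is the combination of heteroskedasticity and nondiagonality at issue.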

As pointed out above, besides being nondiagonal, (14.20) is also heteroskedastic. Consequently, the usual inequality conditions that are at the basis of (14.18) no longer hold, since each location has a different variance. Moreover, $P[-x_i'\beta < \varepsilon_i]$ can no longer be derived from the univariate standard normal distribution, but rather must be expressed explicitly as the marginal distribution of an $N$-dimensional multivariate normal vector, whose variance-covariance matrix contains off-diagonal elements that are a function of the autoregressive parameter $\lambda$. This is non-standard and typically not analytically tractable, which greatly complicates estimation and specification testing. Similar issues are faced in the spatial lag model for a latent variable.²⁰
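A small Monte Carlo experiment can illustrate why (14.18) fails under (14.19). The sketch below reuses the hypothetical setup of five units on a line with $\lambda = 0.5$ and assumes a common index value $x_i'\beta = 0.5$ at every location; all of these are illustrative choices, not values from the text. It relies on the standard property that each marginal of a multivariate normal is univariate normal with variance given by the corresponding diagonal element, here written $\sigma_i^2$, so the simulated frequencies are compared with both $\Phi[x_i'\beta]$ and $\Phi[x_i'\beta / \sigma_i]$.

```python
from math import erf, sqrt

import numpy as np


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, lam = 5, 0.5  # hypothetical number of locations and autoregressive parameter
adjacency = np.eye(n, k=1) + np.eye(n, k=-1)
W = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-standardized weights

multiplier = np.linalg.inv(np.eye(n) - lam * W)
# Location-specific standard deviations from the diagonal of (14.20).
sigma = np.sqrt(np.diag(multiplier @ multiplier.T))

xb = 0.5  # hypothetical value of x_i' beta, the same at every location

# Simulate many error vectors eps = (I - lambda W)^{-1} u, one per row.
rng = np.random.default_rng(0)
u = rng.standard_normal((200_000, n))
eps = u @ multiplier.T
freq = (xb + eps > 0).mean(axis=0)  # simulated P[y_i = 1] by location

naive = std_normal_cdf(xb)  # what (14.18) would give
scaled = np.array([std_normal_cdf(xb / s) for s in sigma])

print("simulated:", np.round(freq, 3))
print("Phi(xb):  ", round(naive, 3))
print("Phi(xb / sigma_i):", np.round(scaled, 3))
```

The scaling by $\sigma_i$ accounts only for the marginal probability at a single location. The joint probability of the full vector of outcomes, which a likelihood function requires, does not factor into such terms because of the off-diagonal elements, and it still involves an $N$-dimensional integral.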