# Linear Probability

A linear probability model is a linear regression in which the dependent variable is an indicator variable. The model is estimated by least squares.

E[yi] =1 x Pr(yi = 1) + 0 x Pr(yi = 0) = пі

Thus, the mean of a binary random variable can be interpreted as a probability; it is the probability that y = 1. When the regression E[y^Xi2, Xi3,…, XiK] is linear then E[yi] = ві + в2Xi2 +.. .+вкXiK and the mean (probability) is modeled linearly.

E[yi|Xi2,Xi3, .. . , XiK] = Пі = ві + e2Xi2 + … + вкXiK (7.7)

The variance of a binanry random variable is

var[yi] = пі(1 – пі) (7.8)

which means that it will be different for each individual. Replacing the unobserved probability, E(yi), with the observed indicator variable requires adding an error to the model that we can estimate via least squares. In this following example we have 1140 observations from individuals who purchased Coke or Pepsi. The dependent variable takes the value of 1 if the person buys Coke and 0 if Pepsi. These depend on the ratio of the prices, pratio, and two indicator variables, disp_coke and disp_pepsi. These indicate whether the store selling the drinks had promotional displays of Coke or Pepsi at the time of purchase.

OLS, using observations 1-1140
Dependent variable: coke

Heteroskedasticity-robust standard errors, variant HC3

 Coefficient Std. Error t-ratio p-value const 0.8902 0.0656 13.56 5.88e-039 pratio -0.4009 0.0607 -6.60 6.26e-011 disp_coke 0.0772 0.0340 2.27 0.0235 disp_pepsi -0.1657 0.0345 -4.81 1.74e-006

The model was estimated using a variance-covariance matrix estimator that is consistent when the error terms of the model have variances that depend on the observation. That is the case here. I’ll defer discussion of this issue until the next chapter when it will be discussed at some length.