Linear probability model
Summary
The linear probability model
- Given: a random sample \(\left( y_i,x_i \right)\), where \(y_i\) can only take values 0 or 1 and \(x_i\) is \(k×1\).
- Statistical model:
\[E\left( y_i|x_i \right)=x'_iβ\]
- where \(β\) is a \(k×1\) vector of unknown parameters.
- Result: since \(y_i\) is binary,
\[E\left( y_i|x_i \right)=1\cdot P\left( y_i=1 \mid x_i \right)+0\cdot P\left( y_i=0 \mid x_i \right)=P\left( y_i=1 \mid x_i \right)\]
- Since we assume that \(P\left( y_i=1 \mid x_i \right)=x'_iβ\), this model is called the linear probability model (LPM).
- \(x'_iβ\) is a probability, so for the LPM to make sense, \(x'_iβ\) must belong to the interval \(\left[ 0,1 \right]\).
- Define \(ε_i=y_i-E\left( y_i|x_i \right)\) such that
\[y_i=x'_iβ+ε_i\]
- We can then estimate \(β\) consistently using OLS (or the method of moments, MM).
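A minimal sketch of this estimation step, using simulated data with hypothetical coefficients \(β=(0.2, 0.5)\) chosen so that \(x'_iβ\) stays inside \(\left[ 0,1 \right]\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data consistent with an LPM: true P(y=1|x) = 0.2 + 0.5*x.
# The coefficients are hypothetical, picked so probabilities stay in [0.2, 0.7].
n = 100_000
b_true = np.array([0.2, 0.5])                 # (intercept, slope)
x = rng.uniform(0.0, 1.0, size=n)
p = b_true[0] + b_true[1] * x
y = (rng.uniform(size=n) < p).astype(float)   # Bernoulli draws with P(y=1|x) = p

# OLS estimate of beta: b = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_ols)                                  # estimates close to the true (0.2, 0.5)
```

With a large sample the OLS estimates recover the true coefficients, illustrating the consistency claim above.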
Problems with the linear probability model
- The model assumes constant marginal effects,
\[ \frac{∂E\left( y|x \right)}{∂x_j}= \frac{∂P\left( y|x \right)}{∂x_j}=β_j\]
- This is unreasonable in most cases. In general, \(∂P\left( y|x \right)/∂x_j\) should decrease as \(x_j\) gets large and \(P\left( y|x \right)\) approaches its bounds, not stay constant.
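A small numeric illustration of the problem, using hypothetical coefficients:

```python
# Hypothetical fitted LPM: P(y=1|x) = 0.1 + 0.25*x. The marginal effect is 0.25
# at every x, so each unit increase in x adds 0.25 to the probability,
# whether we start near 0 or near 1.
b0, b1 = 0.1, 0.25
for x in [0.0, 2.0, 4.0, 6.0]:
    print(x, b0 + b1 * x)
# At x = 4 the fitted "probability" is already 1.1, i.e. above 1.
```

A sensible model would let the effect of \(x_j\) taper off near the boundaries, as the logit and probit models do.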
- \(ε_i\) can only take two values, \(1-x'_iβ\) and \(-x'_iβ\), with conditional probabilities \(x'_iβ\) and \(1-x'_iβ\). Exogeneity holds, but
\[Var\left( ε_i|x_i \right)=x'_iβ\left( 1-x'_iβ \right)\]
- which is not constant. We have a complicated form of heteroscedasticity.
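Because of this heteroscedasticity, the usual OLS standard errors are invalid; a common remedy is a White/Huber sandwich (robust) variance estimator. A minimal sketch with simulated data and hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated LPM data with hypothetical coefficients (0.2, 0.5): the error
# variance x'b(1 - x'b) changes with x, so classical OLS standard errors
# are unreliable.
n = 5_000
x = rng.uniform(0.0, 1.0, size=n)
y = (rng.uniform(size=n) < 0.2 + 0.5 * x).astype(float)
X = np.column_stack([np.ones(n), x])

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b                                   # residuals

XtX_inv = np.linalg.inv(X.T @ X)
s2 = e @ e / (n - 2)
se_classical = np.sqrt(s2 * np.diag(XtX_inv))   # assumes constant error variance

# HC0 sandwich estimator: (X'X)^{-1} X' diag(e_i^2) X (X'X)^{-1}
meat = X.T @ (X * e[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print(se_classical, se_robust)
```

Alternatively, since the form of the variance is known, one could use weighted least squares with weights \(1/\left[ x'_ib\left( 1-x'_ib \right) \right]\), provided the fitted values stay strictly inside \(\left( 0,1 \right)\).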
- Predicted values, \({\hat{y}}_i=x'_ib\), which should be probabilities, may end up outside the \(\left[ 0,1 \right]\) interval.
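A sketch of this failure mode, assuming a hypothetical data-generating process where the true conditional probability is logistic rather than linear:

```python
import numpy as np

rng = np.random.default_rng(2)

# Binary data whose true P(y=1|x) is logistic (hypothetical DGP). The OLS
# fitted values are an unconstrained linear function of x, so extreme x
# values can produce "probabilities" below 0 or above 1.
n = 2_000
x = rng.normal(0.0, 2.0, size=n)               # wide range of x values
p = 1.0 / (1.0 + np.exp(-1.5 * x))             # true probability, always in (0,1)
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
print((yhat < 0).sum(), (yhat > 1).sum())      # counts of out-of-range fitted values
```

The logit and probit models avoid this by construction, since their fitted probabilities are squeezed through a CDF.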