Binary choice in Stata

Summary

Example

This example is based on the benefits data from chapter 7 in Verbeek. The dependent variable y is a dummy variable, equal to 1 if the individual applied for (and received) unemployment insurance (UI) benefits. The explanatory variables are as follows:

stateur: state unemployment rate (in %)
statemb: state maximum benefit level
age: age in years
age2: age squared
tenure: years of tenure in job lost
slack: dummy, 1 if job lost due to slack work
abol: dummy, 1 if job lost because position abolished
seasonal: dummy, 1 if job lost because seasonal job ended
nwhite: dummy, 1 if nonwhite
school12: dummy, 1 if more than 12 years of school
male: dummy, 1 if male
bluecol: dummy, 1 if blue collar worker
smsa: dummy, 1 if lives in an SMSA
married: dummy, 1 if married
dkids: dummy, 1 if kids
dykids: dummy, 1 if young kids (0-5 yrs)
yrdispl: year of job displacement (1982=1,..., 1991=10)
rr: replacement rate
rr2: rr squared
head: dummy, 1 if head of household

Linear probability model

regr y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur

Interpretation (example):

Being male decreases the probability of receiving benefits by an estimated 3.6 percentage points, holding the other regressors fixed.
Being married increases it by an estimated 4.9 percentage points.
If the state unemployment rate increases by one percentage point, the probability of receiving benefits falls by an estimated 1.8 percentage points, and so on.

If you compute the fitted values after this regression, you see one problem: there is an individual for whom the predicted probability of receiving benefits is 116%.
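
The fitted values discussed above can be inspected with a sketch like the following. It assumes the benefits data are in memory and that the regress command above was the last estimation command; the variable name yhat_lpm is hypothetical and does not appear in the source.

```stata
* Linear predictions from the linear probability model
* (yhat_lpm is a hypothetical variable name)
predict yhat_lpm, xb

* The minimum and maximum show whether any fitted value leaves [0,1]
summarize yhat_lpm

* Count fitted "probabilities" above 1 and below 0
* (missing values are excluded explicitly, since Stata treats them as large)
count if yhat_lpm > 1 & !missing(yhat_lpm)
count if yhat_lpm < 0
```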

Logit

logit y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur

The coefficients no longer have a direct interpretation as changes in the probability. However, the signs and the p-values have their usual interpretation. The p-values are based on the Wald test.

All fitted values are in [0,1].
We see that the log-likelihood of the estimated model is \(l_1=-2873\). We can find the log-likelihood of the constant-only model, \(l_0=-3043\), by typing “logit y”. From this, we can confirm the pseudo \(R^2 = 0.0558\) and the value of the LR statistic, 339.66.
We can add the option vce(vcetype), for example vce(robust), to get robust standard errors, just as in the linear regression.
We can use “test” as in the linear regression to test linear hypotheses using the Wald test.
We can perform an LR test using the command “lrtest”.
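
The pseudo \(R^2\) and the LR statistic mentioned in the list above can be checked by hand from the two log-likelihoods. The following is a worked sketch using the rounded values \(l_1=-2873\) and \(l_0=-3043\), so the results agree with the reported 0.0558 and 339.66 only up to rounding.

```latex
\[
\text{pseudo } R^2 = 1 - \frac{l_1}{l_0} = 1 - \frac{-2873}{-3043} \approx 0.056,
\qquad
\xi_{LR} = 2\,(l_1 - l_0) = 2\,(-2873 + 3043) = 340 .
\]
```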

To perform the LR test, first estimate the unrestricted model and save the estimates (“estimates store unrestricted”).
Then estimate the restricted model and save the estimates (“estimates store restricted”).
Finally, do the LR test: “lrtest unrestricted restricted”.

For example, for the null hypothesis \(H_0\): dkids = 0, we have \(\xi_W=3.36\) with a p-value of 0.0670 (“test dkids=0”) and \(\xi_{LR}=3.35\) with a p-value of 0.0672.
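
The Wald and LR tests of dkids = 0 described above can be sketched as follows. This assumes the benefits data are in memory and that the restricted model is simply the logit above with dkids dropped, which is how the null hypothesis is read here; both models must be estimated on the same sample for the LR test to be valid.

```stata
* Unrestricted model: the full logit from above
logit y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store unrestricted

* Wald test of H0: dkids = 0, based on the unrestricted estimates
test dkids = 0

* Restricted model: same regressors without dkids
logit y rr rr2 age age2 tenure slack abol seasonal head married dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store restricted

* LR test comparing the two stored sets of estimates
lrtest unrestricted restricted
```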

Probit

Coefficients are not directly comparable between probit and logit, because the two models scale the error term differently. However, the signs and the p-values are.
Everything we said about logit holds for probit as well.
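
The source does not show the probit command itself. A minimal sketch, assuming the same dependent variable and regressors as in the logit above, might look like this; the variable name phat_probit is hypothetical.

```stata
* Probit with the same specification as the logit above (an assumption)
probit y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur

* Predicted probabilities; as with logit, these lie in [0,1]
* (phat_probit is a hypothetical variable name)
predict phat_probit, pr
summarize phat_probit
```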