Binary choice in Stata

Summary

Example

This example is based on the benefits data from chapter 7 in Verbeek. The dependent variable y is a dummy variable, equal to 1 if the individual applied for (and received) unemployment insurance (UI) benefits. The explanatory variables are as follows:

  • stateur: state unemployment rate (in %)
  • statemb: state maximum benefit level
  • age: age in years
  • age2: age squared
  • tenure: years of tenure in job lost
  • slack: dummy, 1 if job lost due to slack work
  • abol: dummy, 1 if job lost because position abolished
  • seasonal: dummy, 1 if job lost because seasonal job ended
  • nwhite: dummy, 1 if nonwhite
  • school12: dummy, 1 if more than 12 years of school
  • male: dummy, 1 if male
  • bluecol: dummy, 1 if blue collar worker
  • smsa: dummy, 1 if lives in an SMSA
  • married: dummy, 1 if married
  • dkids: dummy, 1 if kids
  • dykids: dummy, 1 if young kids (0-5 yrs)
  • yrdispl: year of job displacement (1982=1,..., 1991=10)
  • rr: replacement rate
  • rr2: rr squared
  • head: dummy, 1 if head of household

Linear probability model

regr y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur

Interpretation (example):

  • Being male decreases the probability of receiving benefits by an estimated 3.6 percentage points.
  • Being married increases it by an estimated 4.9 percentage points.
  • If the state unemployment rate increases by one percentage point, the estimated probability of receiving benefits falls by 1.8 percentage points, and so on.

If you compute the fitted values of the linear probability model and summarize them, you see one problem: there is an individual for whom the predicted probability of receiving benefits is 116%.
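One way to carry out this check is sketched below. It assumes the benefits data are already in memory and that the code is run from a do-file (the `///` line continuation does not work at the interactive prompt). The variable name yhat_lpm is made up for illustration and is not part of the original example.

```stata
* Sketch only: assumes the benefits data are already in memory.
* yhat_lpm is a hypothetical variable name.
quietly regress y rr rr2 age age2 tenure slack abol seasonal head married ///
    dkids dykids smsa nwhite yrdispl school12 male statemb stateur

* Fitted values x'b, which the linear probability model treats as probabilities
predict yhat_lpm, xb

* The reported min and max show whether any fitted value leaves [0,1]
summarize yhat_lpm

* Count fitted values outside [0,1]; the !missing() guard is needed
* because Stata treats missing as larger than any number
count if (yhat_lpm < 0 | yhat_lpm > 1) & !missing(yhat_lpm)
```

A maximum above 1 in the summarize output corresponds to the 116% case mentioned above.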

Logit

logit y rr rr2 age age2 tenure slack abol seasonal head married dkids dykids smsa nwhite yrdispl school12 male statemb stateur

  • The coefficients no longer have a direct interpretation as changes in probability. However, the signs and the p-values have their usual interpretation. The p-values are based on the Wald test.

  • All fitted values are in [0,1].
  • We see that \(l_1=-2873\). We can find \(l_0=-3043\) by typing “logit y”. From this, we can confirm the pseudo \(R^2 = 1 - l_1/l_0 = 0.0558\) and the value of the LR statistic, \(2(l_1-l_0) = 339.66\) (both computed from the unrounded log-likelihoods).
  • We can add vce(vcetype), for example vce(robust), to get robust standard errors, just as in the linear regression.
  • We can use “test” as in the linear regression to test linear hypotheses using the Wald test.
  • We can perform an LR test using the command “lrtest”.
    • First estimate the unrestricted model and save the estimates: (“estimates store unrestricted”)
    • Then estimate the restricted model and save the estimates: (“estimates store restricted”)
    • Do the LR-test: “lrtest unrestricted restricted”
  • For example, for the null hypothesis \(H_0\): dkids = 0 we have \(\xi_W=3.36\) with a p-value of 0.0670 (“test dkids=0”) and \(\xi_{LR}=3.35\) with a p-value of 0.0672.
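The steps in the list above can be put together roughly as follows. This is a sketch that assumes the benefits data are in memory and that the code is run from a do-file. The names phat_logit, pseudo_r2 and lr_stat are illustrative, and the restricted model is assumed to be the full model without dkids, matching the null hypothesis in the example.

```stata
* Sketch only: assumes the benefits data are already in memory.
* phat_logit, pseudo_r2 and lr_stat are hypothetical names.

* Unrestricted model
logit y rr rr2 age age2 tenure slack abol seasonal head married ///
    dkids dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store unrestricted

* Fitted probabilities (pr is the default statistic after logit)
predict phat_logit, pr
summarize phat_logit

* Pseudo R2 and LR statistic rebuilt from the stored log-likelihoods:
* e(ll) is l_1, e(ll_0) is l_0 from the constant-only model
scalar pseudo_r2 = 1 - e(ll)/e(ll_0)
scalar lr_stat   = 2*(e(ll) - e(ll_0))
display "Pseudo R2 = " %6.4f pseudo_r2 "   LR = " %8.2f lr_stat

* Wald test of H0: coefficient on dkids equals 0
test dkids = 0

* Restricted model: same regressors without dkids
quietly logit y rr rr2 age age2 tenure slack abol seasonal head married ///
    dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store restricted

* LR test of the restricted against the unrestricted model
lrtest unrestricted restricted

* Same unrestricted model with robust standard errors
logit y rr rr2 age age2 tenure slack abol seasonal head married ///
    dkids dykids smsa nwhite yrdispl school12 male statemb stateur, vce(robust)
```

The robust fit is placed last and is not stored, since lrtest is meant for models estimated by ordinary maximum likelihood rather than with robust standard errors.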

Probit

  • The coefficients are not comparable in magnitude between probit and logit. However, the signs and the p-values are.
  • Everything we said about logit holds for probit as well.
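A minimal probit counterpart might look like the sketch below. It assumes the same regressor list as in the logit section, the benefits data in memory, and a do-file run. The names probit_full, logit_full and phat_probit are illustrative.

```stata
* Sketch only: assumes the benefits data are already in memory.
* probit_full, logit_full and phat_probit are hypothetical names.
probit y rr rr2 age age2 tenure slack abol seasonal head married ///
    dkids dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store probit_full

* Fitted probabilities, which lie in [0,1] as with logit
predict phat_probit, pr
summarize phat_probit

* Wald test works the same way as after logit
test dkids = 0

* Refit the logit quietly and show both sets of coefficients side by side
quietly logit y rr rr2 age age2 tenure slack abol seasonal head married ///
    dkids dykids smsa nwhite yrdispl school12 male statemb stateur
estimates store logit_full
estimates table logit_full probit_full, star
```

In the resulting table, the signs and significance stars can be compared across the two columns, while the coefficient magnitudes are on different scales.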