Exercise L4, binary model


Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The number of young males in the sample is 545. The version of the data set (wagepan.dat) we use was obtained from Wooldridge (2002). The same data were used for modelling the binary response trade union membership by Rabe-Hesketh and Skrondal (2005, exercise 4.7).


Data description

Number of observations (rows): 4360

Number of variables (columns): 44


Description of the subset of variables we are going to use in this exercise:

nr= person identifier

year=1980 to 1987

black=1 if respondent is black, 0 otherwise

exper=labour market experience (age-6-educ)

hisp=1 if respondent is Hispanic, 0 otherwise

poorhlth=1 if respondent has a health disability, 0 otherwise

married=1 if respondent is married, 0 otherwise

nrthcen=1 if respondent lives in the North Central part of the US, 0 otherwise

nrtheast=1 if respondent lives in the North East part of the US, 0 otherwise

rur=1 if respondent lives in a rural area, 0 otherwise

south=1 if respondent lives in the South of the US, 0 otherwise

educ=years of schooling

union=1 if the respondent is a member of a trade union, 0 otherwise

d8m=1 if the year is 198m, 0 otherwise, m=1,...,7
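
The year indicators follow directly from year. Because m runs from 1 to 7 only, 1980 has no indicator of its own and acts as the reference year: all seven indicators are zero for a 1980 row. The following is a small illustrative sketch in Python (not Sabre code); the function name year_dummies is hypothetical and introduced only for this example.

```python
def year_dummies(year):
    """Return the indicators d81..d87 for a survey year between 1980 and 1987."""
    if not 1980 <= year <= 1987:
        raise ValueError(f"year outside 1980-1987: {year}")
    # d8m is 1 when year == 198m, otherwise 0; 1980 leaves every indicator at 0.
    return {f"d8{m}": int(year == 1980 + m) for m in range(1, 8)}


for y in (1980, 1983, 1987):
    print(y, year_dummies(y))
```

The indicators already exist as columns in wagepan.dat, so this is only a way of checking the coding, not a step needed for the exercise.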


The first few rows and columns of the data (wagepan.dat) look like this (the file contains other variables not used in this exercise):




Start Sabre and specify transcript file:


out unionpan.log


data nr year agric black bus construc ent exper fin hisp poorhlth hours &

manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 &

occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83 &

d84 d85 d86 d87 expersq

read wagepan.dat
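
As a cross-check outside Sabre, the file layout implied by the data command can be verified with a short Python sketch. It assumes that wagepan.dat is a whitespace-delimited numeric file with no header row and one observation per line; that layout is an assumption, not something stated above. The function name read_wagepan is hypothetical, and the demonstration row is a placeholder of zeros, not real data.

```python
import io

# The 44 variable names, in the order given to the Sabre data command.
NAMES = """
nr year agric black bus construc ent exper fin hisp poorhlth hours
manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7
occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83
d84 d85 d86 d87 expersq
""".split()


def read_wagepan(handle):
    """Parse whitespace-delimited rows into dicts keyed by variable name."""
    rows = []
    for line_no, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(NAMES):
            raise ValueError(
                f"line {line_no}: expected {len(NAMES)} fields, got {len(fields)}"
            )
        rows.append(dict(zip(NAMES, map(float, fields))))
    return rows


# Placeholder row of zeros, used only to show that the parser runs.
demo_rows = read_wagepan(io.StringIO(" ".join(["0"] * len(NAMES)) + "\n"))
print(len(NAMES), "variables;", len(demo_rows), "demo row parsed")

# With the real file one would expect 4360 rows and 545 distinct values of nr:
# with open("wagepan.dat") as f:
#     rows = read_wagepan(f)
# print(len(rows), len({r["nr"] for r in rows}))
```

The expected counts follow from the description above: 545 young men observed in each of the 8 years 1980 to 1987 gives 545 x 8 = 4360 rows.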


Suggested exercise


(1) Estimate a logit model for trade union membership (union), without covariates.

(2) Allow for the respondent identifier (nr) random effect. Is this random effect significant? How many quadrature points should we use to estimate this model?

(3) Add the explanatory variables black, hisp, exper, educ, poorhlth and married. How does the magnitude of the nr random effect change? Are any of these individual characteristics significant in this model? Do the results make intuitive sense?

(4) Add the contextual explanatory variables rur, nrthcen, nrtheast and south. How does the magnitude of the individual-specific random effect change? Are any of the contextual variables significant in this model? Do the new results make intuitive sense?

(5) Add the indicators for year d8m, m=1,...,7. Are any of the year indicator variables significant in this model? Do the new results make intuitive sense?

(6) Create interaction effects between rur and nrthcen, nrtheast, and south and add them to the model. Are any of these new effects significant?

(7) How can the final model be simplified?

(8) Interpret your preferred model.
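
The question in (2) about the number of quadrature points can be explored with a small Python sketch (not Sabre code). It evaluates the marginal log-likelihood of a random-intercept logit model by Gauss-Hermite quadrature for an increasing number of points. The data are simulated, not taken from wagepan.dat, and the intercept (-1.5), the random-effect standard deviation (1.5) and the function names are arbitrary assumptions made for illustration only.

```python
import math
import random

import numpy as np


def simulate_panel(n_people=200, n_years=8, intercept=-1.5, sigma=1.5, seed=0):
    """Simulate binary responses with one normal random intercept per person."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_people):
        u = rng.gauss(0.0, sigma)  # person-specific random effect
        p = 1.0 / (1.0 + math.exp(-(intercept + u)))
        panel.append([1 if rng.random() < p else 0 for _ in range(n_years)])
    return panel


def marginal_loglik(panel, intercept, sigma, n_points):
    """Random-intercept logit log-likelihood via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # The substitution u = sqrt(2) * sigma * x turns the N(0, sigma^2) integral
    # into the exp(-x^2) form that Gauss-Hermite quadrature approximates.
    u = math.sqrt(2.0) * sigma * nodes
    p = 1.0 / (1.0 + np.exp(-(intercept + u)))  # P(y = 1 | u) at each node
    total = 0.0
    for ys in panel:
        k, n = sum(ys), len(ys)
        cond = p**k * (1.0 - p) ** (n - k)  # likelihood of this person given u
        total += math.log(float(np.dot(weights, cond)) / math.sqrt(math.pi))
    return total


panel = simulate_panel()
logliks = {q: marginal_loglik(panel, -1.5, 1.5, q) for q in (2, 4, 8, 16, 32, 64)}
for q, ll in logliks.items():
    print(f"{q:3d} points: log-likelihood = {ll:.4f}")
```

A common practical rule is to increase the number of points until the log-likelihood and the estimates stop changing to the reported precision. The same check can be made in Sabre by refitting the model with more quadrature points and comparing the results.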





Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, College Station, Texas.


Vella, F., and Verbeek, M., (1998), Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men. Journal of Applied Econometrics, 13, 163-183.


Wooldridge, J. M., (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press, Cambridge, MA.