Exercise L4, binary model
Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The number of young males in the sample is 545. The version of the data set (wagepan.dat) we use was obtained from Wooldridge (2002). The same data were used for modelling the binary response trade union membership by Rabe-Hesketh and Skrondal (2005, exercise 4.7).
Data description
Number of observations (rows): 4360 (545 respondents observed in each of the 8 years)
Number of variables (columns): 44
Description of the subset of variables we are going to use in this exercise:
nr= person identifier
year=1980 to 1987
black=1 if respondent is black, 0 otherwise
exper=labour market experience (age-6-educ)
hisp=1 if respondent is Hispanic, 0 otherwise
poorhlth=1 if respondent has a health disability, 0 otherwise
married=1 if respondent is married, 0 otherwise
nrthcen=1 if respondent lives in the Northern Central part of the US, 0 otherwise
nrtheast=1 if respondent lives in the North East part of the US, 0 otherwise
rur=1 if respondent lives in a rural area, 0 otherwise
south=1 if respondent lives in the South of the US, 0 otherwise
educ=years of schooling
union=1 if the respondent is a member of a trade union, 0 otherwise
d8m=1 if the year is 198m, 0 otherwise, m=1,…,7
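The two derived variables in the list, exper and the year indicators d8m, follow simple rules. The following is a minimal Python sketch of those rules, not part of the Sabre session. The helper names experience and year_dummies are hypothetical, and the age argument is an assumption, since age is not among the variables listed above.

```python
def experience(age, educ):
    """Labour market experience as defined in the list: age - 6 - educ."""
    return age - 6 - educ


def year_dummies(year):
    """Return d81..d87 for a survey year; 1980 is the reference year (all zeros)."""
    if not 1980 <= year <= 1987:
        raise ValueError(f"year outside 1980-1987: {year}")
    return {f"d8{m}": int(year == 1980 + m) for m in range(1, 8)}


# Illustrative values only, not taken from wagepan.dat.
example_exper = experience(age=25, educ=12)
example_1980 = year_dummies(1980)
example_1983 = year_dummies(1983)
```

Because 1980 has no indicator of its own, it acts as the baseline against which the other years are compared.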
The first few rows and columns of the data (wagepan.dat) look like this (the file also contains variables not used in this exercise):
Start Sabre and specify transcript file:
out unionpan.log
data nr year agric black bus construc ent exper fin hisp poorhlth hours &
manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 &
occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83 &
d84 d85 d86 d87 expersq
read wagepan.dat
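For readers who want to inspect the file outside Sabre, the same 44 variable names can be attached to the data in Python. This is only a sketch under stated assumptions: it assumes wagepan.dat is a whitespace-delimited numeric file with no header row, which the exercise does not confirm, and read_records is a hypothetical helper. The demo below runs on a made-up two-record file, not on the real data.

```python
import os
import tempfile

# Variable names in the order given to the Sabre data command.
COLUMNS = (
    "nr year agric black bus construc ent exper fin hisp poorhlth hours "
    "manuf married min nrthcen nrtheast occ1 occ2 occ3 occ4 occ5 occ6 occ7 "
    "occ8 occ9 per pro pub rur south educ tra trad union lwage d81 d82 d83 "
    "d84 d85 d86 d87 expersq"
).split()


def read_records(path, columns=COLUMNS):
    """Read whitespace-delimited numbers and group them into one dict per record.

    Tokens are chunked by the number of columns, so a record may wrap
    over several physical lines (an assumption about the file layout).
    """
    with open(path) as handle:
        tokens = handle.read().split()
    width = len(columns)
    if len(tokens) % width != 0:
        raise ValueError(f"{len(tokens)} values is not a multiple of {width}")
    return [
        dict(zip(columns, map(float, tokens[start:start + width])))
        for start in range(0, len(tokens), width)
    ]


# Demo on a made-up file with two records, each wrapped over two lines.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.dat")
    with open(demo_path, "w") as handle:
        for record in range(2):
            values = [str(record * 100 + k) for k in range(len(COLUMNS))]
            handle.write(" ".join(values[:20]) + "\n")
            handle.write(" ".join(values[20:]) + "\n")
    demo_rows = read_records(demo_path)
```

Applied to the real file, the number of records returned should equal the 4360 rows stated in the data description if the layout assumption holds.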
Suggested exercise
(1) Estimate a logit model for trade union membership (union), without covariates.
(2) Allow for a random effect for the respondent identifier (nr). Is this random effect significant? How many quadrature points should we use to estimate this model?
(3) Add the explanatory variables black, hisp, exper, educ, poorhlth and married. How does the magnitude of the nr random effect change? Are any of these individual characteristics significant in this model? Do the results make intuitive sense?
(4) Add the contextual explanatory variables rur, nrthcen, nrtheast and south. How does the magnitude of the individual-specific random effect change? Are any of the contextual variables significant in this model? Do the new results make intuitive sense?
(5) Add the indicators for year d8m, m=1,…,7. Are any of the year indicator variables significant in this model? Do the new results make intuitive sense?
(6) Create interaction effects between rur and nrthcen, nrtheast, and south and add them to the model. Are any of these new effects significant?
(7) How can the final model be simplified?
(8) Interpret your preferred model.
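Question (2) asks how many quadrature points are needed. One way to build intuition, sketched here in Python rather than Sabre, is to approximate one respondent's contribution to the likelihood of a random-intercept logit model with Gauss-Hermite quadrature and watch the value settle as points are added. Everything specific below is an assumption for illustration: the response sequence y_example and the parameter values beta0 and sigma are made up, the function names are hypothetical, and the sketch does not claim to reproduce Sabre's internal algorithm.

```python
import math


def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x*He_k - k*He_{k-1}."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, n):
        h_prev, h = h, x * h - k * h_prev
    return h


def gauss_hermite_normal(n):
    """Nodes and weights such that sum(w * f(x)) approximates E[f(Z)], Z ~ N(0, 1)."""
    roots = []
    for k in range(1, n + 1):
        # Roots of He_k interlace those of He_{k-1} and lie inside +/- sqrt(4k + 2).
        bound = math.sqrt(4 * k + 2)
        edges = [-bound] + roots + [bound]
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = hermite_e(k, lo)
            for _ in range(100):  # plain bisection on each bracketing interval
                mid = 0.5 * (lo + hi)
                f_mid = hermite_e(k, mid)
                if (f_mid > 0) == (f_lo > 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [math.factorial(n) / (n * hermite_e(n - 1, x)) ** 2 for x in roots]
    return roots, weights


def person_likelihood(y, beta0, sigma, n_points):
    """Marginal likelihood of one 0/1 sequence, integrating out a normal random intercept."""
    nodes, weights = gauss_hermite_normal(n_points)
    total = 0.0
    for u, w in zip(nodes, weights):
        p = 1.0 / (1.0 + math.exp(-(beta0 + sigma * u)))
        conditional = 1.0
        for y_t in y:
            conditional *= p if y_t == 1 else 1.0 - p
        total += w * conditional
    return total


# Hypothetical inputs: eight yearly union indicators and made-up parameter values.
y_example = [1, 1, 1, 0, 1, 1, 1, 1]
approximations = {
    n: person_likelihood(y_example, beta0=-1.5, sigma=2.0, n_points=n)
    for n in (2, 4, 8, 16, 32)
}
for n, value in approximations.items():
    print(n, value)
```

If the printed values stop changing at the precision of interest, further points add little. The same logic applies to the fitted model in Sabre: refit with more quadrature points and check whether the log likelihood and estimates remain stable.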
References
Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modelling using Stata, Stata Press, Stata Corp, College Station, Texas.
Vella, F., and Verbeek, M., (1998), Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men. Journal of Applied Econometrics, 13, 163-183.
Wooldridge, J. M., (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press.