Exercise FOL2 Probit model of union membership of females, Stewart (2006)

This exercise uses the union data for young US women from the National Longitudinal Survey of Youth (NLSY), taken from the Stata manual (http://www.stata-press.com/data/r9/union.dta). We use the same subsample that Stewart (2006) used to illustrate his Stata program (redprob). To form this subsample, Stewart (2006) uses only data from 1978 onwards, drops the data for 1983, and keeps only those individuals observed in each of the remaining 6 waves. This gives a balanced panel of N = 799 individuals, each observed in all 6 waves. The observations for 1985 and 1987 are implicitly treated as if they were for 1984 and 1986 respectively, which gives 6 waves at regular 2-year intervals. Trade union membership is determined by a question on whether or not the sampled individual had her wage set in a collective bargaining agreement.
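
The subsample construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Stewart's code: the function name balanced_subsample is hypothetical, the rows are assumed to be dictionaries with idcode and year keys, and year is assumed to be stored as a two-digit value (78 for 1978).

```python
from collections import defaultdict

def balanced_subsample(rows, start_year=78, dropped_year=83):
    """Keep rows from start_year onwards, excluding dropped_year, and
    retain only individuals observed in every remaining wave.

    Assumption: each row is a dict with 'idcode' and 'year' keys and
    'year' is a two-digit year.
    """
    kept = [r for r in rows
            if r["year"] >= start_year and r["year"] != dropped_year]
    waves = {r["year"] for r in kept}            # the remaining waves
    years_by_id = defaultdict(set)
    for r in kept:
        years_by_id[r["idcode"]].add(r["year"])
    # An individual is complete if observed in all remaining waves.
    complete = {i for i, ys in years_by_id.items() if ys == waves}
    panel = [r for r in kept if r["idcode"] in complete]
    panel.sort(key=lambda r: (r["idcode"], r["year"]))
    return panel

# Toy data (not the NLSY data): individual 2 misses the 80 wave.
toy = [
    {"idcode": 1, "year": 77}, {"idcode": 1, "year": 78},
    {"idcode": 1, "year": 80}, {"idcode": 1, "year": 83},
    {"idcode": 2, "year": 78}, {"idcode": 2, "year": 83},
]
toy_panel = balanced_subsample(toy)
print([(r["idcode"], r["year"]) for r in toy_panel])
```

In the toy data only individual 1 is observed in both remaining waves, so only that individual's rows for those waves are kept.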

References

Stewart, M.B. (2006), -redprob- A Stata program for the Heckman estimator of the random effects dynamic probit model, http://www2.warwick.ac.uk/fac/soc/economics/staff/faculty/stewart/stata/redprobnote.pdf

Conditional analysis

Data description (unionred1.dat)

Number of observations = 3995

Number of cases = 799

The variables include the following:

idcode=NLSY subject identifier code

year=interview year

age=age in current year

grade=years of schooling completed

not_smsa=1 if living outside a standard metropolitan statistical area, 0 otherwise

south=1 if south, 0 otherwise

union=1 if wage is collectively negotiated, 0 otherwise

t0= year-70

southXt=interaction of south and t0 (south multiplied by t0)

black=1 if race black, 0 otherwise

tper=panel wave

lagunion=the value of union in the previous wave

d=2 for all responses, as all responses are post-baseline

d1=0 for all responses, as all responses are post-baseline

d2=1 for all responses, as all responses are post-baseline

baseunion=1 if union=1 in 1978, 0 otherwise
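
As an illustration of how lagunion and baseunion relate to union, the sketch below derives both from a balanced panel and drops each individual's first wave, which then serves only as the baseline. The function name add_lag_and_base and the dictionary row layout are assumptions made for this example; they are not part of the data files.

```python
from itertools import groupby

def add_lag_and_base(panel):
    """Return post-baseline rows with lagunion and baseunion added.

    Assumption: each row is a dict with 'idcode', 'year' and 'union'
    keys, and every individual has at least one wave.
    """
    ordered = sorted(panel, key=lambda r: (r["idcode"], r["year"]))
    out = []
    for _, grp in groupby(ordered, key=lambda r: r["idcode"]):
        grp = list(grp)
        base = grp[0]["union"]              # union in the first wave
        # Pair each wave with the one before it; the first wave is
        # used only as a lag and a baseline, so it is not emitted.
        for prev, cur in zip(grp, grp[1:]):
            out.append(dict(cur, lagunion=prev["union"], baseunion=base))
    return out

# Toy individual (not NLSY data) with three waves.
toy = [
    {"idcode": 1, "year": 78, "union": 1},
    {"idcode": 1, "year": 80, "union": 0},
    {"idcode": 1, "year": 82, "union": 1},
]
toy_out = add_lag_and_base(toy)
print([(r["year"], r["union"], r["lagunion"], r["baseunion"]) for r in toy_out])
```

Dropping the first wave of a 6-wave panel leaves 5 rows per individual, which is consistent with 799 x 5 = 3995 observations in unionred1.dat.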

The first few rows and columns of unionred1.dat look like

Exercise

1)      Estimate a heterogeneous probit model of trade union membership (union), with idcode as the level-2 identifier and mass 24. Include a constant and the regressors lagunion (lagged union membership), age, grade and southXt.

2)      Add the initial condition of trade union membership in 1978 (baseunion) to the previous model. How does the inference on the lagged response (lagunion) and the scale effects differ between the two models?
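
For orientation, the model in part 1 can be written out as follows. The notation is an assumption introduced here and does not come from the exercise: i indexes individuals, t indexes waves, u_i is a standard normal random effect, sigma is its scale, eta_it is the fixed part of the linear predictor, y_it is union, and (z_k, w_k) are quadrature points and weights for the standard normal density. The sum over 24 terms assumes that "mass 24" refers to 24 quadrature mass points. Part 2 adds a term in baseunion_i to eta_it.

```latex
% Fixed part of the linear predictor (part 1)
\eta_{it} = \beta_0 + \gamma\,\mathrm{lagunion}_{it}
          + \beta_1\,\mathrm{age}_{it} + \beta_2\,\mathrm{grade}_{it}
          + \beta_3\,\mathrm{southXt}_{it}

% Response probability conditional on the random effect
\Pr(y_{it} = 1 \mid u_i) = \Phi(\eta_{it} + \sigma u_i),
  \qquad u_i \sim N(0, 1)

% Likelihood contribution of individual i, approximated by quadrature
L_i = \int \prod_{t} \Phi(\eta_{it} + \sigma u)^{y_{it}}
        \left[1 - \Phi(\eta_{it} + \sigma u)\right]^{1 - y_{it}} \phi(u)\,du
    \approx \sum_{k=1}^{24} w_k \prod_{t} \Phi(\eta_{it} + \sigma z_k)^{y_{it}}
        \left[1 - \Phi(\eta_{it} + \sigma z_k)\right]^{1 - y_{it}}
```

In this notation gamma measures state dependence and sigma measures unobserved heterogeneity, so the comparison asked for in part 2 is a comparison of the estimates of gamma and sigma with and without baseunion.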

Joint analysis of the initial condition and subsequent responses

Data description (unionred.dat)

Number of observations = 4794

Number of cases = 799

The variables are the same as in unionred1.dat, except that the variables d, d1 and d2 now take more than one value.

d=1 for the initial response, 2 for a subsequent response

d1=1 if d=1, 0 otherwise

d2=1 if d=2, 0 otherwise
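
The coding of d, d1 and d2 can be illustrated with a short Python sketch. It assumes, hypothetically, that the stacked data are held as dictionaries with idcode and year keys and that each individual's earliest year is the initial response; the function name add_response_indicators is made up for this example.

```python
from itertools import groupby

def add_response_indicators(panel):
    """Return copies of the rows with d, d1 and d2 added.

    d  = 1 for an individual's first (initial) response, 2 otherwise
    d1 = 1 if d == 1, 0 otherwise
    d2 = 1 if d == 2, 0 otherwise
    """
    ordered = sorted(panel, key=lambda r: (r["idcode"], r["year"]))
    out = []
    for _, grp in groupby(ordered, key=lambda r: r["idcode"]):
        for k, row in enumerate(grp):
            d = 1 if k == 0 else 2
            out.append(dict(row, d=d, d1=int(d == 1), d2=int(d == 2)))
    return out

# Toy data (not NLSY data): two individuals, three waves each.
toy = [{"idcode": i, "year": y} for i in (1, 2) for y in (78, 80, 82)]
toy_out = add_response_indicators(toy)
print([(r["idcode"], r["year"], r["d"], r["d1"], r["d2"]) for r in toy_out])
```

Because d1 and d2 are mutually exclusive, multiplying a regressor by d1 makes it enter only the initial-response linear predictor, and multiplying it by d2 makes it enter only the subsequent-response one. Keeping all 6 waves per individual is consistent with 799 x 6 = 4794 observations in unionred.dat.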

The first few rows and columns of unionred.dat look like

Exercise

1)      Estimate a common random effect, common scale joint probit model (mass 24) of trade union membership (union). Use constants in both linear predictors. Use the d1 and d2 dummy variables to set up the linear predictors. For the initial response use the regressors: age, grade, southXt and not_smsa. For the subsequent responses use the regressors: lagged union membership (lagunion), age, grade and southXt. What does this model suggest about state dependence and unobserved heterogeneity?

2)      Re-estimate the model allowing the scale parameters for the initial and subsequent responses to be different. Is this a significant improvement over the common scale parameter model?

3)      To the different scale model add the initial or baseline response (baseunion). Does this make a significant improvement to the model?
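
One common way to judge whether a more general model is a significant improvement over a nested one is a likelihood ratio test. The exercise does not say which test to use, so this choice is an assumption. The sketch below covers the case of one extra parameter, such as a second scale parameter or the baseunion coefficient, and the log-likelihood values in the usage lines are hypothetical, not results from these data.

```python
import math

def lr_test_1df(loglik_restricted, loglik_general):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the LR statistic and its p-value from a chi-square
    distribution with 1 degree of freedom. For 1 df the upper tail
    probability is erfc(sqrt(x / 2)).
    """
    lr = 2.0 * (loglik_general - loglik_restricted)
    if lr < 0:
        raise ValueError("the general model should not fit worse")
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods, for illustration only.
lr, p = lr_test_1df(-1500.0, -1497.0)
print(f"LR = {lr:.3f}, p = {p:.4f}")
```

Both models must be fitted to the same observations for the comparison to be valid, which holds for the three models estimated on unionred.dat.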