Data preparation in Stata: Creating data sets for bivariate models
Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The sample contains 545 young males. The version of the data set (wagepan.dta) we use was obtained from Wooldridge (2002). Rabe-Hesketh and Skrondal (2005, exercises 2.7 and 4.7) used the same data to model wages and, separately, trade union membership. We start by looking at the data set (wagepan.dta) in the form appropriate for separate panel models of log(wage) or union membership.
Number of observations (rows): 4360
nr: person identifier
The first few rows and columns of wagepan.dta
The wagepan.dta dataset contains the response variables union and lwage, for use in univariate models of union membership and log wages respectively. To create a dataset suitable for a bivariate model of union membership and log wages, we can use the Stata commands:
gen y1 = union
This creates a single response variable y, a response indicator r and associated dummy variables r1 and r2. The dataset is sorted by individual nr, response indicator r and observation index ij.
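As a rough illustration of how such a stacked layout can be built, the sketch below uses Stata's reshape and tabulate commands. Only the first gen line comes from the text above; the remaining steps are assumptions, including the second response y2, the definition of ij as the row number of the original file, and the output file name, and they may differ from the commands actually used to produce union-wage.dta.

```stata
* Sketch only: apart from "gen y1 = union", every step here is an assumption
use wagepan, clear

gen y1 = union                  // response 1: union membership (from the text)
gen y2 = lwage                  // response 2: log wage (assumed)
gen ij = _n                     // observation index (assumed: row number in wagepan.dta)

* Stack y1 and y2 into a single response y; r records which response each row holds
reshape long y, i(ij) j(r)

* Dummy variables r1 and r2 for the response indicator
tabulate r, generate(r)

sort nr r ij
save "union-wage.dta", replace
```

Under these assumptions each of the 4360 original rows contributes one row per response, so the stacked file would have 2 x 4360 = 8720 rows; running count after the reshape is a quick way to check this.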
The first lines and columns of union-wage.dta
This data set is used in Exercise L10.