Data preparation in Stata: Creating data sets for bivariate models

Sabre manual


Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The sample contains 545 young males. The version of the data set (wagepan.dta) we use was obtained from Wooldridge (2002). Rabe-Hesketh and Skrondal (2005, exercises 2.7 and 4.7) used the same data to model wages and, separately, trade union membership. We start by looking at the data set (wagepan.dta) in a form appropriate for univariate panel models of log(wage) or union membership.


Data description


Number of observations (rows): 4360
Number of variables (columns): 44




nr: person identifier
year: 1980 to 1987
black: 1 if respondent is black, 0 otherwise
exper: labour market experience (age-6-educ)
hisp: 1 if respondent is Hispanic, 0 otherwise
poorhlth: 1 if respondent has a health disability, 0 otherwise
married: 1 if respondent is married, 0 otherwise
nrthcen: 1 if respondent lives in the Northern Central part of the US, 0 otherwise
nrtheast: 1 if respondent lives in the North East part of the US, 0 otherwise
rur: 1 if respondent lives in a rural area, 0 otherwise
south: 1 if respondent lives in the South of the US, 0 otherwise
educ: years of schooling
union: 1 if the respondent is a member of a trade union, 0 otherwise
lwage: log of hourly wage in US dollars
d8m: 1 if the year is 198m, 0 otherwise, m=1,...,7
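
The figures above imply a balanced panel: 545 young males, each observed in the 8 years 1980 to 1987, give 545 x 8 = 4360 rows. The following is a minimal sketch for checking this in Stata. It assumes that wagepan.dta is in the current working directory and that nr and year together identify each row.

```stata
* Load the data (assumes wagepan.dta is in the working directory)
use wagepan, clear

* 545 individuals x 8 years should account for every row
assert _N == 545*8

* nr and year together should identify each row uniquely
isid nr year

* Declare the panel; Stata reports whether it is balanced
xtset nr year

* Summarise the two responses used in the models
summarize union lwage
```

If either assert or isid fails, Stata stops with an error, which would indicate that the file differs from the version described here.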


The first few rows and columns of wagepan.dta

The wagepan.dta dataset contains the response variables union and lwage for use in univariate models of union membership and log wages respectively. To create a dataset suitable for a bivariate model of union membership and log wages, we can use the Stata commands:


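* Copy the two responses into a common stub so that reshape can stack them
gen y1 = union
gen y2 = lwage
* ij indexes the original person-year rows
gen ij = _n
* Stack y1 and y2 into a single response y; r = 1 for union, 2 for lwage
reshape long y, i(ij) j(r)
* Create the response dummies r1 and r2
tab r, gen(r)
sort nr r ij
save union-wage, replace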


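This creates a single response variable y, a response indicator r (r = 1 for union, r = 2 for lwage) and the associated dummy variables r1 and r2. Each of the 4360 original rows now appears twice, once per response, giving 8720 rows. The dataset is sorted by individual nr, response indicator r and observation index ij.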
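
As a quick sanity check on the stacked file, the sketch below reloads it and confirms the layout just described. It assumes the commands above ran without error in the current working directory. The expected row count is simply twice the 4360 rows of wagepan.dta, and float() is used because gen stores y as a float by default.

```stata
* Reload the stacked data set saved above
use "union-wage", clear

* Each original row should appear twice, once per response
assert _N == 2*4360

* r1 and r2 are the dummies for r == 1 (union) and r == 2 (lwage)
assert r1 == (r == 1)
assert r2 == (r == 2)

* y holds union when r == 1 and lwage when r == 2
assert y == union if r == 1
assert y == float(lwage) if r == 2

* Inspect the first few stacked rows
list nr year r r1 r2 y in 1/8
```

Because the data are sorted by nr, r and ij, the eight listed rows should be the union responses (r = 1) of the first individual.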


The first lines and columns of union-wage.dta


This data set is used in Exercise L10.
