Sabre

Data preparation in Stata: Creating data sets for bivariate models

Sabre manual

 

Vella and Verbeek (1998) analysed the male data from the Youth Sample of the US National Longitudinal Survey for the period 1980-1987. The sample contains 545 young men. The version of the data set we use (wagepan.dta) was obtained from Wooldridge (2002). Rabe-Hesketh and Skrondal (2005, exercises 2.7 and 4.7) used the same data to model wages and, separately, trade union membership. We start by looking at wagepan.dta in the form appropriate for univariate panel models of log(wage) or of union membership.

 

Data description

 

Number of observations (rows): 4360
Number of variables (columns): 44
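The two counts are consistent with a balanced panel: each of the 545 men is observed in all eight years 1980-1987, and 545 × 8 = 4360. The following is a minimal Stata sketch for checking this structure. It assumes wagepan.dta is in the current working directory.

```stata
use wagepan, clear
* one row per person-year: nr and year jointly identify the rows
isid nr year
assert _N == 4360
* count distinct individuals by tagging one row per nr
egen byte first = tag(nr)
count if first
assert r(N) == 545
* every individual contributes all 8 years
bysort nr: assert _N == 8
drop first
```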

 

Variables

 

nr: person identifier
year: 1980 to 1987
black: 1 if respondent is black, 0 otherwise
exper: labour market experience (age - 6 - educ)
hisp: 1 if respondent is Hispanic, 0 otherwise
poorhlth: 1 if respondent has a health disability, 0 otherwise
married: 1 if respondent is married, 0 otherwise
nrthcen: 1 if respondent lives in the North Central region of the US, 0 otherwise
nrtheast: 1 if respondent lives in the North East part of the US, 0 otherwise
rur: 1 if respondent lives in a rural area, 0 otherwise
south: 1 if respondent lives in the South of the US, 0 otherwise
educ: years of schooling
union: 1 if the respondent is a member of a trade union, 0 otherwise
lwage: log of hourly wage in US dollars
d81, ..., d87: d8m = 1 if the year is 198m, 0 otherwise (m = 1, ..., 7)
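The year dummies cover only m = 1, ..., 7. As a result, 1980 serves as the reference year, in which all seven dummies are 0. The following is a short sketch for checking these definitions, again assuming wagepan.dta is available in the working directory.

```stata
use wagepan, clear
* d8m should equal 1 exactly when year == 198m (m = 1,...,7)
forvalues m = 1/7 {
    assert d8`m' == (year == 198`m')
}
* in the reference year 1980 all seven dummies are zero
assert d81 + d82 + d83 + d84 + d85 + d86 + d87 == 0 if year == 1980
```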

 

The first few rows and columns of wagepan.dta
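A view like this one can be printed in Stata with `list`. The columns chosen below are only an illustrative selection, and the commands assume wagepan.dta is in the working directory.

```stata
use wagepan, clear
* first 8 rows, with a separator line between individuals
list nr year union lwage educ exper black hisp married in 1/8, sepby(nr)
```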

The wagepan.dta dataset contains the response variables union and lwage for use in univariate models of union membership and log wages respectively. To create a dataset suitable for a bivariate model of union membership and log wages, we can use the following Stata commands:

 

use wagepan, clear
* stack the two responses: y1 = union membership, y2 = log wage
gen y1 = union
gen y2 = lwage
gen ij = _n
reshape long y, i(ij) j(r)
tab r, gen(r)
sort nr r ij
save union-wage, replace

 

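This stacks the two responses into a single response variable y, which holds union membership when the response indicator r is 1 and log wage when r is 2. It also creates the associated dummy variables r1 and r2. Each original person-year now occupies two rows, so the number of rows doubles from 4360 to 8720. The dataset is sorted by individual (nr), response indicator (r) and observation index (ij).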
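To see what the reshape does row by row, the same commands can be run on a tiny hypothetical panel of two people observed over two years. The values are made up for illustration and are not taken from wagepan.dta. The `assert` lines check the properties described above.

```stata
clear
* hypothetical toy data, not from wagepan.dta
input nr year union lwage
1 1980 0 1.5
1 1981 1 1.7
2 1980 1 1.2
2 1981 1 1.4
end
gen y1 = union
gen y2 = lwage
gen ij = _n
reshape long y, i(ij) j(r)
tab r, gen(r)
sort nr r ij
list nr year ij r r1 r2 y, sepby(nr r)
* each original row now appears twice, once per response
assert _N == 8
* r1 and r2 flag the union (r = 1) and wage (r = 2) rows
assert r1 == (r == 1) & r2 == (r == 2)
assert y == union if r == 1
assert y == lwage if r == 2
```

Within each person, the sort places the union rows before the wage rows. `sort nr r ij` produces this same layout for the full data.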

 

The first lines and columns of union-wage.dta
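The following sketch lists the first rows of the stacked file. It assumes union-wage.dta has already been saved by the commands above in the working directory. The file is sorted by nr, r and ij, and each man contributes eight person-years. The first 16 rows should therefore show the eight union rows (r = 1) of the first individual followed by his eight wage rows (r = 2).

```stata
use union-wage, clear
* two stacked rows per original person-year
assert _N == 2 * 4360
list nr year ij r r1 r2 y in 1/16, sepby(r)
```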

 

This data set is used in Exercise L10.
