Data preparation in Stata: Reshaping a dataset with a baseline response

Sabre manual


In this example we illustrate how to stack a Stata data set (respiratory.dta) that contains time-constant covariates and a baseline response. We suppose that you want to model the baseline jointly with the subsequent responses.

Koch et al. (1989) analysed clinical trial data from 2 centres that compared two treatment groups for respiratory illness. Eligible patients were randomised to treatment or placebo at each centre. The respiratory status (an ordered response) of each patient was determined prior to randomisation and at 4 later visits to the clinic.


The number of young patients in the sample is 110. The version of the data set (respiratory.dta) we use was also used by Rabe-Hesketh and Skrondal (2005, exercise 5.1).




Koch, G. G., Carr, G. J., Amara, A., Stokes, M. E., and Uryniak, T. J., (1989), Categorical data analysis. In Berry, D. A. (ed.), Statistical Methodology in the Pharmaceutical Sciences, pp 389-473, Marcel Dekker, New York.


Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modelling using Stata, Stata Press, Stata Corp, College Station, Texas.


Data description


Number of observations (rows): 110
Number of variables (columns): 10




center: centre (1,2)
drug: 1 if patient was allocated to the treatment group, 0 if placebo
male: 1 if patient was male, 0 otherwise
age: patient's age
bl: patient's respiratory status prior to randomisation
v1: patient's respiratory status at visit 1 (0: terrible; 1: poor; 2: fair; 3: good; 4: excellent)
v2: patient's respiratory status at visit 2 (0: terrible; 1: poor; 2: fair; 3: good; 4: excellent)
v3: patient's respiratory status at visit 3 (0: terrible; 1: poor; 2: fair; 3: good; 4: excellent)
v4: patient's respiratory status at visit 4 (0: terrible; 1: poor; 2: fair; 3: good; 4: excellent)
patient: patient identifier (1,2,...,110)



The first few lines of respiratory.dta
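
A listing like the one captioned above can be produced in Stata along the following lines. This is a minimal sketch, assuming respiratory.dta is in the current working directory; the isid check is added here for illustration and is not part of the original example.

```stata
* load the wide-format data (assumes respiratory.dta is in the working directory)
use respiratory, clear

* variable names, storage types and the number of observations
describe

* confirm that patient uniquely identifies the rows before reshaping
isid patient

* show the first five patients
list patient center drug male age bl v1 v2 v3 v4 in 1/5
```

If isid reports an error, the patient identifier is duplicated and the later reshape would fail.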


The response variables bl, v1, v2, v3, v4 need to be stacked into a single column.

The data are stacked by generating new variables y1-y5 holding the outcomes (baseline and 4 visits) and then reshaping the data from wide to long format, with patient identifying the individual and a new variable r indexing the responses. The separate outcomes are thus converted into a single outcome variable y. The index r can be converted into dummy variables r1-r5 by use of the 'tab r, gen(r)' command. The data are then sorted by patient and response index. The response indicator is used to generate new baseline and trend covariates from the original baseline measure. Finally, the new data set is saved.


We also create a dummy variable bld, equal to 1 if the status is from pre-randomisation and 0 otherwise, and a linear trend variable, trend, equal to m if the status is from vm (m=1,2,3,4) and 0 for the pre-randomisation status. Further, we create the variable base, equal to bl for each row of post-treatment data and 0 for the pre-randomisation data.


Stata Commands


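use respiratory

* copy the baseline and the 4 visit responses to a common stub y
gen y1 = bl
gen y2 = v1
gen y3 = v2
gen y4 = v3
gen y5 = v4

* wide to long: one row per patient and response, indexed by r = 1,...,5
reshape long y, i(patient) j(r)
* dummy variables r1-r5 for the response index
tab r, gen(r)
sort patient r

* shift the response codes so that the first category is labelled 1
generate status=y+1
* bld = 1 on the pre-randomisation row; trend = 0 at baseline, m at visit m
gen bld = r1
gen trend = r-1
* base = bl on the post-randomisation rows, 0 on the baseline row
gen base = 0
replace base = bl if r >= 2

save respiratory2, replace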

The command generate status=y+1 is needed because the ordered response model command in Sabre only works on response variables that use 1 as the label of the first category.
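
The recoding can be checked once the commands above have been run. The following is a minimal sketch, assuming respiratory2.dta was saved by those commands in the current working directory; the checks themselves are an illustrative addition.

```stata
* load the stacked data saved by the commands above
use respiratory2, clear

* cross-tabulate the original and the shifted response codes
tabulate y status, missing

* stop with an error if any row breaks the rule status = y + 1
assert status == y + 1
```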



The first few lines of the resulting data set, respiratory2.dta
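
A listing of the stacked data, together with a few consistency checks on the derived covariates, could be obtained as sketched below. This assumes respiratory2.dta was created by the commands in the Stata Commands section; the assert statements are illustrative additions that restate the definitions of bld, trend and base given earlier.

```stata
* load the stacked data and restore the patient/response ordering
use respiratory2, clear
sort patient r

* first two patients, five rows each
list patient r y status bld trend base in 1/10, sepby(patient)

* every patient should contribute one baseline row and four visit rows
bysort patient: assert _N == 5

* bld flags the pre-randomisation row; trend is 0 at baseline and m at visit m
assert bld == (r == 1)
assert trend == r - 1

* base is 0 on the baseline row and equals bl on the post-randomisation rows
assert base == 0 if r == 1
assert base == bl if r >= 2
```

Each assert is silent when the condition holds for all rows and stops with an error otherwise.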


This data set can now be read directly into Sabre, see for example Exercise L6.

