Data preparation in Stata: Reshaping a data set with a baseline response
In this example we illustrate how to stack a Stata data set (respiratory.dta) that contains time-constant covariates and a baseline response. We suppose that you want to model the baseline response jointly with the subsequent responses.
Koch et al. (1989) analysed clinical trial data from two centres that compared two treatment groups for respiratory illness. At each centre, eligible patients were randomised to either the treatment or the placebo group. The respiratory status (an ordered response) of each patient was recorded before randomisation and at four later visits to the clinic.
The sample contains 110 young patients. The version of the data set (respiratory.dta) used here was also used by Rabe-Hesketh and Skrondal (2005, exercise 5.1).
Koch, G. G., Carr, G. J., Amara, A., Stokes, M. E., and Uryniak, T. J. (1989), Categorical data analysis. In Berry, D. A. (ed.), Statistical Methodology in the Pharmaceutical Sciences, pp. 389-473, Marcel Dekker, New York.
Rabe-Hesketh, S., and Skrondal, A. (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, College Station, Texas.
Number of observations (rows): 110
center: centre (1,2)
The first few lines of respiratory.dta
The response variables bl, v1, v2, v3 and v4 (the baseline response and the responses at the four post-randomisation visits) need to be stacked into a single column.
The data is stacked by generating new variables y1-y5 that hold the outcomes (the baseline and the four visits), indexing each observation (row) by ij, and then reshaping the data from wide to long format. This converts the separate outcomes into a single outcome variable, y. A new variable, r, is also generated to index the responses. It can be converted into dummy variables r1-r5 with the command 'tab r, gen(r)'. The data is then sorted by individual, response and observation index. The response indicator is used to generate new baseline and trend covariates from the original baseline measure. Finally, the new data set is saved.
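The following minimal Python sketch makes the stacking step concrete. It mimics what the wide-to-long reshape does, roughly what Stata's 'reshape long y, i(ij) j(r)' does once y1-y5 exist. It is not the Stata code used for this example. The dict-based record layout, the function name stack and the sample values are all hypothetical. The sketch also assumes that y1 holds the baseline, so that r = 1 marks the baseline row.

```python
# Hypothetical sketch of the wide-to-long stacking step (not the Stata code itself).
# Assumption: each wide record is a dict holding the responses "bl", "v1".."v4"
# plus time-constant covariates such as "center"; the values below are made up.

RESPONSES = ["bl", "v1", "v2", "v3", "v4"]  # y1..y5: baseline, then visits 1-4


def stack(wide_rows):
    """Return one long row per (patient, response), baseline first."""
    long_rows = []
    for ij, row in enumerate(wide_rows, start=1):  # ij indexes the wide rows
        for r, name in enumerate(RESPONSES, start=1):  # r = 1 is the baseline
            # keep every original column (e.g. center, bl) and add ij, r and y
            new = dict(row, ij=ij, r=r, y=row[name])
            # dummy variables r1-r5, mirroring what 'tab r, gen(r)' creates
            for k in range(1, len(RESPONSES) + 1):
                new[f"r{k}"] = int(r == k)
            long_rows.append(new)
    # sort by patient index, then response index
    long_rows.sort(key=lambda d: (d["ij"], d["r"]))
    return long_rows


wide = [
    {"center": 1, "bl": 1, "v1": 2, "v2": 2, "v3": 3, "v4": 3},
    {"center": 2, "bl": 0, "v1": 0, "v2": 1, "v3": 1, "v4": 2},
]
long_rows = stack(wide)
print(len(long_rows))                                # 10
print([d["y"] for d in long_rows if d["ij"] == 1])   # [1, 2, 2, 3, 3]
```

Each wide row becomes five long rows. The time-constant covariates, including bl itself, are copied unchanged onto every row, which is what makes it possible to build the baseline covariate afterwards.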
We also create a dummy variable, bld, equal to 1 if the status was recorded before randomisation and 0 otherwise, and a linear trend variable, trend, equal to m if the status comes from visit vm, m = 1, 2, 3, 4. Further, we create the variable base, equal to bl on each post-treatment row and 0 on the pre-randomisation row.
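The sketch below computes these three covariates for a single stacked row. The function name derived_covariates is hypothetical. The sketch assumes the response index r runs from 1 (baseline) to 5 (visit v4). It also assumes that trend is 0 on the baseline row, which the text does not state explicitly.

```python
# Hypothetical sketch of the derived covariates for one stacked row.
# Assumptions: r = 1 is the baseline row and r = m + 1 is visit vm (m = 1..4);
# trend = r - 1 therefore gives m for visit vm and 0 on the baseline row.


def derived_covariates(r, bl):
    """Return bld, trend and base for response index r and baseline response bl."""
    bld = int(r == 1)            # 1 for the pre-randomisation row, else 0
    trend = r - 1                # m for visit vm (0 at baseline, by assumption)
    base = 0 if r == 1 else bl   # baseline response on post-treatment rows only
    return {"bld": bld, "trend": trend, "base": base}


for r in range(1, 6):
    print(r, derived_covariates(r, bl=2))
```

In Stata, the same logic might be expressed with expressions along the lines of 'gen bld = (r == 1)', 'gen trend = r - 1' and 'gen base = bl * (r > 1)'. Check these against the variable coding in your own copy of the data.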
The command 'generate status=y+1' is needed because the ordered response model command in Sabre only works on response variables whose first category is labelled 1.
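The recode is simple, but the following sketch shows its effect. The function name to_sabre_status and the sample responses are hypothetical. The sketch assumes that y is coded 0, 1, 2, ... with 0 as the lowest category, which is what makes adding 1 sufficient.

```python
# Hypothetical sketch of 'generate status=y+1': shift 0-based ordinal codes up by one.
# Assumption: y is coded 0, 1, 2, ... with 0 as the lowest category.


def to_sabre_status(ys):
    """Return status codes whose first category is labelled 1."""
    if any(y < 0 for y in ys):
        raise ValueError("expected ordinal codes starting at 0")
    return [y + 1 for y in ys]


ys = [0, 1, 2, 3, 4, 2, 0]   # made-up responses
status = to_sabre_status(ys)
print(sorted(set(status)))   # [1, 2, 3, 4, 5]
```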
The first few lines of the resulting data set, respiratory2.dta
This data set can now be read directly into Sabre; see, for example, Exercise L6.