Data preparation in Stata: Sorting datasets

Sabre manual


Garner and Raudenbush (1991) and Raudenbush and Bryk (2002) studied the role of school and neighbourhood effects on educational attainment. The data set they used (neighborhood.dta) covers young people who left school between 1984 and 1986 in one Scottish Educational Authority. The same data were used by Rabe-Hesketh and Skrondal (2005, exercise 2.2).




Garner, C. L., and Raudenbush, S. W., (1991), Neighbourhood effects on educational attainment: A multilevel analysis of the influence of pupil ability, family, school and neighbourhood, Sociology of Education, 64, 252-262.


Raudenbush, S. W., and Bryk, A. S., (2002), Hierarchical Linear Models, Sage, Thousand Oaks, CA.


Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, StataCorp, College Station, Texas.


Data description


Number of observations (rows): 2310
Number of variables (columns): 12




neighid: respondent's neighbourhood identifier
schid: respondent's school identifier
attain: respondent's combined end of school educational attainment as measured by grades from various exams
p7vrq: respondent's verbal reasoning quotient as measured by a test at age 11-12 in primary school
p7read: respondent's reading test score as measured by a test at age 11-12 in primary school
dadocc: respondent's father's occupation
dadunemp: 1 if respondent's father unemployed, 0 otherwise
daded: 1 if respondent's father was in full time education after age 15, 0 otherwise
momed: 1 if respondent's mother was in full time education after age 15, 0 otherwise
male: 1 if respondent is male, 0 otherwise
deprive: index of social deprivation for the local community in which the respondent lived
dummy: 1 to 4; representing collections of the schools or neighbourhoods
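Before sorting, it can be worth checking that the file matches the description above. The following is a minimal sketch, assuming neighborhood.dta is in the current working directory; the choice of which variables to summarise is illustrative only.

```stata
* Assumption: neighborhood.dta is in the current working directory
use neighborhood, clear

* Report the number of observations and variables, plus storage types
describe

* Compact overview of the two identifiers (distinct values, range)
codebook neighid schid, compact

* Summary statistics for the continuous measures
summarize attain p7vrq p7read deprive

* Frequency table for the grouping variable
tabulate dummy
```

If the file is the one described here, describe should report 2310 observations and 12 variables.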



The first few lines of neighborhood.dta

The neighborhood.dta dataset can be used to fit random effects models at both the school and neighbourhood levels. To obtain a separate dataset for each level, each sorted on the variable that will be specified as the case variable within Sabre, we can use:


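* Load the full data set, discarding any data currently in memory
use neighborhood, clear
* Sort by school identifier and save the school-level copy
sort schid
save neighborhood1, replace
* Re-sort by neighbourhood identifier and save the neighbourhood-level copy
sort neighid
save neighborhood2, replace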



The first few lines of neighborhood2.dta
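To produce a listing like this and confirm that the saved file really is ordered by the intended case variable, something along the following lines could be run. This is a sketch only: it assumes neighborhood2.dta has been saved in the current working directory, and the variables shown in the listing are an arbitrary choice.

```stata
* Assumption: neighborhood2.dta was saved in the current working directory
use neighborhood2, clear

* The header printed by describe includes a "Sorted by" line
describe, short

* Show the first five rows (the variables listed are illustrative)
list neighid schid attain in 1/5

* Stop with an error if neighid ever decreases from one row to the next
assert neighid >= neighid[_n-1] if _n > 1
```

The same check applies to neighborhood1.dta with schid in place of neighid.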


This data set can now be read directly into Sabre; see, for example, Exercise C2.

