Data preparation in Stata: Grouped means

The data we use in this example are a sub-sample from the 1982 High School and Beyond Survey (Raudenbush and Bryk, 2002), and include information on 7,185 students nested within 160 schools: 90 public and 70 Catholic. Sample sizes vary from 14 to 67 students per school.




Raudenbush, S.W., Bryk, A.S., 2002, Hierarchical Linear Models, Thousand Oaks, CA: Sage.


Data description


Number of observations (rows): 7185
Number of variables (columns): 15




school: school identifier
student: student identifier
minority: 1 if student is from an ethnic minority, 0 otherwise
gender: 1 if student is female, 0 otherwise
ses: socio-economic status, a standardized scale constructed from variables measuring parental education, occupation, and income
meanses: mean of the SES values for the students in this school
mathach: a measure of the student's mathematics achievement
size: school enrolment
sector: 1 if school is from the Catholic sector, 0 otherwise
pracad: proportion of students in the academic track
disclim: a scale measuring disciplinary climate
himnty: 1 if more than 40% minority enrolment, 0 otherwise



The first few lines of hsb.dta
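
One way to view these lines in Stata is sketched below. It assumes that hsb.dta is in the current working directory; the choice of five rows is arbitrary.

```stata
* Load the data (assumes hsb.dta is in the current working directory)
use hsb.dta, clear

* Show variable names, storage types and labels
describe

* List the first five observations
list in 1/5
```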

The hsb.dta dataset contains the variable ses for each student and the variable meanses, which is the mean of the SES values for the students in that student's school. Had this school-level variable not been supplied with the data set, it would need to be created. To create the mean of ses for each school in Stata, based on the students in the sample, we would use the commands:


sort school
by school: egen meanses2 = mean(ses)
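
As a rough check on the new variable, meanses2 can be compared with the supplied meanses. The following is only a sketch: it assumes hsb.dta is in the current working directory, and the names diff and first are hypothetical helper variables introduced here for illustration. The two means need not agree exactly, for example if meanses was computed from a different set of students or was rounded.

```stata
* Load the data (assumes hsb.dta is in the current working directory)
use hsb.dta, clear

* bysort sorts by school and computes the group mean in one step
bysort school: egen meanses2 = mean(ses)

* Absolute difference between the supplied and the computed school means
* (diff is a hypothetical helper variable)
generate double diff = abs(meanses - meanses2)
summarize diff

* Flag one observation per school and list the school-level values
* (first is a hypothetical helper variable)
egen byte first = tag(school)
list school meanses meanses2 if first in 1/100
```

The bysort prefix is equivalent to the separate sort and by commands above. The tag() function of egen marks one observation in each school, so school-level values can be inspected without repeating them for every student.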


This data set can be used in Sabre; see, for example, Examples C1 and C2.


