Data preparation in Stata: Missing values

Sabre manual


Raudenbush and Bhumirat (1992) analysed data on children repeating a grade during their time at primary school. The data were from a national survey of primary education in Thailand in 1988; we use a subset of those data here.




Raudenbush, S.W., Bhumirat, C., 1992. The distribution of resources for primary education and its consequences for educational achievement in Thailand. International Journal of Educational Research, 17, 143-164.


Data description


Number of observations (rows): 8582
Number of variables (columns): 5




schoolid: school identifier
sex: 1 if child is male, 0 otherwise
pped: 1 if the child had pre-primary experience, 0 otherwise
repeat: 1 if the child repeated a grade during primary school, 0 otherwise
msesc: mean pupil socioeconomic status at the school level


The first few lines of thaieduc.dta
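
A listing like this can be produced in Stata itself. The following is a minimal sketch, assuming thaieduc.dta is in the current working directory; the choice of ten rows is arbitrary and not taken from the original page.

```stata
* Load the data, replacing anything currently in memory
use thaieduc, clear

* Report the number of observations and the variables in the file
describe

* List the first ten rows; a missing msesc is displayed as a dot
list in 1/10

* Count the rows where msesc is missing
count if missing(msesc)
```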


This shows that the thaieduc.dta dataset contains a variable, msesc, which has missing values. For models which do not use msesc, we can simply drop this variable from the dataset as follows:


use thaieduc
drop msesc
save thaieduc1, replace

This dataset has 8,582 observations on 4 variables. For models which do use
msesc, we need to drop every observation for which msesc is missing. To do this, we can use:


use thaieduc
drop if missing(msesc)
save thaieduc2, replace


This dataset has 7,516 observations on 5 variables, so 1,066 of the original 8,582 observations have been dropped. It can now be read directly into Sabre; see, for example, Example C3.
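
The effect of dropping observations with a missing value can be checked on a small example. The following is a minimal sketch using made-up toy values rather than the Thai data; the numbers entered below are illustrative assumptions only. It counts the missing values, drops them, and then uses assert to confirm that none remain.

```stata
* Toy data with made-up values; these are NOT rows from thaieduc.dta
clear
input schoolid msesc
1 0.25
1 0.25
2 .
2 .
3 -0.10
end

* Count the rows where msesc is missing (2 in this toy data)
count if missing(msesc)

* Drop those rows, keeping complete cases only
drop if missing(msesc)

* Stop with an error if any missing value of msesc remains
assert !missing(msesc)

* Count the rows that are left (3 in this toy data)
count
```

The same count and assert lines could be run on the real data after the drop step, where the final count should agree with the number of observations reported above.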
