Data preparation in Stata: Creation of dummy variables

Sabre manual


Johnson and Albert (1999) analysed data on the grading of the same essay by five experts. Essays were graded on a scale of 1 to 10 with 10 being excellent. In this exercise we use the subset of the data limited to the grades from graders 1 to 5 on 198 essays (essays.dta). The same data were used by Rabe-Hesketh and Skrondal (2005, exercise 5.4).




Johnson, V. E., and Albert, J. H. (1999), Ordinal Data Modeling, Springer, New York.


Rabe-Hesketh, S., and Skrondal, A. (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, College Station, Texas.


Data description


Number of observations (rows): 990
Number of variables (columns): 11




essay: essay identifier {1,2,...,198}
grader: grader identifier {1,2,3,4,5}
grade: essay grade {1,2,...,10}
rating: essay rating {1,2,...,10}, not used in this exercise
constant: 1 for all observations, not used in this exercise
wordlength: average word length
sqrtwords: square root of the number of words in the essay
commas: number of commas per 100 words (number of commas times 100, divided by the number of words in the essay)
errors: percentage of spelling errors in the essay
prepos: percentage of prepositions in the essay
sentlength: average length of sentences in the essay



The first few lines of the Stata data set essays.dta


The essays.dta dataset contains a variable grade, which gives the grading of essays on a scale of 1 to 10 (the highest grade actually given in this data set is 8). Suppose we want to create a binary indicator (dummy variable) for those essays that obtained a grade of 5 or over, as compared with those essays that got less than 5, that is,


pass = 1 if grade is in 5-10, pass = 0 if grade is in 1-4

We can do this by using the commands


gen pass = 0
replace pass = 1 if grade >= 5
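
The same indicator can also be built in a single step from a logical expression. The sketch below is only illustrative: it uses a small made-up dataset in place of essays.dta (the values, including the missing grade, are assumptions and do not come from the real data). It also shows one point worth checking: Stata treats a missing value as larger than any number, so grade >= 5 is true when grade is missing, and an if qualifier is needed if such rows should stay missing.

```stata
* Toy data standing in for essays.dta (illustrative values only)
clear
input essay grader grade
1 1 3
1 2 5
2 1 8
2 2 .
end

* A logical expression evaluates to 1 (true) or 0 (false).
* The if qualifier leaves pass missing where grade is missing,
* because Stata treats . as larger than any number.
generate pass = (grade >= 5) if !missing(grade)

* Cross-check the new indicator against the original grades
tabulate grade pass, missing
```

If grade has no missing values, as appears to be the case for essays.dta, this gives the same result as the two-command version above.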


The variable grader identifies the different examiners and takes the values 1,2,3,4,5. To create dummy variables for examiners 2-5 (leaving examiner 1 as the reference category), we can use


gen grader2 = 0
replace grader2 = 1 if grader == 2
gen grader3 = 0
replace grader3 = 1 if grader == 3
gen grader4 = 0
replace grader4 = 1 if grader == 4
gen grader5 = 0
replace grader5 = 1 if grader == 5
save essays2, replace
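
The four pairs of gen/replace commands follow one pattern, so they could also be written as a loop. This is a minimal sketch under the assumption that grader takes only the values 1 to 5; the toy dataset is made up for illustration and stands in for essays.dta.

```stata
* Toy data standing in for essays.dta (illustrative values only)
clear
input essay grader grade
1 1 3
1 2 5
1 3 4
1 4 6
1 5 2
end

* Create grader2, ..., grader5; grader 1 is the reference category.
* (grader == `k') evaluates to 1 when true and 0 otherwise.
forvalues k = 2/5 {
    generate grader`k' = (grader == `k')
}

* Inspect the result
list essay grader grader2-grader5
```

Another built-in route is tabulate grader, generate(stub), which creates one indicator per category (including grader 1) named stub1, stub2, and so on.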



The first few lines of the new data, essays2.dta


This data set can now be read directly into Sabre; see, for example, Exercise C3.
