Sabre

Data preparation in Stata: Creation of dummy variables

Johnson and Albert (1999) analysed data on the grading of the same essays by five experts. Essays were graded on a scale of 1 to 10, with 10 being excellent. In this exercise we use the subset of the data limited to the grades given by graders 1 to 5 on 198 essays (essays.dta). The same data were used by Rabe-Hesketh and Skrondal (2005, exercise 5.4).

References

Johnson, V. E., and Albert, J. H. (1999), Ordinal Data Modeling, Springer, New York.

Rabe-Hesketh, S., and Skrondal, A. (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, Stata Corp, College Station, Texas.

Data description

Number of observations (rows): 990
Number of variables (columns): 11

Variables

essay: essay identifier {1,2,...,198}
rating: essay rating {1,2,...,10}, not used in this exercise
constant: 1 for all observations, not used in this exercise
wordlength: average word length
sqrtwords: square root of the number of words in the essay
commas: number of commas times 100 and divided by the number of words in the essay
errors: percentage of spelling errors in the essay
prepos:  percentage of prepositions in the essay
sentlength:  average length of sentences in the essay
grade: grade given to the essay on a scale of 1 to 10
grader: grader (examiner) identifier {1,2,3,4,5}

The first few lines of the Stata data set essays.dta
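
A listing of this kind can be produced in Stata with a short sketch like the one below. It assumes that essays.dta sits in the current working directory; the choice of five rows is arbitrary.

```stata
* Load the data set (assumes essays.dta is in the working directory)
use essays, clear

* Summarise the variables and their storage types
describe

* Show the first five observations
list in 1/5
```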

The essays.dta dataset contains a variable grade, which gives the grade of each essay on a scale of 1 to 10 (the highest grade actually given in this data set is 8). To create a grouping variable (a binary indicator, or dummy variable) for those essays that obtained a grade of 5 or over, as opposed to those that obtained less than 5, we would use the command
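
One way to write such a command in a single line, given here as a sketch rather than as the original listing, relies on Stata evaluating a logical expression to 1 when it is true and 0 when it is false. The if qualifier is an added safeguard and an assumption of this sketch: Stata treats missing values as larger than any number, so without it a missing grade would be coded as a pass.

```stata
* pass = 1 if grade is 5 or over, 0 otherwise.
* The if qualifier leaves pass missing where grade is missing
* (an added safeguard, not necessarily part of the original command).
generate pass = (grade >= 5) if !missing(grade)
```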

We can also do this by using the commands

gen pass = 0
replace pass = 1 if grade >= 5

The variable grader identifies the different examiners and takes the values 1, 2, 3, 4 and 5. To create dummy variables for examiners 2-5, we can use
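
A common way to do this, sketched here under assumptions, is tabulate with the generate() option, which creates one 0/1 indicator for each category. The stub name grader, and therefore the variable names grader1 to grader5, is an assumed choice, as is dropping the indicator for examiner 1 so that examiner 1 acts as the reference category.

```stata
* Create indicators grader1, grader2, ..., grader5, one per examiner
* (the stub "grader" is an assumed naming choice)
tabulate grader, generate(grader)

* Keep only the dummies for examiners 2-5; examiner 1 is the reference
drop grader1
```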

save essays2, replace

The first few lines of the new data, essays2.dta
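
Before moving on, the new indicator can be checked against the variable it was built from. The sketch below assumes that essays2.dta was saved in the current working directory and that it contains the variables grade and pass.

```stata
* Reload the saved data set
use essays2, clear

* Cross-tabulate to confirm that pass is 1 exactly when grade is 5 or over
tabulate grade pass

* Show the first five observations, including the new variables
list in 1/5
```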

This data set can now be read directly into Sabre; see, for example, Exercise C3.