Exercise C3, binary response model

Johnson and Albert (1999) analysed data on the grading of the same essays by five experts. Essays were graded on a scale of 1 to 10, with 10 being excellent. In this exercise we use the subset of the data containing the grades given by graders 1 to 5 to 198 essays (essays2.dat). The same data were used by Rabe-Hesketh and Skrondal (2005, exercise 5.4).

Data description

Number of observations (rows): 990 (198 essays, each graded by 5 graders)

Number of variables (columns): 16

Variables:

essay = essay identifier {1,2,…,198}

rating = essay grade {1,2,…,10}, not used in this exercise

constant = 1 for all observations, not used in this exercise

wordlength = average word length

sqrtwords = square root of the number of words in the essay

commas = number of commas per 100 words (number of commas × 100, divided by the number of words in the essay)

errors = percentage of spelling errors in the essay

prepos = percentage of prepositions in the essay

sentlength = average length of sentences in the essay
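The derived essay characteristics are simple ratios of raw counts. The sketch below shows how they could be computed, assuming hypothetical raw counts (words, letters, commas, spelling errors, prepositions and sentences) that are not part of essays2.dat. It also assumes that word length is counted in letters, that the percentages are taken relative to the word count, and that sentence length is measured in words; the data description does not state these bases explicitly.

```python
import math
from dataclasses import dataclass


@dataclass
class EssayFeatures:
    wordlength: float
    sqrtwords: float
    commas: float
    errors: float
    prepos: float
    sentlength: float


def essay_features(n_words, n_letters, n_commas, n_errors, n_prepos, n_sentences):
    """Derive the six essay characteristics from hypothetical raw counts.

    Assumptions (not stated in the data description): word length is counted
    in letters, percentages are relative to the number of words, and sentence
    length is measured in words.
    """
    if n_words <= 0 or n_sentences <= 0:
        raise ValueError("an essay needs at least one word and one sentence")
    return EssayFeatures(
        wordlength=n_letters / n_words,        # average letters per word
        sqrtwords=math.sqrt(n_words),          # square root of essay length
        commas=100.0 * n_commas / n_words,     # commas x 100 / words
        errors=100.0 * n_errors / n_words,     # % of words misspelt (assumed base)
        prepos=100.0 * n_prepos / n_words,     # % of words that are prepositions
        sentlength=n_words / n_sentences,      # average words per sentence
    )


# Hypothetical essay: 400 words, 1800 letters, 20 commas,
# 8 spelling errors, 48 prepositions, 20 sentences.
print(essay_features(400, 1800, 20, 8, 48, 20))
```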

The first few lines of the data (essays2.dat) look like:

Start Sabre and specify a transcript file:

out essays.log

Suggested exercise:

(1) Fit a binary probit model to pass, without any random effects.

(2) Fit a binary probit model allowing for the essay random effect. Is the essay effect significant? How many quadrature points should be used to estimate this model?

(3) Add the 4 grader dummy variables to the model. What are the differences between the graders?

(4) Add the 6 essay characteristics to the previous model. Which of them are significant? How has including the essay characteristics improved the model?

(5) Create interaction effects between the grader-specific dummy variables and the sqrtwords explanatory variable, and add these effects to the model. What do the results tell you?
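For exercise (1), the pooled probit model treats all 990 ratings as independent observations. A minimal sketch of its log-likelihood follows, assuming the predictors are supplied as rows of numbers with a leading 1 for `constant`; the function names and toy data are hypothetical, and this only illustrates the likelihood being maximised, not Sabre's implementation.

```python
import math
from statistics import NormalDist


def norm_cdf(z):
    """Standard normal CDF Phi(z), written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def probit_loglik(beta, X, y):
    """Log-likelihood of a pooled binary probit model, P(y = 1 | x) = Phi(x'beta).

    Every row (essay-grader pair) is treated as an independent observation,
    i.e. there is no essay random effect.
    """
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        # 1 - Phi(eta) is computed as Phi(-eta) to avoid cancellation
        ll += math.log(norm_cdf(eta) if yi == 1 else norm_cdf(-eta))
    return ll


def intercept_only_mle(y):
    """With only the constant in the model, the MLE is beta0 = Phi^{-1}(pass rate)."""
    return NormalDist().inv_cdf(sum(y) / len(y))


# Toy data (hypothetical, not essays2.dat): the single column is `constant`.
y = [1, 1, 1, 0]
X = [[1.0]] * len(y)
b0 = intercept_only_mle(y)
print(b0, probit_loglik([b0], X, y))
```

Adding the grader dummies or essay characteristics only lengthens each row of `X` and `beta`; the likelihood itself is unchanged.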
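For exercise (3), five graders need only four dummy variables, because one grader acts as the baseline absorbed into the constant. The sketch assumes grader 1 is the reference category and that grader identifiers run from 1 to 5; the data description does not list the dummy variables, so this function is hypothetical.

```python
def grader_dummies(grader, n_graders=5, reference=1):
    """0/1 indicators for every grader except the (assumed) reference grader.

    With five graders this gives four dummies; each coefficient then measures
    how that grader's probit index differs from the reference grader's.
    """
    if not 1 <= grader <= n_graders:
        raise ValueError(f"grader must be in 1..{n_graders}")
    return [int(grader == g) for g in range(1, n_graders + 1) if g != reference]


for g in range(1, 6):
    print(g, grader_dummies(g))
```

Under this coding, a positive dummy coefficient means that grader is more likely than grader 1 to pass the same essay, holding the essay effect fixed.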
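For exercise (4), the two questions call for two different tests: a Wald z-test (estimate divided by its standard error) for each essay characteristic, and a likelihood-ratio test comparing the models with and without the six characteristics. A stdlib-only sketch follows; the chi-square tail uses the closed form that holds for even degrees of freedom, which covers the 4, 6 and 4 extra parameters added in exercises (3), (4) and (5). The log-likelihood values in the usage line are made up.

```python
import math


def wald_test(estimate, std_error):
    """Two-sided Wald z-test of H0: coefficient = 0. Returns (z, p-value)."""
    z = estimate / std_error
    return z, math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))


def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df), df even, via the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def lr_test(loglik_restricted, loglik_full, df):
    """Likelihood-ratio test for nested models fitted to the same data."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical log-likelihoods, not results from essays2.dat.
print(lr_test(-500.0, -480.0, df=6))
```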
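For exercise (5), each interaction is the product of a grader dummy and `sqrtwords`, so it equals `sqrtwords` for ratings by that grader and 0 otherwise. The sketch assumes grader 1 is the reference and that each row of the data is available as a Python dict; the column names it creates are hypothetical.

```python
def add_grader_sqrtwords_interactions(row, n_graders=5, reference=1):
    """Return a copy of `row` with grader dummies and grader x sqrtwords terms.

    `row` is a dict holding at least 'grader' (1..n_graders) and 'sqrtwords'.
    Column names such as 'grader3_sqrtwords' are hypothetical.
    """
    out = dict(row)
    for g in range(1, n_graders + 1):
        if g == reference:          # assumed baseline grader: no dummy, no interaction
            continue
        d = int(row["grader"] == g)
        out[f"grader{g}"] = d
        out[f"grader{g}_sqrtwords"] = d * row["sqrtwords"]
    return out


print(add_grader_sqrtwords_interactions({"essay": 1, "grader": 3, "sqrtwords": 20.0}))
```

Under this coding, the `sqrtwords` coefficient is the essay-length effect for the reference grader, and each interaction coefficient is how much that grader's length effect differs from it; non-zero interactions would suggest the graders weight essay length differently.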

References

Johnson, V. E., and Albert, J. H. (1999), Ordinal Data Modeling, Springer, New York.

Rabe-Hesketh, S., and Skrondal, A. (2005), Multilevel and Longitudinal Modeling Using Stata, Stata Press, StataCorp, College Station, Texas.