Exercise 3LC3. Binary response model: Guatemalan immunisation of children (1595 mothers in 161 communities)


This exercise uses the Rodríguez and Goldman (2001) data on Guatemalan families’ decisions whether or not to immunize their children. The survey was conducted in 1987 to establish the effectiveness of the Guatemalan government’s campaign to immunize children against major childhood diseases. The questionnaire contains information on the immunization status of living children born in the previous 5 years. Children who were more than 2 years old at the time of the interview were old enough to have been immunized during the 1986 campaign. The data set contains the binary response immun, which records whether the child was immunized (1 yes, 0 otherwise) for child i (level 1) in family j (level 2), within community k (level 3).
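The three-level structure described above can be written out as a model equation. The following is a sketch only: it assumes random intercepts at the family and community levels that are normally distributed and independent of each other, and the symbols x, β, u, v, σ_u and σ_v are notation introduced here rather than taken from the exercise.

```latex
% Sketch of a three-level random-intercept logit model (notation assumed):
% child i (level 1) in family j (level 2) within community k (level 3)
\[
  \operatorname{logit}\,\Pr(\mathit{immun}_{ijk} = 1 \mid u_{jk}, v_k)
    = \mathbf{x}_{ijk}'\boldsymbol{\beta} + u_{jk} + v_k ,
\]
\[
  u_{jk} \sim N(0, \sigma_u^2), \qquad v_k \sim N(0, \sigma_v^2),
  \qquad u_{jk} \text{ independent of } v_k .
\]
```

Setting σ_u = σ_v = 0 gives a logit model without random effects, and setting only σ_v = 0 gives a model with the family random effect alone.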


The same data were used by Rabe-Hesketh and Skrondal (2005, section 7.5).




Rodríguez, G., and Goldman, N., (2001), Improved estimation procedures for multilevel models with binary response: a case study. Journal of the Royal Statistical Society, Series A, 164, 339–355.

Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modelling using Stata, Stata Press, Stata Corp, College Station, Texas.


Data description


Number of observations = 2159

Number of level-2 cases (‘mom’ = identifier for mothers) = 1595

Number of level-3 cases (‘cluster’ = identifier for communities) = 161
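Once the data are loaded, the three counts above are easy to check. The sketch below is in Python and assumes the rows have already been read into a list of dictionaries keyed by the variable names mom and cluster. The layout of guatemala_immun.dat is not assumed here, and the function name level_counts and the toy rows are made up for illustration (they are not the Guatemalan data).

```python
def level_counts(rows):
    """Return (observations, level-2 cases, level-3 cases).

    `rows` is an iterable of dicts with at least the keys 'mom' and 'cluster'.
    Mothers are counted as distinct (cluster, mom) pairs, which is safe
    whether or not mother identifiers are reused across communities.
    """
    rows = list(rows)
    mothers = {(r["cluster"], r["mom"]) for r in rows}
    communities = {r["cluster"] for r in rows}
    return len(rows), len(mothers), len(communities)


# Tiny made-up example: 4 children, 3 mothers, 2 communities
toy = [
    {"kid": 1, "mom": 1, "cluster": 1},
    {"kid": 2, "mom": 1, "cluster": 1},
    {"kid": 3, "mom": 2, "cluster": 1},
    {"kid": 4, "mom": 3, "cluster": 2},
]
print(level_counts(toy))  # (4, 3, 2)
```

Applied to the full data set, the same function should return (2159, 1595, 161) if the file has been read correctly.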


The variables are:

kid = child identifier

mom = identifier for mothers

cluster = identifier for communities

immun = whether the child was immunized (1 yes, 0 otherwise)

kid2p = child aged 2-3 years

mom25p = mother aged 25+ years

order23 = birth order 2-3

order46 = birth order 4-6

order7p = birth order 7+

indnospa = indigenous, speaks no Spanish

indspa = indigenous, speaks Spanish

momedpri = mother's education primary

momedsec = mother's education secondary+

husedpri = husband's education primary

husedsec = husband's education secondary+

huseddk = husband's education missing

momwork = mother working (1 yes, 0 otherwise)

rural = identifier for a rural community (1 yes, 0 otherwise)

pcind81 = proportion indigenous in 1981






The first few lines of guatemala_immun.dat look like:



Suggested exercise:


1.    Estimate a logit model (without random effects) for the binary response immun, with a constant and the covariates kid2p, mom25p, order23, order46, order7p, indnospa, indspa, momedpri, momedsec, husedpri, husedsec, huseddk, momwork, rural and pcind81.

2.    Allow for the family random effect (mom), using mass 24. Is this random effect significant?

3.    Allow for both the level 2 family random effect (mom) and the level 3 community random effect (cluster), using mass 24 for both levels. Are both of these random effects significant? Is this model a significant improvement over the model estimated in part 2 of this exercise?

4.    How did your results on kid2p change when you allowed for mom-level (level 2) and then community-level (level 3) effects?

5.    Repeat the exercise using the cloglog link. Are there any inferential differences between the two sets of models? If so, what does this result tell you?
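Two ideas recur in the questions above: the significance of a random effect (parts 2 and 3) and the difference between the logit and cloglog links (part 5). The Python sketch below illustrates both under stated assumptions. The likelihood-ratio test assumes the usual boundary argument for a single variance component, in which the reference distribution is a 50:50 mixture of a point mass at zero and a chi-squared with 1 degree of freedom, so the chi-squared(1) tail probability is halved. The function names and the log-likelihood values are hypothetical and are not output from any fitted model.

```python
import math


def inv_logit(eta):
    """Inverse logit link: probability implied by linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Inverse complementary log-log link (asymmetric about eta = 0)."""
    return 1.0 - math.exp(-math.exp(eta))


def lr_test_variance(loglik_null, loglik_alt):
    """LR test of H0: variance = 0 against variance > 0 for ONE random effect.

    Assumes the 50:50 mixture of chi2(0) and chi2(1) as reference
    distribution, i.e. p = 0.5 * P(chi2_1 > stat).
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    if stat == 0.0:
        return stat, 1.0
    # For 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2))
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods, for illustration only
stat, p = lr_test_variance(loglik_null=-1400.0, loglik_alt=-1395.0)
print(f"LR statistic = {stat:.2f}, p-value = {p:.4f}")

# The two links agree closely for small probabilities and diverge for large ones
for eta in (-3.0, 0.0, 1.0):
    print(eta, round(inv_logit(eta), 4), round(inv_cloglog(eta), 4))
```

The same test can be used in part 3 to compare the three-level model with the two-level model, since the two differ by a single variance component. Log-likelihoods from the logit and cloglog models are not nested, so this test does not apply to the comparison in part 5.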