Exercise 3LC3. Binary response model: Guatemalan immunisation of children (1595 mothers in 161 communities)
This exercise uses the Rodríguez and Goldman (2001) data on Guatemalan families' decisions whether or not to immunize their children. The survey was conducted in 1987 to establish the effectiveness of the Guatemalan government's campaign to immunize children against major childhood diseases. The questionnaire contains information on the immunization status of living children born in the previous 5 years. Children who were more than 2 years old at the time of the interview were old enough to have been immunized during the 1986 campaign. The data set contains the binary response immun, which records whether child i in family j (level 2), within community k (level 3), was immunized (1 yes, 0 otherwise).
The same data were used by Rabe-Hesketh and Skrondal (2005, section 7.5).
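The three-level structure described above can be written as a random-intercept logit model. The following is a sketch only: the symbols (x, beta, u, v and the two variances) and the assumption of normally distributed random intercepts are not stated in the exercise and are introduced here for illustration.

```latex
% Assumed notation: x_{ijk} = covariate vector, beta = fixed effects,
% u_{jk} = family (mom) random intercept, v_k = community random intercept.
\[
\operatorname{logit}\Pr(\mathit{immun}_{ijk} = 1 \mid \mathbf{x}_{ijk}, u_{jk}, v_k)
  = \mathbf{x}_{ijk}'\boldsymbol{\beta} + u_{jk} + v_k,
\qquad
u_{jk} \sim N(0, \sigma_u^2), \quad v_k \sim N(0, \sigma_v^2).
\]
```

Setting both variances to zero gives the model without random effects, and setting only the community variance to zero gives the two-level (family) model.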
References
Rodríguez, G., and Goldman, N., (2001), Improved estimation procedures for multilevel models with binary response: a case study. Journal of the Royal Statistical Society, A 164, 339–355.
Rabe-Hesketh, S., and Skrondal, A., (2005), Multilevel and Longitudinal Modelling using Stata, Stata Press, Stata Corp, College Station, Texas.
Data description
Number of observations = 2159
Number of level-2 cases (‘mom’ = identifier for mothers) = 1595
Number of level-3 cases (‘cluster’ = identifier for communities) = 161
The variables are:
kid = child identifier
mom = identifier for mothers
cluster = identifier for communities
immun = whether the child was immunized (1 yes, 0 otherwise)
kid2p = child aged 2-3 years
mom25p = mother aged 25+ years
order23 = birth order 2-3
order46 = birth order 4-6
order7p = birth order 7+
indnospa = indigenous, speaks no Spanish
indspa = indigenous, speaks Spanish
momedpri = mother's education primary
momedsec = mother's education secondary+
husedpri = husband's education primary
husedsec = husband's education secondary+
huseddk = husband's education missing
momwork = mother working (1 yes, 0 otherwise)
rural = identifier for a rural community (1 yes, 0 otherwise)
pcind81 = proportion indigenous in 1981
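Several of the covariates above are indicator (dummy) variables for grouped categories. As a minimal illustration, the sketch below shows how the three birth-order indicators could be derived from a raw birth order. It assumes that first-born children form the reference category (all three indicators zero), which the variable list implies but does not state. The function name and the raw birth-order input are hypothetical and are not part of the data set.

```python
def birth_order_dummies(order):
    """Map a raw birth order (1, 2, 3, ...) to the three indicator variables.

    Assumption: birth order 1 is the reference category, so a first-born
    child has order23 = order46 = order7p = 0.
    """
    if order < 1:
        raise ValueError("birth order must be a positive integer")
    return {
        "order23": int(2 <= order <= 3),  # birth order 2-3
        "order46": int(4 <= order <= 6),  # birth order 4-6
        "order7p": int(order >= 7),       # birth order 7+
    }


# Example: a fifth-born child falls in the 4-6 group.
fifth_born = birth_order_dummies(5)
```

At most one of the three indicators equals 1 for any child, so their coefficients are contrasts with the reference group.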
The first few lines of guatemala_immun.dat look like:
Suggested exercise:
1. Estimate a logit model (without random effects) with a constant for the binary response immun with the covariates kid2p, mom25p, order23, order46, order7p, indnospa, indspa, momedpri, momedsec, husedpri, husedsec, huseddk, momwork, rural and pcind81.
2. Allow for the family random effect (mom), using mass 24 (that is, 24 quadrature mass points). Is this random effect significant?
3. Allow for both the level 2 family random effect (mom) and the level 3 community random effect (cluster), using mass 24 for both levels. Are both these random effects significant? Is this model a significant improvement over the model estimated in part 2 of this exercise?
4. How did your results on kid2p change when you allowed for mom-level (level 2) and then community-level (level 3) effects?
5. Repeat the exercise using the cloglog link. Are there any inferential differences between the two sets of models? If so, what does this result tell you?
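Questions 2 and 3 ask whether an added random effect is significant, and question 5 compares two link functions. The Python sketch below offers helpers for both, under stated assumptions. For the significance question it computes a likelihood-ratio statistic from two maximised log-likelihoods and, because a variance cannot be negative, also reports the commonly used halved p-value (a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom). This is one conventional approach, not necessarily the one intended by the exercise. The log-likelihood values in the example are made up for illustration and are not results from these data.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lr_test_variance(loglik_null, loglik_alt):
    """Likelihood-ratio test for one added random-effect variance.

    Returns (statistic, naive_p, boundary_p). boundary_p halves the naive
    p-value to allow for the null value (variance = 0) lying on the
    boundary of the parameter space.
    """
    stat = max(0.0, 2.0 * (loglik_alt - loglik_null))
    naive_p = chi2_sf_1df(stat)
    boundary_p = 0.5 * naive_p if stat > 0 else 1.0
    return stat, naive_p, boundary_p


def inv_logit(eta):
    """Inverse logit link: symmetric about eta = 0."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    """Inverse complementary log-log link: asymmetric about eta = 0."""
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical log-likelihoods (NOT results from the Guatemalan data).
stat, naive_p, boundary_p = lr_test_variance(-100.0, -98.0)

# The two links map the same linear predictor to different probabilities.
p_logit, p_cloglog = inv_logit(0.0), inv_cloglog(0.0)
```

Because the cloglog link is asymmetric, coefficients from the two sets of models are not on the same scale. It is safer to compare signs, significance and fitted probabilities than raw coefficient values.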