## Sabre

Models fitted by Sabre: Introduction

## Sabre manual
Sabre is a program for the statistical analysis of multi-process event/response sequences. These responses can take the form of binary, ordinal, count and linear recurrent events. The response sequences can also be combinations of different types (e.g. wages, a linear response, and trade union membership, a binary response). Such multi-process data are common in many research areas, e.g. in the analysis of work and life histories from the British Household Panel Survey or the German Socio-Economic Panel Study, where researchers often want to disentangle state dependence (the effect of previous responses or related outcomes) from any omitted effects that might be present in recurrent behaviour (e.g. unemployment).

Sabre can also be used to model collections of single sequences, such as may occur in medical trials, e.g. headaches and epileptic seizures (Crouchley and Davies, 1999, 2001), or clustered cross-sectional data, such as the educational attainment of children in schools. The class of models that can be estimated by Sabre may be called Multivariate Generalised Linear Mixed Models. These models have special features added to the standard multilevel models to help them disentangle state dependence from the incidental parameters (omitted or unobserved effects). The incidental parameters can be treated as random or fixed; the random effects models are estimated using either standard or adaptive Gaussian quadrature. ‘End effects’ can also be added to the models to accommodate ‘stayers’ or ‘non-susceptibles’. The fixed effects algorithm we have developed uses code for large sparse matrices from the Harwell Subroutine Library, see [1].

Sabre also includes the option to undertake all of the calculations using increased accuracy. This is important because numerical underflow and overflow often occur in the estimation process for models with incidental parameters. This feature does not seem to be available in other similar software [2, 3, 4, 5, 6].
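
The random effects models above are estimated by integrating the random effects out of the likelihood with Gaussian quadrature. As a rough illustration only, the Python sketch below applies ordinary (non-adaptive) Gauss-Hermite quadrature to a random-intercept logit model for one subject's binary sequence, assuming a normally distributed intercept u ~ N(0, sigma^2). It is not Sabre's code: the function names, the single covariate and the parameter values are hypothetical, and the quadrature nodes are found by simple root bracketing rather than by any particular library routine.

```python
import math


def hermite(n, x):
    """Return (H_n(x), H_{n-1}(x)) for the physicists' Hermite polynomials."""
    if n == 0:
        return 1.0, 0.0
    h_prev, h = 1.0, 2.0 * x  # H_0 and H_1
    for k in range(1, n):
        # Three-term recurrence: H_{k+1} = 2x H_k - 2k H_{k-1}
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h, h_prev


def gauss_hermite(n):
    """Nodes and weights for integrals of the form  ∫ f(x) exp(-x^2) dx  (1 <= n <= 40)."""
    radius = math.sqrt(2 * n + 1) + 1.0  # every root of H_n lies inside [-radius, radius]
    steps = 20 * n + 1                   # odd, so x = 0 is never a grid point
    step = 2.0 * radius / steps
    nodes = []
    a, fa = -radius, hermite(n, -radius)[0]
    for i in range(1, steps + 1):
        b = -radius + i * step
        fb = hermite(n, b)[0]
        if (fa < 0) != (fb < 0):  # sign change: bisect [a, b] to machine precision
            lo, hi, f_lo = a, b, fa
            for _ in range(100):
                mid = 0.5 * (lo + hi)
                f_mid = hermite(n, mid)[0]
                if (f_mid < 0) == (f_lo < 0):
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    if len(nodes) != n:
        raise RuntimeError("root bracketing failed")
    # Standard weight formula: w_i = 2^(n-1) n! sqrt(pi) / (n^2 H_{n-1}(x_i)^2)
    const = 2.0 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi) / n ** 2
    weights = [const / hermite(n, x)[1] ** 2 for x in nodes]
    return nodes, weights


def expect_normal(f, sigma, n=20):
    """E[f(u)] for u ~ N(0, sigma^2), via the substitution u = sqrt(2) * sigma * x."""
    nodes, weights = gauss_hermite(n)
    total = sum(w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(nodes, weights))
    return total / math.sqrt(math.pi)


def marginal_likelihood(y, x, beta0, beta1, sigma, n=20):
    """Likelihood of one subject's binary sequence y (covariate x) under a
    random-intercept logit model, with the intercept u integrated out."""
    def conditional(u):
        lik = 1.0
        for y_t, x_t in zip(y, x):
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x_t + u)))
            lik *= p if y_t == 1 else 1.0 - p
        return lik
    return expect_normal(conditional, sigma, n)


# Hypothetical example: one subject observed for four periods
print(marginal_likelihood([0, 1, 1, 1], [0.0, 1.0, 1.0, 2.0], -0.5, 0.8, sigma=1.2))
```

Adaptive quadrature, the other option named above, recentres and rescales the nodes around each subject's posterior mode, which typically needs fewer nodes for the same accuracy; this sketch does not do that. It also works in ordinary double precision, so for long sequences the product of probabilities inside `conditional` can underflow, which is the kind of problem the increased-accuracy option is intended to address.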

An important type of discrete data arises when modelling the duration to some pre-specified event, such as the duration of unemployment from the start of a spell until the start of work, the time between shopping trips, or the time to first marriage. This type of data has several important features. First, the durations or times to the events of interest are often not observed for all the sampled subjects or individuals. This often happens because the event of interest had not occurred by the end of the observation window; when this happens we say that the spell was right censored.

The second important feature of social science duration data is that the temporal scale of most social processes is so large (months/years) that it is inappropriate to assume that the explanatory variables remain constant, e.g. in an unemployment spell, the local labour market unemployment rate will vary (at the monthly level) as local and national economic conditions change. Other explanatory variables, like the subject's age, change automatically with time.

The third important feature of social science duration data occurs when the observation window cuts into an ongoing spell; this is called left censoring. We will assume throughout that left censoring is non-informative for event history models. The first three features of social science duration data can be accommodated using two- and three-level binary response models in Sabre.

The fourth important feature of duration data is that the spells can be of different types, e.g. the duration of a household in rented accommodation until it moves to another rented property could have different characteristics from the duration of a household in rented accommodation until it becomes an owner occupier. This type of data can be modelled as competing risks using multivariate generalised linear (binary response) models in Sabre.

In social science duration data we typically observe a spell over a sequence of intervals, e.g. weeks or months, so Sabre concentrates on discrete-time methods. We are not reducing our modelling options by doing this, as durations measured at finer intervals of time, such as days, hours or even seconds, can also be written out as a sequence of intervals. We can also group the data by using larger intervals (such as weeks or months) than those at which the durations are measured.

Event history data occur when we observe repeated duration events. If these events are of the same type, we have renewal data, which can be modelled using univariate generalised linear (binary response) models in Sabre.
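
Writing a spell out as a sequence of intervals is what turns duration data into binary response data. The sketch below is a minimal illustration, assuming each spell is summarised by its length in intervals, a right-censoring flag and (for competing risks) an exit destination; the class and function names are hypothetical and this is not Sabre's input format. A right-censored spell contributes only zeros, and each row is where interval-specific (time-varying) covariates such as the local unemployment rate would be attached.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Spell:
    person: int
    duration: int                    # number of intervals observed, e.g. months
    event: bool                      # False means the spell was right censored
    exit_type: Optional[str] = None  # destination, used only for competing risks


@dataclass
class PersonPeriod:
    person: int
    interval: int                    # 1, 2, ..., duration
    y: int                           # 1 only in the interval in which the spell ended
    exit_type: Optional[str]


def expand(spells: List[Spell]) -> List[PersonPeriod]:
    """Write each spell out as one binary record per interval at risk."""
    rows = []
    for s in spells:
        for t in range(1, s.duration + 1):
            ended = s.event and t == s.duration
            rows.append(PersonPeriod(s.person, t, int(ended),
                                     s.exit_type if ended else None))
    return rows


def destination_response(rows: List[PersonPeriod], destination: str) -> List[int]:
    """Competing risks: a separate binary response for exits to one destination."""
    return [int(r.y == 1 and r.exit_type == destination) for r in rows]


# Hypothetical example: one completed spell (exit to owner occupation) and one censored spell
spells = [Spell(1, 3, True, "own"), Spell(2, 2, False)]
for row in expand(spells):
    print(row)
```

For competing risks the same person-period records are shared, and each destination gets its own binary response; these responses are what a multivariate binary response model would then analyse jointly.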

There are empirical situations in which a subset of the population behaves differently from those that follow the proposed generalised linear model. For instance, in a migration study we could observe a large group who do not move or migrate from a region over the study period. These observed non-migrators could be made up of two distinct groups: those who consider migrating but are not observed to do so during the observation period, and those who would never consider migrating (stayers). This phenomenon can occur in various contexts, e.g. the zero-inflated Poisson model (Bohning et al., 1999), the mover-stayer model (Goodman, 1961) and the competing risk context, where the data could come from a population that consists of some subjects who are susceptible and others who are non-susceptible to the events of interest.

It has also often been noted that the goodness-of-fit of mixture models like generalised linear mixed models can be improved by adding a spike to the parametric distribution for the random effects to explicitly represent stayers, giving ‘spiked distributions’ (Singer and Spilerman, 1976). Nonparametric representations of the random effects distribution (Heckman and Singer, 1984; Davies and Crouchley, 1986) have the flexibility to accommodate stayers. However, nonparametric random effects distributions can require a lot of parameters (locations and masses) to be estimated. Spiked distributions are available for binary response and Poisson models in Sabre.
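
The two-group story above can be written down directly. The sketch below assumes the simplest possible forms: a zero-inflated Poisson with a constant mean, and a mover-stayer model in which movers move with the same probability in every period, independently and with no random effect. Sabre's spiked distributions instead place the spike on the random effects distribution of a mixed model, which this sketch does not attempt; the function names and numbers are hypothetical.

```python
import math


def zip_pmf(k, lam, pi_stayer):
    """Zero-inflated Poisson: a 'spike' of probability pi_stayer at zero, mixed
    with a Poisson(lam) count for everyone else (requires lam > 0)."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return (pi_stayer if k == 0 else 0.0) + (1.0 - pi_stayer) * poisson


def prob_never_moves(p_stayer, p_move, periods):
    """Mover-stayer: stayers never move; movers move with probability p_move in
    each period (independently, for simplicity). Returns P(no move observed)."""
    return p_stayer + (1.0 - p_stayer) * (1.0 - p_move) ** periods


def prob_stayer_given_no_move(p_stayer, p_move, periods):
    """Bayes' rule: the share of observed non-movers who are genuine stayers."""
    return p_stayer / prob_never_moves(p_stayer, p_move, periods)


# Hypothetical values: 20% stayers, movers move with probability 0.1 per period
print(zip_pmf(0, 2.0, 0.3))
print(prob_never_moves(0.2, 0.1, 5), prob_stayer_given_no_move(0.2, 0.1, 5))
```

With these values about two-thirds of subjects are never seen to move, yet fewer than a third of those non-movers are stayers, which is why the observed zeros alone cannot separate the two groups without a model.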

Longitudinal and panel data on recurrent events are substantively important in social science research for two reasons. First, they provide some scope for extending control for variables that have been omitted from the analysis. For example, differencing provides a simple way of removing time-constant effects (omitted and observed) from the analysis. Second, a distinctive feature of social science theory is that it postulates that behaviour and outcomes are typically influenced by previous behaviour and outcomes, that is, there is positive ‘feedback’ (e.g. the McGinnis (1968) ‘axiom of cumulative inertia’).

A frequently noted empirical regularity in the analysis of unemployment data is that those who were unemployed (or working) in the past are more likely to be unemployed (or working) in the future (Heckman, 2001, p. 706). Is this due to a causal effect of being unemployed (or working), or is it a manifestation of a stable trait (a random effect)?

These two issues are related because inference about feedback effects is particularly prone to bias if the additional variation due to omitted variables (random effects) is ignored. With dependence upon the previous outcome, the explanatory variables representing the previous outcome will, for structural reasons, normally be correlated with omitted explanatory variables, and their estimated effects will therefore be biased under conventional modelling methods.
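
A small simulation shows how this bias arises. The sketch assumes a random-intercept logit model with no true state dependence at all: each person has the same probability of a 1 in every period. A naive pooled analysis that conditions on the previous response but ignores the random effect still finds that a previous 1 predicts a current 1. The function names, sample sizes and sigma are hypothetical.

```python
import math
import random


def simulate_heterogeneity_only(n_people, n_periods, sigma, seed=1):
    """Binary sequences with a random intercept u ~ N(0, sigma^2) and NO true
    state dependence: each person's probability is the same in every period."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        u = rng.gauss(0.0, sigma)
        p = 1.0 / (1.0 + math.exp(-u))
        data.append([int(rng.random() < p) for _ in range(n_periods)])
    return data


def naive_transition_rates(data):
    """Pooled P(y_t=1 | y_{t-1}=1) and P(y_t=1 | y_{t-1}=0), ignoring u entirely."""
    ones = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for seq in data:
        for prev, cur in zip(seq, seq[1:]):
            ones[prev] += cur
            totals[prev] += 1
    return ones[1] / totals[1], ones[0] / totals[0]


after_1, after_0 = naive_transition_rates(simulate_heterogeneity_only(3000, 6, sigma=2.0))
print(f"P(1 | previous 1) = {after_1:.2f}, P(1 | previous 0) = {after_0:.2f}")
```

The gap between the two rates is produced entirely by the omitted random effect: people with a large u tend to have a 1 both last period and this period. Re-running with `sigma=0.0` makes the gap vanish up to sampling noise.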

Understanding of this generic substantive issue dates back to the study of accident proneness by Bates and Neyman (1952) and has been discussed in many applied areas, including consumer behaviour (Massy et al., 1970) and voting behaviour (Davies and Crouchley, 1985).

An important attraction of longitudinal data is that, in principle, they make it possible to distinguish a key type of causality, namely state dependence, i.e. the dependence of current behaviour on earlier or related outcomes, from the confounding effects of unobserved heterogeneity (random effects) or omitted variables, and from non-stationarity, i.e. changes in the scale and relative importance of the systematic relationships over time. Large sample sizes reduce the problems created by local maxima when disentangling heterogeneity, state dependence and non-stationarity effects.

Most observational schemes for collecting panel and other longitudinal data commence with the process already under way. They will therefore tend to have an informative start: the initial observed response typically depends upon pre-sample outcomes and unobserved variables. In contrast to time series analysis, where the influence of the initial observation fades as the series lengthens, panels are usually short, and, as explained by Anderson and Hsiao (1981), Heckman (1981a, b), Bhargava and Sargan (1983) and others, failure to allow for this informative start when state dependence and random effects are present will lead to inconsistent parameter estimates.

Various treatments of the initial conditions problem for recurrent events with state dependence and random effects have been proposed (Crouchley and Davies, 2001; Wooldridge, 2005; Alfò and Aitkin, 2006; Kazemi and Crouchley, 2006; Stewart, 2007). Sabre has special models that allow for these treatments of initial conditions in 1

The main objective of the random effects/multilevel modelling approach considered so far is the estimation of the regression parameters in the presence of the random effects or incidental parameters. This has been done by assuming that the incidental parameters are Gaussian distributed and by computing the expected behaviour of individuals randomly sampled from this distribution (in other words, by integrating the random effects out of the model). This approach will provide consistent estimates of the regression parameters as long as, in the true model, the random effects are independent of the covariates.

An alternative approach is to estimate the model parameters by the usual maximum likelihood procedures using dummy variables for the incidental parameters. Hsiao (1986, section 3.2) shows that using dummy variables for the incidental parameters in a linear model with time-varying covariates gives the same estimates as those of the time-demeaned model. Sabre has a fixed effects estimator for the linear model.
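
For the linear model, the equivalence described by Hsiao means the fixed effects slope can be computed without any dummy variables, by demeaning within each subject. The sketch below assumes a single covariate and a small balanced panel; the data and function names are hypothetical, and this is not Sabre's sparse-matrix algorithm. The incidental parameters are deliberately correlated with x, which is exactly the situation in which the random effects assumption of independence fails.

```python
from statistics import mean


def within_slope(panel):
    """Fixed effects (time-demeaned) slope for y_it = alpha_i + beta * x_it + e_it.
    panel maps each subject id to a list of (x, y) observations."""
    sxy = sxx = 0.0
    for obs in panel.values():
        x_bar = mean(x for x, _ in obs)
        y_bar = mean(y for _, y in obs)
        for x, y in obs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    return sxy / sxx


def pooled_slope(panel):
    """Ordinary least squares slope that ignores the incidental parameters alpha_i."""
    points = [p for obs in panel.values() for p in obs]
    x_bar = mean(x for x, _ in points)
    y_bar = mean(y for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical data: alpha_i = 10 * i rises with x, and the true beta is 2.
panel = {i: [(i + t, 10 * i + 2 * (i + t)) for t in range(3)] for i in range(4)}
print(within_slope(panel), pooled_slope(panel))
```

Demeaning removes every alpha_i, so the within slope recovers beta = 2 exactly here, while the pooled slope absorbs the correlation between alpha_i and x and comes out far too large.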

Alfò, M., & Aitkin, M. (2006). Variance component models for longitudinal count data with baseline information: epilepsy data revisited.

Anderson, T.W., & Hsiao, C. (1981). Estimation of dynamic models with error components.

Bates, G.E., & Neyman, J. (1952). Contributions to the theory of accident proneness. I: An optimistic model of the correlation between light and severe accidents. II: True or false contagion.

Bhargava, A., & Sargan, J.D. (1983). Estimating dynamic random effects models from panel data covering short time periods.

Bohning, D., Dietz, E., Schlattmann, P., Mendonca, L., & Kirchner, U. (1999). The zero-inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology.

Crouchley, R., & Davies, R.B. (1999). A comparison of population average and random effect models for the analysis of longitudinal count data with base-line information.

Crouchley, R., & Davies, R.B. (2001). A comparison of GEE and random effects models for distinguishing heterogeneity, nonstationarity and state dependence in a collection of short binary event series.

Davies, R.B., & Crouchley, R. (1985). The determinants of party loyalty: a disaggregate analysis of panel data from the 1974 and 1979 General Elections in England.

Davies, R., & Crouchley, R. (1986). The mover-stayer model: requiescat in pace.

Goodman, L.A. (1961). Statistical methods for the mover-stayer model.

Heckman, J.J. (1981a). Statistical models for discrete panel data. In Manski, C.F., & McFadden, D. (eds).

Heckman, J.J. (1981b). The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process. In Manski, C.F., & McFadden, D. (eds).

Heckman, J.J. (2001). Micro data, heterogeneity and the evaluation of public policy: Nobel lecture.

Heckman, J.J., & Singer, B. (1984). A method for minimizing the impact of distributional assumptions in econometric models of duration data.

Hsiao, C. (1986).

Kazemi, I., & Crouchley, R. (2006). Modelling the initial conditions in dynamic regression models of panel data with random effects. Ch. 4, in Baltagi, B.H.

Massy, W.F., Montgomery, D.B., & Morrison, D.G. (1970).

McGinnis, R. (1968). A stochastic model of social mobility.

Singer, B., & Spilerman, S. (1976). Some methodological issues in the analysis of longitudinal surveys.

Stewart, M.B. (2007). The interrelated dynamics of unemployment and low-wage employment.

Wooldridge, J.M. (2005). Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity.

[1] http://www.cse.scitech.ac.uk/nag/hsl/

[2] http://cran.r-project.org/web/packages/lme4/index.html

[3] http://cran.r-project.org/web/packages/npmlreg/index.html
