Models fitted by Sabre: Introduction

Sabre is a program for the statistical analysis of multi-process event/response sequences. These responses can take the form of binary, ordinal, count and linear recurrent events. The response sequences can also be combinations of different types (e.g. a linear response such as wages together with a binary response such as trade union membership). Such multi-process data are common in many research areas, e.g. in the analysis of work and life histories from the British Household Panel Survey or the German Socio-Economic Panel Study, where researchers often want to disentangle state dependence (the effect of previous responses or related outcomes) from any omitted effects that might be present in recurrent behaviour (e.g. unemployment).

Sabre can also be used to model collections of single sequences such as may occur in medical trials, e.g. headaches and epileptic seizures (Crouchley and Davies, 1999, 2001), or clustered cross sectional data such as the educational attainment of children in schools. The class of models that can be estimated by Sabre may be called Multivariate Generalised Linear Mixed Models. These models have special features added to the standard multilevel models to help them disentangle state dependence from the incidental parameters (omitted or unobserved effects). The incidental parameters can be treated as random or fixed; the random effects models are estimated using standard Gaussian quadrature or adaptive Gaussian quadrature. ‘End effects’ can also be added to the models to accommodate ‘stayers’ or ‘non-susceptibles’. The fixed effects algorithm we have developed uses code for large sparse matrices from the Harwell Subroutine Library, see [1].

Sabre also includes the option to undertake all of the calculations using increased accuracy. This is important because numerical underflow and overflow often occur in the estimation process for models with incidental parameters. This feature does not seem to be available in other similar software [2, 3, 4, 5, 6].
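As a rough illustration of why this matters (a sketch in Python, not Sabre's own arithmetic), the likelihood contribution of a long sequence is a product of many probabilities and can underflow to zero in ordinary double precision, while the same quantity is easily represented on the log scale. The sequence length, the probabilities and the helper log_sum_exp are assumptions made for the example.

```python
import math

def log_sum_exp(log_terms):
    """Return log(sum(exp(t))) for the given log-scale terms, staying in log space."""
    peak = max(log_terms)
    if peak == -math.inf:
        return -math.inf
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))

# Hypothetical sequence: 2000 responses, each with probability 0.4.
n_obs = 2000
naive_product = 0.4 ** n_obs           # underflows to 0.0 in double precision
log_product = n_obs * math.log(0.4)    # finite on the log scale

# Mixing two such products with equal weights (as a two-point mixture would)
# is also done on the log scale, so neither term is ever exponentiated alone.
log_terms = [
    math.log(0.5) + n_obs * math.log(0.4),
    math.log(0.5) + n_obs * math.log(0.3),
]
log_marginal = log_sum_exp(log_terms)
```

Working on the log scale is only one common remedy; the increased-accuracy option described above addresses the same problem by a different route.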

Event History Models

An important type of discrete data occurs with the modelling of the duration to some pre-specified event, such as the duration in unemployment from the start of a spell of unemployment until the start of work, the time between shopping trips, or the time to first marriage. This type of discrete data has several important features. First, the durations or times to the events of interest are often not observed for all the sampled subjects or individuals. This often happens because the event of interest had not happened by the end of the observation window; when this happens we say that the spell was right censored.

The second important feature of social science duration data is that the temporal scale of most social processes is so large (months/years) that it is inappropriate to assume that the explanatory variables remain constant, e.g. in an unemployment spell, the local labour market unemployment rate will vary (at the monthly level) as the local and national economic conditions change. Other explanatory variables like the subject's age change automatically with time.

The third important feature of social science duration data occurs when the observation window cuts into an ongoing spell; this is called left censoring. We will assume throughout that left censoring is non-informative for event history models. The first three features of social science duration data can be accommodated using two- and three-level binary response models in Sabre.

The fourth important feature of duration data is that the spells can be of different types, e.g. the duration of a household in rented accommodation until they move to another rented property could have different characteristics to the duration of a household in rented accommodation until they become owner occupiers. This type of data can be modelled as competing risks using multivariate generalised linear (binary response) models in Sabre.

In social science duration data we typically observe a spell over a sequence of intervals, e.g. weeks or months, so Sabre concentrates on the discrete-time methods. We are not reducing our modelling options by doing this, as durations measured at finer intervals of time such as days, hours, or even seconds can also be written out as a sequence of intervals. We can also group the data by using larger intervals (such as weeks or months) than those at which the durations are measured.
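The idea of writing a duration out as a sequence of intervals can be sketched as follows. This is a minimal Python illustration, not Sabre input syntax; the function names expand_spell and group_duration and the record layout are assumptions made for the example. Each interval contributes one binary response: 0 while the spell continues, and 1 in the final interval only if the event was observed, so a right censored spell is a run of zeros.

```python
def expand_spell(subject_id, duration, event_observed):
    """Expand one spell into (subject, interval, response) records.

    duration is the number of intervals the spell was observed for;
    event_observed is False when the spell is right censored.
    """
    if duration < 1:
        raise ValueError("duration must cover at least one interval")
    records = []
    for interval in range(1, duration + 1):
        ended_here = event_observed and interval == duration
        records.append((subject_id, interval, 1 if ended_here else 0))
    return records

def group_duration(duration, width):
    """Re-express a duration in coarser intervals, e.g. days into weeks."""
    return (duration + width - 1) // width  # integer ceiling division

# A spell of 15 days ending in the event, regrouped into weeks.
weekly_records = expand_spell("s1", group_duration(15, 7), True)
# A spell still in progress after 2 months (right censored).
censored_records = expand_spell("s2", 2, False)
```

Time-varying explanatory variables would then be attached to each interval record, which is what makes the binary response formulation convenient.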

Event history data occur when we observe repeated duration events. If these events are of the same type, we have renewal data which can be modelled using univariate generalised linear (binary response) models in Sabre.

Stayers, Nonsusceptibles and Endpoints

There are empirical situations in which a subset of the population behaves differently from those that follow the proposed generalised linear model. For instance, in a migration study we could observe a large group who do not move or migrate from a region over the study period. These observed non-migrators could be made up of two distinct groups: those that consider migrating but are not observed to do so during the observation period, and those that would never consider migrating (stayers). This phenomenon can occur in various contexts, e.g. the zero-inflated Poisson model (Bohning et al., 1999), the mover-stayer model (Goodman, 1961), and the competing risk context, where the data could come from a population that consists of some subjects that are susceptible and others who are nonsusceptible to the events of interest.

It has also often been noted that the goodness-of-fit of mixture models like generalised linear mixed models can be improved by adding a spike to the parametric distribution for the random effects to explicitly represent stayers, giving 'spiked distributions' (Singer and Spilerman, 1976). Non-parametric representations of the random effects distribution (Heckman and Singer, 1984; Davies and Crouchley, 1986) have the flexibility to accommodate stayers. However, non-parametric random effects distributions can require a lot of parameters (locations and masses) to be estimated. Spiked distributions are available for binary response and Poisson models in Sabre.
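The simplest of the cases mentioned above, the zero-inflated Poisson model, shows how a spike for stayers enters the probability of a response. The Python sketch below omits the random effects entirely and is not Sabre's parameterisation; the function name zip_pmf and the argument names are assumptions made for the example. A proportion stayer_prob of the population always gives a count of zero, and the remainder follow an ordinary Poisson distribution.

```python
import math

def zip_pmf(count, rate, stayer_prob):
    """Probability of a count under a zero-inflated Poisson model.

    stayer_prob is the mass of the spike at zero (the stayers);
    rate is the Poisson mean for everyone else.
    """
    poisson = math.exp(-rate) * rate ** count / math.factorial(count)
    spike = stayer_prob if count == 0 else 0.0
    return spike + (1.0 - stayer_prob) * poisson

# Hypothetical values: 30% stayers, Poisson mean 2 among the rest.
prob_zero = zip_pmf(0, 2.0, 0.3)
prob_total = sum(zip_pmf(k, 2.0, 0.3) for k in range(60))
```

An observed zero is therefore a mixture of stayers and of susceptible subjects who happened to record no events, which is the distinction drawn above for the non-migrators.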

State Dependence

Longitudinal and panel data on recurrent events are substantively important in social science research for two reasons. First, they provide some scope for extending control for variables that have been omitted from the analysis. For example, differencing provides a simple way of removing time constant effects (omitted and observed) from the analysis. Second, a distinctive feature of social science theory is that it postulates that behaviour and outcomes are typically influenced by previous behaviour and outcomes, that is, there is positive 'feedback' (e.g. the McGinnis (1968) 'axiom of cumulative inertia'). A frequently noted empirical regularity in the analysis of unemployment data is that those who were unemployed in the past (or have worked in the past) are more likely to be unemployed (or working) in the future (Heckman, 2001, p. 706). Is this due to a causal effect of being unemployed (or working) or is it a manifestation of a stable trait (random effect)?

These two issues are related because inference about feedback effects is particularly prone to bias if the additional variation due to omitted variables (random effects) is ignored. With dependence upon previous outcomes, the explanatory variables representing the previous outcome will, for structural reasons, normally be correlated with omitted explanatory variables, and their estimated effects will therefore be subject to bias using conventional modelling methods. Understanding of this generic substantive issue dates back to the study of accident proneness by Bates and Neyman (1952) and has been discussed in many applied areas, including consumer behaviour (Massy et al., 1970) and voting behaviour (Davies and Crouchley, 1985).
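A small simulation makes the point concrete. In the Python sketch below (an illustration under assumed values, not an analysis from the cited studies) each subject has a fixed random intercept and there is no state dependence at all: the probability of a response never depends on the previous response. The function name simulate_transitions, the sample sizes, the random effect standard deviation and the seed are all assumptions made for the example.

```python
import math
import random

def simulate_transitions(n_subjects=2000, n_periods=6, sigma=2.0, seed=1):
    """Simulate binary sequences with a random intercept and no state dependence.

    Returns the observed proportions of y_t = 1 following y_{t-1} = 1
    and following y_{t-1} = 0, pooled over subjects.
    """
    rng = random.Random(seed)
    counts = {0: [0, 0], 1: [0, 0]}  # previous response -> [transitions, ones]
    for _ in range(n_subjects):
        u = rng.gauss(0.0, sigma)        # subject-specific random effect
        p = 1.0 / (1.0 + math.exp(-u))   # same probability in every period
        previous = 1 if rng.random() < p else 0
        for _ in range(n_periods - 1):
            current = 1 if rng.random() < p else 0
            counts[previous][0] += 1
            counts[previous][1] += current
            previous = current
    after_one = counts[1][1] / counts[1][0]
    after_zero = counts[0][1] / counts[0][0]
    return after_one, after_zero

after_one, after_zero = simulate_transitions()
```

Pooled over subjects, a response of 1 is noticeably more common after a 1 than after a 0, even though no subject's behaviour depends on the past. A model that includes the lagged response but ignores the random effect would attribute this difference to feedback.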

An important attraction of longitudinal data is that, in principle, they make it possible to distinguish a key type of causality, namely state dependence, i.e. the dependence of current behaviour on earlier or related outcomes, from the confounding effects of unobserved heterogeneity (random effects, or omitted variables) and non-stationarity, i.e. changes in the scale and relative importance of the systematic relationships over time. Large sample sizes reduce the problems created by local maxima in disentangling the heterogeneity, state dependence and non-stationarity effects.

Most observational schemes for collecting panel and other longitudinal data commence with the process already under way. They will therefore tend to have an informative start; the initial observed response is typically dependent upon pre-sample outcomes and unobserved variables. In contrast to time series analysis and, as explained by Anderson and Hsiao (1981), Heckman (1981a,b), Bhargava and Sargan (1983) and others, failure to allow for this informative start when state dependence and random effects are present will prejudice consistent parameter estimation. Various treatments of the initial conditions problem for recurrent events with state dependence and random effects have been proposed (Crouchley and Davies, 2001; Wooldridge, 2005; Alfo and Aitkin, 2006; Kazemi and Crouchley, 2006; Stewart, 2007). Sabre has special models that allow for these treatments of the initial conditions in first-order state dependence in generalised linear models.

Linear Model with Fixed Effects

The main objective of the random effects/multilevel modelling approach so far considered is the estimation of the regression parameters in the presence of the random effects or incidental parameters. This has been done by assuming that the incidental parameters are Gaussian distributed and by computing the expected behaviour of individuals j randomly sampled from this distribution (in other words by integrating the random effects out of the model). This approach will provide consistent estimates of the regression parameters so long as in the true model, the random effects are independent of the covariates.
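Integrating the random effects out of the model can be sketched for a binary response with a Gaussian random intercept. The Python example below uses a three-point Gauss-Hermite rule purely for brevity (practical work would use more points, and this is not Sabre's code); the logit link, the function names and the argument names are assumptions made for the example. For a standard normal variable the three-point rule has nodes 0 and plus or minus the square root of 3, with probability weights 2/3, 1/6 and 1/6.

```python
import math

# Three-point Gauss-Hermite rule rescaled to a standard normal variable.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_likelihood(responses, linear_predictors, sigma):
    """Likelihood of one subject's binary sequence with the random
    intercept (mean 0, standard deviation sigma) integrated out."""
    total = 0.0
    for node, weight in zip(NODES, WEIGHTS):
        u = sigma * node  # value of the random effect at this node
        conditional = 1.0
        for y, eta in zip(responses, linear_predictors):
            p = expit(eta + u)
            conditional *= p if y == 1 else 1.0 - p
        total += weight * conditional
    return total

# Hypothetical subject: responses 1, 0, 1 with linear predictors 0.2, -0.1, 0.4.
lik = marginal_likelihood([1, 0, 1], [0.2, -0.1, 0.4], sigma=1.5)
```

The log of this quantity, summed over subjects, is what would be maximised with respect to the regression parameters and sigma. Adaptive quadrature differs in that the nodes are recentred and rescaled for each subject.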

An alternative approach is to estimate the model parameters by the usual maximum likelihood procedures using dummy variables for the incidental parameters. Hsiao (1986, section 3.2) shows that by using dummy variables for the incidental parameters in a linear model with time varying covariates we get the same estimates as those of the time demeaned model. Sabre has a fixed effects estimator for the linear model.
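The equivalence noted by Hsiao can be checked numerically for a single covariate. The Python sketch below is an illustration only, not Sabre's fixed effects algorithm; the function name within_estimator and the small panel are made up for the example. It computes the slope from the time-demeaned data, recovers each subject's intercept, and the accompanying check confirms that this solution satisfies the normal equations of the dummy-variable regression: residuals sum to zero within each subject and are orthogonal to the covariate.

```python
def within_estimator(panel):
    """panel maps subject id -> list of (x, y) pairs.

    Returns the time-demeaned (within) slope and the implied
    subject-specific intercepts.
    """
    sxy = 0.0
    sxx = 0.0
    means = {}
    for subject, rows in panel.items():
        x_bar = sum(x for x, _ in rows) / len(rows)
        y_bar = sum(y for _, y in rows) / len(rows)
        means[subject] = (x_bar, y_bar)
        for x, y in rows:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    slope = sxy / sxx
    intercepts = {s: y_bar - slope * x_bar for s, (x_bar, y_bar) in means.items()}
    return slope, intercepts

# Hypothetical panel: two subjects, three occasions each.
panel = {
    "a": [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2)],
    "b": [(2.0, 10.0), (4.0, 14.1), (5.0, 15.8)],
}
slope, intercepts = within_estimator(panel)

# Residuals from the model y = intercept_i + slope * x.
residuals = {
    s: [y - intercepts[s] - slope * x for x, y in rows]
    for s, rows in panel.items()
}
```

With many subjects the dummy-variable design matrix is large and sparse, which is why the demeaned form, or sparse matrix code, is the practical route.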

References

Alfò M., & Aitkin, M., (2006), Variance component models for longitudinal count data with baseline information: epilepsy data revisited. Statistics and Computing, Volume 16, 231-238

Anderson, T.W., & Hsiao, C., (1981), Estimation of dynamic models with error components, JASA, 76, 598-606.

Bhargava, A. & Sargan, J.D., (1983), Estimating dynamic random effects models from panel data covering short time periods, Econometrica, 51, 1635-1657.

Bates, G.E., and Neyman, J., (1952), Contributions to the theory of accident proneness, I, An optimistic model of the correlation between light and severe accidents, II, True or false contagion, Univ Calif, Pub Stat, 26, 705-720.

Bohning, D., Dietz, E., Schlattmann, P., Mendonca, L., and Kirchner, U., (1999), The Zero-Inflated Poisson Model and the Decayed, Missing and Filled Teeth Index in Dental Epidemiology, Journal of the Royal Statistical Society, Series A (Statistics in Society), 162, 195-209

Crouchley, R. and Davies, R.B., (1999), A comparison of population average and random effect models for the analysis of longitudinal count data with base-line information, Journal of the Royal Statistical Society, Series A, 162, 331-347

Crouchley, R. and Davies, R.B., (2001), A comparison of GEE and random effects models for distinguishing heterogeneity, nonstationarity and state dependence in a collection of short binary event series, Statistical Modelling, 1, 271-285

Davies, R.B. and Crouchley, R., (1985), The determinants of party loyalty: a disaggregate analysis of panel data from the 1974 and 1979 General Elections in England, Political Geography Quarterly, 4, 307-320.

Davies, R., and Crouchley, R., (1986), The Mover-Stayer Model Requiescat in Pace, Sociological Methods and Research, 14, 356-380

Goodman, L.A., (1961), Statistical methods for the mover stayer model, Journal of the American Statistical Association, 56, 841-868.

Heckman J.J., (1981a), Statistical models for discrete panel data, In Manski, C.F. & McFadden, D, (eds), Structural Analysis of Discrete Data with Econometric Applications, MIT press, Cambridge, Mass.

Heckman J.J., (1981b), The incidental parameters problem and the problem of initial conditions in estimating a discrete time-discrete data stochastic process, In Manski, C.F. & McFadden, D, (eds), Structural Analysis of Discrete Data with Econometric Applications, MIT press, Cambridge, Mass.

Heckman J.J., (2001), Micro data, heterogeneity and the evaluation of public policy: Nobel lecture, Journal of Political Economy, 109, 673-748.

Heckman, J.J., and Singer, B., (1984), A method for minimizing the impact of distributional assumptions in econometric models of duration data, Econometrica, 52, 271-320.

Hsiao, C., (1986), Analysis of Panel Data, Cambridge University Press, Cambridge.

Kazemi, I., & Crouchley, R., (2006), Modelling the initial conditions in dynamic regression models of panel data with random effects, Ch 4, in Baltagi, B.H. (ed), Panel Data Econometrics: Theoretical Contributions and Empirical Applications, Elsevier, Amsterdam, Netherlands.

Massy, W.F., Montgomery, D.B., and Morrison, D.G., (1970), Stochastic models of buying behaviour, MIT Press, Cambridge, Mass.

McGinnis, R., (1968), A stochastic model of social mobility, American Sociological Review, 33, 712-722.

Singer, B., and Spilerman, S., (1976), Some methodological issues in the analysis of longitudinal surveys, Annals of Economic and Social Measurement, 5, 447-474.

Stewart, M.B., (2007), The interrelated dynamics of unemployment and low-wage employment, Journal of Applied Econometrics, Volume 22, 511-531

Wooldridge, J.M., (2005), Simple solutions to the initial conditions problem in dynamic, nonlinear panel data models with unobserved heterogeneity, Journal of Applied Econometrics, 20, 39-54.

URL Links

[1] http://www.cse.scitech.ac.uk/nag/hsl/

[2] http://cran.r-project.org/web/packages/lme4/index.html

[3] http://cran.r-project.org/web/packages/npmlreg/index.html

[4] http://www.stata.com/

[5] http://www.gllamm.org/

[6] http://www.sas.com/
