The EPICURE Regression Programs

The EPICURE regression modules are tools for working with general risk models arising in analyses of epidemiologic or experimental studies. In this section we describe the types of data, basic assumptions, and, in general terms, the types of models available in each of the regression modules.

All of the EPICURE modules have extensive capabilities for data entry and transformation, variable creation, and subset selection. In the discussions that follow we sometimes refer to a variable that is coded in a specific way, for example, 1 for cases and 0 for non-cases. However, because variables can be transformed or recoded from within any of the programs at any time, it is not necessary that variables be coded in any particular way in the input data file.

Summary statistics including sums, means, percentiles, and frequency tables for categorical variables are easily computed in any of the programs. These summary statistics may be computed within strata defined by one or more categorical variables. Simple scatter plots or histograms can be made in any of the programs. In addition, data and detailed information on the nature of the fitted models and parameter estimates can be written to files for use by other programs, such as high-resolution graphics software.

Each EPICURE regression module involves a statistical model for a quantity $\psi$ that takes on the values $\psi_i$ for observations $i = 1, \ldots, n$. The programs are used to describe the variation in $\psi$ as a function of a covariate vector $z$, model parameters $\theta$, and possibly stratum parameters denoted $\tau_s$. The meaning of $\psi$ depends on the statistical model, but in many cases $\psi$ can be interpreted, in some sense, as a measure of risk. The following list describes $\psi$ for each of the programs:

GMBO/PECAN - probabilities, odds, or functions of the odds for binomial data and odds ratios for case-control data

PEANUTS - the relative risk or hazard ratio modifying a nonparametric underlying hazard function for censored survival data

AMFIT - the Poisson mean or a piecewise constant hazard function for rates (grouped survival data)

The most commonly used risk regression models take the form

$$\psi(z, \theta) = \exp(z\theta) = \exp(\theta_1 z_1 + \cdots + \theta_p z_p)$$

Such models are easily fit in the EPICURE programs, but these programs also make available several more general classes of models. These generalizations include various extensions of the standard model, such as

highly stratified models without the explicit use of stratum indicator variables

models in which the components of the risk function are modeled as a product of linear and loglinear subterms

models that include additive and multiplicative joint effects of different risk factors

The first of these extensions makes it possible to fit models with large numbers of stratum parameters because these parameters are estimated using special algorithms that do not require the inversion of large matrices.

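A specific example of the type of models allowed by the second extension is

$$T = (\beta_1 z_1 + \beta_2 z_2)\exp(\gamma_1 z_3 + \gamma_2 z_4),$$

in which the first factor is a linear subterm and the second factor is a loglinear subterm.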

The third extension refers to the fact that EPICURE can be used to fit general relative risk models

$$\psi(z, \theta, \tau) = \tau_s R(z, \theta),$$

in which the $\tau_s$ are parameters or, in the case of PEANUTS, functions associated with user-specified strata, and $R(z, \theta)$ takes on various forms including

$$R(z, \theta) = T_0 (1 + T_1 + T_2 + \cdots) \qquad \text{additive excess relative risk}$$

$$R(z, \theta) = T_0 (1 + T_1)(1 + T_2) \cdots \qquad \text{multiplicative excess relative risk}$$

$$R(z, \theta) = T_0 + T_1 + T_2 + \cdots \qquad \text{additive relative risk,}$$

where each of the terms $T_j$ is a product of linear and loglinear parts (subterms). For binomial and count/rate table data, it is also possible to fit unstratified additive absolute risk models of the form

$$\psi(z, \theta) = T_0 + T_1 + T_2 + \cdots$$

Covariates in the simple standard loglinear relative risk model, $\exp(z\theta)$, can be specified as arguments to the FIT command. To specify more general models, you simply designate the term number and subterm type and list the covariates to be included in the subterm. Stratification is specified by providing a list of factors to be used for stratification.
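The way terms and subterms combine into a risk function can be sketched in Python. This is an illustration only, not EPICURE syntax or EPICURE code: the function names (term, additive_err, multiplicative_err, additive_rr) are hypothetical, the three combining rules are the additive excess relative risk, multiplicative excess relative risk, and additive forms as read from the description above, and the convention that an empty linear subterm contributes a factor of 1 is an assumption made for the sketch.

```python
import math


def term(beta, x_lin, gamma, x_log):
    """One term: (linear subterm) * exp(loglinear subterm).

    Assumption for this sketch: an empty linear subterm counts as 1,
    so a term with only loglinear covariates is purely loglinear.
    """
    linear = sum(b * x for b, x in zip(beta, x_lin)) if beta else 1.0
    loglinear = math.exp(sum(g * x for g, x in zip(gamma, x_log)))
    return linear * loglinear


def additive_err(t0, terms):
    """T0 * (1 + T1 + T2 + ...): additive excess relative risk."""
    return t0 * (1.0 + sum(terms))


def multiplicative_err(t0, terms):
    """T0 * (1 + T1) * (1 + T2) * ...: multiplicative excess relative risk."""
    return t0 * math.prod(1.0 + t for t in terms)


def additive_rr(t0, terms):
    """T0 + T1 + T2 + ...: additive form."""
    return t0 + sum(terms)


if __name__ == "__main__":
    # Hypothetical covariates: age (loglinear background), dose and smoking (linear excess).
    t0 = term([], [], [0.03], [10.0])        # background term, exp(0.03 * 10)
    t1 = term([0.5], [2.0], [], [])          # excess term for dose
    t2 = term([0.25], [1.0], [-0.1], [5.0])  # excess term for smoking, modified by a loglinear factor
    print(additive_err(t0, [t1, t2]))
    print(multiplicative_err(t0, [t1, t2]))
    print(additive_rr(t0, [t1, t2]))
```

The two excess relative risk forms agree when there is a single excess term and differ only in how joint effects of several terms are combined.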

As discussed below, the EPICURE regression modules can be used to model both time-dependent and time-independent risks. Risks may be modeled directly (absolute risk models) or relative to some baseline (relative risk models). The input data may be in the form of records for individuals, event-count tables, or event-time tables. The following table summarizes the features of the regression modules with regard to three factors: time-dependence of the response variable; type of risk function; and form of the input data.

 

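| Module (data type) | GMBO/PECAN (binomial data) | GMBO/PECAN (matched case-control data) | PEANUTS (individual survival time data) | AMFIT (count or rate data) |
|--------------------|----------------------------|----------------------------------------|-----------------------------------------|----------------------------|
| Input data         |                            |                                        |                                         |                            |
| Individual records | ×                          | ×                                      | ×                                       |                            |
| Event-count table  | ×                          |                                        |                                         | ×                          |
| Event-time table   |                            |                                        |                                         | ×                          |
| Time-dependence    |                            |                                        |                                         |                            |
| Yes                |                            |                                        | ×                                       | ×                          |
| No                 | ×                          | ×                                      |                                         |                            |
| Risk function      |                            |                                        |                                         |                            |
| Absolute           | ×                          |                                        |                                         | ×                          |
| Relative           | ×                          | ×                                      | ×                                       | ×                          |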

GMBO/PECAN is used for modeling binomial probabilities, odds, or odds ratios. The data for the $i$th subject consist of a binary outcome variable $y_i$ indicating cases or events and a covariate vector $z_i$. The covariates may include both continuous and categorical variables. For matched case-control studies, as discussed below, one or more of the covariates must define the matched (risk) set or stratum to which each person belongs. For unmatched studies involving binary outcomes the data can also be grouped as $(y_i, n_i, z_i)$, where $y_i$ is the number of events in $n_i$ trials.
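Grouping individual binary records into counts of events and trials per covariate pattern can be sketched in Python as follows. The function name group_binomial and the tuple layout are illustrative assumptions and are not part of EPICURE; the sketch also assumes the covariates are categorical (or already categorized) so that records with the same covariate values can be pooled.

```python
from collections import defaultdict


def group_binomial(records):
    """Collapse (y, z) records into (events, trials, z) rows.

    records: iterable of (y, z), where y is 0 or 1 and z is a hashable,
    sortable tuple of covariate values (a hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # z -> [events, trials]
    for y, z in records:
        counts[z][0] += y
        counts[z][1] += 1
    return [(events, trials, z) for z, (events, trials) in sorted(counts.items())]


if __name__ == "__main__":
    individual = [
        (1, ("m", 1)),
        (0, ("m", 1)),
        (1, ("f", 0)),
        (1, ("m", 1)),
    ]
    for row in group_binomial(individual):
        print(row)
```

The grouped rows carry the same information about the binomial model as the individual records, provided the model uses only the covariates kept in the grouping.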

The modules in the last two columns of the table (PEANUTS and AMFIT) are used for modeling incidence rates or hazard functions for censored survival data. For these modules the data for the $i$th subject consist of the follow-up time, $t_i$; a binary indicator of whether or not the event of interest occurred, $\delta_i$; and a covariate vector, $z_i$. The covariates may include both continuous and categorical variables, which can be time-dependent. PEANUTS works directly with the ungrouped survival data, while AMFIT makes use of data that have been grouped with respect to time. The grouped data have the form $(d_j, P_j, z_j)$, where, for each cell $j$ defined by time interval, $d_j$ is the number of cases and $P_j$ is the total time at risk. DATAB can be used to carry out the grouping needed to prepare data for use in AMFIT. AMFIT can also be used to model Poisson means for count data.
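The kind of grouping that turns individual follow-up records into an event-time table can be sketched in Python. This is not DATAB's algorithm or interface; the function name event_time_table, the cut-point convention, and the record layout are assumptions made for illustration. The sketch assumes follow-up starts at time 0, follow-up times are positive, covariates are fixed over time, and the last interval is open-ended.

```python
from collections import defaultdict


def event_time_table(subjects, cuts):
    """Accumulate cases and time at risk by (time interval, covariate pattern).

    subjects: iterable of (t, delta, z) with follow-up time t > 0,
              event indicator delta (1 = event, 0 = censored),
              and hashable covariate value z.
    cuts:     increasing interval start points beginning at 0, e.g. [0, 5, 10];
              interval k covers (cuts[k], cuts[k + 1]], the last one is open-ended.
    Returns a dict mapping (k, z) -> (cases, time_at_risk).
    """
    table = defaultdict(lambda: [0, 0.0])
    for t, delta, z in subjects:
        for k, lo in enumerate(cuts):
            hi = cuts[k + 1] if k + 1 < len(cuts) else float("inf")
            if t <= lo:
                break  # follow-up ended before this interval began
            cell = table[(k, z)]
            cell[1] += min(t, hi) - lo  # time at risk spent in interval k
            if delta and t <= hi:
                cell[0] += 1  # the event is counted in the interval where follow-up ends
    return {key: (cases, time) for key, (cases, time) in sorted(table.items())}


if __name__ == "__main__":
    data = [(3.0, 1, "a"), (7.0, 0, "a"), (12.0, 1, "b")]
    for key, value in event_time_table(data, [0, 5, 10]).items():
        print(key, value)
```

Each table cell then supplies a case count and a total time at risk, which is the form of input a Poisson or piecewise constant hazard model needs.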