Working with the Standard Model

Although interpretation of the models varies from program to program, all of the EPICURE risk regression programs fit models of the same general form. Commonly used risk models in epidemiologic studies have the form

    $\lambda(z, \beta) = \lambda_0 \exp(z\beta)$                                        (4.1)

where $\lambda$ is some measure of risk such as a binomial odds, an odds ratio, or a hazard function, $z$ is a vector of covariates, $\beta$ is a vector of parameters to be estimated, and $\lambda_0$ is a scaling or stratum parameter or function. For example, in the standard logistic regression model, $\lambda$ is the odds and $\lambda_0$ is usually 1, although it could represent a stratum parameter (stratification was discussed briefly in Using the EPICURE Programs, and will be discussed further below). In the proportional hazards, or Cox regression, model, $\lambda$ is the hazard or instantaneous rate function and $\lambda_0$ is the baseline hazard, which depends on time in some unspecified way. In stratified proportional hazards models there is a separate baseline hazard function for each stratum. In matched case-control studies, $\lambda$ is the odds ratio and $\lambda_0$ is equal to 1.
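The following is a minimal Python sketch of how a model of this kind is evaluated, assuming the log-linear form in which the risk equals a scale factor times the exponential of the inner product of the covariates and the parameters. The function name `risk` and all numeric values are hypothetical illustrations and are not part of EPICURE.

```python
import math

def risk(z, beta, scale=1.0):
    """Evaluate scale * exp(z . beta) for one covariate vector (illustrative only)."""
    if len(z) != len(beta):
        raise ValueError("z and beta must have the same length")
    linear_predictor = sum(zi * bi for zi, bi in zip(z, beta))
    return scale * math.exp(linear_predictor)

# Logistic regression: the risk is the odds and the scale is 1.
# Covariates are a constant (1.0) and a dose of 2.0; parameter values are made up.
odds = risk(z=[1.0, 2.0], beta=[-1.0, 0.5])      # exp(-1 + 0.5 * 2) = exp(0)
probability = odds / (1.0 + odds)                # convert odds to a probability

# Proportional hazards: the scale plays the role of the baseline hazard
# at a given time (here an arbitrary value of 0.01).
hazard = risk(z=[2.0], beta=[0.5], scale=0.01)   # 0.01 * exp(1)
```

The same function covers both cases because only the interpretation of the scale factor and of the resulting risk changes between the programs.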

Models of the form (4.1) are the standard models in all of the EPICURE regression programs. These models can be specified and fit using only the FIT command followed by a list of the names of the covariates to be included in the model. For some models, such as the unconditional odds ratio models in GMBO/PECAN or the piecewise constant hazard function models in AMFIT, one of the covariates in the model may be identically equal to 1. In regression, such a covariate is often referred to as the constant term or, more simply, the constant. The parameter associated with this covariate is usually called the intercept. Because we use the word "term" to describe components of more general models, a covariate that is always 1 will be referred to as a constant covariate. The reserved name %CON is used to indicate that a model is to include a constant covariate. As an example, suppose we want to use GMBO/PECAN to fit the logistic regression model

    $\text{odds} = \exp(\beta_1 + \beta_2\,\text{dose})$                                (4.2)

in which dose is the name of one of the variables in the data set. This model can be defined and fit using the command

FIT %CON dose @

The list of covariates following the FIT command is called the model formula. If the FIT command is given without a model formula (FIT @), then the previous model, if any, is refit. The effect of a FIT @ command when no model has been defined depends upon the type of analysis and, for AMFIT and unstratified models fit with GMBO/PECAN, on the setting of the INTERCEPT option. For stratified models in GMBO/PECAN, the null model is one in which the odds ratio is 1 in each matched set. In PEANUTS, the null model leads to the likelihood associated with the Nelson-Aalen estimate of the survival function (Fleming and Harrington 2011). (The Nelson-Aalen estimate can be computed using PEANUTS survival curve commands.) In GMBO/PECAN and AMFIT, when the INTERCEPT option is on, the null model is $\exp(\beta_1)$, where $\beta_1$ is the intercept, so that the constant covariate is the only covariate in the model. In this model the probability (GMBO/PECAN) or hazard (AMFIT) is constant.
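To illustrate why a model containing only a constant covariate implies a constant probability, here is a small Python sketch for ungrouped binary data. It uses the standard closed-form maximum likelihood result that the fitted intercept is the log of the overall odds. The function name `fit_null_logistic` and the counts are hypothetical, and this is not a description of how GMBO/PECAN computes its estimates.

```python
import math

def fit_null_logistic(cases, noncases):
    """Closed-form ML fit of an intercept-only logistic model (illustrative only)."""
    if cases <= 0 or noncases <= 0:
        raise ValueError("need at least one case and one non-case")
    intercept = math.log(cases / noncases)   # log of the overall odds
    odds = math.exp(intercept)
    probability = odds / (1.0 + odds)        # identical for every subject
    return intercept, probability

# Made-up data: 20 cases among 100 subjects.
intercept, probability = fit_null_logistic(cases=20, noncases=80)
```

With these counts the fitted probability is the overall case proportion, 20/100, regardless of any other variable in the data set.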

Model formulae may include operators. The operators are +, -, *, =, and :. The + and - operators are used to indicate that the covariate following the operator is to be added to (+) or deleted from (-) the current model. If a model is to be updated, the first item in the model formula must be either the + or the - operator. When a model is being updated, omission of + or - operators (except prior to the first variable in the model) is equivalent to +. When a variable already in the model is added, the parameter associated with that variable will be reinitialized.

The interaction operator, *, is used to indicate that a covariate computed as the product of the indicated covariates is to be included in the model. Only first-order interactions may be included in model formulae; that is, covariates of the form a*b*c are not allowed. (If two of the variables are not defined as categorical, as described below, one could include this interaction in the model by using transformations to define new variables. In the previous example, if neither b nor c is categorical, we could use a transformation to define bc = b*c and then include a*bc in the model formula.)

The = and : operators are initialization operators, which follow a covariate name and precede a number. The number is used as the initial value for all parameters associated with the preceding covariate, which may be a factor or an interaction. The = operator is used when the parameters are to be fixed at their initial values and not allowed to vary during the estimation process. As the examples will show, this is useful if one wishes to obtain a (score) test of the hypothesis that the parameter in question is equal to the specified value. The : operator is used to indicate that the parameter is to be initialized to the specified value but will be estimated in the fit. If an initial value is not specified, 0 is used.

As an example, consider updating model (4.2) by adding two covariates, dose squared and another covariate called afe, and fixing the parameter associated with dose squared at 0. The FIT command to accomplish this is

FIT +dose*dose=0 afe @

After this FIT command, the model is

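    $\text{odds} = \exp(\beta_1 + \beta_2\,\text{dose} + \beta_3\,\text{dose}^2 + \beta_4\,\text{afe})$        (4.3)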

Note that the parameter associated with the dose*dose covariate is treated as the third parameter even though it is fixed.
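The update rules described above can be mimicked with a toy Python function. This is only a sketch under stated assumptions: the function `update_model`, the dictionary representation (covariate name mapped to an initial value and a fixed flag), and the requirement that operators be attached directly to covariate names are all hypothetical, the terminating @ is omitted, and categorical factors are ignored. It is not EPICURE's parser.

```python
def update_model(model, formula):
    """Apply an update formula to a model (toy illustration, not EPICURE code).

    model maps each covariate name to (initial_value, is_fixed); dictionary
    order stands in for parameter order.
    """
    tokens = formula.split()
    if not tokens or tokens[0][0] not in "+-":
        raise ValueError("an update formula must start with + or -")
    updated = dict(model)
    for token in tokens:
        remove = token.startswith("-")     # a missing operator is treated as +
        term = token.lstrip("+-")
        initial, fixed = 0.0, False        # default initial value is 0
        if "=" in term:                    # '=' fixes the parameter at the value
            term, value = term.split("=", 1)
            initial, fixed = float(value), True
        elif ":" in term:                  # ':' only sets the starting value
            term, value = term.split(":", 1)
            initial = float(value)
        if remove:
            updated.pop(term, None)
        else:
            updated[term] = (initial, fixed)   # re-adding reinitializes
    return updated

# Start from the model defined by "FIT %CON dose" and apply the update.
model = {"%CON": (0.0, False), "dose": (0.0, False)}
model = update_model(model, "+dose*dose=0 afe")
parameter_order = list(model)
```

In this sketch the fixed dose*dose covariate still occupies the third position in `parameter_order`, which mirrors the parameter numbering described in the text.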