A model is specified by indicating which covariates are
included in the subterms associated with specific terms. Terms are referenced by
a term number which may range from 0 to 4, while subterms are referred to by a
name associated with their functional form. Subterm types currently available
include linear, loglinear, and product linear functions, which are specified
with the LINEAR, LOGLINEAR, and PLINEAR commands.
The covariates to be included in a subterm are described in a
model formula. A model formula may include variable names,
modification operators (+ or -), the interaction operator (*),
initial value operators (= or :), or the special name %CON.
Modification operators must be
followed by the name of a variable. The interaction operator must occur between
two variable names. Initial value operators must be preceded by a variable name
and followed by a number. Thus, the command LOGLINEAR 1
sex lage @ indicates that the loglinear subterm of term 1
is to contain only the covariates sex and lage. Any covariates
already included in the loglinear 1 subterm would be dropped. If, however, the
command had been LOGLINEAR 1 + sex lage @
then the sex and lage covariates would be added to the
currently defined subterm.
If a model formula begins with a + or a -, then the subterm is updated by first
dropping any covariates in the model formula whose names are preceded
by a - and then adding all other
covariates in the list to the subterm. If the model formula begins with a
variable name or is omitted, all covariates currently in the subterm are dropped
prior to adding the new covariates (if any) indicated in the model
formula.
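The update rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not EPICURE's implementation; the function name `update_subterm` and the representation of a formula as a list of tokens are assumptions made for the example.

```python
def update_subterm(current, formula):
    """Return the covariates in a subterm after applying a model formula.

    `current` lists the covariates now in the subterm. `formula` is a
    hypothetical token list such as ["+", "sex", "lage"].
    """
    if not formula or formula[0] not in ("+", "-"):
        # Formula omitted or beginning with a variable name:
        # every covariate currently in the subterm is dropped first.
        return list(dict.fromkeys(formula))

    to_drop, to_add = [], []
    i = 0
    while i < len(formula):
        token = formula[i]
        if token in ("+", "-"):
            # A modification operator must be followed by a variable name.
            if i + 1 >= len(formula) or formula[i + 1] in ("+", "-"):
                raise ValueError("operator must be followed by a variable name")
            (to_drop if token == "-" else to_add).append(formula[i + 1])
            i += 2
        else:
            # Names without an operator are added as well.
            to_add.append(token)
            i += 1

    # First drop the names preceded by "-", then add all other names.
    updated = [name for name in current if name not in to_drop]
    for name in to_add:
        if name not in updated:
            updated.append(name)
    return updated


print(update_subterm(["dose"], ["sex", "lage"]))       # replaces the subterm
print(update_subterm(["dose"], ["+", "sex", "lage"]))  # extends the subterm
```

The order of the returned list carries no meaning here, since the program reorders covariates within a subterm before working with a model.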
Two-way interactions can be indicated in a model formula using
the interaction operator (*). Neither, one, or both of the variables
in an interaction may be categorical variables (defined with the LEVELS or TLEVEL commands). When a
model is fitted, categorical variables and interactions that include categorical
variables are expanded to the appropriate number of covariates. These programs
do not allow for the specification of three-way or higher order interactions in
model formulae.
The initial value operators are used to specify fixed (the = operator) or free (the : operator) initial values for the variable or interaction whose name they follow. Thus, the command LOGLINEAR 0 %CON:1 sex=0 @ indicates that the constant term in the loglinear subterm of term 0 is to be assigned an initial value of 1 and the parameter associated with the sex covariate is to be fixed at 0. When the model is fitted, only the constant term will be estimated. For categorical variables or interactions, the initial value will be assigned to each of the associated covariates. The PARAMETER command (described in this section) can be used to assign initial values to specific parameters. %CON is used to indicate that a constant term (vector of ones) is to be included in the subterm.
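As a rough illustration of how the two initial value operators differ, the sketch below splits formula items into a name, an initial value, and a fixed/free flag. The function name `parse_initial_values` and the assumption that each item arrives as a single token without embedded blanks are hypothetical and are not taken from EPICURE.

```python
def parse_initial_values(tokens):
    """Split items such as "%CON:1" or "sex=0" into (name, value, fixed).

    "=" marks a fixed initial value and ":" a free one. Items without an
    operator get value None, meaning no initial value was specified.
    """
    parsed = []
    for token in tokens:
        for operator, fixed in (("=", True), (":", False)):
            if operator in token:
                name, _, number = token.partition(operator)
                if not name:
                    raise ValueError("operator must be preceded by a variable name")
                # The operator must be followed by a number.
                parsed.append((name, float(number), fixed))
                break
        else:
            parsed.append((token, None, False))
    return parsed


for item in parse_initial_values(["%CON:1", "sex=0"]):
    print(item)
```

For the command LOGLINEAR 0 %CON:1 sex=0 @, only the %CON entry is marked as free, which matches the statement that only the constant term is estimated.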
Additional examples of model formulae will be given with the description of the subterm specification commands (LINEAR, LOGLINEAR, and PLINEAR). The EPICURE User's Guide contains annotated examples of complete model specifications.
Parameters are stored in order by term. Within a term, the subterms are ordered with the linear subterm first, followed by the loglinear subterm, followed by the product linear subterm.
Although the user may specify covariates within a subterm in any order, prior to working with a model, the program reorders the covariates within each subterm.
The order of parameters within a subterm is partially determined by the type of parameter. In any subterm the first parameter is the %CON parameter, which is associated with an implicit column of ones in the design matrix. This is followed by the parameters associated with categorical variables, which are in turn followed by interactions involving two categorical variables. Simple variables and products of simple variables come next. Interactions that involve both simple and categorical variables come last.
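The ordering by parameter type can be sketched as a stable sort on a rank. This is a sketch under assumptions: the helper names are hypothetical, covariates are ranked by name rather than as expanded parameters, and the order within one type (which the text does not specify) is simply left as the user entered it.

```python
def kind_rank(name, categorical):
    """Rank a covariate name by parameter type, following the stated order."""
    if name == "%CON":
        return 0                      # constant term comes first
    parts = name.split("*")           # "a*b" denotes a two-way interaction
    n_categorical = sum(part in categorical for part in parts)
    if n_categorical == len(parts):
        # categorical variable, then categorical-by-categorical interaction
        return 1 if len(parts) == 1 else 2
    if n_categorical == 0:
        return 3                      # simple variables and their products
    return 4                          # simple-by-categorical interactions


def order_subterm(covariates, categorical):
    """Reorder covariates by type; sorted() is stable, so ties keep input order."""
    return sorted(covariates, key=lambda name: kind_rank(name, categorical))


names = ["dose", "sex*dose", "city*sex", "%CON", "sex", "dose*age", "city"]
print(order_subterm(names, categorical={"sex", "city"}))
```

Because parameter numbers follow this ordering rather than the order of entry, the numbers needed by commands such as PARAMETER and BOUNDS are best read from a summary of the current model.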
For models that include categorical variables, mixtures, or factorial interactions, the programs will determine redundant (intrinsically aliased) parameters and exclude them from the fit. Because of the nonlinear nature of the generalized risk models available in EPICURE programs, the detection of intrinsic aliasing is not trivial. The programs will detect intrinsic aliasing within a subterm and in some cases across subterms. If a program fails to detect some intrinsically aliased parameters prior to fitting, the fitting algorithm will fix one or more parameters as necessary.
For some commands (PARAMETER and BOUNDS), it is necessary to refer to the number associated with a specific parameter. Since these numbers change as the model is modified, it is often useful to request a summary of the current model using the MODEL command.
In order to increase the usefulness of GMBO, it is possible to specify alternative functions linking the probability and regression function. This is done using the LINK command. Writing $P$ for the probability and $R$ for the regression function, the default link is the odds link in which

$$\frac{P}{1 - P} = R .$$

This model includes logistic regression as a special case. Indeed, the default model, in which parameters are included only in the loglinear subterm of term 0, so that $R = \exp(x'\beta)$, is the logistic regression model.
The other link functions available in GMBO are the probit, complementary log, and identity links.
The probit link is

$$\Phi^{-1}(P) = \log R ,$$

where $\Phi^{-1}$ is the inverse of the normal
cumulative distribution function. The default probit model is the standard
probit regression model: $\Phi^{-1}(P) = x'\beta$.
The complementary log link can be written as

$$-\log(1 - P) = R ,$$

which includes the complementary log-log link

$$\log\bigl(-\log(1 - P)\bigr) = x'\beta$$

as a special (default) case. The identity link is simply

$$P = R .$$

When this link is chosen, the default subterm is changed to the linear subterm of term 0. A log link can be obtained by forcing the default subterm to be the loglinear subterm of term 0. Working with the identity or log links can be difficult because the models do not explicitly constrain the estimated probabilities to the required range, that is, (0,1), and the default initial values lead to invalid estimates in many cases. The latter problem is easily overcome by explicitly initializing parameters (particularly main effects) to allowable values, that is, values between 0 and 1 for the (linear) identity link and values less than 0 when the log link is used.
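The relation between the links and the range problem can be illustrated numerically. The sketch below assumes the links take the forms noted in the comments (odds equal to R, probit of P equal to log R, minus log of 1 - P equal to R, and P equal to R), where R stands for the value of the regression function; the link labels and function names are hypothetical and are not GMBO keywords.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability(link, r):
    """Probability implied by regression-function value r under a link.

    The forms below are assumptions stated in the accompanying text.
    """
    if link == "odds":        # P / (1 - P) = R
        return r / (1.0 + r)
    if link == "probit":      # inverse normal CDF of P equals log R (needs r > 0)
        return normal_cdf(math.log(r))
    if link == "clog":        # -log(1 - P) = R
        return 1.0 - math.exp(-r)
    if link == "identity":    # P = R
        return r
    raise ValueError("unknown link: " + link)


def is_valid_probability(p):
    """True when p lies in the required open interval (0, 1)."""
    return 0.0 < p < 1.0


eta = 0.3                     # loglinear predictor, so R = exp(eta)
r = math.exp(eta)
for link in ("odds", "probit", "clog", "identity"):
    p = probability(link, r)
    print(link, p, is_valid_probability(p))
```

With R = exp(0.3), the first three links return values inside (0,1), while the identity link returns a value above 1. This is the situation in which explicit initialization to allowable values is needed.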
In the remainder of this chapter we describe the syntax of the subterm definition commands (LINEAR, LOGLINEAR, PLINEAR); the geometric mixture model command (GTERM); the model specification commands (RRISK, GMIX, and ADD); the stratification commands (STRATA and NOSTRATA); the parameter initialization and control command (PARAMETER); the model display command (MODEL); the command to clear the current model (NOMODEL); and the LINK command, which specifies the function linking the regression function to the probability in GMBO.
Prior to describing these commands in detail, we will present commands which can be used to specify the models given in the beginning of this section. For these examples, it is assumed that the default fit options are in effect, that is, the product additive excess model with loglinear term 0 as the default subterm. It is also assumed that the models are to be specified completely rather than as an extension to the previous model.
In the PECAN example, the model can be defined and fit with the single command
FIT est gall gall*est @
The commands to specify and fit the first AMFIT example are:
LOGL 0 %CON sex lage @
LINE 1 dose @
LOGL 1 sex agex @
FIT @
For the second AMFIT example, it is necessary to indicate that an additive model is to be used. This can be done either before or after defining the subterms in the model. Because the logarithm of the linear dose coefficient is to be estimated in this model, the coefficient for dose is initialized and fixed at 1.0. A set of commands to describe and fit this model is:
TRAN dose2 = dose * dose @
ADD @
LINE 0 brate @
LINE 1 dose=1 dose2 @
LOGL 1 %CON sex ltime @
FIT @
For the PEANUTS example, if it is assumed that a LEVELS command has been given to indicate that stage is a categorical variable, the following commands can be used to specify and fit the model:
STRATA stage @
LINE 1 trmnt @
FIT agein @