In working with binomial data, one is interested in modeling
the probability of occurrence of an event of interest as a function of one or
more covariates. The most commonly used regression model for such data is the
logistic regression model in which the log odds,
(logit),
is modeled as a linear function of the covariates. As shown in Specification of More General Models, the default GMBO/PECAN model corresponds to logistic regression. Because of the generality of the models available in EPICURE, it is straightforward to use GMBO/PECAN to fit generalized models for the odds. However, in some cases it is useful to model functions other than the logit or log odds. The function describing the relationship between the binomial probability and the regression function is called the link. In addition to the default odds/logit link, it is also possible to use GMBO/PECAN to fit models that involve complementary log and identity links. The complementary log link is

which includes the complementary log-log link

as a special (default) case. The identity link is

In this section we will present examples illustrating the use of alternative link functions. These data were used by Wacholder (Wacholder 1986) to illustrate the use of alternatives to logistic regression. For our purposes, we will focus on the effect of alcohol consumption and ignore the smoking and socioeconomic status information in the data. The models to be fit are equivalent to

where
index three levels of alcohol
consumption. The first example involves a complementary log-log link, the second
an identity link, and the third a log link.
Example 5.7 Modeling the Odds: Changing the Default Subterm
The data are read and the alcohol, smoking, and socioeconomic variables are created with the following commands:
NAMES x n @
INPUT ../exdata/sholom.dat @
TRAN alcohol = 4 - GL(3,1) ; socio = GL(3,6) ; smoker = GL(2,3) @
LEVELS alcohol socio smoker @
CASES x
N n
The function GL(n,r) produces a variable with n distinct values, 1 to n. Each value is repeated r times. The whole sequence is repeated as many times as necessary. Thus, the command GL(2,3) leads to a variable whose first 8 values are 1,1,2,2,3,3,1,1. This results in a set of integer-coded variables that can be used as factors in the subsequent analyses.
In this example we will fit models of the form

that is, we will model the odds directly. It is not necessary to change the link function to do this. Rather, one simply uses the linear subterm of term 0 instead of the default log-linear subterm. This is easily done using the LINEAR 0 model specification command. It is also necessary to remove the default intercept from the log-linear subterm of term 0. Finally, it is necessary to provide a valid initial value for the intercept or any other parameter that directly estimates an odds. This is necessary because the default initial value (0.00) is not included inside the parameter space for such parameters. For these models, initial values must be positive.
The commands to specify and fit a simple model for the odds in this population are
LINEAR 0 %CON:1 @
FIT -%CON @
The first command specifies the model of interest and provides an acceptable starting value (1). The FIT command is used not only to fit the model but also to remove the default intercept from the log-linear subterm of term 0. An alternative command sequence with the same effect is
FITOPT LINEAR 0 @
LOGLINEAR 0 - %CON @
FIT %CON:1 @
In this command sequence, the FITOPT command is used to change the default subterm, that is, the term updated by model formulae given on a FIT command to be the linear subterm of term 0. The LOGLINEAR command is used to remove the default intercept. Finally, the model is specified and fit using the FIT command with an appropriate model formula. The parameter summary table produced by either set of commands is shown below.
Output 5.6 A direct model for the odds
Binomial odds regression
Additive excess relative risk T0 * (1 + T1 + T2 + ...)
x is used for cases
n is used for number of trials
Parameter Summary Table
# Name Estimate Std.Err. Test Stat. P value
-- ---------------------------- ---------- --------- ---------- --------
Linear term 0
1 %CON..................... 0.1222 0.01308 9.345 < 0.001
Records used 18
Deviance 33.2051
AIC 35.2051
Pearson Chi2 37.182 Degrees of freedom 17
We see that the odds is estimated to be 0.1222 with a standard error of 0.0131. This estimate corresponds to

Assuming we have changed the default subterm as indicated above, the commands
FIT + alcohol @
FIT -%CON @
fit a model with direct estimates of the odds for each alcohol category. The output below shows the results for the first model.
Output 5.7 An odds difference model
FIT + alcohol @
Iter Step Deviance
0 0 33.2051
1 0 23.9900
2 0 23.9635
3 0 23.9635
Binomial odds regression
Additive excess relative risk T0 * (1 + T1 + T2 + ...)
x is used for cases
n is used for number of trials
Parameter Summary Table
# Name Estimate Std.Err. Test Stat. P value
-- ---------------------------- ---------- --------- ---------- --------
Linear term 0
1 %CON..................... 0.09881 0.01465 6.746 < 0.001
3 alcohol_2................ 0.01230 0.03274 0.3756 > 0.5
4 alcohol_3................ 0.1117 0.04349 2.569 0.0102
Records used 18
Deviance 23.9635
AIC 29.9635
Pearson Chi2 23.8906 Degrees of freedom 15
Note that the effect of alcohol consumption at level 1 is not shown because this parameter is aliased. The intercept (%CON) is the maximum likelihood estimate for the odds in the first alcohol consumption category. The other two parameters in the model are the difference in group 1 and groups 2 and 3 odds, respectively. Thus, the odds for group 2 is estimated as 0.09881+0.01230=0.11111, while the estimated odds for group 3 is 0.21051. These estimates are obtained directly from the next model, in which the intercept is dropped.
Example 5.8 Using Generalized Models: Estimates of Odds Ratios
In this example we fit a model equivalent to the last model in the previous example; however, we consider an alternative parameterization in which we estimate one minus the odds ratios for alcohol consumption in groups 2 and 3 relative to group 1. The model to be fit can be written

In this model
is the odds for group 1
and
are the odds ratios. This model
involves two subterms in two terms. The intercept parameter,
, is included in the linear subterm of
term 0, while the odds ratio parameters are in the linear subterm of term
1.
The commands to specify and fit this model are
LINEAR 1 alcohol @
LINEAR 0 %CON:0.1 @
PARAMETER 2=0 @
FIT @
The first two commands are standard EPICURE model specification commands. Note that we specified an explicit initial value for the %CON parameter. The PARAMETER command is used to fix the parameter associated with alcohol group 1 to be 0. Since one of the parameters in the full model is aliased and since the intrinsic aliasing checks in EPICURE do not extend across terms or subterms, we use the PARAMETER command to explicitly fix the parameter associated with alcohol group 1 at 0. The program would fit the model and note that one parameter is aliased even if this were not done. However, interpretation of the parameters would not be straightforward. The output for this model is shown below.
Output 5.8 An odds ratio model
Binomial odds regression
Additive excess relative risk T0 * (1 + T1 + T2 + ...)
x is used for cases
n is used for number of trials
Parameter Summary Table
# Name Estimate Std.Err. Test Stat. P value
-- ---------------------------- ---------- --------- ---------- --------
Linear term 0
1 %CON..................... 0.09881 0.01465 6.746 < 0.001
Linear term 1
2 alcohol_1................ 0.000 Aliased
3 alcohol_2................ 0.1244 0.34 0.366 > 0.5
4 alcohol_3................ 1.131 0.521 2.17 0.03
Records used 18
Deviance 23.9635
AIC 29.9635
Pearson Chi2 23.8906 Degrees of freedom 15
The odds ratios for groups 2 and 3 are 1.12 and 2.13, respectively.
Example 5.9 Changing the Link: Complementary Log-Log Models
In this example we will fit the same models as in the last example, but we will make use of the complementary log link. With this link the regression model is

Because the default subterm is the log-linear subterm of term 0, the default model, that is, the model specified by model formulae given with the FIT command, is a complementary log-log model:

To work with the complementary log link, we must first change the link. This is done with the command
LINK COMP @
In addition to changing the link function, this command removes any parameters from the current model, sets the default subterm to be the log-linear subterm of term 0, and turns on the automatic intercept option. Thus, if the next command is a FIT command with no arguments, we fit the simplest complementary log-log model:

The estimate of
in this model is
-2.160, which is easily seen to be equivalent to the value obtained in Example 5.7 in
which the odds was estimated to be 0.1222.
We can fit alternative models that include alcohol effects with the commands
FIT alcohol @
FIT -%CON @
As in the examples in the last section, the alcohol parameters in the former model estimate contrasts (differences on the complementary log-log scale) between alcohol consumption groups 2 and 3 and consumption group 1. Dropping the intercept, as in the latter example, does not change the fit but leads to a different parameterization in which the parameter estimates correspond to the complementary log-log values for the three groups. The model summary table for this model is shown below.
Output 5.9 A complementary log-log model
Binomial regression with complementary log link
Additive excess relative risk T0 * (1 + T1 + T2 + ...)
x is used for cases
n is used for number of trials
Parameter Summary Table
# Name Estimate Std.Err. Test Stat. P value
-- ---------------------------- ---------- --------- ---------- --------
Log-linear term 0
1 alcohol_1................ -2.362 0.1415 -16.7 < 0.001
2 alcohol_2................ -2.250 0.2501 -8.997 < 0.001
3 alcohol_3................ -1.655 0.177 -9.349 < 0.001
Records used 18
Deviance 23.9635
AIC 29.9635
Pearson Chi2 23.8906 Degrees of freedom 15
As you can see from this figure, the model summary table
includes information on the current link and indicates the interpretation of the
default model. (This description is correct only for models that contain no
parameters in other subterms.) Note that the deviance is identical to that
obtained for the equivalent odds models shown in Output 5.7 and
Output 5.8. This
equivalence holds because we are working with simple main-effects models. It is
easy to compute the probabilities associated with three levels of alcohol
consumption. For example, the probability for consumption group 1 is 0.08993
. The probabilities for the other two
groups are 0.10003 and 0.17394, respectively.
Example 5.10 Using the Identity Link
In this example we fit the same models as in the last two examples using the identity link. With this link the regression model is

Because the default subterm is the linear subterm of term 0,
the default model is one in which the probabilities of interest are modeled as a
linear function of the parameters. As was the case with the odds models
considered earlier, there are implicit constraints on the parameter space
because
must be between 0 and 1. Thus,
maximum likelihood estimates may not exist for models of this type, which
include continuous covariates. Also, as was the case for the odds models, the
default initial parameter value (0) is not appropriate for main effects.
The LINK ID command is used to choose the identity link. In addition to changing the link function, this command removes any parameters from the current model, sets the default subterm to be the linear subterm of term 0, and turns on the automatic intercept option. The intercept is included in the linear subterm of term 0.
LINK ID @
The command
FIT %CON:0.1 @
can be used to obtain the maximum likelihood estimate of the
disease probability in this sample. The explicit initialization is necessary in
this example because the default initial value of 0 leads to estimates that are
not in the allowable range, and the maximization stops with an error in this
case. The estimated case probability is 0.1089 with a standard error of 0.01038.
These estimates are simply
and
where
is the number of cases and
the total number of subjects in
the study. In this example
is 98 and
is 900.
We can fit models that include alcohol effects with the commands
FIT alcohol @
FIT alcohol:0.1 - %CON @
In the first model, the intercept is the estimated probability of being a case in alcohol group 1, while the other two parameters are the differences between the probabilities in the other two groups and group 1. The fit summary table for this model is shown below.
Output 5.10 A probability difference model
Binomial regression with identity link
Additive excess relative risk T0 * (1 + T1 + T2 + ...)
x is used for cases
n is used for number of trials
Parameter Summary Table
# Name Estimate Std.Err. Test Stat. P value
-- ---------------------------- ---------- --------- ---------- --------
Linear term 0
1 %CON..................... 0.08993 0.01213 7.412 < 0.001
3 alcohol_2................ 0.01007 0.02664 0.3781 > 0.5
4 alcohol_3................ 0.08398 0.03046 2.757 0.00583
Records used 18
Deviance 23.9635
AIC 29.9635
Pearson Chi2 23.8906 Degrees of freedom 15
We see from this table that the group 2 probability is 0.010 greater than that in group 1, while the group 3 probability is 0.084 greater than that in group 1. The standard errors suggest that there is a significant difference in the case probabilities for groups 1 and 3. However, one should be wary of using Wald tests in nonstandard models such as this one. Likelihood ratio tests or likelihood-based bounds provide a more reliable guide to the statistical significance in these models. Using the commands
BOUNDS 3 @
BOUNDS 4 @
we obtain 95 percent likelihood bounds of (-0.038; 0.067) and (0.028; 0.15) for the two contrasts. The following series of commands can be used to obtain the likelihood ratio test of the hypothesis of no difference in the case probability for groups 1 and 3:
NULL @
PARAMETER 4=0 2=0 @
FIT @
LRT
The first command is used to indicate that the current (full) model is to be used as the comparison model in the test. The PARAMETER command is used to force the contrast for group 3 (parameter 4) to be 0. It is also necessary to fix parameter 2 at 0 to keep the program from fitting an alternative form of the current model. The PARAMETER command must be used here since alcohol is a categorical variable and we are only constraining some of the levels. The FIT command fits the constrained model, and the LRT command computes the likelihood ratio statistic. The output from this command indicates that the statistic is -9.080 with -1 degrees of freedom and a P-value of 0.0026. The statistic is negative because we used the full model as a base model in the comparison.