Interactions and Aliasing

Categorical variables may also be used to define interactions in a model formula on a FIT command. Assume the categories are numbered from 1 to $n$. If the interaction involves one categorical and one continuous (noncategorical) variable, the model includes a parameter for each level of the categorical variable. For a subject in the $i$th category, the $i$th covariate is equal to the value of the continuous covariate for the subject, and all of the other covariates are zero. If both variables in an interaction are categorical, with $n$ and $m$ levels, respectively, a total of $nm$ covariates are added to the model. For a subject in which the first categorical variable has the value $i$ and the second has the value $j$, the $(i,j)$th covariate is 1 and the remaining $nm - 1$ are zero. The indicator variables used when a categorical variable is included in a fit are created for each subject as the model is fit. Since these covariates do not use any of the data workspace, they are called virtual covariates. The names for virtual covariates are generated by the programs. These names include the categorical variable name and a numeric suffix that indicates the level.
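The construction of these covariates can be sketched in a few lines of Python. This is only an illustration of the rules described above, not EPICURE code: the function names are hypothetical, and the naming pattern (variable name, underscore, level number) is assumed from names such as sex_1.

```python
def categorical_by_continuous(level, x, n_levels):
    """Covariates for a categorical-by-continuous interaction.

    level is the subject's category (1..n_levels) and x is the subject's
    value of the continuous variable. Only the covariate for the subject's
    own category is nonzero.
    """
    return [x if k == level else 0.0 for k in range(1, n_levels + 1)]


def categorical_by_categorical(i, j, n, m):
    """Indicator covariates for an interaction of two categorical variables.

    Returns n*m values; the one for the (i, j) combination is 1 and the
    remaining n*m - 1 are 0.
    """
    return [1 if (a, b) == (i, j) else 0
            for a in range(1, n + 1)
            for b in range(1, m + 1)]


def virtual_names(var, n_levels):
    # Assumed naming pattern: variable name, underscore, level number.
    return [f"{var}_{k}" for k in range(1, n_levels + 1)]
```

Because the values depend only on the subject's category and, for mixed interactions, one continuous value, they can be computed on the fly for each subject rather than stored.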

The model formula initialization operators (= and :) may be used with categorical variables or interactions involving categorical variables. In these cases the specified initial value applies to all parameters associated with the categorical variable. The PARAMETER command can be used to initialize specific parameters associated with a categorical variable.

When working with categorical variables, it may happen that some of the parameters are redundant, or aliased. For example, if sex is a categorical variable with two levels, the models

    $\beta_1\,\mathrm{sex}_1 + \beta_2\,\mathrm{sex}_2$                                    (4.4)

and

    $\alpha_0 + \alpha_1\,\mathrm{sex}_1 + \alpha_2\,\mathrm{sex}_2$                                    (4.5)

are equivalent. Here $\mathrm{sex}_1$ and $\mathrm{sex}_2$ denote the indicator covariates for the two levels of sex. In any of the EPICURE regression programs, these models could be specified and fit using the commands FIT sex@ and FIT %CON sex@, respectively. The latter model contains a redundant parameter.

In this and most similar cases, EPICURE detects this form of aliasing, called intrinsic aliasing, before fitting the model and automatically drops one of the covariates from the model. In this example, $\alpha_1$ would be set equal to 0 and not be allowed to vary during the maximization, in which case the relationship between the parameters in the two models is $\beta_1 = \alpha_0$ and $\beta_2 = \alpha_0 + \alpha_2$.

In general, when EPICURE detects intrinsic aliasing, the parameters associated with the lowest levels of the categorical variables involved will be dropped. This method is similar to that used in GLIM, but it differs from the SAS approach. The default choice can be overridden by explicitly fixing the parameters that are to be omitted from the model. This is done using the PARAMETER command as illustrated in the following examples. If a model contains aliased parameters that are not detected prior to fitting, they will be detected as the model is fit. This type of aliasing is called extrinsic aliasing. Extrinsic aliasing occurs when one parameter is a linear combination or a one-to-one function of other parameters in the model. The parameter summary for a fitted model indicates any aliased parameters detected during a fit.
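One way to see why a constant together with a full set of level indicators is redundant is to compare the number of columns in the design matrix with its rank. The Python sketch below does this for a two-level sex variable. It is an illustration only and makes no claim about how EPICURE itself detects aliasing; the helper names and the small data set are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below position `rank` with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def design_row(sex, with_constant):
    """Covariates for one subject: optional constant, then sex_1 and sex_2."""
    indicators = [1 if sex == k else 0 for k in (1, 2)]
    return ([1] if with_constant else []) + indicators


subjects = [1, 2, 2, 1, 2]  # hypothetical sex codes
no_constant = [design_row(s, False) for s in subjects]
with_constant = [design_row(s, True) for s in subjects]

# Number of redundant (aliased) columns = columns minus rank.
aliased_without = len(no_constant[0]) - matrix_rank(no_constant)
aliased_with = len(with_constant[0]) - matrix_rank(with_constant)
```

With the constant included, the two indicator columns sum to the constant column for every subject, so one column is redundant whatever the data; this is the sense in which the aliasing is intrinsic to the model formula.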

We will use models (4.4) and (4.5) to make this discussion more concrete. If sex is a categorical variable, model (4.4) could be specified with the command

FIT sex @

This model contains no redundant parameters. On the other hand, the command

FIT %CON sex @

specifies model (4.5), which is equivalent to model (4.4) but includes a redundant parameter. Using its standard rules, the program determines that there is an extra parameter and drops the sex_1 parameter from the model (and the output summary). However, if the command

PARAMETER 3=0 @

is used to fix the parameter associated with sex_2 at 0, the sex_1 parameter will be included in the model. The first argument in this command is the parameter number; the = operator is followed by the initial value. This operator indicates that the parameter is to be treated as fixed at the specified value throughout the iterations. The other PARAMETER command operators are :, >, and <, which indicate initialization without fixing, parameter minima, and parameter maxima, respectively. The parameter number can be obtained from the parameter summary table printed after a fit or from the model summary table printed after the MODEL command.
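The choice of which sex parameter is fixed at 0 changes the parameter values but not the fitted model. The short Python check below illustrates this with made-up numbers; the function and variable names are hypothetical and the values are not output from any EPICURE program.

```python
def predictor(sex, const, s1, s2):
    """Linear predictor for the constant-plus-sex model."""
    return const + (s1 if sex == 1 else 0.0) + (s2 if sex == 2 else 0.0)


# Hypothetical per-level values from the model without a constant.
b1, b2 = 0.4, 1.1

# Default handling: the sex_1 parameter is dropped (held at 0).
drop_first = dict(const=b1, s1=0.0, s2=b2 - b1)

# After fixing the sex_2 parameter at 0, sex_1 stays in the model.
drop_second = dict(const=b2, s1=b1 - b2, s2=0.0)

# Both parameterizations reproduce the same value for each sex level.
for sex, target in ((1, b1), (2, b2)):
    assert abs(predictor(sex, **drop_first) - target) < 1e-12
    assert abs(predictor(sex, **drop_second) - target) < 1e-12
```

In the first case the constant plays the role of the level-1 value and the sex_2 parameter is a difference from it; in the second case the roles of the two levels are reversed.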