Subset Selection

In EPICURE it is easy to analyze subsets of the data that are of particular interest. As noted in the section Fitting Models to Subsets of the Data, the BY option of the FIT command fits separate models to distinct subsets of the full data set indexed by one or more categorical variables. In some cases, however, one is interested in a single specific subset and does not want to fit a model to every stratum defined by categorical variables. The SELECT command restricts an analysis to such a subset. For example, to fit models to the subset of the data in which the variable dose is greater than 0 and the variable age is less than or equal to 75, use the following command to specify the records to be included:

SELECT dose > 0 AND age <= 75 @

When a selection has been made, the selected records are referred to as the active subset. Only records in the active subset are used in a fit or in other analyses (for example, sums and means), and only those records are written in response to output requests made with the DATA or SAVE command. Note, however, that transformations involving variables are applied to all records, whether or not they are in the active subset, whereas assignments to named constants are made only for records in the active subset. (Named constants are constants that have been given a symbolic name. They are distinguished from variables by the first character of their name, which must be #.) The NOSELECT command, or a SELECT @ command with an empty selection string, makes the full data set active again.
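The behavior of the active subset can be pictured with a small sketch in Python. This is a toy model and not EPICURE code: the record values and the names select, active, dose_sq, and mean_dose are assumptions made up for illustration. It shows a transformation reaching every record while a summary uses only the selected ones.

```python
# Toy model of an active subset; this is NOT EPICURE code.
# All record values and helper names here are hypothetical.
records = [
    {"dose": 0.0, "age": 60},
    {"dose": 1.5, "age": 70},
    {"dose": 2.0, "age": 80},
    {"dose": 0.5, "age": 75},
]


def select(rows, predicate=None):
    """Return the indices of the active subset; no predicate means all records."""
    if predicate is None:
        return list(range(len(rows)))
    return [i for i, row in enumerate(rows) if predicate(row)]


# Analogue of: SELECT dose > 0 AND age <= 75 @
active = select(records, lambda r: r["dose"] > 0 and r["age"] <= 75)

# A transformation is applied to every record, active or not.
for row in records:
    row["dose_sq"] = row["dose"] ** 2

# A summary statistic uses only the records in the active subset.
mean_dose = sum(records[i]["dose"] for i in active) / len(active)

# Analogue of NOSELECT: the full data set becomes active again.
active_all = select(records)
```

In this sketch the selection is stored as a list of record indices rather than as a filtered copy of the data, which is one simple way to keep the unselected records available for transformations.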

The SELECT command is not cumulative: each SELECT command replaces the current selection rather than refining it. The syntax of the selection string is also limited. In particular, the only allowable comparisons are those in which a variable is compared to a number. Both AND and OR may be used as logical operators; however, the selection string is evaluated strictly from left to right, and parentheses may not be used to control the order of evaluation. This means that a selection string like

                       

is evaluated as

                     

and not as

                     

If you are working with complex selections, it is best to use transformations to define a binary selection variable and use this selection variable with the SELECT command. For example, the following commands could be used to make the selection in the last example:

TRAN svar = (a AND b) OR c @

SELECT svar == 1 @
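The consequence of strict left-to-right evaluation can be checked with a short Python sketch. This is an illustration only and not EPICURE code: the function eval_left_to_right, the expression a OR b AND c, and the truth values are hypothetical, with the names a, b, and c borrowed from the TRAN example above. The sketch compares left-to-right evaluation with the conventional rule, used by Python itself, in which AND binds more tightly than OR.

```python
# Illustration of left-to-right evaluation; this is NOT EPICURE code.
def eval_left_to_right(tokens, values):
    """Evaluate a flat list such as ["a", "OR", "b", "AND", "c"] strictly left to right."""
    result = values[tokens[0]]
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        operand = values[tokens[i + 1]]
        if op == "AND":
            result = result and operand
        elif op == "OR":
            result = result or operand
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Hypothetical truth values for one record.
values = {"a": True, "b": False, "c": False}
tokens = ["a", "OR", "b", "AND", "c"]

# Left to right: (a OR b) AND c
left_to_right = eval_left_to_right(tokens, values)

# Conventional precedence (Python's own rule): a OR (b AND c)
conventional = values["a"] or (values["b"] and values["c"])
```

For these values the two readings disagree, so the record would be selected under one reading and excluded under the other. Defining a selection variable with an explicitly parenthesized transformation avoids this ambiguity.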