The example in this section illustrates the use of the SORT command and shows how the SELECT command can be used to limit the scope of operations to a specific subset of the data. Additional examples demonstrating more features of the SAVE and DATA commands are also included.
Example 3.10 Sorting and Subset Selection
The final section of this series of examples contains the following commands:
SORT hosp - brstat @
SELECT expcat /= 2 @
SAVE id brstat byr dxyr dose expcat aft aftcat numbtf ;
TO tbfdose @
SELECT expcat == 2 @
SUM brstat @
SETFORMAT id hosp brstat ; I5@
DATA id hosp brstat @
QUIT @
This group of commands begins with a command to sort the data on hosp in ascending order and brstat in descending order (indicated by the - prefix in the SORT command). Assuming that the brstat variable is coded as 1 for cases and 0 for controls, this sort will group women by sanitarium, with cases before controls within each sanitarium.
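The ordering produced by this sort can be sketched in Python. This is only an analogy, not the program's own command language, and the four records below are made-up values chosen to show the effect of an ascending key followed by a descending key.

```python
# Hypothetical records; the ids and codes are invented for illustration.
records = [
    {"id": 1, "hosp": 2, "brstat": 0},
    {"id": 2, "hosp": 1, "brstat": 0},
    {"id": 3, "hosp": 2, "brstat": 1},
    {"id": 4, "hosp": 1, "brstat": 1},
]

# Ascending on hosp, descending on brstat. Negating the numeric key
# plays the role of the - prefix in the SORT command.
ordered = sorted(records, key=lambda r: (r["hosp"], -r["brstat"]))

ordered_ids = [r["id"] for r in ordered]
print(ordered_ids)  # cases (brstat = 1) come first within each hosp
```

With brstat coded 1 for cases and 0 for controls, each hosp group lists its cases before its controls.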
The SELECT command defines the subset of records to be used in analyses. The currently selected records are called the active subset, and the active subset remains in effect until the next SELECT command. Once an active subset has been defined, only records in the subset are used in analyses or written by SAVE or DATA commands. (However, transformations affect variables in all records, regardless of the currently active subset.) Records are written in the order determined by the most recent SORT command, if any.
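The distinction between selection and transformation can be illustrated with a small Python sketch. It is an analogy under stated assumptions, not the program's implementation: the records, the dose values, and the flag variable are all hypothetical.

```python
# Hypothetical records; expcat == 2 stands for a missing dose.
records = [
    {"id": 1, "expcat": 1, "dose": 0.5},
    {"id": 2, "expcat": 2, "dose": None},
    {"id": 3, "expcat": 3, "dose": 2.0},
]

# Analogue of SELECT expcat /= 2: mark the active subset
# without removing any record from the data set.
active = [r["expcat"] != 2 for r in records]

# A transformation (here, creating a hypothetical variable "flag")
# is applied to every record, whether or not it is active.
for r in records:
    r["flag"] = 1

# Output, the analogue of SAVE or DATA, uses only the active subset.
written = [r for r, keep in zip(records, active) if keep]
written_ids = [r["id"] for r in written]
print(written_ids)
```

The record with expcat equal to 2 still receives the new variable, but it is left out of what gets written.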
In this example, the first SELECT command defines the active subset to include only those records for which expcat is not equal to 2, that is, for which dose is not missing. The subsequent SAVE command creates another BSF file, called TBFDOSE.BSF, which includes the indicated variables and does not contain any records with missing values for dose. Since a selection is in effect, the program will ask whether you really want to write only the active subset of the data.
The statement SELECT expcat == 2 @ redefines the active subset to include only those records with missing values for dose. We then use the SETFORMAT command to define output formats for selected variables. As with the input formats discussed above, the I# format specifies that the value is truncated and then written as an integer with a minimum width of # characters. The f#.# format specifies that numbers are written with a minimum width given by the first # and with the number of digits after the decimal point given by the second #.
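The effect of these two formats can be approximated with Python's format specifications. This is a rough analogy only: the helper names fmt_i and fmt_f are hypothetical, right-justified padding is an assumption, and Python's fixed-point format rounds the last digit, which may differ from the program's behavior.

```python
def fmt_i(value, width):
    """Approximate I#: truncate toward zero, then pad to a minimum width."""
    return format(int(value), f"{width}d")


def fmt_f(value, width, decimals):
    """Approximate f#.#: minimum width and a fixed number of decimals."""
    return format(value, f"{width}.{decimals}f")


short_int = fmt_i(12.9, 5)       # '   12' (fraction dropped, padded to 5)
long_int = fmt_i(1234567, 5)     # '1234567' (the width is only a minimum)
fixed = fmt_f(3.14159, 8, 2)     # '    3.14'

print(repr(short_int), repr(long_int), repr(fixed))
```

Because the width is a minimum rather than a fixed field size, a value that needs more characters is written in full rather than cut off.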
The DATA command writes the id, hosp, and brstat variables to a CSV file.
The SELECT@ command can be used to return to the initial state in which the full data set is the active subset.