UNGROUP

Purpose

Create a dataset that contains one record for each time a person-year table is updated.

Syntax

UNGROUP {TO fname} {CSV | TAB | FS char | BSF | EPIHEADER | FREE | FORMAT fstr }

{NONAMES} {IDVAR varname} {DETAIL} {LOG} {CELLID} @

Arguments and Subcommands

TO fname

Specify the name of the output file. The default name is ungroup.csv. If the EPIHEADER, FREE, or FORMAT subcommand is used to specify another text format for the output file, the default file extension is .ind. If fname includes an extension, that extension is used in place of the default regardless of the output format.

CSV

Write data to a comma-separated value text file. Unless the NONAMES option is given, the first record contains a delimited list of variable names. The data are written with one record containing all of the data used for each table update.

This is the default format.

TAB

Write data to a tab-delimited text file. Unless the NONAMES option is given, the first record contains a delimited list of variable names. The data are written with one record containing all of the data used for each table update.

FS char

Write data to a text file in which fields are separated by the indicated character. Unless the NONAMES option is given, the first record contains a delimited list of variable names. The data are written with one record containing all of the data used for each table update.
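As a rough illustration of how the delimited formats described above might be read outside EPICURE, the following Python sketch parses an ungroup text file with an arbitrary field separator. The function name read_ungroup, the column names (udbid, agecat, pyr, cases), and the sample values are assumptions made for illustration; they are not produced or defined by EPICURE.

```python
import csv
import io


def read_ungroup(stream, delimiter=",", has_names=True, names=None):
    """Read a delimited ungroup file into a list of dicts.

    has_names=False corresponds to a file written without the
    variable-name record; the caller must then supply `names`.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    if has_names:
        if not rows:
            return []
        names, rows = rows[0], rows[1:]
    elif names is None:
        raise ValueError("names must be given when there is no name record")
    return [dict(zip(names, row)) for row in rows]


# Hypothetical semicolon-separated content, such as FS ; might produce.
sample = "udbid;agecat;pyr;cases\n101;1;2.5;0\n101;2;1.0;1\n"
records = read_ungroup(io.StringIO(sample), delimiter=";")
```

Values are returned as strings; converting them to integers or floating point numbers is left to the caller, since the actual column layout depends on the table definition.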

BSF

Write the ungrouped data to an EPICURE BSF file. With this option, the BSF file contains the category lookup values (i.e. the input variable values used to determine the category index values) in place of the category index values written to ungroup text files.

EPIHEADER

Write a text file with header records containing commands to read the data into an EPICURE program.

FREE

FORMAT fstr

IDVAR varname

Include the value of varname in each record. This is generally used to include information that identifies the input record on which the data used to update the table are based. It is assumed that the ID variable can be written as an integer.

LOG

Write ungrouped records to the log file.

DETAIL

A rarely used option that writes one record to the ungroup file each time a person-cell is updated. When the RESOLUTION command is used, the ungroup file will usually contain more than one record for each person-cell. When this option is used, the values written for the category variables are the current values of the variables used to define the categories. In addition to these records, the detailed ungroup file also includes the final summary record for each person-cell.

Remarks

The files produced by this command can be quite large since for each “person” who is ever at risk they contain one record for each cell in which the “person” is at risk. Even if the RESOLUTION command is used to specify a finer table-update resolution than that determined by the basic table definition, the UNGROUP file will contain only one record for each cell in which a person is at risk unless the DETAIL option is specified.

The ungrouped person-year file is useful for debugging complex tables and for producing datasets for analyses that require individual survival data (Cox regression) with complex time-dependent covariates. Note that there will be some, generally minor, differences between parameter estimates obtained by fitting a model to a person-year table and those obtained by fitting the same model to an ungrouped data set produced when that table was created. The reason is that the ungrouped data set contains more highly individualized data than the corresponding person-year table.
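To make the relationship between the ungrouped data and the person-year table concrete, the following Python sketch collapses ungrouped records back into cell totals by summing over persons. It is only a sketch under stated assumptions: the function name regroup, the column names (udbid, agecat, pyr, cases), and the sample values are hypothetical, and the approach applies only to summary variables that are simple sums, such as person-years and event counts.

```python
from collections import defaultdict


def regroup(records, category_names, summary_names):
    """Sum summary variables over all records that share a cell."""
    table = defaultdict(lambda: [0.0] * len(summary_names))
    for rec in records:
        # A cell is identified by its tuple of category index values.
        cell = tuple(int(rec[c]) for c in category_names)
        for i, name in enumerate(summary_names):
            table[cell][i] += float(rec[name])
    return {cell: dict(zip(summary_names, totals))
            for cell, totals in table.items()}


# Hypothetical ungrouped records: one per person-cell.
records = [
    {"udbid": 101, "agecat": 1, "pyr": 2.5, "cases": 0},
    {"udbid": 101, "agecat": 2, "pyr": 1.0, "cases": 1},
    {"udbid": 102, "agecat": 1, "pyr": 4.0, "cases": 0},
]
table = regroup(records, ["agecat"], ["pyr", "cases"])
```

Here the two persons at risk in category 1 are merged into a single cell, which is the individual-level detail that the grouped table no longer carries.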

The default ungroup file contains one record per person-cell that includes the following items:

IDVar - A subject ID variable, which is the value of the input variable indicated by the IDVAR subcommand. This is optional but is often important. It is written as an integer.

Category index values - Index values for each category variable. These are written as integers.

Summary variable values - These are the summary variable values for this person-cell. When a user-specified resolution is given, means are the (weighted) within-cell mean values for this person. By default these means are weighted by person-years; however, it is also possible to use other variables as weights.
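The weighted within-cell mean described in the last item can be sketched in Python as follows. This is an illustration of a standard weighted mean, not EPICURE's internal code; the function name weighted_mean and the sample dose and person-year values are assumptions.

```python
def weighted_mean(values, weights):
    """Return the mean of `values` weighted by `weights`."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical values of a variable at three updates of one person-cell,
# with the person-years contributed at each update used as weights.
doses = [1.0, 2.0, 4.0]
pyr = [0.5, 0.25, 0.25]
mean_dose = weighted_mean(doses, pyr)  # (0.5 + 0.5 + 1.0) / 1.0 = 2.0
```

Passing a different list as the weights corresponds to weighting by a variable other than person-years.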

When the DETAIL option noted above is used, the output also includes one record, summarizing the input data, written each time the cell is updated for a person. In this case the category index values are replaced by the category index lookup values (written as floating point numbers). In addition, each detail record contains a cell ID value that can be used to group all the records for a given person-cell together.
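As an illustration of how the cell ID value might be used, the following Python sketch collects detail records by person-cell. The column names udbid and cellid, the function name group_by_cell, and the sample records are hypothetical. The sketch keys on both the subject ID and the cell ID, which is a cautious assumption in case cell ID values are not unique across persons.

```python
from collections import defaultdict


def group_by_cell(detail_records, id_name="udbid", cell_name="cellid"):
    """Collect detail records into lists keyed by (subject ID, cell ID)."""
    groups = defaultdict(list)
    for rec in detail_records:
        groups[(rec[id_name], rec[cell_name])].append(rec)
    return dict(groups)


# Hypothetical detail records: two updates of one person-cell, one of another.
detail = [
    {"udbid": 101, "cellid": 1, "age": 40.0, "pyr": 0.5},
    {"udbid": 101, "cellid": 1, "age": 40.5, "pyr": 0.5},
    {"udbid": 101, "cellid": 2, "age": 45.0, "pyr": 1.0},
]
groups = group_by_cell(detail)
```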

Examples

a) Create a standard CSV file called lungung.csv with udbid as the ID variable.

UNGROUP TO lungung IDVAR udbid @

b) Create an ungroup file using a semicolon (;) as the field separator. For this example the default name is used, which is ungroup.ind.

UNGROUP IDVAR udbid FS ; @