MAKELOOK command

Purpose

Create a basic EPICURE lookup table from data read into EPICURE.

Syntax

MAKELOOK cvarlist [AS cdimname] BY dimvarlist TO rfname [SCALE #] @

Arguments and Subcommands

cvarlist

List of variables that contain the values of interest. These might be cause-specific rates, exposure levels, other quantities of interest. These variables define the levels of the primary dimension in the table. They are numbered from 1 to n in the order listed, where n is the number of variables in the list. The table header (.bsh) file lists the numbers of levels and the names for each level. These names are the variable names in this list. There must be at least one variable in this list.

AS cdimname

Name to be used for primary dimension in the table. This might be something like rates or exposure. If this subcommand is omitted, the name for the primary dimension variable is taken as cause.

BY dimvarlist

A list of one or more categorical variables that define the table structure. These must be coded as 1 to nlev. At present, all of these variables must be categorical variables. Once the basic table has been created, you can edit the table header (.bsh) file to specify categories for continuous variables (such as age or period). How this is done is described in the following section on Editing a lookup table header file. The default names for the dimensions are determined by the names in this list. This can also be changed by editing the table header file, as discussed below.

TO rfname

The table name. The table will consist of a (text) header file and a (binary) file that contains the table data. These files are called rfname.bsh and rfname.bst, respectively. The rfname may include a path, in which case the table files are written to the specified path. Otherwise they are written to the current working directory.

SCALE #

A scale factor for the values in the table (i.e. the values in the cvarlist variables). This value is written to table header file and is available to the user in an EPICURE variable called #_RTFACT that is created when the SETLOOK command is used to specify a table. This is particularly useful when dealing with population rates which are typically specified with units like cases per 10,000 or cases per 100,000 person years. If, for example, the rates in a table are to be used to compute expected numbers of cases for a standardized mortality/incidence analysis based on a person year table, the expected values would be equal to the product the person-years times the expected rate (obtained using the lookup function) times the rate scaling factor. This will be illustrated in the examples that follow.

Remarks

While this command works as described above to create a lookup table with a valid header file that could be used as is, it is a work in progress since in most cases the table header file should be edited to make the table easier to use. The principle limitation is that all of the table (secondary) dimension variables are described as categorical in the header. In most tables one or more of these dimensions refer to continuous variables such as age, age at exposure, or time period. A table produced using MAKELOOK could be used as is if one were to recode all continuous variables used in defining the table structure as categorical variables with categories numbered from one to n before using the LOOKUP function. However, a better solution to this problem is to edit the header file created by MAKELOOK to indicate the continuous variables and define the category cutpoints for these variables. When editing a table header file, one might also replace the generic and rather uninformative description of the data in the table with an informative title.

Example

Read a dataset with two estimators of sex, city, time period and birth cohort specific residence probabilities and create a basic lookup table that can be used to define residence probabilities for use in migration–adjusted analyses of disease rate data.

usetxt resprob2012.csv @ ! specify file with residence probability estiamtes

input @

level city sex @ ! set city and sex as categorical

categ year 1950 (5) 2000 / 2020 as yrcat @ ! define year categories

categ agex 0 (5) 70 / 120 as agxcat @ ! define age at exposure categories

! create basic lookup table
makelook rp1985 rp2012 as vers by city sex yrcat agxcat to resprob8512 @

! Diplay info about this table

setlook resprob8512 show @

The SETLOOK command with the SHOW option produces the following output:

5-way Table: Table generated by EPICURE at September 5, 2014 3:10:02 pm

vers with 2 categories:

1> rp1985 rp2012

city with 2 categories:

1> 1 2

sex with 2 categories:

1> 1 2

yrcat with 11 categories:

1> 1 2

3> 3 4

5> 5 6

7> 7 8

9> 9 10

11> 11

agxcat with 15 categories:

1> 1 2

3> 3 4

5> 5 6

7> 7 8

9> 9 10

11> 11 12

13> 13 14

15> 15

city as city

sex as sex

yrcat as yrcat

agxcat as agxcat

The header is rather uninformative as it contains no useful information on the table contents or the category definitions, and both age at exposure and year are treated as strictly categorical variables.

The header file (resprob8512.bsh) produced in this example looks like this

Table generated by EPICURE at September 5, 2014 3:10:02 pm

vers 0 2

rp1985 rp2012

city 0 2

1 2

sex 0 2

1 2

yrcat 0 11

1 2 3 4 5 6

7 8 9 10 11

agxcat 0 15

1 2 3 4 5 6

7 8 9 10 11 12

13 14 15

The first line contains a time stamp and short generic description of the table. The next line indicates that this table has 5 dimensions. Each dimension is described with an initial line that gives the default dimension name followed by a type indicator which is 0 for a categorical variable and 1 for a continuous variable. All of the dimension types are categorical. This is followed by the number of categories. Subsequent lines provide a description of each category. In this basic table these are just the integers 1 to n. Using text editor, such as Notepad++, Sublime, or one of the myriad others that are freely available (and often free), you can edit the header file to produce a more useable table. The edits include:

1) Replacing the description in the first line (Table generated by EPICURE) with something more informative

2) Replacing the numeric labels for the levels of categorical variables with more informative labels. The LOOKUP function does not use these labels but they do serve as a clue as to how the corresponding dimension variable needs to be coded for use with the LOOKUP function. For this example the descriptive labels for sex are male and female while those for city are Hiroshima and Nagasaki.

N.B. Value labels for categorical variables are read as 20-character fixed length strings so, when adding these labels, each label should start at the position of the corresponding number in the basic header file created by the MAKELOOK command.

3) Changing the type indicator for the year and age at exposure dimensions to be continuous (by changing the type code from 0 to 1) and replacement of the category value with appropriate cutpoints. There is one more cutpoint value than there are categories. The extra value is the upper limit of the final category. Since category cutpoints are read as a space delimited set of numbers, you do not need to worry about the field width, and you can put as many or as few of these values on a line as you would like.

The edited header file for this example looks like this:

Residence probability table generated by EPICURE at September 5, 2014 3:10:02 pm

vers 0 2

rp1985 rp2012

city 0 2

Hiro Naga

sex 0 2

Male Female

yrcat 1 11

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 2020

agxcat 1 15

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 120

The output from the SETLOOK command with the SHOW option for the edited header file is:

5-way Table: Residence probability table generated by EPICURE at September 5, 2014 3:10:02 pm

vers with 2 categories:

1> rp1985 rp2012

city with 2 categories:

1> Hiro Naga

sex with 2 categories:

1> Male Female

yrcat with 11 intervals:

1950.0 1955.0 1960.0 1965.0 1970.0 1975.0 1980.0

1985.0 1990.0 1995.0 2000.0 2020.0

agxcat with 15 intervals:

0.0 5.0 10.0 15.0 20.0 25.0 30.0

35.0 40.0 45.0 50.0 55.0 60.0 65.0

70.0 120.0

city as city

sex as sex

yrcat as yrcat

agxcat as agxcat