Sabre

Using Sabre: Examining and manipulating data

Raw data can be displayed in SABRE in two ways: with the LOOK command and with the HISTOGRAM command. The FACTOR command converts variables into factors, while the TRANSFORM command is used both for data transformations and for creating interaction terms.

The LOOK command allows printing of up to six variables. For example, to print variables A and B type

LOOK a b

This will print observations in blocks of twenty, with the user asked at the end of each block whether to continue printing. If a specific range of observations is required, the start and end observation numbers can be added as arguments. Thus, to print observations 28 to 67 type

LOOK a b 28 67

Note that, for factor and pseudo factor (see below) arguments, the printed value is the (pseudo) factor level.

The HISTOGRAM command draws a histogram and prints group frequencies. The command can be issued for both variables and factors. With variables, an optional argument can be added to specify the maximum number of groups (ie, bars) in the histogram. The actual number of bars printed may be fewer than this argument because groups with zero frequency are suppressed. Thus, to draw a histogram with at most nine groups for variable A type

HISTOGRAM a 9

If a factor (see below) is used as the argument, group specification is not allowed. However, in general, better histograms are obtained by using the factor rather than the variable from which the factor was produced.

The FACTOR command produces a series of dummy variables, each of which indicates the presence or absence of one level of the factor. There can be up to 100 factor levels. Variables can be factorised in two ways. If the variable takes only a limited set of discrete values, the command is simply

FACTOR var fac

where var is the name of the variable to be factorised and fac is the name of the resulting factor. If the variable is continuous, n-1 cut-points can be specified to divide its range into n levels. The form of the command is then

FACTOR var fac [n-1 cut-points]

For example, to create a factor (afac) which indicates whether a variable (bvar) is less than, or greater than or equal to 10, the command is

FACTOR bvar afac 10

and would result in

bvar        afac
 3          1  0
 9  mapped  1  0
12    to    0  1
 8          1  0

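If a variable element is equal to a cut-point, the element is placed in the higher of the two groups separated by that cut-point. In the example above, a bvar value of exactly 10 would therefore be coded 0 1.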
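The cut-point rule can be illustrated outside SABRE with a short Python sketch. This is not SABRE code and not its implementation; the helper name factorise is hypothetical, and the sketch only assumes the behaviour described in the text (n-1 cut-points give n levels, and a value equal to a cut-point goes to the higher group).

```python
from bisect import bisect_right

def factorise(values, cut_points):
    """Return one row of 0/1 dummies per value.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour of FACTOR with cut-points and is not part of SABRE.
    """
    cuts = sorted(cut_points)
    n_levels = len(cuts) + 1  # n-1 cut-points give n levels
    rows = []
    for v in values:
        # bisect_right counts the cut-points <= v, so a value equal to a
        # cut-point falls into the higher group.
        level = bisect_right(cuts, v)
        rows.append([1 if k == level else 0 for k in range(n_levels)])
    return rows

bvar = [3, 9, 12, 8]
afac = factorise(bvar, [10])
for value, dummies in zip(bvar, afac):
    print(value, dummies)
```

With the single cut-point 10 this reproduces the two-column table for bvar and afac shown in the example.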

Data transformations may be performed using the TRANSFORM command. The only permitted functional transformations are exponentiation and natural logarithm. Variables can also be raised to a constant power or acted upon by the arithmetic operators multiplication, division, addition and subtraction. In the case of arithmetic manipulation, the post-operator argument may be either another variable or a constant (eg, both A + B and A + 2 are valid expressions). Although the command allows only a single transforming operation at a time, compound transformations may be dealt with simply by issuing a sequence of related TRANSFORM commands. For example, given variables A, B and C, the transformed variable

D = exp((A**(1/4) - loge(B))/(3*C**2 + 1))

could be derived via the following series of instructions

TRANSFORM var1 a ^ 0.25
TRANSFORM var2 log b
TRANSFORM var3 var1 - var2
TRANSFORM var4 c ^ 2
TRANSFORM var5 var4 * 3
TRANSFORM var6 var5 + 1
TRANSFORM var7 var3 / var6
TRANSFORM d exp var7
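As a check on the decomposition, the following Python sketch mirrors the eight TRANSFORM steps one operation at a time and compares the result with the formula evaluated directly. It is an illustration only, not SABRE code; the function names and the sample values are assumptions made for the example.

```python
import math

def transform_sequence(a, b, c):
    """Mirror the eight single-operation steps, one per line."""
    var1 = a ** 0.25        # TRANSFORM var1 a ^ 0.25
    var2 = math.log(b)      # TRANSFORM var2 log b (natural logarithm)
    var3 = var1 - var2      # TRANSFORM var3 var1 - var2
    var4 = c ** 2           # TRANSFORM var4 c ^ 2
    var5 = var4 * 3         # TRANSFORM var5 var4 * 3
    var6 = var5 + 1         # TRANSFORM var6 var5 + 1
    var7 = var3 / var6      # TRANSFORM var7 var3 / var6
    return math.exp(var7)   # TRANSFORM d exp var7

def direct(a, b, c):
    """Evaluate the compound expression for D in a single step."""
    return math.exp((a ** 0.25 - math.log(b)) / (3 * c ** 2 + 1))

# Arbitrary sample values, chosen only for illustration (b must be positive).
a, b, c = 16.0, 2.0, 1.5
print(transform_sequence(a, b, c), direct(a, b, c))
```

The two printed values should agree up to floating-point rounding, which confirms that the sequence of single operations computes the same quantity as the compound expression.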

The TRANSFORM command may also be used to create interactions between variables and/or factors. A variable by variable interaction is formed simply by multiplying together the corresponding values of each of the two variables. For example, to create an interaction term cint from variables avar and bvar, the command is

TRANSFORM cint avar * bvar

and would result in

avar bvar        cint
 3   10          30
 9    5  mapped  45
12    2    to    24
 8    6          48

A variable by factor (or factor by variable) interaction is formed by replacing the unit element in the factor by the variable value while keeping all the remaining elements at zero. For example, to create an interaction term cint from a variable avar and a 3-level factor bfac, the command is

TRANSFORM cint avar * bfac

and would result in

avar   bfac           cint
 3     0 1 0          0 3  0
 9     1 0 0  mapped  9 0  0
12     0 0 1    to    0 0 12
 8     0 1 0          0 8  0

Note that the interaction term cint is neither a variable (it has more than one level) nor a factor (it doesn't consist of just 0's and 1's). However, it is similar to a factor in the sense that it contains only a single non-zero entry. For this reason, we refer to such structures as pseudo factors; ie, an interaction between a variable and an n-level factor produces an n-level pseudo factor.
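The construction of a pseudo factor can be sketched in Python as follows. This is an illustration of the rule described above, not SABRE's implementation; the function name variable_by_factor is hypothetical, and the factor is represented as a list of 0/1 rows.

```python
def variable_by_factor(variable, factor):
    """Replace the unit element of each factor row by the variable value.

    Hypothetical helper for illustration; multiplying every dummy by the
    variable value leaves the zeros untouched and puts the value where
    the 1 was.
    """
    return [[v * f for f in row] for v, row in zip(variable, factor)]

avar = [3, 9, 12, 8]
bfac = [[0, 1, 0],
        [1, 0, 0],
        [0, 0, 1],
        [0, 1, 0]]

cint = variable_by_factor(avar, bfac)
for row in cint:
    print(row)
```

Each output row has the same number of levels as bfac and at most one non-zero entry, matching the table for avar, bfac and cint.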

An m-level factor by n-level factor interaction forms an mn-level factor. If the unit elements of the original factors are at indices i and j respectively, then index (i-1)n + j of the new factor will be the unit element. For example, to create an interaction term cint from a 4-level factor afac and a 3-level factor bfac, the command is

TRANSFORM cint afac * bfac

and would result in

afac     bfac           cint
0 1 0 0  1 0 0          0 0 0 1 0 0 0 0 0 0 0 0
0 0 0 1  0 1 0  mapped  0 0 0 0 0 0 0 0 0 0 1 0
1 0 0 0  0 0 1    to    0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0  0 1 0          0 0 0 0 0 0 0 1 0 0 0 0

As noted above (with m=4 and n=3), cint is indeed a 12-level factor.
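The index rule (i-1)n + j can be checked against the example with a short Python sketch. It is an illustration only, not SABRE code; the function name factor_by_factor is hypothetical, and indices are 1-based to match the text.

```python
def factor_by_factor(afac, bfac):
    """Combine an m-level and an n-level factor into an mn-level factor.

    Hypothetical helper for illustration. Each input row is assumed to
    contain exactly one 1.
    """
    rows = []
    for a_row, b_row in zip(afac, bfac):
        m, n = len(a_row), len(b_row)
        i = a_row.index(1) + 1       # 1-based position of the unit element in afac
        j = b_row.index(1) + 1       # 1-based position of the unit element in bfac
        target = (i - 1) * n + j     # 1-based position in the new factor
        rows.append([1 if k == target else 0 for k in range(1, m * n + 1)])
    return rows

afac = [[0, 1, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 0, 1, 0]]
bfac = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0]]

cint = factor_by_factor(afac, bfac)
for row in cint:
    print(row.index(1) + 1, row)
```

For the four observations the unit element lands at positions 4, 11, 3 and 8 respectively, in agreement with the table for afac, bfac and cint.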
