CLUSTER command

Concept Information

Clustering data

Groups records into clusters based on similar values in one or more numeric fields. Clusters can be uni-dimensional or multidimensional.

Note

The CLUSTER command is not supported if you are running Analytics on a 32-bit computer. The computation required by the command is processor-intensive and better suited to 64-bit computers.

Syntax

CLUSTER ON key_field <...n> KVALUE number_of_clusters ITERATIONS number_of_iterations INITIALIZATIONS number_of_initializations <SEED seed_value> <OTHER field <...n>|OTHER ALL> TO table_name <IF test> <WHILE test> <FIRST range|NEXT range> <OPEN> {no_keyword|NOCENTER|NOSCALE}

Parameters

Name Description
ON key_field <...n>

One or more numeric fields to cluster. Multiple fields must be separated by spaces.

KVALUE number_of_clusters

The number of clusters generated in the output results.

For more information, see Choosing the number of clusters (K value).

ITERATIONS number_of_iterations

The maximum number of times the cluster calculation is re-performed.

INITIALIZATIONS number_of_initializations

The number of times to generate an initial set of random centroids.

SEED seed_value

optional

The seed value to use to initialize the random number generator in Analytics.

If you omit SEED, Analytics randomly selects the seed value.
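The way KVALUE, ITERATIONS, INITIALIZATIONS, and SEED interact can be illustrated with a short Python sketch. This is not Analytics code. It assumes standard k-means (Lloyd's algorithm), and it assumes the best of several initializations is the one with the lowest within-cluster sum of squared distances. The names kmeans and sq_dist are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations, initializations, seed=None):
    """Cluster a list of numeric tuples; returns (labels, centroids).

    k               ~ KVALUE (number of clusters)
    iterations      ~ ITERATIONS (cap on recalculations, assumed >= 1)
    initializations ~ INITIALIZATIONS (number of random starts)
    seed            ~ SEED (None lets Python pick, like omitting SEED)
    """
    rng = random.Random(seed)  # same seed -> same sequence of random starts
    best = None
    for _ in range(initializations):
        # Assumption: initial centroids are k records picked at random.
        centroids = rng.sample(points, k)
        labels = None
        for _ in range(iterations):
            # Assign every record to its nearest centroid.
            new_labels = [
                min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments stopped changing before the cap
            labels = new_labels
            # Move each centroid to the mean of its assigned records.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = tuple(
                        sum(col) / len(members) for col in zip(*members)
                    )
        # Assumption: keep the initialization with the lowest total error.
        sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or sse < best[0]:
            best = (sse, labels, list(centroids))
    return best[1], best[2]
```

In this sketch, repeating a run with the same seed reproduces the same clusters, while omitting the seed can produce a different result on each run.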

OTHER field <...n> | OTHER ALL

optional

One or more additional fields to include in the output.

  • OTHER field <...n>: include the specified field or fields

    Fields are included in the order that you list them.

  • OTHER ALL: include all fields in the table

    Fields are included in the order that they appear in the table layout.

Note

Key fields are automatically included in the output table, although the values are scaled unless you specify NOSCALE. You can use OTHER to include a second, unscaled instance of a key field or fields.

TO table_name

The location to send the results of the command to:

  • table_name: saves the results to an Analytics table

    Specify table_name as a quoted string with a .FIL file extension. For example: TO "Output.FIL"

    By default, the table data file (.FIL) is saved to the folder containing the Analytics project.

    Use either an absolute or relative file path to save the data file to a different, existing folder:

    • TO "C:\Output.FIL"
    • TO "Results\Output.FIL"

    Note

    Table names are limited to 64 alphanumeric characters, not including the .FIL extension. The name can include the underscore character ( _ ), but no other special characters, or any spaces. The name cannot start with a number.
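The naming rules in the note above can be checked before running a script. The following Python sketch is an illustration only, not part of Analytics. It assumes "alphanumeric" means ASCII letters and digits, it assumes a leading underscore is allowed because the note does not prohibit it, and it checks the name alone, not a folder path. The function name is_valid_table_name is hypothetical.

```python
import re

# Assumption: first character is a letter or underscore (never a digit),
# followed by up to 63 ASCII letters, digits, or underscores (64 in total).
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")

def is_valid_table_name(name):
    """Return True if name (with or without .FIL) follows the stated rules."""
    # The .FIL extension does not count toward the 64-character limit.
    base = name[:-4] if name.upper().endswith(".FIL") else name
    return _TABLE_NAME.fullmatch(base) is not None
```

For example, a name such as "Clustered_invoices" passes this check, while names that contain a space or start with a digit do not.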

IF test

optional

A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition.

Note

The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT).

WHILE test

optional

A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached.

Note

If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached.

FIRST range | NEXT range

optional

The number of records to process:

  • FIRST: start processing from the first record until the specified number of records is reached
  • NEXT: start processing from the currently selected record until the specified number of records is reached

Use range to specify the number of records to process.

If you omit FIRST and NEXT, all records are processed by default.
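The interaction between the scope parameters (WHILE, FIRST) and IF, as described in the notes above, can be sketched in Python. This is an illustration of the stated rules rather than Analytics code. The function select_records and its arguments are hypothetical, and NEXT is omitted because it depends on the currently selected record.

```python
def select_records(records, if_test=None, while_test=None, first=None):
    """Return the records a command would process under the stated rules."""
    selected = []
    for position, record in enumerate(records, start=1):
        # Scope limits are applied first; processing stops at whichever
        # limit is reached first.
        if first is not None and position > first:
            break  # FIRST: the specified number of records was reached
        if while_test is not None and not while_test(record):
            break  # WHILE: stop at the first record that fails the test
        # IF filters only the records that remain within the scope.
        if if_test is None or if_test(record):
            selected.append(record)
    return selected
```

Note the difference the sketch makes visible: a record that fails the IF test is skipped and processing continues, whereas a record that fails the WHILE test ends processing entirely.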

OPEN

optional

Opens the table created by the command after the command executes. Only valid if the command creates an output table.

no_keyword | NOCENTER | NOSCALE

The method for preprocessing key field numeric values before calculating the clusters.

  • no_keyword: center key field values on a mean of zero (0) and scale them by dividing by their standard deviation. This converts the values to their z-score equivalent (standard score).
  • NOCENTER: scale key field values by dividing by their standard deviation, but do not center them on a mean of zero (0)
  • NOSCALE: use the raw key field values, uncentered and unscaled

For more information, see Specify a data preprocessing method.
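The three preprocessing options can be expressed as a short Python sketch. It is an illustration only. The source does not say whether Analytics uses the sample or the population standard deviation, so the sample standard deviation used here is an assumption, and the function name preprocess is hypothetical.

```python
from statistics import mean, stdev

def preprocess(values, mode="default"):
    """Apply one preprocessing method to a list of key field values.

    mode "default"  ~ no keyword: center on the mean, divide by std dev
    mode "NOCENTER" ~ divide by std dev only
    mode "NOSCALE"  ~ leave the raw values unchanged
    """
    if mode == "NOSCALE":
        return list(values)
    sd = stdev(values)  # assumption: sample standard deviation
    if mode == "NOCENTER":
        return [v / sd for v in values]
    mu = mean(values)
    return [(v - mu) / sd for v in values]  # z-scores
```

Scaling matters most when clustering on several key fields with very different ranges, because otherwise the field with the largest values dominates the distance calculation.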

Examples

Clustering on invoice amount

In addition to stratifying an accounts receivable table on the Invoice_Amount field, you also decide to cluster on the same field.

  • Stratifying groups the amounts into strata with predefined numeric boundaries – for example, $1000 intervals.
  • Clustering discovers any organic groupings of amounts that exist in the data without requiring that you decide on numeric boundaries in advance.

OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices" NOSCALE

As a quick way of discovering how many records are contained in each output cluster, you classify the Clustered_invoices output table on the Cluster field.

OPEN Clustered_invoices
CLASSIFY ON Cluster TO SCREEN
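Conceptually, the classify step above is a count of records per value of the Cluster field. A minimal Python sketch of that count follows. It assumes the cluster labels have already been read into a list, and the function name records_per_cluster is hypothetical.

```python
from collections import Counter

def records_per_cluster(cluster_labels):
    """Count how many records fall into each cluster, ordered by label."""
    return dict(sorted(Counter(cluster_labels).items()))
```

A very uneven distribution, such as one cluster holding almost all records, can be a hint to try a different K value.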

Remarks

For more information about how this command works, see Clustering data.