CLUSTER command

Groups records into clusters based on similar values in one or more numeric fields. Clusters can be unidimensional (based on one key field) or multidimensional (based on several key fields).

Syntax

CLUSTER ON key_field <...n> KVALUE number_of_clusters ITERATIONS number_of_iterations INITIALIZATIONS number_of_initializations <SEED seed_value> <OTHER field <...n>> TO table_name <IF test> <WHILE test> <FIRST range|NEXT range> <OPEN> {no_keyword|NOCENTER|NOSCALE}

Parameters

Name Description
ON key_field <...n>

One or more numeric fields to cluster. Multiple fields must be separated by spaces.

KVALUE number_of_clusters

The number of clusters generated in the output results.

ITERATIONS number_of_iterations

The maximum number of times the cluster calculation is re-performed.

INITIALIZATIONS number_of_initializations

The number of times to generate an initial set of random centroids.

SEED seed_value

optional

The seed value to use to initialize the random number generator in Analytics.

If you omit SEED, Analytics randomly selects the seed value.
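
As a sketch of how SEED might be used, the following reuses the Ar table and Invoice_Amount field from the Examples section. The seed value 12345 and the output table name are hypothetical, chosen only for illustration. Because the seed is specified, the random number generator is initialized with the same value each time the command runs, which is the usual way to aim for repeatable results on unchanged data.

```
COMMENT Hypothetical seed value and output table name, for illustration only
OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 SEED 12345 TO "Clustered_invoices_seeded.FIL"
```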

OTHER field <...n>

optional

One or more additional fields to include in the output.

Note

Key fields are automatically included in the output table, and do not need to be specified using OTHER.

TO table_name

The location to send the results of the command to:

  • table_name saves the results to an Analytics table

    Specify table_name as a quoted string with a .FIL file extension. For example: TO "Output.FIL"

    By default, the table data file (.FIL) is saved to the folder containing the Analytics project.

    Use either an absolute or relative file path to save the data file to a different, existing folder:

    • TO "C:\Output.FIL"
    • TO "Results\Output.FIL"

    Note

    Table names are limited to 64 alphanumeric characters, not including the .FIL extension. The name can include the underscore character ( _ ), but no other special characters, or any spaces. The name cannot start with a number.

IF test

optional

A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition.

Note

The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT).

WHILE test

optional

A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached.

Note

If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached.

FIRST range | NEXT range

optional

The number of records to process:

  • FIRST: start processing from the first record until the specified number of records is reached
  • NEXT: start processing from the currently selected record until the specified number of records is reached

Use range to specify the number of records to process.

If you omit FIRST and NEXT, all records are processed by default.
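
The interaction between the scope parameters and IF can be illustrated with a short sketch that follows the parameter order shown in the Syntax section. The record count of 1000, the filter expression, and the output table name are hypothetical. As the note under IF describes, FIRST 1000 limits processing to the first 1000 records, and the IF test is then evaluated against only those records, so fewer than 1000 records may end up being clustered.

```
COMMENT Hypothetical record count, filter, and output table name, for illustration only
OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 TO "Clustered_first_1000.FIL" IF Invoice_Amount > 0 FIRST 1000
```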

OPEN

optional

Opens the table created by the command after the command executes. Only valid if the command creates an output table.

no_keyword | NOCENTER | NOSCALE

The method for standardizing key field numeric values.

  • no_keyword: center key field values around zero (0), and scale the values to unit variance when calculating the clusters
  • NOCENTER: scale key field values to unit variance when calculating the clusters, but do not center the values around zero (0)
  • NOSCALE: use the raw key field values, unscaled, when calculating the clusters
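
The standardization choice tends to matter most when clustering on several fields measured on very different scales. The sketch below runs the same multidimensional clustering twice, once with the default standardization and once with NOSCALE, so the two output tables can be compared. Days_Overdue is a hypothetical numeric field assumed to exist in the Ar table, and the KVALUE and output table names are likewise illustrative. With NOSCALE, the field with the larger numeric range would typically have more influence on which records are grouped together.

```
COMMENT Days_Overdue is a hypothetical numeric field, assumed here for illustration only
OPEN Ar
CLUSTER ON Invoice_Amount Days_Overdue KVALUE 4 ITERATIONS 30 INITIALIZATIONS 10 TO "Clustered_default.FIL"
CLUSTER ON Invoice_Amount Days_Overdue KVALUE 4 ITERATIONS 30 INITIALIZATIONS 10 TO "Clustered_noscale.FIL" NOSCALE
```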

Examples

Clustering on invoice amount

In addition to stratifying an accounts receivable table on the Invoice_Amount field, you decide to cluster on the same field.

  • Stratifying groups the amounts into strata with predefined numeric boundaries – for example, $1000 intervals.
  • Clustering discovers any organic groupings of amounts that exist in the data without requiring that you decide on numeric boundaries in advance.

OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices" NOSCALE

As a quick way of discovering how many records are contained in each output cluster, you classify the Clustered_invoices output table on the Cluster field.

OPEN Clustered_invoices
CLASSIFY ON Cluster TO SCREEN
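
Opening the output table as part of the command

As a variation, the OPEN parameter can be added so that the output table is opened as soon as the command finishes, which removes the need for a separate OPEN command before classifying. This is a sketch that assumes the parameter order shown in the Syntax section, with OPEN placed before NOSCALE. The output table name is hypothetical.

```
COMMENT Hypothetical output table name, for illustration only
OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices_opened.FIL" OPEN NOSCALE
CLASSIFY ON Cluster TO SCREEN
```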

Remarks

Note

For more information about how this command works, see the Analytics Help.

ACL Scripting Guide 14.1