CLUSTER command

Groups records into clusters based on similar values in one or more numeric fields. Clusters can be uni-dimensional or multidimensional.

Note

The CLUSTER command is not supported if you are running Analytics on a 32-bit computer. The computation required by the command is processor-intensive and better suited to 64-bit computers.

Syntax

CLUSTER ON key_field <...n> KVALUE number_of_clusters ITERATIONS number_of_iterations INITIALIZATIONS number_of_initializations <SEED seed_value> <OTHER field <...n>|OTHER ALL> TO table_name <IF test> <WHILE test> <FIRST range|NEXT range> OPEN {no_keyword|NOCENTER|NOSCALE}

Parameters

Name	Description
ON key_field <...n>	One or more numeric fields to cluster. Multiple fields must be separated by spaces.
KVALUE number_of_clusters	The number of clusters generated in the output results. For more information, see Choosing the number of clusters (K value).
ITERATIONS number_of_iterations	The maximum number of times the cluster calculation is repeated.
INITIALIZATIONS number_of_initializations	The number of times to generate an initial set of random centroids.
SEED seed_value optional	The seed value to use to initialize the random number generator in Analytics. If you omit SEED, Analytics randomly selects the seed value.
OTHER field <...n> | OTHER ALL optional	One or more additional fields to include in the output. OTHER field <...n>: include the specified field or fields. Fields are included in the order that you list them. OTHER ALL: include all fields in the table. Fields are included in the order that they appear in the table layout. Note: Key fields are automatically included in the output table, although the values are scaled unless you specify NOSCALE. You can use OTHER to include a second, unscaled instance of a key field or fields.
TO table_name	The location to send the results of the command to. table_name: saves the results to an Analytics table. Specify table_name as a quoted string with a .FIL file extension, for example: TO "Output.FIL". By default, the table data file (.FIL) is saved to the folder containing the Analytics project. To save the data file to a different, existing folder, use either an absolute or relative file path: TO "C:\Output.FIL" or TO "Results\Output.FIL". Note: Table names are limited to 64 alphanumeric characters, not including the .FIL extension. The name can include the underscore character ( _ ), but no other special characters or spaces. The name cannot start with a number.
IF test optional	A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition. Note: The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT).
WHILE test optional	A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached. Note: If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached.
FIRST range | NEXT range optional	The number of records to process. FIRST: start processing from the first record until the specified number of records is reached. NEXT: start processing from the currently selected record until the specified number of records is reached. Use range to specify the number of records to process. If you omit FIRST and NEXT, all records are processed by default.
OPEN optional	Opens the table created by the command after the command executes. Only valid if the command creates an output table.
no_keyword | NOCENTER | NOSCALE	The method for preprocessing key field numeric values before calculating the clusters. no_keyword: center key field values on a mean of zero (0), and scale them by dividing by their standard deviation, a process that converts the values to their z-score equivalent (standard score). NOCENTER: scale key field values by dividing by their standard deviation, but do not center them on a mean of zero (0). NOSCALE: use the raw key field values, uncentered and unscaled. For more information, see Specify a data preprocessing method.
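
The three preprocessing options can be illustrated outside Analytics with a short Python sketch. It is not Analytics code: the function name preprocess and the "default" label for no_keyword are hypothetical, and the use of the population standard deviation is an assumption, because this topic does not say which form Analytics applies.

```python
from statistics import mean, pstdev

def preprocess(values, method="default"):
    """Sketch of the preprocessing options for a single key field.

    method: "default" (no keyword: center and scale), "NOCENTER"
    (scale only), or "NOSCALE" (raw values).
    Assumption: population standard deviation; Analytics may differ.
    """
    if method not in ("default", "NOCENTER", "NOSCALE"):
        raise ValueError(f"unknown method: {method}")
    if method == "NOSCALE":
        return list(values)  # raw values, uncentered and unscaled
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("standard deviation is zero; values cannot be scaled")
    # Center on the mean only when no keyword is given
    center = mean(values) if method == "default" else 0.0
    return [(v - center) / sd for v in values]

amounts = [100.0, 200.0, 300.0]
print(preprocess(amounts))              # centered and scaled (z-scores)
print(preprocess(amounts, "NOCENTER"))  # scaled, not centered
print(preprocess(amounts, "NOSCALE"))   # unchanged
```

With the default method the results have a mean of zero and a standard deviation of one. With NOCENTER the spread is the same but the values keep their sign and relative offset from zero.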

Examples

Clustering on invoice amount

In addition to stratifying an accounts receivable table on the Invoice_Amount field, you also decide to cluster on the same field.

Stratifying groups the amounts into strata with predefined numeric boundaries – for example, $1000 intervals.
Clustering discovers any organic groupings of amounts that exist in the data without requiring that you decide on numeric boundaries in advance.

OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices" NOSCALE

As a quick way of discovering how many records are contained in each output cluster, you classify the Clustered_invoices output table on the Cluster field.

OPEN Clustered_invoices
CLASSIFY ON Cluster TO SCREEN

Remarks

For more information about how this command works, see Clustering data.
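
The parameter names (KVALUE, centroids, ITERATIONS, INITIALIZATIONS, SEED) are consistent with a k-means-style procedure. The Python sketch below assumes that reading and shows how the parameters could interact for a single key field. It is an illustration only, not the Analytics implementation, and the names assign and cluster_1d are hypothetical.

```python
import random

def assign(values, centroids):
    """Label each value with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: (v - centroids[c]) ** 2)
            for v in values]

def cluster_1d(values, k, iterations, initializations, seed=None):
    """Return (centroids, labels) for the best of several random starts."""
    if initializations < 1:
        raise ValueError("initializations must be at least 1")
    rng = random.Random(seed)             # plays the role of SEED
    best = None
    for _ in range(initializations):      # one pass per set of random centroids
        centroids = rng.sample(values, k)  # k plays the role of KVALUE
        for _ in range(iterations):       # upper bound on recalculation
            labels = assign(values, centroids)
            updated = []
            for c in range(k):
                members = [v for v, lab in zip(values, labels) if lab == c]
                # An empty cluster keeps its previous centroid
                updated.append(sum(members) / len(members) if members else centroids[c])
            if updated == centroids:      # nothing moved, so stop early
                break
            centroids = updated
        labels = assign(values, centroids)
        # Total squared distance of the values from their centroids
        cost = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
        if best is None or cost < best[0]:
            best = (cost, centroids, labels)
    return best[1], best[2]

amounts = [1.0, 2.0, 3.0, 100.0, 101.0, 102.0]
centroids, labels = cluster_1d(amounts, k=2, iterations=30,
                               initializations=10, seed=1)
print(sorted(centroids), labels)
```

Under these assumptions, a fixed seed makes the result reproducible, more initializations reduce the chance of settling on a poor starting set of centroids, and the iteration count caps the work done for each start.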