CLUSTER command
Groups records into clusters based on similar values in one or more numeric fields. Clusters can be uni-dimensional or multidimensional.
Note
The CLUSTER command is not supported if you are running Analytics on a 32-bit computer. The computation required by the command is processor-intensive and better suited to 64-bit computers.
Syntax
CLUSTER ON key_field <...n> KVALUE number_of_clusters ITERATIONS number_of_iterations INITIALIZATIONS number_of_initializations <SEED seed_value> <OTHER field <...n>|OTHER ALL> TO table_name <IF test> <WHILE test> <FIRST range|NEXT range> <OPEN> {no_keyword|NOCENTER|NOSCALE}
Parameters
Name | Description |
---|---|
ON key_field <...n> | One or more numeric fields to cluster. Multiple fields must be separated by spaces. |
KVALUE number_of_clusters | The number of clusters generated in the output results. For more information, see Choosing the number of clusters (K value). |
ITERATIONS number_of_iterations | The maximum number of times the cluster calculation is re-performed. |
INITIALIZATIONS number_of_initializations | The number of times to generate an initial set of random centroids. |
SEED seed_value (optional) | The seed value used to initialize the random number generator in Analytics. If you omit SEED, Analytics randomly selects the seed value. |
OTHER field <...n> \| OTHER ALL (optional) | One or more additional fields to include in the output. Note: Key fields are automatically included in the output table, although the values are scaled unless you specify NOSCALE. You can use OTHER to include a second, unscaled instance of a key field or fields. |
TO table_name | The table that receives the results of the command. |
IF test (optional) | A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition. Note: The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT). |
WHILE test (optional) | A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached. Note: If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached. |
FIRST range \| NEXT range (optional) | The number of records to process, specified by range. If you omit FIRST and NEXT, all records are processed by default. |
OPEN (optional) | Opens the table created by the command after the command executes. Only valid if the command creates an output table. |
no_keyword \| NOCENTER \| NOSCALE | The method for preprocessing key field numeric values before calculating the clusters. For more information, see Specify a data preprocessing method. |
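
The three preprocessing choices can be pictured with a short Python sketch (Python rather than ACLScript, so that it runs outside Analytics). It assumes that omitting the keyword centers each key field on its mean and divides by its standard deviation, that NOCENTER only divides by the standard deviation, and that NOSCALE leaves the values untouched. This page does not state the exact formulas Analytics uses (for example, sample versus population standard deviation), so treat `preprocess` as a hypothetical helper and not as the product's implementation.

```python
from statistics import mean, pstdev

def preprocess(values, method="default"):
    """Hypothetical helper mirroring the assumed meaning of the three options.

    method: "default" (center and scale), "NOCENTER" (scale only),
            or "NOSCALE" (raw values).
    """
    if method == "NOSCALE":
        return list(values)
    spread = pstdev(values)  # assumption: population standard deviation
    if spread == 0:
        raise ValueError("all values are identical; cannot scale")
    if method == "NOCENTER":
        return [v / spread for v in values]
    if method == "default":
        center = mean(values)
        return [(v - center) / spread for v in values]
    raise ValueError(f"unknown method: {method}")

amounts = [100.0, 200.0, 300.0, 400.0, 500.0]  # made-up sample values
for method in ("default", "NOCENTER", "NOSCALE"):
    print(method, [round(v, 3) for v in preprocess(amounts, method)])
```

Scaling matters most when you cluster on several key fields at once: without it, a field with a much larger numeric range tends to dominate the distance calculation.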
Examples
Clustering on invoice amount
In addition to stratifying an accounts receivable table on the Invoice_Amount field, you also decide to cluster on the same field.
- Stratifying groups the amounts into strata with predefined numeric boundaries – for example, $1000 intervals.
- Clustering discovers any organic groupings of amounts that exist in the data without requiring that you decide on numeric boundaries in advance.
OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices" NOSCALE
As a quick way of discovering how many records are contained in each output cluster, you classify the Clustered_invoices output table on the Cluster field.
OPEN Clustered_invoices
CLASSIFY ON Cluster TO SCREEN
Remarks
For more information about how this command works, see Clustering data.
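
As a rough mental model for how KVALUE, ITERATIONS, INITIALIZATIONS, and SEED interact, the following Python sketch runs a textbook k-means on a single numeric field. It assumes that CLUSTER behaves like standard k-means, with each initialization starting from randomly chosen centroids and the lowest-cost run being kept. This page does not describe the actual Analytics implementation, and the function names and sample amounts are hypothetical.

```python
import random

def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))

def kmeans_1d(values, k, iterations, initializations, seed=None):
    """Textbook k-means on one numeric field (illustrative only)."""
    rng = random.Random(seed)              # SEED: a fixed seed gives repeatable runs
    best = None
    for _ in range(initializations):       # INITIALIZATIONS: independent random starts
        centroids = rng.sample(values, k)  # pick k records as starting centroids
        for _ in range(iterations):        # ITERATIONS: cap on refinement passes
            labels = [nearest(v, centroids) for v in values]
            updated = []
            for c in range(k):
                members = [v for v, lab in zip(values, labels) if lab == c]
                # An empty cluster keeps its previous centroid.
                updated.append(sum(members) / len(members) if members else centroids[c])
            if updated == centroids:       # converged before reaching the cap
                break
            centroids = updated
        # Final assignment and total squared distance for this start.
        labels = [nearest(v, centroids) for v in values]
        cost = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
        if best is None or cost < best[0]:
            best = (cost, centroids, labels)
    return best[1], best[2]

amounts = [12.0, 15.0, 14.0, 210.0, 205.0, 198.0, 990.0, 1010.0, 1005.0]  # made-up data
centroids, labels = kmeans_1d(amounts, k=3, iterations=30, initializations=10, seed=42)
print(sorted(round(c, 2) for c in centroids))
print(labels)
```

In this model, more initializations reduce the chance of settling on a poor grouping produced by an unlucky random start, while the iteration limit only bounds how long each start is refined. Reusing the same seed reproduces the same clusters on the same data.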