CLUSTER command
Groups records into clusters based on similar values in one or more numeric fields. Clusters can be uni-dimensional or multidimensional.
Syntax
CLUSTER ON key_field <...n> KVALUE number_of_clusters ITERATIONS number_of_iterations INITIALIZATIONS number_of_initializations <SEED seed_value> <OTHER field <...n>> TO table_name <IF test> <WHILE test> <FIRST range|NEXT range> <OPEN> {no_keyword|NOCENTER|NOSCALE}
Parameters
Name | Description |
---|---|
ON key_field <...n> | One or more numeric fields to cluster. Multiple fields must be separated by spaces. |
KVALUE number_of_clusters | The number of clusters generated in the output results. |
ITERATIONS number_of_iterations | The maximum number of times the cluster calculation is re-performed. |
INITIALIZATIONS number_of_initializations | The number of times to generate an initial set of random centroids. |
SEED seed_value (optional) | The seed value to use to initialize the random number generator in Analytics. If you omit SEED, Analytics randomly selects the seed value. |
OTHER field <...n> (optional) | One or more additional fields to include in the output. Note: Key fields are automatically included in the output table, and do not need to be specified using OTHER. |
TO table_name | The output table to send the results of the command to. |
IF test (optional) | A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition. Note: The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT). |
WHILE test (optional) | A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached. Note: If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached. |
FIRST range \| NEXT range (optional) | The number of records to process. FIRST starts processing from the first record; NEXT starts processing from the currently selected record. Use range to specify the number of records to process. If you omit FIRST and NEXT, all records are processed by default. |
OPEN (optional) | Opens the table created by the command after the command executes. Only valid if the command creates an output table. |
no_keyword \| NOCENTER \| NOSCALE | The method for standardizing key field numeric values. |
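The parameter descriptions above (a fixed number of clusters, random initial centroids, a cap on recalculations, a seed) read like a k-means style procedure. The following is a minimal Python sketch of how such parameters could interact. It is not Analytics code and not necessarily how the command is implemented. Every function name is hypothetical, and two things are assumptions rather than documented behavior: the mapping of `NOCENTER` and `NOSCALE` to the `center` and `scale` flags, and the rule of keeping the restart with the lowest total squared distance.

```python
import random
import statistics


def standardize(column, center=True, scale=True):
    """Standardize one numeric column (assumed meaning of the keywords).

    center=True,  scale=True   -> assumed default (no keyword)
    center=False, scale=True   -> assumed NOCENTER
    center=False, scale=False  -> assumed NOSCALE (raw values)
    """
    mean = statistics.fmean(column) if center else 0.0
    spread = statistics.pstdev(column) if scale else 1.0
    if spread == 0:
        spread = 1.0  # constant column: avoid division by zero
    return [(value - mean) / spread for value in column]


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members; empty ones stay put."""
    moved = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        moved.append(tuple(sum(dim) / len(members) for dim in zip(*members))
                     if members else old)
    return moved


def kmeans_once(points, k, max_iterations, rng):
    """One run from a random starting set of k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        moved = update(points, assign(points, centroids), centroids)
        if moved == centroids:  # converged before reaching the iteration cap
            break
        centroids = moved
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


def cluster(points, k, iterations, initializations, seed=None):
    """Run several random restarts and keep the lowest-inertia one (assumed rule)."""
    rng = random.Random(seed)  # seed=None lets Python choose the seed
    runs = [kmeans_once(points, k, iterations, rng)
            for _ in range(initializations)]
    return min(runs, key=lambda run: run[2])


# Made-up invoice amounts, one key field (uni-dimensional clustering).
amounts = [120.0, 135.5, 150.0, 4980.0, 5100.0, 5250.0, 20500.0, 21000.0]
points = [(v,) for v in standardize(amounts)]
labels, centroids, inertia = cluster(points, k=3, iterations=30,
                                     initializations=10, seed=42)
print(labels)  # one cluster label per amount; label numbering is arbitrary
```

In this sketch, a fixed seed makes repeated runs return identical clusters, and extra restarts lower the chance of settling on a poor starting set of centroids. That is presumably the practical reason to specify SEED and a larger INITIALIZATIONS value.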
Examples
Clustering on invoice amount
In addition to stratifying an accounts receivable table on the Invoice_Amount field, you also decide to cluster on the same field.
- Stratifying groups the amounts into strata with predefined numeric boundaries – for example, $1000 intervals.
- Clustering discovers any organic groupings of amounts that exist in the data without requiring that you decide on numeric boundaries in advance.
OPEN Ar
CLUSTER ON Invoice_Amount KVALUE 8 ITERATIONS 30 INITIALIZATIONS 10 OTHER No Due Date Ref Type TO "Clustered_invoices" NOSCALE
As a quick way of discovering how many records are contained in each output cluster, you classify the Clustered_invoices output table on the Cluster field.
OPEN Clustered_invoices
CLASSIFY ON Cluster TO SCREEN
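To make the contrast between stratifying and clustering in this example concrete, here is a small Python sketch, not Analytics code. It bins made-up amounts into fixed $1000 strata and then counts records per group, which is the kind of summary that classifying on the Cluster field produces. The amounts, the `stratum` helper, and the `cluster_labels` column are all hypothetical.

```python
from collections import Counter

# Made-up invoice amounts for illustration only.
amounts = [250.0, 980.0, 1020.0, 1500.0, 2750.0, 2990.0]


def stratum(amount, width=1000):
    """Return the (lower, upper) bounds of the fixed-width stratum for amount."""
    lower = int(amount // width) * width
    return (lower, lower + width)


# Stratifying: boundaries are decided in advance, so 980 and 1020 land in
# different strata even though the two amounts are close together.
strata_counts = Counter(stratum(a) for a in amounts)
print(strata_counts[(0, 1000)], strata_counts[(1000, 2000)])  # 2 2

# Counting records per cluster: a hypothetical Cluster column, one label
# per record, as an output table from a clustering step might contain.
cluster_labels = [0, 1, 1, 1, 2, 2]
print(Counter(cluster_labels)[1])  # 3
```

The hypothetical labels place 980 and 1020 in the same group, which is the kind of organic grouping that fixed boundaries can split apart.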
Remarks
Note
For more information about how this command works, see the Analytics Help.