Sorting and indexing
Sorting and indexing are two different methods for sequentially ordering data in tables. Some Analytics commands require that the input is first sorted or indexed. Ordering data can also be a useful analytical operation in itself, bringing patterns and anomalies into focus.
Operation | Description |
---|---|
Sorting |
Sorting a table physically reorders data into a sequential order and outputs the results to a new Analytics table. |
Indexing |
Indexing does not make any change to the underlying physical order of data. Instead, it creates a separate index file that references records in a table, allowing access to the records in a sequential order rather than a physical order. Data in a view is reordered in accordance with an index only while the index is active. |
Ordering data as a prerequisite for other operations
Because computers process files in sequence, starting with the first record, sequentially ordering data is a prerequisite for several analytical tests and other operations in Analytics. Multiple-table operations, such as joins or relations, may require key fields be sorted or indexed.
Other Analytics tests and operations may not require ordered data, but they execute much more quickly if the data is first sorted or indexed.
Should I sort or index?
The decision whether to sort or index may depend on the particular task you want to perform. For example:
- Sorting might be a better choice for investigative work because it outputs a new table that can serve as the basis for subsequent analysis
- Indexing might be a better choice for informational or preliminary work because it allows you to quickly switch between different representations of the data in the active table
Benefits and drawbacks of sorting and indexing
The table below compares the benefits and drawbacks of sorting and indexing, and lists operations that require either sorting or indexing as a prerequisite.
|
Sorting |
Indexing |
---|---|---|
Outputs results to a new, physically separate Analytics table |
Yes |
No |
Physically reorders data |
Yes |
No |
Speed of operation |
Slower |
Faster |
Required disk space for processing |
More |
Less |
Resulting file size |
Larger |
Smaller |
Subsequent processing of the sorted or indexed file |
Faster |
Slower |
Searching character fields |
Slower |
Faster |
Prerequisite for |
|
|
The Sort Order option and sort sequences
The Sort Order option (Tools > Options > Table) specifies the sort sequence (collation) for character data. The option you specify defines what sort sequence is used when you sort or index records, or test sequential order, using a character field.
What is a sort sequence?
A sort sequence is like a template against which Analytics compares the first character or characters of each value in a character field when sorting, indexing, testing sequential order, or performing a quick sort.
The table below shows the default Analytics Sort Order setting and the associated sort sequence.
Analytics Edition |
Sort Order default |
Associated sort sequence |
---|---|---|
non-Unicode |
System Default (ASCII) |
|
Unicode |
Mix Languages (UCA) (Unicode collation algorithm) |
|
Changing the Sort Order
You can change the Sort Order to a different language if it better matches the data you are analyzing. In the Unicode edition of Analytics, you can also make this change on a command basis by using the ISOLOCALE parameter in the command line or a script.
Modifying a sort sequence
In the non-Unicode edition of Analytics, when you select a different language, you have the option of modifying the associated sort sequence by changing the order of the characters in the Sort Order text box.
You also have the option of creating a custom sort sequence by selecting Custom in the Sort Order field and specifying a sequence, or by entering SET ORDER <TO> values in the command line or a script and specifying a sequence. Whatever characters you specify will be sorted before all other characters, and in the sequence you specify. For example, you could specify that lowercase and uppercase letters are intermixed by entering the values aAbBcC.... Specifying SET ORDER returns the sort sequence to its default setting.
Default sort sequence based on byte order
The default sort sequence for individual languages is derived from the byte order of each character in its character set. You can view the byte order of characters in character sets using the Windows Character Map.