Sorting and indexing

Sorting and indexing are two different methods for sequentially ordering data in tables. Some Analytics commands require that the input is first sorted or indexed. Ordering data can also be a useful analytical operation in itself, bringing patterns and anomalies into focus.

Operation Description
Sorting

Sorting a table physically reorders data into a sequential order and outputs the results to a new Analytics table.

Indexing

Indexing does not make any change to the underlying physical order of data. Instead, it creates a separate index file that references records in a table, allowing access to the records in a sequential order rather than a physical order. Data in a view is reordered in accordance with an index only while the index is active.

Ordering data as a prerequisite for other operations

Because computers process files in sequence, starting with the first record, sequentially ordering data is a prerequisite for several analytical tests and other operations in Analytics. Multiple-table operations, such as joins or relations, may require key fields be sorted or indexed.

Other Analytics tests and operations may not require ordered data, but they execute much more quickly if the data is first sorted or indexed.

Should I sort or index?

The decision whether to sort or index may depend on the particular task you want to perform. For example:

  • Sorting might be a better choice for investigative work because it outputs a new table that can serve as the basis for subsequent analysis
  • Indexing might be a better choice for informational or preliminary work because it allows you to quickly switch between different representations of the data in the active table

Benefits and drawbacks of sorting and indexing

The table below compares the benefits and drawbacks of sorting and indexing, and lists operations that require either sorting or indexing as a prerequisite.

 

Sorting

Indexing

Outputs results to a new, physically separate Analytics table

Yes

No

Physically reorders data

Yes

No

Speed of operation

Slower

Faster

Required disk space for processing

More

Less

Resulting file size

Larger

Smaller

Subsequent processing of the sorted or indexed file

Faster

Slower

Searching character fields

Slower

Faster

Prerequisite for

  • Join

    (recommended, but not enforced, for the primary table)

  • Merge
  • Duplicates
  • Gaps
  • Define Relation

    (indexing of the child table key field performed automatically by Analytics)

  • Join

    (applying an index to the secondary table can only be performed from the command line or in a script)

  • Merge

    (applying an index to the secondary table can only be performed from the command line or in a script)

  • Duplicates
  • Gaps
  • Find
  • Find Literal search option
  • Seek
  • Seek Expression search option

The Sort Order option and sort sequences

The Sort Order option (Tools > Options > Table) specifies the sort sequence (collation) for character data. The option you specify defines what sort sequence is used when you sort or index records, or test sequential order, using a character field.

What is a sort sequence?

A sort sequence is like a template against which Analytics compares the first character or characters of each value in a character field when sorting, indexing, testing sequential order, or performing a quick sort.

The table below shows the default Analytics Sort Order setting and the associated sort sequence.

Analytics Edition

Sort Order default

Associated sort sequence

non-Unicode

System Default

(ASCII)

  • numbers, then uppercase, then lowercase

    0, 1, 2...  A, B, C...  a, b, c...

    For example, “Z” sorts before “a”.

  • Special characters occur at different points in the sequence, depending on the character.

  • Characters with diacritical marks occur at the end of the sequence and use the same uppercase before lowercase internal sequence.

Unicode

Mix Languages (UCA)

(Unicode collation algorithm)

  • numbers, then lowercase and uppercase intermixed

    0, 1, 2...  a, A, b, B, c, C...

    For example, “a” sorts before “Z”.

  • Special characters occur before numbers.

  • Characters with diacritical marks are intermixed with characters without diacritical marks.

    For example: e, E, é, É, f, F

Changing the Sort Order

You can change the Sort Order to a different language if it better matches the data you are analyzing. In the Unicode edition of Analytics, you can also make this change on a command basis by using the ISOLOCALE parameter in the command line or a script.

Modifying a sort sequence

In the non-Unicode edition of Analytics, when you select a different language, you have the option of modifying the associated sort sequence by changing the order of the characters in the Sort Order text box.

You also have the option of creating a custom sort sequence by selecting Custom in the Sort Order field and specifying a sequence, or by entering SET ORDER <TO> values in the command line or a script and specifying a sequence. Whatever characters you specify will be sorted before all other characters, and in the sequence you specify. For example, you could specify that lowercase and uppercase letters are intermixed by entering the values aAbBcC.... Specifying SET ORDER returns the sort sequence to its default setting.

Default sort sequence based on byte order

The default sort sequence for individual languages is derived from the byte order of each character in its character set. You can view the byte order of characters in character sets using the Windows Character Map.

Analytics 14.1 Help