DUPLICATES command

Concept Information

Testing for duplicates

Detects whether duplicate values or entire duplicate records exist in an Analytics table.

Syntax

DUPLICATES {<ON> key_field <D> <...n>|<ON> ALL <EXCLUDE field_name <...n>>} <OTHER field <...n>|OTHER ALL <EXCLUDE field_name <...n>>> <UNFORMATTED> <ADDGROUP> <PRESORT> <IF test> <WHILE test> <FIRST range|NEXT range> <APPEND> <OPEN> <TO {SCREEN|table_name|filename|PRINT}> <LOCAL> <HEADER header_text> <FOOTER footer_text> <ISOLOCALE locale_code>

Parameters

Name Description
ON key_field D <...n> | ON ALL

The key field or fields, or the expression, to test for duplicates.

  • ON key_field use the specified field or fields

    If you test by more than one field, records identified as duplicates require identical values in every specified field.

    Fields are included in the output results in the order that you list them.

    Include D to sort a key field in descending order. The default sort order is ascending.

  • ON ALL use all fields in the table

    If you test by all the fields in a table, records identified as duplicates must be entirely identical.

    Fields are included in the output results in the order that they appear in the table layout.

    An ascending sort order is the only option for ON ALL.

    Note

    Undefined portions of records are not tested.

EXCLUDE field_name

optional

Only valid when testing for duplicates using ON ALL.

The field or fields to exclude from the command. EXCLUDE allows you to fine-tune ON ALL, by excluding the specified fields.

EXCLUDE must immediately follow ON ALL. For example:

ON ALL EXCLUDE field_1 field_2
OTHER field <...n> | OTHER ALL

optional

One or more additional fields to include in the output.

  • OTHER field <...n> include the specified field or fields
  • OTHER ALL include all fields in the table that are not specified as key fields
EXCLUDE field_name

optional

Only valid when using OTHER ALL.

The field or fields to exclude from the command. EXCLUDE allows you to fine-tune OTHER ALL, by excluding the specified fields.

EXCLUDE must immediately follow OTHER ALL. For example:

OTHER ALL EXCLUDE field_1 field_2

UNFORMATTED

optional

Suppresses page headings and page breaks when the results are output to a file.

ADDGROUP

optional

Include the Group Number field ( GROUP_NUM ) in the output table.

The Group Number field assigns a sequentially incremented number to each unique group of duplicates.

Tip

The ability to reference groups of duplicates by number can be useful when you analyze data in the output table.

PRESORT

optional

Sorts the table on the key field before executing the command.

Note

You cannot use PRESORT inside the GROUP command.

IF test

optional

A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition.

Note

The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT).

WHILE test

optional

A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached.

Note

If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached.

FIRST range | NEXT range

optional

The number of records to process:

  • FIRST start processing from the first record until the specified number of records is reached
  • NEXT start processing from the currently selected record until the specified number of records is reached

Use range to specify the number of records to process.

If you omit FIRST and NEXT, all records are processed by default.

APPEND

optional

Appends the command output to the end of an existing file instead of overwriting it.

Note

You must ensure that the structure of the command output and the existing file are identical:

  • the same fields
  • the same field order
  • matching fields are the same length
  • matching fields are the same data type

Analytics appends output to an existing file regardless of its structure. If the structure of the output and the existing file do not match, jumbled, missing, or inaccurate data can result.

OPEN

optional

Opens the table created by the command after the command executes. Only valid if the command creates an output table.

TO SCREEN | table_name | filename | PRINT

optional

The location to send the results of the command to:

  • SCREEN displays the results in the Analytics display area

    Tip

    You can click any linked result value in the display area to drill down to the associated record or records in the source table.

  • table_name saves the results to an Analytics table

    Specify table_name as a quoted string with a .FIL file extension. For example: TO "Output.FIL"

    By default, the table data file (.FIL) is saved to the folder containing the Analytics project.

    Use either an absolute or relative file path to save the data file to a different, existing folder:

    • TO "C:\Output.FIL"
    • TO "Results\Output.FIL"

    Note

    Table names are limited to 64 alphanumeric characters, not including the .FIL extension. The name can include the underscore character ( _ ), but no other special characters, or any spaces. The name cannot start with a number.

  • filename saves the results to a file

    Specify filename as a quoted string with the appropriate file extension. For example: TO "Output.TXT"

    By default, the file is saved to the folder containing the Analytics project.

    Use either an absolute or relative file path to save the file to a different, existing folder:

    • TO "C:\Output.TXT"
    • TO "Results\Output.TXT"
  • PRINT sends the results to the default printer

LOCAL

optional

Saves the output file in the same location as the Analytics project.

Note

Applicable only when running the command against a server table with an output file that is an Analytics table.

The LOCAL parameter must immediately follow the TO parameter.

HEADER header_text

optional

The text to insert at the top of each page of a report.

header_text must be specified as a quoted string. The value overrides the Analytics HEADER system variable.

FOOTER footer_text

optional

The text to insert at the bottom of each page of a report.

footer_text must be specified as a quoted string. The value overrides the Analytics FOOTER system variable.

ISOLOCALE locale_code

optional

Note

Applicable in the Unicode edition of Analytics only.

The system locale in the format language_country. For example, to use Canadian French, enter fr_ca.

Use the following codes:

  • language ISO 639 standard language code
  • country ISO 3166 standard country code

    If you do not specify a country code, the default country for the language is used.

If you do not use ISOLOCALE, the default system locale is used.

Analytics output variables

Name Contains
GAPDUPn

The total number of gaps, duplicates, or fuzzy duplicate groups identified by the command.

Examples

Test for duplicate values in one field

The following example:

  • tests for duplicate values in the Invoice_Number field
  • outputs any records that contain duplicate invoice numbers to a new Analytics table
DUPLICATES ON Invoice_Number OTHER Vendor_Number Invoice_Date Invoice_Amount PRESORT TO "Duplicate_Invoices.FIL"

Test for duplicate values in two or more fields in combination

The following example:

  • tests for duplicate combinations of values in the Invoice_Number and Vendor_Number fields
  • outputs any records that contain the same invoice number and the same vendor number to a new Analytics table

The difference between this test and the previous test is that a identical invoice number from two different vendors is not reported as a false positive.

DUPLICATES ON Invoice_Number Vendor_Number OTHER Invoice_Date Invoice_Amount PRESORT TO "Duplicate_Invoices.FIL"

Test for duplicate records

The following examples:

  • test for duplicate values in every field in an Inventory table
  • output any entirely identical records to a new Analytics table
DUPLICATES ON ProdNum ProdClass Location ProdDesc ProdStatus UnitCost CostDate SalePrice PriceDate PRESORT TO "Duplicate_Inventory_Items.FIL"

You can simplify the syntax by using ALL :

DUPLICATES ON ALL PRESORT TO "Duplicate_Inventory_Items.FIL"

Filter duplicates output table by group number

You use several key fields in combination to test an accounts payable table for duplicate records:

  • vendor number
  • invoice number
  • invoice date
  • invoice amount

You want to filter the resulting duplicates output table so that only some of the groups of duplicates are subject to additional processing.

To create a filter using the combination of key fields would be laborious. For example:

SET FILTER TO ((Vendor_No = "11475") AND (Invoice_No = "8752512") AND (Invoice_Date = `20191021`) AND (Invoice_Amount = 7125.80)) OR ((Vendor_No = "12130") AND (Invoice_No = "589134") AND (Invoice_Date = `20191117`) AND (Invoice_Amount = 10531.71)) OR ((Vendor_No = "13440") AND (Invoice_No = "5518912") AND (Invoice_Date = `20191015`) AND (Invoice_Amount = 11068.20))

Instead, you achieve the same result by creating a filter based on group number:

SET FILTER TO MATCH(GROUP_NUM, 3 , 8, 11)

Remarks

For more information about how this command works, see Testing for duplicates.

Sorting and duplicates

Generally, you should run the duplicates command only on a sorted key field or fields. Duplicate values in a key field are only found if they are immediately adjacent.

If you run the duplicates command on an unsorted key field, non-adjacent duplicate values are not reported as duplicates. If two or more clusters of the same duplicate value exist, they are reported as duplicates, but in separate groups.

Depending on your analysis goal, it may make sense to run the duplicates command on an unsorted key field. For example, you may want to find only those duplicate values that are immediately adjacent in the source table, and ignore duplicate values that are non-adjacent.