DUPLICATES command
Concept Information
Detects whether duplicate values or entire duplicate records exist in an Analytics table.
Syntax
DUPLICATES {<ON> key_field <D> <...n>|<ON> ALL <EXCLUDE field_name <...n>>} <OTHER field <...n>|OTHER ALL <EXCLUDE field_name <...n>>> <UNFORMATTED> <ADDGROUP> <PRESORT> <IF test> <WHILE test> <FIRST range|NEXT range> <APPEND> <OPEN> <TO {SCREEN|table_name|filename|PRINT}> <LOCAL> <HEADER header_text> <FOOTER footer_text> <ISOLOCALE locale_code>
Parameters
Name | Description |
---|---|
ON key_field D <...n> | ON ALL |
The key field or fields, or the expression, to test for duplicates.
|
EXCLUDE field_name optional |
Only valid when testing for duplicates using ON ALL. The field or fields to exclude from the command. EXCLUDE allows you to fine-tune ON ALL, by excluding the specified fields. EXCLUDE must immediately follow ON ALL. For example: ON ALL EXCLUDE field_1 field_2 |
OTHER field <...n> | OTHER ALL optional |
One or more additional fields to include in the output.
|
EXCLUDE field_name optional |
Only valid when using OTHER ALL. The field or fields to exclude from the command. EXCLUDE allows you to fine-tune OTHER ALL, by excluding the specified fields. EXCLUDE must immediately follow OTHER ALL. For example: OTHER ALL EXCLUDE field_1 field_2 |
UNFORMATTED optional |
Suppresses page headings and page breaks when the results are output to a file. |
ADDGROUP optional |
Include the Group Number field ( GROUP_NUM ) in the output table. The Group Number field assigns a sequentially incremented number to each unique group of duplicates. Tip The ability to reference groups of duplicates by number can be useful when you analyze data in the output table. |
PRESORT optional |
Sorts the table on the key field before executing the command. Note You cannot use PRESORT inside the GROUP command. |
IF test optional |
A conditional expression that must be true in order to process each record. The command is executed on only those records that satisfy the condition. Note The IF parameter is evaluated against only the records remaining in a table after any scope parameters have been applied (WHILE, FIRST, NEXT). |
WHILE test optional |
A conditional expression that must be true in order to process each record. The command is executed until the condition evaluates as false, or the end of the table is reached. Note If you use WHILE in conjunction with FIRST or NEXT, record processing stops as soon as one limit is reached. |
FIRST range | NEXT range optional |
The number of records to process:
Use range to specify the number of records to process. If you omit FIRST and NEXT, all records are processed by default. |
APPEND optional |
Appends the command output to the end of an existing file instead of overwriting it. Note You must ensure that the structure of the command output and the existing file are identical:
Analytics appends output to an existing file regardless of its structure. If the structure of the output and the existing file do not match, jumbled, missing, or inaccurate data can result. |
OPEN optional |
Opens the table created by the command after the command executes. Only valid if the command creates an output table. |
TO SCREEN | table_name | filename | PRINT optional |
The location to send the results of the command to:
|
LOCAL optional |
Saves the output file in the same location as the Analytics project. Note Applicable only when running the command against a server table with an output file that is an Analytics table. The LOCAL parameter must immediately follow the TO parameter. |
HEADER header_text optional |
The text to insert at the top of each page of a report. header_text must be specified as a quoted string. The value overrides the Analytics HEADER system variable. |
FOOTER footer_text optional |
The text to insert at the bottom of each page of a report. footer_text must be specified as a quoted string. The value overrides the Analytics FOOTER system variable. |
ISOLOCALE locale_code optional |
Note Applicable in the Unicode edition of Analytics only. The system locale in the format language_country. For example, to use Canadian French, enter fr_ca. Use the following codes:
If you do not use ISOLOCALE, the default system locale is used. |
Analytics output variables
Name | Contains |
---|---|
GAPDUPn |
The total number of gaps, duplicates, or fuzzy duplicate groups identified by the command. |
Examples
Test for duplicate values in one field
The following example:
- tests for duplicate values in the Invoice_Number field
- outputs any records that contain duplicate invoice numbers to a new Analytics table
DUPLICATES ON Invoice_Number OTHER Vendor_Number Invoice_Date Invoice_Amount PRESORT TO "Duplicate_Invoices.FIL"
Test for duplicate values in two or more fields in combination
The following example:
- tests for duplicate combinations of values in the Invoice_Number and Vendor_Number fields
- outputs any records that contain the same invoice number and the same vendor number to a new Analytics table
The difference between this test and the previous test is that a identical invoice number from two different vendors is not reported as a false positive.
DUPLICATES ON Invoice_Number Vendor_Number OTHER Invoice_Date Invoice_Amount PRESORT TO "Duplicate_Invoices.FIL"
Test for duplicate records
The following examples:
- test for duplicate values in every field in an Inventory table
- output any entirely identical records to a new Analytics table
DUPLICATES ON ProdNum ProdClass Location ProdDesc ProdStatus UnitCost CostDate SalePrice PriceDate PRESORT TO "Duplicate_Inventory_Items.FIL"
You can simplify the syntax by using ALL :
DUPLICATES ON ALL PRESORT TO "Duplicate_Inventory_Items.FIL"
Filter duplicates output table by group number
You use several key fields in combination to test an accounts payable table for duplicate records:
- vendor number
- invoice number
- invoice date
- invoice amount
You want to filter the resulting duplicates output table so that only some of the groups of duplicates are subject to additional processing.
To create a filter using the combination of key fields would be laborious. For example:
SET FILTER TO ((Vendor_No = "11475") AND (Invoice_No = "8752512") AND (Invoice_Date = `20191021`) AND (Invoice_Amount = 7125.80)) OR ((Vendor_No = "12130") AND (Invoice_No = "589134") AND (Invoice_Date = `20191117`) AND (Invoice_Amount = 10531.71)) OR ((Vendor_No = "13440") AND (Invoice_No = "5518912") AND (Invoice_Date = `20191015`) AND (Invoice_Amount = 11068.20))
Instead, you achieve the same result by creating a filter based on group number:
SET FILTER TO MATCH(GROUP_NUM, 3 , 8, 11)
Remarks
For more information about how this command works, see Testing for duplicates.
Sorting and duplicates
Generally, you should run the duplicates command only on a sorted key field or fields. Duplicate values in a key field are only found if they are immediately adjacent.
If you run the duplicates command on an unsorted key field, non-adjacent duplicate values are not reported as duplicates. If two or more clusters of the same duplicate value exist, they are reported as duplicates, but in separate groups.
Depending on your analysis goal, it may make sense to run the duplicates command on an unsorted key field. For example, you may want to find only those duplicate values that are immediately adjacent in the source table, and ignore duplicate values that are non-adjacent.