Duplicate
values in one or more fields, or duplicate records, can be the result
of data entry errors or fraudulent activity such as splitting credit
card transactions to avoid scrutiny.
Requirement for unique values
Fields that should never contain duplicates are ones
where the values uniquely identify records. For example, an employee
table should never have duplicate employee numbers because each number should identify a unique employee.
Valid duplicates
Duplicate values may also be valid. For example, a transaction table could have
duplicate customer numbers because of multiple transactions by the
same customers.
Different types of duplicates testing
You
can use Analytics to test for duplicates in the following ways:
Test scope
Use this test when:
One field
All values
in a particular field should be unique, such as employee numbers,
or check numbers.
Two or more fields in combination
Uniqueness is not a requirement of any field in
isolation, but it is a requirement of certain fields in combination.
Example
In a payroll file covering a year, the employee number
field and the pay date field are going to contain numerous
duplicates. Employees get paid every two weeks, and multiple employees
are paid on the same date.
However, an individual employee should
appear only once for a particular date. If a duplicate exists across
the employee number and pay date fields in combination, an employee may have been paid twice for the same pay period.
All fields in a record
Checking for entire duplicate records, in which every field in a record
is duplicated. Entire duplicate records could be the result of data
entry error, or other transactional irregularities.
Sorting and duplicates
Generally, you should only test for duplicates using a sorted key field or fields. Duplicate values in a key field are only found if they are immediately adjacent.
If you test for duplicates using an unsorted key field, non-adjacent duplicate values are not reported as duplicates. If two or more clusters of the same duplicate value exist, they are reported as duplicates, but in separate groups.
Depending on your analysis goal, it may make sense to test for duplicates using an unsorted key field. For example, you may want to find only those duplicate values that are immediately adjacent in the source table, and ignore duplicate values that are non-adjacent.
Including the Group Number field in the output table
You have the option of including the Group Number field in the duplicates output table. The field assigns a sequentially incremented number to each unique group of duplicates. The ability to reference groups of duplicates by number can be useful when you analyze data in the output table.
Filter duplicates output table by group number
You use several key fields in combination to test an accounts payable table for duplicate records:
vendor number
invoice number
invoice date
invoice amount
You want to filter the resulting duplicates output table so that only some of the groups of duplicates are subject to additional processing.
To create a filter using the combination of key fields would be laborious. For example:
SETFILTERTO ((Vendor_No = "11475") AND (Invoice_No = "8752512") AND (Invoice_Date = `20191021`) AND (Invoice_Amount = 7125.80)) OR ((Vendor_No = "12130") AND (Invoice_No = "589134") AND (Invoice_Date = `20191117`) AND (Invoice_Amount = 10531.71)) OR ((Vendor_No = "13440") AND (Invoice_No = "5518912") AND (Invoice_Date = `20191015`) AND (Invoice_Amount = 11068.20))
Instead, you achieve the same result by creating a filter based on group number:
SETFILTERTOMATCH(GROUP_NUM, 3 , 8, 11)
Steps
You can test one or
more fields in the active table to detect whether duplicate values
or entire duplicate records exist.
You can test character, numeric, and datetime fields for duplicates.
If letters and numbers appear together in a character field, all
alphanumeric characters are tested.
Note
For results to be valid, the
field or fields being tested must be in sequential order prior to testing.
You can sort the field or fields in advance, or use the Presort option
during the duplicates test.
Select the fields
In the Navigator, open the table that you want to test for duplicates.
Select Analyze > Duplicates.
To detect duplicates in one or more fields:
On the Main tab, do one of the following:
From the Duplicates
On list, select a field or fields to test.
Click Duplicates On to select a field or fields to test, or to create an expression.
The order in which you select multiple fields is the order in which the fields appear in the output results. If you want to append results to an existing Analytics table, the field selection and order must be identical to the field selection and order in the existing table.
Optional. If you clicked Duplicates On, click the sort arrow beside one or more selected fields to specify a descending sort order in the output results (the default is ascending).
Do one of the following:
From the List Fields list, select one or more additional fields to
include in the output results.
Click List
Fields to select one or more additional fields to
include, or to create an expression.
Additional fields can provide useful context for the results.
The fields that you select for duplicates testing are displayed automatically
at the beginning of any result records. You do not need to specifically
selected them under List Fields.
Optional. Select Add Groups if you want to include the Group Number field in the output table.
The Group Number field assigns a sequentially incremented number to each unique group of duplicates.
To detect entire duplicate records:
In the Main tab, click Duplicates On.
Click Add All to add all fields
to Selected Fields.
Optional. Click the sort arrow beside one or more selected fields to specify a descending sort order in the output results (the default is ascending).
Click OK.
You do not need to select any fields from the List
Fields list because all fields in the table are displayed
automatically in the result records.
Optional. Select Add Groups if you want to include the Group Number field in the output table.
The Group Number field assigns a sequentially incremented number to each unique group of duplicates.
Exclude records from processing (optional)
If there are records in the current view that you want to exclude from processing, do one of the following:
Enter a condition in the If text box
Click If to create an IF statement using the Expression Builder
The If condition considers all records in the view and filters out those that do not meet the specified condition.
Note
The If condition is evaluated against only the records remaining in a table after any scope options have been applied (First, Next, While).
Deselect Presort (optional)
If the test field or fields are already sorted
based on a previous operation, deselect Presort to
save time when testing large tables for duplicates.
If data in the test field or fields is not sorted, you must leave Presort selected to ensure that all duplicates are found.
Note
If you deselect Presort, the
test field or fields must match the
field or fields that were previously sorted for results to be valid.
The message Warning: File out of sequence accompanies
results if there is a mismatch between selected and sorted fields.
If you output results to an Analytics table, the warning message appears
in the command log.
Configure the output
Click the Output tab.
In the To panel, select the appropriate output option.
Note
Output options are disabled is they do not apply to a particular analytical operation.
Option
Details
Screen
Displays the results in the Analytics display area.
Tip
You can click any linked result value in the display area to drill down to the associated record or records in the source table.
If the output table contains a large number of records, it is faster and more useful to save the results to a file than to display the results on the screen.
Print
Sends the results to the default printer.
Graph
Creates a graph of the results and display it in the Analytics display area.
File
Saves or appends the results to an Analytics table or a text file, or overwrites an existing table or file.
Analytics table the table is added to the open project if it is not already in the project.
Text file the file is saved outside Analytics.
If you selected File as the output type, specify the appropriate information in the As panel.
Option
Details
File Type
Analytics Table – saves or appends the results to an Analytics table, or overwrites an existing table.
ASCII Text File or Unicode Text file – saves or appends the results to a text file, or overwrites an existing file.
Name
To save to a new table or file
Enter a file name in the Name text box. If Analytics prefills a file name, you can accept the prefilled name, or change it.
To append to an existing table or file, or to overwrite an existing table or file
Click Name and select an existing table or file in the Save or Save File As dialog box.
You can specify an absolute or relative file path, or navigate to a different folder, to save, append, or overwrite in a location other than the project location. For example:
C:\Results\Output.fil or Results\Output.fil.
Note
Analytics table names are limited to 64 alphanumeric characters, not including the .FIL extension. The name can include the underscore character ( _ ), but no other special characters, or any spaces. The name cannot start with a number.
Local
Enabled only when connected to a server table and saving, appending, or overwriting to an Analytics table.
Select Local saves, appends, or overwrites to the same location as the Analytics project.
Leave Local deselected saves, appends, or overwrites to the Prefix folder on a server.
Note
For output results produced from analysis or processing of AX Server tables, select Local. You cannot deselect the Local setting to import results tables to AX Server.
Optional. Specify a Header or a Footer in the appropriate text box.
Click Header or Footer to enter a header or footer of more than one line. You can also enter a semi-colon (;) as a line-break character in the header or footer text box.
Headers and footers are centered by default. Type a left angle bracket (<) before the header or footer text to left align the text. Left aligning multiple lines requires a left angle bracket at the beginning of each line.
Note
Specifying a header or a footer is not available for some output types.
Specify the scope of the operation
Click the More tab.
In the Scope panel, select the appropriate option:
(Default) Specifies that all records in the view are processed.
First
Enter a number in the text box. Starts processing at the first record in the view and includes only the specified number of records.
Next
Enter a number in the text box. Starts processing at the currently selected record in the view and includes only the specified number of records. The actual record number must be selected in the leftmost column in the view, not data in the row.
While
Use a WHILE statement to limit the processing of records in the view based on a particular criterion or set of criteria.
Enter a condition in the While text box, or click While to create a WHILE statement using the Expression Builder.
A WHILE statement allows records in the view to be processed only while the specified condition evaluates to true. As soon as the condition evaluates to false, the processing terminates, and no further records are considered.
You can use the While option in conjunction with the All, First, or Next options. Record processing stops as soon as one limit is reached.
Note
The First or Next options reference either the physical order or the indexed order of records in a table. First or Next disregard any filtering or quick sorting applied to a table view. However, output results of analytical operations respect any filtering.
If a view is quick sorted, Next behaves like First.
Finalize the settings and output the results
Optional. If you selected File as the output type, and you want to append the output results to the end of an existing table or file, do one of the following:
Select Append To Existing File if you are certain that the output results and the table or file are identical in structure.
Selecting Append To Existing File forces the append to happen even if the record lengths of the output results and an existing table or file are different. If the record lengths are different, the data structure is not identical, and the append will not work correctly.
Leave Append To Existing File deselected if you want Analytics to compare the record lengths of the output results and an existing table. This automated check is available only for Analytics tables, not text files.
Note
Leave Append To Existing File deselected if you are uncertain whether the output results and the existing table have an identical data structure. For more information about appending and data structure, see Appending output results to an existing table.
If you selected File (Analytics Table) as the output type, select Use Output Table if you want the output table to open automatically upon completion of the operation.
Click OK.
In the output results, only the duplicate values or records are displayed,
and not the initial occurrence of a value or a record, if you do both of the following:
Output the results to screen or a text file
Include
only tested fields in the output results and do not select any additional
fields
If you output to screen, you can click any value to see
the initial occurrence of a value or a record along with the duplicates.
If the overwrite prompt appears, select the appropriate
option.
You
can use the summarize operation to remove duplicate values or records from
a data set and save the remaining unique values or records to a new Analytics table.
In the Navigator, open the table that you want to remove duplicates from.
Select Analyze > Summarize.
On the Main tab, do one of the
following:
Select the field or fields that may contain duplicate
values from the Summarize On list.
Click Summarize On to select the
field or fields, or to create an expression.
The order in which you select the fields is the order in which the columns appear in the results.
Note
Select the appropriate fields to achieve your required degree of uniqueness.
For example, if you want to remove duplicate
employee records and you select only the last name field,
you risk removing all records for employees who have the same
last name, but a different first name. Select both the last name and the first name fields to increase the degree of uniqueness.
To remove only perfectly
duplicate records, click Summarize On and Add All.
Do not select any Subtotal Fields.
Optional. Do one of the following:
From the Other Fields list, select the other field(s) to include in the
output results.
Click Other Fields to select the
field(s), or to create an expression.
Note
Select only fields that contain the same value for all records in each summarized group. For more information, see The Other Fields option.
Deselect Presort (optional)
If the field that may contain duplicate values is already
sorted, deselect Presort.
If the data in the field is not sorted, you must leave Presort selected to
ensure valid results.
Exclude records from processing (optional)
If there are records in the current view that you want to exclude from processing, do one of the following:
Enter a condition in the If text box
Click If to create an IF statement using the Expression Builder
The If condition considers all records in the view and filters out those that do not meet the specified condition.
Note
The If condition is evaluated against only the records remaining in a table after any scope options have been applied (First, Next, While).
Configure the output
Click the Output tab.
In the To panel, select File.
Specify the following information in the As panel:
File Type – Analytics
Table is the only option. Saves the results to a new Analytics table,
or appends the results to an existing Analytics table.
Name – Enter a table name in the Name text
box. Or click Name and enter the table name,
or select an existing table in the Save or Save
File As dialog box to overwrite or append to the table.
If Analytics prefills a table name, you can accept the prefilled
name, or change it.
You can also specify an absolute or relative
file path, or navigate to a different folder, to save or append
the table in a location other than the project location. For example:
C:\Results\No_duplicates.fil or Results\No_duplicates.fil.
Local – Only enabled when connected
to a server table. Select Local to save the output
table to the same location as the project, or to specify a path
or navigate to a different local folder. Leave Local deselected
to save the output table to the Prefix folder on the Analytics server.
Note
For
output results produced from analysis or processing of Analytics Exchange server
tables, select Local. You cannot use the Local setting
to import results tables to AX Server.
Specify the scope of the operation
Click the More tab.
In the Scope panel, select the appropriate option:
(Default) Specifies that all records in the view are processed.
First
Enter a number in the text box. Starts processing at the first record in the view and includes only the specified number of records.
Next
Enter a number in the text box. Starts processing at the currently selected record in the view and includes only the specified number of records. The actual record number must be selected in the leftmost column in the view, not data in the row.
While
Use a WHILE statement to limit the processing of records in the view based on a particular criterion or set of criteria.
Enter a condition in the While text box, or click While to create a WHILE statement using the Expression Builder.
A WHILE statement allows records in the view to be processed only while the specified condition evaluates to true. As soon as the condition evaluates to false, the processing terminates, and no further records are considered.
You can use the While option in conjunction with the All, First, or Next options. Record processing stops as soon as one limit is reached.
Note
The First or Next options reference either the physical order or the indexed order of records in a table. First or Next disregard any filtering or quick sorting applied to a table view. However, output results of analytical operations respect any filtering.
If a view is quick sorted, Next behaves like First.
Finalize the settings and output the results
Select Use Output Table if you want the output table to open automatically upon completion of the operation.
If you want to append the output results to the end of
an existing Analytics table, do one of the following:
Select Append To Existing File if
you are certain the output results and the existing table are identical
in structure.
Leave Append To Existing File deselected
if you want Analytics to compare the record lengths of the output
results and the existing table. If the record lengths are not identical,
the data structure is not identical, and the append will not work correctly.
Note
Leaving Append To Existing File deselected
is recommended if you are uncertain whether the output results and
the existing table have an identical data structure. For more information
about appending and data structure, see Appending output results to an existing table.
Click OK.
If the overwrite prompt appears, select the appropriate
option.