Testing for duplicates

Duplicate values in one or more fields, or duplicate records, can be the result of data entry errors or fraudulent activity such as splitting credit card transactions to avoid scrutiny.

Requirement for unique values

Fields that should never contain duplicates are ones where the values uniquely identify records. For example, an employee table should never have duplicate employee numbers because each number should identify a unique employee.

Valid duplicates

Duplicate values may also be valid. For example, a transaction table could have duplicate customer numbers because of multiple transactions by the same customers.

Different types of duplicates testing

You can use Analytics to test for duplicates in the following ways:

Test scope Use this test when:
One field

All values in a particular field should be unique, such as employee numbers, or check numbers.

Two or more fields in combination

Uniqueness is not a requirement of any field in isolation, but it is a requirement of certain fields in combination.

Example

In a payroll file covering a year, the employee number field and the pay date field are going to contain numerous duplicates. Employees get paid every two weeks, and multiple employees are paid on the same date.

However, an individual employee should appear only once for a particular date. If a duplicate exists across the employee number and pay date fields in combination, an employee may have been paid twice for the same pay period.

All fields in a record

Checking for entire duplicate records, in which every field in a record is duplicated. Entire duplicate records could be the result of data entry error, or other transactional irregularities.

Steps

You can test one or more fields in the active table to detect whether duplicate values or entire duplicate records exist.

Remove duplicates

You can use the summarize operation to remove duplicate values or records from a data set and save the remaining unique values or records to a new Analytics table.

Analytics 14.1 Help