Fuzzy duplicate helper functions

Two ACL functions help make the fuzzy duplicates feature more effective:

OMIT function

Before using the fuzzy duplicates feature, you can use the OMIT( ) function to create a computed field that removes specific words, abbreviations, or sequences of letters or numbers from a character field. Removing generic elements such as “Corporation”, “Corp.”, “Street”, “Ave.”, and so on, focuses the fuzzy duplicates comparisons on just the portion of the character values where a meaningful difference may occur. You can then test the computed field instead of the original field and use a much lower Difference Threshold, which produces a smaller, more focused set of results containing fewer false positives.

For example, ‘Intercity Couriers Corporation’ and ‘Inter-city Couriers Corp.’ are included in the results only if the Difference Threshold is at least 8. At any lower setting, the two values escape detection as fuzzy duplicates. A Difference Threshold that high, however, would also produce a large, unfocused set of results containing mostly false positives. By contrast, if you use OMIT( ) to create a computed field with generic elements removed, a Difference Threshold of only 1 is enough to return ‘Inter-city Couriers’ and ‘Intercity Couriers’ as fuzzy duplicates.
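The arithmetic in the example above can be checked with a small sketch. It is written in Python rather than ACLScript, and it assumes that the Difference Threshold corresponds to a character-level edit distance (Levenshtein distance). The helper names levenshtein and strip_generic are hypothetical and are not ACL functions; strip_generic only approximates the idea behind OMIT( ), not its exact behavior.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def strip_generic(value: str, generic_terms: Iterable[str]) -> str:
    """Hypothetical stand-in for an OMIT-style computed field.

    Removes each generic term (longest first) and collapses leftover spaces.
    """
    for term in sorted(generic_terms, key=len, reverse=True):
        value = value.replace(term, "")
    return " ".join(value.split())


if __name__ == "__main__":
    a = "Intercity Couriers Corporation"
    b = "Inter-city Couriers Corp."
    generic = ["Corporation", "Corp."]

    print(levenshtein(a, b))  # 8: original values

    a_clean = strip_generic(a, generic)
    b_clean = strip_generic(b, generic)
    print(levenshtein(a_clean, b_clean))  # 1: generic elements removed
```

Under these assumptions, the original values differ by 8 edits and the cleaned values by only 1, which matches the thresholds quoted above.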

For detailed information about the OMIT( ) function, see the ACL Language Reference. For more information about the Difference Threshold, see How the difference settings work.

ISFUZZYDUP function

After using the fuzzy duplicates feature and reviewing the results, you can use the ISFUZZYDUP( ) function to output an exhaustive list of fuzzy duplicates for any single character value in the results that appears to be of particular relevance to your audit goal. Exhaustive means that all values within the specified degree of difference of the test value are returned, regardless of their position in the test field relative to the test value.

By design, the results of the fuzzy duplicates feature are non-exhaustive, to prevent results from becoming very large and unmanageable. The non-exhaustive results may be sufficient for the purposes of your analysis. If they are not, you can use ISFUZZYDUP( ) to produce exhaustive results for individual character values.
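The idea of an exhaustive result for a single test value can be illustrated with the following sketch. It is written in Python, not ACLScript, and again assumes that the degree of difference is a Levenshtein edit distance. The function exhaustive_matches and the sample values are hypothetical; the sketch does not model ACL's grouping logic or any other difference settings.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def exhaustive_matches(values: Iterable[str], test_value: str,
                       max_distance: int) -> List[str]:
    """Return every value within max_distance of test_value.

    Each value is compared directly with the test value, so its position
    in the field relative to the test value does not matter.
    """
    return [v for v in values if levenshtein(v, test_value) <= max_distance]


if __name__ == "__main__":
    # Hypothetical field contents, for illustration only.
    field = [
        "Intercty Couriers",
        "Acme Freight",
        "Intercity Couriers",
        "Inter-city Couriers",
        "Intercity Courier",
    ]
    for match in exhaustive_matches(field, "Intercity Couriers", 1):
        print(match)
```

With the sample values shown, every entry except "Acme Freight" is within one edit of the test value, so all four are returned wherever they sit in the list.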

For detailed information about the ISFUZZYDUP( ) function, see the ACL Language Reference. For more information about non-exhaustive groups and results, see How fuzzy duplicates are grouped.

Related concepts
About fuzzy duplicates
Controlling the size of fuzzy duplicate results
How the difference settings work
How fuzzy duplicates are grouped
Fuzzy duplicates overview
Related tasks
Testing for fuzzy duplicates
Working with fuzzy duplicate output results
Related reference
ISFUZZYDUP( ) function
OMIT( ) function


(C) 2013 ACL Services Ltd. All Rights Reserved.