OMIT( ) function
Returns a string with one or more specified substrings removed.
Syntax
OMIT(string1, string2 <,case_sensitive>)
Parameters
Name | Type | Description |
---|---|---|
string1 | character | The field, expression, or literal value to remove one or more substrings from. |
string2 | character |
One or more substrings to remove.
|
case_sensitive
optional |
logical |
Specify T to make the substrings case-sensitive, or F to ignore case. If case_sensitive is omitted, the default value of T is used. |
Output
Character.
Examples
Basic examples
Literal character input
Returns "Intercity Couriers":
OMIT("Intercity Couriers Corporation", " Corporation, Corp.")
Returns "Inter-city Couriers":
OMIT("Inter-city Couriers Corp.", " Corporation, Corp.")
Note
The Levenshtein distance between the returned values in the first two examples is 1. If the generic elements are not removed, the distance between the two examples is 8, which could allow the values to escape detection as fuzzy duplicates.
Field input
Returns all the values in the Vendor_Name field with generic elements such as "Corporation" and "Inc." removed:
OMIT(Vendor_Name," Corporation, Corp., Corp, Inc., Inc, Ltd., Ltd")
Returns all the values in the Vendor_Name field with generic elements such as "Corporation" and "Inc." removed:
OMIT(Vendor_Name," ,.,Corporation,Corp,Inc,Ltd")
Note
The two preceding examples both return the same results but the syntax for the second example is more efficient.
Returns all the values in the Vendor_Name field with "Corporation" and "Corp" removed, and all commas removed:
OMIT(Vendor_Name," Corporation, Corp,")
Remarks
OMIT( ) can remove substrings as units
The OMIT( ) function removes one or more substrings from a string. It differs from functions such as CLEAN( ), EXCLUDE( ), INCLUDE( ), and REMOVE( ) because it matches and removes characters on a substring basis rather than on a character-by-character basis. Substring removal allows you to remove specific words, abbreviations, or repeated sequences of characters from a string without affecting the remainder of the string.
A helper function for fuzzy comparisons
OMIT( ) can improve the effectiveness of the LEVDIST( ) or ISFUZZYDUP( ) functions, or the FUZZYDUP or FUZZYJOIN commands, by removing generic elements such as "Corporation" or "Inc." from field values. Removal of generic elements focuses the string comparisons performed by these functions and commands on just the portion of the strings where a meaningful difference may occur.
How the order of substrings affects results
If you specify multiple substrings for removal, the order in which you list them in string2 can affect the output results.
When the OMIT( ) function is processed, the first substring is removed from all values that contain it, the second substring is then removed from all values that contain it, and so on. If one substring forms part of another substring – for example, "Corp" and "Corporation" – removing the shorter substring first also alters values containing the longer substring ("Corporation" becomes "oration") and prevents the longer substring from being found.
To avoid this situation, specify longer substrings before any shorter substrings they contain. For example:
OMIT(Vendor_Name," Corporation, Corp., Corp")
Try removing special characters first
You can specify single-character substrings, such as punctuation marks, special characters, and a blank space, which further reduces the generic material in strings. It may be more efficient to remove a single character first, such as a period or a blank, which reduces the number of substring variations you subsequently need to specify. Compare the third and fourth examples above. They both return the same results, but the fourth example is more efficient.
Dealing with blanks or spaces
Blanks or spaces in substrings are treated like any other character. You must explicitly specify each blank you want removed as part of a substring. For example, if you specify an ampersand without any blanks ("&"), "Ricoh Sales & Service" becomes "Ricoh Sales Service". If you include blanks (" & "), "Ricoh Sales & Service" becomes "Ricoh SalesService".
If you specify a blank that is not part of the substring, the substring will not be found. For example, if you specify an ampersand with blanks (" & "), "Ricoh Sales&Service" remains unchanged.
When using commas to separate multiple substrings, follow the comma with a space only if it corresponds with the actual substring you want to remove.
One approach to dealing with blanks is to remove all blanks from a field first, by specifying a blank as a single-character substring prior to specifying any other substrings.
Review the results of using OMIT( )
After using OMIT( ) to created a computed field, review the contents of the field to confirm that you have not inadvertently omitted meaningful portions of strings. For example, omitting "Co" gets rid of a common abbreviation for "Company", but it also removes the letters "Co" from two locations in "Coca-Cola".