OMIT( ) function

Returns a string with one or more specified substrings removed.

Syntax

OMIT(string1, string2 <,case_sensitive>)

Parameters

Name Type Description
string1 character The field, expression, or literal value to remove one or more substrings from.
string2 character

One or more substrings to remove.

  • Use commas to separate multiple substrings.
  • Use a space after a comma only if it is part of the substring you want to remove.
  • If the double quotation mark character occurs in any of the substrings, enclose the entire string2 parameter in single quotation marks (' ') rather than double quotation marks.
  • To omit a comma, place a single comma last in the list of substrings, followed immediately by the closing quotation mark (see the final example below).
case_sensitive

optional

logical

Specify T to make the substrings case-sensitive, or F to ignore case.

If case_sensitive is omitted, the default value of T is used.

Output

Character.

Examples

Basic examples

Literal character input

Returns "Intercity Couriers":

OMIT("Intercity Couriers Corporation", " Corporation, Corp.")

Returns "Inter-city Couriers":

OMIT("Inter-city Couriers Corp.", " Corporation, Corp.")

Note

The Levenshtein distance between the returned values in the first two examples is 1. If the generic elements are not removed, the distance between the two examples is 8, which could allow the values to escape detection as fuzzy duplicates of each other.

Field input

Returns all the values in the Vendor_Name field with generic elements such as "Corporation" and "Inc." removed:

OMIT(Vendor_Name," Corporation, Corp., Corp, Inc., Inc, Ltd., Ltd")

Returns all the values in the Vendor_Name field with generic elements such as "Corporation" and "Inc." removed:

OMIT(Vendor_Name," ,.,Corporation,Corp,Inc,Ltd")

Note

The two preceding examples both return the same results but the syntax for the second example is more efficient.

Returns all the values in the Vendor_Name field with "Corporation" and "Corp" removed, and all commas removed:

OMIT(Vendor_Name," Corporation, Corp,")

Remarks

OMIT( ) can remove substrings as units

The OMIT( ) function removes one or more substrings from a string. It differs from functions such as CLEAN( ), EXCLUDE( ), INCLUDE( ), and REMOVE( ) because it matches and removes characters on a substring basis rather than on a character-by-character basis. Substring removal allows you to remove specific words, abbreviations, or repeated sequences of characters from a string without affecting the remainder of the string.

A helper function for fuzzy comparisons

OMIT( ) can improve the effectiveness of the ISFUZZYDUP( ) function, or the FUZZYDUP or FUZZYJOIN commands, by removing generic elements such as "Corporation" or "Inc.", or characters such as commas, periods, and ampersands (&), from field values.

Removal of generic elements and punctuation focuses the fuzzy comparison on just the portion of the values where a meaningful difference may occur.

How the order of substrings affects results

If you specify multiple substrings for removal, the order in which you list them in string2 can affect the output results.

When the OMIT( ) function is processed, the first substring is removed from all values that contain it, the second substring is then removed from all values that contain it, and so on. If one substring forms part of another substring – for example, "Corp" and "Corporation" – removing the shorter substring first also alters values containing the longer substring ("Corporation" becomes "oration") and prevents the longer substring from being found.

To avoid this situation, specify longer substrings before any shorter substrings they contain. For example:

OMIT(Vendor_Name," Corporation, Corp., Corp")

Try removing special characters first

You can specify single-character substrings, such as punctuation marks, special characters, and a blank space, which further reduces the generic material in strings. It may be more efficient to remove a single character first, such as a period or a blank, which reduces the number of substring variations you subsequently need to specify. Compare the third and fourth examples above. They both return the same results, but the fourth example is more efficient.

Dealing with blanks or spaces

Blanks or spaces in substrings are treated like any other character. You must explicitly specify each blank you want removed as part of a substring. For example, if you specify an ampersand without any blanks ("&"), "Ricoh Sales & Service" becomes "Ricoh Sales  Service". If you include blanks (" & "), "Ricoh Sales & Service" becomes "Ricoh SalesService".

If you specify a blank that is not part of the substring, the substring will not be found. For example, if you specify an ampersand with blanks (" & "), "Ricoh Sales&Service" remains unchanged.

When using commas to separate multiple substrings, follow the comma with a space only if it corresponds with the actual substring you want to remove.

One approach to dealing with blanks is to remove all blanks from a field first, by specifying a blank as a single-character substring prior to specifying any other substrings.

Review the results of using OMIT( )

After using OMIT( ) to created a computed field, review the contents of the field to confirm that you have not inadvertently omitted meaningful portions of strings. For example, omitting "Co" gets rid of a common abbreviation for "Company", but it also removes the letters "Co" from two locations in "Coca-Cola".