REGEXFIND( ) function

Returns a logical value indicating whether the pattern specified by a regular expression occurs in a string.

Syntax

REGEXFIND(string, pattern)

Parameters

string

Character. The field, expression, or literal value to test for a matching pattern.

pattern

Character. The pattern (regular expression) to search for. The pattern parameter can contain literal characters, metacharacters (shown in Table 1), or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks. The pattern parameter must be enclosed in quotation marks. The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified.

Output

Logical. Returns T (true) if the specified pattern value is found, and F (false) otherwise.

Remarks

The REGEXFIND( ) function allows you to construct a regular expression to find matching patterns in data. Regular expressions are powerful and flexible search strings that combine literal characters and metacharacters, which are special characters that perform a wide variety of search operations. For example:

REGEXFIND(Last_Name,"Sm(i|y)the{0,1}")

uses the group, alternation, and quantifier metacharacters to create a regular expression that finds “Smith”, “Smyth”, “Smithe”, or “Smythe” in the Last_Name field. A number of additional examples appear below.

Matching between the string and pattern parameters is performed sequentially. In the example above, “S” is matched against the first position in the Last_Name field, “m” is matched against the second position, “i” and “y” are matched against the third position, “t” is matched against the fourth position, and so on.

Spaces (blanks) are treated as characters in both the string and pattern parameters, so you should exercise care when dealing with spaces. In the pattern parameter, you can indicate a space either literally, by typing a space, or by using the metacharacter \s. Using the metacharacter makes spaces easier to read, and less likely to be overlooked, in a regular expression.

If you want to replace matching patterns with a new string, see REGEXREPLACE( ) function.

You can concatenate two or more fields in the string parameter if you want to search across multiple fields simultaneously. For example:

REGEXFIND(Vendor_Name+Vendor_Street,"Hardware.*Main")

searches both the Vendor_Name and the Vendor_Street fields for the words “Hardware” and “Main” separated by zero or more characters. A business with the word “Hardware” in its name, located on a street called “Main”, would match the regular expression. So would a business called “Hardware on Main”. The concatenated fields are treated like a single field that includes leading and trailing spaces from the individual fields, unless you use the ALLTRIM( ) function to remove spaces. Because REGEXFIND( ) searches for the characters in the pattern parameter in the order in which you specify them, the order in which you concatenate the fields has an effect. If you reversed Vendor_Name and Vendor_Street in the expression above, you would be less likely to get any results.

Constructing regular expressions can be tricky, especially if you are new to the syntax. You may be able to achieve your search goals using simpler ACL search functions such as FIND( ), MATCH( ), or MAP( ). If your search requirements exceed the capabilities of these simpler functions, regular expressions give you almost unlimited flexibility in constructing search strings.

Table 1 lists the most common metacharacters and describes the operation that each one performs. Additional regular expression syntax exists, and is more complex. A full explanation is beyond the scope of this guide. Numerous resources explaining regular expressions are available on the Internet. ACL uses the ECMAScript implementation of regular expressions. Most implementations of regular expressions use a common core syntax.

Note

The current implementation of regular expressions in ACL does not fully support searching languages other than English.

Table 1. Regular expression metacharacters

Metacharacter

Description

.

Matches any character (except a new line character)

?

Matches 0 or 1 occurrences of the immediately preceding literal, metacharacter, or element

*

Matches 0 or more occurrences of the immediately preceding literal, metacharacter, or element

+

Matches 1 or more occurrences of the immediately preceding literal, metacharacter, or element

{}

Matches the specified number of occurrences of the immediately preceding literal, metacharacter, or element. You can specify an exact number, a range, or an open-ended range.

For example:

  • a{3} matches “aaa”

  • X{0,2}L matches “L”, “XL”, and “XXL”

  • AB-\d{2,}-YZ matches any alphanumeric identifier with the prefix “AB-”, the suffix “-YZ”, and two or more numbers in the body of the identifier

[]

Matches any single character inside the brackets

For example:

  • [aeiou] matches a, or e, or i, or o, or u

  • [^aeiou] matches any character that is not a, or e, or i, or o, or u

  • [A-G] matches any uppercase letter from A to G

  • [A-Ga-g] matches any uppercase letter from A to G, or any lowercase letter from a to g

  • [5-9] matches any number from 5 to 9

()

Creates a group that defines a sequence or block of characters, which can then be treated as a single unit.

For example:

  • S(ch)?mid?th? matches “Smith” or “Schmidt”

  • (56A.*){2} matches any alphanumeric identifier in which the sequence “56A” occurs at least twice

  • (56A).*-.*\1 matches any alphanumeric identifier in which the sequence “56A” occurs at least twice, with a hyphen located between two of the occurrences

\

An escape character that specifies that the character immediately following is a literal. Use the escape character if you want to literally match metacharacters. For example, \( finds a left parenthesis, and \\ finds a backslash.

Use the escape character if you want to literally match any of the following characters:

^ $ . * + ? = ! : | \ ( ) [ ] { }

Other punctuation characters such as the ampersand (&) or the ‘at sign’ (@) do not require the escape character.

\int

Specifies that a group, previously defined with parentheses ( ), recurs. int is an integer that identifies the sequential position of the previously defined group in relation to any other groups. This metacharacter can be used in the pattern parameter in both REGEXFIND( ) and REGEXREPLACE( ).

For example:

  • (123).*\1 matches any identifier in which the group of digits “123” occurs at least twice

  • ^(\d{3}).*\1 matches any identifier in which the first 3 digits recur

  • ^(\d{3}).*\1.*\1 matches any identifier in which the first 3 digits recur at least twice

  • ^(\D)(\d)-.*\2\1 matches any identifier in which the alphanumeric prefix recurs with the alpha and numeric characters reversed

$int

Specifies that a group found in a target string is used in a replacement string. int is an integer that identifies the sequential position of the group in the target string in relation to any other groups. This metacharacter can only be used in the new_string parameter in REGEXREPLACE( ).

For example:

  • If the pattern (\d{3})[ -]?(\d{3})[ -]?(\d{4}) is used to match a variety of different telephone number formats, the new_string ($1)-$2-$3 can be used to replace the numbers with themselves, and standardize the formatting. 999 123-4567 and 9991234567 both become (999)-123-4567.

|

Matches the character, block of characters, or expression before or after the pipe (|)

For example:

  • a|b matches a or b

  • abc|def matches “abc” or “def”

  • Sm(i|y)th matches Smith or Smyth

  • [a-c]|[Q-S]|[x-z] matches any of the following letters: a, b, c, Q, R, S, x, y, z

  • \s|- matches a space or a hyphen

\w

Matches any word character (a to z, A to Z, 0 to 9, and the underscore character _ )

\W

Matches any non-word character (not a to z, A to Z, 0 to 9, or the underscore character _ )

\d

Matches any number (any decimal digit)

\D

Matches any non-number (any character that is not a decimal digit)

\s

Matches a space (a blank)

\S

Matches any non-space (a non-blank character)

\b

Matches a word boundary (between \w and \W characters)

Word boundaries consume no space themselves. For example:

  • “United Equipment” contains 4 word boundaries – one either side of the space, and one at the start and the end of the string. “United Equipment” is matched by the regular expression \b\w*\b\W\b\w*\b

^

Matches the start of a string

Inside brackets [ ], ^ negates the contents

$

Matches the end of a string

Examples

Example Return value

REGEXFIND(Vendor_City, "Phoenix|Austin|Los Angeles")

Returns ‘T’ for all records that contain “Phoenix”, “Austin”, or “Los Angeles” in the Vendor_City field. Returns ‘F’ otherwise.

REGEXFIND(Last_Name,"^Joh?n")

Returns ‘T’ for all last names that start with “John” or “Jon”.

For example: John, Jon, Johnson, Johnston, Jonson, Jonston, Jones, and so on.

Returns ‘F’ otherwise.

REGEXFIND(Last_Name,"^Joh?n\b")

Returns ‘T’ for only those last names that are “John” or “Jon”. Returns ‘F’ otherwise.

REGEXFIND(Product_Code, "\b\d{3}-[a-zA-Z]{6}\b")

Returns ‘T’ for all records with product codes that start with 3 numbers, followed by a hyphen and 6 letters. Returns ‘F’ otherwise.

REGEXFIND(Product_Code, "\b\d{3,}-[a-zA-Z]{6}")

Returns ‘T’ for all records with product codes that start with 3 or more numbers, followed by a hyphen and 6 or more letters. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "98")

Returns ‘T’ for all records with invoice numbers that contain “98”. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b98")

Returns ‘T’ for all records with invoice numbers that begin with “98”. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "98\b")

Returns ‘T’ for all records with invoice numbers that end with “98”. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b\d\d\d\d98")

Returns ‘T’ for all records with invoice numbers that contain “98” in the 5th and 6th positions. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b\d{4}98")

Returns ‘T’ for all records with invoice numbers that contain “98” in the 5th and 6th positions. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b\w{4}98")

Returns ‘T’ for all records with alphanumeric invoice identifiers that contain “98” in the 5th and 6th positions. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b.{4}98")

Returns ‘T’ for all records with invoice identifiers that contain:

  • any character in the first four positions, and

  • “98” in the 5th and 6th positions

Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b.{1,4}98")

Returns ‘T’ for all records with invoice identifiers that contain “98” preceded by 1 to 4 initial characters. Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b.{3}[56]98")

Returns ‘T’ for all records with invoice identifiers that contain:

  • any character in the first three positions, and

  • “5” or “6” in the 4th position, and

  • “98” in the 5th and 6th positions

Returns ‘F’ otherwise.

REGEXFIND(Invoice_Number, "\b.{2}(55|56)98")

Returns ‘T’ for all records with invoice identifiers that contain:

  • any character in the first two positions, and

  • “55” or “56” in the 3rd and 4th positions, and

  • “98” in the 5th and 6th positions

Returns ‘F’ otherwise.

Related reference
REGEXREPLACE( ) function
FIND( ) function
MATCH( ) function
MAP( ) function


(C) 2015 ACL Services Ltd. All Rights Reserved.