REGEXFIND( ) function

Returns a logical value indicating whether the pattern specified by a regular expression occurs in a string.

Syntax

REGEXFIND(string, pattern)

Parameters

Name Type Description
string

character

The field, expression, or literal value to test for a matching pattern.
pattern

character

The pattern string (regular expression) to search for.

pattern can contain literal characters, metacharacters, or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks.

The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified.

Output

Logical. Returns T (true) if the specified pattern value is found, and F (false) otherwise.

Examples

Basic examples

Alpha character patterns

Returns T for all records that contain "Phoenix", "Austin", or "Los Angeles" in the Vendor_City field. Returns F otherwise:

REGEXFIND(Vendor_City, "Phoenix|Austin|Los Angeles")

Returns T for all last names that start with "John" or "Jon". For example: John, Jon, Johnson, Johnston, Jonson, Jonston, Jones, and so on. Returns F otherwise:

REGEXFIND(Last_Name,"^Joh?n")

Returns T for only those last names that are "John" or "Jon". Returns F otherwise:

REGEXFIND(Last_Name,"^Joh?n\b")

Numeric character patterns

Returns T for all records with invoice numbers that contain "98". Returns F otherwise:

REGEXFIND(Invoice_Number, "98")

Returns T for all records with invoice numbers that begin with "98". Returns F otherwise:

REGEXFIND(Invoice_Number, "\b98")

Returns T for all records with invoice numbers that end with "98". Returns F otherwise:

REGEXFIND(Invoice_Number, "98\b")

Returns T for all records with invoice numbers that contain "98" in the 5th and 6th positions. Returns F otherwise:

REGEXFIND(Invoice_Number, "\b\d\d\d\d98")
REGEXFIND(Invoice_Number, "\b\d{4}98")

Mixed character patterns

Returns T for all records with product codes that start with 3 numbers, followed by a hyphen and 6 letters. Returns F otherwise:

REGEXFIND(Product_Code, "\b\d{3}-[a-zA-Z]{6}\b")

Returns T for all records with product codes that start with 3 or more numbers, followed by a hyphen and 6 or more letters. Returns F otherwise:

REGEXFIND(Product_Code, "\b\d{3,}-[a-zA-Z]{6}")

Returns T for all records with alphanumeric invoice identifiers that contain "98" in the 5th and 6th positions. Returns F otherwise:

REGEXFIND(Invoice_Number, "\b\w{4}98")

Returns T for all records with invoice identifiers that contain both of the following, otherwise returns F:

  • any character in the first four positions
  • "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{4}98")

Returns T for all records with invoice identifiers that contain "98" preceded by 1 to 4 initial characters. Returns F otherwise:

REGEXFIND(Invoice_Number, "\b.{1,4}98")

Returns 'T' for all records with invoice identifiers that contain all of the following, otherwise returns F:

  • any character in the first three positions
  • "5" or "6" in the 4th position
  • "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{3}[56]98")

Returns T for all records with invoice identifiers that contain all of the following, otherwise returns F:

  • any character in the first two positions
  • "55" or "56" in the 3rd and 4th positions
  • "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{2}(55|56)98")

Remarks

How it works

The REGEXFIND( ) function uses a regular expression to search data in Analytics.

Regular expressions are powerful and flexible search strings that combine literal characters and metacharacters, which are special characters that perform a wide variety of search operations.

For example:

REGEXFIND(Last_Name,"Sm(i|y)the{0,1}")

uses the group ( ) , alternation | , and quantifier { } metacharacters to create a regular expression that finds "Smith", "Smyth", "Smithe", or "Smythe" in the Last_Name field.

Matching performed sequentially

Matching between string and pattern values is performed sequentially. In the example above:

  • "S" is matched against the first position in the Last_Name field
  • "m" is matched against the second position
  • "i" and "y" are matched against the third position
  • "t" is matched against the fourth position
  • "h" is matched against the fifth position
  • "e" is matched against the sixth position, if a sixth position exists in the source value

When to use REGEXFIND( )

Use REGEXFIND( ) to search data using simple or complex pattern matching.

Constructing regular expressions can be tricky, especially if you are new to the syntax. You may be able to achieve your search goals using simpler Analytics search functions such as FIND( ), MATCH( ), or MAP( ).

If your search requirements exceed the capabilities of these simpler functions, regular expressions give you almost unlimited flexibility in constructing search strings.

How REGEXFIND( ) handles spaces

Spaces (blanks) are treated as characters in both string and pattern, so you should exercise care when dealing with spaces.

In pattern, you can indicate a space either literally, by typing a space, or by using the metacharacter \s. Using the metacharacter makes spaces easier to read in pattern, and therefore less likely to be overlooked, especially when you construct more complex patterns.

Concatenating fields

You can concatenate two or more fields in string if you want to search across multiple fields simultaneously.

For example:

REGEXFIND(Vendor_Name+Vendor_Street,"Hardware.*Main")

searches both the Vendor_Name and the Vendor_Street fields for the words "Hardware" and "Main" separated by zero or more characters.

A business with the word "Hardware" in its name, located on a street called "Main", matches the regular expression. So does a business called "Hardware on Main".

The concatenated fields are treated like a single field that includes leading and trailing spaces from the individual fields, unless you use the ALLTRIM( ) function to remove spaces.

Order of concatenated fields matters

Because REGEXFIND( ) searches for the characters in pattern in the order in which you specify them, the order in which you concatenate the fields has an effect. If you reversed Vendor_Name and Vendor_Street in the expression above, you would be less likely to get any results.

Regular expression metacharacters

The table below lists metacharacters you can use with REGEXFIND( ) and REGEXREPLACE( ) and describes the operation that each one performs.

Additional regular expression syntax exists, and is supported by Analytics, but it is more complex. A full explanation of additional syntax is beyond the scope of this guide. Numerous resources explaining regular expressions are available on the Internet.

Analytics uses the ECMAScript implementation of regular expressions. Most implementations of regular expressions use a common core syntax.

Note

The current implementation of regular expressions in Analytics does not fully support searching languages other than English.

Metacharacter

Description

.

Matches any character (except a new line character)

?

Matches 0 or 1 occurrences of the immediately preceding literal, metacharacter, or element

*

Matches 0 or more occurrences of the immediately preceding literal, metacharacter, or element

+

Matches 1 or more occurrences of the immediately preceding literal, metacharacter, or element

{}

Matches the specified number of occurrences of the immediately preceding literal, metacharacter, or element. You can specify an exact number, a range, or an open-ended range.

For example:

  • a{3} matches "aaa"

  • X{0,2}L matches "L", "XL", and "XXL"

  • AB-\d{2,}-YZ matches any alphanumeric identifier with the prefix "AB-", the suffix "-YZ", and two or more numbers in the body of the identifier

[]

Matches any single character inside the brackets

For example:

  • [aeiou] matches a, or e, or i, or o, or u

  • [^aeiou] matches any character that is not a, or e, or i, or o, or u

  • [A-G] matches any uppercase letter from A to G

  • [A-Ga-g] matches any uppercase letter from A to G, or any lowercase letter from a to g

  • [5-9] matches any number from 5 to 9

()

Creates a group that defines a sequence or block of characters, which can then be treated as a single unit.

For example:

  • S(ch)?mid?th? matches "Smith" or "Schmidt"

  • (56A.*){2} matches any alphanumeric identifier in which the sequence "56A" occurs at least twice

  • (56A).*-.*\1 matches any alphanumeric identifier in which the sequence "56A" occurs at least twice, with a hyphen located between two of the occurrences

\

An escape character that specifies that the character immediately following is a literal. Use the escape character if you want to literally match metacharacters. For example, \( finds a left parenthesis, and \\ finds a backslash.

Use the escape character if you want to literally match any of the following characters:

^ $ . * + ? = ! : | \ ( ) [ ] { }

Other punctuation characters such as the ampersand (&) or the 'at sign' (@) do not require the escape character.

\int

Specifies that a group, previously defined with parentheses ( ), recurs. int is an integer that identifies the sequential position of the previously defined group in relation to any other groups. This metacharacter can be used in the pattern parameter in both REGEXFIND( ) and REGEXREPLACE( ).

For example:

  • (123).*\1 matches any identifier in which the group of digits "123" occurs at least twice

  • ^(\d{3}).*\1 matches any identifier in which the first 3 digits recur

  • ^(\d{3}).*\1.*\1 matches any identifier in which the first 3 digits recur at least twice

  • ^(\D)(\d)-.*\2\1 matches any identifier in which the alphanumeric prefix recurs with the alpha and numeric characters reversed

$int

Specifies that a group found in a target string is used in a replacement string. int is an integer that identifies the sequential position of the group in the target string in relation to any other groups. This metacharacter can only be used in the new_string parameter in REGEXREPLACE( ).

For example:

  • If the pattern (\d{3})[ -]?(\d{3})[ -]?(\d{4}) is used to match a variety of different telephone number formats, the new_string($1)-$2-$3 can be used to replace the numbers with themselves, and standardize the formatting. 999 123-4567 and 9991234567 both become (999)-123-4567.

|

Matches the character, block of characters, or expression before or after the pipe (|)

For example:

  • a|b matches a or b

  • abc|def matches "abc" or "def"

  • Sm(i|y)th matches Smith or Smyth

  • [a-c]|[Q-S]|[x-z] matches any of the following letters: a, b, c, Q, R, S, x, y, z

  • \s|- matches a space or a hyphen

\w

Matches any word character (a to z, A to Z, 0 to 9, and the underscore character _ )

\W

Matches any non-word character (not a to z, A to Z, 0 to 9, or the underscore character _ )

\d

Matches any number (any decimal digit)

\D

Matches any non-number (any character that is not a decimal digit)

\s

Matches a space (a blank)

\S

Matches any non-space (a non-blank character)

\b

Matches a word boundary (between \w and \W characters)

Word boundaries consume no space themselves. For example:

  • "United Equipment" contains 4 word boundaries – one either side of the space, and one at the start and the end of the string. "United Equipment" is matched by the regular expression \b\w*\b\W\b\w*\b

Tip

In addition to spaces, word boundaries can result from commas, periods, and other non-word characters.

For example, the following expression evaluates to True:

REGEXFIND("jsmith@example.net", "\bexample\b")

^

Matches the start of a string

Inside brackets [ ], ^ negates the contents

$

Matches the end of a string

Related functions

If you want to find and replace matching patterns, use the REGEXREPLACE( ) function.