REGEXFIND( ) function
Returns a logical value indicating whether the pattern specified by a regular expression occurs in a string.
Syntax
REGEXFIND(string, pattern)
Parameters
Name | Type | Description |
---|---|---|
string |
character |
The field, expression, or literal value to test for a matching pattern. |
pattern |
character |
The pattern string (regular expression) to search for. pattern can contain literal characters, metacharacters, or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks. The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified. |
Output
Logical. Returns T (true) if the specified pattern value is found, and F (false) otherwise.
Examples
Basic examples
Alpha character patterns
Returns T for all records that contain "Phoenix", "Austin", or "Los Angeles" in the Vendor_City field. Returns F otherwise:
REGEXFIND(Vendor_City, "Phoenix|Austin|Los Angeles")
Returns T for all last names that start with "John" or "Jon". For example: John, Jon, Johnson, Johnston, Jonson, Jonston, Jones, and so on. Returns F otherwise:
REGEXFIND(Last_Name,"^Joh?n")
Returns T for only those last names that are "John" or "Jon". Returns F otherwise:
REGEXFIND(Last_Name,"^Joh?n\b")
Numeric character patterns
Returns T for all records with invoice numbers that contain "98". Returns F otherwise:
REGEXFIND(Invoice_Number, "98")
Returns T for all records with invoice numbers that begin with "98". Returns F otherwise:
REGEXFIND(Invoice_Number, "\b98")
Returns T for all records with invoice numbers that end with "98". Returns F otherwise:
REGEXFIND(Invoice_Number, "98\b")
Returns T for all records with invoice numbers that contain "98" in the 5th and 6th positions. Returns F otherwise:
REGEXFIND(Invoice_Number, "\b\d\d\d\d98")
REGEXFIND(Invoice_Number, "\b\d{4}98")
Mixed character patterns
Returns T for all records with product codes that start with 3 numbers, followed by a hyphen and 6 letters. Returns F otherwise:
REGEXFIND(Product_Code, "\b\d{3}-[a-zA-Z]{6}\b")
Returns T for all records with product codes that start with 3 or more numbers, followed by a hyphen and 6 or more letters. Returns F otherwise:
REGEXFIND(Product_Code, "\b\d{3,}-[a-zA-Z]{6}")
Returns T for all records with alphanumeric invoice identifiers that contain "98" in the 5th and 6th positions. Returns F otherwise:
REGEXFIND(Invoice_Number, "\b\w{4}98")
Returns T for all records with invoice identifiers that contain both of the following, otherwise returns F:
- any character in the first four positions
- "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{4}98")
Returns T for all records with invoice identifiers that contain "98" preceded by 1 to 4 initial characters. Returns F otherwise:
REGEXFIND(Invoice_Number, "\b.{1,4}98")
Returns 'T' for all records with invoice identifiers that contain all of the following, otherwise returns F:
- any character in the first three positions
- "5" or "6" in the 4th position
- "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{3}[56]98")
Returns T for all records with invoice identifiers that contain all of the following, otherwise returns F:
- any character in the first two positions
- "55" or "56" in the 3rd and 4th positions
- "98" in the 5th and 6th positions
REGEXFIND(Invoice_Number, "\b.{2}(55|56)98")
Remarks
How it works
The REGEXFIND( ) function uses a regular expression to search data in Analytics.
Regular expressions are powerful and flexible search strings that combine literal characters and metacharacters, which are special characters that perform a wide variety of search operations.
For example:
REGEXFIND(Last_Name,"Sm(i|y)the{0,1}")
uses the group ( ) , alternation | , and quantifier { } metacharacters to create a regular expression that finds "Smith", "Smyth", "Smithe", or "Smythe" in the Last_Name field.
Matching performed sequentially
Matching between string and pattern values is performed sequentially. In the example above:
- "S" is matched against the first position in the Last_Name field
- "m" is matched against the second position
- "i" and "y" are matched against the third position
- "t" is matched against the fourth position
- "h" is matched against the fifth position
- "e" is matched against the sixth position, if a sixth position exists in the source value
When to use REGEXFIND( )
Use REGEXFIND( ) to search data using simple or complex pattern matching.
Constructing regular expressions can be tricky, especially if you are new to the syntax. You may be able to achieve your search goals using simpler Analytics search functions such as FIND( ), MATCH( ), or MAP( ).
If your search requirements exceed the capabilities of these simpler functions, regular expressions give you almost unlimited flexibility in constructing search strings.
How REGEXFIND( ) handles spaces
Spaces (blanks) are treated as characters in both string and pattern, so you should exercise care when dealing with spaces.
In pattern, you can indicate a space either literally, by typing a space, or by using the metacharacter \s. Using the metacharacter makes spaces easier to read in pattern, and therefore less likely to be overlooked, especially when you construct more complex patterns.
Concatenating fields
You can concatenate two or more fields in string if you want to search across multiple fields simultaneously.
For example:
REGEXFIND(Vendor_Name+Vendor_Street,"Hardware.*Main")
searches both the Vendor_Name and the Vendor_Street fields for the words "Hardware" and "Main" separated by zero or more characters.
A business with the word "Hardware" in its name, located on a street called "Main", matches the regular expression. So does a business called "Hardware on Main".
The concatenated fields are treated like a single field that includes leading and trailing spaces from the individual fields, unless you use the ALLTRIM( ) function to remove spaces.
Order of concatenated fields matters
Because REGEXFIND( ) searches for the characters in pattern in the order in which you specify them, the order in which you concatenate the fields has an effect. If you reversed Vendor_Name and Vendor_Street in the expression above, you would be less likely to get any results.
Regular expression metacharacters
The table below lists metacharacters you can use with REGEXFIND( ) and REGEXREPLACE( ) and describes the operation that each one performs.
Additional regular expression syntax exists, and is supported by Analytics, but it is more complex. A full explanation of additional syntax is beyond the scope of this guide. Numerous resources explaining regular expressions are available on the Internet.
Analytics uses the ECMAScript implementation of regular expressions. Most implementations of regular expressions use a common core syntax.
Note
The current implementation of regular expressions in Analytics does not fully support searching languages other than English.
Metacharacter |
Description |
---|---|
. |
Matches any character (except a new line character) |
? |
Matches 0 or 1 occurrences of the immediately preceding literal, metacharacter, or element |
* |
Matches 0 or more occurrences of the immediately preceding literal, metacharacter, or element |
+ |
Matches 1 or more occurrences of the immediately preceding literal, metacharacter, or element |
{} |
Matches the specified number of occurrences of the immediately preceding literal, metacharacter, or element. You can specify an exact number, a range, or an open-ended range. For example:
|
[] |
Matches any single character inside the brackets For example:
|
() |
Creates a group that defines a sequence or block of characters, which can then be treated as a single unit. For example:
|
\ |
An escape character that specifies that the character immediately following is a literal. Use the escape character if you want to literally match metacharacters. For example, \( finds a left parenthesis, and \\ finds a backslash. Use the escape character if you want to literally match any of the following characters: ^ $ . * + ? = ! : | \ ( ) [ ] { } Other punctuation characters such as the ampersand (&) or the 'at sign' (@) do not require the escape character. |
\int |
Specifies that a group, previously defined with parentheses ( ), recurs. int is an integer that identifies the sequential position of the previously defined group in relation to any other groups. This metacharacter can be used in the pattern parameter in both REGEXFIND( ) and REGEXREPLACE( ). For example:
|
$int |
Specifies that a group found in a target string is used in a replacement string. int is an integer that identifies the sequential position of the group in the target string in relation to any other groups. This metacharacter can only be used in the new_string parameter in REGEXREPLACE( ). For example:
|
| |
Matches the character, block of characters, or expression before or after the pipe (|) For example:
|
\w |
Matches any word character (a to z, A to Z, 0 to 9, and the underscore character _ ) |
\W |
Matches any non-word character (not a to z, A to Z, 0 to 9, or the underscore character _ ) |
\d |
Matches any number (any decimal digit) |
\D |
Matches any non-number (any character that is not a decimal digit) |
\s |
Matches a space (a blank) |
\S |
Matches any non-space (a non-blank character) |
\b |
Matches a word boundary (between \w and \W characters) Word boundaries consume no space themselves. For example:
Tip In addition to spaces, word boundaries can result from commas, periods, and other non-word characters. For example, the following expression evaluates to True: REGEXFIND("jsmith@example.net", "\bexample\b") |
^ |
Matches the start of a string Inside brackets [ ], ^ negates the contents |
$ |
Matches the end of a string |
Related functions
If you want to find and replace matching patterns, use the REGEXREPLACE( ) function.