REGEXREPLACE( ) function
Replaces all instances of strings matching a regular expression with a new string.
Syntax
REGEXREPLACE(string, pattern, new_string)
Parameters
Name | Type | Description |
---|---|---|
string |
character |
The field, expression, or literal value to test for a matching pattern. |
pattern |
character |
The pattern string (regular expression) to search for. pattern can contain literal characters, metacharacters, or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks. The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified. |
new_string | character |
The string to use to replace all values matching pattern. The replacement string can contain literal characters, groups of characters from the original string (using the $int element), or a combination of the two. |
Output
Character.
Examples
Basic examples
Working with spaces
Returns "AB CD EF", by replacing multiple spaces between text characters with a single space:
REGEXREPLACE("AB CD EF", "\s+", " ")
Returns the character field data with the spacing between words standardized on a single space:
REGEXREPLACE(character_field, "\s+", " ")
Returns the character field data with the spacing between words standardized on a single space. Using the BLANKS( ) function in new_string, rather than a literal space, makes spaces easier to read and less likely to be overlooked:
REGEXREPLACE(character_field, "\s+", BLANKS(1))
Standardizing telephone numbers
Returns "(123) 456-7890". The formatting of the telephone number '1234567890' is standardized:
REGEXREPLACE(SUBSTR("1234567890",1,14), "(\d{3})[\s-]*(\d{3})[\s-]*(\d{4})","($1) $2-$3")
Returns the numbers in the Telephone_Number field and standardizes their formatting:
REGEXREPLACE(Telephone_Number, ".*(\d{3})[\s-\.\)]*(\d{3})[\s-\.]*(\d{4})", "($1) $2-$3")
Extracts "123-456-7890" from the surrounding text:
REGEXREPLACE("Tel num: 123-456-7890 (office)", "(.*)(\d{3}[\s-\)\.]*\d{3}[\s-\.]*\d{4})(.*)", "$2")
Extracts telephone numbers from surrounding text in the Comment field and standardizes their formatting:
REGEXREPLACE(Comment, "(.*)(\d{3})[\s-\)\.]*(\d{3})[\s-\.]*(\d{4})(.*)","($2) $3-$4")
Identifying generic formats
Returns "9XXX-999xx", which represents the generic format of the value specified by string ("1ABC-123aa"):
REGEXREPLACE(REGEXREPLACE(REGEXREPLACE("1ABC-123aa","\d","9"),"[a-z]","x"),"[A-Z]", "X")
Returns the generic format of all identifiers in the Invoice_Number field:
REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(Invoice_Number,"\d","9"),"[a-z]","x"),"[A-Z]", "X")
Standardizing name format
Returns "John David Smith":
REGEXREPLACE("Smith, John David", "^(\w+),(\s\w+)(\s\w+)?(\s\w+)?","$2$3$4 $1")
Returns the names in the Full_Name field in their regular order: First (Middle) (Middle) Last:
REGEXREPLACE(Full_Name, "^(\w+),(\s\w+)(\s\w+)?(\s\w+)?","$2$3$4 $1")
Note
Name data can present various complications, such as apostrophes in names. Accounting for variations in name data typically requires more complex regular expressions than the one provided in the example above.
Remarks
How it works
The REGEXREPLACE( ) function uses a regular expression to find matching patterns in data, and replaces any matching values with a new string.
For example:
REGEXREPLACE(character_field, "\s+", " ")
standardizes spacing in character data by replacing one or more spaces between text characters with a single space.
The search portion of REGEXREPLACE( ) is identical to the REGEXFIND( ) function. For detailed information about the search capability common to both functions, see REGEXFIND( ) function.
When to use REGEXREPLACE( )
Use REGEXREPLACE( ) any time you want to find and replace data in Analytics using simple or complex pattern matching.
Replacing characters with themselves
You can use the $int element to replace characters with themselves, which allows you to preserve the meaningful parts of data, while standardizing or omitting surrounding or intermixed data.
Several examples using telephone numbers and names appear above.
To use the $int element you must first create groups by using parentheses ( ) in the pattern value. For more information, see REGEXFIND( ) function.
Avoiding sequential character matching
You can avoid sequential character matching, and replace substrings regardless of their position in relation to one another, by nesting REGEXREPLACE( ) functions.
The problem in the two examples below is to derive a generic format from alphanumeric source data in which numbers and letters can appear in any order. Without knowing the order of numbers and letters, how do you construct the pattern string?
The solution is to first find and replace numbers using the inner REGEXREPLACE( ) function, and then find and replace letters using the outer REGEXREPLACE( ) function.
Returns "999XXX":
REGEXREPLACE(REGEXREPLACE("123ABC","\d","9"),"[A-Z]","X")
Returns "9X9X9X":
REGEXREPLACE(REGEXREPLACE("1A2B3C","\d","9"),"[A-Z]","X")
Replacement string length and truncation
When you use REGEXREPLACE( ) to create a computed field, the computed field length is identical to the original field length.
If the replacement string length exceeds the target string length, overall string length increases, which results in truncation if the computed field length cannot accommodate the increased string length.
Characters that trail the target string are truncated first, followed by trailing replacement string characters. The examples below illustrate truncation:
string |
pattern |
new_string |
Field length |
Result |
Truncated characters |
---|---|---|---|---|---|
x123x |
123 |
A |
5 |
"xAx" |
none |
x123x |
123 |
ABC |
5 |
"xABCx" |
none |
x123x |
123 |
ABCD |
5 |
"xABCD" |
"x" |
x123x |
123 |
ABCDE |
5 |
"xABCD" |
"x", "E" |
x123x |
123 |
ABCDE |
6 |
"xABCDE" |
"x" |
x123x |
123 |
ABCDE |
7 |
"xABCDEx" |
none |
How to avoid truncation
To avoid truncation, use the SUBSTR( ) function to increase field length, as demonstrated by the second example below.
Returns "xABCD", which truncates the replacement character "E" and the existing character "x":
REGEXREPLACE("x123x","123","ABCDE")
Returns "xABCDEx", which includes all replacement characters and unreplaced existing characters:
REGEXREPLACE(SUBSTR("x123x",1,10),"123","ABCDE")
Regular expression metacharacters
The table below lists metacharacters you can use with REGEXFIND( ) and REGEXREPLACE( ) and describes the operation that each one performs.
Additional regular expression syntax exists, and is supported by Analytics, but it is more complex. A full explanation of additional syntax is beyond the scope of this guide. Numerous resources explaining regular expressions are available on the Internet.
Analytics uses the ECMAScript implementation of regular expressions. Most implementations of regular expressions use a common core syntax.
Note
The current implementation of regular expressions in Analytics does not fully support searching languages other than English.
Metacharacter |
Description |
---|---|
. |
Matches any character (except a new line character) |
? |
Matches 0 or 1 occurrences of the immediately preceding literal, metacharacter, or element |
* |
Matches 0 or more occurrences of the immediately preceding literal, metacharacter, or element |
+ |
Matches 1 or more occurrences of the immediately preceding literal, metacharacter, or element |
{} |
Matches the specified number of occurrences of the immediately preceding literal, metacharacter, or element. You can specify an exact number, a range, or an open-ended range. For example:
|
[] |
Matches any single character inside the brackets For example:
|
() |
Creates a group that defines a sequence or block of characters, which can then be treated as a single unit. For example:
|
\ |
An escape character that specifies that the character immediately following is a literal. Use the escape character if you want to literally match metacharacters. For example, \( finds a left parenthesis, and \\ finds a backslash. Use the escape character if you want to literally match any of the following characters: ^ $ . * + ? = ! : | \ ( ) [ ] { } Other punctuation characters such as the ampersand (&) or the 'at sign' (@) do not require the escape character. |
\int |
Specifies that a group, previously defined with parentheses ( ), recurs. int is an integer that identifies the sequential position of the previously defined group in relation to any other groups. This metacharacter can be used in the pattern parameter in both REGEXFIND( ) and REGEXREPLACE( ). For example:
|
$int |
Specifies that a group found in a target string is used in a replacement string. int is an integer that identifies the sequential position of the group in the target string in relation to any other groups. This metacharacter can only be used in the new_string parameter in REGEXREPLACE( ). For example:
|
| |
Matches the character, block of characters, or expression before or after the pipe (|) For example:
|
\w |
Matches any word character (a to z, A to Z, 0 to 9, and the underscore character _ ) |
\W |
Matches any non-word character (not a to z, A to Z, 0 to 9, or the underscore character _ ) |
\d |
Matches any number (any decimal digit) |
\D |
Matches any non-number (any character that is not a decimal digit) |
\s |
Matches a space (a blank) |
\S |
Matches any non-space (a non-blank character) |
\b |
Matches a word boundary (between \w and \W characters) Word boundaries consume no space themselves. For example:
Tip In addition to spaces, word boundaries can result from commas, periods, and other non-word characters. For example, the following expression evaluates to True: REGEXFIND("jsmith@example.net", "\bexample\b") |
^ |
Matches the start of a string Inside brackets [ ], ^ negates the contents |
$ |
Matches the end of a string |
Related functions
If you want to find matching patterns without replacing them, use the REGEXFIND( ) function.