REGEXREPLACE( ) function

Replaces all instances of strings matching a regular expression with a new string.

Syntax

REGEXREPLACE(string, pattern, new_string)

Parameters

Name Type Description
string

character

The field, expression, or literal value to test for a matching pattern.
pattern

character

The pattern string (regular expression) to search for.

pattern can contain literal characters, metacharacters, or a combination of the two. Literal characters include all alphanumeric characters, some punctuation characters, and blanks.

The search is case-sensitive, which means that uppercase and lowercase alpha characters must be explicitly specified.

new_string character

The string to use to replace all values matching pattern.

The replacement string can contain literal characters, groups of characters from the original string (using the $int element), or a combination of the two.

Output

Character.

Examples

Basic examples

Working with spaces

Returns "AB CD EF", by replacing multiple spaces between text characters with a single space:

REGEXREPLACE("AB CD   EF", "\s+", " ")

Returns the character field data with the spacing between words standardized on a single space:

REGEXREPLACE(character_field, "\s+", " ")

Returns the character field data with the spacing between words standardized on a single space. Using the BLANKS( ) function in new_string, rather than a literal space, makes spaces easier to read and less likely to be overlooked:

REGEXREPLACE(character_field, "\s+", BLANKS(1))

Standardizing telephone numbers

Returns "(123) 456-7890". The formatting of the telephone number '1234567890' is standardized:

REGEXREPLACE(SUBSTR("1234567890",1,14), "(\d{3})[\s-]*(\d{3})[\s-]*(\d{4})","($1) $2-$3")

Returns the numbers in the Telephone_Number field and standardizes their formatting:

REGEXREPLACE(Telephone_Number, ".*(\d{3})[\s-\.\)]*(\d{3})[\s-\.]*(\d{4})", "($1) $2-$3")

Extracts "123-456-7890" from the surrounding text:

REGEXREPLACE("Tel num: 123-456-7890 (office)", "(.*)(\d{3}[\s-\)\.]*\d{3}[\s-\.]*\d{4})(.*)", "$2")

Extracts telephone numbers from surrounding text in the Comment field and standardizes their formatting:

REGEXREPLACE(Comment, "(.*)(\d{3})[\s-\)\.]*(\d{3})[\s-\.]*(\d{4})(.*)","($2) $3-$4")

Identifying generic formats

Returns "9XXX-999xx", which represents the generic format of the value specified by string ("1ABC-123aa"):

REGEXREPLACE(REGEXREPLACE(REGEXREPLACE("1ABC-123aa","\d","9"),"[a-z]","x"),"[A-Z]", "X")

Returns the generic format of all identifiers in the Invoice_Number field:

REGEXREPLACE(REGEXREPLACE(REGEXREPLACE(Invoice_Number,"\d","9"),"[a-z]","x"),"[A-Z]", "X")

Standardizing name format

Returns "John David Smith":

REGEXREPLACE("Smith, John David", "^(\w+),(\s\w+)(\s\w+)?(\s\w+)?","$2$3$4 $1")

Returns the names in the Full_Name field in their regular order: First (Middle) (Middle) Last:

REGEXREPLACE(Full_Name, "^(\w+),(\s\w+)(\s\w+)?(\s\w+)?","$2$3$4 $1")

Note

Name data can present various complications, such as apostrophes in names. Accounting for variations in name data typically requires more complex regular expressions than the one provided in the example above.

Removing HTML markup

Returns "https://www.flgov.com/wp-content/uploads/orders/2020/EO_20-166.pdf":

REGEXREPLACE("<a href='https://www.flgov.com/wp-content/uploads/orders/2020/EO_20-166.pdf' target='blank'>https://www.flgov.com/wp-content/uploads/orders/2020/EO_20-166.pdf</a>", "<[^>]*>",' ')

Returns the hyperlinks in the Source_URL_Link field with the HTML markup removed:

REGEXREPLACE(Source_URL_Link, "<[^>]*>",' ')

Remarks

How it works

The REGEXREPLACE( ) function uses a regular expression to find matching patterns in data, and replaces any matching values with a new string.

For example:

REGEXREPLACE(character_field, "\s+", " ")

standardizes spacing in character data by replacing one or more spaces between text characters with a single space.

The search portion of REGEXREPLACE( ) is identical to the REGEXFIND( ) function. For detailed information about the search capability common to both functions, see REGEXFIND( ) function.

When to use REGEXREPLACE( )

Use REGEXREPLACE( ) any time you want to find and replace data in Analytics using simple or complex pattern matching.

Replacing characters with themselves

You can use the $int element to replace characters with themselves, which allows you to preserve the meaningful parts of data, while standardizing or omitting surrounding or intermixed data.

Several examples using telephone numbers and names appear above.

To use the $int element you must first create groups by using parentheses ( ) in the pattern value. For more information, see REGEXFIND( ) function.

Avoiding sequential character matching

You can avoid sequential character matching, and replace substrings regardless of their position in relation to one another, by nesting REGEXREPLACE( ) functions.

The problem in the two examples below is to derive a generic format from alphanumeric source data in which numbers and letters can appear in any order. Without knowing the order of numbers and letters, how do you construct the pattern string?

The solution is to first find and replace numbers using the inner REGEXREPLACE( ) function, and then find and replace letters using the outer REGEXREPLACE( ) function.

Returns "999XXX":

REGEXREPLACE(REGEXREPLACE("123ABC","\d","9"),"[A-Z]","X")

Returns "9X9X9X":

REGEXREPLACE(REGEXREPLACE("1A2B3C","\d","9"),"[A-Z]","X")

Replacement string length and truncation

When you use REGEXREPLACE( ) to create a computed field, the computed field length is identical to the original field length.

If the replacement string length exceeds the target string length, overall string length increases, which results in truncation if the computed field length cannot accommodate the increased string length.

Characters that trail the target string are truncated first, followed by trailing replacement string characters. The examples below illustrate truncation:

string

pattern

new_string

Field length

Result

Truncated characters

x123x

123

A

5

"xAx"

none

x123x

123

ABC

5

"xABCx"

none

x123x

123

ABCD

5

"xABCD"

"x"

x123x

123

ABCDE

5

"xABCD"

"x", "E"

x123x

123

ABCDE

6

"xABCDE"

"x"

x123x

123

ABCDE

7

"xABCDEx"

none

How to avoid truncation

To avoid truncation, use the SUBSTR( ) function to increase field length, as demonstrated by the second example below.

Returns "xABCD", which truncates the replacement character "E" and the existing character "x":

REGEXREPLACE("x123x","123","ABCDE")

Returns "xABCDEx", which includes all replacement characters and unreplaced existing characters:

REGEXREPLACE(SUBSTR("x123x",1,10),"123","ABCDE")

Regular expression metacharacters

The table below lists metacharacters you can use with REGEXFIND( ) and REGEXREPLACE( ) and describes the operation that each one performs.

Additional regular expression syntax exists, and is supported by Analytics, but it is more complex. A full explanation of additional syntax is beyond the scope of this guide. Numerous resources explaining regular expressions are available on the Internet.

Analytics uses the ECMAScript implementation of regular expressions. Most implementations of regular expressions use a common core syntax.

Note

The current implementation of regular expressions in Analytics does not fully support searching languages other than English.

Metacharacter

Description

.

Matches any character (except a new line character)

?

Matches 0 or 1 occurrences of the immediately preceding literal, metacharacter, or element

*

Matches 0 or more occurrences of the immediately preceding literal, metacharacter, or element

+

Matches 1 or more occurrences of the immediately preceding literal, metacharacter, or element

{}

Matches the specified number of occurrences of the immediately preceding literal, metacharacter, or element. You can specify an exact number, a range, or an open-ended range.

For example:

  • a{3} matches "aaa"

  • X{0,2}L matches "L", "XL", and "XXL"

  • AB-\d{2,}-YZ matches any alphanumeric identifier with the prefix "AB-", the suffix "-YZ", and two or more numbers in the body of the identifier

[]

Matches any single character inside the brackets

For example:

  • [aeiou] matches a, or e, or i, or o, or u

  • [^aeiou] matches any character that is not a, or e, or i, or o, or u

  • [A-G] matches any uppercase letter from A to G

  • [A-Ga-g] matches any uppercase letter from A to G, or any lowercase letter from a to g

  • [5-9] matches any number from 5 to 9

()

Creates a group that defines a sequence or block of characters, which can then be treated as a single unit.

For example:

  • S(ch)?mid?th? matches "Smith" or "Schmidt"

  • (56A.*){2} matches any alphanumeric identifier in which the sequence "56A" occurs at least twice

  • (56A).*-.*\1 matches any alphanumeric identifier in which the sequence "56A" occurs at least twice, with a hyphen located between two of the occurrences

\

An escape character that specifies that the character immediately following is a literal. Use the escape character if you want to literally match metacharacters. For example, \( finds a left parenthesis, and \\ finds a backslash.

Use the escape character if you want to literally match any of the following characters:

^ $ . * + ? = ! : | \ ( ) [ ] { }

Other punctuation characters such as the ampersand (&) or the 'at sign' (@) do not require the escape character.

\int

Specifies that a group, previously defined with parentheses ( ), recurs. int is an integer that identifies the sequential position of the previously defined group in relation to any other groups. This metacharacter can be used in the pattern parameter in both REGEXFIND( ) and REGEXREPLACE( ).

For example:

  • (123).*\1 matches any identifier in which the group of digits "123" occurs at least twice

  • ^(\d{3}).*\1 matches any identifier in which the first 3 digits recur

  • ^(\d{3}).*\1.*\1 matches any identifier in which the first 3 digits recur at least twice

  • ^(\D)(\d)-.*\2\1 matches any identifier in which the alphanumeric prefix recurs with the alpha and numeric characters reversed

$int

Specifies that a group found in a target string is used in a replacement string. int is an integer that identifies the sequential position of the group in the target string in relation to any other groups. This metacharacter can only be used in the new_string parameter in REGEXREPLACE( ).

For example:

  • If the pattern (\d{3})[ -]?(\d{3})[ -]?(\d{4}) is used to match a variety of different telephone number formats, the new_string($1)-$2-$3 can be used to replace the numbers with themselves, and standardize the formatting. 999 123-4567 and 9991234567 both become (999)-123-4567.

|

Matches the character, block of characters, or expression before or after the pipe (|)

For example:

  • a|b matches a or b

  • abc|def matches "abc" or "def"

  • Sm(i|y)th matches Smith or Smyth

  • [a-c]|[Q-S]|[x-z] matches any of the following letters: a, b, c, Q, R, S, x, y, z

  • \s|- matches a space or a hyphen

\w

Matches any word character (a to z, A to Z, 0 to 9, and the underscore character _ )

\W

Matches any non-word character (not a to z, A to Z, 0 to 9, or the underscore character _ )

\d

Matches any number (any decimal digit)

\D

Matches any non-number (any character that is not a decimal digit)

\s

Matches a space (a blank)

\S

Matches any non-space (a non-blank character)

\b

Matches a word boundary (between \w and \W characters)

Word boundaries consume no space themselves. For example:

  • "United Equipment" contains 4 word boundaries – one either side of the space, and one at the start and the end of the string. "United Equipment" is matched by the regular expression \b\w*\b\W\b\w*\b

Tip

In addition to spaces, word boundaries can result from commas, periods, and other non-word characters.

For example, the following expression evaluates to True:

REGEXFIND("jsmith@example.net", "\bexample\b")

^

Matches the start of a string

Inside brackets [ ], ^ negates the contents

$

Matches the end of a string

Related functions

If you want to find matching patterns without replacing them, use the REGEXFIND( ) function.