SOUNDEX( ) function

Returns the soundex code for the specified string, which can be used for phonetic comparisons with other strings.

Syntax

SOUNDEX(name)

Parameters

Name    Type         Description
name    character    The character expression to evaluate.

Output

Character. Returns a four-character soundex code.

Examples

Basic examples

Words that sound the same but are spelled differently

The two examples below return the same soundex code because the two words sound the same even though they are spelled differently.

Returns F634:

SOUNDEX("Fairdale")

Returns F634:

SOUNDEX("Faredale")

Words that sound similar

The two examples below return soundex codes that are different, but close to one another, because the two words sound similar.

Returns J525:

SOUNDEX("Jonson")

Returns J523:

SOUNDEX("Jonston")

Words that sound different

The two examples below return soundex codes that are quite different, because the two words sound nothing alike.

Returns S530:

SOUNDEX("Smith")

Returns M235:

SOUNDEX("MacDonald")

Field input

Returns the soundex code for each value in the Last_Name field:

SOUNDEX(Last_Name)

Advanced examples

Identifying matching soundex codes

Create the computed field Soundex_Code to display the soundex code for each value in the Last_Name field:

DEFINE FIELD Soundex_Code COMPUTED SOUNDEX(Last_Name)

Add the computed field Soundex_Code to the view, and then perform a duplicates test on the computed field to identify any matching soundex codes:

DUPLICATES ON Soundex_Code OTHER Last_Name PRESORT OPEN TO "Possible_Dupes.fil"

Matching soundex codes indicate that the associated character values in the Last_Name field are possible duplicates.
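
The grouping that the duplicates test performs on the computed field can be pictured with a short Python sketch. This is only an illustration of the idea, not how ACL runs the DUPLICATES command. The names and codes are copied from the basic examples above, and the function name possible_duplicates is hypothetical.

```python
from collections import defaultdict

# (Last_Name, soundex code) pairs; the codes are taken from the basic examples.
records = [
    ("Fairdale", "F634"),
    ("Faredale", "F634"),
    ("Jonson", "J525"),
    ("Jonston", "J523"),
    ("Smith", "S530"),
    ("MacDonald", "M235"),
]


def possible_duplicates(pairs):
    """Group names by soundex code and keep only codes shared by 2+ names."""
    names_by_code = defaultdict(list)
    for name, code in pairs:
        names_by_code[code].append(name)
    return {code: names for code, names in names_by_code.items() if len(names) > 1}


possible_dupes = possible_duplicates(records)
print(possible_dupes)  # {'F634': ['Fairdale', 'Faredale']}
```

Only "Fairdale" and "Faredale" are reported. "Jonson" and "Jonston" have close but not identical codes, so a test for matching codes does not pair them.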

Remarks

When to use SOUNDEX( )

Use the SOUNDEX( ) function to find values that sound similar. Phonetic similarity is one way of locating possible duplicate values, or inconsistent spelling in manually entered data.

How it works

SOUNDEX( ) returns the American Soundex code for the evaluated string. Every code consists of one letter followed by three digits. For example: "F634".

How the soundex code is derived

  • The first character in the code represents the first letter of the evaluated string.
  • Each digit in the code represents one of the six American Soundex groups. The groups are composed of phonetically similar consonants.

    Based on these groups, the soundex process encodes the first three consonants in the evaluated string after the first letter.

What the soundex process ignores

The soundex process ignores:

  • capitalization
  • vowels
  • the consonants "H", "W", and "Y"
  • any consonants that appear after the three encoded consonants

One or more trailing zeros (0) in the returned code indicate an evaluated string with fewer than three consonants after the first letter.
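
The rules above can be sketched in Python. This is a minimal illustration of the standard American Soundex algorithm, not ACL's own implementation. The group table is the commonly published one, and two details are assumptions that this page does not state: adjacent consonants from the same group are encoded once, and "H" or "W" between two consonants from the same group does not separate them. The function name soundex is hypothetical.

```python
# Illustrative sketch only; ACL's SOUNDEX( ) may differ in edge cases.
# The six American Soundex groups of phonetically similar consonants.
SOUNDEX_GROUPS = {
    "BFPV": "1",
    "CGJKQSXZ": "2",
    "DT": "3",
    "L": "4",
    "MN": "5",
    "R": "6",
}
CODE_FOR = {
    letter: digit
    for letters, digit in SOUNDEX_GROUPS.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return a four-character American Soundex code, or "" if no letters."""
    # Capitalization is ignored; anything that is not A-Z is dropped.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    previous = CODE_FOR.get(first, "")  # group of the most recent letter
    for ch in letters[1:]:
        code = CODE_FOR.get(ch, "")  # "" for vowels, H, W, and Y
        if code and code != previous:
            digits.append(code)
        # Assumption: H and W do not break a run of same-group consonants.
        if ch not in "HW":
            previous = code
        if len(digits) == 3:
            break  # later consonants are ignored

    # Pad with trailing zeros when fewer than three consonants were encoded.
    return first + "".join(digits).ljust(3, "0")


print(soundex("Fairdale"))   # F634
print(soundex("Smith"))      # S530
print(soundex("MacDonald"))  # M235
```

"Smith" shows the zero padding: only "M" and "T" are encoded after the first letter, so the code ends in 0.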

Limitations of the soundex process

Both the SOUNDEX( ) and SOUNDSLIKE( ) functions have certain limitations:

  • The soundex algorithm is designed to work with words pronounced in English, and has varying degrees of effectiveness when used with other languages.
  • Although the soundex process performs a phonetic match, matching words must all begin with the same letter, which means that some words that sound the same are not matched.

    For example, a word that begins with "F" and a word that begins with "Ph" could sound the same, but they will never be matched.

Related functions

  • SOUNDSLIKE( ) provides an alternate method for phonetically comparing strings.
  • ISFUZZYDUP( ) and LEVDIST( ) compare strings based on an orthographic comparison (spelling) rather than on a phonetic comparison (sound).
  • DICECOEFFICIENT( ) de-emphasizes or completely ignores the relative position of characters or character blocks when comparing strings.
ACL Scripting Guide 14.1