HASH( ) function
Returns a salted cryptographic hash value based on the input value.
Syntax
HASH(field <,salt_value>)
Parameters
Name | Type | Description |
---|---|---|
field | character, numeric, datetime, logical | The value to hash. |
salt_value (optional) | character, numeric | The salt value to use. You can specify a character string or a PASSWORD identifier number from 1 to 10. If omitted, the Analytics default salt value is used. The salt value is limited to 128 characters; a longer salt value is automatically truncated to 128 characters. For more information, see The salt value. |
Output
Character.
Examples
Basic examples
With the Analytics default salt value
Returns "819A974BB91215D58E7753FD5A42226150100A0763087CA7DECD93F3C3090405":
HASH("555-44-3322")
Returns the hash value for each number in the Credit_card_num field:
HASH(Credit_card_num)
With a user-specified salt value
Returns "AD1E7D9B97B6F6B5345AB13471A74C31EBE6630CA2622BB7E8C280E9FBEE1F17":
HASH("555-44-3322", "my salt value 123")
Advanced examples
Ensuring hash values are identical
Use other functions in conjunction with HASH( ) to standardize clear text values that should produce identical hash values.
Consider the following set of examples. Note how the case of the clear text values completely alters the output hash value in the first two examples.
Returns "DF6789E1EC65055CD9CA17DD5B0BEA5892504DFE7661D258737AF7CB9DC46462":
HASH("John Smith")
Returns "3E12EABB5940B7A2AD90A6B0710237B935FAB68E629907927A65B3AA7BE6781D":
HASH("JOHN SMITH")
Using the UPPER( ) function to standardize case produces an identical hash value.
Returns "3E12EABB5940B7A2AD90A6B0710237B935FAB68E629907927A65B3AA7BE6781D":
HASH(UPPER("John Smith"))
Using HASH( ) to compare large blocks of text
Use HASH( ) to test if blocks of text in two comment fields are identical.
To perform this test, create two computed fields similar to the ones shown below, and then create a filter to find any text blocks that are not identical.
DEFINE FIELD Hash_1 COMPUTED HASH(Comment_Field_1)
DEFINE FIELD Hash_2 COMPUTED HASH(Comment_Field_2)
SET FILTER TO Hash_1 <> Hash_2
If the comment fields are in separate tables, create a computed HASH( ) field in each table and then use the computed fields as a common key field to do an unmatched join of the two tables. The records in the joined output table represent text blocks that are not identical.
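The cross-table comparison can be sketched outside Analytics. The following Python is a conceptual illustration only, not ACLScript: the `digest` helper, the sample records, and the use of unsalted SHA-256 are assumptions, so the values it produces will not match HASH( ) output.

```python
import hashlib

def digest(text: str) -> str:
    """Hypothetical stand-in for HASH( ): trim blanks, then hash (assumed SHA-256)."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest().upper()

# Two hypothetical tables, each with a comment field.
table_1 = [
    {"id": 1, "comment": "Approved by regional manager."},
    {"id": 2, "comment": "Pending review."},
]
table_2 = [
    {"id": 1, "comment": "Approved by regional manager."},
    {"id": 2, "comment": "Pending  review."},  # extra interior blank
]

# Use the hash as the key: keep table_1 records with no matching hash in table_2.
keys_2 = {digest(row["comment"]) for row in table_2}
unmatched = [row["id"] for row in table_1 if digest(row["comment"]) not in keys_2]

print(unmatched)  # [2]
```

Comparing short fixed-length hashes instead of the full text keeps the key field small, which is the reason for hashing before the join.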
Remarks
When to use HASH( )
Use the HASH( ) function to protect sensitive data, such as credit card numbers, salary information, or social security numbers.
How it works
HASH( ) provides one-way encoding. Data in clear text can be used to produce a hash value; however, the hash value cannot subsequently be decoded or decrypted.
A specific clear text value always produces the same hash value, so you can search a field of hashed credit card numbers for duplicates, or join two fields of hashed credit card numbers, and the results are the same as if you had performed the operation on the equivalent clear text fields.
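As a rough illustration of why duplicate testing gives the same result on hashed values, the Python sketch below flags duplicates in a clear text list and in its hashed counterpart. The `digest` and `duplicate_positions` helpers and the sample numbers are hypothetical, and unsalted SHA-256 is an assumption standing in for HASH( ).

```python
import hashlib
from collections import Counter

def digest(value: str) -> str:
    """Hypothetical stand-in for HASH( ) (assumed SHA-256, no salt)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest().upper()

def duplicate_positions(values):
    """Return the positions of every value that occurs more than once."""
    counts = Counter(values)
    return {i for i, v in enumerate(values) if counts[v] > 1}

card_numbers = ["4000-0000-0000-0001", "4000-0000-0000-0002", "4000-0000-0000-0001"]
hashed_numbers = [digest(n) for n in card_numbers]

# The same records are flagged whether you test the clear text or the hashes.
print(duplicate_positions(card_numbers) == duplicate_positions(hashed_numbers))  # True
```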
Protecting sensitive data
To avoid storing sensitive data on a server, you can create a computed field locally using the HASH( ) function, and then create a new table by extracting the hashed field and any other required fields, while excluding the clear text field. You can use the new table on the server for your analysis, and once you have the results, refer back to the original table if you need to see the clear text version of any of the hashed data.
If storing sensitive data locally beyond initial use is prohibited, you can delete the original table after you have created the new table with the hashed values, and refer to the original source system for the clear text values.
Clear text values must be exactly identical
In order to produce identical hash values, two clear text values must be exactly identical. For example, different hash values result from the same credit card number with or without hyphens, or the same name in title case or all upper case.
You may need to incorporate functions such as INCLUDE( ), EXCLUDE( ), or UPPER( ) in the HASH( ) function to standardize clear text values.
Leading and trailing blanks are automatically trimmed by the HASH( ) function, so there is no need to use the TRIM( ) or ALLTRIM( ) functions.
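The effect of standardizing values before hashing can be sketched in Python. This is a conceptual illustration under stated assumptions: `normalize` and `digest` are hypothetical helpers, unsalted SHA-256 stands in for HASH( ), and the explicit `strip()` call mimics the automatic trimming described above because Python's `hashlib` does not trim input.

```python
import hashlib

def digest(value: str) -> str:
    """Hypothetical stand-in for HASH( ) without trimming (assumed SHA-256)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest().upper()

def normalize(value: str) -> str:
    """Trim blanks, drop hyphens, and upper-case a clear text value."""
    return value.strip().replace("-", "").upper()

raw_values = ["555-44-3322", "555443322", " 555-44-3322 "]

raw_hashes = {digest(v) for v in raw_values}
normalized_hashes = {digest(normalize(v)) for v in raw_values}

print(len(raw_hashes))         # 3: every variant hashes differently
print(len(normalized_hashes))  # 1: standardized values hash identically
```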
What if leading or trailing blanks are meaningful?
If you have data in which leading or trailing blanks represent meaningful differences between values, you need to replace the blanks with another character before hashing the values.
Replaces blanks in the field values with the underscore character (_) before hashing:
HASH(REPLACE(field_name, " ", "_"))
The cryptographic algorithm used by HASH( )
HASH( ) uses an SHA-2 cryptographic hash algorithm that produces a fixed-length hashed output of 64 characters, regardless of the length of the input value. The clear text input value can be longer than 64 characters.
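The fixed-length property can be checked with any SHA-2 implementation. The sketch below assumes SHA-256, which is the SHA-2 variant whose hexadecimal output has the same length as the 64-character example values shown earlier; the exact variant used by Analytics is not stated here.

```python
import hashlib

# Inputs of very different lengths, including an empty string.
samples = ["", "555-44-3322", "x" * 10_000]

# The hexadecimal digest length does not depend on the input length.
lengths = [len(hashlib.sha256(s.encode("utf-8")).hexdigest()) for s in samples]

print(lengths)  # [64, 64, 64]
```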
The salt value
How it works
The protection offered by the HASH( ) function is strengthened by the automatic addition of a salt value prior to hashing. The salt value is an alphanumeric string that is concatenated with the source data value. The entire concatenated string is then used to produce the salted, hashed value. This approach makes the hashed values more resistant to decoding techniques.
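The salting step described above can be sketched as follows. This is a minimal Python illustration, not the Analytics implementation: the `salted_hash` helper is hypothetical, SHA-256 is assumed, and the order in which the salt and the value are concatenated is an assumption, so the output will not match HASH( ) results. The 128-character salt limit comes from the Parameters table.

```python
import hashlib

MAX_SALT_LEN = 128  # salt limit stated in the Parameters table

def salted_hash(value: str, salt: str) -> str:
    """Concatenate salt and value, then hash the combined string (assumed SHA-256)."""
    salt = salt[:MAX_SALT_LEN]   # longer salts are truncated, not rejected
    combined = salt + value      # assumption: the real concatenation order may differ
    return hashlib.sha256(combined.encode("utf-8")).hexdigest().upper()

unsalted = hashlib.sha256("555-44-3322".encode("utf-8")).hexdigest().upper()
salted = salted_hash("555-44-3322", "my salt value 123")

print(salted != unsalted)  # True: the salt changes the resulting hash
print(salted_hash("555-44-3322", "a" * 200) == salted_hash("555-44-3322", "a" * 128))  # True
```

Because the salt is part of the hashed string, a lookup table of hashes precomputed for unsalted values does not help recover the clear text.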
Optionally specify your own salt value
A fixed, default salt value is automatically used unless you specify a salt value. You can use either of the following methods to specify a salt value:
- Salt value as clear text string
Specify an alphanumeric string. For example:
HASH(Credit_card_num, "my salt value")
- Salt value as password
Use the PASSWORD command in conjunction with the HASH( ) function and specify a PASSWORD identifier number from 1 to 10. For example:
PASSWORD 3 "Enter a salt value"
EXTRACT FIELDS HASH(Credit_card_num, 3) TO "Protected_table"
Note
The PASSWORD salt value must be entered before the field in the HASH( ) function can be extracted.
The benefit of using a PASSWORD identifier number with HASH( ) is that you do not have to expose a clear text salt value.
For more information, see PASSWORD command.
Password method guidelines
The password method is intended for use in scripts that prompt for the password at the beginning of the script, or prior to the HASH( ) function appearing in the script.
The password method is not suitable for use in computed fields because PASSWORD assignments are deleted when you close Analytics.
In addition, computed fields that use a password-based salt value are automatically removed from views when you reopen Analytics. This removal is necessary to avoid the recalculation of hash values using the default salt value. The recalculated values would differ from the original hash values calculated with a user-supplied salt value.