Checking for Unicode compatibility

When upgrading to a Unicode edition, verify that any custom logic you have added to scripts produces the same results when run against Unicode data. The areas where scripts are likely to be affected are predictable, and they are described below.

Bit and Character functions

Each of the functions listed below returns values based on byte locations or byte counts. Check that these functions are still used correctly when you move from the single-byte character representation used in the non-Unicode edition to the double-byte character encoding used for Unicode data:

  • ASCII( )

  • BIT( )

  • BYTE( )

  • CHR( )

  • DIGIT( )

  • HEX( )

  • MASK( )

  • SHIFT( )
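The effect of byte-based positions can be illustrated outside of Analytics. The following is a minimal Python sketch, not Analytics script syntax. It assumes UTF-16 little-endian as the two-byte encoding and Latin-1 as the single-byte encoding, and byte_at is a hypothetical helper that reads the byte at a 1-based position.

```python
# Illustration only: Python stand-in for byte-position logic.
# Assumptions: Latin-1 for single-byte data, UTF-16 LE for two-byte data.
text = "ABC"

single_byte = text.encode("latin-1")    # one byte per character
double_byte = text.encode("utf-16-le")  # two bytes per character


def byte_at(data: bytes, position: int) -> int:
    """Return the byte value at a 1-based position (hypothetical helper)."""
    return data[position - 1]


# In single-byte data, byte 2 is the second character, 'B' (0x42).
second_single = byte_at(single_byte, 2)

# In two-byte data, byte 2 is the second half of 'A' (0x00), not 'B'.
second_double = byte_at(double_byte, 2)

# 'B' now starts at byte 3.
b_in_double = byte_at(double_byte, 3)

print(hex(second_single), hex(second_double), hex(b_in_double))
```

Logic that reads "the second byte" to get "the second character" therefore needs to be reviewed after the move to Unicode data.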

Byte length does not equal character length

You need to check the way the following functions are used in your scripts to ensure that they do not assume a one-to-one correspondence between the number of characters in data and the number of bytes.

If you find any instances where the logic assumes a one-to-one correspondence between characters and bytes, you must adjust the logic to work correctly with Unicode data, which uses two bytes to represent each character. Numbers specified as string function parameters, such as 4 in STRING(1000, 4), refer to the number of characters, so standard usage of these functions will not cause problems.
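The difference between character count and byte count can be sketched in Python. This is an illustration only, not Analytics script syntax. It assumes UTF-16 little-endian for the two-byte encoding, and char_and_byte_lengths is a hypothetical helper.

```python
# Illustration only: character count versus byte count.
# Assumptions: Latin-1 for single-byte data, UTF-16 LE for two-byte data.
def char_and_byte_lengths(text: str, encoding: str) -> tuple[int, int]:
    """Return (number of characters, number of bytes) for the encoded text."""
    encoded = text.encode(encoding)
    return len(text), len(encoded)


value = "1000"  # a four-character string

# Single-byte encoding: characters and bytes match.
print(char_and_byte_lengths(value, "latin-1"))    # (4, 4)

# Two-byte encoding: same character count, twice the bytes.
print(char_and_byte_lengths(value, "utf-16-le"))  # (4, 8)
```

A parameter that counts characters stays at 4 in both cases. Only logic that treats the length as a byte count needs to change.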

Conversion functions

  • PACKED( )

  • STRING( )

  • UNSIGNED( )

  • VALUE( )

  • ZONED( )

String functions

  • AT( )

  • BLANKS( )

  • INSERT( )

  • LAST( )

  • LENGTH( )

  • REPEAT( )

  • SUBSTRING( )

Miscellaneous functions

  • FILESIZE( )

  • LEADING( )

  • OFFSET( )

  • RECLEN( )
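Byte offsets and record lengths are where a one-to-one assumption most often hides. The following Python sketch is an illustration only. The record layout, the field names, and the layout helper are hypothetical, and every field is assumed to hold character data stored at a fixed number of bytes per character.

```python
# Illustration only: how field start offsets and record length change
# when each character takes two bytes instead of one.
# The field list below is a hypothetical fixed-width layout.
FIELDS = [("name", 10), ("code", 4), ("city", 6)]  # (field, width in characters)


def layout(field_widths, bytes_per_char):
    """Return (byte offset of each field, total record length in bytes)."""
    offsets = {}
    position = 0
    for field, width in field_widths:
        offsets[field] = position
        position += width * bytes_per_char
    return offsets, position


single_offsets, single_length = layout(FIELDS, 1)
double_offsets, double_length = layout(FIELDS, 2)

print(single_offsets["code"], single_length)  # 10 20
print(double_offsets["code"], double_length)  # 20 40
```

A script that hard-codes the byte offset 10 for the second field, or a record length of 20, would read the wrong data once the same layout is stored at two bytes per character.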

Substituting Unicode-specific functions

Diligent Unicode products provide six Unicode-specific functions that support conversions between non-Unicode and Unicode data:

  • BINTOSTR( ) converts ZONED or EBCDIC data to its corresponding Unicode string. This ensures that values encoded as ZONED or EBCDIC data can be displayed correctly.

  • DHEX( ) returns the hexadecimal equivalent of a specified Unicode field value. This function is the inverse of HTOU( ).

  • DBYTE( ) returns the Unicode character interpretation of a double-byte character at a specified position in a record.

  • DTOU( ) converts a date value to the correct Unicode string display based on the specified locale setting.

  • HTOU( ) returns the Unicode equivalent of a specified hexadecimal string. This function is the inverse of DHEX( ).

  • UTOD( ) converts a locale-specific Unicode string to an Analytics date value.
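The inverse relationship between DHEX( ) and HTOU( ) can be pictured as a round trip between text and hexadecimal. The following Python sketch is an analogy only, not the Analytics implementation. The helper names to_hex and from_hex are hypothetical, four hexadecimal digits per character is an assumption, and the exact output format of the Analytics functions may differ.

```python
# Illustration only: a text-to-hex round trip, analogous to an
# inverse pair of conversion functions.
# Assumption: four hexadecimal digits per character (two bytes each).
def to_hex(text: str) -> str:
    """Return four hexadecimal digits for each character in the text."""
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError("character does not fit in two bytes")
    return "".join(f"{ord(ch):04X}" for ch in text)


def from_hex(hex_string: str) -> str:
    """Rebuild the text from groups of four hexadecimal digits."""
    if len(hex_string) % 4 != 0:
        raise ValueError("expected groups of four hexadecimal digits")
    groups = (hex_string[i:i + 4] for i in range(0, len(hex_string), 4))
    return "".join(chr(int(group, 16)) for group in groups)


encoded = to_hex("ABC")
print(encoded)            # 004100420043
print(from_hex(encoded))  # ABC
```

Applying one helper and then the other returns the original value, which is the property to rely on when substituting the Unicode-specific functions for their byte-based counterparts.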