ACL Unicode products

The Unicode editions of ACL products allow you to view and work with files that contain Unicode or encoded (UTF-8) data. Unicode is a standard for encoding text in which the characters for all languages are contained in a single character set. The Unicode editions of ACL use two bytes to represent each character. In ACL, you can define and analyze Unicode data, such as multilingual data that uses various alphabets and character representations. Many major software vendors, such as Microsoft, Oracle, and SAP, allow you to store and transfer data in Unicode format.
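As a rough illustration of the two-byte representation, the Python sketch below encodes a few multilingual strings as UTF-16 little-endian and compares the character count with the byte count. Python is used here only for illustration. It is not part of ACL, and the sample strings are arbitrary.

```python
# Sketch: character count vs. byte count in UTF-16 little-endian.
samples = ["Zürich", "Москва", "東京"]

for text in samples:
    # "utf-16-le" fixes the byte order and writes no byte-order mark.
    encoded = text.encode("utf-16-le")
    # All of these characters are in the Basic Multilingual Plane,
    # so each one takes exactly two bytes.
    print(f"{text}: {len(text)} characters, {len(encoded)} bytes")

# Zürich: 6 characters, 12 bytes
# Москва: 6 characters, 12 bytes
# 東京: 2 characters, 4 bytes
```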

Unicode or non-Unicode?

To identify which edition of ACL you are using, select Help > About ACL Analytics. The dialog box that opens shows the product serial number and the software version number. Unicode editions of ACL display the word Unicode after the version number; non-Unicode editions display only the version number.

Importing Unicode files into ACL

ACL supports importing both UTF-16 and UTF-8 Unicode files. UTF-16 and UTF-8 are the two most common Unicode character encodings. The Data Definition Wizard allows you to select the type of file you are importing.
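The wizard relies on you to choose the file type. If you are not sure which encoding a file uses, one common heuristic (an assumption here, not an ACL feature) is to inspect the byte-order mark (BOM) at the start of the file. In the Python sketch below, guess_unicode_encoding is a hypothetical helper. Files without a BOM, which includes many UTF-8 files, are reported as unknown rather than guessed.

```python
import codecs


def guess_unicode_encoding(first_bytes: bytes) -> str:
    """Hypothetical helper: guess a file's Unicode encoding from its BOM.

    Assumes the file starts with a byte-order mark; without one,
    the result is "unknown".
    """
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if first_bytes.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16 (little-endian)"
    if first_bytes.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16 (big-endian)"
    return "unknown"


print(guess_unicode_encoding("Name".encode("utf-8-sig")))                     # UTF-8
print(guess_unicode_encoding(codecs.BOM_UTF16_LE + "Name".encode("utf-16-le")))  # UTF-16 (little-endian)
print(guess_unicode_encoding(b"Name"))                                         # unknown
```

To check a file on disk, read its first few bytes in binary mode, for example with open(path, "rb").read(4), and pass them to the helper.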

Importing files with ASCII or EBCDIC text into ACL

If you use the Unicode edition of ACL to define a file that contains ASCII or EBCDIC-encoded text as a Delimited Text File or Print Image (Report) File, the resulting fields containing this text are assigned the Unicode type by default. This is because ACL creates a .fil file to hold the data from the original file, and this .fil file is Unicode. As a result, a table that you define from a print image or delimited file in the Unicode edition cannot have any fields with an ASCII or EBCDIC type. If you want a field to have a character data type, you must select Unicode from the Type drop-down list. If you select ASCII or EBCDIC from the Type drop-down list for any field, even if the original data is ASCII or EBCDIC, the resulting table displays and behaves incorrectly.

For example, if you import an ASCII file as a delimited text file in ACL and want to change a field from a numeric type to a character type, you must select Unicode from the Type drop-down list in the Data Definition Wizard or the Table Layout dialog box.

If you choose to define a file using the Other file format option, ACL does not create a .fil file. In this case, if the file contains ASCII or EBCDIC-encoded text, ACL automatically selects the correct data type. If you want to change the type of a field, for example from numeric to character, you must determine whether the underlying data is ASCII, EBCDIC, or Unicode and select the matching type from the Type drop-down list.

Conversion of non-Unicode ACL projects to Unicode

If you use a Unicode edition of ACL to open an ACL project created in a non-Unicode edition, you have the option of automatically converting the project and the associated log file to Unicode, or cancelling the operation. If you proceed with the conversion, copies of the original non-Unicode project and log file are saved, unaltered, with the file extension .OLD. The original .fil files are not converted to Unicode and remain unchanged.

Once you have converted a non-Unicode project to Unicode, you are no longer able to open it in non-Unicode editions of ACL, and ACL does not support converting it back to non-Unicode. You must use a Unicode edition of ACL to open and work with Unicode projects.

Importing encoded (UTF-8) text files

In the Data Definition Wizard, ACL has an Encoded Text option on the Character Set page. ACL uses the term Encoded Text to refer to UTF-8 text.

By default, ACL does not select the Encoded Text option for any file. However, if you import data that has been encoded as UTF-8, you should select the Encoded Text option and then select the appropriate code page for your data file. Note that when you finish defining the file, all character fields in the file are assigned the Unicode type.
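The following Python sketch shows why UTF-8 data needs special handling: UTF-8 uses a variable number of bytes per character, and reading those bytes as a single-byte code page garbles accented letters. The sample text is made up, and using Windows code page 1252 for the misread case is an assumption chosen for illustration, not a description of what ACL does internally.

```python
# Stand-in for bytes read from a UTF-8 ("Encoded Text") file.
raw_utf8 = "Müller, São Paulo".encode("utf-8")

text = raw_utf8.decode("utf-8")      # decode the UTF-8 bytes into characters
as_utf16 = text.encode("utf-16-le")  # two bytes per character

print(len(text))      # 17 characters
print(len(raw_utf8))  # 19 bytes: "ü" and "ã" take two bytes each in UTF-8
print(len(as_utf16))  # 34 bytes

# Misreading the same bytes with a single-byte code page garbles accented letters.
misread = raw_utf8.decode("cp1252")
print(misread)  # MÃ¼ller, SÃ£o Paulo
```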

Field lengths in ACL

If you define an ASCII text file in the Data Definition Wizard and select Delimited Text File or Print Image (Report) File on the File Format page, all character fields in the resulting table are assigned the Unicode type by default. Because Unicode uses two bytes for each character, these fields are twice as long as the same fields defined as ASCII in the non-Unicode edition of ACL. Similarly, if you define a Microsoft Excel or Access file in ACL, all character fields are now defined as Unicode by default, with the resulting longer length. Do not change the Unicode type to ASCII, because the fields will not behave correctly.
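One way to picture the doubled length, and why changing a Unicode field to ASCII breaks it, is to look at the bytes. This Python sketch assumes the two-byte Unicode data is stored little-endian, as is typical on Windows; the value INV-1001 is hypothetical.

```python
# The same 8-character value stored as an ASCII field and as a Unicode field.
value = "INV-1001"

ascii_field = value.encode("ascii")        # 1 byte per character
unicode_field = value.encode("utf-16-le")  # 2 bytes per character

print(len(ascii_field), len(unicode_field))  # 8 16

# Reading the Unicode bytes one byte per character, as an ASCII field would,
# yields a zero byte after every character of this text.
misread = unicode_field.decode("ascii")
print(repr(misread))  # 'I\x00N\x00V\x00-\x001\x000\x000\x001\x00'
```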

When an ASCII file is defined as Other file format, the ASCII type is preserved for the character fields in that file.

Fields with non-character data types, such as Datetime or Numeric, are not affected.

When working with Unicode data, keep in mind the distinction between the length of a field in bytes, which appears in the Table Layout dialog box, and its length in characters. A Unicode field shown in the Table Layout dialog box with a length of 44 bytes actually consists of 22 characters. When you use functions such as STRING( ) and SUBSTRING( ) that take a parameter referring to the length of a character field, specify that length in characters, not bytes.
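The following Python sketch converts a byte length from the Table Layout dialog box into a character length and extracts a substring by character position. chars_from_bytes and substring_chars are hypothetical helpers, not ACL functions. The 1-based start position is assumed to mirror how SUBSTRING( ) counts positions, and the sample field value is made up.

```python
BYTES_PER_CHAR = 2  # Unicode fields use two bytes per character


def chars_from_bytes(byte_length: int) -> int:
    """Convert a field length in bytes (as shown in Table Layout) to characters."""
    if byte_length % BYTES_PER_CHAR:
        raise ValueError("a Unicode field length in bytes should be even")
    return byte_length // BYTES_PER_CHAR


def substring_chars(field: bytes, start: int, length: int) -> str:
    """Return `length` characters from 1-based character position `start`.

    Both arguments count characters, never bytes.
    """
    text = field.decode("utf-16-le")
    return text[start - 1 : start - 1 + length]


# Hypothetical 44-byte Unicode field: 21 characters of text padded to 22.
field_bytes = "Vancouver, BC, Canada".ljust(22).encode("utf-16-le")

print(len(field_bytes))                     # 44
print(chars_from_bytes(len(field_bytes)))   # 22
print(substring_chars(field_bytes, 12, 2))  # BC
```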

Little-endian and big-endian data

“Little-endian” and “big-endian” refer to two different byte orders for storing Unicode data: they differ in whether the less significant or the more significant byte of each two-byte character comes first. Unicode data that originates from Microsoft Windows computers is typically little-endian. If you use ACL on a Windows computer, you cannot analyze big-endian data.
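The difference between the two byte orders is easiest to see in the raw bytes. In this Python sketch, the same two characters are encoded both ways, and the big-endian bytes are then misread as little-endian. The big_to_little_endian helper is hypothetical: re-encoding data outside ACL is offered only as a general technique, not as a documented ACL workflow.

```python
text = "Aé"

little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

print(little.hex(" "))  # 41 00 e9 00  (less significant byte first)
print(big.hex(" "))     # 00 41 00 e9  (more significant byte first)

# Reading big-endian bytes as little-endian swaps each pair of bytes,
# producing unrelated characters (U+4100 and U+E900).
misread = big.decode("utf-16-le")
print([f"U+{ord(c):04X}" for c in misread])  # ['U+4100', 'U+E900']


def big_to_little_endian(data: bytes) -> bytes:
    """Hypothetical helper: re-encode UTF-16 big-endian bytes as little-endian."""
    return data.decode("utf-16-be").encode("utf-16-le")


print(big_to_little_endian(big) == little)  # True
```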

Unicode-specific functions in ACL

ACL has six Unicode-specific functions to aid with data analysis and conversion. The functions are summarized in the table below. For detailed information about these functions, refer to the ACL Language Reference.

| Function | Purpose |
|---|---|
| BINTOSTR( ) | Converts ZONED or EBCDIC data to its corresponding Unicode string. This ensures that values encoded in ZONED or EBCDIC can be displayed correctly. |
| DBYTE( ) | Returns the Unicode character interpretation of a double byte at a specified position in a record. |
| DHEX( ) | Converts a Unicode string to a hexadecimal string. The inverse of HTOU( ). |
| HTOU( ) | Converts a hexadecimal string to a Unicode string. The inverse of DHEX( ). |
| DTOU( ) | Converts a date to a Unicode string, which allows dates in a numeric format to be displayed in various languages. The inverse of UTOD( ). |
| UTOD( ) | Converts a Unicode string to a date, which allows dates in various languages to be displayed in a numeric format. The inverse of DTOU( ). |
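To make some of the function descriptions more concrete, the Python sketch below imitates four of them. These are hypothetical analogues, not ACL's implementation. Two assumptions in particular: DHEX( ) output is modeled as four hex digits per character with the most significant byte first, and EBCDIC input is assumed to use code page 037. Check the ACL Language Reference for the exact behavior. ZONED data, DTOU( ), and UTOD( ) are not covered.

```python
# Hypothetical Python analogues of some ACL Unicode functions.
# They mimic the described behavior only; they are not ACL's implementation.


def dhex(text: str) -> str:
    """Like DHEX( ): Unicode string -> hex string.
    Assumption: four hex digits per character, most significant byte first."""
    return text.encode("utf-16-be").hex().upper()


def htou(hex_string: str) -> str:
    """Like HTOU( ): hex string -> Unicode string; the inverse of dhex()."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


def dbyte(record: bytes, position: int) -> str:
    """Like DBYTE( ): the character in the double byte starting at 1-based
    byte `position` of a record stored as UTF-16 little-endian.
    `position` should point at the first byte of a character."""
    return record[position - 1 : position + 1].decode("utf-16-le")


def bintostr(ebcdic: bytes) -> str:
    """Like BINTOSTR( ) for EBCDIC input. Assumption: code page 037."""
    return ebcdic.decode("cp037")


print(dhex("ABC"))           # 004100420043
print(htou("004100420043"))  # ABC

record = "ID42".encode("utf-16-le")
print(dbyte(record, 5))      # 4

print(bintostr(bytes.fromhex("C8C5D3D3D6")))  # HELLO
```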



(C) 2013 ACL Services Ltd. All Rights Reserved.