Defining misaligned fields in a print image or PDF file

The procedure below outlines techniques for defining misaligned fields in a print image or PDF file. The techniques require that you only approximately define the misaligned fields in the Data Definition Wizard. Once the data is in Analytics, you create one or more computed fields that use Analytics functions to precisely shape the data in the fields, including aligning all values.

For information about creating a computed field, see Define a conditional computed field.

To define misaligned fields in a print image or PDF file:

  1. Define any aligned fields in the usual manner.

    At a minimum, you must define at least one field as part of defining a set of records. Try to create a set of records that captures all the record data in the file, even if much of the data is misaligned.

    For more information about defining a set of records, see Quick Start: How to define a print image or PDF file.

  2. Use one or more of the following techniques to define misaligned fields:

    Create a field definition that is long enough to capture the leftmost and the rightmost characters in the misaligned field.

    In Analytics, you will use the ALLTRIM( ) function to align the field.

    Create overlapping field definitions, if necessary.

    In some cases, data misalignment results in the values in two different fields overlapping. Define each field separately, so that all the values that belong in each field are captured by their respective field definitions.

    The same byte positions will be contained by the end of one field definition and the beginning of the other field definition. In the shared byte positions, try to capture consistently structured data – for example, a single unbroken string of characters, rather than a single string in some records, and two separate strings, or no characters, in other records.

    In Analytics, you will use the ALLTRIM( ), the REGEXREPLACE( ), and the SPLIT( ) functions to align the fields and get rid of unwanted characters.

    Create a single, long field definition that encompasses multiple misaligned fields.

    If an entire section of a set of records is misaligned, you can use a long field definition to capture the problematic section of the record data. The field must be long enough to capture the leftmost and the rightmost characters in the block of misaligned data.

    If misaligned data occurs in separate sections of a set of records, create additional long field definitions to capture each misaligned section.

    Note

    If the values in a field vary in the number of words they contain, try to create a separate field definition for these values, or ensure that these values represent the last field at the end of a long field definition encompassing multiple misaligned fields. The “Product Description” field in the sample “Inventory.pdf” is an example of a field in which values vary in number of words.

    In Analytics, you will use the ALLTRIM( ), the REGEXREPLACE( ), and the SPLIT( ) functions to break up the single field into separate, aligned data elements.

  3. Check the entire file to ensure that none of the values in the misaligned fields are outside the aqua-blue highlighting of their field definition. Adjust the length of the field definition, if required.
  4. Make sure that a data type of Character is specified for each field definition in the Field Definition dialog box.
  5. Complete the import process in the usual manner.

    In the Edit Field Properties page, make sure that a data type of ASCII or UNICODE is specified for every field.

    For more information, see Define and import a print image file, or Define and import a PDF file.

  6. For a misaligned field with no data from an overlapping field, create a computed field in Analytics that uses the following expression:
    ALLTRIM(misaligned_field_name)

    Leading and trailing spaces are removed from the field, which aligns all values in the field.

  7. For a misaligned field that contains data from an overlapping field, do the following in Analytics:
    1. Create an initial computed field that uses the following expression to replace one or more spaces between the field value and the unwanted characters with a single space:
      ALLTRIM(REGEXREPLACE(misaligned_field_name, "\s+", " "))

      The expression also removes leading and trailing spaces from the misaligned field.

    2. Create a second computed field that uses one of the following expressions to extract the field value and discard the unwanted characters.
      • If the unwanted characters are at the end of the field, use this expression:

        SPLIT(initial_computed_field_name," ", 1)
      • If the unwanted characters are at the beginning of the field, use this expression:

        SPLIT(initial_computed_field_name," ", 2)

      Tip

      If unwanted characters are sometimes at the end of a field, and sometimes at the beginning, or if they are present in only some of the records, you need to create a conditional computed field that applies different versions of the SPLIT( ) expression to different parts of the misaligned field. For example, the condition RECNO( ) > 100 allows you to apply a version of the expression to only those records beyond the first 100 records.

      For more information, see Define a conditional computed field.

  8. For a long field definition that encompasses multiple misaligned fields, do the following in Analytics:
    1. Create an initial computed field that uses the following expression to replace one or more spaces between data elements with a single space:
      ALLTRIM(REGEXREPLACE(long_field_name, "\s+", " "))

      The expression also removes leading and trailing spaces from the long field.

      Tip

      You may find including the OMIT( ) function in the expression is useful for removing pieces of data that appear inconsistently and complicate subsequent processing. For example, OMIT(ALLTRIM(REGEXREPLACE(long_field_name, "\s+", " ")), "-") does the same as the expression above, and also removes hyphens.

    2. Create a second computed field that uses this expression to extract the first data element:
      SPLIT(initial_computed_field_name," ", 1)
    3. Create as many additional computed fields as required, using variations of the same expression, to extract all the data elements.

      For example:

      SPLIT(initial_computed_field_name," ", 2)
      SPLIT(initial_computed_field_name," ", 3)

      To specify successive data elements, keep increasing the number in the segment parameter of the SPLIT( ) function.

    Note

    For field values that contain more than one word, such as the values in the “Product Description” field in the sample “Inventory.pdf”, this technique isolates each word in a separate field. If required, you can reunite the values by concatenating the separate fields. For more information, see Concatenating fields.

  9. Once you have finished extracting all the data elements to separate fields, do the following to convert numeric and datetime data to the appropriate data type:
    1. For numeric fields, create a computed field that uses this expression:
      VALUE(field_name, number_of_decimal_places)

      For more information, see VALUE( ) function.

    2. For date fields, create a computed field that uses this expression:
      CTOD(field_name, "date_format")

      For more information, see CTOD( ) function.

      To converted datetime or time values, use the CTODT( ) or the CTOT( ) functions.

    Tip

    You can save labor, and create fewer computed fields, by converting the data type at the same time you apply functions to correct misaligned data. For example:

    VALUE(ALLTRIM(misaligned_field_name), 2)

  10. Once you have created all the required computed fields, add them to the table view.

    You do not need to add the initial computed field to the view, and you can remove any misaligned fields, or long field or fields, from the view.

    For more information, see Add columns to a view, or Remove columns from a view.