Defining and importing subsets of print image or PDF data

If defining a complete set of records in a print image or PDF file is difficult or even impossible because of misaligned data, you can define and import multiple subsets of data from the file. In Analytics, you can then append the resulting Analytics tables to assemble a complete data set.

This technique works best if the source file in the Data Definition Wizard contains blocks of records in which all fields are aligned within each block. If the misalignment is more random and does not fall into blocks, see Defining misaligned fields in a print image or PDF file.

Tip:

When defining a PDF file, you have the option of parsing the file on a page-by-page basis. Data misalignment sometimes occurs across page breaks, so you may be able to solve an alignment issue by using page-sized subsets of data.

To define and import a subset of print image or PDF data:

  1. Perform the definition and import process in the usual manner, with these differences:

    Define and import the same file multiple times.

    With each iteration, define a different subset of records. The fields in each subset must be internally aligned.

    A subset of records does not need to be contiguous. For example, the fields in records at the start and at the end of a file could be aligned with each other, but misaligned with fields in the middle of the file.

    Devise a method for keeping track of which records are included in each subset.

    If you unintentionally capture the same record more than once, you can remove duplicate records from the reassembled data set in Analytics. For more information, see Remove duplicates.

    With each iteration, make sure the data structure remains consistent.

    Keep the name, length, data type, and order of corresponding fields the same in every subset. A consistent data structure makes appending the resulting Analytics tables much easier.

    Tip:

    After importing the first subset, open the resulting table in Analytics, and enter DISPLAY in the command line to show the data structure of the table layout. Use the displayed table layout information as a guide for creating the subsequent subsets of records and fields.

    To save labor, use the generic Analytics field names (“Field_1”, “Field_2”, and so on) when defining and importing subsets of records. Once you have reassembled the data set in Analytics, you can rename all the fields in the reassembled table.

  2. When you save each Analytics data file and each Analytics table layout, use an incrementing numeric suffix to avoid overwriting the files and tables you have already created. For example, “Table_1.fil”, “Table_2.fil”, and so on.
  3. Once you have defined and imported all the records in the source file, append the multiple Analytics tables.

    For more information, see Extracting and appending data.
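
The logic behind the procedure above can be sketched outside Analytics. The following Python example is only an illustration, not Analytics functionality: it assumes each imported subset can be represented as a list of field definitions plus a list of records, and every name in it (Field, find_layout_mismatches, append_tables, remove_duplicates, and the sample data) is hypothetical. It checks that the subsets share the same field names, data types, lengths, and order, appends them, and then removes any record captured more than once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One field definition in a table layout (illustrative only)."""
    name: str
    data_type: str  # illustrative label, not an Analytics data type keyword
    length: int


def find_layout_mismatches(layouts):
    """Compare every layout against the first one, field by field, in order."""
    names = list(layouts)
    if not names:
        return []
    reference = layouts[names[0]]
    problems = []
    for name in names[1:]:
        fields = layouts[name]
        if len(fields) != len(reference):
            problems.append(
                f"{name}: {len(fields)} fields, expected {len(reference)}"
            )
            continue
        for position, (expected, actual) in enumerate(zip(reference, fields), 1):
            if expected != actual:
                problems.append(
                    f"{name}: field {position} is {actual}, expected {expected}"
                )
    return problems


def append_tables(tables):
    """Stack the records of each subset table, in the order given."""
    combined = []
    for rows in tables:
        combined.extend(rows)
    return combined


def remove_duplicates(rows):
    """Keep the first occurrence of each record and drop exact repeats."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical sample: two subsets with generic field names, one shared record.
layout = [Field("Field_1", "character", 10), Field("Field_2", "numeric", 8)]
layouts = {"Table_1": layout, "Table_2": list(layout)}
tables = {
    "Table_1": [("A001", "125.00"), ("A002", "80.50")],
    "Table_2": [("A002", "80.50"), ("A003", "42.10")],
}

mismatches = find_layout_mismatches(layouts)
if not mismatches:
    combined = append_tables(tables.values())
    unique = remove_duplicates(combined)
    print(f"{len(combined)} records appended, {len(unique)} after deduplication")
```

Checking the structure before appending mirrors the advice in step 1: a difference in a field's name, length, data type, or position is much easier to correct in an individual subset than in the reassembled table.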