Define and import a PDF file

Concept Information

IMPORT PDF command

You can create an Analytics table by defining and importing an Adobe PDF file.

When you use the Data Definition Wizard to process a PDF file, Analytics may fully or partially auto-define the file, or you may need to manually define the file.

Note

Defining PDF files can be challenging. If you encounter problems, review Defining and importing print image (report) files and PDF files.

Locate and select the PDF file

  1. Select File > New > Table.
  2. If the Select Platform for Data Source page is displayed, select Local and click Next.
  3. In the Select Local Data Source page, select File and click Next.
  4. In the Select File to Define dialog box, locate and select the PDF file you want to create the Analytics table from and click Open.

    Adobe PDF files have a .pdf file extension.

  5. In the File Format page, verify that PDF Adobe Acrobat file is selected and click Next.

Define the PDF file

  1. In the PDF File Definition page, if required, enter the password for the PDF file and click Next.
  2. If you want to specify a particular page or page range for parsing, rather than All pages, select Pages, and specify one or more page numbers.

    You can specify individual pages separated by commas (1,3,5), page ranges (2-7), or a combination (1, 3, 5-7, 11).

    Tip

    In some circumstances, parsing a PDF file on a page-by-page basis can help with data misalignment.

    If you take this approach, you need to import the file more than once, create more than one Analytics table, and then append the resulting tables in Analytics.

    For more information, see Defining and importing subsets of print image or PDF data.

  3. Leave the PDF Parser at the default setting of Xpdf, or select VeryPDF.

    If you are importing the file for the first time, and you have no reason to do otherwise, leave the setting at Xpdf.

    If you have already encountered data alignment issues when using Xpdf with the file, select VeryPDF to see if the parsing results are better.

  4. Click Next.

    The PDF file is parsed and the PDF File Definition page updates to display the parsed file.

  5. Scroll vertically and horizontally to examine the parsed file.

    Highlighting indicates whether Analytics has auto-defined data in the file:

    • Aqua-blue highlighting: data auto-defined as a field.
    • Gray highlighting: data auto-defined as a record. Record definition depends on at least one field being defined in the record.
    • White background: undefined data. Analytics was not able to detect a pattern in the data and could not auto-define it.

  6. Optional. If the data in the parsed file is misaligned, click Back, switch the parser selection in PDF Parser, and click Next.

    The PDF file is re-parsed using the parser you selected, which may produce better data alignment.

    Any existing field and record definitions are deleted when you re-parse the file.

  7. Do one of the following:
    Result of auto-definition, and the action to take:

    If Analytics auto-defined the entire file perfectly, and you do not want to:
    • update the generic field names
    • add any header or footer data to the detail data

    go to Finalize the PDF file definition.

    If Analytics auto-defined the entire file perfectly, and you want to:
    • update the generic field names
    • add any header or footer data to the detail data

    go to Edit the auto-definition or Manually define the PDF file, as required.

    Tip

    You can also update the generic field names in a subsequent page in the Data Definition Wizard, which you may find more convenient.

    If the auto-definition:
    • contains errors
    • excludes data that you need
    • includes data that you do not need

    you must do one of the following:
    • Edit the auto-definition
    • delete the auto-definition and Manually define the PDF file

    Tip

    If the auto-definition contains significant errors, deleting the entire auto-definition and manually defining the file can be easier.

    If the parsed file is entirely undefined, indicated by a completely white background, you must manually define the file. See Manually define the PDF file.
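
The page specification described in step 2 (individual pages, ranges, or a combination) can be illustrated with a short Python sketch. This is not how Analytics itself parses the Pages setting; the function name `parse_page_spec` and its validation rules are assumptions made for illustration only.

```python
def parse_page_spec(spec: str) -> list[int]:
    """Expand a page specification such as "1, 3, 5-7, 11" into page numbers.

    Hypothetical helper: it mirrors the syntax described for the Pages
    setting, but it is not part of Analytics.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range such as "5-7" includes both end pages.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_page_spec("1, 3, 5-7, 11"))  # [1, 3, 5, 6, 7, 11]
```

Expanding the specification this way makes it easy to check, before importing, which pages each import will cover if you split a file across several imports.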

Edit the auto-definition

If you want to edit the auto-definition (or a manual definition), in the PDF File Definition page, do any of the following:

Edit task: Instructions

Edit a field definition: Right-click an aqua-blue field and select Edit Field, or double-click the field.

You can make a number of changes, including:

  • updating the field name
  • updating the data type
  • under Advanced Options:
    • changing the field length (Field Width)
    • changing the starting position of the field

For detailed information, see Working with field definitions.

Edit a record definition: Right-click a gray record and select Edit Record, or double-click the record.

You can make two main changes:

  • update the categorization of the record – detail, header, and footer are the options
  • modify the criteria that Analytics used to capture the set of records

For detailed information, see Working with record definitions.

Delete a field definition or a record definition: Right-click a field or a record and select Delete Field or Delete Record.

You can delete definitions for fields that you do not want in the Analytics table, or that you want to define manually because of errors in their auto-definition.

If you delete a record definition, any field definitions contained by the record are also deleted, and all instances of the record definition in the file are deleted.

Note

You are deleting the field definition or the record definition only, not the actual data. If necessary, you can redefine the same field or record data.

Tip

If you want to selectively delete records, select Edit Record and fine-tune the criteria that Analytics used to capture the set of records.

For detailed information, see Working with record definitions.
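
To see why the starting position and length of a field matter, consider the following Python sketch. It models a field definition as a name, a starting position, and a width, and applies it to rows of parsed text. The `FieldDef` class, the 1-based position convention, and the sample rows are assumptions for illustration, not Analytics internals.

```python
from dataclasses import dataclass


@dataclass
class FieldDef:
    """Hypothetical model of a field definition in a parsed report."""
    name: str
    start: int  # 1-based starting position in the row (assumed convention)
    width: int  # field length in characters

    def extract(self, row: str) -> str:
        begin = self.start - 1
        return row[begin:begin + self.width].strip()


# Sample rows: product number (11 columns), description (19), amount (7).
rows = [
    f"{'070104347':<11}{'Cleaning fluid':<19}{'12.50':>7}",
    f"{'070104397':<11}{'Industrial solvent':<19}{'104.25':>7}",
]

# Defined from the first (shorter) value: the longer value is cut off.
too_short = FieldDef("Description", start=12, width=14)
print([too_short.extract(row) for row in rows])
# ['Cleaning fluid', 'Industrial sol']

# After increasing the field length, every value is fully contained.
adjusted = FieldDef("Description", start=12, width=19)
print([adjusted.extract(row) for row in rows])
# ['Cleaning fluid', 'Industrial solvent']
```

The truncated second value is the kind of problem that scrolling through the defined field reveals, and that changing the field length or starting position corrects.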

Manually define the PDF file

Tip

Before you begin, you may find it helpful to first review the basic version of the steps below, with accompanying screen captures: Quick start steps.

Note

You can also define a PDF file using saved field and record definitions, if they exist.

For more information, see Define the PDF file using a set of saved field and record definitions.

  1. In the PDF File Definition page, select a data value to start defining one of the fields in the table.

    For example, you could select a social security number in an SSN field. When you select the data value, the Field Definition dialog box opens.

    Guidelines:

    • You can select a value anywhere in the data. You do not have to use the first field in the table, or select the first value in a field.
    • The value you select can be detail data, header data, or footer data.
    • Do not select field names. Leave all field names in the source file unselected. If you select field names in the source file, Analytics treats them as data contained in fields.
    • If field values vary in length, select the longest value, or select extra blank spaces to allow for longer values that may be lower in the field and not currently displayed.

    If you intend to use the initial data value you selected to uniquely identify a set of records, see Working with field definitions.

  2. Enter a name for the field, if necessary update the data type, and click OK.
  3. In the data value you just selected, or in the same row in the file, select the character, or string of characters, that uniquely identifies the set of records in the source file.

    For example, select:

    • a slash in a date value
    • a decimal point in a numeric value
    • a unique identifying value anywhere in the row containing the data value you selected

    When you select the unique character or characters, the Record Definition dialog box opens, and all records containing the character or characters are highlighted gray.

    For detailed information, see Defining and importing print image (report) files and PDF files.

    If you need to define a record that extends beyond one row in the source file, see Working with multiline records and fields.

  4. If required, update the Record Type to match the type of data you are defining: detail, header, or footer.
  5. If required, modify the criteria used to capture the set of records.

    For example, you could add additional criteria to omit some of the records that were initially captured.

    For detailed information, see Working with record definitions.

  6. Click OK.

    The field you defined is highlighted aqua-blue, and the associated set of captured records is highlighted gray.

  7. Scroll vertically to examine the defined field, and the associated set of captured records.
  8. If the field is not defined correctly, or if the set of captured records needs adjustment, double-click the field or the record, and make the necessary edits in the Field Definition dialog box, or the Record Definition dialog box.

    For more information, see Working with field definitions, or Working with record definitions.

  9. Define the remaining fields in the record by selecting a representative data value for each field.

    Additional fields automatically conform to the set of records.

    Guidelines:

    • Define only those fields you want in the resulting Analytics table.
    • With each field definition, scroll vertically to examine the defined field. Edit the definitions as required.

      For example, if data values are not fully contained by a field, you need to adjust the length or starting position of the field, or both.

      For more information, see Edit the auto-definition.

    • If you need to define field values that extend beyond one row in the source file, see Working with multiline records and fields.

    Tip

    The order in which you define detail fields is the order in which they appear in the resulting Analytics table.

    If you delete a detail field during the definition process, and then re-add it, it loses its original position and is placed last among detail fields.

  10. If you want to define another record, repeat steps 1 to 9.

    Guidelines:

    • When you select a data value to start defining a new field and associated set of records, ensure New Record is selected in the dialog box that appears, and click OK.
    • You can define multiple header or footer records, but only one detail record. The order in which you define the different record types is not enforced.
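
The steps above can be sketched in Python to show the idea of capturing a set of records with a unique character, and of adding header data to detail data. In this sketch, a slash at a fixed position identifies detail records, and the most recent header value is added to each detail record. The sample report, the positions, and the function names are assumptions for illustration; Analytics applies its own record criteria through the Record Definition dialog box.

```python
# Sample parsed report lines (hypothetical data).
lines = [
    "Customer: 501657",
    "Invoice  Date        Amount",
    "12852    2024/01/15   225.87",
    "12853    2024/01/18   180.92",
    "Customer: 516372",
    "12854    2024/02/03  1610.87",
    "Page 1",
]


def is_detail(line: str) -> bool:
    # Criterion: a slash at position 14 (1-based), inside the date value.
    return len(line) >= 14 and line[13] == "/"


def is_header(line: str) -> bool:
    # Criterion: a literal that appears only in header rows.
    return line.startswith("Customer:")


def define_records(report_lines):
    records = []
    customer = ""
    for line in report_lines:
        if is_header(line):
            customer = line[10:16].strip()
        elif is_detail(line):
            records.append({
                "Customer": customer,           # header data added to detail
                "Invoice": line[0:5].strip(),
                "Date": line[9:19],
                "Amount": line[19:].strip(),
            })
        # Field names and page footers match neither criterion and are skipped.
    return records


for record in define_records(lines):
    print(record)
```

Note that the row of field names ("Invoice  Date        Amount") is never captured, because it contains no slash at the chosen position. This mirrors the guideline to leave field names in the source file unselected.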

Define the PDF file using a set of saved field and record definitions

You can define a PDF file using field and record definitions from a previous file definition session that have been saved in a print image query file. The print image query file must already exist, and the saved definitions must match the current data.

Note

Loading a print image query file deletes any current field and record definitions.

  1. In the PDF File Definition page, click Load.
  2. Navigate to a previously saved print image query file, select it, and click Open.

    The definitions are applied to the current data.

    Print image query files have a .txt extension.

    Note

    Only load a file with definitions that you know match, or closely match, the current data.

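  3. After loading the file, examine the applied definitions. If they need adjustment, see Edit the auto-definition. Otherwise, go to Finalize the PDF file definition.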

Finalize the PDF file definition

  1. Optional. If you want to save the current set of field and record definitions to a print image query file, do the following:
    1. Click Save.
    2. Enter a name for the print image query file and click Save.

    Note

    Field and record definitions often represent a lot of work, so saving them is recommended.

    If you later discover that the imported data needs adjustment and must be redefined and reimported, you can load the saved definitions instead of recreating them from scratch.

  2. When you are satisfied with all field and record definitions, click Next.

    Note

    If required, you can return to this point in the process and make updates to the field and record definitions.

Save the Analytics data file

In the Save Data File As dialog box, enter a name for the Analytics data file and click Save.

If Analytics prefills a data file name, you can accept the prefilled name, or change it.

You can also navigate to a different folder to save the data file if you do not want to use the default location opened by Analytics.

Edit the Analytics field properties

In the Edit Field Properties page, review the settings assigned by Analytics to the properties listed below, make any required updates, and click Next.

Note

Select a column heading in the preview table to see the properties associated with the column.

Property: Description

Ignore this field: Excludes the field from the resulting table layout.

The data in the field is still imported, but it is undefined, and does not appear in the new Analytics table. It can be defined later, if necessary, and added to the table.

Name: The name for the field in the table layout.

You can keep the name assigned by Analytics, or enter a different name.

Column Title: The column title for the field in the default Analytics view.

If you do not specify a column title, the Name value is used.

Type: The data type assigned to the field in Analytics.

You can keep the data type assigned by Analytics, or select an appropriate data type from the drop-down list.

For information about the supported data types in Analytics, see Data types in Analytics.

Value: A read-only property that displays the first value in the field.

The value dynamically updates based on any edits you make.

Decimal: Numeric fields only. The number of decimal places in the source data.

Note

The Decimal text box appears automatically when you select a Numeric data type.

Input Format: Datetime fields only. The format of datetime values in the source data.

The format you specify must exactly match the format in the source data.

For more information about date and time formats, see Formats of date and time source data.
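
The requirement that Input Format exactly match the source data can be illustrated by analogy in Python. The sketch below uses Python's `datetime.strptime` format codes, which are not the format characters Analytics uses; the helper `parse_or_none` and the sample values are assumptions for illustration.

```python
from datetime import datetime


def parse_or_none(value: str, fmt: str):
    """Return the parsed datetime, or None if the format does not match."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


# A mismatched format can fail outright...
source_value = "31/12/2023"
print(parse_or_none(source_value, "%d/%m/%Y"))  # 2023-12-31 00:00:00
print(parse_or_none(source_value, "%m/%d/%Y"))  # None
print(parse_or_none(source_value, "%Y-%m-%d"))  # None

# ...or, for an ambiguous value, silently produce a different date.
ambiguous = "03/04/2023"
print(parse_or_none(ambiguous, "%d/%m/%Y").date())  # 2023-04-03
print(parse_or_none(ambiguous, "%m/%d/%Y").date())  # 2023-03-04
```

The second case is the more dangerous one: the value parses without error but lands on the wrong date, so it is worth checking the Value property against the source data after setting the format.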

Finalize the import

  1. In the Final page, verify the settings for the new Analytics table and click Finish.

    If you want to make any changes, click Back to get to the appropriate page in the wizard.

  2. Enter a name for the table layout that you are adding to the project, or keep the default name, and click OK.

    The new Analytics table is created with data from the imported file.