IMPORT PDF command

Concept Information

Define and import a PDF file

Creates an Analytics table by defining and importing an Adobe PDF file.

Syntax

IMPORT PDF TO table <PASSWORD num> import_filename FROM source_filename <SERVER profile_name> skip_length <PARSER "VPDF"> <PAGES page_range> {[record_syntax] [field_syntax] <...n>} <...n>
record_syntax ::=
RECORD record_name record_type lines_in_record transparent [test_syntax] <...n>
test_syntax ::=
TEST include_exclude match_type AT start_line,start_position,range logic text
field_syntax ::=
FIELD name type AT start_line,start_position SIZE length,lines_in_field DEC value WID bytes PIC format AS display_name

Parameters

General parameters

Name Description
TO table

The name of the Analytics table to import the data into.

Note

Table names are limited to 64 alphanumeric characters. The name can include the underscore character ( _ ), but no other special characters, or any spaces. The name cannot start with a number.
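The naming rules in this note can be checked before a script runs. The following is a minimal Python sketch, not part of Analytics: the helper name is_valid_table_name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits only.
# First character: a letter or underscore (not a number).
# Total length: at most 64 characters.
_TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def is_valid_table_name(name):
    """Return True if name satisfies the table naming rules in the note."""
    return _TABLE_NAME.fullmatch(name) is not None
```

For example, is_valid_table_name("Vendor_List") is accepted, while a name containing a space or starting with a digit is rejected.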

PASSWORD num

optional

Used for password-protected PDF files.

The password definition to use.

You do not use PASSWORD num to prompt for, or specify, an actual password. The password definition refers to a password previously supplied or set using the PASSWORD command, the SET PASSWORD command, or the PASSWORD analytic tag.

num is the number of the password definition. For example, if two passwords have been previously supplied or set in a script, or when scheduling an analytic, PASSWORD 2 specifies that password #2 is used.

For more information about supplying or setting passwords, see the PASSWORD command, the SET PASSWORD command, and the PASSWORD analytic tag.

import_filename

The name of the Analytics data file to create.

Specify import_filename as a quoted string with a .FIL file extension. For example, "Invoices.FIL".

By default, the data file (.FIL) is saved to the folder containing the Analytics project.

Use either an absolute or relative file path to save the data file to a different, existing folder:

  • "C:\data\Invoices.FIL"
  • "data\Invoices.FIL"

FROM source_filename

The name of the source data file. source_filename must be a quoted string.

If the source data file is not located in the same directory as the Analytics project, you must use an absolute path or a relative path to specify the file location:

  • "C:\data\source_filename"
  • "data\source_filename"
SERVER profile_name

optional

The profile name for the server that contains the data that you want to import.
skip_length

optional

The number of bytes to skip at the start of the file.

For example, if the first 32 bytes contain header information, specify a skip length value of 32 to omit this information.

Note

For Unicode data, specify an even number of bytes only. Specifying an odd number of bytes can cause problems with subsequent processing of the imported data.

PARSER "VPDF"

optional

Use the VeryPDF parser to parse the PDF file during the file definition process.

If you omit PARSER, the default Xpdf parser is used.

If you are importing the PDF file for the first time, and you have no reason to do otherwise, use the default Xpdf parser. If you have already encountered data alignment issues when using Xpdf, use the VeryPDF parser to see if the parsing results are better.

PAGES page_range

optional

The pages to include if you do not want to import all of the pages in the PDF file. page_range must be specified as a quoted string.

You can specify:

  • individual pages separated by commas (1,3,5)
  • page ranges (2-7)
  • a combination of pages and ranges (1, 3, 5-7, 11)

If you omit PAGES, all pages in the PDF file are imported.
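To see which pages a given page_range value covers, the three forms above can be expanded into an explicit list. This is an illustrative Python sketch only: expand_page_range is a hypothetical helper, and it assumes that spaces around commas are ignored, as in the third form above.

```python
def expand_page_range(page_range):
    """Expand a page_range string such as "1, 3, 5-7, 11" into page numbers."""
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range such as "5-7" includes both end pages.
            first, last = (int(p) for p in part.split("-", 1))
            if first > last:
                raise ValueError("invalid page range: " + part)
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)
```

The helper only illustrates how the notation reads. In the IMPORT PDF command itself, pass the quoted string, for example PAGES "1, 3, 5-7, 11".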

RECORD parameter

General record definition information.

Note

Some of the record definition information is specified using numeric codes that map to options in the Data Definition Wizard.

In scripts, specify the numeric code, not the option name.

Name Description

RECORD record_name

The name of the record in the Data Definition Wizard.

Specifying record_name is required in the IMPORT PDF command, but the record_name value does not appear in the resulting Analytics table.

In the Data Definition Wizard, Analytics provides default names based on the type of record:

  • Detail
  • Headern, where n is a number
  • Footern, where n is a number

You can use the default names, or specify different names.

record_type

The three possible record types when defining a PDF file:

  • 0 – detail
  • 1 – header
  • 2 – footer

Note

You can define multiple sets of header and footer records in a single execution of IMPORT PDF, but only one set of detail records.

lines_in_record

The number of lines occupied by a record in the PDF file.

You can define single-line or multiline records to match the data in the PDF file.

transparent

The transparency setting for a header record.

Note

Applies to header records only.

  • 0 – not transparent
  • 1 – transparent

Transparent header records do not split multiline detail records.

If a header record splits a multiline detail record in the source PDF file, which can happen at a page break, specifying 1 (transparent) unifies the detail record in the resulting Analytics table.
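Because record_type and transparent are numeric codes, it can help to generate the RECORD clause from readable names. The Python sketch below is an assumption-labeled illustration: record_clause and RECORD_TYPES are hypothetical names, and only the codes listed above are used.

```python
# Numeric codes for record_type, as listed above.
RECORD_TYPES = {"detail": 0, "header": 1, "footer": 2}


def record_clause(name, record_type, lines_in_record=1, transparent=False):
    """Build a RECORD clause: RECORD record_name record_type lines_in_record transparent."""
    if transparent and record_type != "header":
        # The transparency setting applies to header records only.
        raise ValueError("transparent applies to header records only")
    return 'RECORD "{}" {} {} {}'.format(
        name, RECORD_TYPES[record_type], lines_in_record, int(transparent)
    )
```

Calling record_clause("Detail", "detail", 1) yields the same RECORD clause that appears in the example later in this topic.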

TEST parameter

The criteria for defining a set of records in the PDF file. You can have one or more occurrences of TEST (up to 8) for each occurrence of RECORD.

Note

Some of the criteria are specified using numeric codes that map to options in the Data Definition Wizard (option names are shown in parentheses below).

In scripts, specify the numeric code, not the option name.

Name Description
TEST include_exclude

How to treat matching data:

  • 0 – (Include) data meeting the criteria is included in the set of records
  • 1 – (Exclude) data meeting the criteria is excluded from the set of records
match_type

The type of matching to perform:

  • 0 – (Exact Match) matching records must contain the specified character, or string of characters, in the specified start line, starting at the specified position
  • 2 – (Alpha) matching records must contain one or more alpha characters, in the specified start line, at the specified start position, or in all positions of the specified range
  • 3 – (Numeric) matching records must contain one or more numeric characters, in the specified start line, at the specified start position, or in all positions of the specified range
  • 4 – (Blank) matching records must contain one or more blank spaces, in the specified start line, at the specified start position, or in all positions of the specified range
  • 5 – (Non-Blank) matching records must contain one or more non-blank characters (includes special characters), in the specified start line, at the specified start position, or in all positions of the specified range
  • 7 – (Find in Line) matching records must contain the specified character, or string of characters, anywhere in the specified start line
  • 8 – (Find in Range) matching records must contain the specified character, or string of characters, in the specified start line, anywhere in the specified range
  • 10 – (Custom Map) matching records must contain characters that match the specified character pattern, in the specified start line, starting at the specified position
AT start_line, start_position, range
  • start_line – the line of a record that the criteria apply to

    For example, if you create a custom map to match zip codes, and the zip codes appear on the third line of a three-line address record, you must specify 3 in start_line.

    Note

    For single-line records, the start_line value is always 1.

  • start_position – the starting byte position in the PDF file for the comparison against the criteria
  • range – the number of bytes from the starting byte position in the PDF file to use in the comparison against the criteria

    If you are using starting byte position only, without a range, specify 0 for range.

    Note

    non-Unicode Analytics: 1 byte = 1 character
    Unicode Analytics: 2 bytes = 1 character
logic

The logical relations between criteria:

  • 0 – (And) the current and the next criteria are related with a logical AND
  • 1 – (Or) the current and the next criteria are related with a logical OR
  • 4 – (New Group > And) the current criterion is the last in a group of logical criteria, and the current group and the next group are related with a logical AND
  • 5 – (New Group > Or) the current criterion is the last in a group of logical criteria, and the current group and the next group are related with a logical OR
  • 7 – (End) the current criterion is the last in a group of logical criteria
text

Literal or wildcard characters to match against:

  • For Exact Match, Find in Line, or Find in Range – specifies the character, or string of characters, that uniquely identifies the set of records in the PDF file
  • For Custom Map – specifies the character pattern that uniquely identifies the set of records in the PDF file

    The Custom Map option uses the same syntax as the MAP( ) function.

For other match types, text is an empty string "".
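The numeric codes for include_exclude, match_type, and logic are easy to mistype. The following Python sketch maps the Data Definition Wizard option names listed above to their codes and assembles a TEST clause. The function build_test_clause and the dictionary names are hypothetical and are not part of Analytics.

```python
# Option names and numeric codes, as listed in the TEST parameter tables above.
INCLUDE_EXCLUDE = {"Include": 0, "Exclude": 1}
MATCH_TYPES = {
    "Exact Match": 0, "Alpha": 2, "Numeric": 3, "Blank": 4, "Non-Blank": 5,
    "Find in Line": 7, "Find in Range": 8, "Custom Map": 10,
}
LOGIC = {"And": 0, "Or": 1, "New Group > And": 4, "New Group > Or": 5, "End": 7}

# Match types that take a literal string or character pattern in text.
_TEXT_MATCH_TYPES = {"Exact Match", "Find in Line", "Find in Range", "Custom Map"}


def build_test_clause(include_exclude, match_type, start_line,
                      start_position, range_, logic, text=""):
    """Build: TEST include_exclude match_type AT start_line,start_position,range logic text."""
    if match_type not in _TEXT_MATCH_TYPES and text != "":
        # For other match types, text is an empty string.
        raise ValueError("text must be empty for match type " + match_type)
    return 'TEST {} {} AT {},{},{} {} "{}"'.format(
        INCLUDE_EXCLUDE[include_exclude], MATCH_TYPES[match_type],
        start_line, start_position, range_, LOGIC[logic], text
    )
```

For instance, build_test_clause("Include", "Numeric", 1, 1, 0, "End") produces the TEST clause used in the example later in this topic.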

FIELD parameters

Field definition information.

Name Description
FIELD name type

The individual fields to import from the source data file, including the name and data type of the field. To exclude a field from being imported, do not specify it.

For information about type, see Identifiers for field data types.

AT start_line, start_position
  • start_line – the start line of the field in the record in the PDF file

    For multiline records in a PDF file, start_line allows you to start a field at any line of the record. start_line is always 1 if lines_in_record is 1.

  • start_position – the starting byte position of the field in the PDF file

    Note

    non-Unicode Analytics: 1 byte = 1 character
    Unicode Analytics: 2 bytes = 1 character

    In Unicode Analytics, typically you should specify an odd-numbered starting byte position. Specifying an even-numbered starting position can cause characters to display incorrectly.

SIZE length, lines_in_field
  • length – the length in bytes of the field in the Analytics table layout

    Note

    non-Unicode Analytics: 1 byte = 1 character
    Unicode Analytics: 2 bytes = 1 character

    In Unicode Analytics, specify an even number of bytes only. Specifying an odd number of bytes can cause characters to display incorrectly.

  • lines_in_field – the number of lines occupied by a single field value in the PDF file

    You can define single-line or multiline fields to match the data in the file.

    Note

    The number of lines specified for a field cannot exceed the number of lines specified for the record containing the field.
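The byte-to-character relationship above determines the values you supply for start_position and length. The Python sketch below converts a character position and character count into byte values. It is an illustration under stated assumptions: to_byte_layout is a hypothetical helper, positions are taken to be 1-based, and every character is assumed to occupy exactly 2 bytes in Unicode Analytics.

```python
def to_byte_layout(char_position, char_length, unicode_edition):
    """Convert a 1-based character position and a length in characters to bytes.

    Returns (start_position, length) in bytes.
    """
    # 1 byte per character in non-Unicode Analytics, 2 bytes in Unicode Analytics.
    bytes_per_char = 2 if unicode_edition else 1
    start_position = (char_position - 1) * bytes_per_char + 1
    length = char_length * bytes_per_char
    return start_position, length
```

Under these assumptions, a 29-character field starting at character 17 corresponds to AT 1,33 SIZE 58,1 in Unicode Analytics: an odd starting byte position and an even length, consistent with the notes above.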

DEC value

The number of decimals for numeric fields.

WID bytes

The display width of the field in bytes.

The specified value controls the display width of the field in Analytics views and reports. The display width never alters data; however, it can hide data if it is shorter than the field length.

PIC format

Note

Applies to numeric or datetime fields only.

  • numeric fields – the display format of numeric values in Analytics views and reports
  • datetime fields – the physical format of datetime values in the source data (order of date and time characters, separators, and so on)

    Note

    For datetime fields, format must exactly match the physical format in the source data. For example, if the source data is 12/31/2014, you must enter the format as "MM/DD/YYYY".

format must be enclosed in quotation marks.
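Because a datetime format must exactly match the source data, it can be useful to test a sample value against a candidate format before writing the PIC clause. The Python sketch below does this for date-only formats. It is a simplified assumption, not the Analytics implementation: matches_pic is a hypothetical helper, and it recognizes only the YYYY, MM, and DD format characters.

```python
from datetime import datetime

# Assumption: only these three format tokens are handled in this sketch.
_PIC_TO_STRPTIME = (("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"))


def matches_pic(value, pic):
    """Return True if value has exactly the layout described by pic."""
    # An exact match requires the same number of characters.
    if len(value) != len(pic):
        return False
    fmt = pic
    for token, directive in _PIC_TO_STRPTIME:
        fmt = fmt.replace(token, directive)
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True
```

With the source value 12/31/2014, the format "MM/DD/YYYY" matches, whereas "DD/MM/YYYY" does not because 31 is not a valid month.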

AS display_name

The display name (alternate column title) for the field in the view in the new Analytics table.

Specify display_name as a quoted string. Use a semicolon (;) between words if you want a line break in the column title.

AS is required when you are defining FIELD. To make the display name the same as the field name, enter a blank display_name value using the following syntax: AS "". Make sure there is no space between the two double quotation marks.
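When a table has many fields, generating each FIELD clause from its parts keeps the parameter order consistent. The following Python sketch is one way to do that. The function field_clause is hypothetical, and defaulting WID to the field length is an assumption made for this illustration.

```python
def field_clause(name, data_type, start_line, start_position, length,
                 lines_in_field=1, dec=0, wid=None, pic="", display_name=""):
    """Build a FIELD clause in the parameter order given by field_syntax."""
    if wid is None:
        # Assumption for this sketch: display width defaults to the field length.
        wid = length
    return 'FIELD "{}" {} AT {},{} SIZE {},{} DEC {} WID {} PIC "{}" AS "{}"'.format(
        name, data_type, start_line, start_position,
        length, lines_in_field, dec, wid, pic, display_name
    )
```

Leaving display_name empty produces AS "", which makes the display name the same as the field name.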

Examples

Importing data from a specific page of a PDF file

You import data from page 1 of the password-protected PDF file, Vendors.pdf.

One set of detail records, with three fields, is created in the resulting Analytics table, Vendor_List:

IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Number" C AT 1,1 SIZE 10,1 DEC 0 WID 10 PIC "" AS "" FIELD "Vendor_Name" C AT 1,33 SIZE 58,1 DEC 0 WID 58 PIC "" AS "" FIELD "Last_Active_Date" D AT 1,277 SIZE 20,1 DEC 0 WID 20 PIC "DD/MM/YYYY" AS ""

Remarks

For more information about how this command works, see Defining and importing print image (report) files and PDF files.

Troubleshooting PDF imports in the Unicode edition of Analytics

If you encounter issues when you import a PDF file using the Unicode edition of Analytics, the problem may be related to length specifications:

  • If foreign language characters are appearing unexpectedly, or the layout in the resulting Analytics table is skewed, check that SIZE length is set to an even number.

    Specifying an odd number of bytes for SIZE length can cause problems with processing of the imported data.

  • If the Analytics table is created, but contains zero records, try setting skip_length to 2, or some other even number if there is header data at the beginning of the file that you want to skip.
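To locate odd SIZE length values in a long IMPORT PDF statement, you can scan the command text. The Python sketch below is a rough illustration: odd_size_fields is a hypothetical helper, and it assumes each FIELD clause has a quoted field name followed by SIZE length,lines_in_field.

```python
import re

# Capture each quoted field name and the length that follows its SIZE keyword.
_FIELD_SIZE = re.compile(r'FIELD\s+"([^"]+)".*?SIZE\s+(\d+)\s*,', re.DOTALL)


def odd_size_fields(command):
    """Return the names of fields whose SIZE length is an odd number of bytes."""
    return [name for name, length in _FIELD_SIZE.findall(command)
            if int(length) % 2 == 1]
```

Any field names returned are candidates for correction when importing with the Unicode edition.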

Identifiers for field data types

The table below lists the letters that you must use when specifying type for FIELD. Each letter corresponds to an Analytics data type.

For example, if you are defining a Last Name field, which requires a character data type, you would specify "C": FIELD "Last_Name" C.

For more information, see Data types in Analytics.

Note

When you use the Data Definition Wizard to define a table that includes EBCDIC, Unicode, or ASCII fields, the fields are automatically assigned the letter "C" (for the CHARACTER type).

When you enter an IMPORT statement manually, or edit an existing IMPORT statement, you can substitute the more specific letters "E" or "U" for EBCDIC or Unicode fields.

Letter    Analytics data type
A         ACL
B         BINARY
C         CHARACTER
D         DATETIME
E         EBCDIC
F         FLOAT
G         ACCPAC
I         IBMFLOAT
K         UNSIGNED
L         LOGICAL
N         PRINT
P         PACKED
Q         BASIC
R         MICRO
S         CUSTOM
T         PCASCII
U         UNICODE
V         VAXFLOAT
X         NUMERIC
Y         UNISYS
Z         ZONED