IMPORT PDF command
Creates an Analytics table by defining and importing an Adobe PDF file.
Syntax
IMPORT PDF TO table <PASSWORD num> import_filename FROM source_filename <SERVER profile_name> skip_length <PARSER "VPDF"> <PAGES page_range> {[record_syntax] [field_syntax] <...n>} <...n>
record_syntax ::=
RECORD record_name record_type lines_in_record transparent [test_syntax] <...n>
test_syntax ::=
TEST include_exclude match_type AT start_line,start_position,range logic text
field_syntax ::=
FIELD name type AT start_line,start_position SIZE length,lines_in_field DEC value WID bytes PIC format AS display_name
Parameters
General parameters
Name | Description |
---|---|
TO table |
The name of the Analytics table to import the data into. Note: Table names are limited to 64 alphanumeric characters. The name can include the underscore character ( _ ), but no other special characters and no spaces. The name cannot start with a number. |
PASSWORD num optional |
Used for password-protected PDF files. The password definition to use. You do not use PASSWORD num to prompt for, or specify, an actual password. The password definition refers to a password previously supplied or set using the PASSWORD command, the SET PASSWORD command, or the PASSWORD analytic tag. num is the number of the password definition. For example, if two passwords have been previously supplied or set in a script, or when scheduling an analytic script, PASSWORD 2 specifies that password #2 is used. For more information about supplying or setting passwords, see the PASSWORD command, the SET PASSWORD command, and the PASSWORD analytic tag. |
import_filename |
The name of the Analytics data file to create. Specify import_filename as a quoted string with a .FIL file extension. For example, "Invoices.FIL". By default, the data file (.FIL) is saved to the folder containing the Analytics project. To save the data file to a different, existing folder, use either an absolute or a relative file path, for example "C:\data\Invoices.FIL" or "data\Invoices.FIL". |
FROM source_filename |
The name of the source data file. source_filename must be a quoted string. If the source data file is not located in the same directory as the Analytics project, you must use an absolute path or a relative path to specify the file location, for example "C:\data\Vendors.pdf" or "data\Vendors.pdf". |
SERVER profile_name optional |
The profile name for the server that contains the data that you want to import. |
skip_length optional |
The number of bytes to skip at the start of the file. For example, if the first 32 bytes contain header information, specify a skip length value of 32 to omit this information. Note: For Unicode data, specify an even number of bytes only. Specifying an odd number of bytes can cause problems with subsequent processing of the imported data. |
PARSER "VPDF" optional |
Use the VeryPDF parser to parse the PDF file during the file definition process. If you omit PARSER, the default Xpdf parser is used. If you are importing the PDF file for the first time, and you have no reason to do otherwise, use the default Xpdf parser. If you have already encountered data alignment issues when using Xpdf, use the VeryPDF parser to see if the parsing results are better. |
PAGES page_range optional |
The pages to include if you do not want to import all of the pages in the PDF file. page_range must be specified as a quoted string. For example, "1" imports only page 1. If you omit PAGES, all pages in the PDF file are imported. |
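As a rough sketch of how the optional general parameters fit together, the statements below set a password definition and then import page 1 of a protected file using the VeryPDF parser. The SET PASSWORD line shows an assumed form of that command (confirm it against its own reference topic). The value "example_password", the file names, and the record and field positions are illustrative, borrowed from the example in this topic. Positions may need to be redefined when a file is parsed with VeryPDF instead of Xpdf.

```
COMMENT Assumed form of SET PASSWORD; it creates password definition 1, which PASSWORD 1 refers to.
SET PASSWORD 1 TO "example_password"
IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PARSER "VPDF" PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Number" C AT 1,1 SIZE 10,1 DEC 0 WID 10 PIC "" AS ""
```

The general parameters appear in the order given in the syntax: PASSWORD before import_filename, then FROM, skip_length, PARSER, and PAGES, followed by the RECORD and FIELD definitions.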
RECORD parameter
General record definition information.
Note
Some of the record definition information is specified using numeric codes that map to options in the Data Definition Wizard.
In scripts, specify the numeric code, not the option name.
Name | Description |
---|---|
RECORD record_name |
The name of the record in the Data Definition Wizard. Specifying record_name is required in the IMPORT PDF command, but the record_name value does not appear in the resulting Analytics table. In the Data Definition Wizard, Analytics provides default names based on the type of record. You can use the default names, or specify different names. |
record_type |
The type of record. There are three possible record types when defining a PDF file: detail, header, and footer. Note: You can define multiple sets of header and footer records in a single execution of IMPORT PDF, but only one set of detail records. |
lines_in_record |
The number of lines occupied by a record in the PDF file. You can define single-line or multiline records to match the data in the PDF file. |
transparent |
The transparency setting for a header record. Note: Applies to header records only. Transparent header records do not split multiline detail records. If a header record splits a multiline detail record in the source PDF file, which can happen at a page break, specifying 1 (transparent) unifies the detail record in the resulting Analytics table. |
TEST parameter
The criteria for defining a set of records in the PDF file. You can have one or more occurrences of TEST (up to 8) for each occurrence of RECORD.
Note
Some of the criteria are specified using numeric codes that map to options in the Data Definition Wizard (option names are shown in parentheses below).
In scripts, specify the numeric code, not the option name.
Name | Description |
---|---|
TEST include_exclude |
How to treat matching data: include it or exclude it. |
match_type |
The type of matching to perform, specified as a numeric code. |
AT start_line, start_position, range |
|
logic |
The logical relations between criteria, specified as a numeric code. |
text |
The literal or wildcard characters to match against, for match types that compare against text. For other match types, text is an empty string "". |
FIELD parameters
Field definition information.
Name | Description |
---|---|
FIELD name type |
The individual fields to import from the source data file, including the name and data type of the field. To exclude a field from being imported, do not specify it. For information about type, see Identifiers for field data types. |
AT start_line, start_position |
|
SIZE length, lines_in_field |
|
DEC value |
The number of decimals for numeric fields. |
WID bytes |
The display width of the field in bytes. The specified value controls the display width of the field in Analytics views and reports. The display width never alters data; however, it can hide data if it is shorter than the field length. |
PIC format |
Note: Applies to numeric or datetime fields only. format must be enclosed in quotation marks. |
AS display_name |
The display name (alternate column title) for the field in the view in the new Analytics table. Specify display_name as a quoted string. Use a semi-colon (;) between words if you want a line break in the column title. AS is required when you are defining FIELD. To make the display name the same as the field name, enter a blank display_name value using the following syntax: AS "". Make sure there is no space between the two double quotation marks. |
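To illustrate the AS and WID parameters described above, the following sketch defines two fields with different display settings. The file names, record definition, and field positions are illustrative values taken from the example in this topic, and the display title "Vendor;Number" and the width of 30 are assumptions chosen for the illustration.

```
IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Number" C AT 1,1 SIZE 10,1 DEC 0 WID 10 PIC "" AS "Vendor;Number" FIELD "Vendor_Name" C AT 1,33 SIZE 58,1 DEC 0 WID 30 PIC "" AS ""
```

In this sketch, the first field should get a two-line column title ("Vendor" over "Number") because of the semi-colon. The second field uses AS "" so that its display name matches the field name. Its WID of 30 is shorter than its 58-byte length, so part of the value can be hidden in the view, although the data itself is unchanged.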
Examples
Importing data from a specific page of a PDF file
You import data from page 1 of the password-protected PDF file, Vendors.pdf.
One set of detail records, with three fields, is created in the resulting Analytics table, Vendor_List:
IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Number" C AT 1,1 SIZE 10,1 DEC 0 WID 10 PIC "" AS "" FIELD "Vendor_Name" C AT 1,33 SIZE 58,1 DEC 0 WID 58 PIC "" AS "" FIELD "Last_Active_Date" D AT 1,277 SIZE 20,1 DEC 0 WID 20 PIC "DD/MM/YYYY" AS ""
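Read against the Syntax section, the positional values in the RECORD, TEST, and last FIELD clauses of this example line up with the parameter names as shown below. The mapping is derived only from the order of the parameters in the syntax. The meaning of each numeric code is not repeated here.

```
RECORD "Detail"        record_name
0                      record_type
1                      lines_in_record
0                      transparent
TEST 0                 include_exclude
3                      match_type
AT 1,1,0               start_line, start_position, range
7                      logic
""                     text

FIELD "Last_Active_Date" D    name, type
AT 1,277                      start_line, start_position
SIZE 20,1                     length, lines_in_field
DEC 0                         value
WID 20                        bytes
PIC "DD/MM/YYYY"              format
AS ""                         display_name
```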
Remarks
For more information about how this command works, see Defining and importing print image (report) files and PDF files.
Troubleshooting PDF imports in the Unicode edition of Analytics
If you encounter issues when you import a PDF file using the Unicode edition of Analytics, the problem may be related to length specifications:
- If foreign language characters are appearing unexpectedly, or the layout in the resulting Analytics table is skewed, check that SIZE length is set to an even number. Specifying an odd number of bytes for SIZE length can cause problems with processing of the imported data.
- If the Analytics table is created, but contains zero records, try setting skip_length to 2, or to some other even number if there is header data at the beginning of the file that you want to skip.
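As a minimal sketch of both checks, the statement below uses an even skip_length and even SIZE length values throughout. The file names and positions are illustrative values taken from the example in this topic, not settings that are known to suit any particular PDF file.

```
COMMENT Unicode edition: skip_length (2) and each SIZE length (10, 58) are even numbers.
IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Number" C AT 1,1 SIZE 10,1 DEC 0 WID 10 PIC "" AS "" FIELD "Vendor_Name" C AT 1,33 SIZE 58,1 DEC 0 WID 58 PIC "" AS ""
```

If a field had been defined with an odd length, for example SIZE 9,1, changing it to an even value such as SIZE 10,1 would be the first thing to try.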
Identifiers for field data types
The table below lists the letters that you must use when specifying type for FIELD. Each letter corresponds to an Analytics data type.
For example, if you are defining a Last Name field, which requires a character data type, you would specify "C": FIELD "Last_Name" C.
For more information, see Data types in Analytics.
Note
When you use the Data Definition Wizard to define a table that includes EBCDIC, Unicode, or ASCII fields, the fields are automatically assigned the letter "C" (for the CHARACTER type).
When you enter an IMPORT statement manually, or edit an existing IMPORT statement, you can substitute the more specific letters "E" or "U" for EBCDIC or Unicode fields.
Letter | Analytics data type |
---|---|
A | ACL |
B | BINARY |
C | CHARACTER |
D | DATETIME |
E | EBCDIC |
F | FLOAT |
G | ACCPAC |
I | IBMFLOAT |
K | UNSIGNED |
L | LOGICAL |
N | PRINT |
P | PACKED |
Q | BASIC |
R | MICRO |
S | CUSTOM |
T | PCASCII |
U | UNICODE |
V | VAXFLOAT |
X | NUMERIC |
Y | UNISYS |
Z | ZONED |
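As the note in this section describes, a manually edited IMPORT statement can use a more specific letter than the "C" assigned by the Data Definition Wizard. The sketch below assumes that the Vendor_Name field holds Unicode data and substitutes "U" for "C". The file names, record definition, and positions are illustrative values taken from the example in this topic.

```
IMPORT PDF TO Vendor_List PASSWORD 1 "Vendor_List.FIL" FROM "Vendors.pdf" 2 PAGES "1" RECORD "Detail" 0 1 0 TEST 0 3 AT 1,1,0 7 "" FIELD "Vendor_Name" U AT 1,33 SIZE 58,1 DEC 0 WID 58 PIC "" AS ""
```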