PDF Form Source

A PDF Form Source object lets users extract data from a fillable PDF document. A fillable PDF document contains data points, or digital fields, that a user can edit in any modern PDF viewer. Such documents are often used on the web in place of official documents. The PDF Form Source object detects these fields, extracts the data entered in them, and creates corresponding fields for them.
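To make the idea concrete, the sketch below models a fillable form as a small tree of fields and flattens it into one record of name/value pairs, which is roughly the shape of data a form source hands to a dataflow. It is an illustration only, not Astera Centerprise's implementation: the `FormField` class, the `flatten` helper, and the sample field names and values are all hypothetical. The one borrowed detail is the PDF convention of joining nested field names with periods.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormField:
    """Hypothetical stand-in for one fillable field in a PDF form."""
    name: str
    value: Optional[str] = None  # text the user entered, if any
    children: List["FormField"] = field(default_factory=list)


def flatten(fields, prefix=""):
    """Return {qualified_name: value} for every leaf field in the tree."""
    record = {}
    for f in fields:
        qualified = f"{prefix}.{f.name}" if prefix else f.name
        if f.children:
            # A parent field only groups others; descend into it.
            record.update(flatten(f.children, qualified))
        else:
            record[qualified] = f.value
    return record


# Made-up sample resembling a scholarship application form.
form = [
    FormField("Personal", children=[
        FormField("FirstName", "Jane"),
        FormField("LastName", "Doe"),
    ]),
    FormField("Contact", children=[FormField("Email", "jane@example.com")]),
    FormField("Degree"),  # left blank by the applicant
]

record = flatten(form)
for name, value in record.items():
    print(name, "=", value)
```

Blank fields are kept with an empty value rather than dropped, so every form yields the same set of columns.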

In this article, we will explore how to use the PDF Form Source object in Astera Centerprise to retrieve data.

Sample Use-Case

01-PDFFormSource-SampleUseCase

Note: This is a Scholarship Application Form with fillable data fields for Personal Information, Contact Details, and Education Qualifications.

Utilizing the PDF Form Source Object

1. Select the PDF Form Source object from the Toolbox and drag-and-drop it onto the dataflow designer.

02-dataflow-toolbox

03-PDFFormSource-Object

2. Right-click on the PDF Form Source object’s header and select the Properties option from the context menu.

04-PDFFormSource-Properties

A configuration window will open, as shown below.

05-PDFFormSource-PropertiesLayout

3. Provide the File Path for the fillable PDF document.

06-PDFFormSource-FilePath

  • Owner Password: If the file is protected, enter the password configured by the owner of the fillable PDF document. If the file is not protected, leave this option blank.
  • Use UTF-8 Encoding: Check this option if the file is encoded in UTF-8 (Unicode Transformation Format, 8-bit).
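The effect of an encoding choice can be illustrated outside the product. The following minimal Python sketch (the sample name is invented) shows the same bytes read correctly as UTF-8 and garbled when a single-byte encoding is assumed instead. It does not reflect how Centerprise reads files internally.

```python
# Sample value, invented for illustration: a name with non-ASCII letters.
original = "Zoë Müller"

# Bytes as they would be stored when the text is UTF-8 encoded.
raw = original.encode("utf-8")

# Decoding with the matching encoding restores the text exactly.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 never fails, but each two-byte
# UTF-8 sequence turns into two wrong characters.
as_latin1 = raw.decode("latin-1")

print(as_utf8)    # Zoë Müller
print(as_latin1)  # ZoÃ« MÃ¼ller
```

If extracted values show this kind of doubled-character garbling, a mismatch between the file's encoding and the selected option is a likely cause to check first.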

Click Next.

07-PDFFormSource-PropertiesLayout

This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click Next.

08-PDFFormSource-LayoutBuilder

This is the Config Parameters window. Click Next.

09-PDFFormSource-ConfigParameters

This is the General Options window. Click OK.

10-PDFFormSource-GeneralSettings

4. Right-click on the PDF Form Source object’s header and select Preview Output from the context menu.

11-PDFFormSource-PreviewOutput

View the data through the Data Preview window.

12-PDFFormSource-PreviewOutput

The data is now available for mapping. For simplicity, we will remove the fields we do not need and write the remaining data to a separate destination file.

5. In this example, we use a Delimited Destination object. Drag-and-drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.

13-drag-and-drop-Deliminated

Right-click on the fields that you do not want to store and select the Remove Element option.

Note:

  • Do not delete the data fields from the PDF Form Source object, as doing so will disturb the layout that has been generated for the detected data fields.
  • You can also delete the data fields in the destination file by using the Layout Builder, or map only the relevant fields onto the nodes of the destination object. You can refer to this article to learn more about the Delimited Destination object.
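Mapping only the relevant fields to a delimited destination amounts to projecting each record onto a chosen set of columns. As a rough sketch of that idea in Python (the field names, sample values, and the `write_delimited` helper are hypothetical, and this is not how Centerprise performs the mapping):

```python
import csv
import io


def write_delimited(records, keep, out, delimiter=","):
    """Write only the `keep` columns of each record as delimited text."""
    writer = csv.DictWriter(
        out,
        fieldnames=keep,
        delimiter=delimiter,
        extrasaction="ignore",  # silently drop fields that are not mapped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)


# One made-up record, as a form source might produce it.
records = [{
    "FirstName": "Jane",
    "LastName": "Doe",
    "Email": "jane@example.com",
    "Signature": "",            # a field we do not want to store
}]

buffer = io.StringIO()          # stands in for the destination file
write_delimited(records, ["FirstName", "LastName", "Email"], buffer)
output = buffer.getvalue()
print(output)
```

The source record keeps all of its fields; only the projection written out is narrower, which mirrors the advice to leave the source layout untouched and trim fields on the destination side.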

6. Double-click the Delimited Destination object’s header, or right-click it and select the Properties option from the context menu. Specify the File Path where you want to store the destination file. Click OK.

14-delimitedDestination-FilePath

7. To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.

15-delimitedDestination-PreviewOutput

Here, you can see the data for the selected fields.

16-Delimited-DataPreview

This is how a PDF Form Source object is used in Astera Centerprise to extract data points (digital fields) from fillable PDF documents.