PDF Form Source

The PDF Form Source object extracts data from a fillable PDF document. A fillable PDF document contains data points, or digital fields, that a user can edit in any modern PDF viewer. Fillable PDFs are often used on the web in place of official documents. The PDF Form Source object detects these fields, extracts the data entered in them, and creates a corresponding field for each one.

In this article, we will explore how to make use of the PDF Form Source object in Astera Centerprise to retrieve data.

Sample Use-Case

01-PDFFormSource-SampleUseCase

Note: This is a Scholarship Application Form with fillable data fields for Personal Information, Contact Details, and Education Qualifications.

Utilizing the PDF Form Source Object

1. Select the PDF Form Source object from the toolbox window and drag and drop it onto the dataflow designer.

02-dataflow-toolbox

03-PDFFormSource-Object

2. Right-click on the header of the PDF Form Source object and select the Properties option from the context menu.

04-PDFFormSource-Properties

A configuration window will open, as shown below.

05-PDFFormSource-PropertiesLayout

3. Provide the file path for the fillable PDF document.

06-PDFFormSource-FilePath

  • Owner Password: If the file is protected, enter the password set by the owner of the fillable PDF document. If the file is not protected, leave this option blank.
  • Use UTF-8 Encoding: Check this option if the file is encoded in UTF-8 (Unicode Transformation Format, 8-bit).
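
To illustrate why the encoding option matters, here is a minimal Python sketch, independent of Centerprise. The field value is a hypothetical example; the sketch only shows that text stored as UTF-8 is recovered correctly when decoded as UTF-8 and becomes garbled when decoded with a different encoding.

```python
# Hypothetical form field value containing two non-ASCII characters
# (written with escapes: "Zoë Müller").
name = "Zo\u00eb M\u00fcller"

# UTF-8 stores each of the two accented characters here as two bytes.
raw = name.encode("utf-8")

# Decoding with the matching encoding recovers the original text.
decoded_ok = raw.decode("utf-8")

# Decoding the same bytes as Latin-1 yields garbled characters instead.
decoded_wrong = raw.decode("latin-1")

print(decoded_ok)
print(decoded_wrong)
```

If values with accented or non-Latin characters look garbled in the preview, a mismatched encoding setting is a likely cause.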

Click on Next.

07-PDFFormSource-PropertiesLayout

This is the Layout Builder window, where you can see the data fields extracted from the fillable PDF document. Click on Next.

08-PDFFormSource-LayoutBuilder
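
Conceptually, a layout is a list of field names paired with data types. The Python sketch below illustrates that idea only; it is not how Centerprise builds its layout. The field names, values, and type labels ("String", "Date", "Integer") are hypothetical.

```python
from datetime import datetime


def infer_type(value: str) -> str:
    """Guess a simple data type label for a form field value."""
    try:
        int(value)
        return "Integer"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "Date"
    except ValueError:
        return "String"


# Hypothetical field name/value pairs, as a form reader might return them.
form_fields = {
    "FullName": "Jane Doe",
    "DateOfBirth": "2001-04-17",
    "GraduationYear": "2023",
}

# One layout entry per detected field: its name and a guessed data type.
layout = [{"name": n, "type": infer_type(v)} for n, v in form_fields.items()]

for entry in layout:
    print(entry["name"], entry["type"])
```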

This is the Config Parameters window. Click on Next.

09-PDFFormSource-ConfigParameters

This is the General Setting window. Click on OK.

10-PDFFormSource-GeneralSettings

4. Right-click the PDF Form Source object’s header and select Preview Output from the context menu.

11-PDFFormSource-PreviewOutput

View the data through the Data Preview window.

12-PDFFormSource-PreviewOutput

The data is now available for mapping. For simplicity, we will drop the fields we do not need and store the remaining output in a separate file. To store the data, we must write it to a destination file.

5. This example uses a Delimited Destination object. Drag and drop the Delimited Destination object onto the dataflow designer and map the fields from the PDF Form Source object to the destination object.

13-drag-and-drop-Deliminated

In the destination object, right-click the fields that you do not want to store and select the Remove Element option.

Note:

  • Do not delete data fields from the PDF Form Source object, as doing so disturbs the layout generated for the detected data fields.
  • You can also delete data fields from the destination object by using its Layout Builder, or map only the relevant fields onto the nodes of the destination object. You can refer to this article to learn more about the Delimited Destination object.
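
The same idea, keeping the source record intact and writing only selected fields to a delimited file, can be sketched in Python with the standard csv module. This is an illustration outside Centerprise; the field names and values are hypothetical, and an in-memory buffer stands in for the destination file.

```python
import csv
import io

# Hypothetical record with every field extracted from the form.
source_record = {
    "FullName": "Jane Doe",
    "Email": "jane@example.com",
    "Phone": "555-0100",
    "Degree": "BSc Physics",
}

# Only these fields are written to the destination.
wanted = ["FullName", "Email", "Degree"]

buffer = io.StringIO()
# extrasaction="ignore" skips keys that are not listed in fieldnames,
# so the source record itself does not need to be modified.
writer = csv.DictWriter(
    buffer, fieldnames=wanted, extrasaction="ignore", lineterminator="\n"
)
writer.writeheader()
writer.writerow(source_record)

output = buffer.getvalue()
print(output)
```

The source record still holds all four fields afterwards; only the written output is narrowed.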

6. Double-click the Delimited Destination object’s header, or right-click it and select the Properties option from the context menu. Specify the file path where you want to store the file. Click on OK.

14-delimitedDestination-FilePath

7. To preview the data, right-click on the destination object’s header and select Preview Output from the context menu.

15-delimitedDestination-PreviewOutput

Here, you can see the data of the selected fields.

16-Delimited-DataPreview

This is how the PDF Form Source object is used in Astera Centerprise to extract data from the digital fields of fillable PDF documents.