How to use PDF Form Source in ReportMiner

PDF Form Source in Astera ReportMiner lets users extract data directly from a PDF form, without creating an extraction template (report model). ReportMiner reads the layout of the PDF form automatically, including check boxes and radio buttons.

In this document, we will learn how to use PDF Form Source in Astera ReportMiner.

Sample Use Case

In this example, the data we want ReportMiner to read is contained in the PDF form shown below. The form has radio buttons as well as text boxes, and a header name has been set up for each question.

Download the sample PDF form from here.

[Image: sample PDF form (source data)]

It is not feasible to create a report model to extract this data because:

1. ReportMiner’s report model designer does not detect radio buttons and check boxes, and

2. If you want ReportMiner to read a number of similar forms, a single model cannot be applied to all of them, since each form contains different answers.

To address these issues, we will use a PDF Form Source object in a dataflow.
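The second point is why a field-based source helps: every filled-in copy of a form exposes the same named fields, and only the values change. The minimal Python sketch below is not ReportMiner code. It assumes the field values have already been read out of each PDF into a dictionary, and the field names and answers are hypothetical. It shows how forms that share field names stack into a single table regardless of their answers.

```python
import csv
import io

# Hypothetical field values from three filled-in copies of the same form.
# The keys (field names) are identical; only the answers differ.
forms = [
    {"Name": "Ann Lee", "Satisfied": "Yes", "Comments": "Quick service"},
    {"Name": "Raj Patel", "Satisfied": "No", "Comments": "Long wait"},
    {"Name": "Mia Chen", "Satisfied": "Yes", "Comments": ""},
]


def forms_to_csv(records):
    """Stack per-form field dictionaries into one CSV table.

    Column order comes from the first form. Every other form must use the
    same field names, which is what a shared form template provides.
    """
    columns = list(records[0])
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Reject forms whose field names do not match the first form.
        if set(record) != set(columns):
            raise ValueError(f"unexpected fields: {sorted(record)}")
        writer.writerow(record)
    return buffer.getvalue()


print(forms_to_csv(forms))
```

Because the columns are driven by field names rather than by where the answers sit on the page, adding a fourth or a hundredth form needs no change to the code.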

Using PDF Form Source object

1. To add a PDF Form Source object, go to Toolbox > Sources > PDF Form Source and drag and drop it onto the dataflow designer.

[Image: PDF Form Source object on the designer]

The newly added source object is empty and needs to be configured.

2. To configure the PDF Form Source object, right-click on its header and select Properties from the context menu.

[Image: Properties option in the context menu]

A configuration window will open as shown below.

[Image: configuration window]

3. Provide the File Path of the PDF file from which ReportMiner will read the data.

[Image: File Path option]

This screen also has the following options:

  • Owner Password – Use this option if your PDF file is password protected.
  • Use UTF-8 Encoding – Use this option if the text in your PDF form uses UTF-8 (Unicode) encoding.
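As a general illustration of why an encoding setting matters (this is plain Python, not a ReportMiner feature, and the sample name is made up), the same bytes decode to different text depending on the encoding assumed. UTF-8 bytes for a name with accented letters read correctly as UTF-8 but come out garbled when read as Latin-1:

```python
# A made-up form value containing non-ASCII characters.
original = "José Müller"
raw = original.encode("utf-8")  # the bytes stored when the text is UTF-8 encoded

decoded_ok = raw.decode("utf-8")     # correct assumption
decoded_bad = raw.decode("latin-1")  # wrong assumption: each byte becomes one character

print(decoded_ok)   # José Müller
print(decoded_bad)  # JosÃ© MÃ¼ller
```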

4. Click Next. The Layout Builder screen will open, where you can change the field names and their data types.

[Image: Layout Builder screen]

Click OK. The PDF Form Source object on the dataflow designer now shows the fields defined in the PDF form. Notice that each field takes its name from the header name set up when the PDF form was created.

[Image: configured PDF Form Source object]
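Conceptually, the Layout Builder step renames each form field and converts its text value to the chosen data type. The Python sketch below models that idea with a hypothetical layout dictionary; the field names, target names, and types are illustrative assumptions, not ReportMiner's API or the sample form's actual fields.

```python
from datetime import date

# Hypothetical layout: raw field name -> (new field name, type converter).
# This mirrors the idea of the Layout Builder; it is not ReportMiner's API.
LAYOUT = {
    "Q1_Name": ("CustomerName", str),
    "Q2_Age": ("Age", int),
    "Q3_Date": ("VisitDate", date.fromisoformat),
}


def apply_layout(raw_fields, layout):
    """Rename each field and convert its text value to the chosen type."""
    record = {}
    for raw_name, text in raw_fields.items():
        # Fields missing from the layout keep their name and stay as text.
        new_name, convert = layout.get(raw_name, (raw_name, str))
        # Leave empty fields as None instead of failing the conversion.
        record[new_name] = convert(text) if text.strip() else None
    return record


row = apply_layout(
    {"Q1_Name": "Ann Lee", "Q2_Age": "34", "Q3_Date": "2021-05-04"}, LAYOUT
)
print(row)
```

Keeping unknown fields as text and mapping blank answers to None are design choices of this sketch, so that one unanswered question does not stop the whole form from loading.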

To preview the extracted data, right-click on the object’s header and select Preview Output.

[Image: Preview Output option]

A Data Preview window will open, showing that ReportMiner has picked up the options selected with radio buttons as well as the contents of the text boxes.

[Image: Data Preview window]
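Under the hood, a PDF form stores check box and radio button selections as PDF name values: the selected option's export name (typically Yes for a check box) when something is selected, and Off when nothing is. If you read such a form with your own code, a small normalization step like the hypothetical Python helper below turns those raw values into readable answers. It assumes names arrive as strings with a leading slash (for example "/Yes" or "/Off"), as some Python PDF libraries return them; the field names and values are made up.

```python
def normalize_button_value(raw):
    """Turn a raw check box or radio button value into a readable answer.

    Assumes values arrive as PDF name strings such as "/Yes" or "/Option2",
    and that "/Off" (or a missing value) means nothing is selected.
    """
    if raw is None:
        return None
    value = raw.lstrip("/")  # drop the leading slash of the PDF name
    return None if value == "Off" else value


# Hypothetical raw values for three button fields.
answers = {"Satisfied": "/Yes", "Recommend": "/Off", "Rating": "/4"}
print({field: normalize_button_value(v) for field, v in answers.items()})
# {'Satisfied': 'Yes', 'Recommend': None, 'Rating': '4'}
```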

Your data is now ready for further integration.