How To Use PDF Form Source in ReportMiner

The PDF Form Source in Astera ReportMiner lets users extract data directly from a PDF form without creating an extraction template. This removes the need to create a report model, since ReportMiner reads the layout of the PDF form automatically, including checkboxes and radio buttons.

In this document, we will learn how to use the PDF Form Source in Astera ReportMiner.

Sample Use Case

In this use case, the data that we want ReportMiner to read is stored in a PDF form, as shown below. The form contains radio buttons as well as text boxes, and a header name has been set up for each question.

You can download the sample PDF form from here.

1_source_data

It is not feasible to create a report model to extract this data because:

1. ReportMiner’s designer does not detect radio buttons and checkboxes, and

2. If you want ReportMiner to read a number of similar forms, a single model cannot be applied to all the forms since each form will contain different answers.

To address the above-mentioned issues, we will use a PDF Form Source object in a dataflow.
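The reasoning above can be illustrated with a small conceptual sketch in Python. This is not ReportMiner code: the field names (`FullName`, `Gender`, `Subscribe`) and the dictionary representation of a form are assumptions made purely for illustration. The idea it shows is that a fillable form stores each answer under a named field, so the same field names can be read from every copy of the form even though the answers differ.

```python
# Conceptual sketch only: each filled-in form is modelled as a mapping
# from field name to the value the user typed or selected.
# All field names and values below are hypothetical.
form_a = {"FullName": "Jane Doe", "Gender": "Female", "Subscribe": "Yes"}
form_b = {"FullName": "John Roe", "Gender": "Male", "Subscribe": "No"}


def read_form(fields, field_names):
    """Return one record with a value for every expected field name."""
    # Fields missing from a form become None instead of raising an error.
    return {name: fields.get(name) for name in field_names}


# The field names act as the shared layout for every copy of the form.
layout = ["FullName", "Gender", "Subscribe"]
records = [read_form(form, layout) for form in (form_a, form_b)]

for record in records:
    print(record)
```

Because the layout is a list of field names rather than positions on the page, the same `read_form` call works for both forms regardless of which radio button was selected.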

Using the PDF Form Source Object

1. To get a PDF Form Source from the Toolbox, go to Toolbox > Sources > PDF Form Source and drag and drop the object onto the dataflow designer.

1-pdf-form-source

You can see that the dragged source object is currently empty and needs to be configured.

2. To configure the PDF Form Source object, right-click on its header and select Properties from the context menu.

2-properties

A configuration window will open as shown below.

3-pdf-form-source

3. Provide the File Path of the PDF file from which ReportMiner will read the data.

4-properties

There are some other options here as well:

  • Owner Password – Use this option if your PDF file is password protected.
  • Use UTF-8 Encoding – Use this option if your data follows a Unicode standard.

4. Click Next. The next screen is the Layout Builder screen. Here, you can change the field names and their data types.

5-layout-builder
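What renaming fields and changing data types amounts to can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how ReportMiner implements the Layout Builder: the field names (`Name`, `Age`, `VisitDate`), the renamed field `CustomerName`, the date format, and the helper `apply_layout` are all hypothetical.

```python
from datetime import date, datetime


def parse_date(text):
    """Convert an ISO-style date string (assumed format) to a date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical layout: original field name -> (output name, converter).
layout = {
    "Name": ("CustomerName", str),   # renamed field, kept as text
    "Age": ("Age", int),             # text converted to an integer
    "VisitDate": ("VisitDate", parse_date),
}


def apply_layout(raw_fields, layout):
    """Rename each field and convert its raw text to the chosen type."""
    record = {}
    for source_name, (target_name, convert) in layout.items():
        raw_value = raw_fields.get(source_name)
        # Leave unanswered fields as None rather than failing the conversion.
        if raw_value in (None, ""):
            record[target_name] = None
        else:
            record[target_name] = convert(raw_value)
    return record


# Raw values as they might be read from a form: everything is text.
raw = {"Name": "Jane Doe", "Age": "34", "VisitDate": "2023-05-17"}
record = apply_layout(raw, layout)
print(record)
```

Keeping the name mapping and the type conversion in one layout table mirrors the idea of a single screen where both are edited together.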

5. Click OK. The PDF Form Source object on the dataflow designer now shows the fields of the layout defined in the PDF form. Notice that the object picks up the header names that were defined when the PDF form was created.

6-data-loaded

6. To preview the extracted data, right-click on the object’s header and select Preview Output.

7-output-preview

A Data Preview window will open, showing that ReportMiner has picked up the options selected with radio buttons as well as the content of the text boxes.

8-output-preview
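As a rough picture of the tabular result, the sketch below arranges extracted form records into rows with one column per field, using Python's standard `csv` module. The records, field names, and the helper `to_csv` are hypothetical and only stand in for whatever the preview shows for your own form.

```python
import csv
import io

# Hypothetical records, one per filled-in form. A radio-button answer
# appears as the selected option; a text box appears as its content.
records = [
    {"FullName": "Jane Doe", "Gender": "Female", "Comments": "Very satisfied"},
    {"FullName": "John Roe", "Gender": "Male", "Comments": ""},
]


def to_csv(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(records)
print(csv_text)
```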

Your data is now ready for further integration. This concludes working with the PDF Form Source in Astera ReportMiner.