Auto Generate Layout - UI Walkthrough

Astera ReportMiner lets users create data regions and data fields automatically with a single click. The Auto Generate Layout option in the toolbar at the top of the designer creates an extraction template for your unstructured document. There are two additional sub-features, Auto Generate Table and Auto Create Fields (Single-Instance), which are used specifically for creating a table region and capturing key-value pairs, respectively.

These features make the extraction process much more efficient because they reduce the effort of designing report models from scratch. To make the extraction template more robust and customized, users can further tweak the auto-generated layout to fit their business requirements.

How it Works

This feature detects key-value pairs and tables in an unstructured document and creates clones of single-instance and collection data regions, respectively. It uses the same techniques of specifying patterns and defining region properties to capture data regions, and these data regions have additional properties of their own. It then detects useful information within each region, whether key-value pairs or a table, and creates data fields. Finally, the user has the option of auto-creating fields (single-instance) or auto-generating a table that was not captured by selecting the relevant data on the canvas in either type of data region.
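To make the idea of detecting key-value pairs and table rows concrete, here is a minimal Python sketch. It is not ReportMiner's detection logic, which this article does not describe; the regular expressions and the function names `classify_line` and `extract_pairs` are assumptions chosen purely for illustration.

```python
import re

# Hypothetical heuristics for illustration only; not ReportMiner's algorithm.
# A key-value line: a label starting with a letter, a ":" separator, a value.
KEY_VALUE = re.compile(r"^\s*([A-Za-z][\w .#/-]*?)\s*:\s*(\S.*)$")
# Two or more consecutive whitespace characters are treated as a column gap.
COLUMN_GAP = re.compile(r"\s{2,}")


def classify_line(line: str) -> str:
    """Label a text line as 'key-value', 'table-row', or 'other'."""
    if KEY_VALUE.match(line):
        return "key-value"
    # Three or more cells separated by wide gaps look like a table row.
    if len(COLUMN_GAP.split(line.strip())) >= 3:
        return "table-row"
    return "other"


def extract_pairs(lines):
    """Return {key: value} for every line that looks like 'Key: Value'."""
    pairs = {}
    for line in lines:
        match = KEY_VALUE.match(line)
        if match:
            pairs[match.group(1).strip()] = match.group(2).strip()
    return pairs
```

In this sketch, lines classified as key-value would feed a single-instance region and consecutive table rows would feed a collection region. A real detector would also need to handle multi-line values, wrapped headers, and tables with drawn borders.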

These options reduce the amount of time taken to create a report model from scratch and are particularly useful when the unstructured document is a large file consisting of many data regions and fields.

Auto-Generate Layout Icon

After you load the unstructured document in a report model, the Auto Generate Layout option is enabled in the toolbar above the canvas.

01-icon

Click this icon to start automatically creating an extraction template.

Output Window

Once the report model is created, an Output window will open automatically, showing the details of the data regions created.

02-output-window

Here, we can see that this window shows the following details:

1. Whether the region created is a Table or Data (key-value) region.

2. The lines that the data region spans in the unstructured document.

3. The number of fields and records within a data region.

4. The total time taken for the entire extraction process.

Data Preview

Click on the Preview Data option in the secondary toolbar above the designer to check if the data has been extracted correctly.

data-preview

A Data Preview window will open, displaying all the information extracted using the Auto Generate Layout feature.

03-data-preview

Auto Create Fields (Single-Instance)

The Auto Create Fields (Single-Instance) option in the Region Properties panel allows the user to create specific fields by selecting the relevant data on the canvas. This feature is designed as a sub-component of the Auto Generate Layout (AGL) feature to extract useful key-value pairs from the unstructured document. That is, if AGL fails to pick up a key-value pair, the Auto Create Fields (ACF) (Single-Instance) option can be used to create fields from the highlighted data instantly.

Note: The existing Auto-Create Fields (Collection) option works best for collection regions because it accurately picks up the header values. The new Auto Create Fields (Single-Instance) option is designed for extracting key-value pairs, which can only be done in a single-instance data region.

04-auto-create-fields

This feature extracts the field data and determines the field names automatically.

05-model-layout

Note: To use this option, make sure the correct data region is selected.

Auto Generate Table (Beta)

Users can auto-create a table by using this simple two-step process:

1. Select the data you want to extract.

2. Right-click on the selected data and select Auto Generate Table (Beta).

06-auto-generate-table

This option automatically creates a table region and extracts the data fields within this region.

Best Practices for Using the AGL Feature

Although the Auto Generate Layout feature can speed up the extraction process, it cannot fully determine which fields to extract and which to leave out according to the user’s requirements. Therefore, users should not rely exclusively on this feature to create an end-to-end extraction template. The Auto Create Fields (Single-Instance) and Auto Generate Table sub-components are designed to help the user tweak the report model that has been generated automatically.

Here are some pointers to keep in mind to increase the accuracy of the Auto Generate Layout feature:

  • Auto Generate Layout and Auto Generate Table support only PDF files written in English.

  • Files with a simple format, where the key-value pairs are at the top and the table is at the bottom, are ideal. Examples of such files include invoices, purchase orders, etc. The file used in this article serves as a perfect example.

  • PDF files with at most 50 pages are optimal.

  • Key-value pairs should have a separator (“:”) so that they are detected accurately.

  • For table detection, the best results are achieved with:

    • Single-line rows and headers in the table.
    • Tables with clear boundaries.
    • Uniform column spacing in tables without boundaries.
    • Consistent format (identical header name, header span, and field length) if the table spans multiple pages.

  • Make sure the file does not have an alignment issue. If the text is not completely aligned, the cause may be an incorrect Scaling Factor. You can change the Scaling Factor in the Report Options panel.

    07-scaling-factor
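Some of the pointers above can be checked before running the feature. The following Python sketch is a hypothetical pre-flight check, not part of ReportMiner: the names `column_starts`, `has_uniform_columns`, and `preflight` are assumptions, and the inputs (a page count and text lines already extracted from the PDF) are assumed to come from whatever tool you use to read the file.

```python
import re

MAX_PAGES = 50  # page count suggested in the pointers above


def column_starts(row: str):
    """Offsets where each cell begins; words joined by a single space
    count as one cell, so wider gaps separate columns."""
    return [m.start() for m in re.finditer(r"\S+(?: \S+)*", row)]


def has_uniform_columns(rows):
    """True if every row's cells start at the same offsets.
    Assumes left-aligned columns; right-aligned numbers would need
    a comparison of cell end offsets instead."""
    starts = [column_starts(row) for row in rows]
    return all(s == starts[0] for s in starts)


def preflight(page_count, key_value_lines, table_rows):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    if page_count > MAX_PAGES:
        warnings.append(f"{page_count} pages exceeds the suggested {MAX_PAGES}")
    missing = [line for line in key_value_lines if ":" not in line]
    if missing:
        warnings.append(f"{len(missing)} key-value line(s) lack a ':' separator")
    if not has_uniform_columns(table_rows):
        warnings.append("table columns are not uniformly aligned")
    return warnings
```

An empty result does not guarantee accurate detection. It only indicates that the document passes these three simplified checks.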

This concludes our discussion on the Auto Generate Layout feature.