Pattern is a Regular Expression
As the name suggests, the ‘Pattern is a Regular Expression’ feature in ReportMiner reads the specified pattern as a regular expression. A regular expression is a special text string that defines a search pattern. You can think of regular expressions as a more powerful form of wildcards. For example, a file manager uses the wildcard notation *.txt to find all text files.
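To make the wildcard comparison concrete, here is a short Python sketch (not part of ReportMiner). It checks a few made-up file names against the wildcard *.txt using the standard fnmatch module, and against a roughly equivalent regular expression using the re module. Note that the regex needs \. to match a literal dot, because an unescaped . matches any character.

```python
import fnmatch
import re

# The file-manager wildcard: "*" means "any run of characters".
wildcard = "*.txt"

# A roughly equivalent regular expression: ".*" is "any run of characters",
# and "\." is an escaped dot, so it only matches a literal ".".
regex = re.compile(r".*\.txt")

# Made-up file names, used only for illustration.
for name in ["dealers.txt", "dealers.csv", "notes.txt"]:
    by_wildcard = fnmatch.fnmatch(name, wildcard)
    by_regex = regex.fullmatch(name) is not None  # whole name must match
    print(f"{name:12} wildcard={by_wildcard} regex={by_regex}")
```

Both approaches agree on these names; regular expressions go further by offering operators such as ? (optional) and + (one or more), which the steps below rely on.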
In this article, we will explore a scenario where the Pattern is a Regular Expression feature helps us select a data region.
Sample Use Case
In this case, we have some unstructured data stored in a .txt file.
Download the sample txt file from here.
This file contains contact details of the business dealers of the company.
Here, we want to capture each dealer's name, company, and state, along with their phone and fax numbers. Notice that the phone and fax numbers are formatted differently from the rest of the data in this file. To capture this information, we will use the Pattern is a Regular Expression feature.
Creating a Report Model
1. Load the unstructured source file in ReportMiner’s designer window.
2. Right-click on the Record node in the Model layout panel and select Add Data Region from the context menu.
A pattern-matching box, a properties panel, and a data node are added to the Report Model screen.
3. Check the Pattern is a Regular Expression option in the Pattern Properties panel. This lets ReportMiner use a regular expression to capture the phone and fax numbers, which are formatted differently from the rest of the data.
4. Enter a regular expression in the pattern box to capture the required data region.
In this case, the first symbol is \ (a backslash). In a regular expression, a backslash is an escape character: it tells the regex engine to treat the character that follows as a literal character rather than as a special symbol.
At this point, all of the data is highlighted.
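Next, type ( after the backslash. On its own, ( starts a group in a regular expression, so escaping it as \( makes the pattern look for a literal open bracket. This indicates that the required data starts with an open bracket.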
Notice that only the data lines with phone numbers starting with an open bracket are now highlighted. Since some phone numbers do not begin with an open bracket, type ? after (. The question mark makes the preceding character optional, so the data may or may not start with an open bracket.
Notice that all the phone and fax numbers use the character x. So, type x after the question mark. All the data lines with phone and fax numbers are now selected.
To be more precise, add + (a plus sign) after x. The plus sign means that the preceding character, x, can appear one or more times.
The regular expression, \(?x+, has now been specified.
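The Python sketch below replays how the pattern grows in step 4, using Python's re module rather than ReportMiner itself. The sample lines are hypothetical stand-ins for the masked phone/fax lines and other text in the file, and the helper select_lines is an invented name. It assumes each line is tested from its first character (as re.match does); the article does not say exactly how ReportMiner anchors a match. Because \(? can match an empty string, that stage on its own selects every line in this sketch.

```python
import re

# Hypothetical lines invented for illustration; they are NOT copied from the
# sample file. Two use masked phone/fax formats, two are ordinary text.
lines = [
    "(xxx) xxx-xxxx",
    "xxx-xxx-xxxx",
    "Jane Doe",
    "Acme Supplies, Texas",
]


def select_lines(pattern: str, lines: list[str]) -> list[str]:
    """Return the lines the pattern matches, testing each from its first character.

    re.match anchors at the start of the string; this is an assumption about
    how the pattern is applied, not documented ReportMiner behaviour.
    """
    return [line for line in lines if re.match(pattern, line)]


# An unescaped "(" would open a regex group, so on its own it is not a valid pattern.
try:
    re.compile("(")
except re.error as exc:
    print("unescaped '(' is rejected:", exc)

# The pattern as it grows through step 4.
for pattern in [r"\(", r"\(?", r"\(?x", r"\(?x+"]:
    print(f"{pattern:6} -> {select_lines(pattern, lines)}")

# "+" does not change which lines match here, but it makes the match cover
# the whole run of x characters instead of just the first one.
print(re.match(r"\(?x", "(xxx) xxx-xxxx").group())   # (x
print(re.match(r"\(?x+", "(xxx) xxx-xxxx").group())  # (xxx
```

If the pattern were instead searched anywhere in a line (re.search), the "Acme Supplies, Texas" line would also be selected, because "Texas" contains an x.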
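5. To capture the rest of the data, increase Line Count to 3 and set Apply Pattern to Line to 1. Each data region then spans three lines, and the pattern is applied to the first line of each region.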
The required data region has been selected. Now, let’s create data fields.
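One way to picture the Line Count and Apply Pattern to Line settings is the Python sketch below. It is a hypothetical model, not ReportMiner's implementation: find_regions and its parameters are invented names, the three-line records are invented, and it assumes the pattern is tested from the start of each line. A line that matches the pattern becomes line apply_to_line of a new region that is line_count lines long.

```python
import re

# Invented three-line dealer records (not from the sample file). In this
# made-up layout, line 1 of each record holds the phone and fax numbers.
text = """\
(xxx) xxx-xxxx   (xxx) xxx-xxxx
Jane Doe
Acme Supplies, Ohio
xxx-xxx-xxxx   xxx-xxx-xxxx
John Roe
Roe Traders, Iowa
"""


def find_regions(lines, pattern, line_count=3, apply_to_line=1):
    """Hypothetical model of the Line Count / Apply Pattern to Line settings.

    A line matching `pattern` becomes line `apply_to_line` (1-based) of a
    region that is `line_count` lines long. Matching uses re.match, i.e. it
    starts at the first character of the line (an assumption).
    """
    if not 1 <= apply_to_line <= line_count:
        raise ValueError("apply_to_line must be between 1 and line_count")
    regex = re.compile(pattern)
    offset = apply_to_line - 1  # how far the pattern line sits into the region
    regions = []
    i = 0
    while i < len(lines):
        start = i - offset
        end = start + line_count
        if regex.match(lines[i]) and start >= 0 and end <= len(lines):
            regions.append(lines[start:end])
            i = end  # continue after this region so regions do not overlap
        else:
            i += 1
    return regions


for region in find_regions(text.splitlines(), r"\(?x+"):
    print(region)
```

With line_count=3 and apply_to_line=1, each phone/fax line starts a region that also takes in the two lines below it, which mirrors the settings chosen in step 5.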
6. Highlight the information inside the data region, right-click on it, and select Add Data Field from the context menu. In this case, rename the field to Name.
Repeat the same process to create data fields for the remaining information: company, state, phone, and fax. You can see the layout of the extraction template in the Model Layout panel.
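As a loose analogue of step 6, the Python sketch below treats each data field as a highlighted column range on one line of a data region, and slices those ranges to build one record. FieldSpec, extract, the field names other than Name, and the sample region are all hypothetical; ReportMiner's own field definitions may carry more properties than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical field definition: the region line the field sits on
    (0-based) and the highlighted column range [start, end)."""
    name: str
    line: int
    start: int
    end: Optional[int] = None  # None = through the end of the line


def extract(region: list[str], fields: list[FieldSpec]) -> dict[str, str]:
    """Turn one data region into a record by slicing each field's columns."""
    return {f.name: region[f.line][f.start:f.end].strip() for f in fields}


# One invented three-line region (not from the sample file).
region = [
    "(xxx) xxx-xxxx   (xxx) xxx-xxxx",
    "Jane Doe",
    "Acme Supplies, Ohio",
]

# Hypothetical field layout for the invented region above.
fields = [
    FieldSpec("Phone", line=0, start=0, end=14),
    FieldSpec("Fax", line=0, start=17),
    FieldSpec("Name", line=1, start=0),
    FieldSpec("Company", line=2, start=0, end=13),
    FieldSpec("State", line=2, start=15),
]

print(extract(region, fields))
```

Fixed column ranges only work when every region lines up the same way; the Company and State ranges here fit this one invented line and would need adjusting for differently aligned text.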
7. Preview the data by clicking the Preview Data icon in the toolbar at the top of the designer window.
A Data Preview window will open displaying a preview of the extracted data.