Raw Text Filters in File Sources

The option to apply text filters to raw data at the point of extraction has existed in Centerprise for some time. It has been upgraded in the Centerprise 7.6 release with some notable bug fixes. This feature is available in Fixed-Length and Delimited file sources and gives you the flexibility to source selected data by applying filters at the point of extraction, without modifying the original source data.

Steps to Use Raw Text Filters with Fixed-Length or Delimited Files

  1. To apply raw text filters at the point of extraction, drag and drop a fixed-length or delimited source object onto the designer.

Here, I’m going to use a delimited file source to retrieve data using the text filter option, but you can also use a fixed-length file source for this purpose.

  2. Right-click on the delimited or fixed-length file source object and select Properties from the context menu.

  3. This will open the Properties window. Specify the path to your source file.

  4. Now scroll down to the bottom of the same screen. Here, you can find Raw Text Filter options, among many other options.

You can filter data using the following three options:

  • No filter. Process all records: If you choose this option, the delimited or fixed-length reader will process and retrieve the entire source data without skipping any records.

In this example, we have chosen the ‘No filter’ option, and therefore the delimited file reader has retrieved the original data from our source file without filtering any records. Here’s a preview of the output:

[Screenshot: preview of the unfiltered output]

  • Process if begins with: Choose this option to retrieve only the records that start with a certain letter, digit, character, word, numeric value, or phrase.

In this example, we want to retrieve records starting with ‘7014’. Therefore, we’ll select the second filtering option and specify the starting value.

Here’s a preview of the output:

[Screenshot: preview of the records starting with ‘7014’]

  • Process if matches this regular expression: You can specify a regular expression to extract matching records.

In this example, we have specified the regular expression ‘\s\s+’, which matches two or more consecutive whitespace characters, to retrieve all matching records.

Here’s the preview of the output:

[Screenshot: preview of the records matching the regular expression]

Once you have selected the desired filter, click OK. The source object will retrieve the data accordingly and you can then consume this data in other transformation or loading processes further in the dataflow.
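
If it helps to picture the three options outside the product, here is a minimal Python sketch of the same idea. It is only an illustration, not how Centerprise implements the feature: the function name `filter_raw_lines`, the mode names, and the sample records are all made up for this example. It also assumes the regular expression is tested from the start of each record (Python's `re.match`), which is a guess based on the note about the left-most position.

```python
import re
from typing import Iterable, Iterator, Optional


def filter_raw_lines(
    lines: Iterable[str],
    mode: str = "none",
    value: Optional[str] = None,
) -> Iterator[str]:
    """Yield the raw records that pass the chosen filter (hypothetical helper).

    mode="none"        -> every record is passed through unchanged
    mode="begins_with" -> only records whose raw text starts with `value`
    mode="regex"       -> only records where `value` matches at the start
                          of the raw text (assumption: anchored matching)
    """
    if mode == "none":
        yield from lines
        return
    if value is None:
        raise ValueError("this filter mode needs a value")
    if mode == "begins_with":
        for line in lines:
            if line.startswith(value):
                yield line
    elif mode == "regex":
        pattern = re.compile(value)
        for line in lines:
            # re.match only tests at position 0 of the raw record
            if pattern.match(line):
                yield line
    else:
        raise ValueError(f"unknown filter mode: {mode}")


# Invented sample records, not the data shown in the screenshots
raw = ["7014,Alice,NY", "7015,Bob,CA", "  8001,Carol,TX", "7014,Dan,WA"]

all_records = list(filter_raw_lines(raw))
starts_7014 = list(filter_raw_lines(raw, "begins_with", "7014"))
leading_ws = list(filter_raw_lines(raw, "regex", r"\s\s+"))
```

In this sketch, replacing `pattern.match` with `pattern.search` would keep records that contain the pattern anywhere in the line, so the choice between the two is the part to check against your own results in the product.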

Note: When filtering data at the point of extraction, these raw text filters can only be applied to the starting (left-most) position of a record. This means that rearranging the layout in the Layout Builder to bring a certain field to the starting (left-most) position and then applying a text filter won’t yield the desired output. You will need to change the layout in the source file itself and place your required field(s) at the starting position to be able to retrieve the required records.
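
One way to prepare such a file is to rewrite it before pointing the source object at it. The following Python sketch is an assumption-laden illustration, not a Centerprise feature: `move_column_first` is a hypothetical helper, the sample records are invented, and the data is held in strings so the example stays self-contained. It moves one column of a delimited file to the left-most position so that a ‘begins with’ check on the raw text can see it.

```python
import csv
import io


def move_column_first(raw_text: str, column_index: int, delimiter: str = ",") -> str:
    """Return delimited text with the given column moved to the left-most position."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        if column_index >= len(row):
            raise ValueError(f"record has no column {column_index}: {row}")
        reordered = [row[column_index]] + row[:column_index] + row[column_index + 1:]
        writer.writerow(reordered)
    return out.getvalue()


# Invented sample: the ID we want to filter on is the second field
original = "Alice,7014,NY\nBob,7015,CA\nDan,7014,WA\n"

# A prefix check on the original raw records finds nothing,
# because each record starts with the name, not the ID
before = [line for line in original.splitlines() if line.startswith("7014")]

# After rewriting the text, the ID is left-most and the same check works
rewritten = move_column_first(original, 1)
after = [line for line in rewritten.splitlines() if line.startswith("7014")]
```

For a real file, you would read the text from the source path and write the result to a new file, leaving the original untouched.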