Using File System Items Source in Astera Centerprise

The File System Items Source in Astera Centerprise is used to provide metadata information to a task in a dataflow or workflow. In a dataflow, it can be used in conjunction with a source object, in cases where you want to process multiple files through the transformation and loading process. In a workflow, the File System Items Source can be used to provide input paths to a subsequent task such as a RunDataflow task.

Let’s see how it works in a dataflow.

Using File System Items Source in a Dataflow

Scenario

Here we have a dataflow that we want to run on multiple source files that contain customer data from a fictitious organization. We’re going to use the source object as a transformation and provide the location of the source files using a File System Items Source object. The File System Items Source will provide the path to the location where our source files reside, and the source object will pick the source files from that location, one by one, and pass them on for further processing in the dataflow.

Steps to Use the File System Items Source in a Dataflow

Here, we want to sort the data, filter out records of customers from Germany and write the filtered records into a database table. The source data is stored in delimited (.csv) files.

img

First, change the source object into a transformation. This is because the data is stored in multiple delimited files and we want to process all of them in the dataflow. For this, right-click on the source object’s header and click on ‘Transformation’ in the context menu.

img

You can see that the color of the source object has changed from green to purple which indicates that the source object has changed into a transformation.

img

You will also notice that the source object now has two nodes: Input and Output. The Input node has an input mapping port, which means that it can take the path to the source file from another object.

img

Now we will use a File System Items Source to provide the path to the Customer_Data transformation object. Go to the Sources section in the Toolbox and drag and drop the File System Items Source onto the designer.

img

If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files.

img

To configure the properties of the File System Items Source, right-click on the File System Items Source object’s header and go to Properties.

img

This will open the File System Properties window.

img

The first thing you need to do is specify the path to the directory or folder where your source files reside.

img

You can see a couple of other options on this screen:

Filter: If your specified source location contains files in different formats, you can use this option to read only the files that match a given pattern. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the filter box to read delimited files only.

img

Include items in subdirectories: Check this option if you want to process files present in the subdirectories of the specified location.

Include Entries for Directories: Check this option if you want the output to include entries for directories as well as files in the specified location.
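
To make these three options more concrete, here is a minimal Python sketch of how a filter pattern, a subdirectory switch, and a directory-entries switch could interact when listing a folder. It is only an illustration and not how Centerprise is implemented: the function name list_items and its parameters are hypothetical, and the choice to add directory entries regardless of the filter is an assumption made for the example.

```python
import fnmatch
import os
import tempfile


def list_items(directory, pattern="*", include_subdirectories=False,
               include_directory_entries=False):
    """Return sorted paths under `directory`, loosely mimicking the three options."""
    results = []
    for root, dirs, files in os.walk(directory):
        if include_directory_entries:
            # Assumption: directory entries are listed regardless of the filter.
            results.extend(os.path.join(root, name) for name in dirs)
        # Keep only the files whose names match the filter pattern.
        results.extend(
            os.path.join(root, name)
            for name in files
            if fnmatch.fnmatch(name, pattern)
        )
        if not include_subdirectories:
            break  # stop after the top-level directory
    return sorted(results)


if __name__ == "__main__":
    # Build a throwaway folder so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        os.mkdir(os.path.join(folder, "archive"))
        for name in ("a.csv", "b.txt", os.path.join("archive", "c.csv")):
            open(os.path.join(folder, name), "w").close()
        print(list_items(folder, "*.csv"))
        print(list_items(folder, "*.csv", include_subdirectories=True))
```

With the pattern “*.csv” and both switches off, only the delimited files in the top-level folder are returned, which corresponds to the settings used in this example.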

Once you’ve specified the path and other options, click OK.

img

Now right-click on the File System Items Source object and select Preview Output.

img

You can see that the File System Items Source object has picked only the delimited files from the specified location and has returned their metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.

img
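
If it helps to think of this output as data, each row is roughly one record per file. The Python sketch below builds such a record using the field names from the layout. The describe function is hypothetical, and the way each value is derived here (for example, treating a file as read-only when the current user cannot write to it) is an assumption for illustration, not a description of what Centerprise does internally.

```python
import os
import tempfile
from pathlib import Path


def describe(path):
    """Build one metadata record for a file, using field names from the layout."""
    p = Path(path).resolve()
    info = p.stat()
    return {
        "FileName": p.name,
        "FileNameWithoutExtension": p.stem,
        "Extension": p.suffix,
        "FullPath": str(p),
        "Directory": str(p.parent),
        # Assumption: "read-only" means the current user cannot write to the file.
        "ReadOnly": not os.access(p, os.W_OK),
        "Size": info.st_size,  # size in bytes
    }


if __name__ == "__main__":
    # Create a small sample file so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as folder:
        sample = Path(folder) / "Customers_01.csv"
        sample.write_text("CustomerID,Country\n")
        for field, value in describe(sample).items():
            print(f"{field}: {value}")
```

The FullPath value is the one that matters for the next step, since it is the field we map to the source object.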

Now let’s do the mapping. Map the FullPath field from the File System Items Source to the FullPath field under the Input node in the Customer_Data Transformation object.

img

Once mapped, when we run the dataflow, the File System Items Source will pass the path to the source files, one by one, to the Customer_Data Transformation object. The Customer_Data Transformation object will read the data from the source file and pass it to the subsequent transformation object to be processed further in the dataflow.
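
The same idea can be sketched outside the product in a few lines of Python: a list of full paths is handed, one by one, to a reader that parses each delimited file, and the rows then continue on to the filter and sort steps. This is a rough analogy only. The function names are hypothetical, the column names Country and CustomerID are assumed for the example, and the sketch takes the filter step to mean keeping customers from Germany.

```python
import csv
import os
import tempfile


def read_customers(full_path):
    """Stand-in for the Customer_Data transformation: read one delimited file."""
    with open(full_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def process_files(full_paths, country="Germany"):
    """Feed each path to the reader in turn, then filter and sort the rows."""
    selected = []
    for full_path in full_paths:  # one path at a time, like the FullPath mapping
        for row in read_customers(full_path):
            if row["Country"] == country:  # "Country" is an assumed column name
                selected.append(row)
    # "CustomerID" is an assumed column name used as the sort key.
    return sorted(selected, key=lambda row: row["CustomerID"])


if __name__ == "__main__":
    # Two small sample files stand in for the real source folder.
    with tempfile.TemporaryDirectory() as folder:
        paths = []
        samples = {
            "one.csv": "CustomerID,Country\nC2,Germany\nC9,France\n",
            "two.csv": "CustomerID,Country\nC1,Germany\n",
        }
        for name, text in samples.items():
            path = os.path.join(folder, name)
            with open(path, "w", newline="", encoding="utf-8") as handle:
                handle.write(text)
            paths.append(path)
        for row in process_files(paths):
            print(row)
```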

Using File System Items Source in a Workflow

In a workflow, the File System Items Source can be used to provide input paths to a subsequent task such as a RunDataflow task. Let’s see how it works.

Steps to Use the File System Items Source in a Workflow

We want to design a workflow to orchestrate the process of extracting customer data stored in delimited files, sorting that data, filtering out records of customers from Germany and loading the filtered records in a database table.

img

We have already designed a dataflow for the process and have called this dataflow in our workflow using the RunDataflow task object.

img

We have multiple source files that we want to process in this dataflow. So, we will use a File System Items Source object to provide the path to our source files to the RunDataflow task. For this, go to the Sources section in the Toolbox and drag and drop the File System Items Source onto the designer.

img

If you look at the File System Items Source, you can see that the layout is pre-populated with fields such as FileName, FileNameWithoutExtension, Extension, FullPath, Directory, ReadOnly, Size, and other attributes of the files. Also, there’s a small blue icon with the letter ‘s’, which indicates that the object is set to run in Singleton mode.

img

By default, all objects in a workflow are set to execute in Singleton mode. However, since we have multiple files to process in the dataflow, we will set the File System Items Source object to run in a loop. For this, right-click on the File System Items Source and click ‘Loop’ in the context menu.

img

You can see that the color of the object has changed to purple, and it now has this purple icon over the header which denotes the loop function.

img

It also has two mapping ports on the header to map the File System Items Source object to the subsequent action in the workflow. Let’s map it to the RunDataflow task.

img

To configure the properties of the File System Items Source, right-click on the File System Items Source object’s header and go to Properties.

img

This will open the File System Items Source Properties window.

img

The first thing you need to do is specify the path to the directory or folder where your source files reside.

img

You can see a couple of other options on this screen:

Filter: If your specified source location contains files in different formats, you can use this option to read only the files that match a given pattern. For instance, our source folder contains multiple .pdf, .txt, .doc, .xls, and .csv files, so we will write “*.csv” in the filter box to read delimited files only.

img

Include items in subdirectories: Check this option if you want to process files present in the subdirectories of the specified location.

Include Entries for Directories: Check this option if you want the output to include entries for directories as well as files in the specified location.

Once you’ve specified the path and other options, click OK.

img

Now right-click on the File System Items Source object and click Preview Output.

img

You can see that the File System Items Source object has picked only the delimited files from the specified location and has returned their metadata in the output. You can see the FileName, FileNameWithoutExtension, Extension, FullPath, Directory, and other attributes such as whether the file is ReadOnly, FileSize, LastAccessed, and other details in the output.

img

Now let’s do the mapping. Map the FullPath field from the File System Items Source object to the SourceFile field in the RunDataflow task.

img

Once mapped, upon running the workflow, the File System Items Source will pass the path to the source files, one by one, to the RunDataflow task. In other words, the File System Items Source acts as a driver that provides source files to the RunDataflow task, which will then process them in the dataflow.

When the File System Items Source is set to run in a loop, the dataflow will run ‘n’ times, where ‘n’ is the number of files passed by the File System Items Source to the RunDataflow task. For instance, you can see that we have six source files in the specified folder. The File System Items Source object will pass these six files one by one to the RunDataflow task to be processed in the dataflow.

img
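
The loop behavior can be pictured with a short Python sketch: one run per matching file, so six files give six runs. Everything here is hypothetical and only meant as an analogy. The run_dataflow function stands in for the RunDataflow task, run_workflow stands in for the looping File System Items Source, and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def run_dataflow(source_file):
    """Hypothetical stand-in for the RunDataflow task; returns a short summary."""
    return f"processed {source_file}"


def run_workflow(folder, pattern="*.csv"):
    """Loop over the matching files and trigger one dataflow run per file."""
    full_paths = sorted(str(path) for path in Path(folder).glob(pattern))
    # One run per path, so the number of runs equals the number of files.
    return [run_dataflow(path) for path in full_paths]


if __name__ == "__main__":
    # Six made-up delimited files plus one file the filter should skip.
    with tempfile.TemporaryDirectory() as folder:
        for index in range(6):
            (Path(folder) / f"customers_{index}.csv").write_text("CustomerID\n")
        (Path(folder) / "notes.txt").write_text("not a source file\n")
        runs = run_workflow(folder)
        print(len(runs), "runs")
```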

This concludes using the File System Items Source in Astera Centerprise.