Data Cleanse Transformation

Data cleanse transformation is a new addition to Astera Centerprise’s library of transformations. It makes it all the more convenient for business users to cleanse raw data and present it in a more refined, standardized, and enterprise-ready format. Using the Data Cleanse transformation, users can clean up data from null values and redundant text and characters, and prepare raw data for transformation, validation, profiling, and record matching functions.

Steps to Use the Data Cleanse Transformation in Astera Centerprise

1. Retrieve the data you want to cleanse using the relevant source object. (Click here to learn more about setting up sources in Centerprise.)

1570020199130

2. Now drag the Data Cleanse transformation object from the Transformations section of the toolbox and drop it onto the designer.

1570012116337

3. This is what a Data Cleanse transformation object looks like.

1570012676734

4. Map data from the source object to the Data Cleanse transformation object. You can either auto-map the entire data set or map a few fields manually.

1570012694963

5. Now you have to specify the criteria for data cleansing. Right-click on the Data Cleanse transformation object and go to Properties from the context menu.

1570012705574

6. This will open a new window where you have to set up the properties for data cleansing. The first screen is the Layout Builder Screen. Here you can customize the layout of your dataset by adding, removing or renaming fields. Once you have created the layout, click Next to proceed to the next screen.

1570012739632

7. This is where you set up the data cleanse criteria for your source data.

1569932779770

You can find various data cleanse options arranged in different sections. Let’s explore them one by one.

Remove

The options provided within this category allow you to remove values, spaces, tabs, and line breaks from your data. You can find the following options within this category:

1569932906261

· All whitespaces – removes all whitespaces from the data

· Leading and trailing whitespaces – removes the whitespaces preceding and succeeding the values

· Tabs and line breaks – removes tabs and line breaks within the source values

· Duplicate whitespaces – removes double spaces from the data

· Letters – removes all alphabets from the data

· Digits – removes all digits from the data

· Punctuation – removes all punctuation from the data

· Specified Character – removes any specific character from the data

Replace Nulls

As the name suggests, the options within this category allow you to replace null values inside a string or numeric field with a corresponding value – blank in case of a string, and zero, in case of a numeric field.

1569932949212

· Null strings with blanks: replaces all null strings with blanks

· Null numerics with zeros: replaces all null numeric values with zeros

Find and Replace

The Find and Replace options enable users to replace a value in the source dataset with another value.

1569933028345

It also provides users the option to choose whether the find and replace function is to be performed on a case sensitive basis. You can select a search mode from three options:

· Normal – will perform a normal find and replace function

As in this example, we want to change the status from ‘Planned’ to ‘Scheduled.’

1569933082621

So, we’ll type in ‘Planned’ in the Find bar and ‘Scheduled’ in the Replace bar.

1569933128389

Now, if we look at the output, we can see that the Data Cleanse transformation has found and replaced the status values from ‘Planned’ to ‘Scheduled.’

1569933163228

· Extended – allows you to search for tabs(\t), newline(\r\n), or a character by its value (\o, \x, \b, \d, \t, \n, \r and ) and replace with the desired value

In the example below, we want to replace whitespaces within our source values with a hyphen (-).

1569933199995

So, we’ll type ‘\s’ in the Find bar and ‘-’ in the Replace bar.

1569933227115

Now, if we look at the output, we can see that the Data Cleanse transformation has found and replaced whitespaces from within the values with a hyphen.

Preview before applying the Extended Find and Replace function

1569933253385

Preview after applying the Extended Find and Replace function

1569933278028

· Regular Expressions – allows you to find and replace a value based on a regular expression

In the example below, we want to replace the ALFKI value(s) in the CustomerID field with A1234.

1569933316200

For this, we will write a regex in the Find bar and the desired value in the Replace bar.

1569933337947

Now, if we look at the preview, you can see that Centerprise has replaced the values in the source data with the desired values.

1569933357183

Case

Case options allow users to convert the letter case of source data to Upper, Lower, or Title case.

1569933379524

You can choose from the following options:

· None – keeps the letter case as is

1569933406820

· Upper – changes all letters to upper case

1569933467575

· Lower – change all letters to lower case

1569933474518

· Title – changes all letters to title case

1569933486225

Modify Data

The Modify Data option provides you the flexibility and convenience of applying an expression to all fields in your data. Check the ‘Run expression on all fields’ option to activate this feature.

The ‘Run Expression on all fields’ feature was previously called ApplyToAll and offered as a standalone transformation in Centerprise 7.5 and previous releases. It had a limited functionality though, compared to the all-new Data Cleanse transformation, which is why it has been replaced altogether with the Data Cleanse transformation in Centerprise 7.6.

The ‘Run Expression on all fields’ feature is enabled by default for any existing flows created prior to Centerprise 7.6. This means that existing flows created on Centerprise 7.5 or a prior release will continue to work seamlessly on 7.6 upgrade and won’t require any modification at all.

1569933583931

Now click on this 1569933617264 button to open the expression builder.

1569933639076

Here, you can choose from the extensive library of built-in expressions and apply it on all the mapped fields by adding it to a $FieldValue parameter.

As in this example, we have mapped a regular expression to $FieldValue parameter.

1569933815346

Now if we look at the preview, you can see that Centerprise has applied the regular expression to all fields and removed whitespaces from the values.

Preview before running the expression on all fields:

DC-V7

Preview after running the expression on all fields:

1569934023746

This function was previously performed using the ApplyToAll transformation in Centerprise 7.5 and previous releases. However, in Centerprise 7.6, you can perform this and other data cleanse tasks using the Data Cleanse transformation.