Data Quality Mode

In addition to the standard logging functionality, Centerprise provides a special Data Quality Mode, which is useful for advanced profiling and debugging. When a dataflow is created or opened in Data Quality Mode, most objects on the dataflow show the Messages node with output ports.

In this document, we will learn how to use Data Quality Mode in Astera Centerprise.

Sample Use-Case

In this case, we have a simple dataflow designed to perform a data quality check. It contains customer data coming in from an Excel Workbook Source. A Data Quality Rule object is added to check the data for errors and warnings.

1_dataflow

If you preview the output for the customer data at this stage, you will see that some of the records have missing Region and Fax values.

2_data_preview

The data quality rules are set so that records with an empty Region field are marked as errors and records with an empty Fax field are marked as warnings. A red exclamation sign in the data preview identifies the records that failed a rule and returned an error or a warning as a result.

3_dq_preview
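
The two rules described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Centerprise code: the function names, the message texts, and the sample rows are all made up for the example.

```python
# Illustrative only: this is not Centerprise code. It mimics the two rules
# described above (empty Region -> error, empty Fax -> warning).

def is_empty(value):
    """Treat None and whitespace-only strings as empty."""
    return value is None or str(value).strip() == ""


def check_record(record):
    """Return a list of (message_type, field_name, text) tuples for one record."""
    messages = []
    if is_empty(record.get("Region")):
        messages.append(("Error", "Region", "Region is empty"))
    if is_empty(record.get("Fax")):
        messages.append(("Warning", "Fax", "Fax is empty"))
    return messages


# Hypothetical sample rows; the values are invented for this sketch.
customers = [
    {"CustomerID": "A1", "Region": "WA", "Fax": "555-0100"},
    {"CustomerID": "B2", "Region": "", "Fax": "555-0101"},
    {"CustomerID": "C3", "Region": "OR", "Fax": None},
]

for row in customers:
    print(row["CustomerID"], check_record(row))
```

A record that passes both rules yields an empty list, which corresponds to a row without the exclamation sign in the preview.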

Now suppose we want to collect the number of errors and warnings in each record, along with the error and warning messages attached to it, and write this information to a destination. For this purpose, we will use Data Quality Mode.

Note: The Record Level Log feature also collects and records this information, but we cannot process it further in the dataflow.

Activating Data Quality Mode

1. To activate Data Quality Mode, click the Data Quality Mode icon at the top of the dataflow designer.

4_dq_mode

Once Data Quality Mode is activated, a Messages node will be added to all the objects in the dataflow.

5_dq_mode_activated

The Messages node captures the following statistical information:

  • TotalCount
  • ErrorCount
  • WarningCount
  • InfoCount
  • MessagesText
  • DbAction

In addition, the FirstItem, LastItem, and Items sub-nodes provide a way to collect quality control data for each record. This data includes fields such as ElementName, MessageType, and Action, and can be written to a destination object for record-keeping purposes.
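
To make the shape of this data easier to picture, here is a rough Python model of the Messages node. The field names come from the lists above; everything else is an assumption made for illustration, including the types, the idea that the counts are derived from Items, and the class names `MessageItem` and `Messages`. It does not describe how Centerprise stores this data internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MessageItem:
    # Field names follow the article; the types are assumptions.
    ElementName: str
    MessageType: str  # assumed values: "Error", "Warning", "Info"
    Action: Optional[str] = None


@dataclass
class Messages:
    # Hypothetical model: counts are derived from Items for simplicity.
    Items: List[MessageItem] = field(default_factory=list)
    MessagesText: str = ""
    DbAction: Optional[str] = None

    def _count(self, message_type: str) -> int:
        return sum(1 for item in self.Items if item.MessageType == message_type)

    @property
    def TotalCount(self) -> int:
        return len(self.Items)

    @property
    def ErrorCount(self) -> int:
        return self._count("Error")

    @property
    def WarningCount(self) -> int:
        return self._count("Warning")

    @property
    def InfoCount(self) -> int:
        return self._count("Info")

    @property
    def FirstItem(self) -> Optional[MessageItem]:
        return self.Items[0] if self.Items else None

    @property
    def LastItem(self) -> Optional[MessageItem]:
        return self.Items[-1] if self.Items else None


# A record with one error and one warning attached.
example = Messages(
    Items=[MessageItem("Region", "Error"), MessageItem("Fax", "Warning")],
    MessagesText="Region is empty; Fax is empty",
)
print(example.TotalCount, example.ErrorCount, example.WarningCount)
```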

Writing to a Destination

Connecting the Messages node's output ports to another object's input ports on the dataflow makes it possible to get both summary statistics and record-level statistics for the dataset, which is useful for analysis and debugging. To do that:

1. Drag and drop a Delimited File Destination object onto the dataflow designer from Toolbox > Destinations > Delimited File Destination.

6_destination

2. Auto-map fields under the Messages node onto the destination object.

7_auto-map

3. Configure the settings for the Delimited File Destination object to save this data.

4. Right-click the header of the destination object and select Preview Output from the context menu.

8_preview_output

A Data Preview window will open.

9_data_preview
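
For readers who want a feel for what the resulting delimited file might contain, the sketch below writes a few Messages fields to comma-delimited text with Python's standard `csv` module. It is not produced by Centerprise: the column selection is a subset of the fields listed earlier, and the helper name `write_messages` and the sample rows are hypothetical.

```python
import csv
import io

# Column names follow the Messages node fields listed in this article;
# only a subset is used here to keep the example short.
FIELDS = ["TotalCount", "ErrorCount", "WarningCount", "InfoCount", "MessagesText"]


def write_messages(rows, stream, delimiter=","):
    """Write one delimited line per record, preceded by a header line."""
    writer = csv.DictWriter(
        stream, fieldnames=FIELDS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical per-record values, invented for this sketch.
rows = [
    {"TotalCount": 0, "ErrorCount": 0, "WarningCount": 0, "InfoCount": 0,
     "MessagesText": ""},
    {"TotalCount": 1, "ErrorCount": 1, "WarningCount": 0, "InfoCount": 0,
     "MessagesText": "Region is empty"},
]

buffer = io.StringIO()  # stands in for the destination file
write_messages(rows, buffer)
print(buffer.getvalue())
```

Passing an open file object instead of `buffer` would write the same lines to disk.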