Data Quality Mode¶
In addition to its standard logging functionality, Centerprise provides a special Data Quality Mode, which is useful for advanced profiling and debugging. When a dataflow is created or opened in Data Quality Mode, most objects on the dataflow show a Messages node with output ports.
In this document, we will learn how to use Data Quality Mode in Astera Centerprise.
Sample Use-Case¶
In this case, we have a simple dataflow designed to perform a data quality check. It contains customer data coming in from an Excel Workbook Source. A Data Quality Rule object is added to validate the data for errors and perform warning checks.
If you preview the output for the customer data at this stage, you will see that some of the records have missing Region and Fax values.
The data quality rules are set so that records with an empty Region field are marked as errors and records with an empty Fax field are marked as warnings. A red exclamation sign in the data preview identifies the records that failed a rule and returned an error or a warning as a result.
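The logic of these two rules can be sketched in plain Python. This is only an illustration of the checks described above, not Centerprise's rule syntax or API; the function name `check_record`, the message wording, and the sample records are all hypothetical.

```python
# Illustrative only: plain Python, not Centerprise rule syntax.

def check_record(record):
    """Return a list of (message_type, element_name, text) tuples for one record."""
    messages = []
    # An empty or missing Region is treated as an error.
    if not (record.get("Region") or "").strip():
        messages.append(("Error", "Region", "Region is empty"))
    # An empty or missing Fax is treated as a warning.
    if not (record.get("Fax") or "").strip():
        messages.append(("Warning", "Fax", "Fax is empty"))
    return messages


# Hypothetical sample records standing in for the customer data.
customers = [
    {"CustomerID": "A1", "Region": "WA", "Fax": "555-0100"},
    {"CustomerID": "B2", "Region": "", "Fax": "555-0101"},
    {"CustomerID": "C3", "Region": None, "Fax": ""},
]

for customer in customers:
    print(customer["CustomerID"], check_record(customer))
```

A record can carry more than one message at a time, which is why the per-record counts discussed next are useful.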
Suppose we now want to collect the number of errors and warnings in each record, along with the error and warning messages attached to it, and write this information to a destination. For this purpose, we will use Data Quality Mode.
Note: The Record Level Log feature also collects and records this information, but it cannot be processed further in the dataflow.
Activating Data Quality Mode¶
1. To activate Data Quality Mode, click the Data Quality Mode icon at the top of the dataflow designer.
Once Data Quality Mode is activated, a Messages node will be added to all the objects in the dataflow.
The Messages node captures the following statistical information:
- TotalCount
- ErrorCount
- WarningCount
- InfoCount
- MessagesText
- DbAction
In addition, the FirstItem, LastItem, and Items sub-nodes provide a way to collect quality control data for each record. This data includes fields such as ElementName, MessageType, and Action, and can be written to a destination object for record-keeping purposes.
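To make the shape of this data concrete, here is a minimal Python sketch of how the Messages node could be modeled. It is not Centerprise's internal representation: the class names, the message type values, the "; " separator in the combined text, and the assumption that the total equals the number of items are all illustrative. DbAction and Action are left out because the text above does not describe their values.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MessageItem:
    element_name: str  # ElementName: the field the message refers to
    message_type: str  # MessageType: "Error", "Warning" or "Info" (assumed values)
    text: str          # the message text (hypothetical wording)


@dataclass
class Messages:
    items: List[MessageItem] = field(default_factory=list)  # Items

    def _count(self, message_type: str) -> int:
        return sum(1 for item in self.items if item.message_type == message_type)

    @property
    def total_count(self) -> int:  # TotalCount (assumed: number of items)
        return len(self.items)

    @property
    def error_count(self) -> int:  # ErrorCount
        return self._count("Error")

    @property
    def warning_count(self) -> int:  # WarningCount
        return self._count("Warning")

    @property
    def info_count(self) -> int:  # InfoCount
        return self._count("Info")

    @property
    def messages_text(self) -> str:  # MessagesText (separator is an assumption)
        return "; ".join(item.text for item in self.items)

    @property
    def first_item(self) -> Optional[MessageItem]:  # FirstItem
        return self.items[0] if self.items else None

    @property
    def last_item(self) -> Optional[MessageItem]:  # LastItem
        return self.items[-1] if self.items else None


# A record that failed both rules from the sample use-case.
msgs = Messages([
    MessageItem("Region", "Error", "Region is empty"),
    MessageItem("Fax", "Warning", "Fax is empty"),
])
print(msgs.total_count, msgs.error_count, msgs.warning_count, msgs.info_count)
print(msgs.messages_text)
```

The summary fields describe the record as a whole, while the item fields describe individual messages, which mirrors the split between the counts and the FirstItem, LastItem, and Items sub-nodes.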
Writing to a Destination¶
Connecting the Messages node’s output ports to another object’s input ports on the dataflow makes it possible to get both summary statistics and record-level statistics for the dataset, which is useful for analysis and debugging. To do that:
1. Drag and drop a Delimited File Destination object onto the dataflow designer by going to Toolbox > Destinations > Delimited File Destination.
2. Auto-map fields under the Messages node onto the destination object.
3. Configure the settings for the Delimited File Destination to save this data.
4. Right-click the header of the destination object and select Preview Output from the context menu.
A Data Preview window will open.
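Once the destination file has been written, it can also be analyzed outside Centerprise. The sketch below assumes a comma-delimited file with a header row whose column names match the Messages fields; the actual layout depends on how the destination is configured, and the sample contents and the `summarize` helper are hypothetical.

```python
import csv
import io

# Hypothetical file contents. A real file would be read with
# open(path, newline="") instead of io.StringIO.
sample = """TotalCount,ErrorCount,WarningCount,MessagesText
0,0,0,
1,1,0,Region is empty
2,1,1,Region is empty; Fax is empty
"""


def summarize(lines):
    """Count records overall, and records with at least one error or warning."""
    summary = {"records": 0, "with_errors": 0, "with_warnings": 0}
    for row in csv.DictReader(lines):
        summary["records"] += 1
        # Treat a blank count as zero.
        if int(row["ErrorCount"] or 0) > 0:
            summary["with_errors"] += 1
        if int(row["WarningCount"] or 0) > 0:
            summary["with_warnings"] += 1
    return summary


summary = summarize(io.StringIO(sample))
print(summary)
```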