Lineage and Impact Analysis
In Astera Centerprise, a lineage diagram helps you trace where data originates, while an impact diagram helps you identify where that data is consumed in your data processes.
Data impact and lineage analysis is helpful in the following business scenarios:
- To get a complete picture of your business data: where it originates, how it is transformed during the processing cycle, and which target systems are affected as a result
- To see how changing a data source will alter the established data processes
- To see how a specific data entity is related to other objects in a data model or an ETL flow
- To identify who has access to data and how the data is transformed at each step
The main purpose of impact and lineage analysis is to get a bird's-eye view of the journey of your business data and to see how, at each level, a specific data entity impacts other entities and objects in your ETL project. With this information, you can:
- Make informed business decisions
- Do an impact analysis before adding a new data source or altering an existing one
- Protect and govern your enterprise data
Centerprise lets you create lineage and impact analysis diagrams at three levels:
1. Document level lineage
2. Object level lineage
3. Field level lineage
In this document, we will see how the data lineage and impact feature works in Astera Centerprise.
Here, we have a project comprising two subflows, a dataflow, and a workflow document. These flows are interlinked.
The subflows are called in the dataflow through Subflow Transformation objects. Subflow_Union performs a Union Transformation on the incoming data, followed by Subflow_Aggregate, which uses an Aggregate Transformation to calculate totals of price, quantity, and discount based on OrderID. A Data Quality Rule is then applied to treat records with no discount values as errors, and the validated data is written to a delimited file.
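The logic of this dataflow can be approximated outside Centerprise with a short Python sketch. This is only an illustration of the steps described above, not how Centerprise implements them: the lowercase field names `price`, `quantity`, and `discount`, the sample records, and all function names are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

def union(*sources):
    """Concatenate records from several inputs (the role of Subflow_Union)."""
    return [row for source in sources for row in source]

def aggregate_by_order(rows):
    """Sum price, quantity and discount per OrderID (the role of Subflow_Aggregate).

    A discount total stays None when no record of the order carries a discount.
    """
    totals = defaultdict(lambda: {"price": 0.0, "quantity": 0, "discount": None})
    for row in rows:
        total = totals[row["OrderID"]]
        total["price"] += row["price"]
        total["quantity"] += row["quantity"]
        if row.get("discount") is not None:
            total["discount"] = (total["discount"] or 0.0) + row["discount"]
    return [{"OrderID": order_id, **total} for order_id, total in sorted(totals.items())]

def apply_quality_rule(rows):
    """Split records into valid ones and errors (records with no discount value)."""
    valid = [row for row in rows if row["discount"] is not None]
    errors = [row for row in rows if row["discount"] is None]
    return valid, errors

def to_delimited(rows, delimiter=","):
    """Render the validated records as delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["OrderID", "price", "quantity", "discount"],
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sample inputs for the two branches feeding the union.
source_a = [
    {"OrderID": 1, "price": 10.0, "quantity": 2, "discount": 0.5},
    {"OrderID": 2, "price": 4.0, "quantity": 1, "discount": None},
]
source_b = [
    {"OrderID": 1, "price": 5.0, "quantity": 1, "discount": 0.25},
]

aggregated = aggregate_by_order(union(source_a, source_b))
valid, errors = apply_quality_rule(aggregated)
output = to_delimited(valid)
```

In this sketch, order 2 ends up in `errors` because it has no discount value, and only order 1 reaches the delimited output.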
The workflow orchestrates this dataflow using a Run Dataflow Task object. Based on the criterion defined in the decision object, a notification email is sent to the administrator.
Let’s create Lineage and Impact diagrams for these flows at three levels.
Field Level Lineage
Centerprise enables users to see data lineage and impact at the field level.
For instance, if you have a formula field in a dataflow and you'd like to see its lineage (where the data in that field comes from) and its impact (which other fields and objects that field affects), you can use the Field Lineage option.
In this example, we want to see the field lineage for the TotalUnitPrice field in the Data Quality Rules object.
To create a field lineage diagram, right-click the field and select Show Lineage from the context menu. The Lineage and Impact window opens at the bottom of the screen.
This window provides a detailed map of the fields in the lineage and the fields in the impact of the TotalUnitPrice field.
If you are unable to see the Lineage and Impact window, go to View > Lineage and Impact or press Ctrl + Alt + O.
In addition, a new tab opens in Centerprise, showing the Lineage and Impact diagram for the TotalUnitPrice field.
In the above screenshot, observe that the element representing the TotalUnitPrice field is highlighted.
The blue links indicate the flow of data through the TotalUnitPrice field, from the fields in its lineage to the fields in its impact.
Links in black represent the flow of data through the other fields in the Data Quality Rules object.
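Conceptually, lineage and impact are the upstream and downstream sides of a directed graph of field mappings. The following minimal sketch shows that idea in Python. It is not Centerprise's implementation, and every field path except the TotalUnitPrice name is a hypothetical placeholder chosen for the example.

```python
from collections import defaultdict, deque

# Hypothetical field-to-field mappings as (source field, target field) pairs.
EDGES = [
    ("Orders.UnitPrice", "Aggregate.TotalUnitPrice"),
    ("Aggregate.TotalUnitPrice", "DataQualityRules.TotalUnitPrice"),
    ("DataQualityRules.TotalUnitPrice", "DelimitedFile.TotalUnitPrice"),
    ("Orders.Quantity", "Aggregate.TotalQuantity"),
]

def build_index(edges):
    """Index the mappings in both directions for fast neighbour lookup."""
    upstream, downstream = defaultdict(set), defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
        upstream[target].add(source)
    return upstream, downstream

def reachable(start, neighbours):
    """Breadth-first search: every field reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # only relevant if the mappings contain a cycle
    return seen

upstream, downstream = build_index(EDGES)
focus = "DataQualityRules.TotalUnitPrice"
lineage = reachable(focus, upstream)    # fields the data comes from
impact = reachable(focus, downstream)   # fields the data flows into
```

Here `lineage` contains the two upstream price fields and `impact` contains the field in the delimited file, while the unrelated quantity mapping appears in neither set.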
Object Level Lineage
When a single project uses many objects across multiple flows, manually tracing the data roots of a particular object becomes difficult.
Centerprise simplifies this by providing the option to create object-level lineage and impact diagrams.
In this example, we want to create a lineage and impact diagram for the DataQualityRules_Task (Run Dataflow Task) object used inside the workflow.
To create an object lineage diagram, right-click on the object’s header and select Show Lineage from the context menu.
A detailed map of the objects in the lineage and the objects in the impact of the DataQualityRule_Task object appears in the Lineage and Impact window.
A new tab opens in Centerprise, showing the Lineage and Impact diagram for the DataQualityRules_Task object.
In the above screenshot, the element representing DataQualityRule_Task is highlighted to indicate the object under focus.
The source object of the DataQualityRule_Task object is called its Lineage, and the target objects are called its Impact.
Document Level Lineage
Once the field level and object level lineage diagrams are created, users can view the document level lineage.
A Centerprise project can comprise multiple interlinked flow documents (subflows, dataflows, and workflows). A document level lineage graph makes it easier to trace which flows are linked to a particular flow document.
To see the document level lineage, go to the Centerprise tab showing the object lineage, right-click any representative element in the lineage diagram, and select Show Lineage for this Document from the context menu.
In this example, we are creating a document level lineage for the dataflow. On the same tab, Centerprise will create the document level lineage and impact analysis diagram for the dataflow document.
Observe that the lineage includes two subflows and the impact comprises a workflow, each shown by its representative element. These are the flows that the DataQuality.Df dataflow document is linked to: the data in this dataflow comes from the two subflow documents, and the dataflow is in turn called by a workflow.
The Lineage and Impact window shows a map of the documents in the lineage of DataQuality.Df and the document in its impact under separate headings.
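One way to think about document level lineage is as a roll-up of object level links: any link whose two ends sit in different documents becomes a link between those documents. The sketch below illustrates this under stated assumptions. The object names inside each document and the shape of the link data are hypothetical, and it does not reflect how Centerprise stores or computes its graphs.

```python
# Hypothetical object-level links: (document, object) -> (document, object).
OBJECT_LINKS = [
    (("Subflow_Union", "Union"), ("DataQuality.Df", "Subflow_Union")),
    (("Subflow_Aggregate", "Aggregate"), ("DataQuality.Df", "Subflow_Aggregate")),
    (("DataQuality.Df", "Subflow_Union"), ("DataQuality.Df", "Subflow_Aggregate")),
    (("DataQuality.Df", "DataQualityRules"), ("Workflow.Wf", "DataQualityRules_Task")),
]

def document_links(object_links):
    """Collapse object-level links into distinct document-to-document links."""
    return {
        (source_doc, target_doc)
        for (source_doc, _), (target_doc, _) in object_links
        if source_doc != target_doc  # links inside one document are dropped
    }

def document_lineage_and_impact(doc, object_links):
    """Return the documents directly feeding doc and those directly fed by it."""
    links = document_links(object_links)
    lineage = sorted(source for source, target in links if target == doc)
    impact = sorted(target for source, target in links if source == doc)
    return lineage, impact

lineage, impact = document_lineage_and_impact("DataQuality.Df", OBJECT_LINKS)
```

With this sample data, the lineage of DataQuality.Df is the two subflows and its impact is the workflow, matching the diagram described above. The sketch only follows direct links; a multi-hop view would need a graph traversal.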
Another Way to See Data Lineage
There are two ways to access data lineage and impact in Centerprise. One way is described in the previous sections of this article.
The other way is directly from the Project Explorer panel.
To view document level lineage, go to the Project Explorer panel, right-click the DataQuality.Df node, and select Show Lineage from the context menu.
To view object level lineage, expand the Workflow.Wf node, right-click the DataQualityRule_Task object, and select Show Lineage from the context menu.
To view field level lineage, expand the DataQuality.Df node, then expand the DataQualityRules object, right-click the TotalUnitPrice field, and select Show Lineage from the context menu.
To check how many lineage graphs Centerprise has created, you can view its output.
To check the lineage output, go to View > Output or press Ctrl + Alt + O.
An Output window opens, displaying the lineage graph output of the project. This window shows output only after the Show Lineage command has been executed.