Aggregate Transformation¶
Overview¶
The Aggregate transformation object provides the functionality to create aggregations of your dataset, using aggregate functions such as Sum, Count, First, Last, Min, Max, Average, Var or Standard Deviation. The dataset can be split into groups so that the aggregate value(s) can be generated for the group instead of the whole dataset. For example, calculate product count by month of year, or get average sales price by region and year.
Aggregate Transformation can be applied to unsorted data or data sorted on group by values. When applied to an input stream that is sorted on group by fields, Aggregate Transformation performs substantially better and consumes very little memory. Alternatively, when applied to unsorted datasets, Aggregate Transformation may consume substantial memory resources for large data sets and may slow down the performance of the server.
Use Case¶
In this scenario, we have products data stored in a csv file. The source file contains information such as products and supplier IDs, unit price of the various products, quantity of products available etc. Now you want to derive following information from your source data:
1. Number of products per category
2. Total price of all the products per category
3. Minimum price per category
4. Maximum price per category
We will use Aggregate Transformation to calculate the information we want. Let’s see how to go about it.
How to Work with Aggregate Transformation¶
1. Select the Source object from the sources section in the toolbox and drag-and-drop it on the designer. In this case, a Delimited File Source object is used.
2. To work with Aggregate Transformation, drag-and-drop the Aggregate Transformation object from Toolbox > Transformations > Aggregate.
3. Right-click on the transformation object and select Properties. Layout builder screen will open.
4. Write names of the fields you want to map to the transformation object in the Name column and specify the relevant aggregate functions for them.
For this case:
- CategoryID: We will select aggregate function Group-By for the field because we want to group the records based on the product categories available.
(Note: At least one field should be marked as Group-By for the Aggregate transformation to work.)
ProductID: We will select aggregate function Count for the field in order to calculate the number of products per category.
UnitPrice: The field is mapped three times.
- To calculate TotalPricePerCategory, select aggregate function Sum .
- To calculate MaxPricePerCategory, select aggregate function Max.
- To calculate MinPricePerCategory, select aggregate function as Min.
5. Click on Next, Aggregate Transformation Properties window will open.
There are three sorting options in Aggregate transformation:
- Incoming data is presorted on group by fields: In this option, aggregate requires data to be sorted by the specified group-by field.
- Sort Incoming data before building aggregate: This option will first sort the incoming data, then build its aggregate.
- Build aggregate using unsorted data: This option will build aggregate using the incoming data whether it is sorted or not.
6. Click on Next, Config Parameters window will open where you can further configure and define parameters for the Aggregate transformation.
7. Click Next, to go on the General Options screen. Click on Ok.
General Options screen:
This screen shares the options common to most objects on the dataflow.
Clear Incoming Record Messages
When this option is checked, any messages coming in from objects preceding the current object will be cleared. This is useful when you need to capture record messages in the log generated by the current object and filter out any record messages generated earlier in the dataflow.
Do Not Process Records with Errors
When this option is checked, records with errors will not be output by the object. When this option is off, records with errors will be output by the object, and a record message will be attached to the record. This record message can then feed into downstream objects on the dataflow, for example a destination file that will capture record messages, or a log that will capture the messages and as well as collect their statistics.
The Comments input allows you to enter comments associated with this object.
8. After you have configured the properties, click OK.
9. You will see the fields in the object that were added on the layout builder screen.
10. Map the data fields from the source object to the transformation object. You can auto-map the entire dataset from source to transformation object or only map the selected fields that you want to work with. In this case, we will map CategoryID, ProductID and UnitPrice because those are the fields we want to find aggregations for*.*
(Note: UnitPrice field has been mapped three times because TotalPricePerCategory, MaximumPricePerCategory and MinimumPriceperCategory will be determined by the it.)
11. Right-click on Aggregate transformation object and click Preview Output.
12. You will see that the specified aggregate functions have been applied.