Incremental
Aggregation is the process of capturing the changes in the source and
calculating the aggregations in a session. This process makes the
integration service to update the target incrementally and avoids the
process of calculating the aggregations on the entire source. Consider
the below sales table as an example and see how the incremental
aggregation works.
Source: YEAR PRICE ---------- 2010 100 2010 200 2010 300 2011 500 2011 600 2012 700
For
simplicity, I have used only the year and price columns of sales table.
We need to do aggregation and find the total price in each year.
When
you run the session for the first time using the incremental
aggregation, then integration service process the entire source and
stores the data in two file, index and data file. The integration
service creates the files in the cache directory specified in the
aggregator transformation properties.After the aggregation, the target table will have the below data.
Target: YEAR PRICE ---------- 2010 600 2011 1100 2012 700
Now assume that the next day few more rows are added into the source table.
Source: YEAR PRICE ---------- 2010 100 2010 200 2010 300 2011 500 2011 600 2012 700 2010 400 2011 100 2012 200 2013 800
Now
for the second run, you have to pass only the new data changes to the
incremental aggregation. So, the source will contain the last four
records. The incremental aggregation uses the data stored in the cache
and calculates the aggregation. Once the aggregation is done, the
integration service writes the changes to the target and the cache. The
target table will contains the below data.
Target: YEAR PRICE ---------- 2010 1000 2011 1200 2012 900 2013 800
Points to remember
- When you use incremental aggregation, first time you have to run the session with complete source data and in the subsequent runs you have to pass only the changes in the source data.
- Use incremental aggregation only if the target is not going to change significantly. If the incremental aggregation process changes more than hhalf of the data in target, then the session perfromance many not benfit. In this case go for normal aggregation.
Note: The integration service creates a new aggregate cache when
- A new version of mapping is saved
- Configure the session to reinitialize the aggregate cache
- Moving or deleting the aggregate files
- Decreasing the number of partitions
Configuring the mapping for incremental aggregation
Before
enabling the incremental aggregation option, make sure that you capture
the changes in the source data. You can use lookup transformation or
stored procedure transformation to remove the data which is already
processed. You can also create a trigger on the source database and can
read only the source changes in the mapping.
No comments:
Post a Comment