Data Stage FAQS: Aggregator Stage

Aggregator Stage

The Aggregator stage is a processing stage. It classifies data rows from a single input link into groups and computes totals or other aggregate functions for each group. The aggregated results for each group are output from the stage via an output link.

The stage editor has three pages:

Stage page. This is always present and is used to specify general information about the stage.

Inputs page. This is where you specify details about the data being grouped and/or aggregated.

Outputs page. This is where you specify details about the groups being output from the stage.

The Aggregator stage gives you access to grouping and summary operations. One of the easiest ways to expose patterns in a collection of records is to group records with similar characteristics, then compute statistics on all records in each group. You can then use these statistics to compare properties of the different groups. For example, records containing cash register transactions might be grouped by the day of the week to see which day had the largest number of transactions, the largest amount of revenue, and so on.
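The cash register example can be sketched outside DataStage in a few lines of Python. This is only an illustration of the group-then-summarize idea, not how the Aggregator stage is implemented; the sample transactions and the function name `aggregate_by_day` are made up for the example.

```python
from collections import defaultdict

# Hypothetical cash register transactions: (day_of_week, amount)
transactions = [
    ("Mon", 12.50),
    ("Tue", 8.00),
    ("Mon", 30.00),
    ("Sat", 45.25),
    ("Sat", 10.00),
    ("Sat", 5.75),
]

def aggregate_by_day(rows):
    """Group rows by day and compute a count and a revenue total per group."""
    groups = defaultdict(lambda: {"count": 0, "revenue": 0.0})
    for day, amount in rows:
        groups[day]["count"] += 1
        groups[day]["revenue"] += amount
    return dict(groups)

summary = aggregate_by_day(transactions)

# Compare the groups using the computed statistics
busiest_day = max(summary, key=lambda d: summary[d]["count"])
top_revenue_day = max(summary, key=lambda d: summary[d]["revenue"])
```

Here the day of the week plays the role of the grouping column, while the count and the revenue total are the aggregate functions computed for each group.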

Records can be grouped by one or more characteristics, where record characteristics correspond to column values. In other words, a group is a set of records with the same value for one or more columns. For example, transaction records might be grouped by both day of the week and by month. These groupings might show that the busiest day of the week varies by season.
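Grouping by more than one column amounts to using a composite key. The following Python sketch, again with invented sample data and hypothetical function names, groups transactions by both month and day of the week and then picks the busiest day within each month.

```python
from collections import Counter

# Hypothetical transactions: (month, day_of_week)
transactions = [
    ("Jan", "Sat"), ("Jan", "Sat"), ("Jan", "Mon"),
    ("Jul", "Fri"), ("Jul", "Fri"), ("Jul", "Sat"),
]

def count_by_month_and_day(rows):
    """Count records per group, where the group key is the (month, day) pair."""
    return Counter((month, day) for month, day in rows)

def busiest_day_per_month(counts):
    """For each month, keep the day whose group has the highest count."""
    best = {}
    for (month, day), n in counts.items():
        if month not in best or n > best[month][1]:
            best[month] = (day, n)
    return {month: day for month, (day, _) in best.items()}

counts = count_by_month_and_day(transactions)
busiest = busiest_day_per_month(counts)
```

With this sample data the busiest day differs between the two months, which is the kind of seasonal variation the two-column grouping is meant to reveal.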

Wednesday, February 10, 2010
