03 sort dedup and reformat components
TRANSCRIPT
-
8/12/2019 03 Sort Dedup and Reformat Components
1/41
Accenture Ab Initio Training 1
Introduction to Ab Initio
Prepared By : Ashok Chanda
-
Ab Initio Session 3
Sort
Sort within Groups
Dedup Sorted
Reformat: example showing multiple outputs
-
Graphical Development Environment (GDE)
-
Components
Components may run on any computer running the Co>Operating System.
Different components do different jobs. The particular work a component accomplishes depends on its parameter settings.
Some parameters are data transformations, that is, business rules applied to one or more inputs to produce the required output.
-
The Sort Component
Reads records from the input port, sorts them by key, and writes the result to the output port.
-
Sorting
-
Sorting - The Key Specifier Editor
-
Parameters for Sort
key (key specifier, required): Name(s) of the key field(s) and the sequence specifier(s) you want Sort to use when it orders data records.
max-core (integer, required): Maximum memory usage in bytes. The default value of max-core is 100663296 (96 MiB, roughly 100 megabytes). When Sort reaches the number of bytes specified in the max-core parameter, it sorts the records it has read and writes a temporary file to disk.
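A key specifier lists one or more key fields, each with a sequence specifier such as ascending or descending. The Python sketch below is only an analogy and not Ab Initio's implementation. It represents a key specifier as a hypothetical list of `(field, direction)` pairs, with the most significant key first, and orders dict records with it. The function name `sort_by_key_spec` and the field names are illustrative.

```python
from operator import itemgetter
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a key specifier: (field name, "ascending" | "descending"),
# most significant key first.
KeySpec = Sequence[Tuple[str, str]]


def sort_by_key_spec(records: List[Dict[str, Any]], key_spec: KeySpec) -> List[Dict[str, Any]]:
    """Return a new list ordered by every key in key_spec."""
    result = list(records)
    # Python's sort is stable (also with reverse=True), so sorting by the least
    # significant key first and the most significant key last yields the
    # combined multi-key ordering.
    for field, direction in reversed(key_spec):
        result.sort(key=itemgetter(field), reverse=(direction == "descending"))
    return result


rows = [
    {"cust_id": 2, "amount": 10},
    {"cust_id": 1, "amount": 5},
    {"cust_id": 1, "amount": 20},
    {"cust_id": 2, "amount": 30},
]
print(sort_by_key_spec(rows, [("cust_id", "ascending"), ("amount", "descending")]))
```

The repeated stable sorts keep the sketch short. A single sort with a composite key would also work, but then the key for each descending field would have to be negated.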
-
Runtime Behavior of Sort
The Sort component:
Reads the records from all the flows connected to the in port until it reaches the number of bytes specified in the max-core parameter.
Sorts the records and writes the results to a temporary file on disk.
Repeats this procedure until it has read all records.
Merges all the temporary files, maintaining the sort order.
Writes the result to the out port.
Sort stores temporary files in the working directories specified by its layout.
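The procedure above is essentially an external merge sort. The Python sketch below illustrates that technique under several simplifications, so it does not reflect how the Co>Operating System actually manages memory or layouts:

- A record-count limit (`max_records`, a hypothetical parameter) stands in for the byte-based max-core.
- Each sorted chunk (a "run") is pickled to a temporary file.
- `heapq.merge` performs the final ordered merge.

```python
import heapq
import pickle
import tempfile
from typing import IO, Any, Callable, Iterable, Iterator, List


def _spill(chunk: List[Any], key: Callable[[Any], Any]) -> IO[bytes]:
    """Sort one in-memory chunk and write it to a temporary file (one sorted run)."""
    chunk.sort(key=key)
    f = tempfile.TemporaryFile()  # binary read/write file, deleted when closed
    for rec in chunk:
        pickle.dump(rec, f)
    f.seek(0)
    return f


def _read_run(f: IO[bytes]) -> Iterator[Any]:
    """Stream records back from a run file, closing it at end of file."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            f.close()
            return


def external_sort(records: Iterable[Any], key: Callable[[Any], Any],
                  max_records: int) -> Iterator[Any]:
    """Sort records in bounded chunks, spill each chunk to disk, then merge the runs."""
    runs: List[IO[bytes]] = []
    chunk: List[Any] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) >= max_records:  # stand-in for reaching max-core
            runs.append(_spill(chunk, key))
            chunk = []
    if chunk:  # remaining records after end of input
        runs.append(_spill(chunk, key))
    # Merge all temporary runs while maintaining the sort order.
    return heapq.merge(*(_read_run(f) for f in runs), key=key)


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], key=lambda r: r, max_records=3)))
```

For simplicity this sketch always spills the final chunk to disk, even when everything would have fit in memory.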
-
Sort within Groups
Sort within Groups refines the sorting of data records already sorted according to one key specifier: it sorts the records within the groups formed by the first sort according to a second key specifier.
-
Parameters for Sort within Groups
major-key (key specifier, required): Name(s) of the key field(s) and the sequence specifier(s) by which Sort within Groups assumes the input is ordered.
minor-key (key specifier, required): Name(s) of the key field(s) and the sequence specifier(s) you want Sort within Groups to use when it orders data records within each group.
max-core (integer, required): Maximum memory usage in bytes before Sort within Groups stops the execution of the graph. The default value of max-core is 10485760 (10 megabytes).
-
Runtime Behavior of Sort within Groups
Sort within Groups assumes input records are sorted according to the major-key parameter. It reads data records from all the flows connected to the in port until it either reaches the end of a group or reaches the number of bytes specified in the max-core parameter.
When Sort within Groups reaches the end of a group, it does the following:
Sorts the records in the group according to the minor-key parameter.
Writes the results to the out port.
Repeats this procedure with the next group.
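As an illustration only, and not the component's implementation, the Python sketch below mimics this behavior with `itertools.groupby`. That function groups consecutive records with equal keys, which matches the assumption that input is already sorted on the major key. The function name and the `max_records` argument are hypothetical. `max_records` is a record-count stand-in for the byte-based max-core limit, and exceeding it raises an error to mirror the graph stopping.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, Iterator, List, Optional


def sort_within_groups(
    records: Iterable[Any],
    major_key: Callable[[Any], Any],
    minor_key: Callable[[Any], Any],
    max_records: Optional[int] = None,
) -> Iterator[Any]:
    """Re-sort each major-key group by minor_key; input must already be grouped by major_key."""
    # groupby only merges *consecutive* records with equal major keys.
    for _, group in groupby(records, key=major_key):
        buffered: List[Any] = []
        for rec in group:
            buffered.append(rec)
            # Hypothetical stand-in for max-core: an oversized group stops the run.
            if max_records is not None and len(buffered) > max_records:
                raise RuntimeError("group exceeds the memory limit; the graph would stop")
        buffered.sort(key=minor_key)  # order the group by the minor key
        yield from buffered           # write the group, then move to the next one


txns = [
    {"cust_id": 1, "tran_id": 3},
    {"cust_id": 1, "tran_id": 1},
    {"cust_id": 2, "tran_id": 2},
    {"cust_id": 2, "tran_id": 1},
]
for rec in sort_within_groups(txns, lambda r: r["cust_id"], lambda r: r["tran_id"]):
    print(rec)
```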
-
Sort Within Groups: Example
Input data: sorted by Cust Id (major key).
Output data: sorted by Cust Id (major key) and, within each Cust Id, by Tran Id (minor key).
-
Sort Within Groups
Major key: class
Minor key: roll_nbr
-
Dedup Sorted
Dedup Sorted separates one specified data record in each group of data records from the rest of the records in the group. Dedup Sorted requires grouped input.
-
Removing Duplicates
Figure: input and output showing the occurrence of duplicate records in a Customer Info file.
-
Delete Duplicates
Deletes duplicates from a group of records based on the key field(s).
The data should be sorted on the same key field(s) before using Dedup Sorted.
The keep parameter can be used to select the first record, the last record, or only the unique records from each group.
-
Runtime Behavior of Dedup Sorted
The Dedup Sorted component:
Reads a grouped flow of records from the in port. If your records are not already grouped, use Sort to group them.
Applies the expression in the select parameter to the records, if you have defined the select parameter.
-
Runtime Behavior of Dedup Sorted
If the expression evaluates to 0 for a particular record, Dedup Sorted does not process the record (that is, the record does not appear on any output port).
If the expression produces NULL for a particular record, Dedup Sorted writes the record to the reject port and writes a descriptive error message to the error port. Dedup Sorted discards this information if you do not connect flows to the reject or error ports.
If the expression evaluates to anything other than 0 or NULL for a particular record, Dedup Sorted processes the record.
If you do not supply an expression for the select parameter, Dedup Sorted processes all the records on the in port.
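The three-way outcome of the select expression can be sketched in Python as follows. This is an illustration under an assumption: the select expression is modeled as a Python callable that returns `None` to represent a DML NULL. The function name `apply_select` is hypothetical, and the three returned lists stand in for the processed records, the reject port, and the error port.

```python
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical model of the select parameter: returns None for a DML NULL.
SelectFn = Callable[[Any], Any]


def apply_select(
    records: List[Any], select: Optional[SelectFn] = None
) -> Tuple[List[Any], List[Any], List[str]]:
    """Split records into (processed, reject port, error port) as the slide describes."""
    processed: List[Any] = []
    reject: List[Any] = []
    error: List[str] = []
    for rec in records:
        if select is None:          # no select parameter: process every record
            processed.append(rec)
            continue
        result = select(rec)
        if result is None:          # NULL: record to reject, message to error
            reject.append(rec)
            error.append(f"select expression produced NULL for {rec!r}")
        elif result == 0:           # 0 (or False): record appears on no port
            continue
        else:                       # anything else: record is processed
            processed.append(rec)
    return processed, reject, error


rows = [{"id": 1, "amt": 5}, {"id": 2, "amt": None}, {"id": 3, "amt": 0}]
print(apply_select(rows, lambda r: None if r["amt"] is None else r["amt"] > 1))
```

Checking for `None` before comparing with 0 matters here. In Python, `False == 0` is true, so a false predicate is deliberately treated like 0.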
-
Runtime Behavior of Dedup Sorted
Dedup Sorted considers any consecutive records with the same key value to be in the same group:
If a group consists of one record, Dedup Sorted writes that record to the out port.
If a group consists of more than one record, Dedup Sorted uses the value of the keep parameter to determine which record, if any, to write to the out port, and which record or records to write to the dup port.
-
Runtime Behavior of Dedup Sorted
If you have chosen unique-only for the keep parameter, Dedup Sorted does not write records to the out port from any groups consisting of more than one record.
Both the out and dup ports are optional; if you do not connect flows to them, Dedup Sorted discards the records.
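The grouping and keep rules above can be sketched in Python as follows. This is an illustration under assumptions, not the component's code:

- Records are assumed to arrive already grouped on the key.
- `keep` takes the values first, last, or unique-only.
- For unique-only, every record of a multi-record group is assumed to go to the dup port.

The function name `dedup_sorted` is hypothetical.

```python
from itertools import groupby
from typing import Any, Callable, Iterable, List, Tuple


def dedup_sorted(
    records: Iterable[Any], key: Callable[[Any], Any], keep: str = "first"
) -> Tuple[List[Any], List[Any]]:
    """Return (out, dup) for records that are already grouped on key."""
    out: List[Any] = []
    dup: List[Any] = []
    # groupby treats consecutive records with the same key value as one group.
    for _, grp in groupby(records, key=key):
        group = list(grp)
        if len(group) == 1:            # single-record group always goes to out
            out.append(group[0])
        elif keep == "first":
            out.append(group[0])
            dup.extend(group[1:])
        elif keep == "last":
            out.append(group[-1])
            dup.extend(group[:-1])
        elif keep == "unique-only":    # no record of a multi-record group reaches out
            dup.extend(group)
        else:
            raise ValueError(f"unsupported keep value: {keep!r}")
    return out, dup


customers = [("A", 1), ("A", 2), ("B", 3), ("C", 4), ("C", 5)]
print(dedup_sorted(customers, key=lambda r: r[0], keep="first"))
```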
-
More Complex Components
In these components, the record format metadata typically changes (goes through a transformation) from input to output.
-
Reformat: Transform Component
Transform components modify or manipulate data records by using one or more transform functions.
Reformat changes the record format of your data by dropping fields, or by using DML expressions to add fields, combine fields, or modify the data.
-
Data Transformation
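Example: the input record 0345,090263John,Smith; becomes the output record 1000345Smith 1963.09.02.
The Reformat drops first_name, computes the new id as id + 1000000, and reorders the remaining fields.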
Input record format:
record
  decimal(",") id;
  date("MMDDYY") bday;
  string(",") first_name;
  string(";") last_name;
end
Output record format:
record
  decimal(7) id;
  string(8) last_name;
  date("YYYY.MM.DD") bday;
end
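The Python sketch below walks the example record through the same three steps: parse the delimited input format, apply the business rules (id + 1000000, drop first_name, reorder), and render the fixed-width output format. It is an illustration, not DML, and it rests on these labeled assumptions:

- Two-digit years are read as 19YY.
- `string(8)` is padded on the right with spaces.
- `decimal(7)` is right-justified.

Because of the space padding, the sketch prints three spaces after Smith. The function names are hypothetical.

```python
from datetime import date
from typing import Any, Dict


def parse_input(raw: str) -> Dict[str, Any]:
    """Parse: decimal(",") id; date("MMDDYY") bday; string(",") first_name; string(";") last_name;"""
    id_text, rest = raw.split(",", 1)
    bday_text, rest = rest[:6], rest[6:]  # MMDDYY has no delimiter: 6 characters
    first_name, rest = rest.split(",", 1)
    last_name, _ = rest.split(";", 1)
    month, day, yy = int(bday_text[0:2]), int(bday_text[2:4]), int(bday_text[4:6])
    # Assumption: two-digit years belong to the 1900s (Python's %y would map 63 to 2063).
    return {"id": int(id_text), "bday": date(1900 + yy, month, day),
            "first_name": first_name, "last_name": last_name}


def transform(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Business rules from the slide: add 1000000 to id, drop first_name, reorder."""
    return {"id": rec["id"] + 1000000, "last_name": rec["last_name"], "bday": rec["bday"]}


def format_output(rec: Dict[str, Any]) -> str:
    """Render: decimal(7) id; string(8) last_name; date("YYYY.MM.DD") bday;"""
    return (str(rec["id"]).rjust(7)             # assumption: right-justified to 7 chars
            + rec["last_name"][:8].ljust(8)      # assumption: space-padded to 8 chars
            + rec["bday"].strftime("%Y.%m.%d"))


print(repr(format_output(transform(parse_input("0345,090263John,Smith;")))))
# '1000345Smith   1963.09.02'
```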
-
The Reformat Component
Reads records from the input port, reformats each one according to a transform function (optional in the case of the Reformat component), and writes the resulting records to the output (out0) port.
Additional output ports (out1, ...) can be created by adjusting the count parameter.
-
The Transform Function Editor
-
Reformat
Reformat with 5 ports
-
About Transform Functions
A transform function (or transform) is the logic that drives data transformation. Most commonly, transform functions express record reformatting logic. In general, however, you can use transform functions in data cleansing, record merging, and record aggregation.
-
About Transform Functions
To be more specific, a transform function is a collection of business rules, local variables, and statements. The transform expresses the connections between the rules, variables, and statements, as well as the connections between these elements and the input and output fields.
Transform functions are always associated with transform components, that is, components that have a transform parameter: the Aggregate, Denormalize Sorted, Fuse, Join, Match Sorted, MultiReformat, Normalize, Reformat, Rollup, and Scan components.
-
About Transform Functions
Each component that has a transform parameter:
Determines the values that are passed to the transform function.
Interprets the results of the transform function.
-
Runtime Behavior of Reformat
The n in out n gives each out port a unique number. Each out n port has a corresponding reject n and error n port.
The Reformat component:
Reads records from the in port.
If you supply an expression for the select parameter, the expression filters the records on the in port:
If the expression evaluates to 0 for a particular record, Reformat does not process the record, which means that the record does not appear on any output port.
If the expression produces NULL for any record, Reformat writes a descriptive error message and stops execution of the graph.
-
Runtime Behavior of Reformat
If the expression evaluates to anything other than 0 or NULL for a particular record, Reformat processes the record. If you do not supply an expression for the select parameter, Reformat processes all the records on the in port.
Passes the records to the transform functions, calling the transform function for each port, in order, for each record, beginning with out port 0 and progressing through out port count - 1.
Writes the results to the out ports.
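The Reformat runtime steps can be sketched in Python as below. This is an illustration, not the component itself, and it makes the following assumptions:

- The select parameter is a callable that returns `None` for NULL.
- The length of `transforms` plays the role of the count parameter.
- A hypothetical `GraphStopped` exception stands in for the graph stopping.

Reject n and error n ports are not modeled. Note the contrast with Dedup Sorted: here a NULL select result stops everything instead of routing the record to a reject port.

```python
from typing import Any, Callable, List, Optional, Sequence


class GraphStopped(Exception):
    """Hypothetical stand-in for Reformat stopping the graph when select yields NULL."""


def reformat(
    records: Sequence[Any],
    transforms: Sequence[Callable[[Any], Any]],
    select: Optional[Callable[[Any], Any]] = None,
) -> List[List[Any]]:
    """Return one output list per out port (out0 .. out count-1)."""
    outputs: List[List[Any]] = [[] for _ in transforms]
    for rec in records:
        if select is not None:
            result = select(rec)
            if result is None:   # NULL: error message and stop execution
                raise GraphStopped(f"select expression produced NULL for {rec!r}")
            if result == 0:      # 0: record appears on no output port
                continue
        # Call the transform for out0, out1, ..., in port order, for each record.
        for port, transform in enumerate(transforms):
            outputs[port].append(transform(rec))
    return outputs


recs = [{"id": 1, "amt": 10}, {"id": 2, "amt": 0}]
print(reformat(recs,
               [lambda r: {"id": r["id"]}, lambda r: {"double": r["amt"] * 2}],
               select=lambda r: r["amt"] > 0))
```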
-
Parameters for Reformat
Reformat parameters:
count
limit
log
ramp
reject-threshold
select
transform n
-
Parameters for Reformat
count (integer, required): Integer from 1 to 20 that sets the number of each of the following. The default is 1.
out ports
reject ports
error ports
transform parameters
-
Parameters for Reformat
transform n (filename or string, optional): Either the name of a file, or a transform string, containing a transform function corresponding to an out port; n represents the number of an out port. Transform functions for Reformat should have one input and one output.
select (expression, optional): Filter for data records before reformatting.
-
Parameters for Reformat
limit (integer, required): A number representing reject events. When the reject-threshold parameter is set to Use ramp/limit, the component uses the values of the ramp and limit parameters in a formula to determine the component's tolerance for reject events. The default is 0.
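The slide does not state the formula. Ab Initio documentation is commonly cited as stopping the graph once the number of reject events exceeds limit + ramp × (number of records processed so far). The Python sketch below assumes exactly that rule, so treat it as an illustration and check it against the documentation for your version. The function name is hypothetical.

```python
def reject_tolerance_exceeded(reject_events: int, records_processed: int,
                              limit: int, ramp: float) -> bool:
    """Assumed ramp/limit rule: stop when rejects exceed limit + ramp * records processed."""
    return reject_events > limit + ramp * records_processed


# limit = 0 and ramp = 0.0: the first reject event is already too many.
print(reject_tolerance_exceeded(1, 100, limit=0, ramp=0.0))    # True
# limit = 10 and ramp = 0.25 after 40 records: tolerance is 10 + 10 = 20 rejects.
print(reject_tolerance_exceeded(20, 40, limit=10, ramp=0.25))  # False
print(reject_tolerance_exceeded(21, 40, limit=10, ramp=0.25))  # True
```

Under this rule, limit acts as a fixed allowance and ramp lets the allowance grow with the volume of data processed.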
-
A Look Inside the Reformat Component
Input record fields: b, c, a
Output record fields: x, z, y
-
Input record (fields b, c, a): 45, QF, 9
out :: trans(in) =
begin
  out.x :: in.b - 1;
  out.y :: in.a;
  out.z :: fn(in.c);
end;
A record arrives at the input port.
-
The transform function is evaluated.
-
Output record (fields x, z, y): 44, RG, 9
The result record is written to the output port of the component.
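The three steps above (record arrives, transform evaluated, result written) can be mirrored in Python to check the arithmetic of the example. The slides never define `fn`. The version below is hypothetical: it shifts each character's code point up by one, which happens to reproduce QF becoming RG and does not wrap around at Z. The parameter is named `in_rec` because `in` is a Python keyword.

```python
from typing import Any, Dict


def fn(c: str) -> str:
    # Hypothetical: the slides never define fn. Shifting each character's code
    # point up by one reproduces the QF -> RG change (no wrap-around at Z).
    return "".join(chr(ord(ch) + 1) for ch in c)


def trans(in_rec: Dict[str, Any]) -> Dict[str, Any]:
    """Python mirror of the DML rules: out.x :: in.b - 1; out.y :: in.a; out.z :: fn(in.c);"""
    return {"x": in_rec["b"] - 1, "y": in_rec["a"], "z": fn(in_rec["c"])}


print(trans({"b": 45, "c": "QF", "a": 9}))  # {'x': 44, 'y': 9, 'z': 'RG'}
```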
-
Thank You
End of Session 3