Informatica Interview

SED COMMAND IN UNIX AND LINUX EXAMPLES

Sed is a Stream Editor used for modifying files in unix (or linux). Whenever you want to make changes to a file automatically, sed comes in handy. Most people never learn its power; they simply use sed to replace text. You can do many things apart from replacing text with sed. Here I will describe the features of sed with examples.

Consider the below text file as an input.

>cat file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

unixlinux which one you choose.

SED COMMAND EXAMPLES

1. Replacing or substituting string

Sed command is mostly used to replace the text in a file. The below simple sed command replaces the word "unix" with "linux" in the file.

>sed 's/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string.

By default, the sed command replaces the first occurrence of the pattern in each line and it won't replace the second, third...occurrence in the line.

2. Replacing the nth occurrence of a pattern in a line.

Use the /1, /2 etc flags to replace the first, second occurrence of a pattern in a line. The below command replaces the second occurrence of the word "unix" with "linux" in a line.

>sed 's/unix/linux/2' file.txt

unix is great os. linux is opensource. unix is free os.


learn operating system.

unixlinux which one you choose.

3. Replacing all the occurrence of the pattern in a line.

The substitute flag /g (global replacement) specifies the sed command to replace all the occurrences of the string in the line.

>sed 's/unix/linux/g' file.txt

linux is great os. linux is opensource. linux is free os.

learn operating system.

linuxlinux which one you choose.

4. Replacing from nth occurrence to all occurrences in a line.

Use the combination of /1, /2 etc and /g to replace all the patterns from the nth occurrence of a pattern in a line. The following sed command replaces the third, fourth, fifth... "unix" word with "linux" word in a line.

>sed 's/unix/linux/3g' file.txt

unix is great os. unix is opensource. linux is free os.

learn operating system.

unixlinux which one you choose.

5. Changing the slash (/) delimiter

You can use any delimiter other than the slash. As an example, suppose you want to change a web url to another url:

>sed 's/http:\/\//www/' file.txt

In this case the url contains the delimiter character (/) that we used. In that case you have to escape the slash with a backslash character, otherwise the substitution won't work.


Using too many backslashes makes the sed command look awkward. In this case we can change the delimiter to another character as shown in the below example.

>sed 's_http://_www_' file.txt

>sed 's|http://|www|' file.txt
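For illustration only, suppose a separate file urls.txt (not the sample file above) contains the single line "http://example.com is a url."; the underscore-delimited command would then give:

>sed 's_http://_www_' urls.txt

wwwexample.com is a url.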

6. Using & as the matched string

There might be cases where you want to search for the pattern and replace that pattern by adding some extra characters to it. In such cases & comes in handy. The & represents the matched string.

>sed 's/unix/{&}/' file.txt

{unix} is great os. unix is opensource. unix is free os.

learn operating system.

{unix}linux which one you choose.

>sed 's/unix/{&&}/' file.txt

{unixunix} is great os. unix is opensource. unix is free os.

learn operating system.

{unixunix}linux which one you choose.

7. Using \1,\2 and so on to \9

The first pair of parentheses specified in the pattern represents \1, the second represents \2, and so on. \1 and \2 can be used in the replacement string to make changes to the source string. As an example, if you want to replace the word "unix" in a line with the word doubled, like "unixunix", use the sed command below.

>sed 's/\(unix\)/\1\1/' file.txt

unixunix is great os. unix is opensource. unix is free os.

learn operating system.

unixunixlinux which one you choose.


The parentheses need to be escaped with the backslash character. Another example: if you want to switch the words "unixlinux" to "linuxunix", the sed command is

>sed 's/\(unix\)\(linux\)/\2\1/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxunix which one you choose.

Another example is switching the first three characters in a line

>sed 's/^\(.\)\(.\)\(.\)/\3\2\1/' file.txt

inux is great os. unix is opensource. unix is free os.

aelrn operating system.

inuxlinux which one you choose.

8. Duplicating the replaced line with /p flag

The /p print flag prints the replaced line twice on the terminal. If a line does not have the search pattern and is not replaced, then the /p prints that line only once.

>sed 's/unix/linux/p' file.txt

linux is great os. unix is opensource. unix is free os.

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

linuxlinux which one you choose.

9. Printing only the replaced lines

Use the -n option along with the /p print flag to display only the replaced lines. Here the -n option suppresses the duplicate rows generated by the /p flag and prints the replaced lines only one time.


>sed -n 's/unix/linux/p' file.txt

linux is great os. unix is opensource. unix is free os.

linuxlinux which one you choose.

If you use -n alone without /p, then the sed does not print anything.

10. Running multiple sed commands.

You can run multiple sed commands by piping the output of one sed command as input to another sed command.

>sed 's/unix/linux/' file.txt| sed 's/os/system/'

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

Sed provides -e option to run multiple sed commands in a single sed command. The above output can be achieved in a single sed command as shown below.

>sed -e 's/unix/linux/' -e 's/os/system/' file.txt

linux is great system. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you chosysteme.

11. Replacing string on a specific line number.

You can restrict the sed command to replace the string on a specific line number. An example is

>sed '3 s/unix/linux/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.


The above sed command replaces the string only on the third line.

12. Replacing string on a range of lines.

You can specify a range of line numbers to the sed command for replacing a string.

>sed '1,3 s/unix/linux/' file.txt

linux is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here the sed command replaces the string in lines 1 through 3. Another example is

>sed '2,$ s/unix/linux/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

linuxlinux which one you choose.

Here $ indicates the last line in the file. So the sed command replaces the text from second line to last line in the file.

13. Replacing on lines that match a pattern.

You can specify a pattern for the sed command to match in a line. The sed command looks for the string to be replaced only in the lines that match the pattern, and replaces the string where it is found.

>sed '/linux/ s/unix/centos/' file.txt

unix is great os. unix is opensource. unix is free os.

learn operating system.

centoslinux which one you choose.

Here the sed command first looks for the lines which has the pattern "linux" and then replaces the word "unix" with "centos".


14. Deleting lines.

You can delete lines in a file by specifying a line number or a range of line numbers.

>sed '2 d' file.txt

>sed '5,$ d' file.txt
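For instance, the first command removes the second line of our sample file and prints the remaining lines:

>sed '2 d' file.txt

unix is great os. unix is opensource. unix is free os.

unixlinux which one you choose.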

15. Duplicating lines

You can make the sed command print each line of a file two times.

>sed 'p' file.txt
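With our sample file, every line appears twice in the output:

unix is great os. unix is opensource. unix is free os.

unix is great os. unix is opensource. unix is free os.

learn operating system.

learn operating system.

unixlinux which one you choose.

unixlinux which one you choose.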

16. Sed as grep command

You can make the sed command work similar to the grep command.

>grep 'unix' file.txt

>sed -n '/unix/ p' file.txt

Here the sed command looks for the pattern "unix" in each line of the file and prints the lines that have the pattern.
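Both commands print the lines of our sample file that contain "unix":

unix is great os. unix is opensource. unix is free os.

unixlinux which one you choose.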

You can also make the sed command work like grep -v, just by inverting the pattern match with NOT (!).

>grep -v 'unix' file.txt

>sed -n '/unix/ !p' file.txt

The ! here inverts the pattern match.
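On the sample file, both of these print only the line that does not contain "unix":

learn operating system.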

17. Add a line after a match.

The sed command can add a new line after a pattern match is found. The "a" command to sed tells it to add a new line after a match is found.

>sed '/unix/ a "Add a new line"' file.txt


unix is great os. unix is opensource. unix is free os.

"Add a new line"

learn operating system.

unixlinux which one you choose.

"Add a new line"

18. Add a line before a match

The sed command can add a new line before a pattern match is found. The "i" command to sed tells it to add a new line before a match is found.

>sed '/unix/ i "Add a new line"' file.txt

"Add a new line"

unix is great os. unix is opensource. unix is free os.

learn operating system.

"Add a new line"

unixlinux which one you choose.

19. Change a line

The sed command can be used to replace an entire line with a new line. The "c" command to sed tells it to change the line.

>sed '/unix/ c "Change line"' file.txt

"Change line"

learn operating system.

"Change line"

20. Transform like tr command

The sed command can be used to convert the lower case letters to upper case letters by using the transform "y" option.


>sed 'y/ul/UL/' file.txt

Unix is great os. Unix is opensoUrce. Unix is free os.

Learn operating system.

UnixLinUx which one yoU choose.

HOW TO LOAD ROWS INTO FACT TABLE IN DATA WAREHOUSE

A general data warehouse consists of dimension and fact tables. In a data warehouse the data loading into dimension tables is implemented using SCDs. Mostly the SCD type 2 effective date method is implemented to load dimension tables. Once the dimension tables are loaded, the fact table is loaded with transactional data.

This article covers the process to load data into the fact table. Follow the below steps for loading data into the fact table.

First implement the SCD type 2 method to load data into the dimension table. As an example choose the SCD type 2 effective date to load data into the customer dimension table. The data in the customer dimension table looks as:

Table Name: Customer_Dim

Cust_key Cust_id Begin_date End_Date

------------------------------------

1 10 01-Jan-12 30-Jul-12

2 20 01-Apr-12 30-Aug-12

3 10 31-Jul-12 Null

4 20 31-Aug-12 Null

5 30 31-Aug-12 Null

Let's say you want to load the sales fact table. Consider the following source transactional (sales) data:

Table Name: sales

Cust_id Price


-------------

10 10000

20 50000

30 20000

When loading the data into the fact table, you have to get the relevant dimension keys (surrogate keys) from all the dimension tables and then insert the records into the fact table. When getting the dimension keys from the dimension table, we have to get the rows for which the End_date column is Null. The following SQL query shows how to get the data for the fact table:

SELECT s.price, c.cust_key

FROM customer_dim c, sales s

WHERE c.cust_id = s.cust_id

AND c.end_date is null
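To actually load these rows, the same query can be wrapped in an insert statement; this is only a sketch, assuming the fact table is named sales_fact:

INSERT INTO sales_fact (price, cust_key)
SELECT s.price, c.cust_key
FROM customer_dim c, sales s
WHERE c.cust_id = s.cust_id
AND c.end_date IS NULL;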

After loading, the data in the fact table looks as:

Table Name: Sales_fact

Price Cust_key

--------------

10000 3

50000 4

20000 5

UPDATE STRATEGY - SESSION SETTINGS IN INFORMATICA

This post is a continuation of my previous one on update strategy. Here we will see the different settings that we can configure for update strategy at the session level.

Single Operation of All Rows:


We can specify a single operation for all the rows using the "Treat Source Rows As" setting in the session properties tab. The different values you can specify for this option are:

Insert: The integration service treats all the rows for insert operation. If inserting a new row violates the primary key or foreign key constraint in the database, then the integration service rejects the row.

Delete: The integration service treats all the rows for delete operation and deletes the corresponding row in the target table. You must define a primary key constraint in the target definition.

Update: The integration service treats all the rows for update operation and updates the rows in the target table that matches the primary key value. You must define a primary key in the target definition.

Data Driven: An update strategy transformation must be used in the mapping. The integration service inserts, updates, or deletes a row in the target table based on the logic coded in the update strategy transformation. If you do not specify the data driven option when you are using an update strategy transformation in the mapping, then the workflow manager displays a warning and the integration service does not follow the instructions in the update strategy transformation.

Update Strategy Operations for each Target Table:

You can also specify the update strategy options for each target table individually. Specify the update strategy options for each target in the Transformations view on the Mapping tab of the session:

Insert: Check this option to insert a row in the target table.
Delete: Check this option to delete a row in the target table.
Truncate Table: Check this option to truncate the target table before loading the data.
Update as Update: Update the row in the target table.
Update as Insert: Insert the row which is flagged as update.
Update else Insert: If the row exists in the target table, then update the row. Otherwise, insert the row.

The below table illustrates how the data in target table is inserted or updated or deleted for various combinations of "Row Flagging" and "Settings of Individual Target Table".

Row Flagging Type | Target Table Settings | Result
--------------------------------------------------
Insert | Insert is specified | Source row is inserted into the target.
Insert | Insert is not specified | Source row is not inserted into the target.
Delete | Delete is specified | If the row exists in the target, then it will be deleted.
Delete | Delete is not specified | Even if the row exists in the target, it will not be deleted from the target.
Update | Update as Update is specified | If the row exists in the target, then it will be updated.
Update | Insert is specified, Update as Insert is specified | Even if the row is flagged as update, it will not be updated in the target. Instead, the row will be inserted into the target.
Update | Insert is not specified, Update as Insert is specified | Neither update nor insertion of the row happens.
Update | Insert is specified, Update else Insert is specified | If the row exists in the target, then it will be updated. Otherwise it will be inserted.
Update | Insert is not specified, Update else Insert is specified | If the row exists in the target, then it will be updated. The row will not be inserted if it does not exist in the target.

TARGET UPDATE OVERRIDE - INFORMATICA

When you use an update strategy transformation in the mapping or specify the "Treat Source Rows As" option as update, the informatica integration service updates the row in the target table whenever a match of the primary key is found in the target table.

The update strategy works only when a primary key is defined in the target definition and you want to update the target table based on that primary key.

What if you want to update the target table by a matching column other than the primary key? In this case the update strategy won't work. Informatica provides a feature, "Target Update Override", to update even on columns that are not part of the primary key.

You can find the Target Update Override option in the target definition properties tab. The syntax of the update statement to be specified in Target Update Override is:

UPDATE TARGET_TABLE_NAME

SET TARGET_COLUMN1 = :TU.TARGET_PORT1,


[Additional update columns]

WHERE TARGET_COLUMN = :TU.TARGET_PORT

AND [Additional conditions]

Here TU means target update and is used to specify the target ports.

Example: Consider the employees table as an example. In the employees table, the primary key is employee_id. Let's say we want to update the salary of the employees whose employee name is MARK. In this case we have to use the target update override. The update statement to be specified is:

UPDATE EMPLOYEES

SET SALARY = :TU.SAL

WHERE EMPLOYEE_NAME = :TU.EMP_NAME

CONSTRAINT BASED LOADING IN INFORMATICA

Constraint based load ordering is used to load the data first into a parent table and then into the child tables. You can specify the constraint based load ordering option in the Config Object tab of the session. When the constraint based load ordering option is checked, the integration service orders the target load on a row-by-row basis. For every row generated by the active source, the integration service first loads the row into the primary key table and then into the foreign key tables. Constraint based loading is helpful when loading normalized targets from denormalized source data.

The constraint based load ordering option applies only to insert operations. You cannot update or delete rows using constraint based load ordering. You have to define the primary key and foreign key relationships for the targets in the warehouse or target designer. The target tables must be in the same target connection group.

Complete Constraint based load ordering

There is a workaround to do updates and deletes using constraint based load ordering. Informatica PowerCenter provides an option called complete constraint-based loading for inserts, updates and deletes in the target tables. To enable complete constraint based loading, specify FullCBLOSupport=Yes in the Custom Properties attribute on the Config Object tab of the session.


When you enable complete constraint based loading, the change data (inserts, updates and deletes) is loaded in the same transaction control unit by using the row ID assigned to the data by the CDC reader. As a result, the data is applied to the target in the same order in which it was applied to the sources. You can also set this property on the integration service, which makes it applicable to all sessions and workflows. When you use complete constraint based load ordering, the mapping should not contain active transformations which change the row ID generated by the CDC reader.

The following transformations can change the row ID value:

Aggregator Transformation
Custom Transformation configured as an active transformation
Joiner Transformation
Normalizer Transformation
Rank Transformation
Sorter Transformation

Mapping Implementation of constraint based load ordering

As an example, consider the following source table with data to be loaded into the target tables using constraint based load ordering.

Table Name: EMP_DEPT

Create table emp_dept

(

dept_id number,

dept_name varchar2(30),


emp_id number,

emp_name varchar2(30)

);

dept_id dept_name emp_id emp_name

---------------------------------

10 Finance 1 Mark

10 Finance 2 Henry

20 Hr 3 Christy

20 Hr 4 Tailor

The target tables should contain the below data.

Target Table 1: Dept

Create table dept

(

dept_id number primary key,

dept_name varchar2(30)

);

dept_id dept_name

-----------------

10 Finance

20 Hr


Target Table 2: Emp

create table emp

(

dept_id number,

emp_id number,

emp_name varchar2(30),

foreign key (dept_id) references dept(dept_id)

);

dept_id emp_id emp_name

---------------------------------

10 1 Mark

10 2 Henry

20 3 Christy

20 4 Tailor

Follow the below steps for creating the mapping using the constraint based load ordering option.

Create the source and target tables in the oracle database.
Go to the mapping designer, open the source analyzer and import the source definition from the oracle database.
Now go to the warehouse designer or target designer and import the target definitions from the oracle database.
Make sure that the foreign key relationship exists between the dept and emp targets. Otherwise create the relationship.
Now create a new mapping. Drag the source and targets into the mapping.
Connect the appropriate ports of the source qualifier transformation to the target definitions.
Go to the workflow manager tool, create a new workflow and then a session.
Go to the Config Object tab of the session and check the option of constraint based load ordering.
Go to the mapping tab and enter the connections for source and targets.
Save the mapping and run the workflow.


INCREMENTAL AGGREGATION IN INFORMATICA

Incremental Aggregation is the process of capturing the changes in the source and calculating the aggregations in a session. This process makes the integration service update the target incrementally and avoids recalculating the aggregations on the entire source. Consider the below sales table as an example and see how the incremental aggregation works.

Source:

YEAR PRICE

----------

2010 100

2010 200

2010 300

2011 500

2011 600

2012 700

For simplicity, I have used only the year and price columns of the sales table. We need to do aggregation and find the total price in each year.
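The required aggregation is equivalent to the following SQL, assuming the source table is named sales:

SELECT year, SUM(price) AS price
FROM sales
GROUP BY year;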

When you run the session for the first time using incremental aggregation, the integration service processes the entire source and stores the data in two files: an index file and a data file. The integration service creates the files in the cache directory specified in the aggregator transformation properties.

After the aggregation, the target table will have the below data.

Target:

YEAR PRICE

----------

2010 600

2011 1100


2012 700

Now assume that the next day a few more rows are added into the source table.

Source:

YEAR PRICE

----------

2010 100

2010 200

2010 300

2011 500

2011 600

2012 700

2010 400

2011 100

2012 200

2013 800

Now for the second run, you have to pass only the new data changes to the incremental aggregation. So, the source will contain only the last four records. The incremental aggregation uses the data stored in the cache and calculates the aggregation. Once the aggregation is done, the integration service writes the changes to the target and the cache. The target table will contain the below data.

Target:


YEAR PRICE

----------

2010 1000

2011 1200

2012 900

2013 800

Points to remember

1. When you use incremental aggregation, the first time you have to run the session with the complete source data, and in subsequent runs you have to pass only the changes in the source data.

2. Use incremental aggregation only if the target is not going to change significantly. If the incremental aggregation process changes more than half of the data in the target, then the session performance may not benefit. In this case go for normal aggregation.

Note: The integration service creates a new aggregate cache when:

A new version of the mapping is saved
The session is configured to reinitialize the aggregate cache
The aggregate cache files are moved or deleted
The number of partitions is decreased

Configuring the mapping for incremental aggregation

Before enabling the incremental aggregation option, make sure that you capture the changes in the source data. You can use lookup transformation or stored procedure transformation to remove the data which is already processed. You can also create a trigger on the source database and can read only the source changes in the mapping.

AVOIDING SEQUENCE GENERATOR TRANSFORMATION IN INFORMATICA

Q) How to generate sequence numbers without using the sequence generator transformation?

We use the sequence generator transformation mostly in SCDs. Using a sequence generator transformation to generate unique primary key values can cause performance issues, as an additional transformation has to be processed in the mapping.

You can use expression transformation to generate surrogate keys in a dimensional table. Here we will see the logic on how to generate sequence numbers with expression transformation.

Sequence Generator Reset Option:

When you use the reset option in a sequence generator transformation, the sequence generator uses the original value of Current Value to generate the numbers. The sequences will always start from the same number.

As an example, if the Current Value is 1 with reset option checked, then the sequences will always start from value 1 for multiple session runs. We will see how to implement this reset option with expression transformation.

Follow the below steps:

Create a mapping parameter and call it as $$Current_Value. Assign the default value to this parameter, which is the start value of the sequence numbers.

Now create an expression transformation and connect the source qualifier transformation ports to the expression transformation.

In the expression transformation create the below additional ports and assign the expressions:

v_seq (variable port) = IIF(v_seq>0,v_seq+1,$$Current_Value)

o_key (output port) = v_seq

The v_seq port generates numbers in the same way as the NEXTVAL port of a sequence generator transformation.
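For example, assuming $$Current_Value is set to 1, the ports evaluate row by row as below:

Row v_seq o_key
---------------
1 1 1
2 2 2
3 3 3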

Primary Key Values Using Expression and Parameter:

We will see here how to generate the primary key values using the expression transformation and a parameter. Follow the below steps:

Create a mapping to write the maximum value of primary key in the target to a parameter file. Assign the maximum value to the parameter ($$MAX_VAL) in this mapping. Create a session for this mapping. This should be the first session in the workflow.

Create another mapping where you want to generate the sequence numbers. In this mapping, connect the required ports to the expression transformation, create the below additional ports in the expression transformation and assign the below expressions:

v_cnt (variable port) = v_cnt+1

v_seq (variable port) = IIF( ISNULL($$MAX_VAL) OR $$MAX_VAL=0,1,v_cnt+$$MAX_VAL)

o_surrogate_key (output port) = v_seq


The o_surrogate_key port generates the primary key values just as the sequence generator transformation.
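For instance, if the previous run left $$MAX_VAL at 100, the ports would evaluate as below for the next three rows:

Row v_cnt v_seq o_surrogate_key
-------------------------------
1 1 101 101
2 2 102 102
3 3 103 103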

Primary Key Values Using Expression and Lookup Transformations:

Follow the below steps to generate sequence numbers using expression and lookup transformations.

Create an unconnected lookup transformation with lookup table as target. Create a primary_key_column port with type as output/lookup/return in the lookup ports tab. Create another port input_id with type as input. Now overwrite the lookup query to get the maximum value of primary key from the target. The query looks as

SELECT MAX(primary_key_column) FROM Dimension_table

Specify the lookup condition as primary_key_column >= input_id.

Now create an expression transformation and connect the required ports to it. We will call the unconnected lookup transformation from this expression transformation. Create the below additional ports in the expression transformation:

v_cnt (variable port) = v_cnt+1

v_max_val (variable port) = IIF(v_cnt=1, :LKP.lkp_trans(1), IIF(ISNULL(v_max_val) or v_max_val=0, 1, v_max_val))

v_seq (variable port) = IIF(ISNULL(v_max_val) or v_max_val=0, 1, v_cnt+v_max_val)

o_primary_key (output port) = v_seq

The o_primary_key port generates the surrogate key values for the dimension table.

PMCMD COMMAND USAGE IN INFORMATICA

Informatica provides four built-in command line programs or utilities to interact with the informatica features. They are:


infacmd
infasetup
pmcmd
pmrep

This article covers only the pmcmd command. pmcmd is a command line utility provided by Informatica to perform the following tasks:

Start workflows.
Start a workflow from a specific task.
Stop and abort workflows and sessions.
Schedule workflows.

How to use PMCMD Command in Informatica:

1. Scheduling the workflow

The pmcmd command syntax for scheduling the workflow is shown below:

pmcmd scheduleworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

You cannot specify the scheduling options here. This command just schedules the workflow for the next run.

2. Start workflow

The following pmcmd command starts the specified workflow:

pmcmd startworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name
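For example, using the same syntax with hypothetical names (the service, domain, user, folder and workflow names below are placeholders only):

pmcmd startworkflow -service int_svc_dev -d dom_dev -u admin -p admin123 -f SALES_FOLDER -w wf_load_sales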

3. Stop workflow

The pmcmd command to stop an Informatica workflow is shown below:

pmcmd stopworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

4. Start workflow from a task

You can start the workflow from a specified task. This is shown below:


pmcmd starttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name -startfrom task-name

5. Stopping a task.

The following pmcmd command stops the specified task instance:

pmcmd stoptask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

6. Aborting workflow and task.

The following pmcmd commands are used to abort workflow and task in a workflow:

pmcmd abortworkflow -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name

pmcmd aborttask -service informatica-integration-Service -d domain-name -u user-name -p password -f folder-name -w workflow-name task-name

DIFFERENCE BETWEEN STOP AND ABORT IN INFORMATICA

You can stop or abort a running workflow in one of the following ways:

Issuing stop or abort in the Informatica workflow monitor
Issuing a stop or abort command in pmcmd
Specifying it in a control task

Stopping or Aborting Task:

When you stop a task, the integration service first tries to stop processing the task. The integration service does not process other tasks that are in sequence. However, it processes the tasks that are in parallel to the task on which the stop or abort command is issued. If the Integration Service cannot stop the task, you can try to abort the task. When you abort a task, the Integration Service kills the process on the task.

Stopping or Aborting a Session Task:

When you issue a stop command on a session, the integration service first stops reading the data from the sources. It continues processing and writing data to the targets and then commits the data.

The abort command is handled the same way as the stop command, except that the abort command has a timeout period of 60 seconds. If the Integration Service cannot finish processing and committing data within the timeout period, it kills the DTM process and terminates the session.

Difference Between Stop and Abort:


When you run a session, it holds memory blocks in the OS. When you issue an abort on the session, it kills the threads and leaves the memory blocks behind. This causes memory issues on the server and leads to poor performance. Some operating systems clean up the lost memory blocks automatically; however, most operating systems do not clean up these memory blocks. Stop is a clean way of ending the sessions and cleans up the memory blocks.

REUSABLE VS NON REUSABLE & PROPERTIES OF SEQUENCE GENERATOR TRANSFORMATION

We will see the differences between reusable and non-reusable sequence generator transformations along with the properties of the transformation.

Sequence Generator Transformation Properties:

You have to configure the following properties of a sequence generator transformation:

Start Value:

Specify the Start Value when you configure the sequence generator transformation for the Cycle option. If you configure Cycle, the integration service cycles back to this value when it reaches the End Value. Use Cycle to generate a repeating sequence of numbers, such as numbers 1 through 12 to correspond to the months in a year. To cycle the integration service through a sequence:

Enter the lowest value in the sequence to use for the Start Value.
Enter the highest value to be used for the End Value.
Select the Cycle option.

Increment By:

The Integration service generates sequence numbers based on the Current Value and the Increment By properties in the sequence generator transformation. Increment By is the integer the integration service adds to the existing value to create the new value in the sequence. The default value of Increment By is 1.

End Value:

End value is the maximum value that the integration service generates. If the integration service reaches the end value and the sequence generator is not configured for cycle option, then the session fails with the following error message:

TT_11009 Sequence Generator Transformation: Overflow error.

If the sequence generator is configured for cycle option, then the integration service cycles back to the start value and starts generating numbers from there.

Current Value:

The integration service uses the Current Value as the basis for generated values for each session. Specify in "Current Value" the value you want the integration service to use as the starting value for generating sequence numbers. If you want to cycle through a sequence of numbers, then the current value must be greater than or equal to the Start Value and less than the End Value.


At the end of the session, the integration service updates the current value to the last generated sequence number plus the Increment By value in the repository if the sequence generator Number of Cached Values is 0. When you open the mapping after a session run, the current value displays the last sequence value generated plus the Increment By value.

Reset:

The reset option is applicable only for a non-reusable sequence generator transformation and is disabled for a reusable sequence generator. If you select the Reset option, the integration service generates values based on the original current value each time it starts the session. Otherwise the integration service updates the current value in the repository with the last value generated plus the Increment By value.

Number of Cached Values:

The Number of Cached Values indicates the number of values that the integration service caches at one time. When this value is configured greater than zero, then the integration service caches the specified number of values and updates the current value in the repository.

Non Reusable Sequence Generator:

The default value of Number of Cached Values is zero for non-reusable sequence generators. It means the integration service does not cache the values. The integration service accesses the Current Value from the repository at the start of the session, generates the sequence numbers, and then updates the current value at the end of the session.

When you set the number of cached values greater than zero, the integration service caches the specified number of cached values and updates the current value in the repository. Once the cached values are used, then the integration service again accesses the current value from repository, caches the values and updates the repository. At the end of the session, the integration service discards any unused cached values.

For a non-reusable sequence generator, setting the Number of Cached Values greater than zero can increase the number of times the Integration Service accesses the repository during the session. It also discards unused cached values at the end of the session. As an example, suppose you set the Number of Cached Values to 100 and process only 70 records in a session. The integration service first caches 100 values and updates the current value with 101. As there are only 70 rows to be processed, only the first 70 sequence numbers will be used and the remaining 30 sequence numbers will be discarded. In the next run the sequence numbers start from 101.

The disadvantages of having Number of Cached Values greater than zero are: 1) accessing the repository multiple times during the session, and 2) discarding unused cached values, causing discontinuous sequence numbers.

Reusable Sequence Generators:

The default value of Number of Cached Values is 100 for reusable sequence generators. When you are using the reusable sequence generator in multiple sessions which run in parallel, then specify the Number of Cache Values greater than zero. This will avoid generating the same sequence numbers in multiple sessions.

If you increase the Number of Cached Values for a reusable sequence generator transformation, the number of calls to the repository decreases. However, there is a chance that many cached values will be discarded. So, choose the Number of Cached Values wisely.

AGGREGATOR TRANSFORMATION IN INFORMATICA

The aggregator transformation is an active transformation used to perform calculations such as sums, averages, and counts on groups of data. The integration service stores the group data and row data in the aggregate cache. The Aggregator transformation provides more advantages than SQL; for example, you can use conditional clauses to filter rows.

Creating an Aggregator Transformation:

Follow the below steps to create an aggregator transformation:

Go to the Mapping Designer, click on transformation in the toolbar -> create.
Select the Aggregator transformation, enter the name and click create. Then click Done. This will create an aggregator transformation without ports.
To create ports, you can either drag the ports to the aggregator transformation or create them in the ports tab of the aggregator.

Configuring the Aggregator Transformation:

You can configure the following components in an aggregator transformation:

Aggregate Cache: The integration service stores the group values in the index cache and row data in the data cache.
Aggregate Expression: You can enter expressions in the output port or variable port.
Group by Port: This tells the integration service how to create groups. You can configure input, input/output or variable ports for the group.
Sorted Input: This option can be used to improve the session performance. You can use this option only when the input to the aggregator transformation is sorted on the group by ports.

Properties of Aggregator Transformation:

The below table illustrates the properties of the aggregator transformation:

Property | Description
----------------------
Cache Directory | Local directory where the Integration Service creates the index and data cache files.
Tracing Level | Amount of detail displayed in the session log for this transformation.
Sorted Input | Indicates input data is already sorted by groups. Select this option only if the input to the Aggregator transformation is sorted.
Aggregator Data Cache Size | Default cache size is 2,000,000 bytes. The data cache stores row data.
Aggregator Index Cache Size | Default cache size is 1,000,000 bytes. The index cache stores group by ports data.
Transformation Scope | Specifies how the Integration Service applies the transformation logic to incoming data.

Group By Ports:


The integration service performs aggregate calculations and produces one row for each group. If you do not specify any group by ports, the integration service returns one row for all input rows. By default, the integration service returns the last row received for each group along with the result of aggregation. By using the FIRST function, you can specify the integration service to return the first row of the group.

Aggregate Expressions:

You can create aggregate expressions only in the Aggregator transformation. An aggregate expression can include conditional clauses and non-aggregate functions. You can use the following aggregate functions in the Aggregator transformation:

AVG

COUNT

FIRST

LAST

MAX

MEDIAN

MIN

PERCENTILE

STDDEV

SUM

VARIANCE

Examples: SUM(sales), AVG(salary)

Nested Aggregate Functions:

You can nest one aggregate function within another aggregate function. You can either use single-level aggregate functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an aggregator transformation contains both single-level and nested aggregate functions. If you want to create both single-level and nested aggregate functions, create separate aggregate transformations.


Example: MAX(SUM(sales))

Conditional clauses:

You can reduce the number of rows processed in the aggregation by specifying a conditional clause.

Example: SUM(salary, salary>1000)

This will include only the salaries which are greater than 1000 in the SUM calculation.
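For comparison, achieving the same filtering in plain SQL would require a CASE expression; a rough sketch, assuming a hypothetical employees table:

SELECT SUM(CASE WHEN salary > 1000 THEN salary END) AS total_salary
FROM employees;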

Non-Aggregate Functions:

You can also use non-aggregate functions in aggregator transformation.

Example: IIF( SUM(sales) <20000, SUM(sales),0)

Note: By default, the Integration Service treats null values as NULL in aggregate functions. You can change this by configuring the integration service.

Incremental Aggregation:

After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.

Sorted Input:

You can improve the performance of the aggregator transformation by specifying sorted input. The Integration Service assumes all the data is sorted by group and performs aggregate calculations as it reads rows for a group. If you specify the sorted input option without actually sorting the data, then the integration service fails the session.