informatica best practices
DESCRIPTION
Informatica best practices can be followed during the development process to improve the performance and the quality of the code.
TRANSCRIPT
-
Informatica Best Practices
By Yogaraj Kathirvelu
Based on the Informatica Velocity document and personal experience. Please feel free to drop a comment to improve the quality of the document.
-
1. General
Category Sub Category Area Best Practices
File Handling
Source Extracts Flat File Type
Loading data from fixed-width files takes less time than from delimited files, since delimited files require extra parsing. With fixed-width files, the Integration Service knows the start and end position of each column upfront, which reduces processing time.
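As a plain-Python sketch (not Informatica code) of why fixed-width parsing is cheaper: with known offsets each field is a constant-time slice, while a delimited record must be scanned for separators. The column layout and field names below are hypothetical.

```python
# Fixed-width: start/end offsets are known upfront, so each field is a
# constant-time slice -- no scanning for separators.
FIXED_LAYOUT = {"id": (0, 5), "name": (5, 15), "amount": (15, 23)}

def parse_fixed(record: str) -> dict:
    return {col: record[start:end].strip()
            for col, (start, end) in FIXED_LAYOUT.items()}

# Delimited: every record must be scanned character by character to find
# the separators (quoting rules make real CSV parsing even costlier).
def parse_delimited(record: str, sep: str = ",") -> dict:
    fields = record.split(sep)
    return {"id": fields[0], "name": fields[1], "amount": fields[2]}

fixed_row = "10001John Smith  123.45"
delim_row = "10001,John Smith,123.45"
assert parse_fixed(fixed_row)["amount"] == "123.45"
assert parse_delimited(delim_row)["amount"] == "123.45"
```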
File Handling
Source Extracts Flat File Location
Loading from flat files located on the server machine is faster than loading from a database on the same machine.
General Transformations
Number of Transformations
Reduce the number of transformations; there is always overhead involved in moving data between transformations.
General Transformations Shared Memory
Consider more shared memory for large number of transformations. Session shared memory between 12MB and 40MB should suffice.
General Transformations Reusability
Calculate once, use many times. Use both mapping variables and ports. For example, within an Expression transformation, use variable ports to calculate a value that is used multiple times within that transformation.
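A plain-Python sketch (not Informatica code) of the variable-port pattern: the expensive expression is evaluated once into a "variable" and reused by several output "ports". Port and field names are hypothetical.

```python
def transform_row(row: dict) -> dict:
    # Variable "port": the expensive expression is evaluated once...
    net_amount = row["gross"] - row["discount"] - row["rebate"]
    # ...and reused by several output "ports", instead of repeating the
    # full expression in each of them.
    return {
        "net_amount": net_amount,
        "tax": round(net_amount * 0.08, 2),
        "is_high_value": net_amount > 1000,
    }

out = transform_row({"gross": 1500.0, "discount": 100.0, "rebate": 50.0})
assert out["net_amount"] == 1350.0 and out["is_high_value"]
```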
General Transformations Reusability
Use mapplets to encapsulate multiple reusable transformations. Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.
General Transformations Data Flow
Map only the required ports between transformations. Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier. Also, active transformations that reduce the number of records should be used as early in the mapping as possible.
General Transformations Data Type
The engine automatically converts compatible types. Sometimes data conversion is excessive. Data types are automatically converted when types differ between connected ports. Minimize data type changes between transformations by planning data flow prior to developing the mapping.
-
General Transformations Default value on Ports
Remove all "DEFAULT" value expressions where possible. Having a default value, even the ERROR(xxx) function, slows down the session because it causes an unnecessary evaluation of values for every data element in the map.
General Session Tracing The tracing level override of a session should be Normal or Terse.
-
2. Transformation Specific
Category Sub Category Area Best Practices
Transformation Lookup Ports
In lookup transformations, change unused ports to be neither input nor output. This makes the transformations cleaner looking. It also makes the generated SQL override as small as possible, which cuts down on the amount of cache necessary and thereby improves performance.
Transformation Lookup Cache Size
Caching is often faster on very large lookup tables. When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This is only true if the standard row byte count is 1,024 or less. Caching typically improves performance by 10 to 20 percent. Cache lookup tables only if the number of lookup calls is more than 10 to 20 percent of the lookup table rows.
Transformation Lookup Lookup Condition
When using a Lookup transformation, if you include multiple conditions, enter them in the following order to optimize lookup performance: Equal to (=), Less than (<), Less than or equal to (<=), Greater than (>), Greater than or equal to (>=), Not equal to (!=).
Transformation Lookup Use Concurrent caches
The session can be configured to build caches sequentially or concurrently. When you build sequential caches, the Integration Service creates each cache as the source rows enter the Lookup transformation; if source extraction takes a long time, the Integration Service waits for the first row, which slows down performance. When you configure the session to build concurrent caches, the Integration Service does not wait for the first row to enter the Lookup transformation. With concurrent caching enabled, even if the mapping has multiple Lookup transformations, all the caches are built concurrently.
Note: if you configure the session to build concurrent caches for an unconnected Lookup transformation, the Integration Service ignores this setting and builds unconnected Lookup transformation caches sequentially.
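A plain-Python sketch (not Informatica code) of sequential vs concurrent cache builds, using a thread pool as a stand-in for the Integration Service. The lookup "tables" are hypothetical in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

def build_cache(table_rows):
    # Stand-in for reading a lookup table into a cache keyed on the
    # lookup condition columns.
    return {row["key"]: row for row in table_rows}

lookup_a = [{"key": 1, "val": "a"}]
lookup_b = [{"key": 2, "val": "b"}]

# Sequential: the second cache is not started until the first finishes.
caches_seq = [build_cache(lookup_a), build_cache(lookup_b)]

# Concurrent: both caches are built at the same time, so the session does
# not wait on one lookup source before starting the next.
with ThreadPoolExecutor() as pool:
    caches_con = list(pool.map(build_cache, [lookup_a, lookup_b]))

assert caches_seq == caches_con  # same caches, different build schedules
```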
Transformation Lookup Dynamic Caching with Persistent Cache
For large lookup tables, cache the entire table to a persistent file on the first run and enable the "Update Else Insert" option on the dynamic cache; the engine then never has to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.
Transformation Lookup Multi Match
On multiple matches, use the "Return any matching value" setting whenever possible. Also use this setting if the lookup is being performed to determine that a match exists, but the value returned is irrelevant. The lookup creates an index based on the key ports rather than all lookup transformation ports. This simplified indexing process can improve performance.
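A plain-Python analogy (not Informatica code) for why "Return any matching value" is cheaper: when only the existence of a match matters, an index on the key ports alone is small, duplicates collapse, and any match is as good as another. Field names are hypothetical.

```python
rows = [
    {"cust_id": 1, "name": "Ann", "city": "Oslo"},
    {"cust_id": 1, "name": "Ann", "city": "Bergen"},  # duplicate key
    {"cust_id": 2, "name": "Bob", "city": "Lund"},
]

# "Return any matching value": index keyed on cust_id only -- a small
# index instead of one built over every lookup port.
key_index = {r["cust_id"] for r in rows}

assert 1 in key_index      # a match exists; which row matched is irrelevant
assert 3 not in key_index
```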
Transformation Lookup Comparison Avoid date comparisons in lookup conditions; replace them with string or integer comparisons. Integer is the most efficient option.
Transformation Sequence Generator Cache Value
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property, which determines the number of values the Integration Service caches at one time. Make sure the Number of Cached Values is not too small; consider setting it to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use a cache are faster than those that do.
Transformation Filter Filter Expressions
Try to compute the filter (true/false) answer in a port expression upstream of the filter. Complex filter expressions slow down the mapping, and the longer or more complex the expression, the more severe the speed degradation; expressions and conditions run fastest in an Expression transformation with an output port for the result. Place the actual expression (complex or not) in an Expression transformation upstream from the filter and compute a single numeric flag, 1 for true and 0 for false, as an output port. Feeding this flag into the Filter transformation gives the maximum performance with this configuration.
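A plain-Python sketch (not Informatica code) of the pattern above: the complex condition is evaluated once in an upstream "Expression" step that emits a 1/0 flag, and the "Filter" step tests only that flag. Field names are hypothetical.

```python
def expression_step(row: dict) -> dict:
    # The complex condition lives here, producing a single numeric flag.
    keep = 1 if (row["status"] == "ACTIVE"
                 and row["amount"] > 100
                 and row["region"] in ("EU", "US")) else 0
    return {**row, "keep_flag": keep}

def filter_step(rows):
    # The filter itself only tests the precomputed flag.
    return [r for r in rows if r["keep_flag"] == 1]

rows = [
    {"status": "ACTIVE", "amount": 250, "region": "EU"},
    {"status": "CLOSED", "amount": 250, "region": "EU"},
]
kept = filter_step(expression_step(r) for r in rows)
assert len(kept) == 1 and kept[0]["amount"] == 250
```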
Transformation Joiner Choosing Master Source
The table with the smaller number of rows should be the driving/master table for a faster join.
Transformation Joiner Source with Duplicate Keys
Designate as the master source the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Transformation Joiner Sort Source Data
To improve session performance, configure the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
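A plain-Python hash-join sketch (not Informatica code) of the master-source rule: the smaller (master) input is cached and the larger (detail) input is streamed through it, keeping the in-memory cache small. Field names are hypothetical.

```python
def hash_join(master_rows, detail_rows, key):
    # Build the cache from the smaller (master) source...
    cache = {}
    for m in master_rows:
        cache.setdefault(m[key], []).append(m)
    # ...then stream the larger (detail) source through it.
    for d in detail_rows:
        for m in cache.get(d[key], []):
            yield {**m, **d}

dims = [{"dept": 10, "dept_name": "Sales"}]           # small: master
facts = [{"dept": 10, "emp": "Ann"}, {"dept": 10, "emp": "Bob"},
         {"dept": 20, "emp": "Cid"}]                  # large: detail
joined = list(hash_join(dims, facts, "dept"))
assert len(joined) == 2 and joined[0]["dept_name"] == "Sales"
```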
Transformation Source Qualifier Using Source Filter
If the source is a database, use the Source Qualifier to filter the data instead of using a Filter transformation.
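A sketch of the difference, with sqlite3 standing in for the source database (table and column names are hypothetical): filtering in the Source Qualifier means the WHERE clause runs in the database, so only the needed rows cross the wire.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "OPEN"), (2, "CLOSED"), (3, "OPEN")])

# Filter transformation style: fetch everything, then discard rows.
all_rows = con.execute("SELECT id, status FROM orders").fetchall()
kept_late = [r for r in all_rows if r[1] == "OPEN"]

# Source Qualifier style: push the filter into the extraction SQL.
kept_early = con.execute(
    "SELECT id, status FROM orders WHERE status = 'OPEN'").fetchall()

assert kept_late == kept_early  # same result, but far fewer rows moved
```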
Transformation Aggregator Grouping and Data
Aggregator transformations often slow performance because they must group data before processing it. They also need additional memory to hold intermediate group results.
Use the following guidelines to optimize the performance of an Aggregator transformation:
Use numbers instead of strings and dates in the columns used for the GROUP BY.
Pass only the data required for the calculations; if data needs to be filtered and the filter can be applied before aggregation, filter it there.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Transformation Aggregator Use Sorted Input
The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the Integration Service assumes all data is sorted by group; it reads the rows for a group and then performs the aggregate calculations. When necessary, it stores group information in memory.
The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option together with the Source Qualifier Number of Sorted Ports option or a Sorter transformation to pass sorted data to the Aggregator transformation.
You can increase performance when you use the Sorted Input option in sessions with multiple partitions.
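A plain-Python sketch (not Informatica code) of why sorted input helps: when rows arrive already sorted by the group key, each group can be aggregated as it streams past, so only one group is held in memory at a time. Field names are hypothetical.

```python
from itertools import groupby

rows = [  # already sorted by "dept", as Sorted Input requires
    {"dept": 10, "amount": 100},
    {"dept": 10, "amount": 200},
    {"dept": 20, "amount": 50},
]

# groupby only works on sorted input -- exactly the Sorted Input contract.
totals = {dept: sum(r["amount"] for r in grp)
          for dept, grp in groupby(rows, key=lambda r: r["dept"])}

assert totals == {10: 300, 20: 50}
```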
Transformation Aggregator Incremental Aggregation
If you can capture from the source only the incremental data, and the changes affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations. When you use incremental aggregation, you apply the captured source changes to the aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory without paging to disk.
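A plain-Python sketch (not Informatica code) of incremental aggregation: the previous run's aggregates are kept, and only the newly captured rows are applied to them, instead of re-reading the whole source. Names are hypothetical.

```python
def apply_increment(saved_totals: dict, new_rows) -> dict:
    # saved_totals plays the role of the index/data cache persisted from
    # the last session run.
    for row in new_rows:
        saved_totals[row["dept"]] = saved_totals.get(row["dept"], 0) + row["amount"]
    return saved_totals

totals = {10: 300, 20: 50}   # aggregates from the previous session run
delta = [{"dept": 10, "amount": 25},   # only the changed source rows
         {"dept": 30, "amount": 5}]
totals = apply_increment(totals, delta)
assert totals == {10: 325, 20: 50, 30: 5}
```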