informatica best practices
DESCRIPTION
Informatica best practices can be followed during the development process to improve the performance and the quality of the code.
TRANSCRIPT
-
Informatica Best Practices
By Yogaraj Kathirvelu
Based on the Informatica Velocity document and personal experience. Please feel free to drop a comment to improve the quality of the document.
-
1. General
Category Sub Category Area Best Practices
File Handling
Source Extracts Flat File Type
Loading data from fixed-width files takes less time than from delimited files, since delimited files require extra parsing. With fixed-width files, the Integration Service knows the start and end position of each column upfront, which reduces processing time.
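As a plain-Python sketch (not Informatica code) of why fixed-width parsing is cheaper: with known offsets each field is a constant-time slice, while a delimited record must be scanned for separators. The column layout and field names below are hypothetical.

```python
# Fixed-width: start/end offsets are known upfront, so each field is a
# constant-time slice -- no scanning for separators.
FIXED_LAYOUT = {"id": (0, 5), "name": (5, 15), "amount": (15, 23)}

def parse_fixed(record: str) -> dict:
    return {col: record[start:end].strip()
            for col, (start, end) in FIXED_LAYOUT.items()}

# Delimited: every record must be scanned character by character to find
# the separators (quoting rules make real CSV parsing even costlier).
def parse_delimited(record: str, sep: str = ",") -> dict:
    fields = record.split(sep)
    return {"id": fields[0], "name": fields[1], "amount": fields[2]}

fixed_row = "10001John Smith  123.45"
delim_row = "10001,John Smith,123.45"
assert parse_fixed(fixed_row)["amount"] == "123.45"
assert parse_delimited(delim_row)["amount"] == "123.45"
```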
File Handling
Source Extracts Flat File Location
Loading from flat files located on the server machine is faster than loading from a database on the same machine.
General Transformations
Number of Transformations
Reduce the number of transformations; there is always overhead involved in moving data between transformations.
General Transformations Shared Memory
Consider more shared memory for large number of transformations. Session shared memory between 12MB and 40MB should suffice.
General Transformations Reusability
Calculate once, use many times. Use both mapping variables and ports. For example, within an Expression transformation, use variable ports to calculate a value that is used multiple times within that transformation.
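A plain-Python sketch (not Informatica code) of the variable-port pattern: the expensive expression is evaluated once into a "variable" and reused by several output "ports". Port and field names are hypothetical.

```python
def transform_row(row: dict) -> dict:
    # Variable "port": the expensive expression is evaluated once...
    net_amount = row["gross"] - row["discount"] - row["rebate"]
    # ...and reused by several output "ports", instead of repeating the
    # full expression in each of them.
    return {
        "net_amount": net_amount,
        "tax": round(net_amount * 0.08, 2),
        "is_high_value": net_amount > 1000,
    }

out = transform_row({"gross": 1500.0, "discount": 100.0, "rebate": 50.0})
assert out["net_amount"] == 1350.0 and out["is_high_value"]
```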
General Transformations Reusability
Use mapplets to encapsulate multiple reusable transformations. Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.
General Transformations Data Flow
Map only the required ports between transformations. Delete unnecessary links between transformations to minimize the amount of data moved, particularly in the Source Qualifier. Also, active transformations that reduce the number of records should be used as early in the mapping as possible.
General Transformations Data Type
The engine automatically converts compatible types. Sometimes data conversion is excessive. Data types are automatically converted when types differ between connected ports. Minimize data type changes between transformations by planning data flow prior to developing the mapping.
-
General Transformations Default value on Ports
Remove all "DEFAULT" value expressions where possible. Having a default value, even the ERROR(xxx) function, slows down the session because it causes an unnecessary evaluation of values for every data element in the map.
General Session Tracing The tracing level override of a session should be Normal or Terse.
-
2. Transformation Specific
Category Sub Category Area Best Practices
Transformation Lookup Ports
In lookup transformations, change unused ports to be neither input nor output. This makes the transformations cleaner looking. It also makes the generated SQL override as small as possible, which cuts down on the amount of cache necessary and thereby improves performance.
Transformation Lookup Cache Size
Caching is often faster on very large lookup tables. When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This is only true if the standard row byte count is 1,024 or less. Caching typically improves performance by 10 to 20 percent. Cache lookup tables only if the number of lookup calls is more than 10 to 20 percent of the lookup table rows.
Transformation Lookup Lookup Condition
When using a Lookup transformation, if you include multiple conditions, enter them in the following order to optimize lookup performance: Equal to (=), Less than (<), Less than or equal to (<=), Greater than (>), Greater than or equal to (>=), Not equal to (!=).
Transformation Lookup Use Concurrent caches
The session can be configured to build caches sequentially or concurrently. When you build sequential caches, the Integration Service creates each cache as the source rows enter the Lookup transformation; if source extraction takes a long time, the Integration Service waits for the first row, which slows down performance. When you configure the session to build concurrent caches, the Integration Service does not wait for the first row to enter the Lookup transformation. With concurrent caching enabled, even if the mapping has multiple Lookup transformations, all the caches are built concurrently.
Note: if you configure the session to build concurrent caches for an unconnected Lookup transformation, the Integration Service ignores this setting and builds unconnected Lookup transformation caches sequentially.
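A plain-Python sketch (not Informatica code) of sequential vs concurrent cache builds, using a thread pool as a stand-in for the Integration Service. The lookup "tables" are hypothetical in-memory lists.

```python
from concurrent.futures import ThreadPoolExecutor

def build_cache(table_rows):
    # Stand-in for reading a lookup table into a cache keyed on the
    # lookup condition columns.
    return {row["key"]: row for row in table_rows}

lookup_a = [{"key": 1, "val": "a"}]
lookup_b = [{"key": 2, "val": "b"}]

# Sequential: the second cache is not started until the first finishes.
caches_seq = [build_cache(lookup_a), build_cache(lookup_b)]

# Concurrent: both caches are built at the same time, so the session does
# not wait on one lookup source before starting the next.
with ThreadPoolExecutor() as pool:
    caches_con = list(pool.map(build_cache, [lookup_a, lookup_b]))

assert caches_seq == caches_con  # same caches, different build schedules
```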
Transformation Lookup Dynamic Caching with Persistent Cache
For large lookup tables, cache the entire table to a persistent file on the first run and enable the "Update Else Insert" option on the dynamic cache; the engine then never has to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.
Transformation Lookup Multi Match
On multiple matches, use the "Return any matching value" setting whenever possible. Also use this setting if the lookup is being performed to determine that a match exists, but the value returned is irrelevant. The lookup creates an index based on the key ports rather than all lookup transformation ports. This simplified indexing process can improve performance.
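A plain-Python analogy (not Informatica code) for why "Return any matching value" is cheaper: when only the existence of a match matters, an index on the key ports alone is small, duplicates collapse, and any match is as good as another. Field names are hypothetical.

```python
rows = [
    {"cust_id": 1, "name": "Ann", "city": "Oslo"},
    {"cust_id": 1, "name": "Ann", "city": "Bergen"},  # duplicate key
    {"cust_id": 2, "name": "Bob", "city": "Lund"},
]

# "Return any matching value": index keyed on cust_id only -- a small
# index instead of one built over every lookup port.
key_index = {r["cust_id"] for r in rows}

assert 1 in key_index      # a match exists; which row matched is irrelevant
assert 3 not in key_index
```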
Transformation Lookup Comparison Avoid date comparisons in lookup conditions; replace them with string or integer comparisons. Integer is the most efficient option.
Transformation Sequence Generator Cache Value
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also, configure the Number of Cached Values property, which determines the number of values the Integration Service caches at one time. Make sure the Number of Cached Values is not too small; consider setting it to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0. Sequence Generator transformations that do not use a cache are faster than those that do.
Transformation Filter Filter Expressions
Try to compute the filter (true/false) answer in a port expression upstream of the filter. Complex filter expressions slow down the mapping, and the longer or more complex the expression, the more severe the speed degradation; expressions and conditions run fastest in an Expression transformation with an output port for the result. Place the actual expression (complex or not) in an Expression transformation upstream from the filter and compute a single numeric flag, 1 for true and 0 for false, as an output port. Feeding this flag into the Filter transformation gives the maximum performance with this configuration.
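A plain-Python sketch (not Informatica code) of the pattern above: the complex condition is evaluated once in an upstream "Expression" step that emits a 1/0 flag, and the "Filter" step tests only that flag. Field names are hypothetical.

```python
def expression_step(row: dict) -> dict:
    # The complex condition lives here, producing a single numeric flag.
    keep = 1 if (row["status"] == "ACTIVE"
                 and row["amount"] > 100
                 and row["region"] in ("EU", "US")) else 0
    return {**row, "keep_flag": keep}

def filter_step(rows):
    # The filter itself only tests the precomputed flag.
    return [r for r in rows if r["keep_flag"] == 1]

rows = [
    {"status": "ACTIVE", "amount": 250, "region": "EU"},
    {"status": "CLOSED", "amount": 250, "region": "EU"},
]
kept = filter_step(expression_step(r) for r in rows)
assert len(kept) == 1 and kept[0]["amount"] == 250
```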
Transformation Joiner Choosing Master Source
The table with the smaller number of rows should be the driving/master table for a faster join.
Transformation Joiner Source with Duplicate Keys
Designate as the master source the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time. If the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can be slowed.
Transformation Joiner Sort Source Data
To improve session performance, configure the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the master source.
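A plain-Python hash-join sketch (not Informatica code) of the master-source rule: the smaller (master) input is cached and the larger (detail) input is streamed through it, keeping the in-memory cache small. Field names are hypothetical.

```python
def hash_join(master_rows, detail_rows, key):
    # Build the cache from the smaller (master) source...
    cache = {}
    for m in master_rows:
        cache.setdefault(m[key], []).append(m)
    # ...then stream the larger (detail) source through it.
    for d in detail_rows:
        for m in cache.get(d[key], []):
            yield {**m, **d}

dims = [{"dept": 10, "dept_name": "Sales"}]           # small: master
facts = [{"dept": 10, "emp": "Ann"}, {"dept": 10, "emp": "Bob"},
         {"dept": 20, "emp": "Cid"}]                  # large: detail
joined = list(hash_join(dims, facts, "dept"))
assert len(joined) == 2 and joined[0]["dept_name"] == "Sales"
```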
Transformation Source Qualifier Using Source Filter
If the source is a database, use the Source Qualifier to filter the data instead of using a Filter transformation.
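A sketch of the difference, with sqlite3 standing in for the source database (table and column names are hypothetical): filtering in the Source Qualifier means the WHERE clause runs in the database, so only the needed rows cross the wire.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, "OPEN"), (2, "CLOSED"), (3, "OPEN")])

# Filter transformation style: fetch everything, then discard rows.
all_rows = con.execute("SELECT id, status FROM orders").fetchall()
kept_late = [r for r in all_rows if r[1] == "OPEN"]

# Source Qualifier style: push the filter into the extraction SQL.
kept_early = con.execute(
    "SELECT id, status FROM orders WHERE status = 'OPEN'").fetchall()

assert kept_late == kept_early  # same result, but far fewer rows moved
```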
Transformation Aggregator Grouping and Data
Aggregator transformations often slow performance because they must group data before processing it. They also need additional memory to hold intermediate group results.
Use the following guidelines to optimize the performance of an Aggregator transformation:
Use numbers instead of strings and dates in the columns used for the GROUP BY.
Pass only the data required for the calculations; if data needs to be filtered and the filter can be applied before aggregation, filter it there.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
Transformation Aggregator Use Sorted Input
The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the Integration Service assumes all data is sorted by group; it reads the rows for a group and then performs the aggregate calculations. When necessary, it stores group information in memory.
The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option together with the Source Qualifier Number of Sorted Ports option or a Sorter transformation to pass sorted data to the Aggregator transformation.
You can increase performance when you use the Sorted Input option in sessions with multiple partitions.
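A plain-Python sketch (not Informatica code) of why sorted input helps: when rows arrive already sorted by the group key, each group can be aggregated as it streams past, so only one group is held in memory at a time. Field names are hypothetical.

```python
from itertools import groupby

rows = [  # already sorted by "dept", as Sorted Input requires
    {"dept": 10, "amount": 100},
    {"dept": 10, "amount": 200},
    {"dept": 20, "amount": 50},
]

# groupby only works on sorted input -- exactly the Sorted Input contract.
totals = {dept: sum(r["amount"] for r in grp)
          for dept, grp in groupby(rows, key=lambda r: r["dept"])}

assert totals == {10: 300, 20: 50}
```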
Transformation Aggregator Incremental Aggregation
If you can capture from the source only the incremental data, and the changes affect less than half the target, you can use incremental aggregation to optimize the performance of Aggregator transformations. When you use incremental aggregation, you apply the captured source changes to the aggregate calculations in a session. The Integration Service updates the target incrementally, rather than processing the entire source and recalculating the same calculations every time you run the session.
You can increase the index and data cache sizes to hold all data in memory without paging to disk.
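A plain-Python sketch (not Informatica code) of incremental aggregation: the previous run's aggregates are kept, and only the newly captured rows are applied to them, instead of re-reading the whole source. Names are hypothetical.

```python
def apply_increment(saved_totals: dict, new_rows) -> dict:
    # saved_totals plays the role of the index/data cache persisted from
    # the last session run.
    for row in new_rows:
        saved_totals[row["dept"]] = saved_totals.get(row["dept"], 0) + row["amount"]
    return saved_totals

totals = {10: 300, 20: 50}   # aggregates from the previous session run
delta = [{"dept": 10, "amount": 25},   # only the changed source rows
         {"dept": 30, "amount": 5}]
totals = apply_increment(totals, delta)
assert totals == {10: 325, 20: 50, 30: 5}
```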