
Informatica Best Practices
By Yogaraj Kathirvelu

Based on the Informatica Velocity document and personal experience. These best practices can be followed during development to improve the performance and the quality of the code. Please feel free to drop a comment to improve the quality of this document.

1. General

Each entry below lists Category | Sub Category | Area, followed by the best practice.

File Handling | Source Extracts | Flat File Type
Loading data from fixed-width files takes less time than loading from delimited files, since delimited files require extra parsing. With fixed-width files, the Integration Service knows the start and end position of each column up front, which reduces processing time.

File Handling | Source Extracts | Flat File Location
Reading from flat files located on the server machine is faster than reading from a database located on the server machine.

General | Transformations | Number of Transformations
Reduce the number of transformations in the mapping. There is always overhead involved in moving data between transformations.

General | Transformations | Shared Memory
Consider allocating more shared memory for sessions with a large number of transformations. Session shared memory between 12 MB and 40 MB should suffice.

General | Transformations | Reusability
Calculate once, use many times. Use both mapping variables and variable ports. For example, within an Expression transformation, use variable ports to calculate a value that is used multiple times within that transformation, as in the sketch below.
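A minimal sketch of such an Expression transformation, using hypothetical port names, where a discounted amount is computed once in a variable port and reused by two output ports:

    IN_PRICE  (input)    -- unit price from the source
    IN_QTY    (input)    -- quantity from the source
    v_NET     (variable) = IN_PRICE * IN_QTY * 0.90   -- computed once per row
    o_NET     (output)   = v_NET                      -- first reuse
    o_TAX     (output)   = v_NET * 0.08               -- second reuse, no recomputation

Within each row, input ports are evaluated first, then variable ports in display order, then output ports, so both outputs see the same v_NET value.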

General | Transformations | Reusability
Use mapplets to encapsulate multiple reusable transformations. Use mapplets to leverage the work of critical developers and minimize mistakes when performing similar functions.

General | Transformations | Data Flow
Map only the required ports between transformations. Delete unnecessary links between transformations to minimize the amount of data moved, particularly out of the Source Qualifier. Also, active transformations that reduce the number of records should be placed as early in the mapping as possible.

General | Transformations | Data Type
The engine automatically converts compatible data types when types differ between connected ports, and sometimes this conversion is excessive. Minimize data type changes between transformations by planning the data flow before developing the mapping.

General | Transformations | Default Value on Ports
Remove all default value expressions where possible. Having a default value, even an ERROR(xxx) expression, slows down the session because it causes an unnecessary evaluation of values for every data element in the map.
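As an illustration (hypothetical port name), this is the kind of port-level default value the guideline says to remove, since it forces an evaluation for every row even when the data is clean:

    o_AMOUNT (output)   default value: ERROR('Invalid amount')   -- evaluated per row; remove where possible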

General | Session | Tracing
The override tracing level of a session should be Normal or Terse.

2. Transformation Specific


Transformation | Lookup | Ports
In Lookup transformations, change unused ports to be neither input nor output. This makes the transformation cleaner, and it keeps the generated lookup SQL as small as possible, which cuts down the amount of cache required and thereby improves performance.

Transformation | Lookup | Cache Size
Caching is often faster on very large lookup tables. When your source is large, cache lookup table columns for lookup tables of 500,000 rows or fewer, and only if the standard row byte count is 1,024 or less; this typically improves performance by 10 to 20 percent. Cache a lookup table only if the number of lookup calls is more than 10 to 20 percent of the lookup table's rows.

Transformation | Lookup | Lookup Condition
When using a Lookup transformation, if you include multiple conditions, enter them in the following order to optimize lookup performance: equal to (=), less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=), not equal to (!=).
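For instance, a sketch of a multi-condition lookup on hypothetical ports, entered in the recommended order with the equality condition first and the inequality last:

    ITEM_ID  =  IN_ITEM_ID    -- equal to: first
    END_DATE >  IN_TXN_DATE   -- greater than: next
    EFF_DATE <= IN_TXN_DATE   -- less than or equal to: after the strict comparisons
    STATUS   != IN_STATUS     -- not equal to: last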

Transformation | Lookup | Use Concurrent Caches
You can configure the session to build caches sequentially or concurrently. When you build caches sequentially, the Integration Service creates each cache as the source rows enter the Lookup transformation; if source extraction takes a long time, the Integration Service waits for the first row, which slows performance. When you configure the session to build caches concurrently, the Integration Service does not wait for the first row to enter the Lookup transformation, and if the mapping contains multiple Lookup transformations, all of their caches are built concurrently. Note that if you configure the session to build concurrent caches for an unconnected Lookup transformation, the Integration Service ignores the setting and builds unconnected Lookup transformation caches sequentially.
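This behavior is controlled from the session properties; a sketch of the relevant attribute (the exact label and values are from memory and may vary by PowerCenter version, so treat this as an assumption to verify):

    Additional Concurrent Pipelines for Lookup Cache Creation = Auto
    -- 0 builds lookup caches sequentially; Auto or a value above 0 allows concurrent cache builds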

Transformation | Lookup | Dynamic Caching with Persistent Cache
For large lookup tables, cache the entire table to a persistent file on the first run and enable the Update Else Insert option on the dynamic cache; the engine then never has to go back to the database to read data from the table. You can also partition this persistent cache at run time for further performance gains.
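A sketch of the Lookup transformation properties this practice combines (property labels are from memory and may differ slightly by version):

    Lookup caching enabled   = checked
    Lookup cache persistent  = checked   -- cache file survives across runs
    Dynamic Lookup Cache     = checked
    Update Else Insert       = checked   -- keeps the cache current without re-reading the table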

Transformation | Lookup | Multi Match
On multiple matches, use the "Return any matching value" setting whenever possible, and especially when the lookup is performed only to determine that a match exists and the value returned is irrelevant. With this setting, the lookup creates an index based on the key ports rather than on all Lookup transformation ports; this simplified indexing process can improve performance.

Transformation | Lookup | Comparison
Avoid date comparisons in lookup conditions; replace them with string or integer comparisons, with integer being the most preferred option, as in the sketch below.
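One common way to do this, sketched with hypothetical ports, is to convert the date to a YYYYMMDD integer in an upstream Expression transformation and compare the integers in the lookup condition:

    o_TXN_DATE_KEY (output) = TO_INTEGER(TO_CHAR(IN_TXN_DATE, 'YYYYMMDD'))
    -- the lookup condition then becomes DATE_KEY = o_TXN_DATE_KEY, an integer compare instead of a date compare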

Transformation | Sequence Generator | Cache Value
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously. Also configure the Number of Cached Values property, which determines the number of values the Integration Service caches at one time. Make sure the Number of Cached Values is not too small; consider configuring it to a value greater than 1,000. If you do not have to cache values, set the Number of Cached Values to 0, since Sequence Generator transformations that do not use a cache are faster than those that do.
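A sketch of the corresponding transformation property (the value is hypothetical, following the guidance above):

    Number of Cached Values = 2000   -- a value greater than 1,000; set to 0 only when caching is not needed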

Transformation | Filter | Filter Expressions
Try to compute the filter's true/false answer inside a port expression upstream. Complex filter expressions slow down the mapping; expressions and conditions operate fastest in an Expression transformation with an output port for the result, and the longer or more complex the expression, the more severe the speed degradation. Place the actual expression (complex or not) in an Expression transformation upstream from the Filter, compute a single numeric flag (1 for true, 0 for false) as an output port, and feed that flag into the Filter; this configuration gives the best performance, as in the sketch below.
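A minimal sketch with hypothetical ports; the Expression transformation computes the flag, and the Filter transformation's condition reduces to the flag itself:

    -- Expression transformation (upstream)
    o_KEEP_FLAG (output) = IIF(IN_STATUS = 'ACTIVE' AND IN_AMOUNT > 0, 1, 0)

    -- Filter transformation (downstream), filter condition:
    o_KEEP_FLAG   -- any nonzero value is treated as TRUE, 0 as FALSE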

Transformation | Joiner | Choosing Master Source
The table with the smaller number of rows should be the driving/master table for a faster join.

Transformation | Joiner | Source with Duplicate Keys
Designate the master source as the source with fewer duplicate key values. When the Integration Service processes a sorted Joiner transformation, it caches rows for one hundred unique keys at a time; if the master source contains many rows with the same key value, the Integration Service must cache more rows, and performance can slow.

Transformation | Joiner | Sort Source Data
To improve session performance, configure the Joiner transformation to use sorted input. When you configure the Joiner transformation to use sorted data, the Integration Service improves performance by minimizing disk input and output. You see the greatest performance improvement when you work with large data sets. For an unsorted Joiner transformation, designate the source with fewer rows as the master source.

Transformation | Source Qualifier | Using Source Filter
If the source is a database, use the Source Qualifier to filter the data instead of using a Filter transformation, as in the example below.
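For example, a condition placed in the Source Qualifier's Source Filter property (the table and column are hypothetical; the text becomes part of the WHERE clause of the generated SELECT):

    Source Filter = ORDERS.ORDER_STATUS = 'OPEN'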

Transformation | Aggregator | Grouping and Data
Aggregator transformations often slow performance because they must group data before processing it, and they need additional memory to hold intermediate group results. Use the following guidelines to optimize the performance of an Aggregator transformation (see the sketch after this list):
- Use numbers instead of strings and dates in the columns used for the GROUP BY.
- Pass only the data required for the calculations; if the data needs to be filtered and the filtering can be done before aggregation, filter it there.
- Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.
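A sketch of an Aggregator configured along these lines, with hypothetical ports: the group key is numeric, and only the ports the calculation needs are connected:

    CUST_ID   (input/output, GROUP BY)   -- numeric key instead of a name or date column
    IN_AMOUNT (input)
    o_TOTAL   (output) = SUM(IN_AMOUNT)  -- the only aggregate the downstream logic needs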

Transformation | Aggregator | Use Sorted Input
The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, the Integration Service assumes all data is sorted by group; it reads the rows for a group, performs the aggregate calculations, and stores group information in memory only when necessary. The Sorted Input option reduces the amount of data cached during the session and improves performance. Use this option together with the Source Qualifier's Number of Sorted Ports option or a Sorter transformation to pass sorted data to the Aggregator transformation. You can increase performance further when you use the Sorted Input option in sessions with multiple partitions.
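A sketch of the sorted-input arrangement (hypothetical port names); the upstream sort keys must match the Aggregator's GROUP BY ports, in the same order:

    Sorter transformation:     sort key CUST_ID (ascending)
    Aggregator transformation: Sorted Input = checked; GROUP BY port CUST_ID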

Transformation | Aggregator | Incremental Aggregation
If you can capture from the source only the incremental data, and the changes affect less than half of the target, you can use incremental aggregation to optimize the performance of Aggregator transformations. When you use incremental aggregation, you apply the captured source changes to the aggregate calculations in a session; the Integration Service updates the target incrementally rather than processing the entire source and recalculating the same calculations every time you run the session. You can also increase the index and data cache sizes to hold all the data in memory without paging to disk.
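The feature is enabled at the session level; a sketch of the setting (assuming a change-capture filter already limits the source to new or changed rows, and with the property location given from memory):

    Session Properties > Performance: Incremental Aggregation = checked
    -- the first run builds the aggregate caches; later runs apply only the captured changes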