Complete Reference to Informatica




Introduction

ETL Life Cycle

The typical real-life ETL cycle consists of the following execution steps:

1. Cycle initiation

2. Build reference data

3. Extract (from sources)

4. Validate

5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)

6. Stage (load into staging tables, if used)

7. Audit reports (for example, on compliance with business rules; in case of failure, these also help to diagnose and repair)

8. Publish (to target tables)

9. Archive

10. Clean up

Best practices

Four-layered approach for ETL architecture design

Functional layer: Core functional ETL processing (extract, transform, and load).

Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.

Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.

Utility layer: Common components supporting all other layers.

Use file-based ETL processing where possible

Storage costs relatively little

Intermediate files serve multiple purposes:

Used for testing and debugging

Used for restart and recover processing

Used to calculate control statistics

Helps to reduce dependencies - enables modular programming.

Allows flexibility for job-execution and -scheduling

Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.

Use data-driven methods and minimize custom ETL coding

Parameter-driven jobs, functions, and job-control

Code definitions and mapping in database

Consideration for data-driven tables to support more complex code-mappings and business-rule application.

Qualities of a good ETL architecture design:

Performance

Scalable

Migratable

Recoverable (run_id, ...)

Operable (completion-codes for phases, re-running from checkpoints, etc.)

Auditable (in two dimensions: business requirements and technical troubleshooting)

What is Informatica

Informatica Power Center is a powerful ETL tool from Informatica Corporation.

Informatica Corporation products are:

Informatica Power Center

Informatica on demand

Informatica B2B Data Exchange

Informatica Data Quality

Informatica Data Explorer

Informatica Power Center is a single, unified enterprise data integration platform for accessing, discovering, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise at any speed.

Informatica Power Center Editions:

Because every data integration project is different and includes many variables, such as data volumes, latency requirements, IT infrastructure, and methodologies, Informatica offers three Power Center Editions and a suite of Power Center Options to meet your project’s and organization’s specific needs.


Standard Edition

Real Time Edition

Advanced Edition

Informatica Power Center Standard Edition:

Power Center Standard Edition is a single, unified enterprise data integration platform for discovering, accessing, and integrating data from virtually any business system, in any format, and delivering that data throughout the enterprise to improve operational efficiency.

Key features include:

A high-performance data integration server

A global metadata infrastructure

Visual tools for development and centralized administration

Productivity tools to facilitate collaboration among architects, analysts, and developers.

Informatica Power Center Real Time Edition:

Packaged for simplicity and flexibility, Power Center Real Time Edition extends Power Center Standard Edition with additional capabilities for integrating and provisioning transactional or operational data in real time. Power Center Real Time Edition provides the ideal platform for developing sophisticated data services and delivering timely information as a service, to support all business needs. It provides the perfect real-time data integration complement to service-oriented architectures and application integration approaches such as enterprise application integration (EAI), enterprise service buses (ESB), and business process management (BPM).

Key features include:

Change data capture for relational data sources

Integration with messaging systems

Built-in support for Web services

Dynamic partitioning with data smart parallelism

Process orchestration and human workflow capabilities

Informatica Power Center Advanced Edition:

Power Center Advanced Edition addresses requirements for organizations that are standardizing data integration at an enterprise level, across a number of projects and departments. It combines all the capabilities of Power Center Standard Edition and features additional capabilities that are ideal for data governance and Integration Competency Centers.

Key features include:

Dynamic partitioning with data smart parallelism

Powerful metadata analysis capabilities

Web-based data profiling and reporting capabilities

Power Center includes the following components:

Power Center domain

Administration Console

Power Center repository

Power Center Client

Repository Service

Integration Service

Web Services Hub

SAP BW Service

Data Analyzer

Metadata Manager

Power Center Repository Reports

POWERCENTER CLIENT

The Power Center Client consists of the following applications that we use to manage the repository, design mappings and mapplets, and create sessions to load the data:

1. Designer

2. Data Stencil

3. Repository Manager

4. Workflow Manager

5. Workflow Monitor

1. Designer:

Use the Designer to create mappings that contain transformation instructions for the Integration Service.

The Designer has the following tools that you use to analyze sources, design target schemas, and build source-to-target mappings:


 Source Analyzer: Import or create source definitions.

 Target Designer: Import or create target definitions.

 Transformation Developer: Develop transformations to use in mappings.

You can also develop user-defined functions to use in expressions.

 Mapplet Designer: Create sets of transformations to use in mappings.

 Mapping Designer: Create mappings that the Integration Service uses to Extract, transform, and load data.

2. Data Stencil

Use the Data Stencil to create mapping templates that can be used to generate multiple mappings. Data Stencil uses the Microsoft Office Visio interface to create mapping templates. It is not usually used by developers.

3. Repository Manager

Use the Repository Manager to administer repositories. You can navigate through multiple folders and repositories, and complete the following tasks:

Manage users and groups: Create, edit, and delete repository users and user groups. We can assign and revoke repository privileges and folder permissions.

Perform folder functions: Create, edit, copy, and delete folders. Work we perform in the Designer and Workflow Manager is stored in folders. If we want to share metadata, we can configure a folder to be shared.

View metadata: Analyze sources, targets, mappings, and shortcut dependencies, search by keyword, and view the properties of repository objects. We create repository objects using the Designer and Workflow Manager client tools.

We can view the following objects in the Navigator window of the Repository Manager:

Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.

Target definitions: Definitions of database objects or files that contain the target data.

Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Integration Service uses to transform and move data.

Reusable transformations: Transformations that we use in multiple mappings.

Mapplets: A set of transformations that you use in multiple mappings.

Sessions and workflows: Sessions and workflows store information about how and when the Integration Service moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.

4. Workflow Manager:

Use the Workflow Manager to create, schedule, and run workflows. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data.

The Workflow Manager has the following tools to help us develop a workflow:

Task Developer: Create tasks we want to accomplish in the workflow.

Worklet Designer: Create a worklet in the Worklet Designer. A worklet is an object that groups a set of tasks. A worklet is similar to a workflow, but without scheduling information. We can nest worklets inside a workflow.

Workflow Designer: Create a workflow by connecting tasks with links in the Workflow Designer. You can also create tasks in the Workflow Designer as you develop the workflow.

When we create a workflow in the Workflow Designer, we add tasks to the workflow. The Workflow Manager includes tasks, such as the Session task, the Command task, and the Email task so you can design a workflow. The Session task is based on a mapping we build in the Designer.


We then connect tasks with links to specify the order of execution for the tasks we created. Use conditional links and workflow variables to create branches in the workflow.
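As a small illustration of such link conditions, the following expressions use the predefined task variables available in the Workflow Designer; the session name s_m_load_customers is hypothetical:

$s_m_load_customers.Status = SUCCEEDED

$s_m_load_customers.Status = SUCCEEDED AND $s_m_load_customers.TgtFailedRows = 0

A link carrying the first condition is followed only when the preceding session succeeds; the second also requires that the session wrote no failed rows to the target.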

5. Workflow Monitor

Use the Workflow Monitor to monitor scheduled and running workflows for each Integration Service. We can view details about a workflow or task in Gantt chart view or Task view. We can run, stop, abort, and resume workflows from the Workflow Monitor. We can view session and workflow log events in the Workflow Monitor Log Viewer.

The Workflow Monitor displays workflows that have run at least once. The Workflow Monitor continuously receives information from the Integration Service and Repository Service. It also fetches information from the repository to display historic information.

Services Behind Scene

INTEGRATION SERVICE PROCESS

The Integration Service starts an Integration Service process to run and monitor workflows. The Integration Service process accepts requests from the Power Center Client and from the pmcmd command-line program (see the example after this list). It performs the following tasks:

Manages workflow scheduling.

Locks and reads the workflow.

Reads the parameter file.

Creates the workflow log.

Runs workflow tasks and evaluates the conditional links connecting tasks.

Starts the DTM process or processes to run the session.

Writes historical run information to the repository.

Sends post-session email in the event of a DTM failure.
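As a minimal sketch of a pmcmd request, the following commands start a workflow and query its status; the service, domain, user, folder, and workflow names are hypothetical:

pmcmd startworkflow -sv IS_DEV -d Domain_Dev -u Administrator -p MyPassword -f MyFolder wf_load_customers

pmcmd getworkflowdetails -sv IS_DEV -d Domain_Dev -u Administrator -p MyPassword -f MyFolder wf_load_customers

Both requests are received by the Integration Service process, which then performs the tasks listed above.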

LOAD BALANCER

The Load Balancer is a component of the Integration Service that dispatches tasks to achieve optimal performance and scalability. When we run a workflow, the Load Balancer dispatches the Session, Command, and predefined Event-Wait tasks within the workflow.

The Load Balancer dispatches tasks in the order it receives them. When the Load Balancer needs to dispatch more Session and Command tasks than the Integration Service can run, it places the tasks it cannot run in a queue. When nodes become available, the Load Balancer dispatches tasks from the queue in the order determined by the workflow service level.

DTM PROCESS

When the workflow reaches a session, the Integration Service process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

Retrieves and validates session information from the repository.

Performs pushdown optimization when the session is configured for pushdown optimization.

Adds partitions to the session when the session is configured for dynamic partitioning.

Expands the service process variables, session parameters, and mapping variables and parameters.

Creates the session log.

Validates source and target code pages.

Verifies connection object permissions.

Runs pre-session shell commands, stored procedures, and SQL.

Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.

Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.

Runs post-session stored procedures, SQL, and shell commands.

Sends post-session email.

PROCESSING THREADS

The DTM allocates process memory for the session and divides it into buffers. This is also known as buffer memory. The default memory allocation is 12,000,000 bytes.

The DTM uses multiple threads to process data in a session. The main DTM thread is called the master thread.

The master thread can create the following types of threads:

 Mapping Threads: One mapping thread for each session.

 Pre- and Post-Session Threads: One thread created.

 Reader Threads: One thread for each partition

 Transformation Threads: One thread for each partition

 Writer Threads: One thread for each partition

CODE PAGES and DATA MOVEMENT

A code page contains the encoding to specify characters in a set of one or more languages. An encoding is the assignment of a number to a character in the character set.

The Integration Service can move data in either ASCII or Unicode data movement mode. These modes determine how the Integration Service handles character data.


We choose the data movement mode in the Integration Service configuration settings. If we want to move multi byte data, choose Unicode data movement mode.

ASCII Data Movement Mode: In ASCII mode, the Integration Service recognizes 7-bit ASCII and EBCDIC characters and stores each character in a single byte.

Unicode Data Movement Mode: Use Unicode data movement mode when sources or targets use 8-bit or multi byte character sets and contain character data.

Try Your Hand at the Admin Console

Repository Manager Tasks:

Add domain connection information

Add and connect to a repository

Work with Power Center domain and repository connections

Search for repository objects or keywords

View object dependencies

Compare repository objects

Truncate session and workflow log entries

View user connections

Release locks

Exchange metadata with other business intelligence tools

Add a repository to the Navigator, and then configure the domain connection information when we connect to the repository.

1. Adding a Repository to the Navigator:

1. In any of the Power Center Client tools, click Repository > Add.

2. Enter the name of the repository and a valid repository user name.

3. Click OK.

Before we can connect to the repository for the first time, we must configure the Connection information for the domain that the repository belongs to.

2. Configuring a Domain Connection

1. In a Power Center Client tool, select the Repositories node in the Navigator.

2. Click Repository > Configure Domains to open the Configure Domains dialog box.

3. Click the Add button. The Add Domain dialog box appears.

4. Enter the domain name, gateway host name, and gateway port number.

5. Click OK to add the domain connection.

3. Connecting to a Repository

1. Launch a Power Center Client tool.

2. Select the repository in the Navigator and click Repository > Connect, or double-click the repository.

3. Enter a valid repository user name and password.

4. Click Connect.

Click the More button to add, change, or view domain information.


4. Viewing Object Dependencies

Before we change or delete repository objects, we can view dependencies to see the impact on other objects. For example, before we remove a session, we can find out which workflows use the session. We can view dependencies for repository objects in the Repository Manager, Workflow Manager, and Designer tools.

Steps:

1. Connect to the repository.

2. Select the object of interest in the Navigator.

3. Click Analyze and select the dependency we want to view.

5. Validating Multiple Objects

We can validate multiple objects in the repository without fetching them into the workspace. We can save and optionally check in objects that change from invalid to valid status as a result of the validation. We can validate sessions, mappings, mapplets, workflows, and worklets.

Steps:

1. Select the objects you want to validate.

2. Click Analyze and select Validate.

3. Select validation options from the Validate Objects dialog box.

4. Click Validate.

5. Click a link to view the objects in the results group.

6. Comparing Repository Objects

We can compare two repository objects of the same type to identify differences between the objects. For example, we can compare two sessions to check for differences. When we compare two objects, the Repository Manager displays their attributes.

Steps:

1. In the Repository Manager, connect to the repository.

2. In the Navigator, select the object you want to compare.

3. Click Edit > Compare Objects.

4. Click Compare in the dialog box displayed.

7. Truncating Workflow and Session Log Entries

When we configure a session or workflow to archive session logs or workflow logs, the Integration Service saves those logs in local directories. The repository also creates an entry for each saved workflow log and session log. If we move or delete a session log or workflow log from the workflow log directory or session log directory, we can remove the entries from the repository.

Steps:

1. In the Repository Manager, select the workflow in the Navigator window or in the Main window.

2. Choose Edit > Truncate Log. The Truncate Workflow Log dialog box appears.

3. Choose to delete all workflow and session log entries or to delete all workflow and session log entries with an end time before a particular date.

4. If you want to delete all entries older than a certain date, enter the date and time.

5. Click OK.

8. Managing User Connections and Locks

In the Repository Manager, we can view and manage the following items:

Repository object locks: The repository locks repository objects and folders by user. The repository creates different types of locks depending on the task. The Repository Service locks and unlocks all objects in the repository.

User connections: Use the Repository Manager to monitor user connections to the repository. We can end connections when necessary.

Types of locks created:

1. In-use lock: Placed on objects we want to view

2. Write-intent lock: Placed on objects we want to modify.

3. Execute lock: Locks objects we want to run, such as workflows and sessions

Steps:

1. Launch the Repository Manager and connect to the repository.

2. Click Edit > Show User Connections or Show Locks.

3. The locks or user connections are displayed in a window.

4. Manage the connections or locks as needed.

9. Managing Users and Groups

1. In the Repository Manager, connect to a repository.

2. Click Security > Manage Users and Privileges.


3. Click the Groups tab to create groups.

4. Click the Users tab to create users.

5. Click the Privileges tab to give permissions to groups and users.

6. Select the options available to add, edit, and remove users and groups.

There are two default repository user groups:

Administrators: This group initially contains two users that are created by default. The default users are Administrator and the database user that created the repository. We cannot delete these users from the repository or remove them from the Administrators group.

Public: The Repository Manager does not create any default users in the Public group.

10. Working with Folders

We can create, edit, or delete folders as needed.

1. In the Repository Manager, connect to a repository.

2. Click Folder > Create.

Enter the folder name and any other required information.

3. Click OK.

Difference Between 7.1 and 8.6

1. Target from Transformation: In Informatica 8.x we can create a target from a transformation by dragging the transformation into the Target Designer.

2. Pushdown optimization: Improves performance by pushing transformation logic to the database. The Integration Service analyzes the transformations and issues SQL statements to sources and targets, and processes only the transformation logic that it cannot push to the database.

3. New functions in the expression editor: New functions such as REG_EXTRACT and REG_MATCH have been introduced in Informatica 8.x (see the sample expressions after this list).

4. Repository queries are available in both versioned and non-versioned repositories; previously they were available only for versioned repositories.

5. UDFs (user-defined functions), similar to macros in Excel.

6. FTP: We can have partitioned FTP targets and Indirect FTP File source (with file list).

7. Propagating Port Descriptions: In Informatica 8 we can edit a port description and propagate the description to other transformations in the mapping.

8. Environment SQL Enhancements: Environment SQL can still be used to execute an SQL statement at the start of a connection to the database. We can use SQL commands that depend upon a transaction being open during the entire read or write process. For example, the following SQL command modifies how the session handles characters:

ALTER SESSION SET NLS_DATE_FORMAT='DD/MM/YYYY';

9. Concurrently write to multiple files in a session with partitioned targets.

10. Flat File Enhancements:

Reduced conversion of data types

Delimited file performance has improved

Flat files can now have integer and double data types

Data can be appended to existing flat files
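Referring back to item 3 above, here is a hedged sketch of the new regular-expression functions; the PHONE port and the pattern are only illustrative:

REG_MATCH(PHONE, '[0-9]{3}-[0-9]{3}-[0-9]{4}')

REG_EXTRACT(PHONE, '([0-9]{3})-([0-9]{3})-([0-9]{4})', 1)

REG_MATCH returns TRUE when the port value matches the pattern, and REG_EXTRACT returns the requested subpattern (here the first group, the area code).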

Informatica Power Center 8 has the following features, which make it more powerful and easier to use and manage compared to previous versions.

Supports service-oriented architecture

Access to structured, unstructured, and semi-structured data

Support for grid computing

High availability


Pushdown optimization

Dynamic partitioning

Metadata exchange enhancements

Team based Development

Global Web-based Admin console

New transformations

23 New functions

User defined functions

Custom transformation enhancements

Flat file enhancements

New Data Federation option

Enterprise GRID

Testing

Unit Testing

Unit testing can be broadly classified into 2 categories.

Quantitative Testing

Validate your Source and Target

a) Ensure that your connectors are configured properly.

b) If you are using flat files, make sure you have enough read/write permission on the file share.

c) You need to document all the connector information.

Analyze the Load Time

a) Execute the session and review the session statistics.

b) Check the Read and Write counters and how long it takes to perform the load.

c) Use the session and workflow logs to capture the load statistics.

d) You need to document all the load timing information.

Analyze the success rows and rejections.

a) Use customized SQL queries to check the sources and targets; this is where we perform the record count verification.

b) Analyze the rejections and build a process to handle those rejections. This requires a clear business requirement from the business on how to handle the data rejections. Do we need to reload, or reject and inform? Discussions are required and an appropriate process must be developed.

Performance Improvement

a) Network Performance

b) Session Performance

c) Database Performance

d) Analyze and if required define the Informatica and DB partitioning requirements.

Qualitative Testing

Analyze and validate your transformation business rules. This is more of a functional test.

e) You need to review field by field from source to target and ensure that the required transformation logic is applied.

f) If you are making changes to existing mappings, make use of the data lineage feature available with Informatica Power Center. This will help you to find the consequences of altering or deleting a port from an existing mapping.

g) Ensure that appropriate dimension lookups have been used and your development is in sync with your business requirements.

Integration Testing

After unit testing is complete, it should form the basis for starting integration testing. Integration testing should

Test out initial and incremental loading of the data warehouse.

Integration testing will involve the following:

1. Sequence of ETL jobs in batch.

2. Initial loading of records on data warehouse.

3. Incremental loading of records at a later date to verify the newly inserted or updated data.

4. Testing the rejected records that don’t fulfill transformation rules.

5. Error log generation.

Integration Testing would cover end-to-end testing for the DWH. The coverage of the tests would include the following:

Count Validation

Record Count Verification: DWH backend/Reporting queries against source and target as an initial check.
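For example, a minimal pair of verification queries that can be compared between source and target; the table and column names are hypothetical:

SELECT COUNT(*) AS row_cnt, SUM(ORDER_AMT) AS control_total FROM SRC_ORDERS;

SELECT COUNT(*) AS row_cnt, SUM(ORDER_AMT) AS control_total FROM TGT_ORDERS;

Matching row counts and control totals is the initial check; a mismatch points to rejected, dropped, or duplicated rows.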


Control totals: To ensure accuracy in data entry and processing, control totals can be compared by the system with manually entered or otherwise calculated control totals using data fields such as quantities, line items, documents, or dollars, or simple record counts.

Hash totals: This is a technique for improving data accuracy, whereby totals are obtained on identifier fields (i.e., fields for which it would logically be meaningless to construct a total), such as account number, social security number, part number, or employee number. These totals have no significance other than for internal system control purposes.

Limit checks: The program tests specified data fields against defined high or low value limits (e.g., quantities or dollars) for acceptability before further processing.

Dimensional Analysis

Data integrity between the various source tables and relationships.

Statistical Analysis

Validation for various calculations.

When you validate the calculations, you do not need to load all the rows into the target and validate them.

Instead, you use the Enable Test Load feature available in Informatica Power Center.

The relevant session properties are described below.

Enable Test Load: You can configure the Integration Service to perform a test load. With a test load, the Integration Service reads and transforms data without writing to targets. The Integration Service generates all session files and performs all pre- and post-session functions, as if running the full session. The Integration Service writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Integration Service does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources. You can perform a test load for relational targets when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.

Number of Rows to Test: Enter the number of source rows you want the Integration Service to test load. The Integration Service reads the number you configure for the test load.

Data Quality Validation

Check for missing data, negatives and consistency. Field-by-Field data verification can be done to check the consistency of source and target data.

Overflow checks: This is a limit check based on the capacity of a data field or data file area to accept data. This programming technique can be used to detect the truncation of a financial or quantity data field value after computation (e.g., addition, multiplication, and division). Usually, the first digit is the one lost.

Format checks: These are used to determine that data are entered in the proper mode, as numeric or alphabetical characters, within designated fields of information. The proper mode in each case depends on the data field definition.

Sign test: This is a test for a numeric data field containing a designation of an algebraic sign, + or - , which can be used to denote, for example, debits or credits for financial data fields.

Size test: This test can be used to test the full size of the data field. For example, a social security number in the United States should have nine digits.
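The checks above can be expressed as simple SQL probes against the target; the tables, columns, and limits below are hypothetical examples:

-- Limit check: quantities outside the accepted range
SELECT * FROM TGT_ORDERS WHERE QUANTITY < 0 OR QUANTITY > 10000;

-- Size test: a US social security number should have nine digits
SELECT * FROM TGT_EMPLOYEES WHERE LENGTH(SSN) <> 9;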

Granularity

Validate at the lowest granular level possible

Other validations

Audit Trails, Transaction Logs, Error Logs and Validity checks.

Note: Based on your project and business needs you might have additional testing requirements.

User Acceptance Test

In this phase you will involve the user to test the end results and ensure that business is satisfied with the quality of the data.

Any changes to the business requirement will follow the change management process and eventually those changes have to follow the SDLC process.

Optimize Development, Testing, and Training Systems

Dramatically accelerate development and test cycles and reduce storage costs by creating fully functional, smaller targeted data subsets for development, testing, and training systems, while maintaining full data integrity.

Quickly build and update nonproduction systems with a small subset of production data and replicate current subsets of nonproduction copies faster.

Simplify test data management and shrink the footprint of nonproduction systems to significantly reduce IT infrastructure and maintenance costs.

Reduce application and upgrade deployment risks by properly testing configuration updates with up-to-date, realistic data before introducing them into production.

Easily customize provisioning rules to meet each organization’s changing business requirements.


Lower training costs by standardizing on one approach and one infrastructure.

Train employees effectively using reliable, production-like data in training systems.

Support Corporate Divestitures and Reorganizations

Untangle complex operational systems and separate data along business lines to quickly build the divested organization’s system.

Accelerate the provisioning of new systems by using only data that’s relevant to the divested organization.

Decrease the cost and time of data divestiture with no reimplementation costs.

Reduce the Total Cost of Storage Ownership

Dramatically increase an IT team’s productivity by reusing a comprehensive list of data objects for data selection and updating processes across multiple projects, instead of coding by hand, which is expensive, resource intensive, and time consuming.

Accelerate application delivery by decreasing R&D cycle time and streamlining test data management.

Improve the reliability of application delivery by ensuring IT teams have ready access to updated quality production data.

Lower administration costs by centrally managing data growth solutions across all packaged and custom applications.

Substantially accelerate time to value for subsets of packaged applications.

Decrease maintenance costs by eliminating custom code and scripting.

Informatica Power Center Testing

Debugger: A very useful tool for debugging a valid mapping to gain troubleshooting information about data and error conditions. Refer to the Informatica documentation to learn more about the Debugger tool.

Test Load Options – Relational Targets.

Running the Integration Service in Safe Mode

Test a development environment. Run the Integration Service in safe mode to test a development environment before migrating to production.

Troubleshoot the Integration Service. Configure the Integration Service to fail over in safe mode and troubleshoot errors when you migrate or test a production environment configured for high availability. After the Integration Service fails over in safe mode, you can correct the error that caused the Integration Service to fail over.

Syntax Testing: Test your customized queries using your Source Qualifier before executing the session.

Performance Testing: Identify the following bottlenecks:

Target

Source

Mapping

Session

System

Use the following methods to identify performance bottlenecks:

Run test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Analyze performance details. Analyze performance details, such as performance counters, to determine where session performance decreases.

Analyze thread statistics. Analyze thread statistics to determine the optimal number of partition points.

Monitor system performance. You can use system monitoring tools to view the percentage of CPU use, I/O waits, and paging to identify system bottlenecks. You can also use the Workflow Monitor to view system resource usage.

Use the Power Center conditional filter in the Source Qualifier to improve performance.

Share metadata. You can share metadata with a third party. For example, you want to send a mapping to someone else for testing or analysis, but you do not want to disclose repository connection information for security reasons. You can export the mapping to an XML file and edit the repository connection information before sending the XML file. The third party can import the mapping from the XML file and analyze the metadata.

Debugger

You can debug a valid mapping to gain troubleshooting information about data and error conditions. To debug a mapping, you configure and run the Debugger from within the Mapping Designer. The Debugger uses a session to run the mapping on the Integration Service. When you run the Debugger, it pauses at breakpoints and you can view and edit transformation output data.

You might want to run the Debugger in the following situations:

Before you run a session. After you save a mapping, you can run some initial tests with a debug session before you create and configure a session in the Workflow Manager.

After you run a session. If a session fails or if you receive unexpected results in the target, you can run the Debugger against the session. You might also want to run the Debugger against a session if you want to debug the mapping using the configured session properties.


Debugger Session Types:

You can select three different debugger session types when you configure the Debugger. The Debugger runs a workflow for each session type. You can choose from the following Debugger session types when you configure the Debugger:

Use an existing non-reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs the non-reusable session and the existing workflow. The Debugger does not suspend on error.

Use an existing reusable session. The Debugger uses existing source, target, and session configuration properties. When you run the Debugger, the Integration Service runs a debug instance of the reusable session and creates and runs a debug workflow for the session.

Create a debug session instance. You can configure source, target, and session configuration properties through the Debugger Wizard. When you run the Debugger, the Integration Service creates and runs a debug workflow for the debug session.

Debug Process

To debug a mapping, complete the following steps:

1. Create breakpoints. Create breakpoints in a mapping where you want the Integration Service to evaluate data and error conditions.

2. Configure the Debugger. Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the Integration Service uses when it runs the Debugger. When you create a debug session, you configure a subset of session properties within the Debugger Wizard, such as source and target location. You can also choose to load or discard target data.

3. Run the Debugger. Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer connects to the Integration Service. The Integration Service initializes the Debugger and runs the debugging session and workflow. The Integration Service reads the breakpoints and pauses the Debugger when the breakpoints evaluate to true.

4. Monitor the Debugger. While you run the Debugger, you can monitor the target data, transformation and mapplet output data, the debug log, and the session log. When you run the Debugger, the Designer displays the following windows:

Debug log. View messages from the Debugger.

Target window. View target data.

Instance window. View transformation data.


5. Modify data and breakpoints. When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and targets as the data moves through the pipeline. You can also modify breakpoint information.

The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy breakpoint information and the Debugger configuration to another mapping. If you want to run the Debugger from another Power Center Client machine, you can copy the breakpoint information and the Debugger configuration to the other Power Center Client machine.

Running the Debugger:

When you complete the Debugger Wizard, the Integration Service starts the session and initializes the Debugger. After initialization, the Debugger moves in and out of running and paused states based on breakpoints and commands that you issue from the Mapping Designer. The Debugger can be in one of the following states:

 Initializing. The Designer connects to the Integration Service.

 Running. The Integration Service processes the data.

 Paused. The Integration Service encounters a break and pauses the Debugger.

Note: To enable multiple users to debug the same mapping at the same time, each user must configure different port numbers in the Tools > Options > Debug tab.

The Debugger does not use the high availability functionality.

Monitoring the Debugger :

When you run the Debugger, you can monitor the following information:

 Session status. Monitor the status of the session.

 Data movement. Monitor data as it moves through transformations.


 Breakpoints. Monitor data that meets breakpoint conditions.

 Target data. Monitor target data on a row-by-row basis.

The Mapping Designer displays windows and debug indicators that help you monitor the session:

Debug indicators. Debug indicators on transformations help you follow breakpoints and data flow.

Instance window. When the Debugger pauses, you can view transformation data and row information in the Instance window.

Target window. View target data for each target in the mapping.

Output window. The Integration Service writes messages to the following tabs in the Output window:

Debugger tab. The debug log displays in the Debugger tab.

Session Log tab. The session log displays in the Session Log tab.

Notifications tab. Displays messages from the Repository Service.

While you monitor the Debugger, you might want to change the transformation output data to see the effect on subsequent transformations or targets in the data flow. You might also want to edit or add more breakpoint information to monitor the session more closely.

Restrictions

You cannot change data for the following output ports:

Normalizer transformation. Generated Keys and Generated Column ID ports.

 Rank transformation. RANKINDEX port.

 Router transformation. All output ports.

 Sequence Generator transformation. CURRVAL and NEXTVAL ports.

 Lookup transformation. NewLookupRow port for a Lookup transformation configured to use a dynamic cache.

 Custom transformation. Ports in output groups other than the current output group.

 Java transformation. Ports in output groups other than the current output group.

Additionally, you cannot change data associated with the following:

Mapplets that are not selected for debugging

Input or input/output ports

Output ports when the Debugger pauses on an error breakpoint

Constraint-Based Loading:

In the Workflow Manager, you can specify constraint-based loading for a session. When you select this option, the Integration Service orders the target load on a row-by-row basis. For every row generated by an active source, the Integration Service loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

Active source. Related target tables must have the same active source.

Key relationships. Target tables must have key relationships.

Target connection groups. Targets must be in one target connection group.

Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint based loading.

Active Source:

When target tables receive rows from different active sources, the Integration Service reverts to normal loading for those tables, but loads all other targets in the session using constraint-based loading when possible. For example, a mapping contains three distinct pipelines. The first two contain a source, source qualifier, and target. Since these two targets receive data from different active sources, the Integration Service reverts to normal loading for both targets. The third pipeline contains a source, Normalizer, and two targets. Since these two targets share a single active source (the Normalizer), the Integration Service performs constraint-based loading: loading the primary key table first, then the foreign key table.

Key Relationships:

When target tables have no key relationships, the Integration Service does not perform constraint-based loading.

Similarly, when target tables have circular key relationships, the Integration Service reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Integration Service cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Groups:

The Integration Service enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary key-foreign key relationship are in different target connection groups, the Integration Service cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, complete the following tasks:

Verify all targets are in the same target load order group and receive data from the same active source.

Use the default partition properties and do not add partitions or partition points.

Define the same target type for all targets in the session properties.


Define the same database connection name for all targets in the session properties.

Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert:

Use constraint-based loading when the session option Treat Source Rows As is set to insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.

When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping using one of the following options:

Load primary key table in one mapping and dependent tables in another mapping. Use constraint-based loading to load the primary table.

Perform inserts in one mapping and updates in another mapping.

Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Integration Service reads the sources in each target load order group in the mapping. A target load order group is a collection of source qualifiers, transformations, and targets linked together in a mapping. Constraint based loading establishes the order in which the Integration Service loads individual targets within a set of targets receiving data from a single source qualifier.

Example

Consider a mapping configured to perform constraint-based loading:

In the first pipeline, target T_1 has a primary key; T_2 and T_3 contain foreign keys referencing the T_1 primary key. T_3 has a primary key that T_4 references as a foreign key.

Since these tables receive records from a single active source, SQ_A, the Integration Service loads rows to the target in the following order:

1. T_1

2. T_2 and T_3 (in no particular order)

3. T_4

The Integration Service loads T_1 first because it has no foreign key dependencies and contains a primary key referenced by T_2 and T_3. The Integration Service then loads T_2 and T_3, but since T_2 and T_3 have no dependencies on each other, they are not loaded in any particular order. The Integration Service loads T_4 last, because it has a foreign key that references a primary key in T_3. After loading the first set of targets, the Integration Service begins reading source B. If there are no key relationships between T_5 and T_6, the Integration Service reverts to a normal load for both targets.

If T_6 has a foreign key that references a primary key in T_5, then since T_5 and T_6 receive data from a single active source (the Aggregator AGGTRANS), the Integration Service loads rows to the tables in the following order:

T_5

T_6

T_1, T_2, T_3, and T_4 are in one target connection group if you use the same database connection for each target, and you use the default partition properties. T_5 and T_6 are in another target connection group together if you use the same database connection for each target and you use the default partition properties. The Integration Service includes T_5 and T_6 in a different target connection group because they are in a different target load order group from the first four targets.
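To make the key relationships in the first pipeline concrete, here is a hypothetical DDL sketch of T_1 through T_4; the column names are invented for illustration:

CREATE TABLE T_1 (T1_ID NUMBER PRIMARY KEY);
CREATE TABLE T_2 (T2_ID NUMBER PRIMARY KEY, T1_ID NUMBER REFERENCES T_1 (T1_ID));
CREATE TABLE T_3 (T3_ID NUMBER PRIMARY KEY, T1_ID NUMBER REFERENCES T_1 (T1_ID));
CREATE TABLE T_4 (T4_ID NUMBER PRIMARY KEY, T3_ID NUMBER REFERENCES T_3 (T3_ID));

With these constraints, loading T_1 first, then T_2 and T_3, then T_4 is the only row-by-row order that never violates a foreign key, which is exactly the order constraint-based loading enforces.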

Enabling Constraint-Based Loading:

When you enable constraint-based loading, the Integration Service orders the target load on a row-by-row basis. To enable constraint-based loading:

1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.

2. Click the Config Object tab. In the Advanced settings, select Constraint Based Load Ordering.

3. Click OK.

Target Load Order

When you use a mapplet in a mapping, the Mapping Designer lets you set the target load plan for sources within the mapplet.


Setting the Target Load Order

You can configure the target load order for a mapping containing any type of target definition. In the Designer, you can set the order in which the Integration Service sends rows to targets in different target load order groups in a mapping. A target load order group is the collection of source qualifiers, transformations, and targets linked together in a mapping. You can set the target load order if you want to maintain referential integrity when inserting, deleting, or updating tables that have the primary key and foreign key constraints.

The Integration Service reads sources in a target load order group concurrently, and it processes target load order groups sequentially.

To specify the order in which the Integration Service sends data to targets, create one source qualifier for each target within a mapping. To set the target load order, you then determine in which order the Integration Service reads each source in the mapping.

Consider a mapping with two target load order groups:

In this mapping, the first target load order group includes ITEMS, SQ_ITEMS, and T_ITEMS. The second target load order group includes all other objects in the mapping, including the TOTAL_ORDERS target. The Integration Service processes the first target load order group, and then the second target load order group.

When it processes the second target load order group, it reads data from both sources at the same time.

To set the target load order:

1. Create a mapping that contains multiple target load order groups.

2. Click Mappings > Target Load Plan.

3. The Target Load Plan dialog box lists all Source Qualifier transformations in the mapping and the targets that receive data from each source qualifier.

4. Select a source qualifier from the list.

5. Click the Up and Down buttons to move the source qualifier within the load order.

6. Repeat steps 3 to 4 for other source qualifiers you want to reorder. Click OK.

Advanced Concepts

MAPPING PARAMETERS & VARIABLES

Mapping parameters and variables represent values in mappings and mapplets.

When we use a mapping parameter or variable in a mapping, first we declare the mapping parameter or variable for use in each mapplet or mapping. Then, we define a value for the mapping parameter or variable before we run the session.

MAPPING PARAMETERS

A mapping parameter represents a constant value that we can define before running a session.

A mapping parameter retains the same value throughout the entire session.

Example: When we want to extract records of a particular month during the ETL process, we will create a mapping parameter and use it in the query to compare it with the timestamp field in the SQL override.
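A hedged sketch of such a SQL override; $$LoadMonth is a hypothetical mapping parameter, and EMP and HIREDATE follow the sample source used later in this document:

SELECT EMPNO, ENAME, SAL, HIREDATE
FROM EMP
WHERE TO_CHAR(HIREDATE, 'YYYY-MM') = '$$LoadMonth'

The parameter is expanded before the query runs, so setting $$LoadMonth=2004-08 in the parameter file restricts the extract to August 2004.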

After we create a parameter, it appears in the Expression Editor.

We can then use the parameter in any expression in the mapplet or mapping.

We can also use parameters in a source qualifier filter, user-defined join, or extract override, and in the Expression Editor of reusable transformations.

MAPPING VARIABLES

Unlike mapping parameters, mapping variables are values that can change between sessions.

The Integration Service saves the latest value of a mapping variable to the repository at the end of each successful session.

We can override a saved value with the parameter file.

We can also clear all saved values for the session in the Workflow Manager.

We might use a mapping variable to perform an incremental read of the source. For example, we have a source table containing time stamped transactions and we want to evaluate the transactions on a daily basis. Instead of manually entering a session override to filter source data each time we run the session, we can create a mapping variable, $$IncludeDateTime. In the source qualifier, create a filter to read only rows whose transaction date equals $$IncludeDateTime, such as:

TIMESTAMP = $$IncludeDateTime


In the mapping, use a variable function to set the variable value to increment one day each time the session runs. If we set the initial value of $$IncludeDateTime to 8/1/2004, the first time the Integration Service runs the session, it reads only rows dated 8/1/2004. During the session, the Integration Service sets $$IncludeDateTime to 8/2/2004. It saves 8/2/2004 to the repository at the end of the session. The next time it runs the session, it reads only rows from August 2, 2004.
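A minimal sketch of the variable function described above, placed in a port of an Expression transformation (assuming $$IncludeDateTime is declared as a date/time mapping variable):

SETVARIABLE($$IncludeDateTime, ADD_TO_DATE($$IncludeDateTime, 'DD', 1))

At the end of a successful session the Integration Service saves the new value to the repository, so the next run’s filter TIMESTAMP = $$IncludeDateTime picks up the following day.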

Used in following transformations:

Expression

Filter

Router

Update Strategy

Initial and Default Value:

When we declare a mapping parameter or variable in a mapping or a mapplet, we can enter an initial value. When the Integration Service needs an initial value, and we did not declare an initial value for the parameter or variable, the Integration Service uses a default value based on the data type of the parameter or variable.

Data Type -> Default Value

Numeric -> 0

String -> Empty string

Datetime -> 1/1/1

Variable Values: Start value and current value of a mapping variable

Start Value:

The start value is the value of the variable at the start of the session. The Integration Service looks for the start value in the following order:

1. Value in parameter file

2. Value saved in the repository

3. Initial value

4. Default value

Current Value:

The current value is the value of the variable as the session progresses. When a session starts, the current value of a variable is the same as the start value. The final current value for a variable is saved to the repository at the end of a successful session. When a session fails to complete, the Integration Service does not update the value of the variable in the repository.

Note: If a variable function is not used to calculate the current value of a mapping variable, the start value of the variable is saved to the repository.

Variable Data Type and Aggregation Type

When we declare a mapping variable in a mapping, we need to configure the data type and aggregation type for the variable. The Integration Service uses the aggregation type of a mapping variable to determine the final current value of the mapping variable.

Aggregation types are:

 Count: Integer and small integer data types are valid only.

 Max: All transformation data types except binary data type are valid.

 Min: All transformation data types except binary data type are valid.

Variable Functions

Variable functions determine how the Integration Service calculates the current value of a mapping variable in a pipeline.

SetMaxVariable: Sets the variable to the maximum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Max.

SetMinVariable: Sets the variable to the minimum value of a group of values. It ignores rows marked for update, delete, or reject. Aggregation type set to Min.

SetCountVariable: Increments the variable value by one. It adds one to the variable value when a row is marked for insertion, and subtracts one when the row is marked for deletion. It ignores rows marked for update or reject. Aggregation type set to Count.

SetVariable: Sets the variable to the configured value. At the end of a session, it compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final value to the repository.

Creating Mapping Parameters and Variables

1. Open the folder where we want to create parameter or variable.

2. In the Mapping Designer, click Mappings > Parameters and Variables. -or- In the Mapplet Designer, click Mapplet > Parameters and Variables.

3. Click the add button.

4. Enter name. Do not remove $$ from name.

5. Select Type and Data type. Select Aggregation type for mapping variables.

6. Give Initial Value. Click ok.


Example: Use of Mapping Parameters and Variables

EMP will be the source table.

Create a target table MP_MV_EXAMPLE having columns: EMPNO, ENAME, DEPTNO, TOTAL_SAL, MAX_VAR, MIN_VAR, COUNT_VAR and SET_VAR.

TOTAL_SAL = SAL + COMM + $$BONUS ($$BONUS is a mapping parameter that changes every month)

SET_VAR: We will add one month to the HIREDATE of every employee.

Create shortcuts as necessary.

Creating Mapping

1. Open folder where we want to create the mapping.

2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_mp_mv_example

4. Drag EMP and target table.

5. Transformation -> Create -> Select Expression for list -> Create –>  Done.

6. Drag EMPNO, ENAME, HIREDATE, SAL, COMM and DEPTNO to Expression.

7. Create Parameter $$Bonus and Give initial value as 200.

8. Create variable $$var_max of MAX aggregation type and initial value 1500.

9. Create variable $$var_min of MIN aggregation type and initial value 1500.

10. Create variable $$var_count of COUNT aggregation type and initial value 0. COUNT is visible when datatype is INT or SMALLINT.

11. Create variable $$var_set of MAX aggregation type.

12. Create 5 output ports: out_TOTAL_SAL, out_MAX_VAR, out_MIN_VAR, out_COUNT_VAR and out_SET_VAR.

13. Open the expression editor for out_TOTAL_SAL and build SAL + COMM as we did earlier. To add $$BONUS to it, select the variable tab and pick the parameter from the mapping parameters: SAL + COMM + $$Bonus

14. Open Expression editor for out_max_var.

15. Select the variable function SETMAXVARIABLE from the left side pane. Select $$var_max from the variable tab and SAL from the ports tab: SETMAXVARIABLE($$var_max,SAL). Validate the expression.

17. Open Expression editor for out_min_var and write the following expression:

SETMINVARIABLE($$var_min,SAL). Validate the expression.

18. Open Expression editor for out_count_var and write the following expression:

SETCOUNTVARIABLE($$var_count). Validate the expression.


19. Open Expression editor for out_set_var and write the following expression:

SETVARIABLE($$var_set,ADD_TO_DATE(HIREDATE,'MM',1)). Validate.

20. Click OK. Expression Transformation below:

21. Link all ports from expression to target and Validate Mapping and Save it.

22. See mapping picture on next page.

PARAMETER FILE

A parameter file is a list of parameters and associated values for a workflow, worklet, or session.

Parameter files provide flexibility to change these variables each time we run a workflow or session.

We can create multiple parameter files and change the file we use for a session or workflow. We can create a parameter file using a text editor such as WordPad or Notepad.

Enter the parameter file name and directory in the workflow or session properties.

A parameter file contains the following types of parameters and variables:

Workflow variable: References values and records information in a workflow.

Worklet variable: References values and records information in a worklet. Use predefined worklet variables in a parent workflow, but we cannot use workflow variables from the parent workflow in a worklet.

Session parameter: Defines a value that can change from session to session, such as a database connection or file name.

Mapping parameter and Mapping variable

USING A PARAMETER FILE

Parameter files contain several sections preceded by a heading. The heading identifies the Integration Service, Integration Service process, workflow, worklet, or session to which we want to assign parameters or variables.
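For example, section headings commonly take forms like the following; the folder, workflow, and session names here are only placeholders:

[Global]

[MyFolder.WF:wf_MyWorkflow]

[MyFolder.WF:wf_MyWorkflow.ST:s_MySession]

Parameters and variables listed under a heading apply to that object.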

Make session and workflow.

Give connection information for source and target table.

Run workflow and see result.


Sample Parameter File for Our example:

In the parameter file, folder and session names are case sensitive.

Create a text file in notepad with name Para_File.txt

[Practice.ST:s_m_MP_MV_Example]

$$Bonus=1000

$$var_max=500

$$var_min=1200

$$var_count=0

CONFIGURING PARAMETER FILE

We can specify the parameter file name and directory in the workflow or session properties.

To enter a parameter file in the workflow properties:

1. Open a Workflow in the Workflow Manager.

2. Click Workflows > Edit.

3. Click the Properties tab.

4. Enter the parameter directory and name in the Parameter Filename field.

5. Click OK.

To enter a parameter file in the session properties:

1. Open a session in the Workflow Manager.

2. Click the Properties tab and open the General Options settings.

3. Enter the parameter directory and name in the Parameter Filename field.

4. Example: D:\Files\Para_File.txt or $PMSourceFileDir\Para_File.txt

5. Click OK.

MAPPLETS

A mapplet is a reusable object that we create in the Mapplet Designer.

It contains a set of transformations and lets us reuse that transformation logic in multiple mappings.

Created in Mapplet Designer in Designer Tool.

Say we need to use the same set of 5 transformations in 10 mappings. Instead of building the 5 transformations in each of the 10 mappings, we create a mapplet of these 5 transformations and use it in all 10 mappings. Example: To create a surrogate key in the target, we create a mapplet that uses a stored procedure to create the primary key for the target table. We give the target table name and key column name as input to the mapplet and get the surrogate key as output.

Mapplets help simplify mappings in the following ways:

Include source definitions: Use multiple source definitions and source qualifiers to provide source data for a mapping.

Accept data from sources in a mapping

Include multiple transformations: As many transformations as we need.

Pass data to multiple transformations: We can create a mapplet to feed data to multiple transformations. Each Output transformation in a mapplet represents one output group in a mapplet.

Contain unused ports: We do not have to connect all mapplet input and output ports in a mapping.


Mapplet Input:

Mapplet input can originate from a source definition and/or from an Input transformation in the mapplet. We can create multiple pipelines in a mapplet.

We use Mapplet Input transformation to give input to mapplet.

Use of Mapplet Input transformation is optional.

Mapplet Output:

The output of a mapplet is not connected to any target table.

We must use Mapplet Output transformation to store mapplet output.

A mapplet must contain at least one Output transformation with at least one connected port in the mapplet.

Example1: We will join EMP and DEPT table. Then calculate total salary. Give the output to mapplet out transformation.

· EMP and DEPT will be source tables.

· Output will be given to transformation Mapplet_Out.

Steps:

1. Open folder where we want to create the mapping.

2. Click Tools -> Mapplet Designer.

3. Click Mapplets-> Create-> Give name. Ex: mplt_example1

4. Drag EMP and DEPT table.

5. Use Joiner transformation as described earlier to join them.

6. Transformation -> Create -> Select Expression for list -> Create -> Done

7. Pass all ports from joiner to expression and then calculate total salary as described in expression transformation.

8. Now Transformation -> Create -> Select Mapplet Out from list –> Create -> Give name and then done.

9. Pass all ports from expression to Mapplet output.

10. Mapplet -> Validate

11. Repository -> Save

Use of mapplet in mapping:

We can use a mapplet in a mapping by just dragging the mapplet from the mapplet folder in the left pane, the same way we drag source and target tables.

When we use the mapplet in a mapping, the mapplet object displays only the ports from the Input and Output transformations. These are referred to as the mapplet input and mapplet output ports.

Make sure to give correct connection information in session.

Making a mapping: We will use mplt_example1, and then create a Filter transformation to filter records whose Total Salary is >= 1500.

· mplt_example1 will be source.

· Create target table same as the Mapplet_Out transformation as in the picture above.

Creating Mapping

1. Open folder where we want to create the mapping.

2. Click Tools -> Mapping Designer.

3. Click Mapping-> Create-> Give name. Ex: m_mplt_example1

4. Drag mplt_Example1 and target table.

5. Transformation -> Create -> Select Filter for list -> Create -> Done.

6. Drag all ports from mplt_example1 to filter and give filter condition.

7. Connect all ports from filter to target. We can add more transformations after filter if needed.

8. Validate mapping and Save it.

Make session and workflow.

Give connection information for mapplet source tables.


Give connection information for target table.

Run workflow and see result.

PARTITIONING

A pipeline consists of a source qualifier and all the transformations and Targets that receive data from that source qualifier.

When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.

A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.

PARTITIONING ATTRIBUTES

1. Partition points

By default, IS sets partition points at various transformations in the pipeline.

Partition points mark thread boundaries and divide the pipeline into stages.

A stage is a section of a pipeline between any two partition points.

2. Number of Partitions

We can define up to 64 partitions at any partition point in a pipeline.

When we increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.

Increasing the number of partitions or partition points increases the number of threads.

The number of partitions we create equals the number of connections to the source or target. For one partition, one database connection will be used.

3. Partition types

The Integration Service creates a default partition type at each partition point.

If we have the Partitioning option, we can change the partition type. This option is purchased separately.

The partition type controls how the Integration Service distributes data among partitions at partition points.


PARTITIONING TYPES

1. Round Robin Partition Type

In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.

Each partition processes approximately the same number of rows.

Use round-robin partitioning when we need to distribute rows evenly and do not need to group data among partitions.

2. Pass-Through Partition Type

In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.

All rows in a single partition stay in that partition after crossing a pass-through partition point.

Use pass-through partitioning when we want to increase data throughput, but we do not want to increase the number of partitions.

3. Database Partitioning Partition Type

Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets only.

Use any number of pipeline partitions and any number of database partitions.

We can improve performance when the number of pipeline partitions equals the number of database partitions.

Database Partitioning with One Source

When we use database partitioning with a source qualifier with one source, the Integration Service generates SQL queries for each database partition and distributes the data from the database partitions among the session partitions equally.

For example, when a session has three partitions and the database has five partitions, the 1st and 2nd session partitions each receive data from two database partitions (four DB partitions used), and the 3rd session partition receives data from the remaining DB partition.

Partitioning a Source Qualifier with Multiple Source Tables

The Integration Service creates SQL queries for database partitions based on the number of partitions in the database table with the most partitions.

If the session has three partitions and the database table has two partitions, one of the session partitions receives no data.

4. Hash Auto-Keys Partition Type

The Integration Service uses all grouped or sorted ports as a compound partition key.

Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.

5. Hash User-Keys Partition Type

The Integration Service uses a hash function to group rows of data among partitions.

We define the number of ports to generate the partition key.

We choose the ports that define the partition key.

6. Key range Partition Type

We specify one or more ports to form a compound partition key.

The Integration Service passes data to each partition depending on the ranges we specify for each port.

Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.

Example: Customers 1-100 in one partition, 101-200 in another, and so on. We define the range for each partition.

WORKING WITH LINKS

Use links to connect each workflow task.

We can specify conditions with links to create branches in the workflow.

The Workflow Manager does not allow us to use links to create loops in the workflow. Each link in the workflow can run only once.

Valid Workflow :


Example of loop:

Specifying Link Conditions:

Once we create links between tasks, we can specify conditions for each link to determine the order of execution in the workflow.

If we do not specify conditions for each link, the Integration Service runs the next task in the workflow by default.

Use predefined or user-defined workflow variables in the link condition.
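For instance, a link condition that runs the next task only when the previous session succeeds might look like this (the session name is illustrative):

$s_m_filter_example.Status = SUCCEEDED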

Steps:

1. In the Workflow Designer workspace, double-click the link you want to specify.

2. The Expression Editor appears.

3. In the Expression Editor, enter the link condition. The Expression Editor provides predefined workflow variables, user-defined workflow variables, variable functions, and Boolean and arithmetic operators.

4. Validate the expression using the Validate button.

Using the Expression Editor:

The Workflow Manager provides an Expression Editor for any expressions in the workflow. We can enter expressions using the Expression Editor for the following:

Link conditions

Decision task

Assignment task

SCHEDULERS

We can schedule a workflow to run continuously, repeat at a given time or interval, or we can manually start a workflow. The Integration Service runs a scheduled workflow as configured.

By default, the workflow runs on demand. We can change the schedule settings by editing the scheduler. If we change schedule settings, the Integration Service reschedules the workflow according to the new settings.

A scheduler is a repository object that contains a set of schedule settings.

Scheduler can be non-reusable or reusable.

The Workflow Manager marks a workflow invalid if we delete the scheduler associated with the workflow.

If we choose a different Integration Service for the workflow or restart the Integration Service, it reschedules all workflows.

If we delete a folder, the Integration Service removes workflows from the schedule.

The Integration Service does not run the workflow if:

The prior workflow run fails.

We remove the workflow from the schedule

The Integration Service is running in safe mode

Creating a Reusable Scheduler

For each folder, the Workflow Manager lets us create reusable schedulers so we can reuse the same set of scheduling settings for workflows in the folder.

Use a reusable scheduler so we do not need to configure the same set of scheduling settings in each workflow.

When we delete a reusable scheduler, all workflows that use the deleted scheduler become invalid. To make the workflows valid, we must edit them and replace the missing scheduler.


Steps:

1. Open the folder where we want to create the scheduler.

2. In the Workflow Designer, click Workflows > Schedulers.

3. Click Add to add a new scheduler.

4. In the General tab, enter a name for the scheduler.

5. Configure the scheduler settings in the Scheduler tab.

6. Click Apply and OK.

Configuring Scheduler Settings

Configure the Schedule tab of the scheduler to set run options, schedule options, start options, and end options for the schedule.

There are 3 run options:

1. Run on Demand

2. Run Continuously

3. Run on Server initialization

1. Run on Demand:

Integration Service runs the workflow when we start the workflow manually.

2. Run Continuously:

Integration Service runs the workflow as soon as the service initializes. The Integration Service then starts the next run of the workflow as soon as it finishes the previous run.

3. Run on Server initialization

Integration Service runs the workflow as soon as the service is initialized. The Integration Service then starts the next run of the workflow according to settings in Schedule Options.

Schedule options for Run on Server initialization:

Run Once: To run the workflow just once.

Run every: Run the workflow at regular intervals, as configured.

Customized Repeat: Integration Service runs the workflow on the dates and times specified in the Repeat dialog box.

Start options for Run on Server initialization:

· Start Date

· Start Time

End options for Run on Server initialization:

End on: IS stops scheduling the workflow on the selected date.

End After: IS stops scheduling the workflow after the set number of workflow runs.

Forever: IS schedules the workflow as long as the workflow does not fail.


Creating a Non-Reusable Scheduler

1. In the Workflow Designer, open the workflow.

2. Click Workflows > Edit.

3. In the Scheduler tab, choose Non-reusable. Select Reusable if we want to select an existing reusable scheduler for the workflow.

4. Note: If we do not have a reusable scheduler in the folder, we must create one before we choose Reusable.

5. Click the right side of the Scheduler field to edit scheduling settings for the non-reusable scheduler.

6. If we select Reusable, choose a reusable scheduler from the Scheduler Browser dialog box.

7. Click OK.

Points to Ponder :

To remove a workflow from its schedule, right-click the workflow in the Navigator window and choose Unscheduled Workflow.

To reschedule a workflow on its original schedule, right-click the workflow in the Navigator window and choose Schedule Workflow.

WORKING WITH TASKS –Part 1

The Workflow Manager contains many types of tasks to help you build workflows and worklets. We can create reusable tasks in the Task Developer.

Types of tasks:

Task Type | Tool where task can be created | Reusable or not

Session | Task Developer | Yes

Email | Workflow Designer | Yes

Command | Worklet Designer | Yes

Event-Raise | Workflow Designer | No

Event-Wait | Worklet Designer | No

Timer | | No

Decision | | No

Assignment | | No

Control | | No

SESSION TASK

A session is a set of instructions that tells the Power Center Server how and when to move data from sources to targets.

To run a session, we must first create a workflow to contain the Session task.

We can run as many sessions in a workflow as we need. We can run the Session tasks sequentially or concurrently, depending on our needs.

The Power Center Server creates several files and in-memory caches depending on the transformations and options used in the session.

EMAIL TASK

The Workflow Manager provides an Email task that allows us to send email during a workflow.

Usually created by the Administrator; we just drag and use it in our workflow.

Steps:

1. In the Task Developer or Workflow Designer, choose Tasks-Create.

2. Select an Email task and enter a name for the task. Click Create.

3. Click Done.

4. Double-click the Email task in the workspace. The Edit Tasks dialog box appears.

5. Click the Properties tab.

6. Enter the fully qualified email address of the mail recipient in the Email User Name field.

7. Enter the subject of the email in the Email Subject field. Or, you can leave this field blank.

8. Click the Open button in the Email Text field to open the Email Editor.

9. Click OK twice to save your changes.


Example: To send an email when a session completes:

Steps:

1. Create a workflow wf_sample_email

2. Drag any session task to workspace.

3. Edit Session task and go to Components tab.

4. See On Success Email Option there and configure it.

5. In Type select reusable or Non-reusable.

6. In Value, select the email task to be used.

7. Click Apply -> Ok.

8. Validate workflow and Repository -> Save

We can also drag the email task and use as per need.

We can set the option to send email on success or failure in components tab of a session task.

COMMAND TASK

The Command task allows us to specify one or more shell commands in UNIX or DOS commands in Windows to run during the workflow.

For example, we can specify shell commands in the Command task to delete reject files, copy a file, or archive target files.

Ways of using command task:

1. Standalone Command task: We can use a Command task anywhere in the workflow or worklet to run shell commands.

2. Pre- and post-session shell command: We can call a Command task as the pre- or post-session shell command for a Session task. This is done in COMPONENTS TAB of a session. We can run it in Pre-Session Command or Post Session Success Command or Post Session Failure Command. Select the Value and Type option as we did in Email task.

Example: to copy a file sample.txt from D drive to E.

Command: COPY D:\sample.txt E:\ in windows
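On a UNIX Integration Service host, a roughly equivalent shell command might be (paths are illustrative):

cp /data/sample.txt /backup/sample.txt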

Steps for creating command task:

1. In the Task Developer or Workflow Designer, choose Tasks-Create.

2. Select Command Task for the task type.

3. Enter a name for the Command task. Click Create. Then click done.

4. Double-click the Command task. Go to commands tab.

5. In the Commands tab, click the Add button to add a command.

6. In the Name field, enter a name for the new command.

7. In the Command field, click the Edit button to open the Command Editor.

8. Enter only one command in the Command Editor.

9. Click OK to close the Command Editor.

10. Repeat steps 5-9 to add more commands in the task.

11. Click OK.

Steps to create the workflow using command task:

1. Create a task using the above steps to copy a file in Task Developer.

2. Open Workflow Designer. Workflow -> Create -> Give name and click ok.

3. Start is displayed. Drag session say s_m_Filter_example and command task.

4. Link Start to Session task and Session to Command Task.

5. Double click the link between the Session and Command tasks and give the condition in the editor as $S_M_FILTER_EXAMPLE.Status=SUCCEEDED

6. Workflow -> Validate

7. Repository -> Save

WORKING WITH EVENT TASKS

We can define events in the workflow to specify the sequence of task execution.

Types of Events:

Pre-defined event: A pre-defined event is a file-watch event. This event waits for a specified file to arrive at a given location.

User-defined event: A user-defined event is a sequence of tasks in the workflow. We create events and then raise them as per need.


Steps for creating User Defined Event:

1. Open any workflow where we want to create an event.

2. Click Workflow-> Edit -> Events tab.

3. Click the Add button to add events and give the names as per need.

4. Click Apply -> Ok. Validate the workflow and Save it.

Types of Events Tasks:

EVENT RAISE: Event-Raise task represents a user-defined event. We use this task to raise a user defined event.

EVENT WAIT: Event-Wait task waits for a file watcher event or user defined event to occur before executing the next session in the workflow.

Example1: Use an event wait task and make sure that session s_filter_example runs when abc.txt file is present in D:\FILES folder.

Steps for creating workflow:

1. Workflow -> Create -> Give name wf_event_wait_file_watch -> Click ok.

2. Task -> Create -> Select Event Wait. Give name. Click create and done.

3. Link Start to Event Wait task.

4. Drag s_filter_example to workspace and link it to event wait task.

5. Right click on event wait task and click EDIT -> EVENTS tab.

6. Select the Pre Defined option there. In the blank space, give the directory and file name to watch. Example: D:\FILES\abc.txt

7. Workflow validate and Repository Save.

Example 2: Raise a user defined event when session s_m_filter_example succeeds. Capture this event in event wait task and run session S_M_TOTAL_SAL_EXAMPLE

Steps for creating workflow:

1. Workflow -> Create -> Give name wf_event_wait_event_raise -> Click ok.

2. Workflow -> Edit -> Events Tab and add events EVENT1 there.

3. Drag s_m_filter_example and link it to START task.

4. Click Tasks -> Create -> Select EVENT RAISE from list. Give name ER_Example.

5. Click Create and then done. Link ER_Example to s_m_filter_example.

6. Right click ER_Example -> EDIT -> Properties Tab -> Open Value for User Defined Event and Select EVENT1 from the list displayed. Apply -> OK.

7. Click link between ER_Example and s_m_filter_example and give the condition $S_M_FILTER_EXAMPLE.Status=SUCCEEDED

8. Click Tasks -> Create -> Select EVENT WAIT from list. Give name EW_WAIT. Click Create and then done.

9. Link EW_WAIT to START task.

10. Right click EW_WAIT -> EDIT-> EVENTS tab.

11. Select User Defined there. Select the Event1 by clicking Browse Events button.

12. Apply -> OK.

13. Drag S_M_TOTAL_SAL_EXAMPLE and link it to EW_WAIT.

14. Workflow -> Validate

15. Repository -> Save.

16. Run workflow and see.


WORKING WITH TASKS –Part 2

TIMER TASK

The Timer task allows us to specify the period of time to wait before the Power Center Server runs the next task in the workflow. The Timer task has two types of settings:

Absolute time: We specify the exact date and time or we can choose a user-defined workflow variable to specify the exact time. The next task in workflow will run as per the date and time specified.

Relative time: We instruct the Power Center Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.

Example: Run session s_m_filter_example 1 minute after the Timer task starts (relative time).

Steps for creating workflow:

1. Workflow -> Create -> Give name wf_timer_task_example -> Click ok.

2. Click Tasks -> Create -> Select TIMER from list. Give name TIMER_Example. Click Create and then done.

3. Link TIMER_Example to START task.

4. Right click TIMER_Example-> EDIT -> TIMER tab.

5. Select Relative Time Option and Give 1 min and Select ‘From start time of this task’ Option.

6. Apply -> OK.

7. Drag s_m_filter_example and link it to TIMER_Example.

8. Workflow-> Validate and Repository -> Save.

DECISION TASK

The Decision task allows us to enter a condition that determines the execution of the workflow, similar to a link condition.

The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition.

The Power Center Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0).

We can specify one decision condition per Decision task.

Example: Command Task should run only if either s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE succeeds. If any of s_m_filter_example or S_M_TOTAL_SAL_EXAMPLE fails then S_m_sample_mapping_EMP should run.

Steps for creating workflow:

1. Workflow -> Create -> Give name wf_decision_task_example -> Click ok.

2. Drag s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE to workspace and link both of them to START task.

3. Click Tasks -> Create -> Select DECISION from list. Give name DECISION_Example. Click Create and then done. Link DECISION_Example to both s_m_filter_example and S_M_TOTAL_SAL_EXAMPLE.

4. Right click DECISION_Example-> EDIT -> GENERAL tab.

5. Set ‘Treat Input Links As’ to OR. Default is AND. Apply and click OK.


6. Now edit decision task again and go to PROPERTIES Tab. Open the Expression editor by clicking the VALUE section of Decision Name attribute and enter the following condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED OR $S_M_TOTAL_SAL_EXAMPLE.Status = SUCCEEDED

7. Validate the condition -> Click Apply -> OK.

8. Drag command task and S_m_sample_mapping_EMP task to workspace and link them to DECISION_Example task.

9. Double click link between S_m_sample_mapping_EMP & DECISION_Example & give the condition: $DECISION_Example.Condition = 0. Validate & click OK.

10. Double click link between Command task and DECISION_Example and give the condition: $DECISION_Example.Condition = 1. Validate and click OK.

11. Workflow Validate and repository Save.

12. Run workflow and see the result.

CONTROL TASK

We can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition.

A parent workflow or worklet is the workflow or worklet that contains the Control task.

We give the condition to the link connected to Control Task.

Control Option | Description

Fail Me | Fails the control task.

Fail Parent | Marks the status of the WF or worklet that contains the Control task as failed.

Stop Parent | Stops the WF or worklet that contains the Control task.

Abort Parent | Aborts the WF or worklet that contains the Control task.

Fail Top-Level WF | Fails the workflow that is running.

Stop Top-Level WF | Stops the workflow that is running.

Abort Top-Level WF | Aborts the workflow that is running.

Example: Drag any 3 sessions and if anyone fails, then Abort the top level workflow.

Steps for creating workflow:

1. Workflow -> Create -> Give name wf_control_task_example -> Click ok.

2. Drag any 3 sessions to workspace and link all of them to START task.

3. Click Tasks -> Create -> Select CONTROL from list. Give name cntr_task.

4. Click Create and then done.

5. Link all sessions to the control task cntr_task.

6. Double click link between cntr_task and any session say s_m_filter_example and give the condition: $S_M_FILTER_EXAMPLE.Status = SUCCEEDED.

7. Repeat above step for remaining 2 sessions also.

8. Right click cntr_task-> EDIT -> GENERAL tab. Set ‘Treat Input Links As’ to OR. Default is AND.

9. Go to PROPERTIES tab of cntr_task and select the value 'Fail Top-Level Workflow' for Control Option. Click Apply and OK.

10. Workflow Validate and repository Save.

11. Run workflow and see the result.


ASSIGNMENT TASK

The Assignment task allows us to assign a value to a user-defined workflow variable.

See Workflow variable topic to add user defined variables.

To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables.

We cannot assign values to pre-defined workflow variables.
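As a small sketch, an assignment might increment a user-defined workflow variable; the variable name $$RunCount below is only illustrative:

User-defined variable: $$RunCount

Expression: $$RunCount + 1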

Steps to create Assignment Task:

1. Open any workflow where we want to use Assignment task.

2. Edit Workflow and add user defined variables.

3. Choose Tasks-Create. Select Assignment Task for the task type.

4. Enter a name for the Assignment task. Click Create. Then click Done.

5. Double-click the Assignment task to open the Edit Task dialog box.

6. On the Expressions tab, click Add to add an assignment.

7. Click the Open button in the User Defined Variables field.

8. Select the variable for which you want to assign a value. Click OK.

9. Click the Edit button in the Expression field to open the Expression Editor.

10. Enter the value or expression you want to assign.

11. Repeat steps 7-10 to add more variable assignments as necessary.

12. Click OK.

INDIRECT LOADING FOR FLAT FILES

Suppose we have 10 flat files of the same structure. All the flat files have the same number of columns and data types. Now we need to load all 10 files into the same target.

Names of files are say EMP1, EMP2 and so on.

Solution1:

1. Import one flat file definition and make the mapping as per need.

2. Now in session give the Source File name and Source File Directory location of one file.

3. Make workflow and run.

4. Now open session after workflow completes. Change the Filename and Directory to give information of second file. Run workflow again.

5. Do the above for all 10 files.

Solution2:

1. Import one flat file definition and make the mapping as per need.

2. Now in session give the Source Directory location of the files.

3. Now in the Source Filename field, use $InputFileName. This is a session parameter.

4. Now make a parameter file and give the value of $InputFileName (a fuller sketch of this file appears after this list).

$InputFileName=EMP1.txt

5. Run the workflow

6. Now edit parameter file and give value of second file. Run workflow again.

7. Do same for remaining files.
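A minimal parameter file for Solution 2 might look like the following; the folder, workflow, and session names are placeholders, and only the file name changes between runs:

[MyFolder.WF:wf_indirect_load.ST:s_m_flat_file_load]

$InputFileName=EMP1.txt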

Solution3:

1. Import one flat file definition and make the mapping as per need.

2. Now make a notepad file that contains the location and name of each of the 10 flat files.

Sample:

D:\EMP1.txt

E:\EMP2.txt


E:\FILES\DWH\EMP3.txt and so on

3. Now make a session and in Source file name and Source File Directory location fields, give the name and location of above created file.

4. In Source file type field, select Indirect.

5. Click Apply.

6. Validate Session

7. Make Workflow. Save it to repository and run.

SCD – Type 1

Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule

For example, you may have a dimension in your database that tracks the sales records of your company's salespeople. Creating sales reports seems simple enough, until a salesperson is transferred from one regional office to another. How do you record such a change in your sales dimension?

You could sum or average the sales by salesperson, but if you use that to compare the performance of salesmen, that might give misleading information. If the salesperson that was transferred used to work in a hot market where sales were easy, and now works in a market where sales are infrequent, her totals will look much stronger than the other salespeople in her new region, even if they are just as good. Or you could create a second salesperson record and treat the transferred person as a new sales person, but that creates problems also.

Dealing with these issues involves SCD management methodologies:

Type 1:

The Type 1 methodology overwrites old data with new data, and therefore does not track historical data at all. This is most appropriate when correcting certain types of data errors, such as the spelling of a name. (Assuming you won't ever need to know how it used to be misspelled in the past.)

Here is an example of a database table that keeps supplier information:

Supplier_Key Supplier_Code Supplier_Name Supplier_State

123 ABC Acme Supply Co CA

In this example, Supplier_Code is the natural key and Supplier_Key is a surrogate key. Technically, the surrogate key is not necessary, since the table will be unique by the natural key (Supplier_Code). However, the joins will perform better on an integer than on a character string.

Now imagine that this supplier moves their headquarters to Illinois. The updated table would simply overwrite this record:

Supplier_Key Supplier_Code Supplier_Name Supplier_State

123 ABC Acme Supply Co IL

The obvious disadvantage to this method of managing SCDs is that there is no historical record kept in the data warehouse. You can't tell if your suppliers are tending to move to the Midwest, for example. But an advantage to Type 1 SCDs is that they are very easy to maintain.

Explanation with an Example:

Source Table: (01-01-11)

Emp no Ename Sal

101 A 1000

102 B 2000

103 C 3000

Target Table: (01-01-11)

Emp no Ename Sal

101 A 1000

102 B 2000

103 C 3000

The necessity of the lookup transformation is illustrated using the above source and target table.

Source Table: (01-02-11)

Emp no Ename Sal

101 A 1000

102 B 2500

103 C 3000

104 D 4000

Target Table: (01-02-11)

Emp no Ename Sal

101 A 1000

102 B 2500

103 C 3000

104 D 4000

In the second month we have one more employee added to the table with the Ename D, and the salary of employee B is changed to 2500 instead of 2000.

Step 1: Is to import Source Table and Target table.

Create a table by name emp_source with three columns as shown above in oracle.

Import the source from the source analyzer.

In the same way as above create two target tables with the names emp_target1, emp_target2.

Go to the targets Menu and click on generate and execute to confirm the creation of the target tables.

The snapshot of the connections using the different kinds of transformations is shown below.

Step 2: Design the mapping and apply the necessary transformation.

Here in this mapping we are about to use four kinds of transformations, namely Lookup, Expression, Filter, and Update Strategy. The necessity and usage of all the transformations will be discussed in detail below.

Look up Transformation: The purpose of this transformation is to determine whether to insert, delete, update, or reject the rows in the target table.

The first thing that we are gonna do is to create a Lookup transformation and connect the Empno from the Source Qualifier to the transformation.

The snapshot of choosing the Target table is shown below.


What the Lookup transformation does in our mapping is look into the target table (emp_target) and compare it with the Source Qualifier, determining whether to insert, update, delete or reject rows.

In the Ports tab we should add a new column and name it as empno1 and this is column for which we are gonna connect from the Source Qualifier.

The Input port for the first column should be unchecked, whereas the other ports like Output and Lookup should be checked. For the newly created column, only the Input and Output boxes should be checked.

In the Properties tab (i) Lookup table name ->Emp_Target.

(ii)Look up Policy on Multiple Mismatch -> use First Value.

(iii) Connection Information ->Oracle.

In the Conditions tab (i) Click on Add a new condition

(ii)Lookup Table Column should be Empno, Transformation port should be Empno1 and Operator should ‘=’.

Expression Transformation: After we are done with the Lookup Transformation, we use an expression transformation to check whether we need to insert new records or update existing records. The steps to create an Expression Transformation are shown below.

Drag all the columns from both the source and the look up transformation and drop them all on to the Expression transformation.

Now double click on the Transformation and go to the Ports tab and create two new columns and name it as insert and update. Both these columns are gonna be our output data so we need to have check mark only in front of the Output check box.

The Snap shot for the Edit transformation window is shown below.

The conditions that we want to pass through our output ports are listed below.


Insert: IsNull(EMPNO1)

Update: iif(Not IsNull(EMPNO1) and Decode(SAL,SAL1,1,0)=0, 1, 0)

(Decode(SAL,SAL1,1,0)=0 is true when the source SAL differs from the looked-up SAL1, i.e. the record has changed.)

We are all done here. Click on Apply and then OK.

Filter Transformation: We are gonna have two Filter transformations, one to insert and the other to update.

Connect the Insert column from the expression transformation to the insert column in the first filter transformation and in the same way we are gonna connect the update column in the expression transformation to the update column in the second filter.

Later now connect the Empno, Ename, Sal from the expression transformation to both filter transformation.

If there is no change in input data then filter transformation 1 forwards the complete input to update strategy transformation 1 and same output is gonna appear in the target table.

If there is any change in input data then filter transformation 2 forwards the complete input to the update strategy transformation 2 then it is gonna forward the updated input to the target table.

Go to the Properties tab on the Edit transformation

(i) The value for the filter condition 1 is Insert.

(ii) The value for the filter condition 2 is Update.

The Closer view of the filter Connection is shown below.

Update Strategy Transformation: Determines whether to insert, delete, update or reject the rows.

Drag the respective Empno, Ename and Sal from the filter transformations and drop them on the respective Update Strategy Transformation.

Now go to the Properties tab and the value for the update strategy expression is 0 (on the 1st update transformation).

Now go to the Properties tab and the value for the update strategy expression is 1 (on the 2nd update transformation).

We are all set here finally connect the outputs of the update transformations to the target table.

Step 3: Create the task and Run the work flow.

Don’t check the truncate table option.

Change Bulk to the Normal.

Run the work flow from task.

Step 4: Preview the Output in the target table.


Type 2

Let us drive the point home using a simple scenario. For example, in the current month, i.e. (01-01-2010), we are provided with a source table with three columns and three rows in it (Empno, Ename, Sal). A new employee is added and there is one change in the records in the month (01-02-2010). We are gonna use the SCD-2 style to extract and load the records into the target table.

The thing to be noticed here is if there is any update in the salary of any employee then the history of that employee is displayed with the current date as the start date and the previous date as the end date.

Source Table: (01-01-11)

Emp no Ename Sal

101 A 1000

102 B 2000

103 C 3000

Target Table: (01-01-11)

Skey Emp no Ename Sal S-date E-date Ver Flag

100 101 A 1000 01-01-10 Null 1 1

200 102 B 2000 01-01-10 Null 1 1

300 103 C 3000 01-01-10 Null 1 1


Source Table: (01-02-11)

Emp no Ename Sal

101 A 1000

102 B 2500

103 C 3000

104 D 4000

Target Table: (01-02-11)

Skey Emp no Ename Sal S-date E-date Ver Flag

100 101 A 1000 01-02-10 Null 1 1

200 102 B 2000 01-02-10 Null 1 1

300 103 C 3000 01-02-10 Null 1 1

201 102 B 2500 01-02-10 01-01-10 2 0

400 104 D 4000 01-02-10 Null 1 1

In the second month we have one more employee added to the table with the Ename D, and the salary of employee B is changed to 2500 instead of 2000.

Step 1: Is to import Source Table and Target table.


Create a table by name emp_source with three columns as shown above in oracle.

Import the source from the source analyzer.

Drag the Target table twice on to the mapping designer to facilitate insert or update process.

Go to the targets Menu and click on generate and execute to confirm the creation of the target tables.

The snapshot of the connections using the different kinds of transformations is shown below.

In the target table we are gonna add five columns (Skey, Version, Flag, S_date, E_Date).

Step 2: Design the mapping and apply the necessary transformation.

Here in this mapping we are about to use four kinds of transformations, namely Lookup transformation (1), Expression Transformation (3), Filter Transformation (2), and Sequence Generator. The necessity and usage of all the transformations will be discussed in detail below.

Look up Transformation: The purpose of this transformation is to Lookup on the target table and to compare the same with the Source using the Lookup Condition.

The first thing that we are gonna do is to create a look up transformation and connect the Empno from the source qualifier to the transformation.

The snapshot of choosing the Target table is shown below.

Drag the Empno column from the Source Qualifier to the Lookup Transformation.

The Input Port for only the Empno1 should be checked.

In the Properties tab (i) Lookup table name ->Emp_Target.

(ii)Look up Policy on Multiple Mismatch -> use Last Value.

(iii) Connection Information ->Oracle.

In the Conditions tab (i) Click on Add a new condition

(ii)Lookup Table Column should be Empno, Transformation port should be Empno1 and Operator should ‘=’.

Expression Transformation: After we are done with the Lookup Transformation we are using an expression transformation to find whether the data on the source table matches with the target table. We specify the condition here whether to insert or to update the table. The steps to create an Expression Transformation are shown below.

Drag all the columns from both the source and the look up transformation and drop them all on to the Expression transformation.


Now double click on the transformation, go to the Ports tab, and create two new columns named insert and update. Both these columns are gonna be our output data, so we need to uncheck the Input check box for them.

The Snap shot for the Edit transformation window is shown below.

The conditions that we want to pass through our output ports are listed below.

Insert : IsNull(EmpNO1)

Update: iif(Not isnull (Skey) and Decode(SAL,SAL1,1,0)=0,1,0) .

We are all done here. Click on Apply and then OK.

Filter Transformation: We need two Filter transformations; the purpose of the first filter is to filter out the records which we are gonna insert, and the next is vice versa.

If there is no change in input data then filter transformation 1 forwards the complete input to Exp 1, and the same output is gonna appear in the target table.

If there is any change in input data then filter transformation 2 forwards the complete input to Exp 2, which then forwards the updated input to the target table.

Go to the Properties tab on the Edit transformation

(i) The value for the filter condition 1 is Insert.

(ii) The value for the filter condition 2 is Update.

The closer view of the connections from the expression to the filter is shown below.

Sequence Generator: We use this to generate an incremental cycle of a sequential range of numbers. The purpose of this in our mapping is to increment the skey in a bandwidth of 100.


We are gonna have a sequence generator and the purpose of the sequence generator is to increment the values of the skey in the multiples of 100 (bandwidth of 100).

Connect the output of the sequence transformation to the Exp 1.

Expression Transformation:

Exp 1: It updates the target table with the skey values. The point to be noticed here is that skey gets multiplied by 100 and a new row is generated if there is any new EMP added to the list. Else there is no modification done on the target table. (The port expressions are summarized in a short sketch after the list below.)

Drag all the columns from the filter 1 to the Exp 1.

Now add a new column as N_skey and the expression for it is gonna be Nextval1*100.

We are gonna make the S_date an output port, and the expression for it is SYSDATE.

Flag is also made as output and expression parsed through it is 1.

Version is also made as output and expression parsed through it is 1.
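Putting the above together, a sketch of the Exp 1 output-port expressions (port names as used in this example):

N_skey = Nextval1 * 100

S_date = SYSDATE

Flag = 1

Version = 1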

Exp 2: If the same employee is found with any updates in his records, then Skey gets incremented by 1 and the version changes to the next higher number. (These expressions are also summarized in a short sketch below.)

Drag all the columns from the filter 2 to the Exp 2.

Now add a new column as N_skey and the expression for it is gonna be Skey+1.

Both the S_date and E_date is gonna be sysdate.
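Similarly, a sketch of the Exp 2 output-port expressions based on the description above:

N_skey = Skey + 1

S_date = SYSDATE

E_date = SYSDATE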

Exp 3: If any record in the source table gets updated then we make it only as the output.


If change is found then we are gonna update the E_Date to S_Date.

Update Strategy: This is the place from where the update instruction is set on the target table.

The update strategy expression is set to 1.

Step 3: Create the task and Run the work flow.

Don’t check the truncate table option.

Change Bulk to the Normal.

Run the work flow from task.


Step 4: Preview the Output in the target table.

SCD Type 3

This method has limited history preservation, and we are gonna use skey as the primary key here.

Source table: (01-01-2011)

Empno Ename Sal

101 A 1000

102 B 2000

103 C 3000

Target Table: (01-01-2011)

Empno Ename C-sal P-sal

101 A 1000 -

102 B 2000 -

103 C 3000 -

Source Table: (01-02-2011)

Empno Ename Sal

101 A 1000

102 B 4566

103 C 3000

Target Table (01-02-2011):

Empno Ename C-sal P-sal

101 A 1000 -

102 B 4566 Null

103 C 3000 -

102 B 4544 4566

So I hope you got what I'm trying to do with the above tables.

Step 1: Initially in the mapping designer I'm gonna create a mapping as below. And in this mapping I'm using Lookup, Expression, Filter, and Update Strategy to drive the purpose. Explanation of each and every transformation is given below.

Step 2: Here we are gonna see the purpose and usage of all the transformations that we have used in the above mapping.

Look up Transformation: The Lookup Transformation looks at the target table and compares it with the source table. Based on the lookup condition it decides whether we need to update, insert, or delete the data from being loaded into the target table.

As usual, we are gonna connect the Empno column from the Source Qualifier to the Lookup transformation. Prior to this, the Lookup transformation has to look at the target table.

Next to this we are gonna specify the lookup condition empno = empno1.

Finally, specify the connection information (Oracle) and the lookup policy on multiple mismatch (use last value) in the Properties tab.

Expression Transformation:

We are using the Expression Transformation to separate out the insert records and update records logically.

Drag all the ports from the Source Qualifier and Look up in to Expression.

Add two Ports and Rename them as Insert, Update.

These two ports are gonna be just output ports. Specify the below conditions in the Expression editor for the ports respectively.

Insert: isnull(ENO1 )

Update: iif(not isnull(ENO1) and decode(SAL,Curr_Sal,1,0)=0,1,0)

Filter Transformation: We are gonna use two Filter transformations to filter out the data physically into two separate sections, one for the insert and the other for the update process to happen.

Filter 1:


Drag the Insert port and the other three ports (which came from the Source Qualifier into the Expression) into the first filter.

In the Properties tab specify the Filter condition as Insert.

Filter 2:

Drag the Update port and the other four ports (which came from the Lookup into the Expression) into the second filter.

In the Properties tab specify the Filter condition as update.

Update Strategy: Finally, we need the update strategy to insert or to update into the target table.

Update Strategy 1: This is intended to insert in to the target table.

Drag all the ports except the insert from the first filter in to this.

In the Properties tab specify the condition as 0 or dd_insert.

Update Strategy 2: This is intended to update in to the target table.

Drag all the ports except the update from the second filter in to this.

In the Properties tab specify the condition as 1 or dd_update.

Finally, connect both the update strategies to the two instances of the target.

Step 3: Create a session for this mapping and Run the work flow.

Step 4: Observe the output; it would be the same as the second target table.

Incremental Aggregation:

When we enable the session option Incremental Aggregation, the Integration Service performs incremental aggregation: it passes source data through the mapping and uses historical cache data to perform aggregation calculations incrementally.

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes incrementally and you can capture changes, you can configure the session to process those changes. This allows the Integration Service to update the target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.
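For example, the filter that removes pre-existing data might compare a timestamp column against a mapping variable, similar to the incremental-extract pattern shown earlier; the column and variable names here are illustrative:

Filter condition: LAST_UPDATED_DATE > $$LastRunDate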

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the Integration Service to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The Integration Service then processes the new data and updates the target accordingly.

Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process new data.

Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.

Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.

Integration Service Processing for Incremental Aggregation

(i)The first time you run an incremental aggregation session, the Integration Service processes the entire source. At the end of the session, the Integration Service stores aggregate data from that session run in two files, the index file and the data file. The Integration Service creates the files in the cache directory specified in the Aggregator transformation properties.

(ii)Each subsequent time you run the session with incremental aggregation, you use the incremental source changes in the session. For each input record, the Integration Service checks historical information in the index file for a corresponding group. If it finds a corresponding group, the Integration Service performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change. If it does not find a corresponding group, the Integration Service creates a new group and saves the record data.

(iii)When writing to the target, the Integration Service applies the changes to the existing target. It saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.

(iv) If the source changes significantly and you want the Integration Service to continue saving aggregate data for future incremental changes, configure the Integration Service to overwrite existing aggregate data with new aggregate data.

Each subsequent time you run a session with incremental aggregation, the Integration Service creates a backup of the incremental aggregation files. The cache directory for the Aggregator transformation must contain enough disk space for two sets of the files.

(v)When you partition a session that uses incremental aggregation, the Integration Service creates one set of cache files for each partition.

The Integration Service creates new aggregate data, instead of using historical data, when you perform one of the following tasks:

Save a new version of the mapping.

Configure the session to reinitialize the aggregate cache.

Move the aggregate files without correcting the configured path or directory for the files in the session properties.

Change the configured path or directory for the aggregate files without moving the files to the new location.

Delete cache files.

Decrease the number of partitions.

When the Integration Service rebuilds incremental aggregation files, the data in the previous files is lost.

Note: To protect the incremental aggregation files from file corruption or disk failure, periodically back up the files.

Preparing for Incremental Aggregation:

When you use incremental aggregation, you need to configure both mapping and session properties:

Implement mapping logic or filter to remove pre-existing data.

Configure the session for incremental aggregation and verify that the file directory has enough disk space for the aggregate files.

Configuring the Mapping

Before enabling incremental aggregation, you must capture changes in source data. You can use a Filter or Stored Procedure transformation in the mapping to remove pre-existing source data during a session.
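One way to implement such a filter, as a minimal sketch (the port LOAD_TS, the transformation layout, and the mapping variable $$LastLoadDate are illustrative, not prescribed by this guide), is to declare $$LastLoadDate as a persistent Date/Time mapping variable with Max aggregation and use it as follows:

Filter transformation condition: LOAD_TS > $$LastLoadDate

Expression transformation, variable port: SETMAXVARIABLE($$LastLoadDate, LOAD_TS)

When the session completes successfully, the Integration Service saves the highest LOAD_TS value it saw to the repository as the new value of $$LastLoadDate, so the next run passes only rows added since the previous load.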

Configuring the Session

Use the following guidelines when you configure the session for incremental aggregation:

(i) Verify the location where you want to store the aggregate files.

The index and data files grow in proportion to the source data. Be sure the cache directory has enough disk space to store historical data for the session.

When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then, enter the appropriate directory for the process variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the process variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.

Changing the cache directory without moving the files causes the Integration Service to reinitialize the aggregate cache and gather new aggregate data.
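If you manage session settings through parameter files, you may be able to point the cache directory there as well. The sketch below assumes that your PowerCenter version accepts the $PMCacheDir service process variable in a session section of the parameter file; the folder, workflow, session, and path names are placeholders. Otherwise, set $PMCacheDir on the Integration Service process in the Administration Console.

[MyFolder.WF:wf_incr_agg.ST:s_incr_agg]
$PMCacheDir=/data/infa/cache/incr_agg

Whichever way you set it, remember the warning above: pointing sessions at a new directory without moving the existing index and data files reinitializes the aggregate cache.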

In a grid, Integration Services rebuild incremental aggregation files they cannot find. When an Integration Service rebuilds incremental aggregation files, it loses aggregate history.

(ii) Verify the incremental aggregation settings in the session properties.

You can configure the session for incremental aggregation in the Performance settings on the Properties tab.

You can also configure the session to reinitialize the aggregate cache. If you choose to reinitialize the cache, the Workflow Manager displays a warning indicating the Integration Service overwrites the existing cache and a reminder to clear this option after running the session.


Mapping Templates

A mapping template is a drawing in Visio that represents a PowerCenter mapping. You can configure rules and parameters in a mapping template to specify the transformation logic.

Use the Informatica Stencil and the Informatica toolbar in the Mapping Architect for Visio to create a mapping template. The Informatica Stencil contains shapes that represent mapping objects that you can use to create a mapping template. The Informatica toolbar contains buttons for the tasks you can perform on a mapping template.

You can create a mapping template manually, or you can create a mapping template by importing a Power Center mapping.

Creating a Mapping Template Manually:

You can use the Informatica Stencil and the Informatica toolbar to create a mapping template. Save and publish a mapping template to create the mapping template files.

To create a mapping template manually, complete the following steps:

1. Start Mapping Architect for Visio.

2. Verify that the Informatica Stencil and Informatica toolbar are available.

3. Drag the mapping objects from the Informatica Stencil to the drawing window:- Use the mapping objects to create a visual representation of the mapping.

4. Create links:- Create links to connect mapping objects.

5. Configure link rules:- Configure rules for each link in the mapping template to indicate how data moves from one mapping object to another. Use parameters to make the rules flexible.

6. Configure the mapping objects:- Add a group or expression required by the transformations in the mapping template. To create multiple mappings, set a parameter for the source or target definition.

7. Declare mapping parameters and variables to use when you run sessions in Power Center:- After you import the mappings created from the mapping template into Power Center, you can use the mapping parameters and variables in the session or workflow.

8. Validate the mapping template.

9. Save the mapping template:- Save changes to the mapping template drawing file.

10. Publish the mapping template:- When you publish the mapping template, Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param.xml). If you edit the mapping template drawing file after you publish it, you need to publish it again. Do not edit the mapping template XML file.

Importing a Mapping Template from a PowerCenter Mapping:

If you have a Power Center mapping that you want to use as a basis for a mapping template, export the mapping to a mapping XML file and then use the mapping XML file to create a mapping template.

Note: Export the mapping XML file within the current Power Center release. Informatica does not support imported objects from a different release.

To import a mapping template from a Power Center mapping, complete the following steps:

1. Export a Power Center mapping. In the Designer, select the mapping that you want to base the mapping template on and export it to an XML file.

2. Start Mapping Architect for Visio.

3. Verify that the Informatica stencil and Informatica toolbar are available.

4. Import the mapping. On the Informatica toolbar, click the Create Template from Mapping XML button. Mapping Architect for Visio determines the mapping objects and links included in the mapping and adds the appropriate objects to the drawing window.

5. Verify links. Create or verify links that connect mapping objects.

6. Configure link rules. Configure rules for each link in the mapping template to indicate how data moves from one mapping object to another. Use parameters to make the rules flexible.

7. Configure the mapping objects. Add a group or expression required by the transformations in the mapping template. To create multiple mappings, set a parameter for the source or target definition.

8. Declare mapping parameters and variables to use when you run the session in Power Center. After you import the mappings created from the mapping template into Power Center, you can use the mapping parameters and variables in the session or workflow.

Note: If the Power Center mapping contains mapping parameters and variables, it is possible that the mapping parameters and variables ($$ParameterName) may not work for all mappings you plan to create from the mapping template. Modify or declare new mapping parameters and variables appropriate for running the new mappings created from the mapping template.

9. Validate the mapping template.

10. Save the mapping template. Save changes to the mapping template drawing file.

11. Publish the mapping template. When you publish the mapping template, Mapping Architect for Visio generates a mapping template XML file and a mapping template parameter file (param.xml).

If you make any change to the mapping template after publishing, you need to publish the mapping template again. Do not edit the mapping template XML file.

Note: Mapping Architect for Visio fails to create a mapping template if you import a mapping that includes an unsupported source type, target type, or mapping object.


Grid Processing

When a Power Center domain contains multiple nodes, you can configure workflows and sessions to run on a grid. When you run a workflow on a grid, the Integration Service runs a service process on each available node of the grid to increase performance and scalability. When you run a session on a grid, the Integration Service distributes session threads to multiple DTM processes on nodes in the grid to increase performance and scalability. You create the grid and configure the Integration Service in the Administration Console. To run a workflow on a grid, you configure the workflow to run on the Integration Service associated with the grid. To run a session on a grid, configure the session to run on the grid.

The Integration Service distributes workflow tasks and session threads based on how you configure the workflow or session to run:

Running workflows on a grid. The Integration Service distributes workflows across the nodes in a grid. It also distributes the Session, Command, and predefined Event-Wait tasks within workflows across the nodes in a grid.

Running sessions on a grid. The Integration Service distributes session threads across nodes in a grid.

Note: To run workflows on a grid, you must have the Server grid option. To run sessions on a grid, you must have the Session on Grid option.
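Once a workflow is configured to run on the Integration Service associated with the grid, you start it the same way as any other workflow. For example, from the command line with pmcmd (the service, domain, user, folder, and workflow names below are placeholders):

pmcmd startworkflow -sv IS_GRID -d Domain_Dev -u Administrator -p <password> -f DW_FOLDER wf_load_sales

The grid itself is transparent to the caller; the Integration Service decides which nodes run the workflow tasks and session threads.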

Running Workflows on a Grid:

When you run a workflow on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks, which it may distribute to other nodes. The master service process is the Integration Service process that runs the workflow, monitors service processes running on other nodes, and runs the Load Balancer. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows.

The Load Balancer is the component of the Integration Service that dispatches Session, Command, and predefined Event-Wait tasks to the nodes in the grid. The Load Balancer distributes tasks based on node availability. If the Integration Service is configured to check resources, the Load Balancer also distributes tasks based on resource availability.

For example, a workflow contains a Session task, a Decision task, and a Command task. You specify a resource requirement for the Session task. The grid contains four nodes, and Node 4 is unavailable. The master service process runs the Start and Decision tasks. The Load Balancer distributes the Session and Command tasks to nodes on the grid based on resource availability and node availability.

Running Sessions on a Grid:

When you run a session on a grid, the master service process runs the workflow and all tasks except Session, Command, and predefined Event-Wait tasks as it does when you run a workflow on a grid. The Scheduler runs on the master service process node, so it uses the date and time for the master service process node to start scheduled workflows. In addition, the Load Balancer distributes session threads to DTM processes running on different nodes.

When you run a session on a grid, the Load Balancer distributes session threads based on the following factors:

Node availability:- The Load Balancer verifies which nodes are currently running, enabled, and available for task dispatch.

Resource availability:- If the Integration Service is configured to check resources, it identifies nodes that have the resources required by mapping objects in the session.

Partitioning configuration:- The Load Balancer dispatches groups of session threads to separate nodes based on the partitioning configuration.

You might want to configure a session to run on a grid when the workflow contains a session that takes a long time to run.

Grid Connectivity and Recovery

When you run a workflow or session on a grid, service processes and DTM processes run on different nodes. Network failures can cause connectivity loss between processes running on separate nodes. Services may shut down unexpectedly, or you may disable the Integration Service or service processes while a workflow or session is running. The Integration Service failover and recovery behavior in these situations depends on the service process that is disabled, shuts down, or loses connectivity. Recovery behavior also depends on the following factors:

High availability option:- When you have high availability, workflows fail over to another node if the node or service shuts down. If you do not have high availability, you can manually restart a workflow on another node to recover it.

Recovery strategy:- You can configure a workflow to suspend on error. You configure a recovery strategy for tasks within the workflow. When a workflow suspends, the recovery behavior depends on the recovery strategy you configure for each task in the workflow.

Shutdown mode:- When you disable an Integration Service or service process, you can specify that the service completes, aborts, or stops processes running on the service. Behavior differs when you disable the Integration Service or you disable a service process. Behavior also differs when you disable a master service process or a worker service process. The Integration Service or service process may also shut down unexpectedly. In this case, the failover and recovery behavior depend on which service process shuts down and the configured recovery strategy.

Running mode:- If the workflow runs on a grid, the Integration Service can recover workflows and tasks on another node. If a session runs on a grid, you cannot configure a resume recovery strategy.

Operating mode:- If the Integration Service runs in safe mode, recovery is disabled for sessions and workflows.

Note: You cannot configure an Integration Service to fail over in safe mode if it runs on a grid.

Workflow Variables


You can create and use variables in a workflow to reference values and record information. For example, use a variable in a Decision task to determine whether the previous task ran properly. If it did, you can run the next task. If not, you can stop the workflow. Use the following types of workflow variables:

Predefined workflow variables. The Workflow Manager provides predefined workflow variables for tasks within a workflow.

User-defined workflow variables. You create user-defined workflow variables when you create a workflow.

Use workflow variables when you configure the following types of tasks:

Assignment tasks. Use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.

Decision tasks. Decision tasks determine how the Integration Service runs a workflow. For example, use the Status variable to run a second session only if the first session completes successfully.

Links. Links connect each workflow task. Use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when the decision condition evaluates to false.

Timer tasks. Timer tasks specify when the Integration Service begins to run the next task in the workflow. Use a user-defined date/time variable to specify the time the Integration Service starts to run the next task.

Use the following keywords to write expressions for user-defined and predefined workflow variables: AND, OR, NOT, TRUE, FALSE, NULL, SYSDATE.
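For example, a link condition that lets the workflow continue only when the preceding session succeeds and writes no failed rows could look like the following (the session name s_load_orders is illustrative):

$s_load_orders.Status = SUCCEEDED AND $s_load_orders.TgtFailedRows = 0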

Predefined Workflow Variables:

Each workflow contains a set of predefined variables that you use to evaluate workflow and task conditions. Use the following types of predefined variables:

Task-specific variables. The Workflow Manager provides a set of task-specific variables for each task in the workflow. Use task-specific variables in a link condition to control the path the Integration Service takes when running the workflow. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.

Built-in variables. Use built-in variables in a workflow to return run-time or system information such as folder name, Integration Service Name, system date, or workflow start time. The Workflow Manager lists built-in variables under the Built-in node in the Expression Editor.

Task-Specific Variables

Each entry below lists the variable, the task types it applies to, its data type, and a description with sample syntax.

Condition (Decision; Integer): Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null. Sample syntax: $Dec_TaskStatus.Condition = <TRUE | FALSE | NULL | any integer>

EndTime (All tasks; Date/Time): Date and time the associated task ended. Precision is to the second. Sample syntax: $s_item_summary.EndTime > TO_DATE('11/10/2004 08:13:25')

ErrorCode (All tasks; Integer): Last error code for the associated task. If there is no error, the Integration Service sets ErrorCode to 0 when the task completes. Sample syntax: $s_item_summary.ErrorCode = 24013. Note: You might use this variable when a task consistently fails with this final error message.

ErrorMsg (All tasks; Nstring): Last error message for the associated task. If there is no error, the Integration Service sets ErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters. Sample syntax: $s_item_summary.ErrorMsg = 'PETL_24013 Session run completed with failure'. Note: You might use this variable when a task consistently fails with this final error message.

FirstErrorCode (Session; Integer): Error code for the first error message in the session. If there is no error, the Integration Service sets FirstErrorCode to 0 when the session completes. Sample syntax: $s_item_summary.FirstErrorCode = 7086

FirstErrorMsg (Session; Nstring): First error message in the session. If there is no error, the Integration Service sets FirstErrorMsg to an empty string when the task completes. Variables of type Nstring can have a maximum length of 600 characters. Sample syntax: $s_item_summary.FirstErrorMsg = 'TE_7086 Tscrubber: Debug info… Failed to evalWrapUp'

PrevTaskStatus (All tasks; Integer): Status of the previous task in the workflow that the Integration Service ran. Statuses include ABORTED, FAILED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the previous task. Sample syntax: $Dec_TaskStatus.PrevTaskStatus = FAILED

SrcFailedRows (Session; Integer): Total number of rows the Integration Service failed to read from the source. Sample syntax: $s_dist_loc.SrcFailedRows = 0

SrcSuccessRows (Session; Integer): Total number of rows successfully read from the sources. Sample syntax: $s_dist_loc.SrcSuccessRows > 2500

StartTime (All tasks; Date/Time): Date and time the associated task started. Precision is to the second. Sample syntax: $s_item_summary.StartTime > TO_DATE('11/10/2004 08:13:25')

Status (All tasks; Integer): Status of the previous task in the workflow. Statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, and SUCCEEDED. Use these keywords when writing expressions to evaluate the status of the current task. Sample syntax: $s_dist_loc.Status = SUCCEEDED

TgtFailedRows (Session; Integer): Total number of rows the Integration Service failed to write to the target. Sample syntax: $s_dist_loc.TgtFailedRows = 0

TgtSuccessRows (Session; Integer): Total number of rows successfully written to the target. Sample syntax: $s_dist_loc.TgtSuccessRows > 0

TotalTransErrors (Session; Integer): Total number of transformation errors. Sample syntax: $s_dist_loc.TotalTransErrors = 5

User-Defined Workflow Variables:

You can create variables within a workflow. When you create a variable in a workflow, it is valid only in that workflow. Use the variable in tasks within that workflow. You can edit and delete user-defined workflow variables.

Use user-defined variables when you need to make a workflow decision based on criteria you specify. For example, you create a workflow to load data to an orders database nightly. You also need to load a subset of this data to headquarters periodically, every tenth time you update the local orders database. Create separate sessions to update the local database and the one at headquarters.

Use a user-defined variable to determine when to run the session that updates the orders database at headquarters.

To configure user-defined workflow variables, complete the following steps:

1. Create a persistent workflow variable, $$WorkflowCount, to represent the number of times the workflow has run.

2. Add a Start task and both sessions to the workflow.

3. Place a Decision task after the session that updates the local orders database. Set up the decision condition to check whether the number of workflow runs is evenly divisible by 10. Use the modulus (MOD) function to do this (see the sample expressions after these steps).

4. Create an Assignment task to increment the $$WorkflowCount variable by one.

5. Link the Decision task to the session that updates the database at headquarters when the decision condition evaluates to true. Link it to the Assignment task when the decision condition evaluates to false. When you configure workflow variables using conditions, the session that updates the local database runs every time the workflow runs. The session that updates the database at headquarters runs every 10th time the workflow runs.
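A minimal sketch of the expressions involved in these steps (the condition goes in the Decision task, the increment in the Assignment task):

Decision task condition: MOD($$WorkflowCount, 10) = 0

Assignment task: assign $$WorkflowCount the expression $$WorkflowCount + 1

Because $$WorkflowCount is persistent, the incremented value is saved to the repository at the end of the run and carried into the next workflow run.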

Creating User-Defined Workflow Variables :

You can create workflow variables for a workflow in the workflow properties.

To create a workflow variable:

1. In the Workflow Designer, create a new workflow or edit an existing one.

2. Select the Variables tab.

3. Click Add.

4. Enter the following information and click OK:

Name: Variable name. The correct format is $$VariableName. Workflow variable names are not case sensitive. Do not use a single dollar sign ($) for a user-defined workflow variable; the single dollar sign is reserved for predefined workflow variables.

Data type: Data type of the variable. You can select from the following data types: Date/Time, Double, Integer, Nstring.

Persistent: Whether the variable is persistent. Enable this option if you want the value of the variable retained from one execution of the workflow to the next.

Default Value: Default value of the variable. The Integration Service uses this value for the variable during sessions if you do not set a value for the variable in the parameter file and there is no value stored in the repository. Variables of type Date/Time can have the following formats: MM/DD/RR, MM/DD/YYYY, MM/DD/RR HH24:MI, MM/DD/YYYY HH24:MI, MM/DD/RR HH24:MI:SS, MM/DD/YYYY HH24:MI:SS, MM/DD/RR HH24:MI:SS.MS, MM/DD/YYYY HH24:MI:SS.MS, MM/DD/RR HH24:MI:SS.US, MM/DD/YYYY HH24:MI:SS.US, MM/DD/RR HH24:MI:SS.NS, MM/DD/YYYY HH24:MI:SS.NS. You can use the following separators: dash (-), slash (/), backslash (\), colon (:), period (.), and space. The Integration Service ignores extra spaces. You cannot use one- or three-digit values for year or the "HH12" format for hour. Variables of type Nstring can have a maximum length of 600 characters.

Is Null: Whether the default value of the variable is null. If the default value is null, enable this option.

Description: Description associated with the variable.

5. To validate the default value of the new workflow variable, click the Validate button.

6. Click Apply to save the new workflow variable.

7. Click OK.
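You can also seed or override a user-defined workflow variable from the parameter file the workflow uses. A minimal sketch, with the folder and workflow names as placeholders:

[MyFolder.WF:wf_load_orders]
$$WorkflowCount=0

A value supplied in the parameter file takes precedence for that run over a persistent value stored in the repository and over the default value configured on the Variables tab.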

Interview Zone

Hi readers. These are the questions I would normally expect an interviewee to know when I sit on an interview panel. I request my readers to start posting their answers to these questions in the discussion forum under the Informatica technical interview guidance tag; I will review them, keep only the valid answers, and delete the rest.

1. Explain your Project?

2. What are your Daily routines?

3. How many mapping have you created all together in your project?

4. In which account does your Project Fall?

5. What is your Reporting Hierarchy?

6. How many complex mappings have you created? Could you please describe the situation for which you developed that complex mapping?

7. What is your Involvement in Performance tuning of your Project?

8. What is the Schema of your Project? And why did you opt for that particular schema?

9. What are your Roles in this project?


10. Can you describe one situation where an approach you adopted improved performance dramatically?

11. Were you involved in more than two projects simultaneously?

12. Do you have any experience in the Production support?

13. What kinds of testing have you done on your project (Unit, Integration, System, or UAT)? And what enhancements were done after testing?

14. How many dimension tables are there in your project and how are they linked to the fact table?

15. How do we do the Fact Load?

16. How did you implement CDC in your project?

17. What does your File to Load mapping look like?

18. What does your Load to Stage mapping look like?

19. What does your Stage to ODS mapping look like?

20. What is the size of your Data warehouse?

21. What is your Daily feed size and weekly feed size?

22. Which Approach (Top down or Bottom Up) was used in building your project?

23. How do you access your sources (are they flat files or relational)?

24. Have you developed any Stored Procedure or triggers in this project? How did you use them and in which situation?

25. Did your Project go live? What are the issues that you have faced while moving your project from the Test Environment to the Production Environment?

26. What is the biggest Challenge that you encountered in this project?

27. What is the scheduler tool you have used in this project? How did you schedule jobs using it?

Informatica Experienced Interview Questions – part 1

1. Difference between Informatica 7x and 8x?
2. Difference between connected and unconnected Lookup transformations in Informatica?
3. Difference between stop and abort in Informatica?
4. Difference between static and dynamic caches?
5. What is a persistent lookup cache? What is its significance?
6. Difference between a reusable transformation and a mapplet?
7. How does the Informatica server sort string values in a Rank transformation?
8. Is the Sorter an active or passive transformation? When do we consider it to be active, and when passive?
9. Explain the Informatica server architecture.
10. In an Update Strategy, which gives more performance, a relational table or a flat file? Why?
11. What are the output files that the Informatica server creates while running a session?
12. Can you explain what error tables in Informatica are and how we do error handling in Informatica?
13. Difference between constraint-based loading and target load plan?
14. Difference between the IIF and DECODE functions?
15. How do you import an Oracle sequence into Informatica?
16. What is a parameter file?
17. Difference between normal load and bulk load?
18. How will you create a header and footer in the target using Informatica?
19. What are the session parameters?
20. Where does Informatica store rejected data? How do we view it?
21. What is the difference between partitioning of relational targets and file targets?
22. What are mapping parameters and variables? In which situations can we use them?
23. What do you mean by direct loading and indirect loading in session properties?
24. How do we implement a recovery strategy while running concurrent batches?
25. Explain the versioning concept in Informatica.
26. What is data driven?
27. What is a batch? Explain the types of batches.
28. What are the types of metadata that the repository stores?
29. Can you use the mapping parameters or variables created in one mapping in another mapping?
30. Why do we use stored procedures in an ETL application?
31. When we can join tables at the Source Qualifier itself, why do we go for the Joiner transformation?
32. What is the default join operation performed by the Lookup transformation?
33. What is a hash table in Informatica?
34. In a Joiner transformation, you should specify the table with fewer rows as the master table. Why?
35. Difference between cached lookup and uncached lookup?
36. Explain what the DTM does when you start a workflow.
37. Explain what the Load Manager does when you start a workflow.
38. In a sequential batch, how do I stop one particular session from running?
39. What are the types of aggregations available in Informatica?
40. How do I create indexes after the load process is done?
41. How do we improve the performance of the Aggregator transformation?
42. What are the different types of caches available in Informatica? Explain in detail.
43. What is polling?
44. What are the limitations of the Joiner transformation?
45. What is a mapplet?
46. What are active and passive transformations?


47. What are the options in the target session of an Update Strategy transformation?
48. What is a code page? Explain the types of code pages.
49. What do you mean by rank cache?
50. How can you delete duplicate rows without using a Dynamic Lookup? Tell me any other ways of deleting duplicate rows using a lookup.
51. Can you copy a session into a different folder or repository?
52. What is tracing level and what are its types?
53. What is the command used to run a batch?
54. What are the unsupported repository objects for a mapplet?
55. If your workflow is running slow, what is your approach towards performance tuning?
56. What are the types of mapping wizards available in Informatica?
57. After dragging the ports of three sources (SQL Server, Oracle, Informix) to a single Source Qualifier, can we map these three ports directly to a target?
58. Why do we use the Stored Procedure transformation?
59. Which object is required by the debugger to create a valid debug session?
60. Can we use an active transformation after an Update Strategy transformation?
61. Explain how we set the update strategy at the mapping level and at the session level.
62. What is the exact use of the 'Online' and 'Offline' server connect options while defining a workflow in the Workflow Monitor? (The system hangs with the 'Online' server connect option; Informatica is installed on a personal laptop.)
63. What is change data capture?
64. Write a session parameter file that will change the source and targets for every session, i.e. different sources and targets for each session run.
65. What are partition points?
66. What are the different threads in the DTM process?
67. Can we do ranking on two ports? If yes, explain how.
68. What is a transformation?
69. What does the Stored Procedure transformation do that is special compared to other transformations?
70. How do you recognize whether the newly added rows got inserted or updated?
71. What is data cleansing?
72. My flat file's size is 400 MB and I want to see the data inside the file without opening it. How do I do that?
73. Difference between Filter and Router?
74. How do you handle decimal places when you are importing a flat file?
75. What is the difference between $ and $$ in a mapping or parameter file? In which cases are they generally used?
76. While importing a relational source definition from a database, what metadata of the source do you import?
77. Difference between PowerMart and PowerCenter?
78. What kinds of sources and targets can be used in Informatica?
79. If a Sequence Generator (with an increment of 1) is connected to, say, 3 targets and each target uses the NEXTVAL port, what value will each target get?
80. What do you mean by SQL override?
81. What is a shortcut in Informatica?
82. How does Informatica do variable initialization for Number, String, and Date types?
83. How many different locks are available for repository objects?
84. What are the transformations that use a cache for performance?
85. What is the use of Forward/Reject rows in a mapping?
86. In how many ways can you filter records?
87. How do you delete duplicate records from a source database or flat files? Can we use post SQL to delete these records? In the case of a flat file, how can you delete duplicates before it starts loading?
88. You are required to perform "bulk loading" using Informatica on Oracle. What actions would you perform at the Informatica and Oracle levels for a successful load?
89. What precautions do you need to take when you use a reusable Sequence Generator transformation for concurrent sessions?
90. Is a negative increment possible in the Sequence Generator? If yes, how would you accomplish it?
91. In which directory does Informatica look for the parameter file, and what happens if it is missing when you start the session? Does the session stop after it starts?
92. Informatica is complaining that the server could not be reached. What steps would you take?
93. You have more than five mappings that use the same lookup. How can you manage the lookup?
94. What will happen if you copy a mapping from one repository to another repository and there is no identical source?
95. How can you limit the number of running sessions in a workflow?
96. An Aggregator transformation has 4 ports (sum(col1), group by col2, col3); which port should be the output?
97. What is a dynamic lookup and what is the significance of NewLookupRow? How will you use them for rejecting duplicate records?
98. If you have more than one pipeline in your mapping, how will you change the order of load?
99. When you export a workflow from Repository Manager, what does the XML contain? Workflow only?
100. Your session failed, and when you try to open a log file it complains that the session details are not available. How would you trace the error? Which log file would you look at?
101. You want to attach a file as an email attachment from a particular directory using the Email task in Informatica. How will you do it?
102. You have a requirement to be alerted of any long-running sessions in your workflow. How can you create a workflow that will send you an email for sessions running more than 30 minutes? You can use any method: shell script, procedure, or an Informatica mapping or workflow control.

Data warehousing Concepts Based Interview Questions

1. What is a data-warehouse?


2. What are Data Marts?

3. What is ER Diagram?

4. What is a Star Schema?

5. What is Dimensional Modelling?

6. What is a Snowflake Schema?

7. What are the Different methods of loading Dimension tables?

8. What are Aggregate tables?

9. What is the Difference between OLTP and OLAP?

10. What is ETL?

11. What are the various ETL tools in the Market?

12. What are the various Reporting tools in the Market?

13. What is Fact table?

14. What is a dimension table?

15. What is a lookup table?

16. What is a general purpose scheduling tool? Name some of them?

17. What are modeling tools available in the Market? Name some of them?

18. What is real time data-warehousing?

19. What is data mining?

20. What is Normalization? First Normal Form, Second Normal Form , Third Normal Form?

21. What is ODS?

22. What type of indexing mechanism do we need to use for a typical data warehouse?

23. Which columns go to the fact table and which columns go to the dimension table? (My user needs to see <data element> <data element> broken by <data element> <data element>.)

All elements before "broken by" = fact measures

All elements after "broken by" = dimension elements

24. What is the level of granularity of a fact table? What does this signify? (With weekly-level summarization, there is no need to have the invoice number in the fact table anymore.)

25. How are the Dimension tables designed? De-Normalized, Wide, Short, Use Surrogate Keys, Contain Additional date fields and flags.

26. What are slowly changing dimensions?

27. What are non-additive facts? (Inventory, account balances in a bank)

28. What are conformed dimensions?

29. What is a VLDB? (If a database is too large to back up in the available time frame, it is a VLDB.)

30. What are SCD1, SCD2 and SCD3?