Informatica Questionnaire
Business Intelligence - ambarish

1. What are the components of Informatica? And what is the purpose of each?
Ans: Informatica Designer, Server Manager & Repository Manager. The Designer is for creating source and target definitions, mapplets and mappings. The Server Manager is for creating sessions and batches, scheduling them, monitoring triggered sessions and batches, issuing pre- and post-session commands, creating database connections to various instances, etc. The Repository Manager is for creating and adding repositories, creating and editing folders within a repository, establishing users, groups, privileges and folder permissions, copying, deleting and backing up a repository, viewing the history of sessions, and viewing the locks on various objects and removing those locks.

2. What is a repository? And how do you add it in an Informatica client?
Ans: It is the location where all mapping- and session-related information is stored; basically it is a database where the metadata resides. We can add a repository through the Repository Manager.

3. Name at least 5 different types of transformations used in mapping design and state the use of each.
Ans: Source Qualifier – represents all data queried from the source; Expression – performs simple calculations; Filter – serves as a conditional filter; Lookup – looks up values and passes them to other objects; Aggregator – performs aggregate calculations.

4. How can a transformation be made reusable?
Ans: In the edit properties of any transformation there is a check box to make it reusable; by checking it, the transformation becomes reusable. You can also create reusable transformations in the Transformation Developer.

5. How are source and target definitions imported in the Informatica Designer? How do you create a target definition for flat files?
Ans: In the Source Analyzer there is an option in the main menu to import the source from Database, Flat File, COBOL File or XML file; by selecting any one of them you can import a source definition. In the Warehouse Designer there is an option in the main menu to import the target from Database, XML from File or XML from Sources. There is no way to import a target definition as a file in the Informatica Designer, so the target definition for a file is created in the Warehouse Designer as if it were a table, and then in the session properties of that mapping it is specified as a file.

6. Explain what an SQL override is for a source table in a mapping.
Ans: The Source Qualifier provides the SQL Query option to override the default query. You can enter any SQL statement supported by your source database. You might enter your own SELECT statement, have the database perform aggregate calculations, or call a stored procedure or stored function to read the data and perform some tasks.
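For question 6, a minimal sketch of a Source Qualifier SQL override (the EMP and DEPT table and column names are illustrative assumptions, not part of the original answer):

SELECT EMP.EMPNO, EMP.ENAME, EMP.SAL, DEPT.DNAME
FROM EMP, DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO
AND EMP.SAL > 1000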

    7. What is lookup override?


Ans: This feature is similar to entering a custom query in a Source Qualifier transformation. When entering a lookup SQL override, you can enter the entire override, or generate and edit the default SQL statement. The lookup query override can include a WHERE clause.
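A minimal sketch of a lookup SQL override that adds a WHERE clause to the generated default query (the CUSTOMERS table, its columns, and the port aliases are illustrative assumptions):

SELECT CUSTOMERS.CUSTOMER_NAME AS CUSTOMER_NAME,
CUSTOMERS.CUSTOMER_ID AS CUSTOMER_ID
FROM CUSTOMERS
WHERE CUSTOMERS.STATUS = 'ACTIVE'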

8. What are mapplets? How is a mapplet different from a reusable transformation?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. You create mapplets in the Mapplet Designer. It is different from a reusable transformation in that it may contain a set of transformations, while a reusable transformation is a single one.

9. How do you use an Oracle sequence generator in a mapping?
Ans: We have to write a stored procedure which takes the sequence name as input and dynamically generates a NEXTVAL from that sequence. Then in the mapping we can call that stored procedure through a Stored Procedure transformation.
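A minimal PL/SQL sketch of such a procedure, written here as a function so the value can be returned to the mapping (the name GET_NEXTVAL and its parameter are illustrative assumptions; it uses native dynamic SQL, whereas very old Oracle releases would use DBMS_SQL instead):

CREATE OR REPLACE FUNCTION GET_NEXTVAL (p_seq_name IN VARCHAR2)
RETURN NUMBER
IS
  v_next NUMBER;
BEGIN
  -- Build and run "SELECT <sequence>.NEXTVAL FROM DUAL" for the named sequence
  EXECUTE IMMEDIATE 'SELECT ' || p_seq_name || '.NEXTVAL FROM DUAL' INTO v_next;
  RETURN v_next;
END;
/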

10. What is a session and how do you create it?
Ans: A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. You create and maintain sessions in the Server Manager.

11. How do you create the source and target database connections in the Server Manager?
Ans: In the main menu of the Server Manager there is a Server Configuration menu, and within it a Database Connections menu. From here you can create the source and target database connections.

12. Where are the source flat files kept before running the session?
Ans: The source flat files can be kept in a folder on the Informatica Server or on any other machine in its domain.

13. What are the Oracle DML commands possible through an Update Strategy?
Ans: DD_INSERT, DD_UPDATE, DD_DELETE & DD_REJECT.

14. How do you update or delete rows in a target that does not have key fields?
Ans: To update a table that does not have any keys, we can do an SQL override of the target transformation by specifying the WHERE conditions explicitly (see the sample override after question 17). A delete cannot be done this way; in that case you have to specifically mark a key on the target table definition in the Warehouse Designer and delete the row using an Update Strategy transformation.

15. What is the option by which we can run all the sessions in a batch simultaneously?
Ans: In the batch edit box there is an option called Concurrent. By checking it, all the sessions in that batch will run concurrently.

16. Informatica settings are available in which file?
Ans: Informatica settings are available in the file pmdesign.ini in the Windows folder.

17. How can we join records from two heterogeneous sources in a mapping?
Ans: By using a Joiner transformation.
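The sample target update override referenced in question 14 (the T_SALES table and its columns are illustrative assumptions; :TU.port_name refers to the value arriving at the target transformation):

UPDATE T_SALES
SET SALES_AMOUNT = :TU.SALES_AMOUNT
WHERE REGION = :TU.REGION
AND SALE_DATE = :TU.SALE_DATE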


18. Difference between connected and unconnected lookup.
Ans: An unconnected Lookup transformation exists separate from the pipeline in the mapping; you write an expression using the :LKP reference qualifier to call the lookup within another transformation. A connected lookup, by contrast, forms part of the whole flow of the mapping.

19. Difference between a Lookup transformation and an unconnected Stored Procedure transformation. Which one is faster?

20. Compare Router vs. Filter, and Source Qualifier vs. Joiner.
Ans: A Router transformation has input ports and output ports; input ports reside in the input group, and output ports reside in the output groups. In a Router you can test data based on one or more group filter conditions, whereas in a Filter you can filter data based on one or more conditions before writing it to targets. A Source Qualifier can join data coming from the same source database, while a Joiner is used to combine data from heterogeneous sources (it can even join data from two tables in the same database). A Source Qualifier can join more than two sources, but a Joiner can join only two sources.

21. How do you join 2 tables connected to a Source Qualifier without having any relationship defined?
Ans: By writing an SQL override.
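A minimal sketch of such an override in the Source Qualifier, joining two tables with no relationship defined in the repository (the ORDERS and CUSTOMERS names and the join column are illustrative assumptions):

SELECT ORDERS.ORDER_ID, ORDERS.AMOUNT, CUSTOMERS.CUSTOMER_NAME
FROM ORDERS, CUSTOMERS
WHERE ORDERS.CUSTOMER_ID = CUSTOMERS.CUSTOMER_ID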

22. In a mapping there are 2 targets, to load header and detail; how do you ensure that the header table loads first, then the detail table?
Ans: Constraint-based loading (if there is no relationship at the Oracle level), or a Target Load Plan (if there is only one Source Qualifier for both tables), or select the header target table first and then the detail table while dragging them into the mapping.

23. A mapping takes just 10 seconds to run; it takes a source file and inserts into a target. But before that there is a Stored Procedure transformation which takes around 5 minutes to run and gives output Y or N: if Y, continue the feed, or else stop the feed. (Hint: since the stored procedure takes far more time than the mapping, it shouldn't run row-wise.)
Ans: There is an option to run the stored procedure once, before starting to load the rows.

1. Can 2 fact tables share the same dimension tables? How many dimension tables are associated with one fact table in your project?
Ans: Yes.

2. What are ROLAP, MOLAP, and DOLAP?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored. In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a multidimensional manner. In ROLAP, architects believe in storing the data in the relational model; OLAP capabilities are best provided against the relational database. DOLAP is a variation that exists to provide portability for the OLAP user. It creates multidimensional datasets that can be transferred from server to desktop, requiring only the DOLAP software to exist on the target system. This provides significant advantages to portable computer users, such as salespeople who are frequently on the road and do not have direct access to their office server.

3. What is an MDDB? And what is the difference between MDDBs and RDBMSs?
Ans: MDDB stands for multidimensional database. There are two primary technologies used for storing the data used in OLAP applications: multidimensional databases (MDDB) and relational databases (RDBMS). The major difference between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and columns. Multidimensional databases, on the other hand, store their data in large multidimensional arrays. For example, in an MDDB world, you might refer to a sales figure as Sales with Date, Product, and Location coordinates of 12-1-2001, Car, and South, respectively.
Advantages of an MDDB: Retrieval is very fast because the data corresponding to any combination of dimension members can be retrieved with a single I/O, data is clustered compactly in a multidimensional array, values are calculated ahead of time, and the index is small and can therefore usually reside completely in memory. Storage is very efficient because the blocks contain only data, and a single index locates the block corresponding to a combination of sparse dimension numbers.

4. What is MDB modeling and RDB modeling?
Ans:

5. What is a mapplet and how do you create a mapplet?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. Create a mapplet when you want to use a standardized set of transformation logic in several mappings. For example, if you have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key. You can then use the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping. To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.

2. Enter a descriptive mapplet name. The recommended naming convention for mapplets is mpltMappletName.
3. Click OK. The Mapping Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.

6. What are transformations used for?
Ans: Transformations are the manipulation of data from how it appears in the source system(s) into another form in the data warehouse or mart in a way that enhances or simplifies its meaning. In short, you transform data into information. This includes data merging, cleansing, and aggregation:
- Data merging: the process of standardizing data types and fields. Suppose one source system calls integer-type data smallint whereas another calls similar data decimal; the data from the two source systems needs to be rationalized when moved into the Oracle data format called number.
- Cleansing: this involves identifying and correcting inconsistencies or inaccuracies.

  - Eliminating inconsistencies in the data from multiple sources.
  - Converting data from different systems into a single consistent data set suitable for analysis.
  - Meeting a standard for establishing data elements, codes, domains, formats and naming conventions.
  - Correcting data errors and filling in missing data values.
- Aggregation: the process whereby multiple detailed values are combined into a single summary value, typically summation numbers representing dollars spent or units sold.
  - Generates summarized data for use in aggregate fact and dimension tables.

Data transformation is an interesting concept in that some transformation can occur during the extract, some during the transformation step, or even, in limited cases, during the load portion of the ETL process. The type of transformation function you need will most often determine where it should be performed. Some transformation functions could even be performed in more than one place, because many of the transformations you will want to perform already exist in some form or another in more than one of the three environments (source database or application, ETL tool, or the target database).

7. What is the difference between OLTP and OLAP?
Ans: OLTP stands for Online Transaction Processing. This is a standard, normalized database structure. OLTP is designed for transactions, which means that inserts, updates, and deletes must be fast. Imagine a call center that takes orders. Call takers are continually taking calls and entering orders that may contain numerous items. Each order and each item must be inserted into a database. Since the performance of the database is critical, we want to maximize the speed of inserts (and updates and deletes). To maximize performance, we typically try to hold as few records in the database as possible.
OLAP stands for Online Analytical Processing. OLAP is a term that means many things to many people. Here, we will use the terms OLAP and star schema pretty much interchangeably. We will assume that a star schema database is an OLAP system. (This is not the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube structures built using their product, OLAP Services.) Here, we will assume that any system of read-only, historical, aggregated data is an OLAP system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization; that is why many data warehouses are considered to be DSS (decision-support systems). Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data. By read-only, we mean that the person looking at the data won't be changing it. If a user wants to see the sales yesterday for a certain product, they should not have the ability to change that number. The historical part may be just a few minutes old, but usually it is at least a day old. A data warehouse usually holds data that goes back a certain period in time, such as five years. In contrast, standard OLTP systems usually only hold data as long as it is current or active. An order table, for example, may move orders to an archive table once they have been completed, shipped, and received by the customer. When we say that data warehouses and data marts hold aggregated data, we need to stress that there are many levels of aggregation in a typical data warehouse.

8. If the data source is in the form of an Excel spreadsheet, how do you use it?
Ans: PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file. Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import Microsoft Excel sources. To import an Excel source definition, you need to complete the following tasks:
- Install the Microsoft Excel ODBC driver on your system.
- Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
- Prepare the Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
- Import the source definitions in the Designer. Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions when you import the source.

9. Which databases are RDBMSs and which are MDDBs; can you name them?
Ans: MDDB examples: Oracle Express Server (OES), Essbase by Hyperion Software, PowerPlay by Cognos. RDBMS examples: Oracle, SQL Server, etc.

10. What are the modules/tools in Business Objects? Explain their purpose briefly.
Ans: BO Designer, BusinessQuery for Excel, BO Reporter, InfoView, Explorer, WebIntelligence (WEBI), BO Publisher, Broadcast Agent, and BO ZABO.


InfoView: IT portal entry into WebIntelligence and Business Objects; the base module required for all options, used to view and refresh reports.
Reporter: upgrade to create/modify reports on the LAN or Web.
Explorer: upgrade to perform OLAP processing on the LAN or Web.
Designer: creates the semantic layer between user and database.
Supervisor: administers and controls access for groups of users.
WebIntelligence: integrated query, reporting, and OLAP analysis over the Web.
Broadcast Agent: used to schedule, run, publish, push, and broadcast pre-built reports and spreadsheets, including event notification and response capabilities, event filtering, and calendar-based notification, over the LAN, e-mail, pager, fax, Personal Digital Assistant (PDA), Short Messaging Service (SMS), etc.
Set Analyzer: applies set-based analysis to perform functions such as exclusions, intersections, unions, and overlaps visually.
Developer Suite: builds packaged, analytical, or customized applications.

11. What are ad hoc queries and canned queries/reports? And how do you create them? (Please check this page: C:\BObjects\Quries\Data Warehouse - About Queries.htm)
Ans: The data warehouse will contain two types of query. There will be fixed queries that are clearly defined and well understood, such as regular reports, canned queries (standard reports) and common aggregations. There will also be ad hoc queries that are unpredictable, both in quantity and frequency.
Ad hoc query: Ad hoc queries are the starting point for any analysis into a database. Any business analyst wants to know what is inside the database, and then proceeds by calculating totals, averages, maximum and minimum values for most attributes within the database. These are the unpredictable element of a data warehouse. It is exactly that ability to run any query when desired and expect a reasonable response that makes the data warehouse worthwhile, and makes the design such a significant challenge. The end-user access tools are capable of automatically generating the database query that answers any question posed by the user. The user will typically pose questions in terms that they are familiar with (for example, sales by store last week); this is converted into the database query by the access tool, which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain prompts that allow you to customize the query for your specific needs. For example, a prompt may ask you for a school, department, term, or section ID. In this instance you would enter the name of the school, department or term, and the query will retrieve the specified data from the warehouse. You can measure the resource requirements of these queries, and the results can be used for capacity planning and for database design.
The main reason for using a canned query or report rather than creating your own is that your chances of misinterpreting data or getting the wrong answer are reduced. You are assured of getting the right data and the right answer.

12. How many fact tables and how many dimension tables did you use? Which table precedes what?


Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp

13. What is the difference between a star schema and a snowflake schema?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp

14. Why did you choose only a star schema? What are the benefits of a star schema?
Ans: Because of its denormalized structure, i.e., the dimension tables are denormalized. Why denormalize? The first (and often only) answer is: speed. An OLTP structure is designed for data inserts, updates, and deletes, but not data retrieval. Therefore, we can often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same record set. Joins are also confusing to many end users; by denormalizing, we can present the user with a view of the data that is far easier for them to understand.
Benefits of a star schema:
- Far fewer tables.
- Designed for analysis across time.
- Simplifies joins.
- Less database space.
- Supports drilling in reports.
- Flexibility to meet business and technical needs.
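To illustrate the "fewer joins" point, a minimal star-schema query against one fact table and two dimension tables (SALES_FACT, DATE_DIM and PRODUCT_DIM are illustrative names, not from the original):

SELECT d.YEAR, p.PRODUCT_NAME, SUM(f.SALES_AMOUNT)
FROM SALES_FACT f, DATE_DIM d, PRODUCT_DIM p
WHERE f.DATE_KEY = d.DATE_KEY
AND f.PRODUCT_KEY = p.PRODUCT_KEY
GROUP BY d.YEAR, p.PRODUCT_NAME;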

15. How do you load the data using Informatica?
Ans: Using a session.

16. (i) What is FTP? (ii) How do you connect to a remote machine? (iii) Is there another way to use FTP without a special utility?
Ans: (i) The FTP (File Transfer Protocol) utility program is commonly used for copying files to and from other computers. These computers may be at the same site or at different sites thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as other non-UNIX systems.
(ii) Remote connect commands: ftp machinename, e.g. ftp 129.82.45.181 or ftp iesg. If the remote machine has been reached successfully, FTP responds by asking for a login name and password. When you enter your own login name and password for the remote machine, it returns a prompt like ftp> and permits you access to your own home directory on the remote machine. You should be able to move around in your own directory and to copy files to and from your local machine using the FTP interface commands.
Note: you can set the mode of file transfer to ASCII (the default, which transmits seven bits per character). Use ASCII mode with any of the following:
- Raw data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
- SPSS portable files
- HTML files


If you set the mode of file transfer to binary, it transmits all eight bits per byte and thus provides less chance of a transmission error; binary mode must be used to transmit files other than ASCII files. For example, use binary mode for the following types of files:
- SPSS system files
- SAS datasets
- Graphics files (e.g. *.gif, *.jpg, *.bmp, etc.)
- Microsoft Office documents (*.doc, *.xls, etc.)
(iii) Yes. If you are using Windows, you can access a text-based FTP utility from a DOS prompt. To do this, perform the following steps:
1. From the Start menu, choose Programs, then MS-DOS Prompt.
2. Enter ftp ftp.geocities.com; a prompt will appear. (Or enter ftp to get the ftp prompt, then ftp> open hostname, e.g. ftp> open ftp.geocities.com, to connect to the specified host.)
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.

17. What command is used to transfer multiple files at a time using FTP?
Ans: mget copies multiple files from the remote machine to the local machine; you will be prompted for a y/n answer before transferring each file. mget * copies all files in the current remote directory to your current local directory, using the same file names. mput copies multiple files from the local machine to the remote machine.

18. What is a Filter transformation? What options do you have in a Filter transformation?
Ans: The Filter transformation provides the means for filtering records in a mapping. You pass all the rows from a source transformation through the Filter transformation, then enter a filter condition for the transformation. All ports in a Filter transformation are input/output, and only records that meet the condition pass through the Filter transformation.
Note: discarded rows do not appear in the session log or reject files. To maximize session performance, include the Filter transformation as close to the sources in the mapping as possible. Rather than passing records you plan to discard through the mapping, you then filter out unwanted data early in the flow of data from sources to targets. You cannot concatenate ports from more than one transformation into the Filter transformation; the input ports for the filter must come from a single transformation. Filter transformations exist within the flow of the mapping and cannot be unconnected. The Filter transformation does not allow setting output default values.
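A minimal sketch of a filter condition as entered in the Filter transformation (the SAL and DEPTNO ports are illustrative assumptions; rows for which the condition evaluates to FALSE are dropped):

SAL > 1000 AND DEPTNO = 10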


19. What are the default sources supported by Informatica PowerMart?
Ans:
- Relational tables, views, and synonyms.
- Fixed-width and delimited flat files that do not contain binary data.
- COBOL files.

20. When do you create the source definition? Can you use this source definition in any transformation?
Ans: When working with a file that contains fixed-width binary data, you must create the source definition. The Designer displays the source definition as a table, consisting of names, datatypes, and constraints. To use a source definition in a mapping, connect it to a Source Qualifier or Normalizer transformation; the Informatica Server uses these transformations to read the source data.

21. What are active and passive transformations?
Ans: Transformations can be active or passive. An active transformation can change the number of records passed through it; a passive transformation never changes the record count. For example, the Filter transformation removes rows that do not meet the filter condition defined in the transformation.

Active transformations that might change the record count include the following: Advanced External Procedure, Aggregator, Filter, Joiner, Normalizer, Rank, and Source Qualifier.
Note: if you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation. You can connect only one of these active transformations to the same transformation or target, since the Informatica Server cannot determine how to concatenate data from different sets of records with different numbers of rows.
Passive transformations that never change the record count include the following: Lookup, Expression, External Procedure, Sequence Generator, Stored Procedure, and Update Strategy.
You can connect any number of these passive transformations, or connect one active transformation with any number of passive transformations, to the same transformation or target.


22. What is a staging area and a work area?
Ans: Staging area:
- Holding tables on the DW server.
- Loaded from the extract process.
- Input for integration/transformation.
- May function as work areas.
- Output to a work area or fact table.
Work area:
- Temporary tables.
- Memory.

23. What is metadata? (Please refer to Data Warehousing in the Real World, page 125.)
Ans: Definition: data about data. Metadata contains descriptive data for end users. In a data warehouse the term metadata is used in a number of different situations. Metadata is used for:
- Data transformation and load
- Data management
- Query management

Data transformation and load: Metadata may be used during data transformation and load to describe the source data and any changes that need to be made. The advantage of storing metadata about the data being transformed is that as source data changes, the changes can be captured in the metadata, and transformation programs automatically regenerated. For each source data field the following information is required:
Source field:
- Unique identifier (to avoid any confusion occurring between two fields of the same name from different sources).
- Name (local field name).
- Type (storage type of data, such as character, integer, floating point and so on).
- Location: system (the system it comes from, e.g. the accounting system) and object (the object that contains it, e.g. the Account table).
The destination field needs to be described in a similar way to the source:
Destination:
- Unique identifier.
- Name.
- Type (database data type, such as Char, Varchar, Number and so on).
- Table name (name of the table the field will be part of).
The other information that needs to be stored is the transformation or transformations that need to be applied to turn the source data into the destination data:
Transformation:
- Name.
- Language (name of the language the transformation is written in).
- Module name.
- Syntax.


The name is the unique identifier that differentiates this transformation from any other similar transformations. The language attribute contains the name of the language that the transformation is written in. The other attributes are module name and syntax. Generally these will be mutually exclusive, with only one being defined. For simple transformations such as simple SQL functions the syntax will be stored; for complex transformations the name of the module that contains the code is stored instead.

Data management: Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to track and control all data movements. Every object in the database needs to be described. Metadata is needed for all of the following:
- Tables: columns (name, type).
- Indexes: columns (name, type).
- Views: columns (name, type).
- Constraints: name, type, table, columns.
Aggregation and partition information also needs to be stored in the metadata (for details refer to page 30).

Query generation: Metadata is also required by the query manager to enable it to generate queries. The same metadata used by the warehouse manager to describe the data in the data warehouse is also required by the query manager. The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries run and to generate a query profile for each user, group of users, and the data warehouse as a whole. The metadata required for each query is:
- Query: tables accessed; columns accessed (name, reference identifier); restrictions applied (column name, table name, reference identifier, restriction); join criteria applied; aggregate functions used; group by criteria; sort criteria; syntax; execution plan; resources.

24. What kinds of UNIX flavours are you experienced with?
Ans: Solaris 2.5 / SunOS 5.5, Solaris 2.6 / SunOS 5.6, Solaris 2.8 / SunOS 5.8, and AIX 4.0.3. (Release history: SunOS 5.5.1 = Solaris 2.5.1, May 1996, sun4c/sun4m/sun4d/sun4u/x86/ppc; SunOS 5.6 = Solaris 2.6, Aug. 1997, sun4c/sun4m/sun4d/sun4u/x86; SunOS 5.7 = Solaris 7, Oct. 1998, sun4c/sun4m/sun4d/sun4u/x86; SunOS 5.8 = Solaris 8, 2000, sun4m/sun4d/sun4u/x86.)

25. What are the tasks that are done by the Informatica Server?
Ans: The Informatica Server performs the following tasks:
- Manages the scheduling and execution of sessions and batches.
- Executes sessions and batches.
- Verifies permissions and privileges.
- Interacts with the Server Manager and pmcmd.
The Informatica Server moves data from sources to targets based on metadata stored in a repository. For instructions on how to move and transform data, the Informatica Server reads a mapping (a type of metadata that includes transformations and source and target definitions). Each mapping uses a session to define additional information and to optionally override mapping-level options. You can group multiple sessions to run as a single unit, known as a batch.

26. What are the two programs that communicate with the Informatica Server?
Ans: Informatica provides the Server Manager and pmcmd programs to communicate with the Informatica Server:


- Server Manager: a client application used to create and manage sessions and batches, and to monitor and stop the Informatica Server. You can use information provided through the Server Manager to troubleshoot sessions and improve session performance.
- pmcmd: a command-line program that allows you to start and stop sessions and batches, stop the Informatica Server, and verify whether the Informatica Server is running.

27. When do you reinitialize the aggregate cache?
Ans: Reinitializing the aggregate cache overwrites historical aggregate data with new aggregate data. When you reinitialize the aggregate cache, instead of using the captured changes in the source tables, you typically need to use the entire source table. For example, you can reinitialize the aggregate cache if the source for a session changes incrementally every day and completely changes once a month. When you receive the new monthly source, you might configure the session to reinitialize the aggregate cache, truncate the existing target, and use the new source table during the session.
To reinitialize the aggregate cache:
1. In the Server Manager, open the session property sheet.
2. Click the Transformations tab.
3. Check Reinitialize Aggregate Cache.
4. Click OK three times to save your changes.
5. Run the session. The Informatica Server creates a new aggregate cache, overwriting the existing aggregate cache.
6. After running the session, open the property sheet again.
7. Click the Data tab.
8. Clear Reinitialize Aggregate Cache.
9. Click OK.

28. (i) What is target load order in the Designer?
Ans: In the Designer, you can set the order in which the Informatica Server sends records to various target definitions in a mapping. This feature is crucial if you want to maintain referential integrity when inserting, deleting, or updating records in tables that have primary key and foreign key constraints applied to them. The Informatica Server writes data to all the targets connected to the same Source Qualifier or Normalizer simultaneously, to maximize performance.

28. (ii) What are the minimum conditions that you need to have so as to use the target load order option in the Designer?
Ans: You need to have multiple Source Qualifier transformations. To specify the order in which the Informatica Server sends data to targets, create one Source Qualifier or Normalizer transformation for each target within a mapping. To set the target load order, you then determine the order in which each Source Qualifier sends data to connected targets in the mapping. When a mapping includes a Joiner transformation, the Informatica Server sends all records to targets connected to that Joiner at the same time, regardless of the target load order.

28. (iii) How do you set the target load order?
Ans: To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings-Target Load Plan. A dialog box lists all Source Qualifier transformations in the mapping, as well as the targets that receive data from each Source Qualifier.
3. Select a Source Qualifier from the list.
4. Click the Up and Down buttons to move the Source Qualifier within the load order.
5. Repeat steps 3 and 4 for any other Source Qualifiers you wish to reorder.
6. Click OK and choose Repository-Save.

29. What can you do with the Repository Manager?
Ans: We can do the following tasks using the Repository Manager:
- To create usernames, you must have one of the following privileges: the Administer Repository privilege or the Super User privilege.
- To create a user group, you must have one of the following privileges: the Administer Repository privilege or the Super User privilege.
- To assign or revoke privileges, you must have one of the following privileges: the Administer Repository privilege or the Super User privilege.
Note: you cannot change the privileges of the default user groups or the default repository users.

30. What can you do with the Designer?
Ans: The Designer client application provides five tools to help you create mappings:
- Source Analyzer: use to import or create source definitions for flat file, COBOL, ERP, and relational sources.
- Warehouse Designer: use to import or create target definitions.
- Transformation Developer: use to create reusable transformations.
- Mapplet Designer: use to create mapplets.
- Mapping Designer: use to create mappings.
Note: the Designer allows you to work with multiple tools at one time. You can also work in multiple folders and repositories.

31. What are the different types of tracing levels you have in transformations?
Ans: Tracing levels in transformations:
- Terse: indicates when the Informatica Server initializes the session and its components. Summarizes session results, but not at the level of individual records.
- Normal: includes initialization information as well as error messages and notification of rejected data.
- Verbose initialization: includes all information provided with the Normal setting plus more extensive information about initializing transformations in the session.
- Verbose data: includes all information provided with the Verbose initialization setting.
Note: by default, the tracing level for every transformation is Normal. To add a slight performance boost, you can also set the tracing level to Terse, writing the minimum of detail to the session log when running a session containing the transformation.

31. (i) What is the difference between a database, a data warehouse and a data mart?
Ans:
- A database is an organized collection of information.
- A data warehouse is a very large database with special sets of tools to extract and cleanse data from operational systems and to analyze data.
- A data mart is a focused subset of a data warehouse that deals with a single area of data and is organized for quick analysis.

32. What are a data mart, a data warehouse and a decision support system? Explain briefly.
Ans: Data mart: A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse, or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease of use. Users of a data mart can expect to have data presented in terms that are familiar. In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs, while a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may or may not derive from a data warehouse and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.
Data warehouse: A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. The term was coined by W. H. Inmon; IBM sometimes uses the term "information warehouse." Typically, a data warehouse is housed on an enterprise mainframe server. Data from various online transaction processing (OLTP) applications and other sources is selectively extracted and organized in the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access, but does not generally start from the point of view of the end user or knowledge worker who may need access to specialized, sometimes local databases. The latter idea is known as the data mart. Data mining, Web mining, and decision support systems (DSS) are three kinds of applications that can make use of a data warehouse.
Decision support system: A decision support system (DSS) is a computer program application that analyzes business data and presents it so that users can make business decisions more easily. It is an "informational application" (in distinction to an "operational application" that collects the data in the course of normal business operation). Typical information that a decision support application might gather and present would be: comparative sales figures between one week and the next; projected revenue figures based on new product sales assumptions; and the consequences of different decision alternatives, given past experience in a context that is described. A decision support system may present information graphically and may include an expert system or artificial intelligence (AI). It may be aimed at business executives or some other group of knowledge workers.

33. What are the differences between heterogeneous and homogeneous sources?
Ans:
- Heterogeneous: stored in different schemas; stored in different file or database types; spread across several countries; different platform and hardware configurations.
- Homogeneous: common structure; same database type; same data center; same platform and hardware configuration.

34. How do you use DDL commands in a PL/SQL block, e.g. accept a table name from the user and drop it if available, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use dynamic SQL; the package used is DBMS_SQL.

35. What are the steps to work with dynamic SQL?
Ans: Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute the SQL statement of the dynamic cursor, and close the cursor.

36. Which package and procedure are used to find/check the free space available for database objects like tables, procedures, views, synonyms, etc.?
Ans: The package is DBMS_SPACE, the procedure is UNUSED_SPACE, and the table is DBA_OBJECTS.
Note: see the script to find free space at c:\informatica\tbl_free_space.
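A minimal PL/SQL sketch for questions 34-35, dropping a user-supplied table through DBMS_SQL (the variable names are illustrative; &table_name is a SQL*Plus substitution standing in for "accept the table name from the user"; note that DDL executes at parse time, so no separate execute step is needed):

DECLARE
  v_table  VARCHAR2(30) := '&table_name';
  v_cursor INTEGER;
  v_count  NUMBER;
BEGIN
  -- Check whether the table exists in the current schema
  SELECT COUNT(*) INTO v_count
  FROM USER_TABLES
  WHERE TABLE_NAME = UPPER(v_table);
  IF v_count = 1 THEN
    v_cursor := DBMS_SQL.OPEN_CURSOR;
    DBMS_SQL.PARSE(v_cursor, 'DROP TABLE ' || v_table, DBMS_SQL.NATIVE);
    DBMS_SQL.CLOSE_CURSOR(v_cursor);
    DBMS_OUTPUT.PUT_LINE('Table ' || v_table || ' dropped.');
  ELSE
    DBMS_OUTPUT.PUT_LINE('Table ' || v_table || ' not found.');
  END IF;
END;
/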


37. Does Informatica allow it if EmpId is the primary key in the target table and the source data has two rows with the same EmpId? If you use a lookup for the same situation, does it allow loading two rows or only one?
Ans: No, it will not; it generates a primary key constraint violation (it loads one row). Even with a lookup the answer is still no, if EmpId is the primary key.

38. If Ename is varchar2(40) from one source (Siebel) and Ename is char(100) from another source (Oracle), and the target has Name varchar2(50), how does Informatica handle this situation? How does Informatica handle string and number datatypes from sources?

39. How do you debug mappings? Where do you start?

40. How do you query the metadata tables for Informatica?
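For question 40, a minimal sketch, on the assumption that the repository database exposes OPB_MAPPING and OPB_SUBJECT tables as older PowerMart/PowerCenter repositories do (the table and column names here are assumptions to be checked against your repository version):

SELECT s.SUBJ_NAME AS FOLDER_NAME, m.MAPPING_NAME
FROM OPB_SUBJECT s, OPB_MAPPING m
WHERE m.SUBJECT_ID = s.SUBJ_ID
ORDER BY s.SUBJ_NAME, m.MAPPING_NAME;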

41. (i) When do you use a connected lookup and when do you use an unconnected lookup?
Ans: Connected lookups: A connected Lookup transformation is part of the mapping data flow. With connected lookups, you can have multiple return values; that is, you can pass multiple values from the same row in the lookup table out of the Lookup transformation. Common uses for connected lookups include:
- Finding a name based on a number, e.g. finding a Dname based on a deptno.
- Finding a value based on a range of dates.
- Finding a value based on multiple conditions.
Unconnected lookups: An unconnected Lookup transformation exists separate from the data flow in the mapping. You write an expression using the :LKP reference qualifier to call the lookup within another transformation. Some common uses for unconnected lookups include:
- Testing the results of a lookup in an expression.
- Filtering records based on the lookup results.
- Marking records for update based on the result of a lookup (for example, updating slowly changing dimension tables).
- Calling the same lookup multiple times in one mapping.
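A minimal sketch of calling an unconnected lookup from an Expression transformation port (the lookup name LKP_GET_DNAME and the DEPTNO port are illustrative assumptions; a NULL return means no match was found):

IIF(ISNULL(:LKP.LKP_GET_DNAME(DEPTNO)), 'UNKNOWN', :LKP.LKP_GET_DNAME(DEPTNO))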

41. (ii) What are the differences between connected lookups and unconnected lookups?
Ans: Although both types of lookups perform the same basic task, there are some important differences:
- Connected Lookup: part of the mapping data flow. Unconnected Lookup: separate from the mapping data flow.
- Connected Lookup: can return multiple values from the same row. Unconnected Lookup: returns one value from each row.
- Connected Lookup: you link the lookup/output ports to another transformation. Unconnected Lookup: you designate the return value with the Return port (R).
- Connected Lookup: supports default values; if there is no match for the lookup condition, the server returns the default value for all output ports. Unconnected Lookup: does not support default values; if there is no match for the lookup condition, the server returns NULL.
- Connected Lookup: more visible; shows the data passing in and out of the lookup. Unconnected Lookup: less visible; you write an expression using :LKP to tell the server when to perform the lookup.
- Connected Lookup: the cache includes all lookup columns used in the mapping (that is, lookup table columns included in the lookup condition and lookup table columns linked as output ports to other transformations). Unconnected Lookup: the cache includes the lookup/output ports in the lookup condition and the lookup/return port.

42. What do you need to concentrate on after getting an explain plan?
Ans: The three most significant columns in the plan table are named OPERATION, OPTIONS, and OBJECT_NAME. For each step, these tell you which operation is going to be performed and which object is the target of that operation.
To use EXPLAIN PLAN for a query:

SQL> EXPLAIN PLAN
  2  SET STATEMENT_ID = 'PKAR02'
  3  FOR
  4  SELECT JOB, MAX(SAL)
  5  FROM EMP
  6  GROUP BY JOB
  7  HAVING MAX(SAL) >= 5000;

Explained.

To query the plan table:

SQL> SELECT RTRIM(ID)||' '||
  2         LPAD(' ', 2*(LEVEL-1))||OPERATION
  3         ||' '||OPTIONS
  4         ||' '||OBJECT_NAME STEP_DESCRIPTION
  5  FROM PLAN_TABLE
  6  START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
  7  CONNECT BY PRIOR ID = PARENT_ID
  8  AND STATEMENT_ID = 'PKAR02'
  9  ORDER BY ID;

STEP_DESCRIPTION
----------------------------------------------------
0 SELECT STATEMENT
1   FILTER
2     SORT GROUP BY
3       TABLE ACCESS FULL EMP

43. How are components interfaced in PeopleSoft?
Ans:

44. How do you do the analysis of an ETL?
Ans:

45. What are standard and reusable transformations, and what is a mapplet?
Ans: Mappings contain two types of transformations, standard and reusable. Standard transformations exist within a single mapping; you cannot reuse a standard transformation you created in another mapping, nor can you create a shortcut to that transformation. However, you often want to create transformations that perform common tasks, such as calculating the average salary in a department. Since a standard transformation cannot be used by more than one mapping, you have to set up the same transformation each time you want to calculate the average salary in a department.
Mapplet: A mapplet is a reusable object that represents a set of transformations. It allows you to reuse transformation logic and can contain as many transformations as you need. A mapplet can contain transformations, reusable transformations, and shortcuts to transformations.

46. How do you copy a mapping, repository, or session?
Ans: To copy an object (such as a mapping or reusable transformation) from a shared folder, press the Ctrl key and drag and drop the mapping into the destination folder. To copy a mapping from a non-shared folder, simply drag and drop the mapping into the destination folder. In both cases, the destination folder must be open with the related tool active: for example, to copy a mapping, the Mapping Designer must be active; to copy a source definition, the Source Analyzer must be active.
Copying a mapping: To copy the mapping, open a workbook. In the Navigator, click and drag the mapping slightly to the right, without dragging it to the workbook. When asked if you want to make a copy, click Yes, then enter a new name and click OK. Choose Repository-Save.


Copying a repository: You can copy a repository from one database to another. You use this feature before upgrading, to preserve the original repository. Copying repositories provides a quick way to copy all the metadata you want to use as a basis for a new repository. If the database into which you plan to copy the repository contains an existing repository, the Repository Manager deletes the existing repository. If you want to preserve the old repository, cancel the copy, then back up the existing repository before copying the new repository. To copy a repository, you must have one of the following privileges: the Administer Repository privilege or the Super User privilege.
To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
- Repository (required): name for the repository copy. Each repository name must be unique within the domain and should be easily distinguished from all other repositories.
- Database Username (required): username required to connect to the database. This login must have the appropriate database permissions to create the repository.
- Database Password (required): password associated with the database username. Must be in US-ASCII.
- ODBC Data Source (required): data source used to connect to the database.
- Native Connect String (required): connect string identifying the location of the database.
- Code Page (required): character set associated with the repository. Must be a superset of the code page of the repository you want to copy.


If you are not connected to the repository you want to copy, the Repository Manager asks you to log in.
3. Click OK.
4. If asked whether you want to delete the existing repository data in the second repository, click OK to delete it, or click Cancel to preserve the existing repository.
Copying sessions: In the Server Manager, you can copy stand-alone sessions within a folder, or copy sessions in and out of batches. To copy a session, you must have one of the following: the Create Sessions and Batches privilege with read and write permission, or the Super User privilege.
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session. The Server Manager makes a copy of the session; the Informatica Server names the copy after the original session, appending a number, such as session_name1.

    47. What are shortcuts, and what is advantage?

    Ans: Shortcuts allow you to use metadata across folders without making copies, ensuring uniform metadata. A shortcut inherits all properties of the object to which it points. Once you create a shortcut, you can configure the shortcut name and description. When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a copy, you ensure each use of the shortcut exactly matches the original object. For example, if you have a shortcut to a target definition, and you add a column to the definition, the shortcut automatically inherits the additional column. Shortcuts allow you to reuse an object without creating multiple objects in the repository. For example, you use a source definition in ten mappings in ten different folders. Instead of creating 10 copies of the same source definition, one in each folder, you can create 10 shortcuts to the original source definition. You can create shortcuts to objects in shared folders. If you try to create a shortcut to a non-shared folder, the Designer creates a copy of the object instead. You can create shortcuts to the following repository objects:

    Source definitions
    Reusable transformations
    Mapplets
    Mappings
    Target definitions
    Business components

    You can create two types of shortcuts: Local shortcut. A shortcut created in the same repository as the original object.


    Global shortcut. A shortcut created in a local repository that references an object in a global repository.

    Advantages: One of the primary advantages of using a shortcut is maintenance. If you

    need to change all instances of an object, you can edit the original repository object. All shortcuts accessing the object

    automatically inherit the changes. Shortcuts have the following advantages over copied repository objects:

    You can maintain a common repository object in a single location. If you need to edit the object, all shortcuts immediately inherit the changes you make.

    You can restrict repository users to a set of predefined metadata by asking users to incorporate the shortcuts into their work instead of developing repository objects independently.

    You can develop complex mappings, mapplets, or reusable transformations, then reuse them easily in other folders.

    You can save space in your repository by keeping a single repository object and using shortcuts to that object, instead of creating copies of the object in multiple folders or multiple repositories.

    48. What are Pre-session and Post-session Options? (Please refer to Help: Using Shell Commands, Post-Session Commands, and Email.) Ans: The Informatica Server can perform one or more shell commands before or after the session runs. Shell commands are operating system commands. You can use pre- or post-session shell commands, for example, to delete a reject file or session log, or to archive target files before the session begins.

    The status of the shell command, whether it completed successfully or failed, appears in the session log file. To call a pre- or post-session shell command you must:

    1. Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for Windows NT servers.

    2. Configure the session to execute the pre- or post-session shell commands.
    You can configure a session to stop if the Informatica Server encounters an error while executing pre-session shell commands. For example, you might use a shell command to copy a file from one directory to another. For a Windows NT server, you would use the following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
    copy L:\sales\sales_adj H:\marketing\
    For a UNIX server, you would use the following command line to perform a similar operation:
    cp sales/sales_adj marketing/
    Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the Informatica Server. Environment settings in one shell command script do not carry over to other scripts. To run all shell commands in the same environment, call a single shell script that in turn invokes other scripts.


    49. What are Folder Versions? Ans: In the Repository Manager, you can create different versions within a folder to help you

    archive work in development. You can copy versions to other folders as well. When you save a version, you save all metadata at a particular point in development. Later versions contain new or modified metadata, reflecting work that you have completed since the last version.

    Maintaining different versions lets you revert to earlier work when needed. By archiving the

    contents of a folder into a version each time you reach a development landmark, you can access those versions if later edits prove unsuccessful.

    You create a folder version after completing a version of a difficult mapping, then continue

    working on the mapping. If you are unhappy with the results of subsequent work, you can revert to the previous version, then create a new version to continue development. Thus you keep the landmark version intact, but available for regression.

    Note: You can only work within one version of a folder at a time.
    50. How do you automate/schedule sessions/batches, and did you use any tool for automating sessions/batches? Ans: We scheduled our sessions/batches using Server Manager. You can either schedule a session to run at a given time or interval, or you can manually start the session. You need to have the Create Sessions and Batches privilege with Read and Execute permissions, or the Super User privilege.

    If you configure a batch to run only on demand, you cannot schedule it.

    Note: We did not use any tool for the automation process.
    51. What are the differences between the 4.7 and 5.1 versions? Ans: New transformations were added, such as the XML Transformation and MQ Series Transformation; and from version 5.1, PowerMart and PowerCenter are the same.
    52. What procedures do you need to follow before moving mappings/sessions from Testing/Development to Production? Ans:
    53. How many values does the Informatica Server return when it passes through a Connected Lookup and an Unconnected Lookup? Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns only one value, the Return Value.
    54. What is the difference between PowerMart and PowerCenter in 4.7.2? Ans: If You Are Using PowerCenter

    PowerCenter allows you to register and run multiple Informatica Servers against the same repository. Because you can run

    these servers at the same time, you can distribute the repository session load across available servers to improve overall


    performance. With PowerCenter, you receive all product functionality, including distributed metadata, the ability to organize repositories into a data mart domain and share metadata across repositories.

    A PowerCenter license lets you create a single repository that you can configure as a global repository, the core component of a data warehouse. If You Are Using PowerMart

    This version of PowerMart includes all features except distributed metadata and multiple registered servers. Also, the various

    options available with PowerCenter (such as PowerCenter Integration Server for BW, PowerConnect for IBM DB2,

    PowerConnect for SAP R/3, and PowerConnect for PeopleSoft) are not available with PowerMart.
    55. What kind of modifications can you perform with each transformation? Ans: Using transformations, you can modify data in the following ways:

    Calculate a value - Expression
    Perform aggregate calculations - Aggregator
    Modify text - Expression
    Filter records - Filter, Source Qualifier
    Order records queried by the Informatica Server - Source Qualifier
    Call a stored procedure - Stored Procedure
    Call a procedure in a shared library or in the COM layer of Windows NT - External Procedure
    Generate primary keys - Sequence Generator
    Limit records to a top or bottom range - Rank
    Normalize records, including those read from COBOL sources - Normalizer
    Look up values - Lookup
    Determine whether to insert, delete, update, or reject records - Update Strategy
    Join records from different databases or flat file systems - Joiner

    56. Expressions in Transformations - explain briefly how you use them. Ans: To transform data passing through a transformation, you can write an expression. The

    most obvious examples of these are the Expression and Aggregator transformations, which perform calculations on either

    single values or an entire range of values within a port. Transformations that use expressions include the following:

    Expression - Calculates the result of an expression for each row passing through the transformation, using values from one or more ports.


    Aggregator - Calculates the result of an aggregate expression, such as a sum or average, based on all data passing through a port or on groups within that data.

    Filter - Filters records based on a condition you enter using an expression.

    Rank - Filters the top or bottom range of records, based on a condition you enter using an expression.

    Update Strategy - Assigns a numeric code to each record based on an expression, indicating whether the Informatica Server should use the information in the record to insert, delete, or update the target.

    In each transformation, you use the Expression Editor to enter the expression. The

    Expression Editor supports the transformation language for building expressions. The transformation language uses SQL-like functions, operators, and other components to build the expression. For example, as in SQL, the transformation language includes the functions COUNT and SUM. However, the PowerMart/PowerCenter transformation language includes additional functions not found in SQL.

    When you enter the expression, you can use values available through ports. For example,

    if the transformation has two input ports representing a price and a sales tax rate, you can calculate the final sales tax using these two values. The ports used in the expression can appear in the same transformation, or you can use output ports in other transformations.
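    For illustration, such an output-port expression might look like the sketch below, written in the SQL-like transformation language described above. The port names PRICE and TAX_RATE come from the document's own example; the exact expression and the NULL handling are assumptions.

    -- Output port SALES_TAX in an Expression transformation:
    -- guard against a NULL tax rate before multiplying
    IIF(ISNULL(TAX_RATE), 0, PRICE * TAX_RATE)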

    57. What happens if a flat file (which arrives via FTP as a source) has not arrived? Where do you set this option? Ans: You get a fatal error, which causes the server to fail/stop the session. You can set the Event-Based Scheduling option in Session Properties, under the General tab, Advanced Options:

    Indicator File to Wait For (Optional): Required to use event-based scheduling. Enter the indicator file (or directory and file) whose arrival schedules the session. If you do not enter a directory, the Informatica Server assumes the file appears in the server variable directory $PMRootDir.


    58. What is the Test Load Option, and when do you use it in Server Manager? Ans: When testing sessions in development, you may not need to process the entire source. If this is true, use the Test Load Option (Session Properties, General tab, Target Options: choose the Target Load option Normal (option button), check Test Load (check box), and set the number of rows to test, e.g. 2000 (text box)). You can then click the Start button.

    59. SCD Type 2 and SGT difference?
    60. Differences between 4.7 and 5.1?
    61. Tuning the Informatica Server for improving performance? Performance issues? Ans: See /* C:\pkar\Informatica\Performance Issues.doc */
    62. What is the Override Option? Which is better?
    63. What will happen if you increase the buffer size?
    64. What will happen if you increase the commit interval? And if you decrease the commit interval?
    65. What kind of complex mapping did you do? And what sort of problems did you face?
    66. If you have 10 mappings designed and you need to implement some changes (maybe in an existing mapping, or a new mapping needs to be designed), how much time does it take, from easier to complex?
    67. Can you refresh the Repository in 4.7 and 5.1? And can you refresh pieces (partially) of the repository in 4.7 and 5.1?
    68. What is BI? Ans: http://www.visionnet.com/bi/index.shtml
    69. Benefits of BI? Ans: http://www.visionnet.com/bi/bi-benefits.shtml
    70. BI FAQ Ans: http://www.visionnet.com/bi/bi-faq.shtml
    71. What is the difference between data scrubbing and data cleansing? Ans: Scrubbing data is the process of cleaning up the junk in legacy data and making it

    accurate and useful for the next generations of automated systems. This is perhaps the most difficult of all conversion activities.

    Very often, this is made more difficult when the customer wants to make good data out of bad data. This is the dog work. It is also

    the most important and cannot be done without the active participation of the user.
    DATA CLEANING - a two-step process including DETECTION and then CORRECTION of errors in a data set.
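    As a rough sketch of those two steps in SQL (the staging table stg_customer and its columns are hypothetical, not from the document):

    -- DETECTION: find rows whose GENDER code falls outside the valid domain
    SELECT customer_id, gender
    FROM stg_customer
    WHERE gender NOT IN ('M', 'F');

    -- CORRECTION: standardize a known bad code to the valid domain
    UPDATE stg_customer
    SET gender = 'F'
    WHERE gender = 'FEMALE';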


    72. What is Metadata and Repository?

    Ans:

    Metadata. Data about data.

    It contains descriptive data for end users. Contains data that controls the ETL processing. Contains data about the current state of the data warehouse. ETL updates metadata, to provide the most current state.

    Repository. The place where you store the metadata is called a repository. The more sophisticated your repository, the more

    complex and detailed metadata you can store in it. PowerMart and PowerCenter use a relational database as the

    repository.

    73. SQL*LOADER? Ans: http://download-west.oracle.com/otndoc/oracle9i/901_doc/server.901/a90192/ch03.htm#1004678
    74. Debugger in Mapping?
    75. Parameter passing in the 5.1 version - exposure?
    76. What is the filename you need to configure in Unix while installing Informatica?
    77. How do you select duplicate rows using Informatica, i.e., how do you use Max(Rowid)/Min(Rowid) in Informatica?
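    One common Oracle-flavored answer, often placed in a Source Qualifier SQL override, is sketched below; the CUSTOMERS table and CUSTOMER_ID business key are hypothetical.

    -- Select the duplicate rows: every row except the one with the
    -- highest ROWID per business key
    SELECT *
    FROM customers c
    WHERE c.ROWID NOT IN (SELECT MAX(ROWID)
                          FROM customers
                          GROUP BY customer_id);

    Replacing MAX with MIN simply changes which row per key is treated as the original; the remaining rows are the duplicates to filter or delete.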

    What are the reasons to use ER modeling and Normalization? What is an ER model and a Dimensional Model?

    ER Model - Relational.
    Dimensional Model - Star Schema (a central fact table with numeric data; all other tables are linked directly to the central table; faster, but denormalised), Snowflake Schema (one fact table, with the dimension tables normalized), Fact Constellation (different fact tables combined, from one data mart to another).

    What is Metadata?

    Information about the domain structure of the data warehouse.

    What are the different types of Dimensional Modeling?


    Star Schema (a central fact table with numeric data; all other tables are linked directly to the central table; faster, but denormalised), Snowflake Schema (one fact table, with the dimension tables normalized), and Fact Constellation (different fact tables combined, from one data mart to another).

    1. What is dimensional modelling? What is called a dimension? What are the different types of dimensional modelling? Have you done any ER modelling? If so, how does it differ from dimensional modelling? Which type do you prefer? Why wouldn't you use the other type?
    2. What is snowflaking? Example? Why do you use snowflaking? How is it different from star organization? What are the advantages or disadvantages of snowflaking? What type of data organization do you prefer? Why?
    4. What RDBMS are you most comfortable in? How does it support data warehousing needs?
    5. In data modelling, how do you implement a many-to-many relationship with respect to E-R modelling?
    6. Do you have any experience in data loading? What tools or methods have you used for data loading?
    10. Why do you use dimensional modelling instead of ER modelling for data warehousing applications?
    1) Erwin - Is it possible to reverse engineer two different schemas into a single data model?
    2) Suppose there is a star schema where a fact table has 3 dimension tables, and this system is in production. Is it possible to add more dimension tables to the fact table? What is the impact in all the stages?

    Difference between Star & Snowflake Schema
    Snowflaking is a star schema design technique for separately storing logical attributes, usually of low cardinality, using a loose normalization technique. For example, you could snowflake the gender of your customers in order to track changes on these attributes if your customer dimension is too large for SCDs. The technique is not quite recommendable if you are going to use OLAP tools for your front end, due to speed issues. Snowflaking allows for easy update and load of data, as redundancy of data is avoided to some extent, but browsing capabilities are greatly compromised. But sometimes it may become a necessary evil. To add a little to this, snowflaking often becomes necessary when you need data for which there is a one-to-many relationship with a dimension table. To try to consolidate this data into the dimension table would necessarily lead to redundancy (this is a violation of second normal form, which will produce a Cartesian product). This sort of redundancy can cause misleading results in queries, since the count of rows is artificially large (due to the Cartesian


    product). A simple example of such a situation might be a "customer" dimension for which there is a need to store multiple contacts. If the contact information were brought into the customer table, there would be one row for each contact (i.e., one for each customer/contact combination). In this situation, it is better just to create a "contact" snowflake table with a FK to the customer. In general, it is better to avoid snowflaking if possible, but sometimes the consequences of avoiding it are much worse. In a star schema, all your dimensions are linked directly with your fact table. On the other hand, in a snowflake schema, dimensions may be interlinked or may have one-to-many relationships with other tables. As previous replies said, this isn't a desirable situation, but you can make the best choice once you have gathered all the requirements. The snowflake is a design like a star, but with connecting tables in the dimensions; a connecting table represents a relation between two dimensions.
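    A minimal DDL sketch of that customer/contact example; all table, column, and type choices are hypothetical:

    -- Customer dimension: one row per customer
    CREATE TABLE customer_dim (
        customer_key  INTEGER PRIMARY KEY,
        customer_name VARCHAR(100),
        city          VARCHAR(50)
    );

    -- Contact snowflake table: many contacts per customer, kept out of
    -- customer_dim so customer rows are not multiplied per contact
    CREATE TABLE customer_contact (
        contact_key   INTEGER PRIMARY KEY,
        customer_key  INTEGER REFERENCES customer_dim (customer_key),
        contact_name  VARCHAR(100),
        phone         VARCHAR(25)
    );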

    3. Q: Which is better, Star or Snowflake? A: Strict data warehousing rules would have you use a Star schema, but in reality most designs tend to become Snowflakes. They each have their pros and cons, but both are far better than trying to use a transactional-system third-normal-form design.

    4. Q: Why can't I use a copy of my transactional system for my data warehouse? A: This is one of the absolute worst things you can do. A lot of people initially go down this road because a tool vendor will support the idea when making their sales pitch. Many of these attempts will even experience success for a short period of time. It's not until your data sets grow and your business questions begin to be complex that this design mistake will really come out to bite you.

    Q. What are the responsibilities of a data warehouse consultant/professional? The basic responsibility of a data warehouse consultant is to publish the right data. Some of the other responsibilities of a data warehouse consultant are:

    1. Understand the end users by their business area, job responsibilities, and computer tolerance
    2. Find out the decisions the end users want to make with the help of the data warehouse
    3. Identify the best users who will make effective decisions using the data warehouse
    4. Find the potential new users and make them aware of the data warehouse
    5. Determine the grain of the data
    6. Make the end-user screens and applications much simpler and more template driven

    Q. Stars and Cubes (Polaris) The star schema and OLAP cube are intimately related. Star schemas are most appropriate for very large data sets. OLAP cubes are most appropriate for smaller data sets where analytic tools can perform complex data comparisons and calculations. In almost all OLAP cube environments, it's recommended that you originally source data into a star schema structure, and then use wizards to transform the data into the OLAP cube.


    Q. What is the necessity of having dimensional modeling instead of an ER modeling? Compared to entity/relation modeling, it's less rigorous (allowing the designer more discretion in organizing the tables) but more practical because it accommodates database complexity and improves performance. Q. Dimensions and Facts.

    Dimensional modeling begins by dividing the world into measurements and context. Measurements are usually numeric and taken repeatedly. Numeric measurements are facts. Facts are always surrounded by mostly textual context that's true at the moment the fact is recorded. Facts are very specific, well-defined numeric attributes. By contrast, the context surrounding the facts is open-ended and verbose. It's not uncommon for the designer to add context to a set of facts partway through the implementation.

    Dimensional modeling divides the world of data into two major types: measurements, and descriptions of the context surrounding those measurements. The measurements, which are typically numeric, are stored in fact tables, and the descriptions of the context, which are typically textual, are stored in the dimension tables. A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary key in a dimension, together with the facts containing the measurements. Every foreign key in the fact table has a match to a unique primary key in the respective dimension (referential integrity). This allows the dimension table to possess primary keys that aren't found in the fact table. Therefore, a product dimension table might be paired with a sales fact table in which some of the products are never sold. Dimensional models are full-fledged relational models, where the fact table is in third normal form and the dimension tables are in second normal form. The main difference between second and third normal form is that repeated entries are removed from a second normal form table and placed in their own snowflake. Thus the act of removing the context from a fact record and creating dimension tables places the fact table in third normal form.
    E.g. for fact tables: Sales, Cost, Profit
    E.g. for dimensions: Customer, Product, Store, Time
    Q. What are Additive Facts? Or what is meant by an Additive Fact? Fact tables are mostly very huge and almost never fetch a single record into our answer set. We fetch a very large number of records on which we then do adding, counting, averaging, or taking the min or max. The most common of these is adding. Applications are simpler if they store facts in an additive format as often as possible. Thus, in the grocery example, we don't need to store the unit price. We compute the unit price by dividing the dollar sales by the unit sales whenever necessary.
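    A small SQL sketch of these ideas, following the grocery example; the table and column names are illustrative assumptions, not from the document:

    -- Fact table: foreign keys to the dimensions plus additive facts
    CREATE TABLE sales_fact (
        time_key     INTEGER NOT NULL,   -- joins to a time dimension
        product_key  INTEGER NOT NULL,   -- joins to a product dimension
        store_key    INTEGER NOT NULL,   -- joins to a store dimension
        dollar_sales NUMERIC(12,2),      -- additive fact
        unit_sales   INTEGER             -- additive fact
    );

    -- Unit price is not stored; it is derived from the additive facts
    SELECT product_key,
           SUM(dollar_sales) / SUM(unit_sales) AS unit_price
    FROM sales_fact
    GROUP BY product_key;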


    Q. What is meant by averaging over time? Some facts, like bank balances and inventory levels, represent intensities that are awkward to express in an additive format. We can treat these semi-additive facts as if they were additive, but just before presenting the results to the end user, divide the answer by the number of time periods to get the right result. This technique is called averaging over time.
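    A sketch of the technique, assuming a hypothetical monthly balance fact table:

    -- Balances are semi-additive: a plain SUM across months is meaningless,
    -- so divide the sum by the number of time periods
    SELECT account_key,
           SUM(balance) / COUNT(DISTINCT month_key) AS avg_monthly_balance
    FROM balance_fact
    GROUP BY account_key;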

    Q. What is a Conformed Dimension? When the enterprise decides to create a set of common labels across all the sources of data, the separate data mart teams (or a single centralized team) must sit down to create master dimensions that everyone will use for every data source. These master dimensions are called Conformed Dimensions. Two dimensions are conformed if the fields that you use as row headers have the same domain.
    Q. What is a Conformed Fact? If the definitions of measurements (facts) are highly consistent, we call them Conformed Facts.
    Q. What are the 3 important fundamental themes in a data warehouse? The 3 most important fundamental themes are:
    1. Drilling Down
    2. Drilling Across and
    3. Handling Time

    Q. What is meant by Drilling Down? Drilling down means nothing more than "give them more detail". Drilling down in a relational database means adding a row header to an existing SELECT statement. For instance, if you are analyzing the sales of products at the manufacturer level, the select list of the query reads: SELECT MANUFACTURER, SUM(SALES). If you wish to drill down on the list of manufacturers to show the brands sold, you add the BRAND row header: SELECT MANUFACTURER, BRAND, SUM(SALES). Now each manufacturer row expands into multiple rows listing all the brands sold. This is the essence of drilling down. We often call a row header a grouping column, because everything in the list that's not aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause. So the GROUP BY clause in the second query reads: GROUP BY MANUFACTURER, BRAND.
    Q. What is meant by Drilling Across? Drilling across adds more data to an existing row. If drilling down is requesting ever finer and more granular data from the same fact table, then drilling across is the process of linking two or more fact tables at the same granularity, or, in other words, tables with the same set of grouping columns and dimensional constraints. A drill-across report can be created by using grouping columns that apply to all the fact tables used in the report.


    The new fact table called for in the drill-across operation must share certain dimensions with the fact table in the original query. All fact tables in a drill-across query must use conformed dimensions.
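    The two operations might look like the sketch below in SQL. The manufacturer/brand drill-down follows the document's own example; the inventory fact table in the drill-across and all the key names are added assumptions.

    -- Drilling down: add the BRAND row header and GROUP BY column
    SELECT p.manufacturer, p.brand, SUM(f.sales)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.manufacturer, p.brand;

    -- Drilling across: one query per fact table at the same grain,
    -- then join the answer sets on the shared (conformed) row header
    SELECT s.manufacturer, s.total_sales, i.total_on_hand
    FROM (SELECT p.manufacturer, SUM(f.sales) AS total_sales
          FROM sales_fact f
          JOIN product_dim p ON p.product_key = f.product_key
          GROUP BY p.manufacturer) s
    FULL OUTER JOIN
         (SELECT p.manufacturer, SUM(f.on_hand) AS total_on_hand
          FROM inventory_fact f
          JOIN product_dim p ON p.product_key = f.product_key
          GROUP BY p.manufacturer) i
      ON s.manufacturer = i.manufacturer;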

    Q. What is the significance of handling time? For example, when a customer moves from a property, we might want to know:
    1. who the new customer is
    2. when the old customer moved out
    3. when the new customer moved in
    4. how long the property was empty, etc.

    Q. What is meant by Drilling Up? If drilling down is adding grouping columns from the dimension tables, then drilling up is subtracting grouping columns.
    Q. What is meant by Drilling Around? The final variant of drilling is drilling around a value circle. This is similar to the linear value chain shown in the previous example, but occurs in a data warehouse where the related fact tables that share common dimensions are not arranged in a linear order. The best example is from health care, where as many as 10 separate entities process patient encounters and share this information with one another, e.g. a typical health care value circle with 10 separate entities surrounding the patient. When the common dimensions are conformed and the requested grouping columns are drawn from dimensions that tie to all the fact tables in a given report, you can generate really powerful drill-around reports by performing separate queries on each fact table and outer joining the answer sets in the client tool.
    Q. What are the important fields in a recommended Time dimension table?
    Time_key, Day_of_week, Day_number_in_month, Day_number_overall, Month, Month_number_overall, Quarter, Fiscal_period, Season, Holiday_flag, Weekday_flag, Last_day_in_month_flag
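    A DDL sketch of such a time dimension, using exactly the fields just listed; the data types are assumptions:

    CREATE TABLE time_dim (
        time_key               INTEGER PRIMARY KEY,  -- integer surrogate, not a DATE
        day_of_week            VARCHAR(9),
        day_number_in_month    INTEGER,
        day_number_overall     INTEGER,
        month                  VARCHAR(9),
        month_number_overall   INTEGER,
        quarter                INTEGER,
        fiscal_period          VARCHAR(10),
        season                 VARCHAR(10),
        holiday_flag           CHAR(1),
        weekday_flag           CHAR(1),
        last_day_in_month_flag CHAR(1)
    );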


    Q. Why have a timestamp as a surrogate key rather than a real date? The timestamp in a fact table should be a surrogate key instead of a real date because:

    the rare timestamp that is inapplicable, corrupted, or hasn't happened yet needs a value that cannot be a real date

    most end-user calendar navigation constraints, such as fiscal periods, end-of-periods, holidays, day numbers, and week numbers, aren't supported by database timestamps

    integer time keys take up much less disk space than full dates

    Q. Why have more than one fact table instead of a single fact table? We cannot combine all of the business processes into a single fact table because:

    the separate fact tables in the value chain do not share all the dimensions; you simply can't put the customer ship-to dimension on the finished goods inventory data

    each fact table possesses different facts, and the fact table records are recorded at different times along the value chain

    Q. What is meant by Slowly Changing Dimensions, and what are the different types of SCDs? (Mascot) Dimensions don't change in predictable ways. Ind