ibm info sphere suite overview

Post on 06-Mar-2015

© 2011, GAVS Technologies

IBM Infosphere Suite Overview

Data Warehousing

► A data warehouse is a DBMS platform that stores an organization's historical data.

► The concepts and methods used to create a data warehouse are collectively known as data warehousing.

► Why use a data warehouse when we already have transactional databases?

– Transactional databases are designed for fast querying and editing of individual records, whereas warehouses are designed for analysis and reporting.

– Transactional databases (E-R modeling) typically hold only 3 to 6 months of data due to design constraints, whereas warehouses (dimensional modeling) can hold years of data.

– Data warehouses are also known as Decision Support Systems (DSS) because they help management make informed policy and product decisions.

► Subsets of a data warehouse are known as data marts.

► The tools used to create a warehouse fall into two categories: data modeling tools and ETL tools.
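To make the contrast with dimensional modeling concrete, the following is a minimal sketch of a star schema (one fact table joined to two dimension tables) together with a typical analytical query. It uses Python's built-in sqlite3 module purely for illustration; the table names, column names, and sample rows are hypothetical and are not taken from the slides.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables."""
    conn.executescript(
        """
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
        """
    )
    # Hypothetical sample data spanning two years.
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2010, 12), (2, 2011, 1)])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget"), (2, "Gadget")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 75.0)],
    )


def sales_by_year(conn):
    """Analytical query: total sales per year, via the date dimension."""
    return conn.execute(
        """
        SELECT d.year, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON f.date_key = d.date_key
        GROUP BY d.year
        ORDER BY d.year
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(sales_by_year(conn))
```

The fact table holds only keys and measures, so reporting queries reduce to joins against small dimension tables followed by a GROUP BY.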

ETL

► ETL stands for Extract, Transform and Load.

► Extract is the process of getting data from various source systems and files.

► Transform is the stage where the data is checked for consistency, cleansed, and transformed according to business requirements.

► Load is the process of inserting or updating the transformed data in the data warehouse.

► Many ETL tools are available on the market, such as Informatica and Ab Initio.

► The ETL tool selected for the IHS Newton project is IBM InfoSphere Suite 8.5.
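The three ETL stages can be sketched in a few lines of plain Python. This is only an illustration of the concept, not how Datastage implements it: the inline CSV data, the `customer` table, and the cleansing rules (drop rows without an ID, trim and title-case names, upper-case country codes) are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read files or source systems.
SOURCE_CSV = """customer_id,name,country
1, alice ,us
2,Bob,UK
,Missing Id,us
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: check consistency, cleanse, and convert to target types."""
    clean = []
    for row in rows:
        if not row["customer_id"].strip():
            continue  # consistency check: reject rows without a key
        clean.append(
            (
                int(row["customer_id"]),
                row["name"].strip().title(),
                row["country"].strip().upper(),
            )
        )
    return clean


def load(conn, rows):
    """Load: insert new rows or update existing ones in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall())
```

Keeping the stages as separate functions mirrors the way an ETL job separates source, transformation, and target steps.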

Components

► The InfoSphere suite comprises the following software components:

– Datastage - ETL tool

– Qualitystage - Standardizing & cleansing tool

– Information Analyzer - Analysis & understanding of the data structure

– Metadata Workbench - Centralized repository of metadata

– Business Glossary - Web-based tool to create, manage, and share an enterprise vocabulary and classification system

– FastTrack - ETL job creation assistance

– Blueprint Director - Project flow assistance

Architecture

Datastage

► Datastage provides a graphical framework that can be used to design and run the jobs that transform the data

► Datastage delivers four core capabilities:

– Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.

– Prebuilt library of more than 300 functions including data validation rules and very complex transformations.

– Maximum throughput using a parallel, high-performance processing architecture.

– Enterprise-class capabilities for development, deployment, maintenance, and high-availability. It leverages metadata for analysis and maintenance. It also operates in batch, real time, or as a Web service.

Datastage Cont...

► Data transformation and movement is the process by which source data is selected, converted, and mapped to the format required by targeted systems.

► The process manipulates data to bring it into compliance with business, domain, and integrity rules and with other data in the target environment.

► Data Transformation can take some of the following forms:

– Aggregation

Consolidating or summarizing data values into a single value. Collecting daily sales data to be aggregated to the weekly level is a common example of aggregation.

– Basic conversion

Ensuring that data types are correctly converted and mapped from source to target columns.
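The two forms above can be sketched in plain Python. This is a hedged illustration rather than Datastage behavior: the field names `sale_date` and `amount`, the sample values, and the choice of ISO weeks as the weekly grain are assumptions for the example.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def convert(record):
    """Basic conversion: cast source strings to the target column types."""
    return {
        "sale_date": date.fromisoformat(record["sale_date"]),
        "amount": Decimal(record["amount"]),
    }


def aggregate_weekly(records):
    """Aggregation: roll daily sales up to one total per (ISO year, ISO week)."""
    totals = defaultdict(Decimal)  # Decimal() starts each week at 0
    for rec in records:
        iso = rec["sale_date"].isocalendar()
        totals[(iso[0], iso[1])] += rec["amount"]
    return dict(totals)


# Hypothetical daily sales as they might arrive from a source file.
daily = [
    {"sale_date": "2011-01-03", "amount": "10.50"},
    {"sale_date": "2011-01-04", "amount": "4.50"},
    {"sale_date": "2011-01-10", "amount": "20"},
]
weekly = aggregate_weekly([convert(r) for r in daily])
print(weekly)
```

Converting before aggregating matters here: summing the raw strings would concatenate them instead of adding amounts.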

Datastage Cont...

► Data Transformations Contd...

– Derivation

Transforming data from multiple sources by using a complex business rule or algorithm.

– Enrichment

Combining data from internal or external sources to provide additional meaning to the data.

– Normalizing

Reducing the amount of redundant and potentially duplicated data.

– Combining

The process of combining data from multiple sources via parallel Lookup, Join, or Merge operations.

– Pivoting

Converting records in an input stream to many records in the appropriate table in the data warehouse or data mart.

– Sorting

Grouping related records and sequencing data based on data or string values.
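Three of the forms above (combining, pivoting, and sorting) can be sketched on in-memory records. The function names, field names, and sample rows are hypothetical, and the lookup shown is a simple inner join on one key, which is only one of several behaviors a real Lookup, Join, or Merge stage can be configured for.

```python
def lookup_join(rows, reference, key):
    """Combining: attach reference columns to each row via a keyed lookup.

    Rows without a match in the reference data are dropped (inner join).
    """
    index = {ref[key]: ref for ref in reference}
    return [{**row, **index[row[key]]} for row in rows if row[key] in index]


def pivot(row, id_field, value_fields, name_col="quarter", value_col="sales"):
    """Pivoting: turn one wide input record into many narrow output records."""
    return [
        {id_field: row[id_field], name_col: field, value_col: row[field]}
        for field in value_fields
    ]


def sort_rows(rows, *fields):
    """Sorting: sequence records by one or more fields."""
    return sorted(rows, key=lambda r: tuple(r[f] for f in fields))


# Hypothetical input: one record per store with a column per quarter.
stores = [{"store": "S2", "q1": 5, "q2": 7}, {"store": "S1", "q1": 10, "q2": 20}]
regions = [{"store": "S1", "region": "North"}, {"store": "S2", "region": "South"}]

combined = lookup_join(stores, regions, "store")
narrow = [rec for row in stores for rec in pivot(row, "store", ["q1", "q2"])]
print(sort_rows(narrow, "store", "quarter"))
```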

Datastage – Stage Examples

Datastage Sample Jobs

Qualitystage

► Qualitystage is the data cleansing and standardizing tool of the Infosphere suite

► The main functionalities of Qualitystage are:

– Investigation of source data to understand the nature, scope, and detail of data quality challenges.

– Standardization to ensure that data is formatted and conforms to organization-wide specifications, including name and firm standards as well as address cleansing and verification.

– Matching of data to identify duplicate records within and across data sets.

– Survivorship to eliminate duplicate records and create the “best record view” of data.
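The standardize, match, and survive steps can be illustrated with a deliberately simplified sketch. It is not how Qualitystage works internally: matching here is an exact comparison on a standardized key, and the survivorship rule (keep the longest non-empty value per field) is just one possible rule. The record fields and sample data are hypothetical.

```python
import re
from collections import defaultdict


def standardize(record):
    """Standardization: normalize name casing/spacing and strip phone punctuation."""
    name = " ".join(record["name"].split()).title()
    phone = re.sub(r"\D", "", record["phone"])
    return {**record, "name": name, "phone": phone}


def match(records):
    """Matching: group records that share the same standardized (name, phone)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["name"], rec["phone"])].append(rec)
    return list(groups.values())


def survive(group):
    """Survivorship: build one 'best record' by keeping, for each field,
    the longest value found in the group (all values are strings)."""
    return {field: max((rec[field] for rec in group), key=len) for field in group[0]}


# Hypothetical source records; the first two describe the same person.
raw = [
    {"name": "john  SMITH", "phone": "(555) 010-1234", "email": ""},
    {"name": "John Smith", "phone": "555-010-1234", "email": "john@example.com"},
    {"name": "Jane Doe", "phone": "555 010 9999", "email": "jane@example.com"},
]
best_records = [survive(g) for g in match([standardize(r) for r in raw])]
print(best_records)
```

Standardizing first is what makes the duplicates detectable: before it, the two John Smith records differ in both name and phone formatting.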

Qualitystage Sample Jobs

Information Analyzer

► Information Analyzer is used to understand the content, structure, and overall quality of the data at a given point in time.

► This analysis aids in understanding the inputs to the integration process, ranging from individual fields to high-level data entities.

► Information analysis also enables users to correct problems with structure or validity before they affect the project.
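As a rough idea of what analyzing content and structure means at the level of an individual field, the sketch below profiles one column of string values: row count, null count, distinct count, and inferred data types. The metrics chosen and the type-inference rule are assumptions for illustration and do not reproduce Information Analyzer's actual reports.

```python
from collections import Counter


def infer_type(value):
    """Classify a string value as integer, decimal, or string."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize content and structure of one column of string values."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "inferred_types": dict(Counter(infer_type(v) for v in non_null)),
    }


# Hypothetical column that is supposed to contain integers.
print(profile_column(["12", "7", "", "3.5", "abc", "7"]))
```

A column declared as integer that profiles with decimal or string values is the kind of structural problem that is cheaper to catch before ETL jobs are built on it.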

Information Analyzer Interface

Metadata Workbench

► This tool provides a visual, Web-based exploration of metadata that is generated, used, and imported by the InfoSphere Information Server.

► InfoSphere Information Server components store design time, runtime, and glossary metadata in the metadata repository.

► Users can also import database and data file information into the metadata repository and create extended data sources and extension mappings that represent objects and processes that exist outside of InfoSphere Information Server.

► Metadata Workbench helps business and IT users explore and manage those metadata assets.

► The metadata workbench gives reports on data flow, data lineage, and the impact of changes to data assets or physical assets.
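Conceptually, data lineage and impact analysis are traversals of a graph whose nodes are assets (tables, jobs, reports) and whose edges follow the flow of data. The sketch below shows that idea only; the asset names and the dictionary representation are hypothetical and say nothing about how the metadata repository actually stores this information.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume its data.
LINEAGE = {
    "crm.customers": ["job_load_customers"],
    "job_load_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["report_churn", "report_revenue"],
    "erp.orders": ["job_load_orders"],
    "job_load_orders": ["dw.fact_orders"],
    "dw.fact_orders": ["report_revenue"],
}


def reachable(graph, start):
    """Breadth-first walk listing every asset reachable from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(graph):
    """Reverse every edge so the same walk can trace lineage upstream."""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


# Impact analysis: what is affected if crm.customers changes?
print(reachable(LINEAGE, "crm.customers"))
# Data lineage: where does report_revenue get its data from?
print(reachable(invert(LINEAGE), "report_revenue"))
```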

Metadata Workbench Interface

Business Glossary

► Business Glossary is an interactive, Web-based tool that enables users to create, manage, and share an enterprise vocabulary and classification system.

► A business glossary is designed to help users understand business language and the business meaning of information assets like databases, jobs, database tables and columns, and business intelligence reports.

► In addition to categories and terms, the business glossary also contains information about other assets such as database tables, jobs, and reports that are in the metadata repository.

Fast Track

► FastTrack automates multiple data integration tasks from analysis to code generation, while incorporating the business perspective and maintaining lineage and documented requirements.
