© 2011, GAVS Technologies
IBM InfoSphere Suite Overview
Data Warehousing
► A data warehouse is a DBMS platform that stores an organization's historical data.
► The concepts and methods used to create a data warehouse are collectively known as data warehousing.
► Why a data warehouse when we already have transactional databases?
– Transactional databases are designed for fast querying and editing of information, whereas warehouses are designed for analysis and reporting.
– Because of their design (E-R modeling), transactional databases often retain only 3 to 6 months of data, whereas warehouses can hold years of data (dimensional modeling).
– Data warehouses are also known as Decision Support Systems (DSS) because they help management make informed policy and product decisions.
► Subsets of a data warehouse are known as data marts.
► The tools used to create a warehouse fall into two categories: data modeling tools and ETL tools.
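Here is a minimal star-schema sketch that makes the dimensional-modeling point concrete. It uses Python's standard-library `sqlite3` module, with an in-memory SQLite database standing in for a real warehouse DBMS. The table names (`fact_sales`, `dim_date`, `dim_product`) and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount REAL
);
INSERT INTO dim_date VALUES (20100115, 2010, 1), (20110115, 2011, 1);
INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO fact_sales VALUES
    (20100115, 1, 100.0), (20100115, 2, 50.0),
    (20110115, 1, 120.0), (20110115, 2, 80.0);
""")

# A typical analytical query: total sales per year and category across history.
rows = conn.execute("""
SELECT d.year, p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.year, p.category
ORDER BY d.year, p.category
""").fetchall()
print(rows)
```

The fact table holds the measures and the dimension tables hold the descriptive attributes. Questions about years of history therefore reduce to joins plus a GROUP BY.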
ETL
► ETL stands for Extract, Transform and Load.
► Extract is the process of getting the data from the various source systems and files.
► Transform is the stage where the data is checked for consistency, cleansed and transformed according to the business requirements.
► Load is the process of inserting or updating the transformed data in the data warehouse.
► Many ETL tools are available on the market, such as Informatica and Ab Initio.
► The ETL tool selected for the IHS Newton project is IBM InfoSphere Suite 8.5.
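The three ETL stages can be sketched in plain Python. This illustrates the concept only; it is not how DataStage jobs are built. The following are all assumptions:
- the source data;
- the cleansing rules (reject rows with no name, drop duplicate keys);
- the `dim_customer` table.

An in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from source systems or files.
SOURCE_CSV = """customer_id,name,country
1, alice smith ,us
2,BOB JONES,UK
2,BOB JONES,UK
3,,IN
"""

def extract(text):
    """Extract: read raw rows from a delimited source file."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: check consistency, cleanse, and reshape per assumed business rules."""
    seen = set()
    clean = []
    for row in rows:
        name = row["name"].strip()
        if not name:              # consistency check: reject rows without a name
            continue
        key = int(row["customer_id"])
        if key in seen:           # cleansing: drop rows with a duplicate key
            continue
        seen.add(key)
        clean.append((key, name.title(), row["country"].strip().upper()))
    return clean

def load(conn, rows):
    """Load: insert new rows; a row whose key already exists is replaced."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_customer "
                 "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
```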
Components
► The InfoSphere suite comprises the following software:
– DataStage - ETL tool
– QualityStage - standardization and cleansing tool
– Information Analyzer - analysis and understanding of the data structure
– Metadata Workbench - Web-based exploration of the centralized metadata repository
– Business Glossary - Web-based tool to create, manage, and share an enterprise vocabulary and classification system
– FastTrack - ETL job creation assistance
– Blueprint Director - project flow assistance
Architecture
DataStage
► DataStage provides a graphical framework for designing and running jobs that transform data.
► DataStage delivers four core capabilities:
– Connectivity to a wide range of mainframe, legacy, and enterprise applications, databases, file formats, and external information sources.
– A prebuilt library of more than 300 functions, including data validation rules and very complex transformations.
– Maximum throughput through a parallel, high-performance processing architecture.
– Enterprise-class capabilities for development, deployment, maintenance, and high availability. It uses metadata for analysis and maintenance, and it can operate in batch, in real time, or as a Web service.
DataStage Contd.
► Data transformation and movement is the process by which source data is selected, converted, and mapped to the format required by target systems.
► The process manipulates data to bring it into compliance with business, domain, and integrity rules and with other data in the target environment.
► Data transformation can take the following forms:
– Aggregation
Consolidating or summarizing data values into a single value. A common example is aggregating daily sales data to the weekly level.
– Basic conversion
Ensuring that data types are correctly converted and mapped from source to target columns.
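Here is a minimal Python sketch of these two transformations. It assumes the source delivers dates and amounts as strings:
- basic conversion turns the strings into typed values;
- aggregation rolls daily sales up to ISO calendar weeks.

The record layout is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records as they might arrive from a source file (all strings).
daily_rows = [
    ("2011-01-03", "100.50"),
    ("2011-01-04", "200.00"),
    ("2011-01-10", "50.25"),
]

def convert(row):
    """Basic conversion: map source string columns to typed target columns."""
    day, amount = row
    return date.fromisoformat(day), float(amount)

def aggregate_weekly(rows):
    """Aggregation: sum daily amounts into one value per ISO (year, week)."""
    totals = defaultdict(float)
    for day, amount in map(convert, rows):
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += amount
    return dict(totals)

print(aggregate_weekly(daily_rows))
```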
DataStage Contd.
► Data Transformations Contd.
– Derivation
Transforming data from multiple sources by applying a complex business rule or algorithm.
– Enrichment
Combining data from internal or external sources to give the data additional meaning.
– Normalizing
Reducing the amount of redundant and potentially duplicated data.
– Combining
Merging data from multiple sources via parallel Lookup, Join, or Merge operations.
– Pivoting
Converting a record in an input stream into many records in the appropriate table of the data warehouse or data mart.
– Sorting
Grouping related records and sequencing data based on data or string values.
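Several of these forms can be chained in a few lines of Python. The sketch below applies four of them to hypothetical order records:
- combining, as a lookup against reference data;
- derivation;
- pivoting;
- sorting.

It illustrates the concepts only and does not reproduce the corresponding DataStage stages. The column names and the derivation rule are assumptions.

```python
# Hypothetical input streams; column names are illustrative only.
orders = [
    {"order_id": 2, "cust_id": "C2", "q1": 5, "q2": 0},
    {"order_id": 1, "cust_id": "C1", "q1": 3, "q2": 4},
]
customers = {"C1": "North", "C2": "South"}   # reference data for a lookup

def combine(rows, lookup):
    """Combining (lookup): attach the region from the reference table."""
    return [{**r, "region": lookup.get(r["cust_id"], "UNKNOWN")} for r in rows]

def derive(rows):
    """Derivation: compute a new column from a hypothetical business rule."""
    return [{**r, "total_qty": r["q1"] + r["q2"]} for r in rows]

def pivot(rows):
    """Pivoting: turn each input record's quarter columns into separate records."""
    out = []
    for r in rows:
        for quarter in ("q1", "q2"):
            out.append({"order_id": r["order_id"], "region": r["region"],
                        "quarter": quarter.upper(), "qty": r[quarter]})
    return out

def sort_records(rows):
    """Sorting: group and sequence records by key columns."""
    return sorted(rows, key=lambda r: (r["order_id"], r["quarter"]))

staged = derive(combine(orders, customers))
result = sort_records(pivot(staged))
for rec in result:
    print(rec)
```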
DataStage – Stage Examples
DataStage Sample Jobs
DataStage Sample Jobs Contd.
QualityStage
► QualityStage is the data cleansing and standardization tool of the InfoSphere suite.
► The main functions of QualityStage are:
– Investigation of source data to understand the nature, scope, and detail of data quality challenges.
– Standardization to ensure that data is formatted and conforms to organization-wide specifications, including name and firm standards as well as address cleansing and verification.
– Matching of data to identify duplicate records within and across data sets.
– Survivorship to eliminate duplicate records and create a “best record” view of the data.
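The standardize, match and survive steps can be sketched in Python. The following are all assumptions chosen for illustration:
- the records;
- the abbreviation table;
- the match key (standardized name plus street);
- the survivorship rule (keep the most complete record).

QualityStage uses its own configurable rule sets and match specifications, which this sketch does not attempt to reproduce.

```python
import re
from collections import defaultdict

# Hypothetical customer records with inconsistent formatting.
records = [
    {"id": 1, "name": "John  Smith", "street": "12 Main Street", "phone": ""},
    {"id": 2, "name": "SMITH, JOHN", "street": "12 main st.", "phone": "555-0100"},
    {"id": 3, "name": "Mary Jones", "street": "4 Oak Avenue", "phone": "555-0199"},
]

# Assumed abbreviation table; real standardization rule sets are far larger.
STREET_ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE"}

def standardize_name(name):
    """Upper-case, collapse spaces, and rewrite 'LAST, FIRST' as 'FIRST LAST'."""
    name = re.sub(r"\s+", " ", name.strip().upper())
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return name

def standardize_street(street):
    """Upper-case, drop punctuation, and apply the abbreviation table."""
    words = re.sub(r"[^\w\s]", "", street.upper()).split()
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)

def match(recs):
    """Matching: group records that share the same standardized name and street."""
    groups = defaultdict(list)
    for r in recs:
        key = (standardize_name(r["name"]), standardize_street(r["street"]))
        groups[key].append(r)
    return groups

def survive(group):
    """Survivorship: keep the record with the most populated fields (ties -> lowest id)."""
    return min(group, key=lambda r: (-sum(1 for v in r.values() if v), r["id"]))

best = [survive(g) for g in match(records).values()]
print(sorted(r["id"] for r in best))
```

Records 1 and 2 collapse to the same standardized key. Record 2 survives because it also carries a phone number.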
QualityStage Sample Jobs
QualityStage Sample Jobs Contd.
Information Analyzer
► Information Analyzer is used to understand the content, structure, and overall quality of the data at a given point in time.
► This analysis helps in understanding the inputs to the integration process, from individual fields up to high-level data entities.
► Information analysis also makes it possible to correct problems with structure or validity before they affect the project.
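Column-level profiling of this kind can be sketched in a few lines of Python. The sample columns are assumptions, and so are the chosen metrics:
- null count;
- distinct count;
- a crude numeric/text type guess;
- value-length frequencies.

They illustrate the idea and are not Information Analyzer's actual output.

```python
from collections import Counter

# Hypothetical column values as they might be sampled from a source table.
columns = {
    "customer_id": ["1", "2", "3", "3"],
    "zip_code": ["10001", "1002", "", "10003"],
}

def infer_type(non_empty):
    """Crude type guess: 'numeric' if every non-empty value is all digits."""
    if not non_empty:
        return "unknown"
    return "numeric" if all(v.isdigit() for v in non_empty) else "text"

def profile(values):
    """Column analysis: completeness, cardinality, type guess and length frequency."""
    non_empty = [v for v in values if v != ""]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "inferred_type": infer_type(non_empty),
        "length_frequency": dict(Counter(len(v) for v in non_empty)),
    }

for name, values in columns.items():
    print(name, profile(values))
```

Here the length frequencies expose the four-character value in `zip_code`. This is the kind of validity problem worth catching before it affects the project.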
Information Analyzer Interface
Metadata Workbench
► This tool provides a visual, Web-based exploration of metadata that is generated, used, and imported by the InfoSphere Information Server.
► InfoSphere Information Server components store design time, runtime, and glossary metadata in the metadata repository.
► Users can also import database and data file information into the metadata repository and create extended data sources and extension mappings that represent objects and processes that exist outside of InfoSphere Information Server.
► Metadata Workbench helps business and IT users explore and manage those metadata assets.
► Metadata Workbench provides reports on data flow, data lineage, and the impact of changes to data assets or physical assets.
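Lineage and impact reports are essentially traversals of a graph of assets. Here is a minimal Python sketch. It assumes a hypothetical data-flow map in which each asset lists the assets that consume it:
- impact analysis walks downstream from a changed asset;
- lineage walks upstream over the reversed graph.

The asset names are invented for illustration and are not Metadata Workbench identifiers.

```python
from collections import deque

# Hypothetical data-flow graph: each asset maps to the assets that read from it.
FLOWS = {
    "src.orders_file": ["job.load_orders"],
    "job.load_orders": ["dw.fact_sales"],
    "dw.dim_customer": ["job.build_mart"],
    "dw.fact_sales": ["job.build_mart"],
    "job.build_mart": ["mart.sales_summary"],
    "mart.sales_summary": ["report.weekly_sales"],
}

def impact(asset, flows):
    """Impact analysis: every asset downstream of `asset` (breadth-first)."""
    seen, queue = set(), deque([asset])
    while queue:
        for target in flows.get(queue.popleft(), []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def lineage(asset, flows):
    """Data lineage: every asset upstream of `asset`, via the reversed graph."""
    reverse = {}
    for source, targets in flows.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return impact(asset, reverse)

print(sorted(impact("dw.fact_sales", FLOWS)))
print(sorted(lineage("report.weekly_sales", FLOWS)))
```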
Metadata Workbench Interface
Business Glossary
► Business Glossary is an interactive, Web-based tool that enables users to create, manage, and share an enterprise vocabulary and classification system.
► A business glossary is designed to help users understand business language and the business meaning of information assets such as databases, jobs, database tables and columns, and business intelligence reports.
► In addition to categories and terms, the business glossary also contains information about other assets such as database tables, jobs, and reports that are in the metadata repository.
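The relationship between categories, terms and assets can be pictured as a simple hierarchy. The Python sketch below is a hypothetical model, not Business Glossary's actual schema or API. The category and term names, the definition and the asset identifiers are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Term:
    """A business term with its definition and the assets it describes."""
    name: str
    definition: str
    assets: List[str] = field(default_factory=list)   # hypothetical asset identifiers

@dataclass
class Category:
    """A node in the classification hierarchy that groups related terms."""
    name: str
    terms: Dict[str, Term] = field(default_factory=dict)
    subcategories: Dict[str, "Category"] = field(default_factory=dict)

def terms_for_asset(category, asset):
    """Walk the hierarchy and return every term assigned to the given asset."""
    found = [t.name for t in category.terms.values() if asset in t.assets]
    for sub in category.subcategories.values():
        found.extend(terms_for_asset(sub, asset))
    return found

sales = Category("Sales")
sales.terms["Net Revenue"] = Term(
    "Net Revenue", "Hypothetical definition: gross sales minus returns.",
    ["dw.fact_sales.amount", "report.weekly_sales"])
root = Category("Enterprise", subcategories={"Sales": sales})
print(terms_for_asset(root, "report.weekly_sales"))
```

Looking up terms by asset is the direction a user follows when asking what a given table or report means in business language.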
FastTrack
► FastTrack automates multiple data integration tasks from analysis to code generation, while incorporating the business perspective and maintaining lineage and documented requirements.
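A small Python sketch can illustrate generating code from a documented mapping specification. Everything in it is hypothetical:
- the mapping format;
- the table and column names;
- the choice of SQL as the output.

FastTrack's own target is DataStage jobs rather than SQL. The sketch only conveys the general technique of keeping the documented requirements (the notes) alongside the generated code.

```python
# Hypothetical mapping specification: (target column, source expression, note).
MAPPING = [
    ("customer_id", "src.cust_no", "direct move"),
    ("full_name", "TRIM(src.first_nm) || ' ' || TRIM(src.last_nm)", "derivation"),
    ("country_code", "UPPER(src.country)", "standardize"),
]

def generate_sql(target_table, source_table, mapping):
    """Generate a commented INSERT ... SELECT statement from a column mapping spec."""
    header = "\n".join(f"-- {t}: {note}" for t, _, note in mapping)
    targets = ", ".join(t for t, _, _ in mapping)
    selects = ",\n       ".join(f"{expr} AS {t}" for t, expr, _ in mapping)
    return (f"{header}\n"
            f"INSERT INTO {target_table} ({targets})\n"
            f"SELECT {selects}\n"
            f"FROM {source_table} AS src;")

print(generate_sql("dim_customer", "customers", MAPPING))
```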