meljun cortes introducing data warehousing - the chess pieces

Upload: meljun-cortes-mbampa

Posted on 14-Apr-2018

  • 7/30/2019 MELJUN CORTES Introducing Data Warehousing - The Chess Pieces

    1/21

    Introducing Data Warehousing: The Chess Pieces

    Page 1 of 21

    IBM Software Group

    Business Analytics

    Restricted and Confidential

    For internal use only

    IBM Software Group Business Analytics

    2010 IBM Corporation

    Introducing Data Warehousing: The Chess Pieces

    Dennis Buttera, Senior Training Specialist, IBM Software Group Business Analytics

    Last updated: April 2010

    Material for this presentation is taken from:

    The Data Warehouse Lifecycle Toolkit

    Expert Methods for Designing, Developing and Deploying Data Warehouses

    Ralph Kimball, Laura Reeves, Margy Ross, Warren Thornthwaite

    Wiley Computer Publishing, John Wiley & Sons, Inc., 1998

    ISBN: 0-471-25547-5

    Overview

    Vague terminology in the data warehouse marketplace

    Define important terms with mainstream definitions

    Learn the strategic significance of each piece and how each fits into the big picture

    This is something like studying all the chess pieces and what they can do before attempting to play a chess game.

    (from The Data Warehouse Lifecycle Toolkit)

    There is a lot of vague terminology being used in the data warehouse (DW) marketplace. Some would even classify a DW as a non-queryable data resource! We will attempt to provide definitions for the key components of data warehousing that are close to the mainstream definitions.

    Key takeaways from session

    A better understanding of the terms used in data warehousing

    A better understanding of the dimensional lifecycle

    A basic understanding of where Framework Manager fits into the data warehousing scenario

    Data Warehouse Pieces

    Need to understand strategic significance of pieces

    Need to know roles

    Need to think ahead

    The challenge:

    To address the ever-changing nature of the business environment:

    User needs

    Business conditions

    Changing nature of data

    Technical environment

    In chess, you study the use of the pieces before trying to play the game. You need to understand what they can (and cannot) do on the board. You also need to learn the strategic significance of the pieces and how to wield them during a game.

    Finally, you will have more success if you think ahead. Your opponent is the ever-changing nature of the environment you work in.

    You cannot avoid the changing user needs, the changing business conditions, the changing nature of the data you are given to work with, and the changing technical environment.

    In the upcoming slides, we will identify and briefly define the basic elements (mainstream definitions) of a data warehouse.

    BASIC ELEMENTS OF A DATA WAREHOUSE

    Basic Elements of DW: Source System

    An operational system whose function is to capture the transactions of the business.

    Legacy system

    Queries are narrow account-based queries

    Limited historical information

    Management reporting is a burden

    Little investment in conforming basic dimensions (e.g., product, customer, geography) with other legacy systems

    Production keys (e.g., product number, customer number)

    In a mainframe environment, a source system is often called a legacy system. Its main priorities are uptime and availability.

    Queries against source systems are part of the normal transaction flow, but they can be restrictive because of the demands placed on the legacy systems.

    Most legacy systems maintain little historical data, and management reporting from source systems is a burden on these systems. They are not queried in the broad ways that data warehouses typically are.

    Most of these systems were developed with little or no investment in conforming basic dimensions (e.g., product, customer, geography or calendar) with other legacy systems in an organization.

    Source systems have keys (also known as production keys) that make certain things unique (e.g., product keys or customer keys). In the data warehouse, these keys are treated as attributes, like other textual descriptions.
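    The following minimal Python sketch shows how a production key can live on as an ordinary attribute. It assumes the warehouse assigns its own integer surrogate keys, which corresponds to the "surrogate creation" step listed among the data staging processes. The class and field names (`SurrogateKeyMap`, `product_key`, `production_key`) are hypothetical. The example also qualifies each production key by its source system, because two unrelated legacy systems may reuse the same product number.

```python
from typing import Dict, List, Tuple


class SurrogateKeyMap:
    """Hypothetical helper: maps (source system, production key) pairs to
    warehouse surrogate keys and keeps the production key as an attribute."""

    def __init__(self) -> None:
        self._lookup: Dict[Tuple[str, str], int] = {}
        self.rows: List[dict] = []  # dimension rows, in insertion order

    def key_for(self, source_system: str, production_key: str, description: str) -> int:
        # Qualify the production key by its source system: two legacy
        # systems may reuse the same product number for different things.
        natural = (source_system, production_key)
        if natural not in self._lookup:
            surrogate = len(self.rows) + 1  # simple ascending integer key
            self._lookup[natural] = surrogate
            self.rows.append({
                "product_key": surrogate,          # warehouse join key
                "source_system": source_system,    # plain attribute
                "production_key": production_key,  # plain attribute, not a join key
                "description": description,
            })
        return self._lookup[natural]


keys = SurrogateKeyMap()
print(keys.key_for("ORDERS", "P-100", "Chess set, walnut"))  # 1
print(keys.key_for("BILLING", "P-100", "Chess clock"))       # 2
print(keys.key_for("ORDERS", "P-100", "Chess set, walnut"))  # 1
```

    Fact tables would join on `product_key`. The original `P-100` stays available for filtering and for tracing rows back to the source system.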

    Basic Elements of DW: Data Staging Area

    A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.

    Everything between source system and presentation server

    Can be spread over a number of machines

    Sorting, sequential processing activities

    Not necessarily relational

    Data often arrives in 3rd normal form RDB

    Set of normalized structures

    Does not provide query and presentation services

    Definition - A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.

    The data staging area is everything between the source system and the presentation server. Ideally it exists in a single centralized area, but it is more likely to be spread over a number of machines. Activities here are dominated by the simple activities of sorting and sequential processing. The staging area does not have to be based on relational technology. Building a fully normalized database here could be pointless if the data has already been verified for conformance with the one-to-one and many-to-one business rules you defined.

    In many situations, data arrives at the doorstep of the data staging area in third normal form from a relational database (RDB). Some managers of the data staging area are more comfortable organizing their cleaning, transforming, and combining steps around a set of normalized structures.

    A key restriction of the data staging area is that it does not provide query and presentation services.
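    The cleaning, de-duplicating, and householding work done in the staging area can be illustrated in a few lines of Python. This is a minimal sketch under simple assumptions. The `Customer` record, the whitespace and case normalization, and the rule that "same cleaned address means same household" are all hypothetical. Real staging jobs use much richer matching logic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Customer:
    source_id: str
    name: str
    address: str


def clean(rec: Customer) -> Customer:
    # Collapse runs of whitespace and standardize case.
    return Customer(
        rec.source_id.strip(),
        " ".join(rec.name.split()).title(),
        " ".join(rec.address.split()).upper(),
    )


def deduplicate(records: List[Customer]) -> List[Customer]:
    # Keep the first record seen for each (name, address) pair.
    seen = set()
    unique = []
    for rec in records:
        key = (rec.name, rec.address)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def household(records: List[Customer]) -> Dict[str, List[str]]:
    # Group people who share a cleaned address into one household.
    groups: Dict[str, List[str]] = {}
    for rec in records:
        groups.setdefault(rec.address, []).append(rec.name)
    return groups


raw = [
    Customer("A1", "  ann   smith ", "12 Oak St"),
    Customer("B7", "Ann Smith", "12  oak st"),  # same person, another system
    Customer("A2", "bob smith", "12 Oak St "),
    Customer("A3", "Cy Jones", "9 Elm Rd"),
]
staged = deduplicate([clean(r) for r in raw])
print(household(staged))
# {'12 OAK ST': ['Ann Smith', 'Bob Smith'], '9 ELM RD': ['Cy Jones']}
```

    The example cleans before it de-duplicates. The two "Ann Smith" records only match after their whitespace and case differences have been removed.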

    Basic Elements of DW: Presentation Server

    The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.

    Data presented and stored in a dimensional framework

    If relational, tables organized as star schemas

    If OLAP, recognizable dimensions

    Definition - The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.

    It is suggested that three different systems are required for a data warehouse to function:

    1. Source System - should be thought of as outside the data warehouse. In most situations, there is no control over the content and format of the data on the legacy system.

    2. Data Staging Area - the initial storage and cleaning system for data that is moving toward the presentation server. This area may consist only of flat files.

    3. Presentation Server - where the data should be presented and stored in a dimensional framework. If it is based on a relational database, tables are organized as star schemas. If it is based on non-relational on-line analytical processing (OLAP), the data will have recognizable dimensions.

    Most large data marts use relational databases, but OLAP is becoming much more relevant.
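    To make "tables organized as star schemas" concrete, here is a Python sketch that uses the standard-library `sqlite3` module as a stand-in for a production relational presentation server. The table names (`sales_fact`, `date_dim`, `product_dim`, `store_dim`), the columns, and the sample rows are all hypothetical. The final query is a typical star join: it constrains on one dimension and groups by attributes of the others.

```python
import sqlite3

SCHEMA = """
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
-- The fact table is the center of the star: one foreign key per dimension
-- plus the numeric measures.
CREATE TABLE sales_fact (
    date_key     INTEGER REFERENCES date_dim (date_key),
    product_key  INTEGER REFERENCES product_dim (product_key),
    store_key    INTEGER REFERENCES store_dim (store_key),
    quantity     INTEGER,
    sales_amount INTEGER
);
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)",
                     [(1, "2010-04-01", "2010-04"), (2, "2010-04-02", "2010-04")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(10, "Pawn set", "Chess"), (11, "Board", "Chess"), (12, "Cards", "Games")])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                     [(100, "Downtown", "East"), (101, "Airport", "West")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
                     [(1, 10, 100, 2, 40), (1, 11, 100, 1, 25),
                      (2, 12, 101, 3, 30), (2, 10, 101, 1, 20)])
    return conn


# Star join: constrain on the date dimension, group by product and store attributes.
QUERY = """
SELECT p.category, s.region, SUM(f.sales_amount)
FROM sales_fact AS f
JOIN date_dim    AS d ON f.date_key = d.date_key
JOIN product_dim AS p ON f.product_key = p.product_key
JOIN store_dim   AS s ON f.store_key = s.store_key
WHERE d.month = '2010-04'
GROUP BY p.category, s.region
ORDER BY p.category, s.region
"""

rows = build_demo().execute(QUERY).fetchall()
print(rows)  # [('Chess', 'East', 65), ('Chess', 'West', 20), ('Games', 'West', 30)]
```

    SQLite only enforces `REFERENCES` clauses after `PRAGMA foreign_keys = ON` has been executed. Here they mainly document how the star is wired together.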

    Basic Elements of DW: Dimensional Model

    A specific discipline for modeling data that is an alternative to entity relationship modeling.

    Same information as E/R model

    Data packaged for user understandability, query performance and resilience to change

    Fact table

    Dimension table

    Definition - A specific discipline for modeling data that is an alternative to entity relationship modeling.

    A dimensional model contains the same information as an E/R model, but it packages the data in a balanced format whose design goals are user understandability, query performance, and resilience to change.

    Too many data warehouses fail because of overly complex E/R designs. Dimensional modeling is key.

    The main components of a dimensional model are fact tables and dimension tables. Briefly:

    Fact Table

    The primary table in each dimensional model, meant to contain measurements of the business. A fact represents a business measure. The most useful facts are numeric and additive. Each fact table represents a many-to-many relationship, and every fact table contains a set of two or more foreign keys that join to their respective dimension tables.

    Dimension Table

    One of a set of companion tables to a fact table. Each dimension is defined by its primary key, which serves as the basis for referential integrity with any fact table it is joined to. Most dimension tables contain many attributes (fields) that are the basis for constraining and grouping within data warehouse queries.
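    The two properties described above can be checked with plain Python dictionaries. First, every fact-table foreign key must find a row in its dimension table (referential integrity). Second, additive facts can be summed along any dimension attribute. This is a minimal sketch: the sample tables and the helper names `orphan_rows` and `total_by` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Dimension tables: primary key -> descriptive attributes.
product_dim = {
    1: {"name": "Knight", "category": "Pieces"},
    2: {"name": "Board", "category": "Boards"},
    3: {"name": "Rook", "category": "Pieces"},
}
store_dim = {10: {"city": "Boston"}, 20: {"city": "Ottawa"}}

# Fact table: foreign keys into each dimension plus numeric, additive measures.
sales_fact = [
    {"product_key": 1, "store_key": 10, "units": 3, "revenue": 30},
    {"product_key": 2, "store_key": 10, "units": 1, "revenue": 50},
    {"product_key": 3, "store_key": 20, "units": 2, "revenue": 40},
    {"product_key": 1, "store_key": 20, "units": 1, "revenue": 10},
]

DIMENSIONS = {"product_key": product_dim, "store_key": store_dim}


def orphan_rows(facts: List[dict], dims: Dict[str, dict]) -> List[dict]:
    """Fact rows that violate referential integrity (unknown foreign key)."""
    return [f for f in facts if any(f[fk] not in dim for fk, dim in dims.items())]


def total_by(facts: List[dict], fk: str, dim: dict, attribute: str, measure: str) -> Dict[str, int]:
    """Group by a dimension attribute and sum an additive measure."""
    totals: Dict[str, int] = defaultdict(int)
    for f in facts:
        totals[dim[f[fk]][attribute]] += f[measure]
    return dict(totals)


print(orphan_rows(sales_fact, DIMENSIONS))  # []
print(total_by(sales_fact, "product_key", product_dim, "category", "revenue"))
# {'Pieces': 80, 'Boards': 50}
print(total_by(sales_fact, "store_key", store_dim, "city", "units"))
# {'Boston': 4, 'Ottawa': 3}
```

    Because `revenue` and `units` are additive, each grouping's totals add up to the grand total of the fact table. A non-additive fact, such as a ratio, would not behave this way.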

    Basic Elements of DW: Business Process

    A coherent set of business activities that make sense to the business users of the data warehouse.

    A business process is a set of activities (e.g., order processing)

    Business processes overlap

    Grouping of information resources with a coherent theme

    One or more data marts for each

    Definition - A coherent set of business activities that make sense to the business users of the data warehouse.

    A business process is a set of activities (e.g., order processing, customer pipeline management), but business processes can overlap. Individual business processes can also evolve over time. In general, a business process is a grouping of information resources with a coherent theme.

    It is common to have one or more data marts for each business process.

    Basic Elements of DW: Data Mart

    A logical subset of the complete data warehouse.

    Pie-wedge of overall data warehouse

    Restriction to a single business process or group of related business processes targeted toward a particular business group

    Built from conformed facts and conformed dimensions

    Top-down vs. bottom-up

    Based on granular data

    May or may not contain aggregates

    Definition - A logical subset of the complete data warehouse.

    A data mart is a complete pie-wedge of the overall data warehouse pie. It represents a project that can be brought to completion, rather than an impossible all-encompassing undertaking. A data warehouse is a collection of data marts that are joined to work together. A data mart can be viewed as the restriction of a data warehouse to a single business process, or to a group of related business processes targeted toward a particular business group. It is usually sponsored and built by a single part of the business.

    Best practices state that data marts should be built from conformed facts and conformed dimensions (the basis of the Data Warehouse Bus Architecture). This makes them robust and resilient to continuously evolving (not changing) requirements. It also ensures that data marts can be combined.

    The top-down data warehouse perspective is that a completely centralized, tightly designed master database must be completed before parts of it are summarized and published as individual data marts.

    The bottom-up perspective is that an enterprise data warehouse can be assembled from disparate and unrelated data marts.

    Working at only one extreme or the other may not be the best approach. A blend of both works well: if a proper architecture is put in place, it guides the design of all the separate pieces. The best way to link the physical tables together is to ensure that the dimensions of the data have the same meaning across the tables (conformed).

    Data marts are generally based on granular data and may or may not contain aggregates (summaries for performance improvements).
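    Conformed dimensions are what allow separately built data marts to be combined. This minimal Python sketch assumes two hypothetical marts, one for orders and one for shipments, that share a single conformed product dimension. Because both marts use the same keys and category labels, each mart can be summarized on its own and the answers lined up by label. Dimensional-modeling literature often calls this operation "drilling across". All names and figures are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Conformed product dimension: same keys and same labels in every mart.
product_dim = {1: "Chess sets", 2: "Chess clocks", 3: "Chess sets"}

# Two marts, built at different times by different groups.
orders_mart: List[Tuple[int, int]] = [(1, 5), (2, 2), (3, 3)]     # (product_key, units ordered)
shipments_mart: List[Tuple[int, int]] = [(1, 4), (2, 1), (3, 2)]  # (product_key, units shipped)


def summarize(facts: List[Tuple[int, int]]) -> Dict[str, int]:
    # Aggregate one mart up to the conformed category level.
    totals: Dict[str, int] = defaultdict(int)
    for product_key, qty in facts:
        totals[product_dim[product_key]] += qty
    return totals


def drill_across(a: List[Tuple[int, int]], b: List[Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    # Query each mart separately, then align the answers on the shared label.
    left, right = summarize(a), summarize(b)
    return {label: (left.get(label, 0), right.get(label, 0))
            for label in sorted(set(left) | set(right))}


print(drill_across(orders_mart, shipments_mart))
# {'Chess clocks': (2, 1), 'Chess sets': (8, 6)}
```

    If the shipments mart labeled its products differently, for example "Sets" instead of "Chess sets", the rows would no longer line up. That is the practical cost of unconformed dimensions.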

    Basic Elements of DW: Data Warehouse

    The queryable source of data in the enterprise.

    Union of data marts

    Fed from data staging area

    Not organized around the E/R model

    Frequently updated

    Definition - The queryable source of data in the enterprise.

    The data warehouse is a union of data marts. It is fed from the data staging area (usually managed by a DW manager).

    The data warehouse is the queryable presentation of an enterprise's data, and it is not organized around the E/R model. If it were, understandability and performance would be lost. It is frequently updated on a controlled basis as data is corrected, snapshots are accumulated, and statuses and labels are changed.

    Basic Elements of DW: Operational Data Store (ODS)

    Originally served as a point of integration for operational systems. Now also includes decision support access by clerks and executives.

    Describes anything from the underlying database of an operational system to the data warehouse

    Point of integration

    Constant operational access and updates

    Decision support access

    Second ODS built to support lowest layer of DW

    Operational real-time role or reporting and decision support

    Definition - Originally served as a point of integration for operational systems. Now also includes decision support access by clerks and executives.

    The term ODS has taken on a number of definitions and is not as useful as it used to be. It has been used to describe everything from the underlying database of an operational system to the data warehouse itself.

    Originally, the ODS was meant to serve as a point of integration for operational systems. This was especially important for legacy systems that grew independently of each other.

    Example: Banks typically have independent systems to handle loans, checking accounts, savings accounts, etc. With the emergence of teller support computers and the ATM, many banks created ODSs to integrate current balances and recent history from these separate accounts under one customer number. This is a perfect example of the role an ODS can play. This need for integration was a driving force behind the success of the client/server ERP business.

    This type of ODS should be housed outside the warehouse. You need to ensure that no one launches a complex report requiring full table scans and aggregation of historical data while users are trying to view history to support a business scenario.

    The purpose of the ODS is now considered to include decision support access by clerks and executives. The thinking is that if the ODS is meant to contain integrated data at the detail level, a second ODS needs to be created to support the lowest layer of the data warehouse.

    If the ODS plays an operational, real-time role, it should be kept separate. If it is meant to provide reporting and decision support, it is suggested that you skip the ODS and address those needs directly from the detailed level of the data warehouse.
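    The bank example above comes down to integrating current balances from independent systems under one customer number. This minimal Python sketch assumes that each hypothetical source extract already carries the shared customer number. In practice, building that cross-reference is often the hardest part. The system names and balances are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical extracts from three independent legacy systems.
loans = [{"account": "L-9", "customer_no": 501, "balance": 12000}]
checking = [
    {"account": "C-1", "customer_no": 501, "balance": 850},
    {"account": "C-2", "customer_no": 502, "balance": 120},
]
savings = [{"account": "S-4", "customer_no": 501, "balance": 4000}]


def integrate(systems: Dict[str, List[dict]]) -> Dict[int, Dict[str, int]]:
    """Build one current-balance view per customer across all systems."""
    view: Dict[int, Dict[str, int]] = defaultdict(dict)
    for system, rows in systems.items():
        for row in rows:
            per_customer = view[row["customer_no"]]
            # A customer may hold several accounts in the same system.
            per_customer[system] = per_customer.get(system, 0) + row["balance"]
    return dict(view)


ods = integrate({"loans": loans, "checking": checking, "savings": savings})
print(ods[501])  # {'loans': 12000, 'checking': 850, 'savings': 4000}
```

    A view like this is read and refreshed constantly by tellers and ATMs. That is why the text recommends keeping an operational ODS apart from heavy historical reporting.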

    Basic Elements of DW: OLAP (On-line Analytic Processing)

    The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.

    Non-relational

    Multidimensional cube of data

    Definition - The general activity of querying and presenting text and number data from data warehouses, as well as a dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.

    OLAP vendors' technology is non-relational and is almost always based on an explicit multidimensional cube of data. OLAP databases are also known as multidimensional databases (MDDB). When viewed against the full range of data warehousing, OLAP installations are classified as small, individual data marts.
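    A multidimensional cube can be pictured in Python as a mapping from coordinates (one member per dimension) to a measure. This is an illustration only. It is a sparse, in-memory stand-in and does not reflect how any particular OLAP engine stores data. The dimensions (`product`, `region`, `month`) and the figures are hypothetical. The sketch shows two classic cube operations: slicing, which fixes one dimension, and rolling up, which sums dimensions away.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Coord = Tuple[str, ...]
AXES = ("product", "region", "month")  # dimension order within each coordinate


def build_cube(facts: Iterable[Tuple[str, str, str, int]]) -> Dict[Coord, int]:
    cube: Dict[Coord, int] = defaultdict(int)
    for product, region, month, amount in facts:
        cube[(product, region, month)] += amount
    return dict(cube)


def slice_cube(cube: Dict[Coord, int], axis: str, member: str) -> Dict[Coord, int]:
    # Keep only the cells where one dimension equals the given member.
    i = AXES.index(axis)
    return {c: v for c, v in cube.items() if c[i] == member}


def roll_up(cube: Dict[Coord, int], keep: Tuple[str, ...]) -> Dict[Coord, int]:
    # Sum away every dimension not listed in `keep`.
    idx = [AXES.index(a) for a in keep]
    out: Dict[Coord, int] = defaultdict(int)
    for c, v in cube.items():
        out[tuple(c[i] for i in idx)] += v
    return dict(out)


cube = build_cube([
    ("Knight", "East", "Jan", 5), ("Knight", "West", "Jan", 3),
    ("Board", "East", "Feb", 7), ("Knight", "East", "Feb", 2),
    ("Knight", "East", "Jan", 1),
])
print(roll_up(cube, ("product",)))  # {('Knight',): 11, ('Board',): 7}
print(slice_cube(cube, "month", "Jan"))
# {('Knight', 'East', 'Jan'): 6, ('Knight', 'West', 'Jan'): 3}
```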

    Basic Elements of DW: End User Application

    A collection of tools that query, analyze, and present information targeted to support a business need.

    Minimal set of tools made up of:

    End user data access tool

    A spreadsheet

    A graphics package

    User interface that simplifies the user experience

    Basic Elements of DW: End User Data Access Tool

    A client of the data warehouse.

    Maintains a session with the presentation server

    Displays a report, graph, or analysis

    Simple tool (ad hoc) or complex tool (data mining)

    Modeling or forecasting tools upload results to special data warehouse areas

    Basic Elements of DW: Modeling Applications

    A sophisticated data warehouse client with analytic capabilities that transform or digest the output from a data warehouse.

    Includes:

    Forecasting models

    Behaviour scoring models

    Allocation models

    Data mining tools

    Basic Elements of DW: Metadata

    All of the information in the data warehouse environment that is not the actual data itself.

    Should:

    Catalog

    Version stamp

    Document

    Backup
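    The four duties listed for metadata (catalog, version stamp, document, back up) can be pictured as a tiny registry. This Python sketch is purely illustrative. `MetadataCatalog` and `MetadataEntry` are hypothetical names and do not correspond to any product's metadata repository. The sketch uses the "version stamp" duty to keep every revision rather than overwriting the previous one.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class MetadataEntry:
    name: str          # object being described, e.g. a table or ETL job
    kind: str          # "table", "etl_job", "report", ...
    description: str   # the documentation itself
    version: int       # version stamp: 1, 2, 3, ...
    stamped_at: str    # UTC timestamp of this version


class MetadataCatalog:
    def __init__(self) -> None:
        self._history: Dict[str, List[MetadataEntry]] = {}

    def register(self, name: str, kind: str, description: str) -> MetadataEntry:
        # Catalog + version stamp: every change appends a new numbered version.
        versions = self._history.setdefault(name, [])
        entry = MetadataEntry(name, kind, description, len(versions) + 1,
                              datetime.now(timezone.utc).isoformat())
        versions.append(entry)
        return entry

    def current(self, name: str) -> MetadataEntry:
        return self._history[name][-1]

    def backup(self) -> str:
        # Backup: serialize the full history, not just the latest versions.
        return json.dumps({n: [asdict(e) for e in v] for n, v in self._history.items()})


catalog = MetadataCatalog()
catalog.register("sales_fact", "table", "One row per product per store per day.")
catalog.register("sales_fact", "table", "Grain: one row per sales line item.")
print(catalog.current("sales_fact").version)  # 2
```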

    BASIC PROCESSES OF A DATA WAREHOUSE

    Data staging includes:

    Extracting

    Transforming (cleaning, purging, combining, surrogate key creation, aggregates)

    Loading and indexing

    Quality assurance checking

    Updating

    Querying

    Data feedback/feeding in reverse

    Auditing

    Securing

    Backup and recovery
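    Several of the staging steps listed above can be chained together in a short Python sketch: cleaning, purging, loading and indexing, building aggregates, and quality assurance checking. It uses the standard-library `sqlite3` module as a stand-in target. The table names and the reconciliation rule (input, detail, and aggregate totals must all agree) are illustrative assumptions, not a prescribed procedure.

```python
import sqlite3
from typing import List, Optional, Tuple


def stage_and_load(extracted: List[Tuple[str, Optional[int]]]) -> sqlite3.Connection:
    # Transform: purge rows with no amount, clean product codes.
    cleaned = [(code.strip().upper(), amount)
               for code, amount in extracted if amount is not None]

    conn = sqlite3.connect(":memory:")
    # Load and index the detailed (granular) fact rows.
    conn.execute("CREATE TABLE sales_fact (product_code TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", cleaned)
    conn.execute("CREATE INDEX idx_sales_product ON sales_fact (product_code)")

    # Aggregate: a summary table to speed up common queries.
    conn.execute("""CREATE TABLE sales_by_product AS
                    SELECT product_code, SUM(amount) AS total
                    FROM sales_fact GROUP BY product_code""")

    # Quality assurance: input, detail, and aggregate totals must reconcile.
    detail = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM sales_fact").fetchone()[0]
    summary = conn.execute("SELECT COALESCE(SUM(total), 0) FROM sales_by_product").fetchone()[0]
    expected = sum(amount for _, amount in cleaned)
    if not (detail == summary == expected):
        raise ValueError(f"QA check failed: {expected=} {detail=} {summary=}")
    return conn


conn = stage_and_load([(" k1 ", 10), ("K1", 5), ("b2", None), ("b2", 7)])
print(conn.execute("SELECT * FROM sales_by_product ORDER BY product_code").fetchall())
# [('B2', 7), ('K1', 15)]
```

    The QA check runs before the connection is handed back. A load whose aggregates do not reconcile with its detail rows therefore fails loudly instead of being published to users.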

    Vragen

    Fragen

    Domande

    Preguntas

    Perguntas

    Questions?

    Cwestiwn