meljun cortes introducing data warehousing - the chess pieces
7/30/2019 MELJUN CORTES Introducing Data Warehousing - The Chess Pieces
IBM Software Group
Business Analytics
Restricted and Confidential
For internal use only
IBM Software Group Business Analytics
2010 IBM Corporation
Introducing Data Warehousing - The Chess Pieces
Dennis Buttera, Senior Training Specialist, IBM Software Group, Business Analytics
Last updated: April 2010
Material for presentation referenced from:
The Data Warehouse Lifecycle Toolkit
Expert Methods for Designing, Developing and Deploying Data Warehouses
Ralph Kimball, Laura Reeves, Margy Ross, Warren Thornthwaite
Wiley Computer Publishing, John Wiley & Sons, Inc., 1998
ISBN: 0-471-25547-5
Overview
Vague terminology in the data warehouse marketplace
Define important terms with mainstream definitions
Learn the strategic significance of each piece and how each fits into the big picture
"This is something like studying all the chess pieces and what they can do before attempting to play a chess game."
From The Data Warehouse Lifecycle Toolkit
There is a lot of vague terminology in use in the data warehouse (DW) marketplace.
Some would even classify a DW as a non-queryable data resource! We will attempt to provide
definitions for the key components of data warehousing that are close to the mainstream
definitions.
Key takeaways from session
A better understanding of the terms used in data warehousing
A better understanding of the dimensional lifecycle
A basic understanding of where Framework Manager fits into the data warehousing scenario
Data Warehouse Pieces
Need to understand strategic significance of pieces
Need to know roles
Need to think ahead
The challenge:
To address the ever-changing nature of the business environment:
User needs
Business conditions
Changing nature of data
Technical environment
In chess, you study the use of the pieces before trying to play the game. You need to
understand what they can (and cannot) do on the board. You also need to learn the strategic
significance of the pieces and how to wield them during a game.
Finally, you have better success when thinking ahead. Your opponent is the ever-changing
nature of the environment you work in.
You cannot avoid the changing user needs, changing business conditions, changing nature of
the data you are given to work with, and the changing technical environment.
In the upcoming slides, we will identify and briefly define the basic elements (mainstream definitions) of a data warehouse.
BASIC ELEMENTS OF A DATA WAREHOUSE
Basic Elements of DW - Source System
An operational system whose function is to capture the transactions of the business.
Legacy system
Queries are narrow, account-based queries
Limited historical information
Management reporting is a burden
Little investment in conforming basic dimensions (e.g., product, customer, geography) with other legacy systems
Production keys (e.g., product number, customer number)
A source system is often called a legacy system in a mainframe environment. Its main
priorities are uptime and availability.
Queries against source systems are narrow, account-based queries that are part of the normal
transaction flow and are restricted in the demands they can place on the legacy system.
Most legacy systems maintain little historical data, and management reporting from source
systems is a burden on these systems. They are not queried in the broad ways that data
warehouses are typically queried.
Most of these systems were developed with little or no investment in conforming basic
dimensions (e.g., product, customer, geography, or calendar) with other legacy systems in the
organization.
Source systems have keys (also known as production keys) that make certain things unique
(e.g., product keys or customer keys). In the data warehouse, these production keys are
treated as attributes, like any other textual description.
Basic Elements of DW - Data Staging Area
A storage area and set of processes that clean, transform, combine, de-duplicate, household, archive, and prepare source data for use in the data warehouse.
Everything between the source system and the presentation server
Can be spread over a number of machines
Dominated by sorting and sequential processing activities
Not necessarily relational
Data may arrive in third normal form from a relational database
May be organized as a set of normalized structures
Does not provide query and presentation services
Definition - A storage area and set of processes that clean, transform, combine, de-duplicate,
household, archive, and prepare source data for use in the data warehouse.
The data staging area is everything between the source system and the presentation server. It
would be better for it to exist in a single centralized area, but it is more likely to be spread
over a number of machines. Activities here are dominated by the simple activities of sorting and
sequential processing. The staging area does not have to be based on relational technology.
Building a full relational design there could be pointless once the data has been checked for
conformance with the one-to-one and many-to-one business rules you have defined.
In many situations, however, data arrives at the doorstep of the data staging area in third
normal form from a relational database. Some managers of the data staging area are more
comfortable organizing their cleaning, transforming, and combining steps around a set of
normalized structures.
Restriction of the data staging area: it does not provide query and presentation services.
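As a rough illustration of the cleaning, combining, and de-duplicating steps described above, the following Python sketch works on flat-file style records using a sort followed by a single sequential pass. The record layout, field names, and sample values are hypothetical and do not come from the source material; a real staging area would involve far more rules than this.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical extracts from two source systems (flat-file style rows).
orders_extract = [
    {"customer_no": " c-001", "name": "ACME corp ", "region": "east"},
    {"customer_no": "C-002", "name": "Globex", "region": "WEST"},
]
billing_extract = [
    {"customer_no": "C-001", "name": "Acme Corp", "region": "East"},
    {"customer_no": "C-003", "name": "Initech", "region": "north"},
]


def clean(row):
    """Standardize whitespace and letter case so rows can be compared."""
    return {
        "customer_no": row["customer_no"].strip().upper(),
        "name": row["name"].strip().title(),
        "region": row["region"].strip().title(),
    }


def stage_customers(*extracts):
    """Clean, combine, sort, and de-duplicate customer rows."""
    combined = [clean(row) for extract in extracts for row in extract]
    combined.sort(key=itemgetter("customer_no"))  # sorting step
    staged = []
    # Sequential pass: keep the first row seen for each production key.
    for _, rows in groupby(combined, key=itemgetter("customer_no")):
        staged.append(next(rows))
    return staged


staged_customers = stage_customers(orders_extract, billing_extract)
for row in staged_customers:
    print(row)
```

Note that nothing here offers query or presentation services; the staged rows are only an intermediate result on the way to the presentation server.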
Basic Elements of DW - Presentation Server
The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications.
Data presented and stored in a dimensional framework
If relational, tables organized as star schemas
If OLAP, recognizable dimensions
Definition - The target physical machine on which the data warehouse data is organized and
stored for direct querying by end users, report writers, and other applications.
It is suggested that three different systems are required for a data warehouse to function:
1. Source System - should be thought of as outside the data warehouse. In most
situations, there is no control over the content and format of the data on the legacy
system.
2. Data Staging Area - the initial storage and cleaning system for data that is moving
toward the presentation server. This area may contain only flat files.
3. Presentation Server - it is suggested that this is where the data should be presented and
stored in a dimensional framework. If based on a relational database, the tables are organized
as star schemas. If based on non-relational on-line analytical processing (OLAP) technology,
the data will still have recognizable dimensions.
Most large data marts use relational databases, but OLAP is becoming much more relevant.
Basic Elements of DW - Dimensional Model
A specific discipline for modeling data that is an alternative to entity-relationship modeling.
Same information as E/R model
Data packaged for user understandability, query performance, and resilience to change
Fact table
Dimension table
Definition - A specific discipline for modeling data that is an alternative to
entity-relationship (E/R) modeling.
A dimensional model contains the same information as an E/R model, but packages the data
in a balanced format whose design goals are user understandability, query performance, and
resilience to change.
Too many data warehouses fail because of overly complex E/R designs. Dimensional
modeling is the key to avoiding this.
The main components of a dimensional model are fact tables and dimension tables. Briefly:
Fact Table
The fact table is the primary table in each dimensional model and is meant to contain
measurements of the business. Each fact represents a business measure. The most useful facts
are numeric and additive. Each fact table represents a many-to-many relationship, and every
fact table contains a set of two or more foreign keys that join to their respective dimension
tables.
Dimension Table
A dimension table is one of a set of companion tables to a fact table. Each dimension is
defined by its primary key, which serves as the basis for referential integrity with any fact
table to which it is joined. Most dimension tables contain many attributes (fields) that are
the basis for constraining and grouping within data warehouse queries.
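To make the fact table and dimension table roles concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions made for illustration only; they are not taken from the presentation or from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,
    product_name TEXT,
    category TEXT
);
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month TEXT
);
-- The fact table holds numeric, additive measures plus one foreign key
-- per dimension.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim (product_key),
    date_key INTEGER REFERENCES date_dim (date_key),
    quantity_sold INTEGER,
    sales_amount REAL
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (1, "Espresso Beans", "Coffee"),
    (2, "Green Tea", "Tea"),
])
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    (10, "2010-04-01", "2010-04"),
    (11, "2010-05-01", "2010-05"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    (1, 10, 3, 30.0),
    (1, 11, 2, 20.0),
    (2, 10, 5, 25.0),
])

# Constrain on one dimension attribute (month), group by another (category).
sales_by_category = conn.execute("""
    SELECT p.category, SUM(f.quantity_sold), SUM(f.sales_amount)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN date_dim AS d ON d.date_key = f.date_key
    WHERE d.month = '2010-04'
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

The query touches the dimension attributes only for constraining and grouping, while the additive measures come from the fact table.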
Basic Elements of DW - Business Process
A coherent set of business activities that make sense to the business users of the data warehouse.
A business process is a set of activities (e.g., order processing)
Business processes overlap
Grouping of information resources with a coherent theme
One or more data marts for each
Definition - A coherent set of business activities that make sense to the business users of the
data warehouse.
A business process is a set of activities (e.g., order processing or customer pipeline
management), but business processes can overlap, and individual business processes can evolve
over time. In general, a business process is a grouping of information resources with a
coherent theme.
It is common to have one or more data marts for each business process.
Basic Elements of DW - Data Mart
A logical subset of the complete data warehouse.
Pie-wedge of overall data warehouse
Restriction to a single business process or group of related business processes targeted toward a particular business group
Built from conformed facts and conformed dimensions
Top-down vs. bottom-up
Based on granular data
May or may not contain aggregates
Definition - A logical subset of the complete data warehouse.
A data mart is a complete pie-wedge of the overall data warehouse pie. It represents a
project that can be brought to completion rather than an impossible all-encompassing
undertaking. A data warehouse is a collection of data marts that are joined to work together.
A data mart can be viewed as the restriction of a data warehouse to a single business process
or to a group of related business processes targeted toward a particular business group. It is
usually sponsored and built by a single part of the business.
Best practices state that data marts should be built from conformed facts and conformed
dimensions (the basis of the Data Warehouse Bus Architecture). This ensures robustness and
resilience to continuously evolving (not changing) requirements. It also ensures that data
marts can be combined.
The top-down data warehouse perspective is that a completely centralized, tightly designed
master database must be completed before parts of it are summarized and published as
individual data marts.
The bottom-up perspective is that an enterprise data warehouse can be assembled from
disparate and unrelated data marts.
Working only at one extreme or the other may not be the best approach. A blend of both is a
good approach: if a proper architecture is put in place, it guides the design of all the
separate pieces. The best way to link the physical tables together is to ensure that the
dimensions of the data have the same meaning across the tables (that is, that they are
conformed).
Data marts are generally based on granular data and may or may not contain aggregates
(summaries for performance improvement).
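The point about conformed dimensions can be sketched in a few lines of Python: two separate data marts share one product dimension, so their results can be combined. The marts, measures, and values below are hypothetical and are only meant to show the idea.

```python
from collections import defaultdict

# One conformed product dimension shared by both marts (hypothetical rows).
product_dim = {
    1: {"product_name": "Espresso Beans", "category": "Coffee"},
    2: {"product_name": "Green Tea", "category": "Tea"},
    3: {"product_name": "Decaf Beans", "category": "Coffee"},
}

# Granular fact rows for two separate data marts: (product_key, measure).
sales_facts = [(1, 30.0), (2, 25.0), (3, 12.0), (1, 20.0)]
inventory_facts = [(1, 100), (2, 40), (3, 60)]


def total_by_category(fact_rows):
    """Aggregate a mart's measure by the conformed 'category' attribute."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[product_dim[product_key]["category"]] += measure
    return dict(totals)


sales = total_by_category(sales_facts)
inventory = total_by_category(inventory_facts)

# Because "category" means the same thing in both marts, the two result
# sets can be lined up row by row.
combined = {
    category: {"sales_amount": sales.get(category, 0.0),
               "units_on_hand": inventory.get(category, 0.0)}
    for category in sorted(set(sales) | set(inventory))
}
print(combined)
```

If each mart had defined its own version of "category", the final step of lining up the two result sets would not be reliable.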
Basic Elements of DW - Data Warehouse
The queryable source of data in the enterprise.
Union of data marts
Fed from data staging area
Not organized around the E/R model
Frequently updated
Definition - The queryable source of data in the enterprise.
The data warehouse is a union of data marts. It is fed from the data staging area (usually
managed by a DW manager).
The data warehouse is the queryable presentation resource for an enterprise's data. It is not
organized around the E/R model; if it were, understandability and performance would be lost.
It is frequently updated on a controlled basis as data is corrected, snapshots are accumulated,
and statuses and labels are changed.
Basic Elements of DW - Operational Data Store (ODS)
Originally, served as a point of integration for operational systems. Now includes decision support access by clerks and executives.
Term has described everything from the underlying database of an operational system to the data warehouse itself
Point of integration
Constant operational access and updates
Decision support access
Second ODS built to support lowest layer of DW
Operational real-time role or reporting and decision support
Definition - Originally, served as a point of integration for operational systems. Now includes
decision support access by clerks and executives.
The term ODS has taken on a number of definitions and is not as useful as it used to be. It
has been used to describe everything from the underlying database of an operational system to
the data warehouse itself.
Originally, the ODS was meant to serve as a point of integration for operational systems.
This was especially important for legacy systems that grew independently of each other.
Example: Banks typically have independent systems to handle loans, checking accounts,
savings accounts, etc. With the emergence of teller support computers and the ATM, many
banks created ODSs to integrate current balances and recent history from these separate
accounts under one customer number. This is a perfect example of the role that an ODS can
play. This need for integration was a driving force behind the success of the client/server
ERP business.
This type of ODS should be housed outside the warehouse. Because it supports constant
operational access and updates, you need to ensure that no one launches a complex report
requiring full table scans and aggregation of historical data at the same time as operational
users are looking up history to support a business scenario.
The purpose of the ODS is now considered to include decision support access by clerks and
executives. The thought is that, because the ODS contains integrated data at the detail level,
one should be created to support the lowest layer of the data warehouse.
If the ODS is meant to play an operational, real-time role, it should be kept separate. If it
is meant to provide reporting and decision support, it is suggested that you skip the ODS and
address those needs directly from the detailed level of the data warehouse.
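The banking example above can be sketched as follows: current balances from independent systems are brought together under one customer number. The system names, account numbers, and balances are invented for illustration, and a real ODS would be a continuously updated database rather than an in-memory structure.

```python
# Hypothetical current balances held by three independent legacy systems.
loan_system = {"L-9001": {"customer_no": "C-001", "balance": -15000.00}}
checking_system = {
    "CH-1001": {"customer_no": "C-001", "balance": 1250.50},
    "CH-1002": {"customer_no": "C-002", "balance": 310.00},
}
savings_system = {"SV-5001": {"customer_no": "C-001", "balance": 8000.00}}


def build_ods_view(**systems):
    """Group current balances from each system under the customer number."""
    ods = {}
    for system_name, accounts in systems.items():
        for account_no, account in accounts.items():
            customer = ods.setdefault(account["customer_no"], [])
            customer.append({
                "system": system_name,
                "account_no": account_no,
                "balance": account["balance"],
            })
    return ods


ods_view = build_ods_view(
    loans=loan_system, checking=checking_system, savings=savings_system
)
# A teller looking up customer C-001 sees all accounts in one place.
for entry in ods_view["C-001"]:
    print(entry)
```

This is the operational, point-of-integration role; it keeps only current state and does not attempt historical analysis.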
Basic Elements of DW - OLAP (On-line Analytic Processing)
The general activity of querying and presenting text and number data from data warehouses, as well as a specifically dimensional style of querying and presenting that is exemplified by a number of OLAP vendors.
Non-relational
Multidimensional cube of data
Definition - The general activity of querying and presenting text and number data from data
warehouses, as well as a specifically dimensional style of querying and presenting that is
exemplified by a number of OLAP vendors.
The OLAP vendors' technology is non-relational and is almost always based on an explicit
multidimensional cube of data. OLAP databases are also known as multidimensional
databases (MDDB). OLAP installations are classified as small individual data marts when
viewed against the full range of data warehousing.
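A multidimensional cube can be pictured as a set of cells addressed by one member from each dimension. The Python sketch below is only a conceptual illustration using a dictionary; it does not reflect how any OLAP vendor actually stores or queries a cube, and the dimensions and values are hypothetical.

```python
from collections import defaultdict

# Cells of a small cube keyed by (product, region, month); values are sales.
cube = {
    ("Coffee", "East", "2010-04"): 30.0,
    ("Coffee", "West", "2010-04"): 20.0,
    ("Tea", "East", "2010-04"): 25.0,
    ("Coffee", "East", "2010-05"): 10.0,
}
DIMENSIONS = ("product", "region", "month")


def roll_up(cells, keep):
    """Sum the cube over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(float)
    for coordinates, value in cells.items():
        totals[tuple(coordinates[i] for i in positions)] += value
    return dict(totals)


def slice_cube(cells, dimension, member):
    """Keep only the cells where `dimension` equals `member`."""
    position = DIMENSIONS.index(dimension)
    return {c: v for c, v in cells.items() if c[position] == member}


by_product = roll_up(cube, keep=("product",))
april_by_region = roll_up(slice_cube(cube, "month", "2010-04"), keep=("region",))
print(by_product)
print(april_by_region)
```

Rolling up and slicing along named dimensions is the dimensional style of querying that the definition refers to.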
Basic Elements of DW - End User Application
A collection of tools that query, analyze, and present information targeted to support a business need.
Minimal set of tools made up of:
End user data access tool
A spreadsheet
A graphics package
User interface that simplifies the user experience
Basic Elements of DW - End User Data Access Tool
A client of the data warehouse.
Maintains a session with the presentation server
Displays a report, graph, or analysis
Simple tool (ad hoc query) or complex tool (data mining)
Modeling or forecasting tools upload results to special data warehouse areas
Basic Elements of DW - Modeling Applications
A sophisticated data warehouse client with analytic capabilities that transform or digest the output from a data warehouse.
Includes:
Forecasting models
Behaviour scoring models
Allocation models
Data mining tools
Basic Elements of DW - Metadata
All of the information in the data warehouse environment that is not the actual data itself.
Metadata should be:
Cataloged
Version stamped
Documented
Backed up
BASIC PROCESSES OF A DATA WAREHOUSE
Data staging includes:
Extracting
Transforming - cleaning, purging, combining, surrogate key creation, aggregates
Loading and indexing
Quality assurance checking
Updating
Querying
Data feedback/feeding in reverse
Auditing
Securing
Backup and recovery
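A few of the processes listed above (surrogate key creation, building aggregates, and quality assurance checking) can be sketched in Python as follows. The function names, row layout, and the specific checks are assumptions made for illustration; they are not prescribed by the presentation.

```python
from collections import defaultdict

# Hypothetical extracted rows carrying production keys from a source system.
extracted_sales = [
    {"product_no": "P-100", "amount": 30.0},
    {"product_no": "P-200", "amount": 25.0},
    {"product_no": "P-100", "amount": 20.0},
]


def assign_surrogate_keys(rows, key_map):
    """Transform: replace each production key with an integer surrogate key."""
    transformed = []
    for row in rows:
        # A production key not seen before gets the next unused integer.
        surrogate = key_map.setdefault(row["product_no"], len(key_map) + 1)
        transformed.append({"product_key": surrogate, "amount": row["amount"]})
    return transformed


def build_aggregate(fact_rows):
    """Transform: summarize the granular rows by product key."""
    totals = defaultdict(float)
    for row in fact_rows:
        totals[row["product_key"]] += row["amount"]
    return dict(totals)


def quality_check(source_rows, fact_rows):
    """Quality assurance: row counts and totals must match the extract."""
    same_count = len(source_rows) == len(fact_rows)
    same_total = (sum(r["amount"] for r in source_rows)
                  == sum(r["amount"] for r in fact_rows))
    return same_count and same_total


product_key_map = {}
fact_rows = assign_surrogate_keys(extracted_sales, product_key_map)
aggregate = build_aggregate(fact_rows)
passed = quality_check(extracted_sales, fact_rows)
print(product_key_map, aggregate, passed)
```

In practice the key map would be persisted between loads so that the same production key always receives the same surrogate key.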
Questions?