extended star schema
© 2008 MindTree Consulting
Introduction to BI, Data warehouse: Day 1
Introduction to BI, Data warehouse
BI concepts
Data warehouse concepts
Introduction to BIW
Advantages of BIW over other data warehouse tools
Concept of star schema architecture
Introduction to the Administrator Workbench (all buttons in the AWB)
Introduction
Success in a competitive business environment requires more than just good information. What is needed now is the ability to derive meaningful, timely and readily accessible insights from that information.
Insights into the business are the key to defining an effective strategy, aligning business operations with that strategy, and improving the efficiency and effectiveness of execution.
Is your enterprise set up to win?
Business needs
The ability to take actions based on complete, timely, relevant insights.
A fast, accurate way to pinpoint root causes.
The ability to track and manage the alignment of strategic objectives and business activities.
Easy access to information.
Support for legal compliance.
Why are today’s insights not enough?
75% of business users do not use analytic applications
Analytics are disconnected from business processes.
Business processes are disconnected from corporate strategy.
90% of organisations fail to execute their strategies (Fortune magazine)
Introducing SAP BI
SAP BI’s Approach
1. Establish one foundation to run your business - providing integrated consistent data and metrics
Today
With SAP BI: Establish a Foundation
SAP BI’s Approach
1. Establish one foundation to run your business - providing integrated consistent data and metrics
2. Bring decision making to the business process
Bring decision making to the business process
SAP BI’s Approach
1. Establish one foundation to run your business - providing integrated consistent data and metrics
2. Bring decision making to the business process
3. Align execution with strategy across organizations to achieve corporate goals
Align execution to strategy
Deliver actionable insights
SAP BI’s Approach
1. Establish one foundation to run your business - providing integrated consistent data and metrics
2. Bring decision making to the business process
3. Align execution with strategy across organizations to achieve corporate goals
4. Profit from immediate action on insights within the business process, with clear options and an explanation of potential results
Profit from timely action
Improve business process
What does this mean for the future?
Business benefits
Better-informed decisions with faster corrective actions.
Better business performance as a result of strategy-guided actions.
Faster innovation.
Faster response to changing business conditions.
Increased competitive advantage.
Business Intelligence
Defined as:
“Business Intelligence is a technology based on customer- and profit-oriented models that reduce operating costs and provide increased profitability by improving productivity, sales and service, and that help make decisions in no time. Business Intelligence models are based on multidimensional analysis capabilities.”
BI solutions differ from, and add value to, standard operational systems (OLTP, Online Transaction Processing, systems) in three ways:
By providing the ability to extract, cleanse and aggregate data from multiple operational systems into a separate data mart or data warehouse
By storing data, often in a star schema or multidimensional cube format, to enable rapid delivery of summarized information and drill-down to detail
By delivering personalized, relevant informational views and querying, reporting and analysis capabilities for gaining deeper business understanding and making better decisions faster
To implement BI, the following technologies are used:
Data Marts/Data Warehouses - A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process. To facilitate data retrieval for multidimensional analytical processing, a star schema is very often used.
Extraction, Transformation and Loading (ETL) - Data is extracted from multiple source systems. Data is cleansed and transformed into a consistent format and structure. The cleansed data is loaded into the data warehouse.
On-Line Analytical Processing (OLAP) and Data Mining - Analysis tools are applied against the data warehouse to analyze and mine the data.
Key differentiators
●SAP BI supports key business processes.
●SAP BI reflects SAP’s industry-leading business process expertise.
●SAP BI provides complete visibility across the entire value chain.
●SAP BI is delivered on the most robust and scalable technology platform.
●SAP BI delivers the most relevant set of predefined content.
●SAP BI is easy to deploy and extend.
Key features of SAP BI
Introduction to Data Warehouse
What is a data warehouse?
The term Data Warehouse was coined by Bill Inmon in 1990
Bill Inmon’s definition:
A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process
What is a data warehouse?
Subject-oriented: data that gives information about a particular subject instead of about a company's ongoing operations.
[Figure: operational systems are organized around processes such as Invoices, Orders, Despatch and Plan; the data warehouse is organized around subjects such as Customers, Products, Regions and Time.]
What is a data warehouse?
Integrated: data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Attribute | Appl A | Appl B | Appl C | Data warehouse
Gender encoding | m,f | 1,0 | male,female | m,f
Balance format | dec fixed (13,2) | pic 9(9)V99 | pic S9(7)V99 comp-3 | dec fixed (13,2)
Balance field name | bal-on-hand | current-balance | cash-on-hand | Current balance
Date format | julian | yymmdd | absolute | julian
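A minimal sketch of the integration step in the table above, assuming Python and hypothetical in-memory records: each application's gender encoding and balance field name are mapped onto the single warehouse convention. The assumption that Appl B uses 1 for male and 0 for female is ours, not the slide's, and date conversion is left out because the slide does not define the target "julian" format precisely.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical mapping tables; assumes Appl B encodes 1 = male, 0 = female.
GENDER_CODES = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"male": "m", "female": "f"},
}

# Each application names the balance field differently.
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current-balance", "C": "cash-on-hand"}


def integrate(app: str, record: dict) -> dict:
    """Map one source record onto the warehouse conventions (m/f, 2-decimal balance)."""
    gender = GENDER_CODES[app][str(record["gender"]).strip().lower()]
    raw_balance = record[BALANCE_FIELD[app]]
    # Store every balance as a fixed-point decimal with two places, like dec fixed (13,2).
    balance = Decimal(str(raw_balance)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"gender": gender, "current_balance": balance}


print(integrate("B", {"gender": 1, "current-balance": 10.5}))
print(integrate("C", {"gender": "Female", "cash-on-hand": "3"}))
```

Keeping the mappings in tables rather than in if/else code means a new source application only needs new table entries.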
What is a data warehouse?
Time-variant: all data in the data warehouse is identified with a particular time period.
Operational: current-value data
• time horizon: 60-90 days
• key may not have an element of time
Data warehouse: snapshot data
• time horizon: 5-10 years
• key has an element of time
• data warehouse stores historical data
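The difference between current-value and snapshot data can be sketched in a few lines of Python. This is an illustration under simple assumptions (in-memory dictionaries and a hypothetical account ID), not how any particular product stores history: the operational table overwrites the balance, while the warehouse keeps one row per snapshot date because the date is part of the key.

```python
from datetime import date

# Operational system: one row per account; the key has no element of time.
current_value: dict[str, float] = {}

# Data warehouse: one row per (snapshot date, account); rows are never overwritten.
snapshots: dict[tuple[date, str], float] = {}


def record_balance(account: str, balance: float, as_of: date) -> None:
    current_value[account] = balance          # overwrites the previous value
    snapshots[(as_of, account)] = balance     # adds a new time-stamped row


def history(account: str) -> list[tuple[date, float]]:
    """Return the account's snapshots in date order."""
    return sorted((d, b) for (d, a), b in snapshots.items() if a == account)


record_balance("ACC-1", 100.0, date(2008, 1, 31))
record_balance("ACC-1", 80.0, date(2008, 2, 29))
print(current_value["ACC-1"], history("ACC-1"))
```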
What is a data warehouse?
Ralph Kimball’s Definition
“A copy of transaction data specifically structured for query and analysis.”
In short: “Snapshots of business events at regular intervals”
How is a DW different from OLTP?
Aspect | OLTP | DW
Orientation | Business event / transaction oriented | Decision oriented
Purpose | Supports operations (“making the wheels turn”) | Decision support (“watching the wheels turn”)
View | Narrow, looking ‘within’ | Broad, looking ‘across’
Usage patterns | Stable, predictable | Variable, unpredictable
Time | Limited time frame | Historical data
Data | Detailed only | Detailed, summarized and derived
How is a DW different from OLTP?
Aspect | OLTP | DW
Typical operation | Insert/update intensive (a “twinkling” database) | Read intensive (a quiet data store)
Age of data | Current | Historical
Data required/queried | Minimal | Extensive
Table structure | Normalized, minimum redundancy | De-normalized, controlled redundancy
Scope of data | Internal | Internal + external
Data | Reacts to events | Can anticipate events
How is a DW different from OLTP?
To Summarize
OLTP systems are used to “run” a business and are based on the ER model
The data warehouse helps to “optimize” the business and is based on OLAP (the dimensional model)
What is OLAP
Stands for Online Analytical Processing
OLAP tools help users perform quick and easy multidimensional analysis to get insights into what’s happening
Supports the following features:
Slice and dice along the dimensions
Drill up and drill down through hierarchies
Types of OLAP:
ROLAP – Relational OLAP: data always comes from relational tables
MOLAP – Multidimensional OLAP: data always comes from multidimensional cubes
HOLAP – Hybrid OLAP: data comes from both relational tables and multidimensional cubes; aggregated data comes from the cubes and detailed data comes from the relational tables
What is OLAP
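Relational model:
Record | Product | Region | Month | Sales
#001 | Film | East | Dec | 240
#002 | Lenses | West | Jan | 250
#003 | Cameras | Central | Feb | 690
#004 | Film | West | Mar | 425
Multidimensional model: [Figure: the same data as a cube with Product, Region and Time axes and Sales in the cells. Slicing and dicing the cube yields the Product Manager view, Regional Manager view, Financial Manager view and an Ad Hoc view.]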
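Slice and dice can be illustrated on the four records of the relational model above. The sketch below is plain Python over a list of dictionaries; the function names are our own, and a real OLAP engine would work on a cube structure rather than scanning rows.

```python
from collections import defaultdict

# The four records of the relational model above.
records = [
    {"product": "Film", "region": "East", "month": "Dec", "sales": 240},
    {"product": "Lenses", "region": "West", "month": "Jan", "sales": 250},
    {"product": "Cameras", "region": "Central", "month": "Feb", "sales": 690},
    {"product": "Film", "region": "West", "month": "Mar", "sales": 425},
]


def slice_(rows, **fixed):
    """Slice: fix one or more dimensions to a single value each."""
    return [r for r in rows if all(r[dim] == value for dim, value in fixed.items())]


def dice(rows, **allowed):
    """Dice: restrict one or more dimensions to a set of values each."""
    return [r for r in rows if all(r[dim] in values for dim, values in allowed.items())]


def total_by(rows, dim):
    """Aggregate sales along one dimension."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[dim]] += r["sales"]
    return dict(totals)


print(total_by(records, "product"))                       # sales per product
print(total_by(slice_(records, region="West"), "month"))  # West region only, per month
print(dice(records, product={"Film", "Lenses"}, region={"West"}))
```

Each manager "view" on the slide is, in this picture, a different combination of slicing, dicing and aggregating the same records.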
EDW – Dimensional Model
Originated in the mid-1970s with A.C. Nielsen
Made popular by Ralph Kimball
The dimensional model divides the world into:
Measurements: Sales, Cost, Stock, Yield
Context (dimensions) surrounding these measurements: Customer, Time, Service, Region
Two variants of the dimensional model:
Star Schema
Snowflake Schema
EDW – Dimensional Model
Typical OLTP Model
[Figure: an entity-relationship diagram whose entities include Payment, Payment Mode, Denial, Product, Product Line, Product Group, Location, Property, Agent, Contract, Booking, Business Unit, Division, Contact, Sales rep, Franchisee and Customer.]
Data is scattered across many tables!
EDW – Dimensional Model
Star Schema
[Figure: the Bookings fact table at the centre, joined to the Site, Rate Plan, Channel and Time dimension tables. Legend: Dimension, Fact, Measure, Hierarchy.]
Bookings (fact): Site, Rate Plan, Channel and Date keys; measures # of new bookings, # of booking nights, # of rooms for bookings, # of guaranteed bookings
Site (dimension): Site key, Site desc, Chain code, Site status, Rooms available, Mgmt company, Marketing area, Site QA Score, Lodge Score, Rest Score
Channel (dimension): Channel key, Channel desc, Original channel, Source id, Source desc, Source type
Rate Plan (dimension): Rate plan key, Rate plan desc, Rate plan type, Brand
Time (dimension): Date key, Week, Month, Quarter, Year, Weekend flag
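The Bookings star schema above can be sketched as SQL tables in an in-memory SQLite database, driven from Python's standard sqlite3 module. Only a subset of the attributes is included, the table and column names are adapted from the slide, and the sample rows are made up. The query shows the typical pattern: the fact table joins only the dimensions a question needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_dim      (site_key INTEGER PRIMARY KEY, site_desc TEXT, chain_code TEXT);
CREATE TABLE channel_dim   (channel_key INTEGER PRIMARY KEY, channel_desc TEXT, source_type TEXT);
CREATE TABLE rate_plan_dim (rate_plan_key INTEGER PRIMARY KEY, rate_plan_desc TEXT, brand TEXT);
CREATE TABLE time_dim      (date_key INTEGER PRIMARY KEY, week INTEGER, month INTEGER,
                            quarter INTEGER, year INTEGER, weekend_flag INTEGER);
-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE bookings_fact (
    site_key            INTEGER REFERENCES site_dim,
    channel_key         INTEGER REFERENCES channel_dim,
    rate_plan_key       INTEGER REFERENCES rate_plan_dim,
    date_key            INTEGER REFERENCES time_dim,
    new_bookings        INTEGER,
    booking_nights      INTEGER,
    rooms_for_bookings  INTEGER,
    guaranteed_bookings INTEGER
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO site_dim VALUES (?, ?, ?)", [(1, "Site A", "C1"), (2, "Site B", "C1")])
conn.executemany("INSERT INTO channel_dim VALUES (?, ?, ?)", [(1, "Agent", "Offline"), (2, "Web", "Online")])
conn.execute("INSERT INTO rate_plan_dim VALUES (1, 'Standard', 'Brand X')")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)",
                 [(20080201, 5, 2, 1, 2008, 0), (20080202, 5, 2, 1, 2008, 1)])
conn.executemany("INSERT INTO bookings_fact VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 20080201, 3, 6, 3, 2),
    (2, 1, 1, 20080202, 1, 2, 1, 1),
    (1, 2, 1, 20080202, 5, 9, 5, 0),
])

# New bookings per channel and month.
rows = conn.execute("""
    SELECT c.channel_desc, t.year, t.month, SUM(f.new_bookings) AS new_bookings
    FROM bookings_fact AS f
    JOIN channel_dim AS c ON f.channel_key = c.channel_key
    JOIN time_dim    AS t ON f.date_key = t.date_key
    GROUP BY c.channel_desc, t.year, t.month
    ORDER BY c.channel_desc
""").fetchall()
print(rows)
```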
Dimensions - Definition
●Contain the descriptors of the business by which analysts view data.
●Dimensions set the context for asking questions about the facts in the fact table.
●SPEAK BUSINESS LANGUAGE!!!
●Dimensions have multiple levels.
●A combination of levels forms a hierarchy.
●Hierarchies are logical structures that use ordered levels as a means of organizing data.
●A hierarchy can be used to define data aggregation.
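A hierarchy defines how data aggregates from one level to the next. Assuming a Time dimension with the levels Day, Month, Quarter and Year, the sketch below rolls up daily amounts to any chosen level; the level names and sample figures are illustrative only.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> int:
    return (d.month - 1) // 3 + 1


# Each level maps a day to its member at that level (ordered fine to coarse).
LEVELS = {
    "day": lambda d: d,
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, quarter_of(d)),
    "year": lambda d: d.year,
}


def roll_up(facts, level):
    """Aggregate (day, amount) facts to the requested level of the Time hierarchy."""
    member_of = LEVELS[level]
    totals = defaultdict(float)
    for day, amount in facts:
        totals[member_of(day)] += amount
    return dict(totals)


daily_sales = [(date(2008, 1, 15), 100.0), (date(2008, 2, 3), 50.0), (date(2008, 4, 1), 70.0)]
print(roll_up(daily_sales, "quarter"))
print(roll_up(daily_sales, "year"))
```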
Dimension - Characteristics
●The tables contain all the textual descriptors of the business.
●Dimensions supply the context in which a measurement was made.
●They correspond to the entities by which you want to analyze the business.
●Many columns
●Fewer rows
●Are linked to a fact table through a foreign key reference to their primary key
Dimensions – Examples
●Franchisee
●Consumer
●Property
●Car
●Channel
●Channel-Travel Agent
●Site
●Rate plans
●Brand
●Business unit
●Entity
●Entity group
Fact - Definition
●Fact tables contain the measures related to a process or event.
●Measures are analyzed by the various dimensions contained in the dimension tables.
●Each row in a fact table corresponds to a measurement.
●Fact tables have few columns and many rows.
Fact - characteristics
They:
●Are usually the largest tables
●Are usually appended to
●Can grow quickly
●Can contain either detail or summarized data
●Are joined to dimension tables through foreign keys
●Are usually sparse: no rows are stored to represent ‘nothing happened’.
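Sparsity is easy to see with a small example. Assuming three products, three stores and two days (all hypothetical), the fact table below holds only the two combinations in which something was sold; every other combination is implied to be zero rather than stored.

```python
from itertools import product

products = ["Film", "Lenses", "Cameras"]
stores = ["S1", "S2", "S3"]
days = ["2008-02-01", "2008-02-02"]

# Only events that happened are stored: (product, store, day) -> units sold.
sales_fact = {
    ("Film", "S1", "2008-02-01"): 12,
    ("Cameras", "S2", "2008-02-02"): 1,
}

possible_cells = len(products) * len(stores) * len(days)  # every combination
density = len(sales_fact) / possible_cells


def units_sold(p: str, s: str, d: str) -> int:
    """A missing row means 'nothing happened', i.e. zero units."""
    return sales_fact.get((p, s, d), 0)


total = sum(units_sold(p, s, d) for p, s, d in product(products, stores, days))
print(possible_cells, round(density, 3), total)
```

With real product, store and day counts the number of possible cells grows multiplicatively, which is why storing only the events that occurred keeps fact tables manageable.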
Fact – examples
●Sum insured
●Amount Approved
●Claims ratio (derived fact)
●Premium Paid
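A derived fact such as the claims ratio should be computed from the aggregated base facts rather than by adding or averaging ratios. The sketch assumes the ratio is Amount Approved divided by Premium Paid (the slide does not define it) and uses made-up policy rows.

```python
# Hypothetical policy-level fact rows: (policy_id, premium_paid, amount_approved)
facts = [
    ("P1", 1000.0, 200.0),
    ("P2", 500.0, 500.0),
    ("P3", 1500.0, 0.0),
]


def claims_ratio(rows):
    """Derived fact: divide the summed base facts (assumed definition)."""
    premium = sum(r[1] for r in rows)
    approved = sum(r[2] for r in rows)
    return approved / premium if premium else None


# Averaging the per-policy ratios gives a different, misleading answer.
naive_average = sum(r[2] / r[1] for r in facts) / len(facts)

print(claims_ratio(facts), naive_average)
```

Storing the base facts and deriving the ratio at query time keeps the result correct at every level of aggregation.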
EDW – Dimensional Model Advantages
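4 * 4 * 5 * 6 = 480 reports
[Figure: four measures (Actual sales, Sales forecast, Returns, Complaints) analysed along the PRODUCT hierarchy (All Products, Category, Brand, Product), the CUSTOMER hierarchy (from All Customers down to Customer), the DEPOT hierarchy (from All Depots down to Depot) and the PERIOD hierarchy (All Periods, Year, Quarter, Month, Day). Intermediate levels shown on the slide include Region, State, Area, Territory and Time point.]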
EDW – Dimensional Model Advantages
Using OLTP Database (5 joins, intensive computation):
SELECT ct.Channel_Desc,
       Year = DATEPART(year, oht.book_Date),
       Month = DATEPART(month, oht.book_Date),
       TotRevenue = SUM((1 + Tax_Rate) * (days_booked * olt.rate_per_night))
FROM book_Header_Table oht,
     book_Line_Table olt,
     Property_Table st,
     Product_Table pt,
     SubChannel_Table sct,
     Channel_Table ct
WHERE oht.book_Number = olt.book_Number
  AND oht.Property_Number = st.Property_Number
  AND olt.product_code = pt.product_code
  AND pt.product_code = sct.product_code
  AND sct.subChannel_code = ct.subChannel_code
  AND ct.Channel_Desc = 'Agent'
  AND DATEPART(year, oht.book_Date) IN (1992, 1994)
  AND DATEPART(month, oht.book_Date) = 2
GROUP BY ct.Channel_Desc,
         DATEPART(year, oht.book_Date),
         DATEPART(month, oht.book_Date)

Using Star Schema (2 joins, less intensive computation):
SELECT cd.Channel_Desc,
       td.year,
       td.month,
       TotSales = SUM(st.Total_Sale)
FROM Arrivals st,
     Channel_Dimension cd,
     Time_Dimension td
WHERE st.Channel_Key = cd.Channel_Key
  AND st.Time_Key = td.Time_Key
  AND cd.Channel_Desc = 'Agent'
  AND td.month = 2
  AND td.year IN (1992, 1994)
GROUP BY cd.Channel_Desc,
         td.year,
         td.month
Typical Stages in the evolution of a DW
Stage 1: Reporting
Biggest challenge: data integration + data quality
Example
Retail: What products does a customer buy?
Healthcare: Which areas contribute the most claims?
Stage 2: Analysis
Less focus on what happened
More focus on why it happened
Iterative refinement of questions (Q&A map) – support “chain of thought” analysis and questions
Example
Why did expenses increase by 10% compared to last quarter?
Typical Stages in the evolution of a DW
Stage 3: Prediction
The organization is now well entrenched in the “whys”
Build predictive models: regression (linear/non-linear), decision trees, neural networks
Stage 4: Operational Insight
Stages 1-3 focused on strategic decision making
Process reengineering
Example
Retail: inventory management with JIT (just-in-time)
Healthcare: generating preventive campaigns well ahead of time
Stage 5: Activate
A sense-and-respond layer sits on top of BI
Example
Order raw material if inventory falls below a threshold value
Typical architecture of a DW
Building Blocks - Component 1: Source Systems
●Operational systems that run the business: SAP, Siebel, JD Edwards, BAAN, point-of-sale applications, Oracle Applications, home-grown systems, Excel spreadsheets
●Optimized for inserts and updates
●Very little redundancy of data, by design …
Building Blocks – Component 2 : ETL
●ETL stands for Extract Transform Load
●The action of:
Extracting information from one or more sources
Transforming it mid stream
Aggregation
Business Rules
Code normalization/cleansing
Loading it into a central database
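The three ETL steps can be sketched end to end in Python. Everything here is an assumption for illustration: two hypothetical sources (a point-of-sale CSV extract and a list of web orders), a region-code mapping for code normalization, a business rule that drops cancelled sales, aggregation by region, and an in-memory SQLite table standing in for the central database.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Extract: two hypothetical sources with different layouts and code conventions.
POS_CSV = """store,region,amount,status
S1,east,120.50,OK
S2,EAST,80.00,CANCELLED
S3,West,45.25,OK
"""
WEB_ORDERS = [{"region_code": "W", "value": 30.0}, {"region_code": "E", "value": 10.0}]

# Code normalization: every source spelling maps to one warehouse code.
REGION_CODES = {"east": "EAST", "e": "EAST", "west": "WEST", "w": "WEST"}


def extract():
    """Yield rows from both sources in a common shape."""
    for row in csv.DictReader(io.StringIO(POS_CSV)):
        yield {"region": row["region"], "amount": float(row["amount"]), "status": row["status"]}
    for order in WEB_ORDERS:
        yield {"region": order["region_code"], "amount": order["value"], "status": "OK"}


def transform(rows):
    """Apply a business rule, normalize codes and aggregate by region."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] != "OK":  # business rule: cancelled sales are not loaded
            continue
        region = REGION_CODES[row["region"].strip().lower()]
        totals[region] += row["amount"]
    return totals


def load(totals, conn):
    """Load the aggregated rows into the central database."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract()), conn)
print(conn.execute("SELECT region, amount FROM sales_by_region ORDER BY region").fetchall())
```

Because extract() is a generator, rows stream through the transform step without first being collected in memory; commercial ETL tools express the same three stages graphically.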
Building Blocks - Component 3: Staging Area
●Refers to both the storage area and a set of ETL processes
●Raw data is “massaged” and made ready for loading using ETL tools, scripts, SQL, etc.: rules checking, reformatting/restructuring, etc.
●Should NOT be exposed to business users
●May use flat files, relational tables or both
●Is the ‘black box’ that converts raw input data into finished data for the presentation layer
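Rules checking in the staging area often means splitting incoming rows into accepted and rejected sets, keeping the reason for each rejection so that rejected rows can be reviewed rather than silently lost. The rules and field names below are hypothetical; the sketch only shows the shape of such a step.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class StagingResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs kept for review


def check_rules(row: dict):
    """Return None if the row passes every rule, otherwise the reason it failed."""
    if not row.get("customer_id"):
        return "missing customer_id"
    try:
        datetime.strptime(str(row.get("order_date") or ""), "%Y-%m-%d")
    except ValueError:
        return "bad order_date"
    if not str(row.get("quantity", "")).isdecimal():
        return "quantity must be a non-negative integer"
    return None


def stage(rows):
    result = StagingResult()
    for row in rows:
        reason = check_rules(row)
        if reason is None:
            # Reformat: convert quantity from text to an integer for loading.
            result.accepted.append({**row, "quantity": int(row["quantity"])})
        else:
            result.rejected.append((row, reason))
    return result


batch = [
    {"customer_id": "C1", "order_date": "2008-03-01", "quantity": "2"},
    {"customer_id": "", "order_date": "2008-03-01", "quantity": "1"},
    {"customer_id": "C2", "order_date": "01/03/2008", "quantity": "1"},
]
staged = stage(batch)
print(len(staged.accepted), [reason for _, reason in staged.rejected])
```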
Building Blocks - Component 4: Storage Layer
●This is where the data is organized, stored and made available for querying by users and tools
●This is the data warehouse for the business users
●Usually based on the dimensional model
Building Blocks - Component 5: Reporting Layer
●Comprises tools and applications that present the data to end users for decision making.
●Could consist of:
Pre-canned reports (optionally web-enabled)
Ad-hoc query tools
Data mining applications
Budget Planning and forecasting applications, etc
Building Blocks – Component 6 : The Metadata Layer
●The glue that binds the data warehouse components
●An encyclopedia of the data warehouse
●Crucial for maintaining the warehouse
●One of the hardest things to manage in a warehouse!!!
Other Building Blocks
Calculation Engines
For deriving new measures using the base measures
DW is the ideal place for calculating Key Performance Indicators
Extractors
For distributing data to other applications
MRDR (Master Reference Data Repository)
Also termed MDM – Master Data Management
Managing master data in the enterprise
Best place to implement conformed dimensions
Tools required for a DW solution?
To extract & transform data: baseSAS, DataStage, Informatica, DTS, PL/SQL, Pro*C
To store data: Oracle, SQL Server, DB2
To present data: Business Objects, Cognos, MicroStrategy, OLAP Services, Express, Hyperion, Brio, SAS-EIS
Data cleansing: Trillium, I-Spheres based solution
SAP Business Information Warehouse - BW
What is SAP BW?
Data warehouse system with optimized structures for reporting and analysis
OLAP engine and tools
Integrated metadata repository
Data extraction and staging
Preconfigured support for data sources from the R/3 system
Business Application Programming Interfaces (BAPIs) for non-SAP systems
Automated data warehouse management
Administrator Workbench for controlling and managing the data warehouse
[Figure: Data Sources feed Data Extraction and Transformation into the Data Warehouse; Data Access serves Reports for Reporting and Analysis.]
SAP BW Architecture
SAP BW Components
InfoObjects
DataSources
Persistent Staging Area (PSA)
ODS objects
InfoCubes
Master data
InfoProviders
Queries and query views
InfoSpokes and open hub destinations
Business Content
Star schema
The star schema is easy for software to interpret. It is the most popular way of implementing a multidimensional model in a relational database.
Star schema
The key elements of a Star schema are:
A central fact table with dimension tables radiating from it
Fact tables typically store atomic and aggregate transaction information, such as quantities of goods sold. These values are called facts.
Facts are numeric values that are normally additive.
Fact tables contain foreign keys to the most atomic dimension attribute of each dimension table.
Foreign keys tie the fact table rows to specific rows in each of the associated dimension tables.
The points of the star are dimension tables.
Dimension tables store both attributes about the data stored in the fact table and textual data.
Dimension tables are de-normalized.
The most atomic dimension attributes in the dimensions define the granularity of the information, i.e. the number of records in the fact table.
Extended Star Schema
Attributes of the dimension tables are called characteristics. The metadata objects that describe them are called InfoObjects.
Hierarchies of characteristics or attributes may be stored in separate hierarchy tables; these hierarchies are therefore called external hierarchies.
Textual descriptions of a characteristic are stored in a separate text table, so the system can provide descriptions in several languages at the same time.
Dependent attributes of a characteristic can be stored in a separate table called the master data table for the characteristic.
Extended Star Schema - Continued
[Figure: the FACT table at the centre joined to several dimension tables; each dimension table is linked through SID tables to the master data, text and hierarchy tables of its characteristics.]
Solution-dependent schema
The InfoCube, which describes the process-oriented part of the solution.
An InfoCube consists of one fact table and several dimension tables.
Solution-independent schema
The shared master data tables, valid for use with any InfoCube or ODS object.
These master tables are the glue that binds the data warehouse together.
Pointer or translation tables, called SID (Surrogate-ID) tables, are used in the BW schema to link the solution-independent master tables to the InfoCubes.
Comparison
Extended Star Schema – Key Elements
Attributes located in the dimensions are called Characteristics.
Attributes located in a master data table of a Characteristic are called attributes of the Characteristic.
SID tables (pointer tables) provide the technical link to the Master Data (attribute, text and hierarchy) tables that are outside the dimension of a star schema.
Dimension tables are built using the combination of numeric SID values of each Characteristic in the Dimension.
External information (attributes of the Characteristics, text descriptions and external hierarchies) is stored separately (shared) and linked to the InfoCubes.
Historical relationships as well as the current state of the data can be maintained and reported on
Multiple languages are supported for text / description
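The indirection in the extended star schema (fact table to dimension table to SID table to shared master data and text tables) can be modelled with Python dictionaries. This is a simplified sketch, not SAP BW's actual table layout: the table names, DIM IDs, SIDs and sample values are all hypothetical. It shows how one InfoCube reuses shared text and master data tables, and how the language of the description is chosen at query time.

```python
from collections import defaultdict

# ---- Solution-independent part: shared by every InfoCube (hypothetical data) ----
# SID tables: characteristic value -> surrogate ID
sid_customer = {"C100": 1, "C200": 2}
sid_material = {"M1": 1, "M2": 2}
# Master data (attribute) table of the characteristic Customer
customer_master = {"C100": {"city": "Bangalore"}, "C200": {"city": "Chennai"}}
# Text table of Customer, keyed by (value, language)
customer_text = {("C100", "EN"): "Acme Ltd", ("C100", "DE"): "Acme GmbH", ("C200", "EN"): "Globex"}

# ---- Solution-dependent part: one InfoCube ----
# Dimension tables: DIM ID -> SIDs of the characteristics in that dimension
customer_dim = {10: {"customer_sid": 1}, 11: {"customer_sid": 2}}
material_dim = {20: {"material_sid": 1}, 21: {"material_sid": 2}}
# Fact table: one DIM ID per dimension plus the key figure (revenue)
fact = [(10, 20, 500.0), (10, 21, 250.0), (11, 20, 100.0)]


def _revenue_per_customer_value():
    """Walk fact -> dimension table -> SID -> characteristic value."""
    value_of_sid = {sid: value for value, sid in sid_customer.items()}
    for customer_dim_id, _material_dim_id, revenue in fact:
        sid = customer_dim[customer_dim_id]["customer_sid"]
        yield value_of_sid[sid], revenue


def revenue_by_customer_text(lang: str) -> dict:
    """Report by customer description in the requested language (falls back to the key)."""
    totals = defaultdict(float)
    for value, revenue in _revenue_per_customer_value():
        totals[customer_text.get((value, lang), value)] += revenue
    return dict(totals)


def revenue_by_city() -> dict:
    """Report by an attribute taken from the shared master data table."""
    totals = defaultdict(float)
    for value, revenue in _revenue_per_customer_value():
        totals[customer_master[value]["city"]] += revenue
    return dict(totals)


print(revenue_by_customer_text("EN"))
print(revenue_by_customer_text("DE"))
print(revenue_by_city())
```

Because the text and master data tables sit outside the cube, changing a customer's description or city affects every InfoCube that uses the characteristic without reloading any fact rows.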
Administrator Workbench
The Data Warehousing Workbench (DWB) is the central tool for performing the tasks in the data warehousing process
It provides data modeling functions as well as functions for control, monitoring and maintenance of all processes in SAP NetWeaver BI having to do with data procurement, data retention, and data processing.
Functional areas of the Data Warehousing Workbench:
Modeling
Administration
Transport connection
Documents
Business Content
Translation
Metadata Repository
Modeling
Used to create and maintain (meta) objects relevant to the data staging process in SAP BW. Objects are displayed in a tree structure, in which the objects are ordered according to hierarchical criteria.
To access the Modeling function area, choose transaction RSA1.
Administration
The Administration function area displays the navigation area and, if applicable, the corresponding object tree in the left-hand area of the screen when applications are called. This means that you can use the tree to start new applications from within the application you are in.
Transport Connection
Used to collect newly created or changed objects in the SAP BW system. You can use the Change and Transport Organizer (CTO) to transport them into other SAP BW systems.
Documents
The Documents function area enables you to insert, search in, and create links for one or more documents in various formats, versions and languages for SAP BW objects.
BI Content
BI Content provides preconfigured information models based on metadata. It provides users in an enterprise with a selection of information they can use to fulfill their tasks.
To access the BI Content function area, choose transaction RSORBCT.
Translation
In the Translation function area, you can translate short and long texts belonging to SAP BW objects.
Metadata Repository
All SAP BW meta objects and the corresponding links to each other are managed centrally. In addition, metadata can be exchanged between different systems, HTML pages can be exported, and graphics for the objects can be displayed.
To access the Metadata Repository function area, choose transaction RSOR.
Thank You
© 2008 MindTree Limited
Imagination Action Joy