Post on 20-Dec-2015
Data Warehouse and Business Intelligence
Dr. Minder Chen
Fall 2008
Data Warehouse - 2 © Minder Chen, 2004-2008
Online Resources
• Additional resources:
– Teradata Student Network
» The Premier Learning Resource for Data Warehousing, DSS/BI, and Database. The URL is http://www.teradatastudentnetwork.com
» PSW: smartdecisions
BI
Business Intelligence (BI) is the process of gathering meaningful information to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better business decisions.
“The key in business is to know something that nobody else knows.” -- Aristotle Onassis
“To understand is to perceive patterns.”
— Sir Isaiah Berlin
"The manager asks how and when, the leader asks what and why."
— “On Becoming a Leader” by Warren Bennis
BI Questions
• What happened?
– What were our total sales this month?
• What’s happening?
– Are our sales going up or down? (Trend analysis)
• Why?
– Why have sales gone down?
• What will happen?
– Forecasting & “What If” Analysis
• What do I want to happen?
– Planning & Targets
Source: Bill Baker, Microsoft
Business Intelligence
[Figure: pyramid of layers, with increasing potential to support business decisions (MIS) toward the top]
• Layers, top to bottom:
– Making Decisions
– Data Presentation: Visualization Techniques
– Data Mining: Information Discovery
– Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
– Data Warehouses / Data Marts
– Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)
• Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA
Where is Business Intelligence applied?
Operational Efficiency
• ERP Reporting
• KPI Tracking
• Product Profitability
• Risk Management
• Balanced Scorecard
• Activity Based Costing
• Global Sourcing
• Logistics
Customer Interaction
• Sales Analysis
• Sales Forecasting
• Segmentation
• Cross-selling
• CRM Analytics
• Campaign Planning
• Customer Profitability
Inmon's Definition of Data Warehouse – Data View
• A warehouse is a
– subject-oriented,
– integrated,
– time-variant, and
– non-volatile
collection of data in support of management's decision making process.
– Bill Inmon in 1990
Source: http://www.intranetjournal.com/features/datawarehousing.html
Inmon's Definition Explained
• Subject-oriented: Data warehouses are organized around major subjects such as customer, supplier, product, and sales. They focus on modeling and analysis to support planning and management decisions vs. operations and transaction processing.
• Integrated: Data warehouses involve an integration of sources such as relational databases, flat files, and on-line transaction records. Processes such as data cleansing and data scrubbing achieve data consistency in naming conventions, encoding structures, and attribute measures.
• Time-variant: Data contained in the warehouse provide information from an historical perspective.
• Nonvolatile: Data contained in the warehouse are physically separate from data present in the operational environment.
Kimball's Definition – Process View
• A data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.
» Ralph Kimball
The Data Warehouse Process
[Figure: data flows from Source Systems into the Data Warehouse, then into Data Marts and cubes, and on to Clients (Query Tools, Reporting, Analysis, Data Mining)]
1. Design the Data Warehouse
2. Populate Data Warehouse
3. Create OLAP Cubes
4. Query Data
OLTP Normalized Design
[Figure: normalized OLTP entity diagram with entities including Ordering Process, Warehouse, POS Process, Chain Retailer, Retailer Returns, Retailer Payments, Store, Product, Brand, GL Account, Clerk, Retail Cust, Cash Register, and Retail Promo]
OLTP Versus Business Intelligence: Who asks what?
OLTP Questions
• When did that order ship?
• How many units are in inventory?
• Does this customer have unpaid bills?
• Are any of customer X’s line items on backorder?
Analysis Questions
• What factors affect order processing time?
• How did each product line (or product) contribute to profit last quarter?
• Which products have the lowest Gross Margin?
• What is the value of items on backorder, and is it trending up or down over time?
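The contrast between the two kinds of questions can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 module; the order_lines table, its columns, and its values are assumptions made up for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical order-line table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_lines ("
    "order_id INTEGER, product_line TEXT, status TEXT, amount REAL, cost REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Widgets", "shipped", 100.0, 60.0),
        (1, "Gadgets", "backorder", 50.0, 40.0),
        (2, "Widgets", "shipped", 200.0, 120.0),
        (3, "Gadgets", "shipped", 80.0, 64.0),
    ],
)

# OLTP-style question: are any of order 1's line items on backorder?
# It touches a handful of rows identified by a key.
backordered = conn.execute(
    "SELECT COUNT(*) FROM order_lines WHERE order_id = ? AND status = 'backorder'",
    (1,),
).fetchone()[0]

# Analysis-style question: how did each product line contribute to profit?
# It scans and aggregates the whole table.
profit_by_line = dict(
    conn.execute(
        "SELECT product_line, SUM(amount - cost) FROM order_lines "
        "GROUP BY product_line"
    ).fetchall()
)
print(backordered, profit_by_line)
```

The first query is a keyed lookup of the kind an operational system runs constantly; the second reads every row, which is the workload a warehouse is designed for.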
OLTP vs. OLAP
Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22
Dimensional Design Process
• Select the business process to model
• Declare the grain of the business process/data in the fact table
• Choose the dimensions that apply to each fact table row
• Identify the numeric facts that will populate each fact table row
Inputs to the design: Business Requirements, Data Realities
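The four steps can be traced in a small star schema. The sketch below uses Python's built-in sqlite3 module; the retail sales process, the chosen grain, and all table and column names are assumptions for illustration, not a design taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Step 1: business process = retail sales (hypothetical example).
    -- Step 2: grain = one row per product, per customer, per day.
    -- Step 3: dimensions that apply to each fact row.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    -- Step 4: numeric facts recorded at that grain.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        units INTEGER,
        dollars REAL,
        PRIMARY KEY (date_id, product_id, customer_id)
    );
    """
)

# List the tables that were created.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Declaring the grain first is what fixes the fact table's key: once a row means "one product, one customer, one day", the dimensions and the facts that fit that row follow from it.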
Select a business process to model
• Not business departments or business functions
• Cross-functional business processes
• Business events
• Examples:
– Raw materials purchasing
– Order fulfillment process
– Shipments
– Invoicing
– Inventory
– General ledger
Identifying Measures and Dimensions
• Measures: the attribute varies continuously
– Balance
– Units Sold
– Cost
– Sales
• Dimensions: the attribute is perceived as a constant or discrete value
– Description
– Location
– Color
– Size
• Measures are the performance measures for KPI; dimensions are the performance drivers
Product Dimension
• SKU: Stock Keeping Unit
• Hierarchy:
– Department → Category → Subcategory → Brand → Product
Creating Dimensional Model
• Identify fact tables
• Translate business measures into fact tables
• Analyze source system information for additional measures
• Identify base and derived measures
• Document additivity of measures
• Identify dimension tables
• Link fact tables to the dimension tables
• Create views for users
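Why the additivity of measures needs documenting can be shown with a small calculation. The store rows and the margin ratio below are hypothetical values chosen for illustration.

```python
# Hypothetical fact rows: (store, units, dollars, cost)
rows = [
    ("A", 10, 100.0, 60.0),
    ("B", 30, 300.0, 240.0),
]

# Base measures (dollars, cost) are additive: summing across stores is valid.
total_dollars = sum(r[2] for r in rows)
total_cost = sum(r[3] for r in rows)

# Derived measure: margin ratio = (dollars - cost) / dollars.
# Ratios are non-additive, so recompute from the summed base measures.
margin_ratio = (total_dollars - total_cost) / total_dollars

# Adding up the per-store ratios gives a different, meaningless number.
sum_of_ratios = sum((r[2] - r[3]) / r[2] for r in rows)
print(margin_ratio, sum_of_ratios)
```

Storing the base measures and deriving ratios at query time keeps totals correct at every level of aggregation.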
Inside a Dimension Table
• Dimension table key: Uniquely identifies each row. Use a surrogate key (integer).
• Table is wide: A table may have many attributes (columns).
• Textual attributes. Descriptive attributes in string format. No numerical values for calculation.
• Attributes not directly related: E.g., product color and product package size. No transitive dependency.
• Not normalized (star schema).
• Drilling down and rolling up along a dimension.
• One or more hierarchies within a dimension.
• Fewer records than a fact table.
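Surrogate key assignment can be sketched as follows. The build_dimension helper, the SKU values, and the attribute names are hypothetical; the point is only that each distinct natural key receives one warehouse-generated integer.

```python
from itertools import count


def build_dimension(source_rows):
    """Assign integer surrogate keys to rows identified by a natural key (here, SKU)."""
    next_key = count(1)
    key_by_sku = {}
    dimension = []
    for row in source_rows:
        sku = row["sku"]
        if sku not in key_by_sku:  # first time this natural key is seen
            key_by_sku[sku] = next(next_key)
            dimension.append({"product_key": key_by_sku[sku], **row})
    return dimension, key_by_sku


# Hypothetical source rows; the third repeats the first.
source = [
    {"sku": "W-100", "brand": "Acme", "color": "red", "package_size": "12 oz"},
    {"sku": "G-200", "brand": "Acme", "color": "blue", "package_size": "6 oz"},
    {"sku": "W-100", "brand": "Acme", "color": "red", "package_size": "12 oz"},
]
product_dim, key_by_sku = build_dimension(source)
print(key_by_sku)
```

Each dimension row stays wide and descriptive (brand, color, package size), while fact rows only need to carry the small integer key.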
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Facts Table
• Dimensions (keys): DateID, ProductID, CustomerID
• Measures: Units, Dollars
The Fact Table contains keys and units of measure.
Measurements of business events.
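A fact table of this shape can be mimicked with plain tuples. In the sketch below the column order follows the slide (DateID, ProductID, CustomerID, Units, Dollars), while the row values and the products lookup are made up for illustration.

```python
from collections import defaultdict

# Hypothetical dimension table, keyed by its primary key.
products = {1: "Widgets", 2: "Gadgets"}

# Fact rows: (DateID, ProductID, CustomerID, Units, Dollars).
facts = [
    (20080901, 1, 501, 3, 30.0),
    (20080901, 2, 502, 1, 25.0),
    (20080902, 1, 502, 2, 20.0),
]

# The concatenated key (DateID, ProductID, CustomerID) identifies each row.
keys = {(d, p, c) for d, p, c, _, _ in facts}
assert len(keys) == len(facts)

# Follow the ProductID foreign key to label each measure, then add up.
units_by_product = defaultdict(int)
for _, product_id, _, units, _ in facts:
    units_by_product[products[product_id]] += units
print(dict(units_by_product))
```

The fact rows hold only keys and numbers; every descriptive label comes from a dimension through the foreign key.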
Snowflake Schema
[Figure: snowflake schema diagram with tables Sales, Customers, Dates, Products, Channels, Promotions, and Brands]
OLAP Solutions
• Data Warehouse/Data Mart
• Dimensions
• Measures
• Cubes
• Cells
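[Figure: OLAP cube with three dimensions]
• Products: Gadgets, Gizmos, Thingies, Widgets
• Quarters: Q1, Q2, Q3, Q4
• Regions: US, Europe, Asia
• Cell values shown on the visible face:
130 135 140 142
205 390 350 475
175 230 190 250
310 340 410 450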
Operations in Multidimensional Data Model
• Aggregation (roll-up)
– dimension reduction: e.g., total sales by city
– summarization over aggregate hierarchy: e.g., total sales by city and year → total sales by region and by year
• Selection (slice) defines a subcube
– e.g., sales where city = Palo Alto and date = 1/15/96
• Navigation to detailed data (drill-down)
– e.g., (sales - expense) by city, top 3% of cities by average income
• Visualization Operations (e.g., Pivot)
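These operations can be tried on a toy cube held in a dictionary. This is a minimal sketch: the cell values, the city-to-region assignment, and the roll_up helper are assumptions for illustration and do not correspond to any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical cells: (city, region, year, product) -> sales
cells = {
    ("Palo Alto", "West", 1996, "Cola"): 10,
    ("Palo Alto", "West", 1996, "Milk"): 5,
    ("San Jose", "West", 1996, "Cola"): 7,
    ("New York", "East", 1996, "Cola"): 12,
    ("New York", "East", 1997, "Milk"): 4,
}


def roll_up(cells, key_fn):
    """Aggregate cells into coarser groups chosen by key_fn."""
    totals = defaultdict(int)
    for coords, value in cells.items():
        totals[key_fn(coords)] += value
    return dict(totals)


# Dimension reduction: total sales by city (year and product dropped).
by_city = roll_up(cells, lambda c: c[0])
# Summarization over a hierarchy: city rolls up to region, kept by year.
by_region_year = roll_up(cells, lambda c: (c[1], c[2]))
# Slice: keep only the subcube where city = Palo Alto.
palo_alto = {c: v for c, v in cells.items() if c[0] == "Palo Alto"}
# Pivot: re-orient the view as product (rows) by region (columns).
pivot = roll_up(cells, lambda c: (c[3], c[1]))
print(by_city, by_region_year, pivot)
```

Drill-down is the reverse direction: going from by_region_year back to the city-level cells it was built from.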
A Visual Operation: Pivot (Rotate)
[Figure: a cube rotated between views. Axes: Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF), and Date (3/1, 3/2, 3/3, 3/4); the rotated view is labeled Month and Region. Sample cell values: 10, 47, 30, 12]
Store Dimension
• It is not uncommon to represent multiple hierarchies in a dimension table. Ideally, the attribute names and values should be unique across the multiple hierarchies.
Multidimensional Query Techniques
[Figure: a cube with dimensions Product, Time, and Geography]
• Questions: What? Why? Why? Why?
• Techniques: Slicing, Dicing, Drilling down
ETL
ETL = Extract, Transform, Load
• Moving data from production systems to DW
• Checking data integrity
• Assigning surrogate key values
• Collecting data from disparate systems
• Reorganizing data
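The ETL tasks listed above can be sketched as three small functions. Everything here is hypothetical: the two source systems, their field names, and the rule that an amount must parse as a number are stand-ins for whatever real sources and integrity checks apply.

```python
def extract():
    """Collect rows from two hypothetical source systems with different field names."""
    orders = [{"cust": "C1", "amt": "100.50"}, {"cust": "C2", "amt": "n/a"}]
    pos = [{"customer_id": "C1", "total": "20.00"}]
    return orders, pos


def transform(orders, pos):
    """Reorganize to one layout, check integrity, and assign surrogate keys."""
    unified = [(r["cust"], r["amt"]) for r in orders]
    unified += [(r["customer_id"], r["total"]) for r in pos]
    surrogate = {}
    clean, rejected = [], []
    for customer, amount in unified:
        try:
            value = float(amount)
        except ValueError:
            rejected.append((customer, amount))  # failed the integrity check
            continue
        # Reuse the customer's key if one exists, otherwise issue the next integer.
        key = surrogate.setdefault(customer, len(surrogate) + 1)
        clean.append({"customer_key": key, "dollars": value})
    return clean, rejected


def load(warehouse, rows):
    """Append transformed rows to the (in-memory) warehouse table."""
    warehouse.extend(rows)


warehouse = []
clean, rejected = transform(*extract())
load(warehouse, clean)
print(warehouse, rejected)
```

Rejected rows are kept rather than silently dropped, so they can be corrected at the source and reprocessed.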
Data Quality Issues
• No common time basis
• Different calculation algorithms
• Different levels of extraction
• Different levels of granularity
• Different data field names
• Different data field meanings
• Missing information
• No data correction rules
• No drill-down capability
The Anomalies Nightmare
CUST # | NAME | ADDRESS | TYPE
90238475 | Digital Equipment | 187 N. PARK St. Salem NH 01458 | OEM
90233479 | Digital | 187 N. Pk. St. Salem NH 01458 | OEM
90233489 | Digital Corp | 187 N. Park St Salem NH 01458 | $#%
90234889 | Digital Consulting | 187 N. Park Ave. Salem NH 01458 | Comp
90345672 | Digital Info Service | 15 Main Street Andover MA 02341 | Consult
90328574 | Digital Integration | PO Box 9 Boston MA 02210 | Mail List
90328575 | DEC | Park Blvd. Boston MA 04106 | SYS INT
• Problems illustrated: No Unique Key, Noise in Blank Fields, Spelling, No Standardization, Anomalies
How does one correctly identify and consolidate anomalies from millions of records?
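One common first step toward consolidation is to normalize a field before comparing records. The sketch below uses the Salem addresses from the slide; the pairing of customer numbers with addresses assumes the slide's columns line up row by row, and the abbreviation map is a hand-made assumption, not a complete standardization rule set.

```python
import re
from collections import defaultdict

# Abbreviation map is a hypothetical, hand-made rule set for this illustration.
ABBREVIATIONS = {"pk": "park", "street": "st", "avenue": "ave"}


def normalize_address(address):
    """Lowercase, drop punctuation, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


records = [
    ("90238475", "187 N. PARK St. Salem NH 01458"),
    ("90233479", "187 N. Pk. St. Salem NH 01458"),
    ("90233489", "187 N. Park St Salem NH 01458"),
    ("90234889", "187 N. Park Ave. Salem NH 01458"),
]

groups = defaultdict(list)
for cust_no, address in records:
    groups[normalize_address(address)].append(cust_no)

# Customer numbers that share a normalized address are candidate duplicates.
candidates = [ids for ids in groups.values() if len(ids) > 1]
print(candidates)
```

The "Park Ave." record stays separate, which shows the limit of rule-based matching: whether it is a typo or a different location still needs a human or a fuzzier comparison.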
OLAP and Data Mining Address Different Types of Questions
While reporting and OLAP are informative about past facts, only data mining can help you predict the future of your business.
OLAP | Data Mining
What was the response rate to our mailing? | What is the profile of people who are likely to respond to future mailings?
How many units of our new product did we sell to our existing customers? | Which existing customers are likely to buy our next new product?
Who were my 10 best customers last year? | Which 10 customers offer me the greatest profit potential?
Which customers didn't renew their policies last month? | Which customers are likely to switch to the competition in the next six months?
Which customers defaulted on their loans? | Is this customer likely to be a good credit risk?
What were sales by region last quarter? | What are expected sales by region next year?
What percentage of the parts we produced yesterday are defective? | What can I do to improve throughput and reduce scrap?
Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367
Use of Data Mining
• Customer profiling
• Market segmentation
• Buying pattern affinities
• Database marketing
• Credit scoring and risk analysis
Associations
Which items are purchased in a retail store at the same time?
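A simple way to approach this question is to count how often pairs of items appear in the same basket. The baskets below are invented, and pair counting with a support ratio is only one basic form of association analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so each pair is always counted under the same (a, b) ordering.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of baskets that contain both items.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}
print(support[("bread", "milk")])
```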
Sequential Patterns
What is the likelihood that a customer will buy a product next month, if he buys a related item today?
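That likelihood can be estimated from purchase histories as a conditional frequency. The histories and the next_month_likelihood helper below are hypothetical, and a frequency count over month-to-month pairs is a deliberately simple stand-in for real sequential pattern mining.

```python
# Hypothetical purchase histories: customer -> items bought, by month.
histories = {
    "c1": [{"printer"}, {"ink"}],
    "c2": [{"printer"}, {"paper"}],
    "c3": [{"printer", "paper"}, {"ink"}],
    "c4": [{"paper"}, {"ink"}],
}


def next_month_likelihood(histories, first, then):
    """Estimate P(buys `then` next month | bought `first` this month)."""
    bought_first = followed = 0
    for months in histories.values():
        # Walk over consecutive (this month, next month) pairs.
        for current, following in zip(months, months[1:]):
            if first in current:
                bought_first += 1
                followed += then in following
    return followed / bought_first if bought_first else 0.0


likelihood = next_month_likelihood(histories, "printer", "ink")
print(likelihood)
```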
Classifications
Determine customers’ buying patterns and then find other customers with similar attributes that may be targeted for a marketing campaign.
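Finding prospects with attributes similar to known buyers can be sketched with a nearest-neighbor rule. This technique is chosen here only for brevity and is not named in the slides; the attributes (age, income) and all values are assumptions.

```python
# Hypothetical customers: (age, annual income in thousands) -> known buyer?
known = [
    ((25, 40), False),
    ((30, 45), False),
    ((45, 90), True),
    ((50, 110), True),
]


def classify(attributes, known):
    """Label a prospect with the class of the most similar known customer."""

    def distance(a, b):
        # Squared Euclidean distance; enough for ranking by similarity.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(known, key=lambda item: distance(item[0], attributes))
    return label


prospects = {"p1": (48, 95), "p2": (27, 42)}
targets = [name for name, attrs in prospects.items() if classify(attrs, known)]
print(targets)
```

In practice the attributes would be scaled to comparable ranges first, since otherwise the one with the largest numeric range dominates the distance.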