Chapter 15 Data Warehousing, OLAP, and Data Mining
Introduction
• Data, data, data… everywhere!
• Information… that’s another story!
• Especially the right information at the right time!
• Data warehousing’s goal is to make the right information available at the right time.
• Data warehousing is both a data store (e.g., a database of some sort) and a process for bringing together disparate data from throughout an organization for decision-support purposes.
Introduction
• Data warehouses are natural allies for data mining: the two work well together.
• Data mining can help fulfill one of the goals of data warehousing: the right information at the right time.
• Relational database management systems (RDBMSs), such as Oracle, DB2, Sybase, Informix, Focus, SQL Server, etc., are often used for data warehousing.
Definitions of a Data Warehouse
1. W. H. Inmon: “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process”
2. Ralph Kimball: “A copy of transaction data, specifically structured for query and analysis”
Data Warehouse
• For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW).
• DW allows an organization (enterprise) to remember what it has noticed about its data.
• Data Mining techniques make use of the data in a Data Warehouse.
Data Warehouse
[Figure: the enterprise “database” (customers, vendors, orders, transactions, etc.) is copied, organized, and summarized into the Data Warehouse, which feeds Data Mining]
Data Miners:
• “Farmers”: they know what they are looking for
• “Explorers”: unpredictable
Data Warehouse
A data warehouse is a copy of transaction data specifically structured for querying, analysis, reporting, and more rigorous data mining.
Note that the data warehouse contains a copy of the transactions; this copy is not updated or changed later by the transaction system.
Also note that this data is specially structured and may have been transformed when it was copied into the data warehouse.
Data Mart
• A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse.
• A Data Mart typically reflects the business rules of a specific business unit within an enterprise.
Data Warehouse to Data Mart
[Figure: a central Data Warehouse feeds three Data Marts, each of which supplies decision-support information]
Generic Architecture of Data
[Figure: generic architecture of data; one layer is annotated “(synonym) Transaction data”]
Transaction (Operational) Data
• Operational (production) systems create massive numbers of transactions, such as sales, purchases, deposits, withdrawals, returns, refunds, phone calls, toll-road payments, web site “hits”, etc.
• Transactions are the base level of data: the raw material for understanding customer behavior.
• Unfortunately, operational systems change due to changing business needs
• Fortunately, operational systems can usually be changed to support changing business needs
• Data warehousing strategies need to be aware of operational system changes
Operational Summary Data
Summaries cover a specific time period and use the transaction data for that period.
Other examples???
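As a minimal sketch of an operational summary, the Python below totals a handful of transactions per calendar month. The record layout (a date and a dollar amount), the sample values, and the `monthly_summary` helper are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical operational transactions: (transaction date, amount in dollars)
transactions = [
    (date(2024, 1, 5), 120.00),
    (date(2024, 1, 19), 80.50),
    (date(2024, 2, 2), 200.00),
    (date(2024, 2, 27), -50.00),  # a refund
]

def monthly_summary(txns):
    """Summarize transactions per (year, month): transaction count and total amount."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for day, amount in txns:
        key = (day.year, day.month)
        summary[key]["count"] += 1
        summary[key]["total"] += amount
    return dict(summary)

for (year, month), stats in sorted(monthly_summary(transactions).items()):
    print(f"{year}-{month:02d}: {stats['count']} transactions, total ${stats['total']:.2f}")
```

Each monthly row is built only from the transactions dated in that month, which is what makes it a summary "for a specific time period".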
Decision Support Summary Data
• The data that are used to help make decisions about the business:
  – Financial data, such as:
    • Income statements (profit & loss)
    • Balance sheets (assets - liabilities = net worth)
  – Sales summaries
  – Other examples???
• Data warehouses maintain this type of data; however, financial data “of record” (for audit purposes) usually comes from the operational databases rather than from the data warehouse.
• Generally, it is a bad idea to use the same system for analytic and operational purposes.
Database Schema
• A database schema defines the structure of the data, not the values of the data (e.g., first name and last name = structure; Ron Norman = values of the data)
• In an RDBMS:
  – Columns = fields = attributes (A, B, C)
  – Rows = records = tuples (1–7)
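The structure/value distinction can be seen in a small sketch that uses Python's built-in `sqlite3` module as a stand-in RDBMS. The table name `person` and its column names are hypothetical; `PRAGMA table_info` is SQLite's way of listing a table's columns, i.e., its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema: structure only (column names and types), no values yet
conn.execute("CREATE TABLE person (first_name TEXT, last_name TEXT)")
# The values: one row (record, tuple) stored under that structure
conn.execute("INSERT INTO person VALUES (?, ?)", ("Ron", "Norman"))

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(person)")]
rows = conn.execute("SELECT first_name, last_name FROM person").fetchall()
print(columns)  # ['first_name', 'last_name']
print(rows)     # [('Ron', 'Norman')]
```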
Logical & Physical Database Schema
• Logical schema: describes data in a way that is familiar to business users
• Physical schema: describes the data the way it will be stored in an RDBMS, which might be different from the way the logical schema shows it
Metadata
• General definition: data about data!
  – Examples:
    • A library’s card catalog (metadata) describes publications (data)
    • A file system maintains permissions (metadata) about files (data)
• A form of system documentation, including:
  – Values legally allowed in a field (e.g., AZ, CA, OR, UT, WA, etc.)
  – Description of the contents of each field (e.g., start date)
  – Date when the data were loaded
  – Indication of the currency of the data (when last updated)
  – Mappings between systems (e.g., A.this = B.that)
• Invaluable: without metadata, you have to research to find this information
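A minimal sketch of metadata put to work: a hypothetical description of one field (legal values, load date, source mapping) is used to flag values that are not allowed. The `FIELD_METADATA` layout, the five-state value list, and the source column name are assumptions for illustration only.

```python
from datetime import date

# Hypothetical metadata for a "state" field
FIELD_METADATA = {
    "state": {
        "description": "Customer's state (hypothetical five-state sales region)",
        "allowed_values": {"AZ", "CA", "OR", "UT", "WA"},
        "loaded_on": date(2024, 3, 1),       # date when the data were loaded
        "source": "crm.customer.state_cd",   # mapping to a source system (hypothetical)
    }
}

def invalid_values(field, values):
    """Return the values that the metadata does not list as legal for this field."""
    allowed = FIELD_METADATA[field]["allowed_values"]
    return [v for v in values if v not in allowed]

print(invalid_values("state", ["CA", "NV", "WA", "ZZ"]))  # ['NV', 'ZZ']
```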
Business Rules
• Highest level of abstraction from operational (transaction) data
• Describe why relationships exist and how they are applied
• Examples:
  – Need to have 3 forms of ID for credit
  – Only allow a maximum daily withdrawal of $200
  – After the 3rd failed log-in attempt, lock the log-in screen
  – Accept no bills larger than $20
  – Others???
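The example rules above can be expressed as simple checks. This is a hypothetical sketch: the constant and function names are illustrative assumptions, and the limits are the ones listed on the slide.

```python
MAX_DAILY_WITHDRAWAL = 200  # dollars
MAX_LOGIN_ATTEMPTS = 3
LARGEST_ACCEPTED_BILL = 20  # dollars
REQUIRED_ID_FORMS = 3

def can_withdraw(amount, withdrawn_today):
    """Allow a withdrawal only if the day's total stays within the maximum."""
    return withdrawn_today + amount <= MAX_DAILY_WITHDRAWAL

def should_lock(failed_attempts):
    """Lock the log-in screen once the 3rd failed attempt has occurred."""
    return failed_attempts >= MAX_LOGIN_ATTEMPTS

def accept_bill(denomination):
    """Accept no bills larger than the limit."""
    return denomination <= LARGEST_ACCEPTED_BILL

def credit_eligible(id_forms):
    """Require at least 3 distinct forms of ID."""
    return len(set(id_forms)) >= REQUIRED_ID_FORMS

print(can_withdraw(60, 150))  # False: 210 exceeds the $200 daily limit
print(should_lock(3))         # True
print(accept_bill(50))        # False
print(credit_eligible(["passport", "driver_license", "utility_bill"]))  # True
```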
General Architecture for Data Warehousing
• Source systems
• Extraction, (Clean), Transformation, & Load (ETL)
• Central repository
• Metadata repository
• Data marts
• Operational feedback
• End users (business)
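The ETL step can be sketched in a few functions: extract raw records from two hypothetical source systems, clean and transform them into one consistent layout, and load them into a central repository (here an in-memory SQLite database). The source record formats, field names, and the `sales` table are assumptions made for illustration.

```python
import sqlite3

# Hypothetical source systems with inconsistent conventions
orders_system = [{"cust": " alice ", "amount": "19.99", "date": "2024-01-05"}]
web_system = [{"customer_name": "BOB", "total_cents": 4500, "day": "2024-01-06"}]

def extract():
    """Extract: pull raw records from each source system."""
    return orders_system, web_system

def clean_and_transform(orders, web):
    """Clean and transform: map both sources onto one consistent row layout."""
    rows = []
    for r in orders:
        rows.append((r["cust"].strip().title(), round(float(r["amount"]), 2), r["date"]))
    for r in web:
        rows.append((r["customer_name"].strip().title(), r["total_cents"] / 100, r["day"]))
    return rows

def load(rows, conn):
    """Load: append the conformed rows into the central repository."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(clean_and_transform(*extract()), warehouse)
print(warehouse.execute("SELECT * FROM sales ORDER BY sale_date").fetchall())
# [('Alice', 19.99, '2024-01-05'), ('Bob', 45.0, '2024-01-06')]
```

The load step only appends rows, in keeping with the idea that the warehouse copy is not later updated by the transaction systems.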
Where does OLAP fit in?
OLAP Overview
• Interactive, exploratory analysis of multidimensional data to discover patterns
[Figure: a data cube with axes labeled age, gender, and accidents]
OLAP Architecture
Server Options
• Single processor
• Symmetric multiprocessor (SMP)
• Massively parallel processor (MPP)
OLAP Server Options
• ROLAP (Relational)
• MOLAP (Multidimensional)
• HOLAP (Hybrid)
OLAP – Online Analytical Processing
• A definition:
• Data representation is in the form of a CUBE
• OLAP goes beyond SQL with its analysis capabilities
• Key feature of OLAP: relevant multidimensional views, such as products, time, and geography
OLAP Cube - 1
OLAP Cube - 2
OLAP Cube - 3
• Star Structure (quite common)
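[Figure: star structure. A central Facts table (keys Product, Time, Region, Channel; measures Revenue, Expenses, Units) surrounded by dimension tables: Product (Model, Type, Color), Time (Week, Year), Region (Nation, District, Dealer), and Channel]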
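A star structure like this can be sketched as relational tables, which is roughly how a ROLAP server keeps a cube and answers cube-style questions with joins and `GROUP BY`. The sketch below uses SQLite; the attribute names follow the figure labels, but the `dim_`/`_id` naming and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names; attributes taken from the figure labels)
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, week INTEGER, year INTEGER);
CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);
CREATE TABLE dim_channel (channel_id INTEGER PRIMARY KEY, channel TEXT);
-- Fact table: one key per dimension plus the numeric measures
CREATE TABLE facts (
    product_id INTEGER REFERENCES dim_product,
    time_id    INTEGER REFERENCES dim_time,
    region_id  INTEGER REFERENCES dim_region,
    channel_id INTEGER REFERENCES dim_channel,
    revenue REAL, expenses REAL, units INTEGER
);
INSERT INTO dim_product VALUES (1, 'A100', 'Sedan', 'Red'), (2, 'B200', 'Truck', 'Blue');
INSERT INTO dim_time    VALUES (1, 1, 2024), (2, 2, 2024);
INSERT INTO dim_region  VALUES (1, 'USA', 'West', 'Dealer 1'), (2, 'USA', 'East', 'Dealer 2');
INSERT INTO dim_channel VALUES (1, 'Retail'), (2, 'Fleet');
INSERT INTO facts VALUES
    (1, 1, 1, 1, 1000, 600, 10),
    (1, 2, 2, 1, 1500, 900, 15),
    (2, 1, 1, 2, 3000, 2000, 5),
    (2, 2, 2, 1, 2000, 1200, 4);
""")

# A cube-style question: total revenue by product type and district
result = conn.execute("""
    SELECT p.type, r.district, SUM(f.revenue)
    FROM facts AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_region  AS r ON r.region_id  = f.region_id
    GROUP BY p.type, r.district
    ORDER BY p.type, r.district
""").fetchall()
print(result)
# [('Sedan', 'East', 1500.0), ('Sedan', 'West', 1000.0), ('Truck', 'East', 2000.0), ('Truck', 'West', 3000.0)]
```

Because each dimension lives in its own table, adding an attribute (say, a dealer's city) touches only `dim_region`, not the fact table.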
OLAP Cube - 4
[Figure: “The Cube”: sales by product (Red blob, Blue blob) and year (1996, 1997)]
OLAP Cube - 5
Three-Dimensional Cube Display
[Figure: one page of the sales cube. Page: Region = North; rows: Year (1996, 1997, Total); columns: Red blob, Blue blob, Total]
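The display fixes one dimension (the page, Region = North) and lays out the other two as rows and columns with totals. A minimal sketch, assuming a hypothetical dictionary-of-cells representation and made-up sales figures; `page_view` is an illustrative helper, not a product API.

```python
from collections import defaultdict

# Hypothetical cube cells: (region, year, product) -> sales
cube = {
    ("North", 1996, "Red blob"): 100, ("North", 1996, "Blue blob"): 150,
    ("North", 1997, "Red blob"): 120, ("North", 1997, "Blue blob"): 130,
    ("South", 1996, "Red blob"): 90,  ("South", 1997, "Blue blob"): 60,
}

def page_view(cube, region):
    """Fix the page dimension (region); years become rows, products become columns."""
    table = defaultdict(lambda: defaultdict(int))
    for (reg, year, product), sales in cube.items():
        if reg != region:
            continue  # keep only the cells on the selected page
        table[year][product] += sales
        table[year]["Total"] += sales
        table["Total"][product] += sales
        table["Total"]["Total"] += sales
    return table

view = page_view(cube, "North")
products = ["Red blob", "Blue blob", "Total"]
print("Region: North".ljust(14) + "".join(p.rjust(11) for p in products))
for row in [1996, 1997, "Total"]:
    print(str(row).ljust(14) + "".join(str(view[row][p]).rjust(11) for p in products))
```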
OLAP Cube - 6
Six-Dimensional Cube
Dimension: Example
• Brand: Mt. Airy
• Store: Atlanta
• Customer segment: Business
• Product group: Desks
• Period: January
• Variable: Units sold
Rotation (Pivot Table)
Drill Down
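Rotation and drill-down can be sketched on plain Python dictionaries. Rotation (pivoting) swaps which dimension appears on the rows and which on the columns; drill-down moves from a summarized level (year) to a more detailed one (year and quarter). The quarterly records, the `aggregate` and `rotate` helpers, and the year-to-quarter hierarchy are assumptions for illustration, not data from the slides.

```python
from collections import defaultdict

# Hypothetical detail records: (year, quarter, product, sales)
detail = [
    (1996, "Q1", "Red blob", 40), (1996, "Q2", "Red blob", 60),
    (1996, "Q1", "Blue blob", 70), (1996, "Q2", "Blue blob", 80),
    (1997, "Q1", "Red blob", 50), (1997, "Q2", "Red blob", 70),
]

def aggregate(records, *levels):
    """Sum sales grouped by the requested fields (0=year, 1=quarter, 2=product)."""
    totals = defaultdict(int)
    for rec in records:
        totals[tuple(rec[i] for i in levels)] += rec[3]
    return dict(totals)

by_year = aggregate(detail, 0)        # summarized view: one total per year
by_quarter = aggregate(detail, 0, 1)  # drill down: year -> year and quarter

def rotate(grid):
    """Rotate (pivot) a {row: {column: value}} grid so rows become columns."""
    rotated = defaultdict(dict)
    for row, cols in grid.items():
        for col, value in cols.items():
            rotated[col][row] = value
    return dict(rotated)

# Years as rows, products as columns, then rotated to products as rows
year_by_product = defaultdict(dict)
for (year, product), sales in aggregate(detail, 0, 2).items():
    year_by_product[year][product] = sales

print(by_year)  # {(1996,): 250, (1997,): 120}
print(by_quarter)
print(rotate(year_by_product))
```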
OLAP Examples
• http://perso.wanadoo.fr/bernard.lupin/english/example.htm
• Excel Pivot Table example (similar to an OLAP cube)
Sample of OLAP products
Just a snippet from http://www.olapreport.com/ProductsIndex.htm; not an endorsement
Data Mining versus OLAP
• OLAP (Online Analytical Processing)
  – Provides you with a very good view of what is happening, but cannot predict what will happen in the future or explain why it is happening
Results of Data Mining Include:
• Forecasting what may happen in the future
• Classifying people or things into groups by recognizing patterns
• Clustering people or things into groups based on their attributes
• Associating what events are likely to occur together
• Sequencing what events are likely to lead to later events
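As one illustration of “associating what events are likely to occur together”, the sketch below computes support and confidence for item pairs in a few hypothetical market baskets. It is a simplified pair-counting approach swapped in for illustration (not a full association-rule algorithm such as Apriori); the baskets, threshold, and `pair_rules` helper are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (each set is one transaction)
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

def pair_rules(baskets, min_support=0.5):
    """Return (lhs, rhs, support, confidence) for pair rules lhs -> rhs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            rules.append((lhs, rhs, support, confidence))
    return rules

for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Support says how often a pair occurs at all; confidence says how often the right-hand item appears given the left-hand one, so "milk -> bread" can be stronger than "bread -> milk".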
End of Chapter 15