Transcript
Page 1: Data Warehouses 1.  What is a data warehouse?  A multi-dimensional data model  Data warehouse architecture  From data warehousing to data mining 2


Data Mining: Data Warehouses

Page 2:

Data Warehousing and OLAP Technology: An Overview

What is a data warehouse?
A multi-dimensional data model
Data warehouse architecture
From data warehousing to data mining

Page 3:

What is a Data Warehouse?

Defined in many different ways, but not rigorously.
◦ A decision support database that is maintained separately from the organization’s operational database
◦ Supports information processing by providing a solid platform of consolidated, historical data for analysis.
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
Data warehousing:
◦ The process of constructing and using data warehouses

Page 4:

Data Warehouse: Subject-Oriented

Organized around major subjects, such as customer, product, sales
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Page 5:

Data Warehouse: Integrated

Constructed by integrating multiple, heterogeneous data sources
◦ Relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
◦ Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  E.g., hotel price: currency, tax, breakfast covered, etc.
◦ When data is moved to the warehouse, it is converted.
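The hotel price example can be sketched in Python. This is only an illustration of the conversion step: the two source layouts, the field names, and the exchange rate are assumptions made up for the sketch, not part of the slides.

```python
from dataclasses import dataclass

# Hypothetical exchange rates to USD; illustrative values only.
USD_PER_UNIT = {"USD": 1.0, "EUR": 1.10}


@dataclass
class HotelPrice:
    """Unified warehouse record: one currency, tax included, boolean flag."""
    hotel: str
    usd_total: float
    breakfast_included: bool


def from_source_a(row: dict) -> HotelPrice:
    """Source A (assumed layout): EUR, pre-tax price, 'Y'/'N' breakfast flag."""
    gross_eur = row["price_eur"] * (1 + row["tax_rate"])
    usd = round(gross_eur * USD_PER_UNIT["EUR"], 2)
    return HotelPrice(row["name"], usd, row["breakfast"] == "Y")


def from_source_b(row: dict) -> HotelPrice:
    """Source B (assumed layout): USD, tax already included, boolean flag."""
    return HotelPrice(row["hotel_name"], round(row["total_usd"], 2), row["has_breakfast"])


# Conversion happens when the data is moved into the warehouse.
warehouse_rows = [
    from_source_a({"name": "Hotel Alpha", "price_eur": 100.0, "tax_rate": 0.10, "breakfast": "Y"}),
    from_source_b({"hotel_name": "Hotel Beta", "total_usd": 95.5, "has_breakfast": False}),
]
for r in warehouse_rows:
    print(r)
```

After loading, every record uses the same currency, the same tax treatment, and the same encoding for the breakfast attribute, so prices from different sources can be compared directly.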

Page 6:

Data Warehouse: Time Variant

The time horizon for the data warehouse is significantly longer than that of operational systems
◦ Operational database: current value data
◦ Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
◦ Contains an element of time, explicitly or implicitly
◦ But the key of operational data may or may not contain a “time element”

Page 7:

Data Warehouse: Nonvolatile

A physically separate store of data transformed from the operational environment
Operational update of data does not occur in the data warehouse environment
◦ Does not require transaction processing, recovery, and concurrency control mechanisms
◦ Requires only two operations in data accessing: initial loading of data and access of data

Page 8:

Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)
◦ Major task of traditional relational DBMS
◦ Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
◦ Major task of data warehouse system
◦ Data analysis and decision making
Distinct features (OLTP vs. OLAP):
◦ User and system orientation: customer vs. market
◦ Data contents: current, detailed vs. historical, consolidated
◦ Database design: ER + application vs. star + subject
◦ View: current, local vs. evolutionary, integrated
◦ Access patterns: update vs. read-only but complex queries

Page 9:

OLTP vs. OLAP

                    OLTP                              OLAP
function            day-to-day operations             decision support
DB design           application-oriented              subject-oriented
data                current, up-to-date, detailed,    historical, summarized, multidimensional,
                    flat relational, isolated         integrated, consolidated
access              read/write,                       lots of scans
                    index/hash on prim. key
unit of work        short, simple transaction         complex query
# records accessed  tens                              millions
# users             thousands                         hundreds
DB size             100MB-GB                          100GB-TB

Page 10:

Data Warehousing and OLAP Technology: An Overview

What is a data warehouse?
A multi-dimensional data model
Data warehouse architecture
From data warehousing to data mining

Page 11:

From Tables and Spreadsheets to Data Cubes

A data warehouse is based on a multidimensional data model, which views data in the form of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
◦ Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)
◦ Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables

Page 12:

Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures
◦ Star schema: a fact table in the middle connected to a set of dimension tables
◦ Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake

Page 13:

Example of Star Schema

Sales Fact Table
◦ Keys: time_key, item_key, branch_key, location_key
◦ Measures: units_sold, dollars_sold, avg_sales
Dimension tables
◦ time (time_key, day, day_of_the_week, month, quarter, year)
◦ item (item_key, item_name, brand, type, supplier_type)
◦ branch (branch_key, branch_name, branch_type)
◦ location (location_key, street, city, state_or_province, country)
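A minimal sketch of this star schema, using Python's built-in sqlite3 module. The column names come from the slide; the `_dim` table-name suffix, the column types, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           state_or_province TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL,
    avg_sales    REAL      -- left NULL in this sketch
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 15, "Mon", 1, "Q1", 2024), (2, 20, "Sat", 4, "Q2", 2024)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?, ?)",
                 [(1, "TV-40", "Acme", "TV", "wholesale"), (2, "PC-Pro", "Acme", "PC", "retail")])
conn.execute("INSERT INTO branch_dim VALUES (1, 'B1', 'store')")
conn.execute("INSERT INTO location_dim VALUES (1, '1 Main St', 'Vancouver', 'BC', 'Canada')")
conn.executemany(
    "INSERT INTO sales_fact (time_key, item_key, branch_key, location_key, units_sold, dollars_sold) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 2, 1000.0), (1, 2, 1, 1, 1, 850.0), (2, 1, 1, 1, 3, 1352.0)])

# Star join: the fact table joins directly to each dimension it needs.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN item_dim i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Every dimension is reached with a single join from the fact table, which is what gives the schema its star shape.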

Page 14:

Example of Star Schema

Page 15:

Example of Snowflake Schema

Sales Fact Table
◦ Keys: time_key, item_key, branch_key, location_key
◦ Measures: units_sold, dollars_sold, avg_sales
Dimension tables
◦ time (time_key, day, day_of_the_week, month, quarter, year)
◦ item (item_key, item_name, brand, type, supplier_key)
◦ supplier (supplier_key, supplier_type)
◦ branch (branch_key, branch_name, branch_type)
◦ location (location_key, street, city_key)
◦ city (city_key, city, state_or_province, country)
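The effect of normalizing a dimension can be sketched with sqlite3 as well. Only the item and supplier tables are modeled here, the fact table is cut down to one key and one measure, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- supplier_type has moved out of item into its own table (snowflaking).
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
                   supplier_key INTEGER REFERENCES supplier(supplier_key));
-- Reduced fact table for the sketch: one dimension key, one measure.
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item(item_key), dollars_sold REAL);

INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (10, 'TV-40', 'Acme', 'TV', 1),
                        (11, 'PC-Pro', 'Acme', 'PC', 2),
                        (12, 'VCR-X', 'Acme', 'VCR', 1);
INSERT INTO sales_fact VALUES (10, 1000.0), (11, 850.0), (12, 350.0);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN item i     ON f.item_key = i.item_key
    JOIN supplier s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(rows)
```

Compared with the star schema, the supplier attributes are stored once per supplier rather than once per item, at the cost of an extra join in queries that need them.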

Page 16:

Example of Snowflake Schema

Page 17:

Data Cubes

Allows data to be modeled and viewed in multiple dimensions
Another representation, like the star schema
Dimensions are the entities about which you want to keep records
◦ Time, item, branch, location…
Each dimension has a table, called a dimension table

Page 18:

3-D Data Cube

• 4-D cubes can be visualized as a series of 3-D cubes (figure: one 3-D cube each for Supplier 1, Supplier 2, Supplier 3)

Page 19:

Cuboid

A data cube is often referred to as a cuboid
Can generate a cuboid for all possible subsets of dimensions
◦ Provide different levels of summarization
N-D cube: base cuboid
◦ Lowest level of summary
0-D cube: apex cuboid
◦ Highest level of summary
◦ Summary over all dimensions

Page 20:

Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier
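The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so n dimensions without concept hierarchies give 2^n cuboids. A short sketch (the function name `cuboid_lattice` is made up for this example):

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")


def cuboid_lattice(dims):
    """Return {k: [cuboids with k dimensions]} for k = 0..len(dims)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    # The empty subset is the apex cuboid, written "all" on the slide.
    print(f"{k}-D:", [", ".join(c) or "all" for c in cuboids])

total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

For the four dimensions on the slide this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids.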

Page 21:

Ex. Data Cube Gen

Start with a time-product (2-D) table that shows sale amounts

USA
           TV     PC    VCR
1st Qtr   1000    850    350
2nd Qtr   1352    940    298
3rd Qtr   1450    658    314
4th Qtr   1500    965    365

Page 22:

Ex. Data Cube Gen (3D)

         USA                  Canada               Mexico
         TV     PC    VCR     TV     PC    VCR     TV     PC    VCR
1st Q   1000    850   350    2600    750   425    1300    850   350
2nd Q   1352    940   298    1752    860   236    1200   1000   400
3rd Q   1450    658   314    1055    458   520    1150    555   510
4th Q   1500    965   365    1350   1065   390     900    750   425
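As a sketch of how cuboids are derived from this 3-D table, the Python below stores the base cuboid as a dictionary and sums out the dimensions that are not kept. The sales figures are the ones on the slide; the function name `cuboid` and the dictionary layout are choices made for this example.

```python
from collections import defaultdict

DIMS = ("quarter", "product", "country")
quarters = ("Q1", "Q2", "Q3", "Q4")
products = ("TV", "PC", "VCR")

# One row per quarter, one column per product, as in the table above.
table = {
    "USA":    [(1000, 850, 350), (1352, 940, 298), (1450, 658, 314), (1500, 965, 365)],
    "Canada": [(2600, 750, 425), (1752, 860, 236), (1055, 458, 520), (1350, 1065, 390)],
    "Mexico": [(1300, 850, 350), (1200, 1000, 400), (1150, 555, 510), (900, 750, 425)],
}

# Base (3-D) cuboid: one cell per (quarter, product, country).
base = {
    (q, p, c): amount
    for c, rows in table.items()
    for q, row in zip(quarters, rows)
    for p, amount in zip(products, row)
}


def cuboid(keep):
    """Aggregate the base cuboid down to the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for cell, amount in base.items():
        out[tuple(cell[i] for i in idx)] += amount
    return dict(out)


by_product_country = cuboid(("product", "country"))   # a 2-D cuboid
apex = cuboid(())                                      # the 0-D (apex) cuboid
print("Total annual sales of TV in USA:", by_product_country[("TV", "USA")])
print("Grand total:", apex[()])
```

The (TV, USA) cell of the product-country cuboid is the sum 1000 + 1352 + 1450 + 1500 = 5302 over the four quarters.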

Page 23:

A Sample Data Cube

(Figure: a 3-D cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A, Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.)

Page 24:

Cuboids Corresponding to the Cube

0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product,date; product,country; date,country
3-D (base) cuboid: product, date, country

Page 25:

Multidimensional Data

Sales volume as a function of product, month, and region
(Figure: a cube with axes Product, Region, and Month)
Dimensions: Product, Location, Time
Hierarchical summarization paths
◦ Product: Industry > Category > Product
◦ Location: Region > Country > City > Office
◦ Time: Year > Quarter > Month > Day, and Week > Day

Page 26:

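A Concept Hierarchy: Dimension (location)

Levels, from most general to most detailed: all > region > country > city > office
◦ all: all
◦ region: Europe, ..., North_America
◦ country: Germany, ..., Spain (Europe); Canada, ..., Mexico (North_America)
◦ city: Frankfurt, ... (Germany); Vancouver, ..., Toronto (Canada)
◦ office: L. Chan, ..., M. Wind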
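A concept hierarchy can be represented as child-to-parent mappings, which makes climbing one level a simple re-aggregation. In the sketch below the mappings follow the figure, while the sales figures per city are made up for illustration.

```python
from collections import defaultdict

# Child -> parent mappings for two levels of the location hierarchy.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Frankfurt": "Germany"}
COUNTRY_TO_REGION = {
    "Canada": "North_America", "Mexico": "North_America",
    "Germany": "Europe", "Spain": "Europe",
}


def roll_up(values, parent_of):
    """Climb one level of the hierarchy: sum each member into its parent."""
    out = defaultdict(int)
    for member, value in values.items():
        out[parent_of[member]] += value
    return dict(out)


sales_by_city = {"Vancouver": 120, "Toronto": 80, "Frankfurt": 50}  # made-up figures
sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
sales_by_region = roll_up(sales_by_country, COUNTRY_TO_REGION)
print(sales_by_country)
print(sales_by_region)
```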

Page 27:

Typical OLAP Operations

Roll up (drill-up): summarize data
◦ by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
◦ from higher level summary to lower level summary or detailed data, or introducing new dimensions
Slice and dice:
◦ project and select
Pivot (rotate):
◦ reorient the cube, visualization, 3D to series of 2D planes
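These operations can be sketched on a small cube held in a Python dictionary. The cell values are the Q1 and Q2 figures for TV and PC in the USA and Canada from the sales example; the function names and the axis-index convention are assumptions of this sketch, not a standard API.

```python
from collections import defaultdict

# Cells keyed by (quarter, product, country).
cube = {
    ("Q1", "TV", "USA"): 1000, ("Q1", "PC", "USA"): 850,
    ("Q2", "TV", "USA"): 1352, ("Q2", "PC", "USA"): 940,
    ("Q1", "TV", "Canada"): 2600, ("Q1", "PC", "Canada"): 750,
    ("Q2", "TV", "Canada"): 1752, ("Q2", "PC", "Canada"): 860,
}


def roll_up(cells, drop_axis):
    """Roll up by dimension reduction: sum out one axis."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:drop_axis] + key[drop_axis + 1:]] += value
    return dict(out)


def slice_(cells, axis, value):
    """Slice: fix one dimension to a single value and drop that axis."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cells.items() if k[axis] == value}


def dice(cells, allowed):
    """Dice: select a sub-cube by restricting several axes (axis -> allowed values)."""
    return {k: v for k, v in cells.items()
            if all(k[axis] in values for axis, values in allowed.items())}


def pivot(cells, order):
    """Pivot: reorder the axes without changing any value."""
    return {tuple(k[i] for i in order): v for k, v in cells.items()}


by_product_country = roll_up(cube, 0)          # sum over quarters
usa = slice_(cube, 2, "USA")                   # 2-D plane: (quarter, product)
q1_tv = dice(cube, {0: {"Q1"}, 1: {"TV"}})     # sub-cube: Q1 and TV only
rotated = pivot(usa, (1, 0))                   # now keyed by (product, quarter)
print(by_product_country)
print(usa)
print(q1_tv)
print(rotated)
```

Drill-down is not shown as a function because it cannot be computed from a summary alone: it needs the more detailed cells, which here means going back from `by_product_country` to `cube`.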


Page 29:

Chapter 3: Data Warehousing and OLAP Technology: An Overview

What is a data warehouse?
A multi-dimensional data model
Data warehouse architecture
From data warehousing to data mining

Page 30:

Data Warehouse: A Multi-Tiered Architecture

(Figure: tiers from data sources to front-end tools; labels recovered from the diagram)
Data Sources: Operational DBs, other sources
◦ Extract, Transform, Load, Refresh
Data Storage: Data Warehouse, Data Marts, Metadata, Monitor & Integrator
◦ Serve
OLAP Engine: OLAP Server
Front-End Tools: Analysis, Query, Reports, Data mining

Page 31:

Warehouse Architecture

Data mart
◦ A subset of corporate-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, such as a marketing data mart
Metadata is the data defining warehouse objects. It stores:
◦ Description of the structure of the data warehouse
  schema, view, dimensions, hierarchies, data mart locations and contents
◦ Monitoring information
  warehouse usage statistics, error reports
◦ Algorithms used for summarization

Page 32:

Chapter 3: Data Warehousing and OLAP Technology: An Overview

What is a data warehouse?
A multi-dimensional data model
Data warehouse architecture
From data warehousing to data mining

Page 33:

Data Warehouse Usage

Three kinds of data warehouse applications
◦ Information processing
  supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
◦ Analytical processing
  multidimensional analysis of data warehouse data
  supports basic OLAP operations: slice-dice, drilling, pivoting
◦ Data mining
  knowledge discovery from hidden patterns
  supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools

Page 34:

Chapter 3: Data Warehousing and OLAP Technology: An Overview

What is a data warehouse?
A multi-dimensional data model
Data warehouse architecture
From data warehousing to data mining
Summary

Page 35:

Summary: Data Warehouse and OLAP Technology

Why data warehousing?
A multi-dimensional model of a data warehouse
◦ Star schema, snowflake schema
◦ A data cube consists of dimensions & measures
OLAP operations: drilling, rolling, slicing, dicing and pivoting
Data warehouse architecture

