Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator – semantics and computation 5.Aggregate View Selection

Upload: james-carpenter

Post on 03-Jan-2016


Page 1: Roadmap 1.What is the data warehouse, data mart 2.Multi-dimensional data modeling 3.Data warehouse design – schemas, indices 4.The Data Cube operator –

Roadmap

1. What is the data warehouse, data mart
2. Multi-dimensional data modeling
3. Data warehouse design – schemas, indices
4. The Data Cube operator – semantics and computation
5. Aggregate View Selection


Why Not Use the Existing DB?

• A DBMS is for On-Line Transaction Processing (OLTP)
  – automates day-to-day operations (purchasing, banking, etc.)

• A data warehouse is for On-Line Analytical Processing (OLAP)
  – needs historical data for trend analysis


OLTP vs. OLAP

                    OLTP                                    OLAP
users               clerk, IT professional                  knowledge worker
function            day-to-day operations                   decision support
DB design           application-oriented                    subject-oriented
data                current, up-to-date, detailed,          historical, summarized, multidimensional,
                    flat relational, isolated               integrated, consolidated
usage               repetitive                              ad hoc
access              read/write, index/hash on primary key   lots of scans
unit of work        short, simple transaction               complex query
# records accessed  tens                                    millions
# users             thousands                               hundreds
DB size             100 MB–GB                               100 GB–TB
metric              transaction throughput                  query throughput, response time


Examples of OLAP

• Comparisons (this period vs. last period)

– Show me the sales per store for this year and compare it to that of the previous year to identify discrepancies

• Ranking and statistical profiles (top N/bottom N)

– Show me sales, profit and average call volume per day for my 10 most profitable salespeople

• Custom consolidation (market segments, ad hoc groups)

– Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations
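The first two kinds of query can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how an OLAP server evaluates them: the stores, years, salespeople and figures are all hypothetical, and `compare_years` and `top_n` are illustrative helper names.

```python
import heapq

# Hypothetical data: (store, year) -> total sales. Values are invented for illustration.
sales = {
    ("S1", 2015): 1200, ("S1", 2014): 1000,
    ("S2", 2015): 800, ("S2", 2014): 950,
}


def compare_years(sales, this_year, last_year):
    """Comparison query: per store, this period, last period and the difference."""
    stores = sorted({store for store, _ in sales})
    return {
        s: (sales.get((s, this_year), 0), sales.get((s, last_year), 0),
            sales.get((s, this_year), 0) - sales.get((s, last_year), 0))
        for s in stores
    }


# Hypothetical data: salesperson -> profit this year.
profit = {"Asha": 310, "Bala": 120, "Chen": 450, "Dev": 275}


def top_n(measure, n):
    """Ranking query: the n entries with the largest measure, best first."""
    return heapq.nlargest(n, measure.items(), key=lambda kv: kv[1])


report = compare_years(sales, 2015, 2014)
print(report)  # {'S1': (1200, 1000, 200), 'S2': (800, 950, -150)}
# Stores whose sales fell are the "discrepancies" the comparison query looks for.
print([s for s, (_, _, diff) in report.items() if diff < 0])  # ['S2']
print(top_n(profit, 2))  # [('Chen', 450), ('Asha', 310)]
```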


Multidimensional Modeling

• Example: compute total sales volume per product and store

Store  Product  Total Sales
1      1        454
1      4        925
2      1        468
2      2        800
etc.

Total sales per store and product:

                 Product
Store      1      2      3      4
1        454      -      -    925
2        468    800      -      -
3        296      -    240      -
4        652      -    540    745
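Going from the relational rows to the store x product grid is a pivot. A minimal Python sketch, assuming the (store, product, total sales) rows are already computed; the full set of rows is read off the grid above, and `to_grid` is a hypothetical helper.

```python
# (store, product, total_sales) rows, read off the grid above.
ROWS = [(1, 1, 454), (1, 4, 925), (2, 1, 468), (2, 2, 800), (3, 1, 296),
        (3, 3, 240), (4, 1, 652), (4, 3, 540), (4, 4, 745)]


def to_grid(rows):
    """Pivot (store, product, value) rows into a store x product grid; absent pairs become None."""
    stores = sorted({s for s, _, _ in rows})
    products = sorted({p for _, p, _ in rows})
    cell = {(s, p): v for s, p, v in rows}
    return stores, products, [[cell.get((s, p)) for p in products] for s in stores]


stores, products, grid = to_grid(ROWS)
print("Store", *products)
for s, line in zip(stores, grid):
    # Missing combinations print as "-", as in the slide.
    print(s, *("-" if v is None else v for v in line))
```

Using `None` rather than 0 keeps "no sales recorded for this pair" distinct from "sales of zero".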


From Tables and Spreadsheets to Data Cubes

• In general, a multidimensional data model views data in the form of a data cube

• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions

– Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year)

– The fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables

• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
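A fact table and its dimension tables can be sketched with plain dictionaries. The tables, keys and values below are hypothetical, and the `aggregate` helper is an illustrative stand-in for the join followed by GROUP BY that a warehouse would run in SQL.

```python
from collections import defaultdict

# Hypothetical dimension tables, keyed by surrogate key.
ITEM = {
    1: {"item_name": "DVD player", "brand": "Acme", "type": "video"},
    2: {"item_name": "VCR", "brand": "Acme", "type": "video"},
    3: {"item_name": "PC", "brand": "Zeta", "type": "computer"},
}
TIME = {
    10: {"day": 5, "month": 1, "quarter": 1, "year": 2015},
    11: {"day": 9, "month": 4, "quarter": 2, "year": 2015},
}

# Hypothetical fact table: one foreign key per dimension plus the measure dollars_sold.
SALES_FACT = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 100.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 40.0},
    {"item_key": 3, "time_key": 11, "dollars_sold": 500.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 60.0},
]


def aggregate(fact, dim, fk, attr, measure="dollars_sold"):
    """Join each fact row to a dimension via foreign key `fk` and sum `measure` per `attr`."""
    totals = defaultdict(float)
    for row in fact:
        totals[dim[row[fk]][attr]] += row[measure]
    return dict(totals)


print(aggregate(SALES_FACT, ITEM, "item_key", "brand"))    # {'Acme': 200.0, 'Zeta': 500.0}
print(aggregate(SALES_FACT, TIME, "time_key", "quarter"))  # {1: 140.0, 2: 560.0}
```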


Cube: A Lattice of Cuboids

0-D (apex) cuboid: all

1-D cuboids: (time), (item), (location), (supplier)

2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)

3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)

4-D (base) cuboid: (time, item, location, supplier)
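Because each cuboid is just a subset of the dimensions, the lattice for n dimensions (ignoring concept hierarchies) has 2^n cuboids. A minimal Python sketch that enumerates it for the four dimensions above; `lattice` is an illustrative name.

```python
from itertools import combinations

# The four dimensions of the example cube.
DIMS = ("time", "item", "location", "supplier")


def lattice(dims):
    """Map k to the list of k-D cuboids; each cuboid is the tuple of dimensions it groups by."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


cuboids = lattice(DIMS)
for k, level in cuboids.items():
    # The 0-D cuboid groups by nothing, so it is labeled "all" (the apex).
    print(f"{k}-D:", [", ".join(c) or "all" for c in level])

total = sum(len(level) for level in cuboids.values())
print(total)  # 16 == 2 ** 4
```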


Dimensions and Hierarchies

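DIMENSIONS: PRODUCT, LOCATION, TIME

Hierarchies (each level rolls up to the next):
PRODUCT: product → category
LOCATION: store → city → state → country → region
TIME: day → month → quarter → year; day → week

[Figure: a 3-D sales cube with axes product, city and month; the cell (DVD, Hyd, August) holds the sales of DVDs in Hyd in August.]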

• A cell in the cube may store values (measurements) relative to the combination of the labeled dimensions


Common OLAP Operations

• Roll-up: move up the hierarchy
  – e.g., given total sales per city, we can roll up to get sales per state

• Drill-down: move down the hierarchy
  – more fine-grained aggregation

[Figure: the PRODUCT, LOCATION and TIME hierarchies: product → category; store → city → state → country → region; day → month → quarter → year, day → week]
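For a SUM measure, rolling up amounts to re-aggregating along a child-to-parent mapping of the hierarchy. The cities, states and sales figures below are hypothetical, and `roll_up` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical LOCATION hierarchy fragments: child level -> parent level.
city_to_state = {"Hyderabad": "Telangana", "Warangal": "Telangana",
                 "Chennai": "Tamil Nadu", "Madurai": "Tamil Nadu"}
state_to_country = {"Telangana": "India", "Tamil Nadu": "India"}

# Hypothetical total sales per city.
sales_per_city = {"Hyderabad": 120, "Warangal": 30, "Chennai": 90, "Madurai": 20}


def roll_up(totals, parent_of):
    """Move one level up the hierarchy by summing child totals into their parent."""
    out = defaultdict(int)
    for child, value in totals.items():
        out[parent_of[child]] += value
    return dict(out)


sales_per_state = roll_up(sales_per_city, city_to_state)
sales_per_country = roll_up(sales_per_state, state_to_country)
print(sales_per_state)    # {'Telangana': 150, 'Tamil Nadu': 110}
print(sales_per_country)  # {'India': 260}
```

Drill-down cannot be computed this way: a state total does not determine its city totals, so drilling down requires going back to finer-grained data.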


Pivoting

• Pivoting: aggregate on selected dimensions
  – usually 2 dims (cross-tabulation)

Sales
                 Product
Store      1      2      3      4    ALL
1        454      -      -    925   1379
2        468    800      -      -   1268
3        296      -    240      -    536
4        652      -    540    745   1937
ALL     1870    800    780   1670   5120
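The ALL row and column are just extra aggregates: each fact contributes to its own cell, its row total, its column total and the grand total. A minimal Python sketch using the sales figures from the table above; `crosstab` is an illustrative name.

```python
from collections import defaultdict

ALL = "ALL"

# (store, product, sales) taken from the table above.
ROWS = [(1, 1, 454), (1, 4, 925), (2, 1, 468), (2, 2, 800), (3, 1, 296),
        (3, 3, 240), (4, 1, 652), (4, 3, 540), (4, 4, 745)]


def crosstab(rows):
    """Cross-tabulate store x product, adding ALL margins for row, column and grand totals."""
    table = defaultdict(int)
    for store, product, amount in rows:
        for s in (store, ALL):
            for p in (product, ALL):
                table[(s, p)] += amount
    return dict(table)


ct = crosstab(ROWS)
stores = sorted({s for s, _, _ in ROWS})
products = sorted({p for _, p, _ in ROWS})
print("Store", *products, ALL, sep="\t")
for s in stores + [ALL]:
    # Empty cells (no sales for that store/product pair) print as "-".
    print(s, *(ct.get((s, p), "-") for p in products + [ALL]), sep="\t")
```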


Slice and Dice Queries

• Slice and Dice: select and project on one or more dimensions

[Figure: a cube with dimensions product, customers and store; the slice for customer = “Kalam” is highlighted.]
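Slicing fixes one dimension to a single value and drops it; dicing restricts several dimensions to sets of values. A minimal Python sketch over a dictionary-based cube; the products, stores, the customer "Ravi" and all figures are hypothetical, and `slice_cube` and `dice` are illustrative names.

```python
# Hypothetical 3-D cube: (product, customer, store) -> sales.
CUBE = {
    ("DVD", "Kalam", "S1"): 5,
    ("DVD", "Kalam", "S2"): 3,
    ("PC", "Kalam", "S1"): 1,
    ("DVD", "Ravi", "S1"): 7,
    ("PC", "Ravi", "S2"): 2,
}


def slice_cube(cube, axis, value):
    """Slice: keep cells whose coordinate on `axis` equals `value`, then drop that axis."""
    return {key[:axis] + key[axis + 1:]: v
            for key, v in cube.items() if key[axis] == value}


def dice(cube, allowed):
    """Dice: keep cells whose coordinate on each axis is in the allowed set (None = any)."""
    return {key: v for key, v in cube.items()
            if all(a is None or k in a for k, a in zip(key, allowed))}


kalam = slice_cube(CUBE, 1, "Kalam")  # axis 1 is the customer dimension
print(kalam)  # {('DVD', 'S1'): 5, ('DVD', 'S2'): 3, ('PC', 'S1'): 1}
print(dice(CUBE, [{"DVD"}, None, {"S1"}]))  # {('DVD', 'Kalam', 'S1'): 5, ('DVD', 'Ravi', 'S1'): 7}
```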


Roadmap

1. What is the data warehouse, data mart
2. Multi-dimensional data modeling
3. Data warehouse design – schemas, indices
4. The Data Cube operator – semantics and computation
5. Aggregate View Selection


The Data Cube Operator (Gray et al.)

• All previous aggregates in a single query:

SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store

or

CUBE product_key, store BY SUM(SALES.amount)

Challenge: Optimize Aggregate Computation
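One common way to attack this challenge is to compute each cuboid from an already computed finer cuboid rather than rescanning the base data, and to derive the apex from the smaller 1-D cuboid. This only works when the aggregate is distributive (SUM, COUNT, MIN, MAX); holistic aggregates such as MEDIAN cannot be combined from partial results this way. A minimal Python sketch on the store/product totals used in these slides; `aggregate_away` is a hypothetical helper.

```python
from collections import defaultdict

# Base cuboid (store, product_key) -> SUM(amount), from the running example.
BASE = {(1, 1): 454, (1, 4): 925, (2, 1): 468, (2, 2): 800, (3, 1): 296,
        (3, 3): 240, (4, 1): 652, (4, 3): 540, (4, 4): 745}


def aggregate_away(cuboid, drop):
    """Derive a coarser cuboid by summing out the coordinate at position `drop`."""
    out = defaultdict(int)
    for key, value in cuboid.items():
        out[key[:drop] + key[drop + 1:]] += value
    return dict(out)


by_store = aggregate_away(BASE, 1)    # 9 base cells -> 4 store totals
by_product = aggregate_away(BASE, 0)  # 9 base cells -> 4 product totals
# The apex is computed from a 4-cell parent instead of the 9-cell base cuboid.
apex = aggregate_away(by_product, 0)
print(by_store)  # {(1,): 1379, (2,): 1268, (3,): 536, (4,): 1937}
print(apex)      # {(): 5120}
```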


Relational View of Data Cube

Store  Product_key  sum(amount)
1      1            454
1      4            925
2      1            468
2      2            800
3      1            296
3      3            240
4      1            652
4      3            540
4      4            745
1      ALL          1379
2      ALL          1268
3      ALL          536
4      ALL          1937
ALL    1            1870
ALL    2            800
ALL    3            780
ALL    4            1670
ALL    ALL          5120

Sales
                 Product
Store      1      2      3      4    ALL
1        454      -      -    925   1379
2        468    800      -      -   1268
3        296      -    240      -    536
4        652      -    540    745   1937
ALL     1870    800    780   1670   5120

SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store
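Semantically, CUBE BY over n columns is the union of the 2^n GROUP BY queries over every subset of those columns, with ALL filling the columns that were aggregated away. (Standard SQL adopted the operator as GROUP BY CUBE(...) in SQL:1999, where NULL plays the role of ALL.) A minimal Python sketch of these semantics, assuming the input rows are the (store, product_key, amount) totals shown above; `cube_by` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

# (store, product_key, amount): the store/product totals used in these slides.
ROWS = [(1, 1, 454), (1, 4, 925), (2, 1, 468), (2, 2, 800), (3, 1, 296),
        (3, 3, 240), (4, 1, 652), (4, 3, 540), (4, 4, 745)]


def cube_by(rows, n_dims):
    """Return {key: SUM(measure)} for every GROUP BY over a subset of the first n_dims columns.

    Columns outside the grouping subset are reported as ALL; the last column is the measure.
    """
    result = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            # One GROUP BY over the dimensions listed in `kept`.
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                result[key] += row[n_dims]
    return dict(result)


cube = cube_by(ROWS, 2)
print(len(cube))         # 18 = 9 + 4 + 4 + 1 rows, as in the relational view
print(cube[(4, ALL)])    # 1937
print(cube[(ALL, ALL)])  # 5120
```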


Data Cube: Multidimensional View

[Figure: a 3-D sales cube with dimensions Product (DVD, VCR, PC, sum), Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum) and Region (America, Europe, Asia, sum); the highlighted cell shows the total annual sales of DVDs in America.]
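Each "sum" position along an axis of the cube is an aggregate over that dimension. A minimal Python sketch with hypothetical quarterly figures, where passing `None` for a dimension plays the role of its "sum" position; `total` is an illustrative helper.

```python
# Hypothetical sales: (product, quarter, region) -> amount.
SALES = {
    ("DVD", "1Qtr", "America"): 10, ("DVD", "2Qtr", "America"): 12,
    ("DVD", "3Qtr", "America"): 9, ("DVD", "4Qtr", "America"): 15,
    ("DVD", "1Qtr", "Europe"): 7, ("PC", "1Qtr", "America"): 20,
}


def total(sales, product=None, quarter=None, region=None):
    """Sum the cells matching the given coordinates; None stands for the 'sum' position."""
    want = (product, quarter, region)
    return sum(v for key, v in sales.items()
               if all(w is None or k == w for k, w in zip(key, want)))


print(total(SALES, product="DVD", region="America"))  # 46: summed over all four quarters
print(total(SALES, quarter="1Qtr"))                   # 37: all products, all regions
```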