
Post on 03-Jan-2016


Roadmap

1. What is the data warehouse, data mart
2. Multi-dimensional data modeling
3. Data warehouse design: schemas, indices
4. The Data Cube operator: semantics and computation
5. Aggregate View Selection

Why Not Use an Existing DB?

• DBMS is for On-Line Transaction Processing (OLTP)
– automate day-to-day operations (purchasing, banking, etc.)

• Data warehouse is for On-Line Analytical Processing (OLAP)
– need historical data for trend analysis

OLTP vs. OLAP

                    OLTP                               OLAP
users               clerk, IT professional             knowledge worker
function            day-to-day operations              decision support
DB design           application-oriented               subject-oriented
data                current, up-to-date;               historical;
                    detailed, flat relational;         summarized, multidimensional;
                    isolated                           integrated, consolidated
usage               repetitive                         ad hoc
access              read/write;                        lots of scans
                    index/hash on primary key
unit of work        short, simple transaction          complex query
# records accessed  tens                               millions
# users             thousands                          hundreds
DB size             100 MB to GB                       100 GB to TB
metric              transaction throughput             query throughput, response time

Examples of OLAP

• Comparisons (this period vs. last period)
– Show me the sales per store for this year and compare it to that of the previous year to identify discrepancies

• Ranking and statistical profiles (top N/bottom N)
– Show me sales, profit and average call volume per day for my 10 most profitable salespeople

• Custom consolidation (market segments, ad hoc groups)
– Show me an abbreviated income statement by quarter for the last four quarters for my northeast region operations
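The first two query types can be sketched in a few lines of Python. This is only an illustration: the store names, years and sales figures are hypothetical, and the helper names `compare` and `top_n` are invented here, not part of any OLAP product.

```python
# Hypothetical yearly sales per store, for illustration only.
sales = {
    2014: {"store A": 1000, "store B": 800, "store C": 650},
    2015: {"store A": 1100, "store B": 600, "store C": 700},
}

def compare(sales, this_year, last_year):
    """Comparison: change in sales per store between two periods."""
    return {
        store: sales[this_year][store] - sales[last_year].get(store, 0)
        for store in sales[this_year]
    }

def top_n(values, n):
    """Ranking: the n entries with the largest value."""
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:n]

delta = compare(sales, 2015, 2014)   # per-store difference, 2015 minus 2014
best = top_n(sales[2015], 2)         # two best stores in 2015

print(delta)
print(best)
```

Both queries read every store's history rather than a single record, which is the access pattern the OLAP column of the comparison table describes.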

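Multidimensional Modeling

• Example: compute total sales volume per product and store

Relational view:

Store  Product  Total Sales
1      1        454
1      4        925
2      1        468
2      2        800
etc.

Two-dimensional view (Total Sales, rows = Store, columns = Product):

            Product
Store       1      2      3      4
1         454      -      -    925
2         468    800      -      -
3         296      -    240      -
4         652      -    540    745

[Figure: array with axes Product and Store; the highlighted cell holds the value 800.]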
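The step from the relational rows to the two-dimensional view can be sketched as follows. This is a minimal Python sketch using the figures from the example tables above; the variable names are chosen here for illustration.

```python
# Relational rows (store, product, total_sales) from the example tables.
rows = [
    (1, 1, 454), (1, 4, 925),
    (2, 1, 468), (2, 2, 800),
    (3, 1, 296), (3, 3, 240),
    (4, 1, 652), (4, 3, 540), (4, 4, 745),
]

stores = sorted({s for s, _, _ in rows})
products = sorted({p for _, p, _ in rows})

# Two-dimensional view: one row per store, one column per product.
# None marks a (store, product) combination with no sales ("-" in the table).
index = {(s, p): v for s, p, v in rows}
grid = [[index.get((s, p)) for p in products] for s in stores]

for s, line in zip(stores, grid):
    print(s, ["-" if v is None else v for v in line])
```

The relational form stores only the combinations that occur, while the array form reserves a cell for every combination, which is why sparse cubes are often kept in relational form.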

From Tables and Spreadsheets to Data Cubes

• In general, a multidimensional data model views data in the form of a data cube

• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions

– Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year)

– Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables

• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.
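How a fact table refers to its dimension tables can be sketched as follows. Only the attribute names (item_name, brand, type, day, week, month, quarter, year, dollars_sold) come from the bullets above; all rows, keys and values in this Python sketch are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows, for illustration only.
item_dim = {  # item_key -> (item_name, brand, type)
    1: ("Player X", "Acme", "DVD"),
    2: ("Player Y", "Acme", "VCR"),
    3: ("Laptop Z", "Bolt", "PC"),
}
time_dim = {  # time_key -> (day, week, month, quarter, year)
    10: (3, 1, 1, "Q1", 2015),
    11: (9, 32, 8, "Q3", 2015),
}
# Fact rows: (item_key, time_key, dollars_sold)
sales_fact = [(1, 10, 200), (2, 10, 150), (3, 11, 900), (1, 11, 250)]

# Follow the fact table's keys into the dimension tables, then aggregate
# the measure by dimension attributes (here: brand and quarter).
dollars_by_brand_quarter = defaultdict(int)
for item_key, time_key, dollars_sold in sales_fact:
    brand = item_dim[item_key][1]
    quarter = time_dim[time_key][3]
    dollars_by_brand_quarter[(brand, quarter)] += dollars_sold

print(dict(dollars_by_brand_quarter))
```

The fact table holds only keys and measures; descriptive attributes live once in the dimension tables and are looked up at query time.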

Cube: A Lattice of Cuboids

0-D (apex) cuboid: all

1-D cuboids: (time), (item), (location), (supplier)

2-D cuboids: (time, item), (time, location), (time, supplier), (item, location), (item, supplier), (location, supplier)

3-D cuboids: (time, item, location), (time, item, supplier), (time, location, supplier), (item, location, supplier)

4-D (base) cuboid: (time, item, location, supplier)
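The lattice can be enumerated mechanically: each cuboid corresponds to one subset of the dimensions, so n dimensions without hierarchies give 2^n cuboids. A minimal Python sketch, with variable names chosen here for illustration:

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

# Level k of the lattice holds every k-dimensional cuboid, i.e. every
# way of choosing k of the n dimensions to group by.
lattice = {
    k: list(combinations(dimensions, k))
    for k in range(len(dimensions) + 1)
}

for k, cuboids in lattice.items():
    labels = [", ".join(c) if c else "all" for c in cuboids]
    print(f"{k}-D: {labels}")

total = sum(len(cuboids) for cuboids in lattice.values())
print("number of cuboids:", total)
```

For the four dimensions above this yields 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the five levels of the lattice.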

Dimensions and Hierarchies

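Hierarchies (from most general to most detailed level):

PRODUCT:  category > product
LOCATION: region > country > state > city > store
TIME:     year > quarter > month > day, with week as an alternative level above day

[Figure: cube with dimensions product, city and month; the highlighted cell (DVD, Hyd, August) holds the sales of DVDs in Hyd in August.]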

• A cell in the cube may store values (measurements) relative to the combination of the labeled dimensions

Common OLAP Operations

• Roll-up: move up the hierarchy
– e.g., given total sales per city, we can roll up to get sales per state

• Drill-down: move down the hierarchy
– more fine-grained aggregation

[Figure: the PRODUCT, LOCATION and TIME hierarchies from "Dimensions and Hierarchies".]
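Roll-up along one step of the LOCATION hierarchy (city to state) can be sketched as below. The cities, states and sales figures are hypothetical, and `roll_up` is a helper name invented for this sketch; it assumes the measure is a SUM.

```python
from collections import defaultdict

# Hypothetical data for illustration only: total sales per city, and the
# city -> state step of the LOCATION hierarchy.
sales_per_city = {"Hyderabad": 120, "Warangal": 30, "Chennai": 75}
city_to_state = {
    "Hyderabad": "Telangana",
    "Warangal": "Telangana",
    "Chennai": "Tamil Nadu",
}

def roll_up(measures, parent_of):
    """Aggregate a SUM measure one level up a hierarchy."""
    totals = defaultdict(int)
    for member, value in measures.items():
        totals[parent_of[member]] += value
    return dict(totals)

sales_per_state = roll_up(sales_per_city, city_to_state)
print(sales_per_state)
```

Drill-down cannot be computed this way: the per-state totals do not determine the per-city values, so it has to go back to more detailed data.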

Pivoting

• Pivoting: aggregate on selected dimensions
– usually 2 dims (cross-tabulation)

Sales (rows = Store, columns = Product):

            Product
Store       1      2      3      4    ALL
1         454      -      -    925   1379
2         468    800      -      -   1268
3         296      -    240      -    536
4         652      -    540    745   1937
ALL      1870    800    780   1670   5120
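The ALL row and column of the cross-tabulation are plain sums over the other dimension. The following Python sketch rebuilds them from the nine non-empty cells of the example; the helper name `fmt` and the layout are choices made here.

```python
# (store, product) -> sales, the non-empty cells of the cross-tab.
cells = {
    (1, 1): 454, (1, 4): 925,
    (2, 1): 468, (2, 2): 800,
    (3, 1): 296, (3, 3): 240,
    (4, 1): 652, (4, 3): 540, (4, 4): 745,
}
stores = sorted({s for s, _ in cells})
products = sorted({p for _, p in cells})

# Row and column margins ("ALL") sum over the other dimension.
row_total = {s: sum(v for (s2, _), v in cells.items() if s2 == s) for s in stores}
col_total = {p: sum(v for (_, p2), v in cells.items() if p2 == p) for p in products}
grand_total = sum(cells.values())

def fmt(v):
    """Right-align a cell in a 6-character column; empty cells print as '-'."""
    return str("-" if v is None else v).rjust(6)

print(" " * 6 + "".join(fmt(p) for p in products) + fmt("ALL"))
for s in stores:
    print(fmt(s) + "".join(fmt(cells.get((s, p))) for p in products) + fmt(row_total[s]))
print(fmt("ALL") + "".join(fmt(col_total[p]) for p in products) + fmt(grand_total))
```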

Slice and Dice Queries

• Slice and Dice: select and project on one or more dimensions

[Figure: cube with dimensions product, customers and store; the slice customer = "Kalam" is selected.]
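A slice such as customer = "Kalam" amounts to a selection on one dimension followed by a projection that drops it. The sketch below assumes the common reading in which a slice fixes one dimension and a dice restricts several; the fact rows and the helper names `slice_cube` and `dice` are hypothetical.

```python
# Hypothetical fact rows for illustration: (product, customer, store, amount).
facts = [
    ("DVD", "Kalam", 1, 40),
    ("PC", "Kalam", 2, 900),
    ("DVD", "Rao", 1, 35),
    ("VCR", "Rao", 3, 120),
]

def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value and drop it (select + project)."""
    return [r[:dim] + r[dim + 1:] for r in rows if r[dim] == value]

def dice(rows, allowed):
    """Dice: keep rows whose value is in the allowed set for each listed dimension."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]

CUSTOMER = 1  # position of the customer dimension in each row
kalam = slice_cube(facts, CUSTOMER, "Kalam")
sub_cube = dice(facts, {0: {"DVD", "VCR"}, 2: {1, 3}})

print(kalam)
print(sub_cube)
```

The slice returns a cube with one dimension fewer (product, store), while the dice keeps all dimensions and only narrows their ranges.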


The Data Cube Operator (Gray et al.)

• All previous aggregates in a single query:

SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store

OR

CUBE product_key, store BY SUM(SALES.amount)

Challenge: Optimize Aggregate Computation

Store  Product_key  sum(amount)
1      1            454
1      4            925
2      1            468
2      2            800
3      1            296
3      3            240
4      1            652
4      3            540
4      4            745
1      ALL          1379
2      ALL          1268
3      ALL          536
4      ALL          1937
ALL    1            1870
ALL    2            800
ALL    3            780
ALL    4            1670
ALL    ALL          5120
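The semantics of the result can be sketched as the union of all 2^n group-bys, with ALL standing in for each dimension that is aggregated away. The Python below is a deliberately naive illustration, not the algorithm of Gray et al. or of any DBMS; the function name `cube` is invented here and the base cells are taken from the cross-tab.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

def cube(rows, n_dims):
    """Return {group_key: SUM(measure)} for all 2**n_dims group-bys.

    Each row is a tuple (d1, ..., dn, measure); a dimension that is
    aggregated away appears as ALL in the key.
    """
    result = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                key = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                result[key] += row[-1]
    return dict(result)

# Base cells: (store, product_key, amount)
base = [
    (1, 1, 454), (1, 4, 925),
    (2, 1, 468), (2, 2, 800),
    (3, 1, 296), (3, 3, 240),
    (4, 1, 652), (4, 3, 540), (4, 4, 745),
]

result = cube(base, 2)
for key in sorted(result, key=lambda t: [str(x) for x in t]):
    print(key, result[key])
```

This version scans the base rows once per group-by, that is 2^n times. Avoiding that repeated work, for instance by deriving the (store, ALL) rows from the already computed (store, product_key) rows, is one way to read the optimization challenge named above.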

Relational View of Data Cube

Sales (rows = Store, columns = Product):

            Product
Store       1      2      3      4    ALL
1         454      -      -    925   1379
2         468    800      -      -   1268
3         296      -    240      -    536
4         652      -    540    745   1937
ALL      1870    800    780   1670   5120

SELECT LOCATION.store, SALES.product_key, SUM(amount)
FROM SALES, LOCATION
WHERE SALES.location_key = LOCATION.location_key
CUBE BY SALES.product_key, LOCATION.store

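Data Cube: Multidimensional View

[Figure: three-dimensional sales cube with dimensions Quarter (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (DVD, VCR, PC, sum) and Region (America, Europe, Asia, sum). The highlighted cell is the total annual sales of DVDs in America.]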
