cs437 lecture 10-12

18
Lecture: OLAP Implementation Techniques

Upload: aneebkhawar

Post on 16-Feb-2017


Category: Data & Analytics





Aggregation in MOLAP

• Sales volume as a function of (i) product, (ii) time, and (iii) geography.

• A cube structure is created to handle this.

[Figure: sales cube with dimensions Product, Geography, and Time. Cells hold weekly sales (weeks w1 to w6) for products such as Milk, Bread, Eggs, Butter, Jam, and Juice. Hierarchical summarization paths: Product: Industry → Category → Product; Geography: Province → Division → District → City → Zone; Time: Year → Quarter → Month → Day, plus Week → Day.]

• Drill down: get more detail, e.g., given summarized sales as above, find the breakup of sales by city within each region, or within Sindh.

• Roll up: summarize data, e.g., given sales data, summarize sales for last year by product category and region.

• Slice and dice: select and project, e.g., sales of soft drinks in Karachi during the last quarter.

• Pivot: change the view of the data.

Cube Operations
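The four operations can be illustrated on a tiny cube held as a Python dictionary. This is a minimal sketch, not any OLAP product's API. The cities, products, quarters, sales figures, and city-to-province mapping are all hypothetical, and the function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical base cells: (city, product, quarter) -> sales
sales = {
    ("Karachi", "Soft drink", "Q3"): 40,
    ("Karachi", "Soft drink", "Q4"): 55,
    ("Hyderabad", "Soft drink", "Q4"): 20,
    ("Lahore", "Juice", "Q4"): 30,
}
# Hypothetical city -> province mapping (one step of the Geography hierarchy)
PROVINCE = {"Karachi": "Sindh", "Hyderabad": "Sindh", "Lahore": "Punjab"}

def roll_up(cells):
    """Summarize city-level cells to province level."""
    out = defaultdict(int)
    for (city, product, quarter), value in cells.items():
        out[(PROVINCE[city], product, quarter)] += value
    return dict(out)

def drill_down(cells, province):
    """Return the city-level cells that make up one province's summary."""
    return {k: v for k, v in cells.items() if PROVINCE[k[0]] == province}

def slice_dice(cells, city, product, quarter):
    """Select the cell for one combination of members (0 if it is empty)."""
    return cells.get((city, product, quarter), 0)

def pivot(table):
    """Swap the row and column axes of a 2-D view {(row, col): value}."""
    return {(col, row): value for (row, col), value in table.items()}
```

For example, `roll_up(sales)` gives 75 for soft drinks in Sindh in Q4 (55 + 20). `slice_dice(sales, "Karachi", "Soft drink", "Q4")` answers the "soft drinks in Karachi last quarter" style of query.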

Querying the Cube

[Figure: drill-down and roll-up on sales charts. Panels show (a) yearly sales of Juices and Soda Drinks for 2001 and 2002; (b) quarterly (Q1 to Q4) sales of Juices and Soda Drinks for 2001 and 2002; (c) quarterly sales per product (OJ, RK, 8UP, PK, MJ, BU, AJ) for 2001 and 2002. Arrows labelled Drill-Down and Roll-Up connect the panels.]

Querying the Cube (Pivoting)

[Figure: pivoting. Left: yearly sales of Juices and Soda Drinks for 2001 and 2002. Right: sales per product (Orange juice, Mango juice, Apple juice, Rola-Kola, 8-UP, Bubbly-UP, Pola-Kola) for 2001 and 2002.]

• There is no standard query language for querying MOLAP: no SQL!

• Vendors provide proprietary languages that allow business users to create queries involving pivots, drill-downs, or roll-ups, e.g., Microsoft's MDX.

  - These languages generally offer extensive visual (click-and-drag) support.

  - Application programming interfaces (APIs) are also provided for probing the cubes.

MOLAP Implementations

Both maintenance and storage implications must be considered when designing the strategy for when to build cubes.

• Maintenance considerations: every data item received into the MDD (multidimensional database) must be aggregated into every cube (assuming “to-date” summaries are maintained).

• Storage considerations: although cubes get much smaller (i.e., denser) as dimensions get less detailed (e.g., year vs. day), the storage implications of building hundreds of cubes can be significant.
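To see where "hundreds of cubes" comes from, here is a small Python sketch. It illustrates the standard counting argument rather than the lecture's own method. Each dimension can be summarized at any level of its hierarchy or rolled up to ALL, so the number of cubes is the product of (levels + 1). The level lists are read off the hierarchical summarization paths in the earlier figure. The member counts used for cube size (6 products, 10 zones) are hypothetical.

```python
from math import prod

# Levels taken from the example hierarchies (ALL is added by the formula)
HIERARCHY_LEVELS = {
    "Product": ["Industry", "Category", "Product"],
    "Geography": ["Province", "Division", "District", "City", "Zone"],
    "Time": ["Year", "Quarter", "Month", "Week", "Day"],
}

def count_cubes(hierarchies):
    """Each dimension contributes (its levels + 1 for ALL) choices."""
    return prod(len(levels) + 1 for levels in hierarchies.values())

def cube_cells(*cardinalities):
    """Cells in one dense cube = product of member counts at the chosen levels."""
    return prod(cardinalities)

n_cubes = count_cubes(HIERARCHY_LEVELS)   # 4 * 6 * 6 = 144
daily = cube_cells(6, 10, 365)            # hypothetical: 6 products x 10 zones x 365 days
yearly = cube_cells(6, 10, 1)             # same products and zones, 1 year
```

With to-date summaries maintained, each incoming record updates one cell in every one of these cubes, which is the maintenance cost described above. Moving from day to year shrinks a cube from 21,900 to 60 cells, but there are still 144 cubes to store.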

MOLAP Implementations

Virtual cubes are used when there is a need to join information from two dissimilar cubes that share one or more common dimensions.

• Similar to a relational view: two (or more) cubes are linked along the common dimension(s).

• Often used to save space by eliminating redundant storage of information.

Example: build a list-price cube and, through a virtual cube, use it to compute the discounts given across the many stores of a retail chain without storing the list-price data redundantly.
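The list-price example can be sketched as follows. Two physical "cubes" (plain dicts) share the Product dimension, and a virtual cube joins them at query time instead of copying list prices into every store's cells. This is an illustration under assumptions: the store names, prices, and the `VirtualDiscountCube` class are hypothetical and do not model any vendor's virtual-cube feature.

```python
# Hypothetical physical cube 1: list price, dimension Product only
list_price = {"Milk": 100.0, "Bread": 60.0}

# Hypothetical physical cube 2: actual selling price, dimensions Store x Product
sold_price = {
    ("Karachi-1", "Milk"): 90.0,
    ("Karachi-1", "Bread"): 60.0,
    ("Lahore-2", "Milk"): 95.0,
}

class VirtualDiscountCube:
    """Derives discount cells by linking the two cubes on Product.

    Nothing is materialized: each list price is stored once and looked up
    whenever a (store, product) discount cell is requested.
    """

    def __init__(self, prices, sales):
        self.prices = prices
        self.sales = sales

    def __getitem__(self, key):
        store, product = key
        return self.prices[product] - self.sales[(store, product)]

    def keys(self):
        return self.sales.keys()

discount = VirtualDiscountCube(list_price, sold_price)
```

Here `discount[("Karachi-1", "Milk")]` evaluates to 10.0. The design choice mirrors a relational view: the derived cube costs a lookup per query rather than extra storage.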

Virtual Cubes

• MOLAP implementations typically outperform relational database technology because all answers are pre-computed into cubes.

• They are difficult to scale because of the combinatorial explosion in the number and size of cubes when dimensions of significant cardinality are required.

• Beyond tens (sometimes small hundreds) of thousands of entries in a single dimension, the MOLAP model breaks down, because the pre-computed cube model does not work well when the cubes are very sparsely populated.

See www.olapreport.com/DataExplosion.htm
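The sparsity problem can be quantified with simple arithmetic. The sketch below uses hypothetical sizes (100,000 customers, 365 days, 1,000 products, one million facts) and assumes a dense array storing one 8-byte value per cell. It computes how few cells of a dense MOLAP array would actually be populated.

```python
from math import prod

def cube_density(n_facts, *cardinalities):
    """Fraction of cells in a dense cube that actually hold a value."""
    return n_facts / prod(cardinalities)

# Hypothetical sizes: 100,000 customers x 365 days x 1,000 products
cells = prod((100_000, 365, 1_000))                     # 36,500,000,000 cells
density = cube_density(1_000_000, 100_000, 365, 1_000)  # about 2.7e-05
dense_bytes = cells * 8                                 # 292,000,000,000 bytes if dense
```

Fewer than three cells in 100,000 would hold data, yet the dense array would need roughly 292 GB. This is why a single high-cardinality dimension breaks the pre-computed cube model.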

MOLAP Implementations

• What is ROLAP?

• Why ROLAP?

• How to implement ROLAP

ROLAP

• Advances in database technologies and front-end tools have begun to allow deployment of OLAP using ANSI SQL RDBMS implementations.

• ROLAP facilitates deployment of much larger dimension tables than MOLAP implementations.

• Front-end tools provide GUI access to multi-dimensional analysis capabilities.

• Aggregate awareness allows some front-end tools to exploit pre-built summary tables.

• Star schema designs are often used to facilitate OLAP against relational databases.
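Aggregate awareness can be sketched with Python's built-in `sqlite3`. A query is routed to a pre-built summary table whenever that table's grouping covers the requested columns; otherwise it falls back to the fact table. This is a hypothetical illustration of the idea, not how any particular front-end tool implements it. The table names, the `SUMMARIES` catalogue, and `aggregate_aware_sum` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (month_id TEXT, product_id TEXT, sales_amt REAL);
INSERT INTO sales VALUES ('M1', 'P1', 250), ('M2', 'P2', 500), ('M1', 'P2', 100);
-- Pre-built summary table, grouped by month only
CREATE TABLE sales_by_month AS
  SELECT month_id, SUM(sales_amt) AS sales_amt FROM sales GROUP BY month_id;
""")

# Hypothetical catalogue: summary table name -> columns it is grouped by
SUMMARIES = {"sales_by_month": {"month_id"}}

def aggregate_aware_sum(group_cols):
    """Return (source_table, rows) for SUM(sales_amt) GROUP BY group_cols.

    Uses a summary table when its grouping covers the requested columns,
    otherwise scans the fact table.
    """
    cols = ", ".join(group_cols)
    source = "sales"
    for table, grain in SUMMARIES.items():
        if set(group_cols) <= grain:
            source = table  # re-aggregating the smaller table gives the same sums
            break
    rows = conn.execute(
        f"SELECT {cols}, SUM(sales_amt) FROM {source} GROUP BY {cols} ORDER BY {cols}"
    ).fetchall()
    return source, rows
```

A per-month total is answered from `sales_by_month`, while a per-product total has to scan `sales`, because the summary has already aggregated products away.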

ROLAP Implementations

Data Cube Schema (a multidimensional array of summaries)

Tables:
SALES (fact table): Store ID, Time ID, Product ID, Customer ID, Unit Sales, Store Cost, Store Sales
STORE: Store ID, Store Name, Store City, Store State, Store Country
TIME: Time ID, Month, Quarter, Year
PRODUCT: Product ID, Product Class ID, Brand Name
PRODUCT CLASS: Product Class ID, Product Subcategory, Product Category
CUSTOMER: Customer ID, Last Name, City, State, Country

Dimension hierarchies:
Time: Month → Quarter → Year → (all)
Store: Name → City → State → Country → (all)
Product: Brand Name → Subcategory → Category → (all)
Customer: Last Name → City → State → Country → (all)
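Rolling up along one of these hierarchies in ROLAP amounts to joining the fact table to its dimension table and grouping by the chosen level's column. The sketch below does this for the Store hierarchy using `sqlite3`. The snake_case table and column names are adapted from the schema above, and the stores and sales figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_name TEXT,
                    store_city TEXT, store_state TEXT, store_country TEXT);
CREATE TABLE sales (store_id INTEGER REFERENCES store(store_id),
                    time_id INTEGER, store_sales REAL);
INSERT INTO store VALUES
  (1, 'S1', 'Karachi',   'Sindh',  'Pakistan'),
  (2, 'S2', 'Hyderabad', 'Sindh',  'Pakistan'),
  (3, 'S3', 'Lahore',    'Punjab', 'Pakistan');
INSERT INTO sales VALUES (1, 1, 100), (2, 1, 50), (3, 1, 70), (1, 2, 30);
""")

# Store hierarchy: Name -> City -> State -> Country -> (all)
LEVEL_COLUMNS = {"name": "store_name", "city": "store_city",
                 "state": "store_state", "country": "store_country"}

def rollup_store_sales(level):
    """Total store_sales at one level of the Store hierarchy."""
    if level == "all":
        return conn.execute("SELECT SUM(store_sales) FROM sales").fetchall()
    col = LEVEL_COLUMNS[level]
    return conn.execute(
        f"SELECT st.{col}, SUM(s.store_sales) FROM sales s "
        f"JOIN store st ON s.store_id = st.store_id "
        f"GROUP BY st.{col} ORDER BY st.{col}"
    ).fetchall()
```

Calling `rollup_store_sales("state")` returns `[('Punjab', 70.0), ('Sindh', 180.0)]`. Each step up the hierarchy is just a different GROUP BY column on the same join.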

• Scalability: MOLAP suffers from the curse of dimensionality.

• ROLAP can deploy significantly larger dimension tables than MOLAP by using secondary storage.

• Aggregate awareness allows some front-end tools to use pre-built summary tables.

• Star schema designs are usually used to facilitate ROLAP querying (covered in the next lecture).

Why ROLAP?

• OLAP data is stored in a relational database (e.g., a star schema).

• The fact table can be visualized as an “unrolled” cube.

• So where is the cube? It’s a matter of perception.

• Visualize the fact table as an elementary cube.

ROLAP as a “Cube”

[Figure: cube with Product and Time axes, shown next to its fact table.]

Fact Table
Month | Product | Zone | Sale (K Rs.)
M1    | P1      | Z1   | 250
M2    | P2      | Z1   | 500
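Treating the fact table as an elementary cube can be made concrete in a few lines of Python. Each fact row becomes one cell addressed by its dimension values, and any combination with no row is simply an empty cell. The two rows are the ones in the fact-table figure above. The dict representation and the `cell` helper are illustrative assumptions.

```python
# Fact rows from the example: (Month, Product, Zone, Sale in K Rs.)
fact_rows = [
    ("M1", "P1", "Z1", 250),
    ("M2", "P2", "Z1", 500),
]

# The elementary cube: one cell per (Month, Product, Zone) present in the table
cube = {(month, product, zone): sale for month, product, zone, sale in fact_rows}

def cell(month, product, zone):
    """Look up a cube cell; combinations with no fact row are empty (None)."""
    return cube.get((month, product, zone))
```

`cell("M2", "P2", "Z1")` returns 500, while `cell("M1", "P2", "Z1")` returns `None`. The "cube" exists only as a way of addressing the rows.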

• A cube is a logical entity containing the values of a certain fact, at a certain aggregation level, at the intersection of a combination of dimensions.

• The following table can be created using three queries:

How to Create Cube in ROLAP?

SUM(Sales_Amt) by Product_ID (rows) and Month_ID (columns)

Product_ID | M1 | M2 | M3 | ALL
P1         |    |    |    |
P2         |    |    |    |
P3         |    |    |    |
Total      |    |    |    |

• For the table entries, without the totals:

    SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Month_Id, S.Product_Id;

• For the row totals:

    SELECT S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Product_Id;

• For the column totals:

    SELECT S.Month_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Month_Id;

How to Create Cube in ROLAP using SQL?
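The three queries can be run against an in-memory SQLite table to fill in the Product × Month grid, as sketched below. The sales rows are hypothetical. The bottom-right grand-total cell (Total, ALL) needs one more ungrouped query, which is what makes the count 2^n for n dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt REAL);
INSERT INTO Sales VALUES
  ('M1', 'P1', 10), ('M2', 'P1', 20), ('M1', 'P2', 5), ('M3', 'P3', 7), ('M1', 'P1', 3);
""")

# Query 1: table entries, one cell per (Month_Id, Product_Id)
cells = conn.execute("""SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
                        FROM Sales S GROUP BY S.Month_Id, S.Product_Id""").fetchall()
# Query 2: row totals (the ALL column)
row_totals = conn.execute("""SELECT S.Product_Id, SUM(S.Sales_Amt)
                             FROM Sales S GROUP BY S.Product_Id""").fetchall()
# Query 3: column totals (the Total row)
col_totals = conn.execute("""SELECT S.Month_Id, SUM(S.Sales_Amt)
                             FROM Sales S GROUP BY S.Month_Id""").fetchall()
# Extra query for the grand total (Total row, ALL column)
grand_total = conn.execute("SELECT SUM(S.Sales_Amt) FROM Sales S").fetchone()[0]

# Assemble the grid keyed by (row label, column label)
grid = {(product, month): value for month, product, value in cells}
grid.update({(product, "ALL"): value for product, value in row_totals})
grid.update({("Total", month): value for month, value in col_totals})
grid[("Total", "ALL")] = grand_total
```

Each query scans the whole `Sales` table independently, which is exactly the inefficiency discussed next.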

• The number of required queries increases exponentially with the number of dimensions (one GROUP BY per subset of the n dimensions, i.e., 2^n).

• It’s wasteful to compute all of these queries independently.

• In the example, the first query can do most of the work of the other two queries.

• If we could save that result and aggregate it over Month_Id (for the row totals) or over Product_Id (for the column totals), we could compute the other queries more efficiently.

Problem with Simple Approach
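The reuse idea in the bullets above can be sketched in Python. The saved result of the first query is re-aggregated in memory to produce the row totals, the column totals, and the grand total without touching the base table again. The `base` values are hypothetical stand-ins for that saved result, and the helper names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical saved result of query 1: {(Month_Id, Product_Id): SUM(Sales_Amt)}
base = {("M1", "P1"): 13.0, ("M2", "P1"): 20.0, ("M1", "P2"): 5.0, ("M3", "P3"): 7.0}

def aggregate_over(cells, keep):
    """Re-aggregate saved cells, keeping only the key positions listed in `keep`."""
    out = defaultdict(float)
    for key, value in cells.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

row_totals = aggregate_over(base, keep=(1,))   # aggregate over Month_Id
col_totals = aggregate_over(base, keep=(0,))   # aggregate over Product_Id
grand_total = aggregate_over(base, keep=())    # aggregate over both

def grouping_sets(dims):
    """All 2**n subsets of the dimensions: the GROUP BYs a full cube needs."""
    return [c for r in range(len(dims), -1, -1) for c in combinations(dims, r)]
```

The saved result is usually far smaller than the fact table, so deriving coarser groupings from it is cheaper than rescanning. `grouping_sets` shows how quickly the required groupings grow: 4 for two dimensions and 8 for three.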

• The CUBE clause is part of SQL:1999.

• GROUP BY CUBE (v1, v2, …, vn)

• Equivalent to a collection of GROUP BYs, one for each of the 2^n subsets of v1, v2, …, vn.

Cube Clause in SQL
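The equivalence can be checked by expanding CUBE by hand into a UNION ALL of one GROUP BY per subset, with rolled-up dimensions reported as NULL (the "ALL" value). The sketch uses SQLite only so that it runs self-contained: as far as I know, SQLite does not support GROUP BY CUBE. On a DBMS that implements the SQL:1999 clause, `GROUP BY CUBE (Month_Id, Product_Id)` would produce these groupings in one statement. The helper `cube_as_union` and the sample rows are hypothetical.

```python
import sqlite3
from itertools import combinations

def cube_as_union(table, dims, measure):
    """Expand GROUP BY CUBE(dims) into a UNION ALL of 2**n GROUP BY queries.

    Dimensions that are rolled up in a given subset are output as NULL.
    """
    parts = []
    for r in range(len(dims), -1, -1):
        for subset in combinations(dims, r):
            cols = ", ".join(d if d in subset else f"NULL AS {d}" for d in dims)
            group_by = f" GROUP BY {', '.join(subset)}" if subset else ""
            parts.append(f"SELECT {cols}, SUM({measure}) AS total FROM {table}{group_by}")
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt REAL);
INSERT INTO Sales VALUES
  ('M1', 'P1', 10), ('M2', 'P1', 20), ('M1', 'P2', 5), ('M3', 'P3', 7), ('M1', 'P1', 3);
""")

sql = cube_as_union("Sales", ["Month_Id", "Product_Id"], "Sales_Amt")
rows = conn.execute(sql).fetchall()
```

The result has 11 rows: 4 cells, 3 per-product totals, 3 per-month totals, and the grand total `(None, None, 45.0)`. A real CUBE implementation also offers the GROUPING() function to tell a rolled-up NULL apart from a NULL stored in the data. This sketch does not handle that distinction.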