CS437 Lecture 10-12 (Transcript)
OLAP Implementation Techniques
Aggregation in MOLAP
• Sales volume as a function of (i) product, (ii) time, and (iii) geography.
• A cube structure is created to handle this.
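Dimensions: Product, Geography, Time
Hierarchical summarization paths (finest → coarsest):
Product: Product → Category → Industry
Geography: Zone → City → District → Division → Province
Time: Day → Month → Quarter → Year, with Week as an alternative grouping of Day
[Figure: sample sales cube with Product (Milk, Bread, Eggs, Butter, Jam, Juice), Time (weeks w1 to w6) and Geography (N, E, W, S) axes; visible cell values include 12, 13, 45, 8, 23 and 10.]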
• Drill down: get more details, e.g., given summarized sales as above, find the breakup of sales by city within each region, or within Sindh.
• Roll-up: summarize data, e.g., given sales data, summarize sales for last year by product category and region.
• Slice and dice: select and project, e.g., sales of soft drinks in Karachi during the last quarter.
• Pivot: change the view of the data.
Cube Operations
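The four operations above can be illustrated on a tiny in-memory cube. This is a minimal Python sketch, not how a MOLAP engine stores data: the cube is assumed to be a dict from (product, city, quarter) coordinates to a sales figure, and every name, value and the city-to-province mapping are hypothetical. Drill-down is shown as going back to finer-grained data, since detail cannot be recovered from a summary.

```python
from collections import defaultdict

# Hypothetical base cube at the finest grain: (product, city, quarter) -> sales.
# All names and numbers here are made up for illustration.
DIMS = ("product", "city", "quarter")
BASE = {
    ("Juice", "Karachi", "Q1"): 45,
    ("Juice", "Lahore", "Q1"): 23,
    ("Milk", "Karachi", "Q1"): 12,
    ("Milk", "Hyderabad", "Q2"): 13,
    ("Soda", "Karachi", "Q2"): 8,
    ("Soda", "Lahore", "Q2"): 10,
}
# Hypothetical geography hierarchy: city -> province.
PROVINCE = {"Karachi": "Sindh", "Hyderabad": "Sindh", "Lahore": "Punjab"}


def roll_up(cube, dims, keep, climb=None):
    """Summarize: keep only the dimensions in `keep` and sum the measure.
    `climb` optionally maps a dimension to its parent level, e.g. city -> province."""
    climb = climb or {}
    out = defaultdict(int)
    for key, value in cube.items():
        coords = dict(zip(dims, key))
        for dim, parent in climb.items():
            coords[dim] = parent[coords[dim]]
        out[tuple(coords[d] for d in keep)] += value
    return dict(out)


def dice(cube, dims, **allowed):
    """Select the sub-cube whose coordinates fall in the given value sets."""
    pos = {d: dims.index(d) for d in allowed}
    return {k: v for k, v in cube.items()
            if all(k[pos[d]] in vals for d, vals in allowed.items())}


def slice_cube(cube, dims, dim, value):
    """Fix one dimension to a single value and project that dimension away."""
    i = dims.index(dim)
    sub = {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}
    return sub, dims[:i] + dims[i + 1:]


def pivot(view):
    """Swap the two axes of a 2-D view {(row, col): value}."""
    return {(col, row): v for (row, col), v in view.items()}


# Roll-up: total sales per province.
by_province = roll_up(BASE, DIMS, keep=("city",), climb={"city": PROVINCE})
# Drill-down: break the Sindh total up by city (go back to the finer data).
sindh = dice(BASE, DIMS, city={"Karachi", "Hyderabad"})
sindh_by_city = roll_up(sindh, DIMS, keep=("city",))
# Slice: only Q1, leaving a (product, city) plane.
q1, q1_dims = slice_cube(BASE, DIMS, "quarter", "Q1")
# Pivot: a product x quarter view turned into quarter x product.
prod_qtr = roll_up(BASE, DIMS, keep=("product", "quarter"))
qtr_prod = pivot(prod_qtr)
```

Here roll-up yields 78 for Sindh and 33 for Punjab, and drilling down into Sindh splits its 78 into Karachi (65) and Hyderabad (13).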
Querying the Cube
[Figure: drill-down and roll-up over three charts of sales volume: yearly sales (2001, 2002) by category (Juices, Soda Drinks); quarterly sales (Q1 to Q4 of 2001 and 2002) by category; and quarterly sales by product (OJ, RK, 8UP, PK, MJ, BU, AJ). Drilling down moves from the yearly, category-level view toward the quarterly, product-level view; rolling up moves the other way.]
Querying the Cube (Pivoting)
[Figure: pivoting. The first chart shows sales by year (2001, 2002) for the Juices and Soda Drinks categories; the second re-orients the view to show sales of each product (Orange juice, Mango juice, Apple juice, Rola-Kola, 8-UP, Bubbly-UP, Pola-Kola) for 2001 and 2002.]
• There is no standard query language for querying MOLAP: no SQL!
• Vendors provide proprietary languages that allow business users to create queries involving pivots, drilling down, or rolling up, e.g., MDX from Microsoft.
- These languages generally involve extensive visual (click-and-drag) support.
- Application Programming Interfaces (APIs) are also provided for probing the cubes.
MOLAP Implementations
Both maintenance and storage implications need to be considered when designing a strategy for when to build cubes.
• Maintenance considerations: every data item received into the multidimensional database (MDD) must be aggregated into every cube (assuming “to-date” summaries are maintained).
• Storage considerations: although cubes get much smaller (i.e., denser) as dimensions get less detailed (e.g., year vs. day), the storage needed for building hundreds of cubes can be significant.
MOLAP Implementations
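To see why maintenance is costly, the following minimal Python sketch assumes one to-date summary cube per subset of the dimensions (hierarchy levels are ignored) and shows that each incoming record must update every one of them. The class name, dimension names and records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions of an incoming sales record.
DIMS = ("product", "city", "day")


class SummaryStore:
    """Keeps one to-date summary cube per group-by (subset of DIMS)."""

    def __init__(self, dims):
        self.dims = dims
        # Every subset of the dimensions, from the full cube down to the grand total.
        self.groupings = [g for r in range(len(dims), -1, -1)
                          for g in combinations(dims, r)]
        self.cubes = {g: defaultdict(int) for g in self.groupings}

    def add(self, record, amount):
        """Fold one new record into every summary cube (one update per grouping)."""
        for g in self.groupings:
            key = tuple(record[d] for d in g)
            self.cubes[g][key] += amount


store = SummaryStore(DIMS)
store.add({"product": "Milk", "city": "Karachi", "day": "2002-01-05"}, 12)
store.add({"product": "Milk", "city": "Lahore", "day": "2002-01-05"}, 13)
```

With n dimensions this means 2^n updates per record, and more once each dimension's hierarchy levels are summarized too.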
Virtual cubes are used when information from two dissimilar cubes that share one or more common dimensions needs to be joined.
• Similar to a relational view: two (or more) cubes are linked along the common dimension(s).
• Often used to save space by eliminating redundant storage of information.
Example: build a list-price cube that, through a virtual cube, can be used to compute the discounts given across the many stores of a retail chain without storing the list-price data redundantly.
Virtual Cubes
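The list-price example can be sketched as a view-like object that joins two physical cubes on their shared product dimension only when a cell is requested. This is a minimal Python sketch under assumed data: the cube contents, store names and the VirtualDiscountCube class are hypothetical, and discount is taken to mean list price minus selling price.

```python
# Hypothetical physical cubes; names and prices are made up.
# List prices are stored once, keyed by the product dimension only.
LIST_PRICE = {"Milk": 100.0, "Juice": 150.0}

# Actual selling price per (store, product); shares the product dimension.
SALE_PRICE = {
    ("Store-1", "Milk"): 95.0,
    ("Store-2", "Milk"): 100.0,
    ("Store-1", "Juice"): 120.0,
}


class VirtualDiscountCube:
    """Joins the two cubes on the common product dimension at query time,
    like a relational view: no list price is copied into per-store cells."""

    def __init__(self, sale_price, list_price):
        self.sale_price = sale_price
        self.list_price = list_price

    def __getitem__(self, key):
        store, product = key
        # Discount = list price (looked up once per product) - actual price.
        return self.list_price[product] - self.sale_price[(store, product)]

    def keys(self):
        return self.sale_price.keys()


discounts = VirtualDiscountCube(SALE_PRICE, LIST_PRICE)
```

For instance, discounts[("Store-1", "Juice")] evaluates to 30.0 while the list price of Juice is still stored in a single cell.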
• Typically outperform relational database technology because all answers are pre-computed into cubes.
• Difficult to scale because of the combinatorial explosion in the number and size of cubes when dimensions of significant cardinality are required.
• Beyond tens (sometimes small hundreds) of thousands of entries in a single dimension, the MOLAP model breaks down, because the pre-computed cube model does not work well when the cubes are very sparsely populated.
See www.olapreport.com/DataExplosion.htm
MOLAP Implementations
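A back-of-the-envelope sketch of the sparsity problem: a dense cube needs one cell per combination of dimension values, so its size is the product of the dimension cardinalities, while the number of facts actually recorded grows far more slowly. The cardinalities and fact count below are assumptions, not figures from the lecture.

```python
from math import prod


def dense_cells(cardinalities):
    """Cells a fully pre-allocated (dense) cube needs: product of cardinalities."""
    return prod(cardinalities)


def density(populated, cardinalities):
    """Fraction of cube cells that actually hold a value."""
    return populated / dense_cells(cardinalities)


# Assumed sizes: 100,000 customers x 1,000 products x 365 days,
# with 10 million sales facts actually recorded.
CARDS = (100_000, 1_000, 365)
FACTS = 10_000_000
cells = dense_cells(CARDS)
fill = density(FACTS, CARDS)
```

Under these assumptions the dense cube has 36.5 billion cells, of which fewer than 0.03% are populated.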
• Advances in database technologies and front-end tools have begun to allow deployment of OLAP using ANSI SQL RDBMS implementations.
• ROLAP facilitates deployment of much larger dimension tables than MOLAP implementations.
• Front-end tools provide GUI access to multi-dimensional analysis capabilities.
• Aggregate awareness allows some front-end tools to exploit pre-built summary tables.
• Star schema designs are often used to facilitate OLAP against relational databases.
ROLAP Implementations
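Aggregate awareness can be sketched as a small query router: given the columns a query groups by, pick the smallest pre-built summary table whose grouping columns cover them, else fall back to the fact table. This is a minimal Python sketch under stated assumptions: the table names, columns and row counts are hypothetical, coverage is checked by column name only (it ignores derivations such as month to year), and re-aggregating a summary is only valid for distributive measures such as SUM or COUNT.

```python
# Hypothetical catalogue of pre-built summary tables: name -> grouping columns.
SUMMARIES = {
    "sales_by_month_product_store": {"month", "product", "store"},
    "sales_by_month_product": {"month", "product"},
    "sales_by_year": {"year"},
}
# Hypothetical row counts, used to prefer the smallest usable table.
ROWS = {
    "sales_by_month_product_store": 500_000,
    "sales_by_month_product": 20_000,
    "sales_by_year": 5,
}


def choose_table(group_by, base="sales_fact"):
    """Return the smallest summary table whose grouping columns cover the query,
    falling back to the base fact table when none does."""
    usable = [name for name, cols in SUMMARIES.items() if set(group_by) <= cols]
    return min(usable, key=lambda name: ROWS[name]) if usable else base
```

For example, a query grouped by product alone is routed to sales_by_month_product, while one grouped by customer falls back to sales_fact.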
Data Cube Schema (a multidimensional array of summaries)
SALES (fact table): Store ID, Time ID, Product ID, Customer ID, Unit Sales, Store Cost, Store Sales
STORE: Store ID, Store Name, Store City, Store State, Store Country
TIME: Time ID, Month, Quarter, Year
PRODUCT: Product ID, Product Class ID, Brand Name
PRODUCT CLASS: Product Class ID, Product Category, Product Subcategory
CUSTOMER: Customer ID, Last Name, City, State, Country
Hierarchies:
Time: Month → Quarter → Year → (all)
Store: Name → City → State → Country → (all)
Product: Brand Name → Subcategory → Category → (all)
Customer: Last Name → City → State → Country → (all)
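The hierarchies above determine how many distinct cubes (cuboids) could be materialized. Assuming each hierarchy is a simple chain, each dimension can be taken at any one of its L_i levels or at (all), giving prod(L_i + 1) cuboids in total. A minimal Python sketch of that count for this schema:

```python
from math import prod

# Hierarchy levels per dimension, excluding the virtual "(all)" level,
# read off the hierarchies listed above.
LEVELS = {
    "Time": 3,      # Month, Quarter, Year
    "Store": 4,     # Name, City, State, Country
    "Product": 3,   # Brand Name, Subcategory, Category
    "Customer": 4,  # Last Name, City, State, Country
}


def cuboid_count(levels):
    """Each dimension can be at any of its levels or at (all): prod(L_i + 1)."""
    return prod(n + 1 for n in levels)


with_hierarchies = cuboid_count(LEVELS.values())  # 4 * 5 * 4 * 5
flat = 2 ** len(LEVELS)                           # no hierarchies: each dim present or (all)
```

Four dimensions alone give 16 group-bys, but their hierarchies raise this to 400 possible cubes, which is where the storage and maintenance concerns come from.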
• Scalability: MOLAP suffers from the curse of dimensionality.
• Significantly larger dimension tables can be deployed than with MOLAP, since ROLAP uses secondary storage.
• Aggregate awareness allows some front-end tools to use pre-built summary tables.
• Star schema designs are usually used to facilitate ROLAP querying (covered in the next lecture).
Why ROLAP?
• OLAP data is stored in a relational database (e.g., a star schema).
• The fact table can be visualized as an “un-rolled” cube.
• So where is the cube? It’s a matter of perception.
• Visualize the fact table as an elementary cube.
ROLAP as a “Cube”
[Figure: fact-table rows visualized as cells of a cube with Product and Time axes.]
Fact Table
Month  Product  Zone  Sale (K Rs.)
M1     P1       Z1    250
M2     P2       Z1    500
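Each fact-table row is one cell of the elementary cube. The minimal Python sketch below "re-rolls" the two sample rows above into a dense cube over every (month, product, zone) combination, making visible the empty cells that the relational table simply never stores. It assumes at most one row per coordinate, as at the finest grain; the function name is hypothetical.

```python
import itertools

# The two fact-table rows above as (month, product, zone, sale in K Rs.).
FACTS = [("M1", "P1", "Z1", 250), ("M2", "P2", "Z1", 500)]


def fact_rows_to_cube(facts):
    """Build a dense cube: one cell per (month, product, zone) combination,
    None where the fact table has no row. Assumes one row per coordinate."""
    months = sorted({f[0] for f in facts})
    products = sorted({f[1] for f in facts})
    zones = sorted({f[2] for f in facts})
    cells = {coord: None for coord in itertools.product(months, products, zones)}
    for month, prod_id, zone, sale in facts:
        cells[(month, prod_id, zone)] = sale
    return cells


cube = fact_rows_to_cube(FACTS)
```

Two rows already span four cells, half of them empty; the gap widens quickly as dimensions grow.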
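• A cube is a logical entity containing the values of a certain fact, at a certain aggregation level, at the intersection of a combination of dimensions.
• The following table can be created using 3 queries (the grand total in its bottom-right cell needs a fourth query with no GROUP BY, or can be summed from either set of totals).
How to Create a Cube in ROLAP?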
SUM(Sales_Amt) by Product_ID (rows) and Month_ID (columns):
Product_ID  M1  M2  M3  ALL
P1
P2
P3
Total
• For the table entries, without the totals:
    SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Month_Id, S.Product_Id;
• For the row totals (the ALL column):
    SELECT S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Product_Id;
• For the column totals (the Total row):
    SELECT S.Month_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Month_Id;
How to Create a Cube in ROLAP using SQL?
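The three queries above can be run against an in-memory SQLite database. In this sketch the alias S is declared and Sales_Amt is summed in all three, and a fourth query fills the grand-total cell. The sample rows are hypothetical, chosen only so that every query returns something. Note that each GROUP BY is a separate scan of Sales.

```python
import sqlite3

# Hypothetical sample data for the Sales fact table.
ROWS = [
    ("M1", "P1", 10), ("M1", "P2", 20), ("M2", "P1", 5),
    ("M2", "P3", 15), ("M3", "P2", 30), ("M3", "P3", 25),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
con.executemany("INSERT INTO Sales VALUES (?, ?, ?)", ROWS)

# Table entries: one cell per (month, product) pair that has sales.
cells = con.execute("""
    SELECT S.Month_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S GROUP BY S.Month_Id, S.Product_Id""").fetchall()
# Row totals (the ALL column): one per product.
row_totals = dict(con.execute("""
    SELECT S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S GROUP BY S.Product_Id""").fetchall())
# Column totals (the Total row): one per month.
col_totals = dict(con.execute("""
    SELECT S.Month_Id, SUM(S.Sales_Amt)
    FROM Sales S GROUP BY S.Month_Id""").fetchall())
# Grand total (bottom-right cell): no GROUP BY at all.
grand_total = con.execute("SELECT SUM(Sales_Amt) FROM Sales").fetchone()[0]
```

With this data the row totals are P1 = 15, P2 = 50, P3 = 40 and the column totals M1 = 30, M2 = 20, M3 = 55, both adding up to the grand total of 105.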
• The number of required queries increases exponentially with the number of dimensions (2^n group-bys for n dimensions, even before hierarchy levels are taken into account).
• It's wasteful to compute all the queries from scratch.
• In the example, the first query can do most of the work of the other two queries.
• If we could save that result and aggregate it over Month_Id (giving product totals) and over Product_Id (giving month totals), we could compute the other queries more efficiently.
Problem with Simple Approach
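The idea above can be sketched as follows: compute the finest (Month_Id, Product_Id) aggregate with one scan, then derive every coarser grouping, including the grand total, from that saved result instead of rescanning Sales. This is a minimal Python sketch with hypothetical sample rows; it relies on SUM being distributive (a sum of partial sums) and would not work for a holistic measure such as a median.

```python
import sqlite3
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data for the Sales fact table.
ROWS = [
    ("M1", "P1", 10), ("M1", "P2", 20), ("M2", "P1", 5),
    ("M2", "P3", 15), ("M3", "P2", 30), ("M3", "P3", 25),
]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Month_Id TEXT, Product_Id TEXT, Sales_Amt INTEGER)")
con.executemany("INSERT INTO Sales VALUES (?, ?, ?)", ROWS)

# One scan of the fact table: the finest (Month_Id, Product_Id) aggregate.
base = con.execute("""
    SELECT Month_Id, Product_Id, SUM(Sales_Amt)
    FROM Sales GROUP BY Month_Id, Product_Id""").fetchall()


def group_bys(dims):
    """All 2**n groupings of n dimensions, from the finest down to the grand total."""
    return [g for r in range(len(dims), -1, -1) for g in combinations(dims, r)]


def derive(base_rows, dims, keep):
    """Re-aggregate the saved base result instead of rescanning Sales.
    Valid because SUM is distributive."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for *key, amount in base_rows:
        out[tuple(key[i] for i in idx)] += amount
    return dict(out)


DIMS = ("Month_Id", "Product_Id")
cube = {g: derive(base, DIMS, g) for g in group_bys(DIMS)}
```

The derived product totals and month totals match what the separate GROUP BY queries return, but Sales is scanned only once.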