Achieving Scalability in OLAP Materialized View Selection
Thomas P. Nadeau, Toby J. Teorey
University of Michigan
DOLAP 2002
Topics
• Overview of OLAP
• Exponentiality in View Selection
• Our Polynomial Greedy Algorithm (PGA)
• Test Results
• Conclusions
• Current Work
Example Star Schema
• Sell (Fact Table): CustID, DateID, BindID, Cost
• Calendar: DateID, Month, Quarter, Year
• Customer: CustID, Name, City, State/Prov
• Bind Style: BindID, Desc
Star Schema Viewed with Data
Bind Style

| BindID | Desc |
|---|---|
| PB | Paper Back |
| HC | Hard Cover |

Calendar

| DateID | Month | Quarter | Year |
|---|---|---|---|
| 1/1/98 | Jan | 1 | 1998 |
| 1/2/98 | Jan | 1 | 1998 |
| 12/31/00 | Dec | 4 | 2000 |

Customer

| CustID | Name | City | State/Prov |
|---|---|---|---|
| 00001 | U of M | Ann Arbor | MI |
| 00002 | Smith & Co. | Toronto | Ont |

Sell (Fact Table, Many Rows): CustID, DateID, BindID, Cost
$600
00002 12/31/00 PB $500
$1300
00222 1/1/99 HC $1100
Eight Dimensions of Book Database
| Attribute | Hierarchy Levels |
|---|---|
| Trim Width | 4 |
| Trim Length | 4 |
| Pages | 4 |
| Quantity | 4 |
| Stock Width | 4 |
| Stock Length | 4 |
| Bind Style | 4 |
| Press | 4 |
Combinatorial Explosion
• Possible views = $\prod_{i=1}^{d} \ell_i$, where d = |dimensions| and $\ell_i$ = |levels| in dimension i
• Book database example
– 2 dimensions, $4^2$ = 16 views
– 4 dimensions, $4^4$ = 256 views
– 6 dimensions, $4^6$ = 4,096 views
– 8 dimensions, $4^8$ = 65,536 views
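The view counts on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `count_views` is illustrative and not from the talk.

```python
from math import prod

def count_views(levels_per_dimension):
    """Number of possible views: the product of the level counts of all dimensions."""
    return prod(levels_per_dimension)

# Book database: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(f"{d} dimensions: {count_views([4] * d):,} views")
```

Dimensions with different level counts are handled the same way, for example `count_views([4, 3, 2])`.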
Recap
• Materialized views quicken query responses
• Disk space limits view materialization
• Update window is a constraint
• Solution: Select strategic views
Our OLAP Optimization Approach
[Diagram: data flow among four processes (View Size Estimation, View Selection, View Maintenance, Query Optimization), each marked as Completed Work or Current Work. Labels: Fact Table, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Initial Data, Incremental Data, Update, Current Views, Users, Queries, Quick Responses.]
View Selection: Example of Hypercube Lattice [HRU96]
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Example of HRU Algorithm [HRU96]
Benefits of Possible Materialization Choices (Iteration 1)

| View | Benefit |
|---|---|
| {p, s} | 5.2M x 4 = 20.8M |
| {c, s} | 0 x 4 = 0 |
| {c, p} | 0 x 4 = 0 |
| {s} | 5.99M x 2 = 11.98M |
| {p} | 5.8M x 2 = 11.6M |
| {c} | 5.9M x 2 = 11.8M |
| {} | 6M - 1 |

p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Example of HRU
p = Part
s = Supplier
c = Customer
Benefits of Possible Materialization Choices

| View | Iteration 1 | Iteration 2 |
|---|---|---|
| {p, s} | 5.2M x 4 = 20.8M | (selected) |
| {c, s} | 0 x 4 = 0 | 0 x 4 = 0 |
| {c, p} | 0 x 4 = 0 | 0 x 4 = 0 |
| {s} | 5.99M x 2 = 11.98M | 0.79M x 2 = 1.58M |
| {p} | 5.8M x 2 = 11.6M | 0.6M x 2 = 1.2M |
| {c} | 5.9M x 2 = 11.8M | 5.9M x 2 = 11.8M |
| {} | 6M - 1 | 0.8M - 1 |

{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
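The greedy procedure in this example can be sketched in Python. This is a minimal sketch under stated assumptions: it follows the benefit definition of [HRU96], in which a view is answered from its smallest materialized ancestor. The row counts come from the lattice above, and the names `SIZES`, `query_cost`, `benefit` and `hru_select` are illustrative only.

```python
# Row counts from the example lattice; a view is the frozenset of its dimensions.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table is always materialized

def query_cost(view, materialized):
    """Rows read to answer `view` from its smallest materialized ancestor."""
    return min(SIZES[m] for m in materialized if view <= m)

def benefit(candidate, materialized):
    """Rows saved, summed over the candidate and all of its descendants."""
    return sum(
        max(0, query_cost(w, materialized) - SIZES[candidate])
        for w in SIZES
        if w <= candidate
    )

def hru_select(k):
    """Greedily pick k views, re-evaluating every remaining view each round."""
    materialized = [TOP]
    for _ in range(k):
        remaining = [v for v in SIZES if v not in materialized]
        best = max(remaining, key=lambda v: benefit(v, materialized))
        materialized.append(best)
    return materialized[1:]

print([sorted(v) for v in hru_select(2)])
```

Every round evaluates every remaining view, and each evaluation visits every descendant, which is where the quadratic dependence on the number of views comes from. Under this definition the second-iteration benefit of {c} works out to 5.9M + 0.7M = 6.6M rather than 5.9M x 2, because {} can already be answered from {p, s} at 0.8M; {c} is the greedy choice either way.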
Exponentiality in HRU
• $O(kn^2)$ time, where k = |views to select|, n = |possible views|
• n = $2^d$ in a non-hierarchical database, where d = |dimensions|
• HRU algorithm is $O(k\,2^{2d})$ time
• Two sources of exponentiality
– Each possible view is evaluated
– Each view evaluation considers the effect of materialization on every descendant
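The exponential bound follows from substituting the view count into the general running time. Written out with the symbols defined on this slide:

```latex
\[
O\!\left(k\,n^{2}\right)
  \;=\; O\!\left(k\,(2^{d})^{2}\right)
  \;=\; O\!\left(k\,2^{2d}\right),
\qquad \text{where } n = 2^{d}.
\]
```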
Polynomial Greedy Algorithm (PGA)
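[Activity diagram with two phases, Nomination and Selection]
• Start: select fact table
• Nomination: nominate smallest child view; repeat while [continuing path], move on when [path ended]
• Selection: for each candidate, evaluate benefit while [more candidates]; on [else], select view greedily
• [termination condition met]: stop; [else]: start new path and return to Nomination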
Example of PGA [NT02]
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Example of PGA
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Nomination candidates: {p, s}, {s}, {}
Example of PGA
p = Part
s = Supplier
c = Customer
Iteration 1
Nomination candidates: {p, s}, {s}, {}
Selection benefits:

| Candidate | Benefit |
|---|---|
| {p, s} | 5.2M x 4 = 20.8M |
| {s} | 5.99M x 2 = 11.98M |
| {} | 6M - 1 |

{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Example of PGA
p = Part
s = Supplier
c = Customer
Iteration 1
Nomination candidates: {p, s}, {s}, {}
Selection benefits:

| Candidate | Benefit |
|---|---|
| {p, s} | 5.2M x 4 = 20.8M |
| {s} | 5.99M x 2 = 11.98M |
| {} | 6M - 1 |

Next nomination candidates: {c, s}, {s}, {c}, {}
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Example of PGA
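p = Part
s = Supplier
c = Customer
Iteration 1
Nomination candidates: {p, s}, {s}, {}
Selection benefits:

| Candidate | Benefit |
|---|---|
| {p, s} | 5.2M x 4 = 20.8M |
| {s} | 5.99M x 2 = 11.98M |
| {} | 6M - 1 |

Iteration 2
Nomination candidates: {c, s}, {s}, {c}, {}
Selection benefits:

| Candidate | Benefit |
|---|---|
| {c, s} | 0 x 2 = 0 |
| {s} | 0.79M x 2 = 1.58M |
| {c} | 5.9M x 2 = 11.8M |
| {} | 6M - 1 |
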
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
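The nomination step can be sketched in Python. This is one reading that reproduces the candidate sets shown in the example, not necessarily the authors' exact procedure. It assumes that every path starts at the fact table, that each step moves to the smallest child that is neither selected nor already a candidate, and that ties go to the leftmost child in the lattice drawing. The names `SIZES`, `children` and `nominate_path` are illustrative.

```python
# Row counts from the example lattice, listed in the slide's left-to-right order.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
FACT_TABLE = frozenset("cps")

def children(view):
    """Views obtained by dropping exactly one dimension from `view`."""
    return [v for v in SIZES if v < view and len(v) == len(view) - 1]

def nominate_path(selected, candidates):
    """Add one downward path of nominations to `candidates` and return it.

    Assumption: each step takes the smallest child that is neither selected
    nor already a candidate; min() keeps the first (leftmost) child on ties.
    """
    view = FACT_TABLE
    while True:
        fresh = [v for v in children(view)
                 if v not in selected and v not in candidates]
        if not fresh:                      # [path ended]
            return candidates
        view = min(fresh, key=SIZES.get)   # [continuing path]
        candidates.add(view)

# First nomination: {p, s}, {s}, {}
first = nominate_path({FACT_TABLE}, set())
# After {p, s} is selected, {s} and {} stay candidates and a new path adds {c, s}, {c}.
second = nominate_path({FACT_TABLE, frozenset("ps")}, first - {frozenset("ps")})
print(sorted("".join(sorted(v)) for v in second))
```

Each path visits at most d levels and inspects at most d children per level, which matches the per-path bound quoted for nomination.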
Nomination Complexity
• Maximum swatch width is d.
• Maximum path length is d.
• Finding one path is $O(d^2)$ time
• Our strategy nominates a path each time a view is selected, so nomination takes $O(d^2 k)$ time in total
Evaluating Views in PGA
• Polynomial time evaluation requires approximating materialization benefits
• Account for smallest ancestor
• Account for materialized view with largest overlap in descendants
• Complexity of our algorithm is $O(d^2 k^2)$
Complexities
d = | dimensions |
g = geometric mean of the number of hierarchical levels per dimension
k = | views selected for materialization |
ℓ = | layers in lattice |
| Database Type | HRU | PGA |
|---|---|---|
| Non-Hierarchical | $O(k\,2^{2d})$ time | $O(d^2 k^2)$ time, $O(d^2 k)$ space |
| Hierarchical | $O(k\,g^{2d})$ time | $O(d k^2 \ell)$ time, $O(d k \ell)$ space |
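To get a feel for the gap between the two non-hierarchical time bounds, the dominant terms can be tabulated. This is a rough sketch: it ignores constant factors, so it shows growth rather than running time. The value k = 10 is a hypothetical choice, and the function names are illustrative.

```python
def hru_steps(d, k):
    """Dominant term of the HRU bound on a non-hierarchical lattice: k * 2^(2d)."""
    return k * 2 ** (2 * d)

def pga_steps(d, k):
    """Dominant term of the PGA bound on a non-hierarchical lattice: d^2 * k^2."""
    return d ** 2 * k ** 2

k = 10  # hypothetical number of views to select
for d in (2, 4, 6, 8):
    print(f"d={d}: HRU ~ {hru_steps(d, k):,}   PGA ~ {pga_steps(d, k):,}")
```

For small d the polynomial term can be the larger of the two; the exponential term overtakes it as d grows.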
Near Optimal Selection
[Plot, d = 2, ℓ = 4: Query Costs (rows) versus Materialization Costs (rows) for Optimal, HRU and Polynomial Greedy]
Query Costs at Four Dimensions
[Plot: Query Costs (thousands of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Query Costs at Six Dimensions
[Plot: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Query Costs at Eight Dimensions
[Plot: Query Costs (millions of rows) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Four Dimensions
[Plot: Processing Time (seconds) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Six Dimensions
[Plot: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA]
Performance at Eight Dimensions
[Plot: Processing Time (minutes) versus Materialization Costs (thousands of rows) for HRU and PGA]
Conclusions
• PGA finds a good set of views for materialization even when HRU becomes impractical because of its exponential time complexity
• PGA extends the usefulness of OLAP systems into higher dimensionality
Current Work
[Diagram: data flow among four processes (View Size Estimation, View Selection, View Maintenance, Query Optimization), each marked as Completed Work or Current Work. Labels: Fact Table, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Initial Data, Incremental Data, Update, Current Views, Users, Queries, Quick Responses.]
Current Work
• Design alternative data structures for materialized views in OLAP
• Test impact of new data structures on update and query costs.
• Integrate our work into an OLAP system
References
• [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205-216, Montreal, Canada.
• [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1-13, Toronto, Canada.
• [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html