Materialized View Selection for XQuery Workloads Asterios Katsifodimos 1, Ioana Manolescu 1 & Vasilis Vassalos 2 1 Inria Saclay & Université Paris-Sud, 2 Athens University of Economics and Business


Page 1: Materialized View Selection for XQuery Workloads Asterios Katsifodimos 1, Ioana Manolescu 1 & Vasilis Vassalos 2 1 Inria Saclay & Université Paris-Sud,

Materialized View Selection for XQuery Workloads

Asterios Katsifodimos1, Ioana Manolescu1 & Vasilis Vassalos2

1Inria Saclay & Université Paris-Sud, 2Athens University of Economics and Business


Page 2

View selection in XML databases: problem definition

Find a set of materialized views that minimizes the workload evaluation cost without exceeding a space budget.
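One way to make the definition concrete is an exhaustive search over every subset of candidate views. This only works for tiny inputs. The sketch is not the authors' method, and all names in it are hypothetical. In particular, `workload_cost` stands in for the rewriting algorithm plus cost model. The number of subsets grows exponentially, which is why the rest of the talk turns to pruning and heuristics.

```python
from itertools import combinations

def select_views_exhaustive(candidates, size, workload_cost, budget):
    """Return (view set, cost) with the lowest workload cost that fits the budget.

    candidates:    list of view identifiers
    size:          dict view -> space occupancy
    workload_cost: frozenset of views -> estimated workload evaluation cost
    """
    best_set, best_cost = frozenset(), workload_cost(frozenset())
    # Exponential enumeration of subsets: only usable on tiny candidate sets.
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            views = frozenset(subset)
            if sum(size[v] for v in views) > budget:
                continue  # violates the space budget
            cost = workload_cost(views)
            if cost < best_cost:
                best_set, best_cost = views, cost
    return best_set, best_cost

# Hypothetical sizes and additive cost savings.
size = {"v1": 7, "v2": 5, "v3": 4}
savings = {"v1": 30, "v2": 25, "v3": 20}
best, cost = select_views_exhaustive(
    list(size), size, lambda vs: 100 - sum(savings[v] for v in vs), budget=12)
print(sorted(best), cost)  # ['v1', 'v2'] 45
```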

Page 3

Contributions

View selection for multiple-view XQuery rewriting, over a rich subset of XQuery: tree patterns with multiple return nodes and value joins

We provide:
Candidate view pruning methods
View selection algorithms: Utility-Driven Greedy (UDG) and the Reduce-Optimize Algorithm (ROA)

Extensive experimental evaluation, outperforming and extending state-of-the-art works

Page 4

Outline

The View Selection Problem

View Language & Candidate Views

View Selection Algorithms

Related Work & Experimentation


Page 5

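Query and view language: anatomy of a query

Example query: return the ID of every book along with its text and author, if the book's author has a paper in the SIGMOD conference.

(Figure: the query as two tree patterns linked by a value join. The first pattern is book, returning its ID, with children text (cont) and author (val); cont is the subtree of the text element. The second pattern is paper, with children author and conference[=“SIGMOD”]. The value join links the book's author to the paper's author.)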

Page 6

Candidate Views

Example:
Query: book, with children author (val) and text (cont)
Candidate views: v1 = author (ID, val); v2 = book (ID), with child text (ID, cont)
Rewriting: PROJECT[textcont, authorval](JOIN[v1.authorID>v2.bookID](SCAN(v1), SCAN(v2)))

Candidate views are the views that can participate in a rewriting of a query. Property: the candidate views of a query are exactly the views that can be embedded in it.
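The embeddability property can be illustrated with a minimal structural check on tree patterns. Nodes carry a label and `/` or `//` child edges. A `/` edge of the view must land on a `/` edge of the query. A `//` edge may land on any descendant.

This is a simplified sketch, not the authors' algorithm. It ignores return annotations (ID, val, cont), predicates and value joins, all of which a real candidate-view test must also handle. The names `Node`, `embeds_at` and `embeddable` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    # Child edges as (axis, child) pairs; axis is "/" (child) or "//" (descendant).
    children: list = field(default_factory=list)

def descendants(node):
    """Yield every proper descendant of a pattern node."""
    for _, child in node.children:
        yield child
        yield from descendants(child)

def embeds_at(v, q):
    """True if view node v, with its whole subtree, maps onto query node q."""
    if v.label != q.label:
        return False
    for axis, v_child in v.children:
        if axis == "/":
            # A "/" edge of the view must map to a "/" edge of the query.
            targets = [c for a, c in q.children if a == "/"]
        else:
            # A "//" edge may map to any downward path in the query.
            targets = list(descendants(q))
        if not any(embeds_at(v_child, t) for t in targets):
            return False
    return True

def embeddable(view_root, view_axis, query_root, query_axis):
    """Structural embedding test; each root is anchored with "/" or "//"."""
    if view_axis == "/":
        return query_axis == "/" and embeds_at(view_root, query_root)
    return any(embeds_at(view_root, n) for n in [query_root, *descendants(query_root)])

# The example query pattern: //book[/author][/text]
query = Node("book", [("/", Node("author")), ("/", Node("text"))])
print(embeddable(Node("book", [("/", Node("text"))]), "//", query, "//"))   # True
print(embeddable(Node("book", [("/", Node("paper"))]), "//", query, "//"))  # False
```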

Page 7

Candidate Views

Number of candidate views: for a query with m value joins and k tree patterns, the number of candidate views grows very quickly with k and m.

Early pruning is needed.

Rules of thumb for pruning: drop all views that can be replaced by others; views should not store anything extraneous.

Challenge: remove as many views as possible while preserving the possibility of low-cost and/or small-size rewritings.

Page 8

Candidate Views

Pruning techniques

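Query: book, with children author (val) and text (cont)

① Annotate all nodes with ID, to maximize rewriting opportunities. For example, the candidate view v1 = book with child author (ID) becomes v1' = book (ID) with child author (ID).

② Do not store unnecessary data (i.e. useless cont, val or //-axis), to avoid expensive rewritings and save space. For example, v2 = author (ID, cont) stores the author's content, which the query does not use, whereas v2' = author (ID, val) stores the value that the query needs.

Further candidate views shown: v3 and v3', each book (ID) with child text (ID, cont).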
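The two rules of thumb can be sketched as follows. Each view is flattened into a dict from node label to stored annotations. This assumes labels are unique within the pattern, which holds for this toy query.

The sketch covers only these two rules. It ignores the //-axis part of rule ② and the "replaceable by other views" rule. The function names are hypothetical.

```python
def add_ids(view):
    """Rule 1: annotate every node with ID, to maximize rewriting opportunities."""
    return {node: stored | {"ID"} for node, stored in view.items()}

def stores_extraneous(view, query_needs):
    """Rule 2: does the view store a val or cont that the query never uses?"""
    return any(a != "ID" and a not in query_needs.get(node, set())
               for node, stored in view.items() for a in stored)

def prune(candidates, query_needs):
    """Drop views that store extraneous data; add IDs to the surviving views."""
    return {name: add_ids(view) for name, view in candidates.items()
            if not stores_extraneous(view, query_needs)}

# The slide's query: book with author (val) and text (cont).
needs = {"book": set(), "author": {"val"}, "text": {"cont"}}
candidates = {
    "v1": {"book": set(), "author": {"ID"}},
    "v2": {"author": {"ID", "cont"}},
    "v2'": {"author": {"ID", "val"}},
    "v3": {"book": {"ID"}, "text": {"ID", "cont"}},
}
kept = prune(candidates, needs)
print(sorted(kept))  # ['v1', "v2'", 'v3']
```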

Page 9

Outline

The View Selection Problem

View Language & Candidate Views

View Selection Algorithms

Related Work & Experimentation


Page 10

Materializing a set of views

View set benefit: the benefit of materializing a set of views V for a workload Q is
benefit(V, Q) = (cost of evaluating Q over the documents D) − (cost of evaluating Q over V)

Computing the benefit requires invoking the rewriting algorithm, which is expensive.

Space occupancy of a view set V: the total size of its views (in bytes).
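The benefit of a view set can be computed as sketched below. Following the backup slide on benefit, each query's cost saving is weighted by the query's frequency. The two cost functions are hypothetical callbacks: `cost_with_views` is where the expensive rewriting algorithm would run. When no rewriting exists, the toy falls back to the document cost, so the saving is zero.

```python
def benefit(views, workload, cost_from_docs, cost_with_views):
    """Frequency-weighted benefit of materializing `views` for a workload.

    workload:        dict query -> frequency
    cost_from_docs:  q -> estimated cost of evaluating q over the documents D
    cost_with_views: (q, views) -> estimated cost of the best rewriting of q over
                     the views (this is where the rewriting algorithm would run)
    """
    return sum(freq * (cost_from_docs(q) - cost_with_views(q, views))
               for q, freq in workload.items())

def space_occupancy(views, size_bytes):
    """Total size, in bytes, of a view set."""
    return sum(size_bytes[v] for v in views)

# Toy numbers: only q1 can be rewritten using v1.
workload = {"q1": 2, "q2": 1}
docs_cost = {"q1": 100, "q2": 50}

def rewrite_cost(q, views):
    # Fall back to the document cost when no rewriting exists.
    return 20 if q == "q1" and "v1" in views else docs_cost[q]

print(benefit({"v1"}, workload, docs_cost.get, rewrite_cost))  # 160
print(space_occupancy({"v1"}, {"v1": 4096}))                   # 4096
```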

Page 11

View Selection Algorithms: knapsack-inspired view selection

High similarity with the classic 0-1 knapsack problem:
Knapsack → View selection
Weight → View size
Profit → Benefit (evaluation cost savings)

Typical element of greedy algorithms for knapsack, the utility of a view v given the set V of views already selected:
utility(v, Q) = benefit({v} ∪ V, Q) / size(v)
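For comparison, here is the classic greedy heuristic for 0-1 knapsack. It ranks items once by profit/weight and takes them while they fit. The ratios never change, which is the assumption that breaks down for view selection.

The example reads the utilities and sizes of the first UDG example (S = 12) as ratios and weights, so each profit is utility × size. The view names a to d are hypothetical.

```python
def greedy_knapsack(items, capacity):
    """Classic greedy heuristic for 0-1 knapsack.

    items: list of (name, profit, weight). Items are ranked once by
    profit/weight and taken in that order while they fit.
    """
    chosen, used = [], 0
    for name, profit, weight in sorted(items, key=lambda it: it[1] / it[2], reverse=True):
        if used + weight <= capacity:
            chosen.append(name)
            used += weight
    return chosen

# Profit = utility × size, using U=10/S=7, U=60/S=5, U=50/S=4, U=8/S=2.
items = [("a", 70, 7), ("b", 300, 5), ("c", 200, 4), ("d", 16, 2)]
print(greedy_knapsack(items, 12))  # ['b', 'c', 'd']
```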

Page 12

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm

Space budget: S = 12
Candidate views (U = utility = benefit/size, S = space occupancy):
U=10, S=7
U=60, S=5
U=50, S=4
U=8, S=2

1. Enumerate candidate views

2. Compute view utilities

3. Order views by utility

4. Select the view of largest utility fitting in budget

5. Repeat 2-4 until budget exhausted
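The five steps can be sketched as a loop that recomputes utilities after every selection, using the slide's formula utility(v) = benefit({v} ∪ V) / size(v). The benefit function is a hypothetical toy in which v2 and v3 overlap, so choosing both saves less than their sum. Its first-round utilities (10, 60, 50, 8) match the slide's example. Candidate enumeration (step 1) is assumed to be done by the caller.

```python
def udg(candidates, size, benefit, budget):
    """Utility-Driven Greedy view selection.

    candidates: view ids (step 1, enumeration, is done by the caller)
    size:       dict view -> space occupancy
    benefit:    frozenset of views -> workload benefit
    """
    selected, used = frozenset(), 0
    remaining = set(candidates)
    while True:
        fitting = sorted(v for v in remaining if used + size[v] <= budget)
        if not fitting:
            return selected  # budget exhausted: no remaining view fits
        # Steps 2-3: recompute utilities against the views selected so far.
        best = max(fitting, key=lambda v: benefit(selected | {v}) / size[v])
        # Step 4: take the view of largest utility that fits in the budget.
        selected = selected | {best}
        used += size[best]
        remaining.discard(best)

# Hypothetical benefits: v2 and v3 overlap, so selecting both saves less.
size = {"v1": 7, "v2": 5, "v3": 4, "v4": 2}
alone = {"v1": 70, "v2": 300, "v3": 200, "v4": 16}

def workload_benefit(views):
    overlap = 180 if {"v2", "v3"} <= views else 0
    return sum(alone[v] for v in views) - overlap

print(sorted(udg(list(size), size, workload_benefit, budget=12)))  # ['v2', 'v3', 'v4']
```

Because the benefit of a set is not additive, a view's utility changes between rounds. This is exactly the point the following slides make about knapsack greedy algorithms.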

Page 13

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm

1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted

Space budget: S = 12
Candidate views (U = utility = benefit/size, S = space occupancy):
U=12, S=7
U=40, S=5
U=64, S=4
U=9, S=2

Page 14

View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm

Space budget: S = 12
Candidate views (U = utility = benefit/size, S = space occupancy):
U=13, S=7
U=10, S=5
U=64, S=4
U=4, S=2

1. Enumerate candidates
2. Compute utilities
3. Order by utility
4. Select the view of largest utility fitting in budget
5. Repeat 2-4 until budget exhausted

Greedy algorithms for knapsack are not a perfect fit for our problem: the utility of a view may change after every round, since it depends on the other views already selected.

Page 15

View Selection Algorithms: state space search (state = candidate view set)

(Figure: a search space of states S1 to S16 connected by transformations, e.g. transform(S1) → S8.)

Initial state: the query workload

Best state: the state with the largest benefit under the space budget

Page 16

View Selection Algorithms

State Transformations: Break, Join, Generalize, Adapt

Break: break a view into smaller parts
Reveals common sub-expressions of views
Can reduce or increase space occupancy
Increases query evaluation costs

(Figure: the example query pattern: book with children text (cont) and author (val); paper with children author and conference[=“SIGMOD”].)

Page 17

View Selection Algorithms

State Transformations: Break, Join, Generalize, Adapt

Join: the opposite of Break; joins two views into one
Reduces evaluation costs
Joined views can be smaller in size

(Figure: the example query pattern, book with children text (cont) and author (val), and paper with children author and conference[=“SIGMOD”], with additional node annotations ID and val, ID.)
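Break and its inverse, Join, can be pictured as cutting and reattaching one edge of a tree pattern. This is one possible reading of the two transformations, not the paper's definition. The sketch assumes that a break cuts exactly one edge, chosen by the caller. Both endpoints of the cut gain an ID, so the two views can later be recombined by a structural join. `Node`, `break_view` and `join_views` are hypothetical names.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    ann: set = field(default_factory=set)         # stored annotations: ID, val, cont
    children: list = field(default_factory=list)  # (axis, Node) pairs, axis "/" or "//"

def node_at(root, path):
    """Follow a list of child indices down from the root."""
    for i in path:
        root = root.children[i][1]
    return root

def break_view(view, path):
    """Break: detach the subtree reached by `path` into a separate view.
    Returns (upper view, lower view, axis of the cut edge)."""
    upper = copy.deepcopy(view)
    parent = node_at(upper, path[:-1])
    axis, lower = parent.children.pop(path[-1])
    # Store IDs on both sides of the cut so a structural join can reconnect them.
    parent.ann.add("ID")
    lower.ann.add("ID")
    return upper, lower, axis

def join_views(upper, lower, parent_path, axis):
    """Join, the inverse of Break: reattach `lower` under the node at parent_path."""
    joined = copy.deepcopy(upper)
    node_at(joined, parent_path).children.append((axis, copy.deepcopy(lower)))
    return joined

book = Node("book", {"ID"}, [("/", Node("text", {"cont"})), ("/", Node("author", {"val"}))])
upper, lower, axis = break_view(book, [1])  # cut the book/author edge
print([c.label for _, c in upper.children], lower.label, sorted(lower.ann))
# ['text'] author ['ID', 'val']
rejoined = join_views(upper, lower, [], axis)
print([c.label for _, c in rejoined.children])  # ['text', 'author']
```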

Page 18

View Selection Algorithms

State Transformations: Break, Join, Generalize, Adapt

Generalize: generalization/relaxation of a view
Reveals common sub-expressions of views
Can reduce or increase space occupancy
Increases query evaluation costs

(Figure: the example query pattern: book with children text (cont) and author (val); paper with children author and conference (val)[=“SIGMOD”]; further annotations val and cont.)

Page 19

View Selection Algorithms

State Transformations: Break, Join, Generalize, Adapt

Adapt: specialization of views by (1) converting //-axis edges into /-axis edges and (2) adding existential nodes
Reduces evaluation costs
“Adapted” views can be smaller in size

(Figure: the example query pattern: book with children text and author; paper with children author and conference (val)[=“SIGMOD”]; annotation cont.)

Break, Join, Generalize and Adapt allow generating all states, and are guaranteed not to generate pruned views.

Page 20

View Selection Algorithms

The Reduce-Optimize algorithm (ROA)

The number of states is huge, and the rewriting algorithm is called after every state transition, so heuristics are needed.

Proposal: ROA, a heuristic three-phase algorithm with the phases Reduce, Optimize and Jump.

Page 21

View Selection Algorithms: The Reduce-Optimize algorithm (ROA)

(Figure: space occupancy, relative to the space budget, and benefit plotted over time as ROA alternates Reduce, Optimize, Jump, Reduce, Optimize, Reduce, ...; states are marked as solution, best state, revisited state or intermediary state.)
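The three phases can be read as a local search, and the sketch below follows that reading. This interpretation is an assumption drawn from the figure, not the authors' implementation:

- Reduce moves to smaller states until the space budget is met.
- Optimize hill-climbs on benefit while staying within the budget.
- Jump moves to a random unvisited neighbor to escape a local optimum.

The toy search space lets each transformation add or remove one view, standing in for Break, Join, Generalize and Adapt. The names `roa`, `neighbors`, `space` and `gain` are hypothetical.

```python
import random

def roa(initial, neighbors, size, benefit, budget, rounds=10, seed=0):
    """Reduce-Optimize-Jump local search over states (sets of views)."""
    rng = random.Random(seed)
    visited, best, state = set(), None, initial
    for _ in range(rounds):
        # Reduce: shrink the state until it fits in the space budget.
        while size(state) > budget:
            smaller = [s for s in neighbors(state) if size(s) < size(state)]
            if not smaller:
                break
            state = max(smaller, key=benefit)
        # Optimize: hill-climb on benefit without leaving the budget.
        while size(state) <= budget:
            visited.add(state)
            better = [s for s in neighbors(state)
                      if size(s) <= budget and benefit(s) > benefit(state)]
            if not better:
                break
            state = max(better, key=benefit)
        if size(state) <= budget and (best is None or benefit(state) > benefit(best)):
            best = state
        # Jump: move to a random unvisited neighbor to escape the local optimum.
        fresh = [s for s in neighbors(state) if s not in visited]
        if not fresh:
            break
        state = rng.choice(fresh)
    return best

# Toy search space: one transformation adds or removes a single view.
views = {"a": (5, 10), "b": (4, 8), "c": (3, 7)}  # view -> (size, benefit)

def neighbors(s):
    return [s - {v} for v in sorted(s)] + [s | {v} for v in sorted(views.keys() - s)]

def space(s):
    return sum(views[v][0] for v in s)

def gain(s):
    return sum(views[v][1] for v in s)

print(sorted(roa(frozenset(views), neighbors, space, gain, budget=8)))  # ['a', 'c']
```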

Page 22

View Selection Algorithms

Reducing ROA search time: heuristics
1. Some transitions may apply several transformations at once
2. Stop the rewriting algorithm early, after k rewritings are found or at a timeout
3. Consider only the lowest-cost rewritings
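Heuristics 2 and 3 can be sketched on top of a rewriting algorithm that yields rewritings lazily. The sketch assumes the algorithm is exposed as a Python iterator. It stops after `k` rewritings or once a timeout has elapsed, then keeps only the cheapest ones.

The timeout is checked only between rewritings, so an iterator that blocks for a long time is not interrupted. `collect_rewritings` and its parameters are hypothetical names.

```python
import itertools
import time

def collect_rewritings(rewritings, k=None, timeout_s=None, cost=None, keep=1):
    """Consume rewritings as they are produced, stopping early.

    rewritings: iterator yielding rewritings as the rewriting algorithm finds them
    k:          stop after k rewritings; None means no limit
    timeout_s:  stop once this many seconds have elapsed
    cost, keep: if a cost function is given, keep only the `keep` cheapest
    """
    deadline = None if timeout_s is None else time.monotonic() + timeout_s
    found = []
    for r in itertools.islice(rewritings, k):
        found.append(r)
        if deadline is not None and time.monotonic() >= deadline:
            break  # the timeout is checked between rewritings only
    if cost is not None:
        found = sorted(found, key=cost)[:keep]
    return found

# An endless stream stands in for a rewriting algorithm that keeps finding rewritings.
print(collect_rewritings(itertools.count(), k=5))                              # [0, 1, 2, 3, 4]
print(collect_rewritings(itertools.count(), k=5, cost=lambda r: abs(r - 3)))  # [3]
```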

Page 23

Outline

The View Selection Problem

View Language & Candidate Views

View Selection Algorithms

Related Work & Experimentation


Page 24

Related Work

Algorithm: rewriting power
[Mandhani & Suciu VLDB05]: 1-view rewritings
[Tang et al. DASFAA09]: 1-view rewritings
Utility-Driven Greedy: multiple-view rewritings
Reduce-Optimize: multiple-view rewritings

Page 25

Experimental Evaluation

Query workloads
Tree patterns: Q1 (14 queries), Q2 (50), Q3 (100)
Tree patterns + joins: Q4 (50 queries, 20% with joins)
Query selectivity: ⅓ low, ⅓ medium, ⅓ high

Database: 1 GB of XMark data (10 documents of 100 MB)

Settings
Space budget: S = size(Q); tested space budgets: S, S/2, S/4, S/6
Algorithms: UDG and ROA; competitors: [Mandhani & Suciu VLDB05], [Tang et al. DASFAA09]
Implementation: ViP2P*, in Java

*http://vip2p.saclay.inria.fr

Page 26

Experimental Evaluation: Workload Evaluation Time of Q1 (14 queries)

(Figure: hit ratio, and evaluation time versus evaluation from the documents, for Reduce-Optimize (ROA), Space/Time Greedy [Tang et al. DASFAA09], Set-Cover Greedy [Mandhani & Suciu VLDB05], Utility-Driven Greedy (UDG) and Space Optimal [Tang et al. DASFAA09].)

Page 27

Experimental Evaluation: Evaluation Time & Hit Ratio for Q3 (100 queries)

(Figure: hit ratio, and evaluation time versus evaluation from the documents, for Reduce-Optimize (ROA) and Set-Cover Greedy [Mandhani & Suciu VLDB05].)

Page 28

Experimental Evaluation: ROA evaluation for Q4 (50 queries, 20% value-joined)

(Figure: % of evaluation time vs. documents; hit ratio.)

Page 29

Conclusions

Automatic selection of XQuery views for multiple-view rewritings

Reduction of the number of candidate views by orders of magnitude

ROA performs better than related work: it scales and finds good solutions relatively fast
80% of the benefit is attained in about 2 minutes; the maximum benefit within 25 minutes

The algorithms of [Tang et al. DASFAA09] did not scale beyond 14 queries

Utility-Driven Greedy (UDG) did not scale beyond 50 queries

Page 30

Thank you

Questions?

Page 31

BACKUP

Page 32

Cost of algebraic plans

Estimating the evaluation cost of a rewriting

Algebraic plan cost: the execution cost of an operator has a CPU execution cost and an IO cost, both of which depend on its input. The evaluation cost of a plan is calculated bottom-up.

Data statistics: a DataGuide of every document, enriched with the number of instances of each path, the average val size of a path (bytes), the average cont size of a path (bytes), and the number of distinct values of each path. These statistics are used to estimate the cardinality and size of a view.

Page 33

Cost of algebraic plans: cost estimation example

Views:
v1: size 500 KB, cardinality 50
v2: size 100 KB, cardinality 10

Plan, with costs computed bottom-up:
SCAN(v1): IO=500 | CPU=50; OUTPUT=50
SCAN(v2): IO=100 | CPU=10; OUTPUT=10
SELECT[conference=“SIGMOD”] over SCAN(v2): IO=100 | CPU=10+10=20; OUTPUT=5
JOIN[v1.author=v2.author]: IO=500+100=600 | CPU=(50+20)+50*5=320; OUTPUT=25 (50*5*0.1)
PROJECT[textcont, authorval]: IO=600 | CPU=320+25=345; OUTPUT=25
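The bottom-up computation can be reproduced with a small recursive cost model. The per-operator rules below are inferred from the numbers in the example and are assumptions, not the system's actual cost model:

- A scan costs IO equal to the view size in KB and one CPU unit per tuple.
- Select and project cost one CPU unit per input tuple.
- The join compares every pair of input tuples, like a nested-loop join.
- The selectivities are 0.5 for the SELECT and 0.1 for the JOIN.

The operator function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Cost:
    io: float   # IO cost accumulated by the subtree
    cpu: float  # CPU cost accumulated by the subtree
    out: float  # estimated output cardinality

def scan(size_kb, cardinality):
    # Assumption: IO = view size in KB, CPU = one unit per tuple read.
    return Cost(io=size_kb, cpu=cardinality, out=cardinality)

def select(child, selectivity):
    # One predicate evaluation per input tuple.
    return Cost(child.io, child.cpu + child.out, child.out * selectivity)

def join(left, right, selectivity):
    # Nested-loop join: every pair of input tuples is compared.
    pairs = left.out * right.out
    return Cost(left.io + right.io, left.cpu + right.cpu + pairs, pairs * selectivity)

def project(child):
    # One CPU unit per input tuple; the cardinality is unchanged.
    return Cost(child.io, child.cpu + child.out, child.out)

# Bottom-up evaluation of the example plan.
v1 = scan(500, 50)
v2 = select(scan(100, 10), selectivity=0.5)    # SELECT[conference="SIGMOD"]: 10 -> 5
plan = project(join(v1, v2, selectivity=0.1))  # JOIN[v1.author=v2.author]
print(plan)  # Cost(io=600, cpu=345.0, out=25.0)
```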

Page 34

Experimental Evaluation: ROA time to attain increasing benefits (minutes)

Page 35

Experimental Evaluation: candidate views pruning

CS0max: maximum estimated number of candidate views
CS0min: minimum estimated number of candidate views
CS1: pruned candidate view set
CS2: pruned candidate view set, only linear path candidates

Page 36

Candidate Views: size of the set of candidate views for a tree pattern

The cardinality of the set of candidate views of a tree pattern query q of |q| nodes is bounded using the following counts:

Combinations of nodes of q: ({a}, {b}, {c}, {a,b}, {a,c}, {a,b,c})

Edge combinations, i.e. how to connect the nodes with / or //, e.g. {/a/b, //a/b, /a//b, //a//b}

Return node variations: there are 12 for each node in a pattern, e.g. (aID,cont, aval, aID,val, …)

Example: q = /a/bval/c
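The edge combinations listed above can be generated mechanically, one `/` or `//` choice per node of a linear path. The sketch also multiplies by the 12 return node variations per node to show how quickly a single node combination grows. That product assumes the choices are independent. It is an illustration, not the bound stated in the paper. The function names are hypothetical.

```python
from itertools import product

def edge_combinations(labels):
    """All ways to connect a linear path of nodes with "/" or "//" edges."""
    return ["".join(axis + label for axis, label in zip(axes, labels))
            for axes in product(["/", "//"], repeat=len(labels))]

def path_view_variants(labels, return_variations=12):
    """Views for one fixed node combination, assuming the edge axes and the
    per-node return variations are chosen independently."""
    return len(edge_combinations(labels)) * return_variations ** len(labels)

print(edge_combinations(["a", "b"]))   # ['/a/b', '/a//b', '//a/b', '//a//b']
print(path_view_variants(["a", "b"]))  # 576
```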

Page 37

Candidate Views: size of the set of candidate views for a joined pattern

Given a joined pattern q with k tree patterns and m value joins, the size of the candidate view set of q is bounded using the value join combinations and the number of views resulting from all possible Cartesian products of the k tree patterns.

Page 38

View Selection Algorithms: benefit of materializing a set of views

The benefit of materializing a view set V is the difference between the cost of evaluating the workload from the documents and the cost of evaluating it over V:

$$\mathrm{benefit}(V, Q) = \sum_{q \in Q} f(q) \cdot \big( c(q, D) - c(q, V) \big)$$

where f(q) is the frequency of query q, c(q, D) is the cost of evaluating q from the documents, and c(q, V) is the cost of evaluating q given the set of materialized views V.

Page 39

Tree pattern query of |q| nodes

Joined pattern query with m value joins & k tree patterns