
Index Interactions in Physical Design Tuning Modeling, Analysis, and Applications

Karl Schnaitter, UC Santa Cruz
Neoklis Polyzotis, UC Santa Cruz
Lise Getoor, Univ. of Maryland

VLDB 2009, Lyon, France

2University of California, Santa Cruz

Index Selection

• Index selection problem:
  – Given a query workload
  – Choose indices that improve workload performance
• Does index benefit depend on other indices?
  – If so, this is called index interaction
• Index “benefit” is a key concept
  – Informally, for an index i,
    [benefit of i] = [exec cost without i] – [exec cost with i]

Related Work

• Interactions are a key concern in physical tuning
  – [Whang et al. 1981] make assumptions implying that indices on different tables do not interact
  – [Finkelstein et al. 1988] assume that indices do not interact if they are relevant to separate queries
  – [Bruno and Chaudhuri 2007] explicitly account for some interactions in on-line index selection
  – Many more…
• These studies treat interactions as a secondary issue, and often rely on ad hoc assumptions

Index Interactions

• Let S be a set of indices relevant to a query Q
• cost(X) = cost of Q if only X ⊆ S is available
• benefit(Y, X) = cost(X) − cost(Y ∪ X)

[Figure: the costs cost(X), cost(X ∪ {a}), cost(X ∪ {b}) and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]

Indices a,b are independent with respect to X
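The definitions above can be sketched in a few lines of Python. This is only an illustration: the cost table and its values are invented, and the `independent` helper reads independence as equality of the two benefits marked in the figure, which is an assumption about what the diagram shows.

```python
# Hypothetical cost table for one query Q: maps each available index set
# to the cost of Q under that set (values invented for illustration).
COST = {
    frozenset(): 100,
    frozenset("a"): 70,
    frozenset("b"): 80,
    frozenset("ab"): 50,
}


def cost(x):
    """cost(X): cost of Q if only the indices in X are available."""
    return COST[frozenset(x)]


def benefit(y, x):
    """benefit(Y, X) = cost(X) - cost(Y ∪ X)."""
    return cost(x) - cost(frozenset(y) | frozenset(x))


def independent(a, b, x):
    """True when the benefit of a is the same with and without b present."""
    x = frozenset(x)
    return benefit({a}, x) == benefit({a}, x | {b})
```

With this table, adding a saves the same amount whether or not b is already available, so `independent("a", "b", set())` holds.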

Index Interactions

[Figure: the costs cost(X), cost(X ∪ {a}), cost(X ∪ {b}) and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]

Indices a,b positively interact with respect to X

Index Interactions

[Figure: the costs cost(X), cost(X ∪ {a}), cost(X ∪ {b}) and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]

Indices a,b negatively interact with respect to X

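Degree of Interaction

• doi(a,b,X) = degree of interaction between a,b with respect to X

$$\mathrm{doi}(a,b,X) = \frac{\left|\,\mathrm{benefit}(\{a\},X) - \mathrm{benefit}(\{a\},X \cup \{b\})\,\right|}{\mathrm{cost}(X \cup \{a,b\})}$$

$$= \frac{\left|\,\mathrm{cost}(X \cup \{a\}) - \mathrm{cost}(X) - \mathrm{cost}(X \cup \{a,b\}) + \mathrm{cost}(X \cup \{b\})\,\right|}{\mathrm{cost}(X \cup \{a,b\})}$$

[Figure: the four index sets X, X ∪ {a}, X ∪ {b} and X ∪ {a,b} that appear in the formula]

• doi is symmetric

$$\mathrm{doi}(a,b) = \max_{X \subseteq S} \mathrm{doi}(a,b,X)$$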
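A minimal sketch of the doi formula in Python follows. It assumes the absolute-value form of the numerator and takes `cost` as any callable from a frozenset of indices to a number; the function name and the small table are illustrative. The four costs are the ones the QINTERACT example later in the talk uses for X2 = ∅.

```python
def doi(a, b, x, cost):
    """Degree of interaction of indices a and b with respect to index set x.

    `cost` is any callable mapping a frozenset of indices to a query cost.
    """
    x = frozenset(x)
    xa, xb, xab = x | {a}, x | {b}, x | {a, b}
    return abs(cost(xa) - cost(x) - cost(xab) + cost(xb)) / cost(xab)


# Costs for the four sets around X = ∅ (taken from the QINTERACT example).
TABLE = {
    frozenset(): 50,
    frozenset("a"): 40,
    frozenset("b"): 40,
    frozenset("ab"): 20,
}

value = doi("a", "b", frozenset(), TABLE.__getitem__)
```

Swapping a and b leaves the numerator unchanged, which is why the slide can call doi symmetric.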

Problem Statement

• Which indices in S interact?
• How strong are the interactions?
• The Degree of Interaction Problem:
  Compute doi(a,b) for all a,b ∈ S

Outline

• Properties of Query Optimization
• Degree of Interaction Algorithm
• Applying Interaction Information

Query Optimization

• Computing doi(a,b) is not practical if the optimizer is totally arbitrary
  – Need to compute doi(a,b,X) for all X ⊆ S
• In practice, query optimization is not arbitrary
  – E.g., we expect cost(∅) ≥ cost({a})
• We put mild assumptions on query optimization:
  – Plans are selected from some fixed space P
  – Optimizer chooses the cheapest feasible plan from P
  – Ties are broken consistently
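The three assumptions can be modeled with a toy optimizer. In the sketch below, the plan space `PLANS`, its costs, and the rule that a plan is feasible when all the indices it needs are available are illustrative assumptions, not details from the talk.

```python
# Hypothetical fixed plan space P: (indices the plan needs, plan cost).
PLANS = [
    (frozenset(), 100),      # full scan, always feasible
    (frozenset("a"), 60),
    (frozenset("b"), 70),
    (frozenset("ab"), 30),
]


def optimize(available):
    """Return (indices used, cost) of the cheapest feasible plan.

    A plan is feasible when every index it needs is available. Ties are
    broken consistently: min() keeps the plan listed first in PLANS.
    """
    available = frozenset(available)
    feasible = [plan for plan in PLANS if plan[0] <= available]
    return min(feasible, key=lambda plan: plan[1])


def cost(available):
    """cost(X) under the toy optimizer."""
    return optimize(available)[1]
```

Under this model, adding indices can only enlarge the set of feasible plans, so cost never increases; that is the kind of regularity the slide's example cost(∅) ≥ cost({a}) refers to.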

Index Benefit Graph

• An Index Benefit Graph (IBG) encodes the selection of optimal plans for a query
  – Introduced by [Frank, Omiecinski, and Navathe 1992]
• Example IBG when S = {a,b,c,d}

[Figure: example IBG with eight nodes (abcd, abc, bcd, ac, bc, cd, c, d); each node is annotated with the indices used in its optimal plan and the cost of that plan]

  – There are 16 subsets of S
  – IBG has 8 nodes
  – But IBG can compute cost(X) for all X ⊆ S
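One way an IBG can answer cost(X) for every subset is sketched below. The construction (each node stores the indices used by its optimal plan, with one child per used index removed) is my reading of how IBGs are usually described, and the plan space, the stand-in optimizer, and all function names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical plan space: (indices the plan needs, plan cost).
PLANS = [
    (frozenset(), 80),
    (frozenset("c"), 65),
    (frozenset("ab"), 45),
    (frozenset("ad"), 20),
    (frozenset("bd"), 50),
]


def optimize(available):
    """Stand-in optimizer: cheapest plan whose indices are all available."""
    feasible = [plan for plan in PLANS if plan[0] <= available]
    return min(feasible, key=lambda plan: plan[1])


def build_ibg(all_indices):
    """Map each IBG node Y to (used(Y), cost(Y)), expanding from the root."""
    ibg, todo = {}, [frozenset(all_indices)]
    while todo:
        node = todo.pop()
        if node in ibg:
            continue
        used, plan_cost = optimize(node)
        ibg[node] = (used, plan_cost)
        # One child per used index: the node with that index removed.
        todo.extend(node - {i} for i in used)
    return ibg


def ibg_cost(ibg, root, x):
    """Recover cost(X) by walking down from the root of the IBG."""
    node = root
    while True:
        used, plan_cost = ibg[node]
        missing = used - x
        if not missing:
            return plan_cost           # this plan needs only indices in X
        node = node - {min(missing)}   # drop one used index that X lacks


S = frozenset("abcd")
ibg = build_ibg(S)
subsets = [frozenset(c) for r in range(len(S) + 1)
           for c in combinations(sorted(S), r)]
mismatches = [x for x in subsets if ibg_cost(ibg, S, x) != optimize(x)[1]]
```

The optimizer is called once per IBG node, yet the walk in `ibg_cost` reproduces the optimizer's cost for all 16 subsets of this toy S.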


Naive Algorithm

• Recall that we want the degree of interaction between all pairs of indices in S
• Each doi(a,b) may be computed directly

For all a,b ∈ S
    Initialize T[a,b] = 0
    For all X ⊆ S
        Let d = |cost(X ∪ {a}) − cost(X) − cost(X ∪ {a,b}) + cost(X ∪ {b})| / cost(X ∪ {a,b})
        Assign T[a,b] = max(d, T[a,b])

• Upon termination, T[a,b] = doi(a,b) for all a,b
• Can save time using an IBG as a cache of cost function
• Downside: iteration over all subsets of S
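The naive loop translates almost directly into Python. The sketch below assumes the absolute-value form of d and uses an invented plan-space cost function; `naive_doi_table` and `subsets` are illustrative names.

```python
from itertools import combinations


def subsets(s):
    """Yield every subset of s as a frozenset."""
    items = sorted(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def naive_doi_table(s, cost):
    """Return T with T[a, b] = max over X ⊆ s of doi(a, b, X)."""
    table = {}
    for a, b in combinations(sorted(s), 2):
        best = 0.0
        for x in subsets(s):
            xab = x | {a, b}
            d = abs(cost(x | {a}) - cost(x) - cost(xab) + cost(x | {b})) / cost(xab)
            best = max(best, d)
        table[a, b] = best
    return table


# Hypothetical plan space: (indices needed, plan cost).
PLANS = [
    (frozenset(), 50),
    (frozenset("a"), 40),
    (frozenset("b"), 40),
    (frozenset("c"), 45),
    (frozenset("ab"), 20),
]


def cost(x):
    """Cheapest plan whose indices are all contained in x."""
    return min(c for used, c in PLANS if used <= x)


T = naive_doi_table("abc", cost)
```

The inner loop visits all 2^|S| subsets for each pair, which is the downside the slide points out.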

The QINTERACT Algorithm

Naive Algorithm (condensed)

For all a,b ∈ S
    Initialize T[a,b] = 0
    For all X ⊆ S
        Assign T[a,b] = max(doi(a,b,X), T[a,b])

• We should avoid evaluating doi(a,b,X) for all X ⊆ S

QINTERACT Algorithm

For all a,b ∈ S
    Initialize T[a,b] = 0
    For all IBG nodes Y
        Construct two index sets X1, X2 ⊆ S (see paper)
        Assign T[a,b] = max(doi(a,b,X1), doi(a,b,X2), T[a,b])

• QINTERACT algorithm processes two index sets per IBG node

QINTERACT Example

• Let’s calculate doi(a,b) on the graph below
• What happens on iteration Y = {u}?

[Figure: IBG over the indices a, b, u, v, with labels a b u v = 20, a u v = 30, b u v = 30, a u = 40, u v = 40, b v = 40, u = 50, v = 50; node Y is highlighted. Beside the graph, two squares of costs: cost(∅), cost(a), cost(b), cost(ab) and cost(u), cost(ua), cost(ub), cost(aub)]

$$X_1 = \{u\}: \quad \mathrm{doi}(a,b,X_1) = \frac{|40 - 50 - 20 + 30|}{20} = 0$$

$$X_2 = \emptyset: \quad \mathrm{doi}(a,b,X_2) = \frac{|40 - 50 - 20 + 40|}{20} = 0.5$$

Interleaved IBG Processing

• In QINTERACT, the IBG is built, then analyzed
  – I.e., IBG construction and analysis are serial
• We can discover interactions in a partial IBG
• IBG construction and analysis may be interleaved
  – Improves accuracy of doi over time

[Figure: partially constructed IBG for S = {a,b,c,d}, with nodes abcd, abc, bcd, ac, bc, cd, c, d and plan costs 20, 45, 50, 80, 80, 80, 65, 50; part of the graph is still unexplored (". . .")]

$$\mathrm{doi}(b,d,\{a,c\}) = \frac{|45 - 80 - 20 + 20|}{20} = 1.75$$
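The "improves over time" behavior can be illustrated with a running maximum. In the sketch below, the order in which index sets become available is an assumption, as are the function names; the costs for {a,c}, {a,b,c}, {a,c,d} and {a,b,c,d} are the ones in the slide's computation, while the other four are invented.

```python
def doi(a, b, x, cost):
    """doi(a, b, X) with the absolute-value numerator."""
    xab = x | {a, b}
    return abs(cost(x | {a}) - cost(x) - cost(xab) + cost(x | {b})) / cost(xab)


def running_doi(a, b, discovered_sets, cost):
    """Yield the best doi(a, b, X) seen so far after each newly analyzed X."""
    best = 0.0
    for x in discovered_sets:
        best = max(best, doi(a, b, frozenset(x), cost))
        yield best


COST = {
    frozenset("ac"): 80, frozenset("abc"): 45,    # from the slide
    frozenset("acd"): 20, frozenset("abcd"): 20,  # from the slide
    frozenset("c"): 80, frozenset("bc"): 80,      # invented
    frozenset("cd"): 65, frozenset("bcd"): 50,    # invented
}

estimates = list(running_doi("b", "d", ["c", "ac"], COST.__getitem__))
```

Each estimate is a lower bound on doi(b,d) that can only grow as more of the graph is analyzed, so an early answer is usable and later answers are at least as accurate.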

Outline

• Properties of Query Optimization
• Degree of Interaction Algorithm
• Applying Interaction Information
  – Visualizing Index Interactions
  – Scheduling Index Creation

Visualizing Index Interactions

• We can visualize the doi function as a graph
  – Nodes correspond to indices
  – Edge between a and b has weight doi(a,b)

[Figure: interaction graph for TPC-H Query 7. Nodes: O(CK,OK), C(CK,NK), C(NK,CK), LI(SK,SD,D,EP,OK), LI(SD,D), LI(SD,Q), S(NK,N,SK), S(NK,SK), S(SK,NK). Edge weights shown: 0.01, 0.02, 0.04, 0.02, 0.03, 0.09, 0.02, 0.01, 0.02]

Interaction Graph

• The connected components have special meaning
  1. The benefit of any X ⊆ Ci does not depend on S − Ci
  2. Refining the partition loses property (1)
  3. This is the only partition with properties (1) and (2)

[Figure: interaction graph partitioned into connected components C1, C2, C3]
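Given a table of doi values, the components C1, C2, … can be found with an ordinary graph traversal. The sketch below is a generic connected-components routine; the `threshold` parameter, the function name, and the sample doi values are assumptions for illustration.

```python
def interaction_components(indices, doi_table, threshold=0.0):
    """Partition indices into connected components of the interaction graph.

    An edge joins a and b when doi_table[(a, b)] exceeds `threshold`.
    """
    neighbors = {i: set() for i in indices}
    for (a, b), weight in doi_table.items():
        if weight > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    components, seen = [], set()
    for start in sorted(indices):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical doi values for five indices.
DOI = {("a", "b"): 0.09, ("b", "c"): 0.02, ("d", "e"): 0.04, ("a", "d"): 0.0}

all_edges = interaction_components("abcde", DOI)
strong_only = interaction_components("abcde", DOI, threshold=0.03)
```

Raising the threshold drops weak edges and can split a component, which is one way to ignore minor interactions when inspecting the graph.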


Scheduling Index Creation

• Suppose we want to materialize new indices
• In what order should they be created?

[Figure: three plots of benefit against the materialized indices, one per schedule. Schedule = a,b,c: ∅ → a → a,b → a,b,c. Schedule = b,a,c: ∅ → b → a,b → a,b,c. Schedule = c,a,b: ∅ → c → a,c → a,b,c]

Choose first schedule to maximize benefit over time (shaded area)

Scheduling Index Creation

• We define an optimization problem
  – M = preexisting indices
  – {a1, …, an} = new indices to create
  – Permute new indices as t1, …, tn to maximize

$$\sum_{i=1}^{n} \mathrm{benefit}(\{t_1, \ldots, t_i\}, M)$$

• This problem is computationally hard
  – There is a connection to the Set Cover problem, since each new index “covers” more benefit
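For a handful of indices the objective can be evaluated exhaustively, which makes the definition concrete. In the sketch below, `BENEFIT[X]` stands for benefit(X, M) with M fixed; the table values and function names are invented for illustration, and brute force is only a reference point, not the method proposed in the talk.

```python
from itertools import permutations

# Hypothetical values of benefit(X, M) for each set X of new indices.
BENEFIT = {frozenset(k): v for k, v in {
    "": 0, "a": 40, "b": 10, "c": 45,
    "ab": 90, "ac": 50, "bc": 50, "abc": 95,
}.items()}


def schedule_value(schedule, benefit):
    """Sum of benefit({t1..ti}, M) over every prefix of the schedule."""
    built, total = frozenset(), 0
    for index in schedule:
        built = built | {index}
        total += benefit[built]
    return total


def best_schedule(new_indices, benefit):
    """Try all n! orders and keep the best; only viable for small n."""
    return max(permutations(sorted(new_indices)),
               key=lambda s: schedule_value(s, benefit))


best = best_schedule("abc", BENEFIT)
best_value = schedule_value(best, BENEFIT)
```

Here a and b interact strongly (together they are worth far more than separately), so the best order builds them back to back even though c has the largest benefit on its own.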

Greedy Scheduling

• We are tempted to use a greedy heuristic
• This results in the third schedule
• Greedy schedule can be suboptimal by a factor of about (n – 1)

[Figure: the three benefit plots for Schedule = a,b,c, Schedule = b,a,c and Schedule = c,a,b]
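A small sketch shows how a greedy rule can pick a worse order. The greedy rule used here (always build the index that maximizes the benefit of the set built so far) is one plausible reading of the heuristic, and the benefit table is invented; the example shows a gap, not the (n – 1) worst case.

```python
# Hypothetical values of benefit(X, M) for each set X of new indices.
BENEFIT = {frozenset(k): v for k, v in {
    "": 0, "a": 40, "b": 10, "c": 45,
    "ab": 90, "ac": 50, "bc": 50, "abc": 95,
}.items()}


def schedule_value(schedule, benefit):
    """Sum of benefit({t1..ti}, M) over every prefix of the schedule."""
    built, total = frozenset(), 0
    for index in schedule:
        built = built | {index}
        total += benefit[built]
    return total


def greedy_schedule(new_indices, benefit):
    """At each step, build the index that gives the largest benefit so far."""
    built, order = frozenset(), []
    remaining = sorted(new_indices)
    while remaining:
        # max() keeps the first of several equally good candidates.
        pick = max(remaining, key=lambda i: benefit[built | {i}])
        order.append(pick)
        remaining.remove(pick)
        built = built | {pick}
    return order


greedy = greedy_schedule("abc", BENEFIT)
greedy_value = schedule_value(greedy, BENEFIT)
alternative_value = schedule_value(["a", "b", "c"], BENEFIT)
```

Greedy starts with c because it is the best single index, and so delays the strongly interacting pair a, b; the order a,b,c accumulates more benefit over time.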

Interaction-Aware Scheduling

• Scheduling can use interaction graph

[Figure: interaction graph with connected components C1, C2, C3]

• Idea: First find optimal sub-schedules for each Ci
• Then choose the best interleaving of sub-schedules
• This heuristic avoids the pitfalls of greedy scheduling
• We can also show stronger performance guarantees

Conclusions

• Index interactions provide useful insights for physical design tuning
• The doi metric is an effective characterization of interaction relationships
• We can analyze interactions efficiently when the Index Benefit Graph has limited size
• Future work?

Thank You


Performance Evaluation

• QINTERACT implementation in Java
  – Uses JDBC to connect to IBM DB2 database
• Experiments use 22 TPC-H benchmark queries
• We generate indices based on the DB2 advisor
  – S_ALL = all indices recommended by DB2
  – S_1C = indices in S_ALL with first column only
• We monitor the progress of the “serial” and “interleaved” approaches over time

Experimental Results

[Figure: two result plots, one for the S_ALL index set with a 0.1 threshold and one for the S_1C index set with a 0.1 threshold]

Applications

• QINTERACT returns doi(a,b) for all a,b
• We propose two applications of this information
  – Visualizing index interactions
    • Illustrates the global interactions as a graph
    • Useful when manually tuning the index set
  – Scheduling index construction
    • Want to choose when new indices will be created
    • Goal is to increase performance as quickly as possible
    • Knowledge of index interactions can help

Problem Statement

• Which indices in S interact?
• How strong are the interactions?
• The Degree of Interaction Problem:
  Compute doi(a,b) for all a,b ∈ S
• It may be useful to ignore “minor” interactions
• A threshold-based variant:
  Decide if doi(a,b) > τ for all a,b ∈ S

Index Selection

• Index selection problem:
  W = a query workload
  S = a set of indices relevant to W
  cost(M) = cost of W when indices M ⊆ S are available
  Want to find M ⊆ S to minimize cost(M)
• We can quantify the benefit of an index:
  a = any index
  X = set of other indices
  benefit(a, X) = cost(X) − cost(X ∪ {a})
• Does benefit(a, X) depend on X?
  – If so, this is called index interaction

Future Work

• Expand our support for updates
• Implementation of visualization tool
• Experiments with materialization scheduling
• Incremental updates to doi function
• Exploring stronger assumptions on query optimization
  – Efficient upper bounds on doi function?