Relaxing Join and Selection Queries

Rares Vernica, UC Irvine, USA
Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung

Post on 07-Feb-2016


Page 1: Relaxing Join and Selection Queries

Relaxing Join and Selection Queries

Rares Vernica
UC Irvine, USA

Joint work with Nick Koudas, Chen Li, and Anthony K. H. Tung

Page 2: Relaxing Join and Selection Queries

Rares Vernica, UC Irvine 2

Query Example

SELECT * FROM Jobs J, Candidates C
WHERE J.Salary <= 95
AND J.Zipcode = C.Zipcode
AND C.WorkExp >= 5;

Jobs
ID   Company    Zipcode  Salary
J1   Broadcom   92047    80
J2   Intel      93652    95
J3   Microsoft  82632    120
J4   IBM        90391    130
...  ...        ...      ...

Candidates
ID   Zipcode  ExpSalary  WorkExp
C1   93652    120        3
C2   92612    130        6
C3   82632    100        5
C4   90391    150        1
...  ...      ...        ...

Page 3: Relaxing Join and Selection Queries

What if the query answer is empty?

SELECT * FROM Jobs J, Candidates C
WHERE J.Salary <= 95
AND J.Zipcode = C.Zipcode
AND C.WorkExp >= 5;

Adjust the conditions
• What conditions to adjust?
• How to adjust them?
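The empty answer can be reproduced with a small sketch. It assumes only the four rows per table that are visible on the slides (the elided "..." rows are not guessed), uses Python's built-in `sqlite3` module, and the helper name `run_query` is made up for illustration. The second call shows one possible adjustment of the Salary condition, not a recommended one.

```python
import sqlite3

# Only the rows visible on the slides; the "..." rows are left out.
JOBS = [("J1", "Broadcom", 92047, 80), ("J2", "Intel", 93652, 95),
        ("J3", "Microsoft", 82632, 120), ("J4", "IBM", 90391, 130)]
CANDIDATES = [("C1", 93652, 120, 3), ("C2", 92612, 130, 6),
              ("C3", 82632, 100, 5), ("C4", 90391, 150, 1)]

QUERY = """
SELECT J.ID, C.ID FROM Jobs J, Candidates C
WHERE J.Salary <= ? AND J.Zipcode = C.Zipcode AND C.WorkExp >= ?
ORDER BY J.ID, C.ID
"""

def run_query(max_salary, min_work_exp):
    """Run the slide's query with adjustable selection thresholds (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Jobs (ID TEXT, Company TEXT, Zipcode INTEGER, Salary INTEGER)")
    conn.execute("CREATE TABLE Candidates (ID TEXT, Zipcode INTEGER, ExpSalary INTEGER, WorkExp INTEGER)")
    conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?)", JOBS)
    conn.executemany("INSERT INTO Candidates VALUES (?, ?, ?, ?)", CANDIDATES)
    rows = conn.execute(QUERY, (max_salary, min_work_exp)).fetchall()
    conn.close()
    return rows

original = run_query(95, 5)   # thresholds from the slide: no row pair qualifies
relaxed = run_query(120, 5)   # Salary condition adjusted by hand to 120
print(original, relaxed)
```

On these rows the original thresholds return nothing, while the hand-adjusted Salary threshold returns the pair (J3, C3).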

Page 4: Relaxing Join and Selection Queries

Example Percentages of Empty Result Queries

• In a Customer Relationship Management (CRM) application developed by IBM: 18.07% (3,396 empty result queries in 18,793 queries)
• In a real estate application developed by IBM: 5.75%
• In a digital library application [JCM+00]: 10.53%
• In a bioinformatics application [RCP+98]: 38%

Source: Gang Luo (IBM T.J. Watson Research Center, USA): Efficient Detection of Empty-Result Queries. VLDB 2006, p. 1015

Page 5: Relaxing Join and Selection Queries

Observations

[Jobs and Candidates tables, as on Page 2, with the conditions Salary <= 95 and WorkExp >= 5 marked]

• Different ways to adjust the conditions: Select vs. Join
• How much to adjust each condition? Salary <= 100 vs. Salary <= 120
• Adjust join vs. Adjust both selections

Page 6: Relaxing Join and Selection Queries

Contributions

• Query relaxation framework for selections and joins
• Lattice-based approach for query relaxation
• Efficient relaxation algorithms

Page 7: Relaxing Join and Selection Queries

Overview

1. Motivation
2. Query Relaxation
3. Lattice-based Relaxation
4. Relaxation Algorithms
5. Variations
6. Experiments

Page 8: Relaxing Join and Selection Queries

Query Relaxation

• Top-k / Nearest neighbor
  - Weight for each condition
• Skyline
  - No weights are needed
  - Conditions are not considered equal
  - Return non dominated points
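The "non dominated points" idea can be made concrete with a minimal sketch. It assumes each candidate answer is described by a vector of relaxation amounts (how far each condition had to be loosened, smaller is better); the sample vectors and the function names `dominates` and `skyline` are illustrative, not taken from the talk.

```python
def dominates(p, q):
    """p dominates q: p is no worse in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Return the non-dominated points (quadratic scan, for illustration only)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical relaxation vectors: (extra salary allowed, missing years of experience).
points = [(0, 2), (25, 0), (35, 4), (5, 2)]
result = skyline(points)
print(result)
```

Here (35, 4) and (5, 2) are both dominated by (0, 2), so only (0, 2) and (25, 0) remain. No weights were needed to decide that.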

Page 9: Relaxing Join and Selection Queries

Query Relaxation

[Figure: Skyline plotted over the J.Salary and C.WorkExp axes, relative to the query point J.Salary <= 95, C.WorkExp >= 5]

Stephan Börzsönyi, Donald Kossmann, Konrad Stocker: The Skyline Operator. ICDE 2001

Page 11: Relaxing Join and Selection Queries

Lattice-based Relaxation

[Jobs and Candidates tables, as on Page 2, with the conditions Salary <= 95 and WorkExp >= 5 marked]

Lattice nodes, by level:
∅
R, J, S
RJ, RS, JS
RJS

R – select on Jobs
J – join condition
S – select on Candidates
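The eight lattice nodes are the subsets of the three conditions R, J, and S. A short sketch of how they can be enumerated level by level follows; it assumes a node stands for the set of conditions being relaxed, and the function name `lattice_levels` is hypothetical.

```python
from itertools import combinations

CONDITIONS = ("R", "J", "S")  # select on Jobs, join condition, select on Candidates

def lattice_levels(conditions):
    """Group all subsets of the conditions by size; level k holds the subsets with k conditions."""
    return [["".join(subset) for subset in combinations(conditions, k)]
            for k in range(len(conditions) + 1)]

levels = lattice_levels(CONDITIONS)
for k, nodes in enumerate(levels):
    # The empty string at level 0 is the node that relaxes nothing.
    print(k, [node or "∅" for node in nodes])
```

With n conditions the lattice has 2^n nodes, which is why later slides look at how a relaxation algorithm for one node relates to the others.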

Page 13: Relaxing Join and Selection Queries

Relaxing Selection Conditions

[Jobs and Candidates tables, as on Page 2, with the conditions Salary <= 95 and WorkExp >= 5 marked]

Algorithm (INCORRECT):
1. Compute Skyline on Jobs
2. Compute Skyline on Candidates
3. Join the Skylines

[Figure: the Skylines computed on the two tables produce an Empty Join]

[Lattice figure with nodes ∅, R, J, S, RJ, RS, JS, RJS]
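The failure of this three-step plan can be replayed on the four visible rows of each table. The sketch below rests on two assumptions: each table has a single relaxable attribute, so its local Skyline is just the set of rows needing the least relaxation, and the relaxation amount is the distance to the threshold (95 for Salary, 5 for WorkExp). All function names are illustrative.

```python
# id -> (zipcode, relaxable attribute); only rows visible on the slides.
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def salary_relaxation(salary):
    """How far Salary <= 95 must be loosened to admit this job."""
    return max(0, salary - 95)

def workexp_relaxation(work_exp):
    """How far WorkExp >= 5 must be loosened to admit this candidate."""
    return max(0, 5 - work_exp)

def local_skyline(rows, relaxation):
    """With one relaxable attribute, the Skyline is the set of rows with minimal relaxation."""
    best = min(relaxation(value) for _, value in rows.values())
    return {key: row for key, row in rows.items() if relaxation(row[1]) == best}

job_sky = local_skyline(JOBS, salary_relaxation)          # J1 and J2 need no relaxation
cand_sky = local_skyline(CANDIDATES, workexp_relaxation)  # C2 and C3 need no relaxation

# Step 3 of the incorrect plan: join the two local Skylines on Zipcode.
joined = [(j, c) for j, (jz, _) in job_sky.items()
          for c, (cz, _) in cand_sky.items() if jz == cz]
print(sorted(job_sky), sorted(cand_sky), joined)
```

The local Skylines keep only rows that already satisfy their selection, and none of those rows share a Zipcode, so the join is empty even though relaxed answers exist.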

Page 14: Relaxing Join and Selection Queries

Relaxing Selection Conditions

[Jobs and Candidates tables, as on Page 2, with the conditions Salary <= 95 and WorkExp >= 5 marked]

Join First Algorithm:
1. Compute the join (disregarding the selections)
2. Compute Skyline on join results

[Figure: Join, followed by Skyline on the join results]

[Lattice figure with nodes ∅, R, J, S, RJ, RS, JS, RJS]
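The two Join First steps can be sketched on the visible rows as follows. The relaxation vector (extra salary over 95, missing years of experience under 5) is an assumed way of measuring how far each selection must be loosened, and `join_first` is a hypothetical name.

```python
# id -> (zipcode, relaxable attribute); only rows visible on the slides.
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def dominates(p, q):
    """p dominates q: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def join_first(jobs, candidates):
    # Step 1: equi-join on Zipcode, ignoring both selections.
    pairs = {}
    for j, (jz, salary) in jobs.items():
        for c, (cz, work_exp) in candidates.items():
            if jz == cz:
                pairs[(j, c)] = (max(0, salary - 95), max(0, 5 - work_exp))
    # Step 2: keep the join results whose relaxation vector is not dominated.
    return {pair: vec for pair, vec in pairs.items()
            if not any(dominates(other, vec) for other in pairs.values())}

result = join_first(JOBS, CANDIDATES)
print(result)
```

On this data the join yields three pairs; (J4, C4) needs more relaxation on both conditions than (J2, C1), so the Skyline keeps (J2, C1) and (J3, C3).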

Page 15: Relaxing Join and Selection Queries

Relaxing Selection Conditions: Variations

• Pruning Join
  - Build the Skyline during the join
• Pruning Join+
  - Pruning Join
  - Build the local Skyline before the join
• Sorted Access Join
  - Fagin’s Top-k: sort the columns on relaxation
  - Compute the join Skyline

[Lattice figure with nodes ∅, R, J, S, RJ, RS, JS, RJS]
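The first variation, building the Skyline during the join, might look like the sketch below. It is an illustration under assumptions rather than the authors' implementation: the join is a simple hash join on Zipcode, the relaxation vector is (extra salary over 95, missing experience under 5), and the name `pruning_join` is made up.

```python
# id -> (zipcode, relaxable attribute); only rows visible on the slides.
JOBS = {"J1": (92047, 80), "J2": (93652, 95), "J3": (82632, 120), "J4": (90391, 130)}
CANDIDATES = {"C1": (93652, 3), "C2": (92612, 6), "C3": (82632, 5), "C4": (90391, 1)}

def dominates(p, q):
    """p dominates q: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def pruning_join(jobs, candidates):
    # Hash the candidates by Zipcode so each job probes only its matches.
    by_zip = {}
    for c, (cz, work_exp) in candidates.items():
        by_zip.setdefault(cz, []).append((c, work_exp))

    sky = {}  # current Skyline: pair -> relaxation vector
    for j, (jz, salary) in jobs.items():
        for c, work_exp in by_zip.get(jz, []):
            vec = (max(0, salary - 95), max(0, 5 - work_exp))
            if any(dominates(kept, vec) for kept in sky.values()):
                continue  # dominated join result: never materialized in the output
            # The new result may evict Skyline entries it dominates.
            sky = {p: kept for p, kept in sky.items() if not dominates(vec, kept)}
            sky[(j, c)] = vec
    return sky

result = pruning_join(JOBS, CANDIDATES)
reversed_jobs = dict(reversed(list(JOBS.items())))
result_reversed = pruning_join(reversed_jobs, CANDIDATES)
print(result)
```

Compared with Join First, the full join result is never stored; each joined pair is checked against the running Skyline and dropped or inserted immediately. The outcome does not depend on the order in which jobs are scanned.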

Page 16: Relaxing Join and Selection Queries

Relaxing all conditions

Multi-Dim.-Index-based-Relaxation Algorithm:
1. Traverse the index structure top-down
2. Form pairs of nodes or records
3. Build the Skyline

[Figure: two multidimensional index trees, a Queue of node pairs, and the Skyline under construction.
Jobs index (Zipcode range, Salary range): root 82632 - 93652, 80 - 130; children 82632 - 90391, 120 - 130 (records 82632/120, 90391/130, ...) and 92047 - 93652, 80 - 95.
Candidates index (Zipcode range, WorkExp range): root 82632 - 93652, 1 - 6; children 82632 - 90391, 1 - 5 (records 82632/5, 90391/1, ...) and 92612 - 93652, 3 - 6.
Queue: pairs of index nodes such as (82632 - 90391, 120 - 130) with (82632 - 90391, 1 - 5) and with (92612 - 93652, 3 - 6), ...]

[Lattice figure with nodes ∅, R, J, S, RJ, RS, JS, RJS]
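The three steps can be sketched with a best-first traversal over pairs of index entries. Several things here are assumptions, not details from the talk: the trees are hand-built two-level hierarchies mirroring the ranges in the figure (not a real R-tree), the join is relaxed by the absolute Zipcode difference, and pairs are ordered by the sum of their lower-bound relaxations.

```python
import heapq
from itertools import count

# A record is ("rec", id, zipcode, value); an inner node is ("node", children).
JOBS_TREE = ("node", [
    ("node", [("rec", "J3", 82632, 120), ("rec", "J4", 90391, 130)]),
    ("node", [("rec", "J1", 92047, 80), ("rec", "J2", 93652, 95)]),
])
CANDS_TREE = ("node", [
    ("node", [("rec", "C3", 82632, 5), ("rec", "C4", 90391, 1)]),
    ("node", [("rec", "C2", 92612, 6), ("rec", "C1", 93652, 3)]),
])

def box(entry):
    """Bounding ranges (zip_lo, zip_hi, val_lo, val_hi) of a record or subtree."""
    if entry[0] == "rec":
        return entry[2], entry[2], entry[3], entry[3]
    boxes = [box(child) for child in entry[1]]
    return (min(b[0] for b in boxes), max(b[1] for b in boxes),
            min(b[2] for b in boxes), max(b[3] for b in boxes))

def lower_bound(job, cand):
    """Smallest possible (salary, zipcode gap, experience) relaxation inside a pair."""
    jz_lo, jz_hi, sal_lo, _ = box(job)
    cz_lo, cz_hi, _, exp_hi = box(cand)
    zip_gap = max(0, jz_lo - cz_hi, cz_lo - jz_hi)
    return max(0, sal_lo - 95), zip_gap, max(0, 5 - exp_hi)

def dominates(p, q):
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def index_relaxation(jobs_root, cands_root):
    tie = count()  # tie-breaker so the heap never compares tree entries
    queue = [(0, next(tie), jobs_root, cands_root)]
    sky = {}
    while queue:
        _, _, job, cand = heapq.heappop(queue)
        vec = lower_bound(job, cand)
        if any(dominates(kept, vec) for kept in sky.values()):
            continue  # everything inside this pair is dominated: prune it
        if job[0] == "rec" and cand[0] == "rec":
            sky[(job[1], cand[1])] = vec  # for two records the bound is exact
            continue
        # Step 2: form pairs from the children of the two entries.
        for j in (job[1] if job[0] == "node" else [job]):
            for c in (cand[1] if cand[0] == "node" else [cand]):
                heapq.heappush(queue, (sum(lower_bound(j, c)), next(tie), j, c))
    return sky

result = index_relaxation(JOBS_TREE, CANDS_TREE)
print(result)
```

Because pairs are visited in order of their lower bound, a record pair that reaches the front of the queue without being dominated can be added to the Skyline directly, and whole subtrees are skipped once their best possible vector is dominated.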

Page 18: Relaxing Join and Selection Queries

Variations

• Computing Top-k over Skyline
  - Weight to each condition
• Queries with multiple joins
• Conditions on nonnumeric attributes
  - Dominance checking function

[Figure: Top 2 answers plotted over the J.Salary and C.WorkExp axes, relative to the query point J.Salary <= 95, C.WorkExp >= 5]
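Top-k over Skyline can be sketched as a two-stage filter: compute the Skyline first, then rank only its members by a weighted sum. The relaxation vectors, the weights, and the name `top_k_over_skyline` below are hypothetical, and a weighted sum is only one possible scoring function.

```python
def dominates(p, q):
    """p dominates q: no worse everywhere, strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_over_skyline(vectors, weights, k):
    """Keep the non-dominated vectors, then return the k with the smallest weighted sum."""
    sky = [v for v in vectors if not any(dominates(w, v) for w in vectors)]

    def score(vec):
        return sum(weight * amount for weight, amount in zip(weights, vec))

    return sorted(sky, key=score)[:k]

# Hypothetical relaxation vectors: (extra salary allowed, missing years of experience).
vectors = [(0, 2), (25, 0), (35, 4), (10, 1)]
# Assumed weights: one missing year of experience counts as much as 5 salary units.
top2 = top_k_over_skyline(vectors, weights=(1, 5), k=2)
print(top2)
```

With these weights the Skyline members score 10, 25, and 15, so the Top 2 are (0, 2) and (10, 1). The dominated vector (35, 4) is never ranked.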

Page 20: Relaxing Join and Selection Queries

Experimental Setting

Datasets
• Real
  1. Internet Movie Database (IMDB): Movies (120k) & ActorInMovies (1.2m)
  2. Census-Income – UCI KDD Repository: Census (200k)
• Synthetic
  - Independent, Correlated, and Anticorrelated

Implementation
• GNU C++
• Spatial Index Library (R-tree)
• Linux, AMD Opteron 240, 1GB RAM

Page 21: Relaxing Join and Selection Queries

Different algorithms, different behaviors

[Figure: IMDB Dataset]

Page 22: Relaxing Join and Selection Queries

Different datasets, different behaviors

[Figures: Correlated Dataset, Anticorrelated Dataset, Independent Dataset]

Page 23: Relaxing Join and Selection Queries

How big is the Skyline?

Page 24: Relaxing Join and Selection Queries

Relaxing join takes time

[Figure: Self-join on Census Dataset]

Page 25: Relaxing Join and Selection Queries

Top-k over Skyline

[Figure: IMDB Dataset]

Page 26: Relaxing Join and Selection Queries

Related Work

• Muslea et al.
  - Alternate forms of conjunctive expressions
• Efficient Skyline algorithms
  - Selection queries
• Efficient Top-k algorithms
  - Require weights for conditions

Page 27: Relaxing Join and Selection Queries

Conclusions

• Query relaxation framework for selections and joins
• Lattice-based approach for query relaxation
• Efficient relaxation algorithms

Page 28: Relaxing Join and Selection Queries

Future Work

• Optimum use of the lattice structure
• Relax conditions on string attributes
• Algorithms applicable outside the databases

Page 29: Relaxing Join and Selection Queries

Questions?

Page 31: Relaxing Join and Selection Queries

Skyline vs. Top-k

[Figure: two plots over the J.Salary and C.WorkExp axes, each relative to the query point J.Salary <= 95, C.WorkExp >= 5; one plot is labeled Top 2]

Page 32: Relaxing Join and Selection Queries

Skyline vs. Top-k over Skyline

[Figure: two plots over the J.Salary and C.WorkExp axes, each relative to the query point J.Salary <= 95, C.WorkExp >= 5; one plot is labeled Top 2]