Post on 23-Dec-2015


1

Creating Competitive Products

Qian Wan[1], Raymond Chi-Wing Wong[1], Ihab F. Ilyas[2], M. Tamer Ozsu[2], Yu Peng[1]

[1] Hong Kong University of Science and Technology

[2] University of Waterloo

Presented by Qian Wan

Prepared by Qian Wan

2

Outline

• Background – Skyline, Related Work

• Motivation – Examples, Problem Definition

• Algorithm – Framework, Grouping, Pruning

• Experiments – Synthetic and real data, 6 factors

• Conclusions

3

Skyline

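• Definition – The skyline contains the points that are not dominated by any other point (a point dominates another if it is no worse in every attribute and strictly better in at least one)

• Hotel searching problem – Distance to beach vs. price – Dominance – Skyline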

[Figure: hotels H1 to H9 plotted by distance to beach (Dist) against Price; a second panel shows only H1 and H2]
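The definition can be made concrete with a short Python sketch. It assumes that smaller values are better in every attribute (as for distance to beach and price) and uses a simple nested-loop scan rather than a specialised skyline algorithm. The four hotels `X1` to `X4` and their (distance, price) values are hypothetical, since the figure's coordinates are not recoverable.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumes smaller values are better in every attribute.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Dict[str, Point]) -> List[str]:
    """Names of the points that no other point dominates (O(n^2) nested loop)."""
    return [
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    ]


# Hypothetical hotels as (distance to beach, price).
hotels = {"X1": (1, 9), "X2": (3, 4), "X3": (4, 5), "X4": (8, 1)}
print(skyline(hotels))  # ['X1', 'X2', 'X4']; X3 is dominated by X2
```

A point never dominates itself (the strict-improvement condition fails), so the loop does not need to skip `p` when comparing.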

4

Related Work

• Skyline Queries in DBMS [S. Borzsonyi, 2001]

• Single-Table Skyline Queries – Bitmaps [K.L. Tan, 2001], Nearest Neighbor [D. Kossmann, 2002], Branch and Bound Skylines [D. Papadias, 2005]

• Multi-Table Skyline Queries – Natural Join [W. Jin, 2007], [D. Sun, 2008]

– Our Work: join different source tables via a “Cartesian product”-like procedure


6

A Travel Agency’s Database

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

1. Direct attributes: copied unchanged from a source table (No-of-stops, Distance-to-beach, Hotel-class)
2. Indirect attributes: derived from several source tables (here, Price = Flight-cost + Hotel-cost)
3. Common case: one indirect attribute, e.g. Travel Agency (Price), PC Manufacturer (Price), and Logistic Transportation Service (Price)

Notation: source tables $T_1, T_2$; existing packages $T_E$; newly created packages $T_Q$; the slide also marks the skyline tuples.
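The “Cartesian product”-like creation of new packages can be sketched in Python with the two source tables of this slide. The sketch assumes, as the example values suggest (for instance 120 + 100 = 220 for Q1), that the only indirect attribute, Price, is Flight-cost + Hotel-cost; `create_packages` is a hypothetical helper name.

```python
from itertools import product
from typing import Dict, Tuple

Hotel = Tuple[int, int, int]          # (Distance-to-beach, Hotel-class, Hotel-cost)
Flight = Tuple[int, int]              # (No-of-stops, Flight-cost)
Package = Tuple[int, int, int, int]   # (No-of-stops, Distance-to-beach, Hotel-class, Price)

HOTELS: Dict[str, Hotel] = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}
FLIGHTS: Dict[str, Flight] = {"F1": (0, 120), "F2": (1, 100)}


def create_packages(
    flights: Dict[str, Flight], hotels: Dict[str, Hotel]
) -> Dict[Tuple[str, str], Package]:
    """Pair every flight with every hotel.

    Direct attributes are copied unchanged; the indirect attribute Price is
    assumed to be the sum of the two costs.
    """
    packages = {}
    for (f, (stops, f_cost)), (h, (dist, cls, h_cost)) in product(flights.items(), hotels.items()):
        packages[(f, h)] = (stops, dist, cls, f_cost + h_cost)
    return packages


for key, row in create_packages(FLIGHTS, HOTELS).items():
    print(key, row)
```

The first three printed rows reproduce Q1, Q2 and Q3 from the table; with k source tables the same idea would take the product of k tables.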

7

Finding Competitive Products

• Given a set of source tables $T_1, T_2, \ldots, T_k$

• Market packages $T_E$

• New packages $T_Q$

• Then, a tuple $q$ in $T_Q$ is said to be a competitive product if $q$ is in the skyline with respect to $T_E \cup T_Q$
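The definition translates almost literally into code. In this Python sketch (all attributes smaller-is-better, as in the example; `is_competitive` is a hypothetical name), `T_E` holds the existing packages P1 to P4, and `T_Q` holds Q1 to Q3 from the travel-agency example plus one hypothetical package `Qx` that P1 and P2 dominate.

```python
from typing import Dict, Sequence, Tuple

Row = Tuple[int, ...]  # (No-of-stops, Distance-to-beach, Hotel-class, Price)


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    """Smaller is better in every attribute."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def is_competitive(q: Row, t_e: Dict[str, Row], t_q: Dict[str, Row]) -> bool:
    """q (a tuple of T_Q) is competitive iff no tuple of T_E or T_Q dominates it."""
    return not any(dominates(t, q) for t in list(t_e.values()) + list(t_q.values()))


T_E = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170), "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300)}
T_Q = {"Q1": (0, 100, 3, 220), "Q2": (0, 200, 2, 210), "Q3": (0, 400, 1, 200), "Qx": (1, 150, 2, 260)}

print([name for name, q in T_Q.items() if is_competitive(q, T_E, T_Q)])  # ['Q1', 'Q2', 'Q3']
```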

8

Naïve Solution

Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q24(F4,H6)  2            200                3            210

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

1. Intra-dominance checking (newly created packages against each other)
2. Inter-dominance checking (newly created packages against existing packages)

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
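The naïve solution can be sketched in Python on this slide's data: build every flight/hotel combination, run intra-dominance checking among the new packages, then inter-dominance checking against the existing packages. As before, the sketch assumes smaller-is-better attributes and Price = Flight-cost + Hotel-cost; `naive_competitive` is a hypothetical name, and the quadratic loops are deliberately unoptimised.

```python
from itertools import product
from typing import List, Sequence, Tuple

HOTELS = {  # (Distance-to-beach, Hotel-class, Hotel-cost)
    "H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
    "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120),
}
FLIGHTS = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}  # (No-of-stops, Flight-cost)
EXISTING = {  # (No-of-stops, Distance-to-beach, Hotel-class, Price)
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_competitive(flights, hotels, existing) -> List[Tuple[str, str]]:
    # Create all |flights| x |hotels| new packages.
    new = {
        (f, h): (stops, dist, cls, f_cost + h_cost)
        for (f, (stops, f_cost)), (h, (dist, cls, h_cost)) in product(flights.items(), hotels.items())
    }
    # 1. Intra-dominance checking: drop packages dominated by another new package.
    survivors = {k: q for k, q in new.items() if not any(dominates(o, q) for o in new.values())}
    # 2. Inter-dominance checking: drop packages dominated by an existing package.
    return [k for k, q in survivors.items() if not any(dominates(p, q) for p in existing.values())]


print(naive_competitive(FLIGHTS, HOTELS, EXISTING))
```

On this data the result is the pairs for Q1, Q2, Q3, Q7 and Q13, the competitive products listed on the slide. All 24 combinations are materialised and compared pairwise, which is the cost the grouping and pruning steps aim to reduce.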


10

Algorithm Overview

• Intra-dominance checking (Framework)
  – To find the skyline in the source tables

• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree index on existing market packages
  – Full pruning
  – Partial pruning

• Post-processing

11

Intra-dominance Checking

Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140
H6     200                3            120

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80
F4      2            90

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

1. No intra-dominance checking is needed among the new packages (one indirect attribute)
2. No competitive products are missing

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
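The two claims on this slide can be checked with a small Python sketch, assuming Price = Flight-cost + Hotel-cost. If a source tuple is dominated inside its own table, every package built from it is dominated by the package built from its dominator, so no competitive product is missing. Conversely, if all source tuples are skyline tuples, a flight that is no worse on No-of-stops cannot also be cheaper (it would then dominate the other flight), and likewise for hotels; a package's summed Price therefore cannot undercut another's while matching it on the direct attributes, so two such packages never dominate each other. Only inter-dominance checking remains. `framework_competitive` is a hypothetical name.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

HOTELS = {  # (Distance-to-beach, Hotel-class, Hotel-cost)
    "H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80),
    "H4": (150, 2, 150), "H5": (170, 2, 140), "H6": (200, 3, 120),
}
FLIGHTS = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80), "F4": (2, 90)}  # (No-of-stops, Flight-cost)
EXISTING = {  # (No-of-stops, Distance-to-beach, Hotel-class, Price)
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table: Dict[str, Tuple[int, ...]]) -> Dict[str, Tuple[int, ...]]:
    """Intra-dominance checking inside one source table."""
    return {k: t for k, t in table.items() if not any(dominates(o, t) for o in table.values())}


def framework_competitive(flights, hotels, existing) -> List[Tuple[str, str]]:
    result = []
    # Combine only the skyline tuples of the source tables.
    for (f, (stops, f_cost)), (h, (dist, cls, h_cost)) in product(
        skyline(flights).items(), skyline(hotels).items()
    ):
        q = (stops, dist, cls, f_cost + h_cost)
        # No intra-dominance checking among new packages (one additive indirect
        # attribute); only inter-dominance checking against existing packages.
        if not any(dominates(p, q) for p in existing.values()):
            result.append((f, h))
    return result


print(list(skyline(HOTELS)), list(skyline(FLIGHTS)))
print(framework_competitive(FLIGHTS, HOTELS, EXISTING))
```

Only 15 of the 24 combinations are built, and the result is again the pairs for Q1, Q2, Q3, Q7 and Q13.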


13

Inter-dominance Checking

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150
P4       1            150                4            300

Skyline in Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

No missing competitive products

An R*-tree will speed up the inter-dominance checking

[Figure: R*-tree over the existing vacation packages with nodes R0 to R5; inter-dominance checking is performed as a range query]
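Two ideas from this slide can be sketched in Python. First, only the skyline of the existing packages is needed: if a dominated package such as P4 dominates a new package, the package that dominates P4 dominates it too, because dominance is transitive. Second, any dominator of a package q must lie in the axis-aligned box bounded above by q in every attribute, so the check is a range query. The sketch answers that range query with a linear scan standing in for the R*-tree; building the index is not shown, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]  # (No-of-stops, Distance-to-beach, Hotel-class, Price)

EXISTING: Dict[str, Row] = {
    "P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170),
    "P3": (1, 300, 1, 150), "P4": (1, 150, 4, 300),
}


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(table: Dict[str, Row]) -> Dict[str, Row]:
    return {k: t for k, t in table.items() if not any(dominates(o, t) for o in table.values())}


def in_query_box(p: Sequence[int], q: Sequence[int]) -> bool:
    """Range-query predicate: p <= q in every attribute."""
    return all(a <= b for a, b in zip(p, q))


def dominators(q: Row, index: Dict[str, Row]) -> List[str]:
    # Linear scan standing in for an R*-tree range query over `index`.
    hits = [k for k, p in index.items() if in_query_box(p, q)]
    # A point equal to q lies in the box but does not dominate it, so verify.
    return [k for k in hits if dominates(index[k], q)]


sky_e = skyline(EXISTING)
print(list(sky_e))                          # ['P1', 'P2', 'P3']
print(dominators((1, 200, 2, 190), sky_e))  # ['P2']: the (F2, H2) package is dominated
print(dominators((0, 100, 3, 220), sky_e))  # []: Q1 survives
```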


15

Grouping

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Groups: A1, A2, B1, B2

Combined groups: C1 = {A1, B1}, C4 = {A2, B2}

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180

Full Pruning
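Grouping can be sketched in Python with a plain k-means (Lloyd's algorithm), one of the clustering options the Full Pruning slide mentions. Each source table's skyline tuples are clustered separately, and every hotel group is paired with every flight group, mirroring C1 = {A1, B1} and C4 = {A2, B2}. Treating the hotel clusters as the A groups and the flight clusters as the B groups, seeding with the first k tuples, and clustering on raw, unnormalised attribute values are all assumptions of this sketch; a real implementation would likely scale the attributes first.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]


def kmeans(points: Dict[str, Vec], k: int, max_iter: int = 20) -> List[List[str]]:
    """Lloyd's k-means seeded with the first k points; returns the non-empty clusters."""
    names = list(points)
    centroids = [tuple(map(float, points[n])) for n in names[:k]]
    clusters: List[List[str]] = []
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for n in names:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(points[n], centroids[i]))
            clusters[nearest].append(n)
        # Recompute each centroid as the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(points[n][d] for n in c) / len(c) for d in range(len(centroids[i])))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


HOTELS = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80), "H4": (150, 2, 150), "H5": (170, 2, 140)}
FLIGHTS = {"F1": (0, 120), "F2": (1, 100), "F3": (2, 80)}

hotel_groups = kmeans(HOTELS, k=2)    # hypothetical A1, A2
flight_groups = kmeans(FLIGHTS, k=2)  # hypothetical B1, B2
combined = list(product(hotel_groups, flight_groups))  # C1 = (A1, B1), ..., C4 = (A2, B2)
for i, (a, b) in enumerate(combined, start=1):
    print(f"C{i}", a, b, "->", len(a) * len(b), "packages")
```

Whatever the clustering, the combined groups together cover all 5 × 3 = 15 combinations of skyline tuples, so grouping changes only how they are examined, not which ones exist.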

16

Full Pruning

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Group  Best Representative
C1     B1
C2     B2
…      …
Ci     Bi
…      …
Cj     Bj
…      …
Ck     Bk

Group members
Package    No-of-stops  Distance-to-beach  Hotel-class  Price
Q(F2,H4)   1            150                4            250
Q’(F2,H5)  1            170                4            240

Best Representative (attribute-wise minimum of the group)
Package    No-of-stops  Distance-to-beach  Hotel-class  Price
Min        1            150                4            240

Quality of the best representative: tightness of each group (clustering, e.g. KMeans)
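Full pruning can be sketched in Python as follows. A group's best representative is its attribute-wise minimum (the Min row). If some existing package p dominates that representative, then p is no worse than every member in every attribute and strictly better in at least one, so the whole group can be discarded without checking its members. For a combined group such as C1 = {A1, B1}, the same minimum could be computed from the source groups alone, because the minimum of a sum over a Cartesian product equals the sum of the minima; for brevity the sketch takes the members directly. Helper names are hypothetical.

```python
from typing import Sequence, Tuple

Row = Tuple[int, ...]  # (No-of-stops, Distance-to-beach, Hotel-class, Price)


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def best_representative(members: Sequence[Row]) -> Row:
    """Attribute-wise minimum over the group's members."""
    return tuple(min(column) for column in zip(*members))


def fully_pruned(members: Sequence[Row], existing: Sequence[Row]) -> bool:
    """True if an existing package dominates the best representative (hence every member)."""
    rep = best_representative(members)
    return any(dominates(p, rep) for p in existing)


EXISTING_SKYLINE = [(0, 130, 2, 250), (1, 140, 2, 170), (1, 300, 1, 150)]  # P1, P2, P3
group = [(1, 150, 4, 250), (1, 170, 4, 240)]  # Q(F2,H4) and Q'(F2,H5) from the slide

print(best_representative(group))             # (1, 150, 4, 240)
print(fully_pruned(group, EXISTING_SKYLINE))  # True: P2 dominates the representative
```

The representative is only a bound: if it is not dominated, the group may still contain dominated members, which is where partial pruning comes in.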


18

Partial Pruning

• Full pruning prunes all members in the group

• Partial pruning prunes some members in the group

• Partial pruning is used when full pruning cannot be applied

Idea

• Direct attributes do not change

• Estimate the best possible value for the indirect attributes

• Eliminate a combination if
  – it is dominated on all direct attributes, and
  – it is dominated on all indirect attributes according to their best estimation
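One way to realise this idea, following the Meta Transformation example in the appendix, is sketched below in Python on the appendix data. Each flight keeps its direct attribute and gets the best possible Price, Flight-cost plus the cheapest hotel in the group; each hotel likewise gets Hotel-cost plus the cheapest flight. Each existing package is split into a flight-side projection (No-of-stops, Price) and a hotel-side projection (Distance-to-beach, Hotel-class, Price). A pair (f, h) is eliminated when a single existing package dominates both meta tuples: the estimated prices never exceed the real one, so that package is no worse than the real combination everywhere and strictly better somewhere. The additive Price and the helper names are assumptions of this sketch.

```python
from typing import Dict, Sequence, Set, Tuple

FLIGHTS = {"F1": (0, 120), "F2": (1, 100)}                              # (No-of-stops, Flight-cost)
HOTELS = {"H1": (100, 3, 100), "H2": (200, 2, 90), "H3": (400, 1, 80)}  # (Distance, class, Hotel-cost)
EXISTING = {"P1": (0, 130, 2, 250), "P2": (1, 140, 2, 170), "P3": (1, 300, 1, 150)}


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def partially_pruned_pairs(flights, hotels, existing) -> Set[Tuple[str, str]]:
    min_flight_cost = min(cost for _, cost in flights.values())
    min_hotel_cost = min(cost for _, _, cost in hotels.values())
    # Meta tuples: direct attributes unchanged, Price replaced by its best possible value.
    meta_f = {f: (stops, cost + min_hotel_cost) for f, (stops, cost) in flights.items()}
    meta_h = {h: (dist, cls, cost + min_flight_cost) for h, (dist, cls, cost) in hotels.items()}
    # Flight-side and hotel-side projections of each existing package.
    side_f = {p: (s, price) for p, (s, _, _, price) in existing.items()}
    side_h = {p: (d, c, price) for p, (_, d, c, price) in existing.items()}
    # For every source tuple, the existing packages that dominate its meta tuple.
    dom_f: Dict[str, Set[str]] = {f: {p for p in existing if dominates(side_f[p], m)} for f, m in meta_f.items()}
    dom_h: Dict[str, Set[str]] = {h: {p for p in existing if dominates(side_h[p], m)} for h, m in meta_h.items()}
    # Eliminate a pair when one existing package dominates both of its meta tuples.
    return {(f, h) for f in flights for h in hotels if dom_f[f] & dom_h[h]}


print(sorted(partially_pruned_pairs(FLIGHTS, HOTELS, EXISTING)))  # [('F2', 'H2'), ('F2', 'H3')]
```

The meta tuples computed here match the appendix (Meta-Flight F1 = (0, 200), F2 = (1, 180); Meta-Hotel H1 = (100, 3, 200), H2 = (200, 2, 190), H3 = (400, 1, 180)). Besides {F2} × {H2}, the sketch also eliminates (F2, H3), whose package (1, 400, 1, 180) is dominated by P3.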

19

Algorithm Overview

• Framework

• Intra-dominance checking
  – To find the skyline in the source tables

• Inter-dominance checking
  – Skyline in existing market packages
  – R*-tree index on existing market packages
  – Full pruning
  – Partial pruning

• Post-processing

20

Post-processing

• More than one indirect attribute
  – Calculation: the previous algorithm, followed by intra-dominance checking
  – Any existing skyline algorithm can be used
  – Post-processing cost depends on the number of competitive products
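With more than one indirect attribute, the argument that packages built from skyline tuples never dominate one another no longer holds, which is why post-processing is needed. The Python sketch below uses hypothetical source tuples with two additive indirect attributes; every tuple is in the skyline of its own table, yet (f2, h2) = (8, 8) dominates both (f1, h3) and (f3, h1) = (10, 10). A plain nested-loop skyline stands in for "any existing skyline algorithm". In the full pipeline the candidates would be the competitive products left after inter-dominance checking; existing packages are omitted here for brevity.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Key = Tuple[str, str]
Vec = Tuple[int, ...]

# Hypothetical source tuples with two indirect attributes each (for example,
# two different costs that both add up when a flight and a hotel are combined).
FLIGHTS: Dict[str, Vec] = {"f1": (0, 10), "f2": (4, 4), "f3": (10, 0)}
HOTELS: Dict[str, Vec] = {"h1": (0, 10), "h2": (4, 4), "h3": (10, 0)}


def dominates(p: Sequence[int], q: Sequence[int]) -> bool:
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def combine(flights: Dict[str, Vec], hotels: Dict[str, Vec]) -> Dict[Key, Vec]:
    """Both indirect attributes are assumed to be sums of the source values."""
    return {
        (f, h): tuple(a + b for a, b in zip(fv, hv))
        for (f, fv), (h, hv) in product(flights.items(), hotels.items())
    }


def post_process(candidates: Dict[Key, Vec]) -> Dict[Key, Vec]:
    """Intra-dominance checking over the candidates (nested-loop skyline)."""
    return {k: v for k, v in candidates.items() if not any(dominates(o, v) for o in candidates.values())}


result = post_process(combine(FLIGHTS, HOTELS))
print(sorted(set(combine(FLIGHTS, HOTELS)) - set(result)))  # [('f1', 'h3'), ('f3', 'h1')]
```

Because the check is quadratic in the number of candidates here, the post-processing cost grows with the number of competitive products, as the slide notes.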


22

Experiments

• Pentium IV 2.4 GHz PC with 4 GB memory, Linux platform, C++

• Synthetic anti-correlated datasets

• Real datasets: Travel Agency A and Travel Agency B
  – A: 296 packages, 1014 hotels, and 4394 flights
  – B: 149 packages, 995 hotels, and 866 flights

• Implementation
  – Algorithm for Creating Competitive Products (ACCP)
  – Baseline algorithm
  – Naïve algorithm

Algorithm  Preprocessing  R* Tree  Pruning
ACCP       Yes            Yes      Yes
Baseline   Yes            Yes      No
Naïve      No             No       No

23

Synthetic Datasets

Parameter                                      Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           2
Size of existing packages                      5M
Size of each source table                      100k

• Schema is the same as in the example

• Anti-correlated

• 6 factors

• Measurement
  – Execution time
  – Pruning power
  – Ratio of competitive products out of all combinations
  – Memory usage
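The slides do not say how the anti-correlated data were generated. As a rough illustration only, the Python sketch below draws points whose coordinates sum to d·v with v concentrated around 0.5, so a point that is good in one attribute tends to be poor in another. It loosely follows the idea of the anti-correlated datasets introduced in [S. Borzsonyi, 2001], but it is a simplified assumption, not the generator used in these experiments; `anti_correlated` is a hypothetical name.

```python
import random
from typing import List, Tuple


def anti_correlated(n: int, d: int, seed: int = 0) -> List[Tuple[float, ...]]:
    """Return n points in [0, 1]^d whose coordinates sum to d * v, with v near 0.5."""
    rng = random.Random(seed)
    points: List[Tuple[float, ...]] = []
    while len(points) < n:
        v = rng.gauss(0.5, 0.05)              # plane x_1 + ... + x_d = d * v
        r = [rng.random() for _ in range(d)]
        shift = v - sum(r) / d                # recentre r so that its mean becomes v
        p = tuple(x + shift for x in r)
        if all(0.0 <= x <= 1.0 for x in p):   # reject points outside the unit cube
            points.append(p)
    return points


pts = anti_correlated(1000, 2, seed=1)
xs, ys = [p[0] for p in pts], [p[1] for p in pts]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
print(f"covariance of the two attributes: {cov:.3f}")  # expected to be negative
```

Rejection sampling keeps every accepted point exactly on its plane; both v and the offsets are redrawn on each attempt, so the loop cannot get stuck on an infeasible v.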

24

Experiments

Parameter                                      Execution time  Pruning power  Ratio of competitive products  Memory usage
No. of attributes in each source table         1               2              3                              4
No. of indirect attributes in a product table  5               6              7                              8
No. of source tables                           9               10             11                             12
No. of clusters in each source table           13              14             15                             16
Size of existing packages                      17              18             19                             20
Size of each source table                      21              22             23                             24

25

Experiments

[Figure: varying from 100k to 500k; full pruning & partial pruning; TQ, TQ’, and TR]

Pruning power slightly increases

Parameter                                      Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k

26

Outline

• Background – Skyline

• Motivation – Examples & Problem Definition

• Algorithm – Framework, Partition, Pruning

• Experiments – On both synthetic and real data, over 6 factors

• Conclusions

27

Conclusions

• Creating Competitive Products
  – Example
  – Problem Definition

• Algorithms
  – Framework
  – Intra-dominance checking
  – Inter-dominance checking
  – Post-processing

• Experiments
  – Synthetic anti-correlated datasets
  – Real datasets

28

THANK YOU!

Q&A

29

APPENDIX

30

Partial Pruning

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

Skyline Tuples of Source Tables
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
H4     150                2            150
H5     170                2            140

Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
F3      2            80

Newly Created Vacation Packages
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180
…           …            …                  …            …
Q15(F3,H5)  2            170                3            200

Competitive Products
Package     No-of-stops  Distance-to-beach  Hotel-class  Price
Q1(F1,H1)   0            100                3            220
Q2(F1,H2)   0            200                2            210
Q3(F1,H3)   0            400                1            200
…           …            …                  …            …
Q7(F2,H1)   1            100                3            200
…           …            …                  …            …
Q13(F3,H1)  2            100                3            180

Groups: A1, B1; C1 = {A1, B1}

Full Pruning

Meta Transformation

Existing Vacation Packages
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P1       0            130                2            250
P2       1            140                2            170
P3       1            300                1            150

P2 and its projections:
Package  No-of-stops  Distance-to-beach  Hotel-class  Price
P2       1            140                2            170

Package  No-of-stops  Price
P2       1            170

Package  Distance-to-beach  Hotel-class  Price
P2       140                2            170

A1
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            100
H2     200                2            90
H3     400                1            80
Min    400                1            80

B1
Flight  No-of-stops  Flight-cost
F1      0            120
F2      1            100
Min     1            100

Meta-Hotel
Hotel  Distance-to-beach  Hotel-class  Hotel-cost
H1     100                3            200
H2     200                2            190
H3     400                1            180

Meta-Flight
Flight  No-of-stops  Flight-cost
F1      0            200
F2      1            180

• No inter-dominance checking for {F2} × {H2}

32

Experiments

[Figure: varying from 2.5M to 10M; annotations “More competitive” and “Slightly decreases”]

Parameter                                      Default value
No. of attributes in each source table         4
No. of indirect attributes in a product table  1
No. of source tables                           2
No. of clusters in each source table           6
Size of existing packages                      5M
Size of each source table                      100k

33

Experiments

Travel Agency A Package Generation Set

1. A: 296 packages, 1014 hotels, and 4394 flights; B: 149 packages, 995 hotels, and 866 flights

2. Source tables from B, and packages from A

3. Vary discount from 0 to 0.50

4. Efficiency: ACCP (44.74 s) and Baseline (84.47 s)

5. |SKY|/|TQ|

6. |DOM|/|TE|

[Figure series: DOM, SKY]
