Linked Data Top-K Query Processing

Post on 10-May-2015

DESCRIPTION

"Linked Data Top-K Query Processing" paper at ESWC'12.

TRANSCRIPT

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu

Institute of Applied Informatics and Formal Description Methods (AIFB)

Top-k Linked Data Query Processing

Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Andreas Harth, and Rudi Studer

Introduction and Motivation

Top-k Linked Data Query Processing

Evaluation Results

INTRODUCTION & MOTIVATION

Linked Data Query Processing

Problems: Efficiency and Scalability

[Figure: the Linked Data query processing engine retrieves data from the data sources (Src.) by dereferencing URIs via HTTP lookups.]

Top-K Query Processing

Users are usually interested in only a few results

Top-K query processing addresses the efficiency and scalability issues

Src. 1: ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help .

Src. 2: ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy" .

Src. 3: ex:help foaf:name "Help!"; ex:song "Help!" .

SELECT * WHERE { ex:beatles ex:album ?album . ?album ex:song ?song . }

Contributions

Transfer top-k query processing to the Linked Data setting

Linked Data specific improvements of the top-k approach

Evaluation using real-world data

TOP-K LINKED DATA QUERY PROCESSING

Top-K Query Processing in a Linked Data Setting (1) – Requirements (1)

Source index mapping triple patterns to sources containing bindings (e.g., [1,2])

Ranking function determining the relevance of triple pattern bindings

TP1: ex:beatles ex:album ?album .

TP2: ?album ex:song ?song .

Source index:

TP1 → Src. 1 (score ∈ [0,1]): ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help .

TP2 → Src. 2 (score ∈ [1,2]): ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy" .

TP2 → Src. 3 (score ∈ [2,3]): ex:help foaf:name "Help!"; ex:song "Help!" .
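The two requirements on this slide can be pictured as a small lookup structure. The following is a minimal Python sketch, not the authors' implementation: `SourceEntry`, `SOURCE_INDEX`, and `sources_for` are hypothetical names, and reading the score intervals as per-source bounds on triple scores is an assumption based on the running example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    source: str       # source identifier, e.g. "Src. 1"
    min_score: float  # lowest score a triple from this source can have
    max_score: float  # highest score a triple from this source can have


# Hypothetical source index: triple pattern -> sources containing bindings.
# The intervals are the ones shown for Src. 1 to Src. 3 in the example.
SOURCE_INDEX = {
    "ex:beatles ex:album ?album": [SourceEntry("Src. 1", 0, 1)],
    "?album ex:song ?song": [
        SourceEntry("Src. 2", 1, 2),
        SourceEntry("Src. 3", 2, 3),
    ],
}


def sources_for(pattern):
    """Return the sources for a triple pattern, most promising first."""
    entries = SOURCE_INDEX.get(pattern, [])
    return sorted(entries, key=lambda e: e.max_score, reverse=True)


for entry in sources_for("?album ex:song ?song"):
    print(entry.source, entry.min_score, entry.max_score)
```

Ordering the entries by their maximal score is one plausible way to let a scheduler decide which source to load first; the slides do not prescribe this particular ordering.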

Top-K Query Processing in a Linked Data Setting (2) – Requirements (2)

Sorted access on each join input: bindings with descending scores

Scheduling strategy (load order): Src. 1, then Src. 3, then Src. 2

TP1: ex:beatles ex:album ?album, read from Src. 1 (score ∈ [0,1])

TP2: ?album ex:song ?song, read from Src. 3 (score ∈ [2,3]) and Src. 2 (score ∈ [1,2])
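Sorted access over several sources can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the operator from the paper: `sorted_access` and the `load` callback are hypothetical, and the concrete triple scores in `DATA` are made up to fit the intervals of the example. A binding is emitted only once its score is at least as high as the maximal score of every source that has not been loaded yet.

```python
import heapq


def sorted_access(sources, load):
    """Yield (score, triple) pairs in descending score order.

    sources: list of (name, max_score) pairs for one triple pattern.
    load:    hypothetical callback returning [(score, triple), ...] for a source.
    """
    pending = sorted(sources, key=lambda s: s[1], reverse=True)
    buffer = []  # max-heap, implemented with negated scores
    while pending or buffer:
        # Upper bound on any score still hidden in an unloaded source.
        unseen_bound = pending[0][1] if pending else float("-inf")
        if buffer and -buffer[0][0] >= unseen_bound:
            neg_score, triple = heapq.heappop(buffer)
            yield -neg_score, triple
        else:
            name, _ = pending.pop(0)
            for score, triple in load(name):
                heapq.heappush(buffer, (-score, triple))


# Illustrative scores (assumptions), chosen inside the intervals of the example.
DATA = {
    "Src. 2": [(2, 'ex:sgt_pepper ex:song "Lucy"')],
    "Src. 3": [(3, 'ex:help ex:song "Help!"')],
}

for score, triple in sorted_access([("Src. 2", 2), ("Src. 3", 3)], DATA.get):
    print(score, triple)
```

Because the generator is lazy, a consumer that stops after the first binding never triggers the load of Src. 2.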

Top-K Query Processing in a Linked Data Setting (3) – Push Bound Rank Join (1)

Scheduling Strategy: Load source 1

Src. 1: ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help .

Sorted Access for ex:beatles ex:album ?album .

Score   Seen Triples (TP1)
1       ex:beatles ex:album ex:sgt_pepper
1       ex:beatles ex:album ex:help

Scheduling Strategy: Load source 3

Src. 3: ex:help foaf:name "Help!"; ex:song "Help!" .

Sorted Access for ?album ex:song ?song

Score   Seen Triples (TP2)
3       ex:help ex:song "Help!"

Output Queue

Score   Query Bindings

Top-K Query Processing in a Linked Data Setting (4) – Push Bound Rank Join (2)

Sorted Access for ex:beatles ex:album ?album .

Score   Seen Triples (TP1)
1       ex:beatles ex:album ex:sgt_pepper
1       ex:beatles ex:album ex:help

Sorted Access for ?album ex:song ?song

Score   Seen Triples (TP2)
3       ex:help ex:song "Help!"

Output Queue

Score   Query Bindings
4       ex:beatles ex:album ex:help . ex:help ex:song "Help!" .

Threshold: 4

Found query binding with score ≥ threshold: STOP

[Figure: Src. 2 has not been loaded.]
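The walkthrough on these two slides can be replayed with a small rank join. The code below is a simplified, pull-style sketch for two inputs, not the push-based operator presented in the talk: `corner_bound` and `rank_join` are hypothetical names, the query score is assumed to be the sum of the triple scores, and the score 2 for the "Lucy" triple is an assumption inside the interval [1,2].

```python
import heapq


def corner_bound(seen, inputs):
    """Threshold max { max_1 + min_2 , max_2 + min_1 } over the seen scores."""
    if not seen[0] or not seen[1]:
        return float("inf")  # no join result can be judged yet
    terms = []
    for i, j in ((0, 1), (1, 0)):
        if len(seen[j]) < len(inputs[j]):  # input j still has unseen bindings
            terms.append(seen[i][0][0] + seen[j][-1][0])  # max_i + min_j
    return max(terms, default=float("-inf"))


def rank_join(tp1, tp2, k):
    """Return the top-k joins of tp1 and tp2 plus the bindings actually read.

    tp1, tp2: lists of (score, (s, p, o)) in descending score order.
    Join condition: object of the TP1 triple equals subject of the TP2 triple.
    """
    inputs, seen = (tp1, tp2), ([], [])
    candidates, output, turn = [], [], 0
    while len(output) < k:
        if len(seen[turn]) == len(inputs[turn]):
            turn = 1 - turn  # this input is exhausted, try the other one
        if len(seen[turn]) == len(inputs[turn]):
            threshold = float("-inf")  # both exhausted: everything is final
        else:
            score, triple = inputs[turn][len(seen[turn])]
            seen[turn].append((score, triple))
            for other_score, other in seen[1 - turn]:
                left, right = (triple, other) if turn == 0 else (other, triple)
                if left[2] == right[0]:
                    heapq.heappush(candidates, (-(score + other_score), left, right))
            turn = 1 - turn
            threshold = corner_bound(seen, inputs)
        # Report every candidate that no unseen combination can beat.
        while candidates and len(output) < k and -candidates[0][0] >= threshold:
            neg_score, left, right = heapq.heappop(candidates)
            output.append((-neg_score, left, right))
        if threshold == float("-inf") and not candidates:
            break
    return output, seen


TP1 = [(1, ("ex:beatles", "ex:album", "ex:sgt_pepper")),
       (1, ("ex:beatles", "ex:album", "ex:help"))]
TP2 = [(3, ("ex:help", "ex:song", "Help!")),
       (2, ("ex:sgt_pepper", "ex:song", "Lucy"))]  # score 2 is an assumption

top, seen = rank_join(TP1, TP2, k=1)
print(top[0][0], len(seen[1]))
```

For k = 1 the sketch reports the binding with score 4 as soon as the threshold drops to 4, having read only one TP2 binding; the "Lucy" triple is never touched, which mirrors the STOP on the slide.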

Improving the Threshold Estimation (1)

Threshold estimation:

Seen Triples (TP1): scores from max_1 down to min_1

Seen Triples (TP2): scores from max_2 down to min_2

Threshold: max { max_1 + min_2 , max_2 + min_1 }

In each sum, max_i is the upper bound for the seen bindings and min_j is the upper bound for the unseen bindings.

We improve the threshold estimation:

Star-shaped entity query bounds

Look-ahead bounds
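Why this threshold is safe can be spelled out in a few lines. The derivation below is a sketch that assumes the score of a query binding is the sum of its triple scores s_1 and s_2 (as in the running example, 1 + 3 = 4) and that both inputs are read in descending score order; the symbols s_1 and s_2 are introduced here for illustration only.

```latex
% Any result that has not been produced yet contains at least one unseen binding.
% Sorted access gives: unseen s_i <= min_i, and every s_i <= max_i.
\begin{align*}
s_2 \text{ unseen} &\;\Rightarrow\; s_1 + s_2 \le \mathit{max}_1 + \mathit{min}_2 \\
s_1 \text{ unseen} &\;\Rightarrow\; s_1 + s_2 \le \mathit{max}_2 + \mathit{min}_1 \\
\text{threshold} &= \max\{\, \mathit{max}_1 + \mathit{min}_2,\; \mathit{max}_2 + \mathit{min}_1 \,\}
\end{align*}
```

A query binding whose score reaches the threshold can therefore be reported, since no result involving unseen bindings can score higher.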

Improving the Threshold Estimation (2) – Star-shaped Entity Query Bounds

Observation: Results for entity queries come from one single source

Idea: Upper bound scores for triple pattern bindings via the maximal possible triple score

Src. 2 (score ∈ [1,2]): ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy" .

Src. 3 (score ∈ [2,3]): ex:help foaf:name "Help!"; ex:song "Help!" .

[Figure: star-shaped entity query centered on ?x with two triple patterns, using the predicates foaf:name and ex:song and the object variables ?y and ?z.]

upper bound for triple bindings (foaf:name pattern): 3

upper bound for triple bindings (ex:song pattern): 3

upper bound for entity query bindings: 3 + 3
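The arithmetic on this slide can be written down as a short helper. This is one possible reading of the idea, sketched in Python: `entity_query_bounds` is a hypothetical name, and bounding every triple of an entity by the maximal score of its single source is an assumption derived from the observation above.

```python
def entity_query_bounds(source_max_scores, num_patterns):
    """Per-source upper bounds for the bindings of a star-shaped entity query.

    Assumes all triples of one result come from the same source, so each of
    the num_patterns triples is bounded by that source's maximal triple score.
    """
    return {src: num_patterns * m for src, m in source_max_scores.items()}


# Maximal triple scores of Src. 2 and Src. 3 from the running example.
bounds = entity_query_bounds({"Src. 2": 2, "Src. 3": 3}, num_patterns=2)
print(bounds)                # per-source bounds
print(max(bounds.values()))  # overall bound, 3 + 3 on the slide
```

The overall bound matches the 3 + 3 shown on the slide; the per-source view additionally shows that an entity from Src. 2 cannot score above 2 + 2 under this reading.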

Improving the Threshold Estimation (3) – Look-ahead Bounds

Idea: Provide a more accurate upper bound for the scores of unseen bindings via the "next possible" score

Sorted Access for ex:beatles ex:album ?album .

Score   Seen Triples (TP1)
1       ex:beatles ex:album ex:sgt_pepper
1       ex:beatles ex:album ex:help

max_1 = 1, min_1 = 1

Sorted Access for ?album ex:song ?song

Score   Seen Triples (TP2)
3       ex:help ex:song "Help!"

max_2 = 3, min_2 = 3

Loaded for TP2: Src. 3. Next source for TP2: Src. 2, score ∈ [1,2]

Without look-ahead (min_2 = 3): Threshold: max { 1 + 3 , 1 + 3 } = 4

With look-ahead (min_2 = 2): Threshold: max { 1 + 2 , 1 + 3 } = 4

Output Queue

Score   Query Bindings
4       ex:beatles ex:album ex:help . ex:help ex:song "Help!" .
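The effect of the look-ahead can be checked with a few lines of Python. This is a hedged sketch: `threshold` and its `next_1` / `next_2` parameters are hypothetical, with `next_i` standing for the "next possible" score of input i, for example the maximal score of the next source to be loaded.

```python
def threshold(max_1, min_1, max_2, min_2, next_1=None, next_2=None):
    """Corner bound, optionally tightened with look-ahead scores.

    Without look-ahead, unseen bindings of input i are bounded by min_i.
    With look-ahead, they are bounded by the smaller of min_i and next_i.
    """
    unseen_1 = min_1 if next_1 is None else min(min_1, next_1)
    unseen_2 = min_2 if next_2 is None else min(min_2, next_2)
    return max(max_1 + unseen_2, max_2 + unseen_1)


# Values from the slide: the next TP2 source is Src. 2 with maximal score 2.
print(threshold(1, 1, 3, 3))            # max { 1 + 3 , 1 + 3 }
print(threshold(1, 1, 3, 3, next_2=2))  # max { 1 + 2 , 1 + 3 }

# Made-up values (not from the slides) where the look-ahead lowers the bound.
print(threshold(2, 2, 3, 3), threshold(2, 2, 3, 3, next_1=1, next_2=2))
```

In the slide's example both variants give 4, because the second term is unaffected; the made-up third call shows a case where both terms shrink and the threshold drops from 5 to 4.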

EVALUATION

Evaluation – Setting

We implemented three systems:

Push-based symmetric hash join operator [2,5]

Standard top-k operator [6]

Improved top-k operator

Query set: 20 queries (8 FedBench and 12 own queries) with varying result size (1 to ~10,000) and complexity (2 to 5 triple patterns)

Data set: ~2,000,000 triples, distributed over ~700,000 sources

Parameters: k ∈ {1, 5, 10, 20} and score distributions ∈ {uniform, normal, exponential}

Evaluation – Results (1)

Overall Results

Top-k strategies lead to a runtime improvement of 35% on average (compared to standard Linked Data processing)

Tighter bounds lead to a further improvement of 12% on average (compared to standard top-k processing)

[Figure: Overview of processing times for all queries (k = 1, d = n)]

Evaluation – Results (2)

Effect of K and Score Distributions

CONCLUSION

Conclusion

We showed that top-k processing techniques are applicable to the Linked Data setting.

Top-k strategies lead to significant time savings for small values of k (in our experiments, 35% on average).

We showed that our improved top-k strategy leads to further runtime advantages (in our experiments, 12% on average).

QUESTIONS

REFERENCES

References

[1] A. Harth, K. Hose, M. Karnstedt, A. Polleres, K. Sattler, and J. Umbrich. Data summaries for on-demand queries over linked data. In World Wide Web, 2010.

[2] G. Ladwig and T. Tran. Linked Data Query Processing Strategies. In ISWC, 2010.

[3] M. Wu, L. Berti-Equille, A. Marian, C. M. Procopiuc, and D. Srivastava. Processing top-k join queries. Proc. VLDB Endow., pages 860–870, 2010.

[4] A. Harth, S. Kinsella, and S. Decker. Using naming authority to rank data and ontologies for web search. In ISWC, pages 277–292, 2009.

[5] G. Ladwig and T. Tran. SIHJoin: Querying Remote and Local Linked Data. In ESWC, 2011.

[6] K. Schnaitter and N. Polyzotis. Optimal algorithms for evaluating rank joins in database systems. ACM Trans. Database Syst., 35:6:1–6:47, 2010.

BACKUP SLIDES

Early Pruning of Partial Results

Motivation: Top-k join processing can be quite costly in terms of memory consumption

Idea: Prune partial query results that cannot contribute to a final top-k result

[Figure: star-shaped entity query centered on ?x with two triple patterns, using the predicates ex:song and foaf:name and the object variables ?y and ?z.]

upper bound for triple bindings: 3

Currently known partial results:

Rank   Triple Pattern Binding
1      ex:sgt_pepper ex:song "Getting Better" .

maximal score: 3 + 1 = 4

Currently known top-2 results (Output Queue):

Rank   Query Bindings
6      ex:help foaf:name "Help!" . ex:help ex:song "Help!" .
4      ex:sgt_pepper foaf:name "Sgt. Pepper" . ex:sgt_pepper ex:song "Lucy" .
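The pruning test behind this slide can be sketched in Python. The function `can_prune` and its parameters are hypothetical; in particular, the slides do not spell out whether a partial result that can only tie with the current k-th result is pruned, so the sketch exposes that choice as the `prune_ties` flag.

```python
def can_prune(partial_score, missing_patterns, triple_bound, kth_score,
              prune_ties=False):
    """Decide whether a partial result can be dropped.

    partial_score:    sum of the scores of the bindings found so far
    missing_patterns: number of triple patterns still unbound
    triple_bound:     upper bound for the score of one triple binding
    kth_score:        score of the currently known k-th best result
    """
    best_possible = partial_score + missing_patterns * triple_bound
    if prune_ties:
        return best_possible <= kth_score
    return best_possible < kth_score


# Values from the slide: partial score 1, one missing pattern bounded by 3,
# and a current top-2 whose second result has score 4.
print(can_prune(1, 1, 3, 4))                   # strict comparison
print(can_prune(1, 1, 3, 4, prune_ties=True))  # ties are pruned as well
```

With the slide's numbers the best possible score is 3 + 1 = 4, so the partial result is kept under the strict comparison and dropped when ties are pruned.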
