15. october 2010 - uni-freiburg.dezablocki/talks/handling... · 2011-02-17 · semantic rhine...

Semantic Rhine

Martin Przyjaciel-Zablocki Alexander Schätzle University of Freiburg Databases & Information Systems Group

15. October 2010


1. Motivation

2. MapReduce

3. RDFPath

4. PigSPARQL

5. Summary

Handling Large RDF Graphs with MapReduce 2 0. Overview

1. Motivation

Analysis of large RDF Graphs

Large RDF Graphs

Facebook (2010) (1)
◦ > 500 million active users
◦ > 900 million interactive objects (sites, groups, events, …)
◦ Usage: > 700 billion minutes per month
◦ Can be expressed as RDF Graphs

How to handle such large RDF Graphs?

Approach: Distributed analysis of large RDF Graphs using MapReduce

Source: (1) Facebook Press Room (12.10.2010), http://www.facebook.com/press/info.php?statistics

Overview

[Figure: an RDF Graph is analyzed on a MapReduce Cluster in two ways: Path Queries with RDFPath, and Graph Pattern queries in SPARQL that PigSPARQL translates to Pig Latin]

2. MapReduce

Principles & Basic Concepts

Google's MapReduce
◦ Automatic parallelization of computations
◦ Fixed and simple level of abstraction: Map & Reduce

Distributed File System
◦ Clusters of commodity hardware
◦ Fault tolerance by replication
◦ Very large files / write-once, read-many-times

Hadoop
◦ Open Source implementation (Apache project)
◦ Used by Yahoo, Facebook, Amazon, IBM, Last.fm, …

3. RDFPath

Path queries on large RDF Graphs
Martin Przyjaciel-Zablocki

Path Language

Requirements
◦ Navigational queries over RDF Graphs
◦ Extensibility
◦ Particularly with regard to a MapReduce evaluation

Idea
◦ Declarative path specification with XPath-like location steps
◦ Every location step can be mapped to one MapReduce job

Example Queries

Peter :: knows[country=equals(DE)] > age.

Results
◦ Peter (knows) Frank [country=DE] (age) 17
◦ Peter (knows) Klaus [country=CH] (discarded by the filter)
◦ Peter (knows) Simon [country=CH] (discarded by the filter)

[Figure: example RDF graph over the nodes Peter, Frank, Simon, Klaus and Chris with knows, country (DE, CH) and age (42, 60, 17, 25) edges]
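The query above can be read as two location steps, the first one carrying a filter. The following is a minimal in-memory sketch of that reading, not the RDFPath implementation (which maps each step to a MapReduce job). The helper names `build_index`, `location_step` and `country_equals` are hypothetical, and the toy graph contains only the edges that can be read off the slide.

```python
from collections import defaultdict

# Toy graph: only edges that can be read off the slide (assumption).
TRIPLES = [
    ("Peter", "knows", "Frank"),
    ("Peter", "knows", "Klaus"),
    ("Peter", "knows", "Simon"),
    ("Frank", "knows", "Chris"),
    ("Frank", "country", "DE"),
    ("Klaus", "country", "CH"),
    ("Simon", "country", "CH"),
    ("Frank", "age", "17"),
]


def build_index(triples):
    """Index triples as predicate -> subject -> list of objects."""
    index = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        index[p][s].append(o)
    return index


def location_step(paths, index, predicate, node_filter=None):
    """Extend every path by one edge labelled `predicate` (hypothetical helper)."""
    extended = []
    for path in paths:
        for target in index[predicate].get(path[-1], []):
            if node_filter is None or node_filter(target, index):
                extended.append(path + [target])
    return extended


def country_equals(value):
    """Filter: keep a node only if it has a country edge to `value`."""
    return lambda node, index: value in index["country"].get(node, [])


index = build_index(TRIPLES)
paths = [["Peter"]]                                                  # Peter ::
paths = location_step(paths, index, "knows", country_equals("DE"))  # knows[country=equals(DE)]
paths = location_step(paths, index, "age")                          # > age.
print(paths)
```

Klaus and Simon are dropped in the first step because their country is CH, so only the path through Frank reaches the age step.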

Example Queries

Peter :: knows(*3).

Results
◦ Peter (knows) Frank
◦ Peter (knows) Frank (knows) Chris
◦ Peter (knows) Klaus
◦ Peter (knows) Simon

[Figure: example RDF graph over the nodes Peter, Frank, Simon, Klaus and Chris with knows, country (DE, CH) and age (42, 60, 17, 25) edges]
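A bounded step such as knows(*3) can be sketched as an iterative frontier expansion. This sketch assumes that (*3) means "up to three repetitions", which matches the results listed, and that a path never revisits a node. The function name `expand` and the dictionary `KNOWS` are hypothetical.

```python
# knows edges of the toy graph, as far as they appear in the listed results (assumption).
KNOWS = {"Peter": ["Frank", "Klaus", "Simon"], "Frank": ["Chris"]}


def expand(start, edges, max_steps):
    """Return all paths that follow 1..max_steps edges from `start`."""
    results = []
    frontier = [[start]]
    for _ in range(max_steps):
        next_frontier = []
        for path in frontier:
            for target in edges.get(path[-1], []):
                if target in path:  # avoid cycles
                    continue
                next_frontier.append(path + [target])
        results.extend(next_frontier)
        frontier = next_frontier
    return results


for path in expand("Peter", KNOWS, 3):
    print(" (knows) ".join(path))
```

Each loop iteration corresponds to one location step, so in a MapReduce setting it would be one job per iteration.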

Path Language

Starting nodes
◦ fixed or arbitrary

Location step follows edge

Filters & sub queries

Shortest path queries

Avoidance of cycles

Different types of result
◦ paths, nodes, aggregations, …

Further components

RDFPath-Store
◦ Built on top of HDFS + local storage
◦ Vertical partitioning by predicate (edges)
◦ Optional Dictionary Encoding

Query-Engine

Example: Peter :: knows(*2) > knows.

[Figure: a Join combines the previous paths
Peter (knows) Frank (knows) Chris
Peter (knows) Simon
Peter (knows) Klaus
with the knows relation (Chris, Peter), (Johan, Frank), (Frank, Chris)]
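The two storage ideas named for the RDFPath-Store can be sketched in a few lines. This is an illustration under assumptions, not the store's actual layout: `vertical_partition` groups triples by predicate so that a location step only reads the partition of its edge label, and `dictionary_encode` replaces terms by integer ids. Both function names are hypothetical.

```python
from collections import defaultdict


def vertical_partition(triples):
    """Split a triple list into one (subject, object) list per predicate."""
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[p].append((s, o))
    return dict(partitions)


def dictionary_encode(triples):
    """Replace every term by a small integer id; return encoded triples and the dictionary."""
    ids = {}

    def encode(term):
        if term not in ids:
            ids[term] = len(ids)
        return ids[term]

    encoded = [(encode(s), encode(p), encode(o)) for s, p, o in triples]
    return encoded, ids


triples = [
    ("Peter", "knows", "Frank"),
    ("Frank", "knows", "Chris"),
    ("Frank", "country", "DE"),
]
partitions = vertical_partition(triples)
encoded, ids = dictionary_encode(triples)
print(partitions["knows"])
print(encoded)
```

Decoding results requires a lookup in the reverse dictionary, which is the lookup cost that has to be weighed against the smaller data size.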

Evaluation

Hadoop cluster with 10 servers

Real Last.fm & generated SP2Bench datasets

Results
◦ Promising scaling behavior
◦ Evaluated up to 1.6 billion triples
◦ Considered problems: shortest path, Erdős number, six degrees of separation, …
◦ Dictionary Encoding reduces the data size, but at significant dictionary lookup cost

4. PigSPARQL

Translating SPARQL to Pig Latin
Alexander Schätzle

Advantages of MapReduce
◦ Parallelization done by the system
◦ Good fault tolerance & scalability

Drawbacks of MapReduce
◦ "Low-level" to implement & hard to maintain
◦ No primitives like JOIN or GROUP

Pig Latin
◦ "High-level" language for data analysis with Hadoop
◦ Link between user & MapReduce
◦ Automatic translation into MapReduce jobs

1. Step
◦ Convert SPARQL Query into SPARQL Algebra-Tree

SELECT *
WHERE {
  ?person foaf:name ?name .
  ?person foaf:age ?age .
  FILTER (?age >= 18)
  OPTIONAL {
    ?person foaf:mbox ?mbox
  }
}

[Figure: SPARQL algebra tree with the nodes BGP (?person name ?name . ?person age ?age), BGP (?person mbox ?mbox), LeftJoin and Filter (?age >= 18)]

2. Step
◦ Translate Algebra-Tree into Pig Latin Program

indata = LOAD 'pathToFile' USING myLoad() AS (s,p,o);
f1 = FILTER indata BY p=='foaf:name';
t1 = FOREACH f1 GENERATE s AS person, o AS name;
f2 = FILTER indata BY p=='foaf:age';
t2 = FOREACH f2 GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;
BGP1 = FOREACH j1 GENERATE t1::person AS person,
       t1::name AS name, t2::age AS age;

[Figure: SPARQL algebra tree with the nodes BGP (?person name ?name . ?person age ?age), BGP (?person mbox ?mbox), LeftJoin and Filter (?age >= 18)]

2. Step
◦ Translate Algebra-Tree into Pig Latin Program

indata = LOAD 'pathToFile' USING myLoad() AS (s,p,o);
f1 = FILTER indata BY p=='foaf:name';
t1 = FOREACH f1 GENERATE s AS person, o AS name;
f2 = FILTER indata BY p=='foaf:age';
t2 = FOREACH f2 GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;
BGP1 = FOREACH j1 GENERATE t1::person AS person,
       t1::name AS name, t2::age AS age;
F1 = FILTER BGP1 BY age >= 18;

[Figure: SPARQL algebra tree with the nodes BGP (?person name ?name . ?person age ?age), BGP (?person mbox ?mbox), LeftJoin and Filter (?age >= 18)]

2. Step
◦ Translate Algebra-Tree into Pig Latin Program

indata = LOAD 'pathToFile' USING myLoad() AS (s,p,o);
f1 = FILTER indata BY p=='foaf:name';
t1 = FOREACH f1 GENERATE s AS person, o AS name;
f2 = FILTER indata BY p=='foaf:age';
t2 = FOREACH f2 GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;
BGP1 = FOREACH j1 GENERATE t1::person AS person,
       t1::name AS name, t2::age AS age;
F1 = FILTER BGP1 BY age >= 18;
f3 = FILTER indata BY p=='foaf:mbox';
BGP2 = FOREACH f3 GENERATE s AS person, o AS mbox;

[Figure: SPARQL algebra tree with the nodes BGP (?person name ?name . ?person age ?age), BGP (?person mbox ?mbox), LeftJoin and Filter (?age >= 18)]

2. Step
◦ Translate Algebra-Tree into Pig Latin Program

indata = LOAD 'pathToInput' USING myLoad() AS (s,p,o);
-- first BGP: one filter + projection per triple pattern, joined on ?person
f1 = FILTER indata BY p=='foaf:name';
t1 = FOREACH f1 GENERATE s AS person, o AS name;
f2 = FILTER indata BY p=='foaf:age';
t2 = FOREACH f2 GENERATE s AS person, o AS age;
j1 = JOIN t1 BY person, t2 BY person;
BGP1 = FOREACH j1 GENERATE t1::person AS person,
       t1::name AS name, t2::age AS age;
-- Filter (?age >= 18)
F1 = FILTER BGP1 BY age >= 18;
-- second BGP (OPTIONAL part)
f3 = FILTER indata BY p=='foaf:mbox';
BGP2 = FOREACH f3 GENERATE s AS person, o AS mbox;
-- LeftJoin
lj = JOIN F1 BY person LEFT OUTER, BGP2 BY person;
LJ1 = FOREACH lj GENERATE F1::person AS person,
      F1::name AS name, F1::age AS age,
      BGP2::mbox AS mbox;
STORE LJ1 INTO 'pathToOutput' USING myStore();

[Figure: SPARQL algebra tree with the nodes BGP (?person name ?name . ?person age ?age), BGP (?person mbox ?mbox), LeftJoin and Filter (?age >= 18)]
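To check what the translated program computes, its dataflow can be mimicked on a handful of triples in plain Python. This is only a sketch of the semantics (filter by predicate, join on ?person, filter on ?age, left outer join with the optional mbox pattern), not PigSPARQL itself. The sample data and the helpers `pattern` and `join` are hypothetical.

```python
# Hypothetical sample data; ages are stored as integers for simplicity.
TRIPLES = [
    ("p1", "foaf:name", "Bob"), ("p1", "foaf:age", 21),
    ("p1", "foaf:mbox", "bob@example.org"),
    ("p2", "foaf:name", "Sarah"), ("p2", "foaf:age", 17),
    ("p3", "foaf:name", "Ann"), ("p3", "foaf:age", 30),
]


def pattern(triples, predicate, subj_var, obj_var):
    """One triple pattern: FILTER by predicate, then project s and o to variables."""
    return [{subj_var: s, obj_var: o} for s, p, o in triples if p == predicate]


def join(left, right, key, left_outer=False):
    """Hash join on `key`; with left_outer, unmatched left rows are kept as they are."""
    by_key = {}
    for row in right:
        by_key.setdefault(row[key], []).append(row)
    out = []
    for row in left:
        matches = by_key.get(row[key], [])
        for match in matches:
            out.append({**row, **match})
        if left_outer and not matches:
            out.append(dict(row))
    return out


bgp1 = join(pattern(TRIPLES, "foaf:name", "person", "name"),
            pattern(TRIPLES, "foaf:age", "person", "age"), "person")
f1 = [row for row in bgp1 if row["age"] >= 18]           # Filter
bgp2 = pattern(TRIPLES, "foaf:mbox", "person", "mbox")   # optional BGP
result = join(f1, bgp2, "person", left_outer=True)       # LeftJoin
for row in result:
    print(row)
```

Sarah is removed by the filter, Bob keeps his mbox, and Ann survives the left outer join without an mbox binding.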

Three Levels of Optimization

SPARQL Algebra
◦ Filter Optimizations (Pushing, Splitting, Substitution)
◦ Triple-Pattern Reordering by Selectivity

Algebra Translation
◦ Delete unnecessary Data as early as possible
◦ Multi-Joins to reduce the Number of Joins

Data Representation
◦ Vertical Partitioning of the RDF-Data by Predicate

[Figure: line chart of execution time (hh:mm:ss, 00:00:00 to 18:00:00) over the number of RDF-Triples (0 to 1600 million) for Q2, Q2 MJ and Q2 MJ+VP]

[Figure: bar chart of HDFS Bytes Read, HDFS Bytes Written and Reduce Shuffle Bytes in GB (1600M RDF-Triples) for Q2, Q2 MJ and Q2 MJ+VP. Data labels in extraction order: 3171.28, 1345.45, 1244.09, 2214.17, 388.35, 272.46, 168.27, 125.70, 178.20]

SELECT ?inproc ?author ?booktitle ?title
       ?proc ?ee ?page ?url ?yr ?abstract
WHERE {
  ?inproc rdf:type bench:Inproceedings .
  ?inproc dc:creator ?author .
  ?inproc bench:booktitle ?booktitle .
  ?inproc dc:title ?title .
  ?inproc dcterms:partOf ?proc .
  ?inproc rdfs:seeAlso ?ee .
  ?inproc swrc:pages ?page .
  ?inproc foaf:homepage ?url .
  ?inproc dcterms:issued ?yr
  OPTIONAL {
    ?inproc bench:abstract ?abstract
  }
}
ORDER BY ?yr

Naive translation needs 8 Joins + 1 Outer Join

Multi-Join (MJ) reduces the number of Joins

Vertical Partitioning (VP) reduces the Input-Data
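Because all required triple patterns of this query share the variable ?inproc, the chain of binary joins can be replaced by a single n-ary join on the subject. The sketch below shows the idea in memory, assuming each pattern has already been reduced to a list of (subject, object) pairs, for example one vertical partition per predicate. The function `multi_join` and the sample relations are hypothetical.

```python
from collections import defaultdict
from itertools import product


def multi_join(relations, names):
    """Join several (subject, object) relations on the subject in one grouping pass."""
    grouped = defaultdict(lambda: [[] for _ in relations])
    for i, relation in enumerate(relations):
        for subject, obj in relation:
            grouped[subject][i].append(obj)
    rows = []
    for subject, columns in grouped.items():
        if all(columns):  # the subject must match every pattern
            for combo in product(*columns):
                rows.append({"subject": subject, **dict(zip(names, combo))})
    return rows


# Hypothetical sample relations for three of the nine patterns.
creators = [("i1", "Ann"), ("i1", "Bob"), ("i2", "Cy")]
titles = [("i1", "T1"), ("i2", "T2"), ("i3", "T3")]
years = [("i1", 2009)]

rows = multi_join([creators, titles, years], ["author", "title", "yr"])
for row in rows:
    print(row)
```

A chain of binary joins would need two passes for these three relations; the grouped variant needs one, which is the effect the multi-join aims for.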

5. Summary

Handling Large RDF Graphs with RDFPath & PigSPARQL on MapReduce

RDFPath is especially suited for the execution of path queries on large RDF Graphs with MapReduce

PigSPARQL allows the efficient execution of SPARQL queries with MapReduce

Handling up to 1.6 Billion RDF Triples

Both approaches show a promising scaling behavior

I/O is the dominating bottleneck: optimization means reducing the I/O

Thanks

Backup Slides

MapReduce

Pig Latin – Data Model

Pig Latin – Operators

RDFPath – Last.fm Example

Reduce-Side-Join

RDFPath System Overview

Steps of a MapReduce execution

[Figure: the Input (HDFS) is divided into Split 0 to Split 5; three Map tasks process the splits in the Map-Phase and produce Intermediate Results (Local); after the Shuffle-Phase, two Reduce tasks write Out 0 and Out 1 to the Output (HDFS) in the Reduce-Phase]

Signature of a Map-Function
◦ map(in_key, in_value) -> (out_key, intermediate_value) list

Signature of a Reduce-Function
◦ reduce(out_key, intermediate_value list) -> out_value list
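The two signatures can be made concrete with a tiny single-process runner. This is a sketch for illustration only, with no relation to Hadoop's API: `run_mapreduce` is a hypothetical helper that groups the map output by key and calls the reduce function once per key. The example counts triples per predicate.

```python
from collections import defaultdict


def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    _subject, predicate, _obj = in_value
    return [(predicate, 1)]


def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, intermediate_value list) -> out_value list."""
    return [sum(intermediate_values)]


def run_mapreduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory runner: map, group by key (shuffle), reduce."""
    groups = defaultdict(list)
    for in_key, in_value in records:
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)
    return {key: reduce_fn(key, values) for key, values in sorted(groups.items())}


triples = [
    ("Peter", "knows", "Frank"),
    ("Frank", "knows", "Chris"),
    ("Frank", "age", "17"),
]
counts = run_mapreduce(list(enumerate(triples)), map_fn, reduce_fn)
print(counts)
```

Here the input key is just the record position and is ignored by the map function.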

Pig Latin – Data Model

Flexible, nested Data Model with 4 Datatypes:

Atom:  'Bob'
Tuple: ('John', 'Doe')
Bag:   {('Bob', 'Sarah'), ('Peter', ('likes', 'football'))}
Map:   ['knows' -> {('Sarah')}, 'age' -> 24]

Tuple-wise loading of data with a "User Defined Function"; every field of a tuple can have a name and a datatype

FOREACH: Apply Processing on every Tuple
Ex: result = FOREACH input GENERATE field1*field2 AS mul ;

input:
field1  field2
2       3
4       7

result:
mul
6
28

FILTER: Delete unwanted Tuples
Ex: adults = FILTER persons BY age >= 18 ;

persons:
name   age
Bob    21
Sarah  17

adults:
name  age
Bob   21
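For readers more familiar with general-purpose languages, the two operators correspond to a projection and a selection over a list of records. A rough Python equivalent of the two examples, with relations modelled as lists of dictionaries (an assumption made for illustration; Pig evaluates these operators as MapReduce jobs):

```python
# FOREACH ... GENERATE field1*field2 AS mul
input_rel = [{"field1": 2, "field2": 3}, {"field1": 4, "field2": 7}]
result = [{"mul": row["field1"] * row["field2"]} for row in input_rel]

# FILTER persons BY age >= 18
persons = [{"name": "Bob", "age": 21}, {"name": "Sarah", "age": 17}]
adults = [row for row in persons if row["age"] >= 18]

print(result)
print(adults)
```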

[OUTER] JOIN: Combine two or more Relations
Ex: result = JOIN left BY field1 [LEFT OUTER], right BY field2 ;

left:
field1
a
b

right:
field1  field2
4       a
7       a

result (JOIN):
left::field1  right::field1  right::field2
a             4              a
a             7              a

result (LEFT OUTER JOIN):
left::field1  right::field1  right::field2
a             4              a
a             7              a
b

UNION:
Ex: result = UNION rel1, rel2 ;

rel1:
field1  field2
a       1

rel2:
field1  field2
b       3

result:
field1  field2
a       1
b       3

ORDER:
Ex: result = ORDER input BY field1 ;

input:
field1  field2
3       a
1       b

result:
field1  field2
1       b
3       a

Example Queries

Michael_Jackson :: artistTracks [trackAlbum = equals(Michael_Jackson_-_Thriller)]
  > trackSimilar [trackDuration = min(50000)]
  > trackTopFans [userCountry = equals(DE)].

Results
◦ Michael_Jackson (artistTracks) Michael_Jackson_-_Beat_It (trackSimilar) Michael_Jackson_-_Billie_Jean (trackTopFans) Mark
◦ Michael_Jackson (artistTracks) Michael_Jackson_-_Someone_in_the_Dark (trackSimilar) Rihanna_-_Russian_Roulette (trackTopFans) Megan

Reduce-Side-Join

Example: * :: knows(*2) > knows.

previous paths:
Peter (knows) Frank (knows) Chris
Peter (knows) Simon (knows) Johan
Klaus (knows) Simon (knows) Johan
Frank (knows) Chris
Peter (knows) Klaus

knows:
Chris  Peter
Johan  Frank
Johan  Lukas
Frank  Chris

[Figure: the previous paths and the knows relation are combined by a Reduce-Side-Join]

Reduce-Side-Join

Mapper Input

previous paths:
Peter (knows) Frank (knows) Chris
Peter (knows) Simon (knows) Johan
Klaus (knows) Simon (knows) Johan
Frank (knows) Chris
Peter (knows) Klaus

knows:
Chris  Peter
Johan  Frank
Johan  Lukas
Frank  Chris

Mapper Output

Key         Value
(Chris, 1)  Peter (knows) Frank (knows) Chris
(Johan, 1)  Peter (knows) Simon (knows) Johan
(Johan, 1)  Klaus (knows) Simon (knows) Johan
(Chris, 1)  Frank (knows) Chris
(Klaus, 1)  Peter (knows) Klaus
(Chris, 0)  Peter
(Johan, 0)  Frank
(Johan, 0)  Lukas
(Frank, 0)  Chris
…

Reduce-Side-Join

Reducer's strategy (sorting phase):
(1) Partition according to the first component of the key pair (modulo the number of reducers)
(2) Sort within a partition according to the whole key pair

Consequences
◦ A Reducer gets all "values" with the same first key component
◦ The "values" within a partition contain first all new nodes and thereafter all previous paths

Reduce-Side-Join

Reducer Input

Key Johan:
Frank
Lukas
Peter (knows) Simon (knows) Johan
Klaus (knows) Simon (knows) Johan

Key Chris:
Peter
Manu
Peter (knows) Frank (knows) Chris
Frank (knows) Chris

Reducer Output

Peter (knows) Frank (knows) Chris (knows) Peter
Peter (knows) Frank (knows) Chris (knows) Manu
Frank (knows) Chris (knows) Peter
Frank (knows) Chris (knows) Manu
…
Peter (knows) Simon (knows) Johan (knows) Frank
Peter (knows) Simon (knows) Johan (knows) Lukas
Klaus (knows) Simon (knows) Johan (knows) Frank
Klaus (knows) Simon (knows) Johan (knows) Lukas
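The Reduce-Side-Join walked through on these slides can be sketched in a single process. This is an illustration under assumptions, not the RDFPath code: mappers tag edges with 0 and previous paths with 1, the shuffle partitions by the first key component and sorts by the whole key pair, and the reducer buffers the new nodes before it sees the paths. All function names are hypothetical, and the data is the part of the example that is visible on the slides.

```python
from itertools import groupby

EDGE, PATH = 0, 1  # tag 0 sorts before tag 1, so new nodes arrive before paths


def map_edges(edges):
    """Emit ((source, 0), target) for every knows edge."""
    for source, target in edges:
        yield (source, EDGE), target


def map_paths(paths):
    """Emit ((last node, 1), path) for every previous path."""
    for path in paths:
        yield (path[-1], PATH), path


def shuffle(pairs, num_reducers):
    """Partition by the first key component, then sort by the whole key pair."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key[0]) % num_reducers].append((key, value))
    for partition in partitions:
        partition.sort(key=lambda kv: kv[0])
    return partitions


def reduce_partition(partition):
    """Extend every path by each new node that shares its join key."""
    extended = []
    for _node, group in groupby(partition, key=lambda kv: kv[0][0]):
        new_nodes = []
        for (_, tag), value in group:
            if tag == EDGE:
                new_nodes.append(value)
            else:
                extended.extend(value + [node] for node in new_nodes)
    return extended


edges = [("Chris", "Peter"), ("Johan", "Frank"), ("Johan", "Lukas"), ("Frank", "Chris")]
paths = [
    ["Peter", "Frank", "Chris"],
    ["Peter", "Simon", "Johan"],
    ["Klaus", "Simon", "Johan"],
    ["Frank", "Chris"],
    ["Peter", "Klaus"],
]
pairs = list(map_edges(edges)) + list(map_paths(paths))
extended = [p for part in shuffle(pairs, 2) for p in reduce_partition(part)]
for path in sorted(extended):
    print(" (knows) ".join(path))
```

Because the new nodes arrive first, the reducer only has to keep them in memory and can stream the (usually much larger) set of previous paths.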

System

[Figure: RDFPath system overview with the components RDFPath Query, Query Engine, Intermediate Language, Sequence of MapReduce Jobs, Evaluation, MapReduce-Framework, HDFS and RDFPath-Store]