
HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal

Running example: ?x dbp:producer dbr:Bad_Hair

Motivation (1)

Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.

Motivation (2)

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .
  ?movie dbp:producer ?producer .
  ?movie dct:subject dbc:Universal_Pictures_film .
  ?movie dct:subject dbc:Films_shot_in_New_York_City .
}

Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.

Result over DBpedia (v. 2015-04): 39 movies.

Motivation (2), continued

Executed over the complete (virtual) data, the same query should return 46 movies: there are 7 movies without producers in DBpedia (v. 2015-04).

Motivation (3)

Movies (shot in New York City by Universal Pictures) with no producers in DBpedia (v. 2015-04):

dbr:Legal_Eagles, dbr:Wanderlust, dbr:Barney's_Version_(film), dbr:Non_Stop_(film), dbr:The_Wolf_of_Wall_Street_(2013_film), dbr:Broadway_Love, dbr:Trainwreck_(film)

Yet, for example, Leonardo DiCaprio is a producer of dbr:The_Wolf_of_Wall_Street_(2013_film).

Problem Definition

Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.

P1) Identifying portions of Q that yield missing values.
P2) Resolving missing values.

Example: the mapping μ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio} satisfies

μ ∉ [[(?movie, dbp:producer, ?producer)]]_D    (does not belong to DBpedia)
μ ∈ [[(?movie, dbp:producer, ?producer)]]_D*   (should belong to DBpedia)

OUR APPROACH: HARE


HARE

•  A hybrid machine/human SPARQL query engine that enhances the size of query answers.

•  Based on a novel RDF completeness model, HARE implements query optimization and execution techniques to solve P1: identifying portions of queries that yield missing values.

•  HARE resorts to microtask crowdsourcing to solve P2: resolving missing values.

HARE Architecture

The engine takes as input a SPARQL query Q and a threshold τ, evaluated against an RDF data set (e.g., from the LOD Cloud), and outputs the results for Q.

•  Query Engine: executes the query plan and combines RDF data with bindings obtained from the crowd.
•  RDF Completeness Model: estimates which portions of the query yield missing values.
•  Query Optimizer: builds the query plan and selects the triple patterns to crowdsource.
•  Microtask Manager: turns crowdsourced triple patterns into tasks, submits them to the crowd, and aggregates the human input.
•  Crowd Knowledge: stores the aggregated human input in three knowledge bases (CKB+, CKB–, CKB~).


RDF Completeness Model (1)

Example: dbr:The_Interpreter, dbr:Tower_Heist, and dbr:Bad_Hair all have rdf:type schema.org:Movie. dbr:The_Interpreter has three dbp:producer values (dbr:Eric_Fellner, dbr:Tim_Bevan, dbr:Kevin_Misher), while the dbp:producer values of dbr:Bad_Hair are unknown. Since movies have producers (e.g., dbr:The_Interpreter), dbr:Bad_Hair is likely incomplete.

RDF Completeness Model (2)

①  Predicate multiplicity of an RDF resource: the number of distinct objects that a resource has for a given predicate.

M_D(dbr:The_Interpreter | dbp:producer) = 3
(objects: dbr:Eric_Fellner, dbr:Tim_Bevan, dbr:Kevin_Misher)
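As a minimal sketch (not HARE's actual code), predicate multiplicity can be computed over a toy triple store; the producers of dbr:Legal_Eagles below are hypothetical placeholders, only their count is taken from the example:

# Toy RDF data set D as a set of (subject, predicate, object) triples.
D = {
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Eric_Fellner"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Tim_Bevan"),
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Kevin_Misher"),
    ("dbr:Legal_Eagles", "dbp:producer", "dbr:Producer_A"),  # hypothetical
    ("dbr:Legal_Eagles", "dbp:producer", "dbr:Producer_B"),  # hypothetical
    ("dbr:The_Interpreter", "rdf:type", "schema.org:Movie"),
    ("dbr:Legal_Eagles", "rdf:type", "schema.org:Movie"),
    ("dbr:Bad_Hair", "rdf:type", "schema.org:Movie"),
}

def multiplicity(dataset, resource, predicate):
    """M_D(resource | predicate): number of distinct objects of `resource` for `predicate`."""
    return len({o for (s, p, o) in dataset if s == resource and p == predicate})

print(multiplicity(D, "dbr:The_Interpreter", "dbp:producer"))  # 3
print(multiplicity(D, "dbr:Bad_Hair", "dbp:producer"))         # 0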

RDF Completeness Model (3)

②  Aggregated predicate multiplicity of a class: given a predicate, the median number of distinct objects over all the resources that belong to the class.

AM_D(schema.org:Movie | dbp:producer) = 3

M_D(dbr:The_Interpreter | dbp:producer) = 3
M_D(dbr:Legal_Eagles | dbp:producer) = 2

RDF Completeness Model (4)

③  Completeness of an RDF resource (with respect to a predicate): given a predicate, the completeness of an RDF resource is the ratio of its predicate multiplicity ① to the aggregated predicate multiplicity ② of the classes it belongs to.

Comp_D(dbr:The_Interpreter | dbp:producer) = 3/3
Comp_D(dbr:Legal_Eagles | dbp:producer) = 2/3
Comp_D(dbr:Bad_Hair | dbp:producer) = 0/3
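Continuing the sketch above, aggregated predicate multiplicity and completeness might look as follows; skipping resources without any value for the predicate when taking the median, and capping the ratio at 1.0, are my assumptions, not necessarily HARE's definitions:

from statistics import median

def members(dataset, rdf_class):
    """Resources that belong to `rdf_class` via rdf:type."""
    return {s for (s, p, o) in dataset if p == "rdf:type" and o == rdf_class}

def agg_multiplicity(dataset, rdf_class, predicate):
    """AM_D(class | predicate): median multiplicity over the resources of the class."""
    counts = [multiplicity(dataset, r, predicate) for r in members(dataset, rdf_class)]
    counts = [c for c in counts if c > 0]  # assumption: skip resources with no values
    return median(counts) if counts else 0

def completeness(dataset, rdf_class, resource, predicate):
    """Comp_D(resource | predicate) = M_D / AM_D, capped at 1.0 (assumption)."""
    am = agg_multiplicity(dataset, rdf_class, predicate)
    if am == 0:
        return 1.0  # no evidence that values are expected at all
    return min(1.0, multiplicity(dataset, resource, predicate) / am)

# On the toy data: AM_D = median({3, 2}) = 2.5, so Comp_D(dbr:Bad_Hair) = 0.0;
# the 3/3, 2/3, 0/3 figures on the slide come from the full DBpedia data set.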


Crowd Knowledge

•  The knowledge collected from the crowd is captured in three knowledge bases: CKB = (CKB+, CKB–, CKB~).

•  CKB+, CKB–, and CKB~ are fuzzy sets over RDF data composed of 4-tuples of the form (subject, predicate, object, membership_degree), where (subject, predicate, object) is an RDF triple.
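A sketch of this structure in Python (names are illustrative): each knowledge base maps an RDF triple to its membership degree, with blank-node objects standing for unknown values.

CKB_POS = {("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer"): 0.9}  # CKB+
CKB_NEG = {("dbr:Tower_Heist", "dbp:producer", "_:o1"): 0.05}             # CKB-
CKB_UNC = {("dbr:Bad_Hair", "dbp:producer", "_:o2"): 0.78}                # CKB~

def as_four_tuples(ckb):
    """View a knowledge base as 4-tuples (subject, predicate, object, degree)."""
    return [(s, p, o, d) for (s, p, o), d in ckb.items()]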

Crowd Knowledge: Types of Crowd Knowledge Bases

•  CKB+: "Brian Grazer is a producer of Tower Heist." → (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
•  CKB–: "Tower Heist does not have a producer." → (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
•  CKB~: "I am not sure if Bad Hair has a producer." → (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)

The CKB+ and CKB– entries about dbr:Tower_Heist constitute a contradiction; the CKB~ entry about dbr:Bad_Hair expresses uncertainty.

Crowd Knowledge: Measuring Contradiction

•  Contradiction occurs when triples with the same subject and predicate belong to both CKB+ and CKB–.
•  It is measured as the complement of the gap between the two membership degrees:

(dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9) ∈ CKB+
(dbr:Tower_Heist, dbp:producer, _:o1, 0.05) ∈ CKB–

Contradiction(dbr:Tower_Heist | dbp:producer) = 1 – |0.9 – 0.05| = 0.15

•  Contradiction values close to 0.0 indicate high consensus.
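A sketch over the dictionary-based CKBs above; pairing entries by (subject, predicate) and taking the first match on each side is my simplification:

def contradiction(subject, predicate, ckb_pos=CKB_POS, ckb_neg=CKB_NEG):
    """1 - |degree in CKB+ - degree in CKB-| for a matching (subject, predicate)
    pair; 0.0 when no such pair exists (assumption)."""
    pos = [d for (s, p, o), d in ckb_pos.items() if (s, p) == (subject, predicate)]
    neg = [d for (s, p, o), d in ckb_neg.items() if (s, p) == (subject, predicate)]
    if not pos or not neg:
        return 0.0
    return 1.0 - abs(pos[0] - neg[0])

print(contradiction("dbr:Tower_Heist", "dbp:producer"))  # ~0.15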

Crowd Knowledge: Measuring Uncertainty

•  When a triple belongs to CKB~, the value of the triple's object is unknown or uncertain.
•  Uncertainty is measured as the average membership degree of the matching CKB~ entries:

(dbr:Bad_Hair, dbp:producer, _:o2, 0.78) ∈ CKB~

Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78

•  Uncertainty values close to 1.0 indicate that the crowd has shown itself to be unknowledgeable about the fact being vetted.
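The corresponding sketch for uncertainty, averaging the degrees of the matching CKB~ entries:

def uncertainty(subject, predicate, ckb_unc=CKB_UNC):
    """Average membership degree of matching CKB~ entries; 0.0 if there are none."""
    degrees = [d for (s, p, o), d in ckb_unc.items() if (s, p) == (subject, predicate)]
    return sum(degrees) / len(degrees) if degrees else 0.0

print(uncertainty("dbr:Bad_Hair", "dbp:producer"))     # 0.78
print(uncertainty("dbr:Tower_Heist", "dbp:producer"))  # 0.0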


Query Optimizer (1)

•  Heuristic-based optimizer that decomposes the BGPs of a SPARQL query into two subsets:

–  S_QD: triple patterns executed against the data set D,
–  S_QCROWD: triple patterns to be crowdsourced.

Query Optimizer (2)

•  Given a SPARQL query Q:
–  Triple patterns in Q with variables in both the subject and the object position are added to S_QCROWD (see the sketch below).
–  The rest of the triple patterns in Q are added to S_QD.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .                      # t1 → S_QD
  ?movie dbp:producer ?producer .                         # t2 → S_QCROWD
  ?movie dct:subject dbc:Universal_Pictures_film .        # t3 → S_QD
  ?movie dct:subject dbc:Films_shot_in_New_York_City .    # t4 → S_QD
}
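A sketch of this rule, assuming triple patterns are 3-tuples of strings and variables carry a leading '?':

def is_var(term):
    """SPARQL variables start with '?'."""
    return term.startswith("?")

def decompose(bgp):
    """Split a BGP into (S_QD, S_QCROWD): patterns with variables in both the
    subject and the object position are candidates for crowdsourcing."""
    s_qd, s_qcrowd = [], []
    for pattern in bgp:
        s, p, o = pattern
        (s_qcrowd if is_var(s) and is_var(o) else s_qd).append(pattern)
    return s_qd, s_qcrowd

bgp = [
    ("?movie", "rdf:type", "schema.org:Movie"),                    # t1
    ("?movie", "dbp:producer", "?producer"),                       # t2
    ("?movie", "dct:subject", "dbc:Universal_Pictures_film"),      # t3
    ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City"),  # t4
]
s_qd, s_qcrowd = decompose(bgp)  # t1, t3, t4 -> S_QD; t2 -> S_QCROWD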

Query Optimizer (3)

•  The optimizer builds a query plan T_Q for query Q.
•  Triple patterns from S_QD are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
•  Triple patterns in S_QCROWD are added to the plan T_Q in a left-linear fashion.

(Example plan: the star-shaped sub-query over t1, t3, and t4 from S_QD is evaluated first; the crowdsourced pattern t2 from S_QCROWD is joined last.)

Query Engine (1)

•  Executes the query plan T_Q.
•  Sub-queries that are part of S_QD are executed against the data set, producing a set of mappings Ω:

Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}

•  For each mapping μ contained in Ω, the engine instantiates the triple patterns in S_QCROWD (a sketch follows).
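A sketch of this instantiation step; the helper below simply substitutes bound variables into a triple pattern:

def instantiate(pattern, mu):
    """Replace each variable of a triple pattern by its binding in mu (if any)."""
    return tuple(mu.get(term, term) for term in pattern)

mu = {"?movie": "dbr:Tower_Heist"}
instantiate(("?movie", "dbp:producer", "?producer"), mu)
# -> ("dbr:Tower_Heist", "dbp:producer", "?producer")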

Query Engine (2)

Example of an Iteration
•  The engine processes μ = {movie → dbr:Tower_Heist}.
•  Following the running example, with

(dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9) ∈ CKB+
(dbr:Tower_Heist, dbp:producer, _:o1, 0.05) ∈ CKB–
(dbr:Bad_Hair, dbp:producer, _:o2, 0.78) ∈ CKB~

the engine obtains:

Comp_D(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0

Query Engine (3)

Example of an Iteration
•  The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):

P(CROWD | μ(s), p) = α · (1 – 0.33) + (1 – α) · min{0.15, 1 – 0.0} = 0.41

The first term estimates the incompleteness of the resource; the second term captures the reliability of the crowd.

•  α is a score weight between 0.0 and 1.0 (0.5 in the example).
•  If P(CROWD | μ(s), p) is greater than a user-defined threshold τ, the algorithm crowdsources the triple pattern (μ(s), p, o). A sketch of this decision follows.
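The decision from the slide, as a sketch; the formula is taken from the example, the function names are mine:

ALPHA = 0.5  # score weight alpha from the example

def p_crowd(comp, contr, unc, alpha=ALPHA):
    """P(CROWD | mu(s), p) = alpha*(1 - Comp) + (1 - alpha)*min(Contr, 1 - Unc)."""
    return alpha * (1.0 - comp) + (1.0 - alpha) * min(contr, 1.0 - unc)

def should_crowdsource(comp, contr, unc, tau, alpha=ALPHA):
    """Crowdsource the instantiated triple pattern when the probability exceeds tau."""
    return p_crowd(comp, contr, unc, alpha) > tau

print(round(p_crowd(comp=0.33, contr=0.15, unc=0.0), 2))  # 0.41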

Query Engine (4)

•  The engine combines mappings obtained from the data set D with mappings from the crowd stored in CKB+.
•  Query evaluation terminates when all the sub-queries have been executed.

Theorem 1: The HARE query engine does not increase the time complexity of executing a SPARQL query.


Microtask Manager (1)

•  Receives triple patterns to crowdsource, for example: (dbr:Tower_Heist, dbp:producer, ?p)
•  Creates human tasks.
•  Submits the tasks to the crowdsourcing platform.

Microtask Manager (2)

HARE exploits the semantics encoded in RDF resources to render a task, e.g., for (dbr:Tower_Heist, dbp:producer, ?p):

•  dbr:Tower_Heist, rdfs:label (human-readable name of the subject)
•  dbp:producer, rdfs:label (human-readable name of the predicate)
•  dbr:Tower_Heist, foaf:depiction (image of the subject)
•  dbr:Tower_Heist, dbo:abstract (textual description)
•  dbr:Tower_Heist, foaf:primaryTopic (link to the corresponding Wikipedia page)
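As a sketch of this rendering step (the store layout and field names are illustrative, not HARE's interface), the manager can look up these predicates to assemble a task:

def first_object(dataset, subject, predicate):
    """First object found for (subject, predicate), or None."""
    return next((o for (s, p, o) in dataset if s == subject and p == predicate), None)

def build_task(dataset, subject, predicate):
    """Render a microtask for the pattern (subject, predicate, ?o) using the
    semantics encoded in the RDF resources."""
    subject_label = first_object(dataset, subject, "rdfs:label") or subject
    predicate_label = first_object(dataset, predicate, "rdfs:label") or predicate
    return {
        "question": "What is the {} of {}?".format(predicate_label, subject_label),
        "image": first_object(dataset, subject, "foaf:depiction"),
        "abstract": first_object(dataset, subject, "dbo:abstract"),
        "link": first_object(dataset, subject, "foaf:primaryTopic"),
    }

# e.g. build_task(store, "dbr:Tower_Heist", "dbp:producer") could render
# "What is the producer of Tower Heist?" together with an image and abstract.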

Microtask Manager (3)

The human input collected from the crowd is aggregated and stored in the crowd knowledge bases CKB+, CKB–, and CKB~.

EXPERIMENTAL STUDY


Experimental Set-Up

•  Benchmark: 50 queries against DBpedia (v. 2014).
–  Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
•  Implementation details:
–  HARE is implemented in Python 2.7.6.
–  CrowdFlower is used as the crowdsourcing platform.
•  Crowdsourcing configuration:
–  Four different RDF triples per task, 0.07 US$ per task.
–  At least three judgments were collected per task.
•  Total RDF triple patterns crowdsourced: 502
•  Total answers collected from the crowd: 1,609

Results: Size of Query Answer (1)

Metric: number of answers obtained when the queries are executed.

[Bar charts comparing data set answers and crowd answers per query; factors by which the answer size increased:]
•  Sports: 1.25 – 2.00
•  Music: 1.50 – 2.00
•  Life Sciences: 1.08 – 1.92

HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.

Results: Size of Query Answer (2)

Metric: number of answers obtained when the queries are executed.

[Bar charts comparing data set answers and crowd answers per query; factors by which the answer size increased:]
•  Movies: 1.05 – 3.13
•  History: 1.10 – 1.89

HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.

Results: Crowd Response Time (1)

Metric: elapsed time from the submission of the first task until the last answer is retrieved.

[Line charts: percentage of judgments completed over time (min), per query; completion at the 12th minute:]
•  Sports: 77%
•  Music: 82%
•  Life Sciences: 97%

At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.

Results: Crowd Response Time (2)

Metric: elapsed time from the submission of the first task until the last answer is retrieved.

[Line charts: percentage of judgments completed over time (min), per query; completion at the 12th minute:]
•  Movies: 98%
•  History: 75%

At the 12th minute after the first task is submitted, the crowd produces at least 75% of the answers.

Results: Quality of Crowd Answers

Metric: a true positive is a mapping that belongs to the query answer.

Recall:
       Sports   Music   Life Sciences   Movies   History
Q1     1.00     1.00    0.67            0.88     1.00
Q2     1.00     1.00    1.00            0.96     1.00
Q3     1.00     1.00    0.89            0.79     0.67
Q4     0.55     0.67    1.00            1.00     0.96
Q5     0.86     0.67    1.00            1.00     0.95
Q6     0.69     0.83    1.00            1.00     0.96
Q7     1.00     0.63    0.71            1.00     0.57
Q8     1.00     0.67    0.88            0.94     0.72
Q9     0.46     0.73    1.00            1.00     0.64
Q10    0.92     0.49    1.00            1.00     0.95
Avg    0.85     0.77    0.91            0.96     0.84

Precision:
       Sports   Music   Life Sciences   Movies   History
Q1     1.00     1.00    1.00            0.47     1.00
Q2     1.00     0.29    1.00            1.00     1.00
Q3     1.00     1.00    1.00            1.00     1.00
Q4     0.83     1.00    1.00            1.00     1.00
Q5     1.00     0.86    1.00            1.00     1.00
Q6     1.00     1.00    1.00            1.00     0.96
Q7     1.00     1.00    1.00            1.00     0.84
Q8     1.00     1.00    1.00            1.00     0.78
Q9     1.00     1.00    1.00            1.00     0.92
Q10    1.00     1.00    1.00            1.00     0.98
Avg    0.98     0.91    1.00            0.95     0.95

The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.

RELATED WORK


Summary of Related Work

Human/computer query processing architectures differ in how the crowdsourcing step is specified:

•  Manual specification: CrowdDB [Franklin et al.] (tables, columns), Deco [Park and Widom] (rules), Qurk [Marcus et al.] (microtask I/O).
•  Automatic: HARE relies on the RDF graph and on crowd knowledge to decide when to resort to crowdsourcing.

Summary of Related Work (continued)

Crowdsourcing in other contexts of data management (SPARQL- or RDF-based):

•  OASSIS [Amsterdamer et al.] (recommendation system): mines crowdsourced patterns specified in a SPARQL-like language.
•  KATARA [Chu et al.] (tabular data cleansing): compares tabular data against RDF data sets via crowdsourced mappings.
•  HARE (SPARQL query processing): resorts to crowdsourcing to complete missing values in RDF data sets.

CONCLUSIONS & FUTURE WORK


Conclusions

•  HARE: a hybrid query engine against RDF data sets.
•  Supports microtasks to enhance query answers on-the-fly.
•  Experimental results confirmed:
–  Size of query answer: increased by up to 3.13 times.
–  Crowd response time: up to 98% of judgments completed by the 12th minute.
–  Accuracy: 0.84 – 0.96.

Future Work

•  Study further approaches to capture crowd reliability.
•  Consider other quality dimensions on the knowledge collected from the crowd.

References

•  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: Query driven crowd mining. In SIGMOD, pages 589–600, 2014.
•  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. KATARA: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
•  [Franklin et al.] M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. CrowdDB: Answering queries with crowdsourcing. In SIGMOD, pages 61–72, 2011.
•  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
•  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
•  [Vidal et al.] M. E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.

HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal
