

Poster PDF: ranger.uta.edu/~cli/talks/2016/gqbe-ICDE-TKDEposter.pdf

Querying Knowledge Graphs by Example Entity Tuples

Nandish Jayaram 1+, Arijit Khan 2*, Chengkai Li 1, Xifeng Yan 2, Ramez Elmasri 1

1 University of Texas at Arlington
2 University of California, Santa Barbara

Usability Challenges

• Big and complex data: lack of schema, challenging to users and developers.

• How to query the graph, and understand the results.

Related Work

• Query-by-example in relational databases [Zloof’75].

• Keyword search and keyword-based query formulation [Chang et al.’11].

• Set expansion [Wang et al.’07]. XML query relaxation [Amer-Yahia et al.’2005].

• Exemplar queries [Mottin et al., 2014].

Technical Details and Demo

N. Jayaram, A. Khan, C. Li, X. Yan and R. Elmasri. Querying knowledge graphs by example entity tuples, TKDE 2015.

N. Jayaram, M. Gupta, A. Khan, C. Li, X. Yan and R. Elmasri. GQBE: Querying knowledge graphs by example entity tuples, ICDE 2014.

Demo URL: http://idir.uta.edu/gqbe

Query Processing

Figure: query processing time in seconds (log scale) for queries F1 to F20, comparing GQBE, NESS and Baseline; each query is annotated with the number of edges in its MQG.

Query Interface

Experiments

Query Graph Discovery

Answer Space Modeling

Multi-Tuple Query Graphs

• Nodes (F) and (HL) are two minimal query trees.

• Node (F) corresponds to the sub-graph that connects Jerry Yang and Yahoo through edge founded.

• Node (FGHLP) is the MQG, and it corresponds to the entire query graph on the left.
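The lattice nodes described above can be enumerated on a toy MQG. The following is a minimal sketch, not the authors' implementation: only the edge letters F, G, H, L, P and the names Jerry Yang, Yahoo and founded come from the poster, while the endpoints of G, H, L and P (and the placeholder vertices n1, n2, n3) are assumptions chosen so that (F) and (HL) come out as the minimal query trees. Each lattice node is taken to be a set of MQG edges that forms one connected graph containing every query entity.

```python
from itertools import combinations

# Hypothetical topology: edge letter -> (endpoint, endpoint).
MQG = {
    "F": ("Jerry Yang", "Yahoo"),   # the `founded` edge from the poster
    "G": ("Yahoo", "n1"),           # assumed endpoints
    "H": ("Jerry Yang", "n2"),      # assumed endpoints
    "L": ("n2", "Yahoo"),           # assumed endpoints
    "P": ("Jerry Yang", "n3"),      # assumed endpoints
}
ENTITIES = ("Jerry Yang", "Yahoo")


def is_query_graph(labels):
    """True if the chosen edges form one connected graph containing every query entity."""
    adj = {}
    for label in labels:
        u, v = MQG[label]
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if any(entity not in adj for entity in ENTITIES):
        return False
    seen, stack = {ENTITIES[0]}, [ENTITIES[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == set(adj)  # every vertex of the subgraph was reached


# Every valid edge subset is one lattice node.
lattice = {
    frozenset(combo)
    for size in range(1, len(MQG) + 1)
    for combo in combinations(sorted(MQG), size)
    if is_query_graph(combo)
}

# Minimal query trees: nodes with no proper sub-graph in the lattice.
minimal_trees = [n for n in lattice if not any(m < n for m in lattice)]

# The top of the lattice is the MQG itself.
top = max(lattice, key=len)
```

Brute-force enumeration is exponential in the number of MQG edges, so it is only meant to make the structure concrete for a handful of edges.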

Ground truth based accuracy comparison of GQBE and NESS. Comparison of GQBE, NESS and Exemplar Queries. The measured parameters are precision-at-k, Mean Average Precision and normalized Discounted Cumulative Gain.

Pearson Correlation Coefficient (PCC) between GQBE and Amazon Mechanical Turk workers, for k = 30. MTurk workers were presented with answer pairs and asked for their preference between the two answers in each pair. 20,000 such opinions were collected.

Query processing times of GQBE, NESS and Baseline.
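For reference, the three ranking measures named in the captions above can be computed as follows. This is a sketch using common textbook definitions; the exact variants used in the experiments are not stated on the poster, and the answer identifiers below are made up for illustration. Mean Average Precision is the mean of `average_precision` over all queries.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k answers that are in the ground truth."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant answer appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, gains, k):
    """DCG of the top-k answers divided by the DCG of an ideal ordering."""
    def dcg(values):
        return sum(g / math.log2(pos + 1) for pos, g in enumerate(values, start=1))

    actual = dcg([gains.get(item, 0) for item in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal else 0.0


# Hypothetical ranked answer list and ground truth.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
gains = {"a": 1, "c": 1}  # binary relevance used as gain

p2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
ndcg = ndcg_at_k(ranked, gains, 4)
```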

Obtain neighborhood graph.

Weight edges using heuristics and use a greedy approach to obtain a smaller connected MQG.
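The two steps above can be sketched as follows. This is an illustration under stated assumptions, not GQBE's actual procedure: the edge weights are taken as given (the weighting heuristics are not spelled out on the poster), the neighborhood is a bounded-hop breadth-first expansion, and the greedy step simply drops the lightest edges while all query entities stay in one connected graph. The function names, the `hops` and `max_edges` parameters, and every fact other than the founded edge between Jerry Yang and Yahoo are hypothetical.

```python
from collections import defaultdict, deque


def neighborhood_edges(edges, entities, hops):
    """Collect edges reachable within `hops` undirected steps of any query entity."""
    adj = defaultdict(list)
    for edge in edges:
        s, _, o = edge
        adj[s].append((o, edge))
        adj[o].append((s, edge))
    dist = {entity: 0 for entity in entities}
    queue = deque(entities)
    kept = set()
    while queue:
        v = queue.popleft()
        if dist[v] == hops:
            continue  # do not expand beyond the hop limit
        for w, edge in adj[v]:
            kept.add(edge)
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return kept


def component_edges(edge_set, start):
    """Keep only the edges in the connected component that contains `start`."""
    adj = defaultdict(list)
    for s, _, o in edge_set:
        adj[s].append(o)
        adj[o].append(s)
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return {edge for edge in edge_set if edge[0] in seen}


def covers(edge_set, entities):
    """True if every query entity is a vertex of the graph."""
    nodes = {entities[0]}
    for s, _, o in edge_set:
        nodes.update((s, o))
    return all(entity in nodes for entity in entities)


def greedy_mqg(edges, weights, entities, max_edges, hops=2):
    """Drop the lightest edges first, never disconnecting the query entities."""
    current = component_edges(neighborhood_edges(edges, entities, hops), entities[0])
    for edge in sorted(current, key=lambda e: weights[e]):
        if len(current) <= max_edges:
            break
        trial = component_edges(current - {edge}, entities[0])
        if covers(trial, entities):
            current = trial
    return current


# Illustrative data; weights are made-up integers.
weights = {
    ("Jerry Yang", "founded", "Yahoo"): 9,
    ("Yahoo", "headquartered_in", "Sunnyvale"): 6,
    ("Jerry Yang", "education", "Stanford"): 5,
    ("Sunnyvale", "in_state", "California"): 2,
    ("Jerry Yang", "nationality", "USA"): 1,
}
mqg = greedy_mqg(list(weights), weights, ["Jerry Yang", "Yahoo"], max_edges=3)
```

The loop may stop above `max_edges` when every remaining edge is needed to keep the entities connected; the sketch accepts that rather than breaking connectivity.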

Initial Lattice

Lower Bound (LB): edge matching score of the corresponding graph.

Upper Bound (UB): score of the highest-scored super-graph in the lattice.

(GHL), a node that does not have any answers, is a null node.

All the super-graphs of a null node are pruned.
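The bounds and the pruning rule above can be illustrated on a small lattice. This is a sketch under simplifying assumptions: the edge weights are invented integers, the score of a graph is reduced to the sum of its edge weights (the poster does not define the edge matching score in detail), and the lattice is generated as every edge set that contains one of the two minimal query trees (F) and (HL), which ignores the connectivity check a real lattice would need.

```python
from itertools import combinations

WEIGHTS = {"F": 9, "G": 1, "H": 4, "L": 3, "P": 2}  # hypothetical edge weights
MINIMAL_TREES = [frozenset("F"), frozenset("HL")]   # from the poster's example


def build_lattice(weights, minimal_trees):
    """All edge sets that extend at least one minimal query tree."""
    labels = sorted(weights)
    return {
        frozenset(combo)
        for size in range(1, len(labels) + 1)
        for combo in combinations(labels, size)
        if any(tree <= frozenset(combo) for tree in minimal_trees)
    }


def lower_bound(node, weights):
    """Simplified score of the node's own graph: sum of its edge weights."""
    return sum(weights[edge] for edge in node)


def upper_bound(node, lattice, weights):
    """Score of the highest-scored super-graph still present in the lattice."""
    return max(lower_bound(sup, weights) for sup in lattice if node <= sup)


def prune_null(lattice, null_node):
    """Remove a null node together with all of its super-graphs."""
    return {node for node in lattice if not null_node <= node}


lattice = build_lattice(WEIGHTS, MINIMAL_TREES)
ub_before = upper_bound(frozenset("F"), lattice, WEIGHTS)  # bounded by (FGHLP)

# (GHL) has no answers, so it and its super-graphs are dropped.
pruned = prune_null(lattice, frozenset("GHL"))
ub_after = upper_bound(frozenset("F"), pruned, WEIGHTS)    # now bounded by (FHLP)
```

Pruning is sound because a super-graph imposes every constraint of its sub-graph: if (GHL) matches nothing, no graph containing G, H and L can match anything either. The upper bounds of the surviving nodes then have to be recomputed against the smaller lattice.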

GQBE Architecture

Multiple MQGs are merged based on edge label and vertex label matches.
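One way to picture the merge is sketched below. It is an assumption-laden illustration rather than GQBE's merging rule: each tuple's entities are renamed to positional placeholders (`?0`, `?1`) so that edges from different tuples become comparable, edges that then agree on label and both endpoints are merged, and the merged weight is taken to be the sum of the individual weights. The second tuple, all weights, and every edge other than founded are made up for the example.

```python
def canonical(mqg, example_tuple):
    """Rename the i-th tuple entity to '?i' so MQGs of different tuples line up."""
    rename = {entity: f"?{i}" for i, entity in enumerate(example_tuple)}
    return {
        (rename.get(s, s), label, rename.get(o, o)): weight
        for (s, label, o), weight in mqg.items()
    }


def merge_mqgs(mqgs, tuples):
    """Merge edges that match on label and endpoint labels, summing their weights."""
    merged = {}
    for mqg, example_tuple in zip(mqgs, tuples):
        for edge, weight in canonical(mqg, example_tuple).items():
            merged[edge] = merged.get(edge, 0) + weight
    return merged


# Hypothetical per-tuple MQGs: edge -> weight.
mqg1 = {
    ("Jerry Yang", "founded", "Yahoo"): 9,
    ("Jerry Yang", "education", "Stanford"): 4,
    ("Yahoo", "industry", "Internet"): 3,
}
mqg2 = {
    ("Sergey Brin", "founded", "Google"): 8,
    ("Sergey Brin", "education", "Stanford"): 5,
    ("Google", "headquartered_in", "Mountain View"): 2,
}
merged = merge_mqgs(
    [mqg1, mqg2],
    [("Jerry Yang", "Yahoo"), ("Sergey Brin", "Google")],
)
```

Edges shared by both tuples (founded, education at Stanford) end up with higher weight than edges seen in only one MQG, which is the intuition behind combining several example tuples.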

Modified lattice with recomputed upper boundary

Knowledge Graphs

Input: an example of what the user wants to find.

Accuracy of GQBE on multi-tuple queries, k = 25.

Comparison of lattice nodes evaluated.

Query processing time for 2-tuple queries.

Partially funded by:

+ Now with Pivotal. * Now with Nanyang Technological University, Singapore.

Figure: query processing time in seconds (log scale, 1.0E-2 to 1.0E+4) for 2-tuple queries, comparing Combined (1,2) with Tuple1+Tuple2.

Figure: number of lattice nodes evaluated (axis 0 to 200, with one off-scale value labeled 840) for queries F1 to F20, comparing GQBE and Baseline; each query is annotated with the number of edges in its MQG.

DATASETS:
Freebase (28 M nodes, 47 M edges, 5400 relationships)
DBpedia (759 K nodes, 2.6 M edges, 9100 relationships)

QUERIES:
20 queries on Freebase
8 queries on DBpedia

Query Graph 1

Query Graph 2