natural language questions for the web of data
![Page 1: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/1.jpg)
Natural Language Questions for the Web of Data
Mohamed Yahya, Klaus Berberich, Gerhard Weikum
Max Planck Institute for Informatics, Germany
Shady Elbassuoni
Qatar Computing Research Institute
Maya Ramanath
Dept. of CSE, IIT-Delhi, India
Volker Tresp
Siemens AG, Corporate Technology, Munich, Germany
EMNLP 2012
![Page 2: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/2.jpg)
Translation: qNL → qFL
qNL: natural language question, e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
qFL: SPARQL 1.0 query
?x hasGender female
?x isa actor
?x actedIn Casablanca_(film)
?x marriedTo ?w
?w isa writer
?w bornIn Rome
Problem: writing such a complex query is difficult for the user.
Solution: automatically translate qNL into qFL.
![Page 3: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/3.jpg)
YAGO2 is a huge semantic knowledge base derived from Wikipedia, WordNet and GeoNames.
(Figure: knowledge base with relations, classes and entities)
![Page 4: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/4.jpg)
System Architecture
• DEANNA (DEep Answers for maNy Naturally Asked questions)
![Page 5: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/5.jpg)
Phrase Detection
A detected phrase p is a pair <Toks, l>, where Toks is the phrase (its tokens) and l ∈ {concept, relation} is its label.
(Figure: phrase detection, from qNL to phrases)
Pr = {<*, relation>}: the set of relation phrases
Pc = {<*, concept>}: the set of concept phrases
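A minimal sketch of this phrase representation, assuming tokens are plain strings; the `Phrase` class and the `partition` helper are hypothetical names, not part of DEANNA:

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

LABELS = ("concept", "relation")


@dataclass(frozen=True)
class Phrase:
    """A detected phrase p = <Toks, l>."""
    toks: Tuple[str, ...]  # Toks: the tokens of the phrase
    label: str             # l: either "concept" or "relation"

    def __post_init__(self) -> None:
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def partition(phrases: Iterable[Phrase]) -> Tuple[Set[Phrase], Set[Phrase]]:
    """Split detected phrases into Pr (relation phrases) and Pc (concept phrases)."""
    phrases = list(phrases)  # allow one-shot iterables to be scanned twice
    pr = {p for p in phrases if p.label == "relation"}
    pc = {p for p in phrases if p.label == "concept"}
    return pr, pc


pr, pc = partition([Phrase(("played", "in"), "relation"),
                    Phrase(("Casablanca",), "concept")])
```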
![Page 6: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/6.jpg)
Phrase Detection
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
Concept phrase detection: use a detector that works against a phrase-concept dictionary.
Phrase-concept dictionary: instances of the means relation in Yago2.
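The dictionary lookup can be sketched as an n-gram scan over the question's tokens. This assumes the phrase-concept dictionary is available as a set of lowercased surface strings; the toy entries below are illustrative, not taken from Yago2. Overlapping matches are kept on purpose, since boundary ambiguity is resolved later during joint disambiguation.

```python
from typing import List, Set, Tuple


def detect_concept_phrases(tokens: List[str], dictionary: Set[str],
                           max_len: int = 4) -> List[Tuple[int, int]]:
    """Return every half-open (start, end) token span whose text is a dictionary key."""
    spans = []
    for start in range(len(tokens)):
        # Try all n-grams starting at this token, up to max_len tokens long.
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            if " ".join(tokens[start:end]).lower() in dictionary:
                spans.append((start, end))
    return spans


tokens = ("Which female actor played in Casablanca and is married "
          "to a writer who was born in Rome").split()
toy_dictionary = {"female actor", "actor", "casablanca", "a writer", "writer", "rome"}
spans = detect_concept_phrases(tokens, toy_dictionary)
```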
![Page 7: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/7.jpg)
Phrase Detection
Relation phrase detection: rely on a relation detector based on ReVerb (Fader et al., 2011) with additional POS tag patterns.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
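A rough approximation of a ReVerb-style detector, assuming Penn Treebank POS tags are already available. It uses a simplified form of ReVerb's syntactic constraint (one or more verbs, optionally followed by nouns/adjectives/adverbs/pronouns/determiners and then a preposition, particle or "to"); the additional POS patterns that DEANNA adds are not reproduced here.

```python
import re
from typing import List, Tuple


def coarse_tag(tag: str) -> str:
    """Collapse a Penn tag into V (verb), P (prep/particle/to), W (noun/adj/adv/pron/det) or O."""
    if tag.startswith("VB"):
        return "V"
    if tag in ("IN", "TO", "RP"):
        return "P"
    if tag.startswith(("NN", "JJ", "RB", "PRP")) or tag == "DT":
        return "W"
    return "O"


# Simplified ReVerb-like constraint: V+ optionally followed by W* P.
RELATION_PATTERN = re.compile(r"V+(?:W*P)?")


def detect_relation_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    # One character per token, so string offsets equal token offsets.
    tags = "".join(coarse_tag(t) for _, t in tagged)
    return [" ".join(w for w, _ in tagged[m.start():m.end()])
            for m in RELATION_PATTERN.finditer(tags)]


tagged = [("Which", "WDT"), ("female", "JJ"), ("actor", "NN"), ("played", "VBD"),
          ("in", "IN"), ("Casablanca", "NNP"), ("and", "CC"), ("is", "VBZ"),
          ("married", "VBN"), ("to", "TO"), ("a", "DT"), ("writer", "NN"),
          ("who", "WP"), ("was", "VBD"), ("born", "VBN"), ("in", "IN"), ("Rome", "NNP")]
```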
![Page 8: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/8.jpg)
Phrase Mapping
• Two kinds of phrase mapping:
– The mapping of concept phrases
– The mapping of relation phrases
(Figure: phrase mapping)
![Page 9: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/9.jpg)
Phrase Mapping
The mapping of concept phrases:
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
As in concept phrase detection, this relies on the phrase-concept dictionary: instances of the means relation in Yago2.
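Mapping a concept phrase to candidate semantic items then amounts to a dictionary lookup. A minimal sketch, assuming the means facts are given as (surface string, semantic item) pairs; the toy facts and the class name `wordnet_writer` are hypothetical. Every candidate is kept, since choosing among them is left to joint disambiguation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_means_index(means_facts: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (surface string, semantic item) pairs by lowercased surface string."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for surface, item in means_facts:
        index[surface.lower()].add(item)
    return index


def map_concept_phrase(phrase: str, index: Dict[str, Set[str]]) -> List[str]:
    """Return all candidate entities/classes for a concept phrase, sorted."""
    return sorted(index.get(phrase.lower(), set()))


toy_facts = [("Casablanca", "Casablanca"), ("Casablanca", "Casablanca_(film)"),
             ("Rome", "Rome"), ("writer", "wordnet_writer")]
index = build_means_index(toy_facts)
```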
![Page 10: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/10.jpg)
Phrase Mapping
The mapping of relation phrases: relies on a corpus of mappings from textual patterns to relations.
e.g. “Which female actor played in Casablanca and is married to a writer who was born in Rome?”
(Figure: textual patterns and the relations they map to)
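A similar lookup works for relation phrases. This sketch assumes the pattern corpus is a list of (textual pattern, relation) pairs and strips leading auxiliary verbs before matching; both the toy pairs and the auxiliary-stripping step are illustrative assumptions, not DEANNA's actual normalization.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical normalization: leading auxiliaries are ignored when matching.
AUXILIARIES = {"is", "was", "are", "were", "be", "been", "has", "have", "had"}


def normalize(phrase: str) -> str:
    toks = phrase.lower().split()
    while toks and toks[0] in AUXILIARIES:
        toks.pop(0)
    return " ".join(toks)


def build_pattern_index(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Index (textual pattern, relation) pairs by normalized pattern."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for pattern, relation in pairs:
        index[normalize(pattern)].add(relation)
    return index


def map_relation_phrase(phrase: str, index: Dict[str, Set[str]]) -> List[str]:
    """Return all candidate relations for a relation phrase, sorted."""
    return sorted(index.get(normalize(phrase), set()))


toy_pairs = [("played in", "actedIn"), ("played in", "playsFor"),
             ("acted in", "actedIn"), ("married to", "marriedTo"),
             ("born in", "bornIn")]
index = build_pattern_index(toy_pairs)
```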
![Page 11: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/11.jpg)
Q-Unit Generation
(Figure: Mapping, Q-Unit Generation, Candidate graph)
Two parts of the q-unit generation step:
• Dependency parsing
• Q-units: a q-unit is a triple of sets of phrases
![Page 12: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/12.jpg)
Q-Unit Generation
Dependency parsing: identifies triples of tokens <trel, targ1, targ2>, where trel, targ1, targ2 ∈ qNL.
e.g. “who was born in Rome?”
nsubjpass(born-3, who-1)
auxpass(born-3, was-2)
root(ROOT-0, born-3)
prep_in(born-3, Rome-5)
(Figure: dependency tree rooted at “born” (trel), with an nsubjpass edge to “who” (targ1) and a prep_in edge to “Rome” (targ2))
Resulting token triple: <born, who, Rome>
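The step from dependencies to token triples can be sketched as follows, assuming the parser output is available as Stanford-style strings like the ones above. The rule used here (pair each governor's nsubj/nsubjpass dependent with its dobj or prep_* dependent) is a simplified heuristic, not the full set of patterns DEANNA uses.

```python
import re
from typing import Dict, List, Tuple

# Matches e.g. "prep_in(born-3, Rome-5)".
DEP = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")
SUBJECT_RELS = {"nsubj", "nsubjpass"}


def extract_token_triples(deps: List[str]) -> List[Tuple[str, str, str]]:
    """Build <trel, targ1, targ2> triples from subject and object dependencies."""
    subj: Dict[Tuple[str, int], str] = {}
    obj: Dict[Tuple[str, int], str] = {}
    for line in deps:
        m = DEP.fullmatch(line.strip())
        if m is None:
            continue
        rel, gov, dep = m.group(1), (m.group(2), int(m.group(3))), m.group(4)
        if rel in SUBJECT_RELS:
            subj[gov] = dep
        elif rel == "dobj" or rel.startswith("prep_"):
            obj[gov] = dep
    # Keep governors that have both a subject-like and an object-like argument.
    return [(gov[0], subj[gov], obj[gov]) for gov in subj if gov in obj]


deps = ["nsubjpass(born-3, who-1)", "auxpass(born-3, was-2)",
        "root(ROOT-0, born-3)", "prep_in(born-3, Rome-5)"]
```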
![Page 13: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/13.jpg)
Q-Unit Generation
A q-unit is a triple of sets of phrases <{prel ∈ Pr}, {parg1 ∈ Pc}, {parg2 ∈ Pc}>, where trel ∈ prel, targ1 ∈ parg1, and targ2 ∈ parg2.
e.g. <{born, was born}, {a writer}, {Rome}>, with the three sets drawn from Pr, Pc and Pc respectively.
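A q-unit can be assembled by collecting, for each token of the triple, the detected phrases that contain it. This sketch assumes phrases are plain strings and that relative pronouns are resolved through a caller-supplied `antecedent` map (here "who" to "writer"); both are simplifications for illustration, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

QUnit = Tuple[Set[str], Set[str], Set[str]]


def containing(token: str, phrases: Iterable[str]) -> Set[str]:
    """Phrases whose tokens include the given token."""
    return {p for p in phrases if token in p.split()}


def build_q_unit(triple: Tuple[str, str, str], pr: Set[str], pc: Set[str],
                 antecedent: Optional[Dict[str, str]] = None) -> QUnit:
    antecedent = antecedent or {}
    trel, targ1, targ2 = triple
    # Hypothetical step: replace a relative pronoun by its antecedent token.
    targ1 = antecedent.get(targ1, targ1)
    targ2 = antecedent.get(targ2, targ2)
    return containing(trel, pr), containing(targ1, pc), containing(targ2, pc)


pr = {"played in", "is married to", "born", "was born"}
pc = {"female actor", "Casablanca", "a writer", "Rome"}
q_unit = build_q_unit(("born", "who", "Rome"), pr, pc, antecedent={"who": "writer"})
```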
![Page 14: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/14.jpg)
Joint Disambiguation
Rule 1: resolves the phrase boundary ambiguity (only non-overlapping phrases are mapped).
Rule 2: each phrase is assigned to at most one semantic item.
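Both rules can be checked on a candidate assignment. A minimal sketch, assuming phrases are identified by half-open (start, end) token spans and an assignment is a list of (span, semantic item) pairs; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if two half-open token spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def is_valid_assignment(pairs: List[Tuple[Span, str]]) -> bool:
    spans = [span for span, _ in pairs]
    # Rule 2: each phrase is assigned to at most one semantic item.
    if len(spans) != len(set(spans)):
        return False
    # Rule 1: only non-overlapping phrases are mapped.
    return not any(overlaps(a, b) for a, b in combinations(spans, 2))
```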
![Page 15: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/15.jpg)
Joint Disambiguation
Disambiguation Graph
• Joint disambiguation takes place over a disambiguation graph DG = (V, E), where
– V = Vs ∪ Vp ∪ Vq
– E = Esim ∪ Ecoh ∪ Eq
![Page 16: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/16.jpg)
Joint Disambiguation
Disambiguation Graph: Vertices
Vs: the set of s-nodes (semantic items)
Vp: the set of p-nodes (phrases), consisting of
– Vrp: the set of relation phrases
– Vrc: the set of concept phrases
Vq: a set of placeholder nodes for q-units
![Page 17: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/17.jpg)
Disambiguation Graph: Edges
Esim ⊆ Vp × Vs: a set of weighted similarity edges (sim-edges)
Ecoh ⊆ Vs × Vs: a set of weighted coherence edges (coh-edges)
Eq ⊆ Vq × Vp × d, with d ∈ {rel, arg1, arg2}: q-edges
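The graph can be held in a small container class. A sketch under the assumptions that nodes are identified by strings and that coherence edges are undirected (stored under an unordered pair); the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple

ROLES = {"rel", "arg1", "arg2"}


@dataclass
class DisambiguationGraph:
    s_nodes: Set[str] = field(default_factory=set)  # Vs: semantic items
    p_nodes: Set[str] = field(default_factory=set)  # Vp: phrases
    q_nodes: Set[str] = field(default_factory=set)  # Vq: q-unit placeholders
    sim: Dict[Tuple[str, str], float] = field(default_factory=dict)  # Esim
    coh: Dict[FrozenSet[str], float] = field(default_factory=dict)   # Ecoh
    q_edges: Set[Tuple[str, str, str]] = field(default_factory=set)  # Eq

    def add_sim(self, phrase: str, item: str, weight: float) -> None:
        """Weighted sim-edge from a p-node to an s-node."""
        self.p_nodes.add(phrase)
        self.s_nodes.add(item)
        self.sim[(phrase, item)] = weight

    def add_coh(self, item1: str, item2: str, weight: float) -> None:
        """Weighted coh-edge between two s-nodes (order-independent key)."""
        self.s_nodes.update((item1, item2))
        self.coh[frozenset((item1, item2))] = weight

    def add_q_edge(self, q_node: str, phrase: str, role: str) -> None:
        """q-edge labelled with d in {rel, arg1, arg2}."""
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.q_nodes.add(q_node)
        self.p_nodes.add(phrase)
        self.q_edges.add((q_node, phrase, role))


g = DisambiguationGraph()
g.add_sim("Casablanca", "Casablanca_(film)", 0.4)
g.add_coh("Casablanca_(film)", "actedIn", 0.5)
g.add_q_edge("q1", "played in", "rel")
```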
![Page 18: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/18.jpg)
Disambiguation Graph
Edge Weights
• Cohsem (semantic coherence) between two semantic items s1 and s2 is the Jaccard coefficient of their sets of inlinks.
• Three kinds of inlinks:
– InLinks(e) for entities
– InLinks(c) for classes
– InLinks(r) for relations
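The coherence weight itself is a plain Jaccard coefficient. A minimal sketch with inlink sets given as Python sets; returning 0.0 when both sets are empty is an assumption the slides do not address.

```python
from typing import Set


def coh_sem(inlinks1: Set[str], inlinks2: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two inlink sets."""
    union = inlinks1 | inlinks2
    if not union:
        return 0.0  # assumption: no inlinks means no measurable coherence
    return len(inlinks1 & inlinks2) / len(union)
```

For example, inlink sets {a, b, c} and {b, c, d} share 2 of 4 distinct inlinks, giving a coherence of 0.5.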
![Page 19: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/19.jpg)
Disambiguation Graph: Edge Weights
Cohsem: inlinks of an entity
• InLinks(e): the set of Yago2 entities whose corresponding Wikipedia pages link to the entity.
• e.g. InLinks(Casablanca) = {Marwan_al-Shehhi, Ingrid_Bergman, …, Morocco, …}
https://d5gate.ag5.mpi-sb.mpg.de/webyagospo/Browser
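InLinks(e) is the inverse of the Wikipedia link graph, restricted to pages that correspond to Yago2 entities. A sketch assuming outgoing links are given as a dict from page to linked pages; the function name is hypothetical and the toy link data is made up for illustration, not real Wikipedia data.

```python
from typing import Dict, Set


def entity_inlinks(outlinks: Dict[str, Set[str]], entities: Set[str]) -> Dict[str, Set[str]]:
    """Map each entity to the set of entities whose pages link to it."""
    result: Dict[str, Set[str]] = {e: set() for e in entities}
    for source, targets in outlinks.items():
        if source not in entities:
            continue  # links from pages without a Yago2 entity are ignored
        for target in targets:
            if target in result:
                result[target].add(source)
    return result


toy_outlinks = {"Ingrid_Bergman": {"Casablanca"},
                "Morocco": {"Casablanca", "Rabat"},
                "Some_non_entity_page": {"Casablanca"}}
inlinks = entity_inlinks(toy_outlinks, {"Ingrid_Bergman", "Morocco", "Casablanca", "Rabat"})
```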
![Page 20: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/20.jpg)
Disambiguation Graph: Edge Weights
Cohsem: inlinks of a class
• InLinks(c) = ∪ InLinks(e) over all entities e ∈ c
• e.g. InLinks(wikicategory_Metropolitan_areas_of_Morocco) = InLinks(Casablanca) ∪ InLinks(Marrakech) ∪ … ∪ InLinks(Rabat)
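The class-level inlinks are a union over member entities. A sketch assuming per-entity inlink sets are already computed; the function name and toy sets are illustrative.

```python
from typing import Dict, Iterable, Set


def class_inlinks(members: Iterable[str], inlinks: Dict[str, Set[str]]) -> Set[str]:
    """InLinks(c): union of InLinks(e) over all entities e in class c."""
    result: Set[str] = set()
    for entity in members:
        result |= inlinks.get(entity, set())
    return result


toy_inlinks = {"Casablanca": {"a", "b"}, "Rabat": {"b", "c"}, "Marrakech": {"d"}}
```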
![Page 21: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/21.jpg)
Disambiguation Graph: Edge Weights
• Cohsem: inlinks of a relation
• InLinks(r) = ∪ (InLinks(e1) ∩ InLinks(e2)) over all entity pairs (e1, e2) ∈ r
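For relations, only inlinks shared by both arguments of a fact contribute. A sketch assuming a relation is given as its list of (e1, e2) entity pairs; the function name, entity names and inlink sets are toy values.

```python
from typing import Dict, Iterable, Set, Tuple


def relation_inlinks(pairs: Iterable[Tuple[str, str]], inlinks: Dict[str, Set[str]]) -> Set[str]:
    """InLinks(r): union over (e1, e2) in r of InLinks(e1) ∩ InLinks(e2)."""
    result: Set[str] = set()
    for e1, e2 in pairs:
        result |= inlinks.get(e1, set()) & inlinks.get(e2, set())
    return result


toy_inlinks = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"z"}, "D": {"z", "w"}}
```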
![Page 22: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/22.jpg)
Similarity Weights
• Similarity weights of entities: how often a phrase refers to a certain entity in Wikipedia.
• Similarity weights of classes: reflect the number of members in a class.
• Similarity weights of relations: reflect the maximum n-gram similarity between the phrase and any of the relation’s surface forms.
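Two of these weights can be sketched directly. Here the entity weight is modeled as the fraction of a phrase's Wikipedia anchor occurrences that point to the entity, and the relation weight uses Jaccard similarity over character trigrams as the "n-gram similarity"; both concrete choices, the function names, and the toy counts are assumptions, since the slides do not give the formulas. The class weight is omitted because no formula is given.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def entity_sim(anchor_counts: Dict[str, Counter], phrase: str, entity: str) -> float:
    """Fraction of the phrase's anchor occurrences that refer to the entity."""
    counts = anchor_counts.get(phrase, Counter())
    total = sum(counts.values())
    return counts[entity] / total if total else 0.0


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def relation_sim(phrase: str, surface_forms: Iterable[str]) -> float:
    """Maximum trigram Jaccard similarity between the phrase and any surface form."""
    grams = char_ngrams(phrase)
    best = 0.0
    for form in surface_forms:
        other = char_ngrams(form)
        best = max(best, len(grams & other) / len(grams | other))
    return best


toy_counts = {"Casablanca": Counter({"Casablanca": 3, "Casablanca_(film)": 1})}
```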
![Page 23: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/23.jpg)
Joint Disambiguation
Disambiguation Graph Processing
• The result of disambiguation is a subgraph of the disambiguation graph, yielding the most coherent mappings.
• We employ an ILP (integer linear program) to this end.
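The ILP itself is shown only as images in these slides. As a stand-in, the joint selection can be illustrated by exhaustive search on a toy instance: choose at most one candidate per phrase, skip assignments that map overlapping phrases, and maximize a weighted sum of similarity and pairwise coherence. This brute-force search is not the authors' ILP (which was solved with Gurobi); it uses only two weights, alpha and beta, instead of DEANNA's full objective, ignores q-units, and is only feasible for a handful of phrases. The function name and toy weights are hypothetical.

```python
from itertools import combinations, product
from typing import Callable, Dict, Tuple

Span = Tuple[int, int]


def disambiguate(candidates: Dict[Span, Dict[str, float]],
                 coh: Callable[[str, str], float],
                 alpha: float = 1.0, beta: float = 1.0) -> Dict[Span, str]:
    """Exhaustively pick at most one item per phrase, maximizing alpha*sim + beta*coh."""
    spans = list(candidates)
    # None means "leave this phrase unmapped"; one choice per phrase enforces Rule 2.
    options = [[None] + sorted(candidates[s]) for s in spans]
    best: Dict[Span, str] = {}
    best_score = float("-inf")
    for choice in product(*options):
        chosen = [(s, item) for s, item in zip(spans, choice) if item is not None]
        # Rule 1: never map two overlapping phrases.
        if any(a[0] < b[1] and b[0] < a[1] for (a, _), (b, _) in combinations(chosen, 2)):
            continue
        score = alpha * sum(candidates[s][item] for s, item in chosen)
        score += beta * sum(coh(i, j) for (_, i), (_, j) in combinations(chosen, 2))
        if score > best_score:
            best, best_score = dict(chosen), score
    return best


candidates = {(3, 5): {"actedIn": 0.7, "playsFor": 0.3},
              (5, 6): {"Casablanca": 0.6, "Casablanca_(film)": 0.4}}
coherence = {frozenset(("actedIn", "Casablanca_(film)")): 0.5}


def toy_coh(a: str, b: str) -> float:
    return coherence.get(frozenset((a, b)), 0.0)
```

With coherence switched on, the weaker film reading of "Casablanca" wins because it coheres with actedIn; with beta set to 0, the higher-similarity city reading wins.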
![Page 24: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/24.jpg)
Joint Disambiguation: ILP
Definitions:
![Page 25: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/25.jpg)
Joint Disambiguation: ILP
Objective function:
![Page 26: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/26.jpg)
Joint Disambiguation: ILP
Constraints:
![Page 27: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/27.jpg)
Joint Disambiguation: ILP
Resulting subgraph:
![Page 28: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/28.jpg)
Query Generation
• Subject/object roles are not assigned in triploids and q-units.
• Each semantic class is replaced with a distinct type-constrained variable.
• Example: “Which singer is married to a singer?”
– ?x type singer, ?x marriedTo ?y, and ?y type singer
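A sketch of this step, assuming each q-unit has already been disambiguated into (relation phrase, arg1 phrase, arg2 phrase), that `mapping` gives each phrase's semantic item, and that arg1 becomes the subject; the slides do not say how roles are assigned, so that last choice is a simplification, and the function name is hypothetical. Variables are allocated per phrase, so a class phrase shared by two q-units reuses its variable while two different phrases of the same class get distinct variables.

```python
from typing import Dict, List, Set, Tuple

VAR_NAMES = ["?x", "?y", "?z", "?u", "?v", "?w"]


def generate_patterns(q_units: List[Tuple[str, str, str]], mapping: Dict[str, str],
                      classes: Set[str]) -> List[str]:
    variables: Dict[str, str] = {}
    patterns: List[str] = []

    def term(phrase: str) -> str:
        item = mapping[phrase]
        if item not in classes:
            return item  # entities stay as constants
        if phrase not in variables:
            n = len(variables)
            var = VAR_NAMES[n] if n < len(VAR_NAMES) else f"?v{n}"
            variables[phrase] = var
            patterns.append(f"{var} type {item}")  # type constraint for the class
        return variables[phrase]

    for rel, arg1, arg2 in q_units:
        subject, obj = term(arg1), term(arg2)
        patterns.append(f"{subject} {mapping[rel]} {obj}")
    return patterns


patterns = generate_patterns(
    [("is married to", "Which singer", "a singer")],
    {"is married to": "marriedTo", "Which singer": "singer", "a singer": "singer"},
    {"singer"})
```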
![Page 29: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/29.jpg)
Query Generation
• e.g. (Figure: q-units of the form arg1 rel arg2 are turned into triple patterns, replacing each semantic class with a variable such as ?x or ?y)
Generated triple patterns:
?x type writer
?y type person
?x bornIn Rome
?y actedIn Casablanca
?y married ?x
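Turning such triple patterns into a SPARQL query string is mechanical. This sketch assumes whitespace-separated patterns like the ones above and writes every non-variable term as a full IRI under a placeholder namespace (`http://example.org/yago/`, hypothetical, not Yago2's actual namespace), including `type`, which a real endpoint would more likely express as rdf:type. The function names are hypothetical, and the projection variables are chosen by the caller.

```python
from typing import List

BASE = "http://example.org/yago/"  # hypothetical namespace


def sparql_term(token: str) -> str:
    """Variables stay as-is; everything else becomes an IRI under BASE."""
    return token if token.startswith("?") else f"<{BASE}{token}>"


def to_sparql(patterns: List[str], projection: List[str]) -> str:
    lines = [f"SELECT {' '.join(projection)} WHERE {{"]
    for pattern in patterns:
        s, p, o = pattern.split()
        lines.append(f"  {sparql_term(s)} {sparql_term(p)} {sparql_term(o)} .")
    lines.append("}")
    return "\n".join(lines)


query = to_sparql(["?x type writer", "?x bornIn Rome", "?y married ?x"], ["?y"])
```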
![Page 30: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/30.jpg)
Evaluation
Three parts of the evaluation:
• Datasets
• Evaluation metrics
• Results & discussion
![Page 31: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/31.jpg)
Datasets
• Experiments are based on two datasets:
– QALD-1: from the 1st Workshop on Question Answering over Linked Data (QALD-1)
– NAGA collection: questions from the context of the NAGA project, based on linking data from the Yago2 knowledge base
• Training set: 23 QALD-1 questions, 43 NAGA questions
• Test set: 27 QALD-1 questions, 44 NAGA questions
• Hyperparameters: (α, β, γ) in the ILP objective function
• 19 QALD-1 questions in the test set
![Page 32: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/32.jpg)
Evaluation Metrics
• The output of DEANNA was evaluated at three stages:
– after the disambiguation of phrases
– after the generation of the SPARQL query
– after obtaining answers from the underlying linked-data sources
• Judgement:
– two human assessors
– if they disagreed, a third person resolved the judgment
![Page 33: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/33.jpg)
Evaluation Metrics
Disambiguation stage
• Looked at each q-node/s-node pair:
– whether the mapping was correct or not
– whether any expected mappings were missing
![Page 34: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/34.jpg)
Evaluation Metrics
Query-generation stage
• Looked at each triple pattern:
– whether the pattern was meaningful for the question or not
– whether any expected triple pattern was missing
• e.g. (triple patterns): ?x bornIn Rome, ?y actedIn Casablanca, ?y married ?x
![Page 35: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/35.jpg)
Query-answering stage
• The judges were asked to identify whether the result sets for the generated queries are satisfactory.
![Page 36: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/36.jpg)
Results
• For a question q and an item set s in one of the stages of evaluation:
– correct(q, s): the number of correct items in s
– ideal(q): the size of the ideal item set
– retrieved(q, s): the number of retrieved items
• Coverage and precision are defined as follows:
– cov(q, s) = correct(q, s) / ideal(q)
– prec(q, s) = correct(q, s) / retrieved(q, s)
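Written out as code, the two measures are simple ratios. A minimal sketch; returning 0.0 when the denominator is zero is an assumption, as the slides do not cover that case.

```python
def coverage(correct: int, ideal: int) -> float:
    """cov(q, s) = correct(q, s) / ideal(q)."""
    return correct / ideal if ideal else 0.0


def precision(correct: int, retrieved: int) -> float:
    """prec(q, s) = correct(q, s) / retrieved(q, s)."""
    return correct / retrieved if retrieved else 0.0
```

For instance, 3 correct items out of 4 ideal and 5 retrieved give a coverage of 0.75 and a precision of 0.6.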
![Page 37: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/37.jpg)
• Micro-averaging: aggregates over all assessed items regardless of the questions to which they belong.
• Macro-averaging: first aggregates the items for the same question, and then averages the quality measure over all questions.
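The difference between the two averages is easiest to see on precision. A sketch assuming per-question (correct, retrieved) counts with retrieved > 0; the function names and numbers are toy values.

```python
from typing import List, Tuple


def micro_precision(stats: List[Tuple[int, int]]) -> float:
    """Pool all assessed items, ignoring which question they belong to."""
    return sum(c for c, _ in stats) / sum(r for _, r in stats)


def macro_precision(stats: List[Tuple[int, int]]) -> float:
    """Compute precision per question first, then average over questions."""
    return sum(c / r for c, r in stats) / len(stats)


# Question 1: 1 of 1 retrieved items correct; question 2: 1 of 3 correct.
stats = [(1, 1), (1, 3)]
```

Here micro-averaging gives 2/4 = 0.5, while macro-averaging gives (1 + 1/3) / 2, about 0.67, because the question with fewer retrieved items counts as much as the other.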
![Page 38: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/38.jpg)
Results
• Example questions, the generated SPARQL queries and their answers
• Note: in Yago2, the relation bornIn relates people to cities, not countries.
![Page 39: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/39.jpg)
Results
Using relaxation (Elbassuoni et al., 2009)
![Page 40: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/40.jpg)
![Page 41: Natural Language Questions for the Web of Data](https://reader035.vdocuments.site/reader035/viewer/2022062805/56814e04550346895dbb7142/html5/thumbnails/41.jpg)
Conclusions
• The authors presented a method for translating natural language questions into structured queries.
• Although their model, in principle, leads to high combinatorial complexity, they observed that the Gurobi solver could handle their judiciously designed ILP very efficiently.
• Their experimental studies showed very high precision and good coverage of the query translation, and good results in the actual question answers.