![Page 1: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/1.jpg)
Philipp Cimiano, Christina Unger and André Freitas
10th Reasoning Web Summer School
![Page 2: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/2.jpg)
Understand how Question Answering (QA) can address Linked Data consumption challenges.
Give you a quick overview of the state of the art.
Give you the fundamental pointers to develop your own QA system.
![Page 3: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/3.jpg)
Motivation & Context
Challenges for QA over Linked Data
The Anatomy of a QA System
QA over Linked Data (Case Studies)
Evaluation of QA over Linked Data
Do-it-yourself (DIY): Core Resources
Trends
Take-away Message
![Page 4: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/4.jpg)
Motivation & Context
![Page 5: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/5.jpg)
Humans have built-in natural language communication capabilities.
Very natural way for humans to communicate information needs.
The archetypal AI system.
![Page 6: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/6.jpg)
![Page 7: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/7.jpg)
A research field on its own.
Empirical bias: Focus on the development and evaluation of approaches and systems to answer questions over a knowledge base.
Multidisciplinary:
◦ Natural Language Processing
◦ Information Retrieval
◦ Knowledge Representation
◦ Databases
◦ Linguistics
◦ Artificial Intelligence
◦ Software Engineering
◦ ...
![Page 8: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/8.jpg)
From the QA expert perspective:
◦ QA depends on mastering different semantic computing techniques.
![Page 9: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/9.jpg)
QA System
![Page 10: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/10.jpg)
Keyword Search:
◦ The user still carries most of the effort in interpreting the data.
◦ Satisfying information needs may depend on multiple search operations.
◦ Answer-driven information access.
◦ Input: keyword search (typically the specification of simpler information needs).
◦ Output: documents, structured data.
QA:
◦ Delegates more ‘interpretation effort’ to the machine.
◦ Query-driven information access.
◦ Input: natural language query (specification of complex information needs).
◦ Output: direct answer.
![Page 11: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/11.jpg)
Structured Queries:
◦ A priori user effort in understanding the schemas behind databases.
◦ Effort in mastering the syntax of a query language.
◦ Satisfying information needs may depend on multiple querying operations.
◦ Input: structured query
◦ Output: data records, aggregations, etc.
QA:
◦ Delegates more ‘semantic interpretation effort’ to the machine.
◦ Input: natural language query
◦ Output: direct natural language answer
![Page 12: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/12.jpg)
Keyword search:
◦ Simple information needs.
◦ Vocabulary redundancy (large document collections, Web).
Structured queries:
◦ Demand for absolute precision/recall guarantees.
◦ Small & centralized schemas.
◦ More data volume/smaller schema size.
QA:
◦ Heterogeneous and schema-less data.
◦ Specification of complex information needs.
◦ More automated semantic interpretation.
![Page 13: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/13.jpg)
![Page 14: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/14.jpg)
![Page 15: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/15.jpg)
![Page 16: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/16.jpg)
![Page 17: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/17.jpg)
QA is usually associated with the delegation of more of the ‘interpretation effort’ to the machine.
QA, keyword search and structured queries are complementary data access perspectives.
QA is making its way into industry.
![Page 18: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/18.jpg)
Challenges for QA over Linked Data
![Page 19: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/19.jpg)
![Page 20: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/20.jpg)
Example: What is the currency of the Czech Republic?
SELECT DISTINCT ?uri
WHERE {
res:Czech_Republic dbo:currency ?uri .
}
Main challenges:
Mapping natural language expressions to vocabulary elements (accounting for lexical and structural differences).
Handling meaning variations (e.g. ambiguous or vague expressions, anaphoric expressions).
![Page 21: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/21.jpg)
URIs are language independent identifiers.
Their only actual connection to natural language is by the labels that are attached to them.
dbo:spouse rdfs:label "spouse"@en , "echtgenoot"@nl .
Labels, however, do not capture lexical variation:
wife of
husband of
married to
...
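One way to picture what is missing from the labels is a small lexicon that maps several surface forms onto the same property. The sketch below is illustrative only: the `LEXICALIZATIONS` table and the `lookup` helper are hypothetical names, and the phrase list is just the one from this slide, not data from any real lexicon.

```python
from collections import defaultdict

# Hypothetical hand-written lexicon: property -> surface forms that express it.
LEXICALIZATIONS = {
    "dbo:spouse": ["spouse", "wife of", "husband of", "married to"],
}


def build_index(lexicalizations):
    """Invert the lexicon so that a phrase can be looked up directly."""
    index = defaultdict(set)
    for prop, phrases in lexicalizations.items():
        for phrase in phrases:
            index[phrase.lower()].add(prop)
    return index


INDEX = build_index(LEXICALIZATIONS)


def lookup(phrase):
    """Return the properties a phrase may refer to (empty list if unknown)."""
    return sorted(INDEX.get(phrase.strip().lower(), set()))


print(lookup("married to"))
print(lookup("born in"))
```

A label-only lookup would cover just the first entry of the list; the other three phrases need to come from somewhere else, such as a lexicon or a pattern library.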
![Page 22: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/22.jpg)
Which Greek cities have more than 1 million inhabitants?
SELECT DISTINCT ?uri
WHERE {
?uri rdf:type dbo:City .
?uri dbo:country res:Greece .
?uri dbo:populationTotal ?p .
FILTER (?p > 1000000)
}
![Page 23: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/23.jpg)
Often the conceptual granularity of language does not coincide with that of the data schema.
When did Germany join the EU?
SELECT DISTINCT ?date
WHERE {
res:Germany dbp:accessioneudate ?date .
}
Who are the grandchildren of Bruce Lee?
SELECT DISTINCT ?uri
WHERE {
res:Bruce_Lee dbo:child ?c .
?c dbo:child ?uri .
}
![Page 24: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/24.jpg)
In addition, there are expressions with a fixed, dataset-independent meaning.
Who produced the most films?
SELECT DISTINCT ?uri
WHERE {
?x rdf:type dbo:Film .
?x dbo:producer ?uri .
}
GROUP BY ?uri
ORDER BY DESC(COUNT(?x))
OFFSET 0 LIMIT 1
![Page 25: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/25.jpg)
Different datasets usually follow different schemas and thus provide different ways of answering an information need.
Example:
![Page 26: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/26.jpg)
The meaning of expressions like the verbs to be, to have, and prepositions of, with, etc. strongly depends on the linguistic context.
Which museum has the most paintings?
?museum dbo:exhibits ?painting .
Which country has the most caves?
?cave dbo:location ?country .
![Page 27: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/27.jpg)
The number of non-English actors on the web is growing substantially.
◦ Accessing data.
◦ Creating and publishing data.
Semantic Web: In principle very well suited for multilinguality, as URIs are language-independent.
But adding multilingual labels is not common practice (less than a quarter of the RDF literals have language tags, and most of those tags are in English).
![Page 28: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/28.jpg)
Requirement: Completeness and accuracy
(Wrong answers are worse than no answers)
In the context of the Semantic Web:
QA systems need to deal with heterogeneous and imperfect data.
◦ Datasets are often incomplete.
◦ Different datasets sometimes contain duplicate information, often using different vocabularies even when talking about the same things.
◦ Datasets can also contain conflicting information and inconsistencies.
![Page 29: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/29.jpg)
Data is distributed among a large collection of interconnected datasets.
Example: What are side effects of drugs used for the treatment of Tuberculosis?
SELECT DISTINCT ?x
WHERE {
disease:1154 diseasome:possibleDrug ?d1.
?d1 a drugbank:drugs .
?d1 owl:sameAs ?d2.
?d2 sider:sideEffect ?x.
}
![Page 30: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/30.jpg)
Requirement: Real-time answers, i.e. low processing time.
In the context of the Semantic Web:
Datasets are huge.
◦ There are a lot of distributed datasets that might be relevant for answering the question.
◦ Reported performance of current QA systems amounts to ~20-30 seconds per question (on one dataset).
![Page 31: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/31.jpg)
Bridge the gap between natural languages and data.
Deal with incomplete, noisy and heterogeneous datasets.
Scale to a large number of huge datasets.
Use distributed and interlinked datasets.
Integrate structured and unstructured data.
Low maintenance costs (easily adaptable to new datasets and domains).
![Page 32: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/32.jpg)
The Anatomy of a QA System
![Page 33: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/33.jpg)
Categorization of question, answer and data types.
Important for:
◦ What information in the question can be used?
◦ Scoping the QA system.
◦ Understanding the challenges before attacking the problem.
Based on:
◦ Chin-Yew Lin: Question Answering.
◦ Farah Benamara: Question Answering Systems: State of the Art and Future Directions.
![Page 34: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/34.jpg)
Natural Language Interfaces (NLI)
◦ Input: natural language queries
◦ Output:
QA: direct answers.
NLI: database records, text snippets, documents, data visualizations.
NLI
QA
![Page 35: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/35.jpg)
What is in the question?
![Page 36: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/36.jpg)
The part of the question that says what is being asked:
◦ Wh-words: who, what, which, when, where, why, and how
◦ Wh-words + nouns, adjectives or adverbs: “which party …”, “which actress …”, “how long …”, “how tall …”.
![Page 37: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/37.jpg)
Question focus: The property or entity that is being sought by the question
◦ “In which city was Barack Obama born?”
◦ “What is the population of Galway?”
Question topic: What the question is generally about
◦ “What is the height of Mount Everest?” (geography, mountains)
◦ “Which organ is affected by the Meniere’s disease?” (medicine)
![Page 38: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/38.jpg)
Useful for distinguishing different processing strategies
◦ FACTOID:
PREDICATIVE QUESTIONS:
“Who was the first man in space?”
“What is the highest mountain in Korea?”
“How far is Earth from Mars?”
“When did the Jurassic Period end?”
“Where is the Taj Mahal?”
LIST: “Give me all cities in Germany.”
SUPERLATIVE: “What is the highest mountain?”
YES-NO: “Was Margaret Thatcher a chemist?”
![Page 39: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/39.jpg)
Useful for distinguishing different processing strategies
◦ OPINION: “What do most Americans think of gun control?”
◦ CAUSE & EFFECT: “What is the most frequent cause for lung cancer?”
◦ PROCESS: “How do I make a cheese cake?”
◦ EXPLANATION & JUSTIFICATION: “Why did the revenue of IBM drop?”
◦ ASSOCIATION QUESTION: “What is the connection between Barack Obama and Indonesia?”
◦ EVALUATIVE OR COMPARATIVE QUESTIONS: “What is the difference between impressionism and expressionism?”
![Page 40: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/40.jpg)
Usually:
◦ Rules + Part-of-Speech Tags + Regular Expressions
... goes a long way!
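As a rough illustration of the rule-based route, the sketch below classifies questions with an ordered list of regular expressions. It covers only the "rules + regular expressions" part (no part-of-speech tagging), and both the rule set and the label names are assumptions chosen to echo the example questions in this section, not a published scheme.

```python
import re

# Ordered (pattern, label) rules; the first matching rule wins.
# Specific "how many/far/..." patterns come before the generic "how" rule.
RULES = [
    (re.compile(r"^(is|are|was|were|do|does|did|can|has|have)\b", re.I), "YES-NO"),
    (re.compile(r"^(give me|list|name)\b", re.I), "LIST"),
    (re.compile(r"^how (many|much|far|long|tall|old)\b", re.I), "NUMERIC"),
    (re.compile(r"^when\b", re.I), "TEMPORAL"),
    (re.compile(r"^where\b", re.I), "LOCATION"),
    (re.compile(r"^who(m|se)?\b", re.I), "HUMAN"),
    (re.compile(r"^why\b", re.I), "EXPLANATION"),
    (re.compile(r"^how\b", re.I), "PROCESS"),
    (re.compile(r"^(what|which)\b", re.I), "ENTITY"),
]


def classify_question(question):
    """Return the label of the first rule that matches the question."""
    text = question.strip()
    for pattern, label in RULES:
        if pattern.search(text):
            return label
    return "UNKNOWN"


for q in ["Where is the Taj Mahal?", "How far is Earth from Mars?",
          "Was Margaret Thatcher a chemist?"]:
    print(q, "->", classify_question(q))
```

Rule order matters here: a question such as "How do I make a cheese cake?" only reaches the generic "how" rule because the numeric "how far/long/..." rule is tried first and fails.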
![Page 41: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/41.jpg)
What is in the data?
![Page 42: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/42.jpg)
Structure level:
◦ Structured data.
◦ Semi-structured data.
◦ Unstructured data.
Data source distribution:
◦ Single dataset (centralized).
◦ Enumerated list of multiple, distributed datasets.
◦ Web-scale.
![Page 43: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/43.jpg)
Domain Scope:
◦ Open domain
◦ Domain specific
Data Type:
◦ Structured Data
◦ Text
◦ Image
◦ Sound
◦ Video
Multi-modal QA (both input and output)
◦ E.g. visual, voice modalities
![Page 44: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/44.jpg)
What is in the answer?
![Page 45: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/45.jpg)
The class of object sought by the question:
Entity: event, color, animal, plant, ...
Description, Explanation & Justification: definition, manner, reason, ... (“How, why …”)
Human: group, individual, ... (“Who …”)
Location: city, country, mountain, ... (“Where …”)
Numeric: count, distance, size, ... (“How many, how far, how long …”)
Temporal: date, time, ... (“When …”)
Abbreviation
![Page 46: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/46.jpg)
Long answers
◦ Definition/justification based.
Short answers
◦ Phrases.
◦ Named entities, numbers, aggregate, yes/no.
![Page 47: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/47.jpg)
Relevance: The degree to which the answer addresses the user's information needs.
Correctness: The degree to which the answer is factually correct.
Conciseness: The answer should not contain irrelevant information.
Completeness: The answer should be complete.
Simplicity: The answer should be easy to interpret.
Justification: Sufficient context should be provided to help the data consumer determine whether the query is correct.
![Page 48: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/48.jpg)
Right: The answer is correct and complete.
Inexact: The answer is incomplete or incorrect.
Unsupported: The answer does not have appropriate evidence/justification.
Wrong: The answer is not appropriate for the question.
![Page 49: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/49.jpg)
What is in the QA system?
![Page 50: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/50.jpg)
Simple Extraction: Direct extraction of snippets from the original document(s) / data records.
Combination: Combines excerpts from multiple sentences, documents / multiple data records, databases.
Summarization: Synthesis from large texts / data collections.
Operational/functional: Depends on the application of functional operators.
Reasoning: Depends on the application of an inference process over the original data.
![Page 51: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/51.jpg)
Semantic Tractability (Popescu et al., 2003): Lexical and syntactic conditions for soundness and completeness.
Semantic Resolvability (Freitas et al., 2014): Vocabulary mapping types between the query and the answer.
Answer Locality (Webber et al., 2002): Whether answer fragments are distributed across different document fragments / documents or datasets / dataset records.
Derivability (Webber et al., 2002): Whether the answer is explicit or implicit, i.e. the level of reasoning required to derive it.
Semantic Complexity: Level of ambiguity and discourse/data heterogeneity.
![Page 52: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/52.jpg)
![Page 53: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/53.jpg)
Data pre-processing: Pre-processes the database data (includes indexing, data cleaning, feature extraction).
Question Analysis: Performs syntactic analysis and detects/extracts the core features of the question (NER, answer type, etc.).
Data Matching: Matches terms in the question to entities in the data.
Query Construction: Generates structured query candidates considering the question-data mappings and the syntactic constraints in the query and in the database.
Scoring: The data matching and query construction components output several candidates, which need to be scored and ranked according to certain criteria.
Answer Retrieval & Extraction: Executes the query and extracts the natural language answer from the result set.
![Page 54: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/54.jpg)
QA over Linked Data (Case Studies)
![Page 55: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/55.jpg)
AquaLog & PowerAqua (Lopez et al., 2006)
◦ Querying the Semantic Web
ORAKEL & Pythia (Cimiano et al., 2007; Unger & Cimiano, 2011)
◦ Ontology-specific question answering
TBSL (Unger et al., 2012)
◦ Template-based question answering
Kwiatkowski et al., 2013
◦ Scaling Semantic Parsers with On-the-fly Ontology Matching
Treo (Freitas et al., 2011, 2014)
◦ Schema-agnostic querying using distributional semantics
IBM Watson (Ferrucci et al., 2010)
◦ Large-scale evidence-based model for QA
![Page 56: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/56.jpg)
QuestIO & FREyA (Damljanovic et al., 2010)
QAKiS (Cabrio et al., 2012)
Yahya et al., 2013
![Page 57: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/57.jpg)
AquaLog & PowerAqua (Lopez et al., 2006)
![Page 58: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/58.jpg)
Key contributions:
◦ Pioneering work on QA over Semantic Web data.
◦ Semantic similarity mapping.
Terminological Matching:
◦ WordNet-based
◦ Ontology-based
◦ String similarity
◦ Sense-based similarity matcher
Evaluation: QALD (2011). Extends the AquaLog system.
![Page 59: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/59.jpg)
![Page 60: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/60.jpg)
Two words are strongly similar if any of the following holds:
◦ 1. They have a synset in common (e.g. “human” and “person”).
◦ 2. One word is a hypernym/hyponym in the taxonomy of the other word.
◦ 3. There exists an allowable “is-a” path connecting a synset associated with each word.
◦ 4. If any of the previous cases is true and the definition (gloss) of one of the synsets of the word (or its direct hypernyms/hyponyms) includes the other word as one of its synonyms, we say that they are highly similar.
Lopez et al. 2006
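The first two conditions can be sketched on a toy taxonomy. The code below does not use WordNet itself: `SYNSETS` and `HYPERNYMS` are small hypothetical dictionaries standing in for it, and reading condition 2 as a transitive hypernym relation is an assumption of this sketch. Conditions 3 and 4 (allowable paths, gloss checks) are left out.

```python
# Toy stand-in for WordNet (hypothetical data, not the real resource).
SYNSETS = {  # word -> synset identifiers
    "human": {"person.n.01"},
    "person": {"person.n.01"},
    "organism": {"organism.n.01"},
    "city": {"city.n.01"},
}
HYPERNYMS = {  # synset -> direct hypernym synsets
    "person.n.01": {"organism.n.01"},
    "organism.n.01": set(),
    "city.n.01": set(),
}


def ancestors(synset):
    """All synsets reachable by following hypernym links upwards."""
    seen, stack = set(), [synset]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def strongly_similar(word1, word2):
    """Check condition 1 (shared synset) and condition 2 (hypernym/hyponym)."""
    s1, s2 = SYNSETS.get(word1, set()), SYNSETS.get(word2, set())
    if s1 & s2:  # condition 1
        return True
    # Condition 2, in either direction.
    return any(b in ancestors(a) or a in ancestors(b) for a in s1 for b in s2)


print(strongly_similar("human", "person"))
print(strongly_similar("person", "city"))
```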
![Page 61: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/61.jpg)
![Page 62: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/62.jpg)
ORAKEL (Cimiano et al., 2007) & Pythia (Unger & Cimiano, 2011)
![Page 63: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/63.jpg)
Key contributions:
◦ Using ontologies to interpret user questions
◦ Relies on a deep linguistic analysis that returns semantic representations aligned to the ontology vocabulary and structure
Evaluation: Geobase
![Page 64: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/64.jpg)
Ontology-based QA:
◦ Ontologies play a central role in interpreting user questions
◦ Output is a meaning representation that is aligned to the ontology underlying the dataset that is queried
◦ Ontological knowledge is used for drawing inferences, e.g. for resolving ambiguities
Grammar-based QA:
◦ Relies on linguistic grammars that assign a syntactic and semantic representation to lexical units
◦ Advantage: can deal with questions of arbitrary complexity
◦ Drawback: brittleness (fails if the question cannot be parsed because expressions or constructs are not covered by the grammar)
![Page 65: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/65.jpg)
Ontology-independent entries
◦ mostly function words
quantifiers (some, every, two)
wh-words (who, when, where, which, how many)
negation (not)
◦ manually specified and re-usable for all domains
Ontology-specific entries
◦ content words and phrases corresponding to concepts and properties in the ontology
◦ automatically generated from an ontology lexicon
![Page 66: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/66.jpg)
Aim: capture rich and structured linguistic information about how ontology elements are lexicalized in a particular language
lemon (Lexicon Model for Ontologies), http://lemon-model.net
◦ meta-model for describing ontology lexica with RDF
◦ declarative (abstracting from specific syntactic and semantic theories)
◦ separation of lexicon and ontology
![Page 67: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/67.jpg)
Semantics by reference:
◦ The meaning of lexical entries is specified by pointing to elements in the ontology.
Example:
![Page 68: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/68.jpg)
![Page 69: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/69.jpg)
Which cities have more than three universities?
SELECT DISTINCT ?x WHERE {
?x rdf:type dbo:City .
?y rdf:type dbo:University .
?y dbo:city ?x .
}
GROUP BY ?x
HAVING (COUNT(?y) > 3)
![Page 70: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/70.jpg)
TBSL (Unger et al., 2012)
![Page 71: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/71.jpg)
Key contributions:
◦ Constructs a query template that directly mirrors the linguistic structure of the question
◦ Instantiates the template by matching natural language expressions with ontology concepts
Evaluation: QALD 2012
![Page 72: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/72.jpg)
In order to understand a user question, we need to understand:
The words (dataset-specific):
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace
The semantic structure (dataset-independent):
who → SELECT ?x WHERE { … }
the most N → ORDER BY DESC(COUNT(?N)) LIMIT 1
more than i N → HAVING (COUNT(?N) > i)
![Page 73: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/73.jpg)
Goal: An approach that combines both an analysis of the semantic structure and a mapping of words to URIs.
Two-step approach:
◦ 1. Template generation
Parse the question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations.
◦ 2. Template instantiation
Instantiate the SPARQL template by matching natural language expressions with ontology concepts using statistical entity identification and predicate detection.
![Page 74: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/74.jpg)
SPARQL template:
SELECT DISTINCT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
ORDER BY DESC(COUNT(?y))
OFFSET 0 LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
Instantiations:
?c = <http://dbpedia.org/ontology/Film>
?p = <http://dbpedia.org/ontology/producer>
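The instantiation step above amounts to substituting URIs for the slot variables while leaving ordinary query variables alone. The following is a minimal sketch under that reading: the `instantiate` helper is a hypothetical name, the template text is copied from the slide as-is, and the sketch says nothing about whether a given endpoint accepts the resulting query.

```python
import re

# Template and slot bindings as shown on the slide.
TEMPLATE = """SELECT DISTINCT ?x WHERE {
?y rdf:type ?c .
?y ?p ?x .
}
ORDER BY DESC(COUNT(?y))
OFFSET 0 LIMIT 1"""

BINDINGS = {
    "?c": "http://dbpedia.org/ontology/Film",
    "?p": "http://dbpedia.org/ontology/producer",
}


def instantiate(template, bindings):
    """Replace slot variables with <URI>; unbound variables stay untouched."""
    def substitute(match):
        variable = match.group(0)
        if variable in bindings:
            return "<" + bindings[variable] + ">"
        return variable
    # Match whole variable names so that a slot such as ?c never clobbers
    # a longer variable such as ?city.
    return re.sub(r"\?[A-Za-z_]\w*", substitute, template)


print(instantiate(TEMPLATE, BINDINGS))
```

Matching complete variable names rather than doing plain string replacement is the one design choice worth noting: it keeps slots and regular variables from interfering with each other.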
![Page 75: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/75.jpg)
![Page 76: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/76.jpg)
![Page 77: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/77.jpg)
1. Natural language question is tagged with part-of-speech information.
2. Based on POS tags, grammar entries are built on the fly.
◦ Grammar entries are pairs of:
tree structures (Lexicalized Tree Adjoining Grammar)
semantic representations (ext. Discourse Representation Structures)
3. These lexical entries, together with domain-independent lexical entries, are used for parsing the question (cf. Pythia).
4. The resulting semantic representation is translated into a SPARQL template.
![Page 78: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/78.jpg)
Domain-independent: who, the most
Domain-dependent: produced/VBD, films/NNS
SPARQL template 1:
SELECT DISTINCT ?x WHERE {
?x ?p ?y .
?y rdf:type ?c .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [films]
?p PROPERTY [produced]
SPARQL template 2:
SELECT DISTINCT ?x WHERE {
?x ?p ?y .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
?p PROPERTY [films]
![Page 79: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/79.jpg)
80
![Page 80: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/80.jpg)
1. For resources and classes, a generic approach to entity detection is applied:
◦ Identify synonyms of the label using WordNet.
◦ Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein and substring similarity).
2. For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
3. The highest ranking entities are returned as candidates for filling the query slots.
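A minimal sketch of the string-similarity part of this step, assuming that trigram and Levenshtein similarity are simply averaged (the actual combination and weighting are not given on the slide). WordNet synonyms and the BOA pattern library are left out, and all function names are hypothetical.

```python
def trigrams(text):
    """Return the set of character trigrams of a padded, lower-cased string."""
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Jaccard overlap of the two trigram sets, in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def levenshtein(a, b):
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Edit distance rescaled to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


def rank_candidates(slot_label, labelled_entities):
    """Sort (uri, label) pairs by the mean of both similarities, best first.

    Averaging the two measures is an assumption made for this sketch.
    """
    def score(label):
        return (trigram_similarity(slot_label, label)
                + levenshtein_similarity(slot_label, label)) / 2

    return sorted(((score(label), uri) for uri, label in labelled_entities),
                  reverse=True)


ENTITIES = [
    ("http://dbpedia.org/ontology/Film", "film"),
    ("http://dbpedia.org/ontology/FilmFestival", "film festival"),
    ("http://dbpedia.org/ontology/wineProduced", "wine produced"),
]
ranking = rank_candidates("films", ENTITIES)
```

With these toy labels, the slot label "films" ranks the class labelled "film" first.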
81
![Page 81: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/81.jpg)
?c CLASS [films]
<http://dbpedia.org/ontology/Film>
<http://dbpedia.org/ontology/FilmFestival>
...
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer>
<http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
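Filling the slots amounts to enumerating every combination of candidate URIs. A small sketch, assuming plain textual substitution of the slot variables (the helper `instantiate` is hypothetical and not part of any system described here):

```python
from itertools import product

# Template and slot candidates as shown on the slides.
TEMPLATE = """SELECT DISTINCT ?x WHERE {
  ?x ?p ?y .
  ?y rdf:type ?c .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1"""

CANDIDATES = {
    "?c": [
        "http://dbpedia.org/ontology/Film",
        "http://dbpedia.org/ontology/FilmFestival",
    ],
    "?p": [
        "http://dbpedia.org/ontology/producer",
        "http://dbpedia.org/property/producer",
        "http://dbpedia.org/ontology/wineProduced",
    ],
}


def instantiate(template, candidates):
    """Return one query string per combination of slot fillers."""
    slots = sorted(candidates)
    queries = []
    for combination in product(*(candidates[slot] for slot in slots)):
        query = template
        for slot, uri in zip(slots, combination):
            # Replace the slot variable by the candidate URI in angle brackets.
            query = query.replace(slot, "<" + uri + ">")
        queries.append(query)
    return queries


queries = instantiate(TEMPLATE, CANDIDATES)
```

Two class candidates and three property candidates give six candidate queries for this single template, which is why ranking is needed next.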
82
![Page 82: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/82.jpg)
83
![Page 83: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/83.jpg)
1. Every entity receives a score considering string similarity and prominence.
2. The score of a query is then computed as the average of the scores of the entities used to fill its slots.
3. In addition, type checks are performed:
◦ For all triples ?x rdf:type <class>, all query triples ?x p e and e p ?x are checked as to whether the domain/range of p is consistent with <class>.
4. Of the remaining queries, the one with highest score that returns a result is chosen.
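The ranking steps above can be sketched as follows. The per-entity scores are invented for illustration, the type check is omitted, and `returns_result` is a hypothetical stand-in for actually running the query against the endpoint.

```python
from statistics import mean


def query_score(entity_scores):
    """Score of a query: average of the scores of its slot fillers."""
    return mean(entity_scores)


def choose_query(candidates, returns_result):
    """Pick the highest-scoring query that yields a non-empty result.

    candidates: list of (query, [entity scores]) pairs.
    returns_result: callable telling whether a query returns anything.
    """
    ranked = sorted(candidates, key=lambda c: query_score(c[1]), reverse=True)
    for query, scores in ranked:
        if returns_result(query):
            return query, query_score(scores)
    return None


# Invented entity scores: (property score, class score).
CANDIDATE_QUERIES = [
    ("query with dbo:Film", [0.9, 0.62]),
    ("query with dbo:FilmFestival", [0.9, 0.3]),
]

best = choose_query(CANDIDATE_QUERIES, returns_result=lambda q: True)
fallback = choose_query(CANDIDATE_QUERIES,
                        returns_result=lambda q: "FilmFestival" in q)
```

If the top-ranked query returns nothing, the next one is tried, which is what `fallback` illustrates.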
84
![Page 84: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/84.jpg)
SELECT DISTINCT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/Film> .
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.76
SELECT DISTINCT ?x WHERE {
?x <http://dbpedia.org/ontology/producer> ?y .
?y rdf:type <http://dbpedia.org/ontology/FilmFestival>.
}
ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.60
85
![Page 85: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/85.jpg)
The created template structure does not always coincide with how the data is actually modelled.
Considering all possibilities of how the data could be modelled leads to a large number of templates (and even more queries) for a single question.
86
![Page 86: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/86.jpg)
87
Kwiatkowski et al., 2013: Scaling Semantic Parsers with On-the-fly Ontology Matching
![Page 87: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/87.jpg)
Recent approaches view interpretation as a machine translation problem (translating natural language questions into meaning representations or SPARQL queries).
Example:
Construct all possible interpretations and learn a model to score and rank them (from either question-query pairs or question-answer pairs).
QA over Freebase.
88
![Page 88: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/88.jpg)
1. Dataset-independent probabilistic CCG parsing:
◦ mapping sentences to underspecified meaning representations (containing generic logical constants not yet aligned to any ontology/dataset schema)
◦ one grammar for all domains (with domain-independent entries as well as generic entries built on the basis of POS)
E.g.
◦ city: N lambda x.city(x)
◦ visit: S\NP/NP lambda x y exists e . visit(x,y,e)
89
![Page 89: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/89.jpg)
2. Ontology matching:
◦ Structural matching (transformations of the meaning representations)
– Collapsing, e.g. public(x) and library(x) and of(x,NewYork,e) -> PublicLibraryOfNewYork
– Expansion, e.g. discover(x,y,e) -> discover(x,e) and discover'(y,e)
◦ Constant matching (replacing all generic constants with constants from the ontology)
This leads to a large number of possible interpretations.
Learn a function that ranks derivations, then prune and pick the highest-ranked one.
90
![Page 90: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/90.jpg)
Estimate a linear model for scoring derivations (including all parsing and matching decisions) from question-answer pairs.
Weighted features include:
◦ parse features (e.g. pairings of words with categories)
◦ structural features (e.g. types of constants, number of domain-independent constants) --> allows adaptation to knowledge base
◦ lexical features (e.g. similarity of NL string and ontology constant based on stem and synonyms)
◦ knowledge base features (e.g. violation of domain/range restrictions)
Weights are learned so that they separate derivations that yield correct answers from those that don't.
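To make the idea of a linear model over derivation features concrete, here is a sketch that uses a simple perceptron-style update as a stand-in. The learning procedure of the original work may well differ; the feature names and values are invented.

```python
def score(weights, features):
    """Linear score: dot product of weights and a sparse feature map."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def best(weights, derivations):
    """Highest-scoring derivation under the current weights."""
    return max(derivations, key=lambda d: score(weights, d["features"]))


def perceptron_update(weights, derivations, learning_rate=1.0):
    """Move weights towards a correct derivation if the top one is wrong."""
    predicted = best(weights, derivations)
    correct = [d for d in derivations if d["correct"]]
    if predicted["correct"] or not correct:
        return weights
    target = best(weights, correct)
    updated = dict(weights)
    for name, value in target["features"].items():
        updated[name] = updated.get(name, 0.0) + learning_rate * value
    for name, value in predicted["features"].items():
        updated[name] = updated.get(name, 0.0) - learning_rate * value
    return updated


# Two invented derivations of one question; "correct" means that the
# derivation yields the gold answer of the question-answer pair.
DERIVATIONS = [
    {"features": {"domain_range_violation": 1.0, "lexical_similarity": 0.9},
     "correct": False},
    {"features": {"lexical_similarity": 0.8}, "correct": True},
]

initial = {"lexical_similarity": 1.0}
learned = perceptron_update(initial, DERIVATIONS)
```

After one update the domain/range violation receives a negative weight, so the derivation that yields the correct answer is ranked first.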
91
![Page 91: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/91.jpg)
92
Treo (Freitas et al. 2011, 2014)
![Page 92: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/92.jpg)
Key contributions:
◦ Distributional semantic relatedness matching model.
◦ Distributional model for QA.
Terminological Matching:
◦ Explicit Semantic Analysis (ESA)
◦ String similarity + node cardinality
Evaluation: QALD (2011)
93
![Page 93: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/93.jpg)
Treo (Irish): Direction
94
![Page 94: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/94.jpg)
95
![Page 95: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/95.jpg)
96
![Page 96: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/96.jpg)
97
![Page 97: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/97.jpg)
98
![Page 98: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/98.jpg)
99
Data
Structured Representation
Inference
![Page 99: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/99.jpg)
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.
Sahlgren, 2013
Formal World
Real World
100
Baroni et al. 2013
![Page 100: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/100.jpg)
“Words occurring in similar (linguistic) contexts are semantically related.”
If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).
This can then be used as a surrogate of its semantic representation.
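A toy illustration of this idea: count the words that co-occur with each target word inside a small window, then compare the resulting context vectors by cosine. The three-sentence corpus and the window size are invented for the example; real models are built from large corpora and usually apply a weighting scheme.

```python
from collections import Counter
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within the window."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(x * x for x in u.values()))
            * sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


CORPUS = [
    "she married her husband in june",
    "she married her spouse in june",
    "the child played in the garden",
]

vectors = context_vectors(CORPUS)
husband_spouse = cosine(vectors["husband"], vectors["spouse"])
husband_child = cosine(vectors["husband"], vectors["child"])
```

In this toy corpus "husband" and "spouse" occur in identical contexts, so their vectors are closer to each other than either is to "child".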
101
![Page 101: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/101.jpg)
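[Figure: vector space with context dimensions c1, c2, ..., cn; the words child, husband and spouse are plotted as vectors, with example coordinates 0.7 and 0.5 (number of times that the words occur in c1).]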
102
![Page 102: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/102.jpg)
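[Figure: the same vector space (dimensions c1, c2, ..., cn; words child, husband, spouse), showing the angle θ between word vectors.]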
103
![Page 103: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/103.jpg)
[Figure: components: Query, Query Analysis, Query Features, Query Planner, Query Plan, Ƭ, large-scale unstructured data, database.]
104
![Page 104: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/104.jpg)
[Figure: components: Query, Query Analysis, Query Features, Query Planner, Query Plan, Ƭ, Wikipedia, RDF.]
105
![Page 105: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/105.jpg)
106
![Page 106: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/106.jpg)
The vector space is segmented by the instances.
107
![Page 107: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/107.jpg)
108
![Page 108: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/108.jpg)
109
![Page 109: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/109.jpg)
Instance search
◦ Proper nouns
◦ String similarity + node cardinality
Class (unary predicate) search
◦ Nouns, adjectives and adverbs
◦ String similarity + distributional semantic relatedness
Property (binary predicate) search
◦ Nouns, adjectives, verbs and adverbs
◦ Distributional semantic relatedness
Navigation
Extensional expansion
◦ Expands the instances associated with a class.
Operator application
◦ Aggregations, conditionals, ordering, position
Disjunction & Conjunction
Disambiguation dialog (instance, predicate)
110
![Page 110: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/110.jpg)
Minimize the impact of Ambiguity, Vagueness, Synonymy. Address the simplest matchings first (heuristics).
Semantic Relatedness as a primitive operation.
Distributional semantics as commonsense knowledge.
Lightweight syntactic constraints
111
![Page 111: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/111.jpg)
Transform natural language queries into triple patterns.
“Who is the daughter of Bill Clinton married to?”
112
![Page 112: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/112.jpg)
Step 1: POS Tagging
◦ Who/WP
◦ is/VBZ
◦ the/DT
◦ daughter/NN
◦ of/IN
◦ Bill/NNP
◦ Clinton/NNP
◦ married/VBN
◦ to/TO
◦ ?/.
113
![Page 113: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/113.jpg)
Step 2: Core Entity Recognition
◦ Rules-based: POS Tag + TF/IDF
Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE)
114
![Page 114: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/114.jpg)
Step 3: Determine answer type
◦ Rules-based.
Who is the daughter of Bill Clinton married to? (PERSON)
115
![Page 115: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/115.jpg)
Step 4: Dependency parsing
◦ dep(married-8, Who-1)
◦ auxpass(married-8, is-2)
◦ det(daughter-4, the-3)
◦ nsubjpass(married-8, daughter-4)
◦ prep(daughter-4, of-5)
◦ nn(Clinton-7, Bill-6)
◦ pobj(of-5, Clinton-7)
◦ root(ROOT-0, married-8)
◦ xcomp(married-8, to-9)
116
![Page 116: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/116.jpg)
Step 5: Determine Partial Ordered Dependency Structure (PODS)
◦ Rules-based.
Remove stop words.
Merge words into entities.
Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
ANSWER TYPE
QUESTION FOCUS
117
![Page 117: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/117.jpg)
Step 5: Determine Partial Ordered Dependency Structure (PODS)
◦ Rules-based.
Remove stop words.
Merge words into entities.
Reorder structure from core entity position.
Bill Clinton daughter married to
(INSTANCE)
Person
(PREDICATE) (PREDICATE) Query Features
118
![Page 118: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/118.jpg)
Map query features into a query plan.
A query plan contains a sequence of:
◦ Search operations.
◦ Navigation operations.
(INSTANCE) (PREDICATE) (PREDICATE) Query Features
(1) INSTANCE SEARCH (Bill Clinton)
(2) DISAMBIGUATE ENTITY TYPE
(3) GENERATE ENTITY FACETS
(4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
(5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
(6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
(7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
(8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
Query Plan
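The search and navigation steps of such a plan can be sketched over a toy graph. Everything here is a simplification: the triples are a hand-made excerpt, `RELATEDNESS` is a hypothetical lookup table standing in for a distributional relatedness model, and instance search, disambiguation, facet generation and post-processing (steps 1 to 3 and 8) are skipped.

```python
TRIPLES = [
    (":Bill_Clinton", ":child", ":Chelsea_Clinton"),
    (":Bill_Clinton", ":religion", ":Baptists"),
    (":Bill_Clinton", ":almaMater", ":Yale_Law_School"),
    (":Chelsea_Clinton", ":spouse", ":Mark_Mezvinsky"),
]

# Hypothetical relatedness scores between query terms and predicates.
RELATEDNESS = {
    ("daughter", ":child"): 0.054,
    ("daughter", ":religion"): 0.004,
    ("daughter", ":almaMater"): 0.001,
    ("married to", ":spouse"): 0.05,  # invented value
}


def search_related_predicate(entity, term):
    """Return the predicate of `entity` most related to `term`, if any."""
    predicates = {p for s, p, _ in TRIPLES if s == entity}
    return max(predicates,
               key=lambda p: RELATEDNESS.get((term, p), 0.0),
               default=None)


def get_associated_entities(entity, predicate):
    """Follow `predicate` from `entity` to its objects."""
    return [o for s, p, o in TRIPLES if s == entity and p == predicate]


def execute_plan(pivot, terms):
    """Alternate predicate search and navigation, starting at the pivot."""
    entities, path = [pivot], []
    for term in terms:
        next_entities = []
        for entity in entities:
            predicate = search_related_predicate(entity, term)
            if predicate is None:
                continue
            path.append((entity, predicate))
            next_entities.extend(get_associated_entities(entity, predicate))
        entities = next_entities
    return entities, path


answers, path = execute_plan(":Bill_Clinton", ["daughter", "married to"])
```

On this toy graph the plan navigates :Bill_Clinton -> :child -> :Chelsea_Clinton -> :spouse and ends at :Mark_Mezvinsky.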
119
![Page 119: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/119.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton
120
![Page 120: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/120.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton :Chelsea_Clinton :child
:Baptists :religion
:Yale_Law_School
:almaMater
...
(PIVOT ENTITY)
(ASSOCIATED
TRIPLES)
121
![Page 121: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/121.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton :Chelsea_Clinton :child
:Baptists :religion
:Yale_Law_School
:almaMater
...
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
122
![Page 122: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/122.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton :Chelsea_Clinton :child
123
![Page 123: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/123.jpg)
Computation of a measure of “semantic proximity” between two terms.
Allows a semantic approximate matching between query terms and dataset terms.
It supports a commonsense reasoning-like behavior based on the knowledge embedded in the corpus.
124
![Page 124: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/124.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton :Chelsea_Clinton :child
(PIVOT ENTITY)
126
![Page 125: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/125.jpg)
Bill Clinton daughter married to Person
:Bill_Clinton :Chelsea_Clinton :child
:Mark_Mezvinsky :spouse
127
![Page 126: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/126.jpg)
130
![Page 127: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/127.jpg)
131
![Page 128: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/128.jpg)
132
![Page 129: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/129.jpg)
133
![Page 130: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/130.jpg)
What is the highest mountain?
(CLASS) (OPERATOR) Query Features
mountain - highest
PODS
134
![Page 131: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/131.jpg)
Mountain highest
:Mountain :typeOf
(PIVOT ENTITY)
135
![Page 132: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/132.jpg)
Mountain highest
:Mountain :Everest
:typeOf
(PIVOT ENTITY)
:K2 :typeOf
136
![Page 133: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/133.jpg)
Mountain highest
:Mountain :Everest :typeOf
(PIVOT ENTITY)
:K2 :typeOf
:elevation
:location
:deathPlaceOf
137
![Page 134: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/134.jpg)
Mountain highest
:Mountain :Everest :typeOf
(PIVOT ENTITY)
:K2 :typeOf
:elevation
:elevation
8848 m
8611 m
138
![Page 135: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/135.jpg)
Mountain highest
:Mountain :Everest
:typeOf
(PIVOT ENTITY)
:K2 :typeOf
:elevation
:elevation
8848 m
8611 m
SORT
TOP_MOST
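The final operator application can be read as a sort followed by taking the top element. A tiny sketch using the two elevations from the example (the function name `top_most` is hypothetical):

```python
# Elevation values in metres, as shown in the example.
ELEVATION_M = {":Everest": 8848, ":K2": 8611}


def top_most(values, k=1):
    """Sort instances by value in descending order and keep the first k."""
    return sorted(values, key=values.get, reverse=True)[:k]


highest = top_most(ELEVATION_M)
```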
139
![Page 136: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/136.jpg)
140
![Page 137: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/137.jpg)
Semantic approximation in databases (as in any IR system): semantic best-effort.
Need some level of user disambiguation, refinement and feedback.
As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (like in human communication).
Pull the user interaction back into the system.
141
![Page 138: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/138.jpg)
142
![Page 139: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/139.jpg)
143
![Page 140: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/140.jpg)
144
IBM Watson (Ferrucci et al., 2010)
![Page 141: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/141.jpg)
Key contributions:
◦ Evidence-based QA system.
◦ Complex and high-performance QA pipeline.
– Uses more than 50 scoring components that produce scores ranging from probabilities and counts to categorical features.
◦ Major cultural impact:
– Before: QA as AI vision, academic exercise.
– After: QA as an attainable software architecture in the short term.
Evaluation: Jeopardy! Challenge
145
![Page 142: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/142.jpg)
146
“Rap” Sheet
This archaic term for a mischievous or annoying child can also mean a rogue or scamp.
Rapscallion
Can be more challenging from a question analysis perspective.
Higher specificity from an Information Retrieval perspective.
![Page 143: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/143.jpg)
Question analysis: includes shallow and deep parsing, extraction of logical forms, semantic role labelling, coreference resolution, relation extraction and named entity recognition, among others.
Question decomposition: decomposition of the question into separate phrases, which will generate constraints that need to be satisfied by evidence from the data.
148
![Page 144: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/144.jpg)
[Figure: Watson pipeline: Question -> Question & Topic Analysis -> Question Decomposition. Source: Ferrucci et al. 2010]
149
![Page 145: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/145.jpg)
Ferrucci et al. 2010
150
“Rap” Sheet
This archaic term for a mischievous or annoying child can also mean a rogue or scamp.
This archaic term for a mischievous or annoying child.
This term can also mean a rogue or scamp.
Rapscallion
![Page 146: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/146.jpg)
Hypothesis generation:
◦ Primary search
– Document and passage retrieval
– SPARQL queries are used over triple stores.
◦ Candidate answer generation (maximizing recall)
– Information extraction techniques are applied to the search results to generate candidate answers.
Soft filtering: lightweight (less resource-intensive) scoring algorithms are applied to a larger set of initial candidates to prune the list before the more intensive scoring components run.
151
![Page 147: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/147.jpg)
[Figure: Watson pipeline: Question -> Question & Topic Analysis -> Question Decomposition -> Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) -> Hypothesis and Evidence Scoring. Source: Ferrucci et al. 2010]
152
![Page 148: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/148.jpg)
Hypothesis and evidence scoring:
◦ Supporting evidence retrieval
– Seeks additional evidence for each candidate answer from the data sources.
◦ Deep evidence scoring
– Determines the degree of certainty that the retrieved evidence supports the candidate answers.
– Scores are then combined into an overall evidence profile, which groups individual features into aggregate evidence dimensions.
153
![Page 149: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/149.jpg)
[Figure: Watson pipeline: Question -> Question & Topic Analysis -> Question Decomposition -> Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) -> Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval from Evidence Sources, Deep Evidence Scoring). Source: Ferrucci et al. 2010]
154
![Page 150: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/150.jpg)
Answer merging: merges answer candidates (hypotheses) that have different surface forms but related content, combining their scores.
Ranking and confidence estimation: ranks the hypotheses and estimates their confidence based on the scores, using machine learning approaches over a training set. Multiple trained models cover different question types.
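A deliberately simplified sketch of merging and ranking. Watson relies on learned models for this; here, as a loudly stated assumption, surface forms are merged by a naive string normalisation, scores are summed, and the confidence is just the normalised score. The candidate answers and scores are invented.

```python
from collections import defaultdict


def normalise(surface_form):
    """Naive canonical form: case-folded, without dots or extra spaces."""
    return " ".join(surface_form.casefold().replace(".", "").split())


def merge_and_rank(candidates):
    """Merge (surface form, score) pairs and rank them by combined score.

    Returns (answer, confidence) pairs, best first. Summing the scores and
    normalising them stands in for the trained ranking models.
    """
    merged = defaultdict(float)
    for surface_form, score in candidates:
        merged[normalise(surface_form)] += score
    total = sum(merged.values()) or 1.0
    ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
    return [(answer, score / total) for answer, score in ranked]


# Invented candidates: two surface forms of the same answer and a rival.
CANDIDATES = [("J.F.K.", 0.30), ("JFK", 0.25), ("Nixon", 0.40)]
ranking = merge_and_rank(CANDIDATES)
```

Taken separately, each "JFK" variant scores below "Nixon"; after merging, the combined hypothesis ranks first.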
155
![Page 151: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/151.jpg)
[Figure: Watson pipeline: Question -> Question & Topic Analysis -> Question Decomposition -> Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) -> Hypothesis and Evidence Scoring (Answer Scoring, Evidence Retrieval from Evidence Sources, Deep Evidence Scoring) -> Synthesis -> Final Confidence Merging & Ranking -> Answer & Confidence. Learned models help combine and weigh the evidence. Source: Ferrucci et al. 2010; Farrell, 2011]
156
![Page 152: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/152.jpg)
[Figure: Watson pipeline at scale: a question leads to multiple interpretations, 100s of sources, 100s of possible answers, 1000s of pieces of evidence and 100,000s of scores from many simultaneous text analysis algorithms, across Question & Topic Analysis, Question Decomposition, Hypothesis Generation, Hypothesis and Evidence Scoring, Synthesis, and Final Confidence Merging & Ranking, ending in Answer & Confidence. Source: Ferrucci et al. 2010; Farrell, 2011]
157
UIMA for interoperability
UIMA-AS for scale-out and speed
![Page 153: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/153.jpg)
Ferrucci et al. 2010
158
![Page 154: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/154.jpg)
159
Evaluation of QA over Linked Data
![Page 155: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/155.jpg)
Test Collection
◦ Questions
◦ Datasets
◦ Answers (Gold-standard)
Evaluation Measures
160
![Page 156: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/156.jpg)
Recall: measures how complete the answer set is.
The fraction of relevant instances that are retrieved.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
161
![Page 157: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/157.jpg)
Precision: measures how accurate the answer set is.
The fraction of retrieved instances that are relevant.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
162
![Page 158: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/158.jpg)
Measures the ranking quality.
The reciprocal rank of a query is 1/r, where r is the rank at which the system returns the first relevant result.
Which are the Jovian planets in the Solar System?
Returned Answers:
– Mercury
– Jupiter
– Saturn
Gold-standard:
– Jupiter
– Saturn
– Neptune
– Uranus
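The three measures can be computed directly for the Jovian planets example. A short sketch (function names are illustrative; the returned list is treated as ranked in the order shown):

```python
def precision(returned, gold):
    """Fraction of returned answers that are in the gold standard."""
    returned_set = set(returned)
    return len(returned_set & set(gold)) / len(returned_set) if returned_set else 0.0


def recall(returned, gold):
    """Fraction of gold-standard answers that were returned."""
    gold_set = set(gold)
    return len(set(returned) & gold_set) / len(gold_set) if gold_set else 0.0


def reciprocal_rank(returned, gold):
    """1/r for the rank r of the first relevant answer, 0 if there is none."""
    for rank, answer in enumerate(returned, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


RETURNED = ["Mercury", "Jupiter", "Saturn"]
GOLD = ["Jupiter", "Saturn", "Neptune", "Uranus"]

p = precision(RETURNED, GOLD)
r = recall(RETURNED, GOLD)
rr = reciprocal_rank(RETURNED, GOLD)
```

For this example precision is 2/3 (two of the three returned answers are correct), recall is 2/4 = 0.5, and the reciprocal rank is 1/2 because the first relevant answer, Jupiter, appears at rank 2.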
163
![Page 159: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/159.jpg)
Query execution time
Indexing time
Index size
Dataset adaptation effort (Indexing time)
Semantic enrichment/disambiguation
◦ # of operations/time
![Page 160: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/160.jpg)
Question Answering over Linked Data (QALD-CLEF)
INEX Linked Data Track
BioASQ
SemSearch
167
![Page 161: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/161.jpg)
168
![Page 162: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/162.jpg)
QALD is a series of evaluation campaigns on question answering over linked data.
◦ QALD-1 (ESWC 2011)
◦ QALD-2 as part of the workshop Interacting with Linked Data (ESWC 2012)
◦ QALD-3 (CLEF 2013)
◦ QALD-4 (CLEF 2014)
It is aimed at all kinds of systems that mediate between a user, expressing his or her information need in natural language, and semantic data.
169
![Page 163: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/163.jpg)
QALD-4 is part of the Question Answering track at CLEF 2014:
http://nlp.uned.es/clef-qa/
Tasks:
◦ 1. Multilingual question answering over DBpedia
◦ 2. Biomedical question answering on interlinked data
◦ 3. Hybrid question answering
170
![Page 164: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/164.jpg)
Task:
Given a natural language question or keywords, either retrieve the correct answer(s) from a given RDF repository, or provide a SPARQL query that retrieves these answer(s).
◦ Dataset: DBpedia 3.9 (with multilingual labels)
◦ Questions: 200 training + 50 test
◦ Seven languages:
English, Spanish, German, Italian, French, Dutch, Romanian
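Because a system may return the answers directly, its output for each question can be compared with the gold-standard answer set. The following is a minimal Python sketch, assuming set-based precision, recall and F1 computed per question; the function name and the answer URIs are illustrative placeholders, not part of any QALD tooling.

```python
def precision_recall_f1(system_answers, gold_answers):
    """Set-based precision, recall and F1 for a single question."""
    system, gold = set(system_answers), set(gold_answers)
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder answer sets (not real benchmark data).
gold = {"res:A", "res:B", "res:C", "res:D"}
system = {"res:A", "res:B", "res:X"}
p, r, f = precision_recall_f1(system, gold)
```

Averaging these per-question scores over the test questions gives one overall figure per system.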
![Page 165: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/165.jpg)
<question id = "36" answertype = "resource"
aggregation = "false"
onlydbo = "true" >
Through which countries does the Yenisei river flow?
Durch welche Länder fließt der Yenisei?
¿Por qué países fluye el río Yenisei?
...
PREFIX res: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?uri WHERE {
res:Yenisei_River dbo:country ?uri .
}
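The gold query above can be sent to any endpoint that implements the SPARQL protocol. The sketch below uses only the Python standard library. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql, which serves a current DBpedia release rather than the 3.9 dump used in the challenge, so the results may differ from the gold answers. The helper names `build_request` and `run_select` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

QUERY = """
PREFIX res: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?uri WHERE {
  res:Yenisei_River dbo:country ?uri .
}
"""


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_select(endpoint, query, var):
    """Send the query and return the values bound to one variable (needs network access)."""
    with urlopen(build_request(endpoint, query)) as response:
        results = json.load(response)
    return [binding[var]["value"] for binding in results["results"]["bindings"]]


# Building the request does not touch the network.
request = build_request("https://dbpedia.org/sparql", QUERY)
# answers = run_select("https://dbpedia.org/sparql", QUERY, "uri")
```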
![Page 166: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/166.jpg)
Datasets: SIDER, Diseasome, Drugbank
Questions: 25 training + 25 test
Questions require integrating information from different datasets.
Example: What are the side effects of drugs used for Tuberculosis?
SELECT DISTINCT ?x WHERE {
disease:1154 diseasome:possibleDrug ?v2 .
?v2 a drugbank:drugs .
?v3 owl:sameAs ?v2 .
?v3 sider:sideEffect ?x .
}
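The query above joins three datasets: Diseasome links the disease to drugs, `owl:sameAs` bridges the Drugbank and SIDER identifiers, and SIDER supplies the side effects. The sketch below replays that join over a handful of toy triples in plain Python. Apart from `disease:1154` and the property names taken from the query, every identifier is a made-up placeholder, and the helper functions are hypothetical.

```python
# Toy triples standing in for the three datasets (placeholder identifiers).
triples = [
    ("disease:1154", "diseasome:possibleDrug", "drugbank:D1"),
    ("disease:1154", "diseasome:possibleDrug", "drugbank:D2"),
    ("drugbank:D1", "rdf:type", "drugbank:drugs"),
    ("drugbank:D2", "rdf:type", "drugbank:drugs"),
    ("sider:S1", "owl:sameAs", "drugbank:D1"),
    ("sider:S1", "sider:sideEffect", "sider:effectA"),
    ("sider:S1", "sider:sideEffect", "sider:effectB"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def side_effects(disease):
    """Follow disease -> drug -> sameAs -> side effect, as in the query."""
    effects = set()
    for drug in objects(disease, "diseasome:possibleDrug"):       # ?v2
        if "drugbank:drugs" not in objects(drug, "rdf:type"):
            continue
        for sider_drug in subjects("owl:sameAs", drug):           # ?v3
            effects |= objects(sider_drug, "sider:sideEffect")    # ?x
    return effects


result = side_effects("disease:1154")
```

In this toy data the drug `drugbank:D2` has no `owl:sameAs` link and therefore contributes no side effects, which shows how missing links between datasets limit recall.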
![Page 167: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/167.jpg)
Dataset: DBpedia 3.9 (with English abstracts)
Questions: 25 training + 10 test
Questions require both structured data and free text from the abstract to be answered.
Example: Give me the currencies of all G8 countries.
SELECT DISTINCT ?uri WHERE {
?x text:"member of" text:"G8" .
?x dbo:currency ?uri .
}
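The `text:` parts of the query above stand for conditions that have to be checked against the abstract, while `dbo:currency` is an ordinary triple pattern. A minimal sketch of that combination follows, assuming a naive substring match over abstracts. The abstracts, entities and currencies are invented placeholders, not DBpedia content, and the function names are hypothetical.

```python
# Invented abstracts and triples for illustration only.
abstracts = {
    "res:CountryA": "CountryA is a member of the G8.",
    "res:CountryB": "CountryB is a landlocked state.",
    "res:CountryC": "CountryC has been a member of the G8 since its founding.",
}
currency = {
    "res:CountryA": "res:CurrencyA",
    "res:CountryB": "res:CurrencyB",
    "res:CountryC": "res:CurrencyC",
}


def text_match(entity, *phrases):
    """True if every phrase occurs in the entity's abstract (case-insensitive)."""
    text = abstracts.get(entity, "").lower()
    return all(phrase.lower() in text for phrase in phrases)


def hybrid_answers():
    """Free-text condition first, structured lookup second."""
    members = [e for e in abstracts if text_match(e, "member of", "G8")]
    return {currency[e] for e in members if e in currency}


answers = hybrid_answers()
```

A real system would replace the substring test with proper text retrieval or relation extraction; the point here is only the order of the two steps.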
![Page 168: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/168.jpg)
Focuses on the combination of textual and structured data.
Datasets:
◦ English Wikipedia (MediaWiki XML Format)
◦ DBpedia 3.8 & YAGO2 (RDF)
◦ Links among Wikipedia, DBpedia 3.8, and YAGO2 URIs.
Tasks:
◦ Ad-hoc Task: return a ranked list of results in response to a search topic that is formulated as a keyword query (144 search topics).
◦ Jeopardy Task: Investigate retrieval techniques over a set of natural-language Jeopardy clues (105 search topics – 74 (2012) + 31 (2013)).
https://inex.mmci.uni-saarland.de/tracks/lod/
![Page 169: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/169.jpg)
![Page 170: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/170.jpg)
Focuses on entity search over Linked Datasets.
Datasets:
◦ Sample of Linked Data crawled from publicly available sources (based on the Billion Triple Challenge 2009).
Tasks:
◦ Entity Search: queries that refer to one particular entity. A tiny sample of Yahoo! Search queries.
◦ List Search: the goal of this track is to select objects that match particular criteria. These queries were hand-written by the organizing committee.
http://semsearch.yahoo.com/datasets.php#
![Page 171: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/171.jpg)
List Search queries:
◦ republics of the former Yugoslavia
◦ ten ancient Greek city kingdoms of Cyprus
◦ the four of the companions of the prophet
◦ Japanese-born players who have played in MLB
◦ where the British monarch is also head of state
◦ nations where Portuguese is an official language
◦ bishops who sat in the House of Lords
◦ Apollo astronauts who walked on the Moon
![Page 172: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/172.jpg)
Entity Search queries:
◦ 1978 cj5 jeep
◦ employment agencies w. 14th street
◦ nyc zip code
◦ waterville Maine
◦ LOS ANGELES CALIFORNIA
◦ ibm
◦ KARL BENZ
◦ MIT
![Page 173: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/173.jpg)
Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
![Page 174: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/174.jpg)
Datasets:
◦ PubMed documents
Tasks:
◦ 1a: Large-Scale Online Biomedical Semantic Indexing
Automatic annotation of PubMed documents.
Training data is provided.
◦ 1b: Introductory Biomedical Semantic QA
300 questions and related material (concepts, triples and golden answers).
![Page 175: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/175.jpg)
Metrics, Statistics, Tests - Tetsuya Sakai (IR)
◦ http://www.promise-noe.eu/documents/10156/26e7f254-1feb-4169-9204-1c53cc1fd2d7
Building Test Collections (IR Evaluation - Ian Soboroff)
◦ http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46ce-b3bd-4bbe6b290bfd
![Page 176: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/176.jpg)
Do-it-yourself (DIY): Core Resources
![Page 177: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/177.jpg)
DBpedia ◦ http://dbpedia.org/
YAGO ◦ http://www.mpi-inf.mpg.de/yago-naga/yago/
Freebase ◦ http://www.freebase.com/
Wikipedia dumps ◦ http://dumps.wikimedia.org/
ConceptNet ◦ http://conceptnet5.media.mit.edu/
Common Crawl ◦ http://commoncrawl.org/
Where to use: ◦ As a commonsense KB or as a data source
![Page 178: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/178.jpg)
High domain coverage:
◦ ~95% of Jeopardy! answers.
◦ ~98% of TREC answers.
Wikipedia is entity-centric.
Curated link structure.
Complementary tools:
◦ Wikipedia Miner.
Where to use:
◦ Construction of distributional semantic models.
◦ As a commonsense KB.
![Page 179: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/179.jpg)
WordNet ◦ http://wordnet.princeton.edu/
Wiktionary ◦ http://www.wiktionary.org/ ◦ API: https://www.mediawiki.org/wiki/API:Main_page
FrameNet ◦ https://framenet.icsi.berkeley.edu/fndrupal/
VerbNet ◦ http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
English lexicon for DBpedia 3.8 (in the lemon format) ◦ http://lemon-model.net/lexica/dbpedia_en/
PATTY (collection of semantically-typed relational patterns) ◦ http://www.mpi-inf.mpg.de/yago-naga/patty/
BabelNet ◦ http://babelnet.org/
Where to use: ◦ Query expansion ◦ Semantic similarity ◦ Semantic relatedness ◦ Word sense disambiguation
![Page 180: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/180.jpg)
Lucene & Solr ◦ http://lucene.apache.org/
Terrier ◦ http://terrier.org/
Where to use: ◦ Answer Retrieval
◦ Scoring
◦ Query-Data matching
![Page 181: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/181.jpg)
GATE (General Architecture for Text Engineering) ◦ http://gate.ac.uk/
NLTK (Natural Language Toolkit) ◦ http://nltk.org/
Stanford NLP ◦ http://www-nlp.stanford.edu/software/index.shtml
LingPipe ◦ http://alias-i.com/lingpipe/index.html
Where to use: ◦ Question Analysis
![Page 182: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/182.jpg)
MALT ◦ http://www.maltparser.org/ ◦ Languages (pre-trained): English, French, Swedish
Stanford parser ◦ http://nlp.stanford.edu/software/lex-parser.shtml ◦ Languages: English, German, Chinese, and others
CHAOS ◦ http://art.uniroma2.it/external/chaosproject/ ◦ Languages: English, Italian
C&C Parser ◦ http://svn.ask.it.usyd.edu.au/trac/candc
Where to use: ◦ Question Analysis
![Page 183: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/183.jpg)
NERD (Named Entity Recognition and Disambiguation) ◦ http://nerd.eurecom.fr/
Stanford Named Entity Recognizer ◦ http://nlp.stanford.edu/software/CRF-NER.shtml
FOX (Federated Knowledge Extraction Framework) ◦ http://fox.aksw.org
DBpedia Spotlight ◦ http://spotlight.dbpedia.org
Where to use: ◦ Question Analysis ◦ Query-Data Matching
![Page 184: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/184.jpg)
Wikipedia Miner ◦ http://wikipedia-miner.cms.waikato.ac.nz/
WS4J (Java API for several semantic relatedness algorithms) ◦ https://code.google.com/p/ws4j/
SecondString (string matching) ◦ http://secondstring.sourceforge.net
EasyESA (distributional semantics framework) ◦ http://easy-esa.org
S-space (distributional semantics framework) ◦ https://github.com/fozziethebeat/S-Space
Where to use: ◦ Query-Data matching ◦ Semantic relatedness & similarity ◦ Word Sense Disambiguation
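For query-data matching, the simplest of these techniques is string similarity between a question term and the labels in the dataset. The sketch below implements plain Levenshtein distance in Python as a stand-in; it does not reproduce the SecondString API, which is a Java library offering several metrics. The candidate labels and function names are illustrative.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize the distance to [0, 1], where 1 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(term, labels):
    """Return the candidate label most similar to the question term."""
    return max(labels, key=lambda label: similarity(term, label))


# Illustrative property labels a question term could map to.
labels = ["country", "county", "currency", "capital"]
match = best_match("countries", labels)
```

String similarity only bridges surface variation such as inflection or spelling; mapping a term to a differently worded property still calls for the semantic relatedness resources listed above.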
![Page 185: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/185.jpg)
DIRT
◦ Paraphrase Collection: http://aclweb.org/aclwiki/index.php?title=DIRT_Paraphrase_Collection
◦ Demo: http://demo.patrickpantel.com/demos/lexsem/paraphrase.htm
PPDB (The Paraphrase Database) ◦ http://www.cis.upenn.edu/~ccb/ppdb/
Where to use: ◦ Query-Data matching
![Page 186: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/186.jpg)
Apache UIMA ◦ http://uima.apache.org/
Open Advancement of Question Answering Systems (OAQA) ◦ http://oaqa.github.io/
OKBQA ◦ http://www.okbqa.org/documentation ◦ https://github.com/okbqa
Where to use: ◦ Components integration
![Page 187: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/187.jpg)
Trends
![Page 188: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/188.jpg)
Querying distributed linked data
Integration of structured and unstructured data
User interaction and context mechanisms
Integration of reasoning (deductive, inductive, counterfactual, abductive ...) into QA approaches and test collections
Measuring confidence and answer uncertainty
Multilinguality
Machine Learning
Reproducibility and resource integration in QA research
![Page 189: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/189.jpg)
Linked/Big Data demand new principled semantic approaches to cope with the scale and heterogeneity of data.
Part of the Semantic Web/AI vision can be addressed today with a multi-disciplinary perspective:
◦ Linked Data, IR and NLP
The multidisciplinarity of the QA problem can show what semantic computing has achieved, and these results can be transferred to other types of information systems.
Challenges are moving from the construction of basic QA systems to more sophisticated semantic functionalities.
Very active research area.
![Page 190: Question Answering over Linked Data (Reasoning Web Summer School)](https://reader035.vdocuments.site/reader035/viewer/2022070323/559458271a28ab6a2f8b476b/html5/thumbnails/190.jpg)