COLLATE
Deriving FrameNet Representations: Towards Meaning-Oriented Question Answering
Gerhard Fliedner
DFKI GmbH and Computational Linguistics, Saarland University
NLDB 2004, 23 June 2004
Overview
Introduction
Question Answering Using FrameNet
Deriving FrameNet Structures from Texts
Implementation Issues
Conclusions
Introduction
We present a system for automatically annotating German texts with lexical semantic structures, namely FrameNet structures.
This module is eventually to form the core of a Question Answering system that directly matches FrameNet representations of both the document collection and the user’s questions.
This work is pursued within the Collate project (Computational Linguistics and Language Technology for Real Life Applications) at DFKI GmbH, partly in cooperation with the Computational Linguistics Department of Saarland University, Saarbrücken.
Joint work with Christian Braun (Saarland University)
Using Semantics in Question Answering
Most QA systems use IR techniques based on surface words for document/passage retrieval.
Often used extensions:
Stemming
Query expansion using semantically related words (mostly using WordNet)
Deeper linguistic processing of retrieved passages (using logic forms or similar)
However, semantic relations between words are rarely taken into account.
Different textual realisations
We want to reliably capture systematic semantic relations such as
Synonymy: buy vs. purchase
Converse/inverse relations: buy vs. sell
(Some) Hyponymy/Hyperonymy: order/request
Realisations as e.g. verbs or nouns should receive the same representation
A sold B to C vs. the sale of B to C by A
We also want to factor out different surface realisations of argument PPs (especially with ‘picture nouns’)
Compare A with B vs. compare A to B
FrameNet
As a framework for a ‘flat’ semantic representation, we have chosen FrameNet.
FrameNet is a database that documents the semantic and syntactic valence of words, using a concept that is derived from the idea of thematic roles (Fillmore, 68).
Related words are grouped into a hierarchical structure of frames according to word fields.
Instead of universal thematic roles, each frame has a set of specific roles (frame elements).
For example: Commerce (buy, sell, sale) defines frame elements buyer and seller.
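The effect of grouping converse verbs into one frame with frame-specific roles can be sketched as a small lookup. This is a minimal illustration, not the system's implementation: the table `LEXICAL_UNITS`, the function `to_frame` and the grammatical function labels (`subj`, `obj`, `pp_to`, `pp_from`) are assumptions made for the example.

```python
# Hypothetical mapping: lemma -> (frame, {grammatical function -> frame element}).
LEXICAL_UNITS = {
    "buy": ("Commerce", {"subj": "buyer", "obj": "goods", "pp_from": "seller"}),
    "sell": ("Commerce", {"subj": "seller", "obj": "goods", "pp_to": "buyer"}),
}


def to_frame(lemma, arguments):
    """Map a predicate and its syntactic arguments to a frame instance."""
    frame, role_map = LEXICAL_UNITS[lemma]
    elements = {role_map[func]: filler for func, filler in arguments.items()}
    return {"frame": frame, "elements": elements}


# "A sold B to C" and "C bought B from A" receive the same representation.
sold = to_frame("sell", {"subj": "A", "obj": "B", "pp_to": "C"})
bought = to_frame("buy", {"subj": "C", "obj": "B", "pp_from": "A"})
```

Both calls yield a Commerce frame with A as seller, B as goods and C as buyer, which is what makes direct matching of question and document possible.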
FrameNet: Resources
English FrameNet: ICSI, University of California, Berkeley
Charles Fillmore et al.
Overall running time: 5 years
German FrameNet: SALSA (The Saarbrücken Lexical Semantics Annotation and Analysis Project)
Leibniz programme of the German Science Foundation (DFG)
Saarland University
Manfred Pinkal et al.
Example: Commerce_buy
Commerce_buy (buy.v, purchase.v, purchase.n)
Core Elements:
Buyer
Seller
Goods
Money
Rate (e.g. five dollars an hour)
Unit (e.g. by the pound)
Non-Core Elements:
Means (e.g. buy with cash)
Place
Purpose
Reason
Time
Deriving FrameNet Structures from Texts
Cascade of Parsers
Parsers use hand-crafted grammars.
Easy-first parsing (Abney): Every parser recognises one linguistically motivated ‘layer’.
Ambiguities are in general left unresolved, so that later processing steps may resolve them.
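The cascade idea can be sketched as a list of layers that each add one annotation level to a shared analysis object. All names here (`Analysis`, `tokenise`, `tag_ambiguously`, `resolve_after_article`) and the toy lexicon are assumptions for illustration; the actual parsers use hand-crafted grammars that are not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Shared analysis object that every layer of the cascade extends."""
    text: str
    layers: dict = field(default_factory=dict)


def tokenise(analysis):
    analysis.layers["tokens"] = analysis.text.split()
    return analysis


def tag_ambiguously(analysis):
    # Toy lexicon (assumption): a token may keep several readings.
    lexicon = {"Essen": {"noun", "verb", "city"}}
    analysis.layers["readings"] = [
        lexicon.get(tok, {"unknown"}) for tok in analysis.layers["tokens"]
    ]
    return analysis


def resolve_after_article(analysis):
    # A later layer uses context to narrow readings left open earlier.
    tokens = analysis.layers["tokens"]
    readings = analysis.layers["readings"]
    for i in range(1, len(tokens)):
        if tokens[i - 1].lower() == "das" and "noun" in readings[i]:
            readings[i] = {"noun"}
    return analysis


CASCADE = [tokenise, tag_ambiguously, resolve_after_article]


def run_cascade(text):
    analysis = Analysis(text)
    for layer in CASCADE:
        analysis = layer(analysis)
    return analysis


result = run_cascade("das Essen")
```

The second layer deliberately keeps all readings of "Essen"; only the third layer, which sees the preceding article, commits to the noun reading.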
Morphology
Tokenisation (rule-based, using abbreviation recognition)
Morphological analysis using GERTWOL German Two-Level Morphology by Lingsoft Oy, Helsinki.
Full German morphology (inflection, derivation, composition)
Broad coverage lexicon (~350,000 stems)
Topological Parser (Braun 99, 03)
Recognising German sentence structure based on sentence topology
German sentences have a relatively rigid structure (Vorfeld, left sentence bracket, Mittelfeld, right sentence bracket, Nachfeld).
Helps to recognise the following:
Subordinate clauses
Split verbs
Verb clusters
Parser uses context-free grammar.
Evaluation: 87% precision and recall (perfect match).
German Sentence Topology
Stellungsfeldertheorie ([Drach, 1937], [Engel, 1970])
A German sentence can be divided into fields:
Das Unternehmen hat 1999 gute Gewinne gemacht, weil es expandiert hat.
Gloss: The company has 1999 good profits made, because it expanded has.
Main clause: VF (Das Unternehmen), LK (hat), MF (1999 gute Gewinne), RK (gemacht), NF (weil es expandiert hat)
Subordinate clause: LK (weil), MF (es), RK (expandiert hat)
VF: Front field
MF: Midfield
NF: Back field
LK: Left bracket
RK: Right bracket
NE Recognition
Finite state based rule set
Developed in Collate IE subproject (multilingual NE recognition)
Covers company names, currency expressions, date expressions, number expressions, person names
Gazetteer with several thousand company names
Evaluation: precision 96%, recall 82% (average for different text sorts)
Complementation with more sophisticated techniques is under investigation (e.g. “learn-filter-apply-forget”, Volk/Clematide 01).
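A rule set of this kind can be approximated with regular expressions plus a gazetteer lookup. The sketch below is only an illustration: the two patterns, the tiny gazetteer and the function `recognise_entities` are assumptions and do not reproduce the Collate rule set.

```python
import re

# Hypothetical gazetteer; the real one holds several thousand company names.
COMPANY_GAZETTEER = {"Lockheed", "Siemens"}

# Hypothetical patterns for two of the covered expression types.
MONTHS = ("Januar|Februar|März|April|Mai|Juni|Juli|August|"
          "September|Oktober|November|Dezember")
PATTERNS = [
    ("currency", re.compile(r"\b\d+(?:,\d+)?\s(?:DM|Euro|Dollar)\b")),
    ("date", re.compile(r"\b\d{1,2}\.\s(?:" + MONTHS + r")\s\d{4}\b")),
]


def recognise_entities(text):
    """Return (type, matched string) pairs, ordered by position in the text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    for name in COMPANY_GAZETTEER:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), "company", match.group()))
    return [(label, string) for _, label, string in sorted(found)]


entities = recognise_entities(
    "Lockheed erhielt am 23. Juni 2004 einen Auftrag über 25 Dollar."
)
```

For the sample sentence the sketch finds one company name, one date expression and one currency expression.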
NP/PP Chunking
NP/PP chunking based on extended finite state grammar
Extension allows complex, self-embedded NPs/PPs (e.g. adjective phrases with pre-nominal complements/modifiers).
Chunker includes results from NE recogniser (N' or NP), allowing complex NPs, e.g. with coordination.
Evaluation (NEGRA, Brants et al. 99, as gold standard): recall 92%, precision 71% (due to different handling of postnominal attachment)
PReDS
Syntacto-semantic dependency structure (Partially Resolved Dependency Structure)
Abstracts away from certain surface differences (active/passive), retains others (prepositions in PPs).
Underspecified in case of ambiguities
Derivation using context-free grammar
Brings together results from all previous steps
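How a dependency structure can abstract away from voice while keeping prepositions is sketched below. The function `to_deep_relations` and the surface labels (`subj`, `obj`, `pp_<prep>`) are assumptions for this example; the relation names DeepSubject, DeepObject and PP plus preposition follow the labels used elsewhere in these slides.

```python
def to_deep_relations(verb, voice, arguments):
    """Map surface grammatical functions to voice-independent relations.

    `arguments` maps surface labels ("subj", "obj", "pp_<prep>") to fillers.
    """
    deep = {"pred": verb}
    for label, filler in arguments.items():
        if voice == "passive" and label == "subj":
            deep["DeepObject"] = filler
        elif voice == "passive" and label == "pp_von":
            # German passive agent phrase ("von ...") becomes the deep subject.
            deep["DeepSubject"] = filler
        elif label == "subj":
            deep["DeepSubject"] = filler
        elif label == "obj":
            deep["DeepObject"] = filler
        elif label.startswith("pp_"):
            # Prepositions are retained rather than abstracted away.
            deep["PP" + label[3:]] = filler
    return deep


active = to_deep_relations(
    "kaufen", "active", {"subj": "A", "obj": "B", "pp_in": "Berlin"}
)
passive = to_deep_relations(
    "kaufen", "passive", {"subj": "B", "pp_von": "A", "pp_in": "Berlin"}
)
```

The active and the passive input end up with identical relations, while the locative PP keeps its preposition in the label.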
Deriving FrameNet structures
Based on PReDS
Subtree matching using weighted rules
Based on FrameNet valency information
Coverage for German is still small, but grows with increasing FrameNet coverage
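Matching with weighted rules could look roughly as follows. The rule format, the weights and the function `derive_frame` are assumptions made for illustration; the slides only state that subtrees of the PReDS are matched by weighted rules derived from FrameNet valency information.

```python
# Hypothetical rules: each pattern lists the predicate and the dependency
# labels that must be present; the weight ranks competing rules.
RULES = [
    {"frame": "Getting", "pred": "receive", "weight": 1.0,
     "roles": {"DeepSubject": "Recipient", "DeepObject": "Theme", "PPfrom": "Donor"}},
    {"frame": "Getting", "pred": "receive", "weight": 0.6,
     "roles": {"DeepSubject": "Recipient", "DeepObject": "Theme"}},
]


def derive_frame(preds):
    """Apply the highest-weighted rule whose pattern is contained in `preds`."""
    applicable = [
        rule for rule in RULES
        if rule["pred"] == preds["pred"]
        and all(label in preds for label in rule["roles"])
    ]
    if not applicable:
        return None
    best = max(applicable, key=lambda rule: rule["weight"])
    elements = {role: preds[label] for label, role in best["roles"].items()}
    return {"frame": best["frame"], "target": preds["pred"], "elements": elements}


frame = derive_frame({
    "pred": "receive",
    "DeepSubject": "Lockheed",
    "DeepObject": "order",
    "PPfrom": "Great Britain",
})
```

When the from-PP is absent, the lower-weighted rule still applies and yields a Getting frame without a Donor.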
Putting it all together
Gloss:
Lockheed has from Great Britain the order for 25 transport planes received.
Implementation Issues of the QA system
Frame Merging
Question type recognition/question typology
Efficient storing of FrameNet structures (database)
‘Ontology-enabled’ matching
Matching Interlinked Frames (‘database join’)
Inferencing
Question/Matching
Lockheed has received an order for 25 transport planes from Great Britain.
From whom has Lockheed received an order?
Getting
Target: Receive
Donor: Great Britain
Recipient: Lockheed
Theme:
Getting
Target: Receive
Donor: ? [Person_or_Organisation]
Recipient: Lockheed
Theme: (Request)
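The matching step shown on this slide can be sketched as a comparison of two frame instances in which the question leaves one element open. This is a simplified illustration: the dictionaries, the sortal lexicon `SORTS` and the function `match` are assumptions, and the Theme is reduced to a plain string instead of a linked Request frame.

```python
# Toy sortal lexicon (assumption) used to check the type restriction on "?".
SORTS = {
    "Great Britain": "Person_or_Organisation",
    "Lockheed": "Person_or_Organisation",
}

document_frame = {
    "frame": "Getting", "target": "receive",
    "elements": {"Donor": "Great Britain", "Recipient": "Lockheed", "Theme": "order"},
}

# The question leaves Donor open and restricts its sort.
question_frame = {
    "frame": "Getting", "target": "receive",
    "elements": {"Donor": ("?", "Person_or_Organisation"),
                 "Recipient": "Lockheed", "Theme": "order"},
}


def match(question, document):
    """Return bindings for the open question slots, or None if there is no match."""
    if question["frame"] != document["frame"]:
        return None
    bindings = {}
    for role, wanted in question["elements"].items():
        actual = document["elements"].get(role)
        if isinstance(wanted, tuple):  # open slot: ("?", required sort)
            if actual is None or SORTS.get(actual) != wanted[1]:
                return None
            bindings[role] = actual
        elif wanted != actual:
            return None
    return bindings


answer = match(question_frame, document_frame)
```

The binding found for the open Donor slot is the answer to "From whom has Lockheed received an order?".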
Conclusions
We have presented a system for deriving FrameNet structures from German texts.
Coverage needs to be extended as the German FrameNet grows.
An evaluation is in progress, based on matching of grammatical relations (Carroll, Minnen, Briscoe 03).
The QA system is still in its design phase; some of the open issues have been shown.
FrameNet Representation: Current Thoughts
Basis: Frame instances
Frame elements are references (links) to frame instances
A FrameNet representation thus forms a network of linked frame instances.
This is comparable to the A-Box in Knowledge Representation.
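One possible data structure for such a network is sketched below. The class `FrameInstance` and the helper `linked_frames` are assumptions for illustration; as a simplification, plain strings are allowed as fillers in addition to links, and the network is assumed to be free of cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class FrameInstance:
    frame: str
    target: str
    # A frame element is filled either by a plain string or by a link to
    # another frame instance.
    elements: Dict[str, Union[str, "FrameInstance"]] = field(default_factory=dict)


request = FrameInstance("Request", "order", {"Message": "25 transport planes"})
getting = FrameInstance(
    "Getting", "receive",
    {"Donor": "Great Britain", "Recipient": "Lockheed", "Theme": request},
)


def linked_frames(instance):
    """Yield the instance and every frame instance reachable through its elements."""
    yield instance
    for filler in instance.elements.values():
        if isinstance(filler, FrameInstance):
            yield from linked_frames(filler)


names = [f.frame for f in linked_frames(getting)]
```

Here the Theme of the Getting instance is a reference to the Request instance, so walking the links from `getting` visits both frames.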
Example
Lockheed has received an order for 25 transport planes from Great Britain.
Getting
Target: Receive
Donor: Great Britain
Recipient: Lockheed
Theme:
Request
Target: Order
Message: 25 Transport planes
Speaker:
Addressee:
Frame Merging
Lockheed has received an order for 25 transport planes from Great Britain.
Getting
Target: Receive
Donor: Great Britain
Recipient: Lockheed
Theme:
Request
Target: Order
Message: 25 Transport planes
Speaker:
Addressee:
Frame Merging
Comparable to template merging in IE systems.
In IE often done by sets of rules describing equality and inequality constraints over template slots.
Hand-craft rules? (Observation: Give/receive order are ‘strong’ collocations)
Use machine learning?
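A hand-crafted merging rule in the spirit of IE template merging might look like this. It is only a sketch of one possible rule: the function `merge_frames`, the `{"head": word}` placeholder for an unresolved phrase and the single equality constraint are assumptions, since the slides leave open whether rules or machine learning will be used.

```python
def merge_frames(frames):
    """Fill an open element with the frame instance evoked by its head word.

    Each frame is a dict with "frame", "target" and "elements"; an element
    value of the form {"head": word} is an unresolved phrase.
    """
    by_target = {f["target"]: f for f in frames}
    for frame in frames:
        for role, filler in frame["elements"].items():
            if isinstance(filler, dict) and "head" in filler:
                candidate = by_target.get(filler["head"])
                # Equality constraint: head word equals the other frame's target;
                # inequality constraint: a frame never fills its own element.
                if candidate is not None and candidate is not frame:
                    frame["elements"][role] = candidate
    return frames


getting = {"frame": "Getting", "target": "receive",
           "elements": {"Donor": "Great Britain", "Recipient": "Lockheed",
                        "Theme": {"head": "order"}}}
request = {"frame": "Request", "target": "order",
           "elements": {"Message": "25 transport planes"}}
merge_frames([getting, request])
```

After merging, the Theme of the Getting frame points to the Request frame evoked by "order".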
Matching
Storing/Matching is in principle straightforward, but:
Hypo/Hypernyms should be matched, e.g. plane should match transport plane in ‘Who has ordered planes from Lockheed?’.
Similar to ‘Ontology-Enabled’ Searching (Weikum et al.)
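Hyponym-aware matching can be sketched with a small is-a hierarchy. The table `HYPERNYM` and the function `subsumes` are assumptions for illustration; a real system would draw on a lexical resource rather than a hand-written dictionary.

```python
# Toy is-a hierarchy (assumption): child -> parent.
HYPERNYM = {
    "transport plane": "plane",
    "plane": "vehicle",
    "tank": "vehicle",
}


def subsumes(general, specific):
    """True if `specific` equals `general` or is one of its hyponyms."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = HYPERNYM.get(current)
    return False


plane_matches = subsumes("plane", "transport plane")
```

The check is directional: a question about planes matches a document about transport planes, but not the other way round.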
Missing Frames
FrameNet coverage is not yet complete; ‘missing’ frames will therefore have to be inserted.
Different degrees of difficulty:
Named Entities (such as ‘Great Britain’): Introduce pseudo-frame without frame elements.
Nouns (such as ‘transport planes’): Introduce frame, try to ‘position’ it using sortal information.
Verbs (such as ‘pinch’ for ‘get’): Introduce frame, try to ‘position’ it using sortal information, assign underspecified frame elements.
Underspecified frames
Lockheed has received an order for 25 transport planes from Great Britain.
Who pinched the order from Great Britain?
Getting
Target: Receive
Donor: Great Britain
Recipient: Lockheed
Theme:
PseudoFrame_pinch
Target: pinch
DeepSubject: ? [Person_or_Organisation]
PPfrom: Great Britain
DeepObject: (Request)
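Inserting a pseudo-frame for an uncovered word could be sketched as follows. The function `pseudo_frame` and its `category` values are assumptions; the ‘positioning’ of the new frame via sortal information is not modelled here.

```python
def pseudo_frame(lemma, category, arguments=None):
    """Build a placeholder frame for a word that FrameNet does not cover yet."""
    frame = {"frame": "PseudoFrame_" + lemma, "target": lemma, "elements": {}}
    if category == "verb" and arguments:
        # Verbs keep their dependency labels as underspecified frame elements.
        frame["elements"] = dict(arguments)
    # Named entities and nouns get no frame elements in this sketch.
    return frame


pinch = pseudo_frame(
    "pinch", "verb",
    {"DeepSubject": "?", "PPfrom": "Great Britain", "DeepObject": "order"},
)
britain = pseudo_frame("Great Britain", "named_entity")
```

The verb case reproduces the shape of PseudoFrame_pinch above, with dependency labels standing in for the frame elements that a proper frame would define.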
Sortal Information
Lockheed has received an order for 25 transport planes from Great Britain.
From whom has Lockheed received an order for the construction of transport planes?
Request
Target: Order
Message: 25 transport planes
Request
Target: Order
Message:
Construction
Target: Construction
Created_entity: 25 transport planes
??
Sortal Mismatch
Case of sortal mismatch: ‘Message’ should contain an event, ‘25 transport planes’ is not an event
General solution: Type coercion.
Two solutions possible:
Introduce an empty, underspecified frame during indexing.
Enhance matching to handle these cases.
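The first option, coercion at indexing time, might be sketched like this. Everything here is an assumption for illustration: the set `EVENT_FRAMES`, the element name `participant` and the functions `coerce_to_event` and `message_matches` are hypothetical, and the matching criterion is deliberately crude.

```python
# Assumption: frames that denote events.
EVENT_FRAMES = {"Construction", "Getting", "Request"}


def coerce_to_event(filler):
    """Wrap a non-event filler in an empty, underspecified event frame."""
    if isinstance(filler, dict) and filler.get("frame") in EVENT_FRAMES:
        return filler
    return {"frame": None, "target": None, "elements": {"participant": filler}}


def message_matches(question_message, indexed_message):
    """Match an event frame in the question against a coerced index entry."""
    indexed = coerce_to_event(indexed_message)
    if indexed["frame"] is None:
        # Underspecified frame: compatible with any event that has the
        # same participant among its elements.
        participant = indexed["elements"]["participant"]
        return participant in question_message["elements"].values()
    return indexed == question_message


question_message = {
    "frame": "Construction", "target": "construction",
    "elements": {"Created_entity": "25 transport planes"},
}
matched = message_matches(question_message, "25 transport planes")
```

With the wrapper in place, the Construction frame from the question is compatible with the bare noun phrase stored as Message in the index.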
Matching Interlinked Frames
Lockheed has received an order for 25 transport planes from Great Britain.
Who has received an order for 25 transport planes?
Getting
Target: Receive
Donor: Great Britain
Recipient: Lockheed
Theme:
Request
Target: Order
Message: 25 Transport planes
Speaker:
Addressee:
Database join
In relational databases, such a query would be done using a join (very efficient).
Can that be brought together with our other requirements?
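How interlinked frames could be queried with a join is sketched below, using Python's built-in `sqlite3` module. The two-table schema (`frame_instance`, `frame_element`) is an assumption made for the example, not the design of the planned system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE frame_instance (id INTEGER PRIMARY KEY, frame TEXT, target TEXT);
    -- A frame element is filled by a literal or by a link to another instance.
    CREATE TABLE frame_element (
        frame_id INTEGER REFERENCES frame_instance(id),
        role TEXT,
        literal TEXT,
        filler_id INTEGER REFERENCES frame_instance(id)
    );
    INSERT INTO frame_instance VALUES
        (1, 'Getting', 'receive'),
        (2, 'Request', 'order');
    INSERT INTO frame_element VALUES
        (1, 'Donor', 'Great Britain', NULL),
        (1, 'Recipient', 'Lockheed', NULL),
        (1, 'Theme', NULL, 2),
        (2, 'Message', '25 transport planes', NULL);
""")

# "Who has received an order for 25 transport planes?"
rows = conn.execute("""
    SELECT recipient.literal
    FROM frame_instance AS getting
    JOIN frame_element AS recipient
      ON recipient.frame_id = getting.id AND recipient.role = 'Recipient'
    JOIN frame_element AS theme
      ON theme.frame_id = getting.id AND theme.role = 'Theme'
    JOIN frame_instance AS request
      ON request.id = theme.filler_id AND request.frame = 'Request'
    JOIN frame_element AS message
      ON message.frame_id = request.id AND message.role = 'Message'
    WHERE getting.frame = 'Getting'
      AND message.literal = '25 transport planes'
""").fetchall()
```

The query follows the Theme link from the Getting instance to the Request instance and returns the Recipient. Exact string comparison on the Message literal is the part that would conflict with hyponym matching and type coercion.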
Inferencing
Quite often, inferencing might help to find answers to ‘hard’ questions:
List plane manufacturers.
plane_manufacturer(x) ↔ company(x) & ∃y.produce(x,y) & plane(y)
company(lockheed).
∃z.receive_from(lockheed,z,great_britain) & order_to(z,w) & ∃F.F(lockheed,v) & plane(v).
plane_manufacturer(lockheed).
See QA engines by LCC (Harabagiu et al.)
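The inference step can be illustrated with a tiny forward-chaining loop. This is a sketch under strong assumptions: the fact encoding, the predicate `receive_order_for` and the bridging rule (whoever receives an order for something produces it) are hypothetical, and only the right-to-left direction of the plane_manufacturer definition is applied.

```python
facts = {
    ("company", "lockheed"),
    ("receive_order_for", "lockheed", "planes_25"),
    ("plane", "planes_25"),
}


def infer(facts):
    """Apply two hand-written rules until no new facts are derived."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for fact in facts:
            # Assumed bridging rule: whoever receives an order for y produces y.
            if fact[0] == "receive_order_for":
                new.add(("produce", fact[1], fact[2]))
            # A company that produces a plane is a plane manufacturer.
            if (fact[0] == "produce"
                    and ("company", fact[1]) in facts
                    and ("plane", fact[2]) in facts):
                new.add(("plane_manufacturer", fact[1]))
        if not new <= facts:
            facts |= new
            changed = True
    return facts


derived = infer(facts)
```

The loop first derives that Lockheed produces the ordered planes and then, in the next pass, that Lockheed is a plane manufacturer.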
Parser Evaluation: Grammatical Function Annotation
<Sent>Die im Direktvertrieb aktiven Gesellschaften schneiden 1994 gut ab.</Sent>
<Roles>
ncsubj(abschneiden, Gesellschaft, _)
dobj(abschneiden, gut, _)
ncmod(_, Gesellschaft, aktiv)
iobj(in, aktiv, Direkt#vertrieb)
ncmod(_, abschneiden, 1994)
</Roles>
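An evaluation on grammatical relations compares the set of relations produced by the parser with the annotated set. The sketch below parses relation lines like those above and computes precision and recall; the `SYSTEM` output is invented for the example, and the functions `parse_relations` and `precision_recall` are assumptions rather than the evaluation tool actually used.

```python
import re

GOLD = """
ncsubj(abschneiden, Gesellschaft, _)
dobj(abschneiden, gut, _)
ncmod(_, Gesellschaft, aktiv)
iobj(in, aktiv, Direkt#vertrieb)
ncmod(_, abschneiden, 1994)
"""

# Hypothetical parser output: two relations missing, one wrong.
SYSTEM = """
ncsubj(abschneiden, Gesellschaft, _)
dobj(abschneiden, gut, _)
ncmod(_, Gesellschaft, aktiv)
ncmod(_, abschneiden, gut)
"""

RELATION = re.compile(r"(\w+)\(([^)]*)\)")


def parse_relations(block):
    """Turn lines such as 'dobj(a, b, _)' into a set of tuples."""
    relations = set()
    for name, args in RELATION.findall(block):
        relations.add((name, *[arg.strip() for arg in args.split(",")]))
    return relations


def precision_recall(system, gold):
    """Precision and recall of `system` relations against `gold` relations."""
    correct = len(system & gold)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


precision, recall = precision_recall(parse_relations(SYSTEM), parse_relations(GOLD))
```

With three of four system relations correct and three of five gold relations found, precision is 0.75 and recall is 0.6.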