COLLATE Deriving FrameNet Representations: Towards Meaning-Oriented Question Answering Gerhard Fliedner DFKI GmbH and Computational Linguistics, Saarland University NLDB 2004, 23 June 2004



Overview

Introduction
Question Answering Using FrameNet
Deriving FrameNet Structures from Texts
Implementation Issues
Conclusions


Introduction

We present a system for automatically annotating German texts with lexical semantic structures, namely FrameNet.

This module is eventually to form the core of a Question Answering system that directly matches FrameNet representations of both the document collection and the user’s questions.

This work is pursued within the Collate project (Computational Linguistics and Language Technology for Real Life Applications) at DFKI GmbH, partly jointly with the Computational Linguistics Department of Saarland University, Saarbrücken.

Joint work with Christian Braun (Saarland University)


Meaning Oriented QA: System Architecture


Using Semantics in Question Answering

Most QA systems use IR techniques based on surface words for document/passage retrieval.

Often used extensions:
  Stemming
  Query expansion using semantically related words (mostly using WordNet)
  Deeper linguistic processing of retrieved passages (using logic forms or similar)

However, semantic relations between words are rarely taken into account.


Different textual realisations

We want to reliably capture systematic semantic relations such as:
  Synonymy: buy vs. purchase
  Converse/inverse relations: buy vs. sell
  (Some) hyponymy/hyperonymy: order vs. request

Realisations as, e.g., verbs or nouns should receive the same representation:
  A sold B to C vs. the sale of B to C by A

We also want to factor out different surface realisations of argument PPs (especially with ‘picture nouns’):
  compare A with B vs. compare A to B
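
The normalisation described above can be sketched in a few lines of Python. This is a minimal illustration only: the lexicon entries, the slot names (subject, object, from, to) and the role assignments are assumptions made for the example and are not taken from the FrameNet database or from the system presented here.

```python
# Illustrative lexicon (assumption): lemma -> (frame, syntactic slot -> frame role).
LEXICON = {
    "buy":      ("Commerce", {"subject": "Buyer",  "object": "Goods", "from": "Seller"}),
    "purchase": ("Commerce", {"subject": "Buyer",  "object": "Goods", "from": "Seller"}),
    "sell":     ("Commerce", {"subject": "Seller", "object": "Goods", "to": "Buyer"}),
}

def to_frame(lemma, args):
    """Map a predicate and its syntactic arguments to a frame with named roles."""
    frame, slot_to_role = LEXICON[lemma]
    roles = {slot_to_role[slot]: filler for slot, filler in args.items()}
    return {"frame": frame, "roles": roles}

# "A sold B to C", "C bought B from A", "C purchased B from A"
sold = to_frame("sell", {"subject": "A", "object": "B", "to": "C"})
bought = to_frame("buy", {"subject": "C", "object": "B", "from": "A"})
purchased = to_frame("purchase", {"subject": "C", "object": "B", "from": "A"})
```

All three calls yield the same role assignment (Seller: A, Goods: B, Buyer: C), which is what would allow a question phrased with buy to match a text phrased with sell.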


FrameNet

As a framework for a ‘flat’ semantic representation, we have chosen FrameNet.

FrameNet is a database that documents the semantic and syntactic valence of words, using a concept derived from the idea of thematic roles (Fillmore, 68).

Related words are grouped into a hierarchical structure of frames according to word fields.

Instead of universal thematic roles, each frame has a set of specific roles (frame elements).

For example, the frame Commerce (buy, sell, sale) defines frame elements such as buyer and seller.


FrameNet: Resources

English FrameNet:
  ICSI, University of Berkeley (CA)
  Charles Fillmore et al.
  Overall running time: 5 years

German FrameNet:
  SALSA (The Saarbrücken Lexical Semantics Annotation and Analysis Project)
  Leibniz programme of the German Science Foundation (DFG)
  Saarland University
  Manfred Pinkal et al.


FrameNet: Example


Example: Commerce_buy

Commerce_buy (buy.v, purchase.v, purchase.n)

Core Elements:
  Buyer
  Seller
  Goods
  Money
  Rate (e.g. five dollars an hour)
  Unit (e.g. by the pound)

Non-Core Elements:
  Means (e.g. buy with cash)
  Place
  Purpose
  Reason
  Time
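
As a rough sketch, a frame definition like the one above could be held in a small data structure. The class name FrameDefinition and the helper frames_for are hypothetical and chosen for this example; only the frame name, lexical units and element names come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameDefinition:
    name: str
    lexical_units: tuple
    core_elements: tuple
    non_core_elements: tuple = ()

    def roles(self):
        """All frame elements, core first."""
        return self.core_elements + self.non_core_elements

COMMERCE_BUY = FrameDefinition(
    name="Commerce_buy",
    lexical_units=("buy.v", "purchase.v", "purchase.n"),
    core_elements=("Buyer", "Seller", "Goods", "Money", "Rate", "Unit"),
    non_core_elements=("Means", "Place", "Purpose", "Reason", "Time"),
)

def frames_for(lexical_unit, frames):
    """Names of all frames that list the given lexical unit."""
    return [f.name for f in frames if lexical_unit in f.lexical_units]
```

Looking up purchase.n and purchase.v returns the same frame, which reflects the point that nominal and verbal realisations share one representation.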


Deriving FrameNet Structures from Texts

Cascade of parsers
Parsers use hand-crafted grammars.
Easy-first parsing (Abney): every parser recognises one linguistically motivated ‘layer’.
Ambiguities are in general left unresolved, so that later processing steps may resolve them.
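
The cascade idea can be illustrated with a toy pipeline in which each stage adds one layer to a shared analysis and an ambiguity left open by one stage is resolved by a later one. The stage functions, the tiny lexicon and the tag names are all assumptions for illustration; the actual system uses hand-crafted grammars that are not shown in these slides.

```python
def tokenise(analysis):
    analysis["tokens"] = analysis["text"].rstrip(".").split()
    return analysis

def tag_candidates(analysis):
    # Toy lexicon (assumption). Ambiguous words keep all candidate readings.
    lexicon = {"Das": {"DET", "PRON"}, "Unternehmen": {"NOUN"}, "expandiert": {"VERB"}}
    analysis["tags"] = [lexicon.get(tok, {"UNKNOWN"}) for tok in analysis["tokens"]]
    return analysis

def resolve_determiners(analysis):
    # A later layer resolves what an earlier layer left open:
    # a DET/PRON candidate directly before a noun is read as DET.
    tags = analysis["tags"]
    for i in range(len(tags) - 1):
        if tags[i] == {"DET", "PRON"} and tags[i + 1] == {"NOUN"}:
            tags[i] = {"DET"}
    return analysis

CASCADE = [tokenise, tag_candidates, resolve_determiners]

def run_cascade(text, stages=CASCADE):
    """Apply each stage in order; every stage enriches the same analysis dict."""
    analysis = {"text": text}
    for stage in stages:
        analysis = stage(analysis)
    return analysis

result = run_cascade("Das Unternehmen expandiert.")
```

Running only the first two stages leaves the reading of Das open as a set of two candidates; the third stage narrows it to a determiner.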


Morphology

Tokenisation (rule-based, using abbreviation recognition)
Morphological analysis using GERTWOL (German Two-Level Morphology) by Lingsoft Oy, Helsinki
  Full German morphology (inflection, derivation, composition)
  Broad coverage lexicon (~350,000 stems)
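
A rule-based tokeniser with abbreviation recognition might look roughly like the following sketch. The abbreviation list and the splitting rule are assumptions for illustration and do not reproduce the rules of the system described here; GERTWOL itself is a commercial component and is not called.

```python
import re

# Small illustrative abbreviation list (assumption).
ABBREVIATIONS = {"z.B.", "Dr.", "usw.", "bzw.", "ca."}

def tokenise(text):
    """Split on whitespace, keep known abbreviations whole,
    and separate leading/trailing punctuation from other words."""
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)
            continue
        lead, word, trail = re.match(r"^(\W*)(.*?)(\W*)$", chunk).groups()
        tokens.extend(lead)      # each punctuation character becomes a token
        if word:
            tokens.append(word)
        tokens.extend(trail)
    return tokens
```

The point of the abbreviation check is that the full stop in Dr. or z.B. stays attached to the token, while a sentence-final full stop becomes a token of its own.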


Topological Parser (Braun 99, 03)

Recognising German sentence structure based on sentence topology

German sentences have a relatively rigid structure (Vorfeld, left sentence bracket, Mittelfeld, right sentence bracket, Nachfeld).

Helps to recognise the following:
  Subordinate clauses
  Split verbs
  Verb clusters

The parser uses a context-free grammar. Evaluation: 87% precision and recall (perfect match).


German Sentence Topology

Stellungsfeldertheorie ([Drach, 1937], [Engel, 1970])

A German sentence can be divided into fields:

Das Unternehmen hat 1999 gute Gewinne gemacht, weil es expandiert hat.
The company has 1999 good profits made, because it expanded has.

Main clause: VF (Das Unternehmen), LK (hat), MF (1999 gute Gewinne), RK (gemacht), NF (weil es expandiert hat)
Subordinate clause: LK (weil), MF (es), RK (expandiert hat)

VF: Front field
MF: Midfield
NF: Back field
LK: Left bracket
RK: Right bracket


NE Recognition

Finite-state-based rule set
Developed in the Collate IE subproject (multilingual NE recognition)
Covers company names, currency expressions, date expressions, number expressions, person names
Gazetteer with several thousand company names
Evaluation: precision 96%, recall 82% (average for different text sorts)
Complementation with more sophisticated techniques is under investigation (e.g. “learn-filter-apply-forget”, Volk/Clematide 01).
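
To give a flavour of pattern-based NE recognition combined with a gazetteer, here is a small sketch using regular expressions, which are one way of writing finite-state rules. The two patterns, the gazetteer entries and the example sentence are made up for this illustration and are not the Collate rule set.

```python
import re

# Toy gazetteer and patterns (assumptions for illustration only).
COMPANY_GAZETTEER = {"Lockheed", "Siemens"}
PATTERNS = [
    ("CURRENCY", re.compile(r"\b\d+(?:[.,]\d+)?\s?(?:Mio\.\s)?(?:DM|Euro|Dollar)\b")),
    ("DATE", re.compile(
        r"\b\d{1,2}\.\s?(?:Januar|Februar|März|April|Mai|Juni|Juli|August|"
        r"September|Oktober|November|Dezember)\s\d{4}\b")),
]

def recognise_entities(text):
    """Return (surface string, label) pairs in order of appearance."""
    entities = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            entities.append((m.start(), m.group(), label))
    for m in re.finditer(r"\b\w+\b", text):
        if m.group() in COMPANY_GAZETTEER:
            entities.append((m.start(), m.group(), "COMPANY"))
    return [(surface, label) for _, surface, label in sorted(entities)]
```

A gazetteer lookup of this kind favours precision over recall, since unlisted company names are simply missed; this is consistent in spirit with the reported figures, although the sketch says nothing about how they were obtained.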


NP/PP Chunking

NP/PP chunking based on an extended finite-state grammar
Extension allows complex, self-embedded NPs/PPs (e.g. adjective phrases with pre-nominal complements/modifiers).
Chunker includes results from the NE recogniser (N' or NP), allowing complex NPs, e.g. with coordination.
Evaluation (NEGRA, Brants et al. 99, as gold standard): recall 92%, precision 71% (the lower precision is due to different handling of postnominal attachment)


PReDS

Syntacto-semantic dependency structure (Partially Resolved Dependency Structure)

Abstracts away over certain surface differences (active/passive), retains others (prepositions in PPs).

Underspecified in case of ambiguities
Derivation using context-free grammar
Brings together results from all previous steps
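
The abstraction over active and passive can be sketched as a mapping from surface functions to deep ones. The input format (a dict with verb, voice, subject, object and pps) and the example sentences are hypothetical; the labels DeepSubject, DeepObject and PP plus preposition follow the naming used on a later slide, but the real PReDS derivation uses a context-free grammar rather than this shortcut.

```python
def to_deep_roles(clause):
    """Map surface grammatical functions of one clause to deep ones."""
    deep = {"Target": clause["verb"]}
    pps = dict(clause.get("pps", {}))
    if clause["voice"] == "passive":
        # The surface subject of a passive is the deep object.
        deep["DeepObject"] = clause["subject"]
        # Assumption: the agent of a German passive is realised as a von-PP.
        if "von" in pps:
            deep["DeepSubject"] = pps.pop("von")
    else:
        deep["DeepSubject"] = clause["subject"]
        if "object" in clause:
            deep["DeepObject"] = clause["object"]
    # Remaining PPs keep their preposition, a surface detail that is retained.
    for prep, np in pps.items():
        deep["PP" + prep] = np
    return deep

# "Lockheed baut die Flugzeuge für Großbritannien."
active = to_deep_roles({"verb": "bauen", "voice": "active", "subject": "Lockheed",
                        "object": "Flugzeuge", "pps": {"für": "Großbritannien"}})
# "Die Flugzeuge werden von Lockheed für Großbritannien gebaut."
passive = to_deep_roles({"verb": "bauen", "voice": "passive", "subject": "Flugzeuge",
                         "pps": {"von": "Lockheed", "für": "Großbritannien"}})
```

Both sentences end up with the same structure, while the für-PP is still recognisable by its preposition.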


Deriving FrameNet structures

Based on PReDS
Subtree matching using weighted rules
Based on FrameNet valency information
Coverage for German is still small, but grows with increasing FrameNet coverage
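
One possible reading of weighted rule matching is sketched below: each rule names a head lemma and the dependency labels it requires, and the highest-weighted applicable rule determines the frame. The rules, weights, the preposition von and the node format are invented for illustration, and the sketch matches only one level of the tree rather than arbitrary subtrees.

```python
# Hypothetical rules: head lemma, dependency label -> frame element, weight.
RULES = [
    {"lemma": "erhalten", "frame": "Getting", "weight": 0.9,
     "map": {"DeepSubject": "Recipient", "DeepObject": "Theme", "PPvon": "Donor"}},
    {"lemma": "erhalten", "frame": "Getting", "weight": 0.6,
     "map": {"DeepSubject": "Recipient", "DeepObject": "Theme"}},
]

def derive_frame(node, rules=RULES):
    """Apply the highest-weighted rule whose pattern is contained in the node.

    Returns None if no rule applies.
    """
    candidates = [
        r for r in rules
        if r["lemma"] == node["lemma"]
        and all(label in node["deps"] for label in r["map"])
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r["weight"])
    elements = {fe: node["deps"][label] for label, fe in best["map"].items()}
    return {"frame": best["frame"], "target": node["lemma"], "elements": elements}
```

Giving the more specific rule a higher weight means that a donor is picked up when the PP is present, while the less specific rule still yields a partial frame otherwise.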


Putting it all together

Gloss:

Lockheed has from Great Britain the order for 25 transport planes received.


Implementation Issues of the QA system

Frame Merging
Question type recognition/question typology
Efficient storing of FrameNet structures (database)
‘Ontology-enabled’ matching
Matching Interlinked Frames (‘database join’)
Inferencing


Question/Matching

Lockheed has received an order for 25 transport planes from Great Britain.

From whom has Lockheed received an order?

Getting

Target: Receive

Donor: Great Britain

Recipient: Lockheed

Theme:

Getting

Target: Receive

Donor: ? [Person_or_Organisation]

Recipient: Lockheed

Theme: (Request)
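
The matching step shown above can be approximated by comparing the question frame with a stored frame slot by slot, treating the questioned element as a wildcard. This is a simplified sketch under the assumption that frames are plain dicts; it ignores the sortal restriction [Person_or_Organisation] on the wildcard and treats the Theme as a plain label.

```python
WILDCARD = "?"

def match(question, fact):
    """Return bindings for wildcard slots if the fact answers the question, else None."""
    if question["frame"] != fact["frame"]:
        return None
    bindings = {}
    for role, wanted in question["elements"].items():
        found = fact["elements"].get(role)
        if found is None:
            return None          # the text frame lacks a slot the question needs
        if wanted == WILDCARD:
            bindings[role] = found
        elif wanted != found:
            return None
    return bindings

fact = {"frame": "Getting",
        "elements": {"Donor": "Great Britain", "Recipient": "Lockheed", "Theme": "Request"}}
question = {"frame": "Getting",
            "elements": {"Donor": WILDCARD, "Recipient": "Lockheed", "Theme": "Request"}}
answer = match(question, fact)
```

The binding for Donor is the answer to ‘From whom has Lockheed received an order?’.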


Conclusions

We have presented a system for deriving FrameNet structures from German texts.

The coverage needs to be extended as German FrameNet grows.

An evaluation is in progress, based on matching of grammatical relations (Carroll, Minnen, Briscoe 03).

The QA system is still in its design phase; some of the open issues have been shown.


Questions


Backups


FrameNet Representation: Current Thoughts

Basis: frame instances
Frame elements are references (links) to frame instances.
A FrameNet representation thus forms a network of linked frame instances.
This is comparable to the A-Box in Knowledge Representation.


Example

Lockheed has received an order for 25 transport planes from Great Britain.

Getting

Target: Receive

Donor: Great Britain

Recipient: Lockheed

Theme:

Request

Target: Order

Message: 25 Transport planes

Speaker:

Addressee:
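
The example can be written down as a small network of linked frame instances. The class FrameInstance and the helper linked_frames are hypothetical names for this sketch; as a simplification, fillers such as ‘Lockheed’ are kept as plain strings rather than as frame instances of their own.

```python
from dataclasses import dataclass, field

@dataclass
class FrameInstance:
    frame: str
    target: str
    elements: dict = field(default_factory=dict)  # role -> str or FrameInstance

request = FrameInstance("Request", "Order", {"Message": "25 transport planes"})
getting = FrameInstance(
    "Getting", "Receive",
    {"Donor": "Great Britain", "Recipient": "Lockheed", "Theme": request},
)

def linked_frames(instance):
    """Yield the instance and every frame instance reachable via its elements."""
    yield instance
    for filler in instance.elements.values():
        if isinstance(filler, FrameInstance):
            yield from linked_frames(filler)
```

Here the Theme of the Getting instance is a reference to the Request instance, so following the links from getting reaches both frames.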


Frame Merging

Comparable to template merging in IE systems.
In IE often done by sets of rules describing equality and inequality constraints over template slots.
Hand-craft rules? (Observation: give/receive order are ‘strong’ collocations)
Use machine learning?
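
A hand-crafted merging rule of the kind considered here might be expressed as an equality constraint between slots of two linked frames. The specific rule below (the Donor of a Getting frame equals the Speaker of the Request it receives, and the Recipient equals the Addressee) is an assumption for illustration; the slides leave open how such rules would be obtained.

```python
# Hypothetical rule table: (outer frame, linking role, inner frame) ->
# equality constraints mapping outer roles to inner roles.
MERGE_RULES = {
    ("Getting", "Theme", "Request"): {"Donor": "Speaker", "Recipient": "Addressee"},
}

def merge(outer, role, inner):
    """Return a copy of `inner` whose empty slots are filled from `outer`.

    Slots that already have a filler are left untouched.
    """
    rule = MERGE_RULES.get((outer["frame"], role, inner["frame"]), {})
    merged = {"frame": inner["frame"], "elements": dict(inner["elements"])}
    for outer_role, inner_role in rule.items():
        filler = outer["elements"].get(outer_role)
        if filler is not None and merged["elements"].get(inner_role) is None:
            merged["elements"][inner_role] = filler
    return merged

getting = {"frame": "Getting",
           "elements": {"Donor": "Great Britain", "Recipient": "Lockheed"}}
request = {"frame": "Request",
           "elements": {"Message": "25 transport planes", "Speaker": None, "Addressee": None}}
merged = merge(getting, "Theme", request)
```

After merging, the Request frame records that Great Britain placed the order and Lockheed was its addressee, information that is only implicit in the sentence.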


Matching

Storing/matching is in principle straightforward, but:
  Hypo-/hypernyms should be matched, e.g. plane should match transport plane in ‘Who has ordered planes from Lockheed?’.
  Similar to ‘Ontology-Enabled’ Searching (Weikum et al.)
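
Hyponym-aware matching can be sketched with a small is-a table that is walked upwards from the term found in the text. The hierarchy below is a toy assumption; a real system would draw on a lexical resource or ontology.

```python
# Toy is-a hierarchy (assumption): child -> parent.
HYPERNYM = {"transport plane": "plane", "plane": "vehicle"}

def is_a(term, query_term):
    """True if `term` equals `query_term` or is one of its hyponyms."""
    while term is not None:
        if term == query_term:
            return True
        term = HYPERNYM.get(term)
    return False
```

The test is deliberately asymmetric: a text about transport planes answers a question about planes, but a text about planes in general does not answer a question about transport planes.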


Missing Frames

The FrameNet coverage is not yet perfect, therefore ‘missing’ frames will have to be inserted.

Different degrees of difficulty:
  Named entities (such as ‘Great Britain’): introduce a pseudo-frame without frame elements.
  Nouns (such as ‘transport planes’): introduce a frame, try to ‘position’ it using sortal information.
  Verbs (such as ‘pinch’ for ‘get’): introduce a frame, try to ‘position’ it using sortal information, assign underspecified frame elements.


Underspecified frames

Lockheed has received an order for 25 transport planes from Great Britain.

Who pinched the order from Great Britain?

Getting

Target: Receive

Donor: Great Britain

Recipient: Lockheed

Theme:

PseudoFrame_pinch

Target: pinch

DeepSubject: ? [Person_or_Organisation]

PPfrom: Great Britain

DeepObject: (Request)


Sortal Information

Lockheed has received an order for 25 transport planes from Great Britain.

From whom has Lockheed received an order for the construction of transport planes?

Request

Target: Order

Message: 25 transport planes

Request

Target: Order

Message:

Construction

Target: Construction

Created_entity: 25 transport planes


Sortal Mismatch

Case of sortal mismatch: ‘Message’ should contain an event, but ‘25 transport planes’ is not an event.

General solution: type coercion. Two solutions are possible:
  Introduce an empty, underspecified frame during indexing.
  Enhance matching to handle these cases.


Matching Interlinked Frames

Lockheed has received an order for 25 transport planes from Great Britain.

Who has received an order for 25 transport planes?

Getting

Target: Receive

Donor: Great Britain

Recipient: Lockheed

Theme:

Request

Target: Order

Message: 25 Transport planes

Speaker:

Addressee:


Database join

In relational databases, such a query would be done using a join (very efficient).

Can that be brought together with our other requirements?
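
To make the join idea concrete, the following sketch stores the two linked frames from the example in an in-memory SQLite database and answers ‘Who has received an order for 25 transport planes?’ with a single join query. The two-table schema (frame, element) and its column names are assumptions for this illustration, not the storage design of the system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE frame (id INTEGER PRIMARY KEY, name TEXT);
    -- An element holds either a text filler or a reference to another frame.
    CREATE TABLE element (frame_id INTEGER, role TEXT, filler TEXT, ref_id INTEGER);
    INSERT INTO frame VALUES (1, 'Getting'), (2, 'Request');
    INSERT INTO element VALUES
        (1, 'Donor', 'Great Britain', NULL),
        (1, 'Recipient', 'Lockheed', NULL),
        (1, 'Theme', NULL, 2),
        (2, 'Message', '25 transport planes', NULL);
""")

QUERY = """
    SELECT recipient.filler
    FROM frame AS getting
    JOIN element AS theme     ON theme.frame_id = getting.id AND theme.role = 'Theme'
    JOIN frame   AS request   ON request.id = theme.ref_id AND request.name = 'Request'
    JOIN element AS message   ON message.frame_id = request.id AND message.role = 'Message'
    JOIN element AS recipient ON recipient.frame_id = getting.id AND recipient.role = 'Recipient'
    WHERE getting.name = 'Getting' AND message.filler = ?
"""
answers = [row[0] for row in conn.execute(QUERY, ("25 transport planes",))]
```

Exact string comparison in the WHERE clause is what makes the join efficient, and it is also what is hard to reconcile with requirements such as hyponym matching or type coercion.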


Inferencing

Quite often, inferencing might help to find answers to ‘hard’ questions:

List plane manufacturers.

plane_manufacturer(x) ↔ company(x) & ∃y.produce(x,y) & plane(y)
company(lockheed).
∃z.receive_from(lockheed,z,great_britain) & order_to(z,w) & ∃F.F(lockheed,v) & plane(v).
plane_manufacturer(lockheed).

See QA engines by LCC (Harabagiu et al.)


Parser Evaluation: Grammatical Function Annotation

<Sent>Die im Direktvertrieb aktiven Gesellschaften schneiden 1994 gut ab.</Sent>

<Roles>

ncsubj(abschneiden, Gesellschaft, _)

dobj(abschneiden, gut, _)

ncmod(_, Gesellschaft, aktiv)

iobj(in, aktiv, Direkt#vertrieb)

ncmod(_, abschneiden, 1994)

</Roles>
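
An evaluation based on grammatical relations can be sketched by parsing annotations of the form shown above into tuples and comparing gold and predicted sets. The gold relations are the five from the slide; the ‘predicted’ relations are a made-up parser output used only to exercise the computation and do not represent results of the system.

```python
import re

GR_PATTERN = re.compile(r"^\s*(\w+)\((.*)\)\s*$")

def parse_relations(block):
    """Parse lines like 'ncsubj(abschneiden, Gesellschaft, _)' into tuples."""
    relations = set()
    for line in block.splitlines():
        m = GR_PATTERN.match(line)
        if m:
            args = tuple(a.strip() for a in m.group(2).split(","))
            relations.add((m.group(1),) + args)
    return relations

def precision_recall(gold, predicted):
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = parse_relations("""
ncsubj(abschneiden, Gesellschaft, _)
dobj(abschneiden, gut, _)
ncmod(_, Gesellschaft, aktiv)
iobj(in, aktiv, Direkt#vertrieb)
ncmod(_, abschneiden, 1994)
""")
# Hypothetical parser output: one gold relation missing, one spurious relation.
predicted = parse_relations("""
ncsubj(abschneiden, Gesellschaft, _)
dobj(abschneiden, gut, _)
ncmod(_, Gesellschaft, aktiv)
ncmod(_, abschneiden, 1994)
ncmod(_, abschneiden, gut)
""")
precision, recall = precision_recall(gold, predicted)
```

With four of five predicted relations correct and four of five gold relations found, both precision and recall come out at 0.8 for this invented example.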