On Distance between Deep Syntax and Semantic Representation


Page 1: On Distance between Deep Syntax and Semantic Representation

Motivation / Issues of transformation / Conclusions

On Distance between Deep Syntax and Semantic Representation

Václav Novák

Institute of Formal and Applied Linguistics, Charles University

Prague, Czech Republic

Frontiers in Linguistically Annotated Corpora, July 22, 2006, 16:00 – 16:30

Sydney, Australia

[email protected] Syntax – Semantic Distance 1/20


Page 7: On Distance between Deep Syntax and Semantic Representation


Presentation Outline

1. Motivation: MultiNet – Knowledge Representation; Prague Dependency Treebank; Missing pieces

2. Issues of transformation: Mapping; Topic-Focus Articulation; Additional Requirements

3. Conclusions: Conclusions; Related Work; Future Work


Page 8: On Distance between Deep Syntax and Semantic Representation


MultiNet

What is MultiNet

Multilayered Semantic Network

University in Hagen, Germany

Hermann Helbig, Sven Hartrumpf

Parser: WOCADI for German (relies heavily on the HaGenLex lexicon)

MWR interface (Workbench of Knowledge Engineer)

Designed w.r.t. question answering and cognitive modeling


Page 9: On Distance between Deep Syntax and Semantic Representation


Semantic Network

Properties of Semantic Networks

Everything represented as graph nodes

The utterances gradually build the graph

Inference rules can further connect the nodes (or add new ones)

⇒ Representation of knowledge, usable for inferencing and QA
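As a toy illustration of these properties (not the MultiNet implementation; the class and the transitivity rule are invented for this sketch, though SUB is a genuine MultiNet relation), a semantic network can be viewed as a growing set of labeled edges plus inference rules that add further edges:

```python
# Toy semantic network: nodes are concepts, edges are labeled relations.
# Utterances add edges; inference rules connect further nodes.

class SemanticNetwork:
    def __init__(self):
        self.edges = set()          # (source, relation, target) triples

    def add(self, src, rel, tgt):
        self.edges.add((src, rel, tgt))

    def infer_transitive(self, rel):
        """Close a relation under transitivity (e.g. SUB, the
        concept-subordination relation) until a fixed point."""
        changed = True
        while changed:
            changed = False
            for (a, r1, b) in list(self.edges):
                for (b2, r2, c) in list(self.edges):
                    if r1 == r2 == rel and b == b2 and (a, rel, c) not in self.edges:
                        self.add(a, rel, c)
                        changed = True

net = SemanticNetwork()
net.add("car", "SUB", "vehicle")        # asserted: a car is a vehicle
net.add("vehicle", "SUB", "artifact")   # asserted: a vehicle is an artifact
net.infer_transitive("SUB")
print(("car", "SUB", "artifact") in net.edges)  # True: inferred, not asserted
```

The inferred edge is exactly the kind of derived knowledge that makes the representation usable for QA.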



Page 11: On Distance between Deep Syntax and Semantic Representation


MultiNet Example: “The car was damaged because of the impact.”
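The network diagram for this example is not preserved in the transcript. As a rough, assumed sketch of what such a figure encodes (CAUS, SUB, and AFF are genuine MultiNet relations; the node identifiers and the exact decomposition are guesses), the sentence might come out as relational triples:

```python
# Hypothetical triple encoding of "The car was damaged because of the impact."
# Node identifiers (c1, c2, sv1) are invented; the relation names are MultiNet's.
triples = [
    ("c1", "SUB", "car"),      # c1 is an instance of the concept "car"
    ("c2", "SUB", "impact"),   # c2 is an instance of "impact"
    ("sv1", "SUB", "damage"),  # sv1 is the damaging situation
    ("sv1", "AFF", "c1"),      # the car is the affected entity
    ("c2", "CAUS", "sv1"),     # the impact causes the damaging
]
causes = [(a, b) for (a, r, b) in triples if r == "CAUS"]
print(causes)  # [('c2', 'sv1')]
```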


Page 12: On Distance between Deep Syntax and Semantic Representation


MultiNet – technical info

Properties of MultiNet

93 relations + 18 functions

7 layers of attributes

hierarchy of 46 sorts

1 edge-end attribute distinguishing immanent (prototypical / categorical) vs. situational knowledge

encapsulation of concepts

default vs. categorical inference rules
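A minimal sketch of how these representational means might be stored (SUB is a real MultiNet relation, and GENER and REFER are real layer attributes; the class and field names themselves are assumptions of this illustration, not MultiNet's data model):

```python
from dataclasses import dataclass, field

# Illustrative storage for MultiNet's means of representation: edges carry
# the edge-end knowledge-type attribute; nodes carry a sort and layer values.

@dataclass
class Edge:
    source: str
    relation: str          # one of the 93 relations, e.g. "SUB", "CAUS"
    target: str
    knowledge_type: str    # "categorical" (immanent) vs. "situational"

@dataclass
class Node:
    concept: str
    sort: str                                    # one of the 46 sorts
    layers: dict = field(default_factory=dict)   # e.g. {"GENER": "sp", "REFER": "det"}

n = Node("car.1.1", sort="d", layers={"GENER": "sp", "REFER": "det"})
e = Edge("car.1.1", "SUB", "vehicle.1.1", knowledge_type="categorical")
print(e.knowledge_type)  # categorical
```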


Page 13: On Distance between Deep Syntax and Semantic Representation


Prague Dependency Treebank

Developed at the Institute of Formal and Applied Linguistics, Charles University, Prague

Three layers of annotation

3,168 documents ≈ 49,442 sentences ≈ 833,357 tokens annotated on all three layers.
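The figures above imply the following rough per-unit averages (simple arithmetic, no assumptions beyond the slide's numbers):

```python
# Averages implied by the PDT counts quoted on this slide.
documents, sentences, tokens = 3168, 49442, 833357
print(round(sentences / documents, 1))  # sentences per document: 15.6
print(round(tokens / sentences, 1))     # tokens per sentence: 16.9
```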



Page 15: On Distance between Deep Syntax and Semantic Representation


Prague Dependency Treebank

Seď klidně, nehýbej se, nabádala mě přítelkyně. Kritizovali hvězdný systém, věříce v autentičnost dosud neokoukaných tváří, které se však záhy také staly hvězdami (a není to jen osud Belmondův). Pacient, vzpomenuv si na všechna příkoří způsobená mu společností, vztáhl na doktora ruku.

(“Sit still, don't move, my friend urged me. They criticized the star system, believing in the authenticity of faces not yet overexposed, which, however, soon became stars themselves (and this is not only Belmondo's fate). The patient, having recalled all the wrongs society had done him, raised his hand against the doctor.”)



Page 17: On Distance between Deep Syntax and Semantic Representation


Prague Dependency Treebank

[Figure: tectogrammatical trees t-lnd94103-085-p1s21B, t-ln94200-173-p2s6, and t-ln94211-120-p5s4 for the three example sentences, annotated with lemmas, functors (PRED, ACT, PAT, ADDR, ...), and semantic parts of speech; the tree diagrams do not survive as text.]

Page 18: On Distance between Deep Syntax and Semantic Representation


Tectogrammatical Representation

Properties of Tectogrammatical Layer

One sentence ≈ one tree

Auxiliaries and function words removed

Missing obligatory valents inserted

Attributes of nodes

Functor
Semantic part of speech
15 grammatemes (negation, tense, politeness, ...)
Topic-Focus distinction
Sentential modality
+ technical attributes (coordinations, parentheses, IDs)
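These node attributes can be sketched as a small record type (an illustration only; PDT itself stores the trees in an XML-based format, and the field names here are assumptions):

```python
from dataclasses import dataclass, field

# Sketch of a tectogrammatical node carrying the attributes listed above.
@dataclass
class TNode:
    lemma: str             # t-lemma, e.g. "nabádat" or "#PersPron"
    functor: str           # e.g. "PRED", "ACT", "PAT"
    sempos: str            # semantic part of speech, e.g. "v", "n.denot"
    grammatemes: dict = field(default_factory=dict)  # negation, tense, ...
    tfa: str = "f"         # Topic-Focus value: "t", "c", or "f"
    children: list = field(default_factory=list)

pred = TNode("nabádat", "PRED", "v", grammatemes={"tense": "ant"})
pred.children.append(TNode("přítelkyně", "ACT", "n.denot", tfa="t"))
print(pred.children[0].functor)  # ACT
```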


Page 19: On Distance between Deep Syntax and Semantic Representation


Tectogrammatical Representation

[Figure: tectogrammatical tree t-lnd94103-085-p1s21B for the first example sentence, showing lemmas, functors, and semantic parts of speech; the diagram does not survive as text.]

Page 20: On Distance between Deep Syntax and Semantic Representation


Additional Required Information

Missing Pieces

1. Named entity recognition
Numbers
Places
People
...

2. Metadata
Author
Date
Place
Document type
Intended recipient of the text
Bibliographical and other references


Page 21: On Distance between Deep Syntax and Semantic Representation


Presentation Outline Again

1. Motivation: MultiNet – Knowledge Representation; Prague Dependency Treebank; Missing pieces

2. Issues of transformation: Mapping; Topic-Focus Articulation; Additional Requirements

3. Conclusions: Conclusions; Related Work; Future Work


Page 22: On Distance between Deep Syntax and Semantic Representation


Mapping of Representational Means

Main Issues of Transformation

1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles

2. Mapping of TR nodes to MultiNet concepts

3. Mapping of various natural language constructs to attribute-value assignments

4. Mapping of verbal tenses to the temporal axis


Page 23: On Distance between Deep Syntax and Semantic Representation


Mapping of Representational Means

Main Issues of Transformation – closer look 1

1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles

Actor and Patient are highly ambiguous
Location functors are also used where no location is involved (ELMT, CTXT, SITU)
However, other functors correspond quite straightforwardly to MultiNet roles (a table is presented in the paper)
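A hedged sketch of this split: most functors map through a one-to-one table, while the ambiguous ones are deferred to context-sensitive handling. The table entries below are illustrative assumptions; the actual table is in the paper.

```python
# Illustrative functor-to-role table (assumed values, not the paper's table).
DIRECT = {
    "MANN": "MANNR",   # manner
    "CAUS": "CAUS",    # cause
    "ADDR": "ORNT",    # addressee -> orientation of the action
}
AMBIGUOUS = {"ACT", "PAT", "LOC"}   # need disambiguation from context

def map_functor(functor):
    if functor in AMBIGUOUS:
        return None                 # defer to a context-sensitive procedure
    return DIRECT.get(functor)

print(map_functor("MANN"))  # MANNR
print(map_functor("ACT"))   # None: highly ambiguous
```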

2. Mapping of TR nodes to MultiNet concepts

3. Mapping of various natural language constructs to attribute-value assignments

4. Mapping of verbal tenses to the temporal axis


Page 24: On Distance between Deep Syntax and Semantic Representation


Mapping of Representational Means

Main Issues of Transformation – closer look 2

1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles

2. Mapping of TR nodes to MultiNet concepts

Typically, a TR node corresponds to a MultiNet concept (i.e., also a node)
Quite often, a TR node corresponds to a subnetwork in MultiNet
Sometimes, the TR node corresponds to an edge in MultiNet (e.g., CORR, CTXT)
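The one-node-to-subnetwork case can be sketched as follows. The decomposition of "weekend" into a two-day interval is a made-up example of the mechanism, not a rule from the paper (though STRT and FIN are genuine MultiNet relations):

```python
# Invented illustration: a single TR node may expand into several
# MultiNet nodes and edges.
def expand(tr_lemma):
    if tr_lemma == "weekend":          # one TR node, several MultiNet edges
        return [
            ("c1", "SUB", "interval"),
            ("c1", "STRT", "saturday"),
            ("c1", "FIN", "sunday"),
        ]
    return [("c1", "SUB", tr_lemma)]   # default: a single concept node

print(len(expand("weekend")), len(expand("car")))  # 3 1
```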

3. Mapping of various natural language constructs to attribute-value assignments

4. Mapping of verbal tenses to the temporal axis


Page 25: On Distance between Deep Syntax and Semantic Representation


Mapping of Representational Means

Main Issues of Transformation – closer look 3

1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles

2. Mapping of TR nodes to MultiNet concepts

3. Mapping of various natural language constructs to attribute-value assignments

The color of x is y.
x has y color.
x is y.
y is the color of x.

[Figure: all four paraphrases map to the same MultiNet subnetwork, in which x is connected by ATTR to an attribute node that is subordinated via SUB to the concept "color" and carries the value y via VAL.]

4. Mapping of verbal tenses to the temporal axis
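The color example can be sketched as a function producing the shared subnetwork for all four paraphrases (ATTR, SUB, and VAL are the relations from the slide; the attribute-node name a1 is an assumption of this sketch):

```python
# Shared MultiNet subnetwork for "The color of x is y" and its paraphrases.
def color_subnetwork(x, y):
    return [
        (x, "ATTR", "a1"),       # x has an attribute a1
        ("a1", "SUB", "color"),  # a1 is a color
        ("a1", "VAL", y),        # its value is y
    ]

print(color_subnetwork("car.1", "red"))
```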


Page 26: On Distance between Deep Syntax and Semantic Representation


Mapping of Representational Means

Main Issues of Transformation – closer look 4

1. Mapping of edges and corresponding functors in TR to MultiNet cognitive roles

2. Mapping of TR nodes to MultiNet concepts

3. Mapping of various natural language constructs to attribute-value assignments

4. Mapping of verbal tenses to the temporal axis

Verbal tenses are encoded in grammatemes.
In MultiNet, the TEMP, ANTE, DUR, STRT, and FIN relations can be used.
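One possible shape of this mapping (the tense grammateme values "sim", "ant", "post" follow PDT conventions; the mapping itself and the "moment-of-speech" anchor are illustrative assumptions):

```python
# Sketch: map a TR tense grammateme to MultiNet temporal relations.
def temporal_edges(event, tense, now="moment-of-speech"):
    if tense == "ant":                    # anteriority: past
        return [(event, "ANTE", now)]     # event precedes 'now'
    if tense == "post":                   # posteriority: future
        return [(now, "ANTE", event)]
    return [(event, "TEMP", now)]         # simultaneity: present

print(temporal_edges("e1", "ant"))  # [('e1', 'ANTE', 'moment-of-speech')]
```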


Page 27: On Distance between Deep Syntax and Semantic Representation


Topic-Focus Articulation

TFA in PDT

TFA is annotated on the Tectogrammatical layer

Every word has an attribute: c, t, or f

The nodes are ordered with respect to “communicative dynamism”

TFA in MultiNet

Content expressed by TFA is further analyzed into:
1. Encapsulation of concepts
2. Scope of quantifiers
3. Layer attributes (GENER, REFER, VARIA, ...)
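The first step of such an analysis, reading off the topic/focus partition from the TFA values, can be sketched as follows (how the partition then drives quantifier scope or encapsulation in MultiNet is exactly the open issue of the transformation; this only shows the partition):

```python
# Partition tectogrammatical nodes by their TFA attribute:
# contextually bound nodes ("t", "c") form the topic, "f" nodes the focus.
def partition(nodes):
    topic = [n for n, tfa in nodes if tfa in ("t", "c")]
    focus = [n for n, tfa in nodes if tfa == "f"]
    return topic, focus

topic, focus = partition([("přítelkyně", "t"), ("nabádat", "f"), ("sedět", "f")])
print(topic, focus)  # ['přítelkyně'] ['nabádat', 'sedět']
```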


Page 28: On Distance between Deep Syntax and Semantic Representation


Additional Requirements

1. Spatio-Temporal Representation
For simple inferences about space and time

2. Calendar
For computations with dates

3. Ontology
For all kinds of inferences
Ontology is an inherent part of the MultiNet semantic network design
Upper conceptual ontology represented by sorts
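For the calendar requirement, a date computation might look like this (the document date and the "yesterday" expression are invented example inputs; the point is resolving a relative temporal expression against document metadata):

```python
from datetime import date, timedelta

# Resolve a relative temporal expression against a document's creation
# date (taken from metadata); both values here are invented examples.
doc_date = date(2006, 7, 22)
yesterday = doc_date - timedelta(days=1)
print(yesterday.isoformat())  # 2006-07-21
```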


Page 29: On Distance between Deep Syntax and Semantic Representation


Conclusions

MultiNet is a suitable formalism for inferences and QA

It’s difficult to transform texts into MultiNet

Tectogrammatical representation is not designed for inferencing and QA

There are tools for text-to-TR conversion

TR is a good starting point for conversion to MultiNet (structural similarity, disambiguation in TR)

We have presented issues arising in such a process


Page 30: On Distance between Deep Syntax and Semantic Representation


Related Work

Helbig (1986): Automatic transformation to MultiNet

Horák (2001): Automatic transformation to Transparent Intensional Logic

Callmeier et al. (2004): DeepThought project – automatic transformation to Robust Minimal Recursion Semantics

Bos (2005): Automatic transformation to Discourse Representation Theory

Bolshakov and Gelbukh (2000): Automatic transformation in the Meaning–Text Theory framework

Kruijff-Korbayová (1998): Automatic transformation from TR to DRT


Page 31: On Distance between Deep Syntax and Semantic Representation


Future Work

1. Stage I – Preparation
Annotation tools
Annotation guidelines

2. Stage II – Annotation
Pilot study
Automated preprocessing
Evaluation of annotators

3. Stage III – Application
Supervised “parsing”
Assessment of TR necessity
