A comprehensive framework for multimodal meaning representation
Ashwani Kumar, Laurent Romary
Laboratoire Loria, Vandoeuvre-lès-Nancy


Page 1: A comprehensive framework for multimodal meaning representation Ashwani Kumar Laurent Romary Laboratoire Loria, Vandoeuvre Lès Nancy

A comprehensive framework for multimodal meaning representation

Ashwani Kumar, Laurent Romary

Laboratoire Loria, Vandoeuvre-lès-Nancy

Page 2:

Overview - 1

Context: Conception phase of the EU IST/MIAMM project (Multidimensional Information Access using Multiple Modalities - with DFKI, TNO, Sony, Canon)

Study of the design factors for a future haptic PDA-like device
Underlying application: multidimensional access to a musical database

Page 3:

Overview - 2

Objectives: design and implementation of a unified representation language within the MIAMM demonstrator
• MMIL: Multimodal Interface Language

“Blind” application of (Bunt & Romary 2002)

Page 4:

Methodology

Basic components: represent the general organization of any semantic structure, parameterized by

• data categories taken from a common registry
• application-specific data categories

General mechanisms: to make the framework operational

General categories: descriptive categories available to all formats

+ strict conformance to existing standards

Page 5:

MIAMM - wheel mode

Page 6:

MIAMM architecture

[Architecture diagram: a Dialogue Manager comprising MultiModal Fusion (MMF), an Action Planner (AP), the Dialogue History and the Visual Configuration; the MiaDoMo database; an input chain (microphone/headset, continuous speech recognizer, word/phoneme lattice, structural analysis with SPIN) feeding MMF; an output chain (language generation, scheduling information, speech synthesis to the speaker; Visual-Haptic Processing, VisHapTac, with haptic-visual generation and interpretation driving the haptic device and display).]

Page 7:

Various processing steps - 1

Reco: provides word lattices; out of our scope (MPEG-7 word and phone lattice module)

SPIN: template-based (en, de) or TAG-based (fr) dependency structures; low-level semantic constructs

Page 8:

Various processing steps - 2

MMF (Multimodal Fusion): fully interpreted structures; referential (MMILId) and temporal anchoring; dialogue history update

AP (Action Planner): generates MIAMM internal actions

• Requests to MiaDoMo
• Actions to be generated (Language + VisHapTac)

Page 9:

Various processing steps - 3

VisHapTac: informs MMF of the current graphical and haptic configuration (hierarchies of objects, focus, selection)

MMIL must answer all of these needs, but not all at the same time

Page 10:

Main characteristics of MMIL

Basic ontology: events and participants (organized as hierarchies); restrictions on events and participants; relations among these

Additional mechanisms: temporal anchoring of events; ranges and alternatives

Representation: flat meta-model

Page 11:

MMIL meta-model (UML)

[UML diagram: a generic Struct Node carries an attribute levelName : NMTOKEN, with values MMIL, Event, Time and Participant; dependency associations relate the MMIL level to the Event, Time and Participant levels, with 1..1 / 0..* (0..1 for Time) cardinalities.]
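The flat meta-model can be mirrored as a small data structure: typed nodes on four levels, linked by dependency associations rather than nested. A minimal sketch in Python; the class and field names (StructNode, Dependency) are hypothetical illustrations, not taken from the MMIL specification:

```python
from dataclasses import dataclass, field

# Hypothetical names: the MMIL meta-model is defined in UML, not code;
# this sketch only mirrors its shape (flat nodes plus associations).

@dataclass
class StructNode:
    level_name: str                # "MMIL", "Event", "Time" or "Participant"
    node_id: str
    features: dict = field(default_factory=dict)  # restrictions (DatCats)

@dataclass
class Dependency:
    source: str    # id of the dependent node
    target: str    # id of the node it depends on
    dep_type: str  # e.g. "propContent", "object", "destination"

@dataclass
class MMILComponent:
    nodes: dict = field(default_factory=dict)        # id -> StructNode
    dependencies: list = field(default_factory=list)

    def add(self, node: StructNode) -> None:
        self.nodes[node.node_id] = node

# Usage: encode a "play the song" request as flat nodes plus one relation.
comp = MMILComponent()
comp.add(StructNode("Event", "e0", {"evtType": "speak", "dialogueAct": "request"}))
comp.add(StructNode("Event", "e1", {"evtType": "play", "lex": "vorspielen"}))
comp.add(StructNode("Participant", "p2", {"objType": "tune"}))
comp.dependencies.append(Dependency("e1", "e0", "propContent"))
```

Because the structure is flat, every node is addressable by id, which is what referential anchoring (MMILId) relies on.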

Page 12:

Meta-model DatCat Registry

[Diagram: the meta-model, combined with a DatCat specification (a subset of the Data Category Registry plus application-dependent DatCats), yields, under interoperability conditions (GMT), a dialect defined by expansion trees and DatCat styles and vocabularies: a semantic markup language such as MMIL.]

Page 13:

An overview of data categories

Underlying ontology for a variety of formats
Distinction between abstract definition and implementation (e.g. in XML)
Standardization objective: implementing a reference registry for NLP applications

Wider set of DatCats than just semantics
ISO 11179 (metadata registries) as a reference standard for implementing such a registry

Page 14:

DatCat example: Addressee

/Addressee/
Definition: the entity that is the intended hearer of a speech event. The scope of this data category is extended to cover any multimodal communication event (e.g. haptic and tactile).
Source: (implicit) an event whose evtType should be /Speak/
Target: a participant (user or system)

Page 15:

Styles and vocabularies

Style: design choice to implement a data category as an XML element, a database field, etc.
Vocabulary: the names to be provided for a given style
E.g. (for /Addressee/):

Style: element
Vocabulary: {“addressee”}

Note: multilingualism
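The style/vocabulary split can be illustrated with a tiny serializer that turns one data category instance into markup. A minimal sketch; the render function and its dispatch are hypothetical illustrations, not part of the DatCat registry:

```python
# Hypothetical helper: renders one data category instance according to a
# declared style ("element" or "attribute") and vocabulary (the chosen name).

def render(style: str, vocabulary: str, value: str) -> str:
    if style == "element":
        return f"<{vocabulary}>{value}</{vocabulary}>"
    if style == "attribute":
        return f'{vocabulary}="{value}"'
    raise ValueError(f"unknown style: {style}")

# /Addressee/ implemented as an element named "addressee"
print(render("element", "addressee", "p1"))  # <addressee>p1</addressee>

# /Starting point/ implemented as an attribute named "startPoint"
print(render("attribute", "startPoint", "1991-01-01T00:00:00"))
```

Swapping the vocabulary (e.g. a French element name for the same /Addressee/ category) changes only the surface form, not the abstract definition, which is how multilingualism fits in.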

Page 16:

Time stamping

/Starting point/
• Def: indicates the beginning of the event
• Values: dateTime
• Anchor: time level

Style: attribute
Vocabulary: {“startPoint”}

Example:

<event id="e4">
  <evtType>yearPeriod</evtType>
  <lex>1991</lex>
  <tempSpan
    startPoint="1991-01-01T00:00:00"
    endPoint="1991-12-31T23:59:59"/>
</event>
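Such a time stamp can be read back with standard XML and datetime tooling. A minimal sketch using Python's xml.etree; note that a valid xs:dateTime end point for the year is 23:59:59 (an hour field of 24 is out of range):

```python
import xml.etree.ElementTree as ET
from datetime import datetime

doc = """
<event id="e4">
  <evtType>yearPeriod</evtType>
  <lex>1991</lex>
  <tempSpan startPoint="1991-01-01T00:00:00" endPoint="1991-12-31T23:59:59"/>
</event>
"""

event = ET.fromstring(doc)
span = event.find("tempSpan")                       # the temporal anchor
start = datetime.fromisoformat(span.get("startPoint"))
end = datetime.fromisoformat(span.get("endPoint"))

# The event is anchored to the whole of 1991.
assert start.year == end.year == 1991
```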

Page 17:

Application: a family of formats

Openness: a requirement for MIAMM

Specific formats for the input and output of each module
Each format is defined within the same generic MMIL framework:
• Same meta-model for all
• Specific DatCat specification for each

Page 18:

The MIAMM family of formats

[Diagram: the family of formats SPIN-O, MMF-O, AP-O, VisHapTac-O and MMF-I, grouped under MMIL+.]

The specifications provide typing information for all these formats

Page 19:

SPIN-O example: “Spiel mir das Lied bitte vor” (Please play the song)

[Dependency graph: event e0 (evtType=speak, dialogueAct=request) with a speaker link to participant p1 (objType=user); event e1 (evtType=play, lex=vorspielen) related to e0 by propContent; p1 related to e1 by destination; participant p2 (objType=tune, refType=definite, refStatus=pending) related to e1 by object.]

Page 20:

<mmilComponent>
  <event id="e0">
    <evtType>speak</evtType>
    <dialogueAct>request</dialogueAct>
    <speaker target="p1"/>
  </event>
  <event id="e1">
    <evtType>play</evtType>
    <lex>vorspielen</lex>
  </event>
  <participant id="p1">
    <objType>user</objType>
  </participant>
  <participant id="p2">
    <objType>tune</objType>
    <refType>definite</refType>
    <refStatus>pending</refStatus>
  </participant>
  <relation source="e1" target="e0" type="propContent"/>
  <relation source="p1" target="e1" type="destination"/>
  <relation source="p2" target="e1" type="object"/>
</mmilComponent>
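A consumer module can recover the event/participant graph from such a component with generic XML tooling; a minimal sketch, assuming only the element names shown in the example (this is not a MIAMM API):

```python
import xml.etree.ElementTree as ET

doc = """
<mmilComponent>
  <event id="e0"><evtType>speak</evtType><dialogueAct>request</dialogueAct></event>
  <event id="e1"><evtType>play</evtType><lex>vorspielen</lex></event>
  <participant id="p1"><objType>user</objType></participant>
  <participant id="p2"><objType>tune</objType><refType>definite</refType></participant>
  <relation source="e1" target="e0" type="propContent"/>
  <relation source="p2" target="e1" type="object"/>
</mmilComponent>
"""

root = ET.fromstring(doc)

# Index every node (event or participant) by id, with its feature children.
nodes = {
    el.get("id"): {child.tag: child.text for child in el}
    for el in root if el.tag in ("event", "participant")
}

# Collect the relations as (source, type, target) triples.
relations = [
    (r.get("source"), r.get("type"), r.get("target"))
    for r in root.findall("relation")
]

assert nodes["e1"]["evtType"] == "play"
assert ("p2", "object", "e1") in relations
```

The flat layout pays off here: one pass indexes every node by id, and relations never require walking nested structure.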

Page 21:

Reference domains and visual contexts

• The use of perceptual grouping:
  “these three objects” → the whole perceptual group
  “the triangle” → a singleton within the group
  “the two circles” → a sub-group
• The use of salience

Page 22:

VisHapTac-O

[Graph: a visual-haptic state event e0, linked by a description relation to the participant setting: set1 with items s1 through s25 and a sub-set set2 (sub-divided into s2-1, s2-2, s2-3); attention status marks set2 as inFocus and one of its items as inSelection.]

Page 23:

VisHapTac output - 1

<mmilComponent>
  <event id="e0">
    <evtType>HGState</evtType>
    <visMode>galaxy</visMode>
    <tempSpan
      startPoint="2000-01-20T14:12:06"
      endPoint="2000-01-20T14:12:13"/>
  </event>
  <participant id="set1">
  </participant>
  <relation type="description" source="set1" target="e0"/>
</mmilComponent>

Page 24:

VisHapTac output - 2

<participant id="set1">
  …
  <participant id="s1">
    <Name>Let it be</Name>
  </participant>
  <participant id="set2">
    <individuation>set</individuation>
    <attentionStatus>inFocus</attentionStatus>
    <participant id="s2-1">
      <Name>Lady Madonna</Name>
    </participant>
    …
    <participant id="s2-3">
      <attentionStatus>inSelection</attentionStatus>
      <Name>Revolution 9</Name>
    </participant>
  </participant>
  …
</participant>
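The nested participant sets can be walked recursively to recover the focus and selection state that VisHapTac reports to MMF. A minimal sketch, assuming the element and tag names shown in the example; the helper by_status is a hypothetical illustration:

```python
import xml.etree.ElementTree as ET

doc = """
<participant id="set1">
  <participant id="s1"><Name>Let it be</Name></participant>
  <participant id="set2">
    <attentionStatus>inFocus</attentionStatus>
    <participant id="s2-1"><Name>Lady Madonna</Name></participant>
    <participant id="s2-3">
      <attentionStatus>inSelection</attentionStatus>
      <Name>Revolution 9</Name>
    </participant>
  </participant>
</participant>
"""

def by_status(el, status, found=None):
    """Collect ids of participants whose attentionStatus matches, depth-first."""
    if found is None:
        found = []
    s = el.find("attentionStatus")          # direct child only
    if s is not None and s.text == status:
        found.append(el.get("id"))
    for child in el.findall("participant"):
        by_status(child, status, found)
    return found

root = ET.fromstring(doc)
assert by_status(root, "inFocus") == ["set2"]
assert by_status(root, "inSelection") == ["s2-3"]
```

This is exactly the information reference resolution needs: “the selected song” resolves against inSelection, while perceptual grouping can start from the inFocus sub-set.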

Page 25:

Conclusion

Most of the properties we wanted are fulfilled:

Uniformity, incrementality, partiality, openness and extensibility

Discussion point: semantic adequacy
• Not a direct input to an inference system (except for the underlying ontology)
• Semantics provided through the specification