
LREC 2004, ISO Working Group on the Representation of Multimodal Semantic Information

Linguistic and Semantic Information for the Semantic Web

A Multi-Layered, XML-Based Approach to the Integration of Linguistic and Semantic Annotations

Thierry Declerck, Paul Buitelaar
University of the Saarland & DFKI GmbH
Saarbrücken, Germany

This presentation also includes slides and graphics taken from three presentations at EUROLAN 2003 in Bucharest, by P. Vossen (Wordnet, EuroWordNet, Global Wordnet), A. Lenci (Computational Lexicons and the Semantic Web) and Srini Narayanan (FrameNet Meets the Semantic Web). It also includes graphics by M. Fernández-López and A. Gómez-Pérez (UPM) from Deliverable 1.2 of the Esperonto Project.

Overview: Motivations

Semantic Web Applications of LT
• Annotation of Web Documents with Ontology-based Metadata (Knowledge Markup)
• Ontology Learning through Text Mining from Annotated Corpora

Integration of Annotations
• Use of Different Tools
• Use of Different Knowledge Sources

Overview: Objectives

Integration of…
• … Linguistic and Semantic Annotations
  Linguistic: e.g. PoS, Lemma, Phrase Structure
  Semantic: e.g. Concepts, Relations, Events
• … Annotations from Different Resources, e.g. Different Domains
• … Annotations in Different Formats, e.g. from Different Tools

Knowledge Markup and Knowledge Extraction

[Figure: processing diagram. Labels shown: Text/Speech/Image-Video; Linguistic and Media Analysis; Linguistic, Low-level Image and Semantic Annotations; Text/Speech/Media Mining; Concepts, Relations, Events]

Annotations: Projects, Tools and Resources

Projects
• MuchMore: Cross-lingual Information Retrieval, Medical Domain
• Mumis: Content-based Multimedia Retrieval, Soccer Domain

Tools and Resources
• MuchMore: Integration of Shprot (TnT, Mmorph, Chunkie) with Semantic Tagging Tools (UMLS – Medical Semantic Resource, EuroWordNet)
• Mumis: Schug, Integration of SPPC with Rule-based Chunking and Shallow Dependency Analysis, Event Structure (Mumis Soccer Ontology)

Annotations: MuchMore

[Figure: structure of the MuchMore annotation format. Labels shown: document, sentence; the layers text, chunks, gramrels, semrels, umlsterms, xrceterms, ewnterms; their items token, chunk, gramrel, semrel, umlsterm, xrceterm, ewnterm, together with cui, msh and sense; and the attributes id, from, to, pos, lemma, type, term1, term2, code, pref, tui, offset]

Annotations: MuchMore: Linguistic

Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.

<text>
  <token id="w1" pos="NN">Balint</token>
  <token id="w2" pos="NN">syndrom</token>
  <token id="w3" pos="VBZ" lemma="be">is</token>
  <token id="w4" pos="DT" lemma="a">a</token>
  <token id="w5" pos="NN" lemma="combination">combination</token>
  <token id="w6" pos="IN" lemma="of">of</token>
  <token id="w7" pos="NNS" lemma="symptom">symptoms</token>
  ...
  <token id="w20" pos="JJ" lemma="spatial">spatial</token>
  <token id="w21" pos="NN" lemma="perception">perception</token>
  <token id="w22" pos="CC" lemma="and">and</token>
  <token id="w23" pos="NN" lemma="representation">representation</token>
  ...
</text>

<chunks>
  <chunk id="c1" from="w1" to="w2" type="NP"/>
  <chunk id="c7" from="w20" to="w23" type="NP"/>
</chunks>
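The chunk layer above points into the token layer through its from/to attributes. The following is a minimal Python sketch of how such a standoff span could be resolved back to its surface string. The `<sentence>` wrapper, the abridged token list and the function name `chunk_strings` are assumptions made for illustration; they are not part of the MuchMore tools.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the slide's markup, wrapped in an assumed <sentence> root.
SAMPLE = """
<sentence>
  <text>
    <token id="w1" pos="NN">Balint</token>
    <token id="w2" pos="NN">syndrom</token>
    <token id="w3" pos="VBZ" lemma="be">is</token>
    <token id="w20" pos="JJ" lemma="spatial">spatial</token>
    <token id="w21" pos="NN" lemma="perception">perception</token>
    <token id="w22" pos="CC" lemma="and">and</token>
    <token id="w23" pos="NN" lemma="representation">representation</token>
  </text>
  <chunks>
    <chunk id="c1" from="w1" to="w2" type="NP"/>
    <chunk id="c7" from="w20" to="w23" type="NP"/>
  </chunks>
</sentence>
"""


def chunk_strings(xml_text):
    """Map each chunk id to (type, surface string) by following from/to token ids."""
    root = ET.fromstring(xml_text)
    tokens = root.findall("./text/token")
    # Position of every token id in document order.
    position = {tok.get("id"): i for i, tok in enumerate(tokens)}
    result = {}
    for chunk in root.findall("./chunks/chunk"):
        start = position[chunk.get("from")]
        end = position[chunk.get("to")]
        words = [tok.text for tok in tokens[start:end + 1]]
        result[chunk.get("id")] = (chunk.get("type"), " ".join(words))
    return result


CHUNKS = chunk_strings(SAMPLE)
print(CHUNKS)
```

With the abridged sample, chunk c7 resolves to the noun phrase "spatial perception and representation". Because the chunk layer only stores token ids, the token layer can be left untouched when further layers are added.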

Annotations: MuchMore: Semantic

Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.

<umlsterm id="t7" from="w20" to="w21">
  <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
    <msh code="F2.463.593.778"/>
    <msh code="F2.463.593.932.869"/>
  </concept>
</umlsterm>

<umlsterm id="t8" from="w26" to="w26">
  <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
    <msh code="H1.671.606"/>
  </concept>
</umlsterm>

<semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>

<ewnterm id="e2" from="w21" to="w21">
  <sense offset="0487490"/>
  <sense offset="3955418"/>
  <sense offset="4002483"/>
</ewnterm>
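A semrel element relates two concept ids rather than two strings, so reading it requires a lookup in the umlsterm layer. The sketch below, in Python, shows one way this lookup might be done. The `<sentence>`, `<umlsterms>` and `<semrels>` wrappers are assumed here to make the fragment a single well-formed document, and `relation_triples` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# The slide's semantic annotations, wrapped in assumed container elements.
SAMPLE = """
<sentence>
  <umlsterms>
    <umlsterm id="t7" from="w20" to="w21">
      <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
        <msh code="F2.463.593.778"/>
        <msh code="F2.463.593.932.869"/>
      </concept>
    </umlsterm>
    <umlsterm id="t8" from="w26" to="w26">
      <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
        <msh code="H1.671.606"/>
      </concept>
    </umlsterm>
  </umlsterms>
  <semrels>
    <semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
  </semrels>
</sentence>
"""


def relation_triples(xml_text):
    """Return (preferred name, relation type, preferred name) for every semrel."""
    root = ET.fromstring(xml_text)
    # Index concepts by id so that term1/term2 references can be resolved.
    preferred = {c.get("id"): c.get("preferred") for c in root.iter("concept")}
    return [
        (preferred[rel.get("term1")], rel.get("reltype"), preferred[rel.get("term2")])
        for rel in root.iter("semrel")
    ]


TRIPLES = relation_triples(SAMPLE)
print(TRIPLES)
```

For the sample, the single relation r7 is read as Space Perception, issue_in, Optics.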

Annotations: Mumis

[Figure: Mumis document structure. Labels shown: Document, Paragraph, Sentence; PP, VG, NP, NE, AP, AdvP, Subord-Clause]

[Figure: AP. Labels shown: TYPE, STRUK, AP_AGR, STRING, AP_HEADW]

[Figure: VG. Labels shown: TYPE, VG_SUBCAT_STEM, STRING, KLAMMER, VG_STRG, SENT_STRING, VG_TYPE, VG_AGR, STRUK, VG_HEAD, ..., W]

[Figure: W and CLAUSE. Labels shown: INFL, STRING, STEM, TYPE, POS, TC, SENT_STRING, CLAUSE_TYPE, CLAUSE_SUBJ, CLAUSE_PRED_STRG, CLAUSE_PRED_SUBCAT, CLAUSE_PRED_AGR, CLAUSE_VG_LIST, CLAUSE_NP_LIST, CLAUSE_PP_LIST, CLAUSE_PP_ADJUNKT, ...]

Annotations: Integration

Objectives
• Integrate Linguistic and Semantic Information from the MuchMore and Mumis Annotations, e.g.
  Enrich MuchMore: Head/Complement of Chunks, Clauses
  Enrich Mumis: EuroWordNet, Medical Ontology

Approach
• MuchMore uses Multilayered Annotation over Indexes ('standoff')
• Introduce Mumis Annotations as Additional Layers

Problems
• Integration of Overlapping Layers (i.e. Additional Attributes)
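The problem of overlapping layers can be illustrated with a small Python sketch. It assumes that two layers describe the same token spans and that the second layer only contributes additional attributes. The dictionary representation, the sample values and the function `merge_layers` are illustrative assumptions, not the integration procedure used in the projects.

```python
def merge_layers(base, extra):
    """Merge two standoff layers keyed by (from, to) token span.

    Spans present in both layers receive the union of their attributes.
    Conflicting values are reported instead of being silently overwritten.
    """
    merged = {span: dict(attrs) for span, attrs in base.items()}
    conflicts = []
    for span, attrs in extra.items():
        target = merged.setdefault(span, {})
        for key, value in attrs.items():
            if key in target and target[key] != value:
                conflicts.append((span, key, target[key], value))
            else:
                target[key] = value
    return merged, conflicts


# Hypothetical chunk layers: the first has types only, the second adds heads.
MUCHMORE = {("w1", "w5"): {"type": "NP"}, ("w7", "w10"): {"type": "PP"}}
MUMIS = {
    ("w1", "w5"): {"type": "NP", "head": "w1,w3,w5"},
    ("w7", "w10"): {"type": "PP", "head": "w7", "complement": "w8,w9,w10"},
}

MERGED, CONFLICTS = merge_layers(MUCHMORE, MUMIS)
print(MERGED)
print(CONFLICTS)
```

Reporting conflicts separately is one possible design choice: it keeps the base layer authoritative while making disagreements between tools visible.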

Annotations: Mumis: Linguistic

Industrie, Handel und Dienstleistungen werden in der ersten Liste aufgeführt, wobei die in Klammern gesetzten Zahlen auf die Mutterfirmen hinweisen.

(Industry, trade and services are mentioned in the first list, in which numbers within brackets point to parent companies.)

<chunks>
  <chunk id="c1" from="w1" to="w5" type="NP" head="w1,w3,w5"/>
  <chunk id="c2" from="w6" to="w6" type="VG"/>
  <chunk id="c3" from="w7" to="w10" type="PP" head="w7" complement="w8,w9,w10"/>
  <chunk id="c4" from="w11" to="w11" type="VG"/>
  …
</chunks>

<clauses>
  <clause id="cl1" from="c1" to="c4" pred_struct="c2 c4" GF_Subj="c1"/>
  <clause id="cl2" from="c6" to="c9" pred_struct="c9" GF_Subj="c6"/>
</clauses>
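The clause layer refers to chunks in the same way that chunks refer to tokens. The Python sketch below follows the GF_Subj and pred_struct references of the first clause back to chunk spans. The `<sentence>` wrapper and the function `describe_clauses` are assumptions for illustration, and chunk c4 is given the span w11 to w11 here on the assumption that it covers the single token "aufgeführt".

```python
import xml.etree.ElementTree as ET

# First clause of the slide's example, wrapped in an assumed <sentence> root.
SAMPLE = """
<sentence>
  <chunks>
    <chunk id="c1" from="w1" to="w5" type="NP" head="w1,w3,w5"/>
    <chunk id="c2" from="w6" to="w6" type="VG"/>
    <chunk id="c3" from="w7" to="w10" type="PP" head="w7" complement="w8,w9,w10"/>
    <chunk id="c4" from="w11" to="w11" type="VG"/>
  </chunks>
  <clauses>
    <clause id="cl1" from="c1" to="c4" pred_struct="c2 c4" GF_Subj="c1"/>
  </clauses>
</sentence>
"""


def describe_clauses(xml_text):
    """Resolve each clause's subject and predicate chunk ids to token spans."""
    root = ET.fromstring(xml_text)
    chunks = {c.get("id"): c for c in root.iter("chunk")}
    description = {}
    for clause in root.iter("clause"):
        subject = chunks[clause.get("GF_Subj")]
        predicates = [chunks[cid] for cid in clause.get("pred_struct").split()]
        description[clause.get("id")] = {
            "subject_span": (subject.get("from"), subject.get("to")),
            "subject_heads": subject.get("head", "").split(","),
            "predicate_spans": [(p.get("from"), p.get("to")) for p in predicates],
        }
    return description


CLAUSES = describe_clauses(SAMPLE)
print(CLAUSES["cl1"])
```

The discontinuous predicate of the German clause (c2 and c4) is recovered as two separate spans, which a single from/to pair could not express.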

Annotations: Mumis: Semantic

Ein Freistoss von Christian Ziege aus 25 Metern geht über das Tor.

(A 25-meter free-kick by Christian Ziege goes over the goal.)

<clauses>
  <clause id="cls1" from="c1" to="c4" pred_struct="c3" GF_Subj="c1"/>
</clauses>

<events>
  <event id="e1" clause="cls1" event-name="free-kick">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
  <event id="e2" clause="cls1" event-name="goal-scene-fail">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
</events>
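The event layer can be read into simple records for downstream use, for example content-based retrieval. The following Python sketch is one possible reading routine. The `Event` class and the function `read_events` are hypothetical names introduced for illustration, and the sample repeats the slide's markup with plain double quotes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

SAMPLE = """
<events>
  <event id="e1" clause="cls1" event-name="free-kick">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
  <event id="e2" clause="cls1" event-name="goal-scene-fail">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
</events>
"""


@dataclass
class Event:
    """One event annotation; arguments maps argument name to its value."""
    id: str
    clause: str
    name: str
    arguments: dict = field(default_factory=dict)


def read_events(xml_text):
    """Parse an <events> layer into a list of Event records."""
    root = ET.fromstring(xml_text)
    events = []
    for node in root.findall("event"):
        arguments = {
            arg.get("name"): arg.get("value")
            for arg in node.findall("./arguments/argument")
        }
        events.append(
            Event(node.get("id"), node.get("clause"), node.get("event-name"), arguments)
        )
    return events


EVENTS = read_events(SAMPLE)
for event in EVENTS:
    print(event.name, event.arguments)
```

Both events point to the same clause cls1, so a single clause may give rise to several event annotations.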

Conclusions: MuchMore and MUMIS

Work in Progress
• Development of Compatibility between the Formats
• Full Integration of the Formats

Possible Future Work
• Integration of the Formats on a more Abstract Level, i.e. by Use of Data Categories as Specified by ISO/TC37/SC4
• Separating Text Data from Annotation. Multiple pointing to Annotations.
• Extension to Multimedia

Esperonto: Overview

[Figure: Esperonto architecture. Labels shown: Applications; Router; Agent; Portal Agent; XML, DAML, OIL, RDF(S); Certificate; Workbench; Maintenance; Multilinguality; Reengineering; Mapping; Ontology Repository; Service; Tagger/Wrapper; Web Server Provider; Dynamic Information Provider; Static Information Provider; Multimedia Data Provider; Multilingual NL Understanding; Multilingual NL Generation; Visualization Service Provider; SemASP; World Wide Web; Semantic Web; Semantic indices, Concept instances]

Ontologies (Classification)

Categorization by Lassila and McGuinness [Lassila and McGuinness, 2001]

Ontologies (Classification)

Categorization by Van Heijst and colleagues [Van Heijst et al., 1997]

Knowledge Architecture

Esperonto Knowledge Architecture

Abstracting over Linguistic Information in Esperonto

Ontology_1: NP
  Head: N
  Mod: {Adj*, PP?}
  Spec: {Det? PossPron}
  Type: {RefNP, ProNP, DateNP, etc.}

Ontology_2: PP
  Head: Prep
  Type: {LocPP, DatePP, etc.}
  Comp: NP

Ontology_3: Grammatical Functions
  Subject, Object, Ind. Object, NP Adjunct, PP Adjunct, etc.

Ontology_4: Dependencies
  Head, Comp, Mod, Spec

From WordNet to EuroWordNet

[Figure: hierarchies in WordNet 1.5 and the Dutch Wordnet side by side. WordNet 1.5 labels: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box. Dutch Wordnet labels: voorwerp {object}; lepel {spoon}; werktuig {tool}; tas {bag}; bak {box}; blok {block}; lichaam {body}]

Relations of EWN to Top-Level Ontologies

[Figure: language-specific wordnets linked to a language-neutral ontology. WordNet 1.5 labels: object, container, box. Dutch Wordnet labels: voorwerp, doos. Reference Ontology Classes for BOX: ContainerProduct; SolidTangibleThing. EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole]

FrameNet: Events in Syntactic Context

• events
• artifacts, built objects
• natural kinds, parts and aggregates
• institutions, belief systems, practices
• space, time, location, motion
• etc.

Let us take a commercial transaction as an example of an event. The following (partial) word list shows lexical realizations of the event:
• Verbs: pay, spend, cost, buy, sell, charge
• Nouns: cost, price, payment
• Adjectives: expensive, cheap
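The word list for the commercial transaction event can be held in a small lookup table that maps a lemma and its part of speech to the events it may realize. The Python sketch below is only an illustration of that idea. The table layout, the key `commercial_transaction`, the part-of-speech codes and the function `evoked_frames` are assumptions and do not reproduce the FrameNet data format.

```python
# Partial lexicon taken from the word list above; layout is an assumption.
FRAME_LEXICON = {
    "commercial_transaction": {
        "V": {"pay", "spend", "cost", "buy", "sell", "charge"},
        "N": {"cost", "price", "payment"},
        "A": {"expensive", "cheap"},
    }
}


def evoked_frames(lemma, pos, lexicon=FRAME_LEXICON):
    """Return the sorted names of all frames that list lemma under the given POS."""
    return sorted(
        frame for frame, units in lexicon.items() if lemma in units.get(pos, set())
    )


print(evoked_frames("cost", "N"))
print(evoked_frames("cost", "V"))
print(evoked_frames("cheap", "V"))
```

Keying the lookup on both lemma and part of speech matters for words such as "cost", which realize the event both as a verb and as a noun, while "cheap" does so only as an adjective.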

Semantic and Domain Specific Information in the Simple/Parole Framework

[Figure labels: semantic frame; semantic relations; ontology]

Combining Ontological and "Linguistic Ontology" (EWN, Parole/Simple)

<lex-element id="ID" concept="Shot-on-goal">
  <... lang="DE" type="main" ewn="[digit+]" pos="N"
       mod={"von" concept="Player" | concept="player" ewn="[digit+]" gender="gen" | pos="posspron"}>Torschuss</term>
  <... lang="DE" type="synonym" ewn="[digit+]" pos="V"
       comp={"SUBJ" concept="Player"}>abzieh</term>
  <definition>URL: DFB home page/glossary</definition>
</lex-element>

Current Work

• Including FrameNet for 3 languages.
• Including new semantic classes for adjectives, adverbs, polarity etc.
• New, improved annotation schema for syntactic/semantic annotation.
• A declarative set of mapping rules between linguistic annotation and ontologies (domain ontologies): the Onto-LT framework (see the paper by P. Buitelaar et al. at LREC).