Page 1: Ontology Lexicalisation

In collaboration with John McCrae, Philipp Cimiano (CITEC, Univ. of Bielefeld), Elena Montiel-Ponsoda (Universidad Politécnica de Madrid) and other Monnet partners

Paul Buitelaar

Unit for Natural Language Processing, Digital Enterprise Research Institute - National University of Ireland, Galway

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Page 2: What is this talk about?

Ontology Lexicalisation

– Integrating ontologies (knowledge representation about objects) and lexicons (knowledge representation about words that refer to objects)

– Enriching ontologies with a lexical layer

Defining an Ontology for Lexicons

– Defining a formal model (ontology) for representing lexical information relative to the independently defined ontological semantics of the concepts denoted by this lexicon

– Formal model for web-based, modular, distributed lexicons

Page 3: Use Cases of Ontology Lexicalisation

Ontology-based Information Extraction from text

Ontology Learning from text

Lexical methods in Ontology Alignment

Ontology Verbalisation

Ontology Localisation

Page 4: Ontology-based Information Extraction

>> ontology-text mismatch – is this a good match? (no)

Ontology: Recurso-comercial

Text: recurso por las licencias comerciales (roughly, ‘appeal concerning the commercial licences’)

Page 5: Cross-lingual Ontology-based IE

>> cross-lingual meaning mismatch

Ontology (es): Recurso-comercial

Text (en): Commercial Appeal (of Communism …)

Page 6: SKOS - Multilingual Information

Page 7: SKOS - Multilingual Information

Not much uptake yet? (Example from http://data.nytimes.com/)
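For reference, multilingual information in SKOS amounts to language-tagged label literals attached to a concept. The following is a minimal sketch only; the `ex:` namespace and the concept are hypothetical and are not taken from the NYT data.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/concepts#> .   # hypothetical namespace

# One concept with one preferred label per language and an optional variant.
ex:HistoricBuilding a skos:Concept ;
    skos:prefLabel "historic building"@en ;
    skos:prefLabel "edificio histórico"@es ;
    skos:altLabel  "historical building"@en .
```

The labels are opaque strings: nothing in this representation says that ‘edificio histórico’ is a noun followed by an adjective, so linguistic variants cannot be derived from it.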

Page 8: Ontology-Text Mismatch

‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’ (‘historic building’ vs. ‘…building, declared a Historic Monument…’)

>> goes beyond SKOS (monolingual & multilingual term variants)

>> requires representation of lexical information to compute linguistic variants, e.g.

‘edificio historico[apposVP[NP[Adj]]]’

Page 9: A Lexicon Model for Ontologies

Requirements for an ‘ontology-lexicon’ model

Represent linguistic information relative to the ontology

– Avoid unnecessary ambiguities by representing only lexical features relevant to semantics of underlying application

Keep semantics separate from linguistic info

– Separate clearly ‘world’ (properties of objects referred to by words) from ‘word’ (properties of words) knowledge

Modular, minimal design

– Provide simple core model that can be easily extended upon need

Page 10: Was there a solution already? - SKOS

Simple Knowledge Organization System – SKOS

General model for formalizing thesauri, terminologies and related semantic and knowledge resources

Formalization of terminology in focus - terminology, classification, Semantic Web communities

Does not address linguistic aspects of terminology and therefore does not address the lexicon-ontology interface

http://www.w3.org/2004/02/skos/

Page 11: Was there a solution already? - GOLD

General Ontology for Linguistic Description – GOLD

Community-based ontology of linguistics

Linguistic study in focus - linguistics community

Formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics

Other issues: very big, modularity?

http://linguistics-ontology.org/gold/2010

Page 12: Was there a solution already? - OWN

OntoWordNet – OWN

Formal specification of WordNet through extension and axiomatization of its conceptual relations

Formal knowledge representation in focus - logic, knowledge representation, Semantic Web communities

Turns WordNet into an ontology but not about connecting lexical features to ontological semantics

http://wiki.loa-cnr.it/index.php/LoaWiki:OWN

Page 13: Was there a solution already? - LMF

Lexical Markup Framework – LMF

General model for formalizing and sharing machine-readable dictionaries

Lexical knowledge representation in focus - lexicography, NLP communities

Very close to ontology-lexicon requirements, but no view on how lexical features link to ontological semantics – semantics is limited to a notion of sense based on synsets

Other issues: incomplete formal model, focus on classes, less on properties/relations

http://www.lexicalmarkupframework.org/

Page 14: lemon

lexicon model for ontologies: ‘lemon’

General model for formalizing lexical features relative to independently defined ontological semantics

Two-level modelling

– Abstract level (meta-model): lemon

– Instantiation level (lexicon model): e.g. ‘LexInfo2’

http://lexinfo.net/
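The two levels can be illustrated in a single entry. This is a sketch under assumptions: the entry URI and default namespace are hypothetical, the namespaces are those used in the generator output later in this talk, and `lexinfo:gender` / `lexinfo:masculine` are assumed to be available in LexInfo 2.0.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix :        <http://example.org/lexicon#> .   # hypothetical namespace

:edificio a lemon:Word ;                                      # structure: lemon (meta-model)
    lemon:canonicalForm [ lemon:writtenRep "edificio"@es ] ;  # structure: lemon
    lexinfo:partOfSpeech lexinfo:noun ;                       # linguistic category: LexInfo
    lexinfo:gender lexinfo:masculine .                        # linguistic category: LexInfo (assumed)
```

lemon supplies only the skeleton (entries, forms, senses); the inventory of linguistic categories comes from the instantiation-level model and could be swapped for another one.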

Page 15: lemon: Overview

Page 16: lemon: Lexicon

A LexicalEntry can be a Word, a Phrase, or a Part (such as an Affix)
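A minimal sketch of a lexicon holding the three kinds of entry. The class names Word, Phrase and Part come from the slide; `lemon:Lexicon`, `lemon:entry` and `lemon:language` are assumed from the later published lemon vocabulary and may differ from the version presented here. All entry URIs are hypothetical.

```turtle
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix :      <http://example.org/lexicon#> .   # hypothetical namespace

# The lexicon groups entries for one language (property names assumed).
:lexicon a lemon:Lexicon ;
    lemon:language "en" ;
    lemon:entry :debt , :asset_backed_debt , :un_prefix .

:debt              a lemon:Word .     # single word
:asset_backed_debt a lemon:Phrase .   # multi-word expression
:un_prefix         a lemon:Part .     # affix, i.e. a part of a word
```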

Page 17: lemon: Form

A LexicalForm can be, e.g., a lemma (canonicalForm), a plural form (otherForm), or a stem (abstractForm)
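The three kinds of form might look as follows for one noun. This is an illustrative sketch: the entry is hypothetical, and `lexinfo:number` / `lexinfo:plural` are assumed to exist in LexInfo 2.0.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix :        <http://example.org/lexicon#> .   # hypothetical namespace

:ontology a lemon:Word ;
    # Lemma
    lemon:canonicalForm [ lemon:writtenRep "ontology"@en ] ;
    # Inflected form, annotated with a LexInfo feature (assumed property/value)
    lemon:otherForm [ lemon:writtenRep "ontologies"@en ;
                      lexinfo:number lexinfo:plural ] ;
    # Stem: not a word in its own right
    lemon:abstractForm [ lemon:writtenRep "ontolog"@en ] .
```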

Page 18: lemon: Structure

LexicalEntry can be decomposed into one or more Components and compositional structure can be represented

Page 19: lemon: Structure - Example
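The example on this slide was a figure that is not preserved in the transcript. As a stand-in, here is a hypothetical sketch of a decomposed phrase, reusing only properties that also appear in the generator output later in this talk (`lemon:decomposition`, `lemon:Component`, `lemon:element`). The entry URIs are invented for illustration.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix :        <http://example.org/lexicon#> .   # hypothetical namespace

# Phrase entry decomposed into an ordered list of components.
:edificio_historico a lemon:Phrase ;
    lemon:canonicalForm [ lemon:writtenRep "edificio histórico"@es ] ;
    lemon:decomposition ( :edificio_historico_c1 :edificio_historico_c2 ) .

# Each component points to the lexical entry it realises.
:edificio_historico_c1 a lemon:Component ; lemon:element :edificio .
:edificio_historico_c2 a lemon:Component ; lemon:element :historico .

:edificio a lemon:Word ;
    lexinfo:partOfSpeech lexinfo:noun ;
    lemon:canonicalForm [ lemon:writtenRep "edificio"@es ] .

:historico a lemon:Word ;
    lexinfo:partOfSpeech lexinfo:adjective ;
    lemon:canonicalForm [ lemon:writtenRep "histórico"@es ] .
```

Because the phrase is tied to a noun entry and an adjective entry rather than to a fixed string, a text-matching component has what it needs to recognise variants such as the appositive construction shown on the Ontology-Text Mismatch slide.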

Page 20: lemon: Meaning & Reference

A LexicalSense is an underspecified sense that points to a language-external reference, a unique ontological semantic object, depending on conditions and context

A LexicalSense can have a subsense and a senseRelation with another LexicalSense

The sememe relation between a LexicalSense and an ontological semantic object can be any one of pref/alt/hiddenSem

Page 21: lemon: Meaning & Reference - Examples
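The examples on this slide were figures that are not preserved in the transcript. A hypothetical sketch of the sense-reference pattern, using only `lemon:sense` and `lemon:reference` as they appear in the generator output later in this talk; the `ex:` ontology and its classes are invented for illustration.

```turtle
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix ex:    <http://example.org/ontology#> .   # hypothetical ontology
@prefix :      <http://example.org/lexicon#> .    # hypothetical lexicon

# One entry, two senses; each sense points to exactly one ontology entity.
:appeal a lemon:Word ;
    lemon:canonicalForm [ lemon:writtenRep "appeal"@en ] ;
    lemon:sense :appeal_legal , :appeal_attraction .

:appeal_legal      a lemon:LexicalSense ; lemon:reference ex:LegalAppeal .
:appeal_attraction a lemon:LexicalSense ; lemon:reference ex:Attractiveness .
```

The word carries no meaning of its own in this model: what ‘appeal’ denotes is stated entirely by the ontology entities the senses refer to.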

Page 22: lemon: Lexical Projection

LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties

Page 23: lemon: Lexical Projection - Example
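The example on this slide was a figure that is not preserved in the transcript. The sketch below shows one way a frame and its argument mapping can be written. Note the assumptions: `lemon:synBehavior`, `lemon:subjOfProp`, `lemon:objOfProp`, `lexinfo:TransitiveFrame`, `lexinfo:subject` and `lexinfo:directObject` are taken from the later published lemon and LexInfo 2.0 vocabularies and may not match the version presented in this talk; the `ex:` ontology and all entry URIs are hypothetical.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ex:      <http://example.org/ontology#> .   # hypothetical ontology
@prefix :        <http://example.org/lexicon#> .    # hypothetical lexicon

:own a lemon:Word ;
    lexinfo:partOfSpeech lexinfo:verb ;
    lemon:canonicalForm [ lemon:writtenRep "own"@en ] ;
    # Syntactic frame for "X owns Y" (names assumed, see above)
    lemon:synBehavior [ a lexinfo:TransitiveFrame ;
                        lexinfo:subject      :own_subj ;
                        lexinfo:directObject :own_obj ] ;
    # The sense maps the frame arguments to the two ends of an ontology property
    lemon:sense [ lemon:reference  ex:owns ;
                  lemon:subjOfProp :own_subj ;
                  lemon:objOfProp  :own_obj ] .
```

The same argument resources (`:own_subj`, `:own_obj`) appear on both the syntactic and the semantic side, which is what links the frame to the ontology property.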

Page 24: lemon in Use

Ontology-Lexicon Generator

Generate a lexicon for a given ontology in RDF/OWL format

http://monnetproject.deri.ie/osgi/DemoLexiconGenerator

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix pennbank: <http://www.monnet-project.eu/pennbank#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
…

# Entry for the ontology label "Asset backed debt": phrase structure tree
# (Penn Treebank tags), ordered decomposition, sense and canonical form.
<file:test#assetbackeddebt>
    lemon:phraseRoot [
        lemon:edge [
            lemon:edge [
                lemon:edge [ lemon:leaf _:n6 ] ;
                lemon:constituent pennbank:NNP
            ] ;
            lemon:constituent pennbank:NP
        ] , [
            lemon:edge [
                lemon:edge [ lemon:leaf _:n88 ] ;
                lemon:constituent pennbank:VBD
            ] , [
                lemon:edge [
                    lemon:edge [ lemon:leaf _:n69 ] ;
                    lemon:constituent pennbank:NN
                ] ;
                lemon:constituent pennbank:NP
            ] ;
            lemon:constituent pennbank:VP
        ] ;
        lemon:constituent pennbank:S
    ] ;
    lemon:decomposition ( _:n6 _:n88 _:n69 ) ;
    lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ;
    lemon:canonicalForm [ lemon:writtenRep "Asset backed debt"@en ] .
…

# Entry for the verb whose form "backed" occurs in the label.
<file:test#back>
    lexinfo:partOfSpeech lexinfo:verb ;
    lemon:canonicalForm [
        lexinfo:tense lexinfo:past ;
        lexinfo:verbFormMood lexinfo:indicative ;
        lemon:writtenRep "backed"@en ;
        lexinfo:aspect lexinfo:perfective
    ] .

# Component _:n88 (second item of the decomposition) points to that entry.
_:n88 rdf:type lemon:Component ;
    lexinfo:tense lexinfo:past ;
    lemon:element <file:test#back> ;
    lexinfo:verbFormMood lexinfo:indicative ;
    lexinfo:aspect lexinfo:perfective .

Page 25: Lexical Linked Data

lemon is a web-based ontology, i.e., based on Uniform Resource Identifiers (URI)

– Therefore all objects described by it are uniquely identifiable on the web

– And can therefore be interlinked in a flexible, modular and distributed way

Making lemon-based lexicons part of the Web of Data, as currently defined by the ‘Linked Open Data cloud’
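A sketch of what such interlinking could look like. `financeV4:AssetBackedDebt` is the ontology class referenced in the generator output shown earlier; both lexicon namespaces and the `other:` entry are hypothetical, and the use of `rdfs:seeAlso` for cross-lexicon links is an assumption made for illustration, not something prescribed by lemon.

```turtle
@prefix lemon:     <http://www.monnet-project.eu/lemon#> .
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix :          <http://example.org/lexicon-a#> .   # hypothetical lexicon
@prefix other:     <http://example.net/lexicon-b#> .   # hypothetical lexicon maintained elsewhere

:asset_backed_debt a lemon:Phrase ;
    lemon:canonicalForm [ lemon:writtenRep "asset backed debt"@en ] ;
    # Link from the lexicon into an independently published ontology.
    lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ;
    # Link to a related entry in another party's lexicon.
    rdfs:seeAlso other:asset_backed_security .
```

Each of the three parties (ontology, lexicon A, lexicon B) can publish and maintain its own resources; the links are ordinary RDF triples between URIs.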

Page 26: Lexical Linked Data – LOD cloud

Page 27: Lexical Linked Data - Implications

lemon objects (lexicons, lexical entries, words, phrases, forms, variants, senses, references, etc.) can be maintained uniquely (only one URI for each lemon object) but in a distributed fashion (maintenance by various parties)

lemon objects can be interlinked upon need, creating layers of lexical structure defined formally by selected links

with growing legacy of collaborative, formal definition of lexical structure (through use in applications), meta-level analysis of lemon objects will become object of study for lexicography and linguistics

ontology development can build on and plug-in formal lexical structures in specific application domains

collaborative web-based ontological knowledge development and lexicon development will go hand-in-hand

Page 28: What happens next?

lemon

– W3C Incubator Group planned

– Experimentation, Dissemination

– YOUR input/feedback

Lexical Linked Data

– Develop infrastructures to support/exploit this

– Envision drastically novel applications in linguistic study and product development

Page 29: Acknowledgements & Further Info

Monnet colleagues

In particular John McCrae of CITEC, University of Bielefeld, Germany who leads the lemon effort in Monnet

Grant support

EU FP7 Grant No. 248458 for the Monnet project on Multilingual Ontologies for Networked Knowledge

Science Foundation Ireland Grant No. SFI/08/CE/I1380 for Lion-2 http://nlp.deri.ie/

Further info

lemon: http://lexinfo.net

http://www.monnet-project.eu & http://twitter.com/monnetproject

Monnet Community – contact me: [email protected]