Post on 26-Dec-2015


Ontology Lexicalisation

In collaboration with John McCrae, Philipp Cimiano (CITEC, Univ. of Bielefeld), Elena Montiel-Ponsoda (Universidad Politécnica de Madrid) and other Monnet partners

Copyright 2010 Digital Enterprise Research Institute. All rights reserved, Paul Buitelaar

Paul Buitelaar

Unit for Natural Language Processing, Digital Enterprise Research Institute - National University of Ireland, Galway

What is this talk about?

Ontology Lexicalisation

Integrating ontologies (knowledge representation about objects) and lexicons (knowledge representation about words that refer to objects)

Enriching ontologies with a lexical layer

Defining an Ontology for Lexicons

Defining a formal model (ontology) for representing lexical information relative to independently defined ontological semantics of concepts denoted by this lexicon

Formal model for web-based, modular, distributed lexicons

Use Cases of Ontology Lexicalisation

Ontology-based Information Extraction from text

Ontology Learning from text

Lexical methods in Ontology Alignment

Ontology Verbalisation

Ontology Localisation

Ontology-based Information Extraction

>> ontology-text mismatch – is this a good match? (no)

Ontology: Recurso-comercial

Text: recurso por las licencias comerciales

Cross-lingual Ontology-based IE

>> cross-lingual meaning mismatch

Ontology (es): Recurso-comercial

Text (en): Commercial Appeal (of Communism …)

SKOS - Multilingual Information

Not much uptake yet? from http://data.nytimes.com/

Ontology-Text Mismatch

‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’

>> goes beyond SKOS (monolingual & multilingual term variants)

>> requires representation of lexical information to compute linguistic variants, e.g.

‘edificio historico[apposVP[NP[Adj]]]’

A Lexicon Model for Ontologies

Requirements for ‘ontology-lexicon’ model

Represent linguistic information relative to ontology

– Avoid unnecessary ambiguities by representing only lexical features relevant to semantics of underlying application

Keep semantics separate from linguistic info

– Separate clearly ‘world’ (properties of objects referred to by words) from ‘word’ (properties of words) knowledge

Modular, minimal design

– Provide simple core model that can be easily extended upon need

Was there a solution already? - SKOS

Simple Knowledge Organization System – SKOS

General model for formalizing thesauri, terminologies and related semantic and knowledge resources

Formalization of terminology in focus - terminology, classification, Semantic Web communities

Does not address linguistic aspects of terminology, and therefore not the lexicon-ontology interface

http://www.w3.org/2004/02/skos/
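
As a minimal sketch of what SKOS does cover, a concept can carry preferred and alternative labels in several languages. The `ex:` namespace, the concept and the labels below are hypothetical illustrations; only the `skos:` terms come from the SKOS vocabulary.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/concepts#> .   # hypothetical namespace

# One concept, labelled in two languages (labels are illustrative)
ex:HistoricBuilding a skos:Concept ;
    skos:prefLabel "historic building"@en ;
    skos:altLabel  "historical building"@en ;
    skos:prefLabel "edificio histórico"@es .
```

The labels are plain strings: nothing here records part of speech, inflection or internal phrase structure, which is the information needed to relate a label to its variants in running text.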

Was there a solution already? - GOLD

General Ontology for Linguistic Description – GOLD

Community-based ontology of linguistics

Linguistic study in focus - linguistics community

Formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics

Other issues: very big, modularity?

http://linguistics-ontology.org/gold/2010

Was there a solution already? - OWN

OntoWordNet – OWN

Formal specification of WordNet through extension and axiomatization of its conceptual relations

Formal knowledge representation in focus - logic, knowledge representation, Semantic Web communities

Turns WordNet into an ontology but not about connecting lexical features to ontological semantics

http://wiki.loa-cnr.it/index.php/LoaWiki:OWN

Was there a solution already? - LMF

Lexical Markup Framework – LMF

General model for formalizing and sharing of machine-readable dictionaries

Lexical knowledge representation in focus - lexicography, NLP communities

Very close to ontology-lexicon requirements, but no view on how lexical features link to ontological semantics – semantics is limited to a notion of sense based on synsets

Other issues: incomplete formal model, focus on classes, less on properties/relations

http://www.lexicalmarkupframework.org/

lemon

lexicon model for ontologies: ‘lemon’

General model for formalizing lexical features relative to independently defined ontological semantics

Two-level modelling

Abstract level (meta-model): lemon

Instantiation level (lexicon model): e.g. ‘LexInfo2’

http://lexinfo.net/

lemon: Overview

LexicalEntry can be a Word, Phrase, or Part - such as an Affix

lemon: Lexicon
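
A minimal sketch of a lexicon object, assuming the `lemon:Lexicon`, `lemon:language` and `lemon:entry` terms of the published lemon vocabulary (the version shown in the talk may differ). The default `:` namespace and the entry names are hypothetical.

```turtle
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix :      <http://example.org/lexicon#> .   # hypothetical namespace

# A lexicon groups the entries of one language
:financeLexicon a lemon:Lexicon ;
    lemon:language "en" ;
    lemon:entry :debt , :asset_backed_debt .

# Entries are typed as Word, Phrase or Part
:debt              a lemon:Word .
:asset_backed_debt a lemon:Phrase .
```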

lemon: Form

LexicalForm can be, e.g., lemma (canonicalForm), plural form (otherForm), stem (abstractForm)
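
The form distinction can be sketched as follows, assuming the `lexinfo:number` property and its values from LexInfo 2; the entry `:debt` and its namespace are hypothetical.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix :        <http://example.org/lexicon#> .   # hypothetical namespace

:debt a lemon:Word ;
    lexinfo:partOfSpeech lexinfo:noun ;
    # lemma
    lemon:canonicalForm [ lemon:writtenRep "debt"@en ;
                          lexinfo:number lexinfo:singular ] ;
    # inflected variant
    lemon:otherForm     [ lemon:writtenRep "debts"@en ;
                          lexinfo:number lexinfo:plural ] .
```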

lemon: Structure

LexicalEntry can be decomposed into one or more Components and compositional structure can be represented

lemon: Structure - Example

lemon: Meaning & Reference

LexicalSense is an underspecified sense that points to a language-external reference, a unique ontological semantic object, depending on conditions and context

LexicalSense can have a subsense and senseRelation with other LexicalSense

sememe relation between LexicalSense and ontological semantic object can be either of pref/alt/hiddenSem
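
A minimal sketch of the entry-sense-reference chain, reusing the `financeV4:AssetBackedDebt` class that appears in the generator output of this talk. The entry name and its namespace are hypothetical, and the pref/alt/hidden distinction is left out because the transcript does not give the exact property names.

```turtle
@prefix lemon:     <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix :          <http://example.org/lexicon#> .   # hypothetical namespace

:asset_backed_debt a lemon:Phrase ;
    lemon:canonicalForm [ lemon:writtenRep "asset backed debt"@en ] ;
    # The sense is the only link between the word and the ontology
    lemon:sense [ a lemon:LexicalSense ;
                  lemon:reference financeV4:AssetBackedDebt ] .
```

Linguistic properties stay on the entry and its forms, while the ontology class is left untouched, in line with the requirement to keep ‘word’ and ‘world’ knowledge separate.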

lemon: Meaning & Reference - Examples

lemon: Lexical Projection

LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties
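
A sketch of such a projection for a transitive verb, assuming the `lemon:synBehavior`, `lemon:subjOfProp` and `lemon:objOfProp` properties of the published lemon vocabulary and the `lexinfo:TransitiveFrame` class of LexInfo 2 (the version shown in the talk may differ). The ontology property `ex:backedBy` and all names in the default namespace are hypothetical.

```turtle
@prefix lemon:   <http://www.monnet-project.eu/lemon#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix ex:      <http://example.org/ontology#> .   # hypothetical ontology
@prefix :        <http://example.org/lexicon#> .    # hypothetical lexicon

:back a lemon:Word ;
    lexinfo:partOfSpeech lexinfo:verb ;
    lemon:synBehavior :back_frame ;
    lemon:sense :back_sense .

# Syntactic side: "X backs Y"
:back_frame a lexinfo:TransitiveFrame ;
    lexinfo:subject      :arg_x ;
    lexinfo:directObject :arg_y .

:arg_x a lemon:Argument .
:arg_y a lemon:Argument .

# Semantic side: ex:backedBy(Y, X)
:back_sense a lemon:LexicalSense ;
    lemon:reference  ex:backedBy ;
    lemon:subjOfProp :arg_y ;   # direct object fills the property's subject
    lemon:objOfProp  :arg_x .   # syntactic subject fills the property's object
```

The same argument resources appear in both the frame and the sense, which is what maps syntactic positions to the subject and object of the ontology property.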

lemon: Lexical Projection - Example

lemon in Use

Ontology-Lexicon Generator

Generate a lexicon for a given ontology in RDF/OWL format

http://monnetproject.deri.ie/osgi/DemoLexiconGenerator

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix pennbank: <http://www.monnet-project.eu/pennbank#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
…
<file:test#assetbackeddebt>
    lemon:phraseRoot [
        lemon:edge [
            lemon:edge [
                lemon:edge [ lemon:leaf _:n6 ] ;
                lemon:constituent pennbank:NNP ] ;
            lemon:constituent pennbank:NP ] ,
        [
            lemon:edge [
                lemon:edge [ lemon:leaf _:n88 ] ;
                lemon:constituent pennbank:VBD ] ,
            [
                lemon:edge [
                    lemon:edge [ lemon:leaf _:n69 ] ;
                    lemon:constituent pennbank:NN ] ;
                lemon:constituent pennbank:NP ] ;
            lemon:constituent pennbank:VP ] ;
        lemon:constituent pennbank:S ] ;
    lemon:decomposition ( _:n6 _:n88 _:n69 ) ;
    lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ;
    lemon:canonicalForm [ lemon:writtenRep "Asset backed debt"@en ] .
…

<file:test#back>
    lexinfo:partOfSpeech lexinfo:verb ;
    lemon:canonicalForm [
        lexinfo:tense lexinfo:past ;
        lexinfo:verbFormMood lexinfo:indicative ;
        lemon:writtenRep "backed"@en ;
        lexinfo:aspect lexinfo:perfective ] .

_:n88
    rdf:type lemon:Component ;
    lexinfo:tense lexinfo:past ;
    lemon:element <file:test#back> ;
    lexinfo:verbFormMood lexinfo:indicative ;
    lexinfo:aspect lexinfo:perfective .

Lexical Linked Data

lemon is a web-based ontology, i.e., based on Uniform Resource Identifiers (URIs)

Therefore all objects described by it are uniquely identifiable on the web

And can therefore be interlinked in a flexible, modular and distributed way

Making lemon-based lexicons part of the Web of Data, as currently defined by the ‘Linked Open Data cloud’

Lexical Linked Data – LOD cloud

Lexical Linked Data - Implications

lemon objects (lexicons, lexical entries, words, phrases, forms, variants, senses, references, etc.) can be maintained uniquely (only one URI for each lemon object) but in a distributed fashion (maintenance by various parties)

lemon objects can be interlinked upon need, creating layers of lexical structure defined formally by selected links

with a growing legacy of collaborative, formal definition of lexical structure (through use in applications), meta-level analysis of lemon objects will become an object of study for lexicography and linguistics

ontology development can build on and plug in formal lexical structures in specific application domains

collaborative web-based ontological knowledge development and lexicon development will go hand-in-hand
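
The interlinking idea can be sketched with two independently maintained lexicons whose entries point at the same ontology class. The `en:` and `es:` namespaces, the entry names and the Spanish wording are hypothetical illustrations; `financeV4:AssetBackedDebt` is taken from the generator output shown earlier in the talk.

```turtle
@prefix lemon:     <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix en:        <http://example.org/lexicon/en#> .   # hypothetical, maintained by party A
@prefix es:        <http://example.org/lexicon/es#> .   # hypothetical, maintained by party B

en:asset_backed_debt a lemon:Phrase ;
    lemon:canonicalForm [ lemon:writtenRep "asset backed debt"@en ] ;
    lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] .

es:deuda_respaldada_por_activos a lemon:Phrase ;
    lemon:canonicalForm [ lemon:writtenRep "deuda respaldada por activos"@es ] ;
    lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] .
```

Because each entry and the shared reference have their own URI, either lexicon can be published, extended or replaced without touching the other or the ontology.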

What happens next?

lemon W3C Incubator Group planned

Experimentation, Dissemination

YOUR input/feedback

Lexical Linked Data

Develop infrastructures to support/exploit this

Envision drastically novel applications in linguistic study and product development

Acknowledgements & Further Info

Monnet colleagues

In particular John McCrae of CITEC, University of Bielefeld, Germany, who leads the lemon effort in Monnet

Grant support

EU FP7 Grant No. 248458 for the Monnet project on Multilingual Ontologies for Networked Knowledge

Science Foundation Ireland Grant No. SFI/08/CE/I1380 for Lion-2 http://nlp.deri.ie/

Further info

lemon: http://lexinfo.net

http://www.monnet-project.eu & http://twitter.com/monnetproject

Monnet Community – contact me: paul.buitelaar@deri.org
