Uploaded by clement-jonquet; posted on 17-Aug-2015


Page 1: Roadmap for a multilingual BioPortal

Roadmap for a multilingual BioPortal

Clement Jonquet, Vincent Emonet & Mark A. Musen

4th workshop on the Multilingual Semantic Web

Portoroz, Slovenia – June 1st 2015

Page 2: Roadmap for a multilingual BioPortal

A few introductory words

Page 3: Roadmap for a multilingual BioPortal

Context:

Increasing amount of biomedical data

+ multilingualism

Limits of keyword-based indexing

The biomedical community has turned to ontologies to describe its data and turn it into structured, formalized knowledge

Ontologies are used by means of semantic annotations

Crucial need for tools & services for French biomedical data

Biomedical data integration challenge

Potential new scientific discoveries hidden in the data

Translational research

Page 4: Roadmap for a multilingual BioPortal

Biologists have adopted ontologies

To provide a canonical representation of scientific knowledge

To annotate experimental data to enable interpretation, comparison, and discovery across databases

To facilitate knowledge-based applications for

Decision support

Natural language processing

Data integration

But ontologies are spread out, in different formats, of different sizes, with different structures

In different “languages”

Page 5: Roadmap for a multilingual BioPortal

Working with terminologies & ontologies – a portal please!

You’ve built an ontology: how do you let the world know?

You need an ontology: where do you go to get it?

How do you know whether an ontology is any good?

How do you find resources that are relevant to the domain of the ontology (or to specific terms)?

How could you leverage your ontology to enable new science?

How could you use ontologies without managing them?

Page 6: Roadmap for a multilingual BioPortal

A few words about BioPortal

Page 7: Roadmap for a multilingual BioPortal

BioPortal: a “one-stop shop” for biomedical ontologies

Web repository for biomedical ontologies

Makes ontologies accessible and usable, abstracting away format, location, structure, etc.

Users can publish, download, browse, search, comment on, and align ontologies, and use them for annotation, both online and via a web services API.

Online support for ontology:

Peer review

Notes (comments and discussion)

Versioning

Mapping

Search

Resources

Page 8: Roadmap for a multilingual BioPortal

http://bioportal.bioontology.org

BioPortal Ontology Repository

Page 9: Roadmap for a multilingual BioPortal

http://data.bioontology.org

Ontology Services: Search, Traverse, Comment, Download

Widgets: Tree-view, Auto-complete, Graph-view

Mapping Services: Create, Upload, Download

Annotation: Term recognition

Data Access: Search “data” annotated with a given term

http://bioportal.bioontology.org

Page 10: Roadmap for a multilingual BioPortal

Status of multilingualism in BioPortal

Accepts (and parses) both multilingual and monolingual ontologies

the latter sometimes represented as views

No leveraging of multilingual structure and content

e.g., no inclusion/exclusion of labels in a given language when using the services the portal offers, such as the Annotator

Not capable of reconciling and dealing with multilingual mappings

No proper mechanism to identify the natural language(s) of an ontology

No support for relationships between ontologies in different languages (or between ontologies in general)

No support for internationalization

the whole UI exists only in English

Page 11: Roadmap for a multilingual BioPortal

A few words about words

Page 12: Roadmap for a multilingual BioPortal

Multilingual ontology: one ontology whose concepts carry labels in several languages, e.g.:

en: disease / fr: maladie

... en: cancer / fr: cancer

en: spindle cell sarcoma / fr: sarcome à cellules fusiformes

en: melanoma / fr: mélanome

Language-specific (monolingual) ontologies: one ontology per language, e.g.:

English: disease ... cancer, spindle cell sarcoma, melanoma

French: maladie ... cancer, sarcome à cellules fusiformes, mélanome
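The relation between the two representations can be sketched as a projection: from a multilingual ontology whose concepts carry language-tagged labels, one can derive one language-specific ontology per language. This is a minimal sketch under a hypothetical toy model (a dict from concept identifier to a dict of labels keyed by language code), not BioPortal's data model; the concept identifiers are made up and the hierarchy is left out.

```python
from collections import defaultdict

# Toy multilingual ontology (hypothetical model, not BioPortal's):
# concept identifier -> {language code: label}. The hierarchy is omitted.
MULTILINGUAL = {
    "disease": {"en": "disease", "fr": "maladie"},
    "cancer": {"en": "cancer", "fr": "cancer"},
    "spindle_cell_sarcoma": {
        "en": "spindle cell sarcoma",
        "fr": "sarcome à cellules fusiformes",
    },
    "melanoma": {"en": "melanoma", "fr": "mélanome"},
}


def split_by_language(ontology: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Derive one language-specific ontology per language from a multilingual one.

    Each result maps concept identifiers to the label in that language.
    A concept with no label in some language is absent from that
    language's ontology.
    """
    per_language: dict[str, dict[str, str]] = defaultdict(dict)
    for concept_id, labels in ontology.items():
        for lang, label in labels.items():
            per_language[lang][concept_id] = label
    return dict(per_language)


if __name__ == "__main__":
    for lang, onto in sorted(split_by_language(MULTILINGUAL).items()):
        print(lang, onto)
```

Going the other way (merging language-specific ontologies back into one) is only this simple when both share concept identifiers, which is exactly what translation mappings are meant to establish.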

Page 13: Roadmap for a multilingual BioPortal

Ontology language & translation

Natural language = the language (French, English, Spanish, etc.) used when building a language-specific ontology

Format language = the language used to describe the ontology (OWL, RDFS, RRF, etc.)

Translation = a relation between two language-specific ontologies that represent essentially the same object (domain, topics, set of concepts and relations)

Page 14: Roadmap for a multilingual BioPortal

Multilingual mappings

Mapping (or alignment) = a correspondence between concepts in different ontologies

Multilingual mapping = a concept mapping between two language-specific ontologies in different languages

Multilingual translation mapping = a multilingual mapping where, in addition, the two language-specific ontologies are translations of one another

For instance,

Mesh/melanoma has a mapping to DOID/melanoma

Mesh-fr/mélanome has a multilingual mapping to DOID/melanoma

Mesh/melanoma has a multilingual translation mapping to Mesh-fr/mélanome
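Because the three definitions nest inside one another, a mapping can be classified from two pieces of ontology metadata: the natural language of each ontology, and whether one ontology is declared a translation of the other. The sketch below assumes a hypothetical in-memory metadata table mirroring the Mesh / Mesh-fr / DOID example; the table names, language codes and the classify helper are illustrative and are not part of BioPortal's API.

```python
from dataclasses import dataclass

# Hypothetical metadata: natural language of each language-specific ontology.
NATURAL_LANGUAGE = {"Mesh": "en", "Mesh-fr": "fr", "DOID": "en"}
# Hypothetical translation pairs: (translated ontology, original ontology).
IS_TRANSLATION_OF = {("Mesh-fr", "Mesh")}


@dataclass(frozen=True)
class Mapping:
    source_ontology: str
    source_concept: str
    target_ontology: str
    target_concept: str


def are_translations(a: str, b: str) -> bool:
    """True if either ontology is declared a translation of the other."""
    return (a, b) in IS_TRANSLATION_OF or (b, a) in IS_TRANSLATION_OF


def classify(m: Mapping) -> str:
    """Return the most specific category that applies to a mapping."""
    src_lang = NATURAL_LANGUAGE[m.source_ontology]
    tgt_lang = NATURAL_LANGUAGE[m.target_ontology]
    if src_lang == tgt_lang:
        return "mapping"
    # Different languages: at least a multilingual mapping.
    if are_translations(m.source_ontology, m.target_ontology):
        return "multilingual translation mapping"
    return "multilingual mapping"


if __name__ == "__main__":
    print(classify(Mapping("Mesh", "melanoma", "DOID", "melanoma")))
    print(classify(Mapping("Mesh-fr", "mélanome", "DOID", "melanoma")))
    print(classify(Mapping("Mesh", "melanoma", "Mesh-fr", "mélanome")))
```

Testing the languages before the translation relation means a translation mapping is, by construction, also a multilingual mapping, which is the kind of entailment the definitions imply.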

Page 15: Roadmap for a multilingual BioPortal

What does it mean to be multilingual?

Interface internationalization = displaying static elements of the user interface (e.g., menu names, help, etc.) in different languages

Content internationalization = displaying BioPortal content (e.g., ontology labels, mappings, etc.) in different languages

Multilingual = internationalization (display) + enabling complete use of BioPortal's functionalities and services for multilingual or monolingual ontologies

with languages, translations, multilingual mappings, etc. completely and properly addressed

with a rich semantic description

Page 16: Roadmap for a multilingual BioPortal

A few proposals for a multilingual BioPortal

Page 17: Roadmap for a multilingual BioPortal

Representation of the natural language property of an ontology

Reuse OMV, the Ontology Metadata Vocabulary (http://omv2.sourceforge.net), which is already imported and used in the BioPortal Metadata ontology (http://bioportal.bioontology.org/ontologies/BP-METADATA)

omv:naturalLanguage

Page 18: Roadmap for a multilingual BioPortal

Representation of the distinction between ontologies

Extend OMV within BioPortal Metadata to include and formalize the distinction between multilingual and language-specific ontologies

meta:MultilingualOntology
    rdfs:subClassOf omv:Ontology
    omv:naturalLanguage some Literal

meta:LanguageSpecificOntology
    rdfs:subClassOf omv:Ontology
    omv:naturalLanguage exactly 1 Literal
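Read operationally, the two class definitions let a repository classify an ontology from the values of its omv:naturalLanguage property. A minimal sketch follows, assuming those values have already been extracted into a Python set; classify_ontology is a hypothetical helper. It also assumes that a multilingual ontology declares more than one language, which is one reading of the intent: in OWL, "some Literal" on its own would also hold for an ontology that declares a single language.

```python
def classify_ontology(natural_languages: set[str]) -> str:
    """Classify an ontology from its omv:naturalLanguage values (hypothetical helper).

    Assumption: "multilingual" is read as "declares more than one language".
    Language tags are compared case-insensitively, as BCP 47 tags are.
    """
    languages = {tag.strip().lower() for tag in natural_languages if tag.strip()}
    if not languages:
        raise ValueError("no omv:naturalLanguage value declared")
    if len(languages) == 1:
        return "meta:LanguageSpecificOntology"
    return "meta:MultilingualOntology"


if __name__ == "__main__":
    print(classify_ontology({"fr"}))        # a monolingual ontology such as Mesh-fr
    print(classify_ontology({"en", "FR"}))  # an ontology labelled in English and French
```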

Page 19: Roadmap for a multilingual BioPortal

Representation of relations between ontologies

Extend the DOOR ontology (http://kannel.kmi.open.ac.uk)

A translated ontology is a specific evolution of the ontology with a different syntax (an equivalent ontology, but in another language)

A new property in the BioPortal metadata

Page 20: Roadmap for a multilingual BioPortal

meta:isTranslationOf

Page 21: Roadmap for a multilingual BioPortal

Representation of multilingual mappings

Keep a single, simple model, the one BioPortal already provides to represent any mapping

a multilingual mapping is stored like any other mapping, but with a specific (non-exclusive) relation

Reuse standard properties to represent translations

the LEMON translation module (direct|cultural|lexicalEquivalent)

the GOLD ontology (free|literalTranslation)

Example: mappings between the English ontology (disease ... cancer, spindle cell sarcoma, melanoma) and the French ontology (maladie ... cancer, sarcome à cellules fusiformes, mélanome), typed with gold:freeTranslation or gold:literalTranslation
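The point of the single model is that multilingual mappings need no separate storage: each is an ordinary mapping whose relation set happens to include a translation property. A minimal sketch, assuming a hypothetical Mapping record holding a set of relation identifiers written as unexpanded prefixed names; the "lemon:" spellings of the translation module's relations and the translation_mappings helper are hypothetical, and the relation attached to each pair in the usage lines is illustrative, not taken from the slide.

```python
from dataclasses import dataclass, field

# Translation relations named on the slide, as unexpanded prefixed names.
# The "lemon:" spellings are a hypothetical rendering of the LEMON
# translation module's direct/cultural/lexical equivalents.
TRANSLATION_RELATIONS = frozenset({
    "gold:freeTranslation",
    "gold:literalTranslation",
    "lemon:directEquivalent",
    "lemon:culturalEquivalent",
    "lemon:lexicalEquivalent",
})


@dataclass
class Mapping:
    source: str  # e.g. "Mesh/melanoma"
    target: str  # e.g. "Mesh-fr/mélanome"
    # Non-exclusive: one mapping may carry several relations at once.
    relations: set[str] = field(default_factory=set)


def translation_mappings(mappings: list[Mapping]) -> list[Mapping]:
    """Select the mappings carrying at least one translation relation."""
    return [m for m in mappings if m.relations & TRANSLATION_RELATIONS]


if __name__ == "__main__":
    stored = [
        # Illustrative relation assignment only.
        Mapping("Mesh/melanoma", "Mesh-fr/mélanome", {"gold:literalTranslation"}),
        Mapping("Mesh/melanoma", "DOID/melanoma"),  # relation left unspecified here
    ]
    for m in translation_mappings(stored):
        print(m.source, "->", m.target, sorted(m.relations))
```

Keeping relations as a set is what makes them non-exclusive: a mapping can be typed with a GOLD property and a LEMON one at the same time without being duplicated.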

Page 22: Roadmap for a multilingual BioPortal

Reconciliation of multilingual mappings

Methods to extract multilingual (translation) mappings between (translated) ontologies and then reconcile them into the BioPortal mapping repository

Approaches

Via term codes, when they are the same in both ontologies

Extraction from a metathesaurus such as UMLS

Extraction from external mapping databases, e.g., CISMeF

Using existing monolingual mappings

Using parallel language data resources

Etc.
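The first approach, matching on term codes, amounts to a key join when a translated ontology keeps the original's concept codes. A minimal sketch, assuming each ontology is available as a dict from term code to label; the codes are made-up placeholders (not real MeSH identifiers) and mappings_by_shared_code is a hypothetical helper.

```python
# Hypothetical extracts of two ontologies: term code -> preferred label.
# The codes are made-up placeholders, not real MeSH identifiers.
MESH_EN = {"C001": "Melanoma", "C002": "Sarcoma", "C003": "Neoplasms"}
MESH_FR = {"C001": "Mélanome", "C002": "Sarcome"}


def mappings_by_shared_code(
    source_name: str,
    source: dict[str, str],
    target_name: str,
    target: dict[str, str],
) -> list[tuple[str, str]]:
    """Pair concepts that carry the same term code in both ontologies.

    Returns (source id, target id) pairs sorted by code, so the output is
    deterministic. Codes present in only one ontology (e.g. untranslated
    terms) yield no pair.
    """
    shared_codes = sorted(source.keys() & target.keys())
    return [(f"{source_name}/{code}", f"{target_name}/{code}") for code in shared_codes]


if __name__ == "__main__":
    for src, tgt in mappings_by_shared_code("Mesh", MESH_EN, "Mesh-fr", MESH_FR):
        print(src, "->", tgt)
```

Here C003 has no French counterpart and produces no mapping. A key join is only reliable when both ontologies share an identifier scheme; the other approaches listed cover cases where they do not.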

Page 23: Roadmap for a multilingual BioPortal

Overall representation of multilingual content

Page 24: Roadmap for a multilingual BioPortal

A few points for discussion

Page 25: Roadmap for a multilingual BioPortal

Important for the Web of tomorrow

Multilingualism is an important issue amid the explosion of data being released and linked over the Web today

The vision of the Semantic Web is to leverage and interoperate data, whatever the natural language in which the data is available

Making ontology repositories multilingual, and thus making the ontologies inside them multilingual

Page 26: Roadmap for a multilingual BioPortal

Language reflects cultural differences

An ontology corresponds to an interpretation of a certain reality by a group of people at a certain time

Language => cultural differences => conceptual differences

When sociological and cultural differences are significant, their effect on the formalized knowledge is also significant

Example: French “traitement de données”, “transfert de données”, “téléchargement” vs. English “data processing”, “data transfer”, “upload”, “download”, “sideload” (the single French term “téléchargement” covers upload, download, and sideload)

Page 27: Roadmap for a multilingual BioPortal

What is the challenge?

Multilingual translational discoveries

Potential discoveries that would become possible by crossing large amounts of (clinical) data about populations of different ethnic and continental origins, currently expressed in, and limited to, a single natural language

e.g., crossing genotype-phenotype distinction studies across languages to better understand the role of the environment in gene expression

Page 28: Roadmap for a multilingual BioPortal

Remaining open questions

How to deal with partially multilingual ontologies?

How to deal with mappings that are not one-to-one?

download/upload vs. télécharger

Formalize the entailment of these new classes and properties

e.g., a multilingual translation mapping is a multilingual mapping connecting two ontologies that are translations of one another

Make the BioPortal ontology parser handle lexical enrichment vocabularies

SKOS-XL, LIR, LexInfo, Lexvo, Lingvoj => LEMON

LEMON translation module (Jan 2014)

Page 29: Roadmap for a multilingual BioPortal

Conclusions

The multilingual Semantic Web is crucial

Proposals to manage multilingualism in an ontology repository such as BioPortal

Deal with monolingual ontologies and translation mappings

Deal with multilingual ontologies (from xml:lang to LEMON)

Within the SIFR project, we are implementing and testing these proposals in a local instance of BioPortal deployed at LIRMM

Thank you.

Any questions?