Transcript
Page 1: Enabling Language Resources to Expose Translations as Linked Data on the Web

Enabling Language Resources to Expose Translations as Linked Data on the Web

Jorge Gracia, Elena Montiel-Ponsoda, Daniel Vila-Suero, Guadalupe Aguado-de-Cea

Ontology Engineering Group (OEG)

Universidad Politécnica de Madrid (UPM)

Acknowledgments: LIDER and BabeLData projects

9th Language Resources and Evaluation Conference, LREC 2014

Reykjavik (Iceland) 28/05/2014

Page 2: Enabling Language Resources to Expose Translations as Linked Data on the Web

Outline

Motivation

The translation model

Terminesp: a validating example

Conclusions

Page 3: Enabling Language Resources to Expose Translations as Linked Data on the Web

Motivation and goals

Page 4: Enabling Language Resources to Expose Translations as Linked Data on the Web

Motivation

Current multilingual lexica and electronic dictionaries

• Proprietary formats

• Non-standard APIs

• Disconnected from other resources

Page 5: Enabling Language Resources to Expose Translations as Linked Data on the Web

Motivation

GOAL: to allow language resources to expose translations as Linked Data on the Web, so that semantic-enabled applications can consume them directly, without relying on application-specific formats

Page 6: Enabling Language Resources to Expose Translations as Linked Data on the Web

Motivation

Objectives:

• To define a model for representing translations in RDF

• As a proof of concept:

1. Extract translations from the Terminesp terminological database

2. Represent them in RDF with our model

3. Make them accessible for both human and machine consumption

Page 7: Enabling Language Resources to Expose Translations as Linked Data on the Web

The translation model

Page 8: Enabling Language Resources to Expose Translations as Linked Data on the Web

The translation model

Page 9: Enabling Language Resources to Expose Translations as Linked Data on the Web

The translation model

Page 10: Enabling Language Resources to Expose Translations as Linked Data on the Web

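The translation model

Translation (direct equivalent)

[Figure: the English lexicon (LEXICON EN) entry “payment method” and the Spanish lexicon (LEXICON ES) entry “medio de pago” each have a LexicalEntry and a LexicalSense. Both senses refer to the same ontology entity, http://purl.org/goodrelations/v1#PaymentMethods, and the translation links the two senses.]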

Page 11: Enabling Language Resources to Expose Translations as Linked Data on the Web

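The translation model

Translation (cultural equivalence)

[Figure: the English lexicon (LEXICON EN) entry “Prime Minister” and the Spanish lexicon (LEXICON ES) entry “Presidente del Gobierno” each have a LexicalEntry and a LexicalSense. The senses refer to entities in different ontologies, http://dbpedia.org/ontology/PrimeMinister and http://es.dbpedia.org/resource/Presidente_del_Gobierno respectively, and the translation links the two senses.]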

Page 12: Enabling Language Resources to Expose Translations as Linked Data on the Web

The translation model

Characteristics of the model

• Translation as a relation between senses

• Translation relation is reified, so additional information can be attached to it

• Support for a variety of translation categories

• Translation categories clearly separated from the model, so there is no commitment to specific views or translation theories

• Translation sets group translations that, for instance, come from the same language resource or belong to the same organization

• Re-use of well-established vocabularies (DC, DCAT, etc.) for provenance and additional information
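
The point about reification can be illustrated with a small sketch. Because the translation is a node of its own rather than a single link between two senses, optional facts such as a category or a confidence score can hang off it. The sketch below uses plain Python tuples as triples; the `ex:` identifiers, the `tr:` prefix on `translationConfidence`, and the confidence value 0.9 are illustrative assumptions, not data from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, object]


@dataclass(frozen=True)
class Translation:
    """A reified translation: a node of its own linking two lexical senses."""
    uri: str
    source_sense: str
    target_sense: str
    category: Optional[str] = None      # e.g. a translation category
    confidence: Optional[float] = None  # optional numeric confidence

    def triples(self) -> List[Triple]:
        # The mandatory part: the node and its two sense links.
        result: List[Triple] = [
            (self.uri, "rdf:type", "tr:Translation"),
            (self.uri, "tr:translationSource", self.source_sense),
            (self.uri, "tr:translationTarget", self.target_sense),
        ]
        # Extra information attaches to the node, not to the senses.
        if self.category is not None:
            result.append((self.uri, "tr:translationCategory", self.category))
        if self.confidence is not None:
            result.append((self.uri, "tr:translationConfidence", self.confidence))
        return result


# Hypothetical identifiers (ex:...) and a made-up confidence value.
example = Translation(
    uri="ex:payment-es-en-TR",
    source_sense="ex:medio_de_pago-sense",
    target_sense="ex:payment_method-sense",
    category="trcat:directEquivalent",
    confidence=0.9,
)

for triple in example.triples():
    print(triple)
```

A direct sense-to-sense property would have nowhere to put the last two triples, which is the practical argument for reifying the relation.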

Page 13: Enabling Language Resources to Expose Translations as Linked Data on the Web

The translation model

[Figure: the Translation Module (http://purl.org/net/translation.owl) and the Translation Categories (http://purl.org/net/translation-categories).]

• Classes: TranslationSet, Translation, LexicalSense, Resource

• Properties: tran, translationSource, translationTarget, translationCategory, context, translationConfidence:double

• Translation categories: directEquivalent, culturalEquivalent, lexicalEquivalent

Page 14: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Page 15: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

TERMINESP

• Multilingual terminological database

• Terms and definitions from Spanish technological standards

• More than 30K terms in Spanish, with translations into English, German, French, Italian, …

Page 16: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

[Figure: lemon representation of Terminesp concept 38756. Legend: Class, Instance.]

• terminesp:lexiconES (lemon:Lexicon) has lemon:entry terminesp:38756es (lemon:LexicalEntry)

• terminesp:lexiconEN (lemon:Lexicon) has lemon:entry terminesp:38756en (lemon:LexicalEntry)

• Each entry has a lemon:form (lemon:LexicalForm) with lemon:writtenRep “red”@es and “network”@en, respectively

• Each entry has a lemon:sense (lemon:LexicalSense): terminesp:38756es-sense and terminesp:38756en-sense

• Both senses have lemon:reference terminesp:38756 (skos:Concept)

• terminesp:38756es-en-TR (tr:Translation) has tr:translationSource terminesp:38756es-sense and tr:translationTarget terminesp:38756en-sense
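
To make the structure of this example concrete, here is a minimal sketch that stores the diagram as plain triples and walks from the Spanish written form to its English translation. It assumes the translation runs from the Spanish sense to the English one (as the name `38756es-en-TR` suggests), and the form identifiers `_:form_es` and `_:form_en` are hypothetical, since the slide does not name the form nodes.

```python
# Triples transcribed from the diagram; literals are (text, language) pairs.
# "_:form_es" and "_:form_en" are made-up names for the unnamed form nodes.
TRIPLES = [
    ("terminesp:lexiconES", "lemon:entry", "terminesp:38756es"),
    ("terminesp:lexiconEN", "lemon:entry", "terminesp:38756en"),
    ("terminesp:38756es", "lemon:form", "_:form_es"),
    ("terminesp:38756en", "lemon:form", "_:form_en"),
    ("_:form_es", "lemon:writtenRep", ("red", "es")),
    ("_:form_en", "lemon:writtenRep", ("network", "en")),
    ("terminesp:38756es", "lemon:sense", "terminesp:38756es-sense"),
    ("terminesp:38756en", "lemon:sense", "terminesp:38756en-sense"),
    ("terminesp:38756es-sense", "lemon:reference", "terminesp:38756"),
    ("terminesp:38756en-sense", "lemon:reference", "terminesp:38756"),
    # Assumed direction: Spanish sense is the source, English the target.
    ("terminesp:38756es-en-TR", "tr:translationSource", "terminesp:38756es-sense"),
    ("terminesp:38756es-en-TR", "tr:translationTarget", "terminesp:38756en-sense"),
]


def step(nodes, predicate, forward):
    """Follow one predicate from every node, forwards or backwards."""
    found = []
    for node in nodes:
        for s, p, o in TRIPLES:
            if p != predicate:
                continue
            if forward and s == node:
                found.append(o)
            elif not forward and o == node:
                found.append(s)
    return found


def translations_of(text, lang):
    """Walk written form -> entry -> sense -> translation -> target form."""
    path = [
        ("lemon:writtenRep", False),      # literal -> form
        ("lemon:form", False),            # form -> entry
        ("lemon:sense", True),            # entry -> sense
        ("tr:translationSource", False),  # sense -> translation node
        ("tr:translationTarget", True),   # translation node -> target sense
        ("lemon:sense", False),           # target sense -> target entry
        ("lemon:form", True),             # target entry -> target form
        ("lemon:writtenRep", True),       # target form -> literal
    ]
    nodes = [(text, lang)]
    for predicate, forward in path:
        nodes = step(nodes, predicate, forward)
    return nodes


print(translations_of("red", "es"))  # [('network', 'en')]
```

The walk passes through the senses and the translation node rather than jumping from entry to entry, which mirrors the model's choice to treat translation as a relation between senses.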

Page 17: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

[Figure: the translation and its translation set. Legend: Class, Instance.]

• terminesp:es-en-transet (tr:TranslationSet) has tr:tran terminesp:38756es-en-TR (tr:Translation)

• terminesp:38756es-en-TR has tr:translationSource terminesp:38756es-sense and tr:translationTarget terminesp:38756en-sense (both lemon:LexicalSense)

• terminesp:38756es-en-TR has tr:translationCategory trcat:directEquivalent

Page 18: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Before

• MS Access database and a Web search interface

• Non-standard formats and vocabularies

• Data “invisible” to software agents

• Translations implicit, not explicit

Page 19: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Now

• Published on the Web as Linked Data

• Modelled using lemon and well-established vocabularies

• Dereferenceable URIs

• Data “visible” to software agents

• Translations made explicit

• Web search interface for human consumption

• SPARQL endpoint for machine consumption

Page 20: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Terminesp for machine consumption – SPARQL endpoint

http://linguistic.linkeddata.es/terminesp/sparql-editor/

Page 21: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Terminesp for machine consumption – SPARQL endpoint

http://linguistic.linkeddata.es/terminesp/sparql-editor/

Written representation target | Lexicon target

network | http://linguistic.linkeddata.es/data/terminesp/lexiconEN

Netzwerk (in der Netzwerktopologie) | http://linguistic.linkeddata.es/data/terminesp/lexiconDE
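
The two result columns (target written representation and target lexicon) suggest a query that starts from a source term and follows the translation to its targets. The sketch below builds such a query and a SPARQL-protocol GET URL using only the Python standard library. The query shown on the slide is not in the transcript, so this is a reconstruction under assumptions: the namespace URIs for `lemon:` and `tr:`, the starting term “red”@es, and the endpoint address `http://example.org/sparql` are all assumed or hypothetical. Property names follow the diagram on the earlier slide.

```python
from urllib.parse import urlencode

# Assumed namespace URIs; check them against the published vocabularies.
LEMON = "http://lemon-model.net/lemon#"
TR = "http://purl.org/net/translation#"


def build_query(written_rep: str, lang: str) -> str:
    """Build a SPARQL query for the translations of one written form."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    literal = written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX lemon: <{LEMON}>
PREFIX tr: <{TR}>
SELECT DISTINCT ?written_rep_target ?lexicon_target
WHERE {{
  ?form_source lemon:writtenRep "{literal}"@{lang} .
  ?entry_source lemon:form ?form_source ;
                lemon:sense ?sense_source .
  ?translation tr:translationSource ?sense_source ;
               tr:translationTarget ?sense_target .
  ?entry_target lemon:sense ?sense_target ;
                lemon:form ?form_target .
  ?form_target lemon:writtenRep ?written_rep_target .
  ?lexicon_target lemon:entry ?entry_target .
}}"""


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as the 'query' parameter of a SPARQL GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query("red", "es")
print(query)
# Hypothetical endpoint address, used only to show the URL shape.
print(build_request_url("http://example.org/sparql", query))
```

No request is sent here; the sketch only shows the shape of the query and of the URL that a client would fetch.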

Page 22: Enabling Language Resources to Expose Translations as Linked Data on the Web

Terminesp, a validating example

Terminesp for human consumption – Web interface

http://linguistic.linkeddata.es/terminesp/search/

Page 23: Enabling Language Resources to Expose Translations as Linked Data on the Web

Conclusions

Page 24: Enabling Language Resources to Expose Translations as Linked Data on the Web

Conclusions

Our proposal

• Model to represent translations as Linked Data on the Web

• Terminesp as a validating example

Next steps

• Standardization through the W3C Ontolex Community Group

• Study possible reuse of ITS 2.0 elements

• Linking Terminesp to external resources (e.g., BabelNet)

Page 25: Enabling Language Resources to Expose Translations as Linked Data on the Web

Thanks for your attention!
