infrastructures and plans boosting language technology research and innovation

Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119. Infrastructures and plans boosting Language Technology Research and Innovation Stelios Piperidis Athena RC, Greece [email protected]


Post on 25-Feb-2016


Page 1: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Co-funded by the 7th Framework Programme of the European Commission through the contract T4ME, grant agreement no.: 249119.

Infrastructures and plans boosting

Language Technology Research and Innovation

Stelios Piperidis, Athena RC, Greece

[email protected]

Page 2: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Multilingual Europe

http://www.meta-net.eu

Challenge: Providing each language community with the most advanced technologies for communication and information so that maintaining their mother tongue does not turn into a disadvantage.

While research has made considerable progress in recent years, the pace of progress is not fast enough to meet the challenge within the next 10-20 years.

All stakeholders – researchers, LT user and provider industries, language communities, funding programmes, policy makers – should team up for a major dedicated push.

Page 3: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Objectives

META-NET is a network of excellence dedicated to fostering the technological foundations of the European multilingual information society.

Page 4: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Four EU-Funded Projects

Initial project: T4ME (FP7; 13 partners, 10 countries)

Three ICT-PSP consortia since Feb. 2011: CESAR, METANET4U, META-NORD

All EU member states and several non-member states covered.

META-NET in Nov. 2012: 60 members in 34 countries.


http://www.meta-net.eu/members

Page 5: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Language White Paper Series

META-VISION


Page 6: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Language White Paper Series

Reports on the state of our languages in the digital age and the level of support through language technology.

Series covers 30 languages.

Key communication instruments to address decision makers and journalists.

Inform about societal and technological problems and challenges as well as economic opportunities.

>2 years in the making.

>200 national experts as contributors.

>8,000 copies printed and distributed to politicians and journalists.

Page 7: Infrastructures  and plans  boosting Language Technology  Research and Innovation

30 Languages Covered

Basque Bulgarian* Catalan Czech* Danish* Dutch* English* Estonian* Finnish* French*

Galician German* Greek* Hungarian* Icelandic Irish* Italian* Latvian* Lithuanian* Maltese*

Norwegian Polish* Portuguese* Romanian* Serbian Slovak* Slovene* Spanish* Swedish* Croatian


* = Official EU language

Page 8: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Cross-Lingual Ranking

In four application areas, each language is assigned to one of five clusters, ranging from excellent LT support to weak/no support:

1. Machine Translation
2. Speech Processing
3. Text Analysis
4. Resources

Results finalised at a meeting in Berlin with representatives of all 30 languages (October 21/22, 2011).


Page 9: Infrastructures  and plans  boosting Language Technology  Research and Innovation

MT

- excellent: (none)
- good: English
- moderate: French, Spanish
- fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
- weak or no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish

Speech

- excellent: (none)
- good: English
- moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
- fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
- weak or no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian

Text Analysis

- excellent: (none)
- good: English
- moderate: Dutch, French, German, Italian, Spanish
- fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
- weak or no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Resources

- excellent: (none)
- good: English
- moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
- fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
- weak or no support: Icelandic, Irish, Latvian, Lithuanian, Maltese

Page 10: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Europe’s Languages and LT

[Figure: Europe's languages arranged in five groups, from good support through Language Technology to weak or no support]

- English
- Dutch, French, German, Italian, Spanish
- Catalan, Czech, Finnish, Hungarian, Polish, Portuguese, Swedish
- Basque, Bulgarian, Danish, Galician, Greek, Norwegian, Romanian, Slovak, Slovene
- Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Page 11: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Key Observations

When it comes to Language Technology support, there are massive differences between Europe’s languages and technology areas.

LT support for English is ahead of any other language.

Even support for English is far from being perfect.

The gap between English and the other languages keeps widening!

Several languages – Icelandic, Latvian, Lithuanian, Maltese – receive the weakest score in all four areas!

At least 21 European languages in danger of digital extinction! (Languages put into the “weak or no support” category at least once.)
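The two counts in these observations can be rechecked from the cluster lists on the Cross-Lingual Ranking slides. The following is only a sketch: the dictionary is a hand transcription of the "weak or no support" cluster in each of the four areas, and the names `WEAK_OR_NO_SUPPORT`, `weak_at_least_once` and `weak_in_all_areas` are illustrative, not part of any META-NET tool.

```python
# "Weak or no support" cluster per application area, transcribed by hand
# from the Cross-Lingual Ranking slides (an assumption of this sketch).
WEAK_OR_NO_SUPPORT = {
    "Machine Translation": {
        "Basque", "Bulgarian", "Croatian", "Czech", "Danish", "Estonian",
        "Finnish", "Galician", "Greek", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Norwegian", "Portuguese", "Serbian",
        "Slovak", "Slovene", "Swedish",
    },
    "Speech Processing": {
        "Croatian", "Icelandic", "Latvian", "Lithuanian", "Maltese",
        "Romanian",
    },
    "Text Analysis": {
        "Croatian", "Estonian", "Icelandic", "Irish", "Latvian",
        "Lithuanian", "Maltese", "Serbian",
    },
    "Resources": {
        "Icelandic", "Irish", "Latvian", "Lithuanian", "Maltese",
    },
}


def weak_at_least_once(clusters):
    """Languages that appear in the weakest cluster of at least one area."""
    return set().union(*clusters.values())


def weak_in_all_areas(clusters):
    """Languages that appear in the weakest cluster of every area."""
    return set.intersection(*clusters.values())


at_risk = weak_at_least_once(WEAK_OR_NO_SUPPORT)
weakest = weak_in_all_areas(WEAK_OR_NO_SUPPORT)

print(len(at_risk))     # number of languages rated weakest at least once
print(sorted(weakest))  # languages rated weakest in all four areas
```

With this transcription the union has 21 members and the intersection is Icelandic, Latvian, Lithuanian and Maltese, which agrees with the figures quoted on the slide.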

Page 12: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Strategic Research Agenda

META-VISION


Page 13: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Three Ingredients

Appropriate Programme: Vision & Agenda

Appropriate Actors: Research & Commercialisation

Appropriate Support: Funding

Page 14: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Strategic Research Agenda


META-NET Strategic Research Agenda for Multilingual Europe 2020.

Addresses the problems we found during the white paper study.

Three priority research themes and application/innovation scenarios.

Can put Europe ahead of its competitors in this technology area.

190+ contributors.

Final version ready today!

SRA will be presented to the EC and national bodies.

Page 15: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Strategic Research Agenda


Page 16: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Priority Themes: 3 + 2

Three Priority Research Themes:

- Translation Cloud
- Social Intelligence and e-Participation
- Socially-Aware Interactive Assistant

Two additional themes:

- European Language Technology Platform
- Core Technologies for Language Analysis and Production

Page 17: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Open Resource Infrastructure

META-SHARE


Page 18: Infrastructures  and plans  boosting Language Technology  Research and Innovation

The power of data


Scientific data has the potential to transform and drastically improve our lives

Evidence from many domains – geo & earth sciences, biotechnology – shows data & tools become valuable through opening and sharing

- Both for research and technology development & evaluation
- Supporting innovative applications

Making the Human Genome Project results accessible leveraged a ~€3 billion R&D investment into ~€500 billion in economic activity

“Alzheimer’s researchers recently pooled genetic data and discovered 5 new genes and important evidence about the disease”

“Data is too valuable to be locked away”

Page 19: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Strategic Research Agenda


Page 20: Infrastructures  and plans  boosting Language Technology  Research and Innovation

LRs in the SRA


Page 21: Infrastructures  and plans  boosting Language Technology  Research and Innovation

LRs Discovery? Availability?

According to past and recent studies, only a portion of language resources (LRs) is known / announced / shared / traded / ...

… despite the fact that data collection, cleaning, annotation, curation and maintenance are very costly

To make any progress and enable the development of useful applications, we need all those scientific, technical, legal, organisational and societal mechanisms that enable the necessary resources to be shared, recycled and repurposed

Page 22: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE rationale

Language resources (data and tools) are dynamic living entities

- they evolve over time in various dimensions (quantity, annotation levels, conversion to new formats, addition of new languages)
- they are usually the product of collaborative work
- they may come with varying restrictions, ...

Need solutions that enable every language resource provider, at any granularity level (individual/lab/organisation), to

- Create their own repository of LRs
- Describe, document and update LR descriptions
- Link to a network of repositories of other providers
- Keep track of the use of their LRs, trade LRs, …

Need solutions that enable every language resource consumer to

- Discover what LRs suitable for their purposes exist
- Get information about, download / acquire them

Page 23: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE: what it is

META-SHARE tries to match the needs and expectations of LR providers and consumers by enhancing the visibility, documentation, identification, availability and preservation of language data and (basic language processing) tools

It launches a long-term multidimensional endeavour by which language resources will contribute to boosting research, technology and innovation through wide availability, pooling, openness and sharing

Page 24: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE architecture

[Architecture diagram. Labels recoverable from the figure:]

- LR repositories, each with its own inventory, and external repos
- metadata harvesting into META-SHARE inventories
- META-SHARE portal with registration, authentication and authorisation
- Resources provision services
- User oriented and support services
- Service labels: search / browse, download, reporting, mappings, licence, statistics, billing / payment, recommenders

Page 25: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE provider side

All facilities for creating your own META-SHARE-compliant repository and linking to the META-SHARE network:

- Open source repository software
- Functionalities for documenting, updating descriptions, storing/linking LRs
- Provider support services (helpdesks, forum, knowledge base)

Each repository maintains an inventory with the metadata (MD) of all its LRs and exports the MD for harvesting

Harvested MD are stored in synchronised central servers
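The harvesting flow described above can be pictured with a small in-memory model. This is a minimal sketch under stated assumptions, not the META-SHARE software: the classes `MetadataRecord`, `Repository` and `CentralInventory`, the `revision` counter and the `synchronise` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical metadata description of one language resource."""
    resource_id: str
    name: str
    revision: int  # assumed to grow each time the description is updated


@dataclass
class Repository:
    """A provider repository that keeps its own inventory of descriptions."""
    name: str
    inventory: dict = field(default_factory=dict)

    def describe(self, record):
        # Add or update the description of one resource.
        self.inventory[record.resource_id] = record

    def export_metadata(self):
        # Only metadata is exported; the resources stay with the provider.
        return list(self.inventory.values())


@dataclass
class CentralInventory:
    """A central server that stores harvested metadata."""
    records: dict = field(default_factory=dict)

    def harvest(self, repo):
        # Copy new or newer records from one repository; return how many changed.
        updated = 0
        for rec in repo.export_metadata():
            key = (repo.name, rec.resource_id)
            known = self.records.get(key)
            if known is None or rec.revision > known.revision:
                self.records[key] = rec
                updated += 1
        return updated


def synchronise(servers, repos):
    """Let every central server harvest every repository."""
    for server in servers:
        for repo in repos:
            server.harvest(repo)


repo = Repository("example-lab")
repo.describe(MetadataRecord("corpus-1", "Example corpus", revision=1))
servers = [CentralInventory(), CentralInventory()]
synchronise(servers, [repo])
```

After `synchronise`, both central servers in this toy model hold the same records, and a second harvest of an unchanged repository updates nothing.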


Page 26: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE user side

Users (LR consumers) can

- search the central inventory
- browse using multiple facets
- access the actual resources by visiting the respective repositories to get legally interoperable licence(s) to download and use them
- get support through an online user forum and helpdesks dedicated to technical, metadata and legal issues
- access a knowledge base
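Searching and faceted browsing over an inventory of metadata can be sketched as follows. The entries, the facet names (`language`, `resource_type`, `licence`) and the functions `search` and `facet_counts` are made up for illustration and do not reflect the actual META-SHARE metadata schema or portal.

```python
from collections import Counter

# Hypothetical inventory entries; the real metadata schema is much richer.
INVENTORY = [
    {"name": "Example parallel corpus", "language": "Greek",
     "resource_type": "corpus", "licence": "CC-BY"},
    {"name": "Example lexicon", "language": "Greek",
     "resource_type": "lexicon", "licence": "CC-0"},
    {"name": "Example speech corpus", "language": "Maltese",
     "resource_type": "corpus", "licence": "CC-BY"},
    {"name": "Example tokeniser", "language": "Latvian",
     "resource_type": "tool", "licence": "BSD"},
]


def search(inventory, **facets):
    """Return the entries whose metadata matches every requested facet value."""
    return [
        entry for entry in inventory
        if all(entry.get(key) == value for key, value in facets.items())
    ]


def facet_counts(inventory, facet):
    """Count how many entries fall under each value of one facet."""
    return Counter(entry[facet] for entry in inventory if facet in entry)


greek_corpora = search(INVENTORY, language="Greek", resource_type="corpus")
by_type = facet_counts(INVENTORY, "resource_type")
```

Combining several keyword arguments narrows the result set, while `facet_counts` gives the per-value totals a browsing interface would typically show next to each facet.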

Page 27: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Join META-SHARE as ...

Core and User Support Service Providers

Hosting (non-local) repositories

Local repositories

Depositing-only Members

Associate members

Third Party Consumers

Repository Service Providers

Page 28: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Legal provisions for LR sharing

Language Resources Sharing Charter – high level principles

Memorandum of Understanding – aka membership agreement

Licensing templates and deposition agreements

Inclusive mix of open and openness inspired models

- Creative Commons licences (starting with Creative Commons Zero (CC-0) and all possible combinations along the CC differentiation of rights of use)

- META-SHARE Commons licences, fully developed CC-based licensing tool that allows META-SHARE members to make their resources available inside the network only

- META-SHARE “No Redistribution” licences, allowing use and exploitation of the Resources while permitting the LR Owner to have full control over the Resource distribution.

- Software tools and web services are either provided through one of the standard Open Source licenses or under a custom commercial license.

Page 29: Infrastructures  and plans  boosting Language Technology  Research and Innovation

META-SHARE today…

A network of 24 language resources repositories in 19 EU countries, with >1550 LRs

META-SHARE software, open source, under a permissive licence (BSD), to set up a language resource repository

Legal instruments catering for a range of uses

Software-based services for both LR providers and LR consumers

User support services

- User Forum
- helpdesks

Mapping services to big resource inventories (CLARIN, OLAC, …)

Page 30: Infrastructures  and plans  boosting Language Technology  Research and Innovation

In the immediate future…

More META-SHARE nodes and respective language resources will be integrated – integration of ELRA supported initiatives, LRE Map, Language Library

Adoption of the META-SHARE platform and framework by ELRA

Full deployment of the services of META-SHARE members – from software availability, maintenance and technical assistance to language resources storage and preservation as well as support related to metadata and legal issues

Coordination with upcoming initiatives (iCordi, Research Data Alliance, …)

Official launch: 25 January 2013

Page 31: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Conclusions

META-NET

Page 32: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Conclusions

Our white paper press campaign shows that Europe is extremely interested in and passionate about its languages.

Two Parliamentary Questions in the European Parliament on the “digital extinction of languages” topic.

Now is the time to move forward with a continent-wide, systematic push and to invest in strategic research.

A modest investment is required.

This push will generate a countless number of opportunities.

Horizon 2020 and Connecting Europe Facility can provide sufficient resources to make our visions for Europe’s citizens and economy a reality.

Page 33: Infrastructures  and plans  boosting Language Technology  Research and Innovation


Page 34: Infrastructures  and plans  boosting Language Technology  Research and Innovation

Thank you very much!

[email protected]

http://www.meta-net.eu

http://www.facebook.com/META.Alliance

Q/A