Human Language Technologies in a Multilingual Europe

Georg Rehm, DFKI GmbH, Language Technology Lab – Berlin, Germany; META-NET, General Secretary


Page 1: Human Language Technologies in a Multilingual Europe

Georg Rehm

DFKI GmbH, Language Technology Lab – Berlin, Germany

META-NET, General Secretary

Human Language Technologies in a Multilingual Europe

Page 2: Human Language Technologies in a Multilingual Europe

Outline

• Multilingual Europe

• Analysis I: Technology Support for Europe’s Languages

• Analysis II: Status and Current Developments

• Example: LT for the Digital Single Market

• Missions and Opportunities

• Towards the Human Language Project

EP STOA Workshop: Language Equality in the Digital Age (10 Jan. 2017)

Page 3: Human Language Technologies in a Multilingual Europe

• Multilingualism is at the very heart of the European idea.

• 24 EU languages – all languages have the same status.

• Dozens of regional and minority languages as well as languages of immigrants and trade partners.

• Economic challenges:

– If the DSM is not multilingual, there will be 20+ isolated markets!

– Language barriers are market barriers!

• Social and public challenges:

– Empower all citizens to use their mother tongues.

– Provide multilingual digital public services.

– Enable cross-border, cross-lingual, cross-cultural communication. Towards a European public sphere and e-participation.

– Restore trust in media (fake news debate, filter bubble issue etc.)

Page 4: Human Language Technologies in a Multilingual Europe

Analysis I: Technology Support for Europe’s Languages


Page 5: Human Language Technologies in a Multilingual Europe

• 60 research centres in 34 countries (founded in 2010)

Chair of Executive Board: Jan Hajic (CUNI)

Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)

General Secretary: Georg Rehm (DFKI)

• Multilingual Europe Technology Alliance. 826 members in 67 countries

(published in 2013) (31 volumes; published in 2012)

T4ME (META-NET), CESAR, METANET4U, META-NORD

Page 6: Human Language Technologies in a Multilingual Europe

Basque, Bulgarian*, Catalan, Croatian*, Czech*, Danish*, Dutch*, English*, Estonian*, Finnish*, French*,

Galician, German*, Greek*, Hungarian*, Icelandic, Irish*, Italian*, Latvian*, Lithuanian*, Maltese*, Norwegian,

Polish*, Portuguese*, Romanian*, Serbian, Slovak*, Slovene*, Spanish*, Swedish*, Welsh

* Official EU language

http://www.meta-net.eu/whitepapers

Page 7: Human Language Technologies in a Multilingual Europe

MT
Excellent: (none)
Good: English
Moderate: French, Spanish
Fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
Weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh

Speech
Excellent: (none)
Good: English
Moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
Fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
Weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh

Text Analytics
Excellent: (none)
Good: English
Moderate: Dutch, French, German, Italian, Spanish
Fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
Weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh

Resources
Excellent: (none)
Good: English
Moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
Fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
Weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh

Page 8: Human Language Technologies in a Multilingual Europe

[Figure: bar chart of the level of support (Weak/none, Fragmentary, Moderate, Good, Excellent) per language. Languages in chart order: Welsh, Maltese, Lithuanian, Latvian, Icelandic, Irish, Croatian, Serbian, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English.]

Languages with names in red have little or no MT support

Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)

Important: even current state of the art technologies are far from being perfect!

Important: 20+ European languages are severely under-supported and face the danger of digital extinction.

Page 9: Human Language Technologies in a Multilingual Europe

[Figure: Language Technology Support (Weak/no support, Fragmentary, Moderate, Good, Excellent) plotted against Millions of Native Speakers (Worldwide), axis from 0 to 400. Languages shown: Yiddish, Welsh, Vlax Romani, Turkish, Scots, Romany, Occitan, Maltese, Macedonian, Luxembourgish, Lithuanian, Limburgish, Latvian, Icelandic, Friulian, Frisian, Breton, Bosnian, Asturian, Albanian, Irish, Croatian, Serbian, Hebrew, Estonian, Slovene, Slovak, Romanian, Norwegian, Greek, Galician, Danish, Bulgarian, Basque, Swedish, Portuguese, Finnish, Catalan, Polish, Hungarian, Czech, Italian, German, Dutch, Spanish, French, English.]

Source: Georg Rehm, Hans Uszkoreit, Ido Dagan, Vartkes Goetcherian, Mehmet Ugur Dogan, Coskun Mermer, Tamás Váradi, Sabine Kirchmeier-Andersen, Gerhard Stickel, Meirion Prys Jones, Stefan Oeter, and Sigve Gramstad. An Update and Extension of the META-NET Study “Europe's Languages in the Digital Age”. In Proceedings of the Workshop on Collaboration and Computing for Under-Resourced Languages in the Linked Open Data Era (CCURL 2014), Reykjavik, Iceland, May 2014.

Page 10: Human Language Technologies in a Multilingual Europe

Analysis II: Status and Current Developments


Page 11: Human Language Technologies in a Multilingual Europe

Status and Current Developments

• Multilingual Europe: our languages enjoy equal status, yet the digital extinction of the majority of European languages is a very severe danger.

• Language Technology Research and Innovation in Europe: world-class research, excellent results (examples: Moses, recent NMT results of QT21), strong SME base, thousands of LSPs; but also fragmentation and a need for coordination.

• Big need for high-quality, high-coverage, precise, robust, deployable Language Technologies: translation, conversational interfaces, text and media analytics, personal assistants, multilingual DSM etc.

• Artificial Intelligence: important breakthroughs and massive investments in R&D and applications (mostly in the US and Asia) – huge opportunity for Europe!

• The European Language Challenge cannot be abandoned or outsourced.

– Europe must not make its digital infrastructure dependent on non-European solutions. This is why the EU is building GALILEO as an alternative to GPS, GLONASS and BeiDou.

• Big need for Language Technologies made in Europe for Europe!

Page 12: Human Language Technologies in a Multilingual Europe

Example: Language Technology for the Digital Single Market

Page 13: Human Language Technologies in a Multilingual Europe

• The Digital Single Market is a top priority in the European Union.

• Expected to add €400 billion to European GDP and to create hundreds of thousands of new jobs.

• Unfortunately, the language topic is not included in the EC’s Digital Single Market strategy (published in May 2015).

Page 14: Human Language Technologies in a Multilingual Europe
Page 15: Human Language Technologies in a Multilingual Europe

MDSM: Needed Applications

• Crosslingual SME presales communication and aftersales services
• Multilingual websites, product catalogues, product descriptions
• Crosslingual business intelligence (e.g., based on UGC)
• Crosslingual communication for SMEs, public institutions, citizens
• Multilingual (big) data, language and knowledge value chains
• Multilingual knowledge bases and knowledge graphs (and services)
• Multilingual conversational interfaces for connected devices (IoT)
• Crosslingual social media analytics for EU-wide societal issues
• Multilingual text and report generation (knowledge/data to text)
• All services must be domain-adaptable (avoid one size fits all)
• etc.


Page 16: Human Language Technologies in a Multilingual Europe

Multilingual Value Programme

• Multilingual Value Programme

– Suggested three-year programme

– Requires modest investment

• “Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content”

• Three components address the main needs of the Multilingual DSM (MDSM) and how to put them into practice:

1. Multilingual Application Areas

2. Multilingual Services

3. Research

Strategic Research and Innovation Agenda

Language as a Data Type and Key Challenge for Big Data

Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content

SRIA Editorial Team

Version 0.9 – July 2016


Version 1.0 to be published in 2017

Page 17: Human Language Technologies in a Multilingual Europe

Missions and Opportunities


Page 18: Human Language Technologies in a Multilingual Europe

Missions and Opportunities

• Languages & European Society: Enable all European citizens to communicate and operate in their mother tongues (online & offline).

• Languages & Media: Address – technologically – the massively increasing social, political and commercial relevance of content and communication (fake news debate, filter bubble challenge).

• Languages & Market: Realise the Multilingual DSM, including multilingual content, crosslingual text analytics, multilingual generation.

• Languages & Digital Tech: Future-proof our languages.

• Languages & Devices: Robust, precise, high-quality spoken language interfaces for billions of connected things – and all languages.

• Excellent opportunity for Europe, European research, European education, European industry, European innovation, European culture!

• Goal: Move Europe into the pole position in this field!


Page 19: Human Language Technologies in a Multilingual Europe

Towards the Human Language Project


Page 20: Human Language Technologies in a Multilingual Europe

Multilingual Europe in January 2017

Multilingual Europe through Technology

• Multilingual Strategy of the EU: more tech support for multilingualism
• Language Technologies for Europe's digital public services
• Technologies for the Multilingual Digital Single Market
• Language Technologies for Big Data text analytics
• The Human Language Project – long-term R&D&I, post-H2020
• Language Technologies R&D&I (H2020, WP 2018-20)

Strategic Research and Innovation Agenda. Language as a Data Type and Key Challenge for Big Data. Enabling the Multilingual Digital Single Market through technologies for translating, analysing, processing and curating natural language content. SRIA Editorial Team. Version 0.9 – July 2016

Open calls and upcoming service contracts

Dec. 2016: EC brainstorming meeting on future LT priorities in Horizon 2020 and FP9. Need for a new strategy paper?

Jan. 2017: STOA workshop and study on LT for Europe

Dec. 2017: LT Session at BDVA Summit in Valencia

2017: MDSM SRIA V1.0

Policy change and initiative towards a European digital public sphere enabled by MT/LT

DG CONNECT

DGT and DG CONNECT

DG CONNECT

WP 2018-20 (incl. IoT, I4.0, assistants, robots etc.)

Shared programme between EU and MS

Suggested MLV Programme

CEF AT, ELRC

Page 21: Human Language Technologies in a Multilingual Europe


Observations:

• Current initiatives are too small and unbalanced; they concentrate on innovation and technology deployment.

• Danger of losing touch with research and novel, potentially paradigm-shifting developments.

• Difficult to kick-start new, paradigm-shifting research.

• We need a coordinated, concerted and consolidated push in basic research, applied R&D and innovation!

Page 22: Human Language Technologies in a Multilingual Europe

Human Language Project – Interdisciplinary R&D&I Programme

Basic Research

• Results in new methods, approaches

Applied R&D

• Results in novel technologies

Innovation

• Results in novel or improved products or services

Research Themes – Needs and Gaps (market-driven)

• Computational Linguistics
• Artificial Intelligence
• Language Technology
• Linguistics
• Computer Science
• Cognitive Science
• other related fields

• New, groundbreaking methods, paradigms, approaches

• Foster technologies, products, innovation, economy

• Foster education

HLP: Umbrella programme to turbo-charge and to coordinate all European R&D&I activities in a systematic way including EP, EC, Member States.

Page 23: Human Language Technologies in a Multilingual Europe

Human Language Project

• Goal: Deep Natural Language Understanding.

• Breakthroughs in Artificial Intelligence plus a fresh look at Linguistics for the Next Generation of LT!

• All official European and many additional languages

• Broad coverage, high quality, high precision

• Across modalities: text, text types, speech, image, video etc.

• Across platforms: messaging, telephony, social, mobile, IoT etc.

• Across cultures: knowledge, customs, formalities, humour, emotion, subjectivity, biases, opinions, filter bubble etc.


Page 24: Human Language Technologies in a Multilingual Europe

Human Language Project

• Collaboration and coordination between EC, EP, Member States and all other stakeholders.

• Mix of funding sources:

– EU projects: Horizon 2020 (WP 2018-2020) + FP9 (2021+)

– National/regional funding sources

• Setup: basic research, applied research, innovation, commercialisation – tightly intertwined

• Timeframe: 10 years

• Policy change towards “LT-enabled multilingualism”

• Public procurement: EU/EC, MS administrations should demand certain language technologies

Page 25: Human Language Technologies in a Multilingual Europe

HLP Topics: Key Ingredients for Future European LT Research

Artificial Intelligence, including cognition, perception, vision, cross-modal, cross-platform, cross-culture, IoT etc.

Knowledge Technology

• Extend knowledge bases
• Semantic Web, ontologies, linked data, interoperability
• More complex models
• Multilingual resources that are grounded, extensible
• Subjectivity, objectivity, further novel dimensions
• Web-scale reasoning

Machine Learning

• Combine DNNs and symbolic processing
• ML for knowledge acquisition and extension
• DNNs embedded into modular systems including symbolic knowledge bases
• Make it possible to inspect and also to optimise DNNs (beyond end-to-end)

Language Technology

• (Computational) Linguistics research towards deep language understanding
• From corpora to DNNs to annotated data to highly improved symbolic methods
• Language portability
• Full and Deep Language Understanding by 2030 – Human Language Project

Page 26: Human Language Technologies in a Multilingual Europe

Human Language Project

• Truly Multilingual Europe
• European Economy (MDSM)
• Attractive jobs for high potentials
• Education and young researchers
• Massive boost for research
• Foster innovation and new companies

Page 27: Human Language Technologies in a Multilingual Europe

Thank you!

Georg Rehm
