
Page 1: Machine Translation at the EPO Removing language barriers ...aamtjapio.com/kenkyu/files/kenkyu03/EPO_MTSummit(20130902).pdfEspacenet Publication Server Bulk Examiner toolsp GPI EPO

19/09/2013

Machine Translation at the EPO

Removing language barriers from patent documentation

Paul Schwander

European Patent Office

The 5th Workshop on Patent Translation, MT Summit 2013

Page 2

Roadmap

The context: why is MT strategic?

Machine Translation @ the EPO: state of play and future plans

Page 3

Why Machine Translation?

- Reducing the language barrier in the European context: Unitary Patent system supported by MT.
- Access to global patent information for prior art searches.

Page 4

Global patent filings rising continuously, especially Chinese applications

IP5 = Europe, USA, Japan, China and Korea

Page 5

Accessing Asian-language patents for EPO examiners

Page 7

Addressing the Chinese language wall

Machine-translated full texts acquired for ca. 5 million documents.

An on-demand manual translation service is offered to examiners.

5 million patents at 1 day of manual translation per patent -> 22 years of work for a team of 1000 translators.

Workflow: Detect (search in MTed text -> relevant patents) -> Understand (order a manual translation)
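The 22-year figure follows from simple arithmetic. The sketch below reproduces it; the function name is hypothetical, and the value of about 227 working days per year is an assumption, namely the figure implied by the slide's own numbers (5,000,000 person-days / 1000 translators / 22 years).

```python
def manual_translation_years(n_documents: int,
                             days_per_document: float,
                             translators: int,
                             working_days_per_year: float) -> float:
    """Years a team needs to translate n_documents by hand (hypothetical helper)."""
    total_person_days = n_documents * days_per_document
    days_per_translator = total_person_days / translators
    return days_per_translator / working_days_per_year


# Assumption: ~227 working days per year, as implied by the slide's figures.
years = manual_translation_years(5_000_000, 1, 1000, 227)
print(f"{years:.1f} years")  # 22.0 years
```

Changing the working-days assumption shifts the result proportionally; the order of magnitude (decades for a 1000-person team) is what makes systematic manual translation impractical.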

Page 9

Patent Translate launched on 29 February 2012

System integrated in Espacenet, the EPO Publication Server and EPOQUE

Page 10

Patent Translate: how does it work?

The result of a collaboration between the EPO and Google.

Patent data are a huge source of corpora for training MT.

Patent documents and their translations or corresponding documents in other languages are prepared and stored in a corpora repository.

The translation system is trained on these corpora.

Translation quality is assessed before launch to check that it is fit for purpose.
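As a rough illustration of the corpus-preparation step, the sketch below pairs documents that correspond to each other across languages. It assumes, hypothetically, that corresponding documents share a `family_id` field; the record type and function names are invented for illustration, and it aligns whole documents only, whereas a real training pipeline would also need sentence-level alignment.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PatentDoc:
    family_id: str  # hypothetical key linking a document to its counterparts
    language: str   # ISO 639-1 code, e.g. "en", "de"
    text: str


def build_parallel_pairs(docs, source="en", target="de"):
    """Pair each source-language document with every target-language
    document of the same family (document-level alignment only)."""
    by_family = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        by_family[doc.family_id][doc.language].append(doc.text)

    pairs = []
    for family in by_family.values():
        # .get() does not create missing keys, so families lacking a language yield nothing.
        for src, tgt in product(family.get(source, []), family.get(target, [])):
            pairs.append((src, tgt))
    return pairs


docs = [
    PatentDoc("F1", "en", "A fastening device"),
    PatentDoc("F1", "de", "Eine Befestigungsvorrichtung"),
    PatentDoc("F2", "en", "A pump"),
]
pairs = build_parallel_pairs(docs)
print(pairs)  # [('A fastening device', 'Eine Befestigungsvorrichtung')]
```

Family F2 has no German counterpart, so it contributes no training pair.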

Page 11

Patent Translate: Architecture

[Figure: the EPO Patent Translate Service exposes an API (REST) and draws on a Translation Memory, the Corpora Repository and the Translation engine (Google). Channels shown: Espacenet, Bulk, GPI, Examiner tools, Publication Server and National Patent Offices.]
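The architecture places a Translation Memory next to the external translation engine. A minimal sketch of how such a service could use it, assuming (this is not stated on the slide) that the memory stores previously produced translations so that repeated requests do not reach the engine again. The class, method and stub engine below are hypothetical; no real EPO or Google API is called.

```python
from typing import Callable, Dict, Tuple

# Memory key: (source language, target language, source text)
MemoryKey = Tuple[str, str, str]


class PatentTranslateService:
    """Hypothetical sketch: return a stored translation when one exists,
    otherwise call the MT engine and store its output."""

    def __init__(self, engine: Callable[[str, str, str], str]):
        self._engine = engine
        self._memory: Dict[MemoryKey, str] = {}
        self.engine_calls = 0

    def translate(self, text: str, source: str, target: str) -> str:
        key = (source, target, text)
        if key not in self._memory:
            self.engine_calls += 1
            self._memory[key] = self._engine(text, source, target)
        return self._memory[key]


def fake_engine(text: str, source: str, target: str) -> str:
    # Stand-in for the external MT engine; it only tags the input text.
    return f"[{source}->{target}] {text}"


service = PatentTranslateService(fake_engine)
service.translate("Vorrichtung", "de", "en")
service.translate("Vorrichtung", "de", "en")
print(service.engine_calls)  # 1
```

Injecting the engine as a plain callable keeps the sketch independent of any particular MT provider and makes the memory behaviour easy to test.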

Page 12

Quality level ranking

Page 13

Current achievements

Patent Translate now covers translations between English and 21 other languages: Bulgarian, Chinese, Czech, Danish, Dutch, Finnish, French, German, Greek, Hungarian, Icelandic, Italian, Japanese, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish and Swedish.

With the language pairs currently offered, 305 million different machine translations of complete patent documents can be accessed on the fly. Done manually, this would represent 1500 years of work for 1000 translators.
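Counting translation directions makes the coverage concrete. This sketch assumes that "between English and X" means both directions (English to X and X to English) are offered for every listed language; the variable names are illustrative only.

```python
# The 21 languages listed on the slide, each paired with English.
OTHER_LANGUAGES = [
    "Bulgarian", "Chinese", "Czech", "Danish", "Dutch", "Finnish", "French",
    "German", "Greek", "Hungarian", "Icelandic", "Italian", "Japanese",
    "Norwegian", "Polish", "Portuguese", "Romanian", "Slovak", "Slovenian",
    "Spanish", "Swedish",
]

# Assumption: both directions are available for every language.
directions = ([("English", lang) for lang in OTHER_LANGUAGES]
              + [(lang, "English") for lang in OTHER_LANGUAGES])

print(len(OTHER_LANGUAGES), len(directions))  # 21 42
```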

Page 14

Patent Translate usage

Page 15

Plans

Project to be completed by the end of 2014: 32 European and Asian languages.

2013-2014: Turkish, Estonian, Croatian, Latvian, Lithuanian, Albanian, Macedonian, Serbian, Russian and Korean
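The 32-language target is consistent with the lists in this deck: English, the 21 languages listed under "Current achievements" and the 10 languages planned for 2013-2014. A quick check with Python sets (names are illustrative):

```python
CURRENT = {
    "Bulgarian", "Chinese", "Czech", "Danish", "Dutch", "Finnish", "French",
    "German", "Greek", "Hungarian", "Icelandic", "Italian", "Japanese",
    "Norwegian", "Polish", "Portuguese", "Romanian", "Slovak", "Slovenian",
    "Spanish", "Swedish",
}
PLANNED_2013_2014 = {
    "Turkish", "Estonian", "Croatian", "Latvian", "Lithuanian",
    "Albanian", "Macedonian", "Serbian", "Russian", "Korean",
}

# English is the pivot language, so it is counted once on top of the two lists.
all_languages = {"English"} | CURRENT | PLANNED_2013_2014
print(len(CURRENT), len(PLANNED_2013_2014), len(all_languages))  # 21 10 32
```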

Page 16

Patent Translate: illustrative example (Description)

Page 17

MT Example

Page 18

MT Example

Page 19

Topics around Patent MT

- CLIR or not?
- OCR combined with MT for non-digitised collections
- Fit-for-purpose quality assessment
- Perception of the quality

Page 20

Conclusion

In the context of global patent documentation, MT is more than ever a must:

– The size of the patent collections to be searched is increasing, and systematic manual translation is not an option.

– MT has proven to be fit for purpose.

– Quality will continue to improve.

Page 21

Thank You

www.epo.org
