
www.cngl.ie

Requirements for ITS2.0 support in Computer Assisted Translation Tools

John Moran, Christian Saam, Anuar Serikov, Pablo Porto and Dave Lewis

CNGL

Trinity College Dublin


Overview

• Meta-Data and CAT Tools

• Use Cases: ITS2.0 and CAT tools

• Prototype: OmegaT

• Prototype: Web-client CAT

• Richer CAT meta-data

• Summary


ITS 2.0 Draft Data Categories

ITS1.0

• Translate

• Localization Note

• Terminology

• Directionality

• Ruby

• Language Information

• Elements Within Text

I18n

• Locale Filter

• External Resource

• Preserve Space

• Allowed Characters

• Storage Size

• ID Value

Language Technology

• Domain

• MT Confidence

• Text Analysis

Provenance & QA

• Quality Issue

• Quality Rating

• Provenance


Meta-Data and CAT Tools

• Meta-data can provide useful information to translators if presented carefully

• Translation, post-editing and review tasks can add meta-data

• Integration with the tool chain requires a standard meta-data specification

• ITS2.0 provides new standards for several CAT use cases

• What further CAT meta-data can be leveraged?


Meta-Data and CAT Tools

• Much ITS and ITS2.0 meta-data is already implicitly supported in OmegaT and other CAT tools.

Some examples from OmegaT…


Localization note in HTML (ITS)
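As an illustration of how a CAT tool might pick up this markup, the sketch below uses Python's standard `html.parser` to collect text annotated with the ITS 2.0 HTML5 attributes `its-loc-note` and `its-loc-note-type`. It is a minimal sketch, not OmegaT's HTML filter: it handles only local markup (no global rules, no `its-loc-note-ref`), the names `LocNoteExtractor` and `extract_loc_notes` are hypothetical, and where no type is given it assumes `description`, the ITS default.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not touch the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class LocNoteExtractor(HTMLParser):
    """Collects (text, note, note_type) for elements carrying its-loc-note."""

    def __init__(self):
        super().__init__()
        self.notes = []   # finished (text, note, note_type) tuples
        self._stack = []  # one dict per open non-void element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        a = dict(attrs)
        self._stack.append({
            "note": a.get("its-loc-note"),
            # Assumed default when its-loc-note-type is absent or empty.
            "type": a.get("its-loc-note-type") or "description",
            "text": [],
        })

    def handle_data(self, data):
        # Text belongs to every open element that carries a note.
        for entry in self._stack:
            if entry["note"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self._stack:
            return
        entry = self._stack.pop()
        if entry["note"] is not None:
            text = "".join(entry["text"]).strip()
            self.notes.append((text, entry["note"], entry["type"]))


def extract_loc_notes(html):
    parser = LocNoteExtractor()
    parser.feed(html)
    parser.close()
    return parser.notes


if __name__ == "__main__":
    sample = ('<p>Click <span its-loc-note="Button label, keep it short" '
              'its-loc-note-type="alert">Save</span> to continue.</p>')
    print(extract_loc_notes(sample))
    # [('Save', 'Button label, keep it short', 'alert')]
```

Each note comes back paired with the text it annotates, which is roughly what a CAT tool needs in order to show the note next to the segment.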


Localization note in HTML (OmegaT)


RTL and LTR mixed in a segment (ITS)
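In HTML, the ITS Directionality data category is expressed through the native `dir` attribute. Independently of markup, a CAT tool could flag segments that actually mix strong left-to-right and right-to-left characters, since those are the segments where display direction matters most. The sketch below uses the Unicode bidirectional class from Python's `unicodedata` module; it is a simple heuristic, not the Unicode Bidirectional Algorithm and not how OmegaT decides direction, and the function names are hypothetical.

```python
import unicodedata

# Unicode bidirectional classes of strong characters.
RTL_CLASSES = {"R", "AL"}  # Hebrew-type and Arabic-type letters
LTR_CLASSES = {"L"}        # e.g. Latin, Cyrillic, CJK


def strong_directions(segment: str) -> set:
    """Return the set of strong directions ('ltr', 'rtl') found in a segment."""
    found = set()
    for ch in segment:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            found.add("rtl")
        elif bidi_class in LTR_CLASSES:
            found.add("ltr")
    return found


def is_mixed_direction(segment: str) -> bool:
    """True if the segment contains both strong LTR and strong RTL characters."""
    return strong_directions(segment) == {"ltr", "rtl"}


if __name__ == "__main__":
    for s in ["Open the file", "פתח את הקובץ", "פתח את OmegaT"]:
        print(is_mixed_direction(s), s)
```

Digits, spaces and punctuation are weak or neutral, so they neither trigger nor prevent the flag.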


RTL and LTR mixed in a segment (OmegaT)

Shift + Ctrl + O


Protected text (ITS)


Protected text (OmegaT)

Protected text spans are excluded from word counts.

One of a number of features in OmegaT 3.0 sponsored by Welocalize.
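In HTML5, ITS 2.0 reuses the native `translate` attribute for the Translate data category, and the value is inherited by child elements, so a word count that skips protected text has to track the inherited flag. The sketch below shows one way to do this with Python's standard `html.parser`. It is not OmegaT's word counter: the `\w+` tokenisation is a simplifying assumption (real CAT word counts follow their own segmentation rules), and the names `TranslatableText` and `count_translatable_words` are hypothetical.

```python
import re
from html.parser import HTMLParser

VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TranslatableText(HTMLParser):
    """Keeps only text whose effective translate value is not 'no'."""

    def __init__(self):
        super().__init__()
        self._stack = []  # effective translate flag of each open element
        self.parts = []   # translatable text chunks

    def _translatable(self):
        # Outside any element, content defaults to translatable.
        return self._stack[-1] if self._stack else True

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        a = dict(attrs)
        if "translate" in a:
            # A bare attribute or "yes" means translatable; only "no" protects.
            flag = (a["translate"] or "").strip().lower() != "no"
        else:
            flag = self._translatable()  # inherit from the parent element
        self._stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._translatable():
            self.parts.append(data)


def count_translatable_words(html):
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    # Join with spaces so text from adjacent block elements does not merge.
    return len(re.findall(r"\w+", " ".join(parser.parts)))


if __name__ == "__main__":
    sample = '<p>Run <code translate="no">omegat.jar</code> to start the tool.</p>'
    print(count_translatable_words(sample))  # 5
```

Here the protected file name contributes nothing to the count, while an explicit `translate="yes"` inside a protected element switches counting back on for its content.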


ITS2.0 Confidence Scores
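The ITS 2.0 MT Confidence data category carries a single score between 0 and 1 (in HTML5, the `its-mt-confidence` attribute, used together with `its-annotators-ref` to identify the MT system). A CAT tool still has to decide how to present that number. The sketch below validates the value and maps it to a display band; the band names and thresholds are hypothetical and are not taken from the webcat or OmegaT prototypes.

```python
# Hypothetical display bands, highest threshold first.
BANDS = [(0.8, "high"), (0.5, "medium"), (0.0, "low")]


def parse_mt_confidence(value: str) -> float:
    """Parse an its-mt-confidence value; ITS 2.0 limits it to 0..1 inclusive."""
    score = float(value)
    # Also rejects NaN, because every comparison with NaN is False.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"MT confidence out of range: {value!r}")
    return score


def confidence_band(score: float) -> str:
    """Map a validated score to the first band whose threshold it reaches."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    raise ValueError(f"score below all bands: {score}")


if __name__ == "__main__":
    for raw in ["0.8982", "0.61", "0.2"]:
        print(raw, confidence_band(parse_mt_confidence(raw)))
```

Keeping the thresholds in one table makes it easy to tune them per language pair or MT engine without touching the display code.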


ITS2.0 Confidence Scores (webcat)

http://mobile-webcat.appspot.com

Pablo Porto


ITS2.0 Confidence Scores (OmegaT)

Anuar Serikov

ITS2.0 extensions in OmegaT



Tabular segment display option

Colours make segment status easy to see, but they are inflexible in some regards:

• Precedence, e.g. Mark Segments with Notes takes precedence over Mark (Un)Translated Segments.

• Sooner or later you run out of easily distinguishable colours.

Graphics can convey more information.
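The precedence problem can be made concrete with a small model. When each segment can only have one background colour, a fixed precedence order has to decide which status wins, and the losing status becomes invisible; a tabular row can show every flag side by side. The sketch below is illustrative only: the `Segment` fields, the colour names and the precedence list are hypothetical and do not reproduce OmegaT's code.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    number: int
    translated: bool
    has_note: bool
    has_mt_score: bool = False  # hypothetical extra flag, e.g. ITS2.0 MT confidence


# Colour mode: the first matching flag wins (hypothetical colour names).
PRECEDENCE = [
    ("has_note", "note-colour"),
    ("translated", "translated-colour"),
]
DEFAULT_COLOUR = "untranslated-colour"


def single_colour(seg: Segment) -> str:
    """One colour per segment: lower-priority statuses are hidden."""
    for flag, colour in PRECEDENCE:
        if getattr(seg, flag):
            return colour
    return DEFAULT_COLOUR


def table_row(seg: Segment) -> dict:
    """Tabular mode: one column per flag, nothing is hidden."""
    return {
        "#": seg.number,
        "translated": seg.translated,
        "note": seg.has_note,
        "mt": seg.has_mt_score,
    }


if __name__ == "__main__":
    seg = Segment(number=7, translated=True, has_note=True)
    print(single_colour(seg))  # note-colour: the 'translated' status is lost
    print(table_row(seg))
```

Every new flag added to colour mode needs a new colour and a place in the precedence list, whereas tabular mode only needs another column.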


Walking before running…OmegaT current


Walking before running…OmegaT dev

Anuar



Planned…


An idea… target terminology



Tabular display should make it easier to show infringements of its:allowedCharactersRule and its:storageSizeRule.

Other options are also available, e.g. Validate Tags under the Tools menu, regular expressions, or the scripts plugin.
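Both checks are simple to state. ITS 2.0 Allowed Characters gives a regular-expression character class that every character must match, and Storage Size gives a maximum number of bytes in a given encoding (UTF-8 unless `storageEncoding` says otherwise). The sketch below reports infringements per segment so they could be shown in a table column. It assumes the character class is also valid Python `re` syntax, which holds for simple classes such as `[A-Za-z0-9 :]` but not for every ITS regular expression, and it ignores the line-break type that Storage Size can also specify; the function names are hypothetical.

```python
import re


def allowed_characters_violations(text: str, char_class: str) -> list:
    """Return (index, character) pairs that the character class does not allow."""
    pattern = re.compile(char_class)
    return [(i, ch) for i, ch in enumerate(text) if not pattern.fullmatch(ch)]


def storage_size_overrun(text: str, size: int, encoding: str = "UTF-8") -> int:
    """Return how many bytes the encoded text exceeds `size` by (0 if it fits).

    Raises UnicodeEncodeError if a character cannot be represented in the
    encoding at all, which is itself an infringement worth reporting.
    """
    return max(0, len(text.encode(encoding)) - size)


if __name__ == "__main__":
    target = "Größe: 5€"
    print(allowed_characters_violations(target, r"[A-Za-z0-9 :]"))
    print(storage_size_overrun(target, 10))
```

Returning positions and byte overruns, rather than a plain pass/fail, gives a table cell something specific to highlight.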


Instrumentation in iOmegaT: based on TransLog, but in a CAT tool

Uses the ITS2.0 provRef attribute to reference external provenance descriptions
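In ITS 2.0 XML, provenance is recorded in an `its:provenanceRecords` element whose `its:provenanceRecord` children can carry a `provRef` attribute pointing to an external provenance description; content then points to the records with `its:provenanceRecordsRef`. The sketch below builds such a record with Python's `xml.etree.ElementTree`, with `provRef` pointing at an instrumentation log. The record ID, tool name and log URL are hypothetical placeholders, and this is not the iOmegaT implementation.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("its", ITS_NS)


def its(name: str) -> str:
    """Qualified name in the ITS namespace."""
    return f"{{{ITS_NS}}}{name}"


def provenance_records(record_id: str, tool: str, prov_ref: str) -> ET.Element:
    """Build <its:provenanceRecords> with one record pointing at an external log."""
    records = ET.Element(its("provenanceRecords"), {f"{{{XML_NS}}}id": record_id})
    ET.SubElement(records, its("provenanceRecord"),
                  {"tool": tool, "provRef": prov_ref})
    return records


def link_segment(segment: ET.Element, record_id: str) -> None:
    """Point a content element at the records via its:provenanceRecordsRef."""
    segment.set(its("provenanceRecordsRef"), "#" + record_id)


if __name__ == "__main__":
    recs = provenance_records("pr1", "iOmegaT",
                              "http://example.com/logs/session-1.xml")
    seg = ET.Element("p")
    seg.text = "Translated text"
    link_segment(seg, "pr1")
    print(ET.tostring(recs, encoding="unicode"))
    print(ET.tostring(seg, encoding="unicode"))
```

Keeping the detailed keystroke and timing data behind `provRef` means the translated document only carries a pointer, not the log itself.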


Instrumentation

Similar to logging, but we wanted to distinguish it from application logging (e.g. for debugging).

If you have a technology that purports to improve translator speed, e.g. machine translation or predictive typing, it helps to be able to measure that improvement in the field.
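Measuring speed in the field comes down to logging how long each segment was worked on and under which condition, then aggregating. The sketch below computes words per hour per translator and condition (HT for translation from scratch, MT for post-editing) and the relative speed gain of MT over HT. The `SegmentEvent` fields and the notion of "active segment time" are assumptions for illustration; they do not describe the iOmegaT log format.

```python
from dataclasses import dataclass


@dataclass
class SegmentEvent:
    translator: str
    condition: str   # "HT" (translation from scratch) or "MT" (post-editing)
    words: int       # source words in the segment
    seconds: float   # time the segment was active in the editor (assumed metric)


def words_per_hour(events, translator: str, condition: str) -> float:
    """Throughput for one translator under one condition."""
    selected = [e for e in events
                if e.translator == translator and e.condition == condition]
    total_seconds = sum(e.seconds for e in selected)
    if total_seconds <= 0:
        raise ValueError(f"no timed segments for {translator}/{condition}")
    return sum(e.words for e in selected) * 3600 / total_seconds


def speed_gain_percent(ht_wph: float, mt_wph: float) -> float:
    """Relative throughput change of MT over HT, in percent."""
    return (mt_wph - ht_wph) / ht_wph * 100
```

Computing the figures per translator, rather than pooling everyone, keeps differences in individual translator speed from masking the effect of MT.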


Instrumentation

Example: Dell MT versus HT (Human Translation), carried out at Welocalize

Typical large translation project with 20 translators in 10 languages

Languages: Simplified Chinese, Chinese (Taiwan), French, Italian, German, Spanish, Czech, Russian, Portuguese, Brazilian Portuguese (40 days in total)

Quality checks for all languages

[Chart: HT Words/Hr versus MT Words/Hr per translator, y-axis 0 to 1200 words/hr; translators: ch_si_tr1, ch_si_tr2, ch_ti_tr2, ch_ti_tr1, de_de_tr1, de_de_tr2, es_es_tr1, es_es_tr2, es_lx_tr1, es_lx_tr2, fr_ca_tr1, fr_ca_tr2, fr_fr_tr2, fr_fr_tr1, pt_br_tr1, pt_br_tr2, pt_pt_tr1, pt_pt_tr2, ru_ru_tr2]

MT versus Human Translation (base case)

17%+


MT versus MT

Goal: measure the impact of MT on translation speed on an ongoing basis

More information at try-and-see-mt.org


OmegaT


OmegaT (recent/upcoming developments)

Team translation using SVN / Git (incl. notes feature and lemmatized glossary)

OmegaT support in GlobalSight

LSP adoption (e.g. Velior)

SDLXLIFF support


Summary

• Translation/post-editing provenance and instrumentation are becoming more important downstream

• Open source is gaining industry traction

• ITS information flows mainly from content to translator. Can provenance and NLP help facilitate terminology creation?