Workshop on the
Generation of
Multilingual Parallel Documents
European Commission
Luxembourg, 3 April 2017
M.T. Carrasco Benitez
02-04-2017 Carrasco 1
Three fluid parts
* Touristic tour
* Gory details
* Future directions
Philosophy
* Restate the obvious
* Lie if it helps
* Create practical systems - cost
* Philosophy: Unix - Internet
* Disclaimer: mostly copied
http://www.catb.org/esr/writings/taoup/html
https://www.ietf.org/tao.html
Objectives
* Automatic generation in all languages
* Operation: no human intervention
* Manual vs. industrial production
Scenario 1 - automatic
* Let's dream
* Monthly Data Foo
* Fully automatic generation
* Unix cron - first day of each month
* Multilingual parallel set - parset
* 24 PDF files
* A box with 24 paper publications
Scenario 1 - best case scenario
* Periodic publications
* Lots of text reuse
* Complicated typography
* Lots of error-prone data
Example - TEC
* Not new
* 1988
* Travaux en Cours (TEC)
* European Parliament
* Command line interface - CLI
* Standalone system
* Used for several years
Example - CIBA
* Mid 90s
* Common Integrated Budgetary Application (CIBA)
* EU Budget
Applicability
* Not for all publications
* Even 10% very useful
* Typically more
* Often the hard-to-produce publications
ATP
* Authoring, Translation and Publishing
* Before: preparation - terminology
* After: archiving - data reuse - ready
* Adapt the process to generation
Translation - definition
* Reading -> translate -> writing
* The rest is not translation
* Translators are not typographers
* Translation starts with authoring
* Adapt source text to translation and generation
Translation - dimensions
* Quality
* Speed
* Cost
* Cost = Quality + Speed
* EU: estimation
* 1952 to 2017: 46 billion
http://dragoman.org/cubero
Scenario 2 - typified
* Typified publications
* Types of documents in EUR-Lex
* Example: EU regulations
http://eur-lex.europa.eu
Scenario 2 - reuse
* Text reuse
* Continuum - 0% to 100%
* CIBA: reuse 60% to 85%
* CIBA: best case - 15% new text
* Main work is management and typography
* Real translation is minor
Scenario 2 - comparison
* Scenarios: 1 vs. 2
* Scenario 1: 100%
* Scenario 2: typified - lots of reuse
Scenario 2 - CAA
* Computer-aided authoring
* Text in context - new texts
* Human friendly system
* Layers - similar to cartography
* EU legislation + regulation + agriculture + foo
* Controlled vocabulary - terms - segments
Scenario 2 - CAA - new texts
* Author submits text
* System returns suggestions - segments
* Author accepts/rejects
* Suggestions already translated to all languages
* Similar to on-demand spelling checker
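The suggestion loop above can be sketched with a small in-memory translation memory. This is only an illustration under stated assumptions: the `MEMORY` table, its sample segments and the `suggest` function are hypothetical, and fuzzy matching via the standard `difflib` module stands in for whatever matching a real system would use.

```python
from difflib import get_close_matches

# Hypothetical translation memory: each stored source segment already has
# translations in the other languages. All texts are invented samples.
MEMORY = {
    "This Regulation shall enter into force on the day of its publication.": {
        "es": "El presente Reglamento entrará en vigor el día de su publicación.",
        "fr": "Le présent règlement entre en vigueur le jour de sa publication.",
    },
    "This Regulation shall be binding in its entirety.": {
        "es": "El presente Reglamento será obligatorio en todos sus elementos.",
        "fr": "Le présent règlement est obligatoire dans tous ses éléments.",
    },
}


def suggest(draft, memory=MEMORY, limit=3, cutoff=0.6):
    """Return stored segments similar to the author's draft, best match first."""
    return get_close_matches(draft, list(memory), n=limit, cutoff=cutoff)


draft = "This regulation enters into force on the day of its publication."
suggestions = suggest(draft)
for segment in suggestions:
    # The author would accept or reject each suggestion; accepting one
    # brings its existing translations along.
    print(segment, "->", MEMORY[segment])
```

Accepting a suggestion replaces the new wording with a segment that needs no new translation, which is the point of the "No new translations" goal on the following slide.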
Scenario 2 - CAA - texts/translation
* New texts for the publication
* No new translations
* Source: translation memory silos
* Silo data size: teras or petas
* Silos: hard for interactive systems
Complexity
* Engineering, not science - MT
* Deceptively simple
* More complex than it looks
Related techniques - monolingual
* From markup to presentation
* troff
* ReSpec
* Bikeshed
https://www.w3.org/respec
https://github.com/tabatkins/bikeshed
Related techniques - merging
* Mail merge
* Template processor
* Data + template = documents
* Often monolingual
Related techniques - DITA
* Darwin Information Typing Architecture
* Inheritance
* Reuse
Related techniques - CMS
* Web content management systems
* Templates
Related techniques - multilingual
* Internationalization - I18N
* Localization - L10N
* Unix environment variables: LANG LC_*
* Unix commands: locale gettext
* Common Locale Data Repository
CLDR.unicode.org
Simple overview
* Table: es - en - fr - equal
* Template with numbers of the table
* For each language: es - en - fr
* Replace numbers with the segments
* One output file per lang: es - en - fr
* Human computer
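The steps above can be sketched in a few lines. This is a minimal illustration, not the author's system: the `TABLE` contents, the `{number}` placeholder syntax and the `generate` function are assumptions made for the example.

```python
import re

# Hypothetical segment table: one row per number, one column per language,
# plus an "equal" column for values shared by all languages.
TABLE = {
    1: {"es": "Informe mensual", "en": "Monthly report", "fr": "Rapport mensuel"},
    2: {"equal": "2017/1"},
}
LANGS = ["es", "en", "fr"]

# The template refers to segments only by their number in the table.
TEMPLATE = "<title>{1} {2}</title>"


def generate(template, table, lang):
    """Replace each {number} in the template with the segment for `lang`."""
    def lookup(match):
        row = table[int(match.group(1))]
        # Fall back to the language-independent "equal" value.
        return row[lang] if lang in row else row["equal"]
    return re.sub(r"\{(\d+)\}", lookup, template)


# One output per language; a real run would write each one to its own file.
outputs = {lang: generate(TEMPLATE, TABLE, lang) for lang in LANGS}
for lang, text in outputs.items():
    print(f"{lang}: {text}")
```

Because every language is produced from the same template, the outputs have the same structure by construction; a number missing from the table raises a `KeyError` instead of silently producing a gap.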
Flat level XML
* Entities: &foo; - [foo]
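One reading of this slide is that a segment code can be written either as an XML entity (`&foo;`) or as a bracket code (`[foo]`). Under that assumption, a flat expansion of both notations might look like the sketch below; the `expand` function and the sample segment are hypothetical.

```python
import re

# Matches either an XML-style entity (&name;) or a bracket code ([name]).
PLACEHOLDER = re.compile(r"&(\w+);|\[(\w+)\]")


def expand(text, segments):
    """Expand known placeholders; leave unknown ones untouched."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return segments.get(name, match.group(0))
    return PLACEHOLDER.sub(lookup, text)


segments_fr = {"foo": "règlement"}
print(expand("<p>&foo;</p>", segments_fr))
print(expand("<p>[foo]</p>", segments_fr))
```

Names that are not in the segment table, including predefined XML entities such as `&amp;`, pass through unchanged.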
Parstruct
* Pardoc structure
* Needed for real world complex systems
* Contain all items
* Abstract structure
Parstruct - instantiation
* Abstract structure
* Many instantiations
* Reference instantiation - filesystem
* Easy to produce and consume
* Other instantiations: SQLite - filesystem with SQLite - URI - XML
* The structure matters; how it is instantiated is secondary
Parstruct - URI
* http://es.example.com/foo
* foo: an identifier
* Output should be easy to use - low cost
* Human and machine readable
Parstruct - creation
* Creation and maintenance
* Artefacts and data
* Raw editing might be realistic for tabular publications - TEC
* Command line interface - CLI
* Graphical user interface - GUI
* Generated programmatically
* Parstruct is an interface http://arxiv.org/pdf/0808.3889
Parstruct - levels
* Three levels
* Compromise: complexity vs. cost
[1] parcommon - parallel common - to all
[2] partypes - parallel types - reg
[3] parsets - parallel set - reg 2017/1
Parstruct - example
[1] parcommon
      c-definition
      partypes
[2]     regulation
          t-definition
          parsets
[3]         regulation 2017/1
              s-definition
              docs: es - en - fr
[3]         regulation 2017/2
[2]     directive
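As an illustration of the filesystem instantiation, the tree above could be created as a directory skeleton. The layout below is an assumption: directory names follow the slide's labels, the definitions are modelled as directories, and `2017/1` is written `2017-1` because a slash cannot appear inside a file name.

```python
import tempfile
from pathlib import Path

# Hypothetical layout mirroring the three levels of the example tree.
LAYOUT = [
    "parcommon/c-definition",
    "parcommon/partypes/regulation/t-definition",
    "parcommon/partypes/regulation/parsets/2017-1/s-definition",
    "parcommon/partypes/regulation/parsets/2017-1/docs",
    "parcommon/partypes/regulation/parsets/2017-2",
    "parcommon/partypes/directive",
]


def create_parstruct(root):
    """Create the directory skeleton under `root` and return the created paths."""
    created = []
    for relative in LAYOUT:
        path = Path(root) / relative
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    create_parstruct(tmp)
    # List everything that now exists, relative to the temporary root.
    tree = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
    print("\n".join(tree))
```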
Parstruct - components
* Components nodes: parcommon - partypes - parsets
* Abbreviated to "components"
* Components are the roots of the main tree and the subtrees
Definitions - components
* Each component requires a definition
* parcommon: one definition - optional
* partypes: per type - reg, directive
* parsets: per set - reg 2017/1, 2017/2
Definition - structure
* artefact: schema, template, style
* data
* equal: fix - number - date - ref - quote - synonym - chunk
* lang: es - en - fr
Definition - artefact
* parstruct -> X-definition -> artefact
* parcommon: entities/vocabularies - include in schema.dtd
* partypes: schema.dtd - reg
* parsets: template.xml - reg 2017/1
Definition - schemas
* parstruct -> t-definition -> artefact -> schema
* It must be adapted to pardoc
* Fine granularity
* No presentation items
* Vocabularies: elements common to related schemas
* Related schemas: regulation - directive
* Vocabulary: Interinstitutional Format Committee (IFC) http://publications.europa.eu/mdr/ifc
Definition - lang
* parstruct -> X-definition -> data -> lang
* Files with parallel segments
* Files: es - en - fr
* Morphologically independent
* Phrase - multitoken term - token
Definition - equal
* parstruct -> X-definition -> data -> equal
* equal segment for all languages - codes
* A good ally for pardoc
* equal: fix - number - date - ref - quote - synonym
* Keep value as attribute - ISO date
Definition - equal - fix
* Presentation fix: 1000
* XML concatenation: &year;/&serial;
* Year=2017
* Serial=1
* Presentation fix: 2017/1
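The concatenation can be reproduced with plain XML entities declared in an internal DTD subset. The sketch below is an assumption about how such a document might look (the `regulation` and `number` element names and the `build_document` helper are invented); it relies on the standard-library parser expanding internal entities.

```python
import xml.etree.ElementTree as ET


def build_document(year, serial):
    """Return XML whose internal DTD subset declares the two entities."""
    return (
        "<!DOCTYPE regulation [\n"
        f'  <!ENTITY year "{year}">\n'
        f'  <!ENTITY serial "{serial}">\n'
        "]>\n"
        "<regulation><number>&year;/&serial;</number></regulation>"
    )


# The parser expands &year; and &serial; while reading the document.
root = ET.fromstring(build_document(2017, 1))
number = root.findtext("number")
print(number)
```

The same fixed value is then available to every language version without being typed again.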
Definition - equal - L10N
* L10N: localization
* Localizable equal: number - date
* Presentation number: 1.000 - 1,000
* Some transformations possible with XML
* Others need more advanced processing
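A localizable number can be stored once and presented per language. The sketch below is illustrative only: the separator table is hand-written and assumed (the `fr` entry in particular), whereas a real system would take such conventions from a locale database such as CLDR.

```python
# Assumed grouping separators per language; "es" and "en" follow the slide,
# "fr" (a no-break space) is an assumption added for the example.
GROUP_SEPARATOR = {"es": ".", "en": ",", "fr": "\u00a0"}


def present_number(value, lang):
    """Format an integer with the grouping separator assumed for `lang`."""
    grouped = f"{value:,}"  # Python groups thousands with "," here
    return grouped.replace(",", GROUP_SEPARATOR[lang])


for lang in ("es", "en", "fr"):
    print(lang, present_number(1000, lang))
```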
Definition - equal - ref
* References
* Replace code by full reference
* ReSpec
* Normative: [[!RFC3986]]
* Informative: [[RFC3986]]
* Database - specref.org
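Replacing a code by its full reference can be sketched as a lookup against a reference table. The `REFERENCES` dictionary below is a hypothetical local stand-in for a database such as specref.org, and `expand_references` is an invented helper; only the `[[KEY]]` and `[[!KEY]]` notations come from the slide.

```python
import re

# Hypothetical local stand-in for a reference database.
REFERENCES = {
    "RFC3986": 'T. Berners-Lee, R. Fielding, L. Masinter. "Uniform Resource '
               'Identifier (URI): Generic Syntax". RFC 3986, January 2005.',
}

# [[!KEY]] marks a normative reference, [[KEY]] an informative one.
CODE = re.compile(r"\[\[(!?)(\w+)\]\]")


def expand_references(text, references=REFERENCES):
    """Replace reference codes by full references and record the cited keys."""
    cited = {"normative": [], "informative": []}

    def lookup(match):
        kind = "normative" if match.group(1) else "informative"
        key = match.group(2)
        cited[kind].append(key)
        return references[key]

    return CODE.sub(lookup, text), cited


text, cited = expand_references("Normative reference: [[!RFC3986]]")
print(text)
print(cited)
```

An unknown key raises a `KeyError`, so a missing reference is noticed at generation time.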
Definition - equal - quote
* Quotes
* Similar to ref
* Legal texts
* Database
Definition - equal - synonym
* Code to one of several synonyms
* Several methods - random
* Comments for student evaluations
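A minimal sketch of the random method, assuming a hypothetical `SYNONYMS` table with invented evaluation comments; the code name `good_work` and the `pick_synonym` helper are illustrative, not part of any described system.

```python
import random

# Hypothetical synonym table: one code, several interchangeable comments.
SYNONYMS = {
    "good_work": ["Good work.", "Well done.", "A solid piece of work."],
}


def pick_synonym(code, rng):
    """Choose one of the synonyms registered for `code` at random."""
    return rng.choice(SYNONYMS[code])


rng = random.Random(2017)  # seeded so repeated runs give the same choices
comments = [pick_synonym("good_work", rng) for _ in range(3)]
print(comments)
```

Seeding the generator keeps a regenerated set of documents stable; an unseeded generator would vary the wording on every run.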
Definition - equal - chunk
* A section or page
* It might contain markup
* Topic - DITA
Run program
* Completed parstruct
* Generate: parstruct -> programs -> output
* Warning about unavailable data
* Guaranteed parallelity
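The warning about unavailable data can be pictured as a check that every language has every segment before anything is generated. The sketch below is hypothetical: the segment identifiers, the sample data and the `missing_segments` function are assumptions for illustration.

```python
LANGS = ["es", "en", "fr"]


def missing_segments(segments, langs=LANGS):
    """Return {lang: sorted ids missing in that language}; empty if parallel."""
    all_ids = set()
    for lang in langs:
        all_ids.update(segments.get(lang, {}))
    gaps = {}
    for lang in langs:
        absent = sorted(all_ids.difference(segments.get(lang, {})))
        if absent:
            gaps[lang] = absent
    return gaps


# Sample data: the French file lacks the "intro" segment.
segments = {
    "es": {"title": "Informe", "intro": "Introducción"},
    "en": {"title": "Report", "intro": "Introduction"},
    "fr": {"title": "Rapport"},
}
gaps = missing_segments(segments)
for lang, ids in gaps.items():
    print(f"warning: {lang}: unavailable segments {ids}")
```

An empty result means all language versions can be generated from the same set of segments.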
Content output
* Reference canonical markup: clean XML
* Clean XML: simple - no presentation - fine granularity
* Further processing - presentation
* Canonical markups are interfaces
Content output - others
* Other canonical markups
* Procedural: TeX - PostScript - troff
* Descriptive: XML - HTML
* Lightweight: Markdown - wiki
* JSON
* Clean XML could be transformed
Presentation output
* Good compromise: XML + CSS
* Web: HTML + CSS
* CSS: Cascading Style Sheets
Further output
* Table of contents - TEC
* Generic approach
* Not specific TEC-like
Grey zone
* In content or presentation
* Automatic numbering - CSS
* equal transformations
http://dragoman.org/laf
Programs
* medinfo
* medrun
* medfoo
* Command line interface - CLI
* Spartan system
* TEC system was simpler
* Used for several years
Libraries
* Transformations
* Equal
* 2017-03-30T16:25:10Z
* Thu 30 Mar 2017 18:25:10 CEST
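The two timestamps above denote the same instant: the stored ISO value in UTC and a presentation form in local time. A library transformation between them might be sketched as follows; the fixed UTC+2 offset labelled `CEST` and the `present_timestamp` function are assumptions made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+2 offset labelled "CEST"; a real library would use
# a time zone database to pick the offset for the date in question.
CEST = timezone(timedelta(hours=2), "CEST")


def present_timestamp(iso_utc, zone=CEST):
    """Convert an ISO 8601 UTC timestamp (ending in Z) to a readable local form."""
    moment = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ")
    moment = moment.replace(tzinfo=timezone.utc)
    # Day and month names follow the process locale (English by default).
    return moment.astimezone(zone).strftime("%a %d %b %Y %H:%M:%S %Z")


presented = present_timestamp("2017-03-30T16:25:10Z")
print(presented)
```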
Package or perish
* Organise files
* Filesystem naming convention
* Filesystem Hierarchy Standard - Linux
* Root directory: foo
* One file: foo.med - zip
* zip: .docx - .odt
* Tree - XML - XLIFF
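Packaging a root directory `foo` into a single zip-based file `foo.med` can be sketched with the standard `zipfile` module. The directory contents and the `package` function are hypothetical; the actual MED layout is not specified here.

```python
import tempfile
import zipfile
from pathlib import Path


def package(root, target):
    """Zip every file under `root` into `target`, keeping relative paths."""
    root = Path(root)
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store names relative to the parent so "foo/" stays the root.
                archive.write(path, path.relative_to(root.parent).as_posix())
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "foo"
    (root / "docs").mkdir(parents=True)
    (root / "docs" / "en.xml").write_text("<doc/>", encoding="utf-8")
    archive_path = package(root, Path(tmp) / "foo.med")
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
print(names)
```

This mirrors how `.docx` and `.odt` files are zip archives with an internal tree.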
MED
* Multilingual Electronic Dossier
* Package
* Contains pardoc and the programs
http://dragoman.org/med
Future directions - pargen
* Open source generic system - pargen
* For many types of documents
* Not specific systems: TEC-like
* Specific systems on top of pargen
Approach
* Mostly an organisational endeavour
* Not an engineering challenge
* Long-term - forever
Governance
* Foundation
* Copy: Linux, Mozilla, Apache
* Stakeholders: EU - UN - researchers - vendors
Ecosystem
* Copy successful approaches
* Governance and technical
* Internet - IETF
Interoperability
* Internet technologies
* Connection to other systems
* Enterprise Resource Planning (ERP)
Interfaces
* Parstruct
* Canonical markups
Development style
* Running code wins - IETF
* Standard <-> code
Standalone - CLI
* If one cannot implement an existing document in standalone, forget it
* For example: a report from the European Court of Auditors (ECA)
* Later move to new documents
* Mockup regulation
* Real regulation
* Bigger: all the ECA reports - mining
Server
* Cooperation
* Next system: web-based + XML
Many implementations
* At least two independent implementations - IETF
* Generic systems - not a TEC-like dedicated system
Email
* Good illustration
* Simpler than pardoc
* Evolving for over half a century
Email - components
* Format: email
* Protocol: SMTP - servers
* Protocols: POP3, IMAP - client
Email - format
* Uses not originally intended
* Mailing lists - HTML
End