Thematic Network: ENBI, European Network for Biodiversity Information

Objective: To manage an open network of relevant biodiversity information centres in Europe and other countries of the western European Palearctic region.

Task IfM: To provide multi-lingual access to biodiversity information on the Internet.

The work plan is focused on user needs and on making European biodiversity information available to end users. The users include government agencies, decision makers, legislators, scientists, companies, and citizens. Non-European users also depend heavily on access to European information, because many data in European repositories originate from non-European (often developing) countries. Understanding the needs of all these kinds of users is paramount for the dissemination of biodiversity knowledge resources, and common access, with attention to multilingual access (WP 11), is a key issue.


Post on 03-Jan-2016



Thematic Network: ENBI – European Network for Biodiversity Information

ENBI's main objective is to establish a strong network that will identify biodiversity information priorities to be managed at the European scale. ENBI is also the European contribution to the Global Biodiversity Information Facility (GBIF). ENBI is an EC-supported Thematic Network accommodating 65 European institutes representing 24 countries. The project started on 1 January 2003 and will run for 3 years. Total project budget: approx. 3 million euro.

Task IfM: WP 11, Multi-lingual access for European biodiversity site

Contribution to IfM (Coordination of WP 11): ~ 250,000 Euro

What is GBIF?

The GBIF Mission

The purpose of the Global Biodiversity Information Facility (GBIF) is to make the world's biodiversity data freely and universally available.

GBIF works cooperatively with and in support of several other international organizations concerned with biodiversity. These include the Clearing House Mechanism and the Global Taxonomic Initiative of the Convention on Biological Diversity, and regional biodiversity information networks.

Participants in GBIF have signed the Memorandum of Understanding, and support network Nodes through which they provide data.

The GBIF Vision

GBIF contributes to economic growth, ecological sustainability, social outcomes and scientific research by increasing the utility, availability and completeness of primary scientific biodiversity information available on the Internet.

Convention on Biological Diversity (Rio, 1992): the "Clearing-house mechanism" is meant to ensure that all governments have access to the information and technologies they need for their work on biodiversity.

WP 11. Multi-lingual access for European biodiversity site

Start date or starting event: March 2003

Name of the partner responsible: Institute of Marine Research at the University of Kiel, Co-ordinator WP 11, Bernd Ueberschär

N° of the partner responsible: P11

N°s of other partners involved: P1, P8, P15, P21, P31, P42, P48, P51, P62

Person-months per partner: (P11: 24)

Objectives

•Identify biodiversity terms (biology, morphology, taxonomy, geography, genetics) that need special dictionaries for proper translation, as indicated by unsatisfactory translation by the existing service

•Translate these terms into 8 European languages in a format that can be used efficiently for machine translation

•Explore options to build a prototype biodiversity glossary that can be used by European biodiversity web sites

•Explore options to significantly improve access to biodiversity information through vernacular names

•Define recommendations on how to tackle these issues in subsequent dedicated work packages, including finding a long-term host for the biodiversity translation service

Introduction WP11

80% of the Internet's content is in English.

43% of Internet users today cannot read English at all.

•Multi-lingual access will be provided to European biodiversity sites through a user-friendly interface on the World Wide Web (Dutch, French, German, Greek, Italian, Portuguese, and Spanish).

•Traditional 'manual' translation is not an option. Rather, machine translation on demand has to be applied. The quality of machine translation varies greatly. Results can be drastically improved if specialized dictionaries are available for the topic in question, and certain terms are excluded from translation.

•The Translation Service of the European Commission (SDT) has developed its own machine translation system, starting in the 1970s and building on the Systran engine, currently supporting 8 European languages and 18 language pairs. The service provides machine translation of documents for registered users.

•SDT plans to add machine translation of web pages to its services in 2002. This work package will create special biodiversity dictionaries to be integrated in the machine translation service of the European Commission. It will improve access to biodiversity information through vernacular instead of scientific names. It will provide a glossary explaining unfamiliar biodiversity terms in 8 European languages.

•European biodiversity web sites can avail of this service by showing a 'Translate' button on their pages. The activities in this work package will provide vernacular names to the GBIF Electronic Catalogue of Names.

The principle of machine translation: a raw translation of a document, from a source language into a target language, is made on the basis of a system of dictionaries and linguistic programs (e.g. SYSTRAN).

Multi-lingual access for European biodiversity sites through a user-friendly interface on the World Wide Web is the main issue of WP 11.

Machine translation quality depends mainly on the kind of document (with typing errors or complex syntax, the result will be poor), on the similarity of the languages involved, and on the specific dictionaries available.

Machine translation quality
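The two levers named above, a specialized dictionary and terms excluded from translation, can be illustrated with a small Python sketch. It is not how EC SYSTRAN works internally: the function names, the placeholder token format and the sample entries are all assumptions made for illustration, and the `engine` argument is a stand-in for a real machine translation call.

```python
import re

# Illustrative data only (not taken from the project's dictionaries).
PROTECTED = ["Gadus morhua", "FishBase"]          # terms to exclude from translation
GLOSSARY_EN_DE = {"caudal fin": "Schwanzflosse",  # specialized dictionary, English to German
                  "dorsal fin": "Rückenflosse"}


def protect_terms(text, terms):
    """Swap each protected term for a placeholder token; return text and token map."""
    mapping = {}
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        token = f"[[{i}]]"  # hypothetical token format; a real engine may need another
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original protected terms back in place of their tokens."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


def apply_glossary(text, glossary):
    """Replace glossary source terms (longest first, whole words, any case)."""
    for source in sorted(glossary, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(source) + r"\b", re.IGNORECASE)
        target = glossary[source]
        text = pattern.sub(lambda match, t=target: t, text)
    return text


def translate_with_dictionary(text, glossary, protected, engine=lambda s: s):
    """Mask protected terms, apply the glossary, call the engine, unmask."""
    masked, mapping = protect_terms(text, protected)
    masked = apply_glossary(masked, glossary)
    return restore_terms(engine(masked), mapping)


result = translate_with_dictionary(
    "The caudal fin of Gadus morhua is forked.", GLOSSARY_EN_DE, PROTECTED
)
```

With the default identity engine, only the glossary term is replaced and the scientific name passes through unchanged. In a production system the dictionary would be integrated into the translation engine itself rather than applied as a separate pass.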

About Machine translation

SYSTRANLinks is a turnkey website translation solution. SYSTRANLinks transforms standard websites and content applications into interactive multilingual hubs, all within seconds. SYSTRANLinks offers all major European, Asian and Russian languages.

Feature                      Gold 1        Gold 2
Number of languages          All           All
Available languages          All           All
Co-branding *                Logo          Logo
Number of requests per year  2,500,000     5,000,000
Fluid navigation             Yes           Yes
User Dictionary *            1,000         1,000
API mode                     Yes           Yes
Advanced features *          Yes           Yes
Customization *              Yes           Yes
Setup fee                    1,350 USD     2,700 USD
Annual fee                   12,150 USD    24,300 USD

Systran® offers the service SYSTRANLinks

On request, EC SYSTRAN can be made available to public authorities, schools and universities in EU Member States (permission has already been granted for the lifetime of this project).

The Commission has been developing the Systran machine translation system since 1976. EC-SYSTRAN has been developed for internal purposes and is therefore distinct from the version available commercially and on Web sites such as Altavista.

The system can produce 2 000 pages of raw translation per hour. Machine translation is accessible from a PC via a web interface and the electronic mail system. The translation is returned to the user within minutes.

The European Commission offers EC SYSTRAN

SYSTRAN Dictionary Manager guides users through the process of adding their own terms and expressions. It also allows glossaries to be imported and exported in text and Excel file formats, and custom domains to be created for greater term specificity.

Customizable Dictionaries
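Exchanging glossaries as text files can be sketched as follows. The exact file layout that SYSTRAN Dictionary Manager expects is not given here, so a tab-separated file with one column per language is assumed purely for illustration; the column codes and sample entries are likewise hypothetical.

```python
import csv
import io

LANGS = ["en", "de", "fr"]  # assumed column layout: one column per language code

# Illustrative entries only.
ENTRIES = [
    {"en": "dorsal fin", "de": "Rückenflosse", "fr": "nageoire dorsale"},
    {"en": "caudal fin", "de": "Schwanzflosse", "fr": "nageoire caudale"},
]


def export_glossary(entries, langs, stream):
    """Write glossary entries as tab-separated text with a header row."""
    writer = csv.DictWriter(stream, fieldnames=langs, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Missing translations are written as empty cells.
        writer.writerow({lang: entry.get(lang, "") for lang in langs})


def import_glossary(stream):
    """Read tab-separated glossary text back into a list of dicts."""
    return [dict(row) for row in csv.DictReader(stream, delimiter="\t")]


# Round trip through an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
export_glossary(ENTRIES, LANGS, buffer)
buffer.seek(0)
round_trip = import_glossary(buffer)
```

A flat one-row-per-term layout like this is easy for partner institutes to edit in a spreadsheet, which is one reason to prefer it over a nested format when several translators contribute.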

Eurodicautom is the European Commission's multilingual term bank. It is an invaluable tool for translators, interpreters, terminologists and other linguists worldwide, accessible over the Internet.

Eurodicautom covers a broad spectrum of human knowledge, but is particularly rich in technical and specialised terminology (agriculture, telecommunications, transport, legislation, finance) related to EU policy.

New data are added constantly, obtained from Commission terminologists, translators, linguists from other European and international institutions, research centres, publishers, private experts, etc.

Entries are classified into 48 subject fields (ranging from medicine to public administration). At present the term bank contains about five and a half million entries (terms and abbreviations), subdivided into more than 800 collections.

About Eurodicautom (the Translation Service's terminology bank)

Web Interface of Eurodicautom

Return Page of Eurodicautom (example request for "finfish")

Eurodicautom will help to translate biodiversity terms.

In turn, the project will add new data to the system.

Time Table WP11, Multilingual Access

[Gantt chart: tasks and deliverables plotted against project months 0 to 25 (Year 1 and Year 2), with two markers labelled "W"]

Tasks

•Identification of biodiversity terms that need special dictionaries for proper translation

•Usefulness of existing public-domain glossaries

•Options for providing vernacular names to the GBIF Electronic Catalogue

•Negotiation Author Contracts

•Integration of dictionaries with SDT machine translation. Connect, test, improve machine translation

•Development prototype of biodiversity glossary

•Proof of concept of vernacular names to GBIF

Deliverables

•Author contracts for all languages in place

•Workshop on how to use and to translate biodiversity dictionaries conducted, Report

•'How to proceed' workshop conducted. Report with documentation on how to further develop multi-lingual access to biodiversity

•Prototype Biodiversity Glossary available

•Beta version of translation service (M18). Online service for quality translation of biodiversity web pages (M 26). Test with FishBase

•Recommendations on how to access biodiversity information through common names available

•Glossaries identified, permissions obtained

•List of terms available in English, translation into German available

•Options explored, Report

1. Explore the (technical) options and abilities of the EC SYSTRAN system, e.g. how to compile specialized dictionaries, how to provide dictionaries to the EC SYSTRAN system, and how to proceed with "on-the-fly" website translation (meeting with the Translation Service arranged for Tuesday afternoon).

2. Compile a list of biodiversity terms (biology, morphology, taxonomy, geography, genetics) that need special dictionaries for proper translation, taking into account the needs and input of ENBI partners and using the Eurodicautom system.

3. Compile dictionaries of those terms in 8 languages (with European partners).

4. Apply the EC SYSTRAN system and test it with the FishBase website.

5. Make the technique available for other biodiversity databases.

6. Explore options to significantly improve access to biodiversity information through vernacular names.
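Task 6 can be pictured as a lookup index from vernacular names to scientific names. The sketch below is one possible data structure, not a description of the GBIF Electronic Catalogue of Names; the record layout, function names and sample records are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative records: (language code, vernacular name, scientific name).
RECORDS = [
    ("en", "Atlantic cod", "Gadus morhua"),
    ("de", "Kabeljau", "Gadus morhua"),
    ("de", "Dorsch", "Gadus morhua"),
    ("nl", "kabeljauw", "Gadus morhua"),
    ("en", "Atlantic herring", "Clupea harengus"),
    ("de", "Hering", "Clupea harengus"),
]


def normalize(name):
    """Collapse whitespace and ignore case so user input matches stored names."""
    return " ".join(name.split()).casefold()


def build_index(records):
    """Map (language, normalized vernacular name) to a set of scientific names."""
    index = defaultdict(set)
    for lang, vernacular, scientific in records:
        index[(lang, normalize(vernacular))].add(scientific)
    return index


def lookup(index, lang, name):
    """Return the scientific names known for a vernacular name, sorted."""
    return sorted(index.get((lang, normalize(name)), set()))


index = build_index(RECORDS)
matches = lookup(index, "de", "  kabeljau ")
```

Each key maps to a set because one vernacular name can refer to several species, and one species usually has several vernacular names per language.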

Results of the workshop: Timetable in place. Partners for specialized dictionaries/glossaries in place from France, Greece and Germany; still open for Spain, Portugal, Italy and Dutch. Website being launched (www.multilinguaweb.org). Date of first workshop fixed (beginning of October). Agreement with the EC Translation Service in place (to explore the feasibility of special needs for this project, e.g. web site translation "on the fly", encoding).

Provisional Summary of Tasks "Multilingual Access", WP11: