Localization Enablers Technology Development for Indian Languages (TDIL) Programme Department of Information Technology, Ministry of Communication & Information Technology Govt of India Swaran Lata, Director [email protected] Elitex-2008, January 17, 2008


Page 1

Localization Enablers

Technology Development for Indian Languages (TDIL) Programme

Department of Information Technology,

Ministry of Communication & Information Technology

Govt of India

Swaran Lata, Director

[email protected]

Elitex-2008, January 17, 2008

Page 2

Web based applications

Dynamic & static websites with search & cross-lingual access

Operating systems

Tools

Office Suites

Handheld devices

Mobile Devices

Stand alone applications

Globalization of IT

Page 3

Internationalization (I18N)

Process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for re-design.

Localization (L10N)

Taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.

GLOBALIZATION
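The split between the two definitions can be illustrated with a minimal Python sketch. The internationalized part is the lookup function, which never changes when a language is added; the localized part is the data in the catalog. The names `MESSAGES` and `translate`, the message keys and the sample strings are all hypothetical and chosen only for illustration.

```python
# Localization: per-language data, added without touching the code below.
MESSAGES = {
    "en": {"greeting": "Welcome", "thanks": "Thank you"},
    "hi": {"greeting": "स्वागत है", "thanks": "धन्यवाद"},
}


def translate(key, locale_tag, default_locale="en"):
    """Internationalization: one lookup routine that works for any locale.

    Only the language part of a tag such as "hi-IN" is used here, and
    missing languages or keys fall back to the default locale.
    """
    language = locale_tag.split("-")[0].lower()
    catalog = MESSAGES.get(language, {})
    return catalog.get(key, MESSAGES[default_locale][key])


if __name__ == "__main__":
    print(translate("thanks", "hi-IN"))    # found in the Hindi catalog
    print(translate("greeting", "ta-IN"))  # no Tamil catalog yet, falls back
```

Adding Tamil support in this sketch would mean adding a `"ta"` entry to the catalog, with no change to `translate`.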

Page 4

Focus of National Knowledge Commission of India

The National Knowledge Commission focuses on the objective of transforming India into a knowledge society.

It has concentrated on five focus areas of the knowledge paradigm:

Access, Creation, Concepts, Applications, Services

Information technology applications, services, tools and resources based on natural language processing techniques would be a key enabler for these five focus areas.

Page 5

ACCESS (Information Retrieval across Languages)

CREATION [Multilingual E-content]

SERVICES [e-Governance; G2C, G2G]

APPLICATIONS (Multilingual office tools & database)

CONCEPTS (Semantic Web)

NLP

Local area portals for globalizing local knowledge: digitize earlier existing communities

Page 6

Localization activities

• Deployment
• Review QA
• Functionality testing of localized software or web applications
• Locale data substitution
• Project management
• Translation and engineering of software
• Translation, engineering, and testing of online help, web content, documentation, multimedia, icons, etc.

Internationalized application to be localized

Localized application

Page 7

The Tree of Localization Complexities

• Presentation of dates, times, numbers, lists, and other values
• Collation and sorting
• Alternate calendars, which may include holidays, work rules, weekday/weekend
• Currency
• Tax or regulatory regime

• Machine Translation
• Optical Character Recognition
• Speech Technologies
• Cross Lingual Information Retrieval

• Project Management
• Translation Memory
• Translation Tools
• Natural language for text processing: parsing, spell checking, grammar checking, etc.
• Automatic Testing Tools

• Encoding Standards
• Multimodal input device standards
• Fonts & Rendering Engines
• Transliteration & Translation

• Guidelines
• Best Practices
• Case Studies
• Consultancy
• Showcasing of Tools & Technologies

• Parallel Corpora
• Speech Corpora
• Lexical resources
• Ontologies
• Dictionaries
• Thesaurus
• Reference Terminologies

• Certified Localization professionals
• PG Specialization in Localization
• PhD Programmes

• Minimizing time lag
• Benchmarking w.r.t. English version
• Political sensitivity
• Pricing issues

• Testing methodologies
• Metrics for Linguistic Testing
• Certification by Government for linguistic compliance
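As one concrete instance of the "presentation of numbers" item in this list, the sketch below groups digits the way they are conventionally written in India (the last three digits, then groups of two, as in lakh and crore) and contrasts that with the thousands grouping built into Python's format mini-language. The function name `group_indian` is hypothetical, and the sketch handles integers only.

```python
def group_indian(n: int) -> str:
    """Group digits Indian-style: 123456789 becomes 12,34,56,789."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    # Peel two digits at a time off the right end of the remaining head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    groups.insert(0, head)
    return sign + ",".join(groups + [tail])


if __name__ == "__main__":
    value = 123456789
    print(f"{value:,}")         # thousands grouping: 123,456,789
    print(group_indian(value))  # Indian grouping:    12,34,56,789
```

A production system would more likely take such rules from a locale data repository than hard-code them; the hand-written rule here only shows why the same value needs different presentation per locale.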

Page 8

Guidelines for enhancing the Localizability

• Design and develop information and applications in a way that meets the needs of the international user

• Design that allows for easy localization at the point of need

• Means to reduce the cost and length of localization

• Checklists grouped by task and supported by backup examples and explanations, for example browser feature applicability charts:

Which browsers and browser versions support which i18n features (e.g. ruby, bidi, UTF-8, lang attribute, :lang, white-space handling, writing-mode: lr-tb, etc.). This would help implementers build pages that use the most up-to-date internationalization features appropriate to their audience without the pain of trial and error (or, perhaps more likely, erring too far on the side of caution).

• Use of constructs in existing markup languages (e.g. (X)HTML) either to enable interoperability in a globalized system or to improve the localizability of data; for example, avoid the use of deprecated HTML tags.
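A checklist item of this kind can be partly automated. The following sketch uses Python's standard `html.parser` to report three things for a page: whether the `<html>` element declares a `lang` attribute, whether a `<meta charset>` declaration is present, and whether any deprecated presentational tags are used. The class name `I18nChecker` and the helper `check_page` are hypothetical, and `DEPRECATED_TAGS` is only a small subset of the elements deprecated in HTML 4.01.

```python
from html.parser import HTMLParser

# A few of the elements deprecated in HTML 4.01 (illustrative subset).
DEPRECATED_TAGS = {"font", "center", "basefont", "strike"}


class I18nChecker(HTMLParser):
    """Collects a few internationalization-related facts about a page."""

    def __init__(self):
        super().__init__()
        self.lang = None       # value of <html lang="...">, if any
        self.charset = None    # value of <meta charset="...">, if any
        self.deprecated = []   # deprecated tags in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            self.lang = attributes.get("lang")
        elif tag == "meta" and "charset" in attributes:
            self.charset = attributes["charset"]
        if tag in DEPRECATED_TAGS:
            self.deprecated.append(tag)


def check_page(html_text):
    checker = I18nChecker()
    checker.feed(html_text)
    checker.close()
    return checker


if __name__ == "__main__":
    page = (
        '<html lang="hi"><head><meta charset="utf-8"></head>'
        '<body><center><font size="2">नमस्ते</font></center></body></html>'
    )
    result = check_page(page)
    print("lang:", result.lang)
    print("charset:", result.charset)
    print("deprecated tags:", result.deprecated)
```

The check is deliberately shallow: it does not verify that the declared language or encoding matches the actual content.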

Page 9

• I18n considerations applicable to document and UI design also include such things as navigation, screen space and layout, implementing graphics, creating source text, designing interoperable systems, choosing and implementing fonts and complex script rendering, multimedia design, handling data format conventions, supplying data for translation/localization, etc. For example:

a) Standard icons, allowing for regional variation, for pointing to a list of (or link to) country/language site selections.

b) Text-based approaches can be problematic in two ways:

– they may not be understood, which is often why the user is going to the selection list in the first place (e.g. how would the average American find the 'global sites' link on a page in Arabic or Japanese? These are not made-up examples.)

– they may make the user feel like his/her needs are secondary.

• Separation of localizable data from style sheets and templates, for example the use of CSS to separate presentation from content when designing websites.

• Guidelines focussing on content development, DTD design and stylesheet development relating to implementation in XHTML, XML, XSL, XSLT, CSS, XForms, SVG, and other similar specifications.

Page 10

• Guidelines for developing internationalized DTDs such as: white space handling, use of markup vs. Unicode control characters, use of alternative content or entities for different markets, provision of meta data to describe document structure for localization tools, provision of information about available space and other aspects of content affected by localization, the ability to tag terminology and semantics within content

• Language tags: RFC 3066 for 'language tagging' in XML and HTML has inherent difficulties in distinguishing between language and dialect, as well as historical variations. A way is needed to expand the language tag concept so that it adequately covers the locale- and script-oriented needs of the localization community, along with markup to support international script features (such as ruby and Arabic directionality).

• Internationalization tag set:

a) Develop a set of tags that others could use for creating DTDs.

b) Provide it in the form of a namespace for inclusion in a schema, or simply as a partial DTD and a set of recommendations.

c) Define a methodology for marking non-translatable content so that the localization tool can identify it automatically.
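Point (c) can be sketched with the `translate` attribute from the W3C Internationalization Tag Set (ITS) 1.0 namespace. The slide does not name ITS, so treating it as the intended mechanism is an assumption. The function `translatable_text` and the sample document are hypothetical; the sketch applies a simplified rule in which an element inherits the setting of its parent unless it carries its own `its:translate` value.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = f"{{{ITS_NS}}}translate"


def translatable_text(element, inherited=True):
    """Return the text fragments a localization tool should translate."""
    flag = element.get(TRANSLATE_ATTR)
    translate = inherited if flag is None else (flag == "yes")
    fragments = []
    if translate and element.text and element.text.strip():
        fragments.append(element.text.strip())
    for child in element:
        fragments.extend(translatable_text(child, translate))
        # Text after a child's closing tag belongs to the parent element.
        if translate and child.tail and child.tail.strip():
            fragments.append(child.tail.strip())
    return fragments


SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press <code its:translate="no">Ctrl+S</code> to save.</p>
  <p its:translate="no">TDIL</p>
</doc>"""

if __name__ == "__main__":
    for fragment in translatable_text(ET.fromstring(SAMPLE)):
        print(fragment)
```

In this sample the key name `Ctrl+S` and the programme name `TDIL` are skipped, while the surrounding sentence fragments are offered for translation.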

Page 11

Internationalized data formats

• Time and date formats are just two of many ways in which people represent the same or similar information differently. Other examples include numbers, currencies, temperatures, weights, dimensions, addresses, telephone numbers, personal names, paper sizes, etc.

• It would be great if there was a way of capturing this information in a non-culturally-specific way and rendering and (more difficult) recognising it automatically in a culture-specific format, that could be used by people implementing web based communication - be it web page forms or exchange of information between machines.

• The work involved in this is not trivial, but it is desperately needed. Whether the W3C should attempt to produce this or work with others to achieve it is for discussion, but either way I believe it would be very useful.
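The idea of capturing a value in a culture-neutral form and rendering or recognising it in a culture-specific form can be sketched for dates as follows. The neutral form here is an ISO 8601 string; the `DATE_PATTERNS` table and the functions `render_date` and `parse_date` are hypothetical, and the three patterns only reflect commonly seen conventions rather than data from any standard locale repository.

```python
from datetime import date, datetime

# Illustrative patterns only; a real system would use a locale database.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month first
    "en-IN": "%d-%m-%Y",  # day first
    "ja-JP": "%Y/%m/%d",  # year first
}


def render_date(iso_string, locale_tag):
    """Render a culture-neutral ISO 8601 date in a locale's convention."""
    value = date.fromisoformat(iso_string)
    return value.strftime(DATE_PATTERNS[locale_tag])


def parse_date(text, locale_tag):
    """Recognise a locale-formatted date and return it in ISO 8601 form."""
    value = datetime.strptime(text, DATE_PATTERNS[locale_tag]).date()
    return value.isoformat()


if __name__ == "__main__":
    stored = "2008-01-17"
    for tag in DATE_PATTERNS:
        print(tag, render_date(stored, tag))
    print(parse_date("17-01-2008", "en-IN"))
```

Recognition is the harder direction, as the slide notes: `parse_date` works here only because the caller states the locale, since a string such as 03/04/2008 is ambiguous without it.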

Page 12

धन्यवाद

Thank You