Localization Enablers
Technology Development for Indian Languages (TDIL) Programme
Department of Information Technology,
Ministry of Communication & Information Technology
Govt of India
Swaran Lata, Director
Elitex-2008, January 17, 2008
Web-based applications
Dynamic & static websites with search & cross-lingual access
Operating systems
Tools
Office suites
Handheld devices
Mobile devices
Stand-alone applications
Globalization of IT
Globalization encompasses both internationalization (I18N) and localization (L10N).
Internationalization (I18N): the process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign.
Localization (L10N): taking a product and making it linguistically and culturally appropriate to the target locale (country/region and language) where it will be used and sold.
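To make the I18N/L10N distinction concrete, here is a minimal Python sketch (the catalog contents, message keys and locale codes are hypothetical, chosen only for illustration). The code is internationalized once: it never hard-codes user-visible text and looks messages up by key. Localizing it for a new language then means supplying a new catalog, with no redesign of the program.

```python
# Minimal I18N/L10N sketch. The catalogs, keys and locale codes are
# hypothetical examples, not part of any TDIL specification.

# Localization data: one catalog per locale, kept outside the program logic.
CATALOGS = {
    "en": {"greeting": "Welcome, {name}!", "items": "You have {count} items."},
    "hi": {"greeting": "स्वागत है, {name}!", "items": "आपके पास {count} वस्तुएँ हैं।"},
}

DEFAULT_LOCALE = "en"


def translate(key: str, locale: str, **params: object) -> str:
    """Look up a message for `locale`, falling back to the default locale.

    The calling code contains no user-visible text, so supporting a new
    language means adding a catalog, not changing this function.
    """
    catalog = CATALOGS.get(locale, {})
    template = catalog.get(key, CATALOGS[DEFAULT_LOCALE][key])
    return template.format(**params)


if __name__ == "__main__":
    for loc in ("en", "hi", "ta"):  # "ta" has no catalog yet, so English is used
        print(loc, translate("greeting", loc, name="Asha"))
```

Both a missing locale (here "ta") and a message missing from a locale's catalog fall back to the English default, so a partially localized product still works.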
Focus of the National Knowledge Commission of India
The National Knowledge Commission focuses on the objective of transforming India into a knowledge society.
It concentrates on five focus areas of the knowledge paradigm:
Access, Creation, Concepts, Application and Services
Information technology applications, services, tools and resources based on natural language processing (NLP) techniques would be key enablers for these five focus areas.
NLP at the core of the five focus areas:
• ACCESS (information retrieval across languages)
• CREATION (multilingual e-content)
• SERVICES (e-governance: G2C, G2G)
• APPLICATIONS (multilingual office tools and databases)
• CONCEPTS (Semantic Web)
Local area portals for globalizing local knowledge: digitize the existing knowledge of local communities
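The ACCESS area (information retrieval across languages) can be illustrated with the simplest cross-lingual IR technique, dictionary-based query translation: each query term is replaced by its translations and the result is matched against documents in the other language. The sketch below is a toy under stated assumptions: the bilingual dictionary and the English documents are made up, and ranking is a plain count of matching terms.

```python
# Toy dictionary-based cross-lingual information retrieval (CLIR).
# BILINGUAL_DICT and DOCUMENTS are hypothetical data; real systems also need
# morphological analysis, translation disambiguation and proper ranking.

BILINGUAL_DICT = {
    "किसान": ["farmer", "farmers"],
    "बीज": ["seed", "seeds"],
    "योजना": ["scheme", "plan"],
}

DOCUMENTS = {
    "doc1": "subsidy scheme for farmers buying certified seeds",
    "doc2": "railway timetable for the northern zone",
    "doc3": "new plan to distribute seed kits",
}


def translate_query(query: str) -> set[str]:
    """Replace each Hindi query term by all of its English translations."""
    terms: set[str] = set()
    for word in query.split():
        terms.update(BILINGUAL_DICT.get(word, [word]))  # unknown words kept as-is
    return terms


def search(query: str) -> list[tuple[str, int]]:
    """Rank documents by how many translated query terms they contain."""
    terms = translate_query(query)
    scores = []
    for doc_id, text in DOCUMENTS.items():
        overlap = len(terms & set(text.split()))
        if overlap:
            scores.append((doc_id, overlap))
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    print(search("किसान बीज योजना"))
```

Keeping every dictionary sense ("scheme" and "plan") favours recall over precision; that trade-off is the usual weakness of this approach.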
Localization activities
Input: internationalized application to be localized
• Project management
• Translation and engineering of software
• Translation, engineering, and testing of online help, web content, documentation, multimedia, icons, etc.
• Locale data substitution
• Functionality testing of localized software or web applications
• Review and QA
• Deployment
Output: localized application
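Parts of the review/QA and functionality-testing steps can be automated. As a hedged example, the sketch below compares a hypothetical translated resource catalog with its source-language catalog. It reports missing or extra keys and placeholders that did not survive translation. The `{name}`-style placeholder syntax is an assumption.

```python
import re

# Placeholders are assumed to look like {name} or {count}.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def check_localized(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Report keys and placeholders that differ between source and target catalogs."""
    problems: list[str] = []
    for key, src_text in source.items():
        if key not in target:
            problems.append(f"missing key: {key}")
            continue
        # The same placeholders must survive translation, in any order.
        src_ph = sorted(PLACEHOLDER.findall(src_text))
        tgt_ph = sorted(PLACEHOLDER.findall(target[key]))
        if src_ph != tgt_ph:
            problems.append(f"placeholder mismatch in {key}: {src_ph} vs {tgt_ph}")
    # Keys present only in the translation usually indicate stale resources.
    for key in sorted(target.keys() - source.keys()):
        problems.append(f"unexpected key: {key}")
    return problems


if __name__ == "__main__":
    source = {"greeting": "Welcome, {name}!", "items": "You have {count} items.", "quit": "Quit"}
    # The translator has (wrongly) translated the placeholder name as well.
    target = {"greeting": "स्वागत है, {नाम}!", "items": "आपके पास {count} वस्तुएँ हैं।"}
    for problem in check_localized(source, target):
        print(problem)
```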
The Tree of Localization Complexities
• Presentation of dates, times, numbers, lists, and other values
• Collation and sorting • Alternate calendars, which may include holidays, work rules, and weekday/weekend conventions
• Currency • Tax or regulatory regime
• Machine translation • Optical character recognition • Speech technologies • Cross-lingual information retrieval
• Project management • Translation memory • Translation tools • Natural language processing of text: parsing, spell checking, grammar checking, etc.
• Automatic testing tools
• Encoding standards • Multimodal input device standards • Fonts and rendering engines • Transliteration and translation
• Guidelines • Best practices • Case studies • Consultancy • Showcasing of tools and technologies
• Parallel corpora • Speech corpora • Lexical resources • Ontologies • Dictionaries • Thesauri • Reference terminologies
• Certified localization professionals
• PG specialization in localization
• PhD programmes
• Minimizing time lag • Benchmarking with respect to the English version • Political sensitivity • Pricing issues
• Testing methodologies • Metrics for linguistic testing • Certification by the Government for linguistic compliance
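As one concrete instance of the first branch (presentation of numbers), the Indian numbering system groups the last three digits and then every two digits (thousand, lakh, crore), so the same value is written 12,34,56,789 rather than 123,456,789. A minimal sketch for non-negative integers follows; it is an illustration, not a substitute for CLDR-based formatting libraries.

```python
def group_western(n: int) -> str:
    """Group digits in threes, e.g. 123,456,789."""
    return f"{n:,}"


def group_indian(n: int) -> str:
    """Group the last three digits, then every two digits, e.g. 12,34,56,789."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = str(n)
    if len(digits) <= 3:
        return digits
    head, tail = digits[:-3], digits[-3:]
    groups: list[str] = []
    # Peel off two-digit groups (lakh, crore, ...) from the right of the head.
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])


if __name__ == "__main__":
    for value in (999, 100000, 123456789):
        print(group_western(value), group_indian(value))
```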
Guidelines for enhancing localizability
• Design and develop information and applications in a way that meets the needs of international users
• Design that allows for easy localization at the point of need
• Means to reduce the cost and duration of localization
• Checklists grouped by task and supported by examples and explanations, for example browser feature applicability charts:
Which browsers and browser versions support which i18n features (e.g. ruby, bidi, UTF-8, the lang attribute, the :lang selector, white-space handling, writing-mode: lr-tb). Such charts would help us implement pages that use the most up-to-date internationalization features appropriate to our audience, without the pain of trial and error (or, perhaps more likely, of erring too far on the side of caution).
• Use of constructs in existing markup languages (e.g. (X)HTML) either to enable interoperability in a globalized system or to improve the localizability of data, for example by avoiding deprecated HTML tags.
• I18n considerations applicable to document and UI design also include navigation, screen space and layout, implementing graphics, creating source text, designing interoperable systems, choosing and implementing fonts and complex-script rendering, multimedia design, handling data format conventions, and supplying data for translation/localization. For example, standard icons:
a) A standard icon, allowing for regional variation, that points to a list of (or links to) country/language site selections
b) Text-based approaches can be problematic in two ways:
– they may not be understood, which is often why the user is going to the selection list in the first place (e.g. how would the average American find the 'global sites' link on a page in Arabic or Japanese? These are not made-up examples!)
– they may make users feel that their needs are secondary.
• Separation of localizable data from style sheets and templates, for example using CSS to separate presentation from content when designing websites.
• Guidelines focusing on content development, DTD design and stylesheet development for implementation in XHTML, XML, XSL, XSLT, CSS, XForms, SVG, and other similar specifications.
• Guidelines for developing internationalized DTDs, covering issues such as white-space handling, use of markup vs. Unicode control characters, use of alternative content or entities for different markets, provision of metadata describing document structure for localization tools, provision of information about available space and other aspects of content affected by localization, and the ability to tag terminology and semantics within content.
• Language tags: RFC 3066, used for language tagging in XML and HTML, has inherent difficulties in distinguishing between language and dialect, as well as historical variations. What is needed is a way of expanding the language tag concept to adequately cover the locale- and script-oriented needs of the localization community, together with markup that supports international script features (such as ruby and Arabic directionality).
• Internationalization tag set:
a) Develop a set of tags that others could use when creating DTDs
b) In the form of a namespace for inclusion in a schema, or simply a partial DTD and a set of recommendations
c) A methodology for marking non-translatable content so that the localization tool can identify it automatically
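Item (c) can be illustrated with the local `translate` attribute that the W3C Internationalization Tag Set (ITS) 1.0 defines in the namespace http://www.w3.org/2005/11/its. The sketch below extracts the translatable text of a hypothetical XML document and skips content marked `its:translate="no"`. It handles only element content and local markup, not attributes or ITS global rules.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"  # ElementTree's "{namespace}local" form

# Hypothetical sample document.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the button to save.</p>
  <code its:translate="no">save_file()</code>
  <p>Product name: <span its:translate="no">TDIL</span> is free.</p>
</doc>"""


def translatable_text(elem: ET.Element, inherited: bool = True) -> list[str]:
    """Collect text runs whose nearest its:translate value is "yes" (the default)."""
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else flag == "yes"
    runs: list[str] = []
    if translate and elem.text and elem.text.strip():
        runs.append(elem.text.strip())
    for child in elem:
        runs.extend(translatable_text(child, translate))
        # Text after a child (its tail) belongs to the parent element.
        if translate and child.tail and child.tail.strip():
            runs.append(child.tail.strip())
    return runs


if __name__ == "__main__":
    print(translatable_text(ET.fromstring(SAMPLE)))
```

The flag is inherited by child elements, which mirrors the ITS default; a localization tool could use such a pass to protect code samples and product names automatically.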
Internationalized data formats
• Time and date formats are just two of many ways in which people represent the same or similar information differently. Other examples include numbers, currencies, temperatures, weights, dimensions, addresses, telephone numbers, personal names, paper sizes, etc.
• It would be valuable to have a way of capturing this information in a culture-neutral form, and of automatically rendering it and (more difficult) recognising it in a culture-specific format, for use by people implementing web-based communication, whether web page forms or the exchange of information between machines.
• The work involved in this is not trivial, but it is desperately needed. Whether the W3C should attempt to produce this or work with others to achieve it is for discussion, but either way I believe it would be very useful.
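A minimal sketch of the idea, assuming Python's `datetime.date` as the culture-neutral representation: the value is stored and exchanged as an ISO 8601 date and is only rendered or recognised in a culture-specific pattern at the user interface. The locale-to-pattern table is hypothetical and hand-written, and the fallback (de-AT to de, then to ISO 8601) is a simplified form of the RFC 4647 "lookup" scheme.

```python
from datetime import date, datetime

# Hypothetical, hand-written patterns; production code would use CLDR data.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",
    "en-IN": "%d/%m/%Y",
    "en-GB": "%d/%m/%Y",
    "de": "%d.%m.%Y",
}
NEUTRAL = "%Y-%m-%d"  # ISO 8601 calendar date, used for storage and exchange


def pattern_for(tag: str) -> str:
    """Find a pattern by truncating subtags from the end (de-AT -> de), else ISO."""
    subtags = tag.split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in DATE_PATTERNS:
            return DATE_PATTERNS[candidate]
        subtags.pop()
    return NEUTRAL


def render(d: date, tag: str) -> str:
    """Culture-neutral value -> culture-specific text."""
    return d.strftime(pattern_for(tag))


def recognize(text: str, tag: str) -> date:
    """Culture-specific text -> culture-neutral value."""
    return datetime.strptime(text, pattern_for(tag)).date()


if __name__ == "__main__":
    talk = date(2008, 1, 17)
    for tag in ("en-US", "en-IN", "de-AT", "hi-IN"):
        print(tag, render(talk, tag))
```

Recognition is the harder direction the slide points to: "01/02/2008" is a valid input under both the en-US and en-IN patterns but denotes different dates, so the locale must be known before parsing.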
धन्यवाद (Thank You)