

Presented By:

Prof. Manikrao L. Dhore

Mr. Abhishek K. Dhote

Department of Computer Engineering

Vishwakarma Institute of Technology, Pune, India

A Paper On

Automating the HTML Localisation Process: An Implementation Using a Java Internationalisation Approach

LRC-XI-11th Annual Internationalisation and Localisation Conference

Organised By:

Localisation Research Centre (LRC),

Department of Computer Science and Information Systems (CSIS),

University of Limerick, Limerick, Ireland.


Agenda

Introduction
— Why Web Page Localisation?
— Borderless Integration
— Why Multilingual Web Sites?
— What is Locale and Multi-Locale Operation?
— Internationalisation and Key Challenges
— I18n Standard: Important Issues and Business Context
— Variance: Regional and Cultural Issues

System Design
— Web Localisation and Rural India
— Localisation Approaches
— Architecture of Servers

System Implementation and Test Results
— Configuration of Server
— Localisation Test Results
— Alternative Approach

Conclusion

References


Why Web Page Localisation?

[Diagram labels: Web Localisation; Information Repository; Internet; Banking Sector; Online Business; Service Sector; Open Linguistic Barriers; Closed Linguistic Barriers]

Objective
• Information Convenience
• International Market and Customers
• Increased Sales Leads
• Advantage of Global Growth
• Reduce Marketing Costs


Borderless Integration

[Diagram labels: Model Business Process; Integration Logic; Resource Mapping; Analyse/Optimize Process; Integration Deployment; Business Entities; Business Logic; Customer; Market Research; Internet Framework; Local; Global]


Why Multilingual Websites?

Over 100 million people access the Internet in a language other than English.

Over 50% of web users speak a native language other than English.

According to Forrester Research, 50% of all online sales are expected to occur outside the USA.

Web users are four times more likely to purchase from a site that communicates in the customer’s native language.

“Your website is your window to the world…”


Basic Terminology

Locale

Set of features that can be varied depending on the language and culture of the user or the data

Internationalisation

The process of designing software so that it can be easily adapted to different locales

Localisation

The process of adapting software to a locale


What is Locale?

A locale is an abstraction: a data processing structure that identifies a collection of culturally and linguistically affected preferences.

Java locales are associated with upwards of 300 pieces of data:
— time zone names
— collation sequences
— the infinity symbol
— number formats
— days of the week

Locales generally do not contain this data themselves. They represent a way of obtaining “localized behavior” in the system.

Locales are generally part of the programming context or environment.
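The point that a locale is a key for obtaining localised behaviour, rather than a container of data, can be illustrated with a small Java sketch. The class name `LocaleDataSketch` and its helper methods are illustrative assumptions and are not part of the presented system; the library calls are standard `java.text` and `java.util` APIs.

```java
import java.text.Collator;
import java.text.DateFormatSymbols;
import java.text.DecimalFormatSymbols;
import java.util.Calendar;
import java.util.Locale;

// Illustrative sketch: the Locale object carries no data itself. It is
// handed to other classes, which return the locale-specific data.
class LocaleDataSketch {

    // Weekday name for a Calendar day constant such as Calendar.MONDAY.
    static String weekdayName(Locale locale, int calendarDay) {
        return new DateFormatSymbols(locale).getWeekdays()[calendarDay];
    }

    // Symbol that number formats in this locale use for infinity.
    static String infinitySymbol(Locale locale) {
        return DecimalFormatSymbols.getInstance(locale).getInfinity();
    }

    // Compares two strings using the locale's collation sequence.
    static int compare(Locale locale, String a, String b) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.GERMANY };
        for (Locale locale : locales) {
            System.out.println(locale + ": "
                    + weekdayName(locale, Calendar.MONDAY) + ", "
                    + infinitySymbol(locale) + ", "
                    + compare(locale, "a", "b"));
        }
    }
}
```

The same three calls return different results for each locale without any locale-specific branching in the calling code.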


Multi-Locale Operation

[Diagram labels: System Context; Context Separation; Design Policy; Server Processes; and, shown twice, Message Passing; Logic Execution; Client Locale]

APIs provide late binding localisation


Internationalisation

"I18n" is an abbreviation for the word "Internationalisation". The term "i18n" is derived from its spelling as the letter "i" plus 18 letters plus the letter "n".

I + n(1) t(2) e(3) r(4) n(5) a(6) t(7) i(8) o(9) n(10) a(11) l(12) i(13) s(14) a(15) t(16) i(17) o(18) + n

The extension of this naming convention to the terms Localisation (l10n), Europeanisation (e13n), Japanisation (j10n) and Globalisation (g11n) seems to have come somewhat after the invention of "i18n".
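The naming rule can be checked with a few lines of Java. This is only a sketch of the counting rule described above; the class name `Numeronym` is an illustrative assumption.

```java
// Illustrative sketch: builds an abbreviation such as "i18n" by keeping the
// first and last letters of a word and counting the letters between them.
class Numeronym {

    static String of(String word) {
        if (word.length() < 3) {
            return word; // too short to abbreviate
        }
        int between = word.length() - 2;
        return word.charAt(0) + Integer.toString(between) + word.charAt(word.length() - 1);
    }

    public static void main(String[] args) {
        String[] words = { "internationalisation", "localisation",
                           "europeanisation", "japanisation", "globalisation" };
        for (String word : words) {
            System.out.println(word + " -> " + of(word));
        }
    }
}
```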

— Potentially handle multiple languages and customs of the world
— Displaying/inputting characters for the users' native languages
— Handling popular encodings for the users' native languages
— Native characters for file names and other items
— Character classification and sorting
— Typesetting and hyphenation rules


Key Challenges

Encoding and Character Set
• Unicode support and implementation
• Use of language specific encoding
• Configuring encoding

Locale and Parameterisation
• Availability, Performance
• Continuity of i18n features
• Translation

Presentation, Processing
• UI design
• Handling collation
• Migration of existing data

[Diagram labels: Standards; Data Correspondence; Reference Information]


Important Issues in I18n

• Currency
• Language rules
• UI preferences
• Localisation
• Culture context
• Date/Time
• Character encodings
• Business impact
• Content management


Business Context of I18n

[Diagram labels: Internationalisation; Old Application; New Application; Old Product; New Product; Existing Service; New Service]

• To improve the effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.
• To reach out to a global customer base by providing language/culture specific interfaces and allowing for international preferences.
• Mergers / Acquisitions.
• To consolidate same-functionality applications/services developed and maintained separately for separate languages/regions.
• To support region specific functionality (due to legal aspects, financial practice etc.).
• To provide region specific value added services (like UI, look and feel, Sorting/Searching).


Regional and Cultural Differences

Software solutions should be designed to fit into the cultural context of the user

Examples
• Naming of the product
• Differences in the meaning of jargon
• Confusing graphical symbols
• National rules, conventions
• Religious beliefs and assumptions
• Basic cultural values and customs
• No appropriate translations available for phrases and slogans
• Favourite sports and slang
• Cultural anachronisms
• Reading left-to-right, top-to-bottom etc.


Language and Character Encoding

Language peculiarities
• Hyphenation
• Collation
• Spelling
• Transliteration

Alphabet order
• English: ABC...RSTUVWXYZ
• German: AÄB...NOÖ...SßTUÜV…YZ
• Swedish/Finnish: AB...STUVWXYZÅÄÖ
• Norwegian: AB…VWXYÜZÆØÅ

There are various “standards”, and they vary between languages:
• ISO and Windows encodings: ISO-8859-1 to ISO-8859-7, Windows-1252
• Chinese encodings: Big5, Big5-HKSCS, GB18030, GB2312
• Japanese and Korean: EUC-JP, EUC-KR, ISO-2022-JP, ISO-2022-KR
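To make the effect of the encoding choice concrete, the following Java sketch encodes one German word in three charsets and reports the byte counts. The class name `EncodingSketch` is an illustrative assumption. Only charsets that every Java runtime must provide are used for the counts; whether a runtime also ships the East Asian encodings listed above depends on the installation, so the sketch only queries that.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: the same text occupies a different number of bytes
// depending on the character encoding chosen.
class EncodingSketch {

    // Number of bytes the text occupies in the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "Stra\u00DFe"; // "Strasse" written with the German sharp s
        System.out.println("ISO-8859-1: " + byteLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + byteLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   " + byteLength(text, StandardCharsets.UTF_16BE));
        // Availability of optional charsets depends on the Java runtime.
        System.out.println("Big5 available: " + Charset.isSupported("Big5"));
    }
}
```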


Unicode Character Standard

Developed by the Unicode Consortium

Covers all major living scripts

Version 4.0 has 96,000+ characters

Capacity for 1 million+ characters

Unicode Character Set = ISO 10646

Unicode adds character properties and algorithms

ISO and Unicode work together to synchronize

ISO support enhances international acceptance
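The "capacity for 1 million+ characters" can be seen directly from Java, which exposes the highest Unicode code point as a constant. The sketch below is illustrative only (the class name `UnicodeSketch` is an assumption); it also shows that a character outside the first 65,536 code points takes two Java `char` values while still counting as one code point.

```java
// Illustrative sketch: Unicode code space size and the difference between
// Java char units and Unicode code points.
class UnicodeSketch {

    // Number of Unicode code points in the string (not char units).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Code points run from 0 to Character.MAX_CODE_POINT (0x10FFFF).
        System.out.println("Code space size: " + (Character.MAX_CODE_POINT + 1));

        // U+10400 lies outside the Basic Multilingual Plane.
        String supplementary = new String(Character.toChars(0x10400));
        System.out.println("char units:  " + supplementary.length());
        System.out.println("code points: " + codePoints(supplementary));
    }
}
```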


Date / Time Formats Variance

Locale    Example      Format
U.S.A.    2/16/05      mdy, /
France    16.2.05      dmy, .
France    16-2-05      dmy, -
CJKT      2005/2/16    ymd, /
Japan     17/2/16      ¥md, /

Hour/minute separators, AM, PM, TimeZone

• India : 4:00 P.M.
• U.S.A. : 4:00 p.m.
• France : 16.00
• Japan : 1600
• Japan : 4:00
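In Java this variance does not have to be coded by hand: a formatter is asked for the locale's own short date style. The following is a minimal sketch using the `java.time` API; the class name `DateFormatSketch` is an illustrative assumption, and the exact strings printed depend on the locale data shipped with the Java runtime, so they may not match the table above character for character.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Illustrative sketch: one date, formatted with each locale's short style.
class DateFormatSketch {

    static String shortDate(LocalDate date, Locale locale) {
        return DateTimeFormatter.ofLocalizedDate(FormatStyle.SHORT)
                .withLocale(locale)
                .format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2005, 2, 16);
        Locale[] locales = { Locale.US, Locale.FRANCE, Locale.JAPAN,
                             Locale.forLanguageTag("en-IN") };
        for (Locale locale : locales) {
            System.out.println(locale.getDisplayCountry(Locale.ENGLISH)
                    + ": " + shortDate(date, locale));
        }
    }
}
```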


Numbers / Currency Variance

Varieties in group and fractional separators

• India : 12,34,567.89
• England : 12,345.67
• Germany : 12.345,67
• Switzerland : 12’345,67
• Swiss money : 12’345.67
• France : 12 345,67

Varieties in symbol placement, symbol length, precision, number width, rounding rules

• India : Rs. 12,34,567.89 ; Re. 1
• U.S.A : US $1,234,567.89
• France : 12.345,67 €
• Portuguese : 12$34ESC
• Portuguese : 12$34€
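The same value can be rendered with each locale's separators and currency conventions through `java.text.NumberFormat`. This is a minimal sketch; the class name `NumberFormatSketch` is an illustrative assumption. Output for some locales, notably the Indian digit grouping, depends on the locale data version of the Java runtime, so the sketch prints the results rather than asserting them.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Illustrative sketch: one value, formatted with locale-specific grouping
// separators, decimal separators and currency symbols.
class NumberFormatSketch {

    static String number(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String currency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.89;
        Locale[] locales = { Locale.US, Locale.GERMANY, Locale.FRANCE,
                             Locale.forLanguageTag("de-CH"),
                             Locale.forLanguageTag("en-IN") };
        for (Locale locale : locales) {
            System.out.println(locale + ": " + number(value, locale)
                    + " | " + currency(value, locale));
        }
    }
}
```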


System Design


Indian Languages Profile

Data Source: 2001 Census of India

Language              Number    Percentage
Hindi            337,272,114        40.22%
Bengali           69,595,738         8.30%
Telugu            66,017,615         7.87%
Marathi           62,481,681         7.45%
Tamil             53,006,368         6.32%
Urdu              43,406,932         5.18%
Gujarati          40,673,814         4.85%
Kannada           32,753,676         3.91%
Malayalam         30,377,176         3.62%
Oriya             28,061,313         3.35%
Punjabi           23,378,744         2.79%
Assamese          13,079,696         1.56%
Sindhi             2,122,848         0.25%
Nepali             2,076,645         0.25%
Konkani            1,760,607         0.21%
Manipuri           1,270,216         0.15%
Kashmiri              56,693         0.01%
Sanskrit              49,736         0.01%
Other Languages   31,142,376         3.71%
Total            838,583,988       100.00%

[Chart: Percentage Languages Usage Index]


• Population residing in villages of India : 70%
• Total number of Languages in India : 40
• Official Languages : 22
• Overall Literacy Rate : 64.20 %
• English Language Literacy : 17.75 %

[Figure: Indian Currency Example (Value Rs. 10), with the Language Panel showing 15 major Indian Languages]


Information Channelisation

• Internationalisation: Prepare material for localisation (account for text expansion, avoid embedded text, etc.)
• Text Extraction: Extract text from source files (graphics, PDFs etc.)
• Translation: Translate content from extracted materials
• Localisation: Replace graphics, change colors, redesign layout to accommodate target culture


Localisation Process

[Diagram labels]
• Web page is “dynamically” converted into target language
• Language selection
• Static web page is selected and displayed
• Translation
• Localisation
• Site Acceptance Factors
  — Color
  — Image
  — Representation
• Translation Errors
• Text Placement in Separate File
• Late Binding
• Mapping Techniques


Server Architecture

[Diagram labels: Client Browser_1, Client Browser_2, Client Browser_3, ..., Client Browser_n; Socket API; Parse Request Module; HTML Server; Property File; Localised Content; Default / Alternative Language Response]


Implementation: Parse Request Module

Definition
– To parse the request header

Responsibilities
– To parse the request header
– To analyze and forward the request
– Provide log to the administrator

Compositions
– Main server loop
– Threads

Interfaces/Ports
– Socket APIs


Parse Request Module Architecture

[Diagram labels: Main Server Loop; Thread 1, Thread 2, Thread 3, Thread 4, Thread 5, ..., Thread n]
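The slides do not show the module's source, so the following Java sketch is only one plausible reading of "main server loop plus threads over socket APIs". Every name in it (`ParseRequestSketch`, `Request`, `parse`, `handle`), the port 8080, the pool size and the plain-text reply are assumptions made for illustration. It accepts connections in a loop, hands each to a worker thread, parses the request line and the `Accept-Language` header, and logs the result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a main server loop that dispatches to threads.
class ParseRequestSketch {

    // The parts of the request header this sketch looks at.
    static final class Request {
        final String method;
        final String path;
        final String acceptLanguage; // null when the header is absent

        Request(String method, String path, String acceptLanguage) {
            this.method = method;
            this.path = path;
            this.acceptLanguage = acceptLanguage;
        }
    }

    // Parses the request line and looks for an Accept-Language header.
    static Request parse(List<String> headerLines) {
        if (headerLines.isEmpty()) {
            throw new IllegalArgumentException("empty request");
        }
        String[] parts = headerLines.get(0).trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("malformed request line");
        }
        String acceptLanguage = null;
        for (String line : headerLines.subList(1, headerLines.size())) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim()
                    .equalsIgnoreCase("Accept-Language")) {
                acceptLanguage = line.substring(colon + 1).trim();
            }
        }
        return new Request(parts[0], parts[1], acceptLanguage);
    }

    // Runs on a worker thread: reads the header, logs it, sends a reply.
    static void handle(Socket socket) {
        try (Socket s = socket) {
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    s.getInputStream(), StandardCharsets.ISO_8859_1));
            List<String> lines = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
            Request request = parse(lines);
            // Log for the administrator.
            System.out.println(request.method + " " + request.path
                    + " Accept-Language=" + request.acceptLanguage);
            byte[] body = "OK".getBytes(StandardCharsets.UTF_8);
            OutputStream out = s.getOutputStream();
            out.write(("HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n"
                    + "Content-Length: " + body.length + "\r\n\r\n")
                    .getBytes(StandardCharsets.ISO_8859_1));
            out.write(body);
            out.flush();
        } catch (IOException | IllegalArgumentException e) {
            System.err.println("request failed: " + e.getMessage());
        }
    }

    // Main server loop: accept a connection, hand it to a worker thread.
    public static void main(String[] args) throws IOException {
        ExecutorService workers = Executors.newFixedThreadPool(8);
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket client = server.accept();
                workers.execute(() -> handle(client));
            }
        }
    }
}
```

Keeping `parse` free of socket code means the header handling can be exercised without opening a port.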


HTML Server

Definition
– Default implementation of HTTP protocol
– Processes static HTML requests

Responsibilities
– Process static HTML request
– Process dynamic Internationalisation request

Compositions
– Server Processes

Interfaces/Ports
– Socket APIs


HTML Server Architecture

[Diagram labels: Parse Protocol (GET/POST); GET Request Processor and POST Request Processor, each with Default Language, Alternative Language and Static Response; .properties file]
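How the server decides between the default and an alternative language is not spelled out on the slide. One way to sketch that decision in Java is to match the browser's `Accept-Language` header against the languages the site offers and fall back to the default when nothing matches. The class name `LanguageSelectionSketch`, the list of supported languages and the choice of English as default are assumptions for illustration; `Locale.LanguageRange.parse` and `Locale.lookup` are standard `java.util` calls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: pick the response language from Accept-Language.
class LanguageSelectionSketch {

    // Assumed default language of the site.
    static final Locale DEFAULT_LANGUAGE = Locale.ENGLISH;

    // Assumed set of languages for which translated text exists.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.ENGLISH,
            Locale.forLanguageTag("es"),
            Locale.forLanguageTag("fr"),
            Locale.forLanguageTag("mr"));

    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT_LANGUAGE;
        }
        try {
            // Ranges come back ordered by their q weights, highest first.
            List<Locale.LanguageRange> ranges =
                    Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : DEFAULT_LANGUAGE;
        } catch (IllegalArgumentException e) {
            return DEFAULT_LANGUAGE; // malformed header
        }
    }

    public static void main(String[] args) {
        System.out.println(choose("mr-IN,mr;q=0.9,en;q=0.5"));
        System.out.println(choose("ja"));
        System.out.println(choose(null));
    }
}
```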


System Implementation and Test Results


Java Support for Internationalisation

The Locale class lets applications identify locales, allowing for truly multilingual applications.

The ResourceBundle class provides the foundation for localisation, including localisation for multiple locales in a single application container.

The Date, Calendar, and TimeZone classes provide the basis for time handling around the globe.

The String and Character classes as well as the java.text package contain rich functionality for text processing, formatting, and parsing.

Text stream input and output classes support converting text between Unicode and other character encodings.
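As a rough illustration of how `Locale` and `ResourceBundle` can combine to localise a page, the sketch below keeps the translatable text in property-style bundles, one per language, and substitutes it into an HTML template. The class name `BundleSketch`, the keys, the sample translations and the `${key}` placeholder syntax are all assumptions for illustration and not the presented implementation. To stay self-contained the bundles are built from in-memory text with `PropertyResourceBundle`; in a deployment they would more usually be `.properties` files loaded with `ResourceBundle.getBundle`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical sketch: text kept outside the page and bound in late,
// according to the requested locale.
class BundleSketch {

    private static final Map<String, ResourceBundle> BUNDLES = new HashMap<>();

    static {
        // Stand-ins for per-language .properties files.
        BUNDLES.put("en", load("title=Welcome\nsubmit=Submit\n"));
        BUNDLES.put("es", load("title=Bienvenido\nsubmit=Enviar\n"));
    }

    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bundle for the locale's language, or the English default.
    static ResourceBundle bundleFor(Locale locale) {
        return BUNDLES.getOrDefault(locale.getLanguage(), BUNDLES.get("en"));
    }

    // Replaces each ${key} placeholder with the text from the bundle.
    static String localise(String html, Locale locale) {
        ResourceBundle bundle = bundleFor(locale);
        String result = html;
        for (String key : bundle.keySet()) {
            result = result.replace("${" + key + "}", bundle.getString(key));
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "<h1>${title}</h1><button>${submit}</button>";
        System.out.println(localise(template, Locale.ENGLISH));
        System.out.println(localise(template, Locale.forLanguageTag("es")));
    }
}
```

Because the markup never changes, colours, images and layout stay untouched while only the text is swapped.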


Conversion Process

Character conversion is a pretty straightforward process as long as there is a one-to-one mapping between sequences of Unicode characters on one side and sequences of bytes in another encoding on the other side, and the input only consists of characters or bytes that have mappings.

The reality is:
— A single character in a non-Unicode encoding may have multiple equivalent representations (say, a precomposed character and a sequence of base character and combining mark).
— A character in one encoding may not have an equivalent in the other encoding.
— An invalid sequence of bytes or characters may show up in the input.
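Each of the three complications can be reproduced with standard Java classes. The sketch below is illustrative only; the class name `ConversionSketch` and its method names are assumptions. It compares a precomposed character with its base-plus-combining-mark form, checks whether text fits into ISO-8859-1, and detects an invalid UTF-8 byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Illustrative sketch of the three conversion complications.
class ConversionSketch {

    // 1. Equivalent representations: equal only after normalisation.
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // 2. Missing equivalents: can every character be encoded as ISO-8859-1?
    static boolean fitsLatin1(String text) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    // 3. Invalid input: do the bytes form well-formed UTF-8?
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e with acute accent, one code point
        String combining = "e\u0301";   // e followed by a combining acute accent
        System.out.println("raw equal:        " + precomposed.equals(combining));
        System.out.println("equal after NFC:  " + sameAfterNfc(precomposed, combining));
        System.out.println("euro in Latin-1:  " + fitsLatin1("\u20AC"));
        System.out.println("bad bytes valid:  "
                + isValidUtf8(new byte[] { (byte) 0xC3, (byte) 0x28 }));
    }
}
```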


Process: Configure Server


Process: Register


Process: Log


Process: Localise Servlet


Web Page in English with IE


Web Page in Spanish with IE


Web Page in Dutch with IE


Web Page in French with IE


Web Page in Italian with IE


Web Page in Portuguese with IE


Web Page in German with IE


Web Page in English with IE


Web Page in Marathi with IE


Conclusion

The Java Localisation APIs come in handy for dynamically localising a web page into alternative languages.

The rich set of Java class libraries, such as java.util.ResourceBundle and java.util.Locale, provides an efficient approach to working with locale specific information.

More manageable workspace for users in their native language.

Regional Settings, Colour and Image representation are not disturbed.

Improves effectiveness of globally distributed business users by providing language/culture specific application/product/service interfaces.

Supports region specific functionality (due to legal aspects, financial practice etc.).

Provides region specific value added services (like UI, look and feel, Sorting/Searching).

Consolidates same-functionality applications/services developed and maintained separately for separate languages/regions.


References

[1] Fernandez, N. C. (2000), Web Site Localisation and Internationalisation: A Case Study, published, City University
[2] Khachane, J. (2005), Web Page Localisation, published, Pune University
[3] DePalma, D. A. (1999), Strategies for Global Sites, Forrester Research Inc, May 1998 and The eBusiness Report. In: eMarketer
[4] Roche, M. (2000), Managing Multilingual Web Applications, 16th International Unicode Conference, Amsterdam
[5] Nielsen, J. (1999), Designing Web Usability, Indianapolis: New Riders Publishing
[6] Deitsch, Loukides, M., Java Internationalisation
