internationalization 2014.pptx


Upload: nilabjyaghosh

Post on 08-Sep-2015


Internationalization

Internationalization

Localization

Locale

Java

Regional Requirements affecting the UI Design

It is important to bear in mind that more than 40 languages need to be supported and in the future the number will be expanding

The UI and software design should be as generic as possible

At least they should be as easily customizable as possible

Customization is needed for creating language, regional, and operator variants

Definition

It is the process of designing a program from the ground up so that it can be changed to reflect the expectations of a new user community without having to modify or recompile its executable code

Designing the program from the ground up with this kind of customization in mind is vital

Different user populations, particularly those speaking different languages or living in different countries, have widely varying expectations for how a computer program should interact with them

Definitions

Translation

The process of converting text in one language to text in another language

Localization

The process of modifying a program to conform to the expectations of a given user community.

This can involve not only translating text, but also altering pictures, colors, and window layouts and changing the program's behavior

Internationalization

Internationalization does not itself involve localization; it is a technique that greatly simplifies the localization process

Localization (which includes translation) is what translation houses do to software to prepare it for a particular market. Internationalization is what programmers do to make sure the program can be localized easily

Definition

Locale includes

language, currency formatting, time and date formatting, and numeric formatting

Separating the locale-specific information from the code

Internationalization isn't a feature

Making sure the product is localizable is just part of designing a good user interface

Designing the program from the start with eventual localization in mind can save considerable time down the road

The built-in Java I18N functions support over 70 language-country combinations

Purpose of I18N

Translation can be complicated

Retrofitting an application so that it can be translated is incredibly difficult

Designing the program from the start with eventual localization in mind can save considerable time down the road

Design

Basic approaches

Design one all-inclusive product that is shipped everywhere but with different defaults

Design a modular product

Plug in localization modules as required before shipping to specific locales.

Procedure

Gather information about target locales

Determine the target audience

Make an international impact assessment

Determine which features will function identically across international boundaries

Determine which features are generally OK but have to be implemented differently in target locales (e.g., addresses)

Determine which features have to be discarded or completely re-engineered

Specific ideas

Use visual rather than verbal feedback.

Reduce the number of commands = Empower the mouse.

Make use of multi-cultural images

arrows

books, newspapers or magazines

calculators, computers, monitors or keyboards

How Java helps

Java supplies an extensive library of classes and functions to help you internationalize your programs

Some I18N support comes for free or at very little cost

This often includes partial support for some languages your program doesn't explicitly support

Avoid ad-hoc solutions in favor of the standard ones whenever possible

The Java libraries are more thorough and more thoroughly tested than most ad-hoc solutions would be

Bug fixes and support for new languages come for free

Rules for Internationalization

Separate program code from user interface

keep user-interface data (labels and messages, pictures, window layouts, etc.) out of program code

Text elements may grow or shrink dramatically when translated

An English message can get much smaller when translated into Japanese and much larger when translated into Italian

Overall arrangement of UI elements may change depending on writing direction of text

UI elements themselves may change shape or arrangement depending on writing direction of text

Handling User-Visible Text

Rules for Internationalization

Rely on external libraries whenever possible

In Java, this means using routines and classes in java.text and java.util whenever possible

If you need locale-specific capabilities that Java doesn't provide and you implement them yourself, keep them separate from the rest of your program logic, and allow for graceful degradation when you're operating in a language they weren't designed for

Watch out for hidden assumptions

Be careful to keep hidden assumptions about locale and UI out of your internal processing code

Be careful when converting a piece of data from its internal representation to a human-visible representation

Use locale-sensitive APIs whenever possible

Numeric values

Currency and other denominated numeric values

Dates and times

Characteristics

With the addition of localized data, the same executable can run worldwide

Textual elements, such as status messages and the GUI component labels, are not hardcoded in the program

Support for new languages does not require recompilation

Culturally-dependent data, such as dates and currencies, appear in formats that conform to the end user's region and language

It can be localized quickly

Unicode

A universal character-encoding standard

Unicode starts with a single character set that includes the characters used in the world's major (and quite a few minor) writing systems

Unicode provides several character encoding systems that allow the representation of all these characters, all at the same time

Most newer programming languages (including both Java and JavaScript) are being designed with Unicode as their native character-string format, and Unicode support is appearing in more and more operating systems and applications

Unicode and I18N

Internationalization is completely possible without Unicode

But internationalization is much easier with Unicode

No need for character-set tagging

Easier to implement language-specific processes

Easier to handle multilingual text

Java and Unicode

All text in a running Java program is Unicode

The primitive type char is a single Unicode character

The String type is a collection of char

The java.io package can do conversion
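
The conversion role of java.io can be illustrated with a small round trip between in-memory Unicode text and an external byte encoding. This is only a sketch: the class name CharsetRoundTrip, the sample word, and the choice of UTF-8 as the external encoding are assumptions made for illustration. It also shows that a char is a 16-bit UTF-16 code unit, so character counts and byte counts differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Encode in-memory Unicode text into UTF-8 bytes through a Writer.
    static byte[] encode(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decode UTF-8 bytes back into a String through a Reader.
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "Gr\u00fc\u00dfe"; // German "Gruesse" written with u-umlaut and sharp s
        byte[] utf8 = encode(word);
        System.out.println(word.length() + " chars, " + utf8.length + " bytes");
        System.out.println("round trip ok: " + decode(utf8).equals(word));
    }
}
```

The Reader and Writer classes are where the encoding is named; the rest of the program only ever sees Unicode strings.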

Java and Unicode

// Identifiers may use any Unicode letters (the non-ASCII names here are illustrative)
class 方 {
    String 北 = "north";
    double π = 3.14159;
}

class UnicodeTest {
    public static void main(String[] arg) {
        方 x1 = new 方();
        System.out.println(x1.北);
        System.out.println(x1.π);
    }
}

Resource Bundles

Resource bundles can contain not only messages and other user-visible strings, but icons and pictures, actual UI elements like menus and buttons, and even whole window layouts

Java provides an abstract ResourceBundle object that represents a resource bundle

The process at a glance

ListResourceBundle

import java.awt.Button;
import java.util.ListResourceBundle;

public class MyResource extends ListResourceBundle {
    // Each row is a { key, value } pair; values may be any Object.
    private static final Object[][] contents = {
        { "HELLO_TEXT", "Hello, world!" },
        { "GOODBYE_TEXT", "Goodbye everyone!" },
        { "CANCEL_BUTTON", new Button("Cancel") }
    };

    @Override
    protected Object[][] getContents() {
        return contents;
    }
}

ListResourceBundle

ListResourceBundle allows you to store any class of object

It implements both handleGetObject and getKeys for you

The purpose of this bundle is to allow you to define localizable elements as a two-dimensional array of pairs

The Code

Step 1

Create the resource bundles

Create properties files

These are in plain-text format

Store the translatable text of the messages to be displayed

File MyResources_bn.properties will contain the Bengali text corresponding to the keys

Create a ListResourceBundle class
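
To make the properties-file option concrete, the sketch below parses the kind of key-value text such a file would hold. The keys and the English and French sample text are illustrative assumptions, and the text is read from strings here only so the example is self-contained; in a real program it would live in files such as MyResources.properties and MyResources_fr.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class PropertiesBundleSketch {
    // Hypothetical contents of MyResources.properties (default text).
    static final String DEFAULT_TEXT =
        "greetings = Hello.\n" +
        "farewell = Goodbye.\n";

    // Hypothetical contents of MyResources_fr.properties (same keys, French text).
    static final String FRENCH_TEXT =
        "greetings = Bonjour.\n" +
        "farewell = Au revoir.\n";

    // Parse properties-format text into a ResourceBundle.
    static ResourceBundle load(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(load(DEFAULT_TEXT).getString("greetings"));
        System.out.println(load(FRENCH_TEXT).getString("greetings"));
    }
}
```

Only the values differ between the two files; the keys stay identical so the program can look them up without knowing the language.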

Step 2

Define the locale

The Locale object identifies a particular language and country

Locale frLocale = new Locale("fr", "FR");

String language = args[0];

String country = args[1];

Locale currentLocale = new Locale(language, country);

Locale objects are only identifiers

The object is passed to other locale-sensitive objects that perform useful tasks, such as formatting dates and numbers

A ResourceBundle is an example of a locale-sensitive object
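
As a small illustration of a Locale acting purely as an identifier, the sketch below hands two different Locale objects to the same NumberFormat factory method. The class and method names are made up for this example.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class LocaleAsIdentifier {
    // The Locale carries no formatting logic itself; NumberFormat does the work.
    static String formatFor(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 1234567.891;
        System.out.println(formatFor(Locale.US, value));      // 1,234,567.891
        System.out.println(formatFor(Locale.GERMANY, value)); // 1.234.567,891
    }
}
```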

Step 3

Create ResourceBundle

contain locale-specific objects

isolate locale-sensitive data

ResourceBundle myResources = ResourceBundle.getBundle("MyResources", currentLocale);

ResourceBundle has two subclasses

ListResourceBundle and PropertyResourceBundle

What's in a Name?

If you create MyResources to store all English text, you will create a similarly named file to store the French text

MyResources_language_country (for example, MyResources_fr_FR)

The getBundle method provides a graceful degradation algorithm that attempts to find the nearest matched bundle in cases where the specified bundle can't be found or doesn't exist

MyResource_fr_FR

MyResource_fr_CA

The Algorithm

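baseName + "_" + language1 + "_" + country1 + "_" + variant1

baseName + "_" + language1 + "_" + country1

baseName + "_" + language1

baseName + "_" + language2 + "_" + country2 + "_" + variant2

baseName + "_" + language2 + "_" + country2

baseName + "_" + language2

Here language1, country1, and variant1 come from the requested locale, and language2, country2, and variant2 come from the default locale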

An Example

Suppose the default locale determined from the operating system is U.S. English

You may want to load MyResource for the Canadian French locale instead

The call to ResourceBundle.getBundle("MyResource", new Locale("fr","CA")) would produce the following search order

MyResource_fr_CA

MyResource_fr

MyResource_en_US

MyResource_en

MyResource
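
The search order listed above can be reproduced with a short sketch. It is not the library's actual lookup code: it only uses ResourceBundle.Control to compute candidate names and arranges them in the order of preference described here. The class name, the helper methods, and the explicit defaultLocale parameter (used instead of Locale.getDefault() so the result does not depend on the machine) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleSearchOrderSketch {
    // Lists bundle names in order of preference: requested locale first,
    // then the default locale, then the base (root) bundle.
    static List<String> searchOrder(String baseName, Locale requested, Locale defaultLocale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        List<String> names = new ArrayList<>();
        addCandidates(control, baseName, requested, names);
        addCandidates(control, baseName, defaultLocale, names);
        names.add(baseName); // the root bundle is the last resort
        return names;
    }

    private static void addCandidates(ResourceBundle.Control control, String baseName,
                                      Locale locale, List<String> names) {
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            if (candidate.equals(Locale.ROOT)) {
                continue; // the base name is appended once, at the end
            }
            String name = control.toBundleName(baseName, candidate);
            if (!names.contains(name)) {
                names.add(name);
            }
        }
    }

    public static void main(String[] args) {
        for (String name : searchOrder("MyResource", new Locale("fr", "CA"), Locale.US)) {
            System.out.println(name);
        }
    }
}
```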

Inheritance

The ResourceBundle class associates a parent to any bundle

If an object value cannot be found in the specified class, ResourceBundle searches the parent class

This relationship among bundles is established by giving them the same base name

Step 4

Fetch the text from the Resource Bundle

The properties files contain key-value pairs

The key is hardcoded in the program and it must be present in the properties files

String msg1 = myResources.getString("greetings");

Example

import java.util.Locale;
import java.util.ResourceBundle;

public class I18NSample {
    public static void main(String[] args) {
        // Default to U.S. English unless a language and a country are given.
        String language = "en";
        String country = "US";
        if (args.length == 2) {
            language = args[0];
            country = args[1];
        }

        // The Locale only identifies the user community; the bundle holds the text.
        Locale currentLocale = new Locale(language, country);
        ResourceBundle messages =
            ResourceBundle.getBundle("MessagesBundle", currentLocale);

        // The keys are hardcoded; the translated values come from the bundle.
        System.out.println(messages.getString("greetings"));
        System.out.println(messages.getString("inquiry"));
        System.out.println(messages.getString("farewell"));
    }
}

java.text Architecture

Data Driven Model

Most i18n classes are pure execution engines that derive their exact behavior from some kind of textual description

The class's actual behavior is specified by a description (usually a String) that is supplied from outside

The application supplies it at construction time

or

The framework loads one from a resource bundle

Abstract classes and factory methods

Sometimes different code is required to support certain locales.

The Java i18n frameworks are based on abstract classes and factory methods

Factory methods are static methods that return an instance of the native class

like Calendar.getInstance

Factory methods:

have names, unlike constructors, which can clarify code

do not need to create a new object upon each invocation - objects can be cached and reused, if necessary.

can return a subtype of their return type - in particular, can return an object whose implementation class is unknown to the caller. This is a very valuable and widely used feature in many frameworks which use interfaces as the return type of static factory methods.

Common names for factory methods include getInstance and valueOf
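
The point about returning a subtype can be seen directly in the JDK. In the sketch below the caller asks only for the abstract types NumberFormat and Calendar; the concrete classes are chosen by the factory methods. The class and helper names are made up for this example, and the concrete classes returned for other locales may differ.

```java
import java.text.NumberFormat;
import java.util.Calendar;
import java.util.Locale;

public class FactoryMethodSketch {
    // The caller names only the abstract type; the factory picks the subclass.
    static NumberFormat numberFormatFor(Locale locale) {
        return NumberFormat.getInstance(locale);
    }

    static Calendar calendarFor(Locale locale) {
        return Calendar.getInstance(locale);
    }

    public static void main(String[] args) {
        // Print the implementation classes the factories actually returned.
        System.out.println(numberFormatFor(Locale.US).getClass().getName());
        System.out.println(calendarFor(Locale.US).getClass().getName());
    }
}
```

For the U.S. locale these are typically java.text.DecimalFormat and java.util.GregorianCalendar, but code written against the abstract types does not need to know that.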

The Framework-Overview

The main API classes are all abstract; many of the implementation classes are internal

Collator.getInstance(Locale.FRANCE);

Framework instantiates a subclass based on parameters

Many of the implementation classes are also public

These classes can be instantiated directly by the user

more control

less flexibility as special cases could not be handled

Most classes have multiple factory methods:

DateFormat.getInstance()

DateFormat.getTimeInstance()

DateFormat.getTimeInstance(style)

DateFormat.getTimeInstance(style, locale)

This allows the user to achieve a fair amount of control over the result without having to call the implementation class directly

Locales

A Locale has three parts:

Language ID (drawn from ISO 639): e.g. de = German

Country/Region ID (drawn from ISO 3166): e.g. AT = Austria

Variant code (ad-hoc): can be used to specify Euro currency

Locale objects don't contain data

This approach allows different subsystems to support different sets of locales

Java doesn't follow the POSIX setlocale() model

Instead of setting a locale and then doing something, a Locale object is passed to an i18n object's constructor

I18n objects for several locales can coexist easily

Default Locale

There is, however, a default locale:

Used when the user doesn't supply a locale

Used as a fallback when looking for resource bundles

Picked up from the underlying environment or specified on the command line

(e.g., java -Duser.language=fr -Duser.country=CA MyProgram)

Can be changed (Locale.setDefault()), but not multithread safe

ResourceBundle Hierarchy

Resource Bundles

ResourceBundle provides

A generic interface to any type of actual repository of resource data

A graceful fallback in case information for a particular locale isn't there

All of the resources must be present in the root resource bundle

If you have a resource bundle with a language and a country, DO NOT omit the bundle with just the language

Root resource bundle can be in any language

There is no requirement that all of the bundles in the hierarchy be of the same class

Make all of your resource bundles descend directly from ListResourceBundle and not from each other

Programmatic ID vs Display Names

Locale IDs and time zone IDs (and so forth) are meant only for internal programmatic use

Don't use programmatic IDs (such as Locale.toString() or TimeZone.getID()) as user-visible text;

Use getDisplayName() instead
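
The difference between a programmatic ID and a display name can be sketched as follows. The class and method names are hypothetical, and the exact display strings depend on the locale data shipped with the JDK, so none are shown as fixed output.

```java
import java.util.Locale;

public class DisplayNameSketch {
    // Programmatic ID: stable, meant for code, configuration, and file names.
    static String programmaticId(Locale locale) {
        return locale.toString();
    }

    // Display name: localized text meant for the user, in the user's own language.
    static String displayName(Locale locale, Locale userLanguage) {
        return locale.getDisplayName(userLanguage);
    }

    public static void main(String[] args) {
        Locale austrianGerman = new Locale("de", "AT");
        System.out.println(programmaticId(austrianGerman));              // de_AT
        System.out.println(displayName(austrianGerman, Locale.ENGLISH)); // English name
        System.out.println(displayName(austrianGerman, Locale.FRENCH));  // French name
    }
}
```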

Message Format

The search found 23 files containing hello on disk MyDisk.

Es gibt 23 Dateien auf Platte MyDisk, die hello enthalten.

The code

dialog.add("Center",

new Label("The search found " + hits + " files containing \"" + searchString + "\" on disk \"" + searchRoot+ "\"."));

The hidden assumption is that the blanks will come in the same order in every language

Use Formatter

dialog.add("Center",

new Label(MessageFormat.format(

"The search found {0} files containing "

+ "\"{1}\" on disk \"{2}\".",

new Object[] {

new Integer(hits),

searchString,

searchRoot

} ) ));

The localizable part of this statement is the pattern string

The pattern string can use some parameters (not all) and can also use a parameter multiple times
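
A minimal sketch of why the pattern string is the localizable part: the same three arguments are passed in the same order, and each pattern decides where they appear. The class name and the render helper are assumptions for illustration; the German pattern mirrors the German sentence shown earlier.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class ResultMessageSketch {
    static final String ENGLISH =
        "The search found {0} files containing \"{1}\" on disk \"{2}\".";
    // The German pattern uses the arguments in a different order: {0}, {2}, {1}.
    static final String GERMAN =
        "Es gibt {0} Dateien auf Platte {2}, die {1} enthalten.";

    static String render(String pattern, Locale locale,
                         int hits, String searchString, String searchRoot) {
        Object[] arguments = { hits, searchString, searchRoot };
        return new MessageFormat(pattern, locale).format(arguments);
    }

    public static void main(String[] args) {
        System.out.println(render(ENGLISH, Locale.US, 23, "hello", "MyDisk"));
        System.out.println(render(GERMAN, Locale.GERMANY, 23, "hello", "MyDisk"));
    }
}
```

The calling code is identical for both languages; only the pattern, which would normally come from a resource bundle, changes.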

Using Formatter

dialog.add("Center",

new Label(MessageFormat.format(

resources.getString("ResultMessage"),

new Object[] {

new Integer(hits),

searchString,

searchRoot

} ) ));

// English bundle
{ "ResultMessage",
  "The search found {0} files "
  + "containing \"{1}\" on disk "
  + "\"{2}\"." }

// German bundle
{ "ResultMessage",
  "Es gibt {0} Dateien "
  + "auf Platte {2}, "
  + "die {1} enthalten." }

Handling Plurals

The search found 1 files containing hello on disk MyDisk.

The search found 1 file(s) containing hello on disk MyDisk.

The search found {0} files containing "{1}" on disk "{2}".

The search found {0,choice, 0#no files|1#one file|2#{0} files} containing "{1}" on disk "{2}".

Let's say the root of the search could either be a whole disk or a single folder

The search found {0,choice, 0#no files|1#one file|2#{0} files} containing "{1}" {3,choice,0#on disk "{2}" |1#in folder "{2}"}
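
The choice pattern above can be tried out with a short sketch. The class and method names are made up for illustration. Note that a choice format only selects by numeric range, so it is an assumption that this is enough for the target language; languages with more complex plural rules may need more than three cases.

```java
import java.text.MessageFormat;
import java.util.Locale;

public class PluralMessageSketch {
    // 0 selects "no files", 1 selects "one file", 2 and above selects "{0} files".
    static final String PATTERN =
        "The search found {0,choice,0#no files|1#one file|2#{0} files} "
        + "containing \"{1}\" on disk \"{2}\".";

    static String render(int hits, String searchString, String searchRoot) {
        Object[] arguments = { hits, searchString, searchRoot };
        return new MessageFormat(PATTERN, Locale.US).format(arguments);
    }

    public static void main(String[] args) {
        System.out.println(render(0, "hello", "MyDisk"));
        System.out.println(render(1, "hello", "MyDisk"));
        System.out.println(render(23, "hello", "MyDisk"));
    }
}
```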

Handling dynamically generated text

American Format

French Format

Swiss German Format

Arabic Format

Japanese Format

Handling Numbers

DO NOT use toString() to format user-visible numbers!

DO NOT use parseInt() or other similar functions to parse numeric user input!

Use NumberFormat.format() and NumberFormat.parse() instead

lbl = new Label(Double.toString(milesTraveled));

NumberFormat fmtr = NumberFormat.getInstance();

lbl = new Label(fmtr.format(milesTraveled));

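Make fmtr static so that it is created once rather than on every call (NumberFormat instances are not synchronized, so do not share one across threads without external locking)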
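
The advice about parsing can be illustrated the same way as formatting. In the sketch below, the class and method names are assumptions; the point is that text typed by a German user such as 1.234,5 is read correctly by a German NumberFormat, whereas Double.parseDouble would reject it.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class NumberInputSketch {
    // Format a value for display in the given locale.
    static String display(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    // Parse user input using the conventions of the given locale.
    static double parse(String userInput, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(userInput).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                "Not a number in locale " + locale + ": " + userInput, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(display(1234.5, Locale.GERMANY)); // 1.234,5
        System.out.println(parse("1.234,5", Locale.GERMANY)); // 1234.5
        System.out.println(parse("1,234.5", Locale.US));      // 1234.5
    }
}
```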

Number Formatters

All formatters both format and parse

[Figure: format() turns an internal binary value into text, and parse turns the text back into the binary value]

public final String format(Object obj);

public Object parseObject(String source);

There are also convenience methods, such as format(double) and parse(String), that work with specific types rather than Object

NumberFormat provides four factory methods

NumberFormat.getInstance()

NumberFormat.getNumberInstance()

Formats numbers as generic format

NumberFormat.getPercentInstance()

NumberFormat.getCurrencyInstance()
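
A brief sketch of the factory methods side by side, using the U.S. locale so the results are predictable. The class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberFactorySketch {
    static String asNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    static String asPercent(double value, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(value);
    }

    static String asCurrency(double value, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(asNumber(1234.5, Locale.US));   // 1,234.5
        System.out.println(asPercent(0.25, Locale.US));    // 25%
        System.out.println(asCurrency(1234.5, Locale.US)); // $1,234.50
    }
}
```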

Decimal Format

This class is used to format numbers using standard Western positional notation and the decimal numeration system

The minimum and maximum number of digits on either side of the decimal point can be fixed

Different subclass is needed for

formatting numbers in Chinese characters, or

formatting numbers into words

Decimal Format

String pattern = "000000.000";

DecimalFormat myFormatter = new DecimalFormat(pattern);

String output = myFormatter.format(value);

System.out.println(value + " " + pattern + " " + output);

Locale-sensitive formatting

NumberFormat nf = NumberFormat.getNumberInstance(loc);

DecimalFormat df = (DecimalFormat)nf;

Patterns

DecimalFormat provides a pattern language as a shortcut way to specify many options at once

0 specifies a required digit position

0000

# specifies an optional digit position

0.###

, specifies the use and position of a grouping separator

#,##0.00

Prefixes and suffixes can be added

Value=12345.67

Output=$12,345.67

Pattern=$###,###.###

; separates positive and negative patterns

$#,##0.00

$#,##0.00;($#,##0.00)
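
The pattern characters above can be checked with a small sketch. The class name and the sample values are assumptions; U.S. symbols are fixed explicitly so that the separators do not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class PatternSketch {
    // Apply a DecimalFormat pattern using U.S. separators.
    static String apply(String pattern, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(apply("0000", 42));                        // 0042
        System.out.println(apply("0.###", 3.14159));                  // 3.142
        System.out.println(apply("#,##0.00", 12345.678));             // 12,345.68
        System.out.println(apply("$#,##0.00;($#,##0.00)", 1234.5));   // $1,234.50
        System.out.println(apply("$#,##0.00;($#,##0.00)", -1234.5));  // ($1,234.50)
    }
}
```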

Handling Currency

Handling multiple currencies at the same time can be tricky

You may need to keep track of the units for each value

You may need to perform currency conversions

Handling Dates and Times

Calendar

Handling Dates and Times

Today is Friday, July 2, 1999.

Heute ist Friday, July 2, 1999.

Heute ist Freitag, 2. Juli 1999.

Not only do the words for the days and months change, but so does the order of the fields themselves and the punctuation around them.

In fact, in some countries, the calendar system in use also changes: In Hebrew, for example, April 2, 1999 is the 16th of Nisan, 5759

Handling Dates and Times

Use DateFormat:

DateFormat fmt = DateFormat.getDateTimeInstance(

DateFormat.FULL, DateFormat.DEFAULT);

System.out.println(fmt.format(new Date()));

Use MessageFormat:

MessageFormat.format("It is {0,time,medium} on {0,date,full}.", new Object[] { new Date() });

DateFormat also offers a selection of formats (short, medium, long, and full)

The date and time can be set independently

Date Style

Time Style

Handling Dates and Times

Provides four factory methods:

getInstance()

getDateInstance()

August 26, 1999

getTimeInstance()

12:47 PM

getDateTimeInstance()

August 26, 1999 12:47 PM

getInstance() is equivalent to getDateTimeInstance() with the SHORT style for both the date and the time

Handling Dates and Times

Four time styles:

Short: (12:54 PM)

Medium/Default: (12:54:56 PM)

Long: (12:54:56 PM PDT)

Full: (12:54:56.034 PM PDT)

Four date styles:

Short: (8/26/99)

Medium/Default: (8/26/1999 or Aug 26, 1999)

Long: (August 26, 1999)

Full: (Thursday, August 26, 1999)
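
The date styles can be compared across locales with the sketch below. The class and method names are assumptions, the date is the July 2, 1999 example used earlier, and the time zone is pinned to GMT so the result does not depend on where the code runs. Exact output strings can vary slightly between JDK versions, so none are shown as fixed.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateStyleSketch {
    static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // Noon GMT on July 2, 1999.
    static Date sampleDate() {
        Calendar cal = Calendar.getInstance(GMT, Locale.US);
        cal.clear();
        cal.set(1999, Calendar.JULY, 2, 12, 0, 0);
        return cal.getTime();
    }

    // Format only the date part in the given style and locale.
    static String formatDate(int style, Locale locale, Date date) {
        DateFormat fmt = DateFormat.getDateInstance(style, locale);
        fmt.setTimeZone(GMT);
        return fmt.format(date);
    }

    public static void main(String[] args) {
        int[] styles = { DateFormat.SHORT, DateFormat.MEDIUM, DateFormat.LONG, DateFormat.FULL };
        for (int style : styles) {
            System.out.println(formatDate(style, Locale.US, sampleDate())
                + " | " + formatDate(style, Locale.GERMANY, sampleDate()));
        }
    }
}
```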

Calendar

java.util.Date

Number of milliseconds since midnight, January 1, 1970 GMT (signed 64-bit integer)

Composing and decomposing

DO NOT use Date.getMonth(), Date.getDate(), Date.getYear(), etc.

Use java.util.Calendar:

Calendar cal = Calendar.getInstance();

cal.setTime(myDate);

int myDay = cal.get(Calendar.DAY_OF_MONTH);

int myMonth = cal.get(Calendar.MONTH) + 1; // Calendar.MONTH is zero-based

int myYear = cal.get(Calendar.YEAR);

The API is better, and it'll work with multiple calendar systems
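
As a sketch of decomposing a Date with Calendar, the example below splits the same instant into year, month, and day for two time zones. The class and method names are made up; the instant used is the epoch itself (0 milliseconds), which is still December 31, 1969 on the U.S. west coast.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class CalendarFieldsSketch {
    // Returns { year, month (1-12), day of month } for the instant in the given zone.
    static int[] yearMonthDay(Date date, TimeZone zone) {
        Calendar cal = Calendar.getInstance(zone, Locale.US);
        cal.setTime(date);
        return new int[] {
            cal.get(Calendar.YEAR),
            cal.get(Calendar.MONTH) + 1, // Calendar.MONTH is zero-based
            cal.get(Calendar.DAY_OF_MONTH)
        };
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        int[] gmt = yearMonthDay(epoch, TimeZone.getTimeZone("GMT"));
        int[] la = yearMonthDay(epoch, TimeZone.getTimeZone("America/Los_Angeles"));
        System.out.println(gmt[0] + "-" + gmt[1] + "-" + gmt[2]); // 1970-1-1
        System.out.println(la[0] + "-" + la[1] + "-" + la[2]);    // 1969-12-31
    }
}
```

The same Date yields different calendar fields depending on the time zone, which is one reason the fields belong to Calendar rather than to Date.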

Handling searching and sorting

String comparison is very language-specific

Different definitions of letter

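In English, a and ä are the same letter, while v and w are different letters

In Swedish, a and ä are different letters, while v and w are the same letter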

In Spanish, ch and ll are considered single letters, not pairs of letters

Expanding character sequences

In German, ä sorts as ae and ß sorts as ss

Ignorable characters

e-mail and email are the same word

String Comparison

list[i].compareTo(list[i + 1])

Collator coll = Collator.getInstance();

coll.compare(list[i], list[i + 1]) > 0

There are various levels of equivalence for searching

Primary differences

if somewhere they have different letters (according to the language) in corresponding positions

Different letters: resume vs repeat

Secondary differences

if they don't have a primary difference, but do have two corresponding letters with a diacritic or variant-form difference.

Different diacritics: résumé vs resume

Tertiary differences

if they don't have a primary or secondary difference, but two corresponding letters have different case

There's also a fourth level of difference, identity difference, which is when there are no tertiary differences, but the strings still are different in terms of the actual hex codes
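
The levels of difference map onto Collator strengths, which the sketch below exercises. The class and method names are assumptions, the U.S. locale is chosen only to make the rules predictable, and canonical decomposition is switched on so that accented characters are compared as base letter plus accent.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthSketch {
    // Compare two strings, ignoring differences below the given strength.
    static int compare(String a, String b, int strength) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(strength);
        coll.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return coll.compare(a, b);
    }

    public static void main(String[] args) {
        String accented = "r\u00e9sum\u00e9"; // "resume" with acute accents on both e's

        // Primary strength ignores accents and case; different letters still differ.
        System.out.println(compare("resume", accented, Collator.PRIMARY) == 0);
        System.out.println(compare("resume", "repeat", Collator.PRIMARY) != 0);

        // Secondary strength sees accents but still ignores case.
        System.out.println(compare("resume", accented, Collator.SECONDARY) != 0);
        System.out.println(compare("resume", "Resume", Collator.SECONDARY) == 0);

        // Tertiary strength also distinguishes case.
        System.out.println(compare("resume", "Resume", Collator.TERTIARY) != 0);

        // Raw code-point order puts "Zebra" before "apple"; the collator does not.
        System.out.println("Zebra".compareTo("apple") < 0);
        System.out.println(compare("Zebra", "apple", Collator.TERTIARY) > 0);
    }
}
```

A search feature would typically pick primary strength for a loose match and tertiary strength for an exact one.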

Whole word searches

Definition of word varies with language

Other Issues

Use Unicode Character Properties

char ch; ... if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch