
Post on 23-Feb-2016


Page 1: Challenges in Testing Multilingual Databases

Challenges in Testing Multilingual Databases

Gaurav Luthra, NishaBanu Anwar, Swetha Konduru


BFS-US SSDU Research

Abstract

• The scope of the market has widened across the globe with the growing demand for international products. For a product to enter and grow in global markets, key factors such as cultural differences, language differences and technical requirements should be taken into account. To achieve this, organizations adopt Internationalization, and Localization for a specific region, country or language. This helps local customers benefit from a globally available product or software.

• For example, suppose an American bank needs to launch a new application in a country such as China or Germany, where the majority of the population depends on its national language and English usage is minimal. The application then has to be launched in the users' comfort zone, i.e. the local language, to make it more user friendly, thereby increasing customer satisfaction and revenue for the organization.

• Although globalization is already in practice, translating and storing data from data sources in different languages into the database remains a challenge. This is done through Unicode, which enables data from any language to be stored in a database in a single character set.

Abstract (cont..)

• The data that is translated, stored in the database and fed back to an application via Unicode has to be validated to ensure data accuracy and quality.

• This document discusses the challenges encountered while validating data that flows from an application to a database for the various language data sources.

[Figure: "Popular Internet language". Bar chart comparing English, Chinese, Spanish, Japanese and Arabic for 2010 and 2011; y-axis labeled "Population in billion", ticks from 0 to 1600.]


General Background

Internationalization and Localization in Banking

• Globalization has come into existence through recent developments in international financial markets.

• Internationally active banks have acquired significant market power, and such global activities, with increasing foreign branches and foreign assets, are making markets riskier for banks to sustain.

• Moreover, banks with higher shares of foreign investments run through foreign branches have higher market power in the local nation.

• The entire local population, however, cannot be assumed to be comfortable with the English language.

• This calls for the adoption of Internationalization and Localization by banks in their business processes.

Internationalization: the process of enabling an application for users located in different nations and supporting different languages.

Localization: the process of customizing an application for users located in a specific location or a specific country.

Need for testing: As businesses and markets cross boundaries for their growth and development, the volume of end users and their data is increasing exponentially, and as a result the risk involved in storing and keeping this data is also escalating. A proper check has to be maintained to keep all this data safe and secure. This is where testing starts: it checks the quality and consistency of the data flow.


Why this Paper?

• Data warehouse testing is exhaustive in itself because of the large amount of data involved.

• Its complexity increases when the data to be tested comes from data sources in different languages.

• Care has to be taken that data is not misinterpreted because of the different encodings used.

• Testing data from multilingual sources is therefore critical and challenging.

• A few of the challenges are listed below:

a. Use of the correct version of Unicode by a database to support different languages.

b. Correctness of the data loaded into a database from different applications using different languages, i.e. the data entered via the front-end application is the same as the data stored in the DB.

c. Data loss during data migration due to improper use of non-Unicode data types.

d. Data misinterpretation due to different encoding versions during file transactions.


Pre Requisites for Testing

• The tester should be well versed in the language for which data is to be tested.

• Check whether the product is locale aware.

• Identify the database that needs to be tested for the particular application.

• Check whether proper standards are being followed for data storage.

• Language translator software should be available to the tester.

• Requirements related to back-end testing should be clear to the tester.


Proposed Solutions


Validating the Version of Unicode used by a database for supporting different languages

• Financial institutions and banks use many different databases to hold their data, and each database supports different Unicode properties.

• Before testing begins, the database needs to be checked for its Unicode properties.

• The character formats supported by some common databases are listed below.

Different Character Formats

Oracle          Sybase ASE   DB2® z/OS®   DB2 Linux®, UNIX®, Windows®   SQL Server
AL16UTF16       UTF8         437          UTF8                          1252
AL32UTF8        UTF16        850          UTF16                         UTF8
AR8ISO8859P6    cp1250       860          860                           UTF16
AR8MSWIN1256    cp1251       863          863
BLT8MSWIN1257   cp1252       865          865
CDN8PC863       cp1253       1252         936
UTF8            cp1254       UTF8         949
UTF16           cp1255       UTF16        950
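One way a tester might pre-check test data against a character set from the table is to try encoding a sample string with the corresponding codec. The sketch below is only an illustration: the mapping from the table's names to Python codec names is an assumption (for instance, a database's "UTF8" setting may not behave exactly like Python's utf-8 codec), and the function names are hypothetical.

```python
# Approximate mapping from a few character-set names in the table to Python
# codec names. This mapping is an assumption made for illustration only.
CODEC_FOR_CHARSET = {
    "UTF8": "utf-8",
    "UTF16": "utf-16",
    "437": "cp437",
    "850": "cp850",
    "1252": "cp1252",
    "cp1251": "cp1251",
}


def can_store(text: str, charset: str) -> bool:
    """Return True if every character of text can be encoded in charset."""
    codec = CODEC_FOR_CHARSET[charset]
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


def unsupported_charsets(text: str) -> list[str]:
    """List the character sets above that cannot represent text."""
    return [name for name in CODEC_FOR_CHARSET if not can_store(text, name)]


# Japanese sample characters cannot be stored in a Western European code page.
print(can_store("年月日", "1252"))
print(can_store("年月日", "UTF8"))
```

A check like this only shows whether the characters are representable; it says nothing about column sizes or collation in a real database.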


Validating the Version of Unicode used by a database for supporting different languages (Contd..)

• For example, when a tester validates Japanese data in a Sybase database, a few of the mismatches detected are:

a. content getting truncated

b. mismatch in the currency format

c. mismatch found in the address

d. population of nulls, symbols, etc.

• This is because Unicode encodings differ in the number of bytes they use per character for different language scripts, so a character set or column size that suits one script can truncate or corrupt another.

• It is beneficial to use UTF-8 for storing/retrieving European scripts and UTF-16 for storing/retrieving Asian scripts.

• Japanese (an Asian script) typically takes 2 bytes per character in UTF-16 but 3 bytes in UTF-8, so a UTF-16 character set is beneficial over UTF-8 for storing/retrieving Japanese content from the database.

• Given the above issue, the tester needs to check the Unicode character set of the database and also the Unicode data type best suited to the particular language.
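The size difference between the two encodings can be checked directly. The following is a minimal sketch, assuming the column limit is counted in bytes (this varies by database and data type); the helper names and the 8-byte limit are hypothetical.

```python
def byte_lengths(text: str) -> dict[str, int]:
    """Return the character count and encoded size in bytes of text."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le leaves out the 2-byte byte-order mark, so only the
        # characters themselves are counted.
        "utf-16": len(text.encode("utf-16-le")),
    }


def fits_column(text: str, max_bytes: int, encoding: str) -> bool:
    """Check whether text fits a column whose limit is counted in bytes."""
    return len(text.encode(encoding)) <= max_bytes


japanese = "年月日"    # characters taken from the Japanese date samples
french = "rivière"    # word taken from the French address samples

print(byte_lengths(japanese))  # {'chars': 3, 'utf-8': 9, 'utf-16': 6}
print(byte_lengths(french))    # {'chars': 7, 'utf-8': 8, 'utf-16': 14}

# The same Japanese value fits a hypothetical 8-byte column in UTF-16
# but not in UTF-8.
print(fits_column(japanese, 8, "utf-16-le"), fits_column(japanese, 8, "utf-8"))
```

The French sample shows the opposite trade-off: mostly Latin text is smaller in UTF-8 than in UTF-16.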


Validating the Correctness of data loaded in the DB

• Data fed via the front end is validated against the data retrieved from the database at the back end.

• To validate the data, it is first verified in the same language as it is displayed in the application and then verified against English.

• After validating the data for accuracy, a common language validation procedure needs to be followed:

a. Check whether the database's supported features include Language Support and Territory Support.

b. Check the database schema and the character set of the table.

c. Test for culture awareness.


Validating the Correctness of data loaded in the DB (Contd..)

a. Check whether the database's supported features include Language Support and Territory Support.

• The database should be validated for the particular language, so that the user can store, process and retrieve data in his/her native language.

• Here the tester needs to look at the "NLS" parameters, which allow a database session to use different cultural settings. E.g. one can set the euro (EUR) as the primary currency and the Japanese yen (JPY) as the secondary currency for a given database session even when the territory is defined as AMERICA.

b. Check the database schema and the character set of the table.

• The schema of the table needs to be checked against the language. It should support Unicode for encoding the different languages and use proper Unicode data types for storing and retrieving data through the database.

• The character set being validated should be compatible with the database.

• When a SQL parameter is passed to the database, its value is converted to the character set of that database. For example, in a database that supports only ASCII, Unicode values are converted into ASCII characters, and characters with no ASCII equivalent may be replaced or lost.
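The conversion described above can be imitated outside the database to predict which test values are at risk. This is a sketch under an assumption: unrepresentable characters are replaced with "?", which is one common substitution behaviour, while a real database may reject the value or substitute differently. The function names are hypothetical.

```python
def store_in_charset(value: str, charset: str) -> str:
    """Simulate storing value in a column that uses the given character set.

    Characters the character set cannot represent are replaced with '?'.
    """
    return value.encode(charset, errors="replace").decode(charset)


def survives_round_trip(value: str, charset: str) -> bool:
    """Return True if the value read back equals the value written."""
    return store_in_charset(value, charset) == value


print(store_in_charset("50 B vue sur la rivière de route", "ascii"))
print(survives_round_trip("27 avenue du Parc", "ascii"))
print(survives_round_trip("24-Août-1980", "ascii"))
```

Values that fail the round trip for the target character set are good candidates for explicit front-end versus back-end comparison.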


Validating the Correctness of data loaded in the DB (Contd..)

c. Testing for culture awareness

• The data retrieved needs to be tested for basic formats such as date, address, time, currency, etc.

• Validation of address format: For example, in English the display order is name, city, state and postal code, but for Japanese the order can be postal code, state, city and name.

• Validation of date format: Verify that the date and time format displayed across the application is based on the client locale, and also validate the date value if it is handled using double-byte numbers.

• Validation of currency formats: This has to be done according to the locale. E.g. in the American locale a decimal point separates the whole amount from the fractional part, but in the French locale a comma is used in place of the decimal point.
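A currency check of this kind can be sketched as a format/parse pair. The separator conventions below are assumptions taken from the sample salary values in this deck, not a complete locale database, and the function names are hypothetical.

```python
from decimal import Decimal

# Separators assumed from the sample data in this deck; a real test would
# take them from the locale definition the application uses.
SEPARATORS = {
    "american": {"thousands": ",", "decimal": "."},
    "french": {"thousands": ".", "decimal": ","},
}


def format_amount(value: Decimal, locale_name: str) -> str:
    """Format value with two decimals using the locale's separators."""
    seps = SEPARATORS[locale_name]
    us_style = f"{value:,.2f}"  # e.g. 2,000,000.52
    # Translate both separators in one pass so they do not overwrite
    # each other.
    table = str.maketrans({",": seps["thousands"], ".": seps["decimal"]})
    return us_style.translate(table)


def parse_amount(text: str, locale_name: str) -> Decimal:
    """Parse a locale-formatted amount back into a Decimal."""
    seps = SEPARATORS[locale_name]
    plain = text.replace(seps["thousands"], "").replace(seps["decimal"], ".")
    return Decimal(plain)


salary = Decimal("2000000.52")
print(format_amount(salary, "french"))          # 2.000.000,52
print(parse_amount("2.000.000,52", "french"))   # 2000000.52
```

Comparing the parsed values, rather than the displayed strings, lets the tester confirm that the same amount is stored regardless of the locale used for display.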


Validating the Correctness of data loaded in the DB (Contd..)

French Locale

Praesent Number   Lorem ipsum         Date of Birth     merces         Oratio                             contact Number
154856            Jack Daniel         24-Août-1980      2.000.000,52   27 avenue du Parc                  785-444-6589
19965             Kareem cheikh       15-Sep-75         7.500.000,75   46,3 citerne routier               698-562-4756
143547            Abhishek Nagar      09-août-1987      1.980.000,62   50 B vue sur la rivière de route   763-983-6537
84745             Praisey Priscilla   04-mai-1986       2.350.000,00   47, rue Jeyapandian                996-237-7336
158763            Getzi Miranda       20-Juillet-1985   4.534.000,54   11, rue Koilpillai                 991-653-1004

American Locale

Employee Number   Employee name       Date Of Birth   Salary       Address                 Contact Number
154856            Jack daniel         24-Aug-1980     2000000.52   27 Park avenue          785-444-6589
19965             Kareem sheikh       15-Sep-1975     7500000.75   46,3 tank Road          698-562-4756
143547            Abhishek Nagar      09-Aug-1987     1980000.62   50 B river view road    763-983-6537
84745             Praisey Priscilla   04-May-1986     2350000.00   47 Jeyapandian Street   996-237-7336
158763            Getzi Miranda       20-July-1985    4534000.54   11 Koilpillai Street    991-653-1004

American/English Date Field   Japanese 日付フィールド
24 August 1980                1980年 8月 24日
15 September 1975             1975年 9月 15日
09 August 1987                1987年 8月 9日
04 May 1986                   1986年 5月 4日
20 July 1985                  1985年 7月 20日
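The date-field comparison above can be automated by formatting one stored date both ways and parsing the Japanese form back. This is a minimal sketch with hypothetical function names; month names are spelled out in the code so it does not depend on the locale settings of the test machine. Python's `\d` and `int()` also accept full-width (double-byte) digits, which covers the double-byte number case mentioned earlier.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_english(d: date) -> str:
    """Format a date as in the American/English column, e.g. 24 August 1980."""
    return f"{d.day:02d} {MONTHS[d.month - 1]} {d.year}"


def format_japanese(d: date) -> str:
    """Format a date as in the Japanese column, e.g. 1980年 8月 24日."""
    return f"{d.year}年 {d.month}月 {d.day}日"


def parse_japanese(text: str) -> date:
    """Parse a year/month/day date written with 年, 月 and 日 markers."""
    match = re.fullmatch(r"(\d{4})年\s*(\d{1,2})月\s*(\d{1,2})日", text)
    if match is None:
        raise ValueError(f"not a recognised Japanese date: {text!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


dob = date(1980, 8, 24)
print(format_english(dob))
print(format_japanese(dob))
print(parse_japanese("1980年 8月 24日") == dob)
```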


Validating Data loss during Data migration due to incorrect Unicode data types

• One of the biggest challenges in testing multilingual databases arises during the process of data migration.

• During such a migration the checks below need to be considered:

a. the target database should support Unicode characters

b. whether special data types are used

c. which language the data is moving to

[Figure: data migration from Teradata to Sybase]


Validating Data loss during Data migration due to incorrect Unicode data types (Contd..)

• For example, the source (Teradata) uses SQL Server, which defines data types such as nchar, nvarchar and ntext to let the user store Unicode text.

• The target database (Sybase) uses SQL Server 2000, which uses other data types such as char and varchar for the same purpose.

• During such a migration the tester needs to pay special attention to the mapping of the source and target data columns.

• This is because data is more likely to be lost when different data types are used on the source and target sides, and the schemas of the tables may also be inconsistent.

Column            Teradata (SQL Server)   Sybase (SQL Server2000)
Employee number   NUMBER(4)               NUMBER(4)
Employee name     NCHAR(15)               CHAR(15)
Date Of Birth     DATE                    DATE
Salary            NUMBER(8,2)             NUMBER(8,2)
Address           NVARCHAR2(20)           VARCHAR2(20)
Contact Number    NUMBER(12)              NUMBER(12)
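The column-mapping check can be sketched as a comparison of the two schemas that flags every column whose Unicode type on the source side maps to a non-Unicode type on the target side. The schemas are copied from the slide; the list of Unicode type names and the function names are assumptions for illustration.

```python
# Type names treated as Unicode-capable in this sketch (an assumption).
UNICODE_TYPES = ("NCHAR", "NVARCHAR", "NVARCHAR2", "NTEXT")

SOURCE_SCHEMA = {
    "Employee number": "NUMBER(4)",
    "Employee name": "NCHAR(15)",
    "Date Of Birth": "DATE",
    "Salary": "NUMBER(8,2)",
    "Address": "NVARCHAR2(20)",
    "Contact Number": "NUMBER(12)",
}

TARGET_SCHEMA = {
    "Employee number": "NUMBER(4)",
    "Employee name": "CHAR(15)",
    "Date Of Birth": "DATE",
    "Salary": "NUMBER(8,2)",
    "Address": "VARCHAR2(20)",
    "Contact Number": "NUMBER(12)",
}


def base_type(declaration: str) -> str:
    """Strip the length/precision part, e.g. 'NCHAR(15)' -> 'NCHAR'."""
    return declaration.split("(")[0].strip().upper()


def unicode_downgrades(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Return columns that are Unicode in source but not in target."""
    risky = []
    for column, source_decl in source.items():
        target_decl = target.get(column)
        if target_decl is None:
            continue  # a missing column is a separate schema check
        if (base_type(source_decl) in UNICODE_TYPES
                and base_type(target_decl) not in UNICODE_TYPES):
            risky.append(column)
    return risky


print(unicode_downgrades(SOURCE_SCHEMA, TARGET_SCHEMA))
```

For the schemas above this flags the employee name and address columns, which are the columns to examine first for data loss after migration.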


Validating Data loss during Data migration due to incorrect Unicode data types (Contd..)

DATA BEFORE MIGRATING

Praesent Number   Lorem ipsum         Date of Birth     merces         Oratio                             contact Number
154856            Jack Daniel         24-Août-1980      2.000.000,52   27 avenue du Parc                  785-444-6589
19965             Kareem cheikh       15-Sep-75         7.500.000,75   46,3 citerne routier               698-562-4756
143547            Abhishek Nagar      09-août-1987      1.980.000,62   50 B vue sur la rivière de route   763-983-6537
84745             Praisey Priscilla   04-mai-1986       2.350.000,00   47, rue Jeyapandian                996-237-7336
158763            Getzi Miranda       20-Juillet-1985   4.534.000,54   11, rue Koilpillai                 991-653-1004

DATA AFTER MIGRATING

Praesent Number   Lorem ipsum   Date of Birth     merces         Oratio              contact Number
154856            Jack          24-Août-1980      2.000.000,52   ?????               785-444-6589
19965             Kareem        15-Sep-75         7.500.000,75   ??????????          698-562-4756
143547            Abhishek      09-Aug-1987       1.980.000,62   ????????????????/   763-983-6537
84745             Praisey       04-mai-1986       2.350.000,00   ?????               996-237-7336
158763            Getzi         20-Juillet-1985   4.534.000,54   ???????             991-653-1004
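A before/after comparison like the one in these tables can be scripted so that each damaged field is labelled by the kind of loss. This is a rough sketch: the English field names and the classification rules (a "?" appearing where there was none means replacement, a shorter prefix means truncation) are assumptions made for illustration.

```python
# Hypothetical English names for the six columns of the sample tables.
FIELDS = ("number", "name", "date_of_birth", "salary", "address", "contact")


def classify_change(before: str, after: str) -> str:
    """Label how a single field changed during migration."""
    if before == after:
        return "ok"
    if "?" in after and "?" not in before:
        return "replaced"   # characters substituted with '?'
    if before.startswith(after):
        return "truncated"  # only a leading part of the value survived
    return "changed"


def compare_rows(before: tuple, after: tuple) -> dict[str, str]:
    """Return only the fields that differ, with the kind of difference."""
    result = {}
    for field, old, new in zip(FIELDS, before, after):
        label = classify_change(old, new)
        if label != "ok":
            result[field] = label
    return result


before_row = ("154856", "Jack Daniel", "24-Août-1980", "2.000.000,52",
              "27 avenue du Parc", "785-444-6589")
after_row = ("154856", "Jack", "24-Août-1980", "2.000.000,52",
             "?????", "785-444-6589")

print(compare_rows(before_row, after_row))
```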


Validating the encoding versions during file transactions

• Data misinterpretation is a major problem faced while retrieving data from the database.

• Such situations can be avoided by clearly specifying the encoding used for that particular data; otherwise the database will use its own default encoding.

• Use of the default encoding by the database may sometimes result in data alteration or misinterpretation, so the encoding to be used should always be stated explicitly.

• E.g. UTF-8 is the default encoding in the .NET framework. If a file encoded in UTF-16 is opened in the .NET framework without clearly specifying its encoding (UTF-16), the default encoding (UTF-8) is used, which can result in unintelligible input.
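The same effect can be reproduced in a few lines. The sketch below uses Python rather than .NET and works on bytes in memory instead of a real file; the function name is hypothetical. It shows that UTF-16 data read as UTF-8 comes back unintelligible, while stating the encoding explicitly recovers the original text.

```python
def decode_bytes(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, substituting U+FFFD for undecodable sequences."""
    return raw.decode(encoding, errors="replace")


original = "年月日"
raw = original.encode("utf-16")        # file content written as UTF-16

misread = decode_bytes(raw, "utf-8")   # reader falls back to a UTF-8 default
correct = decode_bytes(raw, "utf-16")  # reader is told the real encoding

print(misread == original)
print(correct == original)
```

Decoding strictly (without `errors="replace"`) raises `UnicodeDecodeError` for this input instead of returning garbage, which is often the safer behaviour in a test harness.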


Common Mistakes

With the solutions to the various challenges known, the following points will help a tester new to multilingual testing avoid some common mistakes:

• Use standard language translators.

• Know which databases support which languages.

• Be aware of the validation checks for each language, as validation differs from language to language.

• Avoid opening multiple sessions of the same application in different languages at the same time, as the sessions can overlap.


References

• http://www.internetworldstats.com/stats7.htm

• Oracle® Database Globalization Support Guide, 10g Release 1 (10.1), Part No. B10749-02, June 2004 (b10749.pdf)

• http://publib.boulder.ibm.com/infocenter/idm/v2r2/index.jsp?topic=%2Fcom.ibm.optimd.install.doc%2F01cgintr%2Fopinstall-r-character_formats.html

• Google Translate : http://translate.google.com/?hl=en&tab=wT#

• Infosys Project Experience


Thank You !