dqm localization for china rcd

Upload: pummysharma

Post on 04-Jun-2018


  • 8/13/2019 DQM Localization for China RCD

    1/41

    RELEASE CONTENT DOCUMENT

    Data Quality Management Localization for China (v1)

    Oracle China Research and Development Center

    Last Updated: 24-Jun-05

    Version: 1.4

    Copyright 2005 Oracle Corporation

    All Rights Reserved


    Change Record

    Date Author Version Change Reference

    11-Apr-05 APG 1.0 New Document

    22-Apr-05 APG 1.1 Updated reference documents

    03-Jun-05 APG 1.2 Updated the match rules.

    07-Jun-05 APG 1.3 Added appendix A

    24-Jun-05 APG 1.4 Updated Word Replacement List

    Contributors

    Date Author Version Document

    Reviewers

    Name Position

    Approvers

    Name Position

    Distribution

    Copy Number Name Location


    Table of Contents

    1. Disclaimer
    2. Introduction
    2.1. Purpose of Document
    2.2. Reference Documents
    3. Data Quality Management (DQM)
    3.1. DQM Localization for China
    3.1.1. Overview
    3.1.2. Features
    3.1.2.1. Seeded Attributes and Transformation Functions
    3.1.2.2. Chinese Specific Word Replacement Lists
    3.1.2.3. Chinese Specific Transformation Functions
    3.1.2.4. Seeded Chinese Specific Match rules
    3.1.3. Product Dependencies
    3.1.4. Third Party Integration Points
    3.1.5. Terminology
    Appendix A: Seeded Chinese Specific Match Rules
    SAMPLE: SEARCH CN
    HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN
    HZ_PERSON_SIMPLE_SEARCH_RULE_CN
    HZ_ORG_ADV_SEARCH_RULE_CN
    HZ_ORG_SIMPLE_SEARCH_RULE_CN
    DL SMART SEARCH CN
    SAMPLE: IDENTICAL_PERSON_CN
    SAMPLE: IDENTICAL_ORGANIZATIONS_CN
    SAMPLE: SIMILAR_ORGANIZATION_CN
    SAMPLE: SIMILAR_PERSON_CN
    DL SYSTEM DUPLICATE IDENTIFICATION CN
    BULK MATCH: IDENTICAL ORGANIZATIONS CN
    BULK MATCH: IDENTICAL PERSONS CN


    1. Disclaimer

    This Release Content Document (RCD) describes product features that are proposed

    for the specified release of the Oracle E-Business Suite. This document describes

    new or changed functionality only. Existing functionality from prior releases is not

    described.

    This RCD in any form, software or printed matter, contains proprietary information

    that is the exclusive property of Oracle Corporation. This document is subject to

    change without notice until such time as details of the release are finalized and should

    not, therefore, be taken as a commitment to deliver functionality.

    This Release Content Document is intended to outline new or changed functionality

    only. It is intended for information purposes only, and may not be incorporated into

    any contract. It is not a commitment to deliver any material, code, or functionality,

    and should not be relied upon in making purchasing decisions. The

    development, release, and timing of any features or functionality described in this

    Release Content Document remain at the sole discretion of Oracle.


    2. Introduction

    2.1. Purpose of Document

    The Release Content Document (RCD), produced as part of Oracle Product Release Process (PRP), communicates information about new or changed functionality in the

    specified release of the Oracle E-Business Suite. Existing functionality from prior

    releases is not described.

    2.2. Reference Documents

    Name | Location | Completion Date
    Oracle Customers Online User Guide | Part No. A96178-05 |
    Oracle Data Librarian User Guide | Part No. B12312_02 |
    Oracle Trading Community Architecture User Guide | Part No. B12310_02 |
    Oracle Trading Community Architecture Administration Guide | Part No. B10854_03 |
    Oracle Trading Community Architecture Reference Guide | Part No. B12311-02 |
    Oracle Trading Community Architecture Data Quality Management | Part No. A97626-02 |
    Instructions for Applying Oracle Trading Community Architecture Mini-pack 11i.HZ.N | MetaLink Note 289826.1 |


    3. Data Quality Management (DQM)

    3.1. DQM Localization for China

    3.1.1. Overview

    This functionality is aimed mainly at China customers who use the DQM engine on data that differs from the seeded English data. To support the transformation of Chinese localized data, we are introducing Seeded Customer Attributes, Chinese Specific Word Replacement Lists, Chinese Specific Transformation Functions, and Seeded Match Rules.

    Some customer attributes required in China are delivered as seeded attributes, so customers can use them directly without any customization or coding. The Chinese Specific Word Replacement Lists include general Simplified Chinese word lists (dictionaries) that can be used for party, address, contact, or contact point data. The Chinese Specific Transformation Functions are based on Non-Delimited Word Replacement Lists. They let customers standardize data in Chinese, where, unlike English, words are not separated by white space. That is, they can parse Chinese sentences and process China-specific data accurately, which addresses the Chinese NLP (Natural Language Processing) problem that arises when searching for and identifying duplicate parties, addresses, contacts, and contact points. To match Simplified Chinese data more effectively, we have also seeded some Chinese Specific Match Rules.
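The idea of a non-delimited word replacement can be illustrated with a short sketch. This is not the DQM implementation; it assumes a greedy longest-match scan, and the function name `replace_non_delimited` and the entries in `SAMPLE_LIST` are hypothetical, chosen only for illustration.

```python
def replace_non_delimited(text, word_list):
    """Scan left to right and replace the longest listed word found at each position."""
    max_len = max((len(word) for word in word_list), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so longer entries win over their substrings.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in word_list:
                out.append(word_list[candidate])
                i += size
                break
        else:
            # No listed word starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical entries (not taken from the seeded lists).
SAMPLE_LIST = {"有限公司": "", "股份有限公司": "", "北京市": "北京"}

print(replace_non_delimited("北京市甲骨文股份有限公司", SAMPLE_LIST))
```

Because there is no white space to split on, the scan has to test substrings at every position; trying the longest entry first keeps a long organization type from being only partly removed by a shorter entry.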

    3.1.2. Features

    3.1.2.1. Seeded Attributes and Transformation Functions

    3.1.2.1.1. Seeded Customer Attributes

    We added two seeded customer attributes to support Chinese person search and identification requirements:

    Type of Personal Identification: Driver's License, Identification Number, or Passport Number

    Personal Identification Number: if the type is Identification Number, this is a 15- or 18-digit number

    3.1.2.1.2. Seeded Configuration and Setup

    DQM Localization additionally sets the status of all transformation functions that can be used with Chinese data to active. This makes it easier for Chinese users to define match rules and can also greatly improve staging performance.

    Note: If you want to use the seeded English DQM match rules (E-Business Suite standard functionality), ensure that all transformation functions used in the match rule are set to active, and then run the DQM Staging program once (Staging Command: STAGE_ALL_DATA). (See: DQM Staging Program, Oracle Trading Community Architecture Data Quality Management)


    3.1.2.2. Chinese Specific Word Replacement Lists

    DQM Localization seeds 12 Word Replacement Lists (dictionaries) that cover all DQM attributes. Several examples are given for each word replacement list.

    3.1.2.2.1. RM_MARK_CN

    This word list contains symbols, marks, and similar characters as the original words; the replacement words are null. For example:

    Original Word | Replacement Word | Condition (Optional) | Entity Value
    * | (null) | |
    ) | (null) | |
    @ | (null) | |

    3.1.2.2.2. FULL_PART_CN

    This word list contains full-width symbols, numeric characters, and alphabetic characters as the original words; the replacement words are the corresponding half-width characters. For example:

    Original Word | Replacement Word | Condition (Optional) | Entity Value
    Ａ | A | |
    ｂ | b | |
    ＆ | & | |
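The mapping in this list is regular enough to compute. The sketch below is an illustration and not the seeded list: it assumes the full-width forms are the Unicode block U+FF01 to U+FF5E plus the ideographic space U+3000, and the function name `to_half_width` is hypothetical.

```python
def to_half_width(text):
    """Map full-width ASCII variants and the ideographic space to half-width forms."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            # Ideographic (full-width) space becomes an ordinary space.
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            # Full-width '!' .. '~' sit exactly 0xFEE0 above their ASCII counterparts.
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


print(to_half_width("Ａｂ＆"))
```

Chinese characters fall outside the mapped range and pass through unchanged.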

    3.1.2.2.3. CHAR_TO_PINYIN

    To replace Simplified Chinese characters with pinyin. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    XIAO

    WANG

    3.1.2.2.4. TRADITION_TO_SIMPLIFY

    This list is for replacing traditional Chinese with Simplified Chinese. For

    example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.5. ORG_TYPE_CN


    This list is for stripping out anything equivalent to legal organization type in

    China, organization abbreviation, English name to Chinese name, etc. For

    example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.6. ORG_BIS_CN

    To remove business or industry words from organization name string. For

    example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.7. ORG_NICKNAME_CN

    To replace organization identification name with their nickname or abbreviation.

    For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    ICBC

    ORACLE

    3.1.2.2.8. PROVINCE_CN

    This list is to delete provinces. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.9. CITY_CN

    This list is to delete city abbreviation. For example:

    Original Word Replacement Word Condition (Optional) Entity Value


    3.1.2.2.10. COUNTY_CN

    This list is to delete county abbreviation. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.11. PROVINCE_TO_SIMPLE

    Convert various kinds of province names to standard abbreviations. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.12. REGION_SYNONYM_CN

    To replace city, county with abbreviated one. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.13. COUNTRY_CN

    To replace city, county with abbreviated one. For example:

    Original Word Replacement Word Condition (Optional) Entity Value

    3.1.2.2.14. PERSON_CN

    To replace Chinese surname. For example:

    Original Word Replacement Word Condition (Optional) Entity Value


    3.1.2.3. Chinese Specific Transformation Functions

    We provide 26 localization transformation functions. In total, more than 120 transformations are seeded across the 60+ attributes.

    3.1.2.3.1. EXACT_CN

    To obtain a string that contains only the alphabetic, numeric and Chinese

    characters, without any space and in half width format. In the result, all

    alphabetic letters should be capitalized.

    Description in English:

    Convert full width to half width; Removes non-alphanumeric characters;

    Forces upper case

    Description in Chinese:

    Example:

    From:

    To: A1
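The three steps in the English description can be sketched as follows. This is an illustration under stated assumptions, not the seeded function: it uses Unicode NFKC normalization for the width conversion (which also folds some other compatibility characters), and both the name `exact_cn` and the sample input are hypothetical because the original example input did not survive in this copy.

```python
import unicodedata


def exact_cn(value):
    """Half-width conversion, removal of non-alphanumerics, then upper case."""
    # NFKC maps full-width letters, digits, and symbols to their half-width forms.
    half = unicodedata.normalize("NFKC", value)
    # str.isalnum() is true for Chinese characters as well as letters and digits.
    kept = [ch for ch in half if ch.isalnum()]
    return "".join(kept).upper()


print(exact_cn("ａ－１"))
```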

    3.1.2.3.2. EXACT_CN (URL)

    This transformation function produces a half-width, upper-case URL string for exact matching.

    Description in English:

    Forces upper case; Convert full width to half width

    Description in Chinese:

    Example:

    From: http://www.oracle.com

    To: WWW.ORACLE.COM

    3.1.2.3.3. EXACT_CN (EMAIL)

    This transformation function produces a half-width, upper-case e-mail string for exact matching.

    Description in English:

    Convert full width to half width + Forces upper case

    Description in Chinese:

    +

    Example:

    From:@oracle.Com

    To: [email protected]

    3.1.2.3.4. CLEANSE_CN (NUMBER)

    This transformation function removes all non-numeric characters so that duplicates can be recognized in attributes that should contain only numeric characters, such as phone numbers.

    Description in English:

    Remove nonnumeric characters; Convert full width to half width

    Description in Chinese:

    Example:

    From: 86-10-82786000 001

    To: 861082786000001

    3.1.2.3.5. CLEANSE_CN (ID NUMBER)

    This transformation function removes all non-numeric characters from attributes that should contain only numeric characters, and then converts a new-type (18-digit) Chinese ID number to the old-type (15-digit) Chinese ID number.

    Description in English:

    Convert new type Chinese ID Number to old type Chinese ID number.

    Description in Chinese:

    1815

    Example:

    From: 51272819491001001X

    To: 512728491001001
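The conversion in the example can be reproduced with simple slicing. The sketch below relies on the standard layout of the 18-character ID (6-digit region code, 8-digit birth date, 3-digit sequence, 1 check character) and of the 15-digit ID (6-digit region code, 6-digit birth date, 3-digit sequence). The function name `id18_to_15` is hypothetical, and keeping the check character `X` during cleaning is an assumption made so that the length test still works.

```python
def id18_to_15(value):
    """Reduce an 18-character Chinese ID number to the older 15-digit form."""
    # Keep digits plus the check character 'X' so an 18-character ID is still recognizable.
    cleaned = "".join(ch for ch in value.upper() if ch in "0123456789X")
    if len(cleaned) == 18:
        # Drop the two century digits (positions 7-8) and the trailing check character.
        return cleaned[:6] + cleaned[8:17]
    return cleaned


print(id18_to_15("51272819491001001X"))
```

Normalizing both forms to 15 digits lets an old-type and a new-type number for the same person compare as equal.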

    3.1.2.3.6. CLEANSE_CN (URL)

    This transformation function tries to obtain a critical string from a URL string by

    means of the following operations.

    Description in English:

    Convert full width to half width; Replace non-alphanumeric characters with

    white space + Keep first five words + Domain name word replacement +

    Remove vowels and double letters

    Description in Chinese:

    + + +

    +

    Example:

    From: http://www.oracle.com

    To: ORCL
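The pipeline above can be sketched as follows. Several details are assumptions inferred from the example rather than stated in this document: the contents of `DOMAIN_WORDS` are hypothetical, and the leading letter of each word is kept when vowels are removed, because the example keeps the initial "O" of ORACLE. The function names are illustrative only.

```python
import re
import unicodedata

# Hypothetical domain-name word list; the seeded list is not shown in this document.
DOMAIN_WORDS = {"HTTP", "HTTPS", "WWW", "COM"}


def cleanse_word(word):
    """Keep the first letter, drop later vowels, then collapse repeated letters."""
    head, tail = word[:1], word[1:]
    tail = re.sub(r"[AEIOU]", "", tail)
    return re.sub(r"(.)\1+", r"\1", head + tail)


def cleanse_url(value):
    """Apply the steps in order: width, separators, first five words, word list, cleanse."""
    half = unicodedata.normalize("NFKC", value).upper()
    # Replace runs of non-alphanumeric characters with a single space, keep five words.
    words = re.sub(r"[\W_]+", " ", half).split()[:5]
    kept = [word for word in words if word not in DOMAIN_WORDS]
    return " ".join(cleanse_word(word) for word in kept)


print(cleanse_url("http://www.oracle.com"))
```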

    3.1.2.3.7. CLEANSE_CN (PHONE)

    This transformation function removes leading zeros from a numeric string.

    Description in English:

    Remove nonnumeric characters; Convert full width to half width; Remove all "0" in front of the string.

    Description in Chinese:

    Example:

    From: (010)

    To: 10
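A minimal sketch of the numeric cleansing used by CLEANSE_CN (NUMBER) and CLEANSE_CN (PHONE) follows. It is an illustration, not the seeded code: the function names are hypothetical, and NFKC normalization is assumed for the full-width to half-width step.

```python
import unicodedata


def cleanse_number(value):
    """Convert to half width and keep only the digits 0-9."""
    half = unicodedata.normalize("NFKC", value)
    return "".join(ch for ch in half if ch in "0123456789")


def cleanse_phone(value):
    """Cleanse as a number, then remove every leading zero."""
    return cleanse_number(value).lstrip("0")


print(cleanse_number("86-10-82786000 001"))
print(cleanse_phone("(010)"))
```

Removing the leading zero means that an area code written as "010" and one written as "10" produce the same key.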

    3.1.2.3.8. CLEANSE_CN (EMAIL)

    This transformation function tries to catch incorrect vowel usage and typing

    errors as well as mistakes with domain names.

    Description in English:

    Convert full width to half width; Replace non-alphanumeric characters with white space + Domain name word replacement + Remove vowels and double letters

    Description in Chinese:

    + + +

    Example:

    From:@oracle.Com

    To: CDC ORCL

    3.1.2.3.9. CLEANSE_CN (PROVINCE)

    This transformation function replaces various kinds of province names with standard abbreviations.

    Description in English:

    Replaces various kinds of province names with standard abbreviations.

    Description in Chinese:

    Example:

    From:

    To:

    3.1.2.3.10. CLEANSE_CN (CITY)

    This transformation function removes the words with the same meaning as City

    from a string.

    Description in English:

    Removes the words with the same meaning as City from a string.

    Description in Chinese:

    Example:

    From:

    To:

    3.1.2.3.11. CLEANSE_CN (COUNTY DISTRICT)

    This transformation function removes the words with the same meaning as

    County and district from a string.

    Description in English:


    Removes the words with the same meaning as County and district from a

    string.

    Description in Chinese:

    Example:

    From:

    To:

    3.1.2.3.12. REVERSE (PHONE)

    This transformation function reverses the numeric characters so that the local phone number can be recognized without its area code.

    Description in English:

    Reverses the Phone Number and then compares the information.

    Description in Chinese:

    Example:

    From: 86-10-82786123

    To: 3216872
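The example can be reproduced with the sketch below. The document does not state how many digits are kept; the seven-digit result in the example suggests a cut-off, so the `keep=7` default is an assumption, as is the function name `reverse_phone`.

```python
def reverse_phone(value, keep=7):
    """Strip non-digits, reverse the digits, and keep the leading `keep` characters."""
    digits = "".join(ch for ch in value if ch in "0123456789")
    # After reversal the local number comes first, so country and area codes fall off the end.
    return digits[::-1][:keep]


print(reverse_phone("86-10-82786123"))
```

Under this assumption the same local number written with a different prefix, such as "010-82786123", yields the same key.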

    3.1.2.3.13. FORMAT TO NUMBER

    This transformation function replaces non-alphanumeric characters with a single '-'.

    Description in English:

    Replaces non-alphanumeric characters with a single '-'

    Description in Chinese:

    '-'

    Example:

    From:

    To: 14-1-2-A
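A sketch of this behavior follows. The original example input did not survive in this copy, so the sample address fragment below is hypothetical, chosen only because it yields the documented output. The sketch also assumes that "alphanumeric" means ASCII letters and digits, so that Chinese characters act as separators, and that leading or trailing '-' characters are trimmed. The function name is illustrative.

```python
import re
import unicodedata


def format_to_number(value):
    """Collapse every run of characters other than ASCII letters and digits into one '-'."""
    half = unicodedata.normalize("NFKC", value)
    # Assumption: separators at either end are dropped rather than kept as '-'.
    return re.sub(r"[^0-9A-Za-z]+", "-", half).strip("-")


# Hypothetical input: building 14, unit 1, floor 2, room A.
print(format_to_number("14号楼1单元2层A"))
```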

    3.1.2.3.14. CORE DOMAIN EXTRACTION CN

    This transformation function tries to extract the core domain of e-mail addresses

    and ignore ISP e-mail domains.

    Description in English:

    Extracts core domain of e-mail address

    Description in Chinese:

    Example:

    From: [email protected]

    To: ORACLE.CO.UK (if CO.UK is an E-Mail Domain Suffixes lookup code and not included in the ISP E-Mail Domains lookup type)

    CO.UK (if the input does not match codes in either lookup type)

    3.1.2.3.15. FULL DOMAIN EXTRACTION CN


    From:

    To:

    3.1.2.3.19. WR ORG NICKNAME

    An organization usually has a standard name, but people often use a nickname instead because it is shorter or easier to remember. This transformation function replaces nickname variants with the standard name.

    Description in English:

    Organization name word replacement

    Description in Chinese:

    Example:

    From:Oracle

    To:

    3.1.2.3.20. WR ORG TYPE+NICKNAME

    This transformation function is an enhancement of WR_ORG_TYPE and WR_ORGNICKNAME that combines the two.

    Description in English:

    Remove the legal organization types; Organization name word replacement

    Description in Chinese:

    +

    Example:

    From:Oracle

    To:

    3.1.2.3.21. WR ORG TYPE+REGION+BRANCH

    Some companies have branch offices in different areas or different fields. This transformation function tries to recognize these branch offices; as a prerequisite, the organization types and regions must be recognized first.

    Description in English:

    Remove the legal organization types, Administrative Region and branch offices

    Description in Chinese:

    Example:

    From:Oracle

    To: ORACLE

    3.1.2.3.22. WR ORG TYPE+REGION+BRANCH+NICKNAME

    This transformation function adds nickname replacement to WR ORG TYPE+REGION+BRANCH so that duplicate organization names are recognized more reliably.


    Description in English:

    Remove the legal organization types, Administrative Region and branch

    offices; Organization name word replacement

    Description in Chinese:

    +

    Example:

    From:Oracle

    To:

    3.1.2.3.23. WR ORG TYPE+REGION+BIZ+BRANCH

    In Chinese context the Business or Industry suffixes of an organization name can

    be in various formats that mean the same. This transformation function removes

    these suffixes from an organization name that is processed by WR ORG

    TYPE+REGION+BRANCH function.

    Description in English:

    Remove the legal organization types, Administrative Region, Branch offices

    and Business or Industry

    Description in Chinese:

    Example:

    From:Oracle

    To: ORACLE

    3.1.2.3.24. WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME

    This transformation function integrates all of the functions above, which gives it a stronger capability for recognizing duplicates.

    Description in English:

    Remove the legal organization types, Administrative Region and branch

    offices; Organization name word replacement

    Description in Chinese:

    +

    Example:

    From:Oracle

    To:

    3.1.2.3.25. WR PURGE REGION

    This transformation function purges region names at the province, city, and county level from an address string.

    Description in English:

    Purge province, city and county level region name from an address string.

    Description in Chinese:

    Example:


    From:

    To: 1314

    3.1.2.3.26. WR TRADITION TO SIMPLIFY

    This transformation function replaces traditional Chinese characters with

    simplified Chinese characters.

    Description in English:

    Replaces traditional Chinese characters with simplified Chinese characters

    Description in Chinese:

    Example:

    From:

    To:

    3.1.2.4. Seeded Chinese Specific Match rules

    The Chinese match rules are constructed mainly from real Chinese customer requirements and the seeded English match rules. We provide a number of generic seeded match rules that you can use as they are or as a basis for your custom match rules.

    There are three types of match rule: Search, Expanded Duplicate Identification, and Bulk Duplicate Identification. Other Oracle applications that implement DQM can use the Search type match rules both for searching and for identifying duplicates of entered or updated party information. You can use the seeded Expanded Duplicate Identification match rules for batch duplicate identification, to identify duplicates that already exist within your TCA registry. You can use the Bulk Duplicate Identification match rules to identify duplicates when you bulk import data.

    For more information about match rules, see Appendix A: Seeded Chinese Specific Match Rules.

    3.1.2.4.1. SAMPLE: SEARCH CN

    Description in English:

    Extensive online search based on commonly used attributes for Simplified

    Chinese

    Description in Chinese:

    3.1.2.4.2. HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN

    Description in English:

    HZ: Person Advanced Search Match Rule for Simplified Chinese

    Description in Chinese:

    HZ

    3.1.2.4.3. HZ_PERSON_SIMPLE_SEARCH_RULE_CN

    Description in English:

    HZ: Person Simple Search Match Rule for Simplified Chinese


    Description in Chinese:

    HZ

    3.1.2.4.4. HZ_ORG_ADV_SEARCH_RULE_CN

    Description in English:

    HZ: Organization Advanced Search Match Rule for Simplified Chinese

    Description in Chinese:HZ

    3.1.2.4.5. HZ_ORG_SIMPLE_SEARCH_RULE_CN

    Description in English:

    HZ: Organization Simple Search Match Rule for Simplified Chinese

    Description in Chinese:

    HZ

    3.1.2.4.6. DL SMART SEARCH CN

    Description in English:

    Rule used for Smart Search in Data Librarian for Simplified Chinese

    Description in Chinese:

    3.1.2.4.7. SAMPLE: IDENTICAL_PERSON_CN

    Description in English:

    Finds identical person parties for Simplified Chinese

    Description in Chinese:

    3.1.2.4.8. SAMPLE: IDENTICAL_ORGANIZATION_CN

    Description in English:

    Finds identical Organization Parties for Simplified Chinese

    Description in Chinese:

    3.1.2.4.9. SAMPLE: SIMILAR_ORGANIZATION_CN

    Description in English:

    Finds duplicate organizations that have similar names, address, contacts or

    contact points for Simplified Chinese

    Description in Chinese:

    3.1.2.4.10. SAMPLE: SIMILAR_PERSON_CN

    Description in English:

    Finds duplicate persons that have similar names, address, or contact points for Simplified Chinese

    Description in Chinese:


    3.1.2.4.11. DL SYSTEM DUPLICATE IDENTIFICATION CN

    Description in English:

    Rule used for System Duplicate Identification in de-duplication for

    Simplified Chinese

    Description in Chinese:

    3.1.2.4.12. BULK MATCH: IDENTICAL ORGANIZATIONS CN

    Description in English:

    Bulk Duplicate Identification match rule to identify organization matches for

    Simplified Chinese

    Description in Chinese:

    3.1.2.4.13. BULK MATCH: IDENTICAL PERSONS CN

    Description in English:

    Bulk Duplicate Identification match rule to identify person matches for

    Simplified Chinese

    Description in Chinese:

    3.1.3. Product Dependencies

    DQM Globalization enhancement HZ.11i.N (3618299)

    No DQM customization.

    3.1.4. Third Party Integration Points

    No update in DQM Localization

    3.1.5. Terminology

    Term Definition


    Appendix A: Seeded Chinese Specific Match Rules

    DQM Localization will provide 13 seeded Chinese specific match rules that you can use or base

    your custom match rules on. Also you can use seeded English match rules depending on your

    business needs.

    You must run the DQM Compile All Rules Program before you can use seeded match rules.

    See also: Seeded Match Rules, Oracle Trading Community Architecture Reference Guide

    SAMPLE: SEARCH CN

    Description in English: Extensive online search based on commonly used attributes for

    Simplified Chinese

    Description in Chinese:

    Purpose: Search

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of

    the matching process.

    Attribute Name | Entity | Filter | Transformation Name
    Name | Party | No | WR ORG TYPE+REGION+BRANCH+NICKNAME
     | | | WR CHINESE PINYIN
     | | | WR PERSON
    Address | Address | No | Purge Region
    City | Address | No | CLEANSE CITY
    Province | Address | No | CLEANSE PROVINCE
    Contact Name | Contact | No | WR PERSON
    Phone Number | Contact Point | No | REVERSE
    e-mail Address | Contact Point | No | CLEANSE_CN (EMAIL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold | Value
    Match Threshold | 100
    Override Threshold |
    Automatic Merge Threshold |

    This table shows the seeded attributes and transformation functions for the scoring part of the

    matching process.

    Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
    Name | Party | 50 | EXACT_CN | 100 | Exact |
     | | | WR ORG TYPE+NICKNAME | 90 | Exact |
     | | | WR ORG TYPE+REGION+BRANCH+NICKNAME | 50 | Exact |
    Address | Address | 60 | EXACT_CN | 100 | Exact |
     | | | Purge Region | 70 | Exact |
    City | Address | 30 | CLEANSE CITY | 100 | Exact |
    Province | Address | 20 | CLEANSE PROVINCE | 100 | Exact |
    Contact Name | Contact | 40 | EXACT_CN | 100 | Exact |
     | | | WR PERSON | 90 | Exact |
     | | | WR CHINESE PINYIN | 70 | Exact |
    Phone Number | Contact Point | 80 | CLEANSE_CN (NUMBER) | 100 | Exact |
     | | | REVERSE | | Exact |
    e-mail Address | Contact Point | 80 | EXACT_CN (EMAIL) | 100 | Exact |
     | | | CLEANSE_CN (EMAIL) | 80 | Exact |
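To show how the scores, weights, and match threshold listed for SAMPLE: SEARCH CN could interact, here is a small sketch. The combination formula is an assumption and is not stated in this document: each matching attribute is taken to contribute its score multiplied by the weight of the highest-weighted transformation that matched, and the total is compared with the match threshold. The sketch ignores the Attribute Match setting, and the names `RULE` and `total_score` are hypothetical.

```python
# Scoring rows for four attributes: attribute -> (score, {transformation name: weight %}).
RULE = {
    "Name": (50, {
        "EXACT_CN": 100,
        "WR ORG TYPE+NICKNAME": 90,
        "WR ORG TYPE+REGION+BRANCH+NICKNAME": 50,
    }),
    "Address": (60, {"EXACT_CN": 100, "Purge Region": 70}),
    "City": (30, {"CLEANSE CITY": 100}),
    "Province": (20, {"CLEANSE PROVINCE": 100}),
}
MATCH_THRESHOLD = 100


def total_score(matched):
    """`matched` maps an attribute to the set of transformations whose values agreed."""
    total = 0.0
    for attribute, names in matched.items():
        score, weights = RULE[attribute]
        # Assumption: only the best-weighted matching transformation counts.
        best = max((weights[name] for name in names if name in weights), default=0)
        total += score * best / 100
    return total


candidate = {
    "Name": {"WR ORG TYPE+NICKNAME", "WR ORG TYPE+REGION+BRANCH+NICKNAME"},
    "Address": {"Purge Region"},
    "City": {"CLEANSE CITY"},
}
print(total_score(candidate), total_score(candidate) >= MATCH_THRESHOLD)
```

Under this assumption the candidate scores 45 for the name, 42 for the address, and 30 for the city, which together exceed the threshold of 100.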


    HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN

    Description in English: HZ: Person Advanced Search Match Rule for Simplified Chinese

    Description in Chinese:

    Purpose: Search

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of

    the matching process.

    Attribute Name | Entity | Filter | Transformation Name
    Name | Party | No | WR PERSON
     | | | WR CHINESE PINYIN
    Registry ID | Party | No | EXACT CN
    Account Number | Party | No | EXACT (NUMBER)
    Personal Identification | Party | No | CLEANSE_CN (ID NUMBER)
    Address | Address | No | Format to Number
    City | Address | No | CLEANSE CITY
    Postal Code | Address | No | EXACT_CN
    Province | Address | No | CLEANSE PROVINCE
    Country | Address | No | EXACT
    Job Title | Contact | No | EXACT_CN
    Phone Number | Contact Point | No | REVERSE
    e-mail Address | Contact Point | No | CLEANSE_CN (EMAIL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold | Value
    Match Threshold | 100
    Override Threshold |
    Automatic Merge Threshold |

    This table shows the seeded attributes and transformation functions for the scoring part of the

    matching process.

    Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
    Name | Party | 50 | EXACT_CN | 100 | Exact |
     | | | WR PERSON | 90 | Exact |
     | | | WR CHINESE PINYIN | 70 | Exact |
    Registry ID | Party | 60 | EXACT CN | 100 | Exact |
    Account Number | Party | 60 | EXACT (NUMBER) | 100 | Exact |
    Personal Identification | Party | 100 | CLEANSE_CN (ID NUMBER) | 100 | Exact |
    Address | Address | 40 | EXACT_CN | 100 | Exact |
     | | | Format to Number | 70 | Exact |
    City | Address | 20 | CLEANSE CITY | 100 | Exact |
    Postal Code | Address | 40 | EXACT_CN | 100 | Exact |
    Province | Address | 20 | CLEANSE PROVINCE | 100 | Exact |
    Country | Address | 20 | EXACT | 100 | Exact |
    Job Title | Contact | 40 | EXACT_CN | 100 | Exact |
    Phone Number | Contact Point | 60 | CLEANSE_CN (NUMBER) | 100 | Exact |
     | | | REVERSE | 80 | |
    e-mail Address | Contact Point | 60 | EXACT_CN (EMAIL) | 100 | Exact |
     | | | CLEANSE_CN (EMAIL) | 80 | Exact |

    HZ_PERSON_SIMPLE_SEARCH_RULE_CN

    Description in English: HZ: Person Simple Search Match Rule for Simplified Chinese

    Description in Chinese:

    Purpose: Search

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of

    the matching process.

    Attribute Name | Entity | Filter | Transformation Name
    Name | Party | No | WR PERSON
     | | | WR CHINESE PINYIN
    Registry ID | Party | No | EXACT
    Account Number | Party | No | EXACT (NUMBER)
    Job Title | Contact | No | EXACT_CN
    Phone Number | Contact Point | No | REVERSE
    e-mail Address | Contact Point | No | CLEANSE_CN (EMAIL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold | Value
    Match Threshold | 100
    Override Threshold |
    Automatic Merge Threshold |

    This table shows the seeded attributes and transformation functions for the scoring part of the

    matching process.

    Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
    Name | Party | 50 | EXACT_CN | 100 | Exact |
     | | | WR PERSON | 90 | Exact |
     | | | WR CHINESE PINYIN | 70 | Exact |
    Registry ID | Party | 100 | EXACT | 100 | Exact |
    Account Number | Party | 60 | EXACT (NUMBER) | 100 | Exact |
    Job Title | Contact | 20 | EXACT_CN | 100 | Exact |
    Phone Number | Contact Point | 80 | CLEANSE_CN (NUMBER) | 100 | Exact |
     | | | REVERSE | 80 | Exact |
    e-mail Address | Contact Point | 70 | EXACT_CN (EMAIL) | 100 | Exact |
     | | | CLEANSE_CN (EMAIL) | 80 | Exact |

    HZ_ORG_ADV_SEARCH_RULE_CN

    Description in English: HZ: Organization Advanced Search Match Rule for Simplified Chinese

    Description in Chinese: HZ

    Purpose: Search

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of

    the matching process.

    Attribute Name | Entity | Filter | Transformation Name
    Name | Party | No | WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME
    Registry ID | Party | No | EXACT
    Account Number | Party | No | EXACT (NUMBER)
    Taxpayer ID | Party | No | EXACT_CN
    Address | Address | No | Format to Number
    City | Address | No | CLEANSE CITY
    URL | Contact Point | No | CLEANSE_CN (URL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold | Value
    Match Threshold | 60
    Override Threshold |
    Automatic Merge Threshold |

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           60      EXACT_CN                             100          Exact
                                                           WR ORG TYPE+NICKNAME                 90           Exact
                                                           WR ORG TYPE+REGION+BRANCH+NICKNAME   50           Exact
    Registry ID                    Party           60      EXACT                                100          Exact
    Account Number                 Party           60      EXACT (NUMBER)                       100          Exact
    Taxpayer ID                    Party           60      EXACT_CN                             100          Exact
    Address                        Address         40      EXACT_CN                             100          Exact
                                                           Format to Number                     70           Exact
    City                           Address         20      CLEANSE CITY                         100          Exact
    Postal Code                    Address         40      EXACT_CN                             100          Exact
    Province                       Address         20      EXACT_CN                             100          Exact
                                                           CLEANSE PROVINCE                     100          Exact
    Country                        Address         20      EXACT                                100          Exact
    Phone Number                   Contact Point   60      CLEANSE_CN (NUMBER)                  100          Exact
    URL                            Contact Point   60      CLEANSE_CN (URL)                     100          Exact
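    The scoring values for HZ_ORG_ADV_SEARCH_RULE_CN can be read as a small calculation. The sketch below assumes, since this appendix does not spell it out, that an attribute contributes its score multiplied by the weight of the highest-weighted transformation that matched, and that the contributions are summed and compared with the match threshold. The names SCORING, attribute_score and total_score are illustrative only and are not part of DQM; only a subset of the rule's attributes is included.

```python
# Illustrative only: a subset of the HZ_ORG_ADV_SEARCH_RULE_CN scoring table.
# Layout: attribute -> (score, {transformation name: weight in percent}).
SCORING = {
    "Name": (60, {
        "EXACT_CN": 100,
        "WR ORG TYPE+NICKNAME": 90,
        "WR ORG TYPE+REGION+BRANCH+NICKNAME": 50,
    }),
    "Address": (40, {"EXACT_CN": 100, "Format to Number": 70}),
    "City": (20, {"CLEANSE CITY": 100}),
    "Country": (20, {"EXACT": 100}),
}
MATCH_THRESHOLD = 60


def attribute_score(attribute, matched):
    """Score for one attribute, given the transformations that matched.

    Assumption: only the highest-weighted matching transformation counts.
    """
    score, weights = SCORING[attribute]
    best = max((weights[t] for t in matched if t in weights), default=0)
    return score * best / 100


def total_score(matches):
    """Sum attribute scores; `matches` maps attribute -> matched transformations."""
    return sum(attribute_score(a, ts) for a, ts in matches.items())


candidate = {
    "Name": {"WR ORG TYPE+REGION+BRANCH+NICKNAME"},  # 60 * 50% = 30
    "City": {"CLEANSE CITY"},                        # 20 * 100% = 20
}
print(total_score(candidate) >= MATCH_THRESHOLD)     # 50 is below 60: False
candidate["Country"] = {"EXACT"}                     # adds 20 * 100% = 20
print(total_score(candidate) >= MATCH_THRESHOLD)     # 70 reaches 60: True
```

    Under these assumptions a weak name match alone (30 points) cannot reach the threshold of 60, while an exact name match (60 points) can.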

    HZ_ORG_SIMPLE_SEARCH_RULE_CN

    Description in English: HZ: Organization Simple Search Match Rule for Simplified Chinese

    Description in Chinese: HZ

    Purpose: Search

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME
    Registry ID                    Party           No       EXACT
    Account Number                 Party           No       EXACT (NUMBER)
    Taxpayer ID                    Party           No       EXACT_CN
    URL                            Contact Point   No       CLEANSE_CN (URL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.


    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR ORG TYPE+REGION+BRANCH+NICKNAME
                                                            WR CHINESE PINYIN
                                                            WR PERSON
    Registry ID                    Party           No       EXACT
    Tax Registration Num           Party           No       EXACT CN
    Personal Identification        Party           No       CLEANSE_CN (ID NUMBER)
    Party Type                     Party           No       EXACT
    Address                        Address         No       Purge Region
    City                           Address         No       CLEANSE CITY
    Province                       Address         No       CLEANSE PROVINCE
    Country                        Address         No       EXACT

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                100
    Override Threshold
    Automatic Merge Threshold

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           50      EXACT_CN                             100          Exact
                                                           WR ORG TYPE+NICKNAME                 90           Exact
                                                           WR ORG TYPE+REGION+BRANCH+NICKNAME   80           Exact
                                                           WR CHINESE PINYIN                    70           Exact
                                                           WR PERSON                            90           Exact
    Registry ID                    Party           100     EXACT                                             Exact
    Tax Registration Num           Party           100     EXACT CN                                          Exact
    Personal Identification        Party           100     CLEANSE_CN (ID NUMBER)                            Exact
    Party Type                     Party           10      EXACT                                             Exact
    Address                        Address         80      EXACT_CN                             100          Exact
                                                           Format to Number                     70           Exact
    City                           Address         20      CLEANSE CITY                         100          Exact
    Province                       Address         10      CLEANSE PROVINCE                     100          Exact
    Country                        Address         5       EXACT                                100          Exact

    SAMPLE: IDENTICAL_PERSON_CN

    Description in English: Finds identical person parties for Simplified Chinese

    Description in Chinese:

    Purpose: Expanded Duplicate Identification

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Party Type                     Party           Yes      EXACT
    Personal Identification        Party           No       CLEANSE_CN (ID NUMBER)
    Phone Number                   Contact Point   No       REVERSE
    E-Mail Address                 Contact Point   No       CLEANSE_CN (EMAIL)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                60
    Override Threshold
    Automatic Merge Threshold

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Personal Identification        Party           80      CLEANSE_CN (ID NUMBER)               100          Exact
    Phone Number                   Contact Point   80      CLEANSE_CN (NUMBER)                  100          Exact
                                                           REVERSE                              80           Exact
    E-Mail Address                 Contact Point   80      EXACT_CN (EMAIL)                     100          Exact
                                                           CLEANSE_CN (EMAIL)                   75           Exact

    SAMPLE: IDENTICAL_ORGANIZATIONS_CN

    Description in English: Finds identical Organization Parties for Simplified Chinese

    Description in Chinese:

    Purpose: Expanded Duplicate Identification

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.


    Number

    SAMPLE: SIMILAR_ORGANIZATION_CN

    Description in English: Finds duplicate organizations that have similar names, address, contacts or contact points for Simplified Chinese

    Description in Chinese:

    Purpose: Expanded Duplicate Identification

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR ORG TYPE+REGION+BRANCH+NICKNAME
    Tax Registration Num           Party           No       EXACT CN
    DUNS Number                    Party           No       EXACT CN
    Address                        Address         No       PURGE REGION
    Postal Code                    Address         No       EXACT CN
    Phone Number Flexible Format   Contact Point   No       CLEANSE_CN (NUMBER)
                                                            REVERSE
    Contact Name                   Contact         No       WR CHINESE PINYIN
                                                            WR PERSON

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                100
    Override Threshold
    Automatic Merge Threshold

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           50      EXACT CN                             100          Exact
                                                           WR ORG TYPE+NICKNAME                 90           Exact
                                                           WR ORG TYPE+REGION+BRANCH+NICKNAME   70           Exact
    Tax Registration Num           Party           100     EXACT CN                             100          Exact
    DUNS Number                    Party           100     EXACT CN                             100          Exact
    Address                        Address         60      EXACT CN                             100          Exact
                                                           PURGE REGION                         80           Exact
    Postal Code                    Address         20      EXACT CN                             100          Exact
    Contact Name                   Contact         30      EXACT CN                             90           Exact
                                                           WR CHINESE PINYIN                    70           Exact
                                                           WR PERSON                            90           Exact
    Phone Number Flexible Format   Contact Point   65      CLEANSE_CN (NUMBER)                  100          Exact
                                                           REVERSE                              100          Exact

    SAMPLE: SIMILAR_PERSON_CN

    The SAMPLE: SIMILAR_PERSON_CN match rule identifies duplicate parties of type

    Description in English: Finds duplicate persons that have similar names, address, or contact points for Simplified Chinese

    Description in Chinese:

    Purpose: Expanded Duplicate Identification

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR CHINESE PINYIN
                                                            WR PERSON
    Personal Identification        Party           No       CLEANSE_CN (ID NUMBER)
    Address                        Address         No       Purge Region
    Postal Code                    Address         No       EXACT CN
    E-Mail Address                 Contact Point   No       CLEANSE_CN (EMAIL)
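    The "Attribute Match" setting changes how the acquisition attributes combine. The sketch below contrasts SAMPLE: SIMILAR_PERSON_CN (Match All Attributes) with SAMPLE: SIMILAR_ORGANIZATION_CN (Match Any Attributes). It assumes that "Match All" requires every listed acquisition attribute to match and "Match Any" requires at least one; how DQM treats attributes for which no input value is supplied is not covered here. The dictionaries and the is_acquired function are hypothetical names used for illustration.

```python
# Acquisition attributes copied from two rules in this appendix.
SIMILAR_PERSON_CN = {
    "attribute_match": "all",
    "attributes": [
        "Name", "Personal Identification", "Address",
        "Postal Code", "E-Mail Address",
    ],
}
SIMILAR_ORGANIZATION_CN = {
    "attribute_match": "any",
    "attributes": [
        "Name", "Tax Registration Num", "DUNS Number", "Address",
        "Postal Code", "Phone Number Flexible Format", "Contact Name",
    ],
}


def is_acquired(rule, matched_attributes):
    """Return True if a record passes acquisition for the given rule.

    Assumption: "all" means every acquisition attribute matched,
    "any" means at least one acquisition attribute matched.
    """
    hits = [a in matched_attributes for a in rule["attributes"]]
    return all(hits) if rule["attribute_match"] == "all" else any(hits)


matched = {"Name", "Postal Code"}
print(is_acquired(SIMILAR_ORGANIZATION_CN, matched))  # True: one hit is enough
print(is_acquired(SIMILAR_PERSON_CN, matched))        # False: three attributes missed
```

    Read this way, a Match Any rule casts a wider net at acquisition and relies on scoring to narrow the results, while a Match All rule is already selective before scoring starts.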

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                100
    Override Threshold
    Automatic Merge Threshold

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           50      EXACT CN                             100          Exact
                                                           WR PERSON                            80           Exact
                                                           WR CHINESE PINYIN                    60           Exact
    Personal Identification        Party           100     CLEANSE_CN (ID NUMBER)               100          Exact
    E-Mail Address                 Contact Point   20      EXACT_CN (EMAIL)                     100          Exact
                                                           CLEANSE_CN (EMAIL)                   80           Exact
    Phone Number Flexible Format   Contact Point   65      CLEANSE_CN (NUMBER)                  100          Exact
    Address                        Address         60      EXACT CN                             100          Exact
                                                           PURGE REGION                         80           Exact
    Postal Code                    Address         20      EXACT CN                             100          Exact

    DL SYSTEM DUPLICATE IDENTIFICATION CN

    Description in English: Rule used for System Duplicate Identification in de-duplication for Simplified Chinese

    Description in Chinese:

    Purpose: Expanded Duplicate Identification

    Automerge: No

    Attribute Match: Match Any Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR ORG TYPE+REGION+BRANCH+NICKNAME
                                                            WR PERSON
                                                            WR CHINESE PINYIN
    Registry ID                    Party           No       EXACT
    Party Type                     Party           Yes      EXACT
    Tax Registration Num           Party           No       EXACT_CN
    Personal Identification        Party           No       CLEANSE_CN (ID NUMBER)
    Address                        Address         No       PURGE REGION
    City                           Address         No       CLEANSE CITY
    Province                       Address         No       CLEANSE PROVINCE
    Country                        Address         Yes      EXACT
    URL                            Contact Point   No       CLEANSE_CN (URL)

    Scoring

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           50      EXACT_CN                             100          Exact
                                                           WR ORG TYPE+NICKNAME                 90           Exact
                                                           WR ORG TYPE+REGION+BRANCH+NICKNAME   70           Exact
                                                           WR PERSON                            90           Exact
                                                           WR CHINESE PINYIN                    60           Exact
    Registry ID                    Party           100     EXACT CN                             100          Exact
    Tax Registration Num           Party           100     EXACT_CN                             100          Exact
    Personal Identification        Party           100     CLEANSE_CN (ID NUMBER)               100          Exact
    Address                        Address         80      EXACT_CN                             100          Exact
                                                           PURGE REGION                         70           Exact
    City                           Address         20      CLEANSE CITY                         100          Exact
    Province                       Address         20      CLEANSE PROVINCE                     100          Exact

    Threshold

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                100
    Override Threshold
    Automatic Merge Threshold

    BULK MATCH: IDENTICAL ORGANIZATIONS CN

    Description in English: Bulk Duplicate Identification match rule to identify organization matches for Simplified Chinese

    Description in Chinese:

    Purpose: Bulk Duplicate Identification

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR ORG TYPE+NICKNAME
    Party Type                     Party           Yes      EXACT
    DUNS Number                    Party           No       EXACT CN
    Tax Registration Num           Party           No       EXACT CN
    Address                        Address         No       EXACT CN
    Postal Code                    Address         No       EXACT CN
    Country                        Address         Yes      EXACT
    Contact Name                   Contact         No       WR PERSON
    URL                            Contact Point   No       CLEANSE_CN (URL)
    Raw Phone Number               Contact Point   No       REVERSE (PHONE)

    Scoring


    Format

    REVERSE (PHONE) 80 Exact

    BULK MATCH: IDENTICAL PERSONS CN

    Description in English: Bulk Duplicate Identification match rule to identify person matches for Simplified Chinese

    Description in Chinese:

    Purpose: Bulk Duplicate Identification

    Automerge: No

    Attribute Match: Match All Attributes

    Acquisition

    This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.

    Attribute Name                 Entity          Filter   Transformation Name
    Name                           Party           No       WR PERSON
    Party Type                     Party           Yes      EXACT
    Personal Identification        Party           No       CLEANSE_CN (ID NUMBER)
    Personal Identification Type   Party           Yes      EXACT CN
    Address                        Address         No       Purge Region
    Postal Code                    Address         Yes      EXACT CN
    Country                        Address         Yes      EXACT
    Email Address                  Contact Point   No       EXACT_CN (EMAIL)
    Raw Phone Number               Contact Point   No       REVERSE (PHONE)
                                                            CLEANSE_CN (PHONE)

    Scoring

    This table shows the seeded thresholds for the scoring part of the matching process.

    Threshold                      Value
    Match Threshold                175
    Override Threshold
    Automatic Merge Threshold      250
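    The two threshold values of BULK MATCH: IDENTICAL PERSONS CN can be thought of as cut-off points on the total score. The sketch below is an illustration under stated assumptions: it reads the value 250 as the automatic merge threshold, treats the match threshold as the lowest total score that counts as a match, and only applies the merge threshold when automerge is enabled (this rule is seeded with Automerge: No). The classify function and its return labels are hypothetical.

```python
MATCH_THRESHOLD = 175
AUTOMATIC_MERGE_THRESHOLD = 250  # assumption: the 250 belongs to this row


def classify(total_score, automerge_enabled=False):
    """Place a total score relative to the seeded thresholds.

    automerge_enabled defaults to False because the rule is seeded
    with Automerge: No.
    """
    if total_score < MATCH_THRESHOLD:
        return "no match"
    if automerge_enabled and total_score >= AUTOMATIC_MERGE_THRESHOLD:
        return "automatic merge candidate"
    return "match"


for score in (174, 175, 260):
    print(score, classify(score), classify(score, automerge_enabled=True))
```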

    This table shows the seeded attributes and transformation functions for the scoring part of the matching process.

    Attribute Name                 Entity          Score   Transformation Name                  Weight (%)   Type    Similarity (%)
    Name                           Party           80      EXACT CN                             100          Exact
                                                           WR PERSON                            90           Exact
    Personal Identification        Party           200     CLEANSE_CN (ID NUMBER)               100          Exact
    Address                        Address         100     EXACT CN                             100          Exact
                                                           Purge Region                         80           Exact
    City                           Address         10      CLEANSE CITY                         100          Exact
    Province                       Address         5       CLEANSE PROVINCE                     100          Exact
    Country                        Address         5       CLEANSE_CN (COUNTRY)                 100          Exact
    Email Address                  Contact Point   60      EXACT_CN (EMAIL)                     100          Exact
                                                           CLEANSE_CN (EMAIL)                   80           Exact
    Phone Number Flexible Format   Contact Point   70      CLEANSE_CN (PHONE)                   100          Exact
                                                           REVERSE (PHONE)                      100          Exact