8/13/2019 DQM Localization for China RCD
1/41
RELEASE CONTENT DOCUMENT
Data Quality Management Localization for China (v1)
Oracle China Research and Development Center
Last Updated: 24-Jun-05
Version: 1.4
Copyright 2005 Oracle Corporation
All Rights Reserved
Change Record
Date Author Version Change Reference
11-Apr-05 APG 1.0 New Document
22-Apr-05 APG 1.1 Updated reference documents
03-Jun-05 APG 1.2 Updated the match rules.
07-Jun-05 APG 1.3 Added appendix A
24-Jun-05 APG 1.4 Updated Word Replacement List
Contributors
Date Author Version Document
Reviewers
Name Position
Approvers
Name Position
Distribution
Copy Number Name Location
Table of Contents
1. Disclaimer
2. Introduction
2.1. Purpose of Document
2.2. Reference Documents
3. Data Quality Management (DQM)
3.1. DQM Localization for China
3.1.1. Overview
3.1.2. Features
3.1.2.1. Seeded Attributes and Transformation Functions
3.1.2.2. Chinese Specific Word Replacement Lists
3.1.2.3. Chinese Specific Transformation Functions
3.1.2.4. Seeded Chinese Specific Match rules
3.1.3. Product Dependencies
3.1.4. Third Party Integration Points
3.1.5. Terminology
Appendix A: Seeded Chinese Specific Match Rules
SAMPLE: SEARCH CN
HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN
HZ_PERSON_SIMPLE_SEARCH_RULE_CN
HZ_ORG_ADV_SEARCH_RULE_CN
HZ_ORG_SIMPLE_SEARCH_RULE_CN
DL SMART SEARCH CN
SAMPLE: IDENTICAL_PERSON_CN
SAMPLE: IDENTICAL_ORGANIZATIONS_CN
SAMPLE: SIMILAR_ORGANIZATION_CN
SAMPLE: SIMILAR_PERSON_CN
DL SYSTEM DUPLICATE IDENTIFICATION CN
BULK MATCH: IDENTICAL ORGANIZATIONS CN
BULK MATCH: IDENTICAL PERSONS CN
1. Disclaimer
This Release Content Document (RCD) describes product features that are proposed
for the specified release of the Oracle E-Business Suite. This document describes
new or changed functionality only. Existing functionality from prior releases is not
described.
This RCD in any form, software or printed matter, contains proprietary information
that is the exclusive property of Oracle Corporation. This document is subject to
change without notice until such time as details of the release are finalized and should
not, therefore, be taken as a commitment to deliver functionality.
This Release Content Document is intended to outline new or changed functionality
only. It is intended for information purposes only, and may not be incorporated into
any contract. It is not a commitment to deliver any material, code, or functionality,
and should not be relied upon in making purchasing decisions. The
development, release, and timing of any features or functionality described in this
Release Content Document remain at the sole discretion of Oracle.
2. Introduction
2.1. Purpose of Document
The Release Content Document (RCD), produced as part of the Oracle Product Release Process (PRP), communicates information about new or changed functionality in the
specified release of the Oracle E-Business Suite. Existing functionality from prior
releases is not described.
2.2. Reference Documents
Name Location Completion Date
Oracle Customers Online User Guide Part No. A96178-05
Oracle Data Librarian User Guide Part No. B12312-02
Oracle Trading Community Architecture User Guide Part No. B12310-02
Oracle Trading Community Architecture Administration Guide Part No. B10854-03
Oracle Trading Community Architecture Reference Guide Part No. B12311-02
Oracle Trading Community Architecture Data Quality Management Part No. A97626-02
Instructions for Applying Oracle Trading Community Architecture Mini-pack 11i.HZ.N MetaLink Note 289826.1
3. Data Quality Management (DQM)
3.1. DQM Localization for China
3.1.1. Overview
The functionality delivered is aimed mainly at the requirements of Chinese
customers who use the DQM engine on data that differs from the seeded English data.
To support transforming Chinese localized data, we are introducing seeded customer
attributes, Chinese specific word replacement lists, Chinese specific transformation
functions and seeded match rules. Some customer attributes required in China are
delivered as seeded attributes, so customers can use them directly without any
customization or coding. The Chinese specific word replacement lists include general
Simplified Chinese word lists (dictionaries) that can be used for party, address,
contact or contact point data. The Chinese specific transformation functions are based
on non-delimited word replacement lists. Because Chinese words are not separated by
white space (unlike English), these functions let customers standardize their data
effectively: they can parse Chinese text and process China specific data accurately.
This addresses the Chinese natural language processing (NLP) problem of identifying
words when searching for and identifying duplicate parties, addresses, contacts and
contact points. To match Simplified Chinese data more effectively, we have also seeded
a set of Chinese specific match rules.
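The difference between delimited and non-delimited word replacement can be sketched as follows. This is an illustration only, not the DQM engine's implementation: it assumes that a non-delimited list is applied by scanning the string and replacing the longest matching original word at each position, because Chinese text offers no spaces to split on. The function names and word list entries are hypothetical.

```python
def replace_non_delimited(text, word_list):
    """Substring replacement for text without word delimiters (illustrative sketch)."""
    # Try longer original words first so that an entry is not pre-empted by a
    # shorter entry that is its prefix (for example "CO" inside "CORP").
    keys = sorted((k for k in word_list if k), key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(word_list[key])
                i += len(key)
                break
        else:
            # No list entry starts at this position: keep the character.
            out.append(text[i])
            i += 1
    return "".join(out)


def replace_delimited(text, word_list):
    """Whole-token replacement on white-space separated words, as for English data."""
    tokens = (word_list.get(tok, tok) for tok in text.split())
    return " ".join(tok for tok in tokens if tok)


# Hypothetical entries that strip legal organization types (null replacement).
ORG_TYPES = {"有限责任公司": "", "有限公司": "", "CORP": "", "CO": ""}

print(replace_non_delimited("示例科技有限公司", ORG_TYPES))  # 示例科技
print(replace_delimited("示例科技有限公司", ORG_TYPES))      # 示例科技有限公司 (nothing to split on)
```

Plain substring matching can also fire inside unrelated words, so entries in a non-delimited list need to be chosen with care.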
3.1.2. Features
3.1.2.1. Seeded Attributes and Transformation Functions
3.1.2.1.1. Seeded Customer Attributes
We added two seeded customer attributes for Chinese person search and
identification requirements.
Type of Personal Identification
Driver's License, Identification Number, Passport Number
Personal Identification Number
If the type is Identification Number, the value is a 15-digit or 18-digit number.
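A format check for the Personal Identification Number attribute might look like the sketch below. It is an assumption-based illustration, not seeded DQM logic, and the function name is hypothetical: it accepts 15 digits, or 17 digits followed by a digit or X, since the 18-character example given under CLEANSE_CN (ID NUMBER) ends in X.

```python
import re

# Assumed shapes: old type = 15 digits; new type = 17 digits plus a final digit or "X".
ID_PATTERN = re.compile(r"[0-9]{15}|[0-9]{17}[0-9X]")


def is_plausible_id_number(value):
    """Return True if value has the length and character shape of a Chinese ID number."""
    return ID_PATTERN.fullmatch(value.strip().upper()) is not None


print(is_plausible_id_number("51272819491001001X"))  # True
print(is_plausible_id_number("12345"))               # False
```

This checks shape only; it does not verify the check character.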
3.1.2.1.2. Seeded Configuration and Setup
DQM Localization also sets the status of all transformation functions that can be
used with Chinese data to active. This makes it easier for Chinese users to define
match rules and can also greatly improve staging performance.
Note: If you want to use the seeded English DQM match rules (standard E-Business
Suite functionality), ensure that all the transformation functions used in those
match rules are set to active, and then run the DQM Staging program (staging
command: STAGE_ALL_DATA) once.
(See: DQM Staging Program, Oracle Trading Community Architecture
Data Quality Management)
3.1.2.2. Chinese Specific Word Replacement Lists
DQM Localization seeds Chinese specific word replacement lists (dictionaries) covering
all DQM attributes. Several examples are given for each word replacement list.
3.1.2.2.1. RM_MARK_CN
This word list contains symbols and marks as original words, with null replacement
words, so that these symbols are removed. For example:
Original Word Replacement Word Condition (Optional) Entity Value
*
)
@
3.1.2.2.2. FULL_PART_CN
This word list contains full-width symbols, numeric characters and alphabetic
characters as original words, with the corresponding half-width characters as
replacement words. For example:
Original Word Replacement Word Condition (Optional) Entity Value
A
b
&
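The FULL_PART_CN mapping follows a fixed offset in Unicode, so it can be sketched without a lookup table. This assumes the list covers the full-width forms of printable ASCII (U+FF01 to U+FF5E) plus the ideographic space (U+3000); the function name is hypothetical.

```python
def to_half_width(text):
    """Map full-width ASCII variants and the ideographic space to half width."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic (full-width) space
            chars.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:   # full-width "!" through "~"
            chars.append(chr(code - 0xFEE0))
        else:                            # everything else, including Chinese, is kept
            chars.append(ch)
    return "".join(chars)


print(to_half_width("\uff21\uff42\uff06"))  # Ab&
```

Python's `unicodedata.normalize("NFKC", text)` performs a similar conversion, but it also applies other compatibility mappings that may not be wanted here.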
3.1.2.2.3. CHAR_TO_PINYIN
To replace Simplified Chinese characters with pinyin. For example:
Original Word Replacement Word Condition (Optional) Entity Value
XIAO
WANG
3.1.2.2.4. TRADITION_TO_SIMPLIFY
This list is for replacing traditional Chinese with Simplified Chinese. For
example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.5. ORG_TYPE_CN
This list strips out anything equivalent to a legal organization type in China,
such as organization abbreviations, and maps English names to Chinese names. For
example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.6. ORG_BIS_CN
To remove business or industry words from an organization name string. For
example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.7. ORG_NICKNAME_CN
To replace an organization's identifying name with its nickname or abbreviation.
For example:
Original Word Replacement Word Condition (Optional) Entity Value
ICBC
ORACLE
3.1.2.2.8. PROVINCE_CN
This list deletes province names. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.9. CITY_CN
This list deletes city abbreviations. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.10. COUNTY_CN
This list deletes county abbreviations. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.11. PROVINCE_TO_SIMPLE
Convert various kinds of province names to standard abbreviations. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.12. REGION_SYNONYM_CN
To replace city and county names with their abbreviations. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.13. COUNTRY_CN
To replace city and county names with their abbreviations. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.2.14. PERSON_CN
To replace Chinese surnames. For example:
Original Word Replacement Word Condition (Optional) Entity Value
3.1.2.3. Chinese Specific Transformation Functions
DQM Localization provides 26 localization transformation functions. In total, more
than 120 transformations are seeded for more than 60 attributes.
3.1.2.3.1. EXACT_CN
To obtain a string that contains only alphabetic, numeric and Chinese characters,
with no spaces and in half-width format. All alphabetic letters in the result are
capitalized.
Description in English:
Convert full width to half width; Removes non-alphanumeric characters;
Forces upper case
Description in Chinese:
Example:
From:
To: A1
3.1.2.3.2. EXACT_CN (URL)
This transformation function produces a half-width, upper-case URL string for
exact matching.
Description in English:
Forces upper case; Convert full width to half width
Description in Chinese:
+
Example:
From: http://www.oracle.com
To: WWW.ORACLE.COM
3.1.2.3.3. EXACT_CN (EMAIL)
This transformation function produces a half-width, upper-case e-mail string for
exact matching.
Description in English:
Convert full width to half width + Forces upper case
Description in Chinese:
+
Example:
From:@oracle.Com
3.1.2.3.4. CLEANSE_CN (NUMBER)
This transformation function removes all characters other than digits, to
recognize duplicates of attributes that should contain only numeric characters,
such as phone numbers.
Description in English:
Remove nonnumeric characters; Convert full width to half width
Description in Chinese:
Example:
From: 86-10-82786000 001
To: 861082786000001
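A minimal sketch of CLEANSE_CN (NUMBER) follows; the function name is hypothetical. The English description lists removing non-numeric characters before converting full width to half width; this sketch converts first (an ordering assumption) so that full-width digits are kept rather than discarded.

```python
def cleanse_number(text):
    # Convert full-width ASCII variants (including full-width digits) to half width.
    half = "".join(
        chr(ord(c) - 0xFEE0) if 0xFF01 <= ord(c) <= 0xFF5E else c for c in text
    )
    # Keep only the ASCII digits 0-9.
    return "".join(c for c in half if "0" <= c <= "9")


print(cleanse_number("86-10-82786000 001"))  # 861082786000001
```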
3.1.2.3.5. CLEANSE_CN (ID NUMBER)
This transformation function removes all characters other than digits from
attributes that should contain only numeric characters, and then converts a new
type (18-digit) Chinese ID number to an old type (15-digit) Chinese ID number.
Description in English:
Convert new type Chinese ID Number to old type Chinese ID number.
Description in Chinese:
1815
Example:
From: 51272819491001001X
To: 512728491001001
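The new-to-old conversion can be sketched by position, based on the standard layout of Chinese resident ID numbers: an 18-character number is a 6-digit region code, an 8-digit birth date (YYYYMMDD), a 3-digit sequence number and a check character, while a 15-digit number uses a 6-digit birth date (YYMMDD) and has no check character. This sketch keeps a trailing X during cleansing so that the length stays 18, then drops it by position; the function name is hypothetical.

```python
def id_18_to_15(id_number):
    """Convert an 18-character Chinese ID number to the 15-digit old type (sketch)."""
    # Drop separators such as spaces or hyphens, but keep a trailing check character "X".
    s = "".join(c for c in id_number.upper() if "0" <= c <= "9" or c == "X")
    if len(s) == 15:
        return s  # already the old type
    if len(s) != 18:
        raise ValueError("expected 15 or 18 characters, got %d" % len(s))
    # Keep the region code (0-5), drop the century (6-7), keep YYMMDD and the
    # sequence number (8-16), and drop the check character (17).
    return s[:6] + s[8:17]


print(id_18_to_15("51272819491001001X"))  # 512728491001001
```

Because the century is discarded, people born 100 years apart with the same region code, YYMMDD and sequence number map to the same key; that is inherent in the old 15-digit format.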
3.1.2.3.6. CLEANSE_CN (URL)
This transformation function extracts the key part of a URL string by means of
the following operations.
Description in English:
Convert full width to half width; Replace non-alphanumeric characters with
white space + Keep first five words + Domain name word replacement +
Remove vowels and double letters
Description in Chinese:
+ + +
+
Example:
From: http://www.oracle.com
To: ORCL
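The CLEANSE_CN (URL) steps can be approximated as in the sketch below. Two details are assumptions inferred from the example rather than stated in the description: the result is upper-cased, and the first letter of a word is kept even when it is a vowel (ORACLE becomes ORCL). DOMAIN_WORDS is a hypothetical stand-in for the seeded domain name word replacement list, whose contents are not given here, and treats its entries as replaced with null.

```python
import re

# Hypothetical domain name words that are replaced with null (removed).
DOMAIN_WORDS = {"HTTP", "HTTPS", "WWW", "COM", "NET", "ORG", "CN"}


def squeeze(word):
    """Keep the first letter, drop later vowels, then collapse doubled letters."""
    if not word:
        return word
    body = word[0] + "".join(c for c in word[1:] if c not in "AEIOU")
    out = [body[0]]
    for c in body[1:]:
        if c != out[-1]:
            out.append(c)
    return "".join(out)


def cleanse_url(url):
    # Full width -> half width.
    half = "".join(
        chr(ord(c) - 0xFEE0) if 0xFF01 <= ord(c) <= 0xFF5E else c for c in url
    )
    # Non-alphanumeric characters -> white space; keep the first five words.
    words = re.sub(r"[^0-9A-Za-z]+", " ", half).upper().split()[:5]
    # Domain name word replacement, then vowel and double-letter removal.
    return " ".join(squeeze(w) for w in words if w not in DOMAIN_WORDS)


print(cleanse_url("http://www.oracle.com"))  # ORCL
```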
3.1.2.3.7. CLEANSE_CN (PHONE)
This transformation function removes leading zeros from a numeric string.
Description in English:
Remove nonnumeric characters; Convert full width to half width; Remove all
"0" in front of the string.
Description in Chinese:
0
Example:
From: (010)
To: 10
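A sketch of CLEANSE_CN (PHONE) follows; the function name is hypothetical. Stripping leading zeros means that, for example, an area code written as 010 or as 10 produces the same key.

```python
def cleanse_phone(text):
    # Full width -> half width, so full-width digits and brackets are normalized.
    half = "".join(
        chr(ord(c) - 0xFEE0) if 0xFF01 <= ord(c) <= 0xFF5E else c for c in text
    )
    # Remove non-numeric characters, then strip every leading "0".
    digits = "".join(c for c in half if "0" <= c <= "9")
    return digits.lstrip("0")  # an all-zero input becomes an empty string


print(cleanse_phone("(010)"))  # 10
```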
3.1.2.3.8. CLEANSE_CN (EMAIL)
This transformation function tries to catch incorrect vowel usage and typing
errors as well as mistakes with domain names.
Description in English:
Convert full width to half width; Replace non-alphanumeric characters with
white space + Domain name word replacement + Remove vowels and double
letters
Description in Chinese:
+ + +
Example:
From:@oracle.Com
To: CDC ORCL
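A sketch of CLEANSE_CN (EMAIL) follows. It assumes that the output is upper case, that the first letter of each word is kept even when it is a vowel, and that the domain name list removes common suffixes such as COM; DOMAIN_WORDS and the function names are hypothetical. The local part of the example above was lost in extraction, so `cdc` below is a hypothetical input chosen to be consistent with the CDC ORCL output.

```python
import re

# Hypothetical domain name words that are replaced with null (removed).
DOMAIN_WORDS = {"COM", "NET", "ORG", "CN", "EDU", "GOV"}


def squeeze(word):
    """Keep the first letter, drop later vowels, then collapse doubled letters."""
    body = word[:1] + "".join(c for c in word[1:] if c not in "AEIOU")
    out = []
    for c in body:
        if not out or c != out[-1]:
            out.append(c)
    return "".join(out)


def cleanse_email(email):
    half = "".join(
        chr(ord(c) - 0xFEE0) if 0xFF01 <= ord(c) <= 0xFF5E else c for c in email
    )
    # "@" and "." become white space, splitting the local part and domain into words.
    words = re.sub(r"[^0-9A-Za-z]+", " ", half).upper().split()
    return " ".join(squeeze(w) for w in words if w not in DOMAIN_WORDS)


print(cleanse_email("cdc@oracle.Com"))  # CDC ORCL
print(cleanse_email("cdc@oracel.com"))  # CDC ORCL (the misplaced vowel no longer matters)
```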
3.1.2.3.9. CLEANSE_CN (PROVINCE)
This transformation function replaces various kinds of province names with
standard abbreviations.
Description in English:
Replaces various kinds of province names with standard abbreviations.
Description in Chinese:
Example:
From:
To:
3.1.2.3.10. CLEANSE_CN (CITY)
This transformation function removes the words with the same meaning as City
from a string.
Description in English:
Removes the words with the same meaning as City from a string.
Description in Chinese:
Example:
From:
To:
3.1.2.3.11. CLEANSE_CN (COUNTY DISTRICT)
This transformation function removes the words with the same meaning as
County and district from a string.
Description in English:
Removes the words with the same meaning as County and district from a
string.
Description in Chinese:
Example:
From:
To:
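CLEANSE_CN (CITY) and CLEANSE_CN (COUNTY DISTRICT) both remove words from a list. A minimal sketch, assuming the words are stripped only when they end the attribute value, is shown below. The word lists are hypothetical examples (市 "city"; 县 "county"; 区 "district"; 自治县 "autonomous county"), not the seeded lists, and the function name is hypothetical.

```python
CITY_WORDS = ["市"]                    # hypothetical: "city"
COUNTY_WORDS = ["自治县", "县", "区"]   # hypothetical: "autonomous county", "county", "district"


def strip_suffix(value, words):
    """Remove the longest listed word that ends the value, keeping at least one character."""
    value = value.strip()
    for word in sorted(words, key=len, reverse=True):
        if value.endswith(word) and len(value) > len(word):
            return value[: -len(word)]
    return value


print(strip_suffix("北京市", CITY_WORDS))    # 北京
print(strip_suffix("海淀区", COUNTY_WORDS))  # 海淀
```

Restricting removal to the end of the value is a design choice; it avoids deleting the same character where it occurs inside a place name.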
3.1.2.3.12. REVERSE (PHONE)
This transformation function reverses the numeric characters so that the local
phone number can be recognized regardless of any area code.
Description in English:
Reverses the Phone Number and then compares the information.
Description in Chinese:
Example:
From: 86-10-82786123
To: 3216872
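REVERSE (PHONE) can be sketched as reversing the digits and keeping a fixed number of them. The example keeps seven digits (3216872), so `keep=7` is inferred from that example and is an assumption, as is the function name. Because the local number now comes first, numbers written with and without country or area codes produce the same key.

```python
def reverse_phone(text, keep=7):
    # Keep the digits only, reverse them, and truncate to the first `keep` digits.
    digits = "".join(c for c in text if "0" <= c <= "9")
    return digits[::-1][:keep]


print(reverse_phone("86-10-82786123"))  # 3216872
print(reverse_phone("010-82786123"))    # 3216872
print(reverse_phone("82786123"))        # 3216872
```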
3.1.2.3.13. FORMAT TO NUMBER
This transformation function replaces each run of non-alphanumeric characters with a single '-'.
Description in English:
Replaces non-alphanumeric characters with a single '-'
Description in Chinese:
'-'
Example:
From:
To: 14-1-2-A
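A minimal sketch of FORMAT TO NUMBER follows, assuming that a run of several non-alphanumeric characters (Chinese characters included) collapses to one '-' and that leading or trailing '-' are trimmed. The input is a hypothetical Chinese address fragment meaning roughly "building 14, unit 1, room 2, A", since the original example input was lost; the function name is also hypothetical.

```python
import re


def format_to_number(text):
    # Every run of characters other than ASCII letters and digits becomes one "-".
    formatted = re.sub(r"[^0-9A-Za-z]+", "-", text)
    # Assumption: separators at either end carry no information, so trim them.
    return formatted.strip("-")


print(format_to_number("14号楼1单元2室A"))  # 14-1-2-A
```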
3.1.2.3.14. CORE DOMAIN EXTRACTION CN
This transformation function tries to extract the core domain of e-mail addresses
and ignore ISP e-mail domains.
Description in English:
Extracts core domain of e-mail address
Description in Chinese:
Example:
From: [email protected]
To: ORACLE.CO.UK (if CO.UK is an E-Mail Domain Suffixes lookup code and is not
included in the ISP E-Mail Domains lookup type), or CO.UK (if the input does not
match codes in either lookup type)
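The two-lookup logic can be sketched as follows. The lookup contents are passed in as plain upper-case sets because the E-Mail Domain Suffixes and ISP E-Mail Domains lookup codes are configuration data. Returning an empty string for an ISP domain is an assumption about what "ignore" means, and the function name and sample addresses are hypothetical.

```python
def core_domain(email, suffixes, isp_domains):
    """Extract the core domain of an e-mail address (sketch; lookups are upper case)."""
    domain = email.rsplit("@", 1)[-1].strip().upper()
    labels = domain.split(".")
    # Look for the longest listed suffix the domain ends with.
    for n in range(len(labels) - 1, 0, -1):
        if ".".join(labels[-n:]) in suffixes:
            core = ".".join(labels[-(n + 1):])  # the suffix plus one more label
            return "" if core in isp_domains else core
    # No listed suffix: fall back to the last two labels.
    return ".".join(labels[-2:])


print(core_domain("someone@mail.oracle.co.uk", {"CO.UK"}, set()))  # ORACLE.CO.UK
print(core_domain("someone@mail.oracle.co.uk", set(), set()))      # CO.UK
```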
3.1.2.3.15. FULL DOMAIN EXTRACTION CN
From:
To:
3.1.2.3.19. WR ORG NICKNAME
Usually an organization has a standard name, but people often use a nickname
instead because it is shorter or easier to remember. This transformation function
replaces nicknames with the standard name.
Description in English:
Organization name word replacement
Description in Chinese:
Example:
From: Oracle
To:
3.1.2.3.20. WR ORG TYPE+NICKNAME
This transformation function enhances WR ORG TYPE and WR ORG NICKNAME by
combining the two.
Description in English:
Remove the legal organization types; Organization name word replacement
Description in Chinese:
+
Example:
From: Oracle
To:
3.1.2.3.21. WR ORG TYPE+REGION+BRANCH
Some companies have branch offices in different areas or different fields. This
transformation function recognizes these branch offices; as a prerequisite, it also
recognizes organization types and regions.
Description in English:
Remove the legal organization types, Administrative Region and branch offices
Description in Chinese:
Example:
From: Oracle
To: ORACLE
3.1.2.3.22. WR ORG TYPE+REGION+BRANCH+NICKNAME
This transformation function adds nickname replacement to WR ORG
TYPE+REGION+BRANCH, making it better at recognizing duplicate organization
names.
Description in English:
Remove the legal organization types, Administrative Region and branch
offices; Organization name word replacement
Description in Chinese:
+
Example:
From: Oracle
To:
3.1.2.3.23. WR ORG TYPE+REGION+BIZ+BRANCH
In Chinese, the business or industry suffix of an organization name can take
various forms that mean the same thing. This transformation function removes
these suffixes from an organization name that has been processed by the WR ORG
TYPE+REGION+BRANCH function.
Description in English:
Remove the legal organization types, Administrative Region, Branch offices
and Business or Industry
Description in Chinese:
Example:
From: Oracle
To: ORACLE
3.1.2.3.24. WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME
This transformation function integrates all of the functions above, providing the
most thorough duplicate recognition.
Description in English:
Remove the legal organization types, Administrative Region and branch
offices; Organization name word replacement
Description in Chinese:
+
Example:
From: Oracle
To:
3.1.2.3.25. WR PURGE REGION
This transformation function purges region names at the province, city and county
levels from an address string.
Description in English:
Purge province, city and county level region name from an address string.
Description in Chinese:
Example:
From:
To: 1314
3.1.2.3.26. WR TRADITION TO SIMPLIFY
This transformation function replaces traditional Chinese characters with
simplified Chinese characters.
Description in English:
Replaces traditional Chinese characters with simplified Chinese characters
Description in Chinese:
Example:
From:
To:
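Character-level replacement such as WR TRADITION TO SIMPLIFY maps naturally onto Python's `str.translate`. The six-character table below is a hypothetical excerpt; a real TRADITION_TO_SIMPLIFY list is far larger, and the function name is hypothetical.

```python
# Hypothetical excerpt of a traditional -> simplified character table.
TRAD_TO_SIMP = str.maketrans({
    "國": "国", "華": "华", "電": "电", "腦": "脑", "業": "业", "務": "务",
})


def tradition_to_simplify(text):
    # Characters without an entry (including characters that are the same in both
    # scripts, such as 中) pass through unchanged.
    return text.translate(TRAD_TO_SIMP)


print(tradition_to_simplify("中華電腦業務"))  # 中华电脑业务
```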
3.1.2.4. Seeded Chinese Specific Match rules
The Chinese match rules are based mainly on real Chinese customer requirements and
on the seeded English match rules. We provide many generic seeded match rules that
you can use as is or as the basis for your custom match rules.
There are three types of match rule: Search, Expanded Duplicate Identification and
Bulk Duplicate Identification. Other Oracle applications that implement DQM can
use the Search type match rules both for searching and for identifying duplicates of
entered or updated party information. You can use the seeded Expanded Duplicate
Identification type match rules for batch duplicate identification, to identify
duplicates that currently exist within your TCA registry. You can use the Bulk
Duplicate Identification match rules to identify duplicates when you bulk import data.
For more information about match rules, see Appendix A: Seeded Chinese Specific
Match Rules.
3.1.2.4.1. SAMPLE: SEARCH CN
Description in English:
Extensive online search based on commonly used attributes for Simplified
Chinese
Description in Chinese:
3.1.2.4.2. HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN
Description in English:
HZ: Person Advanced Search Match Rule for Simplified Chinese
Description in Chinese:
HZ
3.1.2.4.3. HZ_PERSON_SIMPLE_SEARCH_RULE_CN
Description in English:
HZ: Person Simple Search Match Rule for Simplified Chinese
Description in Chinese:
HZ
3.1.2.4.4. HZ_ORG_ADV_SEARCH_RULE_CN
Description in English:
HZ: Organization Advanced Search Match Rule for Simplified Chinese
Description in Chinese:
HZ
3.1.2.4.5. HZ_ORG_SIMPLE_SEARCH_RULE_CN
Description in English:
HZ: Organization Simple Search Match Rule for Simplified Chinese
Description in Chinese:
HZ
3.1.2.4.6. DL SMART SEARCH CN
Description in English:
Rule used for Smart Search in Data Librarian for Simplified Chinese
Description in Chinese:
3.1.2.4.7. SAMPLE: IDENTICAL_PERSON_CN
Description in English:
Finds identical person parties for Simplified Chinese
Description in Chinese:
3.1.2.4.8. SAMPLE: IDENTICAL_ORGANIZATIONS_CN
Description in English:
Finds identical Organization Parties for Simplified Chinese
Description in Chinese:
3.1.2.4.9. SAMPLE: SIMILAR_ORGANIZATION_CN
Description in English:
Finds duplicate organizations that have similar names, address, contacts or
contact points for Simplified Chinese
Description in Chinese:
3.1.2.4.10. SAMPLE: SIMILAR_PERSON_CN
Description in English:
Finds duplicate persons that have similar names, address, or contact
points for Simplified Chinese
Description in Chinese:
3.1.2.4.11. DL SYSTEM DUPLICATE IDENTIFICATION CN
Description in English:
Rule used for System Duplicate Identification in de-duplication for
Simplified Chinese
Description in Chinese:
3.1.2.4.12. BULK MATCH: IDENTICAL ORGANIZATIONS CN
Description in English:
Bulk Duplicate Identification match rule to identify organization matches for
Simplified Chinese
Description in Chinese:
3.1.2.4.13. BULK MATCH: IDENTICAL PERSONS CN
Description in English:
Bulk Duplicate Identification match rule to identify person matches for
Simplified Chinese
Description in Chinese:
3.1.3. Product Dependencies
DQM Globalization enhancement HZ.11i.N (3618299)
No DQM customization.
3.1.4. Third Party Integration Points
No changes in DQM Localization.
3.1.5. Terminology
Term Definition
Appendix A: Seeded Chinese Specific Match Rules
DQM Localization will provide 13 seeded Chinese specific match rules that you can use as is
or as the basis for your custom match rules. You can also use the seeded English match rules,
depending on your business needs.
You must run the DQM Compile All Rules Program before you can use seeded match rules.
See also: Seeded Match Rules, Oracle Trading Community Architecture Reference Guide
SAMPLE: SEARCH CN
Description in English: Extensive online search based on commonly used attributes for
Simplified Chinese
Description in Chinese:
Purpose: Search
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR ORG TYPE+REGION+BRANCH+NICKNAME
WR CHINESE PINYIN
WR PERSON
Address Address No Purge Region
City Address No CLEANSE CITY
Province Address No CLEANSE PROVINCE
Contact Name Contact No WR PERSON
Phone Number Contact Point No REVERSE
e-mail Address Contact Point No CLEANSE_CN (EMAIL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 100
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Name Party 50 EXACT_CN 100 Exact
WR ORG TYPE+NICKNAME 90 Exact
WR ORG TYPE+REGION+BRANCH+NICKNAME 50 Exact
Address Address 60 EXACT_CN 100 Exact
Purge Region 70 Exact
City Address 30 CLEANSE CITY 100 Exact
Province Address 20 CLEANSE PROVINCE 100 Exact
Contact Name Contact 40 EXACT_CN 100 Exact
WR PERSON 90 Exact
WR CHINESE PINYIN 70 Exact
Phone Number Contact Point 80 CLEANSE_CN (NUMBER) 100 Exact
REVERSE Exact
e-mail Address Contact Point 80 EXACT_CN (EMAIL) 100 Exact
CLEANSE_CN (EMAIL) 80 Exact
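To see how the scores, weights and match threshold interact, the sketch below scores a candidate against the SAMPLE: SEARCH CN scoring table above. It assumes that each matched attribute contributes its score multiplied by the weight of the highest-weighted transformation that matched, and that a candidate is a match when the total reaches the match threshold. The Phone Number REVERSE row has no weight in the table, so it is left out; the names RULE and match_score are hypothetical.

```python
# attribute -> (score, {transformation: weight %}), taken from the table above.
RULE = {
    "Name": (50, {"EXACT_CN": 100, "WR ORG TYPE+NICKNAME": 90,
                  "WR ORG TYPE+REGION+BRANCH+NICKNAME": 50}),
    "Address": (60, {"EXACT_CN": 100, "Purge Region": 70}),
    "City": (30, {"CLEANSE CITY": 100}),
    "Province": (20, {"CLEANSE PROVINCE": 100}),
    "Contact Name": (40, {"EXACT_CN": 100, "WR PERSON": 90, "WR CHINESE PINYIN": 70}),
    "Phone Number": (80, {"CLEANSE_CN (NUMBER)": 100}),
    "e-mail Address": (80, {"EXACT_CN (EMAIL)": 100, "CLEANSE_CN (EMAIL)": 80}),
}
MATCH_THRESHOLD = 100


def match_score(matched):
    """matched maps an attribute name to the transformations whose values agreed."""
    total = 0.0
    for attribute, transformations in matched.items():
        score, weights = RULE[attribute]
        hits = [weights[t] for t in transformations if t in weights]
        if hits:
            # Assumed: only the highest-weighted matching transformation counts.
            total += score * max(hits) / 100
    return total


candidate = {
    "Name": {"WR ORG TYPE+NICKNAME", "WR ORG TYPE+REGION+BRANCH+NICKNAME"},
    "City": {"CLEANSE CITY"},
    "Province": {"CLEANSE PROVINCE"},
}
print(match_score(candidate))                     # 95.0, below the threshold of 100
candidate["Phone Number"] = {"CLEANSE_CN (NUMBER)"}
print(match_score(candidate) >= MATCH_THRESHOLD)  # True (175.0)
```

Under this model a name match alone (at most 50) can never reach the threshold of 100, so the rule effectively requires agreement on several attributes.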
HZ_PERSON_ADVANCED_SEARCH_MATCH_RULE_CN
Description in English: HZ: Person Advanced Search Match Rule for Simplified Chinese
Description in Chinese:
Purpose: Search
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR PERSON
WR CHINESE PINYIN
Registry ID Party No EXACT CN
Account Number Party No EXACT (NUMBER)
Personal Identification Party No CLEANSE_CN (ID NUMBER)
Address Address No Format to Number
City Address No CLEANSE CITY
Postal Code Address No EXACT_CN
Province Address No CLEANSE PROVINCE
Country Address No EXACT
Job Title Contact No EXACT_CN
Phone Number Contact Point No REVERSE
e-mail Address Contact Point No CLEANSE_CN (EMAIL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 100
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Name Party 50 EXACT_CN 100 Exact
WR PERSON 90 Exact
WR CHINESE PINYIN 70 Exact
Registry ID Party 60 EXACT CN 100 Exact
Account Number Party 60 EXACT (NUMBER) 100 Exact
Personal Identification Party 100 CLEANSE_CN (ID NUMBER) 100 Exact
Address Address 40 EXACT_CN 100 Exact
Format to Number 70 Exact
City Address 20 CLEANSE CITY 100 Exact
Postal Code Address 40 EXACT_CN 100 Exact
Province Address 20 CLEANSE PROVINCE 100 Exact
Country Address 20 EXACT 100 Exact
Job Title Contact 40 EXACT_CN 100 Exact
Phone Number Contact Point 60 CLEANSE_CN (NUMBER) 100 Exact
REVERSE 80
e-mail Address Contact Point 60 EXACT_CN (EMAIL) 100 Exact
CLEANSE_CN (EMAIL) 80 Exact
HZ_PERSON_SIMPLE_SEARCH_RULE_CN
Description in English: HZ: Person Simple Search Match Rule for Simplified Chinese
Description in Chinese:
Purpose: Search
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR PERSON
WR CHINESE PINYIN
Registry ID Party No EXACT
Account Number Party No EXACT (NUMBER)
Job Title Contact No EXACT_CN
Phone Number Contact Point No REVERSE
e-mail Address Contact Point No CLEANSE_CN (EMAIL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 100
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Name Party 50 EXACT_CN 100 Exact
WR PERSON 90 Exact
WR CHINESE PINYIN 70 Exact
Registry ID Party 100 EXACT 100 Exact
Account Number Party 60 EXACT (NUMBER) 100 Exact
Job Title Contact 20 EXACT_CN 100 Exact
Phone Number Contact Point 80 CLEANSE_CN (NUMBER) 100 Exact
REVERSE 80 Exact
e-mail Address Contact Point 70 EXACT_CN (EMAIL) 100 Exact
CLEANSE_CN (EMAIL) 80 Exact
HZ_ORG_ADV_SEARCH_RULE_CN
Description in English: HZ: Organization Advanced Search Match Rule for Simplified Chinese
Description in Chinese: HZ
Purpose: Search
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME
Registry ID Party No EXACT
Account Number Party No EXACT (NUMBER)
Taxpayer ID Party No EXACT_CN
Address Address No Format to Number
City Address No CLEANSE CITY
URL Contact Point No CLEANSE_CN (URL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 60
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Name Party 60 EXACT_CN 100 Exact
WR ORG TYPE+NICKNAME 90 Exact
WR ORG TYPE+REGION+BRANCH+NICKNAME 50 Exact
Registry ID Party 60 EXACT 100 Exact
Account Number Party 60 EXACT (NUMBER) 100 Exact
Taxpayer ID Party 60 EXACT_CN 100 Exact
Address Address 40 EXACT_CN 100 Exact
Format to Number 70 Exact
City Address 20 CLEANSE CITY 100 Exact
Postal Code Address 40 EXACT_CN 100 Exact
Province Address 20 EXACT_CN 100 Exact
CLEANSE PROVINCE 100 Exact
Country Address 20 EXACT 100 Exact
Phone Number Contact Point 60 CLEANSE_CN (NUMBER) 100 Exact
URL Contact Point 60 CLEANSE_CN (URL) 100 Exact
HZ_ORG_SIMPLE_SEARCH_RULE_CN
Description in English: HZ: Organization Simple Search Match Rule for Simplified Chinese
Description in Chinese: HZ
Purpose: Search
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR ORG TYPE+REGION+BIZ+BRANCH+NICKNAME
Registry ID Party No EXACT
Account Number Party No EXACT (NUMBER)
Taxpayer ID Party No EXACT_CN
URL Contact Point No CLEANSE_CN (URL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Attribute Name Entity Filter Transformation Name
Name Party No WR ORG TYPE+REGION+BRANCH+NICKNAME
WR CHINESE PINYIN
WR PERSON
Registry ID Party No EXACT
Tax Registration Num Party No EXACT CN
Personal Identification Party No CLEANSE_CN (ID NUMBER)
Party Type Party No EXACT
Address Address No Purge Region
City Address No CLEANSE CITY
Province Address No CLEANSE PROVINCE
Country Address No EXACT
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 100
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Name Party 50 EXACT_CN 100 Exact
WR ORG TYPE+NICKNAME 90 Exact
WR ORG TYPE+REGION+BRANCH+NICKNAME 80 Exact
WR CHINESE PINYIN 70 Exact
WR PERSON 90 Exact
Registry ID Party 100 EXACT Exact
Tax Registration Num Party 100 EXACT CN Exact
Personal Identification Party 100 CLEANSE_CN (ID NUMBER) Exact
Party Type Party 10 EXACT Exact
Address Address 80 EXACT_CN 100 Exact
Format to Number 70 Exact
City Address 20 CLEANSE CITY 100 Exact
Province Address 10 CLEANSE PROVINCE 100 Exact
Country Address 5 EXACT 100 Exact
SAMPLE: IDENTICAL_PERSON_CN
Description in English: Finds identical person parties for Simplified Chinese
Description in Chinese:
Purpose: Expanded Duplicate Identification
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name Entity Filter Transformation Name
Party Type Party Yes EXACT
Personal Identification Party No CLEANSE_CN (ID NUMBER)
Phone Number Contact Point No REVERSE
E-Mail Address Contact Point No CLEANSE_CN (EMAIL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold Value
Match Threshold 60
Override Threshold
Automatic Merge Threshold
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name Entity Score Transformation Name Weight (%) Type Similarity (%)
Personal Identification Party 80 CLEANSE_CN (ID NUMBER) 100 Exact
Phone Number Contact Point 80 CLEANSE_CN (NUMBER) 100 Exact
REVERSE 80 Exact
E-Mail Address Contact Point 80 EXACT_CN (EMAIL) 100 Exact
CLEANSE_CN (EMAIL) 75 Exact
SAMPLE: IDENTICAL_ORGANIZATIONS_CN
Description in English: Finds identical Organization Parties for Simplified Chinese
Description in Chinese:
Purpose: Expanded Duplicate Identification
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
SAMPLE: SIMILAR_ORGANIZATION_CN
Description in English: Finds duplicate organizations that have similar names, address,
contacts or contact points for Simplified Chinese
Description in Chinese:
Purpose: Expanded Duplicate Identification
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name | Entity | Filter | Transformation Name
Name | Party | No | WR ORG TYPE+REGION+BRANCH+NICKNAME
Tax Registration Num | Party | No | EXACT CN
DUNS Number | Party | No | EXACT CN
Address | Address | No | PURGE REGION
Postal Code | Address | No | EXACT CN
Phone Number Flexible Format | Contact Point | No | CLEANSE_CN (NUMBER)
  |  |  | REVERSE
Contact Name | Contact | No | WR CHINESE PINYIN
  |  |  | WR PERSON
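Phone numbers are acquired on both a CLEANSE_CN (NUMBER) key and a REVERSE key. This appendix does not define REVERSE; one plausible reading, assumed in the sketch below, is that it keeps only the digits and reverses them, so that numbers differing only in a leading country or area code share a common prefix in reversed form. The functions reverse_phone_key and share_trailing_digits and the 8-digit comparison length are hypothetical.

```python
import re


def reverse_phone_key(raw: str) -> str:
    """Hypothetical REVERSE-style key: keep only the digits, then reverse them."""
    return re.sub(r"\D", "", raw)[::-1]


def share_trailing_digits(a: str, b: str, n: int = 8) -> bool:
    """True if both numbers have at least n digits and agree on the last n.

    Comparing the first n characters of the reversed keys is the same as
    comparing the last n digits of the original numbers.
    """
    ka, kb = reverse_phone_key(a), reverse_phone_key(b)
    return len(ka) >= n and len(kb) >= n and ka[:n] == kb[:n]


print(reverse_phone_key("+86 (10) 6505-1234"))                      # 432150560168
print(share_trailing_digits("+86 10 6505 1234", "010-6505-1234"))   # True
```

Keeping the reversed form turns a "same trailing digits" test into a prefix comparison, which is presumably why a reversed key is useful for acquisition.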
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold | Value
Match Threshold | 100
Override Threshold | (none)
Automatic Merge Threshold | (none)
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
Name | Party | 50 | EXACT CN | 100 | Exact
  |  |  | WR ORG TYPE+NICKNAME | 90 | Exact
  |  |  | WR ORG TYPE+REGION+BRANCH+NICKNAME | 70 | Exact
Tax Registration Num | Party | 100 | EXACT CN | 100 | Exact
DUNS Number | Party | 100 | EXACT CN | 100 | Exact
Address | Address | 60 | EXACT CN | 100 | Exact
  |  |  | PURGE REGION | 80 | Exact
Postal Code | Address | 20 | EXACT CN | 100 | Exact
Contact Name | Contact | 30 | EXACT CN | 90 | Exact
  |  |  | WR CHINESE PINYIN | 70 | Exact
  |  |  | WR PERSON | 90 | Exact
Phone Number Flexible Format | Contact Point | 65 | CLEANSE_CN (NUMBER) | 100 | Exact
  |  |  | REVERSE | 100 | Exact
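The Name attribute is scored at 90% for WR ORG TYPE+NICKNAME but only 70% for WR ORG TYPE+REGION+BRANCH+NICKNAME. The sketch below suggests why the broader key is trusted less. It applies word replacement lists in turn; the list entries are hypothetical illustrations, not the seeded Chinese word replacement lists, and word_replace and the two wr_* functions are hypothetical names.

```python
# Hypothetical, illustrative entries only; these are NOT the seeded
# Chinese word replacement lists described earlier in this document.
ORG_TYPE = {"股份有限公司": "", "有限公司": ""}   # company-type suffixes, removed
REGION = {"北京": "", "上海": ""}                 # place names, removed
BRANCH = {"分公司": ""}                           # "branch office", removed
NICKNAME = {"中石化": "中国石油化工"}             # short name -> full name


def word_replace(name: str, *tables: dict) -> str:
    """Apply each replacement table in turn, longest entries first."""
    for table in tables:
        for word in sorted(table, key=len, reverse=True):
            name = name.replace(word, table[word])
    return name


def wr_org_type_nickname(name: str) -> str:
    return word_replace(name, ORG_TYPE, NICKNAME)


def wr_org_type_region_branch_nickname(name: str) -> str:
    return word_replace(name, ORG_TYPE, REGION, BRANCH, NICKNAME)


a, b = "中石化北京分公司", "中国石油化工股份有限公司"
print(wr_org_type_nickname(a) == wr_org_type_nickname(b))                              # False
print(wr_org_type_region_branch_nickname(a) == wr_org_type_region_branch_nickname(b))  # True
```

The broader key equates the Beijing branch with the parent company, but it would equally equate two unrelated firms such as 北京华信有限公司 and 上海华信有限公司 (hypothetical names). Weighting it at 70% of the Name score of 50 (35 points instead of 45) presumably reflects that lower reliability.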
SAMPLE: SIMILAR_PERSON_CN
The SAMPLE: SIMILAR_PERSON_CN match rule identifies duplicate parties of type Person.
Description in English: Finds duplicate persons that have similar names, addresses, or contact points for Simplified Chinese
Description in Chinese:
Purpose: Expanded Duplicate Identification
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name | Entity | Filter | Transformation Name
Name | Party | No | WR CHINESE PINYIN
  |  |  | WR PERSON
Personal Identification | Party | No | CLEANSE_CN (ID NUMBER)
Address | Address | No | Purge Region
Postal Code | Address | No | EXACT CN
E-Mail Address | Contact Point | No | CLEANSE_CN (EMAIL)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold | Value
Match Threshold | 100
Override Threshold | (none)
Automatic Merge Threshold | (none)
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
Name | Party | 50 | EXACT CN | 100 | Exact
  |  |  | WR PERSON | 80 | Exact
  |  |  | WR CHINESE PINYIN | 60 | Exact
Personal Identification | Party | 100 | CLEANSE_CN (ID NUMBER) | 100 | Exact
E-Mail Address | Contact Point | 20 | EXACT_CN (EMAIL) | 100 | Exact
  |  |  | CLEANSE_CN (EMAIL) | 80 | Exact
Phone Number Flexible Format | Contact Point | 65 | CLEANSE_CN (NUMBER) | 100 | Exact
Address | Address | 60 | EXACT CN | 100 | Exact
  |  |  | PURGE REGION | 80 | Exact
Postal Code | Address | 20 | EXACT CN | 100 | Exact
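To see which combinations of matching attributes can reach the Match Threshold of 100, the sketch below enumerates the minimal attribute sets whose scores sum to at least the threshold. It assumes each attribute in a set matches on a 100%-weight transformation and that scores are simply added; the names SIMILAR_PERSON_CN_SCORES and minimal_qualifying_sets are hypothetical.

```python
from itertools import combinations

# SAMPLE: SIMILAR_PERSON_CN attribute scores, copied from the table above.
SIMILAR_PERSON_CN_SCORES = {
    "Name": 50,
    "Personal Identification": 100,
    "E-Mail Address": 20,
    "Phone Number Flexible Format": 65,
    "Address": 60,
    "Postal Code": 20,
}
MATCH_THRESHOLD = 100


def minimal_qualifying_sets(scores: dict, threshold: int) -> list:
    """Attribute sets whose full scores reach threshold, with no qualifying proper subset.

    Sets are generated smallest first, so any qualifying proper subset has
    already been recorded when a larger set is examined.
    """
    found = []
    for size in range(1, len(scores) + 1):
        for combo in combinations(scores, size):
            if sum(scores[name] for name in combo) < threshold:
                continue
            if any(set(prev) <= set(combo) for prev in found):
                continue
            found.append(combo)
    return found


for attribute_set in minimal_qualifying_sets(SIMILAR_PERSON_CN_SCORES, MATCH_THRESHOLD):
    print(" + ".join(attribute_set))
```

Under these assumptions, a Personal Identification match is sufficient on its own, any two of Name, Phone Number Flexible Format and Address reach the threshold, and E-Mail Address and Postal Code contribute only in combination with Phone Number Flexible Format or Address.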
DL SYSTEM DUPLICATE IDENTIFICATION CN
Description in English: Rule used for System Duplicate Identification in de-duplication for
Simplified Chinese
Description in Chinese:
Purpose: Expanded Duplicate Identification
Automerge: No
Attribute Match: Match Any Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of the matching process.
Attribute Name | Entity | Filter | Transformation Name
Name | Party | No | WR ORG TYPE+REGION+BRANCH+NICKNAME
  |  |  | WR PERSON
  |  |  | WR CHINESE PINYIN
Registry ID | Party | No | EXACT
Party Type | Party | Yes | EXACT
Tax Registration Num | Party | No | EXACT_CN
Personal Identification | Party | No | CLEANSE_CN (ID NUMBER)
Address | Address | No | PURGE REGION
City | Address | No | CLEANSE CITY
Province | Address | No | CLEANSE PROVINCE
Country | Address | Yes | EXACT
URL | Contact Point | No | CLEANSE_CN (URL)
Scoring
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
Name | Party | 50 | EXACT_CN | 100 | Exact
  |  |  | WR ORG TYPE+NICKNAME | 90 | Exact
  |  |  | WR ORG TYPE+REGION+BRANCH+NICKNAME | 70 | Exact
  |  |  | WR PERSON | 90 | Exact
  |  |  | WR CHINESE PINYIN | 60 | Exact
Registry ID | Party | 100 | EXACT CN | 100 | Exact
Tax Registration Num | Party | 100 | EXACT_CN | 100 | Exact
Personal Identification | Party | 100 | CLEANSE_CN (ID NUMBER) | 100 | Exact
Address | Address | 80 | EXACT_CN | 100 | Exact
  |  |  | PURGE REGION | 70 | Exact
City | Address | 20 | CLEANSE CITY | 100 | Exact
Province | Address | 20 | CLEANSE PROVINCE | 100 | Exact
Threshold
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold | Value
Match Threshold | 100
Override Threshold | (none)
Automatic Merge Threshold | (none)
BULK MATCH: IDENTICAL ORGANIZATIONS CN
Description in English: Bulk Duplicate Identification match rule to identify organization
matches for Simplified Chinese
Description in Chinese:
Purpose: Bulk Duplicate Identification
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name | Entity | Filter | Transformation Name
Name | Party | No | WR ORG TYPE+NICKNAME
Party Type | Party | Yes | EXACT
DUNS Number | Party | No | EXACT CN
Tax Registration Num | Party | No | EXACT CN
Address | Address | No | EXACT CN
Postal Code | Address | No | EXACT CN
Country | Address | Yes | EXACT
Contact Name | Contact | No | WR PERSON
URL | Contact Point | No | CLEANSE_CN (URL)
Raw Phone Number | Contact Point | No | REVERSE (PHONE)
Scoring
Phone Number Flexible Format (continued): REVERSE (PHONE) | 80 | Exact
BULK MATCH: IDENTICAL PERSONS CN
Description in English: Bulk Duplicate Identification match rule to identify person matches for
Simplified Chinese
Description in Chinese:
Purpose: Bulk Duplicate Identification
Automerge: No
Attribute Match: Match All Attributes
Acquisition
This table shows the seeded attributes and transformation functions for the acquisition part of
the matching process.
Attribute Name | Entity | Filter | Transformation Name
Name | Party | No | WR PERSON
Party Type | Party | Yes | EXACT
Personal Identification | Party | No | CLEANSE_CN (ID NUMBER)
Personal Identification Type | Party | Yes | EXACT CN
Address | Address | No | Purge Region
Postal Code | Address | Yes | EXACT CN
Country | Address | Yes | EXACT
Email Address | Contact Point | No | EXACT_CN (EMAIL)
Raw Phone Number | Contact Point | No | REVERSE (PHONE)
  |  |  | CLEANSE_CN (PHONE)
Scoring
This table shows the seeded thresholds for the scoring part of the matching process.
Threshold | Value
Match Threshold | 175
Override Threshold | (none)
Automatic Merge Threshold | 250
This table shows the seeded attributes and transformation functions for the scoring part of the
matching process.
Attribute Name | Entity | Score | Transformation Name | Weight (%) | Type | Similarity (%)
Name | Party | 80 | EXACT CN | 100 | Exact
  |  |  | WR PERSON | 90 | Exact
Personal Identification | Party | 200 | CLEANSE_CN (ID NUMBER) | 100 | Exact
Address | Address | 100 | EXACT CN | 100 | Exact
  |  |  | Purge Region | 80 | Exact
City | Address | 10 | CLEANSE CITY | 100 | Exact
Province | Address | 5 | CLEANSE PROVINCE | 100 | Exact
Country | Address | 5 | CLEANSE_CN (COUNTRY) | 100 | Exact
Email Address | Contact Point | 60 | EXACT_CN (EMAIL) | 100 | Exact
  |  |  | CLEANSE_CN (EMAIL) | 80 | Exact
Phone Number Flexible Format | Contact Point | 70 | CLEANSE_CN (PHONE) | 100 | Exact
  |  |  | REVERSE (PHONE) | 100 | Exact
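The two seeded thresholds of BULK MATCH: IDENTICAL PERSONS CN (175 for matching, and 250, read here as the Automatic Merge Threshold) can be applied as in the sketch below. It assumes that each attribute contributes its Score times the Weight (%) of its best matching transformation, and that, because the rule is seeded with Automerge: No, the automerge threshold has no effect unless automerge is enabled. BULK_PERSON_SCORES, bulk_person_score and classify are hypothetical names.

```python
# BULK MATCH: IDENTICAL PERSONS CN attribute scores, copied from the table above.
BULK_PERSON_SCORES = {
    "Name": 80,
    "Personal Identification": 200,
    "Address": 100,
    "City": 10,
    "Province": 5,
    "Country": 5,
    "Email Address": 60,
    "Phone Number Flexible Format": 70,
}


def bulk_person_score(best_weights: dict) -> float:
    """best_weights maps attribute -> weight (%) of its best matching transformation."""
    return sum(BULK_PERSON_SCORES[attr] * w / 100 for attr, w in best_weights.items())


def classify(score: float, match_threshold: float = 175,
             automerge_threshold: float = 250, automerge_enabled: bool = False) -> str:
    if score < match_threshold:
        return "no match"
    # Assumption: with Automerge: No (the default here) the automerge
    # threshold is never applied.
    if automerge_enabled and score >= automerge_threshold:
        return "automerge candidate"
    return "match"


print(classify(bulk_person_score({"Name": 90, "Address": 80})))    # 72 + 80 = 152 -> no match
print(classify(bulk_person_score({"Name": 100, "Address": 100})))  # 80 + 100 = 180 -> match
```

Under this model a Personal Identification match (200) clears the match threshold on its own, whereas Name and Address together reach 175 only when both match on their 100% transformations, unless other attributes add points.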