Structure-based Web Access Method for Ancient Chinese Characters

Xiaoqing Lu Yingmin Tang Zhi Tang Yujun Gao Jianguo Zhang

Institute of Computer Science and Technology, Peking University, Beijing, 100871, China

Beijing Founder Electronics Co., Ltd., Beijing, 100085, China

Center for Chinese Font Design and Research, Beijing, 100871, China

State Key Laboratory of Digital Publishing Technology (Peking University Founder Group Co., Ltd.), Beijing, 100871, China

{lvxiaoqing,tangyingmin,tangzhi}@pku.edu.cn, {gao_yujun,zjg}@founder.com

2013.11.19, Chongqing, China


NLP&CC 2013

Outline

• Background

• Formalization of relationships between ACCs and modern characters

• Establishment of Super Large Font

• ACC Database

• Implementation and results


Background (1/3)

• Ancient Chinese Characters (ACCs)
  - Important heritage of Chinese history
  - Date back at least 3,300 years
  - Development is not one-dimensional
  - Collection, management, and access on the Internet


Background (2/3)

• Problem 1: Involves very large quantities of modern characters

Block                      Range          Comment
CJK Unified Ideographs     4E00–9FFF      Common
Extension A                3400–4DBF      Rare
Extension B                20000–2A6DF    Rare, historic
Extension C                2A700–2B73F    Rare, historic
Extension D                2B740–2B81F    Uncommon, some in current use
Compatibility              F900–FAFF      Duplicates, unifiable variants, corporate characters
Compatibility Supplement   2F800–2FA1F    Unifiable variants
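To make the scale of the block table concrete, here is a minimal Python sketch that looks up which listed block a character falls in and sums the block sizes. The ranges are copied from the table; the names `CJK_BLOCKS`, `block_of`, and `total_code_points` are illustrative and do not come from the slides.

```python
# Block ranges copied from the table above (inclusive code points).
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DF),
    ("Extension C", 0x2A700, 0x2B73F),
    ("Extension D", 0x2B740, 0x2B81F),
    ("Compatibility", 0xF900, 0xFAFF),
    ("Compatibility Supplement", 0x2F800, 0x2FA1F),
]


def block_of(char):
    """Return the name of the listed block containing `char`, or None."""
    code_point = ord(char)
    for name, first, last in CJK_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def total_code_points():
    """Sum the sizes of the listed blocks (positions, not assigned characters)."""
    return sum(last - first + 1 for _name, first, last in CJK_BLOCKS)
```

For example, `block_of("中")` falls in the basic block, while anything at or above 20000 (hex) lies outside the Basic Multilingual Plane. The total counts code positions in the listed ranges, which is an upper bound on the number of assigned characters, not an exact character count.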


Background (3/3)

• Problems 2 & 3
  - Lack of software code
  - Traditional IMEs are not suitable for ACCs


Related work

• 1993, Xusheng Ji

• 1994, Ning Li

• 1996, Fangzheng Chen

• 2003, Zaixing Zhang

• 2004, Zhiji Liu

• 2005, Derming Juang

• 2007, Yi Zhuang

• 2008, James S. Kirk

• 2008, Dan Chen

• ... ...


Outline

• Background

• Formalization of relationships between ACCs and modern characters

• Establishment of Super Large Font

• ACC Database

• Implementation and results


2 Formalization of relationships between ACCs and modern characters

• Contemporary encoded characters
  - Existing encoded Chinese characters
  - Marks for uncoded Chinese characters

• ACCs
  - Corresponding relationships with contemporary encoded characters
  - No corresponding relationships with contemporary encoded characters


2 Formalization of relationships between ACCs and modern characters

• Two relations

• Three Types of ACCs
  - Recognized characters
  - Ambiguous characters
  - Unrecognized characters
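The three types can be illustrated with a small Python sketch. The slides do not spell out the exact criteria, so the rule used here is an assumption: an ACC with exactly one corresponding modern character is treated as recognized, one with several candidates as ambiguous, and one with none as unrecognized. `AccType` and `classify_acc` are hypothetical names.

```python
from enum import Enum


class AccType(Enum):
    RECOGNIZED = "recognized"
    AMBIGUOUS = "ambiguous"
    UNRECOGNIZED = "unrecognized"


def classify_acc(modern_candidates):
    """Classify an ACC by its candidate modern characters.

    Assumed rule (not stated in the slides):
    no candidate -> unrecognized, one -> recognized, several -> ambiguous.
    """
    distinct = set(modern_candidates)
    if not distinct:
        return AccType.UNRECOGNIZED
    if len(distinct) == 1:
        return AccType.RECOGNIZED
    return AccType.AMBIGUOUS
```

Under this assumption, unrecognized characters cannot be reached through a modern code point at all, which is why a structure-based access path is needed alongside Unicode lookup.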


Outline

• Background

• Formalization of relationships between ACCs and modern characters

• Establishment of Super Large Font

• ACC Database

• Implementation and results


3 Establishment of Super Large Font

• Automatic generation of Chinese characters [27-30]

• Rules regarding glyph structure decomposition

• Redundant expressions of glyph structures are permitted

• Multi-level radicals
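The points above (decomposition rules, redundant expressions, multi-level radicals) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the table `DECOMPOSITIONS` holds a handful of hand-picked modern characters, structures are written with Unicode Ideographic Description Characters (⿰ left-right, ⿱ top-bottom, ⿲ left-middle-right), and the function names are hypothetical.

```python
from functools import lru_cache

# Hypothetical decomposition table. A character may list several
# decompositions, which models the "redundant expressions" point.
DECOMPOSITIONS = {
    "相": [("⿰", ("木", "目"))],
    "想": [("⿱", ("相", "心"))],
    "湘": [("⿰", ("氵", "相")), ("⿲", ("氵", "木", "目"))],
    "好": [("⿰", ("女", "子"))],
}


@lru_cache(maxsize=None)
def all_components(char):
    """Collect components of `char` at every level of decomposition."""
    found = set()
    for _operator, parts in DECOMPOSITIONS.get(char, []):
        for part in parts:
            found.add(part)
            found |= all_components(part)  # descend into sub-components
    return frozenset(found)


def find_by_components(wanted):
    """Return characters whose multi-level components include all of `wanted`."""
    wanted = set(wanted)
    return sorted(c for c in DECOMPOSITIONS if wanted <= all_components(c))
```

Because components are collected at every level, a query for 木 and 目 matches 相 directly and also 想 and 湘, which contain 相 as a component. Allowing redundant expressions means a user can reach the same character through whichever decomposition they happen to see.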


Outline

• Background

• Formalization of relationships between ACCs and modern characters

• Establishment of Super Large Font

• ACC Database

• Implementation and results


ACC Database (1/3)

• Relation Schema

Item             Meaning
Unicode          Contemporary Chinese character Unicode for this ancient character.
Dynasty          Dynasty when this ancient character was used.
Type             Type of this ancient character (e.g. pictographic characters, ideograph, and phonogram).
Classification   Class type of this ancient character (e.g. inscriptions on bones or tortoise shells of the Shang Dynasty, inscriptions on bronze, seal character, etc.).
Place            Contemporary place where this ancient character was unearthed.
Carrier          Carrier of this ancient character (e.g. the name or the number of a certain bronze implement).
Country          Ancient country where this ancient character was used.
SubbaseID        Number of the font database storing this ancient character.
SubID            Code of the ancient character, used in sub-font database.
Filename         File name for the picture of this ancient character.
ID               The unique ID of this ancient character in the font database.
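The relation schema above could be realized, for instance, as a single SQLite table. The sketch below uses Python's standard `sqlite3` module; the table name `acc`, the column types, and the choice of `ID` as primary key are assumptions, since the slides give only item names and meanings.

```python
import sqlite3

# Column names follow the relation schema; types are assumed.
SCHEMA = """
CREATE TABLE acc (
    ID             INTEGER PRIMARY KEY,  -- unique ID in the font database
    Unicode        TEXT,                 -- code of the matching modern character
    Dynasty        TEXT,
    Type           TEXT,
    Classification TEXT,
    Place          TEXT,
    Carrier        TEXT,
    Country        TEXT,
    SubbaseID      INTEGER,              -- which sub-font database holds the glyph
    SubID          INTEGER,              -- code inside that sub-font database
    Filename       TEXT                  -- picture of the ancient character
);
"""


def open_db():
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def find_by_unicode(conn, code_point):
    """List the ancient forms recorded for one modern code point."""
    cursor = conn.execute(
        "SELECT ID, Dynasty, Classification, Filename "
        "FROM acc WHERE Unicode = ? ORDER BY ID",
        (code_point,),
    )
    return cursor.fetchall()
```

Since one modern character can have many ancient forms across dynasties and carriers, `Unicode` is deliberately not unique here; the pair `SubbaseID` and `SubID` locates the glyph inside the font.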


ACC Database (2/3)

• Other relation schemas
  - Dynasty and Country (DC_RS)
  - Ancient C_Character Classification (ACCC_RS)
  - ACC Type (ACCT_RS)
  - Unicode and Glyph (UG_RS)
  - Radical and Component (RC_RS)
  - Ancient Image (AI_RS)
  - Contemporary Image (CI_RS)


ACC Database (3/3)

• Relationships of the data tables


Outline

• Background

• Formalization of relationships between ACCs and modern characters

• Establishment of Super Large Font

• ACC Database

• Implementation and results


Implementation and results

• Retrieval method


Implementation and results
