dkgbuilder:anarchitectureforbuilding ... · dkgbuilder:anarchitectureforbuilding...

1
DKGBuilder : An Architecture for Building a Domain Knowledge Graph from Scratch Yan Fan 1 , Chengyu Wang 1 , Guomin Zhou 2 , Xiaofeng He 1 1. Shanghai Key Laboratory of Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University 2 . Department of Computer and Information Technology, Zhejiang Police College DKGBuilder : An Architecture for Building a Domain Knowledge Graph from Scratch Yan Fan 1 , Chengyu Wang 1 , Guomin Zhou 2 , Xiaofeng He 1 1. Shanghai Key Laboratory of Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University 2 . Department of Computer and Information Technology, Zhejiang Police College DKGB uilder has an offline module and an online module, responsible for DKG construction and demonstration respectively. The offline module is consists of three parts: 1) s eed knowledge graph construction, which takes a couple of human - defined template names from Wikipedia to obtain domain entities and extracts seed attributes and relations with a simple approach of pattern matching; 2) fine - grained entity categorization, a way to construct domain taxonomy via is - a relation classification; and 3) representation learning and relation extraction, aiming at harvesting long - tail domain facts from text by a word embedding based linear projection model under distant supervision. The online module provides services like semantic search, deep reading, etc. DKGB uilder has an offline module and an online module, responsible for DKG construction and demonstration respectively. The offline module is consists of three parts: 1) s eed knowledge graph construction, which takes a couple of human - defined template names from Wikipedia to obtain domain entities and extracts seed attributes and relations with a simple approach of pattern matching; 2) fine - grained entity categorization, a way to construct domain taxonomy via is - a relation classification; and 3) representation learning and relation extraction, aiming at harvesting long - tail domain facts from text by a word embedding based linear projection model under distant supervision. The online module provides services like semantic search, deep reading, etc. Knowledge graph (KG) is a semantic network used to model entities and the relations between them. While m ost automatically constructed Chinese KGs are general and large - scale, they are insufficient in long - tail entities and relations in specific domains. To meet the needs of practical applications for a certain domain, we propose a general framework to construct a Chinese domain knowledge graph (DKG). It utilizes Wikipedia pages related to the entertainment industry as data source for demonstration purpose, extracts seed entities and relations from categories and infoboxes to construct an initial DKG, and employs a word embedding based linear projection model to cover more long - tail facts from texts. Knowledge graph (KG) is a semantic network used to model entities and the relations between them. While m ost automatically constructed Chinese KGs are general and large - scale, they are insufficient in long - tail entities and relations in specific domains. To meet the needs of practical applications for a certain domain, we propose a general framework to construct a Chinese domain knowledge graph (DKG). It utilizes Wikipedia pages related to the entertainment industry as data source for demonstration purpose, extracts seed entities and relations from categories and infoboxes to construct an initial DKG, and employs a word embedding based linear projection model to cover more long - tail facts from texts. Fig. 3. Deep Reading Fig. 4. Semantic Search Fig. 2. Representation Learning and Relation Extraction Fig. 1. System Architecture # Entities # Entities 100,848 100,848 # Rel. Facts # Rel. Facts 481,562 481,562 # Attr . Facts # Attr . Facts 251,183 251,183 # Rel. Types # Rel. Types 46 46 # Attr . Types # Attr . Types 33 33 Avg Acc. Avg Acc. 93.1% 93.1% Table 1. Descriptions of Chinese DKG

Upload: others

Post on 13-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DKGBuilder:AnArchitectureforBuilding ... · DKGBuilder:AnArchitectureforBuilding aDomainKnowledgeGraphfromScratch YanFan1,ChengyuWang1,GuominZhou2,XiaofengHe1 1.Shanghai Key Laboratory

DKGBuilder: An Architecture for Buildinga Domain Knowledge Graph from Scratch

Yan Fan1, Chengyu Wang1, Guomin Zhou2, Xiaofeng He1

1. Shanghai Key Laboratory of Trustworthy Computing,School of Computer Science and Software Engineering, East China Normal University2. Department of Computer and Information Technology, Zhejiang Police College

DKGBuilder: An Architecture for Buildinga Domain Knowledge Graph from Scratch

Yan Fan1, Chengyu Wang1, Guomin Zhou2, Xiaofeng He1

1. Shanghai Key Laboratory of Trustworthy Computing,School of Computer Science and Software Engineering, East China Normal University2. Department of Computer and Information Technology, Zhejiang Police College

DKGBuilder has an offline module and an online module, responsible for DKG construction and demonstration respectively.The offline module is consists of three parts: 1) seed knowledge graph construction, which takes acouple of human-defined template names from Wikipedia to obtain domain entities and extractsseed attributes and relations with a simple approach of pattern matching; 2) fine-grained entitycategorization, a way to construct domain taxonomy via is-a relation classification; and 3)representation learning and relation extraction, aiming at harvesting long-tail domain facts fromtext by a word embedding based linear projection model under distant supervision.The online module provides services like semantic search, deep reading, etc.

DKGBuilder has an offline module and an online module, responsible for DKG construction and demonstration respectively.The offline module is consists of three parts: 1) seed knowledge graph construction, which takes acouple of human-defined template names from Wikipedia to obtain domain entities and extractsseed attributes and relations with a simple approach of pattern matching; 2) fine-grained entitycategorization, a way to construct domain taxonomy via is-a relation classification; and 3)representation learning and relation extraction, aiming at harvesting long-tail domain facts fromtext by a word embedding based linear projection model under distant supervision.The online module provides services like semantic search, deep reading, etc.

Knowledge graph (KG) is a semantic network used to model entities and the relations between them.While most automatically constructed Chinese KGs are general and large-scale, they are insufficientin long-tail entities and relations in specific domains.To meet the needs of practical applications for a certain domain, we propose a generalframework to construct a Chinese domain knowledge graph (DKG). It utilizes Wikipedia pages relatedto the entertainment industry as data source for demonstration purpose, extracts seed entities andrelations from categories and infoboxes to construct an initial DKG, and employs a word embedding based linear projection model to cover more long-tail facts from texts.

Knowledge graph (KG) is a semantic network used to model entities and the relations between them.While most automatically constructed Chinese KGs are general and large-scale, they are insufficientin long-tail entities and relations in specific domains.To meet the needs of practical applications for a certain domain, we propose a generalframework to construct a Chinese domain knowledge graph (DKG). It utilizes Wikipedia pages relatedto the entertainment industry as data source for demonstration purpose, extracts seed entities andrelations from categories and infoboxes to construct an initial DKG, and employs a word embedding based linear projection model to cover more long-tail facts from texts.

Fig. 3. Deep Reading Fig. 4. Semantic Search

Fig. 2. Representation Learning and Relation Extraction

Fig. 1. System Architecture

# Entities# Entities 100,848100,848

# Rel. Facts# Rel. Facts 481,562481,562

# Attr. Facts# Attr. Facts 251,183251,183

# Rel. Types# Rel. Types 4646

# Attr. Types# Attr. Types 3333

Avg Acc.Avg Acc. 93.1%93.1%

Table 1. Descriptionsof Chinese DKG