Mining Advisor-Advisee Relationships from Research Publication Networks

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Mining Advisor-Advisee Relationships from Research Publication Networks. Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu. SIGKDD, 2010. Presented by Hung-Yi Cai, 2010/12/29.



Page 1: Mining Advisor-Advisee Relationships from  Research Publication  Networks

Intelligent Database Systems Lab

國立雲林科技大學National Yunlin University of Science and Technology

Mining Advisor-Advisee Relationships from Research Publication Networks

Chi Wang, Jiawei Han, Yuntao Jia, Jie Tang, Duo Zhang, Yintao Yu
SIGKDD, 2010

Presented by Hung-Yi Cai
2010/12/29

Outline

· Motivation
· Objectives
· Previous study
· Methodology
  ─ Problem Formulation
  ─ Assumption and Framework
  ─ Preprocessing
  ─ TPFG Model
  ─ Model Learning
· Experiments
· Conclusions
· Comments

Motivation

· Information networks contain abundant knowledge about relationships among people or entities.
· Discovering those relationships can benefit many applications, such as expert finding and research community analysis.

Objectives

· To propose a time-constrained probabilistic factor graph (TPFG) model that takes a research publication network as input and models the advisor-advisee relationship mining problem with a joint likelihood objective function.
· To design an efficient learning algorithm that optimizes this objective function.

Previous study

· This work differs from existing studies in relation mining and relational learning.
  ─ Relation mining: existing studies mainly apply text mining and language processing techniques to text data and structured data, including web pages, user profiles, and literature corpora.
  ─ Relational learning: existing studies address classification when objects or entities are presented in multiple relations.

Methodology

· Problem Formulation
· Assumption and Framework
· Preprocessing
· TPFG Model
· Model Learning

Problem Formulation

Assumption and Framework

· Assumption 1 is based on commonsense knowledge about advisor-advisee relationships.
· Assumption 2 states that all authors in the network have a strict order defined by the possible advising relationships.

Preprocessing

· The purpose of preprocessing is to generate the candidate graph H′ and reduce the search space.

Preprocessing

· Then we have the following rule.
  ─ Author a_j is not considered to be a_i's advisor if one of the following conditions holds:
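The conditions of the filtering rule did not survive in this transcript, so the sketch below does not reproduce them. It only illustrates the general idea of building a candidate graph and pruning it, using one stand-in condition chosen for illustration: a coauthor a_j is dropped unless a_j started publishing strictly earlier than a_i. The function name `build_candidate_graph`, the `(year, authors)` record layout, and the sample data are all assumptions.

```python
from collections import defaultdict

def build_candidate_graph(papers):
    """Return {author: set of candidate advisors} from (year, authors) records."""
    first_year = {}
    coauthors = defaultdict(set)
    for year, authors in papers:
        for a in authors:
            # Track the earliest publication year seen for each author.
            first_year[a] = min(year, first_year.get(a, year))
            coauthors[a].update(b for b in authors if b != a)

    candidates = {}
    for a_i, partners in coauthors.items():
        # Stand-in condition (assumption, not the rule from the slides):
        # keep a_j only if a_j started publishing strictly earlier than a_i.
        candidates[a_i] = {a_j for a_j in partners
                           if first_year[a_j] < first_year[a_i]}
    return candidates

# Hypothetical toy publication records.
papers = [
    (1995, ["Ada"]),
    (2001, ["Ada", "Bob"]),
    (2003, ["Bob", "Cy"]),
    (2003, ["Ada", "Cy"]),
]
candidates = build_candidate_graph(papers)
```

Because the stand-in condition uses a strict inequality, the kept edges cannot form a cycle, which matches the strict ordering of authors stated in Assumption 2.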

TPFG Model

· By modeling the network as a whole, this step incorporates both structural information and temporal constraints, and better analyzes the relationships among individual links.

TPFG Model

· The graph is composed of two kinds of nodes: variable nodes and function nodes.
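A minimal sketch of a graph with variable nodes and function nodes is given below. It is a generic factor graph, not the TPFG definition from the paper: the class `FactorGraph`, the variables `y_bob` and `y_cy` (each holding a chosen advisor), the local scores, the advising periods, and the form of the time constraint are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class FactorGraph:
    variables: dict = field(default_factory=dict)  # variable node -> allowed values
    functions: list = field(default_factory=list)  # function nodes as (scope, callable)

    def add_variable(self, name, values):
        self.variables[name] = list(values)

    def add_function(self, scope, fn):
        missing = [v for v in scope if v not in self.variables]
        if missing:
            raise KeyError(f"unknown variables: {missing}")
        self.functions.append((tuple(scope), fn))

    def score(self, assignment):
        """Unnormalised joint score: the product of all function nodes."""
        return prod(fn(*(assignment[v] for v in scope))
                    for scope, fn in self.functions)

# Hypothetical local scores for each author's candidate advisors.
local_bob = {"Ada": 0.8, "none": 0.2}
local_cy = {"Ada": 0.5, "Bob": 0.4, "none": 0.1}
# Hypothetical advising periods (start, end) keyed by (advisor, advisee).
period = {("Ada", "Bob"): (2001, 2005), ("Bob", "Cy"): (2003, 2007)}

def time_ok(y_bob, y_cy):
    # Assumed constraint: Bob must finish being advised before advising Cy.
    if y_cy == "Bob" and y_bob != "none":
        return 1.0 if period[(y_bob, "Bob")][1] < period[("Bob", "Cy")][0] else 0.0
    return 1.0

g = FactorGraph()
g.add_variable("y_bob", local_bob)
g.add_variable("y_cy", local_cy)
g.add_function(["y_bob"], local_bob.get)
g.add_function(["y_cy"], local_cy.get)
g.add_function(["y_bob", "y_cy"], time_ok)

consistent = g.score({"y_bob": "Ada", "y_cy": "Ada"})
violating = g.score({"y_bob": "Ada", "y_cy": "Bob"})
```

In this toy setup the assignment that violates the assumed time constraint gets a joint score of zero, which is one simple way a temporal constraint can be folded into a product of function nodes.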

Model Learning

· To maximize the objective function and compute the ranking score for each edge in the candidate graph H′, this step needs to infer the marginal maximal joint probability on the TPFG, according to Eq. (10).
· Sum-product + junction tree: sum-product is a general message-passing algorithm for computing marginal functions on a factor graph.
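To make the sum-product idea concrete, the sketch below computes one marginal on a toy chain-shaped factor graph, once by message passing and once by brute-force enumeration. The graph (x1, x2, x3 with factors `f1`, `f12`, `f23`) and all numbers are hypothetical; this is the textbook algorithm on a tree-structured graph, not the inference procedure used for TPFG.

```python
from itertools import product

# Hypothetical toy factor graph: f1 -- x1 -- f12 -- x2 -- f23 -- x3, all binary.
f1 = [0.6, 0.4]                  # f1[x1]
f12 = [[0.9, 0.1], [0.2, 0.8]]   # f12[x1][x2]
f23 = [[2.0, 1.0], [1.0, 3.0]]   # f23[x2][x3], deliberately unnormalised

def marginal_x2_by_messages():
    # Message f12 -> x2: sum out x1 from f1(x1) * f12(x1, x2).
    m_left = [sum(f1[a] * f12[a][b] for a in range(2)) for b in range(2)]
    # Message f23 -> x2: sum out x3 from f23(x2, x3).
    m_right = [sum(f23[b][c] for c in range(2)) for b in range(2)]
    # The marginal of x2 is the normalised product of its incoming messages.
    unnorm = [m_left[b] * m_right[b] for b in range(2)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def marginal_x2_brute_force():
    # Reference: enumerate all 2^3 assignments and accumulate by x2.
    unnorm = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        unnorm[b] += f1[a] * f12[a][b] * f23[b][c]
    z = sum(unnorm)
    return [u / z for u in unnorm]

by_messages = marginal_x2_by_messages()
by_enumeration = marginal_x2_brute_force()
```

Both functions return the same distribution; the message-passing version gets there with two small sums instead of a full enumeration, which is the saving that matters on large graphs.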

Model Learning

· New TPFG inference algorithm: the original sum-product or max-sum algorithm runs into difficulty because it requires each node to wait for all-but-one messages to arrive.

Model Learning

· After the two phases of message propagation, we can collect the two messages on any edge and obtain the marginal function.
· The improved message propagation is still separated into two phases.
  ─ Phase 1: the messages sent_i, passed from each node to its ascendants, are generated in a similar order as before.
  ─ Phase 2: the messages recv_i, returned from the ascendants, are stored in each node.
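The two-phase schedule can be sketched on a simplified structure. The example below assumes a rooted tree in which every node has exactly one ascendant, with pairwise potentials and sum-product messages; the slides describe a more general setting, so this is only an illustration of the upward and downward passes. The names `sent` and `recv` mirror sent_i and recv_i, while `parent`, `phi`, `psi`, and all numbers are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical rooted tree: parent[i] is the ascendant of node i; node 0 is the root.
# Nodes are indexed so that every child has a larger index than its parent.
parent = [None, 0, 0, 1]
phi = [[1.0, 2.0], [3.0, 1.0], [1.0, 1.0], [2.0, 5.0]]  # phi[i][x_i], local potential
psi = [[4.0, 1.0], [1.0, 4.0]]                          # psi[x_child][x_parent]

n = len(parent)
children = [[c for c in range(n) if parent[c] == i] for i in range(n)]

def two_phase_marginals():
    sent = [None] * n  # sent[i][x_parent]: message from node i up to its ascendant
    recv = [None] * n  # recv[i][x_i]: message returned to node i from its ascendant
    up = [None] * n    # up[i][x_i]: local potential times messages from descendants
    # Phase 1: leaves to root.
    for i in reversed(range(n)):
        up[i] = [phi[i][x] * prod(sent[c][x] for c in children[i]) for x in range(2)]
        sent[i] = [sum(psi[x][xp] * up[i][x] for x in range(2)) for xp in range(2)]
    # Phase 2: root to leaves; the root has no ascendant, so it receives all ones.
    recv[0] = [1.0, 1.0]
    for i in range(n):
        for c in children[i]:
            # Everything node i knows except what child c itself sent.
            excl = [phi[i][x] * recv[i][x]
                    * prod(sent[o][x] for o in children[i] if o != c)
                    for x in range(2)]
            recv[c] = [sum(psi[xc][x] * excl[x] for x in range(2)) for xc in range(2)]
    # Combine the upward and downward information at each node and normalise.
    beliefs = [[up[i][x] * recv[i][x] for x in range(2)] for i in range(n)]
    return [[v / sum(b) for v in b] for b in beliefs]

def brute_force_marginals():
    totals = [[0.0, 0.0] for _ in range(n)]
    for xs in product(range(2), repeat=n):
        w = prod(phi[i][xs[i]] for i in range(n))
        w *= prod(psi[xs[i]][xs[parent[i]]] for i in range(1, n))
        for i in range(n):
            totals[i][xs[i]] += w
    return [[v / sum(t) for v in t] for t in totals]

marginals = two_phase_marginals()
reference = brute_force_marginals()
```

Each node is visited once per phase, and after both passes every node holds what it needs to form its own marginal without waiting on an ad hoc message order.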


Experiments

· Experiment Step
  ─ Data sets: DBLP. The data set consists of 654,628 authors and 1,076,946 publications, with publication times from 1970 to 2007.
  ─ Methods: Sum-Product + Junction Tree (JuncT), Loopy Belief Propagation (LBP), Independent Maxima (IndMAX), SVM, RULE
  ─ Evaluation aspects: ROC curve
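For readers unfamiliar with the evaluation, the sketch below shows how an ROC curve and its area can be computed from ranking scores and binary ground-truth labels. The helper names `roc_points` and `auc` and the sample scores are hypothetical; nothing here comes from the experiments reported in the slides.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate), threshold high to low."""
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for k, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if k == len(pairs) - 1 or pairs[k + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical ranking scores for four candidate edges; 1 marks a true advisor edge.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

The sketch assumes at least one positive and one negative label; otherwise the rates are undefined.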

Experiments

· Accuracy: Effect of rules in TPFG
  ─ Using R3 as the filtering rule and YEAR2 as the graduation year estimation method.

Experiments

· Accuracy: Effect of network structure
  ─ Using DFS with a bounded maximal depth d from the given set of nodes, denoted DFS=d, we can obtain closures with controlled depth for a given set of authors to test.
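A depth-bounded DFS closure of this kind can be sketched as follows. The adjacency structure (each author mapped to the authors reachable in one step), the function name `bounded_dfs_closure`, and the sample graph are assumptions; the slides do not say which edges the search follows.

```python
def bounded_dfs_closure(adj, seeds, max_depth):
    """Return all nodes reachable from the seeds within max_depth steps."""
    best_depth = {}                       # shallowest depth at which each node was reached
    stack = [(s, 0) for s in seeds]
    while stack:
        node, depth = stack.pop()
        # Skip a node already reached at the same or a shallower depth.
        if node in best_depth and best_depth[node] <= depth:
            continue
        best_depth[node] = depth
        if depth < max_depth:
            stack.extend((nb, depth + 1) for nb in adj.get(node, ()))
    return set(best_depth)

# Hypothetical graph: a chain a -> b -> c -> d plus a shortcut a -> c.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["d"]}
closure_d1 = bounded_dfs_closure(adj, ["a"], 1)
closure_d2 = bounded_dfs_closure(adj, ["a"], 2)
```

Tracking the shallowest depth per node, rather than a plain visited set, matters here: a node first reached through a long path may still need to be expanded when a shorter path to it is found later.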

Experiments

· Accuracy: Effect of training data

Experiments

· Accuracy: Case study
  ─ TPFG can discover some interesting relations beyond the “ground truth” from a single source.

Experiments

· Scalability Performance

Experiments

· Application: Visualization of genealogy

Experiments

· Application: Expert finding and Bole search

Conclusions

· This paper studied the mining of advisor-advisee relationships from a research publication network as an attempt to discover hidden semantic knowledge in information networks.
· It proposed a time-constrained probabilistic factor graph (TPFG) model to integrate local intuitive features in the network; results on the DBLP data set demonstrate the effectiveness of the proposed approach.

Comments

· Advantages
  ─ The TPFG model can mine relationships between advisors and advisees from the research publication network.
· Applications
  ─ Relationship mining