efficient random walk inference with knowledge bases · 2020-04-09 · efficient random walk...

Post on 02-Jun-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Efficient Random Walk Inference with Knowledge Bases

Ni Lao

Carnegie Mellon University

2012-7-11

1

Committee: William W. Cohen (Chair), Teruko Mitamura, Tom Mitchell,

C. Lee Giles (Pennsylvania State University)

Knowledge itself is power. --Francis Bacon

7/13/2012 2 an algorithm which tries to achieve something,

e.g. IR/IE/QA/MT

KB as an edge-labeled graph

Link Prediction -- a generic relational learning task

Given

a directed edge-labeled graph

a relation type r

a source node s (also called a query)

Find

the set of nodes G, so that r(s,t) for each t in G

7/13/2012 3

Infer New Knowledge

7/13/2012 4

Application

Charlotte

BrontëPainter

Writer

Profession?Carpenter

What is the profession of Charlotte Brontë?

Consider Friends/Family

7/13/2012 5

Charlotte

Brontë

Patrick Brontë

HasFather

Writer

Profession

Consider Behaviors

7/13/2012 6

IsA-1 is the inverse of IsA Wrote-1 is the inverse of Wrote

Charlotte

BrontëWriter

Jane

Eyre

Wrote

Novel

A Tale of

Two Cities

IsA-1

IsA

Charles Dickens

Profession

Wrote-1

Consider Literatures/Publications

7/13/2012 7

Mentioned

Charlotte

BrontëWriter

Mentioned-1

Painter

Profession

Reading Recommendation

7/13/2012 8

a paper stream

these are interesting papers

Application

7/13/2012 9

a paper river

7/13/2012 10

a paper river

new development of an interesting topic

citewrite

7/13/2012 11

a paper river

new papers of my favorite author

readwrite

write

my favorite author

7/13/2012 12

a paper river

social recommendation

scientist who have similar interests

read

readreadwrite

read

Relational Learning Goals

7/13/2012 13

robust

scalable

expressive define features expressing

sequences of relations on graph

combine many such features when making decisions

efficiently discover and calculate such features

Relational learning is a subfield of artificial intelligence, that learns with expressive logical or relational representations.

Why is relational learning computationally challenging?

7/13/2012 14

Exponentially many path types

IsAIsA-1 AthletePlaysSport

Exponentially many path instantiations

s

Our solution: feature metrics

Our solution: sampling

Thesis Outline

15

Ch. 2: motivation

Ch. 3: knowledge base inference (Lao+, EMNLP 2011)

Ch. 4: literature recommendation (Lao & Cohen, DILS 2012)

Ch. 6: relation extraction from parsed text (Lao+, EMNLP 2012)

Applications

Ch. 7: coordinate term extraction

Ch. 2: Path Ranking Algorithm (Lao & Cohen, MLJ 2010)

Ch. 5: efficient RW (Lao & Cohen, KDD 2010)

Ch. 6: distributed computing

Algorithms

Ch. 7: more expressive features (submitted)

Ch. 8: future work

Outline

16

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Inductive Logic Programming

e.g.

First Order Inductive Learner--FOIL (Quinlan, ECML’93)

7/13/2012 17

HasFather(a,b) ^ Profession(b,y) Profession(a,y)

not robust

High precision Horn clauses

not scalable

experimental comparison later

expressive

Undirected Graphical Models -- combine logics with GM

e.g. Markov Logic Networks (Kok & Domingos, ICML’05) Relational CRFs (Lao+, NIPS’10)

7/13/2012 18

Horn clauses smokes(A) & Friends(A,B)smokes(B)

as CRFs features smokes(A) & Friends(A,B) & !smokes(B)

robust

not scalable

expressive

Random Walk with Restart -- ignore logic

19

P(Charlotte Writer)

P(Charlotte Painter)

7/13/2012

e.g. Tong+, ICDM’06

experimental comparison later

robust

not expressive

scalable

Mentioned

Charlotte

Brontë

Patrick Brontë

HasFather

Writer

Profession

Mentioned-1

Jane

Eyre

Wrote

Novel

A Tale of

Two Cities

IsA-1

IsA

Charles Dickens

Profession

Wrote-1

PainterProfession

Outline

20

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Relational Classification -- combine logics with RWs

21

e.g. Path Ranking algorithm (Lao & Cohen, MLJ’10)

7/13/2012

P(Charlotte Writer; <HasFather,IsA>)

P(Charlotte Writer; <Mention,Mention-1,IsA>)

P(Charlotte Painter; <HasFather,IsA>)

P(Charlotte Painter; <Mention,Mention-1,IsA>)

Mentioned

Profession?

Charlotte

Brontë

Patrick Brontë

HasFather

Writer

Profession

Mentioned-1

Profession

Jane

Eyre

Wrote

Novel

A Tale of

Two Cities

IsA-1

IsA

Charles Dickens

Profession

Wrote-1

Painter

Profession

robust

scalable

expressive

Contribution

22

made possible by

a family of easy-to-learn features

fast random walk

distributed computing

Apply relational learning

at scales not possible before

Outline

23

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Path Ranking Algorithm (PRA)

24

(Lao & Cohen, MLJ 2010)

7/13/2012

( , ) ( ; )B

score s t P s t

e.g. π=<Mention,Mention-1,IsA>

robust

expressive

a weight

( , ) ( ; )B

score s t P s t

Random Walk Calculation

25 7/13/2012

e.g.

π’=<Mention,Mention-1>

r=Profession

( ; ) ( ; ') ( ; )z

P s t P s z P z t r

Dynamic Programing

π’

π’

π’

z3

s z2

z1

t2

t1

rr

rr

later about how to do it x100 more efficiently using sampling scalable

( , ) ( ; )B

score s t P s t

Feature Selection with Labeled Data

given training query set {(si, Gi)}

7/13/2012 26

( ) ( , )i

i j

i j G

hits f I f s t h

( , )1

( )( , )

i

i j

j G

i i j

j

f s t

accuracy f aN f s t

I(): the indicator function N: total number of queries

( , ) ( ; )B

score s t P s t

Estimating θ

for a relation r

generate positive and negative node pairs {(si, ti)}

for each (si, ti) generate (xi, yi)

xi is a vector of RW features of different paths π

yi is a binary label r(si, ti)

estimate θ by L1/L2 regularized (elastic-net) logistic regression

27 7/13/2012

Outline

28

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Knowledge Base Inference

NELL (Never Ending Language Learner) v165

353 relations

0.7M nodes (concepts)

1.7M edges

7/13/2012 29

(Lao, Mitchell, Cohen, EMNLP 2010)

Example NELL relations

Application

IsAIsA-1 AthletePlaysSport

7/13/2012 30

PRA Uses Broad Coverage Features

AthletePlaysSport(HinesWard, ?)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

p@10 p@100 p@1000

Pre

cisi

on

FOIL

PRA

PRA Has Much Higher Recall and Is Much Faster

7/13/2012 31

Mechanical Turk evaluate new beliefs of 8 functional relations

PRA trains in an hour vs. FOIL trains in a few days

Outline

32

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Biology Literatures

Databases

Yeast: 0.8M nodes, 3.5M edges

Fly: 0.7M nodes, 16.9M edges

33

Application

7/13/2012

author

71k

paper

50k

gene

5.6k 160k

Relates to

1.6K

Cites

0.3M

year

64 Before

title word

40k 0.5M

journal

1.0k

institute

6k

39k

mesh

descriptors/

qualifiers

29k

1.1M

chemical

14k

0.3M

Recommendation Tasks

Literature Recommendation year, author papers a user is going to read

training data --- 1 user over 20 years

34

(collected from Dr. John Woolford’s computer)

7/13/2012

PRA Combines Dozens of Recommendation Strategies

7/13/2012 35

citewrite

readwrite

write

7/13/2012 36

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

2 3 4 5

MR

R

Maximum Path Length (L)

PRARWRRWR(no training)Community-basedContent-based

Reading Recommendation

Mean Reciprocal

Rank

Outline

37

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Efficient Random Walks

Exact calculation of random walks results in non-zero probabilities for many internal nodes

38

(Lao & Cohen, KDD 2010)

1 billion

nodesquery

node A few nodes that

we care about

Charlotte

Writer

Painter

Zebra

7/13/2012

Idea: a few random walkers (particles) are enough to distinguish good target

nodes from bad ones

39

1 billion

nodesquery

node A few nodes that

we care about

Writer

Painter

Zebra Charlotte

7/13/2012

details

0.17

0.18

0.19

0.20

0.21

0.22

0.23

1 10 100 1000

MR

R

Speedup

200

1k1k

300

Finger PrintingParticle FilteringFixed TruncationBeam Truncation

Compare Speedup Approaches

40 7/13/2012

10x ~ 100x faster with little loss of quality

exact random walks

exact random walks

Gene Recommendation on Fly Data (N=2k)

Mean Reciprocal

Rank

number of particles

Outline

41

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Relation Extraction

7/13/2012 42

wrote

She

Mention

dobj

Charlotte

was

nsubjnsubj

Jane Eyre

Charlotte

Bronte

Mention

Jane Eyre

Mention

Coreference Resolution

Entity

Resolution

Freebase

News Corpus

Dependency Trees

Write

Patrick BrontëHasFather

?

Profession

Writer

(21M concepts, 70M edges)

(60M )

Application

7/13/2012 43

Can PRA scale?

Can PRA learn syntactic-semantic rules?

Distributed Computing

7/13/2012 44

Large number of queries e.g. 0.3M/2M persons have known profession Solution: map/reduce to explore path, generate training samples, calculate gradient, and do predictions for each query

Large text graph e.g. 60M documents Solution: each node keeps the Freebase graph in memory relevant sentences are loaded/unloaded for each query

<M, conj, M-1, Profession>

7/13/2012 45

Combine Syntax with Semantics

“McDougall and Simon Phillips collaborated…”

M

Profession

IanMcDougall

?

M

conj

Performer

SimonPhilips

subj

Freebase

Parsed TextIan McDougall

Simon Phillips

collaborated

<M, WORD, CW-1, profession-1, profession>

7/13/2012 46

Combine Text with Semantics

M

BarackObama

CW

President

“he”

Freebase

Parsed Text

leaderProfession

“leader”

other

persons

TW

tokens

words

host

CW

“host”

“president”

e.g.“The president said …”

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Profession Nationality Parent

MR

R Freebase-only

Text-only

Freebase+Text

Text and KB Work Better Together

7/13/2012 47

Tested by existing knowledge in Freebase

Mean Reciprocal

Rank

with closed world assumption

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Profession Nationality Parents

Pre

cisi

on

p@100

p@1k

p@10k

Highly Accurate New Beliefs

7/13/2012 48

manually evaluated new beliefs

Outline

49

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Coordinate Term Extraction Task

parsed MUC-6 corpus 153k nodes, 748K edges

30 queries given 4 person names as seeds, find other persons

7/13/2012 50

Words/POSs

Tokens

Tokens

(Minkov & Cohen, ECML 2010)

Application

W

BillGates

BillGates

founded

founded

nsubj

W

W

SteveJobs

SteveJobs founded

nsubj

W

vbd

POS

POS

nnp

POS

POS

W: word POS: part of speech nnp: singular proper nouns vbd: verb, past tense nsubj: subject of a verb

Good Paths Are Quite Long

7/13/2012 51

<W-1,nsubj,W,W-1,nsubj-1,W>

find entities with similar behaviors

Words

Tokens

Tokens

W

BillGates

BillGates

founded

founded

nsubj

W

W

SteveJobs

SteveJobs founded

nsubj

W

Good Paths Are Quite Long

7/13/2012 52

0.00

0.05

0.10

0.15

0.20

3 4 5 6

MA

P

Max Path Length

kF

kF+1B

kF+2B

kF+3B

Mean Average

Precision

Forward Search Is Wasteful

7/13/2012 53

s

t

Find paths that connect s and t

Bidirectional Search Is More Efficient

7/13/2012 54

s

t

z

challenge is to calculate P(s→t;π)

Forward vs. Backward RWs

7/13/2012 55

( ; ) ( ; ') ( ; )z

P t s P t z P z s r

( ; ) ( ; ') ( ; )z

P s t P s z P z t r π’

π’

π’

z3

s z2

z1

t2

t1

rr

rr

evaluate P(s→t;π)

for many s

π’

π’

π’

z3

t z2

z1

s2

s1

rr

rr

Forward

Backward

evaluate P(s→t;π)

for many t

Details

Bidirectional Search with RW

7/13/2012 56

1 2 1 2( ; ) ( ; ) ( ; )z

P s t P s z P t z

π2

π2

π2 t

z3

z2

z1π1

π1sπ1

Forward RW Backward RW

0.1

1

10

100

1000

3 4 5 6

Pat

h F

ind

ing

Tim

e (

s)

Max Path Length

2F+1B

3F+1B

3F

4F

2F+2B3F+2B

5F

4F+2B

3F+3B

4F+1B

Bidirectional Search Is Much Faster

7/13/2012 57

Exceed 20Gb memory limit

1000x faster

Need for Lexicalized Paths

7/13/2012 58

P(vbd→t | <POS-1, nsubj-1, W>)

Task: find person entities

W

BillGates

BillGates founded

nsubjW

SteveJobs

SteveJobs founded

nsubj

vbd

POSPOS

Tokens

Words

POSs

Evaluate Lexicalized Paths

7/13/2012 59

Given an example (si, ti)

calculate P(z→ti;π) for many z

π

π

πt

z3

z2

z1

0.0

0.1

0.2

0.3

0.4

L=6

MA

P

RWR(no train)

RWR

FOIL

PRA

PRA+c2

PRA+c3

Person Name Extraction

7/13/2012 60

~1000 correct answers

Mean Average

Precision

Outline

61

previous work

idea

contribution

Motivation

knowledge base inference

literature recommendation

relation extraction from parsed text

Applications

coordinate term extraction

Path Ranking Algorithm (PRA)

efficient RW

distributed computing

Algorithms

more expressive features

the problem

Future Work

Apply knowledge to NLP/IE/IR/CV tasks

7/13/2012 62

arg max ( | , )decision

P decision context KB

wrote

She

Mention

dobj

Charlotte

was

nsubjnsubj

Jane Eyre

Charlotte

Bronte Jane Eyre

Mention

Coreference?

Entity

Resolution

Dependency Trees

Write

IsA FemaleKB

Text

Future Work

Conjunctions of Paths

rules can have tree structures

with source/constant/target nodes as leafs

7/13/2012 63

founded

t3

nsubj

W

dobj

W

Apple

W

SteveJobs

t2 t1

source s constant z target t

Future Work

Conjunctions of Paths

forward PCRW with multiple walkers

7/13/2012 64

founded

t3

nsubj

W

dobj

W

Apple

W

SteveJobs

t2 t1

source s constant z target t

Future Work

Conjunctions of Paths

backward PCRW with multiple walkers

7/13/2012 65

founded

t3

nsubj

W

dobj

W

Apple

W

SteveJobs

t2 t1

source s constant z target t

Contribution

66

Made possible by

a family of easy-to-learn features (3 types)

fast random walk (sampling)

distributed computing

Apply relational learning at scales not possible before.

Leads to new applications!

other work I did at CMU

Relational CRFs (Lao+, NIPS’10)

Question answering (Lao+, NTCIR’08)

Utility based retrieval evaluation (Yang+, SIGIR’07)

7/13/2012 67

7/13/2012 68

Future Work

KB extension new relation types, new concepts

7/13/2012 69

arg max ( | )KB

P corpus KB KB

arg max ( | , )KB

P decisions contexts KB KB

Unsupervised

Supervised

Directed Graphical Models e.g. Probabilistic Relational Models (Getoor+, ICML’01)

7/13/2012 70

model structure restricted to DAG

cannot express features corresponding to chains

Coverage of top k triples

7/13/2012 71

Profession Triples Unique Persons

1k 970

10k 8,726

100k 79,885

Repeatedly Combine Forward and Backward RWs

7/13/2012 72

Forward search

+Backward Search

+Backward Search

W-1,conj_and,W>

<W-1,conj_and,W,

W-1,conj_and,W,

Summary of PRA

7/13/2012 73

Stage Computation

Path Discovery

given {(si, Gi)}, find {f ; acc(f)>=a, hits(f)>=h}

Generate Training Samples

generate {(si, ti)} and {(xi, yi)}

Logistic Regression Training

Prediction

apply model to nodes s in domain(r)

2

1 1 2 2arg max ( ) || || || ||i

i

l

Need for Lexicalized Paths

7/13/2012 74

Bias toward MLB

A prior over the leagues participated by Boston Braves university athletes

Task=AthletePlaysInLeague

( ; )P mlb t

1,;

AthletePlaysForTeamP BostonBraves t

AthletePlaysInLeagure

Need for Lexicalized Paths

7/13/2012 75

Bias toward Google

Companies around Google

Task=CompetesWith

( ; )P google t

; ,P Google t CompetesWith CompetesWith

0.0

0.1

0.2

0.3

0.4

0.5

0.6

L=3

MR

R

RWR(no train)

RWR

PRA

PRA+c1

PRA+c2

Knowledge Base Inference

7/13/2012 76

16 tasks

top related