Efficient Random Walk Inference with Knowledge Bases

Ni Lao, Carnegie Mellon University, 2012-7-11

Committee: William W. Cohen (Chair), Teruko Mitamura, Tom Mitchell, C. Lee Giles (Pennsylvania State University)


Knowledge itself is power. -- Francis Bacon

An algorithm which tries to achieve something, e.g. IR/IE/QA/MT

KB as an edge-labeled graph

Link Prediction -- a generic relational learning task

Given
  a directed edge-labeled graph
  a relation type r
  a source node s (also called a query)

Find
  the set of nodes G, so that r(s,t) holds for each t in G
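The task definition above can be made concrete with a small sketch. The class name `EdgeLabeledGraph`, its methods, and the toy facts are illustrative assumptions, not part of the talk; inverse edges are named with the "-1" suffix that the slides use.

```python
from collections import defaultdict


class EdgeLabeledGraph:
    """Directed edge-labeled graph stored as (source, relation) -> set of targets."""

    def __init__(self):
        self.out = defaultdict(set)

    def add_edge(self, source, relation, target):
        self.out[(source, relation)].add(target)
        # Also store the inverse edge, named with the "-1" suffix.
        self.out[(target, relation + "-1")].add(source)

    def targets(self, source, relation):
        """Return the known set G = {t : r(s, t)} for query node s."""
        return set(self.out.get((source, relation), set()))


kb = EdgeLabeledGraph()
kb.add_edge("CharlotteBronte", "HasFather", "PatrickBronte")
kb.add_edge("PatrickBronte", "Profession", "Writer")
kb.add_edge("CharlotteBronte", "Wrote", "JaneEyre")

known = kb.targets("PatrickBronte", "Profession")      # facts already in the KB
missing = kb.targets("CharlotteBronte", "Profession")  # empty: the edge to predict
```

Link prediction then amounts to ranking candidate nodes t for a query such as `("CharlotteBronte", "Profession")` whose answer set is missing or incomplete.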

Application: Infer New Knowledge

What is the profession of Charlotte Brontë?

Figure: Charlotte Brontë with a Profession? edge to the candidates Painter, Writer and Carpenter.

Consider Friends/Family

Figure: Charlotte Brontë --HasFather--> Patrick Brontë --Profession--> Writer

Consider Behaviors

IsA-1 is the inverse of IsA; Wrote-1 is the inverse of Wrote.

Figure: Charlotte Brontë --Wrote--> Jane Eyre --IsA--> Novel --IsA-1--> A Tale of Two Cities --Wrote-1--> Charles Dickens --Profession--> Writer

Consider Literatures/Publications

Figure: Charlotte Brontë and the professions Writer and Painter, connected through Mentioned and Mentioned-1 edges and a Profession edge.

Application: Reading Recommendation

Figure: a paper stream, annotated "these are interesting papers".

A paper river

Figure: a paper river, annotated over several slides with recommendation strategies:
  new development of an interesting topic (edges: cite, write)
  new papers of my favorite author (edges: read, write, write)
  social recommendation: scientists who have similar interests (edges: read, read, read, write, read)

Relational Learning Goals

Relational learning is a subfield of artificial intelligence that learns with expressive logical or relational representations.

expressive: define features expressing sequences of relations on a graph
robust: combine many such features when making decisions
scalable: efficiently discover and calculate such features

Why is relational learning computationally challenging?

Exponentially many path types, e.g. <IsA, IsA-1, AthletePlaysSport>
  Our solution: feature metrics

Exponentially many path instantiations from a source node s
  Our solution: sampling

Thesis Outline

Ch. 2: motivation

Applications
  Ch. 3: knowledge base inference (Lao+, EMNLP 2011)
  Ch. 4: literature recommendation (Lao & Cohen, DILS 2012)
  Ch. 6: relation extraction from parsed text (Lao+, EMNLP 2012)
  Ch. 7: coordinate term extraction

Algorithms
  Ch. 2: Path Ranking Algorithm (Lao & Cohen, MLJ 2010)
  Ch. 5: efficient RW (Lao & Cohen, KDD 2010)
  Ch. 6: distributed computing
  Ch. 7: more expressive features (submitted)

Ch. 8: future work

Outline

Motivation
  the problem
  previous work
  idea
  contribution

Applications
  knowledge base inference
  literature recommendation
  relation extraction from parsed text
  coordinate term extraction

Algorithms
  Path Ranking Algorithm (PRA)
  efficient RW
  distributed computing
  more expressive features

Inductive Logic Programming

e.g. First Order Inductive Learner -- FOIL (Quinlan, ECML'93)

Learns high precision Horn clauses, e.g.
  HasFather(a,b) ^ Profession(b,y) → Profession(a,y)

expressive
not robust
not scalable
experimental comparison later

Undirected Graphical Models -- combine logics with GM

e.g. Markov Logic Networks (Kok & Domingos, ICML'05), Relational CRFs (Lao+, NIPS'10)

Horn clauses: smokes(A) & Friends(A,B) → smokes(B)
as CRF features: smokes(A) & Friends(A,B) & !smokes(B)

expressive
robust
not scalable

Random Walk with Restart -- ignore logic

e.g. Tong+, ICDM'06

P(Charlotte → Writer)
P(Charlotte → Painter)

Figure: the example graph around Charlotte Brontë, with nodes Patrick Brontë, Jane Eyre, Novel, A Tale of Two Cities, Charles Dickens, Writer and Painter, and edges HasFather, Wrote, Wrote-1, IsA, IsA-1, Mentioned, Mentioned-1 and Profession.

robust
scalable
not expressive
experimental comparison later


Relational Classification -- combine logics with RWs

e.g. Path Ranking Algorithm (Lao & Cohen, MLJ'10)

P(Charlotte → Writer; <HasFather,IsA>)
P(Charlotte → Writer; <Mention,Mention-1,IsA>)
P(Charlotte → Painter; <HasFather,IsA>)
P(Charlotte → Painter; <Mention,Mention-1,IsA>)

Figure: the example graph around Charlotte Brontë (Profession?), with nodes Patrick Brontë, Jane Eyre, Novel, A Tale of Two Cities, Charles Dickens, Writer and Painter, and edges HasFather, Wrote, Wrote-1, IsA, IsA-1, Mentioned, Mentioned-1 and Profession.

robust
scalable
expressive

Contribution

Apply relational learning at scales not possible before, made possible by
  a family of easy-to-learn features
  fast random walk
  distributed computing


Path Ranking Algorithm (PRA)

(Lao & Cohen, MLJ 2010)

$score(s, t) = \sum_{\pi \in B} P(s \to t; \pi) \, \theta_\pi$

e.g. π = <Mention,Mention-1,IsA>; θ_π is a weight

robust
expressive

Random Walk Calculation

Dynamic programming: for a path π made of a prefix π' followed by a relation r,

$P(s \to t; \pi) = \sum_z P(s \to z; \pi') \, P(z \to t; r)$

e.g. π' = <Mention,Mention-1>, r = Profession

Figure: s reaches intermediate nodes z1, z2, z3 via π', which reach targets t1, t2 via r.

scalable: more later on how to do this 100x more efficiently using sampling
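A minimal sketch of this dynamic program and of the PRA score as a weighted sum of path features follows. It assumes the walker picks uniformly among the outgoing edges of the required relation and that mass at nodes without such an edge is dropped; the function names, the toy graph and the weights are hypothetical.

```python
from collections import defaultdict


def pcrw(edges, source, path):
    """Distribution P(source -> t; path) of a path-constrained random walk.

    edges: dict mapping (node, relation) -> list of neighbor nodes.
    path:  sequence of relation names.
    Assumption: uniform transitions; mass at dead-end nodes is dropped.
    """
    dist = {source: 1.0}  # distribution for the empty path prefix
    for relation in path:
        nxt = defaultdict(float)
        for z, p_z in dist.items():
            neighbors = edges.get((z, relation), [])
            for t in neighbors:
                # P(s -> t; pi) += P(s -> z; pi') * P(z -> t; r)
                nxt[t] += p_z / len(neighbors)
        dist = dict(nxt)
    return dist


def pra_score(edges, source, target, weighted_paths):
    """Weighted sum of path features; weighted_paths is a list of (path, theta)."""
    return sum(theta * pcrw(edges, source, path).get(target, 0.0)
               for path, theta in weighted_paths)


edges = {
    ("Charlotte", "Mention"): ["doc1", "doc2"],
    ("doc1", "Mention-1"): ["Charlotte", "Dickens"],
    ("doc2", "Mention-1"): ["Charlotte"],
    ("Charlotte", "Profession"): ["Writer"],
    ("Dickens", "Profession"): ["Writer"],
}
prefix = pcrw(edges, "Charlotte", ["Mention", "Mention-1"])
features = pcrw(edges, "Charlotte", ["Mention", "Mention-1", "Profession"])
score = pra_score(edges, "Charlotte", "Writer",
                  [(["Mention", "Mention-1", "Profession"], 2.0),
                   (["Profession"], 0.5)])
```

Each step only touches nodes that currently hold probability mass, which is why the cost grows with the number of reachable nodes rather than with the size of the whole graph.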

Feature Selection with Labeled Data

Given a training query set {(s_i, G_i)}, keep a feature f only if

$hits(f) = \sum_i I\left(\sum_{j \in G_i} f(s_i, t_j) > 0\right) \ge h$

$accuracy(f) = \frac{1}{N} \sum_i \frac{\sum_{j \in G_i} f(s_i, t_j)}{\sum_j f(s_i, t_j)} \ge a$

I(): the indicator function
N: total number of queries
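The two selection metrics can be sketched as below. This assumes a feature is a function from a query node to a dictionary of non-negative values over candidate targets, that hits counts queries with at least one correct target reached, and that a query where the feature has no mass contributes zero accuracy; the names and the toy data are hypothetical.

```python
def hits(feature, queries):
    """Number of queries where the feature gives a non-zero value to a correct target."""
    return sum(
        1 for s, relevant in queries
        if any(feature(s).get(t, 0.0) > 0 for t in relevant)
    )


def accuracy(feature, queries):
    """Mean over queries of the share of the feature's mass on correct targets."""
    total = 0.0
    for s, relevant in queries:
        dist = feature(s)
        mass = sum(dist.values())
        if mass > 0:  # assumption: a query with no mass contributes 0
            total += sum(dist.get(t, 0.0) for t in relevant) / mass
    return total / len(queries)


# Toy feature: precomputed random walk values for three query nodes.
table = {"a": {"x": 0.5, "w": 0.5}, "b": {"w": 1.0}, "c": {}}
feature = lambda s: table[s]
queries = [("a", {"x"}), ("b", {"y"}), ("c", {"z"})]

n_hits = hits(feature, queries)
acc = accuracy(feature, queries)
keep = n_hits >= 1 and acc >= 0.1  # thresholds h and a are illustrative
```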

Estimating θ

For a relation r
  generate positive and negative node pairs {(s_i, t_i)}
  for each (s_i, t_i) generate (x_i, y_i)
    x_i is a vector of RW features of different paths π
    y_i is a binary label r(s_i, t_i)
  estimate θ by L1/L2 regularized (elastic-net) logistic regression
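One way to fit such a model is proximal gradient descent, sketched below in plain Python. This is an illustrative optimizer, not necessarily the one used in the thesis; it minimizes the mean log-loss plus the two penalties, which matches maximizing the regularized likelihood up to scaling. The function name, hyperparameters and toy data are assumptions.

```python
import math


def train_elastic_net_logreg(xs, ys, l1=0.01, l2=0.01, lr=0.5, steps=500):
    """Fit theta by proximal gradient descent on
    mean log-loss + l1 * ||theta||_1 + l2 * ||theta||_2^2 (no bias term)."""
    n, d = len(xs), len(xs[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [2.0 * l2 * w for w in theta]       # gradient of the L2 term
        for x, y in zip(xs, ys):
            z = sum(w * v for w, v in zip(theta, x))
            p = 1.0 / (1.0 + math.exp(-z))         # predicted P(y = 1 | x)
            for k in range(d):
                grad[k] += (p - y) * x[k] / n      # gradient of the mean log-loss
        for k in range(d):
            w = theta[k] - lr * grad[k]
            # Soft-thresholding is the proximal step for the L1 penalty.
            theta[k] = math.copysign(max(abs(w) - lr * l1, 0.0), w)
    return theta


# Each row holds the random walk features of one (s_i, t_i) pair for two paths.
xs = [[0.9, 0.1], [0.8, 0.0], [0.1, 0.1], [0.0, 0.0]]
ys = [1, 1, 0, 0]
theta = train_elastic_net_logreg(xs, ys)
```

On this toy data the first path separates positives from negatives and receives a clearly positive weight, while the uninformative second path stays near zero because of the L1 penalty.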


Application: Knowledge Base Inference

(Lao, Mitchell, Cohen, EMNLP 2011)

NELL (Never Ending Language Learner) v165
  353 relations
  0.7M nodes (concepts)
  1.7M edges

Figure: example NELL relations.

PRA Uses Broad Coverage Features

Query: AthletePlaysSport(HinesWard, ?)

Example path: <IsA, IsA-1, AthletePlaysSport>

PRA Has Much Higher Recall and Is Much Faster

Figure: precision (0 to 0.9) at p@10, p@100 and p@1000 for FOIL and PRA.

Mechanical Turk evaluation of new beliefs for 8 functional relations

PRA trains in an hour; FOIL trains in a few days


Application: Biology Literatures

Databases
  Yeast: 0.8M nodes, 3.5M edges
  Fly: 0.7M nodes, 16.9M edges

Figure: database schema. Labels as extracted (node-type sizes and edge counts are interleaved): author 71k; paper 50k; gene 5.6k, 160k; Relates to 1.6K; Cites 0.3M; year 64, Before; title word 40k, 0.5M; journal 1.0k; institute 6k, 39k; MeSH descriptors/qualifiers 29k, 1.1M; chemical 14k, 0.3M.

Recommendation Tasks

Literature recommendation: year, author → papers a user is going to read

Training data: 1 user over 20 years (collected from Dr. John Woolford's computer)

PRA Combines Dozens of Recommendation Strategies

Figure: recommendation paths built from cite, read and write edges.

Reading Recommendation

Figure: Mean Reciprocal Rank (MRR, 0.0 to 0.9) versus maximum path length L (2 to 5) for PRA, RWR, RWR (no training), Community-based and Content-based methods.
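For reference, Mean Reciprocal Rank can be computed as in the sketch below, using the standard definition: the reciprocal rank of the first correct item, averaged over queries, with zero for a query whose ranking contains no correct item. The function name and the toy rankings are hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings: list of ranked item lists; relevant: list of sets of correct items."""
    total = 0.0
    for ranking, gold in zip(rankings, relevant):
        for rank, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / rank  # only the first correct item counts
                break
    return total / len(rankings)


rankings = [["p3", "p1", "p2"], ["p5", "p4"]]
relevant = [{"p1"}, {"p5"}]
mrr = mean_reciprocal_rank(rankings, relevant)
```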


Efficient Random Walks

(Lao & Cohen, KDD 2010)

Exact calculation of random walks results in non-zero probabilities for many internal nodes.

Figure: a query node (Charlotte), 1 billion nodes, and a few nodes that we care about (Writer, Painter, Zebra).

Idea: a few random walkers (particles) are enough to distinguish good target nodes from bad ones

Figure: a query node (Charlotte), 1 billion nodes, and a few nodes that we care about (Writer, Painter, Zebra).
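The sampling idea can be sketched with independent walkers, as below. This is plain Monte Carlo sampling under the assumption of uniform transitions; the particle filtering and truncation schemes compared in the talk may differ in how walkers are split and pruned. The function name, the seed and the toy graph are hypothetical.

```python
import random
from collections import Counter


def sampled_pcrw(edges, source, path, num_particles=1000, seed=0):
    """Estimate P(source -> t; path) from independent random walkers."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_particles):
        node = source
        for relation in path:
            neighbors = edges.get((node, relation))
            if not neighbors:
                node = None  # the walker dies at a dead end
                break
            node = rng.choice(neighbors)
        if node is not None:
            counts[node] += 1
    return {t: c / num_particles for t, c in counts.items()}


edges = {
    ("s", "r1"): ["a", "b"],
    ("a", "r2"): ["good"],
    ("b", "r2"): ["bad"],
}
estimate = sampled_pcrw(edges, "s", ["r1", "r2"], num_particles=1000)
```

The cost depends on the number of particles and the path length, not on how many nodes the exact distribution would touch, and only nodes that receive walkers appear in the result.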

Compare Speedup Approaches

Gene Recommendation on Fly Data (N=2k)

Figure: Mean Reciprocal Rank (MRR, 0.17 to 0.23) versus speedup (1 to 1000) for Finger Printing, Particle Filtering, Fixed Truncation and Beam Truncation, with exact random walks as the reference. Point labels (200, 300, 1k, 1k) give the number of particles.

10x ~ 100x faster with little loss of quality


Application: Relation Extraction

Figure: a News Corpus (60M) parsed into Dependency Trees and linked to Freebase (21M concepts, 70M edges) by Mention edges, using Coreference Resolution and Entity Resolution. Text side: tokens "She", "wrote", "Jane Eyre", "Charlotte", "was" with nsubj and dobj edges. Freebase side: Charlotte Bronte --Write--> Jane Eyre; Charlotte Bronte --HasFather--> Patrick Brontë --Profession--> Writer; Charlotte Bronte --Profession--> ?


Can PRA scale?

Can PRA learn syntactic-semantic rules?

Distributed Computing

Large number of queries, e.g. 0.3M/2M persons have known profession
  Solution: map/reduce to explore paths, generate training samples, calculate gradients, and do predictions for each query

Large text graph, e.g. 60M documents
  Solution: each node keeps the Freebase graph in memory; relevant sentences are loaded/unloaded for each query
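The gradient step of such a job can be illustrated with an in-process stand-in for map/reduce: each shard of training samples yields a partial gradient (map), and the partial gradients are summed (reduce). This only mimics the data flow with Python's built-in `map` and `functools.reduce`; it is not the distributed system used in the thesis, and the names and data are hypothetical.

```python
import math
from functools import reduce


def shard_gradient(theta, shard):
    """Map step: log-likelihood gradient contributed by one shard of (x, y) samples."""
    grad = [0.0] * len(theta)
    for x, y in shard:
        z = sum(w * v for w, v in zip(theta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for k, v in enumerate(x):
            grad[k] += (y - p) * v
    return grad


def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [u + v for u, v in zip(a, b)]


theta = [0.0, 0.0]
shards = [
    [([0.9, 0.1], 1), ([0.1, 0.1], 0)],
    [([0.8, 0.0], 1), ([0.0, 0.0], 0)],
]
partials = map(lambda shard: shard_gradient(theta, shard), shards)
gradient = reduce(add_vectors, partials)
```

Because the log-likelihood is a sum over samples, the summed partial gradients equal the gradient computed on all samples at once, so shards can be processed independently.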

Combine Syntax with Semantics

Path: <M, conj, M-1, Profession>

"McDougall and Simon Phillips collaborated…"

Figure: in Freebase, Ian McDougall has an unknown Profession (?) and Simon Phillips has Profession Performer. In the parsed text, the mentions (M) of the two persons are joined by a conj edge, and "collaborated" has a subj edge.

Combine Text with Semantics

Path: <M, WORD, CW-1, profession-1, profession>

e.g. "The president said …"

Figure: Freebase entities (Barack Obama, other persons) and professions (President, leader, host) connected to parsed text through M, TW and CW edges over tokens ("he") and words ("president", "leader", "host").

Text and KB Work Better Together

Figure: Mean Reciprocal Rank (MRR, 0 to 0.9) on Profession, Nationality and Parent for Freebase-only, Text-only and Freebase+Text.

Tested on existing knowledge in Freebase, with closed world assumption

Highly Accurate New Beliefs

Figure: precision (0 to 1) at p@100, p@1k and p@10k on Profession, Nationality and Parents.

Manually evaluated new beliefs


Application: Coordinate Term Extraction Task

(Minkov & Cohen, ECML 2010)

parsed MUC-6 corpus: 153k nodes, 748K edges
30 queries: given 4 person names as seeds, find other persons

Figure: token nodes (BillGates, founded; SteveJobs, founded) joined by nsubj edges, linked by W edges to word nodes and by POS edges to part-of-speech nodes (nnp, vbd).

W: word
POS: part of speech
nnp: singular proper noun
vbd: verb, past tense
nsubj: subject of a verb

Good Paths Are Quite Long

<W-1,nsubj,W,W-1,nsubj-1,W>: find entities with similar behaviors

Figure: token nodes (BillGates, founded; SteveJobs, founded) joined by nsubj edges and linked by W edges to word nodes.

Figure: Mean Average Precision (MAP, 0.00 to 0.20) versus max path length (3 to 6) for kF, kF+1B, kF+2B and kF+3B.

Forward Search Is Wasteful

Find paths that connect s and t

Bidirectional Search Is More Efficient

Figure: searches from s and from t meet at an intermediate node z.

The challenge is to calculate P(s→t;π)

Forward vs. Backward RWs

Forward: evaluate P(s→t;π) for many t

$P(s \to t; \pi) = \sum_z P(s \to z; \pi') \, P(z \to t; r)$

Figure: s reaches z1, z2, z3 via π', which reach t1, t2 via r.

Backward: evaluate P(s→t;π) for many s

$P(t \leftarrow s; \pi) = \sum_z P(t \leftarrow z; \pi') \, P(z \leftarrow s; r)$

Figure: t reaches z1, z2, z3 via π', which reach s1, s2 via r.

Bidirectional Search with RW

$P(s \to t; \pi_1 \pi_2) = \sum_z P(s \to z; \pi_1) \, P(t \leftarrow z; \pi_2)$

Figure: a forward RW from s along π1 and a backward RW from t along π2 meet at intermediate nodes z1, z2, z3.
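A small sketch of combining a forward walk from s with a backward computation from t is given below. It assumes uniform transitions and reads the backward quantity as the probability that a walk from z along π2 ends at t. The backward step scans every edge for simplicity, where a real system would presumably use an inverse index. All names and the toy graph are hypothetical.

```python
from collections import defaultdict


def forward_walk(edges, source, path):
    """P(source -> z; path) for every reachable z."""
    dist = {source: 1.0}
    for relation in path:
        nxt = defaultdict(float)
        for z, p_z in dist.items():
            neighbors = edges.get((z, relation), [])
            for n in neighbors:
                nxt[n] += p_z / len(neighbors)
        dist = dict(nxt)
    return dist


def backward_walk(edges, target, path):
    """P(z -> target; path) for every z, computed starting from the target."""
    value = {target: 1.0}
    for relation in reversed(path):
        prev = defaultdict(float)
        for (z, rel), neighbors in edges.items():
            if rel != relation:
                continue
            for n in neighbors:
                if n in value:
                    prev[z] += value[n] / len(neighbors)
        value = dict(prev)
    return value


def bidirectional(edges, s, t, path1, path2):
    """Combine both walks at the meeting nodes z."""
    fwd = forward_walk(edges, s, path1)
    bwd = backward_walk(edges, t, path2)
    return sum(p * bwd.get(z, 0.0) for z, p in fwd.items())


edges = {
    ("s", "a"): ["z1", "z2"],
    ("z1", "b"): ["t", "u"],
    ("z2", "b"): ["t"],
}
combined = bidirectional(edges, "s", "t", ["a"], ["b"])
direct = forward_walk(edges, "s", ["a", "b"])["t"]
```

On this toy graph the combined value matches the purely forward computation over the concatenated path, which is the property the decomposition relies on.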

Bidirectional Search Is Much Faster

Figure: path finding time in seconds (0.1 to 1000) versus max path length (3 to 6) for forward-only search (3F, 4F, 5F) and bidirectional search (2F+1B, 3F+1B, 4F+1B, 2F+2B, 3F+2B, 4F+2B, 3F+3B). Annotations: "Exceed 20Gb memory limit" and "1000x faster".

Need for Lexicalized Paths

Task: find person entities

P(vbd→t | <POS-1, nsubj-1, W>)

Figure: token nodes (BillGates, founded; SteveJobs, founded) joined by nsubj edges, linked by W edges to Words and by POS edges to POSs (vbd).

Evaluate Lexicalized Paths

Given an example (s_i, t_i), calculate P(z→t_i;π) for many z

Figure: nodes z1, z2, z3 each reach t via π.

Person Name Extraction

Figure: Mean Average Precision (MAP, 0.0 to 0.4) at L=6 for RWR (no train), RWR, FOIL, PRA, PRA+c2 and PRA+c3.

~1000 correct answers


Future Work

Apply knowledge to NLP/IE/IR/CV tasks

$\arg\max_{decision} P(decision \mid context, KB)$

Figure: Text (Dependency Trees over tokens "She", "wrote", "Jane Eyre", "Charlotte", "was", with nsubj and dobj edges) linked by Mention edges and Entity Resolution to the KB (Charlotte Bronte --Write--> Jane Eyre; IsA Female), with the open decision "Coreference?".

Future Work: Conjunctions of Paths

rules can have tree structures, with source/constant/target nodes as leaves
forward PCRW with multiple walkers
backward PCRW with multiple walkers

Figure: a tree-structured rule around a "founded" token with nsubj and dobj edges and W edges to words (SteveJobs, Apple); nodes t1, t2, t3; leaves are marked as source s, constant z or target t.

Contribution

Apply relational learning at scales not possible before, made possible by
  a family of easy-to-learn features (3 types)
  fast random walk (sampling)
  distributed computing

Leads to new applications!

Other work I did at CMU

Relational CRFs (Lao+, NIPS'10)
Question answering (Lao+, NTCIR'08)
Utility based retrieval evaluation (Yang+, SIGIR'07)

Future Work

KB extension: new relation types, new concepts

Unsupervised: $KB = \arg\max_{KB} P(corpus \mid KB)$
Supervised: $KB = \arg\max_{KB} P(decisions \mid contexts, KB)$

Directed Graphical Models

e.g. Probabilistic Relational Models (Getoor+, ICML'01)

model structure restricted to DAG
cannot express features corresponding to chains

Coverage of top k triples

Profession Triples    Unique Persons
1k                    970
10k                   8,726
100k                  79,885

Repeatedly Combine Forward and Backward RWs

Forward search + Backward Search + Backward Search

e.g. <W-1,conj_and,W, W-1,conj_and,W, W-1,conj_and,W>

Summary of PRA

Stage                           Computation
Path Discovery                  given {(s_i, G_i)}, find {f ; acc(f) >= a, hits(f) >= h}
Generate Training Samples       generate {(s_i, t_i)} and {(x_i, y_i)}
Logistic Regression Training    $\arg\max_\theta \sum_i l_i(\theta) - \lambda_1 \|\theta\|_1 - \lambda_2 \|\theta\|_2^2$
Prediction                      apply model to nodes s in domain(r)

Need for Lexicalized Paths

Task = AthletePlaysInLeague
  $P(mlb \to t; \emptyset)$: bias toward MLB
  $P(BostonBraves \to t; \langle AthletePlaysForTeam^{-1}, AthletePlaysInLeague \rangle)$: a prior over the leagues participated in by Boston Braves university athletes

Task = CompetesWith
  $P(google \to t; \emptyset)$: bias toward Google
  $P(Google \to t; \langle CompetesWith, CompetesWith \rangle)$: companies around Google

Knowledge Base Inference

Figure: Mean Reciprocal Rank (MRR, 0.0 to 0.6) at L=3 for RWR (no train), RWR, PRA, PRA+c1 and PRA+c2, over 16 tasks.