semi-supervised learning with graphs

54
Semi-Supervised Learning With Graphs William Cohen 1

Upload: cordelia-weeks

Post on 30-Dec-2015

65 views

Category:

Documents


0 download

DESCRIPTION

Semi-Supervised Learning With Graphs. William Cohen. Main topics today. Scalable semi-supervised learning on graphs SSL with RWR SSL with coEM/wvRN/HF Scalable unsupervised learning on graphs Power iteration clustering …. Semi-supervised learning. A pool of labeled examples L - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Semi-Supervised Learning With Graphs

1

Semi-Supervised Learning With Graphs

William Cohen

Page 2: Semi-Supervised Learning With Graphs

2

Review – Graph Algorithms so far….

• PageRank and how to scale it up• Personalized PageRank/Random Walk with Restart and

–how to implement it–how to use it for extracting part of a graph

• Other uses for graphs?–not so much

We might come back to this more

You can also look at the March 19 lecture from the spring 2015 version of this class.

HW6

Page 3: Semi-Supervised Learning With Graphs

3

Main topics today

• Scalable semi-supervised learning on graphs–SSL with RWR–SSL with coEM/wvRN/HF

• Scalable unsupervised learning on graphs–Power iteration clustering–…

Page 4: Semi-Supervised Learning With Graphs

4

Semi-supervised learning

• A pool of labeled examples L• A (usually larger) pool of unlabeled examples U• Can you improve accuracy somehow using U?

Page 5: Semi-Supervised Learning With Graphs

5

Semi-Supervised Bootstrapped Learning/Self-training

ParisPittsburgh

SeattleCupertino

mayor of arg1live in arg1

San FranciscoAustindenial

arg1 is home oftraits such as arg1

anxietyselfishness

Berlin

Extract cities:

Page 6: Semi-Supervised Learning With Graphs

6

Semi-Supervised Bootstrapped Learning via Label Propagation

Paris

live in arg1

San FranciscoAustin

traits such as arg1

anxiety

mayor of arg1

Pittsburgh

Seattle

denial

arg1 is home of

selfishness

Page 7: Semi-Supervised Learning With Graphs

7

Semi-Supervised Bootstrapped Learning via Label Propagation

Paris

live in arg1

San FranciscoAustin

traits such as arg1

anxiety

mayor of arg1

Pittsburgh

Seattle

denial

arg1 is home of

selfishness

Nodes “near” seeds Nodes “far from” seeds

Information from other categories tells you “how far” (when to stop propagating)

arrogancetraits such as arg1

denialselfishnes

s

Page 8: Semi-Supervised Learning With Graphs

8

ASONAM-2010 (Advances in Social Networks Analysis and

Mining)

Page 9: Semi-Supervised Learning With Graphs

9

Network Datasets with Known Classes

•UBMCBlog•AGBlog•MSPBlog•Cora•Citeseer

Page 10: Semi-Supervised Learning With Graphs

10

RWR - fixpoint of:

Seed selection1. order by PageRank, degree, or randomly2. go down list until you have at least k

examples/class

Page 11: Semi-Supervised Learning With Graphs

11

Results – Blog data

Random Degree PageRank

We’ll discuss

this soon….

Page 12: Semi-Supervised Learning With Graphs

12

Results – More blog data

Random Degree PageRank

Page 13: Semi-Supervised Learning With Graphs

13

Results – Citation data

Random Degree PageRank

Page 14: Semi-Supervised Learning With Graphs

14

Seeding – MultiRankWalk

Page 15: Semi-Supervised Learning With Graphs

15

Seeding – HF/wvRN

Page 16: Semi-Supervised Learning With Graphs

16

What is HF aka coEM aka wvRN?

Page 17: Semi-Supervised Learning With Graphs

17

CoEM/HF/wvRN• One definition [MacKassey & Provost, JMLR 2007]:…

Another definition: A harmonic field – the score of each node in the graph is the harmonic (linearly weighted) average of its neighbors’ scores;

[X. Zhu, Z. Ghahramani, and J. Lafferty, ICML 2003]

Page 18: Semi-Supervised Learning With Graphs

18

CoEM/wvRN/HF• Another justification of the same algorithm….• … start with co-training with a naïve Bayes learner

Page 19: Semi-Supervised Learning With Graphs

19

CoEM/wvRN/HF• One algorithm with several justifications….• One is to start with co-training with a naïve Bayes learner

• And compare to an EM version of naïve Bayes– E: soft-classify unlabeled examples with NB classifier – M: re-train classifier with soft-labeled examples

Page 20: Semi-Supervised Learning With Graphs

20

CoEM/wvRN/HF• A second experiment

– each + example: concatenate features from two documents, one of class A+, one of class B+– each - example: concatenate features from two documents, one of class A-, one of class B-– features are prefixed with “A”, “B” disjoint

Page 21: Semi-Supervised Learning With Graphs

21

CoEM/wvRN/HF• A second experiment

– each + example: concatenate features from two documents, one of class A+, one of class B+– each - example: concatenate features from two documents, one of class A-, one of class B-– features are prefixed with “A”, “B” disjoint

• NOW co-training outperforms EM

Page 22: Semi-Supervised Learning With Graphs

22

CoEM/wvRN/HF

• Co-training with a naïve Bayes learner

• vs an EM version of naïve Bayes– E: soft-classify unlabeled examples with NB classifier – M: re-train classifier with soft-labeled examples

incremental hard assignments

iterative soft assignments

Page 23: Semi-Supervised Learning With Graphs

23

Co-Training Rote Learner

My advisor+

-

-

pageshyperlinks

-

--

-

+

+

-

-

-

+

++

+

-

-

+

+

-

Page 24: Semi-Supervised Learning With Graphs

24

Co-EM Rote Learner: equivalent to HF on a bipartite graph

Pittsburgh+

-

-

contextsNPs

-

--

-

+

+

-

-

-

+

++

+

-

-

+

+

-

lives in _

Page 25: Semi-Supervised Learning With Graphs

25

What is HF aka coEM aka wvRN?

Algorithmically:

• HF propagates weights and then resets the seeds to their initial value

• MRW propagates weights and does not reset seeds

Page 26: Semi-Supervised Learning With Graphs

26

MultiRank Walk vs HF/wvRN/CoEM

Seeds are marked S

MRWHF

Page 27: Semi-Supervised Learning With Graphs

27

Back to Experiments: Network Datasets with Known Classes

•UBMCBlog•AGBlog•MSPBlog•Cora•Citeseer

Page 28: Semi-Supervised Learning With Graphs

28

MultiRankWalk vs wvRN/HF/CoEM

Page 29: Semi-Supervised Learning With Graphs

29

How well does MWR work?

Page 30: Semi-Supervised Learning With Graphs

30

Parameter Sensitivity

Page 31: Semi-Supervised Learning With Graphs

31

Semi-supervised learning

• A pool of labeled examples L• A (usually larger) pool of unlabeled examples U• Can you improve accuracy somehow using U?• These methods are different from EM

– optimizes Pr(Data|Model)• How do SSL learning methods (like label propagation) relate to optimization?

Page 32: Semi-Supervised Learning With Graphs

32

SSL as optimizationslides from Partha Talukdar

Page 33: Semi-Supervised Learning With Graphs

33

Page 34: Semi-Supervised Learning With Graphs

34

yet another name for HF/wvRN/coEM

Page 35: Semi-Supervised Learning With Graphs

35

match seeds smoothnessprior

Page 36: Semi-Supervised Learning With Graphs

36

Page 37: Semi-Supervised Learning With Graphs

37

Page 38: Semi-Supervised Learning With Graphs

38

Page 39: Semi-Supervised Learning With Graphs

39

How to do this minimization?First, differentiate to find min is at

Jacobi method:• To solve Ax=b for x• Iterate:

• … or:

Page 40: Semi-Supervised Learning With Graphs

40

Page 41: Semi-Supervised Learning With Graphs

41

Page 42: Semi-Supervised Learning With Graphs

42

precision-recall

break even point

/HF/…

Page 43: Semi-Supervised Learning With Graphs

43

/HF/…

Page 44: Semi-Supervised Learning With Graphs

44

/HF/…

Page 45: Semi-Supervised Learning With Graphs

45

from mining patterns like “musicians such as Bob Dylan”

from HTML tables on the web that are

used for data, not

formatting

Page 46: Semi-Supervised Learning With Graphs

46

Page 47: Semi-Supervised Learning With Graphs

47

Page 48: Semi-Supervised Learning With Graphs

48

More recent work (AIStats 2014)• Propagating labels requires usually small number of optimization passes

– Basically like label propagation passes• Each is linear in

– the number of edges – and the number of labels being propagated

• Can you do better?– basic idea: store labels in a countmin sketch– which is basically an compact approximation of an objectdouble mapping

Page 49: Semi-Supervised Learning With Graphs

Flashback: CM Sketch Structure

Each string is mapped to one bucket per row Estimate A[j] by taking mink { CM[k,hk(j)] } Errors are always over-estimates Sizes: d=log 1/, w=2/ error is usually less than ||A||

1

+c

+c

+c

+c

h1(s)

hd(s)

<s, +c>

d=log 1/

w = 2/

from: Minos Garofalakis

49

Page 50: Semi-Supervised Learning With Graphs

50

More recent work (AIStats 2014)• Propagating labels requires usually small number of optimization passes

– Basically like label propagation passes• Each is linear in

– the number of edges – and the number of labels being propagated– the sketch size– sketches can be combined linearly without “unpacking” them: sketch(av + bw) = a*sketch(v)+b*sketch(w)– sketchs are good at storing skewed distributions

Page 51: Semi-Supervised Learning With Graphs

51

More recent work (AIStats 2014)• Label distributions are often very skewed

–sparse initial labels–community structure: labels from other subcommunities have small weight

Page 52: Semi-Supervised Learning With Graphs

52

More recent work (AIStats 2014)

Freebase Flick-10k

“self-injection”: similarity computation

Page 53: Semi-Supervised Learning With Graphs

53

More recent work (AIStats 2014)

Freebase

Page 54: Semi-Supervised Learning With Graphs

54

More recent work (AIStats 2014)

100 Gb available