Semi-Supervised Learning with Graphs

Xiaojin (Jerry) Zhu

LTI · SCS · CMU

Thesis Committee

John Lafferty (co-chair)

Ronald Rosenfeld (co-chair)

Zoubin Ghahramani

Tommi Jaakkola

Semi-supervised Learning

• classifiers need labeled data to train

• labeled data scarce, unlabeled data abundant

• Traditional classifiers cannot train on unlabeled data.

My thesis (semi-supervised learning): develop classification methods that can use both labeled and unlabeled data.

Motivating example 1: speech recognition

• sounds −→ words

• labeled data: transcription takes 10 to 400 times real-time

• unlabeled data: radio, call center. Useful?

Motivating example 2: parsing

• sentences −→ parse trees

(parse tree for the sentence "I saw a falcon with a telescope", with nodes S, NP, VP, PP, PN, VB, DET, N, P)

• labeled data: treebank (Hwa, 2003):

– English: 40,000 sentences, 5 years; Chinese: 4,000 sentences, 2 years

• unlabeled data: sentences without annotation. Useful?

Motivating example 3: video surveillance

• images −→ identities

• labeled data: only a few images

• unlabeled data: abundant. Useful?

The message

Unlabeled data can improve classification.

Why unlabeled data might help

example: classify astronomy vs. travel articles

(word–document matrix: columns are the documents d1, d3, d4, d2; rows are the words asteroid, bright, comet, year, zodiac, ..., airport, bike, camp, yellowstone, zion; • marks a word that occurs in a document)

Why labeled data alone might fail

(word–document matrix for the labeled documents d1, d2 and the test documents d3, d4: each word occurs in either a labeled or a test document, but not both)

• no overlap!

• tends to happen when labeled data is scarce

Unlabeled data are stepping stones

(word–document matrix augmented with unlabeled documents d5–d9, whose words overlap with both the labeled documents d1, d2 and the test documents d3, d4, forming a chain of overlaps)

• observe direct similarity d1 ∼ d5, d5 ∼ d6, . . .

• assume similar features ⇒ same label

• infer indirect similarity

Unlabeled data are stepping stones

• arrange the l labeled and u unlabeled (= test) points in a graph

– nodes: the n = l + u points

– weighted edges W: direct similarity, e.g. W_ij = exp(−‖x_i − x_j‖² / (2σ²))

• infer indirect similarity (using all paths)

(figure: graph over the documents d1, d2, d3, d4)

One way to use labeled and unlabeled data (Zhu and Ghahramani, 2002)

• input: n × n graph weights W (important!) and labels Y_l ∈ {0, 1}^l

• create transition matrix P_ij = W_ij / Σ_{j′} W_ij′

• start from any n-vector f, repeat:

– clamp labeled data: f_l = Y_l

– propagate: f ← P f

• f converges to a unique solution, the harmonic function.

0 ≤ f ≤ 1, “soft labels”
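
A minimal NumPy sketch of the propagation loop above (the function and variable names are my own, and it assumes the first l rows/columns of W correspond to the labeled points):

```python
import numpy as np

def label_propagation(W, Y_l, n_labeled, n_iter=1000, tol=1e-6):
    """Iterative label propagation, as described above -- a sketch.

    W   : (n, n) symmetric nonnegative similarity matrix
    Y_l : (n_labeled,) array of 0/1 labels for the first n_labeled nodes
    Returns an (n,) vector of soft labels f in [0, 1].
    """
    P = W / W.sum(axis=1, keepdims=True)      # row-normalized transition matrix
    f = np.zeros(W.shape[0])
    f[:n_labeled] = Y_l                        # clamp labeled data
    for _ in range(n_iter):
        f_new = P @ f                          # propagate: f <- P f
        f_new[:n_labeled] = Y_l                # clamp labeled data again
        if np.max(np.abs(f_new - f)) < tol:    # converged to the harmonic function
            return f_new
        f = f_new
    return f
```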

An electric network interpretation (Zhu, Ghahramani and Lafferty, ICML 2003)

• harmonic function f is the voltage at the nodes

– edges are resistors with R_ij = 1/W_ij

– a 1-volt battery connects to the labeled nodes

(figure: resistor network with the labeled nodes held at +1 and 0 volts)

A random walk interpretation

harmonic function fi = P (hit label 1 | start from i)

demo

Closed form solution for the harmonic function

• diagonal degree matrix D: D_ii = Σ_j W_ij

• graph Laplacian matrix ∆ = D −W

f_u = −Δ_uu^{−1} Δ_ul Y_l

• Δ: graph version of the continuous Laplacian operator ∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²

• harmonic: Δf = 0 on the unlabeled data, with Dirichlet boundary conditions on the labeled data
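
The same solution in closed form, as a hedged NumPy sketch (labeled points assumed to come first; names are mine):

```python
import numpy as np

def harmonic_function(W, Y_l, n_labeled):
    """Closed-form harmonic function f_u = -inv(L_uu) L_ul Y_l -- a sketch."""
    L = np.diag(W.sum(axis=1)) - W             # graph Laplacian D - W
    L_uu = L[n_labeled:, n_labeled:]           # unlabeled-unlabeled block
    L_ul = L[n_labeled:, :n_labeled]           # unlabeled-labeled block
    return -np.linalg.solve(L_uu, L_ul @ Y_l)  # soft labels for unlabeled points
```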

Properties of the harmonic function

• currents: in-flow = out-flow at any node (Kirchhoff's law)

• minimum energy: E(f) = Σ_{i∼j} W_ij (f_i − f_j)² = fᵀ Δ f

• average of neighbors: f_u(i) = Σ_{j∼i} W_ij f(j) / Σ_{j∼i} W_ij

• uniquely exists

• 0 ≤ f ≤ 1

Harmonic functions for text categorization

(accuracy plot on three binary tasks: baseball vs. hockey, PC vs. Mac, religion vs. atheism)

50 labeled articles, about 2000 unlabeled articles. 10NN graph.

Harmonic functions for digit recognition

(accuracy plot on two tasks: 1 vs. 2, and all 10 digits)

50 labeled images, about 4000 unlabeled images, 10NN graph

Why does it work on text? Good graphs

• cosine on tf.idf

• quotes

Why does it work on digits? Good graphs

• pixel-wise Euclidean distance

(figure: two digit images that are not similar under pixel-wise Euclidean distance, but become indirectly similar through stepping-stone images)

Where do good graphs come from?

• domain knowledge

• learn hyperparameters W_ij = exp(−Σ_d (x_id − x_jd)² / α_d) with, e.g., Bayesian evidence maximization

(figure: average "7" image, average "9" image, and the learned per-dimension bandwidths α)

good graphs + harmonic functions = semi-supervised learning
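
For concreteness, a small sketch of the hyperparameterized weights above (only the weight formula; the evidence maximization used to tune α is not shown, and the names are illustrative):

```python
import numpy as np

def graph_weights(X, alpha):
    """W_ij = exp(-sum_d (x_id - x_jd)^2 / alpha_d), one bandwidth per dimension."""
    diff = X[:, None, :] - X[None, :, :]            # (n, n, d) pairwise differences
    W = np.exp(-np.sum(diff ** 2 / alpha, axis=-1))
    np.fill_diagonal(W, 0.0)                        # no self-loops
    return W
```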

Too much unlabeled data?

• does it scale?

– f_u = −Δ_uu^{−1} Δ_ul Y_l

– O(u³), e.g. millions of crawled web pages

• solution 1: use iterative methods

– the label propagation algorithm (slow)

– loopy belief propagation

– conjugate gradient

• solution 2: reduce the problem size

– use a random small unlabeled subset (Delalleau et al., 2005)

– harmonic mixtures

• can it handle new points (induction)?

Harmonic mixtures (Zhu and Lafferty, 2005)

• fit unlabeled data with a mixture model, e.g.

– Gaussian mixtures for images

– multinomial mixtures for documents

• use EM or other methods

• M mixture components; here M = 30, u ≈ 1000

• learn soft labels for the mixture components, not the unlabeled points

Harmonic mixtures: learn labels for the mixture components

• assume mixture component labels λ1, · · · , λM

• λ determines f

– mixture model responsibility R: R_im = p(m|x_i)

– f(i) = Σ_{m=1}^{M} R_im λ_m

• learn λ such that f is closest to harmonic

– minimize energy E(f) = fᵀ Δ f

– convex optimization

– closed-form solution λ = −(Rᵀ Δ_uu R)^{−1} Rᵀ Δ_ul Y_l
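
A hedged NumPy sketch of this closed form (labeled points first; R holds the responsibilities of the unlabeled points; names are mine):

```python
import numpy as np

def harmonic_mixture_labels(W, Y_l, R, n_labeled):
    """Component labels lambda = -(R^T L_uu R)^{-1} R^T L_ul Y_l -- a sketch."""
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[n_labeled:, n_labeled:]
    L_ul = L[n_labeled:, :n_labeled]
    A = R.T @ L_uu @ R                        # only M x M, instead of u x u
    lam = -np.linalg.solve(A, R.T @ (L_ul @ Y_l))
    f_u = R @ lam                             # soft labels for the unlabeled points
    return lam, f_u
```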

Harmonic mixtures

(figure: harmonic mixtures on a 2D toy dataset)

λ follows the graph, so does f

Harmonic mixtures: computational savings

• harmonic mixtures: f_u = −R (Rᵀ Δ_uu R)^{−1} Rᵀ Δ_ul Y_l, where Rᵀ Δ_uu R is M × M

• original harmonic function: f_u = −(Δ_uu)^{−1} Δ_ul Y_l, where Δ_uu is u × u

• harmonic mixtures cost O(M³) instead of O(u³)

Harmonic mixtures can handle large problems.

Also induction: f(x) = Σ_{m=1}^{M} R_xm λ_m
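
Continuing the sketch above, induction on a new point only needs its mixture responsibilities and the learned λ (the responsibility function is assumed to come from the fitted mixture model):

```python
def predict_new_point(x_new, responsibilities_fn, lam):
    """Induction: f(x) = sum_m p(m | x) * lambda_m -- a sketch."""
    R_x = responsibilities_fn(x_new)   # (M,) vector p(m | x_new) from the mixture
    return float(R_x @ lam)
```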

Too little labeled data? (Zhu, Lafferty and Ghahramani, 2003b)

• choose unlabeled points to label: active learning

• most ambiguous vs. most influential

(figure: toy graph with labeled nodes 0 and 1 and two candidate query points: a with f = 0.5, B with f = 0.4)

Active Learning

• generalization error: err = Σ_{i∈U} Σ_{y_i ∈ {0,1}} [sgn(f_i) ≠ y_i] P_true(y_i)

• approximation: P_true(y_i = 1) ≈ f_i

• estimated error: err = Σ_{i∈U} min(f_i, 1 − f_i)

• estimated error after querying x_k with answer y_k: err^{+(x_k, y_k)} = Σ_{i∈U} min(f_i^{+(x_k, y_k)}, 1 − f_i^{+(x_k, y_k)})

Active Learning

• approximation: err^{+x_k} ≈ (1 − f_k) err^{+(x_k, 0)} + f_k err^{+(x_k, 1)}

• select the query k* = arg min_k err^{+x_k}

• ‘re-train’ is fast with harmonic function

f_u^{+(x_k, y_k)} = f_u + (y_k − f_k) (Δ_uu)^{−1}_{·k} / (Δ_uu)^{−1}_{kk}

Active learning can be effectively integrated into semi-supervised learning.
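
A hedged sketch of one greedy query-selection step using this fast update (it recomputes nothing but the rank-one correction; names and structure are mine):

```python
import numpy as np

def select_query(W, Y_l, n_labeled):
    """Pick the unlabeled point minimizing the expected estimated risk -- a sketch."""
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[n_labeled:, n_labeled:]
    L_ul = L[n_labeled:, :n_labeled]
    G = np.linalg.inv(L_uu)                       # (Delta_uu)^{-1}, fine for moderate u
    f_u = -G @ (L_ul @ Y_l)                       # current harmonic function

    best_k, best_risk = None, np.inf
    for k in range(len(f_u)):
        risk = 0.0
        for y_k, p in ((0.0, 1.0 - f_u[k]), (1.0, f_u[k])):
            f_plus = f_u + (y_k - f_u[k]) * G[:, k] / G[k, k]   # fast 're-training'
            risk += p * np.sum(np.minimum(f_plus, 1.0 - f_plus))
        if risk < best_risk:
            best_k, best_risk = k, risk
    return best_k                                 # index into the unlabeled points
```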

Beyond harmonic functions

• harmonic functions too specialized?

• I will show you things behind harmonic functions:

– probabilistic model

– kernel

• I will give you an even better kernel.

The probabilistic model behind harmonic function

• energy E(f) = Σ_{i∼j} W_ij (f_i − f_j)² = fᵀ Δ f

• low energy ⇒ good label propagation

(figure: three example labelings with energies E(f) = 4, E(f) = 2, E(f) = 1)

• Markov random field p(f) ∝ exp (−E(f))

• if f ∈ {0, 1} is discrete: Boltzmann machines, inference hard

The probabilistic model behind harmonic functions: Gaussian random fields (Zhu, Ghahramani and Lafferty, ICML 2003)

• continuous relaxation f ∈ R

• Gaussian random field p(f) ∝ exp(−fᵀ Δ f)

– n-dimensional Gaussian

– harmonic function = mean, easy to compute

– inverse covariance matrix Δ

• equivalent to Gaussian processes on finite data

– with kernel (covariance) matrix Δ^{−1}

The kernel matrix behind harmonic functions

K = Δ^{−1}

• K_ij = indirect similarity

– the direct similarity W_ij may be small

– but K_ij will be large if there are many paths between i and j

• K can be used with many kernel machines

– K is built with both labeled and unlabeled data

– K + SVM = semi-supervised SVM

– different from the transductive SVM (Joachims, 1999)

– additional benefit: handles noisy labeled data

K encourages smooth functions

• graph spectrum: Δ = Σ_{k=1}^{n} λ_k φ_k φ_kᵀ

• small eigenvalue ⇒ smooth eigenvector: Σ_{i∼j} W_ij (φ_k(i) − φ_k(j))² = λ_k

• K encourages smooth eigenvectors through large weights

Laplacian: Δ = Σ_k λ_k φ_k φ_kᵀ

harmonic kernel: K = Δ^{−1} = Σ_k (1/λ_k) φ_k φ_kᵀ

• smooth functions have small RKHS norm: ‖f‖²_K = fᵀ K^{−1} f = fᵀ Δ f = Σ_{i∼j} W_ij (f_i − f_j)²

General semi-supervised kernels

• K = Δ^{−1} may not be the best semi-supervised kernel

• general semi-supervised kernels: K = Σ_i r(λ_i) φ_i φ_iᵀ

• r(λ) large when λ small, to encourage smooth eigenvectors.

• specific choices of r(·) lead to some known kernels

– harmonic function kernel: r(λ) = 1/λ

– diffusion kernel: r(λ) = exp(−σ²λ)

– random walk kernel: r(λ) = (α − λ)^p

• Is there a best r()? Yes, as measured by kernel alignment.
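
All three transforms fit the same recipe; a minimal NumPy sketch of building K from a chosen r(·) (the regularizer added for the harmonic-style 1/λ transform is my own guard against the zero eigenvalue):

```python
import numpy as np

def spectral_kernel(W, r):
    """K = sum_i r(lambda_i) phi_i phi_i^T for a spectral transform r -- a sketch."""
    L = np.diag(W.sum(axis=1)) - W
    lam, phi = np.linalg.eigh(L)          # eigenvalues ascending, columns are eigenvectors
    return (phi * r(lam)) @ phi.T         # equals sum_i r(lam_i) phi_i phi_i^T

# examples of spectral transforms
harmonic_style = lambda lam: 1.0 / (lam + 1e-6)          # ~ 1/lambda, regularized
diffusion = lambda lam, sigma2=1.0: np.exp(-sigma2 * lam)
```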

Alignment measures kernel quality

• alignment between the kernel and the labeled data Y_l: align(K, Y_l) = ⟨K_ll, Y_l Y_lᵀ⟩ / (‖K_ll‖ · ‖Y_l Y_lᵀ‖)

• extension of cosine angle between vectors

• high alignment related to good generalization performance

• leads to a convex optimization problem
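
A small sketch of computing this alignment (it maps the 0/1 labels to ±1, which is the usual convention for the target matrix Y_l Y_lᵀ; names are mine):

```python
import numpy as np

def kernel_alignment(K_ll, Y_l):
    """align(K, Y_l) = <K_ll, Y Y^T>_F / (||K_ll||_F ||Y Y^T||_F) -- a sketch."""
    Y = np.where(np.asarray(Y_l) > 0, 1.0, -1.0)   # 0/1 labels mapped to -1/+1
    T = np.outer(Y, Y)                             # target matrix Y Y^T
    return np.sum(K_ll * T) / (np.linalg.norm(K_ll) * np.linalg.norm(T))
```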

Finding the best kernel (Zhu, Kandola, Ghahramani and Lafferty, NIPS 2004)

• the order constrained semi-supervised kernel

max_r align(K, Y_l)

subject to K = Σ_i r_i φ_i φ_iᵀ

r_1 ≥ · · · ≥ r_n ≥ 0

• order constraints r1 ≥ · · · ≥ rn encourage smoothness

• convex optimization

• r nonparametric
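
A simplified convex sketch of this optimization, written with cvxpy: it maximizes the unnormalized alignment ⟨K_ll, Y Yᵀ⟩ under the order constraints and a sum-to-one normalization, which is only a surrogate for the normalized objective on the slide, not necessarily the exact formulation used in the thesis:

```python
import numpy as np
import cvxpy as cp

def order_constrained_kernel(W, Y_l, n_labeled):
    """Order-constrained spectral kernel -- a simplified convex surrogate."""
    L = np.diag(W.sum(axis=1)) - W
    _, phi = np.linalg.eigh(L)                     # smoothest eigenvectors first
    phi_l = phi[:n_labeled, :]                     # rows for the labeled points
    Y = np.where(np.asarray(Y_l) > 0, 1.0, -1.0)
    T = np.outer(Y, Y)

    n = phi.shape[1]
    r = cp.Variable(n)
    K_ll = sum(r[i] * np.outer(phi_l[:, i], phi_l[:, i]) for i in range(n))
    objective = cp.Maximize(cp.sum(cp.multiply(K_ll, T)))    # <K_ll, Y Y^T>
    constraints = [r[:-1] >= r[1:], r >= 0, cp.sum(r) == 1]  # r_1 >= ... >= r_n >= 0
    cp.Problem(objective, constraints).solve()
    return (phi * r.value) @ phi.T                 # full n x n kernel
```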

The order-constrained kernel improves alignment and accuracy

text categorization (religion vs. atheism), 50 labeled and 2000 unlabeled articles.

• alignment: order 0.31, harmonic 0.17, RBF 0.04

• accuracy with support vector machines: order 84.5, harmonic 80.4, RBF 69.3

We now have good kernels for semi-supervised learning.

Sequences and other structured data (Lafferty, Zhu and Liu, 2004)

(figure: graphical model over neighboring observation/label pairs (x, y) and (x′, y′))

• what if x_1 · · · x_n form sequences? speech, natural language processing, biosequence analysis, etc.

• conditional random fields (CRF)

• kernel CRF

• kernel CRF + semi-supervised kernels

Related work in semi-supervised learning, based on different assumptions

method               | assumptions                       | references
graph-based          | similar features, same label      | this talk; mincuts (Blum & Chawla, 2001); normalized Laplacian (Zhou et al., 2003); regularization (Belkin et al., 2004)
mixture model, EM    | generative model                  | (Nigam et al., 2000)
transductive SVM     | low-density separation            | (Joachims, 1999)
info. regularization | y varies less where p(x) is high  | (Szummer & Jaakkola, 2002)
co-training          | feature split                     | (Blum & Mitchell, 1998)
· · ·

Key contributions of the thesis

• use unlabeled data? harmonic functions

• good graph? graph hyperparameter learning

• large problems? harmonic mixtures

• pick items to label? active semi-supervised learning

• probabilistic framework? Gaussian random fields

• kernel machines? semi-supervised kernels

• structured data? kernel conditional random fields

Summary

Unlabeled data can improve classification.

The methods have reached the stage where we can apply them to real-world tasks.

Thank You

Open questions

• more efficient graph learning

• more research on structured data

• regression, ranking, clustering

• new semi-supervised assumptions

• applications: NLP, bioinformatics, vision, etc.

• learning theory: consistency etc.

References

M. Belkin, I. Matveeva and P. Niyogi. Regularization and Semi-supervised Learning on Large Graphs. COLT 2004.
A. Blum and S. Chawla. Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML 2001.
A. Blum, T. Mitchell. Combining Labeled and Unlabeled Data with Co-training. COLT 1998.
O. Delalleau, Y. Bengio, N. Le Roux. Efficient Non-Parametric Function Induction in Semi-Supervised Learning. AISTAT 2005.
R. Hwa. A Continuum of Bootstrapping Methods for Parsing Natural Languages. 2003.
T. Joachims. Transductive Inference for Text Classification using Support Vector Machines. ICML 1999.
K. Nigam, A. McCallum, S. Thrun, T. Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 2000.
M. Szummer, T. Jaakkola. Information Regularization with Partially Labeled Data. NIPS 2002.
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Schölkopf. Learning with Local and Global Consistency. NIPS 2003.

X. Zhu, J. Lafferty. Harmonic Mixtures. 2005. Submitted.
X. Zhu, J. Kandola, Z. Ghahramani, J. Lafferty. Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning. NIPS 2004.
J. Lafferty, X. Zhu, Y. Liu. Kernel Conditional Random Fields: Representation and Clique Selection. ICML 2004.
X. Zhu, Z. Ghahramani, J. Lafferty. Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. ICML 2003.
X. Zhu, J. Lafferty, Z. Ghahramani. Combining Active Learning and Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. ICML 2003 workshop.
X. Zhu, Z. Ghahramani. Learning from Labeled and Unlabeled Data with Label Propagation. CMU-CALD-02-106, 2002.

Graph spectrum: Δ = Σ_{i=1}^{n} λ_i φ_i φ_iᵀ

Learning component label with EM

(figure: 2D toy dataset with mixture components labeled by EM)

Labels for the components do not follow the graph. (Nigam et al., 2000)

KCRF: Sequences and other structured data

• KCRF: p_f(y|x) = Z^{−1}(x, f) exp(Σ_c f(x, y_c))

• f induces a regularized negative log loss on the training data: R(f) = Σ_{i=1}^{l} −log p_f(y^(i)|x^(i)) + Ω(‖f‖_K)

• representer theorem for KCRFs: the loss minimizer is f*(x, y_c) = Σ_{i=1}^{l} Σ_{c′} Σ_{y′_{c′}} α(i, y′_{c′}) K((x^(i), y′_{c′}), (x, y_c))

KCRF: Sequences and other structured data

• learn α to minimize R(f), convex, sparse training

• special case: K((x^(i), y′_{c′}), (x, y_c)) = ψ(K′(x^(i)_{c′}, x_c), y′_{c′}, y_c)

• K ′ can be a semi-supervised kernel.

A random walk interpretation of harmonic functions

• harmonic function f_i = P(hit a node labeled 1 | start from i)

– random walk from node i to j with probability P_ij

– stop if we hit a labeled node

• indirect similarity: random walks have similar destinations

(figure: random walk from node i until it hits a node labeled 1 or 0)

An image and its neighbors in the graph

(figure: image 4005 and its five graph neighbors: neighbor 1 via a time edge, neighbors 2–4 via color edges, neighbor 5 via a face edge)
