
Page 1: Revisiting Output Coding for Sequential Supervised Learning

Revisiting Output Coding for Sequential Supervised Learning

Guohua Hao & Alan Fern

School of Electrical Engineering and Computer Science

Oregon State University

Corvallis, OR, U.S.A.

Page 2: Revisiting Output Coding for Sequential Supervised Learning

Scalability in CRF Training

Linear Chain CRF model

The model is

$$P(Y \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_t \sum_k \lambda_k f_k(y_{t-1}, y_t, X) \Big)$$

Inference in training:
- Computing the partition function $Z(X)$: forward-backward algorithm
- Maximizing over label sequences: Viterbi algorithm
- Complexity of both: $O(T L^2)$ for sequence length $T$ and label-set size $L$

Training requires repeated inference, which is computationally demanding and cannot scale to large label sets.

[Figure: linear-chain graphical model with label nodes y_{t-1}, y_t, y_{t+1} and observation nodes X_{t-1}, X_t, X_{t+1}]
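To make the $O(TL^2)$ cost concrete, here is a minimal NumPy sketch (not from the slides) of the forward pass that computes $\log Z(X)$; the node and transition score matrices stand in for the log potentials of a trained chain model.

```python
import numpy as np

def log_partition(log_node, log_trans):
    """Forward algorithm: computes log Z(X) for a linear-chain model.

    log_node:  (T, L) per-position label scores, log phi_t(y_t)
    log_trans: (L, L) transition scores, log psi(y_{t-1}, y_t)
    Each of the T steps is an L x L log-sum-exp, hence O(T L^2) total.
    """
    T, L = log_node.shape
    alpha = log_node[0]                        # log alpha_1
    for t in range(1, T):
        # alpha_t(y) = logsumexp_{y'} [alpha_{t-1}(y') + psi(y', y)] + phi_t(y)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_node[t]
    return np.logaddexp.reduce(alpha)          # log Z(X)

# Toy example: T = 5 positions, L = 3 labels with random scores
rng = np.random.default_rng(0)
print(log_partition(rng.normal(size=(5, 3)), rng.normal(size=(3, 3))))
```

Viterbi replaces the log-sum-exp with a max over the same $L \times L$ table, so it has the same $O(TL^2)$ complexity, which is what makes large label sets expensive.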

Page 3: Revisiting Output Coding for Sequential Supervised Learning

Recent Work of Focus: Sequential Error Correcting Output Coding (SECOC)

Error Correcting Output Coding (ECOC)

Class   Code word
        b1   b2   ...   bn
C1      1    0    ...   1
C2      0    0    ...   0
...     ...  ...  ...   ...
Cm      0    1    ...   1

classifier   h1   h2   ...   hn

$H(x) = (h_1(x), h_2(x), \ldots, h_n(x))$

$\hat{y}(x) = \arg\min_i \Delta(H(x), C_i)$, where $\Delta$ measures the distance (e.g. Hamming) between the vector of binary predictions and code word $C_i$.
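As a sketch of this decoding rule, assuming a small hypothetical code matrix and Hamming distance:

```python
import numpy as np

# Hypothetical code matrix: m = 3 classes, n = 3 bits (row i is codeword C_i)
code = np.array([[1, 0, 1],
                 [0, 0, 0],
                 [0, 1, 1]])

def ecoc_decode(H, code):
    """Return the class whose codeword is closest in Hamming distance
    to the vector of binary predictions H = (h_1(x), ..., h_n(x))."""
    dists = (code != H).sum(axis=1)   # Hamming distance to each C_i
    return int(np.argmin(dists))

print(ecoc_decode(np.array([0, 1, 1]), code))  # -> 2 (codeword [0, 1, 1])
```

Since $n$ bits can distinguish up to $2^n$ classes, the number of binary classifiers can be much smaller than the number of labels, which is the source of the scalability gain.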

Page 4: Revisiting Output Coding for Sequential Supervised Learning

Extension to CRF model

[Figure: the original label sequence y_{t-1}, y_t, y_{t+1} over inputs x_{t-1}, x_t, x_{t+1} is mapped bit by bit to binary label sequences, and one binary CRF h^k is trained on each binary sequence.]

Relabeling: $y_t^k = b_k(y_t)$, i.e. the k-th binary sequence applies the bit function $b_k(Y)$ to the original labels.

Decoding: each binary CRF $h^k$ predicts a binary sequence $\hat{y}^k$; at each position $t$ the bit predictions are collected into

$H_t(x) = (\hat{y}_t^1, \ldots, \hat{y}_t^n)$

and decoded per position:

$\hat{y}_t = \arg\min_i \Delta(H_t(x), C_i)$
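A minimal sketch of the relabeling and per-position decoding steps; the one-hot code matrix and the use of the true bit sequences in place of binary-CRF predictions are simplifications for illustration.

```python
import numpy as np

# Code matrix: row i is the codeword for label i (one-hot here for clarity)
code = np.array([[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]])

def binary_sequences(y, code):
    """Relabel a sequence y with each bit function: y_t^k = b_k(y_t)."""
    return [code[y, k] for k in range(code.shape[1])]  # one 0/1 sequence per bit

def secoc_decode(bit_preds, code):
    """Per-position decoding: at each t, pick the label whose codeword
    is closest in Hamming distance to (y_t^1, ..., y_t^n)."""
    H = np.stack(bit_preds, axis=1)                          # (T, n) bit vectors
    dists = (H[:, None, :] != code[None, :, :]).sum(axis=2)  # (T, m) distances
    return dists.argmin(axis=1)

y = np.array([0, 1, 2, 0, 1, 2])   # a toy label sequence
bits = binary_sequences(y, code)   # in SECOC these come from binary CRFs
print(secoc_decode(bits, code))    # recovers [0 1 2 0 1 2]
```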

Page 5: Revisiting Output Coding for Sequential Supervised Learning

Representational Capacity of SECOC

Intuitively, it seems that training each binary CRF independently will not be able to capture rich transition structure. Below is a counter-example to independent training.

Our hypothesis: when the transition structure is critical, independent training will not do as well.

[Figure: a deterministic label cycle 1 → 2 → 3 → 1 → ...]

Under the code below, the binary sequence for bit $b_1$ looks purely random to a first-order model:

$P(y_{t+1}^1 = 1 \mid y_t^1 = 0) = 0.5$
$P(y_{t+1}^1 = 0 \mid y_t^1 = 0) = 0.5$

Y       = 1 2 3 1 2 3 1
b1(Y)   = 1 0 0 1 0 0 1
b1(Y)*  = 1 0 1 0 1 0 0
b2(Y)   = 0 1 0 0 1 0 0
b3(Y)   = 0 0 1 0 0 1 0

The corrupted sequence b1(Y)* has the same first-order statistics as b1(Y), so an independently trained binary CRF cannot distinguish the two, even though the underlying label process is deterministic.

Code words:

y   b1   b2   b3
1   1    0    0
2   0    1    0
3   0    0    1
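The counter-example is easy to check empirically. This small sketch generates the deterministic cycle and measures the first-order statistics of $b_1(Y)$, which come out uniform from state 0:

```python
from collections import Counter

# Deterministic cycle 1 -> 2 -> 3 -> 1 -> ... and bit b1 (fires on label 1)
T = 9999
Y = [(t % 3) + 1 for t in range(T)]
b1 = [1 if y == 1 else 0 for y in Y]

# First-order transition statistics of the binary sequence b1(Y)
pairs = Counter(zip(b1, b1[1:]))
for (prev, nxt), c in sorted(pairs.items()):
    total = sum(v for (p, _), v in pairs.items() if p == prev)
    print(f"P(b_t+1 = {nxt} | b_t = {prev}) ~ {c / total:.2f}")
```

From state 0 the next bit is 0 or 1 with probability 0.5 each, so a first-order binary chain model sees pure noise where the full label chain is perfectly predictable.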

Page 6: Revisiting Output Coding for Sequential Supervised Learning

Our Method: Cascaded SECOC

- Helps capture the transition structure (see the sketch after the figure below).
- For problems where a transition model is critical, we hope to see cascade training outperform independent training.
- For problems where the observation model is more informative, cascade training should still help when the sliding window is small; a large sliding window will dominate the effect of cascade training.

[Figure: the cascaded architecture. The binary CRF h^k for bit b_k(Y) takes, at each position t, the inputs x_{t-1}, x_t, x_{t+1} together with the previous binary predictions ŷ_{t-1}^{k-1}, ŷ_t^{k-1}, ŷ_{t+1}^{k-1}, ... produced by earlier CRFs in the cascade.]
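A schematic sketch of the cascade construction; `train_binary_crf` and its `.predict` interface are hypothetical stand-ins for the base learner (e.g. GTB or VP), and feeding training-time predictions forward is one plausible reading of the figure rather than a verbatim detail from the slides.

```python
import numpy as np

def cascaded_secoc_train(X, Y, code, train_binary_crf, h=1):
    """Train binary CRFs in sequence; CRF k sees the observations plus
    the predictions of the previous h binary CRFs as extra features.

    X: (T, d) observation features; Y: (T,) labels; code: (m, n) code matrix.
    train_binary_crf(features, bits) -> model with .predict(features) is a
    hypothetical stand-in for the base sequence learner.
    """
    models, prev_preds = [], []
    for k in range(code.shape[1]):
        bits = code[Y, k]                        # target bit sequence b_k(Y)
        extras = prev_preds[-h:]                 # last h cascade predictions
        feats = np.column_stack([X] + extras) if extras else X
        model = train_binary_crf(feats, bits)
        models.append(model)
        prev_preds.append(model.predict(feats))  # feed predictions forward
    return models
```

The history length h here corresponds to the c-SECOC(h) parameter compared in the experiments: with h = 0 this degenerates to independent training.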

Page 7: Revisiting Output Coding for Sequential Supervised Learning

Experimental Results

Base CRF training algorithms: Gradient Tree Boosting (GTB) and Voted Perceptron (VP)

Methods for comparison:
- iid: non-sequential ECOC
- i-SECOC: independent SECOC
- c-SECOC(h): cascaded SECOC with history length h
- Beam search

Synthetic data sets: generated by an HMM with transition model

$$P(y_{t+1} = y \mid y_t = l_i) = \begin{cases} p_l / |L_i| & \text{if } y \in L_i \\ (1 - p_l) / (|L| - |L_i|) & \text{otherwise} \end{cases}$$

and observation model

$$P(o_t = o \mid y_t = l_i) = \begin{cases} p_o / |O_i| & \text{if } o \in O_i \\ (1 - p_o) / (|O| - |O_i|) & \text{otherwise} \end{cases}$$

"Transition" data set: $p_o = 0.2$, $|O_i| = 8$ (weakly informative observations); $p_l = 0.6$, $|L_i| = 2$

"Both" data set: $p_o = 0.6$, $|O_i| = 2$; $p_l = 0.6$, $|L_i| = 2$
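A sketch of the generator implied by these definitions; the random choice of the preferred sets $L_i$ and $O_i$ is an assumption of this sketch, not a detail taken from the slides.

```python
import numpy as np

def sample_hmm_sequence(T, n_labels, n_obs, p_l, k_l, p_o, k_o, rng):
    """Sample one sequence from the synthetic HMM described above.

    Each label l_i has a preferred set L_i of k_l next labels (total mass
    p_l) and a preferred set O_i of k_o observations (total mass p_o); the
    remaining mass is spread uniformly over the remaining items.
    """
    L_sets = [rng.choice(n_labels, k_l, replace=False) for _ in range(n_labels)]
    O_sets = [rng.choice(n_obs, k_o, replace=False) for _ in range(n_labels)]

    def dist(size, fav, p):
        d = np.full(size, (1 - p) / (size - len(fav)))  # non-preferred mass
        d[fav] = p / len(fav)                           # preferred mass
        return d

    y, ys, obs = rng.integers(n_labels), [], []
    for _ in range(T):
        ys.append(y)
        obs.append(rng.choice(n_obs, p=dist(n_obs, O_sets[y], p_o)))
        y = rng.choice(n_labels, p=dist(n_labels, L_sets[y], p_l))
    return np.array(ys), np.array(obs)

rng = np.random.default_rng(0)
# "transition" setting: p_o = 0.2 over 8 observations, p_l = 0.6 over 2 labels
ys, obs = sample_hmm_sequence(50, 40, 40, p_l=0.6, k_l=2, p_o=0.2, k_o=8, rng=rng)
```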

Page 8: Revisiting Output Coding for Sequential Supervised Learning

Nettalk Data Set (134 labels)

[Figure: four result panels: window size 1 with GTB, window size 3 with GTB, window size 1 with VP, window size 3 with VP]

Page 9: Revisiting Output Coding for Sequential Supervised Learning

Synthetic Data Sets (40 labels) and Noun Phrase Chunking (NPC) (121 labels)

[Figure: result panels including the "transition" data set with window size 1 (GTB and VP), the "both" data set with window size 3 (GTB), and NPC with window size 3 (VP)]

Page 10: Revisiting Output Coding for Sequential Supervised Learning

Comparing to Beam Search

[Figure: four panels comparing to beam search: window sizes 1 and 3 on Nettalk, window sizes 1 and 3 on NPC]

Page 11: Revisiting Output Coding for Sequential Supervised Learning

Summary

i-SECOC can perform poorly when explicitly capturing complex transition models is critical

c-SECOC can improve accuracy in such situations by using cascade features

The performance of c-SECOC can depend strongly on the base CRF algorithm; algorithms capable of capturing complex (non-linear) feature interactions are preferred

When using less powerful base CRF learning algorithms, other approaches (e.g. beam search) can outperform c-SECOC

Page 12: Revisiting Output Coding for Sequential Supervised Learning

Future Directions

- Efficient validation procedure for selecting the cascade history length
- Incremental generation of code words
- Wide comparison of methods for dealing with large label sets

Acknowledgements

We thank John Langford for discussion of the counter-example to independent SECOC and Thomas Dietterich for his support. This work was supported by NSF grant IIS-0307592.