Revisiting Output Coding for Sequential Supervised Learning


Posted on 15-Jan-2016

Revisiting Output Coding for Sequential Supervised Learning. Guohua Hao & Alan Fern, School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, U.S.A. (PowerPoint presentation)

  • Scalability in CRF Training: linear-chain CRF model

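    Inference in training: the partition function is computed with the forward-backward algorithm, and maximizing over label sequences uses the Viterbi algorithm. Both take O(T·L²) time for a sequence of length T over L labels. Because training repeats this inference many times, it is computationally demanding and cannot scale to large label sets.
    [Figure: linear-chain CRF with labels y_{t-1}, y_t, y_{t+1} over observations x_{t-1}, x_t, x_{t+1}]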
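To make the cost concrete, here is a minimal sketch of both inference routines for a linear-chain model. It assumes the score of a label sequence decomposes into per-position scores `emit[t][y]` plus pairwise scores `trans[y_prev][y]`. These names are hypothetical, and feature functions and weights are abstracted away. Each routine visits all L×L label pairs at each of the T positions, which is where the O(T·L²) cost comes from.

```python
import math
from typing import List, Tuple

def log_sum_exp(vals: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_partition(emit: List[List[float]], trans: List[List[float]]) -> float:
    """Forward algorithm: log Z = log of the sum over all label sequences of exp(score)."""
    L = len(emit[0])
    alpha = list(emit[0])  # alpha[y]: log-sum of scores of prefixes ending in label y
    for t in range(1, len(emit)):
        # O(L^2) work per position: every (y_prev, y) pair is visited.
        alpha = [log_sum_exp([alpha[yp] + trans[yp][y] for yp in range(L)]) + emit[t][y]
                 for y in range(L)]
    return log_sum_exp(alpha)

def viterbi(emit: List[List[float]], trans: List[List[float]]) -> Tuple[List[int], float]:
    """Viterbi: highest-scoring label sequence and its score."""
    L = len(emit[0])
    delta = list(emit[0])  # delta[y]: best score of a prefix ending in label y
    back: List[List[int]] = []
    for t in range(1, len(emit)):
        # Same O(L^2) pairwise sweep, with max in place of log-sum-exp.
        best_prev = [max(range(L), key=lambda yp: delta[yp] + trans[yp][y]) for y in range(L)]
        delta = [delta[best_prev[y]] + trans[best_prev[y]][y] + emit[t][y] for y in range(L)]
        back.append(best_prev)
    y = max(range(L), key=lambda k: delta[k])
    best_score = delta[y]
    path = [y]
    for bp in reversed(back):  # follow back-pointers from the last position
        y = bp[y]
        path.append(y)
    return path[::-1], best_score
```

During training, a routine like `log_partition` (or its forward-backward extension for marginals) runs once per training sequence per pass. The L² factor is therefore paid repeatedly.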

  • Recent Work of Focus: Sequential Error Correcting Output Coding (SECOC), which builds on Error Correcting Output Coding (ECOC)
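As background, ECOC reduces a multiclass problem to several binary ones. Each class gets a binary codeword, one binary classifier is trained per code bit, and a prediction is decoded to the class whose codeword is nearest in Hamming distance. This is a minimal sketch. It assumes a random code matrix, and the names `random_code`, `binary_targets`, `hamming_decode` and `n_bits` are hypothetical illustrations, not the codes or implementation used in this work.

```python
import random
from typing import Dict, List, Sequence

def random_code(labels: Sequence[int], n_bits: int, seed: int = 0) -> Dict[int, List[int]]:
    """Give every label a distinct random codeword of n_bits bits (illustrative construction)."""
    if 2 ** n_bits < len(labels):
        raise ValueError("n_bits too small to give every label a distinct codeword")
    rng = random.Random(seed)
    used = set()
    code: Dict[int, List[int]] = {}
    for y in labels:
        word = tuple(rng.randint(0, 1) for _ in range(n_bits))
        while word in used:  # resample until the codeword is unused
            word = tuple(rng.randint(0, 1) for _ in range(n_bits))
        used.add(word)
        code[y] = list(word)
    return code

def binary_targets(y_train: Sequence[int], code: Dict[int, List[int]], bit: int) -> List[int]:
    """Training targets for the binary classifier of one code bit."""
    return [code[y][bit] for y in y_train]

def hamming_decode(bits: Sequence[int], code: Dict[int, List[int]]) -> int:
    """Label whose codeword is nearest in Hamming distance to the predicted bits."""
    return min(code, key=lambda y: sum(b != c for b, c in zip(bits, code[y])))
```

Each of the n binary learners only ever sees two classes. The cost of a large label set is thereby moved into the number of bits rather than into the size of each learning problem.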

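  • Extension to CRF model: for each code bit k, a binary CRF is trained over the observations x_{t-1}, x_t, x_{t+1} with binary labels y_{t-1}^k, y_t^k, y_{t+1}^k. Decoding combines the n binary predictions y_t^1 through y_t^n at each position t into a label.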
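In SECOC the ECOC idea is applied to whole sequences. Each training label sequence is rewritten as n binary label sequences (one per code bit), a binary CRF is trained on each, and at test time the n predicted bit sequences are decoded position by position. The sketch below shows only the relabeling and decoding steps. The binary CRFs are left out, and the hand-written 5-bit code `CODE` for a 3-label problem is a hypothetical example.

```python
from typing import Dict, List

# Hypothetical 5-bit code for labels A, B, C (pairwise Hamming distances 3, 3, 4).
CODE: Dict[str, List[int]] = {
    "A": [0, 0, 1, 1, 0],
    "B": [1, 0, 0, 1, 1],
    "C": [0, 1, 1, 0, 1],
}

def to_binary_sequences(labels: List[str], code: Dict[str, List[int]]) -> List[List[int]]:
    """Relabel one label sequence into n binary sequences: result[k][t] = code[labels[t]][k]."""
    n_bits = len(next(iter(code.values())))
    return [[code[y][k] for y in labels] for k in range(n_bits)]

def decode_sequence(bit_seqs: List[List[int]], code: Dict[str, List[int]]) -> List[str]:
    """Decode each position to the label whose codeword is nearest in Hamming distance."""
    out = []
    for t in range(len(bit_seqs[0])):
        bits = [seq[t] for seq in bit_seqs]  # the n binary predictions at position t
        out.append(min(code, key=lambda y: sum(b != c for b, c in zip(bits, code[y]))))
    return out
```

For example, `to_binary_sequences(["A", "B", "C", "A"], CODE)` gives five binary sequences of length 4. Flipping any single predicted bit still decodes to the original labels, because every pair of codewords differs in at least 3 bits.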

  • Representational Capacity of SECOC: Intuitively, training each binary CRF independently may not be able to capture rich transition structure. Counter-example to independent training:

    Our hypothesis: when the transition structure is critical, independent training will not perform as well.
    [Figure: transition diagram over labels 1, 2, 3]
    Y      = 1 2 3 1 2 3 1
    b1(Y)  = 1 0 0 1 0 0 1
    b1(Y)* = 1 0 1 0 1 0 0
    b2(Y)  = 0 1 0 0 1 0 0
    b3(Y)  = 0 0 1 0 0 1 0
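The counter-example can be checked by counting transitions. The sketch below is an illustration, not the authors' argument. It estimates first-order transition frequencies from the slide's sequence Y, and over the labels every transition is deterministic. In the bit sequence b1(Y), by contrast, a 0 is followed by 0 or 1 equally often, so a first-order binary chain for bit 1 alone cannot tell where the next 1 falls. Additionally conditioning on the previous value of bit 2 makes the bit-1 transitions deterministic again. This loosely mimics a cascade feature; `transition_table` is a hypothetical helper.

```python
from collections import Counter, defaultdict

Y = [1, 2, 3, 1, 2, 3, 1]                                  # label sequence from the slide
codes = {k: [int(y == k) for y in Y] for k in (1, 2, 3)}   # one-bit-per-label code b_k(Y)

def transition_table(prev_states, next_states):
    """Empirical P(next | prev) from aligned lists of previous and next states."""
    counts = defaultdict(Counter)
    for p, n in zip(prev_states, next_states):
        counts[p][n] += 1
    return {p: {n: c / sum(cs.values()) for n, c in cs.items()} for p, cs in counts.items()}

# Multiclass first-order chain: 1 -> 2 -> 3 -> 1, all with probability 1.
label_trans = transition_table(Y[:-1], Y[1:])

# Bit 1 on its own: after a 0, the next bit is 0 or 1 with frequency 0.5 each.
b1, b2 = codes[1], codes[2]
bit_trans = transition_table(b1[:-1], b1[1:])

# Condition on (previous b1, previous b2): every context now has a single successor.
cascade_trans = transition_table(list(zip(b1[:-1], b2[:-1])), b1[1:])
```

The lost information is exactly whether the previous 0 in b1(Y) came from label 2 or label 3. That is what the other bits, or the original multiclass transitions, encode.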

  • Our Method: Cascaded SECOC (c-SECOC). Cascade features help capture the transition structure. For problems where a transition model is critical, we hope to see cascade training outperform independent training. We also expect gains for problems where the observation model is more informative but the sliding window is small; with a large sliding window, the window features will dominate the effect of cascade training.

    [Figure: cascade diagram labeled "Previous binary predictions"]
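As a rough illustration of the two feature sources the slide contrasts, the sketch below builds string-valued features for one position t. It combines a sliding window of observations with cascade features taken from earlier bits' predictions. The exact cascade layout is an assumption, not taken from the slides. Here the binary CRF for bit k sees the predictions of bits 1..k-1 at positions t-h through t, with 0 used before the sequence start. The function names are hypothetical.

```python
from typing import List, Sequence

def window_features(x: Sequence[str], t: int, w: int) -> List[str]:
    """Observations in a sliding window of half-width w around position t."""
    feats = []
    for d in range(-w, w + 1):
        i = t + d
        tok = x[i] if 0 <= i < len(x) else "<pad>"  # pad outside the sequence
        feats.append(f"x[{d}]={tok}")
    return feats

def cascade_features(prev_bits: List[List[int]], t: int, h: int) -> List[str]:
    """Assumed cascade features: predictions of earlier bits at positions t-h..t.

    prev_bits[j][i] is the predicted value of bit j+1 at position i.
    """
    feats = []
    for j, bits in enumerate(prev_bits):
        for d in range(-h, 1):
            i = t + d
            val = bits[i] if i >= 0 else 0  # assumption: 0 before the sequence start
            feats.append(f"b{j + 1}[{d}]={val}")
    return feats
```

For instance, `cascade_features([[1, 0, 0], [0, 1, 0]], 2, 1)` gives `["b1[-1]=0", "b1[0]=0", "b2[-1]=1", "b2[0]=0"]`. Raising `w` adds more observation context, and raising the history length `h` adds more transition-like context from earlier bits. This mirrors the trade-off described on the slide.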

  • Experimental Results
    Base CRF training algorithms: Gradient Tree Boosting (GTB) and Voted Perceptron (VP).
    Methods for comparison: iid (non-sequential ECOC); i-SECOC (independent SECOC); c-SECOC(h) (cascaded SECOC with history length h); beam search.
    Synthetic data sets, generated by an HMM:

    Transition Data Set

    Both Data Set
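The slides state only that the synthetic data sets were generated by an HMM; they give no parameters. The sketch below is a generic HMM sampler, and every parameter value in it is hypothetical. In this reading, a "transition"-style setting would use near-deterministic transition rows with noisy emission rows, making the label dynamics more informative than the observations.

```python
import random
from typing import List, Tuple

def sample_hmm(init: List[float], trans: List[List[float]], emit: List[List[float]],
               length: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one (label sequence, observation sequence) pair from an HMM."""
    def draw(probs: List[float]) -> int:
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    y = [draw(init)]                     # initial hidden label
    for _ in range(length - 1):
        y.append(draw(trans[y[-1]]))     # next label depends only on the previous label
    x = [draw(emit[s]) for s in y]       # each observation depends only on its label
    return y, x

# Hypothetical 3-label example: strong cyclic transitions, weakly informative emissions.
rng = random.Random(0)
init = [1 / 3, 1 / 3, 1 / 3]
trans = [[0.05, 0.9, 0.05], [0.05, 0.05, 0.9], [0.9, 0.05, 0.05]]
emit = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
labels, obs = sample_hmm(init, trans, emit, length=10, rng=rng)
```

Passing an explicit `random.Random` keeps data generation reproducible across runs.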

  • Nettalk Data Set (134 labels)

  • Noun Phrase Chunking (NPC) (121 labels) and Synthetic Data Sets (40 labels)

  • Comparing to Beam Search
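Beam search is the alternative compared here for large label sets. The slides do not say how it was configured, so this is only a generic sketch. It uses the same hypothetical `emit[t][y]` / `trans[y_prev][y]` scoring convention as exact Viterbi decoding. Instead of sweeping all L×L label pairs per position, it keeps the `beam` best partial sequences and extends each by every label. Each step thus costs O(beam·L) rather than O(L²), at the price of possibly missing the best sequence.

```python
from typing import List, Tuple

def beam_search(emit: List[List[float]], trans: List[List[float]],
                beam: int) -> Tuple[List[int], float]:
    """Approximate argmax over label sequences, keeping `beam` partial hypotheses per step."""
    L = len(emit[0])
    # Each hypothesis is (score so far, label path so far).
    hyps = sorted(((emit[0][y], [y]) for y in range(L)), key=lambda h: h[0], reverse=True)[:beam]
    for t in range(1, len(emit)):
        candidates = [(score + trans[path[-1]][y] + emit[t][y], path + [y])
                      for score, path in hyps for y in range(L)]
        hyps = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam]  # prune
    best_score, best_path = hyps[0]
    return best_path, best_score
```

With `beam=1` this is greedy left-to-right decoding. Once `beam` reaches L**T, nothing is ever pruned and the search becomes exhaustive. A beam of exactly L is not guaranteed to be exact, since several kept hypotheses may end in the same label.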

  • Summary
    i-SECOC can perform poorly when explicitly capturing complex transition models is critical.
    c-SECOC can improve accuracy in such situations by using cascade features.
    The performance of c-SECOC can depend strongly on the base CRF algorithm; algorithms capable of capturing complex (non-linear) feature interactions are preferred.
    When less powerful base CRF learning algorithms are used, other approaches (e.g., beam search) can outperform c-SECOC.

  • Future Directions
    An efficient validation procedure for selecting the cascade history length.
    Incremental generation of code words.
    A wide comparison of methods for dealing with large label sets.

  • Acknowledgements: We thank John Langford for discussion of the counter-example to independent SECOC and Thomas Dietterich for his support. This work was supported by NSF grant IIS-0307592.
