Efficient Large-Scale Structured Learning

Steve Branson, Oscar Beijbom, Serge Belongie

CVPR 2013, Portland, Oregon

UC San Diego, UC San Diego, Caltech

Overview
• Structured prediction
• Learning from larger datasets

[Figure: structured prediction examples (deformable part models for object detection; cost-sensitive learning over a class taxonomy: Mammal → Primate (Gorilla, Orangutan), Hoofed Mammal (Odd-toed, Even-toed)) and large datasets (Tiny Images)]

Overview
• Available tools for structured learning not as refined as tools for binary classification
• 2 sources of speed improvement
  – Faster stochastic dual optimization algorithms
  – Application-specific importance sampling routine

Summary
• Usually, train time = 1-10 times test time
• Publicly available software package
  – Fast algorithms for multiclass SVMs, DPMs
  – API to adapt to new applications
  – Support datasets too large to fit in memory
  – Network interface for online & active learning

Summary

Cost-sensitive multiclass SVM
• 10-50 times faster than SVMstruct
• As fast as 1-vs-all binary SVM

Deformable part models
• 50-1000 times faster than
  – SVMstruct
  – Mining hard negatives
  – SGD-PEGASOS

Binary vs. Structured

Binary Learner: SVM, Boosting, Logistic Regression

Structured tasks: Object Detection, Pose Registration, Attribute Prediction, etc.

[Figure: Structured Dataset, e.g. $Y = (x, y, w, h)$ → binary labels $Y = -1$ / $Y = +1$ → Binary Learner → Structured Output]

• Pros: binary classifier is application independent
• Cons: what is lost in terms of:
  – Accuracy at convergence?
  – Computational efficiency?

Structured Prediction Loss $\Delta(g(X), Y_{gt})$ ≈ Binary Loss $\Delta_{01}$ ≈ Convex Upper Bound

Source of Computational Speed

Structured Prediction Loss $\Delta(g(X), Y_{gt})$ ≈ Binary Loss $\Delta_{01}$ ≈ Convex Upper Bound

Structured Prediction Loss $\Delta(g(X), Y)$ ≈ $\ell(X; w)$, a Convex Upper Bound on Structured Prediction Loss

Application-specific optimization algorithms that:
  – Converge to lower test error than binary solutions
  – Lower test error for all amounts of train time
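To make the "convex upper bound on structured prediction loss" concrete, here is a minimal sketch for the cost-sensitive multiclass case. The slides do not spell out the form of $\ell(X; w)$; the sketch assumes the standard margin-rescaled structured hinge loss. The class names follow the taxonomy figure, but the cost values (1 within a parent, 2 across parents) and the names `CLASSES`, `COST`, `task_loss`, and `structured_hinge` are illustrative assumptions.

```python
# Hypothetical taxonomy costs: confusing two primates (or two hoofed
# mammals) costs 1, confusing a primate with a hoofed mammal costs 2.
CLASSES = ["gorilla", "orangutan", "odd-toed", "even-toed"]
COST = [
    [0, 1, 2, 2],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]

def task_loss(scores, y_true, cost):
    """Structured prediction loss: cost of the highest-scoring label."""
    y_pred = max(range(len(scores)), key=lambda y: scores[y])
    return cost[y_true][y_pred]

def structured_hinge(scores, y_true, cost):
    """Margin-rescaled hinge: max_y [cost(y_true, y) + s(y)] - s(y_true).

    It is convex in the scores (a max of affine functions) and never
    smaller than task_loss, because the predicted label scores at least
    as high as the true label.
    """
    augmented = [cost[y_true][y] + s for y, s in enumerate(scores)]
    return max(augmented) - scores[y_true]

scores = [0.2, 1.0, 0.1, -0.5]   # per-class scores for one gorilla image
print(task_loss(scores, 0, COST), structured_hinge(scores, 0, COST))
```

For these scores the prediction is "orangutan" (task loss 1), while the hinge is about 1.9, so the bound holds with some slack.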

Structured SVM
• SVMs w/ structured output [Tsochantaridis et al. ICML’04]
• Max-margin MRF [Taskar et al. NIPS’03]

Binary SVM Solvers

Faster Linear SVM Solvers, each improving on the previous:
• SVMperf (Cutting Plane): quadratic to linear in trainset size
• PEGASOS (SGD): linear to independent in trainset size
• LIBLINEAR: faster on multiple passes, detects convergence, less sensitive to regularization/learning rate

SVMstruct: $O\left(\frac{Tn}{\lambda\epsilon}\right)$ (T: prediction time, n: trainset size)
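As a point of reference for the SGD family named above, the following is a minimal sketch of a PEGASOS-style update for a binary linear SVM. It is not taken from the slides: the function name `pegasos`, the default hyperparameters, and the toy data are assumptions, and the optional projection step of the published algorithm is omitted. Each step touches a single example, which is why the per-step cost does not depend on the trainset size.

```python
import random

def pegasos(X, y, lam=0.1, steps=2000, seed=0):
    """Stochastic subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)),
    with step size 1 / (lam * t). Labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))            # one random example per step
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        w = [(1.0 - eta * lam) * wj for wj in w]     # shrink (regularizer)
        if margin < 1.0:                             # hinge is active
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Toy usage on two well-separated clusters (illustrative data).
X = [[2.0, 1.5], [1.6, 2.4], [2.3, 2.2], [-2.0, -1.5], [-1.6, -2.4], [-2.3, -2.2]]
y = [1, 1, 1, -1, -1, -1]
w = pegasos(X, y)
print(w)
```

The step size and the number of steps must be chosen by hand here, which is the kind of sensitivity the LIBLINEAR bullet above contrasts against.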

Structured SVM Solvers

Faster Linear SVM Solvers: SVMperf (Cutting Plane), PEGASOS (SGD), LIBLINEAR

Applied to structured SVMs:
• SVMstruct (Cutting Plane)
• SGD [Ratliff et al. AIStats’07]
• Stochastic dual [Shalev-Shwartz et al. JMLR’13]

• Use faster stochastic dual algorithms
• Incorporate application-specific importance sampling routine
  – Reduce train times when prediction time T is large
  – Incorporate tricks people use for binary methods
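One way to picture an application-specific importance sampling routine is shown below for cost-sensitive multiclass classification, where all labels can be enumerated cheaply. This is only a sketch under assumptions: the function name `importance_sample`, its signature, and the rule "return the k labels with the highest loss-augmented score" are illustrative choices, not the routine from the talk. For deformable part models the candidate set would come from the detector instead.

```python
import heapq

def importance_sample(scores, y_true, cost, k=2):
    """Return the k labels with the largest loss-augmented score
    cost(y_true, y) + s(y), most violated first (hypothetical routine)."""
    augmented = [cost[y_true][y] + s for y, s in enumerate(scores)]
    return heapq.nlargest(k, range(len(scores)), key=lambda y: augmented[y])

# Illustrative taxonomy-style costs: 1 within a parent class, 2 across.
cost = [
    [0, 1, 2, 2],
    [1, 0, 2, 2],
    [2, 2, 0, 1],
    [2, 2, 1, 0],
]
samples = importance_sample([0.2, 1.0, 0.1, -0.5], y_true=0, cost=cost, k=2)
print(samples)
```

Returning several high-scoring candidates from one prediction pass lets the optimizer get more out of each expensive call, which matters most when prediction time T is large.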

Our Approach

Random Example → Importance Sample → Maximize Dual SSVM objective w.r.t. samples

For t = 1, … do
  1. Choose random training example $(X_i, Y_i)$
  2. Draw a set of samples with ImportanceSample()
  3. Approx. maximize Dual SSVM objective w.r.t. i
end

(Provably fast convergence for simple approx. solver)
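The loop above can be sketched for cost-sensitive multiclass classification as follows. This is not the authors' solver: as a stand-in it uses a single sample per step (the loss-augmented argmax) and a closed-form line search on the dual in the style of block-coordinate Frank-Wolfe. All names (`most_violated`, `train`, `make_toy_data`), the cost matrix, and the toy data are assumptions made for illustration.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def most_violated(W, x, y, cost):
    # Stand-in for ImportanceSample(): one sample, the loss-augmented argmax.
    return max(range(len(W)), key=lambda k: cost[y][k] + dot(W[k], x))

def train(X, Y, cost, lam=0.1, steps=2000, seed=0):
    rng = random.Random(seed)
    n, d, K = len(X), len(X[0]), len(cost)
    W = [[0.0] * d for _ in range(K)]                        # global weights
    Wi = [[[0.0] * d for _ in range(K)] for _ in range(n)]   # per-example parts, W = sum(Wi)
    li = [0.0] * n                                           # per-example dual loss terms
    dual_values = []
    for _ in range(steps):
        i = rng.randrange(n)                                 # 1. random training example
        x, y = X[i], Y[i]
        y_hat = most_violated(W, x, y, cost)                 # 2. sample a label
        Ws = [[0.0] * d for _ in range(K)]                   # dual corner for y_hat
        if y_hat != y:
            for j in range(d):
                Ws[y][j] += x[j] / (lam * n)
                Ws[y_hat][j] -= x[j] / (lam * n)
        ls = cost[y][y_hat] / n
        diff = [[Wi[i][k][j] - Ws[k][j] for j in range(d)] for k in range(K)]
        num = lam * sum(dot(diff[k], W[k]) for k in range(K)) - li[i] + ls
        den = lam * sum(dot(r, r) for r in diff)
        if den > 0:                                          # 3. exact line search on the dual
            gamma = min(1.0, max(0.0, num / den))
        else:
            gamma = 1.0 if num > 0 else 0.0
        for k in range(K):
            for j in range(d):
                new = Wi[i][k][j] - gamma * diff[k][j]
                W[k][j] += new - Wi[i][k][j]
                Wi[i][k][j] = new
        li[i] += gamma * (ls - li[i])
        dual_values.append(-0.5 * lam * sum(dot(r, r) for r in W) + sum(li))
    return W, dual_values

def predict(W, x):
    return max(range(len(W)), key=lambda k: dot(W[k], x))

def make_toy_data(per_class=10, seed=0):
    # Four clusters in 2-D plus a constant bias feature (illustrative data).
    rng = random.Random(seed)
    centers = [(3, 3), (3, -3), (-3, 3), (-3, -3)]
    X, Y = [], []
    for k, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            X.append([cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5), 1.0])
            Y.append(k)
    return X, Y

COST = [[0, 1, 2, 2], [1, 0, 2, 2], [2, 2, 0, 1], [2, 2, 1, 0]]
X, Y = make_toy_data()
W, dual_values = train(X, Y, COST)
accuracy = sum(predict(W, x) == y for x, y in zip(X, Y)) / len(X)
print(accuracy, dual_values[-1])
```

Because the step size comes from a line search on the dual objective, the recorded dual value never decreases (up to rounding) and no learning rate has to be tuned, which mirrors the advantages listed for dual methods earlier in the deck.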
