Group Norm for Learning Latent Structural SVMs




Group Norm for Learning Latent Structural SVMs

Daozheng Chen (UMD, College Park), Dhruv Batra (TTI-Chicago), Bill Freeman (MIT), Micah K. Johnson (GelSight, Inc.)

Overview

• Data with complete annotation is rarely available.

• Latent variable models capture the interaction between:
  o observed data (e.g., gradient histogram image features)
  o latent or hidden variables not observed in the training data (e.g., location of object parts).

• Parameter estimation involves a difficult non-convex optimization problem (EM, CCCP, self-paced learning).

• Our goal:
  • Estimate model parameters
  • Learn the complexity of the latent variable space.

• Our approach: a group norm (ℓ1/ℓ2) for regularization to estimate the parameters of a latent-variable model.

Latent Structural SVM

Prediction Rule:  f_w(x) = argmax_{(y, h) ∈ Y × H} w · Φ(x, y, h)

where Y is the label space, H is the latent space, and Φ(x, y, h) is the joint feature vector.
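The prediction rule can be sketched in code: jointly maximize the score w · Φ(x, y, h) over labels and latent states by exhaustive search. The feature map `phi`, the tiny label/latent spaces, and all dimensions below are illustrative stand-ins, not the poster's actual model.

```python
import numpy as np

def phi(x, y, h, n_h=3, n_labels=2):
    """Toy joint feature vector: x copied into the block for the pair (y, h)."""
    d = len(x)
    vec = np.zeros(n_labels * n_h * d)
    start = (y * n_h + h) * d
    vec[start:start + d] = x
    return vec

def predict(w, x, labels=(0, 1), latent_states=(0, 1, 2)):
    """f_w(x) = argmax over (y, h) of w . phi(x, y, h)."""
    best = max((np.dot(w, phi(x, y, h)), y, h)
               for y in labels for h in latent_states)
    return best[1], best[2]  # (predicted label, maximizing latent state)
```

For example, a weight vector that is nonzero only in the block for (y=1, h=2) makes that pair win the argmax.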

Inducing Group Norm

w is partitioned into P groups w_1, ..., w_P; each group corresponds to the parameters of one latent variable state. The induced group norm is

||w||_{1,2} = Σ_{p=1}^{P} ||w_p||_2
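A minimal numeric sketch of this ℓ1/ℓ2 group norm (the vector and group partition are made up for illustration):

```python
import numpy as np

def group_norm(w, groups):
    """||w||_{1,2}: sum over groups p of the l2 norm of w_p."""
    return sum(np.linalg.norm(w[idx]) for idx in groups)

w = np.array([3.0, 4.0, 0.0, 0.0, 1.0, 0.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
# per-group l2 norms: 5.0, 0.0, 1.0 -> group norm 6.0
print(group_norm(w, groups))
```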

Alternating Coordinate and Subgradient Descent

The learning objective is nonconvex. Rewrite it: minimize an upper bound of the objective instead. This upper bound is convex if the latent variables {h_i} are fixed, so we alternate between fixing {h_i} (a coordinate step) and minimizing the resulting convex bound over w by subgradient descent.
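The alternation can be sketched as follows. For brevity this toy uses a plain ℓ2 regularizer in the inner convex step rather than the group norm, a 0/1 label loss, and a made-up feature map and dataset; it is a sketch of the scheme, not the poster's implementation.

```python
import numpy as np

def phi(x, y, h, n_h=2, n_labels=2):
    """Toy joint feature vector: x copied into the block for (y, h)."""
    d = len(x)
    vec = np.zeros(n_labels * n_h * d)
    start = (y * n_h + h) * d
    vec[start:start + d] = x
    return vec

def predict(w, x, labels=(0, 1), latent_states=(0, 1)):
    return max((np.dot(w, phi(x, y, h)), y, h)
               for y in labels for h in latent_states)[1]

def train(data, dim=8, labels=(0, 1), latent_states=(0, 1),
          lam=0.1, outer=5, inner=50, lr=0.01):
    w = np.zeros(dim)
    for _ in range(outer):
        # Coordinate step (handles the nonconvex part):
        # impute each h_i given the current w.
        latents = [max(latent_states, key=lambda h: np.dot(w, phi(x, y, h)))
                   for x, y in data]
        # Subgradient descent on the now-convex upper bound.
        for _ in range(inner):
            g = lam * w
            for (x, y), h in zip(data, latents):
                # loss-augmented inference with 0/1 label loss
                s, yb, hb = max((np.dot(w, phi(x, yp, hp)) + float(yp != y), yp, hp)
                                for yp in labels for hp in latent_states)
                if s > np.dot(w, phi(x, y, h)):  # margin violated
                    g += phi(x, yb, hb) - phi(x, y, h)
            w -= lr * g
    return w
```

On a toy two-point dataset with disjoint features the learned w separates the classes.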

Digit recognition experiment (following the setup of Kumar et al., NIPS '10)
• MNIST data: binary classification on four difficult digit pairs: (1,7), (2,7), (3,8), (8,9)
• Training data: 5,851 - 6,742 examples; testing data: 974 - 1,135 examples
• Rotate digit images with angles from -60° to 60°
• PCA to form a 10-dimensional feature vector

Experiment

• Significantly higher accuracy than random sampling.

• 66% faster than the full model with no loss in accuracy!

Key Contribution (Digit Recognition)

[Figure: ℓ2 norm of the parameter vectors for different rotation angles (-60° to 60°, in 12° steps) over the 4 digit pairs. Only a few angles are selected, much fewer than the full set of 22; the remaining angles are not selected.]

[Pipeline: Images → Rotation (latent variable) → Feature vector]

• At the group level, the ℓ1/ℓ2 norm behaves like an ℓ1 norm and induces group sparsity, which is why we use it for regularization.
• Within each group, it behaves like an ℓ2 norm and does not promote sparsity.
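To illustrate the group-sparsity effect (an illustration added here, not from the poster): the proximal operator of the ℓ1/ℓ2 norm soft-thresholds each group's ℓ2 norm, so entire groups are zeroed at once while surviving groups keep their within-group direction.

```python
import numpy as np

def prox_group_norm(w, groups, thresh):
    """Block soft-thresholding: shrink each group's l2 norm by `thresh`,
    zeroing any group whose norm falls below the threshold."""
    out = np.zeros_like(w)
    for idx in groups:
        nrm = np.linalg.norm(w[idx])
        if nrm > thresh:
            out[idx] = (1.0 - thresh / nrm) * w[idx]
    return out

w = np.array([3.0, 4.0, 0.1, 0.2])
groups = [np.array([0, 1]), np.array([2, 3])]
# group [3, 4] has norm 5 -> shrunk to 0.8 * [3, 4] = [2.4, 3.2];
# group [0.1, 0.2] has norm ~0.22 < 1 -> zeroed entirely
print(prox_group_norm(w, groups, thresh=1.0))
```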

Learning objective:

min_w  λ Σ_p ||w_p||_2 + Σ_i [ max_{y,h} ( w · Φ(x_i, y, h) + Δ(y_i, y) ) - max_h w · Φ(x_i, y_i, h) ]

Subgradient: each nonzero group w_p contributes w_p / ||w_p||_2 to the subgradient of the regularizer.
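A sketch of the subgradient computation for the group-norm regularizer (group boundaries again made up): each nonzero group w_p contributes w_p / ||w_p||_2; at a zero group any vector with ℓ2 norm at most 1 is a valid subgradient, and we pick 0.

```python
import numpy as np

def group_norm_subgradient(w, groups):
    """A subgradient of sum_p ||w_p||_2 at w."""
    g = np.zeros_like(w)
    for idx in groups:
        nrm = np.linalg.norm(w[idx])
        if nrm > 0:
            g[idx] = w[idx] / nrm  # gradient of ||w_p||_2 where it is smooth
        # for a zero group, 0 is a valid subgradient choice
    return g

w = np.array([3.0, 4.0, 0.0, 0.0])
groups = [np.array([0, 1]), np.array([2, 3])]
print(group_norm_subgradient(w, groups))
```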

[Figure: Felzenszwalb et al.'s car model learned on the PASCAL VOC 2007 data. Each row is a component of the model (Component #1, Component #2); columns show root filters, part filters, and part displacements.]