bilinear classifiers for visual recognitionhpirsiav/papers/bilinear_nips... · 2013. 5. 9. ·...

1

Bilinear Classifiers for Visual Recognition

Computational Vision Lab.University of California Irvine

To be presented in NIPS 2009

Hamed Pirsiavash Deva Ramanan Charless Fowlkes

2

Introduction

ny

nf

nx

Reshape

(:)

Features Feature vector

A window on

input image

x =

...

nynxnf×1

� Linear model for visual recognition

1

3

Introduction

� Linear classifier

� Learn a template

� Apply it to all possible windows

Template

(Model)Feature

vector

wTx > 0

...

T

...

> 0

4

Introduction

ny

nf

nx

Features

Reshape

nf

X =

. . ....

nynx×nf

n f

n f

ns := nxny

. . ....

ns×nf

=

. . ....

ns×d

×[... . . .

]

d×nf

W Ws WTf

W =WsWTf

d ≤ min(ns, nf )

d

5

Introduction

� Motivation for bilinear models

� Reduced rank: less number of parameters

� Better generalization: reduced over-fitting

� Run-time efficiency

� Transfer learning

� Share a subset of parameters between different but related tasks

6

Outline

� Introduction

� Sliding window classifiers

� Bilinear model and its motivation

� Extension

� Related work

� Experiments� Pedestrian detection

� Human action classification

� Conclusion

7

Sliding window classifiers

� Extract some visual features from a spatio-temporal window� e.g., histogram of gradients (HOG) in Dalal and Triggs’

method

� Train a linear SVM using annotated positive and negative instances

� Detection: evaluate the model on all possible windows in space-scale domain

� Use convolution since the model is linear

wTx > 0

minw L(w) =1

2wTw +C

∑nmax(0, 1− ynw

Txn)

8

Sliding window classifiers

Sample image FeaturesSample template W

9

Bilinear model (Definition)

� Visual data are better modeled as matrices/tensors rather than vectors

� Why not use the matrix structure

10


ny

nf

nx

Features

Reshape

nf

nf

Tr(

. . ....

T

. . ....

) =

...

T

...

nf

Tr(WTX) = wTx

Feature XModel WTr(WTX) > 0 wTx > 0

X =

. . ....

ns×nf

ns := nxny

w x

11


� Bilinear model

wTx > 0

W

n f d

n f

Ws WTf

W =WsWTf

f(X) = Tr(WfWTs X)

d ≤ min(ns, nf )

. . ....

ns×nf

=

. . ....

ns×d

×[... . . .

]

d×nf

f(x) > 0

12


� Bilinear model

wTx > 0

W

n f d

n f

Ws WTf

W =WsWTf

f(X) = Tr(WfWTs X)

d ≤ min(ns, nf )

. . ....

ns×nf

=

. . ....

ns×d

×[... . . .

]

d×nf

Bilinear in Wf and Ws

f(x) > 0

f(X) = Tr(WTs XWf )

13


� Bilinear model

wTx > 0

W

n f d

n f

Ws WTf

W =WsWTf

f(X) = Tr(WfWTs X)

d ≤ min(ns, nf )

. . ....

ns×nf

=

. . ....

ns×d

×[... . . .

]

d×nf

Bilinear in Wf and Ws

f(x) > 0

f(X) = Tr(WTs XWf )

f(X) = Tr(WTX)Was Linear:

14

Bilinear model (Learning)

� Linear SVM for a given set of training pairs {Xn, yn}

Regularizer Constraints

Parameter

Objective function

minW L(W ) = 1

2Tr(WTW ) + C

∑nmax(0, 1− ynTr(W

TXn))

15


� Linear SVM for a given set of training pairs {Xn, yn}

Regularizer Constraints

Parameter

Objective functionW :=WsW

Tf

minW L(W ) = 1

2Tr(WTW ) + C


TXn))

16


� For Bilinear formulation

� Biconvex so solve by coordinate decent

� By fixing one set of parameters, it’s a typical SVM problem (with a change of basis)

� Use off-the-shelf SVM solver in the loop

minL(Ws,Wf ) =1

2Tr(WfW

Ts WsW

Tf )+C

∑nmax(0, 1−ynTr(WfW

Ts Xn))

17

Motivation

� Regularization� Similar to PCA, but not orthogonal and learned discriminatively and jointly with the template

� Run-time efficiency� convolutions instead of d nf

d ≤ min(ns, nf )

SubspaceReduced dimensional Template

W =WsWTf

18

Motivation

� Transfer learning

� Share the subspace between different problems

� e.g human detector and cat detector

� Optimize the summation of all objective functions

� Learn a good subspace using all data

Wf

W =WsWTf

19

Extension

� Multi-linear

� High-order tensors

� instead of just

� For 1D feature

� Separable filter for (Rank=1)

� Spatio-temporal templates

L(Ws,Wf)

L(Wx,Wy,Wt,Wf)

L(Wx,Wy)

L(Wx,Wy,Wf )

20

Related work (Rank restriction)

� Bilinear models � Often used in increasing the flexibility; however, we use

them to reduce the parameters.� Mostly used in generative models like density estimation and

we use in classification

� Soft Rank restriction� They used rather than in SVM to

regularize on rank� Convex, but not easy to solve� Decrease summation of eigen values instead of the number of non-zero eigen values (rank)

� Wolf et al (CVPR’07)� Used a formulation similar to ours with hard rank restriction� Showed results only for soft rank restriction� Used it only for one task (Didn’t consider multi-task learning)

Tr(W ) Tr(WTW )

21

Related work(Transfer learning)

� Dates back to at least Caruana’s work (1997)� We got inspired by their work on multi-task learning

� Worked on: Back-propagation nets and k-nearest neighbor

� Ando and Zhang’s work (2005)� Linear model

� All models share a component in low-dimensional subspace (transfer)

� Use the same number of parameters

22

Experiments: Pedestrian detection

� Baseline: Dalal and Triggs’ spatio-temporal classifier (ECCV’06)

� Linear SVM on features: (84 for each cell)

� Histogram of gradient (HOG)

� Histogram of optical flow

� Made sure that the spatiotemporal is better than the static one by modifying the learning method

8×8

23


� Dataset: INRIA motion and INRIA static � 3400 video frame pairs

� 3500 static images

� Typical values:�

� Evaluation� Average precision

� Initialize with PCA in feature space

� Ours is 10 times faster

ns = 14× 6, nf = 84, d = 5 or 10

24


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Recall

Pre

cis

ion

Prec/Rec curve

Bilinear AP = 0.795

Baseline AP = 0.765

PCA AP = 0.698

25


26


Link

27

Experiments: Human action classification 1

� 1 vs all action templates� Voting:

� A second SVM on confidence values

� Dataset:� UCF Sports Action (CVPR 2008)� They obtained 69.2%� We got 64.8% but

� More classes: 12 classes rather than 9� Smaller dataset: 150 videos rather than 200� Harder evaluation protocol: 2-fold vs. LOOCV� 87 training examples rather than 199 in their case

28

UCF action Results: PCA (0.444)

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

29

UCF action Results: Linear (0.518)

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

30

UCF action Results: Bilinear (0.648)

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

31

UCF action Results

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

Dive−Side

Golf−Back

Golf−Front

Golf−Side

Kick−Front

Kick−Side

Ride−Horse

Run−Side

Skate−Front

Swing−Bench

Swing−Side

Walk−Front

Div

e−Sid

e

Gol

f−Bac

k

Gol

f−Fro

nt

Gol

f−Sid

e

Kick−

Front

Kick−

Side

Rid

e−H

orse

Run

−Sid

e

Skate

−Fro

nt

Swin

g−Ben

ch

Swin

g−Sid

e

Wal

k−Fro

nt

PCA on features (0.444)Linear (0.518)

(Not always feasible)Bilinear (0.648)

32

UCF action Results

33

Experiments: Human action classification 2

� Transfer

� We used only two examples for each of 12 action classes

� Once trained independently

� Then trained jointly

� Shared the subspace

� Adjusted the C parameter for best result

34

Transfer

PCA to initialize

TrainTrain W 1

s Train W 2

sWms

Train W 1

f Train TrainW 2

fWmf

W =WsWTf

35

Transfer

PCA to initialize

TrainTrain W 1

s Train W 2

sWms

Train W 1

f Train TrainW 2

fWmfTrain Wf

W =WsWTf

minWf

∑m

i=1 L(Wf ,Wis)

36

Results: Transfer

0.3560.269Joint

bilinear (C=.1)

0.2890.222Independent bilinear (C=.01)

Coordinate decent

iteration 2

Coordinate decent

iteration 1

Average classification rate

37

Results: Transfer (for “walking”)

Iteration 1 Refined at Iteration 2

38

Conclusion

� Introduced multi-linear classifiers� Exploit natural matrix/tensor representation of spatio-

temporal data

� Trained with existing efficient linear solvers

� Shared subspace for different problems� A novel form of transfer learning

� Got better performance and about 10X speed up in run-time compared to the linear classifier.

� Easy to apply to most high dimensional features (instead of dimensionality reduction methods like PCA)

� Simple: ~ 20 lines of Matlab code

39

Thanks!

40

Bilinear model (Learning details)

� Linear SVM for a given set of training pairs

� For Bilinear formulation

� It is biconvex so solve by coordinate decent

{xn, yn}

minW L(W ) = 1

2Tr(WTW ) + C


TXn))

minL(Wf ,Ws) =1

2Tr(WfW

Ts WsW

Tf )+C

∑nmax(0, 1−ynTr(WfW

Ts Xn))

41

Bilinear model (Learning details)

� Each coordinate descent iteration:

freeze

where

then freeze

where

minW̃fL(W̃f ,Ws) =

1

2(W̃ T

f W̃f ) + C∑

nmax(0, 1− ynTr(W̃Tf X̃n))

minW̃sL(Wf , W̃s) =

1

2(W̃ T

s W̃s) + C∑

nmax(0, 1− ynTr(W̃Ts X̃n))

Ws

Wf

W̃f = A1

2

sWTf , X̃n = A

−1

2

s WTs Xn, As = WT

s Ws

W̃s = WsA1

2

f , X̃n = XnWfA−1

2

f , Af = WTf Wf

bilinear classifiers for visual recognitionhpirsiav/papers/bilinear_nips... · 2013. 5. 9. ·...

Documents