3D2PM - 3D Deformable Part Models · 2015. 9. 17.




3D2PM - 3D Deformable Part Models

Bojan Pepik1, Peter Gehler2, Michael Stark1,3, Bernt Schiele1

1 Max Planck Institute for Informatics, Saarbrücken, Germany
2 Max Planck Institute for Intelligent Systems, Tübingen, Germany
3 Stanford University

3D Deformable Part Models

Motivation

• Objects are inherently 3-dimensional
• 3D object representations provide:
‣ Compact and accurate approximation of the physical world
• Higher-level vision tasks can benefit from expressive object detectors:
‣ Angularly accurate viewpoints
‣ 3D parts consistent across views
• State-of-the-art detectors are modeled in 2D
• 3D object detectors lack detection performance

Contributions

➡ 3D version of the Deformable Part Model [2] capable of:
‣ Richer object hypotheses (beyond 2D bounding boxes)
‣ Robust matching to image evidence

➡ Richer object hypotheses:
‣ Viewpoint estimation of arbitrary granularity
‣ Consistent parts across views

➡ Favorable performance:
‣ State-of-the-art viewpoint estimation results
‣ Competitive 2D object localization results

➡ Jointly optimize for object localization and continuous viewpoint estimation

Experiments

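Three-dimensional displacement model

[Figure: the 3D part displacement distribution $\mathcal{N}(p_j \mid \mu_j, \Sigma_j)$ and its projections into views $1, \dots, V$: $\mathcal{N}(p_j^1 \mid \mu_j^1, \Sigma_j^1)$, $\mathcal{N}(p_j^2 \mid \mu_j^2, \Sigma_j^2)$, ..., $\mathcal{N}(p_j^V \mid \mu_j^V, \Sigma_j^V)$]

• Compact part displacement distribution in 3D
‣ 6 parameters per part compared to 4M in DPM [1,2]

$p(p_j \mid p_o) = \mathcal{N}(p_j \mid \mu_j, \Sigma_j)$

• In-view projected part distribution (orthographic projection)

$p(p_j^v \mid p_o^v) = \mathcal{N}(p_j^v \mid Q_j^v \mu_j,\; Q_j^v \Sigma_j (Q_j^v)^T)$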
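The projected distribution is the usual image of a Gaussian under a linear map: the mean becomes $Q\mu$ and the covariance becomes $Q \Sigma Q^T$. The sketch below illustrates this in plain Python. It assumes that the 6 parameters per part are a 3D mean plus a diagonal covariance, and it uses a hypothetical `orthographic_projection` helper that only rotates the camera about the vertical axis; neither detail is stated on the poster.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def orthographic_projection(azimuth_deg):
    """Hypothetical 2x3 projection Q: camera rotated about the vertical (y) axis."""
    t = math.radians(azimuth_deg)
    return [[math.cos(t), 0.0, -math.sin(t)],   # image x-axis
            [0.0, 1.0, 0.0]]                    # image y-axis (world y is up)

def project_gaussian(mu, sigma, q):
    """Return (Q mu, Q Sigma Q^T) for a 3D Gaussian N(mu, Sigma)."""
    mu_v = [row[0] for row in matmul(q, [[m] for m in mu])]
    sigma_v = matmul(matmul(q, sigma), transpose(q))
    return mu_v, sigma_v

# Assumed example values: 3 mean parameters + 3 diagonal variances = 6 parameters.
mu = [1.0, 0.5, 2.0]
sigma = [[0.04, 0.0, 0.0],
         [0.0, 0.01, 0.0],
         [0.0, 0.0, 0.09]]

mu_front, sigma_front = project_gaussian(mu, sigma, orthographic_projection(0.0))
mu_side, sigma_side = project_gaussian(mu, sigma, orthographic_projection(90.0))
```

In the frontal view the horizontal variance comes from the x-axis of the 3D model, while in the side view it comes from the z-axis, so one 3D distribution yields a different 2D displacement model for every viewpoint.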

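• Structured output SVM with margin rescaling
• Jointly address object localization and viewpoint estimation

$\Delta_{VP}(y, \bar{y}) = \dfrac{\angle(y^v, \bar{y}^v)}{180^\circ}$

$\Delta_{VOC}(y, \bar{y}) = 1 - \dfrac{y^b \cap \bar{y}^b}{y^b \cup \bar{y}^b}$

$\Delta(y, \bar{y}) = \alpha \, \Delta_{VOC}(y, \bar{y}) + (1 - \alpha) \, \Delta_{VP}(y, \bar{y})$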
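The combined loss can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes that the angle term is the smallest difference between two azimuths in degrees, that bounding boxes are `(x1, y1, x2, y2)` tuples, and that an output `y` is a dict with hypothetical keys `"box"` and `"viewpoint"`. The value `alpha=0.5` is a placeholder, since the poster does not give the weight.

```python
def viewpoint_loss(v, v_bar):
    """Delta_VP: smallest angle between two azimuths (degrees), scaled to [0, 1]."""
    diff = abs(v - v_bar) % 360.0
    return min(diff, 360.0 - diff) / 180.0

def overlap_loss(box, box_bar):
    """Delta_VOC: 1 - intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box[2], box_bar[2]) - max(box[0], box_bar[0]))
    ih = max(0.0, min(box[3], box_bar[3]) - max(box[1], box_bar[1]))
    inter = iw * ih
    area = (box[2] - box[0]) * (box[3] - box[1])
    area_bar = (box_bar[2] - box_bar[0]) * (box_bar[3] - box_bar[1])
    union = area + area_bar - inter
    return 1.0 - inter / union if union > 0 else 1.0

def combined_loss(y, y_bar, alpha=0.5):
    """Delta = alpha * Delta_VOC + (1 - alpha) * Delta_VP (alpha is assumed)."""
    return (alpha * overlap_loss(y["box"], y_bar["box"])
            + (1.0 - alpha) * viewpoint_loss(y["viewpoint"], y_bar["viewpoint"]))

truth = {"box": (0.0, 0.0, 2.0, 2.0), "viewpoint": 350.0}
guess = {"box": (1.0, 0.0, 3.0, 2.0), "viewpoint": 10.0}
loss = combined_loss(truth, guess)
```

Both terms lie in [0, 1], so the weight $\alpha$ trades localization accuracy directly against viewpoint accuracy.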

Fine-grained viewpoint estimation (angular viewpoint estimation)

[Figure: viewpoint estimation error. Two panels plot angular error against angular resolution (5, 10, 20, 22.5, 30, 45 degrees) and #components (8, 12, 16, 18, 36).]

Fig. 3. Graphical representation of viewpoint classification results: left, linear interpolation; right, exponential interpolation. The number of components is the number of bins.

AP / MAE         at 5°        at 10°       at 20°       at 22.5°     at 30°       at 45°
3D2PM-C Lin 8    96.8 / 11.1  95.6 / 12.0  96.1 / 11.7  96.9 / 12.6  97.3 / 13.1  97.8 / 12.6
3D2PM-C Lin 12   98.5 / 7.8   98.6 / 8.0   95.9 / 8.7   97.9 / 8.3   98.3 / 8.5   97.2 / 13.3
3D2PM-C Lin 16   97.8 / 6.9   98.2 / 7.1   96.8 / 8.5   97.5 / 7.4   97.6 / 8.3   95.0 / 13.8
3D2PM-C Lin 18   99.1 / 5.6   99.0 / 5.9   99.3 / 6.3   98.6 / 7.2   98.5 / 8.3   97.0 / 12.9
3D2PM-C Lin 36   99.2 / 4.7   98.8 / 5.1   98.5 / 6.1   98.2 / 7.1   98.0 / 8.0   97.5 / 12.2

Table 4. Fine-grained viewpoint estimation in MAE [26] (EPFL dataset).
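As a reading aid for the MAE columns, the sketch below computes a median angular error over a set of predictions. It assumes that MAE denotes the median angular error named on the poster's plot axes and that errors are measured as the smallest difference between two azimuths in degrees; the sample values are made up for illustration.

```python
import statistics

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)

def median_angular_error(preds, truths):
    """Median of the per-example angular errors."""
    return statistics.median(angular_error(p, t) for p, t in zip(preds, truths))

# Illustrative values only: errors are 10, 20 and 10 degrees.
mae = median_angular_error([10.0, 350.0, 180.0], [20.0, 10.0, 170.0])
```

The wrap-around at 360 degrees matters: a prediction of 350° against a ground truth of 10° is a 20° error, not 340°.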

Comparing linear vs. exponential appearance interpolation, both models achieve comparable performance on the finer viewpoint resolution levels (5°, 10°, 20°, 22.5°). However, for coarser viewpoint resolution and wider viewpoint spacing among the bins, exponential interpolation provides worse results than linear interpolation.

Summary. While the 3D2PM-C model is a simple continuous model, it achieves good performance even when starting from wide angular spacing among the model viewpoint (appearance) bins.

3.4 CAD vs. real image data

We want to explore the performance impact of using CAD data, as they have unrealistic appearance but perfect viewpoint annotation. Thus we train models on synthetic data only (synthetic), on real and synthetic data (mixed), and on real data where we use CAD data only for model initialization. We do experiments with 3D2PM-D and 3D2PM-C models with 8, 18 and 36 bins on the EPFL dataset.

Tab. 5 gives the result. Synthetic models with 36 bins achieve very good viewpoint classification performance of 5.9° and 7.9° for 3D2PM-C and 3D2PM-D, respectively, while also achieving good detection results of 96.3% and 95.2% AP. Adding real data (mixed) leads to improved results of 5.8° MAE for 3D2PM-D and 5.1° MAE for 3D2PM-C, while real data only, with 6.4° MAE and 5.6° MAE, performs worse, speaking in favor of using CAD data with accurate annotation.

3.5 Coarse-to-fine viewpoint inference

As we go towards arbitrarily fine viewpoint estimation with 3D2PM-C, we increase the number of model evaluations for a given position and viewpoint

[Plot axes: median angular error vs. angular resolution and #components]

➡ 3D2PM model outperforms 3D object models by a large margin

➡ Performance on par with 2D object models

Coarse-grained viewpoint estimation (viewpoint classification)

[Bar chart: median angular error (y-axis 0 to 30) for 8, 16 and 36 viewpoint bins. Methods: Glasner et al. [5], 3D2PM-D, 3D2PM-C Lin, 3D2PM-C Exp. Bar labels as extracted: 4.7, 7.5, 9.6, 4.7, 6.9, 11.1, 5.8, 7.2, 12.9, 24.8. Annotated difference: -20.1°.]

Object bounding box localization results

[Bar chart, Pascal VOC 2007: AP (y-axis 0 to 100) for Cars and Bicycles. Methods: Glasner et al. [5], 3D2PM, DPM-3D-Const [1], DPM-VOCVP [1]. Annotated gain: +29.2%.]

[Bar chart, 3D object classes: AP (y-axis 50 to 100) for Cars and Bicycles. Methods: Zia et al. [4], Glasner et al. [5], Liebelt et al. [3], 3D2PM, DPM-VOCVP [1]. Annotated gains: +4.7%, +24.3%.]

Ultra-wide baseline matching (quality of part correspondences)

➡ Unlike the state-of-the-art 3D models, 3D2PMs achieve detection performance on par with the 2D object models

[Bar chart: % correct F matrices (y-axis 0 to 80) at baselines of 45, 90, 135 and 180 degrees, plus the average (AVG). Methods: Zia et al. [4], DPM-3D-Const [1], 3D2PM.]

3D pose estimation results

• Deformable part model [2]
‣ Mixture of star CRFs in 2D space
‣ Parts are independent across components
‣ Discrete appearance model
‣ Optimized for 2D bounding box localization

• 3D2PM
‣ A star CRF in 3D space
‣ Parts are in 3D, $p_j = (x_j, y_j, z_j)$, and linked
‣ Continuous appearance model
‣ Optimized for object detection and viewpoint estimation

Continuous appearance model

[Figure: 3D parts with parameters $\beta_1, \dots, \beta_M$; root and part appearances at supporting view K-1 and supporting view K, with the interpolated root and part appearances between them.]

• Linear and exponential appearance interpolation scheme
• The model can synthesize infinitely many components without the need to learn them all
• Allows arbitrarily fine viewpoint estimation
• Faster inference w.r.t. brute-force
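A minimal sketch of interpolating an appearance filter between two supporting views follows. The linear scheme is the standard convex combination. The poster does not give the form of the exponential scheme, so the `"exponential"` branch below is an assumed weighting chosen only so that both supporting views are reproduced exactly at the endpoints; the `sharpness` parameter and the flat-list filter representation are likewise hypothetical.

```python
import math

def interpolation_weights(theta, theta_a, theta_b, scheme="linear", sharpness=4.0):
    """Weights (w_a, w_b) for supporting views at angles theta_a <= theta <= theta_b."""
    t = (theta - theta_a) / (theta_b - theta_a)   # 0 at view a, 1 at view b
    if scheme == "linear":
        return 1.0 - t, t
    # Assumed exponential falloff, normalized to hit 1 at t=0 and 0 at t=1.
    w_a = (math.exp(-sharpness * t) - math.exp(-sharpness)) / (1.0 - math.exp(-sharpness))
    return w_a, 1.0 - w_a

def interpolate_appearance(filter_a, filter_b, w_a, w_b):
    """Blend two appearance filters (flat lists of weights) element-wise."""
    return [w_a * a + w_b * b for a, b in zip(filter_a, filter_b)]

# Two supporting views 10 degrees apart, queried halfway between them.
w_a, w_b = interpolation_weights(15.0, 10.0, 20.0, scheme="linear")
blended = interpolate_appearance([1.0, 0.0, 2.0], [3.0, 4.0, 2.0], w_a, w_b)
```

Because any intermediate angle yields a valid filter, a model trained on a small number of supporting views can be evaluated at arbitrarily fine viewpoint steps.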

Model training

3D part inference

$\langle \beta, (I_1, y_1, h_1) \rangle + \dots + \langle \beta, (I_M, y_M, h_M) \rangle$

• Part inference in 3D per object instance
‣ Across all views of a given object instance in which the part is visible
‣ Projected parts are observed by viewpoint-specific model instantiations
• Discrete 3D part state space
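One way to read the 3D part inference step is as a search over a discrete set of candidate 3D positions, where each candidate is projected into every view in which the part is visible and the per-view scores are summed. The sketch below follows that reading. The data layout (a dict per view with hypothetical keys `"Q"`, `"visible"` and `"score_map"`) and the `missing_score` penalty are assumptions, not details from the poster.

```python
def project(q, pos):
    """Apply a 2x3 orthographic projection to a 3D point and round to a pixel."""
    u = sum(qi * pi for qi, pi in zip(q[0], pos))
    v = sum(qi * pi for qi, pi in zip(q[1], pos))
    return (round(u), round(v))

def infer_part_position(candidates, views, missing_score=-10.0):
    """Pick the candidate 3D position with the highest score summed over visible views."""
    best_pos, best_total = None, float("-inf")
    for pos in candidates:
        total = 0.0
        for view in views:
            if not view["visible"]:
                continue  # the part is occluded in this view, so it casts no vote
            pixel = project(view["Q"], pos)
            total += view["score_map"].get(pixel, missing_score)
        if total > best_total:
            best_pos, best_total = pos, total
    return best_pos, best_total

views = [
    {"Q": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "visible": True,    # frontal view
     "score_map": {(0, 0): 0.2, (1, 0): 0.9}},
    {"Q": [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]], "visible": True,    # side view
     "score_map": {(0, 0): 0.8, (1, 0): 0.1}},
    {"Q": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "visible": False,   # occluded view
     "score_map": {(0, 0): 5.0}},
]
candidates = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
best_pos, best_total = infer_part_position(candidates, views)
```

Here the candidate `(1.0, 0.0, 0.0)` wins because it scores well in both visible views, while the high score in the occluded view is ignored.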

3D OBJECT CLASSES DATASET (8 discrete viewpoint classes)

[Bar charts: MPPE (y-axis 50 to 100) for Cars and Bicycles. 3D object models panel: Liebelt et al. [3], Zia et al. [4], Glasner et al. [5], 3D2PM-D; annotated gains +10.2% and +20.5%. 2D object models panel: Payet et al. [6], Lopez et al. [7], DPM-3D-Const. [1], DPM-VOCVP [1].]

EPFL MULTI-VIEW CARS DATASET (angular viewpoint annotations)

➡ State-of-the-art multi-view models can output only a discrete set of viewpoints

➡ Our 3D2PM-C can provide fine viewpoint estimates

➡ 3D2PM-C state-of-the-art on angular viewpoint estimation

                              AP / MAE at 5°   #atomic operations
3D2PM-C b36 full inference    99.2 / 4.7       2.20 x 10^10
3D2PM-C b36 coarse to fine    99.0 / 7.0       0.48 x 10^10
3D2PM-C b12                   97.6 / 7.5       2.20 x 10^10
3D2PM-C b18                   98.0 / 6.9       2.20 x 10^10

• Coarse-to-fine inference

➡ x5 faster inference at minimal loss
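The poster does not describe the coarse-to-fine schedule, so the following is only a generic sketch of the idea: score a coarse grid of viewpoints first, then re-score a fine grid restricted to a window around the best coarse angle. The step sizes, the window width and the toy `score` function are all assumptions. For reference, the operation counts in the table give 2.20 / 0.48 ≈ 4.6, which the poster rounds to x5.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def full_search(score, fine_step=5):
    """Brute force: evaluate every viewpoint on the fine grid."""
    angles = range(0, 360, fine_step)
    return max(angles, key=score), len(angles)

def coarse_to_fine(score, coarse_step=30, fine_step=5):
    """Evaluate a coarse grid, then refine within one coarse step of the best angle."""
    coarse = range(0, 360, coarse_step)
    best = max(coarse, key=score)
    fine = [(best + off) % 360 for off in range(-coarse_step, coarse_step + 1, fine_step)]
    return max(fine, key=score), len(coarse) + len(fine)

# Toy stand-in for the detector score: peaks at a true viewpoint of 137 degrees.
score = lambda angle: -angular_distance(angle, 137.0)

full_best, full_evals = full_search(score)
c2f_best, c2f_evals = coarse_to_fine(score)
```

On this toy score both strategies return the same viewpoint, with 25 evaluations instead of 72. With a real detector the coarse pass can miss a narrow peak, which is consistent with the small accuracy loss reported in the table.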

+12.2%

➡ 3D2PM, due to the 3D model, gives accurate part correspondences

➡ 3D2PM is state-of-the-art on the ultra-wide baseline matching experiment

References

[1] B. Pepik, M. Stark, P. Gehler, B. Schiele. Teaching 3D Geometry to Deformable Part Models. CVPR'12.
[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. PAMI'10.
[3] J. Liebelt, C. Schmid. Multi-view Object Class Detection With A 3D Geometric Model. CVPR'10.
[4] M. Z. Zia, M. Stark, B. Schiele, K. Schindler. Revisiting 3D Geometric Models for Accurate Object Shape and Pose. 3DRR'11.
[5] D. Glasner, M. Galun, S. Alpert, R. Basri, G. Shakhnarovich. Viewpoint-Aware Object Detection and Pose Estimation. ICCV'11.
[6] N. Payet, S. Todorovic. From Contours to 3D Object Detection and Pose Estimation. ICCV'11.
[7] R. J. López-Sastre, T. Tuytelaars, S. Savarese. Deformable Part Models Revisited: A Performance Evaluation for Object Category Pose Estimation. CORP'11.

Acknowledgements: This work has been supported by the Max Planck Center for Visual Computing and Communication. We thank M. Zeeshan Zia for his help in conducting wide baseline matching experiments.

*Note the color-coded part correspondences

Monday, December 3, 12