3d2pm’&3d&deformable&part&models · 2015. 9. 17. ·...

3D2PM -‐ 3D Deformable Part ModelsBojan Pepik1 Peter Gehler2 Michael Stark1,3 Bernt Schiele1

1Max Planck Ins;tute for Informa;cs, Saarbrücken, Germany 2Max Planck Ins;tute for Intelligent Systems, Tübingen, Germany 3Stanford University

3D Deformable Part ModelsMo3va3on• Objects are inherently 3-‐dimensional• 3D object representa;ons provide:‣ Compact and accurate approxima;on of the

physical world• Higher level vision tasks can benefit from

expressive object detectors:‣ Angular accurate viewpoints‣ 3D parts consistent across views

• State-‐of-‐the-‐art detectors are modeled in 2D• 3D object detectors lack detec;on performance

Contribu3ons

➡ 3D version of the Deformable Part Model [2] capable of: ‣ Richer object hypotheses (beyond 2D BB)‣ Robust matching to image evidence

➡ Richer object hypotheses:‣ Viewpoint es;ma;on of arbitrary granularity‣ Consistent parts across views

➡ Favorable performance:‣ State-‐of-‐the-‐art viewpoint es;ma;on results‣ Compe;;ve 2D object localiza;on results

➡ Jointly op>mize for object localiza;on and con;nuous viewpoint es;ma;on

Experiments

Three dimensional displacement model

. . . N (pVj |µVj ,⌃

Vj )N (p1j |µ1

j ,⌃1j ) N (p2j |µ2

j ,⌃2j )

N (pj |µj ,⌃j). . .

• Compact part displacement distribu;on in 3D‣ 6 parameters per part compared to 4M in DPM [1,2]

• In view projected part distribu;on p(pvj

|pvo

) = N (pvj

|Qv

j

µj

, (Qv

j

)⌃j

(Qv

j

)T )

p(pj

|po

) = N (pj

|µj

,⌃j

)orthographic projec3on

• Structured output SVM with margin rescaling • Jointly address object localiza;on and viewpoint es;ma;on �V P (y, y) =

\(yv,yv)180��V OC(y, y) = 1� yb \ yb

yb [ yb

�(y, y) = ↵�V OC(y, y) + (1� ↵)�V P (y, y)

Fine-‐grained viewpoint es3ma3on (angular viewpoint es3ma3on)3D2PM – 3D Deformable Part Models 11

5102022.530

45 8121618

36

5

10

15

20

#components

Viewpoint Estimation error

angular resolution

angu

lar e

rror

5102022.5

30

45 812

1618

36

5

10

15

20

#components

Viewpoint Estimation error

angular resolution

angu

lar e

rror

Fig. 3. Graphical representation of viewpoint classification results, left - linear inter-polation, right - exponential. The number of components is the number bins.

AP/MAE at 5� at 10� at 20� at 22.5� at 30� at 45�

3D2PM-C Lin 8 96.8 / 11.1 95.6 / 12.0 96.1 / 11.7 96.9 / 12.6 97.3 / 13.1 97.8 / 12.63D2PM-C Lin 12 98.5 / 7.8 98.6 / 8.0 95.9 / 8.7 97.9 / 8.3 98.3 / 8.5 97.2 / 13.33D2PM-C Lin 16 97.8 / 6.9 98.2 / 7.1 96.8 / 8.5 97.5 / 7.4 97.6 / 8.3 95.0 / 13.83D2PM-C Lin 18 99.1 / 5.6 99.0 / 5.9 99.3 / 6.3 98.6 / 7.2 98.5 / 8.3 97.0 / 12.93D2PM-C Lin 36 99.2 / 4.7 98.8 / 5.1 98.5 / 6.1 98.2 / 7.1 98.0 / 8.0 97.5 / 12.2

Table 4. Fine-grained viewpoint estimation in MAE [26] (EPFL dataset).

Comparing linear vs. exponential appearance interpolation, both models achievecomparable performance on the finer viewpoint resolution levels (5�, 10�, 20�,22.5�). However, for coarser viewpoint resolution and wider viewpoint spacingamong the bins, exponential interpolation provides worse results than linear in-terpolation.

Summary . While the 3D2PM-Cmodel is a simple continuous model, it achievesgood performance even when starting from wide angular spacing among themodel viewpoint (appearance) bins.

3.4 CAD vs. real image data

We want to explore the performance impact of using CAD data, as they haveunrealistic appearance but perfect viewpoint annotation. Thus we train modelson synthetic data only (synthetic), on real & synthetic (mixed) and on real datawhere we use CAD data only for model initialization. We do experiments with3D2PM-Dand 3D2PM-Cmodels with 8, 18 and 36 bins on the EPFL dataset.

Tab. 5 gives the result. Synthetic models with 36 bins achieve very goodviewpoint classification performance of 5.9� and 7.9� for 3D2PM-Cand 3D2PM-D, respectively, while also achieving good detection results of 96.3% and 95.2%AP. Adding real data (mixed) leads to improved results of 5.8� MAE for 3D2PM-Dand 5.1� MAE for 3D2PM-C, while real data only with 6.4� MAE and 5.6�

performs worse, speaking in favor of using CAD data with accurate annotation.

3.5 Coarse-to-fine viewpoint inference

As we go towards arbitrarily fine viewpoint estimation with 3D2PM-C, we in-crease the number of model evaluations for a given position and viewpoint

Med

ian angular e

rror

angular resolu;on #components

➡ 3D2PM model outperforms 3D object models by large margin

➡ On par performance to 2D object models

Coarse-‐grained viewpoint es3ma3on (viewpoint classifica3on)

0

5

10

15

20

25

30

8 viewpoint bins 16 viewpoint bins 36 viewpoint bins

4.77.5

9.6

4.76.9

11.1

5.87.2

12.9

24.8

Med

ian angular e

rror

Glasner et al [5] 3d^2PM-‐D 3D^2PM-‐C Lin 3D^2PM-‐C Exp

-‐20.1°

Object bounding box localiza3on results

0

20

40

60

80

100

Cars Bicycles

AP

Glasner et al [5] 3D^2PMDPM-‐3D-‐Const [1] DPM-‐VOCVP [1]

Pascal VOC 2007

+29.2%

50

60

70

80

90

100

Cars Bicycles

Zia et al [4] Glasner et al [5] Liebelt et al [3]3D^2PM DPM-‐VOCVP [1]

3D object classes

+4.7%

+24.3%

Ultra-‐wide baseline matching (quality of part correspondences)

➡ Unlike the state-of-the-art 3D models, 3D2PMs achieve detection performance on par with the 2D object models

0

40

80

45 90 135 180 AVG% correct F m

atrices Zia et al [4] DPM-‐3D-‐Const [1] 3D^2PM

3D pose es3ma3on results

• Deformable part model [2]‣ Mixture of star CRFs in 2D space‣ Parts are independent across components‣ Discrete appearance model‣ Op;mized for 2D bounding box localiza;on

...... ...

• 3D2PM‣ A star CRF in 3D space‣ Parts are in 3D and linked ‣ Con;nuous appearance model‣ Op;mized for object detec;on and viewpoint es;ma;on

pj = (xj , yj , zj)

......

�M

�1

3D Parts

3D Parts

s

Root

appe

arance

Part

appe

arance

Suppor;ng view K -‐ 1 Suppor;ng view K Interpolated root and part appearances

• Linear and exponen;al appearance interpola;on scheme• The model can synthesize infinitely many components without the need to learn them all• Allows arbitrary fine viewpoint es;ma;on• Faster inference w.r.t. to brute-‐force

Con3nuous appearance model

Model training

3D part inference

h�, (I1, y1, h1)i h�, (IM , yM , hM )i+...+

• Part inference in 3D per object instance ‣ Across all views of a given object instance

in which the part is visible in‣ Projected parts are observed by viewpoint-‐

specific model instan;a;ons

Discrete 3D part state space

3D OBJECT CLASSES DATASET (8 discrete viewpoint classes)

50

60

70

80

90

100

Cars Bicycles

MPP

E

Liebelt et al [3] Zia et al [4]Glasner et al [5] 3D^2PM-‐D

+10.2%+20.5%

3D object modelsCars Bicycles

Payet et al [6] Lopez et al [7]DPM-‐3D-‐Const. [1] DPM-‐VOCVP [1]

2D object models

EPFL MULTI-‐VIEW CARS DATASET (angular viewpoint annota3ons) ➡ State-of-the-art multi-view models can output discrete set of viewpoints

➡ Our 3D2PM-C can provide fine viewpoint estimates

➡ 3D2PM-C state-of-the-art on angular viewpoint estimation

AP /MAE at 5° #atomic opera3ons

3D2PM-‐C b36 full inference 99.2 / 4.7 2.20 x 1010

3D2PM-‐C b36 coarse to fine 99.0 / 7.0 0.48 x 1010

3D2PM-‐C b12 97.6 / 7.5 2.20 x 1010

3D2PM-‐C b18 98.0 / 6.9 2.20 x 1010

x5

• Coarse to fine inference

➡ x5 faster inference at minimal loss

References

+12.2%➡ 3D2PM due to the 3D model gives

accurate part correspondences➡ 3D2PM state-of-the-art on ultra-wide

baseline matching experiment

[1] B. Pepik, M. Stark, P. Gehler, B. Schiele Teaching 3D Geometry to Deformable Part models CVPR’12[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan Object Detec3on with Discrimina3vely Trained Part Based Models PAMI’10[3] J. Liebelt, C. Schmid Mul3-‐view Object Class Detec3on With A 3D Geometric Model CVPR’10[4] M. Z. Zia, M. Stark, B. Schiele, and K. Schindler. Revisi3ng 3D Geometric Models for Accurate and Object Shape and Pose 3DRR’11[5] D. Glasner, M. Galun, S. Alpert, R. Basri, G. Shakhnarovich Viewpoint-‐Aware Object Detec3on and Pose Es3ma3on ICCV’11[6] N. Payet, S. Todorovic, From Contours to 3D Object Detec3on and Pose Es3ma3on ICCV’11[7] R. J. López-‐Sastre, T. Tuytelaars, S. Savarese. Deformable Part Models Revisited: A Performance Evalua3on for Object Category Pose Es3ma3on CORP’11

Acknowledgements: This work has been supported by the Max Planck Center for Visual Compu;ng and Communica;on. We thank M. Zeeshan Zia for his help in conduc;ng wide baseline matching experiments.

*Note the color coded part correspondences

Monday, December 3, 12

http://www.wisdom.weizmann.ac.il/~glasner/

http://www.wisdom.weizmann.ac.il/~glasner/

http://www.wisdom.weizmann.ac.il/~meirav/

http://www.wisdom.weizmann.ac.il/~meirav/

http://www.wisdom.weizmann.ac.il/~alpertsh/

http://www.wisdom.weizmann.ac.il/~alpertsh/

http://www.wisdom.weizmann.ac.il/~ronen

http://www.wisdom.weizmann.ac.il/~ronen

http://ttic.uchicago.edu/~gregory

http://ttic.uchicago.edu/~gregory

3d2pm’&3d&deformable&part&models · 2015. 9. 17. ·...

Documents