3D2PM - 3D Deformable Part Models
Bojan Pepik1  Peter Gehler2  Michael Stark1,3  Bernt Schiele1
1 Max Planck Institute for Informatics, Saarbrücken, Germany  2 Max Planck Institute for Intelligent Systems, Tübingen, Germany  3 Stanford University
Motivation
• Objects are inherently 3-dimensional
• 3D object representations provide:
  ‣ A compact and accurate approximation of the physical world
• Higher-level vision tasks can benefit from expressive object detectors:
  ‣ Angularly accurate viewpoints
  ‣ 3D parts consistent across views
• State-of-the-art detectors are modeled in 2D
• 3D object detectors lack detection performance
Contributions
➡ A 3D version of the Deformable Part Model [2] capable of:
  ‣ Richer object hypotheses (beyond the 2D bounding box)
  ‣ Robust matching to image evidence
➡ Richer object hypotheses:
  ‣ Viewpoint estimation of arbitrary granularity
  ‣ Consistent parts across views
➡ Favorable performance:
  ‣ State-of-the-art viewpoint estimation results
  ‣ Competitive 2D object localization results
➡ Joint optimization for object localization and continuous viewpoint estimation
Experiments
Three-dimensional displacement model

[Figure: a single 3D part displacement distribution $\mathcal{N}(p_j \mid \mu_j, \Sigma_j)$ and its per-view projections $\mathcal{N}(p_j^1 \mid \mu_j^1, \Sigma_j^1),\ \mathcal{N}(p_j^2 \mid \mu_j^2, \Sigma_j^2),\ \dots,\ \mathcal{N}(p_j^V \mid \mu_j^V, \Sigma_j^V)$]
• Compact part displacement distribution in 3D
  ‣ 6 parameters per part, compared to 4M in the DPM [1,2]
• In-view projected part distribution (orthographic projection; see the sketch below):

$p(p_j^v \mid p_o^v) = \mathcal{N}\!\left(p_j^v \mid Q_j^v \mu_j,\; Q_j^v \Sigma_j (Q_j^v)^T\right)$, where the underlying 3D distribution is $p(p_j \mid p_o) = \mathcal{N}(p_j \mid \mu_j, \Sigma_j)$.
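To make the projection step concrete, here is a minimal NumPy sketch, not the authors' code: it assumes a hypothetical 2x3 orthographic projection matrix built from the azimuth alone and shows how the 3D part Gaussian maps to a per-view 2D Gaussian via $Q_j^v \mu_j$ and $Q_j^v \Sigma_j (Q_j^v)^T$.

```python
import numpy as np

def orthographic_Q(azimuth_deg):
    """Hypothetical 2x3 orthographic projection: rotate the object frame about
    the vertical z-axis by `azimuth_deg`, then keep one horizontal image axis
    and the vertical axis (illustrative, not the paper's exact Q)."""
    a = np.deg2rad(azimuth_deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],   # horizontal image axis
                     [0.0,        0.0,       1.0]])  # vertical image axis

def project_part_gaussian(mu_j, Sigma_j, Q_vj):
    """Project the 3D part displacement Gaussian N(mu_j, Sigma_j) into view v:
    N(Q mu_j, Q Sigma_j Q^T)."""
    return Q_vj @ mu_j, Q_vj @ Sigma_j @ Q_vj.T

# Example: a part offset (0.2, 0.1, 0.5) from the root with diagonal covariance
# (3 mean + 3 variance values -> the 6 parameters per part mentioned above)
mu_j = np.array([0.2, 0.1, 0.5])
Sigma_j = np.diag([0.01, 0.02, 0.03])
mu_v, Sigma_v = project_part_gaussian(mu_j, Sigma_j, orthographic_Q(30.0))
print(mu_v, Sigma_v)
```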
• Structured output SVM with margin rescaling
• Jointly address object localization and viewpoint estimation (see the sketch below):

$\Delta_{VP}(y, \bar{y}) = \frac{\angle(y_v, \bar{y}_v)}{180^\circ}$

$\Delta_{VOC}(y, \bar{y}) = 1 - \frac{y_b \cap \bar{y}_b}{y_b \cup \bar{y}_b}$

$\Delta(y, \bar{y}) = \alpha\, \Delta_{VOC}(y, \bar{y}) + (1 - \alpha)\, \Delta_{VP}(y, \bar{y})$
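As a rough illustration of how such a combined loss could be computed, here is a small Python sketch; the dictionary keys, the box format and the default α = 0.5 are illustrative assumptions, and only the loss terms are shown, not the full structured SVM training.

```python
def delta_voc(box_a, box_b):
    """1 - intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return 1.0 - inter / union if union > 0 else 1.0

def delta_vp(theta_a, theta_b):
    """Angular difference in degrees, wrapped to [0, 180] and scaled to [0, 1]."""
    d = abs(theta_a - theta_b) % 360.0
    return min(d, 360.0 - d) / 180.0

def delta(y, y_bar, alpha=0.5):
    """Combined localization + viewpoint loss used for margin rescaling."""
    return alpha * delta_voc(y["box"], y_bar["box"]) + \
           (1.0 - alpha) * delta_vp(y["azimuth"], y_bar["azimuth"])

# Example: ground truth vs. a hypothesized detection
y_true = {"box": (10, 10, 110, 60), "azimuth": 30.0}
y_pred = {"box": (20, 15, 115, 70), "azimuth": 75.0}
print(delta(y_true, y_pred))
```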
Fine-grained viewpoint estimation (angular viewpoint estimation)
[Fig. 3. Graphical representation of viewpoint classification results (angular error vs. angular resolution / #components), left: linear interpolation, right: exponential. The number of components is the number of bins.]
AP / MAE          at 5°         at 10°        at 20°        at 22.5°      at 30°        at 45°
3D2PM-C Lin 8     96.8 / 11.1   95.6 / 12.0   96.1 / 11.7   96.9 / 12.6   97.3 / 13.1   97.8 / 12.6
3D2PM-C Lin 12    98.5 / 7.8    98.6 / 8.0    95.9 / 8.7    97.9 / 8.3    98.3 / 8.5    97.2 / 13.3
3D2PM-C Lin 16    97.8 / 6.9    98.2 / 7.1    96.8 / 8.5    97.5 / 7.4    97.6 / 8.3    95.0 / 13.8
3D2PM-C Lin 18    99.1 / 5.6    99.0 / 5.9    99.3 / 6.3    98.6 / 7.2    98.5 / 8.3    97.0 / 12.9
3D2PM-C Lin 36    99.2 / 4.7    98.8 / 5.1    98.5 / 6.1    98.2 / 7.1    98.0 / 8.0    97.5 / 12.2

Table 4. Fine-grained viewpoint estimation, AP / MAE [26] (EPFL dataset).
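For concreteness, a tiny helper showing how a mean or median angular error of this kind could be computed over a set of predictions, with the usual wrap-around at 360°; this is a generic illustration of the metric, not the authors' evaluation code.

```python
import numpy as np

def angular_errors(pred_deg, gt_deg):
    """Absolute azimuth errors in degrees, wrapped so that e.g. 359 vs. 1 -> 2."""
    d = np.abs(np.asarray(pred_deg) - np.asarray(gt_deg)) % 360.0
    return np.minimum(d, 360.0 - d)

pred = [12.0, 182.0, 355.0]
gt   = [10.0, 170.0,   5.0]
err = angular_errors(pred, gt)
print(err.mean(), np.median(err))   # mean / median angular error
```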
Comparing linear vs. exponential appearance interpolation, both models achieve comparable performance on the finer viewpoint resolution levels (5°, 10°, 20°, 22.5°). However, for coarser viewpoint resolution and wider viewpoint spacing among the bins, exponential interpolation provides worse results than linear interpolation.

Summary. While the 3D2PM-C model is a simple continuous model, it achieves good performance even when starting from wide angular spacing among the model viewpoint (appearance) bins.
3.4 CAD vs. real image data
We want to explore the performance impact of using CAD data, which has unrealistic appearance but perfect viewpoint annotation. We therefore train models on synthetic data only (synthetic), on real & synthetic data (mixed), and on real data where CAD data is used only for model initialization. We run experiments with 3D2PM-D and 3D2PM-C models with 8, 18 and 36 bins on the EPFL dataset.

Tab. 5 gives the results. Synthetic models with 36 bins achieve very good viewpoint classification performance of 5.9° and 7.9° for 3D2PM-C and 3D2PM-D, respectively, while also achieving good detection results of 96.3% and 95.2% AP. Adding real data (mixed) leads to improved results of 5.8° MAE for 3D2PM-D and 5.1° MAE for 3D2PM-C, while real data alone, at 6.4° and 5.6° MAE, performs worse, speaking in favor of using CAD data with accurate annotation.
3.5 Coarse-to-fine viewpoint inference

As we go towards arbitrarily fine viewpoint estimation with 3D2PM-C, we increase the number of model evaluations for a given position and viewpoint
[Plot: median angular error vs. angular resolution (#components)]
➡ The 3D2PM model outperforms 3D object models by a large margin
➡ On-par performance with 2D object models
Coarse-grained viewpoint estimation (viewpoint classification)
[Bar chart: median angular error at 8, 16 and 36 viewpoint bins for Glasner et al. [5], 3D2PM-D, 3D2PM-C Lin and 3D2PM-C Exp; the 3D2PM variants reach errors down to 4.7° vs. 24.8° for Glasner et al. [5], a 20.1° reduction]
Object bounding box localization results
[Bar chart, Pascal VOC 2007: AP for Cars and Bicycles; Glasner et al. [5], 3D2PM, DPM-3D-Const [1], DPM-VOC+VP [1]; improvement of +29.2%]

[Bar chart, 3D Object Classes: AP for Cars and Bicycles; Zia et al. [4], Glasner et al. [5], Liebelt et al. [3], 3D2PM, DPM-VOC+VP [1]; improvements of +4.7% and +24.3%]
➡ Unlike state-of-the-art 3D models, 3D2PM achieves detection performance on par with 2D object models

Ultra-wide baseline matching (quality of part correspondences)
[Bar chart: % correct F matrices at relative viewpoints of 45°, 90°, 135°, 180° and on average (AVG); Zia et al. [4], DPM-3D-Const [1], 3D2PM]
3D pose estimation results
• Deformable part model [2]
  ‣ Mixture of star CRFs in 2D space
  ‣ Parts are independent across components
  ‣ Discrete appearance model
  ‣ Optimized for 2D bounding box localization
• 3D2PM
  ‣ A star CRF in 3D space
  ‣ Parts are in 3D and linked
  ‣ Continuous appearance model
  ‣ Optimized for object detection and viewpoint estimation
  ‣ 3D part positions: $p_j = (x_j, y_j, z_j)$
[Model diagram: root appearance and part appearance filters, 3D parts shared across the mixture components, and the detection score s]
Continuous appearance model

[Figure: supporting view K-1, supporting view K, and the interpolated root and part appearances in between]

• Linear and exponential appearance interpolation schemes (see the sketch below)
• The model can synthesize infinitely many components without the need to learn them all
• Allows arbitrarily fine viewpoint estimation
• Faster inference compared to brute force
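A rough sketch of what such an interpolation could look like, assuming the appearance of a synthesized component is a blend of the two neighbouring supporting views; the weighting schemes (distance-proportional for "linear", exponentially decaying with an arbitrary bandwidth for "exponential") are illustrative and need not match the paper's exact formulation.

```python
import numpy as np

def blend_weights(theta, theta_prev, theta_next, mode="linear", sigma=15.0):
    """Blending weights for a query viewpoint `theta` between two supporting
    views. 'linear' weights views by angular proximity; 'exp' uses
    exponentially decaying weights (sigma is an illustrative bandwidth)."""
    d_prev = abs(theta - theta_prev)
    d_next = abs(theta_next - theta)
    if mode == "linear":
        w_prev, w_next = d_next, d_prev          # closer view gets more weight
    else:
        w_prev, w_next = np.exp(-d_prev / sigma), np.exp(-d_next / sigma)
    s = w_prev + w_next
    return w_prev / s, w_next / s

def synthesize_template(theta, theta_prev, f_prev, theta_next, f_next, mode="linear"):
    """Synthesize an appearance template (e.g. a HOG filter) for viewpoint
    `theta` from the templates of the two neighbouring supporting views."""
    w_prev, w_next = blend_weights(theta, theta_prev, theta_next, mode)
    return w_prev * f_prev + w_next * f_next

# Example: supporting views at 0 and 45 degrees, query at 30 degrees
f0, f45 = np.random.randn(6, 6, 31), np.random.randn(6, 6, 31)  # toy HOG filters
f30 = synthesize_template(30.0, 0.0, f0, 45.0, f45, mode="linear")
```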
Model training

$\langle \Phi, (I_1, y_1, h_1) \rangle + \dots + \langle \Phi, (I_M, y_M, h_M) \rangle$

3D part inference
• Part inference in 3D per object instance
  ‣ Across all views of a given object instance in which the part is visible
  ‣ Projected parts are observed by viewpoint-specific model instantiations (see the sketch below)
Discrete 3D part state space
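A compact sketch of this inference step, assuming the discrete 3D state space is given as a list of candidate positions and that `part_score` and `visible` are hypothetical callbacks standing in for the viewpoint-specific model instantiations:

```python
import numpy as np

def infer_part_3d(candidates, views, part_score, visible):
    """Choose one part's 3D position from a discrete candidate grid by summing,
    over all views of the object instance in which the part is visible, the
    score its projection receives under the corresponding view-specific model."""
    best_p, best_s = None, -np.inf
    for p3d in candidates:
        s = sum(part_score(v, p3d) for v in views if visible(v, p3d))
        if s > best_s:
            best_p, best_s = p3d, s
    return best_p, best_s
```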
3D OBJECT CLASSES DATASET (8 discrete viewpoint classes)
[Bar chart, 3D Object Classes dataset: MPPE for Cars and Bicycles; 3D object models: Liebelt et al. [3], Zia et al. [4], Glasner et al. [5], 3D2PM-D (improvements of +10.2% and +20.5%); 2D object models: Payet et al. [6], Lopez et al. [7], DPM-3D-Const [1], DPM-VOC+VP [1]]
EPFL MULTI-VIEW CARS DATASET (angular viewpoint annotations)
➡ State-of-the-art multi-view models can only output a discrete set of viewpoints
➡ Our 3D2PM-C can provide fine viewpoint estimates
➡ 3D2PM-C state-of-the-art on angular viewpoint estimation
• Coarse-to-fine inference

Model                           AP / MAE at 5°    # atomic operations
3D2PM-C b36, full inference     99.2 / 4.7        2.20 x 10^10
3D2PM-C b36, coarse-to-fine     99.0 / 7.0        0.48 x 10^10
3D2PM-C b12                     97.6 / 7.5        2.20 x 10^10
3D2PM-C b18                     98.0 / 6.9        2.20 x 10^10

➡ ~5x faster inference at minimal loss in accuracy
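As a rough sketch of the idea (not the authors' implementation): score a coarse grid of azimuth bins first, then re-evaluate only a fine grid around the best coarse bin; `score_fn` is a hypothetical callback returning the detection score of the component synthesized for a given azimuth.

```python
import numpy as np

def coarse_to_fine_viewpoint(score_fn, coarse_step=45.0, fine_step=5.0, window=45.0):
    """Two-stage viewpoint search: coarse sweep over azimuths, then a fine
    sweep restricted to a window around the best coarse hypothesis."""
    coarse = np.arange(0.0, 360.0, coarse_step)
    best_coarse = max(coarse, key=score_fn)
    fine = np.arange(best_coarse - window / 2.0,
                     best_coarse + window / 2.0 + 1e-9, fine_step) % 360.0
    best_fine = max(fine, key=score_fn)
    return best_fine, score_fn(best_fine)

# Example with a toy scoring function peaked at 130 degrees
toy_score = lambda az: -min(abs(az - 130.0), 360.0 - abs(az - 130.0))
print(coarse_to_fine_viewpoint(toy_score))
```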
➡ Thanks to its 3D model, 3D2PM gives accurate part correspondences
➡ 3D2PM is state-of-the-art on the ultra-wide baseline matching experiment (+12.2%)

References
[1] B. Pepik, M. Stark, P. Gehler, B. Schiele. Teaching 3D Geometry to Deformable Part Models. CVPR'12
[2] P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part-Based Models. PAMI'10
[3] J. Liebelt, C. Schmid. Multi-view Object Class Detection with a 3D Geometric Model. CVPR'10
[4] M. Z. Zia, M. Stark, B. Schiele, K. Schindler. Revisiting 3D Geometric Models for Accurate Object Shape and Pose. 3DRR'11
[5] D. Glasner, M. Galun, S. Alpert, R. Basri, G. Shakhnarovich. Viewpoint-Aware Object Detection and Pose Estimation. ICCV'11
[6] N. Payet, S. Todorovic. From Contours to 3D Object Detection and Pose Estimation. ICCV'11
[7] R. J. López-Sastre, T. Tuytelaars, S. Savarese. Deformable Part Models Revisited: A Performance Evaluation for Object Category Pose Estimation. CORP'11
Acknowledgements: This work has been supported by the Max Planck Center for Visual Computing and Communication. We thank M. Zeeshan Zia for his help in conducting the wide baseline matching experiments.
*Note the color coded part correspondences