


Analyzing Computer Vision Data - The Good, the Bad and the Ugly
IEEE 2017 Conference on Computer Vision and Pattern Recognition (CVPR 2017)
Oliver Zendel (AIT), Katrin Honauer (HCI), Markus Murschitz (AIT), Martin Humenberger (AIT), Gustavo Fernandez Domínguez (AIT)

Contact: [email protected]   Webpage: www.vitro-testing.com

Overview:
- What constitutes a good test dataset?
- Which content and aspects have to be included so that we really test? Are positive test cases (The Good) enough? How about using border cases (The Ugly) and negative test cases (The Bad)?
- We create checklists of hazards that should be in your test data so that it can really test the robustness of CV algorithms!

Findings:
- Hazard frames correlate with algorithm performance.
- Using only difficult frames still results in meaningful evaluations.
- Testing with only easy frames allows very little distinction between algorithms (a toy evaluation sketch follows the table below).

Hazard checklist entries (examples):

HID | Loc. / GW / Param. | Meaning | Entry
0 | L.s. / No / Number | No l.s. | Highly underexposed image; only black-level noise
26 | L.s. / Part of / Position | Part of l.s. is visible | L.s. in image is cut apart by the image border
142 | L.s. / Less / Beam prop. | Focused beam | Scene with a half-lit object leaving a large portion severely underexposed
183 | Medium / Less / Transparency | Medium is optically thicker than expected | Fog or haze in the image reduces visibility depending on distance from the observer
376 | Object / Less / Complexity | Object is less complex than expected | Scene contains a simple object without texture or self-shading (e.g. a grey opaque sphere)
476 | Object / No / Reflectance | Object has no reflectance | Well-lit scene contains a very dark object without texture or shading
482 | Object / As well as / Reflectance | Object has both shiny and dull surfaces | Object has a large glare spot on its surface that obscures the same areas in the left/right image
701 | Objects / Spatial aper. / Reflectance | Reflection creates a chaotic pattern | Large parts of the image show an irregular, distorted mirror-like reflection
904 | Obs. / Faster / Position | Observer moves too fast | Image has parts with clearly visible motion blur
1090 | Obs. / No / PSF | No optical blurring | Image contains strong aliasing artifacts

(Abbreviations: L.s. = light source, Obs. = observer, GW = guide word, PSF = point spread function.)
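The findings above can be made concrete with a small experiment. The following is a minimal sketch (not the authors' evaluation code): it compares per-frame errors of several stereo algorithms on hazard-tagged frames versus the remaining frames, then checks whether a ranking computed from difficult frames alone agrees with the ranking over all frames. Algorithm names, error values, and frame tags are hypothetical placeholders.

```python
# Minimal sketch: hazard frames vs. algorithm performance, and rank
# stability when evaluating on difficult frames only. All data below is
# synthetic and purely illustrative.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_frames = 200
# Hypothetical annotation: True where a frame contains at least one hazard.
has_hazard = rng.random(n_frames) < 0.3

# Hypothetical per-frame error (e.g. bad-pixel rate) for three algorithms;
# hazard frames are made harder on purpose to mimic the observed correlation.
base = {"algo_a": 0.05, "algo_b": 0.08, "algo_c": 0.12}
errors = {name: b + 0.10 * has_hazard + rng.normal(0.0, 0.01, n_frames)
          for name, b in base.items()}

for name, e in errors.items():
    print(f"{name}: hazard frames {e[has_hazard].mean():.3f}  "
          f"easy frames {e[~has_hazard].mean():.3f}")

# Mean error per algorithm, once over all frames and once over hazard
# frames only; a high Spearman rank correlation means the difficult-only
# evaluation still ranks the algorithms meaningfully.
names = sorted(errors)
mean_all = [errors[n].mean() for n in names]
mean_hard = [errors[n][has_hazard].mean() for n in names]
rho, _ = spearmanr(mean_all, mean_hard)
print("rank correlation (all frames vs. hazard-only):", rho)
```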

Dataset Survey:
- A thorough survey of stereo vision test datasets gives a good overview of the historic progress the community has made. We identified 28 datasets published between 2002 and 2017.
- A generic checklist, CV-HAZOP, was used as a starting point to generate a specialized hazard checklist for stereo vision (sketched below).
- Each hazard from the checklist was searched for in the well-known datasets Freiburg, HCI, KITTI, Middlebury and Sintel.
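The "Loc. / GW / Param." columns in the table above reflect the HAZOP recipe behind the checklist: candidate hazards are spanned by combining each location's parameters with generic guide words, and experts then interpret or discard each combination. Below is a minimal sketch of that combination step, using only locations, parameters, and guide words that appear in the table excerpt; the real CV-HAZOP catalogue covers many more of each.

```python
# Minimal sketch (assumed structure, not the official CV-HAZOP tooling) of
# spanning a HAZOP-style checklist for computer vision.
from itertools import product

# Locations with their parameters, and guide words, taken from the table
# excerpt above; illustrative subset only.
locations = {
    "Light source": ["Number", "Position", "Beam properties"],
    "Medium": ["Transparency"],
    "Object": ["Complexity", "Reflectance"],
    "Observer": ["Position", "PSF"],
}
guide_words = ["No", "Less", "Part of", "As well as", "Faster"]

# Every (location, guide word, parameter) triple is a candidate entry that
# domain experts then turn into a concrete hazard meaning, or discard.
candidates = [(loc, gw, param)
              for loc, params in locations.items()
              for gw, param in product(guide_words, params)]
print(f"{len(candidates)} raw combinations to review; first three:")
for triple in candidates[:3]:
    print(" / ".join(triple))
```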

Missing tests:
- We identified over 60 hazard entries which were not found in any of the existing datasets and thus represent untested behavior (see the sketch below). Of these, we handpicked 16 border cases and 16 negative test cases which provide good input for people planning new challenging datasets.

[Figure: Examples of identified hazards in the test datasets. Images taken from KITTI 2015 [15], KITTI 2012 [7], Sintel [2], Middlebury [22], HCI [10] and Freiburg CNN [17].]
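At its core, the missing-tests count is a coverage computation over the checklist. A minimal sketch follows, assuming a simple representation in which each dataset is annotated with the hazard IDs (HIDs) found in at least one of its frames; the checklist excerpt and annotations below are tiny illustrative stubs, not the real survey data.

```python
# Stub checklist (HID -> short description) and stub survey annotations
# (dataset -> set of HIDs found in at least one frame). Real inputs would be
# the full specialized checklist and the per-dataset survey results.
checklist = {
    0: "No light source: highly underexposed image",
    183: "Fog or haze reduces visibility with distance",
    310: "Identical projected views on the same epipolar line",
    933: "Rolling shutter artifacts",
}
found_in = {
    "KITTI": {183},
    "Middlebury": {0},
    "Sintel": {0, 183},
}

# Hazards covered by no dataset are the "missing tests".
covered = set().union(*found_in.values())
missing = sorted(set(checklist) - covered)
for hid in missing:
    print(f"HID {hid} is untested: {checklist[hid]}")
```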

Border cases (the Ugly):
- 310: Two different-sized objects are positioned on the same epipolar line but their projected views are identical.
- 694: Scene contains a clear reflection of the observer together with potential matching parts on the same epipolar line.
- 758: Scene contains pronounced refraction rings (e.g. oil slick).

Negative test cases (the Bad):
- 245: Cloud of visible particles (e.g. pollen, small leaves) in the air is obscuring the whole scene.
- 916: One camera lens contains dust/dried mud that creates a partially defocused area in the image.
- 933: Images contain rolling shutter artifacts.

[Figure: Evaluation results for Sintel; intensity images were taken from the original test dataset [2].]
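For teams planning the new challenging datasets mentioned under Missing tests, some of the untested hazards can be synthesized rather than captured. As one example, here is a minimal sketch of simulating checklist entry 183 (fog or haze reducing visibility with distance) using the standard single-scattering fog model; the image, depth map, and coefficient values are assumptions for illustration, not part of the original work.

```python
# Minimal sketch: synthesize hazard 183 by applying homogeneous fog,
# I_fog = I * exp(-beta * d) + A * (1 - exp(-beta * d)),
# which needs a per-pixel depth map d.
import numpy as np

def add_fog(image, depth, beta=0.05, airlight=0.9):
    """Attenuate an image with homogeneous fog.

    image:    float array in [0, 1], shape (H, W) or (H, W, 3)
    depth:    per-pixel distance from the camera in meters, shape (H, W)
    beta:     extinction coefficient (larger = denser fog)
    airlight: intensity of the scattered ambient light
    """
    transmission = np.exp(-beta * depth)
    if image.ndim == 3:  # broadcast over color channels
        transmission = transmission[..., None]
    return image * transmission + airlight * (1.0 - transmission)

# Tiny synthetic example: a gradient image whose depth grows left to right,
# so visibility drops with distance exactly as hazard 183 describes.
img = np.tile(np.linspace(0.2, 0.8, 64), (48, 1))
depth = np.tile(np.linspace(1.0, 80.0, 64), (48, 1))
foggy = add_fog(img, depth)
print("near-pixel change:", abs(foggy[0, 0] - img[0, 0]))
print("far-pixel change: ", abs(foggy[0, -1] - img[0, -1]))
```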

All entries and more results on: www.vitro-testing.com

[Figure: The 28 surveyed datasets. Images taken from the original test datasets (left to right, top to bottom): Middlebury [23], Middlebury [24], Middlebury [9], EISATS S1 [27], EISATS S2 [28], Neilson [16], EISATS S6 [20], New College [26], Pittsburgh [3], EVD [4], Ford Campus [18], HCI-Robust [11], KITTI 2012 [7], Leuven [12], Tsukuba [14], HCI-Synth [8], Stixel [19], Daimler Urban [25], Malaga Urban [1], Middlebury [22], Cityscapes [5], KITTI 2015 [15], MPI Sintel [2], Freiburg CNN [17], HCI Training [10], SYNTHIA [21], Virtual KITTI [6], Oxford RobotCar [13].]

References:
[1] J.-L. Blanco, F.-A. Moreno, and J. González-Jiménez. The Málaga urban dataset: High-rate stereo and lidars in a realistic urban scenario. International Journal of Robotics Research, 33(2):207–214, 2014.
[2] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A naturalistic open source movie for optical flow evaluation. In European Conference on Computer Vision (ECCV), Part IV, LNCS 7577, pages 611–625. Springer-Verlag, Oct. 2012.
[3] M. Chen, K. Dhingra, W. Wu, L. Yang, R. Sukthankar, and J. Yang. PFID: Pittsburgh fast-food image dataset. In Proceedings of the International Conference on Image Processing, 2009.
[4] K. Cordes, B. Rosenhahn, and J. Ostermann. Increasing the accuracy of feature evaluation benchmarks using differential evolution. In IEEE Symposium on Differential Evolution (SDE), 2011.
[5] M. Cordts, M. Omran, S. Ramos, T. Scharwächter, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele. The Cityscapes dataset. In CVPR Workshop on The Future of Datasets in Vision, 2015.
[6] A. Gaidon, Q. Wang, Y. Cabon, and E. Vig. Virtual worlds as proxy for multi-object tracking analysis. In CVPR, 2016.
[7] A. Geiger, P. Lenz, and R. Urtasun. Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR, 2012.
[8] R. Haeusler and D. Kondermann. Synthesizing real world stereo challenges. In German Conference on Pattern Recognition, pages 164–173. Springer, 2013.
[9] H. Hirschmüller and D. Scharstein. Evaluation of cost functions for stereo matching. In CVPR, pages 1–8. IEEE, 2007.
[10] D. Kondermann, R. Nair, K. Honauer, K. Krispin, J. Andrulis, A. Brock, B. Gussefeld, M. Rahimimoghaddam, S. Hofmann, C. Brenner, et al. The HCI benchmark suite: Stereo and flow ground truth with uncertainties for urban autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 19–28, 2016.
[11] D. Kondermann, A. Sellent, B. Jähne, and J. Wingbermühle. Robust Vision Challenge, 2012.
[12] L. Ladický, P. Sturgess, C. Russell, S. Sengupta, Y. Bastanlar, W. Clocksin, and P. H. Torr. Joint optimisation for object class segmentation and dense stereo reconstruction. International Journal of Computer Vision, 2012.
[13] W. Maddern, G. Pascoe, C. Linegar, and P. Newman. 1 Year, 1000km: The Oxford RobotCar dataset. The International Journal of Robotics Research (IJRR), to appear.
[14] M. Martorell, A. Maki, S. Martull, Y. Ohkawa, and K. Fukui. Towards a simulation driven stereo vision system. In ICPR, pages 1038–1042, 2012.
[15] M. Menze and A. Geiger. Object scene flow for autonomous vehicles. In Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[16] D. Neilson and Y.-H. Yang. Evaluation of constructable match cost measures for stereo correspondence using cluster ranking. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[17] N. Mayer, E. Ilg, P. Häusser, P. Fischer, D. Cremers, A. Dosovitskiy, and T. Brox. A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXiv:1512.02134.
[18] G. Pandey, J. R. McBride, and R. M. Eustice. Ford campus vision and lidar data set. International Journal of Robotics Research, 30(13):1543–1552, 2011.
[19] D. Pfeiffer, S. K. Gehrig, and N. Schneider. Exploiting the power of stereo confidences. In CVPR, 2013.
[20] R. Reulke, A. Luber, M. Haberjahn, and B. Piltz. Validierung von mobilen Stereokamerasystemen in einem 3D-Testfeld. (EPFL-CONF-155479), 2009.
[21] G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. Lopez. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In CVPR, 2016.
[22] D. Scharstein, H. Hirschmüller, Y. Kitajima, G. Krathwohl, N. Nesic, X. Wang, and P. Westling. High-resolution stereo datasets with subpixel-accurate ground truth. In Pattern Recognition, pages 31–42. Springer, 2014.
[23] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. IJCV, 47(1/2/3):7–42, 2002.
[24] D. Scharstein and R. Szeliski. High-accuracy stereo depth maps using structured light. In CVPR, pages 195–202. IEEE Computer Society, 2003.
[25] T. Scharwächter, M. Enzweiler, S. Roth, and U. Franke. Stixmantics: A medium-level model for real-time semantic scene understanding. In ECCV, 2014.
[26] M. Smith, I. Baldwin, W. Churchill, R. Paul, and P. Newman. The New College vision and laser data set. The International Journal of Robotics Research, 28(5):595–599, May 2009.
[27] T. Vaudrey, C. Rabe, R. Klette, and J. Milburn. Differences between stereo and motion behavior on synthetic and real-world stereo sequences. In IVCNZ, pages 1–6, 2008.
[28] A. Wedel, C. Rabe, T. Vaudrey, T. Brox, U. Franke, and D. Cremers. Efficient dense scene flow from sparse or dense stereo data. In 10th European Conference on Computer Vision (ECCV '08), pages 739–751, Berlin, Heidelberg, 2008. Springer-Verlag.