Semantic Parsing for Priming Object Detection in RGB-D Scenes
Cesar Cadena and Jana Kosecka
3rd Workshop on Semantic Perception, Mapping and Exploration (SPME), Karlsruhe, Germany, 2013
Motivation
5/5/2013
Long-term robotic operation
Semantic information about the surrounding environment is important for high-level robotic tasks.
It is difficult to know a priori all the possible instances or classes of objects that the robot will find in real operation.
Even if we know many of them, it is unreasonable and expensive to run all the specific object detectors at the same time.
However:
There are things we can assume to be present (almost) always.
Generic “detachable” objects also share some characteristics.
Urban: Ground, Buildings, Sky, Objects
Indoors: Ground, Walls, Ceiling, Objects
Today: Ground – Structure – Furniture – Props
Our Problem
Efficiently segment RGB+3D scenes into these general classes, to be used as a prior for specific task detectors.
NYU Depth v2
1449 labeled frames. 26 scene classes. Labeling spans 894 different classes.
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, in ECCV, 2012.
Thanks to N. Silberman for providing the mapping from 894 to 4 classes.
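The 894-to-4 mapping is essentially a lookup table from fine-grained labels to the general classes. The slides do not reproduce the table, so the sketch below uses a handful of illustrative rows; `FINE_TO_COARSE` and `to_coarse` are hypothetical names, and the entries are assumptions rather than the actual mapping.

```python
# Hypothetical excerpt of a fine-to-coarse label mapping. The real table
# (894 entries) is not reproduced in the slides; these rows are illustrative.
COARSE_CLASSES = ("Ground", "Structure", "Furniture", "Props")

FINE_TO_COARSE = {
    "floor": "Ground",
    "wall": "Structure",
    "ceiling": "Structure",
    "table": "Furniture",
    "bed": "Furniture",
    "cup": "Props",
    "book": "Props",
}


def to_coarse(fine_labels, mapping=FINE_TO_COARSE, default=None):
    """Map a sequence of fine-grained labels to the 4 general classes.

    Labels missing from the mapping are returned as `default`.
    """
    return [mapping.get(name, default) for name in fine_labels]
```

For example, `to_coarse(["floor", "cup"])` would give the Ground and Props classes under these illustrative rows.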
The System
Semantic Segmentation
MAP, Marginals
Different approaches
N. Silberman et al., ECCV 2012
C. Couprie et al., CoRR 2013
X. Ren et al., CVPR 2012
D. Munoz et al., ECCV 2010
I. Endres and D. Hoiem, ECCV 2010
They have at least one of:
Expensive over-segmentation
Expensive features
Expensive inference
Our approach
Semantic Segmentation: Conditional Random Fields
Preprocessing, Graph Structure, Potentials, Inference
MAP, Marginals
Outline
Conditional Random Fields
(1) Preprocessing
(2) Graph Structure
(3) Potentials
(4) Inference
(5) Results
(6) Conclusions
Preprocessing: Over-segmentation
SLIC superpixels
R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, SLIC superpixels compared to state-of-the-art superpixel methods, PAMI, 2012.
Graph Structure
Classical choice on images
Graph Structure: Our choice
Minimum Spanning Tree over 3D
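A minimal sketch of how such a tree could be built, using Kruskal's algorithm. It assumes that each superpixel is summarized by a 3D centroid, that edge weights are Euclidean distances between centroids, and that every pair of superpixels is a candidate edge; the slides do not spell out these details, and `minimum_spanning_tree` is a hypothetical helper.

```python
import math


def minimum_spanning_tree(centroids):
    """Return MST edges (i, j) over 3D points using Kruskal's algorithm.

    Assumption: edge weights are Euclidean distances between superpixel
    3D centroids, and every pair of superpixels is a candidate edge.
    """
    n = len(centroids)
    candidates = sorted(
        (math.dist(centroids[i], centroids[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = []
    for _, i, j in candidates:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
            if len(edges) == n - 1:
                break
    return edges
```

With two nearby points at depth 1 and two nearby points at depth 3, the tree links each close pair first and then joins the two groups with a single edge, so the result has no cycles.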
Potentials: Pairwise CRFs
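For reference, a pairwise CRF over the superpixel labels can be written in its standard textbook form. The notation is an assumption and not necessarily the one used in the talk: x are the labels, z the observations, N the nodes, E the graph edges, φ the unary potentials and ψ the pairwise potentials.

```latex
P(\mathbf{x} \mid \mathbf{z}) =
  \frac{1}{Z(\mathbf{z})}
  \exp\Big(
    -\sum_{i \in \mathcal{N}} \phi_i(x_i, \mathbf{z})
    -\sum_{(i,j) \in \mathcal{E}} \psi_{ij}(x_i, x_j, \mathbf{z})
  \Big)
```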
Potentials: unary
(frequency of label j in a k-NN query) / (frequency of label j in the database)
J. Tighe and S. Lazebnik, Superparsing: Scalable nonparametric image parsing with superpixels, ECCV 2010.
The database is a kd-tree of features from training data.
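The ratio above can be sketched as follows. This is an illustration under stated assumptions: a brute-force nearest-neighbour search stands in for the kd-tree, relative frequencies are used, and the raw ratio is returned without any further normalization or logarithm. `knn_label_ratios` is a hypothetical name.

```python
import math
from collections import Counter


def knn_label_ratios(query, database, labels, k, num_classes):
    """Per-class ratio: (label frequency among the k nearest neighbours)
    divided by (label frequency in the whole database).

    Assumptions: brute-force search stands in for the kd-tree, and the
    ratio is returned as-is (any further normalisation or log is omitted).
    """
    # Indices of the k training features closest to the query.
    order = sorted(range(len(database)),
                   key=lambda i: math.dist(query, database[i]))
    nearest = Counter(labels[i] for i in order[:k])
    overall = Counter(labels)
    ratios = []
    for j in range(num_classes):
        freq_query = nearest[j] / k
        freq_db = overall[j] / len(labels)
        ratios.append(freq_query / freq_db if freq_db > 0 else 0.0)
    return ratios
```

A ratio above 1 means label j is over-represented among the query's neighbours relative to the training set as a whole.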
Features (12D)
From image:
  mean of Lab color space (3D)
  vertical pixel location (1D)
  entropy from vanishing points (1D)
From 3D:
  height and depth (2D)
  mean and std of differences on depth (2D)
  local planarity (1D)
  neighboring planarity (1D)
  vertical orientation (1D)
Potentials: pairwise
Lab color
Inference
We use belief propagation:
Exact results in MAP/marginals
Efficient computation, in
Thanks to our graph structure choice!
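Because the graph is a tree, two message-passing sweeps (leaves to root, then root to leaves) give exact marginals. The sketch below implements sum-product belief propagation in plain Python. It assumes a single symmetric pairwise table shared by all edges and non-negative scores rather than energies; `tree_marginals` is a hypothetical name, and MAP inference (max-product) is not shown.

```python
from collections import defaultdict


def tree_marginals(unary, edges, pairwise):
    """Exact node marginals on a tree via sum-product belief propagation.

    unary[i][a]    : non-negative score of node i taking label a
    edges          : list of (i, j) pairs forming a tree over nodes 0..n-1
    pairwise[a][b] : symmetric non-negative score, shared by all edges
                     (a simplifying assumption)
    """
    n, num_labels = len(unary), len(unary[0])
    neighbours = defaultdict(list)
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Breadth-first order from node 0; reversed, it is a leaves-to-root schedule.
    parent, order = {0: None}, [0]
    for node in order:
        for nb in neighbours[node]:
            if nb not in parent:
                parent[nb] = node
                order.append(nb)

    messages = {}  # messages[(src, dst)][b]: message from src to dst

    def send(src, dst):
        # Product of the unary score and all incoming messages except dst's.
        belief = list(unary[src])
        for nb in neighbours[src]:
            if nb != dst:
                for a in range(num_labels):
                    belief[a] *= messages[(nb, src)][a]
        msg = [sum(belief[a] * pairwise[a][b] for a in range(num_labels))
               for b in range(num_labels)]
        total = sum(msg)
        messages[(src, dst)] = [m / total for m in msg]

    for node in reversed(order[1:]):  # upward pass: leaves to root
        send(node, parent[node])
    for node in order[1:]:            # downward pass: root to leaves
        send(parent[node], node)

    marginals = []
    for i in range(n):
        belief = list(unary[i])
        for nb in neighbours[i]:
            for a in range(num_labels):
                belief[a] *= messages[(nb, i)][a]
        total = sum(belief)
        marginals.append([b / total for b in belief])
    return marginals
```

On a graph with cycles the same message updates would only be approximate, which is why the tree structure matters here.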
Results: NYU-D v2 Dataset
[Figure: ground truth (GT) vs. MAP segmentation]
[Figure: confusion matrix; comparisons]
[Figure: some failures, GT vs. MAP]
Marginal probabilities
Provide very useful information for specific tasks, e.g.: specific object detection, support inference.
[Figure panels: P(Ground), P(Structure), P(Furniture), P(Props)]
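One simple way such marginals could prime a specific detector is to restrict it to superpixels that are likely to belong to the relevant general class. The sketch below is an illustration only: `prime_candidates`, the class ordering (Ground, Structure, Furniture, Props) and the threshold are assumptions, not part of the presented system.

```python
def prime_candidates(marginals, class_index, threshold=0.5):
    """Return indices of superpixels whose marginal probability for the
    given class reaches the threshold (the threshold is an arbitrary choice).

    marginals[i] is assumed to be ordered as
    (Ground, Structure, Furniture, Props).
    """
    return [i for i, probs in enumerate(marginals)
            if probs[class_index] >= threshold]
```

A detector for small objects, for instance, might then be run only on the superpixels returned for the Props class.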
Conclusions
We have presented a computationally efficient approach to semantic segmentation for priming object detection in indoor scenes.
Our approach effectively uses 3D and image cues.
Depth discontinuities are evidence for occlusions.
The MST over 3D keeps intra-class components coherently connected.
Discussion
                  Silberman et al. 2012                  Couprie et al. 2013        Ours
Features          Bunch of engineered features (>1000D)  Learned features (>1000D)  Select meaningful features (12D)
Local classifier  Logistic Regression                    Neural Networks            k-NN
Graph structure   Dense connections (image)              None                       MST over 3D
Thanks!!
Cesar Cadena
Jana Kosecka
Funded by the US Army Research Office Grant W911NF-1110476.
Working on:
People detection by Shenghui Zhou
Multi-view and video: