sun rgb-d: a rgb-d scene understanding benchmark suite …rgbd.cs.princeton.edu/poster.pdf · sun...

1
SUN RGB-D SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite Shuran Song Samuel P. Lichtenberg Jianxiong Xiao Motivation !"# %&'() *+ ,#! -./01 ,23& !"##$ !&"''( ,&4567 89'& ! # :;<&=( >446(?@64 )* +,-.,/0123/ )* +,-.,/0123/ '* 456,70 :;<&=( A65& 83/, 9,+ -66B >446(?@64 83/, :33. ;1<3=0 Kinect v1 Kinect v2 Asus Xtion Intel Realsense color raw depth refined depth raw points refined points bedroom classroom dining room bathroom office home office conference room kitchen 2D segmentation 3D annotaion 2D segmentation 3D annotaion Effective free space Outside the room Inside some objects Beyond cutoff distance bathroom(6.4%) others(8.0%) classroom (9.3%) office (11.0%) furniture store (11.3%) bedroom(12.6%) computer room(1.0%) lecture theatre(1.2%) library(1.4%) study space(1.9%) home office(1.9%) discussion area(2.0%) dining area(2.4%) conference room(2.6%) lab(3.0%) corridor(3.8%) kitchen(5.6%) living room(6.0%) rest space(6.3%) dining room(2.3%) (a) object distribution (b) scene distribution 17712 0 1250 2500 3750 5000 chair table desk pillow sofa bed box cabinet garbage bin lamp shelf sofa chair monitor drawer frame sink side table paper trash can night stand book door book shelf computer dresser curtain toilet rack bag keyboard cpu tv kitchen cabinet coffee table fridge white board printer dining table stool bottle towel cup plant painting hanging cabinet mirror unknown computer monitor laptop board tray steel cabinet oven bench clothes bowl cooker kitchen tool carton box Kinect v2 SUN3D (ASUS Xtion) NYUv2 (Kinect v1) Intel RealSense Benchmark Tasks Manhattan Box (0.99) Ground Truth Geometric Context (0.27) Convex Hull (0.90) Convex Hull (0.85) Geometric Context (0.57) Ground Truth Manhattan Box (0.811) Data Capturing Annotation IoU: 53.1 Rr: 0.111 Rg : 0.111 Pg: 0.5 IoU 72.9 Rr: 0.333 Rg: 0.667 Pg: 0.667 IoU 63.9 Rr: 0.333 Rg: 0.667 Pg:1 IoU: 77.0 Rr: 0.25 Rg: 0.25 Pg: 0.5 Ground truth Sliding Shapes IoU: 54.6 Rr : 0.333 Rg : 0.333 Pg: 0.125 IoU:60 Rr: 0.50 Rg : 0.0.50 Pg: 0.5 bathtub bed bookshelf box chair counter desk door dresser garbage bin lamp monitor night stand pillow sink sofa table tv toilet Scene Classification Room Layout Estimation Total Scene Understanding 3D Object Detection and Pose Estimation Semantic Segmentation NYU 1,449 LSUN 10,195,373 PASCAL VOC 11,530 ImageNet 131,067 log 10 RGB datasets RGB-D SUNRGBD 10,335 NYU 1,449 P RGB-D NYU depth V2 B3DO SUN3D RealSense Xtion Kinect v1 Kinect v2 weight (pound) 0.077 0.5 4 4.5 size (inch) 5.2 × 0.25× 0.75 7.1 × 1.4 × 2 11 × 2.3 × 2.7 9.8 × 2.7 × 2.7 power 2.5W USB 2.5W USB 12.96W 115W depth resolution 628 × 468 640 × 480 640 × 480 512 × 424 color resolution 1920 × 1080 640 × 480 640 × 480 1920 × 1080 RealSense RGB-D Sensors 2. Top 4. Front 3. Side 1. Image 2. Top 4. Front 3. Side 1. Image Example Scenes Annotation Tool for 3D Object and 3D Room Layout Statistics of Semantic Annotation Examples of 2D and 3D Annotation References [NYU] Indoor Segmentation and Support Inference from RGBD Images. N. Silber- man, P. Kohli, D. Hoiem and R. Fergus. In ECCV, 2012. [SUN3D] SUN3D: A database of big spaces reconstructed using SfM and object labels. J. Xiao, A. Owens, and A. Torralba. In ICCV, 2013. [B3DO] A category-level 3-d object dataset: Putting the kinect to work. A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. In ICCV Workshop, 2011. home office RGB (38.1) D (27.7) RGB-D (39.0) RGB (19.7) D (20.1) RGB-D (23.0) bathroom bedroom classroom computer room conference room corridor dining area dining room discussion area furniture store kitchen lab lecture theatre library living room rest space bathroom bedroom classroom computer room conference room corridor dining area dining room discussion area furniture store kitchen lab lecture theatre library living room rest space GIST + RBF kernel PLACES- CNN + RBF kernel Precision = # prediction boxes # matched pairs with a correct category label Recall = # matched pairs with a correct category label # ground truth boxes Free Space IoU Precision Recall for all objects We introduce SUN RGB-D, a Pascal-scale RGB-D scene understanding dataset, which has 2D and 3D annota- tion for both objects and rooms. RGB-D sensors have also enabled rapid progress for scene understanding. However, the small dataset size has become a major bottleneck. Another problem of existing RGB-D datasets is that most of them are only labeled in 2D. Data & Code: http://rgbd.cs.princeton.edu Example Objects This work is supported by gift funds from Intel. Acknowledgement Kinect v2 and Battery Capturing Setup

Upload: others

Post on 17-Mar-2020

69 views

Category:

Documents


0 download

TRANSCRIPT

SUN RGB-D

SUN RGB-D: A RGB-D Scene Understanding Benchmark SuiteShuran Song Samuel P. Lichtenberg Jianxiong Xiao

Motivation

!"#$%&'()$*+$ ,#!$-./01$,23&$ !"##$% !&"''(%,&4567$89'&$ !% #%

:;<&=($>446(?@64$

)*%+,-.,/0123/% )*%+,-.,/0123/%'*%456,70%

:;<&=($A65&$ 83/,% 9,+%-66B$>446(?@64$83/,% :33.%;1<3=0%

Kinect v1 Kinect v2Asus XtionIntel Realsense

colo

rra

w d

epth

re�n

ed d

epth

raw

poi

nts

re�n

ed p

oint

s

bedr

oom

clas

sroo

m

dini

ng ro

omba

thro

omof

fice

hom

e offi

ceco

nfer

ence

room

kitc

hen

2D segmentation 3D annotaion 2D segmentation 3D annotaion

Effective free spaceOutside the roomInside some objectsBeyond cutoff distance

bathroom(6.4%)

others(8.0%)

classroom(9.3%)

office(11.0%)

furniture store(11.3%)

bedroom(12.6%)computer room(1.0%)lecture theatre(1.2%)

library(1.4%)study space(1.9%)home office(1.9%)

discussion area(2.0%)

dining area(2.4%)conference room(2.6%)

lab(3.0%)corridor(3.8%)kitchen(5.6%)

living room(6.0%)rest space(6.3%)

dining room(2.3%)

(a) object distribution (b) scene distribution17712

0

1250

2500

3750

5000

chair

tablede

skpil

lowsofa be

dbo

x

cabin

et

garba

ge bi

nlam

psh

elf

sofa

chair

monito

r

drawer

frame

sink

side t

ablepa

per

trash

can

night

stand

book

door

book

shelf

compu

ter

dress

er

curta

intoi

letrackba

g

keyb

oard cp

utv

kitch

en ca

binet

coffe

e tab

lefrid

ge

white b

oardpri

nter

dining

tablesto

olbo

ttletow

el cuppla

nt

paint

ing

hang

ing ca

binet

mirror

unkn

own

compu

ter m

onito

r

laptop

board tra

y

steel

cabin

etov

enbe

nch

clothe

sbo

wl

cook

er

kitch

en to

ol

carto

n box

Kinect v2 SUN3D (ASUS Xtion)

NYUv2 (Kinect v1)Intel RealSense

Benchmark Tasks

Manhattan Box (0.99)Ground Truth Geometric Context (0.27)Convex Hull (0.90)

Convex Hull (0.85)

Geometric Context (0.61)Convex Hull (0.43)Manhattan Box (0.72)

Geometric Context (0.57)Ground Truth

Ground Truth

Manhattan Box (0.811)

Data Capturing Annotation

IoU 50.7 Rr: 0.333 Rg: 0.333 Pg : 0.375IoU: 53.1 Rr: 0.333 Rg: 0.333 Pg: 0.125 IoU: 57.3 Rr :0.33 Rg: 0.667 Pg:0.125

IoU: 53.1 Rr: 0.111 Rg : 0.111 Pg: 0.5IoU 72.9 Rr: 0.333 Rg: 0.667 Pg: 0.667 IoU 63.9 Rr: 0.333 Rg: 0.667 Pg:1IoU: 77.0 Rr: 0.25 Rg: 0.25 Pg: 0.5

IoU: 78.8 Rr: 1 Rg: 1 Pg: 0.5

Gro

und

truth

Slid

ing

Shap

es3D

RC

NN

IoU: 54.6 Rr : 0.333 Rg : 0.333 Pg: 0.125

IoU:60 Rr: 0.50 Rg : 0.0.50 Pg: 0.5

bathtub bed bookshelf box chair counter desk door dresser garbage bin lamp monitor night stand pillow sink sofa table tv toilet

Scene Classi�cation

Room LayoutEstimation

Total SceneUnderstanding

3D Object Detection and

Pose Estimation

SemanticSegmentation

NYU 1,449

LSUN 10,195,373

PASCAL VOC 11,530

ImageNet 131,067

log10RGB datasets RGB-D

SUNRGBD 10,335

NYU 1,449

PASCAL VOC

RGB-D

LSUN 10,195,373

PASCAL VOC 11,530

ImageNet 131,067

RGB datasets

PASCAL VOC 11,530

NYU depth V2 B3DO SUN3D

RealSense Xtion Kinect v1 Kinect v2weight (pound) 0.077 0.5 4 4.5

size (inch) 5.2× 0.25× 0.75 7.1× 1.4× 2 11× 2.3× 2.7 9.8× 2.7× 2.7power 2.5W USB 2.5W USB 12.96W 115W

depth resolution 628× 468 640× 480 640× 480 512× 424color resolution 1920× 1080 640× 480 640× 480 1920× 1080

RealSense

RGB-D Sensors

2. Top

4. Front3. Side

1. Image 2. Top

4. Front3. Side

1. Image

Example Scenes

Annotation Tool for 3D Object and 3D Room Layout

Statistics of Semantic Annotation

Examples of 2D and 3D AnnotationReferences[NYU] Indoor Segmentation and Support Inference from RGBD Images. N. Silber-man, P. Kohli, D. Hoiem and R. Fergus. In ECCV, 2012.[SUN3D] SUN3D: A database of big spaces reconstructed using SfM and object labels. J. Xiao, A. Owens, and A. Torralba. In ICCV, 2013.[B3DO] A category-level 3-d object dataset: Putting the kinect to work. A. Janoch, S. Karayev, Y. Jia, J. T. Barron, M. Fritz, K. Saenko, and T. Darrell. In ICCV Workshop, 2011.

home o�ce

RGB (38.1) D (27.7) RGB-D (39.0)

RGB (19.7) D (20.1) RGB-D (23.0)

bathroombedroom

classroomcomputer room

conference roomcorridor

dining areadining room

discussion areafurniture store

kitchenlab

lecture theatrelibrary

living roomrest space

bathroombedroom

classroomcomputer room

conference roomcorridor

dining areadining room

discussion areafurniture store

kitchenlab

lecture theatrelibrary

living roomrest space

GIST +RBF kernel

PLACES-CNN + RBF kernel

Precision = # prediction boxes# matched pairs with a correct category label

Recall = # matched pairs with a correct category label

# ground truth boxes

Free Space IoU Precision Recall for all objects

We introduce SUN RGB-D, a Pascal-scale RGB-D scene understanding dataset, which has 2D and 3D annota-tion for both objects and rooms.

RGB-D sensors have also enabled rapid progress for scene understanding. However, the small dataset size has become a major bottleneck.

Another problem of existing RGB-D datasets is that most of them are only labeled in 2D.

Data & Code:http://rgbd.cs.princeton.edu

Example Objects

This work is supported by gift funds from Intel.

Acknowledgement

Kinect v2 and Battery Capturing Setup