Page 1: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Urban 3D Semantic Modelling Using Stereo Vision

Sunando Sengupta (1), Eric Greveson (2), Ali Shahrokni (2), Philip H. S. Torr (1)

(1) Oxford Brookes Vision Group, (2) 2d3 Sensing

Page 2: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Urban 3D Semantic Modelling Road Scene

• Given a sequence of stereo images, we generate a dense 3D semantic model.

[Figure: input stereo image sequence; dense 3D semantic model]

Page 3: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Pipeline: Semantic Reconstruction

• Stereo images

Page 4: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Pipeline: Semantic Reconstruction

• Depth map generation

• Camera estimation

Page 5: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Pipeline: Semantic Reconstruction

• Surface reconstruction

Page 6: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Pipeline: Semantic Reconstruction

• Semantic labelling of street view images

Page 7: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Pipeline: Semantic Reconstruction

• Semantic model generation

Page 8: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Camera Estimation

• Features are tracked across the left-right stereo pair and across consecutive frames.

Page 9: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Camera Estimation

• Use the feature tracks to estimate camera poses.

• Use bundle adjustment

[a] A. Geiger et al., “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” in CVPR, 2012.

Page 10: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Depth-Map Estimation

• Semi-global block matching [1] for disparity estimation

• Per-pixel depth computed as z = (B × f) / d, where:

– B: baseline

– f: focal length

– d: pixel disparity

[1] H. Hirschmueller, “Stereo Processing by Semi-Global Matching and Mutual Information,” PAMI, 2008.
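The depth formula on this slide can be illustrated with a minimal NumPy sketch. The function name `disparity_to_depth`, the choice to mark unmatched pixels as NaN, and the numeric values are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def disparity_to_depth(disparity, baseline_m, focal_px):
    """Convert a disparity map (pixels) to depth (metres) via z = B * f / d."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)
    # Pixels with zero or negative disparity have no valid match; leave them as NaN.
    valid = disparity > 0
    depth[valid] = baseline_m * focal_px / disparity[valid]
    return depth

# Illustrative values only: 0.5 m baseline, 640 px focal length.
disp = np.array([[64.0, 32.0],
                 [0.0, 8.0]])
depth = disparity_to_depth(disp, baseline_m=0.5, focal_px=640.0)
print(depth)
```

Because depth is inversely proportional to disparity, small disparities (distant points) give depth estimates that are very sensitive to matching error.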

Page 11: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Depth Fusion

• Depth estimates are fused using the camera poses.

• The estimates are fused into a truncated signed distance function (TSDF) volumetric representation [1].

[1] B. Curless and M. Levoy, “A Volumetric Method for Building Complex Models from Range Images,” in SIGGRAPH, 1996.

Page 12: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

TSDF Volume [1]

• The entire space is divided into a grid of voxels.

• For each voxel, compute the truncated signed distance to the surface:

– Positive, and increasing, when the voxel lies in free space

– Negative when the voxel lies behind the surface

– Zero when the voxel lies on the surface

• This is performed for all depth maps.

[1] B. Curless and M. Levoy, “A Volumetric Method for Building Complex Models from Range Images,” in SIGGRAPH, 1996.
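The sign convention above can be sketched for voxels along a single camera ray. This is a simplified illustration, not the presenters' implementation: the names `tsdf_along_ray` and `fuse`, the truncation distance, and the unit observation weights are assumptions. The running weighted average follows the general scheme of Curless and Levoy [1]; a practical system would typically update only voxels near the observed surface.

```python
import numpy as np

def tsdf_along_ray(voxel_depths, surface_depth, truncation):
    """Truncated signed distance for voxels at given depths along one ray.

    Positive in free space (in front of the surface), zero on the surface,
    negative behind it, scaled and clipped to [-1, 1].
    """
    sdf = surface_depth - np.asarray(voxel_depths, dtype=np.float64)
    return np.clip(sdf / truncation, -1.0, 1.0)

def fuse(tsdf_acc, weight_acc, tsdf_new, weight_new=1.0):
    """Merge a new observation into the accumulated TSDF by weighted averaging."""
    total = weight_acc + weight_new
    fused = (tsdf_acc * weight_acc + tsdf_new * weight_new) / total
    return fused, total

# Five voxels along a ray, observed by two depth maps (illustrative values).
voxel_depths = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obs_a = tsdf_along_ray(voxel_depths, surface_depth=3.0, truncation=1.0)
obs_b = tsdf_along_ray(voxel_depths, surface_depth=3.5, truncation=1.0)

tsdf = np.zeros_like(voxel_depths)
weight = np.zeros_like(voxel_depths)
for obs in (obs_a, obs_b):
    tsdf, weight = fuse(tsdf, weight, obs)
print(tsdf)
```

The fused surface lies where the averaged values cross zero, here between the voxels at depths 3 and 4.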

Page 13: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

TSDF Volume

[Figure: camera, actual surface, and TSDF volume, with one example voxel value of -.8]

Page 14: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

TSDF Volume

[Figure: camera, actual surface, and TSDF volume filled with the voxel values below]

-1 -.8 -.3 .2 .8 1 1 1
-1 -.9 -.4 .1 .5 1 1 1
-1 -1 -.8 -.2 .1 1 1 1
-1 -1 -.9 -.3 .2 .8 1 1
-1 -1 -.9 -.4 .3 .9 1 1
-1 -1 -.8 -.3 .3 .9 1 1
-1 -1 -.9 -.5 .2 .8 1 1
-1 -1 -.6 .1 .7 1 1 1

Page 15: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Incremental Volume Update

• Road scenes are long sequences of arbitrary length.

• A 3x3x1 volume of voxel grids is initialised.

Page 16: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Incremental Volume Update

• Road scenes are long sequences of arbitrary length.

• A 3x3x1 volume of voxel grids is initialised.

• Volume is added incrementally as the vehicle moves out of the region.
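One way to picture the incremental update is a sparse dictionary of voxel-grid blocks that grows as the vehicle moves. The sketch below is an assumption-laden illustration: the class `BlockVolume`, the block size, the `None` placeholder payload, and the rule of keeping a 3x3x1 neighbourhood around the vehicle are hypothetical and not stated on the slides.

```python
import math

class BlockVolume:
    """Sparse set of voxel-grid blocks keyed by integer block index (i, j, k)."""

    def __init__(self, block_size_m):
        self.block_size_m = block_size_m
        self.blocks = {}  # (i, j, k) -> block payload (placeholder: None)

    def block_index(self, x, y):
        """Index of the block containing ground-plane position (x, y)."""
        return (math.floor(x / self.block_size_m),
                math.floor(y / self.block_size_m),
                0)

    def ensure_neighbourhood(self, x, y):
        """Allocate any missing blocks in the 3x3x1 neighbourhood of (x, y)."""
        ci, cj, ck = self.block_index(x, y)
        added = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                key = (ci + di, cj + dj, ck)
                if key not in self.blocks:
                    self.blocks[key] = None  # a real system would allocate a voxel grid
                    added.append(key)
        return added

volume = BlockVolume(block_size_m=10.0)
initial = volume.ensure_neighbourhood(5.0, 5.0)   # initial 3x3x1 volume
still = volume.ensure_neighbourhood(8.0, 5.0)     # same block: nothing to add
moved = volume.ensure_neighbourhood(12.0, 5.0)    # vehicle entered the next block
print(len(initial), still, sorted(moved))
```

Only the blocks the vehicle actually approaches are allocated, so memory grows with the path length rather than with the bounding box of the whole route.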

Page 17: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Semantic Image Segmentation

• We use a conditional random field (CRF) framework.

• Each pixel is a node in a grid graph G = (V, E).

• Each node is a random variable x taking a label from the label set.

[Figure: input image, CRF construction, final segmentation]

[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.

Page 18: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Semantic Image Segmentation

• Total energy E = E_pix + E_pair + E_region

• E_pix models the cost of an individual pixel taking a label.

– Computed via a dense boosting approach

– Multi-feature variant of TextonBoost [1]

[Figure: node x with example label costs: Car 0.2, Road 0.3]

[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, “Associative hierarchical CRFs for object class image segmentation,” in ICCV, 2009.

Page 19: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

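Semantic Image Segmentation

• Total energy E = E_pix + E_pair + E_region

• E_pair models each pixel's interaction with its neighbourhood.

– Encourages label consistency between adjacent pixels and is sensitive to edges.

– Contrast-sensitive Potts model

[Figure: pairwise cost E_pair between neighbouring nodes x_i and x_j: 0 when both take the same label (Car/Car or Road/Road), g(i,j) when the labels differ (Car/Road)]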
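A contrast-sensitive Potts cost can be sketched as follows. The slides do not give the form of g(i,j); the exponential form used here, together with the parameter names `theta_p`, `theta_v`, `beta` and their values, is a commonly used choice and is assumed purely for illustration.

```python
import numpy as np

def contrast_weight(colour_i, colour_j, theta_p=1.0, theta_v=4.0, beta=0.01):
    """Assumed form of g(i, j): large for similar colours, small across strong edges."""
    diff = np.asarray(colour_i, dtype=float) - np.asarray(colour_j, dtype=float)
    return theta_p + theta_v * np.exp(-beta * np.dot(diff, diff))

def pairwise_cost(label_i, label_j, colour_i, colour_j):
    """Potts model: zero cost for equal labels, g(i, j) for differing labels."""
    if label_i == label_j:
        return 0.0
    return float(contrast_weight(colour_i, colour_j))

grey = (120, 120, 120)
red = (200, 30, 30)

same_label = pairwise_cost("Road", "Road", grey, red)
flat_region = pairwise_cost("Car", "Road", grey, grey)   # label change with no image edge
strong_edge = pairwise_cost("Car", "Road", grey, red)    # label change at an image edge
print(same_label, flat_region, strong_edge)
```

A label change inside a uniform region is penalised more than a label change across a strong colour edge, which is what makes the term sensitive to edges.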

Page 20: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Semantic Image Segmentation

• Total energy E = E_pix + E_pair + E_region

• E_region models the behaviour of a group of pixels.

– Encourages all the pixels in a region to take the same label.

– Groups of pixels are given by multiple mean-shift segmentations.

[Figure: region c with example label costs: Car 0.3, Road 0.1]

Page 21: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Semantic Image Segmentation - Results

• Input images, output of our image-level CRF, and ground truth.

Page 22: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Mesh Face Labelling

• A histogram of labels (Z_f) is built for each mesh face by projecting points from the face into the labelled images.

• The majority label is taken as the label of the face.
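The majority vote over a face's label histogram can be sketched in a few lines. The function name `label_face` and the example labels are hypothetical; the projection of face points into the labelled images is assumed to have already produced the list of observed labels.

```python
from collections import Counter

def label_face(projected_labels):
    """Build the label histogram for one mesh face and return its majority label.

    projected_labels: labels read from the labelled images at the pixels
    where points of the face project.
    """
    histogram = Counter(projected_labels)
    if not histogram:
        return None, histogram  # face not visible in any labelled image
    label, _count = histogram.most_common(1)[0]
    return label, histogram

# Illustrative observations for a single face.
votes = ["Road", "Road", "Car", "Road", "Pavement"]
face_label, face_histogram = label_face(votes)
print(face_label, dict(face_histogram))
```

With `Counter.most_common`, ties are resolved in favour of the label seen first; a different tie-breaking rule could be substituted if needed.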

Page 23: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Semantic Model

Top: surface reconstruction (left), semantic model (right).

Bottom: input image (left), object label set (right).

Page 24: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Evaluation

• The model is projected back using the estimated camera poses to create labelled images.

• Points in the model that are far from the camera are ignored in the projection.
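The back-projection step can be sketched with a standard pinhole camera model. The slides do not state the distance threshold or the camera convention, so `max_depth`, the intrinsics `K`, and the world-to-camera pose (`R`, `t`) used here are assumptions chosen for illustration.

```python
import numpy as np

def project_points(points_world, K, R, t, max_depth):
    """Project 3D model points into an image, ignoring far-away points.

    R, t map world coordinates to camera coordinates (x_cam = R @ x_world + t).
    Returns pixel coordinates of the kept points and the boolean keep mask.
    """
    pts_cam = points_world @ R.T + t
    z = pts_cam[:, 2]
    keep = (z > 0) & (z <= max_depth)   # in front of the camera and not too far
    uvw = pts_cam[keep] @ K.T
    pixels = uvw[:, :2] / uvw[:, 2:3]   # perspective division
    return pixels, keep

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 10.0],    # straight ahead
                   [1.0, 0.0, 10.0],    # one metre to the right
                   [0.0, 0.0, 80.0],    # too far: ignored
                   [0.0, 0.0, -5.0]])   # behind the camera: ignored
pixels, keep = project_points(points, K, R, t, max_depth=50.0)
print(pixels, keep)
```

Each kept pixel would then receive the label of the model point that projects to it, giving a labelled image for comparison with the ground truth.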

Page 25: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Evaluation

• Metrics

– Recall = tp / (tp + fn)

– Intersection vs Union = tp / (tp + fn + fp)
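Both metrics can be computed per class from predicted and ground-truth label arrays. This is a minimal sketch; the function name `per_class_metrics` and the example labels are illustrative and not taken from the slides.

```python
import numpy as np

def per_class_metrics(predicted, ground_truth, label):
    """Recall and intersection vs union for one class, from pixel labels."""
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    tp = np.sum((predicted == label) & (ground_truth == label))
    fp = np.sum((predicted == label) & (ground_truth != label))
    fn = np.sum((predicted != label) & (ground_truth == label))
    recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    iou = tp / (tp + fn + fp) if (tp + fn + fp) > 0 else float("nan")
    return recall, iou

# Six pixels with illustrative labels.
pred = ["Road", "Road", "Car", "Car", "Road", "Car"]
truth = ["Road", "Car", "Car", "Car", "Road", "Road"]
road_recall, road_iou = per_class_metrics(pred, truth, "Road")
print(road_recall, road_iou)
```

Intersection vs union also penalises false positives, so it is never higher than recall for the same class.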

Page 26: Urban 3D Semantic Modelling Using Stereo Vision, ICRA 2013

Video