a fuller understanding of fully convolutional networks · 2020-02-13 · boxsup: exploiting...

UC Berkeley in CVPR'15, PAMI'16

A Fuller Understanding of Fully Convolutional Networks

Evan Shelhamer* Jonathan Long* Trevor Darrell



pixels in, pixels out

- semantic segmentation

- monocular depth + normals, Eigen & Fergus 2015

- boundary prediction, Xie & Tu 2015

- optical flow, Fischer et al. 2015

- colorization, Zhang et al. 2016


“tabby cat”

1000-dim vector

< 1 millisecond

convnets perform classification

end-to-end learning


~1/10 second

end-to-end learning

???

lots of pixels, little time?

“tabby cat”

a classification network


becoming fully convolutional
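One way to read "becoming fully convolutional" is that a fully connected layer over a fixed-size patch can be rewritten as a convolution whose kernel covers that patch, so the same weights slide over larger inputs and yield a grid of outputs. The NumPy sketch below illustrates that equivalence only; the function names, layer sizes, and random weights are assumptions made for illustration and are not taken from the talk or from any particular network.

```python
import numpy as np

def fc_layer(x, w, b):
    """Fully connected layer on one flattened patch: x (C*k*k,), w (N, C*k*k)."""
    return w @ x + b

def conv_valid(image, kernels, b):
    """Stride-1 'valid' correlation: image (C, H, W), kernels (N, C, k, k)."""
    n, _, k, _ = kernels.shape
    _, h, w = image.shape
    flat = kernels.reshape(n, -1)
    out = np.empty((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[:, i:i + k, j:j + k].reshape(-1)
            out[:, i, j] = flat @ patch + b
    return out

rng = np.random.default_rng(0)
C, k, N = 3, 4, 5                       # hypothetical sizes
w = rng.standard_normal((N, C * k * k))
b = rng.standard_normal(N)
kernels = w.reshape(N, C, k, k)         # same weights, viewed as conv kernels

small = rng.standard_normal((C, k, k))  # input the FC layer was sized for
large = rng.standard_normal((C, 7, 9))  # larger input

fc_out = fc_layer(small.reshape(-1), w, b)
conv_small = conv_valid(small, kernels, b)  # one output location
conv_large = conv_valid(large, kernels, b)  # a grid of output locations
```

On the patch-sized input the convolution reproduces the fully connected output at a single location; on the larger input each output location equals the fully connected layer applied to the corresponding window.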



upsampling output


end-to-end, pixels-to-pixels network

- conv, pool, nonlinearity

- upsampling

- pixelwise output + loss


spectrum of deep features

combine where (local, shallow) with what (global, deep)

fuse features into deep jet

(cf. Hariharan et al. CVPR'15 “hypercolumn”)


skip layers

skip to fuse layers!

interp + sum

interp + sum

dense output

end-to-end, joint learning of semantics and location
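The "interp + sum" fusion can be sketched as follows: coarse class scores are upsampled and added to scores from a finer layer, and the step is repeated for each skip. This is a minimal sketch under stated assumptions: nearest-neighbor 2x upsampling stands in for whatever interpolation the network actually uses, the score maps are random placeholders, and the names `upsample2x` and `fuse` are hypothetical.

```python
import numpy as np

def upsample2x(scores):
    """Nearest-neighbor 2x upsampling of a (classes, H, W) score map.
    Assumption: a stand-in for the network's interpolation step."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

def fuse(coarse, fine):
    """interp + sum: upsample the coarse scores, then add the finer scores."""
    return upsample2x(coarse) + fine

classes = 21
rng = np.random.default_rng(0)
score32 = rng.standard_normal((classes, 2, 2))  # placeholder stride-32 scores
score16 = rng.standard_normal((classes, 4, 4))  # placeholder stride-16 scores
score8 = rng.standard_normal((classes, 8, 8))   # placeholder stride-8 scores

fused16 = fuse(score32, score16)  # 1 skip
fused8 = fuse(fused16, score8)    # 2 skips
labels = fused8.argmax(axis=0)    # per-location class prediction
```

Because the fusion is a sum of differentiable steps, gradients from the pixelwise loss reach both the deep and the shallow layers, which is what allows semantics and location to be learned jointly.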

skip layer refinement

[Figure panels: input image; stride 32, no skips; stride 16, 1 skip; stride 8, 2 skips; ground truth]

skip FCN computation

- Stage 1 (60.0 ms)

- Stage 2 (18.7 ms)

- Stage 3 (23.0 ms)

A multi-stream network that fuses features/predictions across layers

[Figure columns: FCN, SDS*, Truth, Input]

Relative to prior state-of-the-art SDS:

- 30% relative improvement for mean IoU

- 286× faster

*Simultaneous Detection and Segmentation Hariharan et al. ECCV14
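For reference, mean IoU (intersection over union averaged over classes) can be computed from a confusion matrix. The sketch below is illustrative only: the tiny label maps are made up, the function name `mean_iou` is hypothetical, and skipping classes that appear in neither prediction nor ground truth is an assumption rather than the benchmark's official protocol.

```python
import numpy as np

def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for integer label maps."""
    pred = np.asarray(pred).ravel()
    truth = np.asarray(truth).ravel()
    # Confusion matrix: rows index ground truth, columns index prediction.
    conf = np.bincount(truth * num_classes + pred,
                       minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # assumption: ignore classes absent from both maps
    return (inter[present] / union[present]).mean()

# Made-up 2 x 3 label maps with two classes.
truth = np.array([[0, 0, 1],
                  [0, 1, 1]])
pred = np.array([[0, 1, 1],
                 [0, 1, 1]])
score = mean_iou(pred, truth, num_classes=2)
```

In this toy case class 0 has IoU 2/3 and class 1 has IoU 3/4, so the mean is their average.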

leaderboard

== segmentation with Caffe

[Figure: leaderboard with entries labeled FCN]

care and feeding of fully convolutional networks

usage

- train full image at a time without sampling

- reshape network to take input of any size

- forward time is ~100 ms for 500 x 500 x 21 output (on M. Titan X)

image-to-image optimization

momentum and batch size

sampling images? no need! no improvement from sampling across images

sampling pixels? no need! no improvement from (partially) decorrelating pixels

[Figure panels: uniform, poisson]

context?

- do FCNs incorporate contextual cues?

- loses 3-4% points when the background is masked

- can learn from BG/shape alone if forced to!

- Standard 85 IU

- BG alone 38 IU

- Shape 29 IU

past and future history of fully convolutional networks

history

- Convolutional Locator Network, Wolf & Platt 1994

- Shape Displacement Network, Matan & LeCun 1992

pyramids

Scale Pyramid, Burt & Adelson ’83

[Figure: pyramid levels 0, 1, 2]

The scale pyramid is a classic multi-resolution representation

Fusing multi-resolution network layers is a learned, nonlinear counterpart

jets

Jet, Koenderink & Van Doorn ’87

The local jet collects the partial derivatives at a point for a rich local description

The deep jet collects layer compositions for a rich, learned description

extensions

- detection + instances

- structured output

- weak supervision


detection: fully conv. proposals

Fast R-CNN, Girshick ICCV'15

Faster R-CNN, Ren et al. NIPS'15

end-to-end detection by proposal FCN and RoI classification


fully conv. nets + structured output

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Chen* & Papandreou* et al. ICLR 2015.


Conditional Random Fields as Recurrent Neural Networks. Zheng* & Jayasumana* et al. ICCV 2015.


dilation for structured output

Multi-Scale Context Aggregation by Dilated Convolutions. Yu & Koltun. ICLR 2016.

- enlarge effective receptive field for same no. params

- raise resolution

- convolutional context model: similar accuracy to CRF but non-probabilistic
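The first bullet can be checked with simple arithmetic: for a stack of stride-1 convolutions, each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d, while the weight count depends only on k and the number of layers. The sketch below assumes 3 x 3 kernels and a doubling dilation schedule purely for illustration; the helper names are hypothetical, and the weight count is per input/output channel pair.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field width of stacked stride-1 convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

def weights_per_channel_pair(kernel_size, num_layers):
    """Weights per (input channel, output channel) pair; dilation adds none."""
    return num_layers * kernel_size * kernel_size

plain_rf = receptive_field(3, [1, 1, 1, 1])    # four ordinary 3x3 layers
dilated_rf = receptive_field(3, [1, 2, 4, 8])  # same layers, dilation doubled each time
weights = weights_per_channel_pair(3, 4)       # identical for both stacks
```

With these assumed settings the plain stack covers 9 pixels per side and the dilated stack covers 31, using the same 36 weights per channel pair.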

[ comparison credit: CRF as RNN, Zheng* & Jayasumana* et al. ICCV 2015 ]

DeepLab: Chen* & Papandreou* et al. ICLR 2015.

CRF-RNN: Zheng* & Jayasumana* et al. ICCV 2015.


fully conv. nets + weak supervision

Constrained Convolutional Neural Networks for Weakly Supervised Segmentation. Pathak et al. arXiv 2015.

FCNs expose a spatial loss map to guide learning: segment from tags by MIL or pixelwise constraints

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation. Dai et al. 2015.

FCNs expose a spatial loss map to guide learning: mine boxes + feedback to refine masks

FCNs can learn from sparse annotations == sampling the loss

What's the Point? Semantic Segmentation with Point Supervision. Bearman et al. ECCV 2016.
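"Sampling the loss" can be pictured as a pixelwise cross-entropy that is averaged only over the pixels carrying an annotation. The sketch below is a generic masked loss under stated assumptions, not the method of any of the cited papers: the function names, shapes, and random inputs are hypothetical, and the mask is assumed to select at least one pixel.

```python
import numpy as np

def log_softmax(scores):
    """Numerically stable log-softmax over the class axis (axis 0)."""
    shifted = scores - scores.max(axis=0, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))

def masked_pixel_loss(scores, labels, mask):
    """Mean cross-entropy over pixels where mask is True.
    scores: (classes, H, W); labels: (H, W) ints; mask: (H, W) bools.
    Assumes mask selects at least one pixel."""
    logp = log_softmax(scores)
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = logp[labels, rows, cols]  # log-probability of the true class
    return -(picked * mask).sum() / mask.sum()

rng = np.random.default_rng(0)
scores = rng.standard_normal((21, 6, 8))
labels = rng.integers(0, 21, size=(6, 8))

dense_mask = np.ones((6, 8), dtype=bool)   # every pixel annotated
sparse_mask = np.zeros((6, 8), dtype=bool)
sparse_mask[2, 3] = True                   # a single annotated point

dense_loss = masked_pixel_loss(scores, labels, dense_mask)
point_loss = masked_pixel_loss(scores, labels, sparse_mask)
```

The network and the loss map are unchanged between the two cases; only the set of pixels that contribute to the average differs.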