Towards Weakly Supervised Object Segmentation & Scene Parsing Yunchao Wei IFP, Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA

Post on 22-May-2020

Page 1: Towards Weakly Supervised Object Segmentation & Scene Parsing · Self-Erasing Network for Integral Object Attention Qibin Hou 1, Peng-Tao Jiang , Yunchao Wei 2, Ming-Ming Cheng1 1College

Towards Weakly Supervised Object Segmentation & Scene Parsing

Yunchao Wei

IFP, Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA

Self-Erasing Network for Integral Object Attention

Qibin Hou1, Peng-Tao Jiang1, Yunchao Wei2, Ming-Ming Cheng1

1College of Computer Science, Nankai University, Tianjin, China

2IFP, Beckman Institute, University of Illinois at Urbana-Champaign, IL, USA

[Figure: an image and its object localization map.]

Fully Convolutional Networks

Background

Weak Supervision: lower-degree (cheaper, simpler) annotations at the training stage than the outputs required at the testing stage.

[Figure: four forms of weak supervision (image-level labels, points, scribbles, bounding boxes), shown on an image labeled "horse" and "person".]

Weak Supervision

Background

[Figure: an FCN trained with a loss against a pseudo mask whose classes are bkg, horse, and person.]

Learn to Produce Pseudo Mask

The Popular Pipeline

Our Target & Current Issue

Background

◼ Current issue: small and sparse object localization maps from Class Activation Mapping [Zhou CVPR16]
◼ Our target: dense and integral object localization maps

[Figure: example images labeled "dog" and "bird".]
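For reference, Class Activation Mapping builds a class localization map as a weighted sum of the last convolutional feature maps, using the classifier weights of the target class. The following is a minimal sketch in plain Python, not the code behind these slides; the names `class_activation_map`, `normalize`, `features`, and `weights`, and the toy values, are assumptions made for illustration.

```python
def class_activation_map(features, weights):
    """Weighted sum over channels: cam[y][x] = sum_k weights[k] * features[k][y][x].

    features: list of K feature maps, each an H x W nested list (hypothetical input).
    weights:  list of K classifier weights for one class (hypothetical input).
    """
    height, width = len(features[0]), len(features[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w in zip(features, weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += w * fmap[y][x]
    return cam


def normalize(cam):
    """Min-max scale a map to [0, 1] so it can be thresholded or displayed."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[(v - lo) / span for v in row] for row in cam]


# Toy example: two 2x2 feature maps and their class weights.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(features, weights)
cam_norm = normalize(cam)
```

Only the locations where strongly weighted channels fire receive high values, which is one way to see why such maps tend to cover small, discriminative parts of an object.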

Revisit Adversarial Erasing

Object Region Mining with Adversarial Erasing [Wei CVPR17]

Revisit Adversarial Erasing

Adversarial Complementary Learning [Zhang CVPR18]

Revisit Adversarial Erasing

Over Erasing: The Failure Case of Adversarial Erasing

The background regions cannot be suppressed!!

Our Solution: Self-Erasing Network

Motivation

[Figure: an image, its attention map, and the resulting ternary mask, with object priors and background priors marked.]

Our Solution: Self-Erasing Network

[Figure: the attention map is split by two thresholds, δ_h and δ_l, into a ternary mask that marks object priors and background priors.]
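The thresholding step can be sketched as follows. This is an illustrative reading of the slide rather than the authors' code: the numeric codes for the three zones, the threshold values, and the name `ternary_mask` are all assumptions, since the slide only names the thresholds δ_h and δ_l and the two kinds of priors.

```python
OBJECT, UNCERTAIN, BACKGROUND = 1, 0, -1  # illustrative codes, not from the slides


def ternary_mask(attention, delta_h, delta_l):
    """Split an attention map (values in [0, 1]) into three zones.

    attention >= delta_h  -> OBJECT      (object prior)
    attention <  delta_l  -> BACKGROUND  (background prior)
    otherwise             -> UNCERTAIN
    """
    if delta_l > delta_h:
        raise ValueError("delta_l must not exceed delta_h")
    mask = []
    for row in attention:
        mask.append([
            OBJECT if a >= delta_h else BACKGROUND if a < delta_l else UNCERTAIN
            for a in row
        ])
    return mask


# Toy 2x2 attention map with hypothetical thresholds.
attention = [[0.9, 0.4], [0.02, 0.7]]
mask = ternary_mask(attention, delta_h=0.7, delta_l=0.05)
```

Keeping an explicit background zone is what lets a later branch avoid spreading attention into regions the first pass already considers background.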

Our Solution: Self-Erasing Network

Framework

[Figure: framework of the Self-Erasing Network. A backbone feeds the branches S_A, S_B, and S_C, each made of Conv layers, global average pooling (GAP), and a loss, for an image labeled "dining table". An attention map is thresholded (T) into a ternary mask, which is used by C-ReLU strategy I and C-ReLU strategy II. Legend: T = thresholding, GAP = global average pooling.]

Experimental Results

[Figure: attention maps from ours compared with ACoL [Zhang CVPR18].]

Experimental Results

Pascal VOC 2012

Experimental Results

Weakly Supervised Scene Parsing with Point-based Distance Metric Learning

Rui Qian1,3, Yunchao Wei3, Honghui Shi2,3, Jiachen Li3, Jiaying Liu1 and Thomas Huang3

1Institute of Computer Science and Technology, Peking University, Beijing, China

2IBM T.J. Watson Research Center, 3IFP, Beckman, UIUC

[Figure: point annotation compared with full annotation.]

Review on Weakly Supervised Methods

◼ Weakly supervised methods for scene parsing
  ◼ Image-level supervision
  ◼ Box supervision
  ◼ Scribble supervision
  ◼ Point supervision

[Figure: example image with the labels Person, Bike, Tree, Sky, and Road.]

Annotation Comparison

◼ Annotation burden comparison

Method                             Full    Scribble    Point
Average annotated pixels/image     170K    1817.48     12.26

Motivation

[Figure: two input images with their PointSup results, each marked "Limited label!"; the annotated classes include train, track, and tree.]

How to utilize limited annotation?

Cross-image semantic similarity!

Proposed method (1/5)

◼ Overview
  ◼ Point-based distance metric learning (PDML)
  ◼ Point supervision (PointSup)
  ◼ Online extension supervision (ExtendSup)

[Figure: pipeline. Input → Feature Extract → Feature → Classification Module → Prediction. PDML is attached to the features; PointSup and ExtendSup supervise the prediction through a cross-entropy loss.]

Proposed method (2/5)

◼ Point supervision
  ◼ Only calculate cross-entropy loss on annotated pixels
  ◼ Back-propagate gradients accordingly
  ◼ Optimize by stochastic gradient descent

[Figure: Input → Feature Extract → Feature → Classification Module → Prediction, supervised by PointSup through a cross-entropy loss.]
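Restricting the loss to annotated pixels can be sketched as below. This is a minimal plain-Python illustration under stated assumptions, not the authors' training code: the marker `-1` for unlabeled pixels and the names `softmax` and `point_cross_entropy` are hypothetical, and the logits are toy values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def point_cross_entropy(logits, labels, ignore=-1):
    """Mean cross-entropy over annotated pixels only.

    logits: H x W x C nested list of class scores (hypothetical network output).
    labels: H x W nested list of class indices; `ignore` marks unlabeled pixels.
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for pixel_logits, label in zip(logit_row, label_row):
            if label == ignore:
                continue  # unlabeled pixels contribute neither loss nor gradient
            total -= math.log(softmax(pixel_logits)[label])
            count += 1
    return total / count if count else 0.0


# Toy 1x3 image with 2 classes; only the first and last pixels carry a point label.
logits = [[[2.0, 0.0], [0.0, 5.0], [0.0, 0.0]]]
labels = [[0, -1, 1]]
loss = point_cross_entropy(logits, labels)
```

Because the middle pixel is skipped, changing its logits leaves the loss unchanged, which is the sense in which gradients flow only from the annotated points.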

Proposed method (3/5)

◼ Online extension supervision
  ◼ Extension method 1 (region):
    ◼ Select pixels in a 5×5 square near the annotated ones
  ◼ Extension method 2 (score):
    ◼ Select pixels with a score over 0.7 in the prediction
  ◼ Finally choose the intersection of the two methods

[Figure: Input → Feature Extract → Feature → Classification Module → Prediction, supervised by PointSup and by Online Extension through a cross-entropy loss.]
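The two extension rules and their intersection can be sketched as follows. Several details are assumptions rather than statements from the slides: the 5×5 square is taken to be centered on each annotated pixel, the "score" is taken to be the predicted probability of that pixel's class, and the names `extend_labels`, `window`, `threshold`, and `ignore` are hypothetical.

```python
def extend_labels(labels, probs, window=5, threshold=0.7, ignore=-1):
    """Grow point labels to nearby, confidently predicted pixels.

    labels: H x W nested list of class indices, `ignore` for unlabeled pixels.
    probs:  H x W x C nested list of predicted class probabilities.
    A pixel receives class c when it lies inside the window x window square
    centered on a point labeled c (region rule) AND its predicted probability
    for c exceeds the threshold (score rule).
    """
    height, width = len(labels), len(labels[0])
    radius = window // 2
    extended = [row[:] for row in labels]  # keep the original point labels
    for y in range(height):
        for x in range(width):
            cls = labels[y][x]
            if cls == ignore:
                continue
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if labels[ny][nx] == ignore and probs[ny][nx][cls] > threshold:
                        extended[ny][nx] = cls
    return extended


# Toy 1x4 image, 2 classes, one point of class 0 at the left edge.
labels = [[0, -1, -1, -1]]
probs = [[[0.8, 0.2], [0.9, 0.1], [0.6, 0.4], [0.95, 0.05]]]
extended = extend_labels(labels, probs)
```

In the toy run, the second pixel is added (inside the window, score 0.9), the third is rejected by the score rule, and the fourth is rejected by the region rule despite its high score.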

Proposed method (4/5)

◼ Point-based distance metric learning

[Figure: (a) images $I_a$, $I_b$, $I_c$, $I_d$ containing categories such as train and platform form a subgroup for point-based distance metric learning. (b) Feature extractors with shared weights embed $I_a$ and $I_b$; distances between features of the same category are minimized and distances between features of different categories (for example train and platform) are maximized.]

Proposed method (5/5)

◼ Loss function of PDML
  ◼ For each image $I_a$, define the embedding vector set as $E_a$:
    ◼ $E_a = \bigcup_{i=1}^{|M_a|} \{P_a^i\}$
    ◼ $|M_a|$ is the number of annotated pixels of $I_a$
    ◼ $P_a^i$ is the feature vector of the $i$-th annotated pixel
  ◼ We optimize over triplets $\{P_a^i, P_b^j, P_b^k\}$:
    ◼ $P_a^i$ shares the same category as $P_b^j$
    ◼ $P_b^k$ belongs to a different category from $P_a^i$ and $P_b^j$
  ◼ We use the loss function:

$$L_t(P_a^i, P_b^j, P_b^k) = \alpha L_p(P_a^i, P_b^j) + \beta L_n(P_a^i, P_b^j, P_b^k)$$

    ◼ $L_p(P_a^i, P_b^j) = \|P_a^i - P_b^j\|_2$
    ◼ $L_n(P_a^i, P_b^j, P_b^k) = \max(\|P_a^i - P_b^j\|_2 - \|P_a^i - P_b^k\|_2 + m, 0)$
    ◼ $\alpha$, $\beta$, $m$ are hyper-parameters and are set to 0.8, 1, and 20 in practice
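The PDML loss for a single triplet can be sketched as below. This is a minimal illustration, not the authors' implementation: it reads the trailing "2" in the slide's norms as the L2-norm subscript, and the function names and toy embeddings are hypothetical. The defaults for `alpha`, `beta`, and `margin` are the values 0.8, 1, and 20 quoted on the slide.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pdml_triplet_loss(anchor, positive, negative, alpha=0.8, beta=1.0, margin=20.0):
    """L_t = alpha * L_p + beta * L_n for one triplet of pixel embeddings.

    anchor and positive share a category; negative belongs to a different one.
    """
    pos_dist = l2_distance(anchor, positive)        # L_p: pull same-class pixels together
    neg_dist = l2_distance(anchor, negative)
    hinge = max(pos_dist - neg_dist + margin, 0.0)  # L_n: push other classes past the margin
    return alpha * pos_dist + beta * hinge


# Toy 2-D embeddings (hypothetical values).
anchor, positive = [0.0, 0.0], [3.0, 4.0]  # distance 5
near_negative = [6.0, 8.0]                 # distance 10: hinge term is active
far_negative = [30.0, 40.0]                # distance 50: hinge term is zero
loss_near = pdml_triplet_loss(anchor, positive, near_negative)
loss_far = pdml_triplet_loss(anchor, positive, far_negative)
```

With these values the near negative gives 0.8 × 5 + (5 − 10 + 20) = 19, while the far negative gives 0.8 × 5 + 0 = 4, so once a different-class pixel is farther than the positive distance plus the margin, only the pull term remains.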

Datasets

◼ Scene parsing datasets
  ◼ PASCAL-Context
  ◼ ADE20K

Dataset           #Training    #Evaluation    #Instance/Image
PASCAL-Context    4998         5105           12.26
ADE20K            20210        2000           13.96

Experimental Results

Method                                   Metrics
FullSup   PointSup   PDML   Online Ext.  mIoU   Pixel Acc
√                                        39.6   78.6%
          √                              27.9   55.3%
          √          √                   29.7   57.5%
          √          √      √            30.0   57.6%

◼ Quantitative evaluation on PASCAL-Context
  ◼ The combination of three techniques is best
  ◼ We use only 0.007% annotated data but reached 75% of the full supervision performance!
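The two percentages can be checked from figures that appear elsewhere in the slides. The short calculation below assumes that the 170K and 12.26 average annotated pixels per image from the annotation comparison apply to PASCAL-Context, and that "performance" refers to mIoU; neither assumption is stated explicitly on the slide.

```python
# Figures quoted on the slides.
point_pixels_per_image = 12.26    # average annotated pixels per image, point supervision
full_pixels_per_image = 170_000   # "170K" average annotated pixels per image, full annotation
miou_points = 30.0                # PointSup + PDML + Online Ext. on PASCAL-Context
miou_full = 39.6                  # FullSup on PASCAL-Context

annotated_percent = 100 * point_pixels_per_image / full_pixels_per_image
relative_miou_percent = 100 * miou_points / miou_full

print(f"annotated pixels: {annotated_percent:.3f}% of full annotation")    # 0.007%
print(f"mIoU relative to full supervision: {relative_miou_percent:.1f}%")  # 75.8%
```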

Experimental Results

Method                                   Metrics
FullSup     PointSup   PDML   Online Ext.  mIoU   Pixel Acc
√                                          33.9   75.8%
√ (SegNet)                                 21.0   /
            √                              17.7   58.0%
            √          √                   19.0   59.0%
            √          √      √            19.6   61.0%

◼ Quantitative evaluation on ADE20K
  ◼ The combination of three techniques is best
  ◼ Our method approaches the result of SegNet under the full supervision scheme

Subjective Evaluation

[Figures: ten slides of qualitative results, each comparing Image, PointSup, PDML, Final, and GT.]


Discussion

◼ Visualizing the effect of PDML
  ◼ dis(+): L2-norm distance between same-class feature vectors
  ◼ dis(-): L2-norm distance between different-class feature vectors
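One way to compute these two quantities from a set of labeled pixel embeddings is sketched below. Averaging over all pairs is an assumption, since the slide defines dis(+) and dis(-) only as L2-norm distances; the function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_class_distances(features, labels):
    """Return (dis_pos, dis_neg): mean L2 distance over same-class and different-class pairs."""
    same, diff = [], []
    for (f1, c1), (f2, c2) in combinations(zip(features, labels), 2):
        (same if c1 == c2 else diff).append(l2_distance(f1, f2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(same), mean(diff)


# Toy embeddings: two "train" pixels close together, one "platform" pixel far away.
features = [[0.0, 0.0], [3.0, 4.0], [30.0, 40.0]]
labels = ["train", "train", "platform"]
dis_pos, dis_neg = mean_class_distances(features, labels)
```

A well-trained embedding would be expected to show a small dis(+) and a large dis(-), as in this toy case (5 versus 47.5).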

Discussion

◼ Ablation on the design of the PDML loss function
  ◼ $L_t(P_a^i, P_b^j, P_b^k) = \alpha L_p(P_a^i, P_b^j) + \beta L_n(P_a^i, P_b^j, P_b^k)$

Conclusion

◼ Problem
  ◼ Point-guided scene parsing
◼ Point-based distance metric learning
  ◼ Exploit semantic relationship across images
◼ Experimental results
  ◼ Good performance both quantitatively and qualitatively

Weakly Supervised Learning for Real-World Computer Vision Applications & The 1st Learning from Imperfect Data (LID) Challenge

CVPR 2019 Workshop, Long Beach, CA

Task 1: Object Segmentation on ILSVRC DET (Image-level Supervision)

Task 2: Scene Parsing on ADE20K (Point Supervision)

https://lidchallenge.github.io/

Thank you