
Damage Detection from Aerial Images via Convolutional Neural Networks

Aito Fujita*, Riho Ito*, Ken Sakurada†, Tomoyuki Imaizumi*, Shuhei Hikosaka*, Ryosuke Nakamura*

*National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan

†Nagoya University, Aichi, Japan

MVA2017 @ Nagoya University, 5/8/2017

Aerial images for damage detection

In the event of catastrophic disasters, fast assessment of the extent of damage is crucial for recovery.

E.g., "Which buildings were washed away by the tsunami? And how many?"

Satellite/aerial images as a source for damage assessment:

◦ Their wide coverage can facilitate the assessment.

◦ However, the assessment is currently performed manually (by human eye).

Aerial view of Tohoku quake

[Figure: pre-quake and post-quake aerial images of a 2 km x 4 km coastal area]

Damage assessment in practice

[Figure: pre-quake and post-quake image pair, inspected building by building]

Each building is compared between the pre-quake and post-quake images and labeled: this building is "washed-away"; that building is "surviving". This is repeated for all buildings.

[Figure: manual assessment result. RED: washed-away buildings; YELLOW: surviving buildings]

Manual assessment takes too much time. Can we automate this assessment process?

Background and Contributions

Background of damage detection:

◦ Lack of labeled datasets

◦ No study on deep learning applied to satellite/aerial images

Related work in this context:

◦ Gueguen+ (CVPR 2015): hand-designed features (tree-of-shapes); satellite images

◦ Cooner+ (Remote Sensing 2016): shallow neural networks; aerial images

◦ Nia+ (CRV 2017): convolutional networks, but ground-level images

Our contributions:

◦ A new labeled dataset (the ABCD dataset)

◦ Comprehensive analyses of CNNs for washed-away building detection

New dataset for washed-away building detection

AIST Building Change Detection (ABCD) dataset:

◦ Based on images taken before/after the Tohoku earthquake

◦ Over 10K pairs of pre/post-tsunami patches

◦ Collected from 66 km² of aerial images of the Tohoku coastal region

◦ A target building at the center of each patch

◦ A damage label assigned to each pair

◦ The damage labels are based on the survey of the Great East Japan earthquake conducted by the Japanese government (MLIT) after March 11, 2011.

◦ Now being prepared for public release. A loader sketch illustrating the sample structure follows.

[Figures: image coverage map; example patches of washed-away and surviving buildings]
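The following is a hypothetical sketch of how one ABCD sample could be loaded: a pre-tsunami patch, a post-tsunami patch, and a binary damage label. The directory layout and file naming are assumptions for illustration, not the dataset's actual release format.

```python
from pathlib import Path

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class ABCDPairs(Dataset):
    """Yields (pre_patch, post_patch, label), with label 1 = washed-away."""

    def __init__(self, root):
        self.items = []
        # Assumed layout: root/{surviving,washed-away}/{pre,post}/<id>.png
        for label, cls in enumerate(["surviving", "washed-away"]):
            for pre in sorted((Path(root) / cls / "pre").glob("*.png")):
                post = Path(root) / cls / "post" / pre.name
                self.items.append((pre, post, label))

    @staticmethod
    def _load(path):
        # Decode to a float CHW tensor in [0, 1]
        arr = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
        return torch.from_numpy(arr).permute(2, 0, 1)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        pre, post, label = self.items[i]
        return self._load(pre), self._load(post), label
```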

Overview of washed-away detection framework

[Figure: overview of the washed-away detection framework]

Practical considerations

From a practical viewpoint, we investigated the following:

Input scenario (to address availability of pre-tsunami images)

Input scale (to address variability of building size)

Practical considerations: Input scenario

Input scenario (to address availability of pre-tsunami images):

◦ Both pre- and post-tsunami images are available.

◦ Only post-tsunami images are available.

When both images are available, the input to the CNN is a pair of pre/post patches. We use two configurations inspired by Zagoruyko+ (CVPR15), Lin+ (CVPR15), and Simo-Serra+ (ICCV15): a 6-channel network that stacks the two patches, and a Siamese network with one branch per patch (see the sketch below).

[Figure: the two network configurations, built from convolution, max pooling, and fully-connected layers]
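Below is a minimal PyTorch sketch of the two configurations: early fusion (6-channel) and late fusion (Siamese). Layer counts and widths here are illustrative assumptions, not the exact architecture from the slides.

```python
import torch
import torch.nn as nn

class SixChannelNet(nn.Module):
    """Early fusion: stack the pre/post RGB patches into one 6-channel input."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(nn.Flatten(), nn.LazyLinear(2))

    def forward(self, pre, post):
        x = torch.cat([pre, post], dim=1)  # (N, 6, H, W)
        return self.classifier(self.features(x))

class SiameseNet(nn.Module):
    """Late fusion: one branch per patch (weights shared here), features concatenated."""
    def __init__(self):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.classifier = nn.LazyLinear(2)

    def forward(self, pre, post):
        f = torch.cat([self.branch(pre), self.branch(post)], dim=1)
        return self.classifier(f)
```

The slides also mention an unshared-weights Siamese variant; that corresponds to instantiating two separate `branch` modules instead of reusing one.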

In practice, pre-tsunami images may not be available. In that case, the input to the CNN is a single post-tsunami patch.

Practical considerations: Input scale

Input scale (to address variability of building size). We investigated multiple crop sizes:

◦ Fixed-scale -> aims to encompass most of the buildings.

◦ Size-adaptive -> makes tiny buildings more conspicuous.

◦ Multi-scale CNN

Fixed-scale: 160 x 160 pixels (64 m x 64 m), chosen so that most buildings are encompassed.

[Figure: a 160 x 160 pixel fixed-scale crop around the target building]

Size-adaptive: the crop size depends on the building, putting more emphasis on small buildings. The patch is size-adaptively cropped and then resized; see the sketch below.

[Figure: fixed-scale (160x160) crop vs. its size-adaptively cropped and resized version]
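A sketch of the size-adaptive crop. It assumes the 0.4 m/pixel ground resolution implied by the fixed-scale slide (160 px = 64 m), a building footprint given as a pixel-space size, and a margin factor of 1.5; all of these are illustrative assumptions.

```python
from PIL import Image

def size_adaptive_crop(image: Image.Image, center_xy, building_wh_px,
                       margin: float = 1.5, out_size: int = 160) -> Image.Image:
    """Crop proportionally to the building's size, then resize to a fixed input,
    so small buildings fill more of the patch than in a fixed 160x160 crop."""
    cx, cy = center_xy
    half = int(max(building_wh_px) * margin) // 2  # crop side scales with building
    box = (cx - half, cy - half, cx + half, cy + half)
    patch = image.crop(box)  # PIL zero-pads regions that fall outside the image
    return patch.resize((out_size, out_size), Image.BILINEAR)
```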


Multi-scale CNN: a single network that sees its input at multiple scales, inspired by Zagoruyko+ (CVPR15); a sketch follows.
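One plausible reading, in the spirit of Zagoruyko & Komodakis's central-surround two-stream network: one stream sees the whole patch downsampled, the other a central crop at full resolution, with features fused before classification. The exact scale arrangement in the paper may differ, and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_stream():
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),
        nn.AdaptiveAvgPool2d(4),
        nn.Flatten(),
    )

class MultiScaleCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.surround = make_stream()  # coarse scale: whole patch, downsampled
        self.central = make_stream()   # fine scale: central crop, full resolution
        self.classifier = nn.Linear(2 * 64 * 4 * 4, num_classes)

    def forward(self, x):              # x: (N, 3, 160, 160)
        n, c, h, w = x.shape
        coarse = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                               align_corners=False)
        center = x[:, :, h // 4: 3 * h // 4, w // 4: 3 * w // 4]
        feats = torch.cat([self.surround(coarse), self.central(center)], dim=1)
        return self.classifier(feats)
```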

Experiment

Experimental settings, kept the same across all conditions (input scenario and input scale):

Amount of data:

◦ Randomly sampled 8,500 pairs; balanced class distribution

Evaluation:

◦ 5-fold cross-validation (train/val per fold = 6,800 : 1,700)

◦ Classification accuracy

CNN hyper-parameters:

◦ Conv-Pool-Conv-Pool-Conv-Conv-FC-FC with ReLU nonlinearity

◦ No batch normalization (due to degradation in performance)

◦ SGD with momentum; constant learning rate; weight decay

◦ Shared or unshared weights between branches in the Siamese case

Data pre-processing:

◦ Zero mean and unit variance (pixel-wise)

◦ Augmentation with vertical and horizontal flips

A sketch of this setup follows.
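One way to realize this setup in PyTorch. The layer order follows the slide (Conv-Pool-Conv-Pool-Conv-Conv-FC-FC with ReLU, no batch norm); channel widths, kernel sizes, and the optimizer's numeric values are illustrative assumptions not given on the slide, and the normalization is read here as per-image, per-channel standardization.

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Conv2d(6, 32, kernel_size=5), nn.ReLU(),   # Conv (stacked pre+post input)
    nn.MaxPool2d(2),                              # Pool
    nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(),  # Conv
    nn.MaxPool2d(2),                              # Pool
    nn.Conv2d(64, 96, kernel_size=3), nn.ReLU(),  # Conv
    nn.Conv2d(96, 96, kernel_size=3), nn.ReLU(),  # Conv
    nn.Flatten(),
    nn.LazyLinear(512), nn.ReLU(),                # FC
    nn.Linear(512, 2),                            # FC -> {surviving, washed-away}
)
model(torch.zeros(1, 6, 160, 160))  # materialize the lazy layer before optimizing

# SGD with momentum, constant learning rate, weight decay (as on the slide);
# the numeric values are placeholders.
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4)

def preprocess(x, train=True):
    """Zero mean / unit variance per image and channel, plus random flips."""
    x = (x - x.mean(dim=(2, 3), keepdim=True)) / (x.std(dim=(2, 3), keepdim=True) + 1e-8)
    if train:
        if torch.rand(()) < 0.5:
            x = torch.flip(x, dims=[2])  # vertical flip
        if torch.rand(()) < 0.5:
            x = torch.flip(x, dims=[3])  # horizontal flip
    return x
```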

Accuracy

[Figure: classification accuracy (93.0% to 96.0%) for the three input scenarios (6-channel, Siamese, only post image), each with and without multi-scale, evaluated with fixed-scale, adaptive-scale, and ensemble inputs]

Accuracy

[Figure: example patches compared across scales: a building wrongly classified as "surviving" at the fixed scale but correctly classified as "washed-away" at the adaptive scale, and vice versa]

The different scales are complementary w.r.t. prediction.


Accuracy: in terms of input scenario

[Figure: accuracy (93.0% to 96.0%) comparing both pre/post images vs. only post image]

Only post-tsunami images may be sufficient in the context of washed-away building detection.

Accuracy: in terms of input scale

[Figure: accuracy (93.0% to 96.0%) for fixed-scale, adaptive-scale, and ensemble inputs]

There is no clear winner between fixed-scale and adaptive-scale, but the ensemble is always better (a sketch of such an ensemble follows).
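A minimal sketch of the ensemble referred to above, assuming it averages the softmax outputs of the fixed-scale and size-adaptive models; the slide does not specify the combination rule, so plain probability averaging is an assumption, and `fixed_model` / `adaptive_model` are hypothetical trained networks.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_predict(fixed_model, adaptive_model, fixed_crop, adaptive_crop):
    # Average class probabilities from the two single-scale models
    p_fixed = F.softmax(fixed_model(fixed_crop), dim=1)
    p_adaptive = F.softmax(adaptive_model(adaptive_crop), dim=1)
    return ((p_fixed + p_adaptive) / 2).argmax(dim=1)  # 0: surviving, 1: washed-away
```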

Accuracy: in terms of input scale

[Figure: accuracy (93.0% to 96.0%) with and without multi-scale, for fixed-scale, adaptive-scale, and ensemble inputs]

Multi-scale CNNs always beat their single-scale counterparts.

Qualitative result

[Figure: CNN prediction vs. ground truth. RED: washed-away; YELLOW: surviving]

Conclusion

This work in a nutshell:

◦ Explored the effective use of CNNs for washed-away building detection.

◦ To this end, compiled a new labeled dataset (ABCD).

◦ On this dataset, performed experiments from an application viewpoint (input scenario and input scale).

◦ Overall, the accuracy was reasonably good.

Future work:

◦ Generalizability (other regions, other events, other data, and so on)

◦ An end-to-end framework from localization to classification, e.g., instance segmentation (Pinheiro+ ECCV16)

References:

◦ Cooner et al.: Detection of urban damage using remote sensing and machine learning algorithms: Revisiting the 2010 Haiti earthquake, in Remote Sensing, 2016.

◦ Gueguen and Hamid: Large-scale damage detection using satellite imagery, in CVPR, 2015.

◦ Simo-Serra et al.: Discriminative learning of deep convolutional feature point descriptors, in ICCV, 2015.

◦ Zagoruyko and Komodakis: Learning to compare image patches via convolutional neural networks, in CVPR, 2015.

◦ Lin et al.: Learning deep representations for ground-to-aerial geolocalization, in CVPR, 2015.

◦ Ministry of Land, Infrastructure, Transport, and Tourism: First report on an assessment of the damage caused by the Great East Japan earthquake, http://www.mlit.go.jp/common/000162533.pdf (published in Japanese)

◦ Pinheiro et al.: Learning to refine object segments, in ECCV, 2016.

◦ Nia and Mori: Building damage assessment using deep learning and ground-level image data, in CRV, 2017.

Acknowledgement: This presentation is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Appendix


Related work

Damage detection:

◦ Gueguen & Hamid (CVPR, 2015): pre- and post-disaster satellite images; semi-supervised classification (BoF + SVM)

◦ Cooner et al. (Remote Sensing, 2016): pre- and post-disaster satellite images; two-layer neural network

◦ Nia & Mori (CRV, 2017): only post-disaster ground-level images; dilated CNN

Related work

Two image patches as input to a CNN:

◦ Simo-Serra et al. (ICCV, 2015)

◦ Lin et al. (CVPR, 2015)

◦ Zagoruyko & Komodakis (CVPR, 2015)

[Figure: aerial images taken 114 days before the quake (17/Nov/2010) and 9 days after the quake (20/Mar/2011)]

Human annotations derived from the field survey are superimposed. Red arrows mark buildings that were washed away by the tsunami.

The visual changes to buildings caused by the tsunami are clear, but how can we describe them?

Cross validation result

[Figure: training curves over iterations (x10): accuracy (fixed_size/noAug), test loss (fixed_size/noAug), and train loss (fixed_size/noAug), plotted on a 0 to 1 accuracy/loss axis]

Size-adaptive classification

[Figure: example patches where size-adaptive classification improved the result, and examples where it worsened it]

Effect of # of shared layers on accuracy

[Figure: cross-validation mean accuracy (0.80 to 1.00) vs. number of conv layers shared (0 to 4)]

Visualization via Global Average Pooling

Global Average Pooling (GAP) is a technique for:

1) Regularizing big networks by forgoing FC layers (e.g., NIN and GoogLeNet)

2) Visualizing the feature maps learned by a network in a way that is intuitive for humans

3) Localizing object instances even in the weakly-supervised setting

Recent papers that delve into GAP:

◦ Learning Deep Features for Discriminative Localization (Zhou+, CVPR16)

◦ Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (Selvaraju+, arXiv16)

Visualization using GAP: Class Activation Map

◦ In Zhou+ (CVPR16): replace the FC layers with GAP and retrain (leading to some drop in classification accuracy, but improved localization power).

◦ In Selvaraju+ (arXiv16, Grad-CAM): retraining is unnecessary thanks to gradient backpropagation (the model is kept intact, so classification performance is not impaired).

A sketch of the original CAM computation follows.
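A sketch of the Class Activation Map from Zhou+ (CVPR16): for a network ending in conv -> GAP -> linear classifier, the map for class c is the classifier's weights for c taken as a weighted sum over the last conv feature maps. Shapes here are illustrative.

```python
import torch

def class_activation_map(feature_maps: torch.Tensor,
                         fc_weight: torch.Tensor,
                         class_idx: int) -> torch.Tensor:
    """feature_maps: (K, H, W) last-conv features of one image;
    fc_weight: (num_classes, K) weights of the linear layer after GAP."""
    # Weighted sum of feature maps with the class's classifier weights
    cam = torch.einsum("k,khw->hw", fc_weight[class_idx], feature_maps)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)  # normalize to [0, 1] for display
```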

[Figures: class activation maps for the resized (size-adaptive) and fixed-scale inputs]
