view2019 fulllength v2 - mprgmprg.jp/data/mprg/f_group/f20191205_goto.pdf · 2 Ö 8 q?c l ¬ ~ :[%]...

6 6

○ † † † † †:

{[email protected]., [email protected]., takayoshi@isc., fujiyoshi@isc.}chubu.ac.jp

a n A

An

l G No n

Loe No n a n

n nL G TL a 360 n

n a b

NSnL

G AnL , ,

Deep Convolutional Neural Network (DCNN)[1]a

n - 11a

i

No n i

a eo n

n A G N

o n DCNNFully Convolutional Network (FCN)[2]

Pyramid Scene Parsing Network (PSPNet)[3] Encoder-Decoder SegNet[4] U-Net[5] GAn

e DCNN RNN[6] S

DAG-RNN[7]GAn Lol

a

[8][9][10][11]n i a

[8][9][12]G No

G o n Lo

e No n a

l No n

nL G

TL a (

n

n a

D G n n

a a

nG d Dn L

o i u G

G n GAn L

C D

NSn [14]G No

n a

NSn e n

n

Dilated Convolution[13] n Dilated Convolution a C

L

G n Lo (

l b DnL G

D N

nL G n

( a

No No n

G n

GAn n

G n GAn TL

No

G C C Lo

c

G n

n (

n a

i NS

l No n

- 11

Fully Convolutional Network (FCN)[2] SegNet[4]GAn FCN a

No

An FCN a DCNNn 1×1

D n Dn

L n nL

G n e a

o G n

C GAn FCN aL n

n

a

S n Lo l

DnL G

G n SegNet a VGG16[15]

A Encoder Decoder S

n Encoder aNo

n SegNet aMax Pooilng C

i

Decoder a Encoder

e n L Encoder

Upsampling C Upsampling G No

a 0 n Lo

n DCNN RNN S

DAG-RNN[7]GAn DCNN a An

l C RNN SnL

D G n DAG-RNN a

n lo DAG-RNN

3 l D

n

No Deconvolutionn e

DnL G n Dilated Convolution[13]GAn Dilated Convolution a

o CL

G n Encoder-Decoder [4]

a G n

G n GAn e

DCNN [2] a

l C

t-2

t-4

t-3

入力出力

Convolution Dilated Convolution Max Pooling Deconvolution

t

t-1

t

DAG-LSTM 1×1Conv

DAG-LSTM 1×1Conv

DAG-LSTM 1×1Conv

DAG-LSTM 1×1Conv

DAG-LSTM 1×1Conv

DecoderEncoder

� 1(��!� %'��!� %'��

G No a DCNNRNN S Dilated ConvolutionnL

DnL G n

5 .

( a D

G n i u G

G n GA

n a D

n

NSn n

a ) DeconvolutionNo n a3×3

A a CM

n Max Pooling n e )

Deconvolution DAG-LSTMn DAG-LSTM nL

nL G

G n a

n

a a Dilated Convolution C Lo

G n

cn

�� !��

NSn RNNn NSn )

No n tt-4 l t e

NSn NSnL

n nL G

n

�� !��

NSn DAG-RNN with LSTM (DAG-LSTM) bDilated Convolution

n DAG-LSTM DAG-RNN a DCNN RNN S

An

RNN DCNN nL

D G

- 230

-9 4 8 9

n DAG-RNN a ElmanRNN[6] n RNN a

No G

C GAn a RNNLSTM[16] DAG-LSTM

n LSTM nL

nL G n

n

Dilated Convolution n Dilated Convolution 3

a 4 C 3×3

C

n Dilated Convolution a 5 C

3×3C e a 6 C

4 o o

Dn

L G n G G

i DnL G

n

�� #)*�!��

( a

��

t-1

Encoder

Σ

Decoder

LSTM

LSTM

LSTM

LSTM

t

時間方向の伝搬

空間方向の伝搬

(−#,+&) (+#,−&) (−#,−&)

DAG-LSTM : t-1

特徴マップ特徴マップ

特徴マップDAG-LSTM : t

(+#,+&)空間方向の伝搬

No Non L

n

G o G n

GAn

No G

G

n TL C

C n a n

C n

No G eon C n

n

C C

D a (1) C L

G n LL !"#$a Softmax Cross Entropy %#& ∙ %&( ∙ %)(a

An Tol Jensen-Shannon Divergense (JSD) n C n

! =!"#$ + -%.(%#& ∥ %&() + -%.(%#) ∥ %)() (1)

Lo G o

n G

n

n

No

C

cn

�� '+%$&(�

a 360 n

360 5 360a 4

5 360

l No n T

G No D G n

a

a 19 An e 360a 5171×1160 An

a 360 1466104 n Data

Augmentation u

NS n a Adam[17]n

��

n

IoU GAn cn a 1 n A

n 3435 67 n (2) L

G n

68 =3534 (9)

a M

An

a G n

GAn a

l nL

NSnL G n :; C

65 n (3) L G n

65 =1:=>;?@?

AB

?C((3)

IoU a

An IoU a

LabelLabel

Softmax Cross Entropy Softmax Cross Entropy

Label

Softmax Cross Entropy

出力��

��

��

Cameraloss

��ネットワーク

n [%] IoU

92.4 60.3 52.8 A 91.9 58.6 50.7

Dilated 92.2 61.5 54.1 A Dilated 94.3 62.4 56.2

No n :; C

E IoU F? n

(4) L G n

F? =1:=> ;?

@? + E?A

B

?C((4)

�� '+%$&( ��"��

360 n

n

n n

n 1 1 Dilated ConvolutionnL G n

CL

G n Dilated Convolution CL nL G

n 2 2

G

nL G n i SegNet

正解画像

FCN

PSPNet

提案手法

入力画像

SegNet

U-Net

� 6(360 ��'�� "&�'�#& �

2 [%]

IoU

FCN 92.5 58.5 52.1 SegNet 92.4 55.2 48.9 U-Net 93.8 60.2 54.4

PSPNet 93.5 61.6 55.1 94.3 62.4 56.2

IoU G 7% nL G n G

a An Dn 6

6G nL G n L

L l a DnL

G n Dn e SegNet FCNa

N i

G nL G n L L

l i DnL G n D

n

�� #)*��!��!��

n

7 7a

3 n [%]

IoU

93.8 61.2 55.1 94.3 62.4 56.2

G

o L G n

a

G o nL G

n e 3n 3c a 1% G

n G 360 n

a An Dn

a b

NSn

a 5NSnL

n

DnL G a

DAG-LSTM DAG-RNN a

G C GAnG RNN LSTM DnL

e Dilated Convolution A S

��

��

��

� 7(�"$��"&�'�#& ��

nL DnL G

a

G n C nL L G N

SnL 360 n

a b

G

C

[1] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, ‘‘Gradient-Based Learning Applied to Document Recognition,’’ in Proceedings of the IEEE, vol. 86, no.11, pp.2278-2324,1998.

[2] J. Long, E. Shelhamer, T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,’’ IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[3] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia,‘‘Pyramid Scene Parsing Network,’’ IEEE Conference on Computer Vision and Pattern recognition (CVPR), 2017.

[4] V. Badrinarayanan, A. Kendall, R. Cipolla, ‘‘SegNet A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,’’ arXiv:1612.01105, 2016.

[5] O.Ronneberger, P. Fischer, T. Brox, ‘‘U-Net Convolutional Networks for Biomedical Image Segmentation,’’ Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.

[6] J. Elman, ‘‘Finding Structure in Time,’’ Cognitive Science. Vol. 14, no. 2, pp.1735-1780, 1997.

[7] B. Shuai, Z. Zuo, G. Wang, B. Wang, ‘‘DAG-Recurrent Neural Networks For Scence Labeling,’’ Conference on Computer Vision and Pattern Recognition (CVPR), 2015.

[8] G. Brostow, J. Shotton, J. Fauqueur, R. Cipolla, ‘‘Segmentaion and Recognition Using Structure from Motion Point Clouds,’’ European Conference on Computer Vision (ECCV), 2008.

[9] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benensen, B. Schiele, ‘‘The cityscapes dataset for semantic urban scene understanding,’’ IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.

[10] M. Everingham, A. S. M. Eslami, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal Visual Object Classes Challenge A Retro spective,’’ International Journal of Computer Vision (IJCV), 2014.

[11] N.Silberman, D. Hoiem, P. Kohil, R. Fergus, ‘‘Indoor Segmentation and Support Inference from RGBD Images,’’European Conference on Computer Vision (ECCV), 2012.

[12] G. Neuhold, T. Ollmann, S. R. Bulo, P. Kontschieder, ‘‘The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes,’’ International Conference on Computer Vision (ICCV), 2017.

[13] F. Yu, V. Koltun, ‘‘Multi-scale context aggregation by

dilated convolutions,’’ International Conference on Learning Representation (ICLR), 2016.

[14] , , , , , , “ n

”, t , 2018.

[15] K. Simonyan, A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ International Conference on Learning Representation (ICLR), 2015.

[16] S.Hochreiter, ‘‘Long Short-Term Memory,’’ Neural Computation. Vol. 9, no. 8, pp.1735-1780,1997.

[17] D. Kingma, J. Ba, ‘‘Adam A method for stochastic optimization,’’ arXiv preprint arXiv:1412:6980, 2014.

2018

2013 20142017

(~2019 ) 2017 2019

20022002 v 2009

( ) 20142017

19971997~2000Postdoctoral Fellow.2000 2004

2005~2006

view2019 fulllength v2 - mprgmprg.jp/data/mprg/f_group/f20191205_goto.pdf · 2 Ö 8 q?c l ¬ ~ :[%]...

Documents