view2019 fulllength v2 - mprgmprg.jp/data/mprg/f_group/f20191205_goto.pdf · 2 Ö 8 q?c l ¬ ~ :[%]...
TRANSCRIPT
6 6
○ † † † † †:
{[email protected]., [email protected]., takayoshi@isc., fujiyoshi@isc.}chubu.ac.jp
a n A
An
l G No n
Loe No n a n
n nL G TL a 360 n
n a b
NSnL
G AnL , ,
Deep Convolutional Neural Network (DCNN)[1]a
n - 11a
i
No n i
a eo n
n A G N
o n DCNNFully Convolutional Network (FCN)[2]
Pyramid Scene Parsing Network (PSPNet)[3] Encoder-Decoder SegNet[4] U-Net[5] GAn
e DCNN RNN[6] S
DAG-RNN[7]GAn Lol
a
[8][9][10][11]n i a
[8][9][12]G No
G o n Lo
e No n a
l No n
nL G
TL a (
n
n a
D G n n
a a
nG d Dn L
o i u G
G n GAn L
C D
NSn [14]G No
n a
NSn e n
n
Dilated Convolution[13] n Dilated Convolution a C
L
G n Lo (
l b DnL G
D N
nL G n
( a
No No n
G n
GAn n
G n GAn TL
No
G C C Lo
c
G n
n (
n a
i NS
l No n
- 11
Fully Convolutional Network (FCN)[2] SegNet[4]GAn FCN a
No
An FCN a DCNNn 1×1
D n Dn
L n nL
G n e a
o G n
C GAn FCN aL n
n
a
S n Lo l
DnL G
G n SegNet a VGG16[15]
A Encoder Decoder S
n Encoder aNo
n SegNet aMax Pooilng C
i
Decoder a Encoder
e n L Encoder
Upsampling C Upsampling G No
a 0 n Lo
n DCNN RNN S
DAG-RNN[7]GAn DCNN a An
l C RNN SnL
D G n DAG-RNN a
n lo DAG-RNN
3 l D
n
No Deconvolutionn e
DnL G n Dilated Convolution[13]GAn Dilated Convolution a
o CL
G n Encoder-Decoder [4]
a G n
G n GAn e
DCNN [2] a
l C
t-2
t-4
t-3
入力 出力
Convolution Dilated Convolution Max Pooling Deconvolution
t
t-1
t
DAG-LSTM 1×1Conv
DAG-LSTM 1×1Conv
DAG-LSTM 1×1Conv
DAG-LSTM 1×1Conv
DAG-LSTM 1×1Conv
DecoderEncoder
� 1(����!� %'��!� %'��
G No a DCNNRNN S Dilated ConvolutionnL
DnL G n
5 .
( a D
G n i u G
G n GA
n a D
n
NSn n
a ) DeconvolutionNo n a3×3
A a CM
n Max Pooling n e )
Deconvolution DAG-LSTMn DAG-LSTM nL
nL G
G n a
n
a a Dilated Convolution C Lo
G n
cn
����� ����!����
NSn RNNn NSn )
No n tt-4 l t e
NSn NSnL
n nL G
n
����� ����!����
NSn DAG-RNN with LSTM (DAG-LSTM) bDilated Convolution
n DAG-LSTM DAG-RNN a DCNN RNN S
An
RNN DCNN nL
D G
- 230
-9 4 8 9
n DAG-RNN a ElmanRNN[6] n RNN a
No G
C GAn a RNNLSTM[16] DAG-LSTM
n LSTM nL
nL G n
n
Dilated Convolution n Dilated Convolution 3
a 4 C 3×3
C
n Dilated Convolution a 5 C
3×3C e a 6 C
4 o o
Dn
L G n G G
i DnL G
n
��� � #)*�!����
( a
������������ ������� ������� �� ����������� �������� ������� �� �����������
t-1
Encoder
Σ
Decoder
LSTM
LSTM
LSTM
LSTM
t
時間方向の伝搬
空間方向の伝搬
(−#,+&) (+#,−&) (−#,−&)
DAG-LSTM : t-1
特徴マップ 特徴マップ
特徴マップDAG-LSTM : t
(+#,+&)空間方向の伝搬
No Non L
n
G o G n
GAn
No G
G
n TL C
C n a n
C n
No G eon C n
n
C C
D a (1) C L
G n LL !"#$a Softmax Cross Entropy %#& ∙ %&( ∙ %)(a
An Tol Jensen-Shannon Divergense (JSD) n C n
! =!"#$ + -%.(%#& ∥ %&() + -%.(%#) ∥ %)() (1)
Lo G o
n G
n
n
No
C
cn
����� � '+%$&(�
a 360 n
360 5 360a 4
5 360
l No n T
G No D G n
a
a 19 An e 360a 5171×1160 An
a 360 1466104 n Data
Augmentation u
NS n a Adam[17]n
����� �����
n
IoU GAn cn a 1 n A
n 3435 67 n (2) L
G n
68 =3534 (9)
a M
An
a G n
GAn a
l nL
NSnL G n :; C
65 n (3) L G n
65 =1:=>;?@?
AB
?C((3)
IoU a
An IoU a
LabelLabel
Softmax Cross Entropy Softmax Cross Entropy
Label
Softmax Cross Entropy
出力��
��
��
Cameraloss
����ネットワーク
n [%] IoU
92.4 60.3 52.8 A 91.9 58.6 50.7
Dilated 92.2 61.5 54.1 A Dilated 94.3 62.4 56.2
No n :; C
E IoU F? n
(4) L G n
F? =1:=> ;?
@? + E?A
B
?C((4)
����� � ��� �'+%$&( ��"��� �
360 n
n
n n
n 1 1 Dilated ConvolutionnL G n
CL
G n Dilated Convolution CL nL G
n 2 2
G
nL G n i SegNet
正解画像
FCN
PSPNet
提案手法
入力画像
SegNet
U-Net
� 6(360 ��'��� ������"&�'�#& �
2 [%]
IoU
FCN 92.5 58.5 52.1 SegNet 92.4 55.2 48.9 U-Net 93.8 60.2 54.4
PSPNet 93.5 61.6 55.1 94.3 62.4 56.2
IoU G 7% nL G n G
a An Dn 6
6G nL G n L
L l a DnL
G n Dn e SegNet FCNa
N i
G nL G n L L
l i DnL G n D
n
����� � #)*��!���!���
n
7 7a
3 n [%]
IoU
93.8 61.2 55.1 94.3 62.4 56.2
G
o L G n
a
G o nL G
n e 3n 3c a 1% G
n G 360 n
a An Dn
a b
NSn
a 5NSnL
n
DnL G a
DAG-LSTM DAG-RNN a
G C GAnG RNN LSTM DnL
e Dilated Convolution A S
��
��������
�������
� 7(�"$�����������"&�'�#& ���
nL DnL G
a
G n C nL L G N
SnL 360 n
a b
G
C
[1] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, ‘‘Gradient-Based Learning Applied to Document Recognition,’’ in Proceedings of the IEEE, vol. 86, no.11, pp.2278-2324,1998.
[2] J. Long, E. Shelhamer, T. Darrell, ‘‘Fully convolutional networks for semantic segmentation,’’ IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[3] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia,‘‘Pyramid Scene Parsing Network,’’ IEEE Conference on Computer Vision and Pattern recognition (CVPR), 2017.
[4] V. Badrinarayanan, A. Kendall, R. Cipolla, ‘‘SegNet A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,’’ arXiv:1612.01105, 2016.
[5] O.Ronneberger, P. Fischer, T. Brox, ‘‘U-Net Convolutional Networks for Biomedical Image Segmentation,’’ Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015.
[6] J. Elman, ‘‘Finding Structure in Time,’’ Cognitive Science. Vol. 14, no. 2, pp.1735-1780, 1997.
[7] B. Shuai, Z. Zuo, G. Wang, B. Wang, ‘‘DAG-Recurrent Neural Networks For Scence Labeling,’’ Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[8] G. Brostow, J. Shotton, J. Fauqueur, R. Cipolla, ‘‘Segmentaion and Recognition Using Structure from Motion Point Clouds,’’ European Conference on Computer Vision (ECCV), 2008.
[9] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benensen, B. Schiele, ‘‘The cityscapes dataset for semantic urban scene understanding,’’ IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[10] M. Everingham, A. S. M. Eslami, L. Van Gool, C.K.I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal Visual Object Classes Challenge A Retro spective,’’ International Journal of Computer Vision (IJCV), 2014.
[11] N.Silberman, D. Hoiem, P. Kohil, R. Fergus, ‘‘Indoor Segmentation and Support Inference from RGBD Images,’’European Conference on Computer Vision (ECCV), 2012.
[12] G. Neuhold, T. Ollmann, S. R. Bulo, P. Kontschieder, ‘‘The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes,’’ International Conference on Computer Vision (ICCV), 2017.
[13] F. Yu, V. Koltun, ‘‘Multi-scale context aggregation by
dilated convolutions,’’ International Conference on Learning Representation (ICLR), 2016.
[14] , , , , , , “ n
”, t , 2018.
[15] K. Simonyan, A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ International Conference on Learning Representation (ICLR), 2015.
[16] S.Hochreiter, ‘‘Long Short-Term Memory,’’ Neural Computation. Vol. 9, no. 8, pp.1735-1780,1997.
[17] D. Kingma, J. Ba, ‘‘Adam A method for stochastic optimization,’’ arXiv preprint arXiv:1412:6980, 2014.
2018
2013 20142017
(~2019 ) 2017 2019
20022002 v 2009
( ) 20142017
19971997~2000Postdoctoral Fellow.2000 2004
2005~2006