

Page 1:

16/08/08 SWoPP2016

Page 2:

Deep Learning (DL)

¤ Neural Network (NN)
¤ TSUBAME2.5, 16 nodes (48 GPU): training a 15-layer CNN takes 199 hours (8.2 days)

[Figure: an input image (source: http://image-net.org/) is fed to a Neural Network, which outputs class scores: Barn swallow = 0.95, Police dog = 0.03, Water beetle = 0.01, …]

Page 3:

Stochastic Gradient Descent

¤ Each step takes m samples (a mini-batch), computes their gradients ∇E_i, and updates the NN weights W(t)

$$W^{(t+1)} = W^{(t)} - \eta \sum_{i=1}^{m} \nabla E_i\left(W^{(t)}\right)$$

[Figure: error surface E with per-sample errors E1, E2, E3 (m = 3); the steps −η∇E1, −η∇E2, −η∇E3 add up to −ηΣi∇Ei, which moves W(t) to W(t+1)]
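The update rule on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the SPRINT implementation: the function names `sgd_step` and `quad_grad` and the quadratic per-sample error are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

def sgd_step(W, grad_fn, batch, eta):
    """One update W(t+1) = W(t) - eta * sum_i grad E_i(W(t)) over a mini-batch."""
    total = np.zeros_like(W)
    for sample in batch:
        total += grad_fn(W, sample)
    return W - eta * total

def quad_grad(W, s):
    """Gradient of the hypothetical error E_i(W) = 0.5 * ||W - s_i||^2."""
    return W - s

W0 = np.zeros(2)
# m = 3 samples, as in the slide's figure
batch = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
W1 = sgd_step(W0, quad_grad, batch, eta=0.1)
```

With these inputs the three gradients sum to (-2, -2), so the step moves the weights from (0, 0) to (0.2, 0.2).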

Page 4:

[Figure: top-5 error [%] versus Epoch (0 to 600) for an 11-layer CNN trained on ImageNet2012, series T2 and GW2; lower is better]

                        T2      GW2
Number of GPUs          48      1
Training time           6.04    43.8
Epoch time              910     28235
Mini-batch size         1076    50
Top-5 error [%]         17.2    15.7

Page 5:

SuPeRcomputation In Neural Training (SPRINT)

¤ GPU-based training with Asynchronous SGD (ASGD)

$$W^{(t+1)} = W^{(t)} - \eta \sum_i \nabla E_i^{(t)}, \quad W^{(t+2)} = W^{(t+1)} - \eta \sum_i \nabla E_i^{(t+1)}, \quad W^{(t+3)} = W^{(t+2)} - \eta \sum_i \nabla E_i^{(t+2)}$$

[Figure: timeline in which the GPU threads compute ∇E_i(t+1), ∇E_i(t+2), ∇E_i(t+3) while the update thread produces W(t+1), W(t+2)]

Page 6:

¤ Problem: ASGD training of a Convolutional Neural Network (CNN)
¤ Proposal: an empirical performance model of the DL system "SPRINT" that predicts the Epoch time and the average mini-batch size
¤ Results:
¤ On TSUBAME 2.5 and TSUBAME-KFC/DL, the Epoch time and the average mini-batch size are predicted with errors of 6% and 8%
¤ For several CNNs, the gradient computation time is predicted with an error of 12%

Page 7:

SPRINT

Page 8:

SPRINT

[Diagram: one node with a CPU, GPUs and an SSD; each GPU thread runs the CNN, and the update thread combines the results with Allreduce and Momentum, guarded by a Mutex]

Page 9:

SPRINT: GPU thread

Steps 1) to 3) are marked on the diagram. The legible parts mention N_Subbatch, the SSD and a size of 49 MB.

[Diagram: same layout as the previous slide (CPU, GPU threads, SSD, CNN, Allreduce, Momentum, Mutex) with the GPU thread steps (1), (2), (3) highlighted]

Page 10:

SPRINT: update thread

Steps a) to c) are marked on the diagram. Step c) involves Allreduce.

[Diagram: same layout as the previous slide (CPU, GPU threads, SSD, CNN, Allreduce, Momentum, Mutex) with the update thread steps (a), (b), (c) highlighted]


Page 12:

1. Inputs: the number of nodes N_Node, the number of GPUs per node N_GPU, and the CNN structure
2. Predict T_GPU and T_Update, the times of one iteration of a GPU thread and of the update thread
3. From T_GPU and T_Update, predict the Epoch time T_Epoch and the average mini-batch size N_Minibatch_avg

[Diagram: (N_Node, N_GPU, …) and the CNN structure (L, {x_l}, {m_l}, …) feed the T_GPU and T_Update models, which in turn feed the T_Epoch and N_Minibatch_avg models]

Page 13:

CNN structure

¤ Convolution layers followed by fully-connected layers and a softmax output
¤ Example: VGG [Simonyan, 2014]

Symbols: L (number of layers), L_C (number of convolution layers), x_l (feature-map width of layer l), m_l (number of feature maps of layer l), c (convolution kernel width), p_l (pooling width of layer l)

[Diagram: layer 0 (x_0, m_0), then layers l−1 and l (convolution with a c × c kernel gives width x_{l−1} − c + 1, followed by pooling with p_l) up to layer L_C, then fully-connected layers up to layer L and softmax]

Page 14:

T_GPU and T_Update

$$W^{(t+1)} = W^{(t)} - \eta \sum_i \nabla E_i^{(t)}, \quad W^{(t+2)} = W^{(t+1)} - \eta \sum_i \nabla E_i^{(t+1)}, \quad W^{(t+3)} = W^{(t+2)} - \eta \sum_i \nabla E_i^{(t+2)}$$

[Figure: timeline of the GPU thread (period T_GPU) and the update thread (period T_Update)]

$$T_{GPU} = T_{LoadImage} + T_{ComputeGradient} + T_{UpdateGradient} + \dots$$

$$T_{Update} = T_{SumGradient} + T_{Allreduce} + T_{UpdateWeights} + \dots$$

Page 15:

GPU gradient computation time (T_ComputeGradient)

¤ SGEMM kernels
¤ convolution: feed-forward computation of a convolution layer
¤ fc: feed-forward computation of a fully-connected layer
¤ dedw, dedb, dedx_*: back-propagation

¤ CUDA kernels
¤ im2col: layout conversion for convolution layers
¤ activation: activation function
¤ pooling: max-pooling
¤ c2f: layout conversion from convolution layers to fully-connected layers

Each kernel's time depends on the CNN structure and on N_Subbatch.

Page 16:

GPU gradient computation time (T_ComputeGradient)

1. SGEMM kernels
¤ Modeled as a function of the matrix sizes m, n, k
¤ Benchmarked on the grid of sizes 2^{x/2} × 2^{y/2} × 2^{z/2}

$$T_{convolution}(l) = \alpha_{mnk} c^2 m_{l-1} m_l \tilde{x}_l^2 N_{Subbatch} + \alpha_{mn} m_l \tilde{x}_l^2 N_{Subbatch} + \alpha_{mk} c^2 m_{l-1} \tilde{x}_l^2 N_{Subbatch} + \alpha_{nk} c^2 m_{l-1} m_l + \alpha_m \tilde{x}_l^2 N_{Subbatch} + \alpha_n m_l + \alpha_k c^2 m_{l-1} + \beta$$

[Figure: two panels of cublasSgemm execution time over log2(m), log2(n), log2(k); color scale from 2e−16 sec to 2e0 sec]
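The SGEMM model on this slide is linear in the terms mnk, mn, mk, nk, m, n, k plus a constant β, so its coefficients can be obtained by least squares from benchmark timings. The sketch below assumes the mapping m = x̃_l² N_Subbatch, n = m_l, k = c² m_{l−1}, which is inferred from matching the terms of T_convolution and is not stated on the slide. The timings are synthetic, generated from made-up coefficients; real ones would come from timing cublasSgemm on each grid point.

```python
import numpy as np

def gemm_features(m, n, k):
    """Terms of the linear model: mnk, mn, mk, nk, m, n, k and the constant."""
    return [m * n * k, m * n, m * k, n * k, m, n, k, 1.0]

def conv_gemm_shape(x_out, c, m_prev, m_out, n_subbatch):
    """GEMM sizes (m, n, k) implied by the terms of T_convolution (inferred)."""
    return x_out**2 * n_subbatch, m_out, c**2 * m_prev

# Benchmark grid of sizes 2^(x/2), rounded to integers (range is illustrative).
sizes = [int(round(2 ** (x / 2))) for x in range(13)]
grid = [(m, n, k) for m in sizes for n in sizes for k in sizes]
A = np.array([gemm_features(*p) for p in grid], dtype=float)

# Synthetic timings from hypothetical coefficients (alpha_mnk, ..., beta).
true_coef = np.array([2e-9, 1e-9, 1e-9, 1e-9, 1e-8, 1e-8, 1e-8, 5e-6])
t_measured = A @ true_coef

# Least-squares fit of the eight coefficients.
coef, *_ = np.linalg.lstsq(A, t_measured, rcond=None)

def t_convolution(x_out, c, m_prev, m_out, n_subbatch):
    """Predicted SGEMM time of one convolution layer under the fitted model."""
    shape = conv_gemm_shape(x_out, c, m_prev, m_out, n_subbatch)
    return float(np.dot(gemm_features(*shape), coef))
```

A call such as `t_convolution(14, 3, 64, 64, 8)` then evaluates the model for a layer with 14 × 14 outputs, 3 × 3 kernels, 64 input and output maps, and N_Subbatch = 8.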

Page 17:

GPU gradient computation time (T_ComputeGradient)

2. CUDA kernels
¤ Modeled as a linear function of the amount of data processed

$$T_{im2col}(l) = \alpha \tilde{x}_l^2 c^2 m_{l-1} N_{Subbatch} + \beta \quad (l \le L_C)$$

Page 18:

Gradient summation time in the update thread (T_SumGradient)

$$T_{SumGradient} = \alpha \times N_{Param} \times N_{GPU} \times \min(T_{Update} / T_{GPU}, 1)$$

Case T_Update > T_GPU

[Figure: timeline of the GPU threads (period T_GPU) against the update thread (period T_Update); the number of gradients summed in successive updates is 2, then 3]

Page 19:

Gradient summation time in the update thread (T_SumGradient)

$$T_{SumGradient} = \alpha \times N_{Param} \times N_{GPU} \times \min(T_{Update} / T_{GPU}, 1)$$

Case T_Update < T_GPU

[Figure: timeline of a GPU thread (period T_GPU) against the update thread (period T_Update); the number of gradients summed per update is 1 with probability T_Update / T_GPU and 0 with probability 1 − T_Update / T_GPU]
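The T_SumGradient formula is a direct product of four factors, shown here as a small function. The reading of min(T_Update / T_GPU, 1) as the chance that a given GPU has a new gradient ready for an update is taken from the two timeline cases; the numeric values in the example are hypothetical, not measurements from the slides.

```python
def t_sum_gradient(alpha, n_param, n_gpu, t_update, t_gpu):
    """T_SumGradient = alpha * N_Param * N_GPU * min(T_Update / T_GPU, 1)."""
    ready_fraction = min(t_update / t_gpu, 1.0)
    return alpha * n_param * n_gpu * ready_fraction

# Hypothetical values: alpha in seconds per parameter, 1e7 parameters, 8 GPUs.
slow_gpu = t_sum_gradient(1e-10, 1e7, 8, t_update=0.2, t_gpu=0.8)   # T_Update < T_GPU
fast_gpu = t_sum_gradient(1e-10, 1e7, 8, t_update=0.8, t_gpu=0.2)   # T_Update > T_GPU
```

In the second call the ratio is capped at 1, so the cost stops growing once every GPU delivers a gradient per update.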

Page 20:

Allreduce waiting time in the update thread (T_Barrier)

¤ Each node sums a different number of gradients, so nodes wait for each other at the Allreduce
¤ X_1 ~ B(N_GPU, p = min(T_Update / T_GPU, 1)), where F(i) is the cumulative distribution function of X_1

$$X_M = \max(X_1, X_2, \dots, X_{N_{Node}})$$

$$T_{Barrier} = \alpha E(X_M - X_1) = \alpha \left\{ N_{GPU}(1 - p) - \sum_{i=1}^{N_{GPU}} F(i)^{N_{Node}} \right\}$$

[Figure: three update threads (N_Node = 3) summing X_1, X_2, X_3 gradients and then waiting T_Barrier at MPI_Allreduce]
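The closed form for E(X_M − X_1) can be checked numerically against a brute-force expectation. The sketch below assumes that the X_j are independent and that F(i) means P(X_1 < i), that is P(X_1 ≤ i − 1); under that reading the sum from i = 1 to N_GPU equals N_GPU − E(X_M). The slide's definition of F(i) is not legible, so this convention is an assumption.

```python
from itertools import product
from math import comb, prod

def binom_pmf(n, p):
    """Probabilities P(X = k) for X ~ B(n, p), k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def gap_closed_form(n_gpu, n_node, p):
    """N_GPU * (1 - p) - sum_{i=1}^{N_GPU} F(i)^N_Node with F(i) = P(X_1 < i)."""
    pmf = binom_pmf(n_gpu, p)
    cdf, total = 0.0, 0.0
    for i in range(1, n_gpu + 1):
        cdf += pmf[i - 1]            # cdf is now P(X_1 <= i - 1)
        total += cdf ** n_node
    return n_gpu * (1 - p) - total

def gap_brute_force(n_gpu, n_node, p):
    """E(max_j X_j - X_1) by enumerating every outcome of the N_Node nodes."""
    pmf = binom_pmf(n_gpu, p)
    expectation = 0.0
    for xs in product(range(n_gpu + 1), repeat=n_node):
        expectation += prod(pmf[x] for x in xs) * (max(xs) - xs[0])
    return expectation

closed = gap_closed_form(n_gpu=3, n_node=3, p=0.4)
brute = gap_brute_force(n_gpu=3, n_node=3, p=0.4)
```

Multiplying either value by α gives T_Barrier. With p = 1 every node sums N_GPU gradients and the expected gap is zero.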

Page 21:

¤ The average mini-batch size N_Minibatch_avg and the Epoch time T_Epoch are derived from T_GPU and T_Update

$$N_{Minibatch\_avg} = N_{Node} \times N_{GPU} \times N_{Subbatch} \times \frac{T_{Update}}{T_{GPU}}$$

$$T_{Epoch} = \frac{N_{File} \times T_{Update}}{N_{Minibatch\_avg}} = \frac{N_{File} \times T_{GPU}}{N_{Node} \times N_{GPU} \times N_{Subbatch}}$$
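The two formulas on this slide translate directly into code, which also makes it easy to confirm that both forms of T_Epoch agree. The timing values below are hypothetical; N_File is set to 1,281,167, the number of ILSVRC2012 training images, which is an assumption about the data set size rather than a figure taken from the slides.

```python
def n_minibatch_avg(n_node, n_gpu, n_subbatch, t_update, t_gpu):
    """N_Minibatch_avg = N_Node * N_GPU * N_Subbatch * T_Update / T_GPU."""
    return n_node * n_gpu * n_subbatch * t_update / t_gpu

def t_epoch(n_file, t_update, minibatch_avg):
    """T_Epoch = N_File * T_Update / N_Minibatch_avg."""
    return n_file * t_update / minibatch_avg

# Hypothetical configuration: 8 nodes with 8 GPUs each, N_Subbatch = 8.
n_node, n_gpu, n_subbatch = 8, 8, 8
t_gpu, t_update = 0.7, 0.2          # seconds, made up for illustration
n_file = 1_281_167                  # assumed ILSVRC2012 training set size

minibatch = n_minibatch_avg(n_node, n_gpu, n_subbatch, t_update, t_gpu)
epoch = t_epoch(n_file, t_update, minibatch)
# Second form: T_Epoch = N_File * T_GPU / (N_Node * N_GPU * N_Subbatch)
epoch_alt = n_file * t_gpu / (n_node * n_gpu * n_subbatch)
```

Note that T_Update cancels out of T_Epoch: in this model the Epoch time depends only on T_GPU and the total number of samples processed in parallel.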


Page 23:

Evaluation environment

¤ SPRINT on TSUBAME 2.5 and TSUBAME-KFC/DL
¤ CNNs with 15 to 17 layers (CNN-A, CNN-B, CNN-C)
¤ ILSVRC2012 data set

¤ TSUBAME 2.5 (N_Node = 1, 2, 4, …, 64)
¤ Intel Xeon X5670 × 2 (281 GFlop/s single precision in total)
¤ NVIDIA Tesla K20x × 3 (11.8 TFlop/s single precision in total) → N_GPU = 3
¤ 4xQDR InfiniBand × 2 (8 GB/s in total)

¤ TSUBAME-KFC/DL (N_Node = 1, 2, 4, …, 16)
¤ Intel Xeon E5-2620 v2 × 2 (403 GFlop/s single precision in total)
¤ NVIDIA Tesla K80 × 4 (34.9 TFlop/s single precision in total) → N_GPU = 8
¤ 4xFDR InfiniBand (7 GB/s)

Page 24:

Gradient computation time (T_ComputeGradient)

¤ On TSUBAME 2.5 and TSUBAME-KFC/DL, the gradient computation time of the 15 to 17-layer CNNs is predicted with an error of 12%

[Figure: T_ComputeGradient [sec] versus N_Subbatch on TSUBAME-KFC/DL (left, N_Subbatch 2 to 10) and TSUBAME 2.5 (right, N_Subbatch 1 to 5); measured and predicted curves for CNN-A, CNN-B and CNN-C]

Page 25:

GPU thread time (T_GPU)

¤ On TSUBAME 2.5 and TSUBAME-KFC/DL, prediction errors of 16% and 7% are reported

[Figure: measured and predicted T_GPU [sec] (0.0 to 1.0) for CNN-A (15 layers) on TSUBAME-KFC/DL, one stacked bar per (N_Node, N_Subbatch) with N_Node in {1, 2, 4, 8, 16} and N_Subbatch in {1, 4, 8, 11}; components: UpdateGradient, LockGradient_G, ComputeUpdateVal, ComputeGradient, DeformImage, LoadImage, FetchWeights, LockWeights_G]

Page 26:

Update thread time (T_Update)

¤ On TSUBAME 2.5 and TSUBAME-KFC/DL, prediction errors of 33% and 11% are reported

[Figure: measured and predicted T_Update [sec] (0.00 to 0.30) for CNN-A (15 layers) on TSUBAME-KFC/DL, one stacked bar per (N_Node, N_Subbatch) with N_Node in {1, 2, 4, 8, 16} and N_Subbatch in {1, 4, 8, 11}; components: UpdateWeights, LockWeights_U, UpdateMomentum, Allreduce, AddMomentum, UpdateOldWeights, SumGradient, LockGradient_U; an annotation on the Allreduce component gives a factor of 1.27]

Page 27:

Epoch time and average mini-batch size

¤ The Epoch time is predicted with an error of 6%
¤ The average mini-batch size is predicted with an error of 8%

[Figure: N_Minibatch_avg (0 to 450) for CNN-A (15 layers) on TSUBAME-KFC/DL, one bar group per (N_Node, N_Subbatch) with N_Node in {1, 2, 4, 8, 16} and N_Subbatch in {1, 4, 8, 11}]

Page 28:

Choosing the configuration

¤ The target is an average mini-batch size of 138 ± 25%, the value observed on 16 nodes of TSUBAME 2.5
¤ On 8 nodes of TSUBAME-KFC/DL the model predicts a 1.47× faster Epoch (1.35× measured)

ILSVRC2012, Epoch time and average mini-batch size:

Machine  N_Node  N_Subbatch  Epoch time [sec]                 Average mini-batch size
                             measured  predicted  error [%]   measured  predicted  error [%]
KFC      8       8           2025      1779       12.1        147       165        12.2
KFC      8       11          2316      2226       3.90        173       171        1.28
T2.5     16      5           2725      2614       4.06        128       125        2.29
T2.5     16      4           2910      2840       2.40        116       118        1.71
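The error and speedup figures quoted on this slide can be reproduced from the Epoch-time columns of the table. The sketch assumes that the error is taken relative to the first of the two Epoch-time columns, which is the reading that matches the listed percentages; the variable names are illustrative.

```python
# (machine, N_Node, N_Subbatch, first Epoch-time column, second Epoch-time column)
rows = [
    ("KFC", 8, 8, 2025, 1779),
    ("KFC", 8, 11, 2316, 2226),
    ("T2.5", 16, 5, 2725, 2614),
    ("T2.5", 16, 4, 2910, 2840),
]

def relative_error_percent(reference, other):
    """|reference - other| / reference, in percent."""
    return abs(reference - other) / reference * 100.0

errors = [relative_error_percent(a, b) for (_, _, _, a, b) in rows]

# Speedup of KFC (8, 8) over T2.5 (16, 5), once per column.
speedup_first = rows[2][3] / rows[0][3]    # 2725 / 2025
speedup_second = rows[2][4] / rows[0][4]   # 2614 / 1779
```

The computed errors agree with the table's 12.1, 3.90, 4.06 and 2.40 to within rounding, and the two ratios correspond to the 1.35× and 1.47× annotations.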

Page 29:

[Figure: predicted Epoch time for ILSVRC2012 on TSUBAME-KFC/DL as a heat map over N_Node (10 to 40) and N_Subbatch (2 to 10), color scale from 5e+02 sec to 1e+05 sec; the region where the average mini-batch size is within 138 ± 25% is outlined]


Page 31:

Related work

¤ Performance modeling of distributed DL systems [Yan et al, SIGKDD, 2015]
¤ Rudra, a parameter-server-based distributed DL system, and the effect of staleness on training [Gupta et al, 2015]

Page 32:

Summary and future work

¤ On TSUBAME 2.5 and TSUBAME-KFC/DL, the Epoch time and the average mini-batch size are predicted with errors of 6% and 8%
¤ For several CNNs, the gradient computation time is predicted with an error of 12%