MEMORY EFFECTS IN METAPLASTIC BINARIZED NEURAL NETWORKS (June 22, 2019)




Axel Laborieux, Tifenn Hirtzlin, Liza Herrera-Diez, Damien Querlioz
Centre de Nanosciences et de Nanotechnologies, Univ. Paris-Sud, CNRS, France

Binarized Neural Networks (BNNs) are attractive for low-power hardware implementation of artificial intelligence. In this work, we study how metaplastic binarized synapses enable BNNs to be used in the framework of multi-head learning, i.e., learning several tasks sequentially, with inference requiring that the task be specified.

Binarized Neural Network (BNN)

[Figure: BNN diagram with synapses taking values in {+1, -1}]

Hubara, Courbariaux et al., NIPS 2016

• Binarized Neural Networks achieve state-of-the-art results in image recognition, and rely on simple logic operations.

• A binary weight is the sign of an underlying floating-point value. This floating value is not itself a weight, since the loss and the gradients are computed using the binary values only.

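The two points above can be sketched in a few lines. This is a generic illustration of weight binarization with a hidden floating value, not the authors' code; the function names are ours:

```python
import numpy as np

def binarize(w_float):
    # The binary weight used for inference is the sign of the hidden value.
    return np.where(w_float >= 0.0, 1.0, -1.0)

def update_hidden(w_float, grad_binary, lr=0.01):
    # The gradient is computed with the binary weights only, then applied
    # to the hidden floating values (straight-through style update).
    return w_float - lr * grad_binary
```

The binary weight can only flip when the accumulated updates drive the hidden floating value across zero.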

[Device schematics: Magnetic Tunnel Junction (TE/BE stack), Resistive RAM (LRS/HRS), Phase Change Memory]

Nanodevices Using Magnetism, Spintronics, Ionics Provide Artificial Synapses

Rich synaptic behaviors can be emulated by nanodevices.

Real Synapses Are Metaplastic

• Connectionist models are subject to catastrophic forgetting when learning several tasks sequentially.

• Remembering previous tasks and learning new tasks seem incompatible in terms of synaptic plasticity: synapses must be prevented from changing in order to remember, yet learning requires synapses to change.

• Metaplastic synapses with a wide range of plasticity are a way of resolving this paradox. Fusi et al., Neuron 2005

[Figure: binary synapse with weight -1 or +1]

Consolidation Processes for BNN Synapses

• Optimization is done with Adam (Kingma and Lei Ba, ICLR 2015) on the floating value underlying the binary weight. p(t) is the point of the hypercube at which the BNN is evaluated at time step t.

• Synaptic metaplasticity is introduced by modulating the Adam update.

• The floating value of the binary weight can encode a metaplastic state: a synapse is thus described by a binary weight used for inference and a hidden floating value used for learning and memory purposes.
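As a rough sketch of the modulation idea (not the authors' exact implementation: the interaction with Adam's moment estimates is simplified away, and `lr` is an illustrative parameter), updates that would pull a hidden value toward zero can be attenuated by f_meta, while updates pushing it away from zero apply in full:

```python
import numpy as np

def f_meta(w_float):
    # Metaplasticity function shown on the poster: 1 - tanh^2(|W_float|).
    # Close to 1 near zero (plastic), close to 0 for large |W_float| (consolidated).
    return 1.0 - np.tanh(np.abs(w_float)) ** 2

def metaplastic_step(w_float, adam_update, lr=0.01):
    # An update w -= lr * u moves w toward zero when u and w share the same sign;
    # those updates are scaled down by f_meta, so consolidated weights resist switching.
    toward_zero = adam_update * np.sign(w_float) > 0
    scale = np.where(toward_zero, f_meta(w_float), 1.0)
    return w_float - lr * scale * adam_update
```

With the hard alternative f_meta(W_float) = 1 if |W_float| < 1, else 0, consolidation would instead be all-or-nothing beyond |W_float| = 1.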

Permuted Tasks Benchmark

Non Permuted Tasks

• Fixed permutations of the pixels provide a list of uncorrelated tasks.
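A permuted-task list of this kind can be generated as follows (a generic sketch: dataset loading is omitted and the function name is ours):

```python
import numpy as np

def make_permuted_tasks(images, n_tasks, seed=0):
    # images: array of shape (n_samples, n_pixels). Each task applies one
    # fixed random pixel permutation to every image; different random
    # permutations yield mutually uncorrelated tasks.
    rng = np.random.default_rng(seed)
    n_pixels = images.shape[1]
    permutations = [rng.permutation(n_pixels) for _ in range(n_tasks)]
    return [images[:, perm] for perm in permutations]
```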

[Figure: test accuracy (50-100%) on the 1st and 2nd tasks over the epochs of each task, for a metaplastic BNN and a regular BNN]

[Figure: the two metaplasticity functions considered]
f_meta(W_float) = 1 - tanh^2(|W_float|)
f_meta(W_float) = 1 if |W_float| < 1, else 0


• Metaplastic BNNs can learn and consolidate knowledge while remaining capable of learning a new task.


• Poorly correlated tasks are gradually forgotten by regular networks because of ongoing plasticity, whereas metaplasticity enables the network to learn several tasks sequentially. fCIFAR10 corresponds to CIFAR10 features extracted by a ResNet18 pretrained on ImageNet.

• Learning correlated tasks is more difficult, as some of the relevant pixels are shared. With regular models, the accuracy on the first task drops abruptly while the second task is being learned.

• Starting with randomly consolidated synapses, obtained by tuning the width of the weight initialization, is another metaplasticity ingredient that improves performance.
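One way to realize this, sketched under the assumption of a uniform initialization (the poster does not specify the distribution), is to widen the initial hidden-value range so that some synapses start with large |W_float| and are therefore born consolidated:

```python
import numpy as np

def init_hidden_values(shape, width, seed=0):
    # Larger width => more synapses start far from zero, i.e. already
    # consolidated under f_meta; smaller width => all synapses start plastic.
    rng = np.random.default_rng(seed)
    return rng.uniform(-width, width, size=shape)
```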

[Figure: test accuracy (20-100%) on the 1st and 2nd tasks as a function of the weight initialization width (0.3 to 2.3), for a metaplastic BNN and a regular BNN]


• For uncorrelated tasks it is sufficient to start learning with only plastic synapses and to consolidate them during learning. For correlated tasks, however, starting with some synapses already consolidated leaves plastic synapses available for the next task.

Conclusions

• Neuroscientists (Fusi et al.) have shown that biologically plausible synapses may be described by more than one parameter (i.e., not just a single weight), and that complex synaptic dynamics allow for long-term memory.

• Binarized Neural Networks seem to contain only +1 and -1 synaptic weights, yet weights whose floating values are far from 0 are less likely to switch than weights whose floating values are close to 0. We can thus introduce a metaplastic dynamics, and we show that it gives BNNs long-term memory.

• Spintronic nanodevices are promising for implementing metaplastic synapses directly in the physics of the material, at low energy cost.
