Multilayer Neuronal Network Hardware Implementation
DESCRIPTION
Design of a neural network processor based on back-propagation, Tunis Science University. The architecture targets high-speed operation and has learning capability. Co-simulation with SystemC & Qt. [2] (Implementation on a Xilinx Spartan-3.)
TRANSCRIPT
International Conference on Embedded Systems & Critical Applications
Multilayer Neuronal Network Hardware Implementation and Co-simulation Platform
Nabil Chouba, Khaled Ouali, Mohamed Moalla
Laboratoire Lip2
Plan
Neuronal network
Learning algorithm
Precision limitations
State of the art: hardware implementation
NEURONA architecture
Synthesis results
Platform & co-simulation tool
Conclusion
Perspectives
Neuronal Network
Advantages: noise tolerance, learning, generalization from examples, innate parallelism.
Disadvantages: heavy computation during learning.
The Neuron Unit
[Figure: neuron model — inputs x1…xm, synaptic weights wi1…wim, bias bi, activation function f(.), output Xi]
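The neuron of the figure computes a weighted sum of its inputs plus a bias, then applies the activation function. A minimal sketch, assuming a logistic sigmoid as the activation (the function names are illustrative, not part of the NEURONA IP):

```python
import math

def sigmoid(a):
    # logistic activation function, the sigmoid(acc) step of the slides
    return 1.0 / (1.0 + math.exp(-a))

def neuron(inputs, weights, bias):
    # X_i = f(sum_j w_ij * x_j + b_i)
    acc = bias
    for x, w in zip(inputs, weights):
        acc += w * x
    return sigmoid(acc)
```

For example, with zero inputs and zero bias the neuron outputs sigmoid(0) = 0.5.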
Neuronal Network: Propagation
[Figure: feed-forward network — inputs E1…EN feed neurons N(layer, index) from N(0,0)…N(0,P0) through N(k-1,0)…N(k-1,Pk-1), producing outputs O0…Oq; signals flow in the propagation direction]
Neuronal Network Learning: Back-propagation
[Figure: same network with the input vector V-inputs, the output vector V-output and the desired vector V-desired (d0…dq); the error is propagated backwards]
error = ½ (di − Oi)²
Back-propagation
δi = Oi (1 − Oi) (di − Oi)   (output layer)
δi = Oi (1 − Oi) Σj δj wij   (hidden layer)
Wij(t+1) = Wij(t) + η δα+1,j Xα,i
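The three formulas above can be sketched directly. This is an illustrative reconstruction, not the NEURONA VHDL: the function names are hypothetical, and eta is the learning rate assumed in the weight-update rule.

```python
def output_delta(O, d):
    # output layer: delta_i = O_i (1 - O_i) (d_i - O_i)
    return O * (1.0 - O) * (d - O)

def hidden_delta(O, next_deltas, next_weights):
    # hidden layer: delta_i = O_i (1 - O_i) * sum_j delta_j * w_ij
    return O * (1.0 - O) * sum(dj * w for dj, w in zip(next_deltas, next_weights))

def update_weight(w, eta, delta_next, x):
    # W_ij(t+1) = W_ij(t) + eta * delta_{a+1,j} * X_{a,i}
    return w + eta * delta_next * x
```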
Precision Limitations
Theoretical studies give only the minimum number of bits for convergence (back-propagation).
Precision error cannot be compensated by increasing the number of neurons.
The network architecture cannot be determined theoretically.
Simulation is the only way to test the convergence of the network.
The hardware implementation must therefore have generic arithmetic.
State of the Art: Hardware Implementation
In 1995, FPGA size was ~5000 gates; an FPGA implemented only the MAC operation [Blanz, 92] [Marcelo, 94].
Using arithmetic series (bit-by-bit calculation) [Girau, 99] [Beuchat, 01] [Beuchat, 02] [Trivedi, 77].
Dividing the calculation into steps by reconfiguring the FPGA at each stage [Elredge, 94] [Beuchat].
NEURONA: Architecture
Architecture with generic arithmetic (parameters fixed at synthesis).
On-line learning.
On-line topology changes.
Support for incremental and decremental algorithms.
NEURONA (Architecture)
[Block diagram: controller (CTR) with topology explorer; memory T (topology); memory W (synaptic weights) with address generator Gen adr W; memory X (neuron outputs) with address generator Gen adr X; memory F (activation function); ALU]
NEURONA (Inputs & Outputs)
Chip-select (cs) pins are used to read from and write to the embedded memories.
The modexec pin selects one of two execution modes: propagation and back-propagation.
The endexec pin indicates the end of execution.
[Figure: NEURONA pinout — addr, dataIn, dataOut, csRamX, csRamW, csRamF, csRamT, R/W, propag, retro, modexec, endexec, rst, clk]
NEURONA (Memory Coding)
[Figure: memory-coding example — memory T (topology) stores the layer sizes, terminated by 0; memory W (synaptic weights) stores W1…W17 in layer order, one bias weight per neuron; memory X (neuron outputs) stores the inputs input-0…input-2 and the outputs X[layer, index] of the neurons n1…n8 across layers 0, 1 and 2]
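The topology memory drives how the linear weight memory is consumed: each layer takes (fan-in + 1) weights per neuron, the +1 being the bias. A sketch of that addressing idea, with a hypothetical 0-terminated topology list (not the actual RAM-T encoding):

```python
def layer_weight_slices(topology):
    # topology: list of layer sizes terminated by 0, e.g. [2, 4, 1, 0]
    # returns, per layer, the (start, end) slice it occupies in memory W
    sizes = []
    for s in topology:
        if s == 0:          # 0 marks the end of the topology, as on the slide
            break
        sizes.append(s)
    slices, offset = [], 0
    for prev, cur in zip(sizes, sizes[1:]):
        n = cur * (prev + 1)  # weights per layer: (fan-in + bias) per neuron
        slices.append((offset, offset + n))
        offset += n
    return slices
```

With the hypothetical topology [2, 4, 1, 0] this yields 12 + 5 = 17 weights, which would match the W1…W17 of the slide's example if that network is 2-4-1 with biases.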
Propagation (RAM X)
RAM X during propagation (input → output):
input-0, input-1, input-2, …
X[1,0], X[1,1], …
X[2,0], X[2,1], …
…
output-0, output-1, output-2, …
Propagation (exec)
[Figure: weights w1…w12 are read sequentially; for each neuron the operands are B, X1, X2 — B X1 X2 → X4, B X1 X2 → X5, B X1 X2 → X6, B X1 X2 → X7]
ALU execution for one neuron:
acc ← w1 (bias)
acc ← acc + w2 · X[0,0]
acc ← acc + w3 · X[0,1]
activation function: X[1,0] ← sigmoid(acc)
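The MAC sequence above can be emulated over a flat weight memory. A sketch under the assumption that each neuron's bias weight precedes its input weights, as on the slide (the function and its signature are illustrative, not the hardware interface):

```python
import math

def propagate_layer(W, X_prev, n_neurons, w_off=0):
    # emulate NEURONA's per-neuron MAC sequence:
    # acc <- bias weight, then acc <- acc + w * x for each input,
    # finally the sigmoid activation produces the neuron output
    out = []
    k = w_off
    for _ in range(n_neurons):
        acc = W[k]          # bias weight (B = 1)
        k += 1
        for x in X_prev:
            acc += W[k] * x
            k += 1
        out.append(1.0 / (1.0 + math.exp(-acc)))
    return out, k           # outputs and next free weight address
```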
Back-propagation (RAM X)
RAM X during back-propagation (input → output):
input-0, input-1, input-2, …
desired-0, desired-1, desired-2, …
X[1,0], X[1,1], …
X[2,0], X[2,1], …
…
output-0, output-1, output-2, …
error terms per layer, from the last layer f down to layer 0:
δ[f,0], δ[f,1], δ[f,2], …
δ[f-1,0], δ[f-1,1], δ[f-1,2], …
…
δ[1,0], δ[1,1], δ[1,2], …
δ[0,0], δ[0,1], δ[0,2], …
Back-propagation (exec)
[Figure: ALU execution — the weights are traversed in reverse, column-wise order (w12 w9 w6 w3 | w10 w7 w4 w1 | w11 w8 w5 w2) while the error terms δ4…δ7 of neurons 4…7 are accumulated from the δ1…δ3 of the next layer; the δ accumulation is not done for the bias]
W update:
W12 = W12 + δ7 · X3
W11 = W11 + δ7 · X2
W10 = W10 + δ7 · B (bias B = 1)
…
W1 = W1 + δ4 · B (bias B = 1)
Generic Parameters
Number of bits for W
Number of bits for the neuron output X
Fixed-point position
Memory length
Activation-function approximation
Number of bits for the ALU registers
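These parameters fix the fixed-point format at synthesis. A sketch of what that quantization means for a weight or neuron output, assuming a signed two's-complement format with a given number of fractional bits (the function is illustrative, not a generic from the VHDL):

```python
def quantize(value, total_bits, frac_bits):
    # round to the nearest representable fixed-point value,
    # then saturate to the signed range of total_bits
    scale = 1 << frac_bits
    q = round(value * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))
    return q / scale
```

For example, in an 8-bit format with 4 fractional bits, 0.3 becomes 5/16 = 0.3125 and any value above 7.9375 saturates; simulating the network with such a function is how the platform checks that back-propagation still converges at a given width.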
Synthesis Results of the VHDL Code

bits W | bits X | words RAM-W | words RAM-X | words RAM-T | on-line learning | logic resources | memory use | synthesis frequency
16     | 16     | 200         | 128         | 4           |                  | 18%             | 23%        | 25.0 MHz
16     | 16     | 200         | 128         | 4           | x                | 56%             | 30%        | 23.9 MHz
8      | 8      | 32          | 32          | 4           |                  | 9%              | 15%        | 37.0 MHz
8      | 8      | 32          | 32          | 4           | x                | 23%             | 15%        | 36.1 MHz

* FPGA: Apex [ep20k100eqc240-2x]
* Synthesis constraint: frequency 20.0 MHz (fixed by the studied board)
Platform & Co-simulation
[Figure: co-simulation platform — embedded software in the robot; GUI: robot and obstacles visualization; virtual world manager; environment model; neuronal application model]
Abstraction levels: standard algorithm → simulated arithmetic → RTL (SystemC) IP Neurona
Precision Limitations: Experimental Result
Robot Learning Tools
https://www.youtube.com/watch?v=JdeRALwisew
Conclusion
Architecture NEURONA IP:
- Dedicated: optimized implementation, good synthesis results in area and frequency.
- Generic parameters: the arithmetic length favors the convergence of back-propagation; more data-precision bits increase precision; memory length and weights adapt to the scale of the target application.
- Dynamic reconfigurability: the separation of memories X, T, W and F gives direct access for updating the parameters of the neuronal network.
- Platform & co-simulation tool: choose the suitable neuronal topology and arithmetic by simulation; choose the generic parameters of the hardware IP; co-simulate the software and the hardware.
Perspectives
Increase the level of parallelism.
Use the platform for other embedded applications.
Use the SystemC TLM paradigm to:
- Accelerate the co-simulation
- Explore new designs gradually
- Improve the interaction between software & hardware
Thank you for your attention