Multilayer Neuronal Network Hardware Implementation
DESCRIPTION
Design of a neural network processor based on back-propagation, Tunis Science University. The architecture targets high-speed operation and has learning capability. Co-simulation with SystemC & Qt. [2] (Implementation on a Xilinx Spartan-3.)
TRANSCRIPT
International Conference on Embedded Systems & Critical Applications
Multilayer Neuronal Network Hardware Implementation and Co-simulation Platform
Nabil Chouba, Khaled Ouali, Mohamed Moalla
Laboratoire Lip2
Plan
Neuronal network
Learning algorithm
Precision limitations
State of the art: hardware implementation
NEURONA architecture
Synthesis results
Platform & co-simulation tool
Conclusion
Perspectives
Neuronal Network
Advantages: noise tolerance, learning, generalization from examples, innate parallelism.
Disadvantages: heavy computation during learning.
The Neuron Unit
[Figure: neuron model — inputs x1…xm, synaptic weights wi1…wim, bias bi, activation function f(.), output Xi]
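The neuron of the figure computes a weighted sum of its inputs plus a bias, then applies the activation function. A minimal sketch, assuming a logistic sigmoid as the activation (the function names are illustrative, not part of the NEURONA IP):

```python
import math

def sigmoid(a):
    # logistic activation function, the sigmoid(acc) step of the slides
    return 1.0 / (1.0 + math.exp(-a))

def neuron(inputs, weights, bias):
    # X_i = f(sum_j w_ij * x_j + b_i)
    acc = bias
    for x, w in zip(inputs, weights):
        acc += w * x
    return sigmoid(acc)
```

For example, with zero inputs and zero bias the neuron outputs sigmoid(0) = 0.5.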
Neuronal Network: Propagation
[Figure: feed-forward network — inputs E1…EN feed neurons N(layer, index) from N(0,0)…N(0,P0) through N(k-1,0)…N(k-1,Pk-1), producing outputs O0…Oq; signals flow in the propagation direction]
Neuronal Network Learning: Back-propagation
[Figure: same network with the input vector V-inputs, the output vector V-output and the desired vector V-desired (d0…dq); the error is propagated backwards]
error = ½ (di − Oi)²
Back-propagation
δi = Oi (1 − Oi) (di − Oi)   (output layer)
δi = Oi (1 − Oi) Σj δj wij   (hidden layer)
Wij(t+1) = Wij(t) + η δα+1,j Xα,i
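The three formulas above can be sketched directly. This is an illustrative reconstruction, not the NEURONA VHDL: the function names are hypothetical, and eta is the learning rate assumed in the weight-update rule.

```python
def output_delta(O, d):
    # output layer: delta_i = O_i (1 - O_i) (d_i - O_i)
    return O * (1.0 - O) * (d - O)

def hidden_delta(O, next_deltas, next_weights):
    # hidden layer: delta_i = O_i (1 - O_i) * sum_j delta_j * w_ij
    return O * (1.0 - O) * sum(dj * w for dj, w in zip(next_deltas, next_weights))

def update_weight(w, eta, delta_next, x):
    # W_ij(t+1) = W_ij(t) + eta * delta_{a+1,j} * X_{a,i}
    return w + eta * delta_next * x
```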
Precision Limitations
Theoretical studies give only the minimum number of bits for convergence (back-propagation).
Precision error cannot be compensated by increasing the number of neurons.
The network architecture cannot be determined theoretically.
Simulation is the only way to test the convergence of the network.
The hardware implementation must therefore have generic arithmetic.
State of the Art: Hardware Implementation
In 1995, FPGA size was ~5000 gates; an FPGA implemented only the MAC operation [Blanz, 92] [Marcelo, 94].
Using arithmetic series (bit-by-bit calculation) [Girau, 99] [Beuchat, 01] [Beuchat, 02] [Trivedi, 77].
Dividing the calculation into steps by reconfiguring the FPGA at each stage [Elredge, 94] [Beuchat].
NEURONA: Architecture
Architecture with generic arithmetic (parameters fixed at synthesis).
On-line learning.
On-line topology changes.
Support for incremental and decremental algorithms.
NEURONA (Architecture)
[Block diagram: controller (CTR) with topology explorer; memory T (topology); memory W (synaptic weights) with address generator Gen adr W; memory X (neuron outputs) with address generator Gen adr X; memory F (activation function); ALU]
NEURONA (Inputs & Outputs)
Chip-select (cs) pins are used to read from and write to the embedded memories.
The modexec pin selects one of two execution modes: propagation and back-propagation.
The endexec pin indicates the end of execution.
[Figure: NEURONA pinout — addr, dataIn, dataOut, csRamX, csRamW, csRamF, csRamT, R/W, propag, retro, modexec, endexec, rst, clk]
NEURONA (Memory Coding)
[Figure: memory-coding example — memory T (topology) stores the layer sizes, terminated by 0; memory W (synaptic weights) stores W1…W17 in layer order, one bias weight per neuron; memory X (neuron outputs) stores the inputs input-0…input-2 and the outputs X[layer, index] of the neurons n1…n8 across layers 0, 1 and 2]
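The topology memory drives how the linear weight memory is consumed: each layer takes (fan-in + 1) weights per neuron, the +1 being the bias. A sketch of that addressing idea, with a hypothetical 0-terminated topology list (not the actual RAM-T encoding):

```python
def layer_weight_slices(topology):
    # topology: list of layer sizes terminated by 0, e.g. [2, 4, 1, 0]
    # returns, per layer, the (start, end) slice it occupies in memory W
    sizes = []
    for s in topology:
        if s == 0:          # 0 marks the end of the topology, as on the slide
            break
        sizes.append(s)
    slices, offset = [], 0
    for prev, cur in zip(sizes, sizes[1:]):
        n = cur * (prev + 1)  # weights per layer: (fan-in + bias) per neuron
        slices.append((offset, offset + n))
        offset += n
    return slices
```

With the hypothetical topology [2, 4, 1, 0] this yields 12 + 5 = 17 weights, which would match the W1…W17 of the slide's example if that network is 2-4-1 with biases.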
Propagation (RAM X)
RAM X during propagation (input → output):
input-0, input-1, input-2, …
X[1,0], X[1,1], …
X[2,0], X[2,1], …
…
output-0, output-1, output-2, …
Propagation (exec)
[Figure: weights w1…w12 are read sequentially; for each neuron the operands are B, X1, X2 — B X1 X2 → X4, B X1 X2 → X5, B X1 X2 → X6, B X1 X2 → X7]
ALU execution for one neuron:
acc ← w1 (bias)
acc ← acc + w2 · X[0,0]
acc ← acc + w3 · X[0,1]
activation function: X[1,0] ← sigmoid(acc)
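The MAC sequence above can be emulated over a flat weight memory. A sketch under the assumption that each neuron's bias weight precedes its input weights, as on the slide (the function and its signature are illustrative, not the hardware interface):

```python
import math

def propagate_layer(W, X_prev, n_neurons, w_off=0):
    # emulate NEURONA's per-neuron MAC sequence:
    # acc <- bias weight, then acc <- acc + w * x for each input,
    # finally the sigmoid activation produces the neuron output
    out = []
    k = w_off
    for _ in range(n_neurons):
        acc = W[k]          # bias weight (B = 1)
        k += 1
        for x in X_prev:
            acc += W[k] * x
            k += 1
        out.append(1.0 / (1.0 + math.exp(-acc)))
    return out, k           # outputs and next free weight address
```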
Back-propagation (RAM X)
RAM X during back-propagation (input → output):
input-0, input-1, input-2, …
desired-0, desired-1, desired-2, …
X[1,0], X[1,1], …
X[2,0], X[2,1], …
…
output-0, output-1, output-2, …
error terms per layer, from the last layer f down to layer 0:
δ[f,0], δ[f,1], δ[f,2], …
δ[f-1,0], δ[f-1,1], δ[f-1,2], …
…
δ[1,0], δ[1,1], δ[1,2], …
δ[0,0], δ[0,1], δ[0,2], …
Back-propagation (exec)
[Figure: ALU execution — the weights are traversed in reverse, column-wise order (w12 w9 w6 w3 | w10 w7 w4 w1 | w11 w8 w5 w2) while the error terms δ4…δ7 of neurons 4…7 are accumulated from the δ1…δ3 of the next layer; the δ accumulation is not done for the bias]
W update:
W12 = W12 + δ7 · X3
W11 = W11 + δ7 · X2
W10 = W10 + δ7 · B (bias B = 1)
…
W1 = W1 + δ4 · B (bias B = 1)
Generic Parameters
Number of bits for W
Number of bits for the neuron output X
Fixed-point position
Memory length
Activation-function approximation
Number of bits for the ALU registers
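These parameters fix the fixed-point format at synthesis. A sketch of what that quantization means for a weight or neuron output, assuming a signed two's-complement format with a given number of fractional bits (the function is illustrative, not a generic from the VHDL):

```python
def quantize(value, total_bits, frac_bits):
    # round to the nearest representable fixed-point value,
    # then saturate to the signed range of total_bits
    scale = 1 << frac_bits
    q = round(value * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, q))
    return q / scale
```

For example, in an 8-bit format with 4 fractional bits, 0.3 becomes 5/16 = 0.3125 and any value above 7.9375 saturates; simulating the network with such a function is how the platform checks that back-propagation still converges at a given width.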
Synthesis Results of the VHDL Code

bits W | bits X | words RAM-W | words RAM-X | words RAM-T | on-line learning | logic resources | memory use | synthesis frequency
16     | 16     | 200         | 128         | 4           |                  | 18%             | 23%        | 25.0 MHz
16     | 16     | 200         | 128         | 4           | x                | 56%             | 30%        | 23.9 MHz
8      | 8      | 32          | 32          | 4           |                  | 9%              | 15%        | 37.0 MHz
8      | 8      | 32          | 32          | 4           | x                | 23%             | 15%        | 36.1 MHz

* FPGA: Apex [ep20k100eqc240-2x]
* Synthesis constraint: frequency 20.0 MHz (fixed by the studied board)
Platform & Co-simulation
[Figure: co-simulation platform — embedded software in the robot; GUI: robot and obstacles visualization; virtual world manager; environment model; neuronal application model]
Abstraction levels: standard algorithm → simulated arithmetic → RTL (SystemC) IP Neurona
Precision Limitations: Experimental Result
Robot Learning Tools
https://www.youtube.com/watch?v=JdeRALwisew
Conclusion
Architecture NEURONA IP:
- Dedicated: optimized implementation, good synthesis results in area and frequency.
- Generic parameters: the arithmetic length favors the convergence of back-propagation; more data-precision bits increase precision; memory length and weights adapt to the scale of the target application.
- Dynamic reconfigurability: the separation of memories X, T, W and F gives direct access for updating the parameters of the neuronal network.
- Platform & co-simulation tool: choose the suitable neuronal topology and arithmetic by simulation; choose the generic parameters of the hardware IP; co-simulate the software and the hardware.
Perspectives
Increase the level of parallelism.
Use the platform for other embedded applications.
Use the SystemC TLM paradigm to:
- Accelerate the co-simulation
- Explore new designs gradually
- Improve the interaction between software & hardware
Thank you for your attention