design of reliable processors based on unreliable devices ... · probabilistic transfer matrices...
TRANSCRIPT
Design of ReliableProcessors Based onUnreliable Devices
Séminaire COMELEC
Lirida Alves de Barros NavinerParis, 1 July 2013
Outline
Basics on reliability
Technology Aspects
Design for Reliability
Conclusions
2 /33 COMELEC Lirida Naviner Paris, June 2013
What is Reliability ?
Reliability is the ability of a system or component to perform itsrequired functions under stated conditions for a specified periodof time.
3 /33 COMELEC Lirida Naviner Paris, June 2013
What is Reliability ?
Reliability is the ability of a system or component to perform itsrequired functions under stated conditions for a specifiedperiod of time.
3 /33 COMELEC Lirida Naviner Paris, June 2013
When is Reliability Important ?
Safety criticalapplications
• Biomedical• Transport• Spatial• Energy• Security• ...
But today also for• Computer• Communications• Consumer
4 /33 COMELEC Lirida Naviner Paris, June 2013
Is the Electronics Reliable ?
inputs ‘expected’ outputsPerformancePowerData integrityAvailabilitySecurity
HW/SW system
Shocks(mechanical, temperature)Radiations
Design errorsSoftware failures
Malicious attacksHuman errors
Unexpected conditions of use
DefectsProcess variation
Ageing Noise
5 /33 COMELEC Lirida Naviner Paris, June 2013
Advances in CMOS
Moore’s law (popular form) : 2× Ntr/mm2 every 18 months
Intel 4004 (1971) : 10µm and 2.3× 103 tr Intel Xeon Phi (2012) : 22nm and 5× 109 tr
6 /33 COMELEC Lirida Naviner Paris, June 2013
More Moore
Scaling issues• Design complexity, test challenge, low power voltage• Variability – Modelling• Sensitivity to unscaled environmental disturbances
Scaling effects• Yield decrease• Reliability decrease
7 /33 COMELEC Lirida Naviner Paris, June 2013
Scaling and Reliability
ageing, single and multiple transient faults
8 /33 COMELEC Lirida Naviner Paris, June 2013
Fault Propagation
9 /33 COMELEC Lirida Naviner Paris, June 2013
Bathtub Curve
10 /33 COMELEC Lirida Naviner Paris, June 2013
Some Metrics
Mean Time To Failures MTTFMean Time To Repair MTTRMean Time Between FailuresMTBFFailures In Time FITFailure Rate
R = e−Time
MTBF
R = Prob(exact)
11 /33 COMELEC Lirida Naviner Paris, June 2013
Research at Telecom ParisTech
How to get optimized design of processors ?
12 /33 COMELEC Lirida Naviner Paris, June 2013
Research at Telecom ParisTech
How to get optimized design of reliable processorsbased on unreliable devices ?
Risk minimizationMore (than) MooreFabless generalization
Reliability improvement⇒ penalties !
12 /33 COMELEC Lirida Naviner Paris, June 2013
Masking Effects
⇒ Reliability assessment !
13 /33 COMELEC Lirida Naviner Paris, June 2013
Hardware Fault Injection
14 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Transfer Matrices (PTM)
000
001
010
011
100
101
110
111
0 1
000
001
010
011
100
101
110
111
0
0
0
1
1
1
1
0
0 10 1
00
01
10
11
00
01
10
11
1
1
1
1
0
0
0
01
0
0
1
0
1
1
0
0 1
1− q1− q
1− q
1− q
q
q
inputs
outputs
1− qqq
1− qq
1− q1− q
q
PTMXOR3
ROR = q
a
bz
zabc
RXOR3 = q
ITMOR
1− q
1− q
1− q
q 1− q
q
q
q
PTMOR
ITMXOR3
inputs
outputs
inputs
inputs
outputs outputs
[1] S. Krishnaswamy, G.F Viamontes, I.L. Markov, I.L and J.P Hayes, Accurate reliability evaluation and enhan-cement via probabilistic transfer matrices, DATE’2005
?
15 /33 COMELEC Lirida Naviner Paris, June 2013
PTM Computation
p
q
q
p
p
q
q
q
q
p
p
p
q
q
q
p
p
p
p
q
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pp ·
p
q
q
pp ·
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
p2 q2
p2 q2
pq pq
pq pq
pqpq
pq pq
= =
A
B
C
S PTM PTM PTMNOT = AND = NOR =
PTM = PTM PTML1 AND NOT
Level 2 (L2)Level 1 (L1)
p
q
q
p
p
q
q
q
q
p
p
p
q
q
q
p
p
p
p
q
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pp ·
p
q
q
pp ·
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
p2 q2
p2 q2
pq pq
pq pq
pqpq
pq pq
= =
A
B
C
S PTM PTM PTMNOT = AND = NOR =
PTM = PTM PTML1 AND NOT
Level 2 (L2)Level 1 (L1)
16 /33 COMELEC Lirida Naviner Paris, June 2013
PTM Computation
p
q
q
p
p
q
q
q
q
p
p
p
q
q
q
p
p
p
p
q
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pq ·
p
q
q
pp ·
p
q
q
pp ·
p
q
q
pp ·
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
p2 q2
p2 q2
pq pq
pq pq
pqpq
pq pq
= =
A
B
C
S PTM PTM PTMNOT = AND = NOR =
PTM = PTM PTML1 AND NOT
Level 2 (L2)Level 1 (L1)
p
q
q
q
q
p
p
p
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
pq pq
pq pq
q2 p2
q2 p2
p2 q2
p2 q2
pq pq
pq pq
pqpq
pq pq
2 2 32p q + pq + q p q + 2pq + p 2 2 3
2p q + p + q2 3 3
2p q + p + q2 3 3
2 32pq + p + q 3
223p q + pq 3 32pq + p + q2
2 2 3p q + 2pq + p
2 2 32p q + pq + q
2 2 32p q + pq + q
2 2 32p q + pq + q p q + 2pq + p 2 2 3
p q + 3pq2 2
p q + 3pq2 2
p q + 3pq2 2
= PTM L1 · PTMNORPTM CIR = = ·
p q + 2pq + p 2 2 3
16 /33 COMELEC Lirida Naviner Paris, June 2013
Reliability Calculation with PTM
Rcir =∑
ITMcir (i,j)=1
p(j |i)p(i)
p(i) is the probability of input value ip(j |i) is the (i , j)th element in PTMcir
1
0
1
0
1
0
1
1
0
1
0
1
0
1
0
0
CIR =ITM
2 2 32p q + pq + q p q + 2pq + p 2 2 3
2p q + p + q2 3 3
2p q + p + q2 3 3
2 32pq + p + q 3
223p q + pq 3 32pq + p + q2
2 2 3p q + 2pq + p
2 2 32p q + pq + q
2 2 32p q + pq + q
2 2 32p q + pq + q p q + 2pq + p 2 2 3
p q + 3pq2 2
p q + 3pq2 2
p q + 3pq2 2
p q + 2pq + p 2 2 3
PTM CIR =
17 /33 COMELEC Lirida Naviner Paris, June 2013
Signal Probability and Reliability (SPR)
S =
[0c 1i0i 1c
]
SPRS =
[p(s = 0c) p(s = 1i)p(s = 0i) p(s = 1c)
]=
[s0 s1s2 s3
]
RS = p(s = 0c) + p(s = 1c) = s0 + s3
[1] D. Franco, M. Vasconcelos, L. Naviner et J.-F. Naviner, Signal Probability for Reliability Evaluation ofLogic Circuits, Microelectronics Reliability Journal, Septembre 2008, vol. 48, pp. 1586-1591
18 /33 COMELEC Lirida Naviner Paris, June 2013
Signal Reliability Propagation
0.25
0
0
0 0.25
0
0
0
0.25
0
0
0
0.25
0
0
0
0.5
0
0
0.5
0.5
0
0
0.5
=0.92
0.92
0.08
0.08
0.08 0.92
0.080.92
0.02
0.02
0.02
0.23
0.23
0.23
0.23
0.02
0
0
0
1
1
1
1
0
0.02
0.69
0.23
0.06
a
by
A4 =
B4 =
PTMNANDINAND = A4 ⊗B4
ITMNAND
qNAND = 0.92
Y4p(y)
y3
y0
?
19 /33 COMELEC Lirida Naviner Paris, June 2013
Reconvergent Signals
a = b
0
0
0
0
circuit
a0A4 =
a3
b0
b3B4 =
a3q a3q̄
a0q̄ a0q
b3q b3q̄
b0qb0q̄
RNOT = qa
b y(1)
y(0)
Y (1)4 =
Y (0)4 =
R(circuit) = Ry(0)Ry(1) = a3b3q2 + a3b0q
2 + a0b3q2 + a0b0q
2
20 /33 COMELEC Lirida Naviner Paris, June 2013
Multipath SPR
circuit
circuit
0
0
0
0
Pass 1 :
Pass 2 :
RNOT = qa y(0)
y(1)
Y (0)4 =
Y (1)4 =
A4 =
0
q̄1
RNOT = qa y(0)
y(1)
Y (0)4 =
Y (1)4 =
0A4 =
R(circuit) = R(circuit, a0) + R(circuit, a3) = a0q2 + a3q
2
R(circuit, a3) = Ry(0)Ry(1)a3 = a3q2
R(circuit, a0) = Ry(0)Ry(1)a0 = a0q2
0
q̄
0q
q
01
0
0q
0q
q̄
0
q̄
0
21 /33 COMELEC Lirida Naviner Paris, June 2013
Multipath SPR
A
B
C
G1
G2
G3D
E
S
SPRDi = ITMG1 × (SPRA ⊗ SPRBi )× PTMG1 (1)
SPREi = ITMG2 × (SPRBi ⊗ SPRC)× PTMG2 (2)
SPRSi = ITMG3 × (SPRDi ⊗ SPREi )× PTMG3 (3)
SPRS =4∑
i=1
SPRSi × wi (4)
[1] D. Teixeira Franco, M. Rabelo de Vasconcelos, L. A. B. de Naviner, and J. cois Naviner, "A Tool forSignal Reliability Analysis of Logic Circuits," DATE’09
22 /33 COMELEC Lirida Naviner Paris, June 2013
Conditional Probability
Z2
Z1
B2
B1
I2
I1
S
p(Z1 = i) = f1(S, I1,B1)⇒ p(Z1 = i |S = k) = f1(I1,B1)p(Z2 = j) = f2(S, I2,B2)⇒ p(Z2 = j |S = k) = f2(I2,B2)
23 /33 COMELEC Lirida Naviner Paris, June 2013
Conditional Probability
G G G G G G G G G G G G G G G G
1i0i0c
0c 0c 0c0c
B
A
B B B
0i
1i 1i0i 0i 0i 0i0i 0i 0i 0i1c 1c 1c 1c 1c 1c 1c0c1c
p(Z|A1) p(Z|A2) p(Z|A3) p(Z|A4)
1i
0i 0i0i1i 1i 1i 1i
1c
1c 1c 1c 1c
0c 0c 0c0c 0c 0c 0c1i 1i 1i 1i 1i
Logic XOR
24 /33 COMELEC Lirida Naviner Paris, June 2013
Conditional Probability Matrices (CPM)
CPMZ =
p(z1/a1) p(z1/a2) p(z1/a3) p(z1/a4)p(z2/a1) p(z2/a2) p(z2/a3) p(z2/a4)p(z3/a1) p(z3/a2) p(z3/a3) p(z3/a4)p(z4/a1) p(z4/a2) p(z4/a3) p(z4/a4)
?
[1] J. Torras Flaquer, J.-M. Daveau, L. Naviner and Ph. Roche, Handling reconvergent paths using condi-tional probabilities in combinatorial logic netlists reliability estimation, IEEE ICECS’10.
25 /33 COMELEC Lirida Naviner Paris, June 2013
Conditional Probability Matrices (CPM)
CPMY =
p(y1/s1
1 ∩ . . . ∩ sN1 ) . . . p(y1/s1
4 ∩ . . . ∩ sN4 )
p(y2/s11 ∩ . . . ∩ sN
1 ) . . . p(y1/s14 ∩ . . . ∩ sN
4 )p(y3/s1
1 ∩ . . . ∩ sN1 ) . . . p(y1/s1
4 ∩ . . . ∩ sN4 )
p(y4/s11 ∩ . . . ∩ sN
1 ) . . . p(y1/s14 ∩ . . . ∩ sN
4 )
[1] J. Torras Flaquer, J.-M. Daveau, L. Naviner and Ph. Roche, Handling reconvergent paths using condi-tional probabilities in combinatorial logic netlists reliability estimation, IEEE ICECS’10.
25 /33 COMELEC Lirida Naviner Paris, June 2013
Reliability based on CPM
C
D
X
Y
P0
G0 G1
P1PMA
B
ZGout
GN
SPRZ = ITMTGout·MXY · PTMT
Gout
CPMX/A∩B =M∏
i=0
CPMPi CPMY/A∩B =N∏
i=0
CPMGj
p(xi ∩ yj ) =4∑
k=1
4∑l=1
p(xi/ak ∩ bl ) · p(yj/ak ∩ bl ) · p(ak ) · p(bj )
26 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Binomial Reliability Model
R =2ne−1∑
j=0
∏fault−free
qi∏
faulty
(1−qi)2nx−1∑
i=0
p(xi)(
y(xi ,e0)⊕ y(xi ,ej))
Logical masking featuresTechnology/profile aspectsRelevant faultsRelevant input vectors
mnx generic logic circuit y
[1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliabi-lity Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS andTAISA Conference, Montréal, Canada, June 2008.
27 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Binomial Reliability Model
R =2ne−1∑
j=0
∏fault−free
qi∏
faulty
(1−qi)2nx−1∑
i=0
p(xi)(
y(xi ,e0)⊕ y(xi ,ej))
Logical masking featuresTechnology/profile aspectsRelevant faultsRelevant input vectors
mnx generic logic circuit y
[1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliabi-lity Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS andTAISA Conference, Montréal, Canada, June 2008.
27 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Binomial Reliability Model
R =2ne−1∑
j=0
∏fault−free
qi∏
faulty
(1−qi)2nx−1∑
i=0
p(xi)(
y(xi ,e0)⊕ y(xi ,ej))
Logical masking featuresTechnology/profile aspectsRelevant faultsRelevant input vectors
mnx generic logic circuit y
[1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliabi-lity Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS andTAISA Conference, Montréal, Canada, June 2008.
27 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Binomial Reliability Model
R =2ne−1∑
j=0
∏fault−free
qi∏
faulty
(1−qi)2nx−1∑
i=0
p(xi)(
y(xi ,e0)⊕ y(xi ,ej))
Logical masking featuresTechnology/profile aspectsRelevant faultsRelevant input vectors
mnx generic logic circuit y
[1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliabi-lity Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS andTAISA Conference, Montréal, Canada, June 2008.
27 /33 COMELEC Lirida Naviner Paris, June 2013
Probabilistic Binomial Reliability Model
R =2ne−1∑
j=0
∏fault−free
qi∏
faulty
(1−qi)2nx−1∑
i=0
p(xi)(
y(xi ,e0)⊕ y(xi ,ej))
Logical masking featuresTechnology/profile aspectsRelevant faultsRelevant input vectors
mnx generic logic circuit y
[1] M. Correia De Vasconcelos, D. Teixeira Franco, L. Alves de Barros Naviner and J. F. Naviner, Reliabi-lity Analysis of Combinational Circuits Based on a Probabilistic Binomial Model, IEEE-NEWCAS andTAISA Conference, Montréal, Canada, June 2008.
27 /33 COMELEC Lirida Naviner Paris, June 2013
PBR Implementation
Simulate (or emulate) errors and then compare outputsFIFA : Fault-Injection Fault-Analysis platform[1,2]
• Acceleration of fault injection and functional simulation
fault−free circuit
fault−prone circuit
com
par
ison
outputs
outputs
m
m
n
G
inputs
inputs
n
n
generation
input
generation
fault
coefficients
storage
[1] E. Marques, N. Maciel, L. Naviner and J. F. Naviner, A New Fault Generator Suitable for ReliabilityAnalysis of Digital Circuits, IEEE CUMTA’10.
[2] L. Alves de Barros Naviner, J. F. Naviner, G. Gonçalves dos Santos Jr, E. Crespo Marques and N.Maciel, FIFA : A Fault-Injection-Fault-Analysis-based tool for reliability assessment at RTL level, ES-REF’11.
28 /33 COMELEC Lirida Naviner Paris, June 2013
SNAP Approach
Fault source and fault propagation
gf = 1
ifs
input
analysis
input output
output
ofs
(a)
(b)
ofs = f (ifs,gf , input ,gate)
[1] S. Pagliarini, A. Ben Dhia, L. Naviner and J. F. Naviner, SNaP : a Novel Hybrid Method for CircuitReliability Assessment Under Multiple Faults, ESREF’13.
29 /33 COMELEC Lirida Naviner Paris, June 2013
Reliability Calculation based on SNAP
R = qofs
inputA
inputB
output
gf = 1
ifsA
inputA
analysis
gf = 1
ifsB
inputB
analysis
gf = 2
analysis
output
ofs
(a)
(b)
30 /33 COMELEC Lirida Naviner Paris, June 2013
Comparisons
Accuracy Complexity Trade-off FPGAPTM ,,, /// - -SPR ,/ ,,, - -
SPR-MP ,,, ,,/ Yes -CPM ,,, ,,/ Yes -PBR ,,, ,,/ Yes Yes
SNAP ,,/ ,,, - Yes
31 /33 COMELEC Lirida Naviner Paris, June 2013
Conclusions
Reliability issues and challenges
Need of cost-effective faulttolerant architecturesNeed of efficientassessment approaches
Some Telecom ParisTech contributions (models, methodsand tools)
• SPR, SPR-MP, CPM, SPR-DWAA• PBR, FIFA, SNAP, ...
32 /33 COMELEC Lirida Naviner Paris, June 2013
Conclusions
Reliability issues and challenges
Need of cost-effectivefault tolerant architecturesNeed of efficientassessment approaches
Some Telecom ParisTech contributions (models, methodsand tools)
• SPR, SPR-MP, CPM, SPR-DWAA• PBR, FIFA, SNAP, ...
32 /33 COMELEC Lirida Naviner Paris, June 2013
To know more about ...
Contact : Prof. Lirida A. B. [email protected]
Visit : http ://nanoelec.wp.mines-telecom.fr
33 /33 COMELEC Lirida Naviner Paris, June 2013