TESConf 5-6 Nov 2012
Self-repairing electronic logic units based on convergent cellular automata
Richard McWilliam
-
The need for fault-tolerant electronic systems
Self-repair brings big benefits to MRO
• Better in-flight fault detection
• Register self-repair events
• Improve scheduled maintenance
Qantas flight 72
• Suffered in-flight upset (suspected SEE)
• Failure of air data inertial reference unit
• Flight control primary computer could not handle data
• 12 serious injuries, 95 minor injuries
-
Mission critical electronics
Single Event Upsets (SEU)
Especially prevalent in SRAM
Resulted in rad-hardened devices
Virtex-5QV
Cibola flight experiment (CFESat)
9x Virtex 1000 (6M bits) launched 2007
Low orbit, SEU rates varied from 0.13/hr (quiet sun) to a maximum of 4.2/hr
See, e.g., M. Wirthlin et al., 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003, pp. 133-142
NASA Radiation Belt Storm Probes (RBSP) mission
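To put the CFESat upset rates quoted above in more familiar terms, the short sketch below converts them to upsets per day and to a mean time between upsets. It assumes the quoted figures are constant average rates; the slide does not say whether they apply per device or to the whole payload, and the function names are illustrative only.

```python
# SEU rates quoted on the slide, in upsets per hour.
RATES_PER_HOUR = {"quiet sun": 0.13, "maximum": 4.2}


def upsets_per_day(rate_per_hour):
    """Average number of upsets in 24 hours at a constant rate."""
    return rate_per_hour * 24


def mean_minutes_between_upsets(rate_per_hour):
    """Mean time between upsets in minutes, assuming a constant average rate."""
    return 60 / rate_per_hour


for label, rate in RATES_PER_HOUR.items():
    print(f"{label}: {upsets_per_day(rate):.2f} upsets/day, "
          f"one every {mean_minutes_between_upsets(rate):.1f} min on average")
```

Under these assumptions the quiet-sun rate corresponds to roughly three upsets per day, and the maximum rate to about one upset every quarter of an hour.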
-
Microsystems Packaging failure mechanisms
Overstress mechanisms
• Mechanical: brittle fracture, plastic deformation, interfacial delamination
• Electrical: EMI, ESD, radiation, gate oxide breakdown, interconnect melting
Wearout mechanisms
• Mechanical: fatigue damage, creep, wear, stress-driven voiding, interfacial delamination
• Electrical: hillock formation, junction spiking, electromigration
• Chemical: corrosion, diffusion, dendritic growth
(Source: Fundamentals of Microsystems Packaging by Rao Tummala)
Root cause Failure analysis: Electronics
-
Self-recovery strategies
Fault-tolerant structures contain redundancy so that they can tolerate one or more faults, e.g., EDC or quadded logic.
Self-repair requires reconfiguration after the fault event.
Self-preservation takes pre-emptive action to minimise the effects of an impending fault event (health monitoring, data analysis).
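As one concrete instance of the EDC style of redundancy mentioned above, the sketch below uses a Hamming(7,4) code, which stores three parity bits alongside four data bits and can correct any single flipped bit. The slide does not name a specific code, so Hamming(7,4) and the function names `encode` and `decode` are assumptions chosen for illustration.

```python
def encode(data):
    """Encode 4 data bits as a Hamming(7,4) codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(word):
    """Return (data, syndrome).

    The syndrome is the 1-based position of the bit that was corrected,
    or 0 if the word passed all three parity checks.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


stored = encode([1, 0, 1, 1])
upset = list(stored)
upset[4] ^= 1  # simulate a single-bit upset at codeword position 5
data, position = decode(upset)
print(data, position)  # prints [1, 0, 1, 1] 5
```

The code masks the fault but does not repair the stored word or the logic around it, which is the gap that self-repair by reconfiguration is meant to close.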
-
Built-in self-test and repair: BISTAR (related: BIST, BISD, BIRA, BISR)
Built-in logic, but not self-repairing.
Goal is BISTAR at system level:
E.g., A. Benso et al., “An on-line BIST RAM architecture with self-repair capabilities,” IEEE Transactions on Reliability, vol. 51, no. 1, pp. 123–128, Mar. 2002.
-
Towards BISTAR
Self-healing system
Built-in reconfiguration is hard to do in electronics
-
Towards self-repair
[Figure: sea of logic and switching, with BISR, BISD and self-restoration blocks]
Goals
BISDAR within a ‘sea of logic’
Self-restoring (transient upsets)
Fault tolerant with redundancy (hard faults)
Does not require major in-circuit reprogramming
-
Cellular electronics approach
Composed of many identical ‘cells’
Each cell contains a copy of rules
Global output state depends on the rules and boundary conditions
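[Figure: a LUT cell with inputs I1, I2 and output O, with example LUT entries (I1, I2, O) = (0, 1, 0) and (2, 5, -1); a cellular automaton of cells inside a boundary, shown at t=0 and t=1]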
-
Convergent Cellular Automata (CCA)
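Cell layout (2×2 array, boundary = 0 on the north and west sides):
C1,1 C1,2
C2,1 C2,2

Define a transition function (n and w weight the north and west neighbours, d is a constant):
$$C_{i,j,t+1} = n\,C_{i-1,j,t} + w\,C_{i,j-1,t} + d$$

In matrix form, with $C = (C_{1,1},\ C_{2,1},\ C_{1,2},\ C_{2,2})^T$:
$$C_{next} = A\,C_{current} + D$$
A = transition matrix
D = constant

Apply convergence criteria to attain $C_{final}$:
$$(I - A)\,C_{final} - D = 0$$
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ -n & 1 & 0 & 0 \\ -w & 0 & 1 & 0 \\ 0 & -w & -n & 1 \end{pmatrix} \begin{pmatrix} C_{1,1} \\ C_{2,1} \\ C_{1,2} \\ C_{2,2} \end{pmatrix} - \begin{pmatrix} d \\ d \\ d \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$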
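The convergence claim can be made concrete for the 2×2 case. The derivation below is a sketch that assumes the cell ordering $(C_{1,1}, C_{2,1}, C_{1,2}, C_{2,2})$, a zero boundary, and the same constant d in every cell; under those assumptions A is strictly lower triangular, so its powers vanish and the initial state drops out after three updates.

```latex
A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ n & 0 & 0 & 0 \\ w & 0 & 0 & 0 \\ 0 & w & n & 0 \end{pmatrix},
\qquad
A^2 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 2nw & 0 & 0 & 0 \end{pmatrix},
\qquad
A^3 = 0

C_t = A^t C_0 + (I + A + \dots + A^{t-1})\,D
\quad\Longrightarrow\quad
C_t = (I + A + A^2)\,D = (I - A)^{-1} D \quad \text{for all } t \ge 3

C_{final} = d \begin{pmatrix} 1 \\ 1 + n \\ 1 + w \\ 1 + n + w + 2nw \end{pmatrix}
```

Because $A^3 = 0$, any corrupted cell value is flushed out within three updates regardless of how it arose, which is the property that makes this kind of automaton self-restoring after a transient upset.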
-
CCA Example
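Solving above equation:
Goal:
1 0
0 1

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ -n & 1 & 0 & 0 \\ -w & 0 & 1 & 0 \\ 0 & -w & -n & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} - \begin{pmatrix} d \\ d \\ d \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
(i.e., d=1, n=-1, w=-1)
$$C_{i,j,t+1} = 1 - C_{i-1,j,t} - C_{i,j-1,t}$$

Convergence: zero initial state
t=0      t=1      t=2      t=3 (C_final)
0 0      1 1      1  0     1 0
0 0      1 1      0 -1     0 1

Convergence: random initial state
t=0      t=1      t=2      t=3 (C_final)
 3 7      1 -2    1 0      1 0
-1 5     -2 -5    0 5      0 1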
D. Jones, R. McWilliam, and A. Purvis, “Designing convergent cellular automata.” Biosystems, vol. 96, no. 1, pp. 80–85, 2008.
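The two convergence sequences on this slide can be reproduced with a few lines of simulation. The sketch below assumes synchronous updates, a zero boundary on the north and west edges, and the parameters d=1, n=-1, w=-1 from the slide; the function names `step` and `run` are illustrative and not taken from the cited paper.

```python
def step(grid, d=1, n=-1, w=-1, boundary=0):
    """One synchronous update: each cell becomes n*north + w*west + d."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            north = grid[i - 1][j] if i > 0 else boundary
            west = grid[i][j - 1] if j > 0 else boundary
            new[i][j] = n * north + w * west + d
    return new


def run(grid, steps, **params):
    """Return the list of states from t=0 to t=steps."""
    history = [grid]
    for _ in range(steps):
        grid = step(grid, **params)
        history.append(grid)
    return history


zero_run = run([[0, 0], [0, 0]], 3)      # zero initial state
random_run = run([[3, 7], [-1, 5]], 3)   # random initial state from the slide
upset_run = run([[1, 0], [9, 1]], 3)     # goal state with one corrupted cell

for name, history in [("zero", zero_run), ("random", random_run), ("upset", upset_run)]:
    print(name, " -> ".join(str(state) for state in history))
```

All three runs end at [[1, 0], [0, 1]]. The third run starts from the goal state with a single corrupted cell (a hypothetical transient upset, not a case shown on the slide) and returns to the goal without any external intervention.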
-
Stem cell algorithm: Self-organised behavior
-
Dynamic CCA
-
CCA as coordination layer
-
On-going work: test platform
Change boundary condition to (4,7)
Full adder logic element
-
On-going work: test platform
-
Future work
Subject complete CCA coordination logic to SEUs
Fault injection hardware integrated with MSTS (thermal, electrical, radiation)
ISIS facility, Rutherford Appleton Laboratory (UK): ChipIR high-energy neutron source (800 MeV)
[Figure: MSTS hardware, with routing logic and control blocks]
Introduce fault tolerant architecture to functional logic layer.
-
Thank you
Publications
Jones, David, McWilliam, Richard & Purvis, Alan 2008. Designing convergent cellular automata. Biosystems 96(1): 80-85.
Jones, David, McWilliam, Richard & Purvis, Alan 2010. Design of a self-assembling, repairing and reconfiguring Arithmetic Logic Unit. In New Advanced Technologies. Aleksandar Lazinica (ed.). Vienna, Austria: Intech Education and Publishing. 161-176. (http://sciyo.com/books/show/title/new-advanced-technologies)
Jones, D.H., McWilliam, R. & Purvis, A. 2011. Convergence and feedback: a framework for cellular automata design. Journal of Cellular Automata 6(4-5): 399-416