  • TESConf 5-6 Nov 2012

    Self-repairing electronic logic units based on convergent cellular automata

    Richard McWilliam

  • The need for fault-tolerant electronic systems

    Self-repair brings big benefits to maintenance, repair and overhaul (MRO):

    • Better in-flight fault detection
    • Register self-repair events
    • Improve scheduled maintenance

    Qantas Flight 72

    • Suffered an in-flight upset (suspected single event effect, SEE)
    • Failure of an air data inertial reference unit
    • Flight control primary computer could not handle the data
    • 12 serious injuries, 95 minor injuries

  • Mission critical electronics

    Single event upsets (SEUs)

    • Especially prevalent in SRAM
    • Resulted in radiation-hardened devices, e.g., Virtex-5QV

    Cibola Flight Experiment (CFESat)

    • 9 × Virtex 1000 (6M bits), launched 2007
    • Low orbit; SEU rates varied from 0.13/hr (quiet sun) to a maximum of 4.2/hr

    See, e.g., M. Wirthlin et al., 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003, pp. 133-142.

    NASA Radiation Belt Storm Probes (RBSP) mission
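To put the CFESat figures in perspective, the short sketch below converts the two reported rates into upsets per day and mean time between upsets. The slide does not say whether the rates are per device or for all nine devices together, so the numbers are taken exactly as quoted; the function names are illustrative only.

```python
# SEU rates (upsets per hour) as quoted on the slide for CFESat.
QUIET_SUN_RATE_PER_HR = 0.13
MAX_RATE_PER_HR = 4.2


def upsets_per_day(rate_per_hr):
    """Expected number of upsets in 24 hours at a constant rate."""
    return rate_per_hr * 24


def mean_minutes_between_upsets(rate_per_hr):
    """Mean time between upsets, in minutes, at a constant rate."""
    return 60.0 / rate_per_hr


for label, rate in (("quiet sun", QUIET_SUN_RATE_PER_HR),
                    ("maximum", MAX_RATE_PER_HR)):
    print(f"{label}: {upsets_per_day(rate):.2f} upsets/day, "
          f"one upset every {mean_minutes_between_upsets(rate):.1f} min on average")
```

At the quoted maximum rate this works out to roughly one upset every quarter of an hour, which is why continuous self-restoration is attractive compared with occasional scrubbing.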

  • Microsystems packaging failure mechanisms

    Root cause failure analysis: electronics

    Overstress mechanisms

    • Mechanical: brittle fracture, plastic deformation, interfacial delamination
    • Electrical: EMI, ESD, radiation, gate oxide breakdown, interconnect melting

    Wearout mechanisms

    • Mechanical: fatigue damage, creep, wear, stress-driven voiding, interfacial delamination
    • Electrical: hillock formation, junction spiking, electromigration
    • Chemical: corrosion, diffusion, dendritic growth

    (Source: Fundamentals of Microsystems Packaging by Rao Tummala)

  • Self-recovery strategies

    • Fault-tolerant structures contain redundancy so that they can tolerate one or more fault occurrences, e.g., error detection and correction (EDC) or quadded logic.
    • Self-repair requires reconfiguration after the fault event.
    • Self-preservation takes pre-emptive action to minimise the effects of an impending fault event (health monitoring, data analysis).
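As a concrete illustration of the EDC style of redundancy mentioned on this slide, here is a minimal sketch of a Hamming(7,4) code, which stores 4 data bits in 7 bits and corrects any single flipped bit. The slide does not name a particular code; Hamming(7,4) is chosen here only because it is the smallest textbook example, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct up to one flipped bit.

    Returns (data_bits, syndrome); the syndrome is the 1-based position
    of the corrected bit, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = hamming74_encode([1, 0, 1, 1])
upset = list(word)
upset[4] ^= 1  # simulate a single event upset in one stored bit
print(hamming74_correct(upset))  # ([1, 0, 1, 1], 5)
```

The code masks the fault rather than repairing the storage cell, which is the distinction the slide draws between fault tolerance and self-repair.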

  • Built-in self-test and repair: BISTAR

    (Other related terms: BIST, BISD, BIRA, BISR)

    Built-in logic, but not self-repairing.

    Goal is BISTAR at system level.

    E.g., A. Benso et al., “An on-line BIST RAM architecture with self-repair capabilities,” IEEE Transactions on Reliability, vol. 51, no. 1, pp. 123-128, Mar. 2002.

  • Towards BISTAR

    Self-healing system

    Built-in reconfiguration is hard to do in electronics

  • Towards self-repair

    [Figure: sea of logic and switching, with BISR, BISD and self-restoration]

    Goals

    • BISDAR within a ‘sea of logic’
    • Self-restoring (transient upsets)
    • Fault tolerant with redundancy (hard faults)
    • Does not require major in-circuit reprogramming

  • Cellular electronics approach

    • Composed of many identical ‘cells’
    • Each cell contains a copy of the rules
    • Global output state depends on the rules and the boundary conditions

    Example cell look-up table (LUT) with inputs I1, I2 and output O:

    I1  I2  O
    0   1   0
    2   5   -1

    [Figure: LUT cell with inputs I1, I2 and output O; cellular automaton (CA) of cells and boundary at t=0 and t=1]
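The three properties listed on this slide (identical cells, a shared copy of the rules, and a global state set by rules plus boundary) can be sketched with a generic one-dimensional cellular automaton. This is not the cell design from the talk: the three-input binary LUT, the choice of elementary rule 90, and the function names are all assumptions made for illustration.

```python
def make_lut(rule_number):
    """Build the look-up table of an elementary CA.

    Maps (left, centre, right) neighbour states to the next cell state,
    using the standard Wolfram rule numbering.
    """
    return {
        (l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
        for l in (0, 1) for c in (0, 1) for r in (0, 1)
    }


def ca_step(cells, lut, left_boundary=0, right_boundary=0):
    """Update every cell synchronously; each cell applies the same LUT."""
    padded = [left_boundary] + list(cells) + [right_boundary]
    return [lut[(padded[k - 1], padded[k], padded[k + 1])]
            for k in range(1, len(padded) - 1)]


lut = make_lut(90)  # rule 90: next state = left XOR right
start = [0, 0, 1, 0, 0]
print(ca_step(start, lut))                    # [0, 1, 0, 1, 0]
print(ca_step(start, lut, right_boundary=1))  # [0, 1, 0, 1, 1]
```

The two calls use the same cells and the same rules; only the boundary value differs, and so does the resulting global state.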

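  • Convergent Cellular Automata (CCA)

    [Figure: 2×2 array of cells C1,1 C1,2 / C2,1 C2,2 with boundary = 0]

    Define a transition function:

    $$C_{i,j,t+1} = d + n\,C_{i-1,j,t} + w\,C_{i,j-1,t}$$

    $$C_{\text{next}} = A\,C_{\text{current}} + D$$

    A = transition matrix

    D = constant

    Apply convergence criteria to attain $C_{\text{final}}$:

    $$(I - A)\,C_{\text{final}} - D = 0$$

    $$\begin{pmatrix} 1 & 0 & 0 & 0 \\ -w & 1 & 0 & 0 \\ -n & 0 & 1 & 0 \\ 0 & -n & -w & 1 \end{pmatrix} \begin{pmatrix} C_{1,1} \\ C_{1,2} \\ C_{2,1} \\ C_{2,2} \end{pmatrix} - \begin{pmatrix} d \\ d \\ d \\ d \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$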
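The fixed point of the transition function can be computed directly. The sketch below assumes the update C(i,j) = d + n·C(i-1,j) + w·C(i,j-1) with all boundary cells held at 0, builds the transition matrix A and constant D for a grid flattened row by row, and solves (I − A)·C = D. Because each cell depends only on its north and west neighbours, A is strictly lower triangular in this ordering and simple forward substitution is enough. Function names are illustrative.

```python
def build_system(rows, cols, d, n, w):
    """Return (A, D) for a rows x cols CCA with zero boundary.

    Cells are flattened row by row, so cell (i, j) has index i * cols + j.
    """
    size = rows * cols
    A = [[0] * size for _ in range(size)]
    for i in range(rows):
        for j in range(cols):
            k = i * cols + j
            if i > 0:
                A[k][(i - 1) * cols + j] = n  # north neighbour
            if j > 0:
                A[k][i * cols + (j - 1)] = w  # west neighbour
    return A, [d] * size


def solve_fixed_point(A, D):
    """Solve (I - A) C = D by forward substitution (A strictly lower triangular)."""
    C = []
    for k, row in enumerate(A):
        C.append(D[k] + sum(row[m] * C[m] for m in range(k)))
    return C


A, D = build_system(2, 2, d=1, n=-1, w=-1)
print(solve_fixed_point(A, D))  # [1, 0, 0, 1], i.e. the grid [[1, 0], [0, 1]]
```

Non-zero boundary values are not modelled here; they would add a further constant term to D for the cells on the north and west edges.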

  • CCA Example

    Goal:

    1 0
    0 1

    Solving the above equation for this goal (i.e., d=1, n=-1, w=-1):

    $$C_{i,j,t+1} = 1 - C_{i-1,j,t} - C_{i,j-1,t}$$

    Convergence: zero initial state

    0 0     1 1     1  0     1 0
    0 0  →  1 1  →  0 -1  →  0 1

    Convergence: random initial state

     3 7      1 -2     1 0     1 0
    -1 5  →  -2 -5  →  0 5  →  0 1

    D. Jones, R. McWilliam, and A. Purvis, “Designing convergent cellular automata,” Biosystems, vol. 96, no. 1, pp. 80-85, 2008.
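The two convergence sequences on this slide can be reproduced with a few lines of code. This is a minimal sketch, assuming a synchronous update of every cell from its north and west neighbours with d=1, n=-1, w=-1 and boundary cells held at 0; the function names are illustrative.

```python
def cca_step(grid, d=1, n=-1, w=-1, boundary=0):
    """One synchronous update: C[i][j] <- d + n * north + w * west."""
    new_grid = []
    for i, row in enumerate(grid):
        new_row = []
        for j in range(len(row)):
            north = grid[i - 1][j] if i > 0 else boundary
            west = grid[i][j - 1] if j > 0 else boundary
            new_row.append(d + n * north + w * west)
        new_grid.append(new_row)
    return new_grid


def run_until_stable(grid, max_steps=20):
    """Return the list of states visited until the grid stops changing."""
    history = [grid]
    for _ in range(max_steps):
        nxt = cca_step(history[-1])
        if nxt == history[-1]:
            break
        history.append(nxt)
    return history


for start in ([[0, 0], [0, 0]], [[3, 7], [-1, 5]]):
    print(" -> ".join(str(state) for state in run_until_stable(start)))
```

Both runs end at [[1, 0], [0, 1]] after three updates, passing through the same intermediate states shown on the slide. The final state does not depend on the starting values because information only flows away from the fixed boundary.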

  • Stem cell algorithm: self-organised behaviour

  • Dynamic CCA

  • CCA as coordination layer

  • On-going work: test platform

    Change boundary condition to (4,7)

    Full adder logic element

  • On-going work: test platform (continued)

  • Future work

    • Subject the complete CCA coordination logic to SEUs
    • Fault injection hardware integrated with MSTS (thermal, electrical, radiation)
    • ISIS facility, Rutherford Appleton Laboratory (UK): ChipIR high-energy neutron source (800 MeV)
    • Introduce a fault-tolerant architecture to the functional logic layer

    [Figure: MSTS hardware; labels: routing, logic, control]
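Before hardware fault injection, the effect of an upset on the cell states can be explored in software. The sketch below is a crude model and not the planned test platform: it assumes the 2×2 example with d=1, n=-1, w=-1 and zero boundary, overwrites one randomly chosen cell of the converged state with a random value, and counts the synchronous updates needed to return to the goal. It only models upsets in the stored cell state, not faults in the rule logic itself; all names are hypothetical.

```python
import random

GOAL = [[1, 0], [0, 1]]


def cca_step(grid, d=1, n=-1, w=-1, boundary=0):
    """One synchronous update: C[i][j] <- d + n * north + w * west."""
    return [
        [d
         + n * (grid[i - 1][j] if i > 0 else boundary)
         + w * (grid[i][j - 1] if j > 0 else boundary)
         for j in range(len(grid[i]))]
        for i in range(len(grid))
    ]


def inject_upset(grid, rng):
    """Return a copy of grid with one random cell overwritten (crude SEU model)."""
    corrupted = [row[:] for row in grid]
    i = rng.randrange(len(grid))
    j = rng.randrange(len(grid[0]))
    corrupted[i][j] = rng.randint(-9, 9)
    return corrupted


def steps_to_recover(grid, goal, max_steps=10):
    """Number of updates until grid equals goal, or None if it never does."""
    for step in range(max_steps + 1):
        if grid == goal:
            return step
        grid = cca_step(grid)
    return None


rng = random.Random(0)
trials = [steps_to_recover(inject_upset(GOAL, rng), GOAL) for _ in range(1000)]
print("worst-case recovery steps:", max(trials))
```

For this 2×2 grid the state is restored within three updates wherever the upset lands, since a corrupted value in the top-left cell needs one update to be corrected and two more for its effect to clear the bottom-right cell.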

  • Thank you

    Publications

    Jones, David, McWilliam, Richard & Purvis, Alan 2008. Designing convergent cellular automata. Biosystems 96(1): 80-85.

    Jones, David, McWilliam, Richard & Purvis, Alan 2010. Design of a self-assembling, repairing and reconfiguring Arithmetic Logic Unit. In New Advanced Technologies, Aleksandar Lazinica (ed.). Vienna, Austria: Intech Education and Publishing. 161-176. (http://sciyo.com/books/show/title/new-advanced-technologies)

    Jones, D.H., McWilliam, R. & Purvis, A. 2011. Convergence and feedback: a framework for cellular automata design. Journal of Cellular Automata 6(4-5): 399-416.
