Fault Tolerance in Reconfigurable Computing / FPGAs

Bayram Kurumahmut

CMPE 516, MS Computer Engineering

Bogazici University, 27.04.2006



Outline

• Introduction
• Modify Configurable Logic Block (CLB)
• Dynamic Serial Testing
• Built-In Self Healing (BISH)
• Hardware Voter
• Configurable Fault Tolerant Processor (CFTP)
• Self-Checking Logic Design (SCLD)
• CLB Functional Testing

Introduction

• Configurable Logic Blocks (CLBs)
• Interconnect wires
• Interconnect switches
• Configured by SRAM contents

[Figure: configuration SRAM]

Modify CLB [4]

• Consider faults only in CLBs
• Shift configuration data
  – Load only one configuration for the test (loading a configuration is a very slow process)
  – Shift this configuration for the next tests
• Do not change the physical design of the running application
• No intervention at hardware level
  – Faster
  – Better results in test diagnosis and defect/fault tolerance

Modify CLB [4] (Cont’d)

• SRAM
  – Assumed to be fault-free
  – Holds the configuration data
  – Modified to enable shifting of the configuration
• Adding a multiplexer
  – Decides the shifting direction
    • Shifting to east/west/north/south
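The shifting idea can be illustrated in software. The sketch below is a minimal model under stated assumptions: each CLB's configuration is a single integer in a 2-D grid, and the added multiplexer is represented by a direction-select argument. The names `shift_config`, `NORTH`, `SOUTH`, `EAST` and `WEST` are hypothetical and are not taken from [4].

```python
from typing import List

# Hypothetical model: one integer configuration word per CLB, stored as
# grid[row][col], with row 0 at the north edge and col 0 at the west edge.
Grid = List[List[int]]

# Hypothetical select codes for the added shift-direction multiplexer.
NORTH, SOUTH, EAST, WEST = 0, 1, 2, 3

# For each direction: offset from a destination CLB back to its source CLB.
_SOURCE_OFFSET = {NORTH: (1, 0), SOUTH: (-1, 0), EAST: (0, -1), WEST: (0, 1)}


def shift_config(grid: Grid, direction: int, fill: int = 0) -> Grid:
    """Return a new grid in which every configuration word has moved one CLB
    in `direction`; CLBs on the vacated edge receive `fill`."""
    rows, cols = len(grid), len(grid[0])
    dr, dc = _SOURCE_OFFSET[direction]
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            sr, sc = r + dr, c + dc
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r][c] = grid[sr][sc]
    return out


if __name__ == "__main__":
    cfg = [[1, 2], [3, 4]]
    print(shift_config(cfg, EAST))   # [[0, 1], [0, 3]]
    print(shift_config(cfg, NORTH))  # [[3, 4], [0, 0]]
```

Only one configuration is "loaded" (the initial grid); every later test configuration is obtained by shifting, which is the point of the modification.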

Modify CLB [4] (Cont’d)

• Hardware overhead
  – Calculate the additional transistor count
  – Calculate the device transistor count
  – Compare them
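The comparison can be expressed as a simple ratio. The symbols are introduced here for illustration and are not taken from [4]: $N_{\text{add}}$ is the number of transistors added for the shifting logic and $N_{\text{dev}}$ is the transistor count of the device.

```latex
\text{Overhead} = \frac{N_{\text{add}}}{N_{\text{dev}}} \times 100\,\%
```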

Dynamic Serial vs Parallel [5]

• Reduces test configuration time
• Requires fewer I/O pins
• Faster and easier

Dynamic Serial vs Parallel [5] (Cont’d)

• Consider unprogrammed FPGAs for testing
  – No specific user-designed application configuration
  – Consider all configurations
• Generate and download configurations
  – Time consuming
• Reduce the number of configurations by decomposition
• Find test patterns

Dynamic Serial Test [5] (Cont’d)

• Function unit
  – Multiplexers and one D-type flip-flop
  – Test pattern requirements for the multiplexers
    • Detect their stuck-on/stuck-off faults
    • Detect stuck-at faults on all their I/O nets
    • Detect bridging faults between data inputs
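To make the stuck-at requirement concrete, the sketch below models a single 2:1 multiplexer (an assumption; the function unit in [5] contains several multiplexers and a flip-flop), injects every stuck-at-0/1 fault on its I/O nets `d0`, `d1`, `s` and `y`, and greedily picks input patterns until every fault is detected. Stuck-on/off and bridging faults are not modeled, and all names are hypothetical.

```python
from itertools import product

NETS = ("d0", "d1", "s", "y")


def mux2(d0: int, d1: int, s: int, fault=None) -> int:
    """2:1 multiplexer; `fault` is an optional (net, stuck_value) pair."""
    v = {"d0": d0, "d1": d1, "s": s}
    if fault is not None and fault[0] in v:
        v[fault[0]] = fault[1]  # force a faulty input net
    y = v["d1"] if v["s"] else v["d0"]
    if fault is not None and fault[0] == "y":
        y = fault[1]  # force the faulty output net
    return y


def detects(pattern, fault) -> bool:
    # A pattern detects a fault if the faulty output differs from the good one.
    return mux2(*pattern) != mux2(*pattern, fault=fault)


def greedy_test_set():
    """Pick (d0, d1, s) patterns until all single stuck-at faults are covered."""
    remaining = {(n, sv) for n in NETS for sv in (0, 1)}
    patterns = list(product((0, 1), repeat=3))
    chosen = []
    while remaining:
        best = max(patterns, key=lambda p: sum(detects(p, f) for f in remaining))
        chosen.append(best)
        remaining -= {f for f in remaining if detects(best, f)}
    return chosen


if __name__ == "__main__":
    print(greedy_test_set())
```

For this multiplexer the greedy search returns four patterns, which is also the minimum: stuck-at-0 and stuck-at-1 on `d0` and on `d1` each require a different combination of the select value and the selected data value.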

Dynamic Serial Test [5] (Cont’d)

• 11 test configurations (TCs) for a function unit
• Goal: an efficient way to test many function units in a short time
  – 11 TCs × 4096 function units = 45056 TCs for the XC6216
  – Apply parallel testing after this step
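Assuming the XC6216's 4096 cells form a 64 × 64 array with one function unit per cell, the total on the slide works out as:

```latex
11 \times (64 \times 64) = 11 \times 4096 = 45\,056 \ \text{TCs}
```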

Dynamic Serial Test [5] (Cont’d)

• Direct parallel testing
  – Test a whole row or column of cells at the same time
  – TC count increases with FPGA size; 11 TCs per test unit
  – Not very efficient
• Two-phase parallel testing
  – Reed-Muller Propagation Chain (RMPC)
  – 22 TCs per test unit, constant
  – Locates a single faulty function unit with 4 TCs

Dynamic Serial Test [5] (Cont’d)

• Proposed method
  – Link all function units into a chain
  – Test chain integrity in bypass mode
  – Test a function unit with its 11 TCs and the corresponding test patterns (TPs)
  – Return to bypass mode
  – Repeat for the next function unit
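The chain procedure can be sketched as a small simulation. This is an illustration under assumptions, not the method's actual test data: `FunctionUnit`, `chain_integrity` and `serial_test` are hypothetical names, a broken bypass path is modeled as stuck at 0, a faulty unit simply inverts its response for certain TCs, and the expected response per TC is a placeholder (TCs and TPs are merged into one step).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FunctionUnit:
    """Hypothetical function-unit model for a serial test chain."""
    faulty_tcs: Set[int] = field(default_factory=set)  # TCs that give a wrong response
    bypass_ok: bool = True  # False models a bypass path stuck at 0

    def bypass(self, bit: int) -> int:
        # Bypass mode: the unit should pass its input straight through.
        return bit if self.bypass_ok else 0

    def respond(self, tc: int, expected: int) -> int:
        # Test mode: a faulty unit inverts the expected response for that TC.
        return 1 - expected if tc in self.faulty_tcs else expected


def chain_integrity(chain: List[FunctionUnit]) -> bool:
    """Send 0 and 1 through the whole chain with every unit in bypass mode."""
    for bit in (0, 1):
        value = bit
        for unit in chain:
            value = unit.bypass(value)
        if value != bit:
            return False
    return True


def serial_test(chain: List[FunctionUnit], num_tcs: int = 11) -> Optional[List[int]]:
    """Return indices of faulty units, or None if the bypass chain is broken."""
    if not chain_integrity(chain):
        return None
    faulty = []
    for i, unit in enumerate(chain):
        # Unit i leaves bypass mode; all other units stay in bypass.
        for tc in range(num_tcs):
            expected = tc & 1  # placeholder expected response for TC number tc
            value = unit.respond(tc, expected)
            for downstream in chain[i + 1:]:
                value = downstream.bypass(value)  # propagate to the chain output
            if value != expected:
                faulty.append(i)
                break
        # Unit i returns to bypass mode before the next unit is tested.
    return faulty
```

Because only one unit is out of bypass mode at a time, a mismatch at the chain output directly identifies the faulty unit, with no extra location TCs.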

Dynamic Serial Test [5] (Cont’d)

• Compared with parallel testing
  – Requires fewer TCs
    • 13 TCs, not 22 TCs
  – Locates faults without additional TCs
  – Uses fewer I/O pins
    • Simplifies test observation

Dynamic Serial Test [5] (Cont’d)

• Disadvantage
  – Propagation path length
    • Depends on array size
  – Integrate with the parallel approach for large arrays
    • Requires additional I/O pins

Built-In Self Healing (BISH) [8]

• Run-time self-reconfiguration
• Implements a soft processor
  – Manages and executes all procedures
    • Fault detection/location/repair
    • Modular redundancy to ensure correct operation

BISH - Submicron technology problems [8]

• Single event upsets (SEUs)
  – Radiation-induced transient errors caused by neutrons from cosmic rays
  – Alpha particles from packaging material
  – Do not physically damage the chip
  – Change memory cell values
    • Incorrect data
    • Improper instructions for the processor
• Increased threat of electromigration
  – Physical damage to the chip

BISH - Tasks [8]

• Detection
  – Scan chain
    • Regularly captures net values
    • Analyzes them in the soft processor
• Diagnosis, repair
  – Also controlled by the soft processor
• Applied only to SEUs

BISH - Fault Causes [8]

• SEU changing a circuit register value
  – Possibly a transient error
  – The error disappears in the next capture after the register is updated
• SEU changing a configuration memory cell
  – Wrong functionality assigned on the FPGA
  – Read back the configuration
  – CRC check
  – Partial reconfiguration if an inconsistency exists
• Permanent physical defect on the FPGA
  – Mark the defective area
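The readback, CRC check and repair steps can be sketched as one scrubbing pass. Everything here is a simplified assumption: configuration memory is a dictionary of frames, Python's `zlib.crc32` stands in for whatever CRC the device actually uses, and `scrub` and its `defective` set (frames that cannot be rewritten, simulating a permanent defect) are hypothetical.

```python
import zlib
from typing import Dict, List, Set, Tuple

Frames = Dict[int, bytes]  # hypothetical model: frame address -> frame contents


def scrub(golden: Frames, memory: Frames, defective: Set[int]) -> Tuple[List[int], List[int]]:
    """One readback pass over configuration memory.

    Frames whose CRC-32 differs from the golden CRC are rewritten (partial
    reconfiguration). `defective` simulates permanent defects: writes to
    those frames have no effect. Returns (repaired, marked_defective)."""
    golden_crc = {addr: zlib.crc32(data) for addr, data in golden.items()}
    repaired, marked = [], []
    for addr in sorted(golden):
        if zlib.crc32(memory[addr]) == golden_crc[addr]:
            continue  # frame is consistent
        if addr not in defective:
            memory[addr] = golden[addr]  # rewrite only this frame
        if zlib.crc32(memory[addr]) == golden_crc[addr]:
            repaired.append(addr)  # SEU in configuration memory, now fixed
        else:
            marked.append(addr)  # error persists: mark the area as defective
    return repaired, marked
```

Detection only needs the stored golden CRCs; the full golden frames are needed for the repair step.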

Hardware Voter [6]

• Detect and correct single errors on the inputs

• Bypass double errors in X1, X2, X3 by substituting the erroneous data with the spare input, X4

[Figure: hardware voter block diagram with spare inputs, outputs for the congruency level of accepted SEs, and an unrecoverable error signal]
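A behavioral sketch of the voter's decision rules follows. It is a simplification and not the circuit in [6]: the function name `voter` and the status strings are hypothetical, the congruency-level output is not modeled, and a double error is assumed to show up as three mutually different values among X1, X2 and X3.

```python
from collections import Counter
from typing import Optional, Tuple


def voter(x1: int, x2: int, x3: int, x4: int) -> Tuple[Optional[int], str]:
    """Vote on X1..X3 and fall back on the spare X4 when no majority exists."""
    value, n = Counter((x1, x2, x3)).most_common(1)[0]
    if n == 3:
        return value, "no error"
    if n == 2:
        return value, "single error corrected"
    # All three primary inputs disagree: substitute the spare input X4.
    if x4 in (x1, x2, x3):
        return x4, "double error bypassed with spare"
    return None, "unrecoverable error"
```

As with any majority scheme, two erroneous inputs that happen to agree would outvote the correct one; the sketch does not try to detect that case.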

Configurable Fault Tolerant Processor (CFTP) [2]

• Applied to spacecraft onboard processing

• Triple Modular Redundancy (TMR) for a soft processor on an FPGA
  – Mitigates bit errors in computation by detecting and correcting them with voting logic
  – On-orbit updates, reconfigurations, modifications

• Detects SEU-induced configuration faults
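The voting logic behind TMR is, at its core, a bitwise 2-out-of-3 majority. The sketch below uses the standard majority expression; the function names are hypothetical and this is not code from the CFTP design in [2].

```python
from typing import List, Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three redundant results."""
    return (a & b) | (a & c) | (b & c)


def tmr_check(a: int, b: int, c: int) -> Tuple[int, List[int]]:
    """Return the voted result and the indices of copies that disagree with it."""
    voted = tmr_vote(a, b, c)
    return voted, [i for i, r in enumerate((a, b, c)) if r != voted]
```

Because the vote is taken per bit, each of the three copies may have a different bit flipped and the voted word is still correct, as long as no single bit position is corrupted in two copies.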

Self-Checking Logic Design (SCLD) [3]

• Map Boolean functions into the FPGA
• Functional cell
  – Generates complementary outputs
• Checker cell
  – Verifies the correctness of the final outputs
• Fault: the same value on both complementary outputs
• Increases the number of CLBs used, but incorporates self-checking and testability features
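The complementary-output idea can be shown with a tiny model. The names `functional_cell` and `fault_detected` are hypothetical, and a fault is modeled only as a stuck-at value on one output rail.

```python
from typing import Callable, Optional, Tuple

Rails = Tuple[int, int]


def functional_cell(f: Callable[..., int], inputs: Tuple[int, ...],
                    stuck: Optional[Tuple[int, int]] = None) -> Rails:
    """Hypothetical functional cell producing (f, NOT f).
    `stuck` = (rail_index, value) injects a stuck-at fault on one output rail."""
    value = f(*inputs)
    rails = [value, 1 - value]
    if stuck is not None:
        rails[stuck[0]] = stuck[1]
    return rails[0], rails[1]


def fault_detected(rails: Rails) -> bool:
    # Valid outputs are complementary; equal values on both rails signal a fault.
    return rails[0] == rails[1]
```

A stuck-at fault on a rail is only observable for inputs that would drive that rail to the opposite value; for an AND cell with inputs (1, 1), stuck-at-1 on the complemented rail is detected, while stuck-at-1 on the true rail is not.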

SCLD – Fault Types [3]

• Single stuck-at faults in RAM cells
• Single stuck-at faults on any line of a CLB
• Functional faults in any multiplexer within a single CLB
• Functional faults in any D-type flip-flop within a single CLB
• Single stuck-at faults in any pass transistor connecting CLBs

SCLD [3]

• k-feasible: a function that depends on at most k inputs
  – Functional cells have 4 inputs (k = 4)
• 4-feasible Boolean functions are required
• If a function is not 4-feasible, decompose it before mapping it onto the FPGA
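A naive way to obtain 4-feasible nodes is to split each wide AND or OR into a tree of gates with at most four inputs. This sketch only does that grouping; it is not the minimum-node search of [3], and `decompose`, `count_nodes` and `max_fanin` are hypothetical helpers.

```python
from typing import List, Tuple, Union

# A node is either an input name or a tuple (operator, child, child, ...).
Node = Union[str, Tuple]


def decompose(op: str, inputs: List[Node], k: int = 4) -> Node:
    """Split one wide AND/OR gate into a tree of gates with at most k inputs."""
    level = list(inputs)
    while len(level) > k:
        nxt = []
        for i in range(0, len(level), k):
            group = level[i:i + k]
            nxt.append(group[0] if len(group) == 1 else (op, *group))
        level = nxt
    return level[0] if len(level) == 1 else (op, *level)


def count_nodes(node: Node) -> int:
    """Number of gates (function cells) in the tree."""
    return 0 if isinstance(node, str) else 1 + sum(count_nodes(c) for c in node[1:])


def max_fanin(node: Node) -> int:
    """Largest number of inputs of any gate; must be <= 4 for 4-feasibility."""
    if isinstance(node, str):
        return 0
    return max([len(node) - 1] + [max_fanin(c) for c in node[1:]])


if __name__ == "__main__":
    # f = abcde + fg: each product term first, then the sum.
    f = decompose("OR", [decompose("AND", list("abcde")), decompose("AND", list("fg"))])
    print(f, count_nodes(f), max_fanin(f))
```

For f = abcde + fg this yields four 4-feasible nodes; a real mapper would compare alternative decompositions and keep the one with the fewest nodes.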

SCLD – Algorithm [3]

• Decompose a sum-of-products expression into 4-feasible expressions, and choose the decomposition with the minimum number of nodes.

• Map each expression directly into a 4-input function cell.

• Connect the outputs of a pair of intermediate function cells to the inputs of a checker cell, and generate the equations for each output of the checker cell.

• Cascade the checker cells to form a checker tree. The outputs of the function cells at the last stage are the circuit outputs.
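The checker tree can be illustrated with the standard two-rail checker cell, whose outputs are z0 = a0·b1 + a1·b0 and z1 = a0·b0 + a1·b1. These are the textbook equations and may differ from the checker-cell equations generated in [3]; `checker_cell`, `checker_tree` and `is_code_word` are hypothetical names.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (rail 0, rail 1); valid when the rails are complementary


def checker_cell(a: Pair, b: Pair) -> Pair:
    """Standard two-rail checker: complementary inputs give complementary
    outputs; a non-complementary input pair gives non-complementary outputs."""
    a0, a1 = a
    b0, b1 = b
    return (a0 & b1) | (a1 & b0), (a0 & b0) | (a1 & b1)


def checker_tree(pairs: List[Pair]) -> Pair:
    """Cascade checker cells pairwise, level by level, down to one output pair."""
    level = list(pairs)
    while len(level) > 1:
        nxt = [checker_cell(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd pair moves up to the next level
        level = nxt
    return level[0]


def is_code_word(p: Pair) -> bool:
    return p[0] != p[1]
```

A single non-complementary pair anywhere at the leaves propagates to a non-complementary pair at the root, so one final output pair signals whether any functional cell has failed.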

SCLD – Example [3]

SCLD – Implementation [3]

CLB Functional Testing [1]

• Gate-level testing not required
• Uses the functional property of the CLB
  – AND gate, OR gate, or any Boolean expression
• Additional hardware to apply the test
  – Multiplexer
  – Example for a 2-input CLB

CLB Functional Testing - Redundant Faults [1]

• CLB function = AND gate
  – Sa0 on the first data input of the multiplexer
  – Sa0 on the second data input of the multiplexer
  – Sa0 on the third data input of the multiplexer
  – Sa1 on the fourth data input of the multiplexer

• CLB function = OR gate
  – Sa0 on the first data input of the multiplexer
  – Sa1 on the second data input of the multiplexer
  – Sa1 on the third data input of the multiplexer
  – Sa1 on the fourth data input of the multiplexer
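These redundant faults can be reproduced by exhaustive simulation. The sketch assumes that a 2-input CLB behaves like a 4:1 multiplexer whose data inputs hold the configuration (truth-table) bits and whose select lines are the CLB inputs, with data input index 2·x1 + x0; this mapping and the helper names are assumptions for illustration.

```python
from itertools import product
from typing import List, Optional, Tuple

Fault = Tuple[int, int]  # (data input index 0..3, stuck value)


def clb_output(config: Tuple[int, ...], x1: int, x0: int,
               fault: Optional[Fault] = None) -> int:
    """2-input CLB modeled as a 4:1 mux over its configuration bits."""
    data = list(config)
    if fault is not None:
        data[fault[0]] = fault[1]  # stuck-at fault on one mux data input
    return data[2 * x1 + x0]


def redundant_faults(config: Tuple[int, ...]) -> List[Fault]:
    """Faults that no input combination can reveal (exhaustive test)."""
    faults = [(j, v) for j in range(4) for v in (0, 1)]
    return [f for f in faults
            if all(clb_output(config, a, b) == clb_output(config, a, b, f)
                   for a, b in product((0, 1), repeat=2))]
```

With the AND configuration `(0, 0, 0, 1)`, `redundant_faults` returns `[(0, 0), (1, 0), (2, 0), (3, 1)]`, that is Sa0 on the first three data inputs and Sa1 on the fourth; the OR configuration `(0, 1, 1, 1)` gives `[(0, 0), (1, 1), (2, 1), (3, 1)]`. A stuck-at fault is redundant exactly when it forces a data input to the value its configuration bit already holds.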

CLB Functional Testing [1]

• Exhaustive testing applied

• Long test length but high fault coverage
  – 99.81%, compared with 87.90% for gate-level testing

Conclusion

• Dynamically reconfigurable environments
  – Allow flexible testing of circuits
  – Repair errors by partial reconfiguration

• A defect in part of the hardware does not disturb normal operation
  – Design your processor on them to provide self-test of the circuit

References

• [1] Testing of FPGA Logic Cells, E. Bareisa, V. Jusas, K. Motiejunas, R. Seinauskas, 2004, Elektronika ir Elektrotechnika, ISSN 1392-1215
• [2] Configurable Fault-Tolerant Processor (CFTP) for Spacecraft Onboard Processing, Charles A. Hulme, Herschel H. Loomis, Alan A. Ross, Rong Yuan, 2004 IEEE Aerospace Conference Proceedings
• [3] Self-Checking Logic Design for FPGA Implementation, Parag K. Lala, Alfred L. Burress, 2003 IEEE Transactions on Instrumentation and Measurement
• [4] FPGAs and Fault Tolerance, Abderrahim Doumar, Hideo Ito, 2001 The 13th International Conference on Microelectronics
• [5] Fault Detection and Location of Dynamic Reconfigurable FPGAs, Chi-Feng Wu, Cheng-Wen Wu
• [6] FPGA Implementation of Hardware Voter, Milos D. Krstic, Mile K. Stojcev, TELSIKS 2001 IEEE
• [7] Testing the Configurability of Dynamic FPGAs, N. Park, S. J. Ruiwale, F. Lombardi, 2000 IEEE
• [8] A Self-Healing Real-Time System Based on Run-Time Self Reconfiguration, Manuel G. Gericota, Gustavo R. Alves, Jose M. Ferreira, 2005 IEEE
• [9] Testing Approach within FPGA-based Fault Tolerant Systems, Abderrahim Doumar, Hideo Ito, 2000 IEEE