Maintaining Data Integrity in Programmable Logic in Atmospheric Environments through Error Detection

Joel Seely
Technical Marketing Manager
Military & Aerospace Business Unit



Page 2: Single Event Upset (SEU) Overview for SRAM-Based FPGAs

Page 3: Definitions

Copyright © 2004 Altera Corporation

SEU: Single Event Upset
    Unwanted Change in State of a Latch or a Memory Cell
SER: Soft Error Rate
    SEU Rate
SEFI: Single Event Functional Interrupt
    Functional Failure by SEU
    Not All SEUs are SEFIs
    Generally Takes 5-10 SEUs to Cause SEFI

Page 4: Circuit Components of SRAM-Based FPGAs

I/O Registers & I/O Configuration
    No Issue, Very Robust Registers, < 1 FIT
Logic Registers (LEs)
    No Issues, Very Robust Registers, < Hard Error Rate
User Memory
    Typically On-Chip Memories are “By 9” for Parity Checking
    IP Available for ECC
Configuration RAM (CRAM) for LUTs & Routing
    Area of Focus

Page 5: Upset of a CRAM Cell

[Figure: 6 Transistor Cell schematic (labels: Data In, Add, Clear, Data Out, Vcc, Vss) with two Voltage vs. Time waveforms, and a plot titled "Noise Current for 10fC Collected Charge" showing Current (µA, 0 to 200) against Time (ps, 0 to 200).]

Page 6: SEU Induced Failure Rate*

Device    LE Count    SEU Rate (FIT)    SEFI Rate (FIT)    MTBF** (Years)
EP1C6     6K          250               60                 1,900
EP1C20    20K         730               180                634
EP1S25    26K         1950              400                285
EP1S80    79K         6000              1200               95

* Data at Sea Level

** MTBF: Mean Time Between Functional Interrupt
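The MTBF column can be cross-checked against the SEFI column. The sketch below assumes the usual definition of 1 FIT as one failure per 10^9 device-hours and 8760 hours per year; neither is stated on the slide, and the function name is illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours (assumption, not from the slide)
FIT_HOURS = 1e9        # 1 FIT = 1 failure per 10^9 device-hours


def mtbf_years(fit: float) -> float:
    """Convert a failure rate in FIT to a mean time between failures in years."""
    return FIT_HOURS / (fit * HOURS_PER_YEAR)


# SEFI rates (FIT) copied from the table above
sefi_fit = {"EP1C6": 60, "EP1C20": 180, "EP1S25": 400, "EP1S80": 1200}

for device, fit in sefi_fit.items():
    print(f"{device}: {mtbf_years(fit):,.0f} years")
```

Under these assumptions the results agree with the table to rounding (about 1,903, 634, 285 and 95 years), which suggests the MTBF figures were derived from the SEFI rate rather than the raw SEU rate.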

Page 7: Number of CRAM Bit Upsets for Each Occurrence of Functional Upset

[Figure: two normal probability plots for the Altera EP1S25, titled "Neutron SER - WNR data" and "Alpha SER". X axis: # of CRAM bit upsets for each event of functional upset (0 to 50). Y axis: Std Deviation (-3 to 3), with cumulative percentages from 0.5% to 99.5%. Annotations: Median ~6, Median 5.]

Page 8: Addressing System-Level Issues

Page 9: SER Improvements/Mitigation

Chip Design Enhancements
New Materials & Process Enhancements
Larger CRAM Structure
Increase in Capacitance on Critical Node
Smaller Process => Smaller Die => Lower SEU Probability
Built-In Error Detection/Correction Circuitry

Page 10: SER Per SRAM Bit Trend

[Figure: SER per SRAM MBit (axis marks at 100 FITS and 1,000 FITS) plotted against Process Technology and Year: 0.5 µm (1995), 0.13 µm (2002), and a 90 nm Projection.]

Page 11: System Level Improvements/Mitigation

ECC for User Memory
Use Detection/Correction Feature
Triple Module Redundancy (TMR)
    To Achieve Lower Error Rate & Less Downtime
Migrate to Structured ASIC

Page 12: Soft Error Detection Methods

Configuration RAM Readout
    Read-Out Full Bitstream
    Compare with Stored Bitstream
    Can Determine where in Configuration Error Occurred
    Caveat: Security Issues with Reading Out Bitstream

[Figure: block diagram with Stored CRAM Data, FPGA, and Microprocessor or CPLD, annotated "Same or Different?"]
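A minimal sketch of the readout-and-compare idea, as it might run on the external microprocessor. How the bitstream is actually read back is device-specific and not covered here; the function name, the byte-array representation, and the LSB-first bit numbering are all assumptions made for illustration.

```python
def find_upset_bits(readback: bytes, golden: bytes) -> list[int]:
    """Return bit offsets where the read-back image differs from the stored one."""
    if len(readback) != len(golden):
        raise ValueError("images must be the same length")
    upsets = []
    for byte_index, (r, g) in enumerate(zip(readback, golden)):
        diff = r ^ g  # set bits mark positions that disagree
        for bit in range(8):
            if diff & (1 << bit):
                # bit 0 is the least significant bit of each byte (assumed)
                upsets.append(byte_index * 8 + bit)
    return upsets


# Hypothetical 3-byte "configuration image" with one simulated upset
golden = bytes([0b10110010, 0b00001111, 0b11110000])
readback = bytearray(golden)
readback[1] ^= 0b00000100  # flip bit 2 of byte 1

print(find_upset_bits(bytes(readback), golden))  # [10]
```

Because the comparison is bit by bit, it reports where the error occurred, which is the advantage this method has over a single checksum.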

Page 13: Soft Error Detection Methods

On-Chip SEU Detection
    Dedicated Comparison Circuitry
        e.g. CRC Engine Comparing Stored CRC with That Calculated from Configuration RAM
    Detection Circuitry Running Continuously
    Error Detection Rate Variable Based on Implementation of Hardware, Number of CRAM Bits & Input Clock Frequency
    Error Signal Available Internally or Externally
    Caveat: Cannot Determine Where in Configuration Error Occurred

[Figure: inside the FPGA, a Computed Value and a Stored Value feed a comparator (=) whose result goes To Core]
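The stored-versus-computed comparison can be modeled in software as follows. This is only a sketch: it uses the CRC-32 from Python's `zlib` module as a stand-in, since the slides do not say which polynomial the on-chip engine uses, and `crc_error` and the sample data are hypothetical.

```python
import zlib


def crc_error(cram_image: bytes, stored_crc: int) -> bool:
    """Return True when the CRC computed over the image differs from the stored CRC."""
    return zlib.crc32(cram_image) != stored_crc


# Hypothetical configuration contents and the CRC saved at configuration time
config = bytes(range(256)) * 4
stored = zlib.crc32(config)

# Simulate a single-bit upset in the configuration RAM
upset = bytearray(config)
upset[100] ^= 0x01

print(crc_error(config, stored))        # False
print(crc_error(bytes(upset), stored))  # True
```

The check yields a single pass/fail flag, which mirrors the caveat above: a CRC mismatch shows that something changed but not which bit.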

Page 14: On-Chip Detection Example

Dedicated CRC Circuit
Configuration RAM Verification Capability
    32-Bit Cyclic Redundancy Code Check
    Verified Against Internally Stored Value
    Runs in the Background Without Impacting Device Performance
Close to Real-Time Detection
    Variable Clock Frequency
    Depends on Number of CRAM Bits
Multi-Event Detection
    Up to 3-Bit for 32-Bit CRC
Result Output to Either Core or Pin
    Use with Either Internal or External Hardware for Error Correction

Page 15: Correction Methods

FPGA Detection, System-Level Correction
    Lower Total Cost
    Downtime Is Limited & Manageable
    Used in Non-Critical Applications
Triple Module Redundancy
    Two Flavors
        All On-Chip in FPGA
        Separate Chips & Voter
    Correction Can Be Real-Time
    Used in Critical Applications

Page 16: Single System Detection & Correction

Step One: Detect the Soft Error
    75% of Reported Errors Are “Don’t Care” Errors
Step Two: Alert the System
Step Three: Fix the Error
    In Some Cases, Re-Program the FPGA
    In Some Cases, Reboot the Sub-System
    In Some Cases, Reboot the System
Need to Focus on System “Downtime”
    Each System Has Unique Requirements
    Re-Programming FPGA Takes < 250 ms
    Rebooting Time Varies & Can Be Fast “by Design”
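As a rough illustration of the downtime point, the sketch below multiplies the expected number of functional interrupts per year by a recovery time. It assumes the EP1S80 SEFI rate of 1200 FIT quoted at sea level earlier in the deck, treats the 250 ms re-programming bound as the whole recovery time, and assumes every interrupt is detected. The function name and the 8760 hours-per-year constant are illustrative, not from the slides.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days x 24 hours


def expected_downtime_s_per_year(sefi_fit: float, recovery_s: float) -> float:
    """Expected seconds of downtime per year for one device."""
    # 1 FIT = 1e-9 failures per device-hour
    events_per_year = sefi_fit * 1e-9 * HOURS_PER_YEAR
    return events_per_year * recovery_s


downtime = expected_downtime_s_per_year(sefi_fit=1200, recovery_s=0.250)
print(f"{downtime * 1000:.1f} ms per year")
```

With these inputs the estimate is on the order of a few milliseconds per device per year. A system that must reboot rather than re-program would substitute its own, likely longer, recovery time.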

Page 17: TMR Method 1

Identical Hardware in FPGAs
Use Voter Implemented in FPGA or CPLD
Utilize Either Hardware Output or CRC Error Pin
Voter Also Used to Signal Reconfiguration on Difference or Error

[Figure: FPGA Hardware 1, FPGA Hardware 2 and FPGA Hardware 3 feeding an FPGA or CPLD (Voting)]
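The voter's behavior can be sketched as a bitwise two-of-three majority plus a mismatch flag that could drive the reconfiguration signal. This is a software model for illustration only; a real voter would be implemented in logic in the FPGA or CPLD, and the function name and return shape are assumptions.

```python
def vote(a: int, b: int, c: int) -> tuple[int, bool]:
    """Bitwise 2-of-3 majority vote.

    Returns the voted output and a flag that is True when the three
    copies disagree (a candidate trigger for reconfiguration).
    """
    voted = (a & b) | (a & c) | (b & c)
    mismatch = not (a == b == c)
    return voted, mismatch


# Third copy has one corrupted bit; the other two outvote it
print(vote(0b1010, 0b1010, 0b1110))  # (10, True)
```

The output stays correct as long as at most one copy is wrong in any given bit position, while the mismatch flag still reports that one copy needs repair.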

Page 18: TMR Method 2

Multiple Instantiations of Hardware in Single FPGA
For Low-Rate SEUs
SEU Events May Occur Much More Frequently than Functional Error (De-Rating)
Voter Signals Reconfiguration of FPGA
FPGA Must be Reconfigured

[Figure: Hardware 1, Hardware 2 and Hardware 3 with a Voting Circuit, all inside one FPGA]

Page 19: De-Rating Methodology

Only a Fraction of Configuration Bits Are Actually Programmed
    e.g. Using Only Two Inputs of 4-Input LUT Leaves 75% of LUT as “Don’t Care”
    Only About 20% of Routing Is Used
    Depends on Utilization & Application
Some Un-Programmed Bits Still Matter
    Flipping Could Change Function of the Device
Extensive Experimentation Shows a Range From 1/8 to 1/3 of the Bits Matter
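Both de-rating figures can be reproduced with simple arithmetic. The sketch below assumes a k-input LUT holds 2^k configuration bits and that unused inputs are held at a fixed value, so only the entries addressable by the used inputs matter. It then applies the 1/8 to 1/3 range to the EP1S80 SEU rate of 6000 FIT from the failure-rate table. Function names are illustrative.

```python
from fractions import Fraction


def lut_dont_care_fraction(lut_inputs: int, used_inputs: int) -> Fraction:
    """Fraction of LUT configuration bits that cannot affect the output."""
    total_bits = 2 ** lut_inputs  # one bit per input combination
    live_bits = 2 ** used_inputs  # entries reachable through the used inputs
    return Fraction(total_bits - live_bits, total_bits)


def derated_fit(raw_seu_fit: float, sensitive_fraction: Fraction) -> float:
    """Scale a raw SEU rate by the fraction of bits that actually matter."""
    return float(raw_seu_fit * sensitive_fraction)


print(lut_dont_care_fraction(4, 2))  # 3/4

raw = 6000  # EP1S80 SEU rate in FIT, from the failure-rate table
low = derated_fit(raw, Fraction(1, 8))
high = derated_fit(raw, Fraction(1, 3))
print(low, high)  # 750.0 2000.0
```

The SEFI rate listed for the EP1S80 (1200 FIT) falls inside this 750 to 2000 FIT window, although the slides do not state that the SEFI figures were obtained this way.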

Page 20: Structured ASIC: Ultimate SEU Protection

No Configuration Memory = Estimated SER is below Hard Failure Rate for the Device

[Figure: FPGA and Structured ASIC, captioned "PLD Architecture with ASIC Routing"]

Page 21: Summary

SEU is a Well Understood Phenomenon
Many Chip Level Enhancements Mitigate SEUs
    Process
    Design
    Manufacturing Techniques
Easy Detection of SEU Events is Key
After Detection, Other Methods Must be Employed to Deal with the Event
Critical Nature of Application Determines Level of SEU Response
Structured ASICs from FPGA Designs Offer a Much More Robust Solution Due to Removal of All CRAM