

April 30, 2014 1

Cost efficient soft-error protection for ASICs

Tuvia Liran; Ramon Chips Ltd.; tuvia@ramon-chips.com


What is a soft error?

Soft error: An error in the functionality of the chip/system due to corruption of data

Hard error: An error in the functionality of the chip/system due to permanent damage


Key topics

• SE basics
• Sources and mechanisms of SE
• Single event effects
• Methodologies for SE mitigation
• Summary


Terminology

• SE – Soft Error
• SER – Soft Error Rate
• SEE – Single Event Effect
– SEU – Single Event Upset
– SET – Single Event Transient
– SEL – Single Event Latch-up
• SEFI – Single Event Functional Interrupt
• LET – Linear Energy Transfer: rate of energy loss in Si/SiO2 [MeV·cm²/mg]
• SBU – Single Bit Upset
• MCU – Multi Cell Upset
• MBU – Multi Bit Upset
• MTTF – Mean Time to Fail
• FIT – Failure in Time; = 10^9/MTTF(h)
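The FIT/MTTF relation can be sketched in a few lines, taking FIT as failures per 10^9 device-hours. The function names are illustrative, and adding component FIT values to get a system FIT assumes independent parts with constant failure rates, which the slides do not state.

```python
HOURS_PER_FIT_UNIT = 1e9  # FIT counts failures per 10^9 device-hours


def fit_from_mttf(mttf_hours: float) -> float:
    """Convert a mean time to fail (hours) into a FIT rate."""
    return HOURS_PER_FIT_UNIT / mttf_hours


def mttf_from_fit(fit: float) -> float:
    """Convert a FIT rate back into a mean time to fail (hours)."""
    return HOURS_PER_FIT_UNIT / fit


def system_fit(component_fits) -> float:
    """Assumption: independent components with constant failure rates,
    so their FIT rates simply add."""
    return sum(component_fits)


if __name__ == "__main__":
    # Hypothetical example: 1000 FIT corresponds to 10^6 hours, roughly 114 years.
    hours = mttf_from_fit(1000.0)
    print(hours, hours / 8760.0)
```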


Myth Blasting

• SE is relevant only for the space environment: WRONG
• SE affects memory cells only: WRONG
• SE is not a reliability issue: WRONG
• DRAM is more sensitive to SE than SRAM: WRONG
• Latch-up cannot cause SE: WRONG


Sources of soft errors

• Terrestrial:
– Interactions with cosmic particles (neutrons)
– Alpha particles from die/package
• Space:
– Protons (>93%)
– Alpha (~6%)
– Neutrons
– Electrons
– Alpha and heavy particles (up to 500 MeV)


Interaction of Neutrons

• Typical flux – 20 N/h/cm² @ sea level (NYC)
• Energy of cosmic neutrons – 20-300 MeV
• Strong dependence on altitude
• Higher concentration in polar areas (Lat >60)
• Interaction with boron (20% of B10), mostly in BPSG (@≥0.25µ) – most dominant effect
• Interacts with various nuclei (W, Si, Pb, Au …)
• Shielding is not practical
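To get a feel for the sea-level flux figure, the arithmetic below counts how many neutrons cross a die over a given time. The die area and the altitude multiplier are hypothetical inputs, not values from the slides, and the result is the number of neutrons passing through, not the number of upsets (only a small fraction interacts).

```python
SEA_LEVEL_FLUX = 20.0  # neutrons per hour per cm^2 at sea level (NYC), from the slide


def neutrons_through_die(area_cm2: float, hours: float,
                         altitude_factor: float = 1.0) -> float:
    """Neutrons crossing a die of the given area over the given time.

    altitude_factor is a hypothetical multiplier for the altitude and
    latitude dependence mentioned on the slide; 1.0 means sea level.
    """
    return SEA_LEVEL_FLUX * altitude_factor * area_cm2 * hours


if __name__ == "__main__":
    # Hypothetical 1 cm^2 die observed for one year (8760 hours).
    print(neutrons_through_die(1.0, 8760.0))
```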


Alpha particles

• Alpha particle == He++
• Energy: 1-6 MeV / particle (1.5 MeV from N->B10)
– ~3.6 eV required to generate an e-h pair
• Penetration range in Si – 15-30µ
• Generating ~45 fC / MeV
– Critical charge of an SRAM cell is ~1 fC
– An SEU-immune SRAM cell is not practical in VDSM
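The charge-per-MeV figure follows from the pair-creation energy: each 3.6 eV deposited frees one electron-hole pair carrying one elementary charge. The short check below reproduces that arithmetic. The function name is illustrative, and treating all deposited energy as collected charge is a simplifying assumption (in practice only part of the generated charge reaches a sensitive node).

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per electron
PAIR_ENERGY_EV = 3.6                   # energy per e-h pair in Si, from the slide


def generated_charge_fc(energy_mev: float) -> float:
    """Charge (in femtocoulombs) generated by depositing energy_mev in silicon."""
    pairs = energy_mev * 1e6 / PAIR_ENERGY_EV      # number of e-h pairs
    return pairs * ELEMENTARY_CHARGE_C * 1e15      # coulombs -> femtocoulombs


if __name__ == "__main__":
    # 1 MeV gives about 44.5 fC, consistent with the ~45 fC/MeV on the slide.
    print(generated_charge_fc(1.0))
    # A hypothetical 5 MeV alpha generates far more than a ~1 fC critical charge.
    print(generated_charge_fc(5.0))
```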


SEU in SRAM


SBU/MCU in SRAM

[Figure panels: Single bit upset; Multi cell upset]

Multi cell upset ≠ Multi bit upset

High MUX ratio -> lower MBU
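Why a high MUX ratio lowers MBU can be illustrated with a toy model of column interleaving. The mapping used here (physical column c belongs to logical word c mod MUX ratio) is an assumed layout for illustration, not one taken from the slides. With that mapping, a cluster of adjacent upset cells lands in different words as long as the cluster is no wider than the MUX ratio, so each word sees at most a single-bit error.

```python
from collections import Counter


def word_of(column: int, mux_ratio: int) -> int:
    """Assumed interleaving: adjacent physical columns belong to different words."""
    return column % mux_ratio


def worst_bits_per_word(first_column: int, cluster_width: int, mux_ratio: int) -> int:
    """Largest number of upset bits that fall into one logical word when
    cluster_width adjacent cells in a row are upset (a multi cell upset)."""
    hits = Counter(
        word_of(c, mux_ratio)
        for c in range(first_column, first_column + cluster_width)
    )
    return max(hits.values())


if __name__ == "__main__":
    # Hypothetical 3-cell cluster starting at column 5.
    for mux in (1, 2, 4):
        print(mux, worst_bits_per_word(5, 3, mux))
```

In this sketch a result of 1 means the multi cell upset shows up only as single-bit errors in separate words; anything above 1 is a multi bit upset within one word.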


Single Event Transient (SET)

• Pulses at logic nodes, caused by ionizing particles
• Typical pulse width is 10-200 ps (@Alpha)
• Such pulses might cause:
– Permanent error if sampled by an FF
– Permanent error if they activate asynchronous signals

P(error) = T(pulse)/T(period) = T(pulse)*F
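The capture probability formula is easy to evaluate numerically. The sketch below applies P(error) = T(pulse)*F and clamps the result at 1, which is an added safeguard for pulses longer than the clock period. The pulse width and clock frequency in the example are hypothetical.

```python
def set_capture_probability(pulse_width_s: float, clock_hz: float) -> float:
    """Probability that a transient of the given width is sampled by a flip-flop,
    using P(error) = T(pulse) / T(period) = T(pulse) * F.

    Clamped at 1.0 (an assumption for pulses wider than one clock period)."""
    return min(1.0, pulse_width_s * clock_hz)


if __name__ == "__main__":
    # Hypothetical 100 ps pulse with a 500 MHz clock: probability 0.05.
    print(set_capture_probability(100e-12, 500e6))
    # Same pulse at 50 MHz: ten times lower.
    print(set_capture_probability(100e-12, 50e6))
```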


Frequency dependence of SEU/SET

[Figure: BER (log scale) versus frequency (log scale); curves: SEU, SET, Total SEE]

Lower effective frequency, such as by clock gating, reduces SET

SET can be mitigated by glitch filtering before sampling


Effective glitch filtration

Glitch filtering
• Narrow pulses are filtered by C-element + delay

Weak SCAN mux
• Implementing a resistive MUX at the input of the flip-flop will filter narrow glitches
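The C-element + delay filter can be sketched as a discrete-time behavioral model: the C-element sees the signal and a delayed copy of it, takes their value when the two agree, and holds its previous output when they differ. This is a simplified illustration with hypothetical names and unit time steps, not a description of any specific circuit from the slides.

```python
def c_element_filter(signal, delay, initial=0):
    """Behavioral model of a C-element fed by a signal and its delayed copy.

    signal  : list of 0/1 samples, one per time step
    delay   : delay of the second input, in time steps
    initial : assumed value of the line and of the output before t = 0
    """
    out = []
    state = initial
    for t, direct in enumerate(signal):
        delayed = signal[t - delay] if t >= delay else initial
        if direct == delayed:   # both inputs agree: output follows them
            state = direct
        out.append(state)       # inputs differ: output holds its last value
    return out


if __name__ == "__main__":
    narrow = [0, 0, 1, 1, 0, 0, 0, 0]          # 2-step glitch
    wide = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]      # 5-step pulse
    print(c_element_filter(narrow, delay=3))
    print(c_element_filter(wide, delay=3))
```

In this model a pulse no wider than the delay never makes both inputs agree on the new value, so it is removed; a wider pulse passes through, shifted later by the delay. That shift is the timing cost of the filter.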



Impact of scaling

• Scaling reduces the “critical charge”
• Faster devices do not “filter” narrow SET pulses
• More devices/memory per chip – more sensitive elements
• Higher frequency increases the probability of SET
• SE is a major issue in advanced UDSM


SEE mitigation concepts

• S/W techniques for error correction
– Example: triple processing + voting
• Mitigation at system level
– Example: TMR
• Mitigation at chip level
– Example: EDAC, SE protected FFs
• Mitigation by circuit
– Example: SE protected FFs & SRAM cells
• Mitigation by Si process
– Example: SOI (poor!!!)
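The voting step shared by the software and TMR examples above can be sketched as a bitwise majority function: each output bit takes the value held by at least two of the three copies, so an upset in any single copy is masked. The function name and the sample values are illustrative only.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: a bit is 1 if it is 1 in at least two inputs."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1010
    upset = good ^ 0b1000   # hypothetical single-bit upset in one copy
    print(bin(majority_vote(good, good, upset)))
```

The voter masks errors confined to one copy. If two copies are corrupted in the same bit position, the wrong value wins, which is why the copies need to fail independently.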


SEE sensitivity - application

• Types of data, sorted by SEE sensitivity:
– Configuration (FPGA, CNFREG, …)
– Control logic (FSM, uCode, …)
– Executable data (.exe files, I-Cache, …)
– Stored data (databases, D-Cache, …)
– Temporary data (Video, Audio, …)


Summary

• SEE is becoming a significant reliability issue in VDSM technologies

• SRAMs are typically the most sensitive elements to SEE

• SE mitigation is possible, mostly by digital techniques and proper methodologies and tools

• Availability of EDA tools for analyzing SER and mitigating SEE is limited. Optima presents a very interesting tool.