SPP1500.itec.kit.edu
Dependable Embedded Systems The SPP 1500 Research Program
by Jörg Henkel (Coordinator) Karlsruhe Institute of Technology (KIT), Germany
Together with: J. Becker, O. Bringmann, U. Brinkschulte, S. Chakraborty, R. Ernst, H. Härtig, L. Hedrich, A. Herkersdorf, P. Marwedel, M. Platzner, U. Schlichtmann, O. Spinczyk, W. Rosenstiel, M. Tahoori, J. Teich, N. Wehn, H.-J. Wunderlich
spp1500.itec.kit.edu Jörg Henkel, Dec. 1, 2012, Tokyo, Japan
DFG SPP 1500
J. Henkel DVLSI Symposium
The SPP 1500
- Approved by DFG Senate in Spring 2009
- 41 submissions, 12 accepted
- Kick-off: January 2011
- Projected duration: 3 x 2 years
- Budget: 9+ million euros
Overview
- Background of the SPP 1500 “Dependable Embedded Systems”
- Focus and Goals
- Exemplary Projects
Variabilities
[Figure: frequency and leakage variations for processors on a single die. Source: Intel, 65 nm]
[Figure: latency of an inverter and number of dopant atoms in the transistor channel; labels: 1.2 nm, 32 nm. Source: Intel, 65 nm]
Soft Errors through Radiation
[Figure: cross-section of a CMOS device (n+ and p+ regions, N-well, P-well, P-substrate, isolation, gate, depletion region) struck by a high-energy particle (neutron or proton)]
- Radiation effects on semiconductor devices → soft errors
  - Alpha particles
  - Low-energy neutrons
  - High-energy neutrons/protons
- Radiation event
  - Ion track formation
  - Ion drift
  - Ion diffusion
- Sensitive areas:
  - Channel region of NMOS
  - Drain region of PMOS
  - “Off” state is more sensitive
[Figure: charge generation and collection around the n+ region after a particle strike; plot of current (arbitrary unit, 0 to 3) over time (seconds, 10^-13 to 10^-9)]
Source: Baumann, TI@Design&Test’05, Ziegler, IBM@IBM JRD’96
$^{A}_{Z}X \rightarrow \; ^{A-4}_{Z-2}Y + \; ^{4}_{2}He$
Temperature remains a Problem
- Example showing localized computation switching between two areas on the chip
[Figure. Src: Henkel, Ebi, Amrouch]
[Figure: MTTF [years] vs. temperature (Celsius), with data labels 6.85, 7.24, 7.73, 8.34 and 9.06. Source: K. Skadron et al., ICCAD 2004]
Temperature and Leakage
- Thermal “runaway” problem:
  - Increase in temperature leads to increase in leakage power → feedback loop possible!
- Sub-threshold leakage approximated by
  $I_{sub} \approx A \cdot e^{-B/T}$
  where A and B are constants → exponential growth! [Zhang 2003]
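The sub-threshold leakage approximation on this slide, I_sub ≈ A · e^(-B/T), can be tried out numerically. The following is only a minimal sketch: the constants `A` and `B` and the two temperatures are hypothetical values picked to make the trend visible, not parameters from [Zhang 2003].

```python
import math


def subthreshold_leakage(temp_kelvin: float, a: float, b: float) -> float:
    """Evaluate I_sub ~ A * exp(-B / T) for an absolute temperature T."""
    return a * math.exp(-b / temp_kelvin)


# Hypothetical constants (assumptions for illustration only).
A = 1.0e3  # arbitrary current units
B = 3.0e3  # kelvin

leak_300 = subthreshold_leakage(300.0, A, B)
leak_350 = subthreshold_leakage(350.0, A, B)

# The ratio depends only on B and the two temperatures, not on A.
ratio = leak_350 / leak_300
print(f"leakage at 350 K is {ratio:.2f}x the leakage at 300 K")
```

Because leakage power in turn heats the chip, a higher temperature raises the leakage again, which is the feedback loop the slide warns about.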
Aging: TDDB
- TDDB: Time-Dependent Dielectric Breakdown
- Created by:
  - Accumulation of trapped charges at dielectric
- Effects:
  - Increase of power consumption
  - Slowing of switching speed
[Figure: TDDB]
Aging: Electro-Migration
[Figure: electro-migration. Source: [Stott, 2010]]
- Electro-migration: aging effect due to transport of mass in metal interconnects
  - Directly linked to temperature
- Basic mean time to failure modeled by Black’s equation [wikipedia]:
  $MTTF = A \, j^{-n} \, e^{Q/(kT)}$
- MTTF decreases exponentially with temperature → Goal: reduce peak temperatures
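Black’s equation from this slide, MTTF = A · j^(-n) · e^(Q/(kT)), can be evaluated directly to see how strongly temperature affects lifetime. This is a sketch under stated assumptions: j is the current density, Q the activation energy, k the Boltzmann constant and T the absolute temperature, and all numeric parameter values below are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mttf(a: float, j: float, n: float, q_ev: float, temp_kelvin: float) -> float:
    """Black's equation: MTTF = A * j**(-n) * exp(Q / (k * T))."""
    return a * j ** (-n) * math.exp(q_ev / (BOLTZMANN_EV_PER_K * temp_kelvin))


# Hypothetical parameters (assumptions, not measured values).
params = dict(a=1.0, j=1.0e6, n=2.0, q_ev=0.7)

mttf_cool = black_mttf(temp_kelvin=330.0, **params)
mttf_hot = black_mttf(temp_kelvin=360.0, **params)

# Relative lifetime loss caused by a 30 K higher temperature.
lifetime_ratio = mttf_hot / mttf_cool
print(f"MTTF at 360 K is {lifetime_ratio:.3f}x the MTTF at 330 K")
```

The prefactor A and the current density cancel in the ratio, so the relative lifetime change depends only on Q and the two temperatures.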
NBTI
- Negative Bias Temperature Instability
- Breakdown of Si-H bonds at the silicon-oxide interface due to voltage/thermal stress → causes interface traps
- Affects mostly P-MOSFETs because of negative gate bias
  - Effect in N-MOSFETs is negligible
- NBTI is not yet fully understood
[Figure: P-type MOSFET (source, drain, gate, oxide) with Si-H bonds at the interface, shown for Vg = 0 and for Vg < 0 → stress, where H is released and a trap remains]
NBTI
- NBTI manifests itself as a shift in Vth
  - Causes increase in transistor delay
  - Delay faults are responsible for NBTI-induced bit flips and resulting circuit failure
- Recovery effect in periods of no stress
  - When voltage and temperature are low, Vth can shift back towards its original value
- Full recovery from a stress period is only possible in infinite time → in practice, the overall Vth shift increases monotonically over longer periods, e.g. months/years
[Figure: Vth shift [V] over time during alternating stress and recovery phases; Vg [V] switching between 0 and -1]
NBTI
[Figure: standard deviation of SRAM Vth shift [V] over time [years], two panels: 65 nm SRAM P-MOSFETs and 32 nm. Src: IBM, KIT]
- Mean Vth shift mainly due to temperature/voltage
  - Small technology nodes have less Vth shift due to lower voltages
- However: standard deviation of Vth shift mainly due to structure size
  - Small technology nodes and small P-MOSFETs (e.g. SRAM) show large deviations from the mean Vth shift → increased reliability concern
Other Effects
- This was not a complete list …
- See call: “It is the goal of this Priority Program to develop new system-level methods and architectures that can cope with the negative effects caused by the inherent unreliability observed at transistor and physical level when migrating to new technology nodes”
Overview
- Background of the SPP 1500 “Dependable Embedded Systems”
- Focus and Goals
- Exemplary Projects
Focus
[Figure: chain from physical sources via faults and errors to failures, across the layers Devices/Technology/Physics, Logic, Architecture and HW/SW System of digital hardware/software systems. Labels: radiation, process variation, temperature, coupling (C), electro-migration; jitter, signal/Vdd noise, crosstalk; bit flip (single/multi, temporally and spatially correlated, permanent/transient); wrong CPU register value, wrong branch decision; crash, data corruption, “no effect”]
Fault model is needed!
Focus (cont’d)
Dependable Embedded Software
- Hardware-dependent software
- Operating system and middleware
  - Management of observation strategies
  - Performing online tests
  - Perform adaptation
- Scheduling and allocation schemes
- Application software:
  - instruction level
  - task level
  - algorithm level
Focus (cont’d)
Dependable Embedded Software
Dependable Hardware Architectures
- Hardware architectures: various levels
  - Register-transfer
  - Micro-architecture
  - System-on-chip
- Technology abstraction provides physical properties
- Distinguish between:
  - Permanent and transient problems
  - Fabrication time and run-time (detect and fix)
- Possible means:
  - Masking of undependable components
  - Reconfiguration
    - static
    - dynamic
Focus (cont’d)
Dependable Embedded Software
Dependable Hardware Architectures
Technology Abstraction
- This SPP does not deal with technology!
- Means and architectures should be as technology independent as possible
- Technology abstraction should:
  - Characterize technology
  - Provide technology parameters
  - Model undependability
  - …
Focus (cont’d)
Dependable Embedded Software
Dependable Hardware Architectures
Technology Abstraction
Operation/Observation/Adaptation
Focus (cont’d)
Dependable Embedded Software
Dependable Hardware Architectures
Technology Abstraction
Operation/Observation/Adaptation
Design Methods
Cross-layer techniques are key
[Figure: the five focus areas combined: Dependable Embedded Software, Dependable Hardware Architectures, Technology Abstraction, Operation/Observation/Adaptation, Design Methods]
Goals
- We are NOT investigating technology
  - Bit flip model as resilience articulation point (RAP)
- Investigate reliability cost trade-offs
  - What trade-offs exist
  - Energy/throughput/area/lifetime
  - 2D/3D MPSoCs, reconfigurable hardware
- Focus on embedded systems
  - Consider and take advantage of application knowledge
- Consider the whole stack: hardware, software, application
  - Interaction HW/SW/application
  - Cross-layer optimization
Goals
[Figure: cost per transistor over time, showing product cost and reliability cost; regions “Scaling profitable” and “Scaling NOT profitable”; error-resilient architectures]
Overview
- Background of the SPP 1500 “Dependable Embedded Systems”
- Focus and Goals
- Exemplary Projects
Dependable Hardware Architectures
- Hardware architectures: various levels
  - Register-transfer
  - Micro-architecture
  - System-on-chip
- Technology abstraction provides physical properties
- Distinguish between:
  - Permanent and transient problems
  - Fabrication time and run-time (detect and fix)
- Possible means:
  - Masking of undependable components
  - Reconfiguration
    - static
    - dynamic
Adaptive Reliability for SoCs using a CGRA
- Traditional approaches to implement reliable SoCs are expensive and inefficient
  - Hardware might have to be duplicated or triplicated to achieve the desired reliability.
  - Not all SoC components are required all the time.
- The ARES project focuses on SoCs that include a Coarse-Grained Reconfigurable Architecture (CGRA)
  - Traditionally, CGRAs are used in an SoC as accelerators of functionality that is not required all the time.
  - In the ARES project, CGRAs are used to provide redundancy to SoC components that are not required all the time => time-multiplexed redundancy
(Source: Rosenstiel, Tübingen)
Adaptive Reliability for SoCs using a CGRA
- Two-layer methodology
  1. Harden the CGRA
  2. Use hardened CGRA to improve reliability of other SoC components
[Figure: SoC with CPU, MEM, HW1, HW2, HW3 and a reliable CGRA]
- Benefits
  - Redundant hardware only for one component (CGRA) instead of all SoC components.
  - Reliability can be adjusted to the level required by the current application.
  - Functionality of defective SoC components can be replicated by the CGRA to enable graceful degradation.
(Source: Rosenstiel, Tübingen)
Error Resilience Exploration of ASIP Timing Errors
- FER/BER analysis
- Bit-true, high-performance C++ simulation framework
- Investigate impact of errors on decoding performance via graphical interface
[Figure: simulation chain with Source (RNG), Encoding, Modulation, AWGN Channel, De-Modulation and Error-free Reference Decoder; varying channel SNR; over-clocking; XILINX IESE Tool: 70MHz]
(Source: Wehn, Kaiserslautern)
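The BER part of such an analysis can be illustrated with a very small Monte Carlo loop over an AWGN channel at varying SNR. This is only a sketch and not the project’s C++ framework: it assumes uncoded BPSK (no encoder or decoder), treats the SNR as Eb/N0, and the function name, seed and bit counts are hypothetical.

```python
import math
import random


def simulate_ber(snr_db: float, num_bits: int, rng: random.Random) -> float:
    """Estimate the bit error rate of uncoded BPSK over an AWGN channel."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    # Noise standard deviation for unit-energy symbols at the given Eb/N0.
    sigma = math.sqrt(1.0 / (2.0 * snr_linear))
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)                   # source (RNG)
        symbol = 1.0 if bit else -1.0              # modulation
        received = symbol + rng.gauss(0.0, sigma)  # AWGN channel
        decided = 1 if received > 0.0 else 0       # hard-decision demodulation
        if decided != bit:
            errors += 1
    return errors / num_bits


rng = random.Random(1500)  # fixed seed so runs are repeatable
ber_by_snr = {snr_db: simulate_ber(snr_db, 20000, rng) for snr_db in (0.0, 4.0, 8.0)}
for snr_db, ber in ber_by_snr.items():
    print(f"SNR {snr_db:4.1f} dB -> BER {ber:.5f}")
```

Injected timing errors (for example from over-clocking) would be modeled as additional bit flips in the decoder under test, and its output compared against the error-free reference.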
Dependable Embedded Software
[Figure. Src: paragoninnovations]
Dependable Embedded Software
- Only hardware-dependent software is considered
- Possible means:
  - Operating system and middleware
    - Management of observation strategies
    - Performing online tests
    - Perform adaptation
  - Scheduling and allocation schemes
  - Application software:
    - instruction level
    - task level
    - algorithm level
Separate error detection from error correction!
1. An error is signaled,
2. Error detection is executed in a short amount of time; classification decides if, when and how to handle the error,
3. Normal system execution continues,
4. If required, error correction takes place after timing-critical tasks have finished but before the error has fatal consequences.
[Figure: timeline with the four steps marked]
(Source: Marwedel/Engel)
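The separation of detection and correction described on this slide can be sketched as a handler that classifies errors quickly and postpones the expensive repair. All names here (`ErrorReport`, `DeferredErrorHandler`, the example locations) are hypothetical, and the classification rule is a placeholder assumption, not the project’s actual policy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ErrorReport:
    location: str
    needs_correction: bool  # result of the (fast) classification step


class DeferredErrorHandler:
    def __init__(self) -> None:
        self.pending = deque()  # errors waiting for correction
        self.log = []           # trace of what happened, in order

    def on_error(self, report: ErrorReport) -> None:
        """Fast path: record the decision only, do not correct yet."""
        if report.needs_correction:
            self.pending.append(report)
            self.log.append(f"defer {report.location}")
        else:
            self.log.append(f"ignore {report.location}")

    def after_critical_tasks(self) -> None:
        """Slow path: run corrections once timing-critical work is done."""
        while self.pending:
            report = self.pending.popleft()
            self.log.append(f"correct {report.location}")


handler = DeferredErrorHandler()
handler.on_error(ErrorReport("frame_buffer", needs_correction=False))
handler.on_error(ErrorReport("motor_setpoint", needs_correction=True))
handler.log.append("run timing-critical task")  # normal execution continues
handler.after_critical_tasks()
print(handler.log)
```

A real system would also need a deadline for each deferred correction, since the repair has to happen before the error has fatal consequences.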
Application analysis provides information on error propagation
- Values assigned to reliable variables must also be reliable
- Unreliable variables can tolerate errors
- Constraints:
  - Pointers/array indices must be reliable
  - Loop conditions must be reliable
  - Reliability of if-conditions depends on statements inside body
(Source: Marwedel/Engel)
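The annotation rules on this slide can be checked mechanically. The sketch below assumes a deliberately simplified, hypothetical program representation (a set of reliable variable names plus lists of assignments, index uses and loop conditions); it does not cover the if-condition rule, which depends on the statements inside the body.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedProgram:
    reliable: set          # names of variables annotated as reliable
    assignments: list      # (target, [source variables]) pairs
    index_uses: list       # variables used as pointer or array index
    loop_conditions: list  # variables read in loop conditions


def check_annotations(program: AnnotatedProgram) -> list:
    """Return a list of rule violations (empty if the annotations are consistent)."""
    violations = []
    for target, sources in program.assignments:
        if target in program.reliable:
            for source in sources:
                if source not in program.reliable:
                    violations.append(
                        f"{source} -> {target}: unreliable value assigned to reliable variable"
                    )
    for name in program.index_uses:
        if name not in program.reliable:
            violations.append(f"{name}: pointer or array index must be reliable")
    for name in program.loop_conditions:
        if name not in program.reliable:
            violations.append(f"{name}: loop condition must be reliable")
    return violations


example = AnnotatedProgram(
    reliable={"i", "n", "checksum"},
    assignments=[("pixel", ["raw"]), ("checksum", ["pixel"])],
    index_uses=["i"],
    loop_conditions=["i", "n"],
)
violations = check_annotations(example)
print(violations)
```

In this example the unreliable `pixel` may tolerate errors on its own, but feeding it into the reliable `checksum` is reported as a violation.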
Technology Abstraction
[Figure. Src: Intel]
Technology Abstraction
- This SPP does not deal with technology
- Means and architectures should be as technology independent as possible
- Technology abstraction should:
  - Characterize technology
  - Provide technology parameters
  - Model undependability
  - …
Lifting Device-Level Characteristics for Error Resilient System-Level Design
- Cross-Layer Methods: circuit-level timing analysis with restrictions given by the software program → program-dependent circuit timing with parameterization on yield and aging.
- System Analysis With Technology Information: mapping of tasks under consideration of processor age, desired yield, error probability, ...
- Hardware-Level Abstraction Models: abstraction of timing behavior of circuit elements into error probability dependent on given parameters (age, yield, temperature, etc.)
[Figure: Embedded System]
(Source: Schlichtmann/Chakraborty)
Design Methods
[Figure: extensible processor design flow [Henkel03]: Application Profiling; Identify (extensible instructions, pre-designed blocks, parameter settings); Define (extensible instructions, in/exclusion of blocks, parameter settings); Retargetable Tool Generation (compiler, linker, assembler, instruction set simulator); Explore Extensible Processor Design Space; OK? (y/n); Generate Extensible Processor (synthesizable RTL of core and blocks); Prototyping; Synthesis and Tape Out]
Design Methods
- Design methods and tools for dependable
  - Hardware design
  - Software design (application software, OS, middleware)
  - Simulation
  - Synthesis
  - Design space exploration
- Means:
  - Design systems from the very beginning with dependability in mind (and not just as an afterthought)
  - Therefore: consider dependability as THE major design constraint (in front of power, performance, area etc.)
  - Co-design of HW/SW/OS/middleware/application
Providing Efficient Reliability in Critical Embedded Systems
- Motivation
  - Increasing vulnerability of VLSI circuits and embedded systems to radiation-induced soft errors
  - Traditional fault-tolerant approaches are either too costly or ineffective
  - Need for cost-effective approaches applicable to embedded systems
- This project
  - Hierarchical techniques and methodologies
  - At both hardware and software levels to ensure cost-effective reliability
  - Multi-level reliability modeling, effective concurrent error detection and recovery schemes
[Figure: cost vs. failure rate, with increasing reliability; regions: mainstream systems (handhelds, mobile, ASICs); traditional fault tolerance, high-end systems; proposed approaches: tolerating high failure rates at low cost; arrow: towards nanoscale]
(Source: Tahoori, KIT)
Operation, Observation, Adaptation
- Operation
  - Scheduling and dispatching of resources
- Observation
  - Observe sources of undependability
  - When and how often to observe
  - What to observe
  - Resources for observation: how many and where to place etc.
- Adaptation
  - Reaction to undependable operation
  - Self-adaptation (e.g. bio-inspired like in organic computing)
  - Convergence of adaptation strategies
Reliable Computing Base
Operating System Support for Redundant Multithreading
- OS replicates application binaries
  - Automatic (no programmer involvement)
  - Flexible (turn on/off per application)
  - High application coverage (replication includes device drivers & protocol stacks)
  - Low overhead (loosely coupled replicas, distributed across CPU cores)
[Figure: software stack on (potentially faulty) hardware: Fiasco.OC microkernel in kernel mode; L4Re OS runtime, replicator, replicated applications and device driver in user mode]
(Source: Ernst/Härtig)
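The core idea of replicated execution, running the same computation several times and comparing the results, can be sketched in a few lines. This is a sequential toy model under stated assumptions: the real system replicates whole application binaries across CPU cores, whereas here `run_replicated`, the `fault_in` injection parameter and the `square` task are hypothetical names used only for illustration.

```python
from collections import Counter


def run_replicated(task, argument, replicas=3, fault_in=None):
    """Run `task` several times and return the majority result plus all results.

    `fault_in` optionally names one replica whose integer result gets its
    lowest bit flipped, to imitate a transient hardware fault.
    """
    results = []
    for replica_id in range(replicas):
        value = task(argument)
        if replica_id == fault_in:
            value ^= 1  # injected single-bit flip (illustration only)
        results.append(value)
    winner, votes = Counter(results).most_common(1)[0]
    if votes <= replicas // 2:
        raise RuntimeError("no majority among replicas")
    return winner, results


def square(x: int) -> int:
    return x * x


voted, raw_results = run_replicated(square, 12, replicas=3, fault_in=1)
print(voted, raw_results)
```

With three replicas a single faulty result is outvoted; with only two replicas a mismatch can be detected but not corrected, which the sketch reports as a missing majority.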
VirTherm-3D
- Provides thermal-aware task allocation and rerouting of physical connections between tasks, agents and I/O
- Architecture & operating system
  - Virtualization layer on L4 microkernel basis for processing virtualization
  - Agent-based thermal management for monitoring and task migration
  - Communication virtualization with HW enablement
[Figure: layered many-core architecture with cores, memories, routers (R), VNA/VNIC blocks, memory and I/O controllers, HW accelerators and dedicated IP blocks; software stack with SW tasks, thermal agents, runtime and OS; ReconOS kernel with SW threads, HW threads and HW scheduler; Fiasco-µC/VMM]
(Source: Henkel/Herkersdorf)
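Thermal-aware task allocation can be illustrated with a simple greedy heuristic: place each task on the currently coolest core. This is a sketch under strong assumptions and not the VirTherm-3D algorithm: each task is modeled as adding a fixed temperature increase to its core, heat spreading between cores and layers is ignored, and all names and numbers are hypothetical.

```python
def allocate_tasks(core_temps: dict, task_heat: dict):
    """Greedily map tasks to cores, hottest task first, coolest core first.

    Returns the task-to-core placement and the estimated core temperatures.
    """
    temps = dict(core_temps)  # work on a copy, keep the input unchanged
    placement = {}
    hottest_first = sorted(task_heat.items(), key=lambda item: item[1], reverse=True)
    for task, heat in hottest_first:
        coolest = min(temps, key=temps.get)
        placement[task] = coolest
        temps[coolest] += heat  # simplistic model: additive temperature rise
    return placement, temps


# Hypothetical starting temperatures (Celsius) and per-task temperature rise.
core_temps = {"core0": 60.0, "core1": 45.0, "core2": 50.0}
task_heat = {"decode": 8.0, "filter": 4.0, "log": 1.0}

placement, estimated = allocate_tasks(core_temps, task_heat)
print(placement)
print(estimated)
```

An agent-based scheme as named on the slide would additionally monitor temperatures at run-time and migrate tasks when the estimate and the measurement diverge.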
VirTherm-3D
- VirTherm-3D
  - Dias PyroView infrared thermal camera on Virtex-5/-2
  - Generating heat in one layer and simulating the effect on the active/idle layer above to calibrate the thermal model
(Source: Henkel/Herkersdorf)
Summary
- The SPP 1500 aims at technology-induced reliability problems
- Focus is on:
  - Dependable Hardware Architectures
  - Dependable Embedded Software
  - Technology Abstraction
  - Design Methods
  - Operation, Observation and Adaptation
- Cross-layer techniques are key
- The topic of reliability remains very hot (several international projects; large submission numbers in international conferences)