Engineering Insights 2006
Self-Testable, Self-Adaptable and Error-Resilient System Design: Coping with Increasing Variability and Reliability Concerns
Tim Cheng, Electrical and Computer Engineering
Drivers of System-on-a-Chip/System-in-a-Package Design & Test Technologies
Time-to-Money
• Decreasing design window
• Less tolerance for design revisions
Complexity
• Exponentially more transistors
• Increasing complexity in system context
Nanoscale Effects
• Coupling cap
• Signal integrity
• Inductance
• Leakage
Heterogeneity
• Greater diversity of on-chip elements: processors, SW, RF, memory, analog, high-speed bus
Challenges Facing the Next Decade in Integrated System Design
• Managing and exploiting design partitioning and trade-offs for heterogeneous systems – HW, SW, analog, RF, MEMS, optical, etc.
• Power and energy
• Verification and test
• Reliability and robustness
• Implementation fabrics beyond silicon
Increasing Failure Sources and Failure Rates
[Figure: on-die temperature variations; temperature axis from 40 to 110 °C]
Failure sources: SEU, random defects, parametric variations, soft errors, design errors
Harder to Design Reliable System-Chips
• First-silicon success rate has been dropping
– <30% for complex ASIC/[email protected]
– Pre-silicon logic bugs have been increasing at 3X-4X/generation for Intel's processors
• Yield has been dropping for volume production
– IBM's 8-core Cell-Processor chips: ~10-20% yield
• "Better than worst-case" design resulting in failures w/o defects
– Increase in variation of process parameters with scaling
– Worst-case design getting way too conservative
• One-time-factory production testing will be too costly and insufficient for failure screening
New Design and Test Paradigm: Reliable Systems With Unreliable Components
• Systems must be designed to cope with failures
• Efficient silicon debug is a must
– Design for debugging would become necessary
• Must have embedded self-test for error detection
– For both testing in manufacturing line and in-the-field
– Both on-line and off-line testing
• Must be re-configurable and adaptable for error recovery
– Using spares to replace defective parts
– Using redundancy to mask errors
– Using self-tuning to compensate variations
Some of Our Research Results
• Embedded Software-Based Self-Test for SoC
– Reuse of embedded processors and on-chip resources for self-test and diagnosis
• Test, Characterization and Diagnosis for High Speed Serial IO Interfaces
• Silicon Debug for Timing Failures
• Formal Equivalence Checking between System Specification and RTL Code
Embedded-Software-Based Self-Test For Programmable Systems
Test and diagnosis are applications of a programmable SOC!!
• Reuse of on-chip programmable components for test
– Processor/DSP/FPGA cores for on-chip test generation, measurement, response analysis and even diagnosis
• Self-test a processor using its instruction set for high fault coverage
• Use the tested processor/DSP to test buses, interfaces and other components, including analog and mixed-signal components
Embedded SW-Based Self-Testing for Programmable System Chips
[Figure: SoC self-test architecture. Blocks: low-cost tester, CPU, DSP, IP core, system memory and on-chip memory (holding test program, responses, signatures), attached through VCI bus interface master/target wrappers to the on-chip bus with a bus arbiter]
Test flow: loading test program at low speed → self-test at operational speed → unloading response signature at low speed
• Low-cost tester
• High-quality at-speed test
• Low test overhead
• Non-intrusive
• Test in normal operational mode
– No violation of power consumption
– More accurate speed-binning
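The load / run / unload flow above can be sketched in software. This is only an illustration under stated assumptions: the toy `alu` function stands in for the processor under test, and the 16-bit signature register, its feedback polynomial and all function names are hypothetical, not taken from the slides.

```python
# Hypothetical sketch: a test program exercises a toy ALU, the responses are
# compacted into a signature on chip, and only the signature is compared.

MISR_WIDTH = 16
MISR_MASK = (1 << MISR_WIDTH) - 1
MISR_TAPS = 0x1021  # assumed feedback polynomial, chosen for illustration


def misr_update(signature, response):
    """Fold one response word into the running signature (linear compaction)."""
    msb = (signature >> (MISR_WIDTH - 1)) & 1
    signature = (signature << 1) & MISR_MASK
    if msb:
        signature ^= MISR_TAPS
    return signature ^ (response & MISR_MASK)


def alu(op, a, b, stuck_bit=None):
    """Toy 16-bit ALU; stuck_bit forces one result bit to 0 (injected fault)."""
    if op == "add":
        result = (a + b) & 0xFFFF
    elif op == "and":
        result = a & b
    elif op == "xor":
        result = a ^ b
    else:
        raise ValueError(op)
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit) & 0xFFFF
    return result


def run_self_test(test_program, stuck_bit=None):
    """Execute the test program and return only the compacted signature."""
    signature = 0
    for op, a, b in test_program:
        signature = misr_update(signature, alu(op, a, b, stuck_bit))
    return signature


# Step 1: the test program is loaded (at low speed) into on-chip memory.
TEST_PROGRAM = [
    ("add", 0xFFFF, 0x0001),
    ("add", 0x5555, 0xAAAA),
    ("and", 0xF0F0, 0xFF00),
    ("xor", 0x1234, 0xFFFF),
    ("add", 0x7FFF, 0x7FFF),
]

# Step 2: self-test runs on chip; the golden signature comes from simulation.
golden = run_self_test(TEST_PROGRAM)
observed = run_self_test(TEST_PROGRAM, stuck_bit=3)

# Step 3: only the signature is unloaded and compared by the tester.
print("pass" if observed == golden else "fail")
```

The tester only moves a short program in and a single word out, which is why a low-cost, low-speed tester can suffice in this scheme.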
DSP-Based Self-Test for Analog/Mixed-Signal Components
• DSP-based vs. analog ATE
– Increased test throughput
– Reduced switching & settling time
– Device response is memorized and analyzed for different parameters
– Software DSP doesn't have to be real-time
– Performance limited by ADC/DAC
• DSP-based embedded self-test - on-chip tester
– Relieve need of expensive ATE
– Reduce external noise
• Practical issues
– Test quality limited by DAC/ADC
– DAC/ADC are not always available, and must be tested first
[Figure: DSP-based ATE (DAC, analog DUT, ADC, memory, digital signal processing, synchronization) alongside an SOC with a DSP/processor core (DAC, analog CUT, ADC, memory, digital signal processing, synchronization); additional labels: 1-bit DAC, 1-bit DSM]
Self-Test for Analog/MS Components
• A self-test architecture using the delta-sigma modulation principle for signal generation and for waveform acquisition
[Figure: ATE and SOC test paths around the analog CUT. Labels: test stimuli & spec., software ΔΣ modulator, on-chip DAC / low-res. DAC, analog CUT, on-chip ADC / 1-bit ΔΣ modulator, response analysis, programmable core + memory, pass/fail?]
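The delta-sigma principle named above can be illustrated with a first-order software modulator. This is a minimal sketch, assuming a first-order loop, a sine stimulus and a correlation-based amplitude estimate; the slides do not specify the modulator order or the analysis method, and all names are hypothetical.

```python
import math


def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: inputs in [-1, 1] -> stream of +1/-1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback          # accumulate input minus last output
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


def estimate_amplitude(bits, period):
    """Correlate the bitstream with a sine of known period (response analysis)."""
    n = len(bits)
    acc = sum(b * math.sin(2.0 * math.pi * i / period) for i, b in enumerate(bits))
    return 2.0 * acc / n


# Assumed stimulus: 8 periods of a 0.5-amplitude sine, 1024 samples per period.
PERIOD = 1024
stimulus = [0.5 * math.sin(2.0 * math.pi * i / PERIOD) for i in range(8 * PERIOD)]
bitstream = delta_sigma_1bit(stimulus)
print("estimated amplitude:", round(estimate_amplitude(bitstream, PERIOD), 3))
```

The 1-bit stream can drive a 1-bit DAC plus a low-pass filter to generate the stimulus, and the same loop structure in hardware can act as the 1-bit acquisition front end, so no high-resolution converter is needed on chip.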
Bit-Error-Rate (BER) Estimation for High-Speed Serial Interface
• Most applications require 10^-12 or even lower BER
– Measuring BER impractical for production testing
• BER can be estimated using parameters with strong correlation with BER:
– Spectral information of jitter
  • Frequencies and amplitudes of Periodic Jitter (PJ)
  • RMS value of Random Jitter (RJ)
– Jitter transfer characteristics of a CDR circuit
  • Magnitude response (low pass filter characteristic)
  • Phase response (timing response in clock recovery)
– Channel characteristics
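A rough calculation shows why direct BER measurement is impractical in production, and how jitter parameters could feed an estimate instead. This is a sketch under loud assumptions, not the estimation method used in this work: Gaussian RJ, a single sinusoidal PJ tone, a first-order low-pass CDR jitter transfer with a hypothetical corner frequency, and no channel effects. All function and parameter names are illustrative.

```python
import math


def bits_needed(ber_target, confidence=0.95):
    """Bits to observe with zero errors to claim BER < ber_target."""
    return -math.log(1.0 - confidence) / ber_target


def measurement_seconds(ber_target, bit_rate_bps, confidence=0.95):
    """Time to observe that many bits at the given data rate."""
    return bits_needed(ber_target, confidence) / bit_rate_bps


def q_function(x):
    """Tail probability of a standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def untracked_fraction(pj_freq_hz, cdr_corner_hz):
    """|1 - H(f)| for an ASSUMED first-order low-pass jitter transfer H(f)."""
    ratio = pj_freq_hz / cdr_corner_hz
    return ratio / math.sqrt(1.0 + ratio * ratio)


def estimate_ber(rj_rms_ui, pj_peak_ui, pj_freq_hz, cdr_corner_hz,
                 transition_density=0.5, phase_steps=720):
    """Average the Gaussian tail beyond +/-0.5 UI over the residual PJ phase."""
    residual = pj_peak_ui * untracked_fraction(pj_freq_hz, cdr_corner_hz)
    total = 0.0
    for k in range(phase_steps):
        shift = residual * math.sin(2.0 * math.pi * (k + 0.5) / phase_steps)
        total += q_function((0.5 - shift) / rj_rms_ui)
        total += q_function((0.5 + shift) / rj_rms_ui)
    return transition_density * total / phase_steps


# One BER point at 10^-12 and 2.5 Gbps takes on the order of 20 minutes.
print(round(measurement_seconds(1e-12, 2.5e9)), "s per BER point")

# Hypothetical jitter numbers: low-frequency PJ is tracked by the CDR,
# high-frequency PJ is not, so the estimated BER rises with PJ frequency.
for f_mhz in (0.01, 0.1, 1.0, 10.0, 100.0):
    ber = estimate_ber(0.03, 0.25, f_mhz * 1e6, 1e6)
    print(f"PJ at {f_mhz} MHz -> estimated BER {ber:.2e}")
```

The jitter values and the 1 MHz corner in the loop are placeholders; in practice they would come from the measured jitter spectrum and the measured CDR magnitude and phase responses.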
BER Estimation - HW Validation
Maxim's 2.5 Gbps CDR circuit (MAX 3873A)
Synthesys Research's BERTScope
[Figure: measurement setup with 2.5 Gbps data with jitter, clock with jitter, and recovered clock. Measurements: phase response of the CDR circuit, magnitude response of the CDR circuit, BER]
Results - Clk-like and PRBS Data
[Figure: three plots of measured (Mea.) BER and estimated (Est.) BER versus PJ frequency (0.01 to 100 MHz, log scale), with BER axes starting at 1.0E-12. Panels: clock-like pattern w/ 0.5T PJ; clock-like pattern w/ 0.45T PJ; PRBS pattern w/ 0.5T PJ]
• Measured and estimated BER match very well
• The difference is larger at low (~10^-12) level, due to the bounded nature of RJ in practice
Testable/Debuggable Design for Adaptive Equalizer in High Speed Serial-Link
• A design-for-testability (DfT) solution for adaptive equalizer (EQ) in high-speed IO
• Applicable to various EQ architectures
• Addressing RX's observability & controllability problems
• Lower test cost and higher fault coverage than conventional eye-diagram-based method
[Figure: serial IO block diagram. Digital side: digital core with pattern generator and pattern analyzer, serializer, deserializer. Analog side: Tx with pre-emphasis (PE), Rx with equalizer (EQ) and CDR]
[Figure: RX signal path through EQ and CDR (signal, equalized signal, clock, data), with an access point, an on-chip monitor and a drive circuit]
Testable Design for Adaptive Equalizer
General Architecture for DFE/FFE; Testable Decision-Feedback Equalizer (DFE) or Feed-Forward Equalizer (FFE)
[Figure: general adaptive equalizer. Labels: unequalized input y(t), clk, EQ, equalized output v(t), data I(k), error ε(k), adaptation algorithm (Adp. Alg.), feedback of v(t)/I(k) (for DFE)]
[Figure: testable FFE. Labels: y(k), v(k), I(k), ε(k), clock, Z^-1 delay elements, tap coefficients ci, adaptation algorithm, pattern generator, FF chain with scan out, switch K1]
• Minor HW modification of EQ: a FF chain (storing tap-coefficients), a pattern generator, and some switches
– Extra circuits are all digital
• Addressing both characterization and production testing
Experiments: Testable Equalizers
[Figure: test setup. Tester (DSP, digital PE card, clock) and DUT (FFE, adaptation algorithm, pattern gen.); signals: analog input, digital input, y(t), I(k), I'(k), ε(k), ci]
[Figure: three plots of tap weight versus tap number (-2 to 2): fault-free vs. with tap-1 s.a. fault; fault-free vs. with gain error; fault-free vs. with offset voltage]
• Applying digital tests and examining tap-coefficients for fault detection and diagnosis
• Demonstrating much higher fault coverage than eye-diagram-based approach
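The idea of applying a digital pattern and reading back the converged tap coefficients can be sketched as follows. This is a simplified model under assumptions: the adaptation rule is taken to be LMS (the slides only say "Adp. Alg."), the test pattern is a PRBS7 sequence, and a fault is modeled as a gain error in one tap multiplier. All names and values are hypothetical.

```python
def prbs7(n, state=0x7F):
    """PRBS7 (x^7 + x^6 + 1) as +1/-1 symbols; stands in for the pattern generator."""
    symbols = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        symbols.append(1.0 if new_bit else -1.0)
    return symbols


def adapt_taps(tap_gains, n_symbols=4000, mu=0.05):
    """Run LMS adaptation on a digital test pattern; return the converged taps.

    tap_gains models the actual gain of each tap multiplier (1.0 = fault-free).
    """
    n_taps = len(tap_gains)
    center = n_taps // 2
    pattern = prbs7(n_symbols)
    coeffs = [0.0] * n_taps
    for k in range(n_taps - 1, n_symbols):
        window = [pattern[k - i] for i in range(n_taps)]      # y(k), y(k-1), ...
        v = sum(c * g * y for c, g, y in zip(coeffs, tap_gains, window))
        err = window[center] - v                              # epsilon(k)
        coeffs = [c + mu * err * y for c, y in zip(coeffs, window)]
    return coeffs


def is_faulty(coeffs, expected, tolerance=0.05):
    """Flag the equalizer if any scanned-out coefficient is off its expected value."""
    return any(abs(c - e) > tolerance for c, e in zip(coeffs, expected))


EXPECTED = [0.0, 0.0, 1.0, 0.0, 0.0]          # taps -2..2 for an ideal test input
good = adapt_taps([1.0, 1.0, 1.0, 1.0, 1.0])
bad = adapt_taps([1.0, 1.0, 0.8, 1.0, 1.0])   # 20% gain error on the center tap

print("fault-free taps:", [round(c, 3) for c in good], is_faulty(good, EXPECTED))
print("gain-error taps:", [round(c, 3) for c in bad], is_faulty(bad, EXPECTED))
```

In this model the adaptation loop compensates for the weak multiplier by raising its coefficient, so the fault becomes visible as a digital value in the scan chain rather than as a subtle change in the eye diagram.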
From Test to Recovery/Reconfiguration - Examples
• Memory
– "BIST → BISD → BISR" a common practice
• Analog/RF/High-speed IO components
– Digitally assisted self-calibration: fine-tuning performance; more robust to process, temperature and voltage variations…
• Dynamic circuits
– Using programmable keeper and on-chip leakage sensors for tuning performance and robustness
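The memory chain of self-test (BIST), self-diagnosis (BISD) and self-repair (BISR) can be illustrated with a toy model. This sketch assumes stuck-at cell faults, a simplified march sequence and row-level spares; the class, function names and memory size are hypothetical and chosen only for the example.

```python
class Memory:
    """Array of 1-bit cells with optional stuck-at faults and spare rows."""

    def __init__(self, rows, cols, spare_rows, stuck=None):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * cols for _ in range(rows + spare_rows)]
        self.stuck = dict(stuck or {})      # (physical_row, col) -> stuck value
        self.remap = {}                     # logical row -> spare physical row
        self.free_spares = list(range(rows, rows + spare_rows))

    def _physical(self, row):
        return self.remap.get(row, row)

    def write(self, row, col, value):
        self.cells[self._physical(row)][col] = value

    def read(self, row, col):
        p = self._physical(row)
        return self.stuck.get((p, col), self.cells[p][col])


def march_test(mem):
    """BIST: simplified march (w0; up r0,w1; down r1,w0; r0). Returns failing cells."""
    addresses = [(r, c) for r in range(mem.rows) for c in range(mem.cols)]
    failures = set()
    for r, c in addresses:
        mem.write(r, c, 0)
    for r, c in addresses:
        if mem.read(r, c) != 0:
            failures.add((r, c))
        mem.write(r, c, 1)
    for r, c in reversed(addresses):
        if mem.read(r, c) != 1:
            failures.add((r, c))
        mem.write(r, c, 0)
    for r, c in addresses:
        if mem.read(r, c) != 0:
            failures.add((r, c))
    return failures


def diagnose(failures):
    """BISD: reduce failing cells to the list of failing rows."""
    return sorted({r for r, _ in failures})


def repair(mem, bad_rows):
    """BISR: map each failing row onto a spare row; False if spares run out."""
    for row in bad_rows:
        if not mem.free_spares:
            return False
        mem.remap[row] = mem.free_spares.pop(0)
    return True


mem = Memory(rows=8, cols=8, spare_rows=2, stuck={(3, 5): 1, (6, 0): 0})
failing_cells = march_test(mem)
bad_rows = diagnose(failing_cells)
repaired = repair(mem, bad_rows)
print("failing rows:", bad_rows, "repaired:", repaired,
      "retest clean:", not march_test(mem))
```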
Dynamic Circuit Using Static Keeper
[Figure: conventional static keeper on a dynamic circuit. Labels: clk, RS0 to RS7, D0 to D7, LBL0, LBL1, N0]
• Keeper upsizing degrades average performance
Pessimistic Design Hurts Performance
[Figure: histogram of number of dies (0 to 200) versus normalized IOFF (0 to 6), with the nominal corner and worst-case corner marked; 130nm CMOS measurements, 110°C]
• Substantial variation in leakage across dies
• 4-5X variation between nominal and worst-case leakage
• Performance determined at nominal leakage
• Robustness determined at worst-case leakage
Programmable Keeper for Dynamic Ckts
[Figure: 3-bit programmable keeper with keeper devices of width W, 2W and 4W selected by b[2:0], on a dynamic circuit. Labels: clk, RS0 to RS7, D0 to D7, LBL0, LBL1, N0]
• Opportunistic speedup via keeper downsizing
Ref: C. Kim and K. Roy, Purdue
On-Die Leakage Sensor
C. Kim et al., VLSI Circuits Symp. '04
[Figure: die photo, 83μm × 73μm. Labels: current reference, comparators, current mirrors, VBIAS gen., NMOS device, test interface]
• High leakage sensing gain
• Compact analog design sharing bias generators
Technology: 90nm dual Vt CMOS
VDD: 1.2V
Resolution: 7 levels
Power consumption: 0.66 mW @ 80°C
Dimensions: 83 × 73 μm²
Leakage Binning Results
[Figure: leakage binning results by output code from the leakage sensor: 001, 010, 011, 100, 101, 110, 111]
Test & Tuning Process for Self-Tunable Design
[Figure: test and tuning flow. Stage labels: fab, assembly, wafer test, burn in, package test, customer. Tuning labels: process detection, leakage measurement, on-die leakage sensor, program using fuses]
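The step from a sensed leakage code to a fused keeper setting can be sketched as a small lookup. This is an illustration only: the binary-weighted W/2W/4W legs follow the programmable-keeper slide, but the rule that the required keeper width scales linearly with the sensor code, and the `width_per_level` parameter, are assumptions made for the example.

```python
KEEPER_UNIT_WIDTHS = (1, 2, 4)   # W, 2W, 4W legs enabled by b[0], b[1], b[2]


def keeper_width(bits):
    """Total keeper width (in units of W) for a 3-bit setting b[2:0]."""
    return sum(w for i, w in enumerate(KEEPER_UNIT_WIDTHS) if (bits >> i) & 1)


def select_keeper_bits(sensor_code, width_per_level=1.0):
    """Pick the smallest keeper setting that covers the sensed leakage level.

    sensor_code: 3-bit output of the on-die leakage sensor (1..7).
    width_per_level: ASSUMED keeper width (in W) needed per leakage level.
    """
    if not 1 <= sensor_code <= 7:
        raise ValueError("sensor code must be in 1..7")
    required = sensor_code * width_per_level
    for bits in range(1, 8):
        if keeper_width(bits) >= required:
            return bits
    return 7   # strongest available keeper


# Low-leakage dies get a small keeper (faster); leaky dies get a large one.
for code in range(1, 8):
    bits = select_keeper_bits(code, width_per_level=0.5)
    print(f"sensor code {code:03b} -> keeper b[2:0] = {bits:03b}")
```

The chosen bits would then be written once, for example with fuses, so each die runs with the smallest keeper that still meets its own robustness requirement instead of the worst-case size.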
Digital Logic - Exploring Redundancy and Reconfiguration Tradeoffs
• Redundancy
– Suitable for soft/transient/marginality errors
– Different forms:
  • Hardware redundancy
  • Time redundancy
  • Information redundancy
• Reconfiguration
– Suitable for hard errors (e.g. defects)
• Design methodology and tools for area, power, performance and reliability tradeoffs?
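The three forms of redundancy listed above can each be shown in a few lines. These are textbook illustrations chosen for the example (triple modular redundancy, re-execution with comparison, and a single parity bit); the slides do not say which specific schemes were studied.

```python
def majority_vote(a, b, c):
    """Hardware redundancy (TMR): bitwise majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def run_with_retry(operation, attempts=3):
    """Time redundancy: repeat until two consecutive runs agree."""
    previous = operation()
    for _ in range(attempts - 1):
        current = operation()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("results never agreed")


def add_parity(word):
    """Information redundancy: append an even-parity bit as the LSB."""
    parity = bin(word).count("1") & 1
    return (word << 1) | parity


def parity_ok(coded):
    """A coded word is valid when its total number of 1 bits is even."""
    return bin(coded).count("1") % 2 == 0


# Hardware: one of three copies is corrupted, the vote masks it.
voted = majority_vote(0b1011, 0b1011, 0b0011)

# Time: a transient upset hits the first run only.
results = iter([6, 5, 5])
retried = run_with_retry(lambda: next(results))

# Information: a single flipped bit is detected (not corrected) by parity.
coded = add_parity(0b1011)
corrupted = coded ^ 0b100

print(voted == 0b1011, retried, parity_ok(coded), parity_ok(corrupted))
```

The cost side of the tradeoff is visible even here: voting triples the hardware, retry multiplies the latency, and parity adds one bit per word but only detects the error.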
Summary
• Quality can't be added, it has to be designed in.
• Cost-effective embedded self-test will replace existing manufacturing test methodologies for heterogeneous SoC/SiP
• Post-silicon tuning/calibration/reconfiguration is becoming promising, and necessary, for Si nano systems