21. design for testability...example, microprocessor has nanoamps of current with no faults; many...

40 60 80 100 120

40

60

80

mm

21. Design for Testability

J. A. Abraham

Department of Electrical and Computer EngineeringThe University of Texas at Austin

VLSI DesignFall 2015

November 11, 2015

ECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 1 / 51

40 60 80 100 120

40

60

80

mm

Design for Testability (DFT)

Reduce costs associated with testing complex circuit

Design circuit so that it will be easier to test

Increase accessibility of internal nodes

Controllability: ability to establish specific signal value ateach internal node by setting inputsObservability: ability to determine internal values bycontrolling inputs and observing outputs

Ensure predictable circuit responses

Tradeoffs

Technical: area, I/O pins, performanceEconomic: design time, yield, time to revenue


40 60 80 100 120

40

60

80

mm

Testable Sequential Circuits

Sequential circuits are very difficult to test

Design the internal memoryelements to be part of ashifter register chain toprovide controllability,observability through serialshifts

With scan chain, problem of testing any circuit reduces to testingthe combinational logic


40 60 80 100 120

40

60

80

mm

Level-Sensitive Scan Design (LSSD)

Structured DFT developed at IBM

All internal storage implemented in hazard-free polarity-holdlatches (SRLs), part of a scan chain

Latches controlled by two or more non-overlapping clocks,with rules for clocking

All clock inputs to SRLs must be in their off states whenprimary inputs (PIs) are “off”Clock signal at any clock input to an SRL must be controlledfrom one or more clock PIsNo clock can be ANDed with another clockClock PIs cannot feed data inputs to latches, either directly orthrough combinational logic


40 60 80 100 120

40

60

80

mm

Scan Chains

Convert each flip-flop to a scan registerOnly costs one extra multiplexer

Normal mode: flip-flops behave as usual

Scan mode: flip-flops behave as shift register

Contents of flops can be scanned out and new values scannedin


40 60 80 100 120

40

60

80

mm

Scannable Flip-Flops


40 60 80 100 120

40

60

80

mm

Boundary Scan

Testing boards is also difficult

Need to verify solder joints are goodDrive a pin to 0, then to 1Check that all connected pins get the values

Single-sided PC boards with “through-hole” construction used“bed of nails” to contact pins of chip on the back side of theboard

SMT and BGA boards cannot easily contact pins

Build capability of observing and controlling pins into eachchip to make board test easier


40 60 80 100 120

40

60

80

mm

Boundary Scan (IEEE 1149.1)


40 60 80 100 120

40

60

80

mm

Boundary Scan Example


40 60 80 100 120

40

60

80

mm

Boundary Scan Interface

Boundary scan is accessed through five pins

TCK: test clockTMS: test mode selectTDI: test data inTDO: test data outTRST*: test reset (optional)

Chips with internal scan chains can access the chains throughboundary scan for unified test strategy


40 60 80 100 120

40

60

80

mm

IDDQ Testing

Measure quiescent power supply current of CMOS circuitsfor selected test vectors

Example, microprocessor has nanoamps of current with nofaults; many defects (shorts, for example) cause much highercurrents

Direct relationship found (Sandia Labs., HP, Phillips) betweenIDDQ test acceptance rates and quality, reliability of ICs

Only need to activate site of potential defect (no need topropagate errors)

Leakage current in very large chips may overwhelm the abnormalcurrent due to a fault in one gate


40 60 80 100 120

40

60

80

mm

IDDQ Testing, Cont’d


40 60 80 100 120

40

60

80

mm

Built-In Self Test (BIST)

Increasing circuit complexity, tester cost

Interest in techniques which integrate some tester capabilitieson the chip

Reduce tester costsTest circuits at speed (more thoroughly)

Approach:

Compress test responses into “signature”Pseudo-random (or pseudo-exhaustive) pattern generator(PRG) on the chip

Integrating pattern generation and response evaluationon chip – BIST


40 60 80 100 120

40

60

80

mm

Pseudo-Random Sequence Generator (PRSG)

Linear Feedback Shift Register

Shift register with input taken from XOR of statePseudo-Random Sequence Generator (or Pseudo-RandomPattern Generator (PRPG), or Linear Feedback Shift Register(LFSR))

Step Q0 1111 1102 1013 0104 1005 0016 0117 111 (repeats)


40 60 80 100 120

40

60

80

mm

Example of BIST

Technique calledSTUMPS (from IBM)


40 60 80 100 120

40

60

80

mm

Testability Issues – Custom Microprocessors

Chip complexity, large volumes, costs

Area/performance penalties

Fault (defect) coverage goals

Test generation (fault simulation) costs

Development time

Tester time (manufacturing costs)

ASICs generally use a “full scan” methodology for testability

Some of the ad-hoc techniques used in custom microprocessors willbe described in the following slides


40 60 80 100 120

40

60

80

mm

Testability Features of the Motorola 68020

Chip Statistics:

Process: 2µ HCMOSDie Size: 375 × 347 milsTransistor count: 200K

Test Overhead

Test Logic Area: < 3%Test Microcode: < 2%Test routing shared with functional routing to keep chip areadownBus structure helped create test partitioning for embeddedstructuresTradeoff of extra time to produce ad-hoc test logic withexpected higher yield

Test techniques for ROMs, PLAs, Cache

Test mode not available to customers (Why??)


40 60 80 100 120

40

60

80

mm

Testing Scheme for ROMs in the 68000 Family

Initial schemes for testing 68000 family control structuresinvolved bussing nanoROM outputs off-chip via the addressbus in test mode

Could not test at speed since nanoROM output buffers werenot sized to drive a bus

Only nanobits were observable, and the PLAs, microROM andnanoROM had to be tested as one large sequential machine

Test sequences were lengthy and difficult to write

Tests were also microcode dependent


40 60 80 100 120

40

60

80

mm

Testability Techniques for 68020 ROMs

Used test mode to force next microcode address (NMA) fromdata pins

Data pins also control a MUX for both micro and nano ROMoutputs, which are moved to the BC bus, into the datasection of the execution unit, and to the address bus whichcan be observed

Exhaustive testing of the2K ROM entries

32 bits of ROM visibleevery 2 clocks

Four passes of testsneeded to read the 110outputs of the two ROMs


40 60 80 100 120

40

60

80

mm

Microrom Test in MC68881 Floating Point Co-processorROM physically split into two sectionsIn test mode, ROM is addressed directly through thecommand registerExhaustive addresses fed to ROM in a “ping-pong” fashion(address/address complement)Outputs of ROM go to two 16-bit signature registers (usingCCITT-16 polynomial x15 + x12 + x5 + 1)Monitor both the quotient and final signature serially on atest pin (Probability of aliasing 2−(n+m−1))


40 60 80 100 120

40

60

80

mm

MC68881 NanoROM Test

NanoROM physically located between the microROM and theexecution unit (ECU), and outputs fed to ECU

No functional path for nanoROM to access signature registers

In test mode, nanoROM columns coupled with the microROMcolumns (with additional columns of nanoROM multiplexed tosignature register)


40 60 80 100 120

40

60

80

mm

MC68881 Entry PLA Test

Four entry PLAs, A0, A1, A2 and A3

A0 PLA contains the entry point reset vectors and iscompletely tested functionallyA1–A3 tested like the ROMs

Command register generatespatterns and outputs are routedthrough a bus to the signatureregister

Test patterns generated using PLAtest generator

PLA Inputs Outputs Products

A1 6 10 23A2 13 10 133A3 5 10 28Response 25 25 56


40 60 80 100 120

40

60

80

mm

Built-in Self Test in the Intel 80386

Normal PLA inputs disabled during test and output ofpseudorandom generator provides exhaustive set of tests toAND-plane input

CROM tested with binary counter (exhaustive test)

Responses compressed using multiple-input signature registers

Test transistors: 2.7%

Area overhead for BIST:1.8%

Transistor sites testedwith BIST: 52.5%

Area tested: 18.6%


40 60 80 100 120

40

60

80

mm

Testing Cache Memory Arrays in MC68030

Cache cell layout design is resistant to both bridging defectsand capacitive coupling

Most likely bridging defect is between adjacent metal bit linesMemory fault model:

One or more cells stuck at 0 or 1Coupling between cells

11n March Test (Marinescu, 1982)

Refresh and data retention tests for the dynamic memory cellsECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 23 / 51

40 60 80 100 120

40

60

80

mm

Test Architecture of MC68040

Utilizes three different DFT methods

Functional tests for data paths with normal mode instructionsAd-hoc test modes constructed for testing on-chip memoryarrays and on-chip phase-locked loopExtension of existing functional snoop logic to allow arrays tobe accessed from the bus similar to a static RAM arrayStructured custom scan approach for ROM, PLA and controlareas

Global test bus driven from two 5-bit registers coordinates testlogic

IEEE 1149.1 boundary scan test interface

Boundary scan data register shared with scan logic

Test overhead:

18,560 devices, 1.55% of total number of devices3.15% of total die area, including global test signal routing


40 60 80 100 120

40

60

80

mm

MC68040 DFT Method Coverage


40 60 80 100 120

40

60

80

mm

MC68040 Scan Chain and Timing


40 60 80 100 120

40

60

80

mm

BIST in IBM Risc System/6000

LSSD with pseudo-random BIST


CommonEngineeringProcessor (COP),bus, on-cardsequencer (OCS),engineeringsupport processor(ESP)

40 60 80 100 120

40

60

80

mm

Common on-chip Processor (COP) in IBM RS/6000

Hardware for pseudo-random vector generation and resultcompression (31-bit LFSR)


Small processor controlsoperation

40 60 80 100 120

40

60

80

mm

BIST in IBM/Motorola Power-PC

Variety of test techniques applied to the Power-PC 603

Full LSSD test of logicBIST of “large” embedded RAMSFunctional test of small RAMSIDDQ tests

BIST for cache and tag RAMs

Functional vectors (good for data cache, not instruction cache)and random BIST (size, complexity, test coverage) notapplicable

Use modified march test of Dekker (1988)

log2 n pattern for data

Overhead: BIST is 2.9% of RAM array, 0.58% of total chip

Performance impact: less than 100 pS due to extra MUXinput leg

All four RAMs tested in parallel, 2.5 mS at 80 MHz


40 60 80 100 120

40

60

80

mm

Testing Embedded Cores in a System on Chip

Source/Sink for test: Embedded processorTest access mechanisms: BussesCore test “wrapper” for Test AccessInterface (TAI)

Lagged Fibonnaci Sequencerandom pattern generatorXn = (Xn24 +Xn55) mod m,n ≥ 55Primitive polynomial:X55 +X24 + 1Period: 2321 × (255 − 1)No multiplication necessary

x86 code for generator:

LDR R5,[R0],#4

LDR R6,[R1],#4

ADD R5,R5,R6

STR R5,[R2],#4

AND R0,R0,R3

AND R1,R1,R3

AND R2,R2,R3ECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 30 / 51

40 60 80 100 120

40

60

80

mm

Issues with Built-In Self Test

Technique can run test sequences at operating frequencies andcapture results within the chip

Pseudorandom pattern generators, signature analyzers

Can also use weighted randompatterns or deterministic patterns

Example (Synopsys): (DeterministicLogic BIST

Problems:Hardware overheadTest powerNon-functional modes during test


40 60 80 100 120

40

60

80

mm

Industry Issues with Processor Testing

Concern with detecting real defects

Small delay defects due to process variations, power droopsand capacitive coupling

Cause a shift in the speed of the part

Problems with logic BIST (same issues with scan AC tests)

Overheads on chip

False paths tested

Test operating conditions different from normal operatingmodes


40 60 80 100 120

40

60

80

mm

Software-Based (Native-Mode) Self Test for Processors

Why not use functional capabilities of processors to replaceBIST hardware?

No additional hardware

Reduce test costs by using low-cost testers

Increase coverage of delay defects and increase yield by testingnative

No issues with excessive power consumption during test

Developed at University of Texas (Int’l Test Conference 1998)

Application to processors at Intel (Int’l Test Conference 2002)


40 60 80 100 120

40

60

80

mm

Software Based Self Test

Advantages

X MinimizedDFTcircuitry

X Reducedexternaltester per-formance

X Excessivetest powerandover-testingeliminated


40 60 80 100 120

40

60

80

mm

Native-Mode Signature Compression

Pseudo-code for software signature analyzer// S: general register or memory, holds signature

for each register Ri to be compressed

{Shift Right Through Carry(Ri);

if (Carry) {S = XOR(S, feedback polynomial)};S = XOR(S, Ri);

}ECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 35 / 51

40 60 80 100 120

40

60

80

mm

Intel Functional BIST

Functional Random Instruction Testing at Speed (FRITS) appliedto Itanium processor family and Pentium 4 line (ITC 2002)

Tests (kernels) are instruction sequences

Kernels loaded into cache and executed in real time duringtest application

They generate and execute pseudo-random or directedsequences of machine code

On Pentium-4, FRITS

added 5% unique coverage to manual tests

screened 10% – 15% of chips which passed wafersort/package tests, but failed system tests

enabled low-cost testers: 40% increase in defect screening onstructural tester

Kernels execute 20 loops in ≈ 8 mSecsECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 36 / 51

40 60 80 100 120

40

60

80

mm

Are Random Tests Sufficient?

Intel implementation involved code in the cache which generatedrandom instruction sequences

Interest in generating instructions targeting faults

Possible to generate instruction sequences which will test foran internal stuck-at fault in a module

In order to deal with defects in DSM technologies, need to targetsmall delay defects

Automatically generate instruction sequences which willtarget small delay defects in an internal module


40 60 80 100 120

40

60

80

mm

RT Level Test Generation for Hard-to Detect Faults

Overview

Map gate level stuck-at faultto RTL

Capture the propagationconstraints as an LTLproperty

Generate a witness for theLTL property using BoundedModel Checking

All required constraintsavailable in RTL

Use SMT based BoundedModel Checking

Scaling with

cone-of-influence reduction


40 60 80 100 120

40

60

80

mm

Modeling Stuck-at Faults in RTL

Approach

Assume one to one match between flops in RTL and gatenetlist

Identify flops/primary outputs o1, o2, ..., on in output cone ofthe fault

Identify the boolean function for each of the outputflops/primary outputs. Ex ok = fk(i1, i2, ..., im)

Identify the Boolean function for the output flops with thefault inserted. Ex ofk = ffk (i1, i2, ..., im)

Fault condition : faultk = fk(i1, i2, ..., im)⊕ ffk (i1, i2, ..., im)


40 60 80 100 120

40

60

80

mm


Example

always @(posedge clk)sum <= PI ⊕ sum;

sum = (PI ∧ ¬sum) ∨ (¬PI ∧ sum)

sumf = PI ∧ ¬sumfaultsum = sum⊕ sumf = ¬PI ∧ sum


40 60 80 100 120

40

60

80

mm


Example

always @(posedge clk)sum′f <= PIf ⊕ sumf ;

assign faultsum = (¬PIf ∧ sumf );

assign sumf = (faultsum?¬sum′f : sum′f );


40 60 80 100 120

40

60

80

mm

Capturing Fault Observability

Naive Approach

Let M be the DUT and Mf be the faulty machine

SMT good = I ∧M0 ∧ ... ∧Mn−1

SMT faulty = I ∧Mf0 ∧ ... ∧M

fn−1

Product Machine, SMTM×Mf= SMT good ∧ SMT faulty

Observability Condition:

F (∨∀i

(Outputi ⊕Outputfi ))

Issues

Two unrolled copies of the machine

Search space is large


40 60 80 100 120

40

60

80

mm

Capturing Fault Observability

Approach

Identify error(D/D̄) propagation paths in RTL for givenfault(Dependency Graph)

Capture propagation path conditions as observabilityproperties Oj

p

Property for fault excitation and propagation F (Cp ∧Ojp)

Scalability by constraining the search space

Bounded Cone of Influence Reduction


40 60 80 100 120

40

60

80

mm

RTL Test Generation for Hard-to-Detect Faults

Experimental Setup

OR1200 RISC processor was DUT

EBMC Model checker / Boolector SMT solver

Bound of pipleine depth + 1

Focused on hard to detect faults in control logic

Commercial ATPG to seive out easy to detect stuck-at faults

78% Fault coverage by commercial ATPG


40 60 80 100 120

40

60

80

mm

RT Level Test Generation for Hard-to Detect Faults

Experimental Results using SMT Solver

Module ATPGFC(%)

Flts.SAT basedmethod

Naive ObservabilityMethod

FC(%) # TO T(sec) FC(%) # TO T(sec)

if 80.35 328 84.11 310 96.18 88.49 161 95.13ctrl 63.21 832 65.97 817 83.12 97.15 59 69.72

oprmuxes 73.66 378 76.09 354 95.49 98.26 6 57.46sprs 89.59 393 90.85 381 93.71 93.78 57 90.27

freeze 82.94 17 99.14 2 64.41 100 0 43.51rf 78.59 7444 80.50 7268 97.57 90.21 463 69.83

except 72.69 1263 73.48 1209 98.63 92.79 128 96.19

Overall 78.05 10655 79.17 10343 96.23 93.86 874 76.11

FC(%) : Fault Coverage in %# TO : # of Timed Out faults

T(sec) : Average Time for generating a test for a fault in secondsECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 45 / 51

40 60 80 100 120

40

60

80

mm

Experimental Results, Structural Observability

Module FC(%) # TO T(sec)

if 98.17 25 23.14ctrl 99.21 8 21.16

oprmuxes 100 0 19.33sprs 97.53 12 18.39

freeze 100 0 10.48rf 98.37 172 22.85

except 97.63 69 38.14

Overall 98.87 454 24.23

FC(%) : FaultCoverage in %# Faults : # ofUndetected CollapsedFaults# TO : # of TimedOut faults

T(sec) : Average Time

for generating a test for

a fault in seconds

Summary of Results

Functional fault coverage of ≈99% for OR1200 processor

SMT based approach was 4x faster than SAT


40 60 80 100 120

40

60

80

mm

Detecting Delay Defects After Manufacturing

Necessary to detect “small-delay defects”

Delay defects which don’t affect performance soon aftermanufacture could be reliability hazards

Need Path Delay Tests to ensure that defective parts arescreened out

Difficult to apply two-pattern tests in scan modeRequires precise control of clocks for high-speed circuitsIf capture clocks are based on the system clock, there is noinformation on the slack for the path

Solution: On-chip programmable capture mechanismAbility to capture faster than at-speed


40 60 80 100 120

40

60

80

mm

Delay Lines for On-Chip Programmable Capture

Source: R. Tayade, et al.ECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 48 / 51

40 60 80 100 120

40

60

80

mm

System for On-Chip Programmable Capture

Source: R. Tayade, et al.ECE Department, University of Texas at Austin Lecture 21. Design for Testability J. A. Abraham, November 11, 2015 49 / 51

21. design for testability...example, microprocessor has nanoamps of current with no faults; many...

Documents