ee371 debug examples - stanford university · 1 j. stinson © 2007 ee 371: debug examples 1 ee371...

28
EE 371: Debug Examples J. Stinson © 2007 1 EE371 Debug Examples Intel Corporation [email protected] EE 371: Debug Examples J. Stinson © 2007 2 Agenda Speedpath Failure Circuit Marginality: Noise Functional Failure Circuit Marginality: Multiple • PowerUp Problems

Upload: lamnga

Post on 26-Aug-2018

237 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

1

EE 371: Debug ExamplesJ. Stinson © 2007 1

EE371Debug Examples

Intel [email protected]

EE 371: Debug ExamplesJ. Stinson © 2007 2

Agenda

• Speedpath Failure• Circuit Marginality: Noise• Functional Failure• Circuit Marginality: Multiple• PowerUp Problems

Page 2: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

2

EE 371: Debug ExamplesJ. Stinson © 2007 3

Speedpath Failure

EE 371: Debug ExamplesJ. Stinson © 2007 4

Speedpath Example: The Wall Shmoo

d004508A16588 -- #FF005 (92C)3.2V |EEEEEEEE+++++++++++++3.0V |EEEEEEEE+++++++++++++2.8V |XEEEEEEE+++++++++++++2.6V |XXEEEEEE+++++++++++++2.4V |XXXXXEEE+++++++++++++2.2V |XXXXXXXE+++++++++++++2.0V |XXXXXXXXXX+++++++++++

+^-----^-----^-----^--10.0 11.2 12.4 13.6

+ - passE - Wall failX - other fail

Voltage

Bus Period

Passing Region

Page 3: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

3

EE 371: Debug ExamplesJ. Stinson © 2007 5

d004508A16588 -- #SS131 (92C)3.2V |XXXXXEEE+++++++++++++3.0V |XXXXXXEE+++++++++++++2.8V |XXXXXXXE+++++++++++++2.6V |XXXXXXXXX++++++++++++2.4V |XXXXXXXXXXX++++++++++2.2V |XXXXXXXXXXXXXXX++++++2.0V |XXXXXXXXXXXXXXXXXXX++

+^-----^-----^-----^--10.0 11.2 12.4 13.6 + - passE - Wall failX - other fail

d004508A16588 -- #FF005 (92C)3.2V |EEEEEEEE+++++++++++++3.0V |EEEEEEEE+++++++++++++2.8V |XEEEEEEE+++++++++++++2.6V |XXEEEEEE+++++++++++++2.4V |XXXXXEEE+++++++++++++2.2V |XXXXXXXE+++++++++++++2.0V |XXXXXXXXXX+++++++++++

+^-----^-----^-----^--10.0 11.2 12.4 13.6 + - passE - Wall failX - other fail

Fast Transistor Part Slow Transistor Part

Skew Insensitive wall

EE 371: Debug ExamplesJ. Stinson © 2007 6

The Wall Debug

• Production test platform suspected– A timing setup problem– How could silicon act this way?

However...Debug test platform confirmed– Unlikely two diff’t platforms had same timing

error– Now we had to do the debug...

Page 4: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

4

EE 371: Debug ExamplesJ. Stinson © 2007 7

Pattern Timeline

Slow Pattern

Clock 0

Debug Process

TesterShmoo

Pin Failure

TesterShmooTAP

First ScanMismatch

TAPClock ShrinkRTL SimProbing

SpeedpathClock

EE 371: Debug ExamplesJ. Stinson © 2007 8

Why was it a wall?

• Long Interconnect Line– RC Delay less sensitive to driver strength– Voltage/process only improve driver

Page 5: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

5

EE 371: Debug ExamplesJ. Stinson © 2007 9

Interconnect Effect

Transistor Delay

RC Delay

Clock

Path #1

Path #2

Data Valid Times

Must be valid before

Voltage = 2.0V

EE 371: Debug ExamplesJ. Stinson © 2007 10

Transistor Delay

RC Delay

Clock

Path #1

Path #2

Xtor Path Shows Big Improvement

RC Path has little Improvement

Interconnect Effect

Voltage = 2.5V

Page 6: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

6

EE 371: Debug ExamplesJ. Stinson © 2007 11

Why was it a wall?

Jam Sustainer

• Jam sustainer at end of the line– Fights transition of signal– Sustainer gets stronger with voltage/skew– Adds to “wall” characteristics

EE 371: Debug ExamplesJ. Stinson © 2007 12

Wall Follow-up

• Two FIB experiments– Driver speedup – wall moved– Cut sustainer – wall “leaned”

Page 7: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

7

EE 371: Debug ExamplesJ. Stinson © 2007 13

Shmoo with Cut Sustainer

d004508A16588 -- #FF013 FIB3.2V |EEEE+++++++++++++++++3.0V |EEEEE++++++++++++++++2.8V |XEEEE++++++++++++++++2.6V |XXEEEE+++++++++++++++2.4V |XXXXXEE++++++++++++++2.2V |XXXXXXXE+++++++++++++2.0V |XXXXXXXXXX+++++++++++

+^-----^-----^-----^--10.0 11.2 12.4 13.6

+ - passE - Wall failX - other fail

Wall has leaned over

EE 371: Debug ExamplesJ. Stinson © 2007 14

Circuit Marginality: Noise

Page 8: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

8

EE 371: Debug ExamplesJ. Stinson © 2007 15

Noise Example

• High Voltage Failure– Only one FAB showed

signature• Second FAB seemed clean

– Scan pointed to branch memory array

g196027*, part#ZC-1063.3V |BCCCCCCCCCC3.2V |BCBBCCCCCCC3.1V |BBBBBBBBBBB3.0V |BBBBBBBBBBB2.9V |BBBBBBBBBBB2.8V |BBBBBBBBBBB2.7V |BBBBBBBBBBB2.6V |BBBBBBBBBBB2.5V |BBBBBBBBBBB2.4V |+B+BBBBBBBB2.3V |++++BB+++BB2.2V |+++++++++++2.1V |+++++++++++2.0V |A+++++++++++^-----^----10.0 16.0+ - pass Pass Low Voltage Only

EE 371: Debug ExamplesJ. Stinson © 2007 16

Noise Debug

• EBeam confirmed branch array read– Visibility limited in array

• Bit 4 resolved later than other bits– Based on EBeam waveforms

• Signals on either side of read lines transistioned in opposite direction

– Suspected coupling problem

Page 9: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

9

EE 371: Debug ExamplesJ. Stinson © 2007 17

Bit

Bit#

WL

Sens

eA

mpl

ifier Out

BitBit#

Bit

Bit#

SAEN

SAEN

Coupling Schematics

EE 371: Debug ExamplesJ. Stinson © 2007 18

Bit

Bit#

WL

Sens

eA

mpl

ifier Out

BitBit#SAEN

Coupling Attacks

Sense Amp senses Wrong Value!

SAEN

Coupling Schematics

Page 10: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

10

EE 371: Debug ExamplesJ. Stinson © 2007 19

BTB Coupling Debug

• Parameters data checked at problematic FAB– M2 CD’s wider than normal– ILD1 and ILD2 thicker than normal– More sensitive to coupling

• Audit of original design– Simulations ignored some coupling– New simulations showed failure

EE 371: Debug ExamplesJ. Stinson © 2007 20

BTB Coupling Validation: FIB experiments

• Deposit extra capacitance on read line– Resists coupling from neighbors

• Extend sense amp pulse width– Gives more time for read to resolve

Page 11: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

11

EE 371: Debug ExamplesJ. Stinson © 2007 21

Bit

Bit#

WL

Sens

eA

mpl

ifier Out

BitBit#SAEN

Bitlines split correctly

SAEN

Coupling Schematics

EE 371: Debug ExamplesJ. Stinson © 2007 22

Functionality Failure

Page 12: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

12

EE 371: Debug ExamplesJ. Stinson © 2007 23

Functionality Problem

• “Dash stepping” first silicon non-functional– Stepping was supposed to fix a min-delay race

• Suspected inadequate race fix– Scandiff confirmed same circuitry– EBeam also confirmed…– But visibility was limited

EE 371: Debug ExamplesJ. Stinson © 2007 24

Functionality Debug

• Design team was confident in fix, so…• Plan to strip back the entire block

– Look for possible mask defect– Takes 4-10 days in FIB

However...

• Noticed a floating node in EBeam scope

Page 13: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

13

EE 371: Debug ExamplesJ. Stinson © 2007 25

Floating Node

Driven Metal Lines held at Vss

Electron Beam charges Floating Node

EE 371: Debug ExamplesJ. Stinson © 2007 26

Floating Node

Page 14: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

14

EE 371: Debug ExamplesJ. Stinson © 2007 27

Floating Node Debug

• Node should NOT have been floating• A0 and A1 layout compared

– Via1 or M1 could cause error• FIB strip back focused on this node

EE 371: Debug ExamplesJ. Stinson © 2007 28

Should be 3 via1’sFAB contacted

– Accidentally used A0 via1 mask

Problem fixed– New silicon

arrived shortly

FIB Stripback Results

Page 15: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

15

EE 371: Debug ExamplesJ. Stinson © 2007 29

Good Silicon has all 3 via1’s

Fully functional with correct via1 mask

FIB Stripback Results

EE 371: Debug ExamplesJ. Stinson © 2007 30

Functionality Summary

• Notice details– Focused stripback saved days of work– Very important during time critical debug

Page 16: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

16

EE 371: Debug ExamplesJ. Stinson © 2007 31

Circuit Marginality: Multiple Sources

EE 371: Debug ExamplesJ. Stinson © 2007 32

Circuit Marginality

• Observed High Vcc failures– Frequency Insensitive

• TDO only failure– All signature mode tests were failing– Turning off signature mode allowed test to pass

Page 17: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

17

EE 371: Debug ExamplesJ. Stinson © 2007 33

High Vcc Shmoo

DQS Sigmode Shmoo (104C)2.0V |XXXXXXXXXXXXXXXXXXXXX

|XXXXXXXXXXXXXXXXXXXXX1.8V |XXXXXXXXXXXXXXXXXXXXX

|XXXXXXX++XXXXXX+++XXX1.6V |AA+++++++++++++++++++

|AA+++++++++++++++++++1.4V |AAA++++++++++++++++++

|AAAAAA+++++++++++++++1.2V |AAAAAAAAAAAA+++++++++

+^-----^-----^-----^--12.0 13.0 14.0 15.0

+ - passX - High Vcc failA - Other fail

EE 371: Debug ExamplesJ. Stinson © 2007 34

Marginality Root Cause

• Scanout stopped working in failure region– Deduce scan chain itself was broken

• Probing was only way to root cause– Laser Voltage Probe was able to narrow failure

down to Scan MSFF– Three different mechanisms observed

Page 18: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

18

EE 371: Debug ExamplesJ. Stinson © 2007 35

Scan MSFF Analysis: Pass

Data

Clock

Out

Clock#

N1 N2

Clock#

N2

Clock#

N1

N1 N2

EE 371: Debug ExamplesJ. Stinson © 2007 36

Problem #1: Charge-Share

Data

Clock

Out

Clock#

N1 N2

Clock#

N2N1

Charge-Sharing

Glitch on N1

Page 19: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

19

EE 371: Debug ExamplesJ. Stinson © 2007 37

MSFF Problem #2

Data

Clock

Out

Clock#

N1 N2

Clock#

N2N1

Contention

Larger Glitch on N1

EE 371: Debug ExamplesJ. Stinson © 2007 38

MSFF Problem #3

Data

Clock

Out

Clock#

N1 N2

Clock#

N2N1

Cross Coupling

Flips State

Page 20: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

20

EE 371: Debug ExamplesJ. Stinson © 2007 39

Scan MSFF “Backwriting”

• Slave “backwrites” value into Master– Combination of three mechanisms to cause failure

• Re-simulated all standard cell MSFF’s– Two other cells flagged with same problem

• Circuit was a direct “shrink” from a previous process– Discovered same issue on prior process—but at a

MUCH higher voltage

EE 371: Debug ExamplesJ. Stinson © 2007 40

PowerUp and Initialization

Page 21: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

21

EE 371: Debug ExamplesJ. Stinson © 2007 41

PowerUp Issue

• Observed *some* systems wouldn’t boot– Toggling RESET always enabled boot– Toggling power did not guarantee boot

• Nasty problem to debug– System level issue (not seen on tester)– Intermittent failure (occurred 1 out of 100 times)– Debug tools not enabled (part hasn’t booted)

• Started with oscilloscope waveforms…

EE 371: Debug ExamplesJ. Stinson © 2007 42

Oscope Waveforms

xxRESET#

xxPWRGOOD

xxHIT#

Should be Tri-stated (pulled hi)

Assertion enables Tristate

Page 22: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

22

EE 371: Debug ExamplesJ. Stinson © 2007 43

Why is TriState determined by PWRGOOD?

• Discovered busclk dependency @ – ACLOOP[1] directly controls I/O tristate signal

• Depends upon busclk for proper initialization– While !PWRGOOD, busclk is not generated

• Power-up initialization @ may generate a busclk no issue• Otherwise, must depend on power-up initialization of ACLOOP[1] ( )• “Driven value” on I/O pins will depend on power-up initialization at

EE 371: Debug ExamplesJ. Stinson © 2007 44

Why wasn’t the part booting?

• PWRGOOD will always clear the ACLOOP– Eventually the pins should tristate– So, why was the part still not booting?

• Further characterization: Power levels were very low– When the part failed to boot, the power was very low– Potentially indicated that the PLL wasn’t running– Discovered secondary effect of ACLOOP initialization

problem

Page 23: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

23

EE 371: Debug ExamplesJ. Stinson © 2007 45

PLL Ratio Depends on PWRGOOD

PLL frequency ratio determined

at assertion edge of PWRGOOD

By Sampling the Address Pins

EE 371: Debug ExamplesJ. Stinson © 2007 46

Final Root Cause

• System drives address pins at PWRGOOD assertion– Sets internal PLL frequency– Address pins are *supposed* to be tristated by the processor

• If ACLOOP powers up incorrectly, contention can occur– Processor is driving a ‘0 on address pin; system is driving a ‘1– The processor will always win

• PWRGOOD assertion tristates the address bus– Too late! It’s already been sampled by PWRGOOD assertion– Only “illegal” bus fractions will cause failure

• Only 7 out of 32 possible bus fractions are “illegal”• Failure requires a confluence of diff’t events

– ACLOOP powers up “on”– Bus clock does NOT glitch during power up– Address pins power up driving an “illegal” bus fraction

Page 24: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

24

EE 371: Debug ExamplesJ. Stinson © 2007 47

2nd PowerUp Issue

• Observed *some* systems wouldn’t boot– Toggling RESET never enabled boot– Toggling power usually enabled boot

• Nasty problem to debug– Intermittent failure (occurred 1 out of 1000+ times)

• Some bright spots– Able to demonstrate on tester

• Enabled “deterministic” behavior• Enabled debug tools (scan)

EE 371: Debug ExamplesJ. Stinson © 2007 48

Vcc Shmoo (100x repeat)

NoBoot Shmoo (40C)1.5V |+++++++++++++++++++++

|+++++++++++++++++++++1.4V |+++++++++++++++++++++

|+++++++++++++++++++++1.3V |A++++++++++++++++++++

|AA+++++++X+++++++++++1.2V |AAA++++++++++++++++++

|AAAAAA+++++++++++++++1.1V |AAAAAAAAAAAA+++++++++

+^-----^-----^-----^--6.0 7.0 8.0 9.0

+ - passX - FailA - Other fail

Page 25: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

25

EE 371: Debug ExamplesJ. Stinson © 2007 49

Vcc Shmoo (10000x repeat)

NoBoot Shmoo (40C)1.5V |+++++++++++++++++++++

|+++++++++++++++++++++1.4V |XXXXXXXXXXXXXXXXXXXXX

|XXXXXXXXXXXXXXXXXXXXX1.3V |XXXXXXXXXXXXXXXXXXXXX

|XXXXXXXXXXXXXXXXXXXXX1.2V |XXXXXXXXXXXXXXXXXXXXX

|XXXXXXXXXXXXXXXXXXXXX1.1V |XXXXXXXXXXXXXXXXXXXXX

+^-----^-----^-----^--6.0 7.0 8.0 9.0

+ - passX - FailA - Other fail

EE 371: Debug ExamplesJ. Stinson © 2007 50

Temperature Shmoo (10000x repeat)

NoBoot Shmoo (40C)80C |AA+++++++++++++++++++

|+++++++++++++++++++++60C |+++++++++++++++++++++

|++X+XXX+++XXXX++XXX++40C |XXXXXXXXXXXXXXXXXXXXX

|++++XX+XXX++XXX+XX+XX20C |+++++++++++++++++++++

|+++++++++++++++++++++0C |+++++++++++++++++++++

+^-----^-----^-----^--6.0 7.0 8.0 9.0

+ - passX - FailA - Other fail

Page 26: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

26

EE 371: Debug ExamplesJ. Stinson © 2007 51

Scan Analysis

Add

rDec

ode

FF

FF

Scan Failure

Clean Scan

• Scan failure looked like OR of two entries

- Common for multiple WL firing (dynamic read)

- Uncommon for random logic

• Address decode was simple CMOS

EE 371: Debug ExamplesJ. Stinson © 2007 52

Wordline Driver

• Used a fancy self-resetting mechanism– Self-reset WL prevented read→write min-delay– Pulsed WL read array for short period of time

Input

rWLn1

Forward inverter

Feedback inverter

Page 27: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

27

EE 371: Debug ExamplesJ. Stinson © 2007 53

Wordline Driver: Problem

• Self-reset sized diff’t than forward path– Initial state could flip forward inverter but not

feedback (pseudo-metastable state)• Resolving pseudo-meta state

– Access WL– High temp– Low temp

Input

rWLn16

1

1

1

Forward inverter

EE 371: Debug ExamplesJ. Stinson © 2007 54

Summary

Page 28: EE371 Debug Examples - Stanford University · 1 J. Stinson © 2007 EE 371: Debug Examples 1 EE371 Debug Examples Intel Corporation jstinson@mipos2.intel.com J. Stinson © 2007 EE

28

EE 371: Debug ExamplesJ. Stinson © 2007 55

Summary

• Debug requires a lot of detective work– Review all the evidence– Develop experiments to eliminate possible problems– Develop theory of failure– Validate theory

• Can’t ignore ANY evidence– If something doesn’t fit, you’re missing something

• EVERY problem is different– Need to constantly think about alternative methods of validation

• The Norwegian capacitor• The Kleveland voltmeter