[IEEE 16th Asian Test Symposium (ATS 2007), Beijing, China, 2007.10.8-2007.10.11]

A High Compression and Short Test Sequence Test Compression Technique to Enhance Compressions of LFSR Reseeding

Seongmoon Wang, Wenlong Wei, Srimat T. Chakradhar
ph: +1 (609) 951-2992, fax: +1 (609) 951-2482

NEC Labs. America, Princeton, NJ 08540, USA

Abstract

This paper presents a test data compression scheme that can be used to further improve the compression achieved by LFSR reseeding. The proposed compression technique can be implemented with very low hardware overhead. Unlike most commercial test data compression tools, the proposed method does not require a special ATPG customized for the proposed scheme, and it can compress test patterns generated by any ATPG tool. The test data to be stored in the ATE memory are much smaller than those of previously published schemes. The number of test patterns that must be generated is also smaller than in other weighted random pattern testing schemes. Experimental results on a large industrial design show that the proposed scheme achieves over 1600X compression. It does so with a pattern count comparable to that of highly compacted deterministic patterns.

1 Introduction

Ensuring high test quality for complex chips requires huge volumes of test data. As a result, the test data volumes of complex chips often exceed the memory capacity of ATEs (automatic test equipment). A number of techniques to compress test data have been developed. Könemann showed in [5] that linear feedback shift register (LFSR) reseeding can compress test patterns efficiently. Since then, several test data compression techniques based on LFSR reseeding [9, 11] have been published. LFSR reseeding techniques exploit the fact that typical scan test patterns have very few care (or specified) bits. Care bits are the bits that are assigned binary values during test pattern generation. All other bits are unspecified, i.e., they are don't cares. The compression ratio that LFSR reseeding can achieve is determined by the number of don't cares in the test patterns. In LFSR reseeding, a test pattern di that has Si specified bits is compressed into an (Si + M)-bit seed, where M is a small natural number.
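The seed computation can be made concrete as a linear system over GF(2). The following is a minimal sketch, assuming a Fibonacci-style LFSR that outputs its stage 0 once per shift cycle. The LFSR feeds the XOR of a hypothetical tap set back into its last stage. The function names and taps are illustrative and are not taken from this paper. Each care bit contributes one equation over the seed bits. Gaussian elimination then either finds a seed or reports that the care bits cannot all be satisfied.

```python
from typing import Dict, List, Optional, Tuple


def _step(state: List[int], taps: List[int]) -> List[int]:
    """One LFSR clock: shift toward stage 0 and feed the XOR of the tap stages into the last stage."""
    feedback = 0
    for t in taps:
        feedback ^= state[t]
    return state[1:] + [feedback]


def output_masks(length: int, taps: List[int], n_bits: int) -> List[int]:
    """Symbolic run: bit j of mask i is set if seed bit j appears in the XOR for output i."""
    state = [1 << j for j in range(length)]
    masks = []
    for _ in range(n_bits):
        masks.append(state[0])
        state = _step(state, taps)
    return masks


def run_lfsr(seed: List[int], taps: List[int], n_bits: int) -> List[int]:
    """Concrete run of the same LFSR from a 0/1 seed."""
    state, out = list(seed), []
    for _ in range(n_bits):
        out.append(state[0])
        state = _step(state, taps)
    return out


def solve_seed(cube: str, length: int, taps: List[int]) -> Optional[List[int]]:
    """Seed whose outputs match every care bit of cube ('0'/'1'/'X' per shift cycle), or None."""
    masks = output_masks(length, taps, len(cube))
    pivots: Dict[int, Tuple[int, int]] = {}  # pivot column -> (row mask, rhs), fully reduced
    for i, c in enumerate(cube):
        if c == "X":
            continue  # don't cares impose no equation
        mask, rhs = masks[i], int(c)
        for col, (pmask, prhs) in pivots.items():
            if mask >> col & 1:
                mask, rhs = mask ^ pmask, rhs ^ prhs
        if mask == 0:
            if rhs:
                return None  # care bits are linearly inconsistent for this LFSR
            continue
        col = mask.bit_length() - 1
        for other, (pmask, prhs) in list(pivots.items()):
            if pmask >> col & 1:
                pivots[other] = (pmask ^ mask, prhs ^ rhs)
        pivots[col] = (mask, rhs)
    seed = [0] * length  # free seed bits are set to 0
    for col, (_, rhs) in pivots.items():
        seed[col] = rhs
    return seed
```

In this example, an 8-stage LFSR encodes a 16-bit cube with 5 care bits. The text sizes the LFSR M stages beyond the number of care bits. Those extra stages make a linearly dependent, unsolvable system less likely, but they do not rule it out. Cubes whose care bits conflict therefore still yield `None`.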

ITRS 2005 [1] predicts that about 1000X compression will be required around 2013. Achieving 1000X compression by LFSR reseeding alone is very difficult. Weighted random pattern testing was developed to improve fault coverage in random pattern-based built-in self-test (BIST) [2, 12]. Recently, [6, 7, 3] applied weighted random pattern testing techniques to test data compression. In these schemes, weight data for the inputs whose signal probabilities differ from 0.5 are stored in the ATE and transferred to the chip during test application. Test data for the inputs with signal probability 0.5 are generated by an on-chip random pattern generator. The 3-weight weighted random BIST scheme, also called hybrid BIST [8, 10, 13], can be classified as an extreme case of conventional weighted random pattern BIST. Conventional weighted random pattern BIST can assign various weights to the outputs of TPGs, e.g., 0, 0.25, 0.5, 0.75, and 1.0. In contrast, 3-weight weighted random BIST assigns only three weights: 0, 0.5, and 1. Assigning weight 0 (1) to an input is implemented by fixing the input to logic 0 (1).
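The following is a minimal sketch of the 3-weight idea. Each input carries weight 0, 0.5, or 1. Weights 0 and 1 fix the input, and weight 0.5 takes its bit from a random source. Python's `random.Random` stands in for the on-chip random pattern generator; this substitution is only an illustrative assumption.

```python
import random
from typing import List


def three_weight_pattern(weights: List[float], rng: random.Random) -> List[int]:
    """One pattern under 3-weight WRP: weight 0/1 fixes the input, weight 0.5 is random."""
    pattern = []
    for w in weights:
        if w == 0.0:
            pattern.append(0)                   # input fixed to logic 0
        elif w == 1.0:
            pattern.append(1)                   # input fixed to logic 1
        elif w == 0.5:
            pattern.append(rng.getrandbits(1))  # bit from the random source
        else:
            raise ValueError(f"3-weight scheme allows only 0, 0.5, 1; got {w}")
    return pattern
```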

This paper proposes an efficient 3-weight weighted random pattern testing technique that can further improve the compression achieved by LFSR reseeding. We also propose an efficient algorithm that generates optimal weight sets from a set of test patterns. The proposed method does not require a special automatic test pattern generator (ATPG) customized for the proposed compression scheme. Hence it can compress test patterns generated by any ATPG tool.

Recently, [14] proposed a test data compression technique based on 3-weight weighted random pattern testing. Unlike other weighted random pattern testing techniques [7, 3, 6], the technique in [14] requires no on-chip memory to store weight sets. Instead, the computed weight sets are compressed by LFSR reseeding, stored in compressed form in the ATE memory, and transferred to the decompressor during test application. Because no on-chip memory is needed, the technique can be implemented with very low hardware overhead. However, it requires two reseedable LFSRs, and each must be loaded with a separate seed for every weight set.

The compression scheme proposed in this paper also uses a 3-weight weighted random pattern testing technique, like [14]. Unlike [14], however, the proposed scheme needs only one seed per weight set. Hence the proposed scheme can achieve even higher compression.

The rest of this paper is organized as follows. Section 2 illustrates the generator and the conceptual decompressor for the proposed compression method. Section 3 describes the architecture of the proposed decompressor. Section 4 describes the algorithm for computing generators. Section 5 presents variations of the basic decompressor described in Section 3. Section 6 discusses the extension of the proposed method to multiple scan chain designs. Section 7 presents experimental results, and Section 8 concludes the paper.

2 Preliminaries

A test cube is a test pattern that has unspecified bits. A generator is derived from a set of test cubes. For a circuit with n inputs, it is represented by an n-bit tuple $G^k = \langle G^k_1, G^k_2, \ldots, G^k_n \rangle$, where $G^k_i \in \{0, 1, X, U\}$. Each input pi of the generator is assigned as follows:

- If input pi is assigned only X or 1 (0) across the test cube set, and is assigned 1 (0) in at least one test cube, then pi is assigned 1 (0) in the corresponding generator.
- If input pi is never assigned a binary value 1 or 0 in any test cube in the set, then pi is assigned X in the corresponding generator.
- If input pi is assigned a 1 (0) in test cube da and a 0 (1) in test cube db, then test cube da is said to conflict with test cube db at input pi, and pi is assigned a U in the generator.

Inputs that are assigned U's in Gk are called the conflicting inputs of Gk.

16th IEEE Asian Test Symposium

1081-7735/07 $25.00 © 2007 IEEE DOI 10.1109/ATS.2007.52

In Figure 1, Dk = {d1, d2, d3} is a set of deterministic test cubes that is compressed into a generator Gk. In the test cube set Dk, inputs p5 and p2 are assigned only X or 0. Hence weight 0 is given to p5 and p2 in Gk. Even if we fix inputs p5 and p2 to 0's, we can still detect all faults that are detected by d1, d2, and d3. Input p6 is assigned only X or 1 across the test cubes, so weight 1 is assigned to p6, i.e., p6 is fixed to 1. On the other hand, inputs p3 and p1 are assigned 0 in some test cubes and 1 in others. Unlike p6, p5, and p2, these inputs cannot be fixed to binary values, so weight 0.5 is assigned to them. In Gk, p3 and p1 receive the symbol U, which denotes weight 0.5. Finally, the value at input p4 is a don't care in every test cube, so X is assigned to p4 in Gk.
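The generator rules above amount to a column-wise merge of the test cubes. This Python sketch uses hypothetical helper names and writes test cubes as strings from pn down to p1, as in Figure 1. It also counts care bits (0, 1, U) and conflicting inputs. Section 4 bounds these two quantities by Smax and Umax.

```python
from typing import List, Tuple


def merge_cubes(cubes: List[str]) -> str:
    """Merge equal-length test cubes over {'0','1','X'} into a generator over {0,1,X,U}."""
    if not cubes or len({len(c) for c in cubes}) != 1:
        raise ValueError("need one or more test cubes of equal length")
    generator = []
    for column in zip(*cubes):              # one column per input p_i
        values = set(column) - {"X"}
        if not values:
            generator.append("X")           # never specified: don't care
        elif len(values) == 1:
            generator.append(values.pop())  # a single binary value: weight 0 or 1
        else:
            generator.append("U")           # both 0 and 1 appear: conflicting input
    return "".join(generator)


def care_and_conflicts(generator: str) -> Tuple[int, int]:
    """Return (number of care bits including U's, number of conflicting inputs)."""
    return sum(g != "X" for g in generator), generator.count("U")
```

Merging d1, d2, and d3 of Figure 1 reproduces Gk = 10XU0U.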

The conceptual decompressor shown in Figure 1 can generate the three test cubes that are compressed into Gk. During test application, the ATE controls the S-TPG and the F-TPG of the decompressor to generate the desired patterns. The R-TPG, in contrast, is a free-running random pattern generator. The S-TPG controls the select input of the multiplexer. If scan input pi is assigned a U (a binary value), then the R-TPG (F-TPG) is selected as the pattern source for pi. The F-TPG generates the values for the scan inputs that are assigned binary values in the generator. It can be implemented with any linear test pattern generator, such as an LFSR, a cellular automaton, or even the ring generator of EDT [11]. If scan input pi is assigned a 1 (0) in Gk, then the F-TPG output is set to 1 (0) in the cycle when the value for pi is scanned into the scan chain. The values required at the output of the S-TPG (F-TPG) are represented in the S-pattern Sk (F-pattern Fk). X's in both Sk and Fk represent don't cares.
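The conceptual decompressor can be modeled in a few lines. This sketch uses the hypothetical helper names `split_generator` and `conceptual_mux`. It derives Sk and Fk from Gk, then picks, bit by bit, the R-TPG value where Sk is 1 and the F-TPG value elsewhere. The F-TPG and R-TPG outputs are passed in as plain strings rather than generated by hardware models. The assumed F-TPG output must agree with Fk wherever Fk is binary.

```python
from typing import Tuple


def split_generator(generator: str) -> Tuple[str, str]:
    """Derive the S-pattern and F-pattern from a generator over {0,1,X,U}."""
    # S: 1 selects the R-TPG (U), 0 selects the F-TPG (binary), X is a don't care.
    s_pattern = "".join("1" if g == "U" else "X" if g == "X" else "0" for g in generator)
    # F: the binary values the F-TPG must produce; everything else is a don't care.
    f_pattern = "".join(g if g in "01" else "X" for g in generator)
    return s_pattern, f_pattern


def conceptual_mux(s_pattern: str, f_bits: str, r_bits: str) -> str:
    """Take the R-TPG bit where S is 1 and the F-TPG bit otherwise (S = X: either is fine)."""
    return "".join(r if s == "1" else f for s, f, r in zip(s_pattern, f_bits, r_bits))
```

With Gk from Figure 1 and an assumed F-TPG output of 101000, the R-TPG bits reach only p3 and p1. The resulting pattern 101001 is covered by d3.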

In most LFSR reseeding techniques, each test pattern is compressed into a separate seed. Assume that reseeding the F-TPG achieves 100X compression. Assume also that, on average, 3 test cubes are compressed into each generator, and that the proposed method needs about 10% additional data. This additional data includes the data for the S-TPG and the additionally specified bits. (Because multiple test cubes are merged into a generator, the generator can have more specified bits than any individual test cube merged into it.) The compression achieved by the proposed method is then approximately $100 \times 3 / 1.1 \approx 273$X. In other words, the proposed method improves the compression achieved by LFSR reseeding by a factor of about 3. The proposed TPG needs S-TPG data in addition to the F-TPG data, but the S-TPG data are very small, as described in the following section.

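Figure 1. Test Cube Set and Generator. (Test cube set Dk, listed as p6 p5 p4 p3 p2 p1: d1 = 1 X X 1 X 0, d2 = X 0 X 0 0 X, d3 = 1 X X 0 X 1. Generator Gk = 1 0 X U 0 U, F-pattern Fk = 1 0 X X 0 X, S-pattern Sk = 0 0 X 1 0 1. In the conceptual decompressor, the S-TPG drives the select input of a multiplexer that feeds the scan chain from either the F-TPG (select 0) or the R-TPG (select 1).)

3 Architecture of Proposed Decompressor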

Figure 2 (a) shows the architecture of the decompressor for the proposed method, which requires a very small amount of data for the S-TPG. Consider generating test patterns with the decompressor for the generator Gk shown in Figure 1. The proposed decompressor is composed of three TPGs: the F-TPG, the R-TPG, and the S-TPG. In this example, the F-TPG is implemented with an LFSR that has reseeding capability, but it can be replaced with any type of linear test pattern generator. Before each test pattern is generated, the F-TPG is loaded with a proper seed computed from Fk by solving linear equations. The R-TPG can be implemented with a simple LFSR that has no reseeding capability. Alternatively, it can share the LFSR that implements the F-TPG.

In this paper, Umax denotes the maximum number of conflicting inputs, or U's, allowed in a generator. If a large number of U's is allowed, i.e., a large Umax is used, then a large number of test cubes can be compressed into each generator. However, all faults targeted by the test cubes compressed into a generator must be detected. To guarantee this, every test cube in the set must cover at least one test pattern that the decompressor generates from the generator. An n-bit test cube ta covers another n-bit test cube tb if both of the following hold:

(i) $t^a_x = v$ or X at every position where $t^b_x = v$, where $v = 0$ or 1 and $x = 1, \ldots, n$;
(ii) $t^a_y = X$ at every position where $t^b_y = X$, where $y = 1, \ldots, n$.

If a large Umax, say 10, is used, then in general each generator must generate many more than $2^{10}$ patterns. Hence we use Umax = 3.
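The covering relation is easy to state in code. The sketch below uses hypothetical function names. It checks conditions (i) and (ii) literally, then applies the requirement that every test cube in a subset cover at least one generated pattern. The example uses the cubes of Figure 1 and the four patterns of Figure 2 (b).

```python
from typing import Iterable, List


def covers(ta: str, tb: str) -> bool:
    """True if test cube ta covers test cube tb (both over '0', '1', 'X')."""
    if len(ta) != len(tb):
        raise ValueError("test cubes must have the same length")
    for a, b in zip(ta, tb):
        if b == "X":
            if a != "X":          # condition (ii): ta must be X wherever tb is X
                return False
        elif a not in (b, "X"):   # condition (i): ta must be v or X wherever tb is v
            return False
    return True


def all_cubes_covered(cubes: List[str], patterns: Iterable[str]) -> bool:
    """Every test cube of a subset must cover at least one generated pattern."""
    patterns = list(patterns)
    return all(any(covers(c, p) for p in patterns) for c in cubes)
```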

As shown in Figure 2 (a), the S-TPG is composed of a modulo-7 counter, a 3 × 3 FIFO, a multiplexer, and a comparator. The modulo-7 counter is reset to 0 in every capture cycle, before the scan shift operations that load the next test pattern. It then increments by 1 at every shift cycle. The FIFO is loaded with the locations of the conflicting scan inputs of Gk, i.e., the scan inputs that are assigned U's in Gk. In Figure 1, scan inputs p1 and p3 are assigned U's in Gk, so the FIFO is loaded with 1 and 3. The comparator output s is set to 1 when the content of the modulo-7 counter equals the first (topmost) entry of the FIFO, and to 0 in all other cycles. Hence s is 1 in the cycles when the counter content is 1 or 3, and 0 in all other cycles. Whenever s is set to 1, all the entries in the FIFO are rotated by one entry.
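The S-TPG of Figure 2 (a) reduces to a counter, a rotating FIFO, and a comparator. The sketch below uses hypothetical names. It assumes that counter value j corresponds to scan input pj, which matches the FIFO contents 1 and 3 for p1 and p3. It stores only the valid FIFO entries, so the FIFO returns to its initial order after each pattern. Feeding all four combinations of R-TPG bits at p3 and p1 yields the four patterns of Figure 2 (b).

```python
from collections import deque
from typing import List


def s_tpg_select(fifo_entries: List[int], num_inputs: int) -> List[int]:
    """Comparator output s for shift cycles 1..num_inputs of one pattern.

    The counter is 0 after capture, so shift cycle j sees counter value j.
    When the counter equals the FIFO head, s = 1 and the FIFO rotates by one entry.
    """
    fifo = deque(fifo_entries)
    s_values = []
    for counter in range(1, num_inputs + 1):
        s = 1 if fifo and counter == fifo[0] else 0
        if s:
            fifo.rotate(-1)  # head entry moves to the tail
        s_values.append(s)
    return s_values


def decompress(generator: str, f_bits: str, r_bits: str) -> str:
    """Assemble one pattern; all strings are written p_n ... p_1 as in Figure 1."""
    n = len(generator)
    # p_j is character n - j; the FIFO holds the indices j of the U's in scan order.
    conflicts = [j for j in range(1, n + 1) if generator[n - j] == "U"]
    s = s_tpg_select(conflicts, n)
    out = [""] * n
    for j in range(1, n + 1):
        out[n - j] = r_bits[n - j] if s[j - 1] else f_bits[n - j]
    return "".join(out)
```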

Data for the S-TPG must also be stored in the ATE, so the proposed TPG needs additional storage beyond the data for regular LFSR reseeding, i.e., the F-TPG data. However, only a very small number of U's are allowed, so the FIFO needs only a very small number of storage elements. The S-TPG storage per generator is the number of bits required to store the locations of the conflicting scan inputs of that generator. It is given by $U_{max} \times \lceil \log_2 S_L \rceil$, where $S_L$ is the number of scan flip-flops in the scan chain, or scan depth.

Consider a scan design with 130,000 scan flip-flops. (This example only gives a rough idea of the data volume for the S-TPG. In reality, such a long scan chain would be split into many shorter scan chains.) Assume that a regular LFSR reseeding technique compressing this test set needs an LFSR with 650 stages, or 0.5% of the total number of scan flip-flops. This achieves 200X compression. If Umax = 3, then the total S-TPG data overhead for each generator is $3 \times \lceil \log_2 130000 \rceil = 51$ bits. This is less than 10% of the test data volume for LFSR reseeding (51/650 ≈ 7.8%).

Figure 2. Decompressor Architecture (a) Decompressor (b) Test Patterns Generated by Decompressor from Gk. (Panel (a) shows four parts. The S-TPG has a modulo-7 counter, a FIFO loaded with 1 and 3, and a comparator producing s. The F-TPG is a reseedable 4-stage LFSR (f4 f3 f2 f1). There is also an R-TPG, and a multiplexer feeds scan inputs p1 to p6. Panel (b) lists the patterns generated for Gk = 1 0 X U 0 U: t1 = 1 0 1 0 0 0, t2 = 1 0 1 1 0 0, t3 = 1 0 1 0 0 1, and t4 = 1 0 1 1 0 1.)

The FIFO for the S-TPG also needs 51 storage elements (bits): depth 3 and width 17. The depth of the FIFO does not depend on the size of the design the decompressor is built for. The width of the S-FIFO, which stores the locations of scan flip-flops, grows only logarithmically with the number of scan flip-flops in the design. Hence the storage elements, and the hardware overhead, of the S-TPG will not increase significantly even for large designs. The S-TPG data overhead is the ratio of the S-TPG storage elements to those for regular LFSR reseeding, and it will be even lower for larger designs.

Besides the FIFO in the S-TPG and the F-TPG, the proposed decompressor needs only two more components. These are a $\lceil \log_2 S_L \rceil$-stage modulo counter and a $\lceil \log_2 S_L \rceil$-bit comparator, i.e., a 17-stage counter and a 17-bit comparator when $S_L = 130000$. Their combined area overhead is negligible for a design with 130,000 flip-flops. The R-TPG can be shared with the F-TPG, so it is not counted as additional hardware.

4 Computing Generators

If an LFSR implements the F-TPG, the number of bits stored in ATE memory for F-TPG seeds is roughly the number of generators × the number of bits per seed. The S-TPG data are small, as described in the previous section, so F-TPG seeds dominate the overall test data volume of the proposed method. Hence, to minimize the overall test data volume, both the number of specified bits in the generators and the total number of generators should be minimized. An ATPG tool often over-specifies test cubes. Even if some of these specified bits are relaxed to X's (don't cares), all the faults detected by the original test cube can typically still be detected. In this paper, we maximize the number of X's in each generator by relaxing such over-specified bits in the test cubes. We then merge as many test cubes as possible into each generator to minimize the total number of generators, using the algorithm described in the following section.

4.1 Algorithms to Compute Generators

Let D be the set of test cubes to be compressed by the proposed method. The test cubes in D are grouped into smaller test cube subsets D1, D2, ..., and a generator Gk, where k = 1, 2, ..., is computed from each test cube subset Dk. Each subset Dk is built by moving test cubes from D into Dk one at a time. Construction stops when adding any more test cubes would push the generator Gk past one of two predefined limits. These are Smax for the number of care bits (0, 1, U) and Umax for the number of conflicting inputs.

Example 1: Figure 3 illustrates computing generators from a set D of 12 test cubes. Assume that Smax = 6 and Umax = 2. Assume also that the F-TPG is implemented with an LFSR that has Smax + M stages, where M is a small natural number.

First, we run fault simulation with all the test cubes in D and identify the set of faults Ej detected by each test cube dj, where j = 1, 2, ..., 12. Ej is called the target fault list of dj, and the faults in Ej are called the target faults of dj. The column |Ej| shows the number of faults in each target fault list.

We then construct the test cube subsets, starting from D1, by moving test cubes from D one at a time. First, an empty set D1 is created and generator G1 is initialized to ⟨X, X, X, ..., X⟩. The test cube with the most faults in its target fault list is moved first. Since d1 has the most faults in its target fault list, d1 is moved into D1. After d1 is added, $G^1 = \langle G^1_1, G^1_2, \ldots, G^1_9 \rangle$ is updated to ⟨0, 0, X, 1, 0, 1, X, 1, X⟩.

Next, the test cube in D that causes the fewest conflicting inputs in Gk is selected. Adding d5 causes only 1 conflicting input and keeps the care bits of Gk at 6, which does not exceed Smax. So d5 is selected as the next test cube. We try to relax over-specified bits in d5; assume that none can be relaxed to X's. After d5 is added into D1, Gk is updated to ⟨0, 0, X, U, 0, 1, X, 1, X⟩.

At this point, adding d10 or one other test cube would cause only 1 additional conflicting input in Gk without pushing its care bits above Smax. Assume that d10 is selected as the next test cube and that d10 has no over-specified bits. Then d10 is added into D1 as it is, and Gk is updated to ⟨0, 0, X, U, 0, 1, X, U, X⟩.

The numbers of care bits and conflicting inputs that a test cube would add are computed before its over-specified bits are relaxed. As a result, some test cubes that appear to exceed Smax or Umax can fit after their over-specified bits are relaxed to X's. Hence we introduce margins Ms and Mu to compensate for this inaccuracy. Suppose no test cube in D can be added into D1 without exceeding Smax or Umax before relaxation. We then select a test cube in D that keeps the care bits of Gk within Smax + Ms and the conflicting inputs within Umax + Mu, and we relax the over-specified bits in that test cube. Assume that margins Ms and Mu are both set to 1. If the selected test cube still pushes Gk past Smax or Umax after its over-specified bits are relaxed to X's, it is returned to D.

Now no test cube in D can be added without exceeding Umax or Smax. However, before relaxation, adding d4 makes the number of specified bits 7, which does not exceed Smax + Ms. It also makes the number of conflicting inputs 2. Hence d4 is selected as the next candidate. Assume that the 1 assigned to p3 in d4 is relaxed to X. Then d4 can be added into D1, and adding it does not change generator Gk.

Next, adding d9 into D1 would make the number of conflicting inputs in Gk 3 (Umax + Mu = 3) and the number of care bits 7 (Smax + Ms = 7). Hence d9 is selected as the next candidate. Assume that no specified bits in d9 can be relaxed. Then d9 cannot be added into D1 and is returned to D. No other test cube in D can now be added into D1 while staying within Smax + Ms care bits and Umax + Mu conflicting inputs. This completes the construction of D1.

We obtain F-pattern F1 = ⟨0, 0, X, X, 0, 1, X, X, X⟩ from generator G1 = ⟨0, 0, X, U, 0, 1, X, U, X⟩. Next, a seed for F1 is computed with a linear solver. We load the F-TPG with the computed seed and load the S-TPG FIFO with the locations of the conflicting scan inputs of G1, i.e., 2 and 6. The proposed decompressor then generates $2^{U_{max}}$ test patterns. Every deterministic test cube in D1 = {d1, d5, d10, d4} must cover at least one of these $2^2 = 4$ test patterns. If any cube covers none of them, the decompressor keeps generating test patterns until all 4 test cubes cover at least one.

We run fault simulation with the generated test patterns, which are fully specified. All detected faults are then dropped from the target fault lists of the test cubes remaining in D, so |Ej| has been reduced for some test cubes dj. This process is repeated until all test cubes have been removed from D. In this example, the 12 test cubes in D are compressed into 4 generators. □
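The pattern generation loop at the end of Example 1 can be sketched as follows. This hypothetical model replaces the F-TPG with its already expanded output string and the R-TPG with Python's `random` module. It also imposes an upper bound on the number of patterns so that the loop always terminates. The test cubes are D1 from Figure 3 (b), with d4 relaxed at p3 as assumed in the text.

```python
import random
from typing import List


def covers(cube: str, pattern: str) -> bool:
    """A test cube covers a fully specified pattern if every care bit matches."""
    return all(c in ("X", p) for c, p in zip(cube, pattern))


def generate_for_subset(generator: str, f_bits: str, cubes: List[str], u_max: int,
                        rng: random.Random, max_patterns: int = 1000) -> List[str]:
    """Generate at least 2**u_max patterns, then continue until every cube covers one."""
    patterns: List[str] = []
    while len(patterns) < max_patterns:
        # Random bits stand in for the free-running R-TPG at the U positions;
        # every other position takes the F-TPG output in f_bits.
        pattern = "".join(str(rng.getrandbits(1)) if g == "U" else f
                          for g, f in zip(generator, f_bits))
        patterns.append(pattern)
        if len(patterns) >= 2 ** u_max and all(
                any(covers(c, p) for p in patterns) for c in cubes):
            return patterns
    raise RuntimeError("test cube subset not covered within max_patterns")
```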

4.2 Overall Algorithm

The procedure to compute generators from a set of test cubes is summarized as follows:

1. Define Smax (the maximum number of care bits allowed in any generator) and Umax (the maximum number of conflicting inputs allowed in any generator). Set k = 1.
2. Unmark all test cubes in D. Set Gk = ⟨X, X, ..., X⟩ and Dk = {}. From D, select the test cube with the largest number of faults in its target fault list. Relax its over-specified bits, move it to Dk, and update Gk accordingly.
3. If D is empty, go to Step 5. Otherwise, check whether any test cube in D can be added into the current subset Dk while keeping Gk within Umax U's and Smax care bits. If so, select test cube da from those that cause the fewest new U's in Gk. Relax the over-specified bits in da, add da into Dk, update Gk accordingly, and repeat Step 3. If no test cube qualifies, go to Step 4.
4. Consider the unmarked test cubes in D that, when added into Dk, keep Gk within Umax + Mu U's and Smax + Ms care bits. If there are none, go to Step 5. Otherwise, select one of them, db, at random and relax its over-specified bits. If the relaxed db can now be added to Dk while keeping Gk within Umax U's and Smax care bits, add db into Dk and update Gk accordingly. Otherwise, mark db and put it back into D. Repeat Step 4.
5. Compute a seed for Fk and generate test patterns by simulating the decompressor using Gk. Fault simulate these test patterns and drop the detected faults from the target fault list of every test cube in D. Set k = k + 1. If D is empty, exit; otherwise, go to Step 2.

Figure 3. Constructing Test Cube Subsets (a) Original Test Cube Set D (b) Partitioned Test Cube Subsets. Values are listed in the order p9 p8 p7 p6 p5 p4 p3 p2 p1.
(a) d1: 0 0 X 1 0 1 X 1 X; d2: 1 0 1 1 X 0 X 1 1; d3: 1 X 0 1 X 0 X X X; d4: X 0 X 0 0 X 1 X X; d5: X X X 0 X X X 1 X; d6: X 1 0 X X 1 X 0 1; d7: 1 X X 0 X 0 0 X X; d8: X 1 X 0 0 0 1 X 0; d9: 1 X 0 1 X 1 X X X; d10: X 0 X 0 0 1 X 0 X; d11: X X 1 1 X X 1 X 0; d12: 0 0 X 1 X X 0 X 1.
(b) D1 = {d1, d10, d5, d4 (relaxed: X 0 X 0 0 X X X X)}, G1: 0 0 X U 0 1 X U X.
D2 = {d2 (relaxed: 1 X 1 1 X 0 X 1 X), d3, d9, d7 (relaxed: 1 X X X X 0 X X X)}, G2: 1 X U 1 X U X 1 X.
D3 = {d6}, G3: X 1 0 X X 1 X 0 1.
D4 = {d8 (relaxed: X X X 0 0 0 X X 0), d12 (relaxed: 0 X X 1 X X 0 X X), d11 (relaxed: X X X 1 X X 1 X 0)}, G4: 0 X X U 0 0 U X 0.
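Steps 2 to 4 of the procedure can be prototyped without fault simulation. The sketch below makes three simplifications. First, relaxation is supplied as a callback, a hypothetical stand-in for the fault-simulation-based relaxation of over-specified bits. Second, Step 4 takes the first eligible test cube instead of a random one. Third, Step 5 is omitted.

The |Ej| values in the usage below are placeholders, except that d1 is given the largest count, as in Example 1. With d4 relaxed at p3 as assumed in the text, the sketch reproduces D1 = {d1, d5, d10, d4} and G1.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str, int]  # (name, test cube written p_n ... p_1, |Ej|)


def merge(gen: str, cube: str) -> str:
    """Add one test cube to a generator over {0, 1, X, U}."""
    out = []
    for g, c in zip(gen, cube):
        if c == "X" or g == c or g == "U":
            out.append(g)
        elif g == "X":
            out.append(c)
        else:
            out.append("U")  # 0 meets 1: a new conflicting input
    return "".join(out)


def counts(gen: str) -> Tuple[int, int]:
    """(care bits including U's, conflicting inputs)."""
    return sum(g != "X" for g in gen), gen.count("U")


def build_subset(pool: List[Item], s_max: int, u_max: int, m_s: int = 1, m_u: int = 1,
                 relax: Callable[[str], str] = lambda cube: cube
                 ) -> Tuple[str, List[Tuple[str, str]]]:
    """Steps 2 to 4 for one subset Dk; cubes moved into Dk are removed from pool."""
    first = max(pool, key=lambda item: item[2])  # Step 2: most target faults
    pool.remove(first)
    gen = relax(first[1])
    subset = [(first[0], gen)]
    while True:  # Step 3: fewest new U's, limits checked before relaxation
        fits = []
        for item in pool:
            care, u = counts(merge(gen, item[1]))
            if care <= s_max and u <= u_max:
                fits.append((u - gen.count("U"), item))
        if not fits:
            break
        best = min(fits, key=lambda pair: pair[0])[1]
        pool.remove(best)
        cube = relax(best[1])
        subset.append((best[0], cube))
        gen = merge(gen, cube)
    marked = set()
    while True:  # Step 4: candidates within the margins
        candidates = []
        for item in pool:
            care, u = counts(merge(gen, item[1]))
            if item[0] not in marked and care <= s_max + m_s and u <= u_max + m_u:
                candidates.append(item)
        if not candidates:
            break
        item = candidates[0]  # the paper picks randomly; first keeps the sketch deterministic
        cube = relax(item[1])
        care, u = counts(merge(gen, cube))
        if care <= s_max and u <= u_max:
            pool.remove(item)
            subset.append((item[0], cube))
            gen = merge(gen, cube)
        else:
            marked.add(item[0])  # returned to D and marked
    return gen, subset
```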

5 Variations of the S-TPG

5.1 Scheme to Generate More Variable Patterns

As with other test data compression techniques, the compression ratio that the proposed technique can achieve depends on the number of specified bits in the test cubes. Consider the decompressor shown in Figure 2 and the test patterns it generates. Only two inputs, p1 and p3, are assigned U's in Gk. Hence, as Figure 2 (b) shows, all 4 test patterns generated by the decompressor differ only at these 2 inputs. Because the generated patterns are so similar, each pattern after the first, t1, detects only a few new faults. As a result, only a small number of specified bits in the test cubes in D can be relaxed to X's, which increases the number of generators.

The decompressor shown in Figure 4 (a) can generate test patterns with more variation from the same generator. Consider generating test patterns for Gk = ⟨X, 0, X, X, X, U, X, 1, X, X, U, 1, X, 0, 1⟩, listed from p15 to p1, with this decompressor. Even though Gk has only two U's, the S-TPG FIFO has 4 entries. In addition, a toggle flip-flop is inserted between the comparator output and the select signal of the multiplexer. Because of the toggle flip-flop, signal s changes its state only when the counter value equals the output of the FIFO. Consequently, as shown in Figure 4 (a), two sets of consecutive inputs can be assigned different values in each test pattern: p5, p6, and p7, and p10, p11, p12, and p13. Test patterns with more variation detect more new faults, so this can reduce the number of generators.

Figure 4. Variations of Proposed Decompressor. (Panel (a) shows an S-TPG with a modulo-16 counter, a FIFO holding 5, 8, 10, and 14, a comparator, and a toggle flip-flop T. It generates patterns t1 to t4 for Gk = X 0 X X X U X 1 X X U 1 X 0 1. Panel (b) shows an R-TPG implemented with an R-FIFO and a shift register. It generates patterns t1 to t3 for G4 = 0 X X U 0 0 U X 0 of Figure 3.)
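A small simulation can check the effect of the toggle flip-flop. The sketch below is a hypothetical model. It assumes the flip-flop is cleared before each pattern and that its new value applies in the same shift cycle as the comparator hit. The FIFO contents are taken to be 5, 8, 10, and 14, based on our reading of Figure 4 (a). Under these assumptions, the model selects the R-TPG for exactly p5 to p7 and p10 to p13, and all of those inputs are U or X in Gk.

```python
from collections import deque
from typing import List


def toggle_select(fifo_entries: List[int], num_inputs: int) -> List[int]:
    """Multiplexer select for scan inputs p_1..p_n with a toggle flip-flop after the comparator."""
    fifo = deque(fifo_entries)
    state = 0  # toggle flip-flop, assumed cleared before each pattern
    select = []
    for counter in range(1, num_inputs + 1):
        if fifo and counter == fifo[0]:
            state ^= 1       # comparator hit: switch between F-TPG (0) and R-TPG (1)
            fifo.rotate(-1)  # move on to the next FIFO entry
        select.append(state)
    return select
```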

5.2 Scheme to Reduce Test Sequence Length

In Figure 1, inputs p3 and p1 are assigned U's. Test cubes d1, d2, and d3 assign them 10, 0X, and 01, respectively. To guarantee detection of all faults detected by d1, d2, and d3, the decompressor must keep generating test patterns from the same generator Gk. It must continue until it has generated 3 test patterns that assign 10, 00 or 01, and 01 to p3 and p1. The R-TPG is free running, i.e., it is not loaded with seeds to change its pattern sequence. As a result, it may take a long time to generate a pattern that assigns the desired values to the conflicting inputs of the generator. This does not increase test data volume, but it can increase test application time, which is also an important factor in test cost. Another variant of the proposed decompressor, shown in Figure 4 (b), reduces the number of test patterns generated from each generator. In this variant, the R-TPG is implemented with a FIFO (called the R-FIFO) and a shift register.

Consider generating test patterns from generator G4 shown in Figure 3 (b), which has 2 conflicting inputs (U's), at p6 and p3. With 2 U's, the basic scheme shown in Figure 2 would generate $2^2 = 4$ test patterns. In contrast, this variant of the decompressor generates only Np test patterns for each generator Gk, where Np is the number of test cubes merged into the corresponding test cube subset Dk. For example, D4 in Figure 3 (b) contains 3 test cubes, d8, d11, and d12, so the decompressor generates only 3 test patterns from G4 rather than 4. However, the number of test patterns to generate from each generator must also be stored in the tester memory. This is in addition to the data for the FIFOs (in the R-TPG and the S-TPG) and for the F-TPG. This extra data is small compared to the volumes for the S-TPG and the F-TPG.

The R-FIFO is loaded with 00, 11, and 10. These are covered, respectively, by 0X, 11, and 10, the values that d8, d11, and d12 assign to p6 and p3. In each capture cycle, the first entry of the R-FIFO is loaded into the shift register, and the other entries in the R-FIFO are shifted up by one entry. Then, in every scan shift cycle in which signal s is set to 1, the last bit in the shift register is shifted into the scan chain. Signal s is set to 1 when the counter value equals the output of the S-TPG FIFO. Thus, if Np test cubes are compressed into a test cube subset Dk, the decompressor needs to generate only Np test patterns from Gk.
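The R-FIFO variant can be sketched in the same way. In this hypothetical model, each R-FIFO entry is written in the (p6, p3) order used in the text, and one entry is moved into the shift register per pattern. The last bit of the shift register is consumed whenever s = 1. For brevity, s is taken directly from the U positions of the generator rather than from a simulated S-TPG FIFO.

```python
from collections import deque
from typing import List


def r_fifo_patterns(generator: str, f_bits: str, r_fifo: List[str]) -> List[str]:
    """One pattern per R-FIFO entry; strings are written p_n ... p_1 (Figure 3 order)."""
    n = len(generator)
    fifo = deque(r_fifo)
    patterns = []
    while fifo:
        shift_register = list(fifo.popleft())  # loaded in the capture cycle
        out = list(f_bits)                     # F-TPG values by default
        for j in range(1, n + 1):              # counter value j addresses scan input p_j
            if generator[n - j] == "U":        # s = 1 at a conflicting input
                out[n - j] = shift_register.pop()  # last bit enters the scan chain
        patterns.append("".join(out))
    return patterns
```

For G4 and the R-FIFO contents 00, 11, and 10, the three patterns are covered by d8, d11, and d12, respectively.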

6 Extension to Multiple Scan Chains

Figure 5 depicts an implementation of the proposed decompressor for a circuit with multiple (512) scan chains. For convenience of illustration, and without loss of generality, assume that each of the 512 scan chains has 256 scan flip-flops, for a total of 131,072 scan flip-flops. The scan chains are organized into 64 groups, group1, group2, ..., group64. In this example every group contains 8 scan chains, but groups need not all be the same size. As in the single scan chain decompressor (see Figure 2), a multiplexer mh is inserted before the input of each scan chain chainh, where h = 1, 2, ..., 512. Each multiplexer selects between the F-TPG output (fh) and the R-TPG output (rh) as the scan pattern source.

The select inputs of all 8 multiplexers in each group groupa are driven by a common 2-input AND gate, whose inputs are ga and s. If this AND gate outputs 1 (0) in the ith shift cycle, then the ith flip-flops of all 8 scan chains in groupa are loaded with values generated by the R-TPG (F-TPG). In the single scan chain version, the S-TPG FIFO stores only the locations of the scan flip-flops. In the multiple scan chain version, each FIFO entry has two sections: a group identification number (group ID) and the location of the scan flip-flop in the scan chain. For example, the first (topmost) entry of the FIFO in Figure 5 has group ID 1 and scan flip-flop location 13. The group ID section of the first entry drives the decoder, which generates signals ga, where a = 1, 2, ..., 64. Each ga controls the AND gate of the corresponding group groupa. The location section of the first entry is input to the comparator. The comparator output s is set to 1 when this location equals the content of the counter; otherwise, s is set to 0.
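The group-based selection can be traced in software. The following sketch uses hypothetical names and numbers the chains consecutively, so that group a holds chains 8(a-1)+1 to 8a, as drawn in Figure 5. For each shift cycle in which s = 1, it returns the scan chains whose multiplexers select the R-TPG.

```python
from collections import deque
from typing import Dict, List, Tuple


def r_tpg_chains_per_cycle(fifo_entries: List[Tuple[int, int]], chain_length: int,
                           chains_per_group: int) -> Dict[int, List[int]]:
    """Map shift cycle -> scan chains (1-based) loaded from the R-TPG in that cycle."""
    fifo = deque(fifo_entries)  # each entry: (group ID, scan flip-flop location)
    selected: Dict[int, List[int]] = {}
    for counter in range(1, chain_length + 1):
        # The decoder raises g_group for the head entry; its AND gate q_group outputs 1 only when s = 1.
        if fifo and counter == fifo[0][1]:
            group = fifo[0][0]
            first = (group - 1) * chains_per_group + 1
            selected[counter] = list(range(first, first + chains_per_group))
            fifo.rotate(-1)  # next entry moves to the head
    return selected
```

With the two valid entries of Figure 5, ⟨1, 13⟩ and ⟨31, 224⟩, group1 is driven by the R-TPG in cycle 13 and group31 in cycle 224.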

The main purpose of organizing scan chains into groups is to reduce the hardware overhead of the decoder. If Nh scan chains are placed in each group, the number of decoder outputs is reduced by a factor of Nh. Grouping can also reduce the overall test data volume stored in the ATE memory and the number of storage elements required for the S-FIFO. Suppose the decompressor in Figure 5 used 4 chains per group instead of 8. The total number of groups would then increase from 64 to 128, and the 6x64 decoder would have to be replaced by a larger 7x128 decoder. We would also need 64 more 2-input AND gates, plus extra routing to connect them to the decoder outputs. In addition, the group ID sections of the S-TPG FIFO would need one more bit.


Figure 5. Decompressor for 512 Scan Chains

As an extreme case, if the scan chains are not grouped, i.e., each group has only one scan chain, then the decoder needs 512 outputs and the group ID section of the S-TPG FIFO needs 9 bits (2^9 = 512). If a design has a large number of scan chains, the hardware overhead of the decoder may therefore be significant.
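The bookkeeping behind this group-size trade-off can be tabulated with a short Python sketch. It assumes group IDs and flip-flop locations are stored as plain binary numbers (group_1 to group_64 encoded as 0 to 63, and shift positions encoded in as many bits as the scan depth requires); these encodings and the function name are illustrative assumptions, not details given in the text.

```python
def grouping_cost(num_chains, group_size, scan_depth):
    """Decoder / AND-gate / FIFO-width cost of one grouping choice (illustrative)."""
    groups = -(-num_chains // group_size)        # ceiling division
    id_bits = (groups - 1).bit_length()          # binary group ID, 0 .. groups-1
    loc_bits = (scan_depth - 1).bit_length()     # binary shift position, 0 .. depth-1
    return {
        "groups": groups,
        "decoder_outputs": groups,               # one g_a per group
        "and_gates": groups,                     # one 2-input AND gate (q_a) per group
        "group_id_bits": id_bits,
        "fifo_entry_bits": id_bits + loc_bits,
    }


for size in (8, 4, 1):
    print(size, grouping_cost(512, size, 256))
```

With 8 chains per group this gives 64 groups and a 14-bit FIFO entry (6-bit group ID plus 8-bit location); halving the group size to 4 doubles the decoder outputs and AND gates and adds one group ID bit, and with no grouping the group ID grows to 9 bits.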

The S-TPG FIFO shown in Figure 5 is loaded to generate test patterns for a generator G_k that has 3 conflicting inputs (U's): the 13th scan inputs of two scan chains in group_1 and the 224th scan input of one scan chain in group_31. Since all 8 multiplexers in a group group_a are controlled by the same signal q_a, if the ith scan input of scan chain chain_h is assigned a U in G_k, then the ith scan inputs of the other 7 scan chains in the same group can be assigned only U or X, but neither 1 nor 0, because all 8 of these ith scan inputs receive values generated by the R-TPG. Accordingly, the 224th scan inputs of all scan chains in group_31 other than the one assigned U are assigned X's (not all of these chains are shown in Figure 5). Likewise, the 13th scan inputs of all scan chains in group_1 other than the two assigned U's are assigned X's in G_k. The S-TPG FIFO is loaded with two valid entries, (1, 13) and (31, 224); the last entry is not valid. Since the group ID field of the first entry is 1, signal g_1 is initially set to 1 and all the other decoder outputs are set to 0. In the 13th scan shift cycle, i.e., when the content of the counter in the S-TPG becomes 13, the comparator output s is set to 1 and the output of the AND gate, q_1, transitions to 1. Therefore, in the 13th shift cycle, all scan chains chain_h in group_1, where h = 1, 2, ..., 8, are loaded with values generated by the R-TPG, while all other scan chains in the design are loaded with values generated by the F-TPG. The FIFO entries are then rotated up by one entry, and (31, 224) becomes the first entry. In the 224th scan shift cycle, the scan chains in group_31 are loaded with values generated by the R-TPG, and the FIFO entries are again rotated up by one entry. When the scan test pattern is fully loaded into the scan chains (at the 256th scan shift cycle), the counter is reset to 0. This rotates the FIFO entries up once more, and the FIFO returns to its initial state. This process is repeated for every test pattern generated from the same generator G_k (the number of these patterns depends on U_max).

Figure 6. Updating Generator Values
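The FIFO walk-through for Figure 5 above can be replayed with the following Python sketch. It assumes, as the description of the counter reset implies, that the invalid last entry has location 0, so that it is consumed (and the FIFO rotated back to its initial state) when the counter wraps to 0 at the 256th shift cycle. The SENTINEL marker for that entry and all names are hypothetical.

```python
from collections import deque

SENTINEL = None  # hypothetical marker for the invalid (padding) FIFO entry


def r_tpg_schedule(entries, scan_depth=256):
    """Simulate loading one scan pattern.

    entries: list of (group_id, location) pairs; the last one is assumed to be
    the padding entry (SENTINEL, 0).
    Returns ({shift_cycle: group_id}, fifo_state_after_the_pattern).
    """
    fifo = deque(entries)
    schedule = {}
    for cycle in range(1, scan_depth + 1):
        counter = cycle % scan_depth        # counter wraps to 0 in the last shift cycle
        group_id, location = fifo[0]
        if location == counter:             # comparator output s = 1
            if group_id is not SENTINEL:
                schedule[cycle] = group_id  # q_a = 1: this group loads R-TPG values
            fifo.rotate(-1)                 # rotate the FIFO up by one entry
    return schedule, list(fifo)


entries = [(1, 13), (31, 224), (SENTINEL, 0)]
schedule, final = r_tpg_schedule(entries)
print(schedule)           # {13: 1, 224: 31}
print(final == entries)   # True: the FIFO is back in its initial state
```

Because the FIFO ends each pattern in its initial state, the same entries can be reused for every pattern generated from G_k without reloading them from the ATE.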

As described above, in any scan shift cycle the ith scan inputs of all 8 scan chains in group_a receive their scan values from the same TPG, either the R-TPG or the F-TPG. Hence, during the computation of generators (see Section 4), if adding a test cube into the current test cube subset causes a conflict at the ith input of a scan chain chain_h that belongs to group_a, i.e., changes the ith input of chain_h in the current generator from v to U, where v = 0 or 1, then every ith input of the other scan chains in group_a that is currently assigned a binary value (0 or 1) in the current generator must also be changed to U. (If the ith scan input of a scan chain in the same group is currently assigned an X in the current generator, its value need not be changed.)

This is illustrated in Figure 6, which shows a new test cube d_j being added into the current test cube subset. Before d_j is added, the 13th scan input of one scan chain chain_b in the group, denoted p_{b,13}, is assigned a 1 in the generator G_k, and the 13th scan input of another scan chain chain_c in the same group, p_{c,13}, is assigned the binary value 0 in G_k. Test cube d_j, which assigns a 0 to p_{b,13}, is now added into the current test cube subset. This causes a conflict at p_{b,13}, so the 1 at p_{b,13} in G_k is changed to U (G_k after d_j is added is denoted G'_k in Figure 6). Even though p_{c,13} is assigned an X in d_j (an X does not conflict with any value), the 0 at p_{c,13} is also changed to U in G'_k because of the U at p_{b,13} in G'_k. On the other hand, a third input of the group, p_{e,13}, is assigned X in G_k before d_j is added and is also assigned X in d_j, so it keeps its previous value X in G'_k.
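The generator-update rule of the two preceding paragraphs can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the generator-computation algorithm of Section 4: a generator and a test cube are represented (by assumption) as lists of per-chain strings over '0', '1', and 'X' (generators may also hold 'U'), consecutive chains form a group, positions are 0-based, and the function names are hypothetical. The example is a scaled-down version of the situation in Figure 6, using one group of four chains with three scan inputs each.

```python
def merge_value(g, d):
    """Merge generator value g with test-cube value d at one scan input."""
    if g == "U" or d == "X":
        return g                      # U stays U; an X in the cube never conflicts
    if g == "X":
        return d                      # an X in the generator takes the cube's care bit
    return g if g == d else "U"       # 0 versus 1 is a conflict


def add_cube(generator, cube, group_size):
    """Add a test cube to a generator, then apply the group rule: a U at
    position i of any chain in a group turns every binary value at position i
    of that group into U (X values are left unchanged)."""
    new = [[merge_value(g, d) for g, d in zip(g_chain, d_chain)]
           for g_chain, d_chain in zip(generator, cube)]
    depth = len(new[0])
    for start in range(0, len(new), group_size):
        group = new[start:start + group_size]   # references to the same inner lists
        for i in range(depth):
            if any(chain[i] == "U" for chain in group):
                for chain in group:
                    if chain[i] in ("0", "1"):
                        chain[i] = "U"
    return ["".join(chain) for chain in new]


# One group of four chains, three scan inputs each (hypothetical values).
G_k = ["1XX", "0XX", "XXX", "X1X"]
d_j = ["0XX", "XXX", "XXX", "XX0"]
print(add_cube(G_k, d_j, group_size=4))   # ['UXX', 'UXX', 'XXX', 'X10']
```

The conflict on the first chain turns the 0 on the second chain into U, exactly as the 0 at p_{c,13} becomes U in G'_k, while the X on the third chain is left alone. With group_size=1 (no grouping) the second chain would keep its 0.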

For the reason described in the previous paragraph, if each group contains a large number of scan chains, the number of conflicting inputs will quickly reach U_max. In other words, if the number of scan chains in each group is too large, fewer test cubes can be added into each test cube subset, which in turn increases the total number of generators and decreases compression. Conversely, if the number of scan chains in each group is too small, hardware overhead and test data volume increase. The optimal number of scan chains per group should be determined by considering the number of specified bits in the test cubes; if test cubes are sparsely specified, larger groups are preferable. According to our extensive experiments (see Section 7), a group size of 8 is reasonable for most large designs (it did not increase the number of generators). The group size can be maximized without increasing the number of generators by carefully organizing scan chains into groups. Assume that the scan inputs of scan chain chain_a do not drive the part of the circuit that is driven by the scan inputs of chain_b. Then, when the scan inputs of chain_a are specified, the scan inputs of chain_b will typically not be specified. In other words, when an input of chain_a, p_{a,z}, is changed to a U, input p_{b,z} of chain_b will likely stay at X. Placing such scan chains, which have little structural correlation, into the same group allows the group size to be increased without increasing the total number of generators.
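The paper does not give an algorithm for forming such groups. As one hypothetical way to act on the observation above, the Python sketch below counts, over a set of test cubes, how often two chains carry care bits in the same cube (using co-specification as a proxy for structural correlation is our assumption), and then greedily fills each group with the chains least often co-specified with its current members. The heuristic, the function names, and the cube representation are illustrative only; chains are numbered from 0.

```python
from itertools import combinations


def co_specification(cubes, num_chains):
    """count[a][b] = number of test cubes in which chains a and b both carry care bits."""
    count = [[0] * num_chains for _ in range(num_chains)]
    for cube in cubes:                                   # cube: list of per-chain strings
        specified = [h for h in range(num_chains)
                     if any(v in "01" for v in cube[h])]
        for a, b in combinations(specified, 2):
            count[a][b] += 1
            count[b][a] += 1
    return count


def greedy_groups(count, group_size):
    """Hypothetical heuristic: fill each group with the chains least often
    co-specified with the chains already in it."""
    unassigned = list(range(len(count)))
    groups = []
    while unassigned:
        group = [unassigned.pop(0)]
        while len(group) < group_size and unassigned:
            best = min(unassigned, key=lambda c: sum(count[c][m] for m in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups


# Four chains, two scan inputs each; chains 0/1 and chains 2/3 tend to be co-specified.
cubes = [["1X", "0X", "XX", "XX"],
         ["XX", "XX", "X1", "0X"],
         ["X0", "1X", "XX", "XX"]]
print(greedy_groups(co_specification(cubes, 4), group_size=2))   # [[0, 2], [1, 3]]
```

The strongly correlated pairs (0, 1) and (2, 3) end up in different groups, so a U on one chain of a group is less likely to force binary values on its group mates to U.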

Let us consider the hardware overhead of the decompressor shown in Figure 5. In addition to the F-TPG, which a regular LFSR reseeding technique also requires, the proposed decompressor needs the S-TPG, 64 2-input AND gates, and 64 2-to-1 multiplexers. The S-TPG in turn comprises a 3x14 FIFO (3 entries, each with a 6-bit group ID and an 8-bit scan flip-flop location), a 6-to-64 decoder, an 8-stage counter, and an 8-bit comparator. Since the R-TPG is a free-running pseudo-random test pattern generator, it can be shared with the F-TPG. The gate equivalent (the number of 2-input NAND gates) of the 6-to-64 decoder, synthesized with Synopsys Design Compiler, is 385, and that of the 8-bit comparator is 116. Assuming a gate equivalent of 6 per storage element, the 3x14 FIFO has a gate equivalent of 252 (3 × 14 × 6). The 8-stage counter can be implemented with 48 NAND gates. Since a 2-to-1 multiplexer can be implemented with 4 NAND gates, the gate equivalent of the 64 2-to-1 multiplexers is 256. The total gate equivalent of all the components mentioned above is about 1100. Considering that the design has more than 130,000 flip-flops, an overhead of 1100 2-input NAND gates is almost negligible. As in the single scan chain version, the width of the S-TPG FIFO (and of the counter and the comparator) grows logarithmically with the scan depth (the number of scan flip-flops in the longest scan chain). Hence the hardware overhead will not increase significantly even for larger designs with very large numbers of scan flip-flops. This analysis shows that the hardware overhead of the multiple scan chain version of the proposed decompressor is also very low and scalable to large designs.
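The gate-equivalent tally above can be reproduced with a short Python sketch that sums the quoted figures. Assumptions: the counter is costed at 6 gate equivalents per stage (matching the 48 NAND gates quoted for 8 stages), the synthesized decoder and comparator figures apply only to the 6-to-64 decoder and 8-bit comparator, and the 64 AND gates are left out because no figure is given for them; the listed components then sum to 1057, consistent with "about 1100". The multiplexer count is a parameter: the tally uses 64, whereas Figure 5 draws one multiplexer m_h per scan chain, so passing 512 shows the total under that count. The function name is hypothetical.

```python
def s_tpg_gate_equivalent(num_chains=512, group_size=8, scan_depth=256,
                          fifo_entries=3, num_muxes=64,
                          ge_decoder=385, ge_comparator=116,
                          ge_per_storage=6, ge_per_counter_stage=6, ge_per_mux=4):
    """Sum the gate-equivalent (GE) figures quoted in the text.

    ge_decoder and ge_comparator are the synthesized figures for the 6-to-64
    decoder and the 8-bit comparator, so they hold only for that configuration.
    The 2-input AND gates are not costed.
    """
    groups = -(-num_chains // group_size)
    id_bits = (groups - 1).bit_length()          # 6 bits for 64 groups
    loc_bits = (scan_depth - 1).bit_length()     # 8 bits for a scan depth of 256
    fifo = fifo_entries * (id_bits + loc_bits) * ge_per_storage   # 3 x 14 x 6
    counter = loc_bits * ge_per_counter_stage                     # 8 stages x 6
    muxes = num_muxes * ge_per_mux                                # muxes x 4 NAND
    total = ge_decoder + ge_comparator + fifo + counter + muxes
    return {"fifo": fifo, "counter": counter, "muxes": muxes, "total": total}


print(s_tpg_gate_equivalent()["total"])               # 1057
print(s_tpg_gate_equivalent(num_muxes=512)["total"])  # 2849
```

Either total is small relative to a design with more than 130,000 flip-flops.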

7 Experimental Results

Table 1 compares the compression achieved by LFSR reseeding alone (columns under the heading Only LFSR Reseeding), by the proposed method combined with LFSR reseeding, and by other recent compression techniques [3, 11, 4]. For LFSR reseeding, we used our proprietary high-compression LFSR reseeding technique. The columns # pat give the number of test patterns applied, and the columns stor (bits) give the total number of (compressed) data bits that need to be stored in ATE memory. For the proposed method, we show results obtained with all three decompressor schemes: the basic scheme (see Figure 2), Variation-I (see Figure 4 (a)), and Variation-R (see Figure 4 (b)). For LFSR reseeding alone and for the proposed method, we first applied a sequence of pseudo-random patterns to drop easy-to-detect faults. Then, for the remaining undetected faults, we generated deterministic test patterns with an in-house ATPG and compressed them either by LFSR reseeding alone or by the proposed compression method. The pseudo-random patterns applied to drop easy-to-detect faults are included in the total number of test patterns reported in the columns # pat. The columns # Gen show the number of generators.

The results clearly demonstrate that the proposed method efficiently improves the compression achieved by LFSR reseeding alone. The reductions are especially large for the ITC benchmark circuits: for all of them, the number of storage bits for the proposed method is only about 1/2 to 1/3 of that for LFSR reseeding. Note that the Variation-R scheme reduced the number of storage bits for b17s by a factor of about 3.4 without any increase in the number of patterns. Among the three decompressors, the basic scheme achieved the highest compression, and the Variation-R scheme generated the smallest number of patterns. The latter is expected, because the number of decompressed patterns generated by the Variation-R decompressor is always the same as the number of deterministic patterns compressed by the proposed method.
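The reduction factors quoted above can be checked against the ITC rows of Table 1 with a few lines of Python; the dictionary simply transcribes the relevant Table 1 columns, and the names are illustrative.

```python
# Storage bits from Table 1 (ITC circuits): (only LFSR reseeding, basic, Variation-R)
STOR = {
    "b17s": (152622, 44544, 44864),
    "b20s": (99840, 38344, 40222),
    "b21s": (87676, 33788, 35862),
    "b22s": (121806, 65107, 70319),
}


def reduction_factors(table):
    """Factor by which each proposed scheme reduces storage versus LFSR reseeding alone."""
    return {name: (round(lfsr / basic, 2), round(lfsr / var_r, 2))
            for name, (lfsr, basic, var_r) in table.items()}


for name, (basic, var_r) in reduction_factors(STOR).items():
    print(f"{name}: basic {basic}x, Variation-R {var_r}x")
```

The basic scheme's factors range from about 1.9 (b22s) to about 3.4 (b17s), which is the "about 1/2 to 1/3" range stated above.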

We first compare our results with another hybrid BIST technique [3]. Since [3] applied very long sequences of pseudo-random patterns (32000 patterns), we also conducted experiments with longer sequences of pseudo-random patterns for a fair comparison. These results are shown in the columns # pat* and stor (bits)* under the basic scheme. Even though a shorter test sequence was applied, the number of storage bits for the proposed method is smaller than that for [3] for every circuit except s15850. The numbers of storage bits of the proposed method (basic scheme) are also smaller than those of [11] for every reported circuit except s38417, and much smaller than those of [4] for every reported circuit. However, because the numbers of test patterns are not reported in [11, 4] and compression depends on the number of test patterns generated, the fairness of the comparisons with [11, 4] is limited.

Table 2 shows results for gate delay test patterns on industrial designs. The column # FF gives the number of flip-flops in the circuit. The columns under the heading Determin. show results for highly compacted deterministic delay test patterns generated by an in-house ATPG, and the results obtained with the proposed method are given under the heading Proposed. The columns FE% give the achieved fault efficiencies, and the column # tpat gives the total number of test patterns applied by the proposed method. The column CR gives the compression obtained with the proposed scheme, calculated as the ratio of the storage bits required for the highly compacted deterministic patterns to those required by the proposed scheme. Over 1600X compression was achieved for D3 by the proposed method, while the number of test patterns increased by only about 33% over the deterministic test set (this increase is due to the uncompaction procedure and to the pseudo-random patterns applied to detect easy-to-detect faults). About 500X compression was achieved for D2 with only a 45% increase in the number of test patterns. Note that higher compressions were achieved for larger designs. The column TR gives the reduction in total test cycles; the number in parentheses in the same column is the number of scan chains in the design. For the deterministic test pattern results, we used 16 scan chains for every design.

Table 1. Experimental Results

Columns: LFSR = Only LFSR Reseeding; Basic, Var-I, Var-R = the proposed method with the basic, Variation-I, and Variation-R decompressors; [3] = hybrid BIST; [11] = EDT; [4] = Huffman. "stor" columns are in bits. Columns marked * are basic-scheme results with longer pseudo-random sequences. A dash means no result is reported.

| CKT | LFSR # pat | LFSR stor | Basic # Gen | Basic # pat | Basic stor | Basic # pat* | Basic stor* | Var-I # Gen | Var-I # pat | Var-I stor | Var-R # Gen | Var-R # pat | Var-R stor | [3] # pat | [3] stor | [11] stor | [4] stor |
| s5378 | 253 | 8085 | 71 | 523 | 4794 | - | - | 71 | 523 | 5982 | 77 | 253 | 5040 | - | - | 5676 | 9358 |
| s9234 | 465 | 15810 | 90 | 958 | 7301 | - | - | 94 | 1012 | 9078 | 101 | 465 | 7931 | - | - | 9534 | 15511 |
| s13207 | 448 | 14101 | 84 | 892 | 7401 | 32139 | 1005 | 77 | 969 | 8675 | 85 | 448 | 7723 | 33K | 1200 | 10585 | 18384 |
| s15850 | 407 | 13144 | 87 | 625 | 9198 | 35790 | 3976 | 91 | 608 | 10547 | 94 | 407 | 9423 | 36K | 3740 | 9803 | 18926 |
| s38417 | 536 | 49140 | 153 | 809 | 36495 | 41234 | 17192 | 155 | 803 | 38264 | 158 | 536 | 36884 | 42K | 23856 | 31458 | 58785 |
| s38584 | 532 | 27028 | 114 | 945 | 15402 | 32066 | 1544 | 109 | 865 | 17388 | 119 | 532 | 15622 | 35K | 4206 | 18567 | 55200 |
| b17s | 914 | 152622 | 243 | 1912 | 44544 | - | - | 240 | 1875 | 49782 | 257 | 914 | 44864 | - | - | - | - |
| b20s | 1261 | 99840 | 289 | 3192 | 38344 | - | - | 286 | 3192 | 44980 | 304 | 1261 | 40222 | - | - | - | - |
| b21s | 1226 | 87676 | 279 | 3199 | 33788 | - | - | 282 | 3142 | 40520 | 302 | 1226 | 35862 | - | - | - | - |
| b22s | 1200 | 121806 | 385 | 2713 | 65107 | - | - | 391 | 2767 | 72278 | 418 | 1200 | 70319 | - | - | - | - |

Table 2. Results on Industrial Designs

| Name | # FF | Determin. FE% | Determin. # pat | Proposed FE% | # Gen | # tpat | CR | TR |
| D1 | 2193 | 98.22 | 1838 | 98.23 | 829 | 6655 | 33 | 1 (60) |
| D2 | 65265 | 98.00 | 24962 | 98.01 | 12600 | 36201 | 498 | 13 (171) |
| D3 | 258745 | 95.39 | 87562 | 95.39 | 35041 | 116341 | 1602 | 47 (999) |
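Similarly, the pattern-count increases quoted for Table 2 follow from the Determin. # pat and Proposed # tpat columns. The short Python sketch below transcribes those columns (names are illustrative) and also gives the corresponding figure for D1, which the text does not discuss.

```python
# (deterministic # pat, proposed # tpat) from Table 2
PATTERNS = {"D1": (1838, 6655), "D2": (24962, 36201), "D3": (87562, 116341)}


def pattern_increase_percent(det, proposed):
    """Percentage increase in test patterns of the proposed scheme over the deterministic set."""
    return 100.0 * (proposed - det) / det


for name, (det, prop) in PATTERNS.items():
    print(f"{name}: +{pattern_increase_percent(det, prop):.0f}% patterns")
```

D2 and D3 give the 45% and 33% increases stated above; the small design D1 shows a much larger relative increase, in line with the observation that the scheme is most effective on larger designs.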

8 Conclusions

In this paper, we presented a test data compression scheme that can be used to further improve the compression achieved by LFSR reseeding. The proposed method consists of a novel decompressor architecture and an efficient algorithm to compute generators (weight sets) that lead to minimum test data volume. In addition to a linear test pattern generator used for reseeding, the proposed scheme needs only one or two small FIFOs, several multiplexers and AND gates, and a comparator. The proposed method requires no on-chip memory to store weight sets; hence the proposed decompressor can be implemented with very low area overhead. Two variations of the decompressor, which can be adopted for different testing requirements such as short test application time, are also proposed. Unlike most commercial test data compression tools, the proposed method requires no special ATPG customized for the proposed compression scheme and can be used to compress test patterns generated by any ATPG tool.

Experimental results show that the proposed method can effectively improve the compression achieved by LFSR reseeding without significantly increasing the test sequence length. Over 1600X compression was achieved for a large industrial design with only about a 33% increase in the number of test patterns over a highly compacted deterministic test set. For most circuits, the numbers of test patterns generated by the proposed method are comparable to those of highly compacted deterministic test patterns. The test data to be stored in the ATE memory are much smaller than those of previously published schemes, and the number of test patterns that need to be generated is smaller than for other weighted random pattern testing schemes.

References

[1] International Technology Roadmap for Semiconductors. Test & Test Equipment, 2005.

[2] M. Bershteyn. Calculation of Multiple Sets of Weights for Weighted Random Testing. In Proceedings International Test Conference, pages 1031–1040, 1993.

[3] A. Jas, C. V. Krishna, and N. A. Touba. Weighted Pseudorandom Hybrid BIST. IEEE Trans. VLSI Systems, 12(12):1277–1283, December 2004.

[4] X. Kavousianos, E. Kalligeros, and D. Nikolos. Efficient Test-Data Compression for IP Cores Using Multilevel Huffman Coding. In Proceedings Design Automation and Test in Europe, pages 1–6, 2006.

[5] B. Konemann. LFSR-Coded Test Patterns for Scan Designs. In Proceedings European Design and Test Conference, pages 237–242, 1991.

[6] B. Konemann. Care Bit Density and Test Cube Clusters: Multi-Level Compression Opportunities. In Proceedings IEEE International Conference on Computer Design, pages 320–325, 2003.

[7] B. Konemann. STAGE: A Decoding Engine Suitable for Multi-Compressed Test Data. In Proceedings 12th Asian Test Symposium, 2003.

[8] B. Konemann, K. D. Wagner, and J. A. Waicukauski. Hybrid Pattern Self-Testing of Integrated Circuits. United States Patent 005612963A, June 1995.

[9] C. V. Krishna and N. A. Touba. Reducing Test Data Volume Using LFSR Reseeding with Seed Compression. In Proceedings International Test Conference, pages 321–330, 2002.

[10] I. Pomeranz and S. Reddy. 3-Weight Pseudo-Random Test Generation Based on a Deterministic Test Set for Combinational and Sequential Circuits. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 12:1050–1058, July 1993.

[11] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee. Embedded Deterministic Test. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, Vol. 23:776–792, May 2004.

[12] A. P. Strole and H.-J. Wunderlich. TESTCHIP: A Chip for Weighted Random Pattern Generation, Evaluation, and Test Control. IEEE J. Solid-State Circuits, 26:1056–1063, July 1991.

[13] S. Wang. Low Hardware Overhead Scan Based 3-Weight Weighted Random BIST. In Proceedings International Test Conference, pages 868–877, 2001.

[14] S. Wang, K. J. Balakrishnan, and S. T. Chakradhar. XWRC: Externally-loaded Weighted Random Pattern Testing for Input Test Data Compression. In Proceedings International Test Conference, 2005.
