
A Selective Trigger Scan Architecture for VLSI Testing

Mohammad Hosseinabady, Shervin Sharifi, Fabrizio Lombardi, Senior Member, IEEE, and Zainalabedin Navabi, Senior Member, IEEE

Abstract: Time, power, and data volume are among the most challenging issues in testing a System-on-Chip (SoC), and they have not been fully resolved even when a scan-based technique is employed. A novel architecture, referred to as the Selective Trigger Scan architecture, is introduced in this paper to address these issues. This architecture reduces switching activity in the circuit-under-test (CUT) and increases the clock frequency of the scanning process. An auxiliary chain is utilized in this architecture to avoid the large number of transitions applied to the CUT during the scan-in process; it also enables retention of the currently applied test vector, so that only the necessary changes are applied to it. The auxiliary chain shifts in the difference between consecutive test vectors, and only the required transitions (referred to as trigger data) are applied to the CUT. Power requirements are substantially reduced; moreover, DFT penalties are reduced because no additional multiplexer is utilized along the scan path. Data reformatting is applied to make the proposed architecture amenable to data compression, thus permitting a further reduction in test time. The architecture also permits delay fault testing. Using the ISCAS 85 and 89 benchmark circuits, the effectiveness of this architecture in improving SoC test measures (such as power, time, and data volume) is experimentally evaluated and confirmed.

Index Terms: Scan test, test data volume, test application time, test power, test compression, delay testing.

1 INTRODUCTION

INTELLECTUAL property (IP) cores are commonly used for designing a System-on-Chip (SoC). Although IP cores can help to reduce the design cycle time, they still pose many challenges when testing is considered. The precomputed test patterns provided by core vendors must be applied to each core within the power constraints of the whole SoC. As a system integrator may use a core in different platforms with diverse test mechanisms (whether for on-chip or off-chip implementation), the test mechanism of the core must take into account issues related to data volume, application time, and power consumption during test. Moreover, other fault models (such as delay faults) must be considered to improve the overall test quality. A comprehensive solution is very difficult, because it requires major changes in different parts of the design as provided by the IP providers. Power and test data volume are especially challenging:

Test power. The increased use of portable computing and wireless communication (together with growing density and higher operational frequency) has made power dissipation an important issue in both the design and test of VLSI circuits. Power consumption in CMOS circuits can be static or dynamic. Dynamic power consists of switching power and short-circuit power. Switching power results from the activity of a circuit in changing its states due to the charging and discharging of the effective capacitive loads. Dynamic power contributes significantly to total power dissipation. Switching power dissipation is given by

$$P_{\text{Dynamic}} = C\,V_{cc}^{2}\,f,$$

where $C$ is the capacitance of the switching nodes, $V_{cc}$ is the supply voltage, and $f$ is the effective operating frequency. As the activity of the test input signals is significantly higher than during normal operation, power dissipation can be substantially higher while testing takes place [1], [2]. However, power constraints are usually defined with respect to the normal operational mode. Currently, design techniques are employed to reduce power dissipation during the normal mode of operation [2]. The power constraints usually considered during design are much lower than the power consumed during testing [3], thus causing severe reliability problems. Furthermore, the current trend toward VLSI circuit miniaturization prevents the use of heat-dissipating devices to remove the excessive heat generated during test [2].
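As a small numeric illustration of the switching-power relation, the Python sketch below evaluates $P_{\text{Dynamic}} = C V_{cc}^2 f$. The capacitance, supply voltage, and frequency values are hypothetical placeholders, not data from this paper; the point is only that, for fixed $C$ and $V_{cc}$, power grows linearly with the effective switching frequency, which is why the higher activity of test patterns raises dissipation.

```python
def switching_power(c_farads: float, vcc_volts: float, f_hz: float) -> float:
    """Switching power P = C * Vcc^2 * f in watts, with f the effective switching frequency."""
    return c_farads * vcc_volts ** 2 * f_hz


# Hypothetical values: 1 pF of switched capacitance and a 1.2 V supply.
p_normal = switching_power(1e-12, 1.2, 100e6)  # effective rate in normal mode (assumed)
p_test = switching_power(1e-12, 1.2, 200e6)    # doubled effective rate during test (assumed)
print(f"normal: {p_normal * 1e3:.3f} mW, test: {p_test * 1e3:.3f} mW")
```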

Test time and test data volume. Application time is one of the sources of complexity when testing IP cores as commonly found in SoCs. Random or deterministic vectors are either generated by on-chip hardware for built-in self-test (BIST) or provided by an external tester, such as automatic test equipment (ATE), for manufacturing test. For testing logic blocks, the feasibility of on-chip test generation is mainly restricted to (pseudo)random methods. In general, the area overhead required by dedicated on-chip deterministic vector generation is rather prohibitive for manufacturing test.

316 IEEE TRANSACTIONS ON COMPUTERS, VOL. 57, NO. 3, MARCH 2008

• M. Hosseinabady and Z. Navabi are with the Electrical and Computer Engineering Department, Faculty of Engineering, University of Tehran, North Kargar Ave., Tehran, Iran, 14395/894. E-mail: [email protected], [email protected], [email protected].

• S. Sharifi is with the Computer Science and Engineering Department, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0404. E-mail: [email protected].

• F. Lombardi is with the Department of Electrical and Computer Engineering, Northeastern University, 424 Dana, Boston, MA 02115. E-mail: [email protected].

Manuscript received 25 Feb. 2006; revised 15 Feb. 2007; accepted 21 May 2007; published online 28 Aug. 2007. Recommended for acceptance by C. Metra. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0075-0206.

Digital Object Identifier no. 10.1109/TC.2007.70806. 0018-9340/08/$25.00 © 2008 IEEE. Published by the IEEE Computer Society.



Random vectors are usually generated using linear feedback shift registers (LFSRs), so no storage is required. Response evaluation is performed using a signature analyzer that compacts test responses into a signature and compares it with the signature of an error-free reference design. LFSRs introduce a small area overhead; however, testing with random vectors requires a long application time due to the modest quality of these vectors. Hybrid schemes are commonly used to reduce test time by reseeding the LFSRs. Compared with random vectors, deterministic vectors are designed to detect a set of target faults. This significantly reduces the number of vectors required for large designs. Due to the high overhead incurred by on-chip generation, deterministic vectors are usually provided from an external source (such as an ATE). The application of external data (as for manufacturing test) involves downloading vectors from a storage device to a user interface workstation usually attached to the ATE. Compression has been investigated for resolving some of the problems associated with SoC testing. Lossless compression is the process of encoding test vectors such that the original data can be uniquely reconstructed by a decoder. A basic feature of lossless compression is to decompose an input data set into a sequence of events, then to encode the events using as few bits as possible. Reordering test vectors in combinational or full-scan circuits can increase the similarity (and redundancy) of data in the vectors and consequently increase the compression rate. The relatively small I/O pin count is one of the main causes of speed degradation for data transfer across a chip. In general, for manufacturing test, deterministic solutions face many challenges for applicability to SoCs; a high storage volume, a long application time through the serial paths, and vector sets that are generated by third parties (that is, the IP core providers) with limited information are some of the unresolved issues associated with an efficient test of these devices.
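The LFSR-based random pattern generation mentioned above can be sketched in Python as a Fibonacci LFSR. The 4-bit width and the feedback taps on bits 3 and 0 (which give a maximal-length sequence) are illustrative assumptions; practical BIST generators are much wider. From any nonzero seed, the register steps through all 15 nonzero states before repeating, without storing any patterns.

```python
def lfsr_patterns(seed: int, taps: tuple[int, ...], width: int, count: int) -> list[int]:
    """Return `count` successive states of a Fibonacci LFSR.

    Each clock shifts the register left by one bit and inserts, at bit 0,
    the XOR of the tapped bits of the previous state.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero state locks up an XOR-feedback LFSR")
    states = []
    for _ in range(count):
        states.append(state)
        feedback = 0
        for t in taps:  # XOR of the tapped state bits
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
    return states


patterns = lfsr_patterns(seed=0b0001, taps=(3, 0), width=4, count=15)
print([format(p, "04b") for p in patterns])
```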

1.1 Previous Works

Recent years have seen the development of many techniques for overcoming the aforementioned difficulties in VLSI testing. In this section, some of these works are briefly reviewed.

Power reduction. Circuits are often designed to operate in two modes: normal and test. The test mode usually dissipates more power than the normal mode, especially if a scan mechanism is employed. During the data scan-in process, the difference between two adjacent bits moves through the scan path due to the shift operation; many floating transitions are then applied to the CUT. Fig. 1 shows this process for shifting the vector $a_k = (a_{k,0}, a_{k,1}, a_{k,2}, a_{k,3}, a_{k,4})$ when the vector $a_{k-1}$ is in the scan chain; (1) can be used to determine the number of shift-in transitions:

$$
\begin{aligned}
S_{k-1,k} ={} & (a_{k-1,0} \oplus a_{k-1,1}) + 2(a_{k-1,1} \oplus a_{k-1,2}) + 3(a_{k-1,2} \oplus a_{k-1,3}) \\
& + 4(a_{k-1,3} \oplus a_{k-1,4}) + 5(a_{k-1,4} \oplus a_{k,0}) + 4(a_{k,0} \oplus a_{k,1}) \\
& + 3(a_{k,1} \oplus a_{k,2}) + 2(a_{k,2} \oplus a_{k,3}) + (a_{k,3} \oplus a_{k,4}). \qquad (1)
\end{aligned}
$$
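Equation (1) generalizes to an n-cell chain: in the concatenated stream $a_{k-1}$ followed by $a_k$, the adjacent pair at position $p$ is weighted by $\min(p+1, 2n-1-p)$, the number of cell updates it causes. The Python sketch below checks this closed form against a direct simulation of the shift for every pair of 5-bit vectors. The shift direction in the simulation is an assumption, chosen so that cell $i$ finally holds $a_{k,i}$.

```python
from itertools import product


def shift_transitions_simulated(prev: list[int], new: list[int]) -> int:
    """Count scan-cell toggles while `new` is shifted into a chain holding `prev`.

    Assumed shift direction: each clock moves the contents one cell toward
    index 0 and the next bit of `new` enters the last cell, so after len(new)
    clocks cell i holds new[i].
    """
    chain = list(prev)
    toggles = 0
    for bit in new:
        shifted = chain[1:] + [bit]
        toggles += sum(a != b for a, b in zip(chain, shifted))
        chain = shifted
    return toggles


def shift_transitions_formula(prev: list[int], new: list[int]) -> int:
    """Closed form generalizing (1): pair p of prev || new has weight min(p + 1, 2n - 1 - p)."""
    n = len(prev)
    stream = list(prev) + list(new)
    return sum(
        min(p + 1, 2 * n - 1 - p) * (stream[p] ^ stream[p + 1])
        for p in range(2 * n - 1)
    )


# For n = 5 the weights are 1, 2, 3, 4, 5, 4, 3, 2, 1, as in (1).
print([min(p + 1, 9 - p) for p in range(9)])

# Exhaustive check of the closed form against the simulation for 5-bit vectors.
for prev in product((0, 1), repeat=5):
    for new in product((0, 1), repeat=5):
        assert shift_transitions_simulated(list(prev), list(new)) == \
            shift_transitions_formula(list(prev), list(new))
```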

Many techniques in the technical literature have been proposed to reduce the number of these transitions for power dissipation management. These techniques can be categorized as follows:

• transition techniques to reduce the difference between two consecutive test vectors,
• transition techniques to reduce the effect of the difference between two consecutive bits in the scan chain,
• transition techniques by partitioning to reduce the effective length of the scan chains,
• techniques to block transitions in a circuit,
• scan reordering techniques, and
• integrated techniques that use two or more of the aforementioned techniques.

Sharifi et al. [4], [5] propose a technique for reducing the power dissipation in scan-based structures used for testing digital circuits. This method reduces both the switching activity and static power. They propose a scan structure along with an algorithm to find the optimum configuration of the structure that results in minimum dynamic and static power consumption during the scan mode.

Baik et al. [6] have proposed the use of the so-called Random Access Scan (RAS) for the simultaneous reduction of test power, data volume, and application time. Two techniques (test vector reordering and Hamming distance reduction) have been proposed to reduce the total number of RAS operations. A multiple-input signature register (MISR) has been used for output response compaction. Dabholkar et al. [7] have proposed a post-ATPG process to reduce power dissipation for full-scan and combinational circuits; in this technique, test vector and scan latch orderings are used to reduce power dissipation during test application. In [8], an ATPG technique that reduces switching activity (during testing of sequential circuits with full scan) has been presented. The tests generated by this ATPG can be used for at-speed testing of chips and bare dies with no risk of damage due to excessive heat. Sinanoglu et al. [9] have presented a methodology in which the scan chain is modified by inserting logic gates between the scan cells, thus reducing the number of transitions. This methodology introduces gate delays into the scan path. Nicolici and Al-Hashimi [10], Saxena et al. [11], and Rosinger et al. [12] have proposed different approaches to divide the scan chain into multiple length-balanced partitions and to enable only one partition at each test clock; therefore, instead of

HOSSEINABADY ET AL.: A SELECTIVE TRIGGER SCAN ARCHITECTURE FOR VLSI TESTING 317

Fig. 1. Scan cell values during shift operation.



all scan cells being active at the same time, only a fraction of them is active in each test clock cycle. Bhunia et al. [13] have proposed a solution to reduce test power by inserting blocking logic into the stimulus path of the scan flip-flops to prevent propagation of the scan ripple effect to logic gates.

Ghosh et al. [34] describe a technique for minimizing power dissipation during scan testing. For a given set of test vectors, the (locally) optimal reordering of the scan cells is found to minimize a score function; the selected function is a linear combination of power and area overhead. The scan cell reordering technique in [35] uses a heuristic algorithm to modify the order of the scan cells within the chain to reduce switching activity.

Chen et al. [14] have presented an integrated approach to reduce both power consumption and test application time. This method is made possible by combining a scan architecture (referred to as the multiple clock disabling architecture) and several other techniques, including scan cell and vector reordering.

Test time. A reduction in test time can be accomplished using data compression and scan tree techniques.

The advantages of compression are twofold: it reduces storage requirements and decreases test application time (in this case, a smaller number of vectors is applied to a CUT). Run-length coding, Huffman coding, Lempel-Ziv coding [18], and arithmetic coding are some of the compression techniques found in the technical literature. In run-length coding, a sequence of symbols is encoded into two elements (the repeating symbol and the length of the sequence). Run-length coding is efficient for data with long sequences of equal symbols. Huffman coding is more sophisticated than run-length coding. It uses a table of frequencies of occurrence to build an optimal representation of each character as a binary string. Compression is accomplished by allocating short codewords to frequent characters and long codewords to infrequent characters. Jas and Touba [15] have presented a method for using embedded processors as an aid to the execution of the test process. The tester loads a program along with compressed data into the on-chip memory; the processor executes a program that decompresses the data and applies it to the scan chains in the other components of the SoC. The application of this method is restricted by the assumption that appropriate functional units are available within the SoC architecture. Jas and Touba [16] have introduced a scheme for compression/decompression of test data using cyclic scan chains. It captures repeated patterns in a given set of vectors by applying a run-length coding technique to the difference vector between consecutive vector pairs. Adjacent repeated data result in long runs of zeros in the difference vector; thus, they are effectively encoded by run-length coding. However, cyclic and noncontiguous repeating patterns cannot be fully exploited in [16]. Compressed scan data is serially transferred from a tester to a chip at the clock speed of the tester and is decompressed by an internal decoder. Due to on-chip data expansion, [16] employs a faster internal clock to avoid overrunning data. Golomb coding and the related Rice coding [19] are based on data with exponentially distributed run lengths. The Golomb and Rice methods consist of a family of parameterized codes that can be estimated adaptively [20], thus giving good compression performance. Chandra and Chakrabarty [21] have introduced the application of Golomb coding to test data compression. A high compression rate, analytically predictable compression results, and a low-cost scalable on-chip decoder [24] are the major advantages of Golomb coding [21], [24]. Although the compression results in [21] are promising, an exponential distribution of 0 runs is not a guaranteed characteristic of practical VLSI test data for industrial designs. Lingappan et al. [22] have investigated the use of heterogeneous and multilevel compression schemes and demonstrated that substantial reductions in test volume are accomplished compared with current test compression techniques. They have also investigated how these techniques can be efficiently implemented by exploiting functional components that are already present in today's SoCs.
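To make run-length coding of difference vectors concrete, the Python sketch below Golomb-codes the runs of 0s in a bit stream, each run being terminated by a 1. It assumes a convention in which a run of length r is written as a unary prefix of r // m ones, a 0 separator, and r mod m in log2(m) bits; restricting m to a power of two makes it a Rice code. The sentinel used to close a trailing run is an illustrative choice; this is not the encoder or decoder of [21].

```python
def golomb_encode(bits: list[int], m: int) -> list[int]:
    """Golomb-code the runs of 0s in `bits` (m must be a power of two)."""
    if m <= 0 or m & (m - 1):
        raise ValueError("m must be a power of two")
    k = m.bit_length() - 1  # number of tail bits, log2(m)
    code: list[int] = []
    run = 0
    for b in list(bits) + [1]:  # sentinel 1 closes any trailing run of 0s
        if b == 0:
            run += 1
            continue
        q, r = divmod(run, m)
        code += [1] * q + [0]                                 # unary prefix and separator
        code += [(r >> (k - 1 - j)) & 1 for j in range(k)]   # binary tail, MSB first
        run = 0
    return code


def golomb_decode(code: list[int], m: int, length: int) -> list[int]:
    """Invert golomb_encode; `length` removes the bit(s) added by the sentinel."""
    k = m.bit_length() - 1
    bits: list[int] = []
    i = 0
    while len(bits) < length:
        q = 0
        while code[i] == 1:  # unary prefix
            q += 1
            i += 1
        i += 1  # skip the 0 separator
        r = 0
        for _ in range(k):  # binary tail
            r = (r << 1) | code[i]
            i += 1
        bits += [0] * (q * m + r) + [1]
    return bits[:length]


diff = [0, 0, 0, 0, 0, 1, 0, 0, 1]  # runs of five and two 0s
encoded = golomb_encode(diff, 4)
print(encoded, golomb_decode(encoded, 4, len(diff)) == diff)
```

With m = 4, the run of five 0s becomes 1001 and the run of two 0s becomes 010; long runs of 0s, as produced by well-ordered difference vectors, therefore shrink to a few bits each.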

Scan tree design is often used to address different issues such as test time [32], [33]. In a scan tree design, the chain is divided into multiple scan chains and a cell may drive multiple scan cells. This technique is suitable for test time reduction. In this scheme, the same test data is stored in different scan chains; however, this method suffers from reduced controllability of the design. The effectiveness of a scan tree architecture depends on the correlation among the test data assigned to the structure of the scan tree. Bonhomme et al. [32] have proposed a scan tree architecture for reducing test application time. This technique is based on a dynamic reconfiguration mode, allowing a reduction in the dependence between a test set and the final scan tree architecture. This procedure does not require additional test control inputs, and an MISR is used for circuit response compaction. Miyase et al. [33] have proposed a scan tree method for multiple scan inputs; in a multiple scan design, the number of scan trees is equal to the number of scan inputs. As each scan input drives a scan tree, the test data volume and application time are dominated by the scan tree of maximum height. They have also proposed a method for test compression in multiple scan designs.

1.2 Our Contributions

This work proposes a novel scan architecture that is referred to as the Selective Trigger Scan Architecture (STSA). This scan architecture uses a triggering (enabling) chain in addition to the data registers. Furthermore, the triggering chain hardware is designed to take advantage of similar adjacent data for test compression. Instead of shifting new serial data into the data registers, the triggering chain decides whether a data flip-flop must toggle or retain its old value. Retaining data results in a small number of transitions at the data register outputs and low power dissipation. Along with test reformatting techniques, this architecture can reduce test time and power. It can also reduce data volume by enabling the application of compression algorithms to its reformatted data. This structure can be used as a core-level or chip-level design-for-test (DFT) technique. In addition, it is applicable to delay fault testing. When applied at the core level, substantial improvements in power and test time can be achieved by reformatting the precomputed vectors rather than starting with a new set of tests.




The rest of this paper is organized as follows: Section 2 describes the proposed scan architecture. Test data reformatting (as required for generating the vectors for the proposed architecture) is explained in Section 3. Section 4 describes the algorithm used for compressing the test vectors. Section 5 describes the test time reduction of the proposed architecture and the application of this architecture to delay fault testing. Sections 6 and 7 report experimental results and conclusions, respectively.

2 THE PROPOSED ARCHITECTURE

We start the explanation of our proposed architecture with a simple example. Let us assume that V1 in Fig. 2a is an existing test vector in a scan chain and V2 is the next test vector that must be shifted in. Comparing V1 and V2 (Fig. 2a), they differ in only three bit positions; the transitions at these positions are called necessary transitions. If we were to use a standard scan chain and shift V2 into it in eight test clocks, the transitions shown in Fig. 2b would occur with each shift. For example, shifting the rightmost 1 of V2 into the scan chain causes five transitions in the eight scan flip-flops. Altogether, shifting V2 would cause 32 transitions, which are called unnecessary transitions. On the other hand, loading V2 directly in parallel into the data registers of our architecture eliminates unnecessary transitions at the inputs of a CUT.

Hence, our scan architecture should eliminate the unnecessary transitions. In addition, the following features should be considered for the proposed scan architecture and DFT method:

• A scan architecture should not add extra inputs compared to a conventional scan approach.
• A DFT approach must add no delay to the normal operation of the circuit.

The proposed architecture, shown in Fig. 3, serves two purposes: to reduce the activity at the data outputs and to facilitate test data compression. As shown in Fig. 3, this architecture has data registers that contain the test data applied to the CUT, a triggering chain into which the test data is shifted, and a triggering logic circuit with an enabling AND gate that determines how the test data is decoded to trigger the data registers.

The triggering chain reduces the activity at the data register outputs. For this purpose, instead of shifting test vectors into the data registers, triggering data is obtained by formatting the test vectors and is shifted into the triggering chain. For example, if the triggering logic has an identity function, the current content of the data register is 00101110, and the new test vector is 01100111, then the triggering chain must contain 01001001, that is, the bitwise difference (XOR) of the two vectors. This architecture also prevents changes in the test data from being directly applied to the CUT.

The triggering logic comes into play for compression of test data. For test compression, instead of shifting the difference of consecutive test vectors into the triggering chain, the difference is encoded as 0 → 1 and 1 → 0 transitions: the shifted data changes value between two adjacent cells exactly where the difference vector has a 1. The triggering logic is implemented by an XOR gate. Fig. 4 shows the structure of a scan cell of the proposed scan chain. In this case, for the two aforementioned test vectors, 01110001 should be shifted into the triggering chain. This is formed by accumulating the XOR of the bits of the difference vector 01001001 from the left-hand side, starting from 0, so that the XOR of each pair of adjacent bits reproduces the difference vector (see (3) in Section 3.3).

As shown in Fig. 4, the DR flip-flop is the main storage cell and contains the vector that must be applied to the CUT. The TR chain provides the data required for selective triggering. Testing starts by resetting the DR chain. The TR chain cell has three modes of operation: Shift, Trigger, and Normal. The Shift and Trigger modes are used for testing,


Fig. 2. (a) Necessary transitions between two consecutive test vectors (three transitions). (b) Total number of unnecessary transitions in conventional scan architecture (32 transitions).

Fig. 3. Proposed scan-chain architecture.

Fig. 4. Scan cell structure of the proposed scan architecture.



while the Normal mode is used for normal operation of the circuit. Table 1 shows the cell configuration in the different operational modes.

• In the Shift mode, the Enable signal is low (inactive) and the DR flip-flops remain unchanged. Therefore, the required data can be shifted into the TR chain with no effect on the contents of the DR flip-flops.
• In the Trigger mode, the Enable signal is high (active) and the multiplexer selects the input connected to the Q output of the DR flip-flop. If the XOR output is 0, the DR flip-flop value does not change. If the XOR output is 1, the value of the DR flip-flop is inverted. Therefore, in the Trigger mode, a 1 at the XOR output of a cell causes an inversion of the value stored in its DR flip-flop. This is accomplished by storing different values in the TR flip-flops of this cell and its neighboring cell (to the left).
• In the Normal mode, the TR chain is loaded with a sequence of alternating 1s and 0s (1010...). This activates the outputs of all XORs; by selecting the normal input of the multiplexer and setting the Enable signal to the desired value, each cell performs its normal operation. The TR chain is loaded with 1010... only once, that is, when the test process is completed and the circuit starts its normal operation. During the test, each new vector is obtained through a vector update cycle.

The timing diagram for a test vector update cycle is shown in Fig. 5. In the Shift mode, the trigger data is shifted into the TR chain; this requires n scan clocks. After n clocks in the Shift mode, the cell enters the Trigger mode for a single clock. In this mode (based on the TR chain data), some of the DR flip-flops invert their values to obtain the new vector. A test vector is therefore reconstructed in a single test vector update cycle (Fig. 6 shows this process).
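The vector update cycle can be captured by a small behavioral model, a sketch under stated assumptions rather than the authors' implementation. Because the cell is given only in Fig. 4, the model assumes that the clock enable of DR cell i is the XOR of TR[i] with its left neighbor TR[i-1] (ANDed with Enable), that cell 0 is XORed with a constant 0, and that serial data enters the TR chain at cell 0. The class and the helper `trigger_data` are hypothetical names; the helper computes a running XOR of the bitwise difference so that each adjacent XOR reproduces it.

```python
class SelectiveTriggerChain:
    """Behavioral sketch of an n-cell STSA chain: DR data register plus TR triggering chain.

    Hypothetical modeling assumptions (the cell itself is only given in Fig. 4):
      * the clock enable of DR[i] is (TR[i] XOR TR[i-1]) AND Enable;
      * cell 0 has no left neighbor, so its XOR partner is a constant 0;
      * serial data enters the TR chain at cell 0 and moves toward cell n-1.
    """

    def __init__(self, n: int) -> None:
        self.dr = [0] * n  # testing starts by resetting the DR chain
        self.tr = [0] * n

    def shift(self, bit: int) -> None:
        """One Shift-mode clock: Enable is 0, so only the TR chain moves."""
        self.tr = [bit] + self.tr[:-1]

    def trigger(self) -> None:
        """One Trigger-mode clock: Enable is 1, and DR[i] inverts where its XOR output is 1."""
        left = [0] + self.tr[:-1]
        self.dr = [d ^ t ^ l for d, t, l in zip(self.dr, self.tr, left)]

    def update(self, tr_data: list[int]) -> list[int]:
        """One test vector update cycle: n Shift clocks, then a single Trigger clock."""
        for bit in reversed(tr_data):  # after n shifts, TR[i] == tr_data[i]
            self.shift(bit)
        self.trigger()
        return list(self.dr)


def trigger_data(current: list[int], target: list[int]) -> list[int]:
    """Hypothetical encoder: running XOR of the difference, so TR[i] ^ TR[i-1] == current[i] ^ target[i]."""
    out, acc = [], 0
    for c, t in zip(current, target):
        acc ^= c ^ t
        out.append(acc)
    return out


chain = SelectiveTriggerChain(8)
v1 = [0, 0, 1, 0, 1, 1, 1, 0]
v2 = [0, 1, 1, 0, 0, 1, 1, 1]
print(chain.update(trigger_data(chain.dr, v1)))  # DR now holds v1
print(chain.update(trigger_data(chain.dr, v2)))  # DR now holds v2
```

With the vectors of the earlier example in this section, `trigger_data(v1, v2)` returns `[0, 1, 1, 1, 0, 0, 0, 1]`, and the single Trigger clock flips exactly the three DR cells in which v1 and v2 differ; the DR outputs never see the intermediate shift states.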

As the Enable signal can be generated internally, no additional pin is required. This signal can easily be generated as a pulse after receiving n clock cycles. Moreover, the Select signal of the MUX does not require an additional pin; the same signal that conventional scan chains use to distinguish the Test and Normal modes can be used. In the Test mode, the Select signal selects the 1 input of the MUX (from the Q output), while, in the Normal mode, it selects the other input (which directs the Normal Input to the input of the DR flip-flop). Therefore, the proposed STSA requires no additional test pin compared with conventional scan structures. The output response is not captured in the scan chain; however, many techniques are available in the current literature [30], [31] that can be used with no loss of coverage, depending on the application and its constraints.

Significant improvements to SoC testing are achieved using the proposed architecture; these improvements are described in more detail here.

Power reduction. One of the main features of the proposed architecture is that, by altering a conventional scan chain, it prevents unnecessary transitions from affecting the CUT. As the required transitions are only a small portion of the total transitions made during scanning, reducing the number of transitions reduces power consumption. This is accomplished using the so-called trigger data; in the proposed architecture, trigger data is transferred through the TR chain because its transitions have a less significant effect on power dissipation. A transition in the TR chain affects only one XOR gate (that is, a low-power gate, as described later in this section). Cells in the TR chain change only in the Shift mode (in this mode, the Enable signal is 0). Therefore, transitions in the TR chain can only affect the XOR gate and are masked by the 0 input of the AND gate. These TR transitions thus have substantially less impact than transitions in the DR chain. The proposed architecture also reduces the number of TR chain transitions by using XORs. The use of XORs makes it possible to send the same difference information with a smaller number of transitions in the TR flip-flops. Without the XOR gates, enabling a DR flip-flop at a specific position would require shifting a 1 to that position in the TR chain, thus resulting in two transitions (a high-to-low and a low-to-high transition) per shift passing through the TR chain. When using XORs, only the difference between two adjacent TR flip-flops at a given position is required, that is, only one transition must pass through the TR chain to reach the specified position.

TABLE 1 Different Operational Modes of the New Cell

Fig. 5. Timing diagram of a test vector update cycle.

Fig. 6. Test session state diagram.

Although it may seem that adding XORs could result in a significant area overhead, these gates can be implemented efficiently in terms of area. As most flip-flops provide both Q and its complement Q', an XOR function can be provided using only two transistors in a pass-transistor-based structure, as shown in Fig. 7a. This implementation of the XOR not only has a small impact on area, but its effect on power consumption is also negligible due to its pass-transistor-based structure. As shown in Fig. 7, the logic required for providing the Clk Enable signal to each DR flip-flop requires only six transistors to implement the functionality of both the XOR and AND gates in Fig. 4. Strollo et al. [23] have introduced new designs for gating the clock in D-type flip-flops to reduce power consumption. These D-type flip-flops are best suited for applications in which the data switching activity is low, such as the TR chain. They use a comparator to determine whether the new data is identical to the previous state of the flip-flop. If not, the comparator output activates the clock and the new data is clocked in; otherwise, the clock of the D-type flip-flop is disabled by the comparator. The comparator is an XOR gate whose inputs are connected to the flip-flop input and output lines (Fig. 8c). As mentioned previously, the switching activity in the TR chain is low, so this D-type flip-flop can be used to reduce power consumption when scanning data into the TR chain. As shown in Fig. 8d, the XOR gate compares the input of the TR cell with its output. With the D-type flip-flop of Fig. 8b in the TR cells, the XOR gate can be removed by using the output of the built-in XOR of the D-type flip-flop. The inverter used for inverting the D input is not required because the D input of each cell is the Q output of the previous cell (whose inverted value is also available).

Fig. 9 shows the HSPICE simulation of the proposed scan chain with three scan cells in Fig. 8a. This simulation is for 180 nm technology and a supply voltage of 1.2 V. Q1, Q2, and Q3 are the outputs of the data registers; SI is the serial input of the TR chain. SO1, SO2, and SO are the outputs of the three triggering registers. In this simulation, the data registers hold the value 000 and the data shifted into the triggering registers is 110, which corresponds to the test vector 010. After the data is shifted into the TR chain, the new test vector is triggered in the DR register when the Enable signal is active (that is, in the Trigger mode).
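The argument that XORs reduce TR-chain transitions can be illustrated by counting the 0 → 1 and 1 → 0 edges in the serial data that enters the TR chain. This is only a rough proxy for TR switching (it ignores how far each edge travels), and the difference vectors below are hypothetical sparse examples whose 1s are isolated. The XOR-encoded stream carries the last bit of one vector into the next, following the rule described in Section 3.3.

```python
def serial_edges(stream: list[int]) -> int:
    """Number of 0->1 and 1->0 edges in a serial bit stream entering the TR chain."""
    return sum(a != b for a, b in zip(stream, stream[1:]))


def direct_stream(diffs: list[list[int]]) -> list[int]:
    """Without XORs: the difference bits themselves are shifted (a 1 marks a DR to enable)."""
    return [bit for d in diffs for bit in d]


def xor_stream(diffs: list[list[int]]) -> list[int]:
    """With XORs: each 1 of the difference data becomes a single edge (running XOR),
    and the last bit of one vector carries over into the next."""
    out, acc = [], 0
    for d in diffs:
        for bit in d:
            acc ^= bit
            out.append(acc)
    return out


# Hypothetical sparse difference vectors with isolated 1s.
diffs = [[0, 1, 0, 0, 1, 0, 0, 1], [0, 0, 0, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 1, 0]]
print("direct:", serial_edges(direct_stream(diffs)), "xor:", serial_edges(xor_stream(diffs)))
```

This prints `direct: 12 xor: 6`: each isolated 1 costs two edges in the direct form and one in the XOR form. The saving depends on sparsity; for a run such as 11111111 the ranking reverses (7 edges with XORs versus 0 without).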

Data compression. Data compression techniques usually compress the difference between successive vectors and then use on-chip hardware to regenerate the original vectors from the difference information [24], [25]. Both [24] and [25] use cyclical scan registers (CSRs) to reconstruct the original vectors from the difference vectors. A CSR consists of an XOR gate and a scan chain of length n (n is the length of the internal scan chain of the circuit). In STSA, the additional chain is used differently, to significantly improve features such as power utilization, execution time together with delay testing, and compression. The proposed architecture can be directly used to reconstruct the original test vectors from the difference information; this is accomplished by triggering the DR cells that must be flipped to


Fig. 7. (a) A two-transistor XOR. (b) The structure used for implementing the Clk Enable signal of the DR flip-flops.

Fig. 8. Low-power flip-flop structure [23]. (a) A scan chain with three cells. (b) Normal negative-edge-triggered flip-flop. (c) Comparator and inverter. (d) Gating logic.

Fig. 9. SPICE simulation of the proposed scan chain.



construct the next test vector. This is made possible by properly filling the TR chain such that the desired DR flip-flops are enabled when entering the Trigger mode. This process requires reformatting the difference information to make the test data compatible with this architecture. This capability makes the STSA suitable for algorithms that are based on compressing difference vectors. Test data compression can further reduce test time as well as the memory requirements of the tester.

3 TEST REFORMATTING AND COMPRESSION

Test vectors should be reformatted for use in the proposed architecture so that the original vectors are generated at the inputs of the CUT (that is, at the DR outputs of the scan cells). This section describes the process of generating test data for the proposed architecture from vectors provided for a circuit with a traditional scan. Changes are made to the test data to address power consumption while improving test time and data volume. The process of reformatting test vectors consists of the following steps.

3.1 Test Vector Reordering

In the first step, test vectors are reordered to further reduce the number of transitions in the DR chain. In the reordering process, similar vectors are placed next to each other to reduce the number of transitions between consecutive test vectors, hence also reducing the total number of transitions resulting from the entire test set. This technique is commonly used in data compression to reduce the number of 1s in the difference vectors. The Hamming distance is used as a measure of transition activity to estimate power consumption. A complete undirected graph is generated with the test vectors as nodes. Then, the Hamming distance between each pair of vectors is assigned as the weight of the edge connecting them. The solution to this problem consists of finding a path that traverses all nodes with minimum overall weight. This corresponds to the well-known problem of finding the minimal tour for the traveling salesman, which has been proven to be NP-complete [26]. Heuristics can be used to find nearly optimal solutions in polynomial time [29]. When the initial order of the test vectors is important (for example, for delay fault testing or sequential circuits), reordering should carefully consider the limitations of those particular cases.
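The paper cites polynomial-time heuristics [29] without fixing one, so the Python sketch below uses the nearest-neighbor heuristic as a stand-in, not necessarily the method used by the authors. Starting from a chosen vector, it repeatedly appends the remaining vector at minimum Hamming distance, building a low-weight path through the complete graph described above. The example vectors are hypothetical.

```python
def hamming(u: str, v: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(a != b for a, b in zip(u, v))


def greedy_reorder(vectors: list[str], start: int = 0) -> list[str]:
    """Nearest-neighbor heuristic for a low-weight Hamiltonian path on the
    complete graph whose edge weights are Hamming distances."""
    remaining = list(vectors)
    order = [remaining.pop(start)]
    while remaining:
        last = order[-1]
        nxt = min(range(len(remaining)), key=lambda j: hamming(last, remaining[j]))
        order.append(remaining.pop(nxt))
    return order


def total_transitions(order: list[str]) -> int:
    """Sum of Hamming distances between consecutive vectors (the path weight)."""
    return sum(hamming(a, b) for a, b in zip(order, order[1:]))


vectors = ["00000000", "11111111", "00000001", "11111110", "00000011"]
reordered = greedy_reorder(vectors)
print(total_transitions(vectors), "->", total_transitions(reordered), reordered)
```

For this set the path weight drops from 30 to 9. Nearest neighbor is quick (quadratic in the number of vectors) but carries no optimality guarantee; when the original order must be preserved, as noted above for delay fault testing, the reordering step is simply skipped or constrained.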

3.2 Extracting the Difference Vectors

The second step extracts the difference vectors from the reordered vectors. The difference vectors show the positions in which two consecutive vectors differ. $D_k$ denotes the difference vector of the two vectors $V_{k-1}$ and $V_k$ and is easily calculated as

$$D_k = V_{k-1} \oplus V_k \qquad (2)$$

$D_0$ denotes the difference between the initial state of the scan chain and the first test vector $V_0$. Difference vectors show where transitions are required to produce the desired test vectors, that is, the positions of the scan chain in which the DR flip-flops should be enabled in the Trigger mode to invert their values.
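Equation (2) and its effect in the Trigger mode can be checked with a small Python sketch. The helper names are hypothetical, the all-zero initial chain state in the example is an assumption, and `apply_triggers` is only a behavioral model of enabled DR flip-flops inverting their values.

```python
from typing import List


def difference_vectors(vectors: List[List[int]],
                       initial_state: List[int]) -> List[List[int]]:
    """Compute D_k = V_{k-1} XOR V_k for each vector, as in Eq. (2).

    D_0 is taken relative to the initial state of the scan chain. A 1 in
    D_k marks a DR flip-flop that must be enabled in Trigger mode so that
    its value is inverted.
    """
    diffs = []
    prev = initial_state
    for v in vectors:
        diffs.append([p ^ b for p, b in zip(prev, v)])
        prev = v
    return diffs


def apply_triggers(state: List[int], d: List[int]) -> List[int]:
    """Invert exactly the DR positions flagged by d (the Trigger-mode effect)."""
    return [s ^ t for s, t in zip(state, d)]


if __name__ == "__main__":
    init = [0, 0, 0, 0]                 # assumed all-zero initial chain state
    vecs = [[1, 0, 1, 1], [1, 1, 1, 0]]
    diffs = difference_vectors(vecs, init)
    print(diffs)                        # prints [[1, 0, 1, 1], [0, 1, 0, 1]]
    state = init
    for d in diffs:                     # rebuild V_0, then V_1, from the differences
        state = apply_triggers(state, d)
    print(state)                        # prints [1, 1, 1, 0]
```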

3.3 Generating the TR Chain Scan Data

In the Trigger mode, the DR flip-flops at the positions where transitions are required must be enabled; therefore, the two TR flip-flops connected to the XOR gate that enables the corresponding DR flip-flop must store different values. The 1s in the difference vectors are thus translated into transitions in the new scan data. Let $a_{k,i}$ and $N_{k,i}$ represent the bits at the $i$th position of the $k$th vector of the original test set and of the test set generated for the proposed architecture, respectively; $D_{k,i}$ represents the $i$th bit of the difference vector of the $(k-1)$th and $k$th original test vectors. The first bit of the new vector, $N_{0,0}$, can be selected as either 1 or 0. The other bits are calculated as follows:

$$N_{k,i} = N_{k,i-1} \oplus D_{k,i} \qquad (3)$$

Using (2), the new vectors are calculated from the reordered vectors by the following equation:

$$N_{k,i} = N_{k,i-1} \oplus a_{k-1,i} \oplus a_{k,i} \qquad (4)$$

For conversion, the first bit of each vector, $N_{k,0}$, can be selected arbitrarily. However, this may increase the number of transitions if the last bit of a vector differs from the value selected for the first bit of the next vector. Therefore, the last converted bit of each vector is used to start the next converted vector.
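A minimal Python sketch of the conversion in (3) follows. It assumes a hypothetical indexing convention not stated in this section: for $n$ DR cells the TR chain holds $n+1$ bits $N_{k,0},\dots,N_{k,n}$, and DR cell $i$ ($1 \le i \le n$) is enabled when TR bits $i-1$ and $i$ differ. The function names are also hypothetical.

```python
from typing import List


def tr_chain_vectors(diffs: List[List[int]], first_bit: int = 0) -> List[List[int]]:
    """Convert difference vectors into TR-chain scan data, per Eq. (3).

    Hypothetical convention: diffs[k][i - 1] holds D_{k,i} for i = 1..n, and
    each returned vector N_k has n + 1 bits (N_{k,0} .. N_{k,n}).
    """
    result = []
    seed = first_bit                    # N_{0,0}: arbitrary choice of 0 or 1
    for d in diffs:
        n_vec = [seed]
        for bit in d:
            # N_{k,i} = N_{k,i-1} XOR D_{k,i}: each 1 in D becomes a transition
            n_vec.append(n_vec[-1] ^ bit)
        result.append(n_vec)
        seed = n_vec[-1]                # next vector starts from the last converted bit
    return result


def enables(n_vec: List[int]) -> List[int]:
    """XOR of adjacent TR bits: the DR enable pattern seen in Trigger mode."""
    return [a ^ b for a, b in zip(n_vec, n_vec[1:])]


if __name__ == "__main__":
    n = tr_chain_vectors([[1, 0, 1, 1], [0, 1, 0, 1]])
    print(n)                            # prints [[0, 1, 1, 0, 1], [1, 1, 0, 0, 1]]
    print([enables(v) for v in n])      # prints [[1, 0, 1, 1], [0, 1, 0, 1]]
```

The XOR of adjacent TR bits recovers each difference vector exactly, which is what the Trigger mode relies on. Seeding each vector with the previous vector's last bit follows the rule stated above.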

Fig. 10 shows the process of converting the original test vectors into the new vectors required by the proposed architecture. Each vertical arrow indicates the conversion of a 1 in the difference vector into a transition in the TR chain scan data.

4 COMPRESSION/DECOMPRESSION

4.1 Compression

Jas and Touba [25] and Chandra and Chakrabarty [24] use different run-length codes to compress the difference between successive test vectors; however, both techniques use CSRs to reconstruct the original test vectors from the difference vectors, and neither considers power consumption during testing. The proposed STSA can be used to reconstruct test vectors from the difference vectors. A compression technique similar to [24] is utilized to reduce the data volume. The reformatted vectors preserve the information in the difference vectors. Since successive vectors are highly correlated, their difference vectors contain few 1s; because each 1 becomes a transition in the reformatted data, the new vectors are likely to contain long runs of 0s and 1s and can be compressed by a run-length encoding technique.
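The link between sparse difference vectors and long runs can be seen in a short Python sketch. The function names are hypothetical, and the reformatting here is just the running XOR of (3) applied to a single difference vector with an assumed seed bit of 0.

```python
from itertools import groupby
from typing import List, Tuple


def runs(bits: List[int]) -> List[Tuple[int, int]]:
    """Split a bit sequence into (symbol, run_length) pairs of identical bits."""
    return [(sym, len(list(grp))) for sym, grp in groupby(bits)]


def reformat_then_runs(d: List[int], seed: int = 0) -> List[Tuple[int, int]]:
    """Turn a difference vector into reformatted bits by a running XOR
    (each 1 in d becomes a transition), then return the runs of those bits."""
    bits = [seed]
    for b in d:
        bits.append(bits[-1] ^ b)
    return runs(bits)


if __name__ == "__main__":
    d = [0, 0, 0, 1, 0, 0, 1, 0]       # two 1s: two transitions in the new data
    print(reformat_then_runs(d))        # prints [(0, 4), (1, 3), (0, 2)]
```

A difference vector with $w$ ones always yields $w + 1$ runs in the reformatted bits, so highly correlated vectors produce few, long runs.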

Because a Golomb-like coding technique is used in this paper, Golomb coding is briefly described next. The most important parameter of Golomb coding is the group size $m$.
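As a baseline for the Golomb-like scheme, the following Python sketch implements a standard Golomb code with a power-of-two group size $m$: each run length $l$ is split into a prefix of $\lfloor l/m \rfloor$ ones followed by a 0, and a tail of $\log_2 m$ bits holding $l \bmod m$. This is the generic code, not the paper's Golomb-like variant, and the function names are hypothetical.

```python
def golomb_encode(length: int, m: int) -> str:
    """Golomb codeword for one run length, with group size m a power of two.

    Prefix: (length // m) ones followed by a zero (unary group index).
    Tail: length % m written in log2(m) bits.
    """
    if m < 1 or m & (m - 1):
        raise ValueError("this sketch assumes m is a power of two")
    k = m.bit_length() - 1              # log2(m)
    q, r = divmod(length, m)
    tail = format(r, f"0{k}b") if k else ""
    return "1" * q + "0" + tail


def golomb_decode(code: str, m: int) -> list:
    """Decode a concatenation of codewords back into run lengths."""
    k = m.bit_length() - 1
    lengths, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":           # count the unary prefix
            q += 1
            i += 1
        i += 1                          # skip the 0 that ends the prefix
        r = int(code[i:i + k], 2) if k else 0
        i += k
        lengths.append(q * m + r)
    return lengths


if __name__ == "__main__":
    code = "".join(golomb_encode(l, 4) for l in (5, 0, 3))
    print(code)                         # prints 1001000011
    print(golomb_decode(code, 4))       # prints [5, 0, 3]
```

Choosing $m$ trades prefix length against tail length: a larger $m$ suits longer runs.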


Fig. 10. Generating TR chain vectors from reordered test vectors.



5.3 Delay Faults

Delay faults are faults that affect the timing of the circuit without changing its logic function. In the presence of a delay fault, the propagation delay along one or more paths (not necessarily the critical path) exceeds the clock period. Testing a delay fault requires launching the appropriate transition at the input of the path and setting the required off-path inputs of the gates located on the path under test. Moreover, the circuit must be clocked at-speed after the application of each vector. Creating a transition at the circuit inputs requires applying different vectors in two consecutive clock cycles; therefore, delay fault tests consist of vector pairs. However, conventional scan cells cannot apply two arbitrary vectors in two consecutive clock cycles. The proposed scan cell allows the application of arbitrary vector pairs, hence making delay fault testing possible. This is accomplished as follows. Assume that the test generated for a delay fault consists of the vector pair $(V_1, V_2)$. First, the DR chain is loaded with $V_1$. Then, the TR vector required for changing the DR vector $V_1$ into $V_2$ is shifted into the E chain. At the next clock edge, the DR vector $V_1$ is changed to $V_2$. Thus, the proposed scan architecture can apply two arbitrary vectors to the CUT in two consecutive clock cycles. Fig. 13 shows this delay fault testing scenario, obtained from an HSPICE simulation of the scan chain in Fig. 8a. Another method for delay fault testing is the so-called enhanced scan technique [27]. However, enhanced scan increases the path delay during normal operation. Compared to enhanced scan, the proposed scan architecture not only reduces data volume, test time, and power, but also supports delay fault testing without increasing the delay during normal operation.
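The launch sequence for a vector pair can be summarized by a purely behavioral Python sketch. It assumes that the trigger data $V_1 \oplus V_2$ have already been shifted in without disturbing the DR outputs, and it ignores timing entirely. The function name is hypothetical.

```python
from typing import List, Tuple


def launch_vector_pair(v1: List[int], v2: List[int]) -> Tuple[List[int], List[int]]:
    """Return the CUT input states on two consecutive clocks for a delay test.

    Behavioral model only (assumed, not a timing model): the DR chain holds
    V1, the trigger data V1 XOR V2 are already in place, and one Trigger-mode
    clock inverts exactly the flagged DR cells.
    """
    dr = list(v1)                                    # clock t: CUT sees V1
    trigger = [a ^ b for a, b in zip(v1, v2)]        # DR cells that must toggle
    dr_next = [q ^ t for q, t in zip(dr, trigger)]   # clock t + 1: CUT sees V2
    return dr, dr_next


if __name__ == "__main__":
    print(launch_vector_pair([0, 1, 1, 0], [1, 1, 0, 0]))
    # prints ([0, 1, 1, 0], [1, 1, 0, 0])
```

Because the toggle pattern depends only on $V_1 \oplus V_2$, any pair $(V_1, V_2)$ can be produced, unlike a conventional scan chain, whose second vector is constrained by a shift or by the circuit response.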

6 EXPERIMENTAL RESULTS

The proposed architecture has been evaluated using fully specified vectors generated for some of the ISCAS 85 and 89 benchmark circuits [28]. The full-scan versions of the sequential benchmark circuits and predefined fully specified test vectors are utilized. Using partially specified test vectors would be expected to yield better results. In all tables, $n$ and $p$ represent the scan chain length and the number of test vectors in the test set, respectively. Table 2 shows the number of necessary and unnecessary transitions for the application of the test sets to the circuit inputs. TTB denotes the total number of unnecessary transitions in the scan chain. TNB and TNA denote the number of necessary transitions required for the application of the same test vectors to the inputs of the CUT before and after vector reordering, respectively. The last column of this table shows the efficiency of the reordering algorithm. In some cases, scan vector reordering significantly affects the number of transitions.
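The gap between unnecessary and necessary transitions can be illustrated with a simplified Python model. It assumes that the chain initially holds the previous vector (captured responses are ignored), that cell 0 is the scan input, and that every toggle of a conventional scan cell reaches the CUT, whereas the STSA applies only the Hamming distance to the CUT. These modeling choices and the function names are hypothetical and do not reproduce how Table 2 was computed.

```python
from typing import List


def conventional_shift_transitions(prev: List[int], new: List[int]) -> int:
    """Count flip-flop toggles while shifting `new` into a conventional scan chain.

    Simplified model: the chain starts with `prev`, cell 0 is the scan input,
    and on each clock cell i takes the old value of cell i - 1. Every toggle
    is also seen by the CUT inputs.
    """
    state = list(prev)
    toggles = 0
    for bit in reversed(new):           # last bit enters first so cell i ends with new[i]
        shifted = [bit] + state[:-1]
        toggles += sum(a != b for a, b in zip(state, shifted))
        state = shifted
    return toggles


def necessary_transitions(prev: List[int], new: List[int]) -> int:
    """Transitions the STSA applies to the CUT: the Hamming distance only."""
    return sum(a != b for a, b in zip(prev, new))


if __name__ == "__main__":
    print(conventional_shift_transitions([0, 0, 0, 0], [1, 0, 1, 0]))  # prints 6
    print(necessary_transitions([0, 0, 0, 0], [1, 0, 1, 0]))           # prints 2
    print(conventional_shift_transitions([1, 0, 1, 0], [1, 0, 1, 0]))  # prints 16
```

Even when the new vector equals the previous one, this 4-bit conventional chain toggles 16 times during shifting, while no DR transition is needed.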

The test vectors are reordered to reduce the number of transitions between vector pairs. This is accomplished by constructing a complete graph with the test vectors as nodes and the Hamming distance between two vectors as the weight of the edge connecting them; the Lin-Kernighan Heuristic (LKH) algorithm [29] is then run on this complete graph. The execution time of this algorithm is approximately $O(N^{2.2})$, where $N$ is the number of nodes in the graph (here, the $p$ test vectors) [29].

Fig. 14 shows the runtime of the reordering algorithm for the test vectors of the benchmark circuits.

In a conventional scan design, all transitions occurring in the scan chain are directly applied to the CUT. In the STSA, however, only the DR transitions are applied to the CUT. As power consumption is proportional to the number of transitions, a substantial reduction is achieved. As shown in Table 3, the Selective Trigger architecture reduces the transitions occurring in the logic circuitry of the CUT. The number of transitions in the flip-flops is also lower than in a conventional scan design. In this table, STT is the total number of transitions in a conventional scan design. STDR and STTR are the numbers of transitions in the DR and TR cells of the proposed STSA when applying the same test set. LTT and LTST are the number of transitions in the


Fig. 13. Timing diagram of delay testing with the proposed scan architecture.



applicability to delay fault testing. This is achieved by enabling the application of two arbitrary test vectors to the CUT in two consecutive clock cycles. The process of test data reformatting for this new architecture has been presented; a Golomb-like coding compression algorithm has been proposed and analyzed. Experimental results for various ISCAS benchmark circuits have confirmed the effectiveness of the proposed approach on predefined fully specified vectors. When embedded in a core by the provider, this architecture can be used by an integrator for different scan-based strategies in testable SoC designs.


TABLE 3 Number of Transitions in Scan Cells and Combinational Logic

TABLE 4 Percentage of Compression Obtained by Golomb-Like Coding

TABLE 6 Comparison between ATE Clock of the Proposed Method and Traditional Scan

TABLE 5 HSPICE Simulation

