adaptive 3d-ic tsv fault tolerance structure generation · pdf fileadaptive 3d-ic tsv fault...

12
Adaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei Yu, Member, IEEE Abstract—In three dimensional integrated circuits (3D-ICs), through silicon via (TSV) is a critical technique in providing vertical connections. However, the yield and reliability is one of the key obstacles to adopt the TSV based 3D-ICs technology in industry. Various fault-tolerance structures using spare TSVs to repair faulty functional TSVs have been proposed in literature for yield and reliability enhancement, but a valid structure cannot always be found due to the lack of effective generation methods for fault-tolerance structures. In this paper, we focus on the problem of adaptive fault-tolerance structure generation. Given the relations between functional TSVs and spare TSVs, we first calculate the maximum number of tolerant faults in each TSV group. Then we propose an integer linear programming (ILP) based model to construct adaptive fault-tolerance struc- ture with minimal multiplexer delay overhead and hardware cost. We further develop a speed-up technique through efficient min-cost-max-flow (MCMF) model. All the proposed method- ologies are embedded in a top-down TSV planning framework to form functional TSV groups and generate adaptive fault- tolerance structures. Experimental results show that, compared with state-of-the-art, the number of spare TSVs used for fault tolerance can be effectively reduced. Index Terms—3D-IC, fault-tolerance, TSV planning, TSV yield. I. I NTRODUCTION A S device feature sizes continue to rapidly decrease, the interconnect delay is becoming a bottleneck lim- iting IC performance. Three dimensional integrated circuits (3D-ICs) technology involves vertically stacking multiple dies connected by through silicon vias (TSVs), providing a promising way to alleviate the interconnect problem and achieve a significant reduction in chip area, wire-length and interconnect power [1]. Study indicates that the average wire- length of a 3D-IC varies according to the square root of the number of layers [2]. Moreover, 3D-ICs also offer the potential for heterogeneous integration, which is essential for More than Moore (MtM) technology [3]. 3D integration has already seen commercial applications in the form of 3D memory but there are still significant open problems in both research and implementation [4]. In this work, we will focus on the TSV reliability problem. This work was supported in part by the National Natural Science Foun- dation of China (NSFC) under grant No. 61674133, 61404123 and Anhui Provincial Natural Science Foundation (1508085MF134, China), and The Research Grants Council of Hong Kong SAR (Project No. CUHK24209017). S. Chen and Q. Xu are with Department of Electronic Science and Technology, University of Science and Technology of China, China (e-mail: [email protected], [email protected]). B. Yu is with the Department of Computer Science and Engineer- ing, The Chinese University of Hong Kong, NT, Hong Kong (e-mail: [email protected]). TSVs may be affected by various reliability issues such as undercut, misalignment, or random open defects [5]. Because there exist a large number of TSVs in a chip, these issues in turn lead to low chip yield. For example, [5], [6] reported a 60% chip yield for a chip with 20000 TSVs and only 20% yield for 55000 TSVs in IMEC process technology. Since yield and reliability is a primary concern in 3D ICs design, a robust fault-tolerance structure is imperative. In general, there are two types of yield losses in 3D-ICs: the yield loss due to defects in stacked dies and the yield loss due to defects occurred during assembling process [7]. For the former case, it is critical to conduct pre-bond testing to avoid the stacking of defective dies [8]. A number of die/wafer matching and inter-die repair strategies have also been proposed to increase the stack yield [9]–[12]. For the latter case, adding spare TSVs (referred to as s-TSVs) to repair fault functional TSVs (referred to as f-TSVs) is an effective method for enhancing yield. One key problem in TSV fault-tolerance design is the fault- tolerance structure generation, where a number of functional TSVs and one or several spare TSVs are grouped together to provide redundancy. Chen et al.[6] proposed a minimum spanning tree based method to group f-TSVs and form one- fault-tolerance structures. However, the method is difficult to be applied to multiple-fault-tolerance structure generation. Wang et al.[13] presented a regular TSV replacing chain structure that can repair faulty TSVs based on a realistic clustered defect model. Xu et al.[14] further considered the physical information of the TSV groups, and developed an ILP formulation for fault-tolerance structure generation. They model replaceable relations between f-TSVs, so the maximum input-port number of individual multiplexers can be effectively reduced. However, all previous works [13], [14] are under an assumption that a predetermined number of s-TSVs are assigned to each TSV group. To ensure that K common s-TSVs can be allocated to each f-TSV group, in each group f-TSV number is usually quite small, which introduces a large number of TSV groups. Since the total number of s-TSVs is proportional to the TSV group number, it may cause overuse of s-TSVs. To overcome the above issue, in this paper we propose an adaptive fault-tolerance structure, in which the number of tolerant faults is adaptively determined by the distribution of the f-TSVs and their candidate s-TSVs. A set of s-TSVs will be selected from a large amount of candidates. Our adaptive fault-tolerance structure generation method can achieve min- imal multiplexer delay overhead, as well as minimal number of required s-TSVs. Key technical contributions of this work arXiv:1803.02490v1 [cs.AR] 7 Mar 2018

Upload: vohuong

Post on 14-Mar-2018

231 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

Adaptive 3D-IC TSV Fault Tolerance StructureGeneration

Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei Yu, Member, IEEE

Abstract—In three dimensional integrated circuits (3D-ICs),through silicon via (TSV) is a critical technique in providingvertical connections. However, the yield and reliability is one ofthe key obstacles to adopt the TSV based 3D-ICs technology inindustry. Various fault-tolerance structures using spare TSVs torepair faulty functional TSVs have been proposed in literaturefor yield and reliability enhancement, but a valid structurecannot always be found due to the lack of effective generationmethods for fault-tolerance structures. In this paper, we focuson the problem of adaptive fault-tolerance structure generation.Given the relations between functional TSVs and spare TSVs,we first calculate the maximum number of tolerant faults in eachTSV group. Then we propose an integer linear programming(ILP) based model to construct adaptive fault-tolerance struc-ture with minimal multiplexer delay overhead and hardwarecost. We further develop a speed-up technique through efficientmin-cost-max-flow (MCMF) model. All the proposed method-ologies are embedded in a top-down TSV planning frameworkto form functional TSV groups and generate adaptive fault-tolerance structures. Experimental results show that, comparedwith state-of-the-art, the number of spare TSVs used for faulttolerance can be effectively reduced.

Index Terms—3D-IC, fault-tolerance, TSV planning, TSVyield.

I. INTRODUCTION

AS device feature sizes continue to rapidly decrease,the interconnect delay is becoming a bottleneck lim-

iting IC performance. Three dimensional integrated circuits(3D-ICs) technology involves vertically stacking multipledies connected by through silicon vias (TSVs), providinga promising way to alleviate the interconnect problem andachieve a significant reduction in chip area, wire-length andinterconnect power [1]. Study indicates that the average wire-length of a 3D-IC varies according to the square root ofthe number of layers [2]. Moreover, 3D-ICs also offer thepotential for heterogeneous integration, which is essentialfor More than Moore (MtM) technology [3]. 3D integrationhas already seen commercial applications in the form of 3Dmemory but there are still significant open problems in bothresearch and implementation [4]. In this work, we will focuson the TSV reliability problem.

This work was supported in part by the National Natural Science Foun-dation of China (NSFC) under grant No. 61674133, 61404123 and AnhuiProvincial Natural Science Foundation (1508085MF134, China), and TheResearch Grants Council of Hong Kong SAR (Project No. CUHK24209017).

S. Chen and Q. Xu are with Department of Electronic Science andTechnology, University of Science and Technology of China, China (e-mail:[email protected], [email protected]).

B. Yu is with the Department of Computer Science and Engineer-ing, The Chinese University of Hong Kong, NT, Hong Kong (e-mail:[email protected]).

TSVs may be affected by various reliability issues such asundercut, misalignment, or random open defects [5]. Becausethere exist a large number of TSVs in a chip, these issues inturn lead to low chip yield. For example, [5], [6] reported a60% chip yield for a chip with 20000 TSVs and only 20%yield for 55000 TSVs in IMEC process technology. Sinceyield and reliability is a primary concern in 3D ICs design, arobust fault-tolerance structure is imperative. In general, thereare two types of yield losses in 3D-ICs: the yield loss dueto defects in stacked dies and the yield loss due to defectsoccurred during assembling process [7]. For the former case,it is critical to conduct pre-bond testing to avoid the stackingof defective dies [8]. A number of die/wafer matching andinter-die repair strategies have also been proposed to increasethe stack yield [9]–[12]. For the latter case, adding spareTSVs (referred to as s-TSVs) to repair fault functional TSVs(referred to as f-TSVs) is an effective method for enhancingyield.

One key problem in TSV fault-tolerance design is the fault-tolerance structure generation, where a number of functionalTSVs and one or several spare TSVs are grouped togetherto provide redundancy. Chen et al. [6] proposed a minimumspanning tree based method to group f-TSVs and form one-fault-tolerance structures. However, the method is difficultto be applied to multiple-fault-tolerance structure generation.Wang et al. [13] presented a regular TSV replacing chainstructure that can repair faulty TSVs based on a realisticclustered defect model. Xu et al. [14] further consideredthe physical information of the TSV groups, and developedan ILP formulation for fault-tolerance structure generation.They model replaceable relations between f-TSVs, so themaximum input-port number of individual multiplexers canbe effectively reduced. However, all previous works [13],[14] are under an assumption that a predetermined numberof s-TSVs are assigned to each TSV group. To ensure thatK common s-TSVs can be allocated to each f-TSV group,in each group f-TSV number is usually quite small, whichintroduces a large number of TSV groups. Since the totalnumber of s-TSVs is proportional to the TSV group number,it may cause overuse of s-TSVs.

To overcome the above issue, in this paper we proposean adaptive fault-tolerance structure, in which the number oftolerant faults is adaptively determined by the distribution ofthe f-TSVs and their candidate s-TSVs. A set of s-TSVs willbe selected from a large amount of candidates. Our adaptivefault-tolerance structure generation method can achieve min-imal multiplexer delay overhead, as well as minimal numberof required s-TSVs. Key technical contributions of this work

arX

iv:1

803.

0249

0v1

[cs

.AR

] 7

Mar

201

8

Page 2: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

nt2

nt4

nt1

nt3

s1s2

f1

f2

f3 f4

(a)

1 2 3 4 1 2

Signal A Signal B Signal C Signal D

Signal A Signal B Signal C Signal D

(b)

1 2 3 4 1 2

Signal A Signal B Signal C Signal D

Signal A Signal B Signal C Signal D

f-TSV

s-TSV

MUX

(c)

Fig. 1: (a) An example of TSV group with four f-TSVs and two s-TSVs; (b) A fault-tolerance structure with large multiplexerdelay overhead; (c) A regular chain structure.

are listed as follows.• We are able to determine the maximum number of

tolerant faults, denoted as K, in polynomial time.• We present an integer linear programming formulation

in generating the adaptive K-fault tolerance structures.• We further propose an efficient min-cost-max-flow

(MCMF) based heuristic method to speed-up the K-faulttolerance structure generation.

• All the proposed methodologies are embedded in a top-down TSV planning framework to form f-TSV groupsand generate fault-tolerance structures.

Experimental results show that, compared with state-of-the-art, the proposed framework can reduce the number of useds-TSVs and maximum port number of multiplexers.

The remainder of this paper is organized as follows. SectionII presents the motivation and gives the problem formulation.The method for determining the maximum number of tolerantfaults is presented in Section III. Section IV and Section Vpresent the proposed ILP formulation and heuristic method.Section VI describes the proposed fault tolerance TSV plan-ning methodology. Section VII provides experimental results,followed by conclusion in Section VIII.

II. PRELIMINARIES

A. Chip Yield and TSV Yield

Consider a 3D IC containing l layers, and the yield ofith layer die is Ydiei . The yield for wafer-to-wafer (W2W)stacking Ystack can be roughly modeled as [7]:

Ystack =

l∏i=1

(Ydiei) (1)

Therefore, the defects exist in each die will certainly affectthe overall chip yield after stacking.

Besides, during bonding, any foreign particle caught be-tween the wafers can lead to peeling, as well as delamination,which dramatically reduces bonding quality and yield [15].YBonding captures the yield loss of the chip due to faults inthe bonding processes.

According to the cumulative yield property, the yield of a3D chip Y3D−chip can be formulated as follows [7]:

Y3D−chip = Ystack ·l−1∏i=1

(YBonding(i) · YTSV (i)), (2)

where YBonding(i) is the yield of the ith bonding step, andYTSV (i) is the TSV yield in the ith layer. In our work, wefocus on the yield enhancement of 3D chip in terms of TSVyield YTSV [13]. The total TSV yield YTSV is calculated bymultiplying all f-TSV group yield Ygj as follows.

YTSV =

N∏j=1

Ygj , (3)

where N is the number of f-TSV groups. In this paper weadopt the algorithm described in [13] for the calculation ofgroup yield Ygj .

B. TSV Fault-Tolerance Structure

By inserting the multiplexers (including control circuits)and carefully designing the reconfigurable TSV replacingpaths, we can construct TSV fault-tolerance structures, wherethe s-TSVs can be used to transfer signals in the presence offaulty f-TSVs [5].

Given an f-TSV planning result, we know the numberand positions of all f-TSVs. Then we perform a top-downiterative f-TSV partitioning to form f-TSVs groups and allo-cate s-TSVs in the whitespace for each group. The numberand positions of used s-TSVs for each f-TSV group aredetermined simultaneously in the f-TSV partitioning stage.Fig. 1(a) shows an example of a TSV group with four f-TSVs(f1 · · · f4) and two s-TSVs (s1 and s2). Here f1 · · · f4 belongto nets nt1 · · ·nt4, respectively. The dashed large rectanglesrepresent the bounding boxes of different nets. Without lossof generality, we denote the bounding box of an f-TSV fi asthe bounding box of the net fi belonging to. We say that anf-TSV fi can be replaced by another TSV v, if and only if vis located inside or nearby the bounding box of fi. Note thathere the TSV v can be either f-TSV or s-TSV. For example,

Page 3: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

nt2

nt4

nt1

nt3

s1

f1

f2

f3

f4

nt5

f5

s3

s2

s4

(a)

Signal A Signal C Signal D Signal E

Signal B Signal C Signal D Signal E

2 3 4 5 1 22

Signal A

1

Signal B

f-TSV

s-TSV

MUX

3

1 2 3 4 5 6

7 8 9 10 11

a b

y

s

s01

yab

MUX

a b

y

s

s0

1

yab

c

0010 c

(b)

Fig. 2: (a) An example of TSV group with five f-TSVs and four s-TSVs, which cannot be handled by previous works; (b) Theadaptive fault tolerance structure generated by our proposed methodology.

f1 is replaceable by f2, f3, s1, s2, since these four TSVs arecovered by the bounding box of f1.

Given a TSV group with some f-TSVs and K s-TSVs, aK-fault tolerance structure includes K independent directedTSV-replacing paths from each f-TSV to s-TSVs. In thisstructure we can repair at most K faulty f-TSVs throughmultiplexer rerouting. For instance, for the TSV group shownin Fig. 1(a), a 2-fault tolerance structure with two s-TSVs canbe generated as in Fig. 1(b), where each f-TSV is directlyconnected to all s-TSVs. Although the design scheme is verysimple, this structure suffers from large delay overhead dueto large multiplexer input size. Some recent works [13], [14]proposed regular K-fault tolerance structure, as shown inFig. 1(c). Here each f-TSV is regularly connected to tworight side neighbouring TSVs and the rightmost f-TSVs areconnected to s-TSVs. Instead of 4-port multiplexers occupiedin Fig. 1(b), here only 3-port multiplexers and 2-port mul-tiplexers are needed. For each f-TSV, the independent TSV-replacing paths are listed as follows.

f1: {f1 → f3 → s1}, {f1 → f2 → f4 → s2}.f2: {f2 → f3 → s1}, {f2 → f4 → s2}.f3: {f3 → s1}, {f3 → f4 → s2}.f4: {f4 → s1}, {f4 → s2}.To ensure the existence of fault-tolerance structures in TSV

groups, the previous works (e.g. [13], [14]) form TSV groupsunder two constraints: (1) K fault-tolerance structures useexactly K s-TSVs and (2) an f-TSV in a group can bereplaced by any s-TSV within the group. Fig. 1(a) shows anexample of TSV group having two-fault tolerance structures,where all the f-TSVs, f1, f2, f3, and f4, can be replacedby both s1 and s2 considering the net bounding boxes.Unfortunately, general cases may violate these constraints.Fig. 2(a) shows a generalized example, where five f-TSVs(f1 · · · f5) and four s-TSVs (s1 · · · s4) are involved. Thereplaceable relations between TSVs are shown in Fig. 3(a).In this TSV group, the constraint (1) is violated since wecannot find two-fault tolerance structures if only two s-TSVsare used. The constraint (2) is also violated even if the groupis partitioned into smaller groups since f2 have no replaceable

s-TSVs. Consequently, the method in [13] cannot generatecost-effective fault-tolerance structures for this TSV group,because f2 has no candidate s-TSVs. The ILP-based methodin [14] cannot generate fault-tolerance structures for thisTSV group since the number of tolerant faults is unknown.However, the f-TSV group definitely includes a two-faulttolerance structure as shown in Fig. 3(c), where three outof four s-TSVs are used in the fault-tolerance structure. Thepossible TSV replacing paths are as follows.

f1: {f1 → s1}, {f1 → f2 → f3 → f4 → s2}.f2: {f2 → f5 → f1 → s1}, {f2 → f3 → f4 → s2}.f3: {f3 → s1}, {f3 → f4 → s2}.f4: {f4 → f3 → s1}, {f4 → s2}.f5: {f5 → f1 → s1}, {f5 → s3}.In reality, there is no essential difference between the

f-TSVs and s-TSVs. Therefore, the existing TSV testingtechnique can be directly adopted to test the f-TSVs and s-TSVs [16]. And the control signal of multiplexers can be setto determine the direction of signal transfer. As shown inFig. 2(b), the control signal of 2-to-1 and 3-to-1 multiplexerare 1-bit and 2-bit, respectively. When all TSVs are fault-freeor existing faulty s-TSVs, the control signals of each multi-plexer are set to transfer signal through their corresponding f-TSVs. But once an f-TSV is faulty, the reconfigurable routingpaths can be determined by the corresponding control signalof multiplexers. For instance, when f-TSV 1 is faulty, thecontrol signals of multiplexer 6 and 7 are set to 0 and 10,causing s-TSV 1 to reroute the signal A.

C. Hardware Cost and Multiplexer Delay Overhead

The hardware cost incurred by the fault-tolerance structurecan be divided into several parts, including the area overheaddue to inserted s-TSVs, related control logic (i.e., MUXes),and re-routing interconnect [13]. And the cost is dominatedby the first two parts [12]. Jiang et al. [17] point out that thearea of control logic is negligible compared with the TSVsize and the TSV manufacturing cost is much larger thanlogic gates. Therefore, in order to reduce the hardware cost,

Page 4: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

f5

f2

f3 f4

f1

s1

s3

s2

s4

(a)

f5

f5'f1 f1'

f2

f2'

f3 f3' f4 f4'

s1 s1'

s2 s2'

s3 s3's4

s4'

(b)

f5

f2

f3 f4

f1

s1

s3

s2

(c)

Fig. 3: (a) The corresponding directed graph G of layout in Fig. 2(a); (b)The corresponding splitting graph G′; (c) 2-faulttolerance structure on graph G.

we should reduce the number of s-TSVs used in the fault-tolerance structures.

The delay of a multiplexer is increased along with thenumber of ports. Therefore, a large multiplexer will introducelarge delay overhead. Moreover, the proposed TSV faulttolerance planning is performed in floorplanning stage andwe have no exact timing information. If we minimize themultiplexer delay overhead in this stage, we could alleviatethe timing closure issue in next placement and routing stage.Therefore, in our work, we consider the multiplexer delayoverhead as one of the optimization objectives.

D. Problem Formulation

From the example in Fig. 2, we can see that we confrontnew design challenging if not all s-TSVs can be occupied inconstructing K-fault tolerance structure. Given a TSV groupwith m f-TSVs and n s-TSVs, we first construct a directedgraph G(V,E) consisting of all TSV replaceable relations.Here vertex set V = V1 ∪ V2, where V1 = {fi|i = 1, · · · ,m}is the f-TSVs set and V2 = {si|i = 1, · · · , n} is the s-TSVsset. Besides, the edge set E = {(u, v)|u ∈ V1 ∧ v ∈ V ∧u can be replaced by v}. Given the TSV group in Fig. 2(a),the corresponding replaceable relation graph is shown inFig. 3(a).

We define the problem of TSV fault-tolerance structuregeneration as follows.

Problem 1. Given a TSV group with m f-TSVs and n s-TSVs, and the directed graph G(V,E), we search for themaximum number of tolerant faults K. Then we generate aK-fault tolerance structure, which includes K independentTSV replacing paths (vertex-disjoint) for each f-TSVs, tominimize both the multiplexer delay overhead and the numberof used s-TSVs.

Notice that the yield of the TSV group is evaluated basedon the allocated s-TSVs and the f-TSVs. With the yieldsof the TSV groups, the total TSV yield can be calculatedas discussed in Section II-A. If the target TSV yield is notsatisfied, a TSV group will be selected and partitioned intotwo smaller new TSV groups, where the above TSV fault-tolerance structure generation problem will be solved again.New TSV groups will be iteratively generated until the targetchip yield is satisfied.

III. MAX FLOW BASED METHODOLOGY

Given a TSV group with replaceable relation graph G, wesay the TSV group has a K-fault tolerance structure if eachf-TSV f ∈ V1 has K paths to s-TSV vertices in G. Besides,for each f-TSV f , the paths are vertex-disjoint except the fitself. In this section, we develop a polynomial time algorithmto determine the K value in a TSV group. Our methodologyis based on the Menger’s theorem as follows.

Lemma 1 (Menger’s theorem [18]). Let G be a directedgraph, and let S and T be distinct vertices in G. Then themaximum number of vertex-disjoint S-T paths is equal to theminimum size of an S-T disconnecting vertex set.

Here the S-T disconnecting vertex set represents a vertexset whose removal will cause no paths from any vertex in Sto any vertex in T . According to Lemma 1, for each f-TSVf , the number of vertex-disjoint paths Nd(f) equals to theminimum size of the {f}-V2 disconnecting vertex set in G.For example, in Fig. 3(a), {f2, s1} is a minimum {f1}-V2disconnecting vertex set. Therefore, the number of vertex-disjoint paths, Nd(f1), equals to 2. Based on above lemma,we reach the following theorem:

Theorem 1. Given the replaceable relation graph, the max-imum number of tolerant faults, K, can be determined inpolynomial time, as follows:

K = minf∈V1

{Nd(f)}. (4)

Since vertex-disjoint problem is not easy to model, we per-form vertex splitting on G(V,E) so that it can be transformedto an edge-disjoint problem, which can be appropriatelymodelled in a maximum flow problem. Each vertex u ∈ V issplit into two vertices u and u′, respectively, corresponding tothe vertex’s input and output, and an extra edge (u, u′) withzero cost is also added. A new directed graph G′(V ′, E′) isconstructed as follows.• The vertex set V ′ = V ∪ V ′1 ∪ V ′2 , where V ′1 is the split

vertex set of V1 and V ′2 is the split vertex set of V2.• The edge set E′ = E′1 ∪ E′2, where E′1 = {(u, u′)|u ∈V ∧u′ is the corresponding split vertex of u} and E′2 ={(u′, v)|(u, v) ∈ E(G) ∧ u′ is the corresponding splitvertex of u}. If there is a directed edge from u to vin E(G), a corresponding directed edge from u′ to v isadded in E′(G′).

Page 5: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

TABLE I: Notations used in ILP.V , V ′ set of f-TSVs and s-TSVs, set of split f-TSVs and split

s-TSVs

V1, V ′1 set of f-TSVs, set of split f-TSVs

V2, V ′2 set of s-TSVs, set of split s-TSVs

fi, f ′i f-TSV in V1, split f-TSV in V ′

1

sj , s′j s-TSV in V2, split s-TSV in V ′2

E′ set of all edges in graph G′

E′1 set of all splitting edges in graph G′ (fi → f ′

i and sj →s′j )

E′2 set of all replaceable edges in graph G′

(w,w′) edge in E′1 and w in V2

s, t split f-TSV in V ′1 , split s-TSV in V ′

2

v(s,t) binary variable; if a unit flow (path) exists from s to t thenv(s,t) = 1, otherwise v(s,t) = 0

(v, u) edge in E′

x(s,t)vu binary variable; if a unit flow (path) from s to t goes through

edge (v, u), then x(s,t)vu = 1, otherwise x

(s,t)vu = 0

dvu binary variable on edge (v, u); if a unit flow (path) goesthrough edge (v, u), then dvu = 1, otherwise dvu = 0

Based on the splitting graph, the maximum number oftolerant faults K can be determined in polynomial timeby solving a max-flow problem [18] for each f-TSV. Forinstance, given the replaceable relation graph G(V,E) inFig. 3(a), Fig. 3(b) illustrates the splitting graph G′(V ′, E′).The number of edge-disjoint paths for each f-TSV are asfollows, Nd(f1) = 2, Nd(f2) = 2, Nd(f3) = 3, Nd(f4) = 3and Nd(f5) = 3. Since f1 and f2 have only two edge-disjointpaths, the maximum number of tolerant faults, K, equals to2.

The fault-tolerance structure can be generated by findingm×K paths, which begin with each split f-TSV in V ′1 andend with split s-TSV in V ′2 . In addition, all the paths sharingone same source vertex should be edge-disjoint. In the nexttwo sections, we will propose an ILP based algorithm and amin-cost max-flow based heuristic method to generate the K-fault tolerance structure in minimizing both the used s-TSVnumber and the multiplexer delay overhead.

IV. INTEGER LINEAR PROGRAMMING FORMULATION

In this section, we discuss how the K edge-disjoint pathsearch problem can be formulated as an integer programming.For convenience, some notations used in this section are listedin TABLE I.

First, an integer programming formulation in [14] is givento generate the fault-tolerance structures with minimizationof the multiplexer delay overhead.

To model the delay of each multiplexer, it is of importancecalculating indegree of each vertex u ∈ V . As shown inFig. 3(b), the edge (f ′2, f3) is on the path from f ′1 to s′2,as well as the path from f ′2 to s′2. Although the same edge istraversed by two paths, it only increases the indegree of f3by one. Meanwhile, there may be several edges directed intosame TSV vertex on the paths. For instance, due to edges(f ′2, f3) and (f ′4, f3), the indegree of f3 should be increased

by two. Given a vertex u ∈ V , its indegree is calculated bythe following equation:

indegree(u) =∑

v:(v,u)∈E′

min(∑

s∈V ′1 ,t∈V ′

2

x(s,t)vu , 1). (5)

The starting integer programming formulation of fault-tolerance structure generation problem in [14] is shown inFormula (6). The objective function in Formula (6) is tominimize the maximum indegree of all the vertices. Thenumber of binary variables x(s,t)vu is m× n× |E′|, where mis the number of f-TSVs, n is the number of s-TSVs, while|E′| is the number of edges in split directed graph G′. Theconstraint (6a) defines a unit flow from s ∈ V ′1 to t ∈ V ′2 ,which corresponds a path from s, an f-TSV, to t, an s-TSV.The number of this set of constraints is m × n × |V ′|. Theconstraint (6b) ensures that a set of V ′2 paths, which have thesame source s ∈ V ′1 , are edge-disjoint. The number of thisset of constraints is m× (m+ n).

min maxu∈V

indegree(u) (6)

s.t.∑

v:(u,v)∈E′

x(s,t)uv −∑

v:(v,u)∈E′

x(s,t)vu = 1, if u = s,0, if u ∈ V ′ − {s, t},−1, if u = t;

∀s ∈ V ′1 , t ∈ V ′2 ,

(6a)∑t∈V ′

2

x(s,t)uu′ ≤ 1, ∀s ∈ V ′1 , (u, u′) ∈ E′1, (6b)

x(s,t)vu ∈ {0, 1}, ∀(v, u) ∈ E′, s ∈ V ′1 , t ∈ V ′2 . (6c)

Though the integer programming method in [14] can gener-ate K fault-tolerance structures using K s-TSVs, the methodcannot be directly applied for the generation of adaptive fault-tolerance structures, where the number of s-TSVs might belarger than K in K fault-tolerance structures. Then a newinteger programming formulation is proposed to generateadaptive fault-tolerance structures in minimizing both theused s-TSV number and the multiplexer delay overhead. Thenumber of s-TSVs used in the structure can be calculated bythe Equation (7).

usedstsv =∑w∈V2

min(∑

s∈V ′1 ,t∈V ′

2

x(s,t)ww′ , 1). (7)

Based on the above notations, the edge-disjoint path searchproblem can be formulated as the following integer program-ming (8).

Compared with the integer programming (6), in constraint(8a) a new binary variable v(s,t) is introduced to indicatewhether a unit flow (path) exists from source s ∈ V ′1 to sinkt ∈ V ′2 . Besides, a new constraint (8b) is defined to ensurethat there will be K paths from each source s ∈ V ′1 to verticesin V ′2 . The number of this set of constraints is m. By this way,Formula (8) can be applied for any K ≤ n and additionallyminimize the number of required s-TSVs in the structure,while Formula (6) can only be applied for the case K = n.

Page 6: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

min {maxu∈V

indegree(u) + usedstsv} (8)

s.t.∑

v:(u,v)∈E′

x(s,t)uv −∑

v:(v,u)∈E′

x(s,t)vu = v(s,t), if u = s,0, if u ∈ V ′ − {s, t},−v(s,t), if u = t;

∀s ∈ V ′1 , t ∈ V ′2 ,

(8a)∑t∈V ′

2

v(s,t) = K, ∀s ∈ V ′1 . (8b)

v(s,t) ∈ {0, 1}, ∀s ∈ V ′1 , t ∈ V ′2 , (8c)(6b)− (6c).

Formula (8) is non-linear due to the min-max-min and min-min operations in the objective function. Through linearizingthe objective function, Formula (8) can be transformed into aninteger linear programming (ILP) Formula (9). For each edge(v, u) ∈ E′, an extra binary variable dvu and extra constraints(9a)-(9c) are introduced to replace the min operation inFormula (5) and (7). Besides, the extra constraint (9d) ensuresthat the indegrees of all TSVs will not be greater than λ1.Another extra constraint (9e) ensures that the number of s-TSVs used in the structure equals to λ2.

min (λ1 + λ2) (9)

s.t. dvu ≥ x(s,t)vu , ∀s ∈ V ′1 , t ∈ V ′2 , (v, u) ∈ E′,

(9a)

dvu ≤∑

s∈V ′1 ,t∈V ′

2

x(s,t)vu , ∀(v, u) ∈ E′, (9b)

dvu ∈ {0, 1}, ∀(v, u) ∈ E′, (9c)∑v:(v,u)∈E′

dvu ≤ λ1, ∀u ∈ V , (9d)

∑(w,w′)∈E′

1

dww′ = λ2, ∀w ∈ V2, (9e)

(6b)− (6c), (8a)− (8c).

For instance, as shown in Fig. 3(b), the blue lines presentedge-disjoint paths for each split f-TSV, and the correspond-ing generated 2 fault-tolerance structure is shown in Fig. 2(b).

V. HEURISTIC FRAMEWORK

For large TSV groups, the ILP based method is very timeconsuming. Consequently, in this section, we propose a min-cost-max-flow (MCMF) based heuristic method to solve theedge-disjoint path problem. The basic idea is to deal with thef-TSVs one by one and, for each f-TSV, a min-cost-max-flowalgorithm is used to find K independent paths. The edge costsare defined to keep the input port number of multiplexer andthe number of s-TSVs as small as possible.

A. Network graph model

In order to find K (K ≤ n) edge-disjoint paths for an f-TSV f ∈ V1, we construct a directed graph Gs(Vs, Es) fromG′ by adding an extra sink vertex t and some edges. Thevertex set Vs contains two portions, Vs = V ′ ∪ {r}, and r isthe sink vertex. The edge set Es = E′ ∪ {V ′2 → r}.

When finding edge-disjoint paths for a certain TSV fi ∈V1, the edge capacities are defined as follows: the capacityof the edge from fi to its splitting vertex f ′i equals to K;while the capacities of all the other edges are set to 1. Thecapacity constraints ensure that we can find up to K edge-disjoint paths from f ′i to s-TSV vertices, which correspondto K independent TSV-replacing chains for the TSV fi.

For the splitting edges corresponding to f-TSVs, the edgecosts are defined as zero while the splitting edges of s-TSVsare defined as follows.

ecs(w,w′) =

0, if (w,w′) ∈ E′1, w ∈ V2, and w hasbeen used.CK , if (w,w′) ∈ E′1, w ∈ V2, and whas not been used.

(10)C is constant, which represents the costs of introducing

a new s-TSV for constructing the fault-tolerance structure.And the edge costs tend to restrict the use of s-TSVs. In theexperiment, we set C to 3 by the experimental results shownin Section VII-A.

For the edges in E′2, which correspond to the replaceablerelations between TSVs, the edge costs are defined as follows.

ecs(u, v) =

0, if (u, v) ∈ E′2 and (u, v) correspondsto a TSV connectionCtc[v], if (u, v) ∈ E′2 and (u, v) does notcorrespond to a TSV connection

(11)In the edge cost function (11), tc[v] is defined to be the

number of edges that end at v and have been used as TSVconnections in the generated partial fault-tolerance structure,that is, the edges that have been traversed by edge-disjointpaths of some other f-TSVs. Therefore, tc[v] corresponds tothe input port number of the multiplexer in the input side ofthe TSV v.

With this edge costs function, firstly, we tend to make fulluse of existing TSV connections to build the edge-disjointpaths for the current f-TSV since it will not increase the inputports of the multiplexers.

Secondly, to minimize the maximum size of multiplexers,the costs of the edges that do not correspond to TSV connec-tions are defined as the exponential function of tc[v].

B. Algorithmic flow of heuristic

The algorithmic flow of the proposed heuristic is summa-rized in Algorithm 1. Because the quality of solution dependson the order of f-TSVs selected, an iterative post-processingstage is used to improve the generated fault-tolerance struc-tures. In the post-processing stage, we randomly select anf-TSV, and define the edge costs based on the TSV paths ofall the other f-TSVs. Then we re-solve the min-cost-max-flowmodel to find edge-disjoint paths for the selected f-TSV. The

Page 7: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

f5

f5'f1 f1'

f2f2'

f3 f3' f4 f4'

s1 s1'

s2 s2'

s3 s3'

s

t

(1, 1)

(1, 1) (2, 0)

(1, 1)

(1, 0)(1, 1)

(1, 1)

(1, 0)

(1, 1)

(1, 1)

(1, 1)(1, 1)(1, 0) (1, 0)

(1, 1)

(1, 1)

(1, 1)(1, 1)

(1, 9)

(1, 9)(1, 0)

(1, 9)

(1, 0)

(1, 0)

s4 s4'(1, 1)(1, 9)

(1, 0)

f1 f1'

f2f2'

f3 f3'

s1 s1'

s2 s2'

(a)

f5

f5'f1 f1'

f2f2'

f3 f3' f4 f4'

s1 s1'

s2 s2'

s3 s3'

t

(1, 1)

(1, 3) (1, 0)

(1, 0)

(2, 0)(1, 0)

(1, 1)

(1, 0)

(1, 0)

(1, 0)

(1, 3)(1, 1)(1, 0) (1, 0)

(1, 3)

(1, 3)

(1, 1)(1, 3)

(1, 0)

(1, 9)(1, 0)

(1, 0)

(1, 0)

(1, 0)

s4 s4'(1, 1)(1, 9)

(1, 0)

f5

f5' f1 f1'

f2f2'

f3 f3'

s1 s1'

s2 s2'

(b)

f5

f5'f1 f1'

f2f2'

f3 f3' f4 f4'

s1 s1'

s2 s2'

s3 s3'

t

(1, 1)

(1, 0) (1, 0)

(1, 0)

(1, 0)(1, 0)

(1, 0)

(1, 0)

(1, 0)

(1, 0)

(1, 3)(1, 1)(2, 0) (1, 0)

(1, 3)

(1, 3)

(1, 1)(1, 3)

(1, 0)

(1, 9)(1, 0)

(1, 0)

(1, 0)

(1, 0)

s4 s4'(1, 1)(1, 9)

(1, 0)

f5

f5' f1 f1'

f2f2'

f3 f3'

s1 s1'

s2 s2'

(c)

Fig. 4: Label on edges represents (capacity, cost): (a) The min-cost-max-flow network for f-TSV f ′1, where the two edge-disjoint paths for f ′1: {f ′1→ s1→ s′1} and {f ′1→ f2→ f ′2→ f3→ f ′3→ s2→ s′2}; (b) After solving f ′1, the min-cost-max-flownetwork for f-TSV f ′2, where the two edge-disjoint paths for f ′2: {f ′2 → f5 → f ′5 → f1 → f ′1 → s1 → s′1} and {f ′2 → f3 →f ′3 → s2 → s′2}; (c) After solving f ′1 and f ′2, the min-cost-max-flow network for f-TSV f ′3, where the two edge-disjoint pathsfor f ′3: {f ′3 → s1 → s′1} and {f ′3 → s2 → s′2}.

Algorithm 1 Pseudo code of our heuristic methodInput: A directed graph G′(V ′, E′), which contains m f-TSVs and n s-TSVs.Output: A repairable structure including m × K paths.

1: for f-TSV fi ← 1 to m do2: Construct a directed graph Gs(Vs, Es) for fi;3: . Find K edge-disjoint paths for fi;4: Solve the MCMF model for fi;5: end for6: . Perturb the repairable structure;7: while no coverage do8: Randomly select an f-TSV fi;9: Resolve edge-disjoint paths for fi by MCMF;

10: Record the maximum number of TSV connections onall TSVs;

11: end while

f5

f5'f1 f1'

f2

f2'

f3 f3' f4 f4'

s1 s1'

s2 s2'

s3 s3'

Fig. 5: The generated 2-fault tolerance structure by solvingedge-disjoint paths for all f-TSVs, where the TSV connectionsare shown in solid edges.

procedure is repeated until the multiplexer maximum inputport number keeps unchanged over a predefined threshold

iteration number.

Fig. 4(a) – Fig. 4(c) illustrate the process of the heuristicmethod. We choose the f-TSV f ′1 to start with. The min-cost-max-flow network for f ′1 is shown in Fig. 4(a). All thecosts of edges that end at f-TSVs and s-TSVs are initializedat 1 since there are no any other f-TSV paths and for all v,tc[v] = 0. By solving the min-cost-max-flow, 2 edge-disjointpaths, which correspond to two independent TSV replacingchains for f1, are obtained and the TSV connections (solidedges) in the partial fault-tolerance structure.

With the 2 edge-disjoint paths for f1, the flow network isupdated (edge costs and capacities) for f-TSV f ′2 and shownin Fig. 4(b). The edges that are on the edge-disjoint paths off1 have zero costs. Considering the vertex s1, for example, theedge (f ′1, s1) has zero costs since it has been traversed by theTSV path of f1 while the edges (f ′3, s1) and (f ′4, s1) have acost of 3 because the both edges are not traversed by any TSVpaths of f1 and tc[s1] = 1. A new TSV connection will beintroduced if we use (f ′3, s1) or (f ′4, s1) on the edge-disjointpaths for f2, which increase the input ports of multiplexer inthe input side of the TSV s1. With the updated network, wecan find two edge-disjoint paths from f ′2 to s-TSVs by makinguse of the existing TSV connections as many as possible,which potentially reduces the TSV connections on individualTSVs and minimizes the maximum number of the input portsof multiplexers. The bottom part of Fig. 4(b) shows the TSVconnections in the updated partial fault-tolerance structure.

Repeating the same process until the min-cost-max-flowmodel is solved for all f-TSVs, we obtain 2 edge-disjointpaths from each split f-TSV vertex in V ′1 , f ′1 · · · f ′5, to splits-TSV vertices in V ′2 , s′1 · · · s′3, as shown in Fig. 5. Here thesolid edges are TSV connections.

Page 8: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

Determine the number of

tolerant faults for new groups

Allocate s-TSVs to new

groups

f-TSV planning results

Generate fault-tolerance

structure

TSV planning

solution

Calculate the chip yield

Satisfy target

yield?Partition the group

with smallest yield

Y

N

Fig. 6: The flow of the proposed fault tolerance TSV planning.

VI. FAULT TOLERANCE TSV PLANNING

In this section, we discuss a top-down fault toleranceTSV planning framework to form f-TSV groups and generateadaptive fault-tolerance structures. The number of f-TSVgroups is greatly reduced as well as the total number of s-TSVs because of adaptive fault-tolerance structures.

Given an f-TSV planning result and the floorplan of theblocks, we know the number and positions of all f-TSVs.Then f-TSV groups are firstly formed using a top-downiterative f-TSV partitioning under the yield constraint and,then, the adaptive fault-tolerance structures are generated foreach group. In each iteration of the f-TSV partitioning stage,the group with the smallest yield will be partitioned into twonew f-TSV groups using the min-cut bi-partitioning algorithmand the required s-TSVs are also allocated for evaluatingthe group yield. The iterative f-TSVs partitioning is repeateduntil the target chip yield is satisfied. Therefore, the numberand position of required s-TSVs for each f-TSV group aredetermined simultaneously in the f-TSV partitioning stage.

The chip yield is the product of group yield, which dependson the maximum number of tolerant faults (K), the numberof TSVs, and the defect probability of TSVs as discussedin Section II-A. We construct the replaceable relation graphG, whose vertex set includes the f-TSVs in the group andthe corresponding candidate s-TSVs, for computing K andallocating s-TSVs. The maximum number of tolerant faults,K, can be determined in polynomial time by solving a max-flow problem on G, as discussed in Section III. The min-cost-max-flow based heuristic in Section V is used to temporarilygenerate an adaptive K-fault tolerance structure, thus thenumber of required s-TSVs are determined.

Finally, the ILP based method in Section IV and the min-cost-max-flow (MCMF) based heuristic in Section V can beadopted to generate adaptive fault-tolerance structures withminimization of both the multiplexer delay overhead and thehardware cost. Fig. 6 illustrates the proposed TSV planningframework.

In [13], a greedy method is used to partition f-TSVs intogroups and then an ILP formulation is adopted to allocates-TSVs for each group. The generation of fault-tolerancestructure is not considered since they assume regular struc-

tures always exist. In [14], the TSV planning frameworkincludes a top-down partitioning followed by a bottom-upiterative merging (clustering) for reducing the number of f-TSV groups. Then, a min-cost-max-flow based method isused to allocate s-TSVs for each group and an ILP modelis adopted to generate fault-tolerance structures. The samenumber of s-TSVs are allocated to all the f-TSV groups in[13], [14] and, for an f-TSV group, the key point is to ensureenough number of candidate s-TSVs that can be shared byall the f-TSVs in the group. As a result, many small f-TSVgroups are formed, which potentially causes an overuse ofs-TSVs.

Compared with the above mentioned two works, the pro-posed TSV planning framework includes a similar top-downpartitioning stage, but the allocation of s-TSVs during thepartitioning is very different. That is because adaptive fault-tolerance structures with various number of s-TSVs are builttemporarily by solving a sequence of min-cost max-flowproblem.

VII. EXPERIMENTAL RESULTS

The proposed algorithms have been implemented in C++language and tested on a 12-core 2.0 GHz Linux server with64 GB RAM. The TSV pitch is assumed to be 5um×5um[3]. LEDA [19] is adopted to solve the max-flow and themin-cost-max-flow problems. GLPK [20] is used as the ILPsolver. hMetis [21] is adopted on f-TSVs partitioning.

A. Effectiveness and Efficiency of Fault-Tolerance StructureGeneration Method

We generate several TSV replaceable relation graphs G11–G18 by using the proposed TSV planning framework onMCNC and GSRC benchmarks. Each graph contains f-TSVsand the corresponding candidate s-TSVs, which are coveredby at least one of the bounding boxes of the f-TSVs. In orderto compare the proposed ILP model with the ILP methodin [14] on G11–G18, we adapt the ILP formulation in [14]here. To generate the K-fault tolerance structure on a TSVreplaceable relation graph G, we select K s-TSVs in all n s-TSVs, and unit flow constraints are defined from all f-TSVs tothose chosen K s-TSVs. If the K-fault tolerance structure isstill not achieved after solving all K combinations, we thinkthe ILP method in [14] cannot generate the K-fault tolerancestructure on this TSV replaceable relation graph G.

In addition, the previous work in [14] deals with a specialtype of TSV fault-tolerance structure generation. That is, theyare under an assumption that a predetermined number of s-TSVs are assigned to each TSV group, and an f-TSV in agroup should be replaced by any s-TSV within the group. Wealso generate some specific TSV replaceable relation graphsG21–G28 by using the TSV planning methods in [14] onMCNC and GSRC benchmarks. Since the f-TSVs can bereplaced by all n s-TSVs in each graph, the n-fault tolerancestructure always exists.

First, we show the effectiveness of the proposed ILP model.TABLE II shows the experimental results, where “ILP”and “Heuristic” denote results of the proposed ILP model

Page 9: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

TABLE II: Comparison between ILP [14] and our methods for generating adaptive fault-tolerance structure.Graph m n #Edges K

ILP [14] ILP Heuristic

#Port #usIWire(um)

(ratio)RT(s) #Port #us

IWire(um)(ratio)

RT(s) #Port #usIWire(um)

(ratio)RT(s)

G11 9 4 72 3 3 3 32.90 (0.51%) 535.20 3 3 32.90 (0.51%) 301.53 3 4 25.88 (0.40%) 0.008G12 13 4 129 2 3 2 6.85 (0.18%) 603.68 3 2 9.65 (0.25%) 67.80 3 4 16.79 (0.43%) 0.013G13 14 4 101 1 NA NA NA >3600 2 4 29.77 (1.77%) 1.09 3 4 28.99 (1.72%) 0.006G14 15 5 177 2 NA NA NA >3600 3 4 32.50 (0.62%) 96.90 3 4 32.71 (0.62%) 0.009G15 18 5 215 2 NA NA NA >3600 3 4 65.20 (0.96%) 240.07 4 5 52.35 (0.77%) 0.013G16 18 6 199 2 NA NA NA >3600 3 6 90.03 (1.76%) 155.74 3 6 98.36 (1.93%) 0.011G17 21 7 255 2 NA NA NA >3600 NA NA NA >3600 4 6 214.39 (1.83%) 0.017G18 26 13 529 4 NA NA NA >3600 NA NA NA >3600 4 12 333.60 (1.47%) 0.038G21 9 5 99 5 4 5 16.34 (0.15%) 100.84 4 5 16.34 (0.15%) 101.10 4 5 16.34 (0.15%) 0.005G22 12 5 155 5 5 5 49.77 (0.25%) 304.91 5 5 49.77 (0.25%) 306.14 6 5 56.26 (0.28%) 0.007G23 14 5 197 5 5 5 10.21 (0.06%) 3435.64 5 5 10.21 (0.06%) 3468.93 5 5 11.84 (0.07%) 0.010G24 16 5 225 5 5 5 108.19 (0.71%) 3519.16 5 5 108.19 (0.71%) 3519.16 7 5 123.18 (0.81%) 0.016G25 18 5 329 5 NA NA NA >3600 NA NA NA >3600 5 5 72.01 (0.26%) 0.016G26 23 6 467 6 NA NA NA >3600 NA NA NA >3600 6 6 45.99 (0.12%) 0.027G27 24 6 550 6 NA NA NA >3600 NA NA NA >3600 6 6 30.65 (0.08%) 0.034G28 25 7 524 7 NA NA NA >3600 NA NA NA >3600 7 7 24.65 (0.06%) 0.037

and min-cost-max-flow based heuristic method, respectively.Columns “m”, “n”, “#Edges”, and “K” list the number off-TSVs, the total number of available s-TSVs, the number ofedges, and the number of maximumly tolerant faults on eachTSV replaceable relation graph. Besides, columns “#Port” and“#us” show the maximum port number of multiplexers and thenumber of s-TSVs used in the generated fault-tolerance struc-ture. “IWire” shows the sum of incremental half-perimeterwirelength overhead of all f-TSVs incurred by the fault-tolerance structure, and the ratio of “IWire” to the sum ofnet wirelength of all f-TSVs is listed in “ratio”. “RT” reportsthe total computational time in seconds. “NA” represents thatthe K-fault tolerance structure cannot be achieved within thetime limit (3600s). As shown in TABLE II, the ILP methodin [14] generates the fault-tolerance structure only on twosmallest graphs. However, the proposed ILP formulation canachieve the fault-tolerance structure on six graphs.

Second, we show the efficiency of the proposed heuristicmethod. TABLE II also compares the proposed heuristicmethod with the proposed ILP method. It can be noticedthat, on small graphs G11–G16 and G21–G24, the fault-tolerance structure generated by ILP has smaller maximumport number of multiplexers and used less s-TSV numbersthan that generated by the heuristic method. Therefore, forsmall TSV replaceable relation graphs, ILP can achieve anoptimal solution, which can be used to verify the accuracy ofthe solution of the heuristic method. But since ILP is an NP-hard problem, its runtime increases dramatically with the sizeof TSV replaceable relation graphs. As shown in TABLE II,the ILP method cannot generate the fault-tolerance structureon large graphs G17–G18 and G25–G28 within the time limit(3600s). Therefore, for large TSV replaceable relation graphs,the ILP based method is very time consuming, which canindirectly demonstrate the efficiency of the proposed heuristicmethod.

In addition, the parameter C in edge cost functions (10)and (11) is also set through experimental results. The experi-ment is performed on MCNC and GSRC benchmarks. In theexperiment, if C is set to 4, some edge cost values are out ofbound, which cannot be solved by min-cost-max-flow based

TABLE III: Effect of C on s-TSV numbers and maximumport number of multiplexers.

Benchmark C = 2 C = 3#s-TSV #Port #s-TSV #Port

ami33 52 4 46 4ami49 80 8 66 6n50 108 7 98 7n100 181 8 169 7n200 267 7 250 7n300 395 8 381 6

model. And we also set C to 2 and 3, the number of useds-TSVs and maximum port number of multiplexers variedwith C, which is shown in TABLE III. Columns “#s-TSV”and “#Port” list the total number of allocated s-TSVs andthe maximum port number of multiplexers among all f-TSVgroups. We noticed that compared with C = 2, C = 3 canachieve a fault tolerance structure with less number of useds-TSVs and smaller maximum port number of multiplexers.Therefore, in the experiment, we set C to 3.

B. Comparison with Previous TSV Fault Tolerance PlanningWork

We use simulated annealing-based multi-layer floorplan-ning [22] to generate the block floorplan and the f-TSVplanning method in [14] to generate f-TSV planning resultas the input to the proposed fault-tolerance TSV planningframework. Based on the same f-TSV planning result, we runthe flow in [13], [14], and the proposed heuristic based frame-work, respectively. The experiment is tested on MCNC andGSRC benchmarks, including two MCNC circuits (ami33and ami49), and four GSRC circuits (n50, n100, n200and n300). We adopt one more industrial 2D design, whichcontains 403266 cells and 448514 nets. hMetis [21] is adoptedto partition the design into several blocks for floorplanning.Based on different block numbers, two benchmark cases,t337 and t469, are generated. That is, t337 has 337 blocksand 1836 nets, while t469 has 469 blocks and 5479 nets.Since the square has the smallest perimeter among all therectangles with the same area [23], here the shapes of all the

Page 10: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

TABLE IV: Comparisons among [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 3-fault tolerancestructures (target yield = 99.7%, p = 0.001).

Bench #f-TSV[13] [14] AFTS (K≤3) AFTS (maximum K)

#s-TSV #gp Yield #s-TSV #gp #Port K Yield #s-TSV #gp #Port K Yield #s-TSV #gp #Port K Yield

ami33 55 48 16 100% 48 16 4 3 100% 31 2 3 3 100% 46 2 4 4 100%ami49 130 72 24 100% 66 22 5 3 100% 54 2 5 3 99.99% 66 2 6 5 100%n50 386 210 70 99.97% 204 68 7 3 100% 82 5 6 2 99.96% 98 5 7 5 99.98%n100 592 294 98 99.91% 291 97 7 3 99.94% 136 7 6 3 99.91% 169 7 7 6 99.93%n200 1127 396 132 99.86% 393 131 6 3 99.86% 179 8 5 3 99.85% 250 8 7 6 99.86%n300 1232 501 167 99.81% 498 166 6 3 99.83% 246 9 5 3 99.78% 381 7 6 6 99.80%t337 640 315 105 99.90% 309 103 4 3 99.91% 158 8 5 3 99.88% 214 6 6 6 99.90%t469 1546 600 200 99.71% 588 196 6 3 99.73% 313 11 6 3 99.71% 412 9 7 7 99.72%avg. 714 305 102 99.90% 300 100 6 3 99.91% 150 7 5 3 99.89% 205 6 7 6 99.90%ratio – +32.79% – – +31.67% – – – – -26.83% – – – – 1.00 – – – –

TABLE V: Comparisons among [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 3-fault tolerancestructures (target yield = 99.5%, p = 0.01).

Bench #f-TSV[13] [14] AFTS (K≤3) AFTS (maximum K)

#s-TSV #gp Yield #s-TSV #gp #Port K Yield #s-TSV #gp #Port K Yield #s-TSV #gp #Port K Yield

ami33 54 51 17 100% 51 17 4 3 100% 35 4 3 3 100% 48 4 4 4 100%ami49 130 87 29 99.96% 81 27 5 3 99.96% 62 5 4 3 99.94% 73 5 5 4 99.95%n50 388 231 77 99.89% 222 74 6 3 99.92% 102 8 5 3 99.88% 113 8 7 5 99.90%n100 589 330 110 99.84% 324 108 6 3 99.87% 165 12 5 3 99.84% 194 11 7 6 99.87%n200 1130 438 146 99.73% 435 145 7 3 99.74% 210 17 6 2 99.72% 280 15 7 6 99.73%n300 1236 555 185 99.62% 549 183 6 3 99.63% 295 20 5 3 99.60% 426 20 6 5 99.61%t337 637 342 114 99.82% 330 110 4 3 99.82% 184 13 4 3 99.78% 227 12 7 7 99.81%t469 1553 645 215 99.55% 633 211 7 3 99.56% 352 25 6 3 99.52% 455 23 7 6 99.55%avg. 715 335 112 99.80% 329 110 6 3 99.81% 176 13 5 3 99.79% 227 12 7 6 99.80%ratio – +32.24% – – +31.01% – – – – -22.47% – – – – 1.00 – – – –

TABLE VI: Comparisons among [6], [13], [14], and the proposed adaptive fault-tolerance structure (AFTS) under 1-faulttolerance structures (target yield = 99.5%).

Bench #f-TSV[13] [6] [14] AFTS (K=1)

#s-TSV #gp Yield #s-TSV #gp #Port Yield #s-TSV #gp #Port Yield #s-TSV #gp #Port Yield

ami33 52 16 16 99.99% 16 16 4 99.99% 16 16 3 99.99% 13 2 2 99.99%ami49 124 28 28 99.95% 25 25 5 99.96% 25 25 4 99.96% 22 3 3 99.95%n50 383 74 74 99.84% 68 68 8 99.87% 68 68 4 99.87% 53 8 3 99.84%n100 596 108 108 99.65% 95 95 8 99.68% 95 95 5 99.68% 78 12 4 99.64%n200 1126 141 141 99.61% 132 132 8 99.64% 132 132 6 99.64% 110 22 5 99.61%n300 1230 197 197 99.51% 183 183 9 99.53% 183 183 6 99.53% 158 31 5 99.51%t337 639 124 124 99.65% 113 113 8 99.67% 113 113 6 99.67% 91 16 5 99.64%t469 1551 252 252 99.50% 236 236 8 99.52% 236 236 6 99.52% 214 40 5 99.50%avg. 713 118 118 99.71% 109 109 8 99.73% 109 109 5 99.73% 93 17 4 99.71%ratio – +21.19% – – +14.68% – – – +14.68% – – – 1.00 – – –

blocks are set to square. The experiment is executed 20 timesindependently for each benchmark.

In fault-tolerance structures, the multiplexers are used toreroute signals, and the delay of a multiplexer is increasedalong with the number of input ports. Besides the hardwarecost incurred by the fault-tolerance structure is related to thenumber of s-TSVs. In this experiment, we compare the num-ber of s-TSVs and the maximum port number of multiplexersof [13], [14], and the proposed TSV planning frameworkunder 3-fault tolerance structures. The layer number is setto 3. The target chip yield is set to 99.7% and the TSVdefect probability p is set to 0.001. The yield results inexperiment are accurate to the fourth decimal place. 3 s-TSVsare assigned to each f-TSV group in [13], [14], that is, themaximum number of tolerant faults K equals to 3.

TABLE IV lists the statistic results averaged over 20independent experiments. All results listed in table satisfythe target chip yield. Column “#f-TSV” represents the totalnumber of f-TSVs. Since the three frameworks are run onthe same f-TSV planning result, the number of f-TSVs is

the same. Columns “#s-TSV”, “#gp”, and “Yield” list thetotal number of allocated s-TSVs, the number of groups, andthe chip yield, respectively. Besides, column “#Port” providesthe maximum port number of multiplexers among all groups,while column K gives the number of tolerant faults in thatgroup, respectively. Since the generation of fault-tolerancestructure is not considered in [13], the maximum port numberof multiplexers is not listed. As shown in TABLE IV, thenumber of f-TSV groups is greatly reduced in the proposedmethod. Compared with [13] and [14], the proposed faulttolerance TSV planning framework can reduce the number ofused s-TSVs by 32.79% and 31.67% on average, respectively.In addition, in the proposed framework, if the maximum K isused for each group, it will cause larger multiplexers. Becausethe maximum number of tolerant faults (K) in adaptive fault-tolerance structures is often much greater than that of [14],which is fixed at 3. As a result, the maximum port number ofmultiplexers is increased accordingly in the generated fault-tolerance structures.

To reduce the size of required multiplexers, we also run

Page 11: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

the proposed fault tolerance TSV planning framework withK ≤ 3, that is, we set K to 3 if the maximum number oftolerant faults K in a group is greater than 3. As shown inTABLE IV, compared with [14], the proposed fault toleranceTSV planning framework with K ≤ 3 has comparablemaximum port number of multiplexers. But the required s-TSVs are surprisingly reduced by 50% on average under thesame target yield, as shown in TABLE IV.

The TSV defect probability p in [12] ranges from 0.001to 0.01. In order to see the impact of p on performance, wealso execute the experiment when p is set to 0.01 under 3-fault tolerance structures. The layer number is set to 3. Thetarget chip yield is set to 99.5%. TABLE V lists the statisticresults averaged over 20 independent experiments. All resultslisted in table satisfy the target chip yield. Based on the samef-TSV planning result, we run the flow in [13], [14], and theproposed heuristic based framework, respectively. Comparedwith [13] and [14], the proposed fault tolerance TSV planningframework can reduce the number of used s-TSVs by 32.24%and 31.01% on average, respectively. In order to reduce thesize of required multiplexers, we also run the proposed faulttolerance TSV planning framework with K ≤ 3. As shown inTABLE V, compared with [14], the proposed fault toleranceTSV planning framework with K ≤ 3 has comparablemaximum port number of multiplexers. But the required s-TSVs are surprisingly reduced by 46.50% on average underthe same target yield, as shown in TABLE V.

Besides, in [6], 1-fault tolerance structures are generatedusing minimum spanning tree based method. However, it isdifficult to apply the method to the fault-tolerance structureusing more than one spare TSVs. In addition, the delayoverhead introduced by the multiplexers, which are used forrerouting signals in the generated fault-tolerance structures,is not considered. In the worst-case the input port number ofa multiplexer could be the number of f-TSVs in the groupif the tree is a star structure, which introduces large delayoverhead. In this experiment, we consider 1-fault tolerancestructures case, that is, the maximum number of tolerant faultsK equals to 1. Since the chip yield is lower under 1-faulttolerance structures, the target chip yield is set to 99.5% andthe TSV defect probability p is set to 0.001. And we compare[6], [13], [14], with the proposed heuristic based model under1-fault tolerance structures. One s-TSV is assigned to each f-TSV group in [13] and [14]. And we also set K to 1 inthe proposed fault tolerance TSV planning framework, if themaximum number of tolerant faults K in a group is greaterthan 1. Based on the TSV planning method in [14], we runthe minimum spanning tree method in [6]. Therefore, the s-TSV numbers and chip yield of [6] and [14] are same in theexperiment.

TABLE VI lists the statistic results averaged over 20independent experiments. As shown in TABLE VI, comparedwith [6] and [14], the proposed fault tolerance TSV planningframework can reduce the number of s-TSVs and the max-imum port number of multiplexers when generating 1-faulttolerance structures.

Fig. 7 shows the required s-TSV numbers under varioustarget yields, in comparison among [13], [14], and our pro-

0.991 0.993 0.995 0.997 0.9990

90

180

270

360

Target Yield

#s-T

SV

[13] [14] Ours

Fig. 7: The number of required s-TSVs under various targetyields.

posed framework. The experiment is performed on n100benchmark. Each data point in the figure is an average of 20independent experiments. It can be observed that the numberof required s-TSVs increases along with increasing targetyield and is significantly reduced by the proposed frameworkfor all target chip yields.

VIII. CONCLUSION

In this paper, we focus on the generation of adaptiveTSV fault-tolerance structure. An integer linear programming(ILP) based model and an efficient min-cost-max-flow basedheuristic method are proposed to generate the adaptive fault-tolerance structures in minimizing both the multiplexer delayoverhead and the used s-TSV number. In the end, a fault-tolerance TSV planning methodology is also proposed to pro-vide yield awareness in TSV planning. Experimental resultsshow that, compared with state-of-the-art, the proposed faulttolerance TSV planning methodology can effectively reducethe number of s-TSVs used for fault tolerance.

Besides, in this work, the proposed TSV fault toleranceplanning is performed in floorplanning stage and we haveno accurate timing information. Therefore, we only use thewirelength to reflect the wire delay in floorplanning stage.In future we plan to evaluate the delay more accurately byexecuting time-consuming routing.

ACKNOWLEDGMENTS

The authors would like to thank the Information ScienceLaboratory Center of USTC for hardware and software ser-vices.

REFERENCES

[1] S. J. Souri, K. Banerjee, A. Mehrotra, and K. C. Saraswat, “Multiple Silayer ICs: Motivation, performance analysis, and design implications,”in ACM/IEEE Design Automation Conference (DAC), 2000, pp. 213–220.

[2] J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl, “A global interconnectdesign window for a three-dimensional system-on-a-chip,” in IEEEInternational Interconnect Technology Conference (IITC), Jun. 2001,pp. 154–156.

[3] “International technology roadmap for semiconductors,” [Online].http://www.itrs2.net.

[4] T. Lu, C. Serafy, Z. Yang, S. K. Samal, S. K. Lim, and A. Srivastava,“TSV-Based 3-D ICs: Design Methods and Tools,” IEEE Transactionson Computer-Aided Design of Integrated Circuits and Systems (TCAD),vol. 36, no. 10, pp. 1593–1619, 2017.

Page 12: Adaptive 3D-IC TSV Fault Tolerance Structure Generation · PDF fileAdaptive 3D-IC TSV Fault Tolerance Structure Generation Song Chen, Member, IEEE, Qi Xu, Student Member, IEEE, Bei

[5] I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, “A low-overheadfault tolerance scheme for TSV-based 3D network on chip links,”in IEEE/ACM International Conference on Computer-Aided Design(ICCAD), Nov. 2008, pp. 598–602.

[6] Y.-G. Chen, W.-Y. Wen, Y. Shi, W.-K. Hon, and S.-C. Chang, “Novelspare TSV deployment for 3-D ICs considering yield and timing con-straints,” IEEE Transactions on Computer-Aided Design of IntegratedCircuits and Systems (TCAD), vol. 34, no. 4, pp. 577–588, 2015.

[7] Q. Xu, L. Jiang, H. Li, and B. Eklow, “Yield enhancement for 3D-stacked ICs: Recent advances and challenges,” in IEEE/ACM Asia andSouth Pacific Design Automation Conference (ASPDAC), Feb. 2012,pp. 731–737.

[8] H.-H. S. Lee and K. Chakrabarty, “Test challenges for 3D integratedcircuits,” IEEE Design & Test of Computers, vol. 26, no. 5, pp. 26–35,2009.

[9] C. Ferri, S. Reda, and R. I. Bahar, “Strategies for improving theparametric yield and profits of 3D ICs,” in IEEE/ACM InternationalConference on Computer-Aided Design (ICCAD), Nov. 2007, pp. 220–226.

[10] C.-W. Chou, Y.-J. Huang, and J.-F. Li, “Yield-enhancement techniquesfor 3D random access memories,” in International Symposium on VLSIDesign, Automation, and Test (VLSI-DAT), Apr. 2010, pp. 104–107.

[11] L. Jiang, R. Ye, and Q. Xu, “Yield enhancement for 3D-stacked mem-ory by redundancy sharing across dies,” in IEEE/ACM InternationalConference on Computer-Aided Design (ICCAD), Nov. 2010, pp. 230–234.

[12] L. Jiang, Q. Xu, and B. Eklow, “On effective TSV repair for 3D-stacked ICs,” in IEEE/ACM Proceedings Design, Automation and Testin Eurpoe (DATE), Mar. 2012, pp. 793–798.

[13] S. Wang, M. B. Tahoori, and K. Chakrabarty, “Defect clustering-aware spare-TSV allocation for 3D ICs,” in IEEE/ACM InternationalConference on Computer-Aided Design (ICCAD), Nov. 2015, pp. 307–314.

[14] Q. Xu, S. Chen, X. Xu, and B. Yu, “Clustered fault tolerance TSVplanning for 3D integrated circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), vol. 36, no. 8,pp. 1287–1300, 2017.

[15] Y. Chen, D. Niu, Y. Xie, and K. Chakrabarty, “Cost-effective integrationof three-dimensional (3D) ICs emphasizing testing cost analysis,”in IEEE/ACM International Conference on Computer-Aided Design(ICCAD), Nov. 2010, pp. 471–476.

[16] B. Noia and K. Chakrabarty, Design-for-Test and Test OptimizationTechniques for TSV-based 3D Stacked ICs. Switzerland: Springer,2014.

[17] L. Jiang, Q. Xu, and B. Eklow, “On effective through-silicon via repairfor 3-D stacked ICs,” IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems (TCAD), vol. 32, no. 4, pp. 559–571,2013.

[18] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency.Berlin: Springer Science & Business Media, 2002, vol. 24.

[19] K. Mehlhorn and S. Naher, LEDA: A Platform for Combinatorial andGeometric Computing. Cambridge University Press, 1999.

[20] A. Makhorin, “GLPK (GNU linear programming kit),” 2008.[21] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar, “Multilevel hyper-

graph partitioning: applications in VLSI domain,” IEEE Transactionson Very Large Scale Integration Systems (TVLSI), vol. 7, no. 1, pp.69–79, 1999.

[22] S. Chen and T. Yoshimura, “Multi-layer floorplanning for stacked ICs:Configuration number and fixed-outline constraints,” Integration, theVLSI Journal, vol. 43, no. 4, pp. 378–388, 2010.

[23] ——, “Fixed-outline floorplanning: Block-position enumeration anda new method for calculating area costs,” IEEE Transactions onComputer-Aided Design of Integrated Circuits and Systems (TCAD),vol. 27, no. 5, pp. 858–871, 2008.