


Multi-functional Interconnect Co-optimization for Fast and Reliable 3D Stacked ICs∗

Young-Joon Lee, Rohan Goel, and Sung Kyu Lim
School of Electrical and Computer Engineering, Georgia Institute of Technology
777 Atlantic Drive NW, Atlanta, Georgia 30332, U.S.A.
[email protected]

ABSTRACT

Heat removal and power delivery have become two major reliability concerns in 3D stacked IC technology. For the thermal problem, two possible solutions exist: thermal-through-silicon-vias (T-TSVs) and micro-fluidic channel (MFC) based liquid cooling. In the case of power delivery, a highly complex power distribution network is required to deliver currents reliably to all parts of the 3D stacked IC while suppressing the power supply noise to an acceptable level. However, these thermal and power networks pose major challenges in signal routability and congestion, because the signal, power, and thermal interconnects all compete for routing space. In this paper, we present a co-optimization methodology for the signal, power, and thermal interconnects of 3D stacked ICs based on design of experiments (DOE) and the response surface method (RSM). The goal is to improve performance, thermal, noise, and congestion metrics with our holistic approach. We also provide an in-depth comparison between T-TSV and MFC based cooling methods and discuss how to employ DOE and RSM to best co-optimize the multi-functional interconnects simultaneously.

Categories and Subject Descriptors

B.7.2 [Hardware]: Integrated Circuits - Design Aids

General Terms

Design, Reliability

Keywords

3D stacked IC, through-silicon-via, micro-fluidic channel, design of experiments

1. INTRODUCTION

∗This material is based upon the work supported by the National Science Foundation under CAREER Grant No. CCF-0546382, the Center for Circuit and System Solutions (C2S2), and the Interconnect Focus Center (IFC).

Three-dimensional (3D) system integration is widely accepted as a key enabling technology and has recently gained significant momentum in the semiconductor industry. One of the core technologies, the through-silicon-via (TSV), plays a vital role in 3D integration. TSVs provide high-density connections between adjacent dies and allow stacking of processor dies with memory dies or stacking of heterogeneous dies. In 3D stacked ICs, the average and the maximum distances between transistors are greatly reduced, which translates to significant savings in delay, power, and area [2].

However, heat removal and power delivery have become two major reliability concerns in 3D stacked IC technology, and many efforts have been made to address them. Thermal management using thermal-TSVs (T-TSVs) has been proposed as a solution to the heat problem [3]. Liquid cooling based on micro-fluidic channels (MFCs) has also been proposed as a viable way to dramatically reduce the operating temperature of 3D stacked ICs [4]. With regard to power supply noise management, designers use a highly complex hierarchical power distribution network to deliver currents to all parts of the 3D stacked IC while simultaneously suppressing the power supply noise to an acceptable level. These so-called silicon ancillary technologies, however, pose major challenges to routing completion and congestion because these large interconnects need to be routed together with billions of smaller signal interconnects. Since these interconnects interact in a complex manner, optimizing them one after another may lead to locally optimal designs. Thus, a holistic co-optimization of these interconnects is called for.

The major contributions of this work are as follows:

• To the best of our knowledge, this is the first work that compares the effectiveness of T-TSV and MFC based liquid cooling for 3D stacked ICs. We demonstrate the strengths and weaknesses of these two thermal management techniques.

• We compare 2D vs. 3D design characteristics with a real processor design and demonstrate the need for more powerful thermal management techniques for 3D stacked IC designs.

• We show how to co-optimize signal, power, and thermal interconnect geometries using the methods of Design of Experiments (DOE) and Response Surface Methodology (RSM). We show how to deploy DOE and RSM to efficiently co-optimize the multi-functional interconnects for high-performance and reliable 3D stacked IC designs.

The remainder of this paper is organized as follows: In Section 2, routing requirements and ways of computing metrics of the three kinds of interconnects are discussed. Section 3 discusses the details of DOE and RSM. Experimental results are presented in Section 4, followed by the conclusions in Section 5.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICCAD’09, November 2–5, 2009, San Jose, California, USA. Copyright 2009 ACM 978-1-60558-800-1/09/11...$10.00.

[Figure 1 panels: (a) With T-TSVs; (b) With MFCs. Labeled elements: metal layers, bonding layer, bulk Si, gates, decap, power TSV, ground TSV, signal TSV, T-TSV in (a), MFC in (b).]

Figure 1: Side view of dies in 3D stacked ICs. (a) shows a die with T-TSVs, and (b) shows a die with MFCs. Shapes are drawn to scale based on our default settings. Unit is μm.

2. THERMAL, POWER, AND SIGNAL INTERCONNECTS

2.1 Thermal Interconnect

3D stacked ICs bring several challenges in thermal management. By stacking layers, the power consumption per unit footprint area is significantly increased. In addition, the interior layers of 3D stacked ICs are thermally detached from the heat sink. Heat transfer is further restricted by the interlayer dielectric with low thermal conductivity and oxide-based bonding layers. Figure 1 shows two possible solutions to the thermal problem.

2.1.1 Thermal-Through-Silicon-Vias

One way to dissipate heat in 3D stacked ICs is to insert T-TSVs in white spaces. T-TSVs do not provide any electrical or routing functionality. They are primarily inserted to help decrease the silicon temperature by lowering the inter-layer thermal resistance, hence providing more thermally conductive paths to the heat sink. T-TSVs are assumed to have the same dimensions as signal TSVs, yet they go through the entire die (via-last TSVs), unlike signal TSVs, as shown in Fig. 1 (a). Previous works considered T-TSV insertion during floorplanning [10], placement [5], and routing [3]. In our case, we perform T-TSV insertion after signal routing. This ensures that T-TSVs do not act as obstacles during signal and power/ground (P/G) routing.

Our thermal analysis is based on finite element analysis, where the entire 3D stacked IC is modeled with a 3D thermal mesh structure. To calculate the thermal conductivity of each thermal tile, we calculate the area ratio of copper to silicon. After the global placement and routing phases are completed, we estimate the area available for T-TSV insertion. For this, the contents of each global placement and routing grid are projected onto the x-y plane and the following equation is used:

$$A_{white} = A_{ttile} - (A_{wire} + A_{stsv} + A_{pgtsv} + A_{gate} - A_{ov}) \qquad (1)$$

Here, $A_{white}$ is the grid white space, $A_{ttile}$ is the thermal tile area, $A_{wire}$ is the estimated wire area, $A_{stsv}$ is the signal TSV area, $A_{pgtsv}$ is the P/G TSV area, $A_{gate}$ is the placed gate area, and $A_{ov}$ is the overlap area. The overlap area consists of the overlap between x- and y-direction wires as well as between wires, signal TSVs, and gates.

We assume an area ratio of T-TSV to decoupling capacitor (decap) for the given white space. For instance, if the chosen ratio is 70%, the T-TSV area is 70% of the white space, and the remaining 30% is used for decap, which helps reduce power noise. Thereafter, the thermal conductivity of a thermal tile is calculated as follows:

$$k_{tile} = AR_{TSV} \times k_{Cu} + (1 - AR_{TSV}) \times k_{Si}$$

Here, $AR_{TSV}$ is the area ratio of total TSVs (signal, P/G, and thermal) in the tile. To obtain the temperature distribution, we solve the matrix equation $G \cdot T = P$, where $G$ is the thermal conductance matrix calculated from $k_{tile}$, $T$ is the temperature vector, and $P$ is the power vector.
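As a rough illustration of Eq. (1), the tile conductivity formula, and the G·T = P solve, the following Python sketch works through one tile and a small one-dimensional chain of thermal nodes. The material conductivities (about 400 W/(m·K) for copper and 150 W/(m·K) for silicon), the tile occupancy, the conductances, and the power values are illustrative assumptions rather than values from this paper, and the 1D chain only stands in for the full 3D finite element mesh.

```python
# Illustrative values only; none of the numbers below come from the paper.
K_CU = 400.0  # W/(m*K), assumed copper conductivity
K_SI = 150.0  # W/(m*K), assumed silicon conductivity


def white_space(a_ttile, a_wire, a_stsv, a_pgtsv, a_gate, a_ov):
    """Eq. (1): tile area minus occupied area (overlap counted once)."""
    return a_ttile - (a_wire + a_stsv + a_pgtsv + a_gate - a_ov)


def tile_conductivity(ar_tsv, k_cu=K_CU, k_si=K_SI):
    """Area-weighted mix of copper (TSVs) and silicon in one thermal tile."""
    return ar_tsv * k_cu + (1.0 - ar_tsv) * k_si


def solve_linear(g, p):
    """Solve G*T = P by Gaussian elimination with partial pivoting."""
    n = len(p)
    a = [row[:] + [p[i]] for i, row in enumerate(g)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    t = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * t[c] for c in range(r + 1, n))
        t[r] = (a[r][n] - s) / a[r][r]
    return t


# One 20 um x 20 um tile (areas in um^2) with hypothetical occupancy.
a_white = white_space(400.0, 120.0, 10.0, 0.0, 200.0, 30.0)
a_ttsv = 0.7 * a_white            # 70% of the white space goes to T-TSVs
ar_tsv = (a_ttsv + 10.0) / 400.0  # thermal + signal TSV area ratio
k_tile = tile_conductivity(ar_tsv)

# 1D chain of 3 nodes; node 0 is tied to the heat sink.
g_link, g_sink = 2.0, 5.0         # W/K, assumed conductances
G = [[g_sink + g_link, -g_link, 0.0],
     [-g_link, 2 * g_link, -g_link],
     [0.0, -g_link, g_link]]
P = [1.0, 1.0, 1.0]               # W injected at each node
T_rise = solve_linear(G, P)       # temperature rise above ambient (K)
print(a_white, k_tile, T_rise)
```

In this toy chain the node farthest from the heat sink ends up hottest, which mirrors the behavior of interior dies described in Section 2.1.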

2.1.2 Micro-Fluidic Channels

Unlike conventional air-cooled heat sinks or T-TSVs, liquid cooling using MFCs offers a much larger heat transfer coefficient and a chip-scale cooling solution. MFCs can be fabricated on the bulk side of silicon dies, enabling efficient rejection of heat from every layer. To analyze the thermal performance of MFC cooling for 3D stacked ICs, we run numerical simulations.

The on-chip thermal network is composed of fluidic TSVs, manifolds, and MFCs. We assume that all the fluidic TSVs and manifolds are located outside the core region. Since we focus on the core region, only MFCs are considered for the analysis. The coolant pump and the heat exchanger are assumed to be off-chip.

TSVs, especially P/G TSVs, should not touch any MFCs. MFCs are wide (around 60 μm) and decrease the routing capacity of signal TSVs quite considerably (see Fig. 1 (b)). Furthermore, dies with MFCs are thicker than the ones without MFCs. Given a TSV aspect ratio (= TSV height to TSV diameter), thick dies lead to a larger TSV diameter and a lower TSV density. Thus, it is desirable to optimally design the width and the depth of MFCs along with the dimensions of signal and power TSVs.
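To make the thickness penalty concrete, the Python sketch below computes the minimum TSV diameter for the two silicon layer thicknesses in Table 1 (20 μm and 60 μm), using the 30:1 aspect ratio assumed in Section 2.3. Taking the TSV height equal to the Si layer thickness and the TSV pitch equal to twice the diameter are simplifying assumptions made here for illustration only.

```python
def min_tsv_diameter(tsv_height_um, aspect_ratio):
    """Smallest diameter allowed by a height:diameter aspect ratio."""
    return tsv_height_um / aspect_ratio


def max_tsv_density(diameter_um, pitch_factor=2.0):
    """TSVs per um^2 on a square grid; pitch = pitch_factor * diameter (assumed)."""
    pitch = pitch_factor * diameter_um
    return 1.0 / (pitch * pitch)


ASPECT_RATIO = 30.0  # 30:1, the signal TSV aspect ratio assumed in the paper
for label, si_thickness in (("T-TSV case", 20.0), ("MFC case", 60.0)):
    d = min_tsv_diameter(si_thickness, ASPECT_RATIO)
    print(f"{label}: diameter >= {d:.2f} um, "
          f"density <= {max_tsv_density(d):.4f} per um^2")
```

Under these assumptions, tripling the die thickness triples the minimum diameter and cuts the achievable TSV density by a factor of nine.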

2.2 Power Interconnect

In a 3D stacked IC, power is fed from the package through power I/O bumps distributed over the bottom-most die, and travels to the upper dies using P/G TSVs. The P/G net structure in [8] is used in this work. P/G TSVs are placed regularly in a dual mesh structure, and each P/G TSV has a co-located P/G I/O bump on the bottom side of the chip. The pitch between two power TSVs is predefined (200 μm) and the same pitch is used for all layers. The diameter of P/G TSVs is around 10 μm. On Metal 7 and 8, P/G wires are globally distributed. Thick wires have a 10 μm width and connect P/G TSVs. Between two thick wires, 10 thin wires are placed. The remaining space is used for signal wires. In our 3D stacked IC structure, each P/G TSV pierces through the entire stack for efficient vertical power delivery (see Fig. 1). Thus, no gates can be placed and no wires can be routed at the P/G TSV locations. For the placement tiles with pre-placed P/G TSVs, the placement tile capacity is decreased by a large amount, and the corresponding routing tile has decreased signal routing capacities in the x-, y-, and z-directions. To reduce power noise analysis time, we used the techniques in [8].

2.3 Signal Interconnect

We perform global routing for signal interconnects. The reason for global routing, instead of detailed routing, is to obtain quick but reasonably accurate pictures of routing congestion so that we can use them in our co-optimization flow. For the signal wires,

646 2009 IEEE/ACM International Conference on Computer-Aided Design Digest of Technical Papers


[Figure 2 labels: power TSV, ground TSV, MFC, signal TSV, P/G thick wires.]

Figure 2: Top view of global placement and routing tiles with MFCs. Objects are drawn to scale based on our default setting. P/G thin wires are not shown.

Table 1: Default technology and setting parameters. Our baseline only uses a top-mounted heat sink, not T-TSVs or MFCs.

Parameter                      Baseline & T-TSV   MFC case
Chip size (μm)                 1000x1000          1200x1200
Number of dies                 4                  4
Die bonding                    face-to-back       face-to-back
Si layer thickness (μm)        20                 60
Metal layer thickness (μm)     6                  6
Bonding layer thickness (μm)   2                  10
MFC depth (μm)                 0                  40
MFC pitch (μm)                 0                  400
P/G TSV pitch (μm)             200                200
Clock period (ns)              1.33               1.33
Power supply voltage (V)       1.00               1.00

we use metal interconnect dimensions similar to the ones in the 45 nm technology library from North Carolina State University [9]. We assume that the TSV integration scheme is via-first, and the TSV aspect ratio is 30:1. High TSV aspect ratios of more than 30:1 were demonstrated in [1]. For thick dies, a high TSV aspect ratio is needed to handle many die-to-die interconnects.

Figure 1 shows the side view of dies with T-TSVs and MFCs. In both cases, the diameter of signal TSVs is set to a minimum to accommodate as many connections as possible. In contrast, the diameter of P/G TSVs is 10 μm. Figure 2 shows the global routing tile objects. We fix the width of our routing tile to 20 μm. Note that some tiles are fully occupied by MFCs and thus are not routable in the z-direction. Since the size of a P/G TSV is comparable to that of a routing tile, the tiles that contain P/G TSVs have significantly lower x-, y-, and z-direction routing capacities. Also note that TSVs are much larger than global wires, making them significant routing obstacles in 3D stacked ICs with MFCs.

For each global routing tile, there are x-, y-, and z-direction routing capacity values. The x- and y-direction capacities represent the available routing space in metal layers, while the z-direction capacity is for signal TSVs. The routing capacity values are calculated as in [8].

3. DESIGN OF EXPERIMENTS

DOE has been used for many science and engineering applications. It has been proven to be effective and efficient when optimization is desired for complex systems with multiple input factors. It provides a well-organized way of performing experiments so that we can use the experimental results to find meaningful relations among input factors and responses and optimize the system.

Table 2: 2D vs. 3D comparison. Congestion corresponds to the average routing congestion in x- and z-direction, and Tmax represents maximum silicon temperature.

                         2D, 1 die    3D, 2 dies   3D, 4 dies
Footprint (μm²)          2,560,000    1,210,000    640,000
Wirelength (μm)          38,833,520   34,007,180   27,291,940
# signal TSVs            0            57,133       104,653
Congestion (x, z)        0.54, 0.00   0.52, 0.18   0.38, 0.19
Max. arrival time (ps)   1730.97      1415.38      1362.14
Total power (W)          2.42         2.50         2.35
Tmax (°C)                66.90        104.35       186.86
Power noise (mV)         8.73         4.28         5.39

Table 3: Congestion, thermal, and power noise comparison for T-TSV vs. MFC cases. All cases have 4 dies.

                     Baseline     T-TSV        MFC          Baseline expanded
Footprint (mm²)      1.00         1.00         1.44         1.44
Wirelength (mm)      35,441       35,441       40,136       39,228
Congestion (x, z)    0.29, 0.09   0.29, 0.09   0.21, 0.53   0.21, 0.05
Tmax (°C)            110.47       100.69       48.12        90.03
Tavg (°C)            105.75       99.03        45.91        85.92
Power noise (mV)     3.37         2.84         6.09         3.65

3.1 Overall Design Flow

The overall design flow is summarized in Fig. 5. We start by defining the design knobs (= input factors) and the metrics (= responses). A single experiment is equivalent to performing gate-level global placement and routing, where we first perform gate-level partitioning and placement, followed by routing MFCs, P/G nets, and signal nets. Then T-TSVs may be inserted. We evaluate the metrics of interest and complete the current experiment. Once all the experiments are performed, we construct response surfaces and use them to obtain optimal design solutions. Table 7 shows our design knobs, and Table 8 shows our assessing metrics. We assume all MFCs have the same width and the same pressure drop.

3.2 Designing the Experiments

We used the Model-Based Calibration Toolbox in MATLAB to design the experiments. Since we expected more complex responses than quadratic ones, we did not use classical designs such as central composite or Box-Behnken. Instead, we selected a stratified Latin hypercube from the space-filling design styles. The number of design points to experiment is determined based on the response model equation as well as the number of input factors and their ranges. We decided to generate 30 design points for Case Study I (= cooling with T-TSVs), and 50 design points for Case Study II (= cooling with MFCs). Since MFCs should not come into contact with P/G TSVs, we applied the following constraint to the design space for Case Study II:

$$W_{MFC} + D_{PGTSV} + 2 \cdot SP_{min} \le P_{PGTSV} / 2$$

Here, $W_{MFC}$ is the MFC width, $D_{PGTSV}$ is the P/G TSV diameter, $SP_{min}$ is the minimum spacing between an MFC and a P/G TSV, and $P_{PGTSV}$ is the P/G TSV pitch. We divide the P/G TSV pitch by 2 to get the distance between a power and a ground TSV (see Fig. 2). For the designs that satisfy this inequality, we can place MFCs so that they do not touch P/G TSVs. As a result, eight design points were removed from the experiment for Case Study II. In addition, four random design points for model validation were generated for each case study. The experimental results of the validation design points were used to see how well the models predict unseen design points.
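The constraint filtering step can be sketched in a few lines of Python. The sampler below is a plain Latin hypercube, not the stratified variant from the Model-Based Calibration Toolbox, and the factor ranges and the 5 μm minimum spacing are placeholders chosen for illustration; only the inequality itself and the 200 μm P/G TSV pitch come from the text.

```python
import random


def latin_hypercube(n_points, ranges, seed=0):
    """Return n_points samples; each factor's range is split into n_points
    equal strata and every stratum is used exactly once per factor."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in ranges:
        strata = list(range(n_points))
        rng.shuffle(strata)
        step = (hi - lo) / n_points
        columns.append([lo + (s + rng.random()) * step for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def mfc_fits(w_mfc, d_pgtsv, sp_min, p_pgtsv):
    """Check W_MFC + D_PGTSV + 2*SP_min <= P_PGTSV / 2."""
    return w_mfc + d_pgtsv + 2.0 * sp_min <= p_pgtsv / 2.0


P_PGTSV = 200.0   # um, P/G TSV pitch from the paper
SP_MIN = 5.0      # um, hypothetical minimum MFC-to-TSV spacing
# Hypothetical ranges: (MFC width, P/G TSV diameter), both in um.
points = latin_hypercube(50, [(40.0, 90.0), (5.0, 20.0)])
feasible = [p for p in points if mfc_fits(p[0], p[1], SP_MIN, P_PGTSV)]
print(f"{len(feasible)} of {len(points)} design points satisfy the constraint")
```

Design points that fail the check are dropped before any placement and routing run, so no experiment time is spent on layouts in which an MFC would collide with a P/G TSV.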

3.3 Enhancing Response Model Accuracy

Determining model equations is based on regression analysis.


Table 4: Summary of models for Case Study I (= cooling with T-TSVs) and Case Study II (= cooling with MFCs).

Response            Model type                    # parameters   PRESS RMSE   RMSE        R²      Validation RMSE

Case Study I
Total wirelength    Poly3 + RBF-recmultiquadric   17             239.901      116.273     1.000   541.075
Max. arrival time   Poly9 + RBF-recmultiquadric   24             2.171        0.913       0.998   1.727
Max. Si temp.       Poly7 + RBF-linear            19             0.289        0.173       0.998   0.262
Max. power noise    Poly5 + RBF-linear            19             0.058        0.042       0.995   0.103

Case Study II
Total wirelength    Poly5 + RBF-multiquadric      29             17207.026    14546.450   0.999   26820.284
Max. Si temp.       Poly7 + RBF-recmultiquadric   28             0.045        0.036       1.000   0.084
Max. power noise    Poly7 + RBF-thinplate         22             0.046        0.036       1.000   0.058
Pump power          Poly5 + RBF-linear            37             1.92E-04     1.49E-04    1.000   4.63E-04

Figure 3: Response surfaces for Case Study I (= cooling with T-TSVs). Only two significant axes per response are shown.

Table 5: Optimization results for Case I (= cooling with T-TSVs).

                            Baseline     DOE-pred.    DOE-actual
Design knobs
Ratio bet. T-TSV & decap    0.5          0.9          0.9
P/G TSV diameter (μm)       10           10           10
P/G thin wire ratio         0.5          0.77         0.77

Assessing metrics
Total wirelength (μm)       35,440,880   35,442,060   35,441,980
Max. arrival time (ps)      662.75       660.6        656.81
Max. Si temp. (°C)          100.93       100.13       100.52
Max. power noise (mV)       2.56         2.38         2.32
Cost_I                      0.452        0.398        0.364

Response surface models can be expressed as multivariate polynomial equations. We estimate the coefficients such that the response equation fits the data in an optimal way. The goodness-of-fit of a model can be tested with statistics such as the root mean squared error (RMSE), the prediction error sum of squares RMSE (PRESS RMSE), and the coefficient of determination (R²).
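For reference, the three statistics can be computed as in the following Python sketch, which fits a one-factor straight line by least squares. The data are made up, the model is far simpler than the response models used in this work, and dividing the sums of squares by the number of points before taking the square root is one common convention assumed here; the toolbox may normalize differently.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_statistics(xs, ys):
    """Return (RMSE, PRESS RMSE, R^2) for the straight-line model."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    my = sum(ys) / n
    sst = sum((y - my) ** 2 for y in ys)
    # PRESS: refit without point i, then predict the held-out point i.
    press = 0.0
    for i in range(n):
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return math.sqrt(sse / n), math.sqrt(press / n), 1.0 - sse / sst


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]   # made-up responses
rmse, press_rmse, r2 = fit_statistics(xs, ys)
print(f"RMSE={rmse:.3f}  PRESS RMSE={press_rmse:.3f}  R2={r2:.4f}")
```

Because each PRESS residual is computed on a point the fit has not seen, PRESS RMSE is never smaller than RMSE for a least-squares model, which is what makes the gap between the two a useful warning sign.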

We chose hybrid radial basis functions (RBFs) as our model class. A hybrid RBF model is composed of a polynomial model and an RBF network model to consider both linear and nonlinear behaviors. We tried four kinds of RBF kernels: multiquadric, recmultiquadric, thin-plate, and linear. The accuracy of the response surface equations is important because we use them in the optimization process. As a starting point, the linear part was set to a polynomial of order 3 for main factors and order 2 for interaction factors. The RBF kernel that we first tried was multiquadric.
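The structure of a hybrid RBF model can be sketched as follows in Python. The kernel formulas are the common textbook forms of the four kernels named above, and the width parameter, centers, coefficients, and weights are arbitrary placeholders; the exact parameterization used by the modeling toolbox may differ, and interaction terms are omitted to keep the example short.

```python
import math

# Common textbook kernel forms; r is the distance to a center, c a width parameter.
KERNELS = {
    "multiquadric": lambda r, c: math.sqrt(r * r + c * c),
    "recmultiquadric": lambda r, c: 1.0 / math.sqrt(r * r + c * c),
    "thinplate": lambda r, c: 0.0 if r == 0.0 else r * r * math.log(r),
    "linear": lambda r, c: r,
}


def hybrid_rbf(x, intercept, main_coeffs, centers, weights, kernel, width=1.0):
    """Polynomial part plus RBF network part, evaluated at point x.

    main_coeffs[i] lists the coefficients of x[i]**1, x[i]**2, ... for factor i.
    """
    y = intercept
    for xi, coeffs in zip(x, main_coeffs):
        y += sum(c * xi ** (p + 1) for p, c in enumerate(coeffs))
    phi = KERNELS[kernel]
    for center, w in zip(centers, weights):
        y += w * phi(math.dist(x, center), width)
    return y


# Two factors, cubic main effects, two RBF centers (all values hypothetical).
main = [[0.5, 0.0, 0.1], [-0.2, 0.3, 0.0]]
centers = [(0.0, 0.0), (1.0, 1.0)]
weights = [0.4, -0.1]
for name in KERNELS:
    print(name, hybrid_rbf((0.5, 0.5), 1.0, main, centers, weights, name))
```

In a real fit, the polynomial coefficients and RBF weights would be estimated together by regression on the experimental results; here they are fixed by hand only to show how the two parts add up.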

We observed that the model error is relatively higher at the corners of the region of interest. Thus, eight design points at the corners of the region of interest were added for the T-TSV case, and 16 design points were added for the MFC case. A comparison of power noise response surfaces before and after adding corner design points is shown in Fig. 6. Since increasing the P/G thin wire ratio and the P/G TSV diameter decreases power noise in general, we can judge that adding corner points improved the model accuracy. In the optimization process, the model may find a better solution by avoiding false optimum points around corners.

Figure 4: Response surfaces for Case Study II (= cooling with MFCs). Only two significant axes per response are shown.

Table 6: Optimization results for Case II (= cooling with MFCs).

                           Baseline     DOE-pred.    DOE-actual
Design knobs
MFC width (μm)             60           75           75
MFC pressure drop (μm)     140          164          164
P/G TSV diameter (μm)      10           15           15
P/G thin wire ratio        0.5          0.8          0.8

Assessing metrics
Total wirelength (μm)      40,136,240   40,539,337   40,538,040
Max. Si temp. (°C)         48.13        44.92        44.97
Max. power noise (mV)      6.09         5.10         5.11
Cost_II                    0.450        0.306        0.308

[Figure 5 flowchart boxes: BEGIN; Prepare settings, define input factors and responses; Determine design points to experiment; Run all experiments for the design points; single experiment (Tier partitioning; 3D placement; MFC routing; P/G net routing; Signal net routing; Timing, thermal, power, noise, congestion analysis; Rip up and rerouting; T-TSV and decap insertion); Assess metrics, build response surfaces; Find optimal design with minimum Cost; Found optimal? (Y/N); END.]

Figure 5: Overall design flow of DOE and RSM.

Table 7: Design knobs.

Ratio between T-TSV and decap: For given white space, we change the area ratio between T-TSV and decap. 0 means no T-TSV and all decap, and 0.9 means 90% T-TSV and 10% decap.

MFC width: Given a pressure drop between inlet and outlet, a wider MFC means higher mass flow rate and better cooling capability.

MFC pressure drop: The pressure drop between inlet and outlet of MFCs affects the mass flow rate and cooling capability.

P/G TSV diameter: With the given die thickness, the diameter of a P/G TSV determines the RLC parasitics. We change the diameter to understand the possible tradeoff between power noise and other metrics.

P/G thin wire ratio: This is the ratio between P/G thin wires and signal wires on metal layers 7 and 8.

We also observed that increasing the order of main factors may improve the model accuracy. Table 9 shows the impact of model order on the accuracy of the maximum silicon temperature model. Increasing the model order generally results in increased R² and decreased RMSE. However, the model prediction error, PRESS RMSE, does not decrease monotonically with increased model order. Moreover, the validation RMSE is at its minimum when the model order is 7. A model with too high a polynomial order and too many parameters may have lower prediction capability for unseen design points, although it follows the known design points very closely and has a very low RMSE. This is known as the over-fitting problem. We can judge how well the model behaves by comparing RMSE to PRESS RMSE. When the two differed by more than a factor of 3, we frequently observed over-fitting. Each response had a different optimal polynomial order.
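Applying this rule of thumb to the maximum silicon temperature models in Table 9 gives the following Python sketch. The numbers are copied from Table 9 and the threshold of 3 is the one stated above; picking the final order by the smallest validation RMSE is a simplification of the multi-criteria judgment described in this section.

```python
# (model order, # parameters, R2, RMSE, PRESS RMSE, validation RMSE) from Table 9
TABLE_9 = [
    ("3, 4", 20, 0.994, 0.284, 0.662, 1.019),
    ("5", 21, 0.995, 0.284, 0.641, 1.068),
    ("6", 23, 0.999, 0.160, 0.451, 0.549),
    ("7", 19, 0.998, 0.173, 0.289, 0.262),
    ("8", 22, 0.999, 0.116, 0.247, 0.312),
    ("9", 23, 0.999, 0.144, 0.397, 0.425),
]
OVERFIT_RATIO = 3.0


def press_gap(row):
    """Ratio of PRESS RMSE to RMSE; large values hint at over-fitting."""
    return row[4] / row[3]


# Keep models below the over-fitting threshold, then take the one that
# predicts the validation points best.
candidates = [row for row in TABLE_9 if press_gap(row) <= OVERFIT_RATIO]
best = min(candidates, key=lambda row: row[5])
for row in TABLE_9:
    print(f"order {row[0]:>4}: PRESS/RMSE = {press_gap(row):.2f}")
print("selected order:", best[0])
```

The same screening can be repeated per response, since each response had a different optimal polynomial order.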

In addition, several kinds of RBF kernel functions were tested. Which RBF fits best is highly dependent on the input data. We carefully determined the RBF function for each metric, considering R², RMSE, PRESS RMSE, and validation RMSE. Table 10 shows the comparison of models with different RBF kernels for the maximum arrival time in Case Study I. For this metric, recmultiquadric is the best in terms of R² and validation RMSE.

3.4 Design Optimization

With multiple assessing metrics and design constraints, there can be several optimization scenarios. To consider multiple metrics at once, each of the metrics under consideration is normalized to [0, 1] and forms a partial cost. Then, we combine them into a single desirability function, which we call Cost. Using optimization algorithms such as nonlinear programming or a genetic algorithm, we find the optimal design point with minimum Cost.
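A minimal Python sketch of this step is shown below. The normalization bounds, the equal-weight average used to combine partial costs, the toy response functions, and the exhaustive grid search are all assumptions for illustration; the text does not specify them, and a nonlinear programming or genetic algorithm solver would replace the grid search in practice.

```python
import itertools


def normalize(value, lo, hi):
    """Map a metric to [0, 1], clamping values outside [lo, hi]."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def cost(metrics, bounds, weights=None):
    """Weighted average of normalized partial costs (lower is better)."""
    names = list(bounds)
    weights = weights or {n: 1.0 for n in names}
    total_w = sum(weights[n] for n in names)
    return sum(weights[n] * normalize(metrics[n], *bounds[n])
               for n in names) / total_w


# Toy stand-ins for fitted response surfaces (hypothetical, not from the paper).
def responses(ttsv_ratio, thin_wire_ratio):
    return {
        "temp": 110.0 - 10.0 * ttsv_ratio,                  # more T-TSVs, cooler
        "noise": 4.0 - 1.5 * thin_wire_ratio + ttsv_ratio,  # less decap, noisier
        "wirelength": 35.0 + 2.0 * thin_wire_ratio,         # fewer signal tracks
    }


BOUNDS = {"temp": (95.0, 115.0), "noise": (2.0, 6.0), "wirelength": (34.0, 38.0)}
grid = [i / 10 for i in range(10)]
best = min(itertools.product(grid, grid),
           key=lambda knobs: cost(responses(*knobs), BOUNDS))
print("best knobs:", best, "Cost:", round(cost(responses(*best), BOUNDS), 3))
```

Changing the weights shifts the optimum toward whichever metric matters most in a given optimization scenario, which is one way to express the different scenarios mentioned above.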

4. EXPERIMENTAL RESULTS

4.1 Experimental Settings

Table 8: Assessing metrics.

Total wirelength: All the wirelengths of signal nets are summed up. This represents the quality of signal routing.

Max. arrival time: The maximum arrival time from timing analysis determines the maximum clock frequency. We try to minimize this to maximize performance.

Max. Si temp.: The performance of transistors degrades with higher temperature. We want this below 85 °C.

Max. power noise: The maximum power noise in the entire power grid should be less than the power noise margin.

Coolant pump power: The coolant pump requires power to provide fluid through MFCs. This is considered in planning the system power budget.

[Figure 6 panels: (a) Before adding corners; (b) After adding corners.]

Figure 6: A comparison of power noise response surfaces before and after adding corner design points. Circled regions on the left show high distortion.

We implemented our design package in C++/STL and MATLAB. The simulations were executed on a 64-bit Linux server with two quad-core Intel Xeon 2.5 GHz CPUs and 16 GB of main memory. Our default technology and setting parameters are shown in Table 1. As an input circuit, we synthesized a RISC processor circuit named OpenRISC (or1200) from OpenCores [6] using Synopsys Design Compiler and a 45 nm technology library [9]. The synthesized circuit had about 330K gates and nets.

To obtain the power map, the switching activity of each gate was assigned uniformly at random between 0 and 0.8. The power map was generated after the timing analysis stage, and fed into the thermal analyzer and the power noise analyzer. For the T-TSV case, the thermal analyzer was written in C++/STL and the runtime was about 5 minutes. For the MFC case, the thermal analyzer was written in MATLAB. We used the thermal simulation method in [7]. The runtime was about 3 minutes.

For power noise analysis, the power consumption at each grid location was modeled as a current source. To determine the decap area ratio at each power tile, for the T-TSV case, we calculated the decap size as described in Eq. (1). For the MFC case, we calculated the white space per tile, and assumed that 80% of the white space is used for decap. The gate oxide thickness used for decap size calculation was 1 nm, and the inductance and the resistance of package pins were 0.3 nH and 3 mΩ, respectively. We assumed that 1/8 of each layer in the entire stack is turned on at once, and the rise time of the current profile was set to 5 ns. After the power noise simulation, we gathered the peak power noise voltage for each grid. The runtime of a power noise simulation was about 1 minute.
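As a back-of-the-envelope check on the decap sizing, the Python sketch below estimates the capacitance of a decap from the 1 nm gate oxide thickness quoted above, using the parallel-plate formula C = ε0·εr·A/t_ox. The relative permittivity of 3.9 (SiO2) and the example white space area are assumptions made for this illustration, not values taken from the experiments.

```python
EPS_0 = 8.854e-12   # F/m, vacuum permittivity
EPS_R_SIO2 = 3.9    # assumed relative permittivity of the gate oxide
T_OX = 1e-9         # m, gate oxide thickness used for decap sizing


def decap_capacitance(area_um2, t_ox=T_OX, eps_r=EPS_R_SIO2):
    """Parallel-plate capacitance (F) of a decap with the given area in um^2."""
    area_m2 = area_um2 * 1e-12
    return EPS_0 * eps_r * area_m2 / t_ox


white_space_um2 = 100.0              # hypothetical tile white space
decap_area = 0.8 * white_space_um2   # 80% of white space, as in the MFC case
c = decap_capacitance(decap_area)
print(f"decap area {decap_area:.0f} um^2 -> {c * 1e12:.2f} pF")
```

Under these assumptions each square micrometer of decap contributes a few tens of femtofarads, so the white space split between T-TSVs and decap directly trades cooling against power noise suppression.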

4.2 Comparison of 2D and 3D Designs

Table 2 shows the comparison of a 2D design with 3D designs (2 dies and 4 dies). We set the chip size so that each case has about the same total silicon area. As a result, footprint area becomes smaller with more dies in 3D. In addition, total wirelength becomes shorter as well. With increased number of dies, congestion in x- and y-

2009 IEEE/ACM International Conference on Computer-Aided Design Digest of Technical Papers 649


Table 9: Impact of model order on the accuracy of the maximum silicon temperature metric.

Model order   Number of parameters   R²      RMSE    PRESS RMSE   Validation RMSE
3, 4          20                     0.994   0.284   0.662        1.019
5             21                     0.995   0.284   0.641        1.068
6             23                     0.999   0.160   0.451        0.549
7             19                     0.998   0.173   0.289        0.262
8             22                     0.999   0.116   0.247        0.312
9             23                     0.999   0.144   0.397        0.425

Table 10: Models with different RBF kernels for the maximum arrival time in Case Study I.

Kernel            multiquadric   recmultiquadric   thin-plate   linear
R²                0.962          0.998             0.973        0.971
Validation RMSE   3.817          1.727             3.538        2.757

direction decreases because fewer gates are on each die. Note that wirelength and congestion also depend significantly on the quality of the placement and routing algorithms. Maximum arrival time also decreases with an increased number of dies, but not by much between the 2-die and 4-die cases.

A big problem with 3D designs is elevated maximum temperature. With 4 dies, the bottom die has a very long heat dissipation path to the heat sink, leading to very high temperature. Since the resistivity of metal wires increases with temperature, the wire delay also increases. Thus, the maximum arrival time did not decrease much in the 4-die case, despite shorter wirelengths. The maximum silicon temperatures of the 3D designs are too high, and we may have to reduce the clock frequency to lower the temperature. Hence, thermal solutions are crucial for 4-die designs to be practical. The total power consumption was similar for all cases; however, maximum power noise was the highest in the 2D case. This is partly because the size of the power switching region per die is smaller with an increased number of dies. Spreading one big switching region across dies may help decrease the maximum power noise level.
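The link between temperature and wire delay can be made concrete with the usual first-order model R(T) = R_ref · (1 + α · (T − T_ref)). The sketch below is only an illustration: the temperature coefficient of 0.0039 per kelvin is a textbook value for bulk copper near 20°C, not a parameter from this work, and thin on-chip wires may behave differently.

```python
def wire_resistance(r_ref_ohm, temp_c, ref_temp_c=20.0, tcr_per_k=0.0039):
    """First-order temperature scaling of a metal wire's resistance.

    tcr_per_k is an assumed bulk-copper coefficient; the paper does not
    give the value used by its timing analysis.
    """
    return r_ref_ohm * (1.0 + tcr_per_k * (temp_c - ref_temp_c))


if __name__ == "__main__":
    # A wire of 100 ohms at 20 C, evaluated at 120 C.
    print(wire_resistance(100.0, 120.0))
```

Because an RC-dominated wire delay is proportional to the wire resistance, the same factor applies to that portion of the path delay, which is consistent with the 4-die case gaining little arrival time despite shorter wires.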

4.3 Comparison of T-TSV and MFC

Table 3 shows the congestion, thermal, and power noise comparison for three cases: baseline, T-TSV, and MFC. No T-TSVs or MFCs are used in the baseline. Compared to the baseline, MFC reduced the maximum temperature by 56%. However, MFC requires an expansion of the footprint by 44% due to z-direction congestion. The last column shows the results for the baseline with its footprint expanded to match that of the MFC case. This shows that the increase in footprint was responsible for about a 19% decrease in temperature. In comparison, T-TSVs decreased the maximum silicon temperature by only 9%, which is small compared to a previous work [5], where silicon-on-insulator (SOI) was assumed. Since the insulator has very low thermal conductivity, inserting T-TSVs must have increased the thermal conductivity of the layers substantially in that setting. However, in this work we assumed bulk silicon layers that already have good thermal conductivity. Thus, inserting T-TSVs did not decrease the maximum temperature dramatically.

In the MFC case, the z-direction congestion is more severe than in the other two cases. This is due to thicker dies and larger-diameter signal TSVs, which constitute about 30% of the chip area. Also, the maximum power noise increased by 81% compared to the baseline. The space for decap insertion decreased, and P/G TSVs have larger parasitics with thicker dies, so power noise increased considerably. The last column shows that the increase in wirelength in the MFC case is not significant. In comparison, congestion and total wirelength for the T-TSV case are the same as the baseline because T-TSVs are inserted after signal routing.

4.4 Case Study I: Optimization with T-TSV

In this case study, we show how to utilize DOE to co-optimize the T-TSV, P/G interconnect, and signal wires. We allow (1) the ratio between T-TSV and decap to vary in [0, 0.9] range (continuous), (2) P/G TSV diameter (μm) to be set to 5, 10, or 15 (discrete), and (3) P/G thin wire ratio to vary in [0.2, 0.8] range (continuous).
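The three knobs above span a mixed continuous/discrete design space. As a hedged illustration of what a set of candidate design points could look like, the sketch below enumerates a plain full-factorial grid that includes the corner points. This is not necessarily the experimental design used in this work; the helper names and the choice of three levels per continuous knob are assumptions.

```python
from itertools import product


def levels(lo, hi, n):
    """Return n evenly spaced levels from lo to hi, endpoints included."""
    return [lo + (hi - lo) * i / (n - 1) for i in range(n)]


def candidate_points(n_levels=3):
    """Full-factorial grid over the Case Study I knobs (illustrative only)."""
    ttsv_decap_ratio = levels(0.0, 0.9, n_levels)   # continuous knob
    pg_tsv_diameter_um = [5, 10, 15]                # discrete knob
    pg_thin_wire_ratio = levels(0.2, 0.8, n_levels) # continuous knob
    return list(product(ttsv_decap_ratio, pg_tsv_diameter_um, pg_thin_wire_ratio))


if __name__ == "__main__":
    points = candidate_points()
    print(len(points), "design points; first:", points[0], "last:", points[-1])
```

Each tuple would correspond to one full run of routing, thermal, and power noise analysis, so the number of levels directly sets the simulation budget.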

4.4.1 Response Surfaces

Table 4 shows the summary of models for responses. We found the best models as described in Section 3.3. Since the values of PRESS RMSE are not much higher than those of RMSE, the response surfaces are not over-fitted. All the models have R² very close to 1, which means most of the design points are fitted well by the models used for the response surfaces. In addition, the validation RMSE is comparable to the RMSE, which confirms that the models predict unseen design points very well.
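The over-fitting check described above compares the fit error (RMSE) with the leave-one-out prediction error (PRESS RMSE). The sketch below shows the idea on a one-variable least-squares line instead of the response surface models of this work. Dividing the squared-error sums by the number of points is an assumption, since some tools divide by the residual degrees of freedom, and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def fit_metrics(xs, ys):
    """Return (R^2, RMSE, PRESS RMSE) for a line fitted to (xs, ys)."""
    n = len(xs)
    slope, icpt = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # PRESS: refit without point i, then predict point i.
    press = 0.0
    for i in range(n):
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (s * xs[i] + b)) ** 2
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n), math.sqrt(press / n)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
    r2, rmse, press_rmse = fit_metrics(xs, ys)
    print(f"R^2={r2:.4f} RMSE={rmse:.4f} PRESS RMSE={press_rmse:.4f}")
```

A PRESS RMSE that is only slightly larger than the RMSE indicates that no single design point is propping up the fit, which is the criterion applied to the models in Table 4.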

Figure 3 shows the response surfaces of all metrics. For each metric, we show the two input factors that are the most influential. For total wirelength, the most significant knob was P/G TSV diameter. In case of maximum arrival time and power noise, P/G TSV diameter and P/G thin wire ratio were the major factors. For maximum silicon temperature, the most influential input factor was the ratio between T-TSV and decap.

4.4.2 Correlations Among Knobs and Metrics

Total wirelength gets shorter with smaller P/G TSV diameter. This is due to the signal wire congestion around P/G TSVs. The maximum arrival time showed a rather complex response surface, although it is generally larger with bigger P/G TSV diameter, which seems reasonable because the wirelength is longer. Maximum silicon temperature dropped sharply when the ratio between T-TSV and decap increased from 0 to 0.2, but the slope flattened beyond a ratio of 0.2. Maximum power noise decreased with higher P/G thin wire ratio. Larger P/G TSV diameter also helped decrease power noise a little.

4.4.3 Optimization and Results

With the response surface models, the cost function was formed and minimized. In this case study, we solve the following problem: minimize the combined cost ($\mathit{Cost}_I$) under 100% routability constraint. We ensure the routability by performing global routing as well as by checking congestion values. We want to minimize total wirelength, maximum arrival time, maximum silicon temperature, and maximum power noise. The following is used to evaluate the solution: $\mathit{Cost}_I = \sqrt[4]{C^*_{wl} \cdot C^*_{at} \cdot C^*_{st} \cdot C^*_{pn}}$. Here, $C^*_{wl}$, $C^*_{at}$, $C^*_{st}$, and $C^*_{pn}$ denote normalized total wirelength, maximum arrival time, maximum silicon temperature, and maximum power noise costs, respectively.
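The combined cost is the geometric mean of the normalized partial costs. A minimal sketch follows; normalizing each metric by its value in a reference design is an assumption made here for illustration (the normalization actually used is defined elsewhere in the paper), and the metric keys and numbers are made up.

```python
import math


def combined_cost(metrics, reference):
    """Geometric mean of metrics normalized by a reference design.

    Both arguments are dicts keyed by metric name; normalizing by the
    reference values is an illustrative assumption.
    """
    ratios = [metrics[name] / reference[name] for name in reference]
    return math.prod(ratios) ** (1.0 / len(ratios))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    ref = {"wl": 10.0, "at": 5.0, "st": 90.0, "pn": 0.2}
    trial = {"wl": 10.0, "at": 5.0, "st": 90.0, "pn": 0.1}
    print(combined_cost(ref, ref))    # reference design scores 1.0
    print(combined_cost(trial, ref))  # halving one of four metrics: about 0.841
```

Using a geometric mean makes the cost insensitive to the units of the individual metrics, and a given percentage improvement in any one metric lowers the cost by the same factor.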

Then, with the optimum design settings, we run the experiments to see the actual result of the optimal design. We compare the following three cases. Baseline: for comparison, we made the baseline setting with the input factors set as follows: T-TSV ratio = 0.5, P/G TSV diameter = 10μm, and P/G thin wire ratio = 0.5. DOE-predicted: this is the optimal design predicted by the response models. DOE-actual: this is the actual result obtained by running the experiment with the optimal setting found in DOE-predicted. The comparison between DOE-predicted and DOE-actual reveals the accuracy of the model prediction.

Table 5 shows the knob settings obtained from the three cases as well as the comparison of design results. Compared to the baseline, the DOE method found a better solution with about 19% lower $\mathit{Cost}_I$. Because the partial costs such as $C^*_{wl}$ are normalized, the reduction in $\mathit{Cost}_I$ appears large. The reduction of maximum silicon temperature is small due to the inefficiency of T-TSVs. The P/G thin wire ratio increased to nearly its maximum because it decreased power noise and did not worsen the other metrics. Comparing DOE-predicted and DOE-actual, we see that the DOE prediction was quite accurate on all metrics. The error between DOE-predicted and DOE-actual per response was less than 1%, except for maximum power noise, which had around 2.6% error. The baseline setting was already good, and DOE did not improve the design much.

4.5 Case Study II: Optimization with MFC

In this case study, we use DOE to co-optimize thermal interconnect (= MFC), P/G interconnect, and signal wires. We allow (1) the MFC width (μm) to vary in [40, 85] range (continuous), (2) MFC pressure drop (kPa) to vary in [100, 180] range (continuous), (3) P/G TSV diameter (μm) to be set to 5, 10, or 15 (discrete), and (4) P/G thin wire ratio to vary in [0.2, 0.8] range (continuous).

4.5.1 Response Surfaces

Table 4 shows the summary of models for responses. As shown in the table, each response has its own best RBF kernel. Comparing PRESS RMSE to RMSE confirms that the response surfaces are not over-fitted. All the models have R² very close to 1, and the validation results are good. Figure 4 shows the response surfaces of all metrics. MFC width and P/G TSV diameter were the dominant factors for total wirelength. For the maximum silicon temperature, the most influential input factors were MFC width and MFC pressure drop. The maximum power noise was affected mostly by P/G thin wire ratio and P/G TSV diameter. The pump power was mostly affected by MFC width and pressure drop.
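To make the role of the RBF kernel concrete, the sketch below fits an exact RBF interpolant through a handful of design points. It is a simplified stand-in, not the modeling flow used in this work: it places one center on every design point with no regularization or center selection, "recmultiquadric" is read here as the reciprocal multiquadric, and the `width` parameter and all function names are assumptions.

```python
import math

# Radial kernels as functions of distance r and a width parameter c.
KERNELS = {
    "multiquadric": lambda r, c: math.sqrt(r * r + c * c),
    "recmultiquadric": lambda r, c: 1.0 / math.sqrt(r * r + c * c),
    "thin-plate": lambda r, c: 0.0 if r == 0.0 else r * r * math.log(r),
    "linear": lambda r, c: r,
}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_rbf(points, values, kernel="multiquadric", width=1.0):
    """Return a predictor that interpolates values at the given points."""
    phi = KERNELS[kernel]
    matrix = [[phi(math.dist(p, q), width) for q in points] for p in points]
    weights = solve(matrix, values)

    def predict(x):
        return sum(w * phi(math.dist(x, p), width) for w, p in zip(weights, points))

    return predict


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    vals = [1.0, 2.0, 3.0, 5.0]
    model = fit_rbf(pts, vals, kernel="recmultiquadric")
    print(model((0.5, 0.5)))
```

Different kernels pass through the same design points but behave differently between them, which is why each response ends up with its own best kernel when judged by validation error. Note that the thin-plate kernel is normally paired with a low-order polynomial term, which this sketch omits.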

4.5.2 Correlations Among Knobs and Metrics

Total wirelength increased with larger MFC width, due to detours caused by the z-direction congestion coming from MFCs. Larger P/G TSV diameter also increased the total wirelength, due to detours caused by the x- and y-direction congestion around P/G TSVs. The maximum silicon temperature decreased with increased MFC width and pressure drop. When the MFC width is smaller, a higher MFC pressure drop decreased the maximum silicon temperature more. Maximum power noise decreases with higher P/G thin wire ratio and P/G TSV diameter, which is the same relation as in Case Study I. Pump power increases with larger MFC width and pressure drop, because more fluid per unit time flows through the MFCs.

4.5.3 Optimization and Results

In this case study, we solve the following problem: minimize the combined cost ($\mathit{Cost}_{II}$) under 100% routability constraint and less than 0.1W pump power. We want to minimize total wirelength, maximum silicon temperature, and maximum power noise. The maximum pump power is set at 0.1W, which is around 3% of the chip power consumption. We put the pump power constraint into the optimization engine, and with the pump power response model the optimization engine prunes the regions with pump power ≥ 0.1W. The following is used to evaluate the solution: $\mathit{Cost}_{II} = \sqrt[3]{C^*_{wl} \cdot C^*_{st} \cdot C^*_{pn}}$. Here, $C^*_{wl}$, $C^*_{st}$, and $C^*_{pn}$ denote normalized total wirelength, maximum silicon temperature, and maximum power noise costs, respectively.
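The pruning of design-space regions that violate the pump power limit can be sketched as a constrained search over response models. The exhaustive grid search below is an assumed stand-in for the optimization engine, and the two toy model functions are hypothetical placeholders, not fitted response surfaces.

```python
from itertools import product


def constrained_minimum(knob_levels, cost_model, pump_model, pump_limit_w=0.1):
    """Grid search that skips points whose predicted pump power is too high.

    knob_levels: one list of candidate values per knob.
    cost_model, pump_model: callables taking the knob values as arguments.
    Returns (best_cost, best_knobs), or None if no point is feasible.
    """
    best = None
    for knobs in product(*knob_levels):
        if pump_model(*knobs) >= pump_limit_w:
            continue  # pruned: violates the pump power constraint
        cost = cost_model(*knobs)
        if best is None or cost < best[0]:
            best = (cost, knobs)
    return best


if __name__ == "__main__":
    # Toy stand-ins: wider channels and higher pressure cool better but
    # cost more pump power. These are NOT the paper's models.
    def toy_pump(width_um, pressure_kpa):
        return width_um * pressure_kpa * 1e-5

    def toy_cost(width_um, pressure_kpa):
        return 1.0 / (width_um * pressure_kpa)

    print(constrained_minimum([[40, 60, 85], [100, 140, 180]], toy_cost, toy_pump))
```

Evaluating the constraint on the cheap response model before the cost means infeasible regions never reach the expensive simulation stage.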

We made the baseline setting with the input factors set as follows: MFC width = 60μm, MFC pressure drop = 140kPa, P/G TSV diameter = 10μm, and P/G thin wire ratio = 0.5. Table 6 shows the knob settings obtained from the three cases as well as the comparison of design results. Compared to the baseline, the DOE method found a better solution with about 32% lower $\mathit{Cost}_{II}$. MFC width was increased to lower the maximum silicon temperature. MFC pressure drop was increased as well, but not to the maximum level because of the pump power constraint. P/G TSV diameter and P/G thin wire ratio were increased to their maximum values to decrease the maximum power noise level. Maximum silicon temperature and maximum power noise decreased at the expense of increased total wirelength. We improved on the baseline noticeably in thermal and power noise metrics. The DOE prediction was again quite accurate on all metrics. The error between DOE-predicted and DOE-actual per response was less than 1%. We can say that the optimized design with DOE and RSM is a good-quality solution in terms of $\mathit{Cost}_{II}$. However, considering the model error, we may find a better solution by a local search around the DOE-predicted optimal point.
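The local search suggested above is not specified further in the text. One simple possibility, shown purely as an assumed illustration, is a coordinate-wise hill climb that starts from the DOE-predicted optimum and re-evaluates nearby knob settings; in practice `evaluate` would wrap an actual simulation run, whereas here it is a toy function.

```python
def local_search(evaluate, start, steps, bounds, max_iters=100):
    """Coordinate-wise hill climbing around a starting knob setting.

    evaluate: callable mapping a list of knob values to a cost.
    steps: step size per knob; bounds: (low, high) per knob.
    Returns (best_knobs, best_cost).
    """
    best, best_cost = list(start), evaluate(list(start))
    for _ in range(max_iters):
        improved = False
        for i, step in enumerate(steps):
            for delta in (-step, step):
                trial = list(best)
                # Keep the trial point inside the allowed knob range.
                trial[i] = min(max(trial[i] + delta, bounds[i][0]), bounds[i][1])
                cost = evaluate(trial)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
        if not improved:
            break  # no neighbor is better: local optimum reached
    return best, best_cost


if __name__ == "__main__":
    # Toy cost with its minimum at (3, -1); stands in for a real simulation.
    toy = lambda k: (k[0] - 3.0) ** 2 + (k[1] + 1.0) ** 2
    print(local_search(toy, [0.0, 0.0], [0.5, 0.5], [(-5.0, 5.0), (-5.0, 5.0)]))
```

Since every evaluation would be a full simulation, the step sizes and iteration cap trade solution quality against runtime.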

5. CONCLUSIONS

In this paper, we presented a co-optimization study of three types of interconnects in 3D stacked ICs: signal, power, and thermal interconnects. The effectiveness of the optimization based on DOE and RSM was demonstrated for the cooling with T-TSVs and MFCs. Carefully tuned response surface models led to reliable optimization results. These models can be reused if the optimization goal is changed by system designers.

Inserting T-TSVs does not incur much z-direction congestion, yet it may not solve the thermal problem effectively. On the other hand, MFCs can bring the die temperature down to an acceptable level; however, the z-direction congestion may lead to a larger chip. Designers should understand these trade-offs when adopting these techniques in 3D stacked ICs.

6. REFERENCES

[1] A. J. Joseph et al. Through-silicon vias enable next-generation SiGe power amplifiers for wireless communications. IBM J. Res. & Dev., 52(6):635–648, Nov. 2008.

[2] K. Banerjee, S. Souri, P. Kapur, and K. Saraswat. 3-D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration. Proceedings of the IEEE, 89:602–633, May 2001.

[3] J. Cong and Y. Zhang. Thermal-driven multilevel routing for 3-D ICs. In Proc. Asia and South Pacific Design Automation Conf., volume 1, pages 121–126, Jan. 2005.

[4] D. Sekar et al. A 3D-IC Technology with Integrated Microchannel Cooling. In Proc. Int. Interconnect Technol. Conf., pages 13–15, June 2008.

[5] B. Goplen and S. Sapatnekar. Thermal Via Placement in 3D ICs. In Proc. Int. Symp. on Physical Design, pages 167–174, Apr. 2005.

[6] D. Lampret. opencores.org.

[7] Y.-J. Lee, Y. J. Kim, G. Huang, M. Bakir, Y. Joshi, A. Fedorov, and S. K. Lim. Co-Design of Signal, Power, and Thermal Distribution Networks for 3D ICs. In Proc. Design, Automation and Test in Europe, pages 610–615, Apr. 2009.

[8] Y.-J. Lee and S. K. Lim. Routing Optimization of Multi-modal Interconnects In 3D ICs. In Electronic Components and Technology Conf., pages 32–39, May 2009.

[9] North Carolina State University. NCSU FreePDK.

[10] E. Wong and S. Lim. 3D Floorplanning with Thermal Vias. In Proc. Design, Automation and Test in Europe, volume 1, pages 1–6, Mar. 2006.
