
Review on Thermal-Related VLSI Design
Student: Sheng Xu

Advisor: Professor Wayne Burleson
May 25, 2006

Outline:
1. Abstract
2. Introduction
3. Architectural analysis and management
   3.1. Modeling
   3.2. DTM
   3.3. Analysis and design methodologies
4. Thermal-aware circuit design
   4.1. Modeling
   4.2. Thermal-aware design
5. Simulation and measurement
6. Sensors
7. Synthesis
8. Test
9. Summary

1 Abstract

Chip temperatures are rising, and heat is becoming more difficult to manage in deep-submicron technologies. Temporal and spatial hotspots across the chip cause a variety of performance and reliability problems. Efforts to address them span every level of semiconductor research, from architecture down to materials science. This paper investigates the mechanisms of temperature rise in digital microprocessors and reviews techniques for mitigating and controlling thermal effects. Recent advances in architecture, circuits, CAD and devices are surveyed, together with their advantages and limitations.

2 Introduction

Temperature is rising significantly across the whole chip, and the uneven distribution of heat in both the temporal and spatial domains is receiving more attention in contemporary processors than ever before. The best-known consequence of excessive heat is physical damage, but it is far from the only one. Temperature fluctuation can cause timing errors by changing delay times. Signal integrity cannot be guaranteed either, since temperature surges can induce noise.[*] A hotter environment also increases power consumption, which in turn raises temperature further, creating a vicious cycle between temperature and power. Temperature limits power delivery and dissipation, which are primary design concerns in future high-end processors.[2]

Chips are becoming hotter because integration density is increasing faster than power per device is decreasing, so power density keeps rising. Static thermal control becomes inefficient when temperature surges depend largely on the computation pattern. Leakage power becomes dominant at 70 nm and below, which further complicates the thermal problem: it can make dense but usually inactive structures, such as cache blocks, become hot.[73] Self-heating is also a concern in bipolar transistors, which are sensitive to temperature variation, and in SOI devices, because of their poor thermal conductivity.

3 Architectural Analysis and Management

3.1 Thermal Modeling

Full-chip thermal modeling approaches can be categorized into precise predictions, grid meshing, finite-element analysis, thermal circuits, and mixtures of several methods. A number of thermal models exist for different parts of a microelectronic design. For example, [8][9] present a dynamic compact thermal model, HotSpot, at the microarchitecture level only. [25] presents a chip-level thermal model based on the full-chip layout. In [74], the authors present a thermal modeling approach based on analytical solutions of the heat transfer equations, focused mainly at the device level. None of these thermal models has the flexibility to model temperature at arbitrary granularity, and some of them are also computationally intensive.[5]

A methodology for deriving more or less 'standardized' compact models is presented in [33]. The possibility of creating parametric compact models, and the influence of differences in ambient temperature, are discussed. It is demonstrated that compact models can approximate the thermal behaviour of electronic devices to within typically 6% of the detailed model values. The methodology has several steps:
1. Creation of a 'full' (detailed) model of the electronic device.
2. Simulation of the full model under a set of 38 combinations of boundary conditions.
3. Implementation of the junction temperatures and heat flow rates thus obtained in Optimize, a proprietary software code that minimizes a user-defined cost function using optimization routines from the NAG library.
4. Definition of the user-defined cost function to be minimized in the optimization procedure.
The cost function is:

in which F is the value of the cost function; T_j,c is the junction temperature calculated with the 'compact' model (K); T_j,f is the junction temperature calculated with the 'full' model (K); W is a weighting factor; and Φ_i,c is the heat flow rate through side i, calculated with the 'compact' model.

Xie and Ketchen proposed a modeling technique, based on an FFT procedure, for a small extreme-power-density macro on a high-power-density microprocessor chip. The governing equation of heat conduction for such a multilayer system is a general Laplace equation:

$$\nabla \cdot \left(\lambda_i \nabla T_i\right) = 0$$

where the subscript i indexes the physical values in the i-th layer and λ_i stands for the thermal conductivity of the heat conduction medium. This equation applies directly to the Si substrate layer and the SiC heat spreader layer; however, special attention must be paid when determining the thermal conductivity of the via and interconnect layers above the device plane. Since these layers are highly heterogeneous spatially, their thermal conductivities are determined, as a first-order approximation, by lumping the densities of the via, interconnect and dielectric materials together. The homogeneity of thermal properties in the x and y directions of each layer, together with the insulating boundary condition on the side surfaces, enables a 2-D cosine transform of the temperature function.

In the spatial frequency domain, the heat conduction equation (2) becomes

By applying the matrix relationship (12) successively across all the layers, the target temperature T can be related to all the known boundary conditions by

and from the boundary conditions we have

The solution of the matrix equation can be written in a generalized form

They applied a lumped multilayer model to study the thermal behavior of a small but extremely high-power-density macro on a microprocessor chip with an average power density of 40 W/cm². They found that the worst-case temperature rise for such a hotspot is about 11% (~7 °C) above the global chip temperature rise (~80 °C) for a 2500 W/cm² power density in SOI technology. For the 15 × 300 µm² macro studied, the back-end metal layers and solder balls can decrease the peak temperature on the device plane somewhat, because the heat flux can spread well laterally before being dissipated in the vertical direction. Because of the buried oxide, the thermal behavior of macros of this size differs significantly between bulk and SOI technologies. The total temperature rise of such macros above ambient is still dominated by the global chip temperature, and is only slightly reduced by heat flow through the back end.
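The layer-by-layer matrix relation described above can be illustrated with a minimal Python sketch. It assumes that every layer is homogeneous and isotropic with constant conductivity λ_i (so λ_i∇²T = 0 inside it), that the die is an a × b rectangle with insulated side walls, and that temperature is expanded in modes cos(mπx/a)·cos(nπy/b). Each mode amplitude θ(z) then satisfies θ'' = γ²θ with γ² = (mπ/a)² + (nπ/b)², and because temperature and vertical heat flux are continuous across interfaces, one 2 × 2 matrix per layer can be chained through the stack. The function names and the example stack are hypothetical and are not taken from Xie and Ketchen's paper.

```python
import math

def layer_matrix(gamma, lam, d):
    """Transfer matrix of one homogeneous layer for one cosine mode.

    State vector: (theta, q), where theta is the mode amplitude of temperature
    and q = -lam * dtheta/dz is the mode amplitude of vertical heat flux.
    Maps the state at the bottom of the layer (z = 0) to its top (z = d).
    """
    if gamma == 0.0:
        # Uniform (0, 0) mode: plain 1-D conduction, temperature drops by q * d / lam.
        return [[1.0, -d / lam], [0.0, 1.0]]
    ch, sh = math.cosh(gamma * d), math.sinh(gamma * d)
    return [[ch, -sh / (lam * gamma)], [-lam * gamma * sh, ch]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def stack_matrix(gamma, layers):
    """Chain layer matrices; `layers` lists (lam, d) from the bottom layer up."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for lam, d in layers:
        m = matmul2(layer_matrix(gamma, lam, d), m)  # upper layers are applied last
    return m

def mode_gamma(m, n, a, b):
    """Spatial decay constant of mode cos(m*pi*x/a) * cos(n*pi*y/b)."""
    return math.hypot(m * math.pi / a, n * math.pi / b)

# Illustrative stack (hypothetical values, SI units): 500 um of silicon
# (lam = 150 W/m/K) under 1 mm of copper (lam = 400 W/m/K) on a 1 cm x 1 cm die.
layers = [(150.0, 500e-6), (400.0, 1e-3)]
uniform = stack_matrix(0.0, layers)
ripple = stack_matrix(mode_gamma(1, 0, 1e-2, 1e-2), layers)
```

For the uniform mode the chained matrix reduces to a series sum of layer resistances d/λ, as expected for 1-D conduction. Because the hyperbolic terms grow like e^(γd), short-wavelength (large-γ) temperature variations decay quickly with distance from the heat source; this is the mathematical form of the lateral spreading mentioned above.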

Figure 1. Left: Top view of the contour plot of the temperature rise of a heated macro with dimensions of 15 × 300 µm² and a power density of 2500 W/cm². Right: 1-D plot of the temperature distribution of the heated macro in the device plane for the bulk case. The plot is along the line cutting through the center of the macro, perpendicular to its long dimension. The three cases denoted in the legend are the same as described in Fig. 5.

In [5], Huang et al. proposed a compact thermal model for temperature-aware design. Thermal design will be one of the major challenges for the CAD community in sub-100 nm designs. The extended version of HotSpot uses temperature as a guideline throughout the entire design flow. Results show that a temperature-aware methodology can provide more accurate design estimations, and therefore better design decisions and faster design convergence.
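The thermal-circuit idea behind HotSpot (Figure 1 (d) and (e)) can be illustrated with a minimal steady-state sketch, in which the thermal capacitors drop out. It assumes a single row of blocks, a lateral thermal resistance between neighboring blocks, and one vertical resistance per block that lumps the path through the thermal interface material, spreader and heat sink to ambient. The function names and values are hypothetical; this is not HotSpot's code or interface.

```python
def gauss_solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def block_temperatures(power, r_vert, r_lat, t_amb):
    """Steady-state temperatures (degC) of a row of blocks.

    Heat balance of block i:
      (T_i - T_amb) / r_vert + sum over neighbours j of (T_i - T_j) / r_lat = P_i
    which is assembled as a conductance matrix G and solved as G T = rhs.
    """
    n = len(power)
    g = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for i in range(n):
        g[i][i] += 1.0 / r_vert
        rhs[i] = power[i] + t_amb / r_vert
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                g[i][i] += 1.0 / r_lat
                g[i][j] -= 1.0 / r_lat
    return gauss_solve(g, rhs)

temps = block_temperatures([20.0, 5.0, 5.0], r_vert=2.0, r_lat=1.0, t_amb=45.0)
```

With powers of 20, 5 and 5 W, the first block ends up hottest and the block next to it is warmer than the far one, because heat leaks laterally from the hot block before it reaches the vertical path. A transient version would add one thermal capacitor per node and integrate in time.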

Figure 1 (a) Interactions among the thermal model and the power, performance and reliability models.

Figure 1 (b) A typical flip-chip CBGA package with heat sink (adapted from [13]). (c) Corresponding thermal circuit in the thermal model. Thermal capacitors connecting each node to ambient are not shown for clarity.

Figure 1 (d) Thermal circuit of a silicon die with 3 microarchitecture blocks, adapted from [9]. (e) Thermal circuit of a silicon die with 3x3 grid cells, with thermal interface material, heat spreader and heat sink. (Thermal capacitors and heat sources are not shown for clarity.)

3.2 Dynamic Thermal Management (DTM)

The term dynamic thermal management (DTM) refers to a range of hardware and software strategies that work dynamically, at run time, to control a chip's operating temperature.[2] Ideally, DTM reduces power with an inexpensive response and affects chip performance as little as possible. As discussed in [2], the key to DTM is responding appropriately to thermal emergencies. As shown in Fig. 1.2, several mechanisms must be considered in DTM design: initiation, trigger and response. Any dynamic response technique requires a trigger mechanism to engage the response during program execution.[2] Popular trigger methods are:

1. Temperature sensors for thermal feedback
2. On-chip activity counters
3. Dynamic profiling analysis
4. Compile-time trigger requirements

Fig. 1.1 Overview of the dynamic thermal management (DTM) technique. Fig. 1.2 Mechanisms for dynamic thermal management.[2]

In [2], the authors point out that on-chip sensors or other hardware-only triggers are not sufficient for thermal response, because (1) average temperature is not an accurate basis for thermal evaluation, and (2) hardware does not provide information about the workload. A more complex mechanism that combines various hardware and software triggers may be more effective than any of these techniques taken alone. Several response mechanisms are available:

1. Clock frequency scaling
2. Voltage and frequency scaling
3. Decode throttling
4. Speculation control
5. I-cache toggling

Clock frequency scaling reduces the clock frequency to lower power consumption and thereby cool the chip. Its drawback is that performance degrades linearly with frequency, and synchronization is also a significant problem when scaling the clock. In [M24-3], Transmeta proposed combining clock frequency scaling with dynamic voltage scaling. This technique likewise requires accurate timing analysis; it can also be critical for circuit design and can aggravate leakage in more aggressive process technologies. In [M21-4], decode throttling is applied by restricting the instruction decode rate, with clock gating used to control power dissipation. The main problems for decode throttling are the cache interface and PLL design. In [M11-5], speculation control is discussed; its key idea is pipeline gating based on branch confidence. In [2], I-cache toggling is proposed: it disables the instruction fetch unit (I-cache and branch prediction) and uses the instruction fetch queue to feed the pipeline.

DTM needs an initiation mechanism to start the response when the trigger reaches its threshold. This initiation mechanism can be implemented by a combination of hardware and software (the operating system). Finer-grained control reduces the number of thermal-emergency cycles but increases overhead. To be effective without too heavy a penalty, designers need to choose triggers carefully and focus on average power. It is also pointed out that thermal management may take longer than expected, and that lightweight policies are more effective at keeping the temperature close to a target level.
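The trigger, initiation and response mechanisms above can be tied together in a small simulation. This sketch assumes a single lumped thermal RC for the whole chip (c·dT/dt = P - (T - T_a)/R), a temperature-sensor trigger with hysteresis (engage at 85 °C, release at 80 °C) as the initiation policy, and two responses: clock frequency scaling alone (P ∝ f) and idealized voltage-and-frequency scaling with V ∝ f (P ∝ f³). Leakage is ignored, and all parameter values and names are hypothetical rather than taken from [2] or from Transmeta's design.

```python
def simulate_dtm(p_full, throttle, scale_voltage, t_amb=45.0, r_th=0.5,
                 c_th=10.0, trigger=85.0, release=80.0, dt=0.01, steps=6000):
    """Threshold-triggered DTM on a lumped RC chip model (all values hypothetical).

    Returns (peak temperature in degC, fraction of time throttled,
    average relative clock frequency as a proxy for performance).
    """
    temp, peak = t_amb, t_amb
    throttled = False
    throttled_steps, work = 0, 0.0
    for _ in range(steps):
        # Trigger + initiation with hysteresis: engage at `trigger`, release at `release`.
        if temp >= trigger:
            throttled = True
        elif temp <= release:
            throttled = False
        f = throttle if throttled else 1.0
        # Response: dynamic power ~ C * V^2 * f. Scaling f alone gives P ~ f;
        # scaling V in proportion to f (an idealization) gives P ~ f^3.
        power = p_full * (f ** 3 if scale_voltage else f)
        # Forward-Euler step of c * dT/dt = P - (T - T_amb) / R.
        temp += dt / c_th * (power - (temp - t_amb) / r_th)
        peak = max(peak, temp)
        throttled_steps += int(throttled)
        work += f
    return peak, throttled_steps / steps, work / steps

peak_v, frac_v, perf_v = simulate_dtm(100.0, 0.5, scale_voltage=True)
peak_f, frac_f, perf_f = simulate_dtm(100.0, 0.5, scale_voltage=False)
```

Both responses hold the peak at the trigger level (without DTM this chip would settle near 95 °C), but because voltage scaling removes far more power per unit of frequency lost, the chip spends less time throttled and delivers more average performance, which is the motivation for combining frequency and voltage scaling.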

3.3 Analysis and Design Methodology

3.3.1 Thermal Evaluation for SMT and CMP

In [6-10], simultaneous multithreaded (SMT) and chip multiprocessor (CMP) architectures are evaluated. SMT and CMP both allow a chip to achieve greater throughput, but their thermal properties are still poorly understood. The temperature profiles of SMT and CMP differ, although Skadron et al. report in [8] that their peak temperatures are similar. SMT heating is primarily caused by localized heating in certain key structures, such as the register file, due to increased utilization. CMP heating, on the other hand, is mainly caused by the global impact of increased energy output, due to the extra energy of the added core. Because of this difference in heating mechanism, Skadron et al. found that the best thermal management technique also differs between SMT and CMP. CMP and SMT also scale differently as the contribution of leakage power grows, with CMP suffering from higher leakage due to the second core's higher temperature and the exponential temperature dependence of subthreshold leakage.

Figure (2.1) Temperature of SMT and CMP vs. ST. (2.2) Temperature difference between CMP and SMT with different leakage scaling factors.[8]

In [2], Martonosi et al. also compared the SMT and CMP architectures. They began with models for different processors using roughly equal silicon resources, and developed parameters and floorplan layouts for each case. They showed that large temperature gradients are prominent with either multithreading technique, but that both architectures show promise as a basis for temperature-aware enhancements to mitigate the problem. They propose several techniques for managing peak temperatures and find that allocating more die area to hot functional blocks can reduce the processors' hottest-unit temperatures by as much as 12 °C. When they scale their processors up to 4 contexts or 4 processor cores, the same temperature trends continue.

Figure (3.1) Original and final temperatures for four-context SMT and CMP workloads. (3.2) Temperatures of the hottest unit (instruction window) for gzip-vortex under SMT with granularly enlarged hot units. At an infinite number of pieces, the averages of the active pieces and blank pieces should both converge to the single-piece temperature.[1]

3.3.2 Cache Design

With the aggressive scaling of CMOS devices, the transistor threshold voltage and the supply voltage have been scaled down together to maintain performance improvements. This decrease in threshold voltage has caused an exponential increase in subthreshold leakage current, which is the dominant source of leakage power [73]. Leakage power has already become comparable to dynamic power, and it is projected to dominate total chip power in nanometer-scale technologies.

In [62], the one-dimensional thermal model shown in Figure 4.1 is used to model the temperature profile. The relation can be written as:

$$P = c\,T_j' + \frac{T_j - T_a}{\theta_{ja}}$$

where θ_ja is the chip junction-to-ambient thermal resistance of the silicon substrate and the package, c is the heat capacity of the system, T_j is the chip junction temperature, T_j' is the time derivative of T_j, P is the chip power dissipation, and T_a is the ambient air temperature. Note that power and temperature are functions of each other, creating an electrothermal coupling effect: a rise in temperature increases leakage power, which in turn raises the temperature even higher, forming a positive feedback loop. Leakage current and junction temperature therefore have to be modeled iteratively. Ismail et al. proposed the simulation loop shown in Figure 4.2, in which SPICE and HotSpot simulate the power and temperature profiles separately, each taking its input from the other.

They also proposed two techniques to reduce both active and leakage power in caches. The first, the Power density Minimized Architecture (PMA), enhances power-down techniques by considering the power density (and hence temperature) of the active parts of the cache. Instead of turning off entire banks, PMA spreads out the active parts by turning off alternating rows in a bank. This reduces the power density of the active parts of the cache, which lowers the junction temperature.
Because of the exponential relationship between leakage power and temperature, this temperature drop yields significant energy savings in the remaining active parts of the cache. The second method, the Block Permutated Scheme (BPS), aims to maximize the physical distance between logically consecutive blocks of the cache. Since caches exhibit spatial locality, this distribution increases the distance between hot spots, thereby reducing the peak temperature.
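The exact block permutation used by BPS is not given here, so the following sketch swaps in a simple stand-in: a bit-reversal permutation that maps logical block indices to physical slots in a one-dimensional row, so that logically consecutive blocks (which spatial locality tends to make active together) land in non-adjacent slots. The function names are hypothetical, and the bit-reversal mapping is an illustrative assumption, not the scheme from [62]; it spreads consecutive blocks apart but does not maximize their distance.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def permuted_layout(num_blocks):
    """Physical slot of each logical block (num_blocks must be a power of two)."""
    bits = num_blocks.bit_length() - 1
    if num_blocks < 2 or (1 << bits) != num_blocks:
        raise ValueError("num_blocks must be a power of two >= 2")
    return [bit_reverse(i, bits) for i in range(num_blocks)]

def min_consecutive_distance(layout):
    """Smallest physical distance between logically consecutive blocks."""
    return min(abs(layout[i + 1] - layout[i]) for i in range(len(layout) - 1))

layout = permuted_layout(8)
```

For eight blocks the layout is [0, 4, 2, 6, 1, 5, 3, 7], so every pair of logically consecutive blocks is at least two slots apart, compared with one slot in the identity layout.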

Figure 4.1 One-dimensional chip thermal model. Figure 4.2 Flowchart of the simulation process to estimate power and temperature.

3.3.3 Core Swapping

Core swapping is another DTM technique for dual-core microprocessors. In [61], Reinman et al. explored core swapping on a microcore architecture, a deeply decoupled processor core with larger structures factored out as helper engines. Core swapping is complicated by two factors: the cost of migrating an application from one core to another, and the area overhead of the additional core used for thermal management. Microcores enable efficient core swapping by buffering processor state in shared helper engines, which reduces startup costs when switching to a new core, and they are small enough to reduce the area overhead of core replication. The results show that the microcore reduces the impact of core swapping significantly, by 43% on average, while showing promising thermal reduction. They also evaluated alternative ways of spending the area overhead of the additional microcore, including larger microcores, CMP cores and SMT cores, all with dynamic frequency scaling (DFS). Their results indicate that while an additional CMP core or larger SMT resources can improve performance, even an idealized version of DFS can hurt their performance significantly.
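A thermally triggered swap policy can be sketched with two lumped-RC cores that share one thread. The sketch assumes that the workload migrates when the active core reaches a trigger temperature and the idle core is at or below a lower threshold; it ignores the migration cost and the helper-engine state buffering that microcores use to reduce it, and all names and values are hypothetical.

```python
def simulate_cores(swap, p_active=60.0, p_idle=5.0, t_amb=45.0, r_th=1.0,
                   c_th=2.0, trigger=85.0, margin=5.0, dt=0.01, steps=3000):
    """Two identical lumped-RC cores sharing one thread (hypothetical values).

    With swap=True, the thread migrates when the active core reaches `trigger`
    and the idle core is at or below `trigger - margin`. Migration cost and
    lateral heat flow between the cores are ignored.
    Returns (peak temperature in degC, number of swaps).
    """
    temps = [t_amb, t_amb]
    active, swaps, peak = 0, 0, t_amb
    for _ in range(steps):
        other = 1 - active
        if swap and temps[active] >= trigger and temps[other] <= trigger - margin:
            active = other
            swaps += 1
        for core in (0, 1):
            power = p_active if core == active else p_idle
            # Forward-Euler step of c * dT/dt = P - (T - T_amb) / R for each core.
            temps[core] += dt / c_th * (power - (temps[core] - t_amb) / r_th)
        peak = max(peak, *temps)
    return peak, swaps

peak_swap, swaps = simulate_cores(swap=True)
peak_fixed, no_swaps = simulate_cores(swap=False)
```

Without swapping, the single active core settles near T_a + P·R (about 105 °C here); with swapping, the peak stays at the trigger level because each core cools while the other one works. A fuller model would charge a performance penalty for every swap, which is exactly the cost the microcore design targets.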

Figure 5.1 Thermal and Performance behavior of different architectures with and without DTM

3.3.4 Distributing the Frontend

González et al. [32] discussed temperature in the frontend of a clustered microarchitecture, which is an important contributor to the total heat dissipated by the processor. Most techniques focus on the backend of the microprocessor, while the heat dissipated by the frontend is becoming more significant and is one of the major contributors to the total average temperature. In [32], a thermally efficient frontend is proposed and analyzed. A mechanism that distributes the rename and commit logic is shown to reduce both peak and average temperatures by more than 30% in the rename table and the reorder buffer, with only a small (2%) impact on performance. A bank hopping scheme for the trace cache is also proposed, together with a thermal-aware mapping that attempts to balance temperature among the cache banks. When both techniques are combined, the temperatures of the reorder buffer, rename table and trace cache are all reduced.

Figure 6.1 Rename table and reorder buffer distribution. Figure 6.2 Overall temperature results for the distributed frontend.

4 Thermal-Aware Circuit Design

4.1 Thermal Impact on Digital Circuits

In [39], the authors discuss potential circuit risks in an excessively hot environment. Thermal effects need to be considered during the circuit design stage, since they affect circuit performance in various aspects, including:

1. Circuit reliability
2. Propagation delays and signal integrity
3. Power dissipation
4. Power/ground integrity

To design circuits that cope with thermal challenges, accurate modeling and simulation are the first step. First, the mean time to failure (MTTF) due to electromigration (EM), given by the well-known Black's equation, shows that reliability decreases as temperature increases. In [19], Black's equation is expressed as:

$$\mathrm{MTTF} = A\,J^{-n}\exp\!\left(\frac{Q}{kT}\right)$$

where MTTF is the median time to failure, A is a process- and geometry-dependent constant, J is the DC (average) current density, the exponent n is 2 under normal use conditions, Q is the activation energy for grain-boundary diffusion (about 0.7 eV for Al-Cu), k is Boltzmann's constant, and T is the metal temperature. Temperature, in turn, depends on the power and the thermal resistance as:

$$T = T_a + P\,R_\theta$$

where T_a is the ambient temperature, P is the power consumption (including dynamic and leakage power), and R_θ is the thermal resistance of the system.

Second, temperature affects propagation delays and signal integrity. According to [76], in a 130-nm industrial process, logic gate delay changes by about 4% over a 40 °C temperature difference, wire resistance increases by about 12% over 40 °C around the nominal temperature, and wire delay changes by about 5% over 40 °C. Clock skew can increase by as much as 10% of the clock cycle time when the junction temperature in the substrate changes by as much as 40 °C [39]. Circuit simulation plays an important role during early design steps and final verification. The accuracy of circuit simulation tools depends on the implemented device models, so accurate performance (timing) analysis requires precise gate and interconnect delay models. Temperature dependence, noise effects and process variations are among the factors a complete model must take into account. Circuit delay depends especially strongly on temperature, so it is important to explore how thermal effects change timing characteristics such as path delay and path criticality.[39]

Leakage power is a significant component in nanometer technologies. Since subthreshold leakage power depends strongly on temperature, the extra power dissipated because of thermal effects forms a positive feedback that pushes the circuit into a worse situation. The authors of [77] present a framework for full-chip estimation of the subthreshold leakage power distribution, considering both within-die and die-to-die variations in process (P), temperature (T) and supply voltage (V). Using this framework, they present a quantitative analysis of the relative sensitivities of subthreshold leakage to P-T-V variations.
They show that for accurate estimation of subthreshold leakage it is important to consider die-to-die temperature variations, which can significantly increase leakage power through the electrothermal coupling between power and temperature. Furthermore, they calculate the full-chip leakage power distribution arising from both within-die and die-to-die P-T-V variations, and use it to estimate the leakage-constrained yield under these variations. The calculations show that yield is significantly lowered by within-die and die-to-die process and temperature variations.

Uneven power consumption in the logic blocks of an IC has two effects: (a) hot spots and temperature gradients, and (b) supply voltage variations, which depend on the current demands of the logic blocks and on the design of the power/ground distribution network. To accommodate variations in local temperature and supply voltage, designers have traditionally been forced to pad logic cell characteristics and design margins. However, building the power distribution network with excessively conservative design practices wastes valuable silicon real estate and increases congestion, resulting in performance loss.[39] Voltage drop occurs on the Vdd rail as well as on ground. The voltage drop on the power rail can take the form of a self-induced IR drop from the external power pin to the power terminal of a logic block, caused by the current drawn by the block itself. If there are large logic blocks along the power line, they consume more power when turned on. This worsens the situation for the power network, since the uneven distribution is caused not only by the block itself but also by the blocks related to it. Besides IR drop, the inductive voltage drop (L di/dt noise) adds to this DC drop, making the circuit even more likely to fail. Since resistance changes with temperature, the temperature distribution influences circuit performance and can even induce timing failures. In [39], the authors summarize the relationships among these aspects, as Figure 6.1 shows.

Figure 6.1 Thermal analysis affects timing (with signal integrity), power and P/G network analyses.

4.2 Thermal-Aware Circuit Design and Analysis

In [36], the authors developed a TCAD tool, ERNI, which allows process-sensitive and layout-specific reliability estimates for fully or partially laid-out integrated circuits.

Figure 7.1 A flowchart for a full hierarchical circuit-level reliability assessment, the basis for the prototype tool ERNI.

Because 3-D integration technology is not yet widespread and no CAD tool supports IC layouts for such a technology, the authors extended ERNI to 3-D ERNI, a comprehensive 3-D circuit layout methodology. The circuit on each wafer or device interconnect layer can be laid out separately, with inter-wafer via information embedded in the layout. The inter-wafer via information is generalized into three categories, sufficient for defining all types of interconnection between wafers in a 3-D stack. A layout-file management strategy that incorporates the orientation of each wafer in the bonding process is also proposed.

In [20], the effects of temperature on very-large-scale integration design are presented, and an analytical technique is introduced to systematically design and evaluate thermal control mechanisms such as dynamic clock throttling (DCT) and dynamic frequency scaling (DFS). One of the difficulties in crafting an effective thermal control mechanism is the lack of systematic methods to evaluate the design. To address this difficulty, the authors performed a first-order analysis that applies even to techniques that differ significantly in concept. First, an overview of the control point selection methodology is presented, followed by descriptions of how both techniques can be evaluated and compared. In generic terms, the objective of the selection methodology is to minimize the energy-delay product (EDP). The authors analyzed DCT and DFS separately; using the EDP metric, DFS is shown to outperform DCT.

Figure 8.1 Left: DCT transient behavior. Right: DCT power-performance tradeoff.

Figure 8.2 Left: DFS transient behavior. Right: DFS power-performance tradeoff.

Besides analyses from the geometric and system points of view, thermal impact has been investigated for specific components [19][62], e.g., interconnects. In [21], the impact of a nonuniform substrate temperature on interconnects is analyzed. The authors present detailed modeling and analysis of interconnect performance degradation due to the nonuniform temperature profiles found along long metal interconnects as a result of thermal gradients in the underlying silicon substrate. A new distributed RC delay model that incorporates nonuniform interconnect temperature is analyzed and applied to a wide variety of interconnect layouts and temperature profiles. Analytical models for interconnect temperature distributions arising from nonuniform substrate temperature profiles were derived from the fundamental heat diffusion equations. It was shown that clock skew can be significantly affected by interconnect temperature nonuniformities. These studies reveal the need to incorporate nonuniform chip thermal analysis into the various optimization and planning steps of the physical-synthesis flow in high-performance IC designs.
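The effect of a nonuniform temperature profile on wire delay can be illustrated with a discretized sketch. This is not the distributed model of [21]: it splits the line into RC segments, uses the Elmore delay of the resulting ladder, and assumes a linear temperature coefficient of resistance β = 0.003 per °C, taken from the figure of about 12% resistance increase per 40 °C quoted in Section 4.1, with capacitance assumed temperature-independent. The function name and values are hypothetical.

```python
def elmore_delay(temps, r_seg0, c_seg, beta=0.003, t_ref=27.0, c_load=0.0):
    """Elmore delay of an RC ladder driven from segment 0.

    temps[i] is the temperature (degC) of segment i. Each segment has
    resistance r_seg0 * (1 + beta * (temps[i] - t_ref)) followed by a
    temperature-independent capacitance c_seg; c_load sits at the far end.
    """
    n = len(temps)
    r = [r_seg0 * (1.0 + beta * (t - t_ref)) for t in temps]
    delay = 0.0
    for i in range(n):
        # Resistance i charges every capacitance downstream of it.
        downstream_c = c_seg * (n - i) + c_load
        delay += r[i] * downstream_c
    return delay

n = 10
uniform = elmore_delay([27.0] * n, r_seg0=1.0, c_seg=1.0)
hot_driver = elmore_delay([110.0] * 5 + [27.0] * 5, r_seg0=1.0, c_seg=1.0)
hot_load = elmore_delay([27.0] * 5 + [110.0] * 5, r_seg0=1.0, c_seg=1.0)
```

Heating the half of the line nearest the driver costs more delay than heating the half nearest the load, because each upstream resistance charges all of the downstream capacitance; the same amount of heating can therefore have quite different timing effects depending on where the hot region lies.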

Figure 8.3 Left: Schematic of two exponentially distributed thermal profiles in different directions along the length of an interconnect line, used to examine the effect of such nonuniformities on signal propagation performance. In the worst case, both thermal profiles impose an excessive gradient of 110 °C along the length of the interconnect line. Right: Performance degradation in a 2000-µm-long interconnect line subject to thermal profiles T and T (cf. Fig. 8) compared with the delay of the same line at a uniform temperature of 27 °C. The x axis shows the value of T in the left panel, while T is assumed to be fixed at 40 °C.

Figure 8.4 Left: Constant-peak normally distributed thermal profile with variable median μ and standard deviation σ along a 2000-µm-long interconnect line. Right: Delay degradation of a 2000-µm-long interconnect line subject to the constant-peak normally distributed thermal profile in the left panel, as a function of its median value for various standard deviations (σ), compared with the delay of the same line at a uniform room temperature of 27 °C.

In [47], the authors investigated thermal coupling effects between interconnects. Two coupling cases were considered: parallel coupling between a power line and a signal line in the same layer, and cross-at-90° coupling between a power line and a line array. They showed that the temperature reduction in the cross-at-90° case is significant even if the line array carries current, while the effect of parallel coupling is negligible. Modified design rules in terms of the maximum allowed rms current density were proposed: the allowed rms current density can be increased by up to 20% when coupling effects are taken into account. By fitting the simulation data, the authors developed semi-empirical formulae for interconnect temperature. These formulae can be implemented in CAD tools to provide more accurate simulation of circuit timing and reliability.
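The design-rule discussion above ultimately rests on the temperature dependence of electromigration lifetime. As a generic illustration (not the semi-empirical formulae of [47]), Black's equation in its usual form, MTTF = A·J^(-n)·exp(Q/(kT)), gives lifetime ratios in which the constant A cancels. The sketch uses n = 2 and Q ≈ 0.7 eV for Al-Cu, the values quoted in Section 4.1; the function names and example temperatures are hypothetical.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def mttf_ratio(t_hot_c, t_cool_c, q_ev=0.7):
    """MTTF(t_cool) / MTTF(t_hot) at equal current density, from Black's equation."""
    t_hot, t_cool = t_hot_c + 273.15, t_cool_c + 273.15
    return math.exp(q_ev / BOLTZMANN_EV * (1.0 / t_cool - 1.0 / t_hot))

def equal_life_current_ratio(t_hot_c, t_cool_c, n=2.0, q_ev=0.7):
    """J_cool / J_hot that keeps MTTF unchanged when the wire runs cooler.

    Equal MTTF requires (J_cool / J_hot)^n = MTTF(t_cool) / MTTF(t_hot).
    """
    return mttf_ratio(t_hot_c, t_cool_c, q_ev) ** (1.0 / n)

lifetime_gain = mttf_ratio(105.0, 85.0)
current_headroom = equal_life_current_ratio(105.0, 95.0)
```

Under these assumptions, a 20 °C rise from 85 °C to 105 °C shortens the EM lifetime by roughly a factor of 3.3, and a wire running at 95 °C instead of 105 °C could carry about 34% more current density for the same MTTF.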

The authors of [18] report the first successful application of the DISMAP and NESSUS technologies to semiconductor on-chip interconnect reliability analysis. Comparing the DISMAP results with finite-element analysis (FEA) predictions, they observe that because the strain field in reality appears to be affected by larger-scale phenomena, modeling a single interconnect structure to examine strain fields may be of limited utility. Some similarities between the FEA results for a single interconnect and the DISMAP results are observed. However, to be truly useful, the FEA should likely incorporate three-dimensional geometries on a scale large enough to account for the effect of neighboring structures. This observation is important because the study of single interconnects is the primary mode of investigation reported in the literature. Despite the limitations of FEA of a single interconnect, combining it with NESSUS can yield results that are perhaps more pertinent. NESSUS shows that the statistical importance of copper's thermal expansion coefficient and the oxide's Young's modulus is dominant, and it is reasonable to expect this type of result to carry over to large-scale analysis. To appreciate the importance of that result, note that data recently collected by a major chip manufacturer exhibited a 300% variation in the oxide's Young's modulus on a single wafer. For those determining geometric tolerances, the mesh can be parameterized with random variables and the same type of reliability analysis repeated. The three major random variables in the thermal modeling are copper's thermal expansion coefficient (Cu α), the initial temperature, and the oxide's Young's modulus (Ox E). The approach of combining DISMAP measurement with FEA and probabilistic analysis has the potential to be very useful in interconnect reliability analysis.
The limitations of the model are highlighted by the experimental data, while the limited range of exploration offered by DISMAP is expanded by NESSUS.

Figure 8.5 Left: The cross-sectioned interconnect structure viewed at high accelerating voltage (15 kV) in the scanning electron microscope. Right: Interconnect material behavior model.

5 Thermal and Electrothermal Simulation

The coexisting electrical and thermal networks are not independent; they are coupled to each other. To predict electrothermal effects, electrothermal circuit simulators, which are similar to the commonly used SPICE simulator, can handle the thermal subnetwork as well as the electrical-thermal coupling effects.

Due to the tremendous complexity of VLSI circuits, gate-level or logic-level simulation and modeling are not feasible. Since the thermal problem can be formulated separately from the electrical one, various thermal simulation and modeling methods have been developed. They fall into two categories. The first category is based on the discretization of differential operators or of the field quantity. The corresponding thermal simulators solve the heat conduction problem numerically, using techniques such as the finite-difference, finite-element or boundary-element methods.[39] The advantage of these methods is their high accuracy and their ability to handle different heat sources and types of boundary conditions. Their main drawback is the enormous size of the resulting thermal circuits due to volume meshing. Different techniques have been devised to tackle this shortcoming; examples include 3-D thermal ADI in [75] and model order reduction in [44]. The second category is based on the Green's function formulation [26][52], which provides fast but less accurate thermal simulation because of its simplified two-dimensional modeling of the thermal problem.[39]

There are two major approaches to electrothermal simulation of a given circuit. In the first, the thermal problem is mapped into an equivalent electrical problem, and an electrical solver co-simulates the electrical and thermal subsystems that coexist in an IC chip. In the second, two independent simulators, a thermal simulator and an electrical circuit simulator, iterate interactively to deliver the solution of the electrothermal problem.
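The first category of thermal simulators described above (discretizing the heat conduction equation) can be illustrated, in its simplest form, with a 2-D finite-difference grid solved by Gauss-Seidel iteration. The sketch assumes a single silicon layer divided into equal cells, a lateral conductance between neighboring cells, and one vertical conductance from each cell to a heat sink at fixed temperature; the function name and values are hypothetical. Real tools use 3-D meshes, and their size is exactly the drawback that techniques such as ADI [75] and model order reduction [44] address.

```python
def fd_steady_state(power, g_lat, g_vert, t_amb, tol=1e-9, max_iter=100_000):
    """Gauss-Seidel solution of a 2-D finite-difference (thermal-grid) model.

    power[i][j]: heat injected into cell (i, j), in W
    g_lat:       conductance between edge-adjacent cells, in W/K
    g_vert:      conductance from every cell to the heat-sink node, in W/K
    Cells on the die edge simply have fewer neighbours (adiabatic sides).
    """
    rows, cols = len(power), len(power[0])
    temp = [[t_amb] * cols for _ in range(rows)]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(rows):
            for j in range(cols):
                # Heat balance of cell (i, j), solved for its own temperature.
                num, den = power[i][j] + g_vert * t_amb, g_vert
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < rows and 0 <= nj < cols:
                        num += g_lat * temp[ni][nj]
                        den += g_lat
                new = num / den
                max_change = max(max_change, abs(new - temp[i][j]))
                temp[i][j] = new
        if max_change < tol:
            return temp
    raise RuntimeError("Gauss-Seidel did not converge")

hot = [[0.0] * 5 for _ in range(5)]
hot[2][2] = 10.0  # a single 10 W hotspot in the middle of a 5 x 5 grid
grid = fd_steady_state(hot, g_lat=1.0, g_vert=0.1, t_amb=45.0)
```

Because every cell has a path to the heat sink (g_vert > 0), the system is strictly diagonally dominant and the iteration converges; at the solution, the heat leaving through the vertical conductances equals the injected power.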
Both approaches have their own advantages and disadvantages; however, the latter, known as the relaxation method, is generally preferred.[39] With the relaxation method, existing software packages can be used for the basic simulations, and the electrical and thermal models of a specific chip can be constructed separately. The most important representatives of relaxation methods are ILLIADS-T and ETS-A. The advantage is the relative simplicity of the implementation. The drawbacks are that very fast changes cannot be captured and that, for strongly coupled thermal problems, the simulator coupling frequently fails to converge.

5.1 SISSI
In reality, an IC is composed of two coupled systems: an electrical circuit (whose proper functioning is the ultimate design objective) and a thermal system represented by the silicon chip and its thermal environment. These two subsystems are coupled through the dissipation of the electrical circuit (which realizes the heat sources) and through the temperature dependence and sensitivity of the components of the electrical circuit (temperature dependence of semiconductor device parameters, Seebeck effect of Si-Al contacts). This mutual dependence can be formulated by a pair of state equations, one for the electrical side and one for the thermal side.

With simultaneous iteration, these two equations are treated as one single system, while in the relaxation method they are simulated separately.
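In generic form, the coupled problem can be written as an electrical system whose dissipated power depends on the node voltages and on temperature, together with a thermal RC network driven by that power. The symbols below (node voltages v, temperatures T, power vector p, thermal capacitance and conductance matrices C_th and G_th, ambient temperature T_amb) are introduced for illustration and are not necessarily the notation of the SISSI papers:

```latex
\begin{aligned}
\text{electrical:}\quad & \mathbf{f}\!\left(\mathbf{v}, \dot{\mathbf{v}}, \mathbf{T}\right) = \mathbf{0},
  \qquad \mathbf{p} = \mathbf{p}(\mathbf{v}, \mathbf{T}) \\
\text{thermal:}\quad & \mathbf{C}_{th}\,\dot{\mathbf{T}} + \mathbf{G}_{th}\left(\mathbf{T} - T_{amb}\mathbf{1}\right) = \mathbf{p}
\end{aligned}
```

Simultaneous iteration solves for v and T together, for example with Newton's method on the stacked system. A relaxation step k instead solves the electrical part with T held at T^(k) to obtain p^(k), then solves the thermal part with p held at p^(k) to obtain T^(k+1), and repeats until T stops changing.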

Figure 9.1 General flow-chart of electro-thermal simulation, performed by the method of simultaneous iteration.

In [75], the authors suggest that for large digital chips the relaxation method should be used to couple a logic simulator (using logic models of gates, flip-flops and other standard digital building blocks) with a preferably fast thermal simulator. Because logic simulation is coupled to thermal simulation, they call this method logi-thermal simulation, as shown in Figure 9.2.
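The relaxation coupling can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: the "electrical simulator" is replaced by a per-block power model with exponentially temperature-dependent leakage, and the "thermal simulator" by a steady-state transfer-resistance matrix. The function names and numbers are hypothetical, not the interfaces of any tool cited here. The last call reproduces the convergence failure mentioned for strongly coupled problems: with a large thermal resistance and steep leakage, the loop runs away instead of settling.

```python
import math


def electrical_step(temps, p_dyn, p_leak_ref, beta, t_ref):
    """Stand-in for the electrical/logic simulator: block power at given temperatures.

    Leakage is modelled as p_leak_ref * exp(beta * (T - t_ref)) (an assumption).
    """
    return [pd + pl * math.exp(beta * (t - t_ref))
            for pd, pl, t in zip(p_dyn, p_leak_ref, temps)]


def thermal_step(powers, r_matrix, t_amb):
    """Stand-in for the thermal simulator: steady state T_i = T_amb + sum_j R_ij * P_j."""
    return [t_amb + sum(r * p for r, p in zip(row, powers)) for row in r_matrix]


def relaxation(p_dyn, p_leak_ref, r_matrix, t_amb, beta=0.02, t_ref=25.0,
               tol=1e-6, max_iter=200, t_limit=200.0):
    """Alternate the two simulators until the temperatures stop changing."""
    temps = [t_amb] * len(p_dyn)
    for k in range(1, max_iter + 1):
        powers = electrical_step(temps, p_dyn, p_leak_ref, beta, t_ref)
        new_temps = thermal_step(powers, r_matrix, t_amb)
        if max(new_temps) > t_limit:
            # Strong electrothermal coupling: the loop diverges (thermal runaway).
            raise RuntimeError(f"no convergence: temperature exceeded {t_limit} C")
        if max(abs(a - b) for a, b in zip(new_temps, temps)) < tol:
            return new_temps, k
        temps = new_temps
    raise RuntimeError("no convergence within max_iter iterations")


# Two blocks with mutual thermal coupling (all numbers are illustrative).
R = [[2.0, 0.5],
     [0.5, 1.5]]  # transfer thermal resistances, K/W
temps, iterations = relaxation([10.0, 5.0], [1.0, 1.0], R, t_amb=45.0)
print([round(t, 2) for t in temps], "after", iterations, "iterations")

try:
    relaxation([5.0], [1.0], [[20.0]], t_amb=45.0, beta=0.05)
except RuntimeError as err:
    print("strongly coupled case:", err)
```

Each simulator is treated as a black box that only exchanges power and temperature vectors, which is what lets existing tools be reused; the price is that convergence depends on how strongly the two subsystems feed back on each other.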

Figure 9.2 General flow-chart of the logi-thermal simulation

5.2 ILLIADS
Illinois Analog Digital Simulator (ILLIADS) is a fast timing simulator for MOS VLSI circuits and their interconnects. There are two special versions of ILLIADS, namely ILLIADS-I and ILLIADS-T. ILLIADS-I can handle circuits with complex RLC interconnects by performing efficient and accurate interconnect model order reduction, while ILLIADS-T is an enhanced electrothermal timing simulator that considers the effects of temperature on circuit operation. Note that, for now, the special functionalities of ILLIADS-I and ILLIADS-T are mutually exclusive: ILLIADS-I cannot perform electrothermal simulation, and ILLIADS-T cannot handle complex interconnects with inductance effects.[25] ILLIADS-T aims at finding the on-chip temperature profile, hot spots, and the resulting circuit performance of VLSI chips. A decoupled approach is used to find the steady-state temperature, which greatly reduces computation time compared with coupled electrothermal simulation. ILLIADS-T also includes a numerical 3-D thermal simulator that takes into account various packaging structures and boundary conditions. A test chip was designed and fabricated to verify and calibrate ILLIADS-T, and good agreement between measured and simulated results was observed. The thermal effect was shown to have a significant impact on overall circuit performance. ILLIADS-T has been included in the electromigration diagnosis tool TEM for temperature-dependent interconnect mean-time-to-failure (MTF) estimation.[25]

Figure 10.1 Flowchart of ILLIADS-T electrothermal simulation.
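Why an accurate temperature profile matters for MTF estimation can be seen from Black's equation, a widely used electromigration lifetime model, MTF = A·J^(-n)·exp(Ea/(k·T)). Whether TEM uses exactly this form is not stated in [25] and is an assumption here, as is the 0.7 eV activation energy in the example. The function names are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def black_mtf(a, j, n, ea_ev, temp_c):
    """Black's equation: MTF = A * J**(-n) * exp(Ea / (k * T)), T in kelvin."""
    t_k = temp_c + 273.15
    return a * j ** (-n) * math.exp(ea_ev / (BOLTZMANN_EV_PER_K * t_k))


def mtf_ratio(ea_ev, temp_c, ref_temp_c):
    """MTF at temp_c relative to MTF at ref_temp_c for the same wire and current
    density; A, J and n cancel, so only the activation energy matters."""
    t, t_ref = temp_c + 273.15, ref_temp_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t - 1.0 / t_ref))


# A wire segment running 20 C hotter than expected (Ea = 0.7 eV is an assumed value).
print(f"{mtf_ratio(0.7, 105.0, 85.0):.2f}")
```

Under these assumptions a segment at 105 °C instead of 85 °C keeps only about 30 % of its expected lifetime, so errors of a few tens of degrees in the simulated profile translate into large errors in predicted reliability.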

5.3 Using PROPHET for full-chip thermal modeling
As discussed before, thermal models have mostly been analyzed and applied to local chip areas, and full-chip evaluation has received less attention. Button et al. used PROPHET, an equation solver, to build a thermal profile of the full chip. In [78], the heat generation is pre-calculated at the functional-block level using electrical circuit simulators, e.g., PowerMill from Synopsys. The thermal diffusion equation is then solved rigorously at the chip level, taking into consideration the complete structure of the chip, including interconnect layers and package. The simulated temperature distribution can be used to analyze signal and clock propagation delay along global paths. With further improvement, coupled electrothermal simulation becomes possible by feeding the temperature information back to the functional blocks for electrical simulation. The key contribution of the paper is accurate modeling of the multi-layer structure. PROPHET, a script-based simulator, is used to solve the thermal diffusion equation for real engineering problems. The code has been interfaced to widely available visualization tools and enhanced by post-processing programs that allow multi-dimensional rendering and probing of the simulation results.

5.4 ETS-A
In [79], the authors used a sequence of procedures: layout extraction with x-y coordinates for individual transistors, fast timing-based power calculation, analytical thermal simulation using an integral transform, and electrothermal iterations until convergence. ETS-A takes advantage of a fast timing simulator while preserving accuracy through temperature-dependent region-wise quadratic (RWQ) MOS transistor models. The novel mixed 3-D and 1-D thermal simulator implemented in ETS-A efficiently takes into account the chip packaging and the thermal boundary conditions (BCs), which were often ignored in typical thermal simulations. With ETS-A, the on-chip temperature profile can be calculated and then used to guide temperature-driven module placement as well as chip packaging design.

Figure 11.1 Left: temperature distribution on the substrate top using PROPHET. Right: temperature distribution along the global data path (X = 7000 μm in Figure 1 of [78]).

5.5 QUILT
QUILT stands for Quick Utility for Integrated circuit Layout and Temperature modeling. QUILT lets users rapidly build floorplans of integrated circuits, providing both a visual aid and an input to the HotSpot simulator. The tool provides numerous features for estimating circuit performance, such as interconnect delay, and for generating graphical images for publications. QUILT is written in Java, so it is platform-independent and can run on a variety of computing platforms, which makes it accessible to many users. It acts as an interface between raw text data and the user. Using QUILT, users can change an IC layout quickly and evaluate and analyze the results of their modifications. One feature not seen in other tools is the ability to generate graphics for hard copies or for use in presentations and documentation. Finally, QUILT addresses the issues of temperature and interconnect, two areas of growing importance for future microprocessors that need increased emphasis in the classroom; its interactive visualizations are effective in helping to meet that need.[80]

6. Sensors
The most suitable place for temperature sensors is inside the package, on the chip surface. The most cost-effective way to realize this arrangement is to integrate the temperature sensor with the chip's circuitry, which avoids an extra production step for sensor insertion. For digital VLSI circuits, integrated sensors should meet special requirements: compatibility with the given process (typically digital CMOS), low area and power consumption, and easy interfacing to the digital environment.[67] CMOS-compatible sensors are also an emerging technique. Frequency-output arrangements are a straightforward way to provide easy digital interfacing. In this case, the sensor is a square-wave oscillator whose frequency depends on the temperature; counting the oscillator's pulses in a time window yields a measure of the temperature. Unfortunately, the frequency also depends on the bias voltage. A configurable ring oscillator overcomes this problem: it can be switched between two configurations with different temperature and bias sensitivities, and measuring the frequency in both configurations yields both the temperature and the bias voltage from a calibration diagram. When this circuit was applied in a VLSI dataflow processor, the temperature measurement accuracy was ±3 °C, which is modest but acceptable for the purpose. The area consumption was 0.5 mm² in a 1-micron process. A number of sensors use parasitic, lateral, or substrate bipolar transistors, which can be realized in most CMOS processes. These are usually PTAT sensors, meaning that the output value is proportional to absolute temperature. In [82], the authors described a smart temperature sensor based on a pnp substrate transistor and integrated with a sigma-delta A/D converter. Its power consumption is extremely low in sampled operation: 7 mW.
The accuracy is ±1 °C. The area requirement, however, is about 0.6 mm² in a 0.7-micron process. In a recently announced all-CMOS solution, the sensor is based on the temperature dependence of the two main parameters of MOS transistors: threshold voltage and gain factor. A current-to-frequency converter produces the digital output. The sensor's temperature sensitivity is −0.81%/°C. The bias-voltage dependence is fairly weak: only −1.3 °C/V in terms of the measured temperature. The sensor's area is 0.0185 mm² in a 1-micron process, and its power consumption is less than 200 mW. Its accuracy and long-term stability are better than ±1 °C.

6.1 Thermal sensor application in temperature adaptive design
In [81], the authors designed a low-overhead, process-variation-tolerant CMOS temperature sensor with linear sensitivity over a wide temperature range. They also proposed a temperature adaptive supply voltage scaling (TASS) technique that tracks changes in the die temperature and adjusts the supply voltage to hold the die temperature constant, reducing power consumption dynamically. With the TASS scheme, the die temperature can be kept relatively constant over a wide range of ambient temperature changes.
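Returning to the configurable ring oscillator described at the start of this section, its two-configuration readout can be sketched as follows. As a simplification of the calibration diagram, each configuration's frequency is assumed to be linear in temperature and bias voltage near the calibration point; the class, function names and calibration numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OscConfig:
    """Linearised calibration of one ring-oscillator configuration (assumed model):
    f = f0 + k_t * (T - T0) + k_v * (V - V0)."""
    f0: float   # frequency at the calibration point (T0, V0), Hz
    k_t: float  # temperature sensitivity, Hz per degree C
    k_v: float  # bias-voltage sensitivity, Hz per volt


def frequency_from_count(pulses, window_s):
    """Frequency-output readout: count pulses in a fixed time window."""
    return pulses / window_s


def solve_temperature_and_bias(f_a, f_b, cfg_a, cfg_b, t0, v0):
    """Invert the two linear calibration equations (2 x 2 system, Cramer's rule)."""
    d_a, d_b = f_a - cfg_a.f0, f_b - cfg_b.f0
    det = cfg_a.k_t * cfg_b.k_v - cfg_a.k_v * cfg_b.k_t
    scale = abs(cfg_a.k_t * cfg_b.k_v) + abs(cfg_a.k_v * cfg_b.k_t)
    if scale == 0.0 or abs(det) < 1e-9 * scale:
        raise ValueError("the two configurations need different T/V sensitivity ratios")
    d_t = (d_a * cfg_b.k_v - cfg_a.k_v * d_b) / det
    d_v = (cfg_a.k_t * d_b - d_a * cfg_b.k_t) / det
    return t0 + d_t, v0 + d_v


# Illustrative calibration data (hypothetical numbers).
cfg_a = OscConfig(f0=100e6, k_t=-50e3, k_v=20e6)
cfg_b = OscConfig(f0=80e6, k_t=-80e3, k_v=5e6)
f_a = frequency_from_count(95_750, 1e-3)  # pulses counted in a 1 ms window
f_b = frequency_from_count(75_900, 1e-3)
temp_c, vdd = solve_temperature_and_bias(f_a, f_b, cfg_a, cfg_b, t0=25.0, v0=3.3)
print(f"T = {temp_c:.1f} C, V = {vdd:.2f} V")
```

The singularity check reflects the physical requirement: if both configurations had the same ratio of temperature to bias sensitivity, the two measurements would carry no independent information and the bias dependence could not be removed.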

Figure 12.1 Left: temperature sensor schematic. Right: temperature adaptive supply scaling (TASS) block diagram.

Figure 12.2 Temperature adaptive supply scaling (TASS) schematic
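As a rough illustration of the TASS feedback idea, the sketch below closes the loop with a simple integral-style controller and a toy die model. The controller law, gain, voltage limits and the P ∝ V² power model are all assumptions for illustration, not the circuit of [81].

```python
def tass_update(vdd, t_measured, t_target, gain=0.005, v_min=0.8, v_max=1.2):
    """One control step: lower Vdd when the die is hotter than the target,
    raise it when cooler, and clamp it to the allowed supply range."""
    vdd -= gain * (t_measured - t_target)
    return min(v_max, max(v_min, vdd))


def die_temperature(t_ambient, vdd, r_th=0.5, k_power=60.0):
    """Toy die model (assumption): P = k_power * Vdd**2 watts, T = T_amb + R_th * P."""
    return t_ambient + r_th * k_power * vdd ** 2


def run_tass(t_ambient, t_target=80.0, steps=500):
    """Start at the maximum supply and let the controller settle."""
    vdd = 1.2
    for _ in range(steps):
        vdd = tass_update(vdd, die_temperature(t_ambient, vdd), t_target)
    return vdd, die_temperature(t_ambient, vdd)


for ambient in (25.0, 50.0, 60.0):
    vdd, temp = run_tass(ambient)
    print(f"ambient {ambient:4.1f} C -> Vdd {vdd:.3f} V, die {temp:.1f} C")
```

In this toy model the die settles at the 80 °C target for 50 °C and 60 °C ambients by lowering Vdd, while at 25 °C the supply stays at its 1.2 V ceiling and the die runs below target, which mirrors the claim that the die temperature is held roughly constant over a range of ambient conditions.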

6.2 Thermal sensors and adaptivity for power reduction
In [83], the authors demonstrate that, by using temperature sensors, a hybrid "Cache Decay"-"Drowsy Cache" technique can be adapted to the existing thermal conditions. The approach adapts to the leakage conditions of the chip to yield the maximum energy reduction at minimal performance cost. They also present two mechanisms that implement this scheme: a digital one based on hierarchical counters and a novel, cost-effective analog one based on 4T DRAM cells.
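A minimal sketch of the digital mechanism follows, assuming the classic cache-decay arrangement of a global tick counter plus a 2-bit counter per line: a line becomes drowsy at one count and is gated off at saturation, and temperature adaptation is done by shortening the global tick as leakage rises. The exponential leakage model, the thresholds and all constants are assumptions, not values from [83].

```python
import math


def tick_period(temp_c, base_period=1024, t_ref=60.0, beta=0.04, min_period=64):
    """Global-counter tick period (cycles). Leakage is assumed to grow as
    exp(beta * T), so the period shrinks by the same factor as T rises."""
    period = base_period * math.exp(-beta * (temp_c - t_ref))
    return max(min_period, int(period))


class DecayLine:
    """Per-line 2-bit local counter driven by the global tick."""
    DROWSY_AT = 2  # counter value at which the line is put into drowsy mode
    OFF_AT = 3     # saturated counter: the line is gated off (decayed)

    def __init__(self):
        self.counter = 0

    def access(self):
        self.counter = 0  # any access resets the idle count and wakes the line

    def tick(self):
        self.counter = min(self.counter + 1, self.OFF_AT)

    @property
    def state(self):
        if self.counter >= self.OFF_AT:
            return "off"
        if self.counter >= self.DROWSY_AT:
            return "drowsy"
        return "active"


def idle_cycles_until_off(temp_c):
    """Idle cycles before an untouched line is gated off, assuming the line was
    accessed right at a tick boundary."""
    line, period, cycles = DecayLine(), tick_period(temp_c), 0
    while line.state != "off":
        line.tick()
        cycles += period
    return cycles


for t in (45.0, 60.0, 85.0):
    print(f"{t:.0f} C: tick every {tick_period(t)} cycles, "
          f"line off after {idle_cycles_until_off(t)} idle cycles")
```

The hierarchy keeps per-line hardware tiny (two bits), while the single global counter is the only place where temperature enters, so adapting to thermal conditions costs one programmable period rather than changes to every line.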

Figure 13.1 Two adaptive mechanisms: an accurate digital one and a novel analog one.

6.3 Delay-line-based thermal sensor
A CMOS smart temperature sensor with an extremely small chip area and low power consumption is presented in [84]. By replacing the voltage or current ADC used in conventional designs with a cyclic time-to-digital converter (TDC), the chip area is reduced to 0.09 mm², less than one-twentieth of that of most earlier designs. Accordingly, a temperature-to-time generator, rather than a BJT-based sensor, generates the thermally sensitive time interval required by the cyclic TDC. The digital output of the sensor is highly linear, and no curvature correction or dynamic offset cancellation is required to reach satisfactory accuracy. By thermally compensating the TDC for better linearity and resolution, and by sharing one thermal compensation circuit between two adjacent cells to reduce chip size and power consumption, the error, resolution, chip size and power consumption are reduced from −0.9 to +0.7 °C, 0.16 °C, 0.175 mm² and 10 μW in the earlier version to ±0.6 °C, 0.09 °C, 0.09 mm² and 1.5 μW, respectively.

Figure 14.1 Left: circuit of the proposed temperature-to-time generator. Right: (a) former delay cell and (b) proposed delay cell used in delay line 1.

Figure 14.2 Left: The block diagram of the cyclic TDC. Right: The effect of the gates’ inhomogeneity on pulse width.
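The coarse/fine counting of a cyclic TDC can be sketched behaviorally as below. This is an illustration only: the linear temperature-to-time law, the parameter values and the function names are assumptions, and the thermal compensation and cell sharing of [84] are not modeled. Reusing a short delay line and counting laps is what keeps the number of cells, and hence the area, small.

```python
def cyclic_tdc(pulse_width_s, cell_delay_s, n_cells):
    """Quantise a pulse width with a short delay line used cyclically.

    Returns (laps, cell): the number of complete trips around the n_cells-long
    line (counted by a lap counter) and the cell reached on the final trip.
    """
    total_cells = int(pulse_width_s / cell_delay_s)  # truncation = quantisation
    return divmod(total_cells, n_cells)


def code_to_temperature(laps, cell, n_cells, cell_delay_s, w_ref_s, alpha, t_ref_c):
    """Invert an assumed linear temperature-to-time law:
    W(T) = w_ref_s * (1 + alpha * (T - t_ref_c))."""
    width = (laps * n_cells + cell) * cell_delay_s
    return t_ref_c + (width / w_ref_s - 1.0) / alpha


# Hypothetical parameters: 1 us reference pulse, +0.2 %/C, 50 ps cells, 16-cell loop.
W_REF, ALPHA, T_REF, CELL, N = 1e-6, 2e-3, 25.0, 50e-12, 16
true_temp = 47.3
width = W_REF * (1 + ALPHA * (true_temp - T_REF))
laps, cell = cyclic_tdc(width, CELL, N)
decoded = code_to_temperature(laps, cell, N, CELL, W_REF, ALPHA, T_REF)
print(laps, cell, round(decoded, 3))
print("resolution (C per LSB):", CELL / (W_REF * ALPHA))
```

In this model the temperature resolution is one cell delay divided by the pulse-width sensitivity, so finer cells or a more temperature-sensitive generator both improve it without adding cells to the loop.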

7. Synthesis, floorplanning and placement
Traditional synthesis flows assume that the substrate and interconnect layers are at a uniform temperature. A uniform-temperature model is not sufficient in the DSM regime, because the unmodeled thermal risk may degrade or damage the chip. During placement and floorplanning, designers have the chance to move hot blocks apart from each other, which lowers hot-spot temperatures. In practice, this kind of thermal-aware synthesis can only be realized with CAD tools.

7.1 Matrix approach for thermal placement
In [10], the authors considered the thermal placement problem for gate arrays. They introduced a combinatorial optimization problem, the Matrix Synthesis Problem (MSP), to model thermal placement: given a list of mn non-negative real numbers and an integer t, construct an m × n matrix from the given numbers such that the maximum sum among all t × t sub-matrices is minimized. They showed that MSP is NP-complete and presented several provably good approximation algorithms for it. They also demonstrated that the thermal placement strategy is flexible enough to allow simultaneous consideration of other objectives such as wiring.

7.2 IP virtualization and placement for networks-on-chip architecture
Hung et al. proposed a thermal-aware virtualization and placement scheme for NoC architectures in [6]. A particular property of the network-on-chip architecture is hardware virtualization, which maps one or more logical processing units onto a single processing element (PE), allowing the PE to perform the computation of one or more logical units (depending on the degree of virtualization). Because temperature can have dramatic impacts on circuit behavior, an accurate yet simple temperature model is required. The authors used HotSpot to model the PE array as

$$T_i = \sum_{j} R_{ij} P_j$$

where P_i is the power consumed by IP block PE_i, T_i is the temperature of IP block PE_i, and R_ij is an entry of the transfer thermal resistance matrix, which can be obtained from HotSpot given the IP block placement. The authors then built a thermal-aware mapping framework based on a genetic algorithm whose fitness function, which decides the survival chance of a chromosome, depends on the mapping goal. The algorithm must balance performance, power, temperature, and communication simultaneously. They evaluated it on a real application, a Low Density Parity Check (LDPC) decoder on a network-on-chip architecture.
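Both placement formulations in Sections 7.1 and 7.2 reduce to cheap objective evaluations that a placement search calls many times. The sketch below shows (a) the MSP cost, the largest t × t window sum, computed with 2-D prefix sums, and (b) the peak-temperature cost of a PE mapping under T_i = Σ_j R_ij P_j, with temperatures treated as rises above ambient. The exhaustive search stands in for the genetic algorithm of [6] and is only viable for a handful of PEs; the function names, resistance values and power numbers are hypothetical.

```python
from itertools import permutations


def max_window_sum(grid, t):
    """MSP objective: the largest sum over all t x t sub-matrices of grid,
    computed with 2-D prefix sums in O(m * n)."""
    m, n = len(grid), len(grid[0])
    pre = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            pre[i + 1][j + 1] = grid[i][j] + pre[i][j + 1] + pre[i + 1][j] - pre[i][j]
    return max(pre[i + t][j + t] - pre[i][j + t] - pre[i + t][j] + pre[i][j]
               for i in range(m - t + 1) for j in range(n - t + 1))


def pe_temperatures(r_matrix, pe_power):
    """T_i = sum_j R_ij * P_j (temperature rise of each PE)."""
    return [sum(r * p for r, p in zip(row, pe_power)) for row in r_matrix]


def best_mapping(r_matrix, task_power):
    """Exhaustive search for the task-to-PE assignment with the lowest peak
    temperature. Only viable for tiny examples; [6] uses a genetic algorithm."""
    best = None
    for perm in permutations(range(len(task_power))):
        pe_power = [0.0] * len(task_power)
        for task, pe in enumerate(perm):
            pe_power[pe] = task_power[task]
        peak = max(pe_temperatures(r_matrix, pe_power))
        if best is None or peak < best[0]:
            best = (peak, perm)
    return best


print(max_window_sum([[1, 0, 0], [0, 0, 0], [0, 0, 5]], 2))  # hottest 2 x 2 window
R = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.5],
     [0.1, 0.5, 1.0]]  # three PEs in a row; the middle one couples to both ends
print(best_mapping(R, [10.0, 1.0, 1.0]))
```

In the three-PE example the search keeps the 10 W task out of the middle PE, because the middle position receives coupled heat from both neighbors; this is the same intuition as moving hot blocks apart during floorplanning.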

Figure 15.1 Derivation of the bipartite graph from the H-matrix.

In [70], the authors consider interconnect power consumption while exploring a thermal-aware floorplanner for 3D structures. They take wafer bonding as the 3D structure and extend HotSpot from 2D to 3D structures (HS3D), used in conjunction with Flotherm. The experiments use the IVM processor model, which is similar to the Alpha. They use a B*-tree floorplan representation with its perturbation operations and a simulated-annealing engine that treats block dimensions as "soft". Interconnect power distribution and temperature are modeled approximately.

7.3 Physical-aware synthesis of vertically integrated 3D systems
Vertical integration of active device layers in the third dimension makes room for a significant reduction in interconnect lengths, which decreases both the delay and the power consumption associated with interconnects. This characteristic can be fully leveraged only by judicious layer assignment of active devices. To address layer assignment as part of behavioral synthesis, the authors of [86] proposed a 0-1 linear program formulation that simultaneously schedules, binds and performs layer assignment for the synthesis of vertically integrated 3D systems. In these systems, inter-stratal communication occurs through inter-layer vias that have higher resistivity and capacitance. In [86], the authors trade off reducing the total interconnect delay on critical paths against reducing inter-stratal communication. Three different objective models are proposed, and their combinations are examined to find the most suitable methodology for these goals. To address the thermal concerns that have been shown to arise in these systems, a power gradient between layers is proposed. Results showed a significant reduction in total interconnect length compared with a traditional two-dimensional implementation.
The examination of the various proposed optimization objectives indicates that combining communication minimization with critical-path optimization leads to the best synthesis results for a range of benchmarks.

7.4 Thermal-aware unified physical-level and high-level synthesis
In [87], the authors proposed an efficient and accurate thermal-aware floorplanning high-level synthesis system (TAPHS) that makes use of integrated high-level and physical-level thermal optimization techniques. Voltage islands are automatically generated via novel slack distribution and voltage partitioning algorithms in order to reduce the design's power consumption and peak temperature. A new thermal-aware floorplanning technique balances the chip thermal profile, further reducing peak temperature. As shown in Figure 16.1, comparing the left and right panels indicates that voltage islands can dramatically improve thermal conditions: the number of functional units with temperatures above the thermal constraint decreased from 29 to 19. The authors later proposed an algorithm to decrease the peak temperature in the remaining units. They used a tightly integrated thermal model and incremental floorplanner to optimize IC peak temperature, area, and power consumption while meeting performance constraints. To optimize peak temperature, it was necessary to tightly integrate floorplanning, wire modeling, power profile generation and chip-package thermal analysis with high-level synthesis. The system was used to synthesize a number of benchmarks, yielding designs that trade off peak temperature, integrated circuit area, and power consumption. The proposed techniques reduce peak temperature by 12.5 °C on average, and peak temperature is also reduced under a fixed area constraint. Under a constraint on peak temperature, integrated circuit area is reduced by 9.9% on average, showing that thermal optimization can allow significant improvements in IC area under temperature constraints. The authors conclude that it is important to incorporate thermal optimization in high-level synthesis to support continued increases in device and power density.

Figure 16.1 Left: post-synthesis thermal profile without voltage islands. Right: post-synthesis thermal profile with voltage islands.

7.5 Resource allocation and binding
The authors of [88] introduced resource binding techniques that make high-level synthesis aware of temperature effects. The main goal was to effectively minimize the maximum temperature reached by any module in a design. A reliability-driven design methodology can leverage this mechanism to prevent hot spots on a chip or reduce their likelihood. They formulated the temperature-aware resource binding problem, developed models for the thermal profiles of functional resources for a given task assignment, developed resource binding techniques to effectively control the maximum temperature of a binding, and studied the impact of temperature on leakage current. Specifically, they compared a switching-optimized binding and the temperature-aware binding with respect to total power consumption, where leakage power makes a growing contribution as temperature increases. The results show that the maximum temperature of any module can be bounded with the temperature-constrained resource minimization (TC_R_MIN) technique, at an overhead in area (28% more multipliers and 54% more ALUs) and power (34% more total power). Using the resource-constrained temperature minimization (RC_TEMP_MIN) technique, on the other hand, the maximum observed temperature can be reduced by 5 °C on average with no area penalty and a small (5% on average) power penalty.

Figure 17.1 Overall experimental flow.
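The intuition behind temperature-aware binding can be sketched with a toy model: a "heat" value per functional unit that decays each control step and grows with each bound operation. This is not the TC_R_MIN or RC_TEMP_MIN algorithm of [88]; the heat model, decay factor, function names and schedule are assumptions chosen to show why spreading operations across units lowers the peak.

```python
def simulate_binding(schedule, n_units, pick, decay=0.9):
    """Return the peak 'heat' reached by any unit.

    schedule: operations of one type per control step.
    pick(heat, ops): chooses which units execute this step's operations.
    Heat is a crude temperature proxy (assumption): each step every unit cools
    by the factor `decay`, and each operation adds one unit of heat.
    """
    heat = [0.0] * n_units
    peak = 0.0
    for ops in schedule:
        if ops > n_units:
            raise ValueError("more operations than units in one control step")
        heat = [h * decay for h in heat]
        for unit in pick(heat, ops):
            heat[unit] += 1.0
        peak = max(peak, max(heat))
    return peak


def first_fit(heat, ops):
    """Activity-oblivious binding: always reuse the lowest-numbered units."""
    return range(ops)


def coolest_first(heat, ops):
    """Temperature-aware binding: give each operation to the currently coolest unit."""
    return sorted(range(len(heat)), key=lambda u: heat[u])[:ops]


schedule = [1, 1, 2, 1, 1, 1, 2, 1, 1, 1]
print("first fit    :", round(simulate_binding(schedule, 3, first_fit), 2))
print("coolest first:", round(simulate_binding(schedule, 3, coolest_first), 2))
```

Both bindings use the same three units and execute the same operations, so area is unchanged; only the distribution of activity differs, which is the lever that temperature-aware binding exploits.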

8. Test
Thermal testing is part of the overall test concept and has become an important aspect of chip performance. The hottest region of an electronic assembly is always the chip surface, so test and measurement results may differ from the actual chip-surface temperature. Digital circuits are less sensitive to thermal coupling than analog circuits because they have higher noise margins, but temperature still influences the voltage levels and propagation delays of their logic gates. A common method of measuring the temperature distribution during testing is to create a thermal map of the chip surface. Designers also need to realize that problems can come not only from the chip itself but also from electrostatic discharge (ESD).

8.1 Design for thermal testability
Testability is one of the concerns designers must consider. Designers should establish a regular, commonly accepted methodology for inserting thermal sensors. At a minimum, such a methodology should define sensor features, sensor placement, the architectural aspects of control and readout paths, and temperature evaluation methods. This design-for-thermal-testability (DFTT) methodology would yield designs with enhanced thermal testability at the price of some excess circuitry. The situation is similar to general testability solutions that use excess circuitry to facilitate testing: boundary scan, built-in self-test circuits, scan paths in the core, and so forth.[67] In [67], the authors discussed three sensor placement strategies. First, the designer can use one temperature sensor per chip without attention to its on-chip position. Automatic place-and-route tools can then perform placement with no special measures for including the sensor, so the design procedure is simple. A weakness is that the measured temperature is only an average value for the chip, and possible hot areas cannot be monitored separately. Second, the designer can use one temperature sensor at a prescribed position. When the chip contains only one power block, the sensor should obviously be placed at the expected hottest point; in this case the design procedure requires manual steps. Third, the designer can use several sensors on the chip. The sensors can follow a regular arrangement, such as a 2 × 2 or 4 × 4 sensor matrix. Another possibility is to match the arrangement to the chip's structure by placing sensors beside circuit elements or blocks with considerable dissipation; this again requires manual steps. This procedure obtains a rough image of the chip's internal temperature distribution, called the thermal signature. The thermal signature serves as a test measure: deviation from the nominal distribution may indicate defects.

Figure 18.1 Boundary-scan interfacing of the CMOS-compatible temperature sensor.
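Using the thermal signature as a test measure can be sketched as a comparison between a sensor-matrix reading and its nominal values. Removing a chip-wide offset before thresholding, here via the median deviation, is an assumption in the spirit of the differential sensing discussed in Section 8.3; the function name, threshold and readings are hypothetical.

```python
from statistics import median


def signature_check(measured, nominal, threshold_c=2.0):
    """Compare a sensor-matrix reading with the nominal thermal signature.

    A common offset (e.g. a warmer ambient) is removed using the median
    deviation, so only local anomalies are flagged. Returns (residuals, flagged).
    """
    dev = [[m - n for m, n in zip(m_row, n_row)] for m_row, n_row in zip(measured, nominal)]
    offset = median(d for row in dev for d in row)
    residual = [[d - offset for d in row] for row in dev]
    flagged = [(r, c) for r, row in enumerate(residual)
               for c, d in enumerate(row) if abs(d) > threshold_c]
    return residual, flagged


nominal = [[50.0, 55.0],
           [52.0, 60.0]]   # expected 2 x 2 sensor readings, C (hypothetical)
measured = [[53.0, 58.0],
            [55.0, 69.0]]  # 3 C warmer overall, and one block 6 C hotter still
print(signature_check(measured, nominal)[1])
```

The median is used instead of the mean so that a single strongly deviating block does not drag the estimated offset toward itself and mask the defect.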

8.2 Testing thermal throttling
The Thermal Monitor/Thermal Throttling mechanism is present in Pentium 4 CPUs with Northwood and Prescott cores and functions precisely in line with the specifications in [89]. The particular time intervals and temperatures measured can hardly be generalized to every system because of hardware differences (CPU frequency, cooler and thermal grease pair, motherboard model, ambient temperature, open or closed case). However, how the throttling mechanism functions and its throttling patterns have been experimentally established and confirmed. At the standard rate of performance loss (about 50 percent), throttling is not enough to prevent the CPU from overheating, which then leads to an automatic shutdown. On the other hand, once the temperature threshold is known, one can focus on measures that keep temperatures from reaching it, such as efficient ventilation, case coolers, or more aggressive throttling options in BIOS Setup. It proved very difficult, if possible at all, to burn out a Pentium 4 CPU by stopping its cooler. The manufacturer (Intel) does not hide this fact, instead underlining the importance of the cooling system and its correct installation. In other words, working at a risky, close-to-throttling rate is declared normal for Prescott, and the cooling system is charged with preventing overheating. On the whole, the Thermal Monitor/Thermal Throttling technology deserves praise even as a bare technology: it is better to stop or slow down a dangerous temperature rise first, and only then consider shutdown, than simply to stop a CPU when it reaches a certain temperature.

8.3 Thermal testing by using thermal coupling
In [90], the authors present a dynamic and spatial characterization of the thermal behavior of VLSI MOS devices using laser thermoreflectance measurements and on-chip differential temperature sensing circuits. They characterize the spatial and dynamic thermal waveform generated by an MOS transistor acting as a heat source, and present a sensing strategy able to monitor these temperature increases. They also discuss the fault detection and fault diagnosis capabilities of thermal testing. Temperature offers the following advantages as a test observable:
1) There is no electrical loading of the circuit under test (CUT); thermal coupling is the link between the CUT and the monitoring circuit.
2) Both on-line and off-line tests are possible.
3) A differential sensing strategy may provide immunity to changes in the offset of the silicon-surface thermal map.
4) Temperature measurements provide diagnosis capabilities.
On the other hand, according to the results presented in [90], thermal testing has two main drawbacks: area overhead and a slower test application rate.
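The behavior described in Section 8.2, where throttling first slows a temperature rise and shutdown is only a last resort, can be sketched as a simple controller with hysteresis. The trip point, hysteresis band, shutdown threshold, 50 % duty cycle and function name are illustrative assumptions, not Intel's specified values or mechanism.

```python
def throttle_trace(temps, t_trip=90.0, hysteresis=2.0, t_shutdown=120.0, throttled_duty=0.5):
    """Clock duty cycle chosen for each temperature sample.

    1.0 = full speed, throttled_duty = clock modulation, 0.0 = emergency shutdown.
    Throttling starts at t_trip and stops only once the die has cooled below
    t_trip - hysteresis, which avoids rapid on/off toggling around the trip point.
    """
    duty, throttling = [], False
    for t in temps:
        if t >= t_shutdown:
            duty.append(0.0)  # last resort: throttling could not hold the temperature
            break
        if not throttling and t >= t_trip:
            throttling = True
        elif throttling and t < t_trip - hysteresis:
            throttling = False
        duty.append(throttled_duty if throttling else 1.0)
    return duty


print(throttle_trace([80.0, 91.0, 89.5, 87.0, 85.0, 121.0, 80.0]))
```

For this trace the output is [1.0, 0.5, 0.5, 1.0, 1.0, 0.0]: throttling holds at 89.5 °C because the die has not yet cooled below the 88 °C release point, and the final 121 °C sample triggers shutdown, matching the observation that a roughly 50 % slowdown is not always enough to prevent overheating.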

9. Summary
Thermal impact on integrated circuits has become an active research area because of increasing temperature-induced problems. Since temperature depends on power, computation migration and other dynamic thermal management (DTM) techniques can effectively reduce temperature by lowering dynamic power. Because leakage remains a large portion of total power in nanometer technologies, modeling dynamic power alone is no longer sufficient. Another problem of DTM is its accuracy and response speed. To keep the system at a safe temperature, designers may tend to over-design the system; since every technique may affect system performance, an overly pessimistic design is too expensive. Whole-chip thermal modeling and solution are more meaningful and feasible than local-area management, since temperature may vary considerably across the chip and there are too many transistors on the chip to be solved individually. Several modeling techniques, such as meshing and finite-element analysis, were discussed. In addition, global temperature distribution patterns are very useful for circuit-level design. Computer-aided design (CAD) can also be an effective means of temperature control: by estimating the temperature of each unit, designers can place and floorplan blocks accordingly, and they can also perform thermal-aware resource allocation and binding based on the logic function of an operation. The quality of thermal synthesis may be largely influenced by the accuracy of the thermal model. Since the true power information is unknown before the design is finished, all power and temperature information can only be estimated. CAD may also need to consider sensor placement, although thermal sensing itself is still a much-debated area. Thermal sensors can be direct or indirect, and they may not reflect the actual temperature, since there is always a difference between the substrate temperature and the worst-case temperature. Sensors are also crucial in testing; to perform thermal testing, design for testability needs to be considered. Heating has become a crucial problem in nanometer technology. No single design or modeling technique can solve the problem alone, but further understanding and analysis will help mitigate it. Cross-field research (electrical/thermal, device/circuit/architecture, deterministic/statistical, experimental/mathematical deduction) is a must for understanding and solving the problem.

10 References
[1] C. Poirier, R. C. Bostak and S. Naffziger, “Power and Temperature Control on a 90nm Itanium®-Family Processor”, ISSCC, 2005.
[2] J. Donald and M. Martonosi, “Temperature-aware design issues for SMT and CMP architectures”, In Proceedings of the Workshop on Complexity-Effective Design (WCED). ACM Press, 2004.
[3] D. Brooks and M. Martonosi, “Dynamic Thermal Management for High-Performance Microprocessors”, Proc. 7th Int’l Symp. High-Performance Computer Architecture, IEEE CS Press, pages 171-182, 2001.
[4] J. Srinivasan and S. Adve, “Predictive Dynamic Thermal Management for Multimedia Applications”, Proc. 17th Int’l Conf. Supercomputing, ACM Press, pages 109-120, 2003.
[5] W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, and S. Velusamy, “Compact thermal modeling for temperature-aware design”, In Proceedings of the 41st annual conference on Design Automation, pages 878–883, 2004.
[6] W.-L. Hung, C. Addo-Quaye, T. Theocharides, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Thermal-aware IP virtualization and placement for networks-on-chip architecture”, In International Conference on Computer Design (ICCD), pages 430–437. ACM Press, 2004.
[7] T. Juan, J. J. Navarro, and O. Temam, “Data caches for superscalar processors”, In Proceedings of the 11th international conference on Supercomputing, pages 60–67. ACM Press, 1997.
[8] Y. Li, K. Skadron, Z. Hu, and D. Brooks, “Evaluating the thermal efficiency of SMT and CMP architectures”, In IBM T. J. Watson Conference on Interaction between Architecture, Circuits, and Compilers, October 2004.
[9] K. Skadron, M. R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, and D. Tarjan, “Temperature-aware microarchitecture: Modeling and implementation”, ACM Trans. Archit. Code Optim., 1(1):94–125, 2004.
[10] C. C. N. Chu and D. F. Wong, “A matrix synthesis approach to thermal placement”, In Proceedings of the 1997 international symposium on Physical design, pages 163–168. ACM Press, 1997.
[11] P. Chaparro, J. González, and A. González, “Thermal-aware clustered microarchitectures”, In International Conference on Computer Design (ICCD), pages 48–53. ACM Press, 2004.
[12] W. Huang, M. R. Stan, K. Skadron, K. Sankaranarayanan, S. Ghosh, and S. Velusamy, “Compact thermal modeling for temperature-aware design”, U. of Virginia technical report.
[13] M. S. Bakir, B. Dang, R. Emery, G. Vandentop, P. A. Kohl and J. D. Meindl, “Sea of leads compliant I/O interconnect process integration for the ultimate enabling of chips with low-k interlayer dielectrics”, IEEE Trans. Advanced Packaging, Vol. 28, No. 3, Aug. 2005.
[14] J. N. Calata, J. G. Bai, X. Liu, S. Wen, and G. Lu, “Three-Dimensional packaging for power semiconductor devices and modules”, IEEE Trans. Advanced Packaging, Vol. 28, No. 3, Aug. 2005.
[15] L. Zhu, Y. Sun, J. Xu, Z. Zhang, D. W. Hess and C. P. Wong, “Aligned carbon nanotubes for electrical interconnect and thermal management”, In Electronic Components and Technology Conference (ECTC), pages 44-50, 2005.
[16] F. A. Mohammadi, K. Meres and M. C. E. Yagoub, “Rigorous thermal treatment of heat generation and heat transfer in GaAs-based HBT devices modeling”, In EuroSimE, 2005.
[17] F. Yu, M.-C. Cheng, P. Habitz and G. Ahmadi, “Modeling of thermal behavior in SOI structures”, IEEE Trans. Electron Devices, Vol. 51, No. 1, Jan. 2004.
[18] S. R. Runnels, R. A. Page, M. P. Enright, and H. R. Millwater, Jr., “Advanced experimental and computational tools for robust evaluation of On-Chip interconnect reliability”, IEEE Trans. Semiconductor Manufacturing, Vol. 15, No. 3, Aug. 2002.
[19] B. Li, D. Harmon, J. Gill, F. Chen and T. Sullivan, “Thermal and electromigration challenges for advanced interconnects”, IEEE International Integrated Reliability Workshop, 2004.
[20] W. R. Daasch, C. H. Lim and G. Cai, “Design of VLSI CMOS circuits under thermal constraint”, IEEE Trans. Circuits and Systems II, Vol. 49, No. 8, Aug. 2002.

[21] A. H. Ajami, K. Banerjee, and M. Pedram, “Modeling and analysis of nonuniform substrate temperature effects on global ULSI interconnects”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 6, Jun. 2005.
[22] S. F. Al-Sarawi, D. Abbott, and P. D. Franzon, “A review of 3-D packaging technology”, IEEE Trans. Components, Packaging, and Manufacturing Technology, Part B, vol. 21, no. 1, Feb. 1998.
[23] D. Deleganes, J. Douglas, B. Kommandur, and M. Patyra, “Designing a 3GHz, 130nm, Intel Pentium 4 processor”, Symposium on VLSI Circuits Digest of Technical Papers, 2002.
[24] M. R. Casu, M. Graziano, G. Masera, G. Piccinini, and M. Zamboni, “An Electromigration and Thermal Model of Power Wires for a Priori High-Level Reliability Prediction”, IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 12, no. 4, Apr. 2004.
[25] Y.-K. Cheng, P. Raha, C.-C. Teng, E. Rosenbaum, and S.-M. Kang, “ILLIADS-T: an electrothermal timing simulator for temperature-sensitive reliability diagnosis of CMOS VLSI chips”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 8, pp. 668–681, Aug. 1998.
[26] Y. Zhan and S. S. Sapatnekar, “Fast computation of the temperature distribution in VLSI chips using the discrete cosine transform and table look-up”, In Proceedings of the Asia and South Pacific Design Automation Conference, 2005.
[27] L. K. Wang, H. H. Chen, T.-D. Yuan, and B. Z. Hong, “Performance projection and thermal management of high performance VLSI designs”, In Proceedings on Solid-State and Integrated-Circuit Technology, 2001.
[28] J. Altet, A. Rubio, E. Schaub, S. Dilhaire, and W. Claeys, “Thermal coupling in integrated circuits: application to thermal testing”, IEEE Journal of Solid-State Circuits, vol. 36, no. 1, pp. 81–91, Jan. 2001.
[29] S. A. Bota, M. Rosales, J. L. Rossello, and J. Segura, “Smart temperature sensor for thermal testing of cell-based ICs”, In Proceedings of Design, Automation and Test in Europe, 2005.
[30] M. L. Mui, K. Banerjee, and A. Mehrotra, “Power supply optimization in sub-130 nm leakage dominant technologies”, In Symposium on Quality Electronic Design, 2004.
[31] D. Deleganes, J. Douglas, B. Kommandur, and M. Patyra, “Designing a 3GHz, 130nm, Intel® Pentium® 4 Processor”, IEEE Symp. on VLSI Circuits, 2002.
[32] P. Chaparro, G. Magklis, J. González, and A. González, “Distributing the Frontend for Temperature Reduction”, Proceedings of the 11th Int’l Symposium on High-Performance Computer Architecture, 2005.
[33] C. J. M. L. H. Vinke, “Recent Achievement in the Thermal Characterization of Electronic Devices by Means of Boundary Condition Independent Compact Models”, 13th IEEE SEMITHERM Symposium, Austin, Texas, 1997.
[34] E. C. H. A. Mantooth, “Modeling and Simulation of Electrical and Thermal Interaction”, in Modeling in Analog Design, vol. 2, Current Issues in Electronic Modeling, O. L. Jean-Michel Berge, Jacques Rouillard, Ed. Dordrecht: Kluwer Academic Publishers, 1995, pp. 93–120.
[35] H. Chiueh, J. Draper, L. Luh, and J. Choma, Jr., “A Novel Model for On-chip Heat Dissipation”, APCCAS’98, Nov. 24–27, 1998.
[36] S. M. Alam, D. E. Troxel, and C. V. Thompson, “Circuit and System Level Tools for Thermal-Aware Reliability Assessments of IC Designs”, MIT Research Laboratory of Electronics report, chapter 8.
[37] Y. Zhan, B. Goplen, and S. S. Sapatnekar, “Electrothermal Analysis and Optimization Techniques for Nanoscale Integrated Circuits”.
[38] T. Sato, J. Ichimiya, N. Ono, K. Hachiya, and M. Hashimoto, “On-Chip Thermal Gradient Analysis and Temperature Flattening for SoC Design”.
[39] M. Pedram and S. Nazarian, “Thermal Modeling, Analysis and Management in VLSI Circuits: Principles and Methods”.
[40] E. Rohou and M. Smith, “Dynamically managing processor temperature and power”, Proc. FDDO-2, Nov. 1999.
[41] S. Das and A. Chandrakasan, “Three-dimensional integrated circuits: performance, design methodology, and CAD tools”, Proc. IEEE Annual Symp. on VLSI, 2003, pp. 13–18.
[42] A. Rahman and R. Reif, “Thermal analysis of three-dimensional (3-D) integrated circuits (ICs)”, Proc. Int’l Interconnect Technology Conf., 2001, pp. 157–159.
[43] K. Banerjee, A. Mehrotra, A. Sangiovanni-Vincentelli, and C. Hu, “On Thermal Effects in Deep Sub-Micron VLSI Interconnects”, Proc. Design Automation Conf., 1999, pp. 885–891.
[44] T.-Y. Wang and C.-C. Chen, “SPICE-compatible thermal simulation with lumped circuit modeling for thermal reliability analysis based on model reduction”, Proc. Int’l Symp. Quality Electronic Design, 2004, pp. 357–362.
[45] T.-Y. Chiang, K. Banerjee, and K. C. Saraswat, “Effect of via separation and low-k dielectric materials on the thermal characteristics of Cu interconnects”, Tech. Dig. IEEE Int. Electron Devices Meeting, 2000, pp. 261–264.
[46] G. Digele, S. Lindenkreuz, and E. Kasper, “Fully coupled dynamic electrothermal simulation”, IEEE Trans. Very Large Scale Integration Systems, vol. 5, no. 3, 1997, pp. 250–257.
[47] D. Chen, E. Li, E. Rosenbaum, and S.-M. Kang, “Interconnect thermal modeling for accurate simulation of circuit timing and reliability”, IEEE Trans. Computer-Aided Design, vol. 19, Feb. 2000, pp. 197–205.
[48] C.C.S. Wunsche, C. Clauss, P. Schwarz, and F. Winkler, “Electrothermal circuit simulation using simulator coupling”, IEEE Trans. Very Large Scale Integration Systems, vol. 5, no. 3, 1997, pp. 277–282.
[49] Y. K. Cheng and S. M. Kang, “An efficient method for hotspot identification in ULSI circuits”, Proc. Int’l Conf. on Computer-Aided Design, Nov. 1999, pp. 124–127.
[50] B. Wang and P. Mazumder, “Fast thermal analysis for VLSI circuits via semi-analytical Green’s function in multilayer materials”, Proc. IEEE Int’l Symp. on Circuits and Systems (ISCAS), 2004.
[51] Y. Zhan and S. Sapatnekar, “Fast Computation of the Temperature Distribution in VLSI Chips Using the Discrete Cosine Transform and Table Look-up”, Proc. Asia-Pacific Design Automation Conf., 2005.
[52] Y.-K. Cheng et al., “ETS-A: A New Electrothermal Simulator for CMOS VLSI Circuits”, Proc. ED&TC’96, March 1996, pp. 566–570.
[54] L. Miao, Z. Runde, and G. Yuanqing, “A New Electrothermal Simulator Based on Relaxation Method for Integrated Circuits with Distributed Temperatures”, Proc. ICDA, 2000.
[55] C. H. Diaz, S. M. Kang, and C. Duvvury, “Circuit level electrothermal simulation of electrical overstress failures in advanced MOS I/O protection devices”, IEEE Trans. Computer-Aided Design, 1994, pp. 482–493.
[56] V. Székely, A. Poppe, M. Rencz, A. Csendes, and A. Páhi, “Self-consistent electro-thermal simulation: fundamentals and practice”, Microelectronics Journal, vol. 28, 1997, pp. 247–262.
[57] V. Székely, A. Poppe, A. Páhi, A. Csendes, G. Hajas, and M. Rencz, “Electrothermal and logi-thermal simulation of VLSI designs”, IEEE Trans. Very Large Scale Integration Systems, vol. 5, no. 3, 1997, pp. 258–269.
[58] H. Su, F. Liu, A. Devgan, E. Acar, and S. Nassif, “Full Chip Leakage Estimation Considering Power Supply and Temperature Variations”, Proc. Symp. Low Power Electronics and Design, Aug. 2003, pp. 78–83.
[59] K. Xiu and M. Ketchen, “Thermal modeling of a small extreme power density macro on a high power density microprocessor chip in the presence of realistic packaging and interconnect structures”, Proceedings on Electronic Components and Technology, 2004 (ECTC ’04).
[61] A. Shayesteh, E. Kursun, T. Sherwood, S. Sair, and G. Reinman, “Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines”.
[62] J. C. Ku, S. Ozdemir, G. Memik, and Y. Ismail, “Thermal Management of On-Chip Caches Through Power Density Minimization”, Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, 2005.
[63] M. Rencz, V. Székely, and A. Poppe, “A fast algorithm for the layout based electro-thermal simulation”, DATE’03.
[64] K. Black, K. Kelly, and N. Wright, “Modeling Subthreshold Leakage and Thermal Stability in a Production Life Test Environment”, IEEE Semi-Therm Symposium, 2005.
[65] S. Im, N. Srivastava, K. Banerjee, and K. E. Goodson, “Scaling Analysis of Multilevel Interconnect Temperatures for High-Performance ICs”, IEEE Trans. Electron Devices, vol. 52, 2005.
[66] W. Liao, F. Li, and L. He, “Microarchitecture Level Power and Thermal Simulation Considering Temperature Dependent Leakage Model”, ISLPED’03.
[67] V. Székely, M. Rencz, and B. Courtois, “Tracing the Thermal Behavior of ICs”, IEEE Design and Test of Computers, 1998.
[68] T. Grasser and S. Selberherr, “Electro-Thermal Effects in Mixed-Mode Device Simulation”, CAS’98.
[69] M. Meterelliyoz, H. Mahmoodi, and K. Roy, “A Leakage Control System for Thermal Stability During Burn-In Test”, International Test Conference, Nov. 2005.
[70] W.-L. Hung, G. M. Link, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Interconnect and Thermal-aware Floorplanning for 3D Microprocessors”, Proceedings of the 7th International Symposium on Quality Electronic Design, 2006.
[71] Heat Transfer fundamental, http://www.chomerics.com//products/documents/thermcat/ heat_transfer fund.pdf
[72] R. Schreier, J. Silva, J. Steensgaard, and G. C. Temes, “Design-oriented estimation of thermal noise in switched-capacitor circuits”, IEEE Trans. Circuits and Systems I, vol. 52, 2005.
[73] V. De and S. Borkar, “Technology and design challenges for low power and high performance”, in Proc. ISLPED, pp. 163–168, 1999.
[74] W. Batty et al., “Global coupled EM-electrical-thermal simulation and experimental validation for a spatial power combining MMIC array”, IEEE Transactions on Microwave Theory and Techniques, pp. 2820–2833, Dec. 2002.
[75] T.-Y. Wang and C. C.-P. Chen, “3-D thermal-ADI: A linear-time chip level transient thermal simulator”, IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 21, no. 12, pp. 1434–1445, Dec. 2002.
[76] T. Sato, J. Ichimiya, N. Ono, K. Hachiya, and M. Hashimoto, “On-Chip Thermal Gradient Analysis and Temperature Flattening for SoC Design”, IEICE Trans. Fundamentals, vol. E88-A, no. 12, Dec. 2005, pp. 3382–3389.
[77] S. Zhang, V. Wason, and K. Banerjee, “A Probabilistic Framework to Estimate Full-Chip Subthreshold Leakage Power Distribution Considering Within-Die and Die-to-Die P-T-V Variations”, Proc. Symp. Low
[78] Z. Yu, D. Yergeau, and R. W. Dutton, “Full chip thermal simulation”, ISQED 2000.
[79] Y.-K. Cheng et al., “ETS-A: A New Electrothermal Simulator for CMOS VLSI Circuits”, Proc. ED&TC 1996.
[80] G. Briggs, E. Tan, N. Nelson, and D. Albonesi, “QUILT: A GUI-based Integrated Circuit Floorplanning Environment for Computer Architecture Research and Education”, Workshop on Computer Architecture Education, 2005.
[81] Q. Chen, M. Meterelliyoz, and K. Roy, “A CMOS Thermal Sensor and Its Applications in Temperature Adaptive Design”, ISQED’06.
[82] A. Bakker and J. H. Huijsing, “Micropower CMOS Smart Temperature Sensor”, Proc. European Solid-State Circuits Conf., Editions Frontieres, Gif-sur-Yvette, France, 1995, pp. 238–241.
[83] P. Xekalakis, S. Kaxiras, and G. Keramidas, “Thermal Sensors and Adaptivity for Power Reduction”.
[84] C. Chen, P. Chen, A. Liu, W. Lu, and Y. Chang, “An accurate CMOS delay-line-based smart temperature sensor for low-power low-cost systems”, Meas. Sci. Technol., vol. 17, 2006.
[85] R. Patrikar and O. Peyran, “Design Planning for Uniform Thermal Distribution”, VLSID’06.
[86] M. Mukherjee and R. Vemuri, “On Physical-Aware Synthesis of Vertically Integrated 3D Systems”, ICVD’05.
[87] Z. Gu, Y. Yang, J. Wang, R. Dick, and L. Shang, “TAPHS: Thermal-Aware Unified Physical-Level and High-Level Synthesis”, Proc. Asia and South Pacific Design Automation Conference, 2006.
[88] R. Mukherjee et al., “Temperature-Aware Resource Allocation and Binding in High-Level Synthesis”, DAC 2005.
[89] Thermal throttling: http://www.digit-life.com/articles2/p4-throttling/
[90] J. Altet, A. Rubio, E. Schaub, S. Dilhaire, and W. Claeys, “Thermal Coupling in Integrated Circuits: Application to Thermal Testing”.
[91] T. Li, C. H. Tsai, and S. M. Kang, “Efficient Transient Electrothermal Simulation of CMOS VLSI Circuits under Electrical Overstress”, Proc. Int’l Conf. Computer-Aided Design, 1998, pp. 6–10.
[92] H. Chiueh, J. Draper, L. Luh, and J. Choma, “A Thermal Evaluation of Integrated Circuits: On-Chip Offset Temperature Measurement and Modeling”, Proceedings of the Second International Workshop on Design of Mixed-Mode Integrated Circuits and Applications, July 1998, pp. 109–113.
[93] P. Marchal et al., “Temperature Issues on Low-Power MPSoCs”, DATE’06.
[94] A. Chakraborty et al., “Thermal Resilient Bounded-Skew Clock Tree Optimization Methodology”, DATE’06.
[95] F. Wang et al., “On-Chip Bus Thermal Analysis and Optimization”, DATE’06.