elysium vlsi 2010

8/8/2019 Elysium VLSI 2010

Elysium Technologies Private Limited
ISO 9001:2008
A leading Research and Development Division
Madurai | Chennai | Kollam | Ramnad | Tuticorin | Singapore

#230, Church Road, Anna Nagar, Madurai 625 020, Tamil Nadu, India
Phone: +91 452-4390702, 4392702, 4390651
Website: www.elysiumtechnologies.com, www.elysiumtechnologies.info
Email: [email protected]

Abstract Very Large Scale Integration 2010 - 2011

01 Novel Vth Hopping Techniques for Aggressive Runtime Leakage Control

The continuous increase of leakage power consumption in deep sub-micron technologies necessitates more aggressive

leakage control. Runtime leakage control (RTLC) is effective, since runtime circuits generally have a significant amount

of idleness. However, current RTLC techniques are only used when circuits have long idle periods, rendering the

techniques less profitable. This is due to the large energy and delay overhead incurred when performing an RTLC mode

transition. We propose two novel techniques, workload-adaptive Vth hopping (WAVTH) and hierarchical Vth hopping

(HIVTH), to tackle the overhead problems and enable aggressive runtime leakage control. Experimental results show

19.2% average improvement on leakage saving with WAVTH and HIVTH over basic Vth hopping. The optimum design

points of these two techniques are determined through accurate modeling.

02 Leakage-Aware Energy Minimization using Dynamic Voltage Scaling and Cache Reconfiguration in Real-Time Systems

System optimization techniques are widely used to improve energy efficiency as well as overall performance. Dynamic

voltage scaling (DVS) is acknowledged to be successful in reducing processor energy consumption. Due to the

increasing significance of the memory subsystem's energy consumption, dynamic cache reconfiguration (DCR)

techniques have recently been proposed with the aim of reducing the cache subsystem's energy consumption. As the manufacturing

technology scales into the order of nanometers, leakage current, both in the processor and cache subsystem,

becomes a significant contributor to the overall power dissipation. In this paper, we efficiently integrate leakage-aware processor

voltage scaling and cache reconfiguration to minimize overall system energy

consumption. Experimental results demonstrate that our approach outperforms existing techniques by 12-23% on average.

03 Modeling of RF-MEMS BAW Resonator

Due to the demand for smaller and more portable devices, the applications of MEMS resonators are rapidly increasing.

Solidly Mounted Resonators (SMR) based on Bulk Acoustic Wave (BAW) technology follow MEMS principles to build

high performance microwave filters for RF communication. In this paper we present the architecture of SMRs by

discussing the design aspects of their core structures, which are within foundry CMOS processes, using the RF design

software Advanced Design System (ADS). Conventional VLSI processes are followed for the fabrication of the SMRs.

The results from the fabricated data are compared and discussed.


04 Modeling and Design Considerations of Coupled Inductor Converters

In this part of the sequel on the modeling and analysis of coupled inductors and coupled inductor based multiphase

switching converters, the recently developed symmetrical coupled inductor model is first extended to include the

inductor winding dc resistance (DCR). The extended model is then used to analyze the influence of the coupling on the

DCR based current sensing schemes popularly used in multi-phase switching regulators. It is found that the time-

constant matching condition in coupled inductor converters needs to be modified to include the coupling coefficient.

The proposed model is also used to derive the small-signal control-to-output transfer function of the converters

incorporating coupled inductors, with which the effect of coupling on the dynamic behaviors of the converter power

stage, such as resonant frequency and damping factor, can be easily evaluated.

05 Instruction Selection in ASIP Synthesis using Functional Matching

In embedded systems, Application Specific Instruction Set Processors (ASIPs) are commonly used to obtain

high performance without losing flexibility. A crucial operation required during ASIP synthesis (in particular, selection

of custom instructions) as well as code generation for ASIPs is identifying portions of an application program that can

be executed by custom functional units (CFUs). Most existing solutions achieve this by matching structure of patterns

corresponding to CFUs with sub-graphs of application data flow graphs. Often it happens that the computations

performed by the two are equivalent, but due to structural dissimilarities the match is missed. What is needed is a

method that can match two graphs functionally rather than structurally. In this paper, we present a novel method to do

this and give implementation results to show its effectiveness.
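
The gap between structural and functional matching can be illustrated with a small sketch. It is not the method of the paper: it simply compares two Boolean expression graphs by exhaustive evaluation, which is only feasible for a handful of inputs. The tuple encoding of expressions and the names `evaluate`, `variables` and `functionally_equivalent` are assumptions made for this illustration.

```python
from itertools import product

# Hypothetical encoding: ("var", name), ("not", e), ("and", l, r), ("or", l, r), ("xor", l, r)

def evaluate(expr, env):
    """Evaluate an expression tree under a variable assignment."""
    op = expr[0]
    if op == "var":
        return env[expr[1]]
    if op == "not":
        return not evaluate(expr[1], env)
    left, right = evaluate(expr[1], env), evaluate(expr[2], env)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "xor":
        return left != right
    raise ValueError(f"unknown operator: {op}")

def variables(expr):
    """Collect the variable names used in an expression tree."""
    if expr[0] == "var":
        return {expr[1]}
    return set().union(*(variables(e) for e in expr[1:]))

def functionally_equivalent(a, b):
    """Brute-force check: same output for every input assignment."""
    names = sorted(variables(a) | variables(b))
    for values in product([False, True], repeat=len(names)):
        env = dict(zip(names, values))
        if evaluate(a, env) != evaluate(b, env):
            return False
    return True

a, b = ("var", "a"), ("var", "b")
cfu_pattern = ("xor", a, b)                                   # pattern of a custom unit
app_subgraph = ("or", ("and", a, ("not", b)), ("and", ("not", a), b))

print(cfu_pattern == app_subgraph)                            # structural match fails
print(functionally_equivalent(cfu_pattern, app_subgraph))     # functional match succeeds
```

A purely structural matcher would miss this pair, because the two graphs have different shapes even though they compute the same function.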

06 Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting

In this work, the authors present the idea of inexact decision making, and its application to threshold voting of Hamming

weights, used in bus coding schemes, N-Modular Redundancy (NMR), Median filtering, and other pattern matching

applications. Decision circuits can be tweaked to perform in an inexact manner in order to optimize delay and

power while still maintaining high system accuracy. One such circuit identified by the authors is the threshold voting of

Hamming weights. A majority voter is a special case of this family of circuits. The proposed inexact voter consumes up to 8

times less power than the exact voter, with negligible reduction in system accuracy and performance. The leakage power is

also reduced by a factor of 3. The inexact voter allows for a higher frequency of operation, by reducing the critical path delay

by a factor of up to 3.4. The results obtained validate the application of inexact decision making, with respect to threshold

voters.
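
For reference, the exact behavior that such a voter approximates can be written in a few lines. This is a software sketch of exact threshold voting only; it does not model the inexact circuit of the paper, and the function names are chosen here for illustration.

```python
def hamming_weight(bits):
    """Number of 1s in a sequence of bits."""
    return sum(1 for b in bits if b)

def threshold_vote(bits, threshold):
    """Output 1 when the Hamming weight reaches the threshold, else 0."""
    return 1 if hamming_weight(bits) >= threshold else 0

def majority_vote(bits):
    """Majority voting is threshold voting with threshold floor(n/2) + 1."""
    return threshold_vote(bits, len(bits) // 2 + 1)

print(majority_vote([1, 0, 1]))                          # two of three inputs are 1
print(threshold_vote([1, 1, 0, 0, 0, 0, 0, 0], 3))       # weight 2 is below threshold 3
```

An inexact voter would be allowed to return the wrong decision for some inputs near the threshold in exchange for lower delay and power.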


07 Implementation of a Novel Phoneme Recognition System using TMS320C6713 DSP

A number of techniques have been proposed in the literature for phoneme based speech recognition systems. In this

paper, a technique for automatic phoneme recognition using zero-crossings (ZC) and magnitude sum function (MSF) is

proposed. The number of zero-crossings and Magnitude sum function per frame are extracted and a Minimum Distance

Classifier is proposed to recognize the phonemes in each frame with these features. In order to increase the

recognition accuracy of phonemes, a finite state machine is also proposed. The performance of the proposed

phoneme recognition system is evaluated using TTS database and compared with the system using Linear Predictive

Coefficients (LPC) feature inputs. Phoneme recognition accuracies of 70.93% and 55.25% are obtained for the system

using LPC and the one using ZC along with MSF respectively. However, using the finite state machine proposed in this

paper, 100% recognition accuracy is obtained for both the techniques. The computational costs required for

recognizing various sentences using both of the feature extraction techniques are evaluated. It is observed that the

proposed technique requires about 9.3 times lower computational cost than the one using LPC. The proposed

technique is adopted for the implementation of the phoneme recognition system on Texas Instruments TMS320C6713

floating point processor. The different ways to reduce the recognition time for the target device are explored and

reported in this paper. The technique proposed here is also applicable to speech inputs from other databases.
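
The two per-frame features and the minimum distance classifier can be sketched as follows. This is a minimal illustration under stated assumptions: the frame contents, the phoneme labels and the reference feature values in `templates` are invented, the features are left unscaled, and the finite state machine stage is not shown.

```python
import math

def zero_crossings(frame):
    """Count sign changes between consecutive samples of a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def magnitude_sum(frame):
    """Magnitude sum function: sum of absolute sample values in a frame."""
    return sum(abs(s) for s in frame)

def features(frame):
    return (zero_crossings(frame), magnitude_sum(frame))

def classify(frame, templates):
    """Pick the label whose reference (ZC, MSF) pair is nearest in Euclidean distance."""
    f = features(frame)
    return min(templates, key=lambda label: math.dist(f, templates[label]))

# Hypothetical reference features: a noisy fricative and a low-frequency vowel.
templates = {"s": (3, 4.0), "a": (0, 8.0)}

print(classify([1.0, -1.0, 1.0, -1.5], templates))
print(classify([2.0, 2.0, 1.5, 2.0], templates))
```

In practice the two features would be normalized before computing distances, since their numeric ranges differ.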

08 Impact of Temperature on Test Quality

The usage of more advanced, less mature processes during manufacturing of semiconductor devices has increased

the need for performing unconventional types of testing, like temperature-testing, in order to maintain the same high

quality levels. However, performing temperature-testing is costly. This paper proposes a viable low-cost alternative to

temperature testing that quantifies the impact of temperature variations on the test quality and also determines optimal

test conditions. The test flow proposed is empirically validated on an industrial-standard die. The results obtained

show that the majority of the defects that were originally detected by temperature-testing are also detected by the

proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of

an interesting defect behavior at cold test conditions are also presented.

09 Identifying the bottlenecks to the RF performance of FinFETs

In this work, the high frequency (RF) performance of FinFETs is investigated in detail using a two-level parasitic

model comprising outer and inner parasitic capacitances in addition to parasitic series resistances. Use of scaling

relations of these parasitic capacitances with numbers of fins and fingers allows extraction of these elements. Next, by

defining a series of reference surfaces, each associated with a certain set of parasitic elements, we proceed to

calculate the RF Figures of Merit, namely fT and fmax at these surfaces. These are called available fT (fmax) in this

work. Analysis of the available fT (fmax) gives insight into the extent to which different parasitics affect FinFET

RF performance. The main bottleneck to FinFET RF performance is identified, solutions are proposed and

relevant trade-offs are discussed.


10 Identifying Tests for Logic Fault Models Involving Subsets of Lines without Fault Enumeration

Bridging and interconnect open faults are defined using subsets of lines. We study the possibility of identifying input

vectors that are effective as test vectors for such faults without enumerating the faults. This process does not require

accurate layout information, it can handle very large numbers of faults, and it deals with undetectable faults implicitly.

We describe a static test compaction process that uses the ability to identify effective test vectors without enumerating

faults. This process selects a subset T of a given test set U such that T is guaranteed to detect the same faults as U.

We also describe a test generation process based on the same concept. Finally, we show how this concept can be

used to compare test sets.

11 Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector: A Better Scheme for Test Data Compression with Run Length Based Codes

Because of increased design complexity and advanced fabrication technologies, the number of tests and

corresponding data volume increases rapidly. As the large size of test data volume is becoming one of the major

problems in testing System-on-a-Chip (SoC), several compression coding schemes have been proposed in the past. Run

Length Coding is one of the most familiar coding methodologies for compression. In this paper, we present a new

scheme named Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector (HDR-CBSDV),

which can be used with any run length based code technique for better compression ratio. Four techniques

have been applied in this scheme: Selection of first vector, Hamming Distance Based Reordering, Column wise Bit

Stuffing and Difference Vector. Instead of directly applying any known run length code like Golomb, Frequency

Directed Run Length (FDR), Extended FDR (EFDR), Modified FDR (MDFR) or Shifted Alternate FDR (SAFDR) to given

test set, if we apply the proposed scheme to the test set prior to applying the run length based code, the compression

obtained is improved drastically. The experimental results on ISCAS89 benchmark circuits show that the test data

compression ratio improves significantly in each case. It is also noteworthy that in most cases, this scheme

does not involve any extra silicon area overhead compared to the base code with which it is used. In a few cases, it

requires only an extra XOR gate and feedback path. The proposed scheme can be easily integrated into the existing

industrial flow.
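
Two of the four steps, Hamming distance based reordering and the difference vector, can be sketched as below. This is an illustrative approximation, not the published algorithm: it assumes fully specified test vectors (so column wise bit stuffing of don't-care bits is skipped), uses a simple greedy nearest-neighbor ordering, and takes the first vector as given. All function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def reorder_by_hamming_distance(vectors, first_index=0):
    """Greedy ordering: each next vector is the one closest to the previous one."""
    remaining = list(vectors)
    ordered = [remaining.pop(first_index)]
    while remaining:
        nearest = min(range(len(remaining)),
                      key=lambda i: hamming_distance(ordered[-1], remaining[i]))
        ordered.append(remaining.pop(nearest))
    return ordered

def difference_vectors(ordered):
    """Keep the first vector, then store the XOR of each vector with its predecessor."""
    diffs = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        diffs.append("".join("1" if p != c else "0" for p, c in zip(prev, cur)))
    return diffs

test_set = ["0000", "1111", "0001", "1110"]
ordered = reorder_by_hamming_distance(test_set)
print(ordered)
print(difference_vectors(ordered))
```

After reordering, the difference vectors contain fewer 1s and therefore longer runs of 0s, which is what run length based codes such as Golomb or FDR exploit. The original vectors are recovered on chip by XOR-ing each decoded difference vector with the previous vector, which matches the extra XOR gate and feedback path mentioned above.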

12 Functional Refinement: A Generic Methodology for Managing ESL Abstractions

Ever increasing complexity of SoCs has resulted in starting the system design at a higher level of abstraction. System-

level design methodology envisages step-wise refinement of high-level models towards final RTL. However, current

practices are limited to only interface-refinement and the true functionality refinement is performed by developing a

different model for each abstraction-level. This results in minimal re-use of existing models, wasted effort and high

maintenance cost of multiple models. This paper presents a novel methodology that enables seamless refinement of IP

model functionality from one level to another. The presented methodology is generic enough to be applied to various SoC

design tasks. This paper demonstrates the application of the methodology to software energy-estimation for a DSP

and functional-cum-timing refinement of a DDR-memory model. The proposed methodology resulted in complete re-use

of the existing models, easy availability of various model abstractions and 20% savings in development-effort of a new

model.


13 Exploring use of NoC for Reconfigurable Video Coding

MPEG RVC is a standard under development which addresses the issues of standardization of video coding tools and

multi-format codec design. SoC based solutions are likely to be developed for MPEG RVC in the future. In this paper we

evaluate Network on Chip (NoC) as an on-chip interconnection mechanism for MPEG RVC SoC. We use MPEG RVC

reference C code for MPEG 2 and AVC intra-only decoding, and the Open source NoC named NoCem for simulation.

We make a new simulation platform over NoCem by using VHDL FLI. This platform allows us to make cycle accurate

measurements. We experiment with different input resolutions and different configurations of NoCem and measure the

network performance and overhead for reconfiguration. The results indicate that NoC is a potential candidate for the on-chip

interconnection mechanism of an MPEG RVC SoC.

14 Experimental Results and Study of a Modified Adaptive Bus Voltage Controller

Two-stage power conversion is used in many applications. The determination of the bus voltage in a two-stage converter

is important to achieve high overall power conversion efficiency. Conventional two-stage converters utilize either a

fixed DC bus voltage or a variable but predetermined DC bus voltage, which does not necessarily result in optimum

operation with optimum bus voltages under variable conditions. The paper presents a modified adaptive bus voltage

controller for two-stage power converters. The controller adaptively converges to the optimum bus voltage that yields

the maximum power conversion efficiency under variable operating conditions. The modified adaptive two-stage bus

voltage controller is evaluated using results obtained from a proof-of-concept experimental prototype.
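
One generic way to converge on an efficiency-maximizing bus voltage is a perturb-and-observe loop. The abstract does not describe the controller's actual algorithm, so the sketch below is only an assumed stand-in: the function `track_optimum_bus_voltage`, the step size, and the quadratic efficiency curve with its peak at 12 V are all hypothetical.

```python
def track_optimum_bus_voltage(efficiency, v_start, v_min, v_max,
                              step=0.1, iterations=200):
    """Perturb-and-observe: keep stepping while efficiency improves, reverse when it drops."""
    v = v_start
    direction = 1.0
    last_eff = efficiency(v)
    for _ in range(iterations):
        # Perturb the bus voltage, staying inside the allowed range.
        candidate = min(max(v + direction * step, v_min), v_max)
        eff = efficiency(candidate)
        if eff < last_eff:
            direction = -direction  # efficiency dropped, so search the other way next
        v, last_eff = candidate, eff
    return v

def toy_efficiency(v_bus):
    """Hypothetical efficiency curve with a single peak at 12 V."""
    return 0.95 - 0.001 * (v_bus - 12.0) ** 2

v_opt = track_optimum_bus_voltage(toy_efficiency, v_start=8.0, v_min=5.0, v_max=20.0)
print(round(v_opt, 1))
```

The loop settles into a small oscillation around the peak, so the final voltage lies within a couple of steps of the optimum rather than exactly on it. If operating conditions shift the peak, the same loop follows it without a predetermined voltage table.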

15 Electrical Modeling of Lithographic Imperfections

A lithographic wavelength of 193nm has been used for the past few generations of patterning and is likely to remain in use

for the next few technology generations (at least till 28nm technology half-node) as well. This deep sub-wavelength

patterning has resulted in wafer shapes not resembling drawn rectilinear shapes. The resulting non-rectangular

devices and wires are not handled by current generation modeling and analysis methods. In this paper, we present a

survey of electrical modeling methods for such lithographic imperfections especially on transistor layers. We also

discuss use contexts of such models as well as briefly present electrical implications of the likely future patterning

candidate, namely double patterning.


16 Design Procedure for High Frequency Operation of the Modified Series Resonant APWM Converter with Improved Efficiency and Reduced Size

In this paper, a generalized analysis for the auxiliary network in a modified series resonant asymmetrical pulse width-

modulated (APWM) converter is performed to produce a design procedure that ensures ZVS is achieved for any

converter design. New equations that correctly predict the magnitude of auxiliary current are obtained by accounting

for the trapezoidal nature of the waveforms associated with high frequency operation, and the dead time between the

switches in the half bridge. A design example of a 48V/1.2V, 25A converter operating at 1MHz is chosen to highlight the

validity of the proposed design and that superior results can be achieved if the resonant tank is designed in tandem

with the auxiliary network. Experimental results verify that ZVS is achieved, and that the proposed design reduces the

auxiliary inductor by close to a factor of 3.

17 Design of Reversible Latches Optimized for Quantum Cost, Delay and Garbage Outputs

Reversible logic has extensive applications in emerging nanotechnologies, such as quantum computing, optical

computing, ultra low power VLSI and quantum dot cellular automata. In the existing literature, designs of reversible

sequential circuits are presented that are optimized for the number of reversible gates and the garbage outputs. The

optimization of the number of reversible gates is not sufficient since each reversible gate is of different computational

complexity, and thus will have a different quantum cost and delay. While the computational complexity of a reversible

gate can be measured by its quantum cost, the delay of a reversible gate is another parameter that can be optimized

during the design of a reversible sequential circuit. In this work, we present novel designs of reversible latches that are

optimized in terms of quantum cost, delay and the garbage outputs. The optimized designs of reversible latches

presented in this work are the D latch, JK latch, T latch and SR latch.

18 Design of NoC for SoC with Multiple Use Cases Requiring Guaranteed

Many SoC architectures aimed at the multimedia domain support multiple use cases where only a subset of the

applications is active at any time. Further, each multimedia application itself poses strict constraints on core-to-core

communication latency. This paper presents an approach for automated synthesis of NoC architectures for such an

SoC. We evaluated our design approach through comparisons with two existing techniques aimed at generating best

effort and guaranteed throughput designs. Designs generated by our approach showed a marked improvement in both

power consumption (12.3% decrease) and resource requirements (12.9% decrease) in comparison to the best effort

NoC design approach. In comparison to the existing guaranteed throughput design approach our designs can

guarantee core-to-core latency while consuming less power (8.1% decrease) and resources (7.9% decrease).


19 Design of Low-Cost High-performance Floating-point Fused Multiply-Add with Reduced Power

This paper presents a floating-point fused multiply-add (FMA) unit with low-cost and low power techniques. To

improve the performance, two single-precision operations can be performed concurrently with one double-precision

data path, which is very useful in multimedia and even scientific applications. Moreover, to reduce the additional area

costs for supporting two single-precision operations in parallel, multiple double-precision units, i.e., the multiplier,

shifter and adder, are reused as much as possible. A modified dual-path algorithm is proposed by classifying the

exponent difference into three cases and implementing them with CLOSE and FAR paths, which can reduce latency

and facilitate lowering power consumption by enabling only one of the two paths. In addition, in case of FADD

instructions, the multiplier in the first stage is bypassed and kept in stable mode, which can significantly improve

FADD instruction performance and lower power consumption. The overall FMA unit has a latency of 4 cycles while the

FADD operation has 3 cycles. Each cycle has a time delay of about 0.66ns in the ST 65nm CMOS technology.

Compared with the conventional double-precision FMA, delay is reduced by about 13% and area is increased by about 22%,

which is acceptable since two single-precision results can be generated simultaneously.

20 Design Considerations for BEOL MIM Capacitor Modeling in RF CMOS Processes

Modeling of MIM capacitors in high frequency RF applications depends heavily on the design of test structures. An

external substrate ring is shown to be essential in capturing and modeling the inherent inductance of the MIM

capacitor. Additionally, deembedding of series parasitics plays a very important role in modeling of MIM capacitors

since these devices have very low series resistance. Various short structures were studied and their impacts on the

MIM characteristics are reported. It is shown that a short structure with the shortest path to ground is best suited to

deembed the series parasitics.

21 Coverage Management with Inline Assertions and Formal Test Points

This paper studies the problem of coverage management with two emerging formalisms in simulation based validation,

namely formal specification of test points and the use of inline temporal assertions. We present methods for checking

whether a test-bench with inline assertions covers a set of formal test points. This is particularly useful in developing

verification IPs for standard on-chip protocols where the development team must make sure that the test bench

provided in the verification IP checks all the important aspects of the protocol. We demonstrate the efficacy of our

approach over the ARM AMBA verification IP.

22 Clocking-based Coplanar Wire Crossing Scheme for QCA

Quantum-dot Cellular Automata is one of the promising next-gen fabrics for circuits. Coplanar wire crossing is one of

the more elegant features of this new low power computing paradigm. However, these need two types of cells and are

known to be neither easy to fabricate nor very robust. In this work, we propose coplanar wire crossing using a single

type of QCA cells, by applying the concept of Time Division Multiplexing to design the crossing. This has massive

implications in fabrication and fault tolerance of QCA circuits.


23 Channel Optimization for the Design of High Speed I/O links

The continuous increase in microprocessor performance demands an equal order of increase in the bandwidth

requirements on the memory and I/O interfaces. Providing the required bandwidth at an acceptable cost is a challenge

to the system packaging engineer. This paper discusses how a passive channel can be optimized in a cost effective

way to provide the maximum bandwidth. The paper focuses on the design methodology including modeling the

channel, identifying the channel bottle-necks, optimizing around the bottle-necks and verifying the conclusions

through simulation. Finally the simulation results are verified through hardware measurements.

24 Bridgeless Buck PFC Rectifier

A new bridgeless buck PFC rectifier that substantially improves efficiency at low line of the universal line range is

introduced. By eliminating input bridge diodes, the proposed rectifier's efficiency is further improved. Moreover, the

rectifier doubles its output voltage, which extends useable energy of the bulk capacitor after a drop-out of the line

voltage. The operation and performance of the proposed circuit were verified on a 700-W, universal-line experimental

prototype operating at 65 kHz. The measured efficiencies at 50% load from 115-V and 230-V line are both close to

96.4%. The efficiency difference between low line and high line is less than 0.5% at full load. A second-stage half-

bridge converter was also included to show that the combined power stages easily meet Climate Saver Computing

Initiative Gold Standard.

25 Bottleneck Identification Techniques leading to Simplified Performance Models for Efficient Design Space Exploration in VLSI Memory Systems

High performance VLSI systems are being built as multiprocessor systems-on-chip. The number of processors and

their performance are rising rapidly while the change is slower for the memories. The memory system is often a

performance bottleneck in terms of either its bandwidth or latency. We propose sensitivity analysis as a means to

pinpoint the bottleneck. We introduce a novel randomized technique to measure the sensitivities within cycle accurate

simulators. The sensitivity measures identify the bottleneck regions of the design space, within which simplified

performance models can be used for optimization. We demonstrate this methodology on the Augmint-MemSim

simulator, which is a cycle accurate model for multi-processor systems with a distributed memory sub-system. We

empirically show that: (i) Performance predictions from simplified models are strongly correlated with the simulator in

the high sensitivity regions. (ii) The simplified models speed up design space exploration by 2-3 orders of magnitude

over the simulator resulting in better design solutions.
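
The idea of using sensitivities to locate a bottleneck can be sketched with a toy model. This is not the randomized technique of the paper, which operates inside a cycle accurate simulator. Here the "simulator" is a hypothetical closed-form execution time, and the parameter names `cpu_ghz` and `mem_bandwidth`, their ranges, and the constants are assumptions for illustration.

```python
import random

def sensitivity(model, point, name, rel_step=0.05):
    """Normalized sensitivity of the model output to one parameter (central difference)."""
    lo, hi = dict(point), dict(point)
    lo[name] = point[name] * (1 - rel_step)
    hi[name] = point[name] * (1 + rel_step)
    return (model(hi) - model(lo)) / (2 * rel_step * model(point))

def average_sensitivities(model, ranges, samples=200, seed=0):
    """Average the absolute sensitivities over randomly sampled design points."""
    rng = random.Random(seed)
    totals = {name: 0.0 for name in ranges}
    for _ in range(samples):
        point = {n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()}
        for n in ranges:
            totals[n] += abs(sensitivity(model, point, n))
    return {n: t / samples for n, t in totals.items()}

def toy_exec_time(p):
    """Hypothetical execution time: compute time plus memory transfer time."""
    return 1000.0 / p["cpu_ghz"] + 5000.0 / p["mem_bandwidth"]

ranges = {"cpu_ghz": (1.0, 3.0), "mem_bandwidth": (1.0, 2.0)}
result = average_sensitivities(toy_exec_time, ranges)
print(max(result, key=result.get))
```

The parameter with the largest average sensitivity marks the bottleneck region. In this toy setup memory bandwidth dominates, so a simplified model that tracks only the memory term would already predict performance well there.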


26 Architectural Comparison of Analog and Digital Duty Cycle Corrector for High Speed I/O Link

To achieve high speed data signaling rates with the internal fast clock operating at half its speed, the XDR (extreme

data rate) I/O link employs dual-edge signaling, wherein data bits are transmitted on both edges (rise/fall) of the

transmit clock. A duty cycle correction technique is used to provide high frequency, low jitter clocks that have a 50% duty

cycle. This paper compares two different techniques to implement duty cycle corrector (DCC). These techniques are

implemented in high speed I/O operating at data rate of 4Gbps and 6.4Gbps in TSMC 65nm & TSMC 40nm technology

achieving an output duty cycle error below 2% for 10% input duty cycle error.

27 Analyzing Energy-Delay Behavior in Room Temperature Single Electron Transistors

This paper presents Single Electron Transistor (SET) devices operating at room temperature as an attractive option to

implement low energy consumption circuits with low-to-moderate performance requirements. Currently, such circuits

are implemented using CMOS technologies operating at low supply voltages. CMOS is usually leakage dominated at

such a low voltage regime and various optimizations are necessary to design low energy circuits. By discussing the

energy-delay trade-offs for SET devices and comparing them to those of contemporary CMOS technology, we present

an argument that SET devices may be more favorable compared to CMOS from the energy and delay standpoints at

low supply voltages.

28 Analysis, Design and Simulation of Capacitive Load Balanced Rotary Oscillatory Array

The high frequency of the rotary clocking technology is often susceptible to implementation parameters such as the

variation in the total capacitive load distribution between the rings. SPICE simulations performed on the rotary rings

with unbalanced capacitive load distribution show a 30.31% variation in the simulated frequencies across the rings.

To address this problem, two novel methodologies called OCLB and SOCLB, are formulated for the optimal capacitive

load balancing and suboptimal capacitive load balancing with minimized wire length, respectively. SPICE simulations

performed with OCLB show 0.30% variation in the simulated frequencies across the rings. Further, SOCLB results in

an average wire length improvement of 69.24% over OCLB with a relatively balanced capacitive load distribution.

SPICE simulations performed with SOCLB show 2.40% variation in the simulated frequencies across the rings,

improved significantly over the 30.31% variation of the unbalanced case.


29 An L-band Fractional-N Synthesizer with noise-less Active Capacitor scaling

In a charge-pump based type-II analog Phase Locked Loop (PLL), the loop filter often uses a small resistor along with a

big integrating capacitor for good phase noise performance. This comes at the cost of large silicon area or external

component. The noise from the resistor contributes to the output phase noise through both feedback and feed-forward

paths and hence has a presence in the output over a very wide frequency band. In this PLL, the loop filter avoids the

feed-forward and limits the contribution of the resistor noise over a narrow frequency band. This technique allows a

large resistor to be used with a small capacitor without phase noise penalty. The achieved independent control of

bandwidth and stabilizing zero gives better stability and reduces noise peaking. The integrated phase error achieved at

1.3GHz is -38dBc.

30 An improvised MOS transistor model suitable for Geometric Program based analog circuit sizing in Sub-micron technology

This paper presents ways to improve accuracy of performance prediction for geometric program based analog design

in submicron regime. Geometric program requires a special monomial form of the device model it uses. The major

sources of inaccuracy in this basic model have been identified and it has been shown that slightly relaxing the strict

monomial form in order to include second order effects can greatly improve the accuracy. In order to make use of this

model we deploy it in collaboration with an iterative solution betterment scheme, by solving the sizing problem as a

sequence of geometric programs instead of a single one. We illustrate the efficacy of our scheme through a folded-

cascode op-amp sizing example.

31 An Improved High Resolution CMOS Timing Generator Using Array of Digital Delay Lock Loops

In this paper, an improved high resolution CMOS timing generator using array of digital delay lock loops is presented.

The timing generator is implemented as an array of delay locked loops. This architecture enables a timing generator

with sub gate delay resolution to be implemented. The proposed Delay Lock Loops use novel start controlled Dual

Phase and frequency Detector along with a charge pump where the injected charge approaches zero as the loop

approaches lock on the leading edge and the trailing edge of an input clock reference. The delay lock loop locks to

both the leading and trailing clock edges as the start controlled dual phase and frequency detector along with charge

pump convert the phase difference into voltage, which greatly reduces the timing jitter. In the start controlled dual

phase and frequency detector, the start-controlled circuit is used to provide a precise output without the locking

problem. The results show that the total delay time between the input and the output of the DLL (Delay Lock Loop) is

one clock cycle and all of the delay cells provide precise output without false locking or harmonic locking. Test results show a timing jitter of less than 5 ps for the DLL circuit and very low phase sensitivity errors. The timing generator

implemented as an array of delay locked loops has exponentially reduced the locking time and also avoids false locking

or harmonic locking. An experimental prototype was simulated using 0.35 technology with a supply voltage of 3.3V.


32 An Efficient Method for Bottom-Up Extraction of Analog Behavioral Model Parameters

This paper presents a fast, accurate and robust method for bottom-up extraction of analog behavioral model

parameters from the corresponding transistor level netlists. The proposed Verilog-A in-loop simulation based

modeling approach is generic and can estimate the parameters of the corresponding model of any given circuit using

relevant test-benches, thus removing the need to implement structure based estimation tools for each circuit. The

models are usually non-linear with respect to the parameters and often the optimization problem becomes nonconvex.

A hybrid method based on co-operation and switching between search and gradient methods is proposed for

achieving significantly faster convergence to the global minimum even in the presence of local minima in such non-convex

cases. This method is applied by the authors to a wide variety of analog circuits, and is demonstrated in the paper

using two distinctly different analog circuits. Simulation results comparing the model and transistor level netlist show

that a high level of accuracy can be achieved. The comparison of the search, gradient and the proposed hybrid method

is presented.
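
The idea of switching between a search phase and a gradient phase can be sketched in a few lines. This is only an illustration under stated assumptions: the one-dimensional `loss` function, the coarse grid search and the step size are hypothetical stand-ins, not the paper's Verilog-A in-loop flow or its actual switching rule.

```python
def loss(x):
    # Hypothetical non-convex fitting error: global minimum near x = -1.30,
    # local minimum near x = 1.13.
    return x**4 - 3 * x**2 + x


def grad(x):
    # Analytic derivative of the hypothetical loss above.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2000):
    # Plain fixed-step gradient descent; converges to the nearest minimum.
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def hybrid_minimize(lo=-2.0, hi=2.0, samples=9):
    # Search phase: coarse sampling picks the most promising basin.
    step = (hi - lo) / (samples - 1)
    candidates = [lo + i * step for i in range(samples)]
    start = min(candidates, key=loss)
    # Gradient phase: refine inside that basin.
    return gradient_descent(start)


x_hybrid = hybrid_minimize()
x_gradient_only = gradient_descent(2.0)
print(f"hybrid: {x_hybrid:.4f}, gradient only from x=2: {x_gradient_only:.4f}")
```

Started from x = 2, gradient descent alone settles in the local minimum, while the search phase places the gradient phase in the basin of the global minimum.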

33 An Efficient Design of a Reversible Barrel Shifter

The key objective of today's circuit design is to increase the performance without a proportional increase in power

consumption. In this regard, reversible logic has become an immensely promising technology in the field of low power

computing and designing. On the other hand, data shifting and rotating are required in many operations such as

arithmetic and logical operations, address decoding and indexing etc. Consequently, barrel shifters, which can

shift and rotate multiple bits in a single cycle, have become a common design choice for high speed applications. For

this reason, this paper presents an efficient design of a reversible barrel shifter. It has also been shown that the new

circuit outperforms the previously proposed one in terms of number of gates, number of garbage outputs, delay and

quantum cost.
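
As a behavioral illustration of what a barrel shifter computes, the sketch below rotates a word through logarithmic stages, each controlled by one bit of the shift amount. It is a software model only, assuming a power-of-two `width` and `0 <= amount < width`; it does not reproduce the paper's reversible gate network. Rotation is a permutation of the bits, which is why the operation maps naturally onto reversible logic.

```python
def barrel_rotate_left(value, amount, width=8):
    """Rotate `value` left by `amount` bits using log2(width) conditional stages."""
    mask = (1 << width) - 1
    value &= mask
    stage_shift = 1
    while stage_shift < width:
        if amount & stage_shift:  # select bit for this stage
            value = ((value << stage_shift) | (value >> (width - stage_shift))) & mask
        stage_shift <<= 1
    return value


print(bin(barrel_rotate_left(0b10010110, 3)))
```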

34 Accelerating Synchronous Sequential Circuits using an Adaptive Clock

In this paper we propose a scheme for enhancing the timing performance of a pre-designed synchronous sequential

circuit. In the proposed scheme, a circuit is driven by two clocks. One of them is the conventional clock while the other

one, having a shorter period, is applied when the circuit stabilizes well before the critical delay. We use a symbolic

algorithm to analyze the timing behavior of the synchronous sequential circuit and provide add-on circuitry to select

the appropriate clock based on the current state of the circuit. We demonstrate an appreciable gain (67% on average) in

timing performance on several benchmark circuits.





35 A Unified Solution to Scan Test Volume, Time, and Power Minimization

The double-tree scan-path architecture, originally proposed for low test power, is adapted to simultaneously reduce

the test application time and test data volume under external testing. Experimental results show significant

performance improvements over other existing scan architectures.

36 A Unified Approach for IP Protection across Design Phases in a Packaged Chip

IP values contributed by the distinct design tools in specific design phases are recognized by observing the signature

of the owner of each tool as the functional or scan mode output of the fabricated chip, for a certain input vector known only to the

owner. An existing approach inserts a watermark by reordering a single scan chain, and identifies only the

owner of the logic design tool. Here we propose a novel scheme to watermark the recent reconfigurable scan

architectures, operating in both scan tree and single scan mode. The signature of the owner of the physical design tool

along with that of the logic design tool can be embedded separately while designing the scan tree and also verified from

the packaged chip without conflict using two distinct modes. A bi-objective minimization of overhead in routing and

power is supported through our scheme. Experimental results on design overhead and robustness for ISCAS89

benchmarks are encouraging.

37 A Reconfigurable Architecture for Secure Multimedia Delivery

This paper introduces a reconfigurable architecture for ensuring secure and real-time video delivery through a novel

parameterized construction of the Discrete Wavelet Transform (DWT). This parameterized construction promises

multimedia encryption and is also well-suited to a hardware implementation due to our derivation of rational filter

coefficients. We achieve an efficient and high-throughput reconfigurable hardware implementation through the use of

LUT-based constant multipliers enabling run-time reconfiguration of encryption key. We also compare our prototype

(using a Xilinx Virtex 4 FPGA) to several existing implementations in the research literature and show that we achieve

superior performance as compared to both traditional CPU-based and custom VLSI approaches while adding features

for secure multimedia delivery.

38 A P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) Aware Dual-VTh Nano-CMOS VCO

We present the design flow for a P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) aware voltage

controlled oscillator (VCO). Through simulations, we have shown that parasitics, process, voltage and temperature have

a drastic effect on the performance (center frequency) of the VCO. A design optimization of the VCO, along with dual-

threshold power minimization has been performed in the presence of worst-case variations. The end product of the

proposed methodology is a P4VT-optimal dual-threshold 90nm VCO layout. We have achieved 16.4% power (including





leakage) minimization with 10% degradation in center frequency compared to the target frequency, in the presence of

worst-case variations.

39 A Novel Circuit to Optimize Access Time and Decoding Schemes in Memories

As the microprocessor speed increases from 500MHz to 1GHz and beyond, SOC designers are forced to innovate new

schemes in their use of cache memory for high speed access. In this paper, clock to word line path delay is optimized

using a novel circuit design technique. Using this novel circuit, clock to word line path delay is improved by 2.5 times

at worst case corner. For a typical memory instance (frequently used in cache memories) whose access time is of the

order of 800ps and where read and write operation occurs in the same clock cycle, overall access time is improved by

18% at worst case corner. For this case, write margin is improved by 2.26 times at worst case corner for write

operation. A decoding scheme is also discussed in this paper which describes how to choose the best pre-decoding

and post-decoding schemes based on minimum pre-decoded lines, minimum stack size in post decoder and maximum

granularity of xdecoders.

40 A Non Quasi-Static Small Signal Model for Long Channel Symmetric DG MOSFET

We propose a compact model for small signal non quasi static analysis of long channel symmetric double gate

MOSFET. The model is based on the EKV formalism and is valid in all regions of operation and thus suitable for RF

circuit design. Proposed model is verified with professional numerical device simulator and excellent agreement is

found well beyond the cut-off frequency.

41 A New Hetero-material Stepped Gate (HSG) SOI LDMOS for RF Power Amplifier Applications

In this paper, we propose a new hetero-material stepped gate (HSG) SOI LDMOS in which the gate is divided into three

sections, an n+ gate sandwiched between two p+ gates, and the gate oxide thickness increases from source to drain.

This new device structure improves the inversion layer charge density in the channel, results in uniform electric field

distribution in the drift region and reduces the gate to drain capacitance. Using two-dimensional simulation, the HSG

LDMOS is designed and compared with the conventional LDMOS. We demonstrate that the proposed device exhibits

28% improvement in breakdown voltage, 32% reduction in on-resistance, 13% improvement in transconductance, 9%

reduction in gate to drain charge and 38% reduction in switching delay. HSG LDMOS may be effectively deployed in RF

power amplifier applications.

42 A Methodology for Power Aware High-Level Synthesis of Co-Processors from Software Algorithms

Hardware co-processors are used for accelerating specific compute-intensive tasks dedicated to video/audio codec,

encryption/decryption, etc. Since many of these data-processing tasks already have efficient software algorithms, one

could reuse those to synthesize the co-processor IPs. However, such software algorithms are usually sequential and





written in C/C++. High-level Synthesis (HLS) helps in converting software implementation to register transfer level

(RTL) hardware design. Such co-processor based systems show enhanced performance but often have greater

power/energy consumption. Therefore, the automated synthesis of such accelerator IPs must be power-aware.

Downstream power saving features such as clock-gating are unknown during HLS. The designer is forced to make such

power-aware decisions only after the logic synthesis stage, causing an increase in design time and effort. In this paper, we

present a design automation solution to facilitate various granularities of clock-gating at the high-level C description of the

design.

43 A Hierarchical Methodology for Word-Length Optimization of Signal Processing Systems

The problem of converting floating point algorithms to implementation friendly fixed point formats is often solved as

an optimization problem where precision is traded for a lower implementation cost. The complexity of the

problem is known to grow exponentially with the number of optimizable variables. This paper proposes a divide and conquer

technique to cope with the growing size of the problem. The approach in this technique is original in the sense that it is

formulated from a designer's perspective rather than merely attempting to divide and conquer at the algorithmic level. This paper introduces the single noise source model on which the proposed technique is built. A mixed

approach for error propagation is also explained, keeping in view the elements in the circuit that cannot be handled

analytically.

44 A Hardware Scheduler for Real Time Multiprocessor System on Chip

This paper presents the design and implementation of a low power hardware scheduler for multiprocessor

system-on-chips. The Pfair scheduling algorithm is considered with three different implementation schemes: replicated software

scheduler running on each processor, single software scheduler running on a dedicated processor and the proposed

hardware scheduler. Experimental evaluation with benchmarks shows that the hardware scheduler outperforms the other two schemes in terms of energy consumption by an order of magnitude of 10^5 and scheduling delay by an order

of magnitude of 10^3.
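
For readers unfamiliar with Pfair scheduling, the sketch below computes the standard subtask windows of a task with weight e/p: subtask i is released at floor((i-1)/w) and must finish by ceil(i/w). The function name `pfair_windows` is an illustrative choice, and the code models only the window calculation, not the hardware scheduler described in the abstract.

```python
from fractions import Fraction
from math import ceil, floor


def pfair_windows(execution, period):
    """Return (release, deadline) slots for each subtask of the first job."""
    weight = Fraction(execution, period)  # exact rational weight e/p
    windows = []
    for i in range(1, execution + 1):
        release = floor((i - 1) / weight)
        deadline = ceil(i / weight)
        windows.append((release, deadline))
    return windows


print(pfair_windows(2, 5))
```

A task that needs 2 quanta every 5 slots therefore has its first quantum scheduled within slots [0, 3) and its second within [2, 5).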

45 A Graph-based I/O Pad Pre-placement Technique for use with Analytic FPGA Placement Methods

Typical analytic placement methods seek to minimize total squared wire length by solving a linear equation system.

However, to avoid trivial solutions, certain blocks must be assigned locations on the Field Programmable Gate Array

(FPGA) fabric prior to optimization. A simple way to achieve this is to assign blocks randomly. However, this does not

always result in the best solution. In this paper, we present a novel algorithm, called Shrub Place, for pre-assigning I/O

blocks to I/O pads around the perimeter of the FPGA. To verify the efficacy of our pre-placement algorithm, we

integrated the algorithm into the analytic placer in [1, 2]. When tested with the 20 MCNC benchmarks [11], our results

show that a reduction in wire length is possible, with very little additional execution time required to perform the

pre-placement.
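
The need for pre-assigned locations can be seen in a toy one-dimensional case. Minimizing the sum of squared wire lengths sets each movable block to the average of its neighbours, which is a linear system; without fixed pads every block could collapse onto one point at zero cost. The sketch below assumes a hypothetical chain netlist (pad, blocks, pad) and solves the system by Gauss-Seidel sweeps. It is not the Shrub Place algorithm itself.

```python
def place_chain(num_blocks, left_pad, right_pad, sweeps=500):
    """Minimize squared wire length of the chain pad - b1 - ... - bn - pad."""
    x = [0.0] * num_blocks
    for _ in range(sweeps):
        for i in range(num_blocks):
            # Optimality condition: each block sits at the mean of its neighbours.
            left = left_pad if i == 0 else x[i - 1]
            right = right_pad if i == num_blocks - 1 else x[i + 1]
            x[i] = (left + right) / 2.0
    return x


positions = place_chain(2, left_pad=0.0, right_pad=9.0)
print(positions)
```

With pads fixed at 0 and 9, the two movable blocks spread evenly to positions 3 and 6.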





46 GA Combined DOE-ILP Based Power and Read Stability Optimization in Nano-CMOS SRAM

A novel design approach for simultaneous power and stability (static noise margin, SNM) optimization of nano-CMOS

static random access memory (SRAM) is presented. A 45nm single-ended seven transistor SRAM is used as a case

study. The SRAM is subjected to a dual-VTh assignment using a novel combined Design of Experiments and Integer

Linear Programming (DOE-ILP) algorithm, resulting in 50.6% power reduction (including leakage) and 43.9% increase

in the read SNM. The process variation analysis of the optimal SRAM carried out considering twelve device parameters

shows the robustness of the design.

47 23.97GHz CMOS Distributed Voltage Controlled Oscillators with Inverter Gain Cells and Frequency Tuning by Body Bias and MOS Varactors Concurrently

Tunable VCOs operating around 24GHz in 0.18µm CMOS are reported. Simple CMOS inverters are used as gain stages

and tuning is achieved with a novel method that uses body bias and MOS varactors concurrently; the two are

compared for performance. The novel tuning method allows for a wider tuning range than using a single method.

Here forward body bias (FBB) type tuning of p-FETs has 9-10 times higher tuning bandwidth as compared to MOS

varactor tuning when the latter is connected in series (before output collection point) but equal or nearly equal tuning

when the varactor pair is connected in parallel (to drain transmission line). Six monolithically integrated novel

distributed voltage controlled oscillators (D-VCOs) with a novel gain cell comprising a CMOS inverter are designed.

Top layer metal is used for coplanar waveguide (CPW) for on-chip inductors. The first D-VCO, OSC-1, has 3 stages of the

gain cell and oscillates at 23.97GHz; the second D-VCO, OSC-2, has 4 stages of the gain cell and oscillates at 18.64GHz.

Both K-band oscillators use body bias variation of p-FETs for wide frequency tuning. For further tuning after body bias

type of tuning, MOS varactors are added in series to OSC-1 and OSC-2, resulting in designs OSC-3 and

OSC-4 respectively, and in parallel, resulting in designs OSC-3a and OSC-4a respectively. OSC-3 oscillates at 23.53GHz and

OSC-4 oscillates at 18.09GHz. OSC-3a oscillates at 22.79GHz with 340MHz tuning by each of these two tuning

techniques (doubling of tuning bandwidth as total tuning is 680MHz). OSC-4a oscillates at 17.77GHz (resulting in a

Ku-band VCO from K-band for substantial design reuse) with 240MHz tuning by FBB and 200MHz tuning by varactor pair

(total tuning of 440MHz). The phase noise is reported at 1MHz offset from the carrier; for example, it is -102.4dBc/Hz for the

18.64GHz D-VCO. These oscillators emit very low power in the 2nd and 3rd harmonics.

48 A 90mW/GFlop 3.4GHz Reconfigurable Fused/Continuous Multiply-Accumulator for Floating-point and Integer

Operands in 65nm

This paper describes an energy efficient and reconfigurable fused/continuous Multiply-Accumulator (MAC) architecture

for single-precision floating-point and 16-bit signed integer operands. This eight-stage pipelined and single-cycle

throughput MAC design contains a bit level pipelined multiplier, followed by fast sparse-tree adder and single cycle

accumulator loop with delayed normalization logic. Operation driven energy control is achieved using dynamic clock

and fine grained power gating techniques. Power gating is employed in 98% of the design to save 79% of leakage power in

idle mode, at 1.2V supply and 110°C. The use of fully shared logic in the multiplier, accumulator and normalization

blocks for different operations enables a compact design of 0.54mm^2 containing 117K transistors in eight-metal 65nm





CMOS technology. The 15-FO4 design provides 6.8GFLOPS of performance with total energy efficiency of

90mW/GFLOP at 1.2V and 3.4GHz operation.
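
The headline figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the abstract; the derived total power is an inference that assumes the efficiency figure applies at full throughput, and it is not a value stated in the source.

```python
clock_ghz = 3.4                  # operating frequency from the abstract
throughput_gflops = 6.8          # reported performance
efficiency_mw_per_gflop = 90.0   # reported energy efficiency

# One multiply plus one accumulate per cycle counts as two FLOPs.
flops_per_cycle = throughput_gflops / clock_ghz

# Assumption: the efficiency figure holds at full throughput.
total_power_mw = efficiency_mw_per_gflop * throughput_gflops

# 1 mW per GFLOP/s is the same as 1 pJ per floating-point operation.
energy_per_flop_pj = efficiency_mw_per_gflop

print(flops_per_cycle, total_power_mw, energy_per_flop_pj)
```

Under that assumption the design would draw roughly 612 mW at 3.4GHz, which corresponds to 90 pJ per floating-point operation.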

49 A 6-bit 800MHz TIADC based on Successive Approximation in 65nm Standard CMOS Process

Applications such as ultra-wideband radio and optical communication require sampling rates of at least 500MS/s with low

resolution. The potential energy savings of a successive approximation based time interleaved A-D conversion

architecture make it preferable to the traditional flash architecture. This paper presents a 6-bit 800 MS/s ADC in 65 nm

STMicroelectronics standard CMOS process. The ADC uses 8-channel time interleaved SAR topology and achieves 36

dB SNR and 43 dB SFDR with 13.5 mW power consumption from 1.2 V supply. The resulting FOM is 0.3251 pJ/step. The

timing mismatch among the channels is reduced by a clock-edge reassignment technique. The high speed specification

of the system requires the design of a low offset comparator. Power consumption and jitter are reduced by using a shift

register based phase generator.
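
The reported figure of merit can be approximately reproduced with the commonly used Walden formula, FOM = P / (2^ENOB * fs), with ENOB = (SNDR - 1.76) / 6.02. The sketch below is a rough check that assumes the quoted 36 dB SNR can stand in for SNDR; the abstract does not say which quantity the authors used.

```python
def walden_fom_pj(power_w, sndr_db, sample_rate_hz):
    """Walden figure of merit in pJ per conversion step."""
    enob = (sndr_db - 1.76) / 6.02  # effective number of bits
    return power_w / (2 ** enob * sample_rate_hz) * 1e12


# Assumption: the abstract's 36 dB SNR is used in place of SNDR.
fom = walden_fom_pj(power_w=13.5e-3, sndr_db=36.0, sample_rate_hz=800e6)
print(f"{fom:.3f} pJ/step")
```

The result is about 0.33 pJ/step, close to the 0.3251 pJ/step quoted above; the small gap is consistent with rounding of the SNR value.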

50 4 GHz 130nm Low Voltage PLL Based on Self Biased Technique

This paper explores a PLL core design that can satisfy a wide range of high frequency serial data communication

applications. There exist several high frequency serial data communication protocols that co-exist today. The PLL

design requirements for all these clock frequencies separately call for enormous design effort in terms of time and

cost. It is desired to design a PLL core which makes it possible to address a wide segment of clock frequency

requirement. The PLL achieves this using a single 1.2V supply; it doesn't use any special mask layers and also doesn't

need a bandgap reference for its operation. This PLL is based on self-biased technique and achieves high process

technology independence, fixed damping factor, fixed bandwidth to operating frequency range and input phase offset

cancellation. Here the self biased PLL in 130nm CMOS technology achieves the frequency range of 400 MHz to 4GHz.

The PLL core is designed to accept a wide range of input reference frequencies.

51 Voltage-Frequency Planning for Thermal-Aware, Low-Power Design of Regular 3-D NoCs

Network-on-Chip combined with Globally Asynchronous Locally Synchronous paradigm is a promising architecture for

easy IP integration and utilization with multiple voltage levels. For power reduction, multiple voltage-frequency levels

are successfully applied to 2-D NoCs, but never with a generic approach to their 3-D counterparts, in which the low thermal

conductivity of insulator layers concentrates heat in layers far from the heat sink. In this paper,

a thermal-aware methodology for regular 3-D NoCs based on multiple voltage levels is proposed. Given an application

task graph, this methodology determines an efficient mapping of tasks onto network tiles, considering inherent

computation and communication requirements of the tasks and thermal resistance from any silicon layer to the

ambient. Then, a heuristic approach is utilized to determine voltage and frequency specifications of all IP cores, such

that total power is reduced, dissipated heat is properly conducted to the layers close to the heat sink, and application

requirements (in terms of deadline) are satisfied. The experiments confirm a significant saving in total power while

performance of the running application is guaranteed.





52 Transition Inversion based Low Power data coding scheme for Buffered Data Transfer

In this work the authors propose a data coding protocol that leads to power reduction for block data transfer in off-chip

buses. I/O pads driving off-chip buses contribute to a major portion of power dissipation in chips. Also, block data

transfer is preferred in most systems like caches, DMA etc. In this proposed work, the prior knowledge of the block of

data to be transmitted, when it is stored in the buffer, is exploited in a serial fashion to reduce transitions on every bus

line. Statistical analysis shows up to 31.9% reduction in transitions. Benchmark results show that it leads to 29%

reduction in power consumption. The technique provides added error detection on the lines of parity bit technique,

with similar average error detection capability.

53 Towards Active-Passive Co-Synthesis of Multi-Gigahertz Radio Frequency Circuits

This paper proposes a methodology and framework for rapid active-passive co-synthesis of radio frequency circuits.

The presented approach leverages advances in accelerated three-dimensional electromagnetic simulation technology

to construct Maxwell-accurate parametric macromodels of on-chip passives, in particular spiral inductors. These

macromodels can be used in the context of SPICE level synthesis, thereby enabling concurrent sizing of passive and

active components of radio frequency circuits. Moreover, macromodels obviate the need for topology exploration and

parametric RLC model generation via optimization for on-chip passives. The co-synthesis framework is enabled by

nonlinear, hyper-dimensional regression for macromodel generation and a simulated annealing based optimization

scheme. As examples, and to demonstrate the efficacy of the proposed approach, two standard low-noise amplifier

topologies are synthesized with tight performance constraints by co-optimization of circuit parameters and inductor

geometries.

54 The dawn of 22nm era: Design and CAD challenges

Technology scaling clearly has been the driver of the semiconductor and thereby the EDA industry. In the semiconductor

industry today, 45nm CMOS designs are in full production and 32nm design rules and infrastructure are already in

place for designs starting later this year. It will not be long before the beat of 22nm will be upon us. Due to ever

increasing cost of doing design, design productivity and more specifically, cost of design has become a major

bottleneck in large scale design projects. Due to this cost crunch, automated synthesis techniques have been

becoming increasingly important and this is bound to become a major trend going into 22nm for high performance

SoCs. In addition, in 22nm and beyond, 3D IC technology has the potential of easing the system performance

challenge problem. In order to exploit the full potential of 3D technology, new challenges in the area of physical design, thermal analysis, system level design and analysis need to be addressed. 3D interconnects have the potential

of reducing critical path delays significantly, which are typically between memory and the interfacing logic. In

addition, now that the physical limits are beginning to impact scaling, the question is: how can we cost effectively

design with complicated technology requirements presented by 22nm node and how the design automation

community can help to achieve this goal? What are the challenges at 22nm and what would design look like going into

22nm and beyond? In this paper, we will focus on the major design and CAD challenges associated with 22nm and

beyond.





55 Test Pattern Generation and Compaction for Crosstalk Induced Glitches and Delay Faults

VLSI circuits have become more susceptible to signal integrity related failures with the ever decreasing process

geometries. Detection of crosstalk induced faults is thus important as capacitive crosstalk is one of the major sources

of signal integrity related failures. Crosstalk glitch can result in erroneous output if the glitch effect propagates to a

primary output or to an intermediate flip-flop. Similarly the crosstalk induced delay effects can also result in latching of

an incorrect value if the delay exceeds the allowed margins. In this work a test generation and compaction method is

proposed for crosstalk faults. Test patterns are generated by simultaneously considering the coupling capacitance,

timing and functional incompatibilities between the victim and aggressor nets, to produce the practical maximum

crosstalk noise. A unique method is proposed for finding the functional incompatibilities between interconnects. The

generated test set is then compacted initially through pattern merging and then further through the fault-chaining

algorithm. Three different implementations of this algorithm are compared on crosstalk test sets generated for

ISCAS85 benchmark circuits. Results show considerable reduction in crosstalk pessimism for the given layout and

timing, as well as up to 75% reduction in overall test set size.
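
The pattern-merging step mentioned above can be illustrated with test cubes that contain don't-care bits. The sketch below is a generic greedy merge under assumed conventions (patterns as strings over '0', '1' and 'X'; function names are illustrative); it does not reproduce the paper's fault-chaining algorithm or its crosstalk-specific constraints.

```python
def merge_cubes(a, b):
    """Merge two test cubes over {'0', '1', 'X'}; return None if they conflict."""
    merged = []
    for bit_a, bit_b in zip(a, b):
        if bit_a == "X":
            merged.append(bit_b)
        elif bit_b == "X" or bit_a == bit_b:
            merged.append(bit_a)
        else:
            return None  # one cube needs 0 where the other needs 1
    return "".join(merged)


def compact(cubes):
    """Greedy static compaction: fold each cube into the first compatible one."""
    result = []
    for cube in cubes:
        for i, kept in enumerate(result):
            merged = merge_cubes(kept, cube)
            if merged is not None:
                result[i] = merged
                break
        else:
            result.append(cube)
    return result


print(compact(["1X0X", "10XX", "0XX1", "XX11"]))
```

Four partially specified patterns collapse into two here, because compatible cubes can share one applied vector.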

56 Synthesizability of 3 party Formal Specifications-Does my controller see enough?

This paper presents the problem of bounded synthesizability of formal specifications in the context of three party

systems, consisting of a machine, its environment and a controller. The overall objective is to determine whether it is

possible to synthesize both the machine and its controller for a given Linear Temporal Logic (LTL) specification over

the signals in the machine and the controller interfaces.

57 Synchronization of Concurrently-Implemented Fluidic Operations in Pin-Constrained Digital Microfluidic

Biochips

The implementation of bioassays in pin-constrained biochips may involve pin-actuation conflicts if the concurrently

implemented fluidic operations are not carefully synchronized. We propose a two-phase optimization method to

identify and synchronize the fluidic operations that can be executed in parallel. The goal is to implement these fluidic

operations without pin-actuation conflict, and minimize the duration of implementing the outcome sequence after the

synchronization. The effectiveness of the proposed two-phase optimization method is demonstrated for a

representative 3-plex assay performed on a fabricated pin-constrained biochip.





58 Robust System Design

Robust system design ensures that future systems continue to meet user expectations despite rising levels of

underlying disturbances. This paper discusses two essential aspects of robust system design: 1. Effective post-silicon

validation, despite staggering complexity of future systems, using a new technique called Instruction Footprint

Recording and Analysis (IFRA). 2. Cost-effective design of systems that overcome CMOS reliability challenges through

built-in tolerance to errors in hardware during system operation. A combination of Built-In Soft Error Resilience

(BISER) and circuit failure prediction, together with on-line self-test/diagnostics and software-orchestrated

optimization across multiple abstraction layers, enables design of cost-effective resilient systems.

59 RF SOI Switch FET Design and Modeling Tradeoffs for GSM Applications

A single-pole double-throw novel switch device in 0.18µm SOI complementary metal-oxide semiconductor (CMOS)

process is developed for 0.9 GHz wireless GSM systems. The layout of the device is optimized keeping in mind the

parameters of interest for the RF switch. A sub circuit model, with the standard surface potential (PSP) model as the

intrinsic FET model along with the parasitic elements is built to predict the Ron and Coff of the switch. The measured

data agrees well with the model. The eight FET stacked switch achieved an Ron of 2.5 ohms and a Coff of 180 fF.

60 Rethinking Threshold Voltage Assignment in 3D Multicore Designs

Due to the inherent nature of heat flow in 3D integrated circuits, stacked dies exhibit a wide range of thermal

characteristics. The strong dependence of leakage on temperature and process variation plays havoc with achieving system

level energy efficiency in such systems, complicating the task of power provisioning in 3D multicores. In this paper, we

address this power provisioning challenge in 3D ICs by advocating a novel microprocessor design paradigm, where

the circuit designers are aware of the intended placement of a die in a 3D stack. We present a concrete application

of this paradigm through a threshold voltage (Vt) assignment algorithm for a 3D multicore system, where we

specifically account for: (a) the change in the role of leakage power, (b) expected operating frequency, and (c)

dependency of PV induced leakage variation and Vt levels. Detailed simulation based experiments with our proposed

algorithm show 215% improvement in energy efficiency for a typical multicore system organized as 3D stacked dies.

61 Processor Architecture Design Using 3D Integration Technology

The emerging three-dimensional (3D) chip architectures, with their intrinsic capability of reducing the wire length, are

one of the promising solutions to mitigate the interconnect problem in modern microprocessor designs. 3D memory

stacking also enables much higher memory bandwidth for future chip-multiprocessor design, mitigating the memory

wall problem. In addition, heterogeneous integration enabled by 3D technology can also result in innovative designs

for future microprocessors. This paper serves as a survey of various approaches to design future 3D microprocessors,

leveraging the benefits of low latency, higher bandwidth, and heterogeneous integration capability that are offered by

3D technology.





62 Post assembly timing closure for multi million gate chips

A hierarchical timing closure methodology is presented. It combines the timing closure effectiveness of flat methods with the

capacity and run time efficiency of subchip based methods. The unique proposition is that it performs flat logic

physical optimization of cross subchip timing paths while at the same time abiding by hierarchy rules. The principle

and details of the methodology are provided. Experimental result on multi million gate designs shows its timing

closure effectiveness with run time gains of 50% on optimization steps, and peak memory reduction as well.

63 Pinpointing Cache Timing Attacks on AES

The paper analyzes cache based timing attacks on optimized codes for Advanced Encryption Standard (AES). The

work justifies that timing based cache attacks create hits in the first and second rounds of AES, in a manner that the

timing variations leak information of the key. To the best of our knowledge, the paper justifies for the first time that

these attacks are unable to force hits in the third round and concludes that a similar third round cache timing attack

does not work. The paper experimentally verifies that protecting only the first two AES rounds thwarts cache based

timing attacks.
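
A common way to model the first-round leak is shown below. In the first round a table index is simply plaintext XOR key, so two lookups into the same table hit the same cache line exactly when the upper bits of the two indices agree. The sketch assumes 64-byte cache lines and 4-byte table entries (16 entries per line) and that both bytes index the same table; these parameters are illustrative and are not taken from the paper.

```python
ENTRIES_PER_LINE = 16  # assumption: 64-byte cache lines, 4-byte table entries


def cache_line(plaintext_byte, key_byte):
    # First-round table index is plaintext XOR key; the line is its upper bits.
    return (plaintext_byte ^ key_byte) // ENTRIES_PER_LINE


def same_line(p_i, k_i, p_j, k_j):
    # True when both first-round lookups fall into the same cache line.
    return cache_line(p_i, k_i) == cache_line(p_j, k_j)


# Hypothetical key bytes for the demonstration.
k_i, k_j = 0x3A, 0xC5
leaked_high_nibble = (k_i ^ k_j) >> 4
print(hex(leaked_high_nibble))
```

In this model an attacker who observes which plaintext pairs collide learns the upper four bits of `k_i ^ k_j`, since a collision occurs exactly when the upper nibble of `p_i ^ p_j` equals that value.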

64 Parametric Fault Diagnosis of Nonlinear Analog Circuits using Polynomial Coefficients

We propose a method for diagnosis of parametric faults in analog circuits using polynomial coefficients of the circuit

model [15]. As a sequel to our recent work [14], where circuit response is modeled as polynomial for uncovering

parametric faults in nonlinear circuits, we propose diagnosis of such faults using sensitivity of coefficients of the

estimated polynomial to circuit parameters. The proposed method requires no design for test hardware as might be

added to the circuit by some other methods. The proposed method is illustrated for a benchmark elliptic filter. It is

shown to uncover several parametric faults causing deviations as small as 5% from the nominal values.

65 Optimized Stage Ratio of Tapered CMOS Inverters for Minimum Power and Mismatch Jitter Product

In this paper, an optimum stage ratio (tapering factor) for a tapered CMOS inverter chain is derived to minimize the

product of power dissipation and jitter variance due to device mismatch. Analysis shows that this optimum stage ratio

(2.4) is lower than that of minimum delay (3.6) and minimum power-delay (6.35) product. This analysis is verified by

simulation results using standard 180nm as well as 90nm CMOS technology. Knowledge of the optimum stage ratio

helps to design low power low mismatch jitter buffers for multi phase clock generation circuits that can drive large

load capacitances.
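
To see what the choice of stage ratio means in practice, the sketch below counts how many inverters a tapered chain needs when each stage is f times larger than the previous one, using N = ceil(ln(C_load / C_in) / ln f). The load-to-input capacitance ratio of 1000 is a hypothetical value chosen for illustration; only the three stage ratios come from the abstract.

```python
from math import ceil, log


def stages_needed(load_ratio, stage_ratio):
    """Smallest N with stage_ratio**N >= load_ratio."""
    return ceil(log(load_ratio) / log(stage_ratio))


load_ratio = 1000  # hypothetical C_load / C_in
stage_counts = {f: stages_needed(load_ratio, f) for f in (2.4, 3.6, 6.35)}
print(stage_counts)
```

For this load, the ratio of 2.4 implies 8 stages, compared with 6 stages at 3.6 and 4 stages at 6.35, so the power and mismatch-jitter optimum uses a longer, more gently tapered chain.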





66 Optical Lithography Simulation with Focus Variation Using Wavelet Transform

The printed image on a silicon wafer differs from the layout due to optical diffraction. Optical proximity correction (OPC) is a

layout distortion technique to improve the printed image. During manufacturing, parameters such as focus, dose and

resist thickness may vary within tolerance margins. These factors contribute to additional distortion of expected

printed shape, not addressed directly by OPC. To ensure a robust IC, a process window consideration is extremely

important while running lithography simulations as we scale the technology even further, where the sensitivity of

patterns printed on silicon to process variations is very high. Optical lithography simulation has always been an

important link in the chain for design for manufacturability (DFM), and a lot of research has been put into making it

faster and more accurate. However, because it is a compute-intensive process, speeding up litho simulation without

a significant compromise in accuracy has always been tricky. In this paper we propose a new method to approximate

litho simulation based on the wavelet transform, as opposed to the traditional method, and we validate the speed

and accuracy of our simulator by comparing our results with those of a popular commercial lithography simulator

considering focus variations. While our simulator suffers from an RMS error of < 6%, the major gains are (1) an

increase in simulation speed of > 20X, (2) the ability to simulate very large circuit masks where the commercial

software fails, and (3) direct incorporation of manufacturing process variation. This allows litho simulation against

multiple manufacturing process corners, which in turn helps in producing robust designs.

67 On-Chip Inductor-less DC-DC Boost Converter with Non-Overlapped Rotational-Interleaving Scheme

An architecture for an inductor-less DC-DC boost converter with high efficiency and low output ripple is proposed. Output

ripple is reduced by splitting flying capacitors into a number of smaller elements and using a new switching scheme

called Non-Overlapped Rotational-Interleaving (NORI). The proposed switching scheme also helps to eliminate

reversion and shoot-through currents and hence improves the power efficiency. The proposed converter is designed in a

0.18 µm CMOS thick-gate process with 440 pF total flying capacitance. The target specification for load current is 1 mA

to 23 mA for a 5 V to 6.5 V output voltage from an input supply of 3.3 V. The achieved peak power efficiency is 89% at 10 mA

load current, compared with the 83% peak power efficiency obtained from the best existing architecture designed in the same

technology. The output ripple at 10 mA load current is 2.2 mV in the presence of only 50 pF load capacitance.

68 On Minimization of Test Application Time for RAS

Conventional random access scan (RAS) for testing has lower test application time, lower power dissipation, and lower

test data volume than standard serial-scan-chain-based design. In this paper, we present two cluster-based

techniques, namely, Serial Input Random Access Scan and Variable Word Length Random Access Scan to reduce test

application time even further by exploiting the parallelism among the clusters and performing write operations on

multiple bits. Experimental results on benchmark circuits show, on average, a 2-3 times speedup in test write time

and a 60% reduction in write test data volume.
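
To give a feel for why writing several bits at once can shorten write time, the toy model below counts how many write operations are needed to move a set of individually addressable scan cells from one state to the next, first one bit at a time and then in fixed-size words. This is only an assumed illustration: the function names, the fixed word length, and the example bit strings are hypothetical, address-loading cost is ignored, and it is not the Serial Input or Variable Word Length scheme from the paper.

```python
def bit_writes(state: str, target: str) -> int:
    """Count single-bit writes: one per scan cell whose value must change."""
    if len(state) != len(target):
        raise ValueError("state and target must have the same length")
    return sum(a != b for a, b in zip(state, target))


def word_writes(state: str, target: str, word_len: int) -> int:
    """Count word writes when word_len adjacent cells are written together.

    A word is written only if at least one of its cells must change.
    """
    if len(state) != len(target):
        raise ValueError("state and target must have the same length")
    if word_len < 1:
        raise ValueError("word_len must be at least 1")
    count = 0
    for i in range(0, len(state), word_len):
        if state[i:i + word_len] != target[i:i + word_len]:
            count += 1
    return count


# Hypothetical 8-cell example: four adjacent cells must flip.
current, wanted = "00000000", "11110000"
print(bit_writes(current, wanted))      # one write per changed cell
print(word_writes(current, wanted, 4))  # changed cells share one word
```

When the cells that must change are scattered across many words, the word-level count approaches the bit-level count, which is one reason a fixed word length is not always ideal.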
