Elysium VLSI 2010

Upload: elysiumtechnologies

Post on 29-May-2018

  • 8/8/2019 Elysium VLSI 2010

    Elysium Technologies Private Limited | ISO 9001:2008 | A leading Research and Development Division
    Madurai | Chennai | Kollam | Ramnad | Tuticorin | Singapore

    #230, Church Road, Anna Nagar, Madurai 625 020, Tamil Nadu, India
    Phone: +91 452-4390702, 4392702, 4390651
    Website: www.elysiumtechnologies.com, www.elysiumtechnologies.info
    Email: [email protected]

    Abstract Very Large Scale Integration 2010 - 2011

    01 Novel Vth Hopping Techniques for Aggressive Runtime Leakage Control

    The continuous increase of leakage power consumption in deep sub-micron technologies necessitates more aggressive

    leakage control. Runtime leakage control (RTLC) is effective, since runtime circuits generally have a significant amount

    of idleness. However, current RTLC techniques are used only when circuits have long idle periods, which makes the

    techniques less profitable. The reason is the large energy and delay overhead incurred when performing an RTLC mode

    transition. We propose two novel techniques, workload-adaptive Vth hopping (WAVTH) and hierarchical Vth hopping

    (HIVTH), to tackle the overhead problems and enable aggressive runtime leakage control. Experimental results show

    19.2% average improvement on leakage saving with WAVTH and HIVTH over basic Vth hopping. The optimum design

    points of these two techniques are determined through accurate modeling.
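
    The overhead argument in this abstract can be illustrated with a break-even calculation. The sketch below is not the WAVTH or HIVTH algorithm; it assumes a simple model in which one round trip into and out of the low-leakage mode costs a fixed transition energy, and all function names, parameters and numbers are hypothetical.

```python
def break_even_idle_time(e_transition_j, p_leak_active_w, p_leak_sleep_w):
    """Shortest idle period (seconds) for which a mode transition pays off.

    Assumed model: entering and leaving the low-leakage mode costs a fixed
    energy e_transition_j, and leakage power drops from p_leak_active_w to
    p_leak_sleep_w while the circuit stays in that mode.
    """
    saved_power = p_leak_active_w - p_leak_sleep_w
    if saved_power <= 0:
        raise ValueError("sleep mode must leak less than active mode")
    return e_transition_j / saved_power


def net_saving(idle_s, e_transition_j, p_leak_active_w, p_leak_sleep_w):
    """Net energy saved (joules) by switching modes for one idle period."""
    saved_power = p_leak_active_w - p_leak_sleep_w
    return saved_power * idle_s - e_transition_j


# Hypothetical numbers: 2 nJ transition cost, 1 mW vs 0.2 mW leakage.
t_min = break_even_idle_time(2e-9, 1e-3, 0.2e-3)
print(f"break-even idle time: {t_min * 1e6:.2f} us")
```

    Under this model, lowering the transition energy shortens the break-even idle time, which is why reducing the overhead lets leakage control be applied to shorter idle periods.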

    02 Leakage-Aware Energy Minimization using Dynamic Voltage Scaling and Cache Reconfiguration in Real-Time Systems

    System optimization techniques are widely used to improve energy efficiency as well as overall performance. Dynamic

    voltage scaling (DVS) is acknowledged to be successful in reducing processor energy consumption. Due to the

    increasing significance of the memory subsystem's energy consumption, dynamic cache reconfiguration (DCR)

    techniques have recently been proposed with the aim of reducing the cache subsystem's energy consumption. As manufacturing

    technology scales into the order of nanometers, leakage current, both in the processor and cache subsystem,

    becomes a significant contributor to the overall power dissipation. In this paper, we efficiently integrate processor

    voltage scaling and cache reconfiguration in a leakage-aware manner to minimize overall system energy

    consumption. Experimental results demonstrate that our approach outperforms existing techniques by 12-23% on

    average.
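
    Why leakage changes the voltage-scaling decision can be seen in a small numerical sketch. It covers only the voltage scaling side (cache reconfiguration is omitted), and it assumes a first-order energy model that is not taken from the paper: dynamic energy proportional to the square of the supply voltage per cycle, and leakage power equal to a constant leakage current times the supply voltage. The operating points and constants are hypothetical.

```python
def task_energy(cycles, freq_hz, vdd, c_eff, i_leak):
    """Energy (J) to run `cycles` cycles at one (frequency, voltage) level.

    Assumed first-order model: dynamic energy is c_eff * vdd^2 per cycle,
    and leakage power is i_leak * vdd for the whole execution time.
    """
    dynamic = c_eff * vdd ** 2 * cycles
    leakage = i_leak * vdd * (cycles / freq_hz)
    return dynamic + leakage


def best_level(levels, cycles, deadline_s, c_eff, i_leak):
    """Pick the (freq, vdd) level with minimum energy that meets the deadline."""
    feasible = [(f, v) for f, v in levels if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible,
               key=lambda lv: task_energy(cycles, lv[0], lv[1], c_eff, i_leak))


# Hypothetical operating points: (frequency in Hz, supply voltage in V).
LEVELS = [(200e6, 0.6), (400e6, 0.8), (600e6, 1.0), (800e6, 1.2)]

ignoring_leakage = best_level(LEVELS, 1e8, 1.0, 1e-9, 0.0)
leakage_aware = best_level(LEVELS, 1e8, 1.0, 1e-9, 0.3)
print(ignoring_leakage, leakage_aware)
```

    With these numbers the slowest level is optimal when leakage is ignored, but once leakage is included the longer execution time at 200 MHz costs more than the voltage reduction saves, so the second-slowest level becomes the minimum-energy choice.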

    03 Modeling of RF-MEMS BAW Resonator

    Due to the demand for smaller and more portable devices, the applications of MEMS resonators are rapidly increasing.

    Solidly Mounted Resonators (SMR) based on Bulk Acoustic Wave (BAW) technology follow MEMS principles to build

    high performance microwave filters for RF communication. In this paper we present the architecture of SMRs by

    discussing the design aspects of their core structures, which are within foundry CMOS processes, using the RF design

    software Advanced Design System (ADS). Conventional VLSI processes are followed for the fabrication of the SMRs.

    The results from the fabricated data are compared and discussed.

    04 Modeling and Design Considerations of Coupled Inductor Converters

    In this part of the sequel on the modeling and analysis of coupled inductors and coupled inductor based multiphase

    switching converters, the recently developed symmetrical coupled inductor model is first extended to include the

    inductor winding dc resistance (DCR). The extended model is then used to analyze the influence of the coupling on the

    DCR based current sensing schemes popularly used in multi-phase switching regulators. It is found that the time-

    constant matching condition in coupled inductor converters needs to be modified to include the coupling coefficient.

    The proposed model is also used to derive the small-signal control-to-output transfer function of the converters

    incorporating coupled inductors, with which the effect of coupling on the dynamic behaviors of the converter power

    stage, such as resonant frequency and damping factor, can be easily evaluated.

    05 Instruction Selection in ASIP Synthesis using Functional Matching

    In embedded systems, Application Specific Instruction Set Processors (ASIPs) are commonly used to achieve

    high performance without losing flexibility. A crucial operation required during ASIP synthesis (in particular, selection

    of custom instructions) as well as code generation for ASIPs is identifying portions of an application program that can

    be executed by custom functional units (CFUs). Most existing solutions achieve this by matching the structure of patterns

    corresponding to CFUs with sub-graphs of application data flow graphs. Often it happens that the computations

    performed by the two are equivalent, but due to structural dissimilarities the match is missed. What is needed is a

    method that can match two graphs functionally rather than structurally. In this paper, we present a novel method to do

    this and give implementation results to show its effectiveness.

    06 Inexact Decision Circuits: An Application to Hamming Weight Threshold Voting

    In this work, the authors present the idea of inexact decision making, and its application to threshold voting of Hamming

    weights, used in bus coding schemes, N-Modular Redundancy (NMR), Median filtering, and other pattern matching

    applications. Decision circuits can be tweaked to perform in an inexact manner in order to optimize delay and

    power while still maintaining high system accuracy. One such circuit identified by the authors is the threshold voting of

    Hamming weights. A majority voter is a special case of this family of circuits. The proposed inexact voter consumes up to 8

    times less power than the exact voter, with negligible reduction in system accuracy and performance. The leakage power is

    also reduced by a factor of 3. The inexact voter allows for a higher frequency of operation, by reducing the critical path delay

    by a factor of up to 3.4. The results obtained validate the application of inexact decision making, with respect to threshold

    voters.
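
    For reference, the exact function that such a voter computes can be written in a few lines. This sketch shows only the exact threshold voter and its majority special case; the inexact circuit itself is not described in the abstract and is not modeled here. The function names are illustrative.

```python
def hamming_weight(word):
    """Number of 1 bits in a non-negative integer."""
    if word < 0:
        raise ValueError("word must be non-negative")
    return bin(word).count("1")


def threshold_vote(word, threshold):
    """Exact threshold voter: true when at least `threshold` bits are set."""
    return hamming_weight(word) >= threshold


def majority_vote(word, width):
    """Majority voter: the special case with threshold = floor(width/2) + 1."""
    return threshold_vote(word, width // 2 + 1)


print(threshold_vote(0b1011, 3))    # three bits set, threshold 3 -> True
print(majority_vote(0b10110, 5))    # three of five bits set -> True
print(majority_vote(0b0011, 4))     # two of four bits set -> False
```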

    07 Implementation of a Novel Phoneme Recognition System using TMS320C6713 DSP

    A number of techniques have been proposed in the literature for phoneme based speech recognition systems. In this

    paper, a technique for automatic phoneme recognition using zero-crossings (ZC) and magnitude sum function (MSF) is

    proposed. The number of zero-crossings and the magnitude sum function per frame are extracted and a Minimum Distance

    Classifier is proposed to recognize the phonemes in each frame with these features. In order to increase the

    recognition accuracy of phonemes, a finite state machine is also proposed. The performance of the proposed

    phoneme recognition system is evaluated using TTS database and compared with the system using Linear Predictive

    Coefficients (LPC) feature inputs. Phoneme recognition accuracies of 70.93% and 55.25% are obtained for the system

    using LPC and the one using ZC along with MSF respectively. However, using the finite state machine proposed in this

    paper, 100% recognition accuracy is obtained for both the techniques. The computational costs required for

    recognizing various sentences using both of the feature extraction techniques are evaluated. It is observed that the

    proposed technique requires about 9.3 times lower computational cost than the one using LPC. The proposed

    technique is adopted for the implementation of the phoneme recognition system on Texas Instruments TMS320C6713

    floating point processor. Different ways to reduce the recognition time on the target device are explored and

    reported in this paper. The technique proposed here is also applicable to speech inputs from other databases.
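
    The two per-frame features and the classifier named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the zero-crossing rule (sign change between consecutive samples), the Euclidean distance, the class labels and the centroid values are all hypothetical, and feature scaling and the finite state machine are omitted.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def magnitude_sum(frame):
    """Magnitude sum function: sum of absolute sample values in the frame."""
    return sum(abs(x) for x in frame)


def classify(features, centroids):
    """Minimum distance classifier over (ZC, MSF) feature pairs.

    `centroids` maps a label to its reference feature vector; the label
    whose centroid is closest in Euclidean distance is returned.
    """
    def distance(label):
        ref = centroids[label]
        return sum((f - r) ** 2 for f, r in zip(features, ref)) ** 0.5
    return min(centroids, key=distance)


# Hypothetical reference vectors for two illustrative classes.
CENTROIDS = {"fricative": (3.0, 4.0), "vowel": (0.0, 1.5)}

frame = [1.0, -1.0, 1.0, -1.0]
features = (zero_crossings(frame), magnitude_sum(frame))
print(features, classify(features, CENTROIDS))
```

    Both features need only comparisons, absolute values and additions per sample, which is consistent with the lower computational cost reported relative to LPC.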

    08 Impact of Temperature on Test Quality

    The usage of more advanced, less mature processes during manufacturing of semiconductor devices has increased

    the need for performing unconventional types of testing, like temperature-testing, in order to maintain the same high

    quality levels. However, performing temperature-testing is costly. This paper proposes a viable low-cost alternative to

    temperature testing that quantifies the impact of temperature variations on the test quality and also determines optimal

    test conditions. The test flow proposed is empirically validated on an industrial-standard die. The results obtained

    show that the majority of the defects that were originally detected by temperature-testing are also detected by the

    proposed test flow, thereby reducing the dependence on temperature testing to achieve zero-defect quality. Details of

    an interesting defect behavior at cold test conditions are also presented.

    09 Identifying the bottlenecks to the RF performance of FinFETs

    In this work, the high frequency (RF) performance of FinFETs is investigated in detail using a two-level parasitic

    model comprising outer and inner parasitic capacitances in addition to parasitic series resistances. Use of scaling

    relations of these parasitic capacitances with numbers of fins and fingers allows extraction of these elements. Next, by

    defining a series of reference surfaces, each associated with a certain set of parasitic elements, we proceed to

    calculate the RF Figures of Merit, namely fT and fmax at these surfaces. These are called available fT (fmax) in this

    work. Analysis of the available fT (fmax) gives insight into the extent to which different parasitics affect the FinFET's

    RF performance. The main bottleneck to the FinFET's RF performance is identified, solutions are proposed and

    relevant trade-offs are discussed.

    10 Identifying Tests for Logic Fault Models Involving Subsets of Lines without Fault Enumeration

    Bridging and interconnect open faults are defined using subsets of lines. We study the possibility of identifying input

    vectors that are effective as test vectors for such faults without enumerating the faults. This process does not require

    accurate layout information, it can handle very large numbers of faults, and it deals with undetectable faults implicitly.

    We describe a static test compaction process that uses the ability to identify effective test vectors without enumerating

    faults. This process selects a subset T of a given test set U such that T is guaranteed to detect the same faults as U.

    We also describe a test generation process based on the same concept. Finally, we show how this concept can be

    used to compare test sets.

    11 Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector: A Better Scheme for Test Data Compression with Run Length Based Codes

    Because of increased design complexity and advanced fabrication technologies, the number of tests and

    corresponding data volume increases rapidly. As the large size of test data volume is becoming one of the major

    problems in testing System-on-a-Chip (SoC), several compression coding schemes have been proposed in the past. Run

    Length Coding is one of the most familiar coding methodologies for compression. In this paper, we present a new

    scheme named Hamming Distance Based Reordering and Column wise Bit Stuffing with Difference Vector (HDR-CBSDV),

    which can be used with any run length based code technique for a better compression ratio. Four techniques

    have been applied in this scheme: Selection of first vector, Hamming Distance Based Reordering, Column wise Bit

    Stuffing and Difference Vector. Instead of directly applying any known run length code like Golomb, Frequency

    Directed Run Length (FDR), Extended FDR (EFDR), Modified FDR (MFDR) or Shifted Alternate FDR (SAFDR) to a given

    test set, if we apply the proposed scheme to the test set prior to applying the run length based code, the compression

    obtained is improved drastically. The experimental results on ISCAS89 benchmark circuits show that the test data

    compression ratio improves significantly in each case. It is also noteworthy that in most cases, this scheme

    does not involve any extra silicon area overhead compared to the base code with which it is used. In a few cases, it

    requires only an extra XOR gate and feedback path. The proposed scheme can be easily integrated into the existing

    industrial flow.
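
    Two of the four steps, Hamming distance based reordering and the difference vector, can be sketched on fully specified test vectors. This is an illustration only: the abstract does not give the rule for selecting the first vector or the column wise bit stuffing procedure, so the sketch simply starts from the first vector in the list, uses a greedy nearest-neighbor order, and assumes there are no don't-care bits.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def reorder(vectors):
    """Greedy reordering: each next vector is the closest to the previous one.

    Assumption: the first vector in the input is used as the starting point.
    """
    remaining = list(vectors)
    ordered = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda v: hamming(ordered[-1], v))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered


def difference_vectors(ordered):
    """Keep the first vector and replace each later one by its XOR with the previous."""
    diffs = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        diffs.append("".join("1" if p != c else "0" for p, c in zip(prev, cur)))
    return diffs


tests = ["0000", "1111", "0001", "1110"]
ordered = reorder(tests)
print(ordered)                      # ['0000', '0001', '1111', '1110']
print(difference_vectors(ordered))  # ['0000', '0001', '1110', '0001']
```

    After reordering, neighboring vectors differ in few positions, so the difference vectors contain long runs of 0s, which is the input pattern that run length based codes compress well. Recovering each original vector from its difference vector takes an XOR with the previously decoded vector, which matches the extra XOR gate and feedback path mentioned in the abstract.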

    12 Functional Refinement: A Generic Methodology for Managing ESL Abstractions

    Ever increasing complexity of SoCs has resulted in starting the system design at a higher level of abstraction. System-

    level design methodology envisages step-wise refinement of high-level models towards final RTL. However, current

    practices are limited to only interface-refinement and the true functionality refinement is performed by developing a

    different model for each abstraction-level. This results in minimal re-use of existing models, wasted effort and high

    maintenance cost of multiple models. This paper presents a novel methodology that enables seamless refinement of IP

    model functionality from one level to another. The presented methodology is generic enough to be applied to various SoC

    design tasks. This paper demonstrates the application of the methodology to software energy-estimation for a DSP

    and functional-cum-timing refinement of a DDR-memory model. The proposed methodology resulted in complete re-use

    of the existing models, easy availability of various model abstractions and 20% savings in development-effort of a new

    model.

    13 Exploring use of NoC for Reconfigurable Video Coding

    MPEG RVC is a standard under development which addresses the issues of standardization of video coding tools and

    multi-format codec design. SoC-based solutions are likely to be developed for MPEG RVC in the future. In this paper we

    evaluate Network on Chip (NoC) as an on-chip interconnection mechanism for MPEG RVC SoC. We use MPEG RVC

    reference C code for MPEG 2 and AVC intra-only decoding, and the Open source NoC named NoCem for simulation.

    We make a new simulation platform over NoCem by using VHDL FLI. This platform allows us to make cycle accurate

    measurements. We experiment with different input resolutions and different configurations of NoCem and measure the

    network performance and overhead for reconfiguration. The results indicate that NoC is a potential candidate for the on-chip

    interconnection mechanism for MPEG RVC SoC.

    14 Experimental Results and Study of a Modified Adaptive Bus Voltage Controller

    Two-stage power conversion is used in many applications. The determination of the bus voltage in a two-stage converter

    is important to achieve high overall power conversion efficiency. Conventional two-stage converters utilize either a

    fixed DC bus voltage or a variable but predetermined DC bus voltage, which do not necessarily result in an optimum

    operation with optimum bus voltages under variable conditions. The paper presents a modified adaptive bus voltage

    controller for a two-stage power converter. The controller adaptively converges to the optimum bus voltage that yields

    the maximum power conversion efficiency under variable operating conditions. The modified adaptive two-stage bus

    voltage controller is evaluated using results obtained from a proof-of-concept experimental prototype.

    15 Electrical Modeling of Lithographic Imperfections

    A lithographic wavelength of 193nm has been used for the past few generations of patterning and is likely to remain in use

    for the next few technology generations (at least until the 28nm technology half-node) as well. This deep sub-wavelength

    patterning has resulted in wafer shapes not resembling drawn rectilinear shapes. The resulting non-rectangular

    devices and wires are not handled by current generation modeling and analysis methods. In this paper, we present a

    survey of electrical modeling methods for such lithographic imperfections, especially on transistor layers. We also

    discuss the use contexts of such models and briefly present the electrical implications of the likely future patterning

    candidate, namely double patterning.

    16 Design Procedure for High Frequency Operation of the Modified Series Resonant APWM Converter with Improved Efficiency and Reduced Size

    In this paper, a generalized analysis for the auxiliary network in a modified series resonant asymmetrical

    pulse-width-modulated (APWM) converter is performed to produce a design procedure that ensures ZVS is achieved for any

    converter design. New equations that correctly predict the magnitude of auxiliary current are obtained by accounting

    for the trapezoidal nature of the waveforms associated with high frequency operation, and the dead time between the

    switches in the half bridge. A design example of a 48V/1.2V, 25A converter operating at 1MHz is chosen to highlight the

    validity of the proposed design and that superior results can be achieved if the resonant tank is designed in tandem

    with the auxiliary network. Experimental results verify that ZVS is achieved, and that the proposed design reduces the

    auxiliary inductor by close to a factor of 3.

    17 Design of Reversible Latches Optimized for Quantum Cost, Delay and Garbage Outputs

    Reversible logic has extensive applications in emerging nanotechnologies, such as quantum computing, optical

    computing, ultra low power VLSI and quantum dot cellular automata. In the existing literature, designs of reversible

    sequential circuits are presented that are optimized for the number of reversible gates and the garbage outputs. The

    optimization of the number of reversible gates is not sufficient since each reversible gate is of different computational

    complexity, and thus will have a different quantum cost and delay. While the computational complexity of a reversible

    gate can be measured by its quantum cost, the delay of a reversible gate is another parameter that can be optimized

    during the design of a reversible sequential circuit. In this work, we present novel designs of reversible latches that are

    optimized in terms of quantum cost, delay and the garbage outputs. The optimized designs of reversible latches

    presented in this work are the D latch, JK latch, T latch and SR latch.
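
    To make the terms concrete, the sketch below models one well-known reversible gate, the Fredkin (controlled swap) gate, and a textbook way of obtaining D latch behavior from it. This is an illustration of reversibility and garbage outputs only; it is not claimed to be one of the designs proposed in the paper, and quantum cost and delay are not computed.

```python
from itertools import product


def fredkin(c, a, b):
    """Fredkin gate: passes (a, b) through when c is 0, swaps them when c is 1."""
    return (c, b, a) if c else (c, a, b)


def d_latch_next(enable, d, q):
    """Next state of a D latch built on one Fredkin gate.

    The enable signal is the control, the current state q and the data
    input d are the two swapped lines. The second output holds q when
    enable is 0 and follows d when enable is 1; the third output is a
    garbage output that is not used further.
    """
    _enable_copy, q_next, _garbage = fredkin(enable, q, d)
    return q_next


# Reversibility check: the eight input patterns map to eight distinct outputs.
outputs = {fredkin(*bits) for bits in product((0, 1), repeat=3)}
print(len(outputs) == 8)

print(d_latch_next(1, 1, 0))  # enabled: state follows d
print(d_latch_next(0, 1, 0))  # disabled: state is held
```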

    18 Design of NoC for SoC with Multiple Use Cases Requiring Guaranteed

    Many SoC architectures aimed at the multimedia domain support multiple use cases where only a subset of the

    applications is active at any time. Further, each multimedia application itself poses strict constraints on core-to-core

    communication latency. This paper presents an approach for automated synthesis of NoC architectures for such an

    SoC. We evaluated our design approach through comparisons with two existing techniques aimed at generating best

    effort and guaranteed throughput designs. Designs generated by our approach showed a marked improvement in both

    power consumption (12.3% decrease) and resource requirements (12.9% decrease) in comparison to the best effort

    NoC design approach. In comparison to the existing guaranteed throughput design approach our designs can

    guarantee core-to-core latency while consuming less power (8.1% decrease) and resources (7.9% decrease).

    19 Design of Low-Cost High-performance Floating-point Fused Multiply-Add with Reduced Power

    This paper presents a floating-point fused multiply-add (FMA) unit with low-cost and low power techniques. To

    improve the performance, two single-precision operations can be performed concurrently with one double-precision

    data path, which is very useful in multimedia and even scientific applications. Moreover, to reduce the additional area

    costs for supporting two single-precision operations in parallel, multiple double-precision units, i.e., the multiplier,

    shifter and adder, are reused as much as possible. A modified dual-path algorithm is proposed by classifying the

    exponent difference into three cases and implementing them with CLOSE and FAR paths, which can reduce latency

    and facilitate lowering power consumption by enabling only one of the two paths. In addition, in case of FADD

    instructions, the multiplier in the first stage is bypassed and kept in stable mode, which can significantly improve

    FADD instruction performance and lower power consumption. The overall FMA unit has a latency of 4 cycles while the

    FADD operation has 3 cycles. Each cycle has a time delay of about 0.66ns in the ST 65nm CMOS technology.

    Compared with the conventional double-precision FMA, about 13% delay is reduced and about 22% area is increased,

    which is acceptable since two single-precision results can be generated simultaneously.
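
    The latency figures quoted above can be converted to absolute times with simple arithmetic. The snippet uses only the cycle counts and the approximate 0.66 ns cycle time stated in the abstract; the function names are illustrative.

```python
CYCLE_NS = 0.66  # approximate cycle time reported for ST 65nm CMOS


def latency_ns(cycles, cycle_ns=CYCLE_NS):
    """Latency in nanoseconds for an operation taking `cycles` cycles."""
    return cycles * cycle_ns


def clock_ghz(cycle_ns=CYCLE_NS):
    """Clock frequency in GHz corresponding to the cycle time."""
    return 1.0 / cycle_ns


print(f"FMA:  {latency_ns(4):.2f} ns")    # 4 cycles
print(f"FADD: {latency_ns(3):.2f} ns")    # 3 cycles
print(f"clock: {clock_ghz():.2f} GHz")
```

    With these figures the FMA latency is about 2.64 ns, the FADD latency about 1.98 ns, and the clock frequency roughly 1.5 GHz.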

    20 Design Considerations for BEOL MIM Capacitor Modeling in RF CMOS Processes

    Modeling of MIM capacitors in high frequency RF applications depends heavily on the design of test structures. An

    external substrate ring is shown to be essential in capturing and modeling the inherent inductance of the MIM

    capacitor. Additionally, deembedding of series parasitics plays a very important role in modeling of MIM capacitors

    since these devices have very low series resistance. Various short structures were studied and their impacts on the

    MIM characteristics are reported. It is shown that a short structure with the shortest path to ground is best suited to

    deembed the series parasitics.

    21 Coverage Management with Inline Assertions and Formal Test Points

    This paper studies the problem of coverage management with two emerging formalisms in simulation based validation,

    namely formal specification of test points and the use of inline temporal assertions. We present methods for checking

    whether a test-bench with inline assertions covers a set of formal test points. This is particularly useful in developing

    verification IPs for standard on-chip protocols where the development team must make sure that the test bench

    provided in the verification IP checks all the important aspects of the protocol. We demonstrate the efficacy of our

    approach over the ARM AMBA verification IP.

    22 Clocking-based Coplanar Wire Crossing Scheme for QCA

    Quantum-dot Cellular Automata (QCA) is one of the promising next-gen fabrics for circuits. Coplanar wire crossing is one of

    the more elegant features of this new low power computing paradigm. However, these need two types of cells and are

    known to be neither easy to fabricate nor very robust. In this work, we propose coplanar wire crossing using a single

    type of QCA cells, by applying the concept of Time Division Multiplexing to design the crossing. This has massive

    implications in fabrication and fault tolerance of QCA circuits.

    23 Channel Optimization for the Design of High Speed I/O links

    The continuous increase in microprocessor performance demands an equal order of increase in the bandwidth

    requirements on the memory and I/O interfaces. Providing the required bandwidth at an acceptable cost is a challenge

    to the system packaging engineer. This paper discusses how a passive channel can be optimized in a cost effective

    way to provide the maximum bandwidth. The paper focuses on the design methodology including modeling the

    channel, identifying the channel bottle-necks, optimizing around the bottle-necks and verifying the conclusions

    through simulation. Finally the simulation results are verified through hardware measurements.

    24 Bridgeless Buck PFC Rectifier

    A new bridgeless buck PFC rectifier that substantially improves efficiency at low line of the universal line range is

    introduced. By eliminating input bridge diodes, the proposed rectifier's efficiency is further improved. Moreover, the

    rectifier doubles its output voltage, which extends the usable energy of the bulk capacitor after a drop-out of the line

    voltage. The operation and performance of the proposed circuit were verified on a 700-W, universal-line experimental

    prototype operating at 65 kHz. The measured efficiencies at 50% load from 115-V and 230-V line are both close to

    96.4%. The efficiency difference between low line and high line is less than 0.5% at full load. A second-stage half-

    bridge converter was also included to show that the combined power stages easily meet Climate Saver Computing

    Initiative Gold Standard.

    25 Bottleneck Identification Techniques leading to Simplified Performance Models for Efficient Design Space Exploration in VLSI Memory Systems

    High performance VLSI systems are being built as multiprocessor systems-on-chip. The number of processors and

    their performance is rising rapidly while the change is slower for the memories. The memory system is often a

    performance bottleneck in terms of either its bandwidth or latency. We propose sensitivity analysis as a means to

    pinpoint the bottleneck. We introduce a novel randomized technique to measure the sensitivities within cycle accurate

    simulators. The sensitivity measures identify the bottleneck regions of the design space, within which simplified

    performance models can be used for optimization. We demonstrate this methodology on the Augmint-MemSim

    simulator, which is a cycle accurate model for multi-processor systems with a distributed memory sub-system. We

    empirically show that: (i) Performance predictions from simplified models are strongly correlated with the simulator in

    the high sensitivity regions. (ii) The simplified models speed up design space exploration by 2-3 orders of magnitude

    over the simulator resulting in better design solutions.

    26 Architectural Comparison of Analog and Digital Duty Cycle Corrector for High Speed I/O Link

    To achieve high speed data signaling rates with the internal fast clock operating at half its speed, the XDR (extreme

    data rate) I/O link employs dual-edge signaling, wherein data bits are transmitted on both edges (rise/fall) of the

    transmit clock. A duty cycle correction technique is used to provide high frequency, low jitter clocks that have a 50% duty

    cycle. This paper compares two different techniques to implement a duty cycle corrector (DCC). These techniques are

    implemented in high speed I/O operating at data rates of 4Gbps and 6.4Gbps in TSMC 65nm and TSMC 40nm technology,

    achieving an output duty cycle error below 2% for a 10% input duty cycle error.
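
    Why duty cycle matters for dual-edge signaling can be shown with a short calculation. The sketch assumes that one bit is sent per clock edge, so the clock runs at half the data rate and the two bit intervals in each period equal the high and low phases of the clock. It also assumes that the quoted errors are absolute deviations from 50% (for example 52% or 60%), which the abstract does not state explicitly.

```python
def clock_freq_ghz(data_rate_gbps):
    """Clock frequency for dual-edge signaling: one bit per clock edge."""
    return data_rate_gbps / 2.0


def bit_times_ps(data_rate_gbps, duty_cycle):
    """Widths (ps) of the two bit intervals within one clock period.

    `duty_cycle` is the fraction of the period the clock is high; the
    high phase carries one bit and the low phase carries the next.
    """
    period_ps = 1000.0 / clock_freq_ghz(data_rate_gbps)
    return duty_cycle * period_ps, (1.0 - duty_cycle) * period_ps


for duty in (0.50, 0.52, 0.60):
    high, low = bit_times_ps(6.4, duty)
    print(f"duty {duty:.2f}: {high:.2f} ps / {low:.2f} ps")
```

    At 6.4Gbps the ideal bit time is 156.25 ps. Under the stated assumption, a 60% duty cycle shrinks one of the two bits to 125 ps, while a corrected 52% duty cycle leaves it at 150 ps.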

    27 Analyzing Energy-Delay Behavior in Room Temperature Single Electron Transistors

    This paper presents Single Electron Transistor (SET) devices operating at room temperature as an attractive option to

    implement low energy consumption circuits with low-to-moderate performance requirements. Currently, such circuits

    are implemented using CMOS technologies operating at low supply voltages. CMOS is usually leakage dominated at

    such a low voltage regime and various optimizations are necessary to design low energy circuits. By discussing the

    energy-delay trade-offs for SET devices and comparing them to those of contemporary CMOS technology, we present

    an argument that SET devices may be more favorable compared to CMOS from the energy and delay standpoints at

    low supply voltages.

    28 Analysis, Design and Simulation of Capacitive Load Balanced Rotary Oscillatory Array

    The high frequency of the rotary clocking technology is often susceptible to implementation parameters such as the

    variation in the total capacitive load distribution between the rings. SPICE simulations performed on the rotary rings

    with unbalanced capacitive load distribution show a 30.31% variation in the simulated frequencies across the rings.

    To address this problem, two novel methodologies, called OCLB and SOCLB, are formulated for the optimal capacitive

    load balancing and suboptimal capacitive load balancing with minimized wire length, respectively. SPICE simulations

    performed with OCLB show 0.30% variation in the simulated frequencies across the rings. Further, SOCLB results in

    an average wire length improvement of 69.24% over OCLB with a relatively balanced capacitive load distribution.

    SPICE simulations performed with SOCLB show 2.40% variation in the simulated frequencies across the rings,

    improved significantly over the 30.31% variation of the unbalanced case.

    29 An L-band Fractional-N Synthesizer with noise-less Active Capacitor scaling

    In a charge-pump based type-II analog Phase Locked Loop (PLL), the loop filter often uses a small resistor along with a

    big integrating capacitor for good phase noise performance. This comes at the cost of large silicon area or external

    component. The noise from the resistor contributes to the output phase noise through both feedback and feed-forward

    paths and hence has a presence in the output over a very wide frequency band. In this PLL, the loop filter avoids the

    feed-forward and limits the contribution of the resistor noise over a narrow frequency band. This technique allows a

    large resistor to be used with a small capacitor without phase noise penalty. The achieved independent control of

    bandwidth and stabilizing zero gives better stability and reduces noise peaking. The integrated phase error achieved at

    1.3GHz is -38dBc.

    30 An improvised MOS transistor model suitable for Geometric Program based analog circuit sizing in Sub-micron technology

    This paper presents ways to improve accuracy of performance prediction for geometric program based analog design

    in submicron regime. Geometric program requires a special monomial form of the device model it uses. The major

    sources of inaccuracy in this basic model have been identified and it has been shown that slightly relaxing the strict

    monomial form in order to include second order effects can greatly improve the accuracy. In order to make use of this

    model we deploy it in collaboration with an iterative solution betterment scheme, by solving the sizing problem as a

    sequence of geometric programs instead of a single one. We illustrate the efficacy of our scheme through a folded-

    cascode op-amp sizing example.

    31 An Improved High Resolution CMOS Timing Generator Using Array of Digital Delay Lock Loops

    In this paper, an improved high resolution CMOS timing generator using array of digital delay lock loops is presented.

    The timing generator is implemented as an array of delay locked loops. This architecture enables a timing generator

    with sub gate delay resolution to be implemented. The proposed delay lock loops use a novel start-controlled dual

    phase and frequency detector along with a charge pump where the injected charge approaches zero as the loop

    approaches lock on the leading edge and the trailing edge of an input clock reference. The delay lock loop locks to

    both the leading and trailing clock edges as the start controlled dual phase and frequency detector along with charge

    pump convert the phase difference into voltage, which greatly reduces the timing jitter. In the start controlled dual

    phase and frequency detector, the start-controlled circuit is used to provide a precise output without the locking

    problem. The results show that the total delay time between the input and the output of the DLL (Delay Lock Loop) is

    one clock cycle and all of the delay cells provide precise output without false locking or harmonic locking. Test results show a timing jitter of less than 5 ps for the DLL circuit and very low phase sensitivity errors. The timing generator

    implemented as an array of delay locked loops exponentially reduces the locking time and avoids false locking

    or harmonic locking. An experimental prototype was simulated using 0.35 µm technology with a supply voltage of 3.3V.

    32 An Efficient Method for Bottom-Up Extraction of Analog Behavioral Model Parameters

    This paper presents a fast, accurate and robust method for bottom-up extraction of analog behavioral model

    parameters from the corresponding transistor level netlists. The proposed Verilog-A in-loop simulation based

    modeling approach is generic and can estimate the parameters of the corresponding model of any given circuit using

    relevant test-benches, thus removing the need to implement structure based estimation tools for each circuit. The

    models are usually non-linear with respect to the parameters and often the optimization problem becomes nonconvex.

    A hybrid method based on co-operation and switching between search and gradient methods is proposed for

    achieving significantly faster convergence to the global minimum even in the presence of local minima in such non-convex

    cases. This method is applied by the authors to a wide variety of analog circuits, and is demonstrated in the paper

    using two distinctly different analog circuits. Simulation results comparing the model and transistor level netlist show

    that a high level of accuracy can be achieved. The comparison of the search, gradient and the proposed hybrid method

    is presented.

    33 An Efficient Design of a Reversible Barrel Shifter

    The key objective of today's circuit design is to increase the performance without the proportional increase in power

    consumption. In this regard, reversible logic has become an immensely promising technology in the field of low power

    computing and designing. On the other hand, data shifting and rotating are required in many operations such as

    arithmetic and logical operations, address decoding and indexing etc. Consequently, barrel shifters, which can

    shift and rotate multiple bits in a single cycle, have become a common design choice for high speed applications. For

    this reason, this paper presents an efficient design of a reversible barrel shifter. It has also been shown that the new

    circuit outperforms the previously proposed one in terms of number of gates, number of garbage outputs, delay and

    quantum cost.
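
As an illustration of why a barrel shifter fits reversible logic, the following minimal Python sketch models a logarithmic barrel rotator. It is not the gate-level design from the paper: the function names and the 4-bit width are assumptions made for this example. Each stage rotates by a power of two under control of one shift bit, and the shift bits are passed through unchanged, so the overall input-to-output mapping is a bijection.

```python
from itertools import product


def rotate_stage(bits, control, distance):
    """Rotate `bits` left by `distance` positions when `control` is 1."""
    if not control:
        return list(bits)
    n = len(bits)
    return [bits[(i + distance) % n] for i in range(n)]


def barrel_rotate_left(bits, shift_bits):
    """Logarithmic barrel rotator: stage k rotates by 2**k if shift bit k is set.

    `shift_bits` is little-endian (shift_bits[0] is the least significant bit).
    The shift bits are returned unchanged alongside the data, so no
    information is discarded.
    """
    data = list(bits)
    for k, control in enumerate(shift_bits):
        data = rotate_stage(data, control, 1 << k)
    return data, list(shift_bits)


def is_reversible(width=4, shift_width=2):
    """Check that every (data, shift) input maps to a distinct output."""
    seen = set()
    for data in product((0, 1), repeat=width):
        for shift in product((0, 1), repeat=shift_width):
            out_data, out_shift = barrel_rotate_left(data, shift)
            seen.add((tuple(out_data), tuple(out_shift)))
    return len(seen) == 2 ** (width + shift_width)


if __name__ == "__main__":
    print(barrel_rotate_left([1, 1, 0, 0], [1, 1]))  # rotate left by 3
    print(is_reversible())
```

A real reversible implementation would realize each stage with reversible gates and would be judged by the metrics the abstract lists (gate count, garbage outputs, delay and quantum cost); this sketch only checks the functional behavior.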

    34 Accelerating Synchronous Sequential Circuits using an Adaptive Clock

    In this paper we propose a scheme for enhancing the timing performance of a pre-designed synchronous sequential

    circuit. In the proposed scheme, a circuit is driven by two clocks. One of them is the conventional clock while the other

    one, having a shorter period, is applied when the circuit stabilizes well before the critical delay. We use a symbolic

    algorithm to analyze the timing behavior of the synchronous sequential circuit and provide add-on circuitry to select

    the appropriate clock based on the current state of the circuit. We demonstrate an appreciable gain (67% on average) in

    timing performance on several benchmark circuits.

    35 A Unified Solution to Scan Test Volume, Time, and Power Minimization

    The double-tree scan-path architecture, originally proposed for low test power, is adapted to simultaneously reduce

    the test application time and test data volume under external testing. Experimental results show significant

    performance improvements over other existing scan architectures.

    36 A Unified Approach for IP Protection across Design Phases in a Packaged Chip

    IP values contributed by the distinct design tools in specific design phases are recognized by observing the signature

    of the owner of each tool as functional or scan mode output of the fabricated chip, for certain input vector secret to the

    owner. An existing approach inserts watermark through reordering of single scan chain, and solely identifies the

    owner of the logic design tool. Here we propose a novel scheme to watermark the recent reconfigurable scan

    architectures, operating in both scan tree and single scan mode. The signature of the owner of physical design tool

    along with that of logic design tool can separately be embedded while designing the scan tree and also verified from

    the packaged chip without conflict using two distinct modes. A bi-objective minimization of overhead in routing and

    power is supported through our scheme. Experimental results on design overhead and robustness for ISCAS89

    benchmarks are encouraging.

    37 A Reconfigurable Architecture for Secure Multimedia Delivery

    This paper introduces a reconfigurable architecture for ensuring secure and real-time video delivery through a novel

    parameterized construction of the Discrete Wavelet Transform (DWT). This parameterized construction promises

    multimedia encryption and is also well-suited to a hardware implementation due to our derivation of rational filter

    coefficients. We achieve an efficient and high-throughput reconfigurable hardware implementation through the use of

    LUT-based constant multipliers enabling run-time reconfiguration of encryption key. We also compare our prototype

    (using a Xilinx Virtex 4 FPGA) to several existing implementations in the research literature and show that we achieve

    superior performance as compared to both traditional CPU-based and custom VLSI approaches while adding features

    for secure multimedia delivery.

    38 A P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) Aware Dual-VTh Nano-CMOS VCO

    We present the design flow for a P4VT (Power-Performance-Process-Parasitic-Voltage-Temperature) aware voltage

    controlled oscillator (VCO). Through simulations, we have shown that parasitic, process, voltage and temperature have

    a drastic effect on the performance (center frequency) of the VCO. A design optimization of the VCO, along with dual-

    threshold power minimization has been performed in the presence of worst-case variations. The end product of the

    proposed methodology is a P4VT-optimal dual-threshold 90nm VCO layout. We have achieved 16.4% power (including

    leakage) minimization with 10% degradation in center frequency compared to the target frequency, in the presence of

    worst-case variations

    39 A Novel Circuit to Optimize Access Time and Decoding Schemes in Memories

    As the microprocessor speed increases from 500MHz to 1GHz and beyond, SOC designers are forced to innovate new

    schemes in their use of cache memory for high speed access. In this paper, clock to word line path delay is optimized

    using a novel circuit design technique. Using this novel circuit, clock to word line path delay is optimized by 2.5 times

    at worst case corner. For a typical memory instance (frequently used in cache memories) whose access time is of the

    order of 800ps and where read and write operation occurs in the same clock cycle, overall access time is improved by

    18% at worst case corner. For this case, write margin is improved by 2.26 times at worst case corner for write

    operation. A decoding scheme is also discussed in this paper which describes how to choose the best pre-decoding

    and post-decoding schemes based on minimum pre-decoded lines, minimum stack size in post decoder and maximum

    granularity of xdecoders

    40 A Non Quasi-Static Small Signal Model for Long Channel Symmetric DG MOSFET

    We propose a compact model for small signal non quasi static analysis of long channel symmetric double gate

    MOSFET. The model is based on the EKV formalism and is valid in all regions of operation and thus suitable for RF

    circuit design. Proposed model is verified with professional numerical device simulator and excellent agreement is

    found well beyond the cut-off frequency.

    41 A new Hetero-material Stepped Gate (HSG) SOI LDMOS for RF Power Amplifier Applications

    In this paper, we propose a new hetero-material stepped gate (HSG) SOI LDMOS in which the gate is divided into three

    sections - an n+ gate sandwiched between two p+ gates and the gate oxide thickness increases from source to drain.

    This new device structure improves the inversion layer charge density in the channel, results in uniform electric field

    distribution in the drift region and reduces the gate to drain capacitance. Using two-dimensional simulation, the HSG

    LDMOS is designed and compared with the conventional LDMOS. We demonstrate that the proposed device exhibits

    28% improvement in breakdown voltage, 32% reduction in on-resistance, 13% improvement in transconductance, 9%

    reduction in gate to drain charge and 38% reduction in switching delay. HSG LDMOS may be effectively deployed in RF

    power amplifier applications

    42 A Methodology for Power Aware High-Level Synthesis of Co-Processors from Software Algorithms

    Hardware co-processors are used for accelerating specific compute-intensive tasks dedicated to video/audio codec,

    encryption/decryption, etc. Since many of these data-processing tasks already have efficient software algorithms, one

    could reuse those to synthesize the co-processor IPs. However, such software algorithms are usually sequential and

    written in C/C++. High-level Synthesis (HLS) helps in converting software implementation to register transfer level

    (RTL) hardware design. Such co-processor based systems show enhanced performance but often have greater

    power/energy consumption. Therefore, the automated synthesis of such accelerator IPs must be power-aware.

    Downstream power-saving features such as clock-gating are unknown during HLS. The designer is forced to take such

    power-aware decisions only after the logic synthesis stage, causing an increase in design time and effort. In this paper, we

    present a design automation solution to facilitate various granularities of clock-gating at high-level C description of the

    design.

    43 A Hierarchical Methodology for Word-Length Optimization of Signal Processing Systems

    The problem of converting floating point algorithms to implementation friendly fixed point formats is often solved as

    an optimization problem where the precision is traded to gain in the implementation cost. The complexity of the

    problem is known to grow exponentially with more optimizable variables. This paper proposes a divide and conquer

    technique to solve the growing size of the problem. The approach in this technique is original in the sense that it is

    formulated from a designer's perspective rather than merely attempting to divide and conquer at the algorithmic level. This paper introduces the single noise source model based on which the proposed technique is built. A mixed

    approach for error propagation is also explained, keeping in view the elements in the circuit that cannot be handled

    analytically.
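
The precision-versus-cost trade-off mentioned above can be made concrete with the standard uniform quantization noise model. The sketch below is not the paper's single noise source model or its hierarchical method: it only assumes rounding to `frac_bits` fractional bits, which gives a step of 2^-frac_bits and a noise power of step²/12. The function names and the noise budget are hypothetical.

```python
def quantization_noise_power(frac_bits):
    """Noise power of rounding to `frac_bits` fractional bits (step**2 / 12)."""
    step = 2.0 ** (-frac_bits)
    return step * step / 12.0


def min_fraction_bits(noise_budget, max_bits=64):
    """Smallest fractional word length whose rounding noise fits the budget."""
    for frac_bits in range(max_bits + 1):
        if quantization_noise_power(frac_bits) <= noise_budget:
            return frac_bits
    raise ValueError("noise budget cannot be met within max_bits")


if __name__ == "__main__":
    # Hypothetical budget of 1e-6 for a single quantizer.
    print(min_fraction_bits(1e-6))
```

With many quantizers in a design, the search space over all word lengths grows exponentially, which is the motivation the abstract gives for a divide and conquer approach.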

    44 A Hardware Scheduler for Real Time Multiprocessor System on Chip

    This paper presents the design and implementation of a low power Hardware scheduler for multiprocessor system-on-

    chips. The Pfair scheduling algorithm is considered with three different implementation schemes: replicated software

    scheduler running on each processor, single software scheduler running on a dedicated processor and the proposed

    hardware scheduler. Experimental evaluation with benchmarks shows that the hardware scheduler outperforms the other two schemes in terms of energy consumption by an order of magnitude of 10^5 and scheduling delay by an order

    of magnitude of 10^3.
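
For readers unfamiliar with Pfair scheduling, the sketch below computes the standard Pfair subtask windows for a periodic task. It is a software illustration only and says nothing about how the proposed hardware scheduler is built; the function name and the example task are assumptions. A task with weight w = execution/period is split into unit-length subtasks, and subtask i (1-based) must run in a slot t with floor((i-1)/w) <= t < ceil(i/w).

```python
from fractions import Fraction
from math import ceil, floor


def pfair_windows(execution, period):
    """Return (release, deadline) slot pairs for each subtask in one period.

    Subtask i of a task with weight w = execution/period may be scheduled
    in any slot t with release <= t < deadline.
    """
    weight = Fraction(execution, period)
    return [
        (floor((i - 1) / weight), ceil(i / weight))
        for i in range(1, execution + 1)
    ]


if __name__ == "__main__":
    # Hypothetical task needing 3 slots of execution every 8 slots.
    for index, (release, deadline) in enumerate(pfair_windows(3, 8), start=1):
        print(f"subtask {index}: slots {release}..{deadline - 1}")
```

A scheduler must compare these windows across all tasks in every time slot, which is the per-slot work whose cost depends on whether it runs in software or in dedicated hardware.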

    45 A Graph-based I/O Pad Pre-placement Technique for use with Analytic FPGA Placement Methods

    Typical analytic placement methods seek to minimize total squared wire length by solving a linear equation system.

    However, to avoid trivial solutions, certain blocks must be assigned locations on the Field Programmable Gate Array

    (FPGA) fabric prior to optimization. A simple way to achieve this is to assign blocks randomly. However, this does not

    always result in the best solution. In this paper, we present a novel algorithm, called Shrub Place, for pre-assigning I/O

    blocks to I/O pads around the perimeter of the FPGA. To verify the efficacy of our pre-placement algorithm, we

    integrated the algorithm into the analytic placer in [1, 2]. When tested with the 20 MCNC benchmarks [11], our results

    show a reduction in wire length is possible, with very little additional execution time required to perform the pre-placement.
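
The need for pre-placed blocks can be seen in a one-dimensional toy version of quadratic placement. The sketch below is not the Shrub Place algorithm or the analytic placer the authors used: it assumes two-pin nets, a single coordinate axis, and Gauss-Seidel sweeps as the linear solver, and all names are hypothetical. Setting the gradient of the total squared wire length to zero yields one linear equation per movable block, namely that its coordinate equals the mean of its neighbours' coordinates.

```python
def quadratic_place(num_blocks, nets, fixed, iterations=500):
    """Minimize the sum of squared two-pin net lengths in one dimension.

    nets  : list of (a, b) block-index pairs
    fixed : dict mapping pre-placed block index -> coordinate
    Movable blocks start at 0.0 and are repeatedly moved to the mean of
    their neighbours (Gauss-Seidel sweeps over the linear system).
    """
    neighbours = {i: [] for i in range(num_blocks)}
    for a, b in nets:
        neighbours[a].append(b)
        neighbours[b].append(a)
    x = [fixed.get(i, 0.0) for i in range(num_blocks)]
    for _ in range(iterations):
        for i in range(num_blocks):
            if i not in fixed and neighbours[i]:
                x[i] = sum(x[j] for j in neighbours[i]) / len(neighbours[i])
    return x


if __name__ == "__main__":
    chain = [(0, 1), (1, 2), (2, 3)]
    # Blocks 0 and 3 act as I/O pads fixed at the two ends of the row.
    print(quadratic_place(4, chain, {0: 0.0, 3: 9.0}))
    # With nothing fixed, every block stays at one point: the trivial solution.
    print(quadratic_place(4, chain, {}))
```

With the two pads fixed, the movable blocks spread evenly between them; with no fixed blocks, zero wire length is achieved by stacking everything at one point, which is why the I/O assignment has to come first.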

    46 A Combined DOE-ILP Based Power and Read Stability Optimization in Nano-CMOS SRAM

    A novel design approach for simultaneous power and stability (static noise margin, SNM) optimization of nano-CMOS

    static random access memory (SRAM) is presented. A 45nm single-ended seven transistor SRAM is used as a case

    study. The SRAM is subjected to a dual-VTh assignment using a novel combined Design of Experiments and Integer

    Linear Programming (DOE-ILP) algorithm, resulting in 50.6% power reduction (including leakage) and 43.9% increase

    in the read SNM. The process variation analysis of the optimal SRAM carried out considering twelve device parameters

    shows the robustness of the design.

    47 23.97GHz CMOS Distributed Voltage Controlled Oscillators with Inverter Gain Cells and Frequency Tuning by Body Bias and MOS Varactors Concurrently

    Tunable VCOs operating around 24GHz in 0.18 µm CMOS are reported. Simple CMOS inverters are used as gain stages

    and tuning is achieved with a novel method using both body-bias as well as MOS varactors concurrently and

    compared for performance. The novel tuning method allows for a wider tuning range than using a single method.

    Here forward body bias (FBB) type tuning of p-FETs has 9-10 times higher tuning bandwidth as compared to MOS

    varactor tuning when the latter is connected in series (before output collection point) but equal or nearly equal tuning

    when the varactor pair is connected in parallel (to drain transmission line). Six monolithically integrated novel

    distributed voltage controlled oscillators (D-VCOs) with a novel gain cell comprising a CMOS inverter are designed.

    Top layer metal is used for coplanar waveguide (CPW) for on-chip inductors. First D-VCO OSC-1 has 3-stages of the

    gain cell and oscillating at 23.97GHz, the second D-VCO OSC-2 has 4-stages of gain cell and oscillating at 18.64GHz,

    both K-band oscillators use body bias variation of p-FETs for wide frequency tuning. For further tuning after body bias

    type of tuning, MOS Varactors are added in series to OSC-1 and OSC-2 resulting in designs respectively OSC-3 and

    OSC-4, while in parallel resulting in designs respectively OSC-3a and OSC-4a. OSC-3 is oscillating at 23.53GHz and

    OSC-4 is oscillating at 18.09GHz. OSC-3a is oscillating at 22.79GHz with 340MHz tuning by each of these two tuning

    techniques (doubling of tuning bandwidth as total tuning is 680MHz). OSC-4a is oscillating at 17.77GHz (resulting Ku-

    band VCO from K-band for substantial design reuse) with 240MHz tuning by FBB and 200MHz tuning by Varactor pair

    (total tuning of 440MHz). The phase noise is reported at 1MHz offset from the carrier, for example it is -102.4dBc/Hz for

    18.64GHz D-VCO. These oscillators are emitting very low power in 2nd and 3rd harmonics.

    48 A 90mW/GFlop 3.4GHz Configurable Fused/Continuous Multiply-Accumulator for Floating-point and Integer

    Operands in 65nm

    This paper describes energy efficient and reconfigurable fused/continuous Multiply-Accumulator (MAC) architecture

    for single-precision Floating-point and 16-bit signed integer operands. This eight-stage pipelined and single-cycle

    throughput MAC design contains a bit level pipelined multiplier, followed by fast sparse-tree adder and single cycle

    accumulator loop with delayed normalization logic. Operation driven energy control is achieved using dynamic clock

    and fine grained power gating techniques. Power gating is employed in 98% of design to save 79% of leakage power in

    idle mode, at 1.2V supply and 110 °C. The use of fully shared logic in the multiplier, accumulator and normalization

    blocks for different operations enables a compact design of 0.54 mm² containing 117K transistors in eight-metal 65nm

    CMOS technology. The 15-FO4 design provides 6.8GFLOPS of performance with total energy efficiency of

    90mW/GFLOP at 1.2V and 3.4GHz operation.
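
The headline figures above can be cross-checked with simple arithmetic. The short sketch below uses only the numbers quoted in the abstract; the helper names are hypothetical, and reading two operations per cycle as one multiply plus one accumulate is an interpretation, not something the abstract states.

```python
def flops_per_cycle(gflops, clock_ghz):
    """Floating-point operations completed per clock cycle."""
    return gflops / clock_ghz


def total_power_mw(mw_per_gflop, gflops):
    """Total power implied by an energy-efficiency figure in mW/GFLOP."""
    return mw_per_gflop * gflops


if __name__ == "__main__":
    ops = flops_per_cycle(6.8, 3.4)
    power = total_power_mw(90.0, 6.8)
    print(f"{ops:.1f} operations per cycle, about {power:.0f} mW in total")
```

Under these assumptions the quoted throughput corresponds to two operations per cycle, consistent with a single-cycle-throughput multiply-accumulate, and the efficiency figure implies a total power of roughly 0.6 W at 3.4GHz.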

    49 A 6 bit 800MHz TIADC based on Successive Approximation in 65nm Standard CMOS Process

    Applications such as ultra-wideband radio and optical communication require sampling rates of at least 500MS/s with low

    resolution. The potential energy savings of a successive approximation based time interleaved A-D conversion

    architecture make it preferable to the traditional flash architecture. This paper presents a 6-bit 800 MS/s ADC in 65 nm

    STMicroelectronics standard CMOS process. The ADC uses 8-channel time interleaved SAR topology and achieves 36

    dB SNR and 43 dB SFDR with 13.5 mW power consumption from 1.2 V supply. The resulting FOM is 0.3251 pJ/step. The

    timing mismatch among the channels is reduced by clock-edge reassignment technique. The high speed specification

    of the system requires the design of low offset comparator. Power consumption and jitter are reduced by using shift

    register based phase generator.
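
The quoted figure of merit can be approximately reproduced from the other numbers in the abstract. The sketch below assumes the common Walden definition, FOM = P / (2^ENOB × f_s), and estimates the effective number of bits from the 36 dB SNR using ENOB = (SNR − 1.76) / 6.02. Both choices are assumptions, since the abstract does not say how its FOM was computed, and the function names are hypothetical.

```python
def enob_from_snr(snr_db):
    """Effective number of bits for an ideal-quantizer-equivalent SNR."""
    return (snr_db - 1.76) / 6.02


def walden_fom_pj_per_step(power_w, sample_rate_hz, enob_bits):
    """Walden figure of merit in picojoules per conversion step."""
    return power_w / (2.0 ** enob_bits * sample_rate_hz) * 1e12


if __name__ == "__main__":
    enob = enob_from_snr(36.0)
    fom = walden_fom_pj_per_step(13.5e-3, 800e6, enob)
    per_channel_rate = 800e6 / 8  # eight interleaved SAR channels
    print(f"ENOB ~ {enob:.2f} bits, FOM ~ {fom:.3f} pJ/step")
    print(f"each channel samples at {per_channel_rate / 1e6:.0f} MS/s")
```

Under these assumptions the result lands close to the 0.3251 pJ/step reported; the small difference would be explained by a slightly different effective resolution in the authors' calculation. The per-channel rate also shows why interleaving helps: each SAR channel only needs to run at one eighth of the aggregate sampling rate.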

    50 4 GHz 130nm Low Voltage PLL Based on Self Biased Technique

    This paper explores a PLL core design that can satisfy a wide range of high frequency serial data communication

    applications. There exist several high frequency serial data communication protocols that co-exist today. The PLL

    design requirements for all these clock frequencies separately call for enormous design effort in terms of time and

    cost. It is desired to design a PLL core which makes it possible to address a wide segment of clock frequency

    requirement. The PLL achieves this using a single 1.2V supply; it does not use any special mask layers and also does not

    need a bandgap reference for its operation. This PLL is based on self-biased technique and achieves high process

    technology independence, fixed damping factor, fixed bandwidth to operating frequency range and input phase offset

    cancellation. Here the self biased PLL in 130nm CMOS technology achieves the frequency range of 400 MHz to 4GHz.

    The PLL core is designed to accept a wide range of input reference frequencies

    51 Voltage-Frequency Planning for Thermal-Aware, Low-Power Design of Regular 3-D NoCs

    Network-on-Chip combined with Globally Asynchronous Locally Synchronous paradigm is a promising architecture for

    easy IP integration and utilization with multiple voltage levels. For power reduction, multiple voltage-frequency levels

    are successfully applied to 2-D NoCs, but never with a generic approach to their 3-D counterparts, in which the low heat

    conductivity of insulator layers concentrates heat in the layers away from the heat sink. In this paper,

    a thermal-aware methodology for regular 3-D NoCs based on multiple voltage levels is proposed. Given an application

    task graph, this methodology determines an efficient mapping of tasks onto network tiles, considering inherent

    computation and communication requirements of the tasks and thermal resistance from any silicon layer to the

    ambient. Then, a heuristic approach is utilized to determine voltage and frequency specifications of all IP cores, such

    that total power is reduced, dissipated heat is properly conducted to the layers close to the heat sink, and application

    requirements (in terms of deadline) are satisfied. The experiments confirm a significant saving in total power while

    performance of the running application is guaranteed.

    52 Transition Inversion based Low Power data coding scheme for Buffered Data Transfer

    In this work the authors propose a data coding protocol that leads to power reduction for block data transfer in off-chip

    buses. I/O pads driving off-chip buses contribute to a major portion of power dissipation in chips. Also, block data

    transfer is preferred in most systems like caches, DMA etc. In this proposed work, the prior knowledge of the block of

    data to be transmitted, when it is stored in the buffer, is exploited in a serial fashion to reduce transitions on every bus

    line. Statistical analysis shows up to 31.9% reduction in transitions. Benchmark results show that it leads to 29%

    reduction in power consumption. The technique provides added error detection on the lines of parity bit technique,

    with similar average error detection capability
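
The abstract does not spell out the coding rule, so the following sketch is only a stand-in that shows how advance knowledge of a buffered block can be used to cut transitions on one bus line. It is not the authors' protocol. The assumed rule: if a line's bit sequence toggles on more than half of its steps, XOR it with an alternating 0101... pattern, which turns every toggle into a hold and every hold into a toggle, and send a one-bit flag so the receiver can undo it. All names are hypothetical, and the cost of the flag bit and the boundary between blocks are ignored.

```python
def count_transitions(seq):
    """Number of adjacent positions where the bit value changes."""
    return sum(a != b for a, b in zip(seq, seq[1:]))


def alternate(seq):
    """XOR the sequence with the alternating pattern 0, 1, 0, 1, ..."""
    return [bit ^ (i & 1) for i, bit in enumerate(seq)]


def encode_line(seq):
    """Return (flag, coded) for one bus line's bit sequence over a block."""
    if 2 * count_transitions(seq) > len(seq) - 1:
        return 1, alternate(seq)
    return 0, list(seq)


def decode_line(flag, coded):
    """Invert encode_line; applying the XOR pattern twice restores the data."""
    return alternate(coded) if flag else list(coded)


if __name__ == "__main__":
    line = [0, 1, 0, 1, 1, 0]
    flag, coded = encode_line(line)
    print(count_transitions(line), "->", count_transitions(coded))
    print(decode_line(flag, coded) == line)
```

Because the whole block is already in the buffer, the encoder can count transitions before sending anything, which is the kind of prior knowledge the abstract refers to.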

    53 Towards Active-Passive Co-Synthesis of Multi-Gigahertz Radio Frequency Circuits

    This paper proposes a methodology and framework for rapid active-passive co-synthesis of radio frequency circuits.

    The presented approach leverages advances in accelerated three-dimensional electromagnetic simulation technology

    to construct Maxwell-accurate parametric macro models of on-chip passives, in particular spiral inductors. These

    macromodels can be used in the context of SPICE level synthesis, thereby enabling concurrent sizing of passive and

    active components of radio frequency circuits. Moreover, macromodels obviate the need for topology exploration and

    parametric RLC model generation via optimization for on-chip passives. The co-synthesis framework is enabled by

    nonlinear, hyper-dimensional regression for macromodel generation and a simulated annealing based optimization

    scheme. As examples, and to demonstrate the efficacy of the proposed approach, two standard low-noise amplifier

    topologies are synthesized with tight performance constraints by co-optimization of circuit parameters and inductor

    geometries.

    54 The dawn of 22nm era: Design and CAD challenges

    Technology scaling clearly has been the driver of semiconductor and thereby EDA industry. In the semiconductor

    industry today, 45nm CMOS designs are in full production and 32nm design rules and infrastructure are already in

    place for designs starting later this year. It will not be long before the beat of 22nm will be upon us. Due to ever

    increasing cost of doing design, design productivity and more specifically, cost of design has become a major

    bottleneck in large scale design projects. Due to this cost crunch, automated synthesis techniques have been

    becoming increasingly important and this is bound to become a major trend going into 22nm for high performance

    SoCs. In addition, in 22nm and beyond, 3D IC technology has the potential of easing the system performance

    challenge problem. In order to exploit the full potential of 3D technology, new challenges in the area of physical design, thermal analysis, system level design and analysis need to be addressed. 3D interconnects have the potential

    of reducing critical path delays significantly, which are typically between memory and the interfacing logic. In

    addition, now that the physical limits are beginning to impact scaling, the question is: how can we cost effectively

    design with complicated technology requirements presented by 22nm node and how the design automation

    community can help to achieve this goal? What are the challenges at 22nm and what would design look like going into

    22nm and beyond? In this paper, we will focus on the major design and CAD challenges associated with 22nm and

    beyond.

    55 Test Pattern Generation and Compaction for Crosstalk Induced Glitches and Delay Faults

    VLSI circuits have become more susceptible to signal integrity related failures with the ever decreasing process

    geometries. Detection of crosstalk induced faults is thus important as capacitive crosstalk is one of the major sources

    of signal integrity related failures. Crosstalk glitch can result in erroneous output if the glitch effect propagates to a

    primary output or to an intermediate flip-flop. Similarly the crosstalk induced delay effects can also result in latching of

    an incorrect value if the delay exceeds the allowed margins. In this work a test generation and compaction method is

    proposed for crosstalk faults. Test patterns are generated by simultaneously considering the coupling capacitance,

    timing and functional incompatibilities between the victim and aggressor nets, to produce the practical maximum

    crosstalk noise. A unique method is proposed for finding the functional incompatibilities between interconnects. The

    generated test set is then compacted initially through pattern merging and then further through the fault-chaining

    algorithm. Three different implementations of this algorithm are compared on crosstalk test sets generated for

    ISCAS85 benchmark circuits. Results show considerable reduction in crosstalk pessimism for the given layout and

    timing, as well as up to 75% reduction in overall test set size.
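
The pattern-merging step mentioned above can be illustrated with a minimal Python sketch. It assumes test patterns are strings over `0`, `1` and `X` (don't-care), and that two patterns may be merged when no bit position holds opposite specified values. The function names and the greedy first-fit order are illustrative assumptions, not the authors' algorithm, and the fault-chaining stage is not reproduced here.

```python
def compatible(a, b):
    """True if no bit position holds opposite specified values (X = don't-care)."""
    return all(x == y or x == "X" or y == "X" for x, y in zip(a, b))


def merge(a, b):
    """Merge two compatible patterns, keeping every specified bit."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def compact(patterns):
    """Greedy first-fit merging: fold each pattern into the first compatible one."""
    merged = []
    for p in patterns:
        for i, m in enumerate(merged):
            if compatible(m, p):
                merged[i] = merge(m, p)
                break
        else:
            # No compatible pattern found, so keep p as a new entry.
            merged.append(p)
    return merged


if __name__ == "__main__":
    tests = ["1X0X", "X10X", "0XX1", "1X01"]
    print(compact(tests))  # ['1101', '0XX1']
```

Greedy merging is order-dependent, so a different pattern order can give a different (and possibly smaller) compacted set.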

    56 Synthesizability of 3 party Formal Specifications-Does my controller see enough?

    This paper presents the problem of bounded synthesizability of formal specifications in the context of three party

    systems, consisting of a machine, its environment and a controller. The overall objective is to determine whether it is

    possible to synthesize both the machine and its controller for a given Linear Temporal Logic (LTL) specification over

    the signals in the machine and the controller interfaces.

    57 Synchronization of Concurrently-Implemented Fluidic Operations in Pin-Constrained Digital Microfluidic

    Biochips

    The implementation of bioassays in pin-constrained biochips may involve pin-actuation conflicts if the concurrently

    implemented fluidic operations are not carefully synchronized. We propose a two-phase optimization method to

    identify and synchronize the fluidic operations that can be executed in parallel. The goal is to implement these fluidic

    operations without pin-actuation conflicts and to minimize the duration of implementing the outcome sequence after

    synchronization. The effectiveness of the proposed two-phase optimization method is demonstrated for a

    representative 3-plex assay performed on a fabricated pin-constrained biochip.

    58 Robust System Design

    Robust system design ensures that future systems continue to meet user expectations despite rising levels of

    underlying disturbances. This paper discusses two essential aspects of robust system design: 1. Effective post-silicon

    validation, despite staggering complexity of future systems, using a new technique called Instruction Footprint

    Recording and Analysis (IFRA). 2. Cost-effective design of systems that overcome CMOS reliability challenges through

    built-in tolerance to errors in hardware during system operation. A combination of Built-In Soft Error Resilience

    (BISER) and circuit failure prediction, together with on-line self-test/diagnostics and software-orchestrated

    optimization across multiple abstraction layers, enables the design of cost-effective resilient systems.

    59 RF SOI Switch FET Design and Modeling Tradeoffs for GSM Applications

    A novel single-pole double-throw switch device in a 0.18 µm SOI complementary metal-oxide-semiconductor (CMOS)

    process is developed for 0.9 GHz wireless GSM systems. The layout of the device is optimized keeping in mind the

    parameters of interest for the RF switch. A subcircuit model, with the standard surface-potential (PSP) model as the

    intrinsic FET model along with the parasitic elements, is built to predict the Ron and Coff of the switch. The measured

    data agree well with the model. The eight-FET stacked switch achieved an Ron of 2.5 ohms and a Coff of 180 fF.

    60 Rethinking Threshold Voltage Assignment in 3D Multicore Designs

    Due to the inherent nature of heat flow in 3D integrated circuits, stacked dies exhibit a wide range of thermal

    characteristics. The strong dependence of leakage on temperature and process variation plays havoc in achieving system

    level energy efficiency in such systems, complicating the task of power provisioning in 3D multicores. In this paper, we

    address this power provisioning challenge in 3D ICs by advocating a novel microprocessor design paradigm, where

    the circuit designers are aware of the intended placement of a die in a 3D stack. We present a concrete application

    of this paradigm through a threshold voltage (Vt) assignment algorithm for a 3D multicore system, where we

    specifically account for: (a) the change in the role of leakage power, (b) expected operating frequency, and (c)

    dependency of PV induced leakage variation and Vt levels. Detailed simulation based experiments with our proposed

    algorithm show 215% improvement in energy efficiency for a typical multicore system organized as 3D stacked dies.

    61 Processor Architecture Design Using 3D Integration Technology

    The emerging three-dimensional (3D) chip architectures, with their intrinsic capability of reducing the wire length, are

    one of the promising solutions to mitigate the interconnect problem in modern microprocessor designs. 3D memory

    stacking also enables much higher memory bandwidth for future chip-multiprocessor design, mitigating the memory

    wall problem. In addition, heterogeneous integration enabled by 3D technology can also result in innovative designs

    for future microprocessors. This paper serves as a survey of various approaches to design future 3D microprocessors,

    leveraging the benefits of low latency, higher bandwidth, and heterogeneous integration capability that are offered by

    3D technology.

    62 Post assembly timing closure for multi million gate chips

    A hierarchical timing closure methodology is presented. It offers the timing closure effectiveness of flat methods with

    the capacity and run-time efficiency of subchip-based methods. The unique proposition is that it performs flat logic

    physical optimization of cross-subchip timing paths while abiding by hierarchy rules. The principle

    and details of the methodology are provided. Experimental results on multi-million-gate designs show its timing

    closure effectiveness with run-time gains of 50% on optimization steps, and peak memory reduction as well.

    63 Pinpointing Cache Timing Attacks on AES

    The paper analyzes cache based timing attacks on optimized codes for Advanced Encryption Standard (AES). The

    work justifies that timing based cache attacks create hits in the first and second rounds of AES, in a manner that the

    timing variations leak information of the key. To the best of our knowledge, the paper justifies for the first time that

    these attacks are unable to force hits in the third round and concludes that a similar third round cache timing attack

    does not work. The paper experimentally verifies that protecting only the first two AES rounds thwarts cache based

    timing attacks.
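
To make the first-round leakage concrete, here is a small Python sketch of the commonly described attack model. It is not taken from the paper and rests on stated assumptions: the optimized AES code uses a 256-entry lookup table with 4-byte entries, cache lines are 64 bytes (so 16 table entries share a line), and the first-round lookup index for byte i is the plaintext byte XOR the key byte. Under those assumptions, observing which cache line was touched reveals the high nibble of the key byte but not the low nibble. All names below are hypothetical.

```python
ENTRIES_PER_LINE = 16  # assumption: 64-byte cache line / 4-byte table entry


def cache_line(plain_byte, key_byte):
    """Cache line touched by the first-round lookup T[p XOR k]."""
    return (plain_byte ^ key_byte) // ENTRIES_PER_LINE


def recover_high_nibble(observations):
    """Key-byte high nibbles consistent with every (plaintext byte, line) pair."""
    candidates = set(range(16))
    for plain_byte, line in observations:
        # (p XOR k) // 16 equals (p >> 4) XOR (k >> 4), so each observation
        # pins down the high nibble of the key byte.
        candidates &= {hi for hi in range(16) if (plain_byte >> 4) ^ hi == line}
    return candidates


if __name__ == "__main__":
    secret = 0xA7  # hypothetical key byte used only to generate observations
    obs = [(p, cache_line(p, secret)) for p in (0x00, 0x3C, 0xF1)]
    print(recover_high_nibble(obs))  # {10}, i.e. high nibble 0xA
```

The low nibble stays hidden in this model because all 16 values map to the same line, which is why attacks of this kind look at later rounds for the remaining key bits.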

    64 Parametric Fault Diagnosis of Nonlinear Analog Circuits using Polynomial Coefficients

    We propose a method for diagnosis of parametric faults in analog circuits using polynomial coefficients of the circuit

    model [15]. As a sequel to our recent work [14], where circuit response is modeled as a polynomial for uncovering

    parametric faults in nonlinear circuits, we propose diagnosis of such faults using sensitivity of coefficients of the

    estimated polynomial to circuit parameters. The proposed method requires no design-for-test hardware as might be

    added to the circuit by some other methods. The proposed method is illustrated for a benchmark elliptic filter. It is

    shown to uncover several parametric faults causing deviations as small as 5% from the nominal values.

    65 Optimized Stage Ratio of Tapered CMOS Inverters for Minimum Power and Mismatch Jitter Product

    In this paper, an optimum stage ratio (tapering factor) for a tapered CMOS inverter chain is derived to minimize the

    product of power dissipation and jitter variance due to device mismatch. Analysis shows that this optimum stage ratio

    (2.4) is lower than those for minimum delay (3.6) and minimum power-delay product (6.35). This analysis is verified by

    simulation results using standard 180nm as well as 90nm CMOS technology. Knowledge of the optimum stage ratio

    helps to design low-power, low-mismatch-jitter buffers for multi-phase clock generation circuits that can drive large

    load capacitances.
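
As a point of reference for the minimum-delay figure, the sketch below uses the standard textbook model of a tapered inverter chain, not the paper's power and jitter analysis. It assumes a chain driving a load ratio F = C_load / C_in with stage ratio f has ln(F) / ln(f) stages (treated as a continuous quantity), each contributing a normalized delay of f + gamma, where gamma is the ratio of an inverter's self-loading to its input capacitance. The function names and the brute-force scan are illustrative choices.

```python
import math


def chain_delay(f, load_ratio, gamma=1.0):
    """Normalized delay of a tapered chain with stage ratio f (textbook model)."""
    stages = math.log(load_ratio) / math.log(f)  # continuous stage count
    return stages * (f + gamma)


def min_delay_stage_ratio(load_ratio=1000.0, gamma=1.0, lo=1.5, hi=10.0, steps=20000):
    """Scan stage ratios in [lo, hi] and return the one with the smallest delay."""
    best_f = lo
    best_delay = chain_delay(lo, load_ratio, gamma)
    for i in range(1, steps + 1):
        f = lo + (hi - lo) * i / steps
        d = chain_delay(f, load_ratio, gamma)
        if d < best_delay:
            best_f, best_delay = f, d
    return best_f


if __name__ == "__main__":
    # With gamma = 1 the optimum is close to 3.6; with gamma = 0 it is close to e.
    print(round(min_delay_stage_ratio(gamma=1.0), 2))
    print(round(min_delay_stage_ratio(gamma=0.0), 2))
```

In this model the optimum does not depend on the load ratio, and with gamma = 1 it lands near the 3.6 quoted for minimum delay. The 2.4 and 6.35 figures come from the paper's own cost functions, which are not reproduced here.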

    66 Optical Lithography Simulation with Focus Variation Using Wavelet Transform

    Printed image on silicon wafer differs from layout due to optical diffraction. Optical proximity correction (OPC) is a

    layout distortion technique to improve printed image. During manufacturing, parameters such as focus, dose and

    resist thickness may vary within tolerance margins. These factors contribute to additional distortion of expected

    printed shape, not addressed directly by OPC. To ensure a robust IC, a process window consideration is extremely

    important while running lithography simulations as we scale the technology even further, where the sensitivity of

    patterns printed on silicon to process variations is very high. Optical lithography simulation has always been an

    important link in the chain for design for manufacturability (DFM) and a lot of research has been put into making it

    faster and more accurate. However, because it is a compute-intensive process, speeding up litho simulation without

    significant compromise in accuracy has always been tricky. In this paper we propose a new method to approximate

    litho simulation based on wavelet transform as opposed to the traditional method employed and we validate the speed

    and accuracy of our simulator by comparing our results with those of a popular commercial lithography simulator

    considering focus variations. While our simulator suffers from an RMS error of < 6%, the major gains are (1) an

    increase in simulation speed of > 20X, (2) the ability to simulate very large circuit masks where the commercial

    software fails, and (3) direct incorporation of manufacturing process variation. This allows litho simulation against

    multiple manufacturing process corners, which in turn helps in producing robust designs.

    67 On-Chip Inductor-less DC-DC Boost Converter with Non-Overlapped Rotational-Interleaving Scheme

    An architecture of inductor-less DC-DC boost converter for high efficiency and low output ripple is proposed. Output

    ripple is reduced by splitting flying capacitors into a number of smaller elements and using a new switching scheme

    called Non-Overlapped Rotational-Interleaving (NORI). The proposed switching scheme also helps to eliminate

    reversion and shoot-through currents and hence improves the power efficiency. The proposed converter is designed in

    a 0.18 µm CMOS thick-gate process with 440pF total flying capacitance. The target specification of load current is 1mA

    - 23mA for 5V - 6.5V output voltage from an input supply of 3.3V. The achieved peak power efficiency is 89% at 10mA

    load current, compared to 83% peak power efficiency obtained from the best existing architecture designed in the same

    technology. The output ripple at 10mA load current is 2.2mV in the presence of only 50pF load capacitance.

    68 On Minimization of Test Application Time for RAS

    Conventional random access scan (RAS) for testing has lower test application time, lower power dissipation, and lower

    test data volume compared to standard serial scan chain based design. In this paper, we present two cluster based

    techniques, namely, Serial Input Random Access Scan and Variable Word Length Random Access Scan to reduce test

    application time even further by exploiting the parallelism among the clusters and performing write operations on

    multiple bits. Experimental results on benchmark circuits show, on average, a 2-3 times speedup in test write time

    and an average 60% reduction in write test data volume.