a technique to remove glitches in physical design stage


A TECHNIQUE TO REMOVE GLITCHES IN PHYSICAL DESIGN STAGE

A Dissertation submitted in partial fulfilment of the requirements for the award of the degree of

MASTER OF SCIENCE

IN

ELECTRONICS & COMMUNICATION ENGINEERING

(VLSI & Embedded Systems Design)

Submitted by

S.S.SARAT CHANDRA

(11011J6033)

Under the esteemed guidance of

Mr P.PARAMESWARA RAO

Assistant Professor

Seer Akademi Pvt. Ltd.

Department of Electronics and Communication Engineering

JNTUH COLLEGE OF ENGINEERING HYDERABAD (Autonomous)

Jawaharlal Nehru Technological University

Hyderabad – 500085

2011 - 2013


DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

JNTUH COLLEGE OF ENGINEERING

HYDERABAD-500085

CERTIFICATE

This is to certify that this dissertation work entitled “A TECHNIQUE TO REMOVE GLITCHES IN PHYSICAL DESIGN STAGE” is a bona fide work carried out by S.S.SARAT CHANDRA bearing Roll No. 11011J6033 in partial fulfilment of the requirement for the award of the MASTER OF SCIENCE degree in ECE with specialization in VLSI AND EMBEDDED SYSTEMS DESIGN from JNTUH during the academic year 2011-13. The results have been verified and found to be satisfactory.

Internal Guide

Mr P. Parameswara Rao, M.Tech

Asst. Professor, Seer Akademi Pvt. Ltd.


DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

JNTUH COLLEGE OF ENGINEERING

HYDERABAD-500085

CERTIFICATE BY THE SUPERVISOR

This is to certify that this dissertation work entitled “A TECHNIQUE TO REMOVE GLITCHES IN PHYSICAL DESIGN STAGE”, being submitted by S.S.SARAT CHANDRA bearing Roll No. 11011J6033 in partial fulfilment of the requirement for the award of the MASTER OF SCIENCE degree in ECE with specialization in VLSI AND EMBEDDED SYSTEMS DESIGN, is a record of bona fide work carried out by him under my supervision.

The results have been verified and found to be satisfactory.

Supervisor

Mr P. Parameswara Rao, M.Tech

Asst. Professor, Seer Akademi


DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

JNTUH COLLEGE OF ENGINEERING

HYDERABAD-500085

CERTIFICATE BY HEAD OF THE DEPARTMENT

This is to certify that this dissertation work entitled “A TECHNIQUE TO REMOVE GLITCHES IN PHYSICAL DESIGN STAGE”, being submitted by S.S.SARAT CHANDRA bearing Roll No. 11011J6033 in partial fulfilment of the requirement for the award of the MASTER OF SCIENCE degree in ECE with specialization in VLSI AND EMBEDDED SYSTEMS DESIGN, is a record of bona fide work carried out by him.

Dr. D. SREENIVASA RAO

Professor & Head of the Department

Department of ECE

JNTUH College of Engineering

Hyderabad


DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING

JNTUH COLLEGE OF ENGINEERING

HYDERABAD-500085

DECLARATION OF THE CANDIDATE

I, S.S.SARAT CHANDRA, bearing Reg. No. 11011J6033, hereby declare that the dissertation work entitled “A TECHNIQUE TO REMOVE GLITCHES IN PHYSICAL DESIGN STAGE” has been developed under the valuable guidance of Mr. P. Parameswara Rao and submitted in partial fulfilment of the requirements for the award of the Degree of Master of Science in “VLSI AND EMBEDDED SYSTEMS DESIGN”.

This is a record of bona fide work carried out by me, and the results obtained have not been reproduced or copied from any source. The results of this dissertation have not been submitted to any other University or Institute for the award of any Degree.

S.S.SARAT CHANDRA

Reg. No. 11011J6033

Branch ECE (M.S VLSI & ESD)


ACKNOWLEDGEMENT

This is an acknowledgement of the intensive drive and technical competence of the many individuals who have contributed to the success of my project.

I wish to express my deep sense of gratitude and sincere thanks to Mr P. Parameswara Rao, Assistant Professor, Seer Akademi, for his valuable suggestions and sagacious guidance in all respects during the course of the project.

I am particularly thankful to Dr. D. Sreenivasa Rao, Professor and Head of the Department of ECE, JNTU College of Engineering, Hyderabad, for the support extended during the project work.

I express my heartfelt thanks to Mr Srikanth Jadcherla, CEO of Seer Akademi, Mr M. Ram Kumar, Manager of Seer Akademi, and Mr P. Parameswara Rao for their encouragement, guidance and suggestions, and to Mr. Ramanji Reddy for his IT support.

I am grateful to my friends, my parents, and the entire faculty and non-faculty staff for their encouragement and support.

S.S.SARAT CHANDRA

Reg. No. 11011J6033

Branch ECE (M.S VLSI & ESD)


ABSTRACT

A glitch compensation methodology is proposed in this dissertation. It reduces the undesired switching of combinational circuits in order to save dynamic power. The proposed methodology can be seamlessly integrated into the existing physical design flow to reduce glitch power, which is one of the major contributing factors to both dynamic power and IR drop. A glitch is an undesired transition that occurs before a signal settles to its intended value in a digital circuit.

A glitch occurs in a CMOS circuit when the differential delay between the inputs of a gate is greater than the gate's inertial delay, and it results in a notable amount of power consumption. Glitch power is becoming more prominent at lower technology nodes. Introducing buffers at the inputs of a logic gate may reduce glitches, but it results in a large area overhead and additional dynamic power. The coupling capacitance between nets should be decreased, either by shielding, by spacing the nets farther apart, or by using the proposed methodology. With the proposed methodology, functionality can be improved by removing static glitches, and timing can be improved by removing dynamic glitches.

Hence, the proposed methodology ensures low dynamic power consumption with less area. The proposed methodology has been validated using the Synopsys 90nm SAED PDK.

Tools Used: VCS (functionality and simulation), DC (logic synthesis) and IC Compiler (physical design).
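The glitch condition stated above (differential input delay exceeding the inertial delay) can be illustrated with a small Python sketch. It assumes a 2-input AND gate whose inputs switch in opposite directions, ideal step inputs and a pure inertial-delay filter; the names `InputEvent` and `and2_glitch` and all delay values are hypothetical and are not part of the dissertation's flow.

```python
from dataclasses import dataclass


@dataclass
class InputEvent:
    """A single transition arriving at a gate input (hypothetical model)."""
    arrival_ps: float  # time at which the transition reaches the gate input
    rising: bool       # True for a 0->1 transition, False for 1->0


def and2_glitch(a: InputEvent, b: InputEvent, inertial_delay_ps: float) -> bool:
    """Return True if a 2-input AND gate is expected to pass a glitch.

    Assumption: ideal step inputs and a pure inertial-delay filter. When the
    inputs switch in opposite directions, the output is 0 both before and after
    the event, so any pulse is a hazard. A 0->1->0 pulse appears only if the
    rising input arrives first; its width equals the differential delay, and
    the gate propagates it only if that width exceeds the inertial delay.
    """
    if a.rising == b.rising:
        # Same-direction transitions change the output at most once: no glitch.
        return False
    rise, fall = (a, b) if a.rising else (b, a)
    differential_delay_ps = fall.arrival_ps - rise.arrival_ps
    return differential_delay_ps > inertial_delay_ps


if __name__ == "__main__":
    # 80 ps skew against a 50 ps inertial delay: the pulse survives.
    print(and2_glitch(InputEvent(100, True), InputEvent(180, False), 50))  # True
    # 30 ps skew is filtered out by the gate.
    print(and2_glitch(InputEvent(100, True), InputEvent(130, False), 50))  # False
```

In this simplified model the glitch disappears only when the differential delay is reduced below the inertial delay, which is why path balancing and glitch removal are closely linked.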


LIST OF CONTENTS

Title Page No

LIST OF FIGURES i

LIST OF TABLES iii

ABBREVIATIONS iv

Chapter 1: INTRODUCTION 1

1.0 Introduction and background 1

1.1 Introduction to ASIC 2

1.2 Standard cell based ASIC 2

1.3 Need for low power ASIC 3

1.4 ASIC flow 4

1.5 Objective of the project 7

1.6 Organization of thesis 8

Chapter 2: Asynchronous FIFO 10

2.1 Asynchronous Interface 10

2.2 Issues in designing asynchronous FIFO 10

2.3 Operation of the design 11

2.3.1 Data write operation 11

2.3.2 FIFO full status 11

2.3.3 Asynchronous FIFO pointers 12

2.4 Handling full and empty conditions 15

2.4.1 Generating empty flag 15

2.4.2 Generating full flag 15

2.5 Procedure to design FIFO module 16

2.5.1 Dual port RAM 20

2.5.2 Gray counters 20

2.5.3 Address pointer difference generation 20

2.5.4 Full and empty generation logic 20


2.5.5 Next read and write control logic 20

Chapter 3: Logic Synthesis 20

3.1 Synthesis and its basic flow 20

3.2 Synopsys design compiler flow for synthesis 21

3.3 Design flow 23

3.3.1 Read design 23

3.3.2 Synthesis libraries 24

3.3.2.1 Target library 24

3.3.2.2 Synthetic library 24

3.3.2.3 Link library 24

3.3.2.4 Symbol library 24

3.4 Design environment 25

3.5 Synthesis constraints 26

3.5.1 Design rule constraints 27

3.5.2 Design optimization constraints 27

3.6 Design constraints 28

Chapter 4: Design Planning 32

4.1 Introduction 32

4.2 Design planning flow 34

4.3 Tasks to be performed in design planning 35

4.4 Macro planning 36

4.5 Partitioning 36

4.6 Power planning 37

4.7 Defining chip area 38

4.8 Virtual flat placement 39

Chapter 5: Placement & Power Planning 40

5.1 Introduction 40

5.2 IO power 41

5.2.1 Core power 41

5.3 Power network synthesis 41

5.3.1 Power pads 42


5.3.2 Rectangular rings 43

5.3.3 Power straps 43

5.4 Placement 43

5.4.1 Tasks to be performed during placement 44

5.4.2 Global placement 44

5.4.3 Detailed placement 45

Chapter 6: Clock Tree Synthesis 46

6.1 Introduction 46

6.2 Clock network modelling for logic synthesis 48

6.2.1 Virtual clocks 48

6.2.2 Trial of clock network synthesis 48

6.3 Clock design at implementation stage 49

6.3.1 Clock network synthesis in a flat design flow 49

6.3.2 Clock network synthesis in a hierarchical design flow 49

6.4 Prerequisites for clock tree synthesis 50

6.4.1 Design prerequisites 50

6.4.2 Library prerequisites 51

6.5 Clock distribution architectures 51

6.5.1 Tree 52

6.5.2 H-tree 52

6.6 Algorithm for clock tree construction 53

6.7 Analyzing the clock trees 53

6.7.1 Identify the clock tree at the end points 54

6.7.2 Analyzing the clock sink groups 54

6.7.3 Defining clock root attributes 54

6.8 Clock skew scheduling 55

6.9 Optimization of registers 55

6.10 Clock analysis and on chip variations 56

Chapter 7: Routing 57

7.1 Introduction 57

7.2 Process design rules 57

7.3 Routing grid 57

7.4 Global & detailed routing 69


7.5 Routing congestion 59

7.6 Routing order 60

Chapter 8: Signal Integrity 61

8.1 Introduction 61

8.2 Crosstalk 61

8.3 SI closure methodologies 62

8.4 SI prevention 63

8.5 SI analysis and repair 64

Chapter 9: Results & Analysis 67

9.1 Synthesis results 67

9.1.1 Timing report 69

9.1.2 Data required and arrival time 70

9.1.3 Slack 70

9.1.4 Setup and hold time 70

9.1.5 Power report 72

9.1.6 QOR (quality of results) 73

9.2 Design planning results 75

9.2.1 Floorplan results 75

9.2.2 Virtual flat placement 77

9.2.3 Congestion analysis 77

9.3 Power planning results 79

9.3.1 Rectangular rings result 79

9.3.2 Power straps result 79

9.4 Power network synthesis results 80

9.4.1 IR drop analysis 80

9.5 Placement results 82

9.5.1 Timing report 84

9.5.2 Area report 86

9.5.3 Power report 87

9.5.4 Congestion report 88

9.6 Clock tree synthesis results 89

9.6.1 Timing report 89


9.7 Analyzing the routing results 93

9.7.1 Noise report before buffer insertion 93

9.7.2 Noise report after buffer insertion 94

9.7.3 Power report before buffer insertion 96

9.7.4 Power report after buffer insertion 97

Chapter 10: Conclusion 100

Chapter 11: Future scope 101

Bibliography 102


Department of ECE, JNTUHCEH i

List of figures

S.No Figure No. Figure Name Page No.

1 1.1 Cell based ASIC 3

2 1.3 Traditional ASIC design flow 6

3 2.2 Fifo full and empty conditions 13

4 2.3 n bit Gray code converted to n-1 gray code 14

5 2.4 Top module of asynchronous fifo 16

6 2.5 Internal architecture asynchronous fifo 17

7 3.1 Synthesis flow 21

8 3.2 Design environment for a design 26

9 3.3 Specification of input delay 29

10 3.4 Specification of output delay 30

11 4.1 Basic flow of physical design 33

12 4.2 Data setup for design flow 34

13 4.3 Design planning 35

14 5.1 Power planning 42

15 6.1 Clock skew 47

16 6.2 Tree generated by clock tree synthesis 52

17 6.3 H-tree balances skew 52

18 6.5 Routing grids 58

19 7.2 Routing with two different metals 59

20 8.1 SI closure criteria 61

21 8.2 Buffer insertion to victim and aggressor 63

22 8.3 Adding shielding to aggressor 65

23 9.1 Setup & hold time 71

24 9.2 Floorplanned design 76

25 9.3 Virtual flat placement 77

26 9.4 Rectangular rings & power straps 80

27 9.5 Power network synthesis 82

28 9.6 Placement 83

29 9.7 Clock tree synthesis 89


30 9.8 Routed clock drivers with clock nets 93

31 9.9 Victim net representations 98

32 9.10 Buffer additions to victim net 98


List of tables

S.No Table No. Table Name Page No.

1 9.1 Floorplan reports 75

2 9.2 IR drop analysis 81


ABBREVIATIONS

i. ASIC : Application specific integrated circuit

ii. BGAs : Ball grid arrays

iii. CAD : Computer aided design

iv. CPU : Central processing unit

v. CTS : Clock tree synthesis

vi. CMP : Chemical mechanical polishing

vii. CNS : Clock network Synthesis

viii. DC : Design compiler

ix. DCTB : Dynamic clock-tree building

x. DME : Deferred merge embedding

xi. DMST : Dual-MST geometric matching topology

xii. DNNA : Dynamic Nearest-Neighbour Algorithm

xiii. DLL : Delay locked loop

xiv. DSP : Digital Signal Processing

xv. DDR : Domain Deskew Register

xvi. DRC : Design rule check

xvii. DSP : Digital signal processing

xviii. DIPs : Dual in-line packages

xix. ECO : Engineering Change Orders

xx. ESD : Electrostatic Discharge

xxi. ERC : Electrical Rule check

xxii. FPGA : Field programmable gate array

xxiii. FDP : Force-directed placement

xxiv. FPU : Floating point unit

xxv. GDSII : Graphic Data System II (stream format)

xxvi. GMA : Geometric matching algorithm

xxvii. GPL : General Public License

xxviii. GTECH : Generic technology

xxix. HDL : Hardware description language

xxx. ISPD : International Symposium on Physical Design

xxxi. I/O : Input and output

xxxii. IP : Intellectual properties


xxxiii. IC : Integrated circuits

xxxiv. ICC : IC Compiler (Synopsys)

xxxv. IEEE : Institute of Electrical and Electronics Engineers

xxxvi. IP : Intellectual protocol

xxxvii. LC : Library compiler

xxxviii. LCB: Local clock buffers

xxxix. MCMM : Multi-corner and Multi mode

xl. MLBB : Multi level bounding box

xli. PPO : Post placement optimization

xlii. PVT : Process voltage and temperature

xliii. PGAs : Pin grid arrays

xliv. QOR : Quality of results

xlv. RC : Resistance and capacitance

xlvi. RCD : Regional clock Driver

xlvii. RTL : Register transfer level

xlviii. SDC : Synopsys design constraints

xlix. STA : Static timing analysis

l. SDF : Standard delay format

li. SOC : System On-chip

lii. SPICE : Simulation Program with Integrated Circuit Emphasis

liii. TDF : Top design file

liv. TLU : Table look up

lv. TNS : Total Negative Slack

lvi. VCS : Verilog compiler and simulator

lvii. VHDL : VHSIC (Very high speed integrated circuit) hardware description language

lviii. VLSI : Very large scale integrated circuit

lix. WNS : Worst Negative Slack

lx. WSBFS : Walk-Segment Breadth First Search


Chapter 1

Introduction

Today's integrated circuits carry ever more transistors, which confronts designers with many challenges that can only be met through trade-offs. Designing a VLSI chip for low power consumption and low power dissipation is not an easy task; designers have to consider and balance the following factors:

PVT (Process, Voltage, Temperature), clock frequencies, timing closure, scaling, power dissipation, electromigration and IR drop.

No circuit can achieve low area, low power consumption, low power dissipation and high speed all at once; one or two of these factors have to be sacrificed. As feature sizes keep decreasing, this becomes a growing dilemma for both design and fabrication.

However, when the need and opportunity for an ASIC market first became clear, the design challenges were neither as numerous nor as complex. As with any industry and market, legacy forces can become too strong to overcome. This is the problem with today's RTL-to-GDSII design flow, which clings to past successes and struggles to tackle future problems.

Many of the design decisions made in today's RTL-to-GDSII methodology are based on coarse estimates or worst-case assumptions. Such decisions can no longer lead to a successful design, because increased process miniaturization leads to tighter design margins, and tight market constraints demand shorter design turnaround times. This dissertation is not a survey of the various EDA algorithms involved in the RTL-to-GDSII flow, nor is it a survey of the design methodologies and flows employed by designers in the field of high-end IC design.


1.1 Introduction to ASIC

An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use rather than intended for general-purpose use. In today's world, ASICs offer many advantages over off-the-shelf devices: a smaller die size, which leads to board size reduction; reduced power consumption; less heat dissipation; lower costs under mass production; improved performance; better radiation tolerance; improved testability; enhanced reliability; and a proprietary design implementation.

1.2 Standard-Cell–Based ASIC

A cell-based ASIC (CBIC) uses predefined logic cells, such as AND gates, OR gates, multiplexers and flip-flops, known as standard cells. The flexible blocks in a CBIC are built of rows of standard cells, and the ASIC designer defines the placement of the standard cells and the interconnect. The advantage of CBICs is that they can be designed in less time and at lower cost than full-custom ASICs; most importantly, they reduce risk by using a predesigned, pretested and pre-characterized standard-cell library whose cells can be optimized individually. The disadvantages are the time or expense of designing or buying the standard-cell library and the time needed to fabricate all layers of the ASIC for each new design. Figure 1.1 shows a CBIC (cell-based IC).


Figure 1.1 cell based ASIC [13]

Each standard cell in the library is constructed using full-custom design methods, but you can use these predesigned and pre-characterized circuits without having to do any full-custom design yourself. This design style gives you the same performance and flexibility advantages as a full-custom ASIC while reducing design time and risk.

1.3 Need for Low Power ASIC

For early digital circuits, high speed and minimum area were the main design constraints, and most EDA tools were designed specifically to meet these criteria; power consumption was rarely a visible concern. Nowadays, area reduction is no longer a big issue, because with the latest sub-micron techniques many millions of transistors can fit in a single IC. Smaller chip sizes have in turn driven a high demand for portable and handheld devices. More and more applications are battery powered, and low-power ICs are the key to extending the usage time between battery recharges, which in turn increases the battery life and reliability of the product. In submicron technologies, the heat generated by power dissipation also limits the proper functioning of circuits. Market forces demand low power not only for longer battery life but also for reliability, portability, performance, cost and time to market. This is especially true for personal computing devices, wireless communication systems and home entertainment systems, which are now widely popular. Implantable medical devices, such as pacemakers, deep brain stimulation systems for Parkinson's disease and spinal cord stimulators for pain management, particularly need to dissipate less power for longer battery life and improved component reliability and safety.

As process technology scales to 90nm and below, performance and density are taken to new levels, yet power loss through both switching and leakage makes designing with these devices a major challenge. Leakage power reduction is essential to sustaining the scaling of the CMOS process, since leakage power is now becoming comparable to dynamic (switching) power, as shown in the figure below. Lowering the threshold voltage significantly increases sub-threshold leakage current, while thinner gate oxides increase gate tunnelling leakage current. Although scaling improves transistor density, functionality and on-chip performance, it also increases power dissipation. It has therefore become necessary to use new techniques to manage energy at the system level.
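Since glitch power is the focus of this work, the standard switching-power relation P = α·C·Vdd²·f helps quantify it: every glitch adds a 0→1 transition that draws charge from the supply just like a functional one. The sketch below assumes illustrative values (10 fF load, 1.2 V supply, 500 MHz clock, and 0.1 functional plus 0.05 glitch transitions per cycle); none of these numbers come from the dissertation's results.

```python
def switching_power_w(rise_transitions_per_cycle: float, load_cap_f: float,
                      vdd_v: float, clock_hz: float) -> float:
    """Dynamic switching power of one net, P = alpha * C * Vdd^2 * f.

    alpha counts 0->1 transitions per clock cycle; each one draws C * Vdd^2 of
    energy from the supply, which is dissipated over the charge/discharge pair.
    """
    return rise_transitions_per_cycle * load_cap_f * vdd_v ** 2 * clock_hz


def glitch_fraction(functional_alpha: float, glitch_alpha: float) -> float:
    """Share of a net's switching power caused by glitch transitions."""
    return glitch_alpha / (functional_alpha + glitch_alpha)


if __name__ == "__main__":
    C, VDD, F = 10e-15, 1.2, 500e6                # assumed load, supply and clock
    useful = switching_power_w(0.10, C, VDD, F)   # about 0.72 uW
    glitch = switching_power_w(0.05, C, VDD, F)   # about 0.36 uW
    print(f"functional {useful * 1e6:.2f} uW, glitch {glitch * 1e6:.2f} uW, "
          f"glitch share {glitch_fraction(0.10, 0.05):.0%}")
```

Because power scales linearly with the transition count, removing glitch transitions reduces dynamic power in direct proportion, without touching the functional activity.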

1.4 ASIC Flow

The traditional ASIC design flow proceeds as follows.

Prepare the requirement specification and create a micro-architecture document.

Design the RTL and develop the IPs. If the design contains any memory elements, DFT memory BIST insertion can also be implemented after this step.

Perform functional verification of all the IPs. Check that the RTL is free from linting errors and analyse whether it is synthesis friendly. Perform cycle-based (functional) verification to verify the protocol behaviour of the RTL, and perform property checking to verify that the RTL implementation matches the understanding of the specification.

Set up the design environment, which includes the technology file to be used along with other environmental attributes.

Prepare the design constraints file for synthesis, usually an SDC (Synopsys Design Constraints) file, together with the tool-specific setup file for Design Compiler (.synopsys_dc.setup). Once the constraints are ready, the inputs to DC for synthesis are the library file (the library targeted by synthesis, which holds the functional and timing information of the standard cell library and the wire load models that estimate wire parasitics from the fan-out of each net), the RTL files and the design constraints files. With these inputs, the synthesis tool maps and optimizes the RTL to meet the design constraints.

After synthesis, scan insertion and JTAG scan chain insertion are implemented, and synthesis is repeated. Check whether the design meets the requirements after synthesis, and perform block-level static timing analysis using Design Compiler's built-in static timing analysis engine. Perform formal verification between the RTL and the synthesized netlist to confirm that the synthesis tool has not altered the functionality. Perform pre-layout STA (static timing analysis) in PrimeTime with the SDF (Standard Delay Format) file and the synthesized netlist to check whether the design meets the timing requirements. Once synthesis is complete, the synthesized netlist (VHDL/Verilog format) and the SDC constraints file are passed as inputs to the placement and routing tool to perform the back-end activities. The tool used here is IC Compiler.

Initialize the floorplan, followed by timing-driven placement of cells, clock tree insertion and global routing. Transfer the clock tree to the original design (netlist) residing in Design Compiler, perform in-place optimization of the design in Design Compiler, and run formal verification using Formality. Extract estimated timing delays from the layout after the global routing step, back-annotate them to PrimeTime, and perform static timing analysis in PrimeTime using these estimated delays. Then perform detailed routing of the design, extract the real timing delays from the detailed-routed design, back-annotate them to PrimeTime, and perform post-layout static timing analysis in PrimeTime. Functional gate-level simulation of the design with post-layout timing can be performed if desired. Finally, tape out after LVS and DRC verification.

CAD tools are involved in all stages of the VLSI design flow, and different tools can be used at different stages thanks to common EDA data formats. CAD tools provide several advantages: they can evaluate complex conditions in which solving one problem creates other problems, use analytical methods to assess the cost of a decision, use synthesis methods to help provide a solution, and allow solutions to be proposed and analysed at the same time.


Figure 1.3 Traditional ASIC Design Flow [14]

Figure 1.3 graphically illustrates the typical ASIC design flow discussed above. The acronyms STA and CT represent static timing analysis and clock tree respectively. DC represents Design Compiler, and the Synopsys CAD tool for physical design is called IC Compiler (ICC).


1.5 Objective of the project

The main objective is to remove glitches from the design or to minimize their effect. Glitches form when coupling capacitances are large, so the coupling capacitance effect has to be decreased, for example by placing buffers on the aggressor or victim nets. When unnecessary signal transitions occur together, they also cause dynamic IR drop.

The back-end flow requires several inputs, the major one being the gate-level netlist obtained from synthesis. Floorplanning determines the size of the die, creates the boundary and core area, sets the aspect ratio, and creates the wire tracks used for power planning. The die size and utilization directly affect wire spacing, power consumption and IR drop.
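How utilization and aspect ratio fix the floorplan size can be shown with a short calculation. This sketch assumes the common definitions core area = standard-cell area / utilization and aspect ratio = core height / core width, plus a uniform core-to-boundary spacing for the IO region; the function names and numbers are illustrative only and are not taken from the dissertation's floorplan.

```python
import math


def core_dimensions(cell_area_um2: float, utilization: float,
                    aspect_ratio: float) -> tuple[float, float]:
    """Return (width, height) of the core in um.

    Assumes aspect_ratio = height / width and utilization = cell area / core area.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)
    return width, aspect_ratio * width


def die_dimensions(core_w_um: float, core_h_um: float,
                   core_to_boundary_um: float) -> tuple[float, float]:
    """Add the same (hypothetical) IO spacing on all four sides of the core."""
    return core_w_um + 2 * core_to_boundary_um, core_h_um + 2 * core_to_boundary_um


if __name__ == "__main__":
    w, h = core_dimensions(6400.0, 0.64, 1.0)  # about 10000 um^2 -> about 100 x 100 um
    print(w, h, die_dimensions(w, h, 10.0))    # die about 120 x 120 um
```

Raising the utilization shrinks the core for the same cell area, leaving less room for wire spacing, which is the link to congestion and coupling noise described in this section.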

Virtual flat placement is performed to analyse congestion, which affects the goal of a glitch-free design. Power network synthesis is performed to determine the IR drop, since IR drop is one of the on-chip variation problems that cause skew.

Noise analysis reports the glitches produced by crosstalk noise: a victim net that should hold a constant value is disturbed by a switching neighbour and may take on a spurious logic value, which causes unnecessary transitions and a corrupted output. Inserting buffers reduces or removes this glitch effect.

Buffers are inserted on either the aggressor net or the victim net, ahead of the receiving side of the affected circuit. The resulting noise is then re-examined to check whether it has been reduced or removed; if it has only been reduced, it must fall below the noise margin for the output to be glitch-free.
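The effect of victim-side buffering on the noise margin check can be estimated with a first-order lumped RC model. The sketch assumes an aggressor that ramps linearly from 0 to Vdd, a victim held low by a driver of resistance R, and a buffer inserted at the victim's midpoint with the same drive resistance, so the segment feeding the receiver sees half the coupling and half the ground capacitance. All values, including the 0.36 V noise margin, are hypothetical, and this is a simplification rather than the crosstalk analysis performed by the physical design tool.

```python
import math


def coupling_noise_peak_v(vdd_v: float, c_couple_f: float, c_ground_f: float,
                          r_hold_ohm: float, t_rise_s: float) -> float:
    """Peak crosstalk glitch on a quiet victim net (first-order lumped model).

    During the aggressor ramp a constant current Cc * Vdd / t_rise is injected
    into the victim, whose driver holds it through r_hold_ohm:
        (Cc + Cg) dV/dt = Cc * Vdd / t_rise - V / R
    so V(t) = I * R * (1 - exp(-t / tau)) with tau = R * (Cc + Cg), which
    peaks at the end of the ramp.
    """
    tau = r_hold_ohm * (c_couple_f + c_ground_f)
    injected_current = c_couple_f * vdd_v / t_rise_s
    # -expm1(-x) computes 1 - exp(-x) accurately even for tiny x
    return injected_current * r_hold_ohm * -math.expm1(-t_rise_s / tau)


if __name__ == "__main__":
    VDD, NOISE_MARGIN = 1.2, 0.36   # assumed supply and receiver noise margin
    R, TR = 2e3, 50e-12             # assumed holding resistance and aggressor rise time
    before = coupling_noise_peak_v(VDD, 20e-15, 20e-15, R, TR)  # about 0.45 V
    after = coupling_noise_peak_v(VDD, 10e-15, 10e-15, R, TR)   # about 0.34 V
    for label, v in (("before buffer", before), ("after buffer", after)):
        status = "below margin" if v < NOISE_MARGIN else "glitch risk"
        print(f"{label}: {v:.3f} V ({status})")
```

With a very weak holding driver the model reduces to simple charge sharing, Vdd·Cc/(Cc + Cg), which shows why the driver strength seen by the receiving segment matters as much as the coupling capacitance itself.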


1.6 Organization of Thesis

This section describes the organization of the thesis and the flow of the project from introduction to conclusion.

Chapter 1 introduces the project and states its main objectives.

Chapter 2 describes the asynchronous FIFO, its operations, hierarchy and signals. The FIFO is built around a dual-port RAM and reads and writes data using counters and registers. The chapter also explains the top-level module and the modules involved in the design.

Chapter 3 explains the synthesis flow, the libraries and the constraints applied to the design, as well as the design environment.

Chapter 4 explains design planning, which includes the physical design flow, floorplanning, virtual flat placement, congestion issues and macro planning.

Chapter 5 describes placement, power planning, power straps, IO power, core power and the power structure of the design.

Chapter 6 explains the clock tree structure and the different clock distribution structures for the design. It covers the analysis of clock sink groups, clock tree attributes and clock network modelling for synthesis, as well as analysing clock trees, clock skew scheduling, optimization of registers and clock analysis. It also discusses trial clock tree synthesis, the virtual clocks present in the design, and clock design at the implementation stage.

Chapter 7 explains routing at all stages, after the tracks and the virtual placement of clock and power nets are formed; routing physically interconnects all the signals.

Chapter 8 explains signal integrity and the types of noise that occur, as well as ECO buffer addition to the aggressor or victim net.

Chapter 9 presents the results obtained at each stage of the design flow.

Chapter 10 presents the conclusion of the project.

Chapter 11 discusses the future scope of the methodology.


Chapter 2

Asynchronous FIFO

2.1 Asynchronous Interface

An asynchronous interface is the set of signals that connects devices of a computer system, where the transfer of information between devices is organized by exchanging signals that are not synchronized to any controlling clock. A request signal from the initiating device indicates the need to make a transfer, and an acknowledge signal from the responding device indicates that the transfer is complete. This asynchronous interchange is also widely known as handshaking.
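The request/acknowledge exchange described above can be sketched as a four-phase (return-to-zero) handshake. The text does not say which handshake variant is meant, so the four-phase protocol, the signal names `req` and `ack`, and the generator below are assumptions for illustration only.

```python
from typing import Iterable, Iterator, Tuple


def four_phase_handshake(words: Iterable[int]) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (step, req, ack, data) for each phase of a four-phase transfer.

    Only one control signal changes per step, and each change is a response to
    the other side's last change, so no shared clock is needed.
    """
    req = ack = 0
    for word in words:
        req = 1
        yield ("sender drives data and raises req", req, ack, word)
        ack = 1
        yield ("receiver latches data and raises ack", req, ack, word)
        req = 0
        yield ("sender sees ack and drops req", req, ack, word)
        ack = 0
        yield ("receiver drops ack; link idle", req, ack, word)


if __name__ == "__main__":
    for step, req, ack, word in four_phase_handshake([0x5A]):
        print(f"req={req} ack={ack} data=0x{word:02X}  {step}")
```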

Asynchronous designs are usually taken to mean designs with no clocks, but the asynchronous FIFO interface circuit in this project incorporates multiple clocks for transmitting and receiving data values. The design is described below along with its top module diagram.

An asynchronous FIFO refers to a FIFO design where data values are written to a FIFO buffer (RAM) from one clock domain and the data values are read from the same FIFO buffer from another clock domain, where the two clock domains are asynchronous to each other. Asynchronous FIFOs are used to safely pass data from one clock domain to another clock domain.

There are many different ways to design an asynchronous FIFO interface; the method used in this project is “FIFO partitioning with synchronized pointer comparison”. Because the design works on two clocks, one for transmitting and one for receiving, it uses Gray counters to compare the read and write pointers and thereby determine the full and empty status of the RAM that acts as the FIFO buffer for writing and reading data values.
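The Gray counters mentioned above rely on the property that consecutive Gray-code values differ in exactly one bit, so a pointer sampled in the other clock domain while it is changing resolves to either the old or the new count. A minimal Python sketch of the standard binary-reflected Gray conversion follows; the function names are illustrative and are not the RTL's module names.

```python
def bin_to_gray(b: int) -> int:
    """Binary-reflected Gray code: each bit is the XOR of adjacent binary bits."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Inverse conversion: binary bit i is the XOR of all Gray bits at or above i."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def single_bit_steps(width: int) -> bool:
    """Check that every increment, including the wrap-around, flips exactly one bit."""
    size = 1 << width
    return all(
        bin(bin_to_gray(i) ^ bin_to_gray((i + 1) % size)).count("1") == 1
        for i in range(size)
    )


if __name__ == "__main__":
    # ['000', '001', '011', '010', '110', '111', '101', '100']
    print([format(bin_to_gray(i), "03b") for i in range(8)])
    print(single_bit_steps(4))  # True
```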

Data words are placed into a FIFO buffer memory array by control signals in one clock domain, and the data words are removed from another port of the same FIFO buffer memory array by control signals from a second clock domain. The difficulty associated with doing FIFO design is related to generating the FIFO pointers and finding a reliable way to determine full and empty status on the FIFO. [6]

FIFOs are generally used where the write operation is faster than the read operation. However, even with different speeds and access types, the average rate of data transfer remains the same. FIFO pointers keep track of the number of FIFO memory locations read and written, and the corresponding control logic prevents the FIFO from either underflowing or overflowing. FIFO architectures inherently face the challenge of synchronizing with the pointer logic of the other clock domain while safely controlling read and write operations on the FIFO memory locations.

2.2 Issues in Designing Asynchronous FIFO

Although the circuitry is asynchronous and works in a multi-clock

environment, it is essential to synchronize the signals that pass between the two clock

domains, as data can be lost due to setup and hold violations. It is very important to

understand signal stability in a multi-clock design, because a signal crossing into a

new clock domain appears asynchronous to that domain. If the signal is not

synchronized to the new clock, the first storage element in the new clock domain may

go metastable, and in the worst case its resolution time cannot be predicted. The

metastable value can then propagate through the new clock domain and cause

functional failure. To prevent such failures, the setup and hold time specifications must

be obeyed in the design. Manufacturers characterize the probability of flip-flop failure

due to metastability in terms of MTBF (Mean Time Between Failures). Synchronizers

are used to prevent downstream logic from entering a metastable state when multibit

data values cross between clock domains.
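For reference, a first-order synchronizer MTBF model that is commonly quoted is shown below. The symbols are assumptions not defined in this chapter: t_r is the resolution time allowed before the next stage samples the signal, tau is the flip-flop's resolution time constant, T_0 is a technology-dependent metastability window, and f_clk and f_data are the sampling clock frequency and the data toggle rate.

```latex
\mathrm{MTBF} = \frac{e^{t_r/\tau}}{T_0 \, f_{clk} \, f_{data}}
```

Because t_r appears in the exponent, adding a second synchronizer flip-flop, which gives roughly one extra clock period of resolution time, increases the MTBF dramatically.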

Thus, the design of the FIFO pointers is the key issue for efficient operation

of the FIFO architecture, and a thorough understanding of the FIFO read and write

pointers is necessary. On reset, both the read and write pointers point to the first

location of the FIFO. This is the first location to be written and, at the same time, the

first location to be read. In general, therefore, the read pointer always points to the

word to be read next and the write pointer always points to the next location to be

written.


2.3 Operation of the Design

2.3.1 Data write operation

When both the read and write pointers point to the first location of the FIFO,

the empty flag is asserted, indicating that the FIFO is empty. Data can now be written.

Data is written to the location the write pointer points to, and after the write operation

the write pointer is incremented to point to the next location to be written. At the same

time, the empty flag is de-asserted, indicating that the FIFO is not empty and some

data is available. One notable point regarding the read pointer is that, while the empty

flag is active, the data it points to is always invalid. When the first data has been

written and the empty flag is cleared (i.e. the empty flag is inactive), the read pointer

logic immediately drives the data from the location it points to onto the read port of

the dual-port RAM, ready to be read by the read logic. The biggest advantage of this

read-logic implementation is that only one clock pulse is required to read from the

read port, since the previous clock cycle has already incremented the read pointer and

driven the data onto the read port. This helps to reduce the latency in detecting the

empty and full flag status. The empty flag can also be asserted in one more condition:

if n read operations follow n write operations, both pointers are again equal. Hence,

whenever the read pointer “catches up” with the write pointer, the empty flag is

asserted.
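The pointer rules of Section 2.3.1 can be sketched as a small behavioural model. This is a single-clock Python sketch, not the dual-clock hardware: the class name `PointerModel` and the depth of 8 are assumptions, and full detection is deliberately left out.

```python
DEPTH = 8  # assumed depth for this sketch


class PointerModel:
    def __init__(self):
        self.mem = [None] * DEPTH
        self.wptr = 0  # next location to be written
        self.rptr = 0  # word to be read next

    @property
    def empty(self):
        # Empty whenever the read pointer has caught up with the write pointer.
        return self.rptr == self.wptr

    def write(self, value):
        self.mem[self.wptr] = value
        self.wptr = (self.wptr + 1) % DEPTH  # wrap to the first location

    def read(self):
        if self.empty:
            raise RuntimeError("underflow: read while the FIFO is empty")
        value = self.mem[self.rptr]
        self.rptr = (self.rptr + 1) % DEPTH
        return value


f = PointerModel()
print(f.empty)                        # True after reset
for v in (10, 20, 30):
    f.write(v)
print(f.empty)                        # False: data available
print([f.read() for _ in range(3)])   # [10, 20, 30]
print(f.empty)                        # True again after n writes and n reads
```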

2.3.2 FIFO full status

Suppose the write pointer has reached the top of the FIFO, so that it points to

the last location that can be written, while no read operation has been performed yet

and the read pointer still points to the first location. One method of generating the

FIFO full condition is to assert the full flag at this point. However, this is not the actual

full condition, only an ‘almost full’ condition, because one location can still be

written. An ‘almost empty’ condition can exist in a similar way. Now a write operation

fills that location and increments the write pointer. Since it was the last location, the

write pointer wraps around to the first location. Both the read and write pointers are

now equal, so the empty flag is asserted instead of the full flag, which is a fatal

mistake. Hence, a wrap-around of the write pointer can indicate a FIFO full condition.

Now consider the case where data has been written until the write pointer is at

the top of the FIFO, and some data has been read so that the read pointer is somewhere

in the middle of the FIFO. One more write operation causes the write pointer to wrap.

Note that even though the write pointer now points to the first location of the FIFO,

this is NOT a FIFO full condition, since the read pointer has moved up from the first

location. Further writes push the write pointer up. Suppose the read pointer also wraps

around after some more read operations. Both pointers have now wrapped around, yet

the FIFO is neither full nor empty, and data can be written to or read from the FIFO.

The disadvantage of this kind of FIFO is that the status signals cannot be fully

synchronized with both the read and write clocks.

2.3.3 Asynchronous FIFO pointers

The FIFO is full when the pointers are equal, that is, when the write pointer

has wrapped around and caught up with the read pointer. This is a problem, because

the FIFO is also empty when the pointers are equal; from the pointer values alone it is

impossible to decide which condition has occurred.
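The ambiguity can be reproduced numerically. The sketch below assumes an 8-entry FIFO whose pointers are only as wide as the 3-bit address (no extra bit); `pointers_after` is a hypothetical helper that returns the pointer values after a given number of writes and reads.

```python
DEPTH = 8
ADDR_MASK = DEPTH - 1  # pointers are only 3 bits wide: no extra wrap bit


def pointers_after(writes, reads):
    """Pointer values (wptr, rptr) after the given numbers of writes and reads."""
    return writes & ADDR_MASK, reads & ADDR_MASK


empty_case = pointers_after(writes=0, reads=0)     # nothing stored
full_case = pointers_after(writes=DEPTH, reads=0)  # all 8 locations stored
print(empty_case, full_case)  # (0, 0) (0, 0): the two states look identical
```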

One design technique used to distinguish between full and empty is to add an

extra bit to each pointer. Whenever the write pointer increments past the final FIFO

address, it toggles the otherwise unused MSB while setting the rest of the bits back to

zero, as shown in Figure 2.2 (the FIFO has wrapped and toggled the pointer MSB).

The same is done with the read pointer. If the MSBs of the two pointers are different,

the write pointer has wrapped one more time than the read pointer. If the MSBs of the

two pointers are the same, both pointers have wrapped the same number of times.
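A minimal sketch of the extra-bit technique, assuming an 8-entry FIFO (3 address bits plus one wrap bit, so 4-bit pointers). For clarity the pointers here are plain binary values; the design itself compares Gray code pointers, which need the modified full test discussed in Section 2.4.2. The helper names `advance` and `status` are hypothetical.

```python
ADDR_BITS = 3                             # 8 locations
PTR_MASK = (1 << (ADDR_BITS + 1)) - 1     # 4-bit pointers: 0..15
MSB = 1 << ADDR_BITS                      # the extra wrap bit


def advance(ptr):
    # Incrementing past address 7 toggles the MSB and clears the three address bits.
    return (ptr + 1) & PTR_MASK


def status(wptr, rptr):
    if wptr == rptr:
        return "empty"    # same address and same number of wraps
    if (wptr ^ rptr) == MSB:
        return "full"     # same address, but the writer has wrapped once more
    return "partial"


wptr = rptr = 0
for _ in range(8):        # write 8 words, read none
    wptr = advance(wptr)
print(bin(wptr), status(wptr, rptr))  # 0b1000 full
```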


Figure 2.2 FIFO full and empty conditions [15]

With n-bit pointers, where (n-1) is the number of address bits required to

access the entire FIFO memory buffer, the FIFO is empty when both pointers,

including the MSBs, are equal, and the FIFO is full when the pointers are equal in

every bit except the MSBs, which differ. The FIFO design therefore uses n-bit pointers

for a FIFO with 2^(n-1) writable locations to handle the full and empty conditions.

Figure 2.2 illustrates the full and empty conditions.

The counters used to synchronize the pointers are Gray code counters. The

reason for choosing a Gray code counter rather than a binary counter is that

synchronizing a binary count value from one clock domain to another is problematic,

because every bit of an n-bit counter can change simultaneously (for example, 7 -> 8

in binary is 0111 -> 1000, and all bits change). Gray codes allow only one bit to

change on each clock transition, eliminating the problem of synchronizing multiple

signals that change on the same clock edge. It is desirable to create both an n-bit Gray

code counter and an (n-1)-bit Gray code counter. It would certainly be easy to create

the two counters separately, but it is also easy and efficient to create a common n-bit

Gray code counter and then modify the 2nd MSB to form an (n-1)-bit Gray code

counter with shared LSBs. This will be called a “dual n-bit Gray code counter.”
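The one-bit-change property can be checked directly. The sketch below uses the standard binary-to-Gray conversion g = b XOR (b >> 1) and its inverse; the helper names are illustrative, not taken from the design.

```python
def bin_to_gray(b):
    return b ^ (b >> 1)


def gray_to_bin(g):
    b = 0
    while g:          # XOR of all right-shifts of g recovers the binary value
        b ^= g
        g >>= 1
    return b


def bits_changed(a, b):
    return bin(a ^ b).count("1")


print(bits_changed(7, 8))                            # 4: binary 0111 -> 1000
print(bits_changed(bin_to_gray(7), bin_to_gray(8)))  # 1: Gray 0100 -> 1100
```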

Figure 2.3 n-bit Gray code converted to an (n-1)-bit Gray code [15]

As Figure 2.3 shows, inverting the second MSB of the second half of the 4-bit

Gray code produces the desired 3-bit Gray code sequence in the three LSBs of the

4-bit sequence. The only remaining problem is that the 3-bit Gray code with the extra

MSB is no longer a true Gray code: when the sequence changes from 7 (Gray 0100) to

8 (~Gray 1000), and again from 15 (~Gray 1100) to 0 (Gray 0000), two bits change

instead of just one. A true Gray code changes only one bit between counts.
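The construction in Figure 2.3 can be checked for n = 4 with a short sketch. `dual_code` is a hypothetical helper that takes the 4-bit Gray code of a count and inverts the second MSB for counts 8 to 15; the check confirms that the three LSBs then follow a 3-bit Gray sequence and that the 7 to 8 step changes two bits.

```python
def bin_to_gray(b):
    return b ^ (b >> 1)


N = 4
SECOND_MSB = 1 << (N - 2)      # bit 2 of a 4-bit value


def dual_code(count):
    """4-bit Gray code with the 2nd MSB inverted in the second half (8..15)."""
    g = bin_to_gray(count)
    if count >= (1 << (N - 1)):
        g ^= SECOND_MSB
    return g


low3 = [dual_code(i) & 0b111 for i in range(16)]
gray3 = [bin_to_gray(i % 8) for i in range(16)]
print(low3 == gray3)                                             # True
print(format(dual_code(7), "04b"), format(dual_code(8), "04b"))  # 0100 1000
```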


2.4 Handling full and empty conditions

Exactly how FIFO full and FIFO empty are implemented is design-dependent.

The FIFO design in this project generates the empty flag in the read-clock domain to

ensure that the empty flag is detected immediately when the FIFO buffer becomes

empty, that is, the instant that the read pointer catches up with the write pointer

(including the pointer MSBs). It generates the full flag in the write-clock domain to

ensure that the full flag is detected immediately when the FIFO buffer becomes full,

that is, the instant that the write pointer catches up with the read pointer (with the

pointer MSBs differing).

2.4.1 Generating empty flag

The FIFO is empty when the read pointer and the synchronized write pointer

are equal. The empty comparison is simple: pointers that are one bit larger than

needed to address the FIFO memory buffer are used, and if the extra bits of both

pointers (the pointer MSBs) are equal, the pointers have wrapped the same number of

times; if the rest of the read pointer also equals the rest of the synchronized write

pointer, the FIFO is empty. The Gray code write pointer must be synchronized into the

read-clock domain. Since only one bit of a Gray code pointer changes at a time, there

is no problem synchronizing multi-bit transitions between the clock domains. To

register the rempty output efficiently, the synchronized write pointer is actually

compared against rgraynext (the next Gray code value that will be registered into

rptr).
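The registered-empty scheme can be sketched as one read-clock step. This is an assumption-laden model: 4-bit pointers for an 8-entry FIFO, `rq2_wptr` is a hypothetical name for the write pointer after synchronization into the read domain, and the read pointer is kept as a binary count `rbin` from which `rgraynext` is derived.

```python
def bin_to_gray(b):
    return b ^ (b >> 1)


PTR_MASK = 0xF  # 4-bit pointers: 3 address bits + 1 wrap bit


def read_clock_edge(rbin, rempty, rinc, rq2_wptr):
    """One read-clock edge: returns the new (rbin, rempty) register values."""
    # The read pointer only advances on a read request while not empty.
    rbinnext = (rbin + (1 if rinc and not rempty else 0)) & PTR_MASK
    rgraynext = bin_to_gray(rbinnext)
    # Comparing the *next* Gray value lets rempty be registered on the same
    # edge that performs the last read, instead of one cycle later.
    rempty_next = rgraynext == rq2_wptr
    return rbinnext, rempty_next


state = (0, True)               # after reset: rbin = 0, rempty asserted
rq2_wptr = bin_to_gray(2)       # two writes have reached the read domain
state = read_clock_edge(*state, rinc=False, rq2_wptr=rq2_wptr)
print(state)                    # (0, False): data is available
state = read_clock_edge(*state, rinc=True, rq2_wptr=rq2_wptr)
state = read_clock_edge(*state, rinc=True, rq2_wptr=rq2_wptr)
print(state)                    # (2, True): empty set on the last read's edge
```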

2.4.2 Generating full flag

Since the full flag is generated in the write-clock domain by comparing the

write and read pointers, one safe FIFO design technique requires the read pointer to be

synchronized into the write-clock domain before the pointers are compared. The full

comparison is not as simple as the empty comparison. Pointers that are one bit larger

than needed to address the FIFO memory buffer are still used, but simply checking

that the Gray code pointers differ only in the extra bit is not a valid test for the full

condition: when the write pointer is exactly one wrap ahead of the read pointer, the

two Gray code pointers differ in both of their two MSBs.
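A sketch of a Gray code full test that does work, under the same assumptions as before (4-bit Gray pointers for an 8-entry FIFO; `wq2_rptr` is a hypothetical name for the read pointer synchronized into the write domain). Because a write pointer that is one wrap ahead differs from the read pointer in its two MSBs, the test inverts the two MSBs of the synchronized read pointer before comparing; the naive MSB-only test is shown for contrast.

```python
def bin_to_gray(b):
    return b ^ (b >> 1)


PTR_BITS = 4                          # 3 address bits + 1 wrap bit
MSB = 1 << (PTR_BITS - 1)
TOP_TWO = 0b11 << (PTR_BITS - 2)      # the two MSBs of a pointer


def naive_full(wgray, wq2_rptr):
    # MSBs differ and all other bits equal: correct for binary, wrong for Gray.
    return (wgray ^ wq2_rptr) == MSB


def gray_full(wgray, wq2_rptr):
    # Full when the write pointer equals the read pointer with its two MSBs inverted.
    return wgray == (wq2_rptr ^ TOP_TWO)


w, r = bin_to_gray(8), bin_to_gray(0)     # 8 words written, none read: full
print(naive_full(w, r), gray_full(w, r))  # False True
```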


2.5 Procedure to Design FIFO Module

The general block diagram of the asynchronous FIFO is shown in Figure 2.4.

Functionally, the main blocks of the design are the dual-port RAM, the read pointer

logic and the write pointer logic.

Figure 2.4 Top Module of Asynchronous fifo [16] (block "Asynchronous FIFO" with inputs Data_in, WriteEn_in, WCLK, ReadEn_in, RCLK and Clear_in, and outputs Data_out, Empty_out and Full_out)


Figure 2.5 Internal Architecture of Asynchronous fifo [16]

The dual-port RAM has two ports: one for reading and one for writing. These

two accesses to the FIFO are independent of each other and are completely controlled

by the read pointer logic and the write pointer logic. The number of memory locations

in a FIFO varies from 8 to several kilobytes, and the data width of each location varies

from 1 to 256 bits, depending on the application and technology. Modern FIFOs allow

these parameters to be programmed as required.

Data is written into the FIFO and read out of it sequentially, so that the first

data written is the first data read out, and so on for the remaining data. The FIFO

architecture is thus completely characterized by these two independent operations, as

shown in Figure 2.5. The dual-port RAM and the read and write logic circuits with

synchronizers accomplish this task. The read port has its associated memory

addressing logic, called the ‘read pointer’ logic, and the write port has the ‘write

pointer’ logic. When the FIFO is reset, both the read and write pointers point to the first

memory location of the FIFO. Whenever data is written to the FIFO, the write pointer is

incremented and points to the next memory location. Similarly, the read pointer is

incremented on every read. Both pointers work in a circular fashion, i.e. after reaching

the last position they jump to the first location of the FIFO.

The ‘full flag’ and ‘empty flag’ are used to detect the status of the FIFO. These

two flags are generated from the result of comparing the FIFO pointers. The full flag

is asserted when the FIFO is completely full, and the empty flag is asserted when the

FIFO is empty. Assertion of the full flag indicates that no more data can be written

until at least one data word is read out of the FIFO. Assertion of the empty flag

indicates that no more data can be read from the FIFO until at least one data word is

written to it.

If data is written to the FIFO after the full flag has been asserted, an

‘overflow’ condition occurs. Similarly, if a read operation is performed after the empty

flag has been asserted, an ‘underflow’ occurs. Either condition causes data corruption

or data loss. Safe and reliable FIFO designs always avoid both extreme conditions.

A new asynchronous FIFO design is presented here. The concept of using the

pointer difference to determine the FIFO status has already been used in synchronous

FIFO designs; here the same concept is extended to the asynchronous FIFO. The

block diagram consists of a dual-port RAM, two 4-bit binary up-counters, address

pointer gap generation logic, full and empty condition generation logic, next read

control logic and next write control logic.


One of the most interesting architectural decisions is how to calculate the

depth of the FIFO. The worst-case scenario is the one in which the difference between

the write and read data rates is largest. Therefore, the maximum data rate should be

considered for the write operation and the minimum data rate for the read operation

when calculating the FIFO depth.
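A worked sketch of this worst-case depth calculation, assuming a back-to-back write burst of `burst_len` words at the maximum write rate while the reader drains one word per read clock at its minimum rate with no idle cycles. The function name and the example numbers are hypothetical, and a real design may need extra margin for the pointer synchronization latency.

```python
import math
from fractions import Fraction


def min_fifo_depth(burst_len, f_write_max_hz, f_read_min_hz):
    """Words still stored at the end of a back-to-back write burst."""
    # Words the reader drains while the burst is written; a read that has
    # not completed frees no location, hence the floor.
    words_read = math.floor(Fraction(burst_len * f_read_min_hz, f_write_max_hz))
    return max(burst_len - words_read, 0)


print(min_fifo_depth(120, 100_000_000, 80_000_000))  # 24
```

Exact rational arithmetic is used so that the floor is not disturbed by floating-point rounding.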

The dual-port RAM has two ports, one for writing and the other for reading.

When data is written to the FIFO, the write pointer is incremented and points to the

next memory location; similarly, the read pointer is incremented when a read operation

takes place.

To determine the full and empty flags and the FIFO fill level, the read pointer

must be synchronized into the write domain, and the write pointer must be

synchronized into the read domain. The design consists of a dual-port RAM, two 4-bit

Gray counters, full and empty condition generation logic, write next control logic and

read next control logic.

The naming conventions are as follows:

Data_in : input data, 8 bits wide
Data_out : output data, 8 bits wide
ReadEn_in : read enable
WriteEn_in : write enable
Clear_in : clear input
WClk : write clock
RClk : read clock
Empty_out : FIFO empty flag, asserted when the FIFO is empty
Full_out : FIFO full flag, asserted when the FIFO is full
Mem [3:0] : memory to store data
pNextWordToWrite : write pointer
pNextWordToRead : read pointer
EqualAddresses : write pointer == read pointer
NextWriteAddressEn : write next address enable
NextReadAddressEn : read next address enable
Set_Status : set status based on pointers
Rst_Status : reset status set by pointers
Status : status of the FIFO
PresetFull : reset when the FIFO is full
PresetEmpty : reset when the FIFO is empty


2.5.1 Dual port RAM

For this design, the depth of the RAM is 16 and the width is 8. Data is

written to the FIFO only if the FIFO is not full and the write enable signal is asserted.

Similarly, data is read out of the FIFO only if the FIFO is not empty and the read

enable signal is asserted.

2.5.2 Gray counters

Four-bit Gray counters are used to generate the addresses for the read and

write ports. These address generators have an external reset and enable signals, called

write next enable and read next enable, which are generated and controlled by the next

write control logic and the next read control logic. A single reset is connected to both

Gray counters to reset both the write and read pointers.

2.5.3 Address pointer difference generation

This block compares the read and write addresses and outputs the difference

between the two address pointers. It contains comparators, adders and subtractors,

and its output is used to derive the status of the FIFO.

2.5.4 Full and Empty generation logic

This block takes the pointer difference as input and outputs the status of the

FIFO. If the pointer difference is zero, the empty condition is generated, and if the

pointer difference is 15, the full condition is generated.
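A sketch of the pointer-difference status logic described in Sections 2.5.3 and 2.5.4, assuming the two 4-bit addresses are available as binary values in the same domain (for example after Gray-to-binary conversion of a synchronized pointer). The function names are hypothetical.

```python
DEPTH = 16              # RAM depth used in this design (Section 2.5.1)
PTR_MASK = DEPTH - 1    # 4-bit pointers


def pointer_difference(wptr, rptr):
    # Modulo-16 subtraction: the number of words currently stored.
    return (wptr - rptr) & PTR_MASK


def fifo_status(wptr, rptr):
    diff = pointer_difference(wptr, rptr)
    empty = diff == 0
    full = diff == DEPTH - 1        # full at 15, as in Section 2.5.4
    return empty, full


print(pointer_difference(3, 12))   # 7: the write pointer has wrapped
print(fifo_status(4, 5))           # (False, True)
```

Declaring full at a difference of 15 leaves one of the 16 locations unused, which keeps the difference from wrapping back to 0 (empty) on a write.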

2.5.5 Next read and write control logic

This control logic decides whether reads and writes remain enabled once the

empty and full conditions are asserted.


Chapter 3

Logic Synthesis

3.1 Synthesis and its Basic Flow

Synthesis is the process that generates a gate-level netlist for an IC design that

has been defined using a Hardware Description Language (HDL). Synthesis includes

reading the HDL source code and optimizing the design from that description. Using

the technology library's cell logical view, the Logic Synthesis tool performs the

process of mathematically transforming the ASIC's register-transfer level (RTL)

description into a technology-dependent netlist. This process is similar to a software

compiler converting a high-level C-program listing into a processor-dependent

assembly-language listing. The netlist is the standard-cell representation of the ASIC

design, at the logical view level. It consists of instances of the standard-cell library

gates, and port connectivity between gates. Proper synthesis techniques ensure

mathematical equivalence between the synthesized netlist and the original RTL

description. The netlist contains no unmapped RTL statements or declarations. Figure

3.1 shows the basic synthesis flow.

Figure 3.1 Synthesis flow [17] (RTL source, constraints and technology libraries are the inputs to RTL synthesis)


3.2 Synopsys Design Compiler Flow for Synthesis

Design Compiler is a synthesis tool from Synopsys Inc. In simple terms, a

synthesis tool takes an RTL [Register Transfer Level] hardware description written in

either Verilog or VHDL, together with a standard cell library, as input and produces a

technology-dependent gate-level netlist. The gate-level netlist is simply a structural

representation built only from the cells in the standard cell library. The synthesis tool

internally performs many steps, which are described below. A flowchart of the

synthesis process is also given below.

Design Compiler reads in technology libraries, DesignWare libraries and

symbol libraries to perform synthesis. During the synthesis process, Design Compiler

[DC] translates the RTL description into components taken from the technology

library and the DesignWare library. The technology library consists of basic logic

gates and flip-flops.

The DesignWare library contains more complex cells, for example adders and

comparators, which can be used as arithmetic building blocks. DC can automatically

determine when to use DesignWare components, and it can then efficiently synthesize

these components into gate-level implementations.

Design Compiler also needs the RTL written by the designer, which it reads

as a hardware description in either Verilog or VHDL.

The synthesis tool then performs many steps, including high-level RTL

optimization, translation of the RTL into un-optimized Boolean logic, technology-

independent optimizations, and finally technology mapping to the available standard

cells in the technology library, known as the target library. The resulting gate-level

netlist also depends on the constraints given. Constraints are the designer’s

specification of the timing and environmental restrictions [area, power, process etc]

under which synthesis is to be performed. As an RTL designer, it is good to

understand the target standard cell library, so that one has a better understanding of

how the RTL code will be synthesized into gates.

After the design is optimized, it is ready for DFT [design for test / test

synthesis]. DFT adds test logic, and designers can integrate it into the design during

synthesis. This helps the designer to test for issues early in the design cycle, and the

test logic can also be used for debugging after the chip comes back from fabrication.

After test synthesis, the design is ready for the place-and-route tools, which

place the cells and physically interconnect them. Based on the physical routing, the

designer can back-annotate the design with the actual interconnect delays; DC can then

be used again to resynthesize the design for more accurate timing analysis.

While running DC, it is important to monitor and check the log files, reports,

scripts etc to identify issues that might affect the area, power and performance of the

design.

3.3 Design Flow

3.3.1 Read Design

Design Compiler reads designs into memory from design files. Many designs can be in memory at any time. After a design is read in, you can change it in numerous ways, such as grouping or ungrouping its subdesigns or changing subdesign references. Design Compiler provides two ways to read design files: the analyze and elaborate commands, and the read_file command.

Using the analyze and elaborate Commands

The analyze command does the following:

Reads an HDL source file.
Checks it for errors (without building generic logic for the design).
Creates HDL library objects in an HDL-independent intermediate format.
Stores the intermediate files in a location you define.

If the analyze command reports errors, fix them in the HDL source file and run

analyze again. After a design is analyzed, you need to reanalyze it only when you

change it.

The elaborate command does the following:

Translates the design into a technology-independent design (GTECH) from the intermediate files produced during analysis.
Allows changing of parameter values defined in the source code.
Allows VHDL architecture selection.
Replaces the HDL arithmetic operators in the code with DesignWare components.
Automatically executes the link command, which resolves design references.

Resolving the reference means that the design library or file containing the

detailed design data for the sub-block can be found and processed. If any references in

the netlist cannot be resolved, the link command will issue warnings as to which sub-

component designs are not available.

3.3.2 Synthesis Libraries

3.3.2.1 Target library

The target_library variable defines the technology library that the tool uses to

build the circuit. That is, during the technology mapping phase, Design Compiler

selects components from the library specified by the target_library variable to build

the gate-level netlist.

3.3.2.2 Synthetic Library

The synthetic_library variable specifies the synthetic or DesignWare libraries.

These synthetic libraries are technology-independent, microarchitecture-level design

libraries that provide implementations of various IP blocks.

3.3.2.3 Link Library

The link_library variable is used to resolve design references. That is, Design

Compiler must connect all the library components and designs that the design

references. This step is called linking the design or resolving references.

Note that in most cases the link library is the same as the target library.

3.3.2.4 Symbol Library

The symbol library defines the schematic symbols for the components in the

technology library. These symbols are needed for drawing design schematics.


3.4 Design Environment

In order to obtain optimum results from synthesis, designers have

to methodically constrain their designs by describing the design environment,

target objectives and design rules. The constraints may contain timing and/or area

information, usually derived from the design specifications. The synthesis tool uses

these constraints to perform synthesis and tries to optimize the design with the aim

of meeting the target objectives. The design environment is defined by the operating

conditions, the wire load models and the system interface characteristics.

Operating conditions include temperature, voltage, and process variations. Wire

load models estimate the effect of wire length on design performance.

System interface characteristics include input drives, input and output loads,

and fan-out loads. The environment model directly affects design synthesis

results.

Operating conditions describe the process, voltage and temperature

conditions of the design. The process variation accounts for deviations in the

semiconductor fabrication process. The design’s supply voltage can vary from its

established ideal value during day-to-day operation. Temperature variation is

unavoidable in the everyday operation of a design. Effects on performance caused

by temperature fluctuations are most often handled as linear scaling effects. The

library contains the description of these conditions, usually described as WORST,

TYPICAL and BEST case. The names of the operating conditions are library

dependent.

By changing the operating condition setting, the full range of process

variations can be covered. The worst-case operating condition is generally used

during the pre-layout synthesis phase, thereby optimizing the design for setup time.

The best-case condition is commonly used to fix hold-time violations. The typical case

is mostly ignored, since analysis at the worst and best cases also covers the typical

case. It is possible to optimize the design for both the worst and the best case

simultaneously, using min/max analysis in which the best case is used for the hold

(minimum delay) checks and the worst case for the setup (maximum delay) checks.

This is very useful for fixing possible hold-time violations. Wire load models are used

to estimate net delays as a function of loading; different wire load models can be

selected for a particular block.

Figure 3.2 shows a design environment and the constraints added for

particular blocks. The design environment consists of two blocks and a clock divider

circuit. The clock divider logic generates a clock and applies it to block B. Block A

receives the input of the design and sends its output to block B. For each block,

specific input and output constraints are added for driving signals to their respective

locations.

Figure 3.2 Design Environment for a design [10]

3.5 Synthesis Constraints

There are basically two types of design constraints: design rule constraints and optimization constraints.

Design rule constraints are supplied in the technology library we specify;

they are referred to as the implicit design rules. These rules are established by the

library vendor and, for the proper functioning of the fabricated circuit, must not be

violated. We can, however, specify stricter design rules if appropriate; the rules we

specify are referred to as the explicit design rules.

Design optimization constraints define timing and area optimization goals


for Design Compiler. These constraints are user-specified. Design Compiler

optimizes the design in accordance with these constraints, but not at the expense of

the design rule constraints. That is, Design Compiler attempts never to violate the

higher-priority design rules.

3.5.1 Design Rule Constraints

Maximum transition time is the longest time allowed for the driving pin of a net to change its logic value. The maximum and minimum capacitance constraints bound the total capacitive load that an output pin can drive; the total capacitance comprises the load pin capacitances and the interconnect capacitance. The maximum fanout constraint is applied to the driving pin.
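A minimal sketch of how the three design rule checks above might be evaluated for a single net. The `Net` class, the units (ns and pF), the limit values and the convention that every load pin counts as one unit of fanout are assumptions for illustration; real libraries attach these limits to pins and may weight fanout differently. Only the maximum limits are checked here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Net:
    name: str
    transition: float                                     # transition at the driving pin, ns
    wire_cap: float                                       # interconnect capacitance, pF
    pin_caps: List[float] = field(default_factory=list)   # load pin capacitances, pF

    @property
    def total_cap(self) -> float:
        # Total load = load pin capacitances + interconnect capacitance
        return sum(self.pin_caps) + self.wire_cap

    @property
    def fanout(self) -> int:
        # Assumption: each load pin counts as one unit of fanout
        return len(self.pin_caps)


def check_design_rules(net: Net, max_transition: float, max_cap: float,
                       max_fanout: int) -> List[str]:
    """Return the design rule violations of one net (empty list if clean)."""
    violations = []
    if net.transition > max_transition:
        violations.append(f"{net.name}: max_transition {net.transition} > {max_transition}")
    if net.total_cap > max_cap:
        violations.append(f"{net.name}: max_capacitance {net.total_cap:.3f} > {max_cap}")
    if net.fanout > max_fanout:
        violations.append(f"{net.name}: max_fanout {net.fanout} > {max_fanout}")
    return violations


# Hypothetical net driving three load pins
n1 = Net("n1", transition=0.45, wire_cap=0.020, pin_caps=[0.004, 0.005, 0.006])
print(check_design_rules(n1, max_transition=0.5, max_cap=0.03, max_fanout=2))
```

Here the transition passes, while the 0.035 pF total load and the fanout of three exceed their hypothetical limits.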

Some technology libraries contain cell degradation tables. The cell

degradation tables list the maximum capacitance that can be driven by a cell as

a function of the transition times at the inputs of the cell.

3.5.2 Design Optimization Constraints

The system clock definitions and clock delays are the most important constraints in an ASIC design. The clock signal is the synchronization signal that controls the operation of the system, and it also defines the timing requirements for all paths in the design. Most of the other timing constraints are related to the clock signal.

A multicycle path is an exception to the default single-cycle timing requirement of paths: on a multicycle path, the signal requires more than a single clock cycle to propagate from the path start point to the path endpoint.
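The effect of a multicycle exception on a setup check can be shown with a small calculation. This is a sketch assuming ideal clocks (no latency or uncertainty) with the launch edge at time 0; the function `setup_slack` and the path numbers are hypothetical.

```python
def setup_slack(arrival: float, period: float, setup_time: float,
                cycles: int = 1) -> float:
    """Setup slack of a register-to-register path with ideal clocks.

    The capture edge of an N-cycle path is N periods after the launch edge,
    so the required time is N * period - setup_time.
    """
    required = cycles * period - setup_time
    return required - arrival


# Hypothetical path: 14 ns of logic on a 10 ns clock, 0.5 ns setup time
print(setup_slack(14.0, 10.0, 0.5))            # -4.5 (fails as a single-cycle path)
print(setup_slack(14.0, 10.0, 0.5, cycles=2))  # 5.5 (passes as a 2-cycle path)
```

In SDC such an exception is declared with the set_multicycle_path command rather than by changing the clock definition.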

Clock uncertainty is used to define the clock skew information. It adds a certain amount of margin to the clock, for both setup and hold times. During the pre-layout phase one can add more margin than in the post-layout phase.

Input delay specifies the arrival time of an input signal in relation to the clock. It is used at the input ports to specify the time it takes for the data to be stable after the clock edge. Output delay is used at the output ports to define the time it takes for the data to be available before the clock edge. Given the top-level timing specification of the design, both kinds of information may also be extracted for the sub-blocks of the design.

Minimum and maximum path delays allow constraining paths individually

and setting specific timing constraints on those paths. Input transition and output

load capacitance can be used to constrain the input slew rate and output

capacitance on output pins.

3.6 Design Constraints

3.6.1 The create_clock command is used to define a clock object with a particular period and waveform. The -period option defines the clock period, while the -waveform option controls the duty cycle and the starting edge of the clock. This command is applied to pin or port objects.

In some cases, a block may contain only combinational logic. To define delay constraints for such a block, one can create a virtual clock and specify the input and output delays in relation to it. To create a virtual clock, designers replace the port name (CLK) in the create_clock command with -name <virtual clock name>. Alternatively, one can use the set_max_delay or set_min_delay commands to constrain such blocks.
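The two create_clock forms described above, a clock on a port and a named virtual clock, can be illustrated by a small helper that assembles the command text. The helper `create_clock_cmd` is hypothetical; it uses only the options named in the text (-name, -period, -waveform) and the standard get_ports collection, and its default waveform assumes a 50% duty cycle with the rising edge at time 0.

```python
from typing import Optional


def create_clock_cmd(period: float, rise: float = 0.0, fall: Optional[float] = None,
                     port: Optional[str] = None, name: Optional[str] = None) -> str:
    """Assemble an SDC create_clock command string.

    With a port the clock is attached to that port; without one it is a
    virtual clock, which must then be named with -name.
    """
    if fall is None:
        fall = rise + period / 2          # assumed default: 50% duty cycle
    if not 0 <= rise < fall <= rise + period:
        raise ValueError("waveform edges must lie within one period")
    if port is None and name is None:
        raise ValueError("a virtual clock needs a -name")
    parts = ["create_clock"]
    if name is not None:
        parts += ["-name", name]
    parts += ["-period", f"{period:g}", "-waveform", f"{{{rise:g} {fall:g}}}"]
    if port is not None:
        parts.append(f"[get_ports {port}]")
    return " ".join(parts)


def duty_cycle(period: float, rise: float, fall: float) -> float:
    """Fraction of the period during which the clock is high."""
    return (fall - rise) / period


print(create_clock_cmd(30, port="CLK"))   # clock defined on port CLK
print(create_clock_cmd(30, name="VCLK"))  # virtual clock for a combinational block
```

Input and output delays for a purely combinational block would then be specified relative to VCLK instead of a physical clock port.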

3.6.2 The create_generated_clock command is used for clocks that are generated internally within the design. It may be used to describe frequency-divided or frequency-multiplied clocks as a function of the primary clock.
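For a clock divider such as the one in Figure 3.2, the generated clock period follows directly from the division factor. The sketch below assumes integer divide_by/multiply_by ratios; the clock name CLKDIV2 and the pin name div_reg/Q are hypothetical, while -name, -source and -divide_by are standard create_generated_clock options.

```python
def generated_period(source_period: float, divide_by: int = 1, multiply_by: int = 1) -> float:
    """Period of a clock derived from a source clock.

    Dividing the frequency by N multiplies the period by N; multiplying the
    frequency by M divides the period by M.
    """
    if divide_by < 1 or multiply_by < 1:
        raise ValueError("divide_by and multiply_by must be positive integers")
    return source_period * divide_by / multiply_by


def generated_clock_cmd(name: str, source_port: str, target_pin: str, divide_by: int) -> str:
    """Assemble a create_generated_clock command for a divide-by-N clock."""
    return (f"create_generated_clock -name {name} -source [get_ports {source_port}] "
            f"-divide_by {divide_by} [get_pins {target_pin}]")


# Hypothetical divide-by-2 of a 10 ns CLK, produced at the output of div_reg
print(generated_period(10.0, divide_by=2))   # 20.0
print(generated_clock_cmd("CLKDIV2", "CLK", "div_reg/Q", 2))
```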

3.6.3 set_dont_touch is used to set a dont_touch property on the current_design, cells, references or nets. This command is frequently used during hierarchical compilation of the blocks. It can also be used to prevent DC from inferring certain types of cells present in the technology library.

3.6.4 set_input_delay specifies the input arrival time of a signal in relation to the clock. It is used at the input ports to specify the time it takes for the data to be stable after the clock edge. The timing specification of the design usually contains this information as the setup/hold time requirements for input signals. Given the top-level timing specification of the design, this information may also be extracted for the sub-blocks of the design.

In Figure 3.3, a maximum input delay of 23ns and a minimum input delay of 0ns are specified for the signal datain with respect to the clock signal CLK, which has a 50% duty cycle and a period of 30ns. In other words, the setup-time requirement for the input signal datain is 7ns (30ns - 23ns), while the hold-time requirement is 0ns.
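The arithmetic of the Figure 3.3 example can be written out directly, together with the corresponding constraint text. This is a sketch: the helper names are hypothetical, and the budget follows the simple ideal-clock view above, in which the block keeps whatever part of the period the external logic does not use.

```python
def input_delay_budget(period: float, max_input_delay: float, min_input_delay: float):
    """Return (setup budget inside the block, hold requirement) for an input port."""
    if not 0 <= min_input_delay <= max_input_delay < period:
        raise ValueError("expected 0 <= min delay <= max delay < period")
    setup_budget = period - max_input_delay   # time left for the block's input logic
    hold_requirement = min_input_delay        # as stated for the Figure 3.3 example
    return setup_budget, hold_requirement


def set_input_delay_cmds(clock: str, port: str, max_delay: float, min_delay: float):
    """Assemble the -max and -min set_input_delay commands for one port."""
    return [
        f"set_input_delay -clock {clock} -max {max_delay:g} [get_ports {port}]",
        f"set_input_delay -clock {clock} -min {min_delay:g} [get_ports {port}]",
    ]


print(input_delay_budget(30, 23, 0))   # (7, 0)
for cmd in set_input_delay_cmds("CLK", "datain", 23, 0):
    print(cmd)
```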

Figure 3.3 Specification of the Input Delay [10]

3.6.5 The set_output_delay command is used at the output port to define the time it takes for the data to be available before the clock edge. The timing specification of the design usually contains this information; given the top-level timing specification of the design, it may also be extracted for the sub-blocks of the design.

In Figure 3.4, an output delay of 19ns is specified for the signal dataout with respect to the clock signal CLK, which has a 50% duty cycle and a period of 30ns. This means that the data must be available at dataout within 11ns (30ns - 19ns) after the clock edge.
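The Figure 3.4 budget can be computed the same way. This is a sketch with hypothetical helper names, assuming ideal clocks: the external logic consumes the output delay before the next capture edge, so the block has the remainder of the period.

```python
def output_delay_budget(period: float, output_delay: float) -> float:
    """Time after the launching clock edge within which data must reach the output port."""
    budget = period - output_delay
    if budget <= 0:
        raise ValueError("output delay leaves no time for logic inside the block")
    return budget


def set_output_delay_cmd(clock: str, port: str, delay: float) -> str:
    """Assemble a set_output_delay command for one port."""
    return f"set_output_delay -clock {clock} -max {delay:g} [get_ports {port}]"


print(output_delay_budget(30, 19))   # 11
print(set_output_delay_cmd("CLK", "dataout", 19))
```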


Figure 3.4 Specification of the Output Delay [10]

3.6.6 set_clock_latency command is used to define the estimated clock

insertion delay during synthesis. This is primarily used during the prelayout synthesis

and timing analysis. The estimated delay number is an approximation of the delay

produced by the clock tree network insertion (done during the layout phase).

3.6.7 The set_clock_uncertainty command lets the user define the clock skew information. It is used to add a certain amount of margin to the clock, for both setup and hold times. During the pre-layout phase one can add more margin than in the post-layout phase.
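A sketch of how the uncertainty margin enters the setup and hold checks, assuming otherwise ideal clocks and a hold check against the same clock edge. The function names and all numbers (10ns period, 0.5ns pre-layout and 0.2ns post-layout uncertainty) are hypothetical.

```python
def setup_slack(data_arrival: float, period: float, setup_time: float,
                uncertainty: float) -> float:
    """Uncertainty pulls the capture edge earlier, tightening the setup check."""
    required = period - uncertainty - setup_time
    return required - data_arrival


def hold_slack(data_arrival: float, hold_time: float, uncertainty: float) -> float:
    """Uncertainty pushes the hold requirement later, tightening the hold check."""
    required = hold_time + uncertainty
    return data_arrival - required


# Same hypothetical path checked with a larger pre-layout margin and a smaller post-layout one
for phase, unc in (("pre-layout", 0.5), ("post-layout", 0.2)):
    print(phase,
          round(setup_slack(9.4, 10.0, 0.3, unc), 3),
          round(hold_slack(0.25, 0.05, unc), 3))
```

The path fails setup with the pessimistic pre-layout margin but passes once the margin is reduced after layout.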

3.6.8 set_false_path is used to instruct ICC to ignore a particular path for

timing or optimization. Identification of false paths in a design is critical. Failure to do

so compels DC to optimize all paths in order to reduce total negative slack.

Consequently, the critical timing paths may be adversely affected due to optimization

of all the paths, which also includes the false paths. The valid start point and endpoint

to be used for this command are the input ports or the clock pins of the sequential

elements, and the output ports or the data pins of the sequential cells.

3.6.9 set_max_delay defines the maximum delay required, in time units, for a particular path. In general, it is used for blocks that contain only combinational logic. However, it may also be used to constrain a block that is driven by multiple clocks, each with a different frequency.


3.6.10 set_min_delay is the opposite of the set_max_delay command, and is

used to define the minimum delay required in terms of time units for a particular path.


Chapter 4

Design Planning

4.1 Introduction

Design planning was not a concern when designs were relatively small (fewer than one million placeable components). The implementation of those designs relied on a flat design methodology in which the whole design was viewed as one entity. However, as the level of integration increased and multi-million cell designs started to appear, these designs exceeded the capacity of a flat design flow, and convergence issues became more severe. At that point, design planning became a mandatory step for a successful and efficient implementation of the designs.

Design planning here means the design steps needed to manage the implementation and verification of the various components of the design. To manage complexity, a design planning system is responsible for partitioning the design into a number of components/blocks such that each can be designed and optimized independently. The top-level design constraints are partitioned and mapped onto the blocks to ensure that the overall design meets its design targets. Once each block is designed, design planning is responsible for the steps needed to integrate these blocks and ensure that the design goals are met.

Design planning was not a necessity in previous generations of process technology for two reasons: the size of the design in terms of the number of gates was reasonable, and the performance targets were modest. In current and future process technologies, two aspects have made design planning a necessity: the exponential growth in the number of transistors that can be packed on a die, and aggressive, tight design constraints together with market forces.

There are two kinds of digital design styles: custom and structured. Each design style has its own design methodologies and goals. Custom designs are typically used for high-end microprocessors, high-end graphics and communication designs. The other type of design is the Application Specific Integrated Circuit (ASIC), implemented across various process technology generations.


The basic flow of physical design is as follows: synthesis, data setup, design planning, placement, clock tree synthesis, routing and chip finishing.

Figure 4.1 Basic flow of Physical design [10]

Before entering design planning, the first step of the design flow is data setup. The diagram below shows the data setup for the design.


Figure.4.2 Data setup for design flow [10]

Figure 4.2 shows the setup for the physical design flow. The logical and timing libraries are provided by the vendor, and the constraints file is written by the designer. The technology file (.tf) and RC models (.TLU) are also provided by the vendor, and the gate-level netlist is obtained from synthesis. These are the mandatory inputs the tool needs to produce the desired outputs.

4.2 Design planning

Design planning was not a concern when designs were relatively small. The

implementation of those designs relied on a flat design methodology where the

whole design was viewed as one entity. However, as the level of integration

increased and multi-million cell designs started to appear, these designs exceeded

the capacity of a flat design flow. Design planning became a mandatory step for

a successful and efficient implementation of the design.

Design planning here means the design steps needed to manage the implementation and verification of the various components of the design. To manage complexity, a design planning system is responsible for partitioning the design into a number of components/blocks that can be designed and optimized independently.


Design planning (Figure 4.3) consists of macro planning, partitioning and global placement, power planning, top-level routing, constraints management and top-level clock routing.

Figure 4.3 Design Planning [10]

4.3 Tasks to be performed during Design Planning

The tasks performed during design planning are: initializing the floorplan, automating die size exploration, performing an initial virtual flat placement, performing power planning, performing prototype global routing, performing hierarchical clock planning, performing in-place optimization, performing RC extraction, performing timing analysis and performing timing budgeting.


4.4 Macro Planning

Almost every design contains some IP blocks. IP blocks come in three forms: soft, firm and hard. Soft IPs are typically RTL designs with their verification counterparts. These IPs are usually synthesized with the rest of the logic in the design and are handled like any other HDL module. Firm IPs consist of a synthesized (logic) netlist and are usually placed so that the module can be characterized with respect to timing and power. Such a module is represented as a soft macro in the physical design world and can be placed manually or automatically in the floorplanning stage. Hard IPs are usually the most common form of IP; they include memory blocks, analog, RF and other custom circuitry. Most often, the designer is responsible for placing these IPs, either because of their dependence on outside connectivity (off-chip buses) or because of the sensitivity of their circuits, as in the analog/RF case. Such blocks are usually very sensitive and require special attention when placing them and when routing over or near them. For the most part, traditional placement engines do not place the top-level macros automatically. However, better automated placement can be attained by relying on hints provided by the user, which can specify the side of the die where the macros should reside, or some form of clustering that simplifies the placement job and improves the QoR.

4.5 Partitioning

The number of devices on a single die is increasing rapidly due to the continued shrinking of process technology, and billion-transistor systems have become a reality. Such complexity necessitates a divide-and-conquer approach to manage the design process. Partitioning plays a key role in attaining the design goals in an acceptable turnaround time. However, due to the tight design constraints, partitioning becomes a formidable task.

Partitioning the top-level constraints amongst the different blocks is a formidable task by itself, since it entails budgeting the top-level constraints amongst the various blocks. Because the partitions have not been implemented yet, it is hard to estimate their performance and area, which makes deciding on accurate timing budgets early in the design planning stage difficult. To avoid these issues, designers tend to opt for a flat implementation of the design whenever possible. However, given the large size of today's ICs, the implementation algorithms are not scaling at a rate comparable to the aggressive levels of design integration.

This forces designers to engage in design planning and carry out the partitioning step, so that the partitions can be managed and designed concurrently and independently. Typically, partitioning and global placement go hand in hand: to produce good-quality partitions, the top-level connectivity (global routing) of the partitions has to be taken into account. Most global placement algorithms embed some form of partitioning and global routing to accomplish this task.

4.6 Power Planning

Power integrity is an important factor for any successful design. Power plays a key role in achieving the speed target set for the design, as well as in the reliability and proper functionality of the design. In nanometre technologies, designs switching at high frequencies require a comprehensive design approach for the power network that takes into account both the chip and the package. Noise sources in the package, such as inductive noise, signal reflections due to impedance mismatches, and signal coupling, are no longer negligible; they can travel to the chip core and affect the power levels seen by the clock buffers (power supply drop and ground bounce). This limits the performance of the clock network and may degrade its reliability: if the power network does not account for these effects, it cannot be guaranteed to provide the necessary power levels, and the clock network design could fail.


4.7 Defining Chip Area

Die size exploration estimates the size of the design while maintaining the relative placement of hard macros, I/O cells and a power structure that meets the voltage drop requirements. The design planning tool estimates the area in the cell view. The die size can be defined in several ways: by specifying the exact width and height of the die; by specifying a target aspect ratio and utilization, from which the tool estimates the width and height; or, when the block is rectilinear, by specifying the exact boundary of the die area. A rectilinear die area is one having more than four corners.

The core size, chip area, chip utilization and aspect ratio of the design are given by the formulas below.

$$\text{Core area} = \frac{\text{Standard cell area} + \text{Macro area}}{\text{Standard cell utilization}} \tag{4.1}$$

$$\text{Die size} = \text{Core size} + \text{IO-to-core clearance} + \text{Pad area} \tag{4.2}$$

IO-to-core clearance is the space from the core boundary to the inner side of the I/O pads (design boundary). Blockages, macros and pads are combined in the denominator of the effective utilization. The aspect ratio for the design is

$$\text{Aspect ratio} = \frac{W}{H} = \frac{\text{Horizontal routing resources}}{\text{Vertical routing resources}} \tag{4.3}$$

Chip utilization is defined as the ratio of the combined area of the standard cells, macros and pad cells to the area of the chip:

$$\text{Chip utilization} = \frac{\text{Area(standard cells)} + \text{Area(macros)} + \text{Area(pad cells)}}{\text{Area(chip)}} \tag{4.4}$$
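Equations (4.1) to (4.4) can be chained into a quick floorplan estimate. This is a minimal sketch with hypothetical numbers (areas in um^2, lengths in um). Equation (4.2) is interpreted per dimension, adding the IO-to-core clearance and an assumed pad depth on both sides of the core; that interpretation and the pad area value are assumptions, not taken from the text.

```python
import math


def core_area(std_cell_area: float, macro_area: float, utilization: float) -> float:
    """Eq. (4.1): core area needed for the given cell area at a target utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return (std_cell_area + macro_area) / utilization


def core_dimensions(area: float, aspect_ratio: float) -> tuple:
    """Split a core area into (W, H) such that W / H = aspect_ratio, as in Eq. (4.3)."""
    width = math.sqrt(area * aspect_ratio)
    return width, area / width


def die_dimensions(core_w: float, core_h: float, io_to_core_clearance: float,
                   pad_depth: float) -> tuple:
    """Eq. (4.2) applied per dimension (assumption): clearance and pad row on both sides."""
    margin = 2 * (io_to_core_clearance + pad_depth)
    return core_w + margin, core_h + margin


def chip_utilization(std_cell_area: float, macro_area: float, pad_area: float,
                     chip_area: float) -> float:
    """Eq. (4.4)."""
    return (std_cell_area + macro_area + pad_area) / chip_area


# Hypothetical block
area = core_area(600_000, 200_000, 0.8)            # about 1,000,000 um^2
w, h = core_dimensions(area, aspect_ratio=1.0)     # about 1000 x 1000 um
die_w, die_h = die_dimensions(w, h, io_to_core_clearance=20, pad_depth=100)
print(round(die_w), round(die_h),
      round(chip_utilization(600_000, 200_000, 400_000, die_w * die_h), 3))
```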


4.8 Virtual Flat Placement

The initial virtual flat placement is very fast and is optimized for wire length, congestion and timing. Performing an initial virtual flat placement involves the following steps.

No straightforward criteria exist for evaluating the initial hard macro placement. Measuring the quality of results (QoR) of the hard macro placement can be very subjective and often depends on practical design experience.

Constraints can be specified for hard macro placement. Different methods can be used to control the preplacement of hard macros and improve the QoR of the hard macro placement: creating a user-defined array of hard macros, setting floorplan placement constraints on macro cells, placing a macro cell relative to an anchor object, and, when using a virtual placement strategy, creating macro blockages for hard macros and padding the macros at their respective positions.

The standard cell placement tile is used during the placement phase; it is defined by one vertical routing track and the standard cell height. Placement and routing blockage layer definitions are internal to physical design tools. To avoid placing standard cells too close to macros, which can cause congestion or DRC violations, one can set a user-defined padding distance, or keep-out margin, around the macros. This padding distance can be set on a selected macro's cell instance master. During virtual flat placement, no other cells are placed within the specified distance from the macro's edges.
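The keep-out margin described above amounts to a simple geometric test: grow the macro outline by the padding distance and reject any standard cell that overlaps the grown region. A minimal sketch with hypothetical coordinates in um; the `Rect` type and helper names are illustrative only.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float  # lower-left x
    y1: float  # lower-left y
    x2: float  # upper-right x
    y2: float  # upper-right y


def expand(r: Rect, margin: float) -> Rect:
    """Grow a rectangle by `margin` on all four sides (the keep-out region)."""
    return Rect(r.x1 - margin, r.y1 - margin, r.x2 + margin, r.y2 + margin)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors intersect; rectangles that only touch do not overlap."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def violates_keepout(cell: Rect, macro: Rect, margin: float) -> bool:
    """True if a standard cell lies within `margin` of the macro's edges."""
    return overlaps(cell, expand(macro, margin))


macro = Rect(100, 100, 300, 200)
print(violates_keepout(Rect(305, 120, 308, 121.6), macro, margin=10))  # True: inside the margin
print(violates_keepout(Rect(310, 120, 313, 121.6), macro, margin=10))  # False: just outside
```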


Chapter 5

Placement and Power Planning

5.1 Introduction

Power integrity is an important factor for any successful design. Power integrity refers to providing each circuit in the design with the supply voltage it requires to switch properly. Given the lossy nature of chips and the various noise-inducing factors that make this task almost impossible, the goal of power design is to provide reliable power levels, within acceptable design margins, to the various switching devices in the chip.

Since the design of a reliable and robust clock network requires a robust and reliable power network, this chapter discusses the various factors that play a role in the design of the power delivery system. Reliable power directly affects the performance and reliability of the design, because the delays of the switching devices depend directly on the power levels they receive. In addition, the design of the power distribution network is a function of the number of switching devices, their switching speeds, their sizes, their locations and their interconnections.

Failure to design a robust power network and provide the required power levels to the different parts of the chip will cause the design to violate its performance constraints and may even lead to functional failure. With the down-scaling of process technology, noise margins have shrunk to tens of millivolts, so any perturbation in the power delivery network could cause a design failure.

In nanometre technologies, designs switching at high frequencies require a comprehensive design approach for the power network that takes into account both the chip and the package. Thus, if the design of the power network does not account for the package's effects, it cannot be guaranteed to provide the necessary power levels, and the clock network design could fail. This is why we believe an in-depth study of power design in the package and chip is needed.


5.2 IO Power

Due to the fast switching speeds of high-end IO circuits, careful design of their power network is required. The IO power network is separated from the core power network, not only because the IO circuits might have different supply voltages, but also to protect the IO and the core from the high-frequency effects caused by the switching of the IO cells. The IO circuits are typically large buffers that draw large currents when switching on and off. Due to the highly inductive nature of the package and board traces, the inductive noise (L di/dt) is typically very large and could cause logic failures in the core. The high inductance of the package traces can also corrupt the power supply of the switching IO drivers themselves.

5.2.1 Core Power

The core power network is separated from the IO power network, as discussed above. This results in separate power and ground pads on the chip to supply current to the IOs and to the core. The number of power pads needed is a function of the current needs of the logic, the size of the die and the layout of the chip. If the pads are restricted to the periphery of the die, the number of pads needed is a function of the resistivity of the power grid and the estimated current needs. This ensures that the IR drop constraint is honored and that the needed current is supplied to the design.

The inductance of the power grid is gaining importance in nanometre and

high-end designs. Careful design of the power and ground networks is needed to

make sure the current loops are as small as possible to reduce the effective

inductance that is seen by the switching devices.

5.3 Power Network Synthesis

Power network synthesis offers advanced power planning technology and helps solve signal integrity problems without lengthy and tedious iterations. By performing power network synthesis, you can view an early power plan, which reduces the chance of encountering electromigration and voltage drop problems later, during routing.


Figure 5.1 shows the prerequisites that must be completed for the design before power planning.

Figure.5.1 Power Planning [10]

5.3.1 Power Pads

Power pads supply power to the core. The number of power pads on each side of the core is determined by the total core power, the number of sides, the core voltage and the maximum allowable current (current density) of each pad.

$$\text{Number of power pads on each side} = \frac{\text{Total core power}}{\text{Number of sides} \times \text{Core voltage} \times \text{Maximum allowable current on each I/O pad}} \tag{5.1}$$

$$\text{Total core power} = \frac{\text{Total dynamic power of core}}{\text{Core voltage}} \tag{5.2}$$

$$\text{Current drawn by the core} = \frac{\text{Total core power}}{\text{Worst-case voltage}} \tag{5.3}$$
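Equations (5.1) and (5.3) can be evaluated directly. In this sketch the power, voltages and per-pad current limit are hypothetical, and the pad count is rounded up (an assumption, since a fraction of a pad cannot be placed).

```python
import math


def power_pads_per_side(total_core_power: float, num_sides: int,
                        core_voltage: float, max_pad_current: float) -> int:
    """Eq. (5.1), rounded up to a whole number of pads."""
    return math.ceil(total_core_power / (num_sides * core_voltage * max_pad_current))


def core_current(total_core_power: float, worst_case_voltage: float) -> float:
    """Eq. (5.3): current drawn by the core."""
    return total_core_power / worst_case_voltage


# Hypothetical numbers: 2 W core, 4 sides, 1.2 V nominal, 1.08 V worst case, 50 mA per pad
pads = power_pads_per_side(2.0, num_sides=4, core_voltage=1.2, max_pad_current=0.05)
print(pads, round(core_current(2.0, 1.08), 3))   # 9 1.852
```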


5.3.2 Rectangular Rings

Rectangular rings are placed around the core of the design to supply power to the core cells and I/O cells. Two rings are typically used, one for VDD and one for VSS. The ring width is calculated using the following formulas.

$$\text{Core ring width (Metal 4)} = \frac{\text{Current drawn by core}}{2 \times J_{\text{Metal4}} \times \text{Core power pads on each side of the chip}} \tag{5.4}$$

Note: the current into the core splits into two directions, hence the factor of 2.

$$\text{Core ring width (Metal 5)} = \frac{\text{Current drawn by core}}{2 \times J_{\text{Metal5}} \times \text{Core power pads on each side of the chip}} \tag{5.5}$$
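Equations (5.4) and (5.5) differ only in the current density of the metal layer. In this sketch the core current, pad count and current densities (expressed in A per um of metal width) are hypothetical values chosen for illustration.

```python
def core_ring_width(core_current: float, current_density: float, pads_per_side: int) -> float:
    """Eqs. (5.4)/(5.5): current splits two ways into the ring, hence the factor 2."""
    return core_current / (2 * current_density * pads_per_side)


# Hypothetical: 2 A core current, 8 core power pads per side
for layer, j in (("Metal4", 0.001), ("Metal5", 0.002)):
    print(layer, round(core_ring_width(2.0, j, 8), 1), "um")
```

A layer with a higher allowed current density needs a proportionally narrower ring for the same current.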

5.3.3 Power Straps

If the design has a large cell count, the rectangular rings alone cannot supply power to every cell in the core, because cells far from the rings would see an excessive voltage drop. Power straps are therefore used to carry power across the core.

To calculate how many power straps the design requires, we need the core height and width, the power, the voltage and the current:

$$\text{Maximum vertical strap spacing: } L_{\max} = \frac{V_{\max}}{J \times R_{sh}} \tag{5.6}$$

$$\text{Number of vertical straps: } N_v = \frac{\text{Core width}}{L_{\max}} \tag{5.7}$$

$$\text{Maximum horizontal strap spacing: } L_h = 2 \times L_{\max} \tag{5.8}$$

$$\text{Number of horizontal straps: } N_h = \frac{\text{Core height}}{L_h} \tag{5.9}$$

$$\text{Strap width} = \frac{W_{\text{ring}}}{N_v \times N_h} \tag{5.10}$$

where V_max = maximum allowable voltage drop, J = current density, R_sh = sheet resistance and W_ring = core ring width.
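Equations (5.6) to (5.10) can be chained in one function. This is a sketch under stated assumptions: all input values are hypothetical, J is expressed in A per um so that L_max comes out in um, the strap counts are rounded up to whole straps, and Eq. (5.10) is applied exactly as given in the text.

```python
import math


def strap_plan(core_width: float, core_height: float, v_max: float,
               j: float, r_sh: float, ring_width: float) -> dict:
    """Evaluate Eqs. (5.6)-(5.10) for one power net."""
    l_max = v_max / (j * r_sh)                # Eq. (5.6): max vertical strap spacing
    n_v = math.ceil(core_width / l_max)       # Eq. (5.7), rounded up to whole straps
    l_h = 2 * l_max                           # Eq. (5.8): max horizontal strap spacing
    n_h = math.ceil(core_height / l_h)        # Eq. (5.9), rounded up to whole straps
    strap_width = ring_width / (n_v * n_h)    # Eq. (5.10)
    return {"l_max": l_max, "n_v": n_v, "l_h": l_h, "n_h": n_h,
            "strap_width": strap_width}


# Hypothetical 3000 x 3000 um core, 60 mV allowed drop, 0.05 ohm/sq, 125 um ring
plan = strap_plan(3000, 3000, v_max=0.06, j=0.001, r_sh=0.05, ring_width=125)
print(plan["n_v"], plan["n_h"], round(plan["strap_width"], 2))
```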

5.4 Placement

This step places the cells (placeable components) legally, such that there is no cell overlap and congestion is minimized. The objectives of placement are to reduce area and wire length and to improve timing. Placement is divided into two parts: global placement and detailed placement.


5.4.1 Tasks to Be Performed During Placement

The primary task in the placement stage is to define placement blockages: areas that leaf cells must avoid during placement and legalization, including overlapping any part of the blockage. Placement blockages can be hard or soft. A hard blockage prevents cells from being placed in the blockage area. A soft blockage prevents the coarse placer from putting cells in the blockage area, but optimization and legalization can still place cells there.

The second task is to set placement options that minimize congestion during placement and optimization. Congestion occurs when the number of wires going through a region exceeds the capacity of that region. The third task is to automatically insert protection diodes on sub-design ports to prevent antenna violations at the top level.
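The congestion definition above can be turned into a per-region overflow measure. This sketch assumes the die is divided into a small grid of routing regions whose wire demand and track capacity are already known; both grids are hypothetical integer values.

```python
from typing import List, Tuple


def congestion_overflow(demand: List[List[int]],
                        capacity: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Per-region overflow = max(0, demand - capacity); also return the total."""
    overflow = [[max(0, d - c) for d, c in zip(drow, crow)]
                for drow, crow in zip(demand, capacity)]
    total = sum(sum(row) for row in overflow)
    return overflow, total


# Hypothetical 2 x 2 grid of regions, each with 10 routing tracks
demand = [[8, 12],
          [5, 15]]
capacity = [[10, 10],
            [10, 10]]
overflow_map, total = congestion_overflow(demand, capacity)
print(overflow_map, total)
```

Regions with non-zero overflow are the congested ones that placement options and later optimization try to relieve.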

The next task is to perform placement and optimization using the commands available in the tool. In physical optimization, one can run incremental placement-based optimization that supports area recovery, design rule fixing, sizing and route-based optimization.

5.4.2 Global Placement

In global placement, the objective is to distribute the cells over the die area in such a fashion that global objectives (timing, wire length) are attained. Overlaps amongst the cells are permissible at this stage. The objective is to compile estimates of the die area, wire length and timing violations. If the estimates are not satisfactory, better partitioning, design guides, and re-planning of the IO signals or the hard macros are carried out to improve the results of the global placement.
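A common way to obtain the wire-length estimate mentioned above is the half-perimeter wire length (HPWL) of each net's bounding box. The text does not name this estimator; it is shown here as a standard technique, with hypothetical cell names and positions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(pins: List[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: List[List[str]], positions: Dict[str, Point]) -> float:
    """Sum of HPWL over all nets, using cell positions as pin positions."""
    return sum(hpwl([positions[cell] for cell in net]) for net in nets)


# Hypothetical global placement result
positions = {"a": (0, 0), "b": (4, 3), "c": (2, 5)}
nets = [["a", "b", "c"], ["a", "b"]]
print(total_hpwl(nets, positions))
```

Because overlaps are allowed at this stage, such an estimate is cheap to recompute after every global placement iteration.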


5.4.3 Detailed Placement

In detailed placement, the cells that are clustered together or lie on top of each other as a result of the previous step are spread out and re-ordered. The objective is to produce a legal placement (no overlaps), minimize congestion and improve wire length. Again, different design constraints can be imposed on the detailed placer so that the legalization step does not wreak havoc on the timing of the design.
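The legalization step can be illustrated for a single row. This sketch uses a simple greedy left-to-right pass, in the spirit of the well-known Tetris legalization heuristic rather than any particular tool's algorithm: cells are sorted by their global-placement x position, snapped to the site grid, and pushed right just enough to avoid overlapping the previous cell. Cell widths are assumed to be multiples of the site width; all names and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def legalize_row(cells: List[Tuple[str, float, int]], site_width: int,
                 row_start: int, row_end: int) -> Dict[str, int]:
    """cells: (name, desired_x, width). Returns legal x positions on the site grid."""
    placed = {}
    cursor = row_start  # leftmost free position in the row
    for name, want_x, width in sorted(cells, key=lambda c: c[1]):
        # Snap the desired position to the nearest site
        snapped = row_start + round((want_x - row_start) / site_width) * site_width
        x = max(snapped, cursor)  # never overlap the previously placed cell
        if x + width > row_end:
            raise ValueError(f"row overflow while placing {name}")
        placed[name] = x
        cursor = x + width
    return placed


# Hypothetical overlapping cells from global placement (u1 and u2 overlap)
cells = [("u1", 3.1, 4), ("u2", 4.0, 6), ("u3", 20.7, 4)]
print(legalize_row(cells, site_width=2, row_start=0, row_end=40))
```

A production detailed placer would also minimize total displacement and consider timing, which this greedy pass does not.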


Chapter 6

Clock Tree Synthesis

6.1 Introduction

The design of the clock network has become a challenging task due to the growing complexity of designs, the down-scaling of process technology and the increasing frequency of the devices. In nanometre designs, tight constraints related to skew, power and latency are imposed on the clock network. The task is further complicated by the fact that reasonably accurate cell and interconnect delay estimates are needed to design the clock network, and that the clock network can consume 40% of the total power in the design. Accurate power planning depends on knowing the placement and sizes of the cells in the design, especially the clock buffers and sequential elements.

In a synchronous digital system, the clock signal is used to synchronize the movement of data within the system. Clock signals must be distributed to physically remote locations of an integrated circuit. Clock signal transitions drive all the synchronous elements of a digital circuit, such as flip-flops and memories; these elements are referred to as sinks. Clock distribution networks (CDNs) are circuits that distribute a clock signal from a central global clock source at the centre of the integrated circuit to all the sinks that use it.

In the process of clock distribution, the clock signal traverses many interconnect networks and buffers that are part of the clock distribution network. These elements introduce delay in the clock signal path. Ideally, a clock signal should arrive at all the sinks at the same time. However, due to variations in parameters such as interconnect length, temperature, capacitive coupling and process, the arrival time of the clock transition varies across sink locations. Clock skew, as a fraction of the cycle time, is a growing problem for faster chips, because each cycle contains fewer gate delays while the clock loads remain large.

This spatial variation in the arrival time of the clock transition on an integrated circuit is commonly referred to as clock skew. For two points i and j, if the arrival times of the clock signal are $a_i$ and $a_j$ respectively, then the clock skew between the two points is given by $d(i,j) = a_i - a_j$.
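The skew definition $d(i,j) = a_i - a_j$ extends naturally to the largest skew over all sink pairs, often called global skew. A minimal sketch with hypothetical sink names and arrival times in ns:

```python
from itertools import combinations
from typing import Dict


def skew(arrival: Dict[str, float], i: str, j: str) -> float:
    """d(i, j) = a_i - a_j for sinks i and j."""
    return arrival[i] - arrival[j]


def global_skew(arrival: Dict[str, float]) -> float:
    """Largest |d(i, j)| over all sink pairs (equal to max(a) - min(a))."""
    return max(abs(skew(arrival, i, j)) for i, j in combinations(arrival, 2))


# Hypothetical clock arrival times at three flip-flops
arrival = {"FF1": 1.20, "FF2": 1.35, "FF3": 1.10}
print(round(skew(arrival, "FF2", "FF1"), 3), round(global_skew(arrival), 3))
```

The sign of d(i,j) matters for timing: a positive skew from launch to capture relaxes setup but tightens hold, so local pairwise values are checked as well as the global figure.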

Figure 6.1 Clock Skew Illustrations [18]

Clock signals typically have the highest fanout and operate at the highest speed in a synchronous digital system. Since the clock signals are used to synchronize the operations of the entire digital circuit, the clock transitions should be sharp and should have the minimum possible skew to avoid data integrity errors or race conditions. As the operating frequency of a synchronous circuit increases, the circuit becomes more and more susceptible to clock skew, i.e. the timing becomes more and more critical.

At this point we introduce another term, slew rate. Slew rate is the maximum rate of change of a signal in a circuit. It depends on the time a signal takes to rise (fall) from logic low (logic high) to logic high (logic low); more commonly, it is characterized by the time a signal takes to rise from 10% to 90%, or fall from 90% to 10%, of the supply voltage. In addition to achieving minimum skew, a clock distribution network should also obtain as high a slew rate as possible (i.e. the minimum time to change from one logic level to the other).
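The 10% to 90% measurement described above can be taken from a sampled waveform by linearly interpolating the two threshold crossings. A minimal sketch, assuming a monotonically rising waveform sampled at hypothetical time points (ps) and voltages (V):

```python
from typing import Sequence


def crossing_time(times: Sequence[float], volts: Sequence[float], level: float) -> float:
    """First time the sampled waveform crosses `level`, by linear interpolation."""
    for k in range(1, len(times)):
        v0, v1 = volts[k - 1], volts[k]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            t0, t1 = times[k - 1], times[k]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def rise_transition(times: Sequence[float], volts: Sequence[float], vdd: float,
                    low: float = 0.1, high: float = 0.9) -> float:
    """10%-90% rise time of a rising waveform."""
    return crossing_time(times, volts, high * vdd) - crossing_time(times, volts, low * vdd)


# Hypothetical linear ramp from 0 V to 1.0 V over 100 ps
times = [0, 25, 50, 75, 100]
volts = [0.0, 0.25, 0.5, 0.75, 1.0]
t_rise = rise_transition(times, volts, vdd=1.0)
print(round(t_rise, 3), "ps;", round(0.8 * 1.0 / t_rise, 4), "V/ps")
```

The transition time and the slew rate are two views of the same edge: a shorter 10%-90% time corresponds to a higher slew rate.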

Another factor of prime importance in clock distribution network design is power consumption. Clock distribution networks account for a significant fraction of the total power consumed on an integrated circuit, so it is essential to build them with the minimum possible power consumption.
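A common first-order way to see why the clock dominates dynamic power is the switching-power relation $P = \alpha C V_{dd}^2 f$, where the activity factor $\alpha$ is 1 for a clock because it toggles every cycle. The sketch below is a rough estimate under that assumption, with hypothetical capacitance, voltage and frequency values; it ignores short-circuit and leakage power.

```python
def clock_switching_power(c_total: float, vdd: float, freq: float,
                          activity: float = 1.0) -> float:
    """Dynamic switching power P = activity * C * Vdd^2 * f, in watts.

    c_total  : total switched capacitance of the clock network (farads),
               i.e. wires plus buffer and register clock-pin capacitance
    vdd      : supply voltage (volts)
    freq     : clock frequency (hertz)
    activity : 0->1 transitions per cycle; a clock net toggles every cycle,
               so 1.0 is used for the ungated clock itself
    Short-circuit and leakage power are not included.
    """
    return activity * c_total * vdd ** 2 * freq


if __name__ == "__main__":
    # Hypothetical numbers: 100 pF of clock load at 1.0 V and 1 GHz.
    print(clock_switching_power(100e-12, 1.0, 1e9))   # about 0.1 W
    # Lowering the supply to 0.8 V cuts this quadratically (about 0.064 W).
    print(clock_switching_power(100e-12, 0.8, 1e9))
```

Because the clock has the highest activity factor in the design, every picofarad saved on the clock network is worth more than the same capacitance saved on a typical data net.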


6.2 Clock Network Modelling For Logic Synthesis

The clock tree should be modelled and analyzed at different stages of the design. It is modelled differently in the synthesis and back-end stages, as discussed below.

6.2.1 Virtual Clocks

Prior to the interconnect-dominated era, a clock tree was synthesized and buffers were inserted based on load-driven delay estimates. Since interconnect resistance was low, those estimates did not differ much from the actual delay values after implementation. In the interconnect-dominated era, such an approach is no longer viable. To overcome this problem, most designers estimate the clock timing annotations (latency, uncertainty, skew) and annotate them on an ideal clock network. The hope is that these estimates will be more conservative than the implementation results, so that the design converges.
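As an illustration of how such annotations enter timing analysis, the sketch below computes the setup slack of a single register-to-register path on an ideal clock. The class and parameter names are hypothetical and the check is simplified (no on-chip variation or pessimism removal). It shows that on an ideal clock the estimated latency cancels between launch and capture, so the uncertainty margin is what makes the estimate conservative.

```python
from dataclasses import dataclass


@dataclass
class IdealClock:
    """Clock annotations estimated before a real clock tree exists (ns)."""
    period: float
    latency: float      # estimated source + network insertion delay
    uncertainty: float  # estimated skew + jitter margin


def setup_slack(clk: IdealClock, t_clk_to_q: float, d_max: float,
                t_setup: float) -> float:
    """Setup slack of one register-to-register path on an ideal clock.

    Launch and capture both see the same estimated latency, so latency
    cancels; the uncertainty is what the estimate actually charges the path.
    """
    arrival = clk.latency + t_clk_to_q + d_max
    required = clk.period + clk.latency - clk.uncertainty - t_setup
    return required - arrival


if __name__ == "__main__":
    clk = IdealClock(period=2.0, latency=0.5, uncertainty=0.15)
    print(round(setup_slack(clk, t_clk_to_q=0.12, d_max=1.5, t_setup=0.08), 3))  # 0.15
```

A larger uncertainty value is the knob that makes the pre-CTS estimate more pessimistic than the eventual real clock tree.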

6.2.2 Trial Clock Network Synthesis

A second approach is to perform placement and clock tree synthesis under the hood during logic synthesis in order to obtain reasonable clock annotations. Once logic synthesis is done, the clock network is removed before handing the netlist off to the physical synthesis stage. Although there is an issue of correlation between the final clock network synthesized after the P&R stage and the one built during logic synthesis from global placement information, this approach is an improvement over the ideal clock assumption, since it captures the global placement as well as the global congestion of the design when synthesizing the clock network. However, since physical synthesis can make large changes to the design in order to close timing or improve some other metric (power, routability, or noise-related issues), the clock estimates generated during logic synthesis are likely to be off compared with the final numbers.


6.3 Clock Design at Implementation Stage

In the implementation stage, the design faces the same issue as in the logic synthesis stage. As mentioned earlier, clock network synthesis requires cell placement, in particular latch placement, to extract realistic parasitics and perform delay calculation. However, early in the design planning stage, cell placement has not yet been done, so assumptions about the clock network are still needed. In a similar fashion to logic synthesis, physical synthesis has to either make assumptions about the clock annotations or synthesize a clock network under the hood as it optimizes the physical netlist to meet the design constraints.

It is worth mentioning that the algorithms in both logic synthesis and physical synthesis are iterative in nature. If the chosen route is to synthesize the clock network under the hood, this leads to multiple clock network synthesis runs. Since the clock network is discarded at the end of both stages, prior to the final clock network synthesis, a lot of time and resources are wasted. In some cases this is due to the lack of a proper automation algorithm, given the complexity of the problem at hand (NP-complete problems); in other cases, the iterative and incremental nature of the flow exists for legacy reasons.

6.3.1 Clock Network Synthesis in a Flat Design Flow

The clock network synthesis approach is directly affected by the RTL-to-GDSII design flow. In a flat design flow, the clock network is synthesized in a flat fashion. However, if the design flow is hierarchical, the clock network synthesis can be done either hierarchically or in a similar fashion to the flat approach.

6.3.2 Clock Network Synthesis in a Hierarchical Design Flow

In a hierarchical implementation of the design, each partition has its own clock driver and its own clock tree or network. One way of creating the clock port for each partition is to synthesize the clock network flat and then use the information from the resulting network to add clock ports to the partitions. In addition, the resulting clock network provides information on the latency and skew in each partition.

The timing analysis in each partition is done with ideal clocks whose latency, skew and jitter/uncertainty values are estimated from the global clock planning. Since neither the placement nor the routing on which the clock plan relied is the final placed and routed netlist, the estimates could be off compared with the final placed, optimized and routed design.

6.4 Prerequisites for Clock Tree Synthesis

Before performing clock tree synthesis, several factors must be checked to determine whether the design is ready for CTS.

6.4.1 Design Prerequisites

Before running clock tree synthesis, the design should meet the following requirements; if any issues are raised, the designer has to repeat the previous steps.

The design must be placed and optimized. Check whether the placement is legalized. The estimated QoR for the design should meet your requirements before you start clock tree synthesis.

If congestion issues are not solved before clock tree synthesis, the addition of clock trees and the placement of buffers can increase the congestion. If the design is congested, you can rerun the placement step, or identify the congestion hot spots by reloading the design and remove them by finding the coordinates of the affected tracks.

To ensure that the clock tree can be routed, verify that the placement is such that the clock sinks are not in narrow channels and that there are no blockages between the clock root and its sinks; if these conditions occur, fix the placement before running clock tree synthesis. The power and ground nets must be prerouted, and high-fanout nets, such as scan enables, must be synthesized with buffers.


6.4.2 Library Prerequisites

Any cell in the logic library that you want to use as a clock tree reference (a

buffer or inverter cell that can be used to build a clock tree) or for sizing of gates

on the clock network must be usable by clock tree synthesis and optimization.

By default, clock tree synthesis and optimization cannot use buffers and inverters that have the dont_use attribute to build the clock tree. If a gate in the library lacks a reference pin, put dont_use on that gate, link the library, and then perform clock tree synthesis.

The physical library should include all clock tree references (the buffer and inverter cells that can be used to build the clock trees), routing information (which includes layer information and non-default routing rules), and the resistance and capacitance models used to estimate the interconnect resistance and capacitance.

6.5 Clock Distribution Architectures

The clock distribution network is responsible for providing a reliable and stable environment for the clock signal to reach the clock cells. To do so, the distribution network should provide immunity from systematic and random variations, which could distort the clock signal as it travels to its destinations.

The clock distribution network is typically composed of two parts: global and local. The global clock network delivers a reliable, low-skew clock to different parts (sections or blocks) of the chip. The global clock network could be a grid, a synthesized tree, an H-tree, or a hybrid network that combines these topologies. An H-tree driving a mesh or a set of spines is a favourite top-level clock network topology because of the simplicity of the H-tree, although its implementation in nanometre technologies has become very challenging. The integrity of both the global and the local parts of the distribution network is needed to provide a reliable and robust clock network.


6.5.1 Tree

It is the most common topology choice for ASIC designs. Although trees provide the least control over skew, they exhibit low power consumption and low area overhead. Low- to middle-frequency designs employ trees, while high-end designs employ custom-made topologies.

A tree generated by a clock tree synthesis engine is shown in Figure 6.2.

Figure 6.2 A tree generated by a clock tree synthesis engine [19]

6.5.2 H-Tree

H-tree is a tree topology that relies on matching the delays to all clock sinks in the network (Figure 6.3). This is accomplished by placing nodes at equidistant positions from their roots and by matching the delays to all nodes at the same level. In real designs, where hard macros and congestion may make such an ideal network unrealizable, the H-tree is optimized as far as possible to minimize whatever skew remains at its leaves.

Figure 6.3 H-tree balances skew by equidistant paths from root to sinks [19]
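The equidistance property of an ideal H-tree can be checked with a short recursive sketch. It assumes a square region with the clock root at its centre and a simple construction in which each level is an H of half the previous size; it returns every leaf position together with its wire length from the root. Real clock trees with macros and blockages will not match this ideal.

```python
def h_tree_leaves(x: float, y: float, half_w: float, half_h: float,
                  levels: int, path_len: float = 0.0):
    """Return (x, y, wire_length_from_root) for every leaf of an ideal H-tree.

    Each H is centred at (x, y): a horizontal bar reaching half_w to either
    side, with vertical bars reaching half_h up and down at its ends. The four
    bar tips are the centres of the next, half-size level.
    """
    if levels == 0:
        return [(x, y, path_len)]
    leaves = []
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            # Route half_w horizontally, then half_h vertically, to this tip.
            leaves += h_tree_leaves(x + dx, y + dy, half_w / 2, half_h / 2,
                                    levels - 1, path_len + half_w + half_h)
    return leaves


if __name__ == "__main__":
    # Hypothetical 8 x 8 die, clock root at its centre, two H levels.
    leaves = h_tree_leaves(4.0, 4.0, 2.0, 2.0, levels=2)
    print(len(leaves))                                  # 16 sinks
    print(sorted({length for _, _, length in leaves}))  # [6.0]: all paths equal
```

The 16 leaves land on the centres of a 4 x 4 grid of sub-regions, and every root-to-leaf path has the same length, which is exactly what gives the H-tree its zero nominal skew.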



A clock buffer in the clock tree is used to balance the output loads and minimize the clock skew. A delay line can be added to the network to meet the minimum insertion delay (for clock balancing), and buffers are used to speed up the clock signals. CTS has several effects on the design: several hundred clock buffers are added, placement and routing congestion may increase, and timing violations can be introduced.

Clock planning in a hierarchical implementation relies on a preliminary placement of the logic cells in the blocks; such a design flow clearly does not provide guarantees of convergence. Nonetheless, hierarchical design implementation is typically the adopted flow. It is this lack of convergence guarantees and the complexity of the flow that make designers lean toward a flat implementation whenever such an approach is feasible.

6.6 Algorithms for clock tree construction

The first geometric algorithms for clock routing evaluated skew in terms of wire length from the source to the sinks, and produced minimum-wire-length trees for a given sink clustering using the deferred-merge embedding (DME) principle. The DME algorithm defers the choice of the merging (tapping) points for the subtrees of the clock tree. The algorithm works in Manhattan geometry.
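The core step that DME applies bottom-up is finding a tapping point that gives zero skew when two subtrees are merged. The sketch below shows only that step under a linear (path-length) delay model in Manhattan geometry. It does not build the merging segments or perform the top-down embedding of the full algorithm, and the function names are illustrative.

```python
def manhattan(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Rectilinear (Manhattan) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def zero_skew_tap(t_a: float, t_b: float, dist: float) -> tuple[float, float]:
    """Zero-skew merge of subtrees A and B under a linear delay model.

    t_a, t_b : root-to-sink delay of each subtree, in wire-length units
    dist     : Manhattan distance between the two subtree roots
    Returns (x, t_merged): x is the wire length from root A to the tapping
    point and t_merged the delay from the tapping point to every sink.
    Balancing t_a + x = t_b + (dist - x) gives x = (t_b - t_a + dist) / 2.
    """
    x = (t_b - t_a + dist) / 2.0
    if x < 0:
        # A is slower than B plus the whole wire: tap at A and elongate
        # (snake) B's branch to length t_a - t_b.
        return 0.0, t_a
    if x > dist:
        # Mirror case: tap at B and elongate A's branch to t_b - t_a.
        return dist, t_b
    return x, t_a + x


if __name__ == "__main__":
    # Two hypothetical sinks at (0, 0) and (4, 0), each with zero delay.
    x1, t1 = zero_skew_tap(0.0, 0.0, manhattan((0, 0), (4, 0)))
    print(x1, t1)  # 2.0 2.0 -> tap at (2, 0)
    # Merge that subtree (root (2, 0), delay 2) with a sink at (2, 5).
    x2, t2 = zero_skew_tap(t1, 0.0, manhattan((2, 0), (2, 5)))
    print(x2, t2)  # 1.5 3.5
```

In full DME, the set of all points at distance x from root A and dist - x from root B forms a merging segment, and the exact tapping point is only chosen later, top-down, to minimize the total wire length.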

6.7 Analyzing the Clock Trees

Before running clock tree synthesis, analyze each clock tree to determine its characteristics and its relationship to other clock trees in the design. For each clock tree, determine the clock root pin (or the position of the clock root), the number of clock tree sinks and clock tree exceptions, and the number of clock tree levels. Also determine whether any pre-existing cells, such as clock-gating cells, are present in the design; whether any logical design rule constraints, such as maximum fanout, maximum transition time and maximum capacitance, are applied to the design; whether any routing constraints, such as routing rules and metal layers, are applied; and whether the clock tree has timing relationships with other clock trees in the design, such as interclock skew requirements.

6.7.1 Identify the Clock Tree End Points

Clock paths have two types of endpoints. Stop pins are the endpoints of the clock tree that are used for delay balancing. During clock tree synthesis, IC Compiler uses stop pins in calculations and optimizations for both design rule constraints and clock tree timing (skew and insertion delay).

Exclude pins are clock tree endpoints that are excluded from clock tree timing calculations and optimizations. Verify that the default sink pins (implicit stop pins), implicit nonstop pins and implicit exclude pins are accurate by generating a clock tree exceptions report. If they are correct, the clock tree exception definition is complete.

6.7.2 Analyzing Clock Sink Groups

A clock sink group is a group of clock sinks driven directly by a single net; the sink group takes the name of that net. Sink groups have timing relationships when an endpoint in one sink group has one or more startpoints or endpoints in another sink group. Each startpoint-endpoint pair forms one timing relationship path.
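The grouping and the timing relationships between groups can be sketched with a dictionary keyed by the driving net. The sink and net names in the usage are hypothetical, and the list of timing paths is assumed to come from a timing report as (startpoint, endpoint) pairs.

```python
from collections import defaultdict


def sink_groups(sink_to_net: dict[str, str]) -> dict[str, set[str]]:
    """Group clock sinks by the net that drives them; each group takes the net's name."""
    groups: dict[str, set[str]] = defaultdict(set)
    for sink, net in sink_to_net.items():
        groups[net].add(sink)
    return dict(groups)


def group_relationships(sink_to_net: dict[str, str],
                        timing_paths: list[tuple[str, str]]) -> dict[tuple[str, str], int]:
    """Count (startpoint, endpoint) timing paths that cross from one sink group
    to another; paths inside a single group are not relationships between groups."""
    rel: dict[tuple[str, str], int] = defaultdict(int)
    for start, end in timing_paths:
        g_start, g_end = sink_to_net[start], sink_to_net[end]
        if g_start != g_end:
            rel[(g_start, g_end)] += 1
    return dict(rel)


if __name__ == "__main__":
    # Hypothetical register clock pins and the clock nets that drive them.
    s2n = {"ff1": "clk_a", "ff2": "clk_a", "ff3": "clk_b", "ff4": "clk_c"}
    print(sink_groups(s2n))
    paths = [("ff1", "ff3"), ("ff2", "ff3"), ("ff1", "ff2"), ("ff4", "ff1")]
    print(group_relationships(s2n, paths))  # {('clk_a', 'clk_b'): 2, ('clk_c', 'clk_a'): 1}
```

Groups with many cross-group paths are the ones whose relative insertion delays matter most when the clock tree is balanced.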

6.7.3 Defining Clock Root Attributes

If the clock root is an input port (without an I/O pad cell), you must accurately specify the driving cell of the input port. A weak driving cell does not affect logic synthesis, because logic synthesis uses ideal clocks. However, during clock tree synthesis, a weak driving cell can cause IC Compiler to insert extra buffers as the tool tries to meet the clock tree design rule constraints, such as maximum transition time and maximum capacitance. If you do not specify a driving cell (or drive strength), IC Compiler assumes that the port has infinite drive strength. If the clock root is an input port with an I/O pad cell, you must accurately specify the input transition time of the input port.


6.8 Clock Skew Scheduling

As clock network design became more complex and design convergence became harder, the design emphasis shifted from designing minimum-skew clock networks to designing low-skew clock networks while reducing power and improving robustness. By utilizing the available skew, the timing convergence of the design can be enhanced. There are two approaches to skew scheduling: useful skew and intentional skew.

Useful skew converts the existing skew into a useful timing budget that can be allocated to critical or near-critical paths in the design. This is done by shifting the clock assertions so that STA uses the adjusted clock assertions when checking for timing violations and reporting critical paths.

Intentional skew is a sequential optimization technique in which skew is designed in as part of the logic/physical synthesis stage. The task becomes one of designing a clock network that satisfies the skew constraints generated by the synthesis engine to optimize performance.

This can be accomplished either by inserting intentional delay buffers/inverters on the clock paths that need to be delayed, or by sizing the buffers/inverters to decrease or increase the delay along some paths. Although skew scheduling reduces the number of clock nodes that need close-to-zero skew, the actual physical implementation of the clock network becomes harder. Another advantage of this approach is that it reduces the number of simultaneously switching clock cells by delaying the toggling of some of the registers. This is desirable in order to reduce the peak current drawn by the clock network and the noise injected into the power grid.
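Both approaches operate within the same per-path constraint. With the skew defined as $d(i,j) = a_i - a_j$ between launching register i and capturing register j, setup and hold bound the skew from above and below. The sketch below computes that window for one path; the parameter names and numbers are hypothetical, and clock uncertainty and on-chip variation are ignored.

```python
from typing import NamedTuple


class RegPath(NamedTuple):
    """Timing data for a path from launching register i to capturing register j (ns)."""
    t_cq: float      # clock-to-Q delay of register i
    d_min: float     # shortest combinational delay from i to j
    d_max: float     # longest combinational delay from i to j
    t_setup: float   # setup time of register j
    t_hold: float    # hold time of register j


def skew_window(p: RegPath, period: float) -> tuple[float, float]:
    """Bounds (lo, hi) on the skew d(i, j) = a_i - a_j.

    hold : a_i + t_cq + d_min >= a_j + t_hold            ->  d(i, j) >= lo
    setup: a_i + t_cq + d_max + t_setup <= a_j + period  ->  d(i, j) <= hi
    """
    lo = p.t_hold - p.t_cq - p.d_min
    hi = period - p.t_cq - p.d_max - p.t_setup
    return lo, hi


if __name__ == "__main__":
    # Hypothetical near-critical path on a 1 ns clock.
    lo, hi = skew_window(RegPath(0.1, 0.2, 1.0, 0.05, 0.03), period=1.0)
    print(round(lo, 3), round(hi, 3))  # -0.27 -0.15
    # Zero skew violates setup here; delaying j's clock by 0.15 to 0.27 ns
    # (d(i, j) between -0.27 and -0.15) meets both setup and hold.
```

In this example a zero-skew clock would fail setup, while deliberately delaying the capture clock by an amount inside the window repairs the path without creating a hold violation; that is the budget useful skew exploits.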

6.9 Optimization of Registers

Circuit optimization plays a key role in the performance and cost of the clock network. The selection and optimization of the type of latch or register used in the clock network has a great impact on the achieved skew and power. To properly assign the type of latch needed, timing analysis is performed to annotate the netlist with the correct path constraints and path slacks. Since different logic paths have different slacks, the fastest latch is not warranted on every path, and a trade-off can be made between power and performance on the less critical paths. Typically, such an optimization is not carried out as part of the back-end flow, since changing the registers used in the RTL netlist is not encouraged. However, given that most of the clock network power is in its last stage, this optimization can be worthwhile both for power and for converging on design timing.
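The power/performance trade-off can be illustrated with a simple selection rule: for each register, choose the lowest-power variant whose extra clock-to-Q delay still leaves non-negative slack. The cell names, delays and power numbers below are hypothetical, and a single slack value per register is a simplification of the real multi-path case.

```python
from typing import NamedTuple


class RegVariant(NamedTuple):
    name: str
    t_cq: float   # clock-to-Q delay (ns)
    power: float  # relative clock-pin/internal power


def choose_register(variants: list[RegVariant], slack_fastest: float) -> RegVariant:
    """Lowest-power variant that keeps the path slack >= 0.

    slack_fastest is the worst slack (ns) of the paths this register launches,
    measured with the fastest variant; a slower variant eats into it by the
    difference in clock-to-Q delay. If nothing fits, keep the fastest variant.
    """
    fastest = min(variants, key=lambda v: v.t_cq)
    feasible = [v for v in variants
                if slack_fastest - (v.t_cq - fastest.t_cq) >= 0]
    return min(feasible, key=lambda v: v.power) if feasible else fastest


if __name__ == "__main__":
    lib = [RegVariant("DFF_FAST", 0.08, 1.0),   # hypothetical cells
           RegVariant("DFF_STD", 0.12, 0.7),
           RegVariant("DFF_LP", 0.20, 0.5)]
    for slack in (0.30, 0.06, 0.01):
        print(slack, choose_register(lib, slack).name)  # DFF_LP, DFF_STD, DFF_FAST
```

Paths with generous slack get the low-power register, and only the near-critical paths pay for the fast one.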

6.10 Clock Analysis and On-Chip Variation

If statistical analysis algorithms are not employed in the verification of the clock network, designers have to rely on worst-case decisions to study the timing of the clock in the presence of process variations. This conservative approach makes it difficult to converge on the timing of a design with very tight timing constraints. In particular, when a cell is common to the clock paths that launch and capture a data path, a worst-case analysis may use a late delay for that cell on one path and an early delay on the other. This causes pessimism in the analysis, since it is not realistic for the same cell to be slow and fast at the same time.
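The usual remedy for this pessimism is common path pessimism removal (CPPR), which credits back the late-minus-early delay difference of the clock-tree cells shared by the launch and capture paths. The sketch below is a minimal illustration with hypothetical cell names and delays; it assumes each clock path is given as an ordered list of cells starting from the clock root.

```python
def shared_cells(launch: list[str], capture: list[str]) -> list[str]:
    """Clock-tree cells common to both paths (the prefix from the clock root)."""
    common = []
    for a, b in zip(launch, capture):
        if a != b:
            break
        common.append(a)
    return common


def cppr_credit(launch, capture, early, late) -> float:
    """Late-minus-early delay summed over the shared cells."""
    return sum(late[c] - early[c] for c in shared_cells(launch, capture))


def setup_clock_margin(launch, capture, early, late, cppr=True) -> float:
    """Capture minus launch clock arrival for a worst-case setup check
    (late delays on the launch path, early delays on the capture path);
    positive values help setup. With cppr=True the shared cells are no
    longer assumed to be slow and fast at the same time."""
    launch_arrival = sum(late[c] for c in launch)
    capture_arrival = sum(early[c] for c in capture)
    credit = cppr_credit(launch, capture, early, late) if cppr else 0.0
    return capture_arrival - launch_arrival + credit


if __name__ == "__main__":
    # Hypothetical clock-tree cells (root first) and min/max delays in ns.
    launch = ["buf1", "buf2", "buf3a"]
    capture = ["buf1", "buf2", "buf3b"]
    early = {"buf1": 0.10, "buf2": 0.10, "buf3a": 0.08, "buf3b": 0.09}
    late = {"buf1": 0.12, "buf2": 0.13, "buf3a": 0.10, "buf3b": 0.11}
    print(round(setup_clock_margin(launch, capture, early, late, cppr=False), 3))  # -0.06
    print(round(setup_clock_margin(launch, capture, early, late), 3))              # -0.01
```

After the credit, only the cells after the divergence point (buf3a versus buf3b) contribute variation, which is the physically realistic case.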


Chapter 7

Routing

7.1 Introduction

After CTS, the routing process determines the precise paths for interconnections. This includes the standard cell and macro pins, the pins on the block boundary, and the pads at the chip boundary. After placement and CTS, the tool has information about the exact locations of blocks, pins of blocks, and I/O pads at the chip boundaries. The logical connectivity, as defined by the netlist, is also available to the tool. In the routing stage, metal and vias are used to create the electrical connections in the layout so as to complete all connections defined by the netlist. To make the actual interconnections, the tool relies on a set of “Design Rules”.

It is essential that the tool completes all connections defined by the netlist (100% routability, i.e. no LVS errors), that no design rules are violated in completing the routes (no DRC errors), and that all timing constraints are met.

7.2 Process Design Rules

In the physical design flow, an input to the PnR tool is a ‘Technology File’ (or technology LEF for Cadence). It holds the constraints that the router should honour. A designer’s techfile has many parameters for each layer: for a layer such as M1, the minimum spacing, minimum width, minimum area, etc. are defined. It also specifies which via connects two metal layers, such as M1 and M2. If any routing done by the tool violates one of these parameters (spacing, width, via size, etc.), you will get a DRC error.

7.3 Routing Grid

Most of the available routers are grid-based routers, with routing grids defined over the entire layout; the grid can be viewed as a graph. For grid-based routers, a preferred routing direction is also defined for each metal layer, e.g. metal1 has a preferred direction of “horizontal”, metal2 has a preferred direction of “vertical”, and so on. So, over the whole layout, metal1 routing grids are drawn (superimposed) horizontally at the metal1 wire pitch, and metal2 grids are drawn vertically at the metal2 wire pitch. The technology file defines this “pitch” for each metal layer.

Figure 7.1 shows how routing grids are drawn. Only two metals are considered here, but in a process with more metals, similar grids are superimposed on the layout for all available metals. The pitch is calculated by determining the minimum spacing required between grid lines of the same metal. This can be the minimum spacing of the metal itself, but it is usually a greater value, because it also takes the via dimensions into account so that no two adjacent wires on the grid create a DRC violation even when vias are present.

Figure 7.1 Routing grids [20]
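The pitch calculation described above can be written as a small function that takes the larger of the line-to-line and line-to-via spacing requirements. The rule names and values are hypothetical, not taken from any real technology file, and the via landing pad is assumed to be square.

```python
def track_pitch(min_width: float, min_spacing: float,
                via_cut: float, via_enclosure: float) -> float:
    """Smallest centre-to-centre track spacing that stays DRC-clean when a
    via lands on one of two adjacent tracks.

    line_to_line : two minimum-width wires at minimum spacing
    line_to_via  : a wire next to a via landing pad (cut plus enclosure on
                   both sides), again at minimum spacing
    Two vias side by side on adjacent tracks would need even more
    (landing + min_spacing); this sketch does not cover that case.
    """
    landing = max(via_cut + 2 * via_enclosure, min_width)
    line_to_line = min_width + min_spacing
    line_to_via = min_width / 2 + min_spacing + landing / 2
    return max(line_to_line, line_to_via)


if __name__ == "__main__":
    # Hypothetical metal1 rules in nm (not from any real technology file).
    print(track_pitch(min_width=50, min_spacing=50, via_cut=50, via_enclosure=10))  # 110.0
```

With these numbers the via landing pad, not the wire spacing, sets the pitch, which is why the routing pitch is usually larger than width plus minimum spacing.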

In a grid-based routing algorithm, the router switches metal layers according to their preferred directions to interconnect the nodes. In Figure 7.2, metal1 and metal2 wires are drawn along the metal1 and metal2 grids respectively and are interconnected by via1 to complete the routing path.



Figure 7.2 Routing with two different metals [20]
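A grid-based router of this kind can be sketched as a shortest-path search over (layer, x, y) grid nodes, where metal1 nodes may only step horizontally, metal2 nodes only vertically, and a via1 edge switches layers at the same (x, y). This is a Lee-style maze search run as Dijkstra's algorithm so that a via can cost more than a track step. The grid size, blockages and via cost are assumptions, and production routers are far more elaborate.

```python
import heapq

Node = tuple[int, int, int]  # (layer, x, y); layer 0 = metal1, 1 = metal2


def route(width: int, height: int, src: Node, dst: Node,
          blocked: frozenset = frozenset(), via_cost: int = 2):
    """Cheapest src -> dst path on a two-layer grid, or None if unroutable.

    Metal1 (layer 0) may only step horizontally and metal2 (layer 1) only
    vertically, one track per step at cost 1; a via1 hop between the layers
    at the same (x, y) costs via_cost. `blocked` holds unusable grid nodes.
    """
    dist = {src: 0}
    prev: dict = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [node]
            while node in prev:          # walk back to the source
                node = prev[node]
                path.append(node)
            return path[::-1]
        if d > dist[node]:
            continue                     # stale heap entry
        layer, x, y = node
        steps = [(1, 0), (-1, 0)] if layer == 0 else [(0, 1), (0, -1)]
        moves = [((layer, x + dx, y + dy), 1) for dx, dy in steps]
        moves.append(((1 - layer, x, y), via_cost))   # via1 to the other layer
        for nxt, cost in moves:
            _, nx, ny = nxt
            if not (0 <= nx < width and 0 <= ny < height) or nxt in blocked:
                continue
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None


if __name__ == "__main__":
    # Hypothetical 5 x 5 grid; connect a metal1 pin at (0, 0) to one at (3, 3).
    print(route(5, 5, (0, 0, 0), (0, 3, 3)))
```

Raising via_cost makes the search prefer longer same-layer runs over extra vias, which mirrors how real routers weigh via count against wire length.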

7.4 Global & Detailed routing

PnR tools perform routing in several stages, such as global routing, track assignment and detailed routing. These algorithmic stages may also be hidden from the user, leaving only a couple of commands to work with. Most PnR tools approach the routing problem in two stages. In global routing, the tool partitions the design into routing regions and determines a rough route, taking into account the number of tracks available in each region. Routing congestion is also determined at this stage by calculating 1) how many nets should pass through the region, and 2) how many routing tracks are available in the region. In detailed routing, the global routing results are used to lay the actual wires interconnecting the nodes. Consult the manual page (man) of the routing options command to see how much control is available in each of these stages for the tool of your choice.
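The two quantities used to judge congestion, the nets that must pass through a region and the tracks available in it, can be compared per region as in the sketch below. It assumes a crude demand model in which each net is counted once in every global-routing cell (gcell) of its bounding box; real global routers estimate demand per routing edge and direction, so treat this only as an illustration.

```python
def congestion_map(nets: list[list[tuple[int, int]]], cols: int, rows: int,
                   tracks_per_gcell: int):
    """Crude global-routing congestion estimate.

    nets: list of nets, each a list of (col, row) gcell coordinates of its pins.
    Demand: every net is counted once in each gcell of its bounding box.
    Returns (demand, overflow) as rows x cols matrices, where
    overflow = max(0, demand - tracks_per_gcell).
    """
    demand = [[0] * cols for _ in range(rows)]
    for pins in nets:
        xs = [c for c, _ in pins]
        ys = [r for _, r in pins]
        for r in range(min(ys), max(ys) + 1):
            for c in range(min(xs), max(xs) + 1):
                demand[r][c] += 1
    overflow = [[max(0, d - tracks_per_gcell) for d in row] for row in demand]
    return demand, overflow


def hotspots(overflow: list[list[int]]) -> list[tuple[int, int, int]]:
    """(col, row, overflow) for every over-capacity gcell, worst first."""
    spots = [(c, r, v) for r, row in enumerate(overflow)
             for c, v in enumerate(row) if v > 0]
    return sorted(spots, key=lambda s: -s[2])


if __name__ == "__main__":
    # Three hypothetical two-pin nets on a 3 x 3 gcell grid, one track per gcell.
    nets = [[(0, 0), (2, 0)], [(0, 0), (0, 2)], [(0, 0), (1, 1)]]
    demand, overflow = congestion_map(nets, cols=3, rows=3, tracks_per_gcell=1)
    print(hotspots(overflow))  # [(0, 0, 2), (1, 0, 1), (0, 1, 1)]
```

The gcells reported as hotspots are the candidates for placement changes or routing blockages before detailed routing starts.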

7.5 Routing Congestion

It is difficult to route a highly congested design. Even designs that are not heavily congested overall may have pockets of high congestion, which again create routing issues. It is therefore important that congestion is analysed and fixed before detailed routing. After CTS, the tool can provide a congestion map based on trial route or global route results.



There are commands to check routability, such as check_routability, which report congestion numbers, blocked pins, etc.

7.6 Routing Order

It is recommended to route sensitive nets, such as clocks, before the rest of the signal nets. Power routing is completed after the floorplan stage. In general, the order of routing is:

7.6.1 Power Routing

Connect the macro and standard cell power pins to the power rings and straps created for the design, taking IR drop into account.

7.6.2 Clock Routing

The skew and delay values of the clock nets should be disturbed as little as possible. The clocks are therefore given higher priority in using routing resources and are routed before any other nets. Clock routing can be limited to the higher metal layers for lower RC values.

7.6.3 Signal Routing

The rest of the nets are then routed. Groups of nets can also be routed together, and non-default routing rules can be applied to selected nets.


Chapter 8

Signal Integrity

8.0 Signal Integrity

Signal integrity is the ability of an electrical signal to carry information

reliably and to resist the effects of high-frequency electromagnetic interference from

nearby signals. The following conditions can impact signal integrity:

8.1 Introduction

For nanometre designs, it is no longer sufficient just to achieve timing closure; a design must also reach signal integrity (SI) closure. SI closure implies that the design is free from SI-related functional problems and meets its timing goals while accounting for the impact of SI (see Figure 8.1).

Figure 8.1 SI closure criteria [21]

8.2 Crosstalk

Crosstalk is the undesirable electrical interaction between two or more

physically adjacent nets due to capacitive coupling. Crosstalk can lead to crosstalk-

induced delay changes or static noise.
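A common first-order estimate of crosstalk noise treats the victim and aggressor as a capacitive divider. When the aggressor switches rail to rail next to a quiet, undriven victim, the victim glitch is about $V_{dd} \cdot C_c / (C_c + C_g)$, where $C_c$ is the coupling capacitance and $C_g$ the victim's grounded capacitance. The sketch below applies that estimate; it is an upper bound (a driven victim recovers), and the net names and capacitances are hypothetical.

```python
def coupling_noise_bound(vdd: float, c_couple: float, c_ground: float) -> float:
    """Charge-sharing estimate of the peak glitch on a quiet victim net when an
    adjacent aggressor switches rail to rail: vdd * Cc / (Cc + Cg).

    This is an upper bound for an undriven (floating) victim; a victim held
    by its driver discharges the injected charge, so its real glitch is smaller.
    """
    return vdd * c_couple / (c_couple + c_ground)


def noisy_nets(nets: dict[str, tuple[float, float]], vdd: float,
               threshold: float) -> list[str]:
    """Names of nets whose noise bound exceeds a threshold (volts).

    nets: net name -> (coupling capacitance, grounded capacitance) in farads.
    """
    return sorted(name for name, (cc, cg) in nets.items()
                  if coupling_noise_bound(vdd, cc, cg) > threshold)


if __name__ == "__main__":
    # Hypothetical nets: a weakly shielded reset line and a well-spaced bus bit.
    nets = {"reset_n": (20e-15, 30e-15), "data3": (5e-15, 45e-15)}
    print(noisy_nets(nets, vdd=1.0, threshold=0.3))  # ['reset_n']
```

The formula makes the trade-off concrete: lowering the ratio of coupling to grounded capacitance, for example by spacing or shielding, directly lowers the worst-case glitch.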


8.3 SI Closure Methodologies

In order to achieve SI closure efficiently, certain design methodology decisions should be made up front, based on the product schedule and market requirements. SI avoidance is the most efficient way to achieve SI closure, but it needs to be balanced against the trade-offs of other design metrics such as area, performance, and power. For example, most SI problems can be avoided by spreading wires farther apart and reducing the ratio of coupling capacitance to grounded capacitance.

However, if this approach is applied everywhere in the design the result is a

much larger die and increased cost. For certain critical nets, such as clocks or chip-

level buses, a practical solution could involve using wider wires, shielding with power

and ground lines, using repeaters to break up wire lengths, using different routing

layers for adjacent wires, or using 2-3X minimum spacing.

Other up-front decisions can be based around the selection of intellectual

property (IP) blocks. Ideally IP blocks should neither be noise-sensitive nor noise

sources. This applies to all forms of IP from standard cells, memories, I/Os, and

custom digital or analog cores. If an IP block is noise-sensitive or a noise source, then

early decisions can be made to protect this block—such as using guard-rings,

applying blockages to prevent over-the-block or near-block routing, spacing, or

shielding, or even selecting an alternative implementation of the same function.

All of the SI methodology choices mentioned above can be made early in the design process. They all involve trade-offs in terms of area, performance, and engineering schedule. They can be implemented as design methodology restrictions, and the implementation tools can then enforce these decisions in a correct-by-construction fashion. Creating design restrictions that minimize or eliminate certain noise sources or noise-sensitive blocks or nets prior to implementation will greatly enhance SI closure productivity.

8.4 SI Prevention

A number of techniques can be used to prevent SI problems during design

creation. During placement for example, the placement can be optimized to avoid

over-congested areas. Congested areas increase the likelihood of congested wires

leading to an increase in crosstalk. Other techniques during placement include

balancing slews within the design so that there are no very fast or very slow signal

transitions. Very fast transitions when present on aggressors will lead to an increase in

crosstalk. Weakly driven nets with slow transitions are potential crosstalk victims if

there is significant coupling on these nets. Typical examples of weakly driven nets are

non-timing critical signals such as resets or scan lines. These nets tend to be long and,

consequently, subject to many potential aggressors. A noise glitch on a reset line can

cause intermittent resetting of a chip while a noise failure on the scan line will make

testing a design very problematic. Using these heuristics during placement greatly

decreases the occurrence of these types of SI failures as shown in the figure 8.2.

Figure 8.2 buffer insertion to victim & aggressor [21]
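The placement heuristics above can be expressed as a simple screening rule. Very fast transitions mark likely aggressors, and weakly driven nets with slow transitions and significant coupling mark likely victims. The sketch below is a hypothetical illustration: the `Net` record, the thresholds and the example nets are all assumptions. A real flow would take transition times from timing analysis and coupling from extraction.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    transition_ns: float   # worst transition (slew) on the net
    coupling_ratio: float  # Cc / (Cc + Cg) for the net


# Hypothetical limits; real values depend on the library and the design.
FAST_SLEW_NS = 0.05
SLOW_SLEW_NS = 0.50
MAX_COUPLING = 0.30


def classify(net: Net) -> str:
    """Label a net as a likely aggressor, a likely victim, or neither."""
    if net.transition_ns < FAST_SLEW_NS:
        return "aggressor risk"      # very fast edge couples strongly into neighbours
    if net.transition_ns > SLOW_SLEW_NS and net.coupling_ratio > MAX_COUPLING:
        return "victim risk"         # weakly driven and heavily coupled
    return "ok"


if __name__ == "__main__":
    nets = [
        Net("rst_n", 0.80, 0.45),        # long, weakly driven reset (hypothetical)
        Net("scan_en", 0.65, 0.20),      # slow but lightly coupled (hypothetical)
        Net("data_bus_3", 0.03, 0.10),   # very fast edge (hypothetical)
        Net("n123", 0.15, 0.25),
    ]
    for n in nets:
        print(f"{n.name:12s} {classify(n)}")
```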

While SI prevention during placement will help reduce certain SI problems,

the main prevention effort should come during routing. As SI is inherently a wiring

problem, it has become necessary to address SI prevention as the design is being

routed. Crosstalk effects, such as glitch and delay, can only effectively be measured

when physical wires are available and final wire topology, layer selection, and track

assignments are concrete. Since placement-based and global route-based SI

prevention solutions do not have this detailed information with which to make trade-

offs, they are only a partial solution. In the nanometre era, physical wire effects need

to be taken into account to achieve reliable timing and SI closure during the final

routing stage of the design.

The key to successful SI closure during routing comes from having native on-

the-fly incremental extraction, timing analysis, SI analysis, and optimization. This

means that potential SI problems can be addressed as they occur. A number of prevention techniques can be employed during routing to correct SI issues: wire spacing, net ordering, layer selection to reduce coupling and resistance, minimizing parallel wire lengths, shielding, buffer insertion and gate resizing.

8.5 SI Analysis and Repair

After SI-aware routing is complete, a full detailed extraction and analysis

should be performed to determine if there are any remaining SI problems. This

analysis should include identifying potential functional and timing problems

introduced by SI. Functionality checking should involve calculating the worst-case

potential crosstalk glitch that can occur on every wire and propagating that glitch to a

storage element such as a latch or flip-flop to determine if it will cause a stored logic

state to change. A noise failure criterion based on latching glitches, rather than noise

peak on each victim or noise rejection curves on each receiving cell, will reduce the

number of potential repairs by several orders of magnitude.
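The latching-based failure criterion can be illustrated with a simplified check. Consider a glitch that has already been propagated to a flip-flop's D pin. It changes the stored state only if it is above the receiver's switching threshold while the flip-flop is sampling. In other words, it must overlap the window from the setup time before the clock edge to the hold time after it. The rectangular-pulse model and all numbers below are assumptions. The 0.07 ns setup time is borrowed from the library setup time in report 9.1.1, and the hold time is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glitch:
    start_ns: float  # when the glitch crosses the threshold at the D pin
    end_ns: float    # when it falls back below the threshold
    peak_v: float    # peak amplitude at the D pin


def is_latched(g: Glitch, clock_edge_ns: float, setup_ns: float,
               hold_ns: float, threshold_v: float) -> bool:
    """True if the glitch is large enough and overlaps the sampling window."""
    if g.peak_v < threshold_v:
        return False  # too small to be seen as a logic change
    window_start = clock_edge_ns - setup_ns
    window_end = clock_edge_ns + hold_ns
    return g.start_ns < window_end and g.end_ns > window_start


if __name__ == "__main__":
    edge, setup, hold, vth = 30.0, 0.07, 0.02, 0.6   # vth: half of a 1.2 V supply
    for g in (Glitch(29.95, 30.05, 0.7),   # overlaps the window -> latched
              Glitch(28.00, 28.20, 0.7),   # too early -> filtered
              Glitch(29.95, 30.05, 0.3)):  # too small -> filtered
        print(g, "->", "failure" if is_latched(g, edge, setup, hold, vth) else "ok")
```

Only the first glitch would be reported as a functional failure. The other two would be counted as violations under a peak-noise criterion, which is why the latching criterion needs far fewer repairs.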

In an SI prevention-based flow that uses noise propagation as the failure criterion, the number of potential violations found post-route should be relatively small: typically fewer than 50 for a design with 500K instances (~2M gates) at 250 MHz using a 130 nm process. Consequently, repairing the remaining functional noise problems (if any) is easily achieved through automatic or even manual repair. In contrast, if the noise failure criterion is such that thousands of functional noise problems are reported, then the repair effort can be significant and may not converge.

A number of techniques can be used to repair glitch problems, as shown in figure 8.3:

- upsizing the victim's driver;
- downsizing the aggressors' drivers;
- inserting buffers or repeaters to break the crosstalk effects into smaller constituents;
- spacing, shielding or re-routing wires.

The key to successful convergence on repair is to find the solution that creates the least disturbance to the existing design. For example, if a net is re-routed to reduce coupling, the original timing can be maintained by restricting the length of the new route to be similar to that of the original route.

Figure 8.3 Adding Shielding to aggressors [21]
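The least-disturbance rule for re-routing can be sketched as a filter over candidate routes. It keeps only candidates whose length stays within a tolerance of the original route, so that timing is roughly preserved, and that actually reduce coupling. It then picks the candidate with the least coupling. The `Route` record, the 10% tolerance and the candidate values are hypothetical; a router would generate and evaluate the candidates itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Route:
    name: str
    length_um: float
    coupling_ff: float  # total coupling capacitance to aggressors


def pick_reroute(original: Route, candidates: Sequence[Route],
                 max_length_change: float = 0.10) -> Optional[Route]:
    """Lowest-coupling candidate whose length stays close to the original."""
    limit = max_length_change * original.length_um
    allowed = [c for c in candidates
               if abs(c.length_um - original.length_um) <= limit
               and c.coupling_ff < original.coupling_ff]
    return min(allowed, key=lambda c: c.coupling_ff, default=None)


if __name__ == "__main__":
    original = Route("original", 200.0, 12.0)
    candidates = [Route("A", 205.0, 6.0),   # similar length, much less coupling
                  Route("B", 260.0, 2.0),   # least coupling but 30% longer
                  Route("C", 190.0, 8.0)]
    best = pick_reroute(original, candidates)
    print(best.name if best else "no acceptable re-route")  # A
```

Route B is rejected even though it couples least. Its extra length would change the net's delay and could create a new timing violation.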

More challenging than fixing functional violations, however, is fixing the

impact SI has on timing. The additional delay changes caused by crosstalk increase

the degree of difficulty for achieving timing closure. First a post-route static timing

analysis of the design must be performed to determine if any new setup or hold

violations have been introduced by SI. Each new failing path needs to be re-

optimized. This timing repair process must endeavour to fix the failing paths with the

minimum of design disturbance while identifying the optimal way to regain lost time.

Fixes for timing violations can include traditional in-place optimizations as

well as crosstalk reduction techniques such as those used to repair functional

violations. To converge quickly on repairs, both functional and timing problems

should be repaired simultaneously. As each potential repair is implemented it should

be incrementally analyzed to determine if it really fixes the problem and to ensure it

does not introduce a new timing or functional glitch.

After all repairs have been implemented, the design should be considered

closed and ready for final verification and SI sign-off.

Chapter 9

Results

9.1 Synthesis Results

Synthesis is performed on the asynchronous FIFO design (aFifo).

9.1.1 Timing Reports

****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:47 2013
****************************************

Operating Conditions: TYPICAL   Library: saed90nm_typ
Wire Load Model Mode: top

  Startpoint: ReadEn_in (input port clocked by WClk)
  Endpoint: GrayCounter_pRd/BinaryCount_reg[3]
            (rising edge-triggered flip-flop clocked by RClk)
  Path Group: RClk
  Path Type: max

  Point                                             Incr      Path
  ----------------------------------------------------------------
  clock WClk (rise edge)                           24.00     24.00
  clock network delay (ideal)                       0.00     24.00
  input external delay                              0.50     24.50 f
  ReadEn_in (in)                                    0.00     24.50 f
  U642/QN (NAND2X0)                                 0.04     24.54 r
  U643/QN (INVX0)                                   0.13     24.67 f
  U639/QN (NAND2X0)                                 0.08     24.75 r
  U957/QN (INVX0)                                   0.05     24.79 f
  U637/QN (NAND2X0)                                 0.06     24.85 r
  U987/QN (NOR2X0)                                  0.05     24.90 f
  U988/QN (NOR2X0)                                  0.04     24.94 r
  U989/Q (MUX21X1)                                  0.08     25.02 r
  GrayCounter_pRd/BinaryCount_reg[3]/D (DFFX1)      0.00     25.02 r
  data arrival time                                          25.02

  clock RClk (rise edge)                           30.00     30.00
  clock network delay (ideal)                       0.00     30.00
  clock uncertainty                                -4.00     26.00
  GrayCounter_pRd/BinaryCount_reg[3]/CLK (DFFX1)    0.00     26.00 r
  library setup time                               -0.07     25.93
  data required time                                         25.93
  ----------------------------------------------------------------
  data required time                                         25.93
  data arrival time                                         -25.02
  ----------------------------------------------------------------
  slack (MET)                                                 0.91


  Startpoint: Empty_out_reg
              (rising edge-triggered flip-flop clocked by RClk)
  Endpoint: Empty_out (output port clocked by WClk)
  Path Group: WClk
  Path Type: max

  Point                                             Incr      Path
  ----------------------------------------------------------------
  clock RClk (rise edge)                            0.00      0.00
  clock network delay (ideal)                       0.00      0.00
  Empty_out_reg/CLK (DFFASX2)                       0.00      0.00 r
  Empty_out_reg/Q (DFFASX2)                         3.79      3.79 r
  Empty_out (out)                                   0.00      3.79 r
  data arrival time                                           3.79

  clock WClk (rise edge)                            6.00      6.00
  clock network delay (ideal)                       0.00      6.00
  clock uncertainty                                -0.50      5.50
  output external delay                            -0.50      5.00
  data required time                                          5.00
  ----------------------------------------------------------------
  data required time                                          5.00
  data arrival time                                          -3.79
  ----------------------------------------------------------------
  slack (MET)                                                 1.21

As shown in report 9.1.1, the design has two clocks, the read clock RClk and the write clock WClk, so that data can be read after it has been written. In the first path, the endpoint GrayCounter_pRd/BinaryCount_reg[3] is the capture flip-flop. It is clocked at the rising edge of RClk and must receive its data before its setup time. The lines beginning with “clock” describe the clock path. The Incr column gives the incremental delay added at each point (the cell delay, including the delay of the net it drives). The Path column gives the cumulative delay from the start of the path. The letters r and f indicate a rising or falling transition at that point.
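Because the Path column is the running total of the Incr column, the first path of report 9.1.1 can be checked by summing the increments. The values below are copied from the report; the helper function is only an illustration. The printed increments are rounded to two decimals while the tool accumulates unrounded delays. As a result, the running sum can drift from the Path column by about 0.01 ns: it reaches 25.03 ns against the reported 25.02 ns.

```python
# (point, Incr, Path) for the RClk path of report 9.1.1, times in ns.
ROWS = [
    ("clock WClk (rise edge)", 24.00, 24.00),
    ("clock network delay (ideal)", 0.00, 24.00),
    ("input external delay", 0.50, 24.50),
    ("ReadEn_in (in)", 0.00, 24.50),
    ("U642/QN (NAND2X0)", 0.04, 24.54),
    ("U643/QN (INVX0)", 0.13, 24.67),
    ("U639/QN (NAND2X0)", 0.08, 24.75),
    ("U957/QN (INVX0)", 0.05, 24.79),
    ("U637/QN (NAND2X0)", 0.06, 24.85),
    ("U987/QN (NOR2X0)", 0.05, 24.90),
    ("U988/QN (NOR2X0)", 0.04, 24.94),
    ("U989/Q (MUX21X1)", 0.08, 25.02),
    ("GrayCounter_pRd/BinaryCount_reg[3]/D (DFFX1)", 0.00, 25.02),
]


def running_path(rows):
    """Running sum of the Incr column, one value per row."""
    total, out = 0.0, []
    for _point, incr, _path in rows:
        total += incr
        out.append(total)
    return out


if __name__ == "__main__":
    for (point, _incr, path), summed in zip(ROWS, running_path(ROWS)):
        print(f"{point:46s} report {path:6.2f}  summed {summed:6.2f}")
```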

9.1.2 Data Required time and Data Arrival Time

The “data arrival time” shown in report 9.1.1 is the time that elapses from the launch clock edge at the path source until the data arrives at the endpoint. The “data required time” is the latest allowable arrival time for the data at the path endpoint. It takes into account the nominal capture clock edge time, the clock network delay of the capture clock (its earliest possible arrival), the clock uncertainty, and the setup time of the capturing flip-flop, which is taken from the library.

9.1.3 Slack

The slack value shown in report 9.1.1 is the difference between the data required time and the data arrival time. The slack is the amount of time by which the timing constraint is met, considering the latest possible arrival of data at the endpoint and the earliest possible arrival of the capture clock edge. In this example the slack is positive on both paths: 0.91 ns on the RClk path and 1.21 ns on the WClk path, so the timing constraints are met. A slack of zero would mean that a constraint is only just met. A negative slack would require a change in the design to fix the violation. On the other hand, a large positive slack offers opportunities for optimization.
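Two small functions reproduce the calculation behind report 9.1.1. They illustrate the arithmetic only and are not the tool's implementation. For the RClk path, the required time is the 30.00 ns capture edge plus the (ideal) clock network delay, minus the 4.00 ns clock uncertainty and the 0.07 ns library setup time. For the WClk path, the 0.50 ns output external delay plays the role of the setup time.

```python
def required_time(capture_edge: float, capture_latency: float,
                  uncertainty: float, endpoint_margin: float) -> float:
    """Setup required time in ns.

    endpoint_margin is the library setup time for a flip-flop endpoint or the
    output external delay for an output-port endpoint.
    """
    return capture_edge + capture_latency - uncertainty - endpoint_margin


def slack(required: float, arrival: float) -> float:
    """Positive slack: the constraint is met with this much margin (ns)."""
    return required - arrival


if __name__ == "__main__":
    # RClk path of report 9.1.1 (ReadEn_in -> GrayCounter_pRd/BinaryCount_reg[3]/D)
    req_r = required_time(30.00, 0.00, 4.00, 0.07)
    print(f"RClk: required {req_r:.2f} ns, slack {slack(req_r, 25.02):.2f} ns")  # 25.93, 0.91
    # WClk path of report 9.1.1 (Empty_out_reg -> Empty_out)
    req_w = required_time(6.00, 0.00, 0.50, 0.50)
    print(f"WClk: required {req_w:.2f} ns, slack {slack(req_w, 3.79):.2f} ns")   # 5.00, 1.21
```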

9.1.4 Setup and Hold Time

Every flip-flop has a restrictive time region around the active clock edge in which its input should not change. The region is called restrictive because an input change within it may not be captured correctly and can corrupt the output. The setup time is the interval before the clock edge during which the data must be held stable.

Figure 9.1 Setup and Hold Time

As shown in figure 9.1, the setup and hold times define a timing window around the clocking event. During this window the synchronous input must remain stable and unchanged in order to be recognized. If either time is violated, correct operation of the flip-flop is not guaranteed.

The hold time is the interval after the clock edge during which the data must be held stable. Hold time can be negative, which means the data can change slightly before the clock edge and still be properly captured. Most current flip-flops have zero or negative hold time.

Setup time violations can be avoided in several ways:

- optimize the combinational logic between the flip-flops for minimum delay;
- redesign the flip-flops to have a smaller setup time;
- tweak the launch flip-flop to have a better slew at its clock pin, which makes it faster and thereby helps to fix setup violations.

Hold time violations can be avoided by adding delay on the data path (using buffers). Lockup latches can also be added where the hold time requirement is very large, basically to avoid data slip.
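The hold check and the buffer-based fix can be sketched as follows. For hold, the earliest data arrival must come after the capture clock edge plus the clock latency, uncertainty and hold time. Delay buffers raise the arrival time, but each one also consumes setup slack on the same path, so the sketch refuses a fix that would break setup. The fixed delay per buffer is an assumption. All numbers in the example are hypothetical; 0.91 ns is simply reused from report 9.1.1 as an example setup slack.

```python
import math
from typing import Optional


def hold_slack(arrival_min: float, capture_edge: float, capture_latency: float,
               hold_time: float, uncertainty: float = 0.0) -> float:
    """Hold (min-delay) slack at a flip-flop, in ns; negative means a violation."""
    required_min = capture_edge + capture_latency + uncertainty + hold_time
    return arrival_min - required_min


def delay_buffers_needed(hold_slack_ns: float, setup_slack_ns: float,
                         buffer_delay_ns: float) -> Optional[int]:
    """Delay buffers to add on the data path, or None if setup would then fail."""
    if hold_slack_ns >= 0:
        return 0
    n = math.ceil(-hold_slack_ns / buffer_delay_ns)
    if n * buffer_delay_ns > setup_slack_ns:
        return None  # the added delay would use up all of the setup slack
    return n


if __name__ == "__main__":
    hs = hold_slack(arrival_min=0.12, capture_edge=0.0, capture_latency=0.25,
                    hold_time=0.05, uncertainty=0.05)
    print(f"hold slack = {hs:.2f} ns")                                    # -0.23
    print("buffers:", delay_buffers_needed(hs, setup_slack_ns=0.91,
                                           buffer_delay_ns=0.08))         # 3
```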

9.1.5 Power report

****************************************
Report : power
        -analysis_effort low
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:49 2013
****************************************

Library(s) Used:

    saed90nm_typ (File: /home/11011J6033/dcshellfinal/ref/saed90nm_typ.db)

Operating Conditions: TYPICAL   Library: saed90nm_typ
Wire Load Model Mode: top

Global Operating Voltage = 1.2
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1pW

  Cell Internal Power  = 401.1208 uW   (82%)
  Net Switching Power  =  87.0481 uW   (18%)
                         --------------------
Total Dynamic Power    = 488.1689 uW  (100%)

Cell Leakage Power     =  24.7386 uW

Report 9.1.5 shows the power consumed by the design. The total dynamic power is the sum of the cell internal power and the net switching power; the cell leakage power is reported separately. Cell internal power is obtained from the library data for the particular cells used in the design. Switching power is dissipated when charging and discharging the load capacitance at the cell outputs. The voltage, capacitance and time units are taken by default from the library.
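The totals in report 9.1.5 can be recomputed from their components. Internal plus switching power gives the total dynamic power, and adding leakage gives the overall power. The sketch also includes the first-order switching-power model P = α·C·V²·f that underlies net switching power. The activity, load and frequency in the example call are hypothetical. The activity convention (0→1 transitions per cycle) is an assumption; some texts use a toggle rate together with an extra factor of 1/2.

```python
# Values copied from power report 9.1.5, in uW.
CELL_INTERNAL_UW = 401.1208
NET_SWITCHING_UW = 87.0481
CELL_LEAKAGE_UW = 24.7386

dynamic_uw = CELL_INTERNAL_UW + NET_SWITCHING_UW
total_uw = dynamic_uw + CELL_LEAKAGE_UW


def switching_power_w(activity: float, c_load_f: float, vdd_v: float,
                      freq_hz: float) -> float:
    """First-order net switching power P = a * C * V^2 * f, in watts.

    activity is the average number of 0->1 transitions per clock cycle; each such
    transition draws C * V^2 from the supply, dissipated over the charge and the
    following discharge.
    """
    return activity * c_load_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    print(f"total dynamic power = {dynamic_uw:.4f} uW")                   # 488.1689
    print(f"internal share = {CELL_INTERNAL_UW / dynamic_uw:.0%}, "
          f"switching share = {NET_SWITCHING_UW / dynamic_uw:.0%}")       # 82%, 18%
    print(f"dynamic + leakage = {total_uw:.4f} uW")                       # 512.9075
    # Hypothetical net: 10% activity, 10 fF load, 1.2 V, 1 GHz.
    print(f"example net: {switching_power_w(0.1, 10e-15, 1.2, 1e9) * 1e6:.2f} uW")  # 1.44
```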

9.1.6 QoR (Quality of Results)

****************************************
Report : qor
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:47 2013
****************************************

  Timing Path Group 'RClk'
  --------------------------------------
  Levels of Logic:                  8.00
  Critical Path Length:             0.52
  Critical Path Slack:              0.91
  Critical Path Clk Period:        30.00
  Total Negative Slack:             0.00
  No. of Violating Paths:           0.00
  Worst Hold Violation:             0.00
  Total Hold Violation:             0.00
  No. of Hold Violations:           0.00
  --------------------------------------

  Timing Path Group 'WClk'
  --------------------------------------
  Levels of Logic:                  0.00
  Critical Path Length:             3.79
  Critical Path Slack:              1.21
  Critical Path Clk Period:         6.00
  Total Negative Slack:             0.00
  No. of Violating Paths:           0.00
  Worst Hold Violation:             0.00
  Total Hold Violation:             0.00
  No. of Hold Violations:           0.00
  --------------------------------------

  Cell Count
  --------------------------------------
  Hierarchical Cell Count:             0
  Hierarchical Port Count:             0
  Leaf Cell Count:                   551
  Buf/Inv Cell Count:                  7
  CT Buf/Inv Cell Count:               0
  --------------------------------------

  Area
  --------------------------------------
  Combinational Area:        3700.889009
  Noncombinational Area:     3924.139912
  Net Area:                     0.000000
  --------------------------------------
  Cell Area:                 7625.028921
  Design Area:               7625.028921

  Design Rules
  --------------------------------------
  Total Number of Nets:              718
  Nets With Violations:                0

Report 9.1.6 summarizes several factors for the design: the timing of each path group, the cell count, the area and the design rules. It is a QoR summary and does not report the details of the individual timing paths. The area section gives the current design statistics, namely the combinational, non-combinational and total area, in um². Under the cell count section, the leaf cell count includes all leaf cells that are not constant cells.

9.2 Design Planning Results

9.2.1 Floorplan reports

Parameter               Value
Total area              7625.028921 um²
Core utilization        0.930
Number of rows          31
Core width              91.84 um
Core height             89.28 um
Aspect ratio            0.972
Total number of nets    718
Total number of cells   551

Table 9.1 floor plan Report

The total area is the area of all the cells in the netlist. The core utilization is 0.930 (93%). Other values, such as 40% or 80%, can be chosen, but 70% is a common default. Cells are added and resized during the later stages (for example, buffers during optimization and clock tree synthesis), so the utilization rises towards 100% by the end of routing. A starting utilization that is too high therefore leaves the placement stage over-utilized. For small designs such as this one, a core utilization of 90 to 95% can be used.

The number of rows (31 in table 9.1) is chosen to accommodate the cells of the design. The standard cells are placed in these rows, and their power rails deliver power to the placed cells and connect to the power straps over the core.

The aspect ratio (AR), the ratio of core height to core width, sets the shape of the die that is built with the available resources. With an AR of 0.5 the core is a long rectangle and the clock structure is not built well. The AR therefore plays an important role in building the clock tree, and a value close to 1 is used (0.972 here).

The total numbers of nets and cells come from the netlist obtained at the synthesis stage. The core width and height are in microns and the area is in um²; the core utilization is a ratio and has no unit.
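The entries of Table 9.1 are related by simple arithmetic, which the sketch below checks. It uses the combinational and non-combinational areas from report 9.1.6 and the core dimensions from the table. The `core_size` helper illustrates the reverse direction: choosing core dimensions for a target utilization and aspect ratio, with the aspect ratio taken as height divided by width. It is a hypothetical helper and ignores the snapping of the core height to a whole number of rows that a real tool performs.

```python
import math

# Values from report 9.1.6 and Table 9.1 (lengths in um, areas in um^2).
COMB_AREA = 3700.889009
NONCOMB_AREA = 3924.139912
CORE_WIDTH = 91.84
CORE_HEIGHT = 89.28
NUM_ROWS = 31


def floorplan_metrics(cell_area, core_width, core_height, num_rows):
    """Core area, utilization, aspect ratio (height/width) and row height."""
    core_area = core_width * core_height
    return {
        "core_area": core_area,
        "utilization": cell_area / core_area,
        "aspect_ratio": core_height / core_width,
        "row_height": core_height / num_rows,
    }


def core_size(cell_area, target_util, aspect_ratio):
    """Core width and height for a target utilization and aspect ratio."""
    core_area = cell_area / target_util
    width = math.sqrt(core_area / aspect_ratio)
    return width, aspect_ratio * width


if __name__ == "__main__":
    cell_area = COMB_AREA + NONCOMB_AREA                               # 7625.028921
    m = floorplan_metrics(cell_area, CORE_WIDTH, CORE_HEIGHT, NUM_ROWS)
    print(f"cell area    = {cell_area:.6f} um^2")
    print(f"core area    = {m['core_area']:.4f} um^2")                 # 8199.4752
    print(f"utilization  = {m['utilization']:.3f}")                    # 0.930
    print(f"aspect ratio = {m['aspect_ratio']:.3f}")                   # 0.972
    print(f"row height   = {m['row_height']:.2f} um")                  # 2.88
    w, h = core_size(cell_area, 0.70, 1.0)   # the 70% default with a square core
    print(f"70% utilization, AR 1.0 -> {w:.2f} um x {h:.2f} um")
```

The recomputed utilization (0.930) and aspect ratio (0.972) match the table. Feeding them back into `core_size` returns the reported 91.84 um x 89.28 um core.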

Figure 9.2 shows the design after floorplanning, with the die, the core and the standard-cell locations indicated.

Figure 9.2 Floor Planned Design

9.2.2 Virtual Flat Placement

The virtual flat placement is very fast and is optimized for wire length, congestion and timing. Some preparatory tasks are performed before the initial virtual flat placement is evaluated. If macros are present, no other cells are placed within a specified distance from the macro edges during virtual flat placement. This flat placement is done to obtain an initial placement of the cells and to avoid congestion-related issues.

Figure 9.3 shows the virtual flat placement of the design with the placed cells. The cell locations are not fixed at this stage.

Figure 9.3 Virtual Flat Placement

9.2.3 Congestion Analysis

Congestion occurs where many chip-level or inter-block wires need to cross an area. For instance, the interconnect between cells and I/O pins or memory ports depends strongly on both the floorplan and where those cells are placed within it. Global interconnect congestion can occur even when the placement density is low. In fact, low placement density can sometimes even cause congestion because of the need for long connections and additional buffering. Finally, chips with a limited number of routing layers (for cost reasons) can also suffer from global congestion.

Both Dirs: Overflow = 43 Max = 7 (1 GRCs) GRCs = 15 (1.42%)

H routing: Overflow = 4 Max = 2 (2 GRCs) GRCs = 2 (0.19%)

V routing: Overflow = 39 Max = 7 (1 GRCs) GRCs = 13 (1.23%)

The congestion report above is the output of the virtual flat placement. It covers the horizontal (H) and vertical (V) routing directions separately and both together. For each, it lists three figures:

- the total overflow: the routing demand in excess of the available tracks, summed over all global routing cells (GRCs);
- the maximum overflow of a single GRC, together with the number of GRCs that have it;
- the number and percentage of GRCs that overflow.

Here the worst GRC exceeds its vertical track capacity by 7 tracks.

As shown in figure 9.3, the congestion spot is labelled 56/54. This means that 56 signals need to pass through this routing region while it can accommodate only 54. Incremental placement has to be done to let the remaining signals pass through.
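The overflow numbers can be reproduced from per-GRC demand and capacity. A GRC overflows by the number of routes that do not fit its tracks, so the 56/54 hotspot overflows by 2. The summary function mirrors the fields of the congestion report: total overflow, worst GRC, and the number and percentage of overflowing GRCs. The small grid used in the example is hypothetical.

```python
def grc_overflow(demand: int, capacity: int) -> int:
    """Routes that do not fit in one global routing cell (GRC)."""
    return max(0, demand - capacity)


def congestion_summary(grcs):
    """grcs: list of (demand, capacity) pairs for one routing direction.

    Returns (total overflow, max overflow, overflowing GRCs, % of GRCs).
    """
    overflows = [grc_overflow(d, c) for d, c in grcs]
    n_over = sum(1 for o in overflows if o > 0)
    pct = 100.0 * n_over / len(overflows) if overflows else 0.0
    return sum(overflows), max(overflows, default=0), n_over, pct


if __name__ == "__main__":
    print("56/54 hotspot overflow:", grc_overflow(56, 54))        # 2
    grid = [(56, 54), (40, 54), (30, 54), (58, 54)]               # hypothetical GRCs
    total, worst, n_over, pct = congestion_summary(grid)
    print(f"Overflow = {total} Max = {worst} GRCs = {n_over} ({pct:.2f}%)")
```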

After virtual flat placement, 15 GRCs (1.42%) overflow, 13 of them in the vertical direction. Congestion occurs at this stage because standard cells and macros are placed close together. For instance, congestion can occur in the slots between memories or around the corners of memories, and identifying this type of congestion requires the floorplan to be used as the input. If the congestion issues are not solved at this stage, congestion will increase in the later stages of the design. It will then cause DRC violations at the final routing stage, as well as signal integrity issues and timing violations in high-frequency designs.

After a congestion spot has been identified, it can be eliminated by incremental placement, which moves some logic closer together and spreads other logic farther apart. After eliminating the congestion spot, the congestion report is as follows.

9.3 Power Planning Results

9.3.1 Rectangular Rings Result

After floorplanning, the next step is to add power rings. These rings supply power to the core cells and I/O cells of the design. They are rectangular because the chip is rectangular, and the ring type, the ring width and the number of rings are specified for the design. If the rings were drawn inside a square boundary, they would be square rings. The current entering the core from the rings is split into two directions.

Figure 9.4 highlights the rectangular rings, which are built on metal layers 4 and 5.

9.3.2 Power Straps Result

The straps are automatically connected to the closest ring to supply power to the cells in the core. The power straps attached to the core and to the rings are shown in figure 9.4.

Figure 9.4 Rectangular Rings & Power Straps

9.4 Power Network Synthesis Results

After the power straps have been added, the next step is power network synthesis (PNS). PNS evaluates the early power plan so that IR-drop (voltage-drop) problems are avoided later, during detailed power routing.

9.4.1 IR Drop Analysis

The power network has resistance associated with its wires. This resistance causes a voltage drop as power is transferred from the power pads to the target devices. To reduce the IR drop in the power grid, the design needs a sufficient number of power I/Os, decoupling capacitors, and sufficiently wide (low-resistance) grid wires. The IR drop is defined as

$\text{IR drop} = \max_i \left( V_{DD} - v_i \right)$

where $v_i$ is the potential at node $i$ of the power grid.

Parameter                     Value
Target IR drop                150 mV
Average power dissipation     2000.00 mW
Total number of cells         551
Average power current         1333.33 mA
Power supply voltage          1.5 V
Maximum IR drop on VDD        144.699 mV
Maximum IR drop on VSS        217.00 mV
Maximum current               41.003 mA
Maximum instance IR drop      352.398 mV (instance U479)

Table 9.2 IR Drop Analysis

The summary in table 9.2 describes the power analysis of the design. For a power rail supplied from both ends, the maximum IR drop occurs in the middle of the rail, at a distance of L/2 from the supply connections. For a 50 um rail, for example, it occurs at 25 um, the point farthest from both supply connections. The target IR drop is related to the power supply voltage: it is 10% of VDD. The average power dissipation is the total power budget available to the die. U479 is the name of the instance at which the maximum IR drop occurs. The maximum current is the amount of current supplied to the core.
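A simple model, not from the source, reproduces the L/2 statement. Take a rail of length L fed at V_DD from both ends, with resistance r per unit length and cells drawing a uniform current i per unit length. Its voltage is V(x) = V_DD - (r·i/2)·x·(L - x). The IR drop, the maximum over all points of V_DD - V(x), is then r·i·L²/8 and occurs at x = L/2. The 50 um length and the 1.5 V supply come from the text and Table 9.2; r and i are hypothetical.

```python
def rail_voltage(x, length, vdd, r_per_len, i_per_len):
    """Voltage at position x on a rail fed with vdd at x = 0 and x = length.

    Cells draw a uniform current i_per_len (A/m) from a rail with resistance
    r_per_len (ohm/m), giving V(x) = vdd - (r * i / 2) * x * (length - x).
    """
    return vdd - 0.5 * r_per_len * i_per_len * x * (length - x)


def worst_drop(length, vdd, r_per_len, i_per_len, samples=101):
    """Sample the rail; return (position of the maximum IR drop, drop in volts)."""
    best_x, best_drop = 0.0, 0.0
    for k in range(samples):
        x = length * k / (samples - 1)
        drop = vdd - rail_voltage(x, length, vdd, r_per_len, i_per_len)
        if drop > best_drop:
            best_x, best_drop = x, drop
    return best_x, best_drop


if __name__ == "__main__":
    # 50 um rail at 1.5 V; 1 ohm/um and 0.2 mA/um are hypothetical.
    x, d = worst_drop(50e-6, 1.5, 1e6, 200.0)
    print(f"max drop {d * 1e3:.1f} mV at {x * 1e6:.1f} um")   # ~62.5 mV at 25.0 um
```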

Figure 9.5 shows the result of power network synthesis. The location of the maximum IR drop is shown in red, and the other colours show the levels of IR drop in different areas of the chip.

Figure 9.5 Power Network Synthesis

As shown in table 9.2, the target IR drop is 150 mV (10% of VDD). The maximum instance IR drop of 352.398 mV exceeds this target, and so does the maximum VSS drop of 217 mV, which means that the IR-drop target is not met. When the target cannot be met, two techniques can be used: increasing the mesh width and adding more straps. Here the IR drop is met by increasing the mesh width. However, IR drop can sometimes be caused by glitches, so IR-drop analysis needs to be repeated after the routing stage, once the glitches have been removed.
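A few lines of arithmetic cross-check the figures in Table 9.2. The average current is the average power divided by the supply voltage, the target is 10% of V_DD, and each reported drop is compared with that target. The values are copied from Table 9.2; the comparison logic is only an illustration.

```python
# Values copied from Table 9.2 (volts, watts).
VDD = 1.5
AVG_POWER_W = 2.000
MEASURED = {
    "max IR drop on VDD": 0.144699,
    "max IR drop on VSS": 0.217,
    "max instance IR drop (U479)": 0.352398,
}

target_drop_v = 0.10 * VDD            # 10% of VDD, the 150 mV target of the table
avg_current_a = AVG_POWER_W / VDD     # average supply current

if __name__ == "__main__":
    print(f"target = {target_drop_v * 1e3:.0f} mV, "
          f"average current = {avg_current_a * 1e3:.2f} mA")   # 150 mV, 1333.33 mA
    for name, drop in MEASURED.items():
        status = "met" if drop <= target_drop_v else "VIOLATED"
        print(f"{name:28s} {drop * 1e3:8.3f} mV ({drop / VDD:.1%} of VDD)  {status}")
```

Only the VDD-side drop stays under the target. The VSS drop and the worst instance (U479) exceed it, which is consistent with the conclusion that the IR-drop target is not met before the mesh is widened.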

9.5 Placement Results

Placement is an essential step in electronic design automation. It is the portion of the physical design flow that assigns exact locations to the various circuit components within the chip's core area. Typical placement objectives include minimizing total wire length, power, timing and runtime.

The placement algorithm has three main goals:

- make the chip as dense as possible (area constraint);
- minimize the total wire length, reducing the length of critical nets;
- minimize the number of horizontal and vertical wire segments crossing a cut line.

During placement, the total number of cells can increase because of additional buffers and inverters, and the length and width of soft macro cells can change. The net length is also reported together with the area of the design.

Figure 9.6 shows the design after placement. At this stage the cells are placed and the early power plan has been made. The pink and blue lines are the power straps that supply power to the cells in the design: pink refers to VDD and blue to VSS.

Figure 9.6 Placement
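During placement, the wire-length objective is commonly estimated with the half-perimeter wirelength (HPWL). For each net, this is the width plus the height of the bounding box of its pins. HPWL is a standard estimator, not necessarily the one the tool used here. The sketch also splits the total into X and Y parts, analogous to the Net XLength and Net YLength fields of the area report in 9.5.2 (6753.28 + 6021.40 ≈ 12774.69). The pin coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Pin = Tuple[float, float]  # (x, y) in um


def hpwl_xy(pins: List[Pin]) -> Tuple[float, float]:
    """Bounding-box width and height of one net's pins."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return max(xs) - min(xs), max(ys) - min(ys)


def total_wirelength(nets: Dict[str, List[Pin]]) -> Tuple[float, float, float]:
    """Total X length, Y length and HPWL over all nets."""
    x_len = y_len = 0.0
    for pins in nets.values():
        dx, dy = hpwl_xy(pins)
        x_len += dx
        y_len += dy
    return x_len, y_len, x_len + y_len


if __name__ == "__main__":
    nets = {                                      # hypothetical pin locations (um)
        "n1": [(0.0, 0.0), (3.0, 4.0)],
        "n2": [(1.0, 1.0), (2.0, 5.0), (4.0, 2.0)],
    }
    print(total_wirelength(nets))                 # (6.0, 8.0, 14.0)
```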

9.5.1 Timing report

****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : aFifo
Version: D-2010.03-ICC-SP4
Date   : Tue Jun 4 13:21:36 2013
****************************************
* Some/all delay information is back-annotated.

Operating Conditions: TYPICAL   Library: saed90nm_typ

  Startpoint: ReadEn_in (input port clocked by WClk)
  Endpoint: GrayCounter_pRd/BinaryCount_reg[3]
            (rising edge-triggered flip-flop clocked by RClk)
  Path Group: RClk
  Path Type: max

  Point                                             Incr      Path
  ----------------------------------------------------------------
  clock WClk (rise edge)                           24.00     24.00
  clock network delay (ideal)                       0.00     24.00
  input external delay                              0.50     24.50 f
  ReadEn_in (in)                                    0.00     24.50 f
  U642/QN (NAND2X0)                                 0.12 *   24.62 r
  U643/QN (INVX0)                                   0.25 *   24.87 f
  U639/QN (NAND2X0)                                 0.12 *   24.98 r
  U957/QN (INVX0)                                   0.06 *   25.05 f
  U637/QN (NAND2X0)                                 0.06 *   25.11 r
  U987/QN (NOR2X0)                                  0.06 *   25.17 f
  U988/QN (NOR2X0)                                  0.04 *   25.22 r
  U989/Q (MUX21X1)                                  0.08 *   25.30 r
  GrayCounter_pRd/BinaryCount_reg[3]/D (DFFX1)      0.00 *   25.30 r
  data arrival time                                          25.30

  clock RClk (rise edge)                           30.00     30.00
  clock network delay (ideal)                       0.00     30.00
  clock uncertainty                                -4.00     26.00
  GrayCounter_pRd/BinaryCount_reg[3]/CLK (DFFX1)    0.00     26.00 r
  library setup time                               -0.07     25.93
  data required time                                         25.93
  ----------------------------------------------------------------
  data required time                                         25.93
  data arrival time                                         -25.30
  ----------------------------------------------------------------
  slack (MET)                                                 0.63


  Startpoint: Full_out_reg
              (rising edge-triggered flip-flop clocked by WClk)
  Endpoint: Full_out (output port clocked by WClk)
  Path Group: WClk
  Path Type: max

  Point                                             Incr      Path
  ----------------------------------------------------------------
  clock WClk (rise edge)                            0.00      0.00
  clock network delay (ideal)                       0.00      0.00
  Full_out_reg/CLK (DFFASX2)                        0.00      0.00 r
  Full_out_reg/Q (DFFASX2)                          3.80      3.80 r
  Full_out (out)                                    0.17 *    3.97 r
  data arrival time                                           3.97

  clock WClk (rise edge)                            6.00      6.00
  clock network delay (ideal)                       0.00      6.00
  clock uncertainty                                -0.50      5.50
  output external delay                            -0.50      5.00
  data required time                                          5.00
  ----------------------------------------------------------------
  data required time                                          5.00
  data arrival time                                          -3.97
  ----------------------------------------------------------------
  slack (MET)                                                 1.03

9.5.2 Area report

  Area
  --------------------------------------
  Combinational Area:        3700.889009
  Noncombinational Area:     3924.139912
  Net Area:                     0.000000
  Net XLength:                   6753.28
  Net YLength:                   6021.40
  --------------------------------------
  Cell Area:                 7625.028921
  Design Area:               7625.028921
  Net Length:                   12774.69

  Design Rules
  --------------------------------------
  Total Number of Nets:              718
  Nets With Violations:                0

9.5.3 Power report

Report : power
        -analysis_effort low
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:49 2013
****************************************

Library(s) Used:

    saed90nm_typ (File: /home/11011J6033/icshellfinal/ref/saed90nm_typ.db)

Operating Conditions: TYPICAL   Library: saed90nm_typ
Wire Load Model Mode: top

Global Operating Voltage = 1.2
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1pW

  Cell Internal Power  = 407.0691 uW   (80%)
  Net Switching Power  =  98.7447 uW   (20%)
                         --------------------
Total Dynamic Power    = 505.8137 uW  (100%)

Cell Leakage Power     =  34.6414 uW

9.5.4 Congestion report

Both Dirs: Overflow = 2028 Max = 25 (1 GRCs) GRCs = 237 (22.40%)

H routing: Overflow = 205 Max = 11 (1 GRCs) GRCs = 47 (4.44%)

V routing: Overflow = 1823 Max = 25 (1 GRCs) GRCs = 190 (17.96%)

After placement, the congestion has increased considerably: 237 GRCs (22.40%) now overflow, compared with 15 GRCs (1.42%) after virtual flat placement. The virtual flat placement was only a rough, unorganised placement. Now the cells have to be placed in an orderly way in the rows, so they are packed closely and less space is left for the routing channels. To remove congestion at the placement stage, soft blockages can be created in regions of high cell density to reduce the local utilization, taking care not to disturb timing.

Leakage power has increased, from 24.7386 uW after synthesis to 34.6414 uW after placement. The reason is that IR drop can create shorts or opens in devices, so that many of them become inactive and contribute leakage power. The unnecessary transitions caused by such shorts also increase the dynamic power, from 488.1689 uW to 505.8137 uW. The design operates at a low frequency; at a high frequency it would generally show timing violations and signal integrity issues.

9.6 Clock Tree Synthesis Results

After all blocks have been placed, an actual clock is assigned to the design and a clock tree is built accordingly. The clock tree replaces the ideal clock used in the design so far and branches out to the individual blocks. Figure 9.7 highlights the propagation of the clock.

Figure 9.7 clock tree synthesis

In the CTS stage the clock changes from ideal to propagated. The actual clock network delays are therefore added to the design, and the slack changes accordingly.
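The WClk output path (Full_out_reg to Full_out) in reports 9.5.1 and 9.6.1 shows the effect of replacing the ideal clock with a propagated one. The launch flip-flop's clock now arrives 0.54 ns after the clock edge, which delays the data by the same amount. The capture side of this output path is still ideal. The sketch below recomputes both slacks. The data-path delays are taken as differences of the Path column (3.97 - 0.00 and 4.53 - 0.54) to avoid the rounding in the Incr column. The function is only an illustration of the arithmetic.

```python
def setup_slack(period: float, launch_latency: float, capture_latency: float,
                uncertainty: float, endpoint_margin: float, data_delay: float) -> float:
    """Setup slack in ns for a path launched at time 0 and captured one period later.

    endpoint_margin is the library setup time or the output external delay.
    """
    arrival = launch_latency + data_delay
    required = period + capture_latency - uncertainty - endpoint_margin
    return required - arrival


if __name__ == "__main__":
    # WClk output path: 6.00 ns period, 0.50 ns uncertainty, 0.50 ns output delay.
    placed = setup_slack(6.00, 0.00, 0.00, 0.50, 0.50, 3.97 - 0.00)
    cts = setup_slack(6.00, 0.54, 0.00, 0.50, 0.50, 4.53 - 0.54)
    print(f"placement (ideal clock):  slack {placed:.2f} ns")   # 1.03
    print(f"after CTS (propagated):   slack {cts:.2f} ns")      # 0.47
```

Most of the 0.56 ns of lost slack is the 0.54 ns launch clock latency. The rest is a slightly larger flip-flop delay after CTS.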

9.6.1 Timing report:

****************************************
Report : timing
        -path full
        -delay max
        -max_paths 1
Design : aFifo
Version: D-2010.03-ICC-SP4
Date   : Tue Jun 4 13:42:20 2013
****************************************
 * Some/all delay information is back-annotated.

Operating Conditions: TYPICAL   Library: saed90nm_typ
Parasitic source    : LPE
Parasitic mode      : RealRC
Extraction mode     : MAX
Extraction derating : 25/25/25

  Startpoint: ReadEn_in (input port clocked by WClk)
  Endpoint: GrayCounter_pRd/BinaryCount_reg[3]
            (rising edge-triggered flip-flop clocked by RClk)
  Path Group: RClk
  Path Type: max

  Point                                         Incr       Path
  --------------------------------------------------------------
  clock WClk (rise edge)                        24.00      24.00
  clock network delay (ideal)                    0.00      24.00
  input external delay                           0.50      24.50 f
  ReadEn_in (in)                                 0.00      24.50 f
  U642/QN (NAND2X0)                              0.11 *    24.61 r
  U643/QN (INVX0)                                0.23 *    24.85 f
  U639/QN (NAND2X0)                              0.11 *    24.95 r
  U957/QN (INVX0)                                0.06 *    25.02 f
  U637/QN (NAND2X0)                              0.07 *    25.08 r
  U987/QN (NOR2X0)                               0.06 *    25.14 f
  U988/QN (NOR2X0)                               0.04 *    25.19 r
  U989/Q (MUX21X1)                               0.08 *    25.27 r
  GrayCounter_pRd/BinaryCount_reg[3]/D (DFFX1)   0.00 *    25.27 r
  data arrival time                                        25.27

  clock RClk (rise edge)                        30.00      30.00
  clock network delay (propagated)               0.01      30.01
  clock uncertainty                             -4.00      26.01
  GrayCounter_pRd/BinaryCount_reg[3]/CLK (DFFX1) 0.00      26.01 r
  library setup time                            -0.07      25.93
  data required time                                       25.93
  --------------------------------------------------------------
  data required time                                       25.93
  data arrival time                                       -25.27
  --------------------------------------------------------------
  slack (MET)                                               0.67

  Startpoint: Full_out_reg
              (rising edge-triggered flip-flop clocked by WClk)
  Endpoint: Full_out (output port clocked by WClk)
  Path Group: WClk
  Path Type: max

  Point                                         Incr       Path
  --------------------------------------------------------------
  clock WClk (rise edge)                         0.00       0.00
  clock network delay (propagated)               0.54       0.54
  Full_out_reg/CLK (DFFASX2)                     0.00       0.54 r
  Full_out_reg/Q (DFFASX2)                       3.81       4.36 r
  Full_out (out)                                 0.17 *     4.53 r
  data arrival time                                         4.53

  clock WClk (rise edge)                         6.00       6.00
  clock network delay (ideal)                    0.00       6.00
  clock uncertainty                             -0.50       5.50
  output external delay                         -0.50       5.00
  data required time                                        5.00
  --------------------------------------------------------------
  data required time                                        5.00
  data arrival time                                        -4.53
  --------------------------------------------------------------
  slack (MET)                                               0.47

As explained in 9.1.1, the slack changes slightly at this stage of the implementation. This is because an actual clock has been assigned to the design and a clock tree has been built to replace the virtual clock. The clock network delay, which was ideal in all the previous steps, is now propagated. The network delay therefore enters the timing calculation and slightly increases the path delay, as with the 0.54 ns propagated launch-clock delay on the Full_out path.
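Setup slack in these reports is the data required time minus the data arrival time. The Python sketch below recomputes the Full_out path from the values printed in 9.6.1 and contrasts it with the same path under an ideal launch clock. The ideal case simply subtracts the 0.54 ns propagated launch latency and keeps the data-path delays fixed. That is a simplification, because in practice clock-to-Q delay also depends on the clock slew. The function name setup_slack is hypothetical.

```python
def setup_slack(arrival, capture_edge, capture_latency, uncertainty, endpoint_margin):
    """Slack = data required time - data arrival time (all values in ns).

    endpoint_margin is the output external delay for an output port,
    or the library setup time for a flip-flop endpoint.
    """
    required = capture_edge + capture_latency - uncertainty - endpoint_margin
    return required - arrival


LAUNCH_LATENCY = 0.54                                # propagated clock network delay
arrival_propagated = 4.53                            # reported data arrival time
arrival_ideal = arrival_propagated - LAUNCH_LATENCY  # same path with an ideal clock (simplified)

for label, arrival in (("propagated", arrival_propagated), ("ideal", arrival_ideal)):
    slack = setup_slack(arrival, capture_edge=6.00, capture_latency=0.00,
                        uncertainty=0.50, endpoint_margin=0.50)
    print(f"{label}: slack = {slack:.2f} ns")
```

With the propagated clock the sketch reproduces the reported 0.47 ns slack. Under an ideal launch clock the same path would show about 1.01 ns, which illustrates why the slack moves once CTS is done.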


9.7 Analyzing the Routing Results

Figure 9.8 Routed Clock Drivers with Clock Nets

9.7.1 Noise report before buffer insertion

****************************************
Report : noise
        -verbose
Design : aFifo
Version: D-2010.03-ICC-SP4
Date   : Sun Jun 9 16:27:31 2013
****************************************

slack type: area
noise_region: above_low
pin name                      width    height    slack
-------------------------------------------------------
Full_out_reg/SETB
  Aggressors:
    Full_out                   0.21      0.43
    WriteEn_in                 0.03      0.05
    n273                       0.08      0.03
  Total:                       0.21      0.43    -0.08

noise_region: below_high
pin name                      width    height    slack
-------------------------------------------------------
Full_out_reg/SETB
  Aggressors:
    Full_out                   0.20      0.42
    WriteEn_in                 0.04      0.05
    n273                       0.09      0.04
  Total:                       0.20      0.42    -0.07

9.7.2 Noise Report after buffer insertion

****************************************
Report : noise
        -verbose
Design : aFifo
Version: D-2010.03-ICC-SP4
Date   : Sun Jun 9 16:27:31 2013
****************************************

slack type: area
noise_region: above_low
pin name                      width    height    slack
-------------------------------------------------------
Full_out_reg/SETB
  Aggressors:
    Full_out                   0.03      0.12
    WriteEn_in                 0.03      0.05
    n273                       0.04      0.01
  Total:                       0.03      0.12     0.23

noise_region: below_high
pin name                      width    height    slack
-------------------------------------------------------
Full_out_reg/SETB
  Aggressors:
    Full_out                   0.04      0.11
    WriteEn_in                 0.02      0.05
    n273                       0.03      0.02
  Total:                       0.04      0.11     0.24


9.7.3 Power report before buffer insertion

****************************************
Report : power
        -analysis_effort low
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:49 2013
****************************************

Library(s) Used:
    saed90nm_typ (File: /home/11011J6033/dcshellfinal/ref/saed90nm_typ.db)

Operating Conditions: TYPICAL   Library: saed90nm_typ
Wire Load Model Mode: top

Global Operating Voltage = 1.2
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1pW

  Cell Internal Power  = 408.3488 uW   (79%)
  Net Switching Power  = 108.5484 uW   (21%)
                         ---------
Total Dynamic Power    = 516.8973 uW  (100%)

Cell Leakage Power     =  23.6414 uW


9.7.4 Power report after buffer insertion

****************************************
Report : power
        -analysis_effort low
Design : aFifo
Version: D-2010.03-SP4
Date   : Mon Jun 3 17:42:49 2013
****************************************

Library(s) Used:
    saed90nm_typ (File: /home/11011J6033/dcshellfinal/ref/saed90nm_typ.db)

Operating Conditions: TYPICAL   Library: saed90nm_typ
Wire Load Model Mode: top

Global Operating Voltage = 1.2
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW    (derived from V,C,T units)
    Leakage Power Units = 1pW

  Cell Internal Power  = 413.7029 uW   (82%)
  Net Switching Power  =  97.0414 uW   (18%)
                         ---------
Total Dynamic Power    = 510.7444 uW  (100%)

Cell Leakage Power     =  24.7386 uW

As the noise reports in 9.7.1 and 9.7.2 and the power reports in 9.7.3 and 9.7.4 show, inserting a downsized buffer on the victim net reduces the width and height of the noise bump. This removes unnecessary transitions and so lowers net switching power, which falls from 108.5484 uW to 97.0414 uW. The inserted buffer itself consumes dynamic power, however, so cell internal power rises from 408.3488 uW to 413.7029 uW. In a small design such as this one, the net reduction in total dynamic power is therefore small (516.8973 uW to 510.7444 uW). In large designs, where glitches are more numerous, the reduction in dynamic power is expected to be more visible.
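The trade-off described above can be quantified directly from the two power reports. The Python sketch below copies the reported figures and prints the percentage change for each component. Summing internal and switching power gives 516.8972 uW and 510.7443 uW, which differ from the printed totals only in the last digit because of rounding in the report.

```python
# Power figures (uW) copied from the reports in 9.7.3 (before) and 9.7.4 (after).
before = {"internal": 408.3488, "switching": 108.5484, "leakage": 23.6414}
after = {"internal": 413.7029, "switching": 97.0414, "leakage": 24.7386}


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


for key in before:
    print(f"{key:9}: {before[key]:9.4f} -> {after[key]:9.4f} uW "
          f"({pct_change(before[key], after[key]):+.1f}%)")

# Dynamic power is internal plus net switching power.
dyn_before = before["internal"] + before["switching"]
dyn_after = after["internal"] + after["switching"]
print(f"dynamic  : {dyn_before:9.4f} -> {dyn_after:9.4f} uW "
      f"({pct_change(dyn_before, dyn_after):+.1f}%)")
```

Switching power drops by about 10.6 % while internal power rises by about 1.3 %, so total dynamic power falls by only about 1.2 % in this small design.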

Figure 9.9 Victim net representations

Figure 9.10 Buffer additions to victim net


Adding a buffer to the victim net decreases the width and height of the glitch, so the glitch stays within the noise margin. When it reaches the succeeding cell, which has a higher noise margin, that cell does not propagate the glitch further. Here the noise threshold is 0.35. Any bump that exceeds it falls in the unacceptable region and must be eliminated. Inserting a buffer on the victim net decreases the coupling capacitance seen by the victim, and the glitch dimensions decrease as a result.
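The slack column in the noise reports of 9.7.1 and 9.7.2 is consistent with this 0.35 threshold. In every region, slack equals 0.35 minus the total bump height. The Python check below recomputes all four slacks from the reported heights. Treating slack as threshold minus height is an inference from these numbers (the reports list "slack type: area"), not a formula taken from the tool documentation.

```python
NOISE_THRESHOLD = 0.35  # threshold stated in the text

# Total bump heights at Full_out_reg/SETB, copied from 9.7.1 and 9.7.2.
heights = {
    ("before", "above_low"): 0.43,
    ("before", "below_high"): 0.42,
    ("after", "above_low"): 0.12,
    ("after", "below_high"): 0.11,
}


def noise_slack(height, threshold=NOISE_THRESHOLD):
    """Positive: bump stays below the threshold. Negative: noise violation."""
    return round(threshold - height, 2)


for (stage, region), h in heights.items():
    s = noise_slack(h)
    status = "violated" if s < 0 else "met"
    print(f"{stage:6} {region:10} height={h:.2f} slack={s:+.2f} ({status})")
```

The recomputed values (-0.08, -0.07, +0.23, +0.24) match the reports, confirming that buffer insertion moves both regions from violation to positive slack.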


Chapter 10

Conclusion

A glitch is an unwanted transition that propagates through the design without the designer's prior knowledge. When glitches are numerous, they consume a large share of the dynamic switching power. As technology scales down, gate delays decrease, so two or more signals arriving at a gate at different times more readily produce a glitch. The methodology presented here requires neither changing existing library cells nor creating new ones. Instead, it adjusts the arrival times so that the difference between them is less than the delay of the cell that receives the signals.
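Under a simple inertial-delay model, a gate whose inputs switch at different times produces an output pulse only if the arrival-time difference is at least the gate's own delay. Shorter pulses are absorbed. The Python sketch below applies that rule and computes how much delay would have to be added to the early inputs so that the skew falls below the gate delay. The arrival times, the 0.10 ns gate delay, the 0.01 ns margin and the function names are all hypothetical. A real flow would also have to recheck setup and hold timing after adding delay.

```python
def glitch_possible(arrivals, gate_delay):
    """True if the input skew is large enough for a glitch to appear at the output."""
    return max(arrivals) - min(arrivals) >= gate_delay


def delay_to_add(arrivals, gate_delay, margin=0.01):
    """Extra delay (ns) for each input so that all inputs arrive within
    (gate_delay - margin) of the latest input."""
    latest = max(arrivals)
    target = latest - (gate_delay - margin)
    return [round(max(0.0, target - t), 3) for t in arrivals]


arrivals = [0.42, 0.15]  # hypothetical arrival times (ns) at a 2-input gate
GATE_DELAY = 0.10        # hypothetical delay of the receiving gate (ns)

print(glitch_possible(arrivals, GATE_DELAY))  # input skew is 0.27 ns
pad = delay_to_add(arrivals, GATE_DELAY)
balanced = [t + d for t, d in zip(arrivals, pad)]
print(pad, glitch_possible(balanced, GATE_DELAY))
```

Only the early input is delayed. The late input sets the timing of the path, so leaving it untouched avoids making the setup check worse.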

When the glitch effect is significant, the proposed methodology reduces dynamic power. It also ensures correct functionality of the device by meeting setup and hold timing. Switching power therefore decreases, which saves power.

The methodology is not well suited to smaller designs. Although glitches may decrease, the added buffers increase cell internal power, which can offset the savings and even increase dynamic power. For designs that require a reduced glitch effect, the proposed methodology is currently the best option.


Chapter 11

Future Scope

With heavy technology scaling, several factors play a major role in design implementation and performance: the increase in gate count, wire length, net delay, IR drop and electromigration, together with coupling capacitance and resistance along the paths. Further improvements are needed to deliver high performance with rich functionality in a low-noise environment.

The proposed methodology removes or reduces glitches only to some extent, and it does so manually. An automated mechanism or script is needed that detects victim and aggressor nets and inserts buffers on them. Such algorithms should also apply shielding and spacing automatically in congested designs. With this automation added, the proposed methodology could reduce noise in the design to a much greater extent.
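As a starting point for such automation, the Python sketch below takes noise results and keeps only the violating ones. It then lists each victim pin once, with its worst slack, as a candidate for buffer insertion. The records here are transcribed by hand from the report in 9.7.1. The NoiseResult record and the buffer_candidates function are hypothetical. Parsing the tool's report text and generating the actual ECO commands are left out, because that syntax depends on the tool and its version.

```python
from dataclasses import dataclass


@dataclass
class NoiseResult:
    pin: str          # victim pin as printed in the noise report
    region: str       # "above_low" or "below_high"
    height: float     # total bump height
    slack: float      # noise slack (negative means violation)
    aggressors: list  # aggressor nets listed under the pin


def buffer_candidates(results):
    """Keep violating results, one per victim pin (worst slack), worst first."""
    worst = {}
    for r in results:
        if r.slack < 0:
            prev = worst.get(r.pin)
            if prev is None or r.slack < prev.slack:
                worst[r.pin] = r
    return sorted(worst.values(), key=lambda r: r.slack)


# Records transcribed from the "before buffer insertion" report in 9.7.1.
before = [
    NoiseResult("Full_out_reg/SETB", "above_low", 0.43, -0.08, ["Full_out", "WriteEn_in", "n273"]),
    NoiseResult("Full_out_reg/SETB", "below_high", 0.42, -0.07, ["Full_out", "WriteEn_in", "n273"]),
]

for r in buffer_candidates(before):
    print(f"victim {r.pin}: slack {r.slack:+.2f} ({r.region}); "
          f"insert buffer on victim net, aggressors: {', '.join(r.aggressors)}")
```

Run on the post-insertion results from 9.7.2, where both slacks are positive, the same function returns an empty list. A script could therefore iterate until no candidates remain.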
