# FPGA BASED IMPLEMENTATION OF DELAY OPTIMISED DOUBLE PRECISION IEEE FLOATING-POINT ADDER

Post on 16-Apr-2017



Dissertation submitted in partial fulfillment of the requirement for the degree of Bachelor of Engineering in Electrical Engineering

SESSION 2009-2013

Submitted by Somsubhra Ghosh, BE (Final year)

Under the supervision of Somnath Ghosh, Scientist, ADRIN, Dept. of Space/ISRO, Govt. of INDIA

DEPT. OF ELECTRICAL ENGINEERING, JADAVPUR UNIVERSITY

KOLKATA-700032, INDIA

June, 2012

Roll No-000910801089, Reg. No-107973 of 2009-10

JADAVPUR UNIVERSITY | FPGA BASED IMPLEMENTATION OF DOUBLE PRECISION IEEE FP-ADDER

JADAVPUR UNIVERSITY


CONTENTS

Preface

Acknowledgements

Abstract

Section-I Introduction and summary

Section-II Notation

Section-III Naive FP-Adder Algorithm

Section-IV Optimization techniques

4.1 Separation of FP-Adder in parallel paths

4.2 Unification of significand result ranges

4.3 Reduction of IEEE rounding modes

4.4 Sign magnitude computation of a difference

4.5 Compound addition

4.6 Approximate counting of leading zeros

4.7 Precomputation of post normalization shift

Section-V Overview of present algorithm

Section-VI Specification and detailed description

Section-VII Delay analysis reporting

Section-VIII Implementation

Simulation screen shots

Section-VIII Future Scope

Bibliography


Preface

This dissertation is part of the Bachelor of Electrical Engineering curriculum and is submitted in partial fulfillment of the requirements for the degree of B.E. Being a student of Electrical Engineering, I opted to take up a topic related to my preferred specialization. An understanding of optimal floating-point addition is essential for the fast and efficient management of the resources demanded by complex circuit models that employ complex arithmetic operations. Addition and subtraction are the basis of the arithmetic operations, so the topic fascinated me greatly and I started my work. I first studied the basic algorithmic steps and built the basic building blocks in VHDL. I then integrated the blocks as required, with the help of the debugging tool. Finally, the circuit was simulated and verified using standard test benches.


ACKNOWLEDGEMENT

I sincerely thank my mentor, Mr. Somnath Ghosh, Scientist, ADRIN, and D. Mallikharjuna Rao, Section Head, ES&DS, Advanced Data Processing and Research Institute (ADRIN), for providing laboratory and other facilities. I am grateful to them for introducing me to the research and development area. I am thankful to the Director, ADRIN, and Dr. Venkatraman (GD) for allowing me to carry out my project work here.

My mentor, Mr. Somnath Ghosh, has been a great source of encouragement throughout the project work. His appropriate guidance, critical appraisal, and scholarly approach have helped me reach my target.

It is a great pleasure to express my deep and heartfelt gratitude to Sri Shashank Adimulyam, Scientist, ADRIN, for his guidance and helpful discussions. His valuable suggestions, ideas, and encouragement in the course of this work were unforgettable. This work is a reference implementation of "Delay-Optimized Implementation of IEEE Floating-Point Addition", which appeared in IEEE Transactions on Computers, vol. 53, no. 2, February 2004.

Somsubhra Ghosh


Abstract

This dissertation presents an implementation of the IEEE double precision floating-point adder (FP-adder) design described in the IEEE publication "Delay-Optimized Implementation of IEEE Floating-Point Addition" by P.M. Seidel and G. Even. The adder accepts normalized numbers, supports the IEEE rounding modes, and outputs the correctly normalized and rounded sum/difference in the format required by the IEEE Standard. The FP-adder design achieves a low latency by combining various optimization techniques, such as a nonstandard separation into two paths, a simple rounding algorithm, unification of rounding cases for addition and subtraction, sign-magnitude computation of a difference based on one's complement subtraction, and compound adders. A technology-independent analysis and optimization of this implementation based on the Logical Effort hardware model is carried out, and optimal gate sizes and optimal buffer insertion are determined. The estimated delay of the optimized design is 30.6 FO4 delays for double precision operands (15.3 FO4 delays per stage between latches). It is concluded that this algorithm has a shorter latency (-13 percent) and cycle time (-22 percent) than the next fastest algorithm.

Index Terms: Floating-point addition, IEEE rounding, delay optimization, dual path algorithm, logical effort, optimized gate sizing, buffer insertion.


SECTION I. INTRODUCTION AND SUMMARY

Floating-point addition and subtraction are the most frequent floating-point operations. Both operations use a floating-point adder (FP-adder). Therefore, a lot of effort has been spent on reducing the latency of FP-adders (see [2], [9], [20], [22], [23], [25], [27], [28], and the references that appear there). Many patents deal with FP-adder design (see [6], [10], [11], [14], [15], [19], [21], [31], [32], [35]). In this dissertation, an FP-adder design is implemented that accepts normalized double precision significands, supports the IEEE rounding modes, and outputs the normalized sum/difference rounded according to the IEEE FP standard 754 [13].

The latency of this design is analyzed using the Logical Effort Model [33]. This model allows for technology-independent delay analysis of CMOS circuits and enables a rigorous delay analysis that takes into account fanouts, drivers, and gate sizing. Following Horowitz [12], the delay of an inverter whose fanout equals 4 is used as a technology-independent unit of delay. An inverter with fanout 4 is denoted by FO4. The analysis using the Logical Effort Model shows that the delay of this FP-adder design is 30.6 FO4 delays. The design is partitioned into two pipeline stages, the delay of each of which is bounded by 15.3 FO4 delays. Extensions of the algorithm that deal with denormal inputs and outputs are discussed in [16], [27]. It is shown there that the delay overhead for supporting denormal numbers can be reduced to 1-2 logic levels (i.e., XOR delays).

Several optimization techniques are employed in this algorithm. A detailed examination of these techniques in combination enables the implementation of an overall fast FP-adder design. In particular, effective reduction of latency by parallel paths requires balancing the delay of the paths. Such a balance is achieved by a gate-level consideration of the design.
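As a rough illustration of the FO4 unit used above: the Logical Effort model expresses the delay of one gate as d = g·h + p, in units of a process-dependent constant τ, where g is the logical effort, h the electrical fanout, and p the parasitic delay. The sketch below assumes the common textbook normalization g = 1 and p = 1 for an inverter (an assumption, not a value stated in this dissertation) and converts the quoted FO4 figures into τ. The function names are hypothetical.

```python
def gate_delay(g, h, p):
    """Logical Effort delay of one stage, in units of tau: d = g*h + p."""
    return g * h + p


# Assumption: inverter with logical effort g = 1 and parasitic delay p = 1.
# A fanout-of-4 inverter then has delay 1*4 + 1 = 5 tau.
FO4_TAU = gate_delay(g=1.0, h=4.0, p=1.0)


def fo4_to_tau(n_fo4):
    """Convert a delay given in FO4 units into units of tau."""
    return n_fo4 * FO4_TAU


total_tau = fo4_to_tau(30.6)  # whole FP-adder
stage_tau = fo4_to_tau(15.3)  # bound for each of the two pipeline stages
print(f"adder: {total_tau:.1f} tau, per stage: {stage_tau:.1f} tau")
```

Under a different parasitic-delay assumption the conversion factor changes, which is why the delay is reported in FO4 units rather than in τ or in nanoseconds.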

The optimization techniques used are the following:

1. A two-path design with a nonstandard separation criterion. Instead of a separation based on the magnitude of the exponent difference [10], a separation criterion is defined that also considers whether the operation is an effective subtraction and the value of the significand difference. This separation criterion maintains the advantages of the standard two-path designs, namely, alignment shift and normalization shift take place only in one of the paths and the full exponent difference is computed only in one path. In addition, this separation technique requires rounding to take place only in one path.

2. Reduction of IEEE rounding to three modes [25] and use of injection-based rounding [8].

3. A simpler design is obtained by using unconditional preshifts for effective subtractions to reduce to 2 the number of binades that the significands' sum and difference may belong to.

4. The sign-magnitude representation of the difference of the exponents and the significands is derived from the one's complement representation of the difference.

5. A parallel-prefix adder is used to compute the sum and the incremented sum of the significands [34].

6. Recodings are used to estimate the number of leading zeros in the nonredundant representation of a number represented as a borrow-save number [20].

7. Postnormalization is advanced and takes place before the rounding decision is ready.

From an overview of FP-adder algorithms from technical papers and patents, the optimization techniques that are used in each of these designs are summarized. The algorithms of two particular implementations from the literature are also analyzed in some more detail [11], [21]. To allow for a fair comparison, the functionality of these designs is adapted to matc
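Item 2 of the list above can be illustrated with a small behavioural model. The sketch below is not the dissertation's VHDL. It assumes the usual formulation of injection-based rounding: a mode-dependent constant is added below the kept bits so that plain truncation performs the rounding, and round-to-nearest-even is obtained from "round to nearest up" by clearing the least significant bit on an exact tie. The mode names and function names are hypothetical, and a rounding carry-out that would require renormalization is not modelled.

```python
def reduce_mode(ieee_mode, sign):
    """Map an IEEE rounding mode and the result sign (0 = +, 1 = -) to one of
    three reduced modes: RZ (toward zero), RNU (nearest, ties up in
    magnitude), RI (away from zero)."""
    if ieee_mode == "nearest_even":
        return "RNU"  # the tie case is repaired after truncation
    if ieee_mode == "toward_zero":
        return "RZ"
    if ieee_mode == "toward_pos_inf":
        return "RZ" if sign else "RI"
    if ieee_mode == "toward_neg_inf":
        return "RI" if sign else "RZ"
    raise ValueError(f"unknown rounding mode: {ieee_mode}")


def injection(mode, k):
    """Constant added below the k discarded bits for the given reduced mode."""
    return {"RZ": 0, "RNU": 1 << (k - 1), "RI": (1 << k) - 1}[mode]


def round_magnitude(x, k, ieee_mode, sign=0):
    """Drop the k low bits (k >= 1) of the non-negative magnitude x,
    rounding according to ieee_mode; returns the kept bits."""
    low = x & ((1 << k) - 1)  # bits that will be discarded
    q = (x + injection(reduce_mode(ieee_mode, sign), k)) >> k
    if ieee_mode == "nearest_even" and low == 1 << (k - 1):
        q &= ~1  # exact tie: pull the LSB down to even
    return q


print(round_magnitude(0b10010, 2, "nearest_even"))
print(round_magnitude(0b10010, 2, "toward_pos_inf", sign=0))
```

For example, dropping two bits from 0b10010 (4.5 in units of the kept least significant bit) gives 4 under round-to-nearest-even and 5 under round toward plus infinity for a positive result.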

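Items 4 and 5 of the technique list can be illustrated together. The following is a minimal functional sketch, assuming n-bit unsigned operands: adding a to the one's complement of b gives a carry-out exactly when a > b, in which case the magnitude is the incremented sum; otherwise the magnitude is the bitwise complement of the sum. A compound adder supplies both the sum and the incremented sum, so no second carry-propagate addition is needed. The function names are hypothetical, and the Python code models only the arithmetic, not the parallel-prefix structure of the hardware.

```python
def compound_add(x, y, n):
    """Model of an n-bit compound adder: returns (sum, sum + 1), both
    modulo 2**n, and the carry-out of x + y."""
    mask = (1 << n) - 1
    total = x + y
    return total & mask, (total + 1) & mask, total >> n


def sign_magnitude_diff(a, b, n):
    """Return (is_negative, |a - b|) for n-bit unsigned a and b, using
    one's complement subtraction a + ~b."""
    mask = (1 << n) - 1
    s, s_plus_1, carry = compound_add(a, ~b & mask, n)
    if carry:
        # a > b: a + ~b = 2**n - 1 + (a - b), so a - b is the incremented sum.
        return 0, s_plus_1
    # a <= b: b - a = 2**n - 1 - (a + ~b), the bitwise complement of the sum.
    magnitude = ~s & mask
    return (1 if magnitude else 0), magnitude


print(sign_magnitude_diff(13, 5, 4))
print(sign_magnitude_diff(5, 13, 4))
```

The selection between the two compound-adder outputs depends only on the carry-out, which is what makes this form attractive for a low-latency datapath.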