FPGA BASED IMPLEMENTATION OF DELAY OPTIMISED DOUBLE PRECISION IEEE FLOATING-POINT ADDER
Post on 16-Apr-2017
Dissertation submitted in partial fulfillment of the requirements for the degree of
Bachelor of Engineering in
Electrical Engineering, Session 2009-2013
CONTENTS
Section-I Introduction and summary
Section-II Notation
Section-III Naïve FP-adder algorithm
Section-IV Optimization techniques
4.1 Separation of the FP-adder into parallel paths
4.2 Unification of significand result ranges
4.3 Reduction of IEEE rounding modes
4.4 Sign-magnitude computation of a difference
4.5 Compound addition
4.6 Approximate counting of leading zeros
4.7 Precomputation of the postnormalization shift
Section-V Overview of the present algorithm
Section-VI Specification and detailed description
Section-VII Delay analysis reporting
Section-VIII Implementation
Simulation screenshots
Section-IX Future scope
Preface
This dissertation is part of the Bachelor of Electrical Engineering curriculum and is
submitted in partial fulfillment of the requirements for the degree of B.E. As a student of
Electrical Engineering, I chose a topic related to my preferred specialization. Optimal
floating-point addition is essential for fast circuits and for efficient management of the
resources demanded by complex circuit models that employ complex arithmetic
operations. Addition and subtraction are the basis of these arithmetic operations, and the
topic fascinated me greatly, so I started work on it. I first studied the basic algorithmic
steps and built the basic building blocks in VHDL. I then integrated the blocks as
required, using the debugging tools for help. Finally, the circuit was simulated and
verified using standard test benches.
ACKNOWLEDGEMENT
I sincerely thank my mentor, Mr. Somnath Ghosh, Scientist, ADRIN, and D. Mallikharjuna Rao, Section Head, ES&DS, Advanced Data Processing and Research Institute (ADRIN), for providing laboratory and other facilities. I am grateful to them for giving me exposure to the research and development area. I am thankful to the Director, ADRIN, and Dr. Venkatraman (GD) for allowing me to carry out my project work here.
My mentor, Mr. Somnath Ghosh, has been a great source of encouragement throughout the project work. His appropriate guidance, critical appraisal, and scholarly approach have helped me reach my target.
It is a great pleasure to express my deep and heartfelt gratitude to Sri Shashank Adimulyam, Scientist, ADRIN, for his guidance and helpful discussions. His valuable suggestions, ideas, and encouragement in the course of this work were unforgettable. This work is a reference implementation of DELAY OPTIMIZED IMPLEMENTATION OF IEEE FLOATING POINT ADDITION, which appeared in IEEE Transactions on Computers, vol. 53, no. 2, February 2004.
Abstract
This dissertation presents an implementation of the IEEE double precision floating-point
adder (FP-adder) design described in the IEEE publication DELAY OPTIMIZED
IMPLEMENTATION OF IEEE FLOATING POINT ADDITION by P.M. Seidel
and G. Even. The adder accepts normalized numbers, supports the IEEE rounding modes,
and outputs the correctly normalized and rounded sum/difference in the format required by
the IEEE Standard. The FP-adder design achieves low latency by combining several
optimization techniques: a nonstandard separation into two paths, a simple
rounding algorithm, unification of rounding cases for addition and subtraction,
sign-magnitude computation of a difference based on one's complement subtraction, and
compound adders. A technology-independent analysis and optimization of this
implementation based on the Logical Effort hardware model is performed, and optimal gate
sizes and optimal buffer insertion are determined. The estimated delay of this
optimized design is 30.6 FO4 delays for double precision operands (15.3 FO4 delays
per stage between latches). It is concluded that this algorithm has 13 percent shorter
latency and 22 percent shorter cycle time than the next fastest algorithm.
Index Terms: Floating-point addition, IEEE rounding, delay optimization, dual path algorithm, logical effort, optimized gate sizing, buffer insertion.
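The "simple rounding algorithm" rests on technique 2 of Section I: the four IEEE rounding modes are reduced to three, and rounding is performed by injection. The Python sketch below is a hedged illustration, not the circuit of this design. It assumes the three reduced modes are round-to-zero (RZ), round-to-nearest-even (RNE) and round-to-infinity in magnitude (RI), and that the significand magnitude is held as an unsigned integer `sig` whose `k` low bits lie below the rounding position; the names `reduce_mode` and `round_by_injection` are hypothetical.

```python
# Hypothetical model of sign-independent, injection-based rounding.
# Not the gate-level design of the dissertation.


def reduce_mode(ieee_mode: str, sign: int) -> str:
    """Map an IEEE mode plus the result sign to one of RZ, RNE, RI (assumed)."""
    if ieee_mode in ("RNE", "RZ"):
        return ieee_mode
    if ieee_mode == "RU":                  # toward +infinity
        return "RZ" if sign else "RI"
    if ieee_mode == "RD":                  # toward -infinity
        return "RI" if sign else "RZ"
    raise ValueError(f"unknown rounding mode: {ieee_mode}")


def round_by_injection(sig: int, k: int, mode: str) -> int:
    """Drop the k (>= 1) low bits of the magnitude sig using an injection.

    A mode-dependent constant is added, then the low k bits are truncated.
    A carry into a new leading bit (renormalization) is not modeled here."""
    if mode == "RZ":
        inj = 0                            # plain truncation
    elif mode == "RNE":
        inj = 1 << (k - 1)                 # half an ulp
    elif mode == "RI":
        inj = (1 << k) - 1                 # just below one ulp
    else:
        raise ValueError(f"unknown reduced mode: {mode}")
    rounded = (sig + inj) >> k
    # RNE tie (discarded bits exactly one half): force the LSB to 0 (even).
    if mode == "RNE" and (sig & ((1 << k) - 1)) == (1 << (k - 1)):
        rounded &= ~1
    return rounded
```

For example, with `k = 2`, `round_by_injection(0b1010, 2, "RNE")` returns 2, since 2.5 rounds to the even value 2. The appeal of injection is that all three modes share one addition followed by a truncation; only the injected constant and the tie fix-up differ.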
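The "sign-magnitude computation of a difference based on one's complement subtraction" and the compound adders mentioned above fit together through one identity: for unsigned n-bit operands, a + ~b = 2^n + (a - b - 1). A carry-out of 1 therefore means a > b, and the magnitude is the incremented sum; a carry-out of 0 means a <= b, and the magnitude is the bitwise complement of the sum. The Python sketch below is hypothetical (the names `compound_add` and `sign_magnitude_diff` are illustrative) and only models the arithmetic, not the hardware.

```python
def compound_add(x: int, y: int, n: int) -> tuple[int, int, int]:
    """Model of a compound adder: (x + y, x + y + 1, carry_out), sums truncated
    to n bits. In hardware both sums are produced in parallel."""
    mask = (1 << n) - 1
    total = x + y
    return total & mask, (total + 1) & mask, total >> n


def sign_magnitude_diff(a: int, b: int, n: int) -> tuple[bool, int]:
    """Return (a_greater, |a - b|) for unsigned n-bit a and b."""
    mask = (1 << n) - 1
    not_b = ~b & mask                   # one's complement of b
    s, s_inc, carry = compound_add(a, not_b, n)
    if carry:                           # a + ~b >= 2**n  <=>  a > b
        return True, s_inc              # |a - b| = (a + ~b) + 1
    return False, ~s & mask             # a <= b: |a - b| = ~(a + ~b)
```

For instance, `sign_magnitude_diff(3, 9, 4)` returns `(False, 6)`. Because the compound adder already supplies both candidate sums, the magnitude is obtained by a selection and an optional inversion, without a second carry-propagate addition.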
SECTION I. INTRODUCTION AND SUMMARY
Floating-point addition and subtraction are the most frequent floating-point
operations. Both operations use a floating-point adder (FP-adder). Therefore, a lot of
effort has been spent on reducing the latency of FP-adders (see [...] and the references
that appear there). Many patents deal with FP-adder design (ref. [...]). In this dissertation
an FP-adder design is implemented that accepts normalized double precision
significands, supports IEEE rounding modes, and outputs the normalized
sum/difference rounded according to the IEEE FP standard 754. The
latency of this design is analyzed using the Logical Effort Model. This model allows
for technology-independent delay analysis of CMOS circuits and enables
rigorous delay analysis that takes into account fanouts, drivers, and gate sizing.
Following Horowitz, the delay of an inverter whose fanout equals 4 is used as a
technology-independent unit of delay; this unit is denoted by FO4. The analysis using
the Logical Effort Model shows that the delay of this FP-adder design is 30.6 FO4
delays. The design is partitioned into two pipeline stages, the delay of each of which
is bounded by 15.3 FO4 delays. Extensions of the algorithm that deal with
denormal inputs and outputs are discussed in [...]. It is shown there that the delay
overhead for supporting denormal numbers can be reduced to 1-2 logic levels (i.e.,
XOR delays). Several optimization techniques are employed in this algorithm, and it is
a detailed examination of these techniques in combination that enables an
overall fast FP-adder design. In particular, reducing latency effectively with parallel
paths requires balancing the delays of the paths. Such a balance is achieved by a
gate-level consideration of the design.
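The FO4 unit and the path-balancing argument above can be made concrete with the standard Logical Effort formulas: a stage delay is d = g·h + p in units of τ, an inverter has logical effort g = 1, and an FO4 inverter therefore has delay 4 + p_inv. The Python sketch below is illustrative only. It assumes p_inv = 1 (a common textbook normalization); any gate parameters passed to it are the reader's assumptions, not figures from this design, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from math import prod

P_INV = 1.0                   # assumed parasitic delay of an inverter (units of tau)
FO4 = 1.0 * 4.0 + P_INV       # inverter (g = 1) driving four copies of itself


@dataclass
class Stage:
    g: float  # logical effort of the gate
    h: float  # electrical effort (C_out / C_in)
    p: float  # parasitic delay


def path_delay_fo4(stages: list[Stage]) -> float:
    """Delay of a path with given gate sizes: sum of (g*h + p), in FO4 units."""
    return sum(s.g * s.h + s.p for s in stages) / FO4


def optimal_path_delay_fo4(g: list[float], p: list[float], H: float, B: float = 1.0) -> float:
    """Minimum delay of an N-gate path with optimal sizing:
    D = N * F**(1/N) + sum(p), with path effort F = G * B * H, in FO4 units."""
    n = len(g)
    F = prod(g) * B * H
    return (n * F ** (1.0 / n) + sum(p)) / FO4


def dual_path_latency_fo4(path_a: float, path_b: float, select: float) -> float:
    """Two parallel paths followed by a result selection: the slower path
    dominates, so shortening only the faster path does not help."""
    return max(path_a, path_b) + select
```

For example, a chain of three inverters driving a load 64 times its input capacitance gives F = 64, an optimal stage effort of 4, and a delay of about 3 FO4 (`optimal_path_delay_fo4([1, 1, 1], [1, 1, 1], H=64)`). The `max` in `dual_path_latency_fo4` is the reason the two paths of an FP-adder must be balanced at the gate level.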
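Technique 1 in the list that follows replaces the usual exponent-difference-only separation with a criterion that also looks at the operation and at the significand difference. The Python sketch below is a hedged illustration using exact `Fraction` arithmetic. It assumes significands normalized to [1, 2), an exponent difference `delta = ea - eb`, and a criterion in which the N-path handles effective subtractions with |delta| <= 1 whose aligned significand difference is below 1, while everything else goes to the R-path; the function name `select_path` is hypothetical. Under this assumption an N-path result needs a left normalization shift of at least one position, which brings back the single bit shifted out during alignment, so no rounding is required there; this is consistent with rounding taking place in only one path.

```python
from fractions import Fraction


def select_path(eff_sub: bool, delta: int, fa: Fraction, fb: Fraction) -> str:
    """Return "N" or "R" for significands fa, fb in [1, 2) with exponent
    difference delta = ea - eb, under an assumed separation criterion."""
    # Effective additions and large exponent differences use the R-path.
    if not eff_sub or abs(delta) >= 2:
        return "R"
    # Align: shift the significand with the smaller exponent right by |delta|.
    if delta >= 0:
        larger, aligned = fa, fb / 2 ** delta
    else:
        larger, aligned = fb, fa / 2 ** (-delta)
    # Difference below 1: cancellation, long normalization shift -> N-path.
    return "N" if abs(larger - aligned) < 1 else "R"
```

For example, `select_path(True, 1, Fraction(7, 4), Fraction(1))` returns `"R"`, because 7/4 - 1/2 = 5/4 is not below 1.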
The following optimization techniques are used:
1. A two-path design with a nonstandard separation criterion. Instead of a separation based
only on the magnitude of the exponent difference, a separation criterion is defined that
also considers whether the operation is an effective subtraction and the value of the
significand difference. This separation criterion maintains the advantages of the
standard two-path designs, namely, the alignment shift and the normalization shift each
take place in only one of the paths, and the full exponent difference is computed in only one path.
In addition, this separation technique requires rounding to take place in only one path.
2. Reduction of IEEE rounding to three modes and use of injection-based rounding.
3. A simpler design is obtained by using unconditional preshifts for effective subtractions
to reduce to 2 the number of binades that the significands' sum and difference may belong to.
4. The sign-magnitude representation of the difference of the exponents and of the
significands is derived from the one's complement representation of the difference.
5. A parallel-prefix adder is used to compute the sum and the incremented sum of the significands.
6. Recodings are used to estimate the number of leading zeros in the nonredundant
representation of a number represented as a borrow-save number.
7. Postnormalization is advanced and takes place before the rounding decision is ready.
From an overview of FP-adder algorithms in technical papers and patents, the
optimization techniques that are used in each of these designs are summarized. The
algorithms of two particular implementations from the literature are also analyzed in
some more detail [...]. To allow for a fair comparison, the functionality of these
designs is adapted to matc