FPGA BASED IMPLEMENTATION OF DELAY OPTIMISED DOUBLE PRECISION IEEE FLOATING-POINT ADDER


Post on 16-Apr-2017


  • FPGA BASED IMPLEMENTATION OF DELAY OPTIMISED DOUBLE PRECISION IEEE FLOATING-POINT ADDER

    Dissertation submitted in partial fulfillment of the requirement for the degree of

    Bachelor of Engineering in Electrical Engineering

    Session 2009-2013

    Submitted by Somsubhra Ghosh

    BE (Final year)

    Under the supervision of Somnath Ghosh, Scientist, ADRIN, Dept. of Space/ISRO, Govt. of India

    DEPT. OF ELECTRICAL ENGINEERING, JADAVPUR UNIVERSITY

    KOLKATA-700032, INDIA

    June, 2012

    Roll No-000910801089, Reg. No-107973 of 2009-10

  • JADAVPUR UNIVERSITY | FPGA BASED IMPLEMENTATION OF DOUBLE PRECISION IEEE FP-ADDER

    JADAVPUR UNIVERSITY


  • CONTENTS

    Preface
    Acknowledgements
    Abstract
    Section-I Introduction and summary
    Section-II Notation
    Section-III Naive FP-Adder Algorithm
    Section-IV Optimization techniques
        4.1 Separation of FP-Adder in parallel paths
        4.2 Unification of significand result ranges
        4.3 Reduction of IEEE rounding modes
        4.4 Sign magnitude computation of a difference
        4.5 Compound addition
        4.6 Approximate counting of leading zeros
        4.7 Precomputation of post normalization shift
    Section-V Overview of present algorithm
    Section-VI Specification and detailed description
    Section-VII Delay analysis reporting
    Section-VIII Implementation
    Simulation screen shots
    Section-VIII Future Scope
    Bibliography

  • Preface

    This dissertation is part of the Bachelor of Electrical Engineering curriculum and is submitted in partial fulfillment of the requirements for the degree of B.E. As a student of Electrical Engineering, I chose a topic related to my preferred specialization. An understanding of optimal floating-point addition is essential for the fast and efficient use of resources in complex circuit models that perform complex arithmetic operations, and addition and subtraction are the basis of those operations. The topic fascinated me, and I started my work on it.

    I first studied the basic algorithmic steps and built the basic building blocks in VHDL. I then integrated the blocks as required, with the help of the debugging tool. Finally, the circuit was simulated and verified using standard test benches.

  • ACKNOWLEDGEMENT

    I sincerely thank my mentor, Mr. Somnath Ghosh, Scientist, ADRIN, and D. Mallikharjuna Rao, Section Head, ES&DS, Advanced Data Processing And Research Institute (ADRIN), for providing laboratory and other facilities. I am grateful to them for introducing me to the field of research and development. I am thankful to the Director, ADRIN, and to Dr. Venkatraman (GD) for allowing me to carry out my project work here.

    My mentor, Mr. Somnath Ghosh, has been a great source of encouragement throughout the project work. His guidance, critical appraisal, and scholarly approach have helped me reach my target.

    It is a great pleasure to express my deep and heartfelt gratitude to Sri Shashank Adimulyam, Scientist, ADRIN, for his guidance and helpful discussions. His valuable suggestions, ideas, and encouragement in the course of this work were unforgettable.

    This work is a reference implementation of "Delay-Optimized Implementation of IEEE Floating-Point Addition", which appeared in IEEE Transactions on Computers, vol. 53, no. 2, February 2004.

    Somsubhra Ghosh

  • Abstract

    This dissertation presents an implementation of the IEEE double-precision floating-point adder (FP-adder) design described in the IEEE publication "Delay-Optimized Implementation of IEEE Floating-Point Addition" by P.M. Seidel and G. Even. The adder accepts normalized numbers, supports the IEEE rounding modes, and outputs the correctly normalized, rounded sum/difference in the format required by the IEEE Standard. The FP-adder design achieves low latency by combining various optimization techniques, such as a nonstandard separation into two paths, a simple rounding algorithm, unification of rounding cases for addition and subtraction, sign-magnitude computation of a difference based on one's complement subtraction, and compound adders. A technology-independent analysis and optimization of this implementation based on the Logical Effort hardware model has been carried out, and optimal gate sizes and optimal buffer insertion have been determined. The estimated delay of the optimized design is 30.6 FO4 delays for double-precision operands (15.3 FO4 delays per stage between latches). It has been concluded that this algorithm has a shorter latency (-13 percent) and cycle time (-22 percent) than the next fastest algorithm.

    Index Terms: Floating-point addition, IEEE rounding, delay optimization, dual path algorithm, logical effort, optimized gate sizing, buffer insertion.

  • SECTION I. INTRODUCTION AND SUMMARY

    Floating-point addition and subtraction are the most frequent floating-point operations. Both operations use a floating-point adder (FP-adder). Therefore, a lot of effort has been spent on reducing the latency of FP-adders (see [2], [9], [20], [22], [23], [25], [27], [28], and the references that appear there). Many patents deal with FP-adder design (see [6], [10], [11], [14], [15], [19], [21], [31], [32], [35]). This dissertation implements an FP-adder design that accepts normalized double-precision significands, supports the IEEE rounding modes, and outputs the normalized sum/difference rounded according to the IEEE FP standard 754 [13].

    The latency of this design is analyzed using the Logical Effort Model [33]. This model allows for technology-independent delay analysis of CMOS circuits and enables a rigorous delay analysis that takes into account fanouts, drivers, and gate sizing. Following Horowitz [12], the delay of an inverter with a fanout of 4 is used as a technology-independent unit of delay; such an inverter is denoted by FO4. The analysis using the Logical Effort Model shows that the delay of this FP-adder design is 30.6 FO4 delays. The design is partitioned into two pipeline stages, each with a delay bounded by 15.3 FO4 delays. Extensions of the algorithm that deal with denormal inputs and outputs are discussed in [16], [27]. It is shown there that the delay overhead for supporting denormal numbers can be reduced to 1-2 logic levels (i.e., XOR delays).

    Several optimization techniques are employed in this algorithm. A detailed examination of these techniques in combination enables the implementation of an overall fast FP-adder design. In particular, effective reduction of latency by parallel paths requires balancing the delays of the paths. Such a balance is achieved by a gate-level consideration of the design.
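    To make the operand format mentioned above concrete, the following Python sketch splits a double-precision value into its IEEE 754 sign, biased exponent, and fraction fields and checks whether it is normalized. It is an illustration only and is not taken from the dissertation's VHDL; the function names (`unpack_double`, `is_normalized`, `significand`, `rebuild`) are hypothetical.

```python
import math
import struct


def unpack_double(x):
    """Split a float into IEEE 754 binary64 fields (sign, biased exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                      # 1 bit
    biased_exp = (bits >> 52) & 0x7FF      # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)      # 52 explicit fraction bits
    return sign, biased_exp, fraction


def is_normalized(x):
    """True for normalized numbers: exponent field is neither all zeros nor all ones."""
    _, biased_exp, _ = unpack_double(x)
    return 1 <= biased_exp <= 2046


def significand(x):
    """Return the 53-bit significand 1.f as an integer in [2**52, 2**53)."""
    if not is_normalized(x):
        raise ValueError("only normalized operands are handled in this sketch")
    _, _, fraction = unpack_double(x)
    return (1 << 52) | fraction            # prepend the hidden leading one


def rebuild(x):
    """Reassemble the value from its fields as a consistency check."""
    sign, biased_exp, _ = unpack_double(x)
    magnitude = math.ldexp(significand(x), biased_exp - 1023 - 52)
    return -magnitude if sign else magnitude
```

    For example, `unpack_double(-2.5)` yields sign 1, biased exponent 1024, and a fraction with only bit 50 set, since 2.5 = 1.25 × 2^1. Zeros, denormals, infinities, and NaNs are rejected by `is_normalized`, matching the restriction to normalized operands stated above.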

    The optimization techniques that have been used include the following:

    1. A two-path design with a nonstandard separation criterion. Instead of separating the paths based on the magnitude of the exponent difference [10], a separation criterion is defined that also considers whether the operation is an effective subtraction and the value of the significand difference. This separation criterion maintains the advantages of the standard two-path designs, namely, alignment shift and normalization shift take place only in one of the paths, and the full exponent difference is computed only in one path. In addition, this separation technique requires rounding to take place only in one path.


    2. Reduction of IEEE rounding to three modes [25] and use of injection-based rounding [8].

    3. A simpler design is obtained by using unconditional preshifts for effective subtractions, which reduce to 2 the number of binades that the sum and difference of the significands may belong to.

    4. The sign-magnitude representation of the difference of the exponents and the significands is derived from the one's complement representation of the difference.

    5. A parallel-prefix adder is used to compute the sum and the incremented sum of the significands [34].

    6. Recodings are used to estimate the number of leading zeros in the nonredundant representation of a number represented as a borrow-save number [20].

    7. Postnormalization is advanced and takes place before the rounding decision is ready.

    From an overview of FP-adder algorithms in technical papers and patents, the optimization techniques that are used in each of these designs are summarized. The algorithms of two particular implementations from the literature are also analyzed in more detail [11], [21]. To allow for a fair comparison, the functionality of these designs is adapted to matc