PROJECT REPORT ON MULTIPLIERS


DESCRIPTION

Tree- and array-type multipliers were designed in a CAD tool using Verilog HDL. Their simulation results are compared with the simulation of a Vedic multiplier, and finally the code was downloaded into an FPGA.


CHAPTER 1

1. INTRODUCTION

1.1 BACKGROUND:

In today's rapidly developing technological world, the shift has been towards the construction of small and portable devices. As the number of these battery-operated, processor-driven devices increases and their expected performance grows, there is a need to increase their processing speed and reduce their power dissipation. In such a consumer-driven scenario, these demands call for a serious look at how the devices are constructed. In the processors used for such purposes, major operations such as FIR filtering and the DCT are carried out through multipliers. As multipliers are major components of a DSP, optimization of the multiplier design will surely lead to a better-operating DSP.

    1.2 IMPORTANCE OF MULTIPLIER:

The computational performance of a DSP system is limited by its multiplication performance; since multiplication dominates the execution time of most DSP algorithms, a high-speed multiplier is much desired. Multiplication time is still the dominant factor in determining the instruction cycle time of a DSP chip. With the ever-increasing quest for greater computing power on battery-operated mobile devices, the design emphasis has shifted from optimizing conventional delay and area to minimizing power dissipation while still maintaining high performance. Traditionally, the shift-and-add algorithm has been used, but it is not suitable for VLSI implementation and is also poor from a delay point of view. Important algorithms proposed in the literature for VLSI-implementable fast multiplication are the array multiplier and the Wallace tree multiplier; this report presents the fundamental technical aspects behind these approaches. Low-power and high-speed VLSI can be implemented with different logic styles. The three important considerations for VLSI design are power, area and delay. Many logic styles have been proposed for low power dissipation and high speed, and each logic style has its own advantages in terms of speed and power.

    1.3 MULTIPLIER SCHEMES:

There are two basic schemes for the multiplication process: serial multiplication and parallel multiplication.

Serial Multiplication (Shift-Add)

A set of partial products is computed and then the partial products are summed together. The implementations are primitive, with simple architectures, and are used when a dedicated hardware multiplier is not available.

Parallel Multiplication

The partial products are generated simultaneously. Parallel implementations are used in high-performance machines, where computation latency needs to be minimized.

Comparing the two types, parallel multiplication has an advantage over serial multiplication: it takes fewer steps, so it performs faster.
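As an illustration of the serial (shift-and-add) scheme, the following behavioural Verilog sketch steps through the multiplier one bit per clock cycle; the module and signal names are our own and are not taken from the report's design.

module serial_mult (
  input            clk, start,
  input      [3:0] a, b,        // multiplicand and multiplier
  output reg [7:0] p,           // product
  output reg       done
);
  reg [7:0] mcand;              // multiplicand, shifted left each cycle
  reg [3:0] mplier;             // multiplier, shifted right each cycle
  reg [2:0] count;
  reg       busy;

  always @(posedge clk) begin
    if (start) begin
      mcand  <= {4'b0, a};
      mplier <= b;
      p      <= 8'd0;
      count  <= 3'd0;
      busy   <= 1'b1;
      done   <= 1'b0;
    end else if (busy) begin
      if (mplier[0])
        p <= p + mcand;         // add the shifted multiplicand when the current bit is 1
      mcand  <= mcand << 1;
      mplier <= mplier >> 1;
      count  <= count + 3'd1;
      if (count == 3'd3) begin
        busy <= 1'b0;
        done <= 1'b1;
      end
    end
  end
endmodule

Four clock cycles are needed for a 4 x 4 multiplication, which is exactly the latency the parallel multipliers discussed in the following chapters avoid.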

    1.4 MULTIPLIER FEATURES:

    The features of the multiplier are

    1.4.1 PIPELINING:

Pipelining allows the multiplier to accept a new set of data and start its partial multiplication even though part of another multiplication is still taking place.

  • 4

    1.4.2 MIXED ARCHITECTURE:

A mixed-type architecture has been considered, consisting of a Wallace tree multiplier. This allows taking advantage of the low delay of the Wallace multiplier.

    1.4.3 CLOCKING:

Clocking has been arranged so that the multiplier works at its highest clock frequency without compromising the correct flow of partial products through the structure.

    1.4.4 DATA RANGE:

The data range has been extended from the initial 4x4 bits to 16x16 bits, which is the working data range actually required by many DSP processors.

    1.4.5 STRUCTURAL MODELLING:

This ensures the best implementation of the multiplier, be it on an ASIC or an FPGA, and removes any chance of redundant hardware being generated.

CHAPTER 2

2.1 ADDER

In electronics, an adder is a digital circuit that performs addition of numbers. In

    modern computers adders reside in the arithmetic logic unit (ALU) where other

    operations are performed. Although adders can be constructed for many numerical

    representations, such as Binary-coded decimal or excess-3, the most common adders

    operate on binary numbers. In cases where two's complement is being used to

    represent negative numbers it is trivial to modify an adder into an adder-subtracter.

    2.2 TYPES OF ADDERS

    For single bit adders, there are two general types. A half adder has two inputs,

    generally labeled A and B, and two outputs, the sum S and carry C. S is the two-bit

    XOR of A and B, and C is the AND of A and B. Essentially the output of a half adder

    is the sum of two one-bit numbers, with C being the most significant of these two

    outputs.The second type of single bit adder is the full adder. The full adder takes into

    account a carry input such that multiple adders can be used to add larger numbers. To

remove ambiguity between the input and output carry lines, the carry in is labeled Ci

    or Cin while the carry out is labeled Co or Cout.

    Half adder

    Fig 1: Half adder circuit diagram

    A half adder is a logical circuit that performs an addition operation on two

    binary digits. The half adder produces a sum and a carry value which are both binary

    digits.


    Following is the logic table for a half adder:

    TABLE 1: HALFADDER

    A B C S

    0 0 0 0

    0 1 0 1

    1 0 0 1

    1 1 1 0
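A minimal Verilog description of this half adder (the module name is our own) follows directly from Table 1:

module half_adder (
  input  A, B,
  output S, C
);
  assign S = A ^ B;   // sum is the XOR of the inputs
  assign C = A & B;   // carry is the AND of the inputs
endmodule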

    Fig 2: Full adder circuit diagram


    Schematic symbol for a 1-bit full adder

    A full adder is a logical circuit that performs an addition operation on three

binary digits. The full adder produces a sum and a carry value, which are both binary

    digits. It can be combined with other full adders (see below) or work on its own.

    TABLE 2: FULL ADDER

A B Ci Co S

    0 0 0 0 0

    0 0 1 0 1

    0 1 0 0 1

    0 1 1 1 0

    1 0 0 0 1

    1 0 1 1 0

    1 1 0 1 0

    1 1 1 1 1

Note that the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic. This is because the only discrepancy between OR and XOR gates occurs when both inputs are 1, and for the adder shown here one can check that this case never arises. Using only two types of gates is convenient if one desires to implement the adder directly using common IC chips. A full adder can be constructed from two half adders by connecting A and B to the inputs of one half adder, connecting its sum to one input of the second half adder, connecting Ci to the other input, and ORing the two carry outputs. Equivalently, S can be formed as the three-input XOR of A, B and Ci, and Co as the three-input majority function of A, B and Ci. The output of the full adder is the two-bit arithmetic sum of three one-bit numbers.
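A sketch of this two-half-adder construction in Verilog, reusing the half_adder module given earlier (again, the names are our own):

module full_adder (
  input  A, B, Ci,
  output S, Co
);
  wire s1, c1, c2;
  half_adder ha1 (.A(A),  .B(B),  .S(s1), .C(c1));  // first half adder: A + B
  half_adder ha2 (.A(s1), .B(Ci), .S(S),  .C(c2));  // second half adder: partial sum + Ci
  assign Co = c1 | c2;                              // OR (or equivalently XOR) the two carries
endmodule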

CHAPTER 3

LITERATURE SURVEY

    3.1 BASIC MULTIPLIER ARCHITECTURES:

    3.1.1 INTRODUCTION:

A basic multiplier consists of ANDed terms (as shown in Fig 3) and an array of full adders and/or half adders arranged so as to obtain partial products at each level. These partial products are added together to obtain the final result. It is the different arrangements and construction changes in these adders that lead to the various structures of basic multipliers.

Fig 3: AND gate

The Full Adder (FA) implementation takes the two bits (A, B) and Carry In (Ci) as inputs and produces Sum (S) and Carry Out (Cout) as outputs.


    3.2 BINARY MULTIPLIER

    A Binary multiplier is an electronic hardware device used in digital electronics

    or a computer or other electronic device to perform rapid multiplication of two

    numbers in binary representation. It is built using binary adders.

The rules for binary multiplication can be stated as follows:

1. If the multiplier digit is 1, the multiplicand is simply copied down and represents the product.

2. If the multiplier digit is 0, the product is also 0.

For designing a multiplier circuit we should have circuitry to do the following:

1. It should be capable of identifying whether a bit is 0 or 1.

2. It should be capable of shifting the partial products left.

3. It should be able to add all the partial products to give the final product as the sum of the partial products.

4. It should examine the sign bits. If they are alike, the sign of the product will be positive; if the sign bits are opposite, the product will be negative. The sign bit of the product determined by this rule should be displayed along with the product.

    From the above discussion we observe that it is not necessary to wait until all

    the partial products have been formed before summing them. In fact the addition of

    partial product can be carried out as soon as the partial product is formed.

    Notations:

    a multiplicand

    b multiplier

    p product


Binary multiplication (e.g. n = 4):

P = a x b

a = a(n-1) a(n-2) ... a1 a0 (multiplicand)
b = b(n-1) b(n-2) ... b1 b0 (multiplier)
p = p(2n-1) p(2n-2) ... p1 p0 (product)

            x x x x     a (multiplicand)
            x x x x     b (multiplier)
            ---------
            x x x x     b0.a.2^0
          x x x x       b1.a.2^1   (partial products)
        x x x x         b2.a.2^2
      x x x x           b3.a.2^3
  -----------------
  x x x x x x x x       p (sum of partial products)

3.2.1 BASIC HARDWARE MULTIPLIER

In binary, generating the partial products is trivial: if the multiplier bit is 1, copy the multiplicand; otherwise the partial-product row is 0. This is implemented with an AND gate per bit.
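A small Verilog sketch of this partial-product generation for a 4-bit multiplier (the module and port names are our own assumptions):

module partial_products (
  input  [3:0] a,                    // multiplicand
  input  [3:0] b,                    // multiplier
  output [3:0] pp0, pp1, pp2, pp3    // one partial-product row per multiplier bit
);
  // Each row is the multiplicand ANDed with one replicated multiplier bit:
  // the multiplicand is copied when the bit is 1, and the row is all zeros otherwise.
  assign pp0 = a & {4{b[0]}};
  assign pp1 = a & {4{b[1]}};
  assign pp2 = a & {4{b[2]}};
  assign pp3 = a & {4{b[3]}};
endmodule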


    3.2.2 MULTIPLY ACCUMULATE CIRCUITS

Multiplication followed by accumulation is a common operation in many digital systems, particularly highly interconnected ones such as digital filters, neural networks and data quantizers. One typical MAC (multiply-accumulate) architecture is illustrated in the figure. It multiplies two values, adds the result to the previously accumulated value, and then stores the new value back in the register for future accumulations. Another feature of a MAC circuit is that it must check for overflow, which might happen when the number of MAC operations is large. This design can be done using components, because each of the units shown in the figure has already been designed; however, since it is a relatively simple circuit, it can also be designed directly. In any case the MAC circuit, as a whole, can be used as a component in applications like digital filters and neural networks.
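A behavioural Verilog sketch of such a MAC stage (the widths, names and overflow convention are our own assumptions, not the report's design):

module mac (
  input             clk, rst,
  input       [7:0] a, b,
  output reg [16:0] acc,        // accumulator register
  output reg        overflow    // sticky flag: the accumulated value overflowed
);
  wire [15:0] prod = a * b;                       // multiply the two inputs
  wire [17:0] next = {1'b0, acc} + {2'b00, prod}; // add to the previous accumulation
  always @(posedge clk) begin
    if (rst) begin
      acc      <= 17'd0;
      overflow <= 1'b0;
    end else begin
      acc <= next[16:0];                          // store the new value back in the register
      if (next[17])
        overflow <= 1'b1;                         // the result no longer fits the accumulator
    end
  end
endmodule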

    3.3 WALLACE TREE MULTIPLIER:

A Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers. For an N x N bit multiplication, the partial products are formed from N^2 AND gates. Next, the N rows of partial products are grouped together in sets of three rows each. Any additional rows that are not members of these groups are transferred to the next level without modification. For a column consisting of three partial products, a full adder is used, with the sum dropped down to the same column and the carry out brought to the next higher column. For a column with two partial products, a half adder is used in place of a full adder. At the final stage, a carry-propagate adder is used to add all the propagating carries and obtain the final result. The scheme can also be implemented using carry-save adders, and it is sometimes combined with Booth encoding. Various other studies have been carried out to reduce the number of adders for higher-order bit widths such as 16 and 32. Applications include use in DSPs for performing FFTs, FIR filters, etc.

    3.3.1 WALLACE TREE HARDWARE ARCHITECTURE:

Fig 4: Wallace tree hardware architecture


    3.3.2 FUNCTION:

The Wallace tree has three steps:

1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding n^2 results. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of a2b3 has weight 32.

2. Reduce the number of partial products to two by layers of full and half adders.

3. Group the wires into two numbers, and add them with a conventional adder.
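The middle step can be sketched in Verilog as a single 3:2 reduction layer, written here with vector operators rather than the individual full-adder instances the figure shows (a hedged sketch; the parameter and names are ours):

// One 3:2 reduction layer (a carry-save adder): three N-bit operands are
// compressed into a sum vector and a carry vector, the carry being shifted
// left by one position because each carry bit has double the weight of its sum bit.
// Layers of this kind are applied until only two vectors remain, which are
// then added by the final conventional adder.
module csa_row #(parameter N = 8) (
  input  [N-1:0] x, y, z,
  output [N-1:0] sum,
  output [N:0]   carry
);
  assign sum   = x ^ y ^ z;                            // per-bit full-adder sum
  assign carry = {(x & y) | (y & z) | (x & z), 1'b0};  // per-bit full-adder carry
endmodule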

    3.3.3 EXAMPLE:

    Suppose two numbers are being multiplied:

    a3a2a1a0 X

    b3b2b1b0

    ___________________________________

    a3b0 a2b0 a1b0 a0b0

    a3b1 a2b1 a1b1 a0b1

    a3b2 a2b2 a1b2 a0b2

    a3b3 a2b3 a1b3 a0b3

    _____________________________________

    Arranging the partial products in the form of tree structure

    a3b3 a2b3 a1b3 a0b3 a0b2 a0b1 a0b0

    a3b2 a2b2 a1b2 a1b1 a1b0

    a3b1 a2b1 a2b0

    a3b0


    3.3.4 ADDER ELEMENTS

    Half Adder:


    Full Adder:

3.3.5 ADVANTAGES:

Each layer of the tree reduces the number of operand vectors by a factor of 3:2.

Minimum propagation delay.

The benefit of the Wallace tree is that there are only O(log n) reduction layers, whereas adding the partial products with regular adders would require O((log n)^2) time.

3.3.6 DISADVANTAGES:

Wallace trees do not provide any advantage over ripple adder trees in many FPGAs. Due to the irregular routing, they may actually be slower and are certainly more difficult to route.

The adder structure grows as the multiplication bit width increases.

CHAPTER 4

4. ARRAY MULTIPLIER:

This is the most basic form of binary multiplier construction. Its basic principle is exactly that of pen-and-paper multiplication. It consists of a highly regular array of full adders, the exact number depending on the length of the binary numbers to be multiplied. Each row of this array generates a partial product, which is then added to the sum and carry generated in the next row. The final result of the multiplication is obtained directly after the last row. The ANDed terms are generated using logic AND gates. The Full Adder (FA) implementation takes the two bits (A, B) and Carry In (Ci) as inputs and produces Sum (S) and Carry Out (Co) as outputs.

    4.1 HARDWARE ARCHITECTURE

    Fig 5: Hardware architecture


    4.2 EXAMPLE :

    4*4 bit multiplication

    a3 a2 a1 a0

    b3 b2 b1 b0

    a3b0 a2b0 a1b0 a0b0

    a3b1 a2b1 a1b1 a0b1

    a3b2 a2b2 a1b2 a0b2

a3b3 a2b3 a1b3 a0b3

    p7 p6 p5 p4 p3 p2 p1 p0
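A minimal behavioural Verilog sketch of this row-by-row accumulation (names are our own; each '+' stands for one row of full adders ripple-adding a partial product into the running sum):

module array_mult4 (
  input  [3:0] a, b,
  output [7:0] p
);
  // Partial-product rows, one per multiplier bit.
  wire [3:0] pp0 = a & {4{b[0]}};
  wire [3:0] pp1 = a & {4{b[1]}};
  wire [3:0] pp2 = a & {4{b[2]}};
  wire [3:0] pp3 = a & {4{b[3]}};

  // Each row adds the next partial product to the upper bits of the previous sum;
  // the bit that falls out of each row is one bit of the final product.
  wire [4:0] s1 = {1'b0, pp0[3:1]} + pp1;
  wire [4:0] s2 = {1'b0, s1[4:1]}  + pp2;
  wire [4:0] s3 = {1'b0, s2[4:1]}  + pp3;

  assign p = {s3, s2[0], s1[0], pp0[0]};
endmodule

For example, a = 4'b1111 and b = 4'b1111 gives p = 8'b11100001 (15 x 15 = 225).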

    4.3 PRINCIPLES OF ARRAY MULTIPLIER:

    Fig 6: Array multiplier


Due to its highly regular structure, the array multiplier is very easily constructed and can be densely implemented in VLSI, so it takes little space. But compared to other multiplier structures proposed later, it shows a high computational time: the delay is of order O(N), one of the highest among the multiplier structures discussed here.

4.4 BAUGH-WOOLEY MULTIPLIER:

The Baugh-Wooley multiplier is used for both unsigned and signed number multiplication. Signed operands are represented in 2's complement form. The partial products are adjusted so that the negative terms are moved to the last step, which in turn maximizes the regularity of the multiplication array. The Baugh-Wooley multiplier operates on signed operands in 2's complement representation so that the signs of all partial products are positive. To reiterate, the numerical values of the 2's complement numbers, say X and Y, can be obtained from product terms each made of one AND gate.

Variables with bars denote prior inversion. Inverters are connected before the inputs of the full adders or the AND gates as required by the algorithm. Each column represents the addition in accordance with the respective weight of the product term.

    4.5 BAUGH-WOOLEY HARDWARE ARCHITECTURE:

Fig 7: Signed 2's-complement Baugh-Wooley multiplier

4.6 MULTIPLYING TWO'S COMPLEMENT NUMBERS:

The Baugh-Wooley multiplication algorithm is an efficient way to handle the sign bits. This technique has been developed in order to design regular multipliers suited to 2's-complement numbers. Dr. Gebali has extended this basic idea and developed efficient fast inner-product processors capable of performing double-precision multiply-accumulate operations without a speed penalty. Let us consider two n-bit numbers, A and B, to be multiplied. A and B can be represented in the usual 2's complement form,


where the a_i's and b_i's are the bits in A and B, respectively, and a_(n-1) and b_(n-1) are the sign bits.

The product, P = A x B, is then given by the corresponding expansion, which indicates that the final product is obtained by subtracting the last two positive terms from the first two terms.
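A sketch of these standard expressions (the usual two's-complement forms, assumed here since the report presents its equations only as figures):

A = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i, \qquad B = -b_{n-1} 2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j

P = A \times B = a_{n-1} b_{n-1} 2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - 2^{n-1} \sum_{i=0}^{n-2} a_i b_{n-1} 2^{i} - 2^{n-1} \sum_{j=0}^{n-2} a_{n-1} b_j 2^{j}

The last two (subtracted) terms are the ones the Baugh-Wooley scheme rewrites so that every partial product enters the array with a positive sign.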


    4.7 BLOCK DIAGRAM OF A 4*4 BAUGH-WOOLEY MULTIPLIER:

Fig 8: Block diagram of the Baugh-Wooley multiplier

    4.8 ADVANTAGES:

    Minimum complexity.

    Easily scalable.

    Easily pipelined.

    Regular shape, easy to place & route.

    4.9 DISADVANTAGES:

    High power consumption.

    More digital gates resulting in large chip area.

CHAPTER 5

5.1 PROPOSED MULTIPLIER DESIGN

Mathematics is the mother of all sciences; it is full of magic and mysteries. The ancient Indians were able to understand these mysteries and develop simple keys to solve them. Thousands of years ago the Indians used these techniques in different fields such as the construction of temples, astrology, medicine and science, due to which India emerged as the richest country in the world. The Indians called this system of calculation Vedic mathematics. Vedic mathematics is much simpler and easier to understand than conventional mathematics. The ancient system of Vedic mathematics was reintroduced to the world by Swami Bharati Krishna Tirthaji Maharaj, Shankaracharya of Goverdhan Peath, who gave it the name Vedic Mathematics. Bharati Krishna, who was himself a scholar of Sanskrit, mathematics, history and philosophy, was able to reconstruct the mathematics of the Vedas. According to his research, all of mathematics is based on sixteen Sutras, or word-formulae, and thirteen sub-sutras. According to Mahesh Yogi, the sutras of Vedic Mathematics are the software for the cosmic computer that runs this universe.

Vedic mathematics offers wonderful applications to arithmetical computation, the theory of numbers, compound multiplication, algebraic operations, factorization, simple quadratic and higher-order equations, simultaneous quadratic equations, partial fractions, calculus, squaring, cubing, square roots, cube roots, coordinate geometry and the Vedic numerical code. Conventional mathematics is an integral part of engineering education, since most engineering system designs are based on various mathematical approaches, and all the leading manufacturers of microprocessors have developed architectures suitable for conventional binary arithmetic methods. The need for faster processing speed is continuously driving major improvements in processor technologies, as well as the search for new algorithms. The Vedic mathematics approach is totally different and is considered very close to the way a human mind works. A multiplier is one of the key hardware blocks in most applications, such as digital signal processing, encryption and decryption algorithms in cryptography, and other logical computations. With advances in technology, many researchers have tried to design multipliers offering high speed, low power consumption, regularity of layout (and hence smaller area), or a combination of these. The Vedic multiplier is considered here to satisfy these requirements. In this work, we present multiplication operations based on Urdhva Tiryagbhyam in binary, designed using a newly proposed 4-bit adder and implemented in an HDL. The report is organized as follows: the Vedic multiplication method based on the Urdhva Tiryagbhyam sutra for binary numbers is discussed, a new 4-bit adder is proposed, and the design and implementation of the above multiplier are described. Finally, the experimental results obtained are summarized, together with the conclusions of the work.

    5.2 VEDIC MULTIPLIER:

Digital signal processors (DSPs) are very important in various engineering disciplines, and fast multiplication is essential in DSPs for convolution, Fourier transforms, etc. A fast method for multiplication based on ancient Indian Vedic mathematics is proposed in this work. Among the various methods of multiplication in Vedic mathematics, Urdhva tiryakbhyam is discussed in detail. Urdhva tiryakbhyam is a general multiplication formula applicable to all cases of multiplication. This algorithm is applied to digital arithmetic and a multiplier architecture is formulated. This is a highly modular design in which smaller blocks can be used to build larger blocks. The coding is done in Verilog HDL and synthesis is done using Altera Quartus-II. The combinational delay obtained after synthesis is compared with the performance of the Baugh-Wooley and Wallace tree multipliers, which are fast multipliers. This Vedic multiplier can bring about a great improvement in DSP performance.


    5.3 IMPORTANCE OF VEDIC MATHEMATICS:

    Among the various methods of multiplication in Vedic mathematics, Urdhva

    tiryagbhyam, being a general multiplication formula, is equally applicable to all cases

    of multiplication. This is more efficient in the multiplication of large numbers with

respect to speed and area. In this work, a 4 x 4 binary multiplier is designed using

    this sutra. This multiplier can be used in applications such as digital signal processing,

    encryption and decryption algorithms in cryptography, and in other logical

    computations. This design is implemented in Verilog HDL.

    5.4 Urdhva Tiryakbhyam Sutra

The proposed Vedic multiplier is based on a Vedic multiplication formula (Sutra). This Sutra has been traditionally used for the multiplication of two numbers. The Urdhva Tiryakbhyam Sutra is a general multiplication formula applicable to all cases of multiplication; it means "Vertically and Crosswise". The digits on the two ends of a line are multiplied and the result is added to the previous carry. When there are more lines in one step, all the results are added to the previous carry. The least significant digit of the number thus obtained acts as one of the result digits and the rest act as the carry for the next step. Initially the carry is taken to be zero. The line diagram for the multiplication of two 4-bit numbers is as shown in the figure.


    To illustrate this multiplication scheme, let us consider the multiplication of two

    decimal numbers (325 * 728). Line diagram for the multiplication is shown in Figure.

    The digits on the two ends of the line are multiplied and the result is added with the

    previous carry. When there are more lines in one step, all the results are added to the

    previous carry. The least significant digit of the number thus obtained acts as one of

    the result digits and the rest act as the carry for the next step. Initially the carry is

taken to be zero.

    Fig 9: Multiplication of two decimal numbers by Urdhva tiryagbhyam sutra
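A worked version of these steps for 325 x 728 (the computation the figure illustrates), given here for reference:

Step 1: 5 x 8 = 40; result digit 0, carry 4.
Step 2: 2 x 8 + 5 x 2 = 26; 26 + 4 = 30; result digit 0, carry 3.
Step 3: 3 x 8 + 2 x 2 + 5 x 7 = 63; 63 + 3 = 66; result digit 6, carry 6.
Step 4: 3 x 2 + 2 x 7 = 20; 20 + 6 = 26; result digit 6, carry 2.
Step 5: 3 x 7 = 21; 21 + 2 = 23; these become the leading digits.

Reading the digits from the last step back to the first gives 236600, which is indeed 325 x 728.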

The Urdhva tiryagbhyam Sutra has been used above for the multiplication of two decimal numbers. The same Sutra is used for binary multiplication as shown in the figure. The 4-bit binary

    numbers to be multiplied are written on two consecutive sides of the square as shown

    in the figure. The square is divided into rows and columns where each row/column

    corresponds to one of the digit of either a multiplier or a multiplicand. Thus, each bit

    of the multiplier has a small box common to a digit of the multiplicand. Each bit of

    the multiplier is then independently multiplied (logical AND) with every bit of the


    multiplicand and the product is written in the common box. All the bits lying on a

    crosswise dotted line are added to the previous carry. The least significant bit of the

    obtained number acts as the result bit and the rest as the carry for the next step. Carry

    for the first step (i.e., the dotted line on the extreme right side) is taken to be zero. We

    can extend this method for higher order binary numbers.

    Fig 10: Multiplication of two 4-bit binary numbers by Urdhva tiryagbhyam sutra

Now we will extend this Sutra to the binary number system. For the multiplication algorithm, let us consider the multiplication of two 8-bit binary numbers A7A6A5A4A3A2A1A0 and B7B6B5B4B3B2B1B0. As the result of this multiplication would be more than 8 bits, we express it in terms of the result bits R0, R1, ..., R14 and their carries. As in the last case, the digits on both sides of a line are multiplied and added with the carry from the previous step. This generates one of the bits of the result and a carry. This carry is added in the next step, and so the process goes on. If more than one line is present in a step, all the results are added to the previous carry. In each step, the least significant bit acts as the result bit and all the other bits act as the carry. For example, if in some intermediate step we get 011, then 1 will act as the result bit and 01 as the carry. Thus we get the following expressions:

    R0=A0B0

    C1R1=A0B1+A1B0

    C2R2=C1+A0B2+A2B0+A1B1

    C3R3=C2+A3B0+A0B3+A1B2+A2B1

    C4R4=C3+A4B0+A0B4+A3B1+A1B3+A2B2

    C5R5=C4+A5B0+A0B5+A4B1+A1B4+A3B2+A2B3

    C6R6=C5+A6B0+A0B6+A5B1+A1B5+A4B2+A2B4 +A3B3

    C7R7=C6+A7B0+A0B7+A6B1+A1B6+A5B2+A2B5 +A4B3+A3B4

    C8R8=C7+A7B1+A1B7+A6B2+A2B6+A5B3+A3B5+A4B4

    C9R9=C8+A7B2+A2B7+A6B3+A3B6+A5B4 +A4B5

    C10R10=C9+A7B3+A3B7+A6B4+A4B6+A5B5

    C11R11=C10+A7B4+A4B7+A6B5+A5B6

    C12R12=C11+A7B5+A5B7+A6B6

    C13R13=C12+A7B6+A6B7

    C14R14=C13+A7B7

    C14R14R13R12R11R10R9R8R7R6R5R4R3R2R1R0 being the final product.

    Hence this is the general mathematical formula applicable to all cases of

    multiplication. All the partial products are calculated in parallel and the delay

    associated is mainly the time taken by the carry to propagate through the adders which

    form the multiplication array. So, this is not an efficient algorithm for the

    multiplication of large numbers as a lot of propagation delay will be involved in such

cases. To overcome this problem, the Nikhilam Sutra presents an efficient method of multiplying two large numbers.


5.5 THE MULTIPLIER ARCHITECTURE:

The multiplier architecture is based on the Urdhva tiryakbhyam sutra. The advantage of this algorithm is that the partial products and their sums are calculated in parallel; this parallelism makes the multiplier clock independent. The other main advantage of this multiplier compared to other multipliers is its regularity, and due to this modular nature the layout design is easy. The architecture can be explained with two four-bit numbers, i.e. the multiplier and multiplicand are four-bit numbers. Each of them is split into two-bit blocks, which are handled by 2 x 2 multiplier blocks. According to the algorithm, the 4 x 4 (A x B) bit multiplication is organized as follows:

A = A3A2A1A0, split into AH and AL
B = B3B2B1B0, split into BH and BL
AH = A3A2, AL = A1A0
BH = B3B2, BL = B1B0


5.6 METHODOLOGY:

Fig 11: Vedic algorithm

By the algorithm, the product is obtained as follows:

A x B = (AH x BH) << 4 + (AH x BL + AL x BH) << 2 + (AL x BL)

where << denotes a left shift by the given number of bit positions. The four smaller multiplications are carried out in parallel.

The 4 x 4 bit multiplication can in turn be reduced to 2 x 2 bit multiplications: each two-bit half is further divided into single bits, e.g. AH = AHH AHL and BH = BHH BHL, so that

AH x BH = (AHH x BHH) << 2 + (AHH x BHL + AHL x BHH) << 1 + (AHL x BHL)

These smaller multiplications are likewise performed in parallel, as sketched below.
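The following Verilog sketch shows one possible realization of this scheme (module and signal names are our own, and the final addition is written behaviourally; the report instead proposes a dedicated 4-bit adder for that stage):

// 2 x 2 Urdhva Tiryagbhyam block: one vertical product, one crosswise step, one carry.
module vedic2x2 (
  input  [1:0] a, b,
  output [3:0] p
);
  wire t0 = a[1] & b[0];
  wire t1 = a[0] & b[1];
  wire t2 = a[1] & b[1];
  wire c1 = t0 & t1;            // carry generated by the crosswise step
  assign p[0] = a[0] & b[0];    // vertical product of the least significant bits
  assign p[1] = t0 ^ t1;        // crosswise sum
  assign p[2] = t2 ^ c1;        // vertical product of the MSBs plus the carry
  assign p[3] = t2 & c1;        // final carry
endmodule

// 4 x 4 multiplier assembled from four 2 x 2 blocks, following
// A x B = (AH x BH) << 4 + (AH x BL + AL x BH) << 2 + (AL x BL).
module vedic4x4 (
  input  [3:0] a, b,
  output [7:0] p
);
  wire [3:0] ll, lh, hl, hh;
  vedic2x2 m0 (.a(a[1:0]), .b(b[1:0]), .p(ll));  // AL x BL
  vedic2x2 m1 (.a(a[3:2]), .b(b[1:0]), .p(hl));  // AH x BL
  vedic2x2 m2 (.a(a[1:0]), .b(b[3:2]), .p(lh));  // AL x BH
  vedic2x2 m3 (.a(a[3:2]), .b(b[3:2]), .p(hh));  // AH x BH

  // Weighted sum of the four partial results.
  assign p = {hh, 4'b0} + ({4'b0, hl} << 2) + ({4'b0, lh} << 2) + {4'b0, ll};
endmodule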


    5.7 ADVANTAGE OF VEDIC METHODS

    The use of Vedic mathematics lies in the fact that it reduces the typical

    calculations in conventional mathematics to very simple ones. This is so because the

    Vedic formulae are claimed to be based on the natural principles on which the human

mind works. Vedic mathematics is a methodology of arithmetic rules that allows a more efficient and faster implementation. This is a very interesting field and presents some

    effective algorithms which can be applied to various branches of engineering such as

    computing.

CHAPTER 6

6.1 VERILOG LANGUAGE

6.1.1 Introduction to Verilog HDL

    Verilog HDL has evolved as a standard hardware description language. Verilog

HDL offers many useful features. Verilog HDL is a general-purpose hardware

    description language that is easy to learn and easy to use. It is similar in syntax to the

    C programming language. Designers with C programming experience will find it easy

    to learn Verilog HDL. Verilog HDL allows different levels of abstraction to be mixed

in the same model. Thus, a designer can define a hardware model in terms of switches,

    gates, RTL, or behavioral code. Also, a designer needs to learn only one language for

    stimulus and hierarchical design. Most popular logic synthesis tools support Verilog

    HDL. This makes it the language of choice for designers. All fabrication vendors

    provide Verilog HDL libraries for postlogic synthesis simulation. Thus, designing a

    chip in Verilog HDL allows the widest choice of vendors. The Programming

    Language Interface (PLI) is a powerful feature that allows the user to write custom C

    code to interact with the internal data structures of Verilog. Designers can customize a

    Verilog HDL simulator to their needs with the PLI.
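As a small illustration of mixing abstraction levels in one model (an example of our own, not taken from the report), the module below combines a gate-level half adder with a behavioural register stage:

module mixed_example (
  input      clk,
  input      a, b,
  output reg sum_q, carry_q
);
  wire s, c;

  // Gate-level (structural) description of a half adder.
  xor g1 (s, a, b);
  and g2 (c, a, b);

  // Behavioural (RTL) description registering the results in the same model.
  always @(posedge clk) begin
    sum_q   <= s;
    carry_q <= c;
  end
endmodule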

    6.2 Importance of HDLs

    HDLs have many advantages compared to traditional schematic-based design.

    Designs can be described at a very abstract level by use of HDLs. Designers can write

    their RTL description without choosing a specific fabrication technology. Logic

    synthesis tools can automatically convert the design to any fabrication technology. If a

    new technology emerges, designers do not need to redesign their circuit. They simply

    input the RTL description to the logic synthesis tool and create a new gate-level

    netlist, using the new fabrication technology. The logic synthesis tool will optimize

    the circuit in area and timing for the new technology. By describing designs in HDLs,


    functional verification of the design can be done early in the design cycle. Since

    designers work at the RTL level, they can optimize and modify the RTL description

    until it meets the desired functionality. Most design bugs are eliminated at this point.

    This cuts down design cycle time significantly because the probability of hitting a

    functional bug at a later time in the gate-level netlist or physical layout is minimized.

    Designing with HDLs is analogous to computer programming. A textual description

    with comments is an easier way to develop and debug circuits. This also provides a

    concise representation of the design, compared to gate-level schematics. Gate-level

    schematics are almost incomprehensible for very complex designs. HDL-based design

    is here to stay. With rapidly increasing complexities of digital circuits and

    increasingly sophisticated EDA tools, HDLs are now the dominant method for large

digital designs. No digital circuit designer can afford to ignore HDL-based design. New tools and languages focused on verification have emerged in the past few years. These languages are better suited for functional verification. However, for logic design, HDLs continue to be the preferred choice.

    6.3 Trends in HDLs

    The speed and complexity of digital circuits have increased rapidly. Designers

    have responded by designing at higher levels of abstraction. Designers have to think

    only in terms of functionality. EDA tools take care of the implementation details.

    With designer assistance, EDA tools have become sophisticated enough to achieve a

    close to optimum implementation. The most popular trend currently is to design in

HDL at an RTL level, because logic synthesis tools can create gate-level netlists from

    RTL level design. Behavioral synthesis allowed engineers to design directly in terms

    of algorithms and the behavior of the circuit, and then use EDA tools to do the

    translation and optimization in each phase of the design. However, behavioral

    synthesis did not gain widespread acceptance. Today, RTL design continues to be


    very popular. Verilog HDL is also being constantly enhanced to meet the needs of

    new verification methodologies. Formal verification and assertion checking

    techniques have emerged. Formal verification applies formal mathematical

    techniques to verify the correctness of Verilog HDL descriptions and to establish

    equivalency between RTL and gate-level netlists. However the need to describe a

    design in Verilog HDL will not go away. Assertion checkers allow checking to be

    embedded in the RTL code. This is a convenient way to do checking in the most

    important parts of a design. New verification languages have also gained rapid

    acceptance. These languages combine the parallelism and hardware constructs from

    HDLs with the object oriented nature of C++.

    These languages also provide support for automatic stimulus creation, checking,

    and coverage. However, these languages do not replace Verilog HDL. They simply

    boost the productivity of the verification process. Verilog HDL is still needed to

    describe the design. For very high-speed and timing-critical circuits like

    microprocessors, the gate-level netlist provided by logic synthesis tools is not optimal.

    In such cases, designers often mix gate-level description directly into the RTL

    description to achieve optimum results. This practice is opposite to the high-level

design paradigm, yet it is frequently used for high-speed designs because designers

    need to squeeze the last bit of timing out of circuits, and EDA tools sometimes prove

    to be insufficient to achieve the desired results. Another technique that is used for

    system-level design is a mixed bottom-up methodology where the designers use either

    existing Verilog HDL modules, basic building blocks, or vendor-supplied core blocks

    to quickly bring up their system simulation. This is done to reduce development costs

    and compress design schedules. For example, consider a system that has a CPU,

    graphics chip, I/O chip, and a system bus. The CPU designers would build the next-

    generation CPU themselves at an RTL level, but they would use behavioral models for


    the graphics chip and the I/O chip and would buy a vendor-supplied model for the

    system bus. Thus, the system-level simulation for the CPU could be up and running

    very quickly and long before the RTL descriptions for the graphics chip and the I/O

    chip are complete.

CHAPTER 7

7.1 INTRODUCTION TO FPGA:

    A field-programmable gate array (FPGA) is an integrated circuit designed to

be configured by a customer or a designer after manufacturing, hence "field-

    programmable". The FPGA configuration is generally specified using a hardware

    description language (HDL), similar to that used for an application-specific integrated

    circuit (ASIC) (circuit diagrams were previously used to specify the configuration, as

    they were for ASICs, but this is increasingly rare). Contemporary FPGAs have large

    resources of logic gates and RAM blocks to implement complex digital computations.

    As FPGA designs employ very fast IOs and bidirectional data buses it becomes a

    challenge to verify correct timing of valid data within setup time and hold time. Floor

    planning enables resources allocation within FPGA to meet these time constraints.

    FPGAs can be used to implement any logical function that an ASIC could perform.

    The ability to update the functionality after shipping, partial re-configuration of a

    portion of the design and the low non-recurring engineering costs relative to an ASIC

    design (notwithstanding the generally higher unit cost), offer advantages for many

    applications.

    FPGAs contain programmable logic components called "logic blocks", and a

hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like many (changeable) logic gates that can be inter-wired in

    (many) different configurations. Logic blocks can be configured to perform complex

    combinational functions, or merely simple logic gates like AND and XOR. In most

    FPGAs, the logic blocks also include memory elements, which may be simple flip-

    flops or more complete blocks of memory.

    Some FPGAs have analog features in addition to digital functions. The most

    common analog feature is programmable slew rate and drive strength on each output

    pin, allowing the engineer to set slow rates on lightly loaded pins that would otherwise


    ring unacceptably, and to set stronger, faster rates on heavily loaded pins on high-

speed channels that would otherwise run too slow. Another relatively common analog

    feature is differential comparators on input pins designed to be connected to

    differential signaling channels. A few "mixed signal FPGAs" have integrated

    peripheral analog-to-digital converters (ADCs) and digital-to-analog converters

    (DACs) with analog signal conditioning blocks allowing them to operate as a system-

on-a-chip. Such devices blur the line between an FPGA, which carries digital ones and

    zeros on its internal programmable interconnect fabric, and field-programmable

    analog array (FPAA), which carries analog values on its internal programmable

    interconnect fabric.

    7.2 FPGA architecture:

    The most common FPGA architecture consists of an array of logic blocks

    (called Configurable Logic Block, CLB, or Logic Array Block, LAB, depending on

    vendor), I/O pads, and routing channels. Generally, all the routing channels have the

    same width (number of wires). Multiple I/O pads may fit into the height of one row or

    the width of one column in the array.

    An application circuit must be mapped into an FPGA with adequate resources.

    While the number of CLBs/LABs and I/Os required is easily determined from the

    design, the number of routing tracks needed may vary considerably even among

    designs with the same amount of logic. For example, a crossbar switch requires much

    more routing than a systolic array with the same gate count. Since unused routing

    tracks increase the cost (and decrease the performance) of the part without providing

    any benefit, FPGA manufacturers try to provide just enough tracks so that most

    designs that will fit in terms of Lookup tables (LUTs) and IOs can be routed. This is

    determined by estimates such as those derived from Rent's rule or by experiments with

    existing designs.


    In general, a logic block (CLB or LAB) consists of a few logical cells (called

    ALM, LE, Slice etc.). A typical cell consists of a 4-input LUT, a Full adder (FA) and a

    D-type flip-flop, as shown below. The LUTs are in this figure split into two 3-input

    LUTs. In normal mode those are combined into a 4-input LUT through the left mux.

    In arithmetic mode, their outputs are fed to the FA. The selection of mode is

    programmed into the middle multiplexer. The output can be either synchronous or

    asynchronous, depending on the programming of the mux to the right, in the figure

    example. In practice, entire or parts of the FA are put as functions into the LUTs in

    order to save space.

    Fig 12: FPGA architecture
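A behavioural sketch of such a generic logic cell (our own simplification of the description above, omitting the arithmetic-mode full adder):

module logic_cell #(parameter [15:0] LUT_INIT = 16'h0000) (
  input       clk,
  input [3:0] in,        // the four LUT inputs
  input       use_reg,   // select registered or combinational output
  output      out
);
  wire [15:0] lut = LUT_INIT;     // 16-entry truth table
  wire        lut_out = lut[in];  // index the table with the input vector
  reg         q;
  always @(posedge clk)
    q <= lut_out;                 // D-type flip-flop after the LUT
  assign out = use_reg ? q : lut_out;   // output multiplexer
endmodule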

ALMs and Slices usually contain 2 or 4 structures similar to the example figure, with some shared signals. CLBs/LABs typically contain a few ALMs/LEs/Slices. In

    recent years, manufacturers have started moving to 6-input LUTs in their high

    performance parts, claiming increased performance. Since clock signals (and often


    other high fan out signals) are normally routed via special-purpose dedicated routing

    networks in commercial FPGAs, they and other signals are separately managed.

    7.3 Cyclone II FPGA Family:

Altera's Cyclone II FPGA family is designed on an all-layer copper, low-k, 1.2-V SRAM process and is optimized for the smallest possible die size. Built on TSMC's highly successful 90-nm process technology using 300-mm wafers, the Cyclone II

    FPGA family offers higher densities, more features, exceptional performance, and the

    benefits of programmable logic at ASIC prices.

    7.4 Altera's Cyclone II FPGA Family Features:

1. Cost-Optimized Architecture

The Cyclone II architecture is optimized for the lowest cost and offers up to 68,416 logic elements (LEs), more than 3x the density of first-generation Cyclone FPGAs.

    The logic resources in Cyclone II FPGAs can be used to implement complex

    applications.

2. High Performance

    Cyclone II FPGAs are 60 percent faster than competing low-cost 90-nm FPGAs,

    making them the highest performing low-cost 90-nm FPGAs on the market.

3. Low Power

    Cyclone II FPGAs are half the power of competing low-cost 90-nm FPGAs,

    dramatically reducing both static and dynamic power.


4. Process Technology

    Cyclone II FPGAs are manufactured on 300-mm wafers using TSMC's leading-

    edge 90-nm, low-k dielectric process technology.

5. Embedded Multipliers

    Cyclone II FPGAs offer up to 150 18 x 18 multipliers that are ideal for low-cost

    digital signal processing (DSP) applications. These multipliers are capable of

    implementing common DSP functions such as finite impulse response (FIR) filters,

    fast Fourier transforms (FFTs), correlators, encoders/decoders, and numerically

    controlled oscillators (NCOs).

6. Fast On Capability

    Select Cyclone II FPGAs offer fast on capability, allowing them to be

    operational soon after power up, making them ideal for automotive and other

    applications where quick start-up time is essential. Cyclone II FPGAs, which offer a

    faster power-on reset (POR) time, are designated with an A in the device ordering

    code (EP2C5A, EP2C8A, EP2C15A, and EP2C20A).

CHAPTER 8

RESULTS

8.1 BAUGH WOOLEY MULTIPLIER

Fig 13: Simulation result for the Baugh-Wooley multiplier

Fig 14: RTL schematic of the Baugh-Wooley multiplier

BAUGH WOOLEY:

8.1.1 AREA (TOTAL LOGIC ELEMENTS)

8.1.2 POWER:

8.1.3 DELAY (TIMING REQUIREMENT)

8.2 VEDIC MULTIPLIER

Fig 15: Simulation result of the Vedic multiplier

Fig 16: RTL schematic of the Vedic multiplier

8.2.1 AREA (TOTAL LOGIC ELEMENTS)

8.2.2 DELAY (TIMING REQUIREMENT)

8.2.3 POWER:

BAUGH WOOLEY MULTIPLIER
TIME (ns): 22.314
POWER (mW): 195.68
TOTAL LOGIC ELEMENTS: 41

VEDIC MULTIPLIER
TIME (ns): 16.939
POWER (mW): 195.18
TOTAL LOGIC ELEMENTS: 28

    8.3 COMPARISON OF TWO MULTIPLIERS

8.3.1 DELAY (TIMING REQUIREMENT)

BAUGH WOOLEY: 22.314 ns
VEDIC: 16.939 ns

8.3.2 AREA (TOTAL LOGIC ELEMENTS)

(Bar chart: total logic elements of the Baugh-Wooley multiplier vs. the Vedic multiplier)

BAUGH WOOLEY: 41
VEDIC: 28

8.3.3 POWER

    BAUGH WOOLEY: 195.68 mW

    VEDIC: 195.18 mW

CHAPTER 9

CONCLUSION

Multipliers are one of the most important components of many systems, so we always need to find better multiplier solutions, and our multipliers should consume as little power as possible. Through this project we tried to determine which of the two algorithms works best. The project gives a clear picture of the different multipliers and their implementation in the Altera Quartus-II tool. We found that the Vedic multiplier is a much better option than the Baugh-Wooley multiplier, a conclusion drawn from the results for power consumption and total area. In the case of the Vedic multiplier, the total area is much smaller than that of the Baugh-Wooley multiplier, and hence the power consumption is also lower; this is clearly depicted in our results. This speeds up the calculation and makes the system faster. When the two multipliers were compared, we found that the Baugh-Wooley multiplier consumes more power and occupies the larger area, because it uses a larger number of adders; as a result it slows down the system, since more calculation has to be carried out. In the end we determined that the Urdhva Tiryakbhyam algorithm works best.

CHAPTER 10

    REFERENCES:

[1] Sumit Vaidya and Deepak Dandekar, "Delay-Power Performance of Multipliers in VLSI Design", International Journal of Computer Networks & Communications (IJCNC), Vol. 2, No. 4, July 2010.

[2] Prof. Dr. K. K. Mahapatra, "Design and Implementation of Different Multipliers Using VHDL", Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela, 2007.

[3] Harpreet Singh Dhillon and Abhijit Mitra, "A Reduced-Bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, 2:2, 2008.

[4] Krishnaveni D. (Department of TCE, A.P.S College of Engineering) and Umarani T. G. (Department of ECE, A.P.S College of Engineering, Somanahalli), "VLSI Implementation of Vedic Multiplier with Reduced Delay", International Journal of Advanced Technology & Engineering Research (IJATER), National Conference on Emerging Trends in Technology (NCET-Tech).

[5] Pouya Asadi and Keivan Navi, "A New Low Power 32x32-bit Multiplier", World Applied Sciences Journal, IDOSI Publications.

[6] Himanshu Thapliyal and Hamid R. Arabnia, "A Novel Parallel Multiply and Accumulate (V-MAC) Architecture Based on Ancient Indian Vedic Mathematics".

[7] C. Senthilpari, Ajay Kumar Singh and K. Diwadkar, "Low Power and High Speed 8x8 Bit Multiplier Using Non-clocked Pass Transistor Logic", IEEE, 2007.

[8] Kiat-Seng Yeo and Kaushik Roy, "Low-Voltage, Low-Power VLSI Subsystems", McGraw-Hill.

[9] Jong Duk Lee, Yong Jin Yoon, Kyong Hwa Lee and Byung-Gook Park, "Application of Dynamic Pass-Transistor Logic to an 8-Bit Multiplier", Journal of the Korean Physical Society, March 2001.

[10] C. F. Law, S. S. Rofail and K. S. Yeo, "A Low-Power 16x16-Bit Parallel Multiplier Utilizing Pass-Transistor Logic", IEEE Journal of Solid-State Circuits, October 1999.

[11] C. N. Marimuthu and P. Thiangaraj, "Low Power High Performance Multiplier", ICGST-PDCS, Volume 8, December 2008.

[12] Pravinkumar Parate, "ASIC Implementation of 4-Bit Multipliers", IEEE Computer Society, ICETET, 2008.

Books referred:

1. "VHDL" by B. Bhaskar.

2. "Verilog HDL: A Guide to Digital Design and Synthesis", Second Edition, by Samir Palnitkar, Prentice Hall PTR, February 21, 2003.
