arth_cir

Upload: noordcet

Post on 02-Jun-2018


TRANSCRIPT

  • 8/10/2019 Arth_Cir

    1/105

    Logic Design

    Content

    Boolean Functions: 3 lectures
    Boolean Functions Minimization. Combinational Logic Design Principles: 4 lectures
    Brief Description of Verilog: 3 lectures
    Basic Combinational Circuits: 4 lectures
    Finite State Machines (FSM): 3 lectures
    Synthesis of Synchronous FSM: 5 lectures
    Basic Sequential Circuits: 3 lectures
    Problems of Synchronous Design: 3 lectures
    Asynchronous FSM. Self-Timed Circuits: 3 lectures
    Arithmetic Units: 4 lectures
    Programmable Logic Integrated Circuits (PLDs): 3 lectures
    Memory Devices: 3 lectures

    Unsigned Numbers

    Positional number systems: decimal (base, or radix, 10) and binary (radix 2).

    $A = a_{n-1}a_{n-2}a_{n-3}\ldots a_1a_0\,.\,a_{-1}a_{-2}\ldots a_{-m}$, with $a_i \in \{0,1\}$

    There are n digits to the left of the point and m digits to the right of the point.

    $A = a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \ldots + a_1 2^1 + a_0 2^0 + a_{-1}2^{-1} + a_{-2}2^{-2} + \ldots + a_{-m}2^{-m}$

    Unsigned integer number: bits n-1 down to 0, range 0 to $2^n - 1$.

    A Graphical View

    [Figure: number wheel, a modular counting representation of 4-bit unsigned numbers. The 16 codes 0000 to 1111 are placed around a circle and labelled +0 to +15; arrows mark the directions of addition and subtraction.]

    Signed Numbers

    The most significant bit (position n-1) is the sign bit S:
    S = 0: positive number (or zero)
    S = 1: negative number

    Negative number representation. Three major schemes:
    sign and magnitude (direct code)
    ones' complement
    two's complement

    Sign and Magnitude (Direct Code)

    [Figure: number wheel for 4-bit sign-and-magnitude codes. 0000 to 0111 represent +0 to +7; 1000 to 1111 represent -0 to -7.]

    For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$, the code is $A_{sign\&magn} = 1\,a_{n-2}a_{n-3}\ldots a_1a_0$.

    Example: +5 = 0101, -5 = 1101

    Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$. Two representations for 0.

    When the operands have different signs, subtract the smaller (by magnitude) from the larger and keep the sign of the larger.

    Ones' Complement

    For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$, the code is $A_{1scom} = 1\,\bar a_{n-2}\bar a_{n-3}\ldots\bar a_1\bar a_0$.

    Example: +5 = 0101, -5 = 1010

    $A_{1scom} = 2^n - 1 - |A|$

    Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$

    Ones' Complement on the Number Wheel

    Two representations for 0: A - A = -0.

    [Figure: number wheel for 4-bit ones' complement codes. 0000 to 0111 represent +0 to +7; 1000 to 1111 represent -7 to -0. Arrows mark addition of a positive number and subtraction of a positive number.]

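    Two's Complement

    For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$, the code is $A_{2scom} = 1\,\bar a_{n-2}\bar a_{n-3}\ldots\bar a_1\bar a_0 + 1$, that is, the ones' complement code plus 1.

    Example: +5 = 0101, -5 = 1011

    $A_{2scom} = 2^n - |A|$

    Range: $-2^{n-1}$ to $2^{n-1}-1$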
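    The rule "invert the bits and add 1" can be sketched in Verilog. This is only an illustration: the module name `twos_negate`, its ports and the parameter `N` are assumptions made for this example and do not come from the slides.

    ```verilog
    // Sketch: two's complement negation of an N-bit word.
    // Names are illustrative assumptions, not part of the lecture.
    module twos_negate #(parameter N = 4) (
        input  wire [N-1:0] a,    // operand in two's complement
        output wire [N-1:0] neg   // -a, truncated to N bits
    );
        // Invert every bit (ones' complement), then add 1.
        assign neg = ~a + 1'b1;
    endmodule
    ```

    With N = 4, the input 0101 (+5) gives 1011 (-5), as in the example. The most negative code 1000 (-8) maps to itself, because +8 lies outside the range.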

    Two's Complement on the Number Wheel

    [Figure: number wheel for 4-bit two's complement codes. 0000 to 0111 represent +0 to +7; 1000 to 1111 represent -8 to -1. Arrows mark addition of a positive number and subtraction of a positive number.]

    For a negative A: $A_{2scom} = 2^n - |A|$

    Value of an n-bit code: $A = -2^{n-1}a_{n-1} + \sum_{i=0}^{n-2} 2^i a_i$

    Two's Complement Addition (1)

    Addition: C = A + B.

    Case $A < 0$, $B \ge 0$, $|A| \le B$:
    $A_{2com} + B = 2^n - |A| + B = 2^n + (B - |A|)$
    The result is positive, and the carry from the sign bit ($2^n$) is discarded, leaving $B - |A|$.

    Example:
      +0011111 =   0 0011111
      -0000111 =   1 1111001
      +0011000 = 1|0 0011000   (carry out discarded)

    Case $A < 0$, $B \ge 0$, $|A| > B$:
    $A_{2com} + B = 2^n - (|A| - B) = 2^n - |C| = C_{2com}$

    Example:
      -0011100 = 1 1100100
      +0000100 = 0 0000100
      -0011000 = 1 1101000

    Two's Complement Addition (2)

    Case $A < 0$, $B < 0$:
    $(2^n - |A|) + (2^n - |B|) = 2^n - (|A| + |B|) + 2^n = (2^n - |C|) + 2^n = C_{2com} + 2^n$
    The result is negative, and the carry from the sign bit ($2^n$) is discarded.

    Example:
      -0001101 =   1 1110011
      -0011001 =   1 1100111
      -0100110 = 1|1 1011010   (carry out discarded)

    Summary:
    The sign bit participates in the operation like the other bits.
    A negative result is represented in two's complement form.
    The carry from the sign bit is ignored.

    Subtraction: C = A - B = A + (-B) = A + (two's complement of B)

    Ones' Complement Addition (1)

    Addition: C = A + B.

    Case $A < 0$, $B \ge 0$, $|A| < B$:
    $A_{1scom} + B = 2^n - 1 - |A| + B = (B - |A|) + 2^n - 1$
    The result is positive, and the carry from the sign bit ($2^n$) is added to the least significant bit of the result (end-around carry).

    Example:
      +0011111 =   0 0011111
      -0000111 =   1 1111000
                 1|0 0010111
      +0011000 =   0 0011000   (after adding the end-around carry)

    Ones' Complement Addition (2)

    Case $A < 0$, $B \ge 0$, $|A| \ge B$:
    $A_{1scom} + B = 2^n - 1 - (|A| - B) = 2^n - 1 - |C| = C_{1scom}$

    Example:
      -0011100 = 1 1100011
      +0000100 = 0 0000100
      -0011000 = 1 1100111

    Ones' Complement Addition (3)

    Case $A < 0$, $B < 0$:
    $(2^n - 1 - |A|) + (2^n - 1 - |B|) = 2^n - 1 - (|A| + |B|) + 2^n - 1 = C_{1scom} + 2^n - 1$
    In this case an end-around carry is generated; adding it back gives $C_{1scom}$.

    Example:
      -0001101 =   1 1110010
      -0011001 =   1 1100110
                 1|1 1011000
      -0100110 =   1 1011001   (after adding the end-around carry)

    Summary:
    The sign bit participates in the operation like the other bits.
    A negative result is represented in ones' complement form.
    The carry from the sign bit is an end-around carry.

    Its simpler addition scheme makes two's complement the most common choice for integer number systems within digital systems.

    Overflow

    If an addition operation produces a result that exceeds the range of the number system, overflow is said to occur.
    Addition of two numbers with different signs can never produce overflow; only addition of numbers with the same sign can.

    Example (8-bit two's complement):

    Negative overflow               Positive overflow
       -64    1 1000000                +50    0 0110010
    +  -65    1 0111111             +  +80    0 1010000
    = -129    0 1111111 = +127      = +130    1 0000010 = -126

    Overflow occurs if the addends' signs are the same but the sum's sign differs from them: $OVF = c_n \oplus c_{n-1}$.
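    The overflow rule OVF = c_n XOR c_{n-1} can be sketched as a small Verilog module. This is an illustration under stated assumptions: the module name `add_ovf`, the port names and the parameter `N` are hypothetical, and the adder is split at the sign bit only to expose the two carries.

    ```verilog
    // Sketch: N-bit two's complement adder with an overflow flag.
    // All names are illustrative assumptions.
    module add_ovf #(parameter N = 8) (
        input  wire [N-1:0] a, b,
        output wire [N-1:0] sum,
        output wire         ovf
    );
        wire         c_nm1;   // carry into the sign bit  (c_{n-1})
        wire         c_n;     // carry out of the sign bit (c_n)
        wire [N-2:0] low;     // sum of the magnitude bits

        // Add the N-1 low bits; the extra result bit is the carry into the sign position.
        assign {c_nm1, low} = a[N-2:0] + b[N-2:0];
        // Add the sign bits together with that carry.
        assign {c_n, sum[N-1]} = a[N-1] + b[N-1] + c_nm1;
        assign sum[N-2:0] = low;

        // Overflow when the two carries differ.
        assign ovf = c_n ^ c_nm1;
    endmodule
    ```

    For the operands 11000000 (-64) and 10111111 (-65), the carry into the sign bit is 0 and the carry out is 1, so `ovf` is 1 and `sum` is 01111111, as in the negative-overflow example.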

    Adders

    Half Adder

    The function of the half adder is to add two binary digits, producing a sum and a carry.

    a b | s cout
    0 0 | 0 0
    0 1 | 1 0
    1 0 | 1 0
    1 1 | 0 1

    $s = a \oplus b$; $c_{out} = ab$

    [Figure: half adder with inputs a, b and outputs s, cout.]

    Full Adder (1)

    The function of the full adder is to add two binary digits and a carry that may be generated or propagated by the previous stage.

    a b cin | s cout
    0 0 0   | 0 0
    0 0 1   | 1 0
    0 1 0   | 1 0
    0 1 1   | 0 1
    1 0 0   | 1 0
    1 0 1   | 0 1
    1 1 0   | 0 1
    1 1 1   | 1 1

    Full Adder (2)

    $s = a \oplus b \oplus c_{in}$; $c_{out} = ab + ac_{in} + bc_{in}$ (majority function),
    or $s = \overline{c_{out}}\,(a + b + c_{in}) + abc_{in}$

    Karnaugh map for s:
    a \ b cin | 00 01 11 10
    0         |  0  1  0  1
    1         |  1  0  1  0

    Karnaugh map for cout:
    a \ b cin | 00 01 11 10
    0         |  0  0  1  0
    1         |  0  1  1  1

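    The Circuit of Full Adder (1)

    $C_{out} = ab + c_{in}(a \oplus b)$
    Standard approach: 6 gates.

    [Figure: gate-level full adder with inputs a, b, cin, internal signals s1, c1, c2, c3 and outputs s, cout.]

    The Circuit of Full Adder (2)

    $C_{out} = ab + c_{in}(a \oplus b) = ab \oplus c_{in}(a \oplus b)$
    5 gates.

    [Figure: gate-level full adder with inputs a, b, cin, internal signals s1, c1, c2 and outputs sum, cout.]

    Full Adder from Two Half Adders

    [Figure: the first half adder takes A and B and produces A ⊕ B and AB. The second half adder takes A ⊕ B and Cin and produces S = A ⊕ B ⊕ Cin and Cin(A ⊕ B). The two carry outputs are combined to form CO.]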
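    The construction of a full adder from two half adders can be sketched structurally in Verilog. The module names `half_adder` and `full_adder_2ha` and the internal net names are assumptions chosen for this example.

    ```verilog
    // Sketch: half adder, and a full adder built from two half adders.
    // Module and net names are illustrative assumptions.
    module half_adder (
        input  wire a, b,
        output wire s, cout
    );
        assign s    = a ^ b;   // sum = a XOR b
        assign cout = a & b;   // carry = a AND b
    endmodule

    module full_adder_2ha (
        input  wire a, b, cin,
        output wire s, cout
    );
        wire s1, c1, c2;

        // First half adder: a + b
        half_adder ha1 (.a(a),  .b(b),   .s(s1), .cout(c1));
        // Second half adder: (a XOR b) + cin
        half_adder ha2 (.a(s1), .b(cin), .s(s),  .cout(c2));

        // The two carries are never 1 at the same time, so OR (or XOR) combines them.
        assign cout = c1 | c2;
    endmodule
    ```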

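    Inversion Property

    The Boolean functions S and Cout are self-dual: inverting all inputs of a full adder inverts both outputs.

    $\overline{S}(A, B, C_{in}) = S(\overline{A}, \overline{B}, \overline{C_{in}})$
    $\overline{C_{out}}(A, B, C_{in}) = C_{out}(\overline{A}, \overline{B}, \overline{C_{in}})$

    [Figure: a full adder FA with inverted inputs and outputs is equivalent to a plain FA.]

    [Figure: 4-bit ripple chain of full adders with inputs A0 B0 ... A3 B3 and Cin, and outputs S0 ... S3 and Cout.]

    Inverters in the carry path may be removed, which shortens the critical path of the carry chain.

    The Serial Adder

    [Figure: shift registers RA ($a_{n-1} \ldots a_0$) and RB ($b_{n-1} \ldots b_0$) feed a full adder FA one bit per clock; a D flip-flop clocked by Clock returns Cout to Cin; the sum bits $s_{n-1} \ldots s_0$ are shifted into register RS.]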

    A Parallel Binary Adder

    [Figure: four full adders in a chain. Stage i takes $a_i$, $b_i$ and the carry $c_i$ and produces $s_i$ and $c_{i+1}$; cin is $c_0$ and cout is $c_4$.]

    Carry ripple adder: S = A + B; A = (a0, a1, a2, a3); B = (b0, b1, b2, b3).

    $s_0 = a_0 \oplus b_0 \oplus c_0$; $c_1 = a_0b_0 + (a_0 + b_0)c_0$
    $s_1 = a_1 \oplus b_1 \oplus c_1$; $c_2 = a_1b_1 + (a_1 + b_1)c_1$
    $s_2 = a_2 \oplus b_2 \oplus c_2$; $c_3 = a_2b_2 + (a_2 + b_2)c_2$
    $s_3 = a_3 \oplus b_3 \oplus c_3$; $c_4 = a_3b_3 + (a_3 + b_3)c_3$

    $T_{add} = (n-1)\,t_c + t_{sm} \approx n\,t_{sm}$

    Verilog Description for 4-bit CRA (Gate Level Description)

    // Define a 4-bit adder
    module add4 (s, c_out, a, b, c_in);
      // I/O port declarations
      output [3:0] s;
      output c_out;
      input [3:0] a, b;
      input c_in;
      // Internal nets
      wire c1, c2, c3;
      // Instantiate four 1-bit full adders.
      fulladd fa0 (s[0], c1, a[0], b[0], c_in);
      fulladd fa1 (s[1], c2, a[1], b[1], c1);
      fulladd fa2 (s[2], c3, a[2], b[2], c2);
      fulladd fa3 (s[3], c_out, a[3], b[3], c3);
    endmodule

    // Define a 1-bit full adder
    module fulladd (sum, c_out, a, b, c_in);
      // I/O port declarations
      output sum, c_out;
      input a, b, c_in;
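    The transcript breaks off after the port declarations of `fulladd`. A gate-level 1-bit full adder, assuming the five-gate form Cout = ab XOR cin(a XOR b) given earlier, might look as follows. The module name `fulladd_sketch` and the nets `s1`, `c1`, `c2` are illustrative assumptions; this is not the missing original body.

    ```verilog
    // Sketch of a 1-bit full adder from Verilog gate primitives (5 gates).
    // Hypothetical stand-in; the original body is not in the transcript.
    module fulladd_sketch (sum, c_out, a, b, c_in);
      output sum, c_out;
      input  a, b, c_in;
      wire   s1, c1, c2;

      xor (s1, a, b);        // half sum
      and (c1, a, b);        // carry generated by a and b
      xor (sum, s1, c_in);   // final sum
      and (c2, s1, c_in);    // carry propagated from c_in
      xor (c_out, c2, c1);   // c1 and c2 are never both 1, so XOR acts as OR
    endmodule
    ```

    The port order matches the `fulladd` instances in `add4`, so a module of this shape under the name `fulladd` would let the 4-bit adder elaborate.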

    Verilog Description for 4-bit CRA (3)

    module adder_4_RTL (a, b, c_in, sum, c_out);
      output [3:0] sum;
      output c_out;
      input [3:0] a, b;
      input c_in;
      assign {c_out, sum} = a + b + c_in;
    endmodule

    $T_{add} = T_{FA}(A,B \to C_{out}) + (N-2)\,T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$

    T = O(N) is the worst-case delay, where N is the number of bits.
    Real goal: make the carry path as fast as possible.

    A 32-bit Adder/Subtractor

    [Figure: ripple chain of full adders for bit positions 0, 1, ..., 31 with inputs b0, b1, ..., b31, sums s0, s1, ..., s31, carry out c32 and a control input ~Add/Sub.]

    Adder/Subtractor Module in Verilog (data-flow description)

    module addsub (a, b, select, cout, sum);
      input [7:0] a, b;
      input select;
      output [7:0] sum;
      output cout;
      assign {cout, sum} = select ? (a - b) : (a + b);
    endmodule

    Select = 0: addition
    Select = 1: subtraction

    Carry Look-Ahead Adders (1)

    Input    | Output  | Output as a function of cin | Carry status
    a b cin  | s cout  | cout   s                    |
    0 0 0    | 0 0     | 0      cin                  | annihilate
    0 0 1    | 1 0     | 0      cin                  | annihilate
    0 1 0    | 1 0     | cin    ~cin                 | propagate
    0 1 1    | 0 1     | cin    ~cin                 | propagate
    1 0 0    | 1 0     | cin    ~cin                 | propagate
    1 0 1    | 0 1     | cin    ~cin                 | propagate
    1 1 0    | 0 1     | 1      cin                  | generate
    1 1 1    | 1 1     | 1      cin                  | generate

    Carry Look-Ahead Adders (2)

    All carries are produced in parallel:
    $c_{i+1} = g_i + p_ic_i$, where $g_i = a_ib_i$ (carry generation) and $p_i = a_i + b_i$ or $p_i = a_i \oplus b_i$ (carry propagation).

    Re-express the carry logic for each of the bits:
    $c_1 = g_0 + p_0c_0$
    $c_2 = g_1 + p_1c_1 = g_1 + p_1g_0 + p_1p_0c_0$
    $c_3 = g_2 + p_2(g_1 + p_1g_0 + p_1p_0c_0) = g_2 + p_2g_1 + p_2p_1g_0 + p_2p_1p_0c_0$
    $c_4 = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0 + p_3p_2p_1p_0c_0$

    Each equation corresponds to a circuit with just three levels of delay: one for the generate and propagate signals, and two for the sum of products.
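    The expanded carry equations translate almost literally into data-flow Verilog. The following 4-bit sketch is an illustration only: the module name `cla4` and its ports are assumptions, and the XOR form of the propagate signal is chosen so that it can also serve as the half sum.

    ```verilog
    // Sketch: 4-bit carry look-ahead adder written from the expanded equations.
    // Names are illustrative assumptions.
    module cla4 (
        input  wire [3:0] a, b,
        input  wire       c0,
        output wire [3:0] s,
        output wire       c4
    );
        wire [3:0] g = a & b;   // carry generate
        wire [3:0] p = a ^ b;   // carry propagate (XOR form)

        // Every carry depends only on g, p and c0, so all are formed in parallel.
        wire c1 = g[0] | (p[0] & c0);
        wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
        wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                       | (p[2] & p[1] & p[0] & c0);
        assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                         | (p[3] & p[2] & p[1] & g[0])
                         | (p[3] & p[2] & p[1] & p[0] & c0);

        // Sum bit i is the half sum p[i] XOR the carry into bit i.
        assign s = p ^ {c3, c2, c1, c0};
    endmodule
    ```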

    Carry Look-Ahead Adders (3)

    [Figure: four full adders FA with inputs a0 b0 ... a3 b3 send g0 p0 ... g3 p3 to a shared Carry Unit, which takes cin and returns c1, c2, c3 and cout.]

    One bit CLA

    [Figure: one-bit cell with inputs ai, bi and cin; it forms gi, pi and Si, and a multiplexer with inputs 0 and 1 produces cout.]

    One Stage of a Carry Look-Ahead Adder

    [Figure: ai and bi form the half sum hsi; the carry-lookahead logic computes ci from a0 ... ai-1 and b0 ... bi-1; si is formed from hsi and ci.]

    CRU (Carry Unit)

    The lookahead carry circuit (carry unit) forms the carry signals:
    $c_4 = G + Pc_0$, where $G = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0$ and $P = p_3p_2p_1p_0$.

    These equations formally coincide with the equations $c_{i+1} = g_i + p_ic_i$, so lookahead carry across the 4-bit sections of an ALU is computed in the same way as lookahead carry for the separate bits of a 4-bit adder.

    [Figure: CRU block with inputs C0, G0 P0, G1 P1, G2 P2, G3 P3 and outputs C1, C2, C3, G, P.]

    MSI Adders: IC 74x283

    The adder produces active-low versions of the carry-generate and carry-propagate signals.

    Equations for the half sum:
    $hs_i = a_i \oplus b_i = a_i\bar b_i + \bar a_ib_i = a_i\bar b_i + a_i\bar a_i + \bar a_ib_i + b_i\bar b_i = (a_i + b_i)(\bar a_i + \bar b_i) = (a_i + b_i)\,\overline{a_ib_i} = p_i\,\bar g_i$

    An AND gate can therefore be used instead of an XOR gate.

    The equation for the carry is factored:
    $c_{i+1} = g_i + p_ic_i = p_ig_i + p_ic_i = p_i(g_i + c_i)$

    Equations for Carry Signals

    $c_1 = p_0(g_0 + c_0)$
    $c_2 = p_1(g_1 + c_1) = p_1(g_1 + p_0(g_0 + c_0)) = p_1(g_1 + p_0)(g_1 + g_0 + c_0)$
    $c_3 = p_2(g_2 + c_2) = p_2(g_2 + p_1(g_1 + p_0)(g_1 + g_0 + c_0)) = p_2(g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)$
    $c_4 = p_3(g_3 + c_3) = p_3(g_3 + p_2)(g_3 + g_2 + p_1)(g_3 + g_2 + g_1 + p_0)(g_3 + g_2 + g_1 + g_0 + c_0)$

    The propagation delay from the C0 input to the C4 output is very short, about the same as two inverting gates.

    IC 74x283: Logic Symbol

    [Figure: logic symbol with inputs C0, A0 B0, A1 B1, A2 B2, A3 B3 and outputs S0, S1, S2, S3, C4.]

    A 16-bit Group-Ripple Adder

    [Figure: four 4-bit adder ICs in cascade; the C4 output of each group feeds the C0 input of the next. Inputs A[0:15], B[0:15] and Cin; outputs S[0:15] and Cout.]

    $T_{add} = M\,t(c_0 \to c_4) = 4\,t(c_0 \to c_4)$, where M is the number of groups.

    Carry Bypass Adder (Carry Skip Adder)

    The following three functions are formed in each bit of the adder:
    $G = A_iB_i$ (generate)
    $P = A_i \oplus B_i$ (propagate)
    $K = \overline{A_i}\,\overline{B_i}$ (annihilate, or kill)

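    Carry Bypass (Skip) Adder

    The idea of the carry bypass adder:
    $P_0 = a_0 \oplus b_0$; $P_1 = a_1 \oplus b_1$; $P_2 = a_2 \oplus b_2$; $P_3 = a_3 \oplus b_3$.
    If $P_0P_1P_2P_3 = 1$, then Cout = Cin; otherwise Cout = C4 (the carry formed inside the block).
    $BP = P_0P_1P_2P_3$ is the block propagation signal.

    [Figure: four full adders FA (inputs a0 b0 ... a3 b3, outputs S0 ... S3) ripple Cin to C4; a multiplexer MUX controlled by BP selects between C4 and Cin to form Cout.]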
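    One 4-bit bypass block can be sketched in Verilog as a ripple chain plus a multiplexer on the carry output. The module name `carry_skip4` and its port and net names are assumptions made for this example; the propagate signals use the XOR form, which the bypass condition requires.

    ```verilog
    // Sketch: one 4-bit carry-bypass (carry-skip) block.
    // Names are illustrative assumptions.
    module carry_skip4 (
        input  wire [3:0] a, b,
        input  wire       cin,
        output wire [3:0] s,
        output wire       cout
    );
        wire [3:0] p = a ^ b;   // per-bit propagate
        wire [4:0] c;           // ripple carries, c[0] = cin
        assign c[0] = cin;

        genvar i;
        generate
            for (i = 0; i < 4; i = i + 1) begin : ripple
                assign s[i]   = p[i] ^ c[i];
                assign c[i+1] = (a[i] & b[i]) | (p[i] & c[i]);
            end
        endgenerate

        // Block propagate: every bit propagates, so the carry-out equals cin.
        wire bp = &p;
        assign cout = bp ? cin : c[4];
    endmodule
    ```

    Functionally `cout` always equals `c[4]`; the multiplexer only shortens the path from `cin` to `cout` when the whole block propagates.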

    Carry Skip Adder

    [Figure: 16-bit carry-skip adder of four blocks B0 (bits 0-3), B1 (bits 4-7), B2 (bits 8-11) and B3 (bits 12-15). Each block has setup, carry propagation and sum stages; the block propagate signals BP0 ... BP3 control the bypass multiplexers between Cin, the block carries C0 ... C3 and Cout.]

    Worst-case delay (carry from bit 0 to bit 15): a carry is generated in bit 0, ripples through bits 1, 2 and 3, skips the middle two groups, and ripples in the last group from bit 12 to bit 15. With B the group size in bits:

    $T_{add} = t_{setup} + B\,t_{carry} + ((N/B) - 2)\,t_{skip} + (B-1)\,t_{carry} + t_{sum}$

    C0, C1, C2, C3: carries out of blocks B0, B1, B2, B3.
    $t_{setup}$: time needed to form the generate and propagate signals ($g_i$, $p_i$).
    $t_{carry}$: propagation delay of the carry through one bit.
    $t_{skip}$ ($t_{bypass}$): propagation delay through the bypass multiplexer.
    $t_{sum}$: time required to form the sum of the last bit.

    The delay still grows linearly with the number of bits, but with a smaller slope than in the CRA.

    Optimal Skip Block Size and Add Time

    Carry ripple is used inside the blocks. Assume $t_{carry} = t_{skip} = t_{setup} = t_{sum} = 1$; then

    $T_{add} = 1 + B + (N/B - 2) + (B - 1) + 1 = 2B + N/B - 1$
    $dT_{add}/dB = 2 - N/B^2$
    $dT_{add}/dB = 0$ at $B_{opt} = \sqrt{N/2}$
    $T_{opt} = 4\sqrt{N/2} - 1 = 2\sqrt{2N} - 1$
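    As a worked check of these formulas under the same unit-delay assumption, two word lengths can be substituted. The values N = 32 and N = 16 are chosen here only for illustration.

    ```latex
    % N = 32: the optimum is an integer block size
    B_{opt} = \sqrt{32/2} = 4, \qquad
    T_{add}(B{=}4) = 2\cdot 4 + \frac{32}{4} - 1 = 15 = 2\sqrt{2\cdot 32} - 1

    % N = 16: B_opt = \sqrt{8} \approx 2.83 is not an integer,
    % so the two neighbouring practical block sizes are compared
    T_{add}(B{=}2) = 2\cdot 2 + \frac{16}{2} - 1 = 11, \qquad
    T_{add}(B{=}4) = 2\cdot 4 + \frac{16}{4} - 1 = 11, \qquad
    T_{opt} = 2\sqrt{32} - 1 \approx 10.3
    ```

    For N = 16 both practical block sizes give 11 unit delays, slightly above the continuous optimum of about 10.3.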

    Carry Select Adder

    The N-bit circuit is divided into M blocks of B bits. The carry out of each block is precomputed for both carry_in = 0 and carry_in = 1 (this can be done for all blocks in parallel), and the correct one is then selected. This complicates the adder circuit by about 30%.

    [Figure: 4-bit block with two carry chains, Carry0 (carry-in 0) and Carry1 (carry-in 1), a multiplexer MUX controlled by Cin that produces Cout, and a sum stage.]
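    One carry-select block can be sketched behaviourally in Verilog. The module name `carry_select4` and its ports are assumptions for this example, and the two internal additions stand in for the "0" and "1" carry chains without fixing how they are built.

    ```verilog
    // Sketch: one 4-bit carry-select block.
    // Names are illustrative assumptions.
    module carry_select4 (
        input  wire [3:0] a, b,
        input  wire       cin,
        output wire [3:0] s,
        output wire       cout
    );
        wire [3:0] s0, s1;
        wire       c0, c1;

        // Both results are computed before cin is known.
        assign {c0, s0} = a + b;          // as if carry-in were 0
        assign {c1, s1} = a + b + 1'b1;   // as if carry-in were 1

        // The real carry-in only drives the multiplexers.
        assign s    = cin ? s1 : s0;
        assign cout = cin ? c1 : c0;
    endmodule
    ```

    Chaining such blocks puts only one multiplexer delay per block on the carry path, which is the (N/B) t_mux term of the delay formula.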

    Carry Select Adder

    $T_{add} = t_{setup} + B\,t_{carry} + (N/B)\,t_{mux} + t_{sum}$

    [Figure: 16-bit linear carry-select adder of four 4-bit blocks (bits 3...0, 7...4, 11...8, 15...12). Each block has a setup stage producing Ps and Gs, a "0" carry chain, a "1" carry chain, a multiplexer and a sum stage producing S3...S0 up to S15...S12. Delay annotations: (1) at the setup stages, (5) at the carry chains, then (6), (7), (8), (9) along the multiplexer path to the sum.]

    Carry Select Adder (Square Root)

    $T_{add} = t_{setup} + 2\,t_{carry} + m\,t_{mux} + t_{sum}$

    [Figure: 20-bit square-root carry-select adder with blocks of increasing size: bits 1...0, 4...2, 8...5, 13...9 and 19...14. Each block has setup, "0" carry, "1" carry, multiplexer and sum stages. Delay annotations: (1), (3), (3), (4), (5), (6), (7), (8).]

    Carry Save Adder

    Consider the addition of three numbers. In this case two vectors are formed: the sum vector S and the carry vector C.

    Example: x + y + z = s + c

    x:       1001111
    y:       1100100
    z:     + 0001111
    s:       0100100
    c:    + 1001111     (carry vector, shifted one position left)
    sum:    11000010

    When N n-bit numbers are added, the sum needs up to $\lceil\log_2 N\rceil + n$ bits. The CSA is used for adding more than two numbers together.
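    The carry-save step for three operands can be sketched in Verilog as a row of independent full adders followed by one ordinary addition. The module name `csa3`, the parameter `N` and the port names are assumptions for this example.

    ```verilog
    // Sketch: carry-save addition of three N-bit numbers.
    // Names are illustrative assumptions.
    module csa3 #(parameter N = 7) (
        input  wire [N-1:0] x, y, z,
        output wire [N-1:0] s,     // sum vector (bitwise, no carry propagation)
        output wire [N-1:0] c,     // carry vector (each bit has weight 2)
        output wire [N+1:0] sum    // final result x + y + z
    );
        // One full adder per bit position; no carry passes between positions.
        assign s = x ^ y ^ z;
        assign c = (x & y) | (x & z) | (y & z);

        // A single carry-propagate addition of s and the left-shifted carry vector.
        assign sum = s + {c, 1'b0};
    endmodule
    ```

    With x = 1001111, y = 1100100 and z = 0001111 this gives s = 0100100 and c = 1001111, and the final sum is 11000010 (194 = 79 + 100 + 15), as in the example.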

    CSA: Circuit of an adder of three 4-bit numbers

    [Figure: a first row of four full adders FA takes x_i, y_i, z_i (i = 0 ... 3); a second row of two half adders HA and two full adders FA adds the resulting sum and carry vectors and produces sum0 ... sum5.]

    Advantage of the CSA: pipeline capability.

    [Figure: pipelined addition of A1 ... A4. CSA1 is followed by registers clocked by Clock 1, CSA2 by registers clocked by Clock 2, and a final CRA adds the sum vector Ss and the carry vector Cs.]

    Conditional Sum Adder

    [Figure: the low halves ALow and BLow are added by SMLow; the high halves AHigh and BHigh are added by two adders SMHigh, one with Cin = 1 and one with Cin = 0. The low carry CoutL controls two multiplexers MUX that select the high part of S and Cout from Cout0 and Cout1.]

    An n-bit adder is divided into two groups of n/2 bits. The high-order group is duplicated, so the circuit contains three n/2-bit adders.

    The Structure of Execution Unit

    OU: operational (or execution) unit. CU: control unit.
    The OU consists of registers, adders, other logic elements and wires.
    The CU produces the control signals that cause the operations to be executed.

    [Figure: OU with Data in and Data out; CU with Command and Done; signals X and Y connect the OU and the CU.]

    EU for Integer Multiplication (Unsigned)

    [Figure: n-bit registers RA (loaded from A) and RB (loaded from B), a multiplexer MUX selecting RA or 0, an n-bit adder SM, an accumulator RC (acc), a counter CT loaded with n, and a Control Unit with inputs x1, x2 and outputs y1, y2, y3. RC and RB together hold the 2n-bit result.]

    RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products. The multiplier register (low bits) and the accumulator register (high bits) can be combined.

    Flow-Chart of Multiplication (1)

    Start
    Multiply? No: wait. Yes: continue.
    Y1: RA = A; RB = B; CT = n; RC = 0
    X1? Yes: Y2: RC = SM
    Y3: shift right RC, RB; CT = CT - 1
    X2: (CT) = 0? No: repeat from X1. Yes: End.

    Example of Multiplication

    A = 1101; B = 1010

    Accumulator (RC)    RB      CT
       0000 0000        1010    4
       0000 0000        0101    3    shift
     + 1101 0000
     = 1101 0000
       0110 1000        0010    2    shift
       0011 0100        0001    1    shift
     + 1101 0000
     =10000 0100
       1000 0010        0000    0    shift

    Signed multiplication: convert negative numbers to positive, execute unsigned multiplication, and remember the original signs.

    Behavior Description of Multiplier in Verilog

    module multiplier (a, b, mul, clock, result, ready);
      parameter n = 8;
      input clock, mul;
      input [n-1:0] a, b;
      output [2*n-1:0] result;
      output ready;
      reg [2*n-1:0] result;
      reg ready;
      reg [n:0] rc;          // accumulator with one extra bit for the adder carry
      reg [n-1:0] ra, rb;    // multiplicand and multiplier registers

      always @(posedge mul)
      begin
        ra = a;
        rb = b;
        ready = 0;
        rc = 0;
        repeat (n)
        begin
          @(posedge clock)
          if (rb[0]) rc = rc + ra;     // add the multiplicand when the low multiplier bit is 1
          rb = {rc[0], rb[n-1:1]};     // shift the low accumulator bit into RB
          rc = rc >> 1;
        end
        result = {rc[n-1:0], rb};
        ready = 1;
      end
    endmodule // multiplier

    Multiplying Unit 2

    The structure of a multiplying unit with the shift of the multiplier to the right and the multiplicand to the left.

    [Figure: 2n-bit register RA (loaded from A), n-bit register RB (loaded from B), multiplexer MUX selecting RA or 0, 2n-bit adder SM, 2n-bit register RC and counter CT.]

    Multiplying Algorithm

    [Flow chart: Start; Multiply? (No: wait); RA := A; RB := B; CT := n; RC := 0; "(R) or (RB) = 0?" (Yes: return a result); RC := SM; SL (RA), SR (RB); CT := CT - 1; (CT) = 0? (Yes: End; No: repeat).]

    Multiplication on Signed Numbers (Booth's Algorithm)

    [Figure: register RA (loaded from A), adder SM, accumulator ACC, register RB (loaded from B) with the bits RB0 and RB-1, a decoder DC with outputs 0 to 3 driven by RB0 and RB-1, and counter CT.]

    Booth's Algorithm

    Start; wait until Multiply? is Yes.
    RA = A, RB = B, CT = n, ACC = 0, RB-1 = 0.
    Examine RB0, RB-1:
      01: ACC = ACC + RA
      10: ACC = ACC + ~RA + 1 (subtract RA)
      00 or 11: no operation
    ASR(ACC, RB, RB-1); CT = CT - 1.
    (CT) == 0? No: examine the next pair. Yes: end.

    ASR: arithmetic shift right (sign extend when shifting).

    Example

    A = 10010110; B = 11000111; (A)com = 01101010

    ACC          RB         RB-1   CT
      00000000   11000111   0      8
    + 01101010                           RB0, RB-1 = 10: add (A)com
    = 01101010   11000111   0      8
      00110101   01100011   1      7     ASR
      00011010   10110001   1      6     ASR
      00001101   01011000   1      5     ASR
    + 10010110                           RB0, RB-1 = 01: add A
    = 10100011   01011000   1      5
      11010001   10101100   0      4     ASR
      11101000   11010110   0      3     ASR
      11110100   01101011   0      2     ASR
    + 01101010                           RB0, RB-1 = 10: add (A)com
    = 01011110   01101011   0      2
      00101111   00110101   1      1     ASR
      00010111   10011010   1      0     ASR

    Result: 00010111 10011010 = +6042 = (-106) x (-57).

    Substantiation of the algorithm (1)

    1. B > 0

    $A \cdot (00011110) = A \cdot (2^4 + 2^3 + 2^2 + 2^1) = A \cdot 30$

    The set of additions can be replaced by only two operations (one addition and one subtraction), because
    $2^n + 2^{n-1} + \ldots + 2^{n-k} = 2^{n+1} - 2^{n-k}$

    $A \cdot (00011110) = A \cdot (2^5 - 2^1) = A \cdot 30$

    This extends to any run of consecutive 1s. The algorithm is called Booth's recoding.

    Digits (0, 1) are recoded to digits (-1, 0, 1).
    Multiplier: 00011110 becomes 0, 0, 1, 0, 0, 0, -1, 0.
    Instead of 4 additions, only 2 operations are needed.
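    The recoding of a multiplier into digits -1, 0, +1 can be sketched in Verilog by comparing each bit with its right-hand neighbour. The module name `booth_recode`, the parameter `N` and the two output vectors (one marking +1 digits, one marking -1 digits) are assumptions chosen for this illustration.

    ```verilog
    // Sketch: Booth recoding of an N-bit multiplier.
    // Digit i is b[i-1] - b[i], with b[-1] taken as 0.
    // Names and the two-vector encoding are illustrative assumptions.
    module booth_recode #(parameter N = 8) (
        input  wire [N-1:0] b,
        output wire [N-1:0] plus,   // bit i = 1 where the recoded digit is +1
        output wire [N-1:0] minus   // bit i = 1 where the recoded digit is -1
    );
        // prev[i] holds b[i-1]; a 0 is inserted below the least significant bit.
        wire [N-1:0] prev = {b[N-2:0], 1'b0};

        assign plus  = ~b &  prev;  // pair (b[i], b[i-1]) = 01
        assign minus =  b & ~prev;  // pair (b[i], b[i-1]) = 10
    endmodule
    ```

    For b = 00011110 this gives plus = 00100000 and minus = 00000010, that is, the digits 0, 0, 1, 0, 0, 0, -1, 0 of the example.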

    Substantiation of the algorithm (2)

    2. B

    Booth Multiplier in Verilog

    module Booth_multiplier (a, b, clock, start, ready, result);
      parameter n = 16;
      input [n-1:0] a, b;
      input clock, start;
      output [2*n-1:0] result;
      output ready;
      reg [2*n-1:0] result;
      reg ready;
      reg [n-1:0] acc, ra, rb;
      reg q;                            // RB-1: the bit last shifted out of RB

      always @(posedge start)
      begin
        ra = a; rb = b; acc = 0; q = 0; ready = 0;
        repeat (n)
        begin
          @(posedge clock)
          if (rb[0] !== q)
          begin
            if (q) acc = acc + ra;      // pair 01: add the multiplicand
            else   acc = acc - ra;      // pair 10: subtract the multiplicand
          end
          q = rb[0];
          rb = {acc[0], rb[n-1:1]};
          acc = {acc[n-1], acc[n-1:1]}; // arithmetic shift right
        end
        result = {acc, rb};
        ready = 1;
      end
    endmodule // Booth_multiplier

    Combinational Multipliers

    Methods of accelerating multiplication:
    parallel computation of partial products
    reduction of the number of additions
    reduction of the propagation delay

    Two types of multipliers are used: matrix and tree-structured.
    Propagation delay of matrix multipliers: O(n).
    Propagation delay of tree-structured multipliers: $O(\log_2 n)$.

    Partial Products in a 4x4 Multiplier

                            a0b3 a0b2 a0b1 a0b0
                       a1b3 a1b2 a1b1 a1b0
                  a2b3 a2b2 a2b1 a2b0
    +        a3b3 a3b2 a3b1 a3b0
    -------------------------------------------
      p7   p6   p5   p4   p3   p2   p1   p0
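    The partial-product array of a 4x4 multiplier can be sketched in Verilog as four AND-ed rows that are shifted and summed. The module name `mult4x4_pp` and the net names are assumptions for this example, and the `+` operators leave the adder structure (ripple, carry-save or tree) to the synthesis tool.

    ```verilog
    // Sketch: 4x4 unsigned multiplier from its partial products.
    // Names are illustrative assumptions.
    module mult4x4_pp (
        input  wire [3:0] a, b,
        output wire [7:0] p
    );
        // Row i holds a[i] AND-ed with every bit of b (16 AND gates in total).
        wire [3:0] row0 = {4{a[0]}} & b;
        wire [3:0] row1 = {4{a[1]}} & b;
        wire [3:0] row2 = {4{a[2]}} & b;
        wire [3:0] row3 = {4{a[3]}} & b;

        // Each row is shifted left by its index before the rows are added.
        assign p = {4'b0000, row0}
                 + {3'b000, row1, 1'b0}
                 + {2'b00,  row2, 2'b00}
                 + {1'b0,   row3, 3'b000};
    endmodule
    ```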

    Matrix Multiplier Based on CRA

    A matrix multiplier contains $n^2$ AND gates to form the partial products.
    A multiplier based on CRA contains (n-1)n adders: the number of HAs is n and the number of FAs is $n^2 - 2n$.
    In the worst case the propagation delay equals 3n-4.

    Matrix Multiplier Based on CRA Structure

    [Figure: 4x4 array. The rows of partial products for b0, b1, b2 and b3 are added row by row with carry-ripple adders (carry-in 0) to give p0 ... p7.]

    Matrix Multiplier with Carry Save Addition

    A matrix multiplier using carry-save addition contains the same number of elements.
    It is faster because its propagation delay is shorter.
    The last (n-th) stage corresponds to a CRA. The worst-case carry propagation path goes through 2n-2 adders.

    Matrix Multiplier using Carry Save Addition

    [Figure: 4x4 array in which each row of adders passes its carries down to the next row; a final row of adders produces the high product bits, giving p0 ... p7.]

    Treelike Multipliers

    Treelike multipliers contain three stages:
    Generation of the bits of the partial products. This stage consists of $n^2$ AND gates.
    Compression of the partial products, implemented as a tree of parallel adders.
    Final addition of the sum vector and the carry vector.

    When used in multipliers, full adders and half adders are usually called (3,2) and (2,2) counters (compressors).

    Wallace-tree multiplier (1)

    [Figure: staged reduction of the 4x4 partial-product matrix (a3 a2 a1 a0 times b3 b2 b1 b0). First-stage outputs s11 ... s14 and c12 ... c15, second-stage outputs s21 ... s24 and c21 ... c24, third-stage outputs s31 ... s34, giving p7 ... p0.]

    Wallace-tree multiplier (2)

    [Figure: adder-level structure of the 4x4 Wallace-tree multiplier with partial products a_i b_j as inputs and p0 ... p7 as outputs.]

    Wallace tree

    The rows of the partial-product matrix are grouped in threes.
    Columns with three bits are compressed with FAs; columns with two bits are compressed with HAs.
    Rows that are not included in a group of three are carried over to the next reduction stage.
    The Wallace scheme is considered the fastest, but its structure is the least regular.
    The main application of the Wallace tree is the construction of circuits with large word lengths.

    Dadda Multiplier (1)

    [Figure: staged reduction of the 4x4 partial-product matrix (a3 a2 a1 a0 times b3 b2 b1 b0). First-stage outputs s11, s12, c11, c12; second-stage outputs s21 ... s24 and c21 ... c24; third-stage outputs s31 ... s36 and c31 ... c36, giving p7 ... p0.]

    Dadda Multiplier (2)

    [Figure: adder-level structure of the 4x4 Dadda multiplier with partial products a_i b_j as inputs and p7 ... p0 as outputs.]

    Dadda Multiplier (3)

    The Wallace and Dadda methods differ in how they approach the compression of the partial products.
    The Wallace algorithm compresses as early as possible, in the first stages.
    The Dadda algorithm performs most of the compression in the late stages.
    A Wallace-tree multiplier works forward from the multiplier inputs.
    The Dadda multiplier works backward from the final product.
    The number of stages is the same in both multipliers.
    Both lack structural regularity.

    The number of stages, and thus the delay (in units of an FA delay, excluding the CPA), for an n-bit tree-based multiplier using (3,2) counters is
    $\log_{1.5} n = \log_{10} n / \log_{10} 1.5 = \log_{10} n / 0.176$

    Example of Sequential Multiplier

    [Figure: execution unit with n-bit registers RA and RB, a multiplexer MUX (RA or 0), an n-bit adder SM, an n-bit accumulator RC (acc), a counter CT loaded with n, and a Control Unit with inputs x1, x2 and outputs y1, y2, y3.]

    RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products. The multiplier register holds the low bits of the sum.

    Flow-Chart of Multiplication Algorithm (1)

    Begin
    Multiply? No: wait. Yes: continue.
    Y1: RA = A; RB = B; CT = n; RC = 0
    X1? Yes: Y2: RC = SM
    Y3: shift right RC, RB; CT = CT - 1
    X2: (CT) = 0? No: repeat from X1. Yes: End.

    Flow-Chart of Multiplication Algorithm (2)

    Begin
    Multiply? (In1) No: wait. Yes: continue.
    Y1: RA = A; RB = B; CT = n; RC = 0
    Y2: RC = RC + MUX (transfer with shift); shift RB; CT = CT - 1
    Done? ((CT) = 0) No: repeat Y2. Yes: End.

    Transition to the Mealy FSM

  • 8/10/2019 Arth_Cir

    90/105

    Mealy FSM

    FSM State markup

    A = {X, Y, S, δ, λ}, where δ is the transition function and λ is the output function

    X = {mul, done}; Y = {y1, y2, ready};

    S = {S0, S1, S2}

    [Flow chart with Mealy state marks: Begin, condition mul (0/1), operator y1, operator y2, condition Done (0/1), End; state marks S0, S1, S2, S0.]

    Mealy FSM Graph

  • 8/10/2019 Arth_Cir

    91/105

    [Figure: Mealy FSM state graph with states S0, S1, S2, S3 and transition labels ~mul/-, mul/y1, 1/y2, ~done/y2, done/ready, ~mul/-, mul/ready.]

    Transition to the Moore FSM

  • 8/10/2019 Arth_Cir

    92/105

    [Flow chart with Moore state marks: Begin (S0), condition mul (0/1), operator y1 (S1), operator y2 (S2), condition Done (0/1), End (S3).]

    Moore FSM

  • 8/10/2019 Arth_Cir

    93/105

    Y1: RA = A; RB = B; CT = n; ready = 0

    Y2: RC = RC + MUX (transfer with shift right); shift right RB; CT = CT - 1

    Y3: ready = 1

    [Figure: Moore FSM state graph with states S0/-, S1/y1, S2/y2, S3/ready, a reset entry, and transition labels mul, ~mul, 1, done, ~done.]
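    The Moore machine can be exercised with a small table-driven model. This is a sketch under assumptions: S2 is taken to hold while done is low (the ~done label on the graph), S3 is taken to hold while mul stays high, and the names NEXT_STATE, OUTPUT and run are illustrative rather than taken from the slides.

```python
# Next-state function: state -> function of the inputs (mul, done)
NEXT_STATE = {
    "S0": lambda mul, done: "S1" if mul else "S0",
    "S1": lambda mul, done: "S2",                   # unconditional transition
    "S2": lambda mul, done: "S3" if done else "S2",
    "S3": lambda mul, done: "S3" if mul else "S0",
}

# Moore outputs depend on the current state only
OUTPUT = {"S0": "-", "S1": "y1", "S2": "y2", "S3": "ready"}


def run(inputs, state="S0"):
    """Return the (state, output) pair seen on each clock cycle."""
    trace = []
    for mul, done in inputs:
        trace.append((state, OUTPUT[state]))
        state = NEXT_STATE[state](mul, done)
    return trace


stimulus = [(1, 0), (1, 0), (1, 0), (1, 1), (1, 0), (0, 0), (0, 0)]
for state, output in run(stimulus):
    print(state, output)
```

    Because the outputs are attached to states rather than to transitions, ready appears one clock after done is sampled, which is the usual cost of moving from a Mealy to a Moore description.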

    Structure of Modules HDL Description

  • 8/10/2019 Arth_Cir

    94/105

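    [Figure: module structure. reg_a (inputs a, y1; output ra), reg_b (inputs b, y1, y2; output rb = result[n-1:0]), accumulator (inputs part_prod, y1, y2; output acc = result[2n-1:n]), comb_logic (inputs ra, acc, rb[0]; output part_prod), counter (inputs y1, y2; output done), fsm (inputs mul, done; outputs y1, y2, y3). All sequential modules are driven by clock.]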

    RTL-Description of Multiplier (1)

  • 8/10/2019 Arth_Cir

    95/105

    module ser_mult (mul, result, a, b, clock, reset, ready);
      output [15:0] result;
      reg    [15:0] result;
      input  [7:0]  a, b;
      input         mul, clock, reset;
      output        ready;
      wire   [7:0]  acc;
      wire          ready;
      wire          done;
      wire   [2:0]  y;
      wire   [7:0]  ra, rb;
      wire   [8:0]  part_prod;

      reg_a       M1 (clock, a, y[0], ra);
      reg_b       M2 (clock, b, y[0], y[1], rb);
      accumulator M3 (clock, y[0], y[1], part_prod, acc);
      comb_logic  M4 (part_prod, ra, acc, rb[0]);  // only the low multiplier bit is used
      counter     M5 (clock, y[0], y[1], done);    // port order as declared in module counter
      fsm         M6 (clock, mul, reset, done, y);

    RTL-Description of Multiplier (2)

  • 8/10/2019 Arth_Cir

    96/105

      assign ready = y[2];

      // latch the product once the FSM reports completion
      always @(posedge clock)
        if (ready) result <= {acc[7:0], rb[7:0]};
    endmodule

    module reg_a (clock, a, y0, ra);
      input  [7:0] a;
      output [7:0] ra;
      reg    [7:0] ra;
      input        clock, y0;   // y0 is connected to y[0] at the top level

      always @(posedge clock)
      begin
        if (y0) ra

  • 8/10/2019 Arth_Cir

    97/105

    module reg_b (clock, b, y0, y1, rb);
      input        clock, y0, y1;   // connected to y[0] and y[1] at the top level
      input  [7:0] b;
      output [7:0] rb;
      reg    [7:0] rb;

      always @(posedge clock)
      begin
        if (y0) rb

  • 8/10/2019 Arth_Cir

    98/105

    module accumulator (clock, y0, y1, part_prod, acc);
      input        clock, y0, y1;   // connected to y[0] and y[1] at the top level
      input  [8:0] part_prod;
      output [7:0] acc;
      reg    [7:0] acc;

      always @(posedge clock)
      begin
        if (y0) acc

  • 8/10/2019 Arth_Cir

    99/105

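    module comb_logic (part_prod, ra, acc, rb0);
      input        rb0;             // low bit of the multiplier register, rb[0]
      input  [7:0] ra, acc;
      output [8:0] part_prod;
      wire   [8:0] part_prod;

      // add the multiplicand only when the current multiplier bit is 1;
      // the ninth bit keeps the carry out of the 8-bit addition
      assign part_prod = rb0 ? (acc + ra) : {1'b0, acc};
    endmodule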
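    The reason part_prod is one bit wider than acc and ra can be seen from the worst case. The sketch below is a plain software model of the comb_logic expression; the function name mirrors the module, and the 0x1FF mask stands for the 9-bit wire.

```python
def comb_logic(ra, acc, rb0):
    """9-bit part_prod: acc + ra when rb0 is set, otherwise acc unchanged."""
    return (acc + ra) & 0x1FF if rb0 else acc & 0x1FF


worst = comb_logic(0xFF, 0xFF, 1)   # largest possible 8-bit operands
print(worst, worst >> 8)            # the sum and its ninth (carry) bit
```

    With an 8-bit part_prod the carry would be lost before the right shift, and every product whose running sum overflows eight bits would come out wrong.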

    RTL-Description of Multiplier (6)

  • 8/10/2019 Arth_Cir

    100/105

    module counter (clock, y0, y1, done);
      input       clock, y0, y1;   // connected to y[0] and y[1] at the top level
      reg   [3:0] count;
      output      done;
      reg         done;

      always @(posedge clock)
        case ({y0, y1})
          2'b00: count

  • 8/10/2019 Arth_Cir

    101/105

    FSM (2)

  • 8/10/2019 Arth_Cir

    102/105

    // next_state logic
    always @(state or mul or done)
    begin: states
      case (state)
        s0: begin
              if (mul) next_state = s1;
              else     next_state = s0;
            end
        s1: next_state = s2;
        s2: begin
              if (done) next_state = s3;
              else      next_state = s2;   // keep adding and shifting; returning to s1 would reload the registers
            end
        s3: if (mul) next_state = s3;
            else     next_state = s0;
        default: next_state = s0;
      endcase
    end // states

    FSM (3)

  • 8/10/2019 Arth_Cir

    103/105

    // output logic
    always @(state)
    begin: outputs
      case (state)
        s0: y = 3'b000;
        s1: y = 3'b001;
        s2: y = 3'b010;
        s3: y = 3'b100;
        default: y = 3'b000;
      endcase
    end // outputs
    endmodule

    Testbench (1)

  • 8/10/2019 Arth_Cir

    104/105

    module stimulus;
      parameter n = 8;
      reg  [n-1:0]   a, b;
      reg            mul, clock;
      wire [2*n-1:0] result;
      wire           ready;

      multiplier stud (a, b, mul, clock, result, ready);

      initial begin
        mul = 0; clock = 0;
        a = 8'b0; b = 8'b0;
        #15  mul = 1;
        #100 wait (ready);
        mul = 0;
        #10  a = 8'd15; b = 8'd122;

    Testbench (2)

  • 8/10/2019 Arth_Cir

    105/105

        #15  mul = 1;
        #100 wait (ready);
        mul = 0;
        #10  a = 8'd201; b = 8'd5;
        #15  mul = 1;
        #100 wait (ready);
        mul = 0;
        #10  a = 8'd255; b = 8'd255;
        #15  mul = 1;
        #100 wait (ready);
        mul = 0;
        #100 $finish;
      end

      always #10 clock = ~clock;
    endmodule // stimulus
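    When the simulation is inspected by hand, it helps to know the values that result should take. The sketch below simply computes the reference products for the four operand pairs used in the testbench and prints them in decimal and in 16-bit hexadecimal; the list name vectors and the helper expected are illustrative.

```python
# Operand pairs applied by the stimulus module, in order
vectors = [(0, 0), (15, 122), (201, 5), (255, 255)]


def expected(a, b, n=8):
    """Reference product, truncated to the 2n-bit result bus."""
    return (a * b) & ((1 << (2 * n)) - 1)


for a, b in vectors:
    p = expected(a, b)
    print(f"{a:3d} * {b:3d} = {p:5d} (16'h{p:04X})")
```

    The last pair is the largest 8-bit case and gives 65025 (16'hFE01), so it also confirms that the 16-bit result bus cannot overflow.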