arth_cir

8/10/2019 Arth_Cir


Logic Design

Content

Boolean Functions (3 lectures)
Boolean Functions Minimization. Combinational Logic Design Principles (4 lectures)
Brief Description of Verilog (3 lectures)
Basic Combinational Circuits (4 lectures)
Finite State Machines (FSM) (3 lectures)
Synthesis of Synchronous FSM (5 lectures)
Basic Sequential Circuits (3 lectures)
Problems of Synchronous Design (3 lectures)
Asynchronous FSM. Self-Timed Circuits (3 lectures)
Arithmetic Units (4 lectures)
Programmable Logic Integrated Circuits (PLDs) (3 lectures)
Memory Devices (3 lectures)

Unsigned Numbers

Positional number systems: decimal (base, or radix, 10) and binary (radix 2).

$A = a_{n-1} a_{n-2} a_{n-3} \ldots a_1 a_0 . a_{-1} a_{-2} \ldots a_{-m}$, with $a_i \in \{0, 1\}$

There are n digits to the left of the point and m digits to the right of the point.

$A = a_{n-1} 2^{n-1} + a_{n-2} 2^{n-2} + \ldots + a_1 2^1 + a_0 2^0 + a_{-1} 2^{-1} + a_{-2} 2^{-2} + \ldots + a_{-m} 2^{-m}$

Unsigned integer number: n bits, numbered n-1 (most significant) down to 0. Range: 0 to $2^n - 1$.
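As a quick check of the positional formula, here is a worked evaluation of one binary number. The number 1011.01 is an arbitrary illustration (n = 4, m = 2) and is not taken from the slides.

```latex
\begin{aligned}
1011.01_2 &= 1\cdot 2^{3} + 0\cdot 2^{2} + 1\cdot 2^{1} + 1\cdot 2^{0}
             + 0\cdot 2^{-1} + 1\cdot 2^{-2} \\
          &= 8 + 0 + 2 + 1 + 0 + 0.25 = 11.25_{10}
\end{aligned}
```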

A Graphical View

Figure: number wheel, a modular counting representation of 4-bit unsigned numbers. The codes 0000 to 1111 are arranged on a circle and labeled +0 to +15; arrows mark the directions of addition and subtraction.

Signed Numbers

Word layout: sign bit S in position n-1, followed by bits n-2 down to 0.

S = 0: positive number (or zero)
S = 1: negative number

Negative number representation. Three major schemes:
- sign and magnitude (direct code)
- ones complement
- twos complement

Sign and Magnitude (Direct Code)

Figure: number wheel for 4-bit sign and magnitude. 0000 to 0111 are +0 to +7; 1000 is -0 and 1001 to 1111 are -1 to -7.

For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is $A_{sign\&magn} = 1\, a_{n-2} a_{n-3} \ldots a_1 a_0$.

Example:
+5 = 0101
-5 = 1101

Range: $-(2^{n-1} - 1)$ to $2^{n-1} - 1$. Two representations for 0.

Addition when the operands have different signs: subtract the smaller magnitude from the larger one and keep the sign of the larger.

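Ones Complement

For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is $A_{1scom} = 1\, \overline{a_{n-2}}\, \overline{a_{n-3}} \ldots \overline{a_1}\, \overline{a_0}$.

Example:
+5 = 0101
-5 = 1010

$A_{1scom} = 2^n - 1 - |A|$

Range: $-(2^{n-1} - 1)$ to $2^{n-1} - 1$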

Ones Complement on the Number Wheel

Two representations for 0: A - A = -0.

Figure: number wheel for 4-bit ones complement. 0000 to 0111 are +0 to +7; 1000 to 1110 are -7 to -1 and 1111 is -0. Arrows mark the directions for addition and subtraction of a positive number.

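Twos Complement

For a negative number $A = -a_{n-2} a_{n-3} \ldots a_1 a_0$ the code is $A_{2scom} = 1\, \overline{a_{n-2}}\, \overline{a_{n-3}} \ldots \overline{a_1}\, \overline{a_0} + 1$, that is, the ones complement plus 1.

Example:
+5 = 0101
-5 = 1011

$A_{2scom} = 2^n - |A|$

Range: $-2^{n-1}$ to $2^{n-1} - 1$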
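The "invert and add 1" rule can be sketched as a small Verilog module. The module name `twos_negate` and the parameter `n` are hypothetical names chosen for this illustration; they do not appear in the slides.

```verilog
// Hypothetical helper: twos complement negation of an n-bit word.
module twos_negate(a, neg_a);
  parameter n = 4;
  input  [n-1:0] a;
  output [n-1:0] neg_a;
  // Invert every bit, then add 1; the result is taken modulo 2^n.
  assign neg_a = ~a + 1'b1;
endmodule
```

With n = 4, the input 0101 (+5) gives 1011 (-5), as in the example above. The input 1000 (-8) maps to itself, because +8 lies outside the range of a 4-bit twos complement word.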

Twos Complement on the Number Wheel

Figure: number wheel for 4-bit twos complement. 0000 to 0111 are +0 to +7; 1000 is -8 and 1001 to 1111 are -7 to -1. Arrows mark the directions for addition and subtraction of a positive number.

For negative A: $A_{2scom} = 2^n - |A|$

Value of an n-bit twos complement word: $A = -2^{n-1} a_{n-1} + \sum_{i=0}^{n-2} 2^i a_i$

Twos Complement Addition (1)

Addition: C = A + B.

Case $A < 0$, $B > 0$, $|A| \le B$:
$A_{2com} + B = 2^n - |A| + B$, which equals $B - |A|$ once the carry $2^n$ is dropped.
The result is positive and the carry from the sign bit ($2^n$) is discarded.

Example (sign bit written separately):
+0011111 = 0 0011111
-0000111 = 1 1111001
+0011000 = 1 0 0011000 (the leading 1 is the discarded carry)

Case $A < 0$, $B > 0$, $|A| > B$:
$A_{2com} + B = 2^n - (|A| - B) = 2^n - |C| = C_{2com}$

Example:
-0011100 = 1 1100100
+0000100 = 0 0000100
-0011000 = 1 1101000

Twos Complement Addition (2)

Case $A < 0$, $B < 0$:
$2^n - |A| + 2^n - |B| = 2^n - (|A| + |B|) + 2^n = 2^n - |C| + 2^n$, which is $C_{2com}$ once the carry $2^n$ is dropped.
The result is negative and the carry from the sign bit ($2^n$) is discarded.

Example:
-0001101 = 1 1110011
-0011001 = 1 1100111
-0100110 = 1 1 1011010 (the leading 1 is the discarded carry)

Summary:
- The sign bit participates in the operation like the other bits.
- A negative result is represented in twos complement form.
- The carry from the sign bit is ignored.

Subtraction: C = A - B = A + (-B) = A + complemented B.
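The subtraction rule C = A + (complemented B) can be sketched in Verilog as follows. The module and port names (`sub_by_add`, `diff`, `c_out`) are assumptions made for this illustration only.

```verilog
// Hypothetical sketch: n-bit subtraction done as addition of the
// twos complement of b, i.e. a - b = a + ~b + 1 (modulo 2^n).
module sub_by_add(a, b, diff, c_out);
  parameter n = 8;
  input  [n-1:0] a, b;
  output [n-1:0] diff;
  output c_out;
  // The concatenations keep ~b at n bits, so c_out is the carry out
  // of the n-bit adder (the carry that the slides say is discarded).
  assign {c_out, diff} = {1'b0, a} + {1'b0, ~b} + 1'b1;
endmodule
```

For a = 00011111 and b = 00000111 this yields diff = 00011000 with c_out = 1, matching the first example above.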

Ones Complement Addition (1)

Addition: C = A + B.

Case $A < 0$, $B > 0$, $|A| \le B$:
$A_{1scom} + B = 2^n - 1 - |A| + B = (B - |A|) + 2^n - 1$
The result is positive and the carry from the sign bit ($2^n$) is added to the least significant bit of the result (end-around carry).

Example (sign bit written separately):
+0011111 = 0 0011111
-0000111 = 1 1111000
           1 0 0010111 (carry out of the sign bit)
+0011000 = 0 0011000 (after the end-around carry)

Case $A < 0$, $B > 0$, $|A| > B$:
$A_{1scom} + B = 2^n - 1 - (|A| - B) = 2^n - 1 - |C| = C_{1scom}$

Example:
-0011100 = 1 1100011
+0000100 = 0 0000100
-0011000 = 1 1100111

Case $A < 0$, $B < 0$:
$2^n - 1 - |A| + 2^n - 1 - |B| = 2^n - 1 - (|A| + |B|) + 2^n - 1$, which becomes $2^n - 1 - |C| = C_{1scom}$ after the end-around carry.
In this case an end-around carry is generated.

Example:
-0001101 = 1 1110010
-0011001 = 1 1100110
           1 1 1011000 (carry out of the sign bit)
-0100110 = 1 1011001 (after the end-around carry)

Summary:
- The sign bit participates in the operation like the other bits.
- A negative result is represented in ones complement form.
- The carry from the sign bit is an end-around carry.

Its simpler addition scheme makes twos complement the most common choice for integer number systems within digital systems.

Overflow

If an addition produces a result that exceeds the range of the number system, overflow is said to occur.
Adding two numbers with different signs can never produce overflow; only adding numbers with the same sign can.

Example (8-bit twos complement, sign bit written separately):

Negative overflow:
 -64   1 1000000
 -65   1 0111111
-129   0 1111111 = +127

Positive overflow:
 +50   0 0110010
 +80   0 1010000
+130   1 0000010 = -126

Overflow occurs if the addends' signs are the same but the sum's sign differs from the addends' sign: $OVF = c_n \oplus c_{n-1}$.
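The sign-based overflow rule can be sketched in Verilog. The module name `add_ovf` and its ports are hypothetical and serve only to illustrate the rule stated above.

```verilog
// Hypothetical sketch: n-bit twos complement adder with overflow flag.
module add_ovf(a, b, sum, ovf);
  parameter n = 8;
  input  [n-1:0] a, b;
  output [n-1:0] sum;
  output ovf;
  assign sum = a + b;
  // Overflow: both addends have the same sign, but the sum's sign differs.
  assign ovf = (a[n-1] == b[n-1]) && (sum[n-1] != a[n-1]);
endmodule
```

For a = 11000000 (-64) and b = 10111111 (-65) the module produces sum = 01111111 and ovf = 1, the negative overflow case of the example.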

Adders

Half Adder
The function of the half adder is to add two binary digits, producing a sum and a carry.

Input   Output
a b     s cout
0 0     0 0
0 1     1 0
1 0     1 0
1 1     0 1

$s = a \oplus b$; $c_{out} = ab$

Figure: half adder symbol with inputs a, b and outputs s, cout.

Full Adder (1)

The function of the full adder is to add two binary digits and a carry that might be generated or propagated by the previous stage.

Input     Output
a b cin   s cout
0 0 0     0 0
0 0 1     1 0
0 1 0     1 0
0 1 1     0 1
1 0 0     1 0
1 0 1     0 1
1 1 0     0 1
1 1 1     1 1

Full Adder (2)

$s = a \oplus b \oplus c_{in}$; $c_{out} = ab + a c_{in} + b c_{in}$ (majority function)
or $s = \overline{c_{out}}(a + b + c_{in}) + a b c_{in}$

Karnaugh map for s (rows a, columns b cin):
a \ b cin   00 01 11 10
0            0  1  0  1
1            1  0  1  0

Karnaugh map for cout (rows a, columns b cin):
a \ b cin   00 01 11 10
0            0  0  1  0
1            0  1  1  1

The Circuit of Full Adder (1)

Figure: gate-level full adder with inputs a, b, cin, internal signals s1, c1, c2, c3 and outputs s, cout.

$c_{out} = ab + c_{in}(a \oplus b)$
Standard approach: 6 gates.

The Circuit of Full Adder (2)

$c_{out} = ab + c_{in}(a \oplus b) = ab \oplus c_{in}(a \oplus b)$
5 gates.

Figure: gate-level full adder with inputs a, b, cin, internal signals s1, c1, c2 and outputs sum, cout.

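Full Adder from Two Half Adders

Figure: the first half adder takes A and B and produces $A \oplus B$ (S) and $AB$ (CO). The second half adder takes $A \oplus B$ and Cin and produces $S = A \oplus B \oplus C_{in}$ and $C_{in}(A \oplus B)$. The two carry outputs are combined to form CO.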
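The construction of a full adder from two half adders can be sketched in Verilog as below. The module names `half_adder` and `full_adder_2ha` and the internal wire names are assumptions made for this illustration.

```verilog
// Hypothetical sketch: half adder.
module half_adder(a, b, s, co);
  input  a, b;
  output s, co;
  assign s  = a ^ b;   // sum
  assign co = a & b;   // carry
endmodule

// Hypothetical sketch: full adder built from two half adders and an OR gate.
module full_adder_2ha(a, b, cin, s, co);
  input  a, b, cin;
  output s, co;
  wire s1, c1, c2;
  half_adder ha1(a,  b,   s1, c1);  // s1 = a ^ b,     c1 = a & b
  half_adder ha2(s1, cin, s,  c2);  // s  = s1 ^ cin,  c2 = s1 & cin
  assign co = c1 | c2;              // c1 and c2 are never 1 together
endmodule
```

Since c1 and c2 cannot both be 1, the OR gate could equally be an XOR gate, which is the 5-gate form of the carry equation.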

Inversion Property

Boolean functions S and Cout are self-dual:

$\overline{S}(A, B, C_{in}) = S(\overline{A}, \overline{B}, \overline{C_{in}})$

$\overline{C_{out}}(A, B, C_{in}) = C_{out}(\overline{A}, \overline{B}, \overline{C_{in}})$

Figure: a full adder (FA) with inputs A, B, Cin and outputs S, Cout, shown next to the same FA with all inputs and outputs inverted.

Figure: 4-bit ripple chain of full adders with inputs A0 B0 ... A3 B3, Cin, sum outputs S0 ... S3 and Cout.

Inverters in the path of the carry signal may be removed (this shortens the critical path of the carry chain).

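The Serial Adder

Figure: registers RA ($a_{n-1} a_{n-2} \ldots a_1 a_0$) and RB ($b_{n-1} b_{n-2} \ldots b_1 b_0$) feed the A and B inputs of a single full adder (FA). The sum output S is stored in register RS ($s_{n-1} s_{n-2} \ldots s_1 s_0$). A D flip-flop driven by Clock connects Cout back to Cin.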

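A Parallel Binary Adder

Figure: four full adders (inputs A, B, CIN; outputs S, CO) connected in a chain through carries c1, c2, c3. Inputs: a0 b0, a1 b1, a2 b2, a3 b3 and cin (c0). Outputs: s0, s1, s2, s3 and cout (c4).

Carry ripple adder (CRA). S = A + B; A = (a0, a1, a2, a3); B = (b0, b1, b2, b3).

$s_0 = a_0 \oplus b_0 \oplus c_0$; $c_1 = a_0 b_0 + (a_0 + b_0) c_0$
$s_1 = a_1 \oplus b_1 \oplus c_1$; $c_2 = a_1 b_1 + (a_1 + b_1) c_1$
$s_2 = a_2 \oplus b_2 \oplus c_2$; $c_3 = a_2 b_2 + (a_2 + b_2) c_2$
$s_3 = a_3 \oplus b_3 \oplus c_3$; $c_4 = a_3 b_3 + (a_3 + b_3) c_3$

$T_{add} = (n-1) t_c + t_{sm}$, which is on the order of $n\, t_{sm}$.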

Verilog Description for 4-bit CRA (Gate Level Description)

// Define a 4-bit adder
module add4(s, c_out, a, b, c_in);
  // I/O port declarations
  output [3:0] s;
  output c_out;
  input [3:0] a, b;
  input c_in;
  // Internal nets
  wire c1, c2, c3;
  // Instantiate four 1-bit full adders.
  fulladd fa0(s[0], c1, a[0], b[0], c_in);
  fulladd fa1(s[1], c2, a[1], b[1], c1);
  fulladd fa2(s[2], c3, a[2], b[2], c2);
  fulladd fa3(s[3], c_out, a[3], b[3], c3);
endmodule

// Define a 1-bit full adder
module fulladd(sum, c_out, a, b, c_in);
  // I/O port declarations
  output sum, c_out;
  input a, b, c_in;
  // The rest of this module is missing from the source. The body below is
  // an assumed completion that follows the 5-gate circuit (s1, c1, c2).
  wire s1, c1, c2;
  xor (s1, a, b);
  and (c1, a, b);
  xor (sum, s1, c_in);
  and (c2, s1, c_in);
  or  (c_out, c2, c1);
endmodule

module adder_4_RTL (a, b, c_in, sum, c_out);

output [3:0] sum;

output c_out;

input [3:0] a, b;

input c_in;

assign {c_out, sum} = a + b + c_in;

endmodule


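A 64-bit Adder/Subtractor

$T_{add} = T_{FA}(A, B \to C_{out}) + (N-2)\, T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$

T = O(N) worst-case delay, where N is the number of bits.
Real goal: make the carry path as fast as possible.

Figure: chain of full adders (inputs B, CIN; outputs S, CO) for bits 0, 1, ..., 31 with operand bits b0, b1, ..., b31, sum bits s0, s1, ..., s31, carry out c32 and a control input ~Add/Sub.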

Adder/Subtractor Module in Verilog

Data-flow description:

module addsub(a, b, select, cout, sum);
  input [7:0] a, b;
  input select;
  output [7:0] sum;
  output cout;
  assign {cout, sum} = select ? (a - b) : (a + b);
endmodule

Select = 0: addition
Select = 1: subtraction

Carry Look-Ahead Adders (1)

Input     Output    As functions of cin   Carry status
a b cin   s cout    cout   s
0 0 0     0 0       0      cin            annihilate
0 0 1     1 0       0      cin            annihilate
0 1 0     1 0       cin    ~cin           propagate
0 1 1     0 1       cin    ~cin           propagate
1 0 0     1 0       cin    ~cin           propagate
1 0 1     0 1       cin    ~cin           propagate
1 1 0     0 1       1      cin            generate
1 1 1     1 1       1      cin            generate

Carry Look-Ahead Adders (2)

All carries are produced in parallel:
$c_{i+1} = g_i + p_i c_i$, where $g_i = a_i b_i$ and $p_i = a_i + b_i$ (or $p_i = a_i \oplus b_i$).
$g_i$: carry generation; $p_i$: carry propagation.

Re-express the carry logic for each of the bits:
$c_1 = g_0 + p_0 c_0$
$c_2 = g_1 + p_1 c_1 = g_1 + p_1 g_0 + p_1 p_0 c_0$
$c_3 = g_2 + p_2 (g_1 + p_1 g_0 + p_1 p_0 c_0) = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$

Each equation corresponds to a circuit with just three levels of delay: one for the generate and propagate signals, and two for the sum of products.
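The flattened carry equations can be written directly as continuous assignments. This is a minimal sketch; the module name `cla4` and its port names are assumptions for illustration, and the XOR form of $p_i$ is used so that the same signal also yields the sum bits.

```verilog
// Hypothetical sketch: 4-bit carry look-ahead adder.
module cla4(a, b, c0, s, c4);
  input  [3:0] a, b;
  input  c0;
  output [3:0] s;
  output c4;
  wire [3:0] g, p;
  wire c1, c2, c3;
  assign g = a & b;   // carry generate
  assign p = a ^ b;   // carry propagate (XOR form)
  // Every carry depends only on g, p and c0, not on the previous carry.
  assign c1 = g[0] | (p[0] & c0);
  assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0])
            | (p[3] & p[2] & p[1] & p[0] & c0);
  // Sum bit i is the half-sum p[i] XORed with the carry into bit i.
  assign s = p ^ {c3, c2, c1, c0};
endmodule
```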


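Figure: 4-bit carry look-ahead adder. Four FA blocks with inputs a0 b0 ... a3 b3 pass g0 p0 ... g3 p3 to a Carry Unit, which takes cin and returns c1, c2, c3 and cout.

One bit CLA

Figure: one-bit CLA cell with inputs ai, bi, cin, internal signals gi, pi, a multiplexer (inputs 0, 1) and outputs Si, cout.

One Stage of a Carry Look-Ahead Adder

Figure: inputs ai, bi form the half-sum hsi; the carry-lookahead logic forms ci from a0 ... a(i-1), b0 ... b(i-1); hsi and ci form si.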

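CRU (Carry Unit)

The lookahead carry circuit (Carry Unit) forms the carry signals:
$c_4 = G + P c_0$, where $G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$ and $P = p_3 p_2 p_1 p_0$.
These equations formally coincide with the equations $c_{i+1} = g_i + p_i c_i$.
Lookahead carry across the 4-bit sections of an ALU is therefore performed in the same way as lookahead carry across the separate bits of a 4-bit adder.

Figure: carry unit with inputs C0, G0, P0, G1, P1, G2, P2, G3, P3 and outputs C1, C2, C3, G, P.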

MSI Adders

IC 74x283

The adder produces active-low versions of the carry-generate and carry-propagate signals.

Equations for the half-sum:
$hs_i = a_i \oplus b_i = a_i \overline{b_i} + \overline{a_i} b_i = a_i \overline{b_i} + a_i \overline{a_i} + \overline{a_i} b_i + b_i \overline{b_i} = (a_i + b_i)(\overline{a_i} + \overline{b_i}) = (a_i + b_i)\overline{(a_i b_i)} = p_i \overline{g_i}$

An AND gate can be used instead of an XOR gate.

The equation for the carry is factored:
$c_{i+1} = g_i + p_i c_i = p_i g_i + p_i c_i = p_i (g_i + c_i)$

Equations for Carry Signals

$c_1 = p_0 (g_0 + c_0)$
$c_2 = p_1 (g_1 + c_1) = p_1 (g_1 + p_0 (g_0 + c_0)) = p_1 (g_1 + p_0)(g_1 + g_0 + c_0)$
$c_3 = p_2 (g_2 + c_2) = p_2 (g_2 + p_1 (g_1 + p_0)(g_1 + g_0 + c_0)) = p_2 (g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)$
$c_4 = p_3 (g_3 + c_3) = p_3 (g_3 + p_2 (g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)) = p_3 (g_3 + p_2)(g_3 + g_2 + p_1)(g_3 + g_2 + g_1 + p_0)(g_3 + g_2 + g_1 + g_0 + c_0)$

The propagation delay from the C0 input to the C4 output is very short, about the same as two inverter gates.

IC 74x283: Logic Symbol

Figure: logic symbol with inputs C0, A0, B0, A1, B1, A2, B2, A3, B3 and outputs S0, S1, S2, S3, C4.

A 16-bit Group-Ripple Adder

Figure: four 4-bit adders (inputs C0, A0 to A3, B0 to B3; outputs S0 to S3, C4) connected in series; the C4 output of each group drives the C0 input of the next. Buses A[0:15] and B[0:15] are the operands, S[0:15] is the sum, Cin enters the first group and Cout leaves the last one.

$T_{add} = M \cdot t(c_0 \to c_4) = 4\, t(c_0 \to c_4)$, where M is the number of groups.

Carry Bypass Adder (Carry Skip Adder)

The following 3 functions are formed in each bit of the adder:
$G = A_i B_i$ (generate)
$P = A_i \oplus B_i$ (propagate)
$K = \overline{A_i}\, \overline{B_i}$ (annihilate, or kill)

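Carry Bypass (Skip) Adder

The idea of the carry bypass adder:
$P_0 = a_0 \oplus b_0$; $P_1 = a_1 \oplus b_1$; $P_2 = a_2 \oplus b_2$; $P_3 = a_3 \oplus b_3$.
If $P_0 P_1 P_2 P_3 = 1$, then $C_{out} = C_{in}$; otherwise $C_{out} = C_4$, the carry produced by the ripple chain.
$BP = P_0 P_1 P_2 P_3$: block propagation.

Figure: four full adders with inputs a0 b0 ... a3 b3 and outputs S0 ... S3 ripple the carry from Cin to C4. A multiplexer controlled by BP selects either C4 or Cin as Cout.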
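A single 4-bit bypass block can be sketched as follows. The module name `skip4` and the signal names are assumptions for illustration; the ripple chain itself is modeled behaviorally with `+` rather than with four full adders.

```verilog
// Hypothetical sketch: one 4-bit block of a carry bypass (skip) adder.
module skip4(a, b, cin, s, cout);
  input  [3:0] a, b;
  input  cin;
  output [3:0] s;
  output cout;
  wire [3:0] p;
  wire bp, c4;
  assign p  = a ^ b;            // per-bit propagate signals
  assign bp = &p;               // block propagation BP = P0 P1 P2 P3
  assign {c4, s} = a + b + cin; // ripple chain, modeled behaviorally
  // Bypass multiplexer: when every bit propagates, forward cin directly.
  assign cout = bp ? cin : c4;
endmodule
```

Functionally cout always equals c4 (when BP = 1 the chain would deliver cin anyway); the multiplexer only shortens the carry path in a gate-level implementation.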

Carry Skip Adder

Figure: 16-bit carry skip adder made of four blocks, B0 (bits 0-3), B1 (bits 4-7), B2 (bits 8-11) and B3 (bits 12-15). Each block has Setup, Carry Propagation and Sum stages. Block carries C0, C1, C2, C3 pass through bypass multiplexers controlled by BP0, BP1, BP2, BP3. Sum outputs are S0-S3, S4-S7, S8-S11, S12-S15; Cin enters B0 and Cout leaves B3.

Worst-case delay (carry from bit 0 to bit 15): a carry generated in bit 0 ripples through bits 1, 2 and 3, skips the middle two groups, and ripples in the last group from bit 12 to bit 15. B is the group size in bits.

$T_{add} = t_{setup} + B\, t_{carry} + ((N/B) - 2)\, t_{skip} + (B-1)\, t_{carry} + t_{sum}$

$t_{setup}$: time for forming the g and p signals.

Carry Skip Adder

C0: carry from B0; C1: carry from B1; C2: carry from B2; C3: carry from B3.
$t_{setup}$: time needed to create the generate and propagate signals ($g_i$, $p_i$).
$t_{carry}$: carry propagation delay through one bit.
$t_{bypass}$: carry propagation delay through the bypass multiplexer.
$t_{sum}$: time required to form the sum of the last bit.

The dependence of the delay on the number of bits is more favorable than in the CRA: it is still a linear function, but with a smaller slope.

Optimal Skip Block Size and Add Time

Carry ripple is used within the blocks.
Assume $t_{carry} = t_{skip} = t_{setup} = t_{sum} = 1$; then
$T_{add} = 1 + B + (N/B - 2) + (B - 1) + 1 = 2B + N/B - 1$
$dT_{add}/dB = 2 - N/B^2$
$dT_{add}/dB = 0$ at $B_{opt} = \sqrt{N/2}$
$T_{opt} = 4\sqrt{N/2} - 1 = 2\sqrt{2N} - 1$
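As an illustration, the optimum can be evaluated for one word length. N = 32 is an arbitrary choice made here (it gives an integer block size) and is not taken from the slides; all unit delays are assumed equal to 1.

```latex
\begin{aligned}
B_{opt} &= \sqrt{N/2} = \sqrt{32/2} = 4 \\
T_{opt} &= 2B_{opt} + \frac{N}{B_{opt}} - 1 = 2\cdot 4 + \frac{32}{4} - 1 = 15 \\
        &= 4\sqrt{N/2} - 1 = 4\cdot 4 - 1 = 15
\end{aligned}
```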

Carry Select Adder

The N-bit circuit is divided into M blocks of B bits each.
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (this can be done for all blocks in parallel), and then select the correct one. This makes the adder circuit about 30% more complex.

Figure: a 4-bit block with two carry chains (Carry0 for carry-in 0, Carry1 for carry-in 1), a multiplexer controlled by Cin that selects Cout, and a Sum stage.
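One block of a carry select adder can be sketched in Verilog as below. The module name `csel4` and the internal names are assumptions for illustration; the two precomputed additions are modeled behaviorally.

```verilog
// Hypothetical sketch: one 4-bit block of a carry select adder.
module csel4(a, b, cin, s, cout);
  input  [3:0] a, b;
  input  cin;
  output [3:0] s;
  output cout;
  wire [3:0] s0, s1;
  wire c0, c1;
  // Both results are computed in parallel, before cin is known.
  assign {c0, s0} = a + b;          // assumes carry-in = 0
  assign {c1, s1} = a + b + 1'b1;   // assumes carry-in = 1
  // The real carry-in only drives the multiplexers.
  assign s    = cin ? s1 : s0;
  assign cout = cin ? c1 : c0;
endmodule
```

The duplicated adder is the price paid for taking the block's ripple delay out of the carry path; only one multiplexer delay per block remains.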

Carry Select Adder

$T_{add} = t_{setup} + B\, t_{carry} + (N/B)\, t_{mux} + t_{sum}$

Figure: 16-bit carry select adder made of four identical blocks for bits 3...0, 7...4, 11...8 and 15...12. Each block contains Setup (inputs A, B; outputs Ps, Gs), a 0-carry chain, a 1-carry chain, a Mux (producing Cs) and a Sum stage (outputs S3...S0, S7...S4, S11...S8, S15...S12). Cin enters the first block and Cout leaves the last one. Timing annotations in the figure: (1) after setup, (5) after each carry chain, and (6), (7), (8), (9) after the successive multiplexers.

Carry Select Adder (Square Root)

$T_{add} = t_{setup} + 2\, t_{carry} + m\, t_{mux} + t_{sum}$

Figure: 20-bit carry select adder with blocks of increasing size for bits 1...0, 4...2, 8...5, 13...9 and 19...14. Each block contains Setup (inputs A, B; outputs Ps, Gs), a 0-carry chain, a 1-carry chain, a Mux (producing Cs) and a Sum stage (outputs S1 S0, S4...S2, S8...S5, S13...S9, S19...S14). Cin enters the first block and Cout leaves the last one. Timing annotations in the figure: (1), (3), (3), (4), (5), (6), (7), (8).

Carry Save Adder

Consider the addition of three numbers. In this case two vectors are formed, a sum vector S and a carry vector C:

Example: x + y + z = s + c

x:     1001111
y:     1100100
z:   + 0001111
s:     0100100
c:  + 1001111   (shifted one position to the left)
sum:  11000010

When N n-bit numbers are added, the sum has $\log_2 N + n$ bits. A CSA is used for adding more than two numbers together.
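The formation of the sum and carry vectors can be sketched in Verilog. The module name `csa3` and its ports are assumptions for illustration; the final carry-propagate addition of s and c is left to a separate adder.

```verilog
// Hypothetical sketch: carry save adder (a row of independent full adders).
module csa3(x, y, z, s, c);
  parameter n = 7;
  input  [n-1:0] x, y, z;
  output [n-1:0] s;   // sum vector
  output [n:0]   c;   // carry vector, already shifted left by one
  // Per-bit sum without any carry propagation between bit positions.
  assign s = x ^ y ^ z;
  // Per-bit majority function; appending 1'b0 gives the carries weight 2.
  assign c = {(x & y) | (x & z) | (y & z), 1'b0};
endmodule
```

With the operands of the example above, s = 0100100 and c = 10011110, and s + c = 11000010.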

CSA: Circuit of an Adder of Three 4-bit Numbers

Figure: a row of four full adders (FA) takes x0 y0 z0, x1 y1 z1, x2 y2 z2, x3 y3 z3; a second row of two half adders (HA) and two full adders (FA) combines their sum and carry outputs into sum0, sum1, sum2, sum3, sum4, sum5.

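CSA

Advantage of the CSA: pipeline capability.

Figure: operands A1, A2, A3 enter CSA1; its sum and carry vectors (Ss, Cs) are latched in D registers on Clock 1; together with A4 they enter CSA2, whose outputs are latched on Clock 2 and added by a final CRA.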

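Conditional Sum Adder

The n-bit adder is divided into two groups of n/2 bits. The high-order group is duplicated, so the circuit includes three n/2-bit adders.

Figure: adder SMLow (inputs ALow, BLow) produces the low half of S and the carry CoutL. Two adders SMHigh (inputs AHigh, BHigh), one with Cin = 1 and one with Cin = 0, produce two candidate high halves and the carries Cout1 and Cout0. Two multiplexers controlled by CoutL select the high half of S and Cout.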

The Structure of Execution Unit

OU: Operational (or Execution) Unit. CU: Control Unit.
The OU consists of registers, adders, other logic elements and wires.
The CU produces the control signals that cause the operations to be executed.

Figure: blocks OU and CU connected by signals X and Y; Data in and Data out belong to the OU, Command and Done to the CU.

EU for Integer Multiplication (Unsigned)

Figure: registers RA (n bits, loaded from A) and RB (n bits, loaded from B), a multiplexer (MUX) that selects RA or 0, an adder SM, the accumulator RC (acc) and a counter CT loaded with n. The Control Unit receives x1, x2 and issues the control signals y1, y2, y3.

RA: multiplicand. RB: multiplier. RC (accumulator): high bits of the sum of partial products. It is possible to combine the multiplier register (low bits) and the accumulator register (high bits) into one 2n-bit register.

Flow-Chart of Multiplication (1)

Figure: flow chart.
Start
Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
X1: decision on whether the partial product is added
Y2: RC = SM
Y3: Shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: return to X1. Yes: End.

Example of Multiplication

A = 1101; B = 1010.

Accumulator RC   RB     CT   Operation
0000 0000        1010   4    initial state
0000 0000        0101   3    shift
1101 0000        0101   3    RC = RC + 1101 0000
0110 1000        0010   2    shift
0011 0100        0001   1    shift
1 0000 0100      0001   1    RC = RC + 1101 0000
1000 0010        0000   0    shift

Signed multiplication:
- convert negative numbers to positive,
- execute unsigned multiplication,
- remember the original signs.

Behavior Description of Multiplier in Verilog

module multiplier (a, b, mul, clock, result, ready);
  parameter n = 8;
  input clock, mul;
  input [n-1:0] a, b;
  wire [n-1:0] a, b;
  output [2*n-1:0] result;
  reg [2*n-1:0] result;
  output ready;
  reg ready;
  reg [n:0] rc;          // accumulator, one extra bit for the adder carry
  reg [n-1:0] ra, rb;    // multiplicand and multiplier registers
  always @(posedge mul)
  begin
    ra = a;
    rb = b;
    ready = 0;
    rc = 0;
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0]) rc = rc + ra;   // add the multiplicand if the low multiplier bit is 1
      rb = {rc[0], rb[n-1:1]};   // shift right: low accumulator bit enters rb
      rc = rc >> 1;
    end
    result = {rc[n-1:0], rb};
    ready = 1;
  end
endmodule //multiplier

Multiplying Unit 2

The structure of a multiplying unit with a shift of the multiplier to the right and of the multiplicand to the left.

Figure: register RA (2n bits, loaded from A), register RB (n bits, loaded from B), a multiplexer (MUX) that selects RA or 0, a 2n-bit adder SM, a 2n-bit accumulator RC and a counter CT.

Multiplying Algorithm

Figure: flow chart.
Start
Multiply? No: wait. Yes: continue.
RA := A; RB := B; CT := n; RC := 0
Decision (Yes/No): RC := SM on one branch
SL (RA); SR (RB); CT := CT - 1
(R) or (RB) = 0? and (CT) = 0? No: repeat. Yes: return a result.
End

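Multiplication on Signed Numbers (Booth's Algorithm)

Figure: register RA (loaded from A), adder SM, accumulator ACC, register RB (loaded from B) extended by the bit RB-1, a decoder DC (outputs 0 1 2 3) fed by the bits RB0 and RB-1 (combinations 00, 01, 10, 11) and a counter CT.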

Booth's Algorithm

Figure: flow chart.
Start
Multiply? No: wait. Yes: continue.
RA = A, RB = B, CT = n, ACC = 0, RB-1 = 0
Test the bit pair (RB0, RB-1):
  01: ACC = ACC + RA
  10: ACC = ACC + ~RA + 1
  00 or 11: no addition
ASR (ACC, RB, RB-1); CT = CT - 1
(CT) == 0? No: test the bit pair again. Yes: End.

ASR: arithmetic right shift (sign extend when shifting).

Example

A = 10010110; B = 11000111; (A)com = 01101010.

Acc        RB         RB-1   CT   Operation
00000000   11000111   0      8    initial state
01101010   11000111   0      8    Acc = 00000000 + 01101010
00110101   01100011   1      7    ASR
00011010   10110001   1      6    ASR
00001101   01011000   1      5    ASR
10100011   01011000   1      5    Acc = 00001101 + 10010110
11010001   10101100   0      4    ASR
11101000   11010110   0      3    ASR
11110100   01101011   0      2    ASR
01011110   01101011   0      2    Acc = 11110100 + 01101010
00101111   00110101   1      1    ASR
00010111   10011010   1      0    ASR

Substantiation of the algorithm (1)

1. B > 0

$A \cdot (00011110) = A \cdot (2^4 + 2^3 + 2^2 + 2^1) = A \cdot 30$
The set of addition operations can be replaced by only two operations (one addition and one subtraction), because the following identity holds:
$2^n + 2^{n-1} + \ldots + 2^{n-k} = 2^{n+1} - 2^{n-k}$
$A \cdot (00011110) = A \cdot (2^5 - 2^1) = A \cdot 30$
This can be extended to any number of consecutive 1s.
This algorithm is called Booth's recoding.

Digit set: (0, 1) is recoded to (-1, 0, 1).
Multiplier: 00011110 becomes 0, 0, 1, 0, 0, 0, -1, 0.
Instead of 4 additions, only 2 operations are needed.


2. B < 0

Booth Multiplier in Verilog

module Booth_multiplier(a, b, clock, start, ready, result);
  parameter n = 16;
  input [n-1:0] a, b;
  wire [n-1:0] a, b;
  input clock, start;
  output [2*n-1:0] result;
  reg [2*n-1:0] result;
  output ready;
  reg ready;
  reg [n-1:0] acc, ra, rb;
  reg q;
  always @(posedge start)
  begin
    ra = a;
    rb = b;
    acc = 0;
    q = 0;
    ready = 0;
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0] !== q)
      begin
        if (q) acc = acc + ra;
        else acc = acc - ra;
      end
      q = rb[0];
      rb = {acc[0], rb[n-1:1]};
      acc = {acc[n-1], acc[n-1:1]}; //arithmetic shift right
    end
    result = {acc, rb};
    ready = 1;
  end
endmodule //Booth_multiplier


Combinational Multipliers

Methods for accelerating multiplication:
- parallel computation of partial products
- reduction of the number of additions
- reduction of the propagation delay

Two types of multipliers are used: matrix and tree-structured.
Propagation delay of matrix multipliers: O(n).
Propagation delay of tree-structured multipliers: O(log2 n).

Partial Products in a 4 x 4 Multiplier

                      a0b3 a0b2 a0b1 a0b0
                 a1b3 a1b2 a1b1 a1b0
            a2b3 a2b2 a2b1 a2b0
  +    a3b3 a3b2 a3b1 a3b0
  p7   p6   p5   p4   p3   p2   p1   p0

Matrix Multiplier Based on CRA

- A matrix multiplier contains $n^2$ AND gates to form the partial products.
- A multiplier based on CRA contains $(n-1)n$ adders.
- The number of HAs is n.
- The number of FAs is $n^2 - 2n$.
- In the worst case the propagation delay equals $3n - 4$ adder delays.

Matrix Multiplier Based on CRA Structure

Figure: the partial-product rows a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3 are summed by three rows of four adders each, connected as carry ripple adders with 0 on the unused inputs; the outputs are p0 ... p7.

Matrix Multiplier with Carry Save Addition

- A matrix multiplier using carry-save addition contains the same number of elements.
- It is faster because its propagation delay is shorter.
- The last (n-th) stage corresponds to a CRA.
- Its worst-case carry propagation path goes through $2n - 2$ adders.

Matrix Multiplier using Carry Save Addition

Figure: the partial-product rows a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3 are summed by three rows of three carry-save adders each, followed by a final row of three adders; the outputs are p0 ... p7.

Treelike Multipliers

Treelike multipliers contain three stages:
- Generation of the bits of the partial products. This stage consists of $n^2$ AND gates.
- Compression of the partial products. Implemented as a tree of parallel adders.
- Final addition. Addition of the sum vector and the carry vector.

When used in multipliers, full adders and half adders are usually called (3,2) and (2,2) counters, or compressors.

Wallace-tree multiplier (1)

Figure: dot diagram of the reduction for operands a3 a2 a1 a0 and b3 b2 b1 b0. The partial-product columns (a0b0; a1b0, a0b1; a2b0, a1b1, a0b2; a3b0, a2b1, a1b2, a0b3; a3b1, a2b2, a1b3; a3b2, a2b3; a3b3) are compressed in three cascades with sums s11 ... s14 and carries c12 ... c15, then s21 ... s24 and c21 ... c24, then s31 ... s34 and c31 ... c33, giving p7 ... p0.

Wallace-tree multiplier (2)

Figure: the corresponding adder array over the partial-product rows a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3 with outputs p0 ... p7.

Wallace tree

- Rows of the partial-product matrix are grouped in threes.
- Columns with three bits are compressed with FAs. Columns with two bits are compressed with HAs.
- Rows that are not included in a group of three are carried over to the next reduction cascade.
- The Wallace scheme is considered to be the fastest, but at the same time its structure is the least regular.
- The main area of use of the Wallace tree is the construction of multipliers with large word lengths.

Dadda Multiplier (1)

Figure: dot diagram of the reduction for operands a3 a2 a1 a0 and b3 b2 b1 b0. The partial-product columns are compressed in three cascades with sums and carries s11, s12, c11, c12, then s21 ... s24, c21 ... c24, then s31 ... s36, c31 ... c36, giving p7 ... p0.

Dadda Multiplier (2)

Figure: the corresponding adder array over the partial-product rows a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3 with outputs p7 ... p0.

Dadda Multiplier (3)

The Wallace and Dadda methods differ in how they approach the compression of the partial products.

- The Wallace algorithm compresses as much as possible as early as possible, at the early stages.
- The Dadda algorithm leaves the highest level of compression to the late stages.
- A Wallace-tree multiplier works forward from the multiplier inputs.
- The Dadda multiplier works backward from the final product.
- The number of cascades is the same in both multipliers.
- Both lack structural regularity.

The number of stages, and thus the delay (in units of an FA delay, excluding the CPA), of an n-bit tree-based multiplier using (3,2) counters is
$\log_{1.5} n = \log_{10} n / \log_{10} 1.5 = \log_{10} n / 0.176$

Example of Sequential Multiplier

Figure: registers RA (n bits) and RB (n bits), a multiplexer (MUX) that selects RA or 0, an adder SM, the accumulator RC (acc, n bits) and a counter CT loaded with n. The Control Unit receives x1, x2 and issues y1, y2, y3.

RA: multiplicand. RB: multiplier. RC (accumulator): high bits of the sum of partial products. The multiplier register holds the low bits of the sum.

Flow-Chart of Multiplication Algorithm (1)

Figure: flow chart.
Begin
Multiply? No: wait. Yes: continue.
Y1
X1
Y2: RC = SM
Y3: Shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: repeat. Yes: End.

Flow-Chart of Multiplication Algorithm (2)

Figure: flow chart.
Begin
Multiply? No: wait. Yes: continue.
Y1
Y2: RC = RC + MUX (transfer with shift); shift RB; CT = CT - 1
Done? ((CT) = 0) No: repeat Y2. Yes: End.

Transition to the Mealy FSM

FSM state markup:
$A = \{X, Y, S, \delta, \lambda\}$
X = {mul, done}; Y = {y1, y2, ready}; S = {S0, S1, S2}

Figure: the flow chart (Begin, mul, y1, y2, Done, End) marked with the states S0, S1, S2.

Mealy FSM Graph

Figure: state graph with states S0, S1, S2, S3 and transitions labeled input/output: ~mul/-, mul/y1, 1/y2, ~done/y2, done/ready, ~mul/-, mul/ready.

Transition to the Moore FSM

Figure: the flow chart (Begin, mul, y1, y2, Done, End) marked with the states S0, S1, S2, S3.

Moore FSM

Y1: RA = A; RB = B; CT = n; Ready = 0
Y2: RC = RC + MUX (transfer with shift right); shift right RB; CT = CT - 1
Y3: ready = 1

Figure: state graph with states S0/-, S1/y1, S2/y2, S3/ready, a reset input and transitions labeled ~mul, mul, 1, ~done, done, ~mul, mul.

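Structure of Modules HDL Description

Figure: block diagram of the modules. reg_a (input a, output ra) and reg_b (input b, output rb, which forms result[n-1:0]) feed comb_logic, which forms part_prod from ra, acc and rb[0]. The accumulator holds acc (result[2n-1:n]). The counter, loaded with n, produces done. The fsm takes mul and done and produces y (y1, y2, y3). All sequential modules are driven by clock and controlled by y1 and y2.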

RTL-Description of Multiplier (1)

module ser_mult (mul, result, a, b, clock, reset, ready);
  output [15:0] result;
  reg [15:0] result;
  input [7:0] a, b;
  input mul, clock, reset;
  output ready;
  wire [7:0] acc;
  wire ready;
  wire [2:0] y;
  wire [7:0] ra, rb;
  wire [8:0] part_prod;
  wire done;
  reg_a M1(clock, a, y[0], ra);
  reg_b M2(clock, b, y[0], y[1], rb);
  accumulator M3(clock, y[0], y[1], part_prod, acc);
  comb_logic M4(part_prod, ra, acc, rb[0]);
  counter M5(clock, y[0], y[1], done);
  fsm M6(clock, mul, reset, done, y);
  assign ready = y[2];
  always @(posedge clock)
    if (ready) result = {acc[7:0], rb[7:0]};
endmodule

module reg_a(clock,a,y[0],ra);
  input[7:0] a;
  output[7:0] ra;
  reg[7:0] ra;
  input clock,y[0];
  always @(posedge clock)
  begin
    if(y[0]) ra

module reg_b(clock,b,y[0],y[1],rb);
  input clock,y[0],y[1];
  input[7:0] b;
  output[7:0] rb;
  reg[7:0] rb;
  always @(posedge clock)
  begin
    if(y[0]) rb

module accumulator(clock,y[0],y[1],part_prod,acc);
  input[8:0] part_prod;
  output[7:0] acc;
  reg [7:0] acc;
  always @(posedge clock)
  begin
    if(y[0]) acc

module comb_logic (part_prod, ra, acc, rb[0]);
  input rb[0];
  input [7:0] ra, acc;
  output [8:0] part_prod;
  wire [8:0] part_prod;
  assign part_prod = rb[0] ? (acc + ra) : acc;
endmodule

module counter (clock, y[0], y[1], done);
  input clock, y[0], y[1];
  reg [3:0] count;
  output done;
  reg done;
  always @ (posedge clock)
    case ({y[0],y[1]})
      00: count

FSM (2)

//next_state logic
always @(state or mul or done)
begin: states
  case (state)
    s0: begin if (mul) next_state = s1; else next_state = s0; end
    s1: next_state = s2;
    // stay in s2 (output y2) until the counter signals done
    s2: begin if (done) next_state = s3; else next_state = s2; end
    s3: if (mul) next_state = s3;
        else next_state = s0;
    default: next_state = s0;
  endcase
end //states

FSM (3)

//output logic
always @(state)
begin: outputs
  case (state)
    s0: y = 3'b000;
    s1: y = 3'b001;
    s2: y = 3'b010;
    s3: y = 3'b100;
    default: y = 3'b000;
  endcase
end //outputs
endmodule

Testbench

module stimulus;
  parameter n = 8;
  reg [n-1:0] a, b;
  reg mul, clock;
  wire [2*n-1:0] result;
  wire ready;
  multiplier stud(a, b, mul, clock, result, ready);
  initial begin
    mul = 0; clock = 0;
    a = 8'b0; b = 8'b0;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd15; b = 8'd122;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd201; b = 8'd5;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd255; b = 8'd255;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #100 $finish;
  end
  always #10 clock = ~clock;
endmodule //stimulus