arth_cir
TRANSCRIPT
-
8/10/2019 Arth_Cir
1/105
Logic Design
Content (1/2)
Boolean Functions: 3 lectures
Boolean Functions Minimization. Combinational Logic Design Principles: 4 lectures
Brief Description of Verilog: 3 lectures
Basic Combinational Circuits: 4 lectures
Finite State Machines (FSM): 3 lectures
Synthesis of Synchronous FSM: 5 lectures
Content (2/2)
Basic Sequential Circuits: 3 lectures
Problems of Synchronous Design: 3 lectures
Asynchronous FSM. Self-Timed Circuits: 3 lectures
Arithmetic Units: 4 lectures
Programmable Logic Integrated Circuits (PLDs): 3 lectures
Memory Devices: 3 lectures
Unsigned Numbers
Positional number systems: decimal (base, or radix, 10) and binary (radix 2).
A binary number is written $A = a_{n-1}a_{n-2}a_{n-3}\ldots a_1a_0\,.\,a_{-1}a_{-2}\ldots a_{-m}$, where $a_i \in \{0,1\}$.
There are n digits to the left of the point and m digits to the right of the point.
$A = a_{n-1}2^{n-1} + a_{n-2}2^{n-2} + \ldots + a_1 2^1 + a_0 2^0 + a_{-1}2^{-1} + a_{-2}2^{-2} + \ldots + a_{-m}2^{-m}$
Unsigned integer number: bits n-1 down to 0, range 0 to $2^n - 1$.
A Graphical View
[Figure: number wheel giving a modular counting representation of 4-bit unsigned numbers, from 0000 (+0) to 1111 (+15). Addition moves around the wheel in one direction, subtraction in the other.]
Signed Numbers
The leftmost bit of the word (bits n-1 down to 0) is the sign bit S.
S = 0: positive number (or zero)
S = 1: negative number
Negative number representation. Three major schemes:
sign and magnitude (direct code)
ones' complement
two's complement
Sign and Magnitude (Direct Code)
[Figure: number wheel for 4-bit sign-and-magnitude codes. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -0 to -7.]
For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$ the code is $A_{sign\&magn} = 1\,a_{n-2}a_{n-3}\ldots a_1a_0$.
Example: +5 = 0101, -5 = 1101
Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$. Two representations for 0.
Addition of operands with different signs: subtract the smaller magnitude from the larger and keep the sign of the larger.
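Ones' Complement
For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$ the code is $A_{1scom} = 1\,\overline{a}_{n-2}\overline{a}_{n-3}\ldots \overline{a}_1\overline{a}_0$.
Example: +5 = 0101, -5 = 1010
$A_{1scom} = 2^n - 1 - |A|$
Range: $-(2^{n-1}-1)$ to $2^{n-1}-1$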
Ones' Complement on the Number Wheel
Two representations for 0: A - A = -0.
[Figure: number wheel for 4-bit ones' complement codes. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -7 to -0. Adding a positive number moves around the wheel in one direction, subtracting a positive number in the other.]
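Two's Complement
For a negative number $A = -a_{n-2}a_{n-3}\ldots a_1a_0$ the code is $A_{2scom} = 1\,\overline{a}_{n-2}\overline{a}_{n-3}\ldots \overline{a}_1\overline{a}_0 + 1$.
Example: +5 = 0101, -5 = 1011
$A_{2scom} = 2^n - |A|$
Range: $-2^{n-1}$ to $2^{n-1}-1$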
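The invert-and-add-one rule can be tried out in a few lines of Verilog. This is only a sketch: the module and signal names are illustrative and do not come from the slides.

```verilog
// Illustrative sketch: two's complement negation of a 4-bit value.
module twos_complement_demo;
  reg  [3:0] a;
  wire [3:0] neg_a = ~a + 4'b0001;   // invert every bit, then add 1

  initial begin
    a = 4'b0101;                     // +5
    #1 $display("a = %b, -a = %b", a, neg_a);
    a = 4'b1000;                     // -8, the most negative 4-bit value
    #1 $display("a = %b, -a = %b", a, neg_a);
    $finish;
  end
endmodule
```

For a = 0101 the result is 1011, the code for -5 in the example. The second case shows the asymmetry of the range: 1000 (-8) maps onto itself because +8 has no 4-bit code.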
Two's Complement on the Number Wheel
[Figure: number wheel for 4-bit two's complement codes. Codes 0000 to 0111 represent +0 to +7; codes 1000 to 1111 represent -8 to -1. Adding a positive number moves around the wheel in one direction, subtracting a positive number in the other.]
$A_{2scom} = 2^n - |A|$
The value of an n-bit two's complement code is $A = -2^{n-1}a_{n-1} + \sum_{i=0}^{n-2} 2^i a_i$.
Two's Complement Addition (1)
Addition: C = A + B.
Case A < 0, B > 0, |A| ≤ B:
$A_{2com} + B = 2^n - |A| + B \equiv B - |A| \pmod{2^n}$
The result is positive and the carry from the sign bit ($2^n$) is discarded.
Example:
+0011111 = 0 0011111
-0000111 = 1 1111001
+0011000 = 1 0 0011000 (the leading 1 is the discarded carry)
Case A < 0, B > 0, |A| > B:
$A_{2com} + B = 2^n - (|A| - B) = 2^n - |C| = C_{2com}$
Example:
-0011100 = 1 1100100
+0000100 = 0 0000100
-0011000 = 1 1101000
Two's Complement Addition (2)
Case A < 0, B < 0:
$(2^n - |A|) + (2^n - |B|) = 2^n - (|A| + |B|) + 2^n = 2^n - |C| + 2^n = C_{2com} + 2^n$
The result is negative and the carry from the sign bit ($2^n$) is discarded.
Example:
-0001101 = 1 1110011
-0011001 = 1 1100111
-0100110 = 1 1 1011010 (the leading 1 is the discarded carry)
Summary:
The sign bit participates in the operation like the other bits.
A negative result is represented in two's complement form.
The carry from the sign bit is ignored.
Subtraction: C = A - B = A + (-B) = A + two's complement of B
Ones' Complement Addition (1)
Addition: C = A + B.
Case A < 0, B > 0, |A| < B:
$A_{1scom} + B = 2^n - 1 - |A| + B = (B - |A|) + 2^n - 1$
The result is positive and the carry from the sign bit ($2^n$) is added to the least significant bit of the result (end-around carry).
Example:
+0011111 = 0 0011111
-0000111 = 1 1111000
raw sum  = 1 0 0010111
after the end-around carry: +0011000 = 0 0011000
Ones' Complement Addition (2)
Case A < 0, B > 0, |A| ≥ B:
$A_{1scom} + B = 2^n - 1 - (|A| - B) = 2^n - 1 - |C| = C_{1scom}$
Example:
-0011100 = 1 1100011
+0000100 = 0 0000100
-0011000 = 1 1100111
Ones' Complement Addition (3)
Case A < 0, B < 0:
$(2^n - 1 - |A|) + (2^n - 1 - |B|) = 2^n - 1 - (|A| + |B|) + 2^n - 1 = 2^n - 1 - |C| + 2^n - 1 = C_{1scom} + 2^n - 1$
In this case an end-around carry is generated: dropping the carry $2^n$ and adding 1 leaves $C_{1scom}$.
Example:
-0001101 = 1 1110010
-0011001 = 1 1100110
raw sum  = 1 1 1011000
after the end-around carry: -0100110 = 1 1011001
Summary:
The sign bit participates in the operation like the other bits.
A negative result is represented in ones' complement form.
The carry from the sign bit is an end-around carry.
Its simpler addition scheme makes two's complement the most common choice for integer number systems within digital systems.
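The end-around carry can be sketched in Verilog as follows. The module name `ones_comp_add`, the parameter `N` and the test bench are assumptions made for illustration; the operands are the 8-bit codes from the examples above.

```verilog
// Illustrative sketch: ones' complement adder with end-around carry.
module ones_comp_add(a, b, sum);
  parameter N = 8;
  input  [N-1:0] a, b;
  output [N-1:0] sum;
  wire   [N:0]   raw;

  assign raw = {1'b0, a} + {1'b0, b};   // bit N holds the carry from the sign bit
  assign sum = raw[N-1:0] + raw[N];     // end-around carry: add it to the LSB
endmodule

module ones_comp_add_tb;
  reg  [7:0] a, b;
  wire [7:0] sum;
  ones_comp_add dut(a, b, sum);
  initial begin
    a = 8'b00011111; b = 8'b11111000;   // +0011111 plus the code of -0000111
    #1 $display("%b", sum);
    a = 8'b11110010; b = 8'b11100110;   // codes of -0001101 and -0011001
    #1 $display("%b", sum);
    $finish;
  end
endmodule
```

The two cases reproduce the worked examples: 00011000 (+0011000) and 11011001 (the ones' complement code of -0100110).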
Overflow
If an addition operation produces a result that exceeds the range of the number system, overflow is said to occur.
Addition of two numbers with different signs can never produce overflow; only the addition of numbers with the same sign can.
Example (8-bit two's complement):
Negative overflow              Positive overflow
 -64   1 1000000                +50   0 0110010
 -65   1 0111111                +80   0 1010000
-129   0 1111111 = +127         130   1 0000010 = -126
Overflow occurs if the addends' signs are the same but the sum's sign differs from them.
$OVF = c_n \oplus c_{n-1}$
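A minimal Verilog sketch of the rule OVF = c_n XOR c_{n-1} is given below. The module name `add_ovf`, its ports and the test bench are hypothetical; the adder is split so that the carry into the sign bit becomes visible.

```verilog
// Illustrative sketch: overflow detection from the two carries around the sign bit.
module add_ovf(a, b, sum, ovf);
  parameter N = 8;
  input  [N-1:0] a, b;
  output [N-1:0] sum;
  output         ovf;
  wire   [N-2:0] low;      // sum of the bits below the sign bit
  wire           c_msb;    // carry into the sign bit,   c_{n-1}
  wire           c_out;    // carry out of the sign bit, c_n
  wire           s_msb;

  assign {c_msb, low}   = a[N-2:0] + b[N-2:0];
  assign {c_out, s_msb} = a[N-1] + b[N-1] + c_msb;
  assign sum = {s_msb, low};
  assign ovf = c_out ^ c_msb;
endmodule

module add_ovf_tb;
  reg  [7:0] a, b;
  wire [7:0] sum;
  wire       ovf;
  add_ovf dut(a, b, sum, ovf);
  initial begin
    a = 8'b11000000; b = 8'b10111111;   // -64 + (-65)
    #1 $display("%b ovf=%b", sum, ovf);
    a = 8'b00110010; b = 8'b01010000;   // +50 + (+80)
    #1 $display("%b ovf=%b", sum, ovf);
    a = 8'b00110010; b = 8'b11000000;   // +50 + (-64): different signs
    #1 $display("%b ovf=%b", sum, ovf);
    $finish;
  end
endmodule
```

The first two cases give the wrapped sums 01111111 and 10000010 with ovf = 1; the mixed-sign case gives 11110010 (-14) with ovf = 0.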
Adders
Half Adder
The function of the half adder is to add two binary digits, producing a sum and a carry.
Input    Output
a  b     s  cout
0  0     0  0
0  1     1  0
1  0     1  0
1  1     0  1
$s = a \oplus b$; $c_{out} = ab$
[Figure: half adder symbol with inputs a, b and outputs s, cout.]
Full Adder (1)
The function of the full adder is to add two binary digits and a carry that might be generated or propagated by the previous stage.
Input        Output
a  b  cin    s  cout
0  0  0      0  0
0  0  1      1  0
0  1  0      1  0
0  1  1      0  1
1  0  0      1  0
1  0  1      0  1
1  1  0      0  1
1  1  1      1  1
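Full Adder (2)
$s = a \oplus b \oplus c_{in}$; $c_{out} = ab + ac_{in} + bc_{in}$ (majority function)
or $s = \overline{c_{out}}\,(a + b + c_{in}) + abc_{in}$
Karnaugh map for s:
a \ b cin    00  01  11  10
0             0   1   0   1
1             1   0   1   0
Karnaugh map for cout:
a \ b cin    00  01  11  10
0             0   0   1   0
1             0   1   1   1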
The Circuit of Full Adder (1)
[Figure: gate-level full adder with inputs a, b, cin, outputs s, cout and internal signals s1, c1, c2, c3.]
$c_{out} = ab + c_{in}(a \oplus b)$
Standard approach: 6 gates
The Circuit of Full Adder (2)
$c_{out} = ab + c_{in}(a \oplus b) = ab \oplus c_{in}(a \oplus b)$
5 gates
[Figure: gate-level full adder with inputs a, b, cin, outputs sum, cout and internal signals s1, c1, c2.]
Full Adder from Two Half Adders
[Figure: two half adders in cascade. The first adds A and B, producing A ⊕ B and AB. The second adds A ⊕ B and Cin, producing S = A ⊕ B ⊕ Cin and Cin(A ⊕ B). The two carry outputs are combined into CO.]
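The construction can be written structurally in Verilog. This is a sketch with hypothetical module names (`half_adder`, `full_adder_2ha`); it assumes the two half-adder carries are combined with an OR gate.

```verilog
// Illustrative sketch: a full adder assembled from two half adders.
module half_adder(a, b, s, cout);
  input  a, b;
  output s, cout;
  assign s    = a ^ b;
  assign cout = a & b;
endmodule

module full_adder_2ha(a, b, cin, s, cout);
  input  a, b, cin;
  output s, cout;
  wire   hs, c1, c2;
  half_adder ha1(a,  b,   hs, c1);   // hs = a ^ b,       c1 = a & b
  half_adder ha2(hs, cin, s,  c2);   // s  = a ^ b ^ cin, c2 = cin & (a ^ b)
  assign cout = c1 | c2;             // c1 and c2 are never 1 at the same time
endmodule
```

Because c1 and c2 cannot both be 1, the final OR may equally be an XOR, which is the substitution used in the 5-gate circuit.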
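Inversion Property
Boolean functions S and Cout are self-dual: inverting all inputs of a full adder inverts both of its outputs.
$\overline{C_{out}}(A, B, C_{in}) = C_{out}(\overline{A}, \overline{B}, \overline{C_{in}})$
$\overline{S}(A, B, C_{in}) = S(\overline{A}, \overline{B}, \overline{C_{in}})$
[Figure: a full adder FA with inverted inputs and outputs shown as equivalent to a plain full adder FA.]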
Inversion Property
[Figure: 4-bit ripple chain of full adders FA with inputs A0 B0 to A3 B3, outputs S0 to S3, carry input Cin and carry output Cout.]
Inverters in the path of the carry signal may be removed (this minimizes the critical path of the carry chain).
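The Serial Adder
[Figure: serial adder. Shift registers RA ($a_{n-1} \ldots a_0$) and RB ($b_{n-1} \ldots b_0$) feed the A and B inputs of a single full adder FA; the sum output S is shifted into register RS ($s_{n-1} \ldots s_0$); a D flip-flop driven by Clock holds Cout and feeds it back to Cin.]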
A Parallel Binary Adder
[Figure: four full adders in a chain. Stage i has inputs $a_i$, $b_i$ and carry-in $c_i$ and produces $s_i$ and $c_{i+1}$; $c_0 = c_{in}$ and $c_4 = c_{out}$.]
Carry ripple adder. S = A + B; A = (a0, a1, a2, a3); B = (b0, b1, b2, b3);
$s_0 = a_0 \oplus b_0 \oplus c_0$; $c_1 = a_0b_0 + (a_0 + b_0)c_0$;
$s_1 = a_1 \oplus b_1 \oplus c_1$; $c_2 = a_1b_1 + (a_1 + b_1)c_1$;
$s_2 = a_2 \oplus b_2 \oplus c_2$; $c_3 = a_2b_2 + (a_2 + b_2)c_2$;
$s_3 = a_3 \oplus b_3 \oplus c_3$; $c_4 = a_3b_3 + (a_3 + b_3)c_3$;
$T_{add} = (n-1)\,t_c + t_{sm} \approx n\,t_{sm}$
Verilog Description for 4-bit CRA (1)
// Define a 4-bit adder
module add4(s, c_out, a, b, c_in);
  // I/O port declarations
  output [3:0] s;
  output c_out;
  input [3:0] a, b;
  input c_in;
  // Internal nets
  wire c1, c2, c3;
Verilog Description for 4-bit CRA (2)
(Gate Level Description)
  // Instantiate four 1-bit full adders.
  fulladd fa0(s[0], c1, a[0], b[0], c_in);
  fulladd fa1(s[1], c2, a[1], b[1], c1);
  fulladd fa2(s[2], c3, a[2], b[2], c2);
  fulladd fa3(s[3], c_out, a[3], b[3], c3);
endmodule

// Define a 1-bit full adder
module fulladd(sum, c_out, a, b, c_in);
  // I/O port declarations
  output sum, c_out;
  input a, b, c_in;
  // The module body is not present in the transcript. The gates below are an
  // assumed completion that follows the 5-gate full adder circuit (s1, c1, c2).
  wire s1, c1, c2;
  xor (s1, a, b);
  and (c1, a, b);
  xor (sum, s1, c_in);
  and (c2, s1, c_in);
  xor (c_out, c2, c1);
endmodule
Verilog Description for 4-bit CRA (3)
module adder_4_RTL (a, b, c_in, sum, c_out);
  output [3:0] sum;
  output c_out;
  input [3:0] a, b;
  input c_in;
  assign {c_out, sum} = a + b + c_in;
endmodule
$T_{add} = T_{FA}(A,B \to C_{out}) + (N-2)\,T_{FA}(C_{in} \to C_{out}) + T_{FA}(C_{in} \to S)$
T = O(N) worst-case delay, where N is the number of bits.
Real goal: make the fastest possible carry path.
A 32-bit Adder/Subtractor
[Figure: ripple chain of full adders for bits 0 to 31 with inputs b0 to b31, sum outputs s0 to s31, carry output c32 and a control input ~Add/Sub.]
Adder/Subtractor Module in Verilog
Data-flow description
module addsub(a, b, select, cout, sum);
  input [7:0] a, b;
  input select;
  output [7:0] sum;
  output cout;
  // select = 0: addition, select = 1: subtraction
  assign {cout, sum} = select ? (a - b) : (a + b);
endmodule
Select = 0: Addition
Select = 1: Subtraction
Carry Look-Ahead Adders (1)
Input        Output     As a function of cin   Carry status
a  b  cin    s  cout    cout   s
0  0  0      0  0       0      cin            annihilate
0  0  1      1  0       0      cin            annihilate
0  1  0      1  0       cin    ~cin           propagate
0  1  1      0  1       cin    ~cin           propagate
1  0  0      1  0       cin    ~cin           propagate
1  0  1      0  1       cin    ~cin           propagate
1  1  0      0  1       1      cin            generate
1  1  1      1  1       1      cin            generate
Carry Look-Ahead Adders (2)
All carries are produced in parallel.
$c_{i+1} = g_i + p_ic_i$, where $g_i = a_ib_i$ and $p_i = a_i + b_i$ (or $p_i = a_i \oplus b_i$).
$g_i$: carry generation
$p_i$: carry propagation
Re-express the carry logic for each of the bits:
$c_1 = g_0 + p_0c_0$
$c_2 = g_1 + p_1c_1 = g_1 + p_1g_0 + p_1p_0c_0$
$c_3 = g_2 + p_2(g_1 + p_1g_0 + p_1p_0c_0) = g_2 + p_2g_1 + p_2p_1g_0 + p_2p_1p_0c_0$
$c_4 = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0 + p_3p_2p_1p_0c_0$
Each equation corresponds to a circuit with just three levels of delay: one for the generate and propagate signals, and two for the sum of products.
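The carry equations translate directly into a 4-bit module. The sketch below uses hypothetical names (`cla4`, `cla4_tb`) and assumes the XOR form of the propagate signal so that the same signal can also form the sum; the test bench compares the result with the `+` operator for every input combination.

```verilog
// Illustrative sketch: 4-bit carry look-ahead adder.
module cla4(a, b, c0, s, c4);
  input  [3:0] a, b;
  input        c0;
  output [3:0] s;
  output       c4;
  wire [3:0] g = a & b;     // carry generate
  wire [3:0] p = a ^ b;     // carry propagate (XOR form, assumed here)
  // Every carry is a two-level function of g, p and c0 only.
  wire c1 = g[0] | (p[0] & c0);
  wire c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
  wire c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                 | (p[2] & p[1] & p[0] & c0);
  assign c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                   | (p[3] & p[2] & p[1] & g[0])
                   | (p[3] & p[2] & p[1] & p[0] & c0);
  assign s = p ^ {c3, c2, c1, c0};
endmodule

module cla4_tb;
  reg  [3:0] a, b;
  reg        c0;
  wire [3:0] s;
  wire       c4;
  integer    i, errors;
  cla4 dut(a, b, c0, s, c4);
  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {a, b, c0} = i;
      #1 if ({c4, s} !== a + b + c0) errors = errors + 1;
    end
    $display("mismatches: %0d", errors);
    $finish;
  end
endmodule
```

No carry expression refers to another carry, which is what removes the ripple from the critical path.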
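Carry Look-Ahead Adders (3)
[Figure: four full adders FA with inputs a0 b0 to a3 b3 pass their signals g0 p0 to g3 p3 to a Carry Unit, which receives cin and returns the carries c1, c2, c3 and cout.]
One bit CLA
[Figure: one-bit cell with inputs ai, bi and cin; it forms gi and pi, the sum Si, and selects cout through a two-input multiplexer (inputs 0 and 1).]
One Stage of a Carry Look-Ahead Adder
[Figure: stage i forms the half-sum hsi from ai and bi; the Carry-Lookahead Logic forms ci from a0 ... ai-1 and b0 ... bi-1; hsi and ci together give si.]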
CRU (Carry Unit)
The lookahead carry circuit (Carry Unit) forms the carry signals.
$c_4 = G + Pc_0$, where $G = g_3 + p_3g_2 + p_3p_2g_1 + p_3p_2p_1g_0$ and $P = p_3p_2p_1p_0$.
These equations formally coincide with the equations $c_{i+1} = g_i + p_ic_i$.
Lookahead carry across the 4-bit sections of an ALU is therefore executed like lookahead carry across the separate bits of a 4-bit adder.
[Figure: Carry Unit with inputs C0, G0 P0, G1 P1, G2 P2, G3 P3 and outputs C1, C2, C3, G, P.]
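MSI Adders
IC 74x283
The adder produces active-low versions of the carry-generate and carry-propagate signals.
Equations for the half-sum:
$hs_i = a_i \oplus b_i = a_i\overline{b_i} + \overline{a_i}b_i = a_i\overline{b_i} + a_i\overline{a_i} + \overline{a_i}b_i + b_i\overline{b_i} = (a_i + b_i)(\overline{a_i} + \overline{b_i}) = (a_i + b_i)\overline{(a_ib_i)} = p_i\overline{g_i}$
An AND gate can be used instead of an XOR gate.
The equation for the carry is factored:
$c_{i+1} = g_i + p_ic_i = p_ig_i + p_ic_i = p_i(g_i + c_i)$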
Equations for Carry Signals
$c_1 = p_0(g_0 + c_0)$
$c_2 = p_1(g_1 + c_1) = p_1(g_1 + p_0(g_0 + c_0)) = p_1(g_1 + p_0)(g_1 + g_0 + c_0)$
$c_3 = p_2(g_2 + c_2) = p_2(g_2 + p_1(g_1 + p_0)(g_1 + g_0 + c_0)) = p_2(g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)$
$c_4 = p_3(g_3 + c_3) = p_3(g_3 + p_2(g_2 + p_1)(g_2 + g_1 + p_0)(g_2 + g_1 + g_0 + c_0)) = p_3(g_3 + p_2)(g_3 + g_2 + p_1)(g_3 + g_2 + g_1 + p_0)(g_3 + g_2 + g_1 + g_0 + c_0)$
The propagation delay from the C0 input to the C4 output is very short, about the same as two inverter gates.
IC 74x283
Logic Symbol
[Figure: logic symbol of the 74x283 with inputs C0, A0 to A3, B0 to B3 and outputs S0 to S3, C4.]
A 16-bit Group-Ripple Adder
[Figure: four 4-bit adders (inputs C0, A0 to A3, B0 to B3; outputs S0 to S3, C4) cascaded to add A[0:15] and B[0:15] into S[0:15]. The C4 output of each group drives the C0 input of the next, from Cin to Cout.]
$T_{add} = M\,t(c_0 \to c_4) = 4\,t(c_0 \to c_4)$, where M is the number of groups.
Carry Bypass Adder (Carry Skip Adder)
The following 3 functions are formed in each bit of the adder:
$G = A_iB_i$ (Generate)
$P = A_i \oplus B_i$ (Propagate)
$K = \overline{A_i}\,\overline{B_i}$ (Annihilate, or kill)
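Carry Bypass (Skip) Adder
The idea of the carry bypass adder:
$P_0 = a_0 \oplus b_0$; $P_1 = a_1 \oplus b_1$; $P_2 = a_2 \oplus b_2$; $P_3 = a_3 \oplus b_3$.
If $P_0P_1P_2P_3 = 1$, then Cout = Cin, else Cout = C4 (the carry generated inside the block).
BP: block propagation, $BP = P_0P_1P_2P_3$.
[Figure: four full adders FA with inputs a0 b0 to a3 b3 and outputs S0 to S3 form the ripple carry C4; a multiplexer MUX controlled by BP selects between C4 (input 0) and Cin (input 1) to produce Cout.]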
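One 4-bit bypass block can be sketched in Verilog as below. The module name `carry_skip4` is hypothetical, and the `+` operator merely stands in for the ripple chain of full adders. The sketch assumes the XOR form of the propagate signal: with the OR form, a carry generated inside the block (for example a_i = b_i = 1 in every bit) would wrongly be replaced by Cin.

```verilog
// Illustrative sketch: one 4-bit block of a carry bypass (skip) adder.
module carry_skip4(a, b, cin, s, cout);
  input  [3:0] a, b;
  input        cin;
  output [3:0] s;
  output       cout;
  wire [3:0] p  = a ^ b;          // per-bit propagate (XOR form)
  wire       bp = &p;             // block propagate BP = P0 P1 P2 P3
  wire       c4;                  // carry produced by the ripple chain

  assign {c4, s} = a + b + cin;   // stands in for the four full adders
  assign cout    = bp ? cin : c4; // bypass multiplexer
endmodule
```

When BP = 1 the ripple carry c4 equals cin anyway, so the multiplexer does not change the function; it only shortens the path from cin to cout.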
Carry Skip Adder
[Figure: 16-bit carry skip adder built from four 4-bit blocks: B0 (bits 0-3), B1 (bits 4-7), B2 (bits 8-11), B3 (bits 12-15). Each block has setup, carry propagation and sum stages and produces a carry (C0 to C3); a multiplexer controlled by the block propagate signal (BP0 to BP3) selects between the block's own carry and the bypassed carry. Outputs S0-S3, S4-S7, S8-S11, S12-S15 and Cout.]
Worst-case delay (carry from bit 0 to bit 15): the carry is generated in bit 0, ripples through bits 1, 2 and 3, skips the middle two groups (B is the group size in bits), and ripples in the last group from bit 12 to bit 15.
$T_{add} = t_{setup} + B\,t_{carry} + (N/B - 2)\,t_{skip} + (B-1)\,t_{carry} + t_{sum}$
$t_{setup}$: time for forming the g and p signals.
Carry Skip Adder
C0: carry from B0; C1: carry from B1; C2: carry from B2; C3: carry from B3.
$t_{setup}$: time necessary for creating the generation and propagation signals ($g_i$, $p_i$).
$t_{carry}$: propagation delay of the carry through one bit.
$t_{bypass}$: propagation delay through the bypass multiplexer (written $t_{skip}$ in the delay formula).
$t_{sum}$: time required for forming the sum of the last bit.
The dependence of the delay on the number of bits is more acceptable than in the CRA: it is also a linear function, but with a smaller slope.
Optimal Skip Block Size and Add Time
Carry ripple is realized within the blocks.
Assume $t_{carry} = t_{skip} = t_{setup} = t_{sum} = 1$; then
$T_{add} = 1 + B + (N/B - 2) + (B - 1) + 1 = 2B + N/B - 1$
$dT_{add}/dB = 2 - N/B^2$
$dT_{add}/dB = 0$ at $B_{opt} = \sqrt{N/2}$
$T_{opt} = 4\sqrt{N/2} - 1 = 2\sqrt{2N} - 1$
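As a worked example under the same unit-delay assumption, take N = 32 (a word length chosen here only for illustration):

```latex
N = 32:\qquad
B_{opt} = \sqrt{32/2} = 4,\qquad
T_{add} = 2\cdot 4 + \frac{32}{4} - 1 = 15,\qquad
T_{opt} = 2\sqrt{2\cdot 32} - 1 = 15
```

Both forms of the expression agree, and the result is well below the roughly N = 32 unit delays of a plain carry ripple chain.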
Carry Select Adder
An N-bit circuit is divided into M blocks of B bits each.
Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (this can be done for all blocks in parallel) and then select the correct one.
This makes the adder circuit about 30% more complex.
[Figure: a 4-bit block with two carry chains, Carry0 (carry-in 0) and Carry1 (carry-in 1); a multiplexer MUX controlled by Cin selects Cout, and the Sum stage forms the outputs.]
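A single carry select block might look like this in Verilog. The module name `carry_select4` is hypothetical, and the `+` operator stands in for the two precomputing adder chains.

```verilog
// Illustrative sketch: one 4-bit block of a carry select adder.
module carry_select4(a, b, cin, s, cout);
  input  [3:0] a, b;
  input        cin;
  output [3:0] s;
  output       cout;
  wire [3:0] s0, s1;
  wire       c0, c1;

  assign {c0, s0} = a + b;          // result precomputed for carry_in = 0
  assign {c1, s1} = a + b + 1'b1;   // result precomputed for carry_in = 1
  assign s    = cin ? s1 : s0;      // the late-arriving carry only drives muxes
  assign cout = cin ? c1 : c0;
endmodule
```

Both candidate results are ready before cin arrives, so each block adds only one multiplexer delay to the carry path.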
Carry Select Adder
$T_{add} = t_{setup} + B\,t_{carry} + (N/B)\,t_{mux} + t_{sum}$
[Figure: 16-bit carry select adder with four 4-bit blocks (bits 3-0, 7-4, 11-8, 15-12). Each block has Setup (forming the P and G signals), "0" carry, "1" carry, Mux and Sum stages and produces S3-S0, S7-S4, S11-S8, S15-S12. Annotated signal arrival times: (1) after setup in every block, (5) at the end of the carry chains, then (6), (7), (8), (9) along the chain of multiplexers from Cin to Cout.]
Carry Select Adder (Square Root)
$T_{add} = t_{setup} + 2\,t_{carry} + m\,t_{mux} + t_{sum}$
[Figure: 20-bit carry select adder with blocks of increasing size: bits 1-0, 4-2, 8-5, 13-9 and 19-14. Each block has Setup (forming the P and G signals), "0" carry, "1" carry, Mux and Sum stages and produces S1-S0, S4-S2, S8-S5, S13-S9, S19-S14. Annotated signal arrival times: (1) after setup, then (3), (3), (4), (5), (6), (7), (8) along the path from Cin to Cout.]
Carry Save Adder
Consider the addition of three numbers. In this case two vectors are formed: the sum vector S and the carry vector C.
Example: x + y + z = s + c
x:    1001111
y:    1100100
z:    0001111
s:    0100100
c:   1001111   (carry vector, shifted one position to the left)
sum: 11000010
When N n-bit numbers are added, the sum needs $\lceil \log_2 N \rceil + n$ bits.
A CSA is used for adding more than two numbers together.
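The example can be replayed with a short Verilog sketch. The names `csa3` and `csa3_tb` are hypothetical; the final `+` plays the role of the ordinary adder that combines the two vectors.

```verilog
// Illustrative sketch: carry save addition of three 7-bit numbers.
module csa3(x, y, z, s, c);
  parameter N = 7;
  input  [N-1:0] x, y, z;
  output [N-1:0] s, c;
  assign s = x ^ y ^ z;                     // bitwise sum, no carry propagation
  assign c = (x & y) | (x & z) | (y & z);   // bitwise carry, weight 2
endmodule

module csa3_tb;
  reg  [6:0] x, y, z;
  wire [6:0] s, c;
  wire [8:0] total = s + {c, 1'b0};         // carry vector shifted left by one
  csa3 dut(x, y, z, s, c);
  initial begin
    x = 7'b1001111; y = 7'b1100100; z = 7'b0001111;
    #1 $display("s=%b c=%b sum=%0d", s, c, total);
    $finish;
  end
endmodule
```

The vectors match the example (s = 0100100, c = 1001111), and the total is 194, which is 11000010 in binary.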
Circuit of adder of three 4-bit numbers
[Figure: a first row of four full adders FA takes x0 y0 z0 to x3 y3 z3 and produces the sum and carry vectors; a second row of adders (HA, HA, FA, FA) combines them into the outputs sum0 to sum5.]
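CSA
Advantage of the CSA: pipeline capability.
[Figure: pipelined addition of A1, A2, A3, A4. CSA1 feeds registers (D flip-flops clocked by Clock 1), CSA2 feeds registers clocked by Clock 2, and a final CRA adds the sum vector Ss and the carry vector Cs.]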
Conditional Sum Adder
[Figure: a low-half adder SM Low (inputs A Low, B Low, output CoutL) and two high-half adders SM High (inputs A High, B High), one with Cin = 1 and one with Cin = 0; multiplexers MUX controlled by CoutL select between their results (carries Cout0, Cout1) to form S and Cout.]
The n-bit adder is divided into two groups of n/2 bits. The high-order group is duplicated, so the circuit contains three n/2-bit adders.
The Structure of Execution Unit
OU: Operational (or Execution) Unit
CU: Control Unit
The OU consists of registers, adders, other logic elements and wires.
The CU produces control signals that cause the operations to be executed.
[Figure: OU and CU blocks. The OU has Data in and Data out; the CU has Command and Done; the signal groups X and Y connect the two blocks.]
EU for Integer Multiplication (Unsigned)
[Figure: datapath with n-bit registers RA (loaded from A) and RB (loaded from B), a multiplexer MUX selecting RA or 0, an adder SM, the accumulator RC (acc), a counter CT loaded with n, and a Control Unit with inputs x1, x2 and outputs y1, y2, y3. The product is 2n bits wide.]
RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products.
It is possible to combine the multiplier register (low bits) and the accumulator register (high bits).
Flow-Chart of Multiplication (1)
Start
X1: Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
Y2: RC = SM
Y3: Shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: return to Y2. Yes: End.
Example of Multiplication
A = 1101; B = 1010;
Accumulator (RC)    RB      CT
0000 0000           1010    4
0000 0000           0101    3    shift
+1101
1101 0000                        add A
0110 1000           0010    2    shift
0011 0100           0001    1    shift
+1101
1 0000 0100                      add A
1000 0010           0000    0    shift
Result: 1000 0010 (13 × 10 = 130).
Signed multiplication:
convert negative numbers to positive,
execute unsigned multiplication,
remember the original signs.
Behavior Description of Multiplier in Verilog (1)
module multiplier (a, b, mul, clock, result, ready);
  input clock, mul;
  parameter n = 8;
  input [n-1:0] a, b;
  wire [n-1:0] a, b;
  output [2*n-1:0] result;
  reg [2*n-1:0] result;
  output ready;
  reg ready;
  reg [n:0] rc;            // accumulator, one extra bit for the carry
  reg [n-1:0] ra, rb;      // multiplicand and multiplier
  always @(posedge mul)
  begin
    ra = a;
    rb = b;
Behavior Description of Multiplier in Verilog (2)
    ready = 0;
    rc = 0;
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0]) rc = rc + ra;     // add the multiplicand if the current multiplier bit is 1
      rb = {rc[0], rb[n-1:1]};     // shift RC:RB right by one position
      rc = rc >> 1;
    end
    result = {rc[n-1:0], rb};
    ready = 1;
  end
endmodule // multiplier
Multiplying Unit 2
The structure of a multiplying unit that shifts the multiplier to the right and the multiplicand to the left.
[Figure: 2n-bit register RA (loaded from A), n-bit register RB (loaded from B), a multiplexer MUX selecting RA or 0, a 2n-bit adder SM, a 2n-bit register RC and a counter CT.]
Multiplying Algorithm
[Flow chart blocks: Start; Multiply? (Yes/No); RA := A, RB := B, CT := n, RC := 0; (R) or (RB) = 0? (Yes: return a result; No: continue); RC := SM; SL (RA), SR (RB); CT := CT - 1; (CT) = 0? (Yes/No); End.]
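Multiplication on Signed Numbers (Booth's Algorithm)
[Figure: datapath with register RA (loaded from A), adder SM, accumulator ACC, register RB (loaded from B) extended by the bit RB-1, a decoder DC (outputs 0 to 3) driven by RB0 and RB-1, and a counter CT. The decoder outputs are grouped as 00, 11 and 01, 10.]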
Booth's Algorithm
Start
Multiply? No: wait. Yes: continue.
RA = A, RB = B, CT = n, ACC = 0, RB-1 = 0
Examine RB0, RB-1:
  01: ACC = ACC + RA
  10: ACC = ACC + ~RA + 1
  00, 11: no addition
ASR(ACC, RB, RB-1); CT = CT - 1
(CT) == 0? No: examine RB0, RB-1 again. Yes: end.
ASR: arithmetic right shift (sign extend when shifting).
Example
A = 10010110;
B = 11000111;
(A)com = 01101010;
Acc         RB         RB-1  CT
00000000    11000111   0     8
+01101010                          RB0, RB-1 = 10: add (A)com
01101010    11000111
00110101    01100011   1     7     shift
00011010    10110001   1     6     shift
00001101    01011000   1     5     shift
+10010110                          RB0, RB-1 = 01: add A
10100011    01011000
11010001    10101100   0     4     shift
11101000    11010110   0     3     shift
11110100    01101011   0     2     shift
+01101010                          RB0, RB-1 = 10: add (A)com
01011110    01101011
00101111    00110101   1     1     shift
00010111    10011010   1     0     shift
Result: 00010111 10011010 = 6042, the product of -106 and -57.
Substantiation of the algorithm (1)
1. B > 0
$A \times (00011110) = A \times (2^4 + 2^3 + 2^2 + 2^1) = A \times 30$
The set of addition operations can be replaced by only two operations (one addition and one subtraction), because
$2^n + 2^{n-1} + \ldots + 2^{n-k} = 2^{n+1} - 2^{n-k}$
$A \times (00011110) = A \times (2^5 - 2^1) = A \times 30$
This extends to any run of consecutive 1s.
This algorithm is called Booth's recoding: the digit set (0, 1) is replaced by (-1, 0, 1).
Multiplier: 00011110 becomes 0, 0, 1, 0, 0, 0, -1, 0
Instead of 4 additions: 2.
Substantiation of the algorithm (2)
2. B
Booth Multiplier in Verilog (1)
module Booth_multiplier(a, b, clock, start, ready, result);
  parameter n = 16;
  input [n-1:0] a, b;
  wire [n-1:0] a, b;
  input clock, start;
  output [2*n-1:0] result;
  reg [2*n-1:0] result;
  output ready;
  reg ready;
  reg [n-1:0] acc, ra, rb;
  reg q;                         // the extra bit RB-1
  always @(posedge start)
  begin
    ra = a; rb = b; acc = 0; q = 0; ready = 0;
Booth Multiplier in Verilog (2)
    repeat (n)
    begin
      @(posedge clock)
      if (rb[0] !== q)
      begin
        if (q) acc = acc + ra;   // RB0, RB-1 = 01: add
        else acc = acc - ra;     // RB0, RB-1 = 10: subtract
      end
      q = rb[0];
      rb = {acc[0], rb[n-1:1]};
      acc = {acc[n-1], acc[n-1:1]};   // arithmetic shift right
    end
    result = {acc, rb};
    ready = 1;
  end
endmodule // Booth_multiplier
Combinational Multipliers
Acceleration methods of multiplication:
parallel computing of partial products
reduction of the number of additions
reduction of the propagation delay
Two types of multipliers are used: matrix and tree-structured.
Propagation delay of matrix multipliers: O(n).
Propagation delay of tree-structured multipliers: $O(\log_2 n)$.
Partial Products in a 4×4 Multiplier
                     a0b3  a0b2  a0b1  a0b0
               a1b3  a1b2  a1b1  a1b0
         a2b3  a2b2  a2b1  a2b0
+  a3b3  a3b2  a3b1  a3b0
p7 p6    p5    p4    p3    p2    p1    p0
Matrix Multiplier Based on CRA
A matrix multiplier contains $n^2$ AND gates to form the partial products.
A multiplier based on CRA contains $(n-1)n$ adders.
The number of HA is n; the number of FA is $n^2 - 2n$.
In the worst case the propagation delay equals $3n - 4$ adder delays.
Matrix Multiplier Based on CRA Structure
[Figure: array of AND terms a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3. Each row after the first is added to the running sum by a row of four adders connected as a carry ripple adder, with 0 as the first carry input; the outputs are p0 to p7.]
Matrix Multiplier with Carry Save Addition
A matrix multiplier using carry-save addition contains the same number of elements.
It is faster because its propagation delay is shorter.
The last (n-th) stage is a CRA.
The worst-case carry propagation path of the multiplier goes through $2n-2$ adders.
Matrix Multiplier using Carry Save Addition
[Figure: array of AND terms a0b0 ... a3b0, a0b1 ... a3b1, a0b2 ... a3b2, a0b3 ... a3b3 with three rows of three carry-save adders and a final row of three adders forming a carry ripple adder; the outputs are p0 to p7.]
Treelike Multipliers
Treelike multipliers contain three stages:
Generation of the bits of the partial products. This stage consists of $n^2$ AND gates.
Compression of the partial products. Implemented as a tree of parallel adders.
Final addition. Addition of the sum vector and the carry vector.
When used in multipliers, full adders and half adders are usually called compressors, or (3,2) and (2,2) counters.
Wallace tree multiplier (1)
[Figure: reduction diagram for the 4×4 product of a3 a2 a1 a0 and b3 b2 b1 b0. The columns of partial products $a_ib_j$ are compressed in three stages, producing the signals s11 to s14 and c12 to c15, then s21 to s24 and c21 to c24, then s31 to s34 and c31 to c33, which form the product bits p7 to p0.]
Wallace tree multiplier (2)
[Figure: circuit of the 4×4 Wallace tree multiplier. The AND terms a0b0 ... a3b3 feed successive rows of adders that produce p0 to p7.]
Wallace tree
The rows of the matrix of partial products are grouped in threes.
Columns with three bits are compressed with FA; columns with two bits are compressed with HA.
Rows that are not included in a group of three are carried over to the next reduction stage.
The Wallace scheme is considered to be the fastest, but at the same time its structure is the least regular.
The main application of the Wallace tree is the construction of multipliers with large word lengths.
Dadda Multiplier (1)
[Figure: reduction diagram for the 4×4 product of a3 a2 a1 a0 and b3 b2 b1 b0. The columns of partial products $a_ib_j$ are compressed in three stages, producing the signals s11, s12, c11, c12, then s21 to s24 and c21 to c24, then s31 to s36 and c31 to c36, which form the product bits p7 to p0.]
Dadda Multiplier (2)
[Figure: circuit of the 4×4 Dadda multiplier. The AND terms a0b0 ... a3b3 feed rows of three, three and six adders that produce p7 to p0.]
Dadda Multiplier (3)
The Wallace and Dadda methods differ in how they approach the compression of the partial products.
The Wallace algorithm compresses as early as possible, at the first stages.
The Dadda algorithm applies the highest level of compression at the late stages.
A Wallace-tree multiplier works forward from the multiplier inputs.
The Dadda multiplier works backward from the final product.
The number of stages is the same in both multipliers.
Both lack structural regularity.
The number of stages, and thus the delay (in units of an FA delay, excluding the CPA), for an n-bit tree-based multiplier using (3, 2) counters is $\log_{1.5} n = \log_{10} n / \log_{10} 1.5 = \log_{10} n / 0.176$.
Example of Sequential Multiplier
[Figure: datapath with n-bit registers RA and RB, a multiplexer MUX selecting RA or 0, an adder SM, the n-bit accumulator RC (acc), a counter CT loaded with n, and a Control Unit with inputs x1, x2 and outputs y1, y2, y3.]
RA: multiplicand; RB: multiplier; RC (accumulator): high bits of the sum of partial products. The multiplier register holds the low bits of the sum.
Flow-Chart of Multiplication Algorithm (1)
Begin
X1: Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
Y2: RC = SM
Y3: Shift right RC, RB; CT = CT - 1
X2: (CT) = 0? No: return to Y2. Yes: End.
Flow-Chart of Multiplication Algorithm (2)
Begin
In1: Multiply? No: wait. Yes: continue.
Y1: RA = A; RB = B; CT = n; RC = 0
Y2: RC = RC + MUX (transfer with shift); shift RB; CT = CT - 1
Done? ((CT) = 0) No: repeat Y2. Yes: End.
Transition to the Mealy FSM
FSM state markup
Mealy FSM: $A = \{X, Y, S, \delta, \lambda\}$
X = {mul, done};
Y = {y1, y2, ready};
S = {S0, S1, S2}
[Figure: flow chart (Begin; mul?; y1; y2; Done?; End) with the state marks S0, S1, S2 and S0 placed on its edges.]
Mealy FSM Graph
[Figure: state graph with states S0, S1, S2, S3 and transition labels (input/output): ~mul/-, mul/y1, 1/y2, ~done/y2, done/ready, ~mul/-, mul/ready.]
Transition to the Moore FSM
[Figure: flow chart (Begin; mul?; y1; y2; Done?; End) with the state marks S0, S1, S2, S3 placed on its blocks.]
Moore FSM
Y1: RA = A; RB = B; CT = n; Ready = 0
Y2: RC = RC + MUX (transfer with shift right); shift right RB; CT = CT - 1;
Y3: ready = 1;
[Figure: state graph with states S0/-, S1/y1, S2/y2, S3/ready, transition labels ~mul, mul, 1, ~done, done, ~mul, mul, and a reset input.]
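Structure of Modules HDL Description
[Figure: block diagram of the modules. reg_a (input a, output ra) and reg_b (input b, output rb = result[n-1:0]) are controlled by y1, y2; comb_logic forms part_prod from ra, acc and rb[0]; accumulator holds acc = result[2n-1:n]; counter (loaded with n) produces done; fsm takes mul and done and produces y1, y2, y3. All sequential modules are driven by clock.]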
RTL-Description of Multiplier (1)
module ser_mult (mul, result, a, b, clock, reset, ready);
  output [15:0] result;
  reg [15:0] result;
  input [7:0] a, b;
  input mul, clock, reset;
  output ready;
  wire [7:0] acc;
  wire ready;
  wire done;
  wire [2:0] y;
  wire [7:0] ra, rb;
  wire [8:0] part_prod;
  reg_a M1(clock, a, y[0], ra);
  reg_b M2(clock, b, y[0], y[1], rb);
  accumulator M3(clock, y[0], y[1], part_prod, acc);
  comb_logic M4(part_prod, ra, acc, rb[0]);
  counter M5(clock, y[0], y[1], done);   // port order as in the counter module
  fsm M6(clock, mul, reset, done, y);
RTL-Description of Multiplier (2)
  assign ready = y[2];
  always @(posedge clock)
    if (ready) result = {acc[7:0], rb[7:0]};
endmodule

module reg_a(clock, a, y[0], ra);
  input [7:0] a;
  output [7:0] ra;
  reg [7:0] ra;
  input clock, y[0];
  always @(posedge clock)
  begin
    if (y[0]) ra
module reg_b(clock, b, y[0], y[1], rb);
  input clock, y[0], y[1];
  input [7:0] b;
  output [7:0] rb;
  reg [7:0] rb;
  always @(posedge clock)
  begin
    if (y[0]) rb

module accumulator(clock, y[0], y[1], part_prod, acc);
  input [8:0] part_prod;
  output [7:0] acc;
  reg [7:0] acc;
  always @(posedge clock)
  begin
    if (y[0]) acc

module comb_logic (part_prod, ra, acc, rb[0]);
  input rb[0];
  input [7:0] ra, acc;
  output [8:0] part_prod;
  wire [8:0] part_prod;
  // add the multiplicand only when the current multiplier bit is 1
  assign part_prod = rb[0] ? (acc + ra) : acc;
endmodule
RTL-Description of Multiplier (6)
module counter (clock, y[0], y[1], done);
  input clock, y[0], y[1];
  reg [3:0] count;
  output done;
  reg done;
  always @ (posedge clock)
    case ({y[0], y[1]})
      00: count
FSM (2)
  // next_state logic
  always @(state or mul or done)
  begin: states
    case (state)
      s0: begin if (mul) next_state = s1; else next_state = s0; end
      s1: next_state = s2;
      // stay in s2 (add and shift) until the counter signals done;
      // returning to s1 would reload the registers
      s2: begin if (done) next_state = s3; else next_state = s2; end
      s3: if (mul) next_state = s3;
          else next_state = s0;
      default: next_state = s0;
    endcase
  end // states
FSM (3)
  // output logic
  always @(state)
  begin: outputs
    case (state)
      s0: y = 3'b000;
      s1: y = 3'b001;
      s2: y = 3'b010;
      s3: y = 3'b100;
      default: y = 3'b000;
    endcase
  end // outputs
endmodule
Testbench (1)
module stimulus;
  parameter n = 8;
  reg [n-1:0] a, b;
  reg mul, clock;
  wire [2*n-1:0] result;
  wire ready;
  multiplier stud(a, b, mul, clock, result, ready);
  initial begin
    mul = 0; clock = 0;
    a = 8'b0; b = 8'b0;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd15; b = 8'd122;
Testbench (2)
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd201; b = 8'd5;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #10 a = 8'd255; b = 8'd255;
    #15 mul = 1;
    #100 wait (ready);
    mul = 0;
    #100 $finish;
  end
  always #10 clock = ~clock;
endmodule // stimulus