Complementary Based Logic Design for Arithmetic Building Blocks
By
Nikhil Patil, B.Tech
A Thesis
In
Electrical Engineering
Submitted to the Graduate Faculty
of Texas Tech University in
Partial Fulfillment of
the Requirements for
the Degree of
MASTER OF SCIENCES
Approved
Dr. Tooraj Nikoubin
Chair of Committee
Dr. Brian Nutter
Member of Committee
Dr. Stephen Bayne
Member of Committee
Mark Sheridan
Dean of the Graduate School
May 2016
Copyright 2016, Nikhil Patil
Texas Tech University, Nikhil Patil, May 2016
ACKNOWLEDGMENTS
I would like to thank Dr. Tooraj Nikoubin for all his help, advice and patience, and for
giving me the opportunity to work on this thesis. His guidance helped me throughout
the research and the writing of this thesis. I could not imagine a better advisor for my
research.
Besides my advisor, I would like to thank my thesis committee members, Dr. Brian
Nutter and Dr. Bayne, for their insightful comments, encouragement and support.
I thank my fellow labmates Ashish Joshi, Nima Eskandari, Swetha Rapollu,
Karthikeya Challa, Abhilash Reddy and Rathan for the stimulating discussions and the
sleepless nights we spent working together.
Last but not least, I would like to thank my parents and my sister for their support
and encouragement.
Table of Contents
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
COMPLEMENTARY BASED LOGIC DESIGN
OPTIMIZATION OF ARITHMETIC BUILDING BLOCKS
  2.1 Traditional Add/Sub circuit for BCD
    2.1.1 Nine's complement circuit (Block A in Figure 7)
    2.1.2 BCD adder (Block B in Figure 7)
    2.1.3 Nine's complement or add by one or buffer (Block C)
  2.2 Reduced Delay Adder Optimization Using CBLD
  2.3 High-Speed Parallel Decimal Multiplication with Redundant Internal Encoding circuit optimization with CBLD
DESIGN OF CARRY SELECT ADDER USING CBLD
  3.1 Regular CSLA
  3.2 BEC-CSLA
  3.3 CS block based CSLA
PROPOSED ADDER DESIGN
  4.2 Half Adder Design
  4.3 Determination of stage size
    4.3.1 SQRT structure
    4.3.2 Uniform stage size Cascade Structure
ASIC DESIGN FLOW
SIMULATION AND RESULTS
  6.1 Optimization of arithmetic building blocks using CBLD
    6.1.1 Add/Sub module
    6.1.2 Reduced delay adder
    6.1.3 High Speed parallel multiplier
  6.2 Design of CBEC-RCA
CONCLUSION
REFERENCES
ABSTRACT
High-performance, low-power arithmetic circuits with reduced area are essential
for advanced arithmetic processing. This thesis proposes a modification of arithmetic
units using a new design technique called Complementary Based Logic Design
(CBLD). CBLD minimizes the number of gates and hence improves the area efficiency
of the overall circuit. The approach can be used in any arithmetic unit in which multiple
blocks perform conditional operations, either in parallel within one stage or in series
with dependencies. When multiple modules operate in parallel, only one module's result
is used while the others stay idle. CBLD eliminates such idle modules by compressing
all of them into a single module. The functionality of CBLD is verified by implementing
it on an optimized BCD adder module and a BCD adder/subtractor module. Comparison
of each CBLD design with its conventional counterpart shows significant improvements
in area, power and delay.
The second part of this thesis proposes a novel carry select adder design based on
CBLD. The proposed conditional BEC-CSLA, or modified ripple carry adder, is
compared with recently published area-, power- and time-efficient carry select adders.
LIST OF TABLES
1: Output in terms of input complement
2: Min terms representation
3: Multi-input CBLD consideration
4: Nine's complement table
5: Condition function for nine's complement
6: CBLD final function for nine's complement
7: Function for add by six
8: Condition for add by one and nine's complement from truth table
9: Control signal conditions
10: CBLD functions for the 9's complement / add by one module
11: Conditions in the BCD adder
12: Add by six and add by one
13: Condition for add by nine / add by one / add by 10
14: Control signal condition
15: CBLD condition for add by one
16: Result of comparison for add/sub to CBLD counterpart
17: Reduced delay adder comparison
18: ADP PDP comparison for reduced delay adder
19: Comparison of high speed parallel multiplier with CBLD
20: Comparison result for CBEC-RCA and standard adders
LIST OF FIGURES
1: Typical multistage system
2: Complementary based Logic Design General Block Presentation
3: Final 2-bit Add by one Function
4: General XOR gate representation
5: (A) Proposed class-A XOR gate [14] (B) 3-input XOR [19]
6: (a) Time delay result for 2-input OR gate (b) Time delay result of 2-input XOR
7: Traditional BCD add/Sub circuit
8: Nine's complement module using CBLD
9: Waveform for nine's complement module with CBLD methodology
10: Add by six / buffer module using CBLD methodology
11: Functionality verification for BCD adder using CBLD methodology
12: 9's Complement / Add by one / Buffer Unit by CBLD methodology
13: Functionality Waveform for stage 3 (c_e is borrow or subtraction correction; a_s is add/Sub decision)
14: Reduced Delay Adder Block Diagram [1]
15: Multistage CBLD
16: BCD correction module for reduced delay adder using CBLD methodology [18]
17: High-Speed Parallel Decimal Multiplication module [6]
18: Optimized Circuit for High-Speed Parallel Decimal Multiplication circuit using CBLD
19: Multistage Approach for High-Speed Parallel Decimal Multiplication circuit [18]
20: Waveform for Multistage CBLD for multiplier module
21: Typical CSLA structure [18]
22: BEC unit proposed in BEC CSLA [4]
23: Typical Square Root Structure with increasing stage size [4]
24: SQRT Structure for BEC CSLA [4]
25: Carry select adder with conditional carry [18]
26: Carry skip based BEC-adder [22]
27: Add by one realization
28: Ripple carry adder CBEC
29: Proposed CBEC RCA design with half adder [18]
30: Waveform for 8-bit CSLA
31: SQRT structure delay path for last multiplexer
32: Critical Path for SQRT RCA-CBEC
33: 4-4 Structure for RCA-CBEC
34: The 2 bit RCA-CBEC unit
35: 2-bit cascade structure for CBEC-RCA
36: VLSI IC Design cycle [16]
37: Design Compiler Functional Diagram [17]
38: Typical Floor plan in IC compiler
39: Final IC layout using IC compiler
40: ADP and PDP comparison for add/sub
41: ADP PDP comparison for High speed parallel multiplier
42: Area Comparison
43: Power comparison
CHAPTER I
INTRODUCTION
In arithmetic circuits, a single stage is often assigned multiple modules that work
in parallel. In many cases, however, only one of these parallel modules is active at a
time. Complementary Based Logic Design (CBLD) is an approach that compresses the
redundant modules using a standard reduction method, which provides better area,
power and delay performance for complicated circuitry. In a conventional circuit,
multiple modules process the input differently, but depending on the condition only one
or a few of them are actually used. CBLD considers all the modules together with a
control signal and compresses them into one module.
The CBLD method also applies when a constant is to be added to a variable
number. Adding a constant to a variable is common practice in digital design. In
computer architecture, “increment memory address by 1” is a very typical operation.
Traditionally, the constant is added to the variable (the address of a register in this case)
by an n-bit binary adder. This way of adding a constant is slow and consumes
considerable area. Various more efficient circuits exist; however, they are case-specific
and are generated using the AOI or OAI method. CBLD, on the other hand, gives a
general methodology that works for any add-by-constant case and gives comparable
results in a standard form.
In CBLD, XOR gates are used as controlled buffer or inverter modules to
control the output of each module. When one input of an XOR gate is high, the other
input is inverted, so the XOR gate acts as an inverter. When that input is low, the other
input is passed without any change and the XOR gate acts as a buffer. This attribute of
the XOR gate is harnessed to develop the CBLD methodology.
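The controlled buffer/inverter behaviour described above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `controlled_invert` is an assumption made here and does not appear in the thesis.

```python
def controlled_invert(data: int, control: int) -> int:
    """Model a 2-input XOR gate used as a controlled buffer/inverter.

    `data` and `control` are single bits (0 or 1); the name is hypothetical.
    """
    return data ^ control


for data in (0, 1):
    # control = 0: the gate behaves as a buffer
    assert controlled_invert(data, 0) == data
    # control = 1: the gate behaves as an inverter
    assert controlled_invert(data, 1) == 1 - data
```

Every CBLD output bit discussed later has this shape: an input bit XORed with a condition that says when the bit must flip.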
CBLD can be used to optimize an existing circuit that has multiple parallel
modules with conditional inputs, and it can also be used to design a new circuit in which
a conditional constant addition is required. The second part of the thesis presents such a
design example. A conditional carry select adder is proposed and compared with
state-of-the-art adders. Carry select adders are used because they offer better speed than
simple carry skip or carry save adders and consume less area than complicated adders
such as carry look-ahead adders. A novel structure of conditional binary to excess-1
adder based on carry select adder logic is proposed.
A regular carry select adder (CSLA) is area-consuming because of its dual
Ripple-Carry Adder (RCA) structure. To reduce area, the CSLA can be implemented
with a single RCA and a binary to excess-1 converter (BEC) circuit instead of the dual
RCA. A conditional BEC block is proposed which can replace the regular BEC block
and multiplexer; it either adds one to the output of the modified RCA or passes the
output without any change. This approach removes the need for multiplexers in the
CSLA and hence reduces delay, area and power consumption compared to the
BEC-CSLA. The characteristics of the modified RCA are compared with the regular
CSLA and two recent CSLA designs with SQRT structure. The results and analysis show
that the modified CBEC-CSLA has better area and power consumption than the regular
dual RCA-CSLA, CBEC-MRCA, BEC based carry select adder and CS-based CSLA.
CHAPTER II
COMPLEMENTARY BASED LOGIC DESIGN
A typical multistage system is shown in Fig. 1. Each block in the figure
represents a module. Each module generates an output which is used by the next stage.
The advantage of a binary system is that the output of any stage can take only two
values, 1 or 0. Hence each output bit is either the corresponding input bit inverted or
that bit passed without any change. This principle is used to develop the CBLD
methodology.
Figure 1: Typical multistage system
As mentioned earlier, CBLD relies heavily on a property of XOR gates. Consider
one input of an XOR gate as the control signal and the second input as the variable. If
the control input is 1, the variable input is inverted; if the control input is 0, the variable
input is passed without any change.
For a given circuit, a function of the inputs can be formed to represent the outputs,
i.e. Outputs = f(Inputs). Similarly, Complementary Based Logic Design considers a
function of the inputs which decides when an input bit is complemented to form the
output, i.e. $\text{Output} = \text{Input} \oplus f(\text{Inputs})$.
Figure 2: Complementary based Logic Design General Block Presentation
This gives us the ability to use a control signal directly inside the module to
bypass the entire stage or to complement the input according to the function.
Consider a case in which one is added to the input when control signal X is high;
when X is low, the input is passed to the output without any change.
Step 1: Represent the output in terms of the input complement.
Table 1: Output in terms of input complement
Input      Output     Output in terms of inputs
I1  I0     O1  O0     O1    O0
0   0      0   1      NC    C
0   1      1   0      C     C
1   0      1   1      NC    C
1   1      0   0      C     C
* C = Complement, NC = No Complement
Step 2: Find the min terms.
All complement terms are now treated as 1 and all non-complement terms as 0.
Consider I1 and O1:
Table 2: Min terms representation
I1  O1   Complement   Min term
0   0    NC           0
0   1    C            1
1   1    NC           0
1   0    C            1
The min terms for which $O1 = \overline{I1}$ are (1, 3).
Step 3: Function realization.
The function can be realized using SOP, POS or the Quine-McCluskey method. A simple
SOP of the above min terms gives the complement function for O1 as I0.
Following steps 2 and 3 for O0, every row of Table 1 is a complement, so the complement
function for O0 is the constant 1.
Step 4: Control signal consideration.
Control signals are ANDed with the complement functions, so each function takes effect
only when its control signal is high.
In this example there is only one complement case, governed by control signal X, so X is
ANDed with both functions: $O0 = I0 \oplus X$ and $O1 = I1 \oplus (X \cdot I0)$.
The circuit realized from these equations is shown in Fig. 3.
Figure 3: Final 2-bit Add by one Function
The proposed circuit performs the add-by-one or pass-through operation with only 3
gates.
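The four steps can be replayed in Python as a sanity check. The sketch below is an illustration rather than the thesis tooling: the helper names `complement_minterms` and `cbld_add_one` are assumptions, and the circuit equations are the ones implied by Table 1 (O0 always complemented, O1 complemented when I0 = 1).

```python
def complement_minterms(truth_table, bit):
    """Steps 1-2: list the input rows where output `bit` is the complement of input `bit`."""
    return [i for i, o in sorted(truth_table.items()) if ((i ^ o) >> bit) & 1]


# Truth table of the 2-bit add-by-one operation (input -> output, modulo 4).
add_one_table = {i: (i + 1) % 4 for i in range(4)}
minterms_o1 = complement_minterms(add_one_table, 1)  # rows where O1 = NOT I1
minterms_o0 = complement_minterms(add_one_table, 0)  # rows where O0 = NOT I0


def cbld_add_one(i: int, x: int) -> int:
    """Steps 3-4: two XOR gates and one AND gate, controlled by x."""
    i0, i1 = i & 1, (i >> 1) & 1
    o0 = i0 ^ x            # complement function 1, ANDed with x
    o1 = i1 ^ (x & i0)     # complement function I0, ANDed with x
    return (o1 << 1) | o0


for i in range(4):
    assert cbld_add_one(i, 0) == i              # x = 0: pass-through
    assert cbld_add_one(i, 1) == (i + 1) % 4    # x = 1: add one
```

The min terms found for O1 are rows 1 and 3, exactly the rows with I0 = 1, which is why the complement function reduces to I0.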
CBLD can be expanded further by considering every input when searching for the
complement function.
Consider the following example, which represents O1 in terms of I1 and in terms of I0
for an arbitrary circuit.
Table 3: Multi-input CBLD consideration
I1  I0  O1   O1 in terms of I1   O1 in terms of I0
0   1   1    $\overline{I1}$     $I0$
0   0   1    $\overline{I1}$     $\overline{I0}$
1   1   0    $\overline{I1}$     $\overline{I0}$
1   0   0    $\overline{I1}$     $I0$
In terms of I1, O1 is always $\overline{I1}$. In terms of I0, the input is complemented
only in the rows where I1 and I0 are equal, so the complement function is
$I0 \oplus \overline{I1}$. We can choose whichever function is less complicated and
easier to realize, instead of always expressing O1 in terms of I1 directly. In this example
O1 in terms of I1 is clearly the best case; however, this is not always so.
1.1 XOR gate consideration:
A bottleneck in modern digital design is the XOR gate. The logical representation
of an XOR gate is:
$A \oplus B = A \cdot \overline{B} + B \cdot \overline{A}$
which can be represented as
Figure 4: General XOR gate representation
This is a 3-stage structure.
However, for design simulation, modern compilers use highly optimized XOR
gate designs. Various studies show that the performance of XOR can be improved with
a 4-transistor design. The paper on new 4-transistor XOR-XNOR designs [14] presents
a novel XOR gate based on a cell design methodology.
Figure 5: (A) Proposed class-A XOR gate [14] (B) 3-input XOR [19]
This design is improved using transmission gates in [15], which presents
various designs of XOR-XNOR circuits. The gates developed in that paper are balanced
XOR-XNOR circuits. The authors suggest various methods to implement the XOR-XNOR
circuit. This group of XOR gates uses a feedback network mechanism to correct
high-impedance states and obtain an XOR-XNOR circuit from basic cells.
A systematic cell design methodology for XOR gates is suggested in [19]. This
design provides a full-swing and fairly balanced XOR circuit with 3 inputs. The
critical path contains only 2 transistors. The elementary basic cell (EBC) for this circuit
is shown in Fig. 5(B).
We performed a design test on the gates in Design Vision to analyze XOR gate
performance. The following figure gives the delay results for the XOR gate and the OR
gate.
Figure 6: (a) Time delay result for 2-input OR gate (b) Time delay result of 2-input XOR
In Fig. 6 the highlighted box represents the data arrival time for the gates. The
result shows that the standard cell library (saed90_typ_ht.db) suggested by Synopsys
uses a modified XOR gate. This modified gate has a delay comparable to that of the OR
gate. This indicates that the XOR gate can be considered a standard gate with nearly the
same parameters as the AND and OR gates.
CHAPTER II
OPTIMIZATION OF ARITHMETIC BUILDING BLOCKS
2.1 Traditional Add/Sub circuit for BCD
The first arithmetic block optimized is a traditional BCD add/sub unit, which serves as
a basic illustration. A traditional design of this kind contains an add-by-constant
operation, a nine’s complement operation and a conditional addition operation. All of
these modules can be optimized efficiently with the CBLD methodology. The
optimization significantly reduces the number of gates required and hence the area of
the circuit. The reduction in the number of gates in the critical path lowers the delay,
and the reduced number of elements, together with the elimination of the multiplexer
unit, lowers the power consumption.
A. BCD Addition:
In computer arithmetic, numbers are represented in binary format. Binary coded
decimal (BCD) is a method of representing each decimal digit in binary form. The
decimal system has ten digits, 0 to 9, and a group of four bits is used to represent one
decimal digit. In BCD only the binary numbers 0000 to 1001 are used, which are the
binary equivalents of 0 to 9. If a number has a single decimal digit, its BCD equivalent
is the four bits of that digit; if the number has two decimal digits, its BCD equivalent is
eight bits, four for the first decimal digit and four for the second. Hence 64 is
represented as 0110 0100.
BCD addition steps
1) Add the two numbers using binary rules:
   Case I     1010        Case II     0010
            + 0100                  + 0101
              1110                    0111
2) Judge the result of the addition. The two cases above illustrate the rules of BCD
addition. In case I the result of the binary addition is 14, which is greater than 9
and therefore not a valid BCD digit. In case II the result is 7, which is not greater
than 9 and is therefore a valid BCD digit.
3) The result in case I is not valid BCD, hence 6 has to be added to obtain a BCD
number:
     1110
   + 0110
   1 0100
Hence the result of the addition is 4 with a carry of 1.
B. Nine’s complement method for BCD subtraction
The 9’s complement of a number is obtained by subtracting the number from
$10^n - 1$, where n is the number of digits in the number. For example, the 9’s
complement of 333 is obtained by subtracting it from $10^3 - 1 = 999$:
     999
   - 333
     666
Steps for subtraction using the 9’s complement:
Consider BCD subtraction by 9’s complement with A = 8 (1000) and B = 6 (0110).
1) Take the 9’s complement of the number to be subtracted (B = 0110). Logically the
digit is subtracted from 9; in the circuit this is done in two steps:
a) Complement the number: 1001
b) Add 1010 to the result and discard the carry: 0011
2) Add the other BCD operand (A = 1000) to this value. This is a plain binary addition
that does not consider the BCD aspect, so the result is a 4-bit binary (hexadecimal)
value that can be greater than 9:
     1000
   + 0011
   = 1011
3) As the addition in the previous step yields a hexadecimal value, the number must be
converted back to BCD. The “add 6 if the number is greater than 9” rule performs this
conversion:
     1011
   + 0110
   = 0001 (with a carry of 1)
4) Add one if a carry was generated. If no carry was generated, the answer is in 9’s
complement format and must be converted back to BCD by taking its 9’s complement,
as in step 1.
     0001
   + 0001
   = 0010
Hence, 8 - 6 = 2 (0010)
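The two procedures above can be written as a small behavioural model for a single BCD digit. This is a minimal sketch under stated assumptions: the function names are hypothetical, only one decimal digit is handled, and the sign is reported as a flag, which the thesis does not specify.

```python
def nines_complement(digit: int) -> int:
    """9's complement of one BCD digit: invert the four bits, add 1010, drop the carry."""
    return ((~digit & 0xF) + 0b1010) & 0xF


def bcd_add_digit(a: int, b: int):
    """Binary addition followed by the add-6 correction. Returns (digit, carry)."""
    total = a + b
    if total > 9:          # not a valid BCD digit: correct by adding 0110
        total += 6
    return total & 0xF, total >> 4


def bcd_sub_digit(a: int, b: int):
    """Compute a - b through the 9's complement of b. Returns (magnitude, negative)."""
    digit, carry = bcd_add_digit(a, nines_complement(b))
    if carry:              # carry generated: add one to finish the subtraction
        return (digit + 1) & 0xF, False
    magnitude = nines_complement(digit)   # no carry: result is in 9's complement form
    return magnitude, magnitude != 0


# The worked examples from the text.
assert bcd_add_digit(0b1010, 0b0100) == (0b0100, 1)   # case I: 4 with a carry of 1
assert bcd_add_digit(0b0010, 0b0101) == (0b0111, 0)   # case II: 7, no correction
assert bcd_sub_digit(8, 6) == (2, False)              # 8 - 6 = 2
```

When no carry is produced the subtrahend is larger than (or equal to) the minuend, so the final 9's complement recovers the magnitude of the difference.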
Figure 7: Traditional BCD add/Sub circuit (block A: 9's complement / buffer; block B: BCD adder with add by six / buffer; block C: BCD subtraction correction, add by one / 9's complement)
The above figure shows the traditional add/sub circuit. The first stage of the
circuit is the 9’s complement unit, the second stage is the BCD addition and the third
stage is the BCD subtraction adjustment. Add/Sub is the control signal.
In the first stage, the Add/Sub control is an input to XOR gates together with the
operand. If the Add/Sub signal is high the operand is complemented; otherwise the
operand remains the same. The result of this operation is fed to a binary ripple carry
adder whose second input, formed from the Add/Sub decision, is ten (1010) or zero.
Hence ten is added to the complemented number, or the operand is passed without any
change.
The second stage adds the two operands with a binary adder. The result of the
addition is then checked: if it is greater than nine, six is added to the result and a carry
is generated.
The third stage takes the nine’s complement of the number or adds one to the
result, depending on the Add/Sub decision and the carry/borrow generation.
2.1.1 Nine’s complement circuit (Block A in Fig. 7):
In this stage the input has to be complemented and then increased by ten.
Traditionally this is a two-stage approach; however, as the inputs and outputs are
definite, we can map them directly to obtain the CBLD functions. Consider the nine’s
complement truth table:
Table 4: Nine's complement table
Inputs      Outputs
0 0 0 0     1 0 0 1
0 0 0 1     1 0 0 0
0 0 1 0     0 1 1 1
0 0 1 1     0 1 1 0
0 1 0 0     0 1 0 1
0 1 0 1     0 1 0 0
0 1 1 0     0 0 1 1
0 1 1 1     0 0 1 0
1 0 0 0     0 0 0 1
1 0 0 1     0 0 0 0
Solving the above table and following the steps for function generation gives the
conditions listed in Table 5.
Table 5: Condition function for nine's complement
Bit   Condition for complement
B0    Always
B1    Never
B2    A1
B3    $\overline{A1 + A2}$
Here A[3:0] is the input and B[3:0] is the output. For example, Table 4 shows that
bit 3 is inverted only for inputs 0, 1, 8 and 9, where A1 and A2 are both 0. Again there
is only one control signal, the Add/Sub decision X, so this signal is ANDed with the
given conditions:
Table 6: CBLD final function for nine's complement
Bit   CBLD function
B0    X
B1    No operation
B2    X . A1
B3    X . $\overline{(A1 + A2)}$
The above functions can be used to realize a CBLD circuit for the 9’s complement.
Fig. 8 shows this circuit.
Figure 8: Nine's complement module using CBLD
Number of gates: 6
Number of gates in critical Path: 3
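The nine's complement conditions can be checked exhaustively against the truth table with a short bit-level model. This is a hedged sketch, not the thesis netlist: the name `cbld_nines_complement` is an assumption, and bit 3 uses the NOR of A1 and A2, which is what Table 4 requires (bit 3 flips only for inputs 0, 1, 8 and 9).

```python
def cbld_nines_complement(a: int, x: int) -> int:
    """Nine's complement of BCD digit `a` when x = 1, pass-through when x = 0."""
    a0, a1, a2, a3 = ((a >> k) & 1 for k in range(4))
    nor12 = 1 - (a1 | a2)          # NOR(A1, A2)
    b0 = a0 ^ x                    # always complemented
    b1 = a1                        # never complemented
    b2 = a2 ^ (x & a1)             # complemented when A1 = 1
    b3 = a3 ^ (x & nor12)          # complemented when A1 = A2 = 0
    return (b3 << 3) | (b2 << 2) | (b1 << 1) | b0


for digit in range(10):
    assert cbld_nines_complement(digit, 0) == digit       # addition mode: buffer
    assert cbld_nines_complement(digit, 1) == 9 - digit   # subtraction mode
```

Counted as written, the model uses three XOR gates, two AND gates and one NOR gate, with NOR, AND, XOR as the longest path.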
The above circuit was verified in Verilog using Xilinx tools; the simulation result
confirms its functionality.
The test bench used for the verification is as follows. Here count is a temporary
variable:
for (count = 0; count < 9; count = count + 1)
begin
    b = b + 1;   // step the 4-bit input to the next value
    a_s = 1;     // Add/Sub high: nine's complement enabled
    #100;        // wait 100 time units before the next input
end
Above experiment gives the following waveform:
Figure 9: waveform for nine's complement module with CBLD methodology
Here O[3:0] is an output and i[3:0] is an input.
2.1.2 BCD adder (Block B in Fig. 7):
The second stage is the BCD adder. It first adds the two BCD digits using binary
rules and, depending on the result, then adds 0 or 6. Because the second part of this
module adds either 6 or 0, it is a conditional block, and CBLD can be applied.
This is an add-by-constant case in which the constant is six. The inputs and
outputs are therefore definite, and the CBLD functions can be derived. Considering the
truth table, we get the following conditions:
Table 7: Function for add by six
Bit   Condition for complement
B0    No change
B1    Always
B2    $\overline{I1}$
B3    $(I2 + I1)$
Adding 0110 always inverts bit 1, inverts bit 2 unless a carry arrives from bit 1
(I1 = 1), and inverts bit 3 when a carry leaves bit 2 (I1 or I2 is 1). Here the condition
is generated by the first part of the module and represents the BCD correction.
Considering this condition signal X, we obtained the following circuit:
Figure 10: Add by six / buffer module using CBLD methodology
Functionality of the BCD module can be verified using Verilog and following figure
proves the functionality:
Figure 11: Functionality verification for BCD adder using CBLD methodology
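A bit-level model of the add-by-six / buffer block makes the conditions easy to test. The sketch below assumes complement conditions derived directly from the add-by-six truth table (bit 0 never, bit 1 always, bit 2 when I1 = 0, bit 3 when I1 or I2 is 1); the name `cbld_add_six` is hypothetical.

```python
def cbld_add_six(i: int, x: int) -> int:
    """Add 6 (modulo 16) to the 4-bit value `i` when x = 1, pass it through when x = 0."""
    i0, i1, i2, i3 = ((i >> k) & 1 for k in range(4))
    o0 = i0                          # bit 0 is never changed (6 is even)
    o1 = i1 ^ x                      # bit 1 is always complemented
    o2 = i2 ^ (x & (1 - i1))         # complemented unless a carry comes from bit 1
    o3 = i3 ^ (x & (i1 | i2))        # complemented when a carry leaves bit 2
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0


for value in range(16):
    assert cbld_add_six(value, 0) == value
    assert cbld_add_six(value, 1) == (value + 6) & 0xF
```

The check covers all sixteen 4-bit inputs, so it also covers the nine sums that actually need correction in a BCD adder.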
2.1.3 Nine’s complement or add by one or buffer (Block C):
In subtraction mode, this module adds one if a borrow is generated and otherwise
takes the nine’s complement; in addition mode it does not change the input. In the
traditional design both results are given to a multiplexer, which outputs the result of
the addition or subtraction.
In the CBLD approach there are 3 conditions (add by one, nine’s complement and
buffer) and 2 control signals (the Add/Sub decision and the borrow).
The functions for the nine’s complement are defined in 2.1.1, and the functions for add
by one can be calculated similarly. Both sets of conditions are as follows:
Table 8: Condition for add by one and nine's complement from truth table
Bit   Add by one        Nine's complement
O0    Always            Always
O1    I0                Never
O2    (I1 . I0)         I1
O3    (I2 . I1 . I0)    $\overline{(I1 + I2)}$
Control signal consideration: the control signal that selects between these two
conditions is the borrow (B), and both modules are active only in subtraction mode
(A_S high).
Table 9: Control signal conditions
Condition         Control signal
9's complement    (A_S) . ($\overline{B}$)
Add by one        (A_S) . B
Combining the above two tables gives the following functions:
Table 10: CBLD functions for the 9's complement / add by one module
Bit   Final function for Add/Sub (X1)
O0    A_S
O1    A_S . B . I0
O2    A_S . (B . (I0 . I1) + $\overline{B}$ . I1)
O3    A_S . (B . (I2 . I1 . I0) + $\overline{B}$ . $\overline{(I1 + I2)}$)
Hence the circuit realization is as follows:
Figure 12: 9's Complement / Add by one / Buffer Unit by CBLD methodology
Number of gates: 15
Number of gates in critical path: 5
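The combined third-stage behaviour can be modelled in Python to confirm that one set of XOR-controlled functions covers all three modes. This is an illustrative sketch: the name `cbld_stage3` and the argument names are assumptions, and the bit 2 and bit 3 functions use the nine's complement conditions I1 and NOR(I1, I2) from section 2.1.1.

```python
def cbld_stage3(i: int, a_s: int, b: int) -> int:
    """Third stage: buffer (a_s = 0), add one (a_s = 1, b = 1) or nine's complement (a_s = 1, b = 0)."""
    i0, i1, i2, i3 = ((i >> k) & 1 for k in range(4))
    nb = 1 - b
    f0 = a_s                                                  # always, in both subtraction cases
    f1 = a_s & b & i0
    f2 = a_s & ((b & i0 & i1) | (nb & i1))
    f3 = a_s & ((b & i0 & i1 & i2) | (nb & (1 - (i1 | i2))))
    o0, o1, o2, o3 = i0 ^ f0, i1 ^ f1, i2 ^ f2, i3 ^ f3       # XOR applies each complement function
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0


for digit in range(10):
    assert cbld_stage3(digit, 0, 0) == digit        # addition mode: buffer
    assert cbld_stage3(digit, 0, 1) == digit
    assert cbld_stage3(digit, 1, 0) == 9 - digit    # subtraction, no borrow signal: 9's complement
for value in range(15):
    assert cbld_stage3(value, 1, 1) == value + 1    # subtraction with borrow signal: add one
```

No multiplexer appears in the model: the two control bits only gate the complement functions.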
The functionality of the above circuit was verified using Verilog; the following
waveform confirms it:
Figure 13: Functionality Waveform for stage 3 (c_e is borrow or subtraction correction; a_s is add/Sub decision)
2.2 Reduced Delay Adder Optimization Using CBLD
The reduced delay adder proposed by Alp Arslan Bayrakci and Ahmet Akkas [1] is a novel
design for a BCD adder. The circuit has two main modules. The first module adds the two
BCD numbers using binary rules. The second module does the BCD correction.
Figure 14: Reduced Delay Adder Block Diagram [1]
The inputs of the second module are the carry input to the entire circuit (Cin), the
carry output from the first module (Cout) and the result of the binary summation (Sum).
Cin and Cout are arranged so that if Cin is high and Cout is low, 1 is added to the sum;
if Cout is high and Cin is low, 6 is added; if both are high, 7 is added; and if both
are low, the sum is passed through without any change.
If we tabulate the conditions:
Table 11: Conditions in the BCD adder
Input                 Operation
Cin = 1, Cout = 0     Add by one
Cout = 1, Cin = 0     Add by six
Cin = 1, Cout = 1     Add by seven
Here we notice that performing condition 1 and condition 2 together satisfies
condition 3, since adding one and then adding six adds seven. Hence, we use a
multistage approach in CBLD, as shown in the following figure:
Figure 15: Multistage CBLD (first stage: CBLD F01-F04; second stage: CBLD F11-F14; outputs O[0]-O[3])
Instead of combining the two stages, we apply an add-by-constant function in each stage.
The CBLD conditions for add by six and add by one can be obtained from the truth
table and are as follows:
Table 12: Add by six and add by one
Bit   Condition for complement (add by six)   Condition for complement (add by one)
O0    No change                               Always
O1    Always                                  I0
O2    I1'                                     I1.I0
O3    I2 + I1                                 I2.I1.I0
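The multistage idea can be illustrated with a small model in which each stage complements its input bits under its own conditions. The Python sketch below is an assumption-laden illustration, not the circuit of this work: the function names are hypothetical, 4-bit wrap-around arithmetic is assumed, and the bit-2 condition for add by six is taken as the complement of I1, which is what the truth table of a 4-bit addition of six gives.

```python
def add_one_stage(x, enable):
    """Add by one on a 4-bit value by complementing bits under the add-by-one conditions."""
    b0, b1, b2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    # bit 0: always, bit 1: I0, bit 2: I1.I0, bit 3: I2.I1.I0
    flips = 1 | (b0 << 1) | ((b0 & b1) << 2) | ((b0 & b1 & b2) << 3)
    return x ^ (flips if enable else 0)


def add_six_stage(x, enable):
    """Add by six on a 4-bit value by complementing bits under the add-by-six conditions."""
    b1, b2 = (x >> 1) & 1, (x >> 2) & 1
    # bit 0: no change, bit 1: always, bit 2: I1' (assumed), bit 3: I2 + I1
    flips = (1 << 1) | ((b1 ^ 1) << 2) | ((b1 | b2) << 3)
    return x ^ (flips if enable else 0)


def bcd_correction(s, cin, cout):
    """Two cascaded stages: Cin enables add by one, Cout enables add by six."""
    return add_six_stage(add_one_stage(s, cin), cout)


def check_bcd_correction():
    """Both stages active must behave as add by seven (modulo 16)."""
    for s in range(16):
        for cin in (0, 1):
            for cout in (0, 1):
                assert bcd_correction(s, cin, cout) == (s + cin + 6 * cout) % 16
    return True
```

With both control signals high the two stages together add seven, so no separate add-by-seven function is needed.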
Applying these conditions, we obtain the following circuit:
Figure 16: BCD correction module for reduced delay adder using CBLD methodology [18] (controls: Cin, Cout; inputs A3-A0; outputs S3-S0)
Number of gates with CBLD design: 15
Number of gates in critical path with CBLD design: 6
Number of gates in former design: 24
Number of gates in critical path in former design: 8
2.3 High-Speed Parallel Decimal Multiplication with Redundant Internal
Encoding: Circuit Optimization with CBLD
The high-speed parallel decimal multiplier [6] is another good example of conditional
operation. The full operation of the multiplier is beyond the scope of this work. The
authors of the paper suggest a modified block in the final stage which adds 15, 10 or 9
to the sum generated in the previous modules. For this purpose, they use a modified
add-by-constant circuit for each of the constants. The results of the summations are
then given to a multiplexer. This module can be optimized with the CBLD methodology.
Figure 17: High- Speed Parallel Decimal Multiplication module [6]
Here the control signals are Ci+1 and Ci. The following table gives the conditions for
add by constant in the CBLD approach, obtained from the truth table.
Table 13: Conditions for add by 10 / add by 15 / add by 9
Bit number   Add by 10          Add by 15        Add by 9
0            Never              Always           Always
1            Always             𝐴0               Never
2            A1                 𝐴0 . 𝐴1          A0A1
3            𝐴2 + 𝐴1 + 𝐴0       𝐴0. 𝐴3. 𝐴2       𝐴2 + 𝐴1 + 𝐴0
Conditions for control signals:
Table 14: Control signal conditions
Input                Operation
C0 = 1               Add by fifteen
C1 = 1               Add by ten
C0 = 1 and C1 = 1    Add by nine
C0 = 0 and C1 = 0    Buffer input to output
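Complement conditions of this kind can be read mechanically from a truth table, and the relation between the three constants can be checked the same way. The Python sketch below is illustrative only: flip_minterms and two_stage are hypothetical helpers, and a plain 4-bit wrap-around addition is assumed, so it does not model any input restrictions of the original multiplier design.

```python
def flip_minterms(constant, bit, width=4):
    """List the input values for which adding `constant` complements output bit `bit`."""
    mask = (1 << width) - 1
    return [a for a in range(1 << width)
            if ((((a + constant) & mask) ^ a) >> bit) & 1]


def two_stage(a, c0, c1):
    """Two cascaded add-by-constant stages: C0 enables add by 15, C1 enables add by 10."""
    if c0:
        a = (a + 15) & 0xF
    if c1:
        a = (a + 10) & 0xF
    return a


def check_two_stage():
    """With both controls high the cascade must equal add by 9 (15 + 10 = 25 = 9 mod 16)."""
    for a in range(16):
        assert two_stage(a, 0, 0) == a
        assert two_stage(a, 1, 0) == (a + 15) % 16
        assert two_stage(a, 0, 1) == (a + 10) % 16
        assert two_stage(a, 1, 1) == (a + 9) % 16
    return True
```

For example, flip_minterms(10, 0) is empty because adding ten never changes bit 0, while flip_minterms(15, 0) contains every input because adding fifteen always complements it.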
Hence the optimized circuit is:
Figure 18: Optimized circuit for the high-speed parallel decimal multiplication circuit using CBLD (controls: C0, C1; inputs I0-I3; outputs O0-O3)
Number of gates in CBLD design: 23
Number of gates in critical path in CBLD design: 7
Number of gates in former design: 36
Number of gates in critical path in former design: 7
This circuit uses fewer gates than the original circuit. However, as in the
optimization of Section 2.2, satisfying two of the conditions together also satisfies
the third. Hence, we can use the multistage CBLD approach.
The following diagram shows the multistage approach of CBLD:
Figure 19: Multistage approach for the high-speed parallel decimal multiplication
circuit [18] (controls: C0, C1; inputs I0-I3; outputs O3-O0)
Number of gates with CBLD design: 21
Number of gates in critical path with CBLD design: 7
Number of gates in former design: 36
Number of gates in critical path in former design: 7
The Verilog verification waveform confirms the functionality:
Figure 20: Waveform for Multistage CBLD for multiplier module
CHAPTER III
DESIGN OF CARRY SELECT ADDER USING CBLD
The Carry Select Adder (CSLA) is used in arithmetic operations for better speed at
the expense of area and power. We present a novel gate-level structure, a conditional
binary to excess-1 adder based on the carry select adder design, using the CBLD
methodology. The regular CSLA is area-consuming due to its dual Ripple-Carry Adder
(RCA) structure. To reduce area, the CSLA can be implemented with a single RCA and
a binary to excess-1 converter (BEC) circuit instead of dual RCAs. We propose a
conditional BEC block that replaces the regular BEC block and multiplexer: it either
adds one to the output of the modified RCA or passes the output without any change.
This approach removes the need for multiplexers in the CSLA and hence reduces delay,
area and power consumption.
In digital design, area and time delay trade off against each other: improving the
delay generally costs more area and power. The carry select adder is a significant
adder because it balances area and delay. Efforts have been made to optimize the basic
design of the carry select adder. A CSLA based on common Boolean logic is suggested
in [20]. This adder uses only one XOR gate and one inverter for summation, and one AND
gate and one inverter for carry, which reduces the number of gates significantly.
However, the results show that the delay obtained is similar to that of an RCA.
An improvement is suggested in [21], whose authors propose a more systematic
arrangement using the SQRT structure, which gives better speed. Comparison of these
adders with the binary to excess-1 adder [4] suggests that the BEC-CSLA is the best design.
Because the carry select adder relies on a conditional operation, either adding one or
passing the input to the output without any change, CBLD can be used to optimize its
performance. With CBLD a new block is obtained which is more efficient than the
traditional CSLA and recently proposed state-of-the-art CSLA designs.
3.1 Regular CSLA:
A carry select adder is typically designed with two RCA blocks. One RCA block adds the
inputs with a carry input of 0 and the second adds them with a carry input of 1. A
multiplexer then selects one of the sum words depending on the actual carry in. Here two
binary adders work in parallel, which increases the number of redundant units sitting
idle. The following figure shows the typical CSLA block diagram:
Figure 21: Typical CSLA structure [18] (n-bit inputs A and B feed RCA-1 with carry-in 0 and RCA-2 with carry-in 1; a sum and carry selection unit controlled by Cin produces the n-bit Sum and Cout)
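The selection principle of the regular CSLA can be sketched behaviorally. The following Python model is only an illustration under stated assumptions: ripple_carry_add and csla_stage are hypothetical names, and the model ignores gate-level detail and timing.

```python
def ripple_carry_add(a, b, cin, width):
    """Bit-serial ripple carry addition; returns (sum word, carry out)."""
    carry, total = cin, 0
    for k in range(width):
        x, y = (a >> k) & 1, (b >> k) & 1
        total |= (x ^ y ^ carry) << k
        carry = (x & y) | (carry & (x ^ y))
    return total, carry


def csla_stage(a, b, cin, width=4):
    """Regular CSLA stage: both candidate results are computed, then Cin selects one."""
    s0, c0 = ripple_carry_add(a, b, 0, width)  # RCA with carry input 0
    s1, c1 = ripple_carry_add(a, b, 1, width)  # RCA with carry input 1
    return (s1, c1) if cin else (s0, c0)       # selection unit (multiplexer)
```

Both adders always run, which is the redundancy that the BEC-based and CBLD-based designs aim to remove.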
3.2 BEC-CSLA:
The low-power and area-efficient carry select adder proposed by Ramkumar and Kittur [7]
uses binary to excess-1 conversion, referred to as BEC addition (the BEC unit is shown
in Figure 22). The carry bit and sum word for carry input 0 (C0, S0) are produced using
a conventional RCA, and the carry and sum word for carry input 1 (C1, S1) are
produced using an excess-1 block instead of a second RCA. Let S0[i] and C0[i] be the
outputs when Cin = 0 and S1[i] and C1[i] the outputs when Cin = 1, where i is the
bit index. The operation in the Ramkumar and Kittur paper can then be formulated as
follows.
The BEC block is:
S1[0] = S0'[0]                      (1)
C1[0] = Cin                         (2)
S1[1] = S0[0] ⊕ S0[1]               (3)
C1[1] = S0[0].S0[1]                 (4)
S1[2] = S0[2] ⊕ (S0[0].S0[1])       (5)
and so on.
Figure 22: BEC unit proposed in BEC CSLA [4] (inputs B0-B3; outputs O0-O3)
And final sum and carry are,
Sum = S0 . C’in + S1Cin (6)
Carry = C0 . C’in + C1Cin (7)
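The BEC formulation can be illustrated with a behavioral model. The Python sketch below follows the pattern of the sum expressions (each S1 bit is the S0 bit XORed with the AND of all lower S0 bits) and the final selection by Cin; the names bec and bec_csla_stage are hypothetical, and taking the carry for Cin = 1 as the OR of the RCA carry and the BEC ripple is an assumption of this sketch.

```python
def bec(s0_bits):
    """Binary to excess-1 conversion on an LSB-first bit list; returns (s1_bits, ripple out)."""
    s1_bits, ripple = [], 1        # the constant one enters at bit 0
    for bit in s0_bits:
        s1_bits.append(bit ^ ripple)
        ripple &= bit              # ripple = AND of all lower S0 bits
    return s1_bits, ripple


def bec_csla_stage(a, b, cin, width=4):
    """One BEC-CSLA stage: a single RCA result plus its excess-1 version, selected by Cin."""
    total0 = a + b                                   # RCA with carry input 0
    s0 = [(total0 >> k) & 1 for k in range(width)]
    c0 = (total0 >> width) & 1
    s1, ripple = bec(s0)
    c1 = c0 | ripple                                 # assumed carry for Cin = 1
    ncin = cin ^ 1
    bits = [(x & ncin) | (y & cin) for x, y in zip(s0, s1)]   # Sum = S0.Cin' + S1.Cin
    carry = (c0 & ncin) | (c1 & cin)                          # Carry = C0.Cin' + C1.Cin
    return sum(bit << k for k, bit in enumerate(bits)), carry
```

The loop in bec makes the ripple through the BEC explicit: every output bit depends on all lower sum bits.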
Expression (4) shows that C1[1] depends on S0[0]. This indicates a small ripple in the BEC
which travels from S1[1] to S1[n] for an n-bit adder. Hence a BEC adder has more delay than
the regular CSLA. Efforts have been made to reduce this dependency and speed up the
addition.
The authors suggest a square root (SQRT) structure for the adder. Consider first the
traditional adder shown in the following diagram:
Figure 23: Typical Square Root Structure with increasing stage size [4]
In the above diagram all the highlighted blocks work in parallel and are independent of
the previous stages. Hence, the delay of the circuit comes from the MUX units. The MUX
units in the standard form are all of the same size, so if the delay of one unit is n, the
total delay will be n x (number of stages). This delay can be reduced using the SQRT
structure. Consider the following diagram.
Figure 24: SQRT Structure for BEC CSLA [4] (blocks 1-5)
In the above diagram the stages have variable size. Suppose block 1 and block 2 each
require 1 ns to execute; then both inputs of the first multiplexer arrive at the same
time. Similarly, block 3 has a delay equal to the delay of block 1 plus the delay of
multiplexer 1, so the inputs of the third-stage multiplexer also arrive at the same
time, and so on. This approach reduces the delay by a significant amount.
3.3 CS block based CSLA:
The most effective effort to reduce this dependency is by Mohanty et al. [8]. The
area-delay-power efficient carry select adder suggested by Mohanty et al. performs the
addition of one in the carry block instead of the sum block. The following figure shows
their proposed carry select adder module. In this design, two carry words are generated
for the individual bits instead of two separate sum words. The proper carry word is
selected depending on Cin by a modified multiplexer and added to the previously generated
sum word. This eliminates the ripple in sum generation, and hence a fast adder is obtained.
However, two separate units are required to generate the two carry words, and
hence this design is area-consuming. Our effort is to minimize the area and
power consumption of the BEC-based adders.
Figure 25: Carry select adder with conditional carry [18] (blocks: HSG, CG0, CG1, CS, FSG; n-bit inputs A and B, Cin; carry words C0 and C1; outputs Sum and Cout)
3.4 Carry- Skip based BEC-CSLA
The BEC-CSLA is one of the most efficient adders. However, this adder has a larger
delay. This delay can be reduced by the carry skip logic proposed in [22]. That circuit
uses a modified BEC block and alternating AOI-OAI logic to skip the carry. The
design is shown in the following diagram:
Figure 26: Carry skip based BEC-adder [22] (blocks: RCA without Cin, BEC unit, carry skip; n-bit inputs A and B, Cin; outputs Sum and Cout)
CHAPTER IV
PROPOSED ADDER DESIGN
Like the arithmetic circuits analyzed in Chapter II, the CSLA has a conditional
operation block. This block either adds one or passes the input without change.
Considering this condition, we can form a CBLD circuit for 4 bits. After expressing the
outputs in terms of the inputs and obtaining the minterms, we get the following
functions for add by one:
Step 3: Function realization for CBLD
Table 15: CBLD condition for add by one
Bit   Condition for complement
B0    Never
B1    A0
B2    A0A1
B3    A0A1A2
If we continue for a larger number of bits, the function observed is as follows:
f(Bn) = An-1 . An-2 . ... . A0
This gives a regular form for the module which is easy to implement.
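The regular form of f(Bn) maps directly onto a running AND chain. The Python sketch below is one possible reading rather than the circuit of this work: it treats Cin as the lowest term of the chain, so that bit n of the result is complemented when Cin and all lower input bits are 1; conditional_increment is a hypothetical name.

```python
def conditional_increment(a, cin, width=4):
    """Add one to `a` when cin is 1, otherwise pass `a` through, using an AND chain.

    Returns (result, carry_out). Treating Cin as the first term of the chain
    is an assumption of this sketch.
    """
    result, chain = 0, cin
    for n in range(width):
        bit = (a >> n) & 1
        result |= (bit ^ chain) << n   # complement bit n when the chain is 1
        chain &= bit                   # extend the chain with the current input bit
    return result, chain               # the final chain value is the carry out
```

With cin = 0 the chain is 0 everywhere and the input is buffered, so no separate multiplexer is needed to choose between the incremented and the original value.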
Figure 27: Add by one realization (control Cin; inputs I0-I3)
Compared to the traditional BEC described in Chapter III, this unit does not require a
multiplexer unit.
The first module consists of a ripple carry adder and the second module consists of
the add-by-one unit.
Figure 28: Ripple carry adder CBEC (inputs a0-a3, b0-b3, Cin; outputs S0-S3, Co)
4.2 Half Adder Design for RCA-CBEC:
The initial design comprises a ripple carry adder with a conditional BEC (CBEC) unit.
The carry from the first bit is given to the CBEC chain. In this approach the propagation
delay comes from the RCA chain, leaving the CBEC chain idle. Efforts are therefore made
to decrease the delay of the RCA chain. The first approach is to replace the full adder
of the first bit with a half adder. The proposed design is shown in the following figure:
Figure 29: Proposed CBEC RCA design with half adder [18] (inputs a0-a3, b0-b3, Cin; outputs S0-S3, Co)
The above circuit is tested using Verilog and the result confirms the functionality:
Figure 30: Waveform for 8-bit CSLA
4.3 Determination of stage size:
4.3.1 SQRT structure:
The BEC paper discussed in Chapter III suggests a square root (SQRT) structure. This
grouping is the preferred stage size grouping because of its delay reduction. The SQRT
structure is efficient because it balances the delay of the first-stage ripple carry
adder against the multiplexer delays. In previous examples of the CSLA [7], as all the
RCAs work in parallel, only the first RCA and the carry select multiplexers are in the
critical path.
Here the multiplexers used are n x (n/2), hence the delay of every multiplexer is the
same and is equal to 3 logic gates. Now suppose an n-bit ripple carry adder produces a
delay of n ns and a carry select multiplexer produces a delay of 1 ns. Then both inputs
of the last carry multiplexer arrive at 5 ns: the select chain starting at RCA [1:0]
takes 5 ns, and the 5-bit RCA [15:11] also takes 5 ns. Consider Figure 31.
Figure 31: SQRT structure delay path for last multiplexer (select path: RCA [1:0], 2 ns; CY Mux [3:2], 3 ns; CY Mux [6:4], 4 ns; CY Mux [10:7], 5 ns; data path: RCA [15:11], 5 ns)
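The arrival times in this example can be reproduced with a very simple delay model. The Python sketch below is an illustration under the same simplifying assumptions as the text (an n-bit RCA takes n ns, a carry select multiplexer takes 1 ns); the function name and the stage size list [2, 2, 3, 4, 5] for a 16-bit adder are read off the bit ranges in Figure 31.

```python
def sqrt_csla_arrivals(stage_sizes, rca_ns_per_bit=1, mux_ns=1):
    """Return per-stage (select arrival, RCA ready) times and the final carry time.

    The first stage is a plain RCA; every later stage computes its RCA results in
    parallel and waits for the carry select signal from the previous stage.
    """
    select_time = stage_sizes[0] * rca_ns_per_bit   # carry out of the first RCA
    rows = []
    for size in stage_sizes[1:]:
        rca_ready = size * rca_ns_per_bit           # local RCA runs from time 0
        rows.append((select_time, rca_ready))
        select_time = max(select_time, rca_ready) + mux_ns
    return rows, select_time


rows, total = sqrt_csla_arrivals([2, 2, 3, 4, 5])
print(rows)   # each pair is balanced when the stage sizes follow the SQRT pattern
print(total)
```

In this model each multiplexer sees its select and data inputs at the same time, which is the balancing property the SQRT grouping relies on.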
In the case of the CBEC-RCA, the critical path runs through the CBEC module and is
not the same for each stage size. Consider Figure 32. Here the critical path is along
the CBEC unit. Hence increasing the size of the CBEC units increases the delay, whereas
the multiplexer delay in the CSLA-BEC is constant. Test results show that the SQRT
structure of the RCA-CBEC is better than the normal RCA-CBEC. However, the results
can be improved using a uniform stage size, as discussed in Section 4.3.2.
Figure 32: Critical Path for SQRT RCA-CBEC
4.3.2 Uniform stage size Cascade Structure:
The critical path of the CBEC-RCA restricts the use of the SQRT and other equivalent
structures, which manipulate the RCA delay to match the second-stage delay. Hence, a
uniform structure has to be considered. A 4-4 structure is shown in the following diagram:
Figure 33: 4-4 Structure for RCA-CBEC (inputs a0-a3, b0-b3; outputs S0-S3, Co)
The critical path of this structure is along the first-stage RCA, and hence the delay of
this adder will be equal to that of an RCA.
Figure 34: The 2-bit RCA-CBEC unit (Cin, Co)
The above diagram shows the 2-bit adder. In this case the delay is along the
CBEC unit, unlike in the 4-bit cascade structure. This structure uses 12 gates for
2 bits, hence a 4-bit adder formed from two 2-bit adders has 24 gates. On the other
hand, a single 4-bit adder requires 26 gates. This is because of the half adder in
bit 0. The same holds for any n-bit cascade structure; e.g. an 8-bit cascade will require
54 gates, compared to 48 gates required by the 2-bit cascade structure. Hence the proposed
network structure for the CBEC-RCA is the 2-bit cascade shown in the following diagram.
Figure 35: 2-bit cascade structure for CBEC-RCA (four 2-bit units, each with inputs a0, b0, a1, b1 and Cin and outputs S0, S1 and Co)
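A behavioral model of the cascade helps to see why the structure still computes a full addition. The Python sketch below is a hedged illustration, not the gate-level design of this work: rca_cbec_unit and cascade_add are hypothetical names, and forming the unit's carry out as the OR of the RCA carry and the carry of the conditional add-by-one chain is an assumption of the sketch.

```python
def rca_cbec_unit(a, b, cin, width=2):
    """One RCA-CBEC unit: add without carry-in, then conditionally add one."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask)      # first module: RCA with no carry input
    s, c_rca = total & mask, total >> width
    result, chain = 0, cin               # second module: conditional add by one
    for n in range(width):
        bit = (s >> n) & 1
        result |= (bit ^ chain) << n     # complement bit n when Cin and all lower bits are 1
        chain &= bit
    return result, c_rca | chain         # assumed carry out of the unit


def cascade_add(a, b, cin, bits=8, width=2):
    """Cascade of identical units; the Co of each unit drives the Cin of the next."""
    result, carry = 0, cin
    for lo in range(0, bits, width):
        s, carry = rca_cbec_unit(a >> lo, b >> lo, carry, width)
        result |= s << lo
    return result, carry
```

For example, cascade_add(200, 100, 1) models an 8-bit addition built from four 2-bit units.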
CHAPTER V
ASIC DESIGN FLOW
Before diving into the simulation results, we review how the digital IC design
flow works. During its development, a digital design goes through a series of stages
and undergoes multiple transformations from the original set of specifications. Each of
these transformations corresponds, roughly, to a different description of the system;
each stage is more detailed than the previous one and has its own set of primitives.
Figure 36: VLSI IC Design cycle [16] (stages: Design Specification, RTL Design, RTL Verification, Synthesis and Optimization, RTL vs Gate Verification, Tech Mapping, Fabrication, Tech and Packaging; description levels: high-level function description, register-level function description (HDL module), gate-level description, IC layout, silicon die)
The above figure shows the top-down design flow in simplified form. The industrial
design flow is much more complex and involves many more iterations.
The first stage in this process is the design specification: a document of the
specifications that the desired module needs to fulfil. In our case the desired module
is a CSLA with low area and power consumption. The design is first realized in software,
usually in C, to verify the functional behavior of the concept. This software serves as
the reference against which the functionality of the module is checked.
The next phase is RTL design and verification. During this phase, the
architectural description is further refined: the memory elements and functional
components of each model are designed using a Hardware Description Language (HDL). RTL
design is the last stage of functional digital design. Verification then makes sure that
the design functions properly, assuming there are no manufacturing errors. Various tests
are performed to check that the module is functionally correct. If any errors exist, the
HDL design needs to be modified to remove them. In the design flow, RTL verification
runs in parallel with RTL design.
In the next stage, synthesis and optimization, a more detailed model of the design
is generated and optimized according to the design constraints. This stage gives an
initial view of the actual IC's area, power and delay. The design produced in this stage
is at gate level. If the generated result does not satisfy the area and power
requirements, we need to go back to the RTL design phase to make the necessary changes.
We use Design Vision by Synopsys.
Figure 37: Design Compiler Functional Diagram [17] (blocks: Verilog HDL, HDL Compiler (Xilinx), Design Compiler, Constraints, IP DesignWare, Tech File, Symbol Library, Timing Optimization, Data Path Optimization, Power Optimization, Area Optimization, Test Synthesis, Timing Closure, Optimized Netlist, SDF, PDEF, Place and Route, Timing and Power Analysis, Formal Verification)
Design Compiler uses technology libraries, synthetic or DesignWare libraries,
and symbol libraries to implement synthesis and to display synthesis results graphically.
Design Compiler contains two library files: the generic technology library (GTECH),
which contains all the logic gates and flip-flops, and DesignWare, which contains more
complex circuits such as adders and comparators. Both files are technology independent.
The HDL design is extracted using these files. At this point the design is not mapped to
a specific technology. The symbol library is used to generate the schematic of the design.
After the extraction, Design Compiler maps the design to a specific technology
using a technology file called the target library. This process is constraint driven. The
constraints are specified by the user, such as the environmental restrictions under
which the synthesis is done.
After the design is optimized, the next stage is test synthesis. This is the process by
which designers can integrate test logic into a design during logic synthesis. Test
synthesis enables designers to ensure that a design is testable and to resolve any test
issues early in the design cycle. The result of the logic synthesis process is an
optimized gate-level netlist, which is a list of circuit elements and their
interconnections.
After test synthesis, the design is ready for the place and route tools, which place
and interconnect the cells in the design. Based on the physical routing, the designer can
back-annotate the design with actual interconnect delays; Design Compiler can then
resynthesize the design for more accurate timing analysis. The synthesized file is then
used by IC Compiler.
The next stage in the flow is tech mapping and place and route, also called
physical implementation. The tool used for physical implementation is IC Compiler.
Physical implementation consists of:
1) Floor planning
2) Placement
3) Routing
Floor planning: At this stage we have the netlist, which is the logical description
of the design. In floor planning we map the netlist onto the chip. Floor planning is the
most important stage in the physical implementation: the timing, area and power
consumption of the real IC depend on it. Within floor planning the following steps are
performed:
1) Estimation of the size of the chip
2) Arrangement of the various blocks of the design on the chip
3) Pin assignment
4) IO and power planning
5) Clock distribution
Figure 38: typical Floor plan in IC compiler
Placement: Once floor planning is done, placement is initiated. In this stage
standard cells are assigned to the rows defined in the floor plan. Space is left for
the connections between these cells. This lets us see the capacitive load each cell will
need to drive. The placement is done internally by the tool, depending on the algorithm;
several algorithms exist, e.g. constructive, min-cut and iterative algorithms.
Routing: Until now the cells have only been placed; in this stage the connections are
made. Routing is split into two steps. Global routing: this step only plans the
interconnections, optimizing the path delay and the critical path delay. The chip is
divided into small cells called cell bins or gcells, whose size depends on the algorithm
used. Detailed routing: in this step the actual connections are made. Each layer has its
own routing grid and rules.
Once routing is done, parasitic extraction is performed with extraction tools, giving
the parasitic resistance and capacitance in the circuit. Timing is then calculated with
static timing analysis tools. Once the timing requirements are satisfied and
LVS matching is done, the design can be sent to the foundry for manufacturing.
Figure 39: Final IC layout using IC compiler
CHAPTER VI
SIMULATION AND RESULTS
In this chapter we discuss the ASIC simulation results based on the
“saed90nm_typ_ht.db” tech file. We compared (a) the traditional add/sub module with its
CBLD counterpart, (b) the conditional adder of the reduced delay adder with its CBLD
counterpart, (c) the high-speed parallel multiplier block with its CBLD counterpart,
(d) the binary to excess-1 CSLA in SQRT structure with the CBEC-RCA, (e) the conditional
carry adder with the CBEC-RCA, and (f) the regular CSLA with the CBEC-RCA.
For the comparison we follow the procedure discussed in the previous chapter. All
modules are designed in Verilog using Xilinx ISE, and their functionality is tested in
Xilinx for all the corner cases. The designs are then compared in Design Vision to
obtain more accurate results.
6.1 Optimization of arithmetic building blocks using CBLD
6.1.1 Add/Sub module
The following table shows the comparison of the add/sub module with its CBLD
counterpart. Only one digit is considered for the comparison: as the structure is
uniformly cascaded, the results scale linearly with the number of digits.
Table 16: Result of comparison for add/sub with its CBLD counterpart
Name      Area          Power        Delay      ADP       PDP
Add/sub   637.74 um2    368 uW       4.12 ns    2627      1516
CBLD      356.65 um2    146.73 uW    2.65 ns    945.12    388.8
The above result shows significant improvement in all three parameters: area, power and delay.
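The derived columns of Table 16 follow directly from the measured ones: ADP is area times delay and PDP is power times delay. The short Python sketch below recomputes them from the table values; the helper names are hypothetical and the percentage reduction is added only as an illustration.

```python
def figures_of_merit(area_um2, power_uw, delay_ns):
    """Area-delay product and power-delay product from the measured values."""
    return {"ADP": area_um2 * delay_ns, "PDP": power_uw * delay_ns}


def percent_reduction(old, new):
    """Relative saving of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


add_sub = figures_of_merit(637.74, 368.0, 4.12)   # values from Table 16
cbld = figures_of_merit(356.65, 146.73, 2.65)
for key in ("ADP", "PDP"):
    saving = percent_reduction(add_sub[key], cbld[key])
    print(f"{key}: {add_sub[key]:.2f} -> {cbld[key]:.2f} ({saving:.1f}% lower)")
```

The same two helpers apply unchanged to the other comparison tables in this chapter.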
Figure 40: ADP and PDP comparison for add/sub (bar chart; series: Add/sub, CBLD)
6.1.2 Reduced delay adder
For the comparison, the adder block of the design is considered.
Table 17: Reduced delay adder comparison
Name    Delay    Area       Power       ADP        PDP
MRDA    2.61     1816.47    227.73      4740.98    594.37
CBLD    2.55     1511.42    182.0558    3854.12    464.2
As discussed earlier, the CBLD design of the reduced delay adder requires 15 gates
compared to 20 gates required by the former design. This is reflected in lower area and
power consumption, and the reduction in the number of gates in the critical path improves
the delay marginally. The result is a better power-delay product and area-delay product.
The following graph shows the ADP and PDP improvement.
Table 18: ADP PDP comparison for Reduced delay adder
6.1.3 High Speed parallel multiplier
As in the previous cases, only the module that requires modification is considered.
The following is the result of the comparison.
Table 19: Comparison of high-speed parallel multiplier with CBLD
Name               Delay      Area       Power       ADP       PDP
Multiplier block   1.02 ns    215.65     44 uW       219.3     44.88
CBLD               0.94 ns    139.161    32.71 uW    130.81    30.74
In the previous two examples the authors used a binary adder for add by constant. In
this example modified add-by-constant modules were used. The following graph shows
the ability of CBLD to improve area, delay and power.
Figure 41: ADP PDP comparison for High speed parallel multiplier
6.2 Design of CBEC-RCA
In this section a 2-bit cascade structure of the CBEC-RCA is compared with
state-of-the-art adders. Adders are built for 8, 16, 32 and 64 bits to compare the
performance as the width increases.
The following table gives the comparison of area, delay and power. From the table:
1) Compared with the regular CSLA, the CBEC-RCA uses 57.08% less area, is 9.8% faster
and is 57.55% more power efficient for the 8-bit structure. At 64 bits the area
improvement is 56.63% and the power improvement 57.37%, with a marginal improvement
in delay. This shows that the CBEC-RCA remains better than the regular structure as
the width grows.
2) Compared with the conditional carry adder, the CBEC-RCA uses 52.53% less area and
22.02% less power at 8 bits, and 37.36% less area and 22.6% less power at 64 bits.
3) Compared with the BEC adder, we observe 43.27% less area, 55.39% less power
consumption and 38.54% better speed at 8 bits, and 43.59% less area and 59.45% less
power with 15.96% better speed at 64 bits.
The following table shows the result of the comparison.
Table 20: Comparison result for CBEC-RCA and standard adders
Width   Design   Area (µm2)   Delay (ns)   Power (µW)   PDP (10^-15)   APP (10^-15)
8       ADP      499.1        0.78         145.32       113.34         72.532
8       CBEC     326.8        1.65         113.32       186.97         37.039
8       BEC      576.3        2.71         254.06       688.50         146.42
8       Reg      759.4        1.83         267.8        490.07         203.36
16      ADP      1015.2       1.78         295.36       525.74         299.87
16      CBEC     653.5        3.07         230.90       708.86         150.91
16      BEC      1154.1       5.24         537.97       2818.9         620.89
16      Reg      1509.6       3.22         542.83       1747.9         819.45
32      ADP      2063.3       2.47         601.82       1486.4         1241.7
32      CBEC     1307.0       5.9          467.50       2758.2         6110.4
32      BEC      2311.6       9.40         1113.8       10469          2574.6
32      Reg      3015.3       6.01         1093.3       6570.7         3296.6
64      ADP      4173.5       3.67         1206.7       4428.5         5036.2
64      CBEC     2613.9       11.57        933.70       10802          2440.6
64      BEC      4632.7       16.66        2303.3       38372          1067.0
64      Reg      6027.2       11.59        2191.4       25398          1320.8
The following graphs show the area and power comparison for 8, 16, 32 and 64 bits for
all the adders.
Figure 42: Area Comparison (8-bit to 64-bit; series: CBEC, ADP, Reg, BEC)
Figure 43: Power comparison (power in µW; 8-bit to 64-bit; series: CBEC, ADP, Reg, BEC)
CHAPTER VII
CONCLUSION
A new view of logic design, named Complementary Based Logic Design
(CBLD), has been used to model various arithmetic circuits in the field of VLSI.
Comparison results obtained from Design Vision for the CBLD designs and their former
counterparts show that the CBLD version of the BCD adder/subtractor consumes 60%
less power, and the CBLD version of the reduced delay adder 19% less power than the
modified reduced delay adder.
Further, a new design approach is proposed in this thesis to reduce the area and
power of the conventional CSLA architecture. This approach eliminates the
multiplexer and hence reduces the number of gates required. The work offers a very large
reduction in area and in total power. The compared results show that the modified
CBEC-CSLA has a slightly larger delay, but the area and power of the 64-bit modified
RCA are significantly reduced, by 51.5%, 77.1% and 81%, compared with the state-of-the-art
structures mentioned earlier. The modified RCA architecture therefore occupies very
little area and consumes very little power.
REFERENCES
[1] Alp Arslan Bayrakci, Ahmet Akkas, "Reduced Delay BCD Adder", IEEE International Conf. on
Application-specific Systems, Architectures and Processors (ASAP 2007), Volume-Issue: 9-11,
Page(s): 266-271, July 2007.
[2] Sundaresan C. et al, “Modified reduced delay BCD adder ,”Int. conf. on Biomedical Eng. And infomatics,
City of Conf.,2011 ,pp.2148-2151.
[3] Al-Khaleel, O et al, “Fast and compact binary-to-BCD conversion circuits for decimal multiplication”.
IEEE 29th Int. Conf. on Computer Design (ICCD), 2011,pp.226-231
[4] B. Ramkumar and H M Kittur, “low-Power and Area-Efficient Carry Select Adder,”IEEE Trans. Very
Large Scale Integr. (VLSI) Syst.,vol 20, no.20,pp. 371-375,FEB 2012.
[5] M.M. Mano. Digital Design,pages 129-131.Prentice Hall,third edition,2002
[6] L. Han and S. B. Ko “High-Speed Parallel Decimal Multiplication with Redundant Internal Encodings”
IEEE Transactions on Computers. Vol-62, pp- 956-958,2013
[7] Ramkumar and H. Kuttur “Low- power and area-efficient carry select adder” IEEE trans. on VLSI sys.
vol. 20, no. 2, pp 371-375, feb 2012
[8] B. K. Mohanty and S. K. Patel “Area-delay-power efficient Carry select adder”. IEEE trans. on Circuits
and Sys. vol. 61, no. 6, pp 418-422, june 2014.
[9] Y. Kim and L.S. Kim “64-bit carry select adder with reduced area”. Electron. Lett.,vol.37,no.10,pp614-
615,may 2001.
[10] B. Parhami, Computer Arithmatic: Algorithms and Hardware Designs, 2nd ed. New York,USA: Oxford
univ. Press 2010.
[11] O. J. Bedrij, “Carry-select adder,” IRE Trans. Electron. Comput., pp. 340–344, 1962
[12] T. Y. Chang and M. J. Hsiao, “Carry-select adder using single ripple carry adder,” Electron. Lett.,
vol. 34, no. 22, pp. 2101-2103, 1998.
[13] T. Y. Chang and M. J. Hsiao, “Carry-select adder using single ripple
carry adder,” Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 199
[14] T. Nikoubin, et al, “A new cell design methodology for balanced XOR-XNOR circuits for hybrid-CMOS
logic”, Journal of Low Power Electronics. P.p. 474-483, December 2009
[15] T. Nikoubin and M. Grailoo, “Cell Design Methodology Based on Transmission Gate for Low-Power
High-Speed Balanced XOR-XNOR Circuits in Hybrid-CMOS Logic Style”Journal of Low Power
Electronics. Pp503-512, year2010
[16] Sherwani, N. A. Algorithms for VLSI Physical Design Automation,page – 5, Kluwer Academic
Publishers, Third edition, Nov 1998.
[17] Design compiler user guide, synopsys, version E-2010, December2010
[18] Nikhil Patil et. al. “RCA with conditional BEC in CSLA structure for area-power efficiency” 6th
International Conference on Computing, Communication and Networking Technologies (ICCCNT), July
2015
[19] T. Nikoubin et. al. “Energy and Area Efficient Three-Input XOR/XNORs With Systematic Cell Design
Methodology” IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Vol.24, pp. 396-
402, March 2015.
[20] I.-C. Wey, C.-C. Ho, Y.-S. Lin, and C. C. Peng, “An area-efficient carry select adder design by sharing
the common Boolean logic term,” in Proc. IMECS, pp. 1–4, 2012.
[21] S. Manju and V. Sornagopal, “An efficient SQRT architecture of carry select adder design by common
Boolean logic,” in Proc. VLSI ICEVENT , pp. 1–5, 2013.
[22] Milad Bahadori "High-Speed and Energy-Efficient Carry Skip Adder Operating Under a Wide Range of
Supply Voltage Levels", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol 24,pp
421-433, March 2015