

San Jose State University

College of Engineering

Fall 2006, EE-271 Advanced Digital System Design and Synthesis

Final Project Report

Performance Trade-Off in Addition/Subtraction Circuits

Under the guidance of:

Prof. Thuy T. Le.

Due Date: Dec 15, 2006

Team Members: Kunal Vyas (004837067)

[email protected]

Moe Kyaw Thu (002924598)

[email protected]


Executive Summary

This report is a detailed study of three adder architectures: the Ripple Carry Adder (RCA), the Carry Select Adder (CSA), and the Carry Look Ahead Adder (CLA). We implemented all three adders in Verilog and verified the results using ModelSim. We then synthesized them using Synopsys and analyzed the circuits in terms of performance, area and power under design optimization. The input and output constraints are the same for all three circuits, and we verified that each circuit after optimization is still the circuit we intended to build before optimization.


General Contents - INDEX

1) Introduction (Kunal Vyas)
2) Specifications and Features (Kunal Vyas)
3) Architecture (Kunal Vyas)
4) Design Development (Kunal Vyas)
5) Verification Testing, Validation and Analysis (Moe Kyaw Thu)
6) Design Analysis (Moe Kyaw Thu)
7) Conclusion (Moe Kyaw Thu)

References

APPENDIX A
APPENDIX B
APPENDIX C


1. Introduction

VLSI designers always aim to reduce the area and the delay of a product. Reducing area increases yield, and reducing delay improves performance. In this project we divided the 4, 8, 16 and 32 bit circuits into smaller parts and varied the performance targets of each, so that the final 32 bit circuit achieves a worthwhile trade-off between area/power and delay.

The project is designed to give an understanding of 16 and 32 bit addition and subtraction circuits. Adders are used widely, so comparing adders also lets us compare other products by the same criteria. Another reason for comparing adders is that they are easier to understand than other arithmetic circuits such as multipliers and floating point units.

The design is written in Verilog and was tested, verified and synthesized with the Synopsys CAD tool. We examined the trade-offs between 4, 8, 16 and 32 bit circuits with addition/subtraction functionality. The trade-offs are observed from power-delay and area-delay curves, from which we can judge how each kind of adder (ripple carry, carry select and carry look ahead) behaves at each bit width. All lower-level modules were checked and synthesized first and then assembled into the full adder, so that we obtain the best performance under the synthesis rules. Sometimes the synthesizer only adds a few NOT gates to achieve the best timing, so we need to check that the functionality of the circuit is maintained with the same logic. All circuits (Verilog code) were synthesized and optimized in the Synopsys tool. The constraints are kept the same for all kinds of adders so that the differences between them are easy to observe.

Digital circuits always come with a trade-off between area, power and delay, so the right circuit depends on the application. The curves provided in this report show which circuit is better suited to a particular application and which is not.

Adders are of many types and are the heart of the ALU; some ALUs contain multiple adders for different numerical representations such as binary coded decimal and excess-3. Other products use the same fundamentals and can be considered to contain an adder, such as KVM switches, multipliers, FIR filters, adaptive logic modules, counters, and many more. Many kinds of adders are available, such as the carry look ahead adder, conditional sum adder, ripple carry adder, Sklansky adder, Kogge-Stone adder and Brent-Kung adder, to name a few. Each has its own merits and demerits, and which one is best depends on the application.


2. Specifications and Features

2.1 Functionalities:

These adders perform either addition or subtraction, selected by setting the carry-in input bit to 0 (zero) or 1 (one) respectively. The adders accept 32 bit inputs. They are designed so that, even after optimal synthesis, the functionality and logic of each adder remain the same.

2.2 Theory and Algorithm used in Design:

Three main kinds of adders and algorithms are used in this design.

I) Ripple-Carry Adder:

The ripple carry adder is limited by the time required to propagate a signal transition from the carry-in bit to the carry-out bit. When the word length of the processor is large, another adder is recommended if timing is the main goal for the circuit; one way to improve the timing of this design is pipelining. The ripple carry adder has a recursive design, and its worst-case delay is linear in the number of bits, which can be observed from the graph of our project given below, where the X axis shows the number of bits and the Y axis shows the delay.

In the 32 bit ripple carry adder, the carry ripples through all the full adders, and the goal is to make the carry propagate through them as fast as possible. The block diagram of the ripple carry adder is shown below.

*

The gate-level schematic is simply full adders joined together, with the carry rippling through all of them. The timing of this circuit can be estimated from

Tadd ~ Tsetup + (n-1) Tcarry + Tsum_last

We checked this formula against our design and verified that the design takes almost the same time as the formula predicts.
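As a minimal sketch of the structure described above, a 4 bit ripple carry adder can be written as four chained full adders. The module and signal names (full_adder, rca4, p, g, t, c) are illustrative assumptions and are not taken from the project's source code; the port order follows the test bench in Section 5.

```verilog
// Illustrative sketch only: names are assumptions, not the project's source.
// One-bit full adder built from XOR, AND and OR gate primitives.
module full_adder (output sum, output cout, input a, input b, input cin);
  wire p, g, t;
  xor (p, a, b);       // propagate term
  and (g, a, b);       // generate term
  xor (sum, p, cin);   // sum bit
  and (t, p, cin);     // carry passed through this stage
  or  (cout, g, t);    // carry out
endmodule

// 4-bit ripple carry adder: each stage waits for the carry of the previous one.
module rca4 (output cout, output [3:0] sum,
             input [3:0] a, input [3:0] b, input cin);
  wire [4:0] c;                 // c[i] is the carry into bit i
  assign c[0] = cin;
  assign cout = c[4];

  genvar i;
  generate
    for (i = 0; i < 4; i = i + 1) begin : stage
      full_adder fa (sum[i], c[i+1], a[i], b[i], c[i]);
    end
  endgenerate
endmodule
```

In this sketch the chain c[1] to c[4] is the path that the (n-1) Tcarry term of the timing estimate accounts for.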

II) Carry Look-Ahead Adder:

The algorithm for the carry look ahead adder rests on the observation that the value of the carry into any stage of a multicell adder depends only on the data bits of the previous stages and the carry into the first stage. We can therefore save time by not waiting for the carries of the other blocks and generating all carries simultaneously. A given cell is said to generate a carry if both of the cell's data bits are 1. It is said to propagate a carry if either of the cell's data bits could combine with the carry into the cell to cause a carry out to the next stage of the adder. We define the generate and propagate bits gi and pi using the bitwise and (&) operator and the exclusive or (^) operator as follows.

gi = ai & bi
pi = ai ^ bi

The logical expressions for sum and carry, where + denotes logical OR, are given below.

Si = pi ^ ci
Ci+1 = (pi & ci) + gi

A simplified block diagram for the P, G generator and bit propagation is shown below.

*

A 4 bit CLA takes about 4Tg (four gate delays), while a 4 bit RCA takes almost double that time. The included graph shows the delay under different timing constraints for the 4 bit adders; from that graph we can verify that the RCA takes about twice as long as the CLA, so the carry look ahead adder is the better design for timing.

The carry look ahead adder logic equations, from which we wrote our Verilog code (circuit), are given below.

S0 = p0 ^ c0
C1 = (p0 & c0) + g0
S1 = p1 ^ c1 = p1 ^ ((p0 & c0) + g0)
C2 = (p1 & c1) + g1
S2 = p2 ^ c2
C3 = (p2 & c2) + g2

and so on.
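The equations above can be sketched in Verilog as follows. This is only an illustration under assumed names (cla4 and its ports are not taken from the project's code); each carry is expanded so that it depends only on the p and g bits and on the carry in, which is the point of the look-ahead scheme.

```verilog
// Illustrative sketch only: cla4 and its port names are assumptions.
module cla4 (output cout, output [3:0] sum,
             input [3:0] a, input [3:0] b, input cin);
  wire [3:0] g = a & b;   // generate:  gi = ai & bi
  wire [3:0] p = a ^ b;   // propagate: pi = ai ^ bi
  wire [4:0] c;           // c[i] is the carry into bit i

  // Every carry is written in terms of p, g and cin only,
  // so no carry has to wait for the previous one.
  assign c[0] = cin;
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                     | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0])
                     | (p[3] & p[2] & p[1] & p[0] & c[0]);

  assign sum  = p ^ c[3:0];   // Si = pi ^ ci
  assign cout = c[4];
endmodule
```

The widest term grows with every bit, which is one way to see why fan-in limits the group size to a few bits.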

III) Carry-Select Adder

For each group of bits, the results for carry-in 0 and carry-in 1 are calculated in advance, and the correct one is then selected through multiplexers controlled by the carries from all levels of the prefix structure. Compared to the other circuits, this circuit requires much more area because it uses multiplexers in place of XOR gates. The N bits of this circuit are divided into non-overlapping groups; the block diagram for the circuit is shown below. The groups can have different numbers of bits so that we achieve the best timing: the adder is built from K groups of different sizes with N bits in total. The carry select adder is very area efficient for medium speeds if special carry select adder cells are available in the library used for the design. The carry-select adder is simple but rather fast, having a gate-level depth of O(N^(1/2)).

*
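A minimal sketch of the selection idea, assuming an 8 bit adder split into two 4 bit groups, is given here. The name csel8 and the internal signals are hypothetical, and the behavioural + operator stands in for the 4 bit group adders, so this shows the select mechanism and not the gate-level structure.

```verilog
// Illustrative sketch only: csel8 and its signals are assumptions.
module csel8 (output cout, output [7:0] sum,
              input [7:0] a, input [7:0] b, input cin);
  // Low group: ordinary 4-bit addition with the real carry in.
  wire [4:0] lo  = a[3:0] + b[3:0] + cin;

  // High group: both possible results are computed in parallel.
  wire [4:0] hi0 = a[7:4] + b[7:4];          // assumes carry in = 0
  wire [4:0] hi1 = a[7:4] + b[7:4] + 1'b1;   // assumes carry in = 1

  // The carry out of the low group drives the multiplexer select.
  assign sum[3:0]         = lo[3:0];
  assign {cout, sum[7:4]} = lo[4] ? hi1 : hi0;
endmodule
```

The high group is computed twice, which is where the extra area of this architecture comes from.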


2.3 Specification of the Product:

This product is specified for certain attributes only; beyond those constraints it runs with degraded performance. All inputs have a drive strength of 0.08. All observable outputs have an output load of 7.5 pF. The wire load is taken as 5 * 5 for all internal and external wiring. The operating condition is set to WCCON (class). Apart from these attributes, we defined the circuits with timing, power and area optimization restrictions, so that we can choose the best circuit for our application. The circuit uses binary data, so no input other than binary numbers can be given. The inputs must be 32 bits wide, and the output is a 32 bit sum and a 1 bit carry out. Intermediate stages are generated for observability and debugging.


3. Architecture

Description of Product Hardware, Software and Instructions:

Our adder uses basic hardware components such as XOR, NAND, NOR, AND and OR gates, inverters and flip-flops. The more we optimize the circuit, the more the tool uses complex gates instead of simple AND and OR gates. The software for this architecture is Verilog. We synthesized the circuit from Verilog code to a gate-level circuit using the Synopsys software. To run this product we supply 32 bits of data; the carry in should be 0 (zero) for addition and 1 (one) for subtraction.

3.1 Hardware Block Diagrams (Gate Level):

1) Ripple Carry Adder:

As shown, the ripple carry adder ripples the carry through all the full adder blocks. A full adder can be formed with XOR, AND and OR gates, and each block waits until the carry of the previous block has been generated. The longest path for this circuit is a0 -> p0 -> c1 -> c2 -> c3 -> c4.

*

2) Carry Look Ahead Adder:

As shown, the CLA uses NAND and inverter gates to generate the final propagate, generate and carry-out signals at each level. The hardware block diagram for the carry look ahead adder is shown below. The basic cells of the circuit are AND, OR and XOR; Synopsys optimizes the circuit by substituting gates, for example adding inverters and using complex cells with reduced interconnect, and the diagram shows the optimized circuit. The longest path for this circuit is from input b2 to c4 (cout) through p2.


*

3) Carry Select Adder:

The carry select adder uses multiplexers to select the result for a carry in of either zero (0) or one (1), together with the basic four bit adder shown above, which uses AND, XOR and NOR gates. The timing constraint and the basic hardware block diagram are shown below.

*

3.2 Description of all Hardware Terms:

All the hardware inputs have a driving capability of 0.08, and the outputs drive a 7.5 pF capacitive load. The hardware is built from basic components: multiplexers and basic gates. In the ripple carry adder, the carry propagates all the way through the full adders, so it takes time to pass through all of them. The carry select adder can be divided into k blocks with different numbers of bits; for the best result we use the smallest number of bits in the first and last blocks and more bits in the middle blocks, which gives minimal delay in this adder. The carry look ahead adder does not wait for other carries to be generated; it takes all of a block's inputs and the carry in and generates the other carries from the same data.

3.3 State Transition Graph (STG) for adder design


4. Design Development:

4.1 Coding Process:

Coding was done in Verilog. There are two basic design methodologies. In the top-down methodology, we define the top-level block first and then identify the sub-blocks of each block, subdividing further until we reach the leaf cells, which are the cells that cannot be divided further. The bottom-up methodology is the opposite: the smaller (leaf) cells are defined first and are combined step by step into the main block. We used the top-down method for coding the different adders. We divided each adder into smaller adders of about 4 bits each. The main block is the 32 bit adder, which calls two 16 bit adders as sub-cells, and each 16 bit adder is built from four 4 bit adders. The 4 bit adder module is thus fundamental: if the 4 bit adder has a problem, the whole design fails, and data cannot even propagate through the other blocks.

4.2 CAD Tool, Library and Technology:

We synthesized and verified the circuit in the Synopsys tool: Synopsys is used for verification and synthesis of the product from Verilog module to gate-level circuit, and ModelSim is used to check that the functionality of the circuit is correct. We did not use any advanced library; we used the default work library and its technology, and the circuit was synthesized from the gates of this library for optimal performance and functionality.

4.3 Optimization for area, power and delay

For optimization we need to define the limitations and constraints of each input and output port. The Synopsys tool can generate all the attributes and constraints from within the tool itself.
From the attributes, we can set the timing, power and area constraints according to our requirements and observe the trade-offs between the constraints by generating a new synthesized circuit, while making sure that the tool has not modified the logic. In this way we can set and observe constraints such as symmetrical rise and fall times, a power limit and an area limit in the tool itself. After synthesizing with the new constraints, we can read the report, which shows the delay, power and area that the new circuit uses, and we use those values in our graphs for the final design analysis. The graphs taken from our design are attached below:


5. Verification, Testing and Analysis

Verification is a larger step than simulation. Simulation only shows whether the simulated circuit produces a waveform, whereas the purpose of verification is to make sure the circuit works as desired. To verify the functionality of the designs, which are the Ripple Carry Adder, Carry Select Adder and Carry Look-ahead Adder, ModelSim (Model Simulator) version 8.0 is used.

In our design verification of the addition and subtraction circuits, we mainly focus on the addition circuit, since the subtraction circuit is nothing more than an addition circuit. In the subtraction circuit, the second operand goes through inverters and the carry-in signal is set to 1, which forms the two's complement of the second operand, and the signals are then added together using the adders implemented before.
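The subtraction scheme described above can be sketched as follows. The module name addsub32 and the control input sub are assumptions made for illustration, and the behavioural + operator stands in for whichever adder (RCA, CSA or CLA) is actually used.

```verilog
// Illustrative sketch only: addsub32 and the sub input are assumptions.
// sub = 0 computes a + b; sub = 1 computes a - b as a + (~b) + 1.
module addsub32 (output cout, output [31:0] result,
                 input [31:0] a, input [31:0] b, input sub);
  // XOR with the control bit inverts every bit of b only when subtracting.
  wire [31:0] b_eff = b ^ {32{sub}};

  // The same control bit is used as the carry in, completing the
  // two's complement of the second operand.
  assign {cout, result} = a + b_eff + sub;
endmodule
```

With sub = 1, a carry out of 1 means that no borrow occurred, that is, a is greater than or equal to b when both are read as unsigned numbers.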

The first verification for our design is on its building block, the full adder. Since the full adder is a fairly small circuit, we did not generate a test bench for it and tested it manually. Before proceeding to the verification process for the addition circuits, the design and coding steps for implementing the three addition circuits, RCA, CSA and CLA, are addressed.

For all the addition implementations, the basic building block is the full adder. For the ripple carry adder, we make a 4 bit RCA by connecting four full adders in series. The rest of the RCA adders, which are 8 bit, 16 bit and 32 bit, are implemented by instantiating the 4 bit RCA. For the 4 bit carry select adder, we instantiate two 4 bit RCAs, one with carry in 0 and one with carry in 1, which compute simultaneously, and pass both results to a multiplexer whose select signal is connected to the actual carry-in value. As with the RCA, we instantiated the 4 bit CSA to implement the 8 bit, 16 bit and 32 bit CSA. In the 4 bit CLA implementation, in addition to the full adder as a basic block, we also implement 2 bit and 4 bit propagate and generate blocks to calculate the carries ahead of the addition. Because of the fan-in limitation, we implement the CLA in blocks of 4 bits.
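The hierarchical construction described above, where wider adders are built by instantiating a 4 bit block, might look like the following sketch. The names adder4 and adder16 are hypothetical, and adder4 is a behavioural stand-in for the real 4 bit RCA, CSA or CLA block.

```verilog
// Illustrative sketch only: names are assumptions, not the project's source.
// Behavioural stand-in for the 4-bit building block.
module adder4 (output cout, output [3:0] sum,
               input [3:0] a, input [3:0] b, input cin);
  assign {cout, sum} = a + b + cin;
endmodule

// 16-bit adder assembled from four 4-bit blocks; the carry out of each
// block feeds the carry in of the next higher block.
module adder16 (output cout, output [15:0] sum,
                input [15:0] a, input [15:0] b, input cin);
  wire c4, c8, c12;
  adder4 u0 (c4,   sum[3:0],   a[3:0],   b[3:0],   cin);
  adder4 u1 (c8,   sum[7:4],   a[7:4],   b[7:4],   c4);
  adder4 u2 (c12,  sum[11:8],  a[11:8],  b[11:8],  c8);
  adder4 u3 (cout, sum[15:12], a[15:12], b[15:12], c12);
endmodule
```

A 32 bit adder would follow the same pattern with two 16 bit instances, as described in Section 4.1.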

To verify the addition circuits, we create a test bench which generates input signals automatically, feeds them to the circuit, and lets us observe the resulting waveform in the ModelSim simulator. The following is the example of the 8 bit CLA test bench. In this test bench, we create registers to hold the input data and generate a signal pattern using a forever loop and delay commands. These signals are then passed to the adder that needs to be tested. In this example, an 8 bit CLA adder is instantiated by the test bench, which passes the generated signals to its inputs. We then run the simulation to see the resulting waveform in the waveform viewer. To test the other adders, RCA and CSA, we just need to instantiate that adder in place of the CLA adder in the test bench, run the simulation and verify the results. The following code is for the 8 bit test bench.


module CLA8_test;
  reg  [7:0] a, b;   // input registers
  reg        cin;
  wire [7:0] sum;
  wire       cout;

  initial begin
    // Start from known values: inverting an uninitialized (x) register leaves it x.
    a = 8'b0; b = 8'b0; cin = 1'b0;
    forever   // generating input patterns
    begin
      #320 a[7] = ~a[7]; b[0] = ~b[0];
      #160 a[6] = ~a[6]; b[1] = ~b[1];
      #80  a[5] = ~a[5]; b[2] = ~b[2];
      #40  a[4] = ~a[4]; b[3] = ~b[3];
      #20  a[3] = ~a[3]; b[4] = ~b[4]; cin = ~cin;
      #5   a[2] = ~a[2]; b[5] = ~b[5];
      #5   a[1] = ~a[1]; b[6] = ~b[6];
      #5   a[0] = ~a[0]; b[7] = ~b[7];
    end
  end

  // The forever loop never exits, so the simulation is ended from a separate initial block.
  initial #2000 $finish;

  // CLA_8bit adder is instantiated and the generated signals are passed to the adder.
  // Place appropriate adders for test.
  CLA_8bit CLA(cout, sum, a, b, cin);
endmodule
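The test bench above relies on inspecting waveforms by eye. As a possible complement, and not something the project itself describes, a self-checking variant could compare the adder outputs against the behavioural a + b + cin result. In this sketch adder8_dut is a hypothetical stand-in for the adder under test, and the vector count of 1000 and the 10 time unit settle delay are arbitrary assumptions.

```verilog
`timescale 1ns/1ps
// Illustrative sketch only. adder8_dut is a stand-in (assumption);
// replace it with the 8-bit RCA, CSA or CLA module being tested.
module adder8_dut (output cout, output [7:0] sum,
                   input [7:0] a, input [7:0] b, input cin);
  assign {cout, sum} = a + b + cin;
endmodule

module adder8_selfcheck;
  reg  [7:0] a, b;
  reg        cin;
  reg  [8:0] expected;     // 9 bits: carry out plus 8-bit sum
  wire [7:0] sum;
  wire       cout;
  integer    i, errors;

  adder8_dut dut (cout, sum, a, b, cin);

  initial begin
    errors = 0;
    for (i = 0; i < 1000; i = i + 1) begin
      a   = $random;       // truncated to 8 bits
      b   = $random;
      cin = $random;       // truncated to 1 bit
      expected = a + b + cin;
      #10;                 // let the combinational outputs settle
      if ({cout, sum} !== expected) begin
        errors = errors + 1;
        $display("MISMATCH a=%h b=%h cin=%b cout=%b sum=%h",
                 a, b, cin, cout, sum);
      end
    end
    $display("%0d mismatches in 1000 vectors", errors);
    $finish;
  end
endmodule
```

Random vectors do not guarantee that the longest carry chain is exercised, so directed cases such as a = 8'hFF, b = 8'h00, cin = 1 are worth adding by hand.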


6. Design Analysis

Ripple Carry Adder

*figure from lecture slide

As shown in the figure, the Ripple Carry Adder is implemented by connecting a series of full adders. The carry out of each lower bit is connected to the carry-in input of the next higher bit full adder. The carry out of the lowest bit adder therefore ripples through to the last bit of the adder, hence the name Ripple Carry Adder (RCA). In terms of timing analysis, the result of the last bit adder is valid only after the valid carry out of the lower bit adders has reached its input, plus one more gate delay as seen in the figure. This ripple time is not very significant for a 4 bit adder. However, for wider adders such as 32 bit or 64 bit adders, it is not wise to use the RCA, since the carry has to ripple all the way through 32 or 64 bits, which takes a significant amount of time. Therefore, to improve on the performance of the RCA, we introduce the Carry Select Adder.

Carry Select Adder


*figure from lecture note

The Carry Select Adder (CSA) is built on the RCA: the RCA is divided into groups, and each group is computed for the two possible carry-in values 0 and 1. A mux then selects the appropriate result and passes it to the next block. For example, in a 16 bit carry select adder, the adder is divided into 4 bit groups. In each group, the carry out and sum are computed with carry in 1 in one 4 bit RCA and with carry in 0 in another 4 bit RCA, so there are two RCA circuits in each group. The value passed to the next block is decided by the mux. This is faster than the RCA because all the 4 bit groups compute their results simultaneously while waiting for the mux to select the appropriate carry-in signal. The trade-off for this implementation is, of course, the extra area and power needed to achieve the faster speed. The carry delay time for this circuit can be calculated by adding one setup time, one computation time for sum and carry, and the ripple time for passing the value from one mux to the next. The setup and the computation of sum and carry out in each group operate in parallel.

Carry Look Ahead Adder

The last adder, and the most efficient in area, power and speed compared to the other addition circuits, is the Carry Look Ahead Adder (CLA). In the CLA, all the carries are generated using propagate and generate circuits. However, this method is limited by fan-in when calculating all the carries simultaneously. Therefore, we divided the adder into 4 bit groups that calculate their carry out and ripple the carry out from each group to the next. The CLA is by far the most efficient in terms of area, power and speed of all the addition circuits implemented.
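Returning to the ripple carry and carry select delays discussed in this section, a rough side-by-side estimate can be written down. This is a sketch using the common textbook approximation for a carry select adder with equal groups, not a result measured in this project. The symbols T_mux (one multiplexer delay), m (bits per group) and k = n/m (number of groups) are assumptions introduced here; T_setup, T_carry and T_sum follow the ripple carry formula in Section 2.2.

```latex
\begin{align*}
T_{\mathrm{RCA}} &\approx T_{\mathrm{setup}} + (n-1)\,T_{\mathrm{carry}} + T_{\mathrm{sum}} \\
T_{\mathrm{CSA}} &\approx T_{\mathrm{setup}} + m\,T_{\mathrm{carry}} + k\,T_{\mathrm{mux}} + T_{\mathrm{sum}},
  \qquad k = n/m \\[4pt]
\text{For } n = 16,\ m = 4:\quad
T_{\mathrm{RCA}} &\approx T_{\mathrm{setup}} + 15\,T_{\mathrm{carry}} + T_{\mathrm{sum}} \\
T_{\mathrm{CSA}} &\approx T_{\mathrm{setup}} + 4\,T_{\mathrm{carry}} + 4\,T_{\mathrm{mux}} + T_{\mathrm{sum}}
\end{align*}
```

Under these assumptions the carry select adder replaces most of the carry delays with multiplexer delays, at the cost of the duplicated group adders.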


7. Conclusion

In this project we have studied the trade-offs between the Carry Look Ahead, Carry Select and Ripple Carry adders. Our results show that when choosing a design alternative we need to consider all factors, such as yield, area, power and timing; for example, the carry look ahead adder where area matters and the ripple carry adder where timing is less critical. Constraints such as timing, area and power strongly affect synthesis, which can add extra gates or complex gates to meet the requirements. The real performance gain depends heavily on the actual data path, but in some cases it exceeds the limit by a considerable margin. These circuits can still be improved by adding more gates, by pipelining and by other algorithms, but that depends on the customer's requirements and on the block in which the adder is to be placed.


References:

[1] Professor Dr. Thuy T. Le's lecture notes. (* diagrams are taken from notes)
[2] Synopsys tutorial.
[3] I. Koren, Computer Arithmetic Algorithms.
[4] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders," IEEE journal.
[5] Samir Palnitkar, Verilog HDL, Prentice Hall.
[6] Michael D. Ciletti, Advanced Digital Design with the Verilog HDL, PHI publication.