Asynchronous Datapath Design
![Page 1: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/1.jpg)
Asynchronous Datapath Design
• Adders
• Comparators
• Multipliers
• Registers
• Completion Detection
• Bus
• Pipeline
• …
Read Reading 3: Delay-Insensitive Adders
![Page 2: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/2.jpg)
Asynchronous Adder Design
• Motivation
• Background: Sync and Async adders
• Delay-insensitive carry-lookahead adders
• Complexity Analysis
• Conclusions
![Page 3: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/3.jpg)
Motivation
• Integer addition is one of the most important operations in digital computer systems
• Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath.
• In ARM processors the figure reaches 80%.
• The performance of processors is significantly influenced by the speed of their adders.
![Page 4: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/4.jpg)
Background
• Adders: synchronous or asynchronous
• Synchronous adders: worst-case performance
• Asynchronous adders: average-case performance
• For example:
Ripple-Carry Adders (synchronous): O(n)
Carry-Completion Sensing Adders (asynchronous): O(log n) on average
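The O(log n) figure for carry-completion sensing is an average: such an adder finishes when its longest carry chain has settled, and for random operands that chain is short. The Python sketch below is only an illustration. It assumes uniformly random operands and defines a chain as a carry-generating bit followed by the run of bits that propagate it. The names `longest_carry_chain` and `average_chain` are hypothetical helpers and do not come from the slides.

```python
import math
import random

def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Length, in bit positions, of the longest carry chain in the n-bit sum a + b.

    A chain starts at a bit that generates a carry (a_i = b_i = 1) and
    extends through each following bit that propagates it (a_i != b_i).
    """
    longest = run = 0            # run = length of the chain reaching bit i (0: no carry)
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:              # generate: a new chain of length 1 starts here
            run = 1
        elif (ai ^ bi) and run:  # propagate: the incoming carry moves one bit further
            run += 1
        else:                    # kill, or propagate with no carry arriving
            run = 0
        longest = max(longest, run)
    return longest

def average_chain(n: int, trials: int = 2_000, seed: int = 1) -> float:
    """Mean longest carry chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = sum(longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
                for _ in range(trials))
    return total / trials

for n in (8, 16, 32, 64):
    print(f"n={n:2d}  mean longest chain={average_chain(n):.2f}  log2(n)={math.log2(n):.0f}")
```

The printed means grow by roughly one bit each time n doubles. This is the logarithmic growth that a completion-sensing adder exploits, whereas a synchronous RCA must always budget for a chain of length n.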
![Page 5: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/5.jpg)
Background: Binary Addition
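• Worst case: 00000001 + 11111111 = 100000000. The sum bits are S = 00000000 and the carry bits are C = 11111111, so the carry ripples through all 8 bit positions.
• Asynchronous adders can exploit average-case behavior, because most operand pairs fall between these two extremes.
• Best case: 00000000 + 00000000 = 000000000. The sum bits are S = 00000000 and the carry bits are C = 00000000, so no carry is generated at all.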
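As a quick check, the short Python sketch below reproduces the S (sum) and C (carry-out per bit) rows of the two 8-bit examples. The helper name `sum_and_carry_bits` is illustrative only.

```python
def sum_and_carry_bits(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Return (S, C, a + b): S holds the n sum bits, C the carry out of each bit."""
    s = c = carry = 0
    for i in range(n):                       # least significant bit first
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
        c |= carry << i
    return s, c, s | (carry << n)

for a, b in [(0b00000001, 0b11111111), (0b00000000, 0b00000000)]:
    s, c, r = sum_and_carry_bits(a, b, 8)
    print(f"S {s:08b}  C {c:08b}  result {r:09b}")
# S 00000000  C 11111111  result 100000000
# S 00000000  C 00000000  result 000000000
```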
![Page 6: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/6.jpg)
Background
• Ripple-Carry Adders:
• One-stage full adder:
• Logic complexity: O(n)
• Time complexity: O(n)
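A behavioral sketch of an n-bit ripple-carry adder in Python follows, built as a chain of one-stage full adders. Both function names are illustrative. Each stage adds one full adder, which gives O(n) logic. Each stage must also wait for the carry from the stage below, so the time is O(n).

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-stage full adder on single bits: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> int:
    """n-bit ripple-carry adder: n full adders, each waiting on the previous carry.

    One full adder per bit gives O(n) logic; because stage i needs the carry
    from stage i-1, the critical path crosses up to n stages, so time is O(n).
    """
    total, carry = 0, cin
    for i in range(n):                      # least significant bit first
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total | (carry << n)             # carry-out becomes bit n

print(ripple_carry_add(0b00000001, 0b11111111, 8))  # 256
```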
![Page 7: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/7.jpg)
Background
• Carry-Sensing Completion Detection (CSCD) Adders, the asynchronous version of the RCA:
![Page 8: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/8.jpg)
Background
• One-stage CSCD Adder:
• Carry-Sensing Completion Detection Adders:
Logic complexity: O(n); time complexity: O(log n) on average
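The Python model below sketches why a CSCD adder's time tracks the carry chain rather than n. It is a simplified behavioral model that assumes every stage takes exactly one unit delay, whereas real stage delays vary, which is the reason completion sensing is needed at all. Each stage's carry-out starts unresolved. Stages whose operand bits are equal (kill or generate) resolve in the first step, and propagating stages resolve one step after their carry-in does. The function name `cscd_add` is hypothetical.

```python
import random

def cscd_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Return (a + b + cin, number of unit delays until all carries are resolved)."""
    carry = [None] * n                       # None = carry-out not yet known
    steps = 0
    while None in carry:
        steps += 1
        prev = carry[:]                      # every stage sees last step's values
        for i in range(n):
            if prev[i] is not None:
                continue
            ai, bi = (a >> i) & 1, (b >> i) & 1
            c_in = cin if i == 0 else prev[i - 1]
            if ai == bi:                     # kill (0,0) or generate (1,1): no need to wait
                carry[i] = ai
            elif c_in is not None:           # propagate: pass the arriving carry along
                carry[i] = c_in
    total = 0
    for i in range(n):
        c_in = cin if i == 0 else carry[i - 1]
        total |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ c_in) << i
    return total | (carry[n - 1] << n), steps

rng = random.Random(0)
samples = [cscd_add(rng.getrandbits(32), rng.getrandbits(32), 32)[1] for _ in range(1000)]
print(sum(samples) / len(samples))  # mean delay for 32-bit random operands
```

The worst-case operands 00000001 + 11111111 need 8 steps and 00000000 + 00000000 needs only 1. For random 32-bit operands, the mean is a handful of steps rather than 32.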
![Page 9: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/9.jpg)
Background
• Delay-Insensitive Ripple-Carry Adders (DIRCA), the DI version of the RCA:
![Page 10: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/10.jpg)
Background
• One-stage DIRCA:
• DIRCA Adders:
Logic complexity: O(n); time complexity: O(log n) on average
• One of the most robust adders
![Page 11: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/11.jpg)
Background
• Completion detection for asynchronous adders:
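One common form of completion detection for dual-rail outputs ORs the two rails of each bit and then AND-reduces those signals over all bits. The Python sketch below assumes the (A1, A0) dual-rail convention used later in these slides and uses illustrative names. It models only the rising (data-valid) phase. QDI designs typically build the reduction tree from Muller C-elements so that the return to the no-data state is detected as well.

```python
from typing import Sequence, Tuple

DualRail = Tuple[int, int]   # (A1, A0): (0,0) no data, (0,1) valid 0, (1,0) valid 1

def bit_valid(bit: DualRail) -> int:
    a1, a0 = bit
    return a1 | a0               # one OR gate per bit: high once either rail fires

def completion_detect(bits: Sequence[DualRail]) -> int:
    """1 when every dual-rail output bit carries valid data.

    The per-bit OR signals are AND-reduced pairwise, level by level, so the
    detector's depth grows as about log2(n).
    """
    level = [bit_valid(b) for b in bits]
    while len(level) > 1:
        level = [level[i] & level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0] if level else 1

print(completion_detect([(0, 1), (1, 0), (0, 0)]))  # 0: the last bit has no data yet
print(completion_detect([(0, 1), (1, 0), (1, 0)]))  # 1: all bits valid
```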
![Page 12: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/12.jpg)
Background
• DI adder vs. bundling-constraint adder:
![Page 13: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/13.jpg)
Carry-Lookahead Adders
• An RCA requires n stage-propagation delays in the worst case.
• For high-speed processors, this scheme is undesirable.
• One way to improve adder performance is to compute the carries in parallel.
• This motivates Carry-Lookahead Adders (CLAs).
• CLAs:
Logic complexity: O(n); time complexity: O(log n)
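One way to get O(n) logic and O(log n) time is a binary tree over per-bit generate/propagate pairs. An up-sweep computes each subtree's group (G, P), and a down-sweep hands each subtree the carry entering its lowest bit. The Python sketch below follows that tree scheme. The leaf and internal nodes play roles similar to the A and B modules, but that mapping and all names here are assumptions, not the slides' exact circuits. A tree over n leaves has n − 1 internal nodes, and each sweep crosses about log2 n levels. Python evaluates the nodes one at a time, so the depth argument applies to the hardware tree.

```python
def cla_add(a: int, b: int, n: int, cin: int = 0) -> int:
    """n-bit carry-lookahead addition on a binary tree of (G, P) nodes (n >= 1)."""
    a_bits = [(a >> i) & 1 for i in range(n)]       # least significant bit first
    b_bits = [(b >> i) & 1 for i in range(n)]
    node = {}                                        # (lo, hi) -> (G, P) of bits lo..hi-1

    def up(lo: int, hi: int) -> tuple[int, int]:
        """Up-sweep: group generate/propagate of bits lo..hi-1."""
        if hi - lo == 1:                             # leaf: per-bit g and p
            node[(lo, hi)] = (a_bits[lo] & b_bits[lo], a_bits[lo] ^ b_bits[lo])
        else:
            mid = (lo + hi) // 2
            g_lo, p_lo = up(lo, mid)
            g_hi, p_hi = up(mid, hi)
            node[(lo, hi)] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
        return node[(lo, hi)]

    carries = [0] * (n + 1)                          # carries[i] = carry into bit i

    def down(lo: int, hi: int, c: int) -> None:
        """Down-sweep: hand each subtree the carry entering its lowest bit."""
        carries[lo] = c
        if hi - lo > 1:
            mid = (lo + hi) // 2
            g_lo, p_lo = node[(lo, mid)]
            down(lo, mid, c)
            down(mid, hi, g_lo | (p_lo & c))         # carry into the upper half

    g_all, p_all = up(0, n)
    down(0, n, cin)
    carries[n] = g_all | (p_all & cin)               # carry-out
    total = 0
    for i in range(n):
        total |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i
    return total | (carries[n] << n)

print(cla_add(0b00000001, 0b11111111, 8))  # 256
```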
![Page 14: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/14.jpg)
Carry-Lookahead Adders
![Page 15: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/15.jpg)
Carry-Lookahead Adders
• A module:
• B module:
![Page 16: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/16.jpg)
DI Carry-Lookahead Adders
• Delay-Insensitive Carry-Lookahead Adders (DICLA) can be implemented using delay-insensitive codes:
1. Dual-rail signaling: inputs, sums, and carry bits
   a. No data: A1=0, A0=0
   b. Valid 0: A1=0, A0=1
   c. Valid 1: A1=1, A0=0
   d. Illegal: A1=1, A0=1
2. One-hot code: internal signals
   a. No data: 000
   b. 001
   c. 010
   d. 100
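The Python decoders below sketch both codes. A dual-rail bit uses two wires (A1, A0) in which exactly one wire rises to signal a value. A one-hot code uses several wires in which exactly one is high, and all-low means no data. The function names are illustrative, and the decoder deliberately does not assume which one-hot position stands for which internal signal.

```python
def decode_dual_rail(a1: int, a0: int):
    """Decode one dual-rail bit given its rails (A1, A0)."""
    if (a1, a0) == (0, 0):
        return None                     # no data
    if (a1, a0) == (0, 1):
        return 0                        # valid 0
    if (a1, a0) == (1, 0):
        return 1                        # valid 1
    raise ValueError("illegal dual-rail code: A1 = A0 = 1")

def encode_dual_rail(value: int) -> tuple[int, int]:
    """Rails (A1, A0) for a valid bit value."""
    return (1, 0) if value else (0, 1)

def decode_one_hot(wires: tuple[int, ...]):
    """Index of the single high wire, or None when all wires are low (no data)."""
    high = [i for i, w in enumerate(wires) if w]
    if len(high) > 1:
        raise ValueError(f"illegal one-hot code: {wires}")
    return high[0] if high else None

print(decode_dual_rail(*encode_dual_rail(1)))  # 1
print(decode_one_hot((0, 1, 0)))               # 1
print(decode_one_hot((0, 0, 0)))               # None
```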
![Page 17: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/17.jpg)
QDI Carry-Lookahead Adders
• DI C module:
1. Internal signals: one-hot code, k, g, p
2. input and sum bits: dual-rail signals
CLA A module
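The Python sketch below shows one plausible set of Boolean functions for such a per-bit module. It takes dual-rail operand bits and produces the one-hot (k, g, p) signals. It also produces a dual-rail sum bit once the carry-in is valid. The slides' actual gate-level C module is not reproduced here, and the function names are hypothetical. Every output stays at no data (all zeros) until all the inputs it depends on are valid.

```python
DualRail = tuple[int, int]     # (x1, x0): (0,0) no data, (0,1) valid 0, (1,0) valid 1

def c_module_kgp(a: DualRail, b: DualRail) -> tuple[int, int, int]:
    """One-hot (k, g, p) for one bit position; all zero until both inputs are valid."""
    a1, a0 = a
    b1, b0 = b
    k = a0 & b0                    # both 0: carry kill
    g = a1 & b1                    # both 1: carry generate
    p = (a1 & b0) | (a0 & b1)      # operands differ: carry propagate
    return k, g, p

def c_module_sum(kgp: tuple[int, int, int], c: DualRail) -> DualRail:
    """Dual-rail sum bit s = a XOR b XOR c, produced only once k/g/p and c are valid."""
    k, g, p = kgp
    c1, c0 = c
    same = k | g                   # a == b, so s = c
    return (same & c1) | (p & c0), (same & c0) | (p & c1)

print(c_module_kgp((1, 0), (0, 1)))  # (0, 0, 1): 1 + 0 propagates
```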
![Page 18: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/18.jpg)
QDI Carry-Lookahead Adders
• DI D module:
1. Internal signals: one-hot code, K, G, P
2. Carry bits: dual-rail signals
CLA B module
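Below is a Python sketch of the logic a tree node of this kind could compute. It merges the one-hot (K, G, P) of two adjacent blocks and forms the dual-rail carry leaving the lower block. The formulas are the standard kill/generate/propagate composition, and the function names are hypothetical. The sketch covers only the evaluation logic and omits the acknowledgement and reset handling that a real QDI circuit needs. If a block kills or generates, its carry-out becomes valid before its carry-in arrives.

```python
from itertools import product

KGP = tuple[int, int, int]   # one-hot (k, g, p); (0, 0, 0) = no data
DualRail = tuple[int, int]   # (c1, c0)

def d_combine(hi: KGP, lo: KGP) -> KGP:
    """Group (K, G, P) of two adjacent blocks; `hi` is the more significant one."""
    k_hi, g_hi, p_hi = hi
    k_lo, g_lo, p_lo = lo
    return (k_hi | (p_hi & k_lo),    # block kills: hi kills, or hi propagates a kill
            g_hi | (p_hi & g_lo),    # block generates: hi generates, or propagates a generate
            p_hi & p_lo)             # block propagates its carry-in

def d_carry(lo: KGP, cin: DualRail) -> DualRail:
    """Dual-rail carry out of block `lo` (into the next block) given its carry-in."""
    k, g, p = lo
    c1, c0 = cin
    return (g | (p & c1), k | (p & c0))

# Every pair of valid one-hot inputs yields a valid one-hot output.
ONE_HOT = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # kill, generate, propagate
for hi, lo in product(ONE_HOT, repeat=2):
    assert sum(d_combine(hi, lo)) == 1
print(d_carry((0, 1, 0), (0, 0)))  # (1, 0): a generating block needs no carry-in
```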
![Page 19: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/19.jpg)
DI Carry-Lookahead Adders
![Page 20: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/20.jpg)
DI Carry-Lookahead Adders
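If A3 = B3, then C3 is a carry kill (k3, when both bits are 0) or a carry generate (g3, when both bits are 1), so it is known without waiting for the lower bits.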
![Page 21: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/21.jpg)
DI Carry-Lookahead Adders
The group signals G3,2 and K3,2 can also be used to speed up the carry computation.
(Figure labels: k3, g3; K3,2, G3,2)
![Page 22: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/22.jpg)
Speeding Up DICLA
• Idea: send the carry-generate and carry-kill signals to every stage that needs this information, so that those stages can compute their carries immediately.
• D module with speed-up circuitry
![Page 23: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/23.jpg)
Speeding Up DICLA
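• General form:
• D module with speed-up circuitry
for carry-kill: $k_{j-1} + k_{j-2}\,p_{j-1} + \cdots + k_0\,p_1 p_2 \cdots p_{j-1}$
for carry-generate: $g_{j-1} + g_{j-2}\,p_{j-1} + \cdots + g_0\,p_1 p_2 \cdots p_{j-1}$
This is in fact the full carry-lookahead scheme.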
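The Python sketch below evaluates the carry-generate sum of products directly and checks it against the serial recurrence $c_{i+1} = g_i + p_i c_i$ with $c_0 = 0$. The function names and the random operands are illustrative. The code also makes the cost visible, because carry j needs an OR with j inputs and a widest AND term of j inputs.

```python
import random

def lookahead_carry(g: list[int], p: list[int], j: int) -> int:
    """Carry into stage j from the flat sum of products
    g[j-1] + g[j-2]p[j-1] + ... + g[0]p[1]...p[j-1]  (no external carry-in)."""
    carry = 0
    for i in range(j):                 # one product term per generating stage i
        term = g[i]
        for m in range(i + 1, j):      # ...that must propagate through stages i+1..j-1
            term &= p[m]
        carry |= term
    return carry

def ripple_carry(g: list[int], p: list[int], j: int) -> int:
    """Same carry computed serially: c_{i+1} = g_i + p_i c_i, starting from c_0 = 0."""
    c = 0
    for i in range(j):
        c = g[i] | (p[i] & c)
    return c

rng = random.Random(0)
a, b = rng.getrandbits(16), rng.getrandbits(16)
g = [((a >> i) & (b >> i)) & 1 for i in range(16)]
p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
print(all(lookahead_carry(g, p, j) == ripple_carry(g, p, j) for j in range(17)))  # True
```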
![Page 24: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/24.jpg)
Speeding Up DICLA
• Problems of the full carry-lookahead scheme:
• Practical limitations on fan-in and fan-out, an irregular structure, and many long wires
• Logic complexity increases more than linearly
• Solution: exploit the properties of a tree-like structure
• New speed-up circuitry:
![Page 25: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/25.jpg)
• SP (the speed-up circuitry) focuses on the root node of a subtree.
• All leftmost root nodes of its right subtree
![Page 26: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/26.jpg)
Power of Speed-up Circuitry
x: carry chain; x′: its part in the r subtree; x − x′: its part in the l subtree
![Page 27: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/27.jpg)
Power of Speed-up Circuitry
Without Speed-up circuitry
![Page 28: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/28.jpg)
Power of Speed-up Circuitry
With Speed-up circuitry
![Page 29: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/29.jpg)
Optimization:
• Simplified D module
• Simplified D’ module
• Better logic complexity
• Delay-insensitive again
![Page 30: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/30.jpg)
![Page 31: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/31.jpg)
Complexity Analysis
• DICLASP
• Logic complexity: Θ(n)
• Time complexity: Θ(log log n) on average
• Best area-time efficiency: Θ(n log log n)
![Page 32: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/32.jpg)
Complexity Analysis
![Page 33: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/33.jpg)
CMOS: C module
![Page 34: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/34.jpg)
CMOS: SD module
![Page 35: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/35.jpg)
CMOS: SD’ module
![Page 36: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/36.jpg)
SPICE Simulation:
The SPICE simulation has two parts:
• Random inputs: 10,000 randomly generated input pairs
• Statistical data: example programs run on a 32-bit ARM emulator
![Page 37: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/37.jpg)
SPICE Simulation:
• Random number input distribution
![Page 38: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/38.jpg)
SPICE Simulation:
• SPICE simulation results: random number inputs
• Speedup: DIRCA vs. RCA: 6.39; DICLASP vs. CLA: 2.64
![Page 39: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/39.jpg)
SPICE Simulation:
• Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2 and Espresso dc2) on a 32-bit ARM simulator
![Page 40: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/40.jpg)
SPICE Simulation: dynamic traces
![Page 41: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/41.jpg)
SPICE Simulation:
• Dynamic traces
• 83.92% of the instructions have a carry chain shorter than 17 bits (|carry chain| < 17)
![Page 42: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/42.jpg)
SPICE Simulation:
• SPICE simulation results: dynamic traces
• Average computation time: DIRCA 9.61 ns, DICLASP 5.25 ns
• Speedup: DIRCA vs. RCA: 4.1; DICLASP vs. CLA: 2.2
![Page 43: Asynchronous Datapath Design](https://reader035.vdocuments.site/reader035/viewer/2022062809/56815809550346895dc578b5/html5/thumbnails/43.jpg)
Conclusion
• DICLASP: best area-time efficiency, Θ(n log log n)
Correctness: no adder is more robust than DICLASP.
Cost (logic complexity): no parallel adder is cheaper than DICLASP (Θ(n)).
Speed (time complexity): no adder is better than DICLASP (Θ(log log n)).
Suitable for VLSI implementation.