low power asynchronous circuit design - ieee cas · low power asynchronous-logic circuit design by...
TRANSCRIPT
Low Power Asynchronous-Logic Circuit Design
by
Bah-Hwee Gwee
Associate Professor, Nanyang Technological University
Senior Consultant, Digital Hearing Aid Centre
Founder & Director, Advanced Electroacoustics Pte Ltd
Outline
• Motivations
• Background
• Asynchronous-logic
• Our Group and Projects
• Conclusions
• Future Projects
Motivations
1. Huge Demand for Biomedical Electronics
• Human needs due to health, aging, and better quality of life (QoL) considerations
• Wide range of applications, e.g. hearing aids (instruments), EEG, health-monitoring bio-sensors, wearable biomedical systems, implantable devices including pacemakers, vision processors (cornea), etc.
• Sales – estimated US $66b in 2010 (*INEMI 2007)
*INEMI – International Electronic Manufacturing Initiatives
2. Power-/Energy-Critical Requirements for Biomedical Devices
• Long battery life-span (e.g. pacemaker 10-12 years)
• Low energy capacities (e.g. < 1mA for hearing aids)
• Associated small size requirement (e.g. miniature battery is used)
3. Reliability Issues
• Operation robustness
• Insensitive to variations due to process, voltage, and temperature (PVT)
• Reduced timing assumptions (or clock issues)
4. Need for Ultra Low Power Digital Circuits
• ASIC designs
• Microprocessors and DSPs
Background
Technology Roadmap (2004)
[Figure: power density, clock frequency, number of cores, and process variations plotted against time/process node (180nm, 130nm, 90nm, 65nm, 45nm). Dotted lines: 'Power Aware' scaling. Solid lines: classical scaling.]
Background
Computing Trend and Challenges
High-performance computing → Ubiquitous computing
• Ultra High-performance: PCs, Servers, Network Routers, etc.
• High-performance: PCs, notebooks, DVDs, Imaging Processors, Cellular Phones, etc.
• Low Power Dissipation: Audio Devices, Portable Consumer Electronics, Controllers, etc.
• Ultra Low Power Dissipation: Bio-medical Devices, Hearing Aids, Sensors, etc.
Two common challenges for ultra low power devices:
(1) Limited energy capacities (small power density)
(2) Process, temperature, voltage (PVT) variabilities
Background
Some energy source examples

Battery   Capacity    Application    Duration    Power (current draw)
A13       100 mA.h    Hearing aids   100 hours   < 1 mA
UB6120    12 A.h      Pacemaker      10 years    < 0.15 mA

[Photos: A13 (hearing aids); UB6120 (pacemakers)]

Other Energy Source             Power Density
Solar cell (indoor)             3.2 μW/cm2
Solar cell (direct sunshine)    3700 μW/cm2
Vibrations                      5 – 500 μW/cm2

Energy is limited!!
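The current figures in the battery table follow from dividing capacity by the target duration. A minimal sketch of that arithmetic, assuming a constant average load and a 365-day year (both simplifications that the slide does not state):

```python
# Average current a battery can sustain over a target lifetime:
# I_avg = capacity / duration. Capacities and durations come from the table.
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, constant load


def average_current_ma(capacity_mah: float, duration_hours: float) -> float:
    """Return the average current budget in mA."""
    return capacity_mah / duration_hours


hearing_aid_ma = average_current_ma(100.0, 100.0)                  # A13
pacemaker_ma = average_current_ma(12_000.0, 10 * HOURS_PER_YEAR)   # UB6120

print(f"A13 hearing aid budget:  {hearing_aid_ma:.2f} mA")
print(f"UB6120 pacemaker budget: {pacemaker_ma:.3f} mA")
```

Both results sit at or just under the "<1mA" and "<0.15mA" limits quoted in the table.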
Background
Process variations
[Figure: probability distributions of speed for the 180nm, 130nm, and 90nm processes, with the mean µ and the −1σ point marked. Margins are shown for 84.1% yield.]
45nm CMOS transistors can switch faster... But higher margins are required!!
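The 84.1% yield figure matches a one-sigma margin if speed is assumed to follow a normal distribution: the fraction of dies at least as fast as µ − 1σ is Φ(1). A small sketch under that Gaussian assumption (the function names are illustrative, not from the slides):

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def yield_for_margin(k_sigma: float) -> float:
    """Fraction of dies at least as fast as (mu - k_sigma * sigma),
    assuming normally distributed speed."""
    return normal_cdf(k_sigma)


yield_1_sigma = yield_for_margin(1.0)
print(f"Yield with a 1-sigma margin: {yield_1_sigma:.1%}")
```

A wider spread (larger σ) at newer process nodes therefore needs a larger absolute margin to hold the same yield.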
Background
Timing margin
[Figure: timing margins annotated +45%, +25%, +30%, +10%, +20%.]
More and more timing margins required due to complexity of the design!!
Asynchronous-Logic (Async-Logic) : Review
Bye Bye
Async-logic : Review
Brief Time Line
1950s – Digital circuits were implemented (either async or sync)
1960s – Sync digital circuits became dominant
1989 – • Sutherland (Sun Microsystems) reignited interest in async designs
• First async microprocessor was designed (Martin, Caltech)
1990s – • Many university research groups focused on async designs
• Big firms (Intel, Philips, IBM, etc.) joined in
• Philips introduced the first commercial async chip in pagers
• Start-up company: Theseus Logic
2000s – Start-up companies: Fulcrum Microsystems, Handshake Solutions and Self-Timed Solutions
Future – An alternative solution in many ICs, as predicted by the International Technology Roadmap for Semiconductors
Async-logic : Review
Fundamental difference between sync and async approaches
Sync (clock-based) approach VS Async (handshake) approach
Async-logic : Review
Sync operation
[Figure: synchronous pipeline. Data flows from DataIn to DataOut through alternating Register and Logic blocks; the registers are driven by CLK.]
Async-logic : Review
Async operation
[Figure: asynchronous pipeline. Registers are controlled by handshake control circuits (HCC*) connected through delay lines, with REQ in / ACK in and REQ out / ACK out handshake signals.]
*HCC: Handshake control circuits
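The REQ/ACK exchange between neighbouring stages can be sketched as a 4-phase (return-to-zero) handshake, the protocol named on the later project slides. The model below is a sequential illustration only: the function name and trace format are hypothetical, and real handshake control circuits run concurrently.

```python
def four_phase_transfer(word, trace):
    """Move one word across a channel using a 4-phase handshake.

    Each trace entry records (event, req, ack) after the event.
    """
    req, ack = 0, 0
    data_bus = word                    # sender drives the data wires
    req = 1                            # phase 1: sender raises request
    trace.append(("req+", req, ack))
    latched = data_bus                 # receiver captures the data
    ack = 1                            # phase 2: receiver raises acknowledge
    trace.append(("ack+", req, ack))
    req = 0                            # phase 3: sender withdraws request
    trace.append(("req-", req, ack))
    ack = 0                            # phase 4: receiver withdraws acknowledge
    trace.append(("ack-", req, ack))
    return latched


trace = []
received = [four_phase_transfer(w, trace) for w in (0x12, 0x34)]
for event, req, ack in trace:
    print(f"{event:5s} req={req} ack={ack}")
```

Both wires return to zero after every transfer, so no global clock is needed to mark when the next word may be sent.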
Async-logic : Review
Good and Bad for async-logic
Good (Potential advantages)
1. No (or little) clock-related issues (e.g. clock skews, etc.)
2. Low power dissipation
3. Faster speed performance
4. Robust towards PVT variations
5. High modularity
6. Many more …
Bad (so far)
1. Lack of EDA tools
2. Lack of methodologies (educational training is insufficient)
3. Some bad design experiences (in the past)
4. Difficulty in testing
Our Group
NTU Team:
1. Dr Joseph S. Chang (PI)
2. Dr Gwee Bah Hwee (PI)
3. Dr Chong Kwen Siong (Research Scientist)
4. Law Chong Fatt (Research Associate, PhD thesis under examination)
5. Chang Kok Leong (PhD, A*Star Scholarship)
6. Shi Yiqiong (PhD, NTU Scholarship)
7. Lin Tong (PhD, President Scholarship)
Collaborators:
1. Prof. Alain Martin (Caltech, USA)
2. Prof. Lars Wanhammar (Linköping Uni, Sweden)
3. DSO (Temasek Laboratories @ NTU)
4. Prof Ser Wei (CSP @ NTU)
Our Projects
Primary Application: Digital Hearing Aid (Instrument)
[Figure: general block diagram in a hearing aid. Blocks: Input Signal, Filter Bank (Spectrum Analyzer), DSP (noise reduction and signal amplification), Microcontroller (Interfacing), Output Signal.]
Project 1: Async FFT Processor
Objective: To develop a low power async Fast Fourier Transform (FFT) processor (as a filter bank) for digital hearing aid applications
FFT Algorithm
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$$
$$W_N^{nk} = e^{-j 2\pi nk/N} = \cos(2\pi nk/N) - j\sin(2\pi nk/N)$$
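As a reference point, the transform definition can be evaluated directly before any FFT restructuring. A minimal sketch (the function name `dft` is illustrative; this is the O(N²) definition, not the processor's algorithm):

```python
import cmath


def dft(x):
    """Direct evaluation of X(k) = sum_{n=0}^{N-1} x(n) * W_N^{nk},
    with W_N^{nk} = exp(-j * 2 * pi * n * k / N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


# A unit impulse has a flat spectrum: every X(k) equals 1.
spectrum = dft([1, 0, 0, 0])
print(spectrum)
```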
Radix-2 butterfly (core computation for FFT)
[Figure: radix-2 butterfly. Inputs a = R_a + jI_a and b = R_b + jI_b, outputs a' = R_a' + jI_a' and b' = R_b' + jI_b', twiddle factor W = C + j(−S). The datapath shows a complex multiplier (×), adders (+), and a −1 branch.]
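The butterfly can be sketched in software as follows. This assumes the common decimation-in-time form (a' = a + W·b, b' = a − W·b), which the extracted figure does not fully confirm; the recursive driver is for illustration only and is not the processor's data flow.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly, decimation-in-time form (an assumption):
    returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # W_N^k = C + j(-S)
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


result = fft_radix2([1, 2, 3, 4])
print(result)
```

Every stage reuses the same butterfly, which is why a small fixed set of multipliers and adders can serve the whole transform.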
Project 1: Async FFT Processor
Basic Features:
• 16-bit data, 128-point radix-2 format
• Semi-custom/full-custom approaches
• Commercial EDA tools used
• Async state-machine concepts (for controllers) with async handshake
• Ultra low power library cells developed (for datapaths)
• Fine-grain gating (innately controlled by async handshake)
• 4-phase, hybrid single-rail & dual-rail implementation
• Simple data flows with 2 multipliers, 3 adders (see data flows below)
Project 1: Async FFT Processor
Block diagram of the async FFT processor
Project 1: Async FFT Processor
IC realization for the proposed async FFT processor
IC realization for the benchmarked sync FFT processor
Project 1: Async FFT Processor
Result: Energy dissipation per FFT operation from 1.1V to 1.4V
[Figure: energy per FFT operation (axis from 0 to 400) versus supply voltage (1.1V, 1.2V, 1.3V, 1.4V) for the sync FFT processor and the async FFT processor. The async FFT processor shows ~40% lower energy.]
Project 1: Async FFT Processor
Result: Characteristics of the FFT Processors

           Benchmarked sync FFT processor   Proposed async FFT processor
Delay*     ~ 1.215 ms @ 1MHz                ~ 1.2 ms
Energy*    ~ 188 nJ                         ~ 120 nJ
IC Area    ~ 1.44mm2 @ 0.35um CMOS          ~ 1.6mm2 @ 0.35um CMOS

* Based on one complete FFT computation @ 1.1V
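The table entries can be turned into relative figures with a few lines of arithmetic. The sketch below uses only the values quoted above; the dictionary keys and helper name are illustrative.

```python
# Values at 1.1V, taken from the characteristics table.
sync_fft = {"delay_ms": 1.215, "energy_nj": 188.0, "area_mm2": 1.44}
async_fft = {"delay_ms": 1.2, "energy_nj": 120.0, "area_mm2": 1.6}


def relative_change(new: float, old: float) -> float:
    """Signed fractional change of `new` with respect to `old`."""
    return (new - old) / old


energy_saving = -relative_change(async_fft["energy_nj"], sync_fft["energy_nj"])
area_overhead = relative_change(async_fft["area_mm2"], sync_fft["area_mm2"])

print(f"Energy saving at 1.1V: {energy_saving:.1%}")
print(f"Area overhead:         {area_overhead:.1%}")
```

At 1.1V the saving works out to roughly 36%, in line with the "~40% lower energy" annotation on the energy plot, at the cost of about 11% more area.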
Our Project 2: Async 8051 Microcontroller
Objective: To develop a low power async 8051 microcontroller for digital hearing aid applications
Basic features:
• Harvard Architecture
• 1-bit operations
• Abundant development tools and benchmarks
• Non-orthogonal instruction set (different instruction lengths and cycle times)
• Control-dominated applications
• 8051 (a CISC) contains numerous exceptions, which must be traded off by margining when designing with synchronous logic; this is not the case for asynchronous logic
• 8051 core is still a popular processor for ubiquitous computing
Project 2: Async 8051 Microcontroller
Methodology adopted (based in part on the async Balsa tool)
[Figure: design flow. Balsa Framework Compiler with Balsa handshake cells and a standard cell library (.lef, .lib); Cadence First Encounter place and route; parasitic information; GDSII; transistor-level simulation with Synopsys Nanosim; LVS and DRC checks in the Cadence Design Framework. Arrows denote forward flow and back annotation.]
Project 2: Async 8051 Microcontroller
Block Diagram of the async 8051 microcontroller
[Figure: block diagram showing the 4kByte ROM, the 128Byte RAM, and the main pipeline.]
Project 2: Async 8051 Microcontroller
Read-Only-Memory IP Interface
[Figure: a synchronous 4k×8 read-only memory (ROM) with ports CEN, CLK, A[11:0], and Q[7:0], wrapped by interface circuits (blocks labelled C) that expose the 12-bit channel addr (addr_d0[11..0], addr_d1[11..0], addr_a) and the 8-bit channel Q (Q_d0[7..0], Q_d1[7..0], Q_a).]
• The channels addr and Q are asynchronous
• Interface circuits are added to 'synchronize' the asynchronous channels
• Timing assumption for addr_a+ → Q_a+ (α)
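The paired `_d0`/`_d1` signal names suggest dual-rail encoding, where each bit uses one wire for logic 0 and one for logic 1, and all wires low acts as a spacer between values. That reading is an assumption; the sketch below illustrates generic dual-rail encoding with hypothetical helper names, not the actual interface circuit.

```python
def encode_dual_rail(value: int, width: int):
    """Return (d0, d1) rail lists, LSB first.
    d1[i] is high for a 1 bit, d0[i] is high for a 0 bit."""
    d1 = [(value >> i) & 1 for i in range(width)]
    d0 = [1 - bit for bit in d1]
    return d0, d1


def is_spacer(d0, d1) -> bool:
    """All rails low: no data present on the channel."""
    return not any(d0) and not any(d1)


def is_complete(d0, d1) -> bool:
    """Every bit has exactly one rail high: the word is valid."""
    return all(a ^ b for a, b in zip(d0, d1))


def decode_dual_rail(d0, d1) -> int:
    if not is_complete(d0, d1):
        raise ValueError("word is not yet valid")
    return sum(bit << i for i, bit in enumerate(d1))


addr_d0, addr_d1 = encode_dual_rail(0xABC, 12)   # 12-bit address channel
print(decode_dual_rail(addr_d0, addr_d1) == 0xABC)
```

Because validity is visible on the data wires themselves, a completion detector can generate the acknowledge without a separate timing reference.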
Project 2: Async 8051 Microcontroller
Random-Access-Memory IP Interface
[Figure: an asynchronous 128×8 random-access memory (RAM) with the 16-bit channel RAMctrl (RAMctrl_d0[15..0], RAMctrl_d1[15..0], RAMctrl_a) and the 8-bit channel Q (Q_d0[7..0], Q_d1[7..0], Q_a), built around a synchronous 128×8 RAM with ports CEN, CLK, A[6:0], D[7:0], WEN, and Q[7:0] plus interface circuits (blocks labelled C). The slices RAMctrl_d1[15..9], RAMctrl_d1[8..1], and RAMctrl_d1[0] are drawn separately; their widths of 7, 8, and 1 bits match A, D, and WEN. Signal transitions shown: RAMctrl_d+, RAMctrl_a+, RAMctrl_d−, RAMctrl_a−, Q_d+, Q_a+, Q_d−, Q_a−, with branches for WEN=1 and WEN=0.]
• The channels RAMctrl and Q are asynchronous
• A, D and WEN are bundled in RAMctrl
• Timing assumption for RAMctrl_a+ → Q_a+ (α) during the read cycle
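Bundling A, D and WEN into the 16-bit RAMctrl word can be sketched as simple bit packing. The field positions below (address in bits 15..9, data in bits 8..1, WEN in bit 0) are inferred from the slice widths in the figure and should be read as an assumption; the function names are hypothetical.

```python
A_WIDTH, D_WIDTH = 7, 8   # A[6:0] and D[7:0] of the 128x8 RAM


def pack_ramctrl(addr: int, data: int, wen: int) -> int:
    """Bundle address, data, and write-enable into one 16-bit word
    (assumed layout: [15..9] = A, [8..1] = D, [0] = WEN)."""
    if not 0 <= addr < (1 << A_WIDTH):
        raise ValueError("address out of range")
    if not 0 <= data < (1 << D_WIDTH):
        raise ValueError("data out of range")
    if wen not in (0, 1):
        raise ValueError("wen must be 0 or 1")
    return (addr << 9) | (data << 1) | wen


def unpack_ramctrl(word: int):
    """Split a 16-bit RAMctrl word back into (addr, data, wen)."""
    return (word >> 9) & 0x7F, (word >> 1) & 0xFF, word & 1


word = pack_ramctrl(addr=0x55, data=0xAA, wen=1)
print(f"{word:#06x}", unpack_ramctrl(word))
```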
Project 2: Async 8051 Microcontroller
Results: Energy × Delay² (Et²) on different designs
[Figure: two Et² comparisons. Matched-Delay: Proposed 8051 [2], HT80C51 [3], Synopsys DW8051 [1]. QDI: Proposed 8051 [2], Lutonium [4], Synopsys DW8051 [1].]
[1] "DesignWare Library - DW8051 MacroCell," https://www.synopsys.com/dw/doc.php/ds/i/DW8051.pdf, 2008.
[2] K. L. Chang and B. H. Gwee, "A low-energy low-voltage asynchronous 8051 microcontroller core," in IEEE International Symposium on Circuits and Systems, pp. 3181-3184, 2006.
[3] "HT80C51 microcontroller," http://www.handshakesolutions.com/products_services/HT-80C51/Index.html, 2008.
[4] A. J. Martin, et al., "The Lutonium: a sub-nanojoule asynchronous 8051 microcontroller," in International Symposium on Asynchronous Circuits and Systems, pp. 14-23, 2003.
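One reason Et² is a convenient figure of merit is that it stays roughly constant when the supply voltage changes. The sketch below shows this under an assumed first-order model (energy proportional to V², delay proportional to 1/V); the model and its constants are illustrative and are not measurements from the slides.

```python
def et2(energy: float, delay: float) -> float:
    """Energy x delay^2 figure of merit."""
    return energy * delay ** 2


def first_order_model(vdd: float, k_energy: float = 1.0, k_delay: float = 1.0):
    """Assumed scaling: E = k_energy * V^2, t = k_delay / V."""
    return k_energy * vdd ** 2, k_delay / vdd


voltages = (0.8, 1.0, 1.2)
values = [et2(*first_order_model(v)) for v in voltages]
for v, m in zip(voltages, values):
    print(f"Vdd={v:.1f}V  Et2={m:.6f}")
```

Under this model the V² in the energy cancels the 1/V² in the squared delay, so designs measured at different voltages can be compared on one number.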
Project 2: Async 8051 Microcontroller
IC Layout for the proposed QDI Async 8051 microcontroller and benchmarked sync 8051 microcontroller (under fabrication)
[Figure: chip layout containing the asynchronous 8051 (QDI asynchronous 8051 core, ROM, XRAM, RAM) and the synchronous 8051 (Synopsys DW8051 IP core, ROM, XRAM, RAM).]
Design Name: IBM_8051_0.13um
Technology: 0.13um IBM CMOS (DM) Process
Area: 2.027mm x 2.027mm = 4.108729mm^2
Package: LCC84M
Number of I/Os: 84
Our Project 3: Async 56002 DSP
Objective: To develop a low power async Motorola 56002 DSP for digital hearing aid applications
[Figure: block diagram of the async 56002 DSP. Blocks: Program Counter Generator (PC, PC Update), Program Memory, Instruction Decoder, Program Control, ALU, AGU, Data Memories, Debugger. Signals: Opcode, Address Buses, Data Buses, Control Signals, Debug Signals, Pipeline Flush.]
Project 3: Async 56002 DSP
Preliminary result: Synchronous and asynchronous Data Arithmetic and Logic Units (DALUs), fabricated in May 08
[Figure: block diagram, and microphotograph of the DALUs and their package.]
Our Project 4: Verilog HDL Asynchronous Compiler
Objective: To develop a low power async EDA tool
Basic features:
• Low control overheads considerations
• Low power dissipation
• Augmentation with existing sync and async EDA tools
• Standard HDL languages
• Additional semantics for async components
• Intelligent translation for async components
• Network optimization
• Ease of design (virtually transparent to designers for async handshake)
Project 4: Verilog HDL Async Compiler
Proposed design methodology
Project 4: Verilog HDL Async Compiler
Compilation example
• Step 1
– Extract all causal relations (i.e., control and data paths).
– Determine sources and destinations of extracted paths.
– Establish corresponding channels.
• Step 2
– Determine the nature of each channel (such as push or pull, data or control, guarded).
Project 4: Verilog HDL Async Compiler
Compilation example
• Step 3
– Elaborate established channels into explicit handshake signals (i.e. request and acknowledge lines).
– Infer handshake components and instantiate them.
[Figure: two Sync handshake components with ports ri, ai, ro, ao, and lb, connected by Request and Acknowledge lines.]
• Step 4
– Rewrite part of the specification to establish the handshake components' control over their respective datapath components.
[Figure: the same Sync handshake components, now shown with datapath components x and y.]
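The first three compilation steps can be pictured with a small data model: a channel records its source, destination and nature, and elaboration expands it into explicit request and acknowledge wires. The class, field names and `_req`/`_ack`/`_data` suffixes below are hypothetical and do not describe the compiler's internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Steps 1 and 2: a path with its endpoints and its nature."""
    name: str
    source: str
    destination: str
    kind: str = "push"          # "push": sender initiates; "pull": receiver initiates
    carries_data: bool = False  # False for a pure control channel


def elaborate(ch: Channel) -> dict:
    """Step 3: expand a channel into explicit handshake wires.

    Returns {wire_name: (driver, receiver)}.
    """
    if ch.kind == "push":
        active, passive = ch.source, ch.destination
    else:
        active, passive = ch.destination, ch.source
    wires = {
        f"{ch.name}_req": (active, passive),   # request from the active side
        f"{ch.name}_ack": (passive, active),   # acknowledge back to it
    }
    if ch.carries_data:
        wires[f"{ch.name}_data"] = (ch.source, ch.destination)
    return wires


symbol = Channel("symbol", "input", "syndrome", kind="push", carries_data=True)
for wire, (driver, receiver) in elaborate(symbol).items():
    print(f"{wire}: {driver} -> {receiver}")
```

Data always flows from source to destination; only the direction of the request changes between push and pull channels.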
Project 4: Verilog HDL Async Compiler
Design example: Reed-Solomon error decoder
• In general, provides correction if 2E + R ≤ P, where E, R, and P denote the number of errors, erasures, and parity symbols, respectively.
• Implemented version is based on the DCC Error Corrector by Philips Research Laboratories and corrects only 1 symbol.
• Accepts code words of 28 or 32 8-bit symbols, including 4 or 6 parity symbols, respectively.
• Suitable for async implementation: the error detection block is automatically powered down when the code word is correct.
[Figure: block diagram with Syndrome Computation, Error Detection, and Error Location blocks; signals Symbol, Size, Syndromes, Status, and Error.]
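The correction condition 2E + R ≤ P is easy to check numerically for the two code-word formats. A small sketch (function names are illustrative; it states the general Reed-Solomon bound, whereas the implemented decoder corrects only 1 symbol, as noted above):

```python
def can_correct(errors: int, erasures: int, parity: int) -> bool:
    """General Reed-Solomon condition: 2E + R <= P."""
    return 2 * errors + erasures <= parity


def max_errors(parity: int, erasures: int = 0) -> int:
    """Largest E satisfying 2E + R <= P (0 if none remains)."""
    return max(0, (parity - erasures) // 2)


for symbols, parity in ((28, 4), (32, 6)):
    print(f"{symbols}-symbol code word, {parity} parity symbols: "
          f"up to {max_errors(parity)} errors with no erasures")
```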
Project 4: Verilog HDL Async Compiler
Design example: Reed-Solomon error decoder
• Async control network
[Figure: two control-network schematics, (a) and (b), built from components labelled S0, G0 to G2, M0 to M4, P0 to P4, A0 to A4, L0, and C, with handshake signals symbol_req, symbol_ack, size_req, size_ack, error_location_req, and error_location_ack, and connections to and from the error detection block and the syndrome computation block.]
Project 4: Verilog HDL Async Compiler
Design example: Reed-Solomon error decoder
• Comparisons
– Process: AMS 0.35 μm CMOS
– Supply Voltage: 3.3 V
– Simulator: Synopsys Nanosim
– Design Style: standard cell, 4-phase handshaking, delay-matching
– Test Scenario: 95% correct code words, 5% incorrect code words (error at first symbol)

                                                Balsa Implementation   Our Implementation
# Trans. (k)                         Control    12.3                   0.72
                                     Datapath   19.6                   5.27
                                     Total      31.9                   5.99
Throughput (mega codeword per sec)              0.68                   5.59
Energy per codeword (nJ)             Control    19.2                   0.19
                                     Datapath   9.00                   1.16
                                     Total      28.2                   1.35
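The totals in the comparison can be reduced to improvement ratios. The sketch below uses only the quoted totals; the dictionary keys are illustrative.

```python
# Totals from the comparison table (Balsa implementation vs. our implementation).
balsa = {"transistors_k": 31.9, "throughput_mcw_s": 0.68, "energy_nj": 28.2}
ours = {"transistors_k": 5.99, "throughput_mcw_s": 5.59, "energy_nj": 1.35}

transistor_ratio = balsa["transistors_k"] / ours["transistors_k"]
throughput_ratio = ours["throughput_mcw_s"] / balsa["throughput_mcw_s"]
energy_ratio = balsa["energy_nj"] / ours["energy_nj"]

print(f"Transistor count: {transistor_ratio:.1f}x fewer")
print(f"Throughput:       {throughput_ratio:.1f}x higher")
print(f"Energy/codeword:  {energy_ratio:.1f}x lower")
```

Most of the energy gap comes from control: 19.2 nJ against 0.19 nJ per codeword, which ties back to the "low control overheads" goal of the compiler.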
Conclusions
• We have conducted 3 async IC design projects and 1 async EDA tool project
• We have demonstrated that async-logic has a lower-power attribute (over the standard sync-logic)
• We have shown that async-logic is a strong alternative solution for low power bio-medical applications