advances in designing clockless digital systems
DESCRIPTION
Advances in Designing Clockless Digital Systems. Prof. Steven M. Nowick [email protected]. Department of Computer Science Columbia University New York, NY, USA. Introduction. Synchronous vs. Asynchronous Systems? Synchronous Systems: use a global clock - PowerPoint PPT PresentationTRANSCRIPT
Advances in Designing Advances in Designing Clockless Digital SystemsClockless Digital SystemsAdvances in Designing Advances in Designing
Clockless Digital SystemsClockless Digital Systems
Prof. Steven M. NowickProf. Steven M. Nowick
[email protected]@cs.columbia.edu
Department of Computer ScienceDepartment of Computer ScienceColumbia UniversityColumbia UniversityNew York, NY, USANew York, NY, USA
#2
IntroductionIntroduction
Synchronous vs. Asynchronous Systems?Synchronous vs. Asynchronous Systems?
Synchronous Systems:Synchronous Systems: use a use a global clockglobal clock entire system operates entire system operates at fixed-rateat fixed-rate
uses uses “centralized control”“centralized control”
clock
#3
Introduction (cont.)Introduction (cont.)
Synchronous vs. Asynchronous Systems? Synchronous vs. Asynchronous Systems? (cont.)(cont.)
Asynchronous Systems:Asynchronous Systems: no global clockno global clock components can operate atcomponents can operate at varying ratesvarying rates
communicate locallycommunicate locally via via “handshaking”“handshaking”
uses uses “distributed control” “distributed control”
“handshaking interfaces”(channels)
#4
Trends and ChallengesTrends and Challenges
Trends in Chip Design: Trends in Chip Design: next decadenext decade ““Semiconductor Industry Association (SIA) Semiconductor Industry Association (SIA) Roadmap”Roadmap” (97-8) (97-8)
Unprecedented Challenges:Unprecedented Challenges: complexity and scale (= size of systems)complexity and scale (= size of systems) clock speedsclock speeds power managementpower management reusability & scalabilityreusability & scalability ““time-to-market”time-to-market”
Design becoming unmanageable using a Design becoming unmanageable using a centralized single clock (synchronous) centralized single clock (synchronous) approach….approach….
#5
Trends and Challenges (cont.)Trends and Challenges (cont.)
1. Clock Rate:1. Clock Rate:
1980: 1980: several MegaHertzseveral MegaHertz 2001: 2001: ~750 MegaHertz - 1+ GigaHertz~750 MegaHertz - 1+ GigaHertz 2005:2005: several GigaHertzseveral GigaHertz
Design Challenge:Design Challenge: ““clock skew”:clock skew”: clock must be clock must be near-near-simultaneoussimultaneous across entire chip across entire chip
#6
Trends and Challenges (cont.)Trends and Challenges (cont.)
2. Chip Size and Density:2. Chip Size and Density:
Total #Transistors per Chip: Total #Transistors per Chip: 60-80% 60-80%
increase/yearincrease/year ~1970: ~1970: 4 thousand4 thousand (Intel 4004 microprocessor)(Intel 4004 microprocessor)
today: today: 50-200+ million50-200+ million
2006 and beyond:2006 and beyond: towards 1towards 1 billion+billion+
Design Challenges:Design Challenges: system complexity, design time, clock distributionsystem complexity, design time, clock distribution clock will require 10-20 cycles to reach across chipclock will require 10-20 cycles to reach across chip
#7
Trends and Challenges (cont.)Trends and Challenges (cont.)
3. Power Consumption3. Power Consumption
Low power: ever-increasing demandLow power: ever-increasing demand
consumer electronics:consumer electronics: battery-powered battery-powered
high-end processors:high-end processors: avoid expensive fans, avoid expensive fans,
packagingpackaging
Design Challenge:Design Challenge: clock clock inherentlyinherently consumes power consumes power continuouslycontinuously
““power-down” techniques: complex, only partly power-down” techniques: complex, only partly
effectiveeffective
#8
Trends and Challenges (cont.)Trends and Challenges (cont.)
4. Time-to-Market, Design Re-Use, Scalability4. Time-to-Market, Design Re-Use, Scalability
Increasing pressure for faster Increasing pressure for faster “time-to-market”.“time-to-market”. Need: Need: reusable components:reusable components: “plug-and-play” design “plug-and-play” design
flexible interfacing:flexible interfacing: under varied conditions, voltage under varied conditions, voltage
scalingscaling
scalable design:scalable design: easy system upgradeseasy system upgrades
Design Challenge:Design Challenge: mismatch w/ central fixed-rate clock mismatch w/ central fixed-rate clock
#9
Trends and Challenges (cont.)Trends and Challenges (cont.)
5. Future Trends: “Mixed Timing” Domains5. Future Trends: “Mixed Timing” Domains
Chips themselves becoming Chips themselves becoming distributed distributed systems….systems…. contain many sub-regions, contain many sub-regions, operating at operating at different speeds:different speeds:
Design Challenge:Design Challenge: breakdown of single centralizedbreakdown of single centralizedclock controlclock control
#10
Asynchronous Design: Asynchronous Design: Potential AdvantagesPotential Advantages
Several Potential Advantages:Several Potential Advantages:
Lower PowerLower Power no clockno clock components use power only “on demand”components use power only “on demand”
Robustness, ScalabilityRobustness, Scalability no global timingno global timing“mix-and-match” variable-speed “mix-and-match” variable-speed componentscomponents
composable/modular design style composable/modular design style “object-oriented” “object-oriented”
Higher PerformanceHigher Performance systems not limited to “worst-case” clock ratesystems not limited to “worst-case” clock rate
#11
Asynchronous Design: Some Recent Asynchronous Design: Some Recent DevelopmentsDevelopments
1. Philips Semiconductors:1. Philips Semiconductors: commercial use: 100 million async chips for consumer electronics:commercial use: 100 million async chips for consumer electronics: pagers, cell phones, smart cards, digital passports, pagers, cell phones, smart cards, digital passports, automotive automotive
3-4x lower power3-4x lower power,, less electromagnetic interference (“EMI”) less electromagnetic interference (“EMI”)
2. Intel:2. Intel: experimental: Pentium instruction-length decoder = “RAPPID” experimental: Pentium instruction-length decoder = “RAPPID” (1990’s)(1990’s)
3-4x faster 3-4x faster than synchronous subsystemthan synchronous subsystem
3. Sun Labs:3. Sun Labs: commercial use: high-speed FIFO’s in recent “Ultra’s” (memory commercial use: high-speed FIFO’s in recent “Ultra’s” (memory access)access)
4. IBM Research:4. IBM Research: experimental: high-speed pipelines, filters, mixed-timing systemsexperimental: high-speed pipelines, filters, mixed-timing systems
Recent Startups:Recent Startups: Fulcrum, Theseus Logic, Handshake Solutions, Silistrix Fulcrum, Theseus Logic, Handshake Solutions, Silistrix
#12
Asynchronous CAD Tools: Recent Asynchronous CAD Tools: Recent DevelopmentsDevelopments
DARPA’s “CLASS” Program:DARPA’s “CLASS” Program: Clockless InitiativeClockless Initiative (2003- (2003-07)07)
Goals:Goals:- CAD tool: - CAD tool: produce viable produce viable commercial-grade async tool flowcommercial-grade async tool flow- demonstration: - demonstration: a complex Boeing ASIC chipa complex Boeing ASIC chip
Participants:Participants: Lead (PI):Lead (PI): Boeing Boeing Industrial participants:Industrial participants:
Philips (via async incubated startup, “Handshake Solutions”)Philips (via async incubated startup, “Handshake Solutions”) Theseus Logic, CodetronixTheseus Logic, Codetronix
Academic participants:Academic participants: Columbia, UNC, UW, Yale, OSUColumbia, UNC, UW, Yale, OSU
Targets:Targets: cover wide “design space” – very robust to high-speed cover wide “design space” – very robust to high-speed
circuitscircuits
Columbia’s role:Columbia’s role: (i) high-speed pipelines, (ii) CAD (i) high-speed pipelines, (ii) CAD
optimizationsoptimizations
#13
Asynchronous Design: Asynchronous Design: ChallengesChallenges
Critical Design Issues:Critical Design Issues: components must components must communicate cleanly:communicate cleanly: ‘hazard-free’ ‘hazard-free’
designdesign
highly-concurrent designs:highly-concurrent designs: much harder to verify! much harder to verify!
Lack of Automated “Computer-Aided Design” Lack of Automated “Computer-Aided Design”
Tools:Tools: most commercial “CAD” tools targeted to synchronous most commercial “CAD” tools targeted to synchronous
#14
What Are CAD Tools?What Are CAD Tools?
Software programs to aid digital Software programs to aid digital
designers = designers = ““computer-aided design” toolscomputer-aided design” tools
automatically automatically synthesize synthesize and and optimizeoptimize digital digital circuitscircuits
CADTOOL
Input:desired circuit specification
Output:optimized circuit implementation
#15
Asynchronous Design ChallengeAsynchronous Design Challenge
Lack of Existing Asynchronous Design Tools:Lack of Existing Asynchronous Design Tools:
Most commercial “CAD” tools targeted to Most commercial “CAD” tools targeted to
synchronoussynchronous
Synchronous CAD tools: Synchronous CAD tools: major drivers of growth in microelectronics industry major drivers of growth in microelectronics industry
Asynchronous “chicken-and-egg” problem:Asynchronous “chicken-and-egg” problem: few CAD tools few CAD tools less commercial use of async less commercial use of async designdesign
especially lacking: tools for designing/optmzng. especially lacking: tools for designing/optmzng.
large systemslarge systems
#16
Overview: My Research AreasOverview: My Research Areas
CAD Tools for Asynchronous Controllers CAD Tools for Asynchronous Controllers
(FSM’s)(FSM’s) ““MINIMALIST” PackageMINIMALIST” Package: for synthesis + : for synthesis +
optimizationoptimization
Other Research Areas:Other Research Areas:
CAD Tools for Designing Large-Scale Async SystemsCAD Tools for Designing Large-Scale Async Systems
Mixed-Timing Interface Circuits:Mixed-Timing Interface Circuits:
for interfacing sync/async systemsfor interfacing sync/async systems
High-Speed Asynchronous PipelinesHigh-Speed Asynchronous Pipelines
#17
CAD Tools for Async ControllersCAD Tools for Async Controllers
MINIMALIST:MINIMALIST: developed at Columbia University [1994-] developed at Columbia University [1994-] extensible CAD package for synthesis of extensible CAD package for synthesis of asynchronous controllersasynchronous controllers integrates synthesis, optimization and verification toolsintegrates synthesis, optimization and verification tools used in 80+ sites/17+ countries (being taught in IIT Bombay)used in 80+ sites/17+ countries (being taught in IIT Bombay) URL:URL: httphttp://www.cs.columbia.edu/async://www.cs.columbia.edu/async
Includes several optimization tools: Includes several optimization tools: State MinimizationState Minimization CHASM: CHASM: optimal state encodingoptimal state encoding 2-Level Hazard-Free Logic Minimization2-Level Hazard-Free Logic Minimization Verilog back-endVerilog back-end
Key goal: Key goal: facilitate design-space explorationfacilitate design-space exploration
#18
Example: “PE-SEND-IFC” (HP Example: “PE-SEND-IFC” (HP Labs)Labs)Inputs:req-sendtreqrd-iqadbld-outack-pkt
Outputs:tackpeackadbld
0
1
2
7
3
4
5
6
8
9
10
req-send+ treq+ rd-iq+/adbld+
adbld-out+/peack+
rd-iq-/peack- adbld- tack+
adbld-out- treq-rd-id+/ adbld+
adbld-out+/peack+
rd-iq-/ peack- adbld- tack-
adbld-out- treq+ ack-pkt+/ peack+ tack+
ack-pkt- treq-/peack- tack-
treq-/tack-
treq+/tack+
ack-pkt+/peack- tack-
adbld-out-treq- ack-pkt+/
peack+
req-send-/--
adbld-out- treq+ rd-iq+/ adbld+
From HP Labs “Mayfly” Project:B.Coates, A.Davis, K.Stevens, “The Post Office Experience: Designing a Large Asynchronous Chip”, INTEGRATION: the VLSI Journal, vol. 15:3, pp. 341-66 (Oct. 1993)
#19
EXAMPLE (cont.):EXAMPLE (cont.):
Examples:
Design-Space Explorationusing MINIMALIST:
optimizing for area vs. speed
#20
CAD Tools for Large-Scale CAD Tools for Large-Scale Asynchronous SystemsAsynchronous Systems
Target:Target: - synthesize - synthesize distributed distributed control control - 1 controller per - 1 controller per functional unitfunctional unit
Target Architecture:Target Architecture:
RegisterRegister RegisterRegister
FunctionaFunctional l
UnitUnit
FunctionaFunctional l
UnitUnit
FunctionaFunctional l
UnitUnit
StartStart
LoopLoop C< 0 C< 0
B:=2dx+dxB:=2dx+dx C:=X<aC:=X<a
M:=U*X1M:=U*X1
EndlooEndloopp
EndEnd
X:=X+dxX:=X+dx C:=X<aC:=X<a
Input Specification:Input Specification:= “Control Data-flow Graph”= “Control Data-flow Graph”
control control unitunit
Ctrlr 1Ctrlr 1
Ctrlr 2Ctrlr 2
Ctrlr 3Ctrlr 3
[Theobald/Nowick, IEEE Design Automation Conf. (2001)]
#21
Mixed-Timing InterfacesMixed-Timing Interfaces
AsynchronousDomain
SynchronousDomain 1
SynchronousDomain 2
Goal: provide low-latency communication between “timing domains”
Challenge: avoid synchronization errors
Goal: provide low-latency communication between “timing domains”
Challenge: avoid synchronization errors
AsynchronousDomain
#22
Mixed-Timing Interfaces: Mixed-Timing Interfaces: SolutionSolution
AsynchronousDomain
SynchronousDomain 1
SynchronousDomain 2
Async-Sync FIFO
Async-Sync FIFO
Sync-Async FIFO
Mixed-Clock FIFO’s
… developed complete family of mixed-timing interface circuits [Chelcea/Nowick, IEEE Design Automation Conf. (2001)]
… developed complete family of mixed-timing interface circuits [Chelcea/Nowick, IEEE Design Automation Conf. (2001)]
Solution: insert mixed-timing FIFO’s provide safe data transferSolution: insert mixed-timing FIFO’s provide safe data transfer
AsynchronousDomain
#23
global clock
NON-PIPELINED COMPUTATION:
High-Speed Asynchronous High-Speed Asynchronous PipelinesPipelines
“datapath component” = adder, multiplier, etc.
SYNCHRONOUS
#24
global clock
SYNCHRONOUS
ASYNCHRONOUS
“PIPELINED COMPUTATION”: like an assembly line
no global clock
High-Speed Asynchronous High-Speed Asynchronous PipelinesPipelines
#25
Goal: Goal: extremely fast async datapath componentsextremely fast async datapath components speed:speed: comparable to comparable to fastestfastest existing synchronous designs existing synchronous designs additional benefits:additional benefits:
dynamically adapt to variable-speed interfaces: dynamically adapt to variable-speed interfaces: voltage scaling!voltage scaling! ““elastic” processing of data in pipelineelastic” processing of data in pipeline no clock distributionno clock distribution
Contributions: Contributions: 3 new async pipeline styles3 new async pipeline styles [SINGH/NOWICK][SINGH/NOWICK] MOUSETRAP:MOUSETRAP: static logicstatic logic High-Capacity/Lookahead:High-Capacity/Lookahead: dynamic logicdynamic logic
Obtain multi-GigaHertz speedsObtain multi-GigaHertz speedsUsed by IBM, currently incorporated into Philips tool flowUsed by IBM, currently incorporated into Philips tool flow
High-Speed Asynchronous High-Speed Asynchronous PipelinesPipelines
#26
reqN
ackN-1
reqN+1
ackN
Data Latch
Latch Controller
doneN
Data in Data out
Stage NStage N-1 Stage N+1
En
MOUSETRAP: A Basic FIFO (no MOUSETRAP: A Basic FIFO (no computation)computation)
Stages communicate using Stages communicate using transition-transition-signaling:signaling:
[Singh/Nowick, IEEE Int. Conf. on Computer Design (2001)]
#27
Stage N+1
logic
delaydelay
Stage N
Data Latch
Latch Controller
doneN
logic
delaydelay
Stage N-1
logic
delaydelayreqreqNN
ackN-1
reqreqN+N+11
ackN
““MOUSETRAP” Pipeline: MOUSETRAP” Pipeline: w/computationw/computation
Function Blocks:Function Blocks: use “synchronous” single-rail circuits use “synchronous” single-rail circuits (not hazard-free!)(not hazard-free!)
““Bundled Data” Requirement:Bundled Data” Requirement: each each “req”“req” must arrive must arrive afterafter data inputs valid and stable data inputs valid and stable
#28
#29
reqN
ackN-1
reqN+1
ackN
Data Latch
Latch Controller
doneN
Data in Data out
Stage NStage N-1 Stage N+1
En
MOUSETRAP: A Basic FIFOMOUSETRAP: A Basic FIFOStages communicate using Stages communicate using transition-transition-
signaling:signaling:1 transition1 transitionper data item!per data item!
One Data Item