Post on 20-Dec-2015
'Data Communication Mechanisms for Systems with
Heterogeneous Timing'.
Thanks for the invite!
Ian G. Clark
http://IanGClark.net/
http://async.org.uk/
Fei Xia, Alex Yakovlev, Delong Shang
Talk layout
ACM
Background
HETS
Modelling
ACM Classification
Metastability
Data properties
Processing & communication split
Analysis & results
Future
Introduction and Background
• Systems are becoming larger and more complex.
• Large systems are difficult to synchronize.
• Physical material limits the maximum clock speed.
• Power consumption.
• EMC fields.
• Synchronous operation typically means that the system is running at a speed dictated by the slowest element.
Now: Pentium 4 processor: about 100 million transistors.
By 2012 the Semiconductor Industry Association (SIA) predicts: 1400 million transistors per chip, 3000 GHz clock, 1000 Gbit memory chips, 1 Volt power supply.
[Figure: number of transistors per chip (log scale) against time, with curves labelled chip complexity, design productivity and verification]
The delay ratio problem
[Figure: delay (ps) against VLSI generation (µm), with curves for transistor/gate delay and interconnect delay]
Transistors get smaller/faster
Interconnects get longer/slower
Delay is unknown until we have the layout, and it changes again if we change the layout: the timing closure problem.
Introduction and Background (2)
The Timing Modes Spectrum
Introduction and Background (3)
[Figure: the timing modes spectrum. Modes shown: Analogue; Asynchronous (self-timed); Single clock synchronous; GALS; Heterogeneous; Parallel; Multiple clock domains; HETS. Axis labels: Non-sampled / Sampled data; Continuous time / Discrete time?]
• Asynchronous processing.
• Improved EMC - dependent on data being processed.
• Lower power - energy only used when work is done.
Introduction and Background (4)
Example – A to D conversion.
However …
• Sequential and synchronous design is easier.
• Most current commercial tools support sequential and synchronous design, some support parallel, but none support asynchronous.
• An intermediate solution: GALS.
• Use synchronous and sequential in processing, and asynchronous in communication.
Can there be an easy transfer of knowledge from the existing methods to the new solutions?
Introduction and Background (5)
This is not just an academic question!
ITRS (International Technology Roadmap for Semiconductors) (http://public.itrs.net/) – Systems on Chip ‘SoC’ are increasingly becoming heterogeneous in their behaviour, including mixed analogue-discrete components, time-driven and power-saving subsystems.
Wilfred Pinfold of Intel Microprocessor Lab ‘by the end of this decade, when Intel expects to be producing billion transistor devices, today’s essentially homogenous microprocessor market will have to become more diverse, characterised by multiple heterogeneous designs, each optimised for the requirements of different application segments’.
Introduction and Background (6)
Tool Support
• MASCOT / Real-Time network tools (internal to BAe).
• Metropolis (Cadence Labs at Berkeley +++ (http://www.gigascale.org/metropolis/)).
• Moses (http://www.tik.ee.ethz.ch/~moses/).
Component re-use
• Off the shelf processors or IP cores - “best in class”
• MASCOT designs can be compiled down on to different hardware platforms
Implementation
• ‘SoPC’ - System on Programmable Chip - defined as ‘any complex ASIC with at least one computing engine’ (Pat Mead, Altera: from IEE SoC forum in Cambridge 2001)
• NoC: Benini/De Micheli work
NoC – Network on Chip
• Large existing knowledge base.
• Philips ‘ethernet on chip’.
• Current networks are synchronous and cannot handle non-synchronous cores, such as self-timed ones.
• Global chip communication – increased power consumption.
• Good for non-deterministic data communication.
• Side step the synchronization and global clock issues.
• Not suitable for Real-Time applications.
Baseline: Architectural aspect
• Real-time networks and MASCOT approach
– from RSRE/Phillips(67), BAe/Simpson(86)
– for software systems
– high time heterogeneity but relatively low speed
• Globally-Asynchronous-Locally-Synchronous (GALS)
– Chapiro(84), Muttersbach(00), Ginosar(00)
– for VLSI circuits
– high speed but very limited time heterogeneity
Asynchronous data communications
[Figure: process 1 (the writer, in the writer time domain) and process 2 (the reader, in the reader time domain) communicate through shared memory]
Level of asynchrony is defined by WRITE and READ rules
Processes are single threads of execution.
Classification of ACMs
Hugo Simpson’s classification:

                                             | Destructive read (read can be held up) | Non-destructive read (read cannot be held up)
Destructive write (write cannot be held up)  | Signal (event data)                    | Pool (reference data)
Non-destructive write (write can be held up) | Channel (message data)                 | Constant (configuration data)
Other ACM classifications: e.g. L. Lamport, 1986 (safe, regular and atomic registers)
Difficulty with Simpson’s classification
• Destructive/Non-destructive does not intuitively imply the temporal Wait/No-wait division:
– Destructive write cannot wait
– Destructive read can wait
• There is symmetry between Pool and Channel but no symmetry between Signal and Constant
Quick aside - Petri Nets
[Figure: the elements of a Petri net: place, transition, arc, token]
Not to be confused with the HETS / MASCOT symbols.
Petri net capture of Simpson’s protocols
[Figure: four Petri nets, one each for Signal, Pool, Channel and Constant. Places: empty, full. Transitions: destr write, non-destr write, destr read, non-destr read]
Our interpretation
[Figure: four Petri nets labelled Signal, Message/Command, Channel and Pool. Places: read, unread. Transitions: write, over-write, read, re-read]
Constant is a special case of Command
Our interpretation (continued)
[Figure: the same four Petri nets, now annotated with the labels Busy Writer, Lazy Writer, Busy Reader and Lazy Reader]
Our classification of ACMs

                                                                              | Lazy read = read only previously unread data (read can be held up) | Busy read = may re-read data already read (read cannot be held up)
Busy write = may over-write unread data (write cannot be held up)             | BW-LR (Signal)                                                      | BW-BR (Pool)
Lazy write = write only if previous data has been read (write can be held up) | LW-LR (Channel)                                                     | LW-BR (Command)
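
The four classes can be illustrated with a very abstract sketch: a single data item plus an "unread" flag, where a lazy side is held up and a busy side never is. This is only an illustration of the wait/no-wait rules in the table, not of any slot mechanism from the talk; the names `Policy`, `OneItemACM`, `make_acm`, `try_write` and `try_read` are hypothetical, and each call is assumed to be atomic.

```python
from enum import Enum


class Policy(Enum):
    BUSY = "busy"   # this side is never held up
    LAZY = "lazy"   # this side may be held up


class OneItemACM:
    """Abstract one-item ACM (assumption: every call is atomic)."""

    def __init__(self, writer, reader, initial=None):
        self.writer = writer
        self.reader = reader
        self.item = initial
        self.unread = False

    def try_write(self, item):
        """Return True if the write goes ahead, False if it is held up."""
        if self.writer is Policy.LAZY and self.unread:
            return False             # lazy write: previous item not yet read
        self.item = item             # a busy write may over-write unread data
        self.unread = True
        return True

    def try_read(self):
        """Return (True, item) if the read goes ahead, (False, None) if held up."""
        if self.reader is Policy.LAZY and not self.unread:
            return (False, None)     # lazy read: nothing new to read
        self.unread = False          # a busy read may re-read old data
        return (True, self.item)


def make_acm(kind):
    """Build an ACM from the class names used in the table."""
    table = {
        "Signal": (Policy.BUSY, Policy.LAZY),    # BW-LR
        "Pool": (Policy.BUSY, Policy.BUSY),      # BW-BR
        "Channel": (Policy.LAZY, Policy.LAZY),   # LW-LR
        "Command": (Policy.LAZY, Policy.BUSY),   # LW-BR
    }
    writer, reader = table[kind]
    return OneItemACM(writer, reader)


signal = make_acm("Signal")
signal.try_write("a")
signal.try_write("b")            # over-writes the unread "a"
print(signal.try_read())         # (True, 'b')
print(signal.try_read())         # (False, None): the lazy reader is held up
```

A Channel built the same way refuses the second write until the first item has been read, which is the only behavioural difference from the Signal in this sketch.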
Signal vs Pool
• Pool: Real time 1 (busy domain) to Real time 2 (busy domain).
• Signal: Real time (busy domain) to Data-driven (lazy domain). Low Power!
Sample algorithms
Pool – with 3 slots – fully asynchronous
Writer:
wr: write slot n;
w0: l := n;
w1: n := ¬(l,r);
Reader:
r0: r := l;
rd: read slot r;

Signal – with 2 slots – conditionally asynchronous
Writer:
wr: write slot w;
w0: w := ¬r;
Reader:
r0: r := ¬r;
rd: wait until w ¬= r;
    read slot r;
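
The three-slot pool statements (wr, w0, w1, r0, rd) can be sketched in Python as follows. This is a sequential illustration only: each method runs to completion, so it shows the control-variable bookkeeping rather than true concurrent behaviour. The class name `ThreeSlotPool`, the 0-based slot numbering, the initial values of `n`, `l` and `r`, and the rule of taking the lowest free slot for ¬(l,r) are all assumptions made for the example.

```python
class ThreeSlotPool:
    """Sequential sketch of the 3-slot pool control variables n, l and r."""

    def __init__(self, initial=None):
        self.slots = [initial, initial, initial]
        self.n = 1   # next slot the writer will use (assumed initial value)
        self.l = 0   # slot holding the latest complete item (assumed)
        self.r = 0   # slot the reader is using (assumed)

    def write(self, item):
        self.slots[self.n] = item                     # wr: write slot n
        self.l = self.n                               # w0: l := n
        # w1: n := not(l, r), i.e. any slot that is neither l nor r
        self.n = next(i for i in range(3) if i not in (self.l, self.r))

    def read(self):
        self.r = self.l                               # r0: r := l
        return self.slots[self.r]                     # rd: read slot r


pool = ThreeSlotPool()
pool.write("a")
print(pool.read())   # a
print(pool.read())   # a  (busy read: the same item is re-read)
pool.write("b")
pool.write("c")      # busy write: "b" is never seen by the reader
print(pool.read())   # c
```

Because w1 always picks a slot different from both l and r, the writer in this sketch never targets the slot the reader last selected.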
What is a slot?
- Slot:
Shared memory for one item of data.
- Multiple slots:
No temporal independence with only one slot. (There will always be situations when both processes clash in time on the one data slot.)
- Capacity:
Not to be confused with the number of slots. It takes a minimum of 3 slots to make a capacity 1 pool.
Data Properties
Coherence
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:59’; ‘07:00’; ‘08:02’;
Freshness
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:58’; ‘08:02’;
Sequence
Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;
Read: ‘07:57’; ‘07:59’; ‘07:58’; ‘08:02’;
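
Two of these properties can be checked mechanically on the traces above. The sketch below treats coherence as "every item read is exactly one of the items written" and sequence as "items are read in the order they were written"; both readings are interpretations of the slide examples, and the function names are hypothetical. Freshness is not checked, because it depends on when each read happened relative to the writes, and that timing is not recoverable from the traces alone.

```python
WRITTEN = ["07:57", "07:58", "07:59", "08:00", "08:01", "08:02", "08:03"]


def is_coherent(written, read):
    """True if every item read is a whole item that was actually written."""
    return all(item in written for item in read)


def is_in_sequence(written, read):
    """True if the coherent items were read in the order they were written."""
    positions = [written.index(item) for item in read if item in written]
    return all(a <= b for a, b in zip(positions, positions[1:]))


coherence_read = ["07:57", "07:59", "07:00", "08:02"]
sequence_read = ["07:57", "07:59", "07:58", "08:02"]

print(is_coherent(WRITTEN, coherence_read))    # False: '07:00' was never written
print(is_in_sequence(WRITTEN, sequence_read))  # False: '07:58' follows '07:59'
```

The incoherent value ‘07:00’ is consistent with a read that overlapped the update from ‘07:59’ to ‘08:00’ and picked up part of each.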
SIGNAL: Data latency
If a reader cycle immediately follows a writer cycle, what data does it get? Does the reader read X?

Initial state: w=0, r=0
Writer: write slot w; w := not r;
Reader: r := not r; wait until w ¬= r; read slot r;

Trace:
1. Write X: write slot 0; post: w := not r = 1.
2. Read: pre: r := not r = 1; w == r, therefore the reader is made to wait.
3. Write Y: write slot 1; post: w := not r = 0.
4. Read: w ¬= r, so the reader proceeds and reads slot r = 1.

This implies 0 capacity.
Trade-off between slots, capacity and latency.
3 slot signal has capacity 1, and does not make the reader wait as here.
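
The latency trace for the 2-slot signal can be replayed with a small sketch. Each statement is assumed to execute atomically, and the reader's wait is modelled as a test (`can_read`) instead of a blocking loop; the class and method names are hypothetical.

```python
class TwoSlotSignal:
    """Step-by-step model of the 2-slot signal (statements assumed atomic)."""

    def __init__(self):
        self.slots = [None, None]
        self.w = 0
        self.r = 0

    def write(self, item):
        self.slots[self.w] = item    # wr: write slot w
        self.w = 1 - self.r          # w0: w := not r

    def reader_pre(self):
        self.r = 1 - self.r          # r0: r := not r

    def can_read(self):
        return self.w != self.r      # rd: wait until w != r

    def read(self):
        if not self.can_read():
            raise RuntimeError("reader must wait: w == r")
        return self.slots[self.r]    # rd: read slot r


sig = TwoSlotSignal()
sig.write("X")           # slot 0 <- X, then w = 1
sig.reader_pre()         # r = 1
print(sig.can_read())    # False: w == r, the reader is made to wait
sig.write("Y")           # slot 1 <- Y, then w = 0
print(sig.can_read())    # True
print(sig.read())        # Y
```

In this run the reader is released only by the second write and receives Y, so X is never delivered, which matches the "0 capacity" observation.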
Modeling the algorithms
Example statement :- “w := not r;”
[Figure: Petri net subnet W0 in the Signal, with places r=0, r=1, w=0, w=1, start and finish]
Non-abstract models for ease of understanding.
This is atomic – some statements need to be 2 stage.
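
One way such a subnet could be encoded is sketched below as a tiny Petri net with one transition per combination of the current values of r and w. The place names follow the figure labels, but the transition structure is an assumption (the original figure is not recoverable from the transcript), and the helper names `enabled`, `fire`, `w0_subnet` and `run_w0` are hypothetical.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= 1 for place in inputs)


def fire(marking, transition):
    """Consume one token per input place and produce one per output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place in inputs:
        new_marking[place] -= 1
    for place in outputs:
        new_marking[place] = new_marking.get(place, 0) + 1
    return new_marking


def w0_subnet():
    """Transitions for 'w := not r' (assumed structure, one per (r, w) pair)."""
    transitions = {}
    for r in (0, 1):
        for w in (0, 1):
            inputs = ("start", f"r={r}", f"w={w}")
            outputs = ("finish", f"r={r}", f"w={1 - r}")   # r is only referenced
            transitions[f"r={r},w={w}"] = (inputs, outputs)
    return transitions


def run_w0(r, w):
    """Fire the single enabled transition and return the resulting marking."""
    marking = {"start": 1, "finish": 0, "r=0": 0, "r=1": 0, "w=0": 0, "w=1": 0}
    marking[f"r={r}"] = 1
    marking[f"w={w}"] = 1
    for transition in w0_subnet().values():
        if enabled(marking, transition):
            return fire(marking, transition)
    raise RuntimeError("no transition enabled")


after = run_w0(r=0, w=0)
print(after["w=1"], after["w=0"], after["finish"])   # 1 0 1
```

Firing a single transition from start to finish is what makes the statement atomic in this encoding; a two-stage statement would need an intermediate place between two transitions.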
Modeling the algorithms
[Figure: the complete model. Subnets: write, W0, R0, read. Shared places: w=0/1, r=0/1, Slot_0/1 read/unread. Arc types: setting, referencing]
Sub-models and the ‘enable’ place
[Figure: a ‘fresh and valid’ sub-model between ‘write post’ and ‘write end’]
Write is set to fresh and valid; other slot is set to not fresh.
This should appear as an atomic action to the other process.
Metastability
[Figure: timing diagram showing the active clock edge, the input set-up time and the output propagation delay against time; the output levels are marked 0, 1 and M]
Every flip-flop has at least three equilibrium points, two stable and one unstable.
Keep away from data path!
Analysis and Some Results
Exhaustive ‘reachability’ search – all process interleaving covered.
ACM           | Control | Arbiter      | Capacity
3 slot pool   | {1,2,3} | Arbiter req. | 1+delay
4 slot pool   | {0,1}   | No arbiter   | 1
2 slot signal | {0,1}   | No arbiter   | 0~1
3 slot signal | {1,2,3} | No arbiter   | 1
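
As an illustration of what an exhaustive reachability search looks like, the sketch below explores every interleaving of the 2-slot signal statements as a plain state graph (not the Petri net models used for the results above). It assumes the control statements `w := not r` and `r := not r` are atomic, that a slot access has a start and an end, and that a "clash" means the writer and the reader are inside the same slot at the same time. The state encoding and function names are hypothetical.

```python
from collections import deque

# State: (w, r, writer_pc, reader_pc)
# writer_pc: 0 = about to write, 1 = writing slot w, 2 = about to do w := not r
# reader_pc: 0 = about to do r := not r, 1 = waiting for w != r, 2 = reading slot r
INITIAL = (0, 0, 0, 0)


def successors(state):
    """All states reachable in one step by either process."""
    w, r, wpc, rpc = state
    result = []
    if wpc == 0:
        result.append((w, r, 1, rpc))        # wr: start writing slot w
    elif wpc == 1:
        result.append((w, r, 2, rpc))        # wr: finish writing slot w
    else:
        result.append((1 - r, r, 0, rpc))    # w0: w := not r
    if rpc == 0:
        result.append((w, 1 - r, wpc, 1))    # r0: r := not r
    elif rpc == 1:
        if w != r:
            result.append((w, r, wpc, 2))    # rd: wait is over, start reading
    else:
        result.append((w, r, wpc, 0))        # rd: finish reading slot r
    return result


def clashes(state):
    """True if writer and reader are inside the same slot simultaneously."""
    w, r, wpc, rpc = state
    return wpc == 1 and rpc == 2 and w == r


def reachable(initial=INITIAL):
    """Breadth-first search over all process interleavings."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


states = reachable()
clash_states = [s for s in states if clashes(s)]
print(clash_states)   # []
```

Under these assumptions the search finds no clash state: while the reader is inside slot r, the only way w can change is through `w := not r`, which keeps the writer on the other slot.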
Conclusion
Open questions:
• Have the best ACM algorithms been found?
• How should ‘best’ be defined?
Current and Future work
Applications – distributed CCTV, Control systems.
Modelling of ACMs in system - analysis - (Moses/Metropolis?)