
Post on 20-Dec-2015


'Data Communication Mechanisms for Systems with Heterogeneous Timing'.

Thanks for the invite!

Ian G. Clark

[email protected] http://IanGClark.net/

http://async.org.uk/

Fei Xia, Alex Yakovlev, Delong Shang

Talk layout

ACM

Background

HETS

Modelling

ACM Classification

Metastability

Data properties

Processing & communication split

Analysis & results

Future

Introduction and Background

• Systems are becoming larger and more complex.

• Large systems are difficult to synchronize.

• Physical material limits the maximum clock speed.

• Power consumption.

• EMC fields.

• Synchronous operation typically means that the system is running at a speed dictated by the slowest element.

Now: Pentium 4 processor = 100 million transistors

By 2012 the Semiconductor Industry Association (SIA) predicts: 1400 million transistors per chip, 3000 GHz clock, 1000 Gbit memory chips, 1 Volt power supply

[Figure: number of transistors per chip (log scale) against time, with curves for chip complexity, design productivity and verification.]

The delay ratio problem

[Figure: delay (ps) against VLSI generation (µm), comparing transistor/gate delay with interconnect delay.]

Transistors get smaller/faster

Interconnects get longer/slower

Delay is unknown until we have the layout, and it changes whenever we change the layout – the timing closure problem

Introduction and Background (2)

The Timing Modes Spectrum

Introduction and Background (3)

[Figure: the timing modes spectrum. Labels: Analogue; Asynchronous (self-timed); Single clock synchronous; GALS; Heterogeneous; Parallel; Multiple clock domains; HETS. Axes: Non-sampled to Sampled data; Continuous time to Discrete time?]

• Asynchronous processing.

• Improved EMC - dependent on data being processed.

• Lower power - energy only used when work is done.

Introduction and Background (4)

Example – A to D conversion.

However …

• Sequential and synchronous design is easier.

• Most current commercial tools support sequential and synchronous design; some support parallel design, but not asynchronous.

• An intermediate solution: GALS.

• Use synchronous and sequential methods in processing, and asynchronous methods in communication.

Can there be an easy transfer of knowledge from the existing methods to the new solutions?

Introduction and Background (5)

This is not just an academic question!

ITRS (International Technology Roadmap for Semiconductors) (http://public.itrs.net/) – Systems on Chip ‘SoC’ are increasingly becoming heterogeneous in their behaviour, including mixed analogue-discrete components, time-driven and power-saving subsystems.

Wilfred Pinfold of Intel Microprocessor Lab: ‘by the end of this decade, when Intel expects to be producing billion transistor devices, today’s essentially homogenous microprocessor market will have to become more diverse, characterised by multiple heterogeneous designs, each optimised for the requirements of different application segments’.

Introduction and Background (6)

• MASCOT / Real-Time network tools (internal to BAe).

• Metropolis (Cadence Labs at Berkeley +++ (http://www.gigascale.org/metropolis/))

• Moses (http://www.tik.ee.ethz.ch/~moses/).

Tool Support

• Off the shelf processors or IP cores - “best in class”

• MASCOT designs can be compiled down on to different hardware platforms

Component re-use

• ‘SoPC’ - System on Programmable Chip - defined as ‘any complex ASIC with at least one computing engine’

Pat Mead, Altera: from IEE SoC forum in Cambridge 2001

• NoC: Benini/De Micheli work

Implementation

NoC – Network on Chip

• Large existing knowledge base.

• Philips ‘ethernet on chip’.

• Current networks are synchronous and cannot handle non-synchronous cores, such as self-timed ones.

• Global chip communication – increased power consumption.

• Good for non-deterministic data communication.

• Sidestep the synchronization and global clock issues.

• Not suitable for Real-Time applications.

Baseline: Architectural aspect

• Real-time networks and MASCOT approach
– from RSRE/Phillips(67), BAe/Simpson(86)
– for software systems
– high time heterogeneity but relatively low speed

• Globally-Asynchronous-Locally-Synchronous (GALS)
– Chapiro(84), Muttersbach(00), Ginosar(00)
– for VLSI circuits
– high speed but very limited time heterogeneity

Heterogeneously Timed Nets (hets) (based on MASCOT standard symbols)

[Figure: a hets network built from the elements A1, A2, A3, A4 and C1, C2, C3.]

Data processing elements (active): time/event/data-driven

Data communication elements (passive): ACMs

Asynchronous data communications

[Figure: process 1 (the writer, in the writer time domain) and process 2 (the reader, in the reader time domain) communicating through shared memory.]

Level of asynchrony is defined by WRITE and READ rules

Processes are single threads of execution.

Classification of ACMs

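Hugo Simpson’s classification:

• Destructive write (write cannot be held up), destructive read (read can be held up): Signal (event data)

• Destructive write (write cannot be held up), non-destructive read (read cannot be held up): Pool (reference data)

• Non-destructive write (write can be held up), destructive read (read can be held up): Channel (message data)

• Non-destructive write (write can be held up), non-destructive read (read cannot be held up): Constant (configuration data)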

Other ACM classifications: e.g. L. Lamport, 1986 (safe, regular and atomic registers)

Difficulty with Simpson’s classification

• Destructive/non-destructive does not intuitively imply the temporal wait/no-wait division:
– Destructive write cannot wait
– Destructive read can wait

• There is symmetry between Pool and Channel but no symmetry between Signal and Constant

Quick aside - Petri Nets

[Figure: Petri net notation, showing a place, a transition, an arc and a token.]

Not to be confused with the Hets / MASCOT symbols

Petri net capture of Simpson’s protocols

[Figure: Petri nets for the Signal, Pool, Channel and Constant protocols, with places labelled empty and full and transitions labelled destr/non-destr write and destr/non-destr read.]

Our interpretation

[Figure: Petri nets for Signal, Pool, Channel and Message/Command, with places labelled read and unread and transitions labelled write, over-write, read and re-read.]

Constant is a special case of Command

[Figure: the same four nets, labelled Busy Writer, Lazy Writer, Busy Reader and Lazy Reader.]

Our classification of ACMs

Lazy read = read only previously unread data (read can be held up)

Busy read = may re-read data already read (read cannot be held up)

Busy write = may over-write unread data (write cannot be held up)

Lazy write = write only if the previous data has been read (write can be held up)

• BW-LR (Signal)

• BW-BR (Pool)

• LW-LR (Channel)

• LW-BR (Command)
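The two-by-two classification above can be written as a small lookup table. This is a minimal Python sketch for illustration only; the names `Policy`, `ACM_TYPES`, `classify` and `can_be_held_up` are assumptions and do not come from the talk.

```python
from enum import Enum


class Policy(Enum):
    BUSY = "busy"  # never held up: may over-write (writer) or re-read (reader)
    LAZY = "lazy"  # may be held up while waiting for the other side


# (writer policy, reader policy) -> ACM type, following the four combinations above.
ACM_TYPES = {
    (Policy.BUSY, Policy.LAZY): "Signal",   # BW-LR
    (Policy.BUSY, Policy.BUSY): "Pool",     # BW-BR
    (Policy.LAZY, Policy.LAZY): "Channel",  # LW-LR
    (Policy.LAZY, Policy.BUSY): "Command",  # LW-BR
}


def classify(writer: Policy, reader: Policy) -> str:
    """Return the ACM type for a given writer and reader policy."""
    return ACM_TYPES[(writer, reader)]


def can_be_held_up(policy: Policy) -> bool:
    """Only a lazy process can be made to wait."""
    return policy is Policy.LAZY


if __name__ == "__main__":
    for (writer, reader), name in ACM_TYPES.items():
        print(f"{writer.value} writer, {reader.value} reader: {name}")
```

Keeping the writer and reader policies as separate values mirrors the symmetry that the busy/lazy naming is meant to expose.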

Signal vs Pool

Pool

Real time 1 (busy domain)

Real time 2 (busy domain)

Signal

Real time (busy domain)

Data-driven (lazy domain)

Low Power!

Sample algorithms

Pool – with 3 slots – fully asynchronous

wr: write slot n;
w0: l:=n;
w1: n:=¬(l,r);
r0: r:=l;
rd: read slot r;

Signal – with 2 slots – conditionally asynchronous

wr: write slot w;
w0: w:=¬r;
r0: r:=¬r;
rd: wait until w¬=r; read slot r;
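The three-slot pool statements (wr, w0, w1, r0, rd) can be traced with a small sequential model. The sketch below is an assumption-laden illustration, not the authors' implementation: slots are numbered 0 to 2, the class name `ThreeSlotPool` is invented, each call runs a whole writer or reader cycle without interleaving, and where ¬(l,r) leaves two candidate slots the lowest-numbered one is chosen.

```python
class ThreeSlotPool:
    """Sequential model of the three-slot pool; no real concurrency is modelled."""

    def __init__(self, initial=None):
        self.slots = [initial, None, None]
        self.l = 0  # slot holding the latest complete item
        self.n = 1  # slot the writer will fill next
        self.r = 0  # slot the reader is using

    def write(self, item):
        self.slots[self.n] = item  # wr: write slot n
        self.l = self.n            # w0: l := n
        # w1: n := not(l, r), i.e. any slot different from both l and r.
        # Taking the lowest such slot is an assumption of this sketch.
        self.n = next(s for s in range(3) if s not in (self.l, self.r))

    def read(self):
        self.r = self.l            # r0: r := l
        return self.slots[self.r]  # rd: read slot r


if __name__ == "__main__":
    pool = ThreeSlotPool()
    pool.write("a")
    print(pool.read(), pool.read())  # busy reader: the same item can be re-read
    pool.write("b")
    pool.write("c")                  # busy writer: "b" is over-written unread
    print(pool.read())
```

Because n is always chosen to differ from both l and r, the writer never targets the slot the reader last selected, which is what lets neither side wait in this model.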

What is a slot?

- Slot: shared memory for one item of data.

- Multiple slots: there is no temporal independence with only one slot. (There will always be situations in which both processes clash in time on the one data slot.)

- Capacity: not to be confused with the number of slots. It takes a minimum of 3 slots to make a capacity 1 pool.

Data Properties

Coherence

Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;

Read: ‘07:57’; ‘07:59’; ‘07:00’; ‘08:02’;

Freshness

Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;

Read: ‘07:57’; ‘07:58’; ‘08:02’;

Sequence

Write: ‘07:57’; ‘07:58’; ‘07:59’; ‘08:00’; ‘08:01’; ‘08:02’; ‘08:03’;

Read: ‘07:57’; ‘07:59’; ‘07:58’; ‘08:02’;
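Two of the properties above can be checked mechanically from the write and read lists alone. The following Python sketch is one possible reading of them, with invented function names: coherence is taken to mean that every item read is exactly one of the items written, and sequence that items are read in the order they were written. Freshness depends on when each read happens relative to the writes, which the lists do not record, so it is not checked here.

```python
WRITES = ["07:57", "07:58", "07:59", "08:00", "08:01", "08:02", "08:03"]


def is_coherent(writes, reads):
    """Every item read must be exactly one of the items written."""
    return all(item in writes for item in reads)


def is_in_sequence(writes, reads):
    """Items must be read in write order; re-reads and skipped items are allowed."""
    positions = [writes.index(item) for item in reads if item in writes]
    return all(a <= b for a, b in zip(positions, positions[1:]))


coherence_reads = ["07:57", "07:59", "07:00", "08:02"]
sequence_reads = ["07:57", "07:59", "07:58", "08:02"]

if __name__ == "__main__":
    # '07:00' does not appear in the write list.
    print(is_coherent(WRITES, coherence_reads))
    # '07:58' is read after '07:59'.
    print(is_in_sequence(WRITES, sequence_reads))
```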

SIGNAL: Data latency

If a reader cycle immediately follows a writer cycle, what data does it get? After the writer writes X and posts it, does the reader read X?

Initial state: w=0, r=0

Writer: write slot w; w := not r;

Reader: r := not r; wait until w¬=r; read slot r;

Writer, write X: write slot 0.

Writer, post: w := not r = 1.

Reader, pre: r := not r = 1. Now w == r, therefore the reader is made to wait.

Writer, write Y: write slot 1.

Writer, post: w := not r = 0.

Reader: now w ¬= r, so it reads slot 1, which holds Y rather than X.

This implies 0 capacity

Trade off between slots and capacity and latency.

3 slot signal has capacity 1, and does not make the reader wait as here.
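The latency walk-through above can be replayed step by step. This is a minimal Python sketch under stated assumptions: the class and method names are invented, each statement is treated as atomic, and the reader's wait is modelled as a test the caller polls rather than as a blocking call.

```python
class TwoSlotSignal:
    """Step-by-step model of the two-slot signal statements wr, w0, r0 and rd."""

    def __init__(self):
        self.slots = [None, None]
        self.w = 0
        self.r = 0

    def write(self, item):
        self.slots[self.w] = item  # wr: write slot w
        self.w = 1 - self.r        # w0: w := not r

    def reader_pre(self):
        self.r = 1 - self.r        # r0: r := not r

    def reader_may_proceed(self):
        return self.w != self.r    # rd guard: wait until w != r

    def reader_read(self):
        return self.slots[self.r]  # rd: read slot r


def latency_trace():
    """Replay the sequence on the slide: write X, reader starts, write Y, read."""
    signal = TwoSlotSignal()
    signal.write("X")                      # slot 0 holds X, w becomes 1
    signal.reader_pre()                    # r becomes 1
    waited = not signal.reader_may_proceed()
    signal.write("Y")                      # slot 1 holds Y, w becomes 0
    return waited, signal.reader_read()


if __name__ == "__main__":
    print(latency_trace())
```

In this model the reader is held up after the first write and then receives Y, which is the behaviour the slide summarises as 0 capacity.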

Modeling the algorithms

Example statement: “w := not r;”

[Figure: subnet W0 in the Signal, with places start, finish, r=0, r=1, w=0 and w=1.]

Non-abstract models for ease of understanding

This is atomic – some statements need to be 2 stage
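One way to picture the subnet for "w := not r;" is as a small safe Petri net in which a single transition firing moves the control token from start to finish and updates w. The sketch below is an assumption about the net's structure, not a transcription of the slide: it uses one transition per combination of r and w, returns the r token to its place (a read arc), and represents the marking as a set of place names.

```python
# Each transition: (name, places consumed, places produced).
# The r place is consumed and produced again, so r is only referenced, never changed.
TRANSITIONS = [
    ("r0_w0", {"start", "r=0", "w=0"}, {"finish", "r=0", "w=1"}),
    ("r0_w1", {"start", "r=0", "w=1"}, {"finish", "r=0", "w=1"}),
    ("r1_w0", {"start", "r=1", "w=0"}, {"finish", "r=1", "w=0"}),
    ("r1_w1", {"start", "r=1", "w=1"}, {"finish", "r=1", "w=0"}),
]


def enabled(marking):
    """Transitions whose input places all hold a token."""
    return [t for t in TRANSITIONS if t[1] <= marking]


def fire(marking, transition):
    """Remove tokens from the input places and add them to the output places."""
    _, consumed, produced = transition
    return (marking - consumed) | produced


def run_w0(r, w):
    """Execute 'w := not r' from a marking with a token in start."""
    marking = frozenset({"start", f"r={r}", f"w={w}"})
    choices = enabled(marking)
    assert len(choices) == 1  # exactly one transition fits any given r and w
    return fire(marking, choices[0])


if __name__ == "__main__":
    for r in (0, 1):
        for w in (0, 1):
            print(r, w, sorted(run_w0(r, w)))
```

Because the whole statement is one transition firing, it is atomic in this model; a two-stage statement would need an intermediate place between start and finish.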

Modeling the algorithms

[Figure: the W0, write, read and R0 subnets connected through the shared places w=0/1, r=0/1 and Slot_0/1 read/unread, with arcs for setting and referencing.]

Sub-models and the ‘enable’ place

[Figure: the ‘write end fresh and valid’ sub-model, entered at write post.]

Write is set to fresh and valid; the other slot is set to not fresh.

This should appear as an atomic action to the other process

[Figure: the ‘write end testing’ sub-model, with an ‘enable’ place that is part of the reader model.]

Metastability

[Figure: flip-flop waveforms Q1, Q2, S and R against time around the active clock edge, showing a normal state-transition.]

[Figure: input set-up time and output propagation delay around the active clock edge.]

Every flip-flop has at least three equilibrium points, two stable and one unstable.

Metastable transients

Keep away from data path!

[Figure: the same timing diagram with output levels 0 and 1 and the metastable level M.]

Analysis and Some Results

Exhaustive ‘reachability’ search – all process interleaving covered.

• 3 slot pool: Control {1,2,3}; Arbiter req.; Capacity 1+delay

• 4 slot pool: Control {0,1}; No arbiter; Capacity 1

• 2 slot signal: Control {0,1}; No arbiter; Capacity 0~1

• 3 slot signal: Control {1,2,3}; No arbiter; Capacity 1
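The exhaustive reachability search mentioned above can be illustrated on the three-slot pool statements. This Python sketch is a simplified stand-in for the Petri net analysis, under several assumptions: every statement is atomic (which is what the arbiter is assumed to guarantee), slots are numbered 0 to 2, both candidate results of ¬(l,r) are explored when l equals r, and a clash is defined as the writer writing the slot the reader is reading. The function names and the state encoding are invented.

```python
from collections import deque

# State: (writer_pc, reader_pc, l, n, r)
# writer_pc: 0 = writing slot n, 1 = about to do l := n, 2 = about to do n := not(l, r)
# reader_pc: 0 = about to do r := l, 1 = reading slot r
INITIAL = (0, 0, 0, 1, 0)


def successors(state):
    """All states reachable by one atomic step of the writer or the reader."""
    wpc, rpc, l, n, r = state
    if wpc == 0:
        yield (1, rpc, l, n, r)          # wr completes
    elif wpc == 1:
        yield (2, rpc, n, n, r)          # w0: l := n
    else:
        for s in range(3):
            if s != l and s != r:
                yield (0, rpc, l, s, r)  # w1: n := not(l, r)
    if rpc == 0:
        yield (wpc, 1, l, n, l)          # r0: r := l
    else:
        yield (wpc, 0, l, n, r)          # rd completes


def explore(initial=INITIAL):
    """Breadth-first search over every interleaving; returns states and clashes."""
    seen = {initial}
    queue = deque([initial])
    clashes = []
    while queue:
        state = queue.popleft()
        wpc, rpc, l, n, r = state
        if wpc == 0 and rpc == 1 and n == r:
            clashes.append(state)        # writer and reader on the same slot
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, clashes


if __name__ == "__main__":
    states, clashes = explore()
    print(len(states), "reachable states,", len(clashes), "clashes")
```

Under these assumptions the search finds no state in which both processes use the same slot at once; it says nothing about non-atomic statements or metastability, which the full models address.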

VLSI design layout (chip fab’ed in June 2000 via EUROPRACTICE)

4-slot Pool ACM

4-slot ACM part

(details on testing in 9th Async UK Forum paper)

Conclusion

Open questions:

• Have the best ACM algorithms been found?

• How should ‘best’ be defined?

Current and Future work

Applications – distributed CCTV, Control systems.

Modelling of ACMs in system - analysis - (Moses/Metropolis?)

Acknowledgements

More info on team and projects

Tony Davies

David Fraser

David Kinniment

Albert Koelmans

Graeme Chester

Fei Hao

Maria Valera

Sergio Velastin

Other team members

Coherent project

Grants GR/32895 & GR/32666

http://async.org.uk/coherent/

Collaborators

Hugo Simpson

Eric Campbell