ESL: System Level Design
Bluespec ESEPro: ESL Synthesis Extensions for SystemC
Rishiyur S. Nikhil, CTO, Bluespec, Inc. (www.bluespec.com)
6.375 Lecture 16, delivered by Arvind, March 16, 2007 (only a subset of Nikhil’s slides are included)
The central ESL design problem
[Figure: early software and the first HW model(s), linked by “implements” and by the HW/SW interface (e.g., register read/write), are refined into the software and the HW implementation, also linked by “implements”]
• Early models: available early; very fast sim; not HW-accurate (timing, area)
• HW implementation: not available early; slower sim; HW-accurate
• Refinement: explore architectures (for speed, area, power)
Required: uniform computational model (single paradigm), plus higher level than RTL, even for implementation
Another ESL design problem
Reuse (models and implementations)
SoC 1 SoC 2 SoC n
Required: powerful parameterization and powerful module interface semantics
Bluespec enables ESL
• Rules and Rule-based Interfaces provide a uniform computational model suitable both for high-level system modeling and for HW implementation
• Atomic Transaction semantics are very powerful for expressing complex concurrency
– Formal and informal reasoning about correctness
– Automatic synthesis of complex control logic to manage concurrency
• Map naturally to HW (“state machines”); synthesizable; no mental shifting of gears during refinement
• Can be mixed with regular SystemC, TLM, and C++, for mixed-model and whole-system modeling
• Enables Design-by-Refinement; Design-by-Contract
BSV: Bluespec SystemVerilog
ESEPro: Bluespec’s ESL Synthesis Extensions to SystemC
Rule Concurrent Semantics
• “Untimed” semantics:
Forever: Execute any enabled rule
• “Timed”, or “Clock Scheduled” semantics (Bluespec scheduling technology):
In each clock: Execute a subset of enabled rules (in parallel, but consistent with untimed semantics)
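The untimed semantics can be sketched in a few lines of plain C++. This is only an illustration, not ESEPro or BSV code: the `State`, `Rule`, `run_untimed`, and `gcd_by_rules` names are hypothetical, and the GCD example is an assumption chosen because it needs just two rules. Each rule is a guard plus an atomic action, and the loop keeps firing any enabled rule until none is enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical state for a two-register GCD machine.
struct State { unsigned x; unsigned y; };

// A rule is a guard (is the rule enabled?) plus an action (one atomic update).
struct Rule {
    std::function<bool(const State&)> guard;
    std::function<void(State&)> action;
};

// Untimed semantics: repeatedly execute any enabled rule, one at a time,
// until no rule is enabled. Returns the number of rule firings.
std::size_t run_untimed(State& s, const std::vector<Rule>& rules) {
    std::size_t fired = 0;
    bool progress = true;
    while (progress) {
        progress = false;
        for (const Rule& r : rules) {
            if (r.guard(s)) {
                r.action(s);
                ++fired;
                progress = true;
                break;  // re-evaluate all guards after each atomic action
            }
        }
    }
    return fired;
}

unsigned gcd_by_rules(unsigned a, unsigned b) {
    const std::vector<Rule> rules = {
        // swap: enabled when x is larger than y (or x is zero) and y is non-zero
        { [](const State& s) { return s.y != 0 && (s.x > s.y || s.x == 0); },
          [](State& s) { unsigned t = s.x; s.x = s.y; s.y = t; } },
        // subtract: enabled when x is non-zero and x <= y
        { [](const State& s) { return s.x != 0 && s.x <= s.y; },
          [](State& s) { s.y -= s.x; } },
    };
    State s{a, b};
    run_untimed(s, rules);
    assert(s.y == 0 || s.x == 0);  // no rule enabled: the computation is finished
    return s.x;
}
```

A clock-scheduled version would fire several non-conflicting rules per step, but any result it produces should also be reachable by this one-rule-at-a-time loop.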
Bluespec Tools Architecture
[Figure: Bluespec tool flow]
• Inputs: BSV (SystemVerilog*) and ESEPro (SystemC*), each with its own parsing step
• ESEComp and BSC: Common Synthesis Engine (Static Checking, Scheduling, Optimization, Power Optimization, RTL Generation), producing RTL for synthesis and for cycle-accurate simulation with a Verilog simulator
• ESE and ESEPro: gcc with systemc.h and esl.h produces an .exe for untimed and timed simulation
• Bluesim: rapid, source-level simulation and interactive debug of BSV
• Blueview debug
Outline
• Limitations of SystemC in modeling SoCs
• ESEPro’s Rule-based Interfaces
• Model-to-implementation refinement with SystemC and ESEPro modules
• Seamless interoperation of SystemC TLM and ESEPro modules
• ESEPro-to-RTL synthesis
• An example
Example illustrating why modeling hardware-accurate complex concurrency is difficult in standard SystemC (threads and events)
A 2x2 switch, with stats
Spec:
• Packets arrive on two input FIFOs, and must be switched to two output FIFOs
• Certain “interesting packets” must be counted
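[Figure: 2x2 switch. Each of the two input FIFOs feeds a “Determine Queue” block that routes packets to one of the two output FIFOs; a “+1” counter counts certain packets]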
The first version of the SystemC code is easy
void thread1 () {
  while (true) {
    Pkt x = in0->first();
    in0->deq();
    if (x.dest == 0) out0->enq (x);
    else             out1->enq (x);
    if (count(x)) c++;
  }
}
void thread2 () {
  while (true) {
    Pkt x = in1->first();
    in1->deq();
    if (x.dest == 0) out0->enq (x);
    else             out1->enq (x);
    if (count(x)) c++;
  }
}
first() and deq() block if the input fifo is empty; enq() blocks if the output fifo is full.
It all works fine because of “cooperative parallelism”
Cooperative parallelism model
• The two increments to the counter do not need to be protected with “locks” because of SystemC’s definition of parallelism as cooperative, i.e.,
• Threads only switch at “wait()” statements
• Threads do not interleave
• But real hardware has real parallelism!
• Gap between model and implementation
• Further, cooperative multithreading also makes it hard to simulate models in parallel (e.g., on a modern multi-core or SMP machine)
This code would have problems with preemptive parallelism
There could be some subtle mistakes
void thread1 () {
  while (true) {
    int tmp = c;
    Pkt x = in0->first();
    in0->deq();
    if (x.dest == 0) out0->enq (x);
    else             out1->enq (x);
    if (count(x)) c = tmp + 1;
  }
}
void thread2 () {
  while (true) {
    int tmp = c;
    Pkt x = in1->first();
    in1->deq();
    if (x.dest == 0) out0->enq (x);
    else             out1->enq (x);
    if (count(x)) c = tmp + 1;
  }
}
If the threads interleave due to blocking of first(), deq(), enq(), c will be incorrectly updated (non-atomically)
Cooperative parallelism ≠ Atomicity
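The lost update can be reproduced deterministically without any threading library. The sketch below is an assumption-laden simplification, not SystemC: each worker's loop body is reduced to two steps (read `c` into `tmp`, then write `tmp + 1`), and the hypothetical `run_schedule` function replays an explicit interleaving, where a switch between the two steps stands for a thread blocking in first(), deq(), or enq().

```cpp
#include <cassert>
#include <vector>

// Per-thread progress through one iteration of the slide's loop body.
struct Worker {
    int tmp = 0;   // private copy of the shared counter
    int step = 0;  // 0: about to read c, 1: about to write c, 2: done
};

// Replays a schedule (a sequence of worker ids, 0 or 1). Each entry lets
// that worker execute its next step. Returns the final counter value.
int run_schedule(const std::vector<int>& schedule) {
    int c = 0;  // shared counter
    Worker workers[2];
    for (int id : schedule) {
        assert(id == 0 || id == 1);
        Worker& w = workers[id];
        if (w.step == 0) {
            w.tmp = c;      // int tmp = c;
            w.step = 1;     // (the thread may block here)
        } else if (w.step == 1) {
            c = w.tmp + 1;  // c = tmp + 1;
            w.step = 2;
        }
    }
    return c;
}
```

With the schedule {0, 0, 1, 1} each worker runs to completion and the counter ends at 2; with {0, 1, 0, 1} both workers read 0 before either writes, so one increment is lost and the counter ends at 1.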
Hardware has additional “resource contention” constraints
• Each output fifo can be enq’d by only one process at a time (in the same clock)
• Need arbitration if both processes want to enq() on the same fifo simultaneously
• SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware has additional “resource contention” constraints
• The counter can be incremented by only one process at a time
• Need arbitration if both want to increment
• SystemC’s cooperative multitasking makes it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware has additional “resource contention” constraints
• No intermediate buffering: a process should transfer a packet only when both its input fifo and its output fifo are ready, and it has priority on its output fifo and the counter
• SystemC’s blocking methods make it easy to ignore this, but much harder to model this accurately
Accurately modeling this makes the code messier
Hardware typically has additional “resource contention” constraints
• These constraints must be modeled in order to model HW performance accurately (latencies, bandwidth)
• In SystemC, this exposes full/empty tests on fifos, adds locks/semaphores, polling of locks/semaphores, …
• The code becomes a mess
• If we want synthesizability, it more and more resembles writing RTL in SystemC notation
Limitations of SystemC/C++
• Accurate SoC modeling involves lots of concurrency and dynamic, fine-grain resource sharing
– Because these are the characteristics of HW
– Most blocks in an SoC are HW; a few blocks (e.g., processor, DSP) involve software (typically C, C++)
• “Threads and Events” (SystemC’s concurrency model) are far too low-level for this
– Require tedious, explicit management of concurrent access to shared state
– Weak semantics for module composition
– Does not scale to large systems
– They are the source of the majority of bugs in RTL and SystemC (race conditions, inconsistent state, protocol errors, …)
• Instead, advanced SW systems (e.g., Operating Systems, Database Systems, Transaction Processing Systems) use Atomic Transactions to manage complex concurrency
Other issues with SystemC/C++
• No early feedback on HW implementability during modeling, because of distance of SystemC semantics from HW
• Threads, stacks, dynamic allocation, events, locks, global variables, undisciplined instantaneous access to global/remote data
• Undisciplined access to shared resources
• No credible synthesis from a sequential, thread-based model of computation (except for loop-and-array computational kernels)
• The design has to be re-implemented in RTL
Literature on problems with threads (and the advantages of atomicity)
• The Problem with Threads, Edward A. Lee, IEEE Computer, 39:5, pp 33-42, May 2006
• Why threads are a bad idea (for most purposes), John K. Ousterhout, Invited Talk, USENIX Technical Conference, January 1996
• Composable memory transactions, Tim Harris, Simon Marlow, Simon Peyton Jones and Maurice Herlihy, in ACM Conf. on Principles and Practice of Parallel Programming (PPoPP), 2005.
• Atomic Transactions, Nancy A. Lynch, Michael Merritt, William E. Weihl and Alan Fekete, Morgan Kaufmann, San Mateo, CA, 1994, 476 pp.
• … and more …
2x2 switch: the meat of the ESEPro code
ESL_RULE (r0, true) {
  Pkt x = in0->first();
  in0->deq();
  if (x.dest == 0) out0->enq(x);
  else             out1->enq(x);
  if (count(x)) c++;
}
ESL_RULE (r1, true) {
  Pkt x = in1->first();
  in1->deq();
  if (x.dest == 0) out0->enq(x);
  else             out1->enq(x);
  if (count(x)) c++;
}
Atomicity of rules captures all the “resource contention” constraints of hardware implementation; further, this code is synthesizable to RTL as written.
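To make the claim concrete, here is a minimal plain-C++ sketch of how a clocked scheduler might treat the two rules. It is not ESEPro output and not the Bluespec scheduling algorithm: the `Pkt`, `Fifo`, and `Switch` types are hypothetical, the fixed priority of rule 0 over rule 1 is an assumption, and the `interesting` flag stands in for the slide's count(x). A rule fires in a clock only if its input FIFO has data, its target output FIFO has room, and no earlier rule in the same clock has already claimed that output FIFO or the counter.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Pkt { int dest; bool interesting; };

struct Fifo {
    std::deque<Pkt> q;
    std::size_t capacity = 2;
    bool can_deq() const { return !q.empty(); }
    bool can_enq() const { return q.size() < capacity; }
};

struct Switch {
    Fifo in[2];
    Fifo out[2];
    int c = 0;  // count of interesting packets

    // One clock: consider rule 0, then rule 1 (assumed fixed priority).
    // Returns how many rules fired in this clock.
    int clock() {
        bool out_used[2] = {false, false};
        bool counter_used = false;
        int fired = 0;
        for (int i = 0; i < 2; ++i) {
            // Implicit conditions: the input must have a packet ...
            if (!in[i].can_deq()) continue;
            const Pkt x = in[i].q.front();
            const int d = (x.dest == 0) ? 0 : 1;
            // ... the chosen output must have room and be unclaimed ...
            if (!out[d].can_enq() || out_used[d]) continue;
            // ... and the counter must be unclaimed if this packet needs it.
            if (x.interesting && counter_used) continue;
            // Atomic action: all updates happen, or none do.
            assert(out[d].can_enq());
            in[i].q.pop_front();
            out[d].q.push_back(x);
            out_used[d] = true;
            if (x.interesting) { ++c; counter_used = true; }
            ++fired;
        }
        return fired;
    }
};
```

If both inputs hold an interesting packet for output 0, only one rule fires per clock and the other packet stays in its input FIFO, with no intermediate buffering; if the packets go to different outputs and neither is counted, both rules fire in the same clock.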
Managing change
• Specs always change. Imagine:
• Some packets are multicast (go to both FIFOs)
• Some packets are dropped (go to no FIFO)
• More complex arbitration
– FIFO collision: in favor of r1
– Counter collision: in favor of r2
– Fair scheduling
• Several counters for several kinds of interesting packets
• Non exclusive counters (e.g., TCP IP)
• M input FIFOs, N output FIFOs (parameterized)
• What if these changes are required 6 months after original coding?
With Rules these are easy, because the source code remains uncluttered by all the complex control and mux logic; atomicity ensures correctness
Interfaces: raising the level of abstraction (while preserving Rule semantics)
• Interfaces can also contain other interfaces
• We use this to build a hierarchy of interfaces
• Get/Put Client/Server …
• These capture common interface design patterns
• There is no HW overhead to such abstraction
• Connections between standard interfaces can be packaged (and used, and reused)
• “Connectable” interfaces
• All these are synthesizable
Get and Put Interfaces
• Provide simple methods for getting data from a module or putting data into it
• Easy to connect together
template <typename T>
ESL_INTERFACE ( Get ) {
  ESL_METHOD_ACTIONVALUE_INTERFACE ( get, T );
}
template <typename T>
ESL_INTERFACE ( Put ) {
  ESL_METHOD_ACTION_INTERFACE ( put, T x );
}
Get and Put Interfaces
• Get and Put are just interface specifications
• Many different kinds of modules can provide Get and Put interfaces
• E.g., a FIFO’s enq() can be viewed as a put() operation, and a FIFO’s first()/deq() can be viewed as a get() operation
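The FIFO-as-Get/Put view can be illustrated in ordinary C++ with abstract classes. This is a sketch under stated assumptions, not the ESEPro library: `Get`, `Put`, `Fifo`, `FifoToGet`, and `FifoToPut` here are hypothetical plain-C++ stand-ins, and the explicit `can_get()` / `can_put()` methods take the place of the implicit conditions that rule-based methods carry.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Interface specifications only: no storage, no behavior.
template <typename T>
struct Get {
    virtual ~Get() = default;
    virtual bool can_get() const = 0;
    virtual T get() = 0;
};

template <typename T>
struct Put {
    virtual ~Put() = default;
    virtual bool can_put() const = 0;
    virtual void put(const T& x) = 0;
};

// A bounded FIFO with the first()/deq()/enq() operations named on the slides.
template <typename T>
class Fifo {
  public:
    explicit Fifo(std::size_t capacity) : capacity_(capacity) {}
    bool not_empty() const { return !q_.empty(); }
    bool not_full() const { return q_.size() < capacity_; }
    const T& first() const { assert(not_empty()); return q_.front(); }
    void deq() { assert(not_empty()); q_.pop_front(); }
    void enq(const T& x) { assert(not_full()); q_.push_back(x); }
  private:
    std::deque<T> q_;
    std::size_t capacity_;
};

// get() is first() followed by deq().
template <typename T>
class FifoToGet : public Get<T> {
  public:
    explicit FifoToGet(Fifo<T>& f) : f_(f) {}
    bool can_get() const override { return f_.not_empty(); }
    T get() override { T temp = f_.first(); f_.deq(); return temp; }
  private:
    Fifo<T>& f_;
};

// put() is enq().
template <typename T>
class FifoToPut : public Put<T> {
  public:
    explicit FifoToPut(Fifo<T>& f) : f_(f) {}
    bool can_put() const override { return f_.not_full(); }
    void put(const T& x) override { f_.enq(x); }
  private:
    Fifo<T>& f_;
};
```

Both adapters hold only a reference to the FIFO, so the same FIFO can be exposed as a Put on one side and a Get on the other without copying any data.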
Interface transformers/transactors
• Because of the abstractions of interfaces and modules, it is easy to write interface transformers/transactors
• This example is from the ESEPro library, transforming a FIFO interface into a Get interface
ESL_MODULE_TEMPLATE ( fifoToGet, Get, T ) {
  FIFO<T> *f;
  ESL_METHOD_ACTIONVALUE (get, true, T) {
    T temp = f->first();
    f->deq();
    return temp;
  }
  ESL_CTOR ( fifoToGet, FIFO<T> *ff ) : f ( ff ) {
    ESL_END_CTOR;
  }
};
Interface transformers/transactors
• Another example from the ESEPro library, transforming a FIFO interface into a Put interface:
ESL_MODULE_TEMPLATE ( fifoToPut, Put, T ) {
  FIFO<T> *f;
  ESL_METHOD_ACTION (put, true, T x) {
    f->enq (x);
  }
  ESL_CTOR ( fifoToPut, FIFO<T> *ff ) : f ( ff ) {
    ESL_END_CTOR;
  }
};
Nested interfaces
• An interface can not only contain methods, but also nested interfaces
template < typename Req_t, typename Resp_t >
ESL_INTERFACE ( Server ) {
  ESL_SUBINTERFACE ( request, Put <Req_t> );
  ESL_SUBINTERFACE ( response, Get <Resp_t> );
}
Sub-interfaces: using transformers
• The ESEPro library provides functions to convert FIFOs to Get/Put
typedef Server<Req_t, Resp_t> CacheIfc;
ESL_MODULE ( mkCache, CacheIfc ) {
  FIFO<Req_t> *p2c;
  FIFO<Resp_t> *c2p;
  … rules expressing cache logic …
  ESL_CTOR ( mkCache, …) {
    request = new fifoToPut ("req", p2c);
    response = new fifoToGet ("rsp", c2p);
  }
}
Absolutely no difference in the HW!
Client/Server interfaces
• Get/Put pairs are very common, and duals of each other, so the library defines Client/Server interface types for this purpose
ESL_INTERFACE ( Client<req_t, resp_t> ) {
  ESL_SUBINTERFACE (request, Get<req_t> );
  ESL_SUBINTERFACE (response, Put<resp_t> );
};
ESL_INTERFACE ( Server<req_t, resp_t> ) {
  ESL_SUBINTERFACE ( request, Put<req_t> );
  ESL_SUBINTERFACE ( response, Get<resp_t> );
};
[Figure: a client (get for requests, put for responses) facing a server (put for requests, get for responses); req_t flows one way and resp_t the other]
Client/Server interfaces
ESL_INTERFACE ( CacheIfc ) {
  ESL_SUBINTERFACE ( ipc, Server<Req_t, Resp_t> );
  ESL_SUBINTERFACE ( icm, Client<Req_t, Resp_t> );
};
ESL_MODULE ( mkCache, CacheIfc ) {
  // from / to processor
  FIFO<Req_t> *p2c;
  FIFO<Resp_t> *c2p;
  // to / from memory
  FIFO<Req_t> *c2m;
  FIFO<Resp_t> *m2c;
  … rules expressing cache logic …
  ESL_CTOR (mkCache ) {
    …
    ipc = fifosToServer (p2c, c2p);
    icm = fifosToClient (c2m, m2c);
    ESL_END_CTOR;
  }
};
[Figure: mkProcessor (client) connected to the server side of mkCache; the client side of mkCache connected to mkMem (server)]
Connecting Get and Put
• A module m1 providing a Get interface can be connected to a module m2 providing a Put interface with a simple rule
ESL_MODULE ( mkTop, …) {
  Get<int> *m1;
  Put<int> *m2;
  ESL_RULE ( connect, true ) {
    int x = m1->get();
    m2->put (x); // note implicit conditions
  }
}
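The “implicit conditions” noted in the rule can be spelled out in plain C++. This is a hedged sketch rather than ESEPro code: the `Get` and `Put` structs, `connect_rule`, `make_source`, and `make_sink` are all hypothetical names, and the readiness checks that ESEPro derives from the methods are written here as explicit `ready` callbacks. The rule fires only when both the get side and the put side are ready; otherwise nothing happens.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical interfaces: a readiness test plus the method itself.
template <typename T>
struct Get {
    std::function<bool()> ready;
    std::function<T()> get;
};

template <typename T>
struct Put {
    std::function<bool()> ready;
    std::function<void(const T&)> put;
};

// The connect rule: fires atomically only if both sides are ready.
// Returns true if the rule fired.
template <typename T>
bool connect_rule(Get<T>& from, Put<T>& to) {
    assert(from.ready && from.get && to.ready && to.put);
    if (!from.ready() || !to.ready()) return false;  // implicit conditions
    to.put(from.get());
    return true;
}

// A source that hands out the contents of a queue.
Get<int> make_source(std::deque<int>& q) {
    return Get<int>{ [&q] { return !q.empty(); },
                     [&q] { int v = q.front(); q.pop_front(); return v; } };
}

// A sink that accepts values until a capacity is reached.
Put<int> make_sink(std::deque<int>& q, std::size_t capacity) {
    return Put<int>{ [&q, capacity] { return q.size() < capacity; },
                     [&q](const int& v) { q.push_back(v); } };
}
```

Calling `connect_rule` repeatedly moves one value per firing and simply stops firing once the source is empty or the sink is full, which is the behavior the one-line rule on the slide relies on.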
“Connectable” interface pairs
• There are many pairs of types that are duals of each other
• Get/Put, Client/Server, YourTypeT1/YourTypeT2, …
• The ESEPro library defines an overloaded, templated module mkConnection which encapsulates the connections between such duals
• The ESEPro library predefines the implementation of mkConnection for Get/Put, Client/Server, etc.
• Because overloading in C++ is extensible, you can overload mkConnection to work on your own interface types T1 and T2
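The extensibility argument rests on ordinary C++ overload resolution, which a small sketch can show. Everything below is hypothetical plain C++ (the `Channel`, `Get`, `Put`, `Client`, `Server` structs and the `mk_connection` function are stand-ins, not the ESEPro `mkConnection` module): one overload connects a Get to a Put with a single rule, and a second overload connects a Client to a Server by reusing the first overload twice.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

using Rule = std::function<bool()>;  // returns true if the rule fired

template <typename T>
struct Channel { std::deque<T> q; };  // hypothetical unbounded buffer

template <typename T>
struct Get {
    Channel<T>* src;
    bool ready() const { return !src->q.empty(); }
    T get() { assert(ready()); T v = src->q.front(); src->q.pop_front(); return v; }
};

template <typename T>
struct Put {
    Channel<T>* dst;
    void put(const T& v) { dst->q.push_back(v); }
};

template <typename Req, typename Resp>
struct Client { Get<Req> request; Put<Resp> response; };

template <typename Req, typename Resp>
struct Server { Put<Req> request; Get<Resp> response; };

// Overload 1: Get/Put duals are connected by a single rule.
template <typename T>
void mk_connection(std::vector<Rule>& rules, Get<T> g, Put<T> p) {
    rules.push_back([g, p]() mutable {
        if (!g.ready()) return false;
        p.put(g.get());
        return true;
    });
}

// Overload 2: Client/Server duals are two Get/Put connections.
template <typename Req, typename Resp>
void mk_connection(std::vector<Rule>& rules,
                   Client<Req, Resp> c, Server<Req, Resp> s) {
    mk_connection(rules, c.request, s.request);    // requests: client -> server
    mk_connection(rules, s.response, c.response);  // responses: server -> client
}
```

A user-defined pair of dual interface types would get a third overload written in the same style, and call sites would not need to change.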
mkConnection
• Using these interface facilities, assembling systems becomes very easy
ESL_MODULE ( mkTopLevel, …) {
  // instantiate subsystems
  Client<Req_t, Resp_t> *p;
  Cache_Ifc<Req_t, Resp_t> *c;
  Server<Req_t, Resp_t> *m;
  // instantiate connections
  new mkConnection< Client<Req_t, Resp_t>, Server<Req_t, Resp_t> > ("p2c", p, c->ipc);
  new mkConnection< Client<Req_t, Resp_t>, Server<Req_t, Resp_t> > ("c2m", c->icm, m);
}
[Figure: mkProcessor (client) connected to the server (ipc) interface of mkCache; the client (icm) interface of mkCache connected to mkMem (server)]
Rules and Levels of abstraction
[Figure: abstraction levels set against modeling styles]
• Levels: AL/FL (Algorithm/Function level), PV (Programmer’s View), PVT (PV with Timing), AV (Architect’s View), CA (Cycle-accurate), IM (Implementation)
• Modeling styles: Rules, C, C++, Matlab, …; Untimed Rules (no clocks); Clocked Rules (scheduled)
Module structure
• A system model can contain a mixture of SystemC modules and ESEPro modules
• Typical SystemC modules:
– CPU ISS models
– Behavioral models
– C++ code targeted for behavioral synthesis
– Existing SystemC IP
• Typical ESEPro modules:
– Complex control
– Requiring HW-realistic architectural exploration
[Figure: SoC Model with blocks Processor (App/ISS), DSP (App/ISS), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model. Legend: SystemC; Rule-based SystemC]
Simulation flow
[Figure: the System Model (Processor (ISS/App), DSP (ISS/App), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model; legend: SystemC, Rule-based SystemC) is built on core SystemC class defs/libs, plus TLM class defs/libs (+TLM) and ESL class defs/libs (+ESL), and is run with standard SystemC tools (gcc, OSCI sim, gdb, …)]
Synthesis flow
Synthesizable subset: ESEPro Rule-based modules
• much higher level than RTL
• already validated in BSV
[Figure: from the System Model (Processor (ISS), DSP (App), DMA, Mem Controller, DRAM model, Interconnect, L2 cache, Codec model; legend: SystemC, Rule-based SystemC), the Bluespec synthesis tool generates RTL, which feeds Verilog sim and RTL synthesis, physical design, and tapeout]
System refinement using ESEPro
• ESEPro modules can be introduced early as they can be written at a very high level, can interface to TLM modules, and can themselves be refined
• System-level testbenches can be reused at all levels
• SystemC modules with standard TLM interfaces interoperate seamlessly with ESEPro modules
• Behavioral models, Design IP, Verification IP, …
More information at: http://www.bluespec.com/products/ESLSynthesisExtensions.htm
Website also has a free distribution called “ESE”
Mixing models: all combinations
[Figure: four master/slave pairings, obtained by replacing the master, the slave, or both: TLM Master with TLM Slave; TLM Master with ESEPro Slave; ESEPro Master with TLM Slave; ESEPro Master with ESEPro Slave. Legend: SystemC; Rule-based SystemC]
TLM Master and Slave are taken unmodified from OSCI TLM distribution examples
Structure of TLM modules in demo (from OSCI_TLM/examples/example_3_2)
[Figure: the TLM master (write (addr, data), read (addr, data &)) uses a basic_initiator_port to call RSP = transport (REQ) on the TLM slave, whose basic_slave_base provides write (addr, data) and read (addr, data &); both sides are annotated “20”]
transport () is a basic TLM interface call
TLM master and ESEPro slave
[Figure: the TLM master (write (addr, data), read (addr, data &)) calls transport () through its basic_initiator_port; a mkConnection (channel) joins it to the Server <REQ, RSP> interface of the ESEPro slave (write (addr, data), read (addr, data &)); both sides are annotated “20”]
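One way to picture the bridge between a blocking transport () call and a request/response Server interface is the following plain-C++ sketch. It does not use the OSCI TLM or ESEPro APIs: the `Req` and `Rsp` structs, the `RuleBasedSlave` class, and the `transport`, `master_write`, and `master_read` functions are hypothetical, and the 16-word memory is an assumption made only to give the slave something to do.

```cpp
#include <cassert>
#include <deque>

struct Req { bool is_write; unsigned addr; int data; };
struct Rsp { int data; };

// Hypothetical rule-based slave with a Server-style interface:
// requests are put in, responses are got out, and one rule does the work.
class RuleBasedSlave {
  public:
    void put_request(const Req& r) { requests_.push_back(r); }
    bool response_ready() const { return !responses_.empty(); }
    Rsp get_response() {
        assert(response_ready());
        Rsp r = responses_.front();
        responses_.pop_front();
        return r;
    }
    // Rule: serve one pending request. Returns true if the rule fired.
    bool rule_serve() {
        if (requests_.empty()) return false;
        const Req r = requests_.front();
        requests_.pop_front();
        int& cell = mem_[r.addr % kWords];
        if (r.is_write) cell = r.data;
        responses_.push_back(Rsp{cell});
        return true;
    }
  private:
    static constexpr unsigned kWords = 16;  // assumed memory size
    std::deque<Req> requests_;
    std::deque<Rsp> responses_;
    int mem_[kWords] = {};
};

// Blocking transport in the style RSP = transport (REQ): put the request,
// let the slave's rule run until a response exists, then get the response.
Rsp transport(RuleBasedSlave& slave, const Req& req) {
    slave.put_request(req);
    while (!slave.response_ready()) {
        const bool fired = slave.rule_serve();
        assert(fired);  // with one outstanding request the rule must be enabled
        (void)fired;
    }
    return slave.get_response();
}

// Master-side convenience calls mirroring write (addr, data) / read (addr, data &).
void master_write(RuleBasedSlave& s, unsigned addr, int data) {
    transport(s, Req{true, addr, data});
}

void master_read(RuleBasedSlave& s, unsigned addr, int& data) {
    data = transport(s, Req{false, addr, 0}).data;
}
```

In a real mixed simulation the SystemC kernel, not a local loop, would give the slave's rules a chance to run; the loop here only keeps the sketch self-contained.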
Example: ESEPro SoC model for synthesis (from ST/GreenSoCs “TAC” model)
M = Master interface; S = Slave interface (< 1000 lines of source code)
[Figure: Initiator 0 and Initiator 1 (master interfaces) connect through the Router to Target 0, Target 1, and the Timer (slave interfaces); labeled paths: “Set timer” and “Respond to timer interrupt”]
SoC Model: Behavior
• Initiators repeatedly do read/write transactions to Targets, via Router
• At startup, Initiator 1 writes to Timer registers via Router, starting the timer
• When Timer’s time period expires, it generates an interrupt to Initiator 1
SoC Model in ESEPro (from ST/GreenSoCs “TAC” model)
[Figure: synthesis example with two flows from the same ESEPro™ source]
• Simulation: core SystemC class defs/libs plus ESL class defs/libs (+ESL), run with standard SystemC tools (gcc, OSCI sim, gdb, …)
• Synthesis: ESEComp™ (Bluespec synthesis tool) generates RTL for Verilog sim and for Magma synthesis
• The simulation and synthesis flows are labeled cycle-accurate
This capability is unique to ESEComp
Side-by-side simulation comparison
[Two panels, cycle-accurate: SystemC Simulation and Verilog (Generated) Simulation]
First panel:
Cycle 12
Target[1]: got request from initiator[1], addr is 1001
Target[1]: sending response, data 1011
Target[0]: got request from initiator[0], addr is 4
Target[0]: sending response, data 14
Initiator_with_intr_in[1]: forwarding req, addr = 1003
Initiator[0]: got response addr 2, data 12
Initiator[0]: sending req, addr = 6
----------------
Cycle 13
Timer: generating interrupt
Initiator[1]: sending req, addr = 4
----------------
Cycle 14
Target[1]: got request from initiator[0], addr is 1005
Target[1]: sending response, data 1015
Target[0]: got request from initiator[1], addr is 2
Target[0]: sending response, data 12
Initiator_with_intr_in[1]: forwarding req, addr = 4
Initiator[1]: got response addr 0, data 10
Initiator[0]: got response addr 1003, data 1013
Initiator[0]: sending req, addr = 1007
----------------
Cycle 15
Initiator_with_intr_in[1] received interrupt
Initiator[1]: sending req, addr = 1005
Second panel:
Cycle 12
Initiator[0]: sending req, addr = 6
Initiator[0]: got response addr 2, data 12
Target[0]: got request from initiator[0], addr is 4
Target[0]: sending response, data 14
Target[1]: got request from initiator[1], addr is 1001
Target[1]: sending response, data 1011
Initiator_with_intr_in[1]: forwarding req, addr = 1003
----------------
Cycle 13
Initiator[1]: sending req, addr = 4
Timer: generating interrupt
----------------
Cycle 14
Initiator[0]: sending req, addr = 1007
Initiator[0]: got response addr 1003, data 1013
Initiator[1]: got response addr 0, data 10
Target[0]: got request from initiator[1], addr is 2
Target[0]: sending response, data 12
Target[1]: got request from initiator[0], addr is 1005
Target[1]: sending response, data 1015
Initiator_with_intr_in[1]: forwarding req, addr = 4
----------------
Cycle 15
Initiator[1]: sending req, addr = 1005
Initiator_with_intr_in[1] received interrupt
(The order of messages within each cycle varies, but that’s ok: the messages come from parallel actions)