system-level simulation (hw/sw co-simulation) · system-level simulation (hw/sw co-simulation)...
TRANSCRIPT
EE290A: Design of Embedded System ASV/LL 9/10
1
SystemSystem--level simulationlevel simulation(HW/SW co(HW/SW co--simulation)simulation)
OutlineOutline
nn Problem statementProblem statementnn Simulation and embedded system designSimulation and embedded system design
uu functional simulationfunctional simulationuu performance simulationperformance simulation
èè POLIS implementationPOLIS implementationèè partitioning examplepartitioning example
uu implementation simulationimplementation simulationèè softwaresoftware--orientedorientedèè hardwarehardware--centriccentric
nn SummarySummary
EE290A: Design of Embedded System ASV/LL 9/10
2
Embedded Heterogeneous Embedded Heterogeneous SystemSystem
A
Ddigital
downconv
900Mhz70Mhz 10.7Mhz
SH
SAW FILTER
RF IF 40Ms/sec - 540ks/sec
ViterbiEqls.
270.8ks/sec
demodand
sync
BB
phonebook
keypadintfc
protocolcontrol
de-intl&
decoder
RPE-LTPspeechdecoder
speechquality
enhancement
voicerecognition
Logic
analog digital
phonebookDMA
S/P
DSP core
uC core
RAM & ROM
GSM Phone
Modeling andModeling andSimulation TechniquesSimulation Techniques
execute C code on a workstation
execute object code usingn instruction level model
l instruction cycle accurate
l clock cycle accuraten HDL model of the processor
l RTL
l gate level netlistn phase accurate modeln pin accurate modeln fully functional modeln spice model of the processorn in-circuit emulatorsn real hardwarebus functional model
n some sort of C coden behavioral HDL coden RTL HDL code
l discrete eventl clock cycle
basedn gate level netlistn real hardware
n some sort of C coden frequency domain modeln spice modeln real hardware
RF IF
uC
DSPor
ASIC
n SourceModelsn SmartModelsn DesignWaren Hardware Modeler
EE290A: Design of Embedded System ASV/LL 9/10
3
Problem StatementProblem Statement
nn To model the behavior of a combined hardware To model the behavior of a combined hardware and software system based on models of the and software system based on models of the behavior of the hardware and software componentsbehavior of the hardware and software components
nn Usually requires trading offUsually requires trading offèè accuracyaccuracyèè throughputthroughputèè convenienceconvenience
uu using the right abstractions for the taskusing the right abstractions for the task
Function and Interface Function and Interface AbstractionAbstraction
n what is going on inside a subsystem
n how do two subsystems communicate
How much visibility do I need to have into
function abstraction
interface abstraction
? ?? ?
EE290A: Design of Embedded System ASV/LL 9/10
4
OutlineOutline
nn Problem statementProblem statementnn Simulation and embedded system designSimulation and embedded system design
uu functional simulationfunctional simulationuu performance simulationperformance simulation
èè POLIS implementationPOLIS implementationèè partitioning examplepartitioning example
uu implementation simulationimplementation simulationèè softwaresoftware--orientedorientedèè hardwarehardware--centriccentric
nn SummarySummary
Cycle-based,logic simul.
Design flowDesign flow
Behaviorcapture
Mapping(partitioning)
Architecturecapture
Functionalsimul.
Architecturesimul.
Performancesimul.
Synthesis,coding
EE290A: Design of Embedded System ASV/LL 9/10
5
for (i=0;i<8;i++)idctrow(i);
for (i=0;i<8;i++)idctcol(i);
Functional modelFunctional model
Variable-length dec.
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
MPEG 2decoder
Stream in Video out
Functional simulationFunctional simulationnn Timeless (not really coTimeless (not really co--simulation… )simulation… )nn Algorithm exploration, functional debugging, Algorithm exploration, functional debugging,
virtual prototypingvirtual prototypingnn Different formal models: Different formal models:
uu controlcontrol--dominated: CSP,dominated: CSP, EFSMsEFSMs, DE, etc., DE, etc.eventevent--based: Bones,based: Bones, StateChartsStateCharts, etc., etc.
uu data dominated: DF networksdata dominated: DF networkstokentoken--based:based: CossapCossap, SPW, etc., SPW, etc.
nn Single process network (PtolemySingle process network (Ptolemy--style)style)
EE290A: Design of Embedded System ASV/LL 9/10
6
HW and SW Modeling HW and SW Modeling TechniquesTechniques
throughput
accu
racy
progmemory
procHW HW
HW
SW&
procHW
HWSW
hardware centrichardware centric
software orientedsoftware oriented
functionalfunctional
Cycle-based,logic simul.
Design flowDesign flow
Behaviorcapture
Mapping(partitioning)
Architecturecapture
Functionalsimul.
Architecturesimul.
Performancesimul.
Synthesis,coding
EE290A: Design of Embedded System ASV/LL 9/10
7
Architecture modelArchitecture model
nn AbstractAbstract model for mappingmodel for mappinguu no detailed wiring (busses, serial links, etc.)no detailed wiring (busses, serial links, etc.)uu blackblack--box components (box components (ASICsASICs, micro, micro--controllers,controllers, DSPsDSPs, ,
memories, etc.)memories, etc.)nn Later refined to a detailed designLater refined to a detailed design
uu implement communicationimplement communicationuu refine interfacesrefine interfaces
Architecture modelArchitecture model
CPU(10 SPEC)
ASIC(s)(160 Mops)
Memory(2Mb) bus
10Mb/s
EE290A: Design of Embedded System ASV/LL 9/10
8
Cycle-based,logic simul.
Design flowDesign flow
Behaviorcapture
Mapping(partitioning)
Architecturecapture
Functionalsimul.
Architecturesimul.
Performancesimul.
Synthesis,coding
MappingMapping
nn Associates functional units with architectural unitsAssociates functional units with architectural unitsnn Performs HW/SW partitioningPerforms HW/SW partitioningnn Associates functional communication with Associates functional communication with
resources (buffers, busses, serial links, etc.)resources (buffers, busses, serial links, etc.)nn Provides Provides estimatesestimates of performance of a given of performance of a given
function on a given architectural unitfunction on a given architectural unit
EE290A: Design of Embedded System ASV/LL 9/10
9
MappingMapping
CPU(10 SPEC)
Memory(2Mb) bus
10Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
ASIC(160Mops)
Variable-length dec.
Stream in Video out
MappingMapping
CPU(10 SPEC)
Memory(2Mb) bus
10Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
ASIC(160Mops)
Variable-length dec.
Stream in Video out
EE290A: Design of Embedded System ASV/LL 9/10
10
MappingMapping
CPU(10 SPEC)
Memory(2Mb) bus
10Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
ASIC(160Mops)
Variable-length dec.
Stream in Video out
MappingMapping
CPU(10 SPEC)
Memory(2Mb) bus
10Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
ASIC(160Mops)
Variable-length dec.
Stream in Video out
EE290A: Design of Embedded System ASV/LL 9/10
11
Another mappingAnother mapping
CPU(10 SPEC)
Memory(2Mb) bus
10Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
Variable-length dec.
Stream in Video out
ASIC(160Mops)
Performance simulationPerformance simulation
nn Analyzes performance of behavior on given Analyzes performance of behavior on given architecturearchitecture
nn Architectural model provides:Architectural model provides:uu performance estimatesperformance estimatesuu resource constraintsresource constraints
èè CPU schedulingCPU schedulingèè bus arbitration policybus arbitration policyèè (abstract) cache modeling(abstract) cache modeling
EE290A: Design of Embedded System ASV/LL 9/10
12
OutlineOutline
nn Problem statementProblem statementnn Simulation and embedded system designSimulation and embedded system design
uu functional simulationfunctional simulationuu performance simulationperformance simulation
èè POLIS implementationPOLIS implementationèè partitioning examplepartitioning example
uu implementation simulationimplementation simulationèè softwaresoftware--orientedorientedèè hardwarehardware--centriccentric
nn SummarySummary
Performance simulation Performance simulation prototypeprototype
nn PolisPolisuu High level specificationHigh level specificationuu Unified model for both hardware and software (CFSMs)Unified model for both hardware and software (CFSMs)uu Code generation and estimationCode generation and estimationuu Hardware synthesisHardware synthesis
nn PtolemyPtolemyuu Simulation environmentSimulation environmentuu Different simulation models can coexistDifferent simulation models can coexistuu Nice graphical interfaceNice graphical interface
EE290A: Design of Embedded System ASV/LL 9/10
13
The POLIS design flowThe POLIS design flow
Graphical EFSMGraphical EFSM ESTERELESTEREL ................................
CompilersCompilers
CFSMsPartitioningPartitioning
Sw SynthesisSw Synthesis
FormalFormalVerificationVerification
Sw Code + Sw Code + RTOSRTOS
Logic NetlistLogic NetlistSimulationSimulation
Hw SynthesisHw SynthesisIntfc SynthesisIntfc Synthesis
PrototypePrototype
Performance simulation Performance simulation in POLISin POLIS
nn Based on synthesized software timing estimatesBased on synthesized software timing estimatesnn Generate C code for both hardware and software Generate C code for both hardware and software
componentscomponentsuu each statement is labeled with the estimated running each statement is labeled with the estimated running
timetimeuu accumulate delays during execution of software accumulate delays during execution of software
componentscomponentsuu use cycle based simulation for hardware componentsuse cycle based simulation for hardware components
nn Very fastVery fastuu The system is compiled on the host machineThe system is compiled on the host machineuu Don’t need a model of the processorDon’t need a model of the processor
EE290A: Design of Embedded System ASV/LL 9/10
14
Performance estimationPerformance estimation
nn Current practice: manual guess or detailed Current practice: manual guess or detailed simulationsimulation
nn Software: need restrictions on coding styleSoftware: need restrictions on coding styleuu still requires lots of user inputs still requires lots of user inputs uu can be automated if software is synthesized can be automated if software is synthesized
nn Hardware: need RTL specification or fast Hardware: need RTL specification or fast behavioral synthesisbehavioral synthesis
nn Communication: need communication mapping Communication: need communication mapping (and refinement)(and refinement)
Performance estimationPerformance estimation
4040
26264141 6363
14
18 9
a := a + 1a := a + 1 a := 0a := 0
detect(c)detect(c) a<ba<b
BEGINBEGIN
ENDEND
FFTT
TT FF
emit(y)emit(y)
L0: i f ( de t e c t _ c ) go t o L1; e l s e g o t o end ;
L1: i f ( a<b) go t o L3; e l s e g o t o L2;
L2: a = a + 1; go t o L4;
L3: a = 0;
L4: e mi t ( y ) ;
end : c l ean_up( ) ;
EE290A: Design of Embedded System ASV/LL 9/10
15
CFSM scheduling policyCFSM scheduling policy
nn Concurrency and shared resourcesConcurrency and shared resourcesuu hardware components are concurrenthardware components are concurrentuu only one software component can be executed at any timeonly one software component can be executed at any timeuu if a software component receives an event,if a software component receives an event,
but the simulated processor is busy, but the simulated processor is busy, its execution is postponedits execution is postponed
uu need a scheduler to choose among all enabled software need a scheduler to choose among all enabled software processesprocesses
nn Discrete event simulation with timeDiscrete event simulation with time--stamped stamped events is used to synchronize HW and SW events is used to synchronize HW and SW
Car dashboard exampleCar dashboard example
FRCFRC
TimerTimer
OdoOdo
BeltBelt
SpeedSpeed
RPMRPM CrossdispCrossdisp
CrossdispCrossdisp
CrossdispCrossdispFuelFuel
fuelfuel
key, beltkey, belt
clockclock
wheelwheel
engineengine
fuel_fuel_dispdisp
speed_speed_dispdisp
RPM_RPM_dispdisp
odoodo__dispdisp
Timing generatorsData processingPWM drivers
EE290A: Design of Embedded System ASV/LL 9/10
16
TradeTrade--off evaluationoff evaluation
nn Interactively changed simulation parametersInteractively changed simulation parametersuu Define different aspects:Define different aspects:
èè Implementation of each CFSMImplementation of each CFSMèè CPU and clock frequencyCPU and clock frequencyèè SchedulerScheduler
uu May be inherited in hierarchyMay be inherited in hierarchyuu Automatically transmitted to following synthesis stepsAutomatically transmitted to following synthesis steps
TradeTrade--off evaluationoff evaluation
nn Hw/Sw implementation and partitioningHw/Sw implementation and partitioninguu meeting timing constraintsmeeting timing constraintsuu tradetrade--offsoffs: speed, code size, chip area: speed, code size, chip area
nn Scheduling policiesScheduling policiesuu Round RobinRound Robinuu PrePre--emptiveemptive and non preand non pre--emptiveemptive
nn CPU selectionCPU selectionuu MC 68HC11, MC 68332, MIPS R3000MC 68HC11, MC 68332, MIPS R3000
nn Don’t need to recompile the systemDon’t need to recompile the system
EE290A: Design of Embedded System ASV/LL 9/10
17
TradeTrade--off evaluationoff evaluation
nn Input patternsInput patternsuu Impulse, clock, randomImpulse, clock, randomuu Sliders, buttonsSliders, buttonsuu Waveforms from fileWaveforms from file
nn Monitoring the systemMonitoring the systemuu Processor utilization and task scheduling chartsProcessor utilization and task scheduling chartsuu Missed deadlinesMissed deadlinesuu Cost of implementationCost of implementationuu Internal valuesInternal values
Performance evaluationPerformance evaluation
Targetproc.
MHz Part. Wheel &Engine
Missed
68HC11 4 1 260-8000 068HC11 1 1 180-6000 90068HC11 4 2 50-3000 500068332 20 3 50-3000 3068332 40 3 260-8000 20
Partition 1: HW={Timing unit, PWM drivers) SW={Data Processing}Partition 2: HW={Timing unit} SW={Data processing, PWM drivers}Partition 3: HW={} SW={Timing unit, Data processing, PWM drivers}
EE290A: Design of Embedded System ASV/LL 9/10
18
Simulation speedSimulation speed
Targetproc.
MHz Part. Graph. cycles/sec
68HC11 4 3 No < 2000068HC11 4 2 No 65000068HC11 4 1 No 55000068HC11 4 2 Yes 100000
Partition 1: HW={Timing unit, PWM drivers) SW={Data Processing}Partition 2: HW={Timing unit} SW={Data processing, PWM drivers}Partition 3: HW={} SW={Timing unit, Data processing, PWM drivers}
Cycle-based,logic simul.
Design flowDesign flow
Behaviorcapture
Mapping(partitioning)
Architecturecapture
Functionalsimul.
Architecturesimul.
Performancesimul.
Synthesis,coding
EE290A: Design of Embedded System ASV/LL 9/10
19
Architecture refinementArchitecture refinement
CPU(10 SPEC)
ASIC(160 Mops)
Memory(2Mb) bus
10Mb/s
486(66 MHz)
Memory(2Mb) PCI bus
132 Mb/s
ASIC(30 Kgates66 MHz)
Architecture simulationArchitecture simulation
nn Performed on refined model of the architecture, Performed on refined model of the architecture, ignoring behaviorignoring behavior
nn Simulation verifies Simulation verifies interface correctnessinterface correctnessuu busbus--functional model for processors, with random functional model for processors, with random
address and data generationaddress and data generation(Logic Modeling, etc.)(Logic Modeling, etc.)
uu cyclecycle--based or logic simulation for the hardware based or logic simulation for the hardware interfacesinterfaces
nn Interface refinement simulationInterface refinement simulation(Rowson et al., Hines et al.)(Rowson et al., Hines et al.)
EE290A: Design of Embedded System ASV/LL 9/10
20
HW and SW Modeling HW and SW Modeling TechniquesTechniques
throughput
accu
racy
progmemory
procHW HW
HW
SW&
procHW
HWSW
hardware centrichardware centric
software orientedsoftware oriented
functionalfunctional
Bus Functional ModelBus Functional Model
nn command interpreter command interpreter executing memory bus executing memory bus access cyclesaccess cyclesuu drives all relevant drives all relevant
processor pinsprocessor pinsuu does not implement the does not implement the
complete instruction set of complete instruction set of the processorthe processor
cmd> WRTval,addrcmd> WRT R2,addrcmd> RDaddr,R1cmd> ...
C/C++
HDLlanguage
and/orsimulatorbinding
adrdatardwrt
HW
SW&
procHW
EE290A: Design of Embedded System ASV/LL 9/10
21
Refined mappingRefined mapping
486(66 MHz)
Memory(2Mb) PCI bus
132 Mb/s
Flowsplitting
IDCT
Motioncomp.
Imagememory
Videooutput
ASIC(30Kgates66 MHz)
Variable-length dec.
Stream in Video out
Cycle-based,logic simul.
Design flowDesign flow
Behaviorcapture
Mapping(partitioning)
Architecturecapture
Functionalsimul.
Architecturesimul.
Performancesimul.
Synthesis,coding
EE290A: Design of Embedded System ASV/LL 9/10
22
Synthesis and codingSynthesis and coding
nn Implement functional specification on architectureImplement functional specification on architecturenn Ideally should be automated Ideally should be automated
(e.g. if function for HW is specified as RTL… )(e.g. if function for HW is specified as RTL… )nn In practice generally a mix of hand design and In practice generally a mix of hand design and
synthesissynthesis
OutlineOutline
nn Problem statementProblem statementnn Simulation and embedded system designSimulation and embedded system design
uu functional simulationfunctional simulationuu performance simulationperformance simulation
èè POLIS implementationPOLIS implementationèè partitioning examplepartitioning example
uu implementation simulationimplementation simulationèè softwaresoftware--orientedorientedèè hardwarehardware--centriccentric
nn SummarySummary
EE290A: Design of Embedded System ASV/LL 9/10
23
Implementation simulationImplementation simulation
nn Necessary because:Necessary because:uu need detailed performance analysisneed detailed performance analysisuu hand design is errorhand design is error--proneprone
nn Most errors should have been caught beforeMost errors should have been caught beforenn How to use test vectors and compare results How to use test vectors and compare results
between between different abstraction levels different abstraction levels ??
HW and SW Modeling HW and SW Modeling TechniquesTechniques
throughput
accu
racy
progmemory
procHW HW
HW
SW&
procHW
HWSW
hardware centrichardware centric
software orientedsoftware oriented
functionalfunctional
EE290A: Design of Embedded System ASV/LL 9/10
24
Software OrientedSoftware Oriented
software executing on asoftware executing on amodel of the processormodel of the processor
n combine program and data memory withthe processor into a single model
n implement instruction fetch, decode, andexecution cycle in opcode interpreter in software
HW
SW&
procHW
Clock Cycle AccurateClock Cycle AccurateInstruction Set Architecture ModelInstruction Set Architecture Model
clock cycle accurate
ISA model
C/C++
HDLlanguage
and/orsimulatorbinding
model derived/generated fromthe instruction set description
ADD R1,R2,R3{ A1.OP = 0x01
A1.RA = R1A1.RB = R2... }
{ R3<- ADD(R2,R1) }
ADD R1,R2,R3{ A1.OP = 0x01
A1.RA = R1A1.RB = R2... }
{ R3<- ADD(R2,R1) }
HW
SW&
procHW
EE290A: Design of Embedded System ASV/LL 9/10
25
Clock Cycle AccurateClock Cycle AccurateInstruction Set Architecture ModelInstruction Set Architecture Model
nn hardware devices hanginghardware devices hangingoff the memory busoff the memory busuu memory read and write cycles memory read and write cycles
are executed when specific are executed when specific memory addresses are being memory addresses are being referencedreferenced
clock cycle accurate
ISA model
C/C++
HDLlanguage
and/orsimulatorbinding
adrdatardwrt
HW
SW&
procHW
Instruction Cycle AccurateInstruction Cycle AccurateInstruction Set Architecture ModelInstruction Set Architecture Model
nn same basic idea as same basic idea as Clock Cycle Accurate ISA Clock Cycle Accurate ISA ModelModeluu processor status can only be observed at instruction processor status can only be observed at instruction
cycle boundariescycle boundariesuu usually no notion of processor pins, besides serial usually no notion of processor pins, besides serial
and parallel portsand parallel portsuu can be enhanced to drive all pins, including the can be enhanced to drive all pins, including the
memory interfacememory interfaceèè Clock Cycle Accurate ISA ModelClock Cycle Accurate ISA Model
HW
SW&
procHW
EE290A: Design of Embedded System ASV/LL 9/10
26
Virtual ProcessorVirtual Processor
nn C/C++ code is compiled and executed on a C/C++ code is compiled and executed on a workstationworkstationuu accuracy of numeric results may be an issueaccuracy of numeric results may be an issue
nn Hardware devices hanging off the memory busHardware devices hanging off the memory busuu additional function calls to execute memory read additional function calls to execute memory read
and write cycles when specific memory addresses and write cycles when specific memory addresses are being referenced can be inserted in the original are being referenced can be inserted in the original softwaresoftware
uu need estimation and synchronization with hardware need estimation and synchronization with hardware simulation to model software timingsimulation to model software timing
HW
SW&
procHW
HW and SW Modeling HW and SW Modeling TechniquesTechniques
throughput
accu
racy
progmemory
procHW HW
hardware centrichardware centric
HW
SW&
procHW
software orientedsoftware oriented
HWSW
functionalfunctional
EE290A: Design of Embedded System ASV/LL 9/10
27
HardwareHardware--CentricCentric
nn Possible descriptions of the hardwarePossible descriptions of the hardwareuu actual hardwareactual hardwareuu gate levelgate level netlistnetlistuu clock cycle accurate architectural descriptionclock cycle accurate architectural description
hardware executing a softwareimage stored in memory
progmemory
procHW HW
progmemory
procHW HWActual HardwareActual Hardware
uu processor is plugged into a device connected to a processor is plugged into a device connected to a workstationworkstation
uu discrete event simulator just sees another discrete event simulator just sees another componentcomponent
uu software image is loaded into memory and clock software image is loaded into memory and clock is appliedis applied
workstation running discrete event simulator
XPLORERMS- 3X00
ModelSource
XPLORERMS- 3X00
ModelSource
Hardware Modeler
EE290A: Design of Embedded System ASV/LL 9/10
28
progmemory
procHW HWGate Level NetlistGate Level Netlist
uu exact timing of the signals at all input and output exact timing of the signals at all input and output pins pins
uu full visibility of all internal signals and storagefull visibility of all internal signals and storage
gate level netlist executed in a discreteevent verification environment
Clock CycleClock Cycle--AccurateAccurateArchitectural DescriptionArchitectural Description
uu model delivers at each clock edgemodel delivers at each clock edgea set of a set of stablestable output signals givenoutput signals givena set of a set of stablestable input signalsinput signals
uu visibility of internal signalsvisibility of internal signalsand storage depends onand storage depends ondegree of model refinementdegree of model refinement
architectural description executing ina cycle based verification environment
progmemory
procHW HW
EE290A: Design of Embedded System ASV/LL 9/10
29
SummarySummaryFunctional model
VERIFY FUNCTIONALITY
Implementation modelVERIFY
ABSTRACTIONS
Performance modelVERIFY
PERFORMANCE
Architecture modelVERIFY
INTERFACES
ConclusionsConclusions
nn Abstraction key to speedupAbstraction key to speedupnn Separate behavior (functionality), communication Separate behavior (functionality), communication
and timingand timingnn Architecture refinement essential for true coArchitecture refinement essential for true co--design design
and fast coand fast co--simulationsimulationnn Software and hardware synthesis helps Software and hardware synthesis helps
performance estimationperformance estimationnn Rapid prototyping may be the ultimate coRapid prototyping may be the ultimate co--
simulation tool...simulation tool...