lec24 exploration
TRANSCRIPT
-
7/31/2019 Lec24 Exploration
1/29
1
(PEHGGHG6\VWHP'HVLJQ(PEHGGHG6\VWHP'HVLJQIRU:LUHOHVV$SSOLFDWLRQVIRU:LUHOHVV$SSOLFDWLRQV
Jan M. RabaeyJan M. Rabaey
BWRC
University of California @ Berkeley
http://www.eecs.berkeley.edu/~jan
DAC 2000, Los Angeles
The Distributed Approach to InformationThe Distributed Approach to Information
ProcessingProcessing
Source: Richard Newton
-
7/31/2019 Lec24 Exploration
2/29
2
The Smart HomeThe Smart Home
Security
Environment monitoring and controlObject taggingIdentification
Dense network ofDense network of
sensor and monitor nodessensor and monitor nodes
Wireless in the HomeWireless in the Home
Source: IEEE Spectrum,
December 99
-
7/31/2019 Lec24 Exploration
3/29
3
The Changing MetricsThe Changing Metrics
Flexibility
Power
Cost
Performance as a Functionality ConstraintPerformance as a Functionality Constraint
(Just(Just--inin--Time Computing)Time Computing)
The Wireless System Design ChallengeThe Wireless System Design Challenge
Projected energy per digital operation(2004): 50 pJ
Lithium-Ion: 220 Watt-hours/kg == 800Joules/gr
At 50 pJ/operation:10 teraOps/gr!
Equivalent to continuous operation at 100 MOPSfor 30 hours (or average power dissipation of 6mW)
The Battery LimitationThe Battery Limitation
-
7/31/2019 Lec24 Exploration
4/29
4
Some interesting numbersSome interesting numbers Energy cost of digital computation
1999 (0.25m): 1pJ/op (custom) 1nJ/op (proc)
2004 (0.1m): 0.1pJ/op (custom) 100pJ/op (proc) Factor 1.6 per year; Factor 10 over 5 years
Assuming reconfigurable implementation: 1 pJ/op
Energy cost of communication 1999 Bluetooth (2.4 GHz band, 10m distance)
1 nJ/bit transmission energy (thermal limit 30 pJ/bit)
Overall energy: 170 nJ/bit reception / 150 nJ/bit transmission (!)
Standby power: 300 W
2004 Radio (10 m) Only minor reduction in transmission energy
Reduce transceiver energy with at least a factor 10-50
Trade-off @10m: 5000 operations / transmitted bit
@ 1m: 0.5 operations / transmitted bit
The Implementation OpportunitiesThe Implementation Opportunities
SystemSystem--onon--aa--ChipChip
RAM
500 k Gates FPGA
+ 1 Gbit DRAM
Preprocessing
Multi-
Spectral
Imager
C
system
+2 Gbit
DRAMRecog-
nition
Analog
64 SIMD Processor
Array + SRAM
Image Conditioning
100 GOPS
Embedded applications where cost,
performance, and energy are the real
issues!
DSP and control intensive
Mixed-mode
Combines programmable and
application-specific modules
SOCSOC annoanno 20102010
-
7/31/2019 Lec24 Exploration
5/29
5
The SystemThe System--onon--aa--Chip NightmareChip Nightmare
Femme seFemme se coiffantcoiffant
Pablo Ruiz PicassoPablo Ruiz Picasso
19401940
The SystemThe System--onon--aa--Chip NightmareChip Nightmare
Bridge
DMA CPU DSP
Mem
Ctrl.
MPEG
CI O O
System Bus
PeripheralBus
Control Wires
CustomInterfaces
The Board-on-a-Chip
Approach
Courtesy of Sonics, Inc
-
7/31/2019 Lec24 Exploration
6/29
6
The Wireless ChallengeThe Wireless Challenge
Physical
+ RF
Mac/
Data Link
NetworkApplication'DWD'DWD
DataAcquisition
DataEncoding
DataFormatting
Mod/Demod
UI
&RQWURO&RQWURO
Synchron-ization
SlotAllocation
CallSetup
Data and Time Granularity
nsecsecmsecsecbitspacketsstreamssource data
RadioRadio
The Software RadioThe Software Radio
A/D ConverterD/A Converter
DSP
Idea: Digitize (wideband) signal at antenna and usesignal processing to extract desired signal
Leverages of advances in technology, circuitdesign, and signal processing
Software solution enables flexibility and adaptivity,but at huge price in power and cost
16 bit A/D converter at 2.2 GHz dissipates 1 to 10 W
-
7/31/2019 Lec24 Exploration
7/29
7
The Mostly Digital RadioThe Mostly Digital Radio
DigitalBasebandReceiver
RF input
(fc = 2GHz)
LNA
cos[2(2GHz)t]
RF filter
chip boundary
I (50MS/s)
Q (50MS/s)
A/D
A/D
sin[2(2GHz)t]
Analog Digital
Architectural ChoicesArchitectural Choices
P
Prog Mem
MAC
Unit
Addr
GenP
Prog Mem
P
Prog Mem
Satellite
Processor
Dedicated
Logic
Satellite
Processor
Satellite
Processor
GeneralPurposeP
Software
DirectMapped
Hardware
HardwareReconfigurable
Processor
ProgrammableDSP
Flexibility
1/Efficiency
-
7/31/2019 Lec24 Exploration
8/29
8
The EnergyThe Energy--Flexibility GapFlexibility Gap
Embedded ProcessorsSA1100.4 MIPS/mW
ASIPsDSPs 2 V DSP: 3 MOPS/mW
DedicatedHW
Flexibility (Coverage)
EnergyEfficiency
MOPS/mW(orMIPS/mW)
0.1
1
10
100
1000
ReconfigurableProcessor/Logic
Pleiades10-80 MOPS/mW
System Optimization HierarchySystem Optimization Hierarchy
Functional & Performance
RequirementsNetwork Architecture
Performance analysis
Constraints
Network
level
Node
level
Functional & Performance
RequirementsNode Architecture
Performance analysis
-
7/31/2019 Lec24 Exploration
9/29
9
The fully programmable approachThe fully programmable approach Flexible platform forexperimentation on
networking andprotocol strategies
Size: 3x4x2
Power dissipation < 2 W(peak)
Multiple radio modules:Bluetooth, Proxim,
Collection of sensorand monitor cards
Fully operational by latespring (includingsoftware supportsystem)!
Digital IntercomDigital IntercomA Design Exercise inA Design Exercise in
Communication/Component BasedCommunication/Component BasedDesignDesign Basestation
Mobiles
Up to 20 users per cell @ 64 kbit/sec per link
TDMA selected as MAC protocol
Known and testedspecification of limitedcomplexity allowsfocus on architecturalimplementationmethodology
Two-chipimplementationleverages separatesbetween analog (RF)
and digital designconcerns
Duration of exercise:1 year (summer 00)
-
7/31/2019 Lec24 Exploration
10/29
10
TwoTwo--Chip IntercomChip Intercom
ADC
DAC
Chip 2Chip 1
Custom
analogcircuitry
Mixed
analog/digital
DigitalBaseband
processing
Fixedlogic
Program-
mable logic
Software
running onprocessor
Analog RF
Protocol
Direct down-conversion front-end
(Yee et al)
- ADC
LNA
Filters
Mixer
PLL
The Target ArchitectureThe Target Architecture
A
D
Multi-model Analog RF
Timing
recovery
phone
bookAppl.
ARQ
Keypad,Display
Coding
Filters
MAC
Transport
Correlators MUD
Accelerators
(bit level)
analog digital
DSP core
EmbeddedEmbedded PP
Physical
Layer
Fixed HardwareFixed Hardware
Programmable HardwareProgrammable Hardware
-
7/31/2019 Lec24 Exploration
11/29
11
DigitalDigitalBasebandBaseband
Simulink example:Matched filter correlator
Stateflow example:Receiver controller
Stage 1: floating point blocksStage 2: fixed point blocks
Stage 1: Model components in MATHWORKS tools (Simulink/Stateflow)at appropriate granularities
Stage 2: Convert Simulink structural blocks (manually) to fixed point / bit-trueStage 3: Map to HW and/or SW (automatic synthesis path: Simulink to HW -
Simulink to Software)
Design Estimations (First order):Design Estimations (First order):
RF + ADC/DAC
Transmit: 30 mW
Receive: 70 mWDigital (conservative)
Transmit: 20 mW (100,000 transistors)
Receive: 80 mW (700,000 transistors)
Physical layer timing analysis (fromPhysical layer timing analysis (fromSimulinkSimulink))
Hz MHz s us
Chip 2.50E+07 25.00 4.00E-08 0.04 Chips per Symbol 31
Symbol 8.06E+05 0.81 1.24E-06 1.24 Bits per Symbol 2
Bit 1.61E+06 1.61 6.20E-07 0.62
0.00
Pilot symbol 1.24E-06 1.24 Pilot sequence length 15
Pilot sequence 1.86E-05 18.60
Channel coherency time 1.00E-01
BB clock coherency time (s) 5.00E-03
Max # sequential symbols (s) 4.03E+03
Safety margin 95.00%
Safe # sequential symbols 3.83E+03
PD (# of symbols) 10
PD 0.0000124 12.40 DAT (# of symbols) 3800 OK
DAT 0 .00 47 12 4 71 2.0 0
meters feet
Min distance 5 16.40
4.96E-05 49.60 (1) Max distance 10 32.81
Speed of light (feet per ns) 1
Speed of light (feet per s) 1.00E+09
2.58E-06 2.58 (2)
s us
Min LOS time of flight 1.64E-08 0.02
Max LOS time of flight 3.28E-08 0.03
5.22E-05 52.18 (3)
Max time of flight (suggested
by Paul bwo Dennis) 1.00E-07 0.10
time from TX on Radio A, an d RX on radio B until:
first DAT clock on transmitter
RXCLK on radio B
first DAT2 RXCLK on radio B
time from RX to TX transition until :
time from TXCLK on Radio A until:
Estimates for th e performan ce of the TCI Physical layer Ad ditio nal Calculation s
TX= PS | PD | PS | DAT
The transmit protocol will send a pilot sequence, some small number of dummy data
bits (PD), another pilot sequence, and the real data bits (DAT) with the constraint
that DAT
-
7/31/2019 Lec24 Exploration
12/29
12
Physical Layer DesignPhysical Layer Design
Digital baseband bridges gap between RF/Comm andprotocol/network
Analog
and RFDSP
RF and Communications
Application/
Architecture
Protocol/ Network
Comm
Algorithms
Radio Operating
Mode
Radio
Controller
Digital Baseband
Protocol
Design
Physical Layer Protocol
Physical to Protocol InterfacePhysical to Protocol Interface
xx Different toolsDifferent tools
xx Verification relying on coVerification relying on co--
simulationsimulation
xx Interface design critical toInterface design critical to
ensuring final designs workensuring final designs work
togethertogether
Define small number ofDefine small number of
interface signalsinterface signals
Clearly specify behaviorClearly specify behavior
TX
TX_CLK
TX_DATA
RX
RX_CLK
RX_DATA
ON/OFF
Chip 2
Digital
Baseband
Processing
Protocol
Physicalto
Protocol
Interface
Signals
MATLAB VCC
-
7/31/2019 Lec24 Exploration
13/29
13
The Intercom Protocol StackThe Intercom Protocol Stack
UI
MAC
Transmit
Synchronization
Filter
Tx_data Rx_data
Mulaw Mulaw
Transport
User Interface Layer
Transport Layer
Mac Layer
Data Link Layer
Voice samples
Tx/Rx
Service Requests
Receive
RefinementRefinement--based Protocol Design Methodologybased Protocol Design Methodology
RefinementRefinementCFSM modelCFSM modelSimulationSimulation
Formal VerificationFormal Verification
Hardware (VHDL)Hardware (VHDL)
FormalFormal
SystemSystem
SpecSpec
Input languageInput language
FormalFormal
Software (C)Software (C)
A CFSM-based approach Advantages Combines synchronous
and asynchronousmodels
Constrained modelenables verification
-
7/31/2019 Lec24 Exploration
14/29
14
CoCo
--design Finite State machinesdesign Finite State machines
Three-level hierarchy
top level: asynchronous, partially ordered
(bounded buffer non- blocking single- read communication)
middle level: synchronous FSM
(atomic event- and condition- based transition)
bottom level: Synchronous DataFlow- like
(FSM provides tokens and selects active sub- network)
(from ee249: http://www-cad.eecs.berkeley.edu/Respep/Research/hsc/class/index.html
-
7/31/2019 Lec24 Exploration
15/29
15
POLIS/VCC Design FlowPOLIS/VCC Design Flow
* (from the VCC manual)
Describing the BehaviorDescribing the BehaviorLayer C-code
(lines)
State-
transition
Diagram
(states)
User Interface 100
Mulaw 100
Transport 300
MAC 270 42
Transmit 120 16
Receive 140 2
Synchronization 17
CFSM
VCC, Polis
-
7/31/2019 Lec24 Exploration
16/29
16
Formal VerificationFormal Verification
System satisfies certain properties? System described in some formal mathematical languages (e.g.
Esterel)
Properties written in some formal logic (e.g. LTL) or formal model (e.g.Esterel)
Property Verification Invariant (only one remote can send voice data in any time slot)
Response (if a remote sends a request to the base station, theneventually there is an acknowledgement)
deadlock freedom
Refinement Checking Does the (low-level) implementation conform with the (high-level)
specification?(Do the mapped CFSMs function the same as the specification?)
Mocha (Henzinger): Modularity in Model Checking
Example of Property VerificationExample of Property Verification
Remote returns to theRemote returns to the disconnect statedisconnect state
if user presses theif user presses the disconnect buttondisconnect button..
$*'LVF $)1RW&RQQ
"1272.
-
7/31/2019 Lec24 Exploration
17/29
17
Why it Fails?Why it Fails?
Remote accepts Discfrom the user even ifit is not connected
After the remote has sent DiscReqandwaits for acknowledgement
However, base station ignores DiscReqifremote is not registered
Targeted Implementation PlatformTargeted Implementation Platform
Embedded
Processor
Embedded
Processor
Memory
Sub-system
Memory
Sub-system
Baseband
Processing
Baseband
Processing
Configurable
Logic
(Physical Layer)
Configurable
Logic
(Physical Layer)
Programmable
Protocol Stack
Programmable
Protocol Stack
Interconnect Network
Benefit: Build library of computational andBenefit: Build library of computational and
networking modules (and models)networking modules (and models)
-
7/31/2019 Lec24 Exploration
18/29
18
Describing the ArchitectureDescribing the Architecture
Xtensa embedded CPU(Tensilica, Inc) Configurability allows designer to keep
minimal hardware overhead
ISA (compatible with 32 bit RISC) canbe extended for software optimizations
Fully synthesizable
Complete HW/SW suite
VCC modeling for exploration Requires mapping of fuzzy
instructions of VCC processor modelto real ISA
Requires multiple models dependingon memory configuration
ISS simulation to validate accuracy ofmodel
xx Tensilica model in VCCTensilica model in VCC
inst,LD,2inst,LI,1inst,ST,2inst,OP.c,2inst,OP.s,3inst,OP.i,1inst,OP.l,1inst,OP.f,1
inst,OP.d,6
inst,DIV.i,118inst,DIV.l,122inst,DIV.f,145inst,DIV.d,155inst,IF,5inst,GOTO,2inst,SUB,19inst,RET,21
inst,MUL.c,9inst,MUL.s,10inst,MUL.i,18inst,MUL.l,22inst,MUL.f,45inst,MUL.d,55inst,DIV.c,19inst,DIV.s,110
Describing The ArchitectureDescribing The Architecture
The OnThe On--Chip NetworkChip Network
DSP MPEGCPUDMA
C MEMI O
Example: The Silicon Backplane (Sonics, Inc)Example: The Silicon Backplane (Sonics, Inc)
Open Core
ProtocolTM
SiliconBackplaneAgentTM
Guaranteed BandwidthGuaranteed Bandwidth
ArbitrationArbitration
-
7/31/2019 Lec24 Exploration
19/29
19
Describing the ArchitectureDescribing the Architecture
Flexible bandwidth arbitration model
TDMA slot map gives slot owner right
of refusal
Unowned/unused slots fall to round-
robin arbitration
Latency after slice granted is user-specified
between 2-7 Bus Clock cycles
xx SONICS model in VCCSONICS model in VCC
InitiatorCore
InitiatorAgent
Interconnect
OCP
TargetAgent
TargetCore
OCP
Arbiter
TCI ArchitectureTCI Architecture
SiliconBackplaneASIC Tensilica Xtensa
-
7/31/2019 Lec24 Exploration
20/29
20
Application
Rest
Transport
Exploring Architectural MappingsExploring Architectural Mappings
SoftwareSoftware
ProcessorProcessor
ASICASIC
AcceleratorsAccelerators
Mu-law
MAC
Processor UtilizationProcessor Utilization--EstimationEstimationProcessorUtilization
ClockFrequency
User Interface
ARM
@11MHz
32.7%
MulawTransport
User Interface
ARM
@1MHz
5.46%
Transport
Latency
insensitive
ARM
@200MHz
2.7%User Interface
MulawTransport
0.5 MAC
Peak performance
ARM
@2GHz
User Interface
MulawTransport
0.9 MAC
RTOS
overhead
-
7/31/2019 Lec24 Exploration
21/29
21
Implementation Fabrics for ProtocolsImplementation Fabrics for Protocols
BUF
Memory
Slot_Set_Tbl2x16
addr
BUF
slot_set
Slot_no
Slotstart
Pktend
RACHreq
RACHakn
W_ENA
R_ENAupdate
idle
writereadslotset
RACH
idle
A protocol =Extended FSM
Intercom TDMA MAC
Intercom TDMA MACIntercom TDMA MAC
Implementation alternativesImplementation alternatives
ASIC: 1V, 0.25 m CMOS process
FPGA: 1.5 V 0.25 m CMOS low-energy FPGA
ARM8: 1 V 25 MHz processor; n = 13,000
Ratio: 1 - 8 - >> 400
ASIC FP GA AR M8
P ower 0.26m W 2.1m W 114mW
Energy 10 .2pJ/op 81 .4pJ/op n*457pJ/op
Idea: Exploit model of computation: concurrent finite state machines,
communicating through message passing
-
7/31/2019 Lec24 Exploration
22/29
-
7/31/2019 Lec24 Exploration
23/29
23
HW Mapping Experiment: STD to Flexible Imp.HW Mapping Experiment: STD to Flexible Imp.
020 0
40 0
60 0
80 0
1000
1200
1400
TCI CRC TCI CRC+FSM Phys Send
#
G
ates
Xilinx FPGA
Altera PLD
CoolRunner
Area Comparison - FPGA x PLD (Manual)
HW Mapping Experiment: Flexible versus FixedHW Mapping Experiment: Flexible versus Fixed
0
20 0
40 0
60 0
80 0
1000
1200
1400
TCI CRC TCI CRC+FSM Phy s Send
#
G
ates
Xilinx FPGA
Design Compiler
Area Comparison FPGA x Std.Cell (Manual)
-
7/31/2019 Lec24 Exploration
24/29
24
HW Mapping Experiment: PowerHW Mapping Experiment: Power
ConsumptionConsumption
0
10
20
30
40
50
60
70
T C I C R C T C I C R C +F SM Ph ysSe n d F S M
LC A
MAX7000
FPGA versus PLD
Hierarchy in System OptimizationHierarchy in System Optimization
Functional & Performance
RequirementsNetwork Architecture
Performance analysis
Constraints
Network
level
Node
level
Functional & Performance
RequirementsNode Architecture
Performance analysis
-
7/31/2019 Lec24 Exploration
25/29
25
The Applications and SpecsThe Applications and Specs
Security
Environment monitoring and controlObject taggingIdentification
Dense network ofDense network of
sensor and monitor nodessensor and monitor nodes
The Obvious ChoiceThe Obvious Choice -The Smart Home and Network AppliancesThe Smart Home and Network Appliances
System Requirements and ConstraintsSystem Requirements and Constraints Large numbers of nodes between 0.05 and 1
nodes/m2
Cheap (
-
7/31/2019 Lec24 Exploration
26/29
26
The Software-Defined Radio
Reconfigurable
DataPath
FPGA Embedded uP
Dedicated FSM
Dedicated
DSP
Implementation in hard- and software
Source(xs,ys)
Dest(xd,yd)
Communication Request(Data type, BW, latency, BER)
Physical Layer(Band,Modulation)
Based on well-defined abstraction layers
Step-wise refinement (partitioning, resourcemapping and sharing) enables correctnessverification
Automatic synthesis of adaptive protocols inhard- and software
SystemSystem--Level Design Space ExplorationLevel Design Space Exploration
Network layer(Point-to-Point, multi-hop, star)
Media Access Layer(T-C-F-DMA)
-
7/31/2019 Lec24 Exploration
27/29
27
Distance
PicoRadio Energy OptimizationPicoRadio Energy OptimizationThe Cost of CommunicationThe Cost of Communication
TransmitPower
-70dBm
-30dBm
10dBm
100K
bps
50dBm
90dBm
1m 10m 100m 1Km 10Km
TransceiverPower50dBm
90dBm
10dBm
-30dBm
-70dBm
Assumes R-4loss due to ground wave(@ 1 GHz)
Communicating over Long DistancesCommunicating over Long Distances
MultiMulti--hop Networkshop NetworksSource
Dest
Example: 1 hop over 50 m
1.25 nJ/bit
5 hops of 10 m each5 2 pJ/bit = 10 pJ/bit
Multi-hop reduces transmission energy by 125!(assuming path loss exponent of 4)
Optimal number of hops needed forfree space path loss.
log( / )
But network discoveryBut network discovery
and maintenance overheadand maintenance overhead
-
7/31/2019 Lec24 Exploration
28/29
28
OPNETOPNETNetwork SimulatorNetwork Simulator
Analysis ViewerAnalysis Viewer
Network ModelNetwork Model Node ModelNode Model Process ModelProcess Model
Comparing the approaches from an energyComparing the approaches from an energy
perspectiveperspective Energy = Eb * Packet Size
Reactive Routing good for rarely used routes
Proactive Routing good for frequently used routes
Need solution that is more adequate for problem at hand:Need solution that is more adequate for problem at hand:classclass--based and locationbased and location--based addressing.based addressing.
0500
1000
150020002500300035004000
20 33 56
Number of Nodes
Routing Overhead (bytes)
DSDV
AODV
020004000
60008000
10000120001400016000
20 33 56
Number of Nodes
Routing Overhead (bytes) Normalized
(discovering oneroute) (discovering nroutes)
-
7/31/2019 Lec24 Exploration
29/29
SummarySummary Low-energy design ascends to prime time
forced mainly by the last-meter problem
System-on-a-Chip approach enables and demandsheterogeneous implementation strategies, sometimes involvingnon-intuitive and innovative design platforms
Design exploration over various fabrics and partitions hasdramatic impact on dominant metrics, such as energy and cost
It requires orthogonalization of function and architecture,
supplemented with performance models (cost, time, energy)
This methodology holds at all levels of the system hierarchy