Design Tools for Reconfigurable Computing in Wireless Communication Systems
Eli Bozorgzadeh
Center for Embedded Computing Systems
Computer Science Department, Univ. of California, Irvine
Reconfigurable Hardware
• Hardware on chip that could implement various functionalities
• Provides flexibility and adaptability
• Software Programmability
• Coarse Grained and Fine Grained
• Field Programmable Gate Arrays (FPGA)
Virginia Tech Symposium on Wireless Communications, 6/4/2009
Design Technology
[Figure: Reconfigurable Logic (FPGA), Structured ASIC, and System-On-Chip positioned along Manufacturing Volume, Time To Market, and System Complexity. Source: ITRS 2005, Design Technology]
System Driver: SOC Architecture
[Figure: System on Chip Architecture Template. Source: ITRS 2005 System Drivers (Fig. 12)]
System On Chip Predictions
[Figure: System on Chip predictions. Source: ITRS 2005 System Drivers (Figure 13)]
Reconfigurability in SOC
Percentage Reconfigurability of SOC

Year   % Reconfigurability
2005   23
2006   26
2007   28
2008   28
2009   30
2010   35
2011   38
2012   40
2013   42
2014   45
2015   48
2016   50
2017   53
2018   56
2019   60
2020   63

Source: ITRS 2005 Design Technology (Table 13)
Heterogeneous FPGA as SOC
[Figure: Xilinx Virtex 4 FX FPGA]
Embedded Macros in Virtex 4 FX
• 1-2 450 MHz PowerPC
• 32-192 500 MHz DSP
• 4-20 500 MHz Digital Clock Manager and Phase Matched Clock Dividers
• ChipSync blocks at every IO (SERDES: serializer/deserializer); synchronizes with external memory, network.
• 2-4 Ethernet MAC
• 10Mb Block and distributed RAM/ROM
MPSoC on FPGA
• Millions of logic gates on chip that could implement various functionalities
• Embedded dedicated blocks such as memory blocks, multipliers, etc.
• Wide range of interconnection standards, such as PCI and high-speed serial protocols
• Open Research Issues:
  – Use of soft/hard processors and customization to realize HW/MP (multi-processor) SoC systems on FPGAs
    • HW/SW partitioning tool support
    • Interconnect between processors
    • Memory Hierarchy
    • Architectural Support
[Figure: FPGA platform with PowerPC cores, cache, a soft processor, and coprocessors A, B, and C. Labels: Heterogeneity (ASIC blocks, block RAMs); Embedded Software; Runtime Reconfiguration (frame-based, controller); fragment-based sequential partial reconfiguration; tasks T1, T2, T3, with task Ti being implemented]
Reconfiguration Framework: Xilinx Virtex 4 Architecture
• Unit of reconfiguration is a frame.
• A frame includes both logic and routing.
• Frame bits are inserted externally or stored in memory.
• Number of frames determines reconfiguration delay.
[Figure: Virtex 4 FPGA divided into frames. Labels: 16 CLBs; 1 CLB has 22/23 vertical frames]
Dynamically Reconfigurable Architectures
• Coarse-grain: NEC DRP, IPFLEX DAPDNA, Morphosys, Piperench, ……
• Fine-grain:
  – Device granularity: Altera Stratix
  – Bit granularity: Xilinx XC6200, Triscend
  – Columnar granularity: Xilinx Virtex-II
  – Block granularity: Xilinx Virtex-4 (V-5 ?)
Software Defined Radios Application
• Composed of
  – A set of processors (mostly DSP)
  – Co-processors for computationally intensive tasks such as filters, modulation, decoding/encoding, etc.
  – Integrated with data and multimedia applications
Reconfigurable Architectures for SDR
• Programmability and flexibility are necessities
  – Multi-processor based
  – Coarse-grain reconfigurable architectures
  – Hardware Acceleration
• Dynamic reconfiguration and heterogeneity enable adaptation to application and environment
  – Brings another dimension of design complexity
Reconfigurable Physical Layer for SDR
• SDR base stations provide the capability of assimilating different communication technologies (e.g., in disaster scenarios).
• FPGAs are important platforms for SDR base stations because of their flexibility and their DSP computation power.
[Figure: SDR (PHY) base station connected to the Internet and, through an up/down converter, to users U1 and U2 over a GPRS link and a WiFi link. Labels: WiFi B, GPRS, ISM Band]
Support for Reconfigurable Computing
• Architectural Support
• Runtime configuration
• Design Automation and Tool Support
• Hardware/software Co‐design environment
Overview of the tutorial
• System design tools for dynamically reconfigurable architectures
• Application‐dependent techniques for hardware adaptation
• Challenges for software defined radios on reconfigurable architectures
• Highlights of our contributions
Reconfiguration-aware Design Tools
Layout Generation
System Synthesis
Application to SDR
Simple Codesign Flow
[Figure: codesign flow. Application Specification feeds Partitioning (+ Scheduling), which splits the design into SW and HW; the HW side continues to Partial Dynamic Reconfiguration (RTR) and Layout and configuration. Target platform: processor (P), memory (M), HW, and Communication]
Reconfiguration Technique
• Maximize Reuse: place common blocks in the same locations and do not reconfigure them.
[Figure: Design 1 and Design 2 floorplans built from FIR, Div, Mult, FFT, and RAM blocks, with the common blocks kept in the same locations]
Floorplan for Reconfiguration
• Case I: Independent Floorplan. Minimum Reuse.
• Case II: Dependent Floorplan. Limited Reuse.
• Case III: Combined Floorplan. Maximum Reuse.
[Figure: Design 1 (modules m1 to m7) is reconfigured to Design 2 under each of the three cases, showing the reconfigured and reused regions on the chip]
Maximum Reuse of Components
• Reconfigured Region= A1 + A2 – A1,2 – Areused
• Maximize the overlap of static (Areused) and non static (A1,2) regions to save reconfiguration bits
• Find floorplan such that the components could be placed in locations with these objectives along with minimum area and wirelength
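The area bookkeeping in the first bullet can be written out directly. The following is a minimal sketch, assuming that Areused is the part of the overlap A1,2 holding identical blocks (a reading suggested by the figure labels, not stated explicitly in the slides); the function name and the numeric areas are hypothetical.

```python
def reconfigured_area(a1, a2, a12, a_reused):
    """Area that must be rewritten when switching between two designs.

    a1, a2   : areas occupied by design 1 and design 2
    a12      : area where the two designs overlap on the chip
    a_reused : part of the overlap holding identical, reused blocks
               (assumed to satisfy a_reused <= a12)
    """
    if not 0 <= a_reused <= a12 <= min(a1, a2):
        raise ValueError("expected 0 <= a_reused <= a12 <= min(a1, a2)")
    # Slide formula: Reconfigured Region = A1 + A2 - A1,2 - Areused
    return a1 + a2 - a12 - a_reused


# Hypothetical areas, chosen only for illustration.
no_reuse = reconfigured_area(a1=10, a2=8, a12=5, a_reused=0)
with_reuse = reconfigured_area(a1=10, a2=8, a12=5, a_reused=3)
print(no_reuse, with_reuse)
```

Growing the reused share of the overlap shrinks the region that has to be reconfigured, which is the quantity the floorplanner tries to minimize alongside area and wirelength.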
[Figure: chip regions of the two designs with areas A1 and A2, partitioned into A1 - A1,2, A2 - A1,2, A1,2 - Areused, and Areused]
Problem Statement
• Given a sequence of k designs for reconfiguration, find a placement that
  – Optimizes area, wirelength and congestion for all the designs
  – Maximizes overlap of non-fixed blocks and fixed blocks in the designs.
Research Contributions
• Develop a floorplanner that maximizes reuse of frames by isolating components (RAW 2006)
• Multi-layer floorplanning for handling multiple designs simultaneously (FPL 2006; received the Best Paper Award)
Floorplanner Overview
[Flowchart: Generate Sequence Pair -> Find Floorplan -> Evaluate Total Cost (Area, Wirelength, Congestion and Frames) -> Floorplan Accepted? If No: Apply Moves (Swap, Orientation, Whitespace and Matching) and repeat within Simulated Annealing. If Yes: Final Floorplan]
Floorplanner for Partial Reconfiguration (FFPR)
• Simulated Annealing
• Design Representation Model
  – Sequence Pair
• Moves
  – Orientation, Block Swap Moves
  – Whitespace Allocation Moves
  – Matching Moves
• Cost Function
  – Wirelength and Total Area
  – Total Congestion
  – Total Frames
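The annealing loop behind these bullets can be sketched generically. This is a toy illustration, not the FFPR implementation: the weights in `floorplan_cost`, the cooling parameters, the swap move, and the stand-in `toy_cost` are all assumptions made for the example, and the real moves (orientation, whitespace, matching) are not modeled.

```python
import math
import random


def floorplan_cost(area, wirelength, congestion, frames,
                   weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four cost terms named on the slide (weights are hypothetical)."""
    w_area, w_wire, w_cong, w_frames = weights
    return w_area * area + w_wire * wirelength + w_cong * congestion + w_frames * frames


def anneal(initial, cost, propose, t_start=10.0, t_end=0.01,
           cooling=0.9, moves_per_temp=20, seed=0):
    """Generic simulated annealing: returns the best state seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temp = t_start
    while temp > t_end:
        for _ in range(moves_per_temp):
            candidate = propose(current, rng)
            delta = cost(candidate) - current_cost
            # Always accept improvements; accept worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost


def swap_move(order, rng):
    """Swap two blocks in a sequence (stand-in for the block swap move)."""
    i, j = rng.sample(range(len(order)), 2)
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return tuple(swapped)


def toy_cost(order):
    """Toy objective standing in for the real floorplan cost."""
    return sum(abs(pos - blk) for pos, blk in enumerate(order))


start = (5, 4, 3, 2, 1, 0)
best, best_cost = anneal(start, toy_cost, swap_move)
print(best, best_cost)
```

In a real floorplanner, `cost` would decode the sequence pair into a placement and then evaluate something like `floorplan_cost` on it; here the decode step is skipped to keep the loop itself visible.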
Whitespace Requirement
[Figure: whitespace allocation around fixed blocks]
FFPR Moves: Whitespace Allocation
• Empty space is added in the design to allow global wires to pass without crossing modules.
• The four offset parameters, n, s, e, w, are changed during simulated annealing iterations.
[Figure: floorplan with block offsets; a block surrounded by whitespace with offsets n, s, e, w]
Representation: Sequence Pair
• A sequence pair consists of two sequences of the blocks in the design, e.g. <(A, B, C), (B, A, C)>
• Placement of blocks follows these relationships:
  – {(.., a, .., b, ..), (.., a, .., b, ..)} => a is left of b
  – {(.., a, .., b, ..), (.., b, .., a, ..)} => a is on top of b
• Each block has one and only one relationship with each of the other blocks.
• O(n^2) time to calculate the placement
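The two relationships translate into a short placement routine. The following is a minimal O(n^2) sketch, assuming y grows upward (so "on top of" means a larger y) and that every block is 2 x 2 in the usage example; the function name and block sizes are illustrative assumptions, not taken from the slides.

```python
def place_sequence_pair(seq_a, seq_b, sizes):
    """Return {block: (x, y)} lower-left corners for a sequence pair.

    sizes maps each block to (width, height). y grows upward, so
    "a is on top of b" means y[a] >= y[b] + height[b].
    """
    pos_a = {blk: i for i, blk in enumerate(seq_a)}
    x, y = {}, {}
    # Visiting blocks in seq_b order guarantees that every block that must
    # lie to the left of, or below, the current block is already placed.
    for blk in seq_b:
        # Left neighbours: earlier in both sequences.
        x[blk] = max((x[o] + sizes[o][0] for o in x if pos_a[o] < pos_a[blk]),
                     default=0)
        # Blocks below: later in seq_a but earlier in seq_b.
        y[blk] = max((y[o] + sizes[o][1] for o in y if pos_a[o] > pos_a[blk]),
                     default=0)
    return {blk: (x[blk], y[blk]) for blk in seq_a}


# Sequence pair <(1,3,2), (2,1,3)> with assumed 2 x 2 blocks.
sizes = {1: (2, 2), 2: (2, 2), 3: (2, 2)}
placement = place_sequence_pair((1, 3, 2), (2, 1, 3), sizes)
print(placement)  # {1: (0, 2), 3: (2, 2), 2: (0, 0)}
```

Each block scans all previously placed blocks once, which gives the quadratic running time quoted above; faster longest-common-subsequence formulations exist but are not shown here.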
Working of Sequence Pair
• Sequence Pair <(1,3,2), (2,1,3)>
[Figure: Horizontal Graph (left-right property) over blocks 1, 3, 2 with node labels X = 0, X = 2, X = 0; Vertical Graph (top-bottom property) with node labels Y = 0, Y = 2, Y = 2; resulting placement of blocks 1, 2, and 3]
Multi Layer Sequence Pairs
• One sequence pair for all designs
• Fixed blocks occur only once
• {(.., a, .., b, ..), (.., a, .., b, ..)} => a is left of b, if a and b belong to the same design
• {(.., a, .., b, ..), (.., b, .., a, ..)} => a is on top of b, if a and b belong to the same design
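The "same design" condition can be illustrated by restricting which blocks constrain each other during placement. This is a rough sketch under stated assumptions: y grows upward, two blocks interact only when they share at least one design, and the block names, sizes, and design membership are hypothetical. It is not the published multi-layer floorplanner.

```python
def place_multilayer(seq_a, seq_b, sizes, designs):
    """Place blocks of several designs from one shared sequence pair.

    designs maps each block to the set of designs containing it;
    a fixed (reused) block belongs to more than one design.
    """
    pos_a = {blk: i for i, blk in enumerate(seq_a)}
    x, y = {}, {}
    for blk in seq_b:
        # Only already-placed blocks sharing a design with blk constrain it.
        peers = [o for o in x if designs[o] & designs[blk]]
        x[blk] = max((x[o] + sizes[o][0] for o in peers if pos_a[o] < pos_a[blk]),
                     default=0)
        y[blk] = max((y[o] + sizes[o][1] for o in peers if pos_a[o] > pos_a[blk]),
                     default=0)
    return {blk: (x[blk], y[blk]) for blk in seq_a}


# Hypothetical example: F is fixed (in designs 1 and 2), P only in design 1,
# Q only in design 2. All blocks are 2 x 2.
sizes = {"F": (2, 2), "P": (2, 2), "Q": (2, 2)}
designs = {"F": {1, 2}, "P": {1}, "Q": {2}}
layout = place_multilayer(("F", "P", "Q"), ("F", "P", "Q"), sizes, designs)
print(layout)  # {'F': (0, 0), 'P': (2, 0), 'Q': (2, 0)}
```

P and Q land on the same location because they never coexist, so that region is the one reconfigured when switching designs, while F stays in place and is reused.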
Working of Multi‐Layer Sequence Pair
Design 1: 1, 2, 3, 4, 5
Design 2: 3, 4, 5, 6, 7, 8
Sequence Pair: (<3, 2, 7, 8, 5, 4, 1, 6>, <8, 1, 3, 5, 6, 7, 2, 4>)
[Figure: Horizontal Graph (using left-right property), with each block annotated by its X coordinate and width W]
[Figure: Vertical Graph (using top-bottom property), with each block annotated by its Y coordinate and height H]
[Figure: Final Placement. Blocks 3, 4, and 5 are shared by both designs; blocks 1 and 8 share a location, as do blocks 2 and 7; block 6 is placed on its own. Legend: reused module, reconfigured modules, overlapped/reconfigured modules]
Properties of Multi‐Layer Sequence Pair
• Theorems:
  – All possible overlapping floorplans can be represented using a multi-layer sequence pair.
  – Any multi-layer sequence pair represents a valid overlapping floorplan.
  – Of all the floorplans defined by a given multi-layer sequence pair, the longest path algorithm gives the area-minimal floorplan for that sequence pair.
Experiment Results
• Effect of finding common components
• Could result in more than 4.8 times savings in frames.

Number of frames with and without common placement
Pair      Common Placement   No Common Placement
P1        962                1753
P2        2078               2978
P3        1994               2975
P4        3414               7662
P5        886                4329
Average   1869               2929

Reuse reduces Reconfiguration Delay
Dependent Floorplan: Direction
Direction of floorplanning could affect the savings in frames.

Number of Reconfiguration Frames using Opposite Flows in Dependent Mode
Pair   Design 1 -> 2   Design 2 -> 1
P1     962             698
P2     2078            821
P3     1994            1491
P4     3414            Unrouteable
P5     886             Unrouteable

Dependent Mode leads to Infeasibility and Limited Reuse
Reconfiguration-aware Floorplanner

Design A
Timing Constraint (ns)   Combined Floorplan (seconds)   Dependent Floorplan (B -> A)
6.5                      3476                           X
7                        1662                           X
7.5                      1627                           5250
8                        1568                           2410
8.5                      1636                           2017

Design B
Timing Constraint (ns)   Combined Floorplan (seconds)   Dependent Floorplan (A -> B)
5.5                      X                              X
6                        525                            X
6.5                      479                            X
7                        469                            1318
7.5                      460                            853
Software Defined Radio Components on FPGA
• FFT Cores
  – Data processing of 200 Mega-samples/sec and transform lengths between 16 and 16,384 points
• Viterbi Decoder/encoder
  – Decoding rates of 199 MSPS for a single channel and 273 MSPS for multi-channel designs
  – Serial vs. parallel implementations
• Turbo Decoder/encoder
• CORDIC, MAC, FIR, etc.
HW/SW Cross-Layer Adaptation
[Figure: SDR Hardware Platform and Network Applications, alongside an Off-SDR Storage and Computing Server, organized into Control Plane, Information Plane, and Data Plane. Components shown: Medium Access Controller, Configuration Manager, Static FPGA Configuration Generation, Implementation Library, Sequence of configuration, Configuration Controller, PowerPC, WiMAX, WiFi, GPRS, Configuration Memory, WiMAX Driver, GPRS Driver, WiFi Driver, Radio Front End, Data Sources, Network Mobility Manager, Medium Access Planner, Communication Software Server, Communication Software Library, Mobile Device Profiles]
Joint NSF project with Prof. Luke Bao
Reconfiguration Aware Physical Layer Planning in SDR
• While today's FPGAs provide dynamic partial reconfiguration capability, their reconfiguration time overhead is a deterrent given the QoS required for real-time traffic such as VoIP.
• The sequence in which communication protocols are reconfigured affects the total reconfiguration time overhead; solving the Sequence Generation Problem (SGP) finds the sequence that minimizes total reconfiguration time.
• SGP can be embedded in a floorplanner to generate Sequence Aware Floorplans, which systematically reduce reconfiguration time overhead.
Model of Reconfiguration
[Figure: tasks Ti and Tj placed on the FPGA, with reconfiguration cost Cij between them]
Sequence Matters
[Figure: tasks T1, T2, T3, T4 on the FPGA with pairwise costs C12, C13, C14, C23, C34]
T1->T2->T4->T3->T1
Reconfiguration Cost: C12 + 0 + C34 + C13
<
T1->T2->T3->T4->T1
Reconfiguration Cost: C12 + C23 + C34 + C14
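The comparison between the two orderings can be reproduced with a small brute-force search. This is a sketch only: the numeric costs are hypothetical, a missing pair is assumed to cost 0 (mirroring the 0 for the T2->T4 step), and exhaustive enumeration stands in for whatever method is actually used to solve the Sequence Generation Problem.

```python
from itertools import permutations


def cycle_cost(sequence, cost):
    """Total reconfiguration cost of running the designs in order and wrapping around."""
    pairs = zip(sequence, sequence[1:] + sequence[:1])
    # A missing pair is assumed to cost nothing to switch between.
    return sum(cost.get(frozenset(pair), 0) for pair in pairs)


def best_sequence(designs, cost):
    """Exhaustively search cyclic orders that start at the first design."""
    first, rest = designs[0], designs[1:]
    candidates = ((first,) + perm for perm in permutations(rest))
    return min(candidates, key=lambda seq: cycle_cost(seq, cost))


# Hypothetical pairwise costs; the pair (T2, T4) is absent, so it costs 0.
cost = {
    frozenset(("T1", "T2")): 3,
    frozenset(("T1", "T3")): 4,
    frozenset(("T1", "T4")): 5,
    frozenset(("T2", "T3")): 6,
    frozenset(("T3", "T4")): 2,
}

good = cycle_cost(("T1", "T2", "T4", "T3"), cost)  # C12 + 0 + C34 + C13
bad = cycle_cost(("T1", "T2", "T3", "T4"), cost)   # C12 + C23 + C34 + C14
best = best_sequence(("T1", "T2", "T3", "T4"), cost)
print(good, bad, best)
```

Exhaustive search grows factorially with the number of designs, so it is only practical for a handful of protocols such as the four considered later in the talk.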
Problem Modeling
[Figure: the tasks T1 to T4 and their pairwise costs (C12, C13, C14, C23, C34) modeled as a graph with vertices V1, V2, V3, V4 and edges labeled C12, C13, C14, C23, C24]
Multiple Implementations per Design
[Figure: designs A and B, each with two implementations (I11, I12 for A; I21, I22 for B), modeled as vertices V11, V12 and V21, V22 with edges C11,21, C11,22, C12,21, C12,22 between implementations of different designs]
Parallel Implementation
[Figure: graph model for parallel implementation with designs A and B, implementations I11, I12, I2, I3, vertices V11, V12, V11,2, V2, V3, and edges C3,11, C3,12, C23, C2,12]
Sequence‐Aware Floorplanner ‐Methodology
[Flowchart: Input Netlist1, Netlist2, …, Netlistn. Steps shown: Multi-Layer SP Generation, Initialize Temperature, Random Moves, Layout-Compaction, Sequence Generation Problem, Cost Calculation, Move Evaluation, Move Accepted? (YES: Update Floorplan, Store the Best Floorplan; NO), Cool Down Temp.]
Partial Reconfiguration Scheme
Number of Frames   Delay (ms)
                   Phase 1   Phase 2   Phase 3
1                  1.7       0.104     0.033
2                  3.2       0.218     0.065

[Figure: reconfiguration system built around MicroBlaze with ICAP, BRAM, SysAce Controller, Compact Flash, OPB_Timer, and Local Memory on the LMB]
Protocol Configuration Statistics
Wireless Protocols     Design Size (slices)   Similarity to A (%)   Similarity to B (%)   Similarity to C (%)   Similarity to D (%)
Protocol A (802.16a)   20640                  100                   67                    21                    33
Protocol B (802.11a)   20160                  67                    100                   25                    40
Protocol C (WCDMA)     14240                  21                    25                    100                   45
Protocol D (CDMA)      12640                  33                    40                    45                    100
Experiments Description
• Experiment 1: Comparing the best and worst sequence and cost when the floorplanner does not consider configuration cost in its cost function
• Experiment 2: Comparing two approaches: finding the best sequence after the designs are floorplanned vs. finding it while floorplanning the designs
• Experiment 3: Comparing the best and worst sequence within the floorplanner
Experiment 1
(Cost in ms)
Device (row x column) (slices)   Best Seq/Cost   Worst Seq/Cost
Custom FPGA (100 x 320)          ABCD/26.9       ACBD/53.7
LX80 (112 x 320)                 ABCD/35.3       ACBD/67
LX100 (128 x 384)                ABCD/6.7        ACBD/14.7
LX160 (176 x 384)                ABCD/2.9        ACBD/5.9
Experiment 2
(Cost in ms)
Device (row x column) (slices)   System-level   Floorplan-level
Custom FPGA (100 x 320)          ABCD/26.9      ABCD/0
LX80 (112 x 320)                 ABCD/35.3      ABCD/0
LX100 (128 x 384)                ABCD/6.7       ABCD/0
LX160 (176 x 384)                ABCD/2.9       ABCD/0
Experiment 3
(Cost in ms)
Device (row x column) (slices)   Best Seq/Cost   Worst Seq/Cost   Worst Seq/Cost in Solution Space
Custom FPGA (100 x 320)          ABCD/0          ABDC/25.2        ACBD/159.5
LX80 (112 x 320)                 ABCD/0          ABDC/9.0         ACBD/159.1
LX100 (128 x 384)                ABCD/0          ABDC/0           ACBD/175.9
LX160 (176 x 384)                ABCD/0          ABCD/0           ACBD/100.8
Example of Best and Worst Solution
[Figure: designs A, B, C, D, each with implementations I1, I2, I3; transitions annotated with delays 30.24 ms, 87.36 ms, 178.08 ms, 50.4 ms, 174.72 ms, 184.8 ms, 137.76 ms, 164.64 ms]
Reconfigurable Physical Layer
[Figure: layered stack on SDR Reconfigurable Hardware. Data Sources; Upper Layers (Net/Trans/App); DLL/MAC with WiMAX Driver, GPRS Driver, WiFi Driver; PHY with WiMAX Modem, GPRS Modem, WiFi Modem; Radio Frontend]
Seamless Sequence of Software Defined Radio Designs through Hardware Reconfigurability of FPGAs
Reconfiguration‐aware Time slot allocation per protocol
Reconfiguration overhead at the physical layer when switching between protocols can cause packets to be missed for processing.
Real Time Task Scheduling
Network flow based model for time slot allocation
Case I
Case II
Case III
Reconfiguration overhead between protocols
             to Wi-Fi   to WiMax   to GPRS   to WCDMA
from Wi-Fi   0          0.38       0.25      1.07
from WiMax   0.29       0          0.54      1.06
from GPRS    1.4        1.73       0         1.08
from WCDMA   2.15       2.24       1.01      0
Best Effort Comparison between our scheduler and EDF algorithm
                        New Scheduling   EDF
Scenario a (19 tasks)   16               11
Scenario b (30 tasks)   24               17
Scenario c (34 tasks)   35               27
Scenario d (20 tasks)   20               11
Scenario e (36 tasks)   27               17
Reconfiguration Overhead comparison between our scheduler and EDF
             New Scheduling   EDF
Scenario a   1.4              4.69
Scenario b   1.94             10.24
Scenario c   2.32             10.34
Scenario d   0.79             7.17
Scenario e   3.08             23.26
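To make the EDF baseline concrete, here is a minimal sketch of non-preemptive earliest-deadline-first scheduling on a single reconfigurable region, charging the protocol-to-protocol reconfiguration overhead from the matrix given earlier in the talk. It illustrates only the baseline, not the network-flow-based scheduler; the task set, the choice to drop tasks that would miss their deadline, and the assumption that the overheads are in milliseconds are all hypothetical.

```python
# Reconfiguration overhead matrix from the talk (units assumed to be ms).
OVERHEAD = {
    "Wi-Fi": {"Wi-Fi": 0.0, "WiMax": 0.38, "GPRS": 0.25, "WCDMA": 1.07},
    "WiMax": {"Wi-Fi": 0.29, "WiMax": 0.0, "GPRS": 0.54, "WCDMA": 1.06},
    "GPRS": {"Wi-Fi": 1.4, "WiMax": 1.73, "GPRS": 0.0, "WCDMA": 1.08},
    "WCDMA": {"Wi-Fi": 2.15, "WiMax": 2.24, "GPRS": 1.01, "WCDMA": 0.0},
}


def edf_schedule(tasks, start_protocol):
    """Non-preemptive EDF. tasks: list of (protocol, release, deadline, exec_time).

    Returns (number of tasks meeting their deadline, total reconfiguration overhead).
    """
    pending = sorted(tasks, key=lambda t: t[2])  # earliest deadline first
    now, current = 0.0, start_protocol
    met, total_overhead = 0, 0.0
    while pending:
        ready = [t for t in pending if t[1] <= now]
        if not ready:
            now = min(t[1] for t in pending)  # idle until the next release
            continue
        task = ready[0]  # pending is deadline-sorted, so this is the EDF choice
        pending.remove(task)
        protocol, _release, deadline, exec_time = task
        switch = OVERHEAD[current][protocol]
        finish = now + switch + exec_time
        if finish > deadline:
            continue  # would miss its deadline: drop it without reconfiguring
        now, current = finish, protocol
        total_overhead += switch
        met += 1
    return met, total_overhead


# Hypothetical task set: (protocol, release, deadline, execution time).
tasks = [
    ("Wi-Fi", 0, 2.0, 1.0),
    ("WCDMA", 0, 3.0, 1.0),
    ("Wi-Fi", 0, 6.0, 1.0),
    ("GPRS", 1, 8.0, 1.0),
]
met, overhead = edf_schedule(tasks, start_protocol="Wi-Fi")
print(met, overhead)
```

In this toy run the WCDMA task is lost purely because of the switch cost from Wi-Fi, which is the effect a reconfiguration-aware time slot allocation tries to avoid by ordering tasks with the overhead in mind.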
Adaptive‐aware Synthesis for Reconfigurable Systems
Reconfigurable MPSoCs
• Reconfigurability for self-adaptive systems to respond to unsupervised events
  – Adaptivity to enhance performance/power (e.g. communication and DSP applications)
  – Response to embedded sensor networks
• Reconfigurability for self-healing systems
  – Reliability concerns due to transient errors, thermal runaway, process variation, etc.
• Requirements
  – Architectural support for dynamic and static adaptivity/healing management
  – Early design planning with system-level CAD tools
Dynamic Architecture Model (Virtex‐II like)
[Figure: FPGA of a given Width and Height built from CLBs and frames; tasks Ti and Tj occupy frames (Computation); On-chip shared memory and Off-chip memory (Memory + Communication)]
Key Concerns in Commercial Architectures with partial RTR
• Column-based partial RTR
• Placement constraints
• Sequential reconfiguration: exactly one task reconfigured in a single time-instant
• Significant reconfiguration delay: reconfiguration delay for a convolution task (@100 MHz) is greater than the task execution time (for a 256X256 image)
• Delay-hiding techniques such as configuration prefetch, configuration re-use, etc.
[Figure: FPGA with On-chip shared memory and Off-chip memory]
Criticality of linear placement: simple example
[Figure: two schedules of tasks T1 to T4 over columns C1 to C4 (Width) against Execution time, with time t2 marked; one placement is labeled Infeasible and the other Feasible]
Modified Codesign Flow with partial RTR
[Figure: codesign flow. Application Specification feeds Partitioning + Scheduling + Block-level placement, producing SW, Placed HW, and the interface (I/f). Target platform: processor (P), memory (M), HW, and Communication]
EST computation: Example
Task   HW time   SW time   HW area
1      5         23        3
2      2         9         3
3      2         11        2
4      3         14        1
5      2         10        2
6      3         7         4

[Figure: schedule over time steps 1 to 10 across columns C1 to C6 and the processor (Proc) for tasks T1 to T6, with executions E1 to E5, reconfigurations R3 to R5, P6, communication C65, and a gap. Legend: EXECUTE Task 1; RECONFIG Task 5; HW-SW comm (6,5); PREFETCH gap; Task 6 on SW]
Problem Overview (contd)
• Maximize application performance by selecting parallelism granularity for individual data-parallel tasks
  – Determine number of instances of each task
  – Determine workload of each task instance
• Key challenges: physical (placement) and architectural constraints
[Figure: task chain T1 -> T2 expanded into multiple parallel instances of T1 and T2]

Granularity Key Issues
• Reconfiguration overhead
• Load balancing
• Simple equations for single task
[Figure: Width vs. Time schedules for one task. Sequential execution (E1); "ideal" parallel execution of four instances (ideal gain); execution with reconfiguration overhead for instances 2 to 4 (reduced gain); "load-balanced" execution (target gain)]
Key issues: Precedence constraints
[Figure: task chain T1 -> T2 and its parallel instances; Width vs. Time schedules for the sequential chain (E1, R2, E2) and for two parallel alternatives, Case A and Case B, showing the executions and reconfigurations of the instances of T1 and T2]
Experiments: JPEG encoding
[Figure: three task graphs from Colour image to Compressed Image.
 256X256/less area: RGB2YCrCB_1, RGB2YCrCB_2, DCT, Quantize, Huffman.
 256X256/more area: RGB2YCrCB, DCT_1, DCT_2, Quantize, Huffman.
 512X512/more area: RGB2YCrCB_1, RGB2YCrCB_2, DCT_1, DCT_2, Quantize_1, Quantize_2, Huffman]
Problem Overview: Exploiting limited bandwidth
[Figure: System Architecture for on-the-fly computing. Platform FPGA with a statically configured part (PPC, other static logic, local memory) and a run-time (re)configured part (Task-i, Task-j), connected through a high-performance shared bus and a memory controller to off-chip memory]
• Bandwidth is a key system resource
• Task execution time depends on bandwidth availability
• Problem Objective: Minimize application execution time with limited bandwidth
Task Microarchitecture
[Figure: task microarchitecture. Interface Clock Domain (Data PREFETCH): Interface logic and Memory Access Logic attached to the Shared Communication Medium, with Rx Buffer and Tx Buffer. Task Clock Domain (Task CORE): Core Logic]
Theoretical principles for single task
[Figure: four Width vs. Time schedules of three instances of a single task (executions E11, E12, E13; reconfigurations R12, R13) with schedule lengths L1 to L4, for BW = 3*BW_1 and Constraint = 2.5*BW_1. Panel labels: Equal Bandwidth, Different Bandwidth, Half Frequency]
Lemma: Assigning the remaining bandwidth to the last instance results in the fastest execution.
Experimental results for unsharp masking (512 X 512)
[Figure: task graph from Colour Image to Filtered Image with tasks RGB2YCbCr, Blur, Sub, Add, YCbCr2RGB. Bandwidth = 630 MB/s, Schedule length = 18.8 ms. Annotations: 140, 110, 210, 210, 140]
[Figure: task graph with Constraint = 600 MB/s, where RGB2YCbCr and YCbCr2RGB each have two instances (_1, _2). Schedule length = 16.25 ms. Annotations: 140, 110, 197, 197, 140, 60, 60]
Challenges in System Design Tools
• Lack of design automation tools that are aware of hardware/software adaptation in the system
• No feedback and back-end tool awareness to application layer
  – Not aware of reconfiguration challenges at physical layer
• Need to develop automation tools to guide the designers both at physical layer and higher levels to converge and close the loop for effective adaptive systems
Adaptivity-driven System Synthesis
[Figure: Application Specifications (Task graphs), Input (image size, frame, packet), and Constraints (power, QoS, throughput) feed System Constraint Generation and Configuration Sequence Prediction/Generation, then System Synthesis and Layout Planning and Configuration Generation. At run time, a Dynamic System Configuration Manager with Sensing and monitoring controls a platform with static computation (p1, p2, timer, DMA, mem) and configurable blocks (FFT, Turbo decoding)]
Supported by NSF CAREER grant
Ongoing and Future Work
• System Synthesis and Layout planning to exploit hardware reconfiguration for further adaptation to system constraints.
• Integrating concepts such as resource upgradability and configuration sequences in the design tools.
• Provide feedback from layout synthesis to application layers for effective adaptive systems
• Use Software defined radio as target application
Acknowledgment
• NSF
  – CAREER
  – ECCS-IHCS (with Prof. Luke Bao)
• Majority of work done by PhD students
  – Sudarshan Banerjee (PhD 2007)
  – Love Singhal (PhD 2009)
  – Hessam Kooti (PhD student)
Thank you! Any Comments?
Email: [email protected]
http://www.ics.uci.edu/~eli