reconfigurable antenna processing with matrix ... · title: microsoft powerpoint - 01.1-001 fitton...
TRANSCRIPT
© 2004 Altera Corporation
Reconfigurable Antenna Processing with Matrix Decomposition Using FPGA-Based Application-Specific Integrated Processors
Reconfigurable Antenna Processing with Matrix Decomposition Using FPGA-Based Application-Specific Integrated Processors
Mike Fitton Rob Jackson Steve PerryMike Fitton Rob Jackson Steve Perry
Altera European Technology Centre
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
2© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication-Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
3© 2004 Altera Corporation - Confidential
Motivation: System ConceptMotivation: System ConceptReconfiguration for −SDR: Different Standards/Specifications−Optimization: Adapt to the Wireless Environment−Multiple Input Multiple Output Space Time
Coding Adaptive AntennasProvide a Methodology to Give: −Reconfiguration−Cost/Performance Trade-off−Usability: Present a Software-Like Interface
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
4© 2004 Altera Corporation - Confidential
System RequirementSystem Requirement
Slower than System
Requirem
ent
Accessing the Optimal PointAccessing the Optimal Point
Cost
Execution Time
Optimal Point
Hardware CostDesign Cost
Processor(Too Slow)
Flat RTL Design(Too Big)
with ASIP
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
5© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication-Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
6© 2004 Altera Corporation - Confidential
ASIP ApproachASIP ApproachBuild Application-Specific Processors− Easy to Think of Them as Custom Processors
Application-Specific Integrated Processor Has − Flexible Number of Functional Units− Connections Between Function Units as Defined
by the Algorithm− Program to Allow Flexible (& Dynamic) Control− Software-Based Design Flow
Software Process Brings Productivity vs. RTL
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
7© 2004 Altera Corporation - Confidential
Example ASIPExample ASIP
ALU2
CORDIC
Prog CounterProg Counter
ALU1 RF2
Multiple Function UnitsPartitioned Register FilesSparse, Irregular ConnectivityTypically Very High fmax
Control Complexity in Software
ProgramMicrocodeProgram
Microcode
RF1 *
*
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
8© 2004 Altera Corporation - Confidential
CO
RD
IC
CO
RD
ICC
OR
DIC
CO
RD
ICC
OR
DIC
QRD Processor
ALUALU
uCodeuCode
Hierarchical CompositionHierarchical CompositionProcessors That Are Built Can Be Used As Functional Units Within Another ProcessorUsers Can Add Functional Units to Their Processors
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
9© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication-Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
10© 2004 Altera Corporation - Confidential
QR Decomposition OverviewQR Decomposition Overview
=
2
1
0
2
1
0
987654321
zzz
www Given Observation X, Desired Output z,
Solve for Optimum Coefficients w
=
−−−
−−−−
−
2
1
0
2
1
0
0008.19.0011108
4.03.08.08.03.05.04.09.01.0
zzz
www Decompose X Into QR, Where
Q Is Unitary (Q QT=1) & R Is Upper-Triangular
−−−−−
=
−−−
2
1
0
2
1
0
4.08.04.03.03.09.08.05.01.0
0008.19.0011108
zzz
www Solve for w By Rearranging Q
to RHS, Where Q-1 = QT
Z’=QT ZUse Backsubstitution to Solve for w
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
11© 2004 Altera Corporation - Confidential
Adaptive Filter TechniquesAdaptive Filter TechniquesTrack Variations in the Data to Give Optimum Filter CoefficientsLeast Mean Squares (LMS) Algorithms− Advantage: Simple to Implement− Disadvantage: Relatively Slow Convergence Time
Least Squares (LS) Algorithms− Advantage: Faster Rate of Convergence Over LMS− Disadvantage: Computationally Intensive
Can Exhibit Divergence, Due to Finite Wordlength
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
12© 2004 Altera Corporation - Confidential
Adaptive Filter Techniques (Cont.)Adaptive Filter Techniques (Cont.)Recursive Least Squares (RLS) Is LS Algorithm− QR Decomposition (QRD) Can Be Used to
Implement RLSSquare-Root Formulation of RLSAdvantage: Numerically Stable as Operates on Incoming Data not Time-Averaged Correlation Matrix of Input As Used in Standard RLSUses Parallel Processing Elements Suited for Hardware Implementation
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
13© 2004 Altera Corporation - Confidential
Systolic Array ArchitectureSystolic Array ArchitectureUse Systolic Array to Perform QR Decomposition− Efficient Hardware Implementation
Array of Parallel Processing CellsInputs & Outputs Fed Into Different Columns of Array2 Types of Processing Cells− Boundary Cell− Internal Cell
Result in Each Cell Forms R Matrix & Z’Matrix Values
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
14© 2004 Altera Corporation - Confidential
Systolic Array Architecture (Cont.)Systolic Array Architecture (Cont.)
R11 R12
R22
R13
R33
R23
R14
R44
z’1
z’2R24
R34 z’3
z’4
Inputs Output
X1(0) 0 0 0 0X1(1) X2(0) 0 0 0X1(2) X2(1) X3(0) 0 0X1(3) X2(2) X3(1) X4(0) 0X1(4) X2(3) X3(2) X4(1) z(0)
Boundary Cell
Internal Cell
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
15© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication-Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
16© 2004 Altera Corporation - Confidential
Method of Mapping Nodes to Limited Hardware Resources− One Performs Only
Boundary Cell Operations− Others Perform Internal
Cell Operations− Allows Optimizations
of Processors− Minimum Amount of Memory Required
Other Resource-Sharing Techniques Possible
Discrete MappingDiscrete Mapping
R34
R44R44
Z2 Z2
Z3 Z3
Z4 Z4
R33R33
R12 R13 R13 R14 R14 Z1 Z1
R22R22 R23 R24 R24
R33R33
R11R11
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
17© 2004 Altera Corporation - Confidential
R34
R44R44
Z2 Z2
Z3 Z3
Z4 Z4
R33R33
Redraw Position of Nodes to Allow Mapping Onto Sequential Processors
Discrete MappingDiscrete MappingR34 R44R44
Z2 Z2 Z3 Z3 Z4 Z4 R33R33
R12 R13 R13 R14 R14 Z1 Z1
R22R22 R23 R24 R24
R33R33
R11R11
Method of Mapping Nodes to Limited Hardware Resources− One Performs Only
Boundary Cell Operations− Others Perform Internal
Cell Operations− Allows Optimizations
of Processors− Minimum Amount of Memory Required
Other Resource-Sharing Techniques Possible
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
18© 2004 Altera Corporation - Confidential
Method of Mapping Nodes to Limited Hardware Resources− One Performs Only
Boundary Cell Operations− Others Perform Internal
Cell Operations− Allows Optimizations
of Processors− Minimum Amount of Memory Required
Other Resource-Sharing Techniques Possible
Discrete MappingDiscrete Mapping
Nodes are Divided Into Time SlotsWhen They WillBe Processed
Time Slot 1
Time
Slot 3
Time
Slot 0
Time
Slot 2
Time Slot 4
Proc2Proc1 Proc3
Nodes Within One Time Slot Are Mapped Onto Processors
R34 R44R44
Z2 Z2 Z3 Z3 Z4 Z4 R33R33
R12 R13 R13 R14 R14 Z1 Z1
R22R22 R23 R24 R24
R33R33
R11R11
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
19© 2004 Altera Corporation - Confidential
CORDIC as Processor Cell for Real InputsCORDIC as Processor Cell for Real Inputs
R11 R12
R22
R13
R33
R23
z’1
z’2
z’3
Өi Өi Өi
X1 X3
Rotate Mode
Өi
(R,Xi)
(R’,X’o)
ӨoutCORDIC
Xi
R
X’o
Internal Cell
Өin
ӨoutCORDIC
Xi
R 0
Өout
(R,Xi)
(R’,0)
Vectorize Mode
Boundary Cell
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
20© 2004 Altera Corporation - Confidential
Complex Operations With CORDICComplex Operations With CORDIC
øoutCORDIC 1
CORDIC 3
Real(Xout) Imag(Xout)
Real(Xin) Imag(Xin)
øin
Өin
Өout
CORDIC 2
Real(R) Imag(R)
Internal Cell(Rotate Mode)
CORDIC 1
Real(Xin) Imag(Xin)
øout
ӨoutCORDIC 2
Real(R)
Boundary Cell(Vectorize Mode)
|Xin|
Time Share Processors Between NodesTime Share Within Processor (e.g., Use a Single CORDIC)
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
21© 2004 Altera Corporation - Confidential
Mixed Cartesian/Polar ProcessingMixed Cartesian/Polar Processing
Different Implementation of Node Processor− Transparent to Higher-Level Architecture
Exploit Hard Multipliers in Internal Cells
Real(Xout) Imag(Xout)
Real(Xin) Imag(Xin)
Internal Cell(Rotate Mode)
Real(Xin) Imag(Xin)
øout
Өout
Real(R)
Boundary Cell(Vectorize Mode)
|Xin|sin (real)
sin (imag)
cos
sin/cos
CORDIC 1
CORDIC 2
Complex MultiplyComplex Multiply
Real(R)
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
22© 2004 Altera Corporation - Confidential
Implementation of Back SubstitutionImplementation of Back SubstitutionSoft Nios® II Processor: − Flexibility in Changing R Matrix Size− 5,623 Cycles for 20 Coefficients
Dedicated Hardware− Faster than Processor Solution− 728 Cycles for 20 Coefficients
Application-Specific Integrated Processor− Can Optimize for Speed or Size− Between 460 & 1,144 Cycles for 20 Coefficients
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
23© 2004 Altera Corporation - Confidential
Results & Resource Utilization (i)Results & Resource Utilization (i)20 Coefficients, 18 bit, with 2,048 IterationsFull CORDIC Implementation Shown Here − Multipliers to Remove CORDIC Scaling
0
10
20
30
40
50
60
70
0 2 4 6 8
Calculation time (ms)
Res
ourc
es
LEsMultipliers
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
24© 2004 Altera Corporation - Confidential
Results & Resource Utilization (ii)Results & Resource Utilization (ii)Similar Top-Level Architecture With Mixed Polar & Cartesian ProcessingHigher Multiplier Utilisation, Less Logic − Design Space Exploration Allows Speed & Size Trade-Off
0
10
20
30
40
50
60
0 2 4 6 8
Calculation time (ms)
Res
ourc
es
LEsMultipliers
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
25© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication-Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
26© 2004 Altera Corporation - Confidential
Reconfiguration & ASIPsReconfiguration & ASIPsASIPs Use a Program − Sequential Instruction-Level Parallelism− Can be Quickly & Easily Reprogrammed by Changing
the Contents of Program MemoryConventional Hardware Description − Behavior Encoded in a Number of Parallel Processes− Reconfiguration Requires new FPGA Image - Literally
Rewiring the DeviceASIPs Can Combine Flexibility of DSPs with Processing Power of FPGAsMigrate to Structured ASIC (e.g. Altera Hardcopy Structured ASIC ) − Does Not Compromise the Reconfigurability
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
27© 2004 Altera Corporation - Confidential
SDR Aspects SDR Aspects Different Levels of ReconfigurationParameterization− Dynamically Change Update Rate, Size, Adaptation, etc.− Change the Cost-Performance Trade-Off
Algorithmic Reconfiguration− MIMO for Multipath-Rich Channels & High Data Rates− Space-Time Coding for Robustness− Adaptive Beamforming to Increase Signal-to-Noise Ratio
Application Reconfiguration− QRD for Polynomial-Based Digital Predistortion− QRD for Channel Estimation− QRD for Antenna Beamforming
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
28© 2004 Altera Corporation - Confidential
AgendaAgendaMotivationApplication Specific Integrated Processors (ASIPs)QR Decomposition & QRD-RLS− Introduction− Systolic Array
ASIP Implementation of QRD-RLS− Architecture− Results
SDR AspectsSummary
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved
29© 2004 Altera Corporation - Confidential
SummarySummaryMethodology for Constructing Customized Application-Specific Processors on FPGA − Used Here for QRD-Decomposition-Based RLS− Appropriate Algorithm for Many Wireless ApplicationsExploit ASIP Methodology− Encapsulation & Abstraction− Time Multiplexing on Different LevelsEfficient, Scalable Design ProducedStraightforward to Reconfigure & ParameterizeASIPs Remain Reprogrammable if FPGA Design is Converted to ASIC (e.g., HardCopy Structured ASIC)
Proceeding of the SDR 04 Technical Conference and Product Exposition. Copyright © 2004 SDR Forum. All Rights Reserved