Download - Nikhil Pre
-
7/29/2019 Nikhil Pre
1/48
By
NIKHIL SURYANARAYANAN
-
Outline
Motivation
Eigen Value Decomposition & Applications
Exact Jacobi
Parallel Decomposition using Systolic Array
Optimization of Systolic Array
Interconnect-optimized Systolic Array
Conclusion
-
Motivation
EVD is required in various fields
High-performance and real-time applications demand hardware implementation
SDMA communication
Realizing optimal architectures with respect to speed and power for the respective applications
-
Eigen Value Decomposition
Angle of Arrival Estimation
Face Detection
Image Compression
Eigen Beam-forming
Signal Subspace Estimation
PCA
MUSIC & ESPRIT
-
EVD Methods
Exact Jacobi
Systolic Array
Approximate Jacobi
Algebraic Method (only for 3x3 matrix)
-
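Eigen Value Decomposition (EVD)
Special case of Singular Value Decomposition (SVD) where the matrix is square and symmetric
Consider a matrix $A \in \mathbb{R}^{m \times n}$
SVD: $A = U D V^T$
EVD: $A = U D U^T$ (with $m = n$ and $A = A^T$)
where
$D \in \mathbb{R}^{m \times n}$ is a diagonal matrix, and $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal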
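A small worked instance may make the EVD form concrete. The 2x2 matrix below is chosen only for illustration and does not come from the slides:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
  = \underbrace{\tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{U}
    \underbrace{\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}}_{D}
    \underbrace{\tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{U^T},
\qquad U^T U = I
```

Here the diagonal of $D$ holds the Eigen Values 3 and 1, and the columns of $U$ are the corresponding Eigen Vectors.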
-
CORDIC
COordinate Rotation DIgital Computer
Set of shift-add algorithms for computing sine, cosine, arctangent, hyperbolic functions, coordinate rotations, etc.
Eliminates complex computations
Needs only a single shift-add multiplier, ROM/RAM for lookup, and basic logic gates
Hardware friendly
Iterative algorithm
-
Loop Unrolling
-
CORDIC Modules
ArcTan Module
Used to compute the angle $\theta$ (via $\tan^{-1}$) for constructing the Jacobi rotation matrix
Sine/Cosine Module
The 2x2 matrix
$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$
is constructed using the angle from the ArcTan module
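One common way to obtain an arctangent with shift-add steps is CORDIC vectoring mode. The following is a minimal floating-point sketch of that idea, not the deck's hardware design: the function name `cordic_atan`, the iteration count `n`, and the use of `math.atan` in place of a ROM table are assumptions made for illustration.

```python
import math

def cordic_atan(y, x, n=24):
    """Approximate atan(y / x) for x > 0 with n CORDIC vectoring steps."""
    z = 0.0
    for i in range(n):
        # Rotate towards the x-axis: clockwise if y is positive, else counter-clockwise.
        d = -1.0 if y > 0 else 1.0
        shift = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        x, y = x - d * y * shift, y + d * x * shift
        # Accumulate the angle undone by this micro-rotation (a ROM lookup in hardware).
        z -= d * math.atan(shift)
    return z

print(cordic_atan(1.0, 1.0))  # close to pi/4
```

Each step drives `y` towards zero, so the accumulated `z` approaches the angle of the input vector; accuracy improves by roughly one bit per iteration.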
-
Exact Jacobi
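Aims at annihilating the off-diagonal elements using a series of orthogonal transformations
$A^{(k+1)} = J_{pq}^T A^{(k)} J_{pq}$, where $A^{(0)} = A$
$J_{pq}$ is called the Jacobi rotation, defined by the parameters $c$ and $s$ through the 2x2 block $\begin{pmatrix} c & s \\ -s & c \end{pmatrix}$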
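The rotation parameters can be derived from the 2x2 sub-problem. The sketch below assumes $c = \cos\theta$ and $s = \sin\theta$ and writes the affected entries of the symmetric matrix as $a_{pp}$, $a_{pq}$, $a_{qq}$; these symbol names are not given on the slide.

```latex
\begin{aligned}
\begin{pmatrix} c & -s \\ s & c \end{pmatrix}
\begin{pmatrix} a_{pp} & a_{pq} \\ a_{pq} & a_{qq} \end{pmatrix}
\begin{pmatrix} c & s \\ -s & c \end{pmatrix}
&= \begin{pmatrix} a'_{pp} & a'_{pq} \\ a'_{pq} & a'_{qq} \end{pmatrix}, \\
a'_{pq} &= cs\,(a_{pp} - a_{qq}) + (c^2 - s^2)\,a_{pq}, \\
a'_{pq} = 0 \;&\Longrightarrow\; \tan 2\theta = \frac{2\,a_{pq}}{a_{qq} - a_{pp}}
\end{aligned}
```

So one arctangent gives the angle that annihilates $a_{pq}$, and one sine/cosine evaluation gives $c$ and $s$.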
-
Exact Jacobi
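$A = U D V^T$, so $U^T A V = D$
After $n$ iterations of $A_{i+1} = J_i^T A_i J_i$, repeated for all possible pairs $(p, q)$, $A$ can be effectively diagonalized
$$
J_{pq} =
\begin{pmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}
$$
where $c$ and $s$ lie in row $p$, $-s$ and $c$ lie in row $q$, and both pairs occupy columns $p$ and $q$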
-
Limitations of Exact Jacobi Implementation
Jacobi iterations are serial
Inability to derive parallelism, as iterations have large inter-loop data dependency
Inability to pipeline
Every iteration involves the transfer of 4N-4 matrix elements to the processor
Even though it is a matrix operation, parallelism cannot be derived
-
How to parallelize?
Systolic Array
Solve 2x2 EVD sub-problems
For a matrix of size N we have N/2 x N/2 EVD sub-problems
If N=4, the possible sets of index pairs are
{ (1,2), (3,4) }
{ (1,3), (2,4) }
{ (1,4), (2,3) }
Parallel reordering
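The sets of disjoint index pairs can be generated mechanically. The sketch below uses the generic round-robin (circle) scheduling method, which is an assumption: the deck does not say which ordering its array uses, and the function name `parallel_orderings` is hypothetical. For 4 indices it reproduces the three sets listed above.

```python
def parallel_orderings(n):
    """Return n-1 rounds, each holding n/2 disjoint 1-based index pairs (n even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    idx = list(range(1, n + 1))
    rounds = []
    for _ in range(n - 1):
        # Pair the k-th index from the front with the k-th index from the back.
        pairs = [tuple(sorted((idx[k], idx[n - 1 - k]))) for k in range(n // 2)]
        rounds.append(sorted(pairs))
        # Keep the first index fixed and rotate the remaining ones by one position.
        idx = [idx[0]] + [idx[-1]] + idx[1:-1]
    return rounds

print(parallel_orderings(4))
# [[(1, 4), (2, 3)], [(1, 3), (2, 4)], [(1, 2), (3, 4)]]
```

All pairs within a round touch different rows and columns, so their 2x2 sub-problems can be solved at the same time; over the n-1 rounds every pair is visited exactly once.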
-
Systolic Array for EVD
PE PE PE
PE PE PE
PE PE PE
-
Structure of PE
CORDIC ATAN
CORDIC ROT
REG REG
-
Data Exchange Sequence
[Figure: PEij, four connections labeled "in"]
-
Data Exchange PE11
[Figure: PE11, four connections labeled "in"]
-
Data Exchange PE1j
[Figure: PE1j, four connections labeled "in"]
-
Data Exchange PEi1
[Figure: PEi1, four connections labeled "in"]
-
Data Exchange PEij
[Figure: PEij, four connections labeled "in"]
-
Timing & Data Exchange
[Figure sequence: the systolic array during Array Cycle = 1, including the DATA EXCHANGE phase]
-
Staggered Processing?
Not realistic to broadcast row and column angles in real time
$\Delta_{ij} = |i - j|$ is the distance of processor $P_{ij}$ from the diagonal
Also, $P_{ij}$ needs data from its neighbors $P_{i \pm 1, j \pm 1}$ ($1 < i, j < n/2$)
Can be made faster by allowing the off-diagonal PEs to start execution as soon as the diagonal PEs produce their angles
-
Optimizations
Improves the utilization of each PE from 1/3 to 2/3
[Figure: CYCLE 1 and CYCLE 2]
-
Comparisons: 8x8 matrix
                              Exact Jacobi    Systolic Array
Iterations for convergence    3               22-25
Additions                     3500            1500 (less than half)
Multiplications               7000            3000
Swaps/exchanges               0               368
Speed                         Slower          Faster
-
Optimized Architecture
In the final stages of analyzing a simpler systolic architecture
Matrix size = 4x4
[Figure: four PEs, with labels 1 and 2]
-
GOALS
Achieved:
Pipelined Jacobi Architecture
S/W Implementation of Systolic Array
Simultaneous execution of off-diagonal PEs to improve timing and reduce idle time
Optimized systolic array architecture for minimum swaps and angle transmission
-
References
[1] Ray Andraka, "Survey of CORDIC Algorithms for FPGA Based Computers", ACM, 1998
[2] Yu Hen Hu and Homer H. M. Chern, "A Novel Implementation of CORDIC Algorithm Using Backward Angle Recoding (BAR)", IEEE Transactions on Computers, December 1996
[3] Yu Hen Hu, "Parallel Eigen Value Decomposition for Toeplitz and Related Matrices", IEEE Transactions, 1989
[4] Y. Kim, Y. Kim and James Doyle, "A Low Power CMOS CORDIC Processor Design for Wireless Telecommunications", IEEE, 2007
[5] N. Hemkumar, Master's Thesis, Rice University
[6] Yang Liu, Christos-Savvas Bouganis, Peter Y. K. Cheung, Philip H. W. Leong and Stephen J. Motley, "Hardware Efficient Architectures for Eigen Value Computation", EDAA, 2006
[7] Sudhakar Reddy and Ramchandra Reddy, "ASIC Implementation of Autocorrelation and CORDIC Algorithm for OFDM Based WLAN", European Journal of Scientific Research, 2009
[8] "Advanced Algorithmic Evaluation for Imaging, Communication and Audio Applications: Eigenvalue Decomposition Using CATAPULT C Algorithmic Synthesis Methodology"
[9] Christophe Bobda, Klaus Danne and Andre Linarth, "Efficient Implementation of SVD on a Reconfigurable System", Springer-Verlag Berlin Heidelberg, 2003
[10] H. Wang and M. Glesner, "Hardware Implementation of Smart Antenna Systems", Advances in Radio Science, 2006
[11] Jawed Qumar, "Spectral Estimation Using MUSIC Algorithm", Nios II Embedded Processor Design Contest, 2005
[12] "A Novel Fast Eigenvalue Decomposition Based on Cyclic Jacobi Rotation and Its Application in Eigen-Beamforming", Technical Report of IEICE, Japan
[13] Fan Xu and Alan Wilson, "Efficient Hardware Architectures for Eigenvector and Signal Subspace Estimation", IEEE Transactions on Circuits & Systems, 2004
[14] Koushik Maharatna, Alfonso Troya, Swapna Banerjee, Eckhard Grass and Milos Krstic, "16 Bit CORDIC Rotator for High Speed Wireless LAN", IEEE Transactions, 2004
[15] Frank B. Gross, "Smart Antennas for Wireless Communications", McGraw-Hill, 2005 (used for facts and references for comparison purposes, and for the specifications of different wireless standards)
-
Eigen Value and Eigen Vector
A non-zero vector whose magnitude, but not direction, changes when a linear transformation is applied to it is an Eigen Vector of that transformation
The scalar value associated with this vector is called the Eigen Value
$Ax = \lambda x$
$A$ is the transformation, $x$ is the Eigen Vector and $\lambda$ is the corresponding Eigen Value
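As a quick numerical check of this definition, consider the following matrix, which is chosen only for illustration and does not come from the slides:

```latex
A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix}: \qquad
A \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 5 \\ 5 \end{pmatrix} = 5 \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
A \begin{pmatrix} 1 \\ -2 \end{pmatrix} = \begin{pmatrix} 2 \\ -4 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ -2 \end{pmatrix}
```

Both vectors keep their direction under $A$ and are only scaled, by the Eigen Values 5 and 2 respectively.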
-
CORDIC contd
Convergence depends on the number of iterations
Unrolled for systolic and pipelined implementations
Iterative architecture unsuitable for FPGA
Pipelined architecture preferred, as it needs less complex H/W and operates at the data rate
Registers present in the logic cells of FPGAs support pipelining better
Runs at 52 MHz on XC4013E-2 [1]
-
CORDIC Iteration Equations
Givens rotation transformation
$x' = x \cos\theta - y \sin\theta$
$y' = y \cos\theta + x \sin\theta$
The iteration equations are given as
$x_{i+1} = x_i - y_i \, d_i \, 2^{-i}$
$y_{i+1} = y_i + x_i \, d_i \, 2^{-i}$
$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$
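The iteration equations can be tried out directly in software. This is a minimal floating-point sketch under stated assumptions: the function name `cordic_rotate` and the iteration count are hypothetical, `math.atan` stands in for the ROM table, and $d_i$ is restricted to $\pm 1$ so that the accumulated gain $\prod_i \sqrt{1 + 2^{-2i}}$ stays constant and can be divided out at the end.

```python
import math

def cordic_rotate(x, y, z, n=24):
    """Rotate (x, y) by z radians (|z| below about 1.74) with n micro-rotations."""
    gain = 1.0
    for i in range(n):
        d = 1.0 if z >= 0 else -1.0   # d_i: sign of the residual angle z_i
        shift = 2.0 ** -i             # 2^-i, a right shift in fixed-point hardware
        x, y = x - y * d * shift, y + x * d * shift
        z -= d * math.atan(shift)     # tan^-1(2^-i), a ROM entry in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    # The micro-rotations stretch the vector by a constant gain; divide it out.
    return x / gain, y / gain

c, s = cordic_rotate(1.0, 0.0, math.pi / 6)
print(c, s)  # close to cos(pi/6) and sin(pi/6)
```

Starting from (1, 0), the outputs approximate the cosine and sine of the requested angle, which is how a rotation-mode module can supply the entries of a rotation matrix.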
-
CORDIC Algorithms
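% Rotation mode: rotate (x, y) by the angle held in `angle`.
% W(i) is the i-th ROM reference, i.e. atan(2^-(i-1)).
for i = 1:n
    % Direction of this micro-rotation, from the sign of the residual angle
    if angle == 0
        d = 0;
    elseif angle > 0
        d = 1;
    else
        d = -1;
    end
    x1 = x - y * d * 2^-(i-1);
    y1 = y + x * d * 2^-(i-1);
    angle = angle - d * W(i);
    x = x1;
    y = y1;
end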
-
Exact Jacobi Algorithm
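% One sweep over all index pairs (i, j) with i > j
for j = 1:n-1
    for i = n:-1:j+1
        % 2x2 rotation built from the diagonal entries and the entry to annihilate
        J = jacobi_rot(A(i, i), A(j, j), A(i, j));
        A([i, j], :) = J' * A([i, j], :);
        A(:, [i, j]) = A(:, [i, j]) * J;
    end
end
% Repeat the sweep for n iterations for accuracy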
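For readers without the `jacobi_rot` helper, the sweep above can be sketched in self-contained form. The Python version below is an illustration under assumptions, not the author's implementation: the function name `jacobi_evd`, the fixed sweep count, and the angle formula $\tan 2\theta = 2a_{pq} / (a_{qq} - a_{pp})$ (which matches the rotation convention with rows `c s` and `-s c`) are choices made here.

```python
import math

def jacobi_evd(A, sweeps=10):
    """Cyclic Jacobi on a symmetric matrix (list of lists).

    Returns (eigenvalues, U) such that A is approximately U * diag(eigenvalues) * U^T.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue  # already annihilated
                diff = A[q][q] - A[p][p]
                if diff != 0.0:
                    theta = 0.5 * math.atan(2.0 * A[p][q] / diff)
                else:
                    theta = math.copysign(math.pi / 4, A[p][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A * J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T * A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
                for k in range(n):  # U <- U * J accumulates the Eigen Vectors
                    ukp, ukq = U[k][p], U[k][q]
                    U[k][p] = c * ukp - s * ukq
                    U[k][q] = s * ukp + c * ukq
    return [A[i][i] for i in range(n)], U

vals, U = jacobi_evd([[2.0, 0.0, 0.0], [0.0, 3.0, 4.0], [0.0, 4.0, 9.0]])
print(sorted(vals))  # close to [1.0, 2.0, 11.0]
```

The three inner loops correspond to the two matrix updates in the sweep above plus an accumulation of the rotations into `U`; a hardware design would replace `math.atan`, `math.cos` and `math.sin` with CORDIC modules.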
-
EVD contd
Also $U^T U = I$ and $V^T V = I$
The diagonal matrix contains the Eigen Values along its diagonal
The columns of $U$ are the left singular vectors and the columns of $V$ are the right singular vectors
$U = \{u_1, u_2, \ldots, u_n\}$
$V = \{v_1, v_2, \ldots, v_n\}$
-
Diagonal Processor
-
Sub-Diagonal Processor
-
Super-Diagonal Processor