R. Arce-Nazario, M. Jimenez, and D. Rodriguez
Electrical and Computer Engineering
University of Puerto Rico – Mayagüez
WALSAIP
Motivation and Objective
Discrete Signal Transforms (DSTs)
  DFT, DCT, lots of applications
  Hardware accelerated, but at high area cost
Distributed (dedicated) hardware architectures (DHAs)
  Cost-effective
  Partitioning plays a key role
Objective: Use inherent properties of DSTs to improve their hardware partitioning to distributed hardware architectures.
[Figure: a DST is partitioned onto a DHA]
Previous Work
Automated partitioning of DSTs to DHAs
  DSTs treated as any other algorithm/benchmark [Srinivasan01][Bringmann00]
  Converted to a high-level or structural DFG and treated as such
Manual partitioning & automated code generation
  DST-specific properties exploited [Kumhom01]
  New formulations developed to exploit architectural features [VanLoan92]
  SPIRAL and FFTW: code generation platforms exploring the space of equivalent algorithms ([Pueschel05], [Frigo05])
[Arce05] – Automated partitioning methodology that incorporates DST features and formulation exploration
Partitioning Methodology
[Figure: methodology block diagram. Inputs: KPA DST formulation, architectural description. Blocks: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators. Signals: KPA formulation, DFG, hypergraph representation, cost and indicators, rule selection. Output: high-level partition solution.]
DSTs – General Concepts
$X[k_1,\ldots,k_d] = \sum_{n_1,\ldots,n_d} x[n_1,\ldots,n_d]\,\alpha_1(n_1,k_1)\cdots\alpha_d(n_d,k_d)$
General formula for d-dimensional DST
Essentially a vector-matrix multiplication
Fast versions exist, using divide-and-conquer techniques
  Highly regular
  Highly connected
Rules can be applied at the formulation level: permutation, index-set, ...
The α's determine the type of transform, e.g. DFT: $\alpha_i(n_i,k_i) = e^{-j 2\pi n_i k_i / N_i}$
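$F_8 = (F_2 \otimes I_4)\,T_1\,(I_2 \otimes F_2 \otimes I_2)\,T_0\,(I_4 \otimes F_2)\,R_8$
[Figure: signal-flow graph of $F_8$ with its stages labeled, in order of application: $R_8$, $(I_4 \otimes F_2)$, $(I_2 \otimes F_2 \otimes I_2)\,T_0$, $(F_2 \otimes I_4)\,T_1$]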
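To make the "vector-matrix multiplication" view concrete, here is a minimal sketch for the one-dimensional DFT case. It assumes the forward-transform sign convention $\alpha(n,k) = e^{-j2\pi nk/N}$; the helper names `dft_matrix` and `mat_vec` are illustrative and not taken from the slides.

```python
import cmath

def dft_matrix(size):
    """size x size matrix whose (k, n) entry is alpha(n, k) = exp(-2j*pi*n*k/size)."""
    return [[cmath.exp(-2j * cmath.pi * n * k / size) for n in range(size)]
            for k in range(size)]

def mat_vec(matrix, vector):
    """Plain matrix-vector product: one dot product per output sample."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

F8 = dft_matrix(8)
impulse = [1, 0, 0, 0, 0, 0, 0, 0]
constant = [1] * 8

X_impulse = mat_vec(F8, impulse)    # an impulse transforms to all ones
X_constant = mat_vec(F8, constant)  # a constant transforms to a single spike at k = 0
```

The direct product costs on the order of N² multiplications, which is what the fast, divide-and-conquer formulations reduce.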
Kronecker Algebra
$F_{4\times 4} = F_4 \otimes F_4$
$F_8 = (F_4 \otimes I_2)\,T_{8,4}\,(I_4 \otimes F_2)\,P_{8,4}$
[Figure: block diagram of the factored transform, built from $F_4$ and $F_2$ blocks with twiddle multipliers W]
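A Kronecker factorization of this kind can be checked numerically. The sketch below assumes the common Cooley-Tukey form $F_n = (F_p \otimes I_m)\,T\,(I_p \otimes F_m)\,P$ with $n = pm$, where $T$ is a diagonal twiddle matrix and $P$ a stride permutation. The slides do not define $T$ and $P$ entry by entry, so `twiddle` and `stride_permutation` are assumptions chosen to make the identity hold, and all function names are illustrative.

```python
import cmath

def dft(n):
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (r * c) for c in range(n)] for r in range(n)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def twiddle(p, m):
    """Assumed twiddle matrix: diagonal entry a*m + k holds w_n^(a*k), n = p*m."""
    n = p * m
    w = cmath.exp(-2j * cmath.pi / n)
    diag = [[0] * n for _ in range(n)]
    for a in range(p):
        for k in range(m):
            diag[a * m + k][a * m + k] = w ** (a * k)
    return diag

def stride_permutation(p, m):
    """Assumed stride permutation: output a*m + b reads input b*p + a."""
    n = p * m
    perm = [[0] * n for _ in range(n)]
    for a in range(p):
        for b in range(m):
            perm[a * m + b][b * p + a] = 1
    return perm

def factored_dft(p, m):
    """Multiply out (F_p (x) I_m) T (I_p (x) F_m) P."""
    stages = [kron(dft(p), identity(m)), twiddle(p, m),
              kron(identity(p), dft(m)), stride_permutation(p, m)]
    product = stages[0]
    for stage in stages[1:]:
        product = matmul(product, stage)
    return product

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

error_4x2 = max_abs_diff(factored_dft(4, 2), dft(8))  # split 8 = 4 * 2
error_2x4 = max_abs_diff(factored_dft(2, 4), dft(8))  # split 8 = 2 * 4
```

Both splits reproduce $F_8$ up to rounding error, which is what makes them interchangeable candidates when searching for a formulation that partitions well.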
Target topology
Similar to existing platforms in the market and academia:
  Annapolis Micro Systems (Wildforce)
  Gidel (PROC20KE)
  Berkeley Emulation Engine (BEE), being proposed as a cost-effective alternative to traditional high-performance computing systems
[Figure: target topology. Devices D0, D1, ..., Dk-1, each with a memory M0, M1, ..., Mk-1, connected through a crossbar.]
DST properties in our methodology
Graph considerations incorporated into the partitioning/placement process
Exploration of equivalent formulations
[Figure: detail of the methodology diagram. Blocks: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators. Signals: KPA formulation, DFG, cost and indicators, rule selection.]
Graph partitioning considerations
Focus on horizontal partitioning schemes (SIMD-like implementation)
Initial solution = balanced horizontal linear partitioning
Scheduling consideration: swap only nodes from the same computational stage.
[Figure: target topology. Devices D0, D1, ..., Dk-1 with memories M0, M1, ..., Mk-1 and a crossbar.]
Kernighan-Lin: bipartitioning
Heterogeneous channel k-way partitioning
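A balanced horizontal linear initial solution can be sketched in a few lines. The example assumes a radix-2 butterfly DFG in which the stage with stride s connects row r to row r XOR s; the slides do not fix the graph, so this structure and the names `initial_partition` and `cut_pairs_per_stage` are illustrative assumptions.

```python
def initial_partition(n_rows, n_devices):
    """Balanced horizontal linear partition: contiguous blocks of rows per device."""
    return [row * n_devices // n_rows for row in range(n_rows)]

def cut_pairs_per_stage(n_rows, assignment):
    """For each butterfly stage, count row pairs that end up on different devices."""
    cuts = []
    stride = 1
    while stride < n_rows:
        pairs = [(r, r ^ stride) for r in range(n_rows) if r < (r ^ stride)]
        cuts.append(sum(1 for a, b in pairs if assignment[a] != assignment[b]))
        stride *= 2
    return cuts

assignment = initial_partition(8, 2)       # [0, 0, 0, 0, 1, 1, 1, 1]
cuts = cut_pairs_per_stage(8, assignment)  # [0, 0, 4]
```

With two devices, only the widest-stride stage communicates across the cut, which illustrates why a horizontal split is a reasonable starting point before any node swaps.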
Formulation exploration
[Figure: detail of the methodology diagram. Blocks: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement. Signals: KPA formulation, DFG, cost and indicators, rule selection.]
Formulation Manipulator
Applies permutation and factorization to the Kronecker formulation of DSTs to obtain equivalent formulations
Rule: $F_n = (F_p \otimes I_m)\,T_{n,p}\,(I_p \otimes F_m)\,P_{n,p}$, with $n = pm$
Number of possible reformulations grows exponentially with DST size
Heuristic control method; first answer these questions:
  Do reformulations have an effect on solution quality?
  How can we effectively explore the equivalent formulation space to find more apt formulations?
Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence.
$F_{16} = (F_8 \otimes I_2)\,T_{16,8}\,(I_8 \otimes F_2)\,P_{16,8}$
$F_{16} = \big(\big((F_2 \otimes I_4)\,T_{8,2}\,(I_2 \otimes F_4)\,P_{8,2}\big) \otimes I_2\big)\,T_{16,8}\,(I_8 \otimes F_2)\,P_{16,8}$
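Repeated application of the factorization rule can be sketched symbolically. In this hypothetical representation, a formulation is a nested tuple: an integer is an unexpanded transform $F_n$, and a pair `(left, right)` applies the rule with $p$ the size of `left` and $m$ the size of `right`. The ASCII token `(x)` stands for the Kronecker product; the function names are illustrative, not the authors' tool.

```python
def size(tree):
    """Transform size represented by a (sub)tree."""
    return tree if isinstance(tree, int) else size(tree[0]) * size(tree[1])

def formulation(tree):
    """Expand a nested tuple into a Kronecker-product formulation string."""
    if isinstance(tree, int):
        return f"F_{tree}"
    left, right = tree
    p, m = size(left), size(right)
    n = p * m

    def factor(sub):
        text = formulation(sub)
        return text if isinstance(sub, int) else f"({text})"

    return (f"({factor(left)} (x) I_{m}) T_{{{n},{p}}} "
            f"(I_{p} (x) {factor(right)}) P_{{{n},{p}}}")

coarse = formulation((8, 2))
fine = formulation(((2, 4), 2))
print(coarse)  # (F_8 (x) I_2) T_{16,8} (I_8 (x) F_2) P_{16,8}
print(fine)
```

Here `fine` is `coarse` with $F_8$ itself replaced by its own factorization, which is the kind of rewriting step that makes the space of equivalent formulations grow so quickly.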
Measuring quality of solution
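Cost is defined over the per-channel terms $c_0, c_1, \ldots, c_{m-1}$, where $c_i$ combines:
  $W_i$: 'weight' of channel $i$
  $R_i$: required communications through channel $i$
[Figure: example with four devices D0, D1, D2, D3]
Example: $W_{01} = W_{12} = W_{23} = 1$, $W_{XBAR} = 2$; per-channel terms: 4, 4, 4, 8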
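The per-channel terms of the example can be reproduced under two loudly stated assumptions: that each term is the product of the channel weight and the required communications, and that every channel carries four communications. Neither is spelled out in the transcript; the values are chosen only so that the result matches the four numbers shown.

```python
def channel_terms(weights, required):
    """Assumed per-channel term: channel weight times required communications."""
    return [w * r for w, r in zip(weights, required)]

# Channel order: D0-D1, D1-D2, D2-D3, crossbar (weights from the example).
weights = [1, 1, 1, 2]
# Hypothetical communication counts, not taken from the slides.
required = [4, 4, 4, 4]

terms = channel_terms(weights, required)
print(terms)  # [4, 4, 4, 8]
```

Under this reading, routing the same traffic through the crossbar is twice as expensive as using a neighbor-to-neighbor channel.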
Experiment #1 – Inter-stage permutations
Since Cooley-Tukey's FFT, several common formulations have become available, e.g.
$F_8 = (F_2 \otimes I_4)\,T_1\,(I_2 \otimes F_2 \otimes I_2)\,T_0\,(I_4 \otimes F_2)\,R_8$
(Pease formulation here)
Experiment: several sizes of 5 common formulations were partitioned.
Inter-stage permutations (ISPs) have an effect on solution quality, yet no formulation is a clear winner.
[Figure: results for the five formulations: Stockham, Transposed Stockham, Cooley-Tukey, Gentleman-Sande, Pease]
Experiment #2 - Granularity
The weight of the nodes for the various computational stages of the transform.
[Figure: DFGs of a 16-point DFT at two granularities. Coarser: eight $F_4$ nodes. Finer: four $F_4$ nodes and sixteen $F_2$ nodes.]
Coarser: $F_{16} = (F_4 \otimes I_4)\,T_{16,4}\,(I_4 \otimes F_4)\,P_{16,4}$
Finer: $F_{16} = (F_4 \otimes I_4)\,T_{16,4}\,\big(I_4 \otimes \big((F_2 \otimes I_2)\,T_{4,2}\,(I_2 \otimes F_2)\,P_{4,2}\big)\big)\,P_{16,4}$
Experiment #2 – Granularity
Decomposition rules: a large DST is a combination of smaller DSTs, analogous to node clustering
| Size | Array 4: Cost | Array 4: Formulation | Ring 4: Cost | Ring 4: Formulation | Array 8: Cost | Array 8: Formulation | Ring 8: Cost | Ring 8: Formulation |
|---|---|---|---|---|---|---|---|---|
| 32 | 11 | 2/2/2/4* | 7 | 2/2/2/4 | 32 | 8/2/2* | 16 | 2/4/2/2 |
| 64 | 22 | 8/2/4* | 14 | 2/2/8* | 48 | 2/2/2/2/4 | 20 | 4/2/2/4 |
| 128 | 43 | 8/2/8* | 26 | 16/2/2/2* | 92 | 2/2/2/2/2/4 | 32 | 2/2/2/2/2/4 |
| 256 | 86 | 4/2/32* | 55 | 16/8/2* | 132 | 4/2/2/2/2/4 | 58 | 2/2/2/2/2/2/4 |
| 512 | 171 | 4/2/64* | 106 | 64/4/2* | 276 | 2/2/2/2/2/2/4/2 | 116 | 2/2/2/2/2/2/8 |

* Multiple formulations achieved the best cost; the coarsest granularity is shown.
Effect of topology, ring vs. linear: 57% cost reduction
Finest granularity is not necessarily best.
$F_8 = (F_4 \otimes I_2)\,T_{8,4}\,(I_4 \otimes F_2)\,P_{8,4}$
$F_8 = (F_2 \otimes I_4)\,T_{8,2}\,(I_2 \otimes F_4)\,P_{8,2}$
$F_8 = (F_2 \otimes I_4)\,T_{8,2}\,\big(I_2 \otimes \big((F_2 \otimes I_2)\,T_{4,2}\,(I_2 \otimes F_2)\,P_{4,2}\big)\big)\,P_{8,2}$
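The quoted ring-versus-linear reduction can be checked against the tabulated costs. This sketch assumes the four cost columns are, in order, Array 4, Ring 4, Array 8 and Ring 8, and that the reduction is averaged over the five sizes; both are readings of the flattened table rather than statements from the slides.

```python
# Best costs per DFT size, in the assumed column order:
# (Array 4, Ring 4, Array 8, Ring 8).
costs = {
    32: (11, 7, 32, 16),
    64: (22, 14, 48, 20),
    128: (43, 26, 92, 32),
    256: (86, 55, 132, 58),
    512: (171, 106, 276, 116),
}

def mean_reduction(pairs):
    """Average percentage by which the ring cost undercuts the array cost."""
    reductions = [1 - ring / array for array, ring in pairs]
    return 100 * sum(reductions) / len(reductions)

reduction_4 = mean_reduction([(c[0], c[1]) for c in costs.values()])
reduction_8 = mean_reduction([(c[2], c[3]) for c in costs.values()])
print(f"4 devices: {reduction_4:.1f}%  8 devices: {reduction_8:.1f}%")
# 4 devices: 37.3%  8 devices: 57.5%
```

Under these assumptions, the 57% figure corresponds to the eight-device topologies; with four devices the ring saves about 37%.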
Experiment #3 – Breakdown strategy
Breakdown strategy – order and divisors with which a transform is decomposed.
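Split trees: a common graphical representation of the breakdown strategy
Example: two split trees for a DFT of size 64.
(a) $F_{64} = \big(\big((F_4 \otimes I_2)\,T_{8,4}\,(I_4 \otimes F_2)\,P_{8,4}\big) \otimes I_8\big)\,T_{64,8}\,(I_8 \otimes F_8)\,P_{64,8}$
(b) $F_{64} = (F_2 \otimes I_{32})\,T_{64,2}\,\big(I_2 \otimes \big((F_2 \otimes I_{16})\,T_{32,2}\,(I_2 \otimes F_{16})\,P_{32,2}\big)\big)\,P_{64,2}$
[Figure: split trees over the exponent of the transform size ($64 = 2^6$). (a) 6 splits into 3 and 3, and one 3 splits into 2 and 1. (b) 6 splits into 1 and 5, and 5 splits into 4 and 1.]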
Experiment #3 – Results
Procedure
  Exhaustive generation of split trees for DFT sizes n = 16 to 256.
  Formulations partitioned for various topologies
  Observation of split tree decisions that lead to 'partition friendly' formulations
  Generation of n > 256 formulations using rules.
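The size of an exhaustive split-tree search can be estimated with a short enumeration. The sketch assumes binary split trees over the exponent t of a transform of size 2^t, where any node may be left unsplit (a leaf) or split into two positive exponents in order. Whether the authors' generator uses exactly these rules is not stated, so the counts are illustrative.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_split_trees(exponent):
    """Number of split trees: one leaf plus every ordered binary split."""
    total = 1  # the node left unsplit
    for left in range(1, exponent):
        total += count_split_trees(left) * count_split_trees(exponent - left)
    return total

def split_trees(exponent):
    """Yield trees as an int (leaf) or a (left, right) pair of subtrees."""
    yield exponent
    for left in range(1, exponent):
        for left_tree in split_trees(left):
            for right_tree in split_trees(exponent - left):
                yield (left_tree, right_tree)

counts = {2 ** t: count_split_trees(t) for t in range(4, 9)}
print(counts)  # {16: 15, 32: 51, 64: 188, 128: 731, 256: 2950}
```

For size 64, the enumeration includes `((2, 1), 3)` and `(1, (4, 1))`, the exponent splits of the two example trees. Each doubling of the transform size roughly quadruples the count, which is why sizes beyond 256 are generated with rules rather than exhaustively.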
Conclusions and Future Work
Methodology for partitioning of DSTs to DHAs:
  DST graph considerations
  Formulation exploration
Graph considerations
  Generating a linear initial partition provides better results than a random one.
  Limiting node moves gives faster convergence.
Exploration at the algorithmic level: experiments
  Isolated features such as permutations and granularity
    An effect was evident, but it is hard to establish a relation to solution quality.
    Coarse granularity = better convergence, good solution quality
  Breakdown strategy: 'partition friendly' formulations generated.
Current work:
  Experimentation with DCTs.
  Experimentation with other properties to define an overall exploration strategy
Acknowledgements
Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)
WALSAIP - Wide-Area Large Scale Automated Information Project
Puerto Rico NASA Space Grant
QUESTIONS?