![Page 1: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/1.jpg)
Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing
Florin Balasa
University of Illinois at Chicago
![Page 2: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/2.jpg)
Outline
The importance of memory management in multi-dimensional signal processing
A lattice-based framework
The computation of the minimum data memory size
Optimization of the dynamic energy consumption in a hierarchical memory subsystem
Mapping multi-dimensional signals into hierarchical memory organizations
Future research directions
Conclusions
![Page 3: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/3.jpg)
Memory management for signal processing applications
Real-time multi-dimensional signal processing systems (video and image processing, telecommunications, audio and speech coding, medical imaging, etc.)
data transfer and data storage
system performance
power consumption
chip area
The designer must focus on the exploration of the memory subsystem
![Page 4: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/4.jpg)
Memory management for signal processing applications
In the early years of high-level synthesis: memory management tasks tackled at scalar level; register-transfer level (RTL) algorithmic specifications
More recently: memory management tasks at non-scalar level; high-level algorithmic specifications
Algebraic techniques (similar to those used in modern compilers)
![Page 5: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/5.jpg)
Memory management for signal processing applications
Affine algorithmic specifications
Loop-organized algorithmic specification
Main data structures: multi-dimensional arrays

    T[0] = 0;
    for ( j=16; j<=512; j++ ) {
      S[0][j-16][0] = 0;
      for ( k=0; k<=8; k++ )
        for ( i=j-16; i<=j+16; i++ )
          S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] - A[k][i];
      T[j-15] = S[0][j-16][297] + T[j-16];
    }
    out = T[497];
![Page 6: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/6.jpg)
A Lattice-Based Framework

    for (i=0; i<=4; i++)
      for (j=0; j <= 2*i && j <= -i+6; j++)
        … A[2*i+3*j+1][5*i+j+3][4*i+6*j+2] …

Iterator space (i, j) is mapped to the index space (x, y, z) of A[x][y][z]:
x = 2i+3j+1
y = 5i+j+3
z = 4i+6j+2
![Page 7: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/7.jpg)
A Lattice-Based Framework

    for (i=0; i<=4; i++)
      for (j=0; j <= 2*i && j <= -i+6; j++)
        … A[2*i+3*j+1][5*i+j+3][4*i+6*j+2] …

Iterator space: 0 <= i <= 4, 0 <= j <= 2i, j <= -i+6
Affine mapping from the iterator space to the index space:

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 5 & 1 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}$$
![Page 8: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/8.jpg)
A Lattice-Based Framework
Any array reference can be modeled as a linearly bounded lattice (LBL)

$$\mathrm{LBL} = \{\, \mathbf{x} = T \cdot \mathbf{i} + \mathbf{u} \mid A \cdot \mathbf{i} \ge \mathbf{b} \,\}$$

Iterator space (the polytope $A \cdot \mathbf{i} \ge \mathbf{b}$): scope of nested loops, and iterator-dependent conditions
Affine mapping: $\mathbf{x} = T \cdot \mathbf{i} + \mathbf{u}$
The affine mapping takes the polytope to the LBL
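
As an illustration of the LBL model, the sketch below enumerates the iterator space of the running example and applies the affine mapping to each iterator vector. It is a brute-force walk, not the lattice-based technique of the talk; the names `map_index` and `walk_iterator_space` are hypothetical helpers introduced here.

```c
#include <stdio.h>

/* Affine mapping x = T*i + u of the reference A[2i+3j+1][5i+j+3][4i+6j+2]. */
static const int T[3][2] = { {2, 3}, {5, 1}, {4, 6} };
static const int U[3]    = { 1, 3, 2 };

/* Map one iterator vector (i, j) to its index vector (x, y, z). */
static void map_index(int i, int j, int x[3]) {
    for (int r = 0; r < 3; r++)
        x[r] = T[r][0] * i + T[r][1] * j + U[r];
}

/* Walk the iterator space 0<=i<=4, 0<=j<=2i, j<=-i+6.
   Optionally print each index vector; return the number of iterator vectors. */
static int walk_iterator_space(int print) {
    int count = 0;
    for (int i = 0; i <= 4; i++)
        for (int j = 0; j <= 2 * i && j <= -i + 6; j++) {
            int x[3];
            map_index(i, j, x);
            if (print)
                printf("(i=%d, j=%d) -> A[%d][%d][%d]\n", i, j, x[0], x[1], x[2]);
            count++;
        }
    return count;
}
```

Calling `walk_iterator_space(0)` returns 16, the number of integer points of this particular iterator polytope (1 + 3 + 5 + 4 + 3 over i = 0..4).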
![Page 9: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/9.jpg)
A Lattice-Based Framework

    for (i=0; i<=4; i++)
      for (j=0; j <= 2*i && j <= -i+6; j++)
        … A[2*i+3*j+1][5*i+j+3][4*i+6*j+2] …

How many memory locations are necessary to store the array reference A[2i+3j+1][5i+j+3][4i+6j+2]?
![Page 10: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/10.jpg)
A Lattice-Based Framework
The storage requirement of an array reference is the size of its index space (i.e., a lattice!)

$$\mathrm{LBL} = \{\, \mathbf{x} = T \cdot \mathbf{i} + \mathbf{u} \mid A \cdot \mathbf{i} \ge \mathbf{b} \,\}$$

$$f : \mathbb{Z}^n \to \mathbb{Z}^m, \qquad f(\mathbf{i}) = T \cdot \mathbf{i} + \mathbf{u}$$

Is function f a one-to-one mapping?
If YES: Size(index space) = Size(iterator space)
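
One simple sufficient test for the one-to-one property, sketched here under the assumption of two iterators, is to check that T has full column rank: if some 2 x 2 minor of T is non-zero, then T·i = T·i' forces i = i'. The function name and the degenerate matrix `T_DEG` are hypothetical, introduced only for illustration. When the test fails, the mapping may still be one-to-one on the bounded iterator space, so this is not a complete decision procedure.

```c
/* Mapping matrix of A[2i+3j+1][5i+j+3][4i+6j+2]. */
static const int T_REF[3][2] = { {2, 3}, {5, 1}, {4, 6} };

/* Hypothetical degenerate matrix: the second column is twice the first. */
static const int T_DEG[3][2] = { {1, 2}, {2, 4}, {3, 6} };

/* Returns 1 when the m x 2 integer matrix T has rank 2, i.e. when some
   2 x 2 minor is non-zero; f(i) = T*i + u is then one-to-one on Z^2. */
static int has_full_column_rank2(int m, const int (*T)[2]) {
    for (int r = 0; r < m; r++)
        for (int s = r + 1; s < m; s++)
            if (T[r][0] * T[s][1] - T[r][1] * T[s][0] != 0)
                return 1;
    return 0;
}
```

For `T_REF` the first minor is 2·1 - 3·5 = -13, so the mapping is one-to-one and the index space has exactly as many points as the iterator space.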
![Page 11: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/11.jpg)
A Lattice-Based Framework
Computation of the size of an integer polytope

    for (i=0; i<=4; i++)
      for (j=0; j <= 2*i && j <= -i+6; j++)
        … A[2*i+3*j+1][5*i+j+3][4*i+6*j+2] …

Step 1: Find the vertices of the iterator space and their supporting polyhedral cones

$$C(V_1) = \{\, r_1, r_2 \,\} = \begin{bmatrix} 1 & 1 \\ 2 & 0 \end{bmatrix}$$
![Page 12: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/12.jpg)
A Lattice-Based Framework
Computation of the size of an integer polytope (cont'd)
Step 2: Decompose the supporting cones into unimodular cones (Barvinok's decomposition algorithm)

$$C(V_1) = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

Step 3: Find the generating function of each supporting cone

$$F(V_1) = \frac{1}{(1 - x y^{2})(1 - y^{-1})} + \frac{1}{(1 - y)(1 - x)}$$

Step 4: Find the number of monomials in the generating function of the whole polytope $F = F(V_1) + F(V_2) + \dots$
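
A one-dimensional analogue, not taken from the slides, may help to see why summing the vertex-cone generating functions counts the lattice points. For the interval 0 <= i <= n, the vertices are 0 and n, both cones are already unimodular, and the identities hold as rational functions:

```latex
\begin{aligned}
F(V_1) &= \sum_{k \ge 0} x^{k} = \frac{1}{1 - x},
\qquad
F(V_2) = \sum_{k \ge 0} x^{\,n-k} = \frac{x^{n}}{1 - x^{-1}} = -\frac{x^{n+1}}{1 - x}, \\
F &= F(V_1) + F(V_2) = \frac{1 - x^{n+1}}{1 - x} = 1 + x + \dots + x^{n}.
\end{aligned}
```

F has n+1 monomials, which is the number of integer points in the interval.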
![Page 13: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/13.jpg)
The Memory Size Computation Problem
Affine algorithmic specifications

    T[0] = 0;
    for ( j=16; j<=512; j++ ) {
      S[0][j-16][0] = 0;
      for ( k=0; k<=8; k++ )
        for ( i=j-16; i<=j+16; i++ )
          S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] - A[k][i];
      T[j-15] = S[0][j-16][297] + T[j-16];
    }
    out = T[497];

What is the minimum data storage necessary to execute an algorithm (affine specification)?
Any scalar signal must be stored only during its lifetime
Signals having disjoint lifetimes can share the same location
![Page 14: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/14.jpg)
The Memory Size Computation Problem
Affine algorithmic specifications

    T[0] = 0;
    for ( j=16; j<=512; j++ ) {
      S[0][j-16][0] = 0;
      for ( k=0; k<=8; k++ )
        for ( i=j-16; i<=j+16; i++ )
          S[0][j-16][33*k+i-j+17] = S[0][j-16][33*k+i-j+16] + A[4][j] - A[k][i];
      T[j-15] = S[0][j-16][297] + T[j-16];
    }
    out = T[497];

The number of scalars (array elements): 153,366
The minimum data storage: 4,763
All the previous works proposed estimation techniques!
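
The scalar count quoted above can be cross-checked by brute force. The sketch below marks every array element touched by the specification and counts the marks; it reproduces the number of scalars only, not the minimum storage, and it is not the lattice-based method of the talk. The flag arrays and function names are hypothetical, and the array sizes are derived from the loop bounds.

```c
#include <stddef.h>
#include <string.h>

/* Touched-element flags, sized from the index ranges of the specification. */
static unsigned char usedA[9][529];    /* A[k][i]: k in 0..8, i in 0..528   */
static unsigned char usedS[497][298];  /* S[0][j-16][*]: last index 0..297  */
static unsigned char usedT[498];       /* T[0..497]                         */

/* Count the non-zero flags in a flat byte array. */
static int count_flags(const unsigned char *f, size_t n) {
    int c = 0;
    for (size_t k = 0; k < n; k++)
        c += f[k] != 0;
    return c;
}

/* Replay the loop nest, marking every element that is written or read. */
static int count_scalars(void) {
    memset(usedA, 0, sizeof usedA);
    memset(usedS, 0, sizeof usedS);
    memset(usedT, 0, sizeof usedT);

    usedT[0] = 1;                                    /* T[0] = 0;           */
    for (int j = 16; j <= 512; j++) {
        usedS[j - 16][0] = 1;                        /* S[0][j-16][0] = 0;  */
        for (int k = 0; k <= 8; k++)
            for (int i = j - 16; i <= j + 16; i++) {
                usedS[j - 16][33 * k + i - j + 17] = 1;
                usedS[j - 16][33 * k + i - j + 16] = 1;
                usedA[4][j] = 1;
                usedA[k][i] = 1;
            }
        usedT[j - 15] = 1;
        usedS[j - 16][297] = 1;
        usedT[j - 16] = 1;
    }
    usedT[497] = 1;                                  /* out = T[497];       */

    return count_flags(&usedA[0][0], sizeof usedA)
         + count_flags(&usedS[0][0], sizeof usedS)
         + count_flags(usedT, sizeof usedT)
         + 1;                                        /* the scalar `out`    */
}
```

`count_scalars()` returns 153,366: 4,761 elements of A, 148,106 of S, 498 of T, plus `out`.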
![Page 15: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/15.jpg)
The Memory Size Computation Problem

    #define n 6

    for ( j=0; j<n; j++ ) {
      A[j][0] = in0;
      for ( i=0; i<n; i++ )
        A[j][i+1] = A[j][i] + 1;
    }
    for ( i=0; i<n; i++ ) {
      alpha[i] = A[i][n+i];
      for ( j=0; j<n; j++ )
        A[j][n+i+1] = j < i ? A[j][n+i] : alpha[i] + A[j][n+i];
    }
    for ( j=0; j<n; j++ )
      B[j] = A[j][2*n];
![Page 16: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/16.jpg)
The Memory Size Computation Problem
Decompose the LBLs of the array refs. into disjoint lattices

$$\mathrm{LBL}_1 = \{\, \mathbf{x} = T_1 \cdot \mathbf{i}_1 + \mathbf{u}_1 \mid A_1 \cdot \mathbf{i}_1 \ge \mathbf{b}_1 \,\}$$

$$\mathrm{LBL}_2 = \{\, \mathbf{x} = T_2 \cdot \mathbf{i}_2 + \mathbf{u}_2 \mid A_2 \cdot \mathbf{i}_2 \ge \mathbf{b}_2 \,\}$$

The intersection $\mathrm{LBL}_1 \cap \mathrm{LBL}_2$ is again an LBL:
Diophantine system of eqs.: $T_1 \cdot \mathbf{i}_1 + \mathbf{u}_1 = T_2 \cdot \mathbf{i}_2 + \mathbf{u}_2$
New polytope: $\{\, A_1 \cdot \mathbf{i}_1 \ge \mathbf{b}_1 ,\; A_2 \cdot \mathbf{i}_2 \ge \mathbf{b}_2 \,\}$
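
To make the intersection step concrete, here is a minimal one-dimensional sketch, assuming two hypothetical references `A[a*i+u1]` (0 <= i <= n1) and `A[b*j+u2]` (0 <= j <= n2) with non-zero a and b. The Diophantine equation a·i + u1 = b·j + u2 has integer solutions exactly when gcd(a, b) divides u2 - u1; the loop bounds are then applied by brute force instead of by polytope operations.

```c
#include <stdlib.h>

/* Greatest common divisor of |a| and |b|. */
static int gcd(int a, int b) {
    a = abs(a);
    b = abs(b);
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* 1 when a*i + u1 == b*j + u2 has a solution in integers (bounds ignored). */
static int diophantine_solvable(int a, int u1, int b, int u2) {
    int g = gcd(a, b);
    if (g == 0)
        return u1 == u2;
    return (u2 - u1) % g == 0;
}

/* Number of elements addressed by both A[a*i+u1], 0<=i<=n1, and
   A[b*j+u2], 0<=j<=n2. Assumes a != 0 and b != 0, so that each reference
   is one-to-one and no element is counted twice. */
static int intersection_size(int a, int u1, int n1, int b, int u2, int n2) {
    if (!diophantine_solvable(a, u1, b, u2))
        return 0;
    int count = 0;
    for (int i = 0; i <= n1; i++) {
        int x = a * i + u1;
        for (int j = 0; j <= n2; j++)
            if (b * j + u2 == x) {
                count++;
                break;
            }
    }
    return count;
}
```

For example, `A[2i+1]` with 0 <= i <= 10 and `A[3j]` with 0 <= j <= 7 share the elements 3, 9, 15 and 21, so `intersection_size(2, 1, 10, 3, 0, 7)` returns 4, while `A[2i]` and `A[2j+1]` never meet because gcd(2, 2) = 2 does not divide 1.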
![Page 17: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/17.jpg)
The Memory Size Computation Problem
Decomposition of the array references of signal A (illustrative example)
![Page 18: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/18.jpg)
Memory Size Computation Algorithm
Step 1: For every indexed signal in the algorithmic specification, decompose the array references into disjoint lattices
Step 2: Based on the lattice lifetime analysis, find the memory size at the boundaries between the blocks of code
Step 3: Analyzing the amounts of signals produced and consumed in each block, prune the blocks of code where the maximum storage cannot happen
Step 4: For each of the remaining blocks, compute the maximum memory size by computing the maximum iterator vectors of the scalars and exploiting the one-to-one mapping property of array references
![Page 19: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/19.jpg)
Memory trace for an SVD updating algorithm
![Page 20: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/20.jpg)
Memory trace for a 2-D Gaussian blur filter algorithm
![Page 21: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/21.jpg)
Study the effect of loop transformations on the data memory

    for ( i = 0; i < 95; i++ )
      for ( j = 0; j < 32; j++ ) {
        if ( i+j > 30 && i+j < 63 ) A[i][j] = … ;
        if ( i+j > 62 && i+j < 95 ) … = A[i-32][j];
      }

784 locations

    for ( j = 0; j < 32; j++ )
      for ( i = 0; i < 95; i++ ) {
        if ( i+j > 30 && i+j < 63 ) A[i][j] = … ;
        if ( i+j > 62 && i+j < 95 ) … = A[i-32][j];
      }

32 locations
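
The two storage figures can be reproduced with a small trace-based simulation, sketched below. It is a brute-force check and not the lattice-based algorithm of the talk: a first pass records when each element of A is read for the last time, and a second pass counts how many elements are alive after each iteration. All names are hypothetical, and an element that is written but never read would be counted as alive until the end (this does not occur in this example).

```c
#include <string.h>

#define NI 95
#define NJ 32

static int last_read[NI][NJ];       /* time of the last read, -1 if never read */
static unsigned char live[NI][NJ];  /* 1 while the element occupies storage    */

/* Execute iteration (i, j) at time t.
   pass 0 records last reads; pass 1 tracks the number of live elements. */
static void visit(int i, int j, int t, int pass, int *cur, int *peak) {
    if (i + j > 30 && i + j < 63) {                  /* A[i][j] = ... ;      */
        if (pass == 1 && !live[i][j]) {
            live[i][j] = 1;
            if (++*cur > *peak)
                *peak = *cur;
        }
    }
    if (i + j > 62 && i + j < 95) {                  /* ... = A[i-32][j];    */
        if (pass == 0) {
            last_read[i - 32][j] = t;
        } else if (live[i - 32][j] && last_read[i - 32][j] == t) {
            live[i - 32][j] = 0;                     /* dead after last read */
            --*cur;
        }
    }
}

/* Peak number of simultaneously live elements of A.
   i_outer != 0: loop order (i, j); i_outer == 0: interchanged order (j, i). */
static int max_live_elements(int i_outer) {
    int peak = 0;
    memset(last_read, -1, sizeof last_read);
    memset(live, 0, sizeof live);
    for (int pass = 0; pass < 2; pass++) {
        int cur = 0, t = 0;
        if (i_outer) {
            for (int i = 0; i < NI; i++)
                for (int j = 0; j < NJ; j++)
                    visit(i, j, t++, pass, &cur, &peak);
        } else {
            for (int j = 0; j < NJ; j++)
                for (int i = 0; i < NI; i++)
                    visit(i, j, t++, pass, &cur, &peak);
        }
    }
    return peak;
}
```

`max_live_elements(1)` returns 784 and `max_live_elements(0)` returns 32. With j as the outer loop, every element is produced and consumed within the same value of j, so at most the 32 elements of one column are alive at a time.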
![Page 22: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/22.jpg)
The Memory Size Computation Problem
For the first time, the storage requirements of applications can be exactly computed using formal techniques
All the previous works are estimation techniques; they are sometimes very inaccurate
The previous works have constraints on the specifications
This approach works for the entire class of affine specifications
The previous works are illustrated with "simple" benchmarks (in terms of array elements, array references, lines of code)
This approach was tested on complex benchmarks, e.g.: code with 113 loop nests 3-level deep, 906 array references, over 900 lines of code, 4 million scalar signals
![Page 23: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/23.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Multi-dimensional arrays stored off-chip
Copies of the frequently accessed array parts should be stored on-chip
Two-layer model: off-chip memory and on-chip scratch-pad memory (SPM)
Copy candidate
![Page 24: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/24.jpg)
How to select the copy candidates?
Entire arrays: unlikely …
Rows/columns: somewhat better …
How to find array parts heavily accessed?
The need for an array partitioning based on the intensity of memory accesses
Data reuse model based on lattices
![Page 25: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/25.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
[Figure: lattice labels A1, A2, A3, A15, A16, A17]
![Page 26: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/26.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Decomposition in disjoint lattices
Computation of exact number of memory accesses per lattice
Lattice A17
#accesses A1: 13,569
#accesses A2: 13,569
#accesses A3: 13,569
#accesses A17: 131,625
Total #accesses A17: 172,332
![Page 27: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/27.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Map of the array space based on average memory accesses
Array space of signal A
![Page 28: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/28.jpg)
3-D map of the array space based on the exact number of memory accesses
Array space of signal A
![Page 29: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/29.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Map of the array space based on average memory accesses
Array space of signal A
![Page 30: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/30.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Map of the array space based on average memory accesses
Array space of signal A
![Page 31: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/31.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
Dynamic energy: computed based on the number of accesses to each memory layer
CACTI power model [ Reinman 99 ]
One or two orders of magnitude difference in energy between an SPM access and an off-chip access
Energy per access is SPM size-dependent: constant for small SPM sizes (< a few Kbytes)
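
The access-count model above amounts to a weighted sum, sketched here for the two-layer case. The per-access energies are hypothetical placeholders (100 pJ for the SPM and 5,000 pJ off-chip, a factor of 50 apart), not values from CACTI, and the sketch ignores the cost of copying data into the SPM.

```c
/* Hypothetical per-access energies in picojoules (not CACTI output). */
#define E_SPM_PJ      100LL
#define E_OFFCHIP_PJ 5000LL

/* Dynamic energy of a two-layer memory: accesses to each layer are
   weighted by that layer's energy per access. */
static long long dynamic_energy_pj(long long spm_accesses,
                                   long long offchip_accesses) {
    return spm_accesses * E_SPM_PJ + offchip_accesses * E_OFFCHIP_PJ;
}
```

Using access counts of the kind reported per lattice earlier in the talk, serving 131,625 of 172,332 accesses from the SPM gives `dynamic_energy_pj(131625, 40707)` = 216,697,500 pJ under these assumed energies, against 861,660,000 pJ when every access goes off-chip.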
![Page 32: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/32.jpg)
Optimizing the Dynamic Energy Consumption in a Hierarchical Memory Subsystem
![Page 33: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/33.jpg)
Signal-to-Memory Mapping
A[index1][index2]
[Figure: physical memory, showing the base address of signal A, locations 0, 1, 2, …, and the window size of signal A]
![Page 34: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/34.jpg)
Signal-to-Memory Mapping
Mapping model (can be used in hierarchical memory organizations)
m-dim. array mapped to m-dim. window ( w1 , … , wm )
A[index1] … [indexm] mapped to A[index1 mod w1] … [indexm mod wm]
wi = Max { dist. alive elements having same index i } + 1
![Page 35: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/35.jpg)
Bounding window: (w1, w2) = (4, 6)
Storage requirements: 4 x 6 = 24
A[i][j] is mapped to A[i mod 4][j mod 6]
Iteration (i=7, j=9)
[Code example, operators lost in extraction: a loop over i with bounds 2 and 16, an inner loop over j whose bounds depend on i, a write to A[i][j] under a condition on i and j with constants 21 and 34, and a read of an element of A offset by 3 and 6 under a condition with constants 42 and 55]
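
The modular mapping into a bounding window can be sketched as follows. The row-major layout inside the window and the helper names are assumptions made for illustration; the slides only specify the mapping A[i][j] to A[i mod w1][j mod w2] and a base address for the signal.

```c
/* Non-negative remainder, so that negative indices also fold into the window
   (the C operator % may return a negative value for a negative operand). */
static int mod_floor(int a, int w) {
    int r = a % w;
    return r < 0 ? r + w : r;
}

/* Offset of A[i][j] inside a (w1 x w2) window, assuming row-major layout. */
static int window_offset(int i, int j, int w1, int w2) {
    return mod_floor(i, w1) * w2 + mod_floor(j, w2);
}

/* Physical address of A[i][j], given the base address of signal A. */
static int physical_address(int base, int i, int j, int w1, int w2) {
    return base + window_offset(i, j, w1, w2);
}
```

With the window (4, 6), the element accessed at iteration (i=7, j=9) folds to A[3][3], so `window_offset(7, 9, 4, 6)` is 3·6 + 3 = 21, and every offset stays below the 24 locations of the window.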
![Page 36: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/36.jpg)
![Page 37: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/37.jpg)
Computation of the Window of a Lattice of Live Signals

    for ( i=0; i<=3; i++ )
      for ( j=0; j<=2; j++ )
        if ( 3*i >= 2*j )
          … A[2*i+3*j][5*i+j] …

Iterator space (i, j) is mapped to the index space (index1, index2) by x = T·i + u
[Figure: the iterator space, with i from 0 to 3 and j from 0 to 2, and the index space of A[2i+3j][5i+j]]
![Page 38: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/38.jpg)
Computation of the Window of a Lattice of Live Signals
A[2i+3j][5i+j]
[Figure: index space, with index1 from 0 to 12 and index2 from 0 to 17]
2-D window by integer projection of the lattice on the axes: ( w1 = 13 , w2 = 18 )
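
The projection can be checked by enumeration. The sketch below visits the iterator space of the example, projects each index vector of A[2i+3j][5i+j] on the two axes, and takes the extent of each projection. It assumes that all elements of the lattice are alive at the same time; the `Window` type and the function name are hypothetical.

```c
#include <limits.h>

typedef struct {
    int w1;  /* extent of the projection on the index1 axis */
    int w2;  /* extent of the projection on the index2 axis */
} Window;

/* Bounding window of A[2i+3j][5i+j] over 0<=i<=3, 0<=j<=2, 3i>=2j. */
static Window bounding_window(void) {
    int lo1 = INT_MAX, hi1 = INT_MIN;
    int lo2 = INT_MAX, hi2 = INT_MIN;
    for (int i = 0; i <= 3; i++)
        for (int j = 0; j <= 2; j++)
            if (3 * i >= 2 * j) {
                int x1 = 2 * i + 3 * j;
                int x2 = 5 * i + j;
                if (x1 < lo1) lo1 = x1;
                if (x1 > hi1) hi1 = x1;
                if (x2 < lo2) lo2 = x2;
                if (x2 > hi2) hi2 = x2;
            }
    Window w = { hi1 - lo1 + 1, hi2 - lo2 + 1 };
    return w;
}
```

`bounding_window()` yields w1 = 13 and w2 = 18: index1 ranges over 0..12 and index2 over 0..17, both maxima being reached at (i=3, j=2).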
![Page 39: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/39.jpg)
Future work
Computation of storage requirements for high-throughput applications, where the code contains explicit parallelism
Improve the algorithm that aims to optimize the dynamic energy consumption, extending it to an arbitrary number of memory layers
Extend the hierarchical memory allocation model to save leakage energy
Use area models for memories in order to trade off the decrease of energy consumption against the increase of area implied by the memory fragmentation
![Page 40: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/40.jpg)
Future work
Memory management for configurable architectures
Several FPGAs contain distributed RAM modules
Homogeneous architectures: RAMs of the same capacity, evenly distributed (Xilinx Virtex II Pro)
Heterogeneous architectures: a variety of RAMs (Altera Stratix II)
Memory management for dynamically reconfigurable systems
![Page 41: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/41.jpg)
Conclusions
Unique features of this research
A general framework based on lattices for addressing several memory management problems
The exact computation of the data storage requirement of an application [ IEEE TVLSI 2007 ]
A data reuse formal model based on partitioning the arrays according to the intensity of memory accesses [ ICCAD 2006 ]
A signal-to-memory mapping model which works for hierarchical memory organizations [ DATE 2007 ]
![Page 42: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/42.jpg)
Conclusions
General goal: design of a (hierarchical) memory subsystem optimized for power consumption and chip area, subject to performance constraints, starting from the specification of a (multi-dimensional) signal processing application
This topic is considered by the Semiconductor Research Corporation (SRC) to be one of the top synthesis problems still unsolved
This research is interdisciplinary: EE+CS+Math
There is interest in international co-operation (potential funding: the NSF-PIRE program)
![Page 43: Novel Algorithms in the Memory Management of Multi-Dimensional Signal Processing Florin Balasa University of Illinois at Chicago](https://reader036.vdocuments.site/reader036/viewer/2022062422/56649f2b5503460f94c45221/html5/thumbnails/43.jpg)
Conclusions
Graduate students
Hongwei Zhu (Ph.D. defense: Spring 2007)
Ilie I. Luican (Ph.D. defense: Spring 2009)
Karthik Chandramouli (M.S. completed)