Interleaved Multi-Bank Scratchpad Memories: A Probabilistic Description of
Access Conflicts
DAC '15, June 07 - 11, 2015, San Francisco, CA, USA
Background
• Shared on-chip memory
• with multiple separately accessible banks
• having a common address space for all processors
• Advantage: efficient communication between processors
• Disadvantage: interference among the processors
• Solution: more banks, optimizing the address mapping
Address Mapping
• contiguous mapping
• pseudo-random mapping
• sequentially interleaved mapping (SIM)
The aim of this work is to quantitatively evaluate the properties and characteristics of SIM systems.
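As an illustration of the difference between the contiguous and the sequentially interleaved mapping, the following minimal sketch maps word addresses to bank indices. The function names, the word-addressed memory and the bank size `words_per_bank` are assumptions made for this example only; they are not taken from the slides.

```python
def contiguous_bank(addr, words_per_bank):
    """Contiguous mapping: each bank holds one contiguous block of addresses."""
    return addr // words_per_bank


def interleaved_bank(addr, b):
    """Sequentially interleaved mapping: consecutive words go to consecutive banks."""
    return addr % b


b = 4                 # number of banks (assumed for the example)
words_per_bank = 8    # bank capacity in words (assumed for the example)
addresses = list(range(8))

contiguous = [contiguous_bank(a, words_per_bank) for a in addresses]
interleaved = [interleaved_bank(a, b) for a in addresses]

print(contiguous)     # all eight words fall into the same bank
print(interleaved)    # the words are spread round-robin over the banks
```

Under these assumptions, a core that walks through an array touches a single bank with the contiguous mapping, but visits the banks one after another with SIM.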
Outline
• Background
• Problem definition
• Occupancy distribution
• Markov model
• Evaluation
Problem Definition
• We consider a platform with c processor cores and b independently accessible memory banks.
• Parameters: the access probability $p_a$ and the sequential access probability $p_{seq}$.
• A denotes the random number of accesses requested in any given cycle, and I represents the number of banks serving accesses in any given cycle.
• Given c, b, $p_a$, and $p_{seq}$, compute the distribution of the number I of memory banks serving accesses.
The classic occupancy distribution
• Number of actual memory accesses: a, with a = c
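The classic occupancy problem asks how many of b bins are non-empty after a balls have been thrown uniformly at random. A minimal sketch of that distribution, read here as "banks" and "accesses", is given below. It assumes that every access picks its bank independently and uniformly; the function names are chosen for this example.

```python
from fractions import Fraction


def occupancy_distribution(a, b):
    """Return [P(I = 0), ..., P(I = b)] for a uniform random accesses to b banks."""
    dist = [Fraction(0)] * (b + 1)
    dist[0] = Fraction(1)                 # before any access, no bank is busy
    for _ in range(a):                    # add the accesses one at a time
        nxt = [Fraction(0)] * (b + 1)
        for i, p in enumerate(dist):
            if p == 0:
                continue
            nxt[i] += p * Fraction(i, b)              # hits an already busy bank
            if i < b:
                nxt[i + 1] += p * Fraction(b - i, b)  # hits a free bank
        dist = nxt
    return dist


def expected_busy_banks(a, b):
    """Expected number of banks serving an access."""
    return sum(i * p for i, p in enumerate(occupancy_distribution(a, b)))


dist = occupancy_distribution(a=4, b=8)
print([float(p) for p in dist])
print(float(expected_busy_banks(4, 8)))
```

The expectation computed this way agrees with the well-known closed form $b - b(1 - 1/b)^a$.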
Adding access probabilities
A follows the binomial distribution with parameters c and $p_a$: $A \sim B(c, p_a)$
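One way to combine the binomially distributed number of accesses with the occupancy model is to average the occupancy expectation over all values of A. The sketch below does this and compares the result with the closed form $b - b(1 - p_a/b)^c$. It assumes independent cores and uniformly random bank selection; the function names are chosen for this example.

```python
from math import comb


def binomial_pmf(c, p_a):
    """P(A = a) for a = 0..c when each of c cores accesses with probability p_a."""
    return [comb(c, a) * p_a**a * (1 - p_a) ** (c - a) for a in range(c + 1)]


def expected_busy_banks_mixture(c, b, p_a):
    """Average the classic occupancy expectation b - b(1 - 1/b)^a over A."""
    return sum(p * (b - b * (1 - 1 / b) ** a)
               for a, p in enumerate(binomial_pmf(c, p_a)))


def expected_busy_banks_closed_form(c, b, p_a):
    """Each bank stays idle with probability (1 - p_a/b)^c."""
    return b - b * (1 - p_a / b) ** c


print(expected_busy_banks_mixture(16, 32, 0.3))
print(expected_busy_banks_closed_form(16, 32, 0.3))
```

Both functions should return the same value up to rounding, because the binomial theorem collapses the sum into the closed form.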
Limitations of the model
• Sequential access patterns of the applications cannot be taken into account.
• It ignores the fact that accesses that cannot be served immediately are served in subsequent cycles, where they interfere with new accesses.
Markov Model
1. a state set $S = \{s_1, s_2, \ldots, s_k\}$
2. a transition matrix $T \in \mathbb{R}^{k \times k}$ with $T_{i,j} = P(s_j \mid s_i)$
3. a steady state: a vector in $\mathbb{R}^k$ that is invariant under $T$
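The three ingredients above can be illustrated with a small generic example. The sketch below approximates the steady state of a Markov chain by power iteration. The 2x2 matrix is made up for illustration, and the row-stochastic convention (`T[i][j]` is the probability of moving from state i to state j) is an assumption of this example.

```python
def steady_state(T, iterations=1000):
    """Approximate the stationary distribution pi with pi = pi * T."""
    k = len(T)
    pi = [1.0 / k] * k                      # start from the uniform distribution
    for _ in range(iterations):
        # one step of the chain: new_pi[j] = sum_i pi[i] * T[i][j]
        pi = [sum(pi[i] * T[i][j] for i in range(k)) for j in range(k)]
    return pi


# Hypothetical two-state chain, each row sums to 1.
T = [[0.9, 0.1],
     [0.5, 0.5]]

pi = steady_state(T)
print(pi)
```

For this toy matrix the balance condition $0.1\,\pi_0 = 0.5\,\pi_1$ gives $\pi = (5/6, 1/6)$, which the iteration should reproduce.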
Memory throughput by Markov steady state
1. state set: each state $s = \{s_1, s_2, \ldots, s_c\}$; $s_n$ is the number of banks having n accesses
Memory throughput by Markov steady state
Transition probabilities
For a state s, the associated throughput is $i(s) = b - s_0$, where $s_0$ is the number of banks having no access.
Adding sequential access patterns
• s → t
• 1. s → s’, one access request removed from each bank’s queue
• 2. s’ → t, distributing new access requests
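The two-step transition can be sketched for a deliberately simplified setting. The code below is not the model from the talk: it assumes that every core always has exactly one outstanding request (access probability 1), and that new requests go to uniformly random banks, so sequential accesses are ignored. Under these assumptions the banks are interchangeable, and a state can be stored as a sorted tuple of queue lengths. All names are chosen for this example.

```python
from collections import defaultdict
from itertools import product


def transition(state, b):
    """Return {next_state: probability} for one cycle of the simplified model."""
    served = sum(1 for q in state if q > 0)       # step 1: each busy bank serves one request
    drained = [max(q - 1, 0) for q in state]
    out = defaultdict(float)
    weight = 1.0 / b**served
    # step 2: each served core issues a new request to a uniformly random bank
    for targets in product(range(b), repeat=served):
        queues = list(drained)
        for bank in targets:
            queues[bank] += 1
        out[tuple(sorted(queues, reverse=True))] += weight
    return dict(out)


def steady_state_throughput(c, b, iterations=500):
    """Expected number of busy banks per cycle in the steady state."""
    start = tuple([c] + [0] * (b - 1))
    transitions, todo = {}, [start]
    while todo:                                    # discover all reachable states
        s = todo.pop()
        if s not in transitions:
            transitions[s] = transition(s, b)
            todo.extend(transitions[s])
    dist = {s: 1.0 / len(transitions) for s in transitions}
    for _ in range(iterations):                    # power iteration
        nxt = defaultdict(float)
        for s, p in dist.items():
            for t, w in transitions[s].items():
                nxt[t] += p * w
        dist = dict(nxt)
    # throughput of a state = number of banks with a non-empty queue
    return sum(p * sum(1 for q in s if q > 0) for s, p in dist.items())


print(steady_state_throughput(c=2, b=2))
```

Modelling sequential accesses would require the state to record which bank each core visited last, which this sketch leaves out to stay short.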
Adding access probabilities
Experimental evaluation
• gem5 ARM simulator.
• The GSM, FFT, Blowfish, string search and JPEG examples were chosen to obtain a high diversity in behaviour.
Accuracy of the occupancy model
• For a small number of banks, the throughput is likely to be close to that number.
• For a sufficiently large number of banks, the number of waiting accesses is small.
• The maximum relative error is 12.0%, for b=8.
Benchmark Results
Conclusions from the occupancy model
• As long as the ratio of banks to cores is constant, a system can be arbitrarily scaled without changing the throughput expectation per bank or per core.
• The throughput converges exponentially with the product $p_a \cdot r$ to the maximum value b.
• For $p_a \cdot r < 0.3$, the throughput can be regarded as growing approximately linearly with $p_a$.
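These conclusions can be checked numerically with the occupancy expectation $b - b(1 - p_a/b)^c$. The sketch below assumes that r denotes the ratio of cores to banks, r = c/b, which the slides do not state explicitly. It also assumes uniformly random bank selection, and the function names are chosen for this example.

```python
from math import exp


def throughput_per_bank(c, b, p_a):
    """Occupancy-model throughput divided by the number of banks."""
    return 1 - (1 - p_a / b) ** c


def exponential_approx(p_a, r):
    """Large-system approximation 1 - exp(-p_a * r), with r = c / b (assumed)."""
    return 1 - exp(-p_a * r)


p_a = 0.5
for c, b in [(4, 8), (16, 32), (64, 128)]:      # same ratio r = 0.5 at three scales
    print(c, b, throughput_per_bank(c, b, p_a), exponential_approx(p_a, c / b))
```

With r held at 0.5, the per-bank throughput changes only slightly from one system size to the next and approaches the exponential approximation. For small $p_a \cdot r$, the approximation is close to $p_a \cdot r$ itself, consistent with the near-linear growth.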
Application example: System design
• System with c=16 cores and b=32 banks.
• System 1: interleaving over all 32 banks.
• System 2: interleaving over 16 banks + one “private” memory bank for each core.
System 1: $E_{occ}[I] = b - b\left(1 - \frac{1}{b}\right)^{a}$
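As a worked number for System 1, the classic occupancy expectation $b - b(1 - 1/b)^a$ can be evaluated for b = 32 banks. The sketch assumes that every core issues an access in each cycle, so a = c = 16. The function name is chosen for this example, and System 2 is not modelled here because the slides do not give its formula.

```python
def expected_busy_banks(a, b):
    """Classic occupancy expectation: b - b * (1 - 1/b)^a."""
    return b - b * (1 - 1 / b) ** a


# System 1: 16 accesses per cycle (assumed), interleaved over all 32 banks.
system1 = expected_busy_banks(a=16, b=32)
print(round(system1, 2))
```

Under this assumption, between 12 and 13 of the 32 banks are expected to serve an access in a given cycle.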
Application example: System design
• System 2 performs better for $p_{priv} > 0.2875$
Discussion of the synchronisation effect
• For $p_{seq} < 0.4$, the synchronisation effect is insignificant.
• Even the speedup from $p_{seq} = 0$ to $p_{seq} = 1$ is less than 5% in this system.
• There are only a few cases in which performance is likely to be a decisive factor for opting for a SIM system rather than for pseudo-random mapping.