Chuanjun Zhang, UC Riverside 1
Using a Victim Buffer in an Application-Specific Memory Hierarchy
Chuanjun Zhang*, Frank Vahid**
*Dept. of Electrical Engineering, **Dept. of Computer Science and Engineering
University of California, Riverside
**Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported by the National Science Foundation and the Semiconductor Research Corporation
Frank Vahid, UC Riverside 2
Low Power/Energy Techniques are Essential
- "Hot enough to cook an egg" (Skadron et al., 30th ISCA)
- High-performance processors are becoming too hot to operate
- Low energy dissipation is imperative for battery-powered embedded systems
- Low-power techniques are thus essential to both embedded systems and high-performance processors
Caches Consume Much Power
- Caches consume >50% of total processor system power (ARM920T and M*CORE; Segars 01, Lee 99)
- Caches are accessed often, so they consume much dynamic power
- Higher associativity reduces misses: less power spent off-chip, but more power per access
- A victim buffer helps (Jouppi 90)
  - Add it to a direct-mapped cache; keep recently evicted lines in a small buffer and check it on a miss
  - Gives higher-associativity-like miss rates, but without the extra power per access
  - 10% energy savings, 4% performance improvement (Albera 99)

[Figure: processor, direct-mapped cache, victim buffer, and off-chip memory]
Victim Buffer
With a victim buffer:
- One cycle on a cache hit
- Two cycles on a victim buffer hit
- Twenty-two cycles on a victim buffer miss

Without a victim buffer:
- One cycle on a cache hit
- Twenty-one cycles on a cache miss
- More accesses to off-chip memory

[Figure: processor, L1 cache, victim buffer, and off-chip memory; a cache hit takes one cycle, a victim buffer hit two cycles, and a miss 22 cycles (21 cycles without the victim buffer)]
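The latency numbers above imply a simple expected-cost model. The sketch below is a hypothetical helper (not from the paper), using the slide's cycle counts as defaults, and shows when an always-on victim buffer pays off:

```python
def avg_access_cycles(cache_hit_rate, vb_hit_rate, vb_on,
                      hit_cycles=1, vb_hit_cycles=2,
                      miss_cycles_with_vb=22, miss_cycles_without_vb=21):
    """Expected cycles per memory access under the slide's latency model."""
    miss_rate = 1.0 - cache_hit_rate
    if not vb_on:
        return cache_hit_rate * hit_cycles + miss_rate * miss_cycles_without_vb
    # On a cache miss, the victim buffer is checked before going off-chip,
    # adding one cycle whether or not it hits.
    return (cache_hit_rate * hit_cycles
            + miss_rate * vb_hit_rate * vb_hit_cycles
            + miss_rate * (1.0 - vb_hit_rate) * miss_cycles_with_vb)
```

With a 95% cache hit rate, a 50% victim buffer hit rate drops the average from 2.0 to 1.55 cycles per access; with a 0% victim buffer hit rate the extra check makes things slightly worse (2.05 vs. 2.0), which is exactly the motivation for a shut-off capability.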
Cache Architecture with a Configurable Victim Buffer
- Is a victim buffer a useful configurable cache parameter?
  - Helps for some applications
  - For others it is not useful: the victim buffer misses, so the extra cycle is wasted
  - Thus we want the ability to shut off the victim buffer for a given application
- Hardware overhead: a one-bit register and a switch

[Figure: L1 cache (SRAM tag and data arrays) with a four-line, fully-associative victim buffer (CAM with 27-bit tags and 16-byte cache line data), a VB on/off register gating Vdd to the victim lines, and control signals between the cache control circuit, the processor mux, and the next-level memory]
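The on/off behavior can be captured in a small behavioral sketch (purely illustrative, not the authors' hardware; FIFO replacement and swapping lines back on a victim buffer hit are assumptions in the spirit of Jouppi's design):

```python
from collections import deque

class DMCacheWithVB:
    """Behavioral model: direct-mapped cache backed by a small
    fully-associative victim buffer, with a one-bit enable."""

    def __init__(self, num_lines, vb_lines=4, vb_on=True):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # one tag per set
        self.vb = deque(maxlen=vb_lines)  # FIFO victim buffer of block addrs
        self.vb_on = vb_on                # the one-bit on/off register

    def access(self, block_addr):
        """Returns 'hit', 'vb_hit', or 'miss'."""
        idx = block_addr % self.num_lines
        tag = block_addr // self.num_lines
        if self.lines[idx] == tag:
            return 'hit'
        if self.vb_on and block_addr in self.vb:
            # Swap: the victim line returns to the cache and the
            # newly evicted line takes its place in the buffer.
            self.vb.remove(block_addr)
            if self.lines[idx] is not None:
                self.vb.append(self.lines[idx] * self.num_lines + idx)
            self.lines[idx] = tag
            return 'vb_hit'
        # Miss: the evicted line goes into the victim buffer (if enabled).
        if self.vb_on and self.lines[idx] is not None:
            self.vb.append(self.lines[idx] * self.num_lines + idx)
        self.lines[idx] = tag
        return 'miss'
```

For a 4-line cache, block addresses 0 and 4 conflict in set 0; after touching both, a reference back to address 0 hits in the victim buffer instead of going off-chip, while the same sequence with `vb_on=False` misses.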
Hit Rate of a Victim Buffer

[Figure: two bar charts (0%–100%) showing the victim buffer hit rate for the instruction cache and the data cache when the buffer is added to an 8 Kbyte, 4 Kbyte, or 2 Kbyte direct-mapped cache. Benchmarks from Powerstone, MediaBench, and Spec 2000: padpcm, crc, auto2, bcnt, bilv, binary, blit, brev, g3fax, fir, pjpeg, ucbqsort, v42, adpcm, epic, g721, pegwit, mpeg2, jpeg, art, mcf, parser, vpr, and the average]
Computing Total Memory-Related Energy
- Consider CPU stall energy and off-chip memory energy
- Excludes CPU active energy; thus represents all memory-related energy

energy_mem = energy_dynamic + energy_static
energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill
energy_static = cycles * energy_static_per_cycle
energy_miss = k_miss_energy * energy_hit
energy_static_per_cycle = k_static * energy_total_per_cycle
(we varied the k's to account for different system implementations)

- Measured quantities: cache_hits, cache_misses, and cycles from SimpleScalar; the remaining terms from our layout or data sheets
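Taken together, these formulas compute directly. The sketch below mirrors the model; the default k values are illustrative only, since the slides vary them across system implementations rather than fixing them:

```python
def memory_energy(cache_hits, cache_misses, cycles,
                  energy_hit, energy_total_per_cycle,
                  k_miss_energy=50.0, k_static=0.3):
    """Total memory-related energy per the slide's model.
    The miss energy and static energy are expressed through the
    k factors (illustrative defaults, not values from the paper)."""
    energy_miss = k_miss_energy * energy_hit
    energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
    energy_static = cycles * (k_static * energy_total_per_cycle)
    return energy_dynamic + energy_static
```

The hit/miss counts and cycle totals would come from a SimpleScalar run, with the per-event energies taken from layout or data sheets as the slide describes.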
Performance and Energy Benefits of Victim Buffer with a Direct-Mapped Cache
[Figure: per-benchmark performance and energy benefits (-4% to 12% scale) of an 8-line victim buffer with an 8 Kbyte direct-mapped cache (0% = direct-mapped cache without a victim buffer). Bars exceeding the scale reach 21%, 24%, 13%, 38%, 43%, 60%, and 15%]
- For some applications, the victim buffer should be shut off
- For others, the benefit is substantial
- A configurable victim buffer is clearly useful to avoid the performance penalty for certain applications
Is a Configurable Victim Buffer Useful Even With a Configurable Cache?
- We showed that a configurable cache can reduce memory access power by half on average (Zhang/Vahid/Najjar ISCA 03, ISVLSI 03)
- Software-configurable cache: associativity of 1, 2, or 4 ways; size of 2, 4, or 8 Kbytes
- Does that configurability subsume the usefulness of a configurable victim buffer?
[Figure: normalized energy (0.0–1.0) versus associativity (1, 2, or 4 ways of 2 Kbytes each) for epic and mpeg2]
Best Configurable Cache with VB Configurations
Optimal cache configurations when cache associativity, cache size, and victim buffer are all configurable. I and D stand for the instruction and data cache, respectively; V means the victim buffer is on; nK means the cache size is n Kbytes; the trailing characters give the associativity (e.g., for benchmark vpr, I2D1 means a two-way instruction cache and a direct-mapped data cache). Note that the victim buffer should sometimes be on, sometimes off.

Benchmark   Best            Benchmark   Best
padpcm      I8KD4KI1D2      ucbqsort    I4KDV4KI1D1
crc         I2KDV4KI1D1     v42         I8KD8KI1D1
auto2       I4KD2KI1D1      adpcm       I2KDV2KI1D1
bcnt        I2KD2KI1D1      epic        IV4KDV8KI1D1
bilv        I4KD2KI1D1      jpeg        I8KD2KI4D1
binary      I4KD2KI1D1      mpeg2       I4KDV4KI1D1
blit        I2KDV2KI1D1     g721        I8KDV2KI2D1
brev        I4KD2KI1D1      art         I4KDV2KI1D1
g3fax       I4KDV2KI1D1     mcf         I4KD4KI1D1
fir         I4KD2KI1D1      parser      I8KDV4KI4D1
pjpeg       I4KDV2KI1D1     vpr         I8KD2KI2D1
pegwit      I4KD4KI1D1
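A table like this can be produced by exhaustively searching the configuration space. The sketch below assumes a user-supplied `evaluate` function (hypothetical, e.g. wrapping SimpleScalar runs and the energy model) that returns the energy of a given instruction/data configuration; line size is omitted for brevity:

```python
import itertools

def best_config(evaluate):
    """Exhaustive search over the configurable-cache space from the
    slides: associativity 1/2/4, size 2/4/8 Kbytes, victim buffer
    on/off, chosen independently for the I- and D-cache.
    `evaluate(icfg, dcfg)` returns the energy of one configuration."""
    space = list(itertools.product([1, 2, 4],        # associativity
                                   [2, 4, 8],        # size in Kbytes
                                   [False, True]))   # victim buffer on?
    best, best_energy = None, float('inf')
    for icfg, dcfg in itertools.product(space, space):
        e = evaluate(icfg, dcfg)
        if e < best_energy:
            best, best_energy = (icfg, dcfg), e
    return best, best_energy
```

With 18 per-cache configurations, the space is only 324 combinations per benchmark, so exhaustive search is practical for an embedded design flow.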
Performance and Energy Benefits of Victim Buffer Added to a Configurable Cache
[Figure: per-benchmark performance and energy benefits (-4% to 12% scale) of an 8-line victim buffer with a configurable cache whose associativity, size, and line size are configurable (0% = optimal configuration without the victim buffer). Bars exceeding the scale reach 32%, 43%, and 23%]
Still surprisingly effective
Conclusion
- A configurable victim buffer is useful with a direct-mapped cache
  - As much as 60% energy and 4% performance improvement for some applications
  - Can be shut off to avoid a performance penalty on other applications
- A configurable victim buffer is also useful with a configurable cache
  - As much as 43% energy and 8% performance improvement for some applications
  - Can be shut off to avoid performance overhead on other applications
- A configurable victim buffer should be included as a software-configurable parameter of direct-mapped as well as configurable caches for embedded system architectures