![Page 1: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/1.jpg)
Linearly Compressed Pages: A Main Memory
Compression Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons, Michael A. Kozuch
![Page 2: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/2.jpg)
2
Executive Summary
• Main memory is a limited shared resource
• Observation: Significant data redundancy
• Idea: Compress data in main memory
• Problem: How to avoid inefficiency in address computation?
• Solution: Linearly Compressed Pages (LCP): fixed-size cache line granularity compression
  1. Increases memory capacity (62% on average)
  2. Decreases memory bandwidth consumption (24%)
  3. Decreases memory energy consumption (4.9%)
  4. Improves overall performance (13.9%)
![Page 3: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/3.jpg)
3
Potential for Data Compression
Significant redundancy in in-memory data:
0x00000000
0x0000000B 0x00000003 0x00000004 …
How can we exploit this redundancy?
• Main memory compression helps
• Provides effect of a larger memory without making it physically larger
![Page 4: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/4.jpg)
4
Challenges in Main Memory Compression
1. Address Computation
2. Mapping and Fragmentation
![Page 5: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/5.jpg)
5
Challenge 1: Address Computation
Uncompressed Page: cache lines L0, L1, L2, . . ., LN-1 (Cache Line: 64B)
Address Offset: 0, 64, 128, . . ., (N-1)*64
Compressed Page: cache lines L0, L1, L2, . . ., LN-1
Address Offset: 0, ?, ?, . . ., ?
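A minimal sketch of why the offsets on this slide become question marks, assuming (hypothetically) a compressed page in which every cache line is allowed its own compressed size. The function names and the example sizes are illustrative and do not come from the slides.

```python
LINE_SIZE = 64  # bytes per uncompressed cache line, as on the slide


def uncompressed_offset(index: int) -> int:
    """Offset of cache line `index` in an uncompressed page: one multiply."""
    return index * LINE_SIZE


def variable_size_offset(index: int, compressed_sizes: list[int]) -> int:
    """Offset of line `index` when each line has its own compressed size.

    The offset is the sum of the sizes of all preceding lines, so the
    per-line sizes must be looked up and added before the access can start.
    """
    return sum(compressed_sizes[:index])


sizes = [16, 64, 8, 32]  # hypothetical compressed sizes in bytes
print([uncompressed_offset(i) for i in range(4)])          # [0, 64, 128, 192]
print([variable_size_offset(i, sizes) for i in range(4)])  # [0, 16, 80, 88]
```

The second function is the extra work that sits on the memory access path when line sizes vary.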
![Page 6: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/6.jpg)
Challenge 2: Mapping & Fragmentation
6
[Figure: a Virtual Page (4KB) at a Virtual Address maps to a Physical Page (? KB) at a Physical Address; label: Fragmentation]
![Page 7: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/7.jpg)
7
Outline
• Motivation & Challenges
• Shortcomings of Prior Work
• LCP: Key Idea
• LCP: Implementation
• Evaluation
• Conclusion and Future Work
![Page 8: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/8.jpg)
8
Key Parameters in Memory Compression
• Compression Ratio
• Address Comp. Latency
• Decompression Latency
• Complexity and Cost
![Page 9: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/9.jpg)
9
Shortcomings of Prior Work
Columns: Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost
IBM MXT [IBM J.R.D. ’01]: 2X 64 cycles
![Page 10: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/10.jpg)
10
Shortcomings of Prior Work (2)
Columns: Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost
IBM MXT [IBM J.R.D. ’01]
Robust Main Memory Compression [ISCA’05]
![Page 11: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/11.jpg)
11
Shortcomings of Prior Work (3)
Columns: Compression Mechanisms | Compression Ratio | Address Comp. Latency | Decompression Latency | Complexity and Cost
IBM MXT [IBM J.R.D. ’01]
Robust Main Memory Compression [ISCA’05]
LCP: Our Proposal
![Page 12: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/12.jpg)
12
Linearly Compressed Pages (LCP): Key Idea
Uncompressed Page (4KB: 64*64B): 64B 64B 64B 64B . . .
4:1 Compression
Compressed Data (1KB): . . .
Fixed compressed size
Figure labels: 64B, 128, 32
LCP effectively solves challenge 1: address computation
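A short sketch of the address computation that a fixed compressed size makes possible, using the 4:1 example from this slide (64B lines stored in 16B slots). The function name is illustrative. The slide's 128 and 32 labels appear to show the same mapping for the line at uncompressed offset 128.

```python
PAGE_SIZE = 4096  # uncompressed page size in bytes
LINE_SIZE = 64    # uncompressed cache line size in bytes


def lcp_line_offset(index: int, compressed_line_size: int) -> int:
    """Offset of cache line `index` inside the compressed data region.

    Every line in the page occupies the same compressed size, so the
    offset is a single multiply, as in the uncompressed case.
    """
    return index * compressed_line_size


compressed_line_size = LINE_SIZE // 4    # 4:1 compression: 64B -> 16B
lines_per_page = PAGE_SIZE // LINE_SIZE  # 64 lines

# The line at uncompressed offset 128 is line 2.
print(lcp_line_offset(128 // LINE_SIZE, compressed_line_size))  # 32
print(lines_per_page * compressed_line_size)                    # 1024 (1KB)
```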
![Page 13: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/13.jpg)
13
LCP: Key Idea (2)
Uncompressed Page (4KB: 64*64B): 64B 64B 64B 64B . . .
4:1 Compression
Compressed Data (1KB): . . .
M: Metadata (64B)
E: Exception Storage
Figure labels: 64B, idx, E0
![Page 14: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/14.jpg)
14
But, wait …
Uncompressed Page (4KB: 64*64B): 64B 64B 64B 64B . . .
4:1 Compression
Compressed Data (1KB): . . . M E
How to avoid 2 accesses?
Metadata (MD) cache
![Page 15: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/15.jpg)
15
Key Ideas: Summary
Fixed compressed size per cache line
Metadata (MD) cache
![Page 16: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/16.jpg)
16
Outline
• Motivation & Challenges
• Shortcomings of Prior Work
• LCP: Key Idea
• LCP: Implementation
• Evaluation
• Conclusion and Future Work
![Page 17: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/17.jpg)
17
LCP Overview
• Page Table entry extension
  • compression type and size (fixed encoding)
• OS support for multiple page sizes
  • 4 memory pools (512B, 1KB, 2KB, 4KB)
• Handling uncompressible data
• Hardware support
  • memory controller logic
  • metadata (MD) cache
Figure labels: PTE; 512B, 1KB, 2KB, 4KB
![Page 18: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/18.jpg)
Page Table Entry Extension
18
Page Table Entry fields:
• c-bit (1b) – compressed or uncompressed page
• c-type (3b) – compression encoding used
• c-size (2b) – LCP size (e.g., 1KB)
• c-base (3b) – offset within a page
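The four fields above add up to 9 extra bits per page table entry. A minimal sketch of packing and unpacking them follows. The field widths come from the slide, but the bit positions, the `c-size` encoding (0 to 3 mapped to 512B, 1KB, 2KB, 4KB), and all names are assumptions made for illustration.

```python
from typing import NamedTuple

# Assumed encoding of c-size; the slide only says it selects the LCP size.
C_SIZE_BYTES = (512, 1024, 2048, 4096)


class LcpPteFields(NamedTuple):
    c_bit: int   # 1 bit: page is compressed (1) or uncompressed (0)
    c_type: int  # 3 bits: compression encoding used
    c_size: int  # 2 bits: LCP size selector
    c_base: int  # 3 bits: offset within a page


def pack(fields: LcpPteFields) -> int:
    """Pack the fields into 9 bits (hypothetical layout, c-bit lowest)."""
    c_bit, c_type, c_size, c_base = fields
    if not (0 <= c_bit < 2 and 0 <= c_type < 8
            and 0 <= c_size < 4 and 0 <= c_base < 8):
        raise ValueError("field does not fit its bit width")
    return c_bit | (c_type << 1) | (c_size << 4) | (c_base << 6)


def unpack(bits: int) -> LcpPteFields:
    """Inverse of pack()."""
    return LcpPteFields(bits & 0x1, (bits >> 1) & 0x7,
                        (bits >> 4) & 0x3, (bits >> 6) & 0x7)


entry = LcpPteFields(c_bit=1, c_type=2, c_size=1, c_base=4)
bits = pack(entry)
print(bits)                                # 277
print(unpack(bits) == entry)               # True
print(C_SIZE_BYTES[unpack(bits).c_size])   # 1024
```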
![Page 19: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/19.jpg)
19
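Physical Memory Layout
[Figure: Page Table entries point into physical memory organized into 4KB, 2KB, 1KB, and 512B regions. Labeled addresses: PA0, PA1, PA1 + 512*4, PA2, PA2 + 512*1. Labels 1 and 4 appear next to the Page Table.]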
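The labels PA1 + 512*4 and PA2 + 512*1 suggest that a compressed page starts at a 512B-granular offset from a 4KB-aligned physical address. A small sketch under that reading follows. Treating the multiplier as the 3-bit `c-base` field, and the function and constant names, are assumptions.

```python
SLOT_SIZE = 512  # bytes; smallest memory pool size listed in the overview


def lcp_start_address(page_base: int, c_base: int) -> int:
    """Start of a compressed page: page base plus c_base 512B slots.

    Assumes c_base is a 3-bit value, so it can address any of the
    eight 512B slots inside one 4KB region.
    """
    if not 0 <= c_base < 8:
        raise ValueError("c_base must fit in 3 bits")
    return page_base + SLOT_SIZE * c_base


PA1 = 0x10000  # hypothetical 4KB-aligned physical address
PA2 = 0x20000  # hypothetical 4KB-aligned physical address
print(hex(lcp_start_address(PA1, 4)))  # 0x10800
print(hex(lcp_start_address(PA2, 1)))  # 0x20200
```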
![Page 20: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/20.jpg)
20
Memory Request Flow
1. Initial Page Compression
2. Cache Line Read
3. Cache Line Writeback
![Page 21: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/21.jpg)
21
Memory Request Flow (2)
1. Initial Page Compression (1/3)
2. Cache Line Read (2/3)
3. Cache Line Writeback (3/3)
[Figure: Processor (Core, TLB, Last-Level Cache, Memory Controller, Compress/Decompress, MD Cache), DRAM, and Disk. Labels: 4KB, 1KB, 2KB, LD, $Line]
![Page 22: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/22.jpg)
22
Handling Page Overflows
• Happens after writebacks, when all slots in the exception storage are already taken
• Two possible scenarios:
  • Type-1 overflow: requires larger physical page size (e.g., 2KB instead of 1KB)
  • Type-2 overflow: requires decompression and full uncompressed physical page (e.g., 4KB)
• Happens infrequently: once per ~2M instructions
[Figure: $ line, Compressed Data, M, Exception Storage with slots E0, E1, E2]
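A toy model of the writeback path that leads to an overflow, assuming a page with a fixed slot size and a fixed number of exception slots. The class, its methods, and the slot allocation policy are hypothetical. The sketch only detects the overflow and does not choose between the Type-1 and Type-2 responses.

```python
class PageOverflow(Exception):
    """A writeback needs an exception slot and none is free."""


class ToyLcpPage:
    def __init__(self, num_lines: int, slot_size: int, exception_slots: int):
        self.slot_size = slot_size
        # Metadata: for each line, the index of its exception slot or None.
        self.exception_index = [None] * num_lines
        self.free_slots = list(range(exception_slots))

    def writeback(self, line: int, compressed_size: int) -> str:
        """Place a written-back line and report where it was stored."""
        slot = self.exception_index[line]
        if compressed_size <= self.slot_size:
            if slot is not None:
                # The line fits again, so release its exception slot.
                self.free_slots.append(slot)
                self.exception_index[line] = None
            return "compressed slot"
        if slot is None:
            if not self.free_slots:
                raise PageOverflow(f"no free exception slot for line {line}")
            slot = self.free_slots.pop(0)
            self.exception_index[line] = slot
        return f"exception slot E{slot}"


page = ToyLcpPage(num_lines=64, slot_size=16, exception_slots=3)
print(page.writeback(0, 64))   # exception slot E0
print(page.writeback(5, 12))   # compressed slot
print(page.writeback(7, 40))   # exception slot E1
print(page.writeback(9, 64))   # exception slot E2
try:
    page.writeback(12, 64)     # all of E0, E1, E2 are taken
except PageOverflow as err:
    print("overflow:", err)
```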
![Page 23: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/23.jpg)
23
Compression Algorithms
• Key requirements:
  • Low hardware complexity
  • Low decompression latency
  • High effective compression ratio
• Frequent Pattern Compression [ISCA’04]
  • Uses simplified dictionary-based compression
• Base-Delta-Immediate Compression [PACT’12]
  • Exploits the low dynamic range of the data
![Page 24: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/24.jpg)
24
Base-Delta Encoding [PACT’12]
32-byte Uncompressed Cache Line: 0xC04039C0 0xC04039C8 0xC04039D0 … 0xC04039F8
12-byte Compressed Cache Line: Base 0xC04039C0, deltas 0x00 0x08 0x10 … 0x38 (1 byte each)
20 bytes saved
• Fast Decompression: vector addition
• Simple Hardware: arithmetic and comparison
• Effective: good compression ratio
BDI [PACT’12] has two bases:
1. zero base (for narrow values)
2. arbitrary base (first non-zero value in the cache line)
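A simplified sketch of the single-base encoding from this slide, applied to the slide's own 32-byte line. It assumes the base is the first value and that deltas are unsigned. It does not implement the two-base BDI scheme, and the function names are illustrative.

```python
def base_delta_compress(values, delta_bytes=1):
    """Return (base, deltas) if every delta fits in delta_bytes, else None.

    Simplification: the base is the first value and deltas are unsigned.
    """
    base = values[0]
    limit = 1 << (8 * delta_bytes)
    deltas = [v - base for v in values]
    if all(0 <= d < limit for d in deltas):
        return base, deltas
    return None


def base_delta_decompress(base, deltas):
    """Decompression is one addition per value (a vector addition)."""
    return [base + d for d in deltas]


def compressed_size(num_values, value_bytes=4, delta_bytes=1):
    """One full-width base plus one narrow delta per value."""
    return value_bytes + num_values * delta_bytes


# The 32-byte line from the slide: eight 4-byte values, 8 apart.
line = [0xC04039C0 + 8 * i for i in range(8)]
base, deltas = base_delta_compress(line)
print(hex(base), [hex(d) for d in deltas])
print(compressed_size(len(line)))           # 12 bytes, so 20 bytes saved
print(base_delta_decompress(base, deltas) == line)  # True
```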
![Page 25: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/25.jpg)
25
LCP-Enabled Optimizations
• Memory bandwidth reduction: 1 transfer instead of 4 (64B 64B 64B 64B)
• Zero pages and zero cache lines
  • Handled separately in TLB (1-bit) and in metadata (1-bit per cache line)
![Page 26: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/26.jpg)
26
Outline
• Motivation & Challenges
• Shortcomings of Prior Work
• LCP: Key Idea
• LCP: Implementation
• Evaluation
• Conclusion and Future Work
![Page 27: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/27.jpg)
27
Methodology
• Simulator: x86 event-driven based on Simics
• Workloads (32 applications)
  • SPEC2006 benchmarks, TPC, Apache web server
• System Parameters
  • L1/L2/L3 cache latencies from CACTI [Thoziyoor+, ISCA’08]
  • 512kB - 16MB L2 caches
  • DDR3-1066, 1 memory channel
• Metrics
  • Performance: Instructions per cycle, weighted speedup
  • Capacity: Effective compression ratio
  • Bandwidth: Bytes per kilo-instruction (BPKI)
  • Energy: Memory subsystem energy
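For reference, a sketch of two of the metrics named above, using their commonly used definitions: BPKI as bytes transferred per thousand instructions, and weighted speedup as the sum over applications of shared-run IPC divided by alone-run IPC. The slides do not spell out the formulas, so these definitions and the sample numbers are assumptions.

```python
def bpki(bytes_transferred: int, instructions: int) -> float:
    """Bytes per kilo-instruction."""
    return bytes_transferred / (instructions / 1000)


def weighted_speedup(ipc_shared, ipc_alone) -> float:
    """Sum of each application's IPC when sharing the system,
    normalized to its IPC when running alone."""
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


# Hypothetical numbers, for illustration only.
print(bpki(bytes_transferred=64_000, instructions=1_000_000))  # 64.0
print(weighted_speedup([1.0, 0.5], [2.0, 1.0]))                # 1.0
```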
![Page 28: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/28.jpg)
28
Evaluated Designs
Design | Description
Baseline | Baseline (no compression)
RMC | Robust main memory compression [ISCA’05] (RMC) and FPC [ISCA’04]
LCP-FPC | LCP framework with FPC
LCP-BDI | LCP framework with BDI [PACT’12]
LZ | Lempel-Ziv compression (per page)
![Page 29: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/29.jpg)
29
Effect on Memory Capacity
32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP-based designs achieve competitive average compression ratios with prior work
[Figure: Compression Ratio by design. Baseline: 1.00, RMC: 1.59, LCP-FPC: 1.52, LCP-BDI: 1.62, LZ: 2.60]
![Page 30: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/30.jpg)
30
Effect on Bus Bandwidth
32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP-based designs significantly reduce bandwidth (24%) (due to data compression)
[Figure: Normalized BPKI by design (lower is better). Baseline: 1.00, RMC: 0.79, LCP-FPC: 0.80, LCP-BDI: 0.76]
![Page 31: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/31.jpg)
31
Effect on Performance
LCP-based designs significantly improve performance over RMC
[Figure: Performance Improvement (0% to 16%) for 1-core, 2-core, and 4-core systems; series: RMC, LCP-FPC, LCP-BDI]
![Page 32: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/32.jpg)
32
Effect on Memory Subsystem Energy
32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP framework is more energy efficient than RMC
[Figure: Normalized Energy by design (lower is better). Baseline: 1.00, RMC: 1.06, LCP-FPC: 0.97, LCP-BDI: 0.95]
![Page 33: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/33.jpg)
33
Effect on Page Faults
32 SPEC2006, databases, web workloads, 2MB L2 cache
LCP framework significantly decreases the number of page faults (up to 23% on average for 768MB)
[Figure: Normalized # of Page Faults vs. DRAM Size, Baseline vs. LCP-BDI. Reduction labels: 256MB: 8%, 512MB: 14%, 768MB: 23%, 1GB: 21%]
![Page 34: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/34.jpg)
34
Other Results and Analyses in the Paper
• Analysis of page overflows
• Compressed page size distribution
• Compression ratio over time
• Number of exceptions (per page)
• Detailed single-/multicore evaluation
• Comparison with stride prefetching
  • performance and bandwidth
![Page 35: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/35.jpg)
35
Conclusion
• Old Idea: Compress data in main memory
• Problem: How to avoid inefficiency in address computation?
• Solution: A new main memory compression framework called LCP (Linearly Compressed Pages)
  • Key idea: fixed size for compressed cache lines within a page
• Evaluation:
  1. Increases memory capacity (62% on average)
  2. Decreases bandwidth consumption (24%)
  3. Decreases memory energy consumption (4.9%)
  4. Improves overall performance (13.9%)
![Page 36: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/36.jpg)
Linearly Compressed Pages: A Main Memory Compression
Framework with Low Complexity and Low Latency
Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Todd C. Mowry
Phillip B. Gibbons, Michael A. Kozuch
![Page 37: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/37.jpg)
37
Backup Slides
![Page 38: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/38.jpg)
38
Large Pages (e.g., 2MB or 1GB)
• Splitting large pages into smaller 4KB sub-pages (compressed individually)
• 64-byte metadata chunks for every sub-page
[Figure: 2KB 2KB … 2KB 2KB M]
![Page 39: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/39.jpg)
Physically Tagged Caches
39
[Figure: the Core sends a Virtual Address to the TLB; Address Translation produces the Physical Address that is compared against the tags of the L2 Cache Lines (tag, data). Label: Critical Path]
![Page 40: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/40.jpg)
40
Changes to Cache Tagging Logic
Before: Cache Lines (tag, data) tagged with p-base
After: Cache Lines (tag, data) tagged with p-base and c-idx
• p-base – physical page base address
• c-idx – cache line index within the page
![Page 41: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/41.jpg)
41
Analysis of Page Overflows
[Figure: Type-1 Overflows per instr. (log-scale, 1E-08 to 1E-03) for apache, bzip2, gcc, gromacs, lbm, libquantum, omnetpp, sjeng, sphinx3, tpch6, zeusmp, and GeoMean]
![Page 42: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/42.jpg)
42
Frequent Pattern Compression
Idea: encode cache lines based on frequently occurring patterns, e.g., first half of a word is zero
Uncompressed: 0x00000001 0x00000000 0xFFFFFFFF 0xABCDEFFF
Frequent Patterns:
000 – All zeros
001 – First half zeros
010 – Second half zeros
011 – Repeated bytes
100 – All ones
…
111 – Not a frequent pattern
Per-word encoding:
0x00000001 → 001 0x0001
0x00000000 → 000
0xFFFFFFFF → 011 0xFF
0xABCDEFFF → 111 0xABCDEFFF
Compressed: 001 0x0001 000 011 0xFF 111 0xABCDEFFF
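A toy encoder for the pattern table on this slide, checked against the slide's four example words. It covers only the 000, 001, 010, 011, and 111 prefixes. The 100 prefix and the patterns elided by "…" are left out, and the check order (lowest prefix first) is an assumption chosen so that 0xFFFFFFFF encodes as 011, as on the slide. This is not the full FPC algorithm from [ISCA’04].

```python
def fpc_encode_word(word: int) -> tuple[str, str]:
    """Return (3-bit prefix, payload as hex text) for one 32-bit word."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit word")
    upper, lower = word >> 16, word & 0xFFFF
    low_byte = word & 0xFF
    if word == 0:
        return "000", ""                       # all zeros: prefix only
    if upper == 0:
        return "001", f"0x{lower:04X}"         # first half zeros
    if lower == 0:
        return "010", f"0x{upper:04X}"         # second half zeros
    if word == low_byte * 0x01010101:
        return "011", f"0x{low_byte:02X}"      # one byte repeated 4 times
    return "111", f"0x{word:08X}"              # not a frequent pattern


line = [0x00000001, 0x00000000, 0xFFFFFFFF, 0xABCDEFFF]
encoded = [fpc_encode_word(w) for w in line]
print(" ".join(f"{prefix} {payload}".strip() for prefix, payload in encoded))
# 001 0x0001 000 011 0xFF 111 0xABCDEFFF
```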
![Page 43: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/43.jpg)
43
GPGPU Evaluation
• GPGPU-Sim v3.x
• Card: NVIDIA GeForce GTX 480 (Fermi)
• Caches:
  – DL1: 16 KB with 128B lines
  – L2: 768 KB with 128B lines
• Memory: GDDR5
![Page 44: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/44.jpg)
44
Effect on Bandwidth Consumption
[Figure: Normalized BPKI (0.0 to 3.0) for BDI and LCP-BDI. Benchmark suites: CUDA, Parboil, Rodinia, Mars, Lonestar. Benchmarks: BFS, MUM, JPEG, NN, LPS, STO, CONS, SCP, spmv, sad, backprop, hotspot, streamcluster, PVC, PVR, InvIdx, SS, bfs, bh, dmr, mst, sp, sssp, GeoMean]
![Page 45: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/45.jpg)
45
Effect on Throughput
[Figure: Normalized Performance (0.8 to 1.8) for Baseline and BDI. Benchmark suites: CUDA, Parboil, Rodinia, Mars, Lonestar. Benchmarks: BFS, MUM, JPEG, NN, LPS, STO, CONS, SCP, spmv, sad, backprop, hotspot, streamcluster, PVC, PVR, InvIdx, SS, bfs, bh, dmr, mst, sp, sssp, GeoMean]
![Page 46: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/46.jpg)
46
Physical Memory Layout
[Figure: Page Table entries point into physical memory organized into 4KB, 2KB, 1KB, and 512B regions. Labeled addresses: PA0, PA1, PA1 + 512*4, PA2, PA2 + 512*1. Labels: c-base, 1, 4.]
![Page 47: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/47.jpg)
47
Page Size Distribution
![Page 48: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/48.jpg)
48
Compression Ratio Over Time
![Page 49: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/49.jpg)
49
IPC (1-core)
![Page 50: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/50.jpg)
50
Weighted Speedup
![Page 51: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/51.jpg)
51
Bandwidth Consumption
![Page 52: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/52.jpg)
52
Page Overflows
![Page 53: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/53.jpg)
53
Stride Prefetching - IPC
![Page 54: Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi](https://reader031.vdocuments.site/reader031/viewer/2022032701/56649c895503460f9494184d/html5/thumbnails/54.jpg)
54
Stride Prefetching - Bandwidth