Transparent Pointer Compression for Linked Data Structures

Chris Lattner (lattner@cs.uiuc.edu)
Vikram Adve (vadve@cs.uiuc.edu)

MSP 2005, June 12, 2005
http://llvm.cs.uiuc.edu/
Growth of 64-bit Computing

64-bit architectures are increasingly common:
- New architectures and chips (G5, IA64, X86-64, ...)
- High-end systems have existed for many years now

The 64-bit address space is used for many purposes:
- Address space randomization (security)
- Memory-mapping large files (databases, etc.)
- Single address space OSes

Many 64-bit systems have < 4GB of physical memory:
64 bits is still useful for its virtual address space.
Cost of a 64-bit Virtual Address Space

Pointers must be 64 bits (8 bytes) instead of 32 bits:
a significant impact for pointer-intensive programs!

Pointer-intensive programs suffer from:
- Reduced effective L1/L2/TLB cache sizes
- Reduced effective memory bandwidth
- Increased alignment requirements, etc.

Pointer-intensive programs are increasingly common:
- Recursive data structures (our focus)
- Object-oriented programs

[Figure: "BIGGER POINTERS" illustration]
Previously Published Approaches

Simplest approaches: use 32-bit addressing
- Compile for a 32-bit pointer size (-m32)
- Force the program image into 32 bits [Adl-Tabatabai04]
- Loses the advantage of 64-bit address spaces!

Other approaches: exotic hardware support
- Compress pairs of values, speculating that the pointer offset is small [Zhang02]
- Compress arrays of related pointers [Takagi03]
- Requires significant changes to the cache hierarchy

There is no previous fully-automatic compiler technique to shrink pointers in recursive data structures (RDSs).
Our Approach (1/2)

Use Automatic Pool Allocation [PLDI05] to partition the heap into memory pools:
- Infers and captures pool homogeneity information

[Figure: original heap layout vs. layout after pool allocation]
Our Approach (2/2)

Replace pointers with 64-bit integer offsets from the start of the pool:
- Change *Ptr into *(PoolBase + Ptr)

Shrink the 64-bit integers to 32-bit integers:
- Allows each pool to be up to 4GB in size
- (A minimal sketch of this rewrite follows the figure below.)

[Figure: layout after pointer compression]
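To make the rewrite concrete, here is a minimal C sketch of an index-converted load and store, assuming a per-pool base pointer kept in a hypothetical pool descriptor; this illustrates the idea and is not the compiler's actual output:

  #include <stdint.h>

  /* Hypothetical pool descriptor: the transform keeps one base address
   * per pool and represents "pointers" into the pool as byte offsets. */
  typedef struct Pool { char *PoolBase; } Pool;

  /* Original code:            x = *Ptr;            (Ptr is a 64-bit pointer)
   * After index conversion:   x = *(int *)(PD->PoolBase + Off);  (Off is a 64-bit offset)
   * After index compression:  Off shrinks to 32 bits, halving the pointer footprint. */
  static inline int load_int(Pool *PD, uint32_t Off) {
      return *(int *)(PD->PoolBase + Off);   /* *(PoolBase + Ptr) */
  }

  static inline void store_int(Pool *PD, uint32_t Off, int Val) {
      *(int *)(PD->PoolBase + Off) = Val;
  }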
Talk Outline

- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Experimental Results
- Conclusion
Automatic Pool Allocation

Compute a points-to graph:
- Ensure each pointer has one target
- Unification-based approach

Infer pool lifetimes:
- Uses escape analysis

Rewrite the program:
- malloc → poolalloc, free → poolfree
- Insert calls to poolinit/pooldestroy
- Pass pool descriptors to functions

[Figure: points-to graph partitioned into Pool 1, Pool 2, Pool 3, Pool 4]

For more info: see the MSP paper or the talk at PLDI tomorrow.
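For reference, a hedged sketch of the kind of runtime interface the rewritten code calls into; the exact signatures below are assumptions for illustration, not the real poolalloc runtime API:

  #include <stddef.h>

  /* Assumed pool descriptor type; the real runtime's layout differs. */
  typedef struct Pool Pool;

  void  poolinit(Pool *PD, size_t NodeSize);   /* set up a pool for one data structure */
  void *poolalloc(Pool *PD, size_t Size);      /* allocate Size bytes from the pool    */
  void  poolfree(Pool *PD, void *Ptr);         /* return one object to the pool        */
  void  pooldestroy(Pool *PD);                 /* release the whole pool at once       */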
A Simple Pointer-intensive Example

A list of pointers to doubles:

  List *L = 0;
  for (...) {
    List *N = malloc(List);
    N->Next = L;
    N->Data = malloc(double);
    L = N;
  }

[Figure: L → List → List → List, each node's Data pointing to a double]
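For concreteness, a self-contained C version of the slide's example; the slide's malloc(List) notation is shorthand for allocating sizeof the named type:

  #include <stdlib.h>

  typedef struct List {
      struct List *Next;   /* 8 bytes on a 64-bit target */
      double      *Data;   /* 8 bytes on a 64-bit target */
  } List;

  /* Build a linked list of n nodes, each pointing to a separately
   * allocated double. Two logical pools of objects are created:
   * the List nodes and the doubles they point to.                 */
  static List *build_list(int n) {
      List *L = 0;
      for (int i = 0; i < n; ++i) {
          List *N = malloc(sizeof(List));      /* slide: malloc(List)   */
          N->Next = L;
          N->Data = malloc(sizeof(double));    /* slide: malloc(double) */
          L = N;
      }
      return L;
  }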
Effect of Automatic Pool Allocation (1/2)

The heap is partitioned into separate pools; each individual pool is smaller than the total heap.

Before pool allocation:
  List *L = 0;
  for (...) {
    List *N = malloc(List);
    N->Next = L;
    N->Data = malloc(double);
    L = N;
  }

After pool allocation:
  List *L = 0;
  for (...) {
    List *N = poolalloc(PD1, List);
    N->Next = L;
    N->Data = poolalloc(PD2, double);
    L = N;
  }

[Figure: pool PD1 holds the List nodes, pool PD2 holds the doubles]
Effect of Automatic Pool Allocation (2/2)

Each pool has a descriptor associated with it:
- Passed into poolalloc/poolfree

We know which pool each pointer points into:
- Given the pointer, we also have the pool descriptor
- e.g. N, L → PD1 and N->Data → PD2

(Same before/after code as the previous slide.)
Index Conversion of a Pool

Force pool memory to be contiguous:
- The normal PoolAlloc runtime allocates memory in chunks
- Two implementation strategies for this (see paper)

Change pointers into the pool to integer offsets/indexes from the pool base:
- Replace *P with *(PoolBase + P)

A pool can be index-converted only if pointers into it point solely to heap memory (no stack or global memory).
Index Compression of a Pointer

Shrink indexes in type-homogeneous pools from 64 bits to 32 bits:
- Replace the structure definition & field accesses
- Requires accurate type info and type-safe accesses

Original:
  struct List { struct List *Next; int Data; };
  List *L = malloc(16);

After index conversion:
  struct List { int64 Next; int Data; };
  L = malloc(16);

After index compression:
  struct List { int32 Next; int Data; };
  L = malloc(8);
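A minimal sketch of how field accesses are rewritten once the pool is compressed; the helper names and byte offsets below are illustrative assumptions based on the 8-byte compressed layout above:

  #include <stdint.h>

  /* Compressed node layout inside the pool (8 bytes instead of 16):
   *   offset 0: int32 Next   (32-bit index of the next node)
   *   offset 4: int   Data   (unchanged int field)                   */

  /* Original:  P = L->Next;
   * Rewritten: load a 32-bit index at byte offset 0 of node L.       */
  static inline uint32_t load_next(char *PoolBase, uint32_t L) {
      return *(uint32_t *)(PoolBase + L + 0);
  }

  /* Original:  d = L->Data;
   * Rewritten: the Data field now lives at byte offset 4, not 8.     */
  static inline int load_data(char *PoolBase, uint32_t L) {
      return *(int *)(PoolBase + L + 4);
  }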
Index Conversion Example 1
[Figure: the previous List/double example]

Index Compression Example 1
[Figure: the example after index conversion, using 64-bit integer indices]
Impact of Type-Homogeneity / Type-Safety

Compression requires rewriting structures:
- e.g. malloc(32) → malloc(16)
- Rewriting depends on type-safe memory accesses
- We can't know how to rewrite unions and other cases
- Must verify that memory accesses are type-safe

Pool allocation infers type homogeneity:
- Unions, bad C pointer tricks, etc. → non-TH
- Some pools may be type-homogeneous (TH), others not

We can't index-compress pointers in non-TH pools!
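For example (illustrative only), a pool holding nodes like the following could not be index-compressed, because accesses through the union are not provably type-safe and the compiler cannot know how to shrink the layout:

  /* Hypothetical non-type-homogeneous node: sometimes the payload is a
   * pointer, sometimes raw bytes reinterpreted through the union.
   * The structure (and the pool holding it) cannot be rewritten.      */
  struct Node {
      struct Node *Next;
      union {
          struct Node *Child;   /* would need to become a 32-bit index */
          char         Raw[8];  /* but these bytes must stay 8 bytes   */
      } Payload;
  };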
Index Conversion Example 2
[Figure: compression of TH memory pointed to by non-TH memory]

Index Compression Example 2
[Figure: the example after index conversion, using 64-bit integer indexes]

Pointer Compression Example 3
[Figure: compression of TH pointers pointing to non-TH memory; a List array whose nodes point to objects of unknown type]
Static Pointer Compression Implementation

Inspect the graph of pools provided by Automatic Pool Allocation:
- Find compressible pointers
- Determine which pools to index-convert

Use rewrite rules to pointer-compress the program, e.g. if P1 and P2 are compressed pointers, change:
  P1 = *P2   →   P1 = *(int*)(PoolBase + P2)

Perform an interprocedural call graph traversal:
- Top-down traversal from main()
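As a concrete, hedged illustration of these rewrite rules, here is how a traversal loop over the earlier List/double example might look after static compression; the descriptor layout, field offsets, and use of index 0 as null are assumptions, not the pass's actual output:

  #include <stdint.h>

  typedef struct Pool { char *PoolBase; /* ... */ } Pool;

  /* Before compression (pointer version):
   *   for (List *N = L; N; N = N->Next)
   *       sum += *N->Data;
   *
   * After static compression, N and L are 32-bit byte offsets into
   * pool PD1 (the List nodes, 8 bytes each: Next at 0, Data at 4),
   * and N->Data is a 32-bit offset into pool PD2 (the doubles).      */
  static double sum_list(Pool *PD1, Pool *PD2, uint32_t L) {
      double sum = 0.0;
      for (uint32_t N = L; N != 0; /* advance below */) {
          uint32_t DataIdx = *(uint32_t *)(PD1->PoolBase + N + 4); /* N->Data */
          sum += *(double *)(PD2->PoolBase + DataIdx);
          N = *(uint32_t *)(PD1->PoolBase + N + 0);                /* N->Next */
      }
      return sum;
  }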
Talk Outline

- Introduction & Motivation
- Automatic Pool Allocation Background
- Pointer Compression Transformation
- Dynamic Pointer Compression
- Experimental Results
- Conclusion
Dynamic Pointer Compression Idea

Static compression can break programs:
- Each pool is limited to 2^32 bytes of memory
- The program aborts when the 2^32-nd byte is allocated!

Solution: expand pointers dynamically when needed!
- When the 2^32-nd byte is allocated, expand the pool's indices to 64 bits
- Traverse and rewrite the pool, using type information
- Similar to (but a bit simpler than) a copying GC pass
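A hedged sketch of what such an expansion pass could look like for a pool of compressed List nodes, assuming the runtime knows the compressed and expanded node layouts; this is purely illustrative and not the paper's actual runtime (which must also update live pointers held outside the pool):

  #include <stdint.h>
  #include <stdlib.h>

  /* Compressed node: { int32 Next; int Data; }  ->  8 bytes per node
   * Expanded node:   { int64 Next; int Data; }  -> 16 bytes per node
   * Node indices are byte offsets into the pool, so every stored index
   * must be rescaled (here: doubled) when the node size doubles.      */
  static char *expand_list_pool(char *OldBase, size_t NumNodes) {
      char *NewBase = malloc(NumNodes * 16);
      for (size_t i = 0; i < NumNodes; ++i) {
          uint32_t next = *(uint32_t *)(OldBase + i * 8 + 0);
          int      data = *(int *)     (OldBase + i * 8 + 4);
          *(uint64_t *)(NewBase + i * 16 + 0) = (uint64_t)next * 2;  /* rescaled Next index */
          *(int *)     (NewBase + i * 16 + 8) = data;                /* Data copied as-is   */
      }
      free(OldBase);
      return NewBase;
  }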
Dynamic Pointer Compression Cost

Structure offsets and sizes now depend on whether a pool is currently compressed. A single access P1 = *P2 becomes:

  if (PD->isCompressed)
    P1 = *(int32*)(PoolBase + P2*C1 + C2);
  else
    P1 = *(int64*)(PoolBase + P2*C3 + C4);

Use standard optimizations to address this:
- Predication, loop unswitching, jump threading, etc.
- See paper for details
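As an illustration of how loop unswitching removes the per-access check, a list-walking loop could be specialized as sketched below; the Pool layout, the constants C1..C4 (node size and field offset for each layout), and the use of index 0 as null are assumptions:

  #include <stdint.h>

  typedef struct Pool { int isCompressed; char *PoolBase; } Pool;  /* assumed layout */

  /* Before unswitching, the loop body would test PD->isCompressed on
   * every iteration. Because the flag is loop-invariant here (no
   * allocation inside the loop), the test can be hoisted, leaving two
   * branch-free copies of the loop.                                   */
  static uint64_t count_nodes(Pool *PD, uint64_t P2,
                              uint64_t C1, uint64_t C2,
                              uint64_t C3, uint64_t C4) {
      uint64_t n = 0;
      if (PD->isCompressed) {
          for (; P2 != 0; ++n)
              P2 = *(uint32_t *)(PD->PoolBase + P2 * C1 + C2);
      } else {
          for (; P2 != 0; ++n)
              P2 = *(uint64_t *)(PD->PoolBase + P2 * C3 + C4);
      }
      return n;
  }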
Experimental Results: 2 Questions

1. Does pointer compression improve the performance of pointer-intensive programs?
   - Cache miss reductions, memory bandwidth improvements
   - Memory footprint reduction

2. How does the impact of pointer compression vary across 64-bit architectures?
   - Do memory system improvements outweigh the overhead?

Built in the LLVM Compiler Infrastructure: http://llvm.cs.uiuc.edu/
Static PtrComp Performance Impact

UltraSPARC IIIi with 1MB cache. Baseline 1.0 = program compiled with LLVM & pool allocation (PA) but no pointer compression (PC).

Peak memory usage:

  Benchmark    PA        PC        PC/PA
  bh           8 MB      8 MB      1.00
  bisort       64 MB     32 MB     0.50
  em3d         47 MB     47 MB     1.00
  perimeter    299 MB    171 MB    0.57
  power        882 KB    816 KB    0.93
  treeadd      128 MB    64 MB     0.50
  tsp          128 MB    96 MB     0.75
  ft           9 MB      4 MB      0.51
  ks           47 KB     47 KB     1.00
  llubench     4 MB      2 MB      0.50
[Chart: runtime ratio relative to pool allocation (PA = 1.0), reconstructed from the embedded chart data]

  Benchmark    Native run time   PA run time   PC run time   NoPA ratio   PtrComp ratio
  bh           15.32             16.19         16.17         0.95         1.00
  bisort       27.33             24.31         24.29         1.12         1.00
  em3d         27.19             27.16         30.25         1.00         1.11
  perimeter    10.13             6.63          5.38          1.53         0.81
  power        6.53              6.52          6.51          1.00         1.00
  treeadd      72.83             52.78         35.76         1.38         0.68
  tsp          18.33             16.28         15.28         1.13         0.94
  ft           14.59             11.6          10.04         1.26         0.87
  ks           8.01              7.93          8.27          1.01         1.04
  llubench     37.42             27.89         11.87         1.34         0.43

  (NoPA ratio = Native/PA run time; PtrComp ratio = PC/PA run time.)
Evaluating the Effect of Architecture

Pick one program that scales easily:
- llubench, a linked-list micro-benchmark
- llubench has little computation and many dereferences: the best possible case for pointer compression

Evaluate how PtrComp impacts scalability:
- Compare to the native and pool-allocated versions

Evaluate the overhead introduced by PtrComp:
- Compare PA32 with PC32 (compress 32-bit pointers to 32-bit indices)

How close is PtrComp to native 32-bit pointers?
- Compare to native-32 and poolalloc-32 for a limit study
SPARC V9 PtrComp (1MB Cache)

- PtrComp overhead is very low
- Significant scalability impact
[Chart: llubench scalability results on SPARC V9; raw per-configuration timing data omitted]