Transparent Pointer Compression for Linked Data Structures


DESCRIPTION

Transparent Pointer Compression for Linked Data Structures. Chris Lattner (lattner@cs.uiuc.edu) and Vikram Adve (vadve@cs.uiuc.edu). June 12, 2005, MSP 2005. http://llvm.cs.uiuc.edu/

TRANSCRIPT

  • Transparent Pointer Compression for Linked Data Structures

    Chris Lattner (lattner@cs.uiuc.edu)
    Vikram Adve (vadve@cs.uiuc.edu)

    June 12, 2005, MSP 2005
    http://llvm.cs.uiuc.edu/

  • Growth of 64-bit computing

    64-bit architectures are increasingly common:
    - New architectures and chips (G5, IA64, X86-64, ...)
    - High-end systems have existed for many years now

    The 64-bit address space is used for many purposes:
    - Address space randomization (security)
    - Memory-mapping large files (databases, etc.)
    - Single-address-space OSs

    Many 64-bit systems have < 4GB of physical memory:
    64 bits is still useful for its virtual address space.

  • Cost of a 64-bit virtual address space

    Pointers must be 64 bits (8 bytes) instead of 32 bits:
    significant impact for pointer-intensive programs!

    Pointer-intensive programs suffer from:
    - Reduced effective L1/L2/TLB cache sizes
    - Reduced effective memory bandwidth
    - Increased alignment requirements, etc.

    Pointer-intensive programs are increasingly common:
    - Recursive data structures (our focus)
    - Object-oriented programs

    BIGGER POINTERS

  • Previously Published Approaches

    Simplest approaches: use 32-bit addressing
    - Compile for 32-bit pointer size (-m32)
    - Force program image into 32 bits [Adl-Tabatabai04]
    - Loses the advantage of 64-bit address spaces!

    Other approaches: exotic hardware support
    - Compress pairs of values, speculating that pointer offset is small [Zhang02]
    - Compress arrays of related pointers [Takagi03]
    - Requires significant changes to the cache hierarchy

    No previous fully-automatic compiler technique to shrink pointers in RDSs.

  • Our Approach (1/2)

    Use Automatic Pool Allocation [PLDI05] to partition the heap into memory pools:
    - Infers and captures pool homogeneity information

    (Figures: original heap layout vs. layout after pool allocation)

  • Our Approach (2/2)

    Replace pointers with 64-bit integer offsets from the start of the pool:
    - Change *Ptr into *(PoolBase+Ptr)

    Shrink the 64-bit integers to 32-bit integers:
    - Allows each pool to be up to 4GB in size

    (Figure: layout after pointer compression)

  • Talk Outline

    - Introduction & Motivation
    - Automatic Pool Allocation Background
    - Pointer Compression Transformation
    - Experimental Results
    - Conclusion

  • Automatic Pool Allocation

    Compute points-to graph:
    - Ensure each pointer has one target
    - Unification-based approach

    Infer pool lifetimes:
    - Uses escape analysis

    Rewrite program:
    - malloc → poolalloc, free → poolfree
    - Insert calls to poolinit/pooldestroy
    - Pass pool descriptors to functions

    (Figure: heap partitioned into Pool 1 ... Pool 4)

    For more info: see the MSP paper or the talk at PLDI tomorrow.

  • A Simple Pointer-intensive Example

    A list of pointers to doubles:

      List *L = 0;
      for (...) {
        List *N = malloc(List);
        N->Next = L;
        N->Data = malloc(double);
        L = N;
      }

    (Figure: list L of List nodes, each pointing to a double)

  • Effect of Automatic Pool Allocation (1/2)

    The heap is partitioned into separate pools;
    each individual pool is smaller than the total heap.

    Before:

      List *L = 0;
      for (...) {
        List *N = malloc(List);
        N->Next = L;
        N->Data = malloc(double);
        L = N;
      }

    After:

      List *L = 0;
      for (...) {
        List *N = poolalloc(PD1, List);
        N->Next = L;
        N->Data = poolalloc(PD2, double);
        L = N;
      }

    (Figure: the List nodes in pool PD1, the doubles in pool PD2)

  • Effect of Automatic Pool Allocation (2/2)

    Each pool has a descriptor associated with it:
    - Passed into poolalloc/poolfree

    We know which pool each pointer points into:
    - Given the above, we also have the pool descriptor
    - e.g. N, L → PD1 and N->Data → PD2

    (Same before/after code as the previous slide)

  • Talk Outline

    - Introduction & Motivation
    - Automatic Pool Allocation Background
    - Pointer Compression Transformation
    - Experimental Results
    - Conclusion

  • Index Conversion of a Pool

    Force pool memory to be contiguous:
    - The normal PoolAlloc runtime allocates memory in chunks
    - Two implementation strategies for this (see paper)

    Change pointers into the pool to integer offsets/indexes from the pool base:
    - Replace *P with *(PoolBase + P)

    A pool can be index-converted if pointers into it only point to heap memory (no stack or global memory).

  • Index Compression of a Pointer

    Shrink indexes in type-homogenous pools:
    - Shrink from 64 bits to 32 bits
    - Replace structure definition & field accesses
    - Requires accurate type info and type-safe accesses

    Original:

      struct List { struct List *Next; int Data; };
      List *L = malloc(16);

    After index conversion:

      struct List { int64 Next; int Data; };
      L = malloc(16);

    After index compression:

      struct List { int32 Next; int Data; };
      L = malloc(8);

  • Index Conversion Example 1

    (Figure: the previous List/double example)

  • Index Compression Example 1

    (Figure: the example after index conversion, with 64-bit integer indices)

  • Impact of Type-Homogeneity/Safety

    Compression requires rewriting structures:
    - e.g. malloc(32) → malloc(16)
    - Rewriting depends on type-safe memory accesses
    - We can't know how to rewrite unions and other cases
    - Must verify that memory accesses are type-safe

    Pool allocation infers type homogeneity:
    - Unions, bad C pointer tricks, etc. → non-TH
    - Some pools may be TH, others not

    Can't index-compress pointers in non-TH pools!

  • Index Conversion Example 2

    Compression of TH memory, pointed to by non-TH memory

  • Index Compression Example 2

    Compression of TH memory, pointed to by non-TH memory

    (Figure: the example after index conversion, with 64-bit integer indexes)

  • Pointer Compression Example 3

    Compression of TH pointers, pointing to non-TH memory

    (Figure: list L whose nodes point into an array of unknown type)

  • Static Pointer Compression Implementation

    Inspect the graph of pools provided by APA:
    - Find compressible pointers
    - Determine pools to index-convert

    Use rewrite rules to pointer-compress the program:
    - e.g. if P1 and P2 are compressed pointers, change:
      P1 = *P2   →   P1 = *(int*)(PoolBase+P2)

    Perform interprocedural call graph traversal:
    - Top-down traversal from main()

  • Talk Outline

    - Introduction & Motivation
    - Automatic Pool Allocation Background
    - Pointer Compression Transformation
    - Dynamic Pointer Compression
    - Experimental Results
    - Conclusion

  • Dynamic Pointer Compression Idea

    Static compression can break programs:
    - Each pool is limited to 2^32 bytes of memory
    - The program aborts when the 2^32nd byte is allocated!

    Expand pointers dynamically when needed:
    - When the 2^32nd byte is allocated, expand pointers to 64 bits
    - Traverse/rewrite the pool, using type information
    - Similar to (but a bit simpler than) a copying GC pass

  • Dynamic Pointer Compression Cost

    Structure offsets and sizes depend on whether a pool is currently compressed. For P1 = *P2:

      if (PD->isCompressed)
        P1 = *(int32*)(PoolBase + P2*C1 + C2);
      else
        P1 = *(int64*)(PoolBase + P2*C3 + C4);

    Use standard optimizations to address this:
    - Predication, loop unswitching, jump threading, etc.

    See paper for details.

  • Talk Outline

    - Introduction & Motivation
    - Automatic Pool Allocation Background
    - Pointer Compression Transformation
    - Experimental Results
    - Conclusion

  • Experimental Results: 2 Questions

    Does pointer compression improve the performance of pointer-intensive programs?
    - Cache miss reductions, memory bandwidth improvements
    - Memory footprint reduction

    How does the impact of pointer compression vary across 64-bit architectures?
    - Do memory system improvements outweigh overhead?

    Built in the LLVM Compiler Infrastructure: http://llvm.cs.uiuc.edu/

  • Static PtrComp Performance Impact

    UltraSPARC IIIi w/1MB cache.
    Runtime ratios: 1.0 = program compiled with LLVM & PA but no PC.

    Peak memory usage:

      Benchmark   PA       PC       PC/PA
      bh          8MB      8MB      1.00
      bisort      64MB     32MB     0.50
      em3d        47MB     47MB     1.00
      perimeter   299MB    171MB    0.57
      power       882KB    816KB    0.93
      treeadd     128MB    64MB     0.50
      tsp         128MB    96MB     0.75
      ft          9MB      4MB      0.51
      ks          47KB     47KB     1.00
      llubench    4MB      2MB      0.50

    Runtime ratios (PA = 1.0), with the raw execution times from the chart data:

      Benchmark   Native   PA      PC      NoPA/PA   PC/PA
      bh          15.32    16.19   16.17   0.95      1.00
      bisort      27.33    24.31   24.29   1.12      1.00
      em3d        27.19    27.16   30.25   1.00      1.11
      perimeter   10.13    6.63    5.38    1.53      0.81
      power       6.53     6.52    6.51    1.00      1.00
      treeadd     72.83    52.78   35.76   1.38      0.68
      tsp         18.33    16.28   15.28   1.13      0.94
      ft          14.59    11.60   10.04   1.26      0.87
      ks          8.01     7.93    8.27    1.01      1.04
      llubench    37.42    27.89   11.87   1.34      0.43

  • Evaluating Effect of Architecture

    Pick one program that scales easily:
    - llubench: a linked-list micro-benchmark
    - llubench has little computation, many dereferences
    - Best possible case for pointer compression

    Evaluate how ptrcomp impacts scalability:
    - Compare to native and pool-allocated versions

    Evaluate overhead introduced by ptrcomp:
    - Compare PA32 with PC32 (compress 32 → 32 bits)

    How close is ptrcomp to native 32-bit pointers?
    - Compare to native-32 and poolalloc-32 for a limit study

  • SPARC V9 PtrComp (1MB Cache)

    - PtrComp overhead is very low
    - Significant scalability impact

    (Chart: llubench scalability results)
