Composing High-Performance Memory Allocators with Heap Layers
DESCRIPTION
Heap Layers is a template-based infrastructure for building high-quality, fast memory allocators. The infrastructure is remarkably flexible, and the resulting memory allocators are as fast as or faster than counterparts written in conventional C or C++. We have built several industrial-strength allocators using Heap Layers, including Hoard (which now includes the Heap Layers infrastructure) and DieHard.
TRANSCRIPT
Composing High-Performance Memory Allocators
Emery Berger, Ben Zorn, Kathryn McKinley
PLDI 2001 - Composing High-Performance Memory Allocators - Berger, Zorn, McKinley 2
Motivation & Contributions
• Programs increasingly allocation intensive
  – spend more than half of runtime in malloc/free
• Programmers require high-performance allocators
  – often build own custom allocators
• Heap layers infrastructure for building memory allocators
  – composable, extensible, and high-performance
  – based on C++ templates
  – custom and general-purpose, competitive with state-of-the-art
Outline
• High-performance memory allocators
  – focus on custom allocators
  – pros & cons of current practice
• Previous work
• Heap layers
  – how it works
  – examples
• Experimental results
  – custom & general-purpose allocators
Using Custom Allocators
• Can be very fast:
  – Linked lists of objects for highly-used classes
  – Region (arena, zone) allocators
• “Best practices” [Meyers 1995, Bulka 2001]
  – Used in 3 SPEC2000 benchmarks (parser, gcc, vpr), Apache, PGP, SQLServer, etc.
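To make the region idea concrete, here is a minimal sketch of a region (arena) allocator. It is not taken from any of the programs listed above; the class name `Region`, the 4096-byte default chunk size, and the `freeAll` and `chunkCount` methods are assumptions made for illustration. Allocation bumps a pointer inside a large chunk, and there is no per-object free, which is where the speed comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region (arena) allocator: objects are carved out of large
// chunks by bumping a pointer and are all released together.
class Region {
public:
  explicit Region(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}
  ~Region() { freeAll(); }
  Region(const Region&) = delete;
  Region& operator=(const Region&) = delete;

  void* malloc(std::size_t sz) {
    const std::size_t align = alignof(std::max_align_t);
    if (sz == 0) sz = 1;
    sz = (sz + align - 1) / align * align;  // round up to preserve alignment
    if (sz > remaining_) {                  // current chunk cannot satisfy the request
      const std::size_t n = sz > chunkSize_ ? sz : chunkSize_;
      char* chunk = static_cast<char*>(std::malloc(n));
      if (chunk == nullptr) return nullptr;
      chunks_.push_back(chunk);
      bump_ = chunk;
      remaining_ = n;
    }
    void* p = bump_;
    bump_ += sz;
    remaining_ -= sz;
    return p;
  }

  // No per-object free: the whole region is released at once.
  void freeAll() {
    for (char* c : chunks_) std::free(c);
    chunks_.clear();
    bump_ = nullptr;
    remaining_ = 0;
  }

  std::size_t chunkCount() const { return chunks_.size(); }

private:
  std::size_t chunkSize_;
  std::vector<char*> chunks_;
  char* bump_ = nullptr;
  std::size_t remaining_ = 0;
};

// Allocates 100 small objects and reports how many chunks were needed.
inline std::size_t region_example() {
  Region region;
  for (int i = 0; i < 100; ++i) {
    void* p = region.malloc(24);
    assert(p != nullptr);
  }
  return region.chunkCount();
}
```

The trade-off is that memory is only reclaimed when the whole region dies, so this fits phases of a program whose objects share a lifetime.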
Custom Allocators Work
Using a custom allocator reduces runtime by 60%
[Figure: 197.parser runtime in seconds, split into memory operations and computation, for the custom allocator and the system allocator (estimated).]
Problems with Current Practice
• Brittle code
  – written from scratch
  – macros/monolithic functions to avoid overhead
  – hard to write, reuse or maintain
• Excessive fragmentation
  – good memory allocators: complicated, not retargetable
Allocator Conceptual Design
People think & talk about heaps as if they were modular:
[Diagram: malloc and free enter a layer that selects a heap based on size; one heap manages small objects and another manages large objects; both draw on the system memory manager.]
Infrastructure Requirements
• Flexible
  – can add functionality
• Reusable
  – in other contexts & in same program
• Fast
  – very low or no overhead
• High-level
  – as component-like as possible
Possible Solutions
Criteria: flexible, reusable, fast, high-level
• Indirect function calls (Vmalloc [Vo 1996])
  – function call overhead
  – function-pointer assignment
• Object-oriented (CMM [Attardi et al. 1998])
  – rigid hierarchy
  – virtual method overhead
• Mixins (our approach)
Ordinary Classes vs. Mixins
• Ordinary classes
  – fixed inheritance DAG
  – can’t rearrange hierarchy
  – can’t use class multiple times
• Mixins
  – no fixed inheritance DAG
  – multiple hierarchies possible
  – can reuse classes
  – fast: static dispatch
A Heap Layer
template <class SuperHeap>
class HeapLayer : public SuperHeap {…};

void * malloc (sz) {
  do something;
  void * p = SuperHeap::malloc (sz);
  do something else;
  return p;
}

• Provides malloc and free methods
• “Top heaps” get memory from system
  – e.g., mallocHeap uses C library’s malloc and free
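The layer pattern can be sketched as compilable C++ as follows. This is an illustration, not the library's source: the `mallocHeap` here is a simplified stand-in for the top heap named on the slide, and `CountingHeap` with its `live()` method is a hypothetical layer invented for the example. Its "do something" step is counting live objects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-in for a top heap: memory comes from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Hypothetical layer: counts live objects, then defers to its SuperHeap.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);  // delegate to the parent layer
    if (p != nullptr) ++live_;        // the layer's own bookkeeping
    return p;
  }
  void free(void* p) {
    if (p != nullptr) --live_;
    SuperHeap::free(p);
  }
  std::size_t live() const { return live_; }

private:
  std::size_t live_ = 0;
};

// Composition is a template instantiation; calls are resolved statically.
using CountingMallocHeap = CountingHeap<mallocHeap>;

// Allocates two objects, frees one, and reports how many remain live.
inline std::size_t counting_example() {
  CountingMallocHeap heap;
  void* a = heap.malloc(16);
  void* b = heap.malloc(32);
  assert(a != nullptr && b != nullptr);
  heap.free(a);
  const std::size_t remaining = heap.live();
  heap.free(b);
  return remaining;
}
```

Because the parent is a template parameter, the same layer can sit on top of any heap that provides malloc and free, and the compiler can inline the whole chain.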
Example: Thread-safety
LockedHeap protects the parent heap with a single lock

class LockedMallocHeap : public LockedHeap<mallocHeap> {};

void * malloc (sz) {
  acquire lock;
  void * p = SuperHeap::malloc (sz);
  release lock;
  return p;
}

Layers (top to bottom): LockedHeap, mallocHeap
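A compilable sketch of the locking layer, assuming `std::mutex` as the lock. The slide does not say which lock type the library uses, so that choice and the simplified `mallocHeap` stand-in are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <mutex>

// Simplified stand-in for a top heap: memory comes from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Sketch of a locking layer: every call into the parent heap is made
// while holding a single mutex (std::mutex is an assumption here).
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    std::lock_guard<std::mutex> guard(lock_);  // acquire; released on return
    return SuperHeap::malloc(sz);
  }
  void free(void* p) {
    std::lock_guard<std::mutex> guard(lock_);
    SuperHeap::free(p);
  }

private:
  std::mutex lock_;
};

// Same composition as on the slide.
class LockedMallocHeap : public LockedHeap<mallocHeap> {};

// Allocates, writes, reads back, and frees through the locked heap.
inline bool locked_example() {
  LockedMallocHeap heap;
  char* s = static_cast<char*>(heap.malloc(6));
  assert(s != nullptr);
  std::memcpy(s, "hello", 6);
  const bool ok = std::strcmp(s, "hello") == 0;
  heap.free(s);
  return ok;
}
```

The C library's malloc is usually thread-safe already, so the lock earns its keep when the parent is a layer with unsynchronized state, such as a freelist.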
Example: Debugging
DebugHeap protects against invalid & multiple frees.

class LockedDebugMallocHeap : public LockedHeap< DebugHeap<mallocHeap> > {};

void free (p) {
  check that p is valid;
  check that p hasn’t been freed before;
  SuperHeap::free (p);
}

Layers (top to bottom): LockedHeap, DebugHeap, mallocHeap
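One way to implement the two checks is to remember every pointer the layer has handed out. The sketch below does that with a `std::unordered_set`; this is an assumed technique for illustration and may differ from how the library's DebugHeap works. The `rejectedFrees()` counter is hypothetical, added so the behaviour can be observed without aborting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Simplified stand-in for a top heap: memory comes from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Sketch of a debugging layer: tracks live pointers and refuses to pass
// invalid or repeated frees down to the parent heap.
template <class SuperHeap>
class DebugHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);
    if (p != nullptr) live_.insert(p);
    return p;
  }
  void free(void* p) {
    if (p == nullptr) return;
    if (live_.erase(p) == 0) {  // not allocated here, or already freed
      ++rejected_;
      return;
    }
    SuperHeap::free(p);
  }
  std::size_t rejectedFrees() const { return rejected_; }

private:
  std::unordered_set<void*> live_;
  std::size_t rejected_ = 0;
};

// One valid free, one double free, one free of a stack address.
inline std::size_t debug_example() {
  DebugHeap<mallocHeap> heap;
  int notFromHeap = 0;
  void* p = heap.malloc(64);
  assert(p != nullptr);
  heap.free(p);             // valid
  heap.free(p);             // double free: rejected
  heap.free(&notFromHeap);  // never allocated by this heap: rejected
  return heap.rejectedFrees();
}
```

A production layer would more likely abort or log at the point of the bad free; counting keeps the sketch easy to test.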
Implementation in Heap Layers
Modular design and implementation
[Diagram: malloc and free enter a stack of layers.]
• SegHeap: select heap based on size
• SizeHeap: add size info to objects
• FreelistHeap: manage objects on freelist
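The three roles can be sketched in a few dozen lines. Everything below is a simplified illustration under stated assumptions, not the library's code: `SizeHeap` stores the size in a header before each object, `FreelistHeap` assumes all of its objects belong to one size class, and `TwoWaySegHeap` is a hypothetical two-class stand-in for SegHeap with a 64-byte threshold chosen arbitrarily.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified stand-in for a top heap: memory comes from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Adds size info: a header in front of each object records its size.
template <class SuperHeap>
class SizeHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    Header* h = static_cast<Header*>(SuperHeap::malloc(sizeof(Header) + sz));
    if (h == nullptr) return nullptr;
    h->size = sz;
    return h + 1;
  }
  void free(void* p) {
    if (p != nullptr) SuperHeap::free(static_cast<Header*>(p) - 1);
  }
  static std::size_t getSize(void* p) { return (static_cast<Header*>(p) - 1)->size; }

private:
  union Header { std::size_t size; std::max_align_t pad; };  // keeps payload aligned
};

// Manages freed objects on a freelist; assumes one size class per instance.
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
  ~FreelistHeap() {  // hand cached objects back to the parent heap
    while (head_ != nullptr) { Node* n = head_; head_ = n->next; SuperHeap::free(n); }
  }
  void* malloc(std::size_t sz) {
    if (head_ != nullptr) { Node* n = head_; head_ = n->next; return n; }
    return SuperHeap::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
  }
  void free(void* p) {
    if (p == nullptr) return;
    Node* n = static_cast<Node*>(p);
    n->next = head_;
    head_ = n;
  }

private:
  struct Node { Node* next; };
  Node* head_ = nullptr;
};

// Hypothetical two-class selector: requests up to Threshold bytes are rounded
// up to Threshold and sent to SmallHeap, everything else goes to BigHeap.
// Both heaps must record sizes the same way so free can route the object.
template <std::size_t Threshold, class SmallHeap, class BigHeap>
class TwoWaySegHeap {
public:
  void* malloc(std::size_t sz) {
    return sz <= Threshold ? small_.malloc(Threshold) : big_.malloc(sz);
  }
  void free(void* p) {
    if (p == nullptr) return;
    if (SmallHeap::getSize(p) <= Threshold) small_.free(p); else big_.free(p);
  }
  static std::size_t getSize(void* p) { return SmallHeap::getSize(p); }

private:
  SmallHeap small_;
  BigHeap big_;
};

using SmallObjectHeap = FreelistHeap<SizeHeap<mallocHeap>>;
using LargeObjectHeap = SizeHeap<mallocHeap>;
using SketchHeap = TwoWaySegHeap<64, SmallObjectHeap, LargeObjectHeap>;

// Checks recorded sizes and that a freed small object is reused.
inline bool seg_example() {
  SketchHeap heap;
  void* a = heap.malloc(20);      // small: rounded up to the 64-byte class
  void* big = heap.malloc(1000);  // large: bypasses the freelist
  assert(a != nullptr && big != nullptr);
  const bool sizesOk = SketchHeap::getSize(a) == 64 && SketchHeap::getSize(big) == 1000;
  heap.free(a);                   // goes onto the freelist
  void* b = heap.malloc(48);      // served from the freelist
  const bool reused = (b == a);
  heap.free(b);
  heap.free(big);
  return sizesOk && reused;
}
```

Each layer stays small because it handles one concern; the type aliases at the bottom are where the allocator is actually designed.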
Experimental Methodology
• Built replacement allocators using heap layers
  – custom allocators:
    • XallocHeap (197.parser), ObstackHeap (176.gcc)
  – general-purpose allocators:
    • KingsleyHeap (BSD allocator)
    • LeaHeap (based on Lea allocator 2.7.0)
  – three weeks to develop
  – 500 lines vs. 2,000 lines in original
• Compared performance with original allocators
  – SPEC benchmarks & standard allocation benchmarks
Experimental Results: Custom Allocation – gcc
[Figure: gcc parse, Obstack vs. ObstackHeap. Normalized runtime for three variants: Macros, No macros, ObstackHeap+malloc.]
Experimental Results: General-Purpose Allocators
[Figure: Runtime (normalized to Lea allocator) for Kingsley, KingsleyHeap, Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop, plus the average.]
Experimental Results: General-Purpose Allocators
[Figure: Space (normalized to Lea allocator) for Kingsley, KingsleyHeap, Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop, plus the average without roboop.]
Conclusion
• Heap layers infrastructure for composing allocators
• Useful experimental infrastructure
• Allows rapid implementation of high-quality allocators
  – custom allocators as fast as originals
  – general-purpose allocators comparable to state-of-the-art in speed and efficiency
A Library of Heap Layers
• Top heaps: mallocHeap, mmapHeap, sbrkHeap
• Building blocks: AdaptHeap, FreelistHeap, CoalesceHeap
• Combining heaps: HybridHeap, TryHeap, SegHeap, StrictSegHeap
• Utility layers: ANSIWrapper, DebugHeap, LockedHeap, PerClassHeap, STLAdapter
Heap Layers as Experimental Infrastructure
• Kingsley allocator averages 50% internal fragmentation
  – what’s the impact of adding coalescing?
• Just add coalescing layer
  – two lines of code!
• Result:
  – almost as memory-efficient as Lea allocator
  – reasonably fast for all but most allocation-intensive apps
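The internal fragmentation comes from the Kingsley allocator rounding each request up to a power-of-two size class. A small worked sketch of that arithmetic follows, assuming pure power-of-two rounding with an 8-byte minimum class and ignoring per-object headers; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Smallest power-of-two class that holds sz bytes. The 8-byte minimum is an
// assumption; sz is assumed to be far below the largest representable size.
inline std::size_t pow2ClassSize(std::size_t sz) {
  std::size_t c = 8;
  while (c < sz) c *= 2;
  return c;
}

// Bytes lost inside the block when a request is rounded up to its class.
inline std::size_t internalWaste(std::size_t sz) {
  return pow2ClassSize(sz) - sz;
}

// Example: a 65-byte request lands in the 128-byte class and wastes 63 bytes,
// while a 64-byte request fits its class exactly.
inline bool waste_example() {
  assert(pow2ClassSize(65) == 128);
  return internalWaste(65) == 63 && internalWaste(64) == 0;
}
```

Requests just above a power of two waste nearly half of their block, which is the cost the coalescing experiment tries to recover.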
[Figure: Runtime: General-Purpose Allocators. Normalized runtime for Kingsley, KingsleyHeap, KingsleyHeap + coal., Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop.]
[Figure: Space: General-Purpose Allocators. Normalized space for Kingsley, KingsleyHeap, KingsleyHeap + coal., Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop.]