Composing High-Performance Memory Allocators with Heap Layers

Upload: emery-berger

Post on 10-May-2015

DESCRIPTION

Heap Layers is a template-based infrastructure for building high-quality, fast memory allocators. The infrastructure is remarkably flexible, and the resulting memory allocators are as fast as or faster than counterparts written in conventional C or C++. We have built several industrial-strength allocators using Heap Layers, including Hoard (which now includes the Heap Layers infrastructure) and DieHard.

TRANSCRIPT

Page 1: Composing High-Performance Memory Allocators with Heap Layers

Composing High-Performance Memory Allocators

Emery Berger, Ben Zorn, Kathryn McKinley

Page 2: Composing High-Performance Memory Allocators with Heap Layers

PLDI 2001 - Composing High-Performance Memory Allocators - Berger, Zorn, McKinley 2

Motivation & Contributions
• Programs increasingly allocation intensive
– spend more than half of runtime in malloc/free
• Programmers require high-performance allocators
– often build own custom allocators
• Heap layers infrastructure for building memory allocators
– composable, extensible, and high-performance
– based on C++ templates
– custom and general-purpose, competitive with state-of-the-art

Page 3: Composing High-Performance Memory Allocators with Heap Layers

Outline
• High-performance memory allocators
– focus on custom allocators
– pros & cons of current practice
• Previous work
• Heap layers
– how it works
– examples
• Experimental results
– custom & general-purpose allocators

Page 4: Composing High-Performance Memory Allocators with Heap Layers

Using Custom Allocators
• Can be very fast:
– Linked lists of objects for highly-used classes
– Region (arena, zone) allocators
• “Best practices” [Meyers 1995, Bulka 2001]
– Used in 3 SPEC2000 benchmarks (parser, gcc, vpr), Apache, PGP, SQLServer, etc.
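To make the region (arena) idea above concrete, here is a minimal sketch of a hand-written region allocator. It is not taken from the talk or from any of the programs listed: the class name `Region`, the default chunk size, and the `chunkCount` helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region (arena) allocator: objects are carved out of large
// chunks by bumping a pointer, and everything is released at once.
class Region {
public:
  explicit Region(std::size_t chunkSize = 4096) : chunkSize_(chunkSize) {}
  Region(const Region&) = delete;
  Region& operator=(const Region&) = delete;
  ~Region() { clear(); }

  void* allocate(std::size_t sz) {
    if (sz == 0) sz = 1;
    // Round up so that every object is aligned for any scalar type.
    const std::size_t align = alignof(std::max_align_t);
    sz = (sz + align - 1) / align * align;
    if (sz > remaining_) {
      // Current chunk is exhausted: get a fresh one from the system.
      const std::size_t n = sz > chunkSize_ ? sz : chunkSize_;
      void* chunk = std::malloc(n);
      if (chunk == nullptr) return nullptr;
      chunks_.push_back(chunk);
      next_ = static_cast<char*>(chunk);
      remaining_ = n;
    }
    void* p = next_;
    next_ += sz;
    remaining_ -= sz;
    return p;
  }

  // Individual frees are not supported; the whole region goes at once.
  void clear() {
    for (void* c : chunks_) std::free(c);
    chunks_.clear();
    next_ = nullptr;
    remaining_ = 0;
  }

  std::size_t chunkCount() const { return chunks_.size(); }

private:
  std::size_t chunkSize_;
  std::vector<void*> chunks_;
  char* next_ = nullptr;
  std::size_t remaining_ = 0;
};

// Usage: two small objects share a chunk; a large one forces a new chunk.
inline void example() {
  Region r(64);
  void* a = r.allocate(10);
  void* b = r.allocate(10);
  assert(a != nullptr && b != nullptr && a != b);
  r.allocate(1000);
  assert(r.chunkCount() == 2);
  r.clear();
  assert(r.chunkCount() == 0);
}
```

Allocation is a pointer bump and deallocation is a single bulk `clear`, which is where the speed comes from. The cost is that no object can be freed on its own.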

Page 5: Composing High-Performance Memory Allocators with Heap Layers

Custom Allocators Work

Using a custom allocator reduces runtime by 60%

[Figure: "197.parser runtime". Bar chart of runtime in seconds (axis 0 to 25) by allocator: custom allocator vs. system allocator (estimated). Each bar is split into memory operations and computation.]

Page 6: Composing High-Performance Memory Allocators with Heap Layers

Problems with Current Practice
• Brittle code
– written from scratch
– macros/monolithic functions to avoid overhead
– hard to write, reuse or maintain
• Excessive fragmentation
– good memory allocators: complicated, not retargetable

Page 7: Composing High-Performance Memory Allocators with Heap Layers

Allocator Conceptual Design

People think & talk about heaps as if they were modular:

[Diagram: malloc and free, with boxes labeled "Select heap based on size", "Manage small objects", "Manage large objects", and "System memory manager".]

Page 8: Composing High-Performance Memory Allocators with Heap Layers

Infrastructure Requirements
• Flexible
– can add functionality
• Reusable
– in other contexts & in same program
• Fast
– very low or no overhead
• High-level
– as component-like as possible

Page 9: Composing High-Performance Memory Allocators with Heap Layers

Possible Solutions

Criteria compared: Flexible, Reusable, Fast, High-level
• Indirect function calls (Vmalloc [Vo 1996])
– function call overhead
– function-pointer assignment
• Object-oriented (CMM [Attardi et al. 1998])
– rigid hierarchy
– virtual method overhead
• Mixins (our approach)

Page 10: Composing High-Performance Memory Allocators with Heap Layers

Ordinary Classes vs. Mixins
• Ordinary classes
– fixed inheritance dag
– can’t rearrange hierarchy
– can’t use class multiple times
• Mixins
– no fixed inheritance dag
– multiple hierarchies possible
– can reuse classes
– fast: static dispatch
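The mixin properties listed above can be shown with a toy example that has nothing to do with allocation. All names here (`Plain`, `Fancy`, `Bracketed`) are invented for this sketch. The point is only that one class template can sit on top of different bases, and on top of itself, with calls resolved at compile time.

```cpp
#include <cassert>
#include <string>

// Hypothetical base classes; the names are illustrative only.
struct Plain { std::string describe() const { return "plain"; } };
struct Fancy { std::string describe() const { return "fancy"; } };

// A mixin: its superclass is a template parameter, so the same class
// can be placed on top of any base, and more than once.
template <class Super>
struct Bracketed : public Super {
  std::string describe() const {
    // Resolved statically: no virtual call is involved.
    return "[" + Super::describe() + "]";
  }
};

// Three different hierarchies built from the same mixin.
using A = Bracketed<Plain>;
using B = Bracketed<Fancy>;
using C = Bracketed<Bracketed<Plain>>;

inline void example() {
  assert(A().describe() == "[plain]");
  assert(B().describe() == "[fancy]");
  assert(C().describe() == "[[plain]]");
}
```

With an ordinary class, `Bracketed` would name one fixed base and could not appear twice in the same chain, as it does in `C`.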

Page 11: Composing High-Performance Memory Allocators with Heap Layers

A Heap Layer

template <class SuperHeap>
class HeapLayer : public SuperHeap {…};

void * malloc (sz) {
  do something;
  void * p = SuperHeap::malloc (sz);
  do something else;
  return p;
}

• Provides malloc and free methods
• “Top heaps” get memory from system
– e.g., mallocHeap uses C library’s malloc and free
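The pseudocode above can be turned into a small compilable sketch. The `mallocHeap` class below is a stand-in written for this example, mirroring the slide's description rather than the library's actual source. `CountingHeap` and its `live` method are hypothetical, chosen only to show a layer that does something before and after calling its superheap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in "top heap": gets memory from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Hypothetical layer: counts objects that are currently allocated.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);   // delegate to the parent heap
    if (p != nullptr) ++live_;         // then do the layer's own work
    return p;
  }
  void free(void* p) {
    if (p != nullptr) --live_;
    SuperHeap::free(p);
  }
  std::size_t live() const { return live_; }

private:
  std::size_t live_ = 0;
};

inline void example() {
  CountingHeap<mallocHeap> heap;
  void* p = heap.malloc(32);
  assert(p != nullptr && heap.live() == 1);
  heap.free(p);
  assert(heap.live() == 0);
}
```

Because `SuperHeap::malloc` is an ordinary non-virtual call on a known base class, a compiler is free to inline it, which is how layering can avoid call overhead.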

Page 12: Composing High-Performance Memory Allocators with Heap Layers

Example: Thread-safety

LockedHeap protects the parent heap with a single lock

void * malloc (sz) {
  acquire lock;
  void * p = SuperHeap::malloc (sz);
  release lock;
  return p;
}

class LockedMallocHeap : public LockedHeap<mallocHeap> {};

[Diagram: LockedHeap layered on mallocHeap]
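A compilable sketch of the locking layer might look like the following. Using `std::mutex` and `std::lock_guard` is an assumption of this example (they postdate the talk), and `mallocHeap` is again a local stand-in, not the library's class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Stand-in "top heap": gets memory from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Sketch of a locking layer: one mutex guards every call into the parent.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    std::lock_guard<std::mutex> guard(lock_);  // acquire lock
    return SuperHeap::malloc(sz);              // released when guard dies
  }
  void free(void* p) {
    std::lock_guard<std::mutex> guard(lock_);
    SuperHeap::free(p);
  }

private:
  std::mutex lock_;
};

// Same composition as on the slide.
class LockedMallocHeap : public LockedHeap<mallocHeap> {};

inline void example() {
  LockedMallocHeap heap;
  void* p = heap.malloc(16);
  assert(p != nullptr);
  heap.free(p);
}
```

The layer knows nothing about its parent beyond `malloc` and `free`, so it can wrap any heap that is not already thread-safe.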

Page 13: Composing High-Performance Memory Allocators with Heap Layers

Example: Debugging

DebugHeap protects against invalid & multiple frees.

void free (p) {
  check that p is valid;
  check that p hasn’t been freed before;
  SuperHeap::free (p);
}

class LockedDebugMallocHeap : public LockedHeap< DebugHeap<mallocHeap> > {};

[Diagram: LockedHeap layered on DebugHeap layered on mallocHeap]
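One way to realize the two checks is to remember every pointer the layer has handed out. The sketch below does that with a `std::unordered_set`, which is an assumption of this example and not necessarily how the real DebugHeap works. It also returns `bool` from `free`, unlike the slide, so that the outcome is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Stand-in "top heap": gets memory from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// Sketch of a debugging layer that tracks live pointers.
// Note: the set allocates through operator new, so this version is
// not suitable as a replacement for the global allocator itself.
template <class SuperHeap>
class DebugHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    void* p = SuperHeap::malloc(sz);
    if (p != nullptr) live_.insert(p);
    return p;
  }
  // Returns false, leaving the parent heap untouched, if p was never
  // returned by this heap or has already been freed.
  bool free(void* p) {
    if (live_.erase(p) == 0) return false;
    SuperHeap::free(p);
    return true;
  }

private:
  std::unordered_set<void*> live_;
};

inline void example() {
  DebugHeap<mallocHeap> heap;
  int onStack = 0;
  void* p = heap.malloc(8);
  assert(p != nullptr);
  assert(heap.free(p));          // first free succeeds
  assert(!heap.free(p));         // double free is rejected
  assert(!heap.free(&onStack));  // invalid free is rejected
}
```

Wrapping this in a locking layer, as in the slide's `LockedDebugMallocHeap`, would also make the bookkeeping set safe to use from several threads.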

Page 14: Composing High-Performance Memory Allocators with Heap Layers

Implementation in Heap Layers

Modular design and implementation

[Diagram: malloc and free enter a stack of layers]
• SegHeap: select heap based on size
• SizeHeap: add size info to objects
• FreelistHeap: manage objects on freelist
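Two of the layers named above are small enough to sketch. The code below is written from the one-line descriptions on the slide, not from the library source, so the header layout, the `getSize` helper, and the single-size-class assumption in `FreelistHeap` are all assumptions. A size-selecting layer such as SegHeap is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Stand-in "top heap": gets memory from the C library.
class mallocHeap {
public:
  void* malloc(std::size_t sz) { return std::malloc(sz); }
  void free(void* p) { std::free(p); }
};

// One max-aligned slot in front of each object keeps the payload aligned.
constexpr std::size_t kHeader = alignof(std::max_align_t);
static_assert(kHeader >= sizeof(std::size_t), "header must hold a size");

// Adds size info to objects.
template <class SuperHeap>
class SizeHeap : public SuperHeap {
public:
  void* malloc(std::size_t sz) {
    char* raw = static_cast<char*>(SuperHeap::malloc(sz + kHeader));
    if (raw == nullptr) return nullptr;
    std::memcpy(raw, &sz, sizeof sz);   // record the requested size
    return raw + kHeader;
  }
  void free(void* p) {
    if (p != nullptr) SuperHeap::free(static_cast<char*>(p) - kHeader);
  }
  static std::size_t getSize(const void* p) {
    assert(p != nullptr);
    std::size_t sz;
    std::memcpy(&sz, static_cast<const char*>(p) - kHeader, sizeof sz);
    return sz;
  }
};

// Manages freed objects on a freelist. Assumes every request to one
// instance has the same size (a single size class).
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
  ~FreelistHeap() {
    while (head_ != nullptr) {          // return cached objects to parent
      Node* n = head_;
      head_ = n->next;
      SuperHeap::free(n);
    }
  }
  void* malloc(std::size_t sz) {
    if (head_ != nullptr) {             // reuse a previously freed object
      Node* n = head_;
      head_ = n->next;
      return n;
    }
    // Every object must be big enough to hold a freelist link.
    return SuperHeap::malloc(sz < sizeof(Node) ? sizeof(Node) : sz);
  }
  void free(void* p) {
    if (p != nullptr) head_ = new (p) Node{head_};  // push onto the list
  }

private:
  struct Node { Node* next; };
  Node* head_ = nullptr;
};

inline void example() {
  FreelistHeap<SizeHeap<mallocHeap>> heap;
  void* a = heap.malloc(24);
  assert(a != nullptr && heap.getSize(a) == 24);
  heap.free(a);
  void* b = heap.malloc(24);
  assert(b == a);                       // served from the freelist
  heap.free(b);
}
```

A full segregated-fits allocator would keep one such freelist per size class and use the recorded size to pick the right list on `free`.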

Page 15: Composing High-Performance Memory Allocators with Heap Layers

Experimental Methodology
• Built replacement allocators using heap layers
– custom allocators:
  • XallocHeap (197.parser), ObstackHeap (176.gcc)
– general-purpose allocators:
  • KingsleyHeap (BSD allocator)
  • LeaHeap (based on Lea allocator 2.7.0)
– three weeks to develop
– 500 lines vs. 2,000 lines in original
• Compared performance with original allocators
– SPEC benchmarks & standard allocation benchmarks

Page 16: Composing High-Performance Memory Allocators with Heap Layers

Experimental Results: Custom Allocation – gcc

[Figure: "gcc parse: Obstack vs. ObstackHeap". Bar chart of normalized runtime (axis 0 to 1.25) for Macros, No macros, and ObstackHeap+malloc.]

Page 17: Composing High-Performance Memory Allocators with Heap Layers

Experimental Results: General-Purpose Allocators

[Figure: "Runtime (normalized to Lea allocator)". Bar chart of normalized runtime (axis 0 to 1.4) per benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop, Average) for Kingsley, KingsleyHeap, Lea, and LeaHeap.]

Page 18: Composing High-Performance Memory Allocators with Heap Layers

Experimental Results: General-Purpose Allocators

[Figure: "Space (normalized to Lea allocator)". Bar chart of normalized space (axis 0 to 2.5) per benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop, Average w/o roboop) for Kingsley, KingsleyHeap, Lea, and LeaHeap.]

Page 19: Composing High-Performance Memory Allocators with Heap Layers

Conclusion
• Heap layers infrastructure for composing allocators
• Useful experimental infrastructure
• Allows rapid implementation of high-quality allocators
– custom allocators as fast as originals
– general-purpose allocators comparable to state-of-the-art in speed and efficiency

Page 21: Composing High-Performance Memory Allocators with Heap Layers

A Library of Heap Layers
• Top heaps
– mallocHeap, mmapHeap, sbrkHeap
• Building-blocks
– AdaptHeap, FreelistHeap, CoalesceHeap
• Combining heaps
– HybridHeap, TryHeap, SegHeap, StrictSegHeap
• Utility layers
– ANSIWrapper, DebugHeap, LockedHeap, PerClassHeap, STLAdapter

Page 22: Composing High-Performance Memory Allocators with Heap Layers

Heap Layers as Experimental Infrastructure
• Kingsley allocator averages 50% internal fragmentation
– what’s the impact of adding coalescing?
• Just add coalescing layer
– two lines of code!
• Result:
– Almost as memory-efficient as Lea allocator
– Reasonably fast for all but most allocation-intensive apps

[Figure: "Runtime: General-Purpose Allocators". Bar chart of normalized runtime (axis 0 to 2) per benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop) for Kingsley, KingsleyHeap, KingsleyHeap + coal., Lea, and LeaHeap.]

[Figure: "Space: General-Purpose Allocators". Bar chart of normalized space (axis 0 to 2.5) per benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop) for Kingsley, KingsleyHeap, KingsleyHeap + coal., Lea, and LeaHeap.]