Composing High-Performance Memory Allocators

Emery Berger, Ben Zorn, Kathryn McKinley
Motivation & Contributions
• Programs increasingly allocation intensive
– spend more than half of runtime in malloc/free
⇒ programmers require high-performance allocators
– often build own custom allocators
• Heap layers infrastructure for building memory allocators
– composable, extensible, and high-performance
– based on C++ templates
– custom and general-purpose, competitive with state-of-the-art
PLDI 2001 - Composing High-Performance Memory Allocators - Berger, Zorn, McKinley
Outline
• High-performance memory allocators
– focus on custom allocators
– pros & cons of current practice
• Previous work
• Heap layers
– how it works
– examples
• Experimental results
– custom & general-purpose allocators
Using Custom Allocators
• Can be very fast:
– Linked lists of objects for highly-used classes
– Region (arena, zone) allocators
• “Best practices” [Meyers 1995, Bulka 2001]
– Used in 3 SPEC2000 benchmarks (parser, gcc, vpr), Apache, PGP, SQLServer, etc.
Custom Allocators Work
[Figure: 197.parser runtime in seconds (axis 0 to 25), split into memory operations and computation, for the custom allocator and the system allocator (estimated)]
Using a custom allocator reduces runtime by 60%
Problems with Current Practice
• Brittle code
– written from scratch
– macros/monolithic functions to avoid overhead
⇒ hard to write, reuse or maintain
• Excessive fragmentation
– good memory allocators: complicated, not retargetable
Allocator Conceptual Design
People think & talk about heaps as if they were modular:
System memory manager
Manage small objects
Manage large objects
Select heap based on size
malloc free
Infrastructure Requirements
• Flexible
– can add functionality
• Reusable
– in other contexts & in same program
• Fast
– very low or no overhead
• High-level
– as component-like as possible
Possible Solutions
Approach                                       Flexible   Reusable          Fast                      High-level
Indirect function calls (Vmalloc [Vo 1996])    ✓          ✓                 function call overhead    function-pointer assignment
Object-oriented (CMM [Attardi et al. 1998])    ✓          rigid hierarchy   virtual method overhead   ✓
Mixins (our approach)                          ✓          ✓                 ✓                         ✓
Ordinary Classes vs. Mixins
• Ordinary classes
– fixed inheritance dag
– can’t rearrange hierarchy
– can’t use class multiple times
• Mixins
– no fixed inheritance dag
– multiple hierarchies possible
– can reuse classes
– fast: static dispatch
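The contrast above can be sketched in C++, where a mixin is a class template that inherits from its own template parameter. The names below (Base, Loud, Bracketed, and the three aliases) are hypothetical and chosen only for illustration. The point is that the same two mixins can be stacked in either order, or one of them twice, and every call is resolved at compile time without virtual methods.

```cpp
#include <string>

// Hypothetical bottom of the hierarchy: an ordinary class.
struct Base {
  std::string name () const { return "Base"; }
};

// A mixin: the superclass is a template parameter, so the
// inheritance graph is chosen at the point of use.
template <class Super>
struct Loud : public Super {
  std::string name () const { return Super::name () + "!"; }
};

template <class Super>
struct Bracketed : public Super {
  std::string name () const { return "[" + Super::name () + "]"; }
};

// Same mixins, different hierarchies; Loud is even reused twice.
using LoudBracketed = Loud<Bracketed<Base>>;   // name() is "[Base]!"
using BracketedLoud = Bracketed<Loud<Base>>;   // name() is "[Base!]"
using DoubleLoud    = Loud<Loud<Base>>;        // name() is "Base!!"
```

Because none of the methods are virtual, the compiler sees the whole call chain and is free to inline it.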
A Heap Layer
• Provides malloc and free methods
• “Top heaps” get memory from system
– e.g., mallocHeap uses C library’s malloc and free

template <class SuperHeap>
class HeapLayer : public SuperHeap {…};

Heap layer:
void * malloc (size_t sz) {
  // do something
  void * p = SuperHeap::malloc (sz);
  // do something else
  return p;
}
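As a minimal sketch of the pattern above, the following pairs a top heap with one layer. mallocHeap follows the slide's description (it forwards to the C library). CountingHeap and its live() method are hypothetical, invented here to stand in for "do something" and "do something else"; they are not taken from the Heap Layers library.

```cpp
#include <cstddef>
#include <cstdlib>

// Top heap: gets memory from the system via the C library.
class mallocHeap {
public:
  void * malloc (std::size_t sz) { return std::malloc (sz); }
  void free (void * p) { std::free (p); }
};

// Hypothetical layer: counts objects that are currently allocated.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
  void * malloc (std::size_t sz) {
    void * p = SuperHeap::malloc (sz);   // delegate to the parent heap
    if (p != nullptr) { ++live_; }       // the layer's own bookkeeping
    return p;
  }
  void free (void * p) {
    if (p != nullptr) { --live_; }
    SuperHeap::free (p);
  }
  std::size_t live () const { return live_; }
private:
  std::size_t live_ = 0;
};
```

A caller would instantiate `CountingHeap<mallocHeap>` and use its malloc and free directly. Swapping in a different top heap only changes the template argument.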
Example: Thread-safety
LockedHeap protects the parent heap with a single lock

class LockedMallocHeap : public LockedHeap<mallocHeap> {};

Layers: mallocHeap, LockedHeap

void * malloc (size_t sz) {
  // acquire lock
  void * p = SuperHeap::malloc (sz);
  // release lock
  return p;
}
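One way to realize the locking layer in modern C++ is sketched below. The use of std::mutex and std::lock_guard is an assumption made for this example; the slide only says "acquire lock" and "release lock" and does not name a lock type. The free method is also an assumption, written by analogy with malloc.

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Top heap, as described on the earlier slide.
class mallocHeap {
public:
  void * malloc (std::size_t sz) { return std::malloc (sz); }
  void free (void * p) { std::free (p); }
};

// Sketch of a locking layer: one mutex guards every call into the parent.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
  void * malloc (std::size_t sz) {
    std::lock_guard<std::mutex> guard (lock_);  // acquire; released on return
    return SuperHeap::malloc (sz);
  }
  void free (void * p) {
    std::lock_guard<std::mutex> guard (lock_);
    SuperHeap::free (p);
  }
private:
  std::mutex lock_;
};

class LockedMallocHeap : public LockedHeap<mallocHeap> {};
```

The lock guard releases the mutex on every exit path, which matches the acquire and release pair in the slide's pseudocode.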
Example: Debugging
DebugHeap protects against invalid & multiple frees.

class LockedDebugMallocHeap : public LockedHeap<DebugHeap<mallocHeap> > {};

Layers: mallocHeap, DebugHeap, LockedHeap

void free (void * p) {
  // check that p is valid
  // check that p hasn’t been freed before
  SuperHeap::free (p);
}
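The two checks in the pseudocode above could be implemented in several ways; the slide does not say how. The sketch below assumes a set of live pointers, which is this example's choice. It also assumes that a bad free is counted and ignored instead of aborting; the badFrees() accessor is hypothetical and exists only to make the behavior observable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Top heap, as described on the earlier slide.
class mallocHeap {
public:
  void * malloc (std::size_t sz) { return std::malloc (sz); }
  void free (void * p) { std::free (p); }
};

// Sketch of a debugging layer (assumed design: track live pointers in a set).
// Note: the set allocates with operator new, not through this heap.
template <class SuperHeap>
class DebugHeap : public SuperHeap {
public:
  void * malloc (std::size_t sz) {
    void * p = SuperHeap::malloc (sz);
    if (p != nullptr) { live_.insert (p); }
    return p;
  }
  void free (void * p) {
    if (p == nullptr) { return; }     // freeing null is a no-op
    // erase() returns 0 when p was never allocated here or was already freed.
    if (live_.erase (p) == 0) {
      ++badFrees_;                    // reject instead of corrupting the parent
      return;
    }
    SuperHeap::free (p);
  }
  std::size_t badFrees () const { return badFrees_; }
private:
  std::unordered_set<void *> live_;
  std::size_t badFrees_ = 0;
};
```

A single set lookup covers both cases, since an invalid pointer and an already-freed pointer are both absent from the live set.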
Implementation in Heap Layers
Modular design and implementation
• FreelistHeap: manage objects on freelist
• SizeHeap: add size info to objects
• SegHeap: select heap based on size
malloc free
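To make two of the layers above concrete, here is a simplified sketch of a freelist layer and a size-recording layer. The layer names come from the slide, but the bodies are assumptions written for this example and are not the library's code. The header layout, the getSize() method, and the destructor that drains the freelist are all illustrative choices. The freelist sketch also assumes every request to one instance has the same size.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

class mallocHeap {
public:
  void * malloc (std::size_t sz) { return std::malloc (sz); }
  void free (void * p) { std::free (p); }
};

// Keeps freed objects on a singly linked list and reuses them first.
// Assumes all requests to one instance are the same size.
template <class SuperHeap>
class FreelistHeap : public SuperHeap {
public:
  ~FreelistHeap () {
    while (head_ != nullptr) {            // return cached objects to the parent
      FreeObject * next = head_->next;
      SuperHeap::free (head_);
      head_ = next;
    }
  }
  void * malloc (std::size_t sz) {
    if (head_ != nullptr) {               // pop a recycled object
      void * p = head_;
      head_ = head_->next;
      return p;
    }
    // Ensure the object is large enough to hold a freelist link later.
    return SuperHeap::malloc (sz < sizeof (FreeObject) ? sizeof (FreeObject) : sz);
  }
  void free (void * p) {
    if (p == nullptr) { return; }
    head_ = new (p) FreeObject{head_};    // push: the link lives in the object
  }
private:
  struct FreeObject { FreeObject * next; };
  FreeObject * head_ = nullptr;
};

// Prepends a header that records each object's requested size.
template <class SuperHeap>
class SizeHeap : public SuperHeap {
public:
  void * malloc (std::size_t sz) {
    void * raw = SuperHeap::malloc (sizeof (Header) + sz);
    if (raw == nullptr) { return nullptr; }
    Header * h = new (raw) Header{sz};
    return h + 1;                         // user memory starts after the header
  }
  void free (void * p) {
    if (p != nullptr) { SuperHeap::free (static_cast<Header *> (p) - 1); }
  }
  static std::size_t getSize (const void * p) {
    return (static_cast<const Header *> (p) - 1)->size;
  }
private:
  // Aligned so that the memory after the header suits any object type.
  struct alignas (std::max_align_t) Header { std::size_t size; };
};
```

Composed as `SizeHeap<FreelistHeap<mallocHeap>>`, a freed object is returned by the next malloc of the same size, and getSize reports the size that was requested.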
Experimental Methodology
• Built replacement allocators using heap layers
  – custom allocators:
    • XallocHeap (197.parser), ObstackHeap (176.gcc)
  – general-purpose allocators:
    • KingsleyHeap (BSD allocator)
    • LeaHeap (based on Lea allocator 2.7.0)
      – three weeks to develop
      – 500 lines vs. 2,000 lines in original
• Compared performance with original allocators
  – SPEC benchmarks & standard allocation benchmarks
Experimental Results:
Custom Allocation – gcc
[Figure: gcc parse, Obstack vs. ObstackHeap. Normalized runtime (axis 0 to 1.25) for three configurations: Macros, No macros, and ObstackHeap + malloc]
Experimental Results:
General-Purpose Allocators
[Figure: runtime normalized to the Lea allocator (axis 0 to 1.4) for Kingsley, KingsleyHeap, Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop, plus the average]
Experimental Results:
General-Purpose Allocators
[Figure: space normalized to the Lea allocator (axis 0 to 2.5) for Kingsley, KingsleyHeap, Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop, plus the average without roboop]
Conclusion
• Heap layers infrastructure for composing allocators
• Useful experimental infrastructure
• Allows rapid implementation of high-quality allocators
– custom allocators as fast as originals
– general-purpose allocators comparable to state-of-the-art in speed and efficiency
A Library of Heap Layers
• Top heaps: mallocHeap, mmapHeap, sbrkHeap
• Building-blocks: AdaptHeap, FreelistHeap, CoalesceHeap
• Combining heaps: HybridHeap, TryHeap, SegHeap, StrictSegHeap
• Utility layers: ANSIWrapper, DebugHeap, LockedHeap, PerClassHeap, STLAdapter
Heap Layers as Experimental Infrastructure
• Kingsley allocator
– averages 50% internal fragmentation
– what’s the impact of adding coalescing?
• Just add coalescing layer
– two lines of code!
• Result:
– Almost as memory-efficient as Lea allocator
– Reasonably fast for all but most allocation-intensive apps
[Figure: Space: General-Purpose Allocators. Normalized space for Kingsley, KingsleyHeap, KingsleyHeap + coal., Lea, and LeaHeap on cfrac, espresso, lindsay, LRUsim, perl, and roboop]
[Figure: Runtime: General-Purpose Allocators. Normalized runtime for the same allocators and benchmarks]