Multiprocessor Memory Allocation
Reconsidering Custom Memory Allocation
Emery Berger, Ben Zorn, Kathryn McKinley
UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE
Custom Memory Allocation
Programmers replace new/delete, bypassing system allocator
Reduce runtime – often
Expand functionality – sometimes
Reduce space – rarely
Very common practice
Apache, gcc, lcc, STL, database servers…
Language-level support in C++
Widely recommended: “Use custom allocators”
Drawbacks of Custom Allocators
Avoiding system allocator:
More code to maintain & debug
Can’t use memory debuggers
Not modular or robust:
Mix memory from custom
and general-purpose allocators → crash!
Increased burden on programmers
Are custom allocators really a win?
Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
(I) Per-Class Allocators
Recycle freed objects from a free list
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
[Diagram: Class1 free list holding a, b, c]
+ Fast
+ Linked list operations
+ Simple
+ Identical semantics
+ C++ language support
- Possibly space-inefficient
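The recycling described on this slide can be sketched in C++ with class-level operator new and operator delete. This is a minimal illustration, not code from any of the benchmarked programs: only the name Class1 comes from the slide, while the payload member, FreeNode, and free_list are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical class; only the name Class1 appears on the slide.
class Class1 {
    // A freed object is reused as a link in the free list.
    struct FreeNode { FreeNode* next; };
    static FreeNode* free_list;

public:
    double payload[2] = {0.0, 0.0};  // assumed contents, large enough to hold a FreeNode

    // Pop a recycled object if one is available, else ask the system allocator.
    static void* operator new(std::size_t size) {
        if (size == sizeof(Class1) && free_list != nullptr) {
            FreeNode* node = free_list;
            free_list = node->next;
            return node;
        }
        return ::operator new(size);
    }

    // Push the object onto the free list instead of returning it to the system.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size != sizeof(Class1)) { ::operator delete(p); return; }
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = free_list;
        free_list = node;
    }
};

Class1::FreeNode* Class1::free_list = nullptr;

static_assert(sizeof(Class1) >= sizeof(void*), "object must be able to hold a list link");
```

With this sketch, the slide's sequence (three allocations, three deletions, three allocations) reuses the same three blocks in last-freed-first order. Memory on the free list is never returned to the system, which is one way the approach can be space-inefficient.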
(II) Custom Patterns
Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)
[Diagram: a char[MEMORY_LIMIT] array holding a, b, c, and later d, with the end_of_array pointer moving after each call]
a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);
+ Fast
+ Pointer-bumping allocation
- Brittle
- Fixed memory size
- Requires stack-like lifetimes
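A stack-like allocator in the spirit of this slide might look like the sketch below. It is not the code from 197.parser: the names xalloc, xfree, MEMORY_LIMIT, and end_of_array come from the slide, while BlockHeader, last_block, the 64 KB limit, and the "mark freed, then pop trailing freed blocks" policy are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;  // assumed size; the slide gives no value
constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t NO_BLOCK = static_cast<std::size_t>(-1);

constexpr std::size_t round_up(std::size_t n) { return (n + ALIGN - 1) / ALIGN * ALIGN; }

// Hypothetical per-block bookkeeping placed in front of each allocation.
struct BlockHeader {
    std::size_t prev;  // offset of the previous block's header, or NO_BLOCK
    bool freed;        // set by xfree; the space is reclaimed only from the end
};
constexpr std::size_t HEADER = round_up(sizeof(BlockHeader));

alignas(std::max_align_t) static unsigned char memory[MEMORY_LIMIT];
static std::size_t end_of_array = 0;        // offset of the first unused byte
static std::size_t last_block = NO_BLOCK;   // offset of the most recent block header

static BlockHeader* header_at(std::size_t offset) {
    return reinterpret_cast<BlockHeader*>(memory + offset);
}

// Pointer-bumping allocation: carve the next block off the end of the array.
void* xalloc(std::size_t size) {
    if (size > MEMORY_LIMIT) return nullptr;
    size = round_up(size);
    if (size > MEMORY_LIMIT - HEADER || end_of_array > MEMORY_LIMIT - HEADER - size) {
        return nullptr;  // fixed memory size: no way to grow
    }
    new (memory + end_of_array) BlockHeader{last_block, false};
    last_block = end_of_array;
    end_of_array += HEADER + size;
    return memory + last_block + HEADER;
}

// Mark the block freed, then move end_of_array back over every trailing freed block.
void xfree(void* p) {
    if (p == nullptr) return;
    std::size_t offset =
        static_cast<std::size_t>(static_cast<unsigned char*>(p) - memory) - HEADER;
    header_at(offset)->freed = true;
    while (last_block != NO_BLOCK && header_at(last_block)->freed) {
        end_of_array = last_block;
        last_block = header_at(last_block)->prev;
    }
}
```

Under these assumptions, the slide's sequence behaves as drawn: xfree(b) reclaims nothing because c still sits above it, xfree(c) pops both c and b, and d then lands where b used to be. A block freed in the middle of long-lived neighbors stays unusable, which is why the scheme depends on stack-like lifetimes.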
(III) Regions
Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
[Diagram: region r]
+ Fast
+ Pointer-bumping allocation
+ Deletion of chunks
+ Convenient
+ One call frees all memory
- Risky
- Dangling references
- Too much space
Increasingly popular custom allocator
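The three region calls named on this slide could be sketched as follows. The function names regioncreate, regionmalloc, and regiondelete come from the slide; the Region and Chunk types, the by-reference signatures, and the 4096-byte chunk size are assumptions for illustration and do not reflect any particular region library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t CHUNK_BYTES = 4096;  // assumed chunk size

constexpr std::size_t round_up(std::size_t n) { return (n + ALIGN - 1) / ALIGN * ALIGN; }

// Hypothetical chunk header; object storage follows it in the same malloc block.
struct Chunk {
    Chunk* next;
    std::size_t capacity;
    std::size_t used;
};
constexpr std::size_t CHUNK_HEADER = round_up(sizeof(Chunk));

struct Region {
    Chunk* chunks;  // most recently added chunk first
};

void regioncreate(Region& r) { r.chunks = nullptr; }

// Pointer-bumping allocation inside the current chunk; grab a new chunk when full.
void* regionmalloc(Region& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = round_up(sz == 0 ? 1 : sz);
    if (r.chunks == nullptr || r.chunks->used + sz > r.chunks->capacity) {
        std::size_t capacity = sz > CHUNK_BYTES ? sz : CHUNK_BYTES;
        void* raw = std::malloc(CHUNK_HEADER + capacity);
        if (raw == nullptr) return nullptr;
        r.chunks = new (raw) Chunk{r.chunks, capacity, 0};
    }
    unsigned char* base = reinterpret_cast<unsigned char*>(r.chunks) + CHUNK_HEADER;
    void* p = base + r.chunks->used;
    r.chunks->used += sz;
    return p;
}

// Deletion only en masse: one call releases every chunk, and so every object.
void regiondelete(Region& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
}
```

There is deliberately no per-object free in this sketch. Any pointer into the region that outlives regiondelete is dangling, and memory held by dead objects is not reusable until the whole region goes away.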
Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
Custom Allocators Are Faster…
[Chart: Runtime - Custom Allocator Benchmarks. Normalized runtime, Custom vs. Win32, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped under the labels "non-regions" and "regions".]
As good as and sometimes much faster than Win32
Not So Fast…
[Chart: Runtime - Custom Allocator Benchmarks. Normalized runtime, Custom vs. Win32 vs. DLmalloc, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped under the labels "non-regions" and "regions".]
DLmalloc: as fast or faster for most benchmarks
The Lea Allocator (DLmalloc 2.7.0)
Mature public-domain general-purpose allocator
Optimized for common allocation patterns
Deferred coalescing (combining adjacent free objects)
Per-size quicklists ≈ per-class allocation
Highly optimized fastpath
Space-efficient
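To see why per-size quicklists resemble per-class allocation, consider the toy sketch below. It is not DLmalloc's code and does not model its coalescing or bin structure: quick_malloc, quick_free, the 16-byte granule, and the 32 size classes are all hypothetical choices. Freed blocks are pushed on a list for their size class without being combined with neighbors, and the next request of that size pops one back off.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t GRANULE = 16;      // assumed size-class spacing
constexpr std::size_t NUM_CLASSES = 32;  // assumed: classes 1..31 get a quicklist

struct Header { std::size_t size_class; };  // kept in front of every block
struct FreeNode { FreeNode* next; };        // overlays the payload of a freed block
constexpr std::size_t HEADER = (sizeof(Header) + ALIGN - 1) / ALIGN * ALIGN;

static FreeNode* quicklists[NUM_CLASSES] = {};

void* quick_malloc(std::size_t n) {
    if (n > static_cast<std::size_t>(-1) / 2) return nullptr;
    std::size_t cls = (n == 0) ? 1 : (n - 1) / GRANULE + 1;
    // Fast path: reuse a block of the same size class, as a per-class allocator would.
    if (cls < NUM_CLASSES && quicklists[cls] != nullptr) {
        FreeNode* node = quicklists[cls];
        quicklists[cls] = node->next;
        return node;  // the header in front of the payload still records cls
    }
    // Slow path: stand-in for the general allocation routine.
    void* raw = std::malloc(HEADER + cls * GRANULE);
    if (raw == nullptr) return nullptr;
    new (raw) Header{cls};
    return static_cast<unsigned char*>(raw) + HEADER;
}

void quick_free(void* p) {
    if (p == nullptr) return;
    unsigned char* raw = static_cast<unsigned char*>(p) - HEADER;
    std::size_t cls = reinterpret_cast<Header*>(raw)->size_class;
    if (cls < NUM_CLASSES) {
        // Defer any merging with neighbors: just remember the block by size.
        quicklists[cls] = new (p) FreeNode{quicklists[cls]};
    } else {
        std::free(raw);  // large blocks bypass the quicklists
    }
}
```

Because every class whose objects share a size shares a quicklist, a general-purpose allocator with this kind of fast path gets much of the benefit of a hand-written per-class free list without any change to the program.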
Space Consumption: Mixed Results
[Chart: Space - Custom Allocator Benchmarks. Normalized space, Custom vs. DLmalloc, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped under the labels "non-regions" and "regions".]
Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
Regions – Pros and Cons
+ Fast, convenient, etc.
+ Avoid resource leaks (e.g., Apache)
Tear down memory for terminated connections
- No individual object deletion
Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)
Apache: vulnerable to DoS, memory leaks
Reap Hybrid Allocator
Reap = region + heap
Adds individual object deletion & heap
reapcreate(r)
reapmalloc(r, sz)
reapfree(r,p)
reapdelete(r)
[Diagram: reap r]
+ Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
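One way to picture the region-plus-heap idea is the sketch below. It is not the Reap implementation from the Heap Layers distribution: the four function names come from the slide, while the Reap, Chunk, ObjectHeader, and FreeObject types, the first-fit free list, and the chunk size are assumptions for illustration. Allocation bumps a pointer as in a region until something is freed; freed objects are then reused heap-style, and reapdelete still releases everything at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t CHUNK_BYTES = 4096;  // assumed chunk size
constexpr std::size_t round_up(std::size_t n) { return (n + ALIGN - 1) / ALIGN * ALIGN; }

struct Chunk { Chunk* next; std::size_t capacity; std::size_t used; };
struct ObjectHeader { std::size_t size; };  // payload size, in front of every object
struct FreeObject { FreeObject* next; };    // overlays the payload of a freed object
constexpr std::size_t CHUNK_HEADER = round_up(sizeof(Chunk));
constexpr std::size_t OBJECT_HEADER = round_up(sizeof(ObjectHeader));

struct Reap {
    Chunk* chunks;      // region part: bump-allocated chunks
    FreeObject* freed;  // heap part: individually freed objects
};

void reapcreate(Reap& r) { r.chunks = nullptr; r.freed = nullptr; }

static std::size_t payload_size(void* p) {
    return reinterpret_cast<ObjectHeader*>(static_cast<unsigned char*>(p) - OBJECT_HEADER)->size;
}

void* reapmalloc(Reap& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;
    sz = round_up(sz == 0 ? 1 : sz);
    // Heap style: first fit over objects released with reapfree.
    for (FreeObject** link = &r.freed; *link != nullptr; link = &(*link)->next) {
        FreeObject* candidate = *link;
        if (payload_size(candidate) >= sz) {
            *link = candidate->next;
            return candidate;
        }
    }
    // Region style: bump a pointer, adding a chunk when the current one is full.
    std::size_t need = OBJECT_HEADER + sz;
    if (r.chunks == nullptr || r.chunks->used + need > r.chunks->capacity) {
        std::size_t capacity = need > CHUNK_BYTES ? need : CHUNK_BYTES;
        void* raw = std::malloc(CHUNK_HEADER + capacity);
        if (raw == nullptr) return nullptr;
        r.chunks = new (raw) Chunk{r.chunks, capacity, 0};
    }
    unsigned char* slot =
        reinterpret_cast<unsigned char*>(r.chunks) + CHUNK_HEADER + r.chunks->used;
    r.chunks->used += need;
    new (slot) ObjectHeader{sz};
    return slot + OBJECT_HEADER;
}

// Individual deletion: remember the object so a later reapmalloc can reuse it.
void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    r.freed = new (p) FreeObject{r.freed};
}

// Region-style deletion: release every chunk, live or freed objects alike.
void reapdelete(Reap& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
    r.freed = nullptr;
}
```

A program that never calls reapfree pays only the region cost in this sketch, while one that frees objects as it goes keeps its footprint bounded by its live data plus fragmentation.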
Reap Runtime
[Chart: Runtime - Custom Allocation Benchmarks. Normalized runtime, Custom vs. Win32 vs. DLmalloc vs. Reap, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped under the labels "non-regions" and "regions".]
Reap Space
[Chart: Space - Custom Allocator Benchmarks. Normalized space, Custom vs. DLmalloc vs. Reap, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped under the labels "non-regions" and "regions".]
Reap: Best of Both Worlds
Allows mixing of regions and new/delete
Case study:
New Apache module “mod_bc”
bc: C-based arbitrary-precision calculator
Changed 20 lines out of 8000
Benchmark: compute 1000th prime
With Reap: 240K
Without Reap: 7.4MB
Conclusions and Future Work
Empirical study of custom allocators
Lea allocator often as fast or faster
Non-region custom allocation ineffective
Reap: region performance without drawbacks
Future work:
Reduce space with per-page bitmaps
Combine with scalable general-purpose allocator (e.g., Hoard)
Software
http://www.cs.umass.edu/~emery
(Reap: part of Heap Layers distribution)
http://g.oswego.edu
(DLmalloc 2.7.0)
If You Can Read This,
I Went Too Far
Backup Slides
Experimental Methodology
Comparing to general-purpose allocators
Same semantics: no problem
E.g., disable per-class allocators
Different semantics: use emulator
Uses general-purpose allocator
Adds bookkeeping to support region semantics
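The emulator idea can be illustrated with a few lines of C++. This is a guess at the general shape rather than the emulator used in the study: EmulatedRegion and its vector of pointers are hypothetical, and std::malloc stands in for whichever general-purpose allocator is being measured. Each allocation goes to the general-purpose allocator, and the region only records what must be freed later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bookkeeping: the region is just a record of its objects.
struct EmulatedRegion {
    std::vector<void*> objects;
};

void regioncreate(EmulatedRegion& r) { r.objects.clear(); }

// Every object comes from the general-purpose allocator.
void* regionmalloc(EmulatedRegion& r, std::size_t sz) {
    r.objects.push_back(nullptr);  // reserve the bookkeeping slot first
    void* p = std::malloc(sz == 0 ? 1 : sz);
    if (p == nullptr) {
        r.objects.pop_back();
        return nullptr;
    }
    r.objects.back() = p;
    return p;
}

// Region semantics: one call frees every object allocated in the region.
void regiondelete(EmulatedRegion& r) {
    for (void* p : r.objects) std::free(p);
    r.objects.clear();
}
```

The program under test keeps calling the same three region functions, so swapping in the emulator lets the general-purpose allocator's time and space be measured on a region-based program, at the price of the extra bookkeeping.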
Why Did They Do That?
Recommended practice
Premature optimization
Drift
Microbenchmarks vs. actual performance
Not bottleneck anymore
Improved competition
Modern allocators are better
Reaps as Regions: Runtime
[Chart: Runtime - Region-Based Benchmarks. Normalized runtime, Custom vs. Win32 vs. DLmalloc vs. Reap, for lcc and mudlle.]
Reap performance nearly matches regions
Using Reap as Regions
[Chart: Runtime - Region-Based Benchmarks. Normalized runtime, Original vs. Win32 vs. DLmalloc vs. WinHeap vs. Vmalloc vs. Reap, for lcc and mudlle; one off-scale bar is labeled 4.08.]
Reap performance nearly matches regions
Drawbacks of Regions
Can’t reclaim memory within regions
Bad for long-running computations,
producer-consumer patterns,
“malloc/free” programs
unbounded memory consumption
Current situation for Apache:
vulnerable to denial-of-service
limits runtime of connections
limits module programming
Use Custom Allocators?
Strongly recommended by practitioners
Little hard data on performance/space improvements
Only one previous study [Zorn 1992]
Focused on just one type of allocator
Custom allocators: waste of time
Small gains, bad allocators
Different allocators better? Trade-offs?
Kinds of Custom Allocators
Three basic types of custom allocators
Per-class
Fast
Custom patterns
Fast, but very special-purpose
Regions
Fast, possibly more space-efficient
Convenient
Variants: nested, obstacks
Optimization Opportunity
[Chart: Time Spent in Memory Operations. Percentage of runtime (0 to 100) split into Memory Operations and Other, for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, and Average.]