Multiprocessor Memory Allocation

Reconsidering Custom Memory Allocation
Emery Berger, Ben Zorn, Kathryn McKinley
UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE
Custom Memory Allocation

Programmers replace new/delete, bypassing the system allocator
  Reduce runtime: often
  Expand functionality: sometimes
  Reduce space: rarely
Very common practice
  Apache, gcc, lcc, STL, database servers…
  Language-level support in C++
Widely recommended
  “Use custom allocators”
Drawbacks of Custom Allocators

Avoiding system allocator:
  More code to maintain & debug
  Can’t use memory debuggers
  Not modular or robust:
    Mix memory from custom and general-purpose allocators → crash!
    Increased burden on programmers
Are custom allocators really a win?
Overview

Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps: generalization of regions & heaps
(I) Per-Class Allocators

Recycle freed objects from a free list

  a = new Class1;
  b = new Class1;
  c = new Class1;
  delete a;
  delete b;
  delete c;
  a = new Class1;
  b = new Class1;
  c = new Class1;

[Figure: the Class1 free list, holding a, b, and c after the deletes]

+ Fast
    + Linked list operations
+ Simple
    + Identical semantics
    + C++ language support
- Possibly space-inefficient
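
The free-list recycling described on this slide can be sketched with C++ class-level operator new and operator delete. This is a minimal illustration, not code from any of the benchmarks: the members of Class1, the FreeNode type, and the free_count() helper are assumptions added for the example, and the sketch is single-threaded and never returns recycled blocks to the system.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Link stored inside a freed block while it waits on the free list.
// (Hypothetical helper type; not named on the slide.)
struct FreeNode {
    FreeNode* next;
};

// "Class1" is the slide's name; its members here are placeholders.
class Class1 {
public:
    double value[2] = {0.0, 0.0};

    // Class-level operator new: reuse a freed block if one is waiting,
    // otherwise ask the general-purpose allocator for a fresh one.
    static void* operator new(std::size_t size) {
        assert(size == sizeof(Class1));  // this sketch ignores derived classes
        if (free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            --free_count_;
            return node;
        }
        return ::operator new(size);
    }

    // Class-level operator delete: push the block onto the free list
    // instead of handing it back to the general-purpose allocator.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        FreeNode* node = ::new (p) FreeNode{free_list_};
        free_list_ = node;
        ++free_count_;
    }

    // Number of blocks currently parked on the free list.
    static std::size_t free_count() { return free_count_; }

private:
    static FreeNode* free_list_;
    static std::size_t free_count_;
};

FreeNode* Class1::free_list_ = nullptr;
std::size_t Class1::free_count_ = 0;

static_assert(sizeof(Class1) >= sizeof(FreeNode),
              "a freed block must be large enough to hold the list link");
```

Client code keeps writing new Class1 and delete as usual, which is the "identical semantics" point. The space cost comes from blocks that stay on the free list even when the program no longer needs them.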
(II) Custom Patterns

Tailor-made to fit allocation patterns
Example: 197.parser (natural language parser)

  a = xalloc(8);
  b = xalloc(16);
  c = xalloc(8);
  xfree(b);
  xfree(c);
  d = xalloc(8);

[Figure: a char[MEMORY_LIMIT] array holding blocks a, b/d, and c, with the end_of_array pointer shown after each step]

+ Fast
    + Pointer-bumping allocation
- Brittle
    - Fixed memory size
    - Requires stack-like lifetimes
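
A stack-like pointer-bumping allocator in the spirit of this slide might look like the sketch below. It is not the 197.parser source: the names xalloc, xfree, MEMORY_LIMIT, and end_of_array come from the slide, while the BlockHeader layout, the top_block variable, the arena size, and the policy of popping every free block at the top are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;  // assumed arena size
constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Bookkeeping placed in front of every block (hypothetical layout).
struct alignas(std::max_align_t) BlockHeader {
    std::size_t prev;  // offset of the previous block's header, or kNone
    bool in_use;
};

alignas(std::max_align_t) static unsigned char arena[MEMORY_LIMIT];
static std::size_t end_of_array = 0;   // offset of the first unused byte
static std::size_t top_block = kNone;  // offset of the newest block's header

static BlockHeader* header_at(std::size_t offset) {
    return reinterpret_cast<BlockHeader*>(arena + offset);
}

// Allocate by bumping end_of_array; returns nullptr when the array is full.
void* xalloc(std::size_t size) {
    if (size > MEMORY_LIMIT) return nullptr;
    const std::size_t payload = (size + kAlign - 1) / kAlign * kAlign;
    if (sizeof(BlockHeader) + payload > MEMORY_LIMIT - end_of_array) return nullptr;
    ::new (arena + end_of_array) BlockHeader{top_block, true};
    top_block = end_of_array;
    end_of_array += sizeof(BlockHeader) + payload;
    return arena + top_block + sizeof(BlockHeader);
}

// Mark a block free, then pop every free block sitting at the top.
// Memory below a live block is not reclaimed until that block is freed.
void xfree(void* p) {
    if (p == nullptr) return;
    const std::size_t offset =
        static_cast<std::size_t>(static_cast<unsigned char*>(p) - arena) -
        sizeof(BlockHeader);
    assert(header_at(offset)->in_use);  // catches double frees in debug builds
    header_at(offset)->in_use = false;
    while (top_block != kNone && !header_at(top_block)->in_use) {
        end_of_array = top_block;
        top_block = header_at(top_block)->prev;
    }
}
```

With the slide's sequence, xfree(b) reclaims nothing because c still sits above b; xfree(c) then pops both c and b, so d lands where b was. Lifetimes that are not stack-like leave freed space stranded below live blocks, which is why the approach is brittle.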
(III) Regions

Separate areas, deletion only en masse

  regioncreate(r)
  regionmalloc(r, sz)
  regiondelete(r)

+ Fast
    + Pointer-bumping allocation
    + Deletion of chunks
+ Convenient
    + One call frees all memory
- Risky
    - Dangling references
    - Too much space

Increasingly popular custom allocator
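
A region with the three calls named on this slide could be sketched as follows. The function names come from the slide; the Region and Chunk types, the reference-parameter signatures, and the 4096-byte chunk size are assumptions for the example, not the interface of any particular region library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kChunkBytes = 4096;  // assumed default chunk payload

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Header at the front of each chunk obtained from malloc (hypothetical).
struct Chunk {
    Chunk* next;
    std::size_t capacity;  // payload bytes in this chunk
    std::size_t used;      // payload bytes handed out so far
};
constexpr std::size_t kChunkHeader = round_up(sizeof(Chunk));

struct Region {
    Chunk* chunks = nullptr;  // newest chunk first
};

void regioncreate(Region& r) { r.chunks = nullptr; }

// Pointer-bumping allocation inside the newest chunk; a new chunk is
// fetched when the current one cannot hold the request.
void* regionmalloc(Region& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;  // avoid overflow
    const std::size_t need = round_up(sz == 0 ? 1 : sz);
    Chunk* c = r.chunks;
    if (c == nullptr || c->capacity - c->used < need) {
        const std::size_t capacity = need > kChunkBytes ? need : kChunkBytes;
        void* raw = std::malloc(kChunkHeader + capacity);
        if (raw == nullptr) return nullptr;
        c = ::new (raw) Chunk{r.chunks, capacity, 0};
        r.chunks = c;
    }
    assert(c->used + need <= c->capacity);
    void* p = reinterpret_cast<unsigned char*>(c) + kChunkHeader + c->used;
    c->used += need;  // the pointer bump
    return p;
}

// Deletion en masse: release whole chunks, never individual objects.
void regiondelete(Region& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
}
```

Any pointer obtained from regionmalloc dangles after regiondelete, and nothing can be released earlier, which corresponds to the two risks listed on the slide.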
Custom Allocators Are Faster…

[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom vs. Win32 for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]

As good as and sometimes much faster than Win32
Not So Fast…

[Figure: Runtime - Custom Allocator Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom, Win32, and DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]

DLmalloc: as fast or faster for most benchmarks
The Lea Allocator (DLmalloc 2.7.0)

Mature public-domain general-purpose allocator
Optimized for common allocation patterns
  Deferred coalescing (combining adjacent free objects)
  Per-size quicklists ≈ per-class allocation
Highly optimized fastpath
Space-efficient
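
To see why per-size quicklists behave much like per-class free lists, consider the sketch below. It is not DLmalloc's code: the QuickLists class, the 16-byte size classes, the 256-byte cutoff, and the sized deallocate call are assumptions chosen to keep the example short. Freed blocks are parked on a list for their size class without being coalesced, and the next request of a similar size pops one straight off that list.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical per-size quicklist front end over malloc/free.
class QuickLists {
public:
    static constexpr std::size_t kGranule = 16;    // assumed size-class width
    static constexpr std::size_t kMaxQuick = 256;  // larger requests bypass the lists

    QuickLists() = default;
    QuickLists(const QuickLists&) = delete;
    QuickLists& operator=(const QuickLists&) = delete;

    // Return every parked block to the underlying allocator.
    ~QuickLists() {
        for (Node*& bin : bins_) {
            while (bin != nullptr) {
                Node* next = bin->next;
                std::free(bin);
                bin = next;
            }
        }
    }

    void* allocate(std::size_t size) {
        if (size == 0) size = 1;
        if (size > kMaxQuick) return std::malloc(size);
        const std::size_t bin = bin_for(size);
        if (bins_[bin] != nullptr) {  // fast path: pop a recycled block
            Node* node = bins_[bin];
            bins_[bin] = node->next;
            return node;
        }
        return std::malloc((bin + 1) * kGranule);  // full size-class width
    }

    // The caller passes the size it originally requested.
    void deallocate(void* p, std::size_t size) {
        if (p == nullptr) return;
        if (size == 0) size = 1;
        if (size > kMaxQuick) {
            std::free(p);
            return;
        }
        const std::size_t bin = bin_for(size);
        Node* node = static_cast<Node*>(p);
        node->next = bins_[bin];  // deferred coalescing: just park the block
        bins_[bin] = node;
    }

private:
    struct Node {
        Node* next;
    };

    static std::size_t bin_for(std::size_t size) {
        assert(size >= 1 && size <= kMaxQuick);
        return (size - 1) / kGranule;
    }

    std::array<Node*, kMaxQuick / kGranule> bins_{};
};
```

Objects of one class all have the same size, so they share a bin, and the fast path does the same push and pop as a hand-written per-class free list.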
Space Consumption: Mixed Results

[Figure: Space - Custom Allocator Benchmarks. Normalized space (axis 0 to 1.75) of Custom vs. DLmalloc for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Regions – Pros and Cons
+ Fast, convenient, etc.
+ Avoid resource leaks (e.g., Apache)
    Tear down memory for terminated connections
- No individual object deletion
    Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)
    Apache: vulnerable to DoS, memory leaks
Reap Hybrid Allocator

Reap = region + heap
  Adds individual object deletion & heap

  reapcreate(r)
  reapmalloc(r, sz)
  reapfree(r,p)
  reapdelete(r)

+ Can reduce memory consumption
+ Fast
    + Adapts to use (region or heap style)
    + Cheap deletion
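
One way to picture a reap is a region whose objects carry a small header, plus a free list that individually freed objects are pushed onto. The sketch below follows that idea under stated assumptions: the four function names come from the slide, but the Reap, Chunk, and ObjectHeader types, the first-fit search, and the rule "serve from the free list first, otherwise bump" are illustrative choices, not the Heap Layers implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t kAlign = alignof(std::max_align_t);
constexpr std::size_t kChunkBytes = 4096;  // assumed default chunk payload

constexpr std::size_t round_up(std::size_t n) {
    return (n + kAlign - 1) / kAlign * kAlign;
}

// Per-object metadata that makes individual deletion possible (hypothetical).
struct ObjectHeader {
    std::size_t size;         // payload bytes
    ObjectHeader* next_free;  // link used only while the object is free
};
struct Chunk {
    Chunk* next;
    std::size_t capacity;
    std::size_t used;
};
constexpr std::size_t kObjectHeader = round_up(sizeof(ObjectHeader));
constexpr std::size_t kChunkHeader = round_up(sizeof(Chunk));

struct Reap {
    Chunk* chunks = nullptr;            // region part
    ObjectHeader* free_list = nullptr;  // heap part
};

void reapcreate(Reap& r) {
    r.chunks = nullptr;
    r.free_list = nullptr;
}

void* reapmalloc(Reap& r, std::size_t sz) {
    if (sz > static_cast<std::size_t>(-1) / 2) return nullptr;  // avoid overflow
    const std::size_t need = round_up(sz == 0 ? 1 : sz);
    // Heap style: first fit among individually freed objects.
    for (ObjectHeader** link = &r.free_list; *link != nullptr;
         link = &(*link)->next_free) {
        ObjectHeader* h = *link;
        if (h->size >= need) {
            *link = h->next_free;
            return reinterpret_cast<unsigned char*>(h) + kObjectHeader;
        }
    }
    // Region style: pointer-bumping in the newest chunk.
    const std::size_t total = kObjectHeader + need;
    Chunk* c = r.chunks;
    if (c == nullptr || c->capacity - c->used < total) {
        const std::size_t capacity = total > kChunkBytes ? total : kChunkBytes;
        void* raw = std::malloc(kChunkHeader + capacity);
        if (raw == nullptr) return nullptr;
        c = ::new (raw) Chunk{r.chunks, capacity, 0};
        r.chunks = c;
    }
    unsigned char* base =
        reinterpret_cast<unsigned char*>(c) + kChunkHeader + c->used;
    c->used += total;
    ::new (base) ObjectHeader{need, nullptr};
    return base + kObjectHeader;
}

// Individual deletion: park the object for reuse by later reapmalloc calls.
void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    ObjectHeader* h = reinterpret_cast<ObjectHeader*>(
        static_cast<unsigned char*>(p) - kObjectHeader);
    assert(h->size > 0);
    h->next_free = r.free_list;
    r.free_list = h;
}

// Region-style deletion: one call frees everything, freed or not.
void reapdelete(Reap& r) {
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
    r.free_list = nullptr;
}
```

A program that never calls reapfree only ever takes the pointer-bumping path, so the reap behaves like a plain region; a program that frees objects gets their space back without waiting for reapdelete.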
Reap Runtime

[Figure: Runtime - Custom Allocation Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom, Win32, DLmalloc, and Reap for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Reap Space

[Figure: Space - Custom Allocator Benchmarks. Normalized space (axis 0 to 1.75) of Custom, DLmalloc, and Reap for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, and mudlle, grouped into non-regions and regions.]
Reap: Best of Both Worlds

Allows mixing of regions and new/delete
Case study: new Apache module “mod_bc”
  bc: C-based arbitrary-precision calculator
  Changed 20 lines out of 8000
  Benchmark: compute 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
Conclusions and Future Work

Empirical study of custom allocators
  Lea allocator often as fast or faster
  Non-region custom allocation ineffective
  Reap: region performance without drawbacks
Future work:
  Reduce space with per-page bitmaps
  Combine with scalable general-purpose allocator (e.g., Hoard)
Software
http://www.cs.umass.edu/~emery
(Reap: part of Heap Layers distribution)
http://g.oswego.edu
(DLmalloc 2.7.0)
If You Can Read This,
I Went Too Far
Backup Slides
Experimental Methodology

Comparing to general-purpose allocators
  Same semantics: no problem
    E.g., disable per-class allocators
  Different semantics: use emulator
    Uses general-purpose allocator
    Adds bookkeeping to support region semantics
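
A region emulator of the kind described here can be very small. The sketch below is an assumption about its general shape, not the emulator used in the study: every object comes from the general-purpose allocator (std::malloc), and the only bookkeeping is a per-region list of pointers so that regiondelete can free them all. The EmulatedRegion type and the std::vector are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical emulated region: a list of objects owned by the region.
struct EmulatedRegion {
    std::vector<void*> objects;  // bookkeeping: every object in the region
};

void regioncreate(EmulatedRegion& r) { r.objects.clear(); }

// Each object is an ordinary malloc block, recorded for later release.
void* regionmalloc(EmulatedRegion& r, std::size_t sz) {
    r.objects.push_back(nullptr);  // reserve the slot first (may throw)
    void* p = std::malloc(sz == 0 ? 1 : sz);
    if (p == nullptr) {
        r.objects.pop_back();
        return nullptr;
    }
    r.objects.back() = p;
    return p;
}

// Region semantics: one call frees everything allocated in the region.
void regiondelete(EmulatedRegion& r) {
    for (void* p : r.objects) std::free(p);
    r.objects.clear();
    assert(r.objects.empty());
}
```

Because the program's calls stay the same, swapping this emulator in lets a region-based benchmark run on top of any general-purpose allocator for comparison.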

Why Did They Do That?

Recommended practice
Premature optimization
Drift
Microbenchmarks vs. actual performance
Not bottleneck anymore
Improved competition
Modern allocators are better
Reaps as Regions: Runtime

[Figure: Runtime - Region-Based Benchmarks. Normalized runtime (axis 0 to 1.75) of Custom, Win32, DLmalloc, and Reap for lcc and mudlle.]

Reap performance nearly matches regions
Using Reap as Regions

[Figure: Runtime - Region-Based Benchmarks. Normalized runtime (axis 0 to 2.5, one bar labeled 4.08) of Original, Win32, DLmalloc, WinHeap, Vmalloc, and Reap for lcc and mudlle.]

Reap performance nearly matches regions
Drawbacks of Regions

Can’t reclaim memory within regions
  Bad for long-running computations, producer-consumer patterns, “malloc/free” programs
  Unbounded memory consumption
Current situation for Apache:
  Vulnerable to denial-of-service
  Limits runtime of connections
  Limits module programming
Use Custom Allocators?

Strongly recommended by practitioners
Little hard data on performance/space improvements
  Only one previous study [Zorn 1992]
  Focused on just one type of allocator
Custom allocators: waste of time
  Small gains, bad allocators
Different allocators better? Trade-offs?
Kinds of Custom Allocators

Three basic types of custom allocators
  Per-class
    Fast
  Custom patterns
    Fast, but very special-purpose
  Regions
    Fast, possibly more space-efficient
    Convenient
    Variants: nested, obstacks
Optimization Opportunity

[Figure: Time Spent in Memory Operations. Percentage of runtime (axis 0 to 100) split into Memory Operations and Other for 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, and the average.]