The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.


The Compressor:
Concurrent, Incremental
and Parallel Compaction.
Haim Kermany and Erez Petrank
Technion – Israel Institute of Technology
1
The Compressor





The first compactor with one heap pass.
Fully compacts all the objects in the heap.
Preserves the order of the objects.
Low space overhead.
A parallel version and a concurrent version.
2
Garbage collection

Automatic memory management.


The user allocates objects.
The memory manager reclaims objects that "are not needed anymore".
In practice: objects unreachable from the roots.
3
Modern Platforms and
Requirements


High performance and low pauses.
SMP and Multicore platforms:


Use parallel collectors for highest efficiency.
Use concurrent collectors for short pauses.
[Diagram: execution timelines contrasting a parallel (STW) collector, which gives high throughput, with a concurrent & parallel collector, which gives short pauses.]
4
Main Streams in GC

Mark and Sweep:
Trace objects.
Go over the heap and reclaim the unmarked objects.
Reference Counting:
Keep the number of pointers to each object.
When an object's counter reaches zero, reclaim the object.
Copying:
Divide the heap into two spaces.
Copy all the live objects from one space to the other.
5
Compaction - Motivation


M&S and RC face the problem of fragmentation.
Fragmentation – unused space between live objects, due to repeated allocation and reclaiming:
Allocation efficiency decreases.
May fail to allocate large objects.
Cache behavior may be harmed.
Compaction – move all the live objects to one place in the heap.

Best practice: keep the order of objects for best locality.
6
Traditional Compaction



Go over the heap and write the new location of every object in its header (install a forwarding pointer).
Update all the pointers in the roots and the heap.
Move the objects.

Three Heap Passes
7
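To make the three passes concrete, here is a minimal Java sketch in the spirit of the classic Lisp2 scheme. It is an illustration only: the Heap, ObjRef, and Slot abstractions are hypothetical stand-ins, not the paper's or Jikes RVM's code.

    // Hypothetical heap abstractions, for illustration only.
    interface Slot   { long get(); void set(long addr); }             // a pointer field
    interface ObjRef { long addr(); long size(); Iterable<Slot> pointerSlots(); }
    interface Heap {
        long start();
        Iterable<ObjRef> liveObjectsInAddressOrder();
        long readForwarding(long objAddr);                            // forwarding pointer in the header
        void writeForwarding(long objAddr, long newAddr);
        void move(ObjRef obj, long newAddr);                          // copy the object's bytes
    }

    final class TraditionalCompactor {
        // Classic Lisp2-style sliding compaction: three full heap passes.
        void compact(Heap heap, Iterable<Slot> roots) {
            // Pass 1: compute new locations; install a forwarding pointer in every live header.
            long free = heap.start();
            for (ObjRef obj : heap.liveObjectsInAddressOrder()) {
                heap.writeForwarding(obj.addr(), free);
                free += obj.size();
            }
            // Pass 2: redirect every pointer (roots and heap) through the forwarding pointers.
            for (Slot root : roots) {
                long target = root.get();
                if (target != 0) root.set(heap.readForwarding(target));
            }
            for (ObjRef obj : heap.liveObjectsInAddressOrder()) {
                for (Slot field : obj.pointerSlots()) {
                    long target = field.get();
                    if (target != 0) field.set(heap.readForwarding(target));
                }
            }
            // Pass 3: slide each object down to its new location, in address order.
            for (ObjRef obj : heap.liveObjectsInAddressOrder()) {
                heap.move(obj, heap.readForwarding(obj.addr()));
            }
        }
    }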
Agenda


Introduction: garbage collection, servers, compaction.
The Compressor:
Basic technique: obtain compaction with a single heap pass.
The parallel version.
The concurrent version.
Measurements.
Related Work.
Conclusion.
8
Compressor - Overview



Compute new locations of objects.
Fix root pointers.
Move objects + fix their pointers.

One Heap Pass,
plus one pass over the (small) mark-bits table.
9
Compute new locations

Computing new locations and saving this information succinctly:
The heap is partitioned into blocks (typically, 512 bytes).
Start by computing and saving, for each block, the total size of the live objects preceding that block (the offset vector).
[Diagram: a heap at addresses 1000-1700, divided into blocks and containing objects 0-9; the offset vector entries (0, 50, 90, 125, 200, 275, 325, 350) give, for each block, the total size of live data preceding it.]
10
Computing A New Address

Assume a markbit vector which reflects the heap:
The first and last bits of each object are set.
The new location of an object is computed from the markbit and offset vectors:
For object 5, in the 4th block, the new location is 1000 + 125 + 50 = 1175.
[Diagram: the same heap with the markbit vector above it; the bits marking the first and last word of each object, together with the offset vector, determine each object's new address.]
11
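A small, self-contained sketch of this address arithmetic (illustrative only: the word size, and the assumptions that objects are at least two words long and never span a block boundary, are simplifications rather than the paper's exact scheme):

    import java.util.BitSet;

    final class CompressorNewAddress {
        static final int WORD_BYTES = 4;                       // assumed word size
        static final int BLOCK_BYTES = 512;                    // block size from the talk
        static final int BITS_PER_BLOCK = BLOCK_BYTES / WORD_BYTES;

        // offsetVector[b] = total live bytes preceding block b.
        // markbits holds one bit per heap word; the first and last word of every
        // live object are marked.
        static long newAddress(long heapStart, long objAddr,
                               long[] offsetVector, BitSet markbits) {
            int wordIndex = (int) ((objAddr - heapStart) / WORD_BYTES);
            int block = wordIndex / BITS_PER_BLOCK;

            // Live bytes preceding this object inside its own block, found by walking
            // the (first-word, last-word) markbit pairs of the objects before it.
            long liveBefore = 0;
            int bit = markbits.nextSetBit(block * BITS_PER_BLOCK);
            while (bit >= 0 && bit < wordIndex) {
                int last = markbits.nextSetBit(bit + 1);        // matching last-word bit
                liveBefore += (long) (last - bit + 1) * WORD_BYTES;
                bit = markbits.nextSetBit(last + 1);            // next object's first-word bit
            }
            // New location = start of the compacted area
            //              + live data before this block (offset vector)
            //              + live data before this object within the block (markbits).
            return heapStart + offsetVector[block] + liveBefore;
        }
    }

With the slide's numbers this is exactly the stated example: heap start 1000, offset-vector entry 125 for object 5's block, and 50 live bytes preceding object 5 inside that block, giving 1000 + 125 + 50 = 1175.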
Computing Offset Vector


Computed from the markbit vector.
Does not require a heap pass.
[Diagram: the offset vector being filled in from the markbit vector alone, without touching the heap.]
12
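A companion sketch under the same simplifying assumptions as above: the offset vector is a per-block prefix sum of live bytes, computed in one pass over the markbit vector and with no pass over the heap.

    import java.util.BitSet;

    final class CompressorOffsetVector {
        static final int WORD_BYTES = 4;
        static final int BLOCK_BYTES = 512;
        static final int BITS_PER_BLOCK = BLOCK_BYTES / WORD_BYTES;

        // Returns offset[b] = total live bytes preceding block b.
        static long[] compute(BitSet markbits, int numBlocks) {
            long[] liveInBlock = new long[numBlocks];
            int bit = markbits.nextSetBit(0);
            while (bit >= 0) {
                int last = markbits.nextSetBit(bit + 1);        // the object's last-word bit
                long bytes = (long) (last - bit + 1) * WORD_BYTES;
                liveInBlock[bit / BITS_PER_BLOCK] += bytes;     // simplification: charge the whole
                                                                // object to the block it starts in
                bit = markbits.nextSetBit(last + 1);
            }
            // Prefix sum over blocks: one cheap pass over a small table.
            long[] offset = new long[numBlocks];
            long runningTotal = 0;
            for (int b = 0; b < numBlocks; b++) {
                offset[b] = runningTotal;
                runningTotal += liveInBlock[b];
            }
            return offset;
        }
    }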
Properties
Single heap pass.
Plus one pass over the markbit vector.
Small space overhead.
Does not need a forwarding pointer.
Single threaded.
Stop-the-world.

Next:


A parallel stop-the-world (STW) version.
A concurrent version.
13
Parallelization – First Try



Had we divided the heap into two spaces…
The application uses only one space.
The Compressor compacts the objects from one space (from-space) to the other (to-space).
Advantage: objects can be moved independently.
Problem: space overhead.
14
Eliminating Space Overhead

Initially, to-space is not mapped to physical pages.
It is only a virtual address space.
For every (virtual) page in to-space (a parallel loop):
Map the virtual page to physical memory.
Move the corresponding from-space objects and fix the pointers.
Unmap the relevant pages in from-space.
[Diagram: roots and from-space objects 0-9 being moved, page by page, into to-space pages as they are mapped.]
15
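A hedged sketch of this per-page parallel loop: the MemoryOps and Mover interfaces below stand in for the OS virtual-memory services and for the collector's own move-and-fix logic; they are assumptions for illustration, not Jikes RVM's API.

    import java.util.concurrent.atomic.AtomicInteger;

    // Stand-ins for the OS virtual-memory services (illustrative assumptions).
    interface MemoryOps {
        void mapPhysical(long virtualPageAddr);     // back a to-space page with physical memory
        void unmapPhysical(long virtualPageAddr);   // release an evacuated from-space page
    }
    interface Mover {
        // Copy every from-space object whose new address falls in this to-space page,
        // fix the pointers inside the copies, and return the from-space pages that are
        // now fully evacuated.
        long[] moveObjectsInto(long toSpacePageAddr);
    }

    final class ParallelCompressorWorker {
        // Shared by all GC threads: the index of the next unclaimed to-space page.
        private final AtomicInteger nextPage = new AtomicInteger(0);

        // Each GC thread calls run() on the same shared instance.
        void run(long toSpaceStart, int pageCount, long pageSize,
                 MemoryOps mem, Mover mover) {
            for (int i = nextPage.getAndIncrement(); i < pageCount;
                     i = nextPage.getAndIncrement()) {
                long page = toSpaceStart + (long) i * pageSize;
                mem.mapPhysical(page);                       // physical memory arrives only now
                for (long fromPage : mover.moveObjectsInto(page)) {
                    mem.unmapPhysical(fromPage);             // from-space pages are recycled at once
                }
            }
        }
    }

Because the contents of every to-space page are fully determined by the offset and markbit vectors, pages can be claimed by GC threads in any order, and unmapping evacuated from-space pages keeps the extra physical memory to a small number of pages per thread at any time.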
Properties
All the virtues of the basic Compressor:
Single heap pass, small space overhead.
Easy parallelization: each to-space page can be handled independently.
Stop-The-World.
16
What about Concurrency?



Problem: two copies of an object appear when objects are moved while the application runs.
Synchronization problems arise between compaction and the application.
Solution (Baker-style): the application can only access moved objects (in to-space).
17
Concurrent Version





Stop the application.
Fix roots to the new locations in to-space.
Read-protect to-space and let the application resume.
When the application touches a to-space page, a trap is sprung.
The trap handler moves the relevant objects into the page and unprotects the page.
[Diagram: roots pointing into read-protected to-space; objects are moved in page by page as pages are touched.]
18
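A hedged sketch of the concurrent scheme: Java has no direct access to page protection or page-fault traps, so the Protection interface and the idea that the runtime calls onTrap from its trap handler are assumptions standing in for the mprotect/signal machinery the concurrent Compressor relies on.

    // Illustrative stand-ins for page protection and the trap entry point.
    interface Protection {
        void readProtect(long startAddr, long length);
        void unprotect(long startAddr, long length);
    }
    interface PageFiller {
        void moveObjectsInto(long toSpacePageAddr);   // copy objects in and fix their pointers
    }
    interface Roots {
        void forwardAllToToSpace();                   // retarget every root to its new address
    }

    final class ConcurrentCompressor {
        private final Protection prot;
        private final PageFiller filler;
        private final long pageSize;                  // assumed to be a power of two

        ConcurrentCompressor(Protection prot, PageFiller filler, long pageSize) {
            this.prot = prot;
            this.filler = filler;
            this.pageSize = pageSize;
        }

        // Short stop-the-world phase: fix the roots, protect to-space, resume mutators.
        void flip(Roots roots, long toSpaceStart, long toSpaceLength) {
            roots.forwardAllToToSpace();              // mutators now see only to-space addresses
            prot.readProtect(toSpaceStart, toSpaceLength);
            // ... application threads resume here; compaction continues concurrently ...
        }

        // Conceptually called from the trap handler when a mutator touches a protected page.
        void onTrap(long faultingAddr) {
            long page = faultingAddr & ~(pageSize - 1);   // round down to the page start
            filler.moveObjectsInto(page);                 // fill the page, pointers already fixed
            prot.unprotect(page, pageSize);               // the mutator can now retry the access
        }
    }

In the paper's concurrent version, compaction also proceeds in the background over pages the application has not yet touched, so the whole heap is eventually compacted rather than only the pages that happen to trap.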
Implementation &
Measurements


Implementation on the Jikes RVM.
The Compressor was added to a simple modification of the Jikes mark-sweep collector (main modification: allocation via local caches).
Benchmarks: SPECjbb, DaCapo, SPECjvm98.
The Compressor is invoked once every 10 collections.
In the talk we concentrate on SPECjbb.
Compared collectors:
There are no other compaction algorithms on the Jikes RVM.
Some comparison to mark-sweep (MS) and an Appel generational collector (GenMS).
19
SPECjbb Throughput
[Chart: throughput (ops) vs. number of warehouses (1-8) for four collectors: CON (Concurrent Compressor), STW (Parallel Compressor), GenMS, and MS.]
20
SPECjbb pause time (ms)
             Parallel     Mark and Sweep     Generational Mark and Sweep
             Compaction   (Stop-The-World)   (full collections)
jbb 2-WH     319          229.37             279.73
jbb 4-WH     516          287                323
jbb 6-WH     641          315                347
jbb 8-WH     770          372                374
21
SPECjbb - Allocations per time
[Chart: allocation rate (KB/ms) vs. time from resuming the application (ms), for SPECjbb with 2, 4, 6, and 8 warehouses.]
22
Dacapo - Allocations per time
[Chart: allocation rate vs. time from resuming the application (ms), for the DaCapo benchmarks jython, pmd, ps, and xalan.]
23
Previous Work on Compaction




Early works: the Two-Finger, Lisp2, and threaded [Jonkers and Morris] algorithms are single-threaded and therefore create large pause times.
[Flood et al. 2001]: the first parallel compaction algorithm, but it takes 3 heap passes and creates several dense areas.
[Abuaiadh et al. 2004]: parallel, with two heap passes, but not concurrent.
[Ossia et al. 2004]: execute the pointer fix-up phase concurrently.
24
Related Work




Numerous concurrent and parallel garbage collectors exist.
Copying collectors [Cheney 1970] compact objects during the collection, but require a large space overhead and do not retain object order.
Savings in space overhead for copying collectors: [Sachindran and Moss 2004].
[Bacon et al. 2003, Click et al. 2005] propose incremental compaction, but it uses a read barrier and does not keep the order of objects.
25
Complexity Comparisons
                   Heap Passes   Mark-bits Passes   Remarks
Jonkers / Morris   2             0                  Two traces of the heap are interleaved with the two passes.
SUN 2001           3             0
IBM 2004           2             1
The Compressor     1             1                  Mark-bits table is small (at most 1/32 of the heap size).
26
Conclusion
The Compressor:
The first compactor that passes over the heap only once
(plus one pass over the mark-bits vector).
Fully compacts all the objects in the heap.
Preserves the order of the objects.
Low space overhead.
Uses memory services to obtain parallelism.
Uses traps to obtain concurrency.
27
Questions
28