Wait-Free Reference Counting and Memory Management
Håkan Sundell, Ph.D.
Outline
• Shared Memory
• Synchronization Methods
• Memory Management
• Garbage Collection
  • Reference Counting
• Memory Allocation
• Performance
• Conclusions
Shared Memory
Diagram: Uniform Memory Access (UMA): several CPUs, each with its own cache, connected to one shared memory.
Diagram: Non-Uniform Memory Access (NUMA): several nodes, each with its own CPUs, caches and local memory on a cache bus, connected to each other.
Synchronization
Shared data structures need synchronization!
Diagram: processes P1, P2 and P3 concurrently accessing a shared data structure.
Accesses and updates must be coordinated to establish consistency.
Hardware Synchronization Primitives
• Weak: Atomic Read/Write
• Stronger: Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap, i.e. read-modify-write steps of the form M = f(M, …)
• Universal: Atomic Compare-And-Swap (CAS), Atomic Load-Linked/Store-Conditional
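To make the hierarchy concrete, here is a sequential C model of the primitives (my own sketch, not from the slides); each function body stands for one indivisible hardware step, so real code would use compiler or hardware intrinsics (e.g. C11 <stdatomic.h>) rather than these plain functions.

```c
/* Sequential models of the primitives (NOT real implementations): each
 * function body is assumed to execute as one indivisible hardware step. */

/* Weak: atomic read and write of a single word. */
int  atomic_read_word (int *M)        { return *M; }
void atomic_write_word(int *M, int v) { *M = v;    }

/* Stronger: read-modify-write of the form M = f(M, ...). */
int test_and_set (int *M)             { int old = *M; *M = 1;  return old; }
int fetch_and_add(int *M, int v)      { int old = *M; *M += v; return old; }
int swap_word    (int *M, int v)      { int old = *M; *M = v;  return old; }

/* Universal: conditional update; succeeds only if M still equals `expected`.
 * Load-Linked/Store-Conditional achieves the same effect with a pair of
 * instructions instead of one.                                            */
int compare_and_swap(int *M, int expected, int desired)
{
    if (*M != expected) return 0;     /* failed: M was changed concurrently */
    *M = desired;                     /* succeeded: install the new value   */
    return 1;
}
```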
Mutual Exclusion
Access to shared data will be atomic because of the lock.
Diagram: processes P1, P2 and P3 competing for the lock on the shared data.
Reduced parallelism by definition.
Blocking; danger of priority inversion and deadlocks.
• Solutions exist, but with high overhead, especially for multi-processor systems.
Non-blocking Synchronization
Perform operations/changes using atomic primitives.
Lock-Free Synchronization
Optimistic approach
• Retries until succeeding (retry loop sketched below)
Wait-Free Synchronization
Always finishes in a finite number of its own steps
• Coordination with all participants
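As an illustration of the optimistic, retry-based lock-free pattern, here is a minimal C11 sketch (my own, not code from the talk) of a counter increment that keeps retrying its CAS until it succeeds:

```c
#include <stdatomic.h>

/* Lock-free increment: optimistically read, compute, and try to install
 * the new value with CAS; retry if another thread changed the counter in
 * the meantime.                                                           */
void lock_free_increment(atomic_long *counter)
{
    long old = atomic_load(counter);
    /* On failure, compare_exchange_weak refreshes `old` with the current
     * value, so the loop retries with up-to-date data until the CAS wins. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* another thread won the race; retry */
    }
}
```

Some operation always succeeds, but a particular thread can in principle retry forever; a wait-free operation additionally bounds its own number of steps by coordinating with, and being helped by, the other participants.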
Memory Management
Dynamic data structures need dynamic memory management.
Concurrent data structures need concurrent memory management!
Concurrent Memory Management
Concurrent Memory Allocation
i.e. malloc/free functionality
Concurrent Garbage Collection
Questions (among many):
• When to re-use memory?
• How to de-reference pointers safely?
Lock-Free Memory Management
Memory Allocation
• Valois 1995: fixed block-size, fixed purpose
• Michael 2004, Gidenstam et al. 2004: any size, any purpose
Garbage Collection
• Valois 1995, Detlefs et al. 2001: reference counting
• Michael 2002, Herlihy et al. 2002: hazard pointers
Wait-Free Memory Management
Hesselink and Groote, ”Wait-free concurrent memory management by create and read until deletion (CaRuD)”, Dist. Comp. 2001
• Limited to the problem of shared static terms.
New Wait-Free Algorithm:
• Memory Allocation: fixed block-size, fixed purpose
• Garbage Collection: reference counting
Wait-Free Reference Counting
De-referencing links:
1. Read the link contents, i.e. a pointer.
2. Increment (FAA) the reference count on the corresponding object.
What if the link is changed between step 1 and 2? (The race is sketched in code below.)
Wait-Free solution:
• The de-referencing operation should announce the link before reading.
• The operations that change that link should help the de-referencing operation.
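To make the race concrete, here is a minimal C11 sketch of the unsafe two-step dereference; the types and field names are illustrative, not the paper's actual interface.

```c
#include <stdatomic.h>

/* Illustrative types; not the paper's actual interface. */
typedef struct node {
    atomic_int refcount;              /* reference count of the object   */
} node_t;

typedef struct {
    _Atomic(node_t *) ptr;            /* a shared link to a counted node */
} link_t;

/* Unsafe two-step dereference:
 *   1. read the link contents (a pointer),
 *   2. increment (FAA) the reference count of the object it points to.
 * If the link is changed, and the node reclaimed, between step 1 and
 * step 2, then step 2 touches freed memory.                             */
node_t *deref_unsafe(link_t *link)
{
    node_t *node = atomic_load(&link->ptr);   /* step 1 */
    atomic_fetch_add(&node->refcount, 1);     /* step 2 */
    return node;
}
```

The announcement described above closes exactly this window: a writer that changes the link in the meantime must help the announced dereference with an already-counted pointer.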
Wait-Free Reference Counting
Announcing
• Writes the link address to a (per thread and per new de-ref) shared variable.
• Atomically removes the announcement and retrieves a possible answer (from helping) by Swap with null.
Helping
• If the announcement matches the changed link, atomically answer with a proper pointer using CAS.
(Both sides are sketched in code below.)
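A highly simplified C11 sketch of the announce-and-help mechanism; it assumes the announcement variable holds either the announced link address or a helper-supplied answer, and it omits the real algorithm's encoding (e.g. tag bits) and its handling of stale increments.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative types; not the paper's actual interface. */
typedef struct node { atomic_int refcount; } node_t;
typedef struct { _Atomic(node_t *) ptr; } link_t;

/* One announcement variable per thread and per new de-reference.  It holds
 * either the announced link address or an answer (a properly counted node
 * pointer) installed by a helping writer; how the two cases are tagged is
 * omitted from this sketch.                                                */
typedef _Atomic(void *) announcement_t;

node_t *deref_wait_free(announcement_t *my_ann, link_t *link)
{
    /* Announce the link before reading it. */
    atomic_store(my_ann, (void *)link);

    /* Optimistic path: read the pointer and FAA its reference count. */
    node_t *node = atomic_load(&link->ptr);
    if (node != NULL)
        atomic_fetch_add(&node->refcount, 1);

    /* Atomically remove the announcement and retrieve a possible answer
     * (from helping) by a Swap with null.                                */
    void *seen = atomic_exchange(my_ann, NULL);
    if (seen != (void *)link && seen != NULL)
        node = (node_t *)seen;   /* a helper supplied a counted pointer */
    return node;
}

/* A writer that changes `link` helps any matching announcement: it answers
 * with a proper (already counted) pointer using CAS, which succeeds only if
 * the announcement still names the changed link.                           */
void help_announced(announcement_t *ann, link_t *changed_link, node_t *counted)
{
    void *expected = (void *)changed_link;
    atomic_compare_exchange_strong(ann, &expected, (void *)counted);
}
```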
Wait-Free Memory Allocation
Solution (lock-free), IBM freelists:
Create a linked-list of the free nodes, allocate/reclaim using CAS (sketched below).
Diagram: Head points to a linked list of free nodes Mem 1, Mem 2, …, Mem i; Allocate pops from the head, Reclaim pushes a used node (Used 1) back onto the list.
How to guarantee that the CAS of an alloc/free operation eventually succeeds?
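A rough C11 sketch of the freelist idea (my own simplification): it ignores the version tag that the original IBM freelists pair with the head pointer against the ABA problem, and it ignores safe reclamation, which is what the rest of the talk addresses.

```c
#include <stdatomic.h>
#include <stddef.h>

/* A node of the freelist; the payload is omitted. */
typedef struct free_node {
    struct free_node *next;
} free_node_t;

static _Atomic(free_node_t *) freelist_head;

/* Allocate: pop the head node with CAS, retrying while other threads win. */
free_node_t *freelist_alloc(void)
{
    free_node_t *head = atomic_load(&freelist_head);
    while (head != NULL &&
           !atomic_compare_exchange_weak(&freelist_head, &head, head->next)) {
        /* CAS failed: `head` has been refreshed, try again. */
    }
    return head;                        /* NULL if the freelist was empty */
}

/* Reclaim: push the node back onto the freelist with CAS. */
void freelist_free(free_node_t *node)
{
    free_node_t *head = atomic_load(&freelist_head);
    do {
        node->next = head;
    } while (!atomic_compare_exchange_weak(&freelist_head, &head, node));
}
```

The question on the slide concerns exactly the retry loops above: under contention, nothing bounds how many times a particular thread's CAS may fail.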
Wait-Free Memory Allocation
Wait-Free Solution:
• Create 2*N freelists.
• Alloc operations concurrently try to allocate from the current (globally agreed on) freelist.
• When the current freelist is empty, the current is changed in round-robin manner (sketched below).
• A Free operation of thread i only works on freelist i or N+i.
• Alloc operations announce their interest.
• All free and alloc operations try to help announced alloc operations in round-robin.
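A minimal structural sketch of this scheme; the constants and helper names are my own, the round-robin advance shown is one plausible realization, and the paper's actual bookkeeping is richer.

```c
#include <stdatomic.h>

#define N_THREADS   8                   /* illustrative value of N        */
#define N_FREELISTS (2 * N_THREADS)     /* the scheme uses 2*N freelists  */

/* Globally agreed index of the freelist that Alloc operations currently
 * try to allocate from.                                                  */
static atomic_uint current_freelist;

/* When the freelist we saw as current turns out to be empty, CAS the
 * global index forward in round-robin order.  If another thread already
 * advanced it, our CAS simply fails and we adopt the new value.          */
static unsigned advance_current(unsigned seen)
{
    unsigned next = (seen + 1) % N_FREELISTS;
    atomic_compare_exchange_strong(&current_freelist, &seen, next);
    return atomic_load(&current_freelist);
}

/* A Free operation of thread i only pushes onto freelist i or N+i. */
static unsigned own_freelist(unsigned thread_id, int use_upper_half)
{
    return use_upper_half ? N_THREADS + thread_id : thread_id;
}
```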
Wait-Free Memory Allocation
Diagram: an array of per-thread announcement variables (indexed by thread id), each holding either a value (X) or Null, updated with Swap and CAS.
Announcing
• A value of null in the per-thread shared variable indicates interest.
• Alloc atomically announces interest and receives a possible answer by using Swap.
Helping
• Globally agreed on which thread to help, incremented in round-robin.
• Free atomically answers the selected thread of interest with a free node using CAS.
• The first time that Alloc succeeds in getting a node from the current freelist, it tries to atomically answer the selected thread of interest with that node using CAS.
(Both sides are sketched in code below.)
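A simplified C11 sketch of the announce-and-help interplay; the sentinel encoding, the fixed thread count and the helper names are assumptions of this sketch, not the paper's.

```c
#include <stdatomic.h>
#include <stddef.h>

#define N_THREADS 8                     /* illustrative value of N */

typedef struct free_node { struct free_node *next; } free_node_t;

/* One announcement variable per thread: NULL signals interest, anything
 * else means "not interested" or an answer.  The sentinel is my own
 * simplification of the slide's X / Null picture.                        */
static free_node_t not_interested;      /* sentinel, never a real node */
static _Atomic(free_node_t *) announce[N_THREADS];

/* Globally agreed index of the thread to help, advanced in round-robin. */
static atomic_uint help_index;

/* Alloc side: announce interest, and later withdraw the announcement and
 * collect a possible answer with a single Swap.                          */
void announce_interest(unsigned my_id)
{
    atomic_store(&announce[my_id], NULL);
}

free_node_t *collect_answer(unsigned my_id)
{
    free_node_t *seen = atomic_exchange(&announce[my_id], &not_interested);
    return (seen == NULL || seen == &not_interested) ? NULL : seen;
}

/* Helper side: Free operations, and Alloc operations that just got a node
 * from the current freelist, try to answer the selected thread with a free
 * node using CAS; the CAS succeeds only while that thread is still
 * interested (its slot is NULL).                                           */
int help_selected_thread(free_node_t *node)
{
    unsigned target = atomic_load(&help_index) % N_THREADS;
    free_node_t *expected = NULL;
    return atomic_compare_exchange_strong(&announce[target], &expected, node);
}
```

The sketch omits the initialization of the slots, the advancement of the help index and the bookkeeping for a thread that both gets helped and obtains its own node.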
Performance
Worst-case
Need analysis of the maximum execution path and apply known WCET techniques.
• e.g. 2*N² maximum CAS retries for alloc.
Average and Overhead
Experiments in the scope of dynamic data structures (e.g. lock-free skip list):
• H. Sundell and P. Tsigas, ”Fast and Lock-Free Concurrent Priority Queues for Multi-thread Systems”, IPDPS 2003
Performed on a NUMA (SGI Origin 2000) architecture, full concurrency.
Average Performance
Conclusions
New algorithms for concurrent & dynamic memory management:
Wait-free & linearizable.
Reference counting.
Fixed-size memory allocation.
To the best of our knowledge, the first wait-free memory management scheme that supports implementing arbitrary dynamic concurrent data structures.
Will be available as part of the NOBLE software library,
http://www.noble-library.org
Future work
Implement new wait-free dynamic data structures.
Provide upper bounds of memory usage.
Questions?
Contact Information:
Address:
Håkan Sundell
Computing Science
Chalmers University of Technology
Email:
[email protected]
Web:
http://www.cs.chalmers.se/~phs