IPDPS 2003 - Fast and Lock Free Concurrent Priority Queues


Wait-Free Reference Counting and Memory Management
Håkan Sundell, Ph.D.
Outline
- Shared Memory
- Synchronization Methods
- Memory Management
  • Garbage Collection
    • Reference Counting
  • Memory Allocation
- Performance
- Conclusions
IPDPS 2005
5 April 2005
Shared Memory
- Uniform Memory Access (UMA)
(Figure: several CPUs, each with its own cache, connected to one shared memory.)
- Non-Uniform Memory Access (NUMA)
(Figure: several groups of CPUs, each group with its own cache bus and memory.)
Synchronization

Shared data structures need synchronization!
(Figure: processes P1, P2 and P3 accessing shared data.)
Accesses and updates must be coordinated to establish consistency.
Hardware Synchronization Primitives
- Weak
  • Atomic Read/Write
- Stronger
  • Atomic Test-And-Set (TAS), Fetch-And-Add (FAA), Swap
- Universal
  • Atomic Compare-And-Swap (CAS)
  • Atomic Load-Linked/Store-Conditional (LL/SC)
(Figure: an atomic read-modify-write: Read, M=f(M,…), Write.)
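As a rough illustration of why CAS is listed as universal, the sketch below builds Fetch-And-Add out of a CAS retry loop, and shows Swap and Test-And-Set through the corresponding `std::atomic` operations. The helper names (`faa_via_cas`, `swap_value`, `test_and_set`) are made up for this example and are not taken from the talk.

```cpp
#include <atomic>
#include <cassert>

// Fetch-And-Add emulated with a CAS retry loop (hypothetical helper name).
// Returns the value the variable held before the addition.
inline int faa_via_cas(std::atomic<int>& m, int delta) {
    int old = m.load();
    // On failure, compare_exchange_weak reloads `old` with the current value,
    // so the loop simply retries with fresh input.
    while (!m.compare_exchange_weak(old, old + delta)) {
    }
    return old;
}

// Swap: store a new value and return the previous one in one atomic step.
inline int swap_value(std::atomic<int>& m, int v) { return m.exchange(v); }

// Test-And-Set: set the flag and report whether it was already set.
inline bool test_and_set(std::atomic<bool>& flag) { return flag.exchange(true); }
```

Note that the emulated FAA is lock-free but not wait-free: a thread can in principle keep losing the CAS to other threads.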
Mutual Exclusion

- Access to shared data will be atomic because of the lock
- Reduced parallelism by definition
- Blocking, with danger of priority inversion and deadlocks
  • Solutions exist, but with high overhead, especially for multi-processor systems
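For comparison with the non-blocking methods that follow, here is a minimal sketch of a lock built on Test-And-Set. The class name `SpinLock` is an assumption for this example; the talk does not prescribe a particular lock.

```cpp
#include <atomic>
#include <cassert>

// Minimal Test-And-Set spin lock (illustrative only).
class SpinLock {
    std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() { return !busy_.test_and_set(std::memory_order_acquire); }
    // Blocks (by spinning) until the lock is acquired.
    void lock() { while (!try_lock()) {} }
    void unlock() { busy_.clear(std::memory_order_release); }
};
```

A thread inside `lock()` makes no progress while another thread holds the lock, which is the blocking behaviour the slide refers to: if the holder is preempted or delayed, every waiter is delayed with it.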
Non-blocking Synchronization
- Perform operations/changes using atomic primitives
- Lock-Free Synchronization
  • Optimistic approach
    • Retries until succeeding
- Wait-Free Synchronization
  • Always finishes in a finite number of its own steps
    • Coordination with all participants
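The optimistic, retry-until-success pattern of lock-free synchronization can be sketched as follows. The function `increment_below` and its bounded-counter task are invented for illustration; only the read, compute, CAS, retry structure is the point.

```cpp
#include <atomic>
#include <cassert>

// Lock-free bounded increment: read the value, compute the update, and try to
// install it with CAS; if another thread changed the value first, retry.
// Returns false once the counter has reached `limit`.
inline bool increment_below(std::atomic<int>& counter, int limit) {
    int seen = counter.load();
    while (seen < limit) {
        if (counter.compare_exchange_weak(seen, seen + 1)) {
            return true;   // our update was installed
        }
        // CAS failed: `seen` now holds the current value, so loop and retry.
    }
    return false;
}
```

Some thread always succeeds, so the system as a whole makes progress (lock-free), but an individual thread has no bound on its retries. A wait-free operation must also bound its own steps, which is why it needs coordination, such as helping, between participants.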
Memory Management

Dynamic data structures need dynamic memory management
Concurrent data structures need concurrent memory management!
Concurrent Memory Management
- Concurrent Memory Allocation
  • i.e. malloc/free functionality
- Concurrent Garbage Collection
  • Questions (among many):
    • When to re-use memory?
    • How to de-reference pointers safely?
Lock-Free Memory Management
- Memory Allocation
  • Valois 1995; fixed block-size, fixed purpose
  • Michael 2004, Gidenstam et al. 2004; any size, any purpose
- Garbage Collection
  • Valois 1995, Detlefs et al. 2001; reference counting
  • Michael 2002, Herlihy et al. 2002; hazard pointers
Wait-Free Memory Management
- Hesselink and Groote, "Wait-free concurrent memory management by create and read until deletion (CaRuD)", Dist. Comp. 2001
  • limited to the problem of shared static terms
- New Wait-Free Algorithm:
  • Memory Allocation – fixed block-size, fixed purpose
  • Garbage Collection – reference counting
Wait-Free Reference Counting
- De-referencing links
  • 1. Read the link contents, i.e. a pointer.
  • 2. Increment (FAA) the reference count on the corresponding object.
  • What if the link is changed between steps 1 and 2?
- Wait-Free solution:
  • The de-referencing operation should announce the link before reading.
  • The operations that change that link should help the de-referencing operation.
Wait-Free Reference Counting
- Announcing
  • Writes the link address to a (per thread and per new de-ref) shared variable.
  • Atomically removes the announcement and retrieves a possible answer (from helping) by Swap with null.
- Helping
  • If the announcement matches the changed link, atomically answer with a proper pointer using CAS.
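The announce-and-help handshake can be sketched roughly as below. This is not the algorithm from the talk: it shows only the single announcement variable, the Swap with null, and the CAS answer. It assumes nodes are never freed, uses one slot instead of one per thread and per de-reference, and marks an announced link address by setting the low bit of the word. All names (`Node`, `Link`, `Slot`, `deref_begin`, `deref_finish`, `help`) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Node { std::atomic<int> refs{0}; };
using Link = std::atomic<Node*>;
// One announcement variable: 0 = null, low bit set = announced link address,
// anything else = an answer (a Node*) written by a helper.
using Slot = std::atomic<std::uintptr_t>;

inline std::uintptr_t announced(const Link* link) {
    return reinterpret_cast<std::uintptr_t>(link) | 1u;
}

// De-reference, part 1: announce the link, then perform steps 1 and 2.
inline Node* deref_begin(Slot& slot, Link& link) {
    slot.store(announced(&link));        // announce before reading
    Node* seen = link.load();            // step 1: read the link
    if (seen) seen->refs.fetch_add(1);   // step 2: FAA on the reference count
    return seen;
}

// De-reference, part 2: Swap with null removes the announcement and picks up
// an answer if a helper left one.
inline Node* deref_finish(Slot& slot, Link& link, Node* seen) {
    std::uintptr_t got = slot.exchange(0);
    if (got == announced(&link)) return seen;   // nobody helped
    if (seen) seen->refs.fetch_sub(1);          // give back our own count
    return reinterpret_cast<Node*>(got);        // the answer already carries a count
}

// Helping: run by an operation that has just changed `link` to `fresh`.
inline void help(Slot& slot, const Link* link, Node* fresh) {
    std::uintptr_t expected = announced(link);
    if (slot.load() != expected) return;        // this link is not announced
    if (fresh) fresh->refs.fetch_add(1);        // count handed over with the answer
    if (!slot.compare_exchange_strong(
            expected, reinterpret_cast<std::uintptr_t>(fresh))) {
        if (fresh) fresh->refs.fetch_sub(1);    // announcement vanished: undo
    }
}
```

The de-reference is split into two functions only so that a link change can be placed between them in a single-threaded walkthrough; a real de-reference would run both parts back to back.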
Wait-Free Memory Allocation

- Solution (lock-free), IBM freelists:
  • Create a linked list of the free nodes, allocate/reclaim using CAS
(Figure: Head points to a linked list Mem 1, Mem 2, …, Mem i; Allocate removes a node at the head, Reclaim returns the node Used 1.)
- How to guarantee that the CAS of an alloc/free operation eventually succeeds?
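A lock-free freelist of this kind can be sketched as below. To keep the example self-contained, blocks are identified by array index, and the head packs a version counter next to the index so that a single 64-bit CAS also guards against ABA (ignoring wrap-around of the 32-bit version). The class name `FreeList` and this layout are assumptions for illustration, not the layout used in the talk.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kNil = 0xFFFFFFFFu;   // "no block" marker

class FreeList {
    std::vector<std::atomic<std::uint32_t>> next_;  // next_[i]: block after i
    std::atomic<std::uint64_t> head_;               // high 32 bits: version, low 32: index

    static std::uint64_t pack(std::uint32_t version, std::uint32_t index) {
        return (static_cast<std::uint64_t>(version) << 32) | index;
    }
public:
    // Initially every block 0 .. blocks-1 is free and linked in order.
    explicit FreeList(std::uint32_t blocks)
        : next_(blocks), head_(pack(0, blocks ? 0u : kNil)) {
        for (std::uint32_t i = 0; i < blocks; ++i)
            next_[i].store(i + 1 < blocks ? i + 1 : kNil);
    }
    // Returns a block index, or -1 if the list is empty.
    int allocate() {
        std::uint64_t h = head_.load();
        for (;;) {
            std::uint32_t idx = static_cast<std::uint32_t>(h);
            if (idx == kNil) return -1;
            std::uint64_t nh =
                pack(static_cast<std::uint32_t>(h >> 32) + 1, next_[idx].load());
            if (head_.compare_exchange_weak(h, nh)) return static_cast<int>(idx);
            // CAS failed: h was refreshed, retry with the new head.
        }
    }
    // Pushes a previously allocated block back onto the list.
    void reclaim(int block) {
        std::uint64_t h = head_.load();
        for (;;) {
            next_[block].store(static_cast<std::uint32_t>(h));
            std::uint64_t nh = pack(static_cast<std::uint32_t>(h >> 32) + 1,
                                    static_cast<std::uint32_t>(block));
            if (head_.compare_exchange_weak(h, nh)) return;
        }
    }
};
```

Both loops have no bound on the number of CAS failures a single thread may see under contention, which is exactly the question the slide raises.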
Wait-Free Memory Allocation
- Wait-Free Solution:
  • Create 2*N freelists.
  • Alloc operations concurrently try to allocate from the current (globally agreed on) freelist.
  • When the current freelist is empty, the current is changed in round-robin manner.
  • Free operation of thread i only works on freelist i or N+i.
  • Alloc operations announce their interest.
  • All free and alloc operations try to help announced alloc operations in round-robin.
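The index arithmetic behind the 2*N freelists can be sketched as follows. The round-robin successor and the pair i, N+i come from the slide; the rule for choosing between the two lists in `free_target` (avoid the list that is currently used for allocation) is an assumption made for this example, and the struct name `FreeListIndex` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// N threads share 2*N freelists, indexed 0 .. 2*N-1.
struct FreeListIndex {
    std::size_t n;   // number of threads N

    // Round-robin successor of the current allocation freelist.
    std::size_t next_current(std::size_t current) const {
        return (current + 1) % (2 * n);
    }

    // Thread i may only free into list i or N+i.
    // ASSUMPTION: pick whichever of the two is not the current allocation list.
    std::size_t free_target(std::size_t thread, std::size_t current) const {
        return (thread == current) ? thread + n : thread;
    }
};
```

Giving each thread its own pair of lists means a Free operation competes with Alloc operations on at most one of its two lists at any time.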
Wait-Free Memory Allocation
(Figure: the announcement variables, one per thread, each holding Null or a node X and updated with Swap and CAS, together with the id of the thread to help.)
- Announcing
  • A value of null in the per-thread shared variable indicates interest.
  • Alloc atomically announces and receives a possible answer by using Swap.
- Helping
  • Globally agreed on which thread to help, incremented when agreed in round-robin.
  • Free atomically answers the selected thread of interest with a free node using CAS.
  • The first time that Alloc succeeds in getting a node from the current freelist, it tries to atomically answer the selected thread of interest with the node using CAS.
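The announcement variables and the round-robin helping id can be sketched as below. This is a simplified illustration under assumptions, not the algorithm from the talk: the names `Block`, `HelpingSlots`, `take_answer` and `try_help` are hypothetical, and the policy of advancing `help_id` after every helping attempt is a guess at what "incremented when agreed" means.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

struct Block { Block* next = nullptr; };

template <std::size_t N>
struct HelpingSlots {
    // One announcement variable per thread; nullptr means "interested".
    std::array<std::atomic<Block*>, N> slot{};
    // Globally agreed id of the thread to help next (used modulo N).
    std::atomic<std::size_t> help_id{0};

    // Alloc side: one Swap both collects a possible answer and re-announces
    // interest by leaving null behind.
    Block* take_answer(std::size_t me) { return slot[me].exchange(nullptr); }

    // Helper side (Free, or an Alloc that just obtained a block): offer `b`
    // to the selected thread with CAS. Returns true if the block was handed over.
    bool try_help(Block* b) {
        std::size_t id = help_id.load();
        Block* expected = nullptr;
        bool given = slot[id % N].compare_exchange_strong(expected, b);
        // ASSUMPTION: move on to the next thread whether this attempt answered
        // the slot or found it already answered; CAS keeps helpers in agreement.
        help_id.compare_exchange_strong(id, id + 1);
        return given;
    }
};
```

In a walkthrough with two threads, two successive `try_help` calls fill slots 0 and 1, a third finds slot 0 still occupied and returns false, and each thread later collects its block with `take_answer`. A helper whose `try_help` fails keeps ownership of the block and would return it to a freelist.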
Performance
- Worst-case
  • Need analysis of maximum execution path and apply known WCET techniques.
    • e.g. 2*N^2 maximum CAS retries for alloc.
- Average and Overhead
  • Experiments in the scope of dynamic data structures (e.g. lock-free skip list)
    • H. Sundell and P. Tsigas, "Fast and Lock-Free Concurrent Priority Queues for Multi-thread Systems", IPDPS 2003
  • Performed on NUMA (SGI Origin 2000) architecture, full concurrency.
Average Performance
Conclusions
- New algorithms for concurrent & dynamic Memory Management
  • Wait-Free & Linearizable.
  • Reference counting.
  • Fixed-size memory allocation.
  • To the best of our knowledge, the first wait-free memory management scheme that supports implementing arbitrary dynamic concurrent data structures.
  • Will be available as part of NOBLE software library, http://www.noble-library.org
- Future work
  • Implement new wait-free dynamic data structures.
  • Provide upper bounds of memory usage.
Questions?

Contact Information:
- Address: Håkan Sundell, Computing Science, Chalmers University of Technology
- Email: [email protected]
- Web: http://www.cs.chalmers.se/~phs