Transcript IPDPS 2003 - Fast and Lock Free Concurrent Priority Queues
Challenges in Multi-Thread Synchronization
Winter Meeting 2004
Håkan Sundell, PhD student under Philippas Tsigas since 1999. Licentiate in 2002; PhD planned for 2004.
Why Synchronization
Multi-thread programs work with shared data
Alt 1. Message Passing
[Figure: threads T1, T2 and T3 exchanging messages]
Alt 2. Shared Memory
[Figure: threads T1, T2 and T3 accessing shared data, marked "? ? ?"]
Shared Memory
[Figure: several CPUs, each with its own cache, sharing one memory]
- Uniform Memory Access (UMA)
[Figure: several nodes, each with CPUs, cache, bus and local memory]
- Non-Uniform Memory Access (NUMA)
Memory Consistency
Example scenario
[Figure: timeline t for threads T_i, T_j and T_k. Operations shown: W(x,0), W(y,0), W(x,1), R(x)=1, R(y)=0, W(y,1), R(x)=0, R(y)=1, R(y)=1, R(x)=1]
Models: Relaxed Memory Order, Sequential Consistency, etc.
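To make the difference between the models concrete, here is a minimal sketch of the classic store-buffering test using C++ `std::atomic`. It is not taken from the talk; the names `Outcome` and `run_store_buffering` are hypothetical. Under sequential consistency (`std::memory_order_seq_cst`) the outcome r1 == 0 and r2 == 0 cannot occur, whereas with `std::memory_order_relaxed` the language permits it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values read by the two threads in one round of the test.
struct Outcome {
    int r1;
    int r2;
};

// Hypothetical helper: each thread writes one variable, then reads the other.
// `order` should be std::memory_order_seq_cst or std::memory_order_relaxed.
Outcome run_store_buffering(std::memory_order order) {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread a([&] {
        x.store(1, order);   // W(x,1)
        r1 = y.load(order);  // R(y)
    });
    std::thread b([&] {
        y.store(1, order);   // W(y,1)
        r2 = x.load(order);  // R(x)
    });
    a.join();
    b.join();
    return Outcome{r1, r2};
}
```

Whether the relaxed variant ever shows both reads returning 0 depends on the hardware and compiler; the sketch only guarantees that the sequentially consistent variant never does.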
Linearizability
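[Figure: timeline t for threads T_i, T_j and T_k with overlapping Write, Read and Write operations, and an equivalent serial order "Ser"]
Every concurrent execution can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order of the original operations.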
Hardware Synchronization Primitives
Consensus number 1
- Atomic Read/Write
Consensus number 2
- Atomic Test-And-Set (TAS), Fetch-And-Add (FAA): M = f(M, ...)
Consensus number infinite
- Atomic Compare-And-Swap (CAS)
- Atomic Load-Linked/Store-Conditional (LL/SC)
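The primitives above map onto `std::atomic` operations in C++. The sketch below is one possible illustration; the wrapper names `test_and_set`, `fetch_and_add` and `compare_and_swap` are hypothetical and chosen only to mirror the slide. LL/SC has no direct C++ equivalent and is left out.

```cpp
#include <atomic>
#include <cassert>

// Test-And-Set: atomically set the flag and return its previous value.
bool test_and_set(std::atomic<bool>& flag) {
    return flag.exchange(true);
}

// Fetch-And-Add: atomically add delta and return the previous value.
int fetch_and_add(std::atomic<int>& m, int delta) {
    return m.fetch_add(delta);
}

// Compare-And-Swap: store `desired` only if m still holds `expected`.
// Returns true if the swap took place.
bool compare_and_swap(std::atomic<int>& m, int expected, int desired) {
    return m.compare_exchange_strong(expected, desired);
}
```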
Mutual Exclusion
Access to shared data will be atomic because of the lock
[Figure: processes P1, P2 and P3 contending for a lock]
Reduced parallelism by definition
Blocking; danger of priority inversion and deadlocks
• Solutions exist, but with high overhead, especially for multi-processor systems
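A minimal sketch of lock-based access, assuming a simple test-and-set spinlock built on `std::atomic<bool>`. The names `SpinLock` and `count_with_lock` are hypothetical. It shows why the approach blocks: a thread that cannot take the lock makes no progress until the holder releases it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Test-and-set spinlock: only one thread at a time gets past lock().
class SpinLock {
    std::atomic<bool> locked{false};
public:
    bool try_lock() {
        // exchange returns the old value; false means we acquired the lock.
        return !locked.exchange(true, std::memory_order_acquire);
    }
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();  // blocked until the holder unlocks
        }
    }
    void unlock() {
        locked.store(false, std::memory_order_release);
    }
};

// Hypothetical helper: several threads increment a plain counter under the lock.
long count_with_lock(int num_threads, int increments) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++counter;  // critical section: executed by one thread at a time
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

If the thread holding the lock is preempted, every other thread spins or yields in `lock()`, which is the source of the priority inversion risk named above.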
Non-blocking Synchronization
Perform operation/changes using atomic primitives
Lock-Free Synchronization
- Optimistic approach
  • Retries until succeeding
Wait-Free Synchronization
- Always finishes in a finite number of its own steps
  • Coordination with all participants
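The optimistic retry pattern can be sketched as a CAS loop. The example below is an illustration only, with a hypothetical function name `atomic_fetch_max`: it raises a shared value to at least `value` without taking a lock. It is lock-free but not wait-free, since an individual thread may retry any number of times while others succeed.

```cpp
#include <atomic>
#include <cassert>

// Lock-free "store maximum": read, compute, then try to publish with CAS.
// Returns the value observed just before this call took effect.
int atomic_fetch_max(std::atomic<int>& target, int value) {
    int observed = target.load();
    while (observed < value &&
           !target.compare_exchange_weak(observed, value)) {
        // A failed CAS has refreshed `observed` with the current value,
        // so the loop condition re-checks whether an update is still needed.
    }
    return observed;
}
```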
Non-Blocking Dynamic Memory Management
I.e., malloc/free
Fixed block size, fixed purpose
- Lock-Free
- Wait-Free (to be published)
Any block size, any/fixed purpose
- Open problem
Reference Counting
Questions
- When to free objects?
- How to de-reference pointers safely?
[Figure: processes P1, P2 and P3 holding references to shared objects]
Solutions (in cooperation with dynamic memory manager)
- Lock-Free
- Wait-Free (to be published)
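A minimal sketch of the counting part, assuming an intrusive counter stored in each object. The names `Node`, `acquire`, `release` and `freed_nodes` are hypothetical, and this is not the scheme from the talk. It answers "when to free" but not "how to de-reference safely": `acquire` is only safe when the caller already holds a reference, because reading a shared pointer and then incrementing the count can race with the final `release`.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    std::atomic<int> refs{1};  // the creator holds the first reference
    int value;
    explicit Node(int v) : value(v) {}
};

// Hypothetical statistic, present only so the example can be checked.
std::atomic<int> freed_nodes{0};

// Take an additional reference. Assumes the caller already owns one.
Node* acquire(Node* n) {
    n->refs.fetch_add(1, std::memory_order_relaxed);
    return n;
}

// Drop one reference; the thread that drops the last one frees the node.
// Returns true if this call freed the node.
bool release(Node* n) {
    if (n->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete n;
        freed_nodes.fetch_add(1);
        return true;
    }
    return false;
}
```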
Software Synchronization Primitives
Atomic Read/Write (wait-free (WF) with real-time (RT) requirements published)
Multi-variable Read/Write, Snapshot (WF with RT requirements published)
LL/SC (lock-free (LF), to be published)
Multi-word Compare-And-Swap (CASN), i.e. transactions
- Lock-Free
- Wait-Free (to be published)
LL/SCN
Shared Data Structures
General LF/WF schemes exist to implement all data structures; however, they are not practical.
NOBLE library (published)
Lock-Free
- Stack
- Queue
- Priority Queue (published)
- Dictionary (published)
- Deque (to be published)
- Singly Linked Lists
- Doubly Linked Lists (to be published)
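As an illustration of what a lock-free structure in the list above can look like, here is a minimal sketch of a Treiber-style stack. It is not the NOBLE implementation, and the class name `LockFreeStack` is hypothetical. To stay short it deliberately never frees popped nodes, which sidesteps the memory reclamation and ABA problems that a real implementation has to solve.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};

public:
    void push(T value) {
        Node* node = new Node{std::move(value), head.load()};
        // On failure the CAS reloads node->next with the current head; retry.
        while (!head.compare_exchange_weak(node->next, node)) {
        }
    }

    // Returns false if the stack was empty; otherwise copies the top into `out`.
    bool pop(T& out) {
        Node* top = head.load();
        // On failure the CAS reloads `top`; stop when the stack is empty.
        while (top != nullptr &&
               !head.compare_exchange_weak(top, top->next)) {
        }
        if (top == nullptr) {
            return false;
        }
        out = top->value;
        // Assumption for this sketch: the node is intentionally leaked, so no
        // thread can ever read freed memory or see a recycled address.
        return true;
    }
};
```

Both operations follow the same pattern: read the head, prepare the change locally, and publish it with a single CAS, retrying when another thread got there first.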
Evaluation of Non-Blocking Synchronization
Experiments on parallel machines with 2, 4 and 64 CPUs, on UMA/NUMA architectures
- Stress tests of data structures (published)
- Benchmark applications
- Real applications
Open subject
Clusters and Grids
[Figure: several nodes, each with CPUs, cache, bus, I/O and memory]
Distributed Shared Memory
Atomic synchronization primitives?
Relaxed Memory Models?
Modification of non-blocking algorithms?