IPDPS 2003 - Fast and Lock Free Concurrent Priority Queues


Challenges in multi-thread synchronization
Winter Meeting 2004

Håkan Sundell
PhD student of Philippas Tsigas since 1999
- Licentiate in 2002
- PhD planned for 2004

Why Synchronization

• Multi-threaded programs work with shared data

• Alt 1. Message Passing

[Figure: threads T1, T2 and T3]

• Alt 2. Shared Memory

[Figure: threads T1, T2 and T3 around shared data marked "? ? ?"]

Shared Memory

[Figure: several CPUs, each with its own cache, attached to one memory]

- Uniform Memory Access (UMA)

[Figure: several nodes, each with CPUs, a cache, a bus and its own memory]

- Non-Uniform Memory Access (NUMA)

Memory Consistency

• Example scenario

[Figure: timeline t for threads T_i, T_j and T_k showing the operations W(x,0), W(y,0), W(x,1), R(x)=1, R(y)=0, W(y,1), R(x)=0, R(y)=1, R(y)=1, R(x)=1]

• Models: Relaxed Memory Order, Sequential Consistency, etc.
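As a hedged illustration of why the memory model matters, the sketch below publishes a value through an atomic flag in C++. The names `payload`, `ready`, `producer` and `consumer` are hypothetical and do not come from the talk. The release/acquire pair guarantees that a reader that sees the flag also sees the data written before it; relaxed ordering on the flag would not give that guarantee.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared state, not taken from the slides.
int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag that publishes the data

// Writer side: store the data, then publish it with release ordering.
void producer(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: spin until the flag is set, then read the data.
// The acquire load synchronizes with the release store, so the
// earlier write to payload is visible once the flag is seen.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // busy-wait; a real program might yield or back off here
    }
    return payload;
}
```

In a real program `producer` and `consumer` would run on different threads; the functions are written so that either order of arrival is safe.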

Linearizability

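• Every concurrent execution can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order of the operations

[Figure: timeline t of Write, Read and Write operations on threads T_i, T_j and T_k, with the equivalent serial order (Ser)]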

Hardware Synchronization Primitives

• Consensus number 1
  - Atomic Read/Write
• Consensus number 2
  - Atomic Test-And-Set (TAS), Fetch-And-Add (FAA): M=f(M,…)
• Consensus number infinite
  - Atomic Compare-And-Swap (CAS)
  - Atomic Load-Linked/Store-Conditional (LL/SC)
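The primitives above can be sketched with C++ `std::atomic`, assuming a platform where these operations compile to native atomic instructions (the standard does not promise that for every type). The wrapper names `test_and_set`, `fetch_and_add` and `compare_and_swap` are hypothetical; they only give the slide's terms a concrete signature. LL/SC has no direct C++ equivalent.

```cpp
#include <atomic>
#include <cassert>

// Test-And-Set: atomically set the flag and return its previous value.
bool test_and_set(std::atomic_flag& flag) {
    return flag.test_and_set();
}

// Fetch-And-Add: atomically add delta and return the previous value.
int fetch_and_add(std::atomic<int>& m, int delta) {
    return m.fetch_add(delta);
}

// Compare-And-Swap: if m equals expected, store desired and return true;
// otherwise leave m unchanged and return false.
bool compare_and_swap(std::atomic<int>& m, int expected, int desired) {
    return m.compare_exchange_strong(expected, desired);
}
```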

Mutual Exclusion

• Access to shared data is atomic because of the lock

[Figure: processes P1, P2 and P3]

• Reduced parallelism by definition
• Blocking; danger of priority inversion and deadlocks
  - Solutions exist, but with high overhead, especially for multi-processor systems
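A minimal sketch of mutual exclusion, assuming a simple spinlock built on an atomic exchange (which acts as test-and-set). The names `SpinLock`, `counter_lock`, `counter` and `locked_increment` are hypothetical. The sketch also shows the blocking behaviour listed above: if the thread holding the lock is delayed, every other caller of `lock()` keeps spinning.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal spinlock; exchange(true) plays the role of test-and-set.
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // Spin until the previous value was false, i.e. this caller took the lock.
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};

SpinLock counter_lock;
long counter = 0;   // shared data protected by counter_lock

// Critical section: only one thread at a time executes the increment.
void locked_increment() {
    counter_lock.lock();
    ++counter;
    counter_lock.unlock();
}
```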

Non-blocking Synchronization

• Perform operations/changes using atomic primitives
• Lock-Free Synchronization
  - Optimistic approach
    - Retries until succeeding
• Wait-Free Synchronization
  - Always finishes in a finite number of its own steps
    - Coordination with all participants
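The optimistic retry pattern can be sketched as a read, a local computation, and a CAS that is repeated on conflict. The function `atomic_fetch_max` below is a hypothetical example, not something from the talk. A thread may retry an unbounded number of times while other threads keep changing the value, which is why this style is generally lock-free rather than wait-free.

```cpp
#include <atomic>
#include <cassert>

// Lock-free "store maximum": raise target to candidate if candidate is larger.
// Returns the value observed just before the update (or the current value
// if no update was needed).
int atomic_fetch_max(std::atomic<int>& target, int candidate) {
    int observed = target.load();
    while (observed < candidate &&
           !target.compare_exchange_weak(observed, candidate)) {
        // On failure, observed now holds the current value of target;
        // the loop condition re-checks whether an update is still needed.
    }
    return observed;
}
```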

Non-Blocking Dynamic Memory Management

• I.e. malloc/free
• Fixed block size, fixed purpose
  - Lock-Free
  - Wait-Free (to be published)
• Any block size, any/fixed purpose
  - Open problem

Reference Counting

• Questions
  - When to free objects?
  - How to dereference pointers safely?

[Figure: processes P1, P2 and P3]

• Solutions (in cooperation with the dynamic memory manager)
  - Lock-Free
  - Wait-Free (to be published)
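A hedged sketch of the simple half of the problem: counting references with atomic increments and decrements, where the thread that drops the last reference frees the object. The names `Node`, `add_ref` and `drop_ref` are hypothetical. The sketch assumes the caller of `add_ref` already holds a reference; it does not solve the harder question above of safely dereferencing a shared pointer that another thread may free at the same moment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object.
struct Node {
    std::atomic<int> refs{1};   // the creator holds the first reference
    int value = 0;
};

// Take an extra reference. Precondition (assumption): the caller already
// holds one, so the node cannot be freed concurrently.
void add_ref(Node* n) {
    n->refs.fetch_add(1, std::memory_order_relaxed);
}

// Drop a reference; the thread that drops the last one frees the node.
// Returns true if the node was freed by this call.
bool drop_ref(Node* n) {
    if (n->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete n;
        return true;
    }
    return false;
}
```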

Software Synchronization Primitives

• Atomic Read/Write (WF with RT requirements, published)
• Multi-variable Read/Write, Snapshot (WF with RT requirements, published)
• LL/SC (LF, to be published)
• Multi-word Compare-And-Swap (CASN), i.e. transactions
  - Lock-Free
  - Wait-Free (to be published)
• LL/SCN

Shared Data Structures

• General LF/WF schemes exist to implement all data structures; however, they are not practical
• NOBLE library (published)
• Lock-Free
  - Stack
  - Queue
  - Priority Queue (published)
  - Dictionary (published)
  - Deque (to be published)
  - Singly Linked Lists
  - Doubly Linked Lists (to be published)
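To make the simplest entry in the list concrete, here is a minimal sketch of a lock-free stack in the well-known Treiber style. It is not the NOBLE API, and the class name `LockFreeStack` is hypothetical. One loud assumption: popped nodes are never freed, which sidesteps the ABA and use-after-free problems that a non-blocking memory manager or reference counting would have to solve in a real implementation.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical minimal lock-free stack (Treiber style); not the NOBLE API.
class LockFreeStack {
    struct Node {
        int value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};

public:
    void push(int value) {
        Node* node = new Node{value, head_.load()};
        // Retry until head_ still equals node->next at the moment of the swap;
        // on failure, node->next is refreshed with the current head.
        while (!head_.compare_exchange_weak(node->next, node)) {
        }
    }

    // Pops into out; returns false if the stack was empty.
    // Assumption: the popped node is intentionally leaked (see lead-in).
    bool pop(int& out) {
        Node* node = head_.load();
        while (node != nullptr &&
               !head_.compare_exchange_weak(node, node->next)) {
        }
        if (node == nullptr) {
            return false;
        }
        out = node->value;
        return true;
    }
};
```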

Evaluation of Non-Blocking Synchronization

• Experiments on parallel machines with 2, 4 and 64 CPUs, on UMA/NUMA architectures
• Stress tests of data structures (published)
• Benchmark applications
• Real applications
• Open subject

Clusters and Grids

[Figure: several nodes, each with CPUs, a cache, a bus, I/O and memory]

• Distributed Shared Memory
• Atomic synchronization primitives?
• Relaxed Memory Models?
• Modification of non-blocking algorithms?