Lock-Free Resizeable Concurrent Tries
Aleksandar Prokopec, Phil Bagwell, Martin Odersky
LAMP, École Polytechnique Fédérale de Lausanne
Switzerland

Motivation
    xs.foreach { x =>
      doSomething(x)
    }

    ys = xs.map { x =>
      x * (-1)
    }

    ys = new ConcurrentMap
    xs.foreach { x =>
      ys.insert(x * (-1))
    }
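To make the last snippet concrete, here is a sketch that uses `scala.collection.concurrent.TrieMap` from the Scala standard library as a stand-in for the abstract `ConcurrentMap`. Splitting the work across four threads and storing `Unit` values (so the map behaves like a set) are illustrative choices, not part of the slide.

```scala
import scala.collection.concurrent.TrieMap

// TrieMap plays the role of the slide's ConcurrentMap; Unit values
// make it behave like a concurrent set of the negated elements.
val xs = (1 to 1000).toVector
val ys = TrieMap.empty[Int, Unit]

// Split xs into 4 chunks and let a separate thread insert each chunk,
// like a parallel xs.foreach { x => ys.insert(x * (-1)) }.
val threads = xs.grouped(250).toVector.map { chunk =>
  new Thread(() => chunk.foreach(x => ys.put(x * (-1), ())))
}
threads.foreach(_.start())
threads.foreach(_.join())

println(ys.size) // 1000
```

No thread coordinates with the others; the map itself must keep every concurrent insertion.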
Hash Array Mapped Tries (HAMT)
[Figure sequence: the keys 0 = 000000₂, 16 = 010000₂, 4 = 000100₂ and 12 = 001100₂, then 33, 48, 37 and 3, and finally 1, 8, 9, 20, 25 and 57 are inserted one at a time. Each trie level is indexed by two bits of the hash code (here, the key itself), so every node has four slots; when two keys fall into the same slot, they are pushed down into a new child node.]
Too much space! Every node allocates a slot for each possible branch, even when most slots are empty.
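A sketch of how the figures appear to route keys, under the assumption (inferred from the examples, not stated on the slides) that keys are their own 6-bit hash codes and that each level consumes two bits, most significant pair first. The names `slot` and `path` are hypothetical.

```scala
// Assumptions matching the figures: keys are their own 6-bit hash codes,
// each level consumes 2 bits starting from the most significant pair,
// so every node has 4 slots.
val HashBits = 6
val BitsPerLevel = 2
val Levels = HashBits / BitsPerLevel

// Slot index (0 to 3) of hash code `hc` in a node at depth `level`.
def slot(hc: Int, level: Int): Int =
  (hc >>> (HashBits - BitsPerLevel * (level + 1))) & ((1 << BitsPerLevel) - 1)

// Slot indices a key follows from the root down to the deepest level.
def path(hc: Int): List[Int] =
  (0 until Levels).toList.map(level => slot(hc, level))

println(path(16)) // List(1, 0, 0)
println(path(4))  // List(0, 1, 0)
println(path(0))  // List(0, 0, 0)
println(path(3))  // List(0, 0, 3)
```

So 0 and 4 share slot 0 at the root but separate at the second level, while 0 and 3 only separate at the third. With full arrays, each of those nodes still reserves all four slots. Note that the BITPOP formula on a later slide takes bits from the low end of the hash code instead.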
[Figure: the same trie with each node storing only its occupied entries, for example 0 1 3, 8 9, 16 20 25, 33 37 and 48 57.]
Linear search at every level: slow! A node no longer records which slot each entry belongs to, so a lookup has to scan it.
Solution: a bitmap index!
Relying on the BITPOP (population count) instruction: each node keeps a bitmap of its occupied slots, and an entry's position in the compact array is the number of set bits below its slot.
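[Figure: the node holding 48 and 57 has the bitmap 1 0 1 0, one bit per slot, and an array containing only 48 and 57.]
Position of an entry in the array, where hc is the hash code, lev the bit offset of the current level and BMP the node's bitmap:
BITPOP(((1 << ((hc >> lev) & 0x1F)) - 1) & BMP)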
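The BITPOP formula can be sketched on the JVM, where `Integer.bitCount` computes the population count. This is a minimal, hypothetical `Node` with a 32-bit bitmap and a compact array of bare hash codes; it uses the unsigned shift `>>>` (which avoids sign extension) in place of `>>`, and it does not handle two keys competing for the same slot.

```scala
// 32-way node: bit i of `bmp` is set when slot i is occupied, and
// `array` holds only the occupied entries, in slot order.
// Entries are stored as bare hash codes to keep the sketch small.
final case class Node(bmp: Int, array: Vector[Int]) {
  // flag = bit of this level's slot; pos = BITPOP of the bits below it.
  private def flagAndPos(hc: Int, lev: Int): (Int, Int) = {
    val flag = 1 << ((hc >>> lev) & 0x1F)
    (flag, Integer.bitCount((flag - 1) & bmp))
  }

  def contains(hc: Int, lev: Int): Boolean = {
    val (flag, pos) = flagAndPos(hc, lev)
    (bmp & flag) != 0 && array(pos) == hc
  }

  // Insert into a free slot; an occupied slot is left unchanged here
  // (a real HAMT would replace the entry or push both keys down a level).
  def inserted(hc: Int, lev: Int): Node = {
    val (flag, pos) = flagAndPos(hc, lev)
    if ((bmp & flag) != 0) this
    else Node(bmp | flag, array.patch(pos, List(hc), 0))
  }
}

val n = Node(0, Vector.empty).inserted(57, 0).inserted(48, 0)
println(n.array)           // Vector(48, 57)
println(n.contains(57, 0)) // true
println(n.contains(1, 0))  // false
```

At level offset 0, 57 lands in slot 25 and 48 in slot 16, so 48 is stored first even though it was inserted second: only the bitmap, not the array, records slot numbers.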
For 32-way tries, each node has a 32-bit bitmap.
[Figure sequence: removing 8 leaves 9 as the only entry of its node; that node is then dropped and 9 is stored directly in its parent, which becomes 4 9 12.]
Removal compresses the trie.
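The compression step can be sketched with hypothetical immutable node types: removing an entry clears its bit and drops it from the array, and a node left with a single leaf is handed back so the parent can store that leaf directly, as happens to 9 in the figure. Empty nodes and nodes left with a single child node are not handled in this sketch.

```scala
// Hypothetical immutable node types: a bitmap-indexed node holding
// leaves (keys) or child nodes.
sealed trait Entry
final case class Leaf(key: Int) extends Entry
final case class Node(bmp: Int, array: Vector[Entry]) extends Entry

// Remove the entry whose slot bit is `flag`. If exactly one leaf is left,
// return that leaf so the parent can store it in place of the whole node.
def removeAt(n: Node, flag: Int): Entry = {
  val pos  = Integer.bitCount((flag - 1) & n.bmp)
  val rest = Node(n.bmp & ~flag, n.array.patch(pos, Nil, 1))
  rest.array match {
    case Vector(single: Leaf) => single
    case _                    => rest
  }
}

// Node "8 9" from the figure: 8 in slot 0, 9 in slot 1 of a 4-way node.
println(removeAt(Node(0x3, Vector(Leaf(8), Leaf(9))), 1 << 0)) // Leaf(9)
```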
• advantages:
  • low space consumption, and the trie shrinks when keys are removed
  • no contiguous memory region required
  • fast: logarithmic complexity, but with a low constant factor
  • used as efficient immutable maps
  • no global resize phase: suitable for real-time applications, and potentially more scalable
Concurrent operations?
Concurrent Trie (Ctrie)
• goals:
  • thread-safe concurrent trie
  • maintain the advantages of HAMT
  • rely solely on CAS instructions
  • ensure lock-freedom and linearizability
• lookup: probably the same as for HAMT
CAS instruction
CAS(address, expected_value, new_value)
Atomically replaces the value at address with new_value if the current value equals expected_value.
Returns true if successful, false otherwise.
May fail spuriously.
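On the JVM, `compareAndSet` on the classes in `java.util.concurrent.atomic` provides these semantics; the strong `compareAndSet` used here does not fail spuriously (the weak variants may). A minimal sketch:

```scala
import java.util.concurrent.atomic.AtomicInteger

val cell = new AtomicInteger(5)

val first  = cell.compareAndSet(5, 6) // current value is 5: replaced by 6
val second = cell.compareAndSet(5, 7) // current value is 6: nothing happens

println((first, second, cell.get)) // (true,false,6)
```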
Lock-freedom
If multiple threads execute an operation, at least one of them will complete the operation within a finite number of steps.
    def counter() =
      do {
        a = READ(addr)              // read the current value
        b = a + 1
      } while (!CAS(addr, a, b))    // retry if another thread changed addr
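A runnable JVM version of the counter, as a sketch: `AtomicInteger` plays the role of `addr`, `get` stands for READ, and the loop is written as tail recursion because Scala 3 no longer supports do-while. Four threads each increment 10,000 times; no thread ever waits for another, and a CAS can only fail because some other thread's CAS succeeded.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.annotation.tailrec

val addr = new AtomicInteger(0)

// Lock-free increment: read, compute, CAS; retry if another thread won.
@tailrec def counter(): Int = {
  val a = addr.get // READ(addr)
  val b = a + 1
  if (addr.compareAndSet(a, b)) b else counter()
}

val threads = Vector.fill(4)(new Thread(() => for (_ <- 1 to 10000) counter()))
threads.foreach(_.start())
threads.foreach(_.join())

println(addr.get) // 40000
```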
Insertion
[Figure sequence: inserting 17 = 010001₂, which falls into the slot of 16 in the node 16 20 25: 1) allocate a new node holding 16 17; 2) CAS it into the trie, leaving the node 20 25 with the child 16 17. Inserting 18 = 010010₂ works the same way: 1) allocate 16 17 18; 2) CAS it in place of 16 17.]
Unless…
[Figure sequence: T1 inserts 18 while T2 inserts 28 = 011100₂ into the node 20 25. T1-1) allocate 16 17 18; T2-1) allocate 20 25 28, which still points to 16 17; T2-2) CAS 20 25 28 into the trie; T1-2) CAS 16 17 18 into the old node 20 25, which is no longer reachable.]
Lost insert!
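The lost insert can be replayed deterministically in a hypothetical naive model where child nodes live directly in a node's array and a thread replaces a child by CASing one array slot. T2 has to copy its node because adding 28 grows the array. The steps run on one thread in the order of the figure, so the outcome does not depend on scheduling.

```scala
import java.util.concurrent.atomic.AtomicReferenceArray

// Hypothetical naive design: child entries live directly in a node's
// array, and a thread replaces a child by CASing one array slot.
sealed trait Entry
final case class Key(k: Int) extends Entry
final class Node(entries: Entry*) extends Entry {
  val slots = new AtomicReferenceArray[Entry](entries.toArray)
  def keys: List[Int] = (0 until slots.length).toList.flatMap { i =>
    slots.get(i) match {
      case Key(k)  => List(k)
      case n: Node => n.keys
    }
  }
}

val small = new Node(Key(16), Key(17))
val mid   = new Node(small, Key(20), Key(25))
val root  = new Node(mid) // simplified root with a single child

// T1-1: T1 (insert 18) reads the path and allocates 16 17 18.
val t1Parent = root.slots.get(0).asInstanceOf[Node]
val t1New    = new Node(Key(16), Key(17), Key(18))

// T2-1 / T2-2: T2 (insert 28) copies 20 25 into 20 25 28 and CASes it in.
val t2Ok = root.slots.compareAndSet(0, mid, new Node(small, Key(20), Key(25), Key(28)))

// T1-2: T1 CASes into the node it read earlier, which is now detached.
val t1Ok = t1Parent.slots.compareAndSet(0, small, t1New)

println((t1Ok, t2Ok))           // (true,true)
println(root.keys.contains(18)) // false: the insert of 18 is lost
```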
Insertion – 2nd attempt
Solution: I-nodes
[Figure sequence: the same race, with an I-node (indirection node) between each node and its parent. T1-1) allocate 16 17 18 and T2-1) allocate 20 25 28; T2-2) and T1-2) CAS the new nodes into two different I-nodes, so both succeed and the trie holds 20 25 28 above 16 17 18.]
Idea: once added to the Ctrie, I-nodes remain present.
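Replaying the same race with I-nodes, in a hypothetical model where C-nodes are immutable and every child sits behind an I-node whose `main` field is the only thing ever CASed. Because T2's copy still points to the same I-node, T1's later update stays reachable.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model: C-nodes are immutable lists of branches; every
// child node sits behind an I-node whose `main` is the only mutable field.
sealed trait Branch
final case class Key(k: Int) extends Branch
final case class CNode(branches: List[Branch])
final class INode(init: CNode) extends Branch {
  val main = new AtomicReference[CNode](init)
}

def keys(in: INode): List[Int] = in.main.get.branches.flatMap {
  case Key(k)   => List(k)
  case i: INode => keys(i)
}

val small = new INode(CNode(List(Key(16), Key(17))))
val mid   = new INode(CNode(List(small, Key(20), Key(25))))

// T1-1 / T2-1: read the current C-node and allocate an updated copy.
val t1Old = small.main.get
val t1New = CNode(t1Old.branches :+ Key(18)) // T1 inserts 18
val t2Old = mid.main.get
val t2New = CNode(t2Old.branches :+ Key(28)) // T2 inserts 28, keeps the I-node `small`

// T2-2 / T1-2: each CAS targets its own I-node, and I-nodes never move.
val t2Ok = mid.main.compareAndSet(t2Old, t2New)
val t1Ok = small.main.compareAndSet(t1Old, t1New)

println(keys(mid)) // List(16, 17, 18, 20, 25, 28)
```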
Remove
Idea: same logic as insert.
[Figure sequence: removing 17: 1) allocate 16 18; 2) CAS it in place of 16 17 18. Removing 16, 28, 3, 12, 0, 25, 4, 9 and 20 in turn leaves only 1 and 18, but the nodes along their paths stay in the trie.]
Ctrie is not compact, so it could be faster.
Remove – 2nd attempt
[Figure sequence: 18 is the only key left in its node, so the node is contracted: 3) allocate 18 20 25 28, a copy of the parent with 18 stored directly; 4) CAS it in place of 20 25 28.]
Not correct.
[Figure sequence: T1 compresses while T2 inserts 17. T1-3) allocate 18 20 25 28; T2-1) allocate 17 18; T2-2) CAS 17 18 in place of the node holding 18; T1-4) CAS 18 20 25 28 in place of 20 25 28. Both CASes succeed, but the new parent no longer refers to 17 18, so 17 is lost.]
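The naive compression race, replayed in the same kind of hypothetical I-node model: T1's CAS on the parent cannot notice that T2 changed the child it is about to inline.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical I-node model: immutable C-nodes behind I-nodes.
sealed trait Branch
final case class Key(k: Int) extends Branch
final case class CNode(branches: List[Branch])
final class INode(init: CNode) extends Branch {
  val main = new AtomicReference[CNode](init)
}

def keys(in: INode): List[Int] = in.main.get.branches.flatMap {
  case Key(k)   => List(k)
  case i: INode => keys(i)
}

val lone = new INode(CNode(List(Key(18)))) // only 18 is left here
val mid  = new INode(CNode(List(lone, Key(20), Key(25), Key(28))))

// T1-3: T1 (compress) reads `mid` and allocates 18 20 25 28 with 18 inlined.
val t1Old = mid.main.get
val t1New = CNode(List(Key(18), Key(20), Key(25), Key(28)))

// T2-1 / T2-2: T2 (insert 17) updates `lone`; nothing stops it.
val t2Old = lone.main.get
val t2Ok  = lone.main.compareAndSet(t2Old, CNode(Key(17) :: t2Old.branches))

// T1-4: `mid` itself is unchanged, so T1's CAS succeeds and detaches `lone`.
val t1Ok = mid.main.compareAndSet(t1Old, t1New)

println(keys(mid)) // List(18, 20, 25, 28): 17 is lost
```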
Remove – 3rd attempt
Idea: disallow insertions while compressing.
[Figure sequence: T1-3) allocate a T-node holding 18; T1-4) CAS it in place of the node holding 18. T2 (insert 17) has allocated 17 18 in T2-1), but its T2-2) CAS fails, so it repeats. T1-5) allocate 18 20 25 28. T2-1) on reading the T-node, T2 does the same as T1, then repeats its insertion.]
[Figure: T1-6) CAS 18 20 25 28 in place of 20 25 28.]
Is this still lock-free?
Yes, roughly: whoever sees the T-node will help remove it, and there is a finite number of T-nodes (full proof in the paper).
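A deterministic replay of the T-node idea in a hypothetical model: once T1 entombs the lone I-node, T2's CAS on it fails, and T2 helps by running the contraction itself before retrying. The `contract` helper and the simplified retry one level up are assumptions made for illustration; the full algorithm in the paper handles these cases in more detail.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model: an I-node's `main` points either to an immutable
// C-node or to a T-node (tomb) that freezes a single remaining key.
sealed trait Branch
final case class Key(k: Int) extends Branch
sealed trait MainNode
final case class CNode(branches: List[Branch]) extends MainNode
final case class TNode(key: Key) extends MainNode
final class INode(init: MainNode) extends Branch {
  val main = new AtomicReference[MainNode](init)
}

def keys(in: INode): List[Int] = in.main.get match {
  case TNode(Key(k)) => List(k)
  case CNode(bs) => bs.flatMap {
    case Key(k)   => List(k)
    case i: INode => keys(i)
  }
}

// Contraction: replace each entombed child I-node of `parent` by its key.
// Any thread that finds a T-node can run this to help finish a removal.
def contract(parent: INode): Boolean = parent.main.get match {
  case old @ CNode(bs) =>
    val inlined: List[Branch] = bs.map {
      case i: INode => i.main.get match {
        case TNode(k) => k
        case _        => i
      }
      case k: Key => k
    }
    parent.main.compareAndSet(old, CNode(inlined))
  case _: TNode => false
}

val lone = new INode(CNode(List(Key(18))))
val mid  = new INode(CNode(List(lone, Key(20), Key(25), Key(28))))

val t2Read = lone.main.get // T2 (insert 17) reads the C-node holding 18

// T1-3 / T1-4: T1 (compress) first replaces that C-node by a T-node.
val entombed = lone.main.compareAndSet(lone.main.get, TNode(Key(18)))

// T2-2 fails: `lone` no longer holds the C-node T2 read.
val t2First = t2Read match {
  case CNode(bs) => lone.main.compareAndSet(t2Read, CNode(Key(17) :: bs))
  case _         => false
}

// T2 rereads, finds the T-node and does what T1 would do next.
val helped = lone.main.get.isInstanceOf[TNode] && contract(mid)

// T2 repeats its insertion one level up. This list model has no hash
// slots, so 17 is simply added to the parent's C-node.
val t2Retry = mid.main.get match {
  case old @ CNode(bs) => mid.main.compareAndSet(old, CNode(Key(17) :: bs))
  case _               => false
}

println(keys(mid).sorted) // List(17, 18, 20, 25, 28)
```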
Is this linearizable?
Yes, roughly: the CAS instruction that makes the new value reachable is the linearization point (see the paper for the full list).
Evaluation – quad-core i7
Evaluation – UltraSPARC T2
Evaluation – 4x 8-core i7
Summary
• pseudocode and implementation for a concurrent hash trie
• properties proven:
  • correctness
  • linearizability
  • lock-freedom
  • compactness
• performance evaluation: scalable insertion and removal
Future work
• concurrent memory pool to avoid GC
• lock-free size, iterator and clear operations running in O(1)
Thank you!