Synchronization Algorithms
and Concurrent Programming
Gadi Taubenfeld
Chapter 5
Barrier Synchronization
Version: June 2014
This presentation is a modified version of a presentation that Itai Avrian and
Shachar Gidron prepared for my Seminar in Concurrent and Distributed Computing, 2012.
Synchronization Algorithms and Concurrent Programming
ISBN: 0131972596, 1st edition
A note on the use of these ppt slides:
I am making these slides freely available to all (faculty, students, readers).
They are in PowerPoint form so you can add, modify, and delete slides and slide
content to suit your needs. They obviously represent a lot of work on my part.
In return for use, I only ask the following:
• That you mention their source; after all, I would like people to use my book!
• That you note that they are adapted from (or perhaps identical to)
my slides, and note my copyright of this material.
Thanks and enjoy!
Gadi Taubenfeld
All material copyright 2014
Gadi Taubenfeld, All Rights Reserved
To get the most updated version of these slides go to:
http://www.faculty.idc.ac.il/gadi/book.htm
Chapter 5
Synchronization Algorithms and Concurrent Programming
Gadi Taubenfeld © 2014
Chapter 5
Barrier Synchronization
5.1 Barriers
5.2 Atomic Counter
5.3 Test-and-set Bits
5.4 Combining Tree Barrier*
5.5 A Tree-based Barrier
5.6 The Dissemination Barrier*
5.7 The See-Saw Barrier
5.8 Semaphores
5.9 Bibliographic Notes*
5.10 Problems*
* Not covered in this presentation
Definition and Motivation
Barrier Synchronization
What is a Barrier?
[Figure: a timeline of four processes P1..P4 in three snapshots. Four processes approach the barrier; all except P4 arrive; once all arrive, they continue.]
What is a Barrier?
• A barrier is a coordination mechanism (an algorithm) that forces the processes
  participating in a concurrent (or distributed) algorithm to wait until each one
  of them has reached a certain point in its program.
• The collection of these coordination points is called the barrier.
• Once all the processes have reached the barrier, they are all permitted to
  continue past the barrier.
Example: Parallel Prefix Sum
[Figure: an array holds a, b, c, d, e, f at "begin" and its prefix sums at "end"; time flows downward.]
begin:  a   b     c       d         e           f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f
Example: Parallel Prefix Sum
[Figure: the array after each step when one new prefix is computed per step.]
begin:  a   b     c       d         e           f
        a   a+b   c       d         e           f
        a   a+b   a+b+c   d         e           f
        a   a+b   a+b+c   a+b+c+d   e           f
        a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f
Example: Parallel Prefix Sum
[Figure: the array after each step when all positions are updated in parallel.]
begin:  a   b     c       d         e           f
        a   a+b   b+c     c+d       d+e         e+f
        a   a+b   a+b+c   a+b+c+d   b+c+d+e     c+d+e+f
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f
Example: Parallel Prefix Sum
[Figure: the same parallel computation, with barriers separating the steps.]
begin:  a   b     c       d         e           f
        a   a+b   b+c     c+d       d+e         e+f
        ------------------ barrier ------------------
        a   a+b   a+b+c   a+b+c+d   b+c+d+e     c+d+e+f
        ------------------ barrier ------------------
end:    a   a+b   a+b+c   a+b+c+d   a+b+c+d+e   a+b+c+d+e+f
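The parallel prefix-sum figure can be sketched in Python as follows. This is only an illustration under assumptions that are not in the slides: it uses the standard library's `threading.Barrier` rather than any barrier from this chapter, one thread per array element, and a doubling distance (1, 2, 4, ...) between steps, which matches the rows shown in the figure. The function name `parallel_prefix_sum` is made up for the example.

```python
import threading


def parallel_prefix_sum(values):
    """Inclusive prefix sums of a list of numbers, one thread per element."""
    n = len(values)
    if n == 0:
        return []
    x = list(values)
    barrier = threading.Barrier(n)

    def worker(i):
        d = 1
        while d < n:
            # Read phase: take the partner's value from the previous step.
            addend = x[i - d] if i >= d else 0
            barrier.wait()      # nobody writes until everybody has read
            x[i] += addend
            barrier.wait()      # nobody reads until everybody has written
            d *= 2

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

Without the two `barrier.wait()` calls a fast thread could read a neighbour's value from the wrong step, which is the hazard the last figure addresses by inserting barriers.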
Example: Video
Single thread
• Assume we have a video application
• Each frame needs to be calculated before being displayed
• Prepare the frame for display by the graphics processor
while (true)
{
    frame = prepare_next_frame();
    frame.display();
}
Example: Video
Multiple threads
• Now we have n threads running in parallel
• It makes sense to split the frame into n disjoint parts
• Each thread prepares its own part in parallel with the others
• Each thread may run on a different graphics processor
Barrier globalBarrier;
i = getThreadID();
while (true)
{
    frame[ i ].prepare();
    globalBarrier.await();
    frame[ i ].display();
}
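A runnable approximation of the multi-threaded loop, assuming Python's `threading.Barrier` stands in for `globalBarrier` and a shared log stands in for the real prepare and display work. The names `render`, `record`, and the event tuples are hypothetical; the loop runs for a fixed number of frames instead of forever so that it terminates.

```python
import threading


def render(n_threads, n_frames):
    """Each thread prepares its part, waits at the barrier, then displays it."""
    barrier = threading.Barrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def record(event):
        with log_lock:
            log.append(event)

    def worker(i):
        for frame in range(n_frames):
            record(("prepare", frame, i))   # stands in for frame[i].prepare()
            barrier.wait()                  # globalBarrier.await()
            record(("display", frame, i))   # stands in for frame[i].display()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log, every "display" event of a frame appears after all "prepare" events of that frame, and the same barrier object is reused for every frame.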
Where it is needed
• Scientific & numeric computation
• Computer graphics
• Garbage collection
• Parallel computing in general
Various Barrier Goals
Ideally, when designing barriers, we would like to have the following properties:
• Low shared memory space complexity
• Low contention on shared objects
• Few shared memory references per process
• No need for shared memory initialization
• Symmetry (same amount of work for all processes)
• Algorithm simplicity
• Simple basic primitive
• Minimal propagation time
• Reusability of the barrier (a must!)
Data Objects in Use
• Atomic bit
• Atomic register
• Fetch-and-increment register
• Test-and-set bits
• Read-modify-write register
• Semaphores
Barriers using atomic counters
Section 5.2
Atomic Bit
Atomic Register
Fetch-and-increment register / atomic counter
Fetch-and-increment Register
• A shared register that supports a fetch-and-increment (F&I) operation
• Input: register r
• Atomic operation:
  - r is incremented by 1
  - the old value of r is returned
function fetch-and-increment (r : register)
    orig_r := r;
    r := r + 1;
    return (orig_r);
end-function
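A minimal Python sketch of such a register. Python has no hardware fetch-and-increment exposed to user code, so this example assumes that a `threading.Lock` is an acceptable way to make the read and the increment one indivisible step; the class and method names are illustrative only.

```python
import threading


class FetchAndIncrement:
    """A shared register with an atomic fetch-and-increment operation."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()   # assumption: the lock provides atomicity

    def fetch_and_increment(self):
        with self._lock:
            orig = self._value          # orig_r := r
            self._value = orig + 1      # r := r + 1
            return orig                 # return (orig_r)

    def read(self):
        with self._lock:
            return self._value
```

Because the operation is atomic, no two callers ever receive the same value, which is what lets the barriers in this section identify the last process to arrive.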
await macro
• For clarity, we use the await macro
• Not an operation of an object
• This is also called "spinning"
macro await (condition : boolean condition)
    repeat
        cond = eval(condition);
    until (cond)
end-macro
Simple Barrier Using an Atomic Counter
Program of a Process
shared
    counter: fetch-and-increment register, ranges over {0,...,n}, initially 0
    go: atomic bit, initial value is immaterial
local
    local.go: a bit, initial value is immaterial
    local.counter: register
local.go := go
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
    go := 1 - go
else await(local.go ≠ go) fi
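The program above can be tried out with the following Python sketch. It is an approximation under stated assumptions: a lock emulates the atomic fetch-and-increment, plain attributes emulate the atomic bit and register, and `time.sleep(0)` is added inside the spin loop only to yield the processor. The class name `CounterBarrier` is made up.

```python
import threading
import time


class CounterBarrier:
    """Reusable barrier for n threads: a counter plus a go bit."""

    def __init__(self, n):
        self.n = n
        self.counter = 0
        self.go = 0                         # initial value is immaterial
        self._lock = threading.Lock()       # emulates atomic fetch-and-increment

    def _fetch_and_increment(self):
        with self._lock:
            old = self.counter
            self.counter = old + 1
            return old

    def wait(self):
        local_go = self.go                          # remember the current go
        local_counter = self._fetch_and_increment()
        if local_counter + 1 == self.n:             # last to arrive
            self.counter = 0                        # prepare the next episode
            self.go = 1 - self.go                   # release everybody
        else:
            while local_go == self.go:              # await(local.go != go)
                time.sleep(0)
```

The counter is reset before `go` is flipped, so by the time any waiting thread can re-enter the barrier the counter is already back at 0.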
Simple Barrier Using an Atomic Counter
Run for n=2 Processes
[Animation: shared memory (SM) holds counter and go; P1 and P2 each hold local.go and local.counter, all initially unknown (?). Both processes read go = 0 into local.go. P1's fetch-and-increment returns 0 and counter becomes 1; since 0+1 ≠ 2, P1 busy-waits on go. P2's fetch-and-increment returns 1 and counter becomes 2; since 1+1 = 2, P2 resets counter to 0 and flips go to 1, which releases P1.]
Simple Barrier Using an Atomic Counter
Another Run for n=2 Processes
[Animation: a second interleaving with the same outcome. P1: 0+1 ≠ 2, so P1 busy-waits. P2: 1+1 = 2, so P2 resets counter and flips go. Note on the slide: counter is a fetch-and-increment register.]
Another Algorithm Using an Atomic Counter
Program of a Process
shared
    counter: fetch-and-increment register, ranges over {0,...,n}, initially 0
local
    local.counter: register
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
else await(counter = 0) fi
Question: Is this implementation incorrect?
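One way to explore the question is to replay a specific schedule by hand. The sketch below is not from the slides: it is a sequential, deterministic replay for n = 2 in which the barrier is used twice and the fast process re-enters before the slow one has observed counter = 0. The function name and the returned tuple are hypothetical.

```python
def replay_bad_interleaving():
    """Replay one schedule of the counter-only barrier for n = 2, used twice."""
    n = 2
    shared = {"counter": 0}

    def fetch_and_increment():
        old = shared["counter"]
        shared["counter"] = old + 1
        return old

    # Episode 1, P1: not last (0 + 1 != 2), so it will await(counter = 0).
    p1_local = fetch_and_increment()
    assert p1_local + 1 != n

    # Episode 1, P2: last (1 + 1 = 2), so it resets the counter and leaves.
    p2_local = fetch_and_increment()
    assert p2_local + 1 == n
    shared["counter"] = 0

    # Episode 2, P2 re-enters before P1 has observed counter = 0.
    p2_local = fetch_and_increment()
    assert p2_local + 1 != n            # P2 now also awaits counter = 0

    # Neither waiting process changes counter, so both conditions stay false.
    p1_released = shared["counter"] == 0
    p2_released = shared["counter"] == 0
    return shared["counter"], p1_released, p2_released
```

Under this schedule the call returns `(1, False, False)`: P1 is still waiting in the first episode, P2 is waiting in the second, and nobody is left to reset the counter. This suggests that the algorithm is fine for a single use but is not a reusable barrier, which is the role the go bit plays in the previous algorithm.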
Simple Barrier Using an Atomic Counter
• There is high memory contention on the go bit
• Reducing the contention:
  - Replace the go bit with n bits: go[1],…,go[n]
  - Process pi may spin only on the bit go[i]
A Local Spinning Counter Barrier
Program of a Process i
shared
    counter: fetch-and-increment register, ranges over {0,...,n}, initially 0
    go[1..n]: array of atomic bits, initial values are immaterial
local
    local.go: a bit, initial value is immaterial
    local.counter: register
local.go := go[i]
local.counter := fetch-and-increment(counter)
if local.counter + 1 = n then
    counter := 0
    for j = 1 to n do go[j] := 1 - go[j] od
else await(local.go ≠ go[i]) fi
A Local Spinning Counter Barrier
Example Run for n=3 Processes
[Animation: P1, P2 and P3 obtain 0, 1 and 2 from fetch-and-increment. P1 (0+1 ≠ 3) and P2 (1+1 ≠ 3) busy-wait on go[1] and go[2]. P3 (2+1 = 3) resets counter to 0 and flips every go[j], which releases P1 and P2.]
Comparison of fetch-and-increment Barriers
Simple Barrier
• Pros:
  - Very simple
  - Shared memory: O(log n) bits
  - Takes O(1) until the last waiting process is awakened
• Cons:
  - High contention on the go bit
  - Contention on the counter register (*)
Simple Barrier with go array
• Pros:
  - Low contention on the go array
  - In some models:
    - spinning is done on local memory
    - remote memory references: O(1)
• Cons:
  - Shared memory: O(n)
  - Still contention on the counter register (*)
  - Takes O(n) until the last waiting process is awakened
(*) One technique for solving this contention is the Combining Tree Barrier (page 210).
A Barrier without Memory Initialization
• A barrier is a basic synchronization method
• To initialize shared memory, processes need to be synchronized
• Thus, a barrier may be a prerequisite for shared memory initialization, and so cannot assume that the memory is already initialized
• Processes may not be implemented in the same way
• So it is desirable to reduce the dependency between them
A Barrier without Memory Initialization
Program of a Process
shared
    counter: atomic counter, ranges over {0,...,n-1}, initial value is immaterial
    go: atomic bit, initial value is immaterial
local
    local.go: a bit, initial value is immaterial
    local.counter: register, initial value is immaterial
local.go := go                        // remember current value
local.counter := counter              // remember current value
counter := counter + 1 (mod n)        // atomic increment mod n
repeat
    if counter = local.counter        // all processes have arrived
    then go := 1 - local.go fi        // notify all
until (local.go ≠ go)
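The following Python sketch mirrors the program above so that it can be run with arbitrary starting values. Assumptions not taken from the slides: a lock emulates the atomic increment mod n, the constructor parameters `counter` and `go` exist only to let a caller choose the "immaterial" initial values, and `time.sleep(0)` merely yields while spinning.

```python
import threading
import time


class UninitializedCounterBarrier:
    """Barrier whose shared counter and go bit may start with any values."""

    def __init__(self, n, counter=0, go=0):
        self.n = n
        self.counter = counter % n      # any value in {0, ..., n-1}
        self.go = go                    # any bit
        self._lock = threading.Lock()   # emulates the atomic increment

    def wait(self):
        local_go = self.go                      # remember current value
        local_counter = self.counter            # remember current value
        with self._lock:                        # atomic increment mod n
            self.counter = (self.counter + 1) % self.n
        while True:
            if self.counter == local_counter:   # all processes have arrived
                self.go = 1 - local_go          # notify all
            if local_go != self.go:
                break
            time.sleep(0)
```

The counter returns to the remembered value only after n increments, so no process needs to know where the counter started.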
Using Test-and-Set Bits
Section 5.3
Test-and-Set Bit
• Input: bit b
• Test-and-set is an atomic operation:
  - b is set to 1
  - the old value of b (i.e., 0 or 1) is returned
function test-and-set (b : bit)
    orig_b := b;
    b := 1;
    return (orig_b);
end-function
• An atomic reset operation, which sets the value to 0, is supported
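A small Python sketch of such a bit, assuming that a lock is an acceptable way to obtain atomicity. The class name `TasBit` and the extra `test` method (the plain read that a test-and-test-and-set bit adds) are illustrative choices, not part of the slides.

```python
import threading


class TasBit:
    """A bit supporting atomic test-and-set, reset, and a plain read."""

    def __init__(self, initial=0):
        self._b = initial
        self._lock = threading.Lock()   # assumption: the lock provides atomicity

    def test_and_set(self):
        with self._lock:
            orig = self._b              # orig_b := b
            self._b = 1                 # b := 1
            return orig                 # return (orig_b)

    def reset(self):
        with self._lock:
            self._b = 0

    def test(self):
        return self._b                  # atomic read
```

When several threads call `test_and_set` on a bit that holds 0, exactly one of them gets 0 back, which is how a bit like this can pick a single leader.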
Test-and-Test-and-Set Bit
Operations supported:
• Test-and-set and Reset, like a test-and-set bit
• Atomic read (test)
Test-and-set based Barrier
shared
    leader: test-and-set bit, initial value = 0
    countflag: test-and-test-and-set bit, initial value = 0
    go: atomic bit, initial value is immaterial
local
    local.go: a bit, initial value is immaterial
    local.counter: register, initial value is immaterial
Test-and-set based Barrier
local.go := go
if test-and-set(leader) = 0 then              // the leader
    local.counter := 0
    repeat
        await(countflag = 1)                  // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = n - 1)
    reset(leader)
    go := 1 - go
else                                          // the other processes
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi
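The leader-based program above can be exercised with this Python sketch. It assumes n ≥ 2, emulates the two special bits with a lock-protected helper class, and yields with `time.sleep(0)` while spinning. The names `TasBit` and `TasBarrier` are invented for the example.

```python
import threading
import time


class TasBit:
    """Lock-based stand-in for a test-and-(test-and-)set bit."""

    def __init__(self, initial=0):
        self._b = initial
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig, self._b = self._b, 1
            return orig

    def reset(self):
        with self._lock:
            self._b = 0

    def test(self):
        return self._b


class TasBarrier:
    """Barrier for n >= 2 threads; the first to set `leader` counts the rest."""

    def __init__(self, n):
        self.n = n
        self.leader = TasBit(0)
        self.countflag = TasBit(0)
        self.go = 0

    def wait(self):
        local_go = self.go
        if self.leader.test_and_set() == 0:              # the leader
            local_counter = 0
            while True:
                while self.countflag.test() != 1:        # a test operation
                    time.sleep(0)
                local_counter += 1
                self.countflag.reset()
                if local_counter == self.n - 1:
                    break
            self.leader.reset()
            self.go = 1 - self.go
        else:                                            # the other processes
            while self.countflag.test_and_set() != 0:
                time.sleep(0)
            while local_go == self.go:
                time.sleep(0)
```

Each non-leader announces itself by winning `countflag` once; the leader acknowledges by resetting it, so exactly n - 1 announcements are counted per episode.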
Test-and-Set Barrier
[Figure: processes P1..P4 apply the atomic test-and-set operation to the shared leader bit in shared memory (SM), which changes from 0 to 1. The first to set the leader bit is the leader.]
Test-and-Set Barrier
[Animation: P4 is the leader. Each other process performs await(test-and-set(countflag) = 0) and then waits for go to change. The leader repeats await(countflag = 1), with local.counter going 0, 1, 2, 3, until local.counter = n - 1. All processes have arrived: the leader changes the go bit and exits the barrier.]
A Barrier without Memory Initialization
Two new techniques
1. The leader counts each process twice
   • Needs only to count to 2n - 2
   • Allows off-by-one mistakes
   • Thus makes memory initialization redundant
2. Asymmetry
   • Each process has a role according to its index i
   • Pros: saves bits and operations
   • Cons: different processes differ in their tasks
Asymmetric Test-and-set based Barrier without Memory Initialization
program of process i
shared
    countflag: test-and-test-and-set bit, initial value is immaterial
    go: atomic bit, initial value is immaterial
local
    local.go: a bit, initial value is immaterial
    local.counter: atomic register, initial value is immaterial
No need for the leader test-and-set bit
Asymmetric Test-and-set based Barrier without Memory Initialization
program of process i
local.go := go
if i = 1 then                                 // the leader
    local.counter := 0
    repeat
        await(countflag = 1)                  // a test operation
        local.counter := local.counter + 1
        reset(countflag)
    until (local.counter = 2n - 2)
    go := 1 - go
else                                          // the other processes
    await(test-and-set(countflag) = 0)
    await(test-and-set(countflag) = 0)
    await(local.go ≠ go)
fi
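A Python sketch of the asymmetric program, intended to show that it tolerates an arbitrary initial `countflag`. Assumptions: n ≥ 2, process indices are passed explicitly to `wait`, a lock-based helper emulates the bit, and the constructor parameters for the initial values are hypothetical conveniences for experimenting.

```python
import threading
import time


class TasBit:
    """Lock-based stand-in for a test-and-test-and-set bit."""

    def __init__(self, initial=0):
        self._b = initial
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            orig, self._b = self._b, 1
            return orig

    def reset(self):
        with self._lock:
            self._b = 0

    def test(self):
        return self._b


class AsymmetricTasBarrier:
    """Process 1 is always the leader; every other process signals twice."""

    def __init__(self, n, countflag=0, go=0):
        self.n = n
        self.countflag = TasBit(countflag)   # initial value is immaterial
        self.go = go                         # initial value is immaterial

    def wait(self, i):
        local_go = self.go
        if i == 1:                                       # the leader
            local_counter = 0
            while True:
                while self.countflag.test() != 1:
                    time.sleep(0)
                local_counter += 1
                self.countflag.reset()
                if local_counter == 2 * self.n - 2:
                    break
            self.go = 1 - self.go
        else:                                            # the other processes
            for _ in range(2):
                while self.countflag.test_and_set() != 0:
                    time.sleep(0)
            while local_go == self.go:
                time.sleep(0)
```

If `countflag` happens to start at 1, the leader counts one signal that nobody sent, but it still needs at least one signal from every other process before reaching 2n - 2, which is the off-by-one tolerance mentioned earlier.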
Test-and-Set based Barriers
Properties
• Different object (T&S instead of F&I)
• Pros:
  - Shared memory: only bits, O(1) space, as opposed to the counter-based barrier, which requires O(log n)
  - Does not require memory initialization (in the second version)
• Cons:
  - Asymmetric (in the second version)
  - Still high contention on the countflag and go bits
Tree Based Barriers
Section 5.5
A Tree-based Barrier
• The processes are organized in a binary tree
• Each node is owned by a predetermined process
• Each process waits until its 2 children arrive, combines the results and passes them on to its parent
• When the root learns that its 2 children have arrived, it tells its children that they can move on
• The signal propagates down the tree until all the processes get the message
[Figure: binary tree with root 1, internal nodes 2 and 3, and leaves 4, 5, 6, 7.]
A Tree-based Barrier
Assume $n = 2^k - 1$. The children of node $i$ are nodes $2i$ and $2i+1$.
[Figure: complete binary tree with nodes 1..15 (root 1; children 2, 3; then 4..7; leaves 8..15), together with the arrays arrive[2..15] and go[2..15].]
A Tree-based Barrier
program of process i
shared
    arrive[2..n]: array of atomic bits, initial values = 0
    go[2..n]: array of atomic bits, initial values = 0
if i = 1 then                                       // root
    await(arrive[2] = 1); arrive[2] := 0
    await(arrive[3] = 1); arrive[3] := 0
    go[2] := 1; go[3] := 1
else if i ≤ (n-1)/2 then                            // internal node
    await(arrive[2i] = 1); arrive[2i] := 0
    await(arrive[2i+1] = 1); arrive[2i+1] := 0
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0
    go[2i] := 1; go[2i+1] := 1
else                                                // leaf
    arrive[i] := 1
    await(go[i] = 1); go[i] := 0 fi
fi
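The tree program can be tried out with the Python sketch below. It assumes n = 2^k - 1 with n ≥ 3, passes the process index to `wait`, stores the bits in plain lists (index 0 and 1 are unused), and yields with `time.sleep(0)` while spinning. The class name and the helper `_await_and_clear` are made up for the example.

```python
import threading
import time


class TreeBarrier:
    """Binary-tree barrier for n = 2**k - 1 processes numbered 1..n (n >= 3)."""

    def __init__(self, n):
        self.n = n
        self.arrive = [0] * (n + 1)     # arrive[2..n] are used
        self.go = [0] * (n + 1)         # go[2..n] are used

    @staticmethod
    def _await_and_clear(bits, j):
        while bits[j] != 1:             # await(bits[j] = 1)
            time.sleep(0)
        bits[j] = 0                     # bits[j] := 0

    def wait(self, i):
        n = self.n
        if i == 1:                                      # root
            self._await_and_clear(self.arrive, 2)
            self._await_and_clear(self.arrive, 3)
            self.go[2] = 1
            self.go[3] = 1
        elif i <= (n - 1) // 2:                         # internal node
            self._await_and_clear(self.arrive, 2 * i)
            self._await_and_clear(self.arrive, 2 * i + 1)
            self.arrive[i] = 1
            self._await_and_clear(self.go, i)
            self.go[2 * i] = 1
            self.go[2 * i + 1] = 1
        else:                                           # leaf
            self.arrive[i] = 1
            self._await_and_clear(self.go, i)
```

Every bit is written by one process and cleared by one other, so each process spins on a bit that at most two processes ever touch.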
A Tree-based Barrier
Example Run for n=7 Processes
[Animation: the leaves 4..7 set their arrive bits and wait for go[4]..go[7]. Processes 2 and 3 wait for their children to arrive, zero arrive[4,5] and arrive[6,7], set arrive[2] and arrive[3], and wait for go[2] and go[3]. The root (process 1) waits for processes 2 and 3 to arrive, zeros arrive[2] and arrive[3], and sets go[2] and go[3]. The go signal then propagates down to the leaves. Finished!]
A Tree-based Barrier
• Pros:
  - Low shared memory contention
    - No bit is shared by more than 2 processes
    - Good for larger n
  - Fast (in comparison to local spinning): information from the root propagates after log(n) steps
  - Uses only atomic bits (no special objects)
  - On some models:
    - each process spins on a locally accessible bit
    - number of remote memory references = O(1) per process
• Cons:
  - Shared memory space complexity: O(n)
  - Asymmetric: not all the processes do the same amount of work (*)
(*) There is a similar barrier which is symmetric, but at the cost of more
shared memory consumption: O(n log n) as opposed to O(n).
See the Dissemination Barrier in Section 5.6, page 213.
The See-Saw Barrier
Section 5.7
See-Saw Barrier
• Now we'll use a Read-Modify-Write object
• It allows us to construct a symmetric barrier that requires only a few shared bits
• This algorithm can also be used to solve the leader election and the consensus problems
• The See-Saw barrier is based on a solution to the wake-up problem, which was proposed by M. J. Fischer, S. Moran, S. Rudich, and G. Taubenfeld (1996)
Read-Modify-Write Register
• Input: register r with n bits, function f(r)
• Atomic operation:
  - Reads the register
  - Calls function f on r; the return value is written into r
  - The old value of r is returned
function read-modify-write (r : register, f : function)
    orig_r := r;
    r := f(r);
    return (orig_r);
end-function
• Usually f is custom made for the algorithm
Data Flow
• Tokens:
  - Each process starts with 2 tokens
  - The total number of tokens doesn't change
  - Each process can absorb one token or emit one token at a time
• See-Saw:
  - One see-saw
  - Can be left-up-right-down OR left-down-right-up
  - Each process that enters the playground needs to get up on the see-saw
  - Each process that is on the see-saw is either on the left side or the right side
  - Tokens are weightless
Data Representation
Using a 2-bit read-modify-write register
Token bit, two states:
  1. no-token-present
  2. token-present
See-saw bit, two states:
  1. left-side-down
  2. right-side-down
[Figure: P1 and P2 on the see-saw, each labeled with its token count T.]
Process State
• Never been on
• On left side
• On right side
• Got-off
[Figure: processes P1..P7 placed according to their state, each labeled with its token count T.]
Runtime Flow
• Each process loops until it gets off the see-saw
  - After it gets off, it waits for the go flag
• The algorithm is based on 5 rules
• On each loop iteration:
  - According to the process's state, one rule is performed
  - Only one process at a time performs a rule
  - A rule is done atomically, using the RMW register
  - Each rule changes the tokens and/or the state of the see-saw
• There can be many processes on each side (up to $\frac{1}{2}n$)
• When one of the processes gets 2n tokens, it gets off and sets the go flag
Rule #1: Start of Algorithm
• Applicable if:
  - the scheduled process is "never-been-on"
• Operation:
  - Saves the go bit locally
  - Gets on the up side, and swings the see-saw
[Animation: P1 and P2 each hold T: 2; token-state is no-token-present; the see-saw swings between right-side-down and left-side-down.]
Rule #2: Emitter
• Applicable if:
  - the scheduled process is on the down side, has tokens, and token-state = no-token-present
• Operation:
  - Deposits one token in the shared token-state
  - If it remains without tokens, it gets off the see-saw and swings it
  - A process that got off now awaits the go flag
[Animation: P2 goes from T: 2 to T: 1 and token-state becomes token-present. In a later step P2 goes from T: 1 to T: 0, gets off, and the see-saw swings from left-side-down to right-side-down.]
Rule #3: Absorber
• Applicable if:
  - the scheduled process is on the up side, and token-state = token-present
• Operation:
  - Takes the token from token-state
[Animation: P1 goes from T: 2 to T: 3 and token-state becomes no-token-present.]
Rule #4: Leader
• Applicable if:
  - the scheduled process is on the see-saw, and sees at least 2n tokens
• Operation:
  - Gets off the see-saw, and flips the shared go bit
[Animation: P2 (T: 0) is asleep, waiting for go. P1 goes from T: 3 to T: 4 = 2n, gets off, and signals go.]
Rule #5: End of the Algorithm
• Applicable if:
  - the scheduled process notices that the go bit has been flipped (relative to its local.go)
• Operation:
  - Everybody has arrived, so continue past the barrier
[Animation: go has been signaled; P1 (T: 4) and P2 (T: 0) continue past the barrier.]
Important Invariants
• Token Invariant
  - During a single episode of the see-saw barrier, the number of tokens in the system
    - is either 2n or 2n+1
    - never changes
    (like in the test-and-set barrier)
• Balance Invariant
  - During a single episode of the see-saw barrier, the number of processes on the left and on the right side of the see-saw is
    - either perfectly balanced
    - or favors the down side by 1
Correctness
• When all processes are on the see-saw:
  - Tokens are given from the down side, until one process gets off
• By induction, at some point:
  - one process will see 2n tokens
  - so there is no deadlock
• 2n tokens can only be accumulated if all processes have arrived, so this is a barrier
Remarks
• All the logic is done inside the atomic Modify function of the RMW register
  - It needs to read and modify all three bits atomically, to prevent race conditions
• Before a process applies a rule, it first checks whether the go bit has been flipped relative to its local.go (regardless of its current state)!
Question
• How many times does the state of the shared memory change during one episode of the see-saw barrier?
  - $O(n)$ in the best case
  - $O(n^2)$ in the worst case
The See-Saw Barrier
• Pros:
  - O(1) shared memory space complexity
  - No need to initialize shared memory
  - Symmetric
• Cons:
  - Uses a custom Read-Modify-Write register
  - High memory contention on the RMW bits
  - Worst case $O(n^2)$ total shared memory references
  - Complex
The code:
A Barrier using Semaphores
Section 5.8
Semaphore
• Shared object
• Takes a non-negative integer value
• Supports two operations:
  - Down
    - If value > 0, the value is decremented by 1
    - Otherwise, the process is blocked until the value becomes > 0
  - Up: the value is incremented by 1
• Incrementing, decrementing and testing the semaphore are executed atomically
Binary Semaphore
• A semaphore whose value is only 0 or 1
• Decrementing is identical to a general semaphore
• Incrementing is equal to setting the value to 1
• The initial value is assumed to be 1
• Can be used to implement deadlock-free mutual exclusion:
    down(S)
    critical-section
    up(S)
Barrier using Semaphores
Algorithm for n processes
shared
    arrival: binary semaphore, initially 1
    departure: binary semaphore, initially 0
    counter: atomic register, ranges over {0, …, n}, initially 0
down(arrival)
counter := counter + 1
if counter < n then up(arrival) else up(departure) fi
down(departure)
counter := counter - 1
if counter > 0 then up(departure) else up(arrival) fi
Question:
Would this barrier be correct if the shared counter were not an atomic register?
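The semaphore algorithm above maps almost line by line onto Python's `threading.Semaphore`, where `acquire` plays the role of down and `release` the role of up. This is a sketch under assumptions: Python's semaphore is a counting semaphore rather than a binary one, which is harmless here because in this algorithm neither value ever exceeds 1; the class name `SemaphoreBarrier` is made up.

```python
import threading


class SemaphoreBarrier:
    """Reusable barrier for n threads built from two semaphores and a counter."""

    def __init__(self, n):
        self.n = n
        self.arrival = threading.Semaphore(1)     # initially 1
        self.departure = threading.Semaphore(0)   # initially 0
        self.counter = 0

    def wait(self):
        self.arrival.acquire()                    # down(arrival)
        self.counter += 1
        if self.counter < self.n:
            self.arrival.release()                # let the next process arrive
        else:
            self.departure.release()              # last one opens the exit
        self.departure.acquire()                  # down(departure)
        self.counter -= 1
        if self.counter > 0:
            self.departure.release()              # let the next process depart
        else:
            self.arrival.release()                # last one reopens the entry
```

Processes arrive one at a time through `arrival` and depart one at a time through `departure`, which is where the O(n) propagation delay comes from.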
Barrier using Semaphores
Properties
• Pros:
  - Very simple
  - Space complexity O(1)
  - Symmetric
• Cons:
  - Requires a strong object
  - Requires some central manager
  - High contention on the semaphores if there is no central manager
  - Propagation delay O(n)
Summary
Barrier Synchronization
Barriers we've seen
• Simple barrier
  - Based on an atomic fetch-and-increment counter
• Local spinning barrier
  - Based on an atomic fetch-and-increment counter and a go array
• Test-and-set barriers
  - Based on test-and-test-and-set objects
  - One version without memory initialization
• Tree-based barrier
• See-Saw barrier
• Semaphore-based barrier
Conclusions
• There are many possible algorithms for barrier synchronization
  - Each has pros and cons
• Different shared objects allow various algorithms
• Choosing the correct barrier is application/platform dependent (benchmarking is needed to know for sure)