Synchronization


Synchronization in Distributed Systems

Chapter 6

Guide to Synchronization Lectures

• Synchronization in shared memory systems
• Event ordering in distributed systems
  – Logical time, logical clocks, time stamps
• Mutual exclusion in distributed systems
  – Centralized, decentralized, etc.
  – Election algorithms
• Data race detection in multithreaded programs (if time permits)

Background

• Synchronization: coordination of actions between processes.
• Processes are usually asynchronous (they operate independently of events in other processes)
• Sometimes they need to cooperate/synchronize
  – For mutual exclusion
  – For event ordering (was message x from process P sent before or after message y from process Q?)

Introduction

• Synchronization in centralized systems is primarily accomplished through shared memory
  – Semaphores can be used for mutex, & for forcing a specific order
  – Event ordering is clear because all events are timed by the same clock
• Synchronization in distributed systems is harder
  – No shared memory, no common clock

Clock Synchronization

• Some applications rely on knowledge of event ordering to be meaningful.
  – Suppose you receive an email whose timestamp says it was sent at a later time than the time it was received: confusing, if not wrong!
  – See page 232 for other examples
• Event ordering is easier if you can accurately time-stamp events, but in a distributed system the clocks may not always be synchronized

Physical Clocks - pages 233-238

• Physical clock example: counter + holding register + oscillating quartz crystal
  – The counter is decremented at each oscillation
  – Counter interrupts when it reaches zero
  – Reloads from the holding register
  – Interrupt = clock tick
• Software clock: counts interrupts
  – This value represents the time elapsed since some predetermined time (Jan 1, 1970 for UNIX systems; Jan 1, 1601 for Windows file times)
  – Can be converted to normal clock times
• The same technology is used in most clocks, watches, cell phones

Clock Skew

• In a distributed system each computer has its own clock • Each crystal will oscillate at slightly different rate.

• Over time, the software clock values on the different computers are no longer the same.

Clock Skew

• Clock skew (offset): the difference between the times on two different clocks
• Clock drift: the difference between a clock and actual time
• Ordinary quartz clocks drift by ~1 sec in 11-12 days (10^-6 secs/sec)
• High precision quartz clocks: drift rate is somewhat better
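The drift figure above can be checked with a few lines of arithmetic. This is only a sketch; the function name `days_to_drift` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days_to_drift(drift_rate, target_seconds=1.0):
    """Days until a clock drifting at drift_rate (secs/sec) is off by target_seconds."""
    return target_seconds / drift_rate / SECONDS_PER_DAY

# A drift rate of 10^-6 secs/sec accumulates one second in about 11.57 days.
print(round(days_to_drift(1e-6), 2))
```

The result falls inside the "11-12 days" range quoted for ordinary quartz clocks.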

Various Ways of Measuring Time*

• The sun
  – Mean solar second: gradually getting longer as earth’s rotation slows.
• International Atomic Time (TAI)
  – Atomic clocks are based on transitions of the cesium atom
  – Atomic second = value of solar second at some fixed time (no longer accurate)
• Coordinated Universal Time (UTC)
  – Based on TAI seconds, but more accurately reflects sun time (inserts leap seconds to synchronize atomic second with solar second)

Getting the Correct (UTC) Time*

• WWV radio station or similar stations in other countries (accurate to +/- 10 msec)
• UTC services provided by earth satellites (accurate to 0.5 msec)
• GPS (Global Positioning System) (accurate to 20-35 nanoseconds)

Clock Synchronization Algorithms

• In a distributed system one machine may have a WWV receiver and some technique is used to keep all the other machines in synch with this value.

• Or, no machine has access to an external time source and some technique is used to keep all machines synchronized with each other, if not with “real” time.

Clock Synchronization Algorithms

• Network Time Protocol (NTP):
  – Objective: keep clocks in a system synchronized to UTC time (1-50 msec accuracy)
  – Designed to be used in the Internet
  – Uses a hierarchy of passive time servers
• The Berkeley Algorithm:
  – Objective: keep all clocks in a system synchronized to each other (internal synchronization)
  – Uses active time servers that poll machines periodically
• Reference broadcast synchronization (RBS):
  – Objective: keep all clocks in a wireless system synchronized to each other

Three Philosophies of Clock Synchronization

• Try to keep all clocks synchronized to “real” time as closely as possible
• Try to keep all clocks synchronized to each other, even if they vary somewhat from “real” time
• Try to synchronize enough so that interacting processes can agree upon an event order.
  – Refer to these “clocks” as logical clocks

6.2 Logical Clocks

• Observation: if two processes (running on separate processors) do not interact, it doesn’t matter if their clocks are not synchronized.

• Observation: When processes do interact, they are usually interested in event order, instead of exact event time

• Conclusion: Logical clocks are sufficient for many applications

Formalization

• The distributed system consists of n processes, p_1, p_2, …, p_n (e.g., an MPI group)
• Each p_i executes on a separate processor
• No shared memory
• Each p_i has a state s_i
• Process execution: a sequence of events
  – Changes to the local state
  – Message Send or Receive

Two Versions

• Lamport’s logical clocks: Can be used to determine an absolute ordering among a set of events although the order doesn’t necessarily reflect causal relations between events.

• Vector logical clocks: can capture the causal relationships between events.

Lamport’s Logical Time

• Causality in this sense is defined as a “happens-before” relation between events in a process.

• "Events" are defined by the application. The granularity may be as coarse as a procedure or as fine-grained as a single instruction.
• Leslie Lamport defined the happens-before relation.

Happens Before Relation (a → b)

• a → b: 3 rules for determining (pp. 244-5)
  – same [sequential] process: if a and b are events in the same process and a occurs before b, then a → b
  – send, receive in different processes (messages): if a is the sending of a message and b is its receipt, then a → b
  – transitivity: if a → b and b → c, then a → c
• If a → b, then a and b are causally related; i.e., event a potentially has a causal effect on event b.

Concurrent Events

• Happens-before defines a partial order of events in a distributed system.
• Some events can’t be placed in a happens-before order: a and b are concurrent (a || b) if !(a → b) and !(b → a)
• If a and b aren’t connected by the happened-before relation, there’s no way one could affect the other.

Logical Clocks

• Needed: method to assign a “timestamp” to event a (call it C(a)).
• The method must guarantee that the clocks have certain properties, in order to reflect the definition of happens-before.
• Define a clock (event counter), C_i, at each process (processor) P_i.
• When an event a occurs, its timestamp ts(a) = C(a), the local clock value at the time the event takes place.

Correctness Conditions

• If a and b are in the same process, and a → b, then C(a) < C(b)
• If a is the event of sending a message from P_i, and b is the event of receiving the message by P_j, then (it should be true that) C_i(a) < C_j(b).
• The value of C_i must be increasing (time doesn’t go backward).
  – Corollary: any clock corrections must be made by adding a positive number to a time.

Implementation Rules

• Before executing an event a in P_i, increment the local clock (C_i = C_i + 1)
  – So for any two events a & b in the same process, where b immediately follows a, C(b) = C(a) + 1
• When a message m is sent from P_i, set its time stamp ts_m to C_i, the time of the send event after following the previous step.
• When the message is received at P_j, the local time must become greater than ts_m. The rule is C_j = max{C_j, ts_m} + 1.
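The three rules above can be sketched as a small class. This is a minimal illustration, not a reference implementation; the class and method names (`LamportClock`, `tick`, `send`, `receive`) are assumptions made for this example.

```python
class LamportClock:
    """Per-process event counter following the three implementation rules."""

    def __init__(self, start=0):
        self.time = start

    def tick(self):
        """Local event: increment the clock first, then use it as the timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Send event: tick, and return the value to attach to the message (ts_m)."""
        return self.tick()

    def receive(self, ts_m):
        """Receive event: C_j = max(C_j, ts_m) + 1."""
        self.time = max(self.time, ts_m) + 1
        return self.time


# Hypothetical starting values, chosen only to show a correction.
p_i, p_j = LamportClock(start=5), LamportClock(start=2)
ts_m = p_i.send()            # P_i's clock becomes 6; the message carries 6
received_at = p_j.receive(ts_m)
print(ts_m, received_at)     # the receive timestamp is larger than ts_m
```

Because every rule only ever adds to the clock, the value never goes backward, which is the third correctness condition.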

Lamport’s Logical Clocks (2)

Event a: P1 sends m1 to P2 at t = 6.
Event b: P2 receives m1 at t = 16.

If C(a) is the time m1 was sent, and C(b) is the time m1 is received, do C(a) and C(b) satisfy the correctness conditions?

Figure 6-9. (a) Three processes, each with its own clock. The clocks “run” at different rates.

Lamport’s Logical Clocks (3)

Event c: P3 sends m3 to P2 at t = 60.
Event d: P2 receives m3 at t = 56.

Do C(c) and C(d) satisfy the conditions?

Figure 6-9. (b) Lamport’s algorithm corrects the clocks.
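The two questions above can be checked mechanically. The sketch below uses the send/receive condition C(send) < C(receive) and the receive rule C_j = max{C_j, ts_m} + 1; the helper names are made up for illustration.

```python
def satisfies_send_receive(c_send, c_recv):
    """Correctness condition for a message: the send timestamp is smaller."""
    return c_send < c_recv

def corrected_receive_time(local_clock, ts_m):
    """Receive rule: C_j = max(C_j, ts_m) + 1."""
    return max(local_clock, ts_m) + 1

# Events a and b: m1 sent at 6, received at 16.
print(satisfies_send_receive(6, 16))     # True, no correction needed

# Events c and d: m3 sent at 60, received when P2's clock reads 56.
print(satisfies_send_receive(60, 56))    # False, the condition is violated
print(corrected_receive_time(56, 60))    # 61, the corrected time for event d
```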

Figure 6-10. The positioning of Lamport’s logical clocks in distributed systems: clock management is handled as a middleware operation.
[Figure: three layers. Application layer: the application sends message m_i; m_i is delivered to the application. Middleware layer: adjust local clock and timestamp m_i on send; adjust local clock (if needed) on receipt. Network layer: the middleware sends the message; message m_i is received.]

Example of a three-process system exchanging messages

Figure 5.3 (Advanced Operating Systems, Singhal and Shivaratri)
[Figure: space-time diagram. P1: e11 e12 e13 e14 e15 e16 e17. P2: e21 e22 e23 e24 e25.]

eij represents event j on processor i

• Which events are causally related?
• Which events are concurrent?

A Total Ordering Rule

(does not guarantee causality)
• A total ordering of events can be obtained if we ensure that no two events happen at the same “time” (have same timestamp).
• Why? So all processes can agree on an unambiguous order for system events.
• How? Attach process number to low-order end of time, separated by decimal point; e.g., event at time 40 at process P1 is 40.1, event at time 40 at process P2 is 40.2
• Arbitrarily but deterministically we have chosen to order P1 before P2 (write P1 => P2)
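One way to sketch the tie-breaking rule is to represent each timestamp as a (time, process number) pair instead of a decimal such as 40.1. The pair representation and the name `total_order` are assumptions made here; pairs avoid ambiguity when process numbers reach two digits.

```python
def total_order(events):
    """Sort (time, process_number) pairs: by time first, then by process number."""
    return sorted(events, key=lambda event: (event[0], event[1]))

# Hypothetical events: two of them share Lamport time 40.
events = [(40, 2), (41, 1), (40, 1), (39, 3)]
print(total_order(events))   # (40, 1) is placed before (40, 2)
```

Every process that applies the same rule to the same set of timestamps arrives at the same order, which is the point of the total ordering.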

Figure 5.3 (Singhal and Shivaratri)
[Figure: space-time diagram. P1: e11 e12 e13 e14 e15 e16 e17. P2: e21 e22 e23 e24 e25.]

What is the total ordering of the events in these two processes if each clock is initially 1 and the normal clock adjustment process is followed?

Application of Lamport Clocks: Total Order Multicast

• Consider a banking database, replicated across several sites.

• Queries are processed originally at the geographically closest replica but other replicas are updated.

• We need to be able to guarantee that DB updates are seen in the same order everywhere.

• Lamport’s total ordering relation (=>) can be used for this

Totally Ordered Multicast

Update 1: Process 1 at Site A adds $100 to an account (initial value = $1000).
Update 2: Process 2 at Site B increments the account by 1%.
Without synchronization, it’s possible that replica 1 = $1111, replica 2 = $1110.

• Message 1: add $100.00
• Message 2: increment account by 1%
• The replica that sees the messages in the order m1, m2 will have a final balance of $1111
• The replica that sees the messages in the order m2, m1 will have a final balance of $1110
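The two balances can be reproduced directly. The sketch below works in integer cents to avoid floating-point rounding; the function names are illustrative only.

```python
def add_100(balance_cents):
    """Message 1: add $100.00."""
    return balance_cents + 100_00

def add_one_percent(balance_cents):
    """Message 2: increment the account by 1%."""
    return balance_cents * 101 // 100

initial = 1000_00  # $1000.00 in cents

order_m1_m2 = add_one_percent(add_100(initial))
order_m2_m1 = add_100(add_one_percent(initial))
print(order_m1_m2 / 100, order_m2_m1 / 100)   # 1111.0 1110.0
```

The two updates do not commute, so replicas that apply them in different orders diverge.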

The Problem

• Site 1 has final account balance of $1,111 after both transactions complete and Site 2 has final balance of $1,110.
• Which is “right”? Either, from the standpoint of consistency.
• Problem: lack of consistency.
  – Both values should be the same
• Solution: make sure all sites see/process all messages in the same order.

Assumptions

• Updates are multicast to all sites, including (conceptually) the sender
• No messages are lost
• All messages from a single sender arrive at each replica in the order in which they were sent; i.e., if send(m_i) → send(m_j) then every site that receives both messages should receive m_i before m_j.
• Messages are time-stamped with Lamport clock values (the totally ordered version).

Implementation

• When a process receives a message, put it in a local message queue, ordered by timestamp.
• Multicast an acknowledgement to all sites
• Each ack has a timestamp larger than the timestamp on the message it acknowledges
• The message queue at each site will eventually be in the same order

Implementation

• Deliver a message to the application only when the following conditions are true:
  – The message is at the head of the queue
  – The message has been acknowledged by all other receivers. This guarantees that no update messages with earlier timestamps are still in transit.

• Acknowledgements are deleted when the message they acknowledge is processed.

• Since all queues have the same order, all sites process the messages in the same order.
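The queue-and-acknowledge scheme can be sketched as follows. This is a simplified single-threaded illustration, not a full protocol: there is no network, the caller feeds events in an order consistent with the FIFO assumption, each site's own acknowledgement is counted along with the others, and all names (`Replica`, `on_update`, `on_ack`, `demo`) are made up for the example.

```python
import heapq

class Replica:
    """One site's delivery queue for totally ordered multicast (sketch)."""

    def __init__(self, n_sites):
        self.n_sites = n_sites
        self.queue = []       # heap of (timestamp, sender, payload)
        self.acks = {}        # (timestamp, sender) -> set of acknowledging sites
        self.delivered = []   # payloads handed to the application, in order

    def on_update(self, timestamp, sender, payload):
        heapq.heappush(self.queue, (timestamp, sender, payload))
        self._try_deliver()

    def on_ack(self, timestamp, sender, acker):
        self.acks.setdefault((timestamp, sender), set()).add(acker)
        self._try_deliver()

    def _try_deliver(self):
        # Deliver only from the head of the queue, and only once every site
        # (the sender included, in this sketch) has acknowledged the message.
        while self.queue:
            timestamp, sender, payload = self.queue[0]
            if len(self.acks.get((timestamp, sender), ())) < self.n_sites:
                return
            heapq.heappop(self.queue)
            del self.acks[(timestamp, sender)]
            self.delivered.append(payload)


def demo():
    """Two sites multicast concurrently: m1 = (time 1, site 1), m2 = (time 1, site 2)."""
    site1, site2 = Replica(2), Replica(2)
    # Site 1 sees its own m1 first, then m2, then site 2's acknowledgements.
    site1.on_update(1, 1, "m1")
    site1.on_ack(1, 1, 1)
    site1.on_update(1, 2, "m2")
    site1.on_ack(1, 2, 1)
    site1.on_ack(1, 2, 2)
    site1.on_ack(1, 1, 2)
    # Site 2 sees its own m2 first, then m1, then site 1's acknowledgements.
    site2.on_update(1, 2, "m2")
    site2.on_ack(1, 2, 2)
    site2.on_update(1, 1, "m1")
    site2.on_ack(1, 1, 2)
    site2.on_ack(1, 1, 1)
    site2.on_ack(1, 2, 1)
    return site1.delivered, site2.delivered


print(demo())   # both sites deliver m1 before m2
```

Although the two sites receive the updates in opposite orders, the (timestamp, sender) ordering of the queue and the wait for acknowledgements make both deliver m1 and then m2.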

Importance

• Maintaining consistent replicas is an important feature in distributed systems
• Totally ordered multicast is a way to implement it correctly.

Causality

• Happens-before relation reflects causality:
  – Events a and b are causally related if either a → b or b → a (a has a causal effect on b, or b on a)
  – If neither of the above relations hold, then there is no causal relation between a and b. We say that a || b (a and b are concurrent)
• The total ordering relation doesn’t reflect causality.

Vector Clock Rationale

• Lamport clocks limitation, for a & b on different processors:
  – If (a → b) then C(a) < C(b), but
  – If C(a) < C(b) then we only know that either (a → b) or (a || b), i.e., !(b → a)
• In other words, you cannot look at the clock values of events on two different processors and decide which one “happens before”.
• Lamport clocks do not capture causality

Lamport’s Logical Clocks (3)

Suppose we add a message to the scenario in Fig. 6.12(b) (the figure labels the messages m2’ and m3’).

• Tsnd(m1) < Tsnd(m3’): (6) < (32)
  – Does this mean send(m1) → send(m3’)?
• But … Tsnd(m1) < Tsnd(m2’): (6) < (10)
  – Does this mean send(m1) → send(m2’)?

Figure 6-12.

Review

• Physical clocks: hard to keep synchronized
• Logical clocks: can provide some notion of relative event occurrence
• Lamport’s logical time
  – happened-before relation defines causal relations
  – Lamport’s logical clocks don’t capture causality
  – total ordering relation is useful in establishing totally ordered multicast as well as other applications
• Vector clocks
  – Unlike Lamport clocks, vector clocks capture causality
  – Have a component for each process in the system

Figure 5.4
[Figure: space-time diagram with Lamport clock values. P1: e11 (1), e12 (2). P2: e21 (1), e22 (3). P3: e31 (1), e32 (2), e33 (3).]

C(e11) < C(e22) and C(e11) < C(e32), but while e11 → e22, we cannot say e11 → e32, since there is no causal path connecting them. So, with Lamport clocks we can guarantee that if C(a) < C(b) then !(b → a), but by looking at the clock values alone we cannot say whether or not the events are causally related.

Vector Clocks – How They Work

• Each processor keeps a vector of values, instead of a single value.

• VC_i is the clock at process i; it has a component for each process in the system.
  – VC_i[i] corresponds to P_i’s local “time”.
  – VC_i[j] represents P_i’s knowledge of the “time” at P_j (the # of events that P_i knows about at P_j)
• Each processor knows its own “time” exactly, and updates the values of other processors’ clocks based on timestamps received in messages.

Implementation Rules

• IR1: Increment VC_i[i] before each new event at P_i.
• IR2: When process i sends a message m, it sets m’s (vector) timestamp to VC_i (after incrementing VC_i[i])
• IR3: When a process receives a message, the clock management system does a component-by-component comparison of the message timestamp to its local time and picks the maximum of the two corresponding components. Adjust local components accordingly.

• Then deliver the message to the application.
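A minimal sketch of IR1 to IR3, assuming processes are numbered from 0 (so P1 is index 0) and treating a receive as an event that also increments the receiver's own component. The class and method names are illustrative, not taken from the slides.

```python
class VectorClock:
    """Vector clock for one process in an n-process system (sketch)."""

    def __init__(self, pid, n):
        self.pid = pid          # index of this process, 0-based (an assumption)
        self.vc = [0] * n

    def local_event(self):
        """IR1: increment own component before each new event."""
        self.vc[self.pid] += 1
        return tuple(self.vc)

    def send(self):
        """IR2: a send is an event; its timestamp is the clock after incrementing."""
        return self.local_event()

    def receive(self, ts_m):
        """IR1 + IR3: count the receive event, then take component-wise maxima."""
        self.vc[self.pid] += 1
        self.vc = [max(own, seen) for own, seen in zip(self.vc, ts_m)]
        return tuple(self.vc)


p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
p1.local_event()          # (1, 0, 0)
ts_m = p1.send()          # (2, 0, 0), carried by the message
p2.local_event()          # (0, 1, 0)
print(p2.receive(ts_m))   # (2, 2, 0)
```

The values in this run were chosen to match the first events of P1 and P2 in the Singhal and Shivaratri vector clock example, on the assumption that the second event at P1 is the send received by P2.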

Figure 5.5 (Singhal and Shivaratri). Vector clock values. In a 3-process system, VC(Pi) = (vc1, vc2, vc3).
[Figure: space-time diagram. P1: e11 (1, 0, 0), e12 (2, 0, 0), e13 (3, 0, 0), e14 (4, 5, 2). P2: e21 (0, 1, 0), e22 (2, 2, 0), e23 (2, 3, 1), e24 (2, 4, 2), e25 (2, 5, 2). P3: e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3).]

Establishing Causal Order

• When Pi sends a message m to Pj, Pj knows
  – How many events occurred at Pi before m was sent
  – How many relevant events occurred at other sites before m was sent (relevant = “happened-before”)
• In Figure 5.5, VC(e24) = (2, 4, 2). Two events in P1 and two events in P3 “happened before” e24.
  – Even though P1 and P3 may have executed other events, they don’t have a causal effect on e24.

Happened Before/Causally Related Events - Vector Clock Definition

• a → b iff ts(a) < ts(b) (a happens before b iff the timestamp of a is less than the timestamp of b)
• Events a and b are causally related if
  – ts(a) < ts(b) or
  – ts(b) < ts(a)
  (in other words, a → b or b → a)
• Otherwise, we say the events are concurrent.
• Any pair of events that satisfy the vector clock definition of happens-before will also satisfy the Lamport definition, and vice-versa.

Comparing Vector Timestamps

• Less than: ts(a) < ts(b) iff at least one component of ts(a) is strictly less than the corresponding component of ts(b) and all other components of ts(a) are either less than or equal to the corresponding component in ts(b).
• (3, 3, 5) < (3, 4, 5), (3, 3, 3) = (3, 3, 3), (3, 3, 5) > (3, 2, 4), (3, 3, 5) || (4, 2, 5).
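The comparison rule translates almost word for word into code. The function names `vc_less` and `vc_compare` are made up for this sketch, and both timestamps are assumed to have the same length.

```python
def vc_less(a, b):
    """ts(a) < ts(b): every component <=, and at least one strictly <."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def vc_compare(a, b):
    """Return '=', '<', '>' or '||' (concurrent) for two vector timestamps."""
    if tuple(a) == tuple(b):
        return "="
    if vc_less(a, b):
        return "<"
    if vc_less(b, a):
        return ">"
    return "||"

print(vc_compare((3, 3, 5), (3, 4, 5)))   # <
print(vc_compare((3, 3, 3), (3, 3, 3)))   # =
print(vc_compare((3, 3, 5), (3, 2, 4)))   # >
print(vc_compare((3, 3, 5), (4, 2, 5)))   # ||
```

The last case is the interesting one: each timestamp is larger in some component, so neither event happened before the other.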

Figure 5.4
[Figure: space-time diagram with vector timestamps. P1: e11 (1, 0, 0), e12 (2, 0, 0). P2: e21 (0, 1, 0), e22 (2, 2, 0). P3: e31 (0, 0, 1), e32 (0, 0, 2), e33 (0, 0, 3).]

ts(e11) = (1, 0, 0) and ts(e32) = (0, 0, 2), which shows that the two events are concurrent.
ts(e11) = (1, 0, 0) and ts(e22) = (2, 2, 0), which shows that e11 → e22.

Causal Ordering of Messages: An Application of Vector Clocks

• Premise: Deliver a message only if messages that causally precede it have already been received
  – i.e., if send(m1) → send(m2), then it should be true that receive(m1) → receive(m2) at each site.
  – If messages are not related (send(m1) || send(m2)), delivery order is not of interest.

Compare to Total Order

• Totally ordered multicast (TOM) is, in a way, stronger (more inclusive) than causal ordering (COM).
  – TOM orders all messages, not just those that are causally related.
  – “Weaker” COM is often what is needed.

Enforcing Causal Communication

• Clocks are adjusted only when sending or receiving messages; i.e., these are the only events of interest.
• Send: P_i increments VC_i[i] by 1 and applies timestamp ts(m)
• Receive: P_i compares VC_i to ts(m); set VC_i[k] to max{VC_i[k], ts(m)[k]} for each k ≠ i.

Message Delivery Conditions

• Suppose: P_j receives message m from P_i
• Middleware delivers m to the application iff
  – ts(m)[i] = VC_j[i] + 1
    • all previous messages from P_i to P_j have been delivered
  – ts(m)[k] ≤ VC_j[k] for all k ≠ i
    • P_j has received all messages that P_i had seen before it sent message m
• In other words, if a message m is received from P_i, you should also have received every message that P_i received before it sent m; e.g.,
  – if m is sent by P_1 and ts(m) is (3, 4, 0) and you are P_3, you should already have received exactly 2 messages from P_1 and at least 4 from P_2
  – if m is sent by P_2, you are P_3, ts(m) is (4, 5, 1, 3), and VC_3 is (3, 3, 4, 3), then you need to wait for a fourth message from P_2 and at least one more message from P_1.
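The two delivery conditions can be sketched as a predicate over the message timestamp and the receiver's clock. Process indices are 0-based in this sketch (an assumption, so P_1 is index 0), and the names `can_deliver` and `deliver` are illustrative.

```python
def can_deliver(ts_m, sender, vc_receiver):
    """True iff the receiver may hand message m to the application."""
    # Condition 1: m is the next message expected from the sender.
    if ts_m[sender] != vc_receiver[sender] + 1:
        return False
    # Condition 2: the receiver has seen everything the sender had seen.
    return all(ts_m[k] <= vc_receiver[k]
               for k in range(len(ts_m)) if k != sender)

def deliver(ts_m, vc_receiver):
    """After delivery, merge the timestamp into the receiver's clock."""
    return tuple(max(own, seen) for own, seen in zip(vc_receiver, ts_m))

# The receiver starts at (0, 0, 0). A message with timestamp (1, 1, 0)
# from index 1 arrives before the message (1, 0, 0) from index 0.
vc = (0, 0, 0)
print(can_deliver((1, 1, 0), 1, vc))   # False: must wait for the earlier message
print(can_deliver((1, 0, 0), 0, vc))   # True
vc = deliver((1, 0, 0), vc)
print(can_deliver((1, 1, 0), 1, vc))   # True: the held-back message can now go
```

The second worked example above gives the same answer: with ts(m) = (4, 5, 1, 3) from P_2 and VC_3 = (3, 3, 4, 3), `can_deliver((4, 5, 1, 3), 1, (3, 3, 4, 3))` is false, because both conditions fail.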

Figure 6-13. Enforcing causal communication.
[Figure: clock values shown are VC_0 = (1, 0, 0) and (1, 1, 0) at P0; VC_1 = (1, 1, 0) at P1; VC_2 = (0, 0, 0), (1, 0, 0), (1, 1, 0) at P2; messages m and m*.]

P1 received message m from P0 before sending message m* to P2; P2 must wait for delivery of m before receiving m*.
(Increment own clock only on message send; increment other components, if needed, on message receipt.)
Before sending or receiving any messages, one’s own clock is (0, 0, …, 0).

History

• ISIS and Horus were middleware systems that supported the building of distributed environments through virtually synchronous process groups
• Provided both totally ordered and causally ordered message delivery.
  – “Lightweight Causal and Atomic Group Multicast”, Birman, K., Schiper, A., Stephenson, P., ACM Transactions on Computer Systems, Vol. 9, No. 3, August 1991, pp. 272-314.

Location of Message Delivery

• Problems if located in middleware:
  – Message ordering captures only potential causality; no way to know if two messages from the same source are actually dependent.
  – Causality from other sources is not captured.
• End-to-end argument: the application is better equipped to know which messages are causally related.
• But … developers are now forced to do more work; re-inventing the wheel.
