DCS 6. Basic Distributed Algorithms
Fundamentals
Wei Yuan
November 21, 2013
Outline
• Physical Clocks
• Logical Clocks
– Lamport’s Logical Clock
– Vector Clock
• Global Snapshots
2
Physical Clocks
• Most computers today keep track of the passage of
time with a battery-backed-up CMOS clock circuit,
driven by a quartz oscillator.
– battery backup to continue measuring time when power is off
• Two registers are associated with the quartz crystal: a counter and a holding register
• A Programmable Interval Timer, to generate an
interrupt (clock tick) periodically
• The interrupt service procedure simply adds one to a
counter in memory.
3
Problem
• Getting two systems to agree on time
– Two clocks hardly ever agree
– Quartz oscillators oscillate at slightly different frequencies
• Clocks tick at different rates
– This creates an ever-widening gap in perceived time
– This is called clock drift
• Difference between two clocks at one point in time
– This is called clock skew
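The difference between drift and skew can be illustrated with a minimal sketch. The drift rates below (+20 ppm and -30 ppm) and the function names are assumptions chosen for illustration, not measurements of any real oscillator.

```python
def clock_reading(true_seconds, drift_rate, initial_offset=0.0):
    """Reading of a clock that gains `drift_rate` seconds per true second."""
    return initial_offset + true_seconds * (1.0 + drift_rate)


def skew(true_seconds, rate_a, rate_b):
    """Skew: difference between two clocks at one instant of true time."""
    return clock_reading(true_seconds, rate_a) - clock_reading(true_seconds, rate_b)


# Hypothetical drift rates (assumed values): +20 ppm and -30 ppm.
RATE_A = 20e-6
RATE_B = -30e-6

# The skew grows as true time passes because the drift rates differ.
for t in (0, 3600, 86400):
    print(t, skew(t, RATE_A, RATE_B))
```

With these assumed rates the two clocks start in agreement and are about 4.32 seconds apart after one day (86400 s × 50 ppm).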
4
Solution
• International Atomic Time (TAI)
• Coordinated Universal Time (UTC)
• ……
• Time synchronization algorithms
5
Lamport’s Logical Clock
• A distributed system consists of a collection of distinct
processes which are spatially separated, and which
communicate with one another by exchanging
messages.
– A network of interconnected computers, such as the ARPANET
– A single computer can also be viewed this way: the central control unit,
the memory units, and the input-output channels are separate processes
• Lamport L. Time, clocks, and the ordering of events in a
distributed system[J]. Communications of the ACM,
1978, 21(7): 558-565.
7
Lamport’s happened before (→) relation
• Define the "happened before" relation without using
physical clocks (a partial ordering)
• Assumptions
– The system is composed of a collection of processes
– Each process consists of a sequence of events
– Depending on the application, an event could be the execution of a
subprogram on a computer
– or the execution of a single machine instruction
• We are assuming that the events of a process form a
sequence, where a occurs before b in this sequence if
a happens before b.
8
Lamport’s happened before (→) relation
(1) If $a$ and $b$ are events in the same process and
$\mathrm{time}(a) < \mathrm{time}(b)$, then $a \to b$.
(2) If $a$ is the sending of a message by one process
and $b$ is the receipt of the same message by another
process, then $a \to b$.
(3) If $a \to b$ and $b \to c$, then $a \to c$.
• Two distinct events $a$ and $b$ are said to be concurrent if
$a \nrightarrow b$ and $b \nrightarrow a$.
• Assume that $a \nrightarrow a$ for any event $a$ ($\to$ is an irreflexive
partial ordering).
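The three rules can be sketched as a small computation. The event names (p1, q2, ...) and the single message below are a hypothetical example, not the diagram from the slides.

```python
from itertools import product


def happened_before(processes, messages):
    """Return the set of pairs (a, b) such that a -> b.

    processes: list of event-name lists, each in local order (rule 1).
    messages:  list of (send_event, receive_event) pairs (rule 2).
    """
    events = [e for proc in processes for e in proc]
    hb = set(messages)                      # rule 2
    for proc in processes:                  # rule 1
        for i, a in enumerate(proc):
            for b in proc[i + 1:]:
                hb.add((a, b))
    # Rule 3: transitive closure, Warshall style. product() varies its first
    # element slowest, so the intermediate event k is the outermost loop.
    for k, a, b in product(events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb


def concurrent(a, b, hb):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in hb and (b, a) not in hb


# Hypothetical example: two processes and one message from p1 to q2.
PROCESSES = [["p1", "p2", "p3"], ["q1", "q2", "q3"]]
MESSAGES = [("p1", "q2")]
HB = happened_before(PROCESSES, MESSAGES)
```

In this example p1 → q3 follows by transitivity (p1 → q2 → q3), while p2 and q3 are concurrent because no chain of local steps and messages connects them.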
9
Space-time diagram
• Horizontal direction: space
• Vertical direction: time
• Dots: events
• Vertical lines: processes
• Wavy lines: messages
10
• A clock is just a way of assigning a number to an event
(abstract)
– Clock $C_i$ for each process $P_i$
• assigns a number $C_i(a)$ to any event $a$ in the process
– Clock $C$ for the entire system
• $C(b) = C_j(b)$ if $b$ is an event in process $P_j$
• Clock Condition
– For any events $a$, $b$: if $a \to b$ then $C(a) < C(b)$.
– We cannot expect the converse condition to hold, since that would
imply that any two concurrent events must occur at the same
time (e.g., in the space-time diagram, p2 and p3 are both concurrent
with q3, yet p2 → p3, so they cannot both occur at the same time as q3).
11
• A process’ clock “ticks”
– Condition (1), for events in the same process, means that there must
be a tick line between any two events on a process line
– Condition (2), for the sending and receipt of a message, means that
every message line must cross a tick line
12
Event counting example
13
Lamport’s logical timestamps
• Process $P_i$’s clock is represented by a register $C_i$, so
$C_i(a)$ is the value contained by $C_i$ during the event $a$.
• All processes use a local counter (logical clock) with
initial value of zero
• Just before each event, the local counter is
incremented by 1 and assigned to the event as its
timestamp
• A send (message) event carries its timestamp
• For a receive (message) event, the counter is updated
by max (receiver’s-local-counter, message-timestamp) + 1
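The update rules above can be sketched as a small class. The class and method names are assumptions made for illustration.

```python
class LamportClock:
    """Logical clock: a local counter that starts at zero."""

    def __init__(self):
        self.counter = 0

    def local_event(self):
        # Increment just before the event; the new value is its timestamp.
        self.counter += 1
        return self.counter

    def send(self):
        # A send is an event; its timestamp travels with the message.
        return self.local_event()

    def receive(self, message_timestamp):
        # max(receiver's local counter, message timestamp) + 1
        self.counter = max(self.counter, message_timestamp) + 1
        return self.counter


p, q = LamportClock(), LamportClock()
p.local_event()           # p's first event gets timestamp 1
ts = p.send()             # p's send gets timestamp 2
q.local_event()           # q's first event gets timestamp 1
received = q.receive(ts)  # max(1, 2) + 1 = 3
```

The receive is stamped 3, which is larger than the send's timestamp 2, so the Clock Condition holds for this message.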
14
Event counting example
Applying Lamport’s algorithm
15
Problem: Identical timestamps
• Concurrent events (e.g., b & g; i & k) may have the
same timestamp … or not
• Total ordering: every event is assigned a unique
timestamp (number), so any two events can be ordered.
16
Unique timestamps (total ordering)
We can force each timestamp to be unique
• Define global logical timestamp $(T_i, i)$
– $T_i$ represents local Lamport timestamp
– $i$ represents process number (globally unique)
• e.g., (host address, process ID)
• Compare timestamps:
– $(T_i, i) < (T_j, j)$ if and only if
– $T_i < T_j$, or $T_i = T_j$ and $i < j$
• Does not necessarily relate to actual event ordering
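The comparison rule is a lexicographic order on (timestamp, process number) pairs. The sketch below spells it out and shows that it coincides with Python's built-in tuple comparison; the sample timestamps are made up for illustration.

```python
def before(ts_a, ts_b):
    """(T_i, i) < (T_j, j) iff T_i < T_j, or T_i == T_j and i < j."""
    (t_i, i), (t_j, j) = ts_a, ts_b
    return t_i < t_j or (t_i == t_j and i < j)


# Hypothetical (Lamport timestamp, process number) pairs.
EVENTS = [(2, 3), (1, 2), (2, 1), (1, 1)]

# Python compares tuples lexicographically, so sorted() applies the same rule.
ORDERED = sorted(EVENTS)
```

Ties on the Lamport timestamp are broken by process number, so (2, 1) sorts before (2, 3) even though the two events may be concurrent.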
17
• Unique (totally ordered) timestamps
18
Problem: Detecting causal relations
• If $C(a) < C(b)$
– We cannot conclude $a \to b$.
• By looking at Lamport timestamps
– We cannot conclude which events are causally related
• Solution: use a vector clock
19
Vector clocks
Rules:
1. Vector initialized to 0 at each process
$V_i[j] = 0$ for $i, j = 1, \dots, N$
2. Process increments its element of the vector in local vector
before timestamping event:
$V_i[i] = V_i[i] + 1$
3. Message is sent from process $P_i$ with $V_i$ attached to it
4. When $P_j$ receives the message, it compares the vectors element by
element and sets its local vector to the higher of the two values
$V_j[k] = \max(V_i[k], V_j[k])$ for $k = 1, \dots, N$
• For example,
received: [ 0, 5, 12, 1 ], have: [ 2, 8, 10, 1 ]
new vector after the merge: [ 2, 8, 12, 1 ]
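The rules can be sketched as a small class. The names are illustrative, indices are 0-based here, and treating a receive as an event that also increments the receiver's own element (rule 2 applied after the merge of rule 4) is an assumption of this sketch.

```python
class VectorClock:
    def __init__(self, index, n):
        self.index = index   # this process's own position (0-based here)
        self.v = [0] * n     # rule 1: all elements start at 0

    def tick(self):
        # Rule 2: increment own element before timestamping an event.
        self.v[self.index] += 1
        return list(self.v)

    def send(self):
        # Rule 3: the message carries a copy of the sender's vector.
        return self.tick()

    def merge(self, received):
        # Rule 4: element-wise maximum of local and received vectors.
        self.v = [max(a, b) for a, b in zip(self.v, received)]
        return list(self.v)

    def receive(self, received):
        # Assumption: the receive is itself an event, so merge, then tick.
        self.merge(received)
        return self.tick()


# The merge example from the rules above.
clock = VectorClock(index=0, n=4)
clock.v = [2, 8, 10, 1]
merged = clock.merge([0, 5, 12, 1])

# Two of three processes exchanging one message.
p1, p2 = VectorClock(0, 3), VectorClock(1, 3)
first = p1.tick()             # [1, 0, 0]
msg = p1.send()               # [2, 0, 0]
on_receive = p2.receive(msg)  # [2, 1, 0]
```

Each call returns a copy of the vector, so a timestamp handed to an event or a message does not change when the clock advances later.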
21
Comparing vector timestamps
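• Define
$V = V'$ iff $V[i] = V'[i]$ for $i = 1, \dots, N$
$V \le V'$ iff $V[i] \le V'[i]$ for $i = 1, \dots, N$
$V < V'$ iff $V \le V'$ and $V \ne V'$
• For any two events $e$, $e'$
if $e \to e'$ then $V(e) < V(e')$
… just like Lamport’s algorithm
if $V(e) < V(e')$ then $e \to e'$
… unlike Lamport’s algorithm, the converse also holds
• Two events are concurrent if neither
$V(e) \le V(e')$ nor $V(e') \le V(e)$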
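These comparisons can be sketched directly, taking V < V' to mean V ≤ V' with V ≠ V'. The function names and sample vectors are illustrative assumptions.

```python
def leq(v, w):
    """V <= V' iff every element of V is <= the matching element of V'."""
    return all(a <= b for a, b in zip(v, w))


def less(v, w):
    """V < V' iff V <= V' and the vectors differ."""
    return leq(v, w) and tuple(v) != tuple(w)


def concurrent(v, w):
    """Concurrent iff neither V <= V' nor V' <= V."""
    return not leq(v, w) and not leq(w, v)


A = (1, 0, 0)
B = (2, 1, 0)
C = (0, 0, 1)
```

Here `less(A, B)` holds, so the event stamped A happened before the event stamped B, while B and C are concurrent because each has an element larger than the other's.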
22
Vector timestamps
[Figure sequence (animation built up over slides 23 to 33): three processes each start at (0,0,0). Events on the first process are stamped (1,0,0) and (2,0,0); events on the second process are then stamped (2,1,0) and (2,2,0).]
• Two events are concurrent if neither
V(e)≤V(e’) nor V(e’)≤ V(e)
33
“Distributed snapshots: determining global states of
distributed systems”, K. Mani Chandy and Leslie Lamport,
ACM TOCS 1985
35
Model of a Distributed System
• Finite set of processes as nodes.
• Finite set of channels as edges.
• Channels have infinite buffers, are error-free and FIFO.
• The delay experienced by a message is arbitrary but finite.
[Figure: processes p, q, r connected by channels c1, c2, c3, c4]
36
A banking example to illustrate recording of consistent states
37
Global State of a Distributed System
Global State:
• Union of the local states of the individual processes and the state
of the channels.
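• The state of a channel is the sequence of messages in transit: messages
that have been sent along the channel but not yet received.
• Initial global state for system:
– each process is in its initial state
– the state of each channel is the empty sequence
• Each component of a distributed system has a local state.
– Process state: described by its local memory and its history of activity.
– Channel state: described by the sequence of messages sent along the
channel minus the sequence of messages received along the channel.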
38
Global State Detection
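• Many problems in distributed systems can be solved by detecting
a global state of the system.
• Stable property detection
– A stable property is one that, once it becomes true, remains true
forever.
– E.g., termination, deadlock, token loss, etc.
• Checkpointing in distributed systems
– E.g., debugging, failure recovery, etc.
• Detecting stable properties such as deadlock and termination requires
examining the global state of the system.
• For failure recovery, the global state of the distributed system must be
saved periodically (a checkpoint); restoring the system to the most recently
saved global state lets recovery start from the point of the process failure.
• A distributed system has no shared memory and no global clock; distributed
characteristics such as local clocks and local memories make it difficult to
record the global state of the system efficiently.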
39
Distributed Computation
• A distributed computation is a sequence of events.
• There are three kinds of events: local, send, receive.
• An event is an atomic action that may change the state of the
process p and the state of at most one channel that is incident on p.
Definition of Event e
• An event is a five-tuple e = <p, s, s', M, c>, where
• p is the process in which the event occurs,
• s is the state of p immediately before the event,
• s' is the state of p immediately after the event,
• M is the message sent or received along the channel c.
40
Consistent Global State
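• Consistency: every message that is recorded as received has also
been recorded as sent.
• Consistent global states determined by a snapshot are states
that may have occurred during the computation.
• Both of the following conditions must hold:
– C1. Message conservation. A message m_ij recorded as sent in the local
state of process p_i must appear either in the state of channel C_ij or in
the local state of the receiving process p_j.
– C2. In the resulting global state, for every effect, the cause of that
effect must also be present.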
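The consistency condition can be sketched as a check on what a snapshot recorded. Representing each recorded state as a set of message identifiers is an assumption of this sketch, and the identifiers are made up.

```python
def is_consistent(recorded_sent, recorded_received):
    """Every message recorded as received must also be recorded as sent."""
    return recorded_received <= recorded_sent   # subset test


def in_transit(recorded_sent, recorded_received):
    """Messages recorded as sent but not received belong to channel state."""
    return recorded_sent - recorded_received


# Hypothetical message identifiers.
SENT = {"m1", "m2", "m3"}
GOOD_RECEIVED = {"m1", "m2"}
BAD_RECEIVED = {"m1", "m4"}   # m4 is received without a recorded send
```

With `GOOD_RECEIVED` the snapshot is consistent and m3 must appear in a channel state; with `BAD_RECEIVED` the snapshot shows an effect (the receipt of m4) without its cause.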
41
Chandy–Lamport Algorithm
• Each process in the system records its local state and the state of
its incoming channels.
• Recorded states form a consistent global state.
• Snapshot algorithm runs concurrently with the computation but
does not alter the underlying computation.
• Snapshot algorithm uses marker as a recording signal.
• Any process can initiate the snapshot by recording its own state and
sending a marker on all outgoing channels.
• On receiving its first marker, a process records its own local state and
then records the states of its incoming channels.
42
Chandy–Lamport Algorithm contd.
Marker-Sending Rule for Process pi
(1) Process pi records its state.
(2) For each outgoing channel C on which a marker
has not been sent, pi sends a marker along C
before pi sends further messages along C.
43
Chandy–Lamport Algorithm contd.
Marker-Receiving Rule for Process pj
On receiving a marker along channel C:
  if pj has not recorded its state then
    Record the state of C as the empty sequence
    Execute the “marker sending rule”
  else
    Record the state of C as the sequence of messages
    received along C after pj’s state was recorded
    and before pj received the marker along C
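The two marker rules can be sketched as a toy, single-threaded simulation. Everything here is an assumption made for illustration: two processes whose local state is an account balance, channels modelled as `deque` objects, a `MARKER` sentinel, and a hand-picked delivery order.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical sentinel standing in for the marker message


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance      # local state in this toy example
        self.out = {}               # destination name -> FIFO channel
        self.inc = {}               # source name -> FIFO channel
        self.recorded_state = None  # None until the snapshot reaches us
        self.channel_state = {}     # source name -> recorded messages
        self.recording = set()      # incoming channels still being recorded

    def send(self, dest, amount):
        self.balance -= amount
        self.out[dest].append(amount)

    def record_and_send_markers(self):
        # Marker-sending rule: record own state, then put a marker on every
        # outgoing channel before any further message.
        self.recorded_state = self.balance
        self.channel_state = {src: [] for src in self.inc}
        self.recording = set(self.inc)
        for channel in self.out.values():
            channel.append(MARKER)

    def deliver(self, src):
        # Receive the next message on the channel from `src`.
        msg = self.inc[src].popleft()
        if msg == MARKER:
            if self.recorded_state is None:
                self.record_and_send_markers()  # state of this channel: empty
            self.recording.discard(src)         # stop recording this channel
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)


def connect(a, b):
    channel = deque()
    a.out[b.name] = channel
    b.inc[a.name] = channel


def run_example():
    p, q = Process("p", 100), Process("q", 50)
    connect(p, q)
    connect(q, p)
    p.send("q", 10)
    q.send("p", 20)
    p.record_and_send_markers()  # p initiates the snapshot
    p.deliver("q")               # 20 arrives after p recorded: in transit
    q.deliver("p")               # 10 arrives before q has seen the marker
    q.deliver("p")               # marker: q records its state
    p.deliver("q")               # marker: p stops recording channel q -> p
    return p, q
```

In this run p records 90, q records 40, and the channel from q to p records the in-transit transfer of 20, so the snapshot accounts for the full 150 that the system started with even though no single instant was captured.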
44
Thanks!
Q&A
45
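Appendix
• A relation R on a set X is called a partial order relation, or partial
order, if and only if R is reflexive, antisymmetric, and transitive.
• Partial Order
Let A be a non-empty set and P a relation on A. If P satisfies the following conditions:
1. For every a∈A, (a,a)∈P (reflexivity);
2. If (a,b)∈P and (b,a)∈P, then a=b (antisymmetry);
3. If (a,b)∈P and (b,c)∈P, then (a,c)∈P (transitivity);
then P is called a partial order relation on A.
If P is a partial order relation on A, we write a≤b to denote (a,b)∈P.
• Let (P, ≤) be a partially ordered set. If for every x, y ∈ P either x ≤ y
or y ≤ x, then ≤ is called a total order or linear order on P.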
46