Consistency and Replication


Reasons for Replication
• Reliability
• Performance
  – Scaling with respect to the size of a geographic area
Negative points:
• Having multiple copies around does not by itself help with consistency of data (e.g., the local caches of HTTP browsers).
Object Replication
• Organization of a single distributed remote object shared by
two different clients.
• Before replicating an object, one has to solve the problem of how to protect it against simultaneous accesses.
Concurrent Access of Remote Objects
a) A remote object capable of handling concurrent invocations on its own (à la Java, where object methods can be synchronized).
b) The object is completely unprotected against concurrent invocations.
• An object adapter is needed to handle concurrent invocations.
• This adapter ensures that the object will not be left in an inconsistent/corrupted state.
Methods for Replicating Objects
1. Make the object itself aware of the replication
   – No special support from the environment is required
   – But the objects must be replication-aware…
2. Make the distributed computing environment responsible for the replication
   – The distributed system ensures that concurrent invocations are passed to the correct replicas in the right order/sequence!
   – Such layers can be used for fault tolerance as well.
Object Replication
a) A distributed system for replication-aware distributed objects.
b) A distributed system responsible for replica management.
Replication vs. Scaling
• Replication and Caching are widely used as mechanisms for improved
performance.
• There is a trade-off between the gains of replication and the network bandwidth required to maintain absolute consistency.
  – Consider a process P that accesses a local replica N times per second,
  – while the object gets modified M times per second.
  – If N << M, then it is likely that many updated versions of the data will never be accessed by P!
  – In this case, it would be better to keep no local copy of the object near P.
• There are numerous consistency and replication models (each catering for different applications).
Data-Centric Consistency Models
• A store may be physically distributed across multiple machines.
• Data segments may be replicated across multiple processes.
• Consistency models are needed: a set of rules agreed between processes and the data store.
• A model is a contract: if processes obey the rules, the store will work/behave correctly.
Strict Consistency Model
Rule: any read on a data item x returns a value corresponding to the result of the most recent write on x.
• Strict consistency relies on absolute global time.
Behavior of two processes operating on the same data item:
a) A strictly consistent store (changes become visible instantly).
b) A store that is not strictly consistent (at different moments in time after W(x)a takes place, two different results may come out).
Sequential Consistency Model
(weaker than strict)
Rule: the result of any execution is the same as if the R/W operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
• Time does not play any role.
• All processes “see” the same interleaving of operations.
• Sequential consistency is similar to serializability in the case of transactions.
a) A sequentially consistent data store.
b) A data store that is not sequentially consistent.
Linearizability Model
• Less restrictive than “strict coherence” but stronger than sequential consistency.
• Operations are assumed to receive a timestamp using a globally available clock.
• Definition: the result of any execution is the same as if the R/W operations by all processes on the data store were executed in some sequential order, and the operations of each individual process appear in this sequence in the order specified by its program.
  In addition: if TS(OP1(x)) < TS(OP2(y)), then OP1(x) should precede OP2(y) in this sequence.
• A linearizable data store is also sequentially consistent.
• A set of synchronized clocks is used.
• Much more expensive to implement than sequential consistency [AttWelch94].
• Linearizability is used in the formal verification of distributed programs and is of theoretical interest.
Sequential Consistency Example
Consider the following three concurrently executing processes (time flows downward):

Process P1: x = 1; print(y, z);
Process P2: y = 1; print(x, z);
Process P3: z = 1; print(x, y);

• Initially all variables are set to 0.
• Assignments are essentially the Write operations.
• Prints are the Read operations.
Sequential Consistency Example
(a) x = 1; print(y, z); y = 1; print(x, z); z = 1; print(x, y)
    Prints: 001011, Signature: 001011
(b) x = 1; y = 1; print(x, z); print(y, z); z = 1; print(x, y)
    Prints: 101011, Signature: 101011
(c) y = 1; z = 1; print(x, y); print(x, z); x = 1; print(y, z)
    Prints: 010111, Signature: 110101
(d) y = 1; x = 1; z = 1; print(x, z); print(y, z); print(x, y)
    Prints: 111111, Signature: 111111
(A signature is the concatenation of P1's, P2's, and P3's outputs; "Prints" lists the outputs in execution order.)
Four valid (interleaved) execution sequences for the processes of the previous slide.
• Signature 000000 is not possible.
• Is 001001 a valid signature? Why?
Problem: how does one formally express sequential consistency?
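The interleavings above can be checked mechanically. The following sketch (an illustration, not part of the original slides) enumerates every interleaving of the three processes that respects program order and collects the resulting signatures, answering the questions on this slide:

```python
from itertools import permutations

# Each process issues two operations, in program order:
#   P1: x = 1  then print(y, z)
#   P2: y = 1  then print(x, z)
#   P3: z = 1  then print(x, y)
OPS = [("w", "x"), ("p", "yz"),   # P1
       ("w", "y"), ("p", "xz"),   # P2
       ("w", "z"), ("p", "xy")]   # P3

def signatures():
    """Enumerate every program-order-respecting interleaving and collect
    the 6-bit signature (P1's output, then P2's, then P3's)."""
    sigs = set()
    # An interleaving = which process runs at each of the 6 steps.
    for order in set(permutations([0, 0, 1, 1, 2, 2])):
        vals = {"x": 0, "y": 0, "z": 0}
        out = {0: "", 1: "", 2: ""}
        step = {0: 0, 1: 0, 2: 0}          # next op index per process
        for p in order:
            kind, arg = OPS[2 * p + step[p]]
            step[p] += 1
            if kind == "w":
                vals[arg] = 1
            else:
                out[p] += "".join(str(vals[v]) for v in arg)
        sigs.add(out[0] + out[1] + out[2])
    return sigs

valid = signatures()
print("001011" in valid)   # True: sequence (a)
print("000000" in valid)   # False: the last print always sees all writes
print("001001" in valid)   # False: the slide's question has answer "no"
```

001001 is impossible: it would require P1 to print before y = 1, P3 to print after y = 1 but before x = 1, yet P1 prints only after its own x = 1, a cycle.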
History
• Each process Pi has an execution Ei.
• Ei = the sequence of reads/writes performed by Pi on the data store.
• A history H is a sequence of operations that interleaves all the Ei.
• A history H is valid when:
  – program order is maintained, and
  – data coherence is respected (a read(x) must return the value most recently written to x).
Consider the Ei of slide 9:
E1 = W1(x)b
E2 = W2(x)a
E3 = R3(x)b, R3(x)a
E4 = R4(x)b, R4(x)a
H = W1(x)b, R3(x)b, R4(x)b, W2(x)a, R3(x)a, R4(x)a is a valid history (in terms of sequential consistency).
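The two validity conditions can be expressed directly in code. A minimal sketch (the tuple encoding is an assumption, not from the slides) that checks the history H above:

```python
# A history is a list of (process, op, var, value) tuples in global order.
H = [(1, "W", "x", "b"), (3, "R", "x", "b"), (4, "R", "x", "b"),
     (2, "W", "x", "a"), (3, "R", "x", "a"), (4, "R", "x", "a")]

def is_valid(history, program_order):
    """Check the slide's two conditions: per-process program order is
    maintained, and every read returns the most recently written value."""
    # 1. Program order: each process's ops appear as a subsequence.
    for pid, expected in program_order.items():
        seen = [op for op in history if op[0] == pid]
        if seen != expected:
            return False
    # 2. Coherence: a read of x returns the last value written to x.
    last = {}
    for _, op, var, val in history:
        if op == "W":
            last[var] = val
        elif last.get(var) != val:
            return False
    return True

programs = {
    1: [(1, "W", "x", "b")],
    2: [(2, "W", "x", "a")],
    3: [(3, "R", "x", "b"), (3, "R", "x", "a")],
    4: [(4, "R", "x", "b"), (4, "R", "x", "a")],
}
print(is_valid(H, programs))   # True: H is a valid history
```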
Causal Consistency Model
Necessary conditions:
• Writes that are potentially causally related must be seen by all
processes in the same order.
• Concurrent writes may be seen in a different order on
different machines.
Notes:
• Causal consistency is a weakening of the sequential consistency model.
• It makes a distinction between events that are potentially causally related and those that are not.
• Operations that are not causally related are said to be concurrent.
Causal Consistency Model
• W1(x)c and W2(x)c are concurrent.
• This means that processes are not required to “see” them in the same order.
• This sequence is allowed with a causally consistent store, but not with a sequentially or strictly consistent one.
Causal Consistency
a) A violation of a causally consistent store.
b) A correct sequence of events in a causally consistent store.
Implementing causal consistency requires keeping track of which processes have seen which writes.
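One standard mechanism for tracking which writes have been seen is a vector clock: one counter per process, attached to each write. A minimal sketch (the helper names and the two-process clocks are illustrative assumptions, not from the slides):

```python
# Each write carries a vector clock: one counter per process.
# W happened-before W' iff W's clock is componentwise <= W''s clock.

def vc_leq(a, b):
    """True if vector clock a is componentwise <= b (a happened before b
    or a equals b)."""
    return all(x <= y for x, y in zip(a, b))

def causally_related(a, b):
    """Writes are causally related unless their clocks are incomparable."""
    return vc_leq(a, b) or vc_leq(b, a)

# P1 writes; P2 reads that write and then writes: causally related.
w1 = (1, 0)          # P1's write
w2 = (1, 1)          # P2's write, issued after seeing w1
print(causally_related(w1, w2))   # True: must be seen in the same order

# Two writes issued without seeing each other: concurrent.
w3 = (2, 0)
w4 = (1, 1)
print(causally_related(w3, w4))   # False: replicas may order them freely
```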
FIFO (PRAM) Consistency
• The causally-related requirement is dropped.
• Definition:
– Writes done by a single process are seen by all other processes in the
order in which they were issued,
– but writes done by different processes may be seen in a different order
by different processes.
• FIFO consistency is interesting because it is easy to implement.
• There are no guarantees about how different processes see writes, except that two or more writes from the same source (process) must arrive in order.
FIFO Consistency Example
A valid sequence of events for FIFO consistency.
Some unexpected results may arise under FIFO consistency:

Process P1: x = 1; if (y == 0) kill(P2);
Process P2: y = 1; if (x == 0) kill(P1);

Using FIFO consistency, both P1 and P2 may get killed!
Weak Consistency
Properties:
• Accesses to synchronization variables associated with a data
store are sequentially consistent
• No operation on a synchronization variable is allowed to be
performed until all previous writes have been completed
everywhere
• No read or write operation on data items is allowed to be
performed until all previous operations to synchronization
variables have been performed.
Weak Consistency
a) A valid sequence of events for weak consistency.
   – S synchronizes all local copies (a, b) to the data store.
   – P2 and P3 have not synchronized yet (S comes after their reads).
b) An invalid sequence for weak consistency.
Release Consistency
Avoids the explicit Synchronization (S) operation of Weak Consistency
by using two operations:
• Acquire: tells the data store that a critical region is about to be entered.
  – All the local data copies will be brought up to date with the remote ones if needed.
  – Doing an acquire does not mean that local changes will be sent out to other copies.
• Release: indicates that a critical region has just been exited.
  – When a release occurs, protected data that have been updated are sent out to the other local copies of the data store.
  – A release does not import changes from other copies.
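The asymmetry between acquire (pull in, never push out) and release (push out, never pull in) can be sketched as follows. This is an illustrative toy model, not an API from the slides: a shared dict stands in for the remote copies, and the class name and methods are assumptions.

```python
# Toy model of the release-consistency contract: acquire imports remote
# updates, release exports local updates; neither does the other direction.

class ReleaseConsistentReplica:
    def __init__(self, store):
        self.store = store          # shared dict standing in for remote copies
        self.local = dict(store)    # this replica's local copy
        self.dirty = set()          # protected items written since last release

    def acquire(self):
        """Bring local copies up to date; does NOT export local changes."""
        self.local.update(self.store)

    def write(self, key, value):
        self.local[key] = value
        self.dirty.add(key)

    def read(self, key):
        return self.local.get(key)

    def release(self):
        """Push updated protected data out; does NOT import remote changes."""
        for key in self.dirty:
            self.store[key] = self.local[key]
        self.dirty.clear()

shared = {"x": 0}
p1 = ReleaseConsistentReplica(shared)
p2 = ReleaseConsistentReplica(shared)
p1.acquire(); p1.write("x", 1); p1.release()
print(p2.read("x"))   # 0: P2 has not done an acquire yet
p2.acquire()
print(p2.read("x"))   # 1: visible after the acquire
```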
Release Consistency
A valid event sequence for release consistency.
Release Consistency
Rules:
• Before a read or write operation on shared data is performed, all
previous acquires done by the process must have completed
successfully.
• Before a release is allowed to be performed, all previous reads and writes by the process must have completed.
• Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).
Entry Consistency
• Requires programmers to use acquire/release at the start and the end of each critical section.
• Unlike the previous two models, it calls for each data item to be acquired/released individually!
Entry Consistency
Conditions:
• An acquire access of a synchronization variable is not allowed to
perform with respect to a process until all updates to the guarded
shared data have been performed with respect to that process.
• Before an exclusive mode access to a synchronization variable by
a process is allowed to perform with respect to that process, no
other process may hold the synchronization variable, not even in
nonexclusive mode.
• After an exclusive mode access to a synchronization variable has
been performed, any other process's next nonexclusive mode
access to that synchronization variable may not be performed until
it has performed with respect to that variable's owner.
Entry Consistency
A valid event sequence for entry consistency.
Summary of Consistency Models

(a) Consistency models not using synchronization operations:
  Strict: absolute time ordering of all shared accesses matters.
  Linearizability: all processes must see all shared accesses in the same order; accesses are furthermore ordered according to a (nonunique) global timestamp.
  Sequential: all processes see all shared accesses in the same order; accesses are not ordered in time.
  Causal: all processes see causally related shared accesses in the same order.
  FIFO: all processes see writes from each other in the order they were issued; writes from different processes may not always be seen in that order.

(b) Models with synchronization operations:
  Weak: shared data can be counted on to be consistent only after a synchronization is done.
  Release: shared data are made consistent when a critical region is exited.
  Entry: shared data pertaining to a critical region are made consistent when a critical region is entered.
Client-Centric Consistency Models
• Models described thus far deal with system-wide
consistency
• Basic assumption: concurrent processes may simultaneously update the data store, and it is necessary to maintain consistency in the face of such concurrency.
• Client-centric models deal with environments where update conflicts can be “easily” resolved.
  – Most operations involve reading data.
Eventual Consistency
Principle: if no updates take place for a long time, all replicas will gradually become consistent.
• Write/write conflicts are easy to resolve (since only a small number of processes can perform updates).
The principle of a mobile user accessing different replicas of a distributed database (the mobile computer is unaware of which replica it is using).
Monotonic Reads
Principle: if a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value.
– A process that has seen a value of x at time t will never see an older version of x at a later time.
– Used, e.g., in managing mailboxes.
– WS(x1, x2) means that the write operation on x2 is preceded by a write on x1 (at the site where the second update takes place).
The read operations performed by a single process P at two different local copies of the same data store:
a) A monotonic-read consistent data store.
b) A data store that does not provide monotonic reads.
Monotonic Writes
Principle: a write operation by a process on a data item x is completed before any successive write operation on x by the same process.
• Completing a write operation means that the copy on which a successive write operation is performed reflects the effects of the previous write operation by the same process.
• Resembles FIFO consistency.
The write operations performed by a single process P at two different local copies of the same data store:
a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.
Read Your Writes
Principle: the effect of a write operation by a process on data
item x will always be seen by a successive read operation on x
by the same process.
• A write operation is always completed before a successive read operation.
• HTML pages do not usually adhere to this consistency model.
a) A data store that provides read-your-writes consistency.
b) A data store that does not (the effect of W(x1) has not been propagated to L2).
Writes Follow Reads
Principle: a write operation by a process on a data item x following
a previous read operation on x by the same process, is guaranteed to
take place on the same or a more recent value of x that was read.
•Updates are propagated as a result of previous read operations
•Any successive write operation by a process on a data item x will be performed on
a copy of x that is up to date with the value most recently read by that process.
a) A writes-follow-reads consistent data store.
b) A data store that does not provide writes-follow-reads consistency.
Client-Centric
• These consistency models are derived from Bayou [Terry94]:
  – a database system developed for a mobile computing environment,
  – where network connectivity is assumed unreliable.
  – Wireless networks and systems spanning large geographic areas are typical examples.
Replica Placement
The logical organization of different kinds of
copies of a data store into three concentric rings.
Permanent Replicas
• Small in number.
• Mirroring is the main/initial tool for populating the sites (especially in the World Wide Web).
• Similar organizations appear in (static) data architectures in COWs (clusters of workstations, i.e., shared-nothing environments).
Server-Initiated Replicas
• In a number of instances, temporary replicas may be established in various geographic regions (especially for WWW content).
• Such replicas are also known as push caches.
• Problem: dynamically place replicas of files close to user sites [Rabi99].
  – Replicate a WWW file to reduce the load on the server site.
  – A specific file has to be located/migrated close to the clients that issue requests for it (proximity of clients).
Server-Initiated Replicas
• Each server counts the access requests for a file F coming from different clients.
• When the access count exceeds the replication threshold rep(S, F), migration/replication is initiated:
  – the degree of replication is increased.
• When the access count drops below the deletion threshold del(S, F), the reverse happens:
  – the degree of replication is decreased.
• The target site P can be selected with the help of routing databases.
• After the file F migrates to P’, the counter cntQ(P’, F) is maintained at P’.
Server-Initiated Replicas
• A server Q deletes its replica of a file F if access requests for F drop below del(Q, F) (unless it is the last copy!).
• If, for some server P, cntQ(P, F) exceeds half of the total requests for F at Q, server P should take over the copy of F.
  – Server Q will attempt to migrate the file F to P.
• Server-initiated replication: if migration of F to P is not possible (for any reason),
  – Q will attempt to replicate the file F on another server.
  – The server may start by checking servers that are located farthest away (when selecting a potential hosting site).
Client-Initiated Replicas
• Client-initiated replicas are known as client-caches.
• Client-caches are used to improve access time to data.
• Data are kept in clients for a limited time (after which they
may become stale).
• Cache hits help performance.
• Proxies help further (as they aggregate accesses from multiple parties in the same organization).
• Caches can be placed at various levels of an organization.
Update Propagation
Three fundamental options:
• Send a notification of the update (an invalidation: copies might no longer be valid).
• Ship the modified data from one copy to the other.
• Propagate the update operations to the other copies.
Pros of invalidation-based protocols:
• Little network bandwidth is used.
• The only information that needs to be transferred is which data are no longer valid.
• Useful when the write-to-read ratio is large (many updated versions would be rendered useless anyway).
Pros of transferring modified data among copies:
• Useful when the write-to-read ratio is small.
Pros of transferring the update operations (aka active replication):
• Only small descriptions of the updates are shipped.
• But CPU requirements may be increased, especially when update operations are complex.
Push vs. Pull Protocols for Update Propagation
• Push-based (server-based) protocols
  – Updates are sent to other replicas without the latter asking for them.
  – Used when replicas need to be kept identical.
  – The read-to-write ratio should be large.
• Pull-based (client-based) protocols
  – Effective when the read-to-write ratio is low.
  – Web caches mostly use this approach.
  – Response time increases when cache misses occur.
Comparison of Pull vs. Push Protocols
Issue                   | Push-based                               | Pull-based
State of server         | List of client replicas and caches       | None
Messages sent           | Update (and possibly fetch update later) | Poll and update
Response time at client | Immediate (or fetch-update time)         | Fetch-update time

• A comparison between push-based and pull-based protocols in the case of multiple-client, single-server systems.
Leases
• A compromise between the two schemes above is the use of leases:
  – A lease is a promise by the server that it will push updates to the client for a pre-specified period of time.
  – When a lease expires, the client is forced to poll the server and pull in possible updates.
  – Alternatively, the client requests a new lease for pushing updates.
• Leases can be of various types:
  – Age-based leases.
  – Leases based on how often a client demands that its cache be updated.
  – Leases based on the state-space overhead at the server: the server moves from a stateful approach to a more stateless one.
Finally, related to pushing is the unicast/multicast mechanism of the underlying network (i.e., use multicast whenever possible).
Epidemic Protocols
Update propagation in eventually consistent environments is implemented by a class of algorithms called epidemic algorithms:
• Propagate updates using the fewest possible messages.
• Follow the ideas from the spreading of infectious diseases, in reverse:
  health organizations try to prevent the spreading of diseases, while epidemic algorithms try to spread updates (the “infection”) as fast as possible.
Server classification:
• A server is infective when it holds an update and is willing to spread it to other servers.
• A server that has not received the update yet is called susceptible.
• An updated server that is not willing or able to spread the update is said to be removed.
Propagation Model: Anti-Entropy
• A server P picks another server Q at random and exchanges updates with Q.
• The exchange is done using one of three approaches:
  – P only pushes its own updates to Q,
  – P only pulls in new updates from Q,
  – P and Q send updates to each other (the push-pull approach).
• Push-based protocols are not good for spreading updates rapidly once infective servers are many, because the probability of randomly “finding” a susceptible server becomes small.
• Pull-based protocols are a much better option when many servers are infective.
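The push/pull behavior can be illustrated with a toy simulation. This sketch is an assumption-laden simplification (synchronous rounds, every server contacting exactly one random partner per round), not an algorithm from the slides; it just shows that both variants eventually infect everyone:

```python
import random

def rounds_to_spread(n, mode, seed=0):
    """Toy anti-entropy simulation: in each round every server contacts
    one random partner. Returns the number of rounds until all n servers
    hold the update."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True                 # one server starts with the update
    rounds = 0
    while not all(updated):
        rounds += 1
        snapshot = list(updated)      # everyone acts on the round-start state
        for p in range(n):
            q = rng.randrange(n)
            if mode == "push" and snapshot[p]:
                updated[q] = True     # infective P pushes its update to Q
            elif mode == "pull" and snapshot[q]:
                updated[p] = True     # P pulls the update from infective Q
    return rounds

print("push rounds:", rounds_to_spread(1000, "push"))
print("pull rounds:", rounds_to_spread(1000, "pull"))
```

Running this for larger n shows the effect described above: push spends extra rounds at the end trying to hit the last susceptible servers at random, while pull lets those servers find an infective partner themselves.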
Rumor Spreading
• If server P has just been updated for data item x, it contacts an arbitrary server Q and tries to push the update to Q.
• It is possible that Q has already “seen” the update.
• In this case, P may lose its interest in spreading the update further and may become removed (with probability 1/k).
It can be shown that if a distributed data store consists of a large number of servers, the fraction s of servers that will remain ignorant of the update can be found by solving the equation:
  s = e^(-(k+1)(1-s))
• If k = 3, then s <= 0.02.
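The equation above has no closed-form solution for the small root, but it can be solved numerically. A minimal sketch (fixed-point iteration; the function name and starting point are choices of this example):

```python
import math

def residual_fraction(k, iters=200):
    """Solve s = exp(-(k+1)*(1-s)) by fixed-point iteration, starting
    away from the trivial root s = 1. Returns the fraction of servers
    that remain ignorant of the update."""
    s = 0.5
    for _ in range(iters):
        s = math.exp(-(k + 1) * (1 - s))
    return s

for k in (1, 2, 3, 4):
    print(k, round(residual_fraction(k), 4))
```

The iteration converges to the small root because the derivative (k+1)·s is below 1 there; for k = 3 it gives s ≈ 0.0198, matching the s <= 0.02 claim on the slide.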
Removing Data Using Epidemic Algorithms
Epidemic algorithms have an undesirable side effect:
• They cannot easily spread the deletion of data items.
• Deletion destroys all information about an item.
• When a data item is removed from a server, the server may eventually receive old copies of the item and mistake them for updates (or for something it did not have before)!
Idea: keep the items around but circulate death certificates (which will ultimately need to be cleaned up themselves).
Death certificates are timestamped and are allowed to exist for a long enough period of time (until it is certain that the data item is indeed dead!).
Consistency Protocols
• A consistency protocol describes the implementation of a
consistency model
• The more commonly implemented models are:
  – sequential consistency,
  – weak consistency with synchronization,
  – atomic transactions.
• Primary-based protocols: each data item x has an associated primary, which is responsible for coordinating write operations on x.
  – The simplest such model is the one in which all read and write operations are carried out at a single remote server (the traditional client-server model).
Remote-Write Protocol (Client-Server)
Primary-based remote-write protocol with a fixed server
to which all read and write operations are forwarded.
Primary-Backup Protocol
More interesting are protocols that let clients read from local servers and send updates to remote ones.
• The principle of the primary-backup protocol.
• Updates are implemented as “blocking” operations.
• Provides a straightforward implementation of sequential consistency.
• Such operations impose delays, and there are problems with fault tolerance if non-blocking writes are used.
Local-Write Protocol
• Primary-based local-write protocol: a single copy is migrated
(moves!) between processes.
• Consistency is easy to maintain.
• The main task is keeping track of where the various copies are.
Local-Write Protocol: the primary copy migrates to the process that wants to perform the update.
• Primary-backup protocol in which the primary migrates to the process wanting to perform an update.
• Multiple successive operations can be carried out “locally” (once the item has been transferred).
• A non-blocking write-through protocol is still needed to bring the backups up to date.
Replicated-Write Protocols: Active Replication
• Each replica has an associated process that carries out
updates
• Generally updates are propagated with the modifications
that cause the changes (operation is sent to replica in order
to be applied locally).
• A potential problem is that all changes have to be carried out in the same order everywhere:
  – use Lamport’s timestamping mechanism to provide order, or
  – use a sequencer (which works as a central coordinator):
    • ship the update first to the sequencer,
    • have an id or timestamp issued, and then forward the operation to the sites holding replicas.
Quorum-Based Protocols
• To support writes on replicated data, use voting.
• Assume N servers, each with a copy of the object x.
  – For an update to happen, get half plus one of the sites to agree (and accept the change).
  – Once this has been achieved, the new version of the item is in place.
• To read the object x:
  – ask half plus one of the sites to send their version numbers;
  – if all the version numbers are the same, then the up-to-date version of the item is at hand.
Quorum-Based Protocols (Gifford’s Version)
• Gifford’s scheme is a bit more general:
  – To read an item of which N replicas exist, a client needs to assemble a read quorum (an arbitrary collection of any NR servers).
  – To modify an item, a write quorum of at least NW servers is required.
  – Two conditions have to hold at all times:
    • NR + NW > N, and
    • NW > N/2.
  – The first prevents read-write conflicts; the second prevents write-write conflicts.
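The two constraints are easy to check mechanically. A small sketch, using N = 12 servers as the twelve-letter server sets {A..L} on the next slide suggest (the specific NR/NW values per example are assumptions consistent with that reading):

```python
def valid_quorums(n, nr, nw):
    """Gifford's two constraints: NR + NW > N prevents read-write
    conflicts, NW > N/2 prevents write-write conflicts."""
    return nr + nw > n and nw > n / 2

print(valid_quorums(12, 3, 10))   # True: a correct choice
print(valid_quorums(12, 7, 6))    # False: NW = 6 is not > N/2,
                                  # so two disjoint write sets of 6 can
                                  # both succeed (write-write conflict)
print(valid_quorums(12, 1, 12))   # True: ROWA (read one, write all)
```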
Example: Gifford’s Quorum-Based Protocol
Three examples of the voting algorithm:
a) A correct choice of read and write sets.
b) A choice that may lead to write-write conflicts (problem!):
   if WSclient1 = {A,B,C,E,F,G} and WSclient2 = {D,H,I,J,K,L}, then two different updates can go through with no problem!
c) A correct choice, known as ROWA (read one, write all).