Distributed File Systems
Synchronization – 11.5
Consistency and Replication – 11.6
Fault Tolerance – 11.7
11.5: Synchronization
• File System Semantics
• File Locking
Synchronization
• Is an issue only if files are shared
• Sharing in a distributed system is often
necessary, and at the same time can affect
performance in various ways.
• In the following discussion we assume file
sharing takes place in the absence of
process-implemented synchronization
operations such as mutual exclusion.
UNIX File Semantics
• In a single-processor system, any file read
operation returns the result of the most
recent write operation.
• Even if two writes occur very close together,
the next read returns the result of the last
write.
• It is as if all reads and writes are timestamped from the same clock. Operation
order is based on strict time ordering.
UNIX Semantics in DFS
• Possible to (almost) achieve IF…
– There is only one server
– There is no caching at the client
• In this case every read and write goes
directly to the server, which processes
them in sequential order.
– Network delays might make minor differences
in wall clock ordering.
Caching and UNIX Semantics
• Single-server + no client caching leads to
poor performance, so most file systems
allow users to make local copies of files
(or file blocks) that are currently in use.
• Now UNIX semantics are problematic: a write executed only on a
local copy will not be seen by another client that reads the file from
the server, or by other clients that have the file cached.
Write-Through
• A possible solution is to require all
changes to local copies to be immediately
written to the server.
– Inefficient – caching is no longer as useful
– Not a total solution: what happens when two
users have the same file cached?
Consistency Models
• Recall discussion of consistency models in
Chapter 7
• Realistically, strict consistency or even
sequential consistency can’t be easily
achieved without synchronization
techniques such as transactions or locks
• Here we consider what the file system can
do in the absence of user-enabled
methods.
Session Semantics
• Instead of trying to implement UNIX semantics where it really is
impractical, define new semantics:
– Local changes to a file are not made permanent until the file is
closed. If another user opens the file, that user gets the original
version.
– This approach is common in DFSs.
• In effect, this turns a remote-access model into an
upload-download model, as the sketch below illustrates.
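
A minimal Python sketch of the resulting upload/download behavior.
The SessionFile class and the server.fetch/server.store calls are
illustrative names, not part of any real DFS API:

    # Session semantics sketch: whole-file download on open, upload on
    # close. Local writes stay invisible to others until close().
    class SessionFile:
        def __init__(self, server, name):
            self.server, self.name = server, name
            self.data = bytearray(server.fetch(name))   # download on open
            self.dirty = False

        def write(self, offset, chunk):
            self.data[offset:offset + len(chunk)] = chunk   # local copy only
            self.dirty = True

        def read(self, offset, length):
            return bytes(self.data[offset:offset + length])  # sees own writes

        def close(self):
            if self.dirty:
                self.server.store(self.name, bytes(self.data))  # now visible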
Simultaneous Caching
• What if two users concurrently cache and
modify the same file? How do we
determine the “new” state of the file?
• Possibilities:
– The most recently closed file becomes the
new “official” version (most common)
– The decision is unspecified (an unlikely
choice)
Immutable Files
• The only operations on a file are,
effectively, create, read, and replace.
– Once a file is created it can be read but not
changed.
– A new file (incorporating changes to a current
file) can be created and placed in the
directory instead of the original version.
• If several users try to replace an existing
file at the same time, one is chosen: either
the last to close, or non-deterministically.
Review: File System Semantics
• UNIX semantics: every file operation is instantly visible to all
processes
• Session semantics: no changes are visible until the file is closed
• Immutable files: no updates are possible; files can only be
replaced
Transaction Semantics
• Transactions are a way of grouping
several file operations together and
ensuring that they are either all executed
or none is executed.
– We say they are atomic.
• The transaction system is responsible for
ensuring that all of the operations are
carried out in order, without any
interference from concurrent transactions.
The Transaction Model
• Transaction: a set of operations which
must be executed entirely, or not at all.
• Processes in a transaction can fail at
random
– Failure causes: hardware or software
problems, network problems, lost messages,
etc.
• Transactions will either commit or abort:
– Commit => successful completion (All)
– Abort => partial results are undone (Nothing)
Transaction Model
• Transactions are delimited by two special
primitives:
Begin_transaction   // or something similar
    transaction operations (read, write, open, close, etc.)
End_transaction
• If the transaction successfully reaches the end
statement, it “commits” and all changes become
permanent; otherwise it aborts.
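
As a hedged sketch, the begin/end primitives map naturally onto a
commit-or-abort pattern in code; tx_system and its methods below are
illustrative, not a specific library:

    # All-or-nothing execution of a pair of updates.
    def transfer(tx_system, src, dst, amount):
        tx = tx_system.begin_transaction()
        try:
            tx.write(src, tx.read(src) - amount)
            tx.write(dst, tx.read(dst) + amount)
            tx.commit()    # "All": changes become permanent
        except Exception:
            tx.abort()     # "Nothing": partial results are undone
            raise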
ACID Properties of Transactions
• Atomic: either all or none of the operations in a
transaction are performed
• Consistent: the transaction doesn’t affect system
invariants; e.g., no money “lost” in a banking
system
• Isolated (serializable): one transaction can’t
affect others until it completes
• Durable: changes made by a committed transaction are permanent,
even if the process or server fails.
Atomicity
• An atomic action is one that appears to be
“indivisible and instantaneous” to the rest
of the system. For example, machine
language instructions.
• Transactions support the execution of
multiple instructions as if they were a
single atomic instruction.
Consistent
• A state is consistent if invariants hold
• An invariant is a predicate which states
a condition that must be true.
• Invariants for the airline ticket example:
– seatsLeft = seatsTotal – seatsSold
– seatsLeft >= 0
• In the bank case (simplified):
– balance_final = balance_original – withdrawals + deposits
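
These invariants translate directly into executable checks; a toy
example in Python (all names are illustrative):

    def invariants_hold(seats_total, seats_sold, seats_left,
                        balance_original, withdrawals, deposits,
                        balance_final):
        # Airline ticket invariants
        airline_ok = (seats_left == seats_total - seats_sold
                      and seats_left >= 0)
        # Simplified banking invariant
        bank_ok = (balance_final
                   == balance_original - withdrawals + deposits)
        return airline_ok and bank_ok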
Isolated
• No other transaction will see the
intermediate results of a transaction.
• Concurrent transactions have the same
effect on the database as if they had run
serially. Notice the similarity to critical
sections, which do run serially.
• This characteristic is enforced through
special concurrency control measures.
AD Properties
• ACID is a commonly used term, but
somewhat redundant.
• Transactions that execute atomically will
be consistent and isolated.
• Atomicity and durability capture the
essential qualities.
Semantics of File Sharing in Distributed Systems
• UNIX semantics: every file operation is instantly visible to all
processes
• Session semantics: no changes are visible until the file is closed
• Immutable files: no updates are possible; files can only be
replaced
• Transactions: all changes occur and are visible atomically – or
not at all
File Locking
• UNIX file semantics are not possible in
DFS
• Session semantics and immutable files do
not always support the kind of sharing
processes need.
• Transactions have a heavy overhead.
• Thus some additional form of locking is
desirable to enforce mutual exclusion on
writes.
File Locking in NFSv4
• Lock managers in NFS, as in other file systems, are based on the
centralized scheme discussed in Chapter 6:
– Client requests lock
– Lock manager grants lock
– Client releases lock (or it expires after a time)
• In NFSv4, if a client requests a lock which cannot be granted, the
client is not blocked; it must try again later.
Denied Requests
• If a client's request for a lock is denied, it receives an error
message.
– It can poll the server later for lock availability (see the sketch
below).
• Clients can request to be put on a FIFO queue; when a lock is
released it is reserved for the first process on the queue; if that
process polls within a certain amount of time it gets the lock.
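
A minimal sketch of this non-blocking behavior: the client retries
after a denial instead of blocking. The lock_mgr API is hypothetical,
standing in for the NFSv4 request/reply exchange:

    import time

    def acquire_with_polling(lock_mgr, path, start, length,
                             retries=10, delay=1.0):
        for _ in range(retries):
            if lock_mgr.lock(path, start, length):   # True if granted
                return True
            time.sleep(delay)    # denied: poll the server again later
        return False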
File locking in NFS
• Two types of locks:
– Reader locks, which can be held
simultaneously,
– Writer locks, which guarantee exclusive
access.
• The lock operation is applied to
consecutive byte sequences in the file,
rather than to the whole file.
NFSv4 Lock-Related Operations
• Lock: create a lock for a range of bytes
• Lockt: test whether a conflicting lock has been granted
• Locku: remove a lock from a range of bytes
• Renew: renew the lease on a lock
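
On a POSIX client, byte-range locks of this kind are requested
through fcntl, and on an NFS mount the operating system forwards them
to the server's lock protocol. A small example (the file path is
illustrative):

    import fcntl

    with open("/mnt/nfs/shared.dat", "r+b") as f:
        try:
            # Writer lock on bytes 0..1023 only; LOCK_NB turns a denial
            # into an immediate error instead of blocking, as in NFSv4.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 1024, 0)
            f.write(b"update")                        # exclusive in this range
            fcntl.lockf(f, fcntl.LOCK_UN, 1024, 0)    # release the range
        except OSError:
            pass   # lock denied: poll again later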
Leases
• Locks are granted for a specific time
interval.
• At the end of that interval the lock is
removed unless the client has requested
an extension.
Share Reservations in NFS
• An open request specifies the kind of
access the application requires: READ,
WRITE, BOTH
• It also specifies the kind of access that
should be denied other clients: NONE,
READ, WRITE, BOTH
• If requirements can’t be met, open fails
• Share reservations = implicit locking
Share Reservations - Example
• Client tries to open a file for reading and
writing, and deny concurrent write access.
– If no other client has the file open, the request
succeeds.
– If another client has opened the file for reading, the request
succeeds.
– If another client has opened the file for writing,
the request fails.
– If another client has the file open and has
denied read or write access, the request fails.
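
The decision rule behind this example can be sketched as bitmask
checks; the constants and structure below are illustrative:

    READ, WRITE = 1, 2    # access bits; BOTH = READ | WRITE, NONE = 0

    def can_open(req_access, req_deny, current_opens):
        # current_opens: (access, deny) pairs of clients holding the file
        for access, deny in current_opens:
            if req_access & deny:    # someone denies what we request
                return False
            if req_deny & access:    # we deny what someone already holds
                return False
        return True

    # The example above: opening with (READ | WRITE, WRITE) succeeds
    # against a (READ, 0) holder but fails against a (WRITE, 0) holder.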
11.6: Consistency and Replication
• Client-Side Caching
• Server-Side Replication
• Replication in P2P Systems
Introduction
• Replication (and caching) => multiple copies
of something
• Two reasons for replication:
– Reliability (protection against failure, corruption)
– Performance (size of user base, geographical
extent of system)
• Replication can cause inconsistency: at least
one copy is different from the rest.
Caching in a DFS
• Caching in any DFS reduces access
delays due to disk access times or
network latency.
• Caches can be located in the main memory of the server or the
client, and/or on the client's disk
• Client-side caching (memory or disk)
offers most benefits, but also leads to
potential inconsistencies.
Cache Consistency Measures
• Server-initiated consistency: server
notifies client if its data becomes stale
– e.g., another client closes its copy of the file,
which was opened for writing.
• Client-initiated consistency: client is
responsible for consistency of data
– e.g., client side software can periodically
check with server to see if file has been
modified.
Caching in NFS
• NFSv3 did not define a caching protocol.
• Different implementations led to different
results.
• “Stale” data – data that doesn’t agree with the data at the server
– could exist for periods ranging from a few seconds to half a
minute
Cache Consistency Problem
• How can stale data (relative to server) be
avoided?
• NFSv4 does not improve the system
enormously, but there are some changes
• Many details are still implementation
dependent.
• General structure – next slide
Client-Side Caching in NFS
Figure 11-21. (Diagram: the client application reads and writes
through a memory cache backed by a disk cache; the client-side
caches talk to the NFS server across the network.)
What Do Clients Cache?
• File data blocks
• File handles – for future reference
• Directories
Caching File Data
• The simplest approach to caching allows the
server to retain control over the file.
• Procedure:
– Client opens the file
– Data blocks are transferred to the client (by read ops)
– Client can read and write data in the cache
– When the file closes, flush changes back to the server
• Session semantics & NFS: the last (most recent) process to close a
file makes its changes permanent; changes made by other processes
that ran concurrently are lost.
Caching with Server Control
• In caching with server control
– All clients on a single machine may read and write the
same cached data if they have access rights
– data remaining in the cache after a file closes doesn’t need to be
removed, although changes must be sent to the server.
• If a new client on the same machine opens a file
after it has been closed, the client cache
manager usually must validate local cached data
with the server
– If the data is stale, replace it.
Caching With Open Delegation
• Allows a client machine to handle some
local open and close operations from
other clients on the same machine.
– Normally the server decides if a client can
open a file
• Delegation can improve performance by
limiting contact with the server
• The client machine gets a copy of the
entire file, not just certain blocks.
Open delegation – Examples*
• Suppose a client machine has opened a
file for writing, and has been delegated
rights to control the file locally.
– If another local client tries to lock the file, the
local machine can decide whether or not to
grant the lock
– If a remote client tries to lock the file (at the
server) the server will deny file access
• If a client has opened the file for reading only, local clients
desiring write privileges must still contact the server.
Delegation and Callbacks
• Server may need to “undelegate” the file –
perhaps when another client needs to
obtain access.
• This can be done with a callback, which is
essentially an RPC from server to client.
• Callbacks require the server to maintain
state (knowledge) about clients – a reason
for NFS to be stateful.
Caching Attributes*
• Clients can cache attributes as well as data.
– (size of file, number of links, last date modified, etc.)
• Cached attributes are kept consistent by the
client, if at all
– No guarantee that the same file cached at two sites
will have the same attributes at both sites
• Attribute modifications should be written through
to the server (write through cache coherence
policy), although there’s no requirement to do so
Leases*
• Lease: cached data is automatically
invalidated after a certain period of time.
– Applies to file attributes, file handles (mapping
of name to file handle), directories, and
sometimes data.
– When lease expires, must renew data from
server
– Helps with consistency.
An Implementation of Leases*
• Data blocks have time-stamps applied by the
server that indicate when they were last
modified.
• When a block is cached at a client, the server’s
time-stamp is also cached.
• After a period of time, the client confirms the
validity of the data
– Compare timestamp at the client to timestamp at
server
– If server timestamp is more recent, invalidate client
data
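
A sketch of this timestamp scheme; the cache layout and server calls
are illustrative:

    import time

    LEASE_SECONDS = 30    # illustrative lease length

    def read_block(cache, server, block_id):
        entry = cache.get(block_id)
        if entry:
            data, server_ts, cached_at = entry
            if time.time() - cached_at <= LEASE_SECONDS:
                return data    # lease still valid: trust the cached block
            if server.modified_time(block_id) <= server_ts:
                cache[block_id] = (data, server_ts, time.time())  # renew
                return data
            # server timestamp is more recent: fall through and refetch
        data, server_ts = server.fetch_block(block_id)
        cache[block_id] = (data, server_ts, time.time())
        return data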
Coda
A Prototype Distributed File System
• Developed at CMU – M. Satyanarayanan
• Started in 1987 as an improvement on the Andrew File System (a
classic research FS)
• Most recent version of Coda (6.9.3) was released 1/11/2008
(http://www.coda.cs.cmu.edu/news.html)
Objectives of Coda
• Support disconnected operation (server
goes down, laptop is disconnected from
network, etc.)
• Client side caching is extensive
– Uses client disk cache
• Replication contributes to availability, fault
tolerance, scalability
Caching in Coda
• Critical, because of Coda’s objectives
• Caching achieves scalability; provides
more fault tolerance for the client in case it
is disconnected from the server.
• When a client opens a file, the entire file is
downloaded. This is true for reads and
writes.
Concurrent Access
• In Coda, many clients may have a file
open for reading, but only one for writing.
– Multiple readers and single writer may exist
concurrently
– In NFS and most other file systems, multiple
readers and multiple writers can exist
concurrently.
Callbacks/Server Initiated Cache
Consistency
• A Coda callback is an agreement
between the server and a client. Server
agrees to notify client when a file has
been modified by another client.
• At this time, the client may purge the file
from its cache, but it may also continue
reading the outdated copy.
• This is a blend of session and
transaction semantics.
Coda Callbacks
• Callback promise: server’s commitment to
notify client when file changes
• Callback break: notice from server that the
client’s file is stale; called a “break”
because it terminates the agreement.
There will be no further callbacks unless
the client renews it.
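
A sketch of the client side of this agreement (in Coda this logic
lives in the client cache manager; the class and method names here
are illustrative):

    class ClientCache:
        def __init__(self):
            self.files = {}         # name -> cached data
            self.promises = set()   # files with an outstanding promise

        def on_callback_break(self, name):
            # RPC from server: another client closed a modified copy.
            self.promises.discard(name)   # the agreement is terminated
            # Client MAY purge the file now, or keep reading stale data.

        def read(self, name, server):
            if name in self.promises:
                return self.files[name]   # safe while the promise stands
            data = server.fetch(name)     # revalidate and renew promise
            self.files[name] = data
            self.promises.add(name)
            return data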
Figure 11-23, page 523
• Local copies of files can be used as long as the client still has
an outstanding callback promise
– No other client has closed a modified copy of the file.
(Diagram: clients 1 and 2 each hold the same file in their caches,
served by the same server.)
Suppose clients 1 & 2 have cached the same file.
Client 1 modifies the file.
How/when does client 2 know?
What role, if any, does the server have?
Are Coda and NFS different?
11.6.2: Server-Side Replication
• Caching is a form of replication at the client
side.
– Initiated by client request
– Cached information is temporary
– Unit of caching = a file, or less (usually)
– Purpose: improved performance
• Server replication
– Mainly for fault tolerance & availability
– May actually degrade performance (overhead)
– Less common than caching in DFS
Caching & Replication in Coda
• Unit of replication = volume (group of
related files)
• Each volume is stored on several servers,
its Volume Storage Group (VSG)
• Available Volume Storage Group (AVSG)
is the set of servers a client can actually
reach
• Contact one server to get permission to
R/W, contact all when closing an updated
file.
(Diagram: servers S1, S2, and S3 store the volume. A broken network
partitions the system: client A's Open(f) reaches S1 and S2, while
client B's Open(f) reaches only S3.)
Figure 11-24. Two clients with a different AVSG for the same file
Writing in Disconnected Systems
• Each file has a Coda version vector (CVV),
analogous to vector timestamps, one
component per server. Starts at (1, 1, 1)
• A server updates its own component when the file is updated there.
• As long as all servers receive all updates, all version vectors
will be equal.
Detecting Inconsistencies
• In the previous example, both A and B will
be allowed to open a file for writing.
• When A closes, it will update S1 and S2, but
not S3; B will update S3, but not S1, S2.
• The timestamp at S1 and S2 will be [2, 2, 1].
• The timestamp at S3 will be [1, 1, 2].
• It is easy to detect such inconsistencies, but knowing how to
resolve them is application dependent.
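
Version-vector comparison makes this detection mechanical; a minimal
sketch:

    def compare_cvv(a, b):
        # Returns "equal", "a_newer", "b_newer", or "conflict".
        a_ge = all(x >= y for x, y in zip(a, b))
        b_ge = all(y >= x for x, y in zip(a, b))
        if a_ge and b_ge:
            return "equal"
        if a_ge:
            return "a_newer"
        if b_ge:
            return "b_newer"
        return "conflict"   # concurrent updates: resolution is up to the app

    # The example above: S1/S2 hold [2, 2, 1] and S3 holds [1, 1, 2];
    # neither vector dominates, so compare_cvv reports "conflict".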
Replication in P2P Systems
• In P2P systems replication is more
important because
– P2P members are less reliable – may leave
the system or remove files
– Load balance is important since there are no
designated servers
• File usage in P2P is different: most files
are read only, updates consist of adding
new files, so consistency is less of an
issue.
Unstructured P2P Systems
(each node knows n neighbors)
• Look-up = search (in structured systems,
lookup is directed by some algorithm)
• Replication speeds up the process
• How to allocate files to nodes (it may not
be possible to force a node to store files)
– Uniformly distribute n copies across network
– Allocate more replicas for popular files
– Users who download files are responsible for
sharing them with others (as in BitTorrent)
Structured P2P Systems
• Replication is used primarily for load
balance
• Possible approaches:
– Store a replica at each node in the search path (concentrates
replicas near the primary copy, but may unbalance some nodes)
– Store replicas at nodes that request a file,
store pointers to it at nodes along the way.
11.7: Fault Tolerance in DFS*
• Review of Fault Tolerance
• Handling Byzantine Failures
• High Availability in P2P systems
Basic Concepts - Review
• Distributed systems may experience partial failure
• Build systems to automatically recover from
crashes.
• Continue to operate normally while failures are
being repaired; i.e., be fault tolerant.
• Fault tolerant systems exhibit dependability.
– Availability: the system is immediately ready to use
– Reliability: the system can run continuously without
failing.
• (remember availability/reliability example)
– Safety: system failure doesn’t have disastrous
consequences
– Maintainability: easy to repair
Failure Models
• Failure may be due to an error at any place in the system:
– The server crashes
– The network goes down
– A disk crashes
– Security violations occur
• Failure types: crash failure, omission failure, Byzantine failure
– Byzantine failures are incorrect but undetectable; e.g., malicious
servers produce deliberately wrong results, ...
Handling Byzantine Failures in Distributed File Systems
• Replication handles many errors in DFS, but Byzantine errors are
harder to solve.
• The text presents an algorithm by Castro and Liskov that works as
long as no more than 1/3 of the nodes are faulty at any moment.
• Clients must get the same answer from k + 1 servers (in a system
with 3k + 1 servers) to be sure the answer is correct.
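
The client-side acceptance rule (not the full Castro-Liskov protocol,
just the vote count stated above) can be sketched as:

    from collections import Counter

    def accept_reply(replies, k):
        # replies: answers from distinct servers in a 3k + 1 server
        # system. An answer vouched for by k + 1 servers includes at
        # least one correct server, so it can be trusted.
        if not replies:
            return None
        value, votes = Counter(replies).most_common(1)[0]
        return value if votes >= k + 1 else None   # else: wait for more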
Availability in P2P Systems
• Possible approaches
– Replication (although must be at very high
levels due to unreliability of nodes)
– Erasure coding: divides a file into m
fragments, recodes them into n > m fragments
such that any set of m fragments can be used
to reconstruct the entire file. Distribute
fragments, rather than entire file replicas
• Requires less redundancy than full replication.
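
A toy instance of the idea with n = m + 1: split the file into m
fragments plus one XOR parity fragment, so any m of the n pieces
reconstruct the file. Real systems use stronger codes (e.g.,
Reed-Solomon); this sketch tolerates only one lost fragment:

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data, m):
        size = -(-len(data) // m)    # ceiling division: pad evenly
        padded = data.ljust(m * size, b"\0")
        frags = [padded[i * size:(i + 1) * size] for i in range(m)]
        parity = frags[0]
        for f in frags[1:]:
            parity = xor_bytes(parity, f)
        return frags + [parity]      # n = m + 1 fragments to distribute

    def reconstruct(frags, missing):
        # XOR of the m surviving fragments recovers the fragment at
        # index `missing` (frags holds None in that slot).
        rest = [f for i, f in enumerate(frags) if i != missing]
        out = rest[0]
        for f in rest[1:]:
            out = xor_bytes(out, f)
        return out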
THE END