PowerPoint presentation - United International College
Distributed System and Middleware
Distributed Systems
Distributed File System
Dr. Sunny Jeong. [email protected]
Mr. Coling Zhang [email protected]
With Thanks to Prof. G. Coulouris, Prof. A.S. Tanenbaum
and Prof. S.C Joo
Overview
Requirements for distributed file systems
transparency, performance, fault-tolerance, Consistency...
Design issues
possible options, architectures
file sharing, concurrent updates
Caching
Examples
Sun Network File System
Andrew File System
Distributed Services
DISTRIBUTED FILE SYSTEMS
Definitions
• A Distributed File System (DFS) is simply a classical model of a file system (as discussed before) distributed across multiple machines. The purpose is to promote sharing of dispersed files.
• This is an area of active research interest today.
• The resources on a particular machine are local to itself. Resources on other machines are remote.
• A file system provides a service for clients. The server interface is the normal set of file operations: create, read, etc. on files.
DISTRIBUTED FILE SYSTEMS
Definitions
Clients, servers, and storage are dispersed across machines. Configuration and implementation may vary:
a) Servers may run on dedicated machines, OR servers and clients can be on the same machines.
b) The OS itself can be distributed (with the file system a part of that distribution).
c) A distribution layer can be interposed between a conventional OS and the file system.
Clients should view a DFS the same way they would a centralized FS; the distribution is
hidden at a lower level.
Performance is concerned with throughput and response time.
Distributed file service
Basic services
persistent file storage of data and programs
operations on files (create, open, read, write…)
multiple remote clients within intranet
file sharing
typically one-copy update semantics over RPC
Many new developments
persistent object stores (storage of objects)
Persistent Java, CORBA, …
replication, whole-file caching
distributed multimedia (Tiger video file server)
Storage systems and their properties
Properties compared: sharing, persistence, distributed cache/replicas, consistency maintenance
Storage system: example (consistency maintenance entry, where shown)
Main memory: RAM (1)
File system: UNIX file system (1)
Distributed file system: Sun NFS
Web: Web server
Distributed shared memory: Ivy (Chap. 16)
Remote objects (RMI/ORB): CORBA (1)
Persistent object store: CORBA Persistent Object Service (1)
Persistent distributed object store: PerDiS, Khazana
* "1" is for one-copy consistency
Characteristics of file systems
Operations on files ( =data + attributes)
create/delete
query/modify attributes
open/close
read/write
access control
Storage organization
directory structure (hierarchical, pathnames)
metadata (= file management information, data about data)
file attributes
directory structure information, etc
Characteristics of file systems
Persistently stored data sets( files = data + attributes)
Hierarchic name space visible to all processes
API with the following characteristics:
access and update operations on persistently stored data sets
sequential access model (with additional random facilities)
Sharing of data between users, with access control
Concurrent access:
certainly for read-only access
what about updates?
File attribute record structure
Updated by system:
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Updated by owner (user controlled):
Owner
File type
Access control list (ACL), e.g. for UNIX: rw-rw-r--
File system Modules
Directory module:
relates file names to file IDs
File module:
relates file IDs to particular files
Access control module:
checks permission for operation requested
File access module:
reads or writes file data or attributes
Block module:
accesses and allocates disk blocks
Device module:
disk I/O and buffering
Concentrate on higher levels.
Distributed file system requirements [1/4]
Facilities
support the sharing of persistent storage and information
enable user programs to access files without copying them to a local disk
Transparency (clients unaware of the distributed nature)
access transparency - client unaware of distribution of files, same interface
for local/remote files
location transparency - uniform file name space from any client workstation
mobility transparency - files can be moved from one server to another
without affecting client
performance transparency - client performance not affected by load on
service
scaling transparency - expansion possible if numbers of clients increase
Distributed file system requirements, ctd. [2/4]
Concurrent file updates
changes to a file by one client should not interfere with other clients using the same file
Isolation
File-level or record-level locking
Other forms of concurrency control to minimise contention
File replication
File service maintains multiple identical copies of files
Load-sharing between servers makes service more scalable
Local access has better response (lower latency)
Fault tolerance
Full replication is difficult to implement
Caching (of all or part of a file) gives most of the benefits (except fault
tolerance)
DISTRIBUTED FILE SYSTEMS
Naming and Transparency
Naming is the mapping between logical and physical objects.
Example: A user filename maps to <cylinder, sector>.
In a conventional file system, it's understood where the file actually resides; the
system and disk are known.
In a transparent DFS, the location of a file, somewhere in the network, is
hidden.
File replication means multiple copies of a file; mapping returns a SET of
locations for the replicas.
Location transparency: The name of a file does not reveal any hint of the file's physical storage location.
a) File name still denotes a specific, although hidden, set of physical disk blocks.
b) This is a convenient way to share data.
c) Can expose correspondence between component units and machines.
DISTRIBUTED FILE SYSTEMS
Naming and Transparency
Location independence: The name of a file doesn't need to be changed when the file's
physical storage location changes. Dynamic, one-to-many mapping.
Better file abstraction.
Promotes sharing the storage space itself.
Separates the naming hierarchy from the storage devices hierarchy.
Most DFSs today:
Support location transparent systems.
Do NOT support migration (automatic movement of a file from
machine to machine).
Files are permanently associated with specific disk blocks.
DISTRIBUTED FILE SYSTEMS
Naming and Transparency
The ANDREW DFS AS AN EXAMPLE:
Is location independent.
Supports file mobility.
Separation of FS and OS allows for disk-less systems. These have lower cost
and convenient system upgrades. The performance is not as good.
NAMING SCHEMES:
There are three main approaches to naming files:
1. Files are named with a combination of host and local name.
• This guarantees a unique name. Neither location transparent nor location independent.
• Same naming works on local and remote files. The DFS is a loose collection of independent file systems.
DISTRIBUTED FILE SYSTEMS
Naming and Transparency
NAMING SCHEMES:
2. Remote directories are mounted to local directories.
• So a local system seems to have a coherent directory structure.
• The remote directories must be explicitly mounted. The files are
location independent.
• SUN NFS is a good example of this technique.
3. A single global name structure spans all the files in the system.
• The DFS is built the same way as a local filesystem. Location
independent.
DISTRIBUTED FILE SYSTEMS
Naming and Transparency
IMPLEMENTATION TECHNIQUES:
Can map directories or larger aggregates rather than individual files.
A non-transparent mapping technique:
name ----> < system, disk, cylinder, sector >
A transparent mapping technique:
name ----> file_identifier ----> < system, disk, cylinder, sector >
So when changing the physical location of a file, only the mapping from the file identifier to its
location need be modified; the name is unaffected. This identifier must be "unique" in the universe.
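The transparent mapping technique can be sketched as two lookup tables. This is only an illustration: the table names, the identifier format and the location tuples are assumptions, not part of any particular DFS.

```python
# Hypothetical two-level mapping; every name and value here is illustrative.
name_to_id = {"/home/alice/notes.txt": "fid-0001"}
# file_identifier -> <system, disk, cylinder, sector>
id_to_location = {"fid-0001": ("serverA", "disk0", 120, 7)}


def resolve(name):
    """Translate a user-visible name to a physical location via its file identifier."""
    return id_to_location[name_to_id[name]]


def migrate(file_id, new_location):
    """Move a file: only the identifier-to-location entry is updated."""
    id_to_location[file_id] = new_location


before = resolve("/home/alice/notes.txt")
migrate("fid-0001", ("serverB", "disk2", 40, 3))
after = resolve("/home/alice/notes.txt")
```

The user-visible name never changes during the move, which is the property the slide calls location independence.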
File Service Design Options
Stateful
server holds information on open files, current position, file locks
open before access, close after access
better performance
shorter messages, read-ahead possible
server failure: state is lost
client failure: server tables fill up with stale entries
can provide file locks
File Service Design Options -ctd
Stateless
no state information held by server
file operations(idempotent) must contain all information needed
(longer message)
simpler file server design
can recover easily from client or server crash
locking requires extra lock server to hold state
File Service Architecture
[Figure: file service architecture, showing the client side and the file server side]
File Service Architecture
[Figure: file service architecture. Client computer: application programs and the client module. Server computer: directory service (Lookup, AddName, UnName, GetNames) and flat file service (Read, Write, Create, Delete, GetAttributes, SetAttributes).]
File Server Architecture -ctd
Components (for openness):
Flat file service
Flat file service: the operations below act on file contents
Files have unique file identifiers (UFIDs); the service translates UFIDs to file locations
Read(FileId, i, n) -> Data
— throws BadPosition
If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items
from a file starting at item i and returns it in Data.
Write(FileId, i, Data)
— throws BadPosition
If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a
file, starting at item i, extending the file if necessary.
Create() -> FileId
Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
Removes the file from the file store.
GetAttributes(FileId) -> Attr
Returns the file attributes for the file.
SetAttributes(FileId, Attr)
Sets the file attributes (only those attributes that clients are permitted to change).
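A minimal in-memory sketch of the flat file service operations listed above, assuming byte items, integer UFIDs and a Python exception class for BadPosition. The class and method names are illustrative only; a real service would issue unforgeable UFIDs and store data on disk.

```python
import itertools


class BadPosition(Exception):
    """Raised when position i lies outside the range allowed for the operation."""


class FlatFileService:
    def __init__(self):
        self._files = {}                   # UFID -> bytearray of file contents
        self._attrs = {}                   # UFID -> attribute dict
        self._next_id = itertools.count(1)

    def create(self):
        fid = next(self._next_id)          # stand-in for a real UFID
        self._files[fid] = bytearray()
        self._attrs[fid] = {}
        return fid

    def read(self, fid, i, n):
        data = self._files[fid]
        if not 1 <= i <= len(data):        # positions are 1-based, as on the slide
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])

    def write(self, fid, i, new_data):
        data = self._files[fid]
        if not 1 <= i <= len(data) + 1:    # i == length + 1 appends
            raise BadPosition(i)
        data[i - 1:i - 1 + len(new_data)] = new_data   # extends the file if needed

    def delete(self, fid):
        del self._files[fid]
        del self._attrs[fid]

    def get_attributes(self, fid):
        return dict(self._attrs[fid])

    def set_attributes(self, fid, attrs):
        self._attrs[fid].update(attrs)


svc = FlatFileService()
fid = svc.create()
svc.write(fid, 1, b"hello")
svc.write(fid, 4, b"XYZ")                  # overwrites "lo" and extends by one byte
contents = svc.read(fid, 1, 10)            # b"helXYZ"
```

Every call carries the UFID and the position, so the service itself needs no notion of an open file or a current offset.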
File Server Architecture -ctd
Directory service
mapping from text (file) names to UFIDs
Flat file service
Read(FileId, i, n) -> Data
Write(FileId, i, Data)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)
Directory service
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, FileId)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq
Client module
API for file access, one per client computer
holds states: open files, positions
knows network location of flat file & directory server
Flat file service RPC interface
Used by client modules, not user programs
FileId (UFID) uniquely identifies file
invalid if file not present or inappropriate access
Read/Write; Create/Delete; Get/SetAttributes
No open/close! (unlike UNIX)
access immediate with FileId
Read/Write identify starting point
Improved fault-tolerance
operations idempotent except Create, can be repeated (at-least-once RPC
semantics)
stateless service
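Why idempotent operations tolerate at-least-once RPC can be illustrated with a toy retry loop. The function names and the "lost reply" simulation below are assumptions made for this sketch; it is not a real RPC library.

```python
store = bytearray(b"abcdef")   # a file addressed by explicit position
log = bytearray()              # a file updated by an append-style operation


def positional_write(i, data):
    """Idempotent: repeating the same write at the same 1-based position changes nothing."""
    store[i - 1:i - 1 + len(data)] = data


def append(data):
    """Not idempotent: every repeat changes the state again."""
    log.extend(data)


def call_at_least_once(op, *args, lost_replies=1):
    """Simulate a client that re-sends a request each time the reply is lost."""
    for _ in range(lost_replies + 1):
        op(*args)              # the server executes every copy of the request


call_at_least_once(positional_write, 2, b"XY")
call_at_least_once(append, b"XY")
# store is b"aXYdef" (same as after one execution); log is b"XYXY" (duplicated)
```

Because Read and Write name their starting point explicitly, a repeated request leaves the file in the same state as a single one.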
Access control
In UNIX file system
access rights are checked against the access mode (read, write, execute) requested in open
user identity is checked at login time and cannot be tampered with in non-distributed implementations
In distributed (file) systems
Access rights must be checked at server
RPC unprotected
Forging identity possible, a security risk
user id typically passed with every request (e.g. Sun NFS)
stateless
Directory structure
Hierarchical
tree-like, pathnames from root
(in UNIX) several names per file (link operation)
Naming system
implemented by client module, using directory service
root has well-known UFID
locate file following path from root
[Figure: example tree: (root) / export / people / big, jon, bob, ...]
File names
Text name = directory pathname+file name
hostname:local name
not mobility transparent
uniform name structure (the same name space for all clients)
remote mount (e.g. Sun NFS)
remote directory inserted into local directory
relies on clients maintaining consistent naming conventions across all clients
all clients must implement same local tree
must mount remote directory into the same local directory
File names
Mount operation:
mount(remotehost, remotedirectory, localdirectory)
A server maintains a table of clients who have mounted file
systems at that server.
Each client maintains a table of mounted file systems holding:
< IP address, port number, file handle>
Hard versus soft mounts
Remote mount
[Figure: remote mount. Server 1 tree: (root) / export / people / big, jon, bob. Client tree: (root) / vmunix, usr; usr / students, x, staff. Server 2 tree: (root) / nfs / users / jim, ann, jane, joe. On the client, students and staff are remote mount points.]
Note: The file system mounted at /usr/students in the client
is actually the sub-tree located at /export/people in Server 1;
the file system mounted at /usr/staff in the client
is actually the sub-tree located at /nfs/users in Server 2.
server side: Server 1 exports /export/people; Server 2 exports /nfs/users
client side: mount -t nfs server1:/export/people /usr/students   (client sees /usr/students/jon, ...)
client side: mount -t nfs server2:/nfs/users /usr/staff   (client sees /usr/staff/jane, ...)
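The effect of the two mounts can be sketched as a client-side table that translates a local path into a server and a remote path. The table layout and the locate function are illustrative assumptions; a real NFS client stores file handles rather than path strings.

```python
# Client-side mount table: local mount point -> (server, remote directory).
mount_table = {
    "/usr/students": ("server1", "/export/people"),
    "/usr/staff": ("server2", "/nfs/users"),
}


def locate(path):
    """Return (server, remote path) for a mounted path, or (None, path) if it is local."""
    # Try the longest mount point first so that nested mounts resolve correctly.
    for mount_point in sorted(mount_table, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            server, remote_dir = mount_table[mount_point]
            return server, remote_dir + path[len(mount_point):]
    return None, path


student = locate("/usr/students/jon")   # ("server1", "/export/people/jon")
staff = locate("/usr/staff/jane")       # ("server2", "/nfs/users/jane")
local = locate("/vmunix")               # (None, "/vmunix")
```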
Directory service
Directory
conventional file (client of the flat file service)
mapping from text names to UFIDs
Operations
require FileId, machine readable UFID as parameter
locate file (LookUp)
add/delete file (AddName/UnName)
match file names to regular expression (GetNames)
Directory service operations
Lookup(Dir, Name) -> FileId
— throws NotFound
Locates the text name in the directory and returns the
relevant UFID. If Name is not in the directory, throws an
exception.
AddName(Dir, Name, File)
— throws NameDuplicate
If Name is not in the directory, adds (Name, File) to the
directory and updates the file’s attribute record.
If Name is already in the directory: throws an exception.
UnName(Dir, Name)
— throws NotFound
If Name is in the directory: the entry containing Name is
removed from the directory.
If Name is not in the directory: throws an exception.
GetNames(Dir, Pattern) -> NameSeq
Returns all the text names in the directory that match the
regular expression Pattern.
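A small sketch of these directory operations, together with the pathname walk a client module would perform from the root. The class, the make_dir helper and the integer UFIDs are assumptions for illustration; updating the file's attribute record in AddName is omitted.

```python
import re


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class DirectoryService:
    def __init__(self):
        self._dirs = {}                       # directory UFID -> {name: UFID}

    def make_dir(self, dir_id):               # hypothetical helper, not on the slides
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs[dir_id]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id

    def un_name(self, dir_id, name):
        if name not in self._dirs[dir_id]:
            raise NotFound(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern):
        return sorted(n for n in self._dirs[dir_id] if re.fullmatch(pattern, n))


def resolve(ds, root_id, path):
    """Walk a pathname from the well-known root UFID, one Lookup per component."""
    current = root_id
    for component in filter(None, path.split("/")):
        current = ds.lookup(current, component)
    return current


ds = DirectoryService()
for d in (1, 2, 3):                           # 1 = root, 2 = export, 3 = people
    ds.make_dir(d)
ds.add_name(1, "export", 2)
ds.add_name(2, "people", 3)
ds.add_name(3, "jon", 42)
ds.add_name(3, "bob", 43)
jon_id = resolve(ds, 1, "/export/people/jon")
j_names = ds.get_names(3, "j.*")
```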
File sharing
Multiple clients share the same file for read/write access.
One-copy update semantics
every read sees the effect of all previous writes
a write is immediately visible to clients who have the file open for reading
Problems!
caching: maintaining consistency between several copies difficult to achieve
serialize access by using file locks (affects performance )
trade-off between consistency and performance
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHING
Reduce network traffic by retaining recently accessed disk blocks in a
cache, so that repeated accesses to the same information can be handled
locally.
If required data is not already cached, a copy of data is brought from the
server to the user.
Perform accesses on the cached copy.
Files are identified with one master copy residing at the server machine;
copies of (parts of) the file are scattered in different caches.
Cache Consistency Problem -- Keeping the cached copies consistent with
the master file.
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHING
A remote service (RPC) has these characteristic steps:
a) The client makes a request for file access.
b) The request is passed to the server in message format.
c) The server makes the file access.
d) Return messages bring the result back to the client.
This is equivalent to performing a disk access for each request.
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHE LOCATION:
Caching is a mechanism for maintaining disk data on the local machine. This
data can be kept in the local memory or in the local disk. Caching can be
advantageous both for read ahead and read again.
The cost of getting data from a cache is a few HUNDRED instructions; disk
accesses cost THOUSANDS of instructions.
The master copy of a file doesn't move, but caches contain replicas of
portions of the file.
Caching behaves just like "networked virtual memory".
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHE LOCATION:
What should be cached? << blocks <---> files >>.
Bigger sizes give a better hit rate;
Smaller give better transfer times.
Caching on disk gives:
— Better reliability.
Caching in memory gives:
— The possibility of diskless work stations,
— Greater speed,
Since the server cache is in memory, it allows the use of only one mechanism.
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHE UPDATE POLICY:
A write-through cache has good reliability, but the user must wait for writes to
get to the server. Used by NFS.
Delayed write: write requests complete more rapidly. Data may be written
over the previous cache write, saving a remote write. Poor reliability on a
crash.
Flushing at some later time (for example, periodically) regulates the frequency of writes.
Write on close delays the write even longer.
Which would you use for a database file? For file editing?
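The trade-off between the first two policies can be made concrete by counting remote writes. The classes below are a toy model with assumed names; they ignore reads, eviction and crashes.

```python
class Server:
    def __init__(self):
        self.blocks = {}
        self.writes = 0                      # number of remote writes received

    def write(self, block_no, data):
        self.blocks[block_no] = data
        self.writes += 1


class WriteThroughCache:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, block_no, data):
        self.cache[block_no] = data
        self.server.write(block_no, data)    # the caller waits for the server every time


class DelayedWriteCache:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, block_no, data):
        self.cache[block_no] = data          # completes locally
        self.dirty.add(block_no)             # lost if the client crashes before flush

    def flush(self):
        for block_no in sorted(self.dirty):
            self.server.write(block_no, self.cache[block_no])
        self.dirty.clear()


s1, s2 = Server(), Server()
wt, dw = WriteThroughCache(s1), DelayedWriteCache(s2)
for version in (b"v1", b"v2", b"v3"):        # three successive writes to block 0
    wt.write(0, version)
    dw.write(0, version)
dw.flush()
# s1.writes == 3, s2.writes == 1; both servers end up holding b"v3"
```

The delayed-write client saves two remote writes here, at the price of a window in which the server copy is out of date.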
DISTRIBUTED FILE SYSTEMS
Example: NFS with Cachefs
DISTRIBUTED FILE SYSTEMS
Remote File Access
CACHE CONSISTENCY:
The basic issue is, how to determine that the client-cached data is
consistent with what's on the server.
Client-initiated approach: The client asks the server if the cached data is OK. What should be the
frequency of "asking"? On file open, at fixed time interval, ...?
Server-initiated approach: Possibilities:
A and B both have the same file open. When A closes the file, B "discards" its copy. Then B must start over.
The server is notified on every open. If a file is opened for writing, then disable caching by other clients for that file.
Get read/write permission for each block; then disable caching only for particular blocks.
DISTRIBUTED FILE SYSTEMS
Remote File Access
COMPARISON OF CACHING AND REMOTE SERVICE:
Many remote accesses can be handled by a local cache. There's a great deal of
locality of reference in file accesses. Servers can be accessed only occasionally
rather than for each access.
Caching causes data to be moved in a few big chunks rather than in many smaller
pieces; this leads to considerable efficiency for the network.
Cache consistency is the major problem with caching. When there are infrequent
writes, caching is a win. In environments with many writes, the work required to
maintain consistency overwhelms caching advantages.
Caching requires a whole separate mechanism to support acquiring and storage of
large amounts of data. Remote service merely does what's required for each call. As
such, caching introduces an extra layer and mechanism and is more complicated
than remote service.
DISTRIBUTED FILE SYSTEMS
Remote File Access
STATEFUL VS. STATELESS SERVICE:
Stateful: A server keeps track of information about client requests.
It maintains what files are opened by a client; connection
identifiers; server caches.
Memory must be reclaimed when client closes file or when client
dies.
Stateless: Each client request provides complete information needed by the
server (i.e., filename, file offset ).
The server can maintain information on behalf of the client, but it's
not required.
Useful things to keep include file info for the last N files touched.
DISTRIBUTED FILE SYSTEMS
Remote File Access
STATEFUL VS. STATELESS SERVICE:
Performance is better for stateful.
Don't need to parse the filename each time, or "open/close" file on
every request.
Stateful can have a read-ahead cache.
Fault Tolerance: A stateful server loses everything when it crashes.
Server must poll clients in order to renew its state.
Client crashes force the server to clean up its cached information.
Stateless remembers nothing so it can start easily after a crash.
DISTRIBUTED FILE SYSTEMS
Remote File Access
FILE REPLICATION:
Duplicating files on multiple machines improves availability and
performance.
Placed on failure-independent machines ( they won't fail together ).
Replication management should be "location-opaque".
The main problem is consistency - when one copy changes, how do other
copies reflect that change? Often there is a tradeoff: consistency versus
availability and performance.
Example:
"Demand replication" is like whole-file caching; reading a file causes it to
be cached locally. Updates are done only on the primary file at which time
all other copies are invalidated.
Atomic and serialized invalidation isn't guaranteed ( message could get
lost / machine could crash. )
Example: Sun NFS (1985)
An industry standard for file sharing on local networks since the
1980s
An open standard with clear and simple interfaces
Closely follows the abstract file service model defined above
Supports many of the design requirements already mentioned:
transparency
heterogeneity
efficiency
fault tolerance
Limited achievement of:
concurrency
replication
consistency
security
Example: Sun NFS (1985)
Structure of flat file, client & directory service
NFS protocol
RPC based, OS independent (originally UNIX)
NFS server
stateless (no open/close)
no locks or concurrency control
no replication with updates
Virtual file system, Remote mount
Access control (user id with each request)
security loophole
RPC requests can be modified to impersonate users
Client and Server caching
Sun NFS architecture
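[Figure: Sun NFS architecture. Client computer: application programs issue UNIX system calls to the UNIX kernel, where the virtual file system passes local operations to the UNIX file system (or another file system) and operations on remote files to the NFS client. Server computer: the virtual file system in the UNIX kernel sits above the NFS server and the UNIX file system. NFS client and NFS server communicate using the NFS protocol.]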
File identifier (FileId)
Simple solution
i-node (number identifying file within file system)
file migration requires finding and changing all FileIds
UNIX reuses i-node numbers after a file is deleted (hence the i-node generation number)
FileId = <server address (IP address.socket), index (i-node number)>
NFS file handle
Virtual file system uses i-node if local, file handle (fh) if remote.
fh = <filesystem identifier, i-node number, i-node generation number>
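The role of the generation number can be sketched with a toy server that reuses i-node numbers. The class and exception names are assumptions for illustration; real NFS file handles are opaque to the client.

```python
from collections import namedtuple

FileHandle = namedtuple("FileHandle", "fs_id inode_no generation")


class StaleHandle(Exception):
    pass


class ToyServer:
    def __init__(self, fs_id):
        self.fs_id = fs_id
        self.generation = {}                 # i-node number -> current generation number
        self.in_use = set()

    def allocate(self, inode_no):
        """Create a file in this i-node slot, bumping its generation number."""
        self.generation[inode_no] = self.generation.get(inode_no, 0) + 1
        self.in_use.add(inode_no)
        return FileHandle(self.fs_id, inode_no, self.generation[inode_no])

    def remove(self, inode_no):
        self.in_use.discard(inode_no)

    def check(self, fh):
        """Accept a handle only if it names the file currently occupying the i-node."""
        if fh.inode_no not in self.in_use or fh.generation != self.generation[fh.inode_no]:
            raise StaleHandle(fh)
        return True


server = ToyServer(fs_id=1)
old = server.allocate(7)
server.remove(7)
new = server.allocate(7)                     # same i-node number, new generation
```

A client still holding the old handle is rejected instead of silently reading the new file that reused i-node 7.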
NFS Server Operations (simplified)
• read(fh, offset, count) -> attr, data
• write(fh, offset, count, data) -> attr
• create(dirfh, name, attr) -> newfh, attr
• remove(dirfh, name) -> status
• getattr(fh) -> attr
• setattr(fh, attr) -> attr
• lookup(dirfh, name) -> fh, attr
• rename(dirfh, name, todirfh, toname)
• link(newdirfh, newname, dirfh, name)
• readdir(dirfh, cookie, count) -> entries
• symlink(newdirfh, newname, string) -> status
• readlink(fh) -> string
• mkdir(dirfh, name, attr) -> newfh, attr
• rmdir(dirfh, name) -> status
• statfs(fh) -> fsstats
fh = file handle: <filesystem identifier, i-node number, i-node generation no.>
The i-node contains information about the file.
Caching in NFS
Indispensable for performance
Caching
retains recently used data (file pages, directories, file attributes) in cache
updates data in cache for speed
block size typically 8 Kbytes
Server caching
cache in server memory (UNIX kernel)
Client caching
cache in client memory, local disk
Server caching
Store data in server memory
Read-ahead: anticipate which pages to read
Delayed write
update in cache; write to disk periodically (UNIX sync to synchronize cache)
or when space needed
which contents users see depends on timing
Write through
cache and write to disk (reliable, poor performance), whenever updated
Write on close
write to disk only when commit received (fast but problems with files open
for a long time)
Client caching
Potential consistency problems!
different versions, portions of files, … since writes delayed
clients poll server to check if copy still valid
Timestamp method
tag each cached item with the time of the latest validity check and the modification time
copy is valid if the time since the last check is less than the freshness interval, or if the
modification time on the server is unchanged
freshness interval chosen adaptively: 3 to 30 s for files, 30 to 60 s for directories
a small freshness interval can put a heavy load on the network
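The timestamp validity rule can be written as a short predicate. This is a sketch under assumptions: the parameter names are invented here, and get_tm_server stands in for an attribute request to the server.

```python
def cache_entry_valid(now, last_check, freshness_interval, tm_client, get_tm_server):
    """Return True if the cached copy may be used.

    get_tm_server is a callable standing in for a request to the server for the
    file's modification time; it is only called once the freshness interval has
    expired, which is what keeps network traffic down.
    """
    if now - last_check < freshness_interval:
        return True                          # validated recently: no server contact
    return tm_client == get_tm_server()      # otherwise compare modification times


server_calls = []


def fake_getattr():
    server_calls.append(1)
    return 100                               # modification time recorded at the server


fresh = cache_entry_valid(12, 10, 3, 100, fake_getattr)             # within the interval
expired_unchanged = cache_entry_valid(20, 10, 3, 100, fake_getattr) # server copy unchanged
expired_changed = cache_entry_valid(20, 10, 3, 90, fake_getattr)    # server copy is newer
```

Only the two expired checks contact the server; within the freshness interval a stale copy can still be used, which is why recent updates are not always visible.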
Client caching ctd
Reads
perform validity check whenever a cache entry is used
if not valid, request data from server
several optimizations to reduce traffic
Recent updates not always visible (timing!)
Writes
when page modified, marked as dirty
dirty pages flushed asynchronously: periodically (client's sync) and on
close
Not truly one-copy update semantics...
NFS summary
Transparency
Access transparency
providing application programming interface(= local system interface)
Location transparency
supporting a single network file name space
Mobility transparency
migration transparency
Scalability
handles very large real-world loads efficiently
File replication
NFS: read-only replicas only
no support for file replication with updates
Hardware and operating system heterogeneity
Fault tolerance
Example: Andrew File System(AFS)
Overview
A distributed computing environment (Andrew) under
development since 1983 at Carnegie-Mellon University,
purchased by IBM and released as Transarc DFS, now open
sourced as OpenAFS.
Information sharing on a large scale via transparency
NFS compatible(called NSF-2)
File reference by NFS-style file handle
DISTRIBUTED FILE SYSTEMS
AFS tries to solve complex issues such as uniform name space, location-independent file sharing,
client-side caching (with cache consistency), secure authentication (via Kerberos)
Also includes server-side caching (via replicas), high availability
Can span 5,000 workstations
Scalable
Whole-file serving (> 64kbytes)
Whole-file caching (on local client disk, 100s of recently used files)
Characteristics of AFS
local-cached copy
providing sufficient cache storage
UNIX based on file size and referencing locality
AFS Software architecture
[Figure: AFS software architecture. Workstations (clients) each run user programs and a Venus process above the UNIX kernel; servers each run a Vice process above the UNIX kernel; workstations and servers are connected by the network.]
Two software components
Vice (user-level UNIX process running in each server; the server module)
Venus (user-level process running in each client; the client module)
DISTRIBUTED FILE SYSTEMS
Andrew File System
SHARED NAME SPACE:
The server file space is divided into volumes. Volumes contain files of only
one user. It's these volumes that are the level of granularity attached to a
client.
A vice file can be accessed using a fid = <volume number, vnode >. The
fid doesn't depend on machine location. A client queries a volume-location
database for this information.
Volumes can migrate between servers to balance space and utilization. Old
server has "forwarding" instructions and handles client updates during
migration.
Read-only volumes ( system files, etc. ) can be replicated. The volume
database knows how to find these.
DISTRIBUTED FILE SYSTEMS
Andrew File System
FILE OPERATIONS AND CONSISTENCY SEMANTICS:
Andrew caches entire files from servers
A client workstation interacts with Vice servers only during opening and
closing of files
Venus – caches files from Vice when they are opened, and stores modified
copies of files back when they are closed
Reading and writing bytes of a file are done by the kernel without Venus
intervention on the cached copy
Venus caches contents of directories and symbolic links, for path-name
translation
Exceptions to the caching policy are modifications to directories, which are
made directly on the server responsible for that directory
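The open/close behaviour described above can be sketched with two toy classes. The names mirror Vice and Venus, but the methods are assumptions for illustration, and the sketch leaves out cache validation, directories and pathname translation.

```python
class Vice:
    """Toy server holding the master copy of each file."""

    def __init__(self):
        self.files = {}
        self.fetches = 0
        self.stores = 0

    def fetch(self, name):
        self.fetches += 1
        return self.files[name]

    def store(self, name, data):
        self.stores += 1
        self.files[name] = data


class Venus:
    """Toy client cache manager: contacts Vice only on open and close."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}                      # name -> bytearray (stands in for local disk)
        self.modified = set()

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = bytearray(self.vice.fetch(name))   # whole file
        return name

    def read(self, name, offset, count):
        return bytes(self.cache[name][offset:offset + count])     # local only

    def write(self, name, offset, data):
        self.cache[name][offset:offset + len(data)] = data        # local only
        self.modified.add(name)

    def close(self, name):
        if name in self.modified:
            self.vice.store(name, bytes(self.cache[name]))        # store back on close
            self.modified.discard(name)


vice = Vice()
vice.files["f"] = b"hello world"
venus = Venus(vice)
venus.open("f")
venus.write("f", 0, b"J")
local_view = venus.read("f", 0, 5)           # b"Jello"
server_before_close = vice.files["f"]        # still b"hello world"
venus.close("f")
server_after_close = vice.files["f"]         # b"Jello world"
```

Reads and writes between open and close never reach the server, so other clients see the update only after the file is closed.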
DISTRIBUTED FILE SYSTEMS
Andrew File System
Clients have a partitioned space of file names:
a local name space and a shared name space
Dedicated servers, called Vice, present the shared name space to the
clients as a homogeneous, identical, and location-transparent file
hierarchy
Workstations run the Virtue protocol to communicate with Vice, and are
required to have local disks where they store their local name space
Servers collectively are responsible for the storage and management of the
shared name space
DISTRIBUTED FILE SYSTEMS
Andrew File System
Clients and servers are structured in clusters interconnected by a backbone
LAN
A cluster consists of a collection of workstations and a cluster server and is
connected to the backbone by a router
A key mechanism selected for remote file operations is whole file caching
Opening a file causes it to be cached, in its entirety, on the local disk
DISTRIBUTED FILE SYSTEMS
Andrew File System
IMPLEMENTATION – Flow of a request:
Deflection of open/close:
The client kernel is modified to detect references to Vice files.
The request is forwarded to Venus with these steps:
Venus does pathname translation.
Asks Vice for the file
Moves the file to local disk
Passes inode of file back to client kernel.
Venus maintains caches for status ( in memory ) and data ( on local disk.)
A server user-level process handles client requests.
A lightweight process handles concurrent RPC requests from clients.
State information is cached in this process.
Susceptible to reliability problems.
New developments -ctd
AFS enhancements
DCE/DFS standard: adopts an approach to callbacks similar to that of Spritely NFS and NQNFS
Improvements in storage organization
Redundant Array of Inexpensive Disks (RAID)
Log-structured file storage (LFS)
New design approaches (UC Berkeley)
xFS (serverless network architecture, file serving responsibility distributed
across LAN)
Frangipani (highly scalable distributed file system, Digital Systems Research Center,
1997)
Summary
File service
crucial to the running of a distributed system
performance, consistency and easy recovery essential
Design issues
separate flat file service from directory service and client module
stateless for performance and fault-tolerance
caching for performance
concurrent updates difficult with caching
approximation of one-copy update semantics
Case studies
SUN-NFS
AFS
Recent advances