
Teaching material based on Distributed Systems: Concepts and Design, Edition 3, Addison-Wesley 2001.
Copyright © George Coulouris, Jean Dollimore, Tim Kindberg 2001
email: [email protected]
This material is made available for private study and for direct use by individual teachers. It may not be included in any product or employed in any service without the written permission of the authors.
Viewing: These slides must be viewed in slide show mode.
Distributed File Systems (DFS)
Updated by Rajkumar Buyya

Chapter 8:
8.1 Introduction
8.2 File service architecture
8.3 Sun Network File System (NFS)
[8.4 Andrew File System (personal study)]
8.5 Recent advances
8.6 Summary
Learning objectives
 Understand the requirements that affect the design of distributed services
 NFS: understand how a relatively simple, widely-used service is designed
– Obtain a knowledge of file systems, both local and networked
– Caching as an essential design technique
– Remote interfaces are not the same as APIs
– Security requires special consideration
 Recent advances: appreciate the ongoing research that often leads to major advances
2
Introduction
 Why do we need a DFS?
– Primary purpose of a Distributed System: connecting users and resources
– Resources…
 … can be inherently distributed
 … can actually be data (files, databases, …) and…
 … their availability becomes a crucial issue for the performance of a Distributed System
Introduction
 A case for DFS
[Cartoon: several users clamour for storage — "I want to store my thesis on the server!", "I need to have my book always available", "I need storage for my reports", "I need to store my analysis and reports safely" — and the administrator concludes that perhaps the time has come to buy a rack of servers (Server A).]
Introduction
 A case for DFS
[Cartoon: with Servers A, B and C in place, users can store many more documents but no longer remember which server holds their files ("Hey… but where did I put my docs?", "I am not sure whether server A, or B, or C…"), and the administrator wonders whether a DFS is needed.]
Introduction
 A case for DFS
[Cartoon: a Distributed File System now spans Servers A, B and C. It is reliable, fault tolerant, highly available and location transparent; users can access their folders from anywhere, without having to remember which server the data is stored on.]
Storage systems and their properties
 In the first generation of distributed systems (1974-1995), file systems (e.g. NFS) were the only networked storage systems.
 With the advent of distributed object systems (CORBA, Java) and the web (1995-2007), the picture has become more complex.
 The current focus (2007-now) is on large-scale, scalable storage.
– Google File System
– Amazon S3 (Simple Storage Service)
– Cloud Storage (e.g., DropBox)
7
Storage systems and their properties
Figure 8.1 Storage systems and their properties
Types of consistency between copies: 1 - strict one-copy consistency; √ - approximate consistency; X - no automatic consistency

                                     Sharing  Persis-  Distributed     Consistency  Example
                                              tence    cache/replicas  maintenance
Main memory                          No       No       No              1            RAM
File system                          No       Yes      No              1            UNIX file system
Distributed file system              Yes      Yes      Yes             √            Sun NFS
Web                                  Yes      Yes      Yes             X            Web server
Distributed shared memory            Yes      No       Yes             √            Ivy (Ch. 16)
Remote objects (RMI/ORB)             Yes      No       No              1            CORBA
Persistent object store              Yes      Yes      No              1            CORBA Persistent Object Service
Persistent distributed object store  Yes      Yes      Yes             √            PerDiS, Khazana
8
What is a file system? (1)
 Persistent stored data sets
 Hierarchic name space visible to all processes
 API with the following characteristics:
– access and update operations on persistently stored data sets
– Sequential access model (with additional random facilities)
 Sharing of data between users, with access control
 Concurrent access:
– certainly for read-only access
– what about updates?
 Other features:
– mountable file stores
– more? ...
9
*
What is a file system? (2)
Figure 8.4 UNIX file system operations

filedes = open(name, mode)            Opens an existing file with the given name.
filedes = creat(name, mode)           Creates a new file with the given name.
                                      Both operations deliver a file descriptor referencing the open
                                      file. The mode is read, write or both.
status = close(filedes)               Closes the open file filedes.
count = read(filedes, buffer, n)      Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n)     Transfers n bytes to the file referenced by filedes from buffer.
                                      Both operations deliver the number of bytes actually transferred
                                      and advance the read-write pointer.
pos = lseek(filedes, offset, whence)  Moves the read-write pointer to offset (relative or absolute,
                                      depending on whence).
status = unlink(name)                 Removes the file name from the directory structure. If the file
                                      has no other names, it is deleted.
status = link(name1, name2)           Adds a new name (name2) for a file (name1).
status = stat(name, buffer)           Gets the file attributes for file name into buffer.

Class Exercise A
Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4.

copyfile(char * oldfile, char * newfile)
{
    <you write this part, using open(), creat(), read(), write()>
}

Note: remember that read() returns 0 when you attempt to read beyond the end of the file.
10
*
What is a file system? (3)
(a typical module structure for the implementation of a non-DFS)
Figure 8.2 File system modules
Directory module:      relates file names to file IDs
File module:           relates file IDs to particular files
Access control module: checks permission for the operation requested
File access module:    reads or writes file data or attributes
Block module:          accesses and allocates disk blocks
Device module:         disk I/O and buffering
(The modules deal, from top to bottom, with directories, files, blocks and the device.)
11
What is a file system? (4)
Figure 8.3 File attribute record structure
updated by system:
    File length
    Creation timestamp
    Read timestamp
    Write timestamp
    Attribute timestamp
    Reference count
updated by owner:
    Owner
    File type
    Access control list (e.g. for UNIX: rw-rw-r--)
(A C struct sketch of this record follows.)
12
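As a rough illustration, the attribute record might be represented in C along the following lines; the field names and types are illustrative assumptions, not prescribed by the book:

#include <stdint.h>
#include <time.h>

struct file_attributes {
    /* maintained by the system: */
    uint64_t length;              /* file length in bytes                    */
    time_t   creation_time;
    time_t   read_time;
    time_t   write_time;
    time_t   attribute_time;
    uint32_t reference_count;     /* number of names referring to the file   */
    /* set by the owner: */
    uint32_t owner;
    uint32_t file_type;           /* e.g. regular file or directory          */
    char     access_control[16];  /* e.g. UNIX "rw-rw-r--" permission string */
};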
Distributed file system/service requirements
 Transparency
 Concurrency
 Replication
 Heterogeneity
 Fault tolerance
 Consistency
 Security
 Efficiency
The file service is the most heavily loaded service in an intranet, so its functionality and performance are critical.

Transparency properties
– Access: same operations (client programs are unaware of the distribution of files)
– Location: same name space after relocation of files or processes (client programs should see a uniform file name space)
– Mobility: automatic relocation of files is possible (neither client programs nor system admin tables in client nodes need to be changed when files are moved)
– Performance: satisfactory performance across a specified range of system loads
– Scaling: the service can be expanded to meet additional loads or growth

Concurrency properties
– Changes to a file by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file
– Isolation
– File-level or record-level locking
– Other forms of concurrency control to minimise contention

Replication properties
– The file service maintains multiple identical copies of files
• Load-sharing between servers makes the service more scalable
• Local access has better response (lower latency)
• Fault tolerance
– Full replication is difficult to implement
– Caching (of all or part of a file) gives most of the benefits (except fault tolerance)

Heterogeneity properties
– The service can be accessed by clients running on (almost) any OS or hardware platform
– The design must be compatible with the file systems of different OSes
– Service interfaces must be open - precise specifications of APIs are published

Fault tolerance
– The service must continue to operate even when clients make errors or crash
– The service must resume after a server machine crashes
– If the service is replicated, it can continue to operate even during a server crash

Consistency
– Unix offers one-copy update semantics for operations on local files - caching is completely transparent
– It is difficult to achieve the same for distributed file systems while maintaining good performance and scalability

Security
– Must maintain access control and privacy as for local files
• based on the identity of the user making the request
• identities of remote users must be authenticated
• privacy requires secure communication
– Service interfaces are open to all processes not excluded by a firewall
• vulnerable to impersonation and other attacks

Efficiency
– The goal for distributed file systems is usually performance comparable to the local file system
13
*
8.2 File Service Architecture
 An architecture that offers a clear separation of the main concerns in providing access to files is obtained by structuring the file service as three components:
– A flat file service
– A directory service
– A client module
 The relevant modules and their relationships are shown next.
 The client module implements the interfaces exported by the flat file and directory services on the server side.
14
Model file service architecture
Figure 8.5 File service architecture
[Client computer: application programs call the client module. Server computer: the directory service (Lookup, AddName, UnName, GetNames) and the flat file service (Read, Write, Create, Delete, GetAttributes, SetAttributes).]
15
Responsibilities of various modules
 Flat file service:
– Concerned with the implementation of operations on the contents of files. Unique File Identifiers (UFIDs) are used to refer to files in all requests for flat file service operations. UFIDs are long sequences of bits chosen so that each file has a UFID that is unique among all of the files in a distributed system.
 Directory service:
– Provides a mapping between text names for files and their UFIDs. Clients may obtain the UFID of a file by quoting its text name to the directory service. The directory service supports the functions needed to generate directories and to add new files to directories.
 Client module:
– Runs on each computer and provides an integrated service (flat file and directory) as a single API to application programs. For example, in UNIX hosts, a client module emulates the full set of UNIX file operations.
– It holds information about the network locations of the flat file and directory server processes, and achieves better performance through a cache of recently used file blocks at the client (a sketch of such a cache follows).
16
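As an illustration of the last point, a client-side cache of recently used file blocks might be keyed by (UFID, block number); the fixed-size, direct-mapped layout below is purely illustrative, not part of the model service:

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE  8192
#define CACHE_SLOTS 128

struct cached_block {
    bool     valid;
    uint64_t ufid;        /* unique file identifier       */
    uint64_t block_no;    /* block number within the file */
    char     data[BLOCK_SIZE];
};

static struct cached_block cache[CACHE_SLOTS];

/* Return the cached block, or NULL on a miss (in which case the client
   module would Read() the block from the flat file service). */
char *cache_lookup(uint64_t ufid, uint64_t block_no)
{
    struct cached_block *slot = &cache[(ufid ^ block_no) % CACHE_SLOTS];
    if (slot->valid && slot->ufid == ufid && slot->block_no == block_no)
        return slot->data;
    return NULL;
}

void cache_insert(uint64_t ufid, uint64_t block_no, const char *data)
{
    struct cached_block *slot = &cache[(ufid ^ block_no) % CACHE_SLOTS];
    slot->valid = true;
    slot->ufid = ufid;
    slot->block_no = block_no;
    memcpy(slot->data, data, BLOCK_SIZE);
}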
Server operations/interfaces for the model file service
Figures 8.6 and 8.7

Flat file service
Read(FileId, i, n) -> Data      (i = position of first byte)
Write(FileId, i, Data)          (i = position of first byte)
Create() -> FileId
Delete(FileId)
GetAttributes(FileId) -> Attr
SetAttributes(FileId, Attr)

Directory service
Lookup(Dir, Name) -> FileId
AddName(Dir, Name, File)
UnName(Dir, Name)
GetNames(Dir, Pattern) -> NameSeq

FileId: a unique identifier for files anywhere in the network. Similar to the remote object references described in Section 4.3.3.
Pathname lookup: pathnames such as '/usr/bin/tar' are resolved by iterative calls to Lookup(), one call for each component of the path, starting with the ID of the root directory '/', which is known in every client (see the sketch after this slide).
17
*
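The iterative resolution described above can be sketched in C. Lookup() here is a dummy stand-in for the remote Lookup(Dir, Name) -> FileId invocation, and FileId and ROOT_DIR_ID are illustrative assumptions rather than part of the model interface:

#include <stdio.h>
#include <string.h>

typedef long FileId;                 /* stand-in for a UFID               */
#define ROOT_DIR_ID ((FileId)0)      /* ID of '/', known in every client  */

/* Dummy stand-in: a real client module would perform the remote
   Lookup(Dir, Name) -> FileId invocation here. */
static FileId Lookup(FileId dir, const char *name)
{
    printf("Lookup(%ld, \"%s\")\n", (long)dir, name);
    return dir + 1;                  /* fake FileId for illustration */
}

/* Resolve a pathname such as "/usr/bin/tar" with one Lookup() call per
   path component, starting from the root directory. */
static FileId resolve(const char *pathname)
{
    char path[256];
    strncpy(path, pathname, sizeof path - 1);
    path[sizeof path - 1] = '\0';

    FileId current = ROOT_DIR_ID;
    for (char *component = strtok(path, "/"); component != NULL;
         component = strtok(NULL, "/"))
        current = Lookup(current, component);   /* one remote call per step */
    return current;
}

int main(void)
{
    resolve("/usr/bin/tar");
    return 0;
}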
File Group
 A collection of files that can be located on any server or moved between servers while maintaining the same names.
– Similar to a UNIX filesystem
– Helps with distributing the load of file serving between several servers.
– File groups have identifiers which are unique throughout the system (and hence, for an open system, they must be globally unique).
 Used to refer to file groups and files.
To construct a globally unique ID we use some unique attribute of the machine on which it is created, e.g. IP number, even though the file group may move subsequently.
File Group ID layout: 32-bit IP address | 16-bit date (a packing sketch follows).
18
*
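A minimal sketch of packing the two fields into a single identifier; the 64-bit packing and the interpretation of the date field are assumptions for illustration:

#include <stdint.h>
#include <stdio.h>

/* Pack the creating machine's 32-bit IPv4 address and a 16-bit date field
   into one identifier (high 32 bits: IP address, low 16 bits: date). */
static uint64_t make_file_group_id(uint32_t ipv4_address, uint16_t date)
{
    return ((uint64_t)ipv4_address << 16) | date;
}

int main(void)
{
    uint32_t ip = (192u << 24) | (168u << 16) | (1u << 8) | 7u;  /* 192.168.1.7 */
    uint16_t date = 12345;                     /* e.g. days since some epoch   */
    printf("file group id: 0x%012llx\n",
           (unsigned long long)make_file_group_id(ip, date));
    return 0;
}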
DFS: Case Studies
 NFS (Network File System)
– Developed by Sun Microsystems (in 1985)
– Most popular, open, and widely used.
– NFS protocol standardised through IETF (RFC 1813)
 AFS (Andrew File System)
– Developed by Carnegie Mellon University as part of Andrew
distributed computing environments (in 1986)
– A research project to create campus wide file system.
– Public domain implementation is available on Linux (LinuxAFS)
– It was adopted as a basis for the DCE/DFS file system in the Open Software Foundation (OSF, www.opengroup.org) DCE (Distributed Computing Environment)
19
Case Study: Sun NFS
 An industry standard for file sharing on local networks since the 1980s
 An open standard with clear and simple interfaces
 Closely follows the abstract file service model defined above
 Supports many of the design requirements already mentioned:
– transparency
– heterogeneity
– efficiency
– fault tolerance
 Limited achievement of:
– concurrency
– replication
– consistency
– security
20
*
NFS - History
 1985: Original version (in-house use)
 1989: NFSv2 (RFC 1094)
– Operated entirely over UDP
– Stateless protocol (the core)
– Support for 2GB files
 1995: NFSv3 (RFC 1813)
– Support for 64 bit (> 2GB files)
– Support for asynchronous writes
– Support for TCP
– Support for additional attributes
– Other improvements
 2000-2003: NFSv4 (RFC 3010, RFC 3530)
– Collaboration with IETF
– Sun hands over the development of NFS
 2010: NFSv4.1
– Adds Parallel NFS (pNFS) for parallel data access
NFS architecture
Figure 8.8 NFS architecture
[Client computer: application programs issue UNIX system calls to the UNIX kernel; the virtual file system routes operations on local files to the UNIX file system (or another local file system) and operations on remote files to the NFS client. Server computer: the NFS server sits below the server's virtual file system and uses the local UNIX file system. NFS client and NFS server communicate via the NFS protocol (remote operations).]
22
*
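A rough sketch of the virtual file system idea in Figure 8.8: each open file is represented by a v-node that records which file system implementation (local UNIX file system or NFS client) owns it, and operations are dispatched through it. The names and types below are illustrative, not the real kernel interfaces:

#include <stdio.h>
#include <string.h>

struct vnode;   /* forward declaration */

struct vnode_ops {
    long (*read)(struct vnode *v, char *buf, long offset, long count);
};

struct vnode {
    const struct vnode_ops *ops;  /* UNIX file system ops or NFS client ops */
    void *fs_specific;            /* local i-node, or remote file handle    */
};

/* A system call such as read() reaches the VFS layer, which forwards the
   request to whichever file system implementation owns the v-node. */
static long vfs_read(struct vnode *v, char *buf, long offset, long count)
{
    return v->ops->read(v, buf, offset, count);
}

/* Dummy "local file system" read, so the sketch can be run. */
static long local_read(struct vnode *v, char *buf, long offset, long count)
{
    (void)v; (void)offset;
    strncpy(buf, "local data", count);
    return (long)strlen(buf);
}

int main(void)
{
    const struct vnode_ops local_ops = { local_read };
    struct vnode v = { &local_ops, NULL };
    char buf[32];
    long n = vfs_read(&v, buf, 0, sizeof buf);
    printf("read %ld bytes: %s\n", n, buf);
    return 0;
}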
NFS architecture: does the implementation have to be in the system kernel?
No:
– there are examples of NFS clients and servers that run at application level as libraries or processes (e.g. early Windows and MacOS implementations, current PocketPC, etc.)
But, for a Unix implementation there are advantages:
– Binary code compatible - no need to recompile applications
 Standard system calls that access remote files can be routed through the NFS client module by the kernel
– Shared cache of recently-used blocks at the client
– A kernel-level server can access i-nodes and file blocks directly
 but a privileged (root) application program could do almost the same.
– Security of the encryption key used for authentication.
23
*
NFS server operations (simplified)
Figure 8.9 NFS server operations (simplified), alongside the corresponding model file service operations

fh = file handle: filesystem identifier + i-node number + i-node generation number (see the struct sketch after this slide)

• lookup(dirfh, name) -> fh, attr           (Model: Lookup(Dir, Name) -> FileId)
• create(dirfh, name, attr) -> newfh, attr  (Model: Create() -> FileId; AddName(Dir, Name, File))
• remove(dirfh, name) -> status             (Model: Delete(FileId); UnName(Dir, Name))
• getattr(fh) -> attr                       (Model: GetAttributes(FileId) -> Attr)
• setattr(fh, attr) -> attr                 (Model: SetAttributes(FileId, Attr))
• read(fh, offset, count) -> attr, data     (Model: Read(FileId, i, n) -> Data)
• write(fh, offset, count, data) -> attr    (Model: Write(FileId, i, Data))
• rename(dirfh, name, todirfh, toname) -> status
• link(newdirfh, newname, dirfh, name) -> status
• symlink(newdirfh, newname, string) -> status
• readlink(fh) -> string
• readdir(dirfh, cookie, count) -> entries  (Model: GetNames(Dir, Pattern) -> NameSeq)
• mkdir(dirfh, name, attr) -> newfh, attr
• rmdir(dirfh, name) -> status
• statfs(fh) -> fsstats
24
*
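The file handle contents mentioned above might be pictured as the following struct; the real handle is opaque to clients, so this layout is purely illustrative:

#include <stdint.h>

struct nfs_file_handle {
    uint32_t filesystem_id;      /* identifies the exported file system     */
    uint32_t inode_number;       /* i-node number of the file on the server */
    uint32_t inode_generation;   /* generation number: detects i-node reuse */
};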
NFS access control and authentication
 Stateless server, so the user's identity and access rights must
be checked by the server on each request.
– In the local file system they are checked only on open()
 Every client request is accompanied by the userID and groupID
– not shown in the Figure 8.9 because they are inserted by the RPC system
 Server is exposed to imposter attacks unless the userID and
groupID are protected by encryption
 Kerberos has been integrated with NFS to provide a stronger
and more comprehensive security solution
– Kerberos is described in Chapter 7.
25
*
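For illustration, the UNIX-style credentials carried on each request by the RPC system (the AUTH_UNIX/AUTH_SYS flavour in ONC RPC) contain roughly the following fields; the names below are a simplified sketch, not the exact wire format:

#include <stdint.h>

#define MAX_GROUPS 16

struct rpc_unix_credentials {
    uint32_t timestamp;            /* time the credential was created */
    char     machine_name[64];     /* client host name                */
    uint32_t uid;                  /* user ID making the request      */
    uint32_t gid;                  /* primary group ID                */
    uint32_t num_groups;           /* number of supplementary groups  */
    uint32_t groups[MAX_GROUPS];   /* supplementary group IDs         */
};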
Architecture Components (UNIX / Linux)
 Server:
– nfsd: NFS server daemon that services requests from clients.
– mountd: NFS mount daemon that carries out the mount request
passed on by nfsd.
– rpcbind: RPC port mapper used to locate the nfsd daemon.
– /etc/exports: configuration file that defines which portions of the file
systems are exported through NFS and how.
 Client:
– mount: standard file system mount command.
– /etc/fstab: file system table file.
– nfsiod: (optional) local asynchronous NFS I/O server.
Mount service
 Mount operation:
mount(remotehost, remotedirectory, localdirectory)
 Server maintains a table of clients who have
mounted filesystems at that server
 Each client maintains a table of mounted file
systems holding:
< IP address, port number, file handle>
 Hard versus soft mounts
27
*
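One entry of the client's table of mounted remote file systems might look roughly like this; the layout and the extra mount-point field are illustrative assumptions:

#include <stdint.h>

struct nfs_file_handle;   /* filesystem id, i-node number, generation (as sketched earlier) */

struct client_mount_entry {
    uint32_t server_ip;                     /* IPv4 address of the NFS server      */
    uint16_t server_port;                   /* port of the server's NFS service    */
    const struct nfs_file_handle *root_fh;  /* file handle of the mounted sub-tree */
    char     local_mount_point[256];        /* e.g. "/usr/students"                */
};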
Local and remote file systems accessible on an NFS client
Figure 8.10 Local and remote file systems accessible on an NFS client
[Server 1's sub-tree /export/people (big, jon, bob) and Server 2's sub-tree /nfs/users (jim, ann, jane, joe) are remote-mounted into the client's name space under /usr/students and /usr/staff respectively.]
Note: The file system mounted at /usr/students in the client is actually the sub-tree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the sub-tree located at /nfs/users in Server 2.
28
*
Automounter
The NFS client catches attempts to access 'empty' mount points and routes them to the Automounter
– The Automounter has a table of mount points and multiple candidate servers for each
– it sends a probe message to each candidate server and then uses the mount service to mount the filesystem at the first server to respond
 Keeps the mount table small
 Provides a simple form of replication for read-only filesystems
– E.g. if there are several servers with identical copies of /usr/lib then each server will have a chance of being mounted at some clients.
29
*
Kerberized NFS
 Kerberos protocol is too costly to apply on each file access
request
 Kerberos is used in the mount service:
– to authenticate the user's identity
– User's UserID and GroupID are stored at the server with the client's IP address
 For each file request:
– The UserID and GroupID sent must match those stored at the server
– IP addresses must also match
 This approach has some problems
– can't accommodate multiple users sharing the same client computer
– all remote filestores must be mounted each time a user logs in
30
*
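The per-request check described above can be sketched as follows; the types and names are illustrative:

#include <stdbool.h>
#include <stdint.h>

struct mount_record {   /* stored at the server when the mount is authenticated via Kerberos */
    uint32_t uid, gid;
    uint32_t client_ip;
};

struct request_creds {  /* arrives with every NFS request */
    uint32_t uid, gid;
    uint32_t sender_ip;
};

/* A request is honoured only if its credentials and source address match
   those recorded at mount time. */
static bool request_is_acceptable(const struct mount_record *m,
                                  const struct request_creds *r)
{
    return r->uid == m->uid && r->gid == m->gid && r->sender_ip == m->client_ip;
}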
New design approaches (1)
Distribute file data across several servers
– Exploits high-speed networks (InfiniBand, Gigabit Ethernet)
– Layered approach, lowest level is like a 'distributed virtual disk'
– Achieves scalability even for a single heavily-used file
'Serverless' architecture
– Exploits processing and disk resources in all available network nodes
– Service is distributed at the level of individual files
Examples:
xFS (section 8.5): Experimental implementation demonstrated a substantial
performance gain over NFS and AFS
Peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T
research) - see web for documentation on these very recent systems
Cloud-based File Systems: DropBox
31
*
[Figure: Dropbox folders on several devices are kept in automatic synchronization.]
Summary
 Distributed file systems provide the illusion of a local file system and hide complexity from end users.
 Sun NFS is an excellent example of a distributed service designed to meet many important design requirements.
 Effective client caching can produce file service performance equal to or better than local file systems.
 Consistency versus update semantics versus fault tolerance remains an issue.
 Most client and server failures can be masked.
 Superior scalability can be achieved with whole-file serving (Andrew FS) or the distributed virtual disk approach.
Future requirements:
– support for mobile users, disconnected operation, automatic re-integration (cf. Coda file system, Chapter 14)
– support for data streaming and quality of service (cf. Tiger file system, Chapter 15)
33
*
Backup – Personal Study
34
Exercise A solution
Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSIZE 1024
#define READ 0          /* open for reading only */
#define FILEMODE 0644   /* permissions for the new file */

void copyfile(char* oldfile, char* newfile)
{
    char buf[BUFSIZE];
    int n = 1, fdold, fdnew;

    if ((fdold = open(oldfile, READ)) >= 0) {
        fdnew = creat(newfile, FILEMODE);
        while (n > 0) {
            n = read(fdold, buf, BUFSIZE);        /* returns 0 at end of file */
            if (write(fdnew, buf, n) < 0) break;
        }
        close(fdold); close(fdnew);
    } else {
        printf("Copyfile: couldn't open file: %s\n", oldfile);
    }
}

int main(int argc, char **argv)
{
    copyfile(argv[1], argv[2]);
    return 0;
}
35
*
Exercise B solution
Show how each file operation of the program that you wrote in Class Exercise A would be executed using the operations of the Model File Service in Figures 8.6 and 8.7.

Server operations for: copyfile("/usr/include/glob.h", "/foo")
(the open, creat and read calls of the Exercise A program are traced below)

fdold = open("/usr/include/glob.h", READ)
Client module actions:
    FileId = Lookup(Root, "usr")          - remote invocation
    FileId = Lookup(FileId, "include")    - remote invocation
    FileId = Lookup(FileId, "glob.h")     - remote invocation
    The client module makes an entry in an open files table with file = FileId, mode = READ, and RWpointer = 0. It returns the table row number as the value for fdold.

fdnew = creat("/foo", FILEMODE)
Client module actions:
    FileId = Create()                     - remote invocation
    AddName(Root, "foo", FileId)          - remote invocation
    SetAttributes(FileId, attributes)     - remote invocation
    The client module makes an entry in its open files table with file = FileId, mode = WRITE, and RWpointer = 0. It returns the table row number as the value for fdnew.

n = read(fdold, buf, BUFSIZE)
Client module actions:
    Read(openfiles[fdold].file, openfiles[fdold].RWpointer, BUFSIZE)   - remote invocation
    Increment the RWpointer in the open files table by BUFSIZE and assign the resulting array of data to buf.
36
*
Figure 8.11 Distribution of processes in the Andrew File System
[Workstations run user programs and the Venus process on top of the UNIX kernel; servers run the Vice process on the UNIX kernel; workstations and servers are connected by the network.]
37
*
Figure 8.12 File name space seen by clients of AFS
[Under the root '/', local directories (tmp, bin, ..., vmunix) coexist with the shared space mounted at /cmu; symbolic links make shared files (e.g. a shared bin) appear in the local space.]
38
*
Figure 8.13 System call interception in AFS
[On a workstation, UNIX file system calls from the user program enter the UNIX kernel; non-local file operations are passed to Venus, while local operations go to the UNIX file system on the local disk.]
39
*
Figure 8.14 Implementation of file system calls in AFS

open(FileName, mode)
    UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus.
    Venus: Check the list of files in the local cache. If not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file. Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
    Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
    UNIX kernel: Open the local file and return the file descriptor to the application.

read(FileDescriptor, Buffer, length)
    UNIX kernel: Perform a normal UNIX read operation on the local copy.

write(FileDescriptor, Buffer, length)
    UNIX kernel: Perform a normal UNIX write operation on the local copy.

close(FileDescriptor)
    UNIX kernel: Close the local copy and notify Venus that the file has been closed.
    Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
    Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
40
*
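The Venus-side handling of open() in Figure 8.14 can be sketched as follows; fetch_from_vice(), the cache entry structure and the "/cache" path are hypothetical stand-ins for Venus internals:

#include <stdbool.h>
#include <stdio.h>

struct cache_entry {
    bool present;                 /* a copy of the file is in the local cache */
    bool valid_callback_promise;  /* Vice has not broken the promise          */
    char local_name[256];         /* name of the local copy                   */
};

/* Hypothetical stand-in for the Fetch(fid) request to the custodian Vice
   server; it would transfer the file and a callback promise. */
static void fetch_from_vice(long fid, struct cache_entry *e)
{
    snprintf(e->local_name, sizeof e->local_name, "/cache/%ld", fid);
    printf("Fetch(%ld) from Vice, callback promise recorded\n", fid);
}

/* Returns the name of the local copy that the kernel should open on behalf
   of the user process. */
static const char *venus_open(long fid, struct cache_entry *e)
{
    if (!e->present || !e->valid_callback_promise) {
        fetch_from_vice(fid, e);    /* place a copy in the local file system */
        e->present = true;
        e->valid_callback_promise = true;
    }
    return e->local_name;           /* the kernel opens the local copy */
}

int main(void)
{
    struct cache_entry e = { false, false, "" };
    printf("open local copy: %s\n", venus_open(42, &e));
    return 0;
}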
Figure 8.15 The main components of the Vice service interface

Fetch(fid) -> attr, data   Returns the attributes (status) and, optionally, the contents of the file
                           identified by fid and records a callback promise on it.
Store(fid, attr, data)     Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid            Creates a new file and records a callback promise on it.
Remove(fid)                Deletes the specified file.
SetLock(fid, mode)         Sets a lock on the specified file or directory. The mode of the lock may be
                           shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid)           Unlocks the specified file or directory.
RemoveCallback(fid)        Informs the server that a Venus process has flushed a file from its cache.
BreakCallback(fid)         This call is made by a Vice server to a Venus process. It cancels the
                           callback promise on the relevant file.
41
*