Distributed File System: Design Comparisons


Distributed File System: Data
Storage for Networks Large and
Small
Pei Cao
Cisco Systems, Inc.
Review: DFS Design Considerations
1. Name space construction
2. AAA
3. Operator batching
4. Client caching
5. Data consistency
6. Locking
Summing it Up: CIFS as an Example
• Network transport in CIFS
– Use SMB (Server Message Block) messages
over a reliable connection-oriented transport
• TCP
• NetBIOS over TCP
– Use persistent connections called “sessions”
• If a session is broken, the client does the recovery (sketched below)
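
A minimal sketch of that client-side recovery in Python (the class, port choice, and retry policy are assumptions for illustration; a real client would also redo negotiation, session setup, and tree connects after reconnecting):

```python
import socket
import time

SMB_PORT = 445  # direct SMB over TCP; NetBIOS over TCP would use port 139

class CifsSession:
    """Toy model of a persistent CIFS "session": the client owns the
    connection state and re-establishes it when the transport breaks."""

    def __init__(self, server: str):
        self.server = server
        self.sock = None

    def connect(self) -> None:
        self.sock = socket.create_connection((self.server, SMB_PORT))

    def send_request(self, payload: bytes, retries: int = 3) -> None:
        for attempt in range(retries):
            try:
                if self.sock is None:
                    self.connect()      # recovery is the client's job
                self.sock.sendall(payload)
                return
            except OSError:
                self.sock = None        # session broke; retry with backoff
                time.sleep(2 ** attempt)
        raise ConnectionError("session recovery failed")
```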
Design Choices in CIFS
• Name space construction:
– per-client linkage, multiple methods for server
resolution
• file://fs.xyz.com/users/alice/stuff.doc
• \\cifsserver\users\alice\stuff.doc
• E:\stuff.doc
– CIFS also offers a “redirection” method
• A share can be replicated on multiple servers or moved
• Client open → server replies
“STATUS_DFS_PATH_NOT_COVERED” → client issues
“TRANS2_DFS_GET_REFERRAL” → server replies with the new
server (sketched below)
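
The referral exchange can be sketched as follows; the client object and its open/get_dfs_referral/reconnect methods are hypothetical stand-ins, and only the shape of the exchange comes from the slide:

```python
# NTSTATUS value shown for illustration; real clients use the wire-level
# encodings from the CIFS/SMB specifications.
STATUS_DFS_PATH_NOT_COVERED = 0xC0000257

def open_with_referral(client, path):
    """If the server no longer covers the path, ask it where the share
    went (TRANS2_DFS_GET_REFERRAL) and retry the open there."""
    reply = client.open(path)
    if reply.status == STATUS_DFS_PATH_NOT_COVERED:
        referral = client.get_dfs_referral(path)
        client = client.reconnect(referral.new_server)
        reply = client.open(referral.rewritten_path)
    return reply
```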
Design Choices in CIFS
• AAA: Kerberos
– Older systems use NTLM
• Operator batching: supported
– These methods have “AndX” variations:
TREE_CONNECT, OPEN, CREATE, READ, WRITE,
LOCK
– Server implicitly takes results of preceding operations
as input for subsequent operations
– The first command that encounters an error stops all
subsequent processing in the batch (sketched below)
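
A sketch of those batching semantics (the Result shape and op signature are invented for illustration; this models the behavior, not the SMB message format):

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional

@dataclass
class Result:
    value: Any = None
    error: Optional[str] = None

def run_batch(server: Any,
              ops: List[Callable[[Any, Result], Result]]) -> List[Result]:
    """Each op implicitly consumes the previous op's result (e.g. the
    handle from OPEN feeds the following READ); the first error stops
    all subsequent processing."""
    results: List[Result] = []
    prev = Result()
    for op in ops:
        res = op(server, prev)
        results.append(res)
        if res.error:             # first failing command aborts the rest
            break
        prev = res
    return results
```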
Design Choices in CIFS
• Client caching
– Caches both file data and file metadata; write-back caching; supports read-ahead
– Offers strong cache consistency using an invalidation-based
approach
• Data access consistency
– Oplocks: similar to “tokens” in AFS v3
• “level II oplock”: read-only data locks
• “exclusive oplock”: exclusive read/write data lock
• “batch oplock”: exclusive read/write “open” lock and data lock and
metadata lock
– Transitions among the oplock levels (sketched below)
– Observation: can have a hierarchy of lock managers
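
A sketch of how a client might react to an oplock break, assuming hypothetical flush/invalidate/ack helpers; the levels match the slide, the handling logic is illustrative:

```python
from enum import Enum

class Oplock(Enum):
    NONE = 0
    LEVEL_II = 1    # shared, read-only data caching
    EXCLUSIVE = 2   # exclusive read/write data caching
    BATCH = 3       # exclusive data + "open" + metadata caching

def on_oplock_break(client, handle, new_level: Oplock) -> None:
    """Before downgrading, the client must make its cached state safe
    for the new level, then acknowledge the break to the server."""
    if client.oplock in (Oplock.EXCLUSIVE, Oplock.BATCH):
        client.flush_dirty_data(handle)     # push write-back cache contents
    if new_level == Oplock.NONE:
        client.invalidate_cache(handle)     # no caching allowed anymore
    client.oplock = new_level
    client.ack_break(handle, new_level)
```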
Design Choices in CIFS
• File and data record locking
– Offer “shared” (read-only) and “exclusive” (read/write)
locks
– Part of the file system; Mandatory
– Can lock either a whole file or byte-range in the file
– Lock request can specify a timeout for waiting
– Enables atomic writes via “AndX” batching of
writes
• “Lock/write/unlock” as a batched command sequence (sketched below)
• Additional capability: “directory change
notification”
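
The atomic-write pattern can be sketched like this; server.lock/write/unlock are hypothetical stand-ins for the LOCK/WRITE commands, and the abort-on-first-error rule is the batching rule described above (a real client would still have to release the lock if the write step failed):

```python
def atomic_write(server, handle, offset: int, data: bytes,
                 timeout_ms: int = 1000) -> dict:
    """Lock/write/unlock as one batched sequence: later steps run only
    if earlier ones succeeded, making the write atomic to other clients."""
    steps = [
        ("lock",   lambda: server.lock(handle, offset, len(data),
                                       exclusive=True, timeout=timeout_ms)),
        ("write",  lambda: server.write(handle, offset, data)),
        ("unlock", lambda: server.unlock(handle, offset, len(data))),
    ]
    results = {}
    for name, step in steps:
        ok = step()
        results[name] = ok
        if not ok:            # first failing command stops the batch
            break
    return results
```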
DFS for Mobile Networks
• What properties of DFS are desirable:
– Handle frequent connection and disconnection
– Enable clients to operate in disconnected state
for an extended period of time
– Ways to resolve/merge conflicts
Design Issues for DFS in Mobile
Networks
• What should be kept in client cache?
• How to update the client cache copies with
changes made on the server?
• How to upload changes made by the client
to the server?
• How to resolve conflicts when more than
one client changes a file during the
disconnected state?
Example System: Coda
• Client cache content:
– User can specify which directories should always be
cached on the client
– Also cache recently used files
– Cache replacement: walk over the cached items every
10 minutes to reevaluate their priorities (sketched below)
• Updates from server to client:
– The server keeps a log of callbacks that couldn’t be
delivered and delivers them upon client reconnection
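
A toy version of that periodic cache walk (the priority formula, field names, and weight are assumptions; Coda's actual hoard priorities are computed differently):

```python
import time

HOARD_WEIGHT = 1000   # user-specified hoard preference dominates recency

def hoard_walk(cache: list, capacity: int):
    """Re-score every cached object, keep the highest-priority items,
    and return the rest as eviction candidates."""
    now = time.time()
    for obj in cache:
        recency = 1.0 / (1.0 + now - obj["last_used"])
        obj["priority"] = HOARD_WEIGHT * obj["hoard"] + recency
    cache.sort(key=lambda o: o["priority"], reverse=True)
    return cache[:capacity], cache[capacity:]    # (kept, evicted)
```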
Coda File System
• Upload the changes from client to server
– The client has to keep a “replay log”
• Contents of the “replay log”
– Ways to reduce the “replay log” size (sketched below)
• Handling conflicts
– Detecting conflicts
– Resolving conflicts
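
A sketch of a replay log with one classic size reduction: a later store of a file makes earlier stores of the same file redundant. Record shapes and the version check are illustrative, not Coda's actual format:

```python
class ConflictError(Exception):
    pass

def append_store(log: list, path: str, version: int, data: bytes) -> None:
    # Size reduction: only the final contents of a file matter, so
    # earlier store records for the same path can be cancelled.
    log[:] = [r for r in log
              if not (r["op"] == "store" and r["path"] == path)]
    log.append({"op": "store", "path": path,
                "version": version, "data": data})

def replay(log: list, server) -> None:
    for rec in log:
        if rec["op"] == "store":
            # Conflict detection: someone else changed the file while
            # this client was disconnected.
            if server.current_version(rec["path"]) != rec["version"]:
                raise ConflictError(rec["path"])
            server.store(rec["path"], rec["data"])
```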
Performance Issues in File Servers
• Components of server load
– Network protocol handling
– File system implementation
– Disk accesses
• Read operations
– Metadata
– Data
• Write operations
– Metadata
– Data
• Workload characterization
DFS for High-Speed Networks:
DAFS
• Proposal from Network Appliance and companies
• Goal: eliminate memory copies and protocol processing
– Standard implementation: network buffers → file system buffer
cache → user-level application buffers
• Designed to take advantage of RDMA (Remote Direct Memory
Access) network protocols
– Network transport provides direct memory-to-memory transfer
– Protocol processing is provided in hardware
• Suitable for high-bandwidth, low-error-rate, low-latency
networks
DAFS Protocol
• Data read by the client:
– RDMA request from the server to copy file data directly into the
application buffer
• Data write by the client:
– RDMA request from the server to copy the application buffer into
server memory
• Implementation (copy elimination sketched below):
– as a library linked into the user application, interfacing with the
RDMA network library directly
• Eliminates two data copies
– as a new file system implementation in the kernel
• Eliminates one data copy
• Performance advantage:
– Example: 90 usec/op in NFS vs. 25 usec/op in DAFS
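
A loose Python analogue of the copy-elimination idea (illustrative only: os.preadv filling a caller-owned buffer stands in for RDMA placing data directly in registered application memory):

```python
import os
import tempfile

def direct_read(fd: int, offset: int, app_buffer: bytearray) -> int:
    # Data lands directly in the application's long-lived buffer,
    # instead of read() allocating and copying through fresh ones.
    return os.preadv(fd, [app_buffer], offset)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello from the file server\n")

buf = bytearray(4096)                 # the "registered" application buffer
fd = os.open(f.name, os.O_RDONLY)
n = direct_read(fd, 0, buf)
os.close(fd)
print(bytes(buf[:n]).decode())        # -> hello from the file server
```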
DAFS Features
• Session-based
• Offers authentication of client machines
• Flow control by server
• Stateful lock implementation with leases
• Offers atomic writes
• Offers operator batching
Clustered File Servers
• Goal: scalability in file service
– Build a high-performance file service using a collection of cheap
file servers
• Methods for Partitioning the Workload (routing sketched below)
– Each server can support one “subtree”
• Advantages
• Disadvantages
– Each server can support a group of clients
• Advantages
• Disadvantages
– Client requests are sent to servers in round-robin or load-balanced
fashion
• Advantages
• Disadvantages
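
The three partitioning methods above differ mainly in how a request is routed; a sketch with made-up server names:

```python
import hashlib
from itertools import cycle

SERVERS = ["fs0", "fs1", "fs2"]                    # hypothetical servers

# Method 1: subtree partitioning -- each server owns a directory subtree.
SUBTREES = {"/users": "fs0", "/proj": "fs1", "/scratch": "fs2"}

def route_by_subtree(path: str) -> str:
    for prefix, server in SUBTREES.items():
        if path.startswith(prefix):
            return server
    raise KeyError(path)

# Method 2: per-client assignment -- hash the client id onto a server.
def route_by_client(client_id: str) -> str:
    h = int(hashlib.sha1(client_id.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

# Method 3: spread requests round-robin across all servers.
_rr = cycle(SERVERS)
def route_round_robin(_request) -> str:
    return next(_rr)
```

Subtree routing keeps all of a file's traffic on one server (simple consistency, but a hot subtree overloads its server); the other two balance load better but force the servers to coordinate caching and consistency, the issues listed on the next slide.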
Non-Subtree-Partition Clustered File
Servers
• Design issues
– On which disks should the data be stored?
– Management of memory cache in file servers
– Data consistency management
• Metadata operation consistency
• Data operation consistency
– Server failure management
• Single server failure fault tolerance
• Disk failure fault tolerance
Mapping Between Disks and Servers
• Direct-attached disks
• Network-attached disks
– Fibre Channel attached disks
– iSCSI attached disks
• Managing the network-attached disks:
“volume manager”
Functionalities of a Volume Manager
• Group multiple disk partitions into a “logical” disk
volume (striping sketched below)
• Volume can expand or shrink in size without
affecting existing data
• Volume can be RAID-0/1/5, tolerating disk
failures
• Volume can offer “snapshot” functionalities for
easy backup
• Volumes are “self-evident”
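
One core volume-manager job is mapping logical blocks onto physical disks; a minimal RAID-0 (striping) sketch, with made-up layout parameters:

```python
STRIPE_BLOCKS = 8    # blocks per stripe unit (illustrative)
N_DISKS = 4

def logical_to_physical(lbn: int) -> tuple[int, int]:
    """Map a logical block number on the volume to (disk, physical block);
    consecutive stripe units rotate across the disks."""
    stripe_unit = lbn // STRIPE_BLOCKS
    offset = lbn % STRIPE_BLOCKS
    disk = stripe_unit % N_DISKS
    pbn = (stripe_unit // N_DISKS) * STRIPE_BLOCKS + offset
    return disk, pbn

# e.g. logical block 35 = stripe unit 4, offset 3 -> disk 0, block 11
assert logical_to_physical(35) == (0, 11)
```

RAID-1 would add a mirrored write per disk and RAID-5 rotating parity; this mapping function is where the volume manager hides all of that from the file system.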
Implementations of Volume
Manager
• In-kernel implementation
– Example: Linux volume manager, Veritas
volume manager, etc.
• Disk server implementation
– Example: EMC storage systems
Serverless File Systems
• Serverless file systems in WAN
– Motivation: peer-to-peer storage; never lose the
file
• Serverless file system in LAN
– Motivation: clients powerful enough to act like
servers; use all clients’ memory to cache file
data