Garth A. Gibson, David F. Nagle, William Courtright II, Nat Lanza,
Paul Mazaitis, Marc Unangst, and Jim Zelenka, "NASD Scalable Storage
Systems," USENIX 1999 Extreme Linux Workshop, Monterey, CA, June 1999.
http://www.pdl.cs.cmu.edu/Publications/publications.html
Motivation
• NASD minimizes server-based data movement and separates management
and filesystem semantics from store-and-forward copying
• Figure 1: Standalone server with attached disks
– Note the long path that requests and data take through OS layers and
across machines
• Reference implementation of NASD for Linux 2.2, including NASD
device code that runs on a workstation or PC masquerading as a storage
subsystem or disk drive
• NFS-like distributed file system that uses NASD subsystems or
devices
• NASD striping middleware for large striped files
Figure 1 -- NetSCSI and NASD
• Figure 1 outlines the data path where clients ask for data and servers
forward the request to storage -- the forwarded request is a DMA command
to return data directly to the client
– When the DMA is complete, status is returned to the server, collected,
and forwarded to the client
• NASD
– On first access, client contacts server for access checks
– Server grants reusable rights or capabilities
– Clients then present requests directly to storage
– Storage verifies capabilities and directly replies
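A minimal, runnable sketch of this access pattern; class and method names
here are illustrative, not the actual NASD interface:

    class Drive:
        def __init__(self):
            self.objects = {42: b"hello, NASD"}
            self.granted = set()        # capabilities this drive will honor

        def read(self, cap, obj_id, offset, length):
            # Drive verifies the capability and replies directly to the
            # client -- no store-and-forward through the file server.
            assert cap in self.granted
            return self.objects[obj_id][offset:offset + length]

    class FileManager:
        def __init__(self, drive):
            self.drive = drive

        def lookup(self, client, path):
            # The synchronous access check happens once; the returned
            # capability is reusable, so later reads bypass the server.
            cap = (client, path, "read")
            self.drive.granted.add(cap)
            return cap, 42              # capability plus object id

    drive = Drive()
    fm = FileManager(drive)
    cap, obj = fm.lookup("client-1", "/data/file")  # one trip to the server
    assert drive.read(cap, obj, 0, 5) == b"hello"   # further I/O is direct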
NASD Interface
• Read, write object data
• Read, write object attributes
• Create, resize, remove soft partitions
• Construct copy-on-write version of an object
• Logical version number on a file can be changed by the file manager to
revoke capabilities
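Rendered as a drive-side API, the operations above might look like the
following sketch; method names and signatures are guesses for
illustration, not the published NASD specification:

    from abc import ABC, abstractmethod

    class NasdObjectInterface(ABC):
        # Read, write object data
        @abstractmethod
        def read(self, obj_id: int, offset: int, length: int) -> bytes: ...
        @abstractmethod
        def write(self, obj_id: int, offset: int, data: bytes) -> None: ...

        # Read, write object attributes
        @abstractmethod
        def get_attr(self, obj_id: int) -> dict: ...
        @abstractmethod
        def set_attr(self, obj_id: int, attrs: dict) -> None: ...

        # Create, resize, remove soft partitions
        @abstractmethod
        def create_partition(self, part_id: int, size: int) -> None: ...
        @abstractmethod
        def resize_partition(self, part_id: int, size: int) -> None: ...
        @abstractmethod
        def remove_partition(self, part_id: int) -> None: ...

        @abstractmethod
        def cow_version(self, obj_id: int) -> int:
            """Construct a copy-on-write version of an object."""

        @abstractmethod
        def bump_version(self, obj_id: int) -> None:
            """File manager raises the logical version number to revoke
            previously issued capabilities for the object."""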
NASD Security
• Security protocol
– Capability has a public portion, CapArg, and a private key, CapKey
– CapArg specifies what rights are being granted for which object
– CapKey is a keyed message digest of CapArg and a secret key shared only
with target drive
– Client sends CapArg with each request and generates a CapKey-keyed
digest of the request parameters and CapArg
– Each drive knows its secret keys and receives CapArg with each request
– Can compute client’s CapKey and verify request
– If any field of CapArg or request has been changed, digest comparison
will fail
– Scheme protects integrity of requests but does not protect privacy of data
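A runnable sketch of this scheme, with HMAC-SHA256 standing in for the
keyed message digest (the drives' actual digest algorithm is not
reproduced here):

    import hmac, hashlib

    def make_capability(drive_secret: bytes, cap_arg: bytes) -> bytes:
        # File manager: CapKey is a keyed digest of CapArg under the
        # secret shared only with the target drive.
        return hmac.new(drive_secret, cap_arg, hashlib.sha256).digest()

    def client_digest(cap_key: bytes, cap_arg: bytes, request: bytes) -> bytes:
        # Client: a CapKey-keyed digest of the request parameters and
        # CapArg accompanies every request.
        return hmac.new(cap_key, request + cap_arg, hashlib.sha256).digest()

    def drive_verify(drive_secret: bytes, cap_arg: bytes, request: bytes,
                     digest: bytes) -> bool:
        # Drive: recompute the client's CapKey from its own secret, then
        # check the request digest. Any change to CapArg or the request
        # makes the comparison fail.
        cap_key = hmac.new(drive_secret, cap_arg, hashlib.sha256).digest()
        expected = hmac.new(cap_key, request + cap_arg,
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, digest)

    # CapKey never travels with the request; only CapArg and digests do.
    secret = b"per-drive secret"
    cap_arg = b"object=42;rights=read"
    cap_key = make_capability(secret, cap_arg)
    request = b"READ object=42 offset=0 len=8192"
    d = client_digest(cap_key, cap_arg, request)
    assert drive_verify(secret, cap_arg, request, d)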
Filesystems for NASD
• Constructed a distributed file system with NFS-like semantics tailored
for NASD
• Each file and directory occupies exactly one NASD object; offsets in
files are the same as offsets in objects
• File length and last-modify time correspond directly to
NASD-maintained object attributes
• Remainder of file attributes are stored in the uninterpreted section
of the object’s attributes
• Data-moving operations (read, write) and attribute reads (getattr) are
sent directly to the NASD drive, as in the sketch after this list
– file attributes are either computed from NASD object attributes (e.g. modify times
and object size) or stored in the uninterpreted filesystem-specific attributes
• Other requests are handled by the file manager
• Capabilities are piggybacked on the file manager’s responses to lookup
operations
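For instance, a getattr could be satisfied entirely from one direct
attribute read at the drive; the attribute field names below are
assumptions for illustration, not the actual on-drive layout:

    def nfs_getattr(drive, obj_id):
        # One direct request to the NASD drive; no file manager involved.
        attrs = drive.get_attr(obj_id)
        return {
            "size": attrs["object_size"],    # file length == object length
            "mtime": attrs["modify_time"],   # maintained by the drive
            # everything else lives in the uninterpreted,
            # filesystem-specific region of the object's attributes
            **attrs["uninterpreted"],        # e.g. mode, uid, gid
        }

    class FakeDrive:                         # stand-in for a real drive
        def get_attr(self, obj_id):
            return {"object_size": 4096, "modify_time": 896227200,
                    "uninterpreted": {"mode": 0o644, "uid": 1000, "gid": 1000}}

    assert nfs_getattr(FakeDrive(), 42)["size"] == 4096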
Access to Striped Files and
Continuous Media
• NASD-optimized parallel filesystem
• Filesystem manages objects not directly backed by data
• Backed by a storage manager which redirects clients to component NASD
objects
• NASD PFS supports the SIO low-level parallel filesystem interface on
top of NASD-NFS files striped using user-level Cheops middleware
(Figure 6); see the sketch below
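The core of such striping middleware is the offset arithmetic that maps a
file offset to a component object; the round-robin layout and 64KB
stripe unit below are assumptions for illustration:

    def locate(offset: int, stripe_unit: int, objects: list):
        # Map a byte offset in a striped file to (component object,
        # offset within that object).
        stripe = offset // stripe_unit        # which stripe unit overall
        obj = objects[stripe % len(objects)]  # round-robin across NASDs
        row = stripe // len(objects)          # full rows preceding it
        return obj, row * stripe_unit + offset % stripe_unit

    # e.g. 2MB reads striped across four NASD objects in 64KB units
    objs = ["nasd0/obj7", "nasd1/obj7", "nasd2/obj7", "nasd3/obj7"]
    assert locate(0, 65536, objs) == ("nasd0/obj7", 0)
    assert locate(65536, 65536, objs) == ("nasd1/obj7", 0)
    assert locate(4 * 65536, 65536, objs) == ("nasd0/obj7", 65536)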
Garth A. Gibson, David F. Nagle, Khalil Amiri, Jeff Butler, Fay W.
Chang, Howard Gobioff, Charles Hardin, Erik Riedel, David Rochberg, and
Jim Zelenka, "A Cost-Effective, High-Bandwidth Storage Architecture,"
Proceedings of the 8th International Conference on Architectural Support
for Programming Languages and Operating Systems (ASPLOS), San Jose, CA,
October 2-7, 1998, pages 92-103.
Evolution of storage architectures
• Local Filesystem -- Simple: application, file management, concurrency
control, and low-level storage management are aggregated on one machine.
Data makes one trip over a peripheral area network such as SCSI. Disks
offer a fixed-size block abstraction.
• Distributed Filesystem -- An intermediate server machine is
introduced. The server offers a simple file-access interface to clients.
• Distributed Filesystem with RAID controller -- Interposes another
computer, a RAID controller.
• Distributed Filesystem that employs DMA -- Can arrange to DMA data to
clients rather than copying it through the server. HPSS is an example
(although this is not how it is usually employed).
• NASD-based DFS, NASD-Cheops-based DFS
Principles of NASD
• Direct transfer -- data moved between drive and client without
indirection or store-and-forward through file server
• Asynchronous oversight -- Ability of client to perform most operations
without synchronous appeal to the file manager
• Cryptographic integrity -- Drives ensure that commands and data have
not been tampered with by generating and verifying cryptographic
keyed digests
• Object-based interface -- Drives export variable-length objects
instead of fixed-size blocks. This gives disk drives direct knowledge of
the relationships between disk blocks and minimizes security overhead.
Prototype Implementation
• NASD prototype drive runs on a 133 MHz, 64 MB DEC Alpha 3000/400
with two Seagate ST52160 disks attached by two 5 MB/s SCSI busses
• Intended to simulate a controller and drive
• NASD system implements its own internal object access, cache, and disk
space management modules
• Figure 6 -- Performance for sequential reads and writes
– Sequential bandwidth as a function of request size
– NASD better tuned for disk access on reads that miss the cache
– FFS better tuned for cache accesses
– Write performance of FFS due to immediate acknowledgement of writes
up to 64KB
Scalability
• 13 NASD drives, each linked by OC-3 ATM to 10 client machines
• Each client issues series of sequential 2MB read requests striped across
four NASDs.
• Each NASD can deliver 32MB/s from cache to RPC protocol stack
• DCE RPC cannot push more than 80 Mb/s through a 155 Mb/s ATM
link before the receiving client saturates
• Figure 7 demonstrates close to linear scaling up to 10 clients
Computational Requirements
• Table 1 -- number of instructions needed to service a given request
size, including all communications (DCE RPC, UDP/IP)
• Overhead is mostly due to communications
• Significantly more expensive than a Seagate Barracuda
Filesystems for NASD
• NFS covered in the last paper
• AFS -- lookup operations carried out by parsing directory files
locally
• AFS RPCs added to obtain and relinquish capabilities explicitly
• AFS’s sequential consistency provided by breaking callbacks
(notifying holders of potentially stale copies) when a write capability
is issued
• File manager doesn’t know when a write operation has arrived at a
drive, so it must tell clients when a write may occur (sketched below)
• No new callbacks are issued on a file with an outstanding write
capability
• AFS enforces a per-volume quota on allocated disk space
• File manager allocates space when it issues a capability, and it keeps
track of how much space is actually written
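A sketch of that callback rule; the data structures are illustrative,
not the actual AFS/NASD file manager implementation:

    class Client:
        def __init__(self, name):
            self.name, self.stale = name, set()
        def break_callback(self, fid):
            self.stale.add(fid)      # cached copy may now be stale

    class FileManager:
        def __init__(self):
            self.callbacks = {}      # fid -> clients holding callbacks
            self.write_caps = set()  # fids with outstanding write capability

        def issue_write_capability(self, fid, writer):
            # The manager cannot observe writes arriving at the drive, so
            # it breaks all callbacks when the write *becomes possible*,
            # i.e. at capability-issue time.
            for client in self.callbacks.pop(fid, set()):
                client.break_callback(fid)
            self.write_caps.add(fid)

        def register_callback(self, fid, client):
            # No new callbacks while a write capability is outstanding.
            if fid in self.write_caps:
                return False
            self.callbacks.setdefault(fid, set()).add(client)
            return True

    fm = FileManager()
    reader = Client("reader")
    fm.register_callback("fid-1", reader)
    fm.issue_write_capability("fid-1", Client("writer"))
    assert "fid-1" in reader.stale                     # callback broken
    assert not fm.register_callback("fid-1", reader)   # no new callbacks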
Active Disks
• Provide full application-level programmability of drives
• Customize functionality for data intensive computations
• NASD’s object-based interface provides knowledge of the data at the
devices without having to consult external metadata