A Low-Bandwidth Network File System

A. Muthitacharoen, MIT
B. Chen, MIT
D. Mazieres, NYU
Key Ideas




A network file system for slow or wide-area
networks
Exploits similarities between files or versions
of the same file
 Avoids sending data that can be found in
the server’s file system or the client’s cache
Also uses conventional compression and
caching
Requires 90% less bandwidth than traditional
network file systems
Working on slow networks



Make local copies
 Must worry about update conflicts
Use remote login
 Only for text-based applications
Use LBFS instead
 Better than remote login
 Must deal with issues like auto-saves
blocking the editor for the duration of
transfer
LBFS




Exploits cross-file similarities especially with
previous versions of the same file
 Auto-save files, …
LBFS file server divides the files it stores into
chunks and indexes the chunks by hash value
LBFS client similarly indexes a large persistent
file cache
LBFS never transfers chunks that the
recipient already has
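The rule of never transferring a chunk the recipient already has can be sketched as follows. This is a minimal illustration rather than the LBFS protocol itself: the function names are hypothetical, chunks are supplied as ready-made byte strings, and full SHA-1 digests serve as index keys.

```python
import hashlib

def chunk_index(chunks):
    """Map SHA-1 digest -> chunk for the chunks one party already holds."""
    return {hashlib.sha1(c).digest(): c for c in chunks}

def chunks_to_send(sender_chunks, recipient_index):
    """Return (digests, missing): the ordered digest list describing the
    file, plus only those chunks absent from the recipient's index."""
    digests = [hashlib.sha1(c).digest() for c in sender_chunks]
    missing = {d: c for d, c in zip(digests, sender_chunks)
               if d not in recipient_index}
    return digests, missing

def reassemble(digests, missing, recipient_index):
    """Rebuild the file from transferred chunks plus locally held ones."""
    return b"".join(missing[d] if d in missing else recipient_index[d]
                    for d in digests)

# Hypothetical example: a new version differs from the old one in one chunk.
old_version = [b"header", b"body-v1", b"footer"]
new_version = [b"header", b"body-v2", b"footer"]
index = chunk_index(old_version)
digests, missing = chunks_to_send(new_version, index)
rebuilt = reassemble(digests, missing, index)
```

Only the changed chunk travels over the network; the digest list tells the recipient how to assemble the rest from data it already has.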
Previous Work (I)




AFS Callbacks require server to notify clients
when a cached file has been modified
Leases achieve same goal but have an
expiration time
Coda supports slow networks and even
disconnected operation
 Defers some updates to save bandwidth
OceanStore applies Bayou’s conflict resolution
mechanisms to a file system
Previous Work (II)



Operation-based updates (Lee et al.)
 Proxy-client close to the server duplicates
client computations in the hope of
duplicating its output files
Spring and Wetherall propose to use two
large cooperating caches storing identical
copies of the last n megabytes of network
traffic
Rsync uses directory tree mirroring at client
and server.
LBFS



LBFS provides close-to-open consistency
 Similar to AFS session consistency
LBFS assumes clients will have a cache large
enough to contain a user’s entire working set
of files
When possible, LBFS reconstitutes files using
chunks of existing data in the file system and
client cache instead of transmitting those
chunks over the network
Indexing Issues

Major challenge is keeping the index a
reasonable size while dealing with shifting
offsets
 Indexing conventional file blocks would not
work
 Indexing and hashing overlapping file
blocks at all offsets would require too
much space
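The shifting-offset problem with conventional fixed-size blocks can be seen in a small experiment. The sketch below is illustrative only: the 8 KB block size, the helper names, and the pseudo-random test file are assumptions made for the example.

```python
import hashlib
import random

def fixed_block_hashes(data, block_size=8192):
    """SHA-1 digest of each fixed-size block, in file order."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def shared_blocks(old, new, block_size=8192):
    """Count blocks of `new` whose digest also occurs among blocks of `old`."""
    known = set(fixed_block_hashes(old, block_size))
    return sum(h in known for h in fixed_block_hashes(new, block_size))

# Hypothetical 64 KB file of pseudo-random bytes (seeded for repeatability).
original = random.Random(0).getrandbits(8 * 65536).to_bytes(65536, "big")
appended = original + b"X"    # edit at the end: earlier blocks keep their offsets
prepended = b"X" + original   # edit at the start: every later offset shifts by one

print(shared_blocks(original, appended), "of", len(fixed_block_hashes(appended)))
print(shared_blocks(original, prepended), "of", len(fixed_block_hashes(prepended)))
```

Appending a byte leaves every existing block recognizable, while inserting one byte at the front changes the contents of every fixed-size block, so an index of block hashes finds nothing to reuse.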
LBFS Solution




Considers only non-overlapping chunks of
files
Sets chunk boundaries based on file contents
to avoid sensitivity to shifting file offset
Examines every overlapping 48-byte region of
the file to select boundary regions, or
breakpoints, using Rabin fingerprints
Expected chunk size is 8 KB plus the size of
the 48-byte breakpoint window
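Content-defined chunk boundaries can be sketched as below. This is a simplified illustration under stated assumptions: a plain polynomial rolling hash stands in for a true Rabin fingerprint, a region counts as a breakpoint when the low 13 bits of its hash equal an arbitrary constant (13 bits because 2^13 = 8192 gives the 8 KB expected spacing), and the names MAGIC, BASE, MOD, breakpoints and chunks are hypothetical.

```python
WINDOW = 48                      # bytes in the sliding window
MASK = (1 << 13) - 1             # low 13 bits: one breakpoint per ~8 KB on average
MAGIC = 0x78                     # arbitrary target value (assumption)
BASE, MOD = 257, (1 << 61) - 1   # stand-in rolling hash, NOT a Rabin fingerprint

def breakpoints(data):
    """Yield chunk end offsets: a chunk ends right after any 48-byte
    window whose rolling hash has its low 13 bits equal to MAGIC."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # add the newest byte
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            yield i + 1

def chunks(data):
    """Split data into non-overlapping, content-defined chunks."""
    out, start = [], 0
    for end in breakpoints(data):
        out.append(data[start:end])
        start = end
    if start < len(data):
        out.append(data[start:])       # trailing data after the last breakpoint
    return out
```

Because each boundary depends only on the 48 bytes before it, inserting data in the middle of a file can only disturb boundaries whose window overlaps the edit; chunks elsewhere keep their contents, and therefore their hashes, even though their offsets shift.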
Handling Insertions
More Indexing Issues


Pathological cases
 Very small chunks
 Sending hashes of chunks would
consume as much bandwidth as just
sending the file
 Very large chunks
 Cannot be sent in a single RPC
LBFS imposes minimum and maximum chunk
sizes
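One way to impose size limits on top of content-defined breakpoints is sketched below. The slide does not give the limits, so the 2 KB and 64 KB values are assumptions for illustration, as is the function name clamp_breakpoints; candidate breakpoints are passed in as a list of end offsets.

```python
MIN_CHUNK = 2 * 1024     # assumed minimum chunk size
MAX_CHUNK = 64 * 1024    # assumed maximum chunk size

def clamp_breakpoints(length, candidates,
                      min_size=MIN_CHUNK, max_size=MAX_CHUNK):
    """Turn candidate breakpoints (chunk end offsets) into final boundaries.

    Candidates that would create a chunk shorter than min_size are ignored,
    and a boundary is forced whenever a chunk would exceed max_size.
    The final chunk ends at `length` and may be shorter than min_size.
    """
    boundaries, start = [], 0
    for end in sorted(c for c in candidates if 0 < c < length) + [length]:
        while end - start > max_size:       # chunk too large: force a cut
            start += max_size
            boundaries.append(start)
        # Suppress a cut that would make a too-small chunk, except at EOF.
        if end - start >= min_size or (end == length and end > start):
            boundaries.append(end)
            start = end
    return boundaries
```

With no candidates at all (for example, a long run of identical bytes that never matches the breakpoint condition) the file is cut every max_size bytes; with a candidate at every offset, chunks are never shorter than min_size except possibly the last one.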
The Chunk Database



Indexes each chunk by the first 64 bits of its
SHA-1 hash
To avoid synchronization problems, LBFS
always recomputes the SHA-1 hash of any
data chunk before using it
 Simplifies crash recovery
Recomputed SHA-1 values are also used to
detect hash collisions in the database
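A chunk database along these lines might look like the sketch below. It is an in-memory approximation under stated assumptions: files are modelled as a dict of name to bytes, and the ChunkDB class with its add and lookup methods is hypothetical rather than the actual LBFS interface.

```python
import hashlib

class ChunkDB:
    """Maps the first 64 bits of a chunk's SHA-1 to (file, offset, length)."""

    def __init__(self, files):
        self.files = files       # name -> bytes; stands in for the file system
        self.index = {}          # 8-byte key -> (name, offset, length)

    @staticmethod
    def key(digest):
        return digest[:8]        # first 64 bits of the 20-byte SHA-1 digest

    def add(self, name, offset, length):
        data = self.files[name][offset:offset + length]
        self.index[self.key(hashlib.sha1(data).digest())] = (name, offset, length)

    def lookup(self, digest):
        """Return the chunk bytes for a full SHA-1 digest, or None.

        The hash is recomputed from the file's current contents, so a stale
        entry or a collision on the 64-bit key is treated as a miss.
        """
        entry = self.index.get(self.key(digest))
        if entry is None:
            return None
        name, offset, length = entry
        data = self.files.get(name, b"")[offset:offset + length]
        return data if hashlib.sha1(data).digest() == digest else None
```

Because every lookup re-verifies the data, the index never has to be updated in lockstep with the files: an entry left dangling after a crash or an overwrite simply fails verification.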
Protocol



Based on NFS version 3
Adds
 Extensions to exploit inter-file commonality
(GETHASH)
 Leases
Compresses all traffic using conventional gzip
File Consistency (I)



Whenever a client makes any RPC on an LBFS
file, it gets back a read lease on the file.
If a user opens a file whose lease has
expired, the client asks the server for the
attributes of the file
 Grants the client a lease on the file.
 Client can check if it has the current
version of the file in its cache
If the file times have changed, client must
obtain new contents of file from server
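The open-time check described above can be sketched as follows. This is a toy model with assumed details: the 60-second lease term, the Server and Client classes, and their method names are all hypothetical, time is passed in explicitly, and server-side notification of clients during a lease term is not modelled.

```python
LEASE_SECONDS = 60.0     # hypothetical lease duration

class Server:
    """Stand-in file server: stores contents and modification times."""
    def __init__(self):
        self.files = {}                        # name -> (mtime, data)

    def write(self, name, data, now):
        self.files[name] = (now, data)

    def getattr(self, name, now):
        mtime, _ = self.files[name]
        return mtime, now + LEASE_SECONDS      # attributes plus a fresh lease

    def read(self, name, now):
        mtime, data = self.files[name]
        return mtime, data, now + LEASE_SECONDS

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}                        # name -> [mtime, data, lease_expiry]
        self.fetches = 0                       # full-content transfers so far

    def open(self, name, now):
        entry = self.cache.get(name)
        if entry and now < entry[2]:
            return entry[1]                    # lease still valid: use the cache
        if entry:
            mtime, expiry = self.server.getattr(name, now)
            if mtime == entry[0]:              # unchanged: just renew the lease
                entry[2] = expiry
                return entry[1]
        mtime, data, expiry = self.server.read(name, now)
        self.cache[name] = [mtime, data, expiry]
        self.fetches += 1
        return data
```

An open under a valid lease costs no network traffic, an open after expiry costs one attribute request, and file contents are fetched again only when the modification time has changed.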
File Consistency (II)



No need for write leases
 LBFS provides close-to-open consistency
 Server never demands back a dirty file
If multiple clients are writing the same file, the
last one to close the file will overwrite
changes from the others
File updates are atomic
 Limits damage caused by concurrent
updates
Security Issues


LBFS uses SFS security infrastructure
 Servers have public keys
 Messages are encrypted
Specific security issue:

A user could check whether the file system
contains a particular chunk of data by
observing subtle timing differences in
the server’s answer to a CONDWRITE request
Implementation (I)
Implementation (II)



Uses NFS
Two NFS-related issues
 When server commits a temporary file to a
target file, it must copy the contents of the
temporary file onto the target file to
preserve the target file i-node
 Hard to preserve previous contents of a
truncated file
Message order is guaranteed by TCP
Evaluation (I)

Commonality of data in /usr/local
Evaluation (II)

Normalized bandwidth consumption
(2 of 3 benchmarks)
Key




First four bars of each workload show
upstream bandwidth; the second four show
downstream bandwidth.
CIFS is the native Windows network file system
“Leases+Gzip” uses LBFS file caching, leases,
and data compression but not its chunking
scheme
“LBFS, new DB” is LBFS starting with a new
database
Evaluation (III)
Normalized application times
Key


Execution times were normalized
Measurements made over a cable modem link
with 384 Kb/s uplink and 1.5 Mb/s downlink
LAN data were obtained on a 100 Mb/s full-duplex LAN.
Conclusion


Under normal circumstances, LBFS consumes
90% less bandwidth than traditional network
file systems.
Makes transparent remote file access a viable
and less frustrating alternative to running
interactive programs on remote machines.