NetworkFS-Andrea


Network File Systems II
Frangipani: A Scalable Distributed File System
A Low-bandwidth Network File System
Why Network File Systems?
• Scalability
– support more users and data
– handle server failure gracefully
• Improved accessibility
– allow more users access
– extend conditions under which access is feasible
File System Requirements
• Coherence: consistent, predictable file state
• Efficiency: timely reads and writes
• Security: provide access control
• Recoverability: allow backup of file system
Frangipani and LBFS
• Frangipani file system: transparent scalability
– easy administration at any scale
– takes advantage of parallelism for good performance
• Low Bandwidth File System (LBFS): reduce bandwidth to increase performance
– takes advantage of duplicate file information
– uses caching and compression to limit data volume
Features of Frangipani
• Petal: shared virtual disk
• Frangipani: provides naming and structure for Petal
• Lock system: distributed across servers
• Leases: manage connections with lower state requirements
• Backups: generated from Petal snapshots using the recovery process
An Example Configuration
The Petal Virtual Disk
• Storage read/written in blocks
• Sparse address space: 2^64 bytes
• Physical storage allotted only on write
• Allows replication for high availability
• Read-only snapshot feature
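Below is a minimal Python sketch of the sparse-allocation idea: the 2^64-byte address space is only a namespace, and physical storage is allotted the first time a block is written. The class and the dictionary-backed store are illustrative assumptions, not Petal's actual implementation.

```python
# Sketch of a sparse, block-addressed virtual disk in the spirit of Petal.
# The block size and class name are assumptions made for the example.

BLOCK_SIZE = 64 * 1024              # illustrative block size

class SparseVirtualDisk:
    def __init__(self):
        self.blocks = {}            # block number -> data; only written blocks exist

    def write(self, offset, data):
        # physical storage is allotted only when a block is first written
        self.blocks[offset // BLOCK_SIZE] = data

    def read(self, offset):
        # unwritten regions of the sparse address space read back as zeros
        return self.blocks.get(offset // BLOCK_SIZE, b"\x00" * BLOCK_SIZE)

disk = SparseVirtualDisk()
disk.write(5 * BLOCK_SIZE, b"hello")
print(len(disk.blocks))             # 1: only the written block consumes storage
```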
Frangipani Disk Layout
• Region 1: Disk configuration info (1 TB)
• Region 2: Log space (1 TB), divided into 256
individual server logs
• Region 3: Allocation bitmaps (3 TB), chunks
owned by individual servers
More Frangipani Disk Layout
• Region 4: Inodes (1 TB), 2^31 512-byte inodes
• Region 5: Small data blocks (128 TB), 2^35 blocks at 4 KB each
• Region 6: Large data blocks, 2^24 1 TB blocks
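The arithmetic below is a rough sketch of how a byte offset in the shared virtual disk could be derived from the region sizes listed above; the starting offsets and helper names are inferred for illustration, not taken from Frangipani's code.

```python
# Region start offsets inferred by summing the sizes listed on the slides.
TB = 2**40

REGION_INODES = 5 * TB              # after 1 TB config + 1 TB logs + 3 TB bitmaps
REGION_SMALL  = 6 * TB              # 128 TB of 4 KB blocks follows the inodes
REGION_LARGE  = 134 * TB            # remainder: 2^24 blocks of 1 TB each

INODE_SIZE = 512

def inode_offset(inode_number):
    """Byte offset of an inode in the shared virtual disk (illustrative)."""
    assert 0 <= inode_number < 2**31
    return REGION_INODES + inode_number * INODE_SIZE

def small_block_offset(block_number):
    """Byte offset of a 4 KB small data block (illustrative)."""
    assert 0 <= block_number < 2**35
    return REGION_SMALL + block_number * 4096

print(hex(inode_offset(1)))         # 0x50000000200
```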
Frangipani Server Logs
• Bounded: 128 KB, split across physical disks
• Circular buffer scheme: 25% reclaimed when full
• Uses sequence numbers to mark wrap point
• 1000 to 1600 operations can be held in the log
(entry size 80 to 128 bytes)
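As a rough illustration of the bounded, circular log described above, here is a small sketch in which each record carries a monotonically increasing sequence number (so the wrap point can be found) and the oldest quarter of entries is reclaimed when the log fills; the capacity and names are assumptions.

```python
LOG_CAPACITY = 1024                  # on the order of the 1000-1600 entries above

class CircularLog:
    def __init__(self):
        self.records = []            # (sequence number, payload) pairs
        self.next_seq = 0

    def append(self, payload):
        if len(self.records) == LOG_CAPACITY:
            # reclaim the oldest 25% of the log when it fills
            self.records = self.records[LOG_CAPACITY // 4:]
        self.records.append((self.next_seq, payload))
        self.next_seq += 1

    def newest(self):
        # the highest sequence number marks the logical end of the log;
        # on disk this is how the wrap point is located
        return max(self.records, key=lambda r: r[0]) if self.records else None
```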
Server Logging
• Write-ahead redo policy
• File metadata and physical file data updated on disk after the log write
• A Unix daemon handles disk writes every 30 seconds
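The ordering is the point: the redo record must reach the log before the metadata it describes is updated in place. A minimal sketch of that ordering, assuming the CircularLog above and a hypothetical apply_to_disk helper:

```python
class LoggingServer:
    def __init__(self, log):
        self.log = log               # e.g. the CircularLog sketched above
        self.dirty = []              # updates logged but not yet written in place

    def update_metadata(self, record):
        self.log.append(record)      # 1. the redo record hits the log first
        self.dirty.append(record)    # 2. the in-place write may be deferred

    def flush(self):
        """Stand-in for the daemon-driven disk write done every ~30 seconds."""
        for record in self.dirty:
            apply_to_disk(record)    # hypothetical in-place metadata write
        self.dirty.clear()

def apply_to_disk(record):
    pass                             # placeholder for the real disk write
```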
Lock Service
• Multiple-reader/single-writer “sticky” locks
• Asynchronous communication
• Lamport’s Paxos algorithm replicates
infrequently-changed data
• Heartbeat messages determine liveness
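A sketch of the “sticky” behavior from one server's point of view, assuming a hypothetical lock service with grant/returned calls: the lock stays cached after release and is only given back when the service revokes it for another server.

```python
class StickyLock:
    def __init__(self, lock_service, name):
        self.service = lock_service
        self.name = name
        self.held_locally = False    # cached ("sticky") after release

    def acquire(self):
        if not self.held_locally:
            self.service.grant(self.name)   # round trip only on a cache miss
            self.held_locally = True

    def release(self):
        pass                         # keep the lock cached locally

    def revoke(self):
        # called when another server needs a conflicting lock
        self.held_locally = False
        self.service.returned(self.name)
```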
Locking: Avoiding Contention
• A single lockable data structure per disk sector eliminates false sharing
• Each file, directory, or symlink and its inode treated as a single lockable segment
• Locks are acquired in a defined order to avoid deadlock (see the sketch below)
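One common way to realize such an algorithm is to acquire every lock in a single global order, sketched here as sorting by lock identifier; acquire and release are placeholders supplied by the caller.

```python
def with_locks(lock_ids, acquire, release, operation):
    ordered = sorted(lock_ids)       # one global order prevents wait cycles
    taken = []
    try:
        for lock_id in ordered:
            acquire(lock_id)
            taken.append(lock_id)
        return operation()
    finally:
        for lock_id in reversed(taken):
            release(lock_id)
```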
Crash Recovery
• Detection of server crash based on lapsed leases or no network response
– Recovery daemon takes ownership of the crashed server's log and locks
– Metadata sequence numbers prevent replay of already-applied updates
No high-level semantic guarantee to users!
A Petal snapshot can be used for whole-system recovery
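A sketch of how sequence (version) numbers can keep recovery from replaying updates that already reached the disk; the record and block fields are illustrative, not Frangipani's actual formats.

```python
def recover(log_records, metadata_store):
    """Replay a dead server's log against metadata keyed by block number."""
    for record in log_records:                     # records in log order
        block = metadata_store[record["block"]]
        if block["version"] < record["version"]:
            block["data"] = record["new_data"]     # update was lost: redo it
            block["version"] = record["version"]
        # otherwise the disk already reflects this update: skip the replay

store = {7: {"version": 3, "data": b"old"}}
recover([{"block": 7, "version": 4, "new_data": b"new"}], store)
print(store[7]["version"])                         # 4
```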
Performance Benchmarks
Connectathon Benchmark
[Chart: elapsed time in seconds (0 to 3.5) for Connectathon operations (statfs, symlink/readlink, link/rename, readdir, read, write, chmod/stat, getwd/stat, rm, create), comparing AdvFS and Frangipani, each with raw disk and NVRAM configurations]
Frangipani: Conclusions
• Frangipani meets the goals set for it:
– coherent access
– easy administration
– scalable performance (limit is network itself)
– good failure recovery
Testing on a larger scale will be the true
test of Frangipani
Introduction to LBFS
• Designed for efficient remote file access
over low bandwidth networks
– Exploits similarities between files and file
versions
– Client maintains a large cache of working files
– Compression further reduces data volume
– Uses NFS protocol for access control and
access to existing file systems
Why Do We Need LBFS?
• Typical network file systems designed for
10 Mbit/sec or better bandwidth
• Problems using a network FS over a WAN:
– interactive programs that freeze
– batch commands that run several times slower
– less aggressive applications are starved
– some applications may not run at all!
Why Do We Need LBFS? (continued)
• Downloading and editing files locally can
lead to version conflicts
• Upstream bandwidth is still limited with
broadband
LBFS eliminates these
problems while still
preserving consistency
LBFS File Chunk Scheme
In order to exploit commonality, files need to
be broken into chunks
• Server and client keep index of
hashed chunks
– Server index has chunk hashes for
entire FS
– Client index has chunk hashes for
working files
Chunk Creation Algorithm
• Need to handle shifting offsets while keeping the chunk index manageable
– Examine every overlapping 48-byte region of the file
– With probability 2^-13, consider a region to be a breakpoint, or file chunk end marker
Rabin Fingerprints
• Rabin fingerprints help find breakpoints
– Polynomial representation of the data modulo an irreducible polynomial
– When the low 13 bits of a region's fingerprint equal a chosen value, the region is selected as a breakpoint
– Given random data, the expected chunk size is 2^13 = 8192 bytes = 8 KB, plus the 48-byte breakpoint region
File Revisions With Breakpoints
a. Original file
b. Text Insertion
c. Insert that includes breakpoint
d. Elimination of a breakpoint
Breakpoint Pathological Cases
Data is usually not random! Worst-case scenarios:
– All 48-byte regions are breakpoints: the chunk index is as large as the file
– No 48-byte regions are breakpoints: large chunks take extra time and memory for RPC
Solution: define bounds:
– min chunk size = 2 KB
– max chunk size = 64 KB
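Putting the pieces together, here is a sketch of content-defined chunking with those bounds. LBFS computes Rabin fingerprints over every 48-byte window; the SHA-1 of the window used below is only a stand-in so the example stays short, and the 13-bit magic value is arbitrary.

```python
import hashlib

WINDOW = 48
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
MAGIC = 0x0ABC                        # arbitrary 13-bit breakpoint value

def is_breakpoint(window_bytes):
    # stand-in for a Rabin fingerprint of the 48-byte window
    digest = hashlib.sha1(window_bytes).digest()
    return (int.from_bytes(digest[:2], "big") & 0x1FFF) == MAGIC

def chunk(data):
    chunks, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        size = end - start
        if size >= MAX_CHUNK or (size >= MIN_CHUNK and
                                 is_breakpoint(data[end - WINDOW:end])):
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])   # final partial chunk
    return chunks

print([len(c) for c in chunk(b"x" * 200_000)])   # max bound caps every chunk
```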
The Chunk Database
• Each chunk indexed by the first 64 bits of its SHA-1 hash
• Keys map to <file, offset, count> tuples: must be updated when a chunk changes
• LBFS always recomputes the hash value before use
– hash collisions are detected
– the penalty for stale database entries is only a performance hit
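A sketch of that database, assuming an on-disk file store: keys are the first 64 bits of the SHA-1 hash, values are (file, offset, count) tuples, and the bytes are re-read and re-hashed before they are trusted.

```python
import hashlib

class ChunkDB:
    def __init__(self):
        self.index = {}                         # 64-bit key -> (path, offset, count)

    @staticmethod
    def key(chunk_bytes):
        return hashlib.sha1(chunk_bytes).digest()[:8]   # first 64 bits of SHA-1

    def insert(self, chunk_bytes, path, offset):
        self.index[self.key(chunk_bytes)] = (path, offset, len(chunk_bytes))

    def lookup(self, wanted_key):
        """Return the chunk bytes only if they still hash to the key."""
        entry = self.index.get(wanted_key)
        if entry is None:
            return None
        path, offset, count = entry
        with open(path, "rb") as f:             # always recompute before use
            f.seek(offset)
            data = f.read(count)
        # a stale entry or a hash collision shows up here as a mismatch
        return data if self.key(data) == wanted_key else None
```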
Benefits Provided by NFS 3
• NFS 3 identifies files by opaque handles that persist through file renaming
• Handles access control for LBFS
• Allows LBFS to use the NFS protocol to access existing file systems
• Disadvantage: the i-number does not change when a file is overwritten, so an extra copy is required
LBFS Protocol Enhancements
• Leases save permission checks and data validation for recently accessed files
• Uses RPC, but with aggressive pipelining
• Gzip compression
Maintaining File Consistency
• Close-to-open consistency
• Client needs whole-file cache
• Multiple processes on a single client are
allowed write access to same file
simultaneously
– LBFS writes back to file system on each close
– Last close overwrites previous changes
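A sketch of that consistency model from the client's side, with a hypothetical server offering read_file/write_file: the whole file sits in the client cache while open, every close writes it back, and the last close wins.

```python
class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}                  # path -> whole-file contents

    def open(self, path):
        if path not in self.cache:
            self.cache[path] = self.server.read_file(path)   # whole-file fetch
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data          # stays local until close

    def close(self, path):
        # write back on every close; the last close overwrites earlier changes
        self.server.write_file(path, self.cache[path])
```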
Profile of a Read Request
Profile of a Write Request
Security: One Concern
• It is possible, through systematic use of the CONDWRITE RPC call, to determine whether a particular hashed chunk exists in the file system: given away by response time variations
LBFS Server Implementation
• LBFS can run on top of another FS
– server pretends to be an NFS client
• Server creates a .lbfs.trash directory at the root of every exported file system
– stores temp files indefinitely and garbage-collects a random file when full
LBFS Client Implementation
• Client uses xfs device driver
– passes messages through device node in /dev
– xfs tells LBFS when to transfer file contents to/from
server
– LBFS fetches files to client cache, notifies xfs driver of
bindings between cache contents and open files
LBFS Performance Testing
LBFS consumed far less bandwidth and
allowed better application performance
under test conditions
– Workloads tested were typical uses of MS Word, gcc, and ed
– CIFS, NFS, and AFS were tested (based on
workload) for comparison
– Also tested a “Leases and Gzip” only version
LBFS: Conclusions
• In low-bandwidth networks, LBFS outperforms the traditional file systems tested
– similar consistency guarantees
– implemented as transparent layer on top of an
existing file system
– public key cryptography provides security
– client caching distributes load and reduces network dependency
Last Word: Frangipani & LBFS
• Both Frangipani and LBFS meet file system
and distributed system requirements, but
targeted different problems:
– Frangipani achieved transparent scalability
without performance loss
– LBFS achieved feasible performance over
WANs as a transparent add-on to a traditional
FS using improved protocols and load sharing