Ivy: A Read/Write Peer-to-Peer File System Athicha

Ivy: A Read/Write
Peer-to-Peer File System
A. Muthitacharoen, R. Morris, T. M. Gil, and B. Chen
In Proceedings of OSDI ’02
2003-4-29
Presenter : Chul Lee
What is IVY?
• A multi-user read/write peer-to-peer file
system
• No centralized/dedicated components
• Single file system image
• Conventional file system interface
– Case study of DHT use!
Ivy uses DHT
[Figure: layering of Ivy on a DHT]
• Distributed application (Ivy) calls put(key, data) and get(key) → data on the
distributed hash table (DHash)
• DHash calls lookup(key) → node IP address on the lookup service (Chord)
• DHT provides
– Simple API
• put(key, value) and get(key) → value
– Availability (Replication)
– Robustness (Integrity checking)
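The put/get interface and the integrity check can be illustrated with a small sketch. This is not DHash itself: ToyDHT is a hypothetical single-process stand-in, the helper names are made up for illustration, and the use of SHA-1 as the content hash is an assumption of this sketch.

```python
import hashlib


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT (not the real DHash)."""

    def __init__(self):
        self._blocks = {}

    def put(self, key: bytes, data: bytes) -> None:
        self._blocks[key] = data

    def get(self, key: bytes) -> bytes:
        return self._blocks[key]


def put_content_hash(dht: ToyDHT, data: bytes) -> bytes:
    """Store a block under the hash of its contents and return the key."""
    key = hashlib.sha1(data).digest()  # assumption: SHA-1, 160-bit keys
    dht.put(key, data)
    return key


def get_verified(dht: ToyDHT, key: bytes) -> bytes:
    """Fetch a block and check that it still hashes to its key."""
    data = dht.get(key)
    if hashlib.sha1(data).digest() != key:
        raise ValueError("block failed integrity check")
    return data


dht = ToyDHT()
key = put_content_hash(dht, b"hello")
print(get_verified(dht, key))
```

Because the key is derived from the data, a reader can detect a block that an untrusted node has altered without trusting that node.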
Prob.: Shared Data w/ DHT
[Figure: a file system tree stored as blocks on DHT nodes across the Internet: Root Inode, Directory Block, File1 Inode, File2 Inode, File3 Inode, File3 Data]
Challenges
• Consistency of file system meta-data
• Locking is unattractive when participants
are unreliable
• Undo modifications by untrustworthy
participants
• Operate while partitioned, repair
conflicting updates
Solution: Log Based
• Update: Each participant maintains a
log of changes to the file system
• Lookup: Each participant scans all logs
Software Structure
• Local NFS loop-back server
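[Figure: App (user level) issues system calls to the NFS Client (kernel); the NFS Client sends NFS RPCs to the Ivy Server; the Ivy Server uses a DHT Node, which reaches the other DHT Nodes over the Internet]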
Example: Using Log
Local NFS Client → Local Ivy Server (request ⇒ reply):
LOOKUP(“d”, I-Num=10) ⇒ I-Num=1000
CREATE(“aaa”, I-Num=1000) ⇒ I-Num=9956
WRITE(“hello”, 0, I-Num=9956) ⇒ OK
• echo hello > d/aaa
• LOOKUP finds the I-Number of directory “d”
• CREATE creates file “aaa” in directory “d”
• WRITE writes “hello” at offset 0 in file “aaa”
Using Log: File Creation
Type: Create, I-num: 9956
…
Type: Link, Dir I-num: 1000, File I-num: 9956, Name: “aaa”
Type: Write, I-num: 9956, Offset: 0, Data: “hello”
Log Head
• A log record describes a change to the file system
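A minimal sketch of how the records for `echo hello > d/aaa` might be appended to one participant's log. It assumes the log head refers to the most recent record and each record refers to the previous one; the Record and Log classes and the field names are hypothetical, and the records live in memory rather than in a DHT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One immutable log record; `prev` is the next-older record."""
    rtype: str
    fields: dict
    prev: Optional["Record"] = None


class Log:
    """A single participant's log; only the head pointer ever changes."""

    def __init__(self):
        self.head = None

    def append(self, rtype, **fields):
        # The new record points back at the old head and becomes the head.
        self.head = Record(rtype, fields, self.head)
        return self.head

    def newest_first(self):
        rec = self.head
        while rec is not None:
            yield rec
            rec = rec.prev


log = Log()
log.append("Create", inum=9956)
log.append("Link", dir_inum=1000, file_inum=9956, name="aaa")
log.append("Write", inum=9956, offset=0, data=b"hello")
print([r.rtype for r in log.newest_first()])  # ['Write', 'Link', 'Create']
```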
Using Log: Lookup
Type: Link, Dir I-num: 1000, File I-num: 9956, Name: “aaa”
Type: Link, Dir I-num: 1000, File I-num: 9876, Name: “bbb”
Type: Remove, Dir I-num: 1000, Name: “aaa”
• A scan follows the log backwards in time
• LOOKUP(name, dir I-num): last Link, but stop at Remove
• READDIR(dir I-num): accumulate Links, minus Removes
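The two scan rules above can be sketched over a single log as follows. The records are plain dicts listed newest first; the assumption that Remove is the most recent of the three records on the slide, and the field names, are illustrative choices of this sketch.

```python
def lookup(records_newest_first, name, dir_inum):
    """Return the i-number of the latest Link for name, or None if removed or absent."""
    for rec in records_newest_first:
        if rec["type"] not in ("Link", "Remove"):
            continue
        if rec["dir_inum"] != dir_inum or rec["name"] != name:
            continue
        # The first matching record seen going backwards in time decides.
        return rec["file_inum"] if rec["type"] == "Link" else None
    return None


def readdir(records_newest_first, dir_inum):
    """Accumulate Links for a directory, skipping names removed more recently."""
    removed, entries = set(), {}
    for rec in records_newest_first:
        if rec["type"] not in ("Link", "Remove") or rec["dir_inum"] != dir_inum:
            continue
        name = rec["name"]
        if name in removed or name in entries:
            continue  # a newer record already settled this name
        if rec["type"] == "Remove":
            removed.add(name)
        else:
            entries[name] = rec["file_inum"]
    return entries


records = [  # newest first (assumed order)
    {"type": "Remove", "dir_inum": 1000, "name": "aaa"},
    {"type": "Link", "dir_inum": 1000, "file_inum": 9876, "name": "bbb"},
    {"type": "Link", "dir_inum": 1000, "file_inum": 9956, "name": "aaa"},
]
print(lookup(records, "aaa", 1000))  # None
print(lookup(records, "bbb", 1000))  # 9876
print(readdir(records, 1000))        # {'bbb': 9876}
```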
Contributions
• Multi-user read/write peer-to-peer
storage system
• Distributed file system with useful
integrity properties based on untrusted
components
• Use of distributed hash tables as a
building block
Design
• DHash – maps keys to arbitrary values
• Log Data Structure – a linked list
• View – a set of logs
• Combining logs – ordering records across logs
• Snapshot – state of the file system
Log Data Structure
• A linked list of immutable log records
Log record types
• Roughly NFS update operations
• 160-bit i-numbers serve as file handles
User Cooperation: Views
• Set of logs that comprise the file system
• View block
– an immutable DHash content-hash block
Combining Logs
• Ivy orders records using version vectors
• Seq. field – starts from zero for each log
• Version vector: tuple (U:V) for each log
– U: DHash key of the log-head
– V: Sequence number of the most recent record
• Example: (A:5 B:7)
– < (A:6 B:7), but concurrent with (A:6 B:6)
• Public keys are used to order records that
are concurrent
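The comparison in the example can be sketched as below. The vectors are dicts from a log identifier (A, B stand in for log-head keys) to a sequence number. Treating a log missing from a vector as -1, and ordering the smaller public key first on a tie, are assumptions of this sketch; the slides do not specify either.

```python
def compare(a: dict, b: dict) -> str:
    """Classify version vector a relative to b."""
    logs = set(a) | set(b)
    # Assumption: a log absent from a vector counts as -1 (no records seen),
    # since sequence numbers start from zero.
    a_le_b = all(a.get(k, -1) <= b.get(k, -1) for k in logs)
    b_le_a = all(b.get(k, -1) <= a.get(k, -1) for k in logs)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def ordered_first(a_vec, a_pubkey, b_vec, b_pubkey) -> bool:
    """True if the record with a_vec is ordered before the one with b_vec."""
    rel = compare(a_vec, b_vec)
    if rel == "before":
        return True
    if rel == "after":
        return False
    # Concurrent: fall back to the writers' public keys.
    # Assumption: the smaller key sorts first.
    return a_pubkey < b_pubkey


print(compare({"A": 5, "B": 7}, {"A": 6, "B": 7}))  # before
print(compare({"A": 5, "B": 7}, {"A": 6, "B": 6}))  # concurrent
```

Since every participant applies the same deterministic tie-break, all of them arrive at the same total order of records without coordinating.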
Snapshots
• Each Ivy participant constructs a private
snapshot for speed
• Contains the entire state of the file
system
• Each snapshot is stored in DHash as
content-hash blocks for persistence
Snapshot Data Structure
Application Semantics
• Concurrent Updates
• Partitioned Updates/ Conflict Resolution
Concurrent Updates
• Ivy does not serialize all updates
• Problem
– Unlink(“a”) and rename(“a”, “b”) at the same
time
– Ivy correctly lets only one take effect
– But it may return “success” status for both
Partitioned Updates
• Ivy is not directly aware of partitions
– Ivy’s design maximizes availability at the
expense of consistency
– Letting updates proceed in all partitions
• All updates during a partition are
concurrent updates
• Conflict resolution -> “lc” tools
WAN Evaluation on MAB
• Modified Andrew Benchmark
• 4 DHash nodes
• Round-trip times: 9, 16, 82 milliseconds
• No DHash replication
• 4 logs
• One active writer
WAN Performance
Phase     Ivy (s)   NFS (s)
Mkdir        11.2       4.8
Write        89.2      42.0
Stat         65.6      47.8
Read         65.8      55.6
Compile     144.2     130.2
Total       376.0     280.4
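To read the table as relative slowdown, the per-phase ratios can be computed directly from the reported figures. The snippet below is only arithmetic on the numbers above; the variable names are illustrative.

```python
phases = ["Mkdir", "Write", "Stat", "Read", "Compile"]
ivy = [11.2, 89.2, 65.6, 65.8, 144.2]
nfs = [4.8, 42.0, 47.8, 55.6, 130.2]

# Per-phase slowdown of Ivy relative to NFS.
ratios = {p: i / n for p, i, n in zip(phases, ivy, nfs)}
for phase, ratio in ratios.items():
    print(f"{phase:8s} {ratio:.2f}x")

# The phase times add up to the reported totals (376.0 and 280.4).
total_ratio = sum(ivy) / sum(nfs)
print(f"{'Total':8s} {total_ratio:.2f}x")
```

Overall Ivy takes roughly 1.34 times as long as NFS on this run (376.0 / 280.4), with the largest relative gaps in the Mkdir and Write phases.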
Summary
• Exploring use of DHTs as a building
block
• Case study of DHT use: Ivy
– Read/write peer-to-peer file system
• Suitable for small groups of cooperating
participants who do not have a single
central server
Critiques
• Logs grow indefinitely
• Scanning all logs for each request
• Relies on the DHT’s block availability and
robustness
Discussion
• DHT interface
~ Disk sector read/write interface
• Performance vs. semantics
• Any other applications of DHTs?
– DB, LDAP server…