Computer Science
EMFS: Email-based Personal Cloud Storage
NAS 2011
Jagan Srinivasan, Wei Wei, Xiaosong Ma, Ting Yu
Agenda
Introduction
Data Organization and Access
Email-based File System Design
Performance Evaluation
Related Work
Conclusion
Motivation
Existing personal cloud storage services
o Tie storage to internal data formats and processing applications
o General-purpose storage is not free and is not widely used
Existing email services
o The capacity of a single email account has increased dramatically
o Provided by many reliable and reputable online service providers
Leveraging existing email services
o Benefits service providers by extending their access to valuable customer data
EMFS Overview
Target Workload and Assumptions
o Typical personal workload
Reading, editing, and backing up documents such as Word, pdf, etc.
Targets file sizes ranging from several KBs to tens of MBs
o Users will not share storage with others or allow concurrent access to their data
Design Goals
o Usability (generic file system interface)
o Scalability (extensible personal storage space)
o Reliability (access despite single email failure)
EMFS System Architecture
o Email File System Interface through FUSE
o Memory Cache
o Email Mapping Service
o Local Cache
o Email Cloud Storage Interface
[Figure: architecture diagram; labels show striping and replication across email accounts]
Data Organization and Access
File Organization
o Metadata
o File Data stored as attachments or in the body of emails
Data Organization and Access cont’d
Metadata and Data Access
o Client cache management
o Metadata update
o Data access operations
Consistency and Failure Recovery
o Adopt a mechanism to ensure the atomicity of updates
[Figure: failure cases. (a) Lost metadata update. (b) Lost part of data update.]
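The slides do not say which mechanism EMFS adopts for atomic updates. As a purely illustrative sketch, one common approach is to upload the new data blocks first and publish the metadata last, so that a failure partway through leaves the previous version readable. The `FakeMailbox` class, the subject naming scheme, and the `fail_after` parameter are all hypothetical.

```python
class FakeMailbox:
    """In-memory stand-in for one email account (hypothetical)."""

    def __init__(self):
        self.messages = {}  # subject line -> payload

    def append(self, subject, payload):
        self.messages[subject] = payload


def write_file(mailbox, name, version, blocks, fail_after=None):
    """Upload every data block, then publish metadata as the commit point."""
    block_ids = []
    for i, block in enumerate(blocks):
        if fail_after is not None and i >= fail_after:
            # Simulates losing part of a data update.
            raise ConnectionError("simulated failure during data upload")
        block_id = f"{name}.v{version}.b{i}"  # assumed naming scheme
        mailbox.append(block_id, block)
        block_ids.append(block_id)
    # The new version becomes visible only once all blocks are stored.
    mailbox.append(f"meta:{name}", {"version": version, "blocks": block_ids})


def read_file(mailbox, name):
    """Read the file through whatever metadata was last committed."""
    meta = mailbox.messages[f"meta:{name}"]
    return b"".join(mailbox.messages[b] for b in meta["blocks"])
```

With this ordering, an interrupted write leaves orphaned blocks behind but never metadata that points at missing data.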
Email Protocol Selection
Simple Mail Transfer Protocol (SMTP)
o Only used for transferring emails to the server
o Restriction on number of messages sent through SMTP
Internet Message Access Protocol (IMAP)
o Supports both sending and retrieving messages
o Allows users to “append” a message to their own mailbox
o Not limited by traffic restrictions
Post Office Protocol (POP)
o Primarily used for retrieving emails
o Supports simple download-and-delete access pattern
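The IMAP "append" operation mentioned above can be sketched with Python's standard `imaplib`. This is a minimal illustration and not the EMFS code: the function name `append_message` and the subject value are assumptions. The connection is passed in so the function can be tried against a stub; with a real server it would be an `imaplib.IMAP4_SSL` object that has already logged in.

```python
import time
from email.message import EmailMessage


def append_message(conn, mailbox, subject, body):
    """Store a message directly in `mailbox` via IMAP APPEND.

    `conn` is expected to behave like imaplib.IMAP4: its append() takes
    (mailbox, flags, date_time, message) and returns (status, data).
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body)
    # imaplib converts the float timestamp into an IMAP INTERNALDATE string.
    status, _data = conn.append(mailbox, None, time.time(), msg.as_bytes())
    return status == "OK"


# Against a real account (host and credentials are placeholders):
#   import imaplib
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login("user@example.com", "app-password")
#   append_message(conn, "INBOX", "emfs-block-0", "payload goes here")
```

Because the message is written straight into the user's own mailbox, no SMTP delivery takes place.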
Email Protocol Selection cont’d
Email sending and appending performance
o IMAP is faster than SMTP in almost all cases, by 5.5% on average and up
to 42.64%
Data Placement Within Emails
Multiple places used to store data in an email
o Headers
o Subject line
o Body
o Attachment
In EMFS
o Metadata is stored in the body section
o The unique identifiers are stored in the subject line
o Data can be stored either as attachments or in the body
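As a rough illustration of this layout, the sketch below packs one data block into an email with the identifier in the subject line and the data as an attachment, then recovers both. It uses Python's standard `email` package; the function names, the identifier format, and the placeholder body text are assumptions, not the EMFS wire format.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def pack_block(block_id, data):
    """Build a raw email: identifier in the subject, data as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = block_id
    msg.set_content("EMFS data block")  # placeholder body text (assumed)
    msg.add_attachment(
        data,
        maintype="application",
        subtype="octet-stream",
        filename=block_id,
    )
    return msg.as_bytes()


def unpack_block(raw):
    """Return (identifier, data) from a message built by pack_block."""
    msg = message_from_bytes(raw, policy=policy.default)
    attachment = next(msg.iter_attachments())
    return str(msg["Subject"]), attachment.get_content()
```

Storing the payload in the body instead would replace the `add_attachment` call with an encoded `set_content` call.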
Data Placement Within Emails cont’d
Single email sending/retrieving performance
o Similar performance regardless of whether the payload is placed in the
body or the attachment
o Attachment payload slightly outperforms the body payload with Gmail
Block Size and File Striping
Organize email accounts as a RAID
o Each account identified by a “RAID Index” from 0 to n-1
o Data blocks striped across email accounts
o Blocks stored on randomly chosen disks instead of having a fixed array
of email disks and striping data in a round-robin manner
o Metadata emails are usually small, so they are not striped
EMFS uses 512KB as its default block size and 8 as the default
stripe width
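The block splitting and random placement described above can be sketched as follows. The 512KB default comes from the slide. The function names are hypothetical, and picking each account independently and uniformly is an assumption, since the slides only say the disks are "randomly chosen".

```python
import random

BLOCK_SIZE = 512 * 1024  # default block size stated on the slide


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, num_accounts, rng=None):
    """Pick a RAID index in [0, num_accounts) for each block at random."""
    rng = rng or random.Random()
    return [rng.randrange(num_accounts) for _ in range(num_blocks)]
```

A 4MB file yields eight 512KB blocks, and `place_blocks(8, n)` returns one RAID index per block that the mapping service would then have to record.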
Block Size and File Striping cont’d
Figure 5 measures a 4MB file’s read/write latency
o File access latency steadily decreases when we increase the file block
(attachment) size, for both Gmail and Gawab Mail
Block Size and File Striping cont’d
Figures 6 and 7 show the effect of striping with different block
sizes
o Striping provides a significant performance improvement
o Increasing the stripe width beyond 8 or the block size beyond 1MB does
not help the performance
o Block sizes smaller than 256KB degrade performance in almost all cases
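One way to picture why striping helps is to fetch several blocks at once. The sketch below treats the stripe width as the number of concurrent transfers, which is an assumption. `read_striped` and `fetch_block` are hypothetical names, and the slides do not say whether EMFS uses threads.

```python
from concurrent.futures import ThreadPoolExecutor


def read_striped(block_ids, fetch_block, stripe_width=8):
    """Fetch blocks with up to `stripe_width` transfers in flight.

    `fetch_block` is any callable mapping a block id to bytes, for example
    a function that downloads one email attachment.
    """
    with ThreadPoolExecutor(max_workers=stripe_width) as pool:
        # map() yields results in input order, so the file reassembles correctly.
        return b"".join(pool.map(fetch_block, block_ids))
```

Raising `stripe_width` helps only while the network or the provider can sustain more parallel transfers, which fits the plateau reported beyond a width of 8.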
Data Replication
Replication group
o Consists of two or more disks mirroring the same data
o Updates written to one of the email disks within the group
o Email disks (accounts) can be added or removed from a group
Replication Strategies
o Read-one and Write-one
All reads and writes from EMFS go to the same email account
o Read-fast and Write-fast
Reads and writes go to different accounts based on their uploading
and downloading performance
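The two strategies can be contrasted in a small sketch. The slides do not say how EMFS measures upload and download performance, so the moving-average throughput tracking below is an assumption, as are the class and method names.

```python
class ReplicationGroup:
    """Tracks per-account throughput for a group of mirrored email accounts."""

    def __init__(self, accounts):
        self.accounts = list(accounts)
        self.rates = {
            "upload": {a: 0.0 for a in self.accounts},
            "download": {a: 0.0 for a in self.accounts},
        }

    def record(self, account, direction, nbytes, seconds, alpha=0.5):
        """Fold one observed transfer into a moving average (bytes/second)."""
        sample = nbytes / seconds
        old = self.rates[direction][account]
        new = sample if old == 0.0 else alpha * sample + (1 - alpha) * old
        self.rates[direction][account] = new

    def pick_one(self):
        """Read-one/Write-one: always use the same account."""
        return self.accounts[0]

    def pick_fast(self, direction):
        """Read-fast/Write-fast: use the account with the best recent rate."""
        table = self.rates[direction]
        return max(self.accounts, key=table.__getitem__)
```

Under this model, reads and writes can land on different accounts when one provider uploads faster and another downloads faster.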
EMFS Evaluation
System Implementation
o Prototype is based on FUSE
o Implemented in around 3000 lines of Python code
o Two replication strategies implemented for comparison
What we do
o Compare EMFS with three existing distributed file systems
o Use Postmark, IOZone, and a synthetic file access benchmark
Experiment Setup
o Dual-core desktop (2.66 GHz) with 3 GB of RAM running Ubuntu 8.10
o Both NFS and AFS servers were configured on dedicated machines
inside the campus network
o Jungle Disk was configured such that background or asynchronous
transfers were disabled
o EMFS was configured using accounts from Gmail and Gawab Mail
Performance Results – Postmark
Postmark measures performance for network based systems by
simulating access on short lived small files
Generate different workloads (equal bias, read heavy, append
heavy, and create heavy) by varying the operation bias
Settings
o 200 files
o File sizes range from 4KB to 16MB
o 200 transactions
Results
o AFS and NFS perform better than EMFS and Jungle Disk
o EMFS offers comparable performance to Jungle Disk
o EMFS-Fast offers better performance than EMFS-One
Performance Results – IOZone
Unlike Postmark, IOZone mainly focuses on file data access
Settings
o 16 MB file
o Request sizes range from 128 KB to 4 MB
Results
o AFS and Jungle Disk achieve a transfer rate between 25 and 50 MB/s for sequential read
o EMFS reports very high transfer rates
o Jungle Disk reports very low throughput (about 550-600 KB/s) for random reads
Performance Results – IOZone cont’d
Settings
o 16 MB file
o Request sizes range from 128 KB to 4 MB
Results
o EMFS is slightly better than Jungle Disk in terms of write throughput
o NFS and AFS are faster due to their high file transfer performance and low overhead
Performance Results – Editing Workload
A synthetic benchmark that simulates a document editing task
Settings
o 100 files, 14 directories (with a maximum depth of 3)
o File sizes range from 8KB to 4MB
Results
o Lookup operations for AFS are lightning fast
o EMFS-Prefetch helps reduce the total lookup time by 17.4%
o All systems perform nearly the same for editing operations
o EMFS-Fast improves the file save operation by 31%, which is quite close to Jungle Disk
Related Work
Email-based file systems
o GmailFS [http://sr71.net/projects/gmailfs/]
o YaFS [Lu, et al., IPDPS 2009]
o Free email accounts for data backup [Traeger, et al., StorageSS 2006]
o EMFS systematically examines email-based file system design issues
Other existing client-server systems
o LftpFS [http://lftpfs.sourceforge.net/]
o ExpanDrive [http://en.wikipedia.org/wiki/ExpanDrive]
o EMFS enables users to take advantage of widely available and increasingly powerful web-based email services
Distributed file systems
o NFS [Pawlowski, et al., USENIX 1994], AFS [Howard, et al., ACM Trans 1988], LBFS [Muthitacharoen, et al., SOSP 2001], GFS [Ghemawat, et al., SOSP 2003], and Ceph [Weil, et al., OSDI 2006]
o EMFS complements existing studies on distributed file/storage systems
Conclusion
To the best of our knowledge, our work is the first that systematically
examines email-based file system design issues, and
thoroughly
Contributions
o Provides a personal cloud storage solution on top of multiple web-based
free email accounts
o Implements a prototype based on FUSE
o Evaluates the effectiveness of features such as multi-account space
aggregation, file striping, and data replication
•Questions?
•Thank you