The design and implementation of a log-structured file system

M. Rosenblum and J.K. Ousterhout
Proceedings of the 13th ACM Symposium on
Operating Systems Principles, December 1991
Things have changed …

CPU speeds have increased

Memories have become larger (cheaper)

Disk capacity has increased, but …
- disk performance has not kept pace
- dominated by seek & rotational latency
CS533 - Concepts of Operating Systems
Consequences …

Applications are disk-bound

File systems have large memory caches
- most read requests hit in cache
- so they never get to the disk
- but all writes must eventually go to disk!

Disk traffic is mostly writes!
- but data placement is optimized for reads!
Why the poor performance?
• Data is updated in place
- can’t just write where the disk head is
• Meta-data is updated synchronously
- Even if data and meta-data blocks are clustered,
there is still some seeking
Seek overhead in FFS
Creating a new file in FFS requires 5 disk I/Os:
- 2 for the file i-node
- 1 for the file data
- 2 for the directory i-node and data
With small files, most of the time is spent seeking
Log-structured file systems
• Buffer a series of writes in memory
and write them asynchronously to disk
• Entire buffer copied to disk
- in a single write to a contiguous segment
- includes data and meta-data
• Allocate a new version instead of
updating the old one in place:
- All info on disk is in a single sequential
structure: the log
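The buffering idea above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `Log`, the constant `SEGMENT_BLOCKS`, and the tuple-shaped blocks are all hypothetical, and a Python list stands in for the disk.

```python
SEGMENT_BLOCKS = 8  # hypothetical, tiny segment size chosen for illustration


class Log:
    """Toy append-only log: a list of segments, each a list of blocks."""

    def __init__(self):
        self.segments = []  # stands in for the on-disk log
        self.buffer = []    # dirty blocks accumulated in memory

    def write(self, block):
        """Buffer a block; flush once a full segment has accumulated."""
        self.buffer.append(block)
        if len(self.buffer) >= SEGMENT_BLOCKS:
            self.flush()

    def flush(self):
        """Copy the whole buffer to the log as one contiguous segment."""
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()


log = Log()
for name in ["a", "b", "c", "d"]:
    # Data and meta-data travel together in the same segment.
    log.write(("data", name))
    log.write(("inode", name))
```

Creating four small files here produces a single segment write rather than many scattered I/Os; nothing is ever updated in place.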
Challenges for Log-structured FS
1) How to retrieve information from the log?
2) How to make sure there are large extents
of free space available for writing
contiguous log segments?
File location and reading
Basic data structures analogous to Unix FFS:
- one inode per file
- contains attributes and the addresses of the first 10
blocks, plus indirect blocks for larger files
But inodes are in the log, i.e. not at fixed
locations on disk…
So how do we find the right version?
File location and reading
New data structure: inode map
- Located in the log
- Fixed checkpoint region on disk holds
addresses of all map blocks
- Indexed by file ID; gives the location of the file's inode
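The lookup chain described above (checkpoint region, then inode map block, then inode, then data) might look roughly like this. It is a sketch, not the Sprite LFS layout: the dictionary-based `disk`, the addresses, and the function names are invented for illustration.

```python
# Toy "disk": address -> block contents. All addresses are made up.
disk = {
    100: {"type": "imap", "entries": {7: 205}},    # inode map block: file ID -> inode address
    205: {"type": "inode", "blocks": [300, 301]},  # inode: addresses of the file's data blocks
    300: b"hello ",
    301: b"world",
}

# Fixed-location checkpoint region: holds the addresses of all inode map blocks.
checkpoint_region = {"imap_block_addrs": [100]}


def find_inode(file_id):
    """Follow checkpoint region -> inode map block -> inode."""
    for addr in checkpoint_region["imap_block_addrs"]:
        entries = disk[addr]["entries"]
        if file_id in entries:
            return disk[entries[file_id]]
    raise FileNotFoundError(file_id)


def read_file(file_id):
    """Concatenate the data blocks the inode points to."""
    inode = find_inode(file_id)
    return b"".join(disk[addr] for addr in inode["blocks"])


contents = read_file(7)
```

Only the checkpoint region sits at a fixed address; when a file is rewritten, its new inode lands elsewhere in the log and the map entry is updated to point at it.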
Checkpoint regions
• Contain
- addresses of all blocks in inode map
- segment usage table
- current time
- pointer to last segment written
• Two of them, for safety
• Located at fixed positions on disk
• Used for crash recovery
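A possible reading of "two of them, for safety" is that recovery starts from whichever region was completed most recently. The sketch below assumes that behaviour; the `complete` flag is a hypothetical stand-in for however a half-written checkpoint is detected, and the field names are not taken from the slides.

```python
def pick_checkpoint(region_a, region_b):
    """Return the fully written checkpoint region with the latest time."""
    valid = [r for r in (region_a, region_b) if r is not None and r["complete"]]
    if not valid:
        raise RuntimeError("no usable checkpoint region")
    return max(valid, key=lambda r: r["time"])


# Region B is newer but was interrupted by a crash, so A is used.
region_a = {"time": 10, "complete": True, "last_segment": 41}
region_b = {"time": 12, "complete": False, "last_segment": 44}
chosen = pick_checkpoint(region_a, region_b)
```

Alternating between the two regions means a crash during a checkpoint write can damage at most one of them.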
Free space management - 1
GOAL: keep large extents of free space
to write new data
• Divide disk into fixed-length segments
(512kB or 1MB)
• Write segments sequentially until end of
disk space
- older segments get fragmented meanwhile
…and then?
Free space management - 2
Need to clean segments periodically
Segment cleaning:
• Read a number of segments into memory
• Identify live data
• Write only the live data back to a smaller number of clean
segments
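The three cleaning steps can be sketched as a single compaction function. This is a minimal illustration, assuming blocks are plain strings and that an `is_live` predicate is supplied by the caller; neither detail comes from the slides.

```python
def clean(segments, is_live, segment_size):
    """Read segments, keep only live blocks, repack them into fewer segments."""
    live = [block for segment in segments for block in segment if is_live(block)]
    return [live[i:i + segment_size] for i in range(0, len(live), segment_size)]


# Three fragmented segments; "-" marks dead (overwritten or deleted) blocks.
fragmented = [
    ["a", "-", "b", "-"],
    ["-", "c", "-", "-"],
    ["d", "-", "-", "e"],
]
compacted = clean(fragmented, is_live=lambda block: block != "-", segment_size=4)
```

Here three partly dead segments shrink to two, so one whole segment is returned to the free pool.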
Free space management
[Figure sequence: segments on disk with the old and new log ends, a free segment, and the in-memory write buffer]
1) Cleaner thread: copy segments to memory buffer
2) Cleaner thread: identify live blocks
3) Cleaner thread: queue compacted data for writing
4) Writer thread: write compacted and new data to
segments, then mark old segments as free
Implementation
Segment summary block: identifies each piece of
information in the segment
- E.g., for a file, each data block is identified by
version number + inode number (= unique
identifier, UID) and block number
- Version number is incremented in the inode map when the file
is deleted
- If the UID of a block does not equal that in the inode map when
the segment is scanned, the block is discarded
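The UID comparison can be sketched as follows. The dictionary shapes and the name `uid_matches` are hypothetical; the sketch only covers the quick discard test described above, and a block whose UID matches may still need a further check against the file's inode to confirm that it is referenced.

```python
# Inode map: inode number -> current version number (illustrative values).
inode_map = {7: 2, 9: 1}


def uid_matches(entry):
    """True if the block's (version, inode) UID equals the inode map's."""
    return inode_map.get(entry["inode"]) == entry["version"]


# Entries as they might appear in a segment summary block.
summary = [
    {"inode": 7, "version": 1, "block_no": 0},  # file 7 was deleted and recreated since
    {"inode": 7, "version": 2, "block_no": 0},  # belongs to the current file 7
    {"inode": 5, "version": 1, "block_no": 3},  # inode 5 no longer exists
]
candidates = [entry for entry in summary if uid_matches(entry)]
```

Bumping the version on deletion lets the cleaner throw away every block of a deleted file without reading anything beyond the inode map.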
Cleaning policies
1) Which segments to clean?
2) How should live blocks be grouped when
they are written out?
Free space management – cleaning policies
Cleaning policies can be compared in terms of the write cost:
$$\text{write cost} = \frac{\text{total bytes read and written}}{\text{new data written}} = \frac{N + N \cdot u + N \cdot (1 - u)}{N \cdot (1 - u)} = \frac{2}{1 - u}$$
N = number of segments read
u = fraction of live data in read segments (0 ≤ u < 1)
• Average amount of time the disk is busy per byte of new data written
(seek and rot. latency negligible in LFS)
• Note: includes cleaning overhead
• Note dependence on u
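To make the dependence on u concrete, the formula 2 / (1 - u) can be evaluated directly. The function name `write_cost` is just a label for this sketch.

```python
def write_cost(u):
    """Write cost 2 / (1 - u) for segments cleaned at utilization u."""
    if not 0 <= u < 1:
        raise ValueError("u must satisfy 0 <= u < 1")
    # Read N segments, write back N*u live data, leaving N*(1-u) for new data.
    return 2 / (1 - u)


half_full = write_cost(0.5)     # 4.0
mostly_full = write_cost(0.75)  # 8.0
```

Cleaning segments that are 75% live costs twice as much per byte of new data as cleaning segments that are 50% live. The formula assumes every cleaned segment is read in full; a segment with no live data would not need to be read at all, which is how measured write costs can fall below 2.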
Cleaning policies
Low u = low write cost
• Note: underutilized disk gives low write cost, but high storage cost!
• But u is defined only for the segments read by the cleaner (not for the disk overall)
• Goal: a bimodal distribution, with most segments nearly full but a few
nearly empty (have the cleaner work on these)
Achieving a bimodal distribution?
• First attempt: cleaner always chooses lowest u segments and sorts
by age before writing – FAILURE!
• Free space in “cold” (i.e. more stable) segments is more “valuable”
(will last longer)
• Assumption: stability of segment proportional to age of youngest
block (i.e. older = colder)
• Replace greedy policy with cost-benefit criterion:
$$\frac{\text{benefit}}{\text{cost}} = \frac{\text{free space generated} \times \text{age}}{\text{cost}} = \frac{(1 - u) \times \text{age}}{1 + u}$$
• Clean segments with higher ratio
• Still group by age before rewriting
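The difference between the greedy and cost-benefit policies shows up in a small example. This is a sketch with made-up segments; the field names and the unit of `age` are assumptions, and only the ratio (1 - u) * age / (1 + u) comes from the slide.

```python
def benefit_cost(u, age):
    """Cost-benefit ratio: free space generated times age, over cleaning cost."""
    return (1 - u) * age / (1 + u)


def pick_cost_benefit(segments, count):
    """Choose the segments with the highest benefit-to-cost ratio."""
    ranked = sorted(segments, key=lambda s: benefit_cost(s["u"], s["age"]), reverse=True)
    return ranked[:count]


def pick_greedy(segments, count):
    """Choose the segments with the lowest utilization."""
    return sorted(segments, key=lambda s: s["u"])[:count]


segments = [
    {"name": "hot", "u": 0.50, "age": 2},     # recently written, still changing
    {"name": "cold", "u": 0.75, "age": 100},  # stable for a long time
]
greedy_choice = pick_greedy(segments, 1)[0]["name"]
cost_benefit_choice = pick_cost_benefit(segments, 1)[0]["name"]
```

Greedy picks the hot segment because it is emptier, while cost-benefit picks the cold one even at 75% utilization, since the space freed there is expected to stay free longer.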
Cost-benefit - Results
• Left: bimodal distribution achieved - Cold cleaned at
u=75%, hot at u=15%
• Right: cost-benefit better, especially at utilization > 60%
Performance – small files
• SunOS based on Unix FFS
• NB: best case for Sprite LFS: no cleaning overhead
• Sprite LFS keeps the disk 17% busy (85% for SunOS) with the CPU saturated,
so it will improve with CPU speed (right)
Performance – large files
Single 100MB file
• Traditional FS: logical locality – pay additional cost for
organizing disk layout, assuming read patterns
• LFS: temporal locality – group information created at the
same time – not optimal for reading randomly written files
Performance – cleaning overhead
• Statistics over several months of real usage
• Previous results did not include cleaning
• Write cost ranges from 1.2 to 1.6; more than half of cleaned segments were empty
• Cleaning overhead limits write performance to ~70% of bandwidth
• Improvement: cleaning could be performed at night or in idle periods
Conclusions
• Prototype log-structured FS implemented/tested
• Due to cleaning overhead, segment cleaning policies are
crucial - tested in simulations before implementation
• Results in tests (without cleaning overhead)
- Higher performance than FFS in writes for both small and
large files
- Comparable read performance (except one case)
• Results in real usage (with cleaning)
- Simulation results confirmed
- 70% of bandwidth can be used for writing
References
• M. Rosenblum and J. K. Ousterhout, “The design and
implementation of a log-structured file system”,
Proceedings of the 13th ACM Symposium on Operating Systems
Principles, December 1991
• M. K. McKusick, W. N. Joy, S. J. Leffler, and R. S. Fabry,
“A Fast File System for Unix”, ACM Transactions on
Computer Systems, 2(3), August 1984, pp. 181-197
• A. Tanenbaum, “Modern Operating Systems”, 2nd ed.
(Ch. 4, “File systems”), Prentice Hall